Modeling User Interactions in Social Media
Eugene Agichtein
Emory University
Slide2
Outline
User-generated content
Community Question Answering
Contributor authority
Content quality
Asker satisfaction
Open problems
Slide3
Slide4
Trends in search and social media
Search in the East:
Heavily influenced by social media: Naver, Baidu Knows, TaskCn, …
Search in the West:
Social media mostly indexed/integrated in search repositories
Two opposite trends in social media search:
Moving towards point relevance (answers, knowledge search)
Moving towards browsing experience, subscription/push model
How to integrate “active” engagement and contribution with “passive” viewing of content?
Slide5
Social Media Today
Published:
4Gb/day
Social Media:
10Gb/Day
Page views:
180-200Gb/day
Technorati+Blogpulse
~120M blogs
~2M posts/day
Twitter: since 11/07: ~2M users, ~3M msgs/day
Facebook/Myspace: 200-300M users, average 19 min/day
Yahoo! Answers: 90M users, ~20M questions, ~400M answers
[From Andrew Tomkins/Yahoo!, SSM2008 Keynote]
Slide6
People Helping People
Naver: popularity reportedly exceeds web search
Yahoo! Answers: some users answer thousands of questions daily
And get a t-shirt
Open, “quirky”, information shared, not “sold”
Unlike Wikipedia:
Chatty threads: opinions, support, validation
No core group of moderators to enforce “quality”
Slide7
Where is the nearest car rental to Carnegie Mellon University?Slide8
Successful Search
Give up on “magic”.
Look up CMU address/zipcode
Google Maps query: “car rental near:5000 Forbes Avenue Pittsburgh, PA 15213”
Slide11
11
Total time: 7-10 minutes, active “work”Slide12
Someone must know this… Slide13
13
+0 minutes : 11pmSlide14
16
+1 minuteSlide17
17
+36 minutesSlide18
+7 hours: perfect answer
Slide19
Why would one wait hours?
Rational thinking: effective use of time
Unique information need
Subjective/normative question
Complex
Human contact/community
Multiple viewpoints
Slide20
20
http://answers.yahoo.com/question/index;_ylt=3?qid=20071008115118AAh1HdO Slide21
21
Challenges in ____ing Social Media
Estimating contributor expertise
Estimating content quality
Inferring user intent
Predicting satisfaction: general, personalized
Matching askers with answerers
Searching archives
Detecting spam
Slide22
22
Work done in collaboration with:
Qi Guo
Yandong Liu
Abulimiti Aji
Thanks:
Prof. Hongyuan Zha
Jiang Bian
Yahoo! Research: ChaTo Castillo, Gilad Mishne, Aris Gionis, Debora Donato, Ravi Kumar
Pawel Jurczyk
Slide23
Related Work
Adamic et al., WWW 2007, WWW 2008: expertise sharing, network structure
Kumar et al.: info diffusion in blogspace
Harper et al., CHI 2008: answer quality
Leskovec et al.: cascades, preferential attachment models
Glance & Hurst: blogging
Kraut et al.: community participation and retention
SSM 2008 Workshop (Searching Social Media)
Elsas et al., blog search, ICWSM 2008
Slide24
24
Estimating Contributor Authority
[Diagram: bipartite graph linking questions, answers, and the users who ask and answer them]
Hub (asker)
Authority (answerer)
P. Jurczyk and E. Agichtein, Discovering Authorities in Question Answer Communities Using Link Analysis (poster), CIKM 2007
Slide25
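The link-analysis idea here (askers as hubs, answerers as authorities) can be sketched with a standard HITS iteration. This is a minimal illustration on an asker-to-answerer edge list, not the CIKM 2007 implementation:

```python
# Minimal HITS sketch on an asker -> answerer graph.
# Each edge (asker, answerer) means "answerer answered a question by asker".
def hits(edges, iterations=50):
    nodes = {u for e in edges for u in e}
    hub = {u: 1.0 for u in nodes}
    auth = {u: 1.0 for u in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of askers whose questions you answered.
        auth = {u: sum(hub[a] for a, b in edges if b == u) for u in nodes}
        # Hub score: sum of authority scores of users who answered your questions.
        hub = {u: sum(auth[b] for a, b in edges if a == u) for u in nodes}
        # L2-normalize to keep scores bounded.
        na = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        nh = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {u: v / na for u, v in auth.items()}
        hub = {u: v / nh for u, v in hub.items()}
    return hub, auth
```

On a real CQA graph the edges would be weighted and stored as adjacency lists rather than scanned per node.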
25
Finding Authorities: ResultsSlide26
26
Qualitative Observations
HITS effective
HITS ineffectiveSlide27
27
TrollsSlide28
28
Estimating Content Quality
E. Agichtein, C. Castillo, D. Donato, A. Gionis, G. Mishne, Finding High Quality Content in Social Media, WSDM 2008
Slide29
[Screenshot slides: examples of community content quality signals]
Slide38
38
from all subsets, as follows:
UQV Average number of “stars” to questions by the same asker.
; The punctuation density in the question’s subject.
; The question’s category (assigned by the asker).
; “Normalized Clickthrough”: the number of clicks on the question thread, normalized by the average number of clicks for all questions in its category.
UAV Average number of “thumbs up” received by answers written by the asker of the current question.
; Number of words per sentence.
UA Average number of answers with references (URLs) given by the asker of the current question.
UQ Fraction of questions asked by the asker in which he opens the question’s answers to voting (instead of picking the best answer by hand).
UQ Average length of the questions by the asker.
UAV The number of “best answers” authored by the user.
U The number of days the user was active in the system.
UAV “Thumbs up” received by the answers written by the asker of the current question, minus “thumbs down”, divided by the total number of “thumbs” received.
; “Clicks over Views”: the number of clicks on a question thread divided by the number of times the question thread was retrieved as a search result (see [2]).
; The KL-divergence between the question’s language model and a model estimated from a collection of questions answered by the Yahoo editorial team (available at http://ask.yahoo.com).
Slide39
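The KL-divergence feature compares a question's language model against a reference model. A minimal sketch, assuming unigram models with add-alpha smoothing (the exact smoothing used in the paper is an assumption here):

```python
# KL(P_text || P_reference) over unigram language models with add-alpha smoothing.
import math
from collections import Counter

def kl_divergence(text, reference_text, alpha=0.01):
    p, q = Counter(text.lower().split()), Counter(reference_text.lower().split())
    vocab = set(p) | set(q)                      # shared vocabulary
    pn = sum(p.values()) + alpha * len(vocab)    # smoothed normalizers
    qn = sum(q.values()) + alpha * len(vocab)
    return sum(
        ((p[w] + alpha) / pn) * math.log(((p[w] + alpha) / pn) / ((q[w] + alpha) / qn))
        for w in vocab
    )
```

A low divergence from the editorial-answered collection suggests the question resembles editorially answerable questions.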
Slide40
40
; Answer length.
; The number of words in the answer with a corpus frequency larger than c.
UAV The number of “thumbs up” minus “thumbs down” received by the answerer, divided by the total number of “thumbs” s/he has received.
; The entropy of the trigram character-level model of the answer.
UAV The fraction of answers of the answerer that have been picked as best answers (either by the askers of such questions, or by community voting).
; The number of unique words in the answer.
U Average number of abuse reports received by the answerer over all his/her questions and answers.
UAV Average number of abuse reports received by the answerer over his/her answers.
; The non-stopword word overlap between the question and the answer.
; The Kincaid [21] score of the answer.
QUA The average number of answers received by the questions asked by the asker of this answer.
; The ratio between the length of the question and the length of the answer.
UAV The number of “thumbs up” minus “thumbs down” received by the answerer.
QUAV The average number of “thumbs” received by the answers to other questions asked by the asker of this answer.
Slide41
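The trigram character-level entropy feature can be sketched as the Shannon entropy of the answer's empirical trigram distribution (one plausible reading of the feature; the paper's exact formulation may differ):

```python
# Shannon entropy (bits) of the empirical character-trigram distribution.
# Low entropy flags repetitive, low-effort text; higher entropy suggests varied content.
import math
from collections import Counter

def trigram_entropy(text):
    trigrams = Counter(text[i:i + 3] for i in range(len(text) - 2))
    total = sum(trigrams.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in trigrams.values())
```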
Rating Dynamics
41Slide42
42
Editorial Quality != Popularity != UsefulnessSlide43
43
Yahoo! Answers: Time to Fulfillment
1. 2006 FIFA World Cup
2. Optical
3. Poetry
4. Football (American)
5. Scottish Football (Soccer)
6. Medicine
7. Winter Sports
8. Special Education
9. General Health Care
10. Outdoor Recreation
[Chart: time to close a question (hours) for these sample question categories]
Slide44
44
Predicting Asker Satisfaction
Given a question submitted by an asker in CQA, predict whether the user will be satisfied with the answers contributed by the community.
“Satisfied”:
The asker has closed the question AND
Selected the best answer AND
Rated the best answer >= 3 “stars”
Else, “Unsatisfied”
Yandong Liu, Jiang Bian
Y. Liu, J. Bian, and E. Agichtein, Predicting Information Seeker Satisfaction in Community Question Answering, in SIGIR 2008
Slide45
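The “satisfied” definition is a simple conjunction; a sketch with hypothetical field names (not the Yahoo! Answers schema):

```python
# Label rule for "satisfied" as defined above; the dict keys are illustrative
# assumptions, not actual Yahoo! Answers field names.
def is_satisfied(question):
    return (
        question["closed_by_asker"]
        and question["best_answer_selected_by_asker"]
        and question["best_answer_rating"] >= 3
    )
```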
45
Motivation
Save time: don’t bother to post
Suggest a good forum for information need
Notify user when satisfactory answer contributed
From “relevance” to information need fulfillmentExplicit ratings from asker & communitySlide46
46
ASP: Asker Satisfaction Prediction
[Diagram: a classifier predicts “asker is satisfied” vs. “asker is not satisfied” from Text, Category, Answerer History, Asker History, Answer, and Question features, plus Wikipedia and News corpora]
Slide47
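As a toy stand-in for the classifiers compared later (SVM, C4.5, RandomForest, etc.), a one-split decision stump over made-up satisfaction features shows the basic supervised setup; illustrative only:

```python
# Toy decision stump (a single-split decision tree) for binary satisfaction
# prediction. Feature values and labels below are invented for illustration.
def train_stump(X, y):
    """Find the (feature, threshold, sign) split minimizing training error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for sign in (1, -1):
                preds = [1 if sign * (row[f] - t) >= 0 else 0 for row in X]
                err = sum(p != label for p, label in zip(preds, y))
                if best is None or err < best[0]:
                    best = (err, f, t, sign)
    _, f, t, sign = best
    return lambda row: 1 if sign * (row[f] - t) >= 0 else 0
```

A real system would train a full classifier over the question, answer, asker-history, and category features listed in the backup slides.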
47
Datasets
Crawled from Yahoo! Answers in early 2008 (Thanks, Yahoo!)
Questions: 216,170
Answers: 1,963,615
Askers: 158,515
Categories: 100
% Satisfied: 50.7%
Available at http://ir.mathcs.emory.edu/shared
Slide48
48
Dataset Statistics
Category | #Q | #A | #A per Q | Satisfied | Avg asker rating | Time to close by asker
2006 FIFA World Cup(TM) | 1194 | 35659 | 29.86 | 55.4% | 2.63 | 47 minutes
Mental Health | 151 | 1159 | 7.68 | 70.9% | 4.30 | 1 day and 13 hours
Mathematics | 651 | 2329 | 3.58 | 44.5% | 4.48 | 33 minutes
Diet & Fitness | 450 | 2436 | 5.41 | 68.4% | 4.30 | 1.5 days
Asker satisfaction varies by category
#Q, #A, Time to close … -> Asker Satisfaction
Slide49
49
Satisfaction Prediction: Human Judges
Truth: asker’s rating
A random sample of 130 questions
Researchers:
Agreement: 0.82; F1: 0.45
Amazon Mechanical Turk:
Five workers per question
Agreement: 0.9; F1: 0.61
Best when at least 4 out of 5 raters agree
Slide50
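The F1 numbers quoted here and in the following tables use the standard definition (harmonic mean of precision and recall):

```python
# Standard F1 from true positives, false positives, and false negatives.
def f1_score(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```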
50
ASP vs. Humans (F1)
Classifier | With Text | Without Text | Selected Features
ASP_SVM | 0.69 | 0.72 | 0.62
ASP_C4.5 | 0.75 | 0.76 | 0.77
ASP_RandomForest | 0.70 | 0.74 | 0.68
ASP_Boosting | 0.67 | 0.67 | 0.67
ASP_NB | 0.61 | 0.65 | 0.58
Best Human Perf | 0.61 | |
Baseline (naïve) | 0.66 | |
ASP is significantly more effective than humans
Human F1 is lower than the naïve baseline!
Slide51
51
Features by Information Gain
0.14219 Q: Asker’s previous rating
0.13965 Q: Average past rating by asker
0.10237 UH: Member since (interval)
0.04878 UH: Average # answers for past Q
0.04878 UH: Previous Q resolved for the asker
0.04381 CA: Average rating for the category
0.04306 UH: Total number of answers received
0.03274 CA: Average voter rating
0.03159 Q: Question posting time
0.02840 CA: Average # answers per Q
Slide52
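The ranking above is by information gain; a minimal entropy-based implementation for discrete features (illustrative, not the paper's tooling):

```python
# Information gain of a discrete feature with respect to class labels:
# IG = H(labels) - sum over feature values v of P(v) * H(labels | v).
import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(feature_values, labels):
    total = len(labels)
    by_value = {}
    for v, y in zip(feature_values, labels):
        by_value.setdefault(v, []).append(y)
    conditional = sum((len(ys) / total) * entropy(ys) for ys in by_value.values())
    return entropy(labels) - conditional
```

Continuous features (ratings, counts) would be discretized or split on thresholds before computing the gain.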
52
“Offline” vs. “Online” Prediction
Offline prediction:
All features (question, answer, asker & category)
F1: 0.77
Online prediction:
NO answer features
Only asker history and question features (stars, #comments, sum of votes…)
F1: 0.74
Slide53
53
Feature Ablation
Feature set | Precision | Recall | F1
Selected features | 0.80 | 0.73 | 0.77
No question-answer features | 0.76 | 0.74 | 0.75
No answerer features | 0.76 | 0.75 | 0.75
No category features | 0.75 | 0.76 | 0.75
No asker features | 0.72 | 0.69 | 0.71
No question features | 0.68 | 0.72 | 0.70
Asker & question features are most important.
Answer quality, answerer expertise, and category characteristics may not be as important:
caring or supportive answers are often preferred.
Slide54
54
Satisfaction: varying by asker experience
Group together questions from askers with the same number of previous questions
Accuracy of prediction increases dramatically
Reaching F1 of 0.9 for askers with >= 5 questions
Slide55
55
Personalized Prediction of Asker Satisfaction with info
Same information != same usefulness for different users!
Personalized classifier achieves surprisingly good accuracy (even with just 1 previous question!)
Simple strategy of grouping users by number of previous questions is more effective than other methods for users with a moderate amount of history
For users with >= 20 questions, textual features are more significant
Slide56
56
Some Personalized ModelsSlide57
57
Satisfaction Prediction When Grouping Users by “Age”
Slide58
58
Self-Selection: First Experience Crucial
Days as member vs. rating
# prev questions vs. ratingSlide59
59
Summary
Asker satisfaction is predictable
Can achieve higher-than-human accuracy by exploiting interaction history
User’s experience is important
General model: one-size-fits-all
2000 questions are enough for training the model
Personalized satisfaction prediction:
Helps with sufficient data (>= 1 prev. interaction; can observe text patterns with >= 20 prev. interactions)
Slide60
Problems
Sparsity: most users post only a single question
Cold-start problem
CF: individualized content, no (visible) rating history; cf. Digg: ratings are public
Subjective information needs
Slide61
Slide62
62
Subjectivity in CQA
How can we exploit structure of CQA to improve question classification?
Case Study: Question Subjectivity Prediction
Subjective: Has anyone got one of those home blood pressure monitors? And if so, what make is it, and do you think they are worth getting?
Objective: What is the difference between chemotherapy and radiation treatments?
B. Li, Y. Liu, and E. Agichtein, CoCQA: Co-Training Over Questions and Answers with an Application to Predicting Question Subjectivity Orientation, in EMNLP 2008
Slide63
63
Dataset Statistics (~1000 questions)
http://ir.mathcs.emory.edu/shared/
[Chart: distribution of objective vs. subjective questions]
Slide64
64
Key Observations
Analysis of real questions in CQA is challenging:
Typically complex and subjective
Can be ill-phrased and vague
Not enough annotated data
Idea:
Can we utilize the inherent structure of CQA interactions, and use unlabeled CQA data to improve classification performance?
Slide65
65
Natural Approach: Co-Training
Introduced in: Combining Labeled and Unlabeled Data with Co-Training, Blum and Mitchell, 1998
Two views of the data
E.g.: content and hyperlinks in web pages
Provide complementary information
Iteratively construct additional labeled data
Slide66
66
Questions and Answers: Two Views
Example:
Q: Has anyone got one of those home blood pressure monitors? And if so what make is it and do you think they are worth getting?
A: My mom has one, as she is diabetic, so it’s important for her to monitor it; she finds it useful.
Answers usually match/fit the question (“My mom… she finds…”)
Askers can usually identify matching answers by selecting the “best answer”
Slide67
67
CoCQA: A Co-Training Framework over Questions and Answers
[Diagram: two classifiers, C_Q over question text and C_A over answer text, iteratively label the most confidently classified unlabeled Q/A pairs and add them to the labeled data; a validation holdout decides when to stop]
Slide68
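The CoCQA loop can be sketched as follows. The word-count scorer is a toy stand-in for the SVM view classifiers used in the real system, and all names and data are illustrative:

```python
# Toy co-training over two views (question text, answer text).
from collections import Counter

def train_view(examples):
    """Train a naive word-count scorer on (text, label) pairs; label in {0, 1}."""
    words = {0: Counter(), 1: Counter()}
    for text, label in examples:
        words[label].update(text.lower().split())
    def score(text):
        # Positive margin -> class 1; magnitude serves as "confidence".
        return sum(words[1][w] - words[0][w] for w in text.lower().split())
    return score

def co_train(labeled, unlabeled, rounds=3, k=1):
    """labeled: list of ((q_text, a_text), label); unlabeled: list of (q_text, a_text)."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        score_q = train_view([(q, y) for (q, _), y in labeled])
        score_a = train_view([(a, y) for (_, a), y in labeled])
        # Label the examples the two views are jointly most confident about.
        pool.sort(key=lambda qa: abs(score_q(qa[0])) + abs(score_a(qa[1])), reverse=True)
        for qa in pool[:k]:
            label = 1 if score_q(qa[0]) + score_a(qa[1]) >= 0 else 0
            labeled.append((qa, label))
        pool = pool[k:]
    score_q = train_view([(q, y) for (q, _), y in labeled])
    score_a = train_view([(a, y) for (_, a), y in labeled])
    return lambda q, a: 1 if score_q(q) + score_a(a) >= 0 else 0
```

The real framework additionally monitors accuracy on a validation holdout and stops when it degrades.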
68
Results Summary
Method | Question features | Question + Best Answer features
Supervised | 0.717 | 0.695
GE | 0.712 (-0.7%) | 0.717 (+3.2%)
CoCQA | 0.731 (+1.9%) | 0.745 (+7.2%)
Slide69
69
CoCQA for varying amount of labeled data
69Slide70
70
Summary
User-generated content:
Growing
Important: impact on mainstream media, scholarly publishing, …
Can provide insight into information seeking and social processes
“Training” data for IR, machine learning, NLP, …
Need to re-think quality, impact, usefulness
Slide70
71
Current work
Intelligently route a question to “good” answerers
Improve web search ranking by incorporating CQA data
“Cost” models for CQA-based question processing vs. other methods
Dynamics of user feedback
Discourse analysis
Slide72
72
Takeaways
People specify their information need fully when they know humans are on the other end
Next generation of search must be able to cope with complex, subjective, and personal information needs
To move beyond relevance, must be able to model user satisfaction
CQA generates rich data to allow us (and other researchers) to study user satisfaction, interactions, and intent for real users
Slide73
Estimating contributor expertise [CIKM 2007]
Estimating content quality [WSDM 2008]
Inferring asker intent [EMNLP 2008]
Predicting satisfaction [SIGIR 2008, ACL 2008]
Matching askers with answerers
Searching CQA archives [WWW 2008]
Coping with spam [AIRWeb 2008]
Thank you!
http://www.mathcs.emory.edu/~eugene
Slide74
Backup SlidesSlide75
75
Question-Answer Features
Q: length, posting time…
QA: length, KL divergence
Q:Votes
Q:TermsSlide76
76
User Features
U: Member since
U: Total points
U: #Questions
U: #AnswersSlide77
77
Category Features
CA: Average time to close a question
CA: Average # answers per question
CA: Average asker rating
CA: Average voter rating
CA: Average # questions per hour
CA: Average # answers per hour
Category | #Q | #A | #A per Q | Satisfied | Avg asker rating | Time to close by asker
General Health | 134 | 737 | 5.46 | 70.4% | 4.49 | 1 day and 13 hours
Slide78
Backup slidesSlide79
79
Prediction Methods
Heuristic: # answers
Baseline: guess the majority class (satisfied)
ASP (our system):
ASP_SVM: with the SVM classifier
ASP_C4.5: with the C4.5 classifier
ASP_RandomForest: with the RandomForest classifier
ASP_Boosting: with the AdaBoost algorithm combining weak learners
ASP_NaiveBayes: with the Naive Bayes classifier
…
Slide80
80
Satisfaction Prediction: Human Perf (Cont’d): Amazon Mechanical Turk
Methodology:
Used the same 130 questions
For each question, list the best answer, as well as the four other answers ordered by votes
Five independent raters for each question
Agreement: 0.9; F1: 0.61
Best accuracy achieved when at least 4 out of 5 raters predicted the asker to be “satisfied” (otherwise, labeled “unsatisfied”)
Slide81
81
Some Results
Slide82
82
Details of CoCQA implementation
Base classifier: LibSVM
Term frequency as term weight (also tried binary, TF*IDF)
Select top K examples with highest confidence (margin value in SVM)
Slide83
83
Feature Set
Character 3-grams: has, any, nyo, yon, one…
Words: Has, anyone, got, mom, she, finds…
Words with character 3-grams
Word n-grams (n<=3, i.e. Wi, WiWi+1, WiWi+1Wi+2): Has anyone got, anyone got one, she finds it…
Word and POS n-grams (n<=3, i.e. Wi, WiWi+1, Wi POSi+1, POSi Wi+1, POSi POSi+1, etc.): NP VBP, She PRP, VBP finds…
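The n-gram feature extractors described above can be sketched directly (illustrative; not the EMNLP 2008 code):

```python
# Character 3-grams and word n-grams (n <= 3) as used for subjectivity features.
def char_trigrams(text):
    return [text[i:i + 3] for i in range(len(text) - 2)]

def word_ngrams(text, max_n=3):
    words = text.split()
    return [
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    ]
```

POS n-grams would be built the same way after tagging each word, mixing word and tag tokens.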