Modeling User Interactions in Social Media - PowerPoint Presentation

Uploaded by giovanna-bartolotta on 2015-11-30

Presentation Transcript

Slide1

Modeling User Interactions in Social Media

Eugene Agichtein

Emory University

Slide2

Outline

User-generated content

Community Question Answering

Contributor authority

Content quality

Asker satisfaction

Open problems

Slide3

Slide4

Trends in search and social media

Search in the East: heavily influenced by social media (Naver, Baidu Knows, TaskCn, ...)

Search in the West: social media mostly indexed/integrated in search repositories

Two opposite trends in social media search:

Moving towards point relevance (answers, knowledge search)

Moving towards browsing experience, subscription/push model

How to integrate "active" engagement and contribution with "passive" viewing of content?

Slide5

Social Media Today

Published: 4Gb/day

Social Media: 10Gb/day

Page views: 180-200Gb/day

Technorati + Blogpulse: ~120M blogs, ~2M posts/day

Twitter (since 11/07): ~2M users, ~3M msgs/day

Facebook/Myspace: 200-300M users, average 19 min/day

Yahoo! Answers: 90M users, ~20M questions, ~400M answers

[From Andrew Tomkins/Yahoo!, SSM 2008 Keynote]

Slide6

People Helping People

Naver: popularity reportedly exceeds web search

Yahoo! Answers: some users answer thousands of questions daily (and get a t-shirt)

Open, "quirky"; information shared, not "sold"

Unlike Wikipedia:

Chatty threads: opinions, support, validation

No core group of moderators to enforce "quality"

Slide7

Where is the nearest car rental to Carnegie Mellon University?

Slide8

Slide9

Slide10

Successful Search

Give up on "magic".

Lookup CMU address/zipcode

Google Maps

Query: "car rental near:5000 Forbes Avenue Pittsburgh, PA 15213"

Slide11

Total time: 7-10 minutes, active "work"

Slide12

Someone must know this…

Slide13

+0 minutes: 11pm

Slide14

Slide15

Slide16

+1 minute

Slide17

+36 minutes

Slide18

+7 hours: perfect answer

Slide19

Why would one wait hours?

Rational thinking: effective use of time

Unique information need

Subjective/normative question

Complex

Human contact/community

Multiple viewpoints

Slide20

http://answers.yahoo.com/question/index;_ylt=3?qid=20071008115118AAh1HdO

Slide21

Challenges in ____ing Social Media

Estimating contributor expertise

Estimating content quality

Inferring user intent

Predicting satisfaction: general, personalized

Matching askers with answerers

Searching archives

Detecting spam

Slide22

Work done in collaboration with:

Qi Guo

Yandong Liu

Abulimiti Aji

Thanks:

Prof. Hongyuan Zha

Jiang Bian

Yahoo! Research: ChaTo Castillo, Gilad Mishne, Aris Gionis, Debora Donato, Ravi Kumar

Pawel Jurczyk

Slide23

Related Work

Adamic et al., WWW 2007, WWW 2008: expertise sharing, network structure

Kumar et al.: info diffusion in blogspace

Harper et al., CHI 2008: answer quality

Leskovec et al.: cascades, preferential attachment models

Glance & Hurst: blogging

Kraut et al.: community participation and retention

SSM 2008 Workshop (Searching Social Media)

Elsas et al., blog search, ICWSM 2008

Slide24

Estimating Contributor Authority

[Diagram: a graph of questions, answers, and users; users are linked when one user's question is answered by another, so askers act as hubs and answerers as authorities]

P. Jurczyk and E. Agichtein, Discovering Authorities in Question Answer Communities Using Link Analysis (poster), CIKM 2007

Hub (asker)

Authority (answerer)

Slide25

Finding Authorities: Results

Slide26

Qualitative Observations

HITS effective

HITS ineffective

Slide27

Trolls

Slide28

Estimating Content Quality

E. Agichtein, C. Castillo, D. Donato, A. Gionis, G. Mishne, Finding High Quality Content in Social Media, WSDM 2008

Slide29

Slide30

Slide31

Slide32

Slide33

Community

Slide34

Slide35

Slide36

Slide37

Slide38

from all subsets, as follows:

UQV: Average number of "stars" to questions by the same asker.

; The punctuation density in the question's subject.

; The question's category (assigned by the asker).

; "Normalized Clickthrough": the number of clicks on the question thread, normalized by the average number of clicks for all questions in its category.

UAV: Average number of "thumbs up" received by answers written by the asker of the current question.

; Number of words per sentence.

UA: Average number of answers with references (URLs) given by the asker of the current question.

UQ: Fraction of questions asked by the asker in which he opens the question's answers to voting (instead of picking the best answer by hand).

UQ: Average length of the questions by the asker.

UAV: The number of "best answers" authored by the user.

U: The number of days the user was active in the system.

UAV: "Thumbs up" received by the answers written by the asker of the current question, minus "thumbs down", divided by the total number of "thumbs" received.

; "Clicks over Views": the number of clicks on a question thread divided by the number of times the question thread was retrieved as a search result (see [2]).

; The KL-divergence between the question's language model and a model estimated from a collection of questions answered by the Yahoo editorial team (available in http://ask.yahoo.com).

Slide39
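The last feature in the list above compares the question's language model to a reference model via KL divergence. A minimal, Laplace-smoothed unigram version of that computation might look as follows (toy texts; the actual reference corpus was Yahoo editorial answers):

```python
import math
from collections import Counter

def unigram_model(text, vocab, alpha=0.1):
    """Laplace-smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p, q):
    """D(P || Q) = sum_w P(w) * log(P(w) / Q(w))."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

question = "what is the capital of france"
reference = "what is the capital of france the capital is paris"
vocab = set((question + " " + reference).split())
p = unigram_model(question, vocab)
q = unigram_model(reference, vocab)
score = kl_divergence(p, q)  # low divergence: question resembles the reference
```

A high divergence score suggests the question's wording is far from well-formed editorial text, which the WSDM 2008 work used as a quality signal.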

Slide40

; Answer length.

; The number of words in the answer with a corpus frequency larger than c.

UAV: The number of "thumbs up" minus "thumbs down" received by the answerer, divided by the total number of "thumbs" s/he has received.

; The entropy of the trigram character-level model of the answer.

UAV: The fraction of answers of the answerer that have been picked as best answers (either by the askers of such questions, or by a community voting).

; The unique number of words in the answer.

U: Average number of abuse reports received by the answerer over all his/her questions and answers.

UAV: Average number of abuse reports received by the answerer over his/her answers.

; The non-stopword word overlap between the question and the answer.

; The Kincaid [21] score of the answer.

QUA: The average number of answers received by the questions asked by the asker of this answer.

; The ratio between the length of the question and the length of the answer.

UAV: The number of "thumbs up" minus "thumbs down" received by the answerer.

QUAV: The average number of "thumbs" received by the answers to other questions asked by the asker of this answer.

Slide41
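One of the answer-quality features above is the entropy of a character-level trigram model of the answer: repetitive, low-effort text concentrates its probability mass on few trigrams and therefore has lower entropy than ordinary prose. A minimal illustration (toy strings, not the paper's implementation):

```python
import math
from collections import Counter

def trigram_entropy(text):
    """Entropy (bits) of the empirical character trigram distribution."""
    s = text.lower()
    grams = [s[i:i + 3] for i in range(len(s) - 2)]
    counts = Counter(grams)
    n = len(grams)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# repetitive spam-like text yields lower entropy than normal prose
spam = "buy now buy now buy now buy now"
prose = "the monitor my mother uses works reliably"
```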

Rating Dynamics

Slide42

Editorial Quality != Popularity != Usefulness

Slide43

Yahoo! Answers: Time to Fulfillment

Time to close a question (hours) for sample question categories:

1. 2006 FIFA World Cup
2. Optical
3. Poetry
4. Football (American)
5. Scottish Football (Soccer)
6. Medicine
7. Winter Sports
8. Special Education
9. General Health Care
10. Outdoor Recreation

Slide44

Predicting Asker Satisfaction

Given a question submitted by an asker in CQA, predict whether the user will be satisfied with the answers contributed by the community.

"Satisfied": the asker has closed the question AND selected the best answer AND rated the best answer >= 3 "stars". Else, "Unsatisfied".

Y. Liu, J. Bian, and E. Agichtein, Predicting Information Seeker Satisfaction in Community Question Answering, in SIGIR 2008

Slide45
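The "satisfied" label defined above can be computed directly from a question record. The field names here are hypothetical stand-ins, not the dataset's actual schema:

```python
def is_satisfied(closed_by_asker, best_answer_selected, best_answer_stars):
    """Asker is 'satisfied' iff they closed the question, picked a best
    answer, and rated it at least 3 stars; otherwise 'unsatisfied'."""
    return closed_by_asker and best_answer_selected and best_answer_stars >= 3

is_satisfied(True, True, 4)   # satisfied
is_satisfied(True, True, 2)   # rating too low: unsatisfied
is_satisfied(True, False, 5)  # no best answer chosen: unsatisfied
```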

Motivation

Save time: don't bother to post

Suggest a good forum for the information need

Notify user when a satisfactory answer is contributed

From "relevance" to information need fulfillment

Explicit ratings from asker & community

Slide46

ASP: Asker Satisfaction Prediction

[Diagram: a classifier combines question, answer, category, asker history, answerer history, text, Wikipedia, and news features to predict whether the asker is satisfied or not satisfied]

Slide47

Datasets

Crawled from Yahoo! Answers in early 2008 (Thanks, Yahoo!)

Questions: 216,170
Answers: 1,963,615
Askers: 158,515
Categories: 100
% Satisfied: 50.7%

Available at http://ir.mathcs.emory.edu/shared

Slide48

Dataset Statistics

Category                | #Q   | #A    | #A per Q | Satisfied | Avg asker rating | Time to close by asker
2006 FIFA World Cup(TM) | 1194 | 35659 | 29.86    | 55.4%     | 2.63             | 47 minutes
Mental Health           | 151  | 1159  | 7.68     | 70.9%     | 4.30             | 1 day and 13 hours
Mathematics             | 651  | 2329  | 3.58     | 44.5%     | 4.48             | 33 minutes
Diet & Fitness          | 450  | 2436  | 5.41     | 68.4%     | 4.30             | 1.5 days

Asker satisfaction varies by category

#Q, #A, Time to close -> Asker Satisfaction

Slide49

Satisfaction Prediction: Human Judges

Truth: asker's rating

A random sample of 130 questions

Researchers: agreement 0.82, F1 0.45

Amazon Mechanical Turk: five workers per question; agreement 0.9, F1 0.61

Best when at least 4 out of 5 raters agree

Slide50

ASP vs. Humans (F1)

Classifier       | With Text | Without Text | Selected Features
ASP_SVM          | 0.69      | 0.72         | 0.62
ASP_C4.5         | 0.75      | 0.76         | 0.77
ASP_RandomForest | 0.70      | 0.74         | 0.68
ASP_Boosting     | 0.67      | 0.67         | 0.67
ASP_NB           | 0.61      | 0.65         | 0.58
Best Human Perf  | 0.61      |              |
Baseline (naïve) | 0.66      |              |

ASP is significantly more effective than humans

Human F1 is lower than the naïve baseline!

Slide51

Features by Information Gain

0.14219  Q: Asker's previous rating
0.13965  Q: Average past rating by asker
0.10237  UH: Member since (interval)
0.04878  UH: Average # answers for past Q
0.04878  UH: Previous Q resolved for the asker
0.04381  CA: Average rating for the category
0.04306  UH: Total number of answers received
0.03274  CA: Average voter rating
0.03159  Q: Question posting time
0.02840  CA: Average # answers per Q

Slide52
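The ranking above orders features by information gain with respect to the satisfied/unsatisfied label: IG(Y; X) = H(Y) - H(Y | X). A minimal computation for a discrete feature (toy data, illustrative only):

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (bits) of a list of labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for a discrete feature X."""
    groups = defaultdict(list)
    for x, y in zip(feature_values, labels):
        groups[x].append(y)
    cond = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - cond

# a feature that perfectly predicts the label has IG = H(Y) = 1 bit here
x = ["high", "high", "low", "low"]
y = ["sat", "sat", "unsat", "unsat"]
ig = information_gain(x, y)
```

Continuous features such as the ratings above would first be discretized before scoring them this way.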

"Offline" vs. "Online" Prediction

Offline prediction: all features (question, answer, asker & category). F1: 0.77

Online prediction: NO answer features; only asker history and question features (stars, #comments, sum of votes, …). F1: 0.74

Slide53

Feature Ablation

                            | Precision | Recall | F1
Selected features           | 0.80      | 0.73   | 0.77
No question-answer features | 0.76      | 0.74   | 0.75
No answerer features        | 0.76      | 0.75   | 0.75
No category features        | 0.75      | 0.76   | 0.75
No asker features           | 0.72      | 0.69   | 0.71
No question features        | 0.68      | 0.72   | 0.70

Asker & question features are most important.

Answer quality, answerer expertise, and category characteristics may not be important: caring or supportive answers are often preferred.

Slide54

Satisfaction: varying by asker experience

Group together questions from askers with the same number of previous questions

Accuracy of prediction increases dramatically, reaching an F1 of 0.9 for askers with >= 5 questions

Slide55

Personalized Prediction of Asker Satisfaction

Same information != same usefulness for different users!

Personalized classifier achieves surprisingly good accuracy (even with just 1 previous question!)

Simple strategy of grouping users by number of previous questions is more effective than other methods for users with a moderate amount of history

For users with >= 20 questions, textual features are more significant

Slide56

Some Personalized Models

Slide57

Satisfaction Prediction When Grouping Users by "Age"

Slide58

Self-Selection: First Experience Crucial

Days as member vs. rating

# prev questions vs. rating

Slide59

Summary

Asker satisfaction is predictable

Can achieve higher-than-human accuracy by exploiting interaction history

User's experience is important

General model: one-size-fits-all; 2000 training questions are enough

Personalized satisfaction prediction: helps with sufficient data (>= 1 previous interaction; can observe text patterns with >= 20 previous interactions)

Slide60

Problems

Sparsity: most users post only a single question

Cold start problem

CF: individualized content, no (visible) rating history. Cf. Digg: ratings are public

Subjective information needs

Slide61

Slide62

Subjectivity in CQA

How can we exploit the structure of CQA to improve question classification?

Case Study: Question Subjectivity Prediction

Subjective: Has anyone got one of those home blood pressure monitors? And if so what make is it and do you think they are worth getting?

Objective: What is the difference between chemotherapy and radiation treatments?

B. Li, Y. Liu, and E. Agichtein, CoCQA: Co-Training Over Questions and Answers with an Application to Predicting Question Subjectivity Orientation, in EMNLP 2008

Slide63

Dataset Statistics (~1000 questions)

http://ir.mathcs.emory.edu/shared/

[Chart: distribution of objective vs. subjective questions]

Slide64

Key Observations

Analysis of real questions in CQA is challenging:

Typically complex and subjective

Can be ill-phrased and vague

Not enough annotated data

Idea: can we utilize the inherent structure of the CQA interactions, and use unlabeled CQA data to improve classification performance?

Slide65

Natural Approach: Co-Training

Introduced in: Combining Labeled and Unlabeled Data with Co-Training, Blum and Mitchell, 1998

Two views of the data (e.g., content and hyperlinks in web pages) provide complementary information

Iteratively construct additional labeled data

Slide66
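The co-training loop described above can be sketched as follows, in the style of Blum and Mitchell (1998): one classifier per view, each labeling the unlabeled examples it is most confident about for the other to train on. The toy word-overlap "learner" is purely illustrative; this is a skeleton, not the CoCQA implementation.

```python
def co_train(labeled, unlabeled, train, predict, rounds=5, k=2):
    """labeled: [((view1, view2), label)]; unlabeled: [(view1, view2)].
    train(pairs) -> model; predict(model, x) -> (label, confidence)."""
    l1 = [(x1, y) for (x1, _), y in labeled]   # view-1 training set
    l2 = [(x2, y) for (_, x2), y in labeled]   # view-2 training set
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        m1, m2 = train(l1), train(l2)
        # each classifier picks the k pool examples it is most confident on
        pick1 = sorted(pool, key=lambda ex: -predict(m1, ex[0])[1])[:k]
        pick2 = sorted(pool, key=lambda ex: -predict(m2, ex[1])[1])[:k]
        for x1, x2 in pick1:                    # view 1 teaches view 2
            l2.append((x2, predict(m1, x1)[0]))
        for x1, x2 in pick2:                    # view 2 teaches view 1
            l1.append((x1, predict(m2, x2)[0]))
        used = {id(ex) for ex in pick1} | {id(ex) for ex in pick2}
        pool = [ex for ex in pool if id(ex) not in used]
    return train(l1), train(l2)

# toy nearest-example "learner" scored by word overlap, for illustration
def train(examples):
    return examples

def predict(model, x):
    words = set(x.split())
    best = max(model, key=lambda ex: len(words & set(ex[0].split())))
    return best[1], len(words & set(best[0].split()))

labeled = [(("good movie", "great answer"), "pos"),
           (("bad movie", "poor answer"), "neg")]
unlabeled = [("good film", "great reply"), ("bad film", "poor reply")]
m1, m2 = co_train(labeled, unlabeled, train, predict)
```

In CoCQA the two views are the question text and the answer text, as the next slide explains.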

Questions and Answers: Two Views

Example:

Q: Has anyone got one of those home blood pressure monitors? and if so what make is it and do you think they are worth getting?

A: My mom has one as she is diabetic so its important for her to monitor it she finds it useful.

Answers usually match/fit the question ("My mom… she finds…")

Askers can usually identify matching answers by selecting the "best answer"

Slide67

CoCQA: A Co-Training Framework over Questions and Answers

[Diagram: two classifiers, C_Q over questions and C_A over answers, are trained on labeled data, iteratively classify and label unlabeled Q-A pairs, and stop based on a validation (holdout training) set]

Slide68

Results Summary

Method     | Question      | Question + Best Answer
Supervised | 0.717         | 0.695
GE         | 0.712 (-0.7%) | 0.717 (+3.2%)
CoCQA      | 0.731 (+1.9%) | 0.745 (+7.2%)

Slide69

CoCQA for varying amounts of labeled data

Slide70

Summary

User-generated content:

Growing

Important: impact on mainstream media, scholarly publishing, …

Can provide insight into information seeking and social processes

"Training" data for IR, machine learning, NLP, …

Need to re-think quality, impact, usefulness

Slide71

Current work

Intelligently route a question to "good" answerers

Improve web search ranking by incorporating CQA data

"Cost" models for CQA-based question processing vs. other methods

Dynamics of user feedback

Discourse analysis

Slide72

Takeaways

People specify their information need fully when they know humans are on the other end

Next generation of search must be able to cope with complex, subjective, and personal information needs

To move beyond relevance, must be able to model user satisfaction

CQA generates rich data to allow us (and other researchers) to study user satisfaction, interactions, and intent for real users

Slide73

Estimating contributor expertise [CIKM 2007]

Estimating content quality [WSDM 2008]

Inferring asker intent [EMNLP 2008]

Predicting satisfaction [SIGIR 2008, ACL 2008]

Matching askers with answerers

Searching CQA archives [WWW 2008]

Coping with spam [AIRWeb 2008]

Thank you!

http://www.mathcs.emory.edu/~eugene

Slide74

Backup Slides

Slide75

Question-Answer Features

Q: length, posting time, …

QA: length, KL divergence

Q: votes

Q: terms

Slide76

User Features

U: Member since

U: Total points

U: #Questions

U: #Answers

Slide77

Category Features

CA: Average time to close a question

CA: Average # answers per question

CA: Average asker rating

CA: Average voter rating

CA: Average # questions per hour

CA: Average # answers per hour

Category       | #Q  | #A  | #A per Q | Satisfied | Avg asker rating | Time to close by asker
General Health | 134 | 737 | 5.46     | 70.4%     | 4.49             | 1 day and 13 hours

Slide78

Backup slides

Slide79

Prediction Methods

Heuristic: # answers

Baseline: guess the majority class (satisfied)

ASP (our system):

ASP_SVM: with the SVM classifier

ASP_C4.5: with the C4.5 classifier

ASP_RandomForest: with the RandomForest classifier

ASP_Boosting: with the AdaBoost algorithm combining weak learners

ASP_NaiveBayes: with the Naive Bayes classifier

…

Slide80

Satisfaction Prediction: Human Perf (Cont'd): Amazon Mechanical Turk

Methodology:

Used the same 130 questions

For each question, list the best answer, as well as the other four answers ordered by votes

Five independent raters for each question

Agreement: 0.9; F1: 0.61

Best accuracy achieved when at least 4 out of 5 raters predicted the asker to be "satisfied" (otherwise, labeled "unsatisfied")

Slide81

Some Results

Slide82

Details of CoCQA implementation

Base classifier: LibSVM

Term frequency as term weight (also tried binary, TF*IDF)

Select top K examples with highest confidence (margin value in SVM)

Slide83

Feature Set

Character 3-grams: has, any, nyo, yon, one, …

Words: has, anyone, got, mom, she, finds, …

Words with character 3-grams

Word n-grams (n <= 3, i.e. Wi, Wi Wi+1, Wi Wi+1 Wi+2): has anyone got, anyone got one, she finds it, …

Word and POS n-grams (n <= 3, i.e. Wi, Wi Wi+1, Wi POSi+1, POSi Wi+1, POSi POSi+1, etc.): NP VBP, She PRP, VBP finds, …
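The character 3-gram and word n-gram families above can be extracted in a few lines; POS n-grams would additionally require a part-of-speech tagger, so they are omitted here. An illustrative sketch (the space-stripping choice is an assumption, not necessarily what CoCQA did):

```python
def char_ngrams(text, n=3):
    """Character n-grams over the lowercased text with spaces removed."""
    s = text.lower().replace(" ", "")
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def word_ngrams(text, max_n=3):
    """All word n-grams of length 1..max_n."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)]

char_feats = char_ngrams("has anyone")      # 'has', 'asa', 'san', 'any', ...
word_feats = word_ngrams("has anyone got")  # includes 'has anyone got'
```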