Slide1
Style and Influence in Social Text
11-27-29
Slide2
Announcement
Project reports next week
same drill as midterm reports
reverse order as midterm reports
We know you’re not done yet
… but you will be by midnight Mon 12/10, right?
start with one slide summarizing midterm
Slide3
FCE’s
Are now open
We do read them…and people do care
Especially this year
free-text comments on assignments/structure/layout of course very welcome
Slide4
Puzzle time
Ths sntnc hs n vwls
i eee a o osoa
Slide5
Today’s topics
Summary: there are signals in common words
What can you infer from how people use the most frequent words in text?
Slide6
Slide7
Slide8
Today’s topics
Summary: there are signals in common words
What can you infer from how people use the most frequent words in text?
Patterns of usage
“literary style”
predicts: authorship, gender, …
Style changes according to situation
and is transmitted from person to person
Outline: some background and two recent papers
Slide9
Slide10
Background: Authorship attribution
Mosteller and Wallace, 1964. “Inference and Disputed Authorship”: frequency of function words can be used to classify documents by author.
Function words are not under conscious control
Function word use is independent of content
A simple histogram of function words is enough
Slide11
Authorship attribution
Shlomo Argamon, Shlomo Levitan
SVM on histogram of 200 most frequent words
Slide12
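The feature representation described above (a histogram over the most frequent words) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the tokenizer, the toy documents, and the helper names `top_k_vocabulary` and `histogram` are assumptions, and the SVM step itself is left out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def top_k_vocabulary(documents, k=200):
    """Return the k most frequent words across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return [word for word, _ in counts.most_common(k)]


def histogram(text, vocabulary):
    """Relative frequency of each vocabulary word in one document."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return [counts[word] / total for word in vocabulary]


# Toy documents (assumed); k is small only because the corpus is tiny.
docs = [
    "the cat sat on the mat and the dog sat by the door",
    "upon the whole it is to be hoped that the plan is sound",
]
vocab = top_k_vocabulary(docs, k=5)
features = [histogram(d, vocab) for d in docs]
```

Each row of `features` is a fixed-length vector that could be handed to any standard classifier; content words drop out naturally because the vocabulary is chosen by frequency alone.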
COLING 2006
Slide13
Slide14
LIWC
1986: writing about emotional upheavals improved physical health (!)
Can you refine this statement?
what sort of writings yield the best results?
but: people don’t agree on ratings
and: “judges tend to get depressed when reading depressing stories.”
so: design an automatic “instrument” to rate writings (Linguistic Inquiry and Word Count) based on most frequent words
Slide15
LIWC words cover about 55% of the tokens (not types) in most text
Categories are mostly designed by hand, by committee
Slide16
Slide17
Slide18
Slide19
Slide20
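A category-counting instrument of the kind described above can be sketched as below. The three word lists in `TOY_LEXICON` are illustrative assumptions, not the actual LIWC dictionary, and `category_rates` is a hypothetical helper name. Coverage is measured over tokens (every occurrence counts), not over types.

```python
import re

# Toy stand-in for a category lexicon; these word lists are assumptions
# made for illustration and are NOT the real LIWC categories.
TOY_LEXICON = {
    "pronoun": {"i", "you", "we", "he", "she", "they", "it"},
    "article": {"a", "an", "the"},
    "negation": {"no", "not", "never"},
}


def tokenize(text):
    """Lowercase and split into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_rates(text, lexicon=TOY_LEXICON):
    """Return (per-category token rates, overall token coverage)."""
    tokens = tokenize(text)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    rates = {
        category: sum(t in words for t in tokens) / total
        for category, words in lexicon.items()
    }
    covered_words = set().union(*lexicon.values())
    coverage = sum(t in covered_words for t in tokens) / total
    return rates, coverage


rates, coverage = category_rates("I did not see the dog, and the dog did not see me.")
```

With a real lexicon the same coverage computation is what a figure like "about 55% of the tokens" would refer to.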
Another signal of rank: starting a fashion
Slide21
most frequent 200 words
Is literary style like fashion? Can you track literary influence? Can you find high-status, influential people by modeling literary style?
Slide22
most frequent 200 words
Slide23
Slide24
People adopt each other’s mannerisms and style in many ways…
Slide25
Slide26
Slide27
Corpus
Pennebaker & Niederhoffer, 2002: 98 pairs in the lab + Watergate tapes
Twitter A: 1.3M “conversations” between 300k users; many are too short to analyze successfully
Twitter B: More crawling
all pairs with 2+ conversations
all posts from these pairs
15M tweets, 7800 users, 215k conversations, 2200 pairs
Slide28
Measuring “cohesion” for a property C
Slide29
Measuring “cohesion”
Tweet T contains word from class C
Reply R contains word from class C
T and R are a “turn”
Slide30
Slide31
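The slide's formula for cohesion did not survive in this transcript, so the sketch below uses one plausible formalization, stated here as an assumption: how often both the tweet T and its reply R contain a word from class C within a turn, minus what independence of T and R would predict. The function names and the toy turns are hypothetical.

```python
import re


def exhibits(text, word_class):
    """True if the text contains at least one word from the class."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return not tokens.isdisjoint(word_class)


def cohesion(turns, word_class):
    """Assumed definition: P(T and R both use C) - P(T uses C) * P(R uses C).

    `turns` is a list of (tweet_text, reply_text) pairs.
    """
    if not turns:
        raise ValueError("need at least one turn")
    n = len(turns)
    t_flags = [exhibits(t, word_class) for t, _ in turns]
    r_flags = [exhibits(r, word_class) for _, r in turns]
    p_both = sum(t and r for t, r in zip(t_flags, r_flags)) / n
    p_t = sum(t_flags) / n
    p_r = sum(r_flags) / n
    return p_both - p_t * p_r


ARTICLES = {"a", "an", "the"}  # example word class C
turns = [
    ("the game was great", "the best one yet"),
    ("see you soon", "ok bye"),
    ("a good idea", "sure thing"),
    ("nice", "thanks"),
]
coh = cohesion(turns, ARTICLES)
```

A value above zero means tweet and reply share the class more often than chance under this baseline; other baselines (for example, randomly re-paired tweets and replies) are equally reasonable.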
Measuring “accommodation” and “influence”
T_b, from b, is a reply to T_a, from a
Slide32
T_b uses word class C in a reply to a
T_b uses word class C in a reply to a after a uses C
Slide33
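The two quantities named above can be combined into an accommodation score. The sketch below assumes the score is their difference: the rate at which b's reply uses class C after a used C, minus the rate at which b's replies to a use C at all. The difference form, the function names, and the toy exchanges are assumptions, since the slide's formula itself is not in the transcript.

```python
import re


def exhibits(text, word_class):
    """True if the text contains at least one word from the class."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return not tokens.isdisjoint(word_class)


def accommodation(exchanges, word_class):
    """Assumed Acc(a, b) for one ordered pair of users.

    `exchanges` is a list of (tweet_by_a, reply_by_b) pairs.
    Returns None when a never uses the class, since the conditional
    rate is undefined in that case.
    """
    if not exchanges:
        return None
    reply_flags = [exhibits(r, word_class) for _, r in exchanges]
    baseline = sum(reply_flags) / len(reply_flags)
    after_c = [exhibits(r, word_class) for t, r in exchanges if exhibits(t, word_class)]
    if not after_c:
        return None
    return sum(after_c) / len(after_c) - baseline


ARTICLES = {"a", "an", "the"}  # example word class C
exchanges = [
    ("the plan works", "the plan is fine"),
    ("the end", "an ending indeed"),
    ("hello", "hi there"),
    ("what now", "wait a minute"),
]
acc = accommodation(exchanges, ARTICLES)
```

Here b uses an article in 3 of 4 replies overall but in 2 of 2 replies that follow an article from a, so the score is positive.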
Evidence of fashion in linguistic style spreading through a conversation
Time lag suggests influence, not associative sorting
We don’t have anything like direction…
Slide34
If Acc(a,b) > 0:
Symmetric: Acc(b,a) > 0
Default asymmetric: Acc(b,a) = 0
Divergent asymmetric: Acc(b,a) < 0
Slide35
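The three cases above translate directly into a small classifier. The tolerance `tol` used to decide when an estimated score counts as zero is an assumption added here (with real data one would more likely use a significance test), and `accommodation_pattern` is a hypothetical name.

```python
def accommodation_pattern(acc_ab, acc_ba, tol=1e-9):
    """Label a pair of users given Acc(a,b) > 0.

    `tol` is an assumed threshold for treating a score as zero.
    """
    if acc_ab <= tol:
        raise ValueError("the three cases are defined only when Acc(a,b) > 0")
    if acc_ba > tol:
        return "symmetric"
    if acc_ba < -tol:
        return "divergent asymmetric"
    return "default asymmetric"


labels = [
    accommodation_pattern(0.2, 0.1),
    accommodation_pattern(0.2, 0.0),
    accommodation_pattern(0.2, -0.3),
]
```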
Does one party accommodate more than the other?
Accommodation does not correlate with “status” features like #followers, #days on Twitter, …
Slide36
????
Does one party accommodate more than the other?
Slide37
Slide38
Datasets
Wikipedia: Wikipedia editors’ talk pages: 240k conversations; plus 32k discussions over who gets promoted to admin.
Status: admin vs non-admin
Dependence: leaning to support/reject
Supreme Court: 50k verbal exchanges for 204 cases.
Status: chief justice vs justice vs lawyer
Dependence: leaning to support/leaning to reject
Slide39
Experiments
Similar notion of “coordination” (= accommodation)
Hypotheses:
e.g., you accommodate more when speaking to a big shot
and he coordinates less with other people
Slide40
Slide41
more coordination with admins than with non-admins
admins coordinate more with others than non-admins do
Slide42
admins coordinate more with others than non-admins do
Why?
Maybe the folks that become admins are different somehow?
e.g., more accommodating?
Slide43
the people that eventually become admins coordinate more than people who eventually fail to become admins
Slide44
revised hypothesis: after you become an admin you will coordinate with others less than you did before
Slide45
What about the court dataset?
Slide46
What about the court dataset?
Slide47
Status prediction
Given a conversation between x and y, predict if status(x) > status(y) or vice versa
Very easy to do in Supreme Court domain (“your honor, …”)
Hard for humans in Wikipedia (inter-annotator agreement ~= 80%, accuracy ~= 70%)
Slide48
Slide49
One more observation…
Slide50
Slide51
So to summarize…
Summary: there are signals in common words
Even though we don’t think about how we use them
Patterns of usage
“literary style”
predicts: authorship, gender, …
Style changes according to situation
and is transmitted from person to person
you can observe that transmission (accommodation, coordination) and determine its direction
the direction of accommodation tells you something about the status of the speakers