Slide1
Generating High-Coverage Semantic Orientation Lexicons From Overtly Marked Words and a Thesaurus
Saif Mohammad†, Cody Dunne‡, and Bonnie Dorr†∗
†Institute for Advanced Computer Studies and CLIP lab
‡Human-Computer Interaction Lab
Department of Computer Science, University of Maryland
∗Human Language Technology Center of Excellence
Slide2
Evaluative sentences
Sony’s new digital camera is fabulous.
The characters in the movie are flawed.
Creative solutions are valued.
Singapore has an immaculate transportation system.
Our waters have never been more contaminated.
Generating Semantic Orientation Lexicons. Mohammad, Dunne, Dorr.
Slide3
Slide4
Semantic orientation
Positive semantic orientation (SO) (or polarity)
Term is often used to convey favorable sentiment or evaluation of the target.
E.g.: excellent, happy, honest, …
Negative semantic orientation
Term is often used to convey unfavorable sentiment or evaluation of the target.
E.g.: poor, sad, dishonest, …
Slide5
Applications
Automatic product recommendation systems (Tatemura, 2000; Terveen et al., 1997)
Question answering (Somasundaran et al., 2007; Lita et al., 2005)
Summarizing multiple viewpoints and opinions (Seki et al., 2004; Mohammad et al., 2008a)
Identifying flames (Spertus, 1997)
Appropriate ad placement (Jin et al., 2007)
Slide6
Manually created lexicons
General Inquirer (GI) (Stone et al., 1966)
http://www.wjh.harvard.edu/inquirer
has labels for only about 3,600 entries
Pittsburgh subjectivity lexicon (PSL) (Wilson et al., 2005)
http://www.cs.pitt.edu/mpqa
draws from the General Inquirer and other sources
has labels for only about 8,000 words
Slide7
Automatically created lexicons
Hatzivassiloglou and McKeown (1997)
A supervised algorithm to determine the semantic orientation of adjectives.
Turney and Littman lexicon (TLL) (2003)
Exploit tendency to co-occur with a seed set
Need very large corpora (100 billion words)
Esuli and Sebastiani (2006): SentiWordNet (SWN)
Attach labels to WordNet synsets
Use supervised classifiers
Need significant manual annotation
Slide8
Semantic oppositeness scale
[Figure: scale running between "antonymous" (e.g., big – small) and "not antonymous" (e.g., big – large)]
Many antonym pairs have opposite semantic orientation (one positive, one negative):
good – bad; beautiful – ugly; honest – dishonest
Slide9
Detecting word-pair antonymy: Mohammad, Dorr, Hirst (2008)
Use affix patterns to identify seed pairs of strong antonyms.
Use a Roget-like thesaurus to identify near-synonyms of seed words.
Mark pairs of words near-synonymous to seed pairs as contrasting.
The degree of antonymy is proportional to their tendency to co-occur.
Created a list of more than 3 million strongly antonymous word pairs.
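The marking step above ("pairs of words near-synonymous to seed pairs are contrasting") can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the function name, the paragraph identifiers, and the toy data are assumptions, and the co-occurrence-based degree of antonymy is not modeled.

```python
from itertools import product


def contrasting_word_pairs(seed_pairs, paragraphs):
    """Return word pairs whose thesaurus paragraphs contain a seed antonym pair.

    seed_pairs: iterable of (word1, word2) antonym seeds.
    paragraphs: dict mapping a paragraph id to its list of near-synonyms.
    """
    # Index: word -> ids of the paragraphs that list it.
    containing = {}
    for paragraph_id, words in paragraphs.items():
        for word in words:
            containing.setdefault(word, set()).add(paragraph_id)

    contrasting = set()
    for word1, word2 in seed_pairs:
        for id1, id2 in product(containing.get(word1, ()), containing.get(word2, ())):
            if id1 == id2:
                continue  # both seeds in one paragraph: nothing to contrast
            # Every word near-synonymous to word1 contrasts with every
            # word near-synonymous to word2.
            contrasting.update(product(paragraphs[id1], paragraphs[id2]))
    return contrasting


# Toy data (hypothetical paragraph ids, abbreviated word lists).
toy_paragraphs = {
    "honesty-adj": ["honest", "authentic", "legit"],
    "dishonesty-adj": ["crooked", "dishonest", "shady"],
}
toy_contrasts = contrasting_word_pairs([("honest", "dishonest")], toy_paragraphs)
```

With one seed pair and two three-word paragraphs, the sketch marks all nine cross-paragraph combinations as contrasting, which is how a few thousand seeds can grow into millions of pairs.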
Slide10
Our approach
Identify a seed set of positive and negative words:
From edicts of marking theory
Identify their synonyms:
Use a Roget-like thesaurus
Mark as negative: words synonymous with a negative seed
Mark as positive: words synonymous with a positive seed
Slide11
Step 1: Identify seed words
From marking theory:
Overtly marked words tend to be negative.
E.g., undo, unhappy, dishonest, immobile
Their unmarked counterparts tend to be positive.
E.g., do, happy, honest, mobile
Exceptions exist: impartial – partial, unbiased – biased, unstuck – stuck
Slide12
Affix patterns
word1    word2    # of word pairs    example pairs
X        disX     382                honest – dishonest
X        imX      196                possible – impossible
X        inX      691                consistent – inconsistent
X        malX     28                 adroit – maladroit
X        misX     146                fortune – misfortune
X        nonX     73                 sense – nonsense
X        unX      844                happy – unhappy
X        Xless    208                gut – gutless
lX       illX     25                 legal – illegal
rX       irX      48                 responsible – irresponsible
Xless    Xful     51                 harmless – harmful
Total             2692
Slide13
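The affix patterns tabulated on the preceding slide can be applied to a word list along the following lines. This is a minimal sketch under stated assumptions: the pattern encoding, the function name, and the toy vocabulary are illustrative, and treating word1 as the positive seed and word2 as the negative seed follows the marking-theory slide rather than any published code.

```python
# Each pattern is (prefix1, suffix1, prefix2, suffix2), read as:
#   word1 = prefix1 + X + suffix1   (assumed positive seed)
#   word2 = prefix2 + X + suffix2   (assumed negative seed)
AFFIX_PATTERNS = [
    ("", "", "dis", ""),
    ("", "", "im", ""),
    ("", "", "in", ""),
    ("", "", "mal", ""),
    ("", "", "mis", ""),
    ("", "", "non", ""),
    ("", "", "un", ""),
    ("", "", "", "less"),
    ("l", "", "ill", ""),
    # The table writes this row as rX / irX; it is encoded with "irr" here
    # so that the example pair responsible / irresponsible is produced.
    ("r", "", "irr", ""),
    ("", "less", "", "ful"),
]


def find_seed_pairs(vocabulary):
    """Return sorted (word1, word2) pairs where both words are in the vocabulary."""
    vocab = set(vocabulary)
    pairs = set()
    for word1 in vocab:
        for prefix1, suffix1, prefix2, suffix2 in AFFIX_PATTERNS:
            if not (word1.startswith(prefix1) and word1.endswith(suffix1)):
                continue
            stem = word1[len(prefix1):len(word1) - len(suffix1)]
            if not stem:
                continue  # the affix alone is not a usable stem
            word2 = prefix2 + stem + suffix2
            if word2 in vocab:
                pairs.add((word1, word2))
    return sorted(pairs)


# Toy vocabulary standing in for the thesaurus word list.
toy_vocabulary = [
    "honest", "dishonest", "happy", "unhappy", "legal", "illegal",
    "responsible", "irresponsible", "harmless", "harmful",
    "gut", "gutless", "table",
]
seed_pairs = find_seed_pairs(toy_vocabulary)
```

Requiring both members of a pair to be attested in the vocabulary is what keeps the rules from inventing words; exceptions such as impartial – partial would still be mislabeled by this sketch.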
Step 2: Identify synonyms of seed words
Take synonyms from a Roget-like thesaurus
We used the Macquarie Thesaurus
Has 98,000 word-types
Slide14
Thesaurus categories
All words classified into ~1000 categories
ability, absence, accept, accompanied, action, affect, affirm, agree, allow, approach, ask, assemble, attack, attitude, awareness, be, beautiful, beings, belief, better, big, blood, body, breath, calm, care for, careful, cause, certain, change, choice, clean, clear, collect, colors, comfort, concern, conflict, connect, continue, control, convex, correct, count, courtesy, …
Slide15
Example category entry
369 HONESTY
noun paragraph: honesty, incorruptness, integrity, probity, sincerity, …
adj. paragraph: honest, above board, authentic, bona fide, legit, …
noun paragraph: bona fides, reliability, soundness, trueness, trustiness, …
adj. paragraph: reliable, sound, steadfast, trustworthy, trusty, …
Slide16
Step 2: Identify synonyms of seed words
Words in each paragraph are near-synonyms.
[Figure: the 369 HONESTY category entry with its adj. and noun paragraphs]
Slide17
Step 3: Mark as positive synonyms of positive seeds
369 HONESTY
Seed pair: honest (positive) – dishonest (negative)
adj. paragraph: honest +, above board +, authentic +, bona fide +, legit +, …
Seed pair: reliable (positive) – unreliable (negative)
adj. paragraph: reliable +, sound +, steadfast +, trustworthy +, trusty +, …
noun paragraph: bona fides, reliability, soundness, trueness, trustiness, …
noun paragraph: honesty, incorruptness, integrity, probity, sincerity, …
Slide18
Step 4: Mark as negative synonyms of negative seeds
370 DISHONESTY
Seed pair: honest (positive) – dishonest (negative)
adj. paragraph: crooked -, dishonest -, knavish -, shady -, unjust -, …
noun paragraph: crookedness, dishonesty, fraudulence, improbity, trickery, …
Slide19
Majority voting
All words in a paragraph are assigned identical orientation.
If multiple seeds are in the same paragraph: simple voting determines orientation.
369 HONESTY
noun paragraph: honesty, incorruptness, integrity, probity, sincerity, …
Seed pairs:
honesty (positive) – dishonesty (negative): honesty votes +
corruptness (positive) – incorruptness (negative): incorruptness votes -
probity (positive) – improbity (negative): probity votes +
sincerity (positive) – insincerity (negative): sincerity votes +
Slide20
Majority voting
369 HONESTY
noun paragraph: honesty +, incorruptness +, integrity +, probity +, sincerity +, …
Positive orientation has majority, so all words in the paragraph are marked positive.
Slide21
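The paragraph-level marking and majority vote from the preceding slides can be sketched as below. This is a hedged illustration, not the authors' code: the function names and paragraph identifiers are invented, and leaving tied or seedless paragraphs unlabeled is an assumption the slides do not address.

```python
from collections import Counter


def label_paragraphs(paragraphs, positive_seeds, negative_seeds):
    """Assign one orientation per paragraph by simple voting among its seeds."""
    labels = {}
    for paragraph_id, words in paragraphs.items():
        votes = Counter()
        for word in words:
            if word in positive_seeds:
                votes["positive"] += 1
            if word in negative_seeds:
                votes["negative"] += 1
        if votes["positive"] > votes["negative"]:
            labels[paragraph_id] = "positive"
        elif votes["negative"] > votes["positive"]:
            labels[paragraph_id] = "negative"
        # Assumption: ties and paragraphs without seeds stay unlabeled.
    return labels


def expand_to_entries(paragraphs, labels):
    """Give every word in a labeled paragraph that paragraph's orientation."""
    return {
        (paragraph_id, word): labels[paragraph_id]
        for paragraph_id, words in paragraphs.items()
        if paragraph_id in labels
        for word in words
    }


# Toy paragraphs modeled on the 369 HONESTY / 370 DISHONESTY examples.
toy_paragraphs = {
    "369 HONESTY / noun": ["honesty", "incorruptness", "integrity", "probity", "sincerity"],
    "370 DISHONESTY / noun": ["crookedness", "dishonesty", "fraudulence", "improbity", "trickery"],
    "toy paragraph without seeds": ["table", "chair"],
}
positive_seeds = {"honesty", "corruptness", "probity", "sincerity"}
negative_seeds = {"dishonesty", "incorruptness", "improbity", "insincerity"}

paragraph_labels = label_paragraphs(toy_paragraphs, positive_seeds, negative_seeds)
entries = expand_to_entries(toy_paragraphs, paragraph_labels)
```

In the HONESTY noun paragraph, three positive seeds outvote the single negative seed (incorruptness), so integrity, which is not a seed at all, still receives a positive label.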
Sense and word lexicons
Macquarie Semantic Orientation Lexicon (MSOL)
Assigns orientation to word-category combinations
Categories are coarse word senses
Most natural language text is not sense disambiguated
We create word lexicons from MSOL and SentiWordNet
By choosing for each word the orientation most common amongst its senses
Slide22
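The reduction from a sense-level lexicon to a word-level lexicon described on the preceding slide might look like this. It is a sketch only: the function name and the toy entries (including the placeholder categories and their labels) are invented for illustration, and dropping words whose senses tie is an assumption.

```python
from collections import Counter, defaultdict


def word_lexicon(sense_lexicon):
    """Collapse (word, category) -> label into word -> most common label."""
    votes = defaultdict(Counter)
    for (word, _category), label in sense_lexicon.items():
        votes[word][label] += 1

    lexicon = {}
    for word, counts in votes.items():
        if counts["positive"] > counts["negative"]:
            lexicon[word] = "positive"
        elif counts["negative"] > counts["positive"]:
            lexicon[word] = "negative"
        # Assumption: a word with equally many positive and negative
        # senses is left out of the word-level lexicon.
    return lexicon


# Toy sense-level entries; "toy category A/B" are placeholders.
toy_sense_lexicon = {
    ("sound", "369 HONESTY"): "positive",
    ("sound", "toy category A"): "positive",
    ("sound", "toy category B"): "negative",
    ("shady", "370 DISHONESTY"): "negative",
    ("fine", "toy category A"): "positive",
    ("fine", "toy category B"): "negative",
}
toy_word_lexicon = word_lexicon(toy_sense_lexicon)
```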
Size of lexicons
SentiWordNet (SWN)
56,200 entries (85.1% positive and 14.9% negative)
Affix seeds lexicon (ASL)
5,031 entries (47.3% positive and 52.7% negative)
MSOL(ASL)
51,157 entries (66.8% positive and 33.2% negative)
3,643 multi-word expressions
MSOL(ASL and GI)
Uses both affix pairs and GI entries as seeds
76,400 entries (39.9% positive and 60.1% negative)
Available for download:
http://www.umiacs.umd.edu/~saif/WebPages/ResearchInterests.html#SemanticOrientation
Slide23
Intrinsic evaluation:
The percentage of GI entries that match those of the automatically generated lexicons.
[Figure: chart; axis label: F-score]
Slide24
Extrinsic evaluation
Gold standard of phrases manually annotated with semantic orientation:
MPQA corpus (version 1.1)
positive phrases (1726) and negative phrases (4485)
A simple algorithm to determine the polarity of a phrase:
If target phrase has a negative word, then the phrase is marked negative.
If target phrase has no negative word and has at least one positive word, then it is marked positive.
Otherwise, the classifier refrains from assigning a tag.
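The three rules above translate almost directly into code. The sketch below assumes lowercase whitespace tokenization and a plain word-to-label dictionary; both are simplifications (multi-word expressions are not matched), and the toy lexicon labels are illustrative rather than taken from any of the lexicons discussed.

```python
def phrase_polarity(phrase, lexicon):
    """Tag a phrase as 'negative', 'positive', or None (no tag assigned)."""
    # Assumption: whitespace tokenization and lowercase lookup.
    labels = {lexicon.get(token) for token in phrase.lower().split()}
    if "negative" in labels:
        return "negative"  # any negative word decides the phrase
    if "positive" in labels:
        return "positive"  # no negative word, at least one positive word
    return None  # the classifier refrains from assigning a tag


# Toy lexicon with hypothetical labels.
toy_lexicon = {
    "fabulous": "positive",
    "immaculate": "positive",
    "flawed": "negative",
    "contaminated": "negative",
}
```

For example, `phrase_polarity("fabulous but flawed", toy_lexicon)` returns `"negative"`, because a single negative word overrides any positive ones under the first rule.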
Even better accuracies: supervised classifiers and more sophisticated context features (Choi and Cardie, 2008).
Slide25
Extrinsic evaluation: Performance of phrase polarity tagging.
No semantic-orientation labeled data used.
[Figure: chart; axis label: F-score]
Slide26
Extrinsic evaluation: Performance of phrase polarity tagging.
Using GI labels.
[Figure: chart; axis label: F-score]
Slide27
Orientation of thesaurus categories
[Figure: graph of thesaurus categories. Red: negative; Blue: positive; Size of node: intensity; Edge: oppositeness]
Slide28
Pollyanna Hypothesis
People use positive expressions more frequently than negative expressions.
(Boucher and Osgood, 1969; Kelly, 2000)
[Figure: chart; axis label: Percentage of entries; data label: 5031 entries]
Slide29
Pollyanna Hypothesis
[Figure: chart; axis label: Percentage of entries; data labels: 5031 entries, 51157 entries]
Slide30
Summary
Created a high-coverage semantic orientation lexicon:
using only affix rules and a Roget-like thesaurus.
no manually annotated semantic orientation labels required.
The lexicon:
has about twenty times the number of entries in GI.
has entries for both single words and common multi-word expressions.
is more useful in phrase-polarity annotation than SentiWordNet, GI, or the Turney lexicon.
Slide31
Future work
Creating even better semantic orientation lexicons by combining:
our approach (affix rules and thesaurus) with the Turney–Littman 2003 method (co-occurrence statistics).
Create orientation lexicons for resource-poor languages.
use a bilingual dictionary
use English thesaurus
use affix rules from both (multiple) languages.
Slide32
Slide33
Automatic approaches: sentiment analysis
Those that rely on a lexical-semantic resource
Most use WordNet
Strapparava and Valitutti, 2004; Hu and Liu, 2004; Kamps et al., 2004; Takamura et al., 2005; Esuli and Sebastiani, 2006; Andreevskaia and Bergler, 2006; Kanayama and Nasukawa, 2006
Those that rely only on text corpora
Hatzivassiloglou and McKeown, 1997; Turney and Littman, 2003; Yu and Hatzivassiloglou, 2003; Grefenstette et al., 2004
Slide34
Slide35