Generating High-Coverage - PowerPoint Presentation

Uploaded On 2017-05-13

Presentation Transcript

Slide1

Generating High-Coverage Semantic Orientation Lexicons From Overtly Marked Words and a Thesaurus

Saif Mohammad, Cody Dunne, and Bonnie Dorr

Institute for Advanced Computer Studies and CLIP lab
Human-Computer Interaction Lab
Department of Computer Science, University of Maryland
*Human Language Technology Center of Excellence

Slide2

Evaluative sentences

Sony’s new digital camera is fabulous.
The characters in the movie are flawed.
Creative solutions are valued.
Singapore has an immaculate transportation system.
Our waters have never been more contaminated.

Generating Semantic Orientation Lexicons. Mohammad, Dunne, Dorr.

Slide4

Semantic orientation

Positive semantic orientation (SO), or polarity: the term is often used to convey favorable sentiment or evaluation of the target.
E.g.: excellent, happy, honest, …

Negative semantic orientation: the term is often used to convey unfavorable sentiment or evaluation of the target.
E.g.: poor, sad, dishonest, …

Slide5

Applications

Automatic product recommendation systems (Tatemura, 2000; Terveen et al., 1997)
Question answering (Somasundaran et al., 2007; Lita et al., 2005)
Summarizing multiple viewpoints and opinions (Seki et al., 2004; Mohammad et al., 2008a)
Identifying flames (Spertus, 1997)
Appropriate ad placement (Jin et al., 2007)

Slide6

Manually created lexicons

General Inquirer (GI) (Stone et al., 1966)
http://www.wjh.harvard.edu/inquirer
has labels for only about 3,600 entries

Pittsburgh subjectivity lexicon (PSL) (Wilson et al., 2005)
http://www.cs.pitt.edu/mpqa
draws from the General Inquirer and other sources
has labels for only about 8,000 words

Slide7

Automatically created lexicons

Hatzivassiloglou and McKeown (1997)
A supervised algorithm to determine the semantic orientation of adjectives.

Turney and Littman lexicon (TLL) (2003)
Exploits the tendency of words to co-occur with a seed set.
Needs very large corpora (100 billion words).

Esuli and Sebastiani (2006): SentiWordNet (SWN)
Attaches labels to WordNet synsets.
Uses supervised classifiers.
Needs significant manual annotation.

Slide8

Semantic oppositeness scale

[Figure: a scale running from "antonymous" (big, small) to "not antonymous" (big, large).]

Many antonym pairs have opposite semantic orientation (one positive, one negative):
good, bad; beautiful, ugly; honest, dishonest.

Slide9

Detecting word-pair antonymy: Mohammad, Dorr, Hirst (2008)

Use affix patterns to identify seed pairs of strong antonyms.
Use a Roget-like thesaurus to identify near-synonyms of seed words.
Mark pairs of words near-synonymous to seed pairs as contrasting.
The degree of antonymy is proportional to their tendency to co-occur.
Created a list of more than 3 million strongly antonymous word pairs.
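The step of marking near-synonyms of a seed pair as contrasting can be sketched roughly as follows. This is only an illustration under stated assumptions: the thesaurus paragraphs are a tiny hypothetical sample, contrasting_pairs is a made-up helper name, and the co-occurrence weighting of the degree of antonymy is not modeled.

```python
from itertools import product

# Hypothetical thesaurus paragraphs: each one is a set of near-synonyms.
PARAGRAPHS = [
    {"honest", "authentic", "legit"},
    {"crooked", "dishonest", "shady"},
    {"big", "large"},
]

def contrasting_pairs(seed_pairs, paragraphs):
    """Mark (a, b) as contrasting when a shares a paragraph with one seed
    word and b shares a paragraph with that seed's antonym."""
    pairs = set()
    for left, right in seed_pairs:
        left_paras = [p for p in paragraphs if left in p]
        right_paras = [p for p in paragraphs if right in p]
        for lp, rp in product(left_paras, right_paras):
            pairs.update(product(sorted(lp), sorted(rp)))
    return pairs

# One seed pair of the kind an affix pattern would produce.
pairs = contrasting_pairs([("honest", "dishonest")], PARAGRAPHS)
```

With this toy data, every word in the "honest" paragraph is paired with every word in the "dishonest" paragraph, while big and large stay unpaired.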

Slide10

Our approach

Identify a seed set of positive and negative words:
from edicts of marking theory.
Identify their synonyms:
use a Roget-like thesaurus.
Mark as negative: words synonymous with a negative seed.
Mark as positive: words synonymous with a positive seed.

Slide11

Step 1: Identify seed words

From marking theory:
Overtly marked words tend to be negative. E.g., undo, unhappy, dishonest, immobile.
Their unmarked counterparts tend to be positive. E.g., do, happy, honest, mobile.

Exceptions exist: impartial, partial; unbiased, biased; unstuck, stuck.
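A minimal sketch of Step 1, assuming a toy vocabulary and only the three prefixes seen in the examples above (un, dis, im). The function name affix_seeds and the word list are hypothetical, and the authors' actual pattern matching may differ.

```python
# Negating prefixes taken from the examples above; illustrative, not exhaustive.
PREFIXES = ("un", "dis", "im")

def affix_seeds(vocabulary, prefixes=PREFIXES):
    """Return (positive, negative) seed sets: if both X and prefix+X occur
    in the vocabulary, X becomes a positive seed and prefix+X a negative one."""
    vocab = set(vocabulary)
    positive, negative = set(), set()
    for word in vocab:
        for prefix in prefixes:
            marked = prefix + word
            if marked in vocab:
                positive.add(word)
                negative.add(marked)
    return positive, negative

# Hypothetical toy vocabulary.
VOCAB = ["do", "undo", "happy", "unhappy", "honest", "dishonest",
         "mobile", "immobile", "partial", "impartial", "table"]
positive, negative = affix_seeds(VOCAB)
```

Note that the rule also labels impartial as negative and partial as positive, which is exactly the kind of exception listed above.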

Slide12

Affix patterns

word1    word2    # of word pairs    example pair
X        disX     382                honest, dishonest
X        imX      196                possible, impossible
X        inX      691                consistent, inconsistent
X        malX     28                 adroit, maladroit
X        misX     146                fortune, misfortune
X        nonX     73                 sense, nonsense
X        unX      844                happy, unhappy
X        Xless    208                gut, gutless
lX       illX     25                 legal, illegal
rX       irX      48                 responsible, irresponsible
Xless    Xful     51                 harmless, harmful
Total             2692

Slide13

Step 2: Identify synonyms of seed words

Take synonyms from a Roget-like thesaurus.
We used the Macquarie Thesaurus.
It has 98,000 word types.

Slide14

Thesaurus categories

All words are classified into about 1,000 categories, for example:

ability, absence, accept, accompanied, action, affect, affirm, agree, allow, approach, ask, assemble, attack, attitude, awareness, be, beautiful, beings, belief, better, big, blood, body, breath, calm, care for, careful, cause, certain, change, choice, clean, clear, collect, colors, comfort, concern, conflict, connect, continue, control, convex, correct, count, courtesy

Slide15

Example category entry

369 HONESTY

adj. paragraph: honest, above board, authentic, bona fide, legit
noun paragraph: bona fides, reliability, soundness, trueness, trustiness
adj. paragraph: reliable, sound, steadfast, trustworthy, trusty
noun paragraph: honesty, incorruptness, integrity, probity, sincerity
…

Slide16

Step 2: Identify synonyms of seed words

369 HONESTY

adj. paragraph: honest, above board, authentic, bona fide, legit
noun paragraph: bona fides, reliability, soundness, trueness, trustiness
adj. paragraph: reliable, sound, steadfast, trustworthy, trusty
noun paragraph: honesty, incorruptness, integrity, probity, sincerity
…

Words in each paragraph are near-synonyms.

Slide17

Step 3: Mark as positive synonyms of positive seeds

369 HONESTY

Seed pair: honest (positive), dishonest (negative)
adj. paragraph: honest (+), above board (+), authentic (+), bona fide (+), legit (+)

Seed pair: reliable (positive), unreliable (negative)
adj. paragraph: reliable (+), sound (+), steadfast (+), trustworthy (+), trusty (+) …

noun paragraph: bona fides, reliability, soundness, trueness, trustiness
noun paragraph: honesty, incorruptness, integrity, probity, sincerity
…

Slide18

Step 4: Mark as negative synonyms of negative seeds

370 DISHONESTY

Seed pair: honest (positive), dishonest (negative)
adj. paragraph: crooked (-), dishonest (-), knavish (-), shady (-), unjust (-)

noun paragraph: crookedness, dishonesty, fraudulence, improbity, trickery

Slide19

Majority voting

All words in a paragraph are assigned identical orientation.
If multiple seeds are in the same paragraph, simple voting determines orientation.

369 HONESTY
noun paragraph: honesty, incorruptness, integrity, probity, sincerity

Seed pairs and their votes:
honesty (positive), dishonesty (negative): +
corruptness (positive), incorruptness (negative): -
probity (positive), improbity (negative): +
sincerity (positive), insincerity (negative): +

Slide20

Majority voting

All words in a paragraph have identical orientation.
If multiple seeds are in the same paragraph, simple voting determines orientation.

369 HONESTY
noun paragraph: honesty (+), incorruptness (+), integrity (+), probity (+), sincerity (+)
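The voting rule can be sketched as below. This is a minimal illustration, not the authors' implementation: label_paragraph is a hypothetical name, and leaving ties and seedless paragraphs unlabeled is an assumption, since the slides do not say how those cases are handled.

```python
from collections import Counter

def label_paragraph(paragraph, positive_seeds, negative_seeds):
    """Return '+', '-', or None for a thesaurus paragraph by simple voting
    over the seed words it contains."""
    votes = Counter()
    for word in paragraph:
        if word in positive_seeds:
            votes["+"] += 1
        if word in negative_seeds:
            votes["-"] += 1
    if votes["+"] > votes["-"]:
        return "+"
    if votes["-"] > votes["+"]:
        return "-"
    return None  # no seeds, or a tie: left unlabeled (assumption of this sketch)

# Noun paragraph of category 369 HONESTY and the affix seed pairs listed for it.
paragraph = ["honesty", "incorruptness", "integrity", "probity", "sincerity"]
positive_seeds = {"honesty", "corruptness", "probity", "sincerity"}
negative_seeds = {"dishonesty", "incorruptness", "improbity", "insincerity"}

label = label_paragraph(paragraph, positive_seeds, negative_seeds)
lexicon = {word: label for word in paragraph}  # every word inherits the label
```

For this paragraph the seeds cast three positive votes (honesty, probity, sincerity) against one negative vote (incorruptness).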

Positive orientation has the majority, so all words in the paragraph are marked positive.

Slide21

Sense and word lexicons

Macquarie Semantic Orientation Lexicon (MSOL)
Assigns orientation to word-category combinations.
Categories are coarse word senses.

Most natural language text is not sense-disambiguated.
We create word lexicons from MSOL and SentiWordNet by choosing, for each word, the orientation most common amongst its senses.

Slide22

Size of lexicons

SentiWordNet (SWN)
56,200 entries (85.1% positive and 14.9% negative)

Affix seeds lexicon (ASL)
5,031 entries (47.3% positive and 52.7% negative)

MSOL(ASL)
51,157 entries (66.8% positive and 33.2% negative)
3,643 multi-word expressions

MSOL(ASL and GI)
Uses both affix pairs and GI entries as seeds
76,400 entries (39.9% positive and 60.1% negative)

Available for download:
http://www.umiacs.umd.edu/~saif/WebPages/ResearchInterests.html#SemanticOrientation
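The word-level lexicons counted above come from sense-level entries by taking, for each word, the orientation most common among its senses. A rough sketch under stated assumptions: the entries and their category assignments below are made up for illustration, word_lexicon is a hypothetical name, and skipping tied words is a choice of this sketch because the slides do not describe tie-breaking.

```python
from collections import Counter

def word_lexicon(sense_lexicon):
    """Collapse {(word, category): orientation} into {word: orientation}
    by taking the orientation most common among each word's senses."""
    by_word = {}
    for (word, _category), orientation in sense_lexicon.items():
        by_word.setdefault(word, Counter())[orientation] += 1
    result = {}
    for word, counts in by_word.items():
        ranked = counts.most_common()
        # Ties are skipped; how the authors break ties is not stated.
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            result[word] = ranked[0][0]
    return result

# Hypothetical sense-level entries (word, category) -> orientation.
SENSES = {
    ("sound", "HONESTY"): "+",
    ("sound", "HEALTH"): "+",
    ("sound", "NOISE"): "-",
    ("shady", "DISHONESTY"): "-",
    ("fair", "JUSTICE"): "+",
    ("fair", "MEDIOCRITY"): "-",
}
words = word_lexicon(SENSES)
```

Here sound comes out positive (two senses against one), shady negative, and fair is left out because its two senses disagree.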

Slide23

Intrinsic evaluation: the percentage of GI entries that match those of the automatically generated lexicons.
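The match percentage can be computed along these lines. The labels below are toy data, gi_match_percentage is a hypothetical name, and counting GI entries that are missing from the generated lexicon as non-matches is an assumption. The sketch computes only the simple match percentage, not an F-score.

```python
def gi_match_percentage(gi, lexicon):
    """Percentage of GI entries whose orientation label is identical in the
    generated lexicon. Missing entries count as non-matches (assumption)."""
    if not gi:
        return 0.0
    matches = sum(1 for word, label in gi.items() if lexicon.get(word) == label)
    return 100.0 * matches / len(gi)

# Toy stand-ins for GI and an automatically generated lexicon.
GI = {"honest": "+", "happy": "+", "poor": "-", "sad": "-"}
AUTO = {"honest": "+", "happy": "+", "poor": "+", "shady": "-"}
score = gi_match_percentage(GI, AUTO)
```

In this toy case two of the four GI entries match: poor has the wrong label and sad is missing.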

[Figure: results chart; axis label: F-score]

Slide24

Extrinsic evaluation

Gold standard of phrases manually annotated with semantic orientation:
MPQA corpus (version 1.1)
positive phrases (1,726) and negative phrases (4,485)

A simple algorithm to determine the polarity of a phrase:
If the target phrase has a negative word, then the phrase is marked negative.
If the target phrase has no negative word and has at least one positive word, then it is marked positive.
Otherwise, the classifier refrains from assigning a tag.
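The three rules translate almost directly into code. A minimal sketch, assuming whitespace tokenization with basic punctuation stripping and two tiny hypothetical word lists; multi-word lexicon entries are not handled here.

```python
def phrase_polarity(phrase, positive_words, negative_words):
    """Tag a phrase 'negative', 'positive', or None (no tag)."""
    tokens = [tok.strip(".,;:!?\"'").lower() for tok in phrase.split()]
    # Rule 1: any negative word makes the phrase negative.
    if any(tok in negative_words for tok in tokens):
        return "negative"
    # Rule 2: no negative word and at least one positive word.
    if any(tok in positive_words for tok in tokens):
        return "positive"
    # Rule 3: otherwise refrain from assigning a tag.
    return None

# Hypothetical toy lexicon.
POSITIVE = {"fabulous", "immaculate", "valued"}
NEGATIVE = {"flawed", "contaminated"}

tag = phrase_polarity("an immaculate transportation system", POSITIVE, NEGATIVE)
```

Because the negative check comes first, a phrase such as "fabulous but flawed" is tagged negative, and a phrase with no lexicon words gets no tag.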

Even better accuracies are possible with supervised classifiers and more sophisticated context features (Choi and Cardie, 2008).

Slide25

Extrinsic evaluation: performance of phrase polarity tagging.
No semantic-orientation labeled data used.

[Figure: results chart; axis label: F-score]

Slide26

Extrinsic evaluation: performance of phrase polarity tagging.
Using GI labels.

[Figure: results chart; axis label: F-score]

Slide27

Orientation of thesaurus categories

[Figure: graph of thesaurus categories. Red: negative; Blue: positive; Size of node: intensity; Edge: oppositeness]

Slide28

Pollyanna Hypothesis

People use positive expressions more frequently than negative expressions (Boucher and Osgood, 1969; Kelly, 2000).

[Figure: percentage of entries, shown for lexicons of 5,031 entries and 51,157 entries]

Slide30

Summary

Created a high-coverage semantic orientation lexicon:
using only affix rules and a Roget-like thesaurus;
no manually annotated semantic orientation labels required.

The lexicon:
has about twenty times the number of entries in GI;
has entries for both single words and common multi-word expressions;
is more useful in phrase-polarity annotation than SentiWordNet, GI, or the Turney and Littman lexicon.

Slide31

Future work

Creating even better semantic orientation lexicons by combining:
our approach (affix rules and thesaurus) with the Turney–Littman 2003 method (co-occurrence statistics).

Create orientation lexicons for resource-poor languages:
use a bilingual dictionary;
use an English thesaurus;
use affix rules from both (multiple) languages.

Slide32

Slide33

Automatic approaches: sentiment analysis

Those that rely on a lexical-semantic resource (most use WordNet):
Strapparava and Valitutti, 2004; Hu and Liu, 2004; Kamps et al., 2004; Takamura et al., 2005; Esuli and Sebastiani, 2006; Andreevskaia and Bergler, 2006; Kanayama and Nasukawa, 2006

Those that rely only on text corpora:
Hatzivassiloglou and McKeown, 1997; Turney and Littman, 2003; Yu and Hatzivassiloglou, 2003; Grefenstette et al., 2004
