Slide1
Great Food, Lousy Service
Topic Modeling for Sentiment Analysis in Sparse Reviews
Robin Melnick
rmelnick@stanford.edu
Dan Preston
dpreston@stanford.edu
Slide2
OpenTable.com
Slide3
Short
Characters
Words
Slide4
Sparse
“An unexpected combination of Left-Bank Paris and Lower Manhattan in Omaha. Divine. Inspirational and a great value.”
Food? Ambiance? Service? Noise?
Slide5
Skewed
Slide6
Correlations
Slide7
SVM + Features, Features, Features!
tokenize punctuation
"white list" (only use sentiment words)
id, neutralize proper nouns
remove stop words
strip numbers
POS tagging, ADJ only
contraction splitting
POS tagging, add ADV
lower casing
Brill tagger
unigram (Bag of Words)
sentiment "white list" (Harvard lexicon)
bigram
count of sentiment words (pos/neg)
trigram
balanced training set
mixed n-grams
binary accuracy
ignore stop words
sub-topic classifiers, hand list
stemming
WordNet topic list expansion
negation processing
topic-filtered n-grams
expanded negation processing
topic-word proximity filtering
large training set size
strict entropy modeling
varying dictionary size
frequency-weighted entropy modeling
SVM scaling
30+ preprocessing and SVM classification features, ~50 configurations
Slide8
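A few of the preprocessing steps named in the "SVM + Features" list (lower casing, stripping numbers, removing stop words, mixed n-grams) might look roughly like the sketch below. The stop-word set and the function names `preprocess` and `ngrams` are hypothetical; the slides do not show the authors' actual pipeline.

```python
import re

# Hypothetical stop-word list; the slides do not say which list was used.
STOP_WORDS = {"a", "an", "the", "and", "was", "our"}

def preprocess(text):
    """Lower-case, keep alphabetic tokens only (drops numbers and punctuation),
    then remove stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def ngrams(tokens, n_max=3):
    """Return mixed n-grams (unigrams up to n_max-grams) as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]

features = ngrams(preprocess("The soup was fantastic, 10 out of 10!"), n_max=2)
```

Each resulting string would then become one dimension of the bag-of-words vector passed to the SVM.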
Key Features
Stemming
Porter 1980 via NLTK
<fast>, <faster>, <fastest> → <fast>
Negation processing (enhanced approach from Pang et al. 2002)
“Not a great experience.” → NOT_great
“They never disappoint!” → NOT_disappoint
Net sentiment count
pos/neg lexicon (Harvard General Inquirer)
running +/- count
“Incredible(+) food, but our server was rude(-).” (0)
Slide9
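The negation processing and net sentiment count listed under Key Features can be sketched as follows. This is a minimal illustration in the spirit of Pang et al. (2002), not the authors' enhanced version: the tiny `NEGATORS`, `POSITIVE`, and `NEGATIVE` sets are hypothetical stand-ins for the Harvard General Inquirer lexicon, and flipping the polarity of a negated word in the running count is an assumption.

```python
import re

# Hypothetical stand-ins for the negation words and the pos/neg lexicon.
NEGATORS = {"not", "no", "never"}
POSITIVE = {"great", "incredible", "divine"}
NEGATIVE = {"rude", "lousy", "disappoint"}
PUNCTUATION = set(".,;:!?")

def tag_negation(text):
    """Prefix every token after a negation word with NOT_ until the next punctuation mark."""
    tokens = re.findall(r"[a-z']+|[.,;:!?]", text.lower())
    tagged, negating = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negating = False          # punctuation closes the negation scope
            tagged.append(tok)
        elif tok in NEGATORS:
            negating = True           # the negator itself is dropped
        else:
            tagged.append("NOT_" + tok if negating else tok)
    return tagged

def net_sentiment(text):
    """Running +/- count over lexicon words; a NOT_ prefix flips the sign (assumption)."""
    score = 0
    for tok in tag_negation(text):
        negated = tok.startswith("NOT_")
        word = tok[4:] if negated else tok
        polarity = (word in POSITIVE) - (word in NEGATIVE)
        score += -polarity if negated else polarity
    return score
```

With these stand-in lexicons, `net_sentiment("Incredible food, but our server was rude.")` comes out to 0, matching the example on the slide. Note that this sketch tags every token in the negation scope, whereas the slide shows only the sentiment word being tagged.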
Results (so far)
Trained on 10,000 reviews
Tested on ~80,000 reviews
Accuracy
Baseline: 50.0%
Intermediate model: 56.6% (1.13x)
abs(average scoring delta): 0.56
Slide10
Topic Modeling
Hand-seeded topic-word list expanded via WordNet SynSets
sub-topic classifiers
topic-filtered n-grams
<soupFOOD was fantasticADJ>
<fantasticADJ soupFOOD was>
topic-word proximity filtering
both above
<fantasticADJ/FOOD>
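Topic-word proximity filtering, as listed above, could be sketched like this: keep an adjective only when a topic word falls within a small window around it, and emit an adjective/TOPIC feature. The seed lists, the `ADJECTIVES` set (standing in for a POS tagger), and the window size are all hypothetical; the slides give no such details.

```python
# Hypothetical hand-seeded topic words; the slides expand theirs via WordNet synsets.
TOPIC_WORDS = {
    "FOOD": {"soup", "food", "steak"},
    "SERVICE": {"server", "waiter", "service"},
}
# Hypothetical stand-in for POS tagging (ADJ only).
ADJECTIVES = {"fantastic", "rude", "incredible"}

def proximity_features(tokens, window=2):
    """Emit 'adjective/TOPIC' for each adjective with a topic word within `window` tokens."""
    features = []
    for i, tok in enumerate(tokens):
        if tok not in ADJECTIVES:
            continue
        start, end = max(0, i - window), i + window + 1
        for neighbor in tokens[start:end]:
            for topic, words in TOPIC_WORDS.items():
                if neighbor in words:
                    features.append(f"{tok}/{topic}")
    return features

print(proximity_features("the soup was fantastic".split()))
```

Features of this form let a per-topic classifier see only the sentiment words that sit near its own topic vocabulary.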
Results:
      Food     Ambiance  Service  Noise
1.    39.15%   47.26%    53.70%   48.43%
3.    40.05%   47.88%    54.92%   50.35%
      1.02x    1.01x     1.02x    1.03x
Slide11
Word-Rating Distributions
“worst”
“mediocre”
“decent”
“solid”
“exceeded”
Slide12
Frequency-Weighted Entropy Model
Accuracy
Baseline: 50.0%
Intermediate model: 56.6%
Best (entropy) model: 58.6% (1.17x)
abs(average scoring delta): 0.56 → 0.52
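The slides do not spell out how the frequency-weighted entropy model works. One plausible reading, given the word-rating distributions shown earlier, is that a word is more useful when its ratings are concentrated (low entropy) and when it occurs often. The sketch below illustrates that idea only; the scoring formula, the 5-point rating scale, and the function names are assumptions, not the authors' method.

```python
import math
from collections import Counter, defaultdict

def word_rating_stats(reviews):
    """reviews: iterable of (text, rating). Returns {word: (review_count, entropy_in_bits)}."""
    ratings_by_word = defaultdict(Counter)
    for text, rating in reviews:
        for word in set(text.lower().split()):   # count each word once per review
            ratings_by_word[word][rating] += 1
    stats = {}
    for word, counts in ratings_by_word.items():
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        stats[word] = (total, entropy)
    return stats

def frequency_weighted_score(count, entropy, num_ratings=5):
    """Assumed weighting: log-frequency times (1 - normalized entropy)."""
    max_entropy = math.log2(num_ratings)
    return math.log1p(count) * (1 - entropy / max_entropy)

# Toy data, invented for illustration only.
toy_reviews = [("worst meal", 1), ("worst service", 1), ("decent meal", 3), ("decent food", 4)]
stats = word_rating_stats(toy_reviews)
```

Under this reading, a word like “worst”, whose ratings cluster at one end of the scale, would score higher than “decent”, whose ratings are spread out, and the top-scoring words would form the classifier's dictionary.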