A Comparison of Several Classifiers for Movie Review Sentiment Analysis

Hilbert Locklear, Andreea Cotoranu, Md Ali, Aziz Altowayan, and Stephanie Houghton
Agenda
- Why Sentiment Analysis?
- The Sentiment Analysis Problem
- Project Goals
- Data and Data Features
- Classifiers and Results
- Conclusions and Directions for Future Work
Why Sentiment Analysis?

- Use of "sentiment determination as a service"
- Availability of opinion content

The sentiment analysis process includes four main tasks:
- keyword identification
- lexical interpretation
- understanding of linguistic concepts
- statistical methods

Related tasks: opinion mining, polarity determination, and more complicated classification.
Problem: Classification of Movie Reviews
Our project was developed to classify the sentiment of movie reviews.
Movie review classification is a BINARY CLASSIFICATION problem.

Each review consists of free, unstructured text plus a star rating, which maps to a label:
- 7 or more stars = Positive
- 4 or fewer stars = Negative
Project Goals
- Use a variety of machine learning techniques to determine which one provides the best classification accuracy
- Achieve high accuracy -- less than 2% error
Data Exploration and Feature Extraction
- 25,000 labelled reviews ... 5.8 million words
- No reviewer allowed more than 30 reviews
- Each sample is reduced to a vector of 14 features
Development of Feature Vector
- Selected 14 features
- Features are centered on keyword identification
- Features include unigram count, unigram ratio, and n-gram count
- Classification requires "Positive" and "Negative" vectors, which represent the mean feature values of positively and negatively classified vectors
Training set mean vectors (class label followed by the mean feature values; all values are averages over the class):

Mean Positive Vector: 1, 234.5, 120.9, 113.6, 11.89, 21.7, 1.95, 0.41, 0.12, 0.01, 0.01, 0.001, 0.08, 0.002
Mean Negative Vector: 0, 232.2, 120.7, 111.4, 12.06, 21.32, 1.09, 1.44, 0.04, 0.1, 0.005, 0.007, 0.03, 0.01
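As a sketch of how mean class vectors like those above could be computed, assuming the feature matrix is already extracted (the data here is synthetic; only the shapes and labels match the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the extracted feature matrix:
# 100 reviews x 14 features, labels 1 = positive, 0 = negative.
X = rng.random((100, 14))
y = rng.integers(0, 2, size=100)

# Mean feature vector of each class, as used by the
# distance and similarity classifiers that follow.
mean_positive = X[y == 1].mean(axis=0)
mean_negative = X[y == 0].mean(axis=0)
```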
Model Development Plan
Strong belief in the non-linearity of the problem:
- Related work shows the problem is non-linear
- Show proof that the problem is non-linear
- Show that a linear approximation is poor

Classifier families evaluated: Distance, Similarity, Clustering, Probabilistic, Regression, Neural Network, Support Vector Machine
Classification Model: Distance Metric

Distance metric classifiers determine the distance between the mean training vectors and the test vector.
- Distance-P: distance from the test vector to the mean positive train vector
- Distance-N: distance from the test vector to the mean negative train vector
- Vectors are numeric; the threshold is numeric and set by an applied heuristic
- A correct classification is a true positive or true negative; an incorrect classification is a false positive or false negative
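A minimal sketch of the distance-metric rule described above, assuming Euclidean distance and a numeric threshold on the difference between the two distances (the slides do not specify the exact heuristic):

```python
import numpy as np

def classify_by_distance(test_vec, mean_pos, mean_neg, threshold=0.0):
    """Assign the class whose mean training vector is closer to the
    test vector; the threshold is the heuristic cutoff on the
    difference between Distance-N and Distance-P."""
    d_pos = np.linalg.norm(np.asarray(test_vec) - np.asarray(mean_pos))
    d_neg = np.linalg.norm(np.asarray(test_vec) - np.asarray(mean_neg))
    return "positive" if d_neg - d_pos > threshold else "negative"
```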
Classification Results: Distance Metric

Accuracy ranged from 54.3% to 78.10% across the sample sets tested.
Classification Model: Similarity Measure

Similarity measure classifiers determine the feature-to-feature similarity between vectors.
Vectors are numeric and categorical; categories are defined as LOW, MED, or HIGH.
- Similarity-P / Similarity-N: similarity of the test vector to the mean positive / mean negative train vector
- Cosine similarity operates on the numeric vectors; its threshold is numeric
- Jaccard similarity operates on matching counts of categories; its threshold is a match count
- A correct classification is a true positive or true negative; an incorrect classification is a false positive or false negative
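The two similarity measures can be sketched as plain functions; treating the Jaccard comparison as a count of matching LOW/MED/HIGH categories is an assumption about how the categorical vectors are compared:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two numeric feature vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def jaccard_match_count(a_cats, b_cats):
    """Count of feature positions whose LOW/MED/HIGH category matches;
    the classifier compares this count against a match-count threshold."""
    return sum(x == y for x, y in zip(a_cats, b_cats))
```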
Classification Results: Similarity Measure

Best similarity measure accuracy: 79.00%
Classification Model: K-Nearest Neighbor

K-Nearest Neighbor uses Euclidean distance to determine which cluster to assign a vector to.
- A green vector in the green cluster is a correct classification (TP, TN)
- A red vector in the green cluster is an incorrect classification (FP, FN)
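A minimal sketch of the K-Nearest Neighbor rule, assuming a simple majority vote over the k nearest training vectors (the data and k value are illustrative, not from the slides):

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, test_vec, k=5):
    """Label the test vector by majority vote among its k nearest
    training vectors under Euclidean distance."""
    dists = np.linalg.norm(train_X - test_vec, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(train_y[nearest]).most_common(1)[0][0]

# Tiny illustrative training set: two clusters in 2-D.
train_X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
train_y = np.array([0, 0, 1, 1])
pred = knn_predict(train_X, train_y, np.array([0.05, 0.05]), k=3)
```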
Classification Results: K-Nearest Neighbor

Best accuracy: 43.00%, depending on sample size and normalization technique.
Classification Model: Logistic Regression

Measures the relationship between the sentiment classification of "positive" or "negative" and the features by estimating probabilities using the logistic function, which is the cumulative logistic distribution.
- The model is a good fit
- A hypothesis (Wald) test was conducted for all coefficients; 2 coefficients, UWC and SMWC, were rejected
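The logistic function and the resulting decision rule can be sketched as follows (the weights, bias, and 0.5 threshold are illustrative assumptions, not the fitted model):

```python
import math

def logistic(z):
    """The logistic function, i.e. the cumulative logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_positive(weights, bias, features, threshold=0.5):
    """Classify as positive when the estimated probability of the
    'positive' class meets the threshold."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return logistic(z) >= threshold
```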
Classification Results Logistic Regression
Must accept high false positive
rate to
achieve high true positive rate
Only about 80% of the data points can be predictedSlide17
Classification Model Naïve Bayes
Simple
probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features
.
We can think of it in three parts:
Prior Probability
Posterior Probability
LikelihoodSlide18
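Under the naive independence assumption, the class score factors into the prior times the per-feature likelihoods; a minimal sketch in log space (the likelihood values here are illustrative, not from the slides):

```python
import math

def naive_bayes_score(prior, feature_likelihoods):
    """Log-posterior up to a constant: log P(class) plus the sum of
    log P(feature | class), under the naive independence assumption."""
    return math.log(prior) + sum(math.log(p) for p in feature_likelihoods)

# Illustrative likelihoods: pick the class with the higher score.
pos_score = naive_bayes_score(0.5, [0.8, 0.6])
neg_score = naive_bayes_score(0.5, [0.2, 0.3])
label = "positive" if pos_score > neg_score else "negative"
```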
Classification Model: Naïve Bayes (contd.)

The probability that a feature occurs in the positive or negative class determines the likelihood that a positive or negative sentiment is associated with that feature.
Classification Results: Naïve Bayes

Accuracy with categorical vectors: 66.58%
Classification Model: Artificial Neural Network

[Network diagram: the 14 feature inputs (f1-f14) feed the network along with bias terms (B1); each neuron applies a transfer function (TF) and an activation function (AF), and a threshold on the output separates Positive from Negative.]
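The forward pass suggested by the diagram can be sketched as follows; the hidden-layer size, sigmoid activation, and random (untrained) weights are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)

def forward(x, W1, b1, W2, b2, threshold=0.5):
    """One forward pass: weighted sums (transfer function, TF) through a
    sigmoid activation (AF) in the hidden layer and output neuron,
    then a threshold splits Positive from Negative."""
    hidden = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))
    output = 1.0 / (1.0 + np.exp(-(W2 @ hidden + b2)))
    return "positive" if output.item() >= threshold else "negative"

# Untrained random weights, just to exercise the shapes:
# 14 inputs -> 8 hidden neurons -> 1 output.
x = rng.random(14)
W1, b1 = rng.random((8, 14)), rng.random(8)
W2, b2 = rng.random((1, 8)), rng.random(1)
result = forward(x, W1, b1, W2, b2)
```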
Classification Results: Artificial Neural Network

Accuracy: 69.52% and 69.26%, depending on sample size and number of neurons.
Classification Model: Support Vector Machines

A non-probabilistic binary classifier. It constructs a hyperplane or set of hyperplanes in a 14-dimensional space. A good separation is achieved by the hyperplane that has the largest distance to the nearest training-data point of any class.
Classification Model SVM Kernels
Faster
More Accurate
Use linear kernel when number of features is larger than number of observations.
Use
Gaussian
kernel when number of observations is larger than number of features
.Slide25
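The two kernels can be sketched as plain functions (a minimal illustration; the gamma value is an assumed parameter, not from the slides):

```python
import numpy as np

def linear_kernel(a, b):
    """Linear kernel: a plain dot product."""
    return float(np.dot(a, b))

def gaussian_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||a - b||^2)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.exp(-gamma * np.linalg.norm(a - b) ** 2))
```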
Classification Results: Support Vector Machine

Accuracy: 70.24% and 69.87%, depending on kernel and sample size.
Classification Results Summary

Accuracy by classifier ranged from 43.0% (K-Nearest Neighbor) to 79.0% (Cosine Similarity).
Conclusions

Findings:
- Cosine Similarity provided the best results
- The threshold heuristic proved valuable
- Proof of non-linearity
- Evidence that non-linear classification methods perform better

Expectations vs. outcomes:
- SVM to provide the best results -- No (why not?)
- Support for non-linearity -- Yes
- Support for poor linear approximations -- Yes
Conclusions (contd.)

- Sentiment analysis is a nonlinear problem
- Sentiment analysis does not allow for a good linear approximation
- Word context may be the most important feature for sentiment classification
Recommendations for Future Work

- Neural network-based models: the Word2Vec architecture (Continuous Bag-of-Words and Skip-Gram) and variations based on it
- Hidden Markov-based models
Q & A and Thank You!
Classification Model: Hidden Markov

Emission probabilities per time step (t1-t5) for the example sentence, with each tag firing at exactly one step:

         t1     t2      t3      t4      t5
<DET>    1(1)   0       0       0       0
<N>      0      1(.85)  0       0       0
<V>      0      0       1(.90)  0       0
<ADJ>    0      0       0       1(.47)  0
<ADV>    0      0       0       0       1(.55)

Product of the step probabilities: .19
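Reading the non-zero entries off the table above, the .19 can be reproduced directly (a minimal arithmetic check, not part of the original model code; the product is about 0.198, truncated to .19 on the slide):

```python
# Per-step probabilities read off the emission table for
# "The Movie is very good." tagged <DET> <N> <V> <ADJ> <ADV>.
step_probs = [1.0, 0.85, 0.90, 0.47, 0.55]

# The joint probability of the tagged sentence is the product
# of the per-step probabilities.
p = 1.0
for prob in step_probs:
    p *= prob
```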
Classification Model: Hidden Markov (contd.)

A hidden Markov model is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states.

Example sentence: "The Movie is very good." STOP
Tags: <DET> <N> <V> <ADJ> <ADV>

Transition Matrix:

         <DET>  <N>  <V>  <ADJ>  <ADV>
<DET>    0      1    0    0      0
<N>      0      0    1    0      0
<V>      0      0    0    1      0
<ADJ>    0      0    0    0      1
<ADV>    0      0    0    0      0

Associate some sentiment value with this tag order.
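A minimal sketch of scoring a tag sequence with this transition matrix, assuming a first-order Markov chain and keeping only the non-zero entries (the function and dictionary names are hypothetical):

```python
# Transition probabilities reconstructed from the matrix above
# (keys are (current tag, next tag); omitted pairs have probability 0).
trans = {
    ("<DET>", "<N>"): 1.0,
    ("<N>", "<V>"): 1.0,
    ("<V>", "<ADJ>"): 1.0,
    ("<ADJ>", "<ADV>"): 1.0,
}

def sequence_probability(tag_seq):
    """Probability of a tag sequence as the product of its
    pairwise transition probabilities."""
    p = 1.0
    for prev, cur in zip(tag_seq, tag_seq[1:]):
        p *= trans.get((prev, cur), 0.0)
    return p
```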