A Comparison of Several Classifiers for Movie Review Sentiment Analysis

Uploaded On 2017-11-01




Presentation Transcript

Slide1

A Comparison of Several Classifiers for Movie Review Sentiment Analysis

Hilbert Locklear, Andreea Cotoranu, Md Ali, Aziz Altowayan, and Stephanie Houghton

Slide2

Agenda

Why Sentiment Analysis?
The Sentiment Analysis Problem
Project Goals
Data and Data Features
Classifiers and Results
Conclusions and Directions for Future Work

Slide3

Why Sentiment Analysis

Use of “sentiment determination as a service”
Availability of opinion content
The sentiment analysis process includes four main tasks: keyword identification, lexical interpretation, understanding of linguistic concepts, and statistical methods
Opinion Mining
Polarity Determination
Complicated Classification

Slide4

Problem: Classification of Movie Reviews

Our project was developed to classify the sentiment of movie reviews
Movie review classification is a BINARY CLASSIFICATION problem
Rating
Free Text
Unstructured Text
7 stars = Positive
4 stars = Negative

Slide5

Project Goals

Use a variety of machine learning techniques to determine which one provides the best classification accuracy
Achieve high accuracy -- less than 2% error

Slide6

Data Exploration and Feature Extraction

25,000 labelled reviews... 5.8 million words
No reviewer allowed more than 30 reviews
14 features per sample

Slide7

Development of Feature Vector

Selected 14 features
Features are centered on keyword identification
Features include unigram count, unigram ratio, and n-gram count
Classification requires “Positive” and “Negative” vectors which represent the mean feature values of positively and negatively classified vectors

Training Set Mean Positive Vector: 1, 234.5, 120.9, 113.6, 11.89, 21.7, 1.95, 0.41, 0.12, 0.01, 0.01, 0.001, 0.08, 0.002
Training Set Mean Negative Vector: 0, 232.2, 120.7, 111.4, 12.06, 21.32, 1.09, 1.44, 0.04, 0.1, 0.005, 0.007, 0.03, 0.01
*All 14 feature categorical vector values are averages

Slide8
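The mean vectors above are simply per-class averages of the training rows. A minimal sketch of that computation, using two toy feature values rather than the presentation's actual 14 features:

```python
import numpy as np

# Toy labeled rows: class label first (1 = positive, 0 = negative),
# then two illustrative feature values -- not the real training data.
train = np.array([
    [1, 230.0, 2.0],
    [1, 239.0, 1.9],
    [0, 231.0, 1.1],
    [0, 233.4, 1.0],
])

labels, feats = train[:, 0], train[:, 1:]
mean_pos = feats[labels == 1].mean(axis=0)  # mean "Positive" vector
mean_neg = feats[labels == 0].mean(axis=0)  # mean "Negative" vector
```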

Model Development Plan

Strong belief in the non-linearity of the problem
Related work shows the problem is non-linear
Show proof that the problem is non-linear
Show that linear approximation is poor

Classifier families evaluated: Distance, Similarity, Clustering, Probabilistic, Regression, Neural Network, Support Vector Machine

Slide9

Classification Model Distance Metric

Distance metric classifiers determine the distance between the mean training vectors and the test vector
For each test vector, compute Distance-P (to the mean positive train vector) and Distance-N (to the mean negative train vector); an applied threshold heuristic on the two numeric distances decides the class
Correct classification: True Positive, True Negative
Incorrect classification: False Positive, False Negative

Slide10
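A minimal sketch of this distance-metric rule, assuming Euclidean distance and a margin-style threshold heuristic (the slide does not specify either choice):

```python
import numpy as np

def classify_by_distance(test_vec, mean_pos, mean_neg, threshold=0.0):
    # Distance-P and Distance-N: Euclidean distance from the test vector
    # to the mean positive / mean negative training vectors.
    d_pos = np.linalg.norm(test_vec - mean_pos)
    d_neg = np.linalg.norm(test_vec - mean_neg)
    # Threshold heuristic (an assumption here): the margin between the two
    # distances must clear the threshold before calling it positive.
    return "positive" if d_neg - d_pos > threshold else "negative"

mean_pos = np.array([234.5, 1.95])  # values echo two of the slide's features
mean_neg = np.array([232.2, 1.09])
result = classify_by_distance(np.array([234.0, 1.80]), mean_pos, mean_neg)
```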

Classification Results Distance Metric

Results (Samples vs. Distance Metric): 78.10%, 54.3%

Slide11

Classification Model Similarity Measure

Similarity measure classifiers determine the feature-to-feature similarity between vectors
Vectors are numeric and categorical; categories are defined as LOW, MED, or HIGH
For each test vector, compute Similarity-P (to the mean positive train vector) and Similarity-N (to the mean negative train vector); a threshold decides the class
Correct classification: True Positive, True Negative
Incorrect classification: False Positive, False Negative
Cosine operates on the numeric vectors; its threshold is numeric
Jaccard operates on the categorical vectors; its threshold is a match count

Slide12
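A sketch of the two similarity measures as described, with the class chosen by the larger similarity; the vectors and LOW/MED/HIGH categories here are illustrative:

```python
import numpy as np

def cosine_similarity(a, b):
    # Numeric similarity; a numeric threshold can be applied to the result.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def jaccard_match_count(a_cats, b_cats):
    # Categorical similarity over LOW/MED/HIGH bins: count matching positions.
    return sum(x == y for x, y in zip(a_cats, b_cats))

test = np.array([1.0, 2.0, 3.0])
mean_pos = np.array([1.0, 2.0, 3.1])
mean_neg = np.array([3.0, 1.0, 0.5])
label = ("positive"
         if cosine_similarity(test, mean_pos) > cosine_similarity(test, mean_neg)
         else "negative")
matches = jaccard_match_count(["LOW", "MED", "HIGH"], ["LOW", "HIGH", "HIGH"])
```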

Classification Results Similarity Measure

Results (Samples vs. Similarity Measure): 79.00%

Slide13

Classification Model K-Nearest Neighbor

K-Nearest Neighbor uses Euclidean distance to determine which cluster to assign a vector
A green vector in the green cluster is a correct classification (TP/TN); a red vector in the green cluster is an incorrect classification (FP/FN)

Slide14
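A minimal k-nearest-neighbor sketch on toy 2-D vectors, assuming a majority vote among the k closest training points (an odd k avoids ties for two classes):

```python
import numpy as np

def knn_predict(test_vec, train_feats, train_labels, k=3):
    # Euclidean distance to every training vector, then majority vote
    # among the k nearest neighbours.
    dists = np.linalg.norm(train_feats - test_vec, axis=1)
    nearest = train_labels[np.argsort(dists)[:k]]
    return int(round(nearest.mean()))  # 1 = positive, 0 = negative

X = np.array([[0.10, 0.20], [0.20, 0.10], [0.90, 0.80], [0.80, 0.90], [0.85, 0.95]])
y = np.array([0, 0, 1, 1, 1])
```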

Classification Results K-Nearest Neighbor

Results (Samples, with normalization technique): 43.00%

Slide15

Classification Model Logistic Regression

Measures the relationship between the sentiment classifications of “positive” and “negative” by estimating the probabilities using the logistic function, which is the cumulative logistic distribution: σ(t) = 1 / (1 + e^(−t))
Good fit
Conduct a hypothesis test (Wald test) for all coefficients
2 coefficients are rejected: UWC and SMWC

Slide16
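The logistic function itself is easy to sketch; the coefficients below are illustrative, not the fitted ones from the presentation:

```python
import math

def logistic(t):
    # Cumulative logistic distribution: maps any real score into (0, 1).
    return 1.0 / (1.0 + math.exp(-t))

def predict_positive_proba(features, coefficients, intercept):
    # Linear score from the (here, made-up) coefficients,
    # squashed to a probability of the "positive" class.
    score = intercept + sum(c * f for c, f in zip(coefficients, features))
    return logistic(score)

p = predict_positive_proba([1.0, 2.0], [0.5, -0.25], 0.0)
```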

Classification Results Logistic Regression

Must accept a high false positive rate to achieve a high true positive rate
Only about 80% of the data points can be predicted

Slide17

Classification Model Naïve Bayes

 

 

 

Simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features: P(class | features) = P(features | class) × P(class) / P(features)
We can think of it in three parts:
Prior Probability
Posterior Probability
Likelihood

Slide18
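A minimal categorical naive Bayes sketch over LOW/MED/HIGH feature bins; the prior and likelihood tables are toy values, not the real training statistics:

```python
# P(class | x) is proportional to P(class) * product_i P(x_i | class),
# by the naive independence assumption.
def nb_classify(feature_cats, prior, likelihood):
    scores = {}
    for cls in prior:
        p = prior[cls]  # prior probability
        for i, cat in enumerate(feature_cats):
            p *= likelihood[cls][i].get(cat, 1e-6)  # likelihood (floor is an assumption)
        scores[cls] = p  # proportional to the posterior
    return max(scores, key=scores.get)

# Toy per-feature category likelihoods for two features.
prior = {"positive": 0.5, "negative": 0.5}
likelihood = {
    "positive": [{"HIGH": 0.7, "MED": 0.2, "LOW": 0.1},
                 {"HIGH": 0.6, "MED": 0.3, "LOW": 0.1}],
    "negative": [{"HIGH": 0.1, "MED": 0.2, "LOW": 0.7},
                 {"HIGH": 0.2, "MED": 0.3, "LOW": 0.5}],
}
```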

Classification Model Naïve Bayes

Positive Class
Negative Class
Probability that a feature occurs in the class determines the likelihood that a positive or negative sentiment is associated with that feature

 Slide19

Classification Results Naïve Bayes

Results (Samples, categorical vector): 66.58%

Slide20

Classification Model

Artificial Neural Network

Network diagram: the 14 feature inputs f1–f14 and a bias B1 feed a transfer function (TF) and an activation function (AF); a threshold on the output separates Positive from Negative

Slide21
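A single-neuron sketch of the TF, AF, and threshold pipeline in the diagram; the logistic activation and the weights are assumptions, since the slides do not specify them:

```python
import numpy as np

def ann_forward(features, weights, bias, threshold=0.5):
    # TF: transfer function -- weighted sum of the feature inputs plus bias B1.
    t = float(np.dot(weights, features) + bias)
    # AF: activation function (logistic here, an assumption).
    a = 1.0 / (1.0 + np.exp(-t))
    # Numeric threshold separates Positive from Negative.
    return "positive" if a > threshold else "negative"
```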


Classification Results Artificial Neural Network

Results (Samples vs. Neurons): 69.52%, 69.26%

Slide23

Classification Model Support Vector Machines

Non-probabilistic binary classifier
Constructs a hyperplane or set of hyperplanes in a 14-dimensional space
A good separation is achieved by the hyperplane that has the largest distance to the nearest training-data point of any class

Slide24

Classification Model SVM Kernels

 

 

Linear kernel, K(x, y) = x · y (faster)
Gaussian kernel, K(x, y) = exp(−‖x − y‖² / 2σ²) (more accurate)
Use the linear kernel when the number of features is larger than the number of observations.
Use the Gaussian kernel when the number of observations is larger than the number of features.

Slide25
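Both kernels are standard and can be sketched directly:

```python
import numpy as np

def linear_kernel(x, y):
    # Plain dot product -- cheaper to evaluate.
    return float(np.dot(x, y))

def gaussian_kernel(x, y, sigma=1.0):
    # RBF similarity: decays with the squared Euclidean distance.
    return float(np.exp(-np.linalg.norm(x - y) ** 2 / (2.0 * sigma ** 2)))
```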

Classification Results Support Vector Machine

Results (Samples vs. Kernels): 70.24%, 69.87%

Slide26

Classification Results Summary

Classifier accuracies ranged from 43.0% to 79.0%

Slide27

Conclusions

Findings:
Cosine Similarity provided the best results
Threshold heuristic proved valuable
Proof of non-linearity
Evidence that non-linear classification methods perform better

Expectations:
SVM to provide the best results: No (why not?)
Support for non-linearity: Yes
Support for poor linear approximations: Yes

Slide28

Conclusions (contd.)

Sentiment analysis is a nonlinear problem
Sentiment analysis does not allow for a good linear approximation
Word context may be the most important feature for sentiment classification

Slide29

Recommendations for Future Work

Neural Network-based Models
Word2Vec architecture: Continuous Bag-of-Words and Skip-Gram
Variations based on the Word2Vec architecture
Hidden Markov-based Models

Slide30

Q & A and Thank You!

Slide31

Classification Model Hidden Markov

Tag assignment per time step, with the probability of each assignment:

         t1      t2      t3      t4      t5
<DET>   1(1)     0       0       0       0
<N>      0     1(.85)    0       0       0
<V>      0       0     1(.90)    0       0
<ADJ>    0       0       0     1(.47)    0
<ADV>    0       0       0       0     1(.55)

Joint probability of the sequence: 1 × .85 × .90 × .47 × .55 ≈ .19

Slide32
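The trailing .19 on the slide appears to be the product of the per-step probabilities in the table; a quick check:

```python
# One active tag per time step, each with its probability from the table.
step_probs = [1.0, 0.85, 0.90, 0.47, 0.55]

path_prob = 1.0
for p in step_probs:
    path_prob *= p  # joint probability of the whole tag sequence
```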

Classification Model Hidden Markov

A hidden Markov model is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states.

Example sentence: “The movie is very good.” STOP

Transition Matrix (row = current tag, column = next tag):

        <DET>  <N>  <V>  <ADJ>  <ADV>
<DET>     0     1    0     0      0
<N>       0     0    1     0      0
<V>       0     0    0     1      0
<ADJ>     0     0    0     0      1
<ADV>     0     0    0     0      0

Associate some sentiment value with this order