Presentation Transcript

Slide1

Opinion Mining & Sentiment Analysis

Navid Rekabsaz

rekabsaz@ifs.tuwien.ac.at

Slide2

Sentiment of banks’ reports towards risk in the economy [6]

Slide3

Analysis of 350 occupations over Wikipedia articles, showing the bias in the English language towards female/male factors.

Slide4

Overview

Sentiment analysis & opinion mining

Approaches

Supervised sentiment analysis

Crash course on Machine Learning

Data in lower dimensions

Slide5

Text Mining

[2]

Slide6

Natural Language Processing (NLP)

How machines can perceive human language

[2]

Slide7

Opinion Definition

Opinion is a subjective statement describing what a person believes or thinks about something [2]

Objective statement: states a fact (can be proved right or wrong)

Subjective statement: related to the sentiment holder and requires the aggregation of opinions of several individuals.

Opinion holder

Opinion target

Depends on context, culture, background

Slide8

Opinion and Subjectivity

Opinion mining aims to quantify the subjectivity of an opinion.

A common form of subjectivity is sentiment, i.e. the feeling of the opinion holder regarding the opinion.

Subjectivity can also cover other concepts, e.g.:

Affection and appreciation

Perspective

Instability and risk

Agreement and disagreement

Political stance

Hedge and speculation

Slide9

Terminology

Similar terms with slightly different objectives:

Opinion mining

Sentiment analysis/mining

Subjectivity analysis

Affect analysis

Emotion analysis

Slide10

Why Opinion Mining?

To understand/aggregate opinions: humans as sensors!

Business intelligence and market research

Data-driven (computational) social science research

Application areas:

Business, sales, CRM

Finance and banking

Management science

Political science

Social science

[9]

Slide11

Challenges in NLP

Ambiguity:

“design” can be a noun or a verb (ambiguous POS)

“root” has multiple meanings (ambiguous sense)

Co-reference resolution:

“John persuaded Bill to buy a TV for himself.” (himself = John or Bill?)

Presupposition:

“He has quit smoking” implies that he smoked before.

Sarcasm

“I’m so pleased mom woke me up with vacuuming this morning.”

Negation

“I don’t like it very much”

Slide12

Challenges in Opinion Mining

Explicit opinion: a subjective statement

“The microphone works very good.”

Implicit opinion: an objective statement that implies an opinion by expressing a desirable or undesirable fact.

“The earphone broke in two days.”

Direct opinion: directly expressed on the opinion target

“The microphone works very good.”

Indirect opinion: expressed indirectly based on its effect on other entities

“After taking the drug, my stomach felt worse.”

Slide13

Overview

Sentiment analysis & opinion mining

Approaches

Supervised sentiment analysis

Crash course on Machine Learning

Data in lower dimensions

Slide14

Example

“This past Saturday, I bought a Nokia phone and my girlfriend bought a Motorola phone with Bluetooth. We called each other when we got home. The voice on my phone was clear, better than my previous Samsung phone. The battery life was however short. My girlfriend was quite happy with her phone. I wanted a phone with good sound quality. So my purchase was a real disappointment. I returned the phone yesterday.”

[1]

Slide15

Levels of Analysis

Aspect level: analyses each opinion target and its sentiment; followed by opinion summarisation

Sentence level: assumes one sentiment per sentence; followed by opinion summarisation

Text (document) level: assumes the whole text expresses one sentiment about one opinion target; usually shallower NLP techniques; can scale to bigger collections or longer texts

Slide16

Document-Level Sentiment Analysis

To analyse the text, we need to provide a (numerical) document representation.

Each document d_i is represented as a vector with n dimensions: d_i = (w_i1, w_i2, ..., w_in).

The dimensions (features) usually correspond to the terms in the documents, but not necessarily (discussed later).

w_ij is the weight (value) of dimension j for document d_i.

 

Slide17

Document Representation

General representation of documents d_1, ..., d_m as vectors of weights w_ij over n dimensions.

 

Slide18

Bag-Of-Words Representation

A common document representation approach is to consider all the terms in the collection as features (dimensions).

The value w_ij can be defined by any term weighting method, e.g.:

number of occurrences of term j in document d_i

term frequency of term j in document d_i

TF-IDF score of term j in document d_i

BM25 score of term j in document d_i
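For illustration, a minimal sketch (using scikit-learn, not part of the original slides) that builds such bag-of-words vectors with raw counts and with TF-IDF weights; the example sentences are taken from the slides:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "the microphone works very good",
    "the earphone broke in two days",
    "the voice on my phone was clear",
]

# Raw term counts: one dimension per term in the collection.
counts = CountVectorizer().fit_transform(docs)

# TF-IDF: terms frequent in a document but rare in the collection get higher weights.
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)

print(tfidf.get_feature_names_out())  # the dimensions (terms)
print(X.toarray())                    # one weight vector per document
```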

 

Slide19

Bag-Of-Words Representation

Bag-of-words representation of documents, with all the terms in the collection as dimensions.

 

Slide20

Problem Definition

Given the document representations, estimate/quantify their sentiments.

Possible sentiment (target) values:

[-1, 0, 1] i.e. [negative, neutral, positive] → classification problem

[-2, -1, 0, 1, 2] i.e. [very neg., neg., neutral, pos., very pos.] → either classification or regression problem

Decimal numbers, e.g. the price of a stock → regression problem

Slide21

Unsupervised Sentiment Analysis

We don’t have any prior knowledge about the sentiment of the documents (no judged/annotated data).

The unsupervised approach uses the representations of the documents together with a lexicon to estimate their sentiments.

A lexicon is a list of terms with sentiment scores, or terms categorized into different sentiment groups (positive, negative, etc.).

Slide22

Sample Sentiment Lexicons

SentiWordNet: each term has positive and negative sentiment scores, e.g.:

Term | Neg. score | Pos. score
able | 0 | 0.125
unable | 0.75 | 0
emerging | 0 | 0

Bing Liu’s opinion lexicon [3]: 2006 positive and 4783 negative words

Loughran-McDonald financial sentiment dictionary [4]:

Group | # of words | Sample terms
Negative | 2337 | discontinued, penalties, misconduct
Positive | 353 | achieve, efficient, profitable
Uncertainty | 285 | approximate, fluctuate, uncertain, variability

Slide23

Sentiment Estimation

Sentiment is calculated using the lexicon terms and their sentiment scores, e.g. as the sum of the term weights of the lexicon terms, each multiplied by its sentiment score:

sentiment(d) = Σ_(t in lexicon) w_(d,t) · score(t)

Negations (not, don’t, etc.) are usually handled by heuristics, e.g.: if one of the three predecessors of a word in the sentence is a negation word, the sentiment score is negated.

The unsupervised approach is good for primitive analysis and observations, but not very effective in practice.
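As an illustration only (not from the slides), a minimal Python sketch of this lexicon-plus-negation heuristic, with a toy hand-made lexicon standing in for a real one such as Bing Liu's:

```python
# Toy lexicon; real lexicons (SentiWordNet, Bing Liu, Loughran-McDonald) are far larger.
LEXICON = {"good": 1.0, "clear": 1.0, "happy": 1.0,
           "short": -1.0, "disappointment": -1.0, "broke": -1.0}
NEGATIONS = {"not", "no", "never", "don't", "doesn't"}

def lexicon_sentiment(text):
    tokens = text.lower().split()
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            s = LEXICON[tok]
            # Heuristic from the slide: negate the score if one of the three
            # preceding words is a negation word.
            if any(prev in NEGATIONS for prev in tokens[max(0, i - 3):i]):
                s = -s
            score += s
    return score

print(lexicon_sentiment("the voice on my phone was clear"))           # > 0
print(lexicon_sentiment("so my purchase was a real disappointment"))  # < 0
```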

 

Slide24

Overview

Sentiment analysis & opinion mining

Approaches

Supervised sentiment analysis

Crash course on Machine Learning

Data in lower dimensions

Slide25

Supervised Sentiment Analysis

Supervised sentiment analysis requires a set of documents with annotated sentiments (each document representation paired with the document’s sentiment).

From the annotated data, it first learns a model for estimating sentiments from representations.

Then, it uses the model to estimate/predict the sentiment values of new (non-annotated) documents.
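A minimal sketch of this idea (an assumed setup, not the slides' own code): a scikit-learn pipeline that learns a TF-IDF plus logistic regression model from a few annotated sentences, then predicts the sentiment of an unseen one:

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny annotated sample for illustration; real training sets contain thousands of documents.
train_docs = [
    "the voice on my phone was clear",
    "my girlfriend was quite happy with her phone",
    "the battery life was however short",
    "so my purchase was a real disappointment",
]
train_labels = [1, 1, -1, -1]   # 1 = positive, -1 = negative

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_docs, train_labels)            # learn the model from annotated documents

print(model.predict(["the earphone broke in two days"]))   # predict a new document
```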

 

Slide26

Supervised Sentiment Analysis

(Diagram) Learn a model from annotated documents, then use it to predict the sentiment of a new document: annotated documents with sentiments → create ML model → predict.

Slide27

Overview

Sentiment analysis & opinion mining

Approaches to sentiment analysis

Supervised sentiment analysis

Crash course on Machine Learning

Data in lower dimensions

Slide28

Machine Learning

We have a set of n data records, each with p features: x_i = (x_i1, ..., x_ip), and each corresponds to an output (label) y_i.

Based on the observed data, we assume that there exists a function f that maps X to Y such that Y = f(X) + ε.

 

[7]

Slide29

Machine Learning Model

f is a fixed but unknown function; ε is the irreducible error.

There are always some unknown factors that are not captured in the data, which means we can never completely infer Y from X.

The aim of Machine Learning is to estimate f with a function f̂, i.e. the predicted output ŷ = f̂(x) should be close to the real output y.

The difference between ŷ and y is the reducible error.

We call f̂ a machine learning model.

 

Slide30

Example

(Figure) Predicting income from two features; the blue surface shows the function f. [7]

Slide31

Machine Learning Model

Some machine learning models:

Logistic Regression / Linear Regression

SVM

Naïve Bayes

Decision Trees

Neural networks

Slide32

Learning the model

Annotated records are split into:

Training set: for creating the model

Validation set (optional): for tuning the model’s hyper-parameters

Test set: used to evaluate the model’s performance (assumption: the test set is similar to not-yet-observed data)

(Diagram) Observed (annotated) data records split into a training part (training set + validation set) and a test part (test set).

Slide33

Learning the model

Dataset: Student Alcohol Consumption (http://archive.ics.uci.edu/ml/datasets/STUDENT+ALCOHOL+CONSUMPTION#)

Pstatus: parent's cohabitation status ('T' - living together, 'A' - apart)
romantic: in a romantic relationship
Walc: weekend alcohol consumption (from 1 - very low to 5 - very high)

Features / variables: sex, age, Pstatus, romantic
Label / output variable: Walc

sex | age | Pstatus | romantic | Walc
F | 18 | A | no | 1
F | 17 | T | no | 1
F | 15 | T | no | 3
F | 15 | T | yes | 1
F | 16 | T | no | 2
M | 16 | T | no | 2
M | 16 | T | no | 1
F | 17 | A | no | 1
M | 15 | A | no | 1
M | 15 | T | no | 1
F | 15 | T | no | 2
F | 15 | T | no | 1
M | 15 | T | no | 3
M | 15 | T | no | 2
M | 15 | A | yes | 1
F | 16 | T | no | 2
F | 16 | T | no | 2
F | 16 | T | no | 1
M | 17 | T | no | 4

Slide34

Learning the model

(Diagram) The dataset is split: the first 13 records form the train set, the remaining 6 records (M 15 T no 2, M 15 A yes 1, F 16 T no 2, F 16 T no 2, F 16 T no 1, M 17 T no 4) form the test set.

Slide35

Learning the model

(Diagram) The Walc labels of the test set are hidden from the model (true values: 2, 1, 2, 2, 1, 4); the model has to predict them.

Slide36

Learning the model

(Diagram) An ML model is trained on the train set; the test set labels (2, 1, 2, 2, 1, 4) are held out.

Slide37

Learning the model

(Diagram) The trained ML model predicts the test set labels: 1, 1, 2, 2, 3, 4 (true values: 2, 1, 2, 2, 1, 4).

Slide38

Learning the model

(Diagram) The predicted labels (1, 1, 2, 2, 3, 4) are compared against the true test labels (2, 1, 2, 2, 1, 4) for performance evaluation.
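The whole walkthrough can be reproduced with a few lines of scikit-learn; this is an illustrative sketch (the model type, a decision tree, is an assumption and is not stated on the slides):

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# The 19 records from the slides (features: sex, age, Pstatus, romantic; label: Walc).
records = [
    ("F", 18, "A", "no", 1), ("F", 17, "T", "no", 1), ("F", 15, "T", "no", 3),
    ("F", 15, "T", "yes", 1), ("F", 16, "T", "no", 2), ("M", 16, "T", "no", 2),
    ("M", 16, "T", "no", 1), ("F", 17, "A", "no", 1), ("M", 15, "A", "no", 1),
    ("M", 15, "T", "no", 1), ("F", 15, "T", "no", 2), ("F", 15, "T", "no", 1),
    ("M", 15, "T", "no", 3), ("M", 15, "T", "no", 2), ("M", 15, "A", "yes", 1),
    ("F", 16, "T", "no", 2), ("F", 16, "T", "no", 2), ("F", 16, "T", "no", 1),
    ("M", 17, "T", "no", 4),
]
df = pd.DataFrame(records, columns=["sex", "age", "Pstatus", "romantic", "Walc"])

# One-hot encode the categorical features so the model receives numerical inputs.
X = pd.get_dummies(df[["sex", "age", "Pstatus", "romantic"]])
y = df["Walc"]

# Same split as on the slides: first 13 records for training, last 6 for testing.
X_train, y_train = X.iloc[:13], y.iloc[:13]
X_test, y_test = X.iloc[13:], y.iloc[13:]

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("predicted:", list(model.predict(X_test)))
print("true:     ", list(y_test))
```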

Slide39

Evaluation

Some popular measures

Classification: Accuracy, Precision, Recall, F-measure

Regression: Mean squared error
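For instance, evaluating the predictions from the previous walkthrough (a sketch using scikit-learn's metrics):

```python
from sklearn.metrics import accuracy_score, mean_squared_error

y_true = [2, 1, 2, 2, 1, 4]   # held-out Walc labels from the slides
y_pred = [1, 1, 2, 2, 3, 4]   # predictions shown on the slides

# Classification view: fraction of exactly correct predictions (4/6 ≈ 0.67).
print("accuracy:", accuracy_score(y_true, y_pred))

# Regression view: average squared difference (5/6 ≈ 0.83).
print("MSE:", mean_squared_error(y_true, y_pred))
```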

 

Slide40

How to split train and test sets

Split data into training and test sets

E.g. train-test 80%-20% or 66%-33%

The performance on the test set may vary greatly depending on the selection of the test set

Slide41

n-fold Cross Validation

n-fold cross validation is an alternative to a simple split, with more reliable evaluation results.

Split the data into n folds of equal size (usually n = 10)

Repeat n times:

Train a model using n-1 folds

Test the model on the left-out fold

The final performance is usually the average of the results over the folds
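A minimal scikit-learn sketch of 10-fold cross validation (the dataset and model here are placeholders; any (X, y) pair and estimator work the same way):

```python
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)   # stand-in dataset

# 10-fold CV: train on 9 folds, evaluate on the left-out fold, repeat 10 times.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10)

print(scores)          # one score per fold
print(scores.mean())   # final performance = average over the folds
```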

Slide42

Overfitting

When the model fits the training data exactly → overfitting

A good model is supposed to perform well on unobserved data

ML models use regularization to provide a reasonable generalisation

(Figure from Wikipedia)

Slide43

Overfitting: Learning Curve

Learning curve (from Wikipedia): training error is shown in blue, test error in red. The x-axis shows different models or the number of training cycles.

Slide44

ML model example : Linear Regression

Linear regression formula:

y = β_0 + β_1·x_1 + β_2·x_2 + ... + β_p·x_p

Using the train set, we can estimate a best fit for the coefficients β (the ML model).

Example: the yellow surface on the next page.
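A small sketch of estimating the coefficients by least squares with NumPy (toy data, for illustration only):

```python
import numpy as np

# Toy training data: 4 records with p = 2 features; in practice X and y come from the train set.
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5], [4.0, 3.0]])
y = np.array([3.1, 2.4, 4.6, 7.0])

# Add an intercept column and estimate (beta_0, beta_1, beta_2) by least squares.
X1 = np.hstack([np.ones((X.shape[0], 1)), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)

print("coefficients:", beta)
print("prediction for a new record:", np.array([1.0, 2.5, 1.0]) @ beta)
```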

 

Slide45

ML model example : Linear Regression

[7]

 

Slide46

Overview

Sentiment analysis & opinion mining

Approaches

Supervised sentiment analysis

Crash course on Machine Learning

Data in lower dimensions

Slide47

Supervised Sentiment Analysis

When the features are all the terms in the collection, the feature space becomes extremely sparse (a lot of zeros) and high dimensional.

Curse of dimensionality: the amount of data does not suffice to support such sparse, high-dimensional representations.

This degrades the performance of ML models.

Slide48

Feature Reduction

We need to reduce the dimensions/features of the data: an important step in text mining tasks.

Dimensionality/feature reduction methods:

Feature selection: keep some important features and get rid of the rest!

Dimensionality reduction: project the data from high to lower dimensions

Dense representation (embedding): learn a dense (low-dimensional) representation of the document

Slide49

Feature Selection

Simple methods

Remove stop words or very common words (TF-IDF does this in a “soft” way: why?)

Remove very rare words (they might be due to misspelling)

Stemming

Advanced methods

Lexicon-based

Keep the features with high information content, using some measure of informativeness

Subset selection
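The simple methods map directly onto vectorizer options in scikit-learn; a sketch with arbitrary example thresholds:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the microphone works very good",
    "the earphone broke in two days",
    "what a great car, it stopped working in two days",
]

# stop_words: drop very common English words;
# min_df=2: drop terms appearing in fewer than 2 documents (possible typos / very rare words);
# max_df=0.9: drop terms appearing in more than 90% of documents (too common to be informative).
vec = TfidfVectorizer(stop_words="english", min_df=2, max_df=0.9)
X = vec.fit_transform(docs)

print(vec.get_feature_names_out())   # the surviving features
```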

Slide50

Feature Selection - Lexicon

Instead of all the collection terms, only use the lexicon terms.

Pros:

Considerably reduces noise, as most of the terms are uninformative

Focuses on a domain by using a domain-specific lexicon, e.g. finance, health

Cons:

The same word may have different orientations: “This camera sucks” vs. “This vacuum cleaner really sucks”

Sentences with sentiment words may express no sentiment: “What is a good camera?”

Sentences without sentiment words may imply an opinion: “This washer uses a lot of water!”

Sarcastic sentences: “What a great car! It stopped working in two days.”

Slide51

Feature Selection - Informativeness

Informativeness measures try to reveal how much a particular feature gives us information for predicting the label.

Feature selection: measure the informativeness of the features, then keep either

features with values higher than a threshold, or

a percentage of the most informative ones

Slide52

Feature Selection - Variance

Unsupervised feature selection: does not consider labels.

The variance σ² of the values of a feature (term weights) across data instances (documents).

Not appropriate for text processing. Why?

Text data is very sparse, so the variance values will be small and close to each other.

 

Slide53

Feature Selection - Mutual Information

Supervised feature selection: considers labels.

Mutual Information compares P(c|t), the probability of label (class) c given that a document contains term t, with P(c), the probability of label (class) c.

Term t carries mutual information about class c when P(c|t) differs from P(c).

To measure whether a term (feature) is informative in general, we use either the max or the mean over the labels (classes).
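A small illustrative sketch using one common pointwise formulation, log P(c|t)/P(c); this exact formula is an assumption, since the slide's equation is not preserved in the transcript:

```python
import math

# Toy data: per document, does it contain the term, and what is its sentiment label?
contains_term = [1, 0, 1, 1, 0, 0, 1, 0]
labels        = [1, 0, 1, 1, 0, 1, 1, 0]   # 1 = positive, 0 = negative

def mutual_information(term_flags, labels, c):
    """log P(c | term) / P(c): how much observing the term shifts our belief in class c."""
    n = len(labels)
    p_c = sum(1 for y in labels if y == c) / n
    with_term = [y for f, y in zip(term_flags, labels) if f == 1]
    p_c_given_t = sum(1 for y in with_term if y == c) / len(with_term)
    return math.log(p_c_given_t / p_c) if p_c_given_t > 0 else float("-inf")

# Informativeness of the term overall: max (or mean) over the classes.
print(max(mutual_information(contains_term, labels, c) for c in set(labels)))
```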

 

Slide54

Feature Selection – MI Example

(Worked example: a small table of documents, each marked with whether a given term occurs and with its sentiment label, used to compute the term’s mutual information with each class.)

Slide55

Feature Selection – Informativeness measures

Other supervised measures:

Gini index

χ² statistics

Information gain

 

Slide56

Feature Selection – Subset selection

Finding a subset of features with reasonable evaluation performance on training data

Not on the test data: that would be cheating!

Subset selection algorithms:

Best Subset Selection

Forward Stepwise Selection

Backward Stepwise Selection

Slide57

Feature Selection – Best Subset

Let M_0 denote the null model, which contains no features.

For k = 1, 2, ..., p:

Fit all models that contain exactly k features

Pick the best among these models and call it M_k; here best is defined as having the smallest error on the training set

Select a single best model among M_0, ..., M_p using the validation set

 

Slide58

Feature Selection – Forward Step selection

Let M_0 denote the null model, which contains no features.

For k = 0, 1, ..., p-1:

Consider all models that augment the features in M_k with one additional feature

Choose the best among these models and call it M_(k+1); here best is defined as having the smallest error on the training set

Select a single best model among M_0, ..., M_p using the validation set
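A compact sketch of the forward stepwise loop (illustrative only; the dataset is a stand-in, and the final model selection on a validation set is omitted):

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)   # stand-in dataset with 10 features

def forward_stepwise(X, y, n_features):
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_features:
        # Try adding each remaining feature; keep the one with the smallest training error.
        errors = {}
        for j in remaining:
            cols = selected + [j]
            pred = LinearRegression().fit(X[:, cols], y).predict(X[:, cols])
            errors[j] = mean_squared_error(y, pred)
        best = min(errors, key=errors.get)
        selected.append(best)
        remaining.remove(best)
    return selected   # in practice, compare M_1..M_p on a validation set afterwards

print(forward_stepwise(X, y, n_features=3))   # indices of the chosen features
```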

 

Slide59

Feature Selection – Subset selection

Best subset selection searches over all possible feature combinations:

guaranteed to find the best feature subset

computationally expensive: fits 2^p models

Forward and backward stepwise selection:

do not guarantee to find the optimal solution, but work well in practice

fit 1 + p(p+1)/2 models

 

Slide60

Dimensionality Reduction

Reducing dimensions by removing noise and keeping informative data; similar to lossy compression.

Some popular methods:

Principal Component Analysis (PCA)

Latent Semantic Analysis (LSA)

Probabilistic Latent Semantic Analysis (pLSA)

Latent Dirichlet Allocation (LDA): designed for text processing
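As an example, LSA can be sketched as a truncated SVD on top of TF-IDF vectors (an assumed scikit-learn setup, not from the slides):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import make_pipeline

docs = [
    "the microphone works very good",
    "good sound quality but short battery life",
    "the earphone broke in two days",
    "what a great car, it stopped working in two days",
]

# Project sparse TF-IDF vectors down to 2 dense dimensions (LSA = truncated SVD of TF-IDF).
lsa = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2, random_state=0))
X_low = lsa.fit_transform(docs)

print(X_low.shape)   # (4, 2): each document now lives in a 2-dimensional space
```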

Slide61

Representation Learning

Learn the dense, optimal representation of documents during the classification process, generally with deep learning and neural network methods. [8] [10]

Slide62

Praktikum/Masterarbeit

Financial news/reports sentiment analysis with deep learning

Financial news and annual reports are valuable resources for predicting economic and financial measures.

The aim is to provide better prediction of financial indexes using the text data in combination with market data, with a focus on deep learning methods.

Requirements:

good programming as well as analytical skills

keen to learn, code, and research

interested in participating in challenges and publications

good performance in the IR course

Contact: navid.rekabsaz@tuwien.ac.at

Example industry news with sentiment scores:

“Cuadrilla files to delay application to frack in Lancashire”: -0.416

“GSK aims to file up to 20 new drugs for approval by 2020”: 0.422

“Whitbread sales sink in fourth quarter on Costa slowdown”: -0.562

Slide63

Further Reading

Bing Liu. Sentiment Analysis and Opinion Mining. now Publishers, 2012. (some slides are adapted from it)

Text Mining and Analytics course on coursera.org

Aggarwal, Charu C., and ChengXiang Zhai. Mining Text Data. Springer, 2012.

Slide64

References

[1] Bing Liu. Sentiment Analysis and Opinion Mining. now Publishers, 2012.

[2] Text Mining and Analytics course on coursera.org

[3] Minqing Hu and Bing Liu. Mining and summarizing customer reviews. KDD 2004.

[4] Loughran, Tim, and Bill McDonald. When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. The Journal of Finance 66.1 (2011): 35-65.

[5] Livia Polanyi and Annie Zaenen. Contextual valence shifters. In Computing Attitude and Affect in Text: Theory and Applications, volume 20 of The Information Retrieval Series, Springer, 2006.

[6] Nopp, Clemens, and Allan Hanbury. Detecting Risks in the Banking System by Sentiment Analysis. EMNLP 2015.

[7] James, Gareth, et al. An Introduction to Statistical Learning. Vol. 6. New York: Springer, 2013.

[8] Socher, Richard, et al. Recursive deep models for semantic compositionality over a sentiment treebank. EMNLP 2013.

[9] Navid Rekabsaz et al. Volatility Prediction using Financial Disclosures Sentiments with Word Embedding-based IR Models. ACL 2017.

[10] Tang, Duyu, Bing Qin, and Ting Liu. Document Modeling with Gated Recurrent Neural Network for Sentiment Classification. EMNLP 2015.