Slide1
Opinion Mining & Sentiment Analysis
Navid Rekabsaz
rekabsaz@ifs.tuwien.ac.at
Slide2
Sentiment of banks’ reports towards risk in economy [6]
Slide3
Analysis of 350 occupations over Wikipedia articles, showing bias in the English language towards female/male factors.
Slide4
Overview
Sentiment analysis & opinion mining
Approaches
Supervised sentiment analysis
Crash course on Machine Learning
Data in lower dimensions
Slide5
Text Mining
[2]
Slide6
Natural Language Processing (NLP)
How machines can perceive human language
[2]
Slide7
Opinion Definition
An opinion is a subjective statement describing what a person believes or thinks about something [2]
Objective statement: states a fact (can be proven right or wrong)
Subjective statement: relates to the opinion holder; quantifying it requires aggregating the opinions of several individuals
Opinion holder
Opinion target
Depends on context, culture, background
Slide8
Opinion and Subjectivity
Opinion mining aims to quantify the subjectivity of an opinion.
A common form of subjectivity is sentiment, i.e. the feeling of the opinion holder regarding the opinion.
Subjectivity can also cover other concepts, e.g.:
Affection and appreciation
Perspective
Instability and risk
Agreement and disagreement
Political stance
Hedge and speculation
Slide9
Terminology
Similar terms with slightly different objectives:
Opinion mining
Sentiment analysis/mining
Subjectivity analysis
Affect analysis
Emotion analysis
Slide10
Why Opinion Mining?
To understand/aggregate opinions: humans as sensors!
Business intelligence and market research
Data-driven (computational) social science research
Application areas:
Business, sales, CRM
Finance and banking
Management science
Political science
Social science
[9]
Slide11
Challenges in NLP
Ambiguity:
“design” can be a noun or a verb (ambiguous POS)
“root” has multiple meanings (ambiguous sense)
Co-reference resolution:
“John persuaded Bill to buy a TV for himself.” (himself = John or Bill?)
Presupposition:
“He has quit smoking” implies that he smoked before.
Sarcasm:
“I’m so pleased mom woke me up with vacuuming this morning.”
Negation:
“I don’t like it very much”
Slide12
Challenges in Opinion Mining
Explicit opinion: a subjective statement
“the microphone works very good”
Implicit opinion: an objective statement that implies an opinion by expressing a desirable or undesirable fact.
“The earphone broke in two days.”
Direct opinion: directly expressed on the opinion target
“the microphone works very good”
Indirect opinion: expressed indirectly, based on its effect on other entities
“After taking the drug, my stomach felt worse.”
Slide13
Overview
Sentiment analysis & opinion mining
Approaches
Supervised sentiment analysis
Crash course on Machine Learning
Data in lower dimensions
Slide14
Example
“This past Saturday, I bought a Nokia phone and my girlfriend bought a Motorola phone with Bluetooth. We called each other when we got home. The voice on my phone was clear, better than my previous Samsung phone. The battery life was however short. My girlfriend was quite happy with her phone. I wanted a phone with good sound quality. So my purchase was a real disappointment. I returned the phone yesterday.”
[1]
Slide15
Levels of Analysis
Aspect level
Analyses each opinion target and its sentiment
Followed by opinion summarisation
Sentence level
Assumes one sentiment per sentence
Followed by opinion summarisation
Text (document) level
Assumes the whole text expresses one sentiment about one opinion target
Usually uses shallower NLP techniques
Can scale to bigger collections or longer texts
Slide16
Document-Level Sentiment Analysis
To analyse the text, we need to provide a (numerical) document representation.
Each document d is represented as a vector (w_1, w_2, ..., w_n) with n dimensions.
The dimensions or features usually correspond to the terms in the documents, but not necessarily (discussed later).
w_i is the weight (value) of dimension i for the document.
Document Representation
General representation of documents with n dimensions.
Bag-Of-Words Representation
A common document representation approach is to consider all the terms in the collection as features (dimensions).
The value w_i can be defined by any term weighting method:
number of occurrences of term i in doc d
term frequency (TF) of term i in doc d
TF-IDF scoring of term i in doc d
BM25 scoring of term i in doc d
Bag-Of-Words Representation
Bag-of-words representation of documents, with the terms in the collection as dimensions.
Problem Definition
Given the document representations, estimate/quantify their sentiments.
Possible sentiment (target) values:
[-1, 0, 1] i.e. [negative, neutral, positive]: classification problem
[-2, -1, 0, 1, 2] i.e. [very neg., neg., neutral, pos., very pos.]: either classification or regression problem
Decimal numbers, e.g. the price of a stock: regression problem
Slide21Unsupervised Sentiment Analysis
We don’t have any beforehand knowledge about the sentiment of the document (
no judged/annotated data
)
The unsupervised approach uses the representations
of
the documents together with a
lexicon
to estimate their sentiments.
Lexicon is a list of terms with
sentiment scores
or
terms categorized
in different
sentiment groups
(positive, negative, etc.).
Slide22
Sample Sentiment Lexicons
SentiWordNet: each term with positive and negative sentiment scores, e.g.:

Term      Neg. Score  Pos. Score
able      0           0.125
unable    0.75        0
emerging  0           0

Bing Liu’s opinion lexicon [3]: 2006 positive, 4783 negative words
Loughran-McDonald financial sentiment dictionary [4]:

Group        # of words  Sample terms
Negative     2337        discontinued, penalties, misconduct
Positive     353         achieve, efficient, profitable
Uncertainty  285         approximate, fluctuate, uncertain, variability
Slide23
Sentiment Estimation
Sentiment is calculated using the lexicon terms and their sentiment scores, e.g. as the sum of the term weights of the lexicon terms multiplied with their sentiment scores:
sentiment(d) = Σ_{t ∈ lexicon} w_{t,d} · score(t)
Negations (not, don’t, etc.) are usually handled by heuristics, e.g.: if one of the three predecessors of a word in the sentence is a negation word, the sentiment score is negated.
The unsupervised approach is good for primitive analysis and observations, but not very effective in practice.
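A minimal sketch of this lexicon-based scoring, including the three-predecessor negation heuristic; the tiny lexicon and negation list are illustrative assumptions:

```python
# Unsupervised sentiment: sum the lexicon scores of the terms in the text,
# flipping the sign when a negation word appears among the 3 predecessors.
NEGATIONS = {"not", "no", "never", "don't"}
LEXICON = {"good": 1.0, "great": 1.0, "clear": 0.5, "bad": -1.0, "broke": -1.0}

def lexicon_sentiment(text):
    tokens = text.lower().split()
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        term_score = LEXICON[tok]
        # negation heuristic: check the three preceding tokens
        if any(t in NEGATIONS for t in tokens[max(0, i - 3):i]):
            term_score = -term_score
        score += term_score
    return score
```

Here the term weight is simply 1 per occurrence; with weighted document vectors the occurrence would be multiplied by w_{t,d} instead.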
Overview
Sentiment analysis & opinion mining
Approaches
Supervised sentiment analysis
Crash course on Machine Learning
Data in lower dimensions
Slide25
Supervised Sentiment Analysis
Supervised sentiment analysis requires a set of documents with annotated sentiments.
From the annotated data, it first learns a model for estimating sentiments from representations.
Then, it uses the model to estimate/predict the sentiment values of new (non-annotated) documents.
Supervised Sentiment Analysis
Learn a model from annotated documents to estimate the sentiment of a new document.
Create ML Model
Predict
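The create-model/predict loop above can be sketched with a scikit-learn pipeline; the annotated documents and labels are invented for illustration:

```python
# Supervised sentiment: learn a model from annotated documents, then
# predict the sentiment (1 = positive, -1 = negative) of a new one.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_docs = [
    "the sound is clear and great",
    "battery life is great",
    "the phone broke, a real disappointment",
    "short battery life, very bad",
]
train_labels = [1, 1, -1, -1]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_docs, train_labels)                   # "Create ML Model"
prediction = model.predict(["great clear sound"])[0]  # "Predict"
```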
Slide27
Overview
Sentiment analysis & opinion mining
Approaches to sentiment analysis
Supervised sentiment analysis
Crash course on Machine Learning
Data in lower dimensions
Slide28
Machine Learning
We have a set of data records x_1, ..., x_N:
each x_i with p features: x_i = (x_i1, ..., x_ip)
and each corresponds to an output (label) y_i
Based on the observed data, we assume that there exists a function f that maps X to y such that:
y = f(X) + ε
[7]
Slide29
Machine Learning Model
f is a fixed but unknown function; ε is the irreducible error.
There are always some unknown factors that do not exist in the data.
It means that we can never completely infer y from X.
The aim of Machine Learning is to estimate f with a function f̂, i.e. so that ŷ = f̂(X) (the predicted output) is close to y (the real output).
The difference between f̂ and f is the reducible error.
We call f̂ a machine learning model.
Example
Predicting income with two features; the blue surface shows the fitted function. [7]
Slide31
Machine Learning Model
Some machine learning models:
Logistic Regression / Linear Regression
SVM
Naïve Bayes
Decision Trees
Neural networks
Slide32
Learning the model
Annotated records are split into:
Training set: for creating the model
Validation set (optional): for tuning the model’s hyper-parameters
Test set: used to evaluate the model’s performance
Assumption: the test set is similar to not-yet-observed data
Figure: the observed (annotated) data records, split into training set, validation set, and test set.
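The split above maps directly to scikit-learn's train_test_split; the ten toy records and the 80%/20% ratio are assumptions for illustration:

```python
# Split annotated records into a training set and a test set.
from sklearn.model_selection import train_test_split

X = [[i] for i in range(10)]   # 10 toy records, one feature each
y = [0, 1] * 5                 # toy labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```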
Slide33
Learning the model
Pstatus: parent's cohabitation status ('T' - living together, 'A' - apart)
Romantic: with a romantic relationship
Walc: weekend alcohol consumption (from 1 - very low to 5 - very high)
http://archive.ics.uci.edu/ml/datasets/STUDENT+ALCOHOL+CONSUMPTION#

sex  age  Pstatus  romantic  Walc
F    18   A        no        1
F    17   T        no        1
F    15   T        no        3
F    15   T        yes       1
F    16   T        no        2
M    16   T        no        2
M    16   T        no        1
F    17   A        no        1
M    15   A        no        1
M    15   T        no        1
F    15   T        no        2
F    15   T        no        1
M    15   T        no        3
M    15   T        no        2
M    15   A        yes       1
F    16   T        no        2
F    16   T        no        2
F    16   T        no        1
M    17   T        no        4

Dataset
Features / Variables: sex, age, Pstatus, romantic
Label / Output variable: Walc
Learning the model
Train Set: the first 13 records of the dataset
Test Set: the remaining 6 records
Slide35
Learning the model
The Walc labels of the test set are hidden (shown as '?'); the true labels (2, 1, 2, 2, 1, 4) are held out for evaluation.

Slide36
Learning the model
Train an ML model on the train set; the test records, with hidden labels, are then given to the model.
Slide37
Learning the model
Predict: the trained ML model predicts the test labels as 1, 1, 2, 2, 3, 4 (true labels: 2, 1, 2, 2, 1, 4).
Slide38
Learning the model
Performance evaluation: compare the predicted test labels (1, 1, 2, 2, 3, 4) with the true labels (2, 1, 2, 2, 1, 4).
Slide39
Evaluation
Some popular measures:
Classification: Accuracy, Precision, Recall, F-measure
Regression: Mean squared error
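The classification measures above can be computed with scikit-learn; the true and predicted labels here are invented for illustration:

```python
# Accuracy, precision, recall, and F-measure on a toy prediction.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]

acc = accuracy_score(y_true, y_pred)    # 4 of 6 predictions are correct
prec = precision_score(y_true, y_pred)  # 2 of 3 predicted positives are true
rec = recall_score(y_true, y_pred)      # 2 of 3 true positives are found
f1 = f1_score(y_true, y_pred)           # harmonic mean of precision and recall
```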
How to split train and test sets
Split data into training and test sets, e.g. train-test 80%-20% or 66%-33%.
The performance on the test set may vary highly depending on the selection of the test set.
Slide41
n-fold Cross Validation
n-fold cross validation is an alternative to a simple split, with more reliable evaluation results.
Split data into n folds of equal size (usually n=10)
Repeat n times:
Train a model using n-1 folds
Test the model on the left-out fold
The final performance is usually the average of the results over the folds.
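The procedure above is what scikit-learn's cross_val_score implements; here n=5 on toy one-feature data (both are assumptions to keep the example small):

```python
# n-fold cross validation: train on n-1 folds, test on the left-out fold,
# and average the fold results for the final performance.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X = [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

scores = cross_val_score(LogisticRegression(), X, y, cv=5)
mean_score = scores.mean()   # average over the 5 folds
```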
Slide42
Overfitting
When the model fits exactly to the train data: overfitting.
A good model is supposed to perform well on unobserved data.
ML models use regularization to provide a reasonable generalisation.
Wikipedia
Slide43
Overfitting: Learning Curve
From Wikipedia. Training error is shown in blue, test error in red. The x-axis shows various models or the number of training cycles.
Slide44
ML model example: Linear Regression
Linear regression formula:
y = β_0 + β_1·x_1 + β_2·x_2 + … + β_p·x_p
Using the train set, we can estimate a best fit for the coefficients β (the ML model).
Example: yellow surface, next page
ML model example : Linear Regression
[7]
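Estimating the coefficients from a train set can be sketched as follows; the toy data follow y = 2x + 3 exactly, so the fit should recover β_1 = 2 and β_0 = 3:

```python
# Linear regression: estimate a best fit for the coefficients.
from sklearn.linear_model import LinearRegression

X = [[0], [1], [2], [3]]
y = [3, 5, 7, 9]               # y = 2*x + 3

reg = LinearRegression().fit(X, y)
slope = reg.coef_[0]           # estimated beta_1
intercept = reg.intercept_     # estimated beta_0
```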
Overview
Sentiment analysis & opinion mining
Approaches
Supervised sentiment analysis
Crash course on Machine Learning
Data in lower dimensions
Slide47
Supervised Sentiment Analysis
When the features are all the terms in the collection, the feature space becomes extremely sparse (a lot of zeros) and high dimensional.
Curse of dimensionality: the amount of data does not suffice to support the sparsity in dimensionality.
It degrades the performance of ML models.
Feature Reduction
We need to reduce the dimensions/features of the data.
An important step in text mining tasks.
Dimensionality/feature reduction methods:
Feature selection: keep some important features and get rid of the rest!
Dimensionality reduction: project the data from high to lower dimensions
Dense representation (Embedding): learn a dense (low-dimension) representation of the document
Slide49
Feature Selection
Simple methods:
Remove stop words or very common words
TF-IDF does it in a “soft” way (why?)
Remove very rare words
They might be due to misspelling
Stemming
Advanced methods:
Lexicon-based
Keeping the features with high information content, using some measure of informativeness
Subset selection
Slide50
Feature Selection - Lexicon
Instead of all the collection terms, only use the lexicon terms.
Pros:
Considerably reduces noise, as most of the terms are uninformative
Focuses on domains by using a domain-specific lexicon, e.g. finance, health
Cons:
The same word may have various orientations:
“This camera sucks” and “This vacuum cleaner really sucks”
Sentences with sentiment words may express no sentiment:
“What is a good camera?”
Sentences without sentiment words may imply opinion:
“This washer uses a lot of water!”
Sarcastic sentences:
“What a great car! It stopped working in two days.”
Slide51
Feature Selection - Informativeness
Informativeness measures try to reveal how much a particular feature gives us information for predicting the label.
Feature selection: measure the informativeness of the features, then keep:
features with values higher than a threshold, or
a percentage of the most informative ones
Slide52
Feature Selection - Variance
Unsupervised feature selection: does not consider labels.
The variance σ² of the values of a feature (its term weights) over the data instances (documents).
Not appropriate for text processing. Why?
Text data is very sparse, so the variance values will be small and close to each other.
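Variance-based selection is available in scikit-learn as VarianceThreshold; the toy matrix and the threshold value are illustrative assumptions:

```python
# Drop features whose variance over the data instances is below a threshold.
from sklearn.feature_selection import VarianceThreshold

X = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 1],
]
selector = VarianceThreshold(threshold=0.1)
X_reduced = selector.fit_transform(X)   # the constant first column is dropped
```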
Feature Selection - Mutual Information
Supervised feature selection: considers labels.
Mutual Information:
MI(t, c) = log( P(c|t) / P(c) )
P(c|t): probability of label (class) c, given that the documents contain term t
P(c): probability of label (class) c
Term t has mutual information with class c when MI(t, c) ≠ 0.
To measure if a term (feature) is informative in general, we use either the max or the mean of MI(t, c) over the labels (classes).
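A direct sketch of this measure, estimating P(c|t) and P(c) from document counts; the toy counts are invented (a term occurring only in positive documents gets MI = log 2 when the classes are balanced):

```python
# MI(t, c) = log( P(c|t) / P(c) ), estimated from labeled documents.
import math

def mutual_information(labels_with_term, all_labels, label):
    """MI of a term with one class, given the labels of all documents
    and the labels of the documents that contain the term."""
    p_c = sum(1 for l in all_labels if l == label) / len(all_labels)
    p_c_given_t = (sum(1 for l in labels_with_term if l == label)
                   / len(labels_with_term))
    return math.log(p_c_given_t / p_c)

# 10 documents: 5 positive (1), 5 negative (-1);
# the term occurs in 4 documents, all of them positive.
all_labels = [1] * 5 + [-1] * 5
labels_with_term = [1, 1, 1, 1]

mi_pos = mutual_information(labels_with_term, all_labels, 1)  # log(1.0 / 0.5)
```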
Feature Selection – MI Example
(Worked example: a table of term occurrences per document with sentiment labels, used to compute the MI of each term.)
Feature Selection – Informativeness measures
Other supervised measures:
Gini Index
χ² statistics
Information Gain
Feature Selection – Subset selection
Finding a subset of features with reasonable evaluation performance on the training data
Not on the test data; that is cheating!
Subset selection algorithms:
Best Subset Selection
Forward Stepwise Selection
Backward Stepwise Selection
Slide57
Feature Selection – Best Subset
Let M_0 denote the null model, which contains no features.
For k = 1, ..., p:
Fit all models that contain exactly k features
Pick the best among these models, and call it M_k; here "best" is defined as having the smallest error on the training set
Select a single best model among M_0, ..., M_p using the validation set
Feature Selection – Forward Stepwise Selection
Let M_0 denote the null model, which contains no features.
For k = 0, ..., p-1:
Consider all models that augment the features in M_k with one additional feature
Choose the best among these models, and call it M_{k+1}; here "best" is defined as having the smallest error on the training set
Select a single best model among M_0, ..., M_p using the validation set
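Forward stepwise selection is implemented in scikit-learn as SequentialFeatureSelector (the same greedy loop of adding one best feature per step, though scored by cross-validation rather than plain training error); the synthetic data are an assumption:

```python
# Greedily add features one at a time, keeping the best model at each step.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = 3 * X[:, 0] + 2 * X[:, 2]   # only features 0 and 2 matter

selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=2, direction="forward")
selector.fit(X, y)
selected = selector.get_support()   # boolean mask of the chosen features
```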
Feature Selection – Subset selection
Best subset selection searches over all possible combinations:
Guarantees to find the best feature subset
Computationally expensive: fits 2^p models
Forward and backward stepwise selection:
Do not guarantee to find the optimal solution, but good in practice
Fit far fewer models (on the order of p²)
Dimensionality Reduction
Reducing dimensions by removing noise and keeping the informative data
Similar to lossy compression
Some popular methods:
Principal Component Analysis (PCA)
Latent Semantic Analysis (LSA)
Probabilistic Latent Semantic Analysis (pLSA)
Latent Dirichlet Allocation (LDA), designed for text processing
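LSA-style dimensionality reduction can be sketched with truncated SVD over bag-of-words vectors; the toy documents and the choice of 2 latent components are assumptions:

```python
# Project sparse, high-dimensional term vectors into 2 latent dimensions.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the voice on my phone was clear",
    "the battery life was short",
    "good sound quality on the phone",
    "short battery life, a disappointment",
]
X = CountVectorizer().fit_transform(docs)  # high-dimensional, sparse
lsa = TruncatedSVD(n_components=2, random_state=0)
X_low = lsa.fit_transform(X)               # dense, 2 dimensions per document
```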
Slide61
Representation Learning
Learn the dense, optimal representation of documents during the classification process, generally by deep learning and neural network methods [10][8]
Slide62
Praktikum/Masterarbeit
Financial news/reports sentiment analysis with deep learning
Financial news and also annual reports are valuable resources for predicting economic and financial measures.
The aim is to provide a better prediction of financial indexes using the text data in combination with market data, with a focus on deep learning methods.
Requirements:
Good programming as well as analytical skills
Keen to learn, code, and research
Interested in participating in challenges and publications
Good performance in the IR course
navid.rekabsaz@tuwien.ac.at

Industry news examples with sentiment scores:
Cuadrilla files to delay application to frack in Lancashire: -0.416
GSK aims to file up to 20 new drugs for approval by 2020: 0.422
Whitbread sales sink in fourth quarter on Costa slowdown: -0.562
Slide63
Further Reading
Bing Liu, Sentiment Analysis and Opinion Mining, now Publishers, 2012 (some slides are adapted from it)
Text Mining and Analytics course on coursera.org
Aggarwal, Charu C., and ChengXiang Zhai. Mining Text Data. Springer, 2012
Slide64
References
[1] Bing Liu. Sentiment Analysis and Opinion Mining. now Publishers, 2012.
[2] Text Mining and Analytics course on coursera.org.
[3] Minqing Hu and Bing Liu. Mining and summarizing customer reviews. KDD 2004.
[4] Loughran, Tim, and Bill McDonald. When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. The Journal of Finance 66.1 (2011): 35-65.
[5] Livia Polanyi and Annie Zaenen. Contextual valence shifters. In Computing Attitude and Affect in Text: Theory and Applications, volume 20 of The Information Retrieval Series, Springer, 2006.
[6] Nopp, Clemens, and Allan Hanbury. Detecting Risks in the Banking System by Sentiment Analysis. EMNLP 2015.
[7] James, Gareth, et al. An Introduction to Statistical Learning. Vol. 6. New York: Springer, 2013.
[8] Socher, Richard, et al. Recursive deep models for semantic compositionality over a sentiment treebank. EMNLP 2013.
[9] Navid Rekabsaz et al. Volatility Prediction using Financial Disclosures Sentiments with Word Embedding-based IR Models. ACL 2017.
[10] Tang, Duyu, Bing Qin, and Ting Liu. Document Modeling with Gated Recurrent Neural Network for Sentiment Classification. EMNLP 2015.