Unifying Topic, Sentiment & Preference - PowerPoint Presentation

Uploaded by phoebe-click on 2017-09-10

in an HDP-Based Rating Regression Model for Online Reviews
Zheng Chen¹, Yong Zhang¹,², Yue Shang¹, Xiaohua Hu¹ (¹Drexel University, USA; ²China Central Normal University, China)



Presentation Transcript

Slide1

Unifying Topic, Sentiment & Preference in an HDP-Based Rating Regression Model for Online Reviews

Zheng Chen¹, Yong Zhang¹,², Yue Shang¹, Xiaohua Hu¹

¹Drexel University, USA
²China Central Normal University, China

Slide2

Overview

TSPRA

Prediction Model

Reviews with Ratings

Product Aspects

Word Sentiments

User-Aspect Preference

Rating Regression

Slide3

Overview

TSPRA

Prediction Model

Reviews with Ratings

Product Aspects

Word Sentiments

User-Aspect Preference

Rating Regression

Topic-Sentiment-Preference Regression Analysis

Slide4

Overview

TSPRA

Prediction Model

Reviews with Ratings

Product Aspects

Word Sentiments

User-Aspect Preference

Rating Regression

Slide5

Overview

TSPRA

Prediction Model

Reviews with Ratings

Product Aspects

Word Sentiments

User-Aspect Preference

Rating Regression

“topics”: a multinomial distribution over the vocabulary

a value ranging from negative to positive sentiment

a value representing how much a user cares about an aspect

a set of distributions inferred from the input review data

Slide6

Overview

TSPRA

Prediction Model

Reviews with Ratings

Product Aspects

Word Sentiments

User-Aspect Preference

Rating Regression

Aspect Sentiments

Critical Aspects

aggregated word sentiments

aspects with high user preference but low sentiment

Slide7

Overview

TSPRA

Prediction Model

Reviews with Ratings

Product Aspects

Word Sentiments

User-Aspect Preference

Reviews without Ratings

Rating Prediction

Rating Regression

Aspect Sentiments

Critical Aspects

The accuracy of rating prediction serves as an indicator of the model performance.

Slide8

Motivation

Consider preference and sentiment on a product aspect as independent variables.

JMARS (2014)

FLAME (2015)

user preference

sentiment

rating

user preference functions like topic-level sentiment

Slide9

Motivation

Consider preference and sentiment on a product aspect as independent variables.

user preference

sentiment

rating

In our model, preference and sentiment are designed as independent variables that co-determine the review rating.

 Slide10

Motivation

An automatic approach to building word sentiment resources. Review ratings tend to be a genuine reflection of the sentiments in the review text.

This is actually an important application of such a rating regression model. However, the previous publications cited in the paper do not demonstrate their sentiment results.

Employment of a non-parametric model, the Hierarchical Dirichlet Process (HDP).

Data and parameters together determine the number of topics.

Slide11

Model – HDP

The Chinese Restaurant Franchise Representation.

Document – Restaurant

Document-Level Cluster – Table
Word – Customer
Topic – Dish
Topic Prior – Franchise

The documents are viewed as being generated by the following process: every word of a document is generated from a topic associated with a document-level cluster. There is a chance that the cluster is new, or that the associated topic is new. Figuratively, when a customer comes into a restaurant, they either choose an existing table or sit at a new table. The table may order an existing dish or create a new dish from the franchise menu. Then the customer enjoys the dish.

The cluster indexes of words and the topic indexes of clusters form the state space. During Gibbs sampling, a word might choose a new cluster, and a cluster might change its associated topic. Thus we keep updating the two alternately.
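The table-choice step of the restaurant metaphor can be sketched as follows. This is a simplified, prior-only illustration (the actual CRF sampler also weights each table by the likelihood of the word under that table's dish), and the function name is ours:

```python
import random

def choose_table(table_counts, alpha):
    """Chinese Restaurant Process seating rule: an existing table k is
    chosen with probability n_k / (n + alpha); a new table is opened
    with probability alpha / (n + alpha)."""
    n = sum(table_counts)
    r = random.uniform(0, n + alpha)
    acc = 0.0
    for k, count in enumerate(table_counts):
        acc += count
        if r < acc:
            return k          # sit at existing table k
    return len(table_counts)  # open a new table
```

In the full franchise, the same rule is applied one level up: a new table chooses its dish from the franchise menu, where popular dishes are likewise favored.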

HDP can be viewed as an infinite version of LDA with a shared nonparametric prior.

[Plate diagram: global measure G0, per-document measures Gj, topic assignments z_di, over D documents.]

Slide12

Model – HDP

 

[The same plate diagram, labeled HDP.]

Slide13

Model – Main Design

(drawn from a binomial distribution): the user preference associated with the topic of each word in a doc, for the user who authored the doc.

It is dependent on the topic of the word and the author of the doc, so it is designed to be drawn from a binomial distribution indexed by topics and users.

There is one such binomial per topic-user pair, with their priors being Dirichlet distributions with a shared concentration parameter.

Although it is computed for each word, it depends only on topics and authors.

Slide14

Model – Main Design

(drawn from a 3d multinomial distribution), the word sentiment associated with the

the topic

of the

th

word

in doc

.

is dependent on the topic of the word

and the word

itself, so it is designed to be drawn from a multinomial distribution

indexed

by topics and words.

There are in total

such multinomial

s

, with their prior being Dirichlet distribution with concentration parameter

.

 Slide15

Model – Main Design

The rating of a word is computed based on a rule.

strong preference, negative sentiment: rating 1

Slide16

Model – Main Design

The rating of a word is computed based on a rule.

strong preference, positive sentiment: rating 5

Slide17

Model – Main Design

The rating of a word is computed based on a rule.

weak preference, negative sentiment: the middle value between 1 and the neutral rating.

weak preference, positive sentiment: the middle value between 5 and the neutral rating.

It is likely that .

 Slide18

Model – Main Design

, the rating of word

computed based on a rule.

 

Otherwise, the sentiment is neutral, and the rating should be neutral as well.

The reason for this association rule is to reduce the state space and computational complexity. Besides the table indexes and topic indexes that we need to update for HDP, we only add word ratings to the state space.
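The rule spelled out across these slides can be summarized in a small sketch (the numeric encodings and the function name are ours, not the paper's; the paper reports a neutral rating slightly above 3, so 3.0 here is just an example value):

```python
def word_rating(strong_preference, sentiment, neutral=3.0):
    """Word rating rule: sentiment is -1 (negative), 0 (neutral), +1 (positive);
    strong_preference is the binary user-preference draw."""
    if sentiment == 0:
        return neutral                         # neutral sentiment -> neutral rating
    if strong_preference:
        return 1.0 if sentiment < 0 else 5.0   # strong preference -> extreme rating
    # weak preference: the middle value between the extreme and the neutral rating
    return (1.0 + neutral) / 2 if sentiment < 0 else (5.0 + neutral) / 2
```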

 Slide19

Model – Main Design

Design: the review rating is modeled as the mean of non-neutral word ratings (i.e., the average of word ratings excluding neutral words) plus a rating noise term.

Run the Gibbs sampling sufficiently many times; then those multinomials and binomials, along with the observable words and ratings, can be viewed as a sample from the designed generative process.
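A minimal sketch of this aggregation (in the model the noise is a random term; here it is just an additive argument, and the all-neutral fallback is our assumption):

```python
def review_rating(word_ratings, neutral=3.0, noise=0.0):
    """Review rating = mean of non-neutral word ratings, plus noise.
    If every word rating is neutral, fall back to the neutral rating
    (this fallback is an assumption, not taken from the paper)."""
    non_neutral = [r for r in word_ratings if r != neutral]
    if not non_neutral:
        return neutral + noise
    return sum(non_neutral) / len(non_neutral) + noise
```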

 Slide20

Prediction Model

Results from inference are treated as known; the prediction process no longer updates them.

The rating becomes unknown and needs to be summed out during prediction.

We do not present the inference formulas here because they are lengthy; please refer to the original HDP papers and to our paper and presentations for details.

http://www.pages.drexel.edu/~zc86/

Slide21

Prediction Evaluation

Dataset

Compare with FLAME (WSDM 2015)

https://snap.stanford.edu/data/web-Amazon.html

Slide22

Prediction Evaluation

Comparison with FLAME: absolute error, correlation, inverted pairs

Slide23

Prediction Evaluation

Comparison with FLAME: absolute error, correlation, inverted pairs

Slide24

Prediction Evaluation

Comparison with FLAME: absolute error, correlation, inverted pairs
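The slides do not define these metrics; assuming the usual meaning of "inverted pairs", it could be computed as below (this is an assumed definition, not taken from the paper):

```python
def inverted_pairs(predicted, actual):
    """Fraction of item pairs ranked in the opposite order by the
    predictions relative to the actual ratings; pairs tied in the
    actual ratings carry no order information and are skipped."""
    inv, total = 0, 0
    n = len(predicted)
    for i in range(n):
        for j in range(i + 1, n):
            if actual[i] == actual[j]:
                continue
            total += 1
            if (predicted[i] - predicted[j]) * (actual[i] - actual[j]) < 0:
                inv += 1
    return inv / total if total else 0.0
```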

Slide25

Word Sentiments

 

The probability of a word being positive under a topic.

The probability of a word being negative under a topic.

Comparison with SenticNet3, a public general-purpose sentiment resource.

Slide26

Word Sentiments

Slide27

Word Sentiments

words that are quite neutral in a general context

Slide28

Critical Aspects

 

 

The probability of a topic being of high concern to a user.

The probability of a topic being positive.

The probability of a word being negative under a topic.

We also report a weak Pearson's correlation between user preferences and sentiments.
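One illustrative way to score critical aspects from these two quantities (the scoring function and names are ours; the paper's exact criterion is not reproduced here):

```python
def critical_aspects(preference, sentiment, top_n=3):
    """Rank aspects by high user preference combined with low positive
    sentiment. `preference[k]` is the probability the user cares about
    aspect k; `sentiment[k]` is the probability aspect k is positive."""
    scores = {k: preference[k] * (1.0 - sentiment[k]) for k in preference}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```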

 Slide29

Experiments with Parameters

The neutral rating is slightly above 3.

Generally, the same settings are good for all tested data sets.

Most people are rounding when rating.

 Slide30

Contribution Summary

Decoupling of “user preference” from sentiments.

Invention of “critical aspects”.

An approach to automatically generate sentiment resources for online reviews.

First attempt to make use of non-parametric topic models for online reviews.

Thank you!