Uploaded On 2016-03-17

Opinion Mining and Topic Categorization with Novel Term Weighting

Roman Sergienko, Ph.D. student

Tatiana Gasanova, Ph.D. student

Ulm University, Germany

Shaknaz Akhmedova, Ph.D. student

Siberian State Aerospace University, Krasnoyarsk, Russia

Contents

Motivation
Databases
Text preprocessing methods
The novel term weighting method
Feature selection
Classification algorithms
Results of numerical experiments
Conclusions

Motivation

The goal of the work is to evaluate the competitiveness of the novel term weighting in comparison with the standard techniques for opinion mining and topic categorization.

The criteria are:
Macro F-measure for the test set
Computational time
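As a rough illustration of the first criterion, the sketch below computes a macro-averaged F-measure as the unweighted mean of per-class F1 scores. This definition is an assumption: the slides do not say which variant was used, and some evaluations instead combine macro-averaged precision and recall into a single F value. The function name `macro_f_measure` and the toy labels are hypothetical.

```python
from collections import Counter

def macro_f_measure(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (one common macro-F definition)."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class p, but it was wrong
            fn[t] += 1   # true class t was missed
    f_scores = []
    for c in classes:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        denom = precision + recall
        f_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f_scores) / len(f_scores)

# Toy example with three classes (0: negative, 1: neutral, 2: positive).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(round(macro_f_measure(y_true, y_pred), 3))
```

Because every class contributes equally to the mean, a small class that is classified poorly lowers the score as much as a large one would.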

Databases: DEFT’07 and DEFT’08

DEFT’07

Corpus   Train size  Test size  Vocabulary  Classes
Books    2074        1386       52507       0: negative, 1: neutral, 2: positive
Games    2537        1694       63144       0: negative, 1: neutral, 2: positive
Debates  17299       11533      59615       0: against, 1: for

DEFT’08

Corpus   Train size  Test size  Vocabulary  Classes
T1       15223       10596      202979      0: Sport, 1: Economy, 2: Art, 3: Television
T2       23550       15693      262400      0: France, 1: International, 2: Literature, 3: Science, 4: Society

The existing text preprocessing methods

Binary preprocessing
TF-IDF (Salton and Buckley, 1988)
Confident Weights (ConfWeight; Soucy and Mineau, 2005)
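The first two methods can be sketched in a few lines. This is a minimal illustration, not the setup used in the experiments: the TF-IDF variant shown (raw term frequency times log(N / df)) is only one of several common formulations and is an assumption here, and the function names and toy documents are hypothetical. ConfWeight is omitted because the slides give no details of it.

```python
import math
from collections import Counter

def binary_weights(doc_tokens, vocabulary):
    """Binary preprocessing: 1 if the word occurs in the document, else 0."""
    present = set(doc_tokens)
    return [1 if word in present else 0 for word in vocabulary]

def tf_idf_weights(docs_tokens, vocabulary):
    """TF-IDF with raw term frequency and idf = log(N / df) (assumed variant)."""
    n_docs = len(docs_tokens)
    df = Counter()                       # document frequency of each word
    for tokens in docs_tokens:
        df.update(set(tokens))
    rows = []
    for tokens in docs_tokens:
        tf = Counter(tokens)             # term frequency within this document
        rows.append([
            tf[word] * math.log(n_docs / df[word]) if df[word] else 0.0
            for word in vocabulary
        ])
    return rows

docs = [["good", "book", "good"], ["bad", "book"], ["good", "game"]]
vocab = sorted({w for d in docs for w in d})
print(vocab)
print(binary_weights(docs[0], vocab))
print(tf_idf_weights(docs, vocab)[0])
```

Both representations produce one attribute per vocabulary word, so the attribute count grows with the vocabulary size.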

The novel term weighting method

$L$ is the number of classes;
$n_i$ is the number of instances of the $i$-th class;
$N_{ji}$ is the number of occurrences of the $j$-th word in all instances of the $i$-th class;
$T_{ji} = N_{ji} / n_i$ is the relative frequency of the $j$-th word in the $i$-th class;
$R_j = \max_i T_{ji}$;
$S_j = \arg\max_i T_{ji}$ is the index of the class assigned to the $j$-th word.
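The quantities defined above can be computed from a labelled training set as in the following sketch. It covers only the relative frequencies, their maximum, and the assigned class; the final weight formula is not part of this transcript and is not reproduced here. The function name, the tie-breaking rule, and the toy data are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def class_statistics(docs_tokens, labels):
    """Return (T, R, S) with T[word][cls] = N_ji / n_i,
    R[word] = max_i T_ji and S[word] = argmax_i T_ji."""
    n = Counter(labels)                  # n_i: number of instances per class
    N = defaultdict(Counter)             # N_ji: occurrences of word j in class i
    for tokens, cls in zip(docs_tokens, labels):
        for word in tokens:
            N[word][cls] += 1
    classes = sorted(n)
    T = {w: {c: N[w][c] / n[c] for c in classes} for w in N}
    R = {w: max(T[w].values()) for w in T}
    # max() returns the first maximal element, so ties go to the
    # smallest class index (an assumption, not stated on the slide).
    S = {w: max(classes, key=lambda c: T[w][c]) for w in T}
    return T, R, S

docs = [["good", "book"], ["good", "good"], ["bad", "book"]]
labels = [1, 1, 0]
T, R, S = class_statistics(docs, labels)
print(T["good"], R["good"], S["good"])
```

Note that a relative frequency can exceed 1 when a word occurs several times per instance, since the count covers every occurrence rather than document presence.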

Feature selection

Calculate the relative frequency of each word in each class
For each word, choose the class with the maximum relative frequency
For each utterance to be classified, calculate the sum of the weights of the words assigned to each class
Number of attributes = number of classes
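The last two steps can be sketched as follows. The sketch assumes that a weight and an assigned class are already available for every training word; the dictionaries `weights` and `assigned_class`, their values, and the function name are hypothetical. It also assumes that repeated words contribute once per occurrence, which the slide does not specify.

```python
def class_sum_features(tokens, weights, assigned_class, n_classes):
    """Map a tokenized utterance to one attribute per class: the sum of the
    weights of its words that are assigned to that class."""
    features = [0.0] * n_classes
    for word in tokens:
        if word in weights:              # words unseen in training are skipped
            features[assigned_class[word]] += weights[word]
    return features

# Hypothetical weights and class assignments, for illustration only.
weights = {"good": 0.9, "bad": 0.8, "book": 0.1}
assigned_class = {"good": 1, "bad": 0, "book": 0}
features = class_sum_features(
    ["good", "book", "good", "unknown"], weights, assigned_class, n_classes=2
)
print(features)
```

Whatever the vocabulary size, each utterance is reduced to a vector whose length equals the number of classes, which is what keeps the classifier input small.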

Classification algorithms

k-nearest neighbors algorithm with distance weighting (k varied from 1 to 15);
kernel Bayes classifier with Laplace correction;
neural network with error backpropagation (standard settings in RapidMiner);
Rocchio classifier with different metrics and parameters;
support vector machine (SVM) generated and optimized with Co-Operation of Biology Related Algorithms (COBRA) (Akhmedova and Semenkin, 2013).
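To make the first item concrete, here is a minimal sketch of k-nearest neighbors with distance weighting. The weighting scheme (each neighbor votes with weight 1 / distance) and the Euclidean metric are assumptions, since the slides do not state which ones were used; the function name and toy data are hypothetical.

```python
import math
from collections import defaultdict

def knn_predict(train_x, train_y, query, k=3, eps=1e-12):
    """Distance-weighted k-NN: each of the k nearest neighbours votes for its
    class with weight 1 / distance (Euclidean distance assumed)."""
    neighbours = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    votes = defaultdict(float)
    for distance, label in neighbours:
        votes[label] += 1.0 / (distance + eps)   # eps avoids division by zero
    return max(votes, key=votes.get)

# Toy two-attribute vectors (one attribute per class, as in the feature step).
train_x = [[1.8, 0.1], [1.5, 0.3], [0.2, 1.1], [0.1, 0.9]]
train_y = [1, 1, 0, 0]
print(knn_predict(train_x, train_y, [1.0, 0.5], k=3))
```

Varying k over a range such as 1 to 15 then amounts to calling the function once per value and keeping the k with the best validation score.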

Computational effectiveness

[Figure panels: DEFT’07, DEFT’08]

The best values of F-measure

Problem  F-measure  The best known value  Term weighting method  Classification algorithm
Books    0.619      0.603                 The novel TW           SVM
Games    0.720      0.784                 ConfWeight             k-NN
Debates  0.714      0.720                 ConfWeight             SVM
T1       0.856      0.894                 The novel TW           SVM
T2       0.851      0.880                 The novel TW           SVM

Comparison of ConfWeight and the novel term weighting

Problem  ConfWeight  The novel TW  Difference
Books    0.588       0.619         +0.031
Games    0.720       0.712         -0.008
Debates  0.714       0.700         -0.014
T1       0.855       0.856         +0.001
T2       0.820       0.851         +0.031

Conclusions

The novel term weighting method gives similar or better classification quality than the ConfWeight method, but it requires the same amount of time as TF-IDF.