Yue Lu, Huizhong Duan, Hongning Wang, ChengXiang Zhai, University of Illinois at Urbana-Champaign, August 24, COLING 2010, Beijing, China
Slide1
Exploiting Structured Ontology to Organize Scattered Online Opinions
Yue Lu, Huizhong Duan, Hongning Wang, ChengXiang Zhai
University of Illinois at Urbana-Champaign
August 24, COLING’2010 Beijing, China
Slide2
Online Opinions: Valuable Resource
Need to organize them in a meaningful way!
…
Slide3
Aspect Summarization
Childhood
Barack Obama is an African American whose father was born in Kenya and got a scholarship to study in America. Born in Honolulu, Hawaii, to Barack Hussein Obama Sr., a Kenyan, and Kansas-born Ann Dunham.
President
Campaign
The Obama campaign’s use of new media technologies to revitalize political activism among youth, engage the public at large, and raise enormous, record-breaking sums of money was unlike that of any political campaign to date.
Health Care Reform
Several months after the landmark healthcare bill was passed, America's faith in healthcare increases dramatically.
For health insurance brokers, the new health care reform legislation has created uncertainty of …
What are “good aspects”?
1. Concise
2. Relevant to topic
3. Captures major opinions
4. Reasonable order
Slide4
Existing Work
What are “good aspects”?
1. Concise
2. Relevant to topic
3. Captures major opinions
4. Reasonable order
Clustering + Phrase Selection
NA
[Chen & Dumais 2000]
Our idea: use structured ontology
Slide5
Why Use Ontology?
What are “good aspects”?
1. Concise
2. Relevant to topic
3. Captures major opinions
4. Reasonable order
Ontology-based
In addition:
Great coverage
12 million entities, e.g. person, place, or thing
Consistently growing
Anyone can contribute data
Clustering-based
NA
Slide6
Problem Definition
Topic = “Abraham Lincoln”
Ontology
(>50 aspects)
Professions
Quotations
Parents
…
Date of Birth
Place of Death
Professions
Online Opinion Sentences
…
Selected Subset of Aspects
Selected Matching Opinions
Ordered to optimize readability
Date of Birth
Books written
Place of Death
Place of Birth
Children
Output
Spouse
Two Main Tasks:
- Aspect Selection
- Aspect Ordering
Slide7
Aspect Selection: Task Definition
What are “good aspects”?
3. Captures major opinions
…
Professions
KL-divergence retrieval model
Query:
Collection:
Aligned relevant opinions
Professions
Parents
…
…
…
Task: Select a subset of K aspects
Slide8
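The alignment step in the task definition (aspect as query, opinion sentences as collection, KL-divergence retrieval model) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `align_opinions`, the whitespace tokenizer, the Dirichlet-style smoothing, and the parameters `top_n` and `mu` are all assumptions. With a maximum-likelihood query model, ranking by negative KL divergence is equivalent to ranking by the sum of p(w|query) log p(w|sentence), which is what the sketch computes.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def align_opinions(aspect_query, sentences, top_n=2, mu=10.0):
    """Return indices of the top_n sentences for an aspect query.

    Scores are rank-equivalent to negative KL(query model || sentence model).
    """
    docs = [Counter(tokenize(s)) for s in sentences]
    collection = Counter()
    for d in docs:
        collection.update(d)
    coll_len = sum(collection.values())
    vocab = len(collection)
    query = Counter(tokenize(aspect_query))
    q_len = sum(query.values())

    def score(doc):
        d_len = sum(doc.values())
        total = 0.0
        for w, qc in query.items():
            # Add-one smoothed collection model, so unseen words never give log(0).
            p_wc = (collection[w] + 1) / (coll_len + vocab + 1)
            # Sentence model smoothed toward the collection model.
            p_wd = (doc[w] + mu * p_wc) / (d_len + mu)
            total += (qc / q_len) * math.log(p_wd)
        return total

    ranked = sorted(range(len(sentences)), key=lambda i: score(docs[i]), reverse=True)
    return ranked[:top_n]


sentences = [
    "he worked as a lawyer and politician",
    "his parents were thomas and nancy lincoln",
    "great camera zoom",
]
print(align_opinions("parents", sentences, top_n=1))
```

In practice the aspect query would probably be built from the aspect label plus its ontology values, and a relevance threshold would decide how many sentences count as aligned; both choices are left open here.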
Aspect Selection: Methods (1) (2)
Size-based
- Size = number of aligned relevant opinions
- Select K aspects of largest size
Opinion Coverage-based
- Reduce redundancy, maximum coverage
- Select K aspects sequentially (max cover problem)
Professions: 1, 2, 3, … (Size=800)
Position: 4, 5, 3, … (Size=600)
Parents: 4, 5, 6, … (Size=500)
Slide9
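The two selection methods above (size-based and opinion coverage-based) can be sketched as below. This is an illustrative sketch under stated assumptions: `aligned` is a hypothetical dictionary mapping each aspect to the set of opinion ids aligned with it, and the coverage variant uses the standard greedy heuristic for maximum coverage, which the slide only describes as selecting aspects sequentially.

```python
def select_by_size(aligned, k):
    """Size-based: keep the k aspects with the most aligned opinions."""
    return sorted(aligned, key=lambda a: len(aligned[a]), reverse=True)[:k]


def select_by_coverage(aligned, k):
    """Coverage-based: greedily add the aspect covering the most new opinions."""
    covered = set()
    chosen = []
    remaining = dict(aligned)
    while remaining and len(chosen) < k:
        # Marginal gain = opinions of this aspect not yet covered.
        best = max(remaining, key=lambda a: len(remaining[a] - covered))
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


# Toy data (hypothetical opinion ids, not from the experiments).
aligned = {
    "Professions": {1, 2, 3, 4},
    "Position": {1, 2, 3},
    "Parents": {5, 6},
}
print(select_by_size(aligned, 2))      # picks the two largest aspects
print(select_by_coverage(aligned, 2))  # skips the redundant aspect
```

On this toy input the size-based method keeps two heavily overlapping aspects, while the coverage-based method replaces the redundant one with an aspect that adds new opinions.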
Aspect Selection: Method (3)
Conditional Entropy-based
Professions
…
…
Collection:
Clustering, e.g. K-means
C1
C2
C3
…
…
…
Parents
Position
…
…
Clusters: C
Aspect Subset: A
Aspects: A1, A2, A3
$$A = \arg\min H(C \mid A) = \arg\min \left[ -\sum_{i} p(A_i, C_i) \log \frac{p(A_i, C_i)}{p(A_i)} \right]$$
Use a greedy algorithm to approximate the solution
Slide10
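A greedy approximation of the conditional entropy criterion might look like the sketch below. Several points are assumptions rather than details from the slides: probabilities are estimated from counts of opinion sentences shared by an aspect and a cluster, the sum runs over all (aspect, cluster) pairs, and the greedy step adds whichever aspect yields the lowest H(C|A) for the enlarged subset. The names `aligned` and `cluster_of` are hypothetical.

```python
import math


def conditional_entropy(subset, aligned, cluster_of):
    """H(C|A) estimated from counts of sentences per (aspect, cluster) pair."""
    joint = {}
    for a in subset:
        for s in aligned[a]:
            key = (a, cluster_of[s])
            joint[key] = joint.get(key, 0) + 1
    total = sum(joint.values())
    if total == 0:
        return 0.0
    per_aspect = {}
    for (a, _), n in joint.items():
        per_aspect[a] = per_aspect.get(a, 0) + n
    h = 0.0
    for (a, _), n in joint.items():
        # p(a, c) = n / total and p(a) = per_aspect[a] / total,
        # so p(a, c) / p(a) = n / per_aspect[a].
        h -= (n / total) * math.log(n / per_aspect[a])
    return h


def greedy_select(aligned, cluster_of, k):
    """Add, one at a time, the aspect that gives the lowest H(C|A)."""
    chosen = []
    candidates = sorted(aligned)  # sorted for deterministic tie-breaking
    while candidates and len(chosen) < k:
        best = min(
            candidates,
            key=lambda a: conditional_entropy(chosen + [a], aligned, cluster_of),
        )
        chosen.append(best)
        candidates.remove(best)
    return chosen


# Toy data: sentences 1-3 fall in cluster 0, sentences 4-6 in cluster 1.
cluster_of = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
aligned = {"Professions": {1, 2, 3}, "Position": {1, 2, 4, 5}, "Parents": {4, 5, 6}}
print(greedy_select(aligned, cluster_of, 2))
```

Aspects whose opinions sit inside a single cluster contribute zero entropy, so the sketch prefers them over an aspect that straddles clusters.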
Aspect Ordering: Task Definition
Date of Birth
Place of Death
Professions
Quotations
Date of Birth
Place of Death
Professions
Quotations
Ordered
Un-Ordered Aspect Subset
What are “good aspects”?
4. Reasonable order
Slide11
Aspect Ordering: Methods
Ontology Order
- Use the order in which aspects appear in the ontology
Coherence Order
- Follow the order of aligned opinions in their original articles (e.g. blog article, customer review)
Slide12
Aspect Ordering: Coherence Order
Original Articles
Date of Birth
Place of Death
A1
A2
Coherence(A1, A2): #(A1 is before A2)
Coherence(A2, A1): #(A2 is before A1)
…
So, Coherence(A2, A1) > Coherence(A1, A2)
$$\Pi(A) = \arg\max \sum_{A_i \text{ before } A_j} \mathrm{Coherence}(A_i, A_j)$$
Use a greedy algorithm to approximate the solution
Slide13
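The coherence ordering idea can be illustrated with the sketch below. It assumes each original article has been reduced to the sequence of aspect labels of its aligned opinion sentences, counts how often one aspect precedes another, and then builds an order greedily. The slides do not specify the greedy rule, so the one used here (repeatedly pick the aspect with the largest net "comes before" count against the remaining aspects) is an assumption, as are the function names.

```python
from collections import Counter
from itertools import combinations


def coherence_counts(articles):
    """Count, per ordered pair (a, b), how often a appears before b."""
    counts = Counter()
    for labels in articles:
        # combinations() keeps input order, so a always occurs before b.
        for a, b in combinations(labels, 2):
            if a != b:
                counts[(a, b)] += 1
    return counts


def greedy_order(aspects, counts):
    """Greedily place next the aspect that most often precedes the rest."""
    remaining = list(aspects)
    order = []
    while remaining:
        def gain(a):
            return sum(counts[(a, r)] - counts[(r, a)] for r in remaining if r != a)

        best = max(remaining, key=gain)
        order.append(best)
        remaining.remove(best)
    return order


# Toy articles (hypothetical aspect sequences).
articles = [
    ["Date of Birth", "Professions", "Place of Death"],
    ["Date of Birth", "Place of Death"],
    ["Professions", "Place of Death"],
]
counts = coherence_counts(articles)
print(greedy_order(["Place of Death", "Professions", "Date of Birth"], counts))
```

An exact solution would search over all permutations of the selected aspects, which is why a greedy approximation is attractive once K grows beyond a handful.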
Experiments: Data Sets
Ontology: Freebase
Opinions: Blog entries and CNET customer reviews
Statistics          US Presidents    Digital Cameras
# Topics            36               110
# Aspects/Topic     65±26            32±4
# Opinions/Topic    1001±1542        140±249
Slide14
Sample Results: Sony Cybershot DSC-W200
Freebase Aspects, sup, Representative Opinion Sentences
Format: Compact (sup 13)
- Quality pictures in a compact package.
- …amazing is that this is such a small and compact unit but packs so much power
Supported Storage Types: Memory Stick Duo (sup 11)
- This camera can use Memory Stick Pro Duo up to 8 GB
- Using a universal storage card and cable (c’mon Sony)
Sensor type: CCD (sup 10)
- I think the larger ccd makes a difference.
- but remember this is a small CCD in a compact point-and-shoot.
Digital zoom: 2X (sup 47)
- once the digital “smart” zoom kicks in you get another 3x of zoom.
- I would like a higher optical zoom, the W200 does a great digital zoom translation...
Slide15
Aspect Selection: Evaluation Measures
Aspect Coverage (AC) = 2/3
Aspect Precision (AP) = Jaccard similarity = 0.625
Average Aspect Precision (AAP) = 0.42
Aspects: Professions, Parents, Position (A1, A2, A3)
Clusters: C1, C2, C3
J(A1,C2)=1
J(A2,C2)=2/4
J(A3,C1)=2/4
AP=0.5
AP=0.75
AP=0
Slide16
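The Jaccard values and the coverage figure in the evaluation example can be reproduced with a small sketch. The reading of Aspect Coverage used here (the fraction of gold clusters that are the best Jaccard match of at least one selected aspect) is an assumption that happens to agree with the 2/3 in the example; the slides do not spell out the formula, and the precision measures are not reproduced because their exact definitions are not given. The sentence ids are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def aspect_coverage(aspect_sets, cluster_sets):
    """Assumed reading: share of clusters best-matched by some aspect."""
    matched = set()
    for members in aspect_sets.values():
        best = max(cluster_sets, key=lambda c: jaccard(members, cluster_sets[c]))
        if jaccard(members, cluster_sets[best]) > 0:
            matched.add(best)
    return len(matched) / len(cluster_sets)


# Hypothetical sentence ids chosen to mirror the Jaccard values in the example.
clusters = {"C1": {5, 6, 7}, "C2": {1, 2, 3}, "C3": {9, 10}}
aspects = {"A1": {1, 2, 3}, "A2": {2, 3, 4}, "A3": {6, 7, 8}}
print(jaccard(aspects["A1"], clusters["C2"]))
print(jaccard(aspects["A2"], clusters["C2"]))
print(aspect_coverage(aspects, clusters))
```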
Conditional Entropy-based method provides best trade-off for Aspect Selection
US Presidents
Methods       Aspect Coverage    Aspect Precision    Average Aspect Precision
Random        0.5140             0.0933              0.1223
Size-based    0.3108             0.1508              0.0949
Opin-Cover    0.5463             0.0913              0.1316
Cond Ent      0.5770             0.0856              0.1552
Digital Cameras
Methods       Aspect Coverage    Aspect Precision    Average Aspect Precision
Random        0.6554             0.0871              0.1271
Size-based    0.6071             0.1077              0.1340
Opin-Cover    0.6998             0.0914              0.1564
Cond Ent      0.7497             0.0789              0.1574
Slide17
Aspect Ordering: Human Labeling
Professions
Quotations
Parents
…
Cluster Constraints
Order Constraints
Parents
Spouse
Party
Positions
…
Date of Birth
Date of Death
Education
Positions
…
Aspect subset
size = K
Children
Spouse
Children
Date of Birth
Spouse
Human Agreement
X 3
X 3
X 3
Slide18
Aspect Ordering: Measures
Cluster Constraints
Parents
Spouse
Party
Positions
Children
Parents
Spouse
Parents
Children
Children
Spouse
Party
Positions
Cluster Precision = 0.5
Is this pair presented together in the output? 1, 0, 1, 0
Cluster Penalty = 1.25
# aspects placed between this pair in the output? 0, 2, 0, 3
Slide19
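The two cluster measures above reduce to simple arithmetic over the gap between each constrained pair in the output order. The sketch below assumes that Cluster Precision is the fraction of constraint pairs that end up adjacent and that Cluster Penalty is the mean number of aspects placed between the pair, which matches the example values 0.5 and 1.25. The output order and pairs are hypothetical, chosen only to give the per-pair gaps 0, 2, 0, 3.

```python
def cluster_measures(output_order, cluster_pairs):
    """Return (cluster precision, cluster penalty) for an output order."""
    pos = {aspect: i for i, aspect in enumerate(output_order)}
    # Number of aspects placed strictly between the two members of each pair.
    gaps = [abs(pos[a] - pos[b]) - 1 for a, b in cluster_pairs]
    precision = sum(1 for g in gaps if g == 0) / len(gaps)
    penalty = sum(gaps) / len(gaps)
    return precision, penalty


output_order = ["Parents", "Spouse", "Party", "Positions", "Children"]
cluster_pairs = [
    ("Parents", "Spouse"),     # adjacent, gap 0
    ("Parents", "Positions"),  # gap 2
    ("Party", "Positions"),    # adjacent, gap 0
    ("Parents", "Children"),   # gap 3
]
print(cluster_measures(output_order, cluster_pairs))
```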
Aspect Ordering: Evaluation Results
Measures:
Cluster Precision (higher is better)
Gold STD    Random Order    Ontology Order    Coherence Order
1           0.2540          0.9355            0.8978
2           0.2335          0.7758            0.8323
3           0.2523          0.4030            0.5545
union       0.3067          0.7268            0.7488
Cluster Penalty (lower is better)
Gold STD    Random Order    Ontology Order    Coherence Order
1           2.0656          0.2957            0.2016
2           2.1790          0.7530            0.5222
3           2.3079          2.1328            1.1611
union       1.9735          1.0720            0.7196
Slide20
Aspect Ordering: Evaluation Results
Order Precision (higher is better)
Gold STD    Random Order    Ontology Order    Coherence Order
1           0.5106          0.7111            0.5444
2           0.4759          0.6759            0.5093
3           0.5294          0.7143            0.8175
union       0.5006          0.6500            0.6833
Order Constraints
Date of Birth
Date of Death
Education
Positions
Spouse
Children
Is this order pair preserved in the output? 1, 0, 1
Order Precision = 0.67
Slide21
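Order Precision in the example above is the fraction of order constraint pairs whose relative order survives in the output. A minimal sketch follows; the output order is hypothetical and chosen only so that the per-pair outcomes are 1, 0, 1.

```python
def order_precision(output_order, order_pairs):
    """Fraction of (earlier, later) constraint pairs preserved in the output."""
    pos = {aspect: i for i, aspect in enumerate(output_order)}
    preserved = [pos[first] < pos[second] for first, second in order_pairs]
    return sum(preserved) / len(preserved)


order_pairs = [
    ("Date of Birth", "Date of Death"),
    ("Education", "Positions"),
    ("Spouse", "Children"),
]
output_order = [
    "Date of Birth",
    "Positions",
    "Education",
    "Date of Death",
    "Spouse",
    "Children",
]
print(round(order_precision(output_order, order_pairs), 2))
```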
Conclusions
Novel Problem: exploit ontology for structured organization of online opinions
- Aspect selection
- Aspect ordering
Evaluation: US presidents and digital cameras
- Conditional Entropy-based aspect selection
- Coherence ordering
Future Directions:
- New aspect suggestion for ontology
- Better alignment of opinion sentences and aspects
- Ontology + well-written articles
Slide22
Thank you!
& Questions?