Supervised Spoken Document Summarization Based on Structured Support Vector Machine with Utterance Clusters as Hidden Variables
Author: Sz-rung Shiang, Hung-Yi Lee, Lin-shan Lee
Speaker: Sz-rung Shiang
National Taiwan University
Outline
- Introduction
  - Extractive summarization
  - Structured support vector machine
- Proposed method
  - Structured support vector machine with hidden variables
- Experiments
- Conclusion
Introduction
Introduction - Extractive Summarization
Extractive summarization:
- Select the indicative utterances.
- Concatenate the selected utterances to form a summary.
- The number of utterances selected as the summary is decided by a predefined ratio (e.g., 10% or 30%).
Document: Two food critics have eaten meat that was grown in a lab. It is the first time anyone has eaten artificial meat. The experiment is part of a project run by Google co-founder Sergey Brin. He invested over $380,000 in research for the burger.

Summary: It is the first time anyone has eaten artificial meat. The experiment is part of a project run by Google co-founder Sergey Brin.
Previously proposed method - SVM
In the previous work using support vector machines, summarization is taken as a binary classification problem: each utterance receives a score and a label from the binary SVM, and utterances are selected according to the rank of the score until the length reaches the constraint.

(Example figure, score / label per utterance:)
Utterance 1: -0.3 / -1
Utterance 2:  0.5 / +1
Utterance 3:  0.8 / +1
Utterance 4: -0.7 / -1
→ summary: Utterance 3, Utterance 2
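The selection step on this slide can be sketched as follows; `select_by_score` and its `ratio` parameter are illustrative stand-ins, with the classifier's decision scores assumed to be already given.

```python
# Sketch of the selection step: rank utterances by the (assumed given)
# SVM decision scores and pick from the top until the summary reaches
# a predefined length ratio.

def select_by_score(utterances, scores, ratio=0.5):
    """Return indices of utterances chosen greedily by score rank."""
    budget = max(1, round(len(utterances) * ratio))
    ranked = sorted(range(len(utterances)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])  # keep document order in the summary

# The toy example from the slide: four utterances with binary-SVM scores.
utts = ["Utterance 1", "Utterance 2", "Utterance 3", "Utterance 4"]
scores = [-0.3, 0.5, 0.8, -0.7]
print(select_by_score(utts, scores))  # → [1, 2] (Utterance 2 and 3)
```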
Previously proposed method - SVM
However, even if we select the utterances with the highest scores, the result may not be the best summary:
- Similar utterances are prone to be selected at the same time.
- The selected utterances may not cover all the information in the document.
→ Add a "redundancy consideration" to the selection of the summary!
Previously proposed method - MMR
Maximal marginal relevance (MMR):
- Unsupervised
- Takes redundancy into consideration
- Objective function (for each utterance):

  MMR(x) = λ · Sim(x, d) − (1 − λ) · max_{x' ∈ S} Sim(x, x')

where:
- x: utterance; d: whole document; S: selected summary; Sim: similarity score
- Sim(x, d): importance score
- max_{x' ∈ S} Sim(x, x'): redundancy (similarity between the utterance and the selected summary)
- λ: predefined and fixed parameter
Previously proposed method – structured support vector machine
Combining the benefits of:
- MMR: redundancy consideration
- SVM: supervised learning
→ Structured Support Vector Machine
Previously proposed method – structured support vector machine
For a document d with 3 utterances, enumerate all the possible utterance subsets; each subset is a candidate summary. (Figure: all subsets, marking each utterance as in or not in the summary.)
Previously proposed method – structured support vector machine
Inspired by MMR, the structured SVM uses the importance of each utterance and the redundancy among utterances. The objective function:

  F(d, s) = Σ_{x_i ∈ s} Imp(x_i) − λ Σ_{x_i, x_j ∈ s} Sim(x_i, x_j), subject to a constraint on the length of the selected summary

- Imp(x_i): importance of an utterance
- Sim(x_i, x_j): redundancy, i.e. the similarity of selected utterance pairs
- λ: parameter to balance the two terms, jointly learned with the weights for the features

→ The utterance subset with the highest objective output is the automatically generated summary.
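The inference step can be sketched by brute-force enumeration over subsets, as in the enumeration figure; `imp`, `sim`, and `lam` below are hypothetical stand-ins for the learned importance scores, pairwise similarities, and balance parameter.

```python
# Brute-force sketch of the objective: enumerate every utterance subset
# within the length budget and keep the one maximizing
# importance minus lam * pairwise redundancy.
from itertools import combinations

def best_summary(n, imp, sim, lam, max_len):
    best, best_score = (), float("-inf")
    for k in range(1, max_len + 1):
        for subset in combinations(range(n), k):
            score = sum(imp[i] for i in subset)
            score -= lam * sum(sim[i][j] for i, j in combinations(subset, 2))
            if score > best_score:
                best, best_score = subset, score
    return best

# Toy example: utterances 0 and 1 are near-duplicates (sim 0.9), so the
# objective prefers the less redundant pair (0, 2).
imp = [0.8, 0.7, 0.6]
sim = [[0, 0.9, 0.1], [0.9, 0, 0.1], [0.1, 0.1, 0]]
print(best_summary(3, imp, sim, lam=1.0, max_len=2))  # → (0, 2)
```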
Proposed approach
Proposed method
In spontaneous speech:
- Consecutive utterances are more likely to be selected together in a long summary.
- A single utterance can be selected on behalf of a whole paragraph in a short summary.
(Figure: important vs. unimportant clusters, with one utterance as the representative of its cluster.)
Proposed method
To model this characteristic, we consider clusters of utterances:
- If the cluster information were known, we could exploit it directly when selecting utterances.
- However, clusters are not labeled in the corpus → hidden variables, jointly learned with the summary.
Proposed method
(Figure) In a spoken document d, the utterances are grouped into clusters h_k, h_{k+1}, … forming the cluster set H_d, and the summary is an utterance subset s_d = {…, x_{i−1}, x_{i+1}, x_{i+3}, …}, with each utterance marked as in or not in the summary.
Proposed method
For a document with 3 utterances, enumerate all the possible summaries (utterance subsets) and, for each summary, all the possible cluster sets. (Figure: enumeration over both summaries and cluster sets.)
Objective function
Based on the previous work using the structured SVM, we add clusters as hidden variables, so that the summary considers not only utterances but also clusters. The objective function:

  F(d, s_d, H_d) = F_SVM(d, s_d) + Σ_k F_1(s_d, h_k) + Σ_k F_2(h_k)

- F_SVM: objective function of the structured SVM
- F_1(s_d, h_k): relation between a cluster and the selected summary
- F_2(h_k): cluster quality
Proposed method
Each function is the inner product of (1) a weight vector (learned in the training process) and (2) a feature vector.
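A minimal sketch of how such an objective decomposes into weight-feature inner products; the feature functions `f0`, `f1`, `f2` and the weight vectors below are hypothetical placeholders, not the actual features of the work.

```python
# Sketch of the hidden-variable objective: every term is a dot product of
# a learned weight vector with a feature vector. f0/f1/f2 are stand-ins
# for the utterance, cluster-summary, and cluster-quality features.

def dot(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))

def objective(summary, clusters, w0, w1, w2, f0, f1, f2):
    score = sum(dot(w0, f0(x)) for x in summary)             # utterance importance
    score += sum(dot(w1, f1(summary, h)) for h in clusters)  # cluster-summary relation
    score += sum(dot(w2, f2(h)) for h in clusters)           # cluster quality
    return score
```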
Training & Testing
- In the training process, we find a set of weights such that the objective output using the reference summary and the oracle cluster set (explained below) is the maximum.
- In the testing process, the utterance subset that generates the maximal objective output is our generated summary.
Training Process
Oracle cluster set: given the reference summary labeled by humans (the answer of the training data), enumerate the cluster sets; the cluster set that maximizes the objective function is the oracle cluster set. (Figure: a document d with 3 utterances.)
Training Process
(Figure: objective outputs for all candidate (summary, cluster set) pairs, e.g. 0.9, 0.7, 0.6, …) The weights are trained so that the output for the reference summary with the oracle cluster set is higher than that of every other pair by a margin.
Training Process
Loss function: Δ(s, s*) = 1 − R(s, s*), where R(s, s*) is the ROUGE 1-F measure, s is the generated summary, and s* is the reference summary (labeled by humans).
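A minimal ROUGE-1 F sketch for reference, assuming whitespace tokenization; real evaluations use the official ROUGE toolkit rather than this simplification.

```python
# Simplified ROUGE-1 F: harmonic mean of unigram precision and recall
# between the generated and reference summaries.
from collections import Counter

def rouge1_f(generated, reference):
    g, r = Counter(generated.split()), Counter(reference.split())
    overlap = sum((g & r).values())   # clipped unigram matches
    if not overlap:
        return 0.0
    precision = overlap / sum(g.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)
```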
Testing Process
(Figure: objective outputs for all candidate (summary, cluster set) pairs, e.g. 0.9, 0.7, 0.6, …) The pair with the maximum output determines the generated summary.
Features
Features for an utterance - F0(xi)
- Semantic features (32): PLSA with 32 topics
- Similarity to the whole document (1): PLSA-based similarity score
- Prosodic features (60): pause (12), duration (15), pitch (20), energy (13)
Features for an utterance - F0(xi)
- Key-term related features (2): number of key terms in the utterance; number of key terms occurring for the first time in the document
- Utterance length (1): number of English words and Chinese characters
- Normalized position of the utterance (1): i/N for the i-th utterance in a document with N utterances
- Significance score (1): sum of TF-IDF in the utterance
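The simpler surface features above can be sketched as follows; `tfidf` is an assumed precomputed mapping from word to TF-IDF weight, and the feature names are illustrative.

```python
# Sketch of three surface features from the slide: utterance length,
# normalized position i/N, and the TF-IDF significance score.

def surface_features(utterance, index, n_utterances, tfidf):
    words = utterance.split()
    return {
        "length": len(words),                                   # utterance length
        "position": (index + 1) / n_utterances,                 # i/N
        "significance": sum(tfidf.get(w, 0.0) for w in words),  # TF-IDF sum
    }
```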
Features for the relation between cluster and summary - F1(sd, hk)
- Inclusion completeness (2): the ratio of the cluster's utterances included in the summary, and the purity of the cluster (whether its utterances are uniformly included or not included)
(Figure: utterances of a cluster marked as included or not included in the summary.)
Features for the relation between cluster and summary - F1(sd, hk)
- Consecutiveness (1): number of utterances included in the summary whose neighboring utterances are also included in the summary
(Figure example: per-utterance counts 0, 1, 1 → total 2.)
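The consecutiveness count can be sketched as below, assuming a boolean inclusion mask per utterance in document order.

```python
# Count utterances in the summary whose immediate neighbor (previous or
# next utterance) is also in the summary, matching the slide's example.
def consecutiveness(in_summary):
    n = len(in_summary)
    count = 0
    for i, inside in enumerate(in_summary):
        if not inside:
            continue
        prev_in = i > 0 and in_summary[i - 1]
        next_in = i < n - 1 and in_summary[i + 1]
        if prev_in or next_in:
            count += 1
    return count

# Slide example: one isolated selection (counts 0) plus two adjacent
# selections (counts 1 and 1) give a total of 2.
print(consecutiveness([True, False, True, True]))  # → 2
```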
Features for the quality of a cluster - F2(hk)
- Average of the PLSA-based similarity scores over all pairs of utterances within a cluster (1)
- PLSA-based similarity between a cluster and the document (1)
Experiments
Experimental Setup
- Corpus: a course offered at National Taiwan University
- Mandarin Chinese with embedded English words
- Single speaker, 45.2 hours
- ASR system accuracy: 88% for Chinese characters and English words
Experimental Setup
Spoken documents:
- The corpus is segmented into 193 documents; the average document length is about 17.5 minutes.
- Humans produced reference summaries for each document.
- Only 40 documents are used in this task: 4-fold cross-validation (30 for training, 10 for testing).
Experimental Result

(MMR is unsupervised; all the other systems are supervised.)

constraint | Evaluation Measure | MMR    | binary SVM | Structured SVM | Proposed (w/o inclusion completeness) | Proposed
10%        | ROUGE-1            | 0.3966 | 0.4117     | 0.4315         | 0.4363                                | 0.4406
10%        | ROUGE-2            | 0.1777 | 0.1761     | 0.2162         | 0.2329                                | 0.2208
10%        | ROUGE-L            | 0.3983 | 0.4057     | 0.4229         | 0.4285                                | 0.4333
30%        | ROUGE-1            | 0.5484 | 0.5372     | 0.5624         | 0.5628                                | 0.5657
30%        | ROUGE-2            | 0.3380 | 0.3354     | 0.3500         | 0.3688                                | 0.3627
30%        | ROUGE-L            | 0.5445 | 0.5335     | 0.5577         | 0.5591                                | 0.5616
Conclusion
- The performance of summarization can be improved by considering utterance clusters as document structure.
- We proposed a method to add utterance clusters as hidden variables to the structured SVM.
Q & A
Thanks for your attention!