Slide 1
Analyzing Argumentative Discourse Units in Online Interactions
Debanjan Ghosh, Smaranda Muresan, Nina Wacholder, Mark Aakhus and Matthew Mitsui
First Workshop on Argumentation Mining, ACL
June 26, 2014
Slide 2
But when we first tried the iPhone it felt natural immediately, we didn't have to 'unlearn' old habits from our antiquated
Nokias & Blackberrys. That happened because the iPhone is a truly great design.
That's very true. With the iPhone, the sweet goodness part of the UI is immediately apparent. After a minute or two, you're feeling empowered and comfortable. It's the weaknesses that take several days or weeks for you to really understanding and get frustrated by.
I disagree that the iPhone just "felt natural immediately"... In my opinion it feels restrictive and over simplified, sometimes to the point of frustration.
User1
User2
User3
when we first tried the iPhone it felt natural immediately,
That's very true. With the iPhone, the sweet goodness part of the UI is immediately apparent. After a minute or two, you're feeling empowered and comfortable.
I disagree that the iPhone just "felt natural immediately"… in my opinion it feels restrictive and over simplified, sometimes to the point of frustration.
Segmentation
Segment Classification
Relation Identification
Argumentative Discourse Units (ADU; Peldszus and Stede, 2013)
Slide 3
Annotation Challenges
- A complex annotation scheme seems infeasible
- The problem of high *cognitive load* (annotators have to read all the threads)
- High complexity demands two or more annotators
- Use of expert annotators for all tasks is costly
Slide 4
Our Approach: Two-tiered Annotation Scheme
- Coarse-grained annotation: expert annotators (EAs) annotate the entire thread
- Fine-grained annotation: novice annotators (Turkers) annotate only text labeled by EAs
Slide 5
Our Approach: Two-tiered Annotation Scheme
- Coarse-grained annotation: expert annotators (EAs) annotate the entire thread
- Fine-grained annotation: novice annotators (Turkers) annotate only text labeled by EAs
Slide 6
Coarse-grained Expert Annotation
Pragmatic Argumentation Theory (PAT; Van Eemeren et al., 1993) based annotation
[Diagram: a thread of posts (Post1-Post4) with a Target in one post linked to a Callout in a later post]
Slide 7
ADUs: Callout and Target
A Callout is a subsequent action that selects all or some part of a prior action (i.e., Target) and comments on it in some way. A Target is a part of a prior action that has been called out by a subsequent action.
Slide 8
But when we first tried the iPhone it felt natural immediately, we didn't have to 'unlearn' old habits from our antiquated
Nokias & Blackberrys. That happened because the iPhone is a truly great design.
That's very true. With the iPhone, the sweet goodness part of the UI is immediately apparent. After a minute or two, you're feeling empowered and comfortable. It's the weaknesses that take several days or weeks for you to really understanding and get frustrated by.
I disagree that the iPhone just "felt natural immediately"... In my opinion it feels restrictive and over simplified, sometimes to the point of frustration.
User1
User2
User3
when we first tried the iPhone it felt natural immediately,
That's very true. With the iPhone, the sweet goodness part of the UI is immediately apparent. After a minute or two, you're feeling empowered and comfortable.
I disagree that the iPhone just "felt natural immediately"… in my opinion it feels restrictive and over simplified, sometimes to the point of frustration.
Target
Callout
Callout
Slide 9
More on Expert Annotations and Corpus
- Five annotators were free to choose any text segment to represent an ADU
- Four blogs and their first one-hundred comment sections are used as our argumentative corpus:
  - Android (iPhone vs. Android phones)
  - iPad (usability of the iPad as a tablet)
  - Twitter (use of Twitter as a micro-blogging platform)
  - Job Layoffs (layoffs and outsourcing)
Slide 10
Inter Annotator Agreement (IAA) for Expert Annotations
Thread     F1_EM   F1_OM   Krippendorff's alpha
Android    54.4    87.8    0.64
iPad       51.2    86.0    0.73
Layoffs    51.9    87.5    0.87
Twitter    53.8    88.5    0.82

- P/R/F1-based IAA (Wiebe et al., 2005): exact match (EM) and overlap match (OM)
- Krippendorff's alpha (Krippendorff, 2004)
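The exact-match and overlap-match F1 scores above are computed span-by-span between pairs of annotators. Below is a minimal sketch of how such scores can be computed; it is not the authors' code, and the span offsets are invented for illustration.

```python
# Sketch of span-based agreement metrics in the style of Wiebe et al. (2005):
# treat one annotator's spans as the reference and score the other's spans
# under exact match (EM: identical offsets) or overlap match (OM: any overlap).

def f1_agreement(spans_a, spans_b, mode="exact"):
    """spans_*: lists of (start, end) character offsets from one annotator."""
    def hit(x, y):
        if mode == "exact":
            return x == y
        return x[0] < y[1] and y[0] < x[1]  # intervals share at least one char

    # Precision: fraction of annotator A's spans matched by some span of B.
    matched_a = sum(1 for a in spans_a if any(hit(a, b) for b in spans_b))
    precision = matched_a / len(spans_a) if spans_a else 0.0
    # Recall: fraction of annotator B's spans matched by some span of A.
    matched_b = sum(1 for b in spans_b if any(hit(b, a) for a in spans_a))
    recall = matched_b / len(spans_b) if spans_b else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

ann1 = [(0, 40), (55, 90)]
ann2 = [(0, 40), (60, 100), (120, 150)]
print(f1_agreement(ann1, ann2, "exact"))    # only (0, 40) matches exactly
print(f1_agreement(ann1, ann2, "overlap"))  # (55, 90) also overlaps (60, 100)
```

As in the table, overlap match is much more forgiving than exact match, which is why F1_OM runs far higher than F1_EM.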
Slide 11
Issues
- Different IAA metrics have different outcomes
- It is difficult to infer from IAA which segments of the text are easier or harder to annotate
Slide 12
Our solution: Hierarchical Clustering
We utilize a hierarchical clustering technique to cluster ADUs that are variants of the same Callout
Thread    # of Clusters    # of Expert Annotators/ADUs per cluster
                           5     4     3     2     1
Android   91               52    16    11    7     5
iPad      88               41    17    7     13    10
Layoffs   86               41    18    11    6     10
Twitter   84               44    17    14    4     5
- Clusters with 5 and 4 annotators show Callouts that are plausibly easier to identify
- Clusters selected by only one or two annotators are harder to identify
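The clustering step can be sketched as single-link grouping of the annotators' Callout spans by character overlap: spans that largely cover the same text end up in one cluster. This is an illustrative reconstruction, not the paper's implementation; the overlap criterion, threshold, and spans below are all assumptions.

```python
# Minimal sketch: group near-duplicate Callout spans via single-link
# clustering, joining two spans when their character-overlap ratio
# (intersection over covered range) exceeds a threshold.

def overlap_ratio(a, b):
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union else 0.0

def cluster_spans(spans, threshold=0.5):
    parent = list(range(len(spans)))  # union-find forest over span indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(spans)):
        for j in range(i + 1, len(spans)):
            if overlap_ratio(spans[i], spans[j]) >= threshold:
                parent[find(i)] = find(j)  # single link: any strong overlap merges

    clusters = {}
    for i in range(len(spans)):
        clusters.setdefault(find(i), []).append(spans[i])
    return list(clusters.values())

# Three annotators marked roughly the same Callout; one marked another region.
spans = [(10, 60), (12, 58), (8, 65), (200, 240)]
print(cluster_spans(spans))  # one cluster of three overlapping spans, one singleton
```

Under this view, a cluster containing spans from all five annotators corresponds to a Callout that was easy to identify, while a singleton cluster corresponds to a span only one annotator chose.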
Slide 13
Example of a Callout Cluster
Slide 14
Motivation for a finer-grained annotation
- What is the nature of the relation between a Callout and a Target?
- Can we identify finer-grained ADUs in a Callout?
Slide 15
Our Approach: Two-tiered Annotation Scheme
- Coarse-grained annotation: expert annotators (EAs) annotate the entire thread
- Fine-grained annotation: novice annotators (Turkers) annotate only text labeled by EAs
Slide 16
Novice Annotation: task 1
[Diagram: Targets (T) and Callouts (CO) linked across posts, each link labeled Agree/Disagree/Other]
This is related to research on agreement/disagreement annotation (Misra and Walker, 2013; Andreas et al., 2012).
Slide 17
But when we first tried the iPhone it felt natural immediately, we didn't have to 'unlearn' old habits from our antiquated
Nokias & Blackberrys. That happened because the iPhone is a truly great design.
That's very true. With the iPhone, the sweet goodness part of the UI is immediately apparent. After a minute or two, you're feeling empowered and comfortable. It's the weaknesses that take several days or weeks for you to really understanding and get frustrated by.
I disagree that the iPhone just "felt natural immediately"... In my opinion it feels restrictive and over simplified, sometimes to the point of frustration.
User1
User2
User3
when we first tried the iPhone it felt natural immediately,
That's very true. With the iPhone, the sweet goodness part of the UI is immediately apparent. After a minute or two, you're feeling empowered and comfortable.
I disagree that the iPhone just "felt natural immediately"… in my opinion it feels restrictive and over simplified, sometimes to the point of frustration.
Target
Callout
Callout
Slide 18
More on the Agree/Disagree Relation Label
- For each Target/Callout pair we employed five Turkers
- Fleiss' kappa shows moderate agreement between the Turkers
- 143 Agree / 153 Disagree / 50 Other data instances
- We ran preliminary experiments for predicting the relation label (rule-based, BoW, lexical features, ...)
- Best results (F1): 66.9% (Agree), 62.9% (Disagree)
Novice Annotation, task 2: Identifying Stance vs. Rationale
[Diagram: a Callout (CO) linked to its Target (T), split into Stance (S) and Rationale (R), with a difficulty rating]
This is related to the justification identification task (Biran and Rambow, 2011).
Slide 20
That's very true. With the iPhone, the sweet goodness part of the UI is immediately apparent. After a minute or two, you're feeling empowered and comfortable. It's the weaknesses that take several days or weeks for you to really understanding and get frustrated by.
I disagree that the iPhone just "felt natural immediately"... In my opinion it feels restrictive and over simplified, sometimes to the point of frustration.
User2
User3
Stance: "That's very true"; "I disagree that the iPhone just 'felt natural immediately'"
Rationale: the remainder of each Callout
Slide 21
Examples of Callout/Target pairs with difficulty level (majority voting)
1. Easy
   Target: the iPhone is a truly great design.
   Callout: I disagree too. some things they get right, some things they do not.
   Stance: I…too
   Rationale: Some things…do not
2. Moderate
   Target: the dedicated `Back' button
   Callout: that back button is key. navigation is actually much easier on the android.
   Stance: That back button is key
   Rationale: Navigation is…android
3. Difficult
   Target: It's more about the features and apps and Android seriously lacks on latter.
   Callout: Just because the iPhone has a huge amount of apps, doesn't mean they're all worth having.
   Stance: -
   Rationale: Just because the iPhone has a huge amount of apps, doesn't mean they're all worth having.
4. Too difficult/unsure
   Target: I feel like your comments about Nexus One is too positive …
   Callout: I feel like your poor grammar are to obvious to be self thought...
   Stance: -
   Rationale: -
Slide 22
Difficulty judgment (majority voting)
Difficulty               Number of Expert Annotators per cluster
                         5      4      3      2      1
Easy                     81.0   70.8   60.9   63.6   25.0
Moderate                 7.7    7.0    17.1   6.1    25.0
Difficult                5.9    5.9    7.3    9.1    12.5
Too difficult to code    5.4    16.4   14.6   21.2   37.5
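The difficulty label for each pair is decided by majority vote over the five Turker judgments. The sketch below shows one plausible way to do this; the tie-handling rule and the example votes are our assumptions, not something specified on the slides.

```python
# Majority voting over five crowd judgments for one Callout/Target pair.
from collections import Counter

def majority_label(votes):
    (label, count), *rest = Counter(votes).most_common(2)
    if rest and rest[0][1] == count:
        return "No majority"  # tie between top labels: leave unresolved
    return label

print(majority_label(["Easy", "Easy", "Moderate", "Easy", "Difficult"]))  # Easy
print(majority_label(["Easy", "Moderate", "Easy", "Moderate", "Other"]))  # No majority
```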
Slide 23
Conclusion
- We propose a two-tiered annotation scheme for argument annotation in online discussion forums
- Expert annotators detect Callout/Target pairs; crowdsourcing is employed to discover finer units such as Stance/Rationale
- Our study also helps identify text that is easy or hard to annotate
- Preliminary experiments predict agreement/disagreement among ADUs
Slide 24
Future Work
- Qualitative analysis of the Callout phenomenon to support finer-grained analysis
- Study how ADUs are used in different situations
- Annotate different domains (e.g., healthcare forums) and adjust our annotation scheme
- Predictive modeling of the Stance/Rationale phenomenon
Slide 25
Thank you
!
Slide 26
Example from the discussion thread
[Figure: User2's and User3's Callouts with Stance and Rationale segments highlighted]
Slide 27
Predicting the Agree/Disagree Relation Label
- Training data: 143 Agree / 153 Disagree
- Salient features for the experiments:
  - Baseline: rule-based (`agree', `disagree')
  - Mutual Information (MI): MI is used to select words to represent each category
  - LexFeat: lexical features based on sentiment lexicons (Hu and Liu, 2004), lexical overlaps, initial words of the Callouts, ...
- 10-fold CV using SVM
Slide 28
Predicting the Agree/Disagree Relation Label (preliminary results)
- Lexical features result in F1 scores between 60-70% for the Agree/Disagree relations
- Ablation tests show the initial words of the Callout are the strongest feature
- The rule-based system shows very low recall (7%), which indicates that many Target-Callout relations are *implicit*
- Limitation: lack of data (currently in the process of annotating more data)
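The rule-based baseline can be sketched as a simple cue-word matcher. Its low recall follows directly: a Callout with no explicit "agree"/"disagree" cue gets no label at all. The slides do not give the exact rules, so this is an illustrative guess, with invented example inputs.

```python
# Sketch of a rule-based agree/disagree baseline: label a Callout only when
# it contains an explicit cue word. Check "disagree" first, since the string
# "agree" is a substring of "disagree".

def rule_based_label(callout):
    text = callout.lower()
    if "disagree" in text:
        return "disagree"
    if "agree" in text:
        return "agree"
    return None  # no explicit cue: the relation is implicit

print(rule_based_label("I disagree that the iPhone just felt natural"))  # disagree
print(rule_based_label("That's very true, the UI is great"))             # None
```

The second example is an implicit agreement the rules cannot see, which is exactly the kind of case behind the 7% recall figure and the motivation for the learned lexical features.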
Slide 29
# of Clusters for each Corpus
Thread    # of Clusters    # of EA ADUs per cluster
                           5     4     3     2     1
Android   91               52    16    11    7     5
iPad      88               41    17    7     13    10
Layoffs   86               41    18    11    6     10
Twitter   84               44    17    14    4     5

- Clusters with 5 and 4 annotators show Callouts that are plausibly easier to identify
- Clusters selected by only one or two annotators are harder to identify
Slide 30
[Diagram: one Target with two Callouts (Callout1, Callout2) across posts by User1, User2, and User3]
Slide 31
[Diagram: one Target with two Callouts (Callout1, Callout2) across posts by User1, User2, and User3]
Slide 32
Fine-Grained Novice Annotation
[Diagram: Targets (T) and Callouts (CO) linked across posts]
- Relation identification (e.g., Agree/Disagree/Other)
- Finer-grained annotation (e.g., Stance & Rationale)
Slide 33
Motivation and Challenges
[Diagram: a thread of posts (Post1-Post4)]
- Segmentation
- Segment Classification
- Relation Identification
Argumentative Discourse Units (ADU; Peldszus and Stede, 2013)
Slide 34
Why do we propose a two-layer annotation?
A two-layer annotation schema:
- Expert annotation: five annotators who received extensive training for the task; the primary task includes selecting discourse units from users' posts (argumentative discourse units: ADUs) (Peldszus and Stede, 2013)
- Novice annotation: use of the Amazon Mechanical Turk (AMT) platform to detect the nature and role of the ADUs selected by the experts
Slide 35
Annotation Schema for Expert Annotators
Callout: a subsequent action that selects all or some part of a prior action (i.e., Target) and comments on it in some way.
Target: a part of a prior action that has been called out by a subsequent action.
Slide 36
Motivation and Challenges
- User-generated conversational data provides a wealth of naturally generated arguments
- Argument mining of such online interactions, however, is still in its infancy
Slide 37
Detail on Corpora
- Four blog posts and their responses (the first 100 comments each) from Technorati, 2008-2010
- We selected blog postings on the general topic of technology, which contain many disputes and arguments
- Together they are denoted the argumentative corpus
Slide 38
Motivation and Challenges (cont.)
- A detailed single annotation scheme seems infeasible
- The problem of high *cognitive load* (annotators have to read all the threads)
- Use of expert annotators for all tasks is costly
- We propose a scalable and principled two-tier scheme to annotate corpora for arguments
Slide 39
Annotation Schema(s)
A two-layer annotation schema:
- Expert annotation: five annotators who received extensive training for the task; primary tasks include a) segmentation, b) segment classification, and c) relation identification of discourse units from users' posts (argumentative discourse units: ADUs)
- Novice annotation: use of the Amazon Mechanical Turk (AMT) platform to detect the nature and role of the ADUs selected by the experts
Slide 40
Example from the discussion thread
Slide 41
A picture is worth…
Slide 42
Motivation and Challenges
Argument annotation includes three tasks (Peldszus and Stede, 2013):
- Segmentation
- Segment Classification
- Relation Identification
Slide 43
Summary of the Annotation Schema(s)
First stage of annotation
- Annotators: expert (trained) annotators
- A coarse-grained annotation scheme inspired by Pragmatic Argumentation Theory (PAT; Van Eemeren et al., 1993)
- Segment, label, and link Callout and Target
Second stage of annotation
- Annotators: novice (crowd) annotators
- A finer-grained annotation to detect the Stance and Rationale of an argument
Slide 44
Expert Annotation
- Expert annotators perform segmentation, labeling, and linking (Peldszus and Stede, 2013)
- Coarse-grained annotation
- Five expert (trained) annotators detect two types of ADUs: Callout and Target
Slide 45
The Argumentative Corpus
Blogs and comments extracted from Technorati (2008-2010)
Slide 46
Novice Annotations: Identifying Stance and Rationale
- Crowdsourcing over each Callout
- Identify the task difficulty (very difficult … very easy)
- Identify the text segments (Stance and Rationale)
Slide 47
Novice Annotations: Identifying the relation between ADUs
- Crowdsourcing: Turkers label the relation for each Callout/Target pair

Relation label    Number of EA ADUs per cluster
                  5      4      3      2      1
Agree             39.4   43.3   42.5   35.5   48.4
Disagree          56.9   31.7   32.5   25.8   19.4
Other             3.70   25.0   25.0   38.7   32.3
Slide 48
More on Expert Annotations
- Annotators were free to choose any text segment to represent an ADU
- Splitters vs. Lumpers
Slide 49
Novice Annotation: task 1
1: Identifying the relation (Agree/Disagree/Other)
This is related to annotation of agreement/disagreement (Misra and Walker, 2013; Andreas et al., 2012) and classification of stances (Somasundaran and Wiebe, 2010) in online forums.
Slide 50
ADUs: Callout and Target
Slide 51
Examples of Clusters
- 5 EAs: Callout: "I disagree too. some things they get right, some things they do not." Target: "the iPhone is a truly great design."
- 2 EAs: Callout: "These iPhone Clones are playing catchup. Good luck with that." Target: "griping about issues that will only affect them once in a blue moon"
- 1 EA: Callout: "Do you know why the Pre ...various hand-set/builds/resolution issues?" Target: "Except for games?? iPhone is clearly dominant there."
Slide 52
More on Expert Annotations
Annotators were free to choose any text segment to represent an ADU
Slide 53
Example from the discussion thread
Slide 54
Coarse-grained Expert Annotation
[Diagram: Target and Callout]
Pragmatic Argumentation Theory (PAT; Van Eemeren et al., 1993) based annotation
Slide 55
ADUs: Callout and Target
Slide 56
More on Expert Annotations and Corpus
- Five annotators were free to choose any text segment to represent an ADU
- Four blogs and their first one-hundred comment sections are used as our argumentative corpus: Layoffs, Android, Twitter, iPad
Slide 57
Examples of Cluster
# of EAs: 5
- Callout: "I disagree too. some things they get right, some things they do not." Target: "the iPhone is a truly great design."
- Callout: "I disagree too…they do not." Target: "That happened because the iPhone is a truly great design."
- Callout: "I disagree too. But when we first tried the iPhone it felt natural immediately . . . iPhone is a truly great design." Target: same as above
- Callout: "Hi there, I disagree too . . . they do not. Same as OSX." Target: same as above
- Callout: "I disagree too. . . Same as OSX . . . no problem." Target: same as above
Slide 58
57Slide58
Predicting the Agree/Disagree Relation Label

Features            Category    P      R      F1
Baseline            Agree       83.3   6.90   12.9
                    Disagree    50.0   5.20   9.50
Unigrams            Agree       57.9   61.5   59.7
                    Disagree    61.8   58.2   59.9
MI-based unigram    Agree       60.1   66.4   63.1
                    Disagree    65.2   58.8   61.9
LexF                Agree       61.4   73.4   66.9
                    Disagree    69.6   56.9   62.6
Slide 59
Novice Annotation: task 2
2: Identifying Stance vs. Rationale
This is related to the claim/justification identification task (Biran and Rambow, 2011).