Slide1
Concept-Based Analysis of Scientific Literature
Chen-Tse Tsai, Gourab Kundu, Dan Roth
CS @ UIUC
Slide2
Understanding Research Communities
Consider the following questions:
  What are the key applications studied by the community?
  What applications have matured enough to be used as a technique for other applications?
  What methods were developed to solve a particular problem?
In this paper:
  Extract concepts from scientific papers
  A concept is a cluster of possible mentions
    {svm, support vector machines, maximal margin classifiers, …}
  Analyze computational linguistics research by answering the above questions
Slide3
Outline
Computational Approach
  Concept Mention Extraction
  Citation-Context Based Concept Clustering
Evaluation of Algorithms
Understanding Computational Linguistic Research
Slide4
Concept Mention Extraction
Identify and categorize mentions of concepts (Gupta and Manning, 2011)
  TECHNIQUE and APPLICATION
  “We apply support vector machines on text classification.”
Unsupervised bootstrapping algorithm (Yarowsky, 1995; Collins and Singer, 1999)
The proposed algorithm:
  Extract noun phrases (Punyakanok and Roth, 2001)
  For each category, initialize a decision list by seeds.
  For several rounds:
    Annotate NPs using the decision lists.
    Extract top features from the newly annotated phrases, and add them to the decision lists.
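The bootstrapping loop above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature set (the NP's words plus the word to its left), the seed rules, the `top_k` cutoff, and the toy phrases are all hypothetical, and the decision lists are simplified to unweighted sets of features rather than scored, ordered rules.

```python
from collections import Counter

def features(noun_phrase, left_word):
    """Hypothetical feature set: the NP's own words plus the word to its left."""
    feats = {f"word={w}" for w in noun_phrase.lower().split()}
    feats.add(f"left={left_word.lower()}")
    return feats

def bootstrap(phrases, seeds, rounds=3, top_k=2):
    """phrases: list of (noun_phrase, left_word); seeds: {category: seed features}."""
    decision_lists = {cat: set(feats) for cat, feats in seeds.items()}
    labels = {}
    for _ in range(rounds):
        # Annotate each NP whose features match exactly one category's list.
        labels = {}
        for i, phrase in enumerate(phrases):
            feats = features(*phrase)
            matched = [c for c, rules in decision_lists.items() if feats & rules]
            if len(matched) == 1:
                labels[i] = matched[0]
        # Add the most frequent unseen features of the annotated NPs to each list.
        for cat, rules in decision_lists.items():
            counts = Counter()
            for i, label in labels.items():
                if label == cat:
                    counts.update(features(*phrases[i]))
            known = set().union(*decision_lists.values())
            candidates = sorted((f for f in counts if f not in known),
                                key=lambda f: (-counts[f], f))
            rules.update(candidates[:top_k])
    return labels, decision_lists

# Toy data (made up): only the first two phrases match a seed rule directly.
phrases = [
    ("support vector machines", "apply"),
    ("text classification", "on"),
    ("support vector machines", "using"),
    ("text classification", "for"),
]
seeds = {"TECHNIQUE": {"left=apply"}, "APPLICATION": {"left=on"}}
labels, decision_lists = bootstrap(phrases, seeds)
print(labels)
```

In this toy run the last two phrases are unlabeled in the first round and become labeled once word features learned from the seed-matched phrases enter the decision lists.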
Slide5
Citation-Context Based Concept Clustering (CitClus)
Cluster mentions into semantically coherent concepts
  Group concept mentions by citation context
  Merge clusters based on lexical similarity between mentions in the clusters
[Figure: mentions in four papers ("support vector machine" and "c4.5" in Paper1; "svm-based classification" and "decision trees" in Paper2; "svm" in Paper3; "maximal margin classifiers" in Paper4) appear alongside the citations (Cortes, 1995), (Vapnik, 1995), and (Quinlan, 1993). Resulting clusters: {c4.5, decision trees} and {support vector machine, svm-based classification, svm, maximal margin classifiers}]
Slide6
Outline
Computational Approach
  Concept Mention Extraction
  Citation-Context Based Concept Clustering
Evaluation of Algorithms
Understanding Computational Linguistic Research
Slide7
Evaluation of Mention Extraction
ACL Anthology Network Corpus (Radev et al., 2009)
  Training data: 11,005 abstracts
  Test data: 474 abstracts (Gupta and Manning, 2011)
Approach       Technique              Application
               Pre.   Rec.   F1       Pre.   Rec.   F1
GM 2011        30.5   46.7   36.9     27.6   57.5   37.3
Our approach   48.2   48.8   48.5     44.0   47.3   45.6
Slide8
Evaluation of Concept Clustering
Manually cluster the extracted mentions from 1000 full text papers.
CitClus: the proposed approach
LexClus: group the concept mentions by lexical similarity
CitClus groups:
  “maximal entropy classifier” and “logistic classifier”
  “topic modeling” and “latent dirichlet allocation”
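The difference between the two approaches can be sketched as follows. This is an illustration under assumptions rather than the authors' code: token-level Jaccard overlap with a 0.5 threshold stands in for the unspecified lexical similarity, the citation keys `ref_A`, `ref_B`, `ref_C` are placeholders, and the mention "latent dirichlet allocation model" is made up to exercise the merge step.

```python
from itertools import combinations

def token_jaccard(a, b):
    """Word-overlap similarity between two mention strings (assumed measure)."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)

def lexically_similar(c1, c2, threshold=0.5):
    """True if any pair of mentions across the two clusters is similar enough."""
    return any(token_jaccard(a, b) >= threshold for a in c1 for b in c2)

def merge(clusters, should_merge):
    """Repeatedly merge any two clusters for which should_merge holds."""
    clusters = [set(c) for c in clusters]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            if should_merge(clusters[i], clusters[j]):
                clusters[i] |= clusters[j]
                del clusters[j]
                merged = True
                break
    return clusters

def lexclus(mentions):
    """Baseline: start from singletons and merge by lexical similarity only."""
    return merge([{m} for m in mentions], lexically_similar)

def citclus(mention_citations):
    """Group mentions by citation context, then merge clusters lexically."""
    by_citation = {}
    for mention, citation in mention_citations:
        by_citation.setdefault(citation, set()).add(mention)
    return merge(by_citation.values(), lexically_similar)

# Placeholder citation keys; the pairing of mentions to citations is illustrative.
pairs = [
    ("maximal entropy classifier", "ref_A"),
    ("logistic classifier", "ref_A"),
    ("topic modeling", "ref_B"),
    ("latent dirichlet allocation", "ref_B"),
    ("latent dirichlet allocation model", "ref_C"),
]
cit_clusters = citclus(pairs)
lex_clusters = lexclus([mention for mention, _ in pairs])
print(cit_clusters)
print(lex_clusters)
```

With these inputs the lexical baseline cannot join “topic modeling” with “latent dirichlet allocation”, because the strings share no words, while the shared citation key does.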
Approach   Technique   Application
LexClus    1.72        1.62
CitClus    1.28        1.49
Slide9
Outline
Computational Approach
  Concept Mention Extraction
  Citation-Context Based Concept Clustering
Evaluation of Algorithms
Understanding Computational Linguistic Research
Slide10
Trends Analysis
[Figure: trend graphs for CitClus, LexClus, and LDA, annotated with "The emergence of SVM" and "The emergence of topic modeling"]
Topic modeling appears high in the 90s because LDA cannot generate a tight enough cluster for a specific concept.
Slide11
Predictive Quality
For a concept, predict the number of papers in a year, given the number of papers in the previous three years
Linear regression over every three consecutive years
The better the grouping of mentions into coherent concepts, the more stable the trend graph.
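One possible reading of the prediction step is sketched below: fit a least-squares line to each window of three consecutive yearly counts and extrapolate one year ahead. The function names and the toy counts are hypothetical, and the slide does not say which error measure is reported in the table, so none is computed here.

```python
def predict_next(counts3):
    """Fit y = slope * x + intercept to three counts at x = 0, 1, 2; return the value at x = 3."""
    xs = [0, 1, 2]
    mean_x = 1.0
    mean_y = sum(counts3) / 3
    # The sum of squared deviations of x = 0, 1, 2 from their mean is 2.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, counts3)) / 2.0
    intercept = mean_y - slope * mean_x
    return slope * 3 + intercept

def rolling_predictions(yearly_counts):
    """Predict every year that has three preceding years of counts."""
    return [predict_next(yearly_counts[i:i + 3])
            for i in range(len(yearly_counts) - 3)]

# Toy counts (made up): a perfectly linear trend is predicted exactly.
counts = [2, 4, 6, 8, 10]
predictions = rolling_predictions(counts)
print(predictions)
```

A noisier trend graph, such as one produced by a loose cluster, would give predictions that sit further from the observed counts.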
Approach   SVM    Decision Tree   Topic Modeling   Sentiment Analysis
LexClus    0.97   0.83            0.73             0.48
CitClus    0.52   0.37            0.37             0.46
Slide12
Relations Between Concept Categories
For a given concept, calculate the ratio between the number of application mentions and the number of technique mentions.
Three concepts in the ACL community: support vector machines, machine translation, POS tagging
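The ratio can be computed along these lines. This is a sketch under assumptions: the `(year, concept, category)` record layout, the per-year breakdown, and the toy records are hypothetical, and years with no denominator mentions are simply skipped.

```python
from collections import Counter

def category_ratio_by_year(mentions, concept,
                           numerator="APPLICATION", denominator="TECHNIQUE"):
    """mentions: iterable of (year, concept, category) records."""
    counts = Counter((year, cat) for year, c, cat in mentions if c == concept)
    years = sorted({year for year, _ in counts})
    return {y: counts[(y, numerator)] / counts[(y, denominator)]
            for y in years if counts[(y, denominator)] > 0}

# Toy records (made up).
mentions = [
    (2000, "svm", "TECHNIQUE"),
    (2000, "svm", "TECHNIQUE"),
    (2000, "svm", "APPLICATION"),
    (2001, "svm", "TECHNIQUE"),
    (2001, "svm", "APPLICATION"),
    (2001, "mt", "APPLICATION"),
]
svm_ratio = category_ratio_by_year(mentions, "svm")
print(svm_ratio)
```

Swapping the `numerator` and `denominator` arguments gives the inverse ratio (#tech/#app).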
[Figure: three trend plots: SVM, #app/#tech; MT, #tech/#app; POS tagging, #tech/#app]
Slide13
Relations Between Concept Categories
For a given application, what techniques have been applied to it?
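The slide does not state how a technique is linked to an application. One simple reading, assumed here purely for illustration, is that the two are mentioned in the same paper. The record layout, function name, years, and counts below are all hypothetical.

```python
from collections import Counter, defaultdict

def techniques_for(papers, application):
    """papers: iterable of (year, technique concepts, application concepts)."""
    per_year = defaultdict(Counter)
    for year, techniques, applications in papers:
        # Assumption: co-occurrence in one paper links technique to application.
        if application in applications:
            per_year[year].update(techniques)
    return {year: dict(counts) for year, counts in sorted(per_year.items())}

# Toy records (made up).
papers = [
    (1998, {"decision tree"}, {"named entity recognition"}),
    (2005, {"crf"}, {"named entity recognition"}),
    (2005, {"crf", "svm"}, {"named entity recognition"}),
    (2005, {"phrase-based"}, {"machine translation"}),
]
ner_techniques = techniques_for(papers, "named entity recognition")
print(ner_techniques)
```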
[Figure: techniques applied to machine translation and to named entity recognition, with the annotations "Phrase-based and MERT", "Decision Tree", "Decision Tree disappears", and "CRF"]
Slide14
Conclusion
This work proposed algorithms for identifying, categorizing, and clustering mentions of scientific concepts.
These tools can provide a deeper understanding of, and useful insight into, research communities.