Presenter: Wei-Hao Huang. Authors: Furu Wei, Shixia Liu, Yangqiu Song, Shimei Pan, Michelle X. Zhou, Weihong Qian, Lei Shi, Li Tan
Slide 1
TIARA: A Visual Exploratory Text Analytic System
Presenter: Wei-Hao Huang
Authors: Furu Wei, Shixia Liu, Yangqiu Song, Shimei Pan, Michelle X. Zhou, Weihong Qian, Lei Shi, Li Tan, Qiang Zhang
SIGKDD 2010
Slide 2
Outline
Motivation
Objectives
Methodology
Experiments
Conclusions
Comments
Slide 3
Motivation
Searching a large collection of text to locate needed information, or simply deciding where to start, is very costly and time-consuming.
Although a number of text analysis technologies exist, their results are often abstract and complex and may not be consumable by users.
Slide 4
Objectives
To present an exploratory visual analytic system called TIARA (Text Insight via Automated Responsive Analytics).
To combine text analytics and interactive visualization to help users explore and analyze large collections of text.
[Figure: Documents, TIARA System]
Slide 5
Methodology
TIARA
Topic Analysis
Topic Ranking
Keyword based Topic Summarization
Time-sensitive Keyword Extraction
Slide 6
TIARA
Slide 7
TIARA System architecture
[Figure: Database, File system]
Slide 8
Topic Analysis
To use unsupervised learning methods.
Quantities defined on the slide:
The number of documents
The words of each document
The vocabulary and its size
K, the number of topics
The document-topic distribution matrix
The topic-word distribution matrix

Documents (N1, N2) against topics (K1, K2):
      N1    N2
K1    0     1
K2    1     1

Topics (K1, K2) against vocabulary words (V1, V2):
      K1    K2
V1    0.3   0.7
V2    0.8   0.1
Term frequencies in each cluster
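The two matrices on this slide can be read together: a document's word distribution is its topic mixture applied to the topic-word distributions. The following is a minimal sketch in plain Python, not the TIARA implementation. The values are hypothetical and row-normalized (they are not the numbers on the slide), and the names `doc_topic`, `topic_word`, and `word_distribution` are assumptions made for illustration.

```python
# Illustrative values only: 2 documents, 2 topics, 3 vocabulary words.
# Each row sums to 1 so that it is a proper probability distribution.
doc_topic = [
    [0.9, 0.1],  # document 0 is mostly topic 0
    [0.2, 0.8],  # document 1 is mostly topic 1
]
topic_word = [
    [0.7, 0.2, 0.1],  # topic 0 favors word 0
    [0.1, 0.3, 0.6],  # topic 1 favors word 2
]


def word_distribution(doc_topic_row, topic_word):
    """Mix the topic-word rows using one document's topic weights."""
    vocab_size = len(topic_word[0])
    return [
        sum(weight * topic[w] for weight, topic in zip(doc_topic_row, topic_word))
        for w in range(vocab_size)
    ]


# One word distribution per document.
doc_word = [word_distribution(row, topic_word) for row in doc_topic]
for d, dist in enumerate(doc_word):
    print(d, [round(p, 3) for p in dist])
```

Because every input row is normalized, each resulting row in `doc_word` also sums to 1.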
Slide 9
Topic Ranking
Topic rank is measured by a combination of both topic content coverage and topic variance.
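The slide does not say how coverage and variance are combined, so the following is only a sketch under stated assumptions: coverage is taken as a topic's mean weight across documents, variation as the standard deviation of that weight, and the two are multiplied. The function name `rank_topics` and the sample matrix are hypothetical.

```python
from statistics import mean, pstdev


def rank_topics(doc_topic):
    """Return (topic indices from highest to lowest score, raw scores).

    Assumption: score = coverage * variation, where coverage is the mean
    topic weight across documents and variation is its standard deviation.
    """
    num_topics = len(doc_topic[0])
    scores = []
    for k in range(num_topics):
        weights = [row[k] for row in doc_topic]
        coverage = mean(weights)
        variation = pstdev(weights)
        scores.append(coverage * variation)
    order = sorted(range(num_topics), key=lambda k: scores[k], reverse=True)
    return order, scores


# Hypothetical document-topic weights: 3 documents, 3 topics.
doc_topic = [
    [0.6, 0.3, 0.1],
    [0.1, 0.3, 0.6],
    [0.5, 0.3, 0.2],
]
order, scores = rank_topics(doc_topic)
print(order)
```

Under this assumed scoring, a topic that appears with the same weight in every document (topic 1 here) scores zero and ranks last, while a topic that is both prominent and unevenly spread ranks first.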
Slide 10
Keyword based Topic Summarization
Slide 11
Time-sensitive Keyword Extraction
Slide 12
Time-sensitive Keyword Extraction
Slide 13
Experiments
Time-sensitive keyword extraction procedure
Completeness
Distinctiveness
Response Time
Data sets:
A personal email collection with 8,326 email messages.
Emergency room data set containing 23,501 patient records.
Slide 14
Completeness
Defined as whether we can recover the original keywords of a topic by combining the keywords associated with each time segment.
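One way to turn this definition into a number is to measure the fraction of a topic's original keywords that reappear in at least one time segment. This is a sketch under that assumption, not the measure used in the experiments; the function name `completeness` and the sample keywords are hypothetical.

```python
def completeness(topic_keywords, segment_keywords):
    """Fraction of the topic's keywords found in at least one time segment."""
    original = set(topic_keywords)
    if not original:
        return 1.0  # nothing to recover
    # Combine the keywords of all time segments into one set.
    recovered = set().union(*segment_keywords)
    return len(original & recovered) / len(original)


# Hypothetical topic keywords and per-segment keywords.
topic_keywords = ["budget", "meeting", "travel", "visa"]
segment_keywords = [
    ["budget", "meeting"],
    ["travel", "hotel"],
]
score = completeness(topic_keywords, segment_keywords)
print(score)
```

Here three of the four original keywords are recovered, so the score is 0.75; a score of 1.0 would mean the segments jointly cover every original keyword.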
Slide 15
Distinctiveness
Defined as whether we can distinguish one topic segment from another based on their associated keywords to avoid redundancy.
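A simple proxy for this definition is the average pairwise Jaccard distance between the keyword sets of the segments: 1.0 means no two segments share a keyword, 0.0 means all segments carry identical keywords. The choice of Jaccard distance is an assumption made for illustration, not the measure reported in the experiments; the function name `distinctiveness` and the sample keywords are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 minus the share of keywords common to both sets."""
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def distinctiveness(segment_keywords):
    """Average Jaccard distance over all pairs of segments."""
    sets = [set(keywords) for keywords in segment_keywords]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 1.0  # a single segment has nothing to be confused with
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical keywords for three time segments of one topic.
segment_keywords = [
    ["budget", "meeting"],
    ["travel", "hotel"],
    ["budget", "travel"],
]
score = distinctiveness(segment_keywords)
print(round(score, 3))
```

Lower scores indicate redundant segments whose keywords largely repeat one another.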
Slide 16
Completeness and Distinctiveness Results
Slide 17
Response Time
Slide 18
Conclusions
TIARA tightly integrates text analytics with interactive visualization to support effective exploratory text analysis.
Future work
Add sentence-based summaries
Support other languages
Improve performance
Slide 19
Comments
Advantages
To explore and analyze large text collections with interactive visualization
Applications
Text mining