Slide1
Multi-Abstraction Concern Localization
Tien-Duy B. Le, Shaowei Wang, and David Lo
School of Information Systems
Singapore Management University
1
Slide2
Motivation
- Concern Localization: locating code units that match a textual description
- Text descriptions: bug reports or feature requests
- Code units: classes' or methods' source code
- Documents are compared based on the words (IR) or topics (topic modeling) they contain, at one level of abstraction, i.e., the word/topic level
2
Slide3
Motivation
A word can be abstracted at multiple levels of abstraction.

Example: Eindhoven (Level 1) → North Brabant (Level 2) → Netherlands (Level 3) → Western Europe → European Continent (Level N)

3
Slide4
Multi-Abstraction Concern Localization

[Diagram] The source code and the bug report or feature request are each represented at multiple abstraction levels (Level 1, Level 2, Level 3, …, Level N), and the two representations are compared.

4
Slide5
Multi-Abstraction Concern Localization
Locating code units that match a textual description:
- By comparing documents at multiple abstraction levels
- By leveraging multiple topic models

Three main components:
- Text preprocessing
- Hierarchy creation
- Multi-abstraction retrieval technique
5
Slide6
Overall framework

[Diagram] The method corpus and the concerns are preprocessed; hierarchy creation builds an abstraction hierarchy (Level 1, Level 2, …, Level N); a standard retrieval technique combined with the hierarchy forms the multi-abstraction retrieval, which outputs ranked methods per concern.
6
Slide7
Hierarchy Creation
- We apply Latent Dirichlet Allocation (LDA) a number of times
- LDA (with default settings) accepts: the number of topics K, and a set of documents
- LDA returns: K topics, each a distribution over words, and the probability of each topic t appearing in each document d
7
Slide8
Hierarchy Creation
- Each application of LDA creates a topic model with K topics assigned to a document
- Each topic model corresponds to an abstraction level
- An abstraction hierarchy of height L: the height equals the number of topic models, created by L applications of LDA
8
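The hierarchy creation step can be sketched as follows. This is a minimal illustration, not the authors' implementation: `toy_lda` is a hypothetical stand-in for a real LDA run (e.g., via a topic-modeling library) that merely produces normalized pseudo-random topic distributions; only the surrounding structure — one topic model per level, height L = number of LDA applications — reflects the slides.

```python
import random

def toy_lda(documents, num_topics, seed=0):
    """Hypothetical stand-in for one LDA application: returns, for each
    document, a probability distribution over `num_topics` topics."""
    rng = random.Random(seed)
    doc_topics = []
    for _ in documents:
        weights = [rng.random() for _ in range(num_topics)]
        total = sum(weights)
        doc_topics.append([w / total for w in weights])
    return doc_topics

def build_hierarchy(documents, topic_counts):
    """One LDA application per level; height L = len(topic_counts)."""
    return [toy_lda(documents, k, seed=i) for i, k in enumerate(topic_counts)]

docs = ["parse aspectj bug", "weave advice pointcut", "resolve type binding"]
hierarchy = build_hierarchy(docs, [50, 100, 150])  # height L = 3
assert len(hierarchy) == 3
assert len(hierarchy[0][0]) == 50                  # level 1 has 50 topics
assert abs(sum(hierarchy[2][1]) - 1.0) < 1e-9      # distributions sum to 1
```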
Slide9
Multi-Abstraction Vector Space Model
- Multi-Abstraction Vector Space Model (VSM): standard VSM + abstraction hierarchy
- In the standard Vector Space Model, a document is represented as a vector of weights
- Each element corresponds to a word; its value is the weight of the word: term frequency–inverse document frequency (tf-idf)
9
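The standard VSM described above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation — real pipelines add stemming, stop-word removal, and smoothed idf variants; the function names `tfidf_vectors` and `cosine` are chosen here:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """One tf-idf weight vector per document, over the corpus vocabulary."""
    vocab = sorted({w for d in docs for w in d.split()})
    n = len(docs)
    df = Counter(w for d in docs for w in set(d.split()))  # document frequency
    vectors = []
    for d in docs:
        tf = Counter(d.split())  # term frequency
        vectors.append([tf[w] * math.log(n / df[w]) for w in vocab])
    return vocab, vectors

def cosine(u, v):
    """Cosine similarity, the usual ranking function in VSM retrieval."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Documents sharing words score higher than documents sharing none.
vocab, vecs = tfidf_vectors(["weave advice", "advice pointcut", "parse tree"])
assert cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]) == 0.0
```

Retrieval then amounts to scoring each code unit's vector against the concern's vector and sorting by similarity.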
Slide10
Multi-Abstraction Vector Space Model
- We extend the document vectors
- Added elements: the topics of the topic models in the abstraction hierarchy; their values are the probabilities of the topics appearing in the documents
- Example: a document vector has length 10; the abstraction hierarchy has 3 topic models of sizes 50, 100, 150; the extended document vector has size 10 + (50 + 100 + 150) = 310
10
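The slide's arithmetic can be mirrored directly in code. A minimal sketch (the function name `extend_vector` is chosen here, not taken from the paper): the extended vector is simply the tf-idf vector with each level's topic-probability vector concatenated onto it.

```python
def extend_vector(tfidf_vec, topic_dists_per_level):
    """Concatenate the tf-idf vector with the document's topic-probability
    vector from every topic model in the abstraction hierarchy."""
    extended = list(tfidf_vec)
    for dist in topic_dists_per_level:
        extended.extend(dist)
    return extended

# The slide's example: a 10-word tf-idf vector and topic models of sizes
# 50, 100 and 150 yield a vector of length 10 + (50 + 100 + 150) = 310.
tfidf_vec = [0.0] * 10
topic_dists = [[1.0 / k] * k for k in (50, 100, 150)]  # uniform placeholders
assert len(extend_vector(tfidf_vec, topic_dists)) == 310
```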
Slide11
Experiments
- Dataset: 285 AspectJ faulty versions extracted from iBugs
- Evaluation metric: Mean Average Precision (MAP)
11
Hierarchies and their numbers of topics:
H1: 50
H2: 50, 100
H3: 50, 100, 150
H4: 50, 100, 150, 200
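Mean Average Precision can be computed as sketched below (the standard textbook definition, written here for illustration; the function names are chosen for this sketch). For each concern, average precision rewards ranking the truly buggy methods near the top; MAP averages this over all concerns.

```python
def average_precision(ranked, relevant):
    """AP for one concern: mean of precision@k over the ranks k at which
    a relevant (truly buggy) method appears."""
    hits, precisions = 0, []
    for k, method in enumerate(ranked, start=1):
        if method in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(per_concern):
    """per_concern: list of (ranked_methods, relevant_methods) pairs."""
    return sum(average_precision(r, rel) for r, rel in per_concern) / len(per_concern)

# A perfect ranking scores 1.0; pushing relevant methods down lowers AP.
assert average_precision(["m1", "m2", "m3"], {"m1", "m2"}) == 1.0
assert abs(average_precision(["m3", "m1", "m2"], {"m1", "m2"}) - (1/2 + 2/3) / 2) < 1e-9
```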
Slide12
Empirical Result
MAP and improvement over baseline:
Baseline (VSM): MAP 0.0669 (N/A)
H1: MAP 0.0715 (+6.82%)
H2: MAP 0.0777 (+16.11%)
H3: MAP 0.0787 (+17.65%)
H4: MAP 0.0799 (+19.36%)

The MAP improvement of H4 is 19.36%.
The MAP is improved when the height of the abstraction hierarchy is increased.
12
Slide13
Empirical Result
Number of concerns with various improvements (p):

H1   H2   H3   H4
21   27   30   30
25   22   25   22
18   14   12   11
113  64   42   41
108  158  176  181

The improvements are positive for most of the concerns.
13
Slide14
Conclusion
- We propose a multi-abstraction concern localization framework
- We also propose a multi-abstraction vector space model
- Our experiments on 285 AspectJ bugs show a MAP improvement of up to 19.36%

14
Slide15
Future work
Extend experiments by investigating:
- Different numbers of topics in each level of the hierarchy
- Different hierarchy heights
- Different topic models

Topic Model                  | Word Ordering     | Word Correlation
Latent Dirichlet Allocation  | Bag of Words      | No
Pachinko Allocation Model    | Bag of Words      | Yes
Syntactic Topic Model        | Sequence of Words | No
15
Slide16
Future work
Analyze the effects of document lengths:
- For different numbers of topics
- For different hierarchy heights

Experiment with Panichella et al.'s method [1] to infer good LDA configurations for our approach.

[1] A. Panichella, B. Dit, R. Oliveto, M. Di Penta, D. Poshyvanyk, and A. De Lucia. How to effectively use topic models for software engineering tasks? An approach based on genetic algorithms. (ICSE 2013)
16
Slide17
Thank you!
Questions? Comments? Advice?

{btdle.2012, shaoweiwang.201, davidlo}@smu.edu.sg

17