Multi-Abstraction Concern Localization

Tien-Duy B. Le, Shaowei Wang, and David Lo
{btdle.2012, shaoweiwang.2010, davidlo}@smu.edu.sg

Overall Framework

[Figure: Overall framework. Method corpus and concerns → Preprocessing → Hierarchy creation (abstraction hierarchy with Level 1, Level 2, …, Level N) → Standard retrieval technique + multi-abstraction retrieval → Ranked methods per concern.]

Text Preprocessing

We remove Java keywords, punctuation marks, and special symbols, and we break identifiers into tokens based on the camel-case naming convention. Finally, we apply the Porter stemming algorithm to reduce English words to their root forms.
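The preprocessing steps can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function and constant names are hypothetical, the keyword set is the Java reserved words plus the literals true, false, and null (including the literals is our assumption), and stemming is passed in as a function so the sketch runs without external dependencies.

```python
import re
from typing import Callable, List

# Java reserved keywords; adding the literals true/false/null is our assumption.
JAVA_KEYWORDS = {
    "abstract", "assert", "boolean", "break", "byte", "case", "catch", "char",
    "class", "const", "continue", "default", "do", "double", "else", "enum",
    "extends", "final", "finally", "float", "for", "goto", "if", "implements",
    "import", "instanceof", "int", "interface", "long", "native", "new",
    "package", "private", "protected", "public", "return", "short", "static",
    "strictfp", "super", "switch", "synchronized", "this", "throw", "throws",
    "transient", "try", "void", "volatile", "while", "true", "false", "null",
}

# Identifier-like runs; punctuation, operators and other symbols are never matched.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Camel-case pieces: an acronym before a capitalized word, a (capitalized) word, a trailing acronym.
CAMEL_PART = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+")


def preprocess(source: str, stem: Callable[[str], str] = lambda w: w) -> List[str]:
    """Turn a method's source text into lower-cased, stemmed tokens (hypothetical helper)."""
    tokens: List[str] = []
    for ident in IDENTIFIER.findall(source):
        if ident in JAVA_KEYWORDS:
            continue  # drop Java keywords
        # Split on camel case; underscores and digits are not matched and so are discarded.
        for part in CAMEL_PART.findall(ident):
            tokens.append(stem(part.lower()))
    return tokens


if __name__ == "__main__":
    print(preprocess("public int getFileName() { return fileName; }"))
    # ['get', 'file', 'name', 'file', 'name']
```

To apply Porter stemming as described above, one option is NLTK: `from nltk.stem import PorterStemmer`, then `preprocess(source, stem=PorterStemmer().stem)`.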

Hierarchy Creation Step

We apply Latent Dirichlet Allocation (LDA) several times, each time with a different number of topics, to construct an abstraction hierarchy. Each application of LDA creates a topic model, which corresponds to one abstraction level. We refer to the number of topic models in a hierarchy as the height of the hierarchy.
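A self-contained sketch of hierarchy creation follows. The poster does not say which LDA implementation or hyperparameters were used, so this sketch swaps in a small collapsed Gibbs sampler; the function names, alpha, beta, iteration count, and seed are hypothetical choices. Each entry of the returned list is one abstraction level, so the height of the hierarchy is simply its length.

```python
import random
from typing import Dict, List, Sequence

Doc = Sequence[str]


def lda_doc_topics(docs: Sequence[Doc], k: int, iters: int = 200, alpha: float = 0.1,
                   beta: float = 0.01, seed: int = 0) -> List[List[float]]:
    """Fit a k-topic LDA model by collapsed Gibbs sampling (a swapped-in, illustrative
    implementation; hyperparameter defaults are hypothetical) and return theta[d][t],
    the probability of topic t in document d."""
    rng = random.Random(seed)
    vocab: Dict[str, int] = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    n_words = len(vocab)
    ndk = [[0] * k for _ in docs]            # topic counts per document
    nkw = [[0] * n_words for _ in range(k)]  # word counts per topic
    nk = [0] * k                             # total tokens per topic
    z: List[List[int]] = []                  # current topic of every token

    def count(d: int, t: int, wi: int, delta: int) -> None:
        ndk[d][t] += delta
        nkw[t][wi] += delta
        nk[t] += delta

    for d, doc in enumerate(docs):           # random initial topic assignment
        z.append([rng.randrange(k) for _ in doc])
        for w, t in zip(doc, z[d]):
            count(d, t, vocab[w], +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi = vocab[w]
                count(d, z[d][i], wi, -1)    # take the token out of the counts
                # Full conditional p(z = j | all other assignments), up to a constant.
                weights = [(ndk[d][j] + alpha) * (nkw[j][wi] + beta) / (nk[j] + n_words * beta)
                           for j in range(k)]
                z[d][i] = rng.choices(range(k), weights=weights)[0]
                count(d, z[d][i], wi, +1)

    # Smoothed document-topic distributions; every row sums to 1.
    return [[(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
            for d, doc in enumerate(docs)]


def build_hierarchy(docs: Sequence[Doc], topic_counts: Sequence[int],
                    **lda_kwargs) -> List[List[List[float]]]:
    """Run LDA once per topic count; level i of the hierarchy is the i-th topic model."""
    return [lda_doc_topics(docs, k, **lda_kwargs) for k in topic_counts]


if __name__ == "__main__":
    docs = [["read", "file", "buffer"], ["open", "socket", "connect"], ["read", "socket", "buffer"]]
    hierarchy = build_hierarchy(docs, [2, 3], iters=50)
    print(len(hierarchy))  # height of the hierarchy: 2
```

For the H4 configuration in the experiments, the topic counts would be [50, 100, 150, 200]. On a real method corpus, a library implementation of LDA (for example, scikit-learn's or gensim's) would be far faster than this pure-Python sampler.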

Motivation

Concern localization is the process of locating code units that match a particular textual description, such as a bug report or a feature request. Recent concern localization techniques compare documents at a single level of abstraction (either words or topics). However, a word can be abstracted at multiple levels. For example, Eindhoven can be abstracted to North Brabant, Netherlands, Western Europe, European Continent, Earth, etc. In multi-abstraction concern localization, we represent documents at multiple abstraction levels by leveraging multiple topic models.

Multi-Abstraction Retrieval

We propose the multi-abstraction Vector Space Model (VSMMA), which combines VSM with our abstraction hierarchy. In multi-abstraction VSM, document vectors are extended by adding elements that correspond to the topics in the hierarchy. Given a query q and a document d in corpus D, VSMMA calculates the similarity between q and d with an equation that is not preserved in this transcript.
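One plausible reconstruction of that equation is shown below. It assumes that VSMMA keeps VSM's usual cosine similarity and appends each topic model's probabilities to the tf-idf vector; the notation θ_k(t_i | d) for the probability of topic t_i in d under the k-th topic model H_k, and |H_k| for that model's number of topics, is ours rather than the poster's. The query vector is built the same way from q.

```latex
\begin{aligned}
\vec{d} &= \big\langle \text{tf-idf}(w_1,d,D), \ldots, \text{tf-idf}(w_V,d,D),\\
&\qquad \theta_1(t_1 \mid d), \ldots, \theta_1(t_{|H_1|} \mid d), \ldots,
        \theta_L(t_1 \mid d), \ldots, \theta_L(t_{|H_L|} \mid d) \big\rangle \\
\mathrm{sim}_{MA}(q,d) &= \frac{\vec{q}\cdot\vec{d}}{\lVert\vec{q}\rVert\,\lVert\vec{d}\rVert}
 = \frac{\sum_{i=1}^{V} \text{tf-idf}(w_i,q,D)\,\text{tf-idf}(w_i,d,D)
   + \sum_{k=1}^{L}\sum_{i=1}^{|H_k|} \theta_k(t_i \mid q)\,\theta_k(t_i \mid d)}
   {\lVert\vec{q}\rVert\,\lVert\vec{d}\rVert}
\end{aligned}
```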

The equation uses the following quantities:
- V is the size of the original document vector;
- w_i is the i-th word in d;
- L is the height of the abstraction hierarchy H;
- H_i is the i-th abstraction level in the hierarchy;
- for each topic t_i, the probability of t_i appearing in d, as assigned by the k-th topic model in abstraction hierarchy H (its symbol was not preserved);
- tf-idf(w, d, D) is the term frequency-inverse document frequency of word w in document d given corpus D.
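The extended vectors can be sketched in Python. This is an illustration under stated assumptions, not the authors' code: it uses raw term count times log(N/df) as the tf-idf variant, takes each level's topic distribution for q and d as given (for example, inferred by the corresponding topic model), and scores the concatenated vectors with cosine similarity, the usual VSM measure. All names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Doc = Sequence[str]


def tfidf_vector(tokens: Doc, vocab: List[str], df: Dict[str, int], n_docs: int) -> List[float]:
    """The V word elements: raw term count times log(N / df) (an assumed tf-idf variant)."""
    counts = Counter(tokens)
    return [counts[w] * math.log(n_docs / df[w]) for w in vocab]


def extend(word_part: List[float], topic_levels: Sequence[Sequence[float]]) -> List[float]:
    """Append the topic probabilities of every abstraction level H_1..H_L."""
    vec = list(word_part)
    for level in topic_levels:
        vec.extend(level)
    return vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sim_ma(q_tokens: Doc, q_topics: Sequence[Sequence[float]],
           d_tokens: Doc, d_topics: Sequence[Sequence[float]],
           corpus: Sequence[Doc]) -> float:
    """Cosine similarity between the extended vectors of query q and document d."""
    vocab = sorted({w for doc in corpus for w in doc})
    df = Counter(w for doc in corpus for w in set(doc))  # document frequency per word
    q_vec = extend(tfidf_vector(q_tokens, vocab, df, len(corpus)), q_topics)
    d_vec = extend(tfidf_vector(d_tokens, vocab, df, len(corpus)), d_topics)
    return cosine(q_vec, d_vec)


if __name__ == "__main__":
    corpus = [["open", "file", "read"], ["close", "socket"], ["read", "socket", "buffer"]]
    # No shared words and no topic levels: plain VSM behaviour.
    print(sim_ma(["open", "file"], [], ["close", "socket"], [], corpus))  # 0.0
    # One abstraction level with two topics: the topic elements now overlap.
    print(sim_ma(["open", "file"], [[0.9, 0.1]], ["close", "socket"], [[0.8, 0.2]], corpus) > 0)  # True
```

In this toy example the query and the document share no words, so plain VSM scores them 0, while the topic elements give them a positive similarity. The sketch does not weight or normalize the word and topic parts separately; the poster does not say whether VSMMA does, and since tf-idf weights can exceed 1 while topic probabilities cannot, that choice may matter in practice.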

Experiments

Effectiveness of Multi-Abstraction VSM

Hierarchy        Number of topics      MAP      Improvement over baseline
Baseline (VSM)   -                     0.0669   N/A
H1               50                    0.0715   6.82%
H2               50, 100               0.0777   16.11%
H3               50, 100, 150          0.0787   17.65%
H4               50, 100, 150, 200     0.0799   19.36%

The MAP improvement of H4 over the baseline is 19.36%. MAP improves as the height of the abstraction hierarchy increases.
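The experiments report mean average precision (MAP). Below is a minimal sketch of the standard MAP computation over ranked methods per concern, plus the relative-improvement arithmetic used in the table. The function names are hypothetical, and treating relevant methods that are never retrieved as contributing zero precision is the usual convention, assumed here.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average of precision@k over the ranks k at which a relevant method appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, method in enumerate(ranked, start=1):
        if method in relevant:
            hits += 1
            total += hits / k  # precision at this rank
    # Dividing by |relevant| means relevant methods never retrieved count as precision 0.
    return total / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP over concerns; each run pairs one concern's ranked methods with its ground truth."""
    aps = [average_precision(ranked, relevant) for ranked, relevant in runs]
    return sum(aps) / len(aps) if aps else 0.0


def improvement_pct(new_map: float, baseline_map: float) -> float:
    """Relative MAP improvement over the baseline, in percent."""
    return (new_map - baseline_map) / baseline_map * 100


if __name__ == "__main__":
    print(average_precision(["m1", "m2", "m3", "m4"], {"m1", "m3"}))  # (1/1 + 2/3) / 2
    print(round(improvement_pct(0.0799, 0.0669), 2))  # H4 vs. VSM baseline: 19.43
```

Recomputing the improvement from the four-decimal MAP values gives about 19.43% for H4 rather than the reported 19.36% (with similarly small gaps for H1 to H3), presumably because the reported percentages were computed from unrounded MAP values.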

Future Work

Extend the experiments with combinations of:
- different numbers of topics in each level of the hierarchy;
- different hierarchy heights;
- different topic models (Pachinko Allocation Model, Syntactic Topic Model, Hierarchical LDA).

Experiment with Panichella et al.'s method [1] to infer good LDA configurations for our approach.

[1] A. Panichella, B. Dit, R. Oliveto, M. Di Penta, D. Poshyvanyk, and A. De Lucia. How to effectively use topic models for software engineering tasks? An approach based on genetic algorithms. ICSE 2013.
