Slide1
Object-Oriented Reengineering Patterns and Techniques
Wahyu Andhyka Kusuma, S.Kom
kusuma.wahyu.a@gmail.com
081233148591
Module 5: Problem Detection
Slide2
Topics
Metrics
Object-Oriented Metrics in Practice
Code Duplication
Slide3
Topics
Metrics
Software quality
Trend analysis
Object-Oriented Metrics in Practice
Code Duplication
Slide4
Why use metrics in OO reengineering?
Assessing software quality
Which components have poor quality? (so that they can be reengineered)
Which components have good quality? (so that they can be reverse engineered)
Metrics as a reengineering tool!
Controlling the reengineering process
Trend analysis: which components can be changed?
Which refactorings can be applied?
Metrics as a reverse engineering tool!
Slide5
ISO 9126 Quantitative Quality Model
Levels of the model: Factor (defined by ISO 9126), Characteristic, Metric
Factors of software quality: Functionality, Reliability, Efficiency, Usability, Maintainability, Portability
Characteristics: Error tolerance, Accuracy, Simplicity, Modularity, Consistency
Metrics:
defect density = #defects / size
correction impact = #components changed
correction time
Slide6
Product & Process Attributes
Product Attribute
Definition: measures an aspect of the artifacts delivered to the customer
Examples: number of system defects, time to learn the system
Process Attribute
Definition: measures an aspect of the process that produces the product
Examples: time to repair a defect, number of components changed per repair
Slide7
External & Internal Attributes
External Attribute
Definition: measures how the product/process behaves in its environment
Examples: mean time between failures, #components changed
Internal Attribute
Definition: measured purely in terms of the product itself, separately from its behaviour in context
Examples: class coupling and cohesion, method size
Slide8
External vs. Internal Product Attributes
External
Advantage: close relationship with quality factors
Disadvantages:
Can be measured only after the product is in use
Data collection is difficult
Data is often affected by user interference
Relating external effects to internal causes is very difficult
Internal
Disadvantage: relationship with quality factors is not empirically validated
Advantages:
Can be measured at any time
Data collection is easy and can be automated
Direct relation between the measurement and its cause
Slide9
Metrics and Measurement
Weyuker [1988] defined nine properties that a software metric should satisfy
For OO, only 6 of the properties are really important [Chidamber 94, Fenton & Pfleeger]
Non-coarseness: given a class P and a metric m, another class Q can always be found such that m(P) ≠ m(Q)
Not every class has the same value for a metric
Non-uniqueness: there can be distinct classes P and Q such that m(P) = m(Q)
Two classes can have the same metric value
Monotonicity: m(P) ≤ m(P+Q) and m(Q) ≤ m(P+Q), where P+Q is the "combination" of classes P and Q
Slide10
Metrics and Measurement
Design details are important
The specifics of a class must influence the metric value. Even if classes perform the same actions, the details should have an impact on the metric value.
Nonequivalence of interaction
m(P) = m(Q) does not imply m(P+R) = m(Q+R), where R is a class that interacts with them
Interaction increases complexity
m(P) + m(Q) < m(P+Q)
When two classes are combined, the interaction between them also adds to the metric value
Conclusion: not every measurement is a metric
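The properties can be probed on a toy metric. The sketch below is only an illustration, not part of the original slides: it assumes a class is modelled as a set of method names, that "P+Q" is set union, and that the metric m simply counts methods. All names are hypothetical.

```python
# Toy model (assumption): a class is a frozenset of method names,
# "P + Q" is set union, and the metric m counts methods.
def m(cls):
    return len(cls)

def combine(p, q):
    return p | q

def monotonic(p, q):
    # m(P) <= m(P+Q) and m(Q) <= m(P+Q)
    return m(p) <= m(combine(p, q)) and m(q) <= m(combine(p, q))

def interaction_increases(p, q):
    # m(P) + m(Q) < m(P+Q)
    return m(p) + m(q) < m(combine(p, q))

P = frozenset({"open", "close"})
Q = frozenset({"read", "close"})
R = frozenset({"open", "write"})

print(monotonic(P, Q))                        # monotonicity for this pair
print(m(P) == m(Q))                           # two distinct classes, same value
print(m(combine(P, R)) == m(combine(Q, R)))   # nonequivalence of interaction
print(interaction_increases(P, Q))            # fails for a plain method count
```

Under these assumptions a plain method count is monotonic but can never satisfy "interaction increases complexity", which is one way to read the conclusion that not every measurement is a metric.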
Slide11
Selecting Metrics
Fast
Scalable: we cannot afford log(n²) when n ≥ 1 million LOC (lines of code)
Precise
(e.g., #methods: does it count all methods, only public ones, also inherited ones?)
Code-based
Scalable: we want to be able to collect the metrics in the same amount of time
Simple
Complex metrics are hard to interpret
Slide12
Assessing Maintainability
Size of the system and of its entities
Class size, method size, inheritance
The size of an entity influences maintainability
Cohesion of the entities
Class internals
A change should stay local to one class
Coupling between entities
Within inheritance: coupling between class and subclass
Outside of inheritance
Strong coupling affects how local a change to a class remains
Slide13
Sample Size and Inheritance Metrics
[Diagram: metamodel with the entities Class, Attribute, Method and the relations Inherit, BelongTo, Access, Invoke]
Inheritance Metrics
hierarchy nesting level (HNL)
# immediate children (NOC)
# inherited methods, unmodified (NMI)
# overridden methods (NMO)
Class Size Metrics
# methods (NOM)
# instance attributes (NIA, NCA)
# Sum of method size (WMC)
Method Size Metrics
# invocations (NOI)
# statements (NOS)
# lines of code (LOC)
Slide14
Sample Class Size
(NIV) [Lore94] Number of Instance Variables
(NCV) [Lore94] Number of Class Variables (static)
(NOM) [Lore94] Number of Methods (public, private, protected)
(LOC) Lines of Code
(NSC) Number of Semicolons [Li93], i.e., number of statements
(WMC) [Chid94] Weighted Method Count
WMC = ∑ c_i
where c_i is the complexity of method i (number of exits or McCabe Cyclomatic Complexity Metric)
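A minimal sketch of WMC, assuming the per-method complexity c_i is approximated as 1 plus the number of branching constructs (a simplification of McCabe's metric). It analyses Python source through the standard ast module purely for illustration; the class Account and the function names are hypothetical.

```python
import ast

# Branching constructs counted by this simplified McCabe estimate (assumption).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler)

def cyclomatic(func):
    # c_i = 1 + number of branching nodes inside the method
    return 1 + sum(isinstance(n, BRANCH_NODES) for n in ast.walk(func))

def wmc(source, class_name):
    # WMC = sum of c_i over the methods defined directly in the class
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            methods = [n for n in node.body if isinstance(n, ast.FunctionDef)]
            return sum(cyclomatic(f) for f in methods)
    raise ValueError(f"class {class_name!r} not found")

SAMPLE = '''
class Account:
    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balance += amount

    def total(self, items):
        s = 0
        for i in items:
            if i > 0:
                s += i
        return s
'''

print(wmc(SAMPLE, "Account"))  # deposit contributes 2, total contributes 3
```

Setting every c_i to 1 would reduce WMC to NOM, which is why the choice of complexity measure has to be stated alongside any reported WMC value.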
Slide15
Hierarchy Layout
(HNL) [Chid94] Hierarchy Nesting Level, (DIT) [Li93] Depth of Inheritance Tree
HNL, DIT = max hierarchy level
(NOC) [Chid94] Number of Children
(WNOC) Total number of Children
(NMO, NMA, NMI, NME) [Lore94] Number of Methods Overridden, Added, Inherited, Extended (super call)
(SIX) [Lore94]
SIX(C) = NMO * HNL / NOM
Weighted percentage of overridden methods
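The SIX formula can be sketched as a one-line function. The input values below are made up for illustration, and returning 0.0 for a class without methods is an assumption of this sketch, not something the slides specify.

```python
def six(nmo, hnl, nom):
    """SIX(C) = NMO * HNL / NOM; 0.0 for a class without methods (assumption)."""
    return nmo * hnl / nom if nom else 0.0

# Hypothetical class: 12 methods, 3 of them overridden, nested 2 levels deep.
print(six(nmo=3, hnl=2, nom=12))  # 0.5
```

Because HNL acts as a weight, the same share of overridden methods counts more heavily the deeper the class sits in the hierarchy.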
Slide16
Method Size
(MSG) Number of Message Sends
(LOC) Lines of Code
(MCX) Method Complexity
Total complexity / total number of methods
API calls = 5, assignment = 0.5, arithmetic ops = 2, messages with params = 3, ...
Slide17
Sample Metrics: Class Cohesion
(LCOM) Lack of Cohesion in Methods
[Chidamber 94] for definition
[Hitz 95] for critique
Ii = set of instance variables used by method Mi
let P = { (Ii, Ij) | Ii ∩ Ij = ∅ }
Q = { (Ii, Ij) | Ii ∩ Ij ≠ ∅ }
if all the sets are empty, P is empty
LCOM = |P| - |Q| if |P| > |Q|, 0 otherwise
Tight Class Cohesion (TCC)
Loose Class Cohesion (LCC)
[Bieman 95] for definition
Measure method cohesion across invocations
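The LCOM definition above can be sketched directly. The mapping from methods to the instance variables they use is assumed to be given (extracting it from real code is a separate problem), and the example class is hypothetical.

```python
from itertools import combinations

def lcom(uses):
    """uses maps a method name to the set of instance variables it uses."""
    sets = list(uses.values())
    if not any(sets):
        return 0  # all sets empty: P is taken to be empty
    p = q = 0
    for a, b in combinations(sets, 2):
        if a & b:
            q += 1  # pair shares at least one instance variable
        else:
            p += 1  # pair shares nothing
    return p - q if p > q else 0

# Hypothetical class with four methods
uses = {
    "deposit": {"balance"},
    "withdraw": {"balance"},
    "rename": {"owner"},
    "audit": {"log"},
}
print(lcom(uses))  # 5 disjoint pairs, 1 sharing pair
```

In this example only deposit and withdraw share a variable, so the value is driven by the three unrelated responsibilities packed into one class.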
Slide18
Sample Metrics: Class Coupling (i)
Coupling Between Objects (CBO)
[Chidamber 94a] for definition, [Hitz 95a] for a discussion
Number of other classes to which it is coupled
Data Abstraction Coupling (DAC)
[Li 93] for definition
Number of ADTs defined in a class
Change Dependency Between Classes (CDBC)
[Hitz 96a] for definition
Impact of changes from a server class (SC) to a client class (CC).
Slide19
Sample Metrics: Class Coupling (ii)
Locality of Data (LD)
[Hitz 96] for definition
LD = ∑ |Li| / ∑ |Ti|
Li = non-public instance variables + inherited protected variables of the superclass + static variables of the class
Ti = all variables used in Mi, except non-static local variables
Mi = methods without accessors
Slide20
The Trouble with Coupling and Cohesion
Coupling and cohesion are intuitive notions
Cf. "computability"
E.g., is a library of mathematical functions "cohesive"?
E.g., is a package of classes that subclass framework classes cohesive? Is it strongly coupled to the framework package?
Slide21
Conclusion: Metrics for Quality Assessment
Can internal product metrics reveal which components have good/poor quality?
Yes, but...
Not reliable
false positives: “bad” measurements, yet good quality
false negatives: “good” measurements, yet poor quality
Heavyweight Approach
Requires team to develop (customize?) a quantitative quality model
Requires definition of thresholds (trial and error)
Difficult to interpret
Requires complex combinations of simple metrics
However...
Cheap once you have the quality model and the thresholds
Good focus (± 20% of components are selected for further inspection)
Note: focus on the most complex components first!
Slide22
Topics
Metrics
Object-Oriented Metrics in Practice
Detection strategies, filters and composition
Sample detection strategies: God Class …
Code Duplication
Slide23
Detection strategy
A detection strategy is a metrics-based predicate to identify candidate software artifacts that conform to (or violate) a particular design rule
Slide24
Filters and composition
A data filter is a predicate used to focus attention on a subset of interest of a larger data set
Statistical filters
I.e., top and bottom 25% are considered outliers
Other relative thresholds
I.e., other percentages to identify outliers (e.g., top 10%)
Absolute thresholds
I.e., fixed criteria, independent of the data set
A useful detection strategy can often be expressed as a composition of data filters
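Composition of filters can be sketched with set operations. This is not one of the detection strategies from the slides: the metric names, the measurements and the thresholds are all hypothetical, and the only point is that a relative filter and an absolute filter combine through intersection.

```python
def top_fraction(values, fraction):
    """Relative filter: names whose value lies in the top `fraction` of the data set."""
    ranked = sorted(values, key=values.get, reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    return set(ranked[:keep])

def at_least(values, threshold):
    """Absolute filter: names whose value reaches a fixed threshold."""
    return {name for name, v in values.items() if v >= threshold}

# Hypothetical measurements per class
wmc = {"Order": 12, "Invoice": 9, "Manager": 95, "Util": 40,
       "Point": 3, "Parser": 60, "Log": 7, "Cache": 15}
foreign_data = {"Order": 1, "Invoice": 0, "Manager": 9, "Util": 2,
                "Point": 0, "Parser": 1, "Log": 0, "Cache": 3}

# Composition: complex classes (top 25% by WMC) that also use a lot of foreign data
candidates = top_fraction(wmc, 0.25) & at_least(foreign_data, 5)
print(sorted(candidates))
```

The result is a list of candidates for manual inspection, in line with the earlier warning about false positives and false negatives.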
Slide25
God Class
A God Class centralizes intelligence in the system
Impacts understandability
Increases system fragility
Slide26
Feature Envy
Methods that are more interested in data of other classes than their own [Fowler et al. 99]
Slide27
Data Class
A Data Class provides data to other classes but little or no functionality of its own
Slide28
Data Class (2)
Slide29
Shotgun Surgery
A change in an operation implies many (small) changes to a lot of different operations and classes
Slide30
Topics
Metrics
Object-Oriented Metrics in Practice
Code Duplication
Detection techniques
Visualizing duplicated code
Slide31
Copied Code
Example from the Mozilla distribution (Milestone 9)
Taken from /dom/src/base/nsLocation.cpp
Slide32
How much code is duplicated?
Usual estimate: 8 to 12% of the code
Example           LOC       Duplication without comments   With comments
gcc               460'000   8.7%                           5.6%
Database Server   245'000   36.4%                          23.3%
Payroll           40'000    59.3%                          25.4%
Message Board     6'500     29.4%                          17.4%
Slide33
What is duplicated code?
Duplicated code = a piece of program source code that is found in another place within the same system
In different files
In the same file but in a different method
In the same method
The pieces must have the same logic or structure, so that they can be abstracted
Slide34
Problems with duplicated code
It usually has a negative effect
Code bloat
Negative effect on system and software maintenance
Copying spreads defects through the code
Software aging, "hardening of the arteries"
"Software entropy" increases: even small design changes become very difficult to effect
Slide35
Detecting duplicated code
Nontrivial problem:
No a priori knowledge about which code has been copied
How to find all clone pairs among all possible pairs of segments?
Slide36
General Schema of Detection Process
Author            Level         Transformed Code        Comparison Technique
Johnson 94        Lexical       Substrings              String-Matching
Ducasse 99        Lexical       Normalized Strings      String-Matching
Baker 95          Syntactical   Parameterized Strings   String-Matching
Mayrand 96        Syntactical   Metric Tuples           Discrete comparison
Kontogiannis 97   Syntactical   Metric Tuples           Euclidean distance
Baxter 98         Syntactical   AST                     Tree-Matching
Slide37
Recall and Precision
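Only the title of this slide survives in the transcript, so the sketch below assumes the usual definitions: recall is the share of actual clones that a detector reports, and precision is the share of reported clones that are actual clones. The clone pairs are made up, and returning 1.0 for empty sets is a convention of this sketch.

```python
def recall_precision(reported, actual):
    """Return (recall, precision) for a set of reported clone pairs."""
    true_positives = len(reported & actual)
    recall = true_positives / len(actual) if actual else 1.0
    precision = true_positives / len(reported) if reported else 1.0
    return recall, precision

# Hypothetical clone pairs: (file, line, file, line)
actual = {("a.c", 10, "b.c", 40), ("a.c", 80, "c.c", 5),
          ("b.c", 7, "c.c", 90), ("d.c", 1, "d.c", 50)}
reported = {("a.c", 10, "b.c", 40), ("a.c", 80, "c.c", 5),
            ("x.c", 1, "y.c", 1)}

recall, precision = recall_precision(reported, actual)
print(recall, precision)  # 2 of 4 actual clones found, 2 of 3 reports correct
```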
Slide387.38
…//assign same fastid as containerfastid = NULL;const char* fidptr = get_fastid();if(fidptr != NULL) {
int l = strlen(fidptr);
fastid = newchar[ l + 1 ];
…
fastid=NULL;
constchar*fidptr=get_fastid();
if(fidptr!=NULL)
intl=strlen(fidptr)
fastid = newchar[l+]Simple Detection Approach (i) Assumption: Code segments are just copied and changed at a few places Noise elimination transformation remove white space, comments remove lines that contain uninteresting code elements (e.g., just ‘else’ or ‘}’)
Slide39
Simple Detection Approach (ii)
Code comparison step
Line-based comparison (assumption: layout did not change during copying)
Compare each line with each other line
Reduce search space by hashing:
Preprocessing: compute the hash value for each line
Actual comparison: compare all lines in the same hash bucket
Evaluation of the approach
Advantages: Simple, language independent
Disadvantages: Difficult interpretation
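The two steps of the simple approach (noise elimination, then hashing) can be sketched as follows. This is not the Perl script mentioned in the slides: the noise list is an assumption, only `//` comments are stripped, and a dictionary keyed by the normalized line stands in for the hash buckets.

```python
import re
from collections import defaultdict

NOISE = {"", "else", "{", "}", "};"}  # assumed list of uninteresting lines

def normalize(line):
    line = re.sub(r"//.*", "", line)   # drop // comments (block comments not handled)
    return re.sub(r"\s+", "", line)    # drop all white space

def duplicated_lines(files):
    """files: name -> source text. Returns normalized line -> [(file, line_no), ...]."""
    buckets = defaultdict(list)        # dict keys play the role of hash buckets
    for name, text in files.items():
        for no, raw in enumerate(text.splitlines(), start=1):
            key = normalize(raw)
            if key not in NOISE:
                buckets[key].append((name, no))
    return {k: v for k, v in buckets.items() if len(v) > 1}

files = {
    "a.cpp": "fastid = NULL;\nconst char* fidptr = get_fastid();\nif (x) {\n}\n",
    "b.cpp": "// copy\nfastid=NULL;\nconst char *fidptr = get_fastid();\nreturn 0;\n",
}
dup = duplicated_lines(files)
for line, places in dup.items():
    print(line, places)
```

Each line is looked up once in the dictionary, so the comparison avoids testing every line against every other line. The output is a flat list of matching lines, which illustrates why the slides call the result difficult to interpret.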
Slide40
A Perl script for C++ (i)
Slide41
A Perl script for C++ (ii)
Handles multiple files
Removes comments and white spaces
Controls noise (if, {,)
Granularity (number of lines)
Possible to remove keywords
Slide42
Output Sample
Lines:
create_property(pd,pnImplObjects,stReference,false,*iImplObjects);
create_property(pd,pnElttype,stReference,true,*iEltType);
create_property(pd,pnMinelt,stInteger,true,*iMinelt);
create_property(pd,pnMaxelt,stInteger,true,*iMaxelt);
create_property(pd,pnOwnership,stBool,true,*iOwnership);
Locations:
</face/typesystem/SCTypesystem.C>6178/6179/6180/6181/6182
</face/typesystem/SCTypesystem.C>6198/6199/6200/6201/6202
Lines:
create_property(pd,pnSupertype,stReference,true,*iSupertype);
create_property(pd,pnImplObjects,stReference,false,*iImplObjects);
create_property(pd,pnElttype,stReference,true,*iEltType);
create_property(pd,pMinelt,stInteger,true,*iMinelt);
create_property(pd,pnMaxelt,stInteger,true,*iMaxelt);
Locations:
</face/typesystem/SCTypesystem.C>6177/6178
</face/typesystem/SCTypesystem.C>6229/6230
Lines = duplicated lines
Locations = file names and line numbers
Slide43
Enhanced Simple Detection Approach
Code comparison step
As before, but now
Collect consecutive matching lines into match sequences
Allow holes in the match sequence
Evaluation of the approach
Advantages
Identifies more real duplication, language independent
Disadvantages
Less simple
Misses copies with (small) changes on every line
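Collecting matches into sequences with holes can be sketched as a walk along the diagonals of the line-by-line comparison. The function below is an illustrative brute-force version, not the algorithm of any particular tool; `min_matches` and `max_hole` are hypothetical parameters, and the inputs are assumed to be already normalized lines.

```python
def match_sequences(a, b, min_matches=3, max_hole=1):
    """Return (start_a, start_b, length, matches) for runs of matching lines."""
    found, seen = [], set()
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j] or (i, j) in seen:
                continue
            k = matches = hole = last = 0
            # Follow the diagonal while consecutive mismatches stay within max_hole
            while i + k < len(a) and j + k < len(b) and hole <= max_hole:
                if a[i + k] == b[j + k]:
                    matches += 1
                    hole = 0
                    last = k
                    seen.add((i + k, j + k))
                else:
                    hole += 1
                k += 1
            if matches >= min_matches:
                found.append((i, j, last + 1, matches))
    return found

a = ["x=1;", "y=2;", "z=3;", "w=4;", "v=5;"]
b = ["q=0;", "x=1;", "y=9;", "z=3;", "w=4;"]
print(match_sequences(a, b))              # one sequence with a one-line hole
print(match_sequences(a, b, max_hole=0))  # nothing long enough without holes
```

The edited line in the middle splits the copy into two short exact runs. Allowing a hole joins them into one sequence, which is how the enhanced approach finds more real duplication.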
Slide44
Abstraction
Abstracting selected syntactic elements can increase recall, at the possible cost of precision
Slide45
Metrics-based detection strategy
Duplication is significant if:
It is the largest possible duplication chain uniting all exact clones that are close enough to each other.
The duplication is large enough.
Slide46
Automated detection in practice
Wettel [MSc thesis, 2004] uses three thresholds:
Minimum clone length: the minimum number of lines present in a clone (e.g., 7)
Maximum line bias: the maximum number of lines in between two exact chunks (e.g., 2)
Minimum chunk size: the minimum number of lines of an exact chunk (e.g., 3)
Mihai Balint, Tudor Gîrba and Radu Marinescu, "How Developers Copy," ICPC 2006
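One possible reading of how the three thresholds interact is sketched below. It is an interpretation for illustration only, not the implementation from the cited work: exact chunks are assumed to be given as sorted (start_line, length) pairs along one comparison diagonal, and the clone length is assumed to include the lines in between chained chunks.

```python
def significant_clones(chunks, min_clone_length=7, max_line_bias=2, min_chunk_size=3):
    """Chain exact chunks that are close enough, then keep chains that are long enough."""
    chains, current = [], None
    for start, length in chunks:
        if length < min_chunk_size:
            continue                      # chunk too small to count
        if current and start - (current[0] + current[1]) <= max_line_bias:
            # Gap is small enough: extend the current chain up to this chunk's end
            current = (current[0], start + length - current[0])
        else:
            if current:
                chains.append(current)
            current = (start, length)
    if current:
        chains.append(current)
    return [c for c in chains if c[1] >= min_clone_length]

# Hypothetical exact chunks: (start_line, length)
chunks = [(10, 4), (16, 3), (30, 2), (40, 5)]
print(significant_clones(chunks))
```

With the example thresholds from the slide, the first two chunks are separated by two lines and merge into one chain of nine lines, while the isolated five-line chunk is discarded as too short.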
Slide47
Visualization of Duplicated Code
Visualization provides insights into the duplication situation
A simple version can be implemented in three days
Scalability issue
Dotplots — Technique from DNA Analysis
Code is put on vertical as well as horizontal axis
A match between two elements is a dot in the matrix
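A text-only dotplot can be sketched in a few lines. The input is assumed to be a list of already normalized source lines, and the sample lines are made up; a real tool would draw pixels rather than characters.

```python
def dotplot(a, b):
    """Rows are lines of a, columns are lines of b; '*' marks a pair of equal lines."""
    return ["".join("*" if x == y else "." for y in b) for x in a]

lines = ["init();", "load();", "save();", "init();", "load();"]
for row in dotplot(lines, lines):
    print(row)
```

Comparing a file with itself always yields the main diagonal. Any additional diagonal run of dots marks a copied sequence, here the repeated init/load pair. The matrix has one cell per pair of lines, which hints at the scalability issue noted on the slide.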
Slide48
Visualization of Copied Code Sequences
Detected Problem
File A contains two copies of a piece of code
File B contains another copy of this code
Possible Solution
Extract Method
All examples are made using Duploc from an industrial case study (1 Mio LOC C++ System)
Slide49
Visualization of Repetitive Structures
Detected Problem
4 object factory clones: a switch statement over a type variable is used to call individual construction code
Possible Solution
Strategy Method
Slide50
Visualization of Cloned Classes
[Dotplot with Class A and Class B on both axes]
Detected Problem:
Class A is an edited copy of class B. Editing & Insertion
Possible Solution
Subclassing …
Slide51
Visualization of Clone Families
20 classes implementing lists for different data types
[Figure panels: Overview, Detail]
Slide52
Conclusions
Duplicated code is a real problem
It makes a system progressively harder to change
Detecting duplicated code is a hard problem
Some simple techniques can help
Tool support is also needed
Visualization of the code is very useful
Curing duplicated code is a topic for research