Slide1
Extensions to Structure Layout Optimizations in the Open64 Compiler
Michael Lai
AMD
Slide2
Related Work
Structure splitting, structure peeling, structure field reordering (Hagog & Tice; Hundt, Mannarswamy & Chakrabarti)
Above implemented in the Open64 Compiler (Chakrabarti & Chow)
Structure instance interleaving (Truong, Bodin & Seznec)
Data splitting (Curial, Zhao & Amaral)
Array reshaping (Zhao, Cui, Gao, Silvera & Amaral)
Slide3
Current Framework
[Diagram of the compilation flow. Labels: source, frontend, WHIRL, ipl, .o (one chain per source file, three shown), then ipa_link and WHIRL.]
Slide4
Instance Interleaving
a[0].field_1
a[0].field_2
a[0].field_3
(an “instance” of the structure)
a[1].field_1
a[1].field_2
a[1].field_3
(another “instance” of the structure)
...
Slide5
Instance Interleaving
a[0].field_1
a[1].field_1
…
(field_1 of all the instances are interleaved together)
a[0].field_2
a[1].field_2
…
(field_2 of all the instances are interleaved together)
a[0].field_3
a[1].field_3
...
(field_3 of all the instances are interleaved together)
Slide6
Instance Interleaving
Original layout:
array[0].field_1, array[0].field_2, array[0].field_3, …, array[0].field_m
array[1].field_1, array[1].field_2, array[1].field_3, …, array[1].field_m
…
array[n-1].field_1, array[n-1].field_2, array[n-1].field_3, …, array[n-1].field_m
Interleaved layout:
array[0].field_1, array[1].field_1, array[2].field_1, …, array[n-1].field_1
array[0].field_2, array[1].field_2, array[2].field_2, …, array[n-1].field_2
…
array[0].field_m, array[1].field_m, array[2].field_m, …, array[n-1].field_m
Slide7
Implementation
Profitability analysis (done in ipl)
During ipl compilation of each source file, access patterns of structure fields are analyzed and their usage statistics recorded
After all the functions have been compiled by ipl, the “most likely to benefit” structure (if any) is marked and passed to ipo
(By way of illustration, the ideal structure is one with many fields, each of which appears in its own hot loop)
Slide8
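As a rough illustration of the "ideal structure" described under profitability analysis, the following C sketch shows a record whose fields are each read by a separate loop. The type `particle`, its fields, and the helper names are hypothetical and not taken from the slides; the point is only that each loop uses a small part of every instance it fetches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type; the field names are illustrative only. */
struct particle {
    double x;
    double y;
    double mass;
    int    id;
};

/* Hot loop 1: reads only field x of every instance. */
double sum_x(const struct particle *p, size_t n) {
    assert(p != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += p[i].x;
    return s;
}

/* Hot loop 2: reads only field mass of every instance. */
double sum_mass(const struct particle *p, size_t n) {
    assert(p != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += p[i].mass;
    return s;
}

/* Fraction of each instance's bytes that a loop over x actually uses.
   With the original layout the rest of the instance shares the same
   cache lines but is never read by that loop. */
double bytes_used_fraction_x(void) {
    return (double)sizeof(double) / (double)sizeof(struct particle);
}
```

A structure accessed this way is the kind of candidate the profitability pass is described as looking for, since interleaving would let each loop walk a dense run of a single field.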
Implementation
Legality analysis (done in ipo)
Usual checking for address taken, escaped types, etc.
Code transformation (done in ipo)
Create internal pointers ptr_1, ptr_2, …, ptr_m to keep track of the m locations array[0].field_1, array[0].field_2, …, array[0].field_m
Rewrite array[i].field_j to ptr_j[i], if “i” is known; otherwise, incur additional overhead to compute “i”
Slide9
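The rewrite of array[i].field_j to ptr_j[i] can be sketched in C as follows. This is a hand-written approximation under stated assumptions, not the code Open64 generates: the structure `rec` and all helper names are hypothetical, each field run is allocated separately for simplicity (the slides show the runs laid out back to back), and the way "i" is recovered from a pointer is only one plausible reading of the "additional overhead" case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical original structure with m = 3 fields. */
struct rec {
    long   field_1;
    double field_2;
    int    field_3;
};

/* After interleaving: one contiguous run per field, tracked by ptr_j. */
struct interleaved {
    long   *ptr_1;  /* plays the role of &array[0].field_1 */
    double *ptr_2;  /* plays the role of &array[0].field_2 */
    int    *ptr_3;  /* plays the role of &array[0].field_3 */
    size_t  n;      /* number of instances */
};

/* Allocate the three runs. Returns 0 on success, -1 on failure. */
int interleaved_alloc(struct interleaved *s, size_t n) {
    s->ptr_1 = calloc(n, sizeof *s->ptr_1);
    s->ptr_2 = calloc(n, sizeof *s->ptr_2);
    s->ptr_3 = calloc(n, sizeof *s->ptr_3);
    s->n = n;
    if (!s->ptr_1 || !s->ptr_2 || !s->ptr_3) {
        free(s->ptr_1);
        free(s->ptr_2);
        free(s->ptr_3);
        return -1;
    }
    return 0;
}

void interleaved_free(struct interleaved *s) {
    free(s->ptr_1);
    free(s->ptr_2);
    free(s->ptr_3);
}

/* Case 1, "i" is known: array[i].field_2 becomes ptr_2[i]. */
double get_field_2(const struct interleaved *s, size_t i) {
    assert(i < s->n);
    return s->ptr_2[i];
}

/* Case 2, only a pointer to an instance is available. Assumption: the
   pointer is represented as a pointer into the field_1 run, so "i" is
   recovered by pointer subtraction before indexing another run. */
size_t index_of(const struct interleaved *s, const long *p) {
    return (size_t)(p - s->ptr_1);
}

double get_field_2_via_ptr(const struct interleaved *s, const long *p) {
    return s->ptr_2[index_of(s, p)];
}
```

The extra subtraction in `index_of` stands in for the overhead the slide mentions when the index is not directly available.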
Instance Interleaving
Original layout:
array[0].field_1, array[0].field_2, array[0].field_3, …, array[0].field_m
array[1].field_1, array[1].field_2, array[1].field_3, …, array[1].field_m
…
array[n-1].field_1, array[n-1].field_2, array[n-1].field_3, …, array[n-1].field_m
Interleaved layout:
array[0].field_1, array[1].field_1, array[2].field_1, …, array[n-1].field_1 (start = ptr_1)
array[0].field_2, array[1].field_2, array[2].field_2, …, array[n-1].field_2 (start = ptr_2)
…
array[0].field_m, array[1].field_m, array[2].field_m, …, array[n-1].field_m (start = ptr_m)
array[i].field_j becomes ptr_j[i]
Slide10
Array Remapping
Original layout (grouped by iteration):
iteration 0: field_1, field_2, field_3, …, field_m at a[0], a[1], a[2], …, a[m-1]
iteration 1: field_1, field_2, field_3, …, field_m at a[m], a[m+1], a[m+2], …, a[2m-1]
…
iteration n-1: field_1, field_2, field_3, …, field_m at a[(n-1)m], a[(n-1)m+1], a[(n-1)m+2], …, a[nm-1]
Remapped layout (grouped by field):
field_1 of iterations 0, 1, 2, …, n-1 at a[0], a[1], a[2], …, a[n-1]
field_2 of iterations 0, 1, 2, …, n-1 at a[n], a[n+1], a[n+2], …, a[2n-1]
…
field_m of iterations 0, 1, 2, …, n-1 at a[(m-1)n], a[(m-1)n+1], a[(m-1)n+2], …, a[mn-1]
Slide11
Implementation
Profitability analysis (done in ipl)
During ipl compilation of each source file, discover if there are arrays that behave like structures and suffer poor data cache utilization at the same time
After all the functions have been compiled by ipl, the “most likely to benefit” arrays (if any) are marked and passed to ipo
For each of these arrays, record the stride, group size, and array size associated with it
Slide12
Implementation
Legality analysis (done in ipo)
Check for array aliasing, address taken, argument passing, etc.
Code transformation (done in ipo)
Construct the array remapping permutation alpha(i) = (i % m) * n + (i / m), where m is the group size, n is the number of such groups, and i / m is integer division
Rewrite a[i] to a[alpha(i)]
Slide13
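The permutation alpha(i) = (i % m) * n + (i / m) can be tried out with a short C sketch. The function names `alpha`, `remap`, and `sum_field_remapped` are illustrative choices, not names from the compiler, and the explicit copy in `remap` is only for demonstration; the slides describe rewriting the index expressions rather than copying data.

```c
#include <assert.h>
#include <stddef.h>

/* alpha(i) = (i % m) * n + (i / m), with m the group size and n the
   number of groups; i / m is integer division. */
size_t alpha(size_t i, size_t m, size_t n) {
    assert(m > 0 && n > 0 && i < m * n);
    return (i % m) * n + (i / m);
}

/* Copy src (original layout) into dst (remapped layout).
   Both arrays hold m * n elements and must not overlap. */
void remap(const double *src, double *dst, size_t m, size_t n) {
    for (size_t i = 0; i < m * n; i++)
        dst[alpha(i, m, n)] = src[i];
}

/* Sum "field" k over all n groups of a remapped array. The original
   access a[g * m + k] has stride m; after remapping it becomes
   a[alpha(g * m + k)] = a[k * n + g], which has stride 1 in g. */
double sum_field_remapped(const double *a, size_t k, size_t m, size_t n) {
    assert(k < m);
    double s = 0.0;
    for (size_t g = 0; g < n; g++)
        s += a[alpha(g * m + k, m, n)];
    return s;
}
```

For example, with m = 3 and n = 4, original element 1 (field_2 of iteration 0) moves to position 4, the start of the field_2 run, and original element 3 (field_1 of iteration 1) moves to position 1.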
Array Remapping
Original layout (grouped by iteration):
iteration 0: field_1, field_2, field_3, …, field_m at a[0], a[1], a[2], …, a[m-1]
iteration 1: field_1, field_2, field_3, …, field_m at a[m], a[m+1], a[m+2], …, a[2m-1]
…
iteration n-1: field_1, field_2, field_3, …, field_m at a[(n-1)m], a[(n-1)m+1], a[(n-1)m+2], …, a[nm-1]
Remapped layout (grouped by field):
field_1 of iterations 0, 1, 2, …, n-1 at a[0], a[1], a[2], …, a[n-1]
field_2 of iterations 0, 1, 2, …, n-1 at a[n], a[n+1], a[n+2], …, a[2n-1]
…
field_m of iterations 0, 1, 2, …, n-1 at a[(m-1)n], a[(m-1)n+1], a[(m-1)n+2], …, a[mn-1]
a[i] becomes a[(i%m)*n+(i/m)]
Slide14
Performance Results
AMD system
Benchmark (optimization)              speed (1-copy) run        rate (12-copy) run
462.libquantum (structure peeling)    +6.35%                    +43.43%
429.mcf (instance interleaving)       +2.43%                    +38.38%
470.lbm (array remapping)             -16.35% (degradation)     +138.55%

Intel system
Benchmark (optimization)              speed (1-copy) run        rate (4-copy) run
462.libquantum (structure peeling)    +7.01%                    +24.30%
429.mcf (instance interleaving)       -6.04% (degradation)      +34.62%
470.lbm (array remapping)             -23.28% (degradation)     +119.51%
Slide15
Future Work
Integrate existing structure layout optimizations with the new structure instance interleaving work
Combine profitability heuristics of all structure layout optimizations
Extend structure instance interleaving optimization to more than one structure
Extend array remapping optimization to multi-dimensional arrays
Slide16
References
G. Chakrabarti and F. Chow. “Structure Layout Optimizations in the Open64 Compiler.” Proceedings of the Open64 Workshop, Boston, 2008.
M. Hagog and C. Tice. “Cache Aware Data Layout Reorganization Optimization in gcc.” Proceedings of the gcc Developers Summit, 2005.
R. Hundt, S. Mannarswamy, and D.R. Chakrabarti. “Practical Structure Layout Optimization and Advice.” Proceedings of the International Symposium on Code Generation and Optimization, New York, 2006.
D.N. Truong, F. Bodin, and A. Seznec. “Improving Cache Behavior of Dynamically Allocated Data Structures.” Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Washington D.C., 1998.