Uploaded On 2016-05-15

Slide1

Extensions to Structure Layout Optimizations in the Open64 Compiler

Michael Lai

AMD

Slide2

Related Work

Structure splitting, structure peeling, structure field reordering (Hagog & Tice; Hundt, Mannarswamy & Chakrabarti)

Above implemented in the Open64 Compiler (Chakrabarti & Chow)

Structure instance interleaving (Truong, Bodin & Seznec)

Data splitting (Curial, Zhao & Amaral)

Array reshaping (Zhao, Cui, Gao, Silvera & Amaral)

Slide3

Current Framework

[Diagram of the compilation flow. Labels: source, frontend, WHIRL, ipl, .o (three parallel pipelines, one per source file), then WHIRL and ipa_link.]

Slide4

Instance Interleaving

a[0].field_1

a[0].field_2 an “instance” of the structure

a[0].field_3

a[1].field_1

a[1].field_2 another “instance” of the structure

a[1].field_3

...

Slide5

Instance Interleaving

a[0].field_1    field_1 of all the instances are
a[1].field_1    interleaved together
…
a[0].field_2    field_2 of all the instances are
a[1].field_2    interleaved together
…
a[0].field_3    field_3 of all the instances are
a[1].field_3    interleaved together
...

Slide6

Instance Interleaving
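As a rough illustration of why interleaving can help, the C sketch below counts how many cache lines a loop touches when it reads only field_1 of every instance, first with the original layout and then with the interleaved one. The three double fields, the instance count N, the 64-byte cache line, the line-aligned base address, and the helper name lines_touched are all assumptions made for this example; none of them come from the slides.

```c
#include <assert.h>
#include <stddef.h>

#define N    1024   /* number of instances (assumed) */
#define LINE 64     /* cache line size in bytes (assumed) */

/* Hypothetical structure with three 8-byte fields. */
struct rec { double field_1, field_2, field_3; };

/* Count the distinct cache lines touched when one 8-byte field is read
   from n consecutive instances that are `stride` bytes apart.  The base
   address is assumed to be cache-line aligned, and accesses are made in
   increasing address order, so a new line is seen whenever the line
   index changes. */
static size_t lines_touched(size_t stride, size_t n)
{
    assert(stride > 0);
    size_t count = 0;
    size_t last = (size_t)-1;          /* no line seen yet */
    for (size_t i = 0; i < n; i++) {
        size_t line = (i * stride) / LINE;
        if (line != last) {
            count++;
            last = line;
        }
    }
    return count;
}
```

Under these assumptions, the original layout has a stride of sizeof(struct rec), 24 bytes on common ABIs, so lines_touched(24, N) is 384; the interleaved layout has a stride of sizeof(double), so lines_touched(8, N) is 128. The loop over field_1 pulls in a third as many cache lines because no line is shared with field_2 or field_3.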

Original layout (one instance after another):
array[0].field_1    array[0].field_2    array[0].field_3    …  array[0].field_m
array[1].field_1    array[1].field_2    array[1].field_3    …  array[1].field_m
…
array[n-1].field_1  array[n-1].field_2  array[n-1].field_3  …  array[n-1].field_m

Interleaved layout (one field after another):
array[0].field_1  array[1].field_1  array[2].field_1  …  array[n-1].field_1
array[0].field_2  array[1].field_2  array[2].field_2  …  array[n-1].field_2
…
array[0].field_m  array[1].field_m  array[2].field_m  …  array[n-1].field_m

Slide7

Implementation

Profitability analysis (done in ipl)

During ipl compilation of each source file, access patterns of structure fields are analyzed and their usage statistics recorded

After all the functions have been compiled by ipl, the “most likely to benefit” structure (if any) is marked and passed to ipo

(By way of illustration, the ideal structure is one with many fields, each of which appears in its own hot loop)

Slide8

Implementation

Legality analysis (done in ipo)

Usual checking for address taken, escaped types, etc.

Code transformation (done in ipo)

Create internal pointers ptr_1, ptr_2, …, ptr_m to keep track of the m locations array[0].field_1, array[0].field_2, …, array[0].field_m

Rewrite array[i].field_j to ptr_j[i], if “i” is known; otherwise, incur additional overhead to compute “i”

Slide9

Instance Interleaving
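The rewrite of array[i].field_j to ptr_j[i] can be sketched by hand in C. This is only an illustration of the idea, not the code Open64 generates: the structure node, the wrapper struct interleaved, and the helper functions are hypothetical, and all three fields are assumed to share one type so that a single block can hold them without alignment padding.

```c
#include <assert.h>
#include <stdlib.h>

/* Original layout: one instance after another (hypothetical structure). */
struct node { long field_1; long field_2; long field_3; };

/* Interleaved layout: one block holding all field_1 values, then all
   field_2 values, then all field_3 values.  ptr_1..ptr_3 play the role
   of the internal pointers; wrapping them in a struct is an assumption
   made to keep the example short. */
struct interleaved { long *ptr_1, *ptr_2, *ptr_3; };

/* Allocate storage for n instances; returns 0 on success, -1 on failure. */
static int interleaved_alloc(struct interleaved *t, size_t n)
{
    assert(n > 0);
    long *block = malloc(3 * n * sizeof *block);
    if (block == NULL)
        return -1;
    t->ptr_1 = block;           /* where array[0].field_1 now lives */
    t->ptr_2 = block + n;       /* where array[0].field_2 now lives */
    t->ptr_3 = block + 2 * n;   /* where array[0].field_3 now lives */
    return 0;
}

static void interleaved_free(struct interleaved *t)
{
    free(t->ptr_1);             /* ptr_1 is the start of the block */
}

/* Before the transformation: reads array[i].field_2. */
static long sum_field_2_original(const struct node *array, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += array[i].field_2;
    return sum;
}

/* After the transformation: the same access written as ptr_2[i]. */
static long sum_field_2_interleaved(const struct interleaved *t, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += t->ptr_2[i];
    return sum;
}
```

Both loops compute the same sum when the two layouts hold the same values. The second one walks a dense run of field_2 values, which is the case where the index “i” is known directly from the loop.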

Original layout (one instance after another):
array[0].field_1    array[0].field_2    array[0].field_3    …  array[0].field_m
array[1].field_1    array[1].field_2    array[1].field_3    …  array[1].field_m
…
array[n-1].field_1  array[n-1].field_2  array[n-1].field_3  …  array[n-1].field_m

Interleaved layout (one field after another):
array[0].field_1  array[1].field_1  array[2].field_1  …  array[n-1].field_1    (array[0].field_1 = ptr_1)
array[0].field_2  array[1].field_2  array[2].field_2  …  array[n-1].field_2    (array[0].field_2 = ptr_2)
…
array[0].field_m  array[1].field_m  array[2].field_m  …  array[n-1].field_m    (array[0].field_m = ptr_m)

array[i].field_j becomes ptr_j[i]

Slide10

Array Remapping

Original layout (array elements grouped by iteration):
              field_1      field_2      field_3      …  field_m
iteration 0   a[0]         a[1]         a[2]         …  a[m-1]
iteration 1   a[m]         a[m+1]       a[m+2]       …  a[2m-1]
…
iteration n-1 a[(n-1)m]    a[(n-1)m+1]  a[(n-1)m+2]  …  a[nm-1]

Remapped layout (array elements grouped by field):
          iteration 0  iteration 1  iteration 2  …  iteration n-1
field_1   a[0]         a[1]         a[2]         …  a[n-1]
field_2   a[n]         a[n+1]       a[n+2]       …  a[2n-1]
…
field_m   a[(m-1)n]    a[(m-1)n+1]  a[(m-1)n+2]  …  a[mn-1]

Slide11

Implementation

Profitability analysis (done in ipl)

During ipl compilation of each source file, discover if there are arrays that behave like structures and suffer poor data cache utilization at the same time

After all the functions have been compiled by ipl, the “most likely to benefit” arrays (if any) are marked and passed to ipo

For each of these arrays, record the stride, group size, and array size associated with it

Slide12

Implementation

Legality analysis (done in ipo)

Check for array aliasing, address taken, argument passing, etc.

Code transformation (done in ipo)

Construct the array remapping permutation alpha(i) = (i % m) * n + (i / m), where m is the group size and n is the number of such groups

Rewrite a[i] to a[alpha(i)]

Slide13

Array Remapping
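The remapping permutation alpha(i) = (i % m) * n + (i / m) can be checked with a few lines of C. The function names alpha and remap and the element type are assumptions for this sketch. The explicit copy in remap is there only to make the permutation visible; the compiler transformation rewrites index expressions and does not copy data this way.

```c
#include <assert.h>
#include <stddef.h>

/* alpha(i) = (i % m) * n + (i / m), using integer division, where
   m is the group size and n is the number of groups. */
static size_t alpha(size_t i, size_t m, size_t n)
{
    assert(m > 0 && i < m * n);
    return (i % m) * n + (i / m);
}

/* Copy src (elements grouped by iteration) into dst (elements grouped
   by field).  Both arrays hold m * n elements and must not overlap. */
static void remap(const double *src, double *dst, size_t m, size_t n)
{
    for (size_t i = 0; i < m * n; i++)
        dst[alpha(i, m, n)] = src[i];
}
```

Writing i as g * m + k, with g the iteration and k the position inside the group, gives alpha(i) = k * n + g, so the element playing the role of field k+1 in iteration g moves into the k-th run of n elements, at offset g within it. With m = 3 and n = 4, for example, indices 0, 3, 6, 9 (the first element of each group) map to 0, 1, 2, 3.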

Original layout (array elements grouped by iteration):
              field_1      field_2      field_3      …  field_m
iteration 0   a[0]         a[1]         a[2]         …  a[m-1]
iteration 1   a[m]         a[m+1]       a[m+2]       …  a[2m-1]
…
iteration n-1 a[(n-1)m]    a[(n-1)m+1]  a[(n-1)m+2]  …  a[nm-1]

Remapped layout (array elements grouped by field):
          iteration 0  iteration 1  iteration 2  …  iteration n-1
field_1   a[0]         a[1]         a[2]         …  a[n-1]
field_2   a[n]         a[n+1]       a[n+2]       …  a[2n-1]
…
field_m   a[(m-1)n]    a[(m-1)n+1]  a[(m-1)n+2]  …  a[mn-1]

a[i] becomes a[(i%m)*n+(i/m)]

Slide14

Performance Results

AMD system                            speed (1-copy) run       rate (12-copy) run
462.libquantum (structure peeling)    +6.35%                   +43.43%
429.mcf (instance interleaving)       +2.43%                   +38.38%
470.lbm (array remapping)             -16.35% (degradation)    +138.55%

Intel system                          speed (1-copy) run       rate (4-copy) run
462.libquantum (structure peeling)    +7.01%                   +24.30%
429.mcf (instance interleaving)       -6.04% (degradation)     +34.62%
470.lbm (array remapping)             -23.28% (degradation)    +119.51%

Slide15

Future Work

Integrate existing structure layout optimizations with the new structure instance interleaving work

Combine profitability heuristics of all structure layout optimizations

Extend structure instance interleaving optimization to more than one structure

Extend array remapping optimization to multi-dimensional arrays

Slide16

References

G. Chakrabarti and F. Chow. “Structure Layout Optimizations in the Open64 Compiler.” Proceedings of the Open64 Workshop, Boston, 2008.

M. Hagog and C. Tice. “Cache Aware Data Layout Reorganization Optimization in gcc.” Proceedings of the gcc Developers Summit, 2005.

R. Hundt, S. Mannarswamy, and D.R. Chakrabarti. “Practical Structure Layout Optimization and Advice.” Proceedings of the International Symposium on Code Generation and Optimization, New York, 2006.

D.N. Truong, F. Bodin, and A. Seznec. “Improving Cache Behavior of Dynamically Allocated Data Structures.” Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Washington D.C., 1998.