### Presentation Text Content: Advanced Compiler Techniques

Advanced Compiler Techniques

LIU Xianhua, School of EECS, Peking University

Loops

Slide 2

Content

Concepts:
- Dominators
- Depth-First Ordering
- Back edges
- Graph depth
- Reducibility
- Natural Loops
- Efficiency of Iterative Algorithms
- Dependences & Loop Transformation

Slide 3

Loops are Important!

- Loops dominate program execution time.
  - They need special treatment during optimization.
- Loops also affect the running time of program analyses.
  - E.g., a dataflow problem can be solved in just a single pass if a program has no loops.

Slide 4

Dominators

- Node d dominates node n if every path from the entry to n goes through d; written as d dom n.
- Quick observations:
  - Every node dominates itself.
  - The entry dominates every node.
- Common cases:
  - The test of a while loop dominates all blocks in the loop body.
  - The test of an if-then-else dominates all blocks in either branch.

Slide 5

Dominator Tree

- Immediate dominance: d idom n if d dom n, d ≠ n, and there is no node m, distinct from d and n, such that d dom m and m dom n.
- Immediate dominance relationships form a tree rooted at the entry.

[Figure: a flow graph on nodes 1 to 5 and its dominator tree.]

Slide 6

Finding Dominators

A dataflow analysis problem: for each node, find all of its dominators.
- Direction: forward
- Confluence: set intersection
- Boundary: $\mathrm{OUT}[\mathrm{Entry}] = \{\mathrm{Entry}\}$
- Initialization: $\mathrm{OUT}[B] =$ all nodes (for $B \ne \mathrm{Entry}$)
- Equations:
  - $\mathrm{OUT}[B] = \mathrm{IN}[B] \cup \{B\}$
  - $\mathrm{IN}[B] = \bigcap_{p \in \mathrm{pred}(B)} \mathrm{OUT}[p]$
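A minimal Python sketch of this analysis, assuming the flow graph is given as a dictionary of successor lists. The function name `dominators` and the graph `GRAPH` are our own; the edges are hypothetical, chosen so that the final sets agree with the dominator sets in the Slide 7 example, but they are not claimed to be that slide's figure.

```python
# Iterative dominator analysis, following the slide's formulation:
# forward direction, confluence = set intersection,
# boundary OUT[entry] = {entry}, initialization OUT[B] = all nodes.

def dominators(succ, entry):
    """Return {node: set of its dominators}; succ maps each node to its successors."""
    nodes = list(succ)
    preds = {n: [] for n in nodes}
    for n in nodes:
        for s in succ[n]:
            preds[s].append(n)

    out = {n: set(nodes) for n in nodes}
    out[entry] = {entry}

    changed = True
    while changed:
        changed = False
        for b in nodes:
            if b == entry:
                continue
            # IN[B] = intersection of OUT[p] over all predecessors p of B
            in_b = set(nodes)
            for p in preds[b]:
                in_b &= out[p]
            new_out = in_b | {b}  # OUT[B] = IN[B] ∪ {B}
            if new_out != out[b]:
                out[b] = new_out
                changed = True
    return out


# Hypothetical example graph (not necessarily the slide's figure); entry = 1.
GRAPH = {1: [2, 4, 5], 2: [3], 3: [2], 4: [5], 5: [1, 2]}

for node, doms in sorted(dominators(GRAPH, 1).items()):
    print(node, sorted(doms))
```

The loop simply repeats passes until no OUT set changes; visiting nodes in DF order (see the later slides on depth) reduces the number of passes but does not change the result.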

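Slide 7

Example: Dominators

[Figure: the example flow graph on nodes 1 to 5, annotated with the dominator sets computed by the iterative algorithm. Final sets: D(1) = {1}, D(2) = {1,2}, D(3) = {1,2,3}, D(4) = {1,4}, D(5) = {1,5}.]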

Slide 8

Depth-First Search

- Start at the entry.
- If you can follow an edge to an unvisited node, do so.
- If not, backtrack to your parent (the node from which you were visited).

Slide 9

Depth-First Spanning Tree

- Root = entry.
- Tree edges are the edges along which we first visit the node at the head.

[Figure: a depth-first spanning tree (DFST) of the example flow graph on nodes 1 to 5.]

Slide 10

Depth-First Node Order

- The reverse of the order in which a DFS retreats from the nodes: 1-4-5-2-3.
- Equivalently, the reverse of a postorder traversal of the tree: the postorder is 3-2-5-4-1, so the DF order is 1-4-5-2-3.

[Figure: the example DFST on nodes 1 to 5.]
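A small Python sketch of computing DF order as reverse postorder. It assumes successor lists and explores children in list order (the "add children from the left" convention); the graph `GRAPH` is a hypothetical five-node example whose DF order happens to be the 1-4-5-2-3 used on the slide. The recursive search is fine for small graphs but would hit Python's recursion limit on very large ones.

```python
# DF order = reverse of the order in which DFS retreats from nodes (reverse postorder).

def depth_first_order(succ, entry):
    visited = set()
    postorder = []

    def dfs(n):
        visited.add(n)
        for s in succ[n]:
            if s not in visited:
                dfs(s)            # (n, s) becomes a tree edge of the DFST
        postorder.append(n)       # the search retreats from n here

    dfs(entry)
    return postorder, postorder[::-1]


# Hypothetical example graph; entry = 1.
GRAPH = {1: [2, 4, 5], 2: [3], 3: [2], 4: [5], 5: [1, 2]}
post, df_order = depth_first_order(GRAPH, 1)
print(post)      # [3, 2, 5, 4, 1]
print(df_order)  # [1, 4, 5, 2, 3]
```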

Slide 11

Four Kinds of Edges

- Tree edges.
- Advancing (forward) edges: non-tree edges from a node to a proper descendant.
- Retreating edges: from a node to an ancestor, including edges to self.
- Cross edges: between two nodes, neither of which is an ancestor of the other.

Slide 12

A Little Magic

- Of these edges, only retreating edges go from high to low in DF order.
  - Example of proof: you must retreat from the head of a tree edge before you can retreat from its tail, so the tail comes earlier in DF order.
- Also surprising: all cross edges go right to left in the DFST.
  - This assumes the children of each node are added to the tree from the left, in the order the search visits them.
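The classification can be done during the search itself. This Python sketch, assuming successor lists and a hypothetical five-node graph, labels each edge using whether its head has been visited and whether the search has already retreated from it; it then checks the "magic" property that exactly the retreating edges go from high to low in DF order.

```python
def classify_edges(succ, entry):
    """Label each edge reachable from entry as tree, advancing, retreating, or cross."""
    pre = {}          # order in which nodes are first visited
    finished = set()  # nodes the search has already retreated from
    kind = {}

    def dfs(u):
        pre[u] = len(pre)
        for v in succ[u]:
            if v not in pre:
                kind[(u, v)] = "tree"
                dfs(v)
            elif v not in finished:
                kind[(u, v)] = "retreating"  # v is an ancestor of u, or u itself
            elif pre[u] < pre[v]:
                kind[(u, v)] = "advancing"   # v is a proper descendant of u
            else:
                kind[(u, v)] = "cross"       # neither is an ancestor of the other
        finished.add(u)

    dfs(entry)
    return kind


# Hypothetical example graph; its DF order is 1-4-5-2-3.
GRAPH = {1: [2, 4, 5], 2: [3], 3: [2], 4: [5], 5: [1, 2]}
DF_NUMBER = {1: 1, 4: 2, 5: 3, 2: 4, 3: 5}
kinds = classify_edges(GRAPH, 1)

for (u, v), k in sorted(kinds.items()):
    direction = "high->low" if DF_NUMBER[u] >= DF_NUMBER[v] else "low->high"
    print(f"{u}->{v}: {k}, {direction}")
```

In this graph the cross edge 5 -> 2 runs from the right subtree (under 4) to the left subtree (under 2), as the slide predicts.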

Slide 13

Example: Non-Tree Edges

[Figure: the example DFST on nodes 1 to 5 with its non-tree edges labeled retreating, forward (advancing), and cross.]

Slide 14

Back Edges

- An edge is a back edge if its head dominates its tail.
- Theorem: every back edge is a retreating edge in every DFST of every flow graph.
  - The converse is almost always true, but not always.
- Proof sketch (from the figure): because the head dominates the tail, the search reaches the head before the tail in any DFST. The search must reach the tail before retreating from the head, so the tail is a descendant of the head and the back edge is a retreating edge.

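Slide 15

Example: Back Edges

[Figure: the example flow graph annotated with the dominator sets {1}, {1,2}, {1,2,3}, {1,4}, {1,5}; an edge is a back edge when its head is in the dominator set of its tail.]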

Slide 16

Reducible Flow Graphs

- A flow graph is reducible if every retreating edge in any DFST for that flow graph is a back edge.
- Testing reducibility: remove all back edges from the flow graph and check that the result is acyclic.
- Hint why it works: all cycles must include some retreating edge in every DFST.
  - In particular, the edge that enters the first node of the cycle that is visited.
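The reducibility test above can be sketched in Python as follows, assuming successor lists and that every node is reachable from the entry. The helper names are our own; back edges are found with the iterative dominator analysis, and acyclicity is checked with Kahn's topological sort. `GRAPH` is a hypothetical reducible example, and `NONREDUCIBLE` assumes the classic three-node shape (entry A with edges to both B and C, and a cycle between B and C).

```python
def dominators(succ, entry):
    nodes = list(succ)
    preds = {n: [m for m in nodes if n in succ[m]] for n in nodes}
    out = {n: set(nodes) for n in nodes}
    out[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in nodes:
            if b == entry:
                continue
            new = set(nodes)
            for p in preds[b]:
                new &= out[p]
            new.add(b)
            if new != out[b]:
                out[b], changed = new, True
    return out


def back_edges(succ, entry):
    """Edges tail -> head whose head dominates their tail."""
    dom = dominators(succ, entry)
    return {(t, h) for t in succ for h in succ[t] if h in dom[t]}


def is_acyclic(succ, removed):
    """Kahn's topological sort on the graph minus the removed edges."""
    indegree = {n: 0 for n in succ}
    for t in succ:
        for h in succ[t]:
            if (t, h) not in removed:
                indegree[h] += 1
    ready = [n for n in succ if indegree[n] == 0]
    seen = 0
    while ready:
        n = ready.pop()
        seen += 1
        for h in succ[n]:
            if (n, h) not in removed:
                indegree[h] -= 1
                if indegree[h] == 0:
                    ready.append(h)
    return seen == len(succ)


def is_reducible(succ, entry):
    return is_acyclic(succ, back_edges(succ, entry))


GRAPH = {1: [2, 4, 5], 2: [3], 3: [2], 4: [5], 5: [1, 2]}
NONREDUCIBLE = {"A": ["B", "C"], "B": ["C"], "C": ["B"]}
print(sorted(back_edges(GRAPH, 1)), is_reducible(GRAPH, 1))           # [(3, 2), (5, 1)] True
print(back_edges(NONREDUCIBLE, "A"), is_reducible(NONREDUCIBLE, "A"))  # set() False
```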

Slide 17

DFST on a Cycle

[Figure: a cycle in a flow graph. Depth-first search reaches one node of the cycle first; the search must reach the other nodes of the cycle before leaving it, so the edge that re-enters the first-reached node is a retreating edge.]

Slide 18

Why Reducibility?

- Folk theorem: all flow graphs in practice are reducible.
- Fact: if you use only while-loops, for-loops, repeat-loops, if-then(-else), break, and continue, then your flow graph is reducible.

Slide 19

Example: Remove Back Edges

[Figure: the example flow graph on nodes 1 to 5 with its back edges removed.]

The remaining graph is acyclic.

Slide 20

Example: Nonreducible Graph

[Figure: a flow graph with entry A whose nodes B and C form a cycle that can be entered at either B or C, shown with two possible DFSTs.]

- In any DFST, one of the edges between B and C will be a retreating edge.
- But no head dominates its tail, so deleting back edges leaves the cycle.

Slide 21

Why Care About Back/Retreating Edges?

- Visiting nodes in a proper order during an iterative algorithm ensures that the number of passes is limited by the number of "nested" back edges.
- The depth of nested loops is an upper bound on the number of nested back edges.

Slide 22

DF Order and Retreating Edges

- Suppose that for a reaching-definitions (RD) analysis, we visit nodes during each iteration in DF order.
- The fact that a definition d reaches a block propagates in one pass along any increasing sequence of blocks.
- When d arrives at the tail of a retreating edge, it is too late in that pass to propagate d from the OUT of the tail to the IN of the head.
  - The IN at the head has already been computed for that round.

Slide 23

Example: DF Order

[Figure: the example flow graph visited in DF order. Definition d is generated (Gen'd) by node 2; the annotations show which blocks d has reached after the first pass and after the second pass.]

Slide 24

Depth of a Flow Graph

- The depth of a flow graph with a given DFST and DF order is the greatest number of retreating edges along any acyclic path.
- For RD, if we visit nodes in DF order, we converge in depth+2 passes:
  - depth+1 passes to follow the depth+1 increasing segments of such a path;
  - 1 more pass to realize we have converged.

Slide 25

Example: Depth = 2

1 -> 4 -> 7 ---> 3 -> 10 -> 17 ---> 6 -> 18 -> 20

- 1 -> 4 -> 7: increasing (Pass 1)
- 7 ---> 3: retreating
- 3 -> 10 -> 17: increasing (Pass 2)
- 17 ---> 6: retreating
- 6 -> 18 -> 20: increasing (Pass 3)
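A Python simulation of this example, under simplifying assumptions of our own: the graph contains only the slide's path, node labels are taken to be DF numbers, and a single fact (like "definition d reaches here") is generated at node 1 and never killed. Nodes are visited in DF order on each pass, and the count includes the final pass that detects no change.

```python
def passes_to_converge(succ, order, gen):
    """Round-robin 'does the fact reach this node?' analysis, visiting nodes in
    `order` on every pass; returns the number of passes, including the last one
    that finds no change."""
    preds = {n: [] for n in order}
    for n, targets in succ.items():
        for t in targets:
            preds[t].append(n)
    out = {n: False for n in order}
    passes = 0
    while True:
        passes += 1
        changed = False
        for n in order:
            reaches = (n in gen) or any(out[p] for p in preds[n])  # OUT = GEN or IN
            if reaches != out[n]:
                out[n] = reaches
                changed = True
        if not changed:
            return passes


# The path from the slide; node labels are treated as DF numbers.
PATH = [1, 4, 7, 3, 10, 17, 6, 18, 20]
SUCC = {a: [b] for a, b in zip(PATH, PATH[1:])}
print(passes_to_converge(SUCC, sorted(PATH), gen={1}))  # 4 = depth (2) + 2
```

Passes 1 to 3 each cover one increasing segment, and pass 4 only confirms convergence.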

Slide 26

Similarly . . .

- Available expressions (AE) also converge in depth+2 passes.
  - Unavailability propagates along retreat-free node sequences in one pass.
- So do live variables (LV), if we use the reverse of DF order.
  - A use propagates backward, in one pass, along paths that do not use a retreating edge.

Slide 27

In General . . .

- The depth+2 bound works for any monotone framework, as long as information only needs to propagate along acyclic paths.
  - Example: if a definition reaches a point, it does so along an acyclic path.

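Slide 28

However . . .

Constant propagation does not have this property: information may have to travel around the same cycle several times.

```
L: a = b
   b = c
   c = 1
   goto L
```

Visiting the statements in order, the constant 1 assigned to c reaches b only on the second pass and a only on the third, and a fourth pass is needed to detect convergence, even though this loop has depth 1.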
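A Python sketch that simulates the slide's loop, assuming our own simplified model: the whole loop body is one block whose IN is the meet of the values arriving from before the loop (all UNDEF) and the body's OUT over the back edge. `UNDEF`/`NAC` follow the usual constant-propagation lattice; the function names are hypothetical.

```python
UNDEF, NAC = "UNDEF", "NAC"  # lattice top ("no value yet") and bottom ("not a constant")


def meet(x, y):
    if x == UNDEF:
        return y
    if y == UNDEF:
        return x
    return x if x == y else NAC


def propagate(body, variables):
    """Round-robin constant propagation over a single loop `L: body; goto L`.
    Each statement is (target, source), where source is a variable name or an int.
    Returns (number of passes, OUT of the loop body)."""
    at_entry = {v: UNDEF for v in variables}  # values flowing in from before the loop
    out = dict(at_entry)                      # OUT of the body, initially UNDEF
    passes = 0
    while True:
        passes += 1
        # IN at L = meet of the loop entry and the back edge from the body's end
        state = {v: meet(at_entry[v], out[v]) for v in variables}
        for target, source in body:
            state[target] = source if isinstance(source, int) else state[source]
        if state == out:
            return passes, state
        out = state


BODY = [("a", "b"), ("b", "c"), ("c", 1)]  # L: a = b; b = c; c = 1; goto L
print(propagate(BODY, "abc"))  # (4, {'a': 1, 'b': 1, 'c': 1})
```

The loop has depth 1, so the depth+2 bound would predict 3 passes; the simulation needs 4, and each additional copy in the chain would add one more.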

Slide 29

Why Depth+2 is Good

- Normal control-flow constructs produce reducible flow graphs in which the number of back edges along any acyclic path is at most the nesting depth of loops.
- Nesting depth tends to be small.
  - A study by Knuth found that the average depth of typical flow graphs is about 2.75.

Slide 30

Example: Nested Loops

[Figure: two flow graphs. Three nested while-loops: depth = 3. Three nested repeat-loops: depth = 1.]

Slide 31

Natural Loops

A natural loop is defined by:
- A single entry point called the header; the header dominates all nodes in the loop.
- A back edge that enters the loop header.
  - Otherwise, it is not possible for the flow of control to return to the header directly from the "loop"; i.e., there really is no loop.

Slide 32

Find Natural Loops

- The natural loop of a back edge a -> b is {b} plus the set of nodes that can reach a without going through b.
  - To find it, remove b from the flow graph and find all nodes that can reach a (the predecessors of a, their predecessors, and so on).
- Theorem: two natural loops are either disjoint, identical, or nested, when loops that share a header are combined into one.
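A Python sketch of the construction above, assuming successor lists. Instead of deleting b, it seeds the loop set with the header so that the backward search never walks past it. The graph `GRAPH` is a hypothetical five-node example whose back edges are 3 -> 2 and 5 -> 1.

```python
def natural_loop(succ, tail, head):
    """Natural loop of back edge tail -> head: {head} plus every node that can
    reach tail without going through head (found by walking predecessors)."""
    preds = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            preds[t].append(n)

    loop = {head}
    worklist = []
    if tail != head:
        loop.add(tail)
        worklist.append(tail)
    while worklist:
        m = worklist.pop()
        for p in preds[m]:
            if p not in loop:  # head is already in `loop`, so we never search past it
                loop.add(p)
                worklist.append(p)
    return loop


GRAPH = {1: [2, 4, 5], 2: [3], 3: [2], 4: [5], 5: [1, 2]}
print(sorted(natural_loop(GRAPH, 3, 2)))  # [2, 3]
print(sorted(natural_loop(GRAPH, 5, 1)))  # [1, 4, 5]
```

Here the two loops are disjoint, consistent with the theorem.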

Slide 33

Example: Natural Loops

[Figure: the example flow graph on nodes 1 to 5 with two natural loops outlined: the natural loop of 3 -> 2 and the natural loop of 5 -> 1.]

Slide 34

Relationship between Loops

- If two loops do not have the same header:
  - they are either disjoint, or
  - one is entirely contained (nested within) the other.
  - Innermost loop: one that contains no other loop.
- If two loops share the same header:
  - it is hard to tell which is the inner loop;
  - combine them as one.

[Figure: example flow graph on nodes 1 to 4.]

Slide 35

Basic Parallelism

Examples:

```
FOR i = 1 TO 100
    a[i] = b[i] + c[i]

FOR i = 11 TO 20
    a[i] = a[i-1] + 3

FOR i = 11 TO 20
    a[i] = a[i-10] + 3
```

- Does there exist a data dependence edge between two different iterations?
- A data dependence edge is loop-carried if it crosses iteration boundaries.
- DoAll loops: loops without loop-carried dependences.
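For loops with small constant bounds, the question can be answered by brute force. This Python sketch (our own helper, assuming the single array `a` and inclusive bounds) checks every pair of distinct iterations for a write/write or write/read conflict on the same element; real compilers instead solve the integer constraints symbolically, as the following slides formulate.

```python
def has_loop_carried_dep(lo, hi, write_idx, read_idx=None):
    """Brute force: does any iteration access an element of `a` that a *different*
    iteration writes? Bounds are inclusive, as in the slide's FOR loops."""
    iters = range(lo, hi + 1)
    for iw in iters:
        for other in iters:
            if iw == other:
                continue  # same iteration: not loop-carried
            if write_idx(iw) == write_idx(other):
                return True  # output dependence across iterations
            if read_idx is not None and write_idx(iw) == read_idx(other):
                return True  # true or anti dependence across iterations
    return False


print(has_loop_carried_dep(1, 100, lambda i: i))                   # False: a[i] = b[i] + c[i]
print(has_loop_carried_dep(11, 20, lambda i: i, lambda i: i - 1))   # True:  a[i] = a[i-1] + 3
print(has_loop_carried_dep(11, 20, lambda i: i, lambda i: i - 10))  # False: a[i] = a[i-10] + 3
```

The third loop reads only a[1..10], which it never writes, so like the first loop it is a DoAll loop.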

Slide 36

Data Dependence of Variables

| Dependence | First access | Second access |
|---|---|---|
| True dependence | a = | = a |
| Anti-dependence | = a | a = |
| Output dependence | a = | a = |
| Input dependence | = a | = a |

Slide 37

Affine Array Accesses

- Common patterns of data accesses (i, j, k are loop indexes): A[i], A[j], A[i-1], A[0], A[i+j], A[2*i], A[2*i+1], A[i,j], A[i-1, j+1]
- Array indexes are affine expressions of surrounding loop indexes:
  - Loop indexes: $i_n, i_{n-1}, \ldots, i_1$
  - Integer constants: $c_n, c_{n-1}, \ldots, c_0$
  - Array index: $c_n i_n + c_{n-1} i_{n-1} + \cdots + c_1 i_1 + c_0$
  - Affine expression: a linear expression plus a constant term ($c_0$)

Slide 38

Formulating Data Dependence Analysis

```
FOR i := 2 TO 5 DO
    A[i-2] = A[i] + 1;
```

- Between read access A[i] and write access A[i-2] there is a dependence if there exist two iterations $i_r$ and $i_w$ within the loop bounds such that iterations $i_r$ and $i_w$ read and write the same array element, respectively:
  $\exists$ integers $i_w, i_r$: $2 \le i_w, i_r \le 5$ and $i_r = i_w - 2$
- Between write access A[i-2] and write access A[i-2] there is a dependence if:
  $\exists$ integers $i_w, i_v$: $2 \le i_w, i_v \le 5$ and $i_w - 2 = i_v - 2$
  - To rule out the case when the same instance depends on itself, add the constraint $i_w \ne i_v$.
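With bounds this small, the two constraint systems can simply be enumerated. This Python sketch (a hypothetical helper `solutions`, not a real dependence tester) lists all integer pairs inside the loop bounds that satisfy each system.

```python
def solutions(lo, hi, condition):
    """All pairs (x, y) with lo <= x, y <= hi satisfying condition."""
    return [(x, y) for x in range(lo, hi + 1)
                   for y in range(lo, hi + 1) if condition(x, y)]


# FOR i := 2 TO 5 DO A[i-2] = A[i] + 1
# read A[i_r] vs. write A[i_w - 2]: same element iff i_r = i_w - 2
read_write = solutions(2, 5, lambda iw, ir: ir == iw - 2)
# write vs. write, excluding an instance depending on itself (i_w != i_v)
write_write = solutions(2, 5, lambda iw, iv: iw - 2 == iv - 2 and iw != iv)

print(read_write)   # [(4, 2), (5, 3)]  as (i_w, i_r)
print(write_write)  # []
```

In both read/write solutions $i_r < i_w$, so the read happens in an earlier iteration than the write: these are anti-dependences, and there is no output dependence.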

Slide 39

Memory Disambiguation

Undecidable at compile time (the value of n is only known at run time):

```
read(n)
For i = …
    a[i] = a[n]
```

Slide 40

Domain of Data Dependence Analysis

Only use loop bounds and array indexes that are affine functions of loop variables:

```
for i = 1 to n
    for j = 2i to 100
        a[i + 2j + 3][4i + 2j][i * i] = …
        … = a[1][2i + 1][j]
```

Assume a data dependence between the read and write operation if there exist integers $i_r, j_r, i_w, j_w$ such that:
- $1 \le i_w, i_r \le n$
- $2i_w \le j_w \le 100$
- $2i_r \le j_r \le 100$
- $i_w + 2j_w + 3 = 1$
- $4i_w + 2j_w = 2i_r + 1$

Equate each dimension of the array access; ignore non-affine ones (here the third dimension, $i \cdot i$ versus $j$).
- No solution ⇒ no data dependence.
- Solution ⇒ there may be a dependence.
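Working the two equality constraints through by hand (our own derivation, not from the slide) shows that this particular system has no solution. The second dimension gives

$$4i_w + 2j_w = 2\,(2i_w + j_w) \ne 2i_r + 1,$$

because the left side is even and the right side is odd for all integers. Equivalently, $4i_w + 2j_w - 2i_r = 1$, and $\gcd(4, 2, 2) = 2$ does not divide 1. Independently, the first dimension gives

$$i_w + 2j_w + 3 = 1 \;\Rightarrow\; i_w + 2j_w = -2,$$

which contradicts $i_w \ge 1$ and $j_w \ge 2i_w \ge 2$. Either observation alone is enough to conclude that the write and the read never touch the same element.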

Slide 41

Iteration Space

An abstraction for loops: each iteration is represented as coordinates in the iteration space.

```
for i = 0, 5
    for j = 0, 3
        a[i, j] = 3
```

[Figure: the iteration space plotted with axes i and j.]

Slide 42

Iteration Space

An abstraction for loops

```
for i = 0, 5
    for j = i, 3
        a[i, j] = 0
```

[Figure: the iteration space plotted with axes i and j.]

Slide 43

Iteration Space

An abstraction for loops

```
for i = 0, 5
    for j = i, 7
        a[i, j] = 0
```

[Figure: the iteration space plotted with axes i and j.]
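A short Python sketch that enumerates the points of the three iteration spaces above, assuming inclusive loop bounds (so `for i = 0, 5` means i = 0, 1, ..., 5). The helper name `iteration_space` is our own; an inner range whose lower bound exceeds its upper bound is empty, which is what makes the Slide 42 space triangular.

```python
def iteration_space(i_bounds, j_bounds):
    """List the (i, j) points of a 2-deep loop nest; bounds are inclusive and the
    inner bounds may depend on i."""
    points = []
    for i in range(i_bounds[0], i_bounds[1] + 1):
        j_lo, j_hi = j_bounds(i)
        for j in range(j_lo, j_hi + 1):
            points.append((i, j))
    return points


rect = iteration_space((0, 5), lambda i: (0, 3))  # for j = 0, 3
tri = iteration_space((0, 5), lambda i: (i, 3))   # for j = i, 3 (empty for i = 4, 5)
trap = iteration_space((0, 5), lambda i: (i, 7))  # for j = i, 7
print(len(rect), len(tri), len(trap))  # 24 10 33
```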

Slide 44

Affine Access

Slide 45

Affine Transform

[Figure: an affine transform mapping iteration-space coordinates (i, j) to new coordinates (u, v).]

Slide 46

Loop Transformation

```
for i = 1, 100
    for j = 1, 200
        A[i, j] = A[i, j] + 3
    end_for
end_for
```

```
for u = 1, 200
    for v = 1, 100
        A[v, u] = A[v, u] + 3
    end_for
end_for
```

The second nest is the first with its loops interchanged (u = j, v = i).

Slide 47

Old Iteration Space

```
for i = 1, 100
    for j = 1, 200
        A[i, j] = A[i, j] + 3
    end_for
end_for
```

Slide 48

New Iteration Space

```
for u = 1, 200
    for v = 1, 100
        A[v, u] = A[v, u] + 3
    end_for
end_for
```

Slide 49

Old Array Accesses

```
for i = 1, 100
    for j = 1, 200
        A[i, j] = A[i, j] + 3
    end_for
end_for
```

Slide 50

New Array Accesses

```
for u = 1, 200
    for v = 1, 100
        A[v, u] = A[v, u] + 3
    end_for
end_for
```

Slide 51

Interchange Loops?

```
for i = 2, 1000
    for j = 1, 1000
        A[i, j] = A[i-1, j+1] + 3
    end_for
end_for
```

E.g., dependence vector $d_{old} = (1, -1)$.

[Figure: the iteration space with axes i and j, showing the dependences.]

```
for u = 1, 1000
    for v = 2, 1000
        A[v, u] = A[v-1, u+1] + 3
    end_for
end_for
```

Slide 52

Interchange Loops?

- A transformation is legal if the new dependence vector is lexicographically positive, i.e., the leading non-zero entry of the dependence vector is positive.
- Distance vector (1, -1) = (4, 2) - (3, 3): iteration (3, 3) writes A[3, 3], which iteration (4, 2) reads as A[i-1, j+1].
- Loop interchange is not legal if there exists a dependence (+, -): interchanging it gives (-, +), which is lexicographically negative.
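The legality rule can be checked mechanically by permuting each distance vector the same way as the loops. This Python sketch uses hypothetical helper names; it treats an all-zero vector (a dependence inside one iteration) as unaffected by reordering the loops, which is our own stated assumption.

```python
def permute(vec, perm):
    """Reorder a distance vector the same way the loops are reordered."""
    return tuple(vec[k] for k in perm)


def lex_positive(vec):
    """True if the leading non-zero entry is positive."""
    for d in vec:
        if d != 0:
            return d > 0
    return False


def interchange_is_legal(distance_vectors, perm):
    # Zero vectors (loop-independent dependences) are unaffected by reordering loops.
    return all(lex_positive(permute(v, perm)) or not any(v) for v in distance_vectors)


d_old = (1, -1)  # (4, 2) - (3, 3), from the slide
print(permute(d_old, (1, 0)))                  # (-1, 1)
print(interchange_is_legal([d_old], (1, 0)))   # False: (+, -) blocks interchange
print(interchange_is_legal([(1, 1)], (1, 0)))  # True
```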

Slide 53

GCD Test

```
for i = 1, 100
    a[2*i] = …
    … = a[2*i+1] + 3
```

Is there any dependence? Solve the linear Diophantine equation $2 i_w = 2 i_r + 1$.

Slide 54

GCD

- The greatest common divisor (GCD) of integers $a_1, a_2, \ldots, a_n$, denoted $\gcd(a_1, a_2, \ldots, a_n)$, is the largest integer that evenly divides all these integers.
- Theorem: the linear Diophantine equation $a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = c$ has an integer solution $x_1, x_2, \ldots, x_n$ iff $\gcd(a_1, a_2, \ldots, a_n)$ divides $c$.

Slide 55

Examples

- Example 1: $2 i_w - 2 i_r = 1$. $\gcd(2, -2) = 2$, which does not divide 1, so there are no solutions.
- Example 2: $\gcd(24, 36, 54) = 6$. Many solutions (whenever 6 divides the constant term).
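A minimal Python version of the GCD test, using only `math.gcd` and `functools.reduce` from the standard library; the function name is our own. The constant 30 in the second call is an arbitrary example, not taken from the slide. Because the test ignores loop bounds, a `True` result only means a dependence may exist.

```python
from functools import reduce
from math import gcd


def gcd_test(coeffs, c):
    """True iff a1*x1 + ... + an*xn = c has an integer solution (loop bounds ignored)."""
    g = reduce(gcd, coeffs, 0)  # math.gcd returns a non-negative result
    if g == 0:                  # all coefficients are zero
        return c == 0
    return c % g == 0


print(gcd_test([2, -2], 1))        # False: 2*i_w - 2*i_r = 1 has no integer solution
print(gcd_test([24, 36, 54], 30))  # True: 6 divides 30
```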

Slide 56

Loop Fusion

```
for i = 1, 1000
    A[i] = B[i] + 3
end_for
for j = 1, 1000
    C[j] = A[j] + 5
end_for
```

```
for i = 1, 1000
    A[i] = B[i] + 3
    C[i] = A[i] + 5
end_for
```

Better reuse between the write of A[i] and the read of A[i]: the value is used in the same iteration that produces it.

Slide 57

Loop Distribution

```
for i = 1, 1000
    A[i] = A[i-1] + 3
    C[i] = B[i] + 5
end_for
```

```
for i = 1, 1000
    A[i] = A[i-1] + 3
end_for
for i = 1, 1000
    C[i] = B[i] + 5
end_for
```

After distribution, the 2nd loop is parallel (it has no loop-carried dependence).

Slide 58

Register Blocking

```
for j = 1, 2*M
    for i = 1, 2*N
        A[i, j] = A[i-1, j] + A[i-1, j-1]
    end_for
end_for
```

```
for j = 1, 2*M, 2
    for i = 1, 2*N, 2
        A[i, j]     = A[i-1, j]   + A[i-1, j-1]
        A[i, j+1]   = A[i-1, j+1] + A[i-1, j]
        A[i+1, j]   = A[i, j]     + A[i, j-1]
        A[i+1, j+1] = A[i, j+1]   + A[i, j]
    end_for
end_for
```

Better reuse of A[i, j]: the value written by the first statement is read again by the last two statements of the same iteration.

Slide 59

Virtual Register Allocation

```
for j = 1, 2*M, 2
    for i = 1, 2*N, 2
        r1 = A[i-1, j]
        r2 = r1 + A[i-1, j-1]
        A[i, j] = r2
        r3 = A[i-1, j+1] + r1
        A[i, j+1] = r3
        A[i+1, j] = r2 + A[i, j-1]
        A[i+1, j+1] = r3 + r2
    end_for
end_for
```

- Memory operations are reduced to register loads/stores.
- The nest runs M·N iterations; loads drop from 8 per iteration to 4, i.e., from 8MN loads to 4MN loads.
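A Python check, under our own assumptions, that the register-allocated loop computes the same array as the original loop and performs 4 array loads per iteration. Row 0 and column 0 hold arbitrary boundary values (made up for the test), rows are indexed by i and columns by j, and the load counter is incremented by hand to match the four `A[...]` reads in the body; Python lists are not real registers, so this checks only correctness, not speed.

```python
def make_array(M, N):
    """(2N+1) x (2M+1) array; row 0 and column 0 hold arbitrary boundary values."""
    return [[(3 * r + 7 * c) % 11 if r == 0 or c == 0 else 0
             for c in range(2 * M + 1)] for r in range(2 * N + 1)]


def original(A, M, N):
    for j in range(1, 2 * M + 1):
        for i in range(1, 2 * N + 1):
            A[i][j] = A[i - 1][j] + A[i - 1][j - 1]


def register_allocated(A, M, N):
    """The slide's blocked loop with r1..r3 as registers; returns the number of array loads."""
    loads = 0
    for j in range(1, 2 * M + 1, 2):
        for i in range(1, 2 * N + 1, 2):
            r1 = A[i - 1][j]
            r2 = r1 + A[i - 1][j - 1]
            A[i][j] = r2
            r3 = A[i - 1][j + 1] + r1
            A[i][j + 1] = r3
            A[i + 1][j] = r2 + A[i][j - 1]
            A[i + 1][j + 1] = r3 + r2
            loads += 4  # A[i-1,j], A[i-1,j-1], A[i-1,j+1], A[i,j-1]
    return loads


M, N = 3, 4
A1, A2 = make_array(M, N), make_array(M, N)
original(A1, M, N)
loads = register_allocated(A2, M, N)
print(A1 == A2, loads, 4 * M * N)  # True 48 48
```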

Slide 60

Scalar Replacement

```
for i = 2, N+1
    … = A[i-1] + 1
    A[i] = …
end_for
```

```
t1 = A[1]
for i = 2, N+1
    … = t1 + 1
    t1 = …
    A[i] = t1
end_for
```

Eliminate loads and stores for array references.

Slide 61

Unroll-and-Jam

```
for j = 1, 2*M
    for i = 1, N
        A[i, j] = A[i-1, j] + A[i-1, j-1]
    end_for
end_for
```

```
for j = 1, 2*M, 2
    for i = 1, N
        A[i, j]   = A[i-1, j]   + A[i-1, j-1]
        A[i, j+1] = A[i-1, j+1] + A[i-1, j]
    end_for
end_for
```

Expose more opportunity for scalar replacement.

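Slide 62

Large Arrays

```
for i = 1, 1000
    for j = 1, 1000
        A[i, j] = A[i, j] + B[j, i]
    end_for
end_for
```

- Suppose arrays A and B have row-major layout.
- B has poor cache locality: the inner loop walks down a column of B, so consecutive accesses are a whole row (1000 elements) apart.
- Loop interchange will not help: it would give B good locality but make A walk down its columns instead.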

Slide 63

Loop Blocking

```
for v = 1, 1000, 20
    for u = 1, 1000, 20
        for j = v, v+19
            for i = u, u+19
                A[i, j] = A[i, j] + B[j, i]
            end_for
        end_for
    end_for
end_for
```

Access to small blocks of the arrays has good cache locality.
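A Python sketch, under our own assumptions, checking that the blocked loop order computes the same result as the original nest. It uses 0-based indices and a smaller size (n = 50, tile = 20) so that the last tile is partial, which `min()` handles; the slide's version assumes 1000 is a multiple of 20. Python does not expose cache behavior, so this checks only that the reordering is legal, not that it is faster.

```python
def untiled(A, B, n):
    for i in range(n):
        for j in range(n):
            A[i][j] += B[j][i]


def tiled(A, B, n, tile):
    # Same loop order as the slide (v, u over tiles; j, i within a tile).
    # min() handles n not being a multiple of the tile size.
    for v in range(0, n, tile):
        for u in range(0, n, tile):
            for j in range(v, min(v + tile, n)):
                for i in range(u, min(u + tile, n)):
                    A[i][j] += B[j][i]


def matrix(n, seed):
    return [[(seed * i + j) % 97 for j in range(n)] for i in range(n)]


n = 50
A1, A2, B = matrix(n, 3), matrix(n, 3), matrix(n, 5)
untiled(A1, B, n)
tiled(A2, B, n, tile=20)
print(A1 == A2)  # True
```

The results agree because each A[i, j] is updated exactly once and the updates are independent of one another.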

Slide 64

Loop Unrolling for ILP

```
for i = 1, 10
    a[i] = b[i];
    *p = ...
end_for
```

```
for i = 1, 10, 2
    a[i] = b[i];
    *p = …
    a[i+1] = b[i+1];
    *p = …
end_for
```

- Larger scheduling regions; fewer dynamic branches.
- Increased code size.

Slide 65

Next Time

- Homework: 9.6.2, 9.6.4, 9.6.7
- Static Single Assignment (SSA)
- Readings: Cytron'91, Chow'97
