
1 Dataflow Analysis 2 Overview - PowerPoint Presentation

ceila
Uploaded On 2023-06-23





Presentation Transcript

1. Dataflow Analysis

2. Overview
- To perform optimizations like constant propagation or dead code elimination, we must
  - Analyze program to find opportunities for performing optimizations safely
  - Transform program
- Analysis is called dataflow analysis
  - Model values of interest as domain
  - Formulate system of fixpoint equations whose unknowns are the required information at different points in program
  - Solve system of equations
- Later we will study other types of program analyses

3. Some concepts
- A statement is a definition of a variable v if it may write to v.
- A statement is a use of variable v if it may read from v.
- A variable v is live at a point p in a CFG if there is a path from p to END on which there is a use of v before any definition of v.
[Figure: CFG with statements if (c), x = y+1, y = 2*z, if (d), x = y+z, z = 1, z = x, annotated with live sets including {x,y,z,d,c}, {x} and {}]

4. Use/Def
For each 3-address instruction I:
  if I is x = y OP z         : use[I] = {y, z}          def[I] = {x}
  if I is x = OP y           : use[I] = {y}             def[I] = {x}
  if I is x = y              : use[I] = {y}             def[I] = {x}
  if I is if (x)             : use[I] = {x}             def[I] = {}
  if I is return x           : use[I] = {x}             def[I] = {}
  if I is x = f(y1 ,…, yn)   : use[I] = {y1, …, yn}     def[I] = {x}
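The use/def table above can be sketched in a few lines of Python. The `Instr` record and its field names are hypothetical; they are only one convenient way to encode a 3-address instruction, with `dest` set to `None` for `if` and `return`.

```python
from typing import NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    """Hypothetical 3-address instruction record (not from the slides)."""
    dest: Optional[str]          # variable written, or None for 'if' / 'return'
    operands: Tuple[str, ...]    # variables read

def use_def(instr):
    """Return (use[I], def[I]) following the table on the slide."""
    use = set(instr.operands)
    defs = {instr.dest} if instr.dest is not None else set()
    return use, defs

add = Instr("x", ("y", "z"))       # x = y OP z
branch = Instr(None, ("x",))       # if (x)
call = Instr("x", ("y1", "y2"))    # x = f(y1, y2)

for instr in (add, branch, call):
    print(instr, use_def(instr))
```

Like the slide, this sketch assumes no aliasing, so variable names alone determine the sets.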

5. Aliasing
- Two or more program names for the same memory location
- Can happen if you have pointers or call-by-reference parameters
    x = 3
    y = &x
    *y = 5
    …x…
  x and *y are aliases
- Aliasing complicates def/use computation
- We will assume no aliasing for now, so the variable name is enough to determine defs and uses

6. Dataflow information
- Facts about program execution that are useful for optimization at compile-time
- Examples:
  - Is the value of x always a constant at this point of the program? If so, what is the value?
    Analysis: constant propagation
  - Given a use of x in a statement, which assignment statements for x could have written the value at that use?
    Analysis: reaching definitions
  - Is this computation of “x+y” redundant because it has been computed earlier in the program?
    Analysis: common subexpression elimination

7. Program Points
- Two program points for each instruction:
  - There is a program point before each instruction
  - There is a program point after each instruction
- In a basic block:
  - Program point after an instruction = program point before the successor instruction
[Figure: instruction x = y+1 with its "point before" and "point after"]

8. Program Points: Example
- Multiple successor blocks means that the point at the end of a block has multiple successor program points
- Depending on the execution, control flows from a program point to one of its successors
[Figure: CFG fragment with x = y+1, y = 2*z, if (d), x = y+z, z = 1]

9. Reaching definitions
- A definition d of variable v is said to reach a point p if
  - there is a path from START to p that contains d
  - there are no other definitions of v on that path after d
- Convention: START has a definition for all variables of interest
- How do we compute reaching definitions at each point of program?
- To keep things simple, we consider only one variable
  - in general, you would perform analysis for all variables of interest
[Figure: CFG START, d1: a := …., if p(a), then two branches d2: a := …. and d3: a := …. that merge at a use …a..., annotated with reaching-definition sets {START}, {d1}, {d2}, {d3}, {d2,d3}]

10. One solution: meet-over-paths
- To find definitions that reach a point p
  - Enumerate all paths from START to p
  - For each path, find the definition that reaches p
  - Compute the union of these definitions
- This is called the “meet-over-paths” (MOP) solution
- Confluence operation: how do we combine information from different paths to a given point?
- Problem: if program has loops, set of paths to p will be unbounded
- Better idea: dataflow equations
[Figure: the reaching-definitions CFG from the previous slide, with sets {START}, {d1}, {d2}, {d3}, {d2,d3}]
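For the loop-free example CFG, the meet-over-paths recipe can be written out directly. This is a minimal sketch: the node names (`"if"`, `"use"`) and the dictionary encoding of the CFG are assumptions made for illustration, and the path enumeration only terminates because this graph has no loops.

```python
# Hypothetical encoding of the single-variable example CFG.
succ = {
    "START": ["d1"],
    "d1": ["if"],
    "if": ["d2", "d3"],
    "d2": ["use"],
    "d3": ["use"],
    "use": [],
}
# Nodes that define variable a (by convention START defines every variable).
defines_a = {"START", "d1", "d2", "d3"}

def paths(node, target):
    """Yield every path from node to target (acyclic graphs only)."""
    if node == target:
        yield [node]
        return
    for nxt in succ[node]:
        for rest in paths(nxt, target):
            yield [node] + rest

def mop_reaching(target):
    """Union, over all START-to-target paths, of the last definition before target."""
    result = set()
    for path in paths("START", target):
        defs_on_path = [n for n in path[:-1] if n in defines_a]
        if defs_on_path:
            result.add(defs_on_path[-1])
    return result

print(mop_reaching("use"))   # definitions reaching the input of the use
```

With a loop in the CFG the generator `paths` would never finish, which is the problem the slide raises.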

11. Dataflow equations
- System of fixpoint equations in which unknowns are solutions to the dataflow problem at different points in program
- Solve the system of equations as described in previous lecture
- For many problems, this gives the MOP solution, and for other problems, it gives a “safe” approximation to MOP solution
- Safe approximation to MOP solution
  - Not as precise as MOP solution but will not lead to incorrect optimization
  - Example: an analysis that says no variable is a constant anywhere in the program is a safe approximation to MOP
- Granularity of dataflow equations
  - Statement-level: unknowns are associated with input or output points of each statement
  - Basic-block level: more common
[Figure: the reaching-definitions CFG, with sets {START}, {d1}, {d2}, {d3}, {d2,d3}]

12. Reaching definitions: domain
- Unknowns: reaching definitions at output of each statement.
- Solution at each point will be some element of domain.
- Domain is Power-set({START,d1,d2,d3})
[Figure: the reaching-definitions CFG with unknowns x0 … x5 attached to the statement outputs]

13. Reaching definitions: equations
- Give a name to output of each statement
- For each statement, write down equation for output as a function of its input(s).
    x0 = {START}
    x1 = {d1}
    x2 = x1
    x3 = {d2}
    x4 = {d3}
    x5 = x3 U x4
- Rule for a statement dn: v := …. with inputs xi1, xi2 and output xo:
  - (v == a): equation is xo = {dn}
  - otherwise: xo = xi1 U xi2
[Figure: the reaching-definitions CFG with unknowns x0 … x5 attached to the statement outputs]
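The six equations can be solved mechanically by iterating until nothing changes. The sketch below assumes each right-hand side is written as a small function of the current assignment; the `solve` helper is illustrative, not something defined in the slides.

```python
def solve(equations):
    """Start every unknown at {} and re-evaluate until a fixpoint is reached."""
    x = {name: set() for name in equations}
    changed = True
    while changed:
        changed = False
        for name, rhs in equations.items():
            new_value = rhs(x)
            if new_value != x[name]:
                x[name] = new_value
                changed = True
    return x

# One right-hand side per unknown, copied from the slide.
equations = {
    "x0": lambda x: {"START"},
    "x1": lambda x: {"d1"},
    "x2": lambda x: set(x["x1"]),
    "x3": lambda x: {"d2"},
    "x4": lambda x: {"d3"},
    "x5": lambda x: x["x3"] | x["x4"],
}

solution = solve(equations)
for name in sorted(solution):
    print(name, "=", sorted(solution[name]))
```

Because this CFG has no loops, one pass in program order already reaches the fixpoint; the second pass only confirms it.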

14. Reaching definitions: multiple variables
- You can solve for one variable at a time, but it is better to solve for all variables at the same time
- Assignment to variable x will
  - “kill” all definitions of x that reach input of assignment, but will pass through all other definitions
  - “generate” itself at output
- Equation: Out = (In - Kill) U Gen
- Notation: Dx = set of definitions of variable x
- Dataflow equation for a statement dn: v := …. with inputs xi1, xi2 and output xo:
    xo = ((xi1 U xi2) - Dv) U {dn}

15. More complicated example
    x1 = {START}
    x2 = x1 U x7
    x3 = (x2 - {d6,d3}) U {d3}
    x4 = (x3 - {d4}) U {d4}
    x5 = x4
    x6 = (x5 - {d3,d6}) U {d6}
    x7 = ((x6 U x5) - {d7,d8}) U {d7}
    x8 = (x2 - {d7,d8}) U {d8}
[Figure: CFG with nodes numbered 1 to 8: START (1), if (c) (2), x = y+1 (3), y = 2*z (4), if (d) (5), x = y+z (6), z = 1 (7), z = x (8)]
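A worklist version of the iterative method, applied to these eight equations, might look like the following. The table of predecessors, kill sets and gen sets is read off the equations; the table layout and the function name are assumptions for illustration.

```python
from collections import deque

# node -> (predecessors, kill set, gen set), one row per equation on the slide
nodes = {
    1: ([],     set(),        {"START"}),
    2: ([1, 7], set(),        set()),
    3: ([2],    {"d3", "d6"}, {"d3"}),
    4: ([3],    {"d4"},       {"d4"}),
    5: ([4],    set(),        set()),
    6: ([5],    {"d3", "d6"}, {"d6"}),
    7: ([5, 6], {"d7", "d8"}, {"d7"}),
    8: ([2],    {"d7", "d8"}, {"d8"}),
}

def reaching_definitions(nodes):
    """Least solution of Out = (In - Kill) U Gen, with In = union of predecessor Outs."""
    succs = {n: [] for n in nodes}
    for n, (preds, _, _) in nodes.items():
        for p in preds:
            succs[p].append(n)
    out = {n: set() for n in nodes}      # initialize all unknowns to {}
    worklist = deque(nodes)
    while worklist:
        n = worklist.popleft()
        preds, kill, gen = nodes[n]
        in_n = set().union(*(out[p] for p in preds))
        new_out = (in_n - kill) | gen
        if new_out != out[n]:
            out[n] = new_out
            worklist.extend(succs[n])    # successors must be re-evaluated
    return out

out = reaching_definitions(nodes)
for n in sorted(out):
    print(f"x{n} =", sorted(out[n]))
```

The loop edge from node 7 back to node 2 is what forces more than one visit to the nodes inside the loop.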

16. Observations
- System of equations can be solved to find least solution
- Iterative method for solving equations will converge even though there are an unbounded number of paths in program
  - Intuitively, we need to consider only a finite number of paths to find solution to reaching definitions problem, since longer paths do not give us more information
- In general, when you have loops, the system of equations will have multiple solutions.
  - Exercise: convince yourself using the example in the last slide
- Which one do we want?
  - We want the least solution. So in iterative solution, initialize all unknowns to {}.
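The claim about multiple solutions can be checked on a deliberately tiny system. The two equations below are a made-up one-loop example (not the system from the slides), and `d9` is an invented definition that occurs nowhere in the program.

```python
def satisfies(x):
    """Check a hypothetical one-loop system:
         x1 = {START} U x2    (loop header merges entry and back edge)
         x2 = x1              (loop body contains no definitions)
    """
    return (x["x1"] == ({"START"} | x["x2"])) and (x["x2"] == x["x1"])

least = {"x1": {"START"}, "x2": {"START"}}
larger = {"x1": {"START", "d9"}, "x2": {"START", "d9"}}   # d9 is spurious

# Iterating from {} finds the least solution.
x = {"x1": set(), "x2": set()}
while True:
    new_x = {"x1": {"START"} | x["x2"], "x2": set(x["x1"])}
    if new_x == x:
        break
    x = new_x

print(satisfies(least), satisfies(larger), x == least)
```

Both assignments satisfy the equations, because a spurious fact can keep justifying itself around the loop; starting from {} never introduces it.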

17. Classification of dataflow problems
- Direction of information propagation
  - Forward-flow: information propagates in the direction of control-flow, from input to output
  - Backward-flow: information propagates in reverse direction of control-flow, from output to input
- All-paths vs. any-path
  - All-paths problem: dataflow fact is true at some point in program if it is true along all paths to/from that point
  - Any-path problem: dataflow fact is true at some point in program if it is true along any path to/from that point
- Classification of reaching definitions?
[Figure: a program point p on paths between START and END]

18. Examples
              Forward                    Backward
  All-paths   Constant propagation,      Anticipatable expressions
              Available expressions
  Any-path    Reaching definitions       Live variables
[Figure: a program point p on paths between START and END]

19. Recall
- A variable v is live at a point p in a CFG if there is a path from p to END on which there is a use of v before any definition of v
- We want to compute the set of variables live at each point in program
- Backward-flow problem: information at p depends on paths from p to END
- Any-path problem: variable is live at p if there is any path from p to END that satisfies the condition
- Domain: power-set of set of variables in procedure
[Figure: CFG from START to END with statements if (c), x = y+1, y = 2*z, if (d), x = y+z, z = 1, z = x, annotated with live sets including {x,y,z,d,c}, {x} and {}]

20. Straight-line code
- Relation between Live sets:
    Live1 = ( Live2 - {x} ) U {y}
    Live2 = ( Live3 - {y} ) U {z}
    Live3 = ( Live4 - {} ) U {d}
- Relation: in[I] = ( out[I] - def[I] ) U use[I]
- Information flows backward!
- Instructions: can compute in[I] if we know out[I]
[Figure: straight-line code I1: x = y+1, I2: y = 2*z, I3: if (d), with program points Live1 … Live4; a generic instruction I with in[I] before it and out[I] after it]
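A single backward pass over the three instructions is enough to compute these sets. In the sketch below the value of Live4 is an assumption (taken to be empty, since the slide leaves it open), and `live_sets` is an illustrative helper name.

```python
# (def[I], use[I]) for I1: x = y+1, I2: y = 2*z, I3: if (d)
instructions = [({"x"}, {"y"}), ({"y"}, {"z"}), (set(), {"d"})]

def live_sets(instructions, live_at_end):
    """Return [Live1, ..., LiveN+1] using in[I] = (out[I] - def[I]) U use[I]."""
    live = [set(live_at_end)]
    for defs, uses in reversed(instructions):   # information flows backward
        live.append((live[-1] - defs) | uses)
    live.reverse()
    return live

# Assumption: nothing is live after the block (Live4 = {}).
live = live_sets(instructions, live_at_end=set())
for i, s in enumerate(live, start=1):
    print(f"Live{i} =", sorted(s))
```

Passing a different `live_at_end` shows how facts from later code propagate up through the block.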

21. Analyze Control Flow
- A variable is live at the end of block B if it is live at the beginning of one (or more) successor blocks
- Characterizes all possible program executions
- Mathematically: out[B] = U in[B’], taken over all B’ ∈ succ(B)
- Confluence operator is union
- Compute least solution
[Figure: block B with out[B] feeding successor blocks B1 … Bn through in[B1] … in[Bn]]

22. Live variables
    L10 = { }
    L3 = {x} U (L10 - {z})
    L2 = L4 U L3 U {c}
    L7 = L2 - {z}
    L8 = {y,z} U (L7 - {x})
    L6 = L8 U L7 U {d}
    L5 = {z} U (L6 - {y})
    L4 = {y} U (L5 - {x})
[Figure: CFG from START to END with numbered program points 2, 3, 4, 5, 6, 7, 8, 10 and statements if (c), x = y+1, y = 2*z, if (d), x = y+z, z = 1, z = x, annotated with the solution sets {x,y,z,d,c}, {x,y,d,c}, {y,z,d,c}, {x,z,d,c}, {x} and {}]
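These equations can be solved by the same iterate-from-empty strategy, this time propagating backward. The CFG table below is reconstructed from the equations (node numbers match the subscripts of L); its layout and the function name are assumptions for illustration.

```python
# node -> (successors, def set, use set), reconstructed from the equations
cfg = {
    2:  ([4, 3], set(), {"c"}),        # if (c)
    4:  ([5],    {"x"}, {"y"}),        # x = y+1
    5:  ([6],    {"y"}, {"z"}),        # y = 2*z
    6:  ([8, 7], set(), {"d"}),        # if (d)
    8:  ([7],    {"x"}, {"y", "z"}),   # x = y+z
    7:  ([2],    {"z"}, set()),        # z = 1
    3:  ([10],   {"z"}, {"x"}),        # z = x
    10: ([],     set(), set()),        # END
}

def live_variables(cfg):
    """Least solution of in[n] = (out[n] - def[n]) U use[n], out[n] = U in[succ]."""
    live_in = {n: set() for n in cfg}
    changed = True
    while changed:
        changed = False
        for n, (succs, defs, uses) in cfg.items():
            live_out = set().union(*(live_in[s] for s in succs))
            new_in = (live_out - defs) | uses
            if new_in != live_in[n]:
                live_in[n] = new_in
                changed = True
    return live_in

L = live_variables(cfg)
for n in sorted(L):
    print(f"L{n} =", sorted(L[n]))
```

The computed sets agree with the annotations listed for the figure, for example L2 = {x,y,z,d,c} and L3 = {x}.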

23. Available expressions
- An expression “x op y” is available at p if every path from START to p contains an evaluation of x op y after which there are no assignments to x or y.
- Classification: forward-flow, all-paths
- Domain: power-set of expressions of interest
- Equation for “x = y op z”: Out = (In - Ex) U {“y op z”}
- Confluence operator: intersection
- Compute greatest solution: start by assuming all expressions are available everywhere except at START, and iterate
[Figure, before: START, if (c), branches w = x+y and z = x+y, then B5: …x+y.. Computation in B5 is “totally redundant”.]
[Figure, after: START, if (c), branches t = x+y; w = t and t = x+y; z = t, then B5: …t..]
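The greatest-solution recipe can be tried on the diamond from the figure. In this sketch the block names B2, B3 and B4 are invented (only B5 is named on the slide), and the gen/kill sets are filled in by hand for the single expression of interest.

```python
UNIVERSE = frozenset({"x+y"})

# block -> (predecessors, gen, kill); B2..B4 are hypothetical names
both_sides = {
    "START": ([], set(), set()),
    "B2": (["START"], set(), set()),      # if (c)
    "B3": (["B2"], {"x+y"}, set()),       # w = x+y
    "B4": (["B2"], {"x+y"}, set()),       # z = x+y
    "B5": (["B3", "B4"], set(), set()),   # ... x+y ...
}

def available_expressions(blocks, universe):
    """Greatest solution: assume everything is available except at START."""
    out = {b: set(universe) for b in blocks}
    out["START"] = set()

    def block_in(b):
        result = set(universe)
        for p in blocks[b][0]:
            result &= out[p]              # confluence operator: intersection
        return result

    changed = True
    while changed:
        changed = False
        for b, (_, gen, kill) in blocks.items():
            if b == "START":
                continue
            new_out = (block_in(b) - kill) | gen
            if new_out != out[b]:
                out[b] = new_out
                changed = True
    return {b: block_in(b) for b in blocks if b != "START"}

# Variant in which only one branch evaluates x+y.
one_side = dict(both_sides, B3=(["B2"], set(), set()))

print(available_expressions(both_sides, UNIVERSE)["B5"])
print(available_expressions(one_side, UNIVERSE)["B5"])
```

In the first CFG x+y is available on entry to B5, so the computation there is totally redundant; in the variant it is not available, because intersection requires it on every incoming path.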

24. Anticipatable expressions
- An expression “x op y” is anticipatable at p if every path from p to END contains an evaluation of “x op y” before any assignment to x or y.
- Classification: backward-flow, all-paths
- Domain: power-set of expressions of interest
- Equation for “x = y op z”: In = (Out - Ex) U {“y op z”}
- Confluence operator: intersection
- Compute greatest solution: start by assuming all expressions are anticipatable everywhere except at END, and iterate
[Figure, before: START, if (c), one branch with z = x+y, then B5: …x+y.. Computation in B5 is “partially redundant”.]
[Figure, after: START, if (c), branches t = x+y and t = x+y; z = t, then B5: …t.. Partial redundancy in B5 is removed.]

25. Constant propagation
- A variable x is a constant c at a point p if x has the value c at that point on all paths from START to that point.
- Classification: forward-flow, all-paths
- Perform constant propagation on set of variables V
- Dataflow information at point p: vector of size |V| where each value comes from domain
- Equation for statement x = e: OUT = IN [(Eval(e) in IN) / x]
- Confluence operation: join
- Initialize all vectors to [⊥,⊥,..,⊥] except at START where it is [T,T,..,T]
[Figure: domain for constant propagation, a flat lattice with the integers … -2 -1 0 1 2 … in the middle, T (definitely not constant) at the top and ⊥ (may or may not be constant) at the bottom]

26. Constant propagation (contd.)
- Solution to dataflow equations does not give meet-over-paths solution in general
  - Gives a safe approximation
- Example in the figure:
  - In B5, x+y is constant 5 but dataflow solution will give T, which is a safe approximation
[Figure: START, if (c), branches x = 3; y = 2 and x = 2; y = 3, then B5: …x+y.. Dataflow vector is [x,y]: [T,T] before the branches, [3,2] and [2,3] after them, [T,T] at B5]
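The gap between the dataflow solution and the MOP solution in this example can be reproduced with a small model of the lattice. The string encodings of T and ⊥ and the helper names are assumptions; the join follows the slide's convention that T means "definitely not constant".

```python
TOP = "T"         # slide's T: definitely not a constant
BOTTOM = "_|_"    # no information yet

def join(a, b):
    """Join of two lattice values in the flat constant-propagation domain."""
    if a == BOTTOM:
        return b
    if b == BOTTOM:
        return a
    return a if a == b else TOP

def join_vectors(u, v):
    return [join(a, b) for a, b in zip(u, v)]

def assign(vec, index, value):
    """OUT = IN [value / x]: copy the vector and overwrite one slot."""
    out = list(vec)
    out[index] = value
    return out

def eval_add(a, b):
    """Abstract evaluation of a + b."""
    if TOP in (a, b):
        return TOP
    if BOTTOM in (a, b):
        return BOTTOM
    return a + b

X, Y = 0, 1                                     # dataflow vector is [x, y]
entry = [TOP, TOP]
left = assign(assign(entry, X, 3), Y, 2)        # x = 3; y = 2
right = assign(assign(entry, X, 2), Y, 3)       # x = 2; y = 3

at_b5 = join_vectors(left, right)               # dataflow: merge first
dataflow_result = eval_add(*at_b5)
mop_result = join(eval_add(*left), eval_add(*right))   # MOP: evaluate per path

print(at_b5, dataflow_result, mop_result)
```

Merging the vectors before evaluating x+y loses the correlation between x and y, which is why the dataflow answer is T while the per-path answer is 5.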

27. Implementing constant propagation
- Using vectors to represent values of variables is inefficient
  - Assignment is to one variable but values of other variables must be copied from input vector to output vector
- Idea: can we use def-use chains to perform constant propagation?
  - If we have an assignment “x = 3”, find all uses reached only by this definition and replace use with constant 3.
  - This works but it may not find all the constants that the CFG algorithm does
- Solution: use a variant of def-use chains called the static single assignment form (SSA)
  - See later
[Figure: statement x = e with input IN and output OUT, where OUT = IN [(Eval(e) in IN) / x]]

28. Solving dataflow equations using elimination

29. High-level idea
- Solving systems of linear equations
  - Iterative methods: Jacobi, Gauss-Seidel, ..
  - Elimination-based methods: Gaussian elimination
- Dataflow equations
  - We have seen iterative method
  - For structured programs, we can use elimination-based methods and avoid iteration
  - For unstructured programs, elimination can be used to reduce the number of equations given to iterative solver

30. Intuitive idea of elimination
- Reaching definitions, basic-block level equations
    …..
    Out[B5] = Out[B4]
    Out[B6] = {d7} U (Out[B5] - Dz)
    Out[B7] = {d9} U (Out[B5] - Dz)
    …..
- Replace equations with this
    …..
    Out[F] = {d7,d9} U (Out[B4] - Dz)
    …..
- Solve equations iteratively and then interpolate solution into F
[Figure: CFG with START, …, B4, B5: if (c), B6: z = w*w (d7), B7: z = x+y (d9), B8, where B5, B6 and B7 are grouped into region F]

31. General idea for reaching definitions
- gen[R]: set of definitions in R for which there is a path from that definition to exit free of other definitions of that variable
- kill[R]: set of definitions in R that do not reach exit even if they reach entry of R
- Equation for region R: out[R] = gen[R] U (in[R] - kill[R])
[Figure: region R with a single entry and a single exit]
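Summarizing a region by one (gen, kill) pair can be sketched with two composition rules. These rules are not stated on the slides; they are obtained here by substituting one instance of out = gen U (in - kill) into another, and the value of Dz (assumed to contain only d7 and d9) is likewise an assumption.

```python
def out(region, in_set):
    """out[R] = gen[R] U (in[R] - kill[R])"""
    gen, kill = region
    return gen | (in_set - kill)

def seq(r1, r2):
    """Summary of 'r1 followed by r2' (derived by substitution, see lead-in)."""
    (gen1, kill1), (gen2, kill2) = r1, r2
    return gen2 | (gen1 - kill2), kill1 | kill2

def branch(r1, r2):
    """Summary of 'either r1 or r2', with outputs merged by union."""
    (gen1, kill1), (gen2, kill2) = r1, r2
    return gen1 | gen2, kill1 & kill2

Dz = {"d7", "d9"}          # assumption: the only definitions of z
B5 = (set(), set())        # if (c): no definitions
B6 = ({"d7"}, Dz)          # d7: z = w*w
B7 = ({"d9"}, Dz)          # d9: z = x+y

F = seq(B5, branch(B6, B7))

# Compare the summary against evaluating the three block equations one by one.
out_b4 = {"d1", "d7"}      # arbitrary input set; d1 is a made-up definition
out_b5 = out(B5, out_b4)
stepwise = out(B6, out_b5) | out(B7, out_b5)

print(F, out(F, out_b4), stepwise)
```

The summary for F has gen {d7,d9} and kill Dz, matching the replacement equation Out[F] = {d7,d9} U (Out[B4] - Dz).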

32. Elimination and interpolation

33. Elimination-based methods
- Same idea can be used for other dataflow problems
  - Compute gen and kill sets
- For structured programs
  - Can solve dataflow equations without iteration
  - Can compute dataflow solution from AST without building control-flow graph
- For unstructured programs
  - Generalization of loop: interval
    - Single-entry multiple-exit loop
  - Reducible program: program that can be decomposed into nested intervals (Cocke and Allen)
  - Irreducible program: needs iteration
[Photo: John Cocke (in wheelchair) and Fran Allen (pink shirt). Both won the Turing Award.]

34. Summary
- Program analysis is needed to perform optimizations safely
  - Analysis must consider all possible execution paths
  - Meet over paths (MOP) solution
- Dataflow analysis
  - Solve a set of fixpoint equations to find dataflow information
  - Domain: usually a powerset ordered by containment
  - Transfer function for assignment statements
  - Confluence operator: usually either union or intersection
- Type of solution
  - Confluence operator is union: least solution
  - Confluence operator is intersection: greatest solution
- For many problems, dataflow solution gives MOP solution. For other problems like constant propagation, it gives a safe approximation to MOP.
- In practice
  - Compute transfer function for entire basic blocks so you have as many unknowns as basic blocks rather than statements, and then iterate
  - For structured programs, you can avoid iteration entirely: elimination-based methods
  - Sets are represented using bit-vectors, and union/intersection are implemented using bitwise OR/AND
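The bit-vector representation mentioned in the last point can be illustrated with Python integers standing in for machine words. The ordering of definitions in `DEFS` and the helper names are arbitrary choices made for this sketch.

```python
# One bit per definition; the ordering is an arbitrary choice for this sketch.
DEFS = ["START", "d3", "d4", "d6", "d7", "d8"]
BIT = {name: 1 << i for i, name in enumerate(DEFS)}

def to_bits(names):
    """Encode a set of definition names as an integer bit-vector."""
    bits = 0
    for name in names:
        bits |= BIT[name]
    return bits

def to_names(bits):
    """Decode a bit-vector back into a set of definition names."""
    return {name for name in DEFS if bits & BIT[name]}

def transfer(in_bits, kill_bits, gen_bits):
    """Out = (In - Kill) U Gen, using AND-NOT for difference and OR for union."""
    return (in_bits & ~kill_bits) | gen_bits

a = to_bits({"START", "d3", "d7"})
b = to_bits({"d3", "d4"})

union = a | b            # set union is bitwise OR
intersection = a & b     # set intersection is bitwise AND
after = transfer(a, to_bits({"d7", "d8"}), to_bits({"d8"}))

print(to_names(union), to_names(intersection), to_names(after))
```

Each set operation becomes a constant number of word operations per machine word, which is why this encoding is attractive when the iterative solver revisits blocks many times.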