1. Semi-Sparse Flow-Sensitive Pointer Analysis
   Ben Hardekopf, Calvin Lin
   The University of Texas at Austin
   POPL ’09
   Simplified by Eric Villasenor
2. Overview
   - Background
   - Flow-Sensitive Analysis
   - Semi-Sparse Flow-Sensitive Analysis
   - Questions
3. Uses
   - Gather pointer information to improve precision, which enables optimizations
   - Flow-sensitive analysis is beneficial for the following:
     - Security analysis
     - Deep error checking
     - Hardware synthesis
     - Multi-threaded programs
4. Types of Analysis
   Types of pointer analysis:
   - Flow-sensitive
     - Considers statement ordering in the code
     - Little progress made in scalability
   - Context-sensitive
     - Considers procedure calls
     - Good progress in scalability
   - The two improve precision in complementary ways
5. Analysis Tradeoffs
   - Scalability vs. precision
     - It takes time to analyze code
     - It takes memory to hold the analysis results
   - Insensitive vs. sensitive
     - Insensitive: less complex, less precise
     - Sensitive: more complex, more precise
   - Larger pieces of code are, in general, more complex
6. Traditional Flow-Sensitive Analysis
   - Lattice of dataflow facts
   - Meet operator on the lattice
   - Transfer functions map lattice elements to other lattice elements
   - Uses a CFG = <N, E>
     - N: nodes (program points)
     - E: edges (control flow)
7. Traditional Flow-Sensitive Analysis
   - Iterative algorithm
     - Runs until convergence
     - Adds successor nodes to the work list when a node's output set changes
     - Propagates pointer information to all reachable nodes
   - Prohibitive in memory and computation cost
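The iterative scheme on slides 6 and 7 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the statement encoding (`"addr"`, `"copy"`), the function names, and the four-node diamond CFG are all assumptions made for the example. The lattice elements are points-to maps, the meet is pointwise union, and every node stores a full copy of the map, which is what makes the traditional approach expensive.

```python
from collections import deque

def transfer(stmt, pts):
    """Apply one statement to a points-to map {var: frozenset(targets)}."""
    kind, lhs, rhs = stmt
    out = dict(pts)
    if kind == "addr":        # lhs = &rhs
        out[lhs] = frozenset({rhs})
    elif kind == "copy":      # lhs = rhs
        out[lhs] = pts.get(rhs, frozenset())
    return out

def meet(a, b):
    """Meet operator: pointwise union of two points-to maps."""
    return {v: a.get(v, frozenset()) | b.get(v, frozenset())
            for v in a.keys() | b.keys()}

def analyze(nodes, edges):
    """nodes: {id: stmt}; edges: list of (src, dst). Returns OUT per node."""
    succ = {n: [] for n in nodes}
    pred = {n: [] for n in nodes}
    for s, d in edges:
        succ[s].append(d)
        pred[d].append(s)
    out = {n: {} for n in nodes}
    work = deque(nodes)                 # work list starts with every node
    while work:
        n = work.popleft()
        in_n = {}
        for p in pred[n]:
            in_n = meet(in_n, out[p])
        new_out = transfer(nodes[n], in_n)
        if new_out != out[n]:           # output changed: revisit successors
            out[n] = new_out
            work.extend(succ[n])
    return out

# Hypothetical diamond CFG: 1 branches to 2 and 3, which join at 4.
nodes = {
    1: ("addr", "p", "a"),   # p = &a
    2: ("copy", "q", "p"),   # q = p
    3: ("addr", "p", "b"),   # p = &b
    4: ("copy", "r", "p"),   # r = p
}
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]
result = analyze(nodes, edges)
```

At the join, `r` may point to either `a` or `b`, while `q` on the left branch sees only `a`; that per-program-point distinction is what flow sensitivity buys.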
8. Contributions
   - Two ideas
     - Semi-sparse analysis
     - Novel use of Binary Decision Diagrams
   - Two new optimizations
     - Top-level pointer equivalence
     - Local points-to graph equivalence
9. Static Single Assignment
   - Def/use relation is captured
   - Use it to reduce the information sent to nodes

     w = a;    x = b;    y = &c;   z = y;    y = &d;
     w1 = a1;  x1 = b1;  y1 = c1;  z1 = y1;  y2 = d1;

     w = a;    x = b;    y = c;    z = y;    y = d;
     w1 = a1;  x1 = b1;  y1 = ?;   z1 = ?;   y2 = ?;
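The renaming shown on this slide can be reproduced with a small sketch. It handles only straight-line copy statements (no branches, so no phi functions are needed), and the function name `to_ssa` and the tuple encoding are assumptions for illustration. A variable that is read before any assignment is given version 1, matching the `a1`, `b1` names on the slide.

```python
def to_ssa(stmts):
    """Rename straight-line copies (lhs, rhs) into SSA form.

    Only handles code without branches, so no phi functions are inserted.
    """
    version = {}
    renamed = []
    for lhs, rhs in stmts:
        if rhs not in version:
            version[rhs] = 1                     # implicit definition at entry
        new_rhs = f"{rhs}{version[rhs]}"         # use the current version
        version[lhs] = version.get(lhs, 0) + 1   # each definition gets a new name
        renamed.append((f"{lhs}{version[lhs]}", new_rhs))
    return renamed

program = [("w", "a"), ("x", "b"), ("y", "c"), ("z", "y"), ("y", "d")]
ssa = to_ssa(program)
```

Because every name now has exactly one definition, the definition reaching `z1 = y1` is known without propagating facts through intermediate statements.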
10. Partial Static Single Assignment
    - Two classes of variable
      - Address-taken
        - In memory
        - Use ALLOC/STORE
      - Top-level
        - Never expose their address
        - Not dynamically allocated

    Source:
      int a, b, *c, *d;
      int* w = &a;
      int* x = &b;
      int** y = &c;
      int** z = y;
      c = 0;
      *y = w;
      *z = x;
      y = &d;
      z = y;
      *y = w;
      *z = x;

    Partial SSA form:
      w1 = ALLOC_a
      x1 = ALLOC_b
      y1 = ALLOC_c
      z1 = y1
      STORE 0 y1
      STORE w1 y1
      STORE x1 z1
      y2 = ALLOC_d
      z2 = y2
      STORE w1 y2
      STORE x1 z2
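To make the two variable classes concrete, the sketch below splits the variables of the slide's C fragment into address-taken and top-level. It is an assumption-heavy illustration: the `classify` helper is hypothetical, and it finds `&name` with a regular expression rather than a parser, so it would misfire on `&&` or bitwise `&`.

```python
import re

def classify(variables, source):
    """Split variables into (address_taken, top_level).

    A variable counts as address-taken if '&name' appears in the source.
    This regex scan is a stand-in for a real front end.
    """
    taken = set(re.findall(r"&\s*([A-Za-z_]\w*)", source)) & set(variables)
    return taken, set(variables) - taken

source = """
int a, b, *c, *d;
int* w = &a;  int* x = &b;
int** y = &c; int** z = y;
c = 0; *y = w; *z = x;
y = &d; z = y; *y = w; *z = x;
"""
address_taken, top_level = classify("abcdwxyz", source)
```

Here `a`, `b`, `c`, and `d` end up address-taken, while `w`, `x`, `y`, and `z` are top-level and can be put into SSA form.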
11. Partial Static Single Assignment
    - Advantages
      - A single global points-to graph for top-level variables
        - They have the same pointer information over the entire program
      - Top-level def/use info is immediately available
      - Local points-to graphs contain only address-taken information
12. Dataflow Graph
    - DFG: combination of a sparse evaluation graph (SEG) and def-use chains
    - Optimized version of the CFG
      - Omits nodes that neither define nor use pointer info
      - Connects address-taken statements so defs reach uses
    - Two-stage construction
      - First stage considers DEFadr and USEadr
      - Second stage connects top-level defs to uses
13. Dataflow Graph

    Inst Type   Example       Def-Use Info
    ---------   -----------   ------------------------------
    ALLOC       x = ALLOC_i   DEFtop
    COPY        x = y z       DEFtop, USEtop
    LOAD        x = *y        DEFtop, USEtop, USEadr
    STORE       *x = y        USEtop, DEFadr, USEadr
    CALL        x = foo(y)    DEFtop, USEtop, DEFadr, USEadr
    RET         return x      USEtop, USEadr
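The table translates directly into a lookup that decides which statements belong in the SEG portion of the DFG, i.e. those that define or use address-taken variables. The dictionary and helper names below are assumptions for illustration; the def-use sets are copied from the table.

```python
# Def-use information per instruction type, as listed in the table.
DEF_USE = {
    "ALLOC": {"DEFtop"},
    "COPY":  {"DEFtop", "USEtop"},
    "LOAD":  {"DEFtop", "USEtop", "USEadr"},
    "STORE": {"USEtop", "DEFadr", "USEadr"},
    "CALL":  {"DEFtop", "USEtop", "DEFadr", "USEadr"},
    "RET":   {"USEtop", "USEadr"},
}

def touches_address_taken(kind):
    """True if the instruction defines or uses address-taken variables."""
    return bool(DEF_USE[kind] & {"DEFadr", "USEadr"})

# Instruction kinds of the running example, in program order.
example = ["ALLOC", "ALLOC", "ALLOC", "COPY", "STORE", "STORE", "STORE",
           "ALLOC", "COPY", "STORE", "STORE"]
seg_nodes = [i for i, kind in enumerate(example) if touches_address_taken(kind)]
```

In the running example only the STORE statements qualify; the ALLOC and COPY statements are handled purely through top-level def-use chains.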
14. Dataflow Graph
    [Figure: dataflow graph for the partial SSA example. Its nodes are the statements below, in program order.]
      w1 = ALLOC_a
      x1 = ALLOC_b
      y1 = ALLOC_c
      z1 = y1
      STORE 0 y1
      STORE w1 y1
      STORE x1 z1
      y2 = ALLOC_d
      z2 = y2
      STORE w1 y2
      STORE x1 z2
15. Semi-Sparse Analysis
    - Each function has a program statement work list
      - Initialized to the statements that define variables
    - Each program statement that uses or defines address-taken variables has two points-to graphs
      - IN = incoming address-taken info
      - OUT = outgoing address-taken info
    - A global points-to graph holds pointer info for top-level variables
    - A function work list holds functions waiting to be processed
      - Initialized to contain all functions in the program
16. Semi-Sparse Analysis
    - Iterative algorithm
      - Computes the following for all nodes until convergence:
        $IN_k = \bigcup_{x \in pred(k)} OUT_x$
        $OUT_k = GEN_k \cup (IN_k \setminus KILL_k)$
    - The KILL set determines whether an update is strong or weak
      - Value of the left-hand side is known: do a strong update (precise)
      - Unsure of the left-hand side: do a weak update (conservative)
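The difference between strong and weak updates is easiest to see on a STORE. The sketch below is an assumption-laden illustration rather than the paper's code: `top` stands for the global top-level points-to graph, graphs are plain dictionaries, and a strong update is applied whenever the pointer has exactly one target (a real analysis would also require that target to be a single concrete location, not a heap or array summary).

```python
def join(graphs):
    """IN_k: union of the OUT graphs of all predecessors."""
    result = {}
    for g in graphs:
        for loc, targets in g.items():
            result.setdefault(loc, set()).update(targets)
    return result

def store_transfer(in_graph, top, val, ptr):
    """OUT for 'STORE val ptr' (that is, *ptr = val)."""
    out = {loc: set(t) for loc, t in in_graph.items()}
    targets = top.get(ptr, set())
    new_vals = top.get(val, set())      # a constant such as 0 has no targets
    if len(targets) == 1:
        (loc,) = targets
        out[loc] = set(new_vals)        # strong update: KILL old contents
    else:
        for loc in targets:             # weak update: KILL is empty
            out.setdefault(loc, set()).update(new_vals)
    return out

# Top-level points-to graph; 'p' is a hypothetical pointer with two targets.
top = {"w1": {"a"}, "x1": {"b"}, "y1": {"c"}, "z1": {"c"}, "p": {"c", "d"}}

g1 = store_transfer({"c": set()}, top, "w1", "y1")   # STORE w1 y1
g2 = store_transfer(g1, top, "x1", "z1")             # STORE x1 z1
g3 = store_transfer({"c": {"a"}}, top, "x1", "p")    # STORE x1 p
merged = join([g1, g2])
```

After the two strong updates `c` points only to `b`; the weak update through `p` leaves `c` pointing to both `a` and `b`, because the analysis cannot tell which location was overwritten.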
17. Top-Level Pointer Equivalence
    - Optimization
      - Reduces the number of top-level variables in the DFG
      - x is equivalent to y iff, for every z, x points to z exactly when y points to z
    - Key idea
      - Replace variables that have identical points-to sets with a single set representative
      - A member of the set is selected as the representative
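The key idea can be sketched as grouping variables by points-to set and rewriting the program to use one representative per group. How the equivalence is detected is not covered on the slide, so here the points-to sets are simply given; the helper names and the tuple encoding of statements are assumptions for illustration.

```python
def representatives(pts):
    """Map each variable to the representative of its points-to set."""
    rep_of_set = {}
    rep = {}
    for var in sorted(pts):             # first name in sorted order wins
        key = frozenset(pts[var])
        rep[var] = rep_of_set.setdefault(key, var)
    return rep

def rewrite(program, rep):
    """Replace variables by representatives and drop self-copies."""
    out = []
    for op, a, b in program:
        a, b = rep.get(a, a), rep.get(b, b)
        if op == "COPY" and a == b:
            continue                    # e.g. z1 = y1 becomes y1 = y1
        out.append((op, a, b))
    return out

# Running example: (ALLOC, dst, object), (COPY, dst, src), (STORE, value, pointer).
program = [
    ("ALLOC", "w1", "a"), ("ALLOC", "x1", "b"), ("ALLOC", "y1", "c"),
    ("COPY", "z1", "y1"),
    ("STORE", "0", "y1"), ("STORE", "w1", "y1"), ("STORE", "x1", "z1"),
    ("ALLOC", "y2", "d"), ("COPY", "z2", "y2"),
    ("STORE", "w1", "y2"), ("STORE", "x1", "z2"),
]
pts = {"w1": {"a"}, "x1": {"b"}, "y1": {"c"}, "z1": {"c"},
       "y2": {"d"}, "z2": {"d"}}

rep = representatives(pts)
optimized = rewrite(program, rep)
```

With `z1` folded into `y1` and `z2` into `y2`, the two COPY statements disappear and the program shrinks from eleven statements to nine.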
18. Top-Level Pointer Equivalence
    [Figure: the example DFG before and after the optimization.]
    Before:
      w1 = ALLOC_a
      x1 = ALLOC_b
      y1 = ALLOC_c
      z1 = y1
      STORE 0 y1
      STORE w1 y1
      STORE x1 z1
      y2 = ALLOC_d
      z2 = y2
      STORE w1 y2
      STORE x1 z2
    After (z1 replaced by y1, z2 replaced by y2):
      w1 = ALLOC_a
      x1 = ALLOC_b
      y1 = ALLOC_c
      STORE 0 y1
      STORE w1 y1
      STORE x1 y1
      y2 = ALLOC_d
      STORE w1 y2
      STORE x1 y2
19. Local Points-to Graph Equivalence
    - Optimization
      - Eliminates nodes in the DFG with identical points-to graphs
      - Such nodes share a single points-to graph
      - Used in the SEG portion of the graph
    - Key idea
      - Non-preserving nodes: only STORE and CALL modify address-taken pointer info
      - Preserving nodes: propagate pointer info to other nodes
20. Local Points-to Graph Equivalence
    - Process takes O(n^3)
      - n is the number of nodes in the SEG portion of the DFG (DEFadr or USEadr)
    - Further optimized to only use STORE
      - 0.1% precision loss
      - Similar to RTL
      - STORE to STORE collapsible
21. BDDs
    - Compressed representation of sets and relations
    - Operations are performed without decompression
      - Set operations can be performed in polynomial time
    - Useful to store the CFG and the points-to graph
    - Transfer functions are BDD operations
      - Set operations
22. Semi-Sparse Symbolic Analysis
    - Encode top-level points-to information in a BDD
      - Most variables are top-level
    - BDDs cannot operate on individual statements efficiently
    - Use the iterative algorithm for address-taken points-to information
      - Strong and weak updates
      - Allows the BDD to operate efficiently
23. Results of the Analysis

    Pointer Information   Semi-Sparse Flow-Sensitive   Semi-Sparse Flow-Sensitive    SSO vs. SS
    Representation        (SS), against baseline       Optimized (SSO), against
                                                       baseline
    -------------------   --------------------------   ---------------------------   ------------------
    bitmap                75x faster,                  183x faster,                  2.5x faster,
                          26x less memory              47x less memory               6.8x less memory
    BDD                   44.8x faster,                114x faster,                  4.4x faster,
                          1.4x less memory             1.4x less memory              1.03x less memory
24. Questions