Graph Neural Network and Reinforcement Learning in EDA and beyond



Presentation Transcript

1. Graph Neural Network and Reinforcement Learning in EDA and beyond
Callie Hao, Assistant Professor
ECE, Georgia Institute of Technology
Sharc-lab @ Georgia Tech: https://sharclab.ece.gatech.edu/

2. Outline
Background
- Graph Neural Network (GNN)
- Reinforcement Learning (RL)
- High Level Synthesis (HLS)
HLS: Program-to-Circuit
- IRONMAN: GNN-assisted design space exploration in high-level synthesis via reinforcement learning (Nan Wu, Yuan Xie, and Cong Hao, GLSVLSI, 2021)
- Program-to-Circuit: Exploiting GNNs for Program Representation and Circuit Translation (Nan Wu, Huake He, Yuan Xie, Pan Li, Cong Hao, arXiv, 2021)
And beyond: Program-to-X
- Call for future research interests

3. Acknowledgement
Nan Wu, University of California, Santa Barbara
Yuan Xie, University of California, Santa Barbara
Pan Li, Purdue University
… and myself (just to fill up the space)

4. Outline (recap). Next: Background, Graph Neural Network (GNN)

5. What is Graph Neural Network (GNN)
Traditional neural networks are designed for simple sequences & grids (e.g., speech/text).
[Slide credit: http://web.stanford.edu/class/cs224w]

6. What is Graph Neural Network (GNN)
Reality: a lot of real-world data does not "live" on grids
- Arbitrary size and complex topological structure
- No fixed node ordering or reference point
- Often dynamic, with multimodal features
Examples: social networks, economic networks, protein interaction networks
[Image credit: Madhavicmu / Wikimedia Commons / CC-BY-SA-4.0]

7. What is Graph Neural Network (GNN)
Main idea: pass messages between pairs of nodes and aggregate
[Slide credit: Structured deep models: Deep learning on graphs and beyond]

8. How is GNN Computed (e.g., GCN)
Key idea: generate node embeddings based on local network neighborhoods
- Node embedding: a vector representing the node's features
- Each node aggregates its neighbors' embeddings and transforms them with a shared MLP
[Slide credit: http://web.stanford.edu/class/cs224w]
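
To make the aggregation step concrete, here is a minimal sketch of one GCN-style layer in NumPy (an illustration added for this writeup, not from the slides; the toy graph, features, and weights are made up):

    import numpy as np

    # Toy graph: 4 nodes in a chain (edges 0-1, 1-2, 2-3)
    A = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    X = np.random.rand(4, 8)     # node features: 4 nodes, 8 features each
    W = np.random.rand(8, 16)    # layer weights, shared by ALL nodes

    A_hat = A + np.eye(4)                    # add self-loops
    D_inv_sqrt = np.diag(A_hat.sum(axis=1) ** -0.5)   # symmetric degree normalization
    # Aggregate neighborhood features, apply the shared linear map, then ReLU
    H = np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)
    print(H.shape)                           # (4, 16): one new embedding per node

Because the weight matrix W is shared by all nodes, the same layer applies to unseen nodes and entirely unseen graphs, which is exactly the inductive capability the next slide highlights.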

9. Inductive Capability of GNNs
- Model parameters are shared across all nodes
- The model can generalize to unseen nodes and entirely unseen graphs
[Slide credit: http://web.stanford.edu/class/cs224w]

10. Outline (recap). Next: Background, Reinforcement Learning (RL)

11. What is Reinforcement Learning (RL)
- An agent learns in an interactive environment by trial and error, using feedback from its own actions and experiences
- Suitable for control problems and sequential decision-making processes
- Can explore a design space proactively and intelligently
- Agent behavior is optimized by embedding the optimization goals into the reward function
[Slide credit: https://arxiv.org/pdf/2102.07952.pdf]
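
As a deliberately tiny illustration of that trial-and-error loop (not from the talk), here is a minimal tabular Q-learning sketch; the toy line-world environment and all hyperparameters are assumptions for the example:

    import random

    n_states, n_actions = 5, 2
    Q = [[0.0] * n_actions for _ in range(n_states)]
    alpha, gamma, eps = 0.1, 0.9, 0.2   # learning rate, discount factor, exploration rate

    def step(state, action):
        """Toy environment: walk left/right on a line; reward 1 at the right end."""
        nxt = max(0, min(n_states - 1, state + (1 if action == 1 else -1)))
        return nxt, (1.0 if nxt == n_states - 1 else 0.0)

    for episode in range(500):
        s = 0
        for _ in range(20):
            # Trial and error: explore with probability eps, otherwise act greedily
            if random.random() < eps:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: Q[s][i])
            s2, r = step(s, a)
            # Feedback from the environment updates the value estimate
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2

The reward function is where the optimization goal is embedded: change what the environment rewards and the learned behavior changes with it.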

12. Outline (recap). Next: Background, High Level Synthesis (HLS)

13. What is High Level Synthesis (HLS)
Flow: software specification and program (C / C++, Chisel, …)
  → High-Level Synthesis (HLS tools) → RTL (Verilog, VHDL, …)
  → Logic Synthesis → Physical Synthesis → circuit (ASIC, FPGA) design
Example input program:
    for (i = 1; i <= c; i++) {
        a = a + 1;
        b = x * 2 - a;
        a = y + b / 3;
    }
(figure: the corresponding datapath of registers, adders, and multipliers)

14. What is High Level Synthesis (HLS)
(same flow and example code as the previous slide)
Why HLS?
- Easy programming: C/C++/Python vs. Verilog
- Fast → productivity
- Promotes device usage
- More optimization opportunities at a higher level

15. Outline (recap). Next: HLS: Program-to-Circuit

16. HLS: Program-to-Circuit
- Programs are translated to Data Flow Graphs (DFGs) as the Intermediate Representation (IR)
- Techniques applied to DFGs: operation scheduling, resource allocation, resource binding, etc.
- Finally translated to a register-transfer level (RTL) circuit
Example program:
    x1 = a + b;
    x2 = b + c;
    x3 = d * e;
    y1 = x1 * x2;
    y2 = x2 * x3;
    o1 = y1 + 5;
    o2 = y1 * y2;
(figure: the corresponding DFG of + and * nodes, and the RTL circuit of ADD/MUL units and registers)
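
As a rough illustration of the program-to-DFG step (my sketch, not the talk's tooling), the example program above maps to a graph whose nodes are operations and whose edges follow the data dependences; networkx makes this easy to express (input operands a-e are omitted for brevity):

    import networkx as nx

    # DFG for the example program: one node per operation, edges follow data dependences
    dfg = nx.DiGraph()
    ops = {"x1": "+", "x2": "+", "x3": "*", "y1": "*", "y2": "*", "o1": "+", "o2": "*"}
    for var, op in ops.items():
        dfg.add_node(var, op=op)
    deps = [("x1", "y1"), ("x2", "y1"), ("x2", "y2"), ("x3", "y2"),
            ("y1", "o1"), ("y1", "o2"), ("y2", "o2")]
    dfg.add_edges_from(deps)
    print(nx.is_directed_acyclic_graph(dfg))   # True: a pure data-flow graph has no cycles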

17. HLS: Program-to-Circuit Challenges
Program-to-Circuit faces three challenges:
(1) Hard-to-predict RTL circuit quality
(2) Multi-objective trade-offs
(3) Manual code transformation
(figure: the same example program, DFG, and RTL circuit as on the previous slide)

18. Challenge 1: Hard-to-predict RTL Quality
- Getting the actual circuit quality after implementation is very time-consuming
  - E.g., resource usage, critical path timing
- How about quality prediction?
Turnaround: program → DFG in a few seconds; DFG → RTL (circuit) in minutes to hours; actual implementation in hours to days

19. Challenge 1: Hard-to-predict RTL Quality
RTL design quality is hard to predict, especially for irregular data paths:
- Existing HLS tools' estimates can sometimes be far from accurate
- Analytical-model-based predictors only work for well-structured data flows
- ML-based predictors require abundant features obtained only after design synthesis and/or implementation

20. Motivation 1: Predict Implementation Quality Accurately
Extract the DFG… and predict the implementation quality accurately
(same program / DFG / RTL / implementation turnaround as before)

21. Solution 1: Use Graph Neural Network (GNN)
Use Graph Neural Networks! …and they can generalize to unseen programs!

22. HLS: Program-to-Circuit Challenges (recap)
(1) Hard-to-predict RTL circuit quality; (2) multi-objective trade-offs; (3) manual code transformation

23. Challenge 2: Multi-objective Trade-offs
(Example) Constraint: number of DSPs on FPGA (DSP = 2)
(Example) Design choice: does each multiplication use a DSP or LUTs?
Objectives:
- Meet the user-specified constraint
- Minimize other resources (area)
- Optimize critical path timing (clock frequency)

24. Challenge 2: Multi-objective Trade-offs
(constraint DSP = 2, same objectives as above)
(a) HLS default solution: DSP: 4, Latency: 3

25. Challenge 2: Multi-objective Trade-offs
(a) HLS default solution: DSP: 4, Latency: 3
(b) HLS solution with naïve constraints: DSP: 2, Latency: 4

26. Challenge 2: Multi-objective Trade-offs
(a) DSP: 4, Latency: 3; (b) naïve constraints: DSP: 2, Latency: 4
Sacrificing latency (# of clock cycles)?

27. Challenge 2: Multi-objective Trade-offs
(a) HLS default solution: DSP: 4, Latency: 3
(b) HLS solution with naïve constraints: DSP: 2, Latency: 4
(c) A better solution: DSP: 2, Latency: 3

28. Motivation 2: Automated DSE
Automatically find the best resource allocation, i.e., one that:
- Learns to assign resources
- Meets user-specified constraints
- Trades off between resource and critical path timing
- Does not sacrifice latency
(figure: initial DFG vs. the better solution (c))

29. Solution 2: Reinforcement Learning (RL)
Use Reinforcement Learning (RL)!
(figure: initial DFG vs. the better solution (c))

30. HLS: Program-to-Circuit Challenges (recap)
(1) Hard-to-predict RTL circuit quality; (2) multi-objective trade-offs; (3) manual code transformation

31. Challenge 3: Manual Code Transformation
- High-level abstraction in HLS can conceal optimization opportunities
- The structured HLS coding style hinders fine-grained optimization (loops, function calls, etc.)
- Irregular logic requires manual or complicated code transformations

32. Challenge 3: Manual Code Transformation
Example:
    for (int i = 0; i < 8; i++) {
        sum += a[i] * b[i];
    }
What if the DSP constraint is 3?

    HLS Method                     Cycles  DSP  LUTs  CP (ns)
    Original                       17      1    75    4.07
    Unroll (factor=8, complete)    2       8    100   5.04
    Unroll (factor=4)              4       4    87    4.83
    Unroll (factor=3)              8       3    109   7.44
    Unroll + allocation (limit=3)  4       6    168   8.76

With pragmas alone, the best we can get: DSP = 3, Cycles = 8

33. Challenge 3: Manual Code Transformation
Transformed code:
    int m1 = a[0] * b[0];
    int m2 = a[1] * b[1];
    …
    int m8 = a[7] * b[7];
    int m9 = m1 + m2;
    …

    HLS Method                     Cycles  DSP  LUTs  CP (ns)
    Code Transformation (CT)       2       8    100   5.03

34. Challenge 3: Manual Code Transformation
    HLS Method                     Cycles  DSP  LUTs  CP (ns)
    Code Transformation (CT)       2       8    100   5.03
    CT + resource (5 Mul_LUT)      2       2    1742  4.24
    CT + resource (4 Mul_LUT)      2       2    1741  4.01
    CT + resource (3 Mul_LUT)      2       3    1461  3.98
With code transformation, the best we can get: DSP = 3, Cycles = 2

35. Challenge 3: Manual Code Transformation
The comparison: code transformation reaches DSP = 3, Cycles = 2, vs. DSP = 3, Cycles = 8 with pragmas alone
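
A toy sketch of what such a code transformation does (my illustration, not IronMan's actual Code Transformer): flatten the loop into explicit multiplies plus a balanced adder tree, so a later pass is free to bind each multiply to a DSP or to LUTs:

    def transform_dot_product(n=8):
        """Emit C code for an n-element dot product as explicit multiplies
        plus a balanced adder tree (log-depth instead of a sequential chain)."""
        lines = [f"int m{i + 1} = a[{i}] * b[{i}];" for i in range(n)]
        names = [f"m{i + 1}" for i in range(n)]
        tmp = n
        while len(names) > 1:                     # pairwise reduction
            nxt = []
            for j in range(0, len(names) - 1, 2):
                tmp += 1
                lines.append(f"int m{tmp} = {names[j]} + {names[j + 1]};")
                nxt.append(f"m{tmp}")
            if len(names) % 2:                    # carry an odd leftover forward
                nxt.append(names[-1])
            names = nxt
        lines.append(f"int sum = {names[0]};")
        return "\n".join(lines)

    print(transform_dot_product())

For n=8 this reproduces the shape shown on slide 33: m1..m8 are the multiplies, and m9 = m1 + m2 starts the adder tree.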

36. Challenge 3: Manual Code Transformation
Such transformations quickly become complicated for irregular loops:
    for (int i = 0; i < 17; i++) {
        for (int j = i; j < 23; j += 2) {
            sum += a[i] * b[j];
        }
    }
What if the DSP constraint is 13?

37. Challenge 3: Manual Code Transformation
Motivation: we need better performance and more flexible optimization choices
Solution: code transformation

38. Outline (recap). Next: IRONMAN: GNN-assisted design space exploration in high-level synthesis via reinforcement learning

39. IronMan: GNN + RL for HLS Prediction & DSE
IronMan: an end-to-end framework integrating CT, GPP, and RLMD, targeting HLS (Program-to-Circuit)
Nan Wu, Yuan Xie, and Cong Hao, GLSVLSI, 2021

40. IronMan: GNN + RL for HLS Prediction & DSE
Code Transformer (CT) → Challenge 3 (manual CT): exposes more optimization opportunities

41. IronMan: GNN + RL for HLS Prediction & DSE
GPP → Challenge 1 (hard-to-predict RTL quality): implementation quality prediction
Code Transformer (CT) → Challenge 3 (manual CT): exposes more optimization opportunities

42. IronMan: GNN + RL for HLS Prediction & DSE
GPP → Challenge 1 (hard-to-predict RTL quality): implementation quality prediction
RLMD → Challenge 2 (multi-objective trade-offs): automated DSE
Code Transformer (CT) → Challenge 3 (manual CT): exposes more optimization opportunities

43. IronMan: GNN + RL for HLS Prediction & DSE
IronMan goal (a case study):
- Find a solution that strictly meets the DSP constraints
- Find Pareto solutions between DSP and LUT resources on FPGAs, without sacrificing latency

44. GPP: GNN-based Performance Predictor

45. GPP: GNN-based Performance Predictor
Generating graph representations

46. GPP: GNN-based Performance Predictor
- Generating graph representations
- 3 models of the same structure separately predict LUT, DSP, and CP timing (a feed-forward NN on top)

47. GPP: GNN-based Performance Predictor
Node feature vector (10-dimensional):
- 1st-4th: input / node type (add, mul) / output
- 5th-9th: bit-width (2-32)
- 10th: with or without pragma

48. GPP: GNN-based Performance Predictor
(node feature vector as on the previous slide)
Goal of GPP:
- Provide graph embeddings of DFGs → generalize across different graphs
- Provide high-accuracy performance predictions → quick evaluation of generated solutions, accelerating RL training
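
One plausible way to assemble such a 10-dimensional node feature vector (my reading of the slide; the exact bit-width bucketing is an assumption, not IronMan's published encoding):

    def node_features(kind, bitwidth, has_pragma):
        """10-dim feature vector per the slide: 4 one-hot type slots
        (input / add / mul / output), 5 bit-width buckets (2..32), 1 pragma flag."""
        types = ["input", "add", "mul", "output"]
        buckets = [2, 4, 8, 16, 32]               # assumed bucket boundaries, bitwidth <= 32
        f = [0.0] * 10
        f[types.index(kind)] = 1.0
        f[4 + min(i for i, b in enumerate(buckets) if bitwidth <= b)] = 1.0
        f[9] = 1.0 if has_pragma else 0.0
        return f

    print(node_features("mul", 16, True))   # [0, 0, 1, 0, 0, 0, 0, 1, 0, 1]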

49. RLMD: RL-based Multi-objective DSE
- State: any partially assigned DFG
- Action: whether to assign a certain directive to the current node
- Reward: a negative weighted sum of the predicted resource utilization
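
For instance, a minimal sketch of such a reward function (the weights and the stand-in for GPP's predictions are placeholders, not IronMan's actual values):

    WEIGHTS = {"lut": 1.0, "dsp": 9.0, "cp": 1.0}   # made-up trade-off weights

    def reward(predicted, weights=WEIGHTS):
        """Negative weighted sum of GPP-predicted utilization/timing:
        a cheaper, faster predicted circuit yields a higher reward."""
        return -sum(weights[k] * predicted[k] for k in weights)

    # 'predicted' would come from GPP; these numbers are illustrative only
    print(reward({"lut": 0.30, "dsp": 0.50, "cp": 0.20}))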

50. Putting GPP and RLMD Together

51. Putting GPP and RLMD Together
Two policy optimization methods:
- Actor-critic
- Policy gradient

52. Putting GPP and RLMD Together
Two policy optimization methods: actor-critic and policy gradient
- Actor: a probability distribution over actions
- Critic: a state-value function
- The reward comes from GPP's predictions
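
For orientation, a compact single-step actor-critic update in its generic textbook form (PyTorch; the shapes and dummy reward are illustrative, not IronMan's implementation):

    import torch

    state = torch.randn(10)                   # e.g., a graph embedding from GPP
    actor = torch.nn.Linear(10, 2)            # logits over 2 actions: bind to DSP or LUTs
    critic = torch.nn.Linear(10, 1)           # state-value estimate V(s)
    opt = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=1e-3)

    dist = torch.distributions.Categorical(logits=actor(state))
    action = dist.sample()
    r = torch.tensor(1.0)                     # reward, e.g., from the GPP-based function above

    value = critic(state).squeeze()
    advantage = r - value                     # 1-step advantage (no bootstrapping, for brevity)
    actor_loss = -dist.log_prob(action) * advantage.detach()   # policy-gradient term
    critic_loss = advantage.pow(2)            # value regression
    opt.zero_grad()
    (actor_loss + critic_loss).backward()
    opt.step()

Plain policy gradient is the same update with the critic removed: the raw return takes the place of the advantage.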

53. Experiment Setup
Dataset
- Synthetic: 47 different topologies x 100 sets of directives per topology → 4,700 graphs
- Real-world: 8 real-case benchmarks x 100 sets of directives per benchmark → 800 graphs
- Real applications come from MachSuite, CHStone, and PolyBench/C: gemm, kernel_2mm, kernel_durbin (small, large), spmv, stencil3d (small, large), and kernel_adi
Baselines
- Simulated annealing (SA), genetic algorithm (GA), particle swarm optimization (PSO), Vivado HLS
Training
- Trained on 41 different synthetic topologies and 4 real-case benchmarks → 4,500 graphs
- Evaluated on the remaining graphs → 1,000 graphs
Evaluation
- Ground truth and solutions are synthesized by Vivado HLS and implemented by Vivado
- Actual resource utilization (LUT/DSP) and critical path (CP) timing

54. GPP Evaluation: Implementation Prediction
GPP predictions on resource utilization (LUT, DSP) and critical path timing (CP)

55. GPP Evaluation: Implementation Prediction
GPP predictions on resource utilization (LUT, DSP) and critical path timing (CP)
(figures: GPP vs. Vivado HLS predictions for each metric)

56. GPP Evaluation: Implementation Prediction
GPP reduces the prediction error of Vivado HLS by 10.9x in resource and 5.7x in timing
(figures: GPP vs. Vivado HLS predictions for each metric)

57. GPP Design Choices

58. IronMan Evaluation: Pareto Solutions
(figures: Pareto fronts on GEMM and Kernel_2mm; PG and AC, each with and without fine-tuning, vs. SA/GA/PSO)
Discussion:
- IronMan outperforms SA/GA/PSO by a large margin (>12.0%)
- Fine-tuning helps (11.6% additional reduction)
- In general, policy gradient outperforms actor-critic

59. IronMan Evaluation: Multi-objective Optimization
(figures: GEMM and Kernel_2mm under different objective weightings)
- LUT : CP = 1 : 9 → better timing
- LUT : CP = 9 : 1 → fewer LUTs

60. IronMan Evaluation: Meeting Constraints
RLMD meets user-specified constraints (# of DSPs) almost perfectly!

61. Summary of IronMan
Independently:
- GPP achieves high prediction accuracy
- RLMD obtains Pareto solutions surpassing GA/SA/PSO
Integrated, IronMan can:
- Help HLS tools generate higher-quality solutions under user-specified constraints
- Perform flexible DSE to provide Pareto solutions that are not currently supported by HLS tools
- Find solutions perfectly matching various DSP constraints
- Execute up to 400x faster than the heuristic algorithms and HLS tools

62. Outline (recap). Next: Program-to-Circuit: Exploiting GNNs for Program Representation and Circuit Translation

63. Diving Deeper: Program-to-Circuit
How powerful are GNNs at representing programs and solving the Program-to-Circuit problem?
(same example program, DFG, RTL circuit, and implementation turnaround as before)

64. From IronMan to Program-to-Circuit
    IronMan                               Program-to-Circuit
    Only MUL and ADD operations           All operation types: arithmetic, memory (load/store), control (loop, branch), etc.
    Data Flow Graph (DFG)                 Control Data Flow Graph (CDFG)
    10 node features (one-hot embedding)  More complex, learnable node embeddings
    2-layer GCN                           14 types of GNNs
Nan Wu, Huake He, Yuan Xie, Pan Li, Cong Hao, "Program-to-Circuit: Exploiting GNNs for Program Representation and Circuit Translation", arXiv, 2021

65. Prediction Performance on DFG and CDFG
(results figure)

66. Prediction Performance on DFG and CDFG
(results figure: DFG vs. CDFG)

67. Prediction Performance on DFG and CDFG
(results figure)

68. Discussions (1): DFG vs. CDFG
- CDFGs are approximately twice as large as DFGs → more difficult for graph-level regression
- CDFGs have a considerable number of loops → they challenge the representation power of GNNs
- Control nodes/edges introduce additional complexity
(figure: an example DFG and an example CDFG containing a loop)

69. Discussions (2): GNN Model Choices
- PNA and RGCN generally show superior performance
- PNA (Principal Neighbourhood Aggregation), with multiple aggregators, is more powerful at characterizing different neighborhood information
- Relational information, representing data/control dependencies, is important in IR graphs (RGCN)
References:
- Corso, G., Cavalleri, L., Beaini, D., Liò, P., and Veličković, P. Principal Neighbourhood Aggregation for Graph Nets. NeurIPS, 2020.
- Schlichtkrull, Michael, et al. Modeling Relational Data with Graph Convolutional Networks. European Semantic Web Conference, 2018.

70. Discussions (3): Global vs. Local Information
- CP timing shows lower error rates and better consistency between DFGs and CDFGs
- (Possibly) CP timing is insensitive to graph size since it captures local information between FFs
(figure: (a) a program, (b) its IR graph, (c) the circuit; the critical path between two flip-flops through combinational logic is local information, while resource usage (DSPs, slices) is global information)
(a) Program:
    void top_dfg(int* a, int* b, int* c) {
        *a = (*b) + 2;
        int t1 = (*a) * (*b) + 4;
        int t2 = t1 + (*c);
        int t3 = (*b) * (t2);
        *c = t1 + t2 + t3;
    }

71. GNNs vs. HLS
(figures: HLS prediction vs. PNA prediction)

72. GNNs vs. HLS
GNNs are slightly better on resource, and clearly better on CP timing, which is hard to predict
(figures: HLS prediction vs. PNA prediction)

73. Observed Challenges (so far…)
- Generalization across graph sizes, node degrees, and out-of-distribution cases
- Structural and algorithmic innovations to process heavy-loop graph topologies
  - GNNs are in general fragile on loops (the 1-WL test)
  - Small loops usually relate to memory operations and are confusing for resource estimation
- The gap from classification to regression
  - Most existing GNNs are designed for classification (node, link, graph)
  - Resource/timing prediction is regression → a harder problem!
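
To illustrate the last point: turning node embeddings into a graph-level regression only requires a pooled readout plus a small head trained with a regression loss such as MSE instead of cross-entropy. A minimal PyTorch sketch (the shapes and target value are made up):

    import torch

    node_embeddings = torch.randn(20, 16)     # from any GNN: 20 nodes, 16 dims each
    target = torch.tensor([1461.0])           # e.g., a measured LUT count

    readout = node_embeddings.sum(dim=0)      # graph-level pooling (sum/mean/max)
    head = torch.nn.Sequential(torch.nn.Linear(16, 32),
                               torch.nn.ReLU(),
                               torch.nn.Linear(32, 1))
    pred = head(readout)
    loss = torch.nn.functional.mse_loss(pred, target)   # regression: MSE, not cross-entropy
    loss.backward()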

74. Outline (recap). Next: And beyond: Program-to-X

75. From Program-to-Circuit to Program-to-X
(figure: Program → Circuit generalizes to Program → X)

76. From Program-to-Circuit to Program-to-X
- GNN representation power for programs
  - Handling loops (very typical in programs)
  - Largely varying graph sizes
  - …
- Specific GNN designs for different downstream tasks
  - Program analysis for vulnerability
  - Behavior prediction
  - Throughput estimation
  - …
- Benchmark development

77. Summary & Thanks!
- Discussed the power of GNN and RL in solving EDA problems, e.g., HLS: IronMan!
- Discussed a more general problem, Program-to-Circuit
  - Hoping for more datasets, better GNNs, more domain-specific insights, etc.
- The potential from Program-to-Circuit to Program-to-X
  - Looking forward to seeing more work!
Contact: callie.hao@ece.gatech.edu
Sharc-lab @ Georgia Tech (https://sharclab.ece.gatech.edu/)