TVM: An Automated End-to-End Optimizing Compiler for Deep Learning


Presentation Transcript

Slide1

TVM: An Automated End-to-End Optimizing Compiler for Deep Learning

T. Chen, T. Moreau, Z. Jiang, L. Zheng, S. Jiao, E. Yan, H. Shen, M. Cowan, L. Wang, Y. Hu, L. Ceze, C. Guestrin, and A. Krishnamurthy

Presentation by Grzegorz Wilk

Slide2

Background

Goal: rewrite computational graphs into functionally equivalent but more efficient ones, subject to the hardware backend we are running on.

This requires high-level computation graph optimizations as well as operator-level optimizations.

Slide3

TVM Computational Graph Optimizations

Fuses whichever operators it can to reduce memory operations (see the sketch below).

Transforms the data layout of intermediate tensors to allow for more efficient execution on the target hardware.
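TVM performs fusion on the high-level graph IR, but the effect is easy to see with the tensor expression API. Below is a minimal sketch using TVM's Python `te` interface (the operation and sizes are illustrative, not from the paper): inlining a producer stage into its consumer removes the intermediate buffer, which is the same memory-traffic saving that graph-level operator fusion provides.

```python
import tvm
from tvm import te

n = 1024
A = te.placeholder((n,), name="A")
# Two elementwise stages: B is an intermediate, C is the final result.
B = te.compute((n,), lambda i: A[i] * 2.0, name="B")
C = te.compute((n,), lambda i: B[i] + 1.0, name="C")

s = te.create_schedule(C.op)
# Without this, B is written to memory and read back to compute C. Inlining
# B into C yields a single loop and no intermediate buffer -- the same
# memory-traffic saving that graph-level operator fusion provides.
s[B].compute_inline()

print(tvm.lower(s, [A, C], simple_mode=True))
```

Printing the lowered IR with and without `compute_inline` shows one fused loop versus two loops with an intermediate allocation.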

Slide4

But what about low-level?

ML computational graphs are often too high level to allow for hardware back-end specific operator-level optimizations.

TensorFlow, PyTorch, and MXNet all leave it to vendor libraries.

Slide5

Separation of computation definition and low-level scheduling

TVM adopts Halide's compute/schedule separation and extends it with new optimizations.
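A minimal sketch of that separation, using TVM's `te` API (the specific computation and tile size are illustrative): the algorithm is declared once, and the schedule changes how it runs without touching its definition.

```python
import tvm
from tvm import te

# Computation definition: *what* to compute (a vector add). Nothing here
# commits to loop order, tiling, threading, or memory placement.
n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.placeholder((n,), name="B")
C = te.compute((n,), lambda i: A[i] + B[i], name="C")

# Schedule: *how* to compute it. The same definition can be lowered naively
# or transformed aggressively without touching the math above.
s = te.create_schedule(C.op)
xo, xi = s[C].split(C.op.axis[0], factor=64)
s[C].parallel(xo)    # outer loop across CPU threads
s[C].vectorize(xi)   # inner loop as SIMD lanes

fadd = tvm.build(s, [A, B, C], target="llvm", name="vadd")
```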

Slide6

Operator-level optimizations

Loop transformations

Thread binding

Compute locality

Special memory scope

Tensorization

Latency hiding

(A schedule sketch illustrating the first two follows; the later slides cover memory scope, tensorization, and latency hiding.)
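As a sketch of loop transformations and thread binding, the schedule below tiles a 2-D elementwise operation and maps the resulting loops onto GPU blocks and threads. It uses TVM's `te` API; the operation and tile sizes are illustrative.

```python
import tvm
from tvm import te

n = 1024
A = te.placeholder((n, n), name="A")
B = te.compute((n, n), lambda i, j: A[i, j] * 2.0, name="B")

s = te.create_schedule(B.op)
# Loop transformations: tile the 2-D iteration space into 32x32 blocks.
io, ii = s[B].split(B.op.axis[0], factor=32)
jo, ji = s[B].split(B.op.axis[1], factor=32)
s[B].reorder(io, jo, ii, ji)
# Thread binding: map tiles to GPU thread blocks and elements to threads.
s[B].bind(io, te.thread_axis("blockIdx.y"))
s[B].bind(jo, te.thread_axis("blockIdx.x"))
s[B].bind(ii, te.thread_axis("threadIdx.y"))
s[B].bind(ji, te.thread_axis("threadIdx.x"))

# Inspect the transformed loop nest; tvm.build(s, [A, B], target="cuda")
# would generate the actual kernel.
print(tvm.lower(s, [A, B], simple_mode=True))
```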

Slide8

Memory scoping

Marks buffers with special memory scopes (e.g. GPU shared memory) so cooperating threads can take advantage of memory locality in parallel settings.
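A hedged sketch of memory scoping on a GPU matrix multiply, using TVM's `te` API (tile sizes are illustrative): tiles of the inputs are staged in shared memory and loaded cooperatively by the threads of a block, while each thread accumulates its result in local (register) scope.

```python
import tvm
from tvm import te

n = 1024
A = te.placeholder((n, n), name="A")
B = te.placeholder((n, n), name="B")
k = te.reduce_axis((0, n), name="k")
C = te.compute((n, n), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

s = te.create_schedule(C.op)
# Special memory scopes: stage input tiles in GPU shared memory and keep the
# per-thread accumulator in local (register) scope.
AS = s.cache_read(A, "shared", [C])
BS = s.cache_read(B, "shared", [C])
CL = s.cache_write(C, "local")

bx, by = te.thread_axis("blockIdx.x"), te.thread_axis("blockIdx.y")
tx, ty = te.thread_axis("threadIdx.x"), te.thread_axis("threadIdx.y")

io, ii = s[C].split(C.op.axis[0], factor=16)
jo, ji = s[C].split(C.op.axis[1], factor=16)
s[C].reorder(io, jo, ii, ji)
s[C].bind(io, by)
s[C].bind(jo, bx)
s[C].bind(ii, ty)
s[C].bind(ji, tx)

# Each thread accumulates its C element in registers over tiles of k.
s[CL].compute_at(s[C], ji)
ko, ki = s[CL].split(s[CL].op.reduce_axis[0], factor=16)

# For every k-tile, the 16x16 threads of a block cooperatively load the
# shared-memory tiles of A and B, then reuse them from fast memory.
s[AS].compute_at(s[CL], ko)
s[BS].compute_at(s[CL], ko)
for load in (AS, BS):
    li, lj = s[load].op.axis
    s[load].bind(li, ty)
    s[load].bind(lj, tx)

print(tvm.lower(s, [A, B, C], simple_mode=True))
```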

Slide9

Tensorization

A means of generically declaring hardware tensor intrinsics, so a schedule can replace a unit of computation with the corresponding hardware instruction; this lets TVM seamlessly target new hardware capabilities.
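A minimal tensorization sketch with TVM's `te` API. The 16-wide `vadd16` routine called here is a hypothetical stand-in for a real hardware intrinsic: its behavior is declared as a tensor expression plus a lowering rule, after which `tensorize` replaces the matching inner loop with the intrinsic call.

```python
import tvm
from tvm import te

def intrin_vadd(n=16):
    # Declare what the intrinsic computes: a length-n vector add.
    a = te.placeholder((n,), name="a")
    b = te.placeholder((n,), name="b")
    c = te.compute((n,), lambda i: a[i] + b[i], name="c")
    # Buffer declarations tell TVM how the intrinsic expects its operands.
    Ab = tvm.tir.decl_buffer(a.shape, a.dtype, name="Ab", offset_factor=1)
    Bb = tvm.tir.decl_buffer(b.shape, b.dtype, name="Bb", offset_factor=1)
    Cb = tvm.tir.decl_buffer(c.shape, c.dtype, name="Cb", offset_factor=1)

    def intrin_func(ins, outs):
        ib = tvm.tir.ir_builder.create()
        aa, bb = ins
        cc = outs[0]
        # Emit a call to the (hypothetical) hardware routine in place of the loop.
        ib.emit(tvm.tir.call_extern("int32", "vadd16",
                                    cc.access_ptr("w"),
                                    aa.access_ptr("r"),
                                    bb.access_ptr("r")))
        return ib.get()

    return te.decl_tensor_intrin(c.op, intrin_func, binds={a: Ab, b: Bb, c: Cb})

n = 1024
A = te.placeholder((n,), name="A")
B = te.placeholder((n,), name="B")
C = te.compute((n,), lambda i: A[i] + B[i], name="C")
s = te.create_schedule(C.op)
# Split so the inner loop matches the intrinsic's width, then tensorize it:
# the 16-iteration inner loop becomes a single call to vadd16.
xo, xi = s[C].split(C.op.axis[0], factor=16)
s[C].tensorize(xi, intrin_vadd(16))
print(tvm.lower(s, [A, B, C], simple_mode=True))
```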

Slide10

Latency hiding

Rearranges memory transfer operations so that computation overlaps with outstanding memory accesses, hiding memory latency (especially on accelerators with decoupled access-execute pipelines).
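The paper's latency-hiding transformation targets such accelerators via virtual threading; as a simpler stand-in for the idea, the sketch below uses TVM's `double_buffer` schedule primitive, which overlaps loading the next chunk of data with computing on the current one. Names, scopes, and sizes are illustrative.

```python
import tvm
from tvm import te

n = 4096
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] * 2.0, name="B")

s = te.create_schedule(B.op)
# Stage chunks of A through a cached copy; on a real accelerator this would
# be a fast on-chip buffer rather than the "global" scope used here.
AL = s.cache_read(A, "global", [B])
xo, xi = s[B].split(B.op.axis[0], factor=64)
s[AL].compute_at(s[B], xo)
# Double buffering: while chunk k is being consumed, chunk k+1 is fetched,
# overlapping memory transfer with computation.
s[AL].double_buffer()

print(tvm.lower(s, [A, B], simple_mode=True))
```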

Slide11

Automatic optimization search space exploration

A cost model, estimated using machine learning, predicts the performance of candidate strategies and guides the automatic exploration of the optimization space.

TVM runs experiments on the target hardware, and the measurements feed back into the cost model.
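A hedged sketch of this loop using AutoTVM, TVM's interface to the ML-guided search (the template name, knobs, and trial count are illustrative; the XGBoost-based cost model requires the xgboost package):

```python
import tvm
from tvm import te, autotvm

# A tunable matmul template: the schedule exposes knobs (tile sizes) that
# define the search space instead of hard-coding one strategy.
@autotvm.template("demo/matmul")  # template name is illustrative
def matmul(N, L, M, dtype):
    A = te.placeholder((N, L), name="A", dtype=dtype)
    B = te.placeholder((L, M), name="B", dtype=dtype)
    k = te.reduce_axis((0, L), name="k")
    C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

    s = te.create_schedule(C.op)
    y, x = s[C].op.axis
    cfg = autotvm.get_config()
    cfg.define_split("tile_y", y, num_outputs=2)
    cfg.define_split("tile_x", x, num_outputs=2)
    yo, yi = cfg["tile_y"].apply(s, C, y)
    xo, xi = cfg["tile_x"].apply(s, C, x)
    s[C].reorder(yo, xo, yi, xi)
    return s, [A, B, C]

task = autotvm.task.create("demo/matmul",
                           args=(512, 512, 512, "float32"),
                           target="llvm")

# The XGBTuner fits a gradient-boosted cost model to the measured trials and
# uses it to pick which candidate schedules to try next; each real-hardware
# measurement feeds back into the model.
measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=5))
tuner = autotvm.tuner.XGBTuner(task)
tuner.tune(n_trial=50,
           measure_option=measure_option,
           callbacks=[autotvm.callback.log_to_file("matmul_tune.log")])
```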

Slide12

Evaluation

Slide13

Slide14

Critique

Positives:

Solves the problem it poses.

Evaluates each proposed optimization, as well as the end-to-end computational pipeline across 4 hardware platforms and 5 ML workloads, including specific operators within these workloads.

Negatives:

Optimization search is limited to one device; it can't take advantage of multi-GPU setups.

Missing descriptions of the operator optimizations that are included in TVM but weren't introduced by it.

Limited evaluation of how long the optimization search takes to produce the chosen strategy (shown for just one conv2d operator).


Slide16

fin