ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars


Presentation Transcript

1. ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars
Ali Shafiee*, Anirban Nag*, Naveen Muralimanohar†, Rajeev Balasubramonian*, John Paul Strachan†, Miao Hu†, R. Stanley Williams†, Vivek Srikumar*
*University of Utah †Hewlett Packard Labs

2. Executive Summary
- Classifying images is in vogue, and conv nets are the best at it.
- Conv nets mean lots of vector-matrix multiplication; the analog memristor crossbar is a great fit.
- Analog-to-digital conversion overheads! Smart encoding reduces such overheads.
- A balanced pipeline is critical for high efficiency.
- Preserving high precision is essential in analog.
- ISAAC: 14.8x better in throughput and 5.5x better in energy than the digital state of the art (DaDianNao).

3. State-of-the-Art Convolutional Neural Networks
- Deep residual networks: convolution layers, pooling layers, fully connected layers.
- 152 layers! 11 billion operations!

4. Convolution Operation
[Figure: a convolution over Ni = 3 input feature maps of size Nx × Ny, using three kernels (No = 3) of size Kx = Ky = 2 with stride Sx = Sy = 1.]
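As a concrete check of the slide's notation, here is a minimal Python sketch (not from the talk) of the output size of one convolution layer; the input size Nx = Ny = 5 is an assumed value for illustration.

```python
# Hypothetical helper, not from the paper: output extent along one axis
# for input size n, kernel size k, stride s, and no padding.
def conv_output_dim(n, k, s):
    return (n - k) // s + 1

# Slide example: Kx = Ky = 2, Sx = Sy = 1 kernels over an (assumed)
# 5x5 input with Ni = 3 channels, producing No = 3 output maps.
Nx = Ny = 5
print(conv_output_dim(Nx, 2, 1))  # -> 4 outputs along each axis
```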

5. Memristor Dot-product Engine
[Figure: a bitline with two cells. Applying voltage V1 across conductance G1 draws current I1 = V1·G1; likewise I2 = V2·G2. The bitline current is their sum, I = I1 + I2 = V1·G1 + V2·G2. A 4×4 array of weights w00..w33 thus computes outputs y0..y3 from inputs x0..x3 in a single step.]
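The analog dot product can be modeled in a few lines of Python. This is a functional sketch only: each column j computes I_j = Σ_i V_i·G_ij by Ohm's law per cell and Kirchhoff's current law down the bitline; the voltage and conductance values below are made up for illustration.

```python
import numpy as np

V = np.array([0.5, 1.0, 0.0, 0.8])        # input voltages x0..x3, one per row
G = np.random.uniform(0, 1e-3, (4, 4))    # cell conductances encoding w00..w33

# Ohm's law per cell (I = V.G) and Kirchhoff's law per bitline: every
# column current is a full dot product, computed in one step by the array.
I = V @ G                                  # outputs y0..y3
print(I)
```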

6. Memristor Dot-product Engine
[Figure: the three 2×2 kernels from the convolution example (Kx = Ky = 2, stride Sx = Sy = 1, Ni = 3, No = 3) are unrolled into the crossbar, one column per kernel.]

7. Crossbar
[Figure: eight 16-bit input neurons drive the array one bit at a time (1-bit inputs), taking 16 iterations; each memristor cell stores 2 bits, so one 16-bit weight is spread across eight adjacent 2-bit cells.]
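A functional Python model of this arithmetic (unsigned case; the function and variable names are mine, not the paper's): a 16-bit weight is split into eight 2-bit cell slices, the 16-bit input is streamed one bit per iteration, and shift-and-add reassembles the exact product.

```python
import numpy as np

def crossbar_dot(x16, w16, w_bits=2, cells=8):
    """Exact 16-bit dot product via 1-bit inputs and 2-bit weight slices."""
    mask = (1 << w_bits) - 1
    slices = [(w16 >> (w_bits * c)) & mask for c in range(cells)]
    total = 0
    for it in range(16):                      # one input bit per iteration
        bits = (x16 >> it) & 1                # 1-bit DAC value per row
        col = sum(int(np.dot(bits, s)) << (w_bits * c)   # ADC + shift-and-add
                  for c, s in enumerate(slices))
        total += col << it                    # weight by the input bit position
    return total

x = np.array([3, 4464, 12, 9], dtype=np.int64)      # 16-bit inputs
w = np.array([100, 2, 65535, 7], dtype=np.int64)    # 16-bit weights
assert crossbar_dot(x, w) == int(np.dot(x, w))
```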

8. ISAAC Organization
[Figure: one in-situ multiply-accumulate (IMA) unit. An input register feeds digital-to-analog conversion, which drives the crossbar over 16 iterations. Rows 0-127 and rows 128-255 produce partial outputs 0 and 1, which pass through analog-to-digital conversion, shift-and-add units, and a sigmoid unit into the output register.]

9. An ISAAC Chip: Inter-Tile Pipelining
[Figure: layers 1, 2, and 3 are mapped to tiles 1, 2, and 3, with eDRAM buffers holding the intermediate feature maps between consecutive tiles.]

10. Balanced Pipeline
- Example: if layer i has strides Sx = 1 and Sy = 2, replicate layer i−1 two times (a sketch of this rule follows below).
- Storage allocation starts from the last layer; at any time a buffered value is either not computed yet, received from the previous layer, or already serviced and released.
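A small sketch of the replication rule as I read it from the slide (my formulation, not code from the paper): each layer must produce pixels as fast as all downstream layers consume them, so a layer is replicated by the product of the strides of every layer after it.

```python
def replication_factors(strides):
    """strides: per-layer (Sx, Sy), first to last; returns copies per layer."""
    factor = 1
    factors = []
    for sx, sy in reversed(strides):   # walk backward from the last layer
        factors.append(factor)         # product of downstream strides so far
        factor *= sx * sy              # this layer's stride slows downstream
    return list(reversed(factors))

# Slide example: layer i has Sx = 1, Sy = 2 -> layer i-1 is replicated 2x.
print(replication_factors([(1, 1), (1, 2)]))  # -> [2, 1]
```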

11. Balanced Pipeline
[Figure: a chain of layers with strides Sx = Sy = 2, then Sx = 1 and Sy = 2, then Sx = Sy = 2; earlier layers are built from replicated 128×128 crossbars so that every stage produces data at the rate the next stage consumes it.]

12. The ADC Overhead
- ADCs occupy a large area and are power hungry.
- Area and power increase exponentially with ADC resolution and frequency.

13. The ADC Overhead
[Figure: R rows of w-bit memristor cells driven by v-bit DAC inputs.]
ADC resolution = log2(R) + v + w − 1 (if v = 1).
With w = 2, R = 128, and v = 1, this is a 9-bit ADC.
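A worked version of the formula, filling in the reasoning the slide compresses: for v = 1 the largest possible column sum is R(2^w − 1), one bit less than the general R(2^v − 1)(2^w − 1) bound.

```latex
% Worked instance of the slide's formula: with R rows, v-bit DAC inputs,
% and w-bit cells, the largest column sum for v = 1 is R(2^w - 1), which
% needs about log2(R) + w = log2(R) + v + w - 1 bits to represent.
\[
  \text{ADC bits} \;=\; \log_2 R + v + w - 1
  \;=\; \log_2 128 + 1 + 2 - 1 \;=\; 9 .
\]
```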

14. Encoding Scheme
- Consider a column of R memristor cells driven by the maximal input (all 1s through the DACs).
- If the MSB of the resulting column sum would be 1, store that column's weights in flipped form, so that the MSB is 0 always.
- Effective ADC resolution required drops from 9 bits to 8 bits.
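Here is a minimal Python reconstruction of the flipping trick, under my reading of the slide: a column is stored flipped when its maximal-input sum would set the 9th bit, so whatever the ADC reads always fits in 8 bits. For arbitrary inputs, recovery would subtract one maximal cell value per 1-bit in the input rather than the full max_sum; the maximal-input case shown here keeps the sketch short.

```python
import numpy as np

R, w_bits = 128, 2
max_cell = (1 << w_bits) - 1                   # 3 for 2-bit cells
max_sum = R * max_cell                         # column sum under maximal input

col = np.random.randint(0, max_cell + 1, R)    # one column of cell values
flip = int(col.sum()) >= 1 << 8                # 9-bit MSB of the sum would be 1
stored = (max_cell - col) if flip else col     # flipped form keeps MSB = 0

read = int(stored.sum())                       # an 8-bit ADC now suffices
value = (max_sum - read) if flip else read     # digital recovery of the true sum
assert value == int(col.sum()) and read < (1 << 8)
```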

15. Handling Signed Arithmetic
- Input neurons are 2's complement: the MSB represents −2^15, so the 16th iteration performs a shift-and-subtract instead of a shift-and-add.
- Weights are stored with a bias of 2^15 (a biased representation, like the floating-point exponent); to correct, subtract as many biases as there are 1s in the input bits.
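The two corrections compose cleanly. This Python sketch (mine, not the paper's datapath) checks them end to end: biased weights are corrected by subtracting one bias per input 1-bit, and the last iteration subtracts instead of adds.

```python
import numpy as np

BIAS = 1 << 15

def signed_dot(x, w):
    """x, w: int16 vectors; returns the exact signed dot product."""
    stored = w.astype(np.int64) + BIAS           # biased, non-negative weights
    xu = x.astype(np.int64) & 0xFFFF             # raw 16-bit input patterns
    total = 0
    for it in range(16):
        bits = (xu >> it) & 1                    # one input bit per iteration
        col = int(np.dot(bits, stored))          # what the crossbar computes
        col -= BIAS * int(bits.sum())            # remove one bias per 1-bit
        total += -(col << it) if it == 15 else (col << it)  # MSB = -2^15
    return total

x = np.array([-5, 300, -32768, 7], dtype=np.int16)
w = np.array([9, -40, 2, -32768], dtype=np.int16)
assert signed_dot(x, w) == int(np.dot(x.astype(np.int64), w.astype(np.int64)))
```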

16. Analysis Metrics
- CE: Computational Efficiency (GOPS/mm²)
- PE: Power Efficiency (GOPS/W)
- SE: Storage Efficiency (MB/mm²)

17. Design Space Exploration
Parameters swept:
1) rows per crossbar
2) ADCs per IMA
3) crossbars per IMA
4) IMAs per tile

18. Design Space Exploration
[Figure: computational efficiency (GOPS/mm²) and power efficiency (GOPS/W) across the various design points, with the chosen ISAAC-CE, ISAAC-PE, and ISAAC-SE configurations highlighted.]

19. Power Contribution
[Figure: pie charts breaking down ISAAC's power contribution by component. Labeled slices include Hyper Transport (3%) and the router; the remaining slices (49%, 58%, 16%, 12%, 7%, 5%) cover the other on-chip components.]

20. Improvement over DaDianNao (Throughput)
Throughput is 14.8x better because:
1. Memristor crossbars have high computational parallelism.
2. DaDianNao fetches both inputs and weights from eDRAM; ISAAC fetches just the inputs.
3. DaDianNao suffers from bandwidth limitations in fully connected layers.
ISAAC requires more power, but for the above reasons it is 5.5x better in terms of energy.
[Figure: per-network results across the deep neural net benchmarks.]

21. Conclusion
ISAAC:
- Takes advantage of analog in-situ computing and fetches just the input neurons.
- Handles ADC overheads with a smart encoding scheme.
- Does not compromise on output precision.
- Is faster than DaDianNao thanks to 8x better computational efficiency and a balanced pipeline that keeps all units busy.
A few questions still remain: can we integrate online training?