Presentation Transcript

1. Efficient Hardware Implementation of Artificial Neural Networks Using Approximate Multiply-Accumulate Blocks
Mohammadreza Esmali Nojehdeh, Levent Aksoy and Mustafa Altun
Emerging Circuits and Computation (ECC) Group, Istanbul Technical University
IEEE Computer Society Annual Symposium on VLSI 2020

2. Outline
- Introduction
- Background
- Motivation
- ANN Design by Exploiting Approximate Blocks
- Experimental Results
- Conclusions

3. Introduction
- An artificial neural network (ANN) is a computing system made up of a number of simple, highly interconnected processing elements.
- ANNs have been applied to a wide range of problems, such as classification and pattern recognition.
- They have been realized on different design platforms: analog, digital, and hybrid very large scale integration (VLSI) circuits, field-programmable gate arrays (FPGAs), and neuro-computers.

4. Background
- The hardware complexity of an ANN is dominated by the multiplication of weights by input variables.
- [Figure: a neuron, the fundamental unit of an ANN]
- [Figure: an ANN architecture]
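To make the cost structure concrete, here is a minimal sketch of what one neuron computes; the sigmoid activation is an illustrative assumption (slide 9 lists the options the training tool actually supports). Each loop iteration is one multiply-accumulate (MAC), the operation the rest of the deck approximates.

```python
import math

def neuron(inputs, weights, bias):
    """One neuron: accumulate weight-input products, then apply an
    activation. The MAC loop dominates the hardware cost."""
    acc = bias
    for x, w in zip(inputs, weights):
        acc += w * x                       # one multiply-accumulate per input
    return 1.0 / (1.0 + math.exp(-acc))    # sigmoid, chosen for illustration
```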

5. Background
- Approximate computing is used to improve area, power, and energy in applications that do not strictly require high accuracy, such as image processing and learning.
- [Figure: conventional mirror adder cell, transistor-level schematic [1]]
- [Figure: approximate mirror adder cell, transistor-level schematic [1]]

Layout area of the mirror adder cells [1]:

| Mirror Adder Cell | Area (µm²) |
|---|---|
| Conventional | 40.66 |
| Approximate | 13.54 |

Truth table for the conventional full adder and the approximate adder [1]:

| A | B | Cin | Accurate Sum | Accurate Cout | Approximate Sum | Approximate Cout |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 0 | 1 | 0 | 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 | 1 | 1 | 0 |
| 1 | 0 | 0 | 1 | 0 | 0 | 1 |
| 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| 1 | 1 | 0 | 0 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 |

[1] H. A. F. Almurib, T. N. Kumar, and F. Lombardi, "Inexact designs for approximate low power addition by cell replacement," in Design, Automation and Test in Europe Conference and Exhibition (DATE), 2016.
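Reading off the truth table above, the approximate cell's outputs reduce to Sum = B and Cout = A; the sketch below checks that observation against the exact adder. This is a reading of the table, not a claim about the transistor-level design in [1].

```python
from itertools import product

def exact_fa(a, b, cin):
    """Conventional full adder."""
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))

def approx_fa(a, b, cin):
    """Approximate mirror adder behavior read off the truth table:
    Sum = B, Cout = A (matches the exact adder on 4 of 8 input rows)."""
    return b, a

for a, b, cin in product((0, 1), repeat=3):
    print(a, b, cin, exact_fa(a, b, cin), approx_fa(a, b, cin))
```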

6. Motivation
- Control logic (an up-counter): complexity grows with the number of inputs (or weights).
- Multiplexers: complexity grows with the number and bit-widths of inputs and weights.
- Multiplier: complexity grows with the maximum bit-widths of inputs and weights.
- Adder and register (R): complexity grows with the bit-width of the inner product of inputs and weights.
- [Figure: time-multiplexed design of a neuron]
- [Figure: simplified time-multiplexed design of a neuron]

7. Motivation
- Multipliers and adders are used heavily in ANNs and dominate the hardware complexity.
- Exploiting approximate multipliers and adders in the neuron computation can therefore significantly reduce hardware complexity, at the cost of some deviation in ANN accuracy.
- [Figure: a neuron's multiplier (×) and adder (+) replaced by approximate versions]

8. Time-Multiplexed ANN Design
The design procedure has three main steps:
1. Given the ANN structure, train the ANN using state-of-the-art techniques and find the weight and bias values.
2. Post-training stage:
   - Determine the minimum quantization value.
   - Convert the floating-point weight and bias values to integers.
   - Replace multipliers and adders by their approximate versions and check the accuracy (see the sketch below).
3. Describe the time-multiplexed ANN design in hardware.
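A minimal sketch of the accuracy check in step 2, assuming a hypothetical `layers` structure of (weights, biases) pairs with integer values and pluggable `mul`/`add` operators, so exact operators can be swapped for approximate ones; activation functions are omitted for brevity:

```python
from functools import reduce

def ann_accuracy(samples, labels, layers, mul, add):
    """Run the quantized ANN with the given multiply/add operators
    (exact or approximate) and report classification accuracy."""
    correct = 0
    for x, label in zip(samples, labels):
        act = list(x)
        for weights, biases in layers:
            # each output = bias accumulated with mul(weight, activation)
            act = [reduce(add, (mul(w, a) for w, a in zip(row, act)), b)
                   for row, b in zip(weights, biases)]
        if act.index(max(act)) == label:
            correct += 1
    return correct / len(samples)
```

Passing `operator.mul`/`operator.add` gives the exact baseline; substituting approximate operators (such as the LSB-zeroing multiplier sketched after slide 12) reproduces the spirit of the accuracy check.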

9. Training
Our training tool includes:
- several iterative optimization algorithms, namely conventional and stochastic gradient descent and the Adam optimizer [2];
- different weight initialization techniques, namely Xavier [3], He [4], and fully random;
- several stopping criteria, namely number of iterations, early stopping using the validation data set, and saturation of logic functions;
- different activation functions for the neurons in each layer, namely sigmoid, hyperbolic tangent, hard sigmoid, hard hyperbolic tangent, linear, rectified linear unit (ReLU), and softmax.

[2] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv e-prints, 2014, arXiv:1412.6980.
[3] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in International Conference on Artificial Intelligence and Statistics, 2010, pp. 249–256.
[4] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," arXiv e-prints, 2015, arXiv:1502.01852.

10. Hardware-aware Post-training
Computing the minimum quantization value:
1. Set the quantization value q, and the related ANN accuracy in hardware, ha(q), to 0.
2. Increase q by 1.
3. Convert each floating-point weight and bias value to an integer by multiplying it by 2^q and taking the ceiling of the result.
4. Compute ha(q) on the validation data set using the integer weight values.
5. If ha(q) > 0 and ha(q) − ha(q−1) > 0.1%, go to Step 2; otherwise, return q as the minimum quantization value.
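A minimal Python sketch of this loop, assuming flattened weight and bias lists and a caller-supplied `ha_on_validation(int_weights, int_biases)` callback (a hypothetical name) that returns the hardware accuracy in percent on the validation set:

```python
import math

def minimum_quantization_value(weights, biases, ha_on_validation):
    """Search for the smallest q after which the validation accuracy
    stops improving by more than 0.1%, per the procedure above."""
    q, ha_prev = 0, 0.0
    while True:
        q += 1
        # Step 3: scale each floating-point value by 2^q and take the ceiling.
        int_w = [math.ceil(w * 2 ** q) for w in weights]
        int_b = [math.ceil(b * 2 ** q) for b in biases]
        ha = ha_on_validation(int_w, int_b)   # Step 4
        # Step 5: stop once accuracy no longer improves by more than 0.1%.
        if not (ha > 0 and ha - ha_prev > 0.1):
            return q
        ha_prev = ha
```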

11. Hardware Design
- [Figure: ANN design using a MAC block for each neuron (SMAC NEURON)]
- [Figure: ANN design using a single MAC block (SMAC ANN)]

12. Hardware Design
- [Figure: exact 4-bit unsigned multiplier]
- [Figure: approximate 4-bit unsigned multiplier with the least significant 3 output bits set to logic value 0]
- An approximate multiplier is implemented by setting the r least significant output bits of an exact multiplier to zero, where r denotes its approximation level.
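A minimal sketch of this LSB-zeroing scheme; the bit-masking trick is an illustrative software model, not necessarily how the synthesized circuit is described:

```python
def approx_multiply(a, b, r):
    """Exact product with the r least significant output bits
    forced to zero, where r is the approximation level."""
    return (a * b) & ~((1 << r) - 1)

# Example with 4-bit operands: 13 * 11 = 143 = 0b10001111;
# at approximation level r = 3 the result becomes 0b10001000 = 136.
assert approx_multiply(13, 11, 3) == 136
```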

13. Experimental Results
- The pen-based handwritten digit recognition problem [24] was used as the application.
- In the neural network design for this application, 5 ANN structures with different numbers of hidden layers and neurons per hidden layer were considered.
- The chosen ANN structure is 16-16-10, implemented in two different architectures:
  - time-multiplexed, using a MAC block for each neuron (SMAC NEURON);
  - time-multiplexed, using a single MAC block for the whole ANN (SMAC ANN).
- The ANN designs were described in Verilog and synthesized using the Cadence RTL Compiler with the TSMC 40 nm design library.

14. Experimental Results
Results of the SMAC NEURON architecture using approximate multipliers:

| Multiplier Type | Approx. level (Hidden) | Approx. level (Output) | Area (µm²) | Delay (ns) | Latency (ns) | Power (mW) | Energy (pJ) | HMR | Area gain | Energy gain |
|---|---|---|---|---|---|---|---|---|---|---|
| Behavioral | 0 | 0 | 15327 | 3.58 | 121.68 | 1.44 | 174.77 | 5.00 | 0% | 0% |
| mul12s_2NM [5] | NA | NA | 13929 | 3.72 | 126.31 | 1.23 | 155.04 | 5.12 | 9% | 11% |
| mul12s_2KM [5] | NA | NA | 17227 | 3.70 | 125.80 | 1.44 | 181.33 | 5.00 | -12% | -3% |
| PBAM [6] | 7 | 11 | 13276 | 3.57 | 121.35 | 1.31 | 159.14 | 4.85 | 13% | 9% |
| PBAM [6] | 7 | 12 | 12992 | 3.66 | 124.37 | 1.30 | 161.52 | 5.03 | 7% | 15% |
| PBAM [6] | 8 | 11 | 12761 | 3.41 | 115.91 | 1.26 | 145.51 | 5.37 | 17% | 17% |
| LEBZAM | 6 | 9 | 11999 | 3.68 | 125.02 | 1.00 | 125.21 | 5.03 | 28% | 22% |
| LEBZAM | 7 | 11 | 10224 | 3.45 | 117.40 | 1.04 | 122.05 | 4.80 | 30% | 33% |
| LEBZAM | 7 | 12 | 9723 | 3.41 | 116.01 | 0.94 | 109.41 | 5.09 | 37% | 36% |

[5] V. Mrazek, R. Hrbacek, Z. Vasicek, and L. Sekanina, "EvoApprox8b: Library of approximate adders and multipliers for circuit design and benchmarking of approximation methods," in Design, Automation and Test in Europe Conference and Exhibition (DATE), 2017, pp. 258–261.
[6] M. E. Nojehdeh and M. Altun, "Systematic synthesis of approximate adders and multipliers with accurate error calculations," Integration, vol. 70, pp. 99–107, 2020.
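For reference, the gain columns in this and the following tables are consistent with relative reductions against the behavioral (exact) row; the one-liner below reproduces them. This is my reading of the data, not a formula stated on the slide.

```python
def gain(behavioral, approximate):
    """Relative reduction vs. the behavioral design, e.g.
    gain(15327, 13929) ~= 0.09, the 9% area gain of mul12s_2NM."""
    return (behavioral - approximate) / behavioral
```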

15. Experimental Results
Results of the SMAC NEURON architecture using approximate multipliers and adders (approximation levels given per layer for multiplier and adder):

| Multiplier Type | Hidden Mul | Hidden Add | Output Mul | Output Add | Area (µm²) | Delay (ns) | Latency (ns) | Power (mW) | Energy (pJ) | HMR | Area gain | Energy gain |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Behavioral | 0 | 0 | 0 | 0 | 15327 | 3.58 | 121.68 | 1.44 | 174.77 | 5.00 | 0% | 0% |
| mul12s_2NM [5] | NA | 10 | NA | 14 | 11854 | 3.92 | 133.14 | 0.59 | 78.76 | 5.17 | 23% | 55% |
| mul12s_2KM [5] | NA | 9 | NA | 15 | 13133 | 3.95 | 134.30 | 0.69 | 92.48 | 5.35 | 14% | 47% |
| PBAM [6] | 7 | 7 | 12 | 11 | 10226 | 3.66 | 124.37 | 0.61 | 76.25 | 5.03 | 21% | 56% |
| PBAM [6] | 7 | 7 | 12 | 12 | 9798 | 3.64 | 123.86 | 0.61 | 75.70 | 5.20 | 37% | 57% |
| PBAM [6] | 7 | 7 | 12 | 13 | 9354 | 3.66 | 124.37 | 0.62 | 77.25 | 5.17 | 39% | 56% |
| LEBZAM | 6 | 10 | 9 | 13 | 10392 | 3.58 | 121.72 | 0.58 | 70.11 | 5.32 | 32% | 60% |
| LEBZAM | 7 | 12 | 10 | 13 | 8801 | 3.61 | 122.88 | 0.55 | 67.32 | 4.89 | 43% | 61% |
| LEBZAM | 7 | 11 | 10 | 14 | 8989 | 3.61 | 122.81 | 0.52 | 63.68 | 4.97 | 63% | 41% |

16. Experimental Results
Results of the SMAC ANN architecture using approximate multipliers:

| Multiplier Type | Approx. level | Area (µm²) | Delay (ns) | Latency (ns) | Power (mW) | Energy (pJ) | HMR | Area gain | Energy gain |
|---|---|---|---|---|---|---|---|---|---|
| Behavioral | 0 | 3180 | 3.52 | 1646.42 | 0.35 | 569.33 | 5.00 | 0% | 0% |
| mul12s_2NM [5] | NA | 3278 | 3.72 | 1738.62 | 0.29 | 499.80 | 5.00 | -3% | 12% |
| mul12s_2KM [5] | NA | 3279 | 3.77 | 1764.83 | 0.29 | 504.74 | 5.00 | -3% | 11% |
| PBAM [6] | 0 | 3287 | 3.79 | 1774.19 | 0.29 | 518.38 | 5.00 | -3% | 9% |
| PBAM [6] | 7 | 3194 | 3.76 | 1760.15 | 0.28 | 499.60 | 4.83 | -1% | 12% |
| PBAM [6] | 8 | 3148 | 3.24 | 1518.19 | 0.28 | 431.60 | 5.35 | 2% | 24% |
| LEBZAM | 5 | 3189 | 3.69 | 1725.98 | 0.27 | 472.95 | 4.95 | -2% | 8% |
| LEBZAM | 6 | 3152 | 3.69 | 1724.58 | 0.28 | 490.38 | 4.94 | 1% | 14% |
| LEBZAM | 7 | 3091 | 3.56 | 1664.68 | 0.27 | 449.89 | 4.80 | 3% | 21% |

17. Experimental Results
Results of the SMAC ANN architecture using approximate multipliers and adders:

| Multiplier Type | Mul level | Add level | Area (µm²) | Delay (ns) | Latency (ns) | Power (mW) | Energy (pJ) | HMR | Area gain | Energy gain |
|---|---|---|---|---|---|---|---|---|---|---|
| Behavioral | 0 | 0 | 3180 | 3.52 | 1646.42 | 0.35 | 569.33 | 5.00 | 0% | 0% |
| mul12s_2NM [5] | NA | 13 | 2908 | 3.40 | 1590.26 | 0.25 | 391.63 | 5.06 | 9% | 31% |
| mul12s_2KM [5] | NA | 13 | 3140 | 3.68 | 1721.30 | 0.26 | 451.51 | 5.46 | 1% | 21% |
| PBAM [6] | 7 | 10 | 2972 | 3.55 | 1659.53 | 0.26 | 426.62 | 5.03 | 7% | 25% |
| PBAM [6] | 8 | 9 | 2978 | 3.59 | 1679.18 | 0.25 | 421.98 | 5.03 | 6% | 26% |
| PBAM [6] | 7 | 11 | 3029 | 3.84 | 1798.52 | 0.25 | 448.54 | 4.66 | 5% | 21% |
| LEBZAM | 6 | 14 | 3046 | 3.53 | 1652.51 | 0.28 | 469.89 | 4.95 | 4% | 17% |
| LEBZAM | 7 | 12 | 3041 | 3.62 | 1692.29 | 0.26 | 440.25 | 4.66 | 4% | 23% |
| LEBZAM | 7 | 13 | 3021 | 3.53 | 1650.17 | 0.26 | 426.73 | 5.40 | 5% | 25% |

18.-20. Experimental Results
(Slides 18 through 20 repeat the tables from slides 14-17 in side-by-side pairs for comparison; the data is identical to the tables above.)

21. Conclusions
- This paper presented efficient techniques for reducing the hardware complexity of time-multiplexed feedforward ANN designs.
- Approximate multipliers and adders were employed to reduce the hardware complexity.
- The proposed techniques were shown to yield a significant reduction in design complexity.

22. Acknowledgement
This work is supported by the TUBITAK-1001 projects #117E078 and #119E507, and the Istanbul Technical University BAP project #42446.

23. Contact: Mohammadreza Esmali Nojehdeh
E-mail: nojehdeh@itu.edu.tr
THANKS FOR YOUR ATTENTION
Questions?