Previously, we discussed about “prototyping” code for S - PowerPoint Presentation

karlyn-bohler
Uploaded On 2018-01-11

Presentation Transcript

Slide1

Previously, we discussed “prototyping” code for SHA1 and SHA256:

eval_sha1.sv (219.88 MHz)

eval_sha256.sv (129.4 MHz)

Today, we will consider prototyping the “unfolding” of SHA1 and SHA256 (2 rounds per cycle)

eval_sha1_2x.sv (151.17 MHz, 31% slower Fmax)

eval_sha256_2x.sv (86.99 MHz, 33% slower Fmax)
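The Fmax figures listed above can be turned into a quick throughput estimate. The following is a minimal Python sketch, not part of the lecture code: the function and dictionary names are hypothetical, and it assumes that round throughput is simply Fmax times rounds per cycle, ignoring any cycles spent loading words or generating padding.

```python
# Fmax values (MHz) copied from the slide; all names here are illustrative.
FMAX_MHZ = {
    "sha1": {"1x": 219.88, "2x": 151.17},
    "sha256": {"1x": 129.4, "2x": 86.99},
}


def fmax_drop_percent(f_1x, f_2x):
    """Relative Fmax reduction of the unfolded (2x) design, in percent."""
    return 100.0 * (1.0 - f_2x / f_1x)


def rounds_per_microsecond(fmax_mhz, rounds_per_cycle):
    """MHz times rounds/cycle gives rounds per microsecond."""
    return fmax_mhz * rounds_per_cycle


def unfolding_speedup(f_1x, f_2x):
    """Round-throughput ratio of 2 rounds/cycle versus 1 round/cycle."""
    return rounds_per_microsecond(f_2x, 2) / rounds_per_microsecond(f_1x, 1)


for name, f in FMAX_MHZ.items():
    drop = fmax_drop_percent(f["1x"], f["2x"])
    speedup = unfolding_speedup(f["1x"], f["2x"])
    print(f"{name}: Fmax drop {drop:.1f}%, round throughput x{speedup:.2f}")
```

Under this assumption, the unfolded designs complete rounds about 1.37 times (SHA1) and 1.34 times (SHA256) as fast as the 1 round/cycle versions, even though each clock cycle is longer.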

Note that doing 2 rounds/cycle does not reduce Fmax by 50%, more like 31-33%.

Slide2

eval_sha1

#ALUTS = 205, #registers = 680

Fmax = 219.88 MHz

Slide3

eval_sha1_2x

#ALUTS = 384, #registers = 679

Fmax = 151.17 MHz

Slide4

eval_sha256

#ALUTS = 526, #registers = 774

Fmax = 129.4 MHz

Slide5

eval_sha256_2x

#ALUTS = 940, #registers = 779

Fmax = 86.99 MHz

Slide6

To implement unfolding, it is best to read in all 16 words from memory (or generate the necessary padding) before processing each block
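As a software model of this idea (not the lecture's SystemVerilog), the Python sketch below reads all 16 words of a block first and then applies two SHA1 rounds per loop iteration, so each iteration stands in for one clock cycle of a 2 rounds/cycle design. All function names are hypothetical. It models only the data flow, not timing, and expands the message schedule up front because the schedule depends only on the 16 block words.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF


def rotl(x, n):
    """32-bit rotate left."""
    return ((x << n) | (x >> (32 - n))) & MASK


def sha1_round(state, t, w_t):
    """One SHA1 round t applied to the working state (a, b, c, d, e)."""
    a, b, c, d, e = state
    if t < 20:
        f, k = (b & c) | (~b & d), 0x5A827999
    elif t < 40:
        f, k = b ^ c ^ d, 0x6ED9EBA1
    elif t < 60:
        f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
    else:
        f, k = b ^ c ^ d, 0xCA62C1D6
    temp = (rotl(a, 5) + f + e + k + w_t) & MASK
    return (temp, a, rotl(b, 30), c, d)


def expand_schedule(words16):
    """Expand the 16 block words into the 80-entry SHA1 message schedule."""
    w = list(words16)
    for t in range(16, 80):
        w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    return w


def compress_2x(h, block):
    """Process one 64-byte block with two rounds per loop iteration."""
    words16 = struct.unpack(">16I", block)  # read all 16 words first
    w = expand_schedule(words16)
    state = tuple(h)
    for t in range(0, 80, 2):  # 40 iterations instead of 80
        state = sha1_round(sha1_round(state, t, w[t]), t + 1, w[t + 1])
    return [(x + y) & MASK for x, y in zip(h, state)]


def pad(message):
    """Standard SHA1 padding: 0x80, zero bytes, 64-bit big-endian bit length."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    return padded + struct.pack(">Q", 8 * len(message))


def sha1_2x(message):
    """SHA1 digest (hex) computed with the two-rounds-per-iteration model."""
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    data = pad(message)
    for i in range(0, len(data), 64):
        h = compress_2x(h, data[i:i + 64])
    return struct.pack(">5I", *h).hex()


# Cross-check against the standard library implementation.
print(sha1_2x(b"abc") == hashlib.sha1(b"abc").hexdigest())
```

In hardware, the two nested `sha1_round` calls would correspond to two rounds of combinational logic chained within a single clock cycle, which is why the cycle time grows while the cycle count halves.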

To “hide” the delay of reading in 16 words (or generating padding), you can read ahead the 16 words (or generate the padding) for the next block

Unfolding is possibly a good design strategy for the “DELAY” metric, but you will likely need a different design for the “AREA*DELAY” metric.
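To get a feel for the two metrics, the reported ALUT, register, and Fmax numbers can be plugged into a rough estimate. This Python sketch is only illustrative and rests on several assumptions that do not come from the slides: delay is taken as (rounds per block) / (rounds per cycle) / Fmax, with 80 rounds for SHA1 and 64 for SHA256 and no load or padding cycles, and area is taken as either ALUTs alone or ALUTs plus registers. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    aluts: int
    registers: int
    fmax_mhz: float
    rounds: int            # assumed rounds per block: 80 (SHA1), 64 (SHA256)
    rounds_per_cycle: int

    def delay_us(self) -> float:
        """Assumed per-block delay in microseconds (round cycles only)."""
        cycles = self.rounds / self.rounds_per_cycle
        return cycles / self.fmax_mhz

    def area_delay(self, count_registers: bool) -> float:
        """Area*delay under one of two assumed area definitions."""
        area = self.aluts + (self.registers if count_registers else 0)
        return area * self.delay_us()


# Resource and Fmax numbers copied from the slides.
DESIGNS = {
    "eval_sha1": Design(205, 680, 219.88, 80, 1),
    "eval_sha1_2x": Design(384, 679, 151.17, 80, 2),
    "eval_sha256": Design(526, 774, 129.4, 64, 1),
    "eval_sha256_2x": Design(940, 779, 86.99, 64, 2),
}

for name, d in DESIGNS.items():
    print(
        f"{name}: delay {d.delay_us():.3f} us, "
        f"ALUTs*delay {d.area_delay(False):.1f}, "
        f"(ALUTs+regs)*delay {d.area_delay(True):.1f}"
    )
```

Under these assumptions the unfolded designs have the shorter delay, but how they compare on area*delay depends on whether registers are counted, so check the exact metric definition before drawing conclusions.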

Can further improve unfolding performance by “pipelining” (see Lecture 10 on unfolding)

Can also pre-compute the W’s and the K’s as they do not depend on A, B, C, D, E …

Slide7

To implement a different unfolding or pipelining strategy for each hash algorithm, you can implement a different state machine sequence. e.g.,