Wenlong Yang Lingli Wang - PowerPoint Presentation

Uploaded On 2016-03-06

Presentation Transcript

Slide1

Lazy Man’s Logic Synthesis

Wenlong Yang, Lingli Wang
State Key Lab of ASIC and System, Fudan University, Shanghai, China

Alan Mishchenko
Department of EECS, University of California, Berkeley

Slide2

Outline
• Introduction
• Previous Work
• Lazy Man’s Logic Synthesis (LMS)
• Experimental Results
• Conclusion & Future Work

Slide3

Introduction
• Goal of logic synthesis: deriving a circuit or improving an available circuit.
• We proposed a “lazy” approach that reuses optimal structures derived by other synthesis tools, based on a precomputed library.

[Figure labels: A Function with N variables; Other tools; LMS; precomputed library; AIG]

Slide4

Outline
• Introduction
• Previous Work
• Lazy Man’s Logic Synthesis (LMS)
• Experimental Results
• Conclusion

Slide5

Previous Work
Logic synthesis based on precomputed libraries has been proposed in several papers, but all of them differ from LMS.

Previous work:
• Precompute structures in terms of LUTs [Kennings, IWLS, 2010]
• Didn't use preexisting benchmarks or tools [Bjesse, ICCAD, 2004]
• Look at only 4-5 input functions [Li, IWLS, 2011]
• Only compute multiple structure choices [Chatterjee, TCAD, 2006]

LMS:
• Precompute structures in terms of AIGs
• Use public benchmarks and existing tools
• Look at 6-16 input functions
• Store many equivalent structures

Slide6

Previous Work – SOP Balancing

For each node:
• Compute several k-input cuts
• Perform delay-optimal tree balancing of the SOP
• Select the best one to replace the current structure

An AIG subgraph found in benchmark s27.blif where SOP balancing loses to the proposed approach:

F = !c*!b + !c*a
F’ = !c*!(b*!a)
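As a quick sanity check of the example above, the following Python sketch (function names are illustrative, not from the slides) enumerates all eight input assignments and confirms that F and F’ compute the same function. Counting by hand, the factored form F’ needs two two-input ANDs, while the two-term SOP form of F needs three (two products plus the OR).

```python
from itertools import product


def f_sop(a, b, c):
    # F = !c*!b + !c*a, the two-term SOP form from the slide
    return ((not c) and (not b)) or ((not c) and a)


def f_factored(a, b, c):
    # F' = !c*!(b*!a), the structure with fewer AND nodes
    return (not c) and not (b and (not a))


def functions_match():
    # Exhaustively compare both forms over all 2^3 input assignments
    return all(
        f_sop(a, b, c) == f_factored(a, b, c)
        for a, b, c in product([False, True], repeat=3)
    )


print(functions_match())
```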

Slide7

Outline
• Introduction
• Previous Work
• Lazy Man’s Logic Synthesis (LMS)
  • Equivalence Classes
  • Library Representation/Construction
  • Implementation
• Experimental Results
• Conclusion

Slide8

Equivalence Classes
• LMS is based on collecting, storing, and re-using circuit structures of Boolean functions with 6-16 input variables.
• The total number of completely specified Boolean functions of N variables is 2^(2^N).
• Experiments show that even for practical functions, this number can be very large.
• To reduce the number of functions and the memory needed to store them in a library, a canonical form is used to break them into equivalence classes.
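To make the 2^(2^N) growth concrete, here is a small Python sketch (the helper name is illustrative). It relies only on the formula stated above: a truth table of N variables has 2^N entries, each of which can be 0 or 1.

```python
def num_functions(n):
    """Number of completely specified Boolean functions of n variables."""
    return 2 ** (2 ** n)


for n in (2, 3, 4, 6):
    print(n, num_functions(n))

# For n = 16 the count itself is a number with 2^16 + 1 bits
print(num_functions(16).bit_length())
```

Even at the low end of the 6-16 input range the count is 2^64, which is why only functions that actually occur in benchmarks can be stored.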

Slide9

NPN
Two functions are NPN-equivalent if one of them can be obtained from the other by negation and/or permutation of the inputs and outputs.

A complete NPN canonical form is not affordable for LMS. Drawbacks of NPN computation:
• Time-consuming
• Complicated
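A rough illustration of why exhaustive NPN canonicization is time-consuming: a single-output function of n inputs has n! input permutations, 2^n input negation patterns, and 2 output polarities, so a naive search would examine n!·2^(n+1) transformed truth tables. The Python sketch below only evaluates that count; it is a back-of-the-envelope estimate, not a measurement from the slides, and practical exact NPN algorithms prune this search.

```python
from math import factorial


def npn_transform_count(n):
    # n! permutations * 2^n input negations * 2 output polarities
    return factorial(n) * 2 ** (n + 1)


for n in (4, 6, 16):
    print(n, npn_transform_count(n))
```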

Slide10

Semi-Canonical Form
The idea is to order the input variables and the polarities of inputs/outputs using the number of positive minterms and cofactors w.r.t. each variable.

Input: truth table F
• Determine the polarity of F by the number of 1s in the truth table
• Determine the polarity of each variable by the number of 1s in the negative cofactor w.r.t. each variable
• Sort input variables by the number of 1s in their negative cofactors and permute inputs accordingly
Output: canonicized truth table F

A reasonable trade-off between accuracy and speed
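The three steps above can be sketched in Python as follows. This is a minimal illustration, not the ABC implementation, and several details are assumptions the slides do not state: the truth table is a list of 0/1 values where bit v of the index is the value of variable v; the output is complemented when more than half of the entries are 1; a variable is negated when its negative cofactor has more 1s than its positive cofactor; ties in the sort keep the original variable order.

```python
def neg_cofactor_ones(tt, n, v):
    # Count 1s among the minterms where variable v is 0
    return sum(tt[i] for i in range(1 << n) if not (i >> v) & 1)


def flip_var(tt, n, v):
    # Negate input v: swap its negative and positive cofactors
    return [tt[i ^ (1 << v)] for i in range(1 << n)]


def permute(tt, n, order):
    # New variable j takes the role of old variable order[j]
    out = []
    for i in range(1 << n):
        old = 0
        for j, v in enumerate(order):
            if (i >> j) & 1:
                old |= 1 << v
        out.append(tt[old])
    return out


def semi_canonical(tt, n):
    # Step 1 (assumed direction): keep at most half of the entries at 1
    if sum(tt) > (1 << n) // 2:
        tt = [1 - b for b in tt]
    # Step 2 (assumed direction): negative cofactor gets the fewer 1s
    for v in range(n):
        neg = neg_cofactor_ones(tt, n, v)
        pos = sum(tt) - neg
        if neg > pos:
            tt = flip_var(tt, n, v)
    # Step 3: sort variables by 1s in their negative cofactors
    counts = [neg_cofactor_ones(tt, n, v) for v in range(n)]
    order = sorted(range(n), key=lambda v: counts[v])
    return permute(tt, n, order)


# a AND (NOT b) and a OR b are NPN-equivalent to a AND b
print(semi_canonical([0, 1, 0, 0], 2))
print(semi_canonical([0, 1, 1, 1], 2))
```

Because ties are broken arbitrarily, two NPN-equivalent functions can still end up in different classes, which matches the accuracy versus speed trade-off noted on the slide.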

Slide11

Library Representation
• An N-input library contains functions of up to N variables.
• Structures of all functions are represented as a shared AIG.
• Each output of the AIG is the root node of one logic structure.

When a library is loaded, the following actions are performed:
• A hash table is created to hash the outputs by their semi-canonical forms.
• For each structure, the area and pin-to-output delays are computed and stored.

Slide12

Pin-To-Output Delay & Dominated Structure

Example of using pin-to-output delays to compute structure delay:

Arrival time:           {3, 2, 4, 5, 2, 3, 1}
Pin-to-output delay:  + {3, 3, 3, 5, 5, 4, 1}
                      = {6, 5, 7, 10, 7, 7, 2}

If one structure’s pin-to-output delay is worse than another’s with respect to every input, the structure is dominated.
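The example above can be reproduced with a short Python sketch. Two points are assumptions rather than statements from the slide: the structure delay is taken as the maximum of the per-input sums (10 in the example), and "worse" is read as strictly larger on every input. Function names are illustrative.

```python
def structure_delay(arrival, pin_to_output):
    # Output arrival time: worst case over all inputs (assumed max rule)
    return max(a + d for a, d in zip(arrival, pin_to_output))


def is_dominated(delays, other):
    # Dominated if strictly worse than `other` on every input (assumed strict)
    return all(d > o for d, o in zip(delays, other))


arrival = [3, 2, 4, 5, 2, 3, 1]
pin_to_output = [3, 3, 3, 5, 5, 4, 1]

print([a + d for a, d in zip(arrival, pin_to_output)])
print(structure_delay(arrival, pin_to_output))
print(is_dominated([4, 4, 4, 6, 6, 5, 2], pin_to_output))
```

Keeping only non-dominated structures matters because the best structure depends on the arrival times at the cut leaves, which differ from one use of the library to the next.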

Slide13

Library Construction
The LUT mapper if in ABC is used as a structural cut browser to generate K-input cuts whose logic structures are added to the library.

Input: cut C
• If cut C does not meet the requirements, return
• Compute Boolean function F of cut C as a truth table
• Compute the semi-canonical form of F
• Rebuild the structure of the cut in the library
• If the structure already exists or is dominated, return
• Add a new primary output to store the structure in the hash table
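A heavily simplified Python sketch of this flow is given below. It is not the ABC recorder: the library is a plain dictionary keyed by a semi-canonical truth table (passed in as a tuple), each structure is represented only by its tuple of pin-to-output delays, the "requirements" check is assumed to be a limit on the number of cut inputs, and identical delay tuples stand in for "the structure already exists". All names are hypothetical.

```python
def add_structure(library, canon_tt, pin_delays, max_inputs=6):
    """Try to add one structure; return True if it was stored."""
    # Assumed requirement: the cut has at most max_inputs leaves
    if len(pin_delays) > max_inputs:
        return False
    stored = library.setdefault(canon_tt, [])
    for other in stored:
        # Simplification: an identical delay profile counts as "already exists"
        if other == pin_delays:
            return False
        # Dominated: strictly worse than a stored structure on every input
        if all(d > o for d, o in zip(pin_delays, other)):
            return False
    stored.append(pin_delays)
    return True


library = {}
and3 = (0, 0, 0, 0, 0, 0, 0, 1)  # truth table of a 3-input AND

print(add_structure(library, and3, (1, 2, 2)))  # a*(b*c)
print(add_structure(library, and3, (1, 2, 2)))  # same profile again
print(add_structure(library, and3, (2, 3, 3)))  # worse on every input
print(add_structure(library, and3, (2, 2, 1)))  # (a*b)*c, incomparable
print(len(library[and3]))
```

In the actual library the structures live in a shared AIG, so duplicate detection would come from the AIG itself rather than from comparing delay tuples.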

Slide14

A case study of LMS: AIG level minimization

Input: And-Inverter Graph
For each node, in a topological order:
  Compute several K-input cuts
  For each cut:
    Compute truth table
    Look up in the library
    If there is no structure for this function:
      Mark the cut to ensure it is not selected as best cut
    Else if the best structure found leads to smaller AIG level:
      Save the cut as the best cut
  If there is an improvement in level, update AIG

Slide15

Implementation
The LMS algorithm is implemented in ABC. The LUT mapper if in ABC is used as:
(a) A cut browser for computing the libraries
(b) A mapper in the case study on AIG level minimization

Commands related to library construction:
• rec_start: Starts the LMS recorder
• rec_add: Adds structures from benchmarks
• rec_filter: Removes structures that occur less frequently
• rec_merge: Merges two previously computed libraries
• rec_ps: Prints statistics for the currently loaded library
• rec_use: Transforms the internal library to the current network in ABC
• rec_stop: Deletes the current library

Commands used to perform LMS mapping:
if -y -K <num> -C <num>
• -y enables level optimization by LMS
• -K <num> is the cut size
• -C <num> is the number of cuts used at each node

Slide16

Outline
• Introduction
• Previous Work
• Lazy Man’s Logic Synthesis (LMS)
• Experimental Results
  • Library Coverage
  • 6-input Library
  • Optimize Delay After LUT Mapping
• Conclusion

Slide17

Library Coverage
• This experiment was performed to show that LMS has practical memory requirements for functions of up to 12 inputs.
• Semi-canonical classes of all functions appearing in the cuts of the benchmark circuits, without synthesis, were collected and the frequency of their appearance was recorded.

[Figure: occurrence frequency vs. function #. Annotations: ~2 M classes in total; ~740 K classes for 90% of functions; ~400MB for truth tables]

Slide18

Constructing Library for 6-input Functions
The goal of this experiment is to derive a 6-input library used in the following case study of AIG level minimization.

The following ABC scripts are used to collect structures:
• read file; st; rec_add;
• dc2; rec_add;
• if -K 8; bidec; st; rec_add;
• if -K 8; mfs; st; rec_add;
• if -K 8; bidec; st; rec_add;
• if -g -K 6; st; rec_add;
• if -g -K 6; st; rec_add;

Statistics of the precomputed 6-input library (~77MB AIGER file):

Inputs   Classes #    Structures #   Ratio
2        3            3              1.00
3        32           88             2.75
4        2,430        12,673         5.22
5        98,208       471,973        4.81
6        1,148,556    5,202,924      4.53
Total    1,249,229    5,687,661      4.55
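The Ratio column is the number of structures divided by the number of classes, that is, the average number of stored structures per class. The short Python sketch below recomputes the ratios and totals from the counts listed on the slide (the variable names are illustrative).

```python
# inputs -> (classes, structures), copied from the slide
rows = {
    2: (3, 3),
    3: (32, 88),
    4: (2430, 12673),
    5: (98208, 471973),
    6: (1148556, 5202924),
}

ratios = {n: round(s / c, 2) for n, (c, s) in rows.items()}
total_classes = sum(c for c, _ in rows.values())
total_structures = sum(s for _, s in rows.values())
total_ratio = round(total_structures / total_classes, 2)

print(ratios)
print(total_classes, total_structures, total_ratio)
```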

Slide19

Optimize Delay After LUT Mapping

Two sets of benchmarks are used in this paper: 20 MCNC benchmarks and 10 large Altera benchmarks.

LUT mapping was performed by the following scripts:
• Map: st; resyn2; if -K 4 or 6
• MapC: st; resyn2; dch -f; if -K 4 or 6
• SOPBC: st; if -gm -K 6; st; resyn2; dch -f; if -K 4 or 6
• LMSC: st; if -ym -K 6; st; resyn2; dch -f; if -K 4 or 6

Benchmarks were run on a workstation with an Intel Xeon Quad Core CPU and 256 GBytes RAM (~4GB used for the experiment).

The resulting networks were verified by the command cec in ABC.

Slide20

Mapping results for Altera benchmarks (4-LUTs)
LMSC reduced delay by 37% with an area increase of 13%.

Slide21

Mapping results for Altera benchmarks (6-LUTs)
LMSC reduced delay by 26% with an area increase of 13%.

Slide22

Mapping results for MCNC benchmarks

Design   | 4-LUT level             | 4-LUT count             | 6-LUT level             | 6-LUT count
         | Map   MapC  SOPBC LMSC  | Map   MapC  SOPBC LMSC  | Map   MapC  SOPBC LMSC  | Map   MapC  SOPBC LMSC
alu4     | 7     7     7     7     | 694   701   702   714   | 5     5     5     5     | 503   525   520   532
apex2    | 8     8     8     8     | 871   867   874   890   | 6     6     6     6     | 691   683   728   711
b14      | 21    20    17    17    | 1761  1771  1913  1849  | 13    13    10    11    | 1275  1263  1517  1442
b15      | 22    22    21    21    | 3147  3103  3186  3233  | 15    15    14    13    | 2119  2211  2255  2419
b17      | 31    31    27    26    | 9676  9507  9527  9570  | 21    21    16    16    | 6510  6356  6667  6670
b20      | 23    22    19    19    | 3692  3587  3886  3829  | 15    15    12    12    | 2679  2619  3070  3044
b21      | 23    22    20    19    | 3768  3612  3847  3908  | 15    15    11    12    | 2701  2577  3114  3115
b22      | 23    23    19    19    | 5423  5280  5693  5729  | 15    15    12    11    | 3985  3847  4638  4677
clma     | 13    13    12    12    | 4016  4008  4189  4150  | 9     9     8     8     | 2975  2894  3145  3246
des      | 6     6     6     6     | 1228  1257  1249  1273  | 5     5     5     4     | 824   862   866   953
elliptic | 8     8     8     8     | 431   432   442   443   | 6     6     6     6     | 317   317   327   333
ex5p     | 6     6     6     6     | 471   462   472   481   | 5     4     5     4     | 351   382   378   408
frisc    | 20    20    19    16    | 2279  2261  2332  2279  | 13    12    11    9     | 1807  1811  1883  1948
i10      | 14    14    13    12    | 746   741   743   741   | 9     9     9     9     | 598   608   575   583
pdc      | 9     8     8     8     | 1926  2047  1925  2075  | 7     7     6     7     | 1428  1350  1619  1416
s38584   | 9     9     8     8     | 4021  3978  3985  3980  | 6     6     6     6     | 2720  2802  2816  2831
s5378    | 6     6     5     5     | 459   451   470   468   | 4     4     4     4     | 356   355   369   358
seq      | 6     6     6     6     | 946   935   948   941   | 5     5     5     5     | 685   668   707   696
spla     | 9     9     9     8     | 1899  1803  1860  1928  | 7     7     6     6     | 1414  1361  1445  1455
tseng    | 13    13    12    10    | 756   800   743   809   | 8     8     6     6     | 648   694   689   731
Ratio    | 1.00  0.99  0.92  0.90  | 1.00  1.00  1.02  1.03  | 1.00  0.99  0.90  0.88  | 1.00  1.00  1.07  1.08

4-LUTs: LMSC reduced delay by 10% with an area increase of 3%
6-LUTs: LMSC reduced delay by 12% with an area increase of 8%

Slide23

Conclusion
• A new method to harvest and re-use circuit structures produced by different tools on benchmark circuits
• The “lazy” approach is made practical by:
  • A semi-canonical form to reduce the number of equivalence classes
  • Using AIGs to store precomputed libraries in memory and on disk
  • Using truth tables to manipulate Boolean functions
• As a case study, the proposed approach was applied to improve delay after FPGA mapping
• For industrial benchmarks, compared to SOP balancing, the delay was reduced by 17% (18%) for LUT4 (LUT6), and the area penalty was 2% (5%)

Slide24

Future work
• Improving implementation
  • Reducing memory by using a low-memory AIG
  • Building libraries in terms of multi-input gates
  • Filtering libraries based on their performance
  • Giving the user control over the area increase
• Continuing experiments
  • Performing case studies with larger functions
  • Evaluating delay improvements after P&R

Slide25

Q&A
Authors' e-mail:
• Wenlong Yang: allanwin@hotmail.com
• Lingli Wang: llwang@fudan.edu.cn
• Alan Mishchenko: alanmi@eecs.berkeley.edu

Slide26

Abstract
Deriving a circuit for a Boolean function or improving an available circuit are typical tasks solved by logic synthesis. Numerous algorithms in this area have been proposed and implemented over the last 50 years. This paper presents a “lazy” approach to logic synthesis based on the following observations: (a) optimal or near-optimal circuits for many practical functions are already derived by the tools, making it unnecessary to implement new algorithms or even run the old ones repeatedly; (b) larger circuits are composed of smaller ones, which are often isomorphic up to a permutation/negation of inputs/outputs. Experiments confirm these observations. Moreover, a case study shows that logic level minimization using lazy man’s synthesis improves delay after LUT mapping into 4- and 6-input LUTs, compared to earlier work on high-effort delay optimization.