
Multiplier-less Multiplication - PowerPoint Presentation

Uploaded On 2018-09-26



Presentation Transcript

Slide1

Multiplier-less Multiplication by Constants

Dr. Shoab A. Khan

Slide2

Multiplication by Constant

In many algorithms, a large percentage of multiplications are by constants, so the complexity of a general-purpose multiplier is not required.

Generate Partial Products (PPs) only for 1s in the constant multiplier

The number of PPs can be further reduced using the canonic signed digit (CSD) format.
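A minimal Python sketch of the idea above, assuming a non-negative integer constant `k` (the function name `shift_add_multiply` is hypothetical): one shifted copy of `x` is generated for each 1 bit of the constant, and the copies are added, so the number of partial products equals the number of 1s in `k`.

```python
def shift_add_multiply(x, k):
    """Multiply x by the non-negative constant k using only shifts and adds.

    One partial product (x << i) is generated for every 1 bit i of k.
    Returns (product, number of partial products).
    """
    if k < 0:
        raise ValueError("this sketch assumes a non-negative constant")
    partial_products = [x << i for i in range(k.bit_length()) if (k >> i) & 1]
    return sum(partial_products), len(partial_products)


product, n_pps = shift_add_multiply(13, 0b01101111)  # constant 111
print(product, n_pps)  # 1443 6
```

For the constant 127 (01111111) this sketch generates seven partial products, whereas the CSD form $2^7 - 2^0$ needs only two.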

Slide3

Example: FIR Filter

In an FIR filter, all coefficients are constant. For a fully parallel implementation, general-purpose multipliers are therefore not required.

Coefficients are converted to canonic signed digit (CSD) form.

Slide4

Canonic Signed Digit (CSD)

No two consecutive digits are non-zero.

Contains the minimum possible number of non-zero digits.

The representation is unique.
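A brute-force Python check of the properties listed above, assuming a fixed word length of n digits (n = 8 here, matching the Q1.7 example that follows); all names are hypothetical. Because the word length is fixed, some values have no n-digit CSD form at all (with n = 2 the value 3 can only be written 11, while its CSD form $10\bar{1}$ needs three digits), so the sketch checks for at most one CSD string per value and, when one exists, that no other string of the same value has fewer non-zero digits.

```python
from itertools import product


def value(digits):
    """Value of an MSB-first signed-digit string with digits in {-1, 0, 1}."""
    v = 0
    for d in digits:
        v = 2 * v + d
    return v


def weight(digits):
    """Number of non-zero digits (= number of partial products)."""
    return sum(d != 0 for d in digits)


def is_csd(digits):
    """True if no two consecutive digits are both non-zero."""
    return all(not (a and b) for a, b in zip(digits, digits[1:]))


def check_csd_properties(n):
    """Group all 3**n signed-digit strings of length n by value and check
    uniqueness (at most one CSD string per value) and minimality (the CSD
    string has the fewest non-zero digits among strings of that value)."""
    by_value = {}
    for digits in product((-1, 0, 1), repeat=n):
        by_value.setdefault(value(digits), []).append(digits)
    for reps in by_value.values():
        csd = [r for r in reps if is_csd(r)]
        if len(csd) > 1:
            return False
        if csd and weight(csd[0]) != min(weight(r) for r in reps):
            return False
    return True


def csd_strings(v, n):
    """All n-digit CSD strings whose value is v (zero or one entry)."""
    return [d for d in product((-1, 0, 1), repeat=n) if is_csd(d) and value(d) == v]


print(check_csd_properties(8))  # True
print(csd_strings(127, 8))      # [(1, 0, 0, 0, 0, 0, 0, -1)]
```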

Slide5

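CSD is obtained using the string property: a run of 1s, $2^{m} + 2^{m-1} + \dots + 2^{j}$, equals $2^{m+1} - 2^{j}$, so $011\dots1$ is rewritten as $100\dots0\bar{1}$, where $\bar{1}$ denotes a digit of value -1.

Examples for a Q1.7 format number:

$01111111 = 2^0 - 2^{-7} = 1000000\bar{1}$

$01101111 \rightarrow 0111000\bar{1} \rightarrow 100\bar{1}000\bar{1}$

For the second number, $k = 2^0 - 2^{-3} - 2^{-7}$, so

$kx = x\,2^0 - x\,2^{-3} - x\,2^{-7}$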
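A minimal Python sketch of the string property, assuming a non-negative input whose MSB is 0 (function names are hypothetical, and the letter `N` stands for a barred, -1 digit). It rewrites the least significant run of two or more 1s first and records every intermediate string, which reproduces the steps above and the 16-bit example on Slide12.

```python
def find_rightmost_run(d):
    """(start, end) indices of the least significant run of two or more +1 digits
    in the MSB-first digit list d, or None if there is no such run."""
    end = len(d) - 1
    while end > 0:
        if d[end] == 1 and d[end - 1] == 1:
            start = end - 1
            while start > 0 and d[start - 1] == 1:
                start -= 1
            return start, end
        end -= 1
    return None


def fmt(d):
    """Render digits as a string, writing 'N' for a -1 (barred) digit."""
    return "".join("N" if x == -1 else str(x) for x in d)


def csd_by_string_property(bits):
    """Repeatedly rewrite 0 1 1 ... 1 as 1 0 0 ... 0 -1, starting from the LSB end.

    bits: '0'/'1' string, MSB first, assumed non-negative (leading 0).
    Returns the list of intermediate strings, the original first.
    """
    d = [int(b) for b in bits]
    trace = [fmt(d)]
    while (run := find_rightmost_run(d)) is not None:
        start, end = run
        if start == 0 or d[start - 1] != 0:
            raise ValueError("expected a 0 digit above the run (non-negative input)")
        d[start - 1] = 1           # the 0 above the run becomes +1 ...
        for i in range(start, end):
            d[i] = 0               # ... the run is cleared ...
        d[end] = -1                # ... and its lowest digit becomes -1
        trace.append(fmt(d))
    return trace


for step in csd_by_string_property("01101111"):
    print(step)  # 01101111, then 0111000N, then 100N000N
```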

Slide6

FIR Filter

Convolution summation with L constant coefficients h[k]:

$y[n] = \sum_{k=0}^{L-1} h[k]\,x[n-k]$

Slide7

Conversion of FIR Coefficients to CSD

Approximately one non-zero CSD digit is needed for each 20 dB of stopband attenuation.

Four non-zero digits per coefficient are therefore enough for about 80 dB of stopband attenuation.
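A Python sketch of converting a coefficient to CSD and keeping only its four most significant non-zero digits. Assumptions: the coefficient is held as an integer scaled by a power of two, CSD is computed as the standard non-adjacent form, and the extra digits are simply dropped (plain truncation, not rounding); all names are hypothetical. The example uses the 13-digit coefficient 0110111011011 (weights $2^0$ to $2^{-12}$) from the next example.

```python
def to_csd(n, width):
    """CSD (non-adjacent form) digits of integer n, MSB first, padded to `width` digits."""
    digits = []                      # least significant digit first while building
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)          # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1                      # exact: n is even here
    if len(digits) > width:
        raise ValueError("value needs more CSD digits than width")
    digits += [0] * (width - len(digits))
    return digits[::-1]


def keep_nonzero(digits, k):
    """Keep the k most significant non-zero digits and zero the rest (plain truncation)."""
    out, kept = [], 0
    for d in digits:                 # MSB first
        if d != 0 and kept < k:
            out.append(d)
            kept += 1
        else:
            out.append(0)
    return out


def digits_value(digits):
    """Integer value of an MSB-first signed-digit list."""
    v = 0
    for d in digits:
        v = 2 * v + d
    return v


coeff = 0b0110111011011              # weights 2^0 .. 2^-12, held as an integer scaled by 2^12
full = to_csd(coeff, 13)
short = keep_nonzero(full, 4)
print(full)                          # [1, 0, 0, -1, 0, 0, 0, -1, 0, 0, -1, 0, -1]
print(short)                         # [1, 0, 0, -1, 0, 0, 0, -1, 0, 0, -1, 0, 0]
print(digits_value(short) - coeff)   # 1, i.e. an error of one LSB (2^-12)
```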

Slide8

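Example: CSD Representation

Converting to CSD and keeping 4 non-zero digits. Let a coefficient be (digit weights $2^0$ to $2^{-12}$):

0 1 1 0 1 1 1 0 1 1 0 1 1

Its CSD representation, keeping the 4 most significant non-zero digits (weights $2^0$ to $2^{-10}$; a bar marks a digit of value -1):

$1\,0\,0\,\bar{1}\,0\,0\,0\,\bar{1}\,0\,0\,\bar{1} = 2^0 - 2^{-3} - 2^{-7} - 2^{-10}$

Slide9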

CSD multiplier

Partial products for $k = 2^0 - 2^{-3} - 2^{-7} - 2^{-10}$:

$x_n$ as it is.

Shift $x_n$ right by 3, take the 1's complement, add a 1 at the LSB, and sign-extend (3 sign bits s).

Shift $x_n$ right by 7, take the 1's complement, add a 1 at the LSB, and sign-extend (7 sign bits s).

Shift $x_n$ right by 10, take the 1's complement, add a 1 at the LSB, and sign-extend (10 sign bits s).
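A bit-level Python sketch of the partial products above, assuming a hypothetical 24-bit datapath and an input placed 10 bits above the LSB so that every right shift is exact. Each negative term is formed exactly as described (shift, sign-extend, 1's complement, +1 at the LSB) and the words are added modulo $2^{24}$; the names `W`, `F`, `K_TERMS` and `csd_multiply` are illustrative only.

```python
W = 24    # hypothetical datapath width (bits)
F = 10    # x is placed F bits above the LSB so that right shifts up to 10 are exact
MASK = (1 << W) - 1

# k = 2^0 - 2^-3 - 2^-7 - 2^-10 as (right-shift amount, sign) pairs
K_TERMS = [(0, +1), (3, -1), (7, -1), (10, -1)]


def to_signed(word):
    """Interpret a W-bit word as a two's-complement integer."""
    return word - (1 << W) if word >> (W - 1) else word


def csd_multiply(x):
    """Return x * k * 2**F, summing the CSD partial products as W-bit words."""
    aligned = x << F
    acc = 0
    for shift, sign in K_TERMS:
        pp = (aligned >> shift) & MASK   # shifted copy, sign-extended to W bits
        if sign < 0:
            pp = ~pp & MASK              # take the 1's complement ...
            pp = (pp + 1) & MASK         # ... and add a 1 at the LSB
        acc = (acc + pp) & MASK
    return to_signed(acc)


print(csd_multiply(5))  # 4435 == 5 * 887, and 887 / 2**10 == k
```

In hardware the three "+1" bits would usually be folded into a single constant added once, rather than performing three separate increments.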

Slide10

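CSD Multiplier in a 5-Coefficient FIR Filter

[Figure: direct-form FIR filter with four delay registers (REG) producing $x_{n-1}$ to $x_{n-4}$ from $x_n$; each tap $x_{n-k}$ is multiplied by $h_k$, $k = 0, \dots, 4$, using a CSD multiplier.]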
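A Python sketch of a 5-tap direct-form FIR filter in which each multiplier is replaced by CSD shifts and adds. Assumptions: integer coefficients, with the values 805, 7680, 15798, 7680, 805 borrowed from the fixed-point example later in the deck; the names are hypothetical. All partial products of all taps are collected into one sum, which in hardware would be reduced by a compression tree followed by a single CPA.

```python
H = [805, 7680, 15798, 7680, 805]   # Q1.15 integer coefficients (assumed, from a later slide)


def csd_terms(c):
    """(shift, sign) for every non-zero CSD (non-adjacent form) digit of integer c."""
    terms, shift = [], 0
    while c:
        if c & 1:
            d = 2 - (c & 3)      # +1 if c % 4 == 1, -1 if c % 4 == 3
            c -= d
            terms.append((shift, d))
        c >>= 1
        shift += 1
    return terms


def fir_csd(xs, h=H):
    """Direct-form FIR in which every tap multiplier is a CSD shift-and-add network."""
    taps = [csd_terms(c) for c in h]
    delay = [0] * len(h)         # delay[k] holds x[n-k]
    ys = []
    for x in xs:
        delay = [x] + delay[:-1]
        # all partial products of all taps in one sum; in hardware this set would be
        # reduced by a compression tree followed by a single CPA
        pps = [sign * (xk << shift) for xk, terms in zip(delay, taps) for shift, sign in terms]
        ys.append(sum(pps))
    return ys


def fir_reference(xs, h=H):
    """Plain convolution sum y[n] = sum_k h[k] x[n-k], for comparison."""
    return [sum(h[k] * xs[n - k] for k in range(len(h)) if n >= k) for n in range(len(xs))]


print(fir_csd([1, 0, 0, 0, 0, 0]))  # impulse response: [805, 7680, 15798, 7680, 805, 0]
```

With these coefficients the filter needs 19 partial products in total (5 + 2 + 5 + 2 + 5) instead of five general-purpose multipliers.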

Slide11

An Optimal Direct Form FIR Filter Architecture

Slide12

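Example: CSD Representation

Successive applications of the string property (a bar marks a digit of value -1):

$0110111011011101$

$\rightarrow 0110111011100\bar{1}01$

$\rightarrow 0110111100\bar{1}00\bar{1}01$

$\rightarrow 0111000\bar{1}00\bar{1}00\bar{1}01$

$\rightarrow 100\bar{1}000\bar{1}00\bar{1}00\bar{1}01$

Slide13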

CSD FIR paper: "Canonical Signed Digit Representation for FIR Digital Filters"

Slide14

Slide15

Optimized DFG Transformation

Use a compression tree and remove the CPA from the feedback loop.

The result is kept in partial-sum and partial-carry form.

The first-order difference equation is therefore rewritten in terms of the partial sum and partial carry.
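A Python sketch of this transformation for the first-order recursion $y[n] = a\,y[n-1] + x[n]$. Assumptions not taken from the slides: W-bit wrap-around integer arithmetic and a hypothetical integer coefficient $a = 3 = 2^2 - 2^0$, chosen so the carry-save version can be compared bit-exactly with a conventional one (a fractional coefficient would also need truncation handling that this sketch leaves out). The fed-back state is the pair (s, c) with $y[n-1] = s + c$; only 3:2 compressors sit inside the loop, and one carry-propagate addition produces the output outside it.

```python
W = 16                    # hypothetical word length; all arithmetic wraps modulo 2**W
MASK = (1 << W) - 1

# hypothetical coefficient a = 3 in CSD form 2^2 - 2^0, as (left shift, sign) pairs
A_TERMS = [(2, +1), (0, -1)]
A = sum(sign << shift for shift, sign in A_TERMS)


def csa(a, b, c):
    """One layer of full adders (3:2 compressor): a + b + c == s + carry (mod 2**W)."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s & MASK, carry & MASK


def compress(operands):
    """Wallace-style reduction of any number of W-bit operands to a (sum, carry) pair."""
    ops = [v & MASK for v in operands]
    while len(ops) > 2:
        nxt = []
        while len(ops) >= 3:
            nxt.extend(csa(ops.pop(), ops.pop(), ops.pop()))
        ops = nxt + ops           # at most two operands pass to the next layer unchanged
    ops += [0] * (2 - len(ops))
    return ops[0], ops[1]


def iir_carry_save(xs):
    """y[n] = a*y[n-1] + x[n], with the fed-back state kept as y[n-1] = s + c (mod 2**W)."""
    s = c = 0
    ys = []
    for x in xs:
        operands, ones = [x], 0
        for shift, sign in A_TERMS:
            for v in (s, c):      # a*(s + c) = a*s + a*c: both halves get partial products
                pp = (v << shift) & MASK
                if sign < 0:
                    pp = ~pp & MASK   # 1's complement; the matching +1 is collected in `ones`
                    ones += 1
                operands.append(pp)
        operands.append(ones)
        s, c = compress(operands)     # the feedback loop contains only compressors
        ys.append((s + c) & MASK)     # single CPA, outside the loop
    return ys


def iir_reference(xs):
    """The same recursion with ordinary (carry-propagate) additions."""
    y, ys = 0, []
    for x in xs:
        y = (A * y + x) & MASK
        ys.append(y)
    return ys


print(iir_carry_save([1, 0, 0, 0]))  # [1, 3, 9, 27]
```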

Slide16

Example 1: First Order IIR Filter

DFG with one adder and one multiplier in the critical path.

Transformed DFG with a Wallace compression tree and the CPA outside the feedback loop.

Slide17

Example 2: DFT 2nd-Order IIR Filter

Slide18

Example: Optimal Mapping: Design Option 1

Optimized implementation with CSD multipliers, compression trees and the CPA outside the IIR filter.

Slide19

Design Option 2

Using unified reduction trees for the feedforward and feedback computations, with the CPA outside the filter.

Slide20

Design Option 3

CPA outside the feedback loop.

Slide21

FIR Filter: Direct Form

Slide22

All multiplications are implemented as one compression tree and a single CPA.

Slide23

Example: Conversion to Fixed-Point

$h[n] = [0.0246\ \ 0.2344\ \ 0.4821\ \ 0.2344\ \ 0.0246]$

$\mathrm{round}(h[n] \cdot 2^{15}) = [805\ \ 7680\ \ 15798\ \ 7680\ \ 805]$

16'b0000_0011_0010_0101

16'b0001_1110_0000_0000

16'b0011_1101_1011_0110

16'b0001_1110_0000_0000

16'b0000_0011_0010_0101
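A small Python check of the fixed-point values, assuming Q1.15 (15 fractional bits) and 16-bit two's-complement words; the helper name `to_binary16` is hypothetical. It maps each integer back to its decimal value and prints its 16-bit pattern in the slide's `16'b` notation (for instance, 805 is 0000_0011_0010_0101 and 7680 is 0001_1110_0000_0000).

```python
H_INT = [805, 7680, 15798, 7680, 805]   # round(h[n] * 2^15), as listed on the slide
FRAC_BITS = 15


def to_binary16(v):
    """16-bit two's-complement pattern of v in the slide's 16'b notation, nibble-grouped."""
    s = format(v & 0xFFFF, "016b")
    return "16'b" + "_".join(s[i:i + 4] for i in range(0, 16, 4))


for v in H_INT:
    print(round(v / 2 ** FRAC_BITS, 4), to_binary16(v))
# 0.0246 16'b0000_0011_0010_0101
# 0.2344 16'b0001_1110_0000_0000
# 0.4821 16'b0011_1101_1011_0110
# 0.2344 16'b0001_1110_0000_0000
# 0.0246 16'b0000_0011_0010_0101
```

The check runs from the integers to the decimals because the printed decimals are themselves rounded: `round(0.0246 * 2**15)` gives 806, not 805.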

Slide24

Conversion to CSD

Slide25

Keeping a maximum of 4 non-zero CSD digits in each coefficient results in:

Slide26

Input to Compression Tree

Slide27

CV Computation for the First CSD Multiplier

Slide28

Pipelined DF FIR Filter

Pipelined direct-form FIR filter for FPGAs with DSP48 blocks.

Slide29

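Transpose Direct Form FIR Filter

[Figure: transposed direct-form FIR filter with coefficients $h_0$ to $h_4$; the input $x_n$ is broadcast to all five multipliers, producing $x_n h_0, \dots, x_n h_4$, which are accumulated through a chain of adders; the critical path is marked.]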
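A behavioral Python sketch of the transposed direct form, assuming integer coefficients (the values from the fixed-point example are reused as an assumption) and at least two taps; the names are hypothetical. The input is broadcast to every multiplier and the registers between the adders hold partial sums, so the output depends on one multiplier and one adder; the sketch checks that the result equals the direct form.

```python
H = [805, 7680, 15798, 7680, 805]   # example coefficients (assumed, from the fixed-point slide)


def fir_tdf(xs, h):
    """Transposed direct form: x[n] feeds every multiplier; registers hold partial sums."""
    regs = [0] * (len(h) - 1)            # one register between consecutive adders
    ys = []
    for x in xs:
        products = [c * x for c in h]    # x[n] is broadcast to all taps
        ys.append(products[0] + regs[0])
        # each register loads its tap product plus the next register's old value
        regs = [products[k + 1] + (regs[k + 1] if k + 1 < len(regs) else 0)
                for k in range(len(regs))]
    return ys


def fir_direct(xs, h):
    """Direct-form reference: y[n] = sum_k h[k] x[n-k]."""
    return [sum(h[k] * xs[n - k] for k in range(len(h)) if n >= k) for n in range(len(xs))]


print(fir_tdf([1, 0, 0, 0, 0, 0], H))  # [805, 7680, 15798, 7680, 805, 0]
```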

Slide30

Critical Path

Slide31

Filter Implementation

Slide32

TDF FIR filter with one stage of pipelining registers

Slide33

Deeply pipelined TDF FIR filter with a critical path equal to one full-adder delay

Slide34

Same Example

Slide35

TDF Implementation

Slide36

Example from the Book

Slide37

Hybrid FIR Filter Structure

Slide38

Hybrid Designs

Slide39

Adv DSD contents: Complexity Reduction

Slide40

Complexity Reduction

Identify the constituent sub-graphs that are shared in the original graph.

Example: three multipliers computing 3x, 53x and 585x.

Slide41
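A Python sketch of sharing sub-graphs between the three constant multiplications 3x, 53x and 585x. The decomposition below (reuse 3x for 53x and 9x for 585x) is one hand-picked possibility, not necessarily the sub-graphs selected on the slides; "adders" counts adders and subtractors alike, and the names are hypothetical. Implementing each constant separately from its CSD form would need 1 + 3 + 3 = 7 adders; this sharing needs 5.

```python
def csd_weight(c):
    """Number of non-zero CSD (non-adjacent form) digits of integer c."""
    w = 0
    while c:
        if c & 1:
            c -= 2 - (c & 3)
            w += 1
        c >>= 1
    return w


def shared_products(x):
    """3x, 53x and 585x computed with shifts, re-using 3x and 9x."""
    x3 = (x << 1) + x                  # 3x: 1 adder
    x53 = (x3 << 4) + (x3 << 1) - x    # 48x + 6x - x = 53x: 2 adders, re-uses 3x
    x9 = (x << 3) + x                  # 9x: 1 adder
    x585 = (x9 << 6) + x9              # 576x + 9x = 585x: 1 adder, re-uses 9x
    return x3, x53, x585


SHARED_ADDERS = 1 + 2 + 1 + 1                                 # = 5, counted from above
NAIVE_ADDERS = sum(csd_weight(c) - 1 for c in (3, 53, 585))   # 1 + 3 + 3 = 7

print(shared_products(1), SHARED_ADDERS, NAIVE_ADDERS)  # (3, 53, 585) 5 7
```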

Slide42

Optimized Implementation

Selected sub-graphs from the previous slide.

Slide43

Find common sub-expressions and eliminate redundant computation by re-using them.

Slide44

Example: Common Sub-expression Elimination

Figure panels: (a), (b)

Slide45

Horizontal Common Sub-expressions for the example in the text
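The example in the text is not reproduced in this transcript, so here is a hypothetical Python sketch of horizontal common sub-expression elimination within a single CSD coefficient. It finds the most frequent pattern (distance d between two non-zero digits, and their relative sign r), builds the sub-expression $t = x + r\,(x \ll d)$ once, and greedily replaces non-overlapping occurrences with shifted copies of t. The greedy selection and the example constants 325, 585 and 51 are assumptions for illustration.

```python
from collections import Counter


def csd_terms(c):
    """(position, sign) of each non-zero CSD digit of integer c, least significant first."""
    terms, pos = [], 0
    while c:
        if c & 1:
            d = 2 - (c & 3)          # +1 if c % 4 == 1, -1 if c % 4 == 3
            c -= d
            terms.append((pos, d))
        c >>= 1
        pos += 1
    return terms


def most_common_pattern(terms):
    """Most frequent (distance, relative sign) between two non-zero digits."""
    counts = Counter((p2 - p1, s1 * s2)
                     for i, (p1, s1) in enumerate(terms)
                     for p2, s2 in terms[i + 1:])
    if not counts:
        raise ValueError("need at least two non-zero digits")
    return counts.most_common(1)[0][0]


def horizontal_cse(c):
    """Rewrite c*x using one shared sub-expression t = x + r*(x << d).

    Returns (d, r, shared, single) with
    c*x == sum(s * (t << p) for p, s in shared) + sum(s * (x << p) for p, s in single).
    """
    terms = csd_terms(c)
    d, r = most_common_pattern(terms)
    sign_at = dict(terms)
    used, shared, single = set(), [], []
    for p, s in terms:               # increasing position, greedy, no overlaps
        if p in used:
            continue
        q = p + d
        if q in sign_at and q not in used and sign_at[q] == s * r:
            used.update((p, q))
            shared.append((p, s))
        else:
            used.add(p)
            single.append((p, s))
    return d, r, shared, single


def evaluate(x, d, r, shared, single):
    t = x + r * (x << d)             # the common sub-expression, computed once
    return (sum(s * (t << p) for p, s in shared)
            + sum(s * (x << p) for p, s in single))


def adder_count(shared, single):
    """One adder for t, plus one fewer than the number of terms in the final sum."""
    return 1 + len(shared) + len(single) - 1


print(horizontal_cse(585))  # (3, 1, [(0, 1), (6, 1)], [])
```

For 585 (1001001001) the pattern 1001 occurs twice, so $t = 9x$ and $585x = t + (t \ll 6)$: two adders instead of three.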

Slide46

Vertical Sub-expression Elimination

Slide47

Common Sub-expression

Slide48

Optimized implementation exploiting vertical common sub-expressions

Slide49

Example of horizontal and vertical sub-expression elimination

Slide50

Distributed Arithmetic Based Design

Yet another way of looking at dot-product design.

Slide51

ROM for Distributed Arithmetic

Address bits are bit b of the inputs x2, x1, x0:

x2b x1b x0b | Contents of ROM
0 0 0 | 0
0 0 1 | A0
0 1 0 | A1
0 1 1 | A1 + A0
1 0 0 | A2
1 0 1 | A2 + A0
1 1 0 | A2 + A1
1 1 1 | A2 + A1 + A0

Slide52

DA for computing the dot product of integer numbers for N = 4 and K = 3

Slide53

Look-up table

x2b x1b x0b | Contents of ROM | Value
0 0 0 | 0 | 0
0 0 1 | A0 | 3
0 1 0 | A1 | -1
0 1 1 | A1 + A0 | 2
1 0 0 | A2 | 5
1 0 1 | A2 + A0 | 8
1 1 0 | A2 + A1 | 4
1 1 1 | A2 + A1 + A0 | 7

Slide54

DA-based architecture for implementing an FIR filter of length L and N-bit data samples
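A Python sketch of bit-serial distributed arithmetic, assuming N-bit two's-complement inputs processed LSB first with a right-shifting accumulator whose last (sign-bit) cycle subtracts the LUT output; the function names and the 6+3-bit accumulator split are assumptions. With the LUT coefficients A0 = 3, A1 = -1, A2 = 5 from the look-up-table slide and the inputs x0 = -6, x1 = 6, x2 = -5 (inferred here, not stated in the slides), it produces the LUT values 5, 7, -1, 8 and the accumulator words of the cycle-by-cycle table.

```python
from fractions import Fraction


def build_lut(coeffs):
    """ROM contents: entry `addr` is the sum of coeffs[k] over the set bits k of addr."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def da_dot(coeffs, xs, nbits):
    """Bit-serial DA dot product of nbits-wide two's-complement inputs, LSB first.

    Returns (dot product, per-cycle trace of (address, LUT value, accumulator)).
    """
    lut = build_lut(coeffs)
    acc = Fraction(0)
    trace = []
    for b in range(nbits):
        # bit k of the address is bit b of input x_k (>> keeps two's-complement bits)
        addr = sum(((x >> b) & 1) << k for k, x in enumerate(xs))
        value = lut[addr]
        if b == nbits - 1:
            value = -value        # the sign bit carries negative weight: subtract
        acc = acc / 2 + value     # shift the accumulator right, then add
        trace.append((addr, lut[addr], acc))
    return acc * 2 ** (nbits - 1), trace


def acc_word(acc, int_bits=6, frac_bits=3):
    """Accumulator as a two's-complement word split like the slide: int_frac."""
    word = int(acc * 2 ** frac_bits) & ((1 << (int_bits + frac_bits)) - 1)
    bits = format(word, f"0{int_bits + frac_bits}b")
    return bits[:int_bits] + "_" + bits[int_bits:]


A = [3, -1, 5]                    # A0, A1, A2 from the look-up table slide
result, trace = da_dot(A, [-6, 6, -5], nbits=4)
for cycle, (addr, lut_value, acc) in enumerate(trace):
    print(cycle, f"3'b{addr:03b}", lut_value, acc_word(acc))
print(result)                     # -49 == 3*(-6) + (-1)*6 + 5*(-5)
```

The three fractional accumulator bits appear because the accumulator shifts right once per cycle; scaling by $2^{N-1}$ at the end recovers the integer dot product.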

Slide55

Cycle-by-cycle working of DA

Cycle | Address | LUT | Accumulator
0 | 3'b100 | 5 | 000101_000
1 | 3'b111 | 7 | 001001_100
2 | 3'b010 | -1 | 000011_110
3 | 3'b101 | 8 | 111001_111

Slide56

DA-based parallel implementation of an 18-coefficient FIR filter setting L=3 and M=6

Slide57

A LUT-less implementation of a DA-based FIR filter

Slide58

A parallel implementation for M = K uses a 2:1 MUX, a compression tree and a CPA

Slide59

Reducing the outputs of the multiplexers using a CPA-based adder tree and one accumulator

Slide60

DA-based IIR filter design

Slide61

Two-ROM-based design

Slide62

One-ROM-based design

Slide63

DFT implementation using circular convolution

Slide64

Optimized TDF implementation of the DF implementation in the previous figure

Slide65

Questions/Feedback !!
