Slide1
Multiplier-less Multiplication by Constants
Dr. Shoab A. Khan
Slide2
Multiplication by Constant
In many algorithms, a large percentage of multiplications are by constants.
The complexity of a general-purpose multiplier is not required.
Generate partial products (PPs) only for the 1s in the constant multiplier.
The number of PPs can be further reduced using the canonic signed digit (CSD) format.
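As a rough illustration of generating partial products only for the 1s, the following Python sketch multiplies by a constant using shifts and adds. The function name `shift_add_multiply` and the restriction to non-negative constants are assumptions made for this example, not part of the slides.

```python
def shift_add_multiply(x: int, constant: int) -> int:
    """Multiply x by a non-negative integer constant using shifts and adds only."""
    if constant < 0:
        raise ValueError("this sketch handles non-negative constants only")
    acc = 0
    shift = 0
    while constant:
        if constant & 1:
            # One partial product per 1 bit of the constant.
            acc += x << shift
        constant >>= 1
        shift += 1
    return acc


print(shift_add_multiply(13, 0b01101111))
```

For the constant 0b01101111 the loop adds six partial products, one per 1 bit.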
Slide3
Example: FIR Filter
In an FIR filter, all coefficients are constant.
For a fully parallel implementation, general-purpose multipliers are not required.
Coefficients are converted to canonic signed digit form.
Slide4
Canonic Signed Digit (CSD)
No two consecutive digits are non-zero
Contains the minimum possible number of non-zero digits
Representation is unique
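The properties above can be checked with a small Python sketch. It uses the standard non-adjacent-form recoding, which is a swapped-in technique and not necessarily the procedure used in the slides; the function names `to_csd` and `csd_value` are hypothetical.

```python
def to_csd(value: int) -> list[int]:
    """Return the CSD digits (-1, 0, +1) of a non-negative integer, MSB first."""
    digits = []
    while value != 0:
        if value & 1:
            # Pick +1 when value % 4 == 1 and -1 when value % 4 == 3,
            # so that the remaining value becomes divisible by 4.
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits[::-1] or [0]


def csd_value(digits: list[int]) -> int:
    """Evaluate MSB-first signed digits back to an integer."""
    result = 0
    for d in digits:
        result = 2 * result + d
    return result


print(to_csd(0b01101111))
```

Because the recoding leaves the value divisible by 4 after every non-zero digit, the next digit is always 0, which is the "no two consecutive non-zero digits" property.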
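Slide5
Canonic Signed Digit (CSD)
CSD is obtained using the string property.
Examples for a Q1.7 format number (an overbar marks a digit of value -1):
$01111111 = 2^{0} - 2^{-7} = 1000000\bar{1}$
$01101111 \rightarrow 0111000\bar{1} \rightarrow 100\bar{1}000\bar{1}$
$k = 2^{0} - 2^{-3} - 2^{-7}$
$kx = x2^{0} - x2^{-3} - x2^{-7}$
Slide6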
FIR filter
Convolution summation with constant coefficients h[k]
Slide7
Conversion of FIR Coefficients to CSD
Only one non-zero CSD digit is needed for approximately each 20 dB of stopband attenuation.
Four non-zero digits per coefficient suffice for 80 dB of stopband attenuation.
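Keeping only the most significant non-zero digits can be sketched as follows. The function names and the 16-digit example word are assumptions chosen for illustration; the first digit is taken to have weight 2^0, as in a Q1.15 coefficient.

```python
from fractions import Fraction


def keep_most_significant(digits, keep):
    """Zero every non-zero CSD digit after the first `keep` (digits are MSB first)."""
    kept, seen = [], 0
    for d in digits:
        if d != 0:
            seen += 1
            if seen > keep:
                d = 0
        kept.append(d)
    return kept


def fraction_value(digits):
    """Exact value of MSB-first digits whose first digit has weight 2^0."""
    return sum(Fraction(d, 2 ** i) for i, d in enumerate(digits))


# Example CSD word (hypothetical input): 2^0 - 2^-3 - 2^-7 - 2^-10 - 2^-13 + 2^-15
full = [1, 0, 0, -1, 0, 0, 0, -1, 0, 0, -1, 0, 0, -1, 0, 1]
short = keep_most_significant(full, 4)
error = fraction_value(short) - fraction_value(full)
print(short, float(error))
```

The truncation error here is exactly the value of the dropped digits, which is what limits the achievable stopband attenuation.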
Slide8
Example: CSD Representation
Let a coefficient be
0 1 1 0 1 1 1 0 1 1 0 1 1
Converting to CSD and keeping 4 non-zero digits:
Weight:  2^0  2^-1  2^-2  2^-3  2^-4  2^-5  2^-6  2^-7  2^-8  2^-9  2^-10
Digit:    1    0     0    -1     0     0     0    -1     0     0    -1
The retained value is $2^{0} - 2^{-3} - 2^{-7} - 2^{-10}$.
Slide9
CSD multiplier
xn as it is
Shift xn by 3, take 1's complement, add a 1 to the LSB, sign-extend by 3 bits (s s s)
Shift xn by 7, take 1's complement, add a 1 to the LSB, sign-extend by 7 bits (s s s s s s s)
Shift xn by 10, take 1's complement, add a 1 to the LSB, sign-extend by 10 bits (s s s s s s s s s s)
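The four partial products above can be mimicked in Python as a minimal sketch. It assumes the coefficient is 2^0 - 2^-3 - 2^-7 - 2^-10 and adds 10 guard bits so that no shifted-out bits are lost; the function name and the guard-bit choice are assumptions, and Python's `>>` on negative integers stands in for sign extension.

```python
def csd_multiply(x: int) -> int:
    """Return x * (2^0 - 2^-3 - 2^-7 - 2^-10), scaled by 2^10."""
    x_scaled = x << 10          # 10 guard bits keep every shifted bit
    acc = x_scaled              # partial product for 2^0: xn as it is
    for shift in (3, 7, 10):
        pp = x_scaled >> shift  # arithmetic shift, i.e. sign extension
        acc += (~pp) + 1        # 1's complement plus a 1 at the LSB equals -pp
    return acc


print(csd_multiply(5))
```

With the scaling by 2^10, the result equals 887 times the input, since 1024 - 128 - 8 - 1 = 887.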
Slide10
CSD Multiplier in 5-coeff FIR filter
[Figure: 5-coefficient FIR filter; four registers (REG) delay the N-bit input xn to produce xn-1, xn-2, xn-3 and xn-4, and the five taps use coefficients h0 to h4]
Slide11
An Optimal Direct Form FIR Filter Architecture
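Slide12
Example: CSD Representation (an overbar marks a digit of value -1)
$0110111011011101$
$\rightarrow 0110111011100\bar{1}01$
$\rightarrow 0110111100\bar{1}00\bar{1}01$
$\rightarrow 0111000\bar{1}00\bar{1}00\bar{1}01$
$\rightarrow 100\bar{1}000\bar{1}00\bar{1}00\bar{1}01$
Slide13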
CSD FIR paper
CANONICAL SIGNED DIGIT REPRESENTATION FOR FIR DIGITAL FILTERS
Slide14
Slide15
Optimized DFG Transformation
Use a compression tree and remove the CPA from the feedback loop.
The result is kept in partial-sum and partial-carry form.
The first-order difference equation changes accordingly [equation image not recoverable].
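As a hedged reconstruction only, assume the first-order filter is $y[n] = a\,y[n-1] + x[n]$ (the symbols $a$, $y_s$ and $y_c$ are assumptions, not taken from the slide). Keeping the output as a partial sum $y_s[n]$ and a partial carry $y_c[n]$, with $y[n] = y_s[n] + y_c[n]$, the recursion would read:

$$y_s[n] + y_c[n] = a\,y_s[n-1] + a\,y_c[n-1] + x[n]$$

All partial products on the right-hand side can be reduced to two words by the compression tree, so the carry-propagate addition $y[n] = y_s[n] + y_c[n]$ is needed only outside the loop.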
Slide16
Example 1: First Order IIR Filter
DFG with one adder and one multiplier in the critical path.
Transformed DFG with Wallace compression tree and CPA outside the feedback loop.
Slide17
Example 2: DFT 2nd Order IIR Filter
Slide18
Example: Optimal Mapping: Design Option 1
Optimized implementation with CSD multipliers, compression trees and CPA outside the IIR filter
Slide19
Design Option 2
Using unified reduction trees for the feedforward and feedback computations and CPA outside the filter
Slide20
Design Option 3
CPA outside the feedback loop
Slide21
FIR Filter: Direct Form
Slide22
All multiplications are implemented as one compression tree and a single CPA
Slide23
Example: Conversion to Fixed-Point
h[n] = [0.0246 0.2344 0.4821 0.2344 0.0246]
h[n] = round(h[n]*2^15) = [805 7680 15798 7680 805]
16'b0000_0011_0010_0101
16'b0001_1110_0000_0000
16'b0011_1101_1011_0110
16'b0001_1110_0000_0000
16'b0000_0011_0010_0101
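A small Python sketch of the Q1.15 conversion and of printing a coefficient as a 16-bit Verilog-style literal. The function names are hypothetical. Note that the four-decimal coefficient values shown are themselves rounded, so quantizing them directly can differ by one LSB from the integers listed; the bit-string helper therefore works from the integers.

```python
def quantize_q15(coeff: float) -> int:
    """Scale a fractional coefficient by 2^15 and round to the nearest integer."""
    return round(coeff * 2 ** 15)


def to_verilog_bits(value: int) -> str:
    """Format an integer as a 16-bit two's complement literal in nibble groups."""
    bits = format(value & 0xFFFF, "016b")
    return "16'b" + "_".join(bits[i:i + 4] for i in range(0, 16, 4))


for h in [805, 7680, 15798, 7680, 805]:
    print(h, to_verilog_bits(h))
```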
Slide24
Conversion in CSD
Slide25
Keeping a maximum of 4 non-zero CSD digits in each coefficient results in
Slide26
Input to Compression Tree
Slide27
CV Computation for first CSD multiplier
Slide28
Pipelined DF FIR Filter
Pipelined direct-form FIR filter for FPGAs with DSP48 blocks
Slide29
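Transpose Direct Form FIR Filter
[Figure: transposed direct-form FIR filter; the input xn is multiplied by each coefficient h0 to h4 (products xn h0 to xn h4), the products are combined by four adders, and the critical path is marked]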
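The behaviour of a transposed direct-form FIR filter can be sketched in Python as below. This is a software model under assumed names (`tdf_fir`, `regs`), not a description of the hardware in the figure: every input sample is multiplied by all coefficients at once, and each product is added to the register that follows it in the adder chain.

```python
def tdf_fir(samples, coeffs):
    """Simulate a transposed direct-form FIR filter sample by sample."""
    regs = [0] * (len(coeffs) - 1)   # one register between adjacent adders
    outputs = []
    for x in samples:
        products = [h * x for h in coeffs]
        # Output adder: h0 * x plus the first register (if any).
        outputs.append(products[0] + (regs[0] if regs else 0))
        # Each register takes its product plus the old value of the next register.
        for i in range(len(regs)):
            nxt = regs[i + 1] if i + 1 < len(regs) else 0
            regs[i] = products[i + 1] + nxt
    return outputs


h = [805, 7680, 15798, 7680, 805]
print(tdf_fir([1, 0, 0, 0, 0], h))
```

Feeding a unit impulse returns the coefficients themselves, which is a quick sanity check that the structure computes the same convolution as the direct form.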
Slide30
Critical Path
Slide31
Filter Implementation
Slide32
TDF FIR with one stage of pipelining registers
Slide33
Deeply pipelined TDF FIR filter with critical path equal to one full adder delay
Slide34
Same Example
Slide35
TDF Implementation
Slide36
Example from the Book
Slide37
Hybrid FIR Filter Structure
Slide38
Hybrid Designs
Slide39
Adv DSD contents
Complexity Reduction
Slide40
Complexity Reduction
Constituent sub-graphs that are shared in the original graph
Example: three multiplications of x, by 3, 53 and 585
Slide41
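One possible way to share sub-graphs among the three products is sketched below. The decomposition (3 = 2 + 1, 65 = 64 + 1, 53 = 65 - 4·3, 585 = 65 + 8·65) is an assumption chosen for illustration and is not necessarily the graph shown on the slide; it uses four adders or subtractors in total.

```python
def multiply_3_53_585(x: int):
    """Return (3x, 53x, 585x) using shifts and four additions/subtractions."""
    x3 = x + (x << 1)           # 3x   = x + 2x
    x65 = x + (x << 6)          # 65x  = x + 64x
    x53 = x65 - (x3 << 2)       # 53x  = 65x - 12x, reusing 3x
    x585 = x65 + (x65 << 3)     # 585x = 65x + 520x, reusing 65x
    return x3, x53, x585


print(multiply_3_53_585(7))
```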
Slide42
Optimized Implementation
Selected sub-graphs from previous slide
Slide43
Find common sub-expressions
Eliminate their re-computation
Slide44
Example: Common Sub-expression Elimination
(a)
(b)
Slide45
Horizontal Common Sub-expressions for the example in the text
Slide46
Vertical Sub-expressions Elimination
Slide47
Common Sub Expression
Slide48
Optimized implementation exploiting vertical common sub-expressions
Slide49
Example of horizontal and vertical sub-expressions elimination
Slide50
Distributed Arithmetic Based Design
Yet another way of looking at dot product design
Slide51
ROM for Distributed Arithmetic
x2b  x1b  x0b   Contents of ROM
0    0    0     0
0    0    1     A0
0    1    0     A1
0    1    1     A1 + A0
1    0    0     A2
1    0    1     A2 + A0
1    1    0     A2 + A1
1    1    1     A2 + A1 + A0
Slide52
DA for computing the dot product of integer numbers for N=4 and K=3
Slide53
Look-up table
x2b  x1b  x0b   Contents of ROM   Value
0    0    0     0                 0
0    0    1     A0                3
0    1    0     A1                -1
0    1    1     A1 + A0           2
1    0    0     A2                5
1    0    1     A2 + A0           8
1    1    0     A2 + A1           4
1    1    1     A2 + A1 + A0      7
Slide54
DA-based architecture for implementing an FIR filter of length L and N-bit data samples
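A minimal Python sketch of the bit-serial distributed-arithmetic dot product, using the coefficient values A0 = 3, A1 = -1 and A2 = 5 from the look-up table. The function names, the input vector [-6, 6, -5] and the choice to shift the ROM output left (hardware would typically shift the accumulator right instead) are assumptions made for illustration.

```python
def build_rom(coeffs):
    """ROM entry at address a is the sum of coeffs[k] over the set bits k of a."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def da_dot_product(coeffs, xs, nbits):
    """Dot product of coeffs with nbits-wide two's complement inputs xs."""
    rom = build_rom(coeffs)
    acc = 0
    for b in range(nbits):              # process one bit position per cycle, LSB first
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> b) & 1) << k  # bit b of x[k] becomes address bit k
        term = rom[addr] << b
        # The sign bit of a two's complement number carries negative weight.
        acc += -term if b == nbits - 1 else term
    return acc


print(build_rom([3, -1, 5]))
print(da_dot_product([3, -1, 5], [-6, 6, -5], 4))
```

The ROM needs only 2^K entries regardless of the data width N, and the loop runs N times, which is the trade-off distributed arithmetic makes against a multiplier-based dot product.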
Slide55
Cycle by cycle working of DA
Cycle   Address   LUT   Accumulator
0       3'b100    5     000101_000
1       3'b111    7     001001_100
2       3'b010    -1    000011_110
3       3'b101    8     111001_111
Slide56
DA-based parallel implementation of an 18-coefficient FIR filter setting L=3 and M=6
Slide57
A LUT-less implementation of a DA-based FIR filter
Slide58
A parallel implementation for M=K uses a 2:1 MUX, compression tree and a CPA
Slide59
Reducing the output of the multiplexers using a CPA-based adder tree and one accumulator
Slide60
DA-based IIR filter design
Slide61
Two ROM-based design
Slide62
One ROM-based design
Slide63
DFT implementation using circular convolution
Slide64
Optimized TDF implementation of the DF implementation in previous figure
Slide65
Questions/Feedback !!