Slide1
How does CLA (carry look-ahead adder) work?
Wei-jen Hsu
TA for EE457 at USC, Fall 2004
Modified in Fall 2005
Slide2
A simple one-bit full adder
[Figure: one-bit full adder block (+) with inputs A, B, Cin and outputs S, Cout]
It takes A, B, and Cin as input and generates S and Cout in 2 gate delays (SOP)
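As a rough illustration of the two-level SOP form mentioned above, here is a minimal Python sketch. The function name `full_adder` is made up for this example, and the complemented inputs are assumed to be available for free, as is usual when counting 2 gate delays for SOP.

```python
from itertools import product

def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder written as two-level sum-of-products (SOP)."""
    na, nb, nc = 1 - a, 1 - b, 1 - cin  # complemented inputs (assumed available)
    # S is 1 when an odd number of inputs is 1: four AND terms, then one OR.
    s = (na & nb & cin) | (na & b & nc) | (a & nb & nc) | (a & b & cin)
    # Cout is 1 when at least two inputs are 1: three AND terms, then one OR.
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

# Check against ordinary integer addition for all 8 input combinations.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert 2 * cout + s == a + b + cin
```

Each output is one layer of AND gates followed by one OR gate, which is where the "2 gate delays" figure comes from.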
Slide3
4-bit RCA
[Figure: four one-bit full adders (+) in a chain; bit i takes Ai, Bi and carry Ci and produces Si and Ci+1, with C0 entering bit 0 and C4 leaving bit 3]
Work from the lowest bit to the highest bit sequentially.
With A0, B0, and C0, the lowest bit adder generates S0 and C1 in 2 gate delays.
With A1, B1, and C1 ready, the second bit adder generates S1 and C2 in 2 more gate delays.
Each bit adder has to wait for the lower bit adder to propagate the carry.
Carry propagation forms a long sequential wait chain, hence RCA is slow!!
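The sequential wait chain can be sketched in Python as follows. The helper names (`ripple_carry_add`, `to_bits`, `from_bits`) are invented for this example, and the delay counter simply applies the slide's model of 2 gate delays per bit.

```python
def full_adder(a, b, cin):
    """One-bit full adder (behavioral form)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def ripple_carry_add(a_bits, b_bits, c0=0):
    """Add two little-endian bit lists; return (sum_bits, carry_out, delay)."""
    carry, sum_bits, delay = c0, [], 0
    for a, b in zip(a_bits, b_bits):
        # Bit i cannot start until the carry from bit i-1 has arrived.
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
        delay += 2  # assumed model: 2 gate delays per bit adder
    return sum_bits, carry, delay

def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))

s, c4, delay = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s), c4, delay)  # 11 + 6 = 17: prints 1 1 8
```

Under this model the delay grows linearly with the word length, which is the problem the following slides set out to fix.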
Slide4
Observations
The critical component each bit adder waits for is the carry input.
Instead of generating and propagating carry bit-by-bit, can we generate all of them in parallel and break the sequential chain?
This is exactly the idea of CLA (carry look-ahead adder).
Slide5
Carry Look Ahead Logic
Now, even before the carry in (Cin) is available, based on the inputs (A, B) only, can we say anything about the carry out?
Under what condition will the bit-cell propagate an outgoing carry (Cout), if there is an incoming carry (Cin)?
Under what condition will the bit-cell generate an outgoing carry (Cout), regardless of whether there is an incoming carry (Cin)?
Slide6
1-bit CLA adder (a primary cell)
[Figure: 1-bit CLA cell (+) with inputs A, B, Cin and outputs S, p, g]
Instead of Cout, a 1-bit CLA adder block takes the A, B inputs and generates p, g.
p = propagate => I will propagate the Cin to the next bit. p = A+B
(If either A or B is 1, then Cin=1 causes Cout=1)
g = generate => I will generate a Cout independent of what Cin is. g = A.B
(If both A and B are 1, Cout=1 for sure)
p, g are generated in 1 gate delay after we have A, B. Note that Cin is not needed to generate p, g.
S is generated in 2 gate delays after we get Cin later (using SOP or XOR).
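A quick way to convince yourself that p and g carry enough information is to check that Cout = g + p.Cin for every input combination. The sketch below does that in Python; `pg_cell` is a hypothetical name for the cell's p, g logic.

```python
from itertools import product

def pg_cell(a: int, b: int) -> tuple[int, int]:
    """Return (p, g) for one bit cell; Cin is not an input."""
    p = a | b  # propagate: p = A + B
    g = a & b  # generate:  g = A . B
    return p, g

for a, b, cin in product((0, 1), repeat=3):
    p, g = pg_cell(a, b)
    cout = g | (p & cin)  # carry out rebuilt from p, g and Cin only
    assert cout == (a + b + cin) // 2
    s = a ^ b ^ cin       # sum bit, available once Cin arrives
    assert s == (a + b + cin) % 2
```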
Slide7
4-bit CLA
[Figure: four 1-bit CLA cells (+), each taking A, B and producing p, g, feed the CLL (carry look-ahead logic) block; C0 enters at the lowest bit, and the CLL outputs C1, C2, C3, C4]
The CLL takes p, g from all 4 bit-cells and C0 as inputs and generates all C’s in 2 gate delays.
C1 = g0 + p0C0,
C2 = g1 + p1g0 + p1p0C0,
C3 = g2 + p2g1 + p2p1g0 + p2p1p0C0,
C4 = g3 + p3g2 + p3p2g1 + p3p2p1g0 + p3p2p1p0C0
(Note: this C4 is too complicated to generate in a 2-level SOP representation, as it exceeds the fan-in of 4 for AND and OR.)
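The carry equations can be checked exhaustively. The following Python sketch writes them out term by term (the function name `cll4` is made up here) and compares each carry with the one a bit-by-bit ripple would produce.

```python
from itertools import product

def cll4(p, g, c0):
    """Carry look-ahead logic for 4 bits: every carry comes from p, g, C0 only."""
    p0, p1, p2, p3 = p
    g0, g1, g2, g3 = g
    c1 = g0 | (p0 & c0)
    c2 = g1 | (p1 & g0) | (p1 & p0 & c0)
    c3 = g2 | (p2 & g1) | (p2 & p1 & g0) | (p2 & p1 & p0 & c0)
    c4 = (g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0)
          | (p3 & p2 & p1 & p0 & c0))
    return c1, c2, c3, c4

for a, b, c0 in product(range(16), range(16), (0, 1)):
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    p = [x | y for x, y in zip(a_bits, b_bits)]
    g = [x & y for x, y in zip(a_bits, b_bits)]
    carries = cll4(p, g, c0)
    c = c0
    for i in range(4):
        c = (a_bits[i] + b_bits[i] + c) // 2  # reference: ripple the carry
        assert carries[i] == c
```

No carry on the right-hand side depends on another carry, which is what lets all of them be produced at the same time.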
Slide8
4-bit CLA
[Figure: four 1-bit CLA cells (+) with inputs A0..A3, B0..B3 and C0 send p0..p3, g0..g3 to the CLL (carry look-ahead logic), which returns C1, C2, C3; the cells output S0..S3]
Given the A, B’s, all p, g’s are generated in 1 gate delay in parallel.
Given all p, g’s, all C’s are generated in 2 gate delays in parallel.
Given all C’s, all S’s are generated in 2 gate delays in parallel.
Key virtue of CLA: sequential operation in RCA is broken into parallel operation!!
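Putting the three stages together, a behavioral sketch of the whole 4-bit CLA might look like this in Python. The names `lookahead_carries` and `cla4_add` are assumptions for this example, and the returned delay is just the slide's tally of 1 + 2 + 2.

```python
def lookahead_carries(p, g, c0):
    """Flattened carry equations: each carry uses only p, g and C0."""
    carries = []
    for i in range(len(p)):
        c = c0 & int(all(p[: i + 1]))               # term p_i ... p_0 C0
        for j in range(i + 1):
            c |= g[j] & int(all(p[j + 1 : i + 1]))  # term p_i ... p_(j+1) g_j
        carries.append(c)
    return carries

def cla4_add(a, b, c0=0):
    """Return (4-bit sum, C4, gate delays) following the three parallel stages."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    # Stage 1 (1 gate delay): all p, g in parallel.
    p = [x | y for x, y in zip(a_bits, b_bits)]
    g = [x & y for x, y in zip(a_bits, b_bits)]
    # Stage 2 (2 gate delays): all carries in parallel.
    c1, c2, c3, c4 = lookahead_carries(p, g, c0)
    # Stage 3 (2 gate delays): all sum bits in parallel.
    s_bits = [x ^ y ^ c for x, y, c in zip(a_bits, b_bits, [c0, c1, c2, c3])]
    total = sum(bit << i for i, bit in enumerate(s_bits))
    return total, c4, 1 + 2 + 2

print(cla4_add(11, 6))  # 11 + 6 = 17: prints (1, 1, 5)
```

Compared with the 2 gate delays per bit of a ripple-carry chain, the 5 gate delays here do not depend on which bit the carry starts from.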
Slide9
Observation
The CLL block cannot be made arbitrarily big (at most 4 bits), because if the equations for the C’s are too long they cannot be evaluated in 2 gate delays (the fan-in limitation on gates is 4, as required by VLSI technology).
So how about long operands, say 16 bits?
We add another layer of CLL and make a multi-level CLA.
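To see how quickly the equations outgrow the fan-in limit, the sketch below counts the gate inputs needed by the flattened two-level equation for each carry. The function name `carry_fan_in` and the constant `FAN_IN_LIMIT` are made up for this example; the limit of 4 is the one quoted above.

```python
FAN_IN_LIMIT = 4  # fan-in limit quoted on the slide

def carry_fan_in(k: int) -> tuple[int, int]:
    """Fan-in needed by the two-level equation for Ck (k >= 1)."""
    or_inputs = k + 1   # k terms ending in a g, plus the term ending in C0
    widest_and = k + 1  # the term p(k-1) ... p0 C0
    return or_inputs, widest_and

for k in range(1, 6):
    or_in, and_in = carry_fan_in(k)
    fits = max(or_in, and_in) <= FAN_IN_LIMIT
    print(f"C{k}: OR fan-in {or_in}, widest AND fan-in {and_in}, fits={fits}")
```

C1 through C3 fit within a fan-in of 4, while C4 already needs a 5-input OR and a 5-input AND.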
Slide10
16-bit CLA
Same as before, p, g’s are generated in parallel in 1 gate delay.
Now, without an input carry, the first-tier CLLs cannot generate C’s...
Instead they generate P, G’s (group propagate and group generate) in 2 gate delays.
P => This group will propagate the input carry as output carry. P = p0p1p2p3
G => This group will generate an output carry. G = g3 + p3g2 + p3p2g1 + p3p2p1g0
The second-tier CLL takes the P, G’s from the first-tier CLLs and the C0 to generate the “seed C’s” for the first-tier CLLs in 2 gate delays. (Note that the logic for generating “seed C’s” from P, G’s is exactly the same as that for generating C’s from p, g’s!)
The seed C’s form the input Cin to the first-tier CLLs. They use this Cin and the p, g’s to generate C’s in 2 gate delays.
With all C’s in place, S’s are calculated in 2 gate delays.
Therefore, in total, 1+2+2+2+2 = 9 gate delays to finish the whole thing!!
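The two-tier scheme can be sketched behaviorally in Python as below. The names `lookahead` and `cla16_add` are invented for this example; the same `lookahead` helper is reused for both tiers, mirroring the remark that the seed-carry logic is the same as the bit-level carry logic.

```python
def lookahead(p, g, cin):
    """Flattened carry equations: each output uses only p, g and cin."""
    carries = []
    for i in range(len(p)):
        c = cin & int(all(p[: i + 1]))
        for j in range(i + 1):
            c |= g[j] & int(all(p[j + 1 : i + 1]))
        carries.append(c)
    return carries

def cla16_add(a, b, c0=0):
    """Return (16-bit sum, C16) using two tiers of 4-bit look-ahead."""
    a_bits = [(a >> i) & 1 for i in range(16)]
    b_bits = [(b >> i) & 1 for i in range(16)]
    p = [x | y for x, y in zip(a_bits, b_bits)]           # 1 gate delay
    g = [x & y for x, y in zip(a_bits, b_bits)]
    groups = [(p[k : k + 4], g[k : k + 4]) for k in range(0, 16, 4)]
    # First tier, no carry yet: group propagate and generate (2 gate delays).
    P = [int(all(gp)) for gp, _ in groups]                # P = p0p1p2p3
    G = [lookahead(gp, gg, 0)[-1] for gp, gg in groups]   # G = g3 + p3g2 + ...
    # Second tier: seed carries C4, C8, C12 (and C16) in 2 gate delays.
    seeds = [c0] + lookahead(P, G, c0)
    # First tier again: remaining carries inside each group (2 gate delays).
    carries = []
    for (gp, gg), seed in zip(groups, seeds):
        carries += [seed] + lookahead(gp, gg, seed)[:3]
    # Sum bits (2 gate delays).
    s_bits = [x ^ y ^ c for x, y, c in zip(a_bits, b_bits, carries)]
    return sum(bit << i for i, bit in enumerate(s_bits)), seeds[4]

print(cla16_add(0xFFFF, 1))  # wraps around: prints (0, 1)
```

The comments mark the five stages whose delays add up to 1+2+2+2+2.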
Slide11
Now, how about 64-bit CLA?
You can probably visualize that in your mind by yourself now, I guess.
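One way to check your mental picture is to extend the delay counting used for the 4-bit and 16-bit cases. The sketch below is an extrapolation of that counting, not a figure given in the slides: it assumes 4-bit CLL blocks at every tier, 2 gate delays for each tier's P, G on the way up, and 2 gate delays for each tier's carries on the way down. The function names are made up.

```python
def cla_levels(n_bits: int, group: int = 4) -> int:
    """Number of CLL tiers needed when each CLL block handles `group` inputs."""
    levels, span = 1, group
    while span < n_bits:
        span *= group
        levels += 1
    return levels

def cla_delay(n_bits: int, group: int = 4) -> int:
    """Gate delays until the last sum bit, under the assumed counting model."""
    levels = cla_levels(n_bits, group)
    up = 2 * (levels - 1)    # P, G from every tier below the top one
    down = 2 * levels        # carries from every tier, top to bottom
    return 1 + up + down + 2  # p, g first; sum bits last

for n in (4, 16, 64):
    print(n, cla_delay(n))  # prints 4 5, then 16 9, then 64 13
```

The 5 and 9 match the totals worked out for the 4-bit and 16-bit adders; the 13 for 64 bits follows only if the third tier behaves the same way.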
Slide12
A bit more details
[Figure: 4-bit CLA as before: cells (+) with inputs A0..A3, B0..B3 and C0 send p0..p3, g0..g3 to the CLL (carry look-ahead logic), which produces C1, C2, C3, C4; the cells output S0..S3]
Do all these 4 S’s (S3, S2, S1, S0) come together?
Actually no! Since C0 is available from the beginning, S0 can be calculated in 2 gate delays (using the original SOP expression for the S bit in a single-bit adder), before S3, S2, S1.
Slide13
A bit more details (Cont’d)
Again, actually not all the S’s come together!
C0 is readily available, so S0 can be calculated in 2 gate delays.
Since C0 is readily available, the lowest first-tier CLL can generate C1, C2, C3 independent of the second-tier CLL. Since C1, C2, C3 are done earlier, so are S1, S2, S3 (in 5 gate delays: 1 for (p, g), 2 for (C1, C2, C3), 2 for S).
When we get C4, C8, C12, we can start to calculate S4, S8, S12 and get them in 2 more gate delays. That is the same time as when we get the other C’s (purple guys), at 7 gate delays.
Timing for items generated (in terms of gate delay):
black = already available (the black color for S0 is incorrect, because S0 comes out at 2), orange = 1, green = 3, blue = 5, purple = 7, brown = 9
[Figure: 16-bit CLA diagram, color-coded by the timing legend above; carry input C0]
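The arrival times described above can be tabulated with a few lines of Python. This is only a restatement of the slide's timing argument for the 16-bit, two-tier CLA; `sum_ready_time` is a hypothetical name.

```python
def sum_ready_time(i: int) -> int:
    """Gate delay at which Si is ready in the 16-bit, two-tier CLA."""
    if i == 0:
        return 2          # C0 is given, so S0 uses the plain SOP expression
    group, pos = divmod(i, 4)
    if group == 0:
        carry_time = 3    # p, g at 1, then the lowest CLL uses C0 directly
    elif pos == 0:
        carry_time = 5    # seed carry C4, C8 or C12 from the second-tier CLL
    else:
        carry_time = 7    # first-tier CLL had to wait for its seed carry
    return carry_time + 2  # sum bit comes 2 gate delays after its carry

for i in range(16):
    print(f"S{i}: {sum_ready_time(i)}")
```

S0 is ready at 2, S1 to S3 at 5, S4, S8 and S12 at 7, and the remaining sum bits at 9, so the worst case is still 9 gate delays.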
Slide14
Thanks!!
(You can distribute these slides as one whole file to anywhere you feel it may be useful.)