Slide1
CS 314 Computer Organization, Fall 2017
Chapter 3: Arithmetic for Computers
Haojin Zhu (http://tdt.sjtu.edu.cn/~hjzhu/)
[Adapted from Computer Organization and Design, 4th Edition, Patterson & Hennessy, © 2012, MK]
Slide2
Review: MIPS (RISC) Design Principles
Simplicity favors regularity
  fixed size instructions
  small number of instruction formats
  opcode always the first 6 bits
Smaller is faster
  limited instruction set
  limited number of registers in register file
  limited number of addressing modes
Make the common case fast
  arithmetic operands from the register file (load-store machine)
  allow instructions to contain immediate operands
Good design demands good compromises
  three instruction formats
Specifying Branch Destinations
Use a register (like in lw and sw) added to the 16-bit offset
  which register? Instruction Address Register (the PC)
    its use is automatically implied by instruction
    PC gets updated (PC+4) during the fetch cycle so that it holds the address of the next instruction
  limits the branch distance to -2^15 to +2^15-1 (word) instructions from the (instruction after the) branch instruction, but most branches are local anyway
[Figure: the offset from the low order 16 bits of the branch instruction is sign-extended to 32 bits, shifted left by two bits (00 appended), and added to PC+4 to give the 32-bit branch dst address]
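The address calculation described above can be sketched in a few lines. This is a minimal illustration, not part of the original slides; the function names `sign_extend16` and `branch_target` are hypothetical.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def branch_target(pc, offset16):
    """Branch destination = (PC + 4) + (sign-extended offset shifted left by 2)."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF

# A branch at 0x00400010 with offset +3 lands 3 instructions past PC + 4.
print(hex(branch_target(0x00400010, 0x0003)))  # 0x400020
# An offset of -1 (0xFFFF) targets the branch instruction itself.
print(hex(branch_target(0x00400010, 0xFFFF)))  # 0x400010
```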
Slide4
Other Control Flow Instructions
MIPS also has an unconditional branch instruction or jump instruction:
    j label    # go to label
Instruction Format (J Format):
    0x02 | 26-bit address
[Figure: the low order 26 bits of the jump instruction are shifted left by two bits (00 appended) and concatenated with the upper 4 bits of the PC to give the 32-bit jump address]
Why shift left by two bits?
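One way to see the answer: MIPS instructions are 4 bytes and word-aligned, so the low two bits of every instruction address are 00 and need not be stored. The sketch below is illustrative only; `jump_target` is a hypothetical name, and it assumes the upper 4 bits come from PC + 4, as in MIPS.

```python
def jump_target(pc, address26):
    """Upper 4 bits of PC + 4, then the 26-bit field, then two zero bits."""
    upper4 = (pc + 4) & 0xF0000000
    return upper4 | ((address26 & 0x03FFFFFF) << 2)

# The 26-bit field 0x0100008 names the word-aligned byte address 0x00400020.
print(hex(jump_target(0x00400010, 0x0100008)))  # 0x400020
```

Dropping the two zero bits lets 26 stored bits cover a 2^28-byte region.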
Slide5
Review: MIPS Addressing Modes Illustrated
1. Register addressing
   op rs rt rd funct
   operand: word in a Register
2. Base (displacement) addressing
   op rs rt offset
   operand: word or byte in Memory at base register + offset
3. Immediate addressing
   op rs rt operand
4. PC-relative addressing
   op rs rt offset
   branch destination instruction in Memory at Program Counter (PC) + offset
5. Pseudo-direct addressing
   op jump address
   jump destination instruction in Memory at Program Counter (PC) upper bits || jump address
Slide6
Number Representations
32-bit signed numbers (2's complement), MSB on the left and LSB on the right:
0000 0000 0000 0000 0000 0000 0000 0000two = 0ten
0000 0000 0000 0000 0000 0000 0000 0001two = + 1ten
...
0111 1111 1111 1111 1111 1111 1111 1110two = + 2,147,483,646ten
0111 1111 1111 1111 1111 1111 1111 1111two = + 2,147,483,647ten   (maxint)
1000 0000 0000 0000 0000 0000 0000 0000two = - 2,147,483,648ten   (minint)
1000 0000 0000 0000 0000 0000 0000 0001two = - 2,147,483,647ten
...
1111 1111 1111 1111 1111 1111 1111 1110two = - 2ten
1111 1111 1111 1111 1111 1111 1111 1111two = - 1ten
Converting <32-bit values into 32-bit values
  copy the most significant bit (the sign bit) into the "empty" bits
    0010 -> 0000 0010
    1010 -> 1111 1010
  sign extend versus zero extend (lb vs. lbu)
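A small sketch of the two kinds of extension, reproducing the 4-bit to 8-bit examples above. The helper names `sign_extend` and `zero_extend` are hypothetical and chosen only for illustration.

```python
def sign_extend(value, from_bits, to_bits=32):
    """Copy the sign bit of a from_bits-wide value into the new upper bits."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):                     # sign bit is set
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value

def zero_extend(value, from_bits):
    """Fill the new upper bits with zeros."""
    return value & ((1 << from_bits) - 1)

print(format(sign_extend(0b0010, 4, 8), "08b"))  # 00000010
print(format(sign_extend(0b1010, 4, 8), "08b"))  # 11111010
print(format(zero_extend(0b1010, 4), "08b"))     # 00001010
```

Applied to a loaded byte, the first behaviour corresponds to lb and the second to lbu.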
Slide7
MIPS Arithmetic Logic Unit (ALU)
Must support the Arithmetic/Logic operations of the ISA
  add, addi, addiu, addu
  sub, subu
  mult, multu, div, divu, sqrt
  and, andi, nor, or, ori, xor, xori
  beq, bne, slt, slti, sltiu, sltu
[Figure: ALU with 32-bit inputs A and B, a 4-bit operation select m, a 32-bit result, and 1-bit zero and ovf outputs]
With special handling for
  sign extend: addi, addiu, slti, sltiu
  zero extend: andi, ori, xori
  overflow detection: add, addi, sub
Slide8
Dealing with Overflow
Overflow occurs when the result of an operation cannot be represented in 32 bits, i.e., when the sign bit contains a value bit of the result and not the proper sign bit
  When adding operands with different signs or when subtracting operands with the same sign, overflow can never occur

Operation   Operand A   Operand B   Result indicating overflow
A + B       >= 0        >= 0        < 0
A + B       < 0         < 0         >= 0
A - B       >= 0        < 0         < 0
A - B       < 0         >= 0        >= 0

MIPS signals overflow with an exception (aka interrupt): an unscheduled procedure call where the EPC contains the address of the instruction that caused the exception
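The four rows of the overflow table can be checked with a short sketch. Operands are passed as 32-bit patterns; the names `add_overflows` and `sub_overflows` are hypothetical, and this models only the sign test, not the exception mechanism.

```python
MASK32 = 0xFFFFFFFF

def is_negative(x):
    """True when the sign bit (bit 31) of a 32-bit pattern is set."""
    return bool(x & 0x80000000)

def add_overflows(a, b):
    """A + B overflows when both operands share a sign and the result does not."""
    result = (a + b) & MASK32
    return is_negative(a) == is_negative(b) and is_negative(result) != is_negative(a)

def sub_overflows(a, b):
    """A - B overflows when the operand signs differ and the result's sign differs from A's."""
    result = (a - b) & MASK32
    return is_negative(a) != is_negative(b) and is_negative(result) != is_negative(a)

print(add_overflows(0x7FFFFFFF, 0x00000001))  # maxint + 1    -> True
print(sub_overflows(0x80000000, 0x00000001))  # minint - 1    -> True
print(add_overflows(0x7FFFFFFF, 0xFFFFFFFF))  # maxint + (-1) -> False
```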
Slide9
Addition & Subtraction
Just like in grade school (carry/borrow 1s)
      0111      0111      0110
    + 0110    - 0110    - 0101
Two's complement operations are easy
  do subtraction by negating and then adding
      0111      0111
    - 0110    + 1010
Overflow (result too large for finite computer word)
  e.g., adding two n-bit numbers does not yield an n-bit number
      0111
    + 0001
Slide10
Addition & Subtraction
Just like in grade school (carry/borrow 1s)
      0111      0111      0110
    + 0110    - 0110    - 0101
      1101      0001      0001
Two's complement operations are easy
  do subtraction by negating and then adding
      0111      0111
    - 0110    + 1010
      0001    1 0001
Overflow (result too large for finite computer word)
  e.g., adding two n-bit numbers does not yield an n-bit number
      0111
    + 0001
      1000
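The "negate, then add" rule can be tried on the 4-bit operands used above. This is only a sketch; `negate` and `subtract` are hypothetical helper names.

```python
def negate(x, bits=4):
    """Two's complement negation: complement all bits, then add 1."""
    mask = (1 << bits) - 1
    return ((x ^ mask) + 1) & mask

def subtract(a, b, bits=4):
    """a - b computed as a + (-b); the carry out of the top bit is discarded."""
    mask = (1 << bits) - 1
    return (a + negate(b, bits)) & mask

print(format(negate(0b0110), "04b"))            # 1010
print(format(subtract(0b0111, 0b0110), "04b"))  # 0001
```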
Slide11
Building a 1-bit Binary Adder
[Figure: 1 bit Full Adder with inputs A, B, carry_in and outputs S, carry_out]
S = A xor B xor carry_in
carry_out = A&B | A&carry_in | B&carry_in   (majority function)

A  B  carry_in  carry_out  S
0  0  0         0          0
0  0  1         0          1
0  1  0         0          1
0  1  1         1          0
1  0  0         0          1
1  0  1         1          0
1  1  0         1          0
1  1  1         1          1

How can we use it to build a 32-bit adder?
How can we modify it easily to build an adder/subtractor?
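The two equations for S and carry_out translate directly into bitwise operators. The sketch below (with a hypothetical function name `full_adder`) prints the truth table in the column order A, B, carry_in, carry_out, S.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder built from the sum and carry equations."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)  # majority function
    return s, carry_out

for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            s, cout = full_adder(a, b, c)
            print(a, b, c, cout, s)
```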
Slide12
Building 32-bit Adder
[Figure: 32 1-bit FAs in a chain; the FA for bit i takes A_i, B_i and c_i and produces S_i and c_(i+1), with c_0 = carry_in and c_32 = carry_out]
Just connect the carry-out of the least significant bit FA to the carry-in of the next least significant bit and connect . . .
Ripple Carry Adder (RCA)
  advantage: simple logic, so small (low cost)
  disadvantage: slow and lots of glitching (so lots of energy consumption)
Slide13
A 32-bit Ripple Carry Adder/Subtractor
Remember 2's complement is just
  complement all the bits
  add a 1 in the least significant bit
      A 0111       0111
      B - 0110   +
[Figure: the 32-bit ripple carry adder of 1-bit FAs (inputs A_0..A_31, B_0..B_31, outputs S_0..S_31, c_0 = carry_in, c_32 = carry_out) with an add/sub control line (0=add, 1=sub); each FA receives B_i if control = 0 and !B_i if control = 1, and control also drives c_0]
Slide14
A 32-bit Ripple Carry Adder/Subtractor
Remember 2's complement is just
  complement all the bits
  add a 1 in the least significant bit
      A 0111       0111
      B - 0110   + 1001
                 +    1
        0001     1 0001
[Figure: the 32-bit ripple carry adder of 1-bit FAs (inputs A_0..A_31, B_0..B_31, outputs S_0..S_31, c_0 = carry_in, c_32 = carry_out) with an add/sub control line (0=add, 1=sub); each FA receives B_i if control = 0 and !B_i if control = 1, and control also drives c_0]
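A bit-level sketch of the adder/subtractor follows. It assumes the "B_i or !B_i" selection is done with an XOR against the control line and that the control line also supplies c_0; the function names are hypothetical.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum bit, carry out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return s, carry_out

def ripple_add_sub(a, b, control, bits=32):
    """control = 0 adds, control = 1 subtracts (B is inverted and c0 = 1)."""
    carry = control                      # the +1 of two's complement enters as c0
    result = 0
    for i in range(bits):                # least significant bit first
        a_i = (a >> i) & 1
        b_i = ((b >> i) & 1) ^ control   # passes B_i when adding, !B_i when subtracting
        s_i, carry = full_adder(a_i, b_i, carry)
        result |= s_i << i
    return result, carry                 # carry is the final carry_out

print(ripple_add_sub(0b0111, 0b0110, 1, bits=4))  # (1, 1): result 0001, carry_out 1
```

Each bit must wait for the carry from the bit below it, which is the source of the ripple delay.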
Slide15
Overflow Detection Logic
Carry into MSB != Carry out of MSB
  For an N-bit ALU: Overflow = CarryIn[N-1] XOR CarryOut[N-1]

X  Y  X XOR Y
0  0  0
0  1  1
1  0  1
1  1  0

[Figure: four 1-bit ALUs in a chain; the ALU for bit i takes Ai, Bi and CarryIn_i and produces Result_i and CarryOut_i, with CarryIn0 = 0; Overflow is the XOR of CarryIn3 and CarryOut3]
why?
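One way to convince yourself of the rule is to check it exhaustively on a small word size. The sketch below does that for 4-bit operands; `add_with_carries` and `to_signed` are hypothetical helper names.

```python
def add_with_carries(a, b, bits=4):
    """Ripple addition that also reports the carry into and out of the MSB."""
    carry = 0
    result = 0
    carry_into_msb = 0
    for i in range(bits):
        a_i, b_i = (a >> i) & 1, (b >> i) & 1
        if i == bits - 1:
            carry_into_msb = carry
        result |= (a_i ^ b_i ^ carry) << i
        carry = (a_i & b_i) | (a_i & carry) | (b_i & carry)
    return result, carry_into_msb, carry

def to_signed(x, bits=4):
    """Read a bits-wide pattern as a two's complement value."""
    return x - (1 << bits) if x >> (bits - 1) else x

# Compare CarryIn[N-1] XOR CarryOut[N-1] with the true signed sum.
for a in range(16):
    for b in range(16):
        _, cin, cout = add_with_carries(a, b)
        true_sum = to_signed(a) + to_signed(b)
        assert bool(cin ^ cout) == (not -8 <= true_sum <= 7)
print("carry rule matches for all 4-bit operand pairs")
```

Intuitively, the two carries differ exactly when the sign position receives a value bit it cannot pass on (two positive operands) or drops one it needed (two negative operands).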
Slide16
Multiply
Binary multiplication is just a bunch of right shifts and adds
[Figure: an n-bit multiplicand and n-bit multiplier produce a partial product array, which is summed into a 2n-bit double precision product]
The partial product array can be formed in parallel and added in parallel for faster multiplication
Slide17
Multiplication
More complicated than addition
Can be accomplished via shifting and adding
          0010    (multiplicand)
      x   1011    (multiplier)
          0010
         0010     (partial product
        0000       array)
       0010
      00010110    (product)
In every step
  multiplicand is shifted
  next bit of multiplier is examined (also a shifting step)
  if this bit is 1, shifted multiplicand is added to the product
Slide18
Multiplication Algorithm 1
In every step
  multiplicand is shifted
  next bit of multiplier is examined (also a shifting step)
  if this bit is 1, shifted multiplicand is added to the product
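The three per-bit steps can be sketched as a loop. The slides only say the multiplicand "is shifted"; the sketch assumes it shifts left while the multiplier shifts right, as in the textbook's first multiplier. `multiply_v1` is a hypothetical name.

```python
def multiply_v1(multiplicand, multiplier, bits=32):
    """Shift-and-add multiply of two unsigned bits-wide numbers."""
    product = 0
    for _ in range(bits):
        if multiplier & 1:              # examine the next multiplier bit
            product += multiplicand     # add the shifted multiplicand
        multiplicand <<= 1              # shift multiplicand left
        multiplier >>= 1                # shift multiplier right
    return product

print(format(multiply_v1(0b0010, 0b1011, bits=4), "08b"))  # 00010110
```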
Slide19
Slide20
Comments on Multiplication Algorithm 1
Performance
  Three basic steps for each bit
  If each step took a clock cycle, it requires almost 100 clock cycles (3 steps x 32 bits) to multiply two 32-bit numbers
How to improve it?
  Motivation (performing the operations in parallel):
    Putting multiplier and the product together
    Shift them together
Slide21
Refined Multiplication Algorithm 2
[Figure: the multiplicand feeds a 32-bit ALU; Control issues add and shift right; the ALU result is written to the left half of the product register, whose right half initially holds the multiplier]
32-bit ALU and multiplicand is untouched
the sum keeps shifting right
at every step, number of bits in product + multiplier = 64, hence, they share a single 64-bit register
Slide22
Add and Right Shift Multiplier Hardware
[Figure: the multiplicand feeds a 32-bit ALU; Control issues add and shift right on the shared product/multiplier register]
multiplicand       0 1 1 0             = 6
product/multiplier 0 0 0 0 0 1 0 1     = 5
add                0 1 1 0 0 1 0 1
shift right        0 0 1 1 0 0 1 0
add                0 0 1 1 0 0 1 0
shift right        0 0 0 1 1 0 0 1
add                0 1 1 1 1 0 0 1
shift right        0 0 1 1 1 1 0 0
add                0 0 1 1 1 1 0 0
shift right        0 0 0 1 1 1 1 0     = 30
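The add and right shift scheme can be simulated to reproduce the 6 x 5 trace. This is a sketch under the assumption that an "add" step adds 0 when the low bit is 0; `multiply_shared_register` is a hypothetical name.

```python
def multiply_shared_register(multiplicand, multiplier, bits=4):
    """Product and multiplier share one 2*bits register that shifts right."""
    reg = multiplier                       # upper half starts at 0
    trace = []
    for _ in range(bits):
        if reg & 1:                        # low bit is the current multiplier bit
            reg += multiplicand << bits    # add multiplicand into the upper half
        trace.append(("add", format(reg, f"0{2 * bits}b")))
        reg >>= 1                          # shift product and multiplier together
        trace.append(("shift", format(reg, f"0{2 * bits}b")))
    return reg, trace

product, trace = multiply_shared_register(0b0110, 0b0101)
for step, value in trace:
    print(step, value)
print(product)  # 30
```

Python integers are unbounded, so a carry out of the upper half is simply kept and shifted back in; real hardware needs one extra bit for that carry.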
Slide23
Exercise
Using 4-bit numbers to save space, multiply 2ten x 3ten, or 0010two x 0011two
Slide24
Division
Division is just a bunch of quotient digit guesses and left shifts and subtracts
dividend = quotient x divisor + remainder
[Figure: a dividend with leading 0s and an n-bit divisor produce a partial remainder array, giving an n-bit quotient and an n-bit remainder]
Slide25
Division
                     1001ten     Quotient
Divisor 1000ten | 1001010ten     Dividend
                 -1000
                     10
                     101
                     1010
                    -1000
                       10ten     Remainder
At every step,
  shift divisor right and compare it with current dividend
  if divisor is larger, shift 0 as the next bit of the quotient
  if divisor is smaller, subtract to get new dividend and shift 1 as the next bit of the quotient
Slide26
First Version of Hardware for Division
A comparison requires a subtract; the sign of the result is examined; if the result is negative, the divisor must be added back
Slide27
Divide Algorithm
Start
1. Subtract the Divisor register from the Remainder register, and place the result in the Remainder register.
Test Remainder
  Remainder >= 0:
    2a. Shift the Quotient register to the left, setting the new rightmost bit to 1.
  Remainder < 0:
    2b. Restore the original value by adding the Divisor register to the Remainder register and place the sum in the Remainder register. Also shift the Quotient register to the left, setting the new LSB to 0.
3. Shift the Divisor register right 1 bit.
33rd repetition?
  No: < 33 repetitions (go back to step 1)
  Yes: 33 repetitions
Done
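The flowchart steps map onto a short loop. The sketch below uses 4-bit operands (so 5 repetitions instead of 33) and assumes the divisor starts in the left half of a double-width Divisor register; `divide_v1` is a hypothetical name.

```python
def divide_v1(dividend, divisor, bits=4):
    """Restoring division with double-width Remainder and Divisor registers."""
    remainder = dividend
    div = divisor << bits          # divisor starts in the left half
    quotient = 0
    for _ in range(bits + 1):      # bits + 1 repetitions (33 for 32-bit operands)
        remainder -= div                       # step 1: subtract
        if remainder >= 0:
            quotient = (quotient << 1) | 1     # step 2a: shift in a 1
        else:
            remainder += div                   # step 2b: restore
            quotient = quotient << 1           #          and shift in a 0
        div >>= 1                              # step 3: shift Divisor right
    return quotient & ((1 << bits) - 1), remainder

print(divide_v1(7, 2))  # (3, 1)
```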
Slide28
Divide Example
Divide 7ten (0000 0111two) by 2ten (0010two)

Iter  Step            Quot  Divisor  Remainder
0     Initial values
1
2
3
4
5
Slide29
Divide Example
Divide 7ten (0000 0111two) by 2ten (0010two)

Iter  Step                            Quot  Divisor    Remainder
0     Initial values                  0000  0010 0000  0000 0111
1     Rem = Rem - Div                 0000  0010 0000  1110 0111
      Rem < 0: +Div, shift 0 into Q   0000  0010 0000  0000 0111
      Shift Div right                 0000  0001 0000  0000 0111
2     Same steps as 1                 0000  0001 0000  1111 0111
                                      0000  0001 0000  0000 0111
                                      0000  0000 1000  0000 0111
3     Same steps as 1                 0000  0000 0100  0000 0111
4     Rem = Rem - Div                 0000  0000 0100  0000 0011
      Rem >= 0: shift 1 into Q        0001  0000 0100  0000 0011
      Shift Div right                 0001  0000 0010  0000 0011
5     Same steps as 4                 0011  0000 0001  0000 0001
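Slide30
Efficient Division
[Figure, first version: a 64-bit Divisor register (Shift Right), a 64-bit ALU, a 32-bit Quotient register (Shift Left), and a 64-bit Remainder register (Write), all driven by Control]
[Figure, efficient version: the divisor feeds a 32-bit ALU; Control issues subtract and shift left; a single register holds the dividend at the start and ends with the remainder and the quotient]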
Slide31
Left Shift and Subtract Division Hardware
[Figure: the divisor feeds a 32-bit ALU; Control issues subtract and shift left on the shared register, which holds the dividend at the start and the remainder and quotient at the end]
divisor            0 0 1 0             = 2
dividend           0 0 0 0 0 1 1 0     = 6
shift left         0 0 0 0 1 1 0 0
sub                1 1 1 0 1 1 0 0     rem neg, so quotient bit = 0
restore remainder  0 0 0 0 1 1 0 0
shift left         0 0 0 1 1 0 0 0
sub                1 1 1 1 1 0 0 0     rem neg, so quotient bit = 0
restore remainder  0 0 0 1 1 0 0 0
shift left         0 0 1 1 0 0 0 0
sub                0 0 0 1 0 0 0 1     rem pos, so quotient bit = 1
shift left         0 0 1 0 0 0 1 0
sub                0 0 0 0 0 0 1 1     rem pos, so quotient bit = 1
= 3 with 0 remainder
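The left shift and subtract scheme can be sketched with a single shared register, remainder in the upper half and quotient bits entering the lower half. `divide_shared_register` is a hypothetical name, and the restore step is modelled by simply not committing a negative trial subtraction.

```python
def divide_shared_register(dividend, divisor, bits=4):
    """Remainder (upper half) and quotient (lower half) share one register."""
    reg = dividend
    low_mask = (1 << bits) - 1
    for _ in range(bits):
        reg <<= 1                              # shift left
        upper = (reg >> bits) - divisor        # trial subtract from the upper half
        if upper >= 0:                         # remainder not negative: quotient bit = 1
            reg = (upper << bits) | (reg & low_mask) | 1
        # otherwise the remainder is restored (reg is left unchanged)
        # and the quotient bit stays 0
    return reg & low_mask, reg >> bits         # (quotient, remainder)

print(divide_shared_register(6, 2))  # (3, 0)
```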
Slide32
Restoring Unsigned Integer Division
s(0) = z
for j = 1 to k
    if 2 s(j-1) - 2^k d >= 0
        q_(k-j) = 1
        s(j) = 2 s(j-1) - 2^k d
    else
        q_(k-j) = 0
        s(j) = 2 s(j-1)
No need to restore the remainder in the case of R-D >= 0
Restore the remainder in the case of R-D < 0
The remainder shifts left by 1 bit
k = 32: put the divisor in the left 32 bits of the register
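A direct transcription of the recurrence, as a sketch: z is the dividend, d the divisor, k the number of quotient bits, and the dividend is assumed to satisfy z < 2^k d so the quotient fits in k bits. The comparison is written as `>= 0` so that a zero trial difference produces a quotient bit of 1. `restoring_divide` is a hypothetical name.

```python
def restoring_divide(z, d, k):
    """Restoring division: s(j) = 2 s(j-1) - q_(k-j) * 2^k * d."""
    s = z                                  # s(0) = z
    q = 0
    for j in range(1, k + 1):
        trial = 2 * s - (d << k)           # 2 s(j-1) - 2^k d
        if trial >= 0:
            q |= 1 << (k - j)              # q_(k-j) = 1
            s = trial
        else:
            s = 2 * s                      # q_(k-j) = 0, keep the shifted remainder
    return q, s >> k                       # s(k) = 2^k * remainder

print(restoring_divide(26, 5, 4))  # (5, 1)
```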
Slide33
Non-Restoring Unsigned Integer Division
s(1) = 2 z - 2^k d
for j = 2 to k
    if s(j-1) >= 0
        q_(k-(j-1)) = 1
        s(j) = 2 s(j-1) - 2^k d
    else
        q_(k-(j-1)) = 0
        s(j) = 2 s(j-1) + 2^k d
end for
if s(k) >= 0
    q_0 = 1
else
    q_0 = 0
    Correction step
If in the last step, remainder - divisor > 0, perform subtraction
If in the last step, remainder - divisor < 0, perform addition
why?
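The non-restoring recurrence can be transcribed the same way. In this sketch z is the dividend, d the divisor, k the number of quotient bits, and z < 2^k d is assumed; the correction step is taken to be adding 2^k d back when s(k) is negative. `nonrestoring_divide` is a hypothetical name.

```python
def nonrestoring_divide(z, d, k):
    """Non-restoring division following the recurrence on s(j)."""
    s = 2 * z - (d << k)                   # s(1) = 2 z - 2^k d
    q = 0
    for j in range(2, k + 1):
        if s >= 0:
            q |= 1 << (k - (j - 1))        # q_(k-(j-1)) = 1
            s = 2 * s - (d << k)
        else:                              # q_(k-(j-1)) = 0
            s = 2 * s + (d << k)
    if s >= 0:
        q |= 1                             # q_0 = 1
    else:
        s += d << k                        # correction step (assumed: add 2^k d back)
    return q, s >> k                       # s(k) = 2^k * remainder

print(nonrestoring_divide(26, 5, 4))  # (5, 1)
```

A negative partial remainder is never restored; the following step adds the divisor instead of subtracting it, which yields the same value the restoring algorithm would reach.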
Slide34
Restoring Unsigned Integer Division
s(0) = z
for j = 1 to k
    if 2 s(j-1) - 2^k d >= 0
        q_(k-j) = 1
        s(j) = 2 s(j-1) - 2^k d
    else
        q_(k-j) = 0
        s(j) = 2 s(j-1)
Non-Restoring Unsigned Integer Division
s(1) = 2 z - 2^k d
for j = 2 to k
    if s(j-1) >= 0
        q_(k-(j-1)) = 1
        s(j) = 2 s(j-1) - 2^k d
    else
        q_(k-(j-1)) = 0
        s(j) = 2 s(j-1) + 2^k d
end for
if s(k) >= 0
    q_0 = 1
else
    q_0 = 0
    Correction step
The two are equal. Why?
Consider two consecutive steps j-1 and j, in particular the case 2 s(j-2) - 2^k d < 0
In step j-1, the restoring algorithm computes
    q_(k-(j-1)) = 0
    s(j-1) = 2 s(j-2)
while the non-restoring algorithm computes
    s(j-1) = 2 s(j-2) - 2^k d
In the subsequent step j, the restoring algorithm computes
    2 s(j-1) - 2^k d = 2*2 s(j-2) - 2^k d
In the subsequent step j, the non-restoring algorithm computes
    2 s(j-1) + 2^k d = 2*2 s(j-2) - 2*2^k d + 2^k d = 2*2 s(j-2) - 2^k d
using 2x - y = 2(x-y) + y
Slide35
Non-restoring algorithm
set subtract_bit true
1: If subtract_bit true: Subtract the Divisor register from the Remainder and place the result in the Remainder register
   else: Add the Divisor register to the Remainder and place the result in the Remainder register
2: If Remainder >= 0: Shift the Quotient register to the left, setting rightmost bit to 1, and set subtract_bit to true
   else: Shift the Quotient register to the left, setting rightmost bit to 0, and set subtract_bit to false
3: Shift the Divisor register right 1 bit
   if < 33rd rep: goto 1
   else: if Remainder < 0, add the divisor (as used in the last repetition) to the Remainder and place the result in the Remainder register; exit
Slide36
Example: Perform n + 1 iterations for n bits
Remainder 0000 1011
Divisor   0011 0000
-----------------------------------
Iteration 1: (subtract)
Rem 1101 1011
Quotient 0
Divisor 0001 1000
-----------------------------------
Iteration 2: (add)
Rem 1111 0011
Q 00
Divisor 0000 1100
-----------------------------------
Iteration 3: (add)
Rem 1111 1111
Q 000
Divisor 0000 0110
-----------------------------------
Iteration 4: (add)
Rem 0000 0101
Q 0001
Divisor 0000 0011
-----------------------------------
Iteration 5: (subtract)
Rem 0000 0010
Q 00011
Divisor 0000 0001
Since remainder is positive, done.
Q = 0011 and Rem = 0010
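The register-level trace above can be reproduced with a short simulation. This is a sketch, not the original hardware description: `nonrestoring_registers` is a hypothetical name, and the final correction (adding the divisor back when the last remainder is negative) is an assumption consistent with the example's closing remark.

```python
def nonrestoring_registers(dividend, divisor, n=4):
    """Non-restoring division with a Divisor register that shifts right."""
    rem, div, quot = dividend, divisor << n, 0
    subtract = True
    mask = (1 << (2 * n)) - 1              # show Rem as a 2n-bit two's complement pattern
    for i in range(1, n + 2):              # n + 1 iterations for n bits
        rem = rem - div if subtract else rem + div
        if rem >= 0:
            quot = (quot << 1) | 1         # shift a 1 into the quotient
            subtract = True
        else:
            quot = quot << 1               # shift a 0 into the quotient
            subtract = False
        div >>= 1
        print(i, "Rem", format(rem & mask, f"0{2 * n}b"),
              "Q", format(quot, f"0{i}b"), "Div", format(div, f"0{2 * n}b"))
    if rem < 0:
        rem += divisor                     # correction: undo the last subtraction
    return quot, rem

print(nonrestoring_registers(11, 3))  # (3, 2)
```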
Slide37
Exercise
Calculate A divided by B using restoring and non-restoring division. A=26, B=5
Slide38
MIPS Divide Instruction
Divide (div and divu) generates the remainder in hi and the quotient in lo
    div $s0, $s1    # lo = $s0 / $s1
                    # hi = $s0 mod $s1
    0 | 16 | 17 | 0 | 0 | 0x1A
Instructions mfhi rd and mflo rd are provided to move the remainder and quotient to (user accessible) registers in the register file
As with multiply, divide ignores overflow so software must determine if the quotient is too large. Software must also check the divisor to avoid division by 0.
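The six fields shown for div $s0, $s1 pack into one 32-bit R-format word. The sketch below is illustrative only: `encode_r_format` and `div_semantics` are hypothetical names, and the semantics helper is restricted to non-negative operands because Python's `//` and `%` round differently from MIPS for negative values.

```python
def encode_r_format(op, rs, rt, rd, shamt, funct):
    """Pack the six R-format fields (6, 5, 5, 5, 5, 6 bits) into one 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def div_semantics(s0, s1):
    """Return (lo, hi) for non-negative operands: quotient and remainder."""
    if s1 == 0:
        raise ZeroDivisionError("software must check the divisor before div")
    return s0 // s1, s0 % s1

# div $s0, $s1 with $s0 = register 16 and $s1 = register 17
print(hex(encode_r_format(0, 16, 17, 0, 0, 0x1A)))  # 0x211001a
print(div_semantics(7, 2))                          # (3, 1)
```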
Slide39Lecture 1