Slide 1: The Processor
Lecture 3.4: Pipelining Datapath and Control
Slide 2: Learning Objectives
- Name the five stages of the pipelined processor
- Explain what each stage does
- Calculate the total CPU time for the single-cycle implementation and the pipelined implementation
- Specify how the datapath components and control signals are distributed among the 5 pipeline stages
- Understand that the instruction a pipeline stage works on is decided by the content of the pipeline register in front of the stage
- Calculate the total length (i.e., the number of bits) of each pipeline register
- Determine the content of a pipeline register
Slide 3: Coverage
Chapters 4.5 & 4.6
Slide 4: Introduction to Pipelining Design
Chapter 4.5
Slide 5: Instruction Critical Paths
What is the clock cycle time, assuming negligible delays for muxes, the control unit, sign extension, PC access, shift left 2, wires, and setup and hold times, except for:
- Instruction Memory and Data Memory (200 ps)
- ALU and adders (200 ps)
- Register File access, reads or writes (100 ps)

All delays in ps ("-" = component not on the instruction's path):

Instr.   I Mem   Reg Rd   ALU Op   D Mem   Reg Wr   Total
R-type   200     100      200      -       100      600
load     200     100      200      200     100      800
store    200     100      200      200     -        700
beq      200     100      200      -       -        500
jump     200     -        -        -       -        200
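As a quick check of the table, the sketch below recomputes each instruction's total from the component delays listed above and takes the maximum, which is the clock cycle a single-cycle design must use. The grouping of components per instruction class mirrors the table; the dictionary and function names are illustrative only.

```python
# Component delays in picoseconds from the slide; muxes, control, sign
# extension, PC access, shift left 2, and wires are treated as 0 ps.
IMEM, REG, ALU, DMEM = 200, 100, 200, 200

# Components on each instruction class's critical path (per the table).
PATHS = {
    "R-type": [IMEM, REG, ALU, REG],        # fetch, read regs, ALU, write reg
    "load":   [IMEM, REG, ALU, DMEM, REG],  # also reads data memory
    "store":  [IMEM, REG, ALU, DMEM],       # no register write-back
    "beq":    [IMEM, REG, ALU],             # compare in the ALU
    "jump":   [IMEM],                       # only the fetch is on the path
}

def critical_path_totals(paths):
    """Return the total delay (ps) of each instruction class."""
    return {name: sum(delays) for name, delays in paths.items()}

def single_cycle_clock(paths):
    """A single-cycle clock must accommodate the slowest instruction."""
    return max(critical_path_totals(paths).values())

print(critical_path_totals(PATHS))
# {'R-type': 600, 'load': 800, 'store': 700, 'beq': 500, 'jump': 200}
print(single_cycle_clock(PATHS))  # 800
```

The load path (800 ps) is the slowest, so under these assumptions the single-cycle clock cycle time is 800 ps.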
Slide 6: Single Cycle Disadvantages & Advantages
- Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction, which is especially problematic for more complex instructions such as floating-point multiplication
- Is slow, but is simple and easy to understand
[Figure: single-cycle timing on Clk, with lw in Cycle 1 and sw in Cycle 2; the unused part of the cycle is marked "Waste"]
Slide 7: How Can We Make It Faster?
- Start fetching and executing the next instruction before the current one has completed
- Pipelining: modern processors are pipelined for performance
- Remember the performance equation, where IC is the instruction count, CPI the clock cycles per instruction, and CC the clock cycle time:
  $\text{CPU time} = \text{IC} \times \text{CPI} \times \text{CC}$
- Under ideal conditions and with a large number of instructions, the speedup from pipelining is approximately equal to the number of pipe stages
- A five-stage pipeline is nearly five times faster because its clock cycle is nearly five times shorter, while CPI stays about the same:
  - CPI = 1 for the single-cycle implementation
  - CPI ≈ 1 for the pipelined implementation
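A small numeric sketch of the performance equation. It assumes the same instruction count for both designs, treats CPI as exactly 1 in both cases (for the pipeline it is only approximately 1), and uses the clock cycles that appear later in this lecture (800 ps single-cycle, 200 ps pipelined). The function name is hypothetical.

```python
def cpu_time_ps(ic, cpi, cc_ps):
    """CPU time = IC x CPI x CC, returned in picoseconds."""
    return ic * cpi * cc_ps

IC = 1_000_000
single = cpu_time_ps(IC, cpi=1, cc_ps=800)        # single-cycle: CPI = 1
pipelined = cpu_time_ps(IC, cpi=1, cc_ps=200)     # pipelined: CPI assumed to be 1
balanced = cpu_time_ps(IC, cpi=1, cc_ps=800 / 5)  # 800 ps split evenly into 5 stages

print(single / pipelined)  # 4.0
print(single / balanced)   # 5.0
```

With this datapath's delays the speedup is 4 rather than 5: the 200 ps clock is set by the slowest stages, whereas perfectly balanced stages would each take 800 / 5 = 160 ps.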
Slide 8: Analogy: Assembly Line vs. Mechanic Shop
Mechanic shop
- The mechanic needs to do everything
- It takes hours to fix just one car
Car assembly line
- Many workers work together
- Each worker puts just one or a few components into the car
- One assembly line can produce hundreds or thousands of cars per day
Slide 9: The Five Stages of Executing an Instruction
- IFetch: Instruction Fetch and Update PC
- Dec: Register Fetch and Instruction Decode
- Exec: Execute R-type; calculate memory address; etc.
- Mem: Read/write the data from/to the Data Memory
- WB: Write the result data into the register file
[Figure: lw passing through IFetch, Dec, Exec, Mem, and WB in Cycles 1 to 5]
Slide 10: Why Pipeline? For Performance!
[Figure: pipeline diagram with instruction order (Inst 1 to Inst 5) on the vertical axis and time in clock cycles on the horizontal axis; each instruction uses IM, Reg, ALU, DM, Reg, starting one cycle after the previous one. The first cycles are labeled "Time to fill the pipeline".]
Once the pipeline is full, one instruction is completed every cycle, so CPI = 1.
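The claim that CPI approaches 1 once the pipeline is full can be checked with the ideal cycle count N + K - 1 for N instructions on a K-stage pipeline, where the K - 1 extra cycles are the fill time. This sketch assumes no stalls; the function names are illustrative.

```python
def pipeline_cycles(n, k=5):
    """Ideal cycle count: k - 1 cycles to fill, then one completion per cycle."""
    return n + k - 1

def effective_cpi(n, k=5):
    """Average clock cycles per instruction over the whole run."""
    return pipeline_cycles(n, k) / n

for n in (1, 5, 100, 1_000_000):
    print(n, pipeline_cycles(n), effective_cpi(n))
```

For the 5 instructions in the figure the run takes 9 cycles (CPI 1.8); for 1,000,000 instructions the CPI is 1.000004, effectively 1.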
Slide 11: A Pipelined MIPS Processor
- Start the next instruction before the current one has completed
  - improves throughput: the total amount of work done in a given time
  - instruction latency (execution time, delay time, response time: the time from the start of an instruction to its completion) is not reduced
[Figure: lw, sw, and R-type each pass through IFetch, Dec, Exec, Mem, WB, starting one cycle apart, across Cycles 1 to 8]
- The clock cycle (pipeline stage time) is limited by the slowest stage
- Some stages don't need the whole clock cycle (e.g., WB)
Slide 12: Single Cycle versus Pipeline
[Figure: single-cycle implementation (CC = 800 ps): lw in Cycle 1 and sw in Cycle 2 on Clk, with unused time marked "Waste". Pipelined implementation (CC = 200 ps): lw, sw, and R-type each pass through IFetch, Dec, Exec, Mem, WB, starting one cycle apart. A 400 ps label also appears.]
To complete an entire instruction in the pipelined case takes 1000 ps (compared with 800 ps for the single-cycle case). Why?
How long does each implementation take to complete 1,000,000 instructions?
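One way to answer the "Why?": in a pipeline the clock must fit the slowest stage, and every instruction spends a full clock cycle in each of the five stages, even in stages such as ID and WB that need only 100 ps. The sketch assumes lw's stage delays from the critical-path table (register read in ID, register write in WB); the names are illustrative.

```python
# Stage delays (ps) for lw, from the component delays in the critical-path table.
STAGE_DELAYS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

def pipelined_clock(stage_delays):
    """The clock cycle must fit the slowest stage."""
    return max(stage_delays.values())

def pipelined_latency(stage_delays):
    """Every instruction spends one full clock cycle in every stage."""
    return len(stage_delays) * pipelined_clock(stage_delays)

print(pipelined_clock(STAGE_DELAYS))    # 200
print(pipelined_latency(STAGE_DELAYS))  # 1000
print(sum(STAGE_DELAYS.values()))       # 800: lw's latency in the single-cycle design
```

So lw's latency grows from 800 ps to 5 × 200 = 1000 ps, even though a new instruction can finish every 200 ps.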
Slide 13: Ideal CPU Time of Pipelined Execution
N: total number of instructions
K: number of pipeline stages
$\text{CPU time} = \text{Number of clock cycles} \times \text{Clock period}$
$\text{Number of clock cycles} = N + K - 1$
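Applying this formula to the 1,000,000-instruction question from Slide 12, assuming K = 5, the 200 ps pipelined clock, the 800 ps single-cycle clock with CPI = 1, and no stalls. The function names are hypothetical.

```python
def single_cycle_time_ps(n, cc_ps=800):
    """Single-cycle: one 800 ps cycle per instruction (CPI = 1)."""
    return n * cc_ps

def pipelined_time_ps(n, k=5, cc_ps=200):
    """Ideal pipeline: (N + K - 1) cycles of the 200 ps clock."""
    return (n + k - 1) * cc_ps

N = 1_000_000
t_single = single_cycle_time_ps(N)  # 800,000,000 ps = 800 us
t_pipe = pipelined_time_ps(N)       # 200,000,800 ps, about 200 us
print(t_single, t_pipe, t_single / t_pipe)
```

The speedup is just under 4 (about 3.99998), because the 4 fill cycles are negligible next to 1,000,000 instructions.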
Slide 14: Pipelining the MIPS ISA
What makes it easy:
- All instructions are the same length (32 bits), so we can fetch in the 1st stage and decode in the 2nd stage
- There are few instruction formats (three)
- Memory operations occur only in loads and stores, so we can use the execute stage to calculate memory addresses
- Each instruction writes at most one result (i.e., changes the machine state) and does so in the last few pipeline stages (MEM or WB)
- Operands must be aligned in memory, so a single data transfer takes only one data memory access
We cover only the following 8 instructions as examples: lw, sw, add, sub, and, or, slt, beq
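To illustrate why fixed-length instructions make decoding easy: every MIPS instruction is one 32-bit word, and the opcode and the rs and rt register specifiers sit in the same bit positions in the formats that use them, so the ID stage can start reading registers while it decodes the opcode. The sketch extracts those fields with shifts and masks; the two example words encode a load and an add (the assembly and register numbers are given in the code comments), and the function name is hypothetical.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its fixed-position fields."""
    fields = {
        "opcode": (word >> 26) & 0x3F,  # bits 31-26, in every format
        "rs":     (word >> 21) & 0x1F,  # bits 25-21
        "rt":     (word >> 16) & 0x1F,  # bits 20-16
        "rd":     (word >> 11) & 0x1F,  # bits 15-11 (R-type)
        "shamt":  (word >> 6) & 0x1F,   # bits 10-6  (R-type)
        "funct":  word & 0x3F,          # bits 5-0   (R-type)
    }
    imm = word & 0xFFFF                 # bits 15-0  (I-type)
    fields["imm"] = imm - 0x10000 if imm & 0x8000 else imm  # sign-extend to int
    return fields

lw_fields = decode(0x8E280004)   # lw  $t0, 4($s1):  opcode 35, rs 17, rt 8, imm 4
add_fields = decode(0x02324020)  # add $t0, $s1, $s2: opcode 0, rs 17, rt 18, rd 8, funct 32
print(lw_fields)
print(add_fields)
```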
Slide 15: Pipelined Datapath
Chapter 4.6, page 286
Slide 16: MIPS Pipelined Datapath
Slide 17: Pipeline Registers
- Registers are needed between stages
- They hold the information produced in the previous cycle
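One learning objective is to calculate the width of each pipeline register. A sketch of that bookkeeping follows; the field lists are an assumption based on the basic textbook pipelined datapath (before the write-register fix and before control signals are added), so check them against the figure you are working from.

```python
# Assumed fields (name -> bits) of each pipeline register in the basic datapath.
PIPELINE_REGISTERS = {
    "IF/ID":  {"PC+4": 32, "instruction": 32},
    "ID/EX":  {"PC+4": 32, "read data 1": 32, "read data 2": 32,
               "sign-extended immediate": 32},
    "EX/MEM": {"branch target": 32, "ALU zero": 1, "ALU result": 32,
               "read data 2": 32},
    "MEM/WB": {"memory read data": 32, "ALU result": 32},
}

def register_width(fields):
    """Total number of bits stored in one pipeline register."""
    return sum(fields.values())

for name, fields in PIPELINE_REGISTERS.items():
    print(name, register_width(fields))
# IF/ID 64, ID/EX 128, EX/MEM 97, MEM/WB 64
```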
Slide 18: Single-Clock-Cycle Diagram
- Cycle-by-cycle flow of instructions through the pipelined datapath
- A "single-clock-cycle" pipeline diagram:
  - shows pipeline usage in a single cycle
  - highlights the resources used in each cycle
- We will look at "single-clock-cycle" diagrams for load and store instructions
Slide 19: IF for Load & Store
Slide 20: ID for Load & Store
Slide 21: EX for Load & Store
Slide 22: MEM for Load
Slide 23: MEM for Store
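The IF, ID, EX, and MEM slides for load and store can be summarized as a small functional model in which each stage reads only the pipeline register in front of it and writes the one behind it. This is a hypothetical sketch, not the lecture's datapath: instructions are tuples (op, rt, rs, imm) instead of encoded words, registers and memory are a Python list and dict, and one instruction is pushed through all five stages at a time (no overlap). It carries rt all the way into MEM/WB, which is the fix that Slides 24 and 25 motivate for the load's write-back.

```python
def execute_load_store(instr_mem, regs, data_mem, pc):
    """Push one lw or sw through the five stages and return the next PC."""
    # IF: fetch the instruction and compute PC + 4 -> IF/ID
    if_id = {"instr": instr_mem[pc], "pc4": pc + 4}

    # ID: read registers rs and rt, pass the immediate along -> ID/EX
    op, rt, rs, imm = if_id["instr"]
    id_ex = {"op": op, "rt": rt, "imm": imm, "rd1": regs[rs], "rd2": regs[rt]}

    # EX: effective address = base register + offset -> EX/MEM
    ex_mem = {"op": id_ex["op"], "rt": id_ex["rt"],
              "addr": id_ex["rd1"] + id_ex["imm"], "rd2": id_ex["rd2"]}

    # MEM: lw reads data memory, sw writes it -> MEM/WB
    mem_wb = {"op": ex_mem["op"], "rt": ex_mem["rt"]}
    if ex_mem["op"] == "lw":
        mem_wb["data"] = data_mem[ex_mem["addr"]]
    else:  # "sw"
        data_mem[ex_mem["addr"]] = ex_mem["rd2"]

    # WB: only lw writes the register file, using the rt carried in MEM/WB
    if mem_wb["op"] == "lw":
        regs[mem_wb["rt"]] = mem_wb["data"]
    return if_id["pc4"]

regs = [0] * 32
regs[17] = 100                                      # base address register
data_mem = {104: 42}
instr_mem = {0: ("lw", 8, 17, 4), 4: ("sw", 8, 17, 8)}
pc = execute_load_store(instr_mem, regs, data_mem, 0)   # loads 42 into register 8
pc = execute_load_store(instr_mem, regs, data_mem, pc)  # stores it at address 108
print(regs[8], data_mem[108], pc)  # 42 42 8
```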
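Slide 24: WB for Load
[Figure annotation: "Wrong register number". In this datapath the write register number comes from the IF/ID register, which by the time the load reaches WB holds a later instruction.]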
Slide 25: Corrected Pipelined Datapath
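The correction passes the load's write register number (5 bits, since there are 32 registers) from the ID stage through ID/EX, EX/MEM, and MEM/WB, so the last three pipeline registers each grow by 5 bits. The sketch starts from basic widths of 64, 128, 97, and 64 bits, which are an assumption taken from the standard textbook datapath without control signals; the names are illustrative.

```python
# Assumed widths (bits) of the four pipeline registers before the fix.
BASIC_WIDTHS = {"IF/ID": 64, "ID/EX": 128, "EX/MEM": 97, "MEM/WB": 64}
WRITE_REG_BITS = 5  # a register number selects 1 of 32 registers

def corrected_widths(basic):
    """Carry the write register number from ID through MEM/WB:
    every register after IF/ID grows by 5 bits."""
    return {name: width + (WRITE_REG_BITS if name != "IF/ID" else 0)
            for name, width in basic.items()}

print(corrected_widths(BASIC_WIDTHS))
# {'IF/ID': 64, 'ID/EX': 133, 'EX/MEM': 102, 'MEM/WB': 69}
```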
Slide 26: MIPS Pipeline Datapath
State registers sit between the pipeline stages to isolate them:
- IF: IFetch
- ID: Dec
- EX: Execute
- MEM: MemAccess
- WB: WriteBack
[Figure: pipelined datapath with the IF/ID, ID/EX, EX/MEM, and MEM/WB registers, all clocked by the System Clock. Components: PC, Instruction Memory, PC + 4 adder, Register File (Read Addr 1 and 2, Write Addr, Write Data, Read Data 1 and 2), Sign Extend (16 to 32 bits), Shift left 2, branch Add, ALU, Data Memory (Address, Write Data, Read Data)]
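A sketch of how the state registers isolate the stages: on each System Clock edge all four pipeline registers latch at once, so during a cycle every stage works on whatever the register in front of it holds. Instructions are represented only by name, the function name is hypothetical, and the model assumes no stalls or hazards.

```python
def clock_tick(pipe_regs, fetched):
    """Advance the pipeline by one clock edge.

    pipe_regs = (IF/ID, ID/EX, EX/MEM, MEM/WB) contents during the cycle.
    All registers latch simultaneously: IF/ID takes the newly fetched
    instruction and each later register takes the old content of the one
    before it. Returns the new contents and the instruction that was in WB
    during the cycle that just ended.
    """
    if_id, id_ex, ex_mem, mem_wb = pipe_regs
    return (fetched, if_id, id_ex, ex_mem), mem_wb

program = ["lw", "sw", "add"]
regs = (None, None, None, None)
completed = []
for cycle in range(1, len(program) + 5):  # N + K - 1 = 7 cycles
    fetched = program[cycle - 1] if cycle <= len(program) else None
    regs, in_wb = clock_tick(regs, fetched)
    if in_wb is not None:
        completed.append((cycle, in_wb))

print(completed)  # [(5, 'lw'), (6, 'sw'), (7, 'add')]
```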
Slide 27: Graphically Representing MIPS Pipeline
Can help with answering questions like:
- How many cycles does it take to execute this code?
- What is the ALU doing during cycle 4?
- Is there a hazard, why does it occur, and how can it be fixed?
[Figure: one instruction drawn as IM, Reg, ALU, DM, Reg]
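The first two questions can be answered mechanically for an ideal pipeline with no stalls: N instructions need N + K - 1 cycles, and in cycle c the resource at stage index s (0 = IM through 4 = the second Reg, i.e., WB) is used by instruction c - s. A minimal sketch with hypothetical function names:

```python
STAGES = ["IM", "Reg", "ALU", "DM", "Reg"]  # resources used in IF, ID, EX, MEM, WB

def total_cycles(n, k=len(STAGES)):
    """Cycles for n instructions on an ideal k-stage pipeline."""
    return n + k - 1

def instruction_in_stage(stage_index, cycle, n):
    """1-based instruction number using STAGES[stage_index] in 1-based
    `cycle`, or None while the pipeline is filling or draining."""
    instr = cycle - stage_index
    return instr if 1 <= instr <= n else None

alu = STAGES.index("ALU")
print(total_cycles(5))                    # 9
print(instruction_in_stage(alu, 4, n=5))  # 2: the ALU executes instruction 2 in cycle 4
```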
Slide 28: Multi-Cycle Pipeline Diagram
Showing the resource usage
Slide 29: Multi-Cycle Pipeline Diagram
Traditional form
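The traditional form lists instructions top to bottom and clock cycles left to right, with the stage name in each cell. A sketch that builds such a diagram for an ideal pipeline (no stalls); the instruction strings and function names are hypothetical.

```python
STAGE_NAMES = ["IF", "ID", "EX", "MEM", "WB"]

def traditional_diagram(program):
    """Return (instruction, cells) rows; cells[c] is the stage the
    instruction occupies in clock cycle c + 1, or "" if none."""
    n_cycles = len(program) + len(STAGE_NAMES) - 1
    rows = []
    for i, instr in enumerate(program):
        cells = [""] * n_cycles
        for s, stage in enumerate(STAGE_NAMES):
            cells[i + s] = stage  # instruction i enters IF in cycle i + 1
        rows.append((instr, cells))
    return rows

def render(rows):
    """Format the rows as fixed-width text, one instruction per line."""
    width = max(len(instr) for instr, _ in rows)
    return "\n".join(
        instr.ljust(width) + " | " + " ".join(cell.ljust(3) for cell in cells)
        for instr, cells in rows
    )

print(render(traditional_diagram(["lw", "sw", "add"])))
```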
Slide 30: Pipeline Control
Chapter 4.6, page 300
Slide 31: Pipelined Control
Slide 32: Pipelined Control Signals
Control signals are derived from the instruction, as in the single-cycle implementation.
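Slide 33: Pipeline Control
- IF stage: read Instruction Memory (always asserted) and write PC (on System Clock)
- ID stage: no control signals to set
- EX, MEM, and WB stages: set as in the table below (X = don't care)

         EX stage                          MEM stage                   WB stage
Instr.   RegDst  ALUOp1  ALUOp0  ALUSrc    Branch  MemRead  MemWrite   RegWrite  MemtoReg
R-type   1       1       0       0         0       0        0          1         0
lw       0       0       0       1         0       1        0          1         1
sw       X       0       0       1         0       0        1          0         X
beq      X       0       1       0         1       0        0          0         X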
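The table can be written down as data grouped by the stage that consumes each signal. In the textbook design these groups are generated in ID and carried forward, so ID/EX holds the EX, MEM, and WB groups, EX/MEM holds MEM and WB, and MEM/WB holds only WB. Treat that carrying scheme and the names below as assumptions of this sketch; "X" (don't care) is stored as None.

```python
# Control signals from the table, grouped by the stage that uses them.
CONTROL = {
    "R-type": {"EX":  {"RegDst": 1, "ALUOp1": 1, "ALUOp0": 0, "ALUSrc": 0},
               "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
               "WB":  {"RegWrite": 1, "MemtoReg": 0}},
    "lw":     {"EX":  {"RegDst": 0, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
               "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
               "WB":  {"RegWrite": 1, "MemtoReg": 1}},
    "sw":     {"EX":  {"RegDst": None, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
               "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
               "WB":  {"RegWrite": 0, "MemtoReg": None}},
    "beq":    {"EX":  {"RegDst": None, "ALUOp1": 0, "ALUOp0": 1, "ALUSrc": 0},
               "MEM": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
               "WB":  {"RegWrite": 0, "MemtoReg": None}},
}

# Which control groups each pipeline register still carries (assumed scheme).
CARRIED = {"ID/EX": ("EX", "MEM", "WB"), "EX/MEM": ("MEM", "WB"), "MEM/WB": ("WB",)}

def control_bits_in(pipeline_register, op):
    """Control fields for instruction `op` held in the given pipeline register."""
    return {group: CONTROL[op][group] for group in CARRIED[pipeline_register]}

print(control_bits_in("MEM/WB", "lw"))  # {'WB': {'RegWrite': 1, 'MemtoReg': 1}}
print(sum(len(signals) for signals in CONTROL["lw"].values()))  # 9 control bits
```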
Slide 34: MIPS Pipeline Control Path Modifications