The Processor Lecture 3.4

majerepr
Uploaded On 2020-06-23


Presentation Transcript

Slide1

The Processor

Lecture 3.4: Pipelining Datapath and Control

Slide2

Learning Objectives

- Name the five stages of the pipelined processor
- Explain what each stage does
- Calculate the total CPU times for single-cycle implementation and pipelined implementation
- Specify how the datapath components and control signals are distributed among the 5 pipeline stages
- Understand that the instruction a pipeline stage works on is decided by the content of the pipeline register in front of the stage
- Calculate the total length (i.e., the number of bits) of each pipeline register
- Determine the content of a pipeline register

Slide3

Coverage

Chapters 4.5 & 4.6

Slide4

Introduction to Pipelining Design

Chapter 4.5

Slide5

Instruction Critical Paths

What is the clock cycle time, assuming negligible delays for muxes, control unit, sign extension, PC access, shift left 2, wires, setup and hold times, except for:

- Instruction Memory and Data Memory (200 ps)
- ALU and adders (200 ps)
- Register File access (reads or writes) (100 ps)

Instr.   I Mem   Reg Rd   ALU Op   D Mem   Reg Wr   Total
R-type   200     100      200              100      600
load     200     100      200      200     100      800
store    200     100      200      200              700
beq      200     100      200                       500
jump     200                                        200

(All delays in ps.)
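The totals in the critical-path table can be reproduced with a short calculation. The sketch below is illustrative only: the dictionary and function names are made up for this example, and the component delays are the ones stated on the slide. It sums the delays along each instruction's path and takes the maximum as the single-cycle clock period.

```python
# Component delays in picoseconds, as stated on the slide.
DELAY_PS = {"I Mem": 200, "Reg Rd": 100, "ALU Op": 200, "D Mem": 200, "Reg Wr": 100}

# Components each instruction class uses, per the slide's table.
# (Names are illustrative, not taken from any tool or library.)
CRITICAL_PATH = {
    "R-type": ["I Mem", "Reg Rd", "ALU Op", "Reg Wr"],
    "load": ["I Mem", "Reg Rd", "ALU Op", "D Mem", "Reg Wr"],
    "store": ["I Mem", "Reg Rd", "ALU Op", "D Mem"],
    "beq": ["I Mem", "Reg Rd", "ALU Op"],
    "jump": ["I Mem"],
}


def path_delay_ps(instr: str) -> int:
    """Sum the component delays on one instruction's critical path."""
    return sum(DELAY_PS[component] for component in CRITICAL_PATH[instr])


def single_cycle_clock_ps() -> int:
    """A single-cycle clock must be long enough for the slowest instruction."""
    return max(path_delay_ps(instr) for instr in CRITICAL_PATH)


if __name__ == "__main__":
    for instr in CRITICAL_PATH:
        print(instr, path_delay_ps(instr), "ps")
    print("single-cycle clock:", single_cycle_clock_ps(), "ps")
```

Under these delays the load instruction is the slowest path, so it sets the single-cycle clock period.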

Slide6

Single Cycle Disadvantages & Advantages

- Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction
  - especially problematic for more complex instructions like floating-point multiplication
- Is slow

but

- Is simple and easy to understand

[Figure: clock (Clk) timing diagram with lw in Cycle 1 and sw in Cycle 2; the unused part of a cycle is labeled "Waste"]

Slide7

How Can We Make It Faster?

- Start fetching and executing the next instruction before the current one has completed
  - Pipelining: modern processors are pipelined for performance
- Remember the performance equation:

  CPU time = IC × CPI × CC

  where IC is the instruction count, CPI is the number of clock cycles per instruction, and CC is the clock cycle time
- Under ideal conditions and with a large number of instructions, the speedup from pipelining is approximately equal to the number of pipe stages
  - A five-stage pipeline is nearly five times faster because the CC is nearly five times shorter
  - CPI = 1 for single-cycle implementation
  - CPI ≈ 1 for pipelined implementation
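To see why the speedup only approaches the number of stages, here is a minimal sketch under two assumptions that are not on the slide: the stages are perfectly balanced (so the single-cycle clock is exactly K times the pipelined clock), and the pipeline needs K - 1 extra cycles to fill with no stalls. The function name is hypothetical.

```python
def ideal_speedup(n_instructions: int, n_stages: int) -> float:
    """Speedup of a balanced, hazard-free pipeline over a single-cycle design.

    Time is measured in units of one pipeline stage time:
    - single-cycle: every instruction takes n_stages units
    - pipelined: n_stages - 1 units to fill, then one instruction per unit
    """
    single_cycle_time = n_instructions * n_stages
    pipelined_time = n_instructions + n_stages - 1
    return single_cycle_time / pipelined_time


if __name__ == "__main__":
    for n in (1, 10, 1_000, 1_000_000):
        print(n, round(ideal_speedup(n, 5), 4))
```

For a single instruction there is no gain at all; the ratio climbs toward 5 as the instruction count grows, which is the "large number of instructions" condition in the bullet.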

Slide8

Analogy: Assembly Line vs. Mechanic Shop

- Mechanic shop
  - The mechanic needs to do everything
  - It takes hours to fix just one car
- Car assembly line
  - Many workers work together
  - Each worker just puts one or a few components into the car
  - One assembly line can produce hundreds or thousands of cars per day

Slide9

The Five Stages of Executing an Instruction

- IFetch: Instruction Fetch and Update PC
- Dec: Registers Fetch and Instruction Decode
- Exec: Execute R-type; calculate memory address; etc.
- Mem: Read/write the data from/to the Data Memory
- WB: Write the result data into the register file

[Figure: lw passing through IFetch, Dec, Exec, Mem, WB in Cycles 1 to 5]

Slide10

Why Pipeline? For Performance!

[Figure: pipeline diagram with instruction order (Inst 1 to Inst 5) on the vertical axis and time (clock cycles) on the horizontal axis; each instruction uses IM, Reg, ALU, DM, Reg in successive cycles, and the initial cycles are labeled "Time to fill the pipeline"]

Once the pipeline is full, one instruction is completed every cycle, so CPI = 1

Slide11

A Pipelined MIPS Processor

- Start the next instruction before the current one has completed
  - improves throughput: total amount of work done in a given time
  - instruction latency (execution time, delay time, response time: time from the start of an instruction to its completion) is not reduced
- clock cycle (pipeline stage time) is limited by the slowest stage
- some stages don't need the whole clock cycle (e.g., WB)

[Figure: lw, sw, and an R-type instruction, each passing through IFetch, Dec, Exec, Mem, WB and overlapped on a time axis of Cycles 1 to 8]

Slide12

Single Cycle versus Pipeline

Single Cycle Implementation (CC = 800 ps):

[Figure: Clk timing diagram with lw in Cycle 1 and sw in Cycle 2, with unused time labeled "Waste"]

Pipeline Implementation (CC = 200 ps):

[Figure: lw, sw, and an R-type instruction, each passing through IFetch, Dec, Exec, Mem, WB and overlapped in time; the diagram carries a "400 ps" annotation]

- To complete an entire instruction in the pipelined case takes 1000 ps (as compared to 800 ps for the single cycle case). Why?
- How long does each take to complete 1,000,000 instrs?
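One way to approach the "Why?" question is to look at per-stage delays. The sketch below assumes a mapping that the slide does not spell out: each stage is charged the delay of its main component from the critical-path slide (instruction memory for IFetch, register read for Dec, ALU for Exec, data memory for Mem, register write for WB). The variable names are illustrative.

```python
# Assumed per-stage delays in picoseconds (stage-to-component mapping is an assumption).
STAGE_PS = {"IFetch": 200, "Dec": 100, "Exec": 200, "Mem": 200, "WB": 100}

# Single-cycle: a load passes through all five components within one long cycle.
single_cycle_lw_ps = sum(STAGE_PS.values())

# Pipelined: every stage gets the same clock period, set by the slowest stage.
pipelined_clock_ps = max(STAGE_PS.values())

# Latency of one instruction in the pipeline: one full clock period per stage,
# even for the stages (Dec, WB) that need only half of it.
pipelined_latency_ps = len(STAGE_PS) * pipelined_clock_ps

if __name__ == "__main__":
    print("single-cycle lw:", single_cycle_lw_ps, "ps")
    print("pipelined clock:", pipelined_clock_ps, "ps")
    print("pipelined latency:", pipelined_latency_ps, "ps")
```

Under these assumptions the 100 ps stages are padded to 200 ps, which accounts for the extra 200 ps of latency in the pipelined case.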

Slide13

Ideal CPU Time of Pipelined Execution

- N: total number of instructions
- K: pipeline stages

CPU time = Number of clock cycles × Clock period

Number of clock cycles = N + K - 1
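As a worked example of the formula, the sketch below plugs in the lecture's numbers: an 800 ps single-cycle clock, a 200 ps pipelined clock, five stages, and 1,000,000 instructions. The function names are hypothetical, and the calculation assumes an ideal pipeline with no stalls.

```python
def single_cycle_time_ps(n_instructions: int, clock_ps: int = 800) -> int:
    """Single-cycle CPU time: one (long) clock cycle per instruction."""
    return n_instructions * clock_ps


def pipelined_time_ps(n_instructions: int, n_stages: int = 5, clock_ps: int = 200) -> int:
    """Ideal pipelined CPU time: (N + K - 1) clock cycles, no stalls assumed."""
    return (n_instructions + n_stages - 1) * clock_ps


if __name__ == "__main__":
    n = 1_000_000
    t_single = single_cycle_time_ps(n)
    t_pipe = pipelined_time_ps(n)
    print("single-cycle:", t_single, "ps")
    print("pipelined:   ", t_pipe, "ps")
    print("speedup:     ", t_single / t_pipe)
```

With these clock periods the speedup approaches 4 rather than 5, because the stages are not perfectly balanced: the pipelined clock is 800 / 4, not 800 / 5.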

Slide14

Pipelining the MIPS ISA

- What makes it easy
  - all instructions are the same length (32 bits)
    - can fetch in the 1st stage and decode in the 2nd stage
  - few instruction formats (three)
  - memory operations occur only in loads and stores
    - can use the execute stage to calculate memory addresses
  - each instruction writes at most one result (i.e., changes the machine state) and does it in the last few pipeline stages (MEM or WB)
  - operands must be aligned in memory so a single data transfer takes only one data memory access
- Only cover the following 8 instructions as an example
  - lw, sw, add, sub, and, or, slt, beq
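The fixed 32-bit length is what lets the decode stage slice register numbers out of the instruction without first working out its format. The sketch below illustrates this with the standard MIPS32 field positions; the helper names are made up for this example, and the sample instruction `lw $t0, 32($s3)` (opcode 35, rs 19, rt 8) is chosen only for illustration.

```python
def encode_i_type(opcode: int, rs: int, rt: int, imm: int) -> int:
    """Pack an I-format instruction: opcode(6) rs(5) rt(5) immediate(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def decode_fields(word: int) -> dict:
    """Extract every field at its fixed bit position, regardless of format.

    Which fields are meaningful depends on the opcode, but the slicing itself
    never changes, so register reads can start before decoding finishes.
    """
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
        "imm16": word & 0xFFFF,
    }


if __name__ == "__main__":
    word = encode_i_type(opcode=35, rs=19, rt=8, imm=32)  # lw $t0, 32($s3)
    print(hex(word), decode_fields(word))
```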

Slide15

Pipelined Datapath

Chapter 4.6, page 286

Slide16

MIPS Pipelined Datapath

Slide17

Pipeline Registers

- Need registers between stages
- Hold information produced in previous cycle

Slide18

Single-clock-cycle diagram

- Cycle-by-cycle flow of instructions through the pipelined datapath
  - “Single-clock-cycle” pipeline diagram
    - Show pipeline usage in a single cycle
    - Highlight resources used in each cycle
- We will look at “single-clock-cycle” diagrams for load & store instructions

Slide19

IF for Load & Store

Slide20

ID for Load & Store

Slide21

EX for Load & Store

Slide22

MEM for Load

Slide23

MEM for Store

Slide24

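WB for Load

[Figure annotation: wrong register number]

The write register number that reaches the register file during WB comes from the IF/ID pipeline register, which by then holds a later instruction rather than the load.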

Slide25

Corrected Pipelined Datapath


Slide26

MIPS Pipeline Datapath

- State registers between each pipeline stage to isolate them

Stages: IF: IFetch, ID: Dec, EX: Execute, MEM: MemAccess, WB: WriteBack

[Figure: pipelined datapath showing PC, Instruction Memory, adders, Register File, Sign Extend (16 to 32 bits), Shift left 2, ALU, and Data Memory, separated by the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB, which are driven by the System Clock]
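The length of each pipeline register follows from the values that cross it. The sketch below is a rough tally under assumptions the slide does not state: it counts only the data fields of the basic datapath, and leaves out the control bits and the write register number that the corrected datapath carries along. The field names are descriptive labels chosen for this example.

```python
# Data fields crossing each pipeline register, with widths in bits.
# Assumption: basic datapath only, without control signals or the
# write-register number (each of which adds further bits).
PIPELINE_REGISTER_FIELDS = {
    "IF/ID": {"PC + 4": 32, "instruction": 32},
    "ID/EX": {
        "PC + 4": 32,
        "read data 1": 32,
        "read data 2": 32,
        "sign-extended immediate": 32,
    },
    "EX/MEM": {
        "branch target": 32,
        "ALU zero flag": 1,
        "ALU result": 32,
        "read data 2": 32,
    },
    "MEM/WB": {"memory read data": 32, "ALU result": 32},
}


def register_width_bits(name: str) -> int:
    """Total number of bits held by one pipeline register."""
    return sum(PIPELINE_REGISTER_FIELDS[name].values())


if __name__ == "__main__":
    for name in PIPELINE_REGISTER_FIELDS:
        print(name, register_width_bits(name), "bits")
```

Carrying the 5-bit write register number from ID/EX through to MEM/WB, as the corrected datapath does, would widen those registers accordingly.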

Slide27

Graphically Representing MIPS Pipeline

[Figure: one instruction drawn as the sequence IM, Reg, ALU, DM, Reg]

Can help with answering questions like:

- How many cycles does it take to execute this code?
- What is the ALU doing during cycle 4?
- Is there a hazard, why does it occur, and how can it be fixed?

Slide28

Multi-Cycle Pipeline Diagram

- Showing the resource usage

Slide29

Multi-Cycle Pipeline Diagram

- Traditional form
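A traditional multi-cycle diagram can be generated mechanically: instruction i (counting from 0) enters the pipeline in cycle i + 1 and advances one stage per cycle. The sketch below assumes an ideal pipeline with no stalls; the function names and the sample instruction list are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_in_cycle(instr_index: int, cycle: int):
    """Stage occupied by instruction `instr_index` (0-based) in `cycle` (1-based).

    Returns None if the instruction has not started or has already finished.
    """
    position = cycle - 1 - instr_index
    return STAGES[position] if 0 <= position < len(STAGES) else None


def pipeline_diagram(instructions):
    """Return (instruction, cells) rows; cells[c] is the stage in cycle c + 1."""
    n_cycles = len(instructions) + len(STAGES) - 1
    return [
        (instr, [stage_in_cycle(i, c) or "" for c in range(1, n_cycles + 1)])
        for i, instr in enumerate(instructions)
    ]


def format_diagram(instructions) -> str:
    """Render the rows as fixed-width text, one clock cycle (CC) per column."""
    rows = pipeline_diagram(instructions)
    n_cycles = len(rows[0][1]) if rows else 0
    header = "instr".ljust(8) + "".join(f"CC{c}".ljust(5) for c in range(1, n_cycles + 1))
    lines = [header]
    for instr, cells in rows:
        lines.append(instr.ljust(8) + "".join(cell.ljust(5) for cell in cells))
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_diagram(["lw", "sw", "add"]))
    # Which stage is the second instruction (index 1) in during cycle 4?
    print(stage_in_cycle(1, 4))
```

Reading a column of the diagram answers per-cycle questions such as what the ALU is doing in a given cycle: whichever instruction shows EX in that column is using it.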

Slide30

Pipeline Control

Chapter 4.6, page 300

Slide31

Pipelined Control

Slide32

Pipelined Control Signals

- Control signals derived from instructions
  - As in single-cycle implementation

Slide33

Pipeline Control

- IF Stage: read Instr Memory (always asserted) and write PC (on System Clock)
- ID Stage: no control signals to set

Instr | EX Stage                    | MEM Stage             | WB Stage
      | RegDst ALUOp1 ALUOp0 ALUSrc | Brch MemRead MemWrite | RegWrite MemtoReg
R     | 1      1      0      0      | 0    0       0        | 1        0
lw    | 0      0      0      1      | 0    1       0        | 1        1
sw    | X      0      0      1      | 0    0       1        | 0        X
beq   | X      0      1      0      | 1    0       0        | 0        X

(X = don't care)
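The control table can be written down as a lookup that groups each instruction's signals by the stage that consumes them. The sketch below is illustrative rather than a description of any real implementation: the names are made up, the values are copied from the slide's table, and `None` stands for a don't-care (X) entry.

```python
EX_SIGNALS = ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc")
MEM_SIGNALS = ("Brch", "MemRead", "MemWrite")
WB_SIGNALS = ("RegWrite", "MemtoReg")
ALL_SIGNALS = EX_SIGNALS + MEM_SIGNALS + WB_SIGNALS

# Values in table order; None represents a don't-care (X).
CONTROL = {
    "R":   (1, 1, 0, 0, 0, 0, 0, 1, 0),
    "lw":  (0, 0, 0, 1, 0, 1, 0, 1, 1),
    "sw":  (None, 0, 0, 1, 0, 0, 1, 0, None),
    "beq": (None, 0, 1, 0, 1, 0, 0, 0, None),
}


def control_by_stage(instr: str) -> dict:
    """Group an instruction's control signals by the stage that uses them."""
    values = dict(zip(ALL_SIGNALS, CONTROL[instr]))
    return {
        "EX": {name: values[name] for name in EX_SIGNALS},
        "MEM": {name: values[name] for name in MEM_SIGNALS},
        "WB": {name: values[name] for name in WB_SIGNALS},
    }


if __name__ == "__main__":
    for instr in CONTROL:
        print(instr, control_by_stage(instr))
```

Grouping by stage mirrors how the signals are used: the EX group is consumed in EX, while the MEM and WB groups have to be carried forward with the instruction until their stage is reached.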

Slide34

MIPS Pipeline Control Path Modifications
