
Nov. 9, 2004 - PowerPoint Presentation

yoshiko-marsland
Uploaded On 2016-12-16



Presentation Transcript

Slide1

Nov. 9, 2004

Lecture 6: Dynamic Scheduling with Scoreboarding and Tomasulo Algorithm (Section 2.4)

Slide2

Scoreboard Implications

Out-of-order completion => WAR, WAW hazards

Solutions for WAR (CDC 6600): stall the Write to allow Reads to take place; read registers only during the Read Operands stage.

For WAW, must detect the hazard: stall in the Issue stage until the other instruction completes.

Need to have multiple instructions in the execution phase => multiple execution units or pipelined execution units

Scoreboard replaces ID with 2 stages (Issue and RO)

Scoreboard keeps track of dependencies and the state of operations

Monitors every change in the hardware.

Determines when to read operands, when an instruction can execute, and when it can write back (wb).

Hazard detection and resolution is centralized.

Slide3

Four Stages of Scoreboard Control

1. Issue: decode instructions & check for structural hazards (ID1)

If a functional unit for the instruction is free and no other active instruction has the same destination register (WAW), the scoreboard issues the instruction to the functional unit and updates its internal data structure.

If a structural or WAW hazard exists, then the instruction issue stalls, and no further instructions will issue until these hazards are cleared.

2. Read operands: wait until no data hazards, then read operands (ID2)

A source operand is available if no earlier issued active instruction is going to write it, or if the register containing the operand is being written by a currently active functional unit. When the source operands are available, the scoreboard tells the functional unit to proceed to read the operands from the registers and begin execution. The scoreboard resolves RAW hazards dynamically in this step, and instructions may be sent into execution out of order.

Slide4

Four Stages of Scoreboard Control

3. Execution: operate on operands (EX)

The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution.

4. Write result: finish execution (WB)

Once the scoreboard is aware that the functional unit has completed execution, it checks for WAR hazards. If there are none, it writes the result; if a WAR hazard exists, it stalls the completing instruction.
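The four stages can be read as conditions the scoreboard checks before an instruction advances. The following is a minimal sketch in Python, not the CDC 6600 logic itself: the names `FU`, `can_issue`, `can_read_operands`, and `can_write_result` are hypothetical, and the record holds only the fields these checks need (Fi = destination, Fj/Fk = sources, Rj/Rk = "source is ready and not yet read").

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FU:
    """Hypothetical functional-unit record (illustrative field names)."""
    busy: bool = False
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # first source register
    fk: Optional[str] = None   # second source register
    rj: bool = False           # True: fj is ready and has not been read yet
    rk: bool = False           # True: fk is ready and has not been read yet


def can_issue(fu: FU, dest: str, pending_writer: Dict[str, str]) -> bool:
    """Issue: the unit is free (no structural hazard) and no active
    instruction already targets dest (no WAW hazard)."""
    return not fu.busy and dest not in pending_writer


def can_read_operands(fu: FU) -> bool:
    """Read operands: both sources are ready (RAW hazards resolved)."""
    return fu.rj and fu.rk


def can_write_result(name: str, units: Dict[str, FU]) -> bool:
    """Write result: no other active unit still has to read the old value
    of our destination register (no WAR hazard)."""
    dest = units[name].fi
    for other_name, other in units.items():
        if other_name == name or not other.busy:
            continue
        if (other.fj == dest and other.rj) or (other.fk == dest and other.rk):
            return False
    return True
```

Execution itself needs no check in this sketch: the unit runs once it has its operands and then asks for `can_write_result`.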

Example:

DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F8,F8,F14

The CDC 6600 scoreboard would stall the write of SUBD until ADDD reads its operands.
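To make the hazards in this example concrete, here is a small sketch that classifies the dependences between two instructions in program order. `Instr` and `hazards` are illustrative names, and the pairwise check is a simplification that ignores any writes between the two instructions.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def hazards(earlier: Instr, later: Instr) -> List[str]:
    """Classify hazards between two instructions in program order."""
    found = []
    if earlier.dest in later.srcs:
        found.append("RAW")   # later reads what earlier writes
    if later.dest in earlier.srcs:
        found.append("WAR")   # later overwrites what earlier still reads
    if later.dest == earlier.dest:
        found.append("WAW")   # both write the same register
    return found


prog = [
    Instr("DIVD", "F0", ("F2", "F4")),
    Instr("ADDD", "F10", ("F0", "F8")),
    Instr("SUBD", "F8", ("F8", "F14")),
]

for i in range(len(prog)):
    for j in range(i + 1, len(prog)):
        h = hazards(prog[i], prog[j])
        if h:
            print(prog[i].op, "->", prog[j].op, h)
# prints:
# DIVD -> ADDD ['RAW']
# ADDD -> SUBD ['WAR']
```

The WAR on F8 is the reason SUBD may not write before ADDD has read its operands.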

The CDC 6600-style scoreboard example assumes 1 integer unit, 2 FP multipliers, 1 FP divider, and 1 FP adder.

See Fig. A.50.

Slide5

Three Parts of the Scoreboard

1. Instruction status: which of the 4 steps the instruction is in

2. Functional unit status: indicates the state of the functional unit (FU). 9 fields for each functional unit:

Busy: Indicates whether the unit is busy or not

Op: Operation to perform in the unit (e.g., + or –)

Fi: Destination register

Fj, Fk: Source-register numbers

Qj, Qk: Functional units producing source registers Fj, Fk

Rj, Rk: Flags indicating when Fj, Fk are ready and not yet read. Set to No after the operands are read.

3. Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.
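The three parts map naturally onto three tables. The sketch below is one possible Python layout, with hypothetical names (`FUStatus`, `Scoreboard`, `issue`); the bookkeeping done at issue (Qj/Qk taken from the register result status, Rj/Rk set when no producer is pending) follows the usual textbook description and is an assumption here, not something shown on the slide.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FUStatus:
    """The 9 functional-unit status fields."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None   # units producing fj / fk (None: no producer)
    qk: Optional[str] = None
    rj: bool = False           # fj / fk ready and not yet read
    rk: bool = False


class Scoreboard:
    def __init__(self, unit_names: List[str]) -> None:
        self.instr_status: Dict[int, str] = {}   # instruction index -> stage
        self.fu: Dict[str, FUStatus] = {n: FUStatus() for n in unit_names}
        self.reg_result: Dict[str, str] = {}     # register -> unit that will write it

    def issue(self, idx: int, unit: str, op: str,
              dest: str, src1: str, src2: str) -> bool:
        """Record an issued instruction, or return False to stall."""
        if self.fu[unit].busy or dest in self.reg_result:
            return False                          # structural or WAW hazard
        qj = self.reg_result.get(src1)
        qk = self.reg_result.get(src2)
        self.fu[unit] = FUStatus(busy=True, op=op, fi=dest, fj=src1, fk=src2,
                                 qj=qj, qk=qk, rj=qj is None, rk=qk is None)
        self.reg_result[dest] = unit
        self.instr_status[idx] = "issue"
        return True
```

A register missing from `reg_result` plays the role of a blank entry: no pending instruction will write it.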

Slide6

Scoreboard Example Cycle 7

Note: (1) In-order issue. (2) I2 could not be issued at cycle 2 due to a structural hazard. (3) I3 issued in cycle 6, but stalled at read because I2 isn’t complete.

Slide7

Slide8

Nov. 2, 2004

Lec. 7

Review: Scoreboard

Limitations of 6600 scoreboard

No forwarding

Limited to instructions in a basic block (small window)

Small number of functional units (structural hazards)

Stall on WAR hazards

Stall on WAW hazards

DIV.D F0, F2, F4
ADD.D F6, F0, F8
S.D F6, 0(R1)
SUB.D F8, F10, F14
MUL.D F6, F10, F8

WAR (antidependence): SUB.D writes F8, which ADD.D reads.

WAW (output dependence): MUL.D writes F6, which ADD.D also writes.

Both are name dependences.

Slide9


Dynamic Scheduling: Tomasulo Algorithm

For the IBM 360/91, about 3 years after the CDC 6600, which introduced scoreboarding

Goal: High performance without special compilers

Differences between Tomasulo Algorithm & Scoreboard

Control & buffers distributed with Function Units vs. centralized in scoreboard; called “reservation stations”

Registers in instructions replaced by pointers to reservation station buffer

HW renaming of registers to avoid WAW hazards

Buffer operand values to avoid WAR hazards

Common Data Bus broadcasts results to all FUs

Loads and Stores treated as FUs as well

Why study? Led to Alpha 21264, HP 8000, MIPS 10000, Pentium II, PowerPC 604 …

Slide10

Figure: FP unit and load-store unit using Tomasulo’s algorithm

Slide11

Dynamic Algorithm: Tomasulo Algorithm

DIV.D F0, F2, F4
ADD.D S, F0, F8
S.D S, 0(R1)
SUB.D T, F10, F14
MUL.D F6, F10, T

Register renaming: S and T are temporary names that replace F6 and F8.
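A sketch of the renaming idea in Python follows. It is a simplification under stated assumptions: every destination gets a fresh name (`T0`, `T1`, ...), whereas the example above renames only the conflicting registers; `rename` and the tuple layout are hypothetical, and the store is modeled as an instruction with no destination register.

```python
from typing import Dict, List, Optional, Tuple

# (opcode, destination or None, source registers)
Instr = Tuple[str, Optional[str], Tuple[str, ...]]


def rename(prog: List[Instr]) -> List[Instr]:
    """Give every destination a fresh name so that no two instructions
    write the same name (no WAW) and no later write reuses a name that an
    earlier instruction still reads (no WAR)."""
    current: Dict[str, str] = {}   # architectural register -> latest name
    out: List[Instr] = []
    for n, (op, dest, srcs) in enumerate(prog):
        # Sources are looked up before the destination is renamed, so an
        # instruction that reads and writes the same register reads the old name.
        new_srcs = tuple(current.get(s, s) for s in srcs)
        new_dest = None
        if dest is not None:
            new_dest = f"T{n}"
            current[dest] = new_dest
        out.append((op, new_dest, new_srcs))
    return out


prog: List[Instr] = [
    ("DIV.D", "F0", ("F2", "F4")),
    ("ADD.D", "F6", ("F0", "F8")),
    ("S.D", None, ("F6", "R1")),
    ("SUB.D", "F8", ("F10", "F14")),
    ("MUL.D", "F6", ("F10", "F8")),
]
renamed = rename(prog)
```

After renaming, ADD.D still reads the original F8 while SUB.D writes a new name, and ADD.D and MUL.D write different names; only the true (RAW) dependences remain.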

Implemented through reservation stations (rs) per functional unit

Buffers an operand as soon as it is available – avoids WAR hazards.

Pending instructions designate the rs that will provide their inputs – avoids WAW hazards.

Only the last write in a sequence of writes to the same register actually updates the register

Decentralized hazard detection and execution control

Instruction results are passed directly to the FUs from the rs, rather than through the registers, over the common data bus (CDB)

Slide12

Three Stages of Tomasulo Algorithm

1. Issue: get instruction from FP Op Queue

Stall if there is a structural hazard, i.e., no space in the rs. If a reservation station (rs) is free, the issue logic issues the instruction to the rs and reads the operands into the rs if they are ready (register renaming => solves WAR). Mark the destination register as waiting for this latest instruction, even if the previous instruction writing to this register hasn’t completed => solves WAW hazards.

2. Execution: operate on operands (EX)

When both operands are ready, then execute; if not ready, watch the CDB for the result – solves RAW

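3. Write result: finish execution (WB)

Write on the Common Data Bus to all awaiting units; mark the reservation station available. Write the result into the destination register if its status is r, i.e., if the register still names this reservation station as its producer => solves WAW.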
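The three stages can be sketched as a tiny bookkeeping model. This is an illustration under simplifying assumptions, not the IBM 360/91 design: the class `Tomasulo` and its methods are hypothetical names, timing is ignored, the caller picks the station, and the caller supplies the computed value when a station writes its result.

```python
from typing import Dict, List, Optional


class Tomasulo:
    def __init__(self, stations: List[str], regs: Dict[str, float]) -> None:
        # station name -> entry; None marks a free station
        self.rs: Dict[str, Optional[dict]] = {name: None for name in stations}
        self.regs: Dict[str, float] = dict(regs)   # register values
        self.qi: Dict[str, str] = {}               # register -> station that will write it

    def issue(self, station: str, op: str, dest: str, src1: str, src2: str) -> bool:
        if self.rs[station] is not None:
            return False                      # structural hazard: stall
        entry = {"op": op, "vj": None, "vk": None, "qj": None, "qk": None}
        for v, q, src in (("vj", "qj", src1), ("vk", "qk", src2)):
            if src in self.qi:
                entry[q] = self.qi[src]       # wait for the producing station
            else:
                entry[v] = self.regs[src]     # copy the value now (avoids WAR)
        self.rs[station] = entry
        self.qi[dest] = station               # latest writer wins (avoids WAW)
        return True

    def ready(self, station: str) -> bool:
        e = self.rs[station]
        return e is not None and e["qj"] is None and e["qk"] is None

    def write_result(self, station: str, value: float) -> None:
        # CDB broadcast: the value is tagged with the station that produced it.
        for e in self.rs.values():
            if e is None:
                continue
            if e["qj"] == station:
                e["vj"], e["qj"] = value, None
            if e["qk"] == station:
                e["vk"], e["qk"] = value, None
        # Update a register only if it still expects this station.
        for reg, tag in list(self.qi.items()):
            if tag == station:
                self.regs[reg] = value
                del self.qi[reg]
        self.rs[station] = None               # station becomes available
```

In this model an older writer of a register that has since been renamed broadcasts its result to waiting stations but never touches the register file, which is how the WAW hazard disappears.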

Normal data bus: data + destination (“go to” bus)

CDB: data + source (“come from” bus)

64 bits of data + 4 bits of Functional Unit source address

Write if matches expected Functional Unit (produces result)

Does broadcast

Slide13

Reservation Station Components

Op: Operation to perform in the unit (e.g., + or –)

Vj, Vk: Values of the source operands.

Qj, Qk: Names of the RS that will provide the source operands. A value of zero means the source operand is already available in Vj or Vk, or is not necessary.

Busy: Indicates that the reservation station or FU is busy
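These fields can be written down as a small record. The sketch below is illustrative only: `ReservationStation`, `ready`, and `snoop` are hypothetical names, and stations are identified by small positive integers so that 0 can mean "no producer", as described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None   # operand values, once known
    vk: Optional[float] = None
    qj: int = 0                  # producing station number; 0 = value is in vj or not needed
    qk: int = 0

    def ready(self) -> bool:
        """The instruction can execute once no operand is pending."""
        return self.busy and self.qj == 0 and self.qk == 0

    def snoop(self, tag: int, value: float) -> None:
        """Capture a result broadcast whose source tag matches a pending operand."""
        if tag == 0:
            return               # 0 is reserved for "no producer"
        if self.qj == tag:
            self.vj, self.qj = value, 0
        if self.qk == tag:
            self.vk, self.qk = value, 0
```

For example, a station holding an add whose first operand comes from station 3 stays not ready until a broadcast tagged 3 arrives; broadcasts with other tags are ignored.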

Register File Status Qi:

Qi: Indicates which functional unit will write each register, if one exists. Blank (0) when no pending instruction will write that register, meaning that the value is already available.

Slide14

Tomasulo Status pp. 99

Slide15

Tomasulo Example Cycle 0

Slide16

Tomasulo Example Cycle 1

Slide17

Tomasulo Example Cycle 2

Slide18

Tomasulo Example Cycle 3

Slide19

Tomasulo Example Cycle 4

Slide20

Tomasulo Example Cycle 5

Slide21

Tomasulo Example Cycle 6

Slide22

Tomasulo Example Cycle 7

Slide23

Tomasulo Example Cycle 8

Slide24

Tomasulo Example Cycle 9

Slide25

Tomasulo Example Cycle 10

Slide26

Tomasulo Example Cycle 11

Slide27

Tomasulo Example Cycle 12

Slide28

Tomasulo Example Cycle 15

Slide29

Tomasulo Example Cycle 16

Slide30

Tomasulo Example Cycle 56

Slide31

Tomasulo Example Cycle 57