

Slide1

EECS 470 Lecture 7

Branches:

Address prediction and recovery

(And interrupt recovery too.)

Slide2

Warning: Crazy times coming

Project handout and group formation today

Help me to end class 12 minutes early…

P3 is due on Sunday 2/9

It’s a lot of work (20 hours?)

Proposal is due on Monday (2/10)

It’s not a lot of work (1 hour?) to do the write-up, but you’ll need to meet with your group and discuss things.

Don’t worry too much about getting this right. You’ll be allowed to change it (we’ll meet the following Friday).

Just a line in the sand.

HW3 is due on Wednesday 2/12

It’s a fair bit of work (3 hours?)

20-minute group meetings on Friday 2/14 (rather than in lab)

Midterm is on Monday 2/17 in the evening (6-8pm)

Exam Q&A on Saturday (2/15 from 6-8pm)

Q&A in class 2/17

The best way to study is to look at old exams (posted online!)

Slide3

Last time:

Covered branch predictors

Direction

Address

Slide4

General speculation

Control speculation

“I think this branch will go to address 90004”

Data speculation

“I’ll guess the result of the load will be zero”

Memory conflict speculation

“I don’t think this load conflicts with any preceding store.”

Error speculation

“I don’t think there were any errors in this calculation”

Slide5

Speculation in general

Need to be 100% sure on final correctness!

So need a recovery mechanism

Must make forward progress!

Want to speed up overall performance

So the recovery cost should be low, or the expected rate of occurrence should be low.

There can be a real trade-off among accuracy, cost of recovery, and speedup when correct.
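To make the trade-off concrete, here is a minimal Python sketch of a simple expected-value model. The model and its parameter names (`accuracy`, `gain_if_right`, `recovery_cost`) are our own simplification, not the lecture's, and the cycle counts used below are hypothetical.

```python
def expected_gain(accuracy: float, gain_if_right: float, recovery_cost: float) -> float:
    """Expected cycles saved per speculation, relative to simply waiting.

    accuracy      : probability the guess is right
    gain_if_right : cycles saved when it is right (e.g., the stall avoided)
    recovery_cost : cycles lost when it is wrong (squash and restart)
    """
    return accuracy * gain_if_right - (1.0 - accuracy) * recovery_cost


def break_even_accuracy(gain_if_right: float, recovery_cost: float) -> float:
    """Accuracy at which speculating neither helps nor hurts on average."""
    return recovery_cost / (gain_if_right + recovery_cost)


# Hypothetical numbers: speculating saves 3 cycles when right and costs 10 when wrong.
for acc in (0.5, 0.8, 0.95):
    print(f"accuracy {acc:.2f}: {expected_gain(acc, 3, 10):+.2f} cycles per event")
print(f"break-even accuracy: {break_even_accuracy(3, 10):.3f}")
```

With an expensive recovery, a guess has to be right well over half the time before it pays off at all; cheaper recovery lowers that bar.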

Should keep the worst case in mind…

Slide6

Address Prediction

Slide7

BTB (Chapter 3.9)

Branch Target Buffer

Address predictor

Lots of variations

Keep the target of “likely taken” branches in a buffer

With each branch, associate the expected target.

Slide8

Branch PC  | Target address
0x05360AF0 | 0x05360000
……         | ……

BTB indexed by current PC

If the entry is in the BTB, fetch the target address next

Generally set associative (fully associative is too slow)

Often qualified by a branch-taken predictor

Slide9
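A minimal Python sketch of the BTB from the previous slide: indexed by the fetch PC, set associative, and returning a predicted target on a hit. The sizes (64 sets, 4 ways), the assumption of 4-byte instructions, LRU replacement, and allocating entries only for taken branches are illustrative choices; the class and method names are hypothetical.

```python
from collections import OrderedDict
from typing import List, Tuple


class BTB:
    """Set-associative branch target buffer (hypothetical sketch).

    Assumes 4-byte instructions, so the low two PC bits are dropped before
    indexing. Each set is an OrderedDict from tag to target, kept in LRU
    order (least recently used first).
    """

    def __init__(self, num_sets: int = 64, ways: int = 4) -> None:
        self.num_sets = num_sets
        self.ways = ways
        self.sets: List[OrderedDict] = [OrderedDict() for _ in range(num_sets)]

    def _index_tag(self, pc: int) -> Tuple[int, int]:
        word = pc >> 2                       # drop the byte offset
        return word % self.num_sets, word // self.num_sets

    def predict(self, pc: int) -> Tuple[bool, int]:
        """Looked up during fetch: returns (predict_taken, next_fetch_pc)."""
        index, tag = self._index_tag(pc)
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)         # hit: now most recently used
            return True, entries[tag]        # a hit doubles as "predict taken"
        return False, pc + 4                 # miss: stuck with not-taken

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Trained when the branch resolves; only taken branches allocate."""
        if not taken:
            return                           # not-taken outcomes never allocate an entry
        index, tag = self._index_tag(pc)
        entries = self.sets[index]
        if tag not in entries and len(entries) >= self.ways:
            entries.popitem(last=False)      # evict the least recently used entry
        entries[tag] = target
        entries.move_to_end(tag)


btb = BTB()
print(btb.predict(0x05360AF0))   # cold miss: not taken, fall through to PC + 4
btb.update(0x05360AF0, taken=True, target=0x05360000)
print(btb.predict(0x05360AF0))   # hit: taken, fetch from 0x05360000 next
```

Treating a hit as "predict taken" and a miss as "predict not-taken" is the BTB-as-predictor idea below; allocating only on taken outcomes is one simple response to the replacement concern.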

So…

The BTB lets you predict the target address during the fetch of the branch!

On a BTB miss, you are pretty much stuck with not-taken as the prediction.

This limits prediction accuracy.

You can also use the BTB as a direction predictor.

If it is there, predict taken.

Replacement is an issue

LRU seems reasonable, but you only really want branches that are taken at least a fair amount of the time.

Slide10

Misprediction Recovery

Slide11

Pipeline recovery is pretty simple

Squash and restart fetch with the right address

Just have to be sure that nothing has “committed” its state yet.

In our 5-stage pipe, state is only committed during MEM (for stores) and WB (for registers).

Slide12

Tomasulo’s

So far we’ve said “just don’t speculate past unresolved branches”

By that we mean, don’t even dispatch instructions after an unresolved branch.

We are worried that an instruction that wasn’t supposed to happen will modify architectural state.

What are our other options?

Speculate past unresolved branches and create a recovery mechanism.

Slide13

What we need is:

Some way to not commit instructions until all branches before them are committed.

Just like in the pipeline, something could have finished execution but not yet updated anything “real”.

Slide14

Interrupt!!!

Slide15

Interrupts

These have a similar problem.

If we can execute out of order, a “slower” instruction might not generate its interrupt until an instruction that comes after it in program order has already finished.

This sounds like the end of out-of-order execution

I mean, if we can’t finish out of order, isn’t this pointless?

Slide16

Exceptions and Interrupts

Exception Type    | Sync/Async | Maskable? | Restartable?
I/O request       | Async      | Yes       | Yes
System call       | Sync       | No        | Yes
Breakpoint        | Sync       | Yes       | Yes
Overflow          | Sync       | Yes       | Yes
Page fault        | Sync       | No        | Yes
Misaligned access | Sync       | No        | Yes
Memory protect    | Sync       | No        | Yes
Machine check     | Async/Sync | No        | No
Power failure     | Async      | No        | No

Slide17

Precise Interrupts

Implementation approaches

Don’t

E.g., Cray-1

Buffer speculative results

E.g., P4, Alpha 21264

History buffer

Future file/Reorder buffer

[Figure: at the PC, all earlier instructions have completely finished and no later instruction has executed at all. Labels: Precise State, Speculative State.]

Slide18

Precise Interrupts and Branches via the Reorder Buffer

@Alloc

Allocate result storage at Tail

@Sched

Get inputs (ROB T-to-H, then ARF)

Wait until all inputs ready

@WB

Write results/fault to ROB

Indicate result is ready

@CT

Wait until inst @ Head is done

If fault, initiate handler

Else, write results to ARF

Deallocate entry from ROB

[Figure: pipeline IF, ID, Alloc, Sched, EX, ROB, CT, with MEM and the ARF; stages labeled In-order, In-order, and Any order (dispatch and commit in order, execution in any order). Each ROB entry holds PC, Dst regID, Dst value, and Except?, between Head and Tail pointers.]

Reorder Buffer (ROB)

Circular queue of spec state

May contain multiple definitions of same register

Slide19

Reorder Buffer Example

Code Sequence

f1 = f2 / f3

r3 = r2 + r3

r4 = r3 – r2

Initial Conditions

- reorder buffer empty

- f2 = 3.0

- f3 = 2.0

- r2 = 6

- r3 = 5

ROB contents over time, oldest (H) to youngest (T):

[f1: result ?, Except ?]

[f1: result ?, Except ?] [r3: result ?, Except ?]

[f1: result ?, Except ?] [r3: result 11, Except n] [r4: result ?, Except ?]

(The figure also shows a leftover entry, regID r8, result 2, Except n, which is not part of this code sequence.)

Slide20

Reorder Buffer Example

Code Sequence

f1 = f2 / f3

r3 = r2 + r3

r4 = r3 – r2

Initial Conditions

- reorder buffer empty

- f2 = 3.0

- f3 = 2.0

- r2 = 6

- r3 = 5

ROB contents over time, oldest (H) to youngest (T):

[f1: result ?, Except ?] [r3: result 11, Except n] [r4: result 5, Except n]

[f1: result ?, Except y] [r3: result 11, Except n] [r4: result 5, Except n]

(The leftover r8 entry, result 2, Except n, is still shown.)

Slide21

Reorder Buffer Example

Code Sequence

f1 = f2 / f3

r3 = r2 + r3

r4 = r3 – r2

Initial Conditions

- reorder buffer empty

- f2 = 3.0

- f3 = 2.0

- r2 = 6

- r3 = 5

ROB over time: first empty (H = T), then holding the first instruction of the fault handler.

Slide22
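The reorder-buffer example above can be replayed with a small Python model of the four stages (@Alloc, @Sched, @WB, @CT). This is a sketch under simplifying assumptions: an 8-entry circular queue, no reservation stations or functional-unit timing, results computed inline, and hypothetical names such as `ReorderBuffer`, `commit`, and `squash_after`. It also shows one answer to the branch question that follows: squash only what is younger than the branch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class RobEntry:
    reg: Optional[str]               # destination regID (None for a branch)
    value: Optional[float] = None    # result, filled in at writeback
    done: bool = False               # result (or fault) has been written
    fault: bool = False              # the "Except?" bit


class ReorderBuffer:
    """Circular queue of speculative state (sizes and names are illustrative)."""

    def __init__(self, size: int = 8) -> None:
        self.slots: List[Optional[RobEntry]] = [None] * size
        self.head = 0    # oldest in-flight instruction, next to commit
        self.tail = 0    # next free slot
        self.count = 0

    def alloc(self, reg: Optional[str]) -> int:
        """@Alloc: reserve result storage at the tail, in program order."""
        if self.count == len(self.slots):
            raise RuntimeError("ROB full: dispatch must stall")
        idx = self.tail
        self.slots[idx] = RobEntry(reg)
        self.tail = (idx + 1) % len(self.slots)
        self.count += 1
        return idx

    def read(self, reg: str, arf: Dict[str, float]) -> Tuple[bool, Optional[float]]:
        """@Sched: search tail-to-head for the youngest writer of reg, else the ARF.
        Returns (True, value) if available, or (False, rob_index) to wait on."""
        for k in range(self.count):
            idx = (self.tail - 1 - k) % len(self.slots)
            entry = self.slots[idx]
            if entry.reg == reg:
                return (True, entry.value) if entry.done else (False, idx)
        return True, arf[reg]

    def writeback(self, idx: int, value: Optional[float] = None, fault: bool = False) -> None:
        """@WB: record the result or fault; entries may finish in any order."""
        entry = self.slots[idx]
        entry.value, entry.done, entry.fault = value, True, fault

    def commit(self, arf: Dict[str, float]) -> str:
        """@CT: retire finished entries from the head, strictly in program order."""
        while self.count and self.slots[self.head].done:
            entry = self.slots[self.head]
            if entry.fault:
                self.flush()      # older insts already retired; discard this one and younger
                return "fault"    # the fault handler would be started here
            if entry.reg is not None:
                arf[entry.reg] = entry.value   # only now does the result become architectural
            self.slots[self.head] = None
            self.head = (self.head + 1) % len(self.slots)
            self.count -= 1
        return "ok"

    def squash_after(self, idx: int) -> None:
        """Branch mispredict: discard every entry younger than the one at idx."""
        size = len(self.slots)
        keep = (idx - self.head) % size + 1
        for k in range(keep, self.count):
            self.slots[(self.head + k) % size] = None
        self.count = keep
        self.tail = (idx + 1) % size

    def flush(self) -> None:
        """Discard everything in flight (used when a fault reaches the head)."""
        self.slots = [None] * len(self.slots)
        self.tail = self.head
        self.count = 0


# Replay: f1 = f2 / f3; r3 = r2 + r3; r4 = r3 - r2
arf = {"f1": 0.0, "f2": 3.0, "f3": 2.0, "r2": 6, "r3": 5, "r4": 0}
rob = ReorderBuffer()
i_f1 = rob.alloc("f1")                   # long-latency divide
r3_in = arf["r2"] + arf["r3"]            # no older in-flight writer of r2/r3: read the ARF
i_r3 = rob.alloc("r3")
i_r4 = rob.alloc("r4")
rob.writeback(i_r3, r3_in)               # r3 = 11 finishes before the divide
ready, r3_val = rob.read("r3", arf)      # r4 finds r3 in the ROB, not the ARF
assert ready and r3_val == 11
rob.writeback(i_r4, r3_val - arf["r2"])  # r4 = 5 (r2 has no in-flight writer)
assert rob.commit(arf) == "ok" and arf["r3"] == 5   # head (f1) not done: nothing retires
rob.writeback(i_f1, fault=True)          # the slides mark the divide's Except bit as y
assert rob.commit(arf) == "fault"
assert arf["r3"] == 5 and arf["r4"] == 0 and rob.count == 0   # r3/r4 never committed

# Branch recovery: keep the branch and everything older, squash the rest.
rob2 = ReorderBuffer()
br = rob2.alloc(None)                    # the mispredicted branch
rob2.alloc("r5")                         # wrong-path instructions (hypothetical)
rob2.alloc("r6")
rob2.squash_after(br)
assert rob2.count == 1 and rob2.tail == br + 1
```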

There is more complexity here

Rename table needs to be cleared

Everything is in the ARF

Really do need to finish everything which was before the faulting instruction in program order.

What about branches?

Would need to drain everything before the branch.

Why not just squash everything that follows it?

Slide23

And while we’re at it…

Does the ROB replace the RS?

Is this a good thing? Bad thing?

Slide24

ROB

The ROB is an in-order queue where instructions are placed.

Instructions complete (retire) in order.

Instructions still execute out of order.

Still use the RS:

Instructions are issued to the RS and ROB at the same time.

Rename is to the ROB entry, not the RS.

When execution is done, the instruction leaves the RS.

Only when all instructions before it in program order are done does the instruction retire.

Slide25

Adding a Reorder Buffer

Slide26

Tomasulo Data Structures (Timing-Free Example, “P6 scheme”)

Map Table: Reg | Tag, one row per register r0–r4

Reservation Stations (RS): T | FU | busy | op | R | T1 | T2 | V1 | V2, entries 1–5

CDB: T | V

ARF: Reg | V, one row per register r0–r4

Instruction

r0=r1*r2

r1=r2*r3

Branch if r1=0

r0=r1+r1

r2=r2+1

Reorder Buffer (RoB): RoB Number | Dest. Reg. | Value, entries 0–6

Slide27
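A minimal Python sketch of the renaming step in the P6-style scheme from the previous slide, using its instruction sequence. Assumptions: the ROB is modeled as a growing list rather than a circular queue (retired entries are not removed), operands are shown symbolically instead of being computed, RS allocation is only mentioned in a comment, and the function names and the `#1` immediate encoding are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# The slide's sequence as (dest, op, src1, src2). "#1" marks an immediate and ""
# an unused operand; this encoding is ours, not the slide's.
PROGRAM: List[Tuple[Optional[str], str, str, str]] = [
    ("r0", "*", "r1", "r2"),
    ("r1", "*", "r2", "r3"),
    (None, "beqz", "r1", ""),      # Branch if r1 = 0 (no destination)
    ("r0", "+", "r1", "r1"),
    ("r2", "+", "r2", "#1"),
]


def rename_src(src: str, map_table: Dict[str, int]) -> str:
    """A source reads its ROB entry if the map table has a tag for it, else the ARF."""
    if src == "" or src.startswith("#"):
        return src
    return f"RoB{map_table[src]}" if src in map_table else f"ARF[{src}]"


def dispatch(program, map_table: Dict[str, int], rob: List[Optional[str]]):
    """Allocate each instruction a ROB entry (in hardware, an RS slot at the same
    time); the destination is renamed to the ROB entry, not the RS."""
    renamed = []
    for dest, op, s1, s2 in program:
        operands = (rename_src(s1, map_table), rename_src(s2, map_table))  # sources first
        rob_num = len(rob)
        rob.append(dest)
        if dest is not None:
            map_table[dest] = rob_num
        renamed.append((rob_num, dest, op, operands))
    return renamed


def retire(rob_num: int, dest: Optional[str], map_table: Dict[str, int]) -> None:
    """The value moves to the ARF; drop the mapping only if no younger
    instruction has renamed the same register since."""
    if dest is not None and map_table.get(dest) == rob_num:
        del map_table[dest]


def recover_at_retirement(branch_rob_num: int, rob: List[Optional[str]],
                          map_table: Dict[str, int]) -> None:
    """If a branch announces its mispredict only at retirement, everything older
    is already in the ARF: squash the younger entries and clear the map table."""
    del rob[branch_rob_num + 1:]
    map_table.clear()


map_table: Dict[str, int] = {}
rob: List[Optional[str]] = []
for entry in dispatch(PROGRAM, map_table, rob):
    print(entry)
print(map_table)   # {'r0': 3, 'r1': 1, 'r2': 4}
retire(0, "r0", map_table)                 # r0 still maps to RoB3, so the mapping survives
retire(1, "r1", map_table)                 # r1 -> RoB1 is cleared; later readers use the ARF
recover_at_retirement(2, rob, map_table)   # suppose the branch (RoB2) mispredicted
```

Because the branch waits until it reaches the head, clearing the whole map table is safe: no older instruction still holds a value only in the ROB.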

Review Questions

Could we make this work without an RS?

If so, why do we use it?

Why is it important to retire in order?

Why must branches wait until retirement before they announce their mispredict?

Any other ways to do this?

Slide28

More review questions

What is the purpose of the RoB?

Why do we have both a RoB and an RS?

Yes, that was pretty much on the last page…

Misprediction

When do we resolve a misprediction?

What happens to the main structures (RS, RoB, ARF, Rename Table) when we mispredict?

What is the whole purpose of OoO execution?

Slide29

Topic change

Why on earth are we doing this?

Why do we think it helps?

Homework 2 problems 5 and 6 made the argument.

Only need to obey true data dependencies.

Huge speedup potential.

Slide30

Optimizing CPU Performance

Golden Rule: $t_{CPU} = N_{inst} \times CPI \times t_{CLK}$
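A worked instance with hypothetical numbers (10^9 instructions, a CPI of 1.5, and a 0.5 ns clock period, i.e., 2 GHz). Holding $N_{inst}$ and $t_{CLK}$ fixed, halving CPI halves the run time:

```latex
t_{CPU} = N_{inst} \times CPI \times t_{CLK}
        = 10^{9} \times 1.5 \times 0.5\,\text{ns} = 0.75\,\text{s}

t_{CPU}' = 10^{9} \times 0.75 \times 0.5\,\text{ns} = 0.375\,\text{s}
```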

Given this, what are our options?

Reduce the number of instructions executed

Reduce the cycles to execute an instruction

Reduce the clock period

Our first focus: reducing CPI

Approach: Instruction-Level Parallelism (ILP)

Slide31

Why ILP?


Requirements

Parallelism

Large window

Limited control deps

Eliminate “false” deps

Find run-time deps

Slide32

How Much ILP Is There? (Chapter 3.10)

Slide33

How Large Must the “Window” Be?

Slide34

ALU Operation GOOD, Branch BAD

Expected Number of Branches Between Mispredicts

$E(X) \approx \frac{1}{1-p}$, where p is the branch prediction accuracy

E.g., p = 95%: E(X) ≈ 20 branches, 100-ish instructions

Slide35
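The estimate on the previous slide treats each branch as independently predicted correctly with probability p, so the number of branches up to and including the first mispredict is geometrically distributed with mean 1/(1-p). A short Python check follows; one branch every five instructions is an assumption that matches the slide's "20 branches, 100-ish instructions", and the Monte Carlo run is only a sanity check of the formula.

```python
import random


def expected_branches(p: float) -> float:
    """Mean branches up to and including the first mispredict (geometric distribution)."""
    return 1.0 / (1.0 - p)


def expected_window(p: float, insts_per_branch: float = 5.0) -> float:
    """Instructions per correctly predicted run; insts_per_branch = 5 is an assumed
    branch frequency, which turns ~20 branches into ~100 instructions."""
    return expected_branches(p) * insts_per_branch


def simulate(p: float, trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo check: count branches until the first mispredict, then average."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        n = 1
        while rng.random() < p:    # correct prediction: keep going
            n += 1
        total += n
    return total / trials


for p in (0.90, 0.95, 0.99):
    print(f"p={p:.2f}: ~{expected_branches(p):.0f} branches, "
          f"~{expected_window(p):.0f} instructions")
print(f"simulated mean at p=0.95: {simulate(0.95):.1f} branches")
```

Because the mean grows as 1/(1-p), going from 95% to 99% accuracy multiplies the useful window by five, which is why predictor accuracy matters so much for a large window.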

How Accurate Are Branch Predictors?

Slide36

Impact of Physical Storage Limitations

Each instruction “in flight” must have storage for its result

Really worse than this because of misspeculation…

Slide37

Registers GOOD, Memory BAD

Benefits of registers

Well-described deps

Fast access

Finite resource

Memory loses these benefits for flexibility

*p = …

*q = …

… = *p?

Slide38
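A small Python illustration of why `… = *p` is hard: whether the load depends on the store to `*q` is only known at run time, once the addresses are. Memory is modeled as a dict, the addresses are hypothetical, and `load_may_bypass` is a hypothetical conservative check. The alternative is the memory conflict speculation mentioned earlier: guess that there is no conflict and recover if the guess was wrong.

```python
from typing import Dict, List, Optional


def run(p: int, q: int) -> int:
    """*p = 1; *q = 2; x = *p, with memory modeled as a dict from address to value."""
    mem: Dict[int, int] = {}
    mem[p] = 1          # *p = ...
    mem[q] = 2          # *q = ...
    return mem[p]       # ... = *p : depends on the store to *q only if q == p


def load_may_bypass(load_addr: int, older_store_addrs: List[Optional[int]]) -> bool:
    """Conservative check before letting a load go ahead of older stores:
    every older store address must be known (not None) and different."""
    return all(a is not None and a != load_addr for a in older_store_addrs)


print(run(0x1000, 0x2000))                 # 1: q does not alias p
print(run(0x1000, 0x1000))                 # 2: q aliases p
print(load_may_bypass(0x1000, [0x2000]))   # True: store address known and different
print(load_may_bypass(0x1000, [None]))     # False: store address not computed yet
```

With registers the same question never arises, since the register names in the instruction already spell out the dependence.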

“Bottom Line” for an Ambitious Design