Constructive Computer Architecture

Slide1

Constructive Computer Architecture
Tutorial 5: Epoch & Branch Predictor

Sizhuo Zhang, 6.175 TA

October 30, 2015

T05-1

http://csg.csail.mit.edu/6.175

Slide2

1-bit Distributed Epochs

Decode redirects I1 (ieEp=idEp=0)

[Figure: pipeline Fetch -> f2d -> Decode -> d2e -> Execute. Epoch registers: eEp=0, feEp=0, fdEp=0, dEp=0. In flight: I1 (ieEp=0). The redirect paths are labeled "Delay: 0 cycle" and "Delay: 100 cycles".]

Slide3

1-bit Distributed Epochs

Decode redirects I1 (ieEp=idEp=0)

[Figure: pipeline Fetch -> f2d -> Decode -> d2e -> Execute. Epoch registers: eEp=0, feEp=0, fdEp=0, dEp=1, deEp=0. In flight: I1, and the redirect message "I1 redirect, ieEp=0". The redirect paths are labeled "Delay: 0 cycle" and "Delay: 100 cycles".]

Slide4

1-bit Distributed Epochs

Decode redirects I1 (ieEp=idEp=0)
Execute redirects I1

[Figure: pipeline Fetch -> f2d -> Decode -> d2e -> Execute. Epoch registers: eEp=0, feEp=0, fdEp=0, dEp=1, deEp=0. In flight: I1, and the redirect message "I1 redirect, ieEp=0". The redirect paths are labeled "Delay: 0 cycle" and "Delay: 100 cycles".]

Slide5

1-bit Distributed Epochs

Decode redirects I1 (ieEp=idEp=0)
Execute redirects I1

[Figure: pipeline Fetch -> f2d -> Decode -> d2e -> Execute. Epoch registers: eEp=1, feEp=1, fdEp=0, dEp=1, deEp=0. In flight: I1, and the redirect message "I1 redirect, ieEp=0". The redirect paths are labeled "Delay: 0 cycle" and "Delay: 100 cycles".]

Slide6

1-bit Distributed Epochs

Decode redirects I1 (ieEp=idEp=0)
Execute redirects I1
Correct-path I2 (ieEp=1, idEp=0) issues

[Figure: pipeline Fetch -> f2d -> Decode -> d2e -> Execute. Epoch registers: eEp=1, feEp=1, fdEp=0, dEp=1, deEp=0. In flight: I2, and the redirect message "I1 redirect, ieEp=0". The redirect paths are labeled "Delay: 0 cycle" and "Delay: 100 cycles".]

Slide7

1-bit Distributed Epochs

Decode redirects I1 (ieEp=idEp=0)
Execute redirects I1
Correct-path I2 (ieEp=1, idEp=0) issues

[Figure: pipeline Fetch -> f2d -> Decode -> d2e -> Execute. Epoch registers: eEp=1, feEp=1, fdEp=0, dEp=0, deEp=1. In flight: I2, and the redirect message "I1 redirect, ieEp=0". The redirect paths are labeled "Delay: 0 cycle" and "Delay: 100 cycles".]

Slide8

1-bit Distributed Epochs

Decode redirects I1 (ieEp=idEp=0)
Execute redirects I1
Correct-path I2 (ieEp=1, idEp=0) issues
Execute redirects I2

[Figure: pipeline Fetch -> f2d -> Decode -> d2e -> Execute. Epoch registers: eEp=1, feEp=1, fdEp=0, dEp=0, deEp=1. In flight: I2, and the redirect message "I1 redirect, ieEp=0". The redirect paths are labeled "Delay: 0 cycle" and "Delay: 100 cycles".]

Slide9

1-bit Distributed Epochs

Decode redirects I1 (ieEp=idEp=0)
Execute redirects I1
Correct-path I2 (ieEp=1, idEp=0) issues
Execute redirects I2

[Figure: pipeline Fetch -> f2d -> Decode -> d2e -> Execute. Epoch registers: eEp=0, feEp=0, fdEp=0, dEp=0, deEp=1. In flight: I2, and the redirect message "I1 redirect, ieEp=0". The redirect paths are labeled "Delay: 0 cycle" and "Delay: 100 cycles".]

Slide10

1-bit Distributed Epochs

Decode redirects I1 (ieEp=idEp=0)
Execute redirects I1
Correct-path I2 (ieEp=1, idEp=0) issues
Execute redirects I2
I1 redirect arrives at Fetch (ieEp == feEp) -> change PC to a wrong value

[Figure: pipeline Fetch -> f2d -> Decode -> d2e -> Execute. Epoch registers: eEp=0, feEp=0, fdEp=0, dEp=0, deEp=1. In flight: I2, and the redirect message "I1 redirect, ieEp=0" now reaching Fetch. The redirect paths are labeled "Delay: 0 cycle" and "Delay: 100 cycles".]
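The failure sequence on the slides above can be replayed in a few lines. The following is a minimal Python model, not the course's Bluespec code: the function name `replay`, the `modulus` parameter, and the rule that Decode resets dEp when it sees a new Execute epoch are assumptions inferred from the register values shown in the figures.

```python
def replay(modulus):
    """Replay the slide scenario; epochs wrap modulo `modulus` (2 = one bit).

    Returns True if Fetch accepts the stale I1 redirect from Decode.
    """
    e_ep = 0    # Execute's epoch (eEp)
    fe_ep = 0   # Fetch's copy of the Execute epoch (feEp)
    d_ep = 0    # Decode's epoch (dEp)
    de_ep = 0   # Decode's copy of the Execute epoch (deEp, assumed name)

    # Decode redirects I1 (ieEp = idEp = 0); the message is delayed 100 cycles.
    d_ep = (d_ep + 1) % modulus
    delayed_msg = {"source": "Decode", "ieEp": 0}

    # Execute redirects I1; this message reaches Fetch with no delay.
    e_ep = (e_ep + 1) % modulus
    fe_ep = e_ep

    # Correct-path I2 is fetched with ieEp = feEp. Decode sees a new
    # Execute epoch, records it, and resets its own epoch (assumed rule).
    i2_ie_ep = fe_ep
    if i2_ie_ep != de_ep:
        de_ep = i2_ie_ep
        d_ep = 0

    # Execute redirects I2.
    e_ep = (e_ep + 1) % modulus
    fe_ep = e_ep

    # The delayed I1 redirect finally arrives at Fetch, which accepts it
    # when the tag it carries matches Fetch's copy of the Execute epoch.
    return delayed_msg["ieEp"] == fe_ep
```

With `replay(2)` the Execute epoch wraps back to 0 after two redirects, so the stale message is accepted. With a large modulus the epoch reaches 2 and the stale tag no longer matches.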

Slide11

Unbounded Global Epochs

Both Decode and Execute can redirect the PC
Execute redirect should never be overruled
Global epoch for each redirecting stage
  eEpoch: incremented when redirect from Execute takes effect
  dEpoch: incremented when redirect from Decode takes effect
Initially set all epochs to 0

[Figure: pipeline Fetch -> f2d -> Decode -> d2e -> Execute with global epoch registers eEp and dEp. Decode and Execute each have a "mispred?" check that sends "redirect PC" to Fetch.]

Slide12

Execute stage

[Figure: pipeline Fetch -> f2d -> Decode -> d2e -> Execute with global epoch registers eEp and dEp. f2d entries carry {…, ieEp, idEp}; d2e entries carry {pc, ppc, ieEp}. Decode and Execute each have a "mispred?" check that sends a {pc, newpc} redirect to Fetch.]

Is ieEp == eEp?
  yes: Current instruction is OK; check the ppc prediction by execution, signal a redirect message (by writing EHR) on misprediction
  no: Wrong path instruction; poison it

Slide13

Decode Stage

[Figure: pipeline Fetch -> f2d -> Decode -> d2e -> Execute with global epoch registers eEp and dEp. f2d entries carry {pc, ppc, ieEp, idEp}; d2e entries carry {..., ieEp}. Decode and Execute each have a "mispred?" check that sends a {pc, newpc} redirect to Fetch.]

Is ieEp == eEp && idEp == dEp?
  yes: Current instruction is OK; check the ppc prediction via BHT, signal a redirect message (by writing EHR) on misprediction
  no: Wrong path instruction; drop it
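The epoch checks of the Execute and Decode stages can be put side by side. This is a small Python sketch of the two decisions only; the function names and the returned strings are hypothetical and stand in for the actions named on the slides.

```python
def execute_check(ie_ep, e_epoch):
    """Execute compares only the Execute-epoch tag of the instruction."""
    if ie_ep == e_epoch:
        return "execute"   # correct path: run it and verify ppc
    return "poison"        # wrong path: pass it on marked as poisoned


def decode_check(ie_ep, id_ep, e_epoch, d_epoch):
    """Decode requires both epoch tags to match the global epochs."""
    if ie_ep == e_epoch and id_ep == d_epoch:
        return "decode"    # correct path: decode and check ppc via BHT
    return "drop"          # wrong path: kill it at Decode
```

Decode checks both tags because an instruction can be invalidated by either redirecting stage, while Execute only needs to know whether an Execute redirect happened after the instruction was fetched.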

Slide14

Fetch stage: Redirect PC, Change Epoch

[Figure: pipeline Fetch -> f2d -> Decode -> d2e -> Execute with global epoch registers eEp and dEp. Decode and Execute each have a "mispred?" check; a {pc, newpc} redirect message is delivered to Fetch.]

If there is a redirect message from Execute
  Set PC to correct value, increment eEp
Else if there is a redirect message from Decode
  Set PC to correct value, increment dEp
We can always request IMem to fetch a new instruction
  New instruction will be tagged with current eEpoch and dEpoch
  PC redirection overrides next-PC prediction
  When PC redirects, new instruction must be killed at Decode stage
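The priority between the two redirect sources can be sketched as follows. This is an illustrative Python model with unbounded epoch counters; `FetchState` and `apply_redirect` are hypothetical names, and a redirect is represented simply by its new PC or `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FetchState:
    pc: int
    e_epoch: int = 0
    d_epoch: int = 0


def apply_redirect(state: FetchState,
                   exe_newpc: Optional[int],
                   dec_newpc: Optional[int]) -> FetchState:
    """Apply at most one redirect; an Execute redirect always wins."""
    if exe_newpc is not None:
        # Execute redirect takes effect: only eEpoch is incremented.
        return FetchState(exe_newpc, state.e_epoch + 1, state.d_epoch)
    if dec_newpc is not None:
        # Decode redirect takes effect only when Execute is silent.
        return FetchState(dec_newpc, state.e_epoch, state.d_epoch + 1)
    return state
```

When both stages redirect at once, the Decode message is discarded and dEpoch is left unchanged, which matches the rule that an Execute redirect should never be overruled.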

Slide15

Observation

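At cycle [...], instructions between Fetch and Execute: [...] at Fetch, [...] at Execute, [...] at Decode
Invariant: [...]
Proved by induction on [...]
At cycle [...], both Decode and Execute redirect; invariants still hold at cycle [...]

[Figure: instructions in flight between Fetch, Decode, and Execute, alongside the eEpoch and dEpoch registers. The symbols and formulas on this slide did not survive extraction and are marked [...].]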

Slide16

1-bit Global Epochs

The max difference of values of one type of epoch is 1
Only need 1 bit to encode each epoch
Increment epoch -> flip epoch
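A quick check of why one bit suffices: if a tag and the current epoch never differ by more than 1, comparing their low bits gives the same answer as comparing the full counters. The Python sketch below uses hypothetical helper names and is only an illustration of that argument.

```python
def same_epoch_unbounded(tag, current):
    return tag == current


def same_epoch_1bit(tag, current):
    # Keep only the low bit of each unbounded counter.
    return (tag & 1) == (current & 1)


def increment_1bit(epoch_bit):
    # Incrementing modulo 2 is the same as flipping the bit.
    return epoch_bit ^ 1


def agree_when_close(limit=1000):
    """Check both comparisons agree whenever tag is current or current - 1."""
    for current in range(limit):
        for tag in (current - 1, current):
            if tag < 0:
                continue
            if same_epoch_unbounded(tag, current) != same_epoch_1bit(tag, current):
                return False
    return True
```

The bound matters: `same_epoch_1bit(0, 2)` is true even though the counters differ, so the 1-bit encoding is only safe while the max-difference-of-1 property holds.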

Slide17

4-stage Pipeline: Code Sketch

module mkProc(Proc);
  // stage 1: Fetch
  // stage 2: Decode & read register
  // stage 3: Execute & access memory
  // stage 4: Commit (write register file)
  Ehr#(2, Addr) pc <- mkEhr(?);
  IMemory iMem <- mkIMemory;
  ...
  Fifo#(2, Fetch2Decode) f2d <- mkCFFifo;
  Fifo#(2, Decode2Execute) d2e <- mkCFFifo;
  Fifo#(2, Execute2Commit) e2c <- mkCFFifo;
  Reg#(Bool) eEpoch <- mkReg(False);
  Reg#(Bool) dEpoch <- mkReg(False);
  Ehr#(2, Maybe#(ExeRedirect)) exeRedirect <- mkEhr(Invalid);
  Ehr#(2, Maybe#(DecRedirect)) decRedirect <- mkEhr(Invalid);
  BTB#(16) btb <- mkBTB;
  BHT#(1024) bht <- mkBHT;

Slide18

Execute Stage

rule doExecute;
  d2e.deq;
  let x = d2e.first;
  if(x.ieEp == eEpoch) begin
    let eInst = exec(...);
    if(eInst.mispredict)
      exeRedirect[0] <= Valid (ExeRedirect{pc: x.pc, newpc: eInst.addr});
    ...
  end
  else
    e2c.enq(Execute2Commit{poisoned: True, ...}); // wrong path instruction, poison it
endrule

BHT training will be in lab

Slide19

Decode Stage

rule doDecode;
  f2d.deq;
  let x = f2d.first;
  if(x.ieEp == eEpoch && x.idEp == dEpoch) begin
    let stall = ...; // check scoreboard for stall
    if(!stall) begin
      if(x.iType == Br) begin
        let bht_ppc = ...; // BHT prediction
        if(bht_ppc != x.ppc)
          decRedirect[0] <= Valid (DecRedirect{pc: x.pc, newpc: bht_ppc});
      end
      ...
      d2e.enq(Decode2Execute{ieEp: x.ieEp, ...});
    end
  end
  // else: wrong path instruction, killed
endrule

Slide20

Fetch Stage

rule doFetch;
  let inst = iMem.req(pc[0]);
  let ppc = btb.predPc(pc[0]);
  pc[0] <= ppc;
  f2d.enq(Fetch2Decode{pc: pc[0], ppc: ppc, inst: inst, ieEp: eEpoch, idEp: dEpoch});
endrule

rule canonicalizeRedirect;
  if(exeRedirect[1] matches tagged Valid .r) begin
    pc[1] <= r.newpc;
    btb.update(r.pc, r.newpc);
    eEpoch <= !eEpoch;
  end else if(decRedirect[1] matches tagged Valid .r) begin
    pc[1] <= r.newpc;
    btb.update(r.pc, r.newpc);
    dEpoch <= !dEpoch;
  end
  exeRedirect[1] <= Invalid;
  decRedirect[1] <= Invalid;
endrule

Slide21

Advanced Branch Predictor

BHT cannot predict very accurately
Need more sophisticated schemes
Global branch history
  taken/not taken for previous branches
Local branch history
  taken/not taken for previous occurrences of the same branch
Tournament branch predictor
  Use both global and local history
  Alpha 21264

Slide22

Tournament Predictor

10-bit PC: index 1024 x 10-bit local history table
10-bit local history
  Index 1024 x 3-bit BHT: prediction 1
12-bit global history
  Index 4096 x 2-bit BHT: prediction 2
  Index 4096 x 2-bit BHT: select between predictions 1, 2

“The Alpha 21264 Microprocessor Architecture”
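The table sizes listed above can be turned into a small behavioral model. The Python sketch below is an illustration under stated assumptions, not the Alpha 21264 design or the lab code: the PC index bits, the counter initial values, the taken thresholds, and the rule for training the selector are all choices made here for the example.

```python
def _sat(value, up, maximum):
    """Saturating counter step: move toward `maximum` or toward 0."""
    return min(value + 1, maximum) if up else max(value - 1, 0)


class TournamentPredictor:
    def __init__(self):
        self.local_hist = [0] * 1024   # 1024 x 10-bit local history table
        self.local_ctr = [3] * 1024    # 1024 x 3-bit counters: prediction 1
        self.global_ctr = [1] * 4096   # 4096 x 2-bit counters: prediction 2
        self.choice = [1] * 4096       # 4096 x 2-bit counters: selector
        self.ghist = 0                 # 12-bit global history

    def _index(self, pc):
        # Assumption: use PC bits [11:2] as the 10-bit index.
        return (pc >> 2) & 0x3FF

    def _components(self, pc):
        lh = self.local_hist[self._index(pc)]
        p_local = self.local_ctr[lh] >= 4             # upper half = taken
        p_global = self.global_ctr[self.ghist] >= 2   # upper half = taken
        return lh, p_local, p_global

    def predict(self, pc):
        _, p_local, p_global = self._components(pc)
        use_global = self.choice[self.ghist] >= 2
        return p_global if use_global else p_local

    def update(self, pc, taken):
        taken = bool(taken)
        i = self._index(pc)
        lh, p_local, p_global = self._components(pc)
        # Train the selector only when the two predictions disagree.
        if p_local != p_global:
            self.choice[self.ghist] = _sat(self.choice[self.ghist],
                                           p_global == taken, 3)
        self.local_ctr[lh] = _sat(self.local_ctr[lh], taken, 7)
        self.global_ctr[self.ghist] = _sat(self.global_ctr[self.ghist], taken, 3)
        # Shift the outcome into both histories, keeping 10 and 12 bits.
        self.local_hist[i] = ((lh << 1) | int(taken)) & 0x3FF
        self.ghist = ((self.ghist << 1) | int(taken)) & 0xFFF
```

A typical use is to call `predict(pc)` at fetch or decode and `update(pc, taken)` once the branch resolves. A branch that is always taken is predicted taken after its histories fill with ones and the corresponding counters have been trained.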

Slide23

TAGE Branch Predictors

TAGE (Tagged Geometric history length)


“A case for partially tagged geometric history length branch prediction”

Slide24

Other Predictors

Perceptron-based predictors
  Classification problem in machine learning
Championship Branch Predictor
  Papers
  Slides
  Simulation codes

Slide25

Return Address Stack

Use a stack to store return address from function call
  Push when function is called
  Pop when returning from a function
Instruction to return from function call
  JALR: rd = x0, rs1 = x1 (ra)
Instruction to initiate function call
  JAL: rd = x1
  JALR: rd = x1
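The push and pop rules above can be sketched as a small Python model. The class and function names, the stack depth of 8, the drop-oldest overflow policy, and the 4-byte instruction size are assumptions made for this example.

```python
class ReturnAddressStack:
    def __init__(self, depth=8):
        self.depth = depth
        self.entries = []

    def push(self, addr):
        if len(self.entries) == self.depth:
            self.entries.pop(0)   # full: drop the oldest entry (assumed policy)
        self.entries.append(addr)

    def pop(self):
        # Returns None when empty, meaning no prediction is available.
        return self.entries.pop() if self.entries else None


def is_call(op, rd, rs1):
    # JAL or JALR that writes the link register x1.
    return op in ("JAL", "JALR") and rd == 1


def is_return(op, rd, rs1):
    # JALR that discards the link (rd = x0) and jumps through x1 (ra).
    return op == "JALR" and rd == 0 and rs1 == 1


def ras_step(ras, pc, op, rd, rs1):
    """Update the stack for one instruction; return a predicted return target or None."""
    if is_call(op, rd, rs1):
        ras.push(pc + 4)          # return address is the next instruction
        return None
    if is_return(op, rd, rs1):
        return ras.pop()
    return None
```

For nested calls at 0x100 and 0x200, the first return is predicted to go to 0x204 and the second to 0x104, since the most recent call is popped first.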

