
A 64 - PowerPoint Presentation

Uploaded On 2016-06-21

Presentation Transcript

Slide1

A 64 Kbytes ITTAGE indirect branch predictor

André Seznec

INRIA/IRISA

Slide2

Build on ITTAGE

ITTAGE:

Introduced at the same time as TAGE (JILP 2006)

Derived directly from the TAGE predictor: target prediction instead of direction prediction

Slide3

ITTAGE: multiple tables, global history predictor

The set of history lengths forms a geometric series: {0, 2, 4, 8, 16, 32, 64, 128}

What is important: L(i)-L(i-1) is drastically increasing

Most of the storage for short histories!!

Capture correlation on very long histories

Slide4
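
The geometric series of history lengths from the previous slide can be sketched in a few lines. This is only an illustration: the function name and its parameters (`first`, `ratio`, `count`) are assumptions, chosen so that the output reproduces the example set {0, 2, 4, 8, 16, 32, 64, 128}.

```python
def geometric_history_lengths(first, ratio, count):
    """Return [0] followed by `count` lengths growing geometrically from `first`."""
    return [0] + [round(first * ratio ** i) for i in range(count)]


# Parameters chosen to match the example set on the slide.
lengths = geometric_history_lengths(first=2, ratio=2, count=7)

# L(i) - L(i-1): the gap between consecutive history lengths.
gaps = [longer - shorter for shorter, longer in zip(lengths, lengths[1:])]

print(lengths)
print(gaps)
```

The gaps grow with the table index, so most tables (and most of the storage) cover short histories while the last few reach very long ones.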

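[Figure: The ITTAGE predictor. A tagless base predictor indexed with pc and three tagged tables indexed with pc and h[0:L1], h[0:L2], h[0:L3]; tag comparisons (=?) on the tagged tables; 32-bit and 1-bit signals feed the final prediction.]

Slide5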

Prediction computation

General case: Longest matching component provides the prediction

Special case:

Many mispredictions on newly allocated entries: weak Ctr

Sometimes Altpred (slightly) more accurate than Pred

Property dynamically monitored through a single 4-bit counter

-2 % MPPKI

Slide6
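
The prediction computation from the previous slide might look like the sketch below. Several details are assumptions, not taken from the slides: the weak state is encoded as `ctr == 0`, the 4-bit counter selects Altpred when it reaches 8 or more, and the counter is only trained when Pred and Altpred disagree. All names are hypothetical.

```python
from dataclasses import dataclass

WEAK = 0  # assumption: ctr == 0 encodes a weak (newly allocated) entry


@dataclass
class Entry:
    target: int
    ctr: int = WEAK


def select_target(hits, base_target, use_alt_counter):
    """hits: matching tagged entries, ordered from shortest to longest history."""
    if not hits:
        return base_target            # only the tagless base predictor answers
    pred = hits[-1]                   # longest matching component
    alt = hits[-2].target if len(hits) > 1 else base_target
    # Special case: a weak provider is overridden when Altpred has been winning.
    if pred.ctr == WEAK and use_alt_counter >= 8:   # threshold is an assumption
        return alt
    return pred.target


def update_use_alt(use_alt_counter, pred, alt_target, actual):
    """Train the single 4-bit counter (0..15) when a weak Pred disagrees with Altpred."""
    if pred.ctr == WEAK and pred.target != alt_target:
        if alt_target == actual:
            return min(use_alt_counter + 1, 15)
        if pred.target == actual:
            return max(use_alt_counter - 1, 0)
    return use_alt_counter


hits = [Entry(0x100, ctr=2), Entry(0x200, ctr=WEAK)]
print(hex(select_target(hits, base_target=0x50, use_alt_counter=0)))
print(hex(select_target(hits, base_target=0x50, use_alt_counter=12)))
```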

A tagged table entry

Entry fields: Target, Tag, Ctr, U

Ctr: 2-bit hysteresis counter

U: 1-bit useful counter. Was the entry recently useful?

Tag: partial tag

Target: the target, 32 bits or some way to reconstruct it

Slide7

Allocate entries on mispredictions

Allocate entries in longer history length tables

On tables with U unset

Set Ctr to Weak and U to 0

HUGE STORAGE BUDGET:

Up to 3 entries allocated in different tables

Fast warming

Slide8
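
The allocation policy from the previous slide can be sketched as follows. The data layout is an assumption made for illustration: `tables[t][indices[t]]` is the entry the branch maps to in table `t`, tables are ordered by increasing history length, `provider` is the index of the table that supplied the prediction, and `ctr == 0` stands for the weak state.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    tag: int = 0
    target: int = 0
    ctr: int = 0  # assumption: 0 is the weak state
    u: int = 0


def allocate_on_misprediction(tables, indices, tags, provider, target, max_alloc=3):
    """Allocate up to `max_alloc` entries in tables with longer history than the provider."""
    allocated = 0
    for t in range(provider + 1, len(tables)):
        entry = tables[t][indices[t]]
        if entry.u == 0:                      # only replace entries with U unset
            entry.tag, entry.target = tags[t], target
            entry.ctr, entry.u = 0, 0         # Ctr set to weak, U set to 0
            allocated += 1
            if allocated == max_alloc:
                break
    return allocated


tables = [[Entry() for _ in range(4)] for _ in range(5)]
tables[1][2].u = 1                            # this entry is protected
count = allocate_on_misprediction(
    tables, indices=[2] * 5, tags=[7] * 5, provider=0, target=0x4000
)
print(count)
```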

Managing the (U)seful bit

Set when the entry avoids a misprediction: (Pred = target) & (Alt ≠ target)

Global reset when « difficulties » to allocate

Dynamically monitor if more failures than successes on allocations

Slide9
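
The management of the useful bit described on the previous slide can be sketched as below. The setting rule follows the slide; the monitor is an assumption: the slide only says that failures and successes of allocations are compared, so the saturating `balance` counter and its `limit` are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    target: int = 0
    u: int = 0


def update_useful(entry, pred_target, alt_target, actual):
    """Set U when the provider avoided a misprediction that Altpred would have made."""
    if pred_target == actual and alt_target != actual:
        entry.u = 1


class AllocationMonitor:
    """Signals a global reset when allocation failures outweigh successes."""

    def __init__(self, limit=256):            # hypothetical threshold
        self.limit = limit
        self.balance = 0

    def record(self, successes, failures):
        self.balance = max(0, self.balance + failures - successes)
        if self.balance >= self.limit:
            self.balance = 0
            return True                       # caller should reset all U bits
        return False


def reset_useful(tables):
    for table in tables:
        for entry in table:
            entry.u = 0


entry = Entry(target=0x10)
update_useful(entry, pred_target=0x10, alt_target=0x20, actual=0x10)
print(entry.u)
```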

Most of the storage space for targets

32 bits per entry !!

More than 12K (PC, target) pairs on CLIENT05

But only a maximum of 4038 different targets

Use 12 bit pointers + a 4K table

Slide10
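
The storage trade-off behind the 12-bit pointer idea from the previous slide can be checked with simple arithmetic. The entry count used below (`n_entries`) is a hypothetical figure for illustration only; the slides do not give the total number of entries here.

```python
def direct_target_bits(n_entries, target_bits=32):
    """Storage when every entry holds a full target."""
    return n_entries * target_bits


def pointer_target_bits(n_entries, pointer_bits=12, target_bits=32):
    """Storage when entries hold pointers into one shared target table."""
    shared_table = (1 << pointer_bits) * target_bits   # 4K targets of 32 bits
    return n_entries * pointer_bits + shared_table


n_entries = 16 * 1024                                   # hypothetical entry count
print(direct_target_bits(n_entries))
print(pointer_target_bits(n_entries))
```

A 12-bit pointer addresses 4096 targets, which is enough for the 4038 distinct targets reported for CLIENT05.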

Let us be realistic: leverage target locality

All targets in at most 90 256KB regions

Use a 128-entry region table:

Fully associative, 240 bytes

Saves 7 bits per ITTAGE entry

Would have saved 39 bits on a 64-bit architecture !!

Slide11
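
The region-table idea from the previous slide can be sketched as follows. A 256KB region gives an 18-bit offset, and a 128-entry table gives a 7-bit pointer, which reproduces the savings quoted on the slide. The class and method names are hypothetical, and the round-robin replacement is an assumption; the slides do not say how a full table is handled.

```python
REGION_OFFSET_BITS = 18        # 256 KB = 2**18 bytes
REGION_TABLE_ENTRIES = 128
REGION_POINTER_BITS = 7        # log2(128)


def bits_saved(address_bits):
    """Bits saved per entry by storing a region pointer instead of the region number."""
    region_number_bits = address_bits - REGION_OFFSET_BITS
    return region_number_bits - REGION_POINTER_BITS


class RegionTable:
    def __init__(self):
        self.regions = []      # fully associative: searched by value
        self.next_victim = 0   # hypothetical round-robin replacement

    def encode(self, target):
        region = target >> REGION_OFFSET_BITS
        offset = target & ((1 << REGION_OFFSET_BITS) - 1)
        if region in self.regions:
            pointer = self.regions.index(region)
        elif len(self.regions) < REGION_TABLE_ENTRIES:
            self.regions.append(region)
            pointer = len(self.regions) - 1
        else:
            pointer = self.next_victim
            self.regions[pointer] = region
            self.next_victim = (pointer + 1) % REGION_TABLE_ENTRIES
        return pointer, offset

    def decode(self, pointer, offset):
        return (self.regions[pointer] << REGION_OFFSET_BITS) | offset


table = RegionTable()
pointer, offset = table.encode(0x0804A123)
print(pointer, hex(offset), hex(table.decode(pointer, offset)))
print(bits_saved(32), bits_saved(64))
```

Note that 240 bytes over 128 entries is 15 bits per entry, which is consistent with a 14-bit region number plus one extra bit; what that bit holds is not stated on the slide.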

[Figure: tagged table entry with fields Target, Tag, Ctr, U; the Target field is split into a region pointer and a region offset.]

Slide12

The global history

Conventional global branch history

10 bits for indirect jumps, 5 bits for calls

mixing target and PC

-16 % MPPKI

Slide13
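
The history update from the previous slide can be sketched as below. The bit counts (10 per indirect jump, 5 per call) come from the slide. Two points are assumptions: that conditional branches contribute one taken/not-taken bit, and that target and PC are mixed with a plain XOR. The default `max_length` uses the longest history length listed later in the deck (3881).

```python
HISTORY_BITS = {"indirect": 10, "call": 5, "conditional": 1}


def update_history(history, kind, pc, target=0, taken=False, max_length=3881):
    """Shift new bits into the global history, kept as a Python integer."""
    nbits = HISTORY_BITS[kind]
    if kind == "conditional":
        value = int(taken)                              # assumption: 1 bit per branch
    else:
        value = (pc ^ target) & ((1 << nbits) - 1)      # hypothetical PC/target mix
    return ((history << nbits) | value) & ((1 << max_length) - 1)


h = update_history(0, "conditional", pc=0x400, taken=True)
h = update_history(h, "indirect", pc=0x1234, target=0x5678)
print(h)
```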

The global history (2)

Including all branches?

Only indirect and calls: -2.5 % MPPKI

But no conclusion: without 2 branches on INT05 and INT06, just the other way

Slide14

+ the other tricks (for TAGE)

Immediate Update Mimicker

Storage space interleaving

Picking the best set of history lengths

-1 % MPPKI

Slide15

The Immediate Update Mimicker

Issue: Some mispredictions due to late updates at retirement

Immediate Update Mimicker: Try to catch these cases

Slide16
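
One way to read the Immediate Update Mimicker is sketched below. The slides only state the goal (catching mispredictions caused by late updates) and that in-flight branches are matched on "same table, same entry"; everything else here is an assumption, including the class and field names and the choice to reuse the most recent executed outcome.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InFlight:
    table: int                        # T: table that provided the prediction
    index: int                        # A: address in that table
    predicted: int                    # P: predicted target
    resolved: Optional[int] = None    # actual target once executed


class ImmediateUpdateMimicker:
    def __init__(self):
        self.window = []              # in-flight branches, oldest first

    def fetch(self, table, index, predicted):
        """Record a new branch; reuse an executed outcome from the same table entry."""
        for older in reversed(self.window):
            if (older.table, older.index) == (table, index) and older.resolved is not None:
                predicted = older.resolved
                break
        record = InFlight(table, index, predicted)
        self.window.append(record)
        return record

    def execute(self, record, actual):
        record.resolved = actual      # known at execution, before retirement

    def retire(self):
        return self.window.pop(0)     # the predictor table is updated at this point


ium = ImmediateUpdateMimicker()
first = ium.fetch(table=2, index=17, predicted=0x100)
ium.execute(first, actual=0x200)      # first was mispredicted
second = ium.fetch(table=2, index=17, predicted=0x100)
print(hex(second.predicted))
```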

[Figure: The Immediate Update Mimicker. A window of in-flight branches between Fetch and Misprediction, each recorded as P T A or E T A, where P = P(rediction), T = T(able), A = A(ddress in the table); branches using the same table and the same entry are matched.]

Slide17

For the competition: interleaving

[Figure: storage interleaving. Crossbars (Xbar) on the h[0,L1] index paths, tag comparisons (=?), and the prediction output.]

Slide18

For the competition

Guided selection of the best set of history lengths:

4K entries: 0

4K entries: 0, 10

4K entries: 16, 27, 44, 60, 96, 109, 219, 449

2K entries: 487, 714, 1313, 2146, 3881

Remember: 10 bits per indirect, 5 per call

Slide19

Where is the limit?

Less than 3 % MPPKI

Why did you not use the « 12-bit pointer » trick?

Just winning 0.5 % MPPKI

Slide20

Summary

ITTAGE directly derived from TAGE

History should include (PC+target) for indirect and calls

Locality on targets can be leveraged

Marginal tricks not really worth it
