Slide1
A 64 Kbytes ITTAGE indirect branch predictor
André Seznec
INRIA/IRISA
Slide2
Build on ITTAGE
ITTAGE:
Introduced at the same time as TAGE (JILP 2006)
Derived directly from the TAGE predictor: target prediction instead of direction prediction
Slide3
ITTAGE: multiple tables, global history predictor
The set of history lengths forms a geometric series
What is important: L(i)-L(i-1) is drastically increasing
Most of the storage for short history!
{0, 2, 4, 8, 16, 32, 64, 128}
Capture correlation on very long histories
Slide4
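The geometric series of history lengths (Slide3) can be sketched as follows. This is a minimal illustration, not the author's code: the function name, the rounding rule and the parameters `l_min`, `l_max`, `n_tables` are assumptions. The leading 0 in the slide's set is the history length of the tagless base predictor and is left out here.

```python
def geometric_history_lengths(l_min, l_max, n_tables):
    """Return n_tables history lengths growing geometrically from l_min to l_max.

    Hypothetical helper: L(i) = round(l_min * ratio**i), with the ratio chosen
    so that the last table uses l_max bits of history.
    """
    if n_tables < 2:
        raise ValueError("need at least two tagged tables")
    ratio = (l_max / l_min) ** (1.0 / (n_tables - 1))
    return [int(l_min * ratio ** i + 0.5) for i in range(n_tables)]


lengths = geometric_history_lengths(2, 128, 7)
# Gap between consecutive lengths, i.e. L(i) - L(i-1)
gaps = [b - a for a, b in zip(lengths, lengths[1:])]
print(lengths)
print(gaps)
```

With these parameters the lengths match the tagged-table part of the slide's set, and the gaps L(i)-L(i-1) double at every step, which is why most tables (and so most storage) cover short histories.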
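The ITTAGE predictor
[Figure: a tagless base predictor indexed with pc, plus tagged tables indexed with pc and h[0:L1], h[0:L2], h[0:L3]. Tag comparators (=?) on the tagged tables select which 32-bit target becomes the prediction.]
Slide5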
Prediction computation
General case:
Longest matching component provides the prediction
Special case:
Many mispredictions on newly allocated entries: weak Ctr
Sometimes Altpred is (slightly) more accurate than Pred
Property dynamically monitored through a single 4-bit counter
-2 % MPPKI
Slide6
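The prediction computation rule (Slide5) could look roughly like the sketch below. It is an illustration under assumptions: the names `Hit`, `select_target` and `update_use_alt` are hypothetical, the 4-bit counter is modeled as an unsigned value in 0..15, and the threshold of 8 is an assumed midpoint that the slides do not give.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    target: int  # target stored in the tag-matching entry
    ctr: int     # 2-bit hysteresis counter, 0 means "weak" (newly allocated)


def select_target(base_target: int, hits: List[Optional[Hit]], use_alt: int) -> int:
    """hits[i] is the matching entry of tagged table i (tables ordered by
    increasing history length), or None when the tag did not match."""
    matching = [h for h in hits if h is not None]
    if not matching:
        return base_target                      # only the tagless base predictor
    pred = matching[-1]                         # longest matching component
    altpred = matching[-2].target if len(matching) > 1 else base_target
    if pred.ctr == 0 and use_alt >= 8:          # assumed threshold
        return altpred                          # weak entry: trust Altpred
    return pred.target


def update_use_alt(use_alt: int, pred_target: int, alt_target: int, actual: int) -> int:
    """Train the 4-bit counter; meant to be called only when Pred was weak."""
    if pred_target == alt_target:
        return use_alt                          # no information when they agree
    if alt_target == actual:
        return min(15, use_alt + 1)
    if pred_target == actual:
        return max(0, use_alt - 1)
    return use_alt
```

A single global counter keeps the cost negligible; it only decides which of two already computed targets to use.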
A tagged table entry
Ctr: 2-bit hysteresis counter
U: 1-bit useful counter
Was the entry recently useful?
Tag: partial tag
Target: the target, 32 bits or some way to reconstruct it
[Figure: entry layout: Target | Tag | Ctr | U]
Slide7
Allocate entries on mispredictions
Allocate entries in longer history length tables
On tables with U unset
Set Ctr to Weak and U to 0
HUGE STORAGE BUDGET:
Up to 3 entries allocated in different tables
Fast warming
Slide8
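The allocation policy on mispredictions (Slide7) can be sketched as below. This is an assumption-laden illustration: `Entry`, `allocate` and `MAX_ALLOC` are hypothetical names, and taking the first eligible tables in order is a simplification, since the slides do not say how the tables are picked.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    tag: int = 0
    target: int = 0
    ctr: int = 0  # 0 = weak
    u: int = 0    # 1-bit useful counter


MAX_ALLOC = 3  # "up to 3 entries allocated in different tables"


def allocate(candidates: List[Entry], tags: List[int], provider: int, target: int) -> int:
    """Allocate entries after a misprediction.

    candidates[i] is the entry the branch indexes in tagged table i and tags[i]
    the tag it would carry there; provider is the index of the table that gave
    the prediction (-1 for the tagless base). Returns the number of entries
    allocated (0 means the allocation failed).
    """
    allocated = 0
    for i in range(provider + 1, len(candidates)):  # longer histories only
        entry = candidates[i]
        if entry.u == 0:                             # only where U is unset
            entry.tag, entry.target = tags[i], target
            entry.ctr, entry.u = 0, 0                # Ctr weak, U = 0
            allocated += 1
            if allocated == MAX_ALLOC:
                break
    return allocated
```

Allocating several entries at once spends storage to warm up faster, which the slide justifies by the large storage budget.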
Managing the (U)seful bit
Setting when the entry avoids a misprediction: (Pred = target) & (Alt ≠ target)
Global reset when « difficulties » to allocate
Dynamically monitor if more failures than successes on allocations
Slide9
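The management of the U bit (Slide8) might be modeled as in the sketch below. The names `update_useful` and `AllocationMonitor` are hypothetical, and the saturating "tick" counter with its threshold is an assumed way to detect "more failures than successes"; the slides do not specify the mechanism.

```python
def update_useful(u: int, pred_target: int, alt_target: int, actual: int) -> int:
    """Set U when the entry avoided a misprediction:
    (Pred = target) & (Alt != target). Otherwise leave it unchanged."""
    if pred_target == actual and alt_target != actual:
        return 1
    return u


class AllocationMonitor:
    """Counts allocation failures against successes (assumed mechanism)."""

    def __init__(self, threshold: int = 256):
        self.threshold = threshold  # assumed value, not from the slides
        self.tick = 0

    def record(self, successes: int, failures: int) -> bool:
        """Return True when all U bits should be globally reset."""
        self.tick = max(0, self.tick + failures - successes)
        if self.tick >= self.threshold:
            self.tick = 0
            return True
        return False


def global_reset(u_bits_per_table):
    """Clear every U bit; u_bits_per_table is a list of lists of 0/1."""
    for u_bits in u_bits_per_table:
        for i in range(len(u_bits)):
            u_bits[i] = 0
```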
Most of the storage space for targets
32 bits per entry!
More than 12K (PC, target) pairs on CLIENT05
But only a maximum of 4038 different targets
Use 12-bit pointers + a 4K table
Slide10
Let us be realistic: leverage target locality
All targets in at most 90 256KB regions
Use a 128-entry region table:
Fully associative, 240 bytes
Saves 7 bits per ITTAGE entry
Would have saved 39 bits on a 64-bit architecture!
Slide11
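The region-table figures (Slide10) follow from simple bit arithmetic: a 256KB region needs an 18-bit offset, so a 32-bit target has a 14-bit region number, and a 128-entry table replaces it with a 7-bit pointer (14 - 7 = 7 bits saved; 46 - 7 = 39 bits on 64-bit addresses). The 240 bytes correspond to 15 bits per region-table entry; the slides do not say what the bit beyond the 14-bit region number holds. The sketch below is an illustration only: `RegionTable`, `encode`, `decode` and `bits_saved` are hypothetical names, and no replacement policy is modeled.

```python
OFFSET_BITS = 18      # 256 KB regions
TABLE_ENTRIES = 128   # 128-entry region table
POINTER_BITS = 7      # log2(128)


def bits_saved(address_bits: int) -> int:
    """Bits saved per ITTAGE entry by storing a pointer instead of a region number."""
    return (address_bits - OFFSET_BITS) - POINTER_BITS


class RegionTable:
    """Fully associative table of region numbers (no replacement modeled)."""

    def __init__(self):
        self.regions = []  # list position = region pointer

    def encode(self, target: int):
        """Split a target into (region pointer, region offset)."""
        region = target >> OFFSET_BITS
        offset = target & ((1 << OFFSET_BITS) - 1)
        if region not in self.regions:
            if len(self.regions) == TABLE_ENTRIES:
                raise RuntimeError("region table full: a real design needs replacement")
            self.regions.append(region)
        return self.regions.index(region), offset

    def decode(self, pointer: int, offset: int) -> int:
        """Reconstruct the full target from pointer and offset."""
        return (self.regions[pointer] << OFFSET_BITS) | offset


table = RegionTable()
pointer, offset = table.encode(0x12345678)
print(bits_saved(32), bits_saved(64), hex(table.decode(pointer, offset)))
```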
[Figure: tagged entry layout: Target | Tag | Ctr | U, with the Target field split into a region pointer and a region offset.]
Slide12
The global history
Conventional global branch history
10 bits for indirect jumps, 5 bits for calls
mixing target and PC
-16 % MPPKI
Slide13
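The history update for indirect jumps and calls (Slide12) can be sketched as below. Only the widths (10 bits per indirect jump, 5 bits per call) and the idea of mixing target and PC come from the slides; the `mix` hash, the function names, the `kind` strings and the `max_len` cap are assumptions made for illustration.

```python
HIST_BITS = {"indirect": 10, "call": 5}  # bits inserted per branch kind


def mix(pc: int, target: int, nbits: int) -> int:
    """Hypothetical hash: XOR PC with target, then fold down to nbits."""
    x = pc ^ target
    folded = 0
    while x:
        folded ^= x & ((1 << nbits) - 1)
        x >>= nbits
    return folded


def push_history(history: int, pc: int, target: int, kind: str, max_len: int = 4096) -> int:
    """Shift nbits of mixed (PC, target) information into the global history.

    history is kept as a Python int, newest bits in the low positions.
    """
    if kind not in HIST_BITS:
        raise ValueError(f"unknown branch kind: {kind}")
    nbits = HIST_BITS[kind]
    history = (history << nbits) | mix(pc, target, nbits)
    return history & ((1 << max_len) - 1)  # drop bits older than max_len


h = push_history(0, 0x400, 0x800, "indirect")
h = push_history(h, 0x404, 0x900, "call")
```

Inserting several bits per branch lets the history distinguish different targets of the same indirect branch, which a single taken/not-taken bit cannot do.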
The global history (2)
Including all branches?
Only indirect and calls: -2.5 % MPPKI
But no conclusion: without 2 branches on INT05 and INT06, it goes just the other way
Slide14
+ the other tricks (for TAGE)
Immediate Update Mimicker
Storage space interleaving
Picking the best set of history lengths
-1 % MPPKI
Slide15
The Immediate Update Mimicker
Issue:
Some mispredictions due to late updates at retirement
Immediate Update Mimicker:
Try to catch these cases
Slide16
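One possible reading of the Immediate Update Mimicker (Slide15) is sketched below: each in-flight branch records the target it used plus the table and entry that provided it, and a later fetch hitting the same table and entry reuses the target of an already resolved in-flight branch instead of the stale table content. The class, its method names and the record layout are hypothetical, and the squash-on-misprediction behavior is an assumption about the surrounding pipeline.

```python
from collections import deque


class ImmediateUpdateMimicker:
    """Tracks in-flight branches as [table, index, target, executed] records."""

    def __init__(self):
        self.inflight = deque()  # oldest record on the left

    def fetch(self, table: int, index: int, predicted_target: int):
        """Record a fetched branch; override the prediction when a resolved
        in-flight branch used the same table and the same entry."""
        target = predicted_target
        for rec in reversed(self.inflight):      # youngest first
            if rec[0] == table and rec[1] == index and rec[3]:
                target = rec[2]
                break
        rec = [table, index, target, False]
        self.inflight.append(rec)
        return rec

    def resolve_misprediction(self, rec, actual_target: int):
        """Squash younger branches and keep the corrected target in the record."""
        if not any(r is rec for r in self.inflight):
            raise ValueError("record is not in flight")
        while self.inflight[-1] is not rec:
            self.inflight.pop()
        rec[2] = actual_target
        rec[3] = True

    def retire(self):
        """Oldest branch leaves the window; the real tables are updated here."""
        return self.inflight.popleft()
```

In this reading the predictor tables are still written only at retirement; the mimicker just covers the window between execution and retirement.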
The Immediate Update Mimicker
[Figure: in-flight branches between Fetch and a Misprediction, each recorded as P(rediction), T(able), A(ddress in the table); some records are marked E instead of P, and an annotation highlights records with the same table and same entry.]
Slide17
For the competition: interleaving
[Figure: tagged tables indexed with h[0,L1], with Xbar blocks around the tables and tag comparators (=?) selecting the prediction.]
Slide18
For the competition
Guided selection of the best set of history lengths:
4K entries: 0
4K entries: 0, 10
4K entries: 16, 27, 44, 60, 96, 109, 219, 449
2K entries: 487, 714, 1313, 2146, 3881
Remember: 10 bits per indirect, 5 per call
Slide19
Where is the limit?
Less than 3 % MPPKI
Why did you not use the « 12-bit pointer » trick?
Just winning 0.5 % MPPKI
Slide20
Summary
ITTAGE directly derived from TAGE
History should include (PC+target) for indirect and calls
Locality on targets can be leveraged
Marginal tricks not really worth it