
Presentation Transcript

Slide1

Rathijit Sen, David A. Wood

Reuse-based Online Models for Caches

6/20/2013, ACM SIGMETRICS 2013 @ CMU, Pittsburgh, PA

Slide2

The Problem

Caches: power vs. performance
Reconfigurable caches, e.g., IvyBridge
The Problem: Which configuration to select?
  e.g., to get the best energy efficiency?

[Figure: eight cores and eight LLC blocks backed by DRAM; arrows labeled Miss and Fetch.]

Slide3

Cache Performance Prediction

We propose a framework: h = (r · B) · φ
  h: hit ratio
  r: reuse-distance distribution (novel hardware support)
  B: stochastic Binomial matrix
  φ: hit function (LRU, PLRU, RANDOM, NMRU)
Case study: Energy-Delay Product (EDP) within 7% of minimum

Slide4

Agenda

The Problem
Framework
  Locality (r)
  Matrix transformations (B)
  Hit functions (φ)
  h = (r · B) · φ
Hardware support
Case Study

Slide5

Cache Overview

Limited storage
Sets of (usually 64-byte) blocks
#blocks/set = associativity (#ways)
Set index + address tags identify data

[Figure: cache drawn as a grid of blocks (b), with Sets (S) on one axis and Associativity (A) on the other. An Address selects a set; Tag Match? Y: Hit, N: Miss.]
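The lookup described on this slide can be sketched in a few lines. This is a minimal illustration, assuming plain modulo indexing and a cache modeled as a list of per-set tag lists; the names `split_address` and `lookup` are hypothetical and do not come from the talk.

```python
def split_address(addr, num_sets, block_bytes=64):
    """Split a byte address into (tag, set_index, block_offset)."""
    offset = addr % block_bytes      # byte within the 64-byte block
    block = addr // block_bytes      # block address
    set_index = block % num_sets     # which set the block maps to
    tag = block // num_sets          # remaining bits identify the block
    return tag, set_index, offset


def lookup(cache, addr, num_sets, block_bytes=64):
    """cache[s] holds the tags resident in set s (at most A of them).

    Returns True on a hit (tag match in the indexed set), False on a miss.
    """
    tag, set_index, _ = split_address(addr, num_sets, block_bytes)
    return tag in cache[set_index]


# Example: S = 1024 sets, one resident block.
cache = [[] for _ in range(1024)]
tag, set_index, offset = split_address(0x12345, 1024)
cache[set_index].append(tag)
```

After this setup, `lookup(cache, 0x12345, 1024)` reports a hit, while an address in the same set with a different tag reports a miss.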

Slide6

Last-Level Cache (LLC) Workload Variation

[Figure: plot with regions or curves labeled by workload: swim; ammp, blackscholes, bodytrack, fluidanimate, freqmine, swaptions; equake, gafort, wupwise; apache; mgrid; zeus; oltp; jbb; fma3d.]

Slide7

Bad configurations hurt!

[Figure: EDP (energy-delay product). Annotations: 27% worse, 218% worse; Minimum, Maximum.]
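To make the "% worse" annotations concrete, here is a small sketch of how EDP and the penalty of a non-optimal configuration can be computed. The configuration names and the energy and delay numbers are made up for illustration; they are not measurements from the talk.

```python
def edp(energy_joules, delay_seconds):
    """Energy-delay product: lower is better."""
    return energy_joules * delay_seconds


def percent_worse_than_best(edps):
    """Map each configuration to how much worse its EDP is than the minimum."""
    best = min(edps.values())
    return {name: 100.0 * (value - best) / best for name, value in edps.items()}


# Hypothetical (energy in J, delay in s) per cache configuration.
measurements = {
    "2MB/4-way": (10.0, 2.0),
    "4MB/8-way": (12.0, 1.5),
    "8MB/16-way": (20.0, 1.4),
}
edps = {name: edp(e, d) for name, (e, d) in measurements.items()}
best_config = min(edps, key=edps.get)
penalty = percent_worse_than_best(edps)
```

With these made-up numbers the middle configuration has the lowest EDP, and picking the largest cache instead would cost roughly 56% in EDP.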

Slide8

Problem Summary

Reconfigurable caches
Multiple replacement policies
Goal: Online miss-ratio prediction

[Figure: cache drawn as a grid of blocks (b), with Sets (S) on one axis and Associativity (A) on the other.]

Slide9

Indexing Assumption

Mapping of unique addresses to cache sets
  Assumption: independent, uniform [Smith, 1978]
  Unique accesses as Bernoulli trials
(Partial) Hashing
  POWER4, POWER5, POWER6, Xeon
  Simple XOR-based function [similar to Cypher, 2008]
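As an illustration of why hashing helps the uniformity assumption, the sketch below compares plain modulo indexing with a simple XOR fold of the block address. The function `set_index_xor` is a hypothetical example of an XOR-based index; it is not the specific function used in the talk or in any of the processors named above.

```python
def set_index_modulo(block_addr, index_bits):
    """Conventional indexing: the low index_bits of the block address."""
    return block_addr & ((1 << index_bits) - 1)


def set_index_xor(block_addr, index_bits):
    """Hypothetical XOR hash: fold the next index_bits onto the low bits."""
    mask = (1 << index_bits) - 1
    low = block_addr & mask
    high = (block_addr >> index_bits) & mask
    return low ^ high


# A stride equal to the number of sets is a worst case for modulo indexing.
index_bits = 10
strided_blocks = [i << index_bits for i in range(8)]
modulo_sets = {set_index_modulo(b, index_bits) for b in strided_blocks}
xor_sets = {set_index_xor(b, index_bits) for b in strided_blocks}
```

Here all eight strided blocks collide in one set under modulo indexing, while the XOR fold spreads them over eight different sets.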

Slide10

Agenda

The Problem
Framework
  Locality (r)
  Matrix transformations (B)
  Hit functions (φ)
  h = (r · B) · φ
Hardware support
Case Study

Slide11

Temporal Locality Metrics

Unique Reuse Distance (URD)
  #unique intervening addresses
  x y z z y x : URD(x)=2
  URD = Stack Distance [Mattson, 1970] - 1
  Large cache ⇒ large distances to track
Absolute Reuse Distance (ARD)
  #intervening addresses
  x y z z y x : ARD(x)=4

[Figure: histogram r of P(URD=i) against i. Size?]
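The two metrics can be computed directly from a trace. The sketch below is a deliberately naive (quadratic-time) reference implementation for checking the definitions on small examples; the function name `reuse_distances` is an assumption, and this is not the hardware mechanism presented later in the talk.

```python
def reuse_distances(trace):
    """Return a list of (address, URD, ARD), one entry per reuse in the trace.

    URD counts the unique addresses between two accesses to the same address;
    ARD counts all intervening accesses.
    """
    last_pos = {}
    results = []
    for pos, addr in enumerate(trace):
        if addr in last_pos:
            between = trace[last_pos[addr] + 1:pos]
            results.append((addr, len(set(between)), len(between)))
        last_pos[addr] = pos
    return results


distances = reuse_distances("x y z z y x".split())
```

For the slide's trace this gives URD(x)=2 and ARD(x)=4, along with the reuses of z (both distances 0) and y (URD 1, ARD 2).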

Slide12

Per-set Locality, r(S)

r(S) is “compressed” as S (#sets) increases
Less of the tail is important

[Figure: histogram r of P(URD=i) against i, with address streams (x marks) shown for #sets: S and for #sets: S' > S.]

Slide13

Agenda

The Problem
Framework
  Locality (r)
  Matrix transformations (B)
  Hit functions (φ)
  h = (r · B) · φ
Hardware support
Case Study

Slide14

Estimating per-set locality

Generalized stochastic Binomial matrices [Strum, 1977]
r(S) = r(1) · B(1 - 1/S, 1/S)
Composition: r(S') = r(S) · B(1 - S/S', S/S')

[Figure: row vector r, with entries P(URD=i) indexed by i, multiplied by the matrix B with rows i and columns k. Entry (i, k) of B is P(k successes in i trials), i.e., P(k of i to the same set); entries with k > i are 0.]
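The transformation and its composition rule can be checked numerically. The sketch below builds B from the binomial probabilities stated on the slide and applies it to a made-up distribution r(1); the helper names (`binomial_matrix`, `apply_matrix`, `per_set`) are assumptions, and the code uses dense matrices rather than any optimization from the paper.

```python
from math import comb


def binomial_matrix(n, p_other, p_same):
    """n x n matrix with entry [i][k] = P(k successes in i trials)."""
    return [
        [comb(i, k) * p_same**k * p_other**(i - k) if k <= i else 0.0
         for k in range(n)]
        for i in range(n)
    ]


def apply_matrix(r, matrix):
    """Row vector times matrix: result[k] = sum_i r[i] * matrix[i][k]."""
    n = len(r)
    return [sum(r[i] * matrix[i][k] for i in range(n)) for k in range(n)]


def per_set(r, sets_from, sets_to):
    """r(S') = r(S) · B(1 - S/S', S/S') for S' >= S."""
    ratio = sets_from / sets_to
    return apply_matrix(r, binomial_matrix(len(r), 1.0 - ratio, ratio))


r1 = [0.1, 0.2, 0.3, 0.4]                       # made-up r(1)
direct = per_set(r1, 1, 4)                      # r(4) in one step
composed = per_set(per_set(r1, 1, 2), 2, 4)     # r(1) -> r(2) -> r(4)
```

Both routes give the same r(4) up to rounding, and the probability mass shifts toward smaller distances, which is the "compression" noted on the per-set locality slide.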

Slide15

Computation reuse & speedup

“Shorter” tail ⇒ smaller matrices
Poisson Approximation

[Diagram: r(1), r(2^10), r(2^11), r(2^12), r(2^13), r(2^14). Labels: "Now: compute", "Later: hardware support", "Size?".]

[Figure: histogram r of P(URD=i) against i.]

Slide16

Size of r(2^10)?

Prediction with r(2^10) limited to URD < n

[Figure: histogram r of P(URD=i) against i.]

Slide17

Agenda

The Problem
Framework
  Locality (r)
  Matrix transformations (B)
  Hit functions (φ)
  h = (r · B) · φ
Hardware support
Case Study

Slide18

Hit Function, φ

φ_k: P(x will hit | URD(x)=k)
Monotonically decreasing model
  Intuition: larger URD ⇒ same or larger eviction probability
  φ_0 = 1
  φ_k ≤ φ_{k-1}
  φ_∞ = 0

[Figure: an access to x, intervening accesses that are not x, then a second access to x.]

Slide19

Hit Function, φ


Example: A=8

Slide20

Formulating φ

φ(LRU): step function
  (r · B) · φ(LRU) ≡ [Smith, 1978], [Hill & Smith, 1989]
φ(PLRU): assumes that, on average, traffic is evenly divided between subtrees
φ(RANDOM): estimates #intervening misses using ARD
φ(NMRU): similar to φ(RANDOM) except φ_1 = 1
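For LRU, the step function is easy to write down. The sketch below assumes the step falls at the associativity A: with URD equal to stack distance minus one, an A-way LRU set hits exactly when the per-set URD is below A. The per-set distribution used here is made up, and the names `phi_lru` and `lru_hit_ratio` are hypothetical.

```python
def phi_lru(assoc, n):
    """Step function: phi_k = 1 for k < A, 0 otherwise (k = per-set URD)."""
    return [1.0 if k < assoc else 0.0 for k in range(n)]


def lru_hit_ratio(r_set, assoc):
    """h = r(S) · phi(LRU); with a step function this is a prefix sum."""
    phi = phi_lru(assoc, len(r_set))
    return sum(p * f for p, f in zip(r_set, phi))


# Made-up per-set reuse-distance distribution r(S), truncated to 5 entries.
r_set = [0.5, 0.2, 0.1, 0.1, 0.1]
h_2way = lru_hit_ratio(r_set, assoc=2)
h_4way = lru_hit_ratio(r_set, assoc=4)
```

Because φ(LRU) is a step, changing the associativity only changes how many leading entries of r(S) are summed, so one distribution yields predictions for every A.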

Slide21

Agenda

The Problem
Framework
  Locality (r)
  Matrix transformations (B)
  Hit functions (φ)
  h = (r · B) · φ
Hardware support
Case Study

Slide22

Prediction Accuracy


LRU, PLRU(A=2), NMRU(A=2): exact per-set model

Others: approximate per-set model

Slide23

Overheads

r' = r · B: 6 → 80 μsec
  Binomial → Poisson approximation for each row of B
h = (r · B) · φ: 20 → 30 μsec
  Average over 24 configurations
  B applied 8 times
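The Binomial-to-Poisson substitution mentioned above can be illustrated numerically. The sketch below compares one row of B with the Poisson distribution of the same mean; it uses the textbook approximation Binomial(i, p) ≈ Poisson(i·p) for small p, and the row index and column count are arbitrary choices, not values from the talk.

```python
from math import comb, exp, factorial


def binomial_row(i, p, n):
    """First n entries of row i of B: P(k successes in i trials)."""
    return [comb(i, k) * p**k * (1.0 - p)**(i - k) for k in range(n)]


def poisson_row(i, p, n):
    """Poisson approximation with mean i * p."""
    lam = i * p
    return [exp(-lam) * lam**k / factorial(k) for k in range(n)]


# Row i = 2048 for a cache with S = 2^10 sets (p = 1/S), first 8 columns.
exact = binomial_row(2048, 1.0 / 1024, 8)
approx = poisson_row(2048, 1.0 / 1024, 8)
max_error = max(abs(a - b) for a, b in zip(exact, approx))
```

The Poisson form avoids large binomial coefficients, and for p = 1/1024 the two rows agree closely.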

Slide24

Agenda

The Problem
Framework
  Locality (r)
  Matrix transformations (B)
  Hit functions (φ)
  h = (r · B) · φ
Hardware support
Case Study

Slide25

Computation reuse & speedup

“Shorter” tail ⇒ smaller matrices
Poisson Approximation

[Diagram: r(1), r(2^10), r(2^11), r(2^12), r(2^13), r(2^14). Labels: "Now: compute", "Later: hardware support", "Now", "Size=512".]

[Figure: histogram r of P(URD=i) against i.]

Slide26

Insights

x y z z y x : URD(x)=2
Unique ⇒ “remember” addresses
  Only cardinality, not full addresses
  Bloom filter for compact (approximate) representation
r(2^10) is seen by any set of a cache with S=2^10
  Filter address stream

[Figure: histogram r of P(URD=i) against i.]

Slide27

Hardware Support for estimating r(2^10)

[Block diagram: Reference address register; Set Filter; Control Logic; 1024-bit Bloom Filter with 2 hash fns; 9-bit Counter; 512-entry Histogram array. Signals: access, filtered access, insert, load, hit, inc, reset, read.]

[Flow: Start Sample; Addr match? (Y: End Sample; N: Unique?); Unique? (Y, i.e., not hit: Remember).]
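The flow on this slide can be mimicked in software. The following is a rough behavioral sketch, not the actual hardware design: the two hash functions, the modulo set filter, the choice to start a sample on the next access, and the class name `URDSampler` are all assumptions made for illustration. Only the sizes (1024-bit filter, 2 hashes, 9-bit counter, 512-entry histogram) come from the slide.

```python
class URDSampler:
    """Behavioral model of a Bloom-filter-based URD sampler (assumed details)."""

    def __init__(self, num_sets=1024, bloom_bits=1024, hist_entries=512):
        self.num_sets = num_sets
        self.bloom_bits = bloom_bits
        self.histogram = [0] * hist_entries    # 512-entry histogram array
        self.max_count = hist_entries - 1      # 9-bit counter saturates at 511
        self.reference = None                  # reference address register
        self.bloom = [False] * bloom_bits
        self.counter = 0

    def _hashes(self, block):
        # Two hypothetical hash functions; the slide does not specify them.
        tag = block // self.num_sets
        return tag % self.bloom_bits, (31 * tag + 7) % self.bloom_bits

    def access(self, block):
        if self.reference is None:             # Start Sample
            self.reference = block
            self.bloom = [False] * self.bloom_bits
            self.counter = 0
            return
        if block % self.num_sets != self.reference % self.num_sets:
            return                             # Set Filter drops other sets
        if block == self.reference:            # Addr match? Y: End Sample
            self.histogram[self.counter] += 1
            self.reference = None
            return
        h1, h2 = self._hashes(block)
        if not (self.bloom[h1] and self.bloom[h2]):   # Unique? Y (not hit)
            self.bloom[h1] = self.bloom[h2] = True    # Remember
            self.counter = min(self.counter + 1, self.max_count)


sampler = URDSampler()
# Block addresses: 5, 1029 and 2053 share set 5; block 6 maps elsewhere.
for block in [5, 1029, 2053, 2053, 1029, 6, 5]:
    sampler.access(block)
```

This trace mirrors the x y z z y x example within one set, so the sampler records a single sample with URD 2. Bloom-filter false positives can undercount unique addresses, which is why the representation is described as approximate.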

Slide28

Agenda

The Problem
Framework
  Locality (r)
  Matrix transformations (B)
  Hit functions (φ)
  h = (r · B) · φ
Hardware support
Case Study
+ way counters

Slide29

LRU Way Counters [Suh, et al. 2002]

One counter per logical way (stack position)
Determining logical position is hard
  not totally (re-)ordered with every access
  heuristics, e.g., for PLRU [Kedzierski, et al. 2010]
Other Limitations
  Inclusion property
  Fixed #sets
    S' = S: special case of reuse framework
    S' ≠ S? Use B, provided enough tail of r(S) is available

Slide30

Min. EDP configuration


EDP within 7% of minimum

Reuse models outperform PLRU way counters in most cases

Slide31

Summary

The Problem: Online miss-rate estimation for reconfigurable caches
We propose a framework: h = (r · B) · φ
  h: hit ratio
  r: reuse-distance distribution (novel hardware support)
  B: stochastic Binomial matrix
  φ: hit function (LRU, PLRU, RANDOM, NMRU)
Case study: EDP within 7% of minimum
Future work: More policies, applications/case studies

Slide32

Also in the paper

r: lossy summarization of the address trace
Estimation for ARD
Optimizations for LRU
Conditions for PLRU eviction
More details on models & evaluation

Slide33

Reuse-based Online Models for Caches


Questions?

Slide34

Example LLC performance

OLTP (TPC-C + IBM DB2)

Slide35

Estimating cache performance

Hit ratio = hits/access ≈ ∑_i P(URD=i) · P(hit|URD=i) = r · φ
Miss ratio = misses/access = 1 - hit ratio
Miss rate = misses/instruction = miss ratio × accesses/instruction

[Figure: row vector r with entries P(URD=i), and column vector φ with entries P(hit|URD=i), both indexed by i.]
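The three quantities on this slide chain together as plain arithmetic. The sketch below uses made-up values for r, φ, and accesses per instruction purely to show the conversion from hit ratio to miss rate; the function names are hypothetical.

```python
def hit_ratio(r, phi):
    """Sum over i of P(URD=i) * P(hit | URD=i), i.e., the dot product r · phi."""
    return sum(p * f for p, f in zip(r, phi))


def miss_ratio(r, phi):
    """Misses per access."""
    return 1.0 - hit_ratio(r, phi)


def miss_rate(r, phi, accesses_per_instruction):
    """Misses per instruction."""
    return miss_ratio(r, phi) * accesses_per_instruction


# Made-up inputs: a 4-entry distribution and a non-step hit function.
r = [0.4, 0.3, 0.2, 0.1]
phi = [1.0, 1.0, 0.5, 0.0]
h = hit_ratio(r, phi)
m = miss_ratio(r, phi)
mpi = miss_rate(r, phi, accesses_per_instruction=0.05)
```

With these numbers the hit ratio is 0.8, the miss ratio 0.2, and the miss rate 0.01 misses per instruction.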

Slide36

URD vs ARD

[Figure: access sequence between two references to x:
x z_0 {z_0}* z_1 {z_0,z_1}* z_2 {z_0,z_1,z_2}* z_3 ... z_{k-1} {z_0,z_1,z_2,...,z_{k-1}}* x]

Approximation: d_k = d_{k-1} + 1/(... r_i ...), where the operator and the index limits in i and k are missing from the transcript.