DICE: Compressing DRAM Caches for Bandwidth and Capacity

Uploaded On 2020-04-06


Presentation Transcript

Slide1

DICE: Compressing DRAM Caches for Bandwidth and Capacity

Vinson Young, Prashant Nair, Moinuddin Qureshi

Slide2

MOORE'S LAW HITS BANDWIDTH WALL

Moore's scaling encounters the Bandwidth Wall

Slide3

3D-DRAM MITIGATES BANDWIDTH WALL

3D-DRAM: Hybrid Memory Cube (HMC) from Micron, High Bandwidth Memory (HBM) from Samsung

3D-DRAM improves bandwidth, but does not have the capacity to replace conventional DIMM memory.

Slide4

3D-DRAM as a CACHE (3D-DRAM CACHE)

[Figure: memory hierarchy from fast to slow: CPUs with L1$ and L2$, the L3$, the 3D-DRAM cache, and system memory (the OS-visible space).]

Architecting 3D-DRAM as a cache can improve memory bandwidth (and avoids OS/software changes).

Examples: MCDRAM from Intel, HBC from AMD

Slide5

PRACTICAL 3D-DRAM CACHE: ALLOY CACHE

Practical DRAM cache: low latency and bandwidth-efficient

Tags are "part-of-line": Alloy Tag+Data avoids tag serialization
Each access returns one "Tag+Data" unit
Similar to the DRAM cache in KNL: direct-mapped, tags in ECC
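A minimal Python sketch of the direct-mapped, tag-with-data lookup described on this slide. The class name `AlloyCacheSketch`, the field names, and the set count are illustrative assumptions; the slides do not give an implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TagAndData:
    """One "Tag+Data" unit: the tag is stored next to the line it describes."""
    tag: int
    data: bytes


class AlloyCacheSketch:
    """Toy direct-mapped cache whose tag travels with the data (hypothetical)."""

    def __init__(self, num_sets: int) -> None:
        self.num_sets = num_sets
        self.sets: List[Optional[TagAndData]] = [None] * num_sets

    def _split(self, line_addr: int) -> Tuple[int, int]:
        # Direct-mapped: the low part selects the set, the rest forms the tag.
        return line_addr % self.num_sets, line_addr // self.num_sets

    def install(self, line_addr: int, data: bytes) -> None:
        index, tag = self._split(line_addr)
        # Overwrites whatever line previously occupied this set.
        self.sets[index] = TagAndData(tag, data)

    def read(self, line_addr: int) -> Optional[bytes]:
        index, tag = self._split(line_addr)
        unit = self.sets[index]  # one access returns tag and data together
        if unit is not None and unit.tag == tag:
            return unit.data  # hit, with no separate tag lookup beforehand
        return None  # miss: the request would go to main memory
```

Because the tag arrives in the same access as the data, a hit costs a single DRAM-cache access, which is the "avoid tag serialization" point above.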

Slide6

3D-DRAM Cache Bandwidth is important

On an 8-CPU, 1GB DRAM cache configuration:

A 2x-capacity cache improves performance by 10%.
An additional 2x bandwidth increases the speedup to 22%.

Improving both bandwidth and capacity is valuable.

Slide7

INTRODUCTION: DRAM CACHE

Baseline: Direct-Mapped, One Data Block in an access

[Figure: requests for lines A, B, C, D against four cache organizations: Baseline, Traditional Compression (Incompressible), Spatial Indexing (Incompressible), and Spatial Indexing (Compressible). Some sets also hold lines W, X, Y, Z.]

Slide8

INTRODUCTION: COMPRESSED DRAM CACHE

Compression: adds capacity, but does it improve bandwidth?

[Figure: requests for lines A, B, C, D against Traditional Compression and Spatial Indexing, each shown with compressible and incompressible lines; some sets also hold lines W, X, Y, Z. Annotation: 1x Bandwidth.]

Slide9

INTRODUCTION: COMPRESSED DRAM CACHE

Compression: adds capacity, but does it improve bandwidth?

[Figure: requests for lines A, B, C, D against Traditional Compression and Spatial Indexing, each shown with compressible and incompressible lines; some sets also hold lines W, X, Y, Z. Annotation: 1x Bandwidth.]

Slide10

INTRODUCTION: COMPRESSED DRAM CACHE

Compression: adds capacity, but does it improve bandwidth?

[Figure: requests for lines A, B, C, D against Traditional Compression and Spatial Indexing, each shown with compressible and incompressible lines; some sets also hold lines W, X, Y, Z. Annotation: 4 accesses @ 1x-2x capacity.]

Slide11

INTRODUCTION: COMPRESSED DRAM CACHE

Compression: adds capacity, but does it improve bandwidth?

[Figure: requests for lines A, B, C, D against Traditional Compression and Spatial Indexing, each shown with compressible and incompressible lines; some sets also hold lines W, X, Y, Z. Annotation: 2x Bandwidth.]

Slide12

INTRODUCTION: COMPRESSED DRAM CACHE

Compression: adds capacity, but does it improve bandwidth?

[Figure: requests for lines A, B, C, D against Traditional Compression and Spatial Indexing, each shown with compressible and incompressible lines; some sets also hold lines W, X, Y, Z. Annotation: B, D? <1x Bandwidth.]

Slide13

INTRODUCTION: COMPRESSED DRAM CACHE

Compression: adds capacity, but does it improve bandwidth?

Traditional Compression: compressible 1x bandwidth; incompressible 1x bandwidth
Spatial Indexing: compressible 2x bandwidth; incompressible <1x bandwidth
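The four cases can be reproduced with a toy accounting model, sketched here in Python. It assumes a stream of four adjacent lines A, B, C, D, and it assumes that with incompressible data only one line of each adjacent pair fits in a spatially indexed set. The function name and the exact counts are illustrative assumptions, not figures from the slides.

```python
def useful_lines_per_access(scheme: str, compressible: bool) -> float:
    """Requested lines delivered per DRAM-cache access for A, B, C, D.

    1.0 corresponds to the uncompressed baseline ("1x bandwidth").
    """
    if scheme == "traditional":
        # Adjacent lines live in different sets, so every access returns one
        # requested line, whether or not the set also holds a compressed,
        # non-adjacent line.
        accesses, delivered = 4, 4
    elif scheme == "spatial":
        if compressible:
            # Pairs (A, B) and (C, D) share a set and both lines fit.
            accesses, delivered = 2, 4
        else:
            # Assumption: only one line of each pair fits, so the accesses
            # for B and D return nothing useful.
            accesses, delivered = 4, 2
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return delivered / accesses
```

Under these assumptions the model gives 1x for traditional compression in both cases, 2x for spatial indexing with compressible lines, and less than 1x for spatial indexing with incompressible lines.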

Slide14

INTRODUCTION: TRADITIONAL COMPRESSION

Compression for capacity (TSI) sees little speedup (7%) due to diminishing returns on giga-scale caches.

Improves capacity
No degradation
Little speedup (7%)

Slide15

INTRODUCTION: SPATIAL INDEXING

Spatial Indexing compression gets the benefits of both bandwidth and capacity when lines are compressible, but it hurts performance when lines are incompressible.

Improves bandwidth
Can degrade
No speedup

Slide16

INTRODUCTION: COMPRESSED DRAM CACHE

Goal: Compression for Capacity AND Bandwidth

Traditional Compression: compressible 1x bandwidth; incompressible 1x bandwidth
Spatial Indexing: compressible 2x bandwidth; incompressible <1x bandwidth

DICE (Dynamic Index): 19% speedup + 36% EDP

Slide17

DICE OVERVIEW

Compressed DRAM Cache Organization
Flexible Mapping for Quick Switching
Dynamic Indexing ComprEssion (DICE)
Insertion Policy
Index Prediction

Slide18

PRACTICAL DRAM CACHE COMPRESSION

Compression: Simple changes within the controller

[Figure: on-chip, the L3 cache and the L4 cache controller, which contains the compression logic and decompression logic; off-chip, the DRAM cache (compressed) and memory. Arrows are labeled Read, Write, Install, and Writeback.]

Slide19

DRAM CACHE TAG FORMAT

[Figure: one access returns Tag A (8 bytes) followed by Data A (64 bytes); the tag boundary separates tag from data.]

The cache controller receives 72B of tag+data. It can flexibly interpret bits as tag bits or data bits.
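A short Python sketch of what "flexibly interpret" can mean for the 72-byte unit. It assumes the tag bytes come first and that the boundary moves in whole bytes; both are simplifications for illustration, and `split_unit` is a hypothetical helper name.

```python
from typing import Tuple

TAG_BYTES = 8
DATA_BYTES = 64
UNIT_BYTES = TAG_BYTES + DATA_BYTES  # 72 bytes arrive per access


def split_unit(unit: bytes, tag_bytes: int = TAG_BYTES) -> Tuple[bytes, bytes]:
    """Split one 72-byte transfer into (tag region, data region)."""
    if len(unit) != UNIT_BYTES:
        raise ValueError(f"expected {UNIT_BYTES} bytes, got {len(unit)}")
    if not 0 <= tag_bytes <= UNIT_BYTES:
        raise ValueError("tag boundary must lie inside the unit")
    # The controller decides where the boundary is; the DRAM just returns bytes.
    return unit[:tag_bytes], unit[tag_bytes:]
```

With the default boundary this yields 8 tag bytes and 64 data bytes; calling `split_unit(unit, 16)` treats 16 bytes as tag space and leaves 56 bytes for compressed data.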

Slide20

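PROPOSED FLEXIBLE TAG FORMAT

[Figure: a 72-byte unit holding lines A, B, and X; each leading field is checked with "Is Tag?" until a "Not Tag" field marks the tag boundary, and the remaining bytes are data.]

We create tag space as needed, for up to 28 lines. This achieves 1.6x effective capacity.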

Slide21

DICE OVERVIEW

Compressed DRAM Cache Organization
Flexible Mapping for Quick Switching
Dynamic Indexing ComprEssion (DICE)
Insertion Policy
Index Prediction

Slide22

Flexible Mapping (TSI or BAI)

[Figure: lines 0 to 7 mapped to cache sets under Traditional Set Indexing (TSI), Naïve Spatial Indexing, and Bandwidth-Aware Indexing (BAI).]

Bandwidth-Aware Indexing (BAI) facilitates quick switching between the two indices, TSI and BAI.

Slide23

Flexible Mapping (TSI or BAI)

[Figure: continuation of the diagram mapping lines 0 to 7 to cache sets under Traditional Set Indexing (TSI), Naïve Spatial Indexing, and Bandwidth-Aware Indexing (BAI).]

Bandwidth-Aware Indexing (BAI) facilitates quick switching between the two indices, TSI and BAI.

Slide24

Flexible Mapping (TSI or BAI)

[Figure: continuation of the diagram mapping lines 0 to 7 to cache sets under Traditional Set Indexing (TSI), Naïve Spatial Indexing, and Bandwidth-Aware Indexing (BAI).]

Bandwidth-Aware Indexing (BAI) facilitates quick switching between the two indices, TSI and BAI.
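The three mappings can be compared with a small Python sketch. `tsi` and `naive_spatial` follow the usual definitions of modulo indexing and pair-per-set indexing. The slides do not give the BAI function, so `bai_sketch` is a hypothetical mapping chosen only to have two properties: adjacent lines share a set, and that set is always the TSI set of one of the two lines.

```python
def tsi(line: int, num_sets: int) -> int:
    """Traditional Set Indexing: consecutive lines go to consecutive sets."""
    return line % num_sets


def naive_spatial(line: int, num_sets: int) -> int:
    """Naive spatial indexing: each adjacent pair of lines shares one set."""
    return (line // 2) % num_sets


def bai_sketch(line: int, num_sets: int) -> int:
    """Hypothetical bandwidth-aware index (an assumption, not from the slides).

    Requires an even num_sets. Both lines of an adjacent pair map to one of
    the two neighboring sets that TSI would use for that pair.
    """
    pair_base = tsi(line, num_sets) & ~1  # even-numbered set of the pair
    select = (line // num_sets) & 1       # lowest tag bit picks even or odd set
    return pair_base | select


if __name__ == "__main__":
    for line in range(8):
        print(line, tsi(line, 4), naive_spatial(line, 4), bai_sketch(line, 4))
```

With four sets and lines 0 to 7, `bai_sketch` agrees with `tsi` for half of the lines and differs only in the lowest set bit for the rest, whereas `naive_spatial` can place a line far from its TSI set. Keeping the two candidate locations next to each other is one way switching between indices could stay cheap.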

Slide25

DICE OVERVIEW

Compressed DRAM Cache Organization
Flexible Mapping for Quick Switching
Dynamic Indexing ComprEssion (DICE)
Insertion Policy
Index Prediction

Slide26

DICE: Dynamic-indexed Compressed Cache

DICE (Dynamic-Indexing Cache comprEssion) decides the index on install and predicts the index on read.

[Figure: installs into the DRAM cache pass through Compressibility-Based Insertion and reads pass through Cache Index Prediction; each selects between the Bandwidth-Aware Index and the Traditional Set Index, with a "TSI = BAI" case marked.]

Slide27

Compressibility-based Insertion

Compressibility-based insertion uses Bandwidth-Aware Indexing when lines are compressible, and TSI otherwise.

[Figure: on install into the DRAM cache, a line that compresses to <= ½-size uses the Bandwidth-Aware Index and a line > ½-size uses the Traditional Set Index; a "TSI = BAI" case is marked.]

No explicit swaps. Eviction and install decide the policy.

On a read, the index used at install is not known ("?"), but checking both locations wastes bandwidth.
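The install-time rule on this slide can be written as a one-line decision, sketched here in Python. The 64-byte line size and the function name `choose_install_index` are assumptions for illustration; the half-size threshold is the one shown on the slide.

```python
LINE_BYTES = 64  # assumed uncompressed line size


def choose_install_index(compressed_bytes: int, line_bytes: int = LINE_BYTES) -> str:
    """Pick the index policy at install time from the compressed size."""
    if compressed_bytes < 0 or compressed_bytes > line_bytes:
        raise ValueError("compressed size must be between 0 and the line size")
    if compressed_bytes <= line_bytes // 2:
        # At most half size: two adjacent lines can share one set.
        return "BAI"
    # Larger than half size: keep the traditional location.
    return "TSI"
```

Because the decision is made only when a line is installed, no line is ever moved between its two candidate locations, which matches the "no explicit swaps" note above.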

Slide28

SIMILAR INTRA-PAGE COMPRESSIBILITY

Lines within a page have similar compressibility.

[Figure: indices seen in a compressible page. Installs of lines <= ½-size all use the Bandwidth-Aware Index, so a read uses BAI.]

DICE is likely to install the lines of a page using the same index.

Slide29

SIMILAR INTRA-PAGE COMPRESSIBILITY

Lines within a page have similar compressibility.

[Figure: indices seen in an incompressible page. Installs of lines > ½-size all use the Traditional Set Index, so a read uses TSI.]

Thus, page-based last-time prediction of the index can be accurate (94%). A 2nd access is needed only on a mispredict.

Slide30

PAGE-BASED CACHE INDEX PREDICTOR (CIP)

Page-based last-time prediction exploits similar intra-page compressibility to achieve high prediction accuracy (94%).

[Figure: on a demand access, the page number is hashed to select an entry in the Last-Time Table (LTT), which holds one bit per entry: 0 = Traditional Set Index, 1 = Bandwidth-Aware Index. The example lookup predicts Traditional Set Index.]
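A minimal Python sketch of a page-based last-time predictor of this kind. The table size, the 64 lines per page (4KB pages with 64B lines), the modulo hash, and all names are assumptions for illustration; the slides only state that the page number is hashed into a one-bit-per-entry Last-Time Table.

```python
class CacheIndexPredictorSketch:
    """Toy last-time predictor: remembers the index last seen for a page."""

    TSI = 0  # Traditional Set Index
    BAI = 1  # Bandwidth-Aware Index

    def __init__(self, entries: int = 8, lines_per_page: int = 64) -> None:
        self.entries = entries
        self.lines_per_page = lines_per_page
        self.table = [self.TSI] * entries  # Last-Time Table, one bit per entry

    def _slot(self, line_addr: int) -> int:
        page = line_addr // self.lines_per_page
        return page % self.entries  # stand-in for the hardware hash

    def predict(self, line_addr: int) -> int:
        return self.table[self._slot(line_addr)]

    def update(self, line_addr: int, actual_index: int) -> None:
        # Last-time policy: store whichever index the line was really found at.
        self.table[self._slot(line_addr)] = actual_index


def probes_for_read(predicted: int, actual: int) -> int:
    """One access when the prediction is right, a second one on a mispredict."""
    return 1 if predicted == actual else 2
```

Since lines in a page tend to share compressibility, one update for any line in the page steers later reads to the same page toward the right index.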

Slide31

DICE OVERVIEW

Compressed DRAM Cache Organization
Flexible Mapping for Quick Switching
Dynamic Indexing (DICE)
Insertion Policy
Index Prediction
Results

Slide32

Methodology (1/8th Knights Landing)

Core Chip
3.2GHz 4-wide out-of-order core
8 cores, 8MB shared last-level cache

Compression
FPC + BDI

[Figure: CPU connected to stacked DRAM and commodity DRAM.]

Slide33

Methodology (1/8th Knights Landing)

[Figure: CPU connected to stacked DRAM and commodity DRAM.]

            Stacked DRAM          Commodity DRAM
Capacity    1GB                   32GB
Bus         DDR 1.6GHz, 128-bit   DDR 1.6GHz, 64-bit
Channels    4 channels            1 channel
Bandwidth   100 GBps              12.5 GBps
Latency     35ns                  35ns

Other sensitivities in paper
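The bandwidth figures can be sanity-checked with a short calculation, sketched here in Python. It assumes "DDR 1.6GHz" denotes 1.6 billion transfers per second per pin; that reading is an assumption, chosen because it lands close to the stated 100 GBps and 12.5 GBps. The function name is hypothetical.

```python
def peak_bandwidth_gbps(transfers_per_sec: float, bus_bits: int, channels: int) -> float:
    """Peak bandwidth in GB/s = transfers/s x bytes per transfer x channels."""
    bytes_per_transfer = bus_bits / 8
    return transfers_per_sec * bytes_per_transfer * channels / 1e9


# Assumed reading of the table: 1.6e9 transfers per second on each bus.
stacked = peak_bandwidth_gbps(1.6e9, bus_bits=128, channels=4)
commodity = peak_bandwidth_gbps(1.6e9, bus_bits=64, channels=1)

if __name__ == "__main__":
    print(f"stacked DRAM:   {stacked:.1f} GB/s")
    print(f"commodity DRAM: {commodity:.1f} GB/s")
    print(f"ratio:          {stacked / commodity:.0f}x")
```

Under that assumption the stacked DRAM has roughly eight times the peak bandwidth of the commodity DRAM at the same 35ns latency, so the cache's benefit in this setup comes from bandwidth rather than latency.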

Slide34

DICE RESULTS

[Figure: results chart with regions annotated "Performs as Spatial Indexing", "Performs as Traditional Indexing", and "DICE outperforms both".]

DICE improves performance over both Spatial Indexing and Traditional Indexing by making fine-grained decisions (19%).

Slide35

INTRODUCTION: COMPRESSED DRAM CACHE

Goal: Compression for Capacity AND Bandwidth

Traditional Compression: compressible 1x bandwidth; incompressible 1x bandwidth
Spatial Indexing: compressible 2x bandwidth; incompressible <1x bandwidth

DICE (Dynamic Index): 19% speedup + 36% EDP

Slide36

Thank you

Slide37

EXTRA SLIDES

Slide38

DIFFERENT CACHE SENSITIVITIES

Slide39

Comparison to prefetch

Slide40

Comparison to SRAM / memory compression

Slide41

FULL RESULTS (mixed compressibility)

Slide42

SRAM Cache compression on DRAM CACHE

Slide43

Distribution for index decision

Slide44

DICE INSERTION threshold

Slide45

EFFECTIVE CAPACITY

Slide46

L3 hit rate improvement

Slide47

Larger TSI vs. BAI example

Slide48