
Implementing a Hybrid SRAM / eDRAM NUCA Architecture - PowerPoint Presentation

Uploaded On 2020-10-06




Presentation Transcript

Slide1

Implementing a Hybrid SRAM / eDRAM NUCA Architecture

Javier Lira (UPC, Spain) javier.lira@ac.upc.edu
Carlos Molina (URV, Spain) carlos.molina@urv.net
David Brooks (Harvard, USA) dbrooks@eecs.harvard.edu
Antonio González (Intel-UPC, Spain) antonio.gonzalez@intel.com

HiPC 2011, Bangalore (India) – December 21, 2011

Slide2

Motivation

CMPs incorporate a large LLC.
POWER7 implements its L3 cache with eDRAM:
3x density.
3.5x lower energy consumption.
Increases latency by a few cycles.
We propose a placement policy to accommodate both technologies in a NUCA cache.

Figure label: 40-45% chip area

Slide3

NUCA caches [1]

NUCA divides a large cache into smaller and faster banks.
Cache access latency consists of the routing and bank access latencies.
Banks close to the cache controller have smaller latencies than banks further away.

Figure label: Processor

[1] Kim et al. An Adaptive, Non-Uniform Cache Structure for Wire-Delay Dominated On-Chip Architectures. ASPLOS'02
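As a rough illustration of the latency decomposition on this slide, the sketch below models access latency as routing latency plus bank access latency. The hop-based routing model, the function name, and the one-way accounting are assumptions made for illustration. The default cycle counts mirror the router, wire, and bank latencies listed in the experimental framework slide. This is not the authors' simulator model.

```python
def nuca_access_latency(hops, bank_latency=4, router_delay=1, wire_delay=1):
    """Estimate NUCA access latency in cycles (illustrative model only).

    Assumes each hop between the cache controller and the bank costs one
    router traversal plus one wire traversal, and counts only the one-way
    request path.
    """
    if hops < 0:
        raise ValueError("hops must be non-negative")
    routing_latency = hops * (router_delay + wire_delay)
    return routing_latency + bank_latency


near = nuca_access_latency(hops=1)  # bank adjacent to the cache controller
far = nuca_access_latency(hops=8)   # bank far from the cache controller
print(near, far)
```

Under these assumptions the far bank costs more than the near one purely because of routing, which is the effect that data migration in a NUCA cache tries to exploit.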

Slide4

SRAM vs. eDRAM

SRAM provides high performance.
eDRAM provides low power and high density.

                 SRAM    eDRAM
Latency          X       1.5x
Density          X       3x
Leakage          2x      X
Dynamic energy   1.5x    X
Need refresh?    No      Yes

Slide5

Outline

Introduction
Methodology
Implementing a hybrid NUCA cache
Analysis of our design
Exploiting architectural benefits
Conclusions

Slide6

Baseline architecture [2]

Placement: 16 positions per data
Access: Partitioned multicast
Migration: Gradual promotion
Replacement: LRU + Zero-copy

Figure labels: Core 0 to Core 7

[2] Beckmann and Wood. Managing Wire Delay in Large Chip-Multiprocessor Caches. MICRO'04

Slide7

Experimental framework

Number of cores:        8 x UltraSPARC IIIi
Frequency:              1.5 GHz
Main memory size:       4 GBytes
Memory bandwidth:       512 Bytes/cycle
Private L1 caches:      8 x 32 KBytes, 2-way
Shared L2 NUCA cache:   8 MBytes, 128 banks
NUCA bank:              64 KBytes, 8-way
L1 cache latency:       3 cycles
NUCA bank latency:      4 cycles
Router delay:           1 cycle
On-chip wire delay:     1 cycle
Main memory latency:    250 cycles (from core)

Figure labels: GEMS, Simics, Solaris 10, PARSEC, SPEC CPU2006, 8 x UltraSPARC IIIi, Ruby, Garnet, Orion
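The cache parameters on this slide can be checked against each other with a little arithmetic. The sketch below records them in a small data class and confirms that 128 banks of 64 KBytes add up to the stated 8 MBytes. The class and field names are hypothetical and are not taken from GEMS or Ruby configuration files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NucaConfig:
    """Cache parameters copied from the slide (names are illustrative)."""
    num_cores: int = 8
    l1_size_kb: int = 32     # per-core private L1
    num_banks: int = 128     # shared L2 NUCA banks
    bank_size_kb: int = 64   # per NUCA bank

    def l2_total_mb(self) -> float:
        # Total NUCA capacity is simply banks times bank size.
        return self.num_banks * self.bank_size_kb / 1024

    def l1_total_kb(self) -> int:
        return self.num_cores * self.l1_size_kb


cfg = NucaConfig()
print(cfg.l2_total_mb(), cfg.l1_total_kb())
```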

Slide8

Outline

Introduction
Methodology
Implementing a hybrid NUCA cache
Analysis of our design
Exploiting architectural benefits
Conclusions

Slide9

Homogeneous approach

Fast SRAM banks are located close to the cores.
Slower eDRAM banks are located in the center of the NUCA cache.
PROBLEM: Migration tends to concentrate shared data in central banks.

Figure labels: Core 0 to Core 7, eDRAM, SRAM

Slide10

Data usage analysis

A significant amount of data in the LLC is not accessed during its lifetime.
SRAM banks store the most frequently accessed data.
eDRAM banks allocate data blocks that either:
Have just arrived in the NUCA, or
Were evicted from SRAM banks.

Slide11

Heterogeneous approach

Data first goes to an eDRAM bank.
If accessed, it moves to an SRAM bank.
Features:
Migration between SRAM banks.
Lack of communication in eDRAM.
No eviction from SRAM banks.
eDRAM is extra storage for SRAM.
PROBLEM: The access scheme must search twice as many banks.

Figure labels: eDRAM, SRAM, Core 0 to Core 7
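The placement rules on this slide can be sketched in a few lines. The model below is a deliberately simplified assumption, not the authors' implementation: it treats all SRAM banks as one LRU pool and all eDRAM banks as another, ignores bank geometry and SRAM-to-SRAM migration, and uses hypothetical names throughout. New blocks enter eDRAM, a hit in eDRAM promotes the block to SRAM, and an SRAM victim is demoted to eDRAM rather than leaving the cache.

```python
from collections import OrderedDict


class HybridNucaSketch:
    """Toy model of the heterogeneous placement policy (illustrative)."""

    def __init__(self, sram_blocks, edram_blocks):
        if sram_blocks < 1 or edram_blocks < 1:
            raise ValueError("capacities must be at least 1 block")
        self.sram_blocks = sram_blocks
        self.edram_blocks = edram_blocks
        self.sram = OrderedDict()   # least recently used entry first
        self.edram = OrderedDict()

    def _insert_edram(self, addr):
        if len(self.edram) >= self.edram_blocks:
            self.edram.popitem(last=False)  # LRU block leaves the cache
        self.edram[addr] = True

    def _insert_sram(self, addr):
        if len(self.sram) >= self.sram_blocks:
            victim, _ = self.sram.popitem(last=False)
            self._insert_edram(victim)      # SRAM victims go to eDRAM
        self.sram[addr] = True

    def access(self, addr):
        if addr in self.sram:
            self.sram.move_to_end(addr)     # refresh LRU position
            return "sram_hit"
        if addr in self.edram:
            del self.edram[addr]
            self._insert_sram(addr)         # promote on first reuse
            return "edram_hit"
        self._insert_edram(addr)            # new data starts in eDRAM
        return "miss"


cache = HybridNucaSketch(sram_blocks=2, edram_blocks=2)
trace = ["A", "A", "A", "B", "B", "C", "C", "A"]
results = [cache.access(a) for a in trace]
print(results)
```

In this trace, block A is demoted to eDRAM when C is promoted and is then found there again, which illustrates eDRAM acting as extra storage for SRAM.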

Slide12

TDA

The Tag Directory Array (TDA) stores the tags of the eDRAM banks.
Using the TDA, the access scheme looks up at most 17 banks.
The TDA requires 512 KBytes for an 8 MByte (4S-4D) hybrid NUCA cache.
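One way to read the 17-bank figure is that an access probes its candidate SRAM banks and, at most, the single eDRAM bank that the TDA points to. The sketch below illustrates that reading. The count of 16 candidate SRAM banks, the dictionary used as the TDA, and all names are assumptions for illustration, not details confirmed by the slides.

```python
def banks_to_probe(tag, sram_candidate_banks, tda):
    """Return the banks an access must probe for `tag` (illustrative).

    `tda` maps a block tag to the eDRAM bank currently holding it, so at
    most one eDRAM bank is added to the SRAM candidates.
    """
    probes = list(sram_candidate_banks)
    edram_bank = tda.get(tag)
    if edram_bank is not None:
        probes.append(edram_bank)
    return probes


# Assumed: 16 candidate SRAM banks per block, one tag tracked in the TDA.
sram_candidates = [f"sram{i}" for i in range(16)]
tda = {0xABC: "edram5"}

with_tda_hit = banks_to_probe(0xABC, sram_candidates, tda)
with_tda_miss = banks_to_probe(0xDEF, sram_candidates, tda)
print(len(with_tda_hit), len(with_tda_miss))
```

Without a tag directory, every candidate eDRAM bank would have to be probed as well, which is the doubling of the search that the heterogeneous approach suffers from.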

Slide13

Performance results

Heterogeneous + TDA outperforms the other hybrid alternatives.
We use Heterogeneous + TDA as the hybrid NUCA cache in the rest of the analysis.

Slide14

Outline

Introduction
Methodology
Implementing a hybrid NUCA cache
Analysis of our design
Exploiting architectural benefits
Conclusions

Slide15

Performance

Well-balanced configurations achieve performance similar to an all-SRAM NUCA cache.
The majority of hits are in SRAM banks.

Slide16

Power and Area

The hybrid NUCA pays for the TDA.
The less SRAM the hybrid NUCA uses, the better.

Slide17

The best configuration: 4S-4D

Performance similar to all-SRAM.
Reduces power consumption by 10%.
Occupies 15% less area than all-SRAM.

Slide18

Outline

Introduction
Methodology
Implementing a hybrid NUCA cache
Analysis of our design
Exploiting architectural benefits
Conclusions

Slide19

New configurations

All SRAM banks (baseline).
4S-4D: SRAM: 4 MBytes, eDRAM: 4 MBytes. 15% reduction in area.
5S-4D: +1 MByte in SRAM banks.
4S-6D: +2 MBytes in eDRAM banks.

Figure labels: SRAM, eDRAM

Slide20

Exploiting benefits

Both configurations increase performance by 4%.
And they do not increase power consumption.

Slide21

Outline

Introduction
Methodology
Implementing a hybrid NUCA cache
Analysis of our design
Exploiting architectural benefits
Conclusions

Slide22

Conclusions

IBM® integrates eDRAM in its latest general-purpose processor.
We implement a hybrid NUCA cache that effectively combines SRAM and eDRAM technologies.
Our placement policy succeeds in concentrating most accesses in the SRAM banks.
A well-balanced hybrid cache achieves performance similar to the all-SRAM configuration, but occupies 15% less area and dissipates 10% less power.
By exploiting architectural benefits, we achieve up to 10% performance improvement, and 4% on average.

Slide23

Implementing a Hybrid SRAM / eDRAM NUCA Architecture

Questions?

HiPC 2011, Bangalore (India) – December 21, 2011