HK-NUCA: Boosting Data Searches in Dynamic NUCA for CMPs

Presentation Transcript

Slide1

HK-NUCA: Boosting Data Searches in Dynamic NUCA for CMPs

Javier Lira ψ, Carlos Molina ф, Antonio González ψ,λ

ψ Dept. Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain, javier.lira@ac.upc.edu

ф Dept. Enginyeria Informàtica, Universitat Rovira i Virgili, Tarragona, Spain, carlos.molina@urv.net

λ Intel Barcelona Research Center, Intel Labs - UPC, Barcelona, Spain, antonio.gonzalez@intel.com

IPDPS 2011, Anchorage, AK (USA), May 17, 2011

Slide2

Introduction

[Figure: eight cores, Core 0 to Core 7]

NUCA
  Optimize cache access latency

S-NUCA (Static NUCA)
  One possible location in the NUCA
  Simple
  Trivial search of data
  Does not leverage locality

D-NUCA (Dynamic NUCA)
  Multiple candidate banks
  Migration increases complexity
  Not easy to find data

Slide3

Motivation

Significant performance potential
Limited by the access scheme

Slide4

Access schemes in D-NUCA

Directory is not an alternative
  Needs to update block location on every migration
  Reduces D-NUCA potential
  Potential bottleneck

Algorithmic-based schemes

  Scheme      Performance   Energy
  Serial      Low           Low
  Parallel    High          High

Partitioned multicast (hybrid access scheme)
  1st step: Local bank + central banks (9 banks)
  2nd step: The other cores' local banks

Slide5

Serial vs Parallel

Reducing the number of messages required per access is crucial
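
The serial/parallel trade-off can be illustrated with a toy message count. The sketch below is not taken from the slides: it assumes one message per probed bank, and the function names and the `hit_index` parameter (position of the holding bank in the search order) are hypothetical. The two-step case uses the 9 banks of the first partitioned-multicast step; the 7 banks in the second step are an assumption (one local candidate bank for each of the other seven cores).

```python
def serial_messages(hit_index: int) -> int:
    """Probe candidate banks one at a time, stopping at the first hit."""
    return hit_index + 1


def parallel_messages(num_candidates: int) -> int:
    """Probe every candidate bank at once, regardless of where the block is."""
    return num_candidates


def two_step_messages(first_step: int, second_step: int, hit_in_first: bool) -> int:
    """Hybrid search: multicast to a first group, then to a second group on failure."""
    return first_step if hit_in_first else first_step + second_step


# Assumed example: 16 candidate banks, block held by the 4th bank probed.
serial = serial_messages(3)
parallel = parallel_messages(16)
hybrid_hit = two_step_messages(9, 7, hit_in_first=True)
hybrid_miss = two_step_messages(9, 7, hit_in_first=False)
```

Under this model the serial search sends few messages but needs one round trip per probe, while the parallel search resolves in one round at the cost of a message to every candidate.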

Slide6

Objectives

Optimize NUCA features
  Provide fast access when the data is near the requesting core
Reduce network contention
  Crucial for both performance and energy

Slide7

Outline

Introduction and motivation
Methodology
HK-NUCA
Results
Conclusions

Slide8

Methodology

Simulation tools:
  Simics + GEMS
  CACTI v6.0

Two scenarios:
  Multi-programmed: Mix of SPEC CPU2006
  Parallel applications: PARSEC

  Number of cores         8, UltraSPARC IIIi
  Frequency               1.5 GHz
  Main memory size        4 GBytes
  Memory bandwidth        512 Bytes/cycle
  Private L1 caches       8 x 32 KBytes, 2-way
  Shared L2 NUCA cache    8 MBytes, 128 banks
  NUCA bank               64 KBytes, 8-way
  L1 cache latency        3 cycles
  NUCA bank latency       4 cycles
  Router delay            1 cycle
  On-chip wire delay      1 cycle
  Main memory latency     250 cycles (from core)

Slide9

Baseline architecture

D-NUCA cache
  8 MBytes
  128 banks
  Bank: 64 KBytes, 8-way

Migration scheme: Gradual Promotion
Replacement: LRU
Access: Partitioned Multicast

[Figure: eight cores, Core 0 to Core 7]

Slide10

Outline

Introduction and motivation
Methodology
HK-NUCA
Results
Conclusions

Slide11

HK-NUCA

Home Knows where to find data in the NUCA cache
Home bank knows which other banks have at least one data block that it manages
There is an HK-PTR per cache set in all banks

HK-PTR: 0 0 1 0 1 1 0 0 0 0 0 0 1 0 1 0
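
The HK-PTR can be pictured as a small bit vector. The following is a minimal sketch, not the authors' implementation: the class and method names are hypothetical, the 16-bit width is read off the example drawn on the slide, and mapping bit positions left to right onto candidate banks 0 to 15 is an assumption.

```python
NUM_CANDIDATE_BANKS = 16  # width of the HK-PTR example on the slide


class HKPtr:
    """Bit vector a home bank keeps for one cache set (illustrative only)."""

    def __init__(self):
        self.bits = 0

    def mark(self, bank):
        """Record that `bank` holds at least one block managed by this home."""
        self.bits |= 1 << bank

    def clear(self, bank):
        """Record that `bank` no longer holds any such block."""
        self.bits &= ~(1 << bank)

    def holders(self):
        """Return the banks whose bit is set, in ascending order."""
        return [b for b in range(NUM_CANDIDATE_BANKS) if (self.bits >> b) & 1]


# Reproduce the slide's pattern 0 0 1 0 1 1 0 0 0 0 0 0 1 0 1 0
# (bit positions counted from the left, which is an assumption).
ptr = HKPtr()
for bank in (2, 4, 5, 12, 14):
    ptr.mark(bank)
```

Because the pointer is kept per set rather than per block, a set bit only says that the bank holds some block of that set, so a probe guided by it can still miss.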

Slide12

HK-NUCA

(1) Fast access
(2) Call Home
(3) Parallel access

[Figure: eight cores, Core 0 to Core 7, with Core 0 issuing the request; HK-PTR: 0 0 1 0 1 1 0 0 0 0 0 0 1 0 1 0]
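
The three stages can be sketched as a single lookup routine. This is a simplified model under stated assumptions, not the protocol as specified by the authors: banks are modeled as Python sets of block names, the fast access probes a single bank near the requester, one message is counted per probed bank, and the name `hk_nuca_lookup` and its parameters are hypothetical.

```python
def hk_nuca_lookup(block, local_bank, home_bank, banks, hk_ptr_bits):
    """Return (bank holding the block or None, number of messages sent)."""
    # (1) Fast access: probe the bank close to the requesting core.
    messages = 1
    if block in banks[local_bank]:
        return local_bank, messages

    # (2) Call Home: ask the home bank, which also checks its own contents.
    if home_bank != local_bank:
        messages += 1
        if block in banks[home_bank]:
            return home_bank, messages

    # (3) Parallel access: probe only the banks flagged in the HK-PTR.
    targets = [
        b for b in range(len(banks))
        if (hk_ptr_bits >> b) & 1 and b not in (local_bank, home_bank)
    ]
    messages += len(targets)
    for b in targets:
        if block in banks[b]:
            return b, messages
    return None, messages  # NUCA miss: the request would go to main memory


# Assumed setup: 16 candidate banks, HK-PTR bits set for banks 2, 4, 5, 12, 14.
banks = [set() for _ in range(16)]
banks[0].add("B")
banks[1].add("C")
banks[5].add("A")
hk_bits = sum(1 << b for b in (2, 4, 5, 12, 14))
result_a = hk_nuca_lookup("A", local_bank=0, home_bank=1, banks=banks, hk_ptr_bits=hk_bits)
```

In this model a request never probes a bank whose HK-PTR bit is clear, which is where the message savings over a full multicast come from.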

Slide13

Managing Home knowledge

Actions that provoke an update of HK-PTR:
  New data enters the cache
  Eviction from the NUCA cache
  Migration movements

Migrations are synchronized with HK-PTR updates
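
The three update triggers can be sketched as event handlers on the home bank's state. The slides do not say how a home bank decides when a bit may be cleared, so the per-bank block counter used here is purely an assumption, as are the class and method names. Migration sets the destination before releasing the source so that the pointer never loses track of the block in this model.

```python
from collections import Counter


class HomeKnowledge:
    """Tracks, for one cache set, which banks hold blocks of this home (illustrative)."""

    def __init__(self):
        self.blocks_in_bank = Counter()  # assumed bookkeeping, not from the slides

    def on_insert(self, bank):
        """New data enters the cache in `bank`."""
        self.blocks_in_bank[bank] += 1

    def on_evict(self, bank):
        """A block leaves `bank`; drop the bank once it holds none."""
        if self.blocks_in_bank[bank] <= 0:
            raise ValueError(f"bank {bank} holds no tracked block")
        self.blocks_in_bank[bank] -= 1
        if self.blocks_in_bank[bank] == 0:
            del self.blocks_in_bank[bank]

    def on_migrate(self, src, dst):
        """A block moves from `src` to `dst`; update both in one step."""
        self.on_insert(dst)
        self.on_evict(src)

    def hk_ptr(self):
        """Bit vector with one bit set per bank that holds at least one block."""
        bits = 0
        for bank in self.blocks_in_bank:
            bits |= 1 << bank
        return bits


hk = HomeKnowledge()
hk.on_insert(3)
hk.on_insert(3)
hk.on_migrate(3, 7)
```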

Slide14

Overheads

Hardware
  Implementation of HK-PTRs
Network
  Home knowledge updates

  NUCA cache    8 MBytes
  HK-PTRs       32 KBytes
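
The 32 KBytes figure can be cross-checked with a short calculation. The bank count, bank size and associativity come from the methodology slide, and the 16-bit HK-PTR width from the HK-PTR example; the 64-byte cache line is an assumption, since the slides do not state the line size.

```python
BANKS = 128                # from the methodology slide
BANK_BYTES = 64 * 1024     # 64 KBytes per bank
WAYS = 8                   # 8-way set associative
LINE_BYTES = 64            # ASSUMPTION: line size is not given in the slides
HK_PTR_BITS = 16           # width of the HK-PTR example

sets_per_bank = BANK_BYTES // (WAYS * LINE_BYTES)
hk_ptr_bytes = BANKS * sets_per_bank * HK_PTR_BITS // 8
nuca_bytes = BANKS * BANK_BYTES
overhead_fraction = hk_ptr_bytes / nuca_bytes

print(f"sets per bank: {sets_per_bank}")
print(f"HK-PTR storage: {hk_ptr_bytes // 1024} KBytes")
print(f"overhead: {overhead_fraction:.2%} of the NUCA cache")
```

With these assumptions the total matches the 32 KBytes reported on the slide, which suggests the assumed line size is at least consistent with the authors' configuration.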

Slide15

Outline

Introduction and motivation
Methodology
HK-NUCA
Results
Conclusions

Slide16

Performance results

Overall performance improvement of 4-6%

Workloads with high miss rate
Low miss rate, but high hit rate in the first two HK-NUCA stages
Low miss rate, high hit rate in the parallel access stage of HK-NUCA

Slide17

HK-NUCA accuracy

85% of memory requests send fewer than 6 messages to the NUCA

Slide18

On-chip network traffic

  Scheme                Avg. messages sent per request
  Part. Multicast       10.03
  HK-NUCA (3-steps)     3.82
  HK-NUCA (2-steps)     4.06
  Perfect Search        1
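
The averages above translate into relative traffic reductions with a one-line calculation. The sketch below only re-expresses the reported numbers against the partitioned multicast baseline; the variable names are illustrative.

```python
avg_messages = {
    "Part. Multicast": 10.03,
    "HK-NUCA (3-steps)": 3.82,
    "HK-NUCA (2-steps)": 4.06,
    "Perfect Search": 1.0,
}

baseline = avg_messages["Part. Multicast"]

# Fraction of messages saved relative to partitioned multicast.
reduction = {name: 1 - msgs / baseline for name, msgs in avg_messages.items()}

for name, saved in reduction.items():
    print(f"{name:<20} {saved:6.1%} fewer messages than Part. Multicast")
```

By this measure both HK-NUCA variants send roughly 60% fewer messages per request than partitioned multicast, while a perfect search would save about 90%.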

Slide19

Energy consumption results

HK-NUCA reduces dynamic energy consumption by more than 50%

Slide20

Outline

Introduction and motivation
Methodology
HK-NUCA
Results
Conclusions

Slide21

Conclusions

D-NUCA makes it possible to exploit the non-uniformity of NUCA caches
D-NUCA benefits are restricted by the access scheme used
HK-NUCA is an access scheme for D-NUCA organizations
  Allows fast accesses to data that is near the requesting core
  Home knowledge reduces miss resolution time and network contention
Outperforms the best performing access scheme by 6%
Reduces dynamic energy consumption by 50%

Slide22

HK-NUCA: Boosting Data Searches in Dynamic NUCA for CMPs

Questions?

Slide23

Migration is not the problem

[Figure: comparison of S-NUCA and D-NUCA]

Access scheme is the main limitation in D-NUCA