Optical Overlay NUCA: A High Speed Substrate for Shared L2 Caches

Uploaded On 2020-10-06

Presentation Transcript

Slide1

Optical Overlay NUCA: A High Speed Substrate for Shared L2 Caches

Eldhose Peter*, Anuj Arora**, Akriti Bagaria* and Dr. Smruti R Sarangi*

OONUCA

*IIT Delhi, **CISCO Bangalore

Slide2

Motivation

Overlay NUCA

Architecture

Results

Slide3

Understand the problem - Cache

12/14/2014 Optical overlay NUCA: A high speed substrate for shared L2 caches

[Figure: L1 cache, L2 cache banks, and lower level memory. Labels: UCA, NUCA, Sets]

Improved cache utilization

Static => Not adaptable based on access pattern

Slide4

Understand the problem – Optical Communication

[Figure: grid of L2 cache banks, comparing electrical and optical communication]

Electrical: 50-60 cycles

Optical: 1-2 cycles

Slide5

Optical Communication

12/14/2014 Methods to Leverage Optical Networks for Multicore Processors

Reservation-assisted Single Write Multi Read (R-SWMR)

[Figure: nodes S, D1, D2, D3]

Basic Components

Slide6

No prior work in cache using optical NOC

Electrical NOC: SNUCA, DNUCA, RNUCA

[Figure: L1 caches and lower level memory]

Prior Approaches

Search

Migration near to the core

[Figure: example address split into fields]

Tag: 111010101110001

Set Index: 1000

Block Size: 100011

HomeBank: 1000 (the figure also shows the value 10100)
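As a rough illustration of the address breakdown on this slide, the Python sketch below splits the example address into its three fields. The field widths (15, 4, and 6 bits) are simply read off the example bit strings. The names `split_address` and `home_bank`, and the choice to use the set-index bits as the home bank number, are assumptions made for illustration and are not stated on the slides.

```python
from typing import NamedTuple

# Field widths inferred from the example bit strings on the slide.
TAG_BITS, INDEX_BITS, OFFSET_BITS = 15, 4, 6


class AddressFields(NamedTuple):
    tag: int
    set_index: int
    block_offset: int


def split_address(addr: int) -> AddressFields:
    """Split an address into tag, set index, and block offset (high bits first)."""
    block_offset = addr & ((1 << OFFSET_BITS) - 1)
    set_index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return AddressFields(tag, set_index, block_offset)


def home_bank(addr: int) -> int:
    # Assumption: the set-index bits double as the home bank number.
    return split_address(addr).set_index


example = int("111010101110001" + "1000" + "100011", 2)
fields = split_address(example)
print(f"{fields.tag:015b} {fields.set_index:04b} {fields.block_offset:06b}")
print(f"home bank: {home_bank(example):04b}")
```

With a static mapping of this kind, every block has exactly one home bank, which is why the scheme cannot adapt to the access pattern.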

Slide7

Equidistant nodes

Banks are equidistant in terms of delay (approx)

Dynamic creation of sets

Improves the utilization of banks

Improves hit rate

[Figure: two banks, each X cycles away from S. Labels: "I am near to S", "I am also near to S"]

Slide8

Phases of operation


Slide9

Optical Overlay

Profiling information: cache bank accesses, bank contention, cache lines used

Experimentally determined that a ring topology of 8 banks is the best

Slide10

Creation of overlay

[Figure: cache banks numbered 1-32, with banks marked by category. Legend: High, Low, Hybrid, Infreq]

Slide11

Operations in Overlay NUCA – Search

[Figure: home bank and overlay banks; bank numbers shown: 2, 23, 27, 15, 18, 20, 5, 31]

Broadcast

Two-Side Incremental (TSI)

Slide12

Operations in Overlay NUCA - Eviction

[Figure: home bank, overlay banks, and main memory; bank numbers shown: 18, 20, 21, 24, 26, 29, 31, 32. Label: Eviction from L2]

Slide13

TSI - Protocol

[Flowchart: L1 Cache sends a request to the L2 Cache Bank on a miss]

If message type is request:
  If space available in message queue?
    No: NACK to sender (exponential back off)
    Yes: Search
      Hit: Reply
        If non-home bank: Kill to home bank, Kill to opposite branch
      Miss: Create entry in RCB (home bank)
        Any child?
          Yes: Send request message to children
          No: Notify message to home bank

Otherwise, by type of message:
  Home bank?
    Yes:
      Notify: Add notify; if notify = 2, send request to main memory
      Kill: Remove entry from RCB
    No: Remove entry from MQ

[Figure labels: RCB Entry, Miss, Hit, Remove RCB Entry, Remove MQ Entry, Miss, Notify]
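The search flow on this slide can be pictured with a small Python sketch. It models the overlay as two branches leaving the home bank, advances the request one hop per step on each side, stops on the first hit (which is where the Kill messages would be sent), and falls back to main memory once both branch ends have sent Notify. The function name `tsi_search`, the dictionary-based bank contents, and the particular arrangement of banks into branches are hypothetical; timing, the message queue, and NACKs are left out.

```python
from typing import Dict, List, Set, Tuple


def tsi_search(home: int, left: List[int], right: List[int],
               contents: Dict[int, Set[str]], block: str) -> Tuple[str, int]:
    """Return (where the block was found, number of banks searched).

    `left` and `right` are the two branches of the overlay, ordered by
    distance from the home bank. Both branches advance one hop per step.
    """
    searched = 1
    if block in contents.get(home, set()):
        return f"bank {home}", searched      # hit in the home bank: reply
    notify = 0                               # Notify count kept for this request
    branches = [list(left), list(right)]
    step = 0
    while notify < 2:
        for branch in branches:
            if step < len(branch):
                bank = branch[step]
                searched += 1
                if block in contents.get(bank, set()):
                    # Hit in a non-home bank: reply; Kill would go to the
                    # home bank and to the opposite branch.
                    return f"bank {bank}", searched
            if step == len(branch) - 1 or (step == 0 and not branch):
                notify += 1                  # end of branch, no child: Notify home
        step += 1
    return "main memory", searched           # notify == 2: both branches missed


# Hypothetical overlay: home bank 15 with two branches of nearby banks.
print(tsi_search(15, [27, 23, 2], [18, 20, 5, 31], {5: {"X"}}, "X"))
print(tsi_search(15, [27, 23, 2], [18, 20, 5, 31], {5: {"X"}}, "Y"))
```

Compared with a broadcast, the incremental search touches fewer banks when the block sits close to the home bank, at the cost of extra hops when it sits far away.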

Slide14

Home Bank Controller

[Block diagram. Components: Message Queue, NACK controller, Search Logic, Cache Bank, Victim Buffer, Overlay Info store, Eviction Logic, Kill controller, Response Collection Buffer, Main Memory]

[Fields shown: Message ID, Block Addr, MRBV]

[Signals and labels: Full, NACK Message, Search Bank, Read/Write, Fill Bank, Response, Miss, Hit, Forward request to other banks, Evicted block, Migrate block, Response to the sender, Main memory Write, Main memory Read, Notify, Kill, To core]

Slide15

Message Structure


Slide16

Architecture


Slide17

Story till now

Optical Overlay

Operations

Home Bank Controller

Message Format

Architecture

Slide18

Configuration


Slide19

Results

Hits in Home Bank

More non-home bank hits => high performance in Optical Overlay NUCA

[Figure annotation: More non-home bank hits]

Slide20

Results

Normalized Average Hit Latency

[Figure annotations: 0.4-0.8, 0.2-0.55]

Slide21

Results

L2 Hit Rate

Comparable to DNUCA, much better than SNUCA

[Figure annotation: More home bank hits]

Slide22

Results

Normalized IPC

[Figure annotations: 2-3%; 50, 24, 18%; 161%; 167%]

Fewer L2 requests

High non-home bank hits

Slide23

What we achieved


Slide24

Thank you


Slide25

Future Work

Normalized Total Bank Accesses

20-30% better

Fewer accesses

Slide26

Major Issues

False Miss

Victim Buffer

Multiple copies

[Figure: two L2 banks. Sequence shown: Search X, Miss, Migrate X, Search X, Miss]

Slide27

Overlay Structure

1 per 2 cycles

Exponential back off

Rules for Eviction
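To make the exponential back off concrete, here is a minimal Python sketch of the waiting time a sender might use after each successive NACK from a bank whose message queue is full. The base delay of 2 cycles, the cap of 64 cycles, and the name `backoff_delays` are assumptions for illustration; the slides do not give these values.

```python
from typing import List


def backoff_delays(nacks: int, base_cycles: int = 2, max_cycles: int = 64) -> List[int]:
    """Wait time in cycles before each retry, doubling after every NACK."""
    return [min(base_cycles * (2 ** attempt), max_cycles) for attempt in range(nacks)]


print(backoff_delays(6))  # [2, 4, 8, 16, 32, 64]
```

Doubling the wait after each NACK spreads retries out quickly, so a congested bank is not flooded with repeated requests from the same sender.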

Slide28

[Figure labels: RCB Entry, Miss, Hit, Remove RCB Entry, Remove MQ Entry, Miss, Notify]

Message fields: Message Id | Core ID | Source ID | Dest ID | Req Type | Home Bank ID | Physical Address
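The message fields listed on this slide map naturally onto a small record type. The Python sketch below is one possible representation: the field names follow the slide, while the `ReqType` values (taken from the message kinds named in the protocol slides), the absence of bit widths, and the helper `forwarded_to` are assumptions for illustration.

```python
from dataclasses import dataclass, replace
from enum import Enum


class ReqType(Enum):
    # Message kinds named in the protocol slides.
    REQUEST = "request"
    NOTIFY = "notify"
    KILL = "kill"
    NACK = "nack"


@dataclass(frozen=True)
class Message:
    message_id: int
    core_id: int
    source_id: int
    dest_id: int
    req_type: ReqType
    home_bank_id: int
    physical_address: int

    def forwarded_to(self, next_bank: int) -> "Message":
        """Copy of this message as sent on by its current destination."""
        return replace(self, source_id=self.dest_id, dest_id=next_bank)


request = Message(message_id=1, core_id=0, source_id=15, dest_id=18,
                  req_type=ReqType.REQUEST, home_bank_id=15,
                  physical_address=0x1D5C623)
print(request.forwarded_to(20))
```

Carrying the home bank ID in every message lets a non-home bank address its Notify or Kill message without any lookup.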

Slide29

Prior Approaches

Important Proposals

Corona (MWSR: Multi Write Single Read)

Firefly (SWMR: Single Write Multi Read)

ATAC

Major Issues

High energy consumption

Correctness issues

Slide30

Overview

Background

Optical Overlay

Operations

Architecture

Results

Slide31

Background – Optical Communication

Basic components

R-SWMR (Firefly): Reservation-assisted Single Write Multi Read

Slide32

Do We Need Optical Communication?

In an 8x8 NOC, a message can take up to 30-40 cycles even without contention

With optical links: 1-2 cycles to reach the farthest node (7 ps/mm)
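The 7 ps/mm figure on this slide can be turned into a cycle count with a short calculation. The Python sketch below is only a back-of-the-envelope estimate: the waveguide lengths and the 3 GHz clock are assumed values, and the function `optical_cycles` covers propagation delay only, ignoring electrical/optical conversion and arbitration.

```python
import math

PS_PER_MM = 7.0  # optical propagation delay quoted on the slide


def optical_cycles(distance_mm: float, clock_ghz: float) -> int:
    """Whole clock cycles needed for light to cover `distance_mm`."""
    cycle_ps = 1000.0 / clock_ghz
    return math.ceil(distance_mm * PS_PER_MM / cycle_ps)


# Assumed values: 40 mm and 80 mm waveguide paths at a 3 GHz clock.
print(optical_cycles(40, 3.0))  # 280 ps of flight time -> 1 cycle
print(optical_cycles(80, 3.0))  # 560 ps of flight time -> 2 cycles
```

Under these assumptions the result falls within the 1-2 cycle range quoted on the slide.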

Slide33

Motivation

Shared cache is divided into cache banks and connected together using NOC

Communication via electric NOC is time consuming

Hit rate can be improved by creating bank sets

Sometimes a few sets are accessed heavily while others sit idle

This decreases the hit rate

Slide34

Protocol

[Flowchart: L1 Cache sends a request to the L2 Cache Home Bank on a miss]

If message type is request:
  If space available in message queue?
    No: Exponential back off to sender
    Yes: Search
      Hit: Reply
        If non-home bank: Kill to home bank, Kill to opposite branch
      Miss: Create entry in RCB and VB
        Any child?
          Yes: Send request message to children
          No: Notify message to home bank

Otherwise, by type of message:
  Home bank?
    Yes:
      Notify: Add notify; if notify = 2, send request to main memory
      Kill: Remove entry from RCB
    No: Remove entry from MQ

Slide35

L2 Cache Home Bank

By type of message:
  Home bank?
    Yes:
      Notify: Add notify; if notify = 2, send request to main memory
      Kill: Remove entry from RCB
    No: Remove entry from MQ