Slide1
Optical Overlay NUCA: A High Speed Substrate for Shared L2 Caches
Eldhose Peter*, Anuj Arora**, Akriti Bagaria* and Dr. Smruti R Sarangi*
OONUCA
*IIT Delhi, **CISCO Bangalore
Slide2
Motivation
Overlay NUCA
Architecture
Results
Slide3
Understand the problem - Cache
12/14/2014 Optical overlay NUCA: A high speed substrate for shared L2 caches
[Figure: L1 above a shared L2 and lower level memory; the L2 shown as a single UCA cache and as a NUCA array of L2 banks]
Sets
Improved cache utilization
Static => not adaptable based on access pattern
Slide4
Understand the problem – Optical Communication
[Figure: grid of L2 cache banks; annotations: Electrical 50-60 cycles, Optical 1-2 cycles]
Slide5
Optical Communication
12/14/2014 Methods to Leverage Optical Networks for Multicore Processors
Reservation assisted Single Write Multi Read (R-SWMR)
[Figure: sender S and destinations D1, D2, D3]
Basic Components
Slide6
No prior work in cache using optical NOC
Electrical NOC: SNUCA, DNUCA, RNUCA
[Figure: L1 caches connected to lower level memory]
Prior Approaches
Search
Migration near to the core
[Figure: address split into Tag, Set Index and Block Size fields, with example bit strings 111010101110001, 1000 and 100011; HomeBank shown with bit strings 1000 and 10100]
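The address split on this slide (Tag, Set Index, Block Size, HomeBank) can be illustrated with a minimal sketch. The field widths and the choice of which bits select the home bank are assumptions made for illustration only; the slide does not state them.

```python
def split_address(addr, offset_bits=6, bank_bits=5, set_bits=10):
    """Split a physical address into (tag, set_index, home_bank, block_offset).

    Assumed layout (not taken from the slides), from high bits to low bits:
        tag | set index | home bank | block offset
    bank_bits=5 matches a 32-bank L2; the other widths are arbitrary.
    """
    block_offset = addr & ((1 << offset_bits) - 1)
    addr >>= offset_bits
    home_bank = addr & ((1 << bank_bits) - 1)
    addr >>= bank_bits
    set_index = addr & ((1 << set_bits) - 1)
    tag = addr >> set_bits
    return tag, set_index, home_bank, block_offset


# Build an address from known fields and split it again.
example = (5 << 21) | (3 << 11) | (17 << 6) | 9
tag, set_index, home_bank, block_offset = split_address(example)
```

In a static scheme such as SNUCA the home bank chosen this way is the only place a block can live; the schemes that migrate blocks need a search on top of it.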
Slide7
Equidistant nodes
Banks are equidistant in terms of delay (approx.)
Dynamic creation of sets
Improves the utilization of banks
Improves hit rate
[Figure: source S with two banks, each X cycles away, labelled "I am near to S" and "I am also near to S"]
Slide8
Phases of operation
Slide9
Optical Overlay
Profiling information – cache bank accesses, bank contention, cache lines used
Experimentally determined that a ring topology with 8 banks works best
Slide10
Creation of overlay
[Figure: cache banks numbered 1 to 32 and the overlay formed from them; legend: High, Low, Hybrid, Infreq]
Slide11
Operations in Overlay NUCA – Search
[Figure: overlay of banks 2, 23, 27, 15, 18, 20, 5, 31 with the Home Bank marked]
Broadcast
Two-Side Incremental (TSI)
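The two search strategies named on this slide can be contrasted with a small model. This is a sketch under stated assumptions, not the protocol from the talk: the overlay is treated as a ring, the home bank is assumed to sit at index 0, both branches advance one hop per step in lockstep, and kill and notify messages are not modelled. The function names are hypothetical.

```python
def broadcast_search(ring, holders):
    """Probe every bank of the overlay at once; return (hit_bank, probes)."""
    hit = next((bank for bank in ring if bank in holders), None)
    return hit, len(ring)


def tsi_search(ring, home_idx, holders):
    """Two-side incremental search, simplified.

    Probe the home bank first, then one more bank on each side per step
    until a hit or until the two branches meet. Returns (hit_bank, probes);
    hit_bank is None when the whole overlay misses.
    """
    n = len(ring)
    probes = 1
    if ring[home_idx] in holders:
        return ring[home_idx], probes
    visited = {home_idx}
    left = right = home_idx
    while len(visited) < n:
        right = (right + 1) % n
        left = (left - 1) % n
        hit = None
        for idx in (right, left):
            if idx in visited:
                continue  # the two branches met on this bank
            visited.add(idx)
            probes += 1
            if hit is None and ring[idx] in holders:
                hit = ring[idx]
        if hit is not None:
            return hit, probes
    return None, probes


# Bank numbers taken from the slide's figure; the ring order is assumed.
overlay = [2, 23, 27, 15, 18, 20, 5, 31]
near_hit = tsi_search(overlay, 0, {23})
full_miss = tsi_search(overlay, 0, set())
```

In this model a block held close to the home bank is found after a few probes, while broadcast always probes all eight banks; an overlay-wide miss costs the same number of probes either way.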
Slide12
Operations in Overlay NUCA - Eviction
[Figure: overlay of banks 18, 20, 21, 24, 26, 29, 31, 32 with the Home Bank marked, and Main Memory]
Eviction from L2
Slide13
TSI - Protocol
[Flowchart: L1 cache to L2 cache bank; node labels as recovered from the slide]
If message type is request?
If space available in message queue?
No – NACK to sender (exponential back off)
Yes – Search
Hit: Reply
If non home bank: Kill to home bank, Kill to opposite branch
Miss: Create entry in RCB (home bank)
Any child?
Yes: Send request message to children
No: Notify message to home bank
Type of message? Notify / Kill
Add notify
Remove entry from RCB
Home bank? Yes / No
Remove entry from MQ
If notify = 2: Send request to main memory
RCB entry: Miss, Hit, Remove RCB entry, Remove MQ entry, Miss, Notify
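The "NACK to sender (exponential back off)" step can be sketched as follows. The slides do not give the base delay, the cap, or the retry limit, and real implementations often add randomization; all parameters and the function name here are assumptions for illustration.

```python
def send_with_backoff(queue_free_at, base=1, cap=64, max_retries=10):
    """Simulate a sender retrying a request to a bank whose message queue is full.

    queue_free_at: first cycle at which the bank can accept the message
                   (a stand-in for "space available in message queue").
    Each NACK doubles the wait before the next attempt, up to `cap` cycles.
    Returns (accept_cycle, nacks); accept_cycle is None if every attempt is NACKed.
    """
    cycle, delay, nacks = 0, base, 0
    for _ in range(max_retries + 1):
        if cycle >= queue_free_at:
            return cycle, nacks       # message enters the queue, search starts
        nacks += 1                    # bank answers with a NACK
        cycle += delay                # sender waits before retrying
        delay = min(cap, delay * 2)   # exponential back-off
    return None, nacks


# The queue frees up at cycle 10; attempts happen at cycles 0, 1, 3, 7 and 15.
accepted_at, nack_count = send_with_backoff(queue_free_at=10)
```

Doubling the wait keeps a congested bank from being flooded with retries, at the cost of the sender sometimes overshooting the moment the queue frees up (cycle 15 instead of 10 in the example).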
Slide14
Home Bank Controller
[Block diagram; components: Message Queue, NACK controller, Search Logic, Search Bank, Fill Bank, Cache Bank, Victim Buffer, Overlay Info store, Eviction Logic, Kill controller, Response Collection Buffer, Main Memory]
[Response Collection Buffer fields: Message ID, Block Addr, MRBV]
[Signal labels: Full, NACK Message, Read/Write, Response, Miss, Hit, Forward request to other banks, Evicted block, Migrate block, Response to the sender, Main memory Write, Main memory Read, Notify, Kill, To core]
Slide15
Message Structure
Slide16
Architecture
Slide17
Story till now
Optical Overlay
Operations
Home Bank Controller
Message Format
Architecture
Slide18
Configuration
Slide19
Results
Hits in Home Bank
More non-home bank hits => high performance in Optical Overlay NUCA
Slide20
Results
Normalized Average Hit Latency
[Figure annotations: 0.4-0.8; 0.2-0.55]
Slide21
Results
L2 Hit Rate
Comparable to DNUCA, much better than SNUCA
More home bank hits
Slide22
Results
Normalized IPC
[Figure annotations: 2-3%; 50, 24, 18%; 161%; 167%]
Fewer L2 requests
High non-home bank hits
Slide23
What we achieved
Slide24
Thank you
Slide25
Future Work
Normalized Total Bank Accesses
20-30% better
Fewer accesses
Slide26
Major Issues
False Miss
Victim Buffer
Multiple copies
[Figure: two L2 banks; message sequence: Search X, Miss, Migrate X, Search X, Miss]
Slide27
Overlay Structure
1 per 2 cycles
Exponential back off
Rules for Eviction
Slide28
RCB entry: Miss, Hit, Remove RCB entry, Remove MQ entry, Miss, Notify
Message fields: Message ID | Core ID | Source ID | Dest ID | Req Type | Home Bank ID | Physical Address
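The message fields listed on this slide map naturally onto a small record type. The sketch below is illustrative only: the field names follow the slide, but the set of request types is inferred from the protocol flowcharts (request, reply, notify, kill, NACK), and the helper `reply_to` is hypothetical.

```python
from dataclasses import dataclass, replace
from enum import Enum, auto


class ReqType(Enum):
    # Inferred from the protocol slides; the real encoding is not given.
    REQUEST = auto()
    REPLY = auto()
    NOTIFY = auto()
    KILL = auto()
    NACK = auto()


@dataclass(frozen=True)
class OverlayMessage:
    message_id: int
    core_id: int           # core that issued the original L2 access
    source_id: int         # bank or core sending this message
    dest_id: int           # bank or core receiving this message
    req_type: ReqType
    home_bank_id: int
    physical_address: int


def reply_to(msg, kind):
    """Build a message of type `kind` that travels back to the sender of `msg`."""
    return replace(msg, source_id=msg.dest_id, dest_id=msg.source_id, req_type=kind)


request = OverlayMessage(1, 0, 2, 23, ReqType.REQUEST, 2, 0x7570)
notify = reply_to(request, ReqType.NOTIFY)
```

Carrying the home bank ID in every message lets a non-home bank address its notify or kill directly to the home bank without a lookup.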
Slide29
Prior Approaches
Important Proposals
Corona (MWSR – Multi Write Single Read)
Firefly (SWMR – Single Write Multi Read)
ATAC
Major Issues
High energy consumption
Correctness issues
Slide30
Overview
Background
Optical Overlay
Operations
Architecture
Results
Slide31
Background – Optical Communication
Basic components
R-SWMR (Firefly): Reservation assisted Single Write Multi Read
Slide32
Do We Need Optical Communication?
In an 8x8 electrical NOC, a message can take up to 30-40 cycles even without contention
With optical links, 1-2 cycles to reach the farthest node (7 ps/mm)
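The 7 ps/mm figure can be turned into cycles with simple arithmetic. The waveguide length and clock frequency below are assumptions chosen for illustration (the slide gives neither), and only propagation delay is counted; electrical-to-optical conversion and arbitration are ignored.

```python
import math

PROPAGATION_PS_PER_MM = 7.0  # from the slide


def optical_cycles(waveguide_mm, clock_ghz):
    """Clock cycles needed for light to traverse `waveguide_mm` of waveguide."""
    clock_period_ps = 1000.0 / clock_ghz
    flight_time_ps = waveguide_mm * PROPAGATION_PS_PER_MM
    return math.ceil(flight_time_ps / clock_period_ps)


# Assumed: 40 mm and 60 mm paths at 3 GHz (clock period of about 333 ps).
short_path = optical_cycles(40, 3.0)  # 280 ps of flight time
long_path = optical_cycles(60, 3.0)   # 420 ps of flight time
```

Under these assumptions even a path several centimetres long fits in one or two cycles, which agrees with the 1-2 cycle range quoted on the slide.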
Slide33
Motivation
Shared cache is divided into cache banks and connected together using NOC
Communication via electric NOC is time consuming
Hit rate can be improved by creating bank sets
Sometimes a few sets can be accessed more while others are idle
Decreases the hit rate
Slide34
Protocol
[Flowchart: L1 cache to L2 cache home bank; node labels as recovered from the slide]
If message type is request?
If space available in message queue?
No - Exponential back off to sender
Yes - Search
Hit: Reply
If non home bank: Kill to home bank, Kill to opposite branch
Miss: Create entry in RCB and VB
Any child?
Yes: Send request message to children
No: Notify message to home bank
Type of message? Notify / Kill
Add notify
Remove entry from RCB
Home bank? Yes / No
Remove entry from MQ
If notify = 2: Send request to main memory
Slide35
[Flowchart, continued: L2 cache home bank]
Type of message? Notify / Kill
Add notify
Remove entry from RCB
Home bank? Yes / No
Remove entry from MQ
If notify = 2: Send request to main memory