Slide1
ECE/CS 552: Nanophotonics
Instructor: Mikko H. Lipasti
Fall 2010
University of Wisconsin-Madison
Optional lecture – just for “fun”
Slide2
Good News
Technology advances at astounding rate
19th century: attempts to build mechanical computers
Early 20th century: mechanical counting systems (cash registers, etc.)
Mid 20th century: vacuum tubes as switches
Since: transistors, integrated circuits
1965: Moore’s law [Gordon Moore]
Predicted doubling of IC capacity (commonly quoted as every 18 months)
Drives functionality, performance, cost
Exponential improvement for 40+ years
Built on Von Neumann model (fetch/execute)
Slide3
Distributed processing on chip
Future chips rely on distributed processing
Many computation/cache/DRAM/IO nodes
Placement, topology, core uarch/strength: TBD
Conventional interconnects may not suffice
Buses not viable
Crossbars are slow, power-hungry, expensive
NOCs impose latency, power overhead
Nanophotonics to the rescue
Communicate with photons
Inherent bandwidth, latency, energy advantages
Silicon integration becoming a reality
Challenges & opportunities remain
Slide4
Si Photonics: How it works
Ring resonator (“wavelength magnet”), ~3.5 μm [HP]; shown OFF and ON
Waveguide (“optical wire”), 0.5 μm [Intel]
Laser (off-chip power) [Koch ‘07]
Slide5
Ring Resonators
OFF
ON: Diverting
ON: Diverting + Detecting
ON: Injecting
Slide6
Key attributes of Si photonics
Very low latency, very high bandwidth
Up to 1000x energy efficiency gain
Challenges
Resonator thermal tuning: heaters
Integration, fabrication: is this real?
Opportunities
Static power dominant (laser, thermal)
Destructive reads: fast wired-OR
Slide7
© Hill, Lipasti
Nanophotonics
Nanophotonics overview
Sharing the nanophotonic channel
Light-speed arbitration [MICRO 09]
Utilizing the nanophotonic channel
Atomic coherence [HPCA 11]
Slide8
Corona substrate [ISCA 08]
Targeting Year 2017
Logically a ring topology
One concentric ring per node
3D stacked: optical, analog, digital
Slide9
Multiple writer single reader (MWSR) interconnects
Arbitration prevents corruption of in-flight data
Latchless/wave-pipelined
Slide10
Motivating an optical arbitration solution
MWSR Arbiter must be:
Global: many writers requesting access
Very fast: otherwise a bottleneck
Optical arbiter avoids OEO (optical-electrical-optical) conversion delays, provides light-speed arbitration
Slide11
Proposed optical protocols
Token-based protocols
Inspired by classic token ring
Token == transmission rights
Fits well with ring-shaped interconnect
Distributed, scalable (limited to ring)
Slide12
Baseline
Based on traditional token protocols
Repeat token at each node
But data is not repeated!
Poor utilization
Interconnect bubble (grows linearly with # of non-requesters)
Slide13
Optical arbitration basics
Token operations: Inject, Seize, Pass
[Figure labels: waveguide, ring resonator, power]
No repeat!
Token latency bounded by the time of flight between requesters.
Slide14
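The contrast between the baseline protocol (token repeated at each node) and the optical token (no repeat, latency bounded by time of flight) can be illustrated with a small model. This is only a sketch under stated assumptions: the uniform hop time `t_hop`, the per-node repeat delay `t_repeat`, and both function names are hypothetical and do not come from the lecture.

```python
def token_latency(src, dst, n_nodes, t_hop, t_repeat=0.0):
    """Time for a token released at node `src` to reach the next requester `dst`.

    t_hop    : time of flight between adjacent nodes (assumed uniform)
    t_repeat : delay added by each intermediate node that repeats the token;
               0 models the optical token, which is not repeated
    """
    hops = (dst - src) % n_nodes or n_nodes  # a full lap if src == dst
    intermediates = hops - 1                 # non-requesters between src and dst
    return hops * t_hop + intermediates * t_repeat


def bubble(src, dst, n_nodes, t_hop, t_repeat):
    """Extra token delay of the baseline over the pure time-of-flight case."""
    return (token_latency(src, dst, n_nodes, t_hop, t_repeat)
            - token_latency(src, dst, n_nodes, t_hop))


if __name__ == "__main__":
    # Illustrative numbers only: 8 nodes, 1 time unit per hop, 2 units per repeat.
    for dst in (1, 2, 4, 7):
        print(dst, token_latency(0, dst, 8, 1.0), bubble(0, dst, 8, 1.0, 2.0))
```

In this model the bubble equals `intermediates * t_repeat`, so it grows linearly with the number of non-requesters between two requesters, while the non-repeated token pays only the time of flight.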
Arbitration solutions
Token Channel: single token / serial writes
Token passing allows token to pace transmission tail (no bubbles)
Token Slot: multiple tokens / simultaneous writes
Token passing allows token to directly precede slot
Slide15
Flow control and fairness
Flow Control:
Use token refresh as opportunity to encode flow control information (credits available)
Arbitration winners decrement credit count
Fairness:
Upstream nodes get first shot at tokens
Need mechanism to prevent starvation of downstream nodes
Slide16
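Credit-based flow control carried on the token, and the upstream-first fairness problem it creates, can be sketched as follows. The `Token` class and the `refresh`, `try_seize`, and `one_lap` helpers are hypothetical names chosen for illustration; the real protocol encodes credits optically and is not specified at this level in the slides.

```python
from dataclasses import dataclass


@dataclass
class Token:
    credits: int  # receiver buffer slots advertised at the last refresh


def refresh(token, free_buffer_slots):
    """Reader re-injects the token, encoding how many credits are available."""
    token.credits = free_buffer_slots
    return token


def try_seize(token, wants_to_send):
    """A writer wins arbitration only if it is requesting and a credit remains."""
    if wants_to_send and token.credits > 0:
        token.credits -= 1  # arbitration winner decrements the credit count
        return True
    return False


def one_lap(token, requests):
    """Offer the token to writers in ring order; return the indices that won."""
    winners = []
    for node, wants_to_send in enumerate(requests):
        if try_seize(token, wants_to_send):
            winners.append(node)
    return winners


if __name__ == "__main__":
    token = refresh(Token(credits=0), free_buffer_slots=2)
    # Nodes 0, 1 and 2 all request; only two credits exist.
    print(one_lap(token, [True, True, True, False]))
```

With two credits and three requesters, the two upstream nodes win and node 2 gets nothing on this lap, which is the starvation case that a fairness mechanism has to address.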
Results - Performance
[Plots: Uniform and HotSpot traffic]
Token Slot benefits from the availability of multiple tokens (multiple writers) and the fast turn-around time of the flow-control mechanism
Slide17
Results - Latency
Token Slot has the lowest latency and saturates at 80%+ load
[Plots: Uniform and HotSpot traffic]
Slide18
Optical arbitration summary
Arbitration speed has to match transfer speed for fine-grained communication
Arbiter has to be optical
High throughput is achievable
85+% for token slot
Limited to simple topologies (MWSR)
Implementation challenges
Opt-elec-logic-elec-opt in 200 ps (@ 5 GHz)
Slide19
Nanophotonics
Nanophotonics overview
Sharing the nanophotonic channel
Light-speed arbitration [MICRO 09]
Utilizing the nanophotonic channel
Atomic coherence [HPCA 11]
Slide20
What makes coherence hard?
Unordered interconnects
Split-transaction buses, meshes, etc.
Speculation
Sharer prediction, speculative data use, etc.
Multiple initiators of coherence requests
L1-to-L2, directory caches, coherence domains, etc.
State-event pair explosion
Verification headache
Slide21
Example: MSI
(SGI-Origin-like, directory, invalidate)
Stable States
Slide22
Example: MSI
(SGI-Origin-like, directory, invalidate)
Stable States
Busy States
Slide23
Example: MSI
(SGI-Origin-like, directory, invalidate)
Stable States
Busy States
Races: “unexpected” events from concurrent requests to same block
Slide24
Cache coherence complexity
L2 MOETSI Transitions [Lepak Thesis, ‘03]
Slide25
Cache coherence verification headache
Papers: “So Many States, So Little Time: Verifying Memory Coherence in the Cray X1”
Formal methods: e.g. Leslie Lamport’s TLA+ specification language @ Intel
Intel Core 2 Duo Errata:
AI39. Cache Data Access Request from One Core Hitting a Modified Line in the L1 Data Cache of the Other Core May Cause Unpredictable System Behavior
Complex Protocol = Complex Verification
Simple Protocol = Simple Verification
Slide26
Atomic Coherence: Simplicity
[Figure panels: w/ races, w/o races]
Slide27
Race resolution
Cause: concurrently active coherence requests to block A
Remedy: only allow one coherence request to block A to be active at a time.
[Figure labels: Core 0, Core 1, their caches, block A]
Slide28
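The remedy for races (at most one active coherence request per block) can be sketched in software as a mutex table that gates every coherence transaction. This is an illustrative model only: the `AtomicSubstrate` class, the `coherence_request` helper, and the one-mutex-per-block mapping are assumptions made for this sketch, not the hardware mechanism from the lecture.

```python
class AtomicSubstrate:
    """Grants at most one active coherence request per block address."""

    def __init__(self):
        self._owner = {}  # block address -> core currently holding its mutex

    def acquire(self, core, block):
        if block in self._owner:
            return False  # another request to this block is already active
        self._owner[block] = core
        return True

    def release(self, core, block):
        if self._owner.get(block) != core:
            raise ValueError("core does not hold this block's mutex")
        del self._owner[block]


def coherence_request(substrate, core, block, action):
    """Run `action` (the coherence transaction) only while holding the mutex."""
    if not substrate.acquire(core, block):
        return None  # caller retries later; the transaction never overlaps
    try:
        return action()
    finally:
        substrate.release(core, block)


if __name__ == "__main__":
    substrate = AtomicSubstrate()
    print(substrate.acquire(0, "A"))  # Core 0 wins block A
    print(substrate.acquire(1, "A"))  # Core 1 must wait
    substrate.release(0, "A")
    print(substrate.acquire(1, "A"))  # now Core 1 can proceed
```

Because the coherence transaction only runs between `acquire` and `release`, it never observes a concurrent request to the same block, which is what removes the race cases from the protocol.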
Race resolution
[Figure labels: Core 0, Core 1, their caches, block A, Atomic Substrate, Coherence Substrate]
Slide29
Race resolution
[Figure labels: Atomic Substrate, Coherence Substrate]
-- Atomic Substrate is on critical path
+ Can optimize substrates separately
Slide30
Atomic & Coherence Substrates
Atomic Substrate (apply fancy nanophotonics here): aggressive
Coherence Substrate (add speculation to a traditional protocol): aggressive
Slide31
Mutexes circulate on ring
Single out mutex: hash(addr X) → λ Y @ cycle Z
Slide32
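The mapping hash(addr X) → λ Y @ cycle Z can be illustrated with a simple address-to-slot function. Everything specific here is an assumption for the sake of the example: the modulo "hash", the 64-byte block size, the function name `mutex_slot`, and the wavelength and cycle counts used in the demo are hypothetical and are not given in the slides.

```python
def mutex_slot(addr, n_wavelengths, n_cycles, block_bytes=64):
    """Map a byte address to (wavelength index Y, ring cycle slot Z).

    The ring carries n_wavelengths * n_cycles mutexes in total: one per
    wavelength in each cycle-long slot circulating on the ring.
    """
    block = addr // block_bytes                 # all bytes of a block share a mutex
    index = block % (n_wavelengths * n_cycles)  # stand-in hash: simple modulo
    wavelength = index % n_wavelengths
    cycle = index // n_wavelengths
    return wavelength, cycle


if __name__ == "__main__":
    # Hypothetical ring: 64 wavelengths, 4 cycle slots, so 256 mutexes.
    for addr in (0, 64, 65, 64 * 64, 64 * 256):
        print(hex(addr), mutex_slot(addr, 64, 4))
```

Since the number of mutexes is finite, distinct blocks can hash to the same (λ, cycle) pair in this sketch; such aliasing would only serialize unrelated requests, it would not reintroduce races.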
Mutex acquire
[Figure labels: Requesting Mutex, Requesting Mutex, Won Mutex]
Exploits OFF-resonance rings: mutex passes P1, P2 uninterrupted
Slide33
Mutex release
[Figure labels: Requesting Mutex, Won Mutex, Release Mutex, Requesting Mutex, Won Mutex]
Slide34
Mutexes on ring
[Figure labels: detectors, injectors, 2 cm, # mutex]
1 mutex = 200 ps = ~2 cm = 1 cycle @ 5 GHz
Latency to:
seize free mutex: ≤ 4 cycles
tune ring resonator: < 1 cycle
Slide35
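The timing figures for mutexes on the ring (200 ps = ~2 cm = 1 cycle @ 5 GHz, seize latency of at most 4 cycles) can be checked with a few lines of arithmetic. The constants below are taken from the slide; the function and variable names are illustrative, and the propagation speed is only what those numbers imply, not a measured value.

```python
C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum (exact by definition)

FREQ_HZ = 5e9         # clock rate from the slide
SLOT_LENGTH_M = 0.02  # ~2 cm of waveguide occupied by one mutex, from the slide


def cycle_time_s(freq_hz):
    """Duration of one clock cycle in seconds."""
    return 1.0 / freq_hz


def implied_speed_m_per_s(slot_length_m, freq_hz):
    """Propagation speed implied by one slot length being covered per cycle."""
    return slot_length_m / cycle_time_s(freq_hz)


cycle = cycle_time_s(FREQ_HZ)                           # about 200 ps
speed = implied_speed_m_per_s(SLOT_LENGTH_M, FREQ_HZ)   # about 1e8 m/s
slowdown_vs_vacuum = C_VACUUM_M_PER_S / speed           # about 3
max_seize_s = 4 * cycle                                 # "<= 4 cycles", about 800 ps

if __name__ == "__main__":
    print(f"cycle = {cycle * 1e12:.0f} ps")
    print(f"implied speed = {speed:.2e} m/s (about c/{slowdown_vs_vacuum:.1f})")
    print(f"worst-case seize latency = {max_seize_s * 1e12:.0f} ps")
```

The numbers are self-consistent: one 5 GHz cycle is 200 ps, and covering about 2 cm in that time corresponds to light travelling at roughly one third of its vacuum speed in the waveguide.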
Atomic Coherence: Complexity
[Figure panels: Static; Dynamic (random tester)]
* Atomic Coherence reduces complexity
Slide36
Performance
(128 in-order cores, optical data interconnect, MOEFSI directory)
[Plot: slowdown relative to non-atomic MOEFSI]
What is causing the slowdown?
[Figure labels: coherence, agnostic]
Slide37
Optimizing coherence
Owned (O) and Forward (F) states: responsible for satisfying on-chip read misses
Opportunity: try to keep O/F alive
If O (or F) block evicted: while mutex is held, ‘shift’ O/F state to sharer (or hand off responsibility)
Observation: holding block B’s mutex gives holder free rein over coherence activity related to block B
Slide38
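The idea of shifting the O/F state to a sharer on eviction can be sketched as a small state update performed while the block's mutex is held. This is a simplified model under stated assumptions: the `evict` function, the per-block dictionary of states, and the choice of the lowest-numbered sharer are all hypothetical, and data movement and writeback are not modeled.

```python
def evict(states, evicting_core):
    """Evict one block from `evicting_core` and return the updated state map.

    states: dict core_id -> 'O', 'F' or 'S' for a single block; only cores
            that hold a copy appear in the dict.
    Assumes the caller already holds the block's mutex, so no other
    coherence request for this block can interleave with the hand-off.
    """
    new_states = dict(states)
    old_state = new_states.pop(evicting_core)
    if old_state in ("O", "F"):
        sharers = sorted(c for c, s in new_states.items() if s == "S")
        if sharers:
            # 'Shift' responsibility for read misses to an existing sharer.
            # Picking the lowest-numbered sharer is an arbitrary choice here.
            new_states[sharers[0]] = old_state
        # With no sharer left, the O/F state simply disappears in this model
        # (a real protocol would have to handle dirty data on an O eviction).
    return new_states


if __name__ == "__main__":
    print(evict({0: "F", 1: "S", 2: "S"}, evicting_core=0))
    print(evict({0: "S", 1: "O"}, evicting_core=0))
```

Because the mutex holder has exclusive control over coherence activity for the block, the hand-off is a single local state change rather than a new set of racing transitions.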
Optimizing coherence
If O (or F) block evicted: ‘shift’ O/F state to sharer
[Chart] Complexity: # L2 transitions (b/c less variety in sharing possibilities)
[Chart] Performance: speedup relative to atomic MOEFSI
Slide39
Atomic Coherence Summary
Nanophotonics as enabler
Very fast chip-wide consensus
Atomic protocols are simpler protocols
And can have minimal cost to performance (w/ nanophotonics)
Opportunity for straightforward protocol enhancements: ShiftF
More details in HPCA-11 paper
Push protocol (update-like)
[Figure labels: races, coherence]
Slide40
Nanophotonics
Nanophotonics overview
Sharing the nanophotonic channel
Light-speed arbitration [MICRO 09]
Utilizing the nanophotonic channel
Atomic coherence [HPCA 11]