Slide 1
21st Century Computer Architecture
A community white paper
http://cra.org/ccc/docs/init/21stcenturyarchitecturewhitepaper.pdf
Technion, Haifa, Israel, June 2013
Information & Commun. Tech’s Impact
Semiconductor Technology’s Challenges
Computer Architecture’s Future
Example: Bypassing Paged Virtual Memory
Slide 2
White Paper Participants
“*” contributed prose; “**” effort coordinator
Thanks to CCC, Erwin Gianchandani & Ed Lazowska for guidance and Jim Larus & Jeannette Wing for feedback
Sarita Adve, U Illinois *
David H. Albonesi, Cornell U
David Brooks, Harvard U
Luis Ceze, U Washington *
Sandhya Dwarkadas, U Rochester
Joel Emer, Intel/MIT
Babak Falsafi, EPFL
Antonio Gonzalez, Intel/UPC
Mark D. Hill, U Wisconsin *,**
Mary Jane Irwin, Penn State U *
David Kaeli, Northeastern U *
Stephen W. Keckler, NVIDIA/U Texas
Christos Kozyrakis, Stanford U
Alvin Lebeck, Duke U
Milo Martin, U Pennsylvania
José F. Martínez, Cornell U
Margaret Martonosi, Princeton U *
Kunle Olukotun, Stanford U
Mark Oskin, U Washington
Li-Shiuan Peh, M.I.T.
Milos Prvulovic, Georgia Tech
Steven K. Reinhardt, AMD
Michael Schulte, AMD/U Wisconsin
Simha Sethumadhavan, Columbia U
Guri Sohi, U Wisconsin
Daniel Sorin, Duke U
Josep Torrellas, U Illinois *
Thomas F. Wenisch, U Michigan *
David Wood, U Wisconsin *
Katherine Yelick, UC Berkeley/LBNL *
Slide 3
20th Century ICT Set Up
Information & Communication Technology (ICT) Has Changed Our World
  <long list omitted>
Required innovations in algorithms, applications, programming languages, …, & system software
Key (invisible) enablers of (cost-)performance gains
  Semiconductor technology (“Moore’s Law”)
  Computer architecture (~80x per Danowitz et al.)
Slide 4
Enablers: Technology + Architecture
[Figure: Danowitz et al., CACM 04/2012, Figure 1; labeled contributions: Technology, Architecture]
Slide 5
21st Century Promise
ICT Promises Much More
  Data-centric personalized health care
  Computation-driven scientific discovery
  Human network analysis
  Much more: known & unknown
Characterized by
  Big Data
  Always Online
  Secure/Private
  …
Whither enablers of future (cost-)performance gains?
Slide 6
Technology’s Challenges 1/2
Late 20th Century | The New Reality
Moore’s Law: 2× transistors/chip | Transistor count still 2×, BUT…
Dennard Scaling: ~constant power/chip | Gone. Can’t repeatedly double power/chip
Slide 7
Classic CMOS Dennard Scaling: the Science behind Moore’s Law
National Research Council (NRC) – Computer Science and Telecommunications Board (CSTB.org)
Scaling:
  Voltage: $V/a$
  Oxide: $t_{OX}/a$
Results:
  Power/ckt: $1/a^2$
  Power Density: ~Constant
(Finding 2)
Source: Future of Computing Performance: Game Over or Next Level?, National Academy Press, 2011
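The constant power density result can be traced in a few lines. The derivation below is a sketch under assumptions that the slide does not state: a dynamic-power model $P = C V^2 f$ (leakage ignored), capacitance scaling as $C/a$, frequency scaling as $a f$, and circuit area $A$ scaling as $A/a^2$. Only the voltage and oxide scaling and the two results come from the slide.

```latex
\begin{align*}
P  &= C\,V^{2} f
   && \text{(assumed dynamic-power model)} \\
P' &= \frac{C}{a}\left(\frac{V}{a}\right)^{2}(a f) = \frac{C\,V^{2} f}{a^{2}} = \frac{P}{a^{2}}
   && \text{(power per circuit falls as } 1/a^{2}\text{)} \\
\frac{P'}{A'} &= \frac{P/a^{2}}{A/a^{2}} = \frac{P}{A}
   && \text{(power density stays constant)}
\end{align*}
```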
Slide 8
Post-classic CMOS Dennard Scaling
National Research Council (NRC) – Computer Science and Telecommunications Board (CSTB.org)
Values are shown as classic → post-Dennard.
Scaling:
  Voltage: $V/a$ → $V$
  Oxide: $t_{OX}/a$
Results:
  Power/ckt: $1/a^2$ → $1$
  Power Density: ~Constant → $a^2$
Post Dennard CMOS Scaling Rule
TODO: Chips w/ higher power (no), smaller ( ), dark silicon ( ), or other (?)
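The difference between the classic and post-Dennard rules can be checked numerically. The sketch below is illustrative only and assumes a normalized dynamic-power model (power proportional to C·V²·f, leakage ignored), with capacitance shrinking by 1/a, frequency rising by a, and area shrinking by 1/a². The `Circuit` class and `scale` function are hypothetical names introduced here, not part of the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Circuit:
    """Normalized per-circuit quantities (hypothetical model, all start at 1.0)."""
    capacitance: float = 1.0
    voltage: float = 1.0
    frequency: float = 1.0
    area: float = 1.0

    @property
    def power(self) -> float:
        # Assumed dynamic switching power: P ~ C * V^2 * f (leakage ignored).
        return self.capacitance * self.voltage ** 2 * self.frequency

    @property
    def power_density(self) -> float:
        return self.power / self.area


def scale(c: Circuit, a: float, voltage_scales: bool) -> Circuit:
    """Apply one technology generation with scaling factor a.

    voltage_scales=True models classic Dennard scaling (V -> V/a);
    voltage_scales=False models the post-Dennard case (V stays fixed).
    """
    return Circuit(
        capacitance=c.capacitance / a,
        voltage=c.voltage / a if voltage_scales else c.voltage,
        frequency=c.frequency * a,
        area=c.area / a ** 2,
    )


if __name__ == "__main__":
    a = 2.0
    for label, voltage_scales in (("classic", True), ("post-Dennard", False)):
        scaled = scale(Circuit(), a, voltage_scales)
        print(label, "power/ckt:", scaled.power, "power density:", scaled.power_density)
```

Under these assumptions, holding the voltage fixed leaves power per circuit unchanged while power density grows by a², which is why the options listed on the slide all involve giving something up.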
Slide 9
Technology’s Challenges 2/2
Late 20th Century | The New Reality
Moore’s Law: 2× transistors/chip | Transistor count still 2×, BUT…
Dennard Scaling: ~constant power/chip | Gone. Can’t repeatedly double power/chip
Modest (hidden) transistor unreliability | Increasing transistor unreliability can’t be hidden
Focus on computation over communication | Communication (energy) more expensive than computation
1-time costs amortized via mass market | One-time cost much worse & want specialized platforms
How should architects step up as technology falters?
Slide 10
21st Century Comp Architecture
20th Century | 21st Century
Single-chip in generic computer | Architecture as Infrastructure: Spanning sensors to clouds; Performance plus security, privacy, availability, programmability, …
Performance via invisible instr.-level parallelism | Energy First: Parallelism, Specialization, Cross-layer design
Predictable technologies: CMOS, DRAM, & disks | New technologies (non-volatile memory, near-threshold, 3D, photonics, …); Rethink: memory & storage, reliability, communication
Cross-Cutting: Break current layers with new interfaces
Slide 11
21st Century Comp Architecture
20th Century | 21st Century
Single-chip in stand-alone computer | Architecture as Infrastructure: Spanning sensors to clouds; Performance plus security, privacy, availability, programmability, …
Performance via invisible instr.-level parallelism | Energy First: Parallelism, Specialization, Cross-layer design
Predictable technologies: CMOS, DRAM, & disks | New technologies (non-volatile memory, near-threshold, 3D, photonics, …); Rethink: memory & storage, reliability, communication
Cross-Cutting: Break current layers with new interfaces
Slide 12
What Research Exactly?
Research areas in white paper (& backup slides)
  Architecture as Infrastructure: Spanning Sensors to Clouds
  Energy First
  Technology Impacts on Architecture
  Cross-Cutting Issues & Interfaces
Much more research developed by future PIs!
E.g.: Efficient Virtual Memory for Big Memory Servers
  Basu, Gandhi, Chang, Hill, & Swift [ISCA 2013]
  Big Memory: graph500, memcached, databases
  Self-manage most memory (e.g., bufferpool)
Slide 13
Execution Time Overhead: TLB Misses
[Figure: execution time overhead from TLB misses; lower is better]
Significant waste
Larger memory?
Byte-addr NVM?
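Slide 14
Hardware: Direct Segment
[Figure: a virtual address (VA) is translated to a physical address (PA) along two paths, labeled 1 and 2: Conventional Paging and Direct Segment (BASE, LIMIT, OFFSET)]
Why Direct Segment?
  Matches Big Memory Workload needs
  NO Paging => NO TLB Miss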
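The two translation paths on this slide can be sketched in software. The model below is a minimal illustration, not the hardware design from the talk. It assumes that a virtual address inside [BASE, LIMIT) is translated by adding OFFSET, and that every other address goes through a conventional TLB backed by a page table. The class name `DirectSegmentMMU`, the 4 KiB page size, and the dictionary-based page table are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict

PAGE_SIZE = 4096  # assumed page size for the conventional paging path


@dataclass
class DirectSegmentMMU:
    """Toy model of address translation with a direct segment (hypothetical)."""
    base: int    # first virtual address covered by the direct segment
    limit: int   # first virtual address past the direct segment
    offset: int  # added to a virtual address inside the segment
    page_table: Dict[int, int] = field(default_factory=dict)  # VPN -> PFN
    tlb: Dict[int, int] = field(default_factory=dict)         # cached VPN -> PFN
    tlb_misses: int = 0

    def translate(self, va: int) -> int:
        # Direct segment path: a range check plus an add, no TLB involved.
        if self.base <= va < self.limit:
            return va + self.offset
        # Conventional paging path: TLB lookup, page walk on a miss.
        vpn, page_offset = divmod(va, PAGE_SIZE)
        if vpn not in self.tlb:
            self.tlb_misses += 1
            self.tlb[vpn] = self.page_table[vpn]  # KeyError models a page fault
        return self.tlb[vpn] * PAGE_SIZE + page_offset


if __name__ == "__main__":
    mmu = DirectSegmentMMU(base=0x1000_0000, limit=0x2000_0000,
                           offset=0x4000_0000, page_table={1: 7})
    print(hex(mmu.translate(0x1000_0010)), mmu.tlb_misses)  # direct segment path
    print(hex(mmu.translate(0x1004)), mmu.tlb_misses)       # paging path
```

In this sketch, accesses that fall inside the segment never touch the TLB, which is the sense in which "NO Paging => NO TLB Miss" holds for the memory a big-memory application manages itself.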
Slide 15
Execution Time Overhead: TLB Misses
92-100% TLB “misses” to direct segment
Requires: Both small SW + small HW changes
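A rough way to read the 92-100% figure is sketched below. It assumes, as a simplification not stated on the slide, that every TLB miss costs about the same, so the overhead left over is the baseline overhead times the fraction of misses that the direct segment does not cover. The function name and the 30% baseline are hypothetical inputs chosen for illustration, not measurements from the talk.

```python
def remaining_tlb_overhead(baseline_overhead: float, covered_fraction: float) -> float:
    """Estimate TLB-miss overhead left after a direct segment absorbs some misses.

    baseline_overhead: fraction of execution time lost to TLB misses (hypothetical input).
    covered_fraction:  fraction of would-be misses that fall in the direct segment.
    Assumes uniform cost per miss, which is a simplification.
    """
    if not 0.0 <= covered_fraction <= 1.0:
        raise ValueError("covered_fraction must be between 0 and 1")
    if baseline_overhead < 0.0:
        raise ValueError("baseline_overhead must be non-negative")
    return baseline_overhead * (1.0 - covered_fraction)


if __name__ == "__main__":
    hypothetical_baseline = 0.30  # illustrative value only
    for covered in (0.92, 1.00):  # range quoted on the slide
        print(covered, remaining_tlb_overhead(hypothetical_baseline, covered))
```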
Slide 17
Pre-Competitive Research Justified
Retain (cost-)performance enabler to ICT revolution
http://cra.org/ccc/docs/init/21stcenturyarchitecturewhitepaper.pdf
Successful companies cannot do this by themselves
  Lack needed long-term focus
  Don’t want to pay for what benefits all
  Resist transcending interfaces that define their products
Slide 18
White Paper Process
Late March 2012
  CCC contacts coordinator & forms group
April 2012
  Brainstorm (meetings/online doc)
  Read related docs (PCAST, NRC Game Over, ACAR1/2, …)
  Use online doc for intro & outline then parallel sections
  Rotated authors to revise sections
May 2012
  Brainstorm list of researchers in/out of comp. architecture
  Solicit researcher feedback/endorsement
  Do distributed revision & redo of intro
  Release May 25 to CCC & via email
Kudos to participants on executing on a tight timetable
Slide 19
Back Up Slides
Detailed research areas in white paper
  Architecture as Infrastructure: Spanning Sensors to Clouds
  Energy First
  Technology Impacts on Architecture
  Cross-Cutting Issues & Interfaces
http://cra.org/ccc/docs/init/21stcenturyarchitecturewhitepaper.pdf
Findings on National Academy “Game Over” Study
Glimpse at DARPA/ISAT Workshop “Advancing Computer Systems without Technology Progress”
Slide 20
1. Architecture as Infrastructure: Spanning Sensors to Clouds
Beyond a chip in a generic computer
To pillar of 21st century societal infrastructure.
  Computation in context (sensor, mobile, …, data center)
  Systems often large & distributed
  Communication issues can dominate computation
  Goals beyond performance (battery life, form factor)
Opportunities (not exhaustive)
  Reliable sensors harvesting (intermittent) energy
  Smart phones to Star Trek’s medical “tricorder”
  Cloud infrastructure suitable for both “Big Data” streams & low-latency quality-of-service with stragglers
  Analysis & design tools that scale
Slide 21
2. Energy First
Beyond single-core performance computer
To (cost-)performance per watt/joule
Energy across the layers
  Circuit/technology (near-threshold CMOS, 3D stacking)
  Architecture (reducing unnecessary data movement)
  Software (communication-reducing algorithms)
Parallelism to save energy
  Vast (fine-grained) homogeneous & heterogeneous
  Improved SW stack
  Applications focus (beyond graphics processing units)
Specialization for performance & energy efficiency
  Abstractions for specialization (reducing 1-time cost)
  Energy-efficient memory hierarchies
  Reconfigurable logic structures
Slide 22
3. Technology Impacts on Architecture
Beyond CMOS, DRAM, & Disks of last 3+ decades to
Using replacement circuit technologies
  Sub/near-threshold CMOS, QWFETs, TFETs, and QCAs
Non-volatile storage
  Beyond flash memory to STT-RAM, PCRAM, & memristor
3D die stacking & interposers
  logic, cache, small main memory
Photonic interconnects
  Inter- & even intra-chip
Design automation
  from circuit-design w/ new technologies to pre-RTL functional, performance, power, area modeling of heterogeneous chips & systems
Slide 23
4. Cross-Cutting Issues & Interfaces
Beyond performance w/ stable interfaces to
New design goals (for pillar of societal infrastructure)
  Verifiability (bugs kill)
  Reliability (“dependability” computing base?)
  Security/Privacy (w/ non-volatile memory?)
  Programmability (time to correct-performant solution)
Better Interfaces
  High-level information (quality of service, provenance)
  Parallelism ((in)dependence, (lack of) side-effects)
  Orchestrating communication ((recursive) locality)
  Security/Reliability (fine-grain protection)
Slide 24
Executive summary (Added to National Academy Slides)
Highlights of National Academy Findings
  (F1) Computer hardware has transitioned to multicore
  (F2) Dennard scaling of CMOS has broken down
  (F3) Parallelism and locality must be exploited by software
  (F4) Chip power will soon limit multicore scaling
Eight recommendations from algorithms to education
We know all of this at some level, BUT:
  Are we all acting on this knowledge or hoping for business as usual?
  Thinking beyond next paper to where future value will be created?
Questions Asked but Not Answered Embedded in NA Talk
Briefly Close with Kübler-Ross Stages of Grief: Denial … Acceptance
Source: Future of Computing Performance: Game Over or Next Level?, National Academy Press, 2011
Mark Hill talk (http://www.cs.wisc.edu/~markhill/NRCgameover_wisconsin_2011_05.pptx)
Slide 25
The Graph
[Figure: System Capability (log) by decade, 80s through 50s; labeled regions: CMOS, Fallow Period, New Technology; annotation: Our Focus]
Source: Advancing Computer Systems without Technology Progress, ISAT Outbrief (http://www.cs.wisc.edu/~markhill/papers/isat2012_ACSWTP.pdf), Mark D. Hill and Christos Kozyrakis, DARPA/ISAT Workshop, March 26-27, 2012.
Approved for Public Release, Distribution Unlimited
The views expressed are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.
Slide 26
Surprise 1 of 2
Can Harvest in the “Fallow” Period!
2 decades of Moore’s Law-like perf./energy gains
Wring out inefficiencies used to harvest Moore’s Law
  HW/SW Specialization/Co-design (3-100x)
  Reduce SW Bloat (2-1000x)
  Approximate Computing (2-500x)
  ---------------------------------------------------
  ~1000x = 2 decades of Moore’s Law!
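The "~1000x = 2 decades" equivalence can be sanity-checked with a short calculation. The sketch below assumes a two-year doubling period for Moore's Law, a common reading that the slide does not state. It also assumes, for illustration only, that the three factor ranges are independent and multiply. The function and variable names are hypothetical.

```python
import math
from typing import Iterable, Tuple

# (low, high) improvement ranges quoted on the slide.
SLIDE_FACTORS = {
    "HW/SW specialization/co-design": (3, 100),
    "Reduce SW bloat": (2, 1000),
    "Approximate computing": (2, 500),
}


def moore_gain(years: float, doubling_period_years: float = 2.0) -> float:
    """Cumulative gain from doubling every doubling_period_years (assumed 2 years)."""
    return 2.0 ** (years / doubling_period_years)


def combined_range(factors: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Multiply the low ends and the high ends, assuming the factors are independent."""
    pairs = list(factors)
    low = math.prod(lo for lo, _ in pairs)
    high = math.prod(hi for _, hi in pairs)
    return low, high


if __name__ == "__main__":
    print("Two decades of doubling every two years:", moore_gain(20))
    print("Combined low/high range:", combined_range(SLIDE_FACTORS.values()))
```

With a two-year doubling period, twenty years gives ten doublings, or 1024x, which matches the slide's round figure. The multiplied ranges are very wide, so the ~1000x on the slide reads best as a representative target within that span rather than a derived product.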
Slide 27
“Surprise” 2 of 2
Systems must exploit LOCALITY-AWARE parallelism
  Parallelism Necessary, but not Sufficient
  As communication’s energy costs dominate
Shouldn’t be a surprise, but many are in denial
Both surprises hard, requiring “vertical cut” thru SW/HW