Technology Impacts from the New Wave of Architectures for Media-rich Workloads





Samuel Naffziger

AMD Corporate Fellow

June 14th, 2011

VLSI Technology Symposium 2011

Outline

Introduction

The new workloads and demands on computation

Characteristics of serial and parallel computation

The Accelerated Processing Unit (APU) architecture

APU architecture implications for technology

Summary

The Big Experience/Small Form Factor Paradox

| | Mid 1990s: Early Internet and Multimedia Experiences | Mid 2000s: Standard-Definition Internet | Now: Parallel/Data-Dense, Immersive and Interactive Performance |
|---|---|---|---|
| Display | 4:3 @ 0.5 megapixel | 4:3 @ 1.2 megapixels | 16:9 @ 7 megapixels |
| Content | Email, film & scanners | Digital cameras, SD webcams (1-5 MB files) | HD video flipcams, phones, webcams (1 GB) |
| Online | Text and low-res photos | WWW and streaming SD video | 3D Internet apps and HD video online, social networking w/ HD files |
| Multimedia | CD-ROM | DVDs | 3D Blu-ray HD |
| Interface | Mouse & keyboard | Mouse & keyboard | Multi-touch, facial/gesture/voice recognition + mouse & keyboard |
| Battery life* | 1-2 hours | 3-4 hours | All-day computing (8+ hours) |

Slide labels: Workloads, Technology, Form Factors.

*Resting battery life as measured with industry-standard tests.

Focusing on the experiences that matter

[Chart: Consumer PC usage (0% to 100%) by activity: email, web browsing, office productivity, listening to music, online chat, watching online video, photo editing, personal finances, taking notes, online web-based games, social networking, calendar management, locally installed games, educational apps, video editing, Internet phone.]

New experiences:

Immersive gaming

Simplified content management

Accelerated Internet and HD video

Source: IDC's 2009 Consumer PC Buyer Survey

People Prefer Visual Communications

Visual perception vs. verbal perception: words are processed at only 150 words per minute, while pictures and video are processed 400 to 2000 times faster.

Augmenting today's content:

Rich visual experiences

Multiple content sources

Multi-display

Stereo 3D

The Emerging World of New Data Rich Applications

Communicating: IM, email, Facebook video chat, NetMeeting

Gaming: mainstream games, 3D games

Using photos: viewing & sharing; search, recognition, labeling; advanced editing

Using video: DVD, BLU-RAY™, HD; search, recognition, labeling; advanced editing & mixing

The Ultimate Visual Experience™: fast, rich web content, favorite HD movies, games with realistic graphics

Music: listening and sharing; editing and mixing; composing and compositing

Example applications: ArcSoft TotalMedia® Theatre 5, ArcSoft MediaConverter® 7, CyberLink MediaEspresso 6, CyberLink PowerDirector 9, Corel VideoStudio Pro, Corel Digital Studio 2010, Internet Explorer 9, Microsoft® PowerPoint® 2010, Windows Live Essentials, Codemasters F1 2010, Nuvixa Be Present, ViVu Desktop Telepresence, Viewdle Uploader

New Workload Examples: Changing Consumer Behavior

24 hours of video uploaded to YouTube every minute

50 million+ digital media files added to personal content libraries every day

Approximately 9 billion video files owned are high-definition

1000 images uploaded to Facebook every second
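These rates are quoted on different time bases (per minute, per day, per second), which makes them hard to compare at a glance. A minimal sketch of the unit conversion, using only the figures on the slide; the function names are illustrative and not from the talk:

```python
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60


def youtube_hours_per_day(hours_per_minute=24):
    """Hours of video uploaded per day, given hours uploaded per minute."""
    return hours_per_minute * MINUTES_PER_DAY


def youtube_realtime_multiple(hours_per_minute=24):
    """Minutes of video arriving per minute of wall-clock time."""
    return hours_per_minute * 60


def facebook_images_per_day(images_per_second=1000):
    """Images uploaded per day, given images uploaded per second."""
    return images_per_second * SECONDS_PER_DAY


def media_files_per_second(files_per_day=50_000_000):
    """Average rate at which files are added to personal content libraries."""
    return files_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    print(youtube_hours_per_day())              # 34560 hours of video per day
    print(youtube_realtime_multiple())          # 1440x real time
    print(facebook_images_per_day())            # 86400000 images per day
    print(round(media_files_per_second(), 1))   # 578.7 files per second
```

Put on a common base, YouTube alone receives video at 1440 times real time, which is the kind of sustained, data-dense load the following slides are concerned with.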

What Are the Implications for Computation?

Insatiable demand for high-bandwidth processing:

Visual image processing

Natural user interfaces

Massive data mining for associative searches, recognition

Some of these compute needs can be offloaded to servers; some must be done on the mobile device.

Similar compute needs and massive growth in both spaces

How must CPU architecture change to deal with these trends?

Parallel and Serial Computation

Serial code: loops, branches, and conditional evaluation (conditional branches)

i=0
i++
load x(i)
fmul
store
cmp i (16)
bc …

Data-parallel code: loop 1M times for 1M pieces of data

i=0
i++
load x(i)
fmul
store
cmp i (1000000)
bc

Data-parallel code: 2D array representing a very large dataset

i,j=0
i++, j++
load x(i,j)
fmul
store
cmp j (100000)
bc
cmp i (100000)
bc
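The contrast between the two kinds of loop can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not code from the talk: `serial_loop`, `scale_chunk`, and `data_parallel` are hypothetical names; the data-dependent branch and running sum are added to make the serial character explicit; and the thread pool only demonstrates that the data-parallel chunks are independent (CPython threads will not speed up this arithmetic). The loop statements take care of the slide's `i++` counter updates.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_loop(x, trip_count=16):
    """Serial-style code: a short loop whose control flow depends on the data.

    Mirrors the slide's serial example (i=0 ... cmp i (16); bc), with an added
    data-dependent branch and a running sum, so each iteration depends on the
    previous one.
    """
    total = 0.0
    for i in range(trip_count):     # loop counter plus compare-and-branch
        v = x[i]                    # load x(i)
        if v < 0:                   # conditional branch on the loaded value
            continue
        total += v * 2.0            # fmul, accumulated across iterations
    return total


def scale_chunk(chunk, factor):
    """The same multiply applied independently to every element."""
    return [v * factor for v in chunk]


def data_parallel(x, factor, workers=4):
    """Data-parallel code: one operation over a large array.

    No element depends on any other, so the array can be split into chunks
    that are processed independently and then concatenated in order.
    """
    if not x:
        return []
    size = (len(x) + workers - 1) // workers            # ceiling division
    chunks = [x[k:k + size] for k in range(0, len(x), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so the output order is kept
        parts = pool.map(scale_chunk, chunks, [factor] * len(chunks))
    return [v for part in parts for v in part]


if __name__ == "__main__":
    print(serial_loop([float(i - 8) for i in range(16)]))   # 56.0
    big = [float(i) for i in range(1_000_000)]              # 1M pieces of data
    out = data_parallel(big, 3.0)
    print(len(out), out[:3])                                 # 1000000 [0.0, 3.0, 6.0]
```

The serial loop cannot be split without knowing the outcome of each branch and the running total, while the data-parallel loop can be divided among any number of workers, which is the property GPUs exploit.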

GPU/CPU Design Differences

CPU (serial compute)

GPU (parallel compute)

Three Eras of Processor Performance

Single-Core Era (chart: single-thread performance vs. time, with a "we are here" marker)

Enabled by: Moore's Law; voltage & process scaling; microarchitecture

Constrained by: power; complexity

Multi-Core Era (chart: throughput performance vs. time, i.e., # of processors, with a "we are here" marker)

Enabled by: Moore's Law; desire for throughput; 20 years of SMP architecture

Constrained by: power; parallel SW availability; scalability

Heterogeneous Systems Era (chart: targeted application performance vs. time, i.e., data-parallel exploitation, with a "we are here" marker)

Enabled by: Moore's Law; abundant data parallelism; power-efficient GPUs

Temporarily constrained by: programming models; communication overheads; workloads
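Why communication overheads and the serial share of a workload constrain the heterogeneous era can be illustrated with Amdahl's law, a standard model that the slide itself does not use. The sketch below adds a fixed data-movement term as a simplifying assumption; `offload_speedup` and all numbers are hypothetical:

```python
def offload_speedup(parallel_fraction, accel_speedup, comm_overhead=0.0):
    """Amdahl's-law speedup when the data-parallel share of a workload is
    offloaded to an accelerator such as a GPU.

    parallel_fraction: share of the original runtime that is data-parallel (0..1)
    accel_speedup:     how much faster the accelerator runs that share (> 0)
    comm_overhead:     data-movement time as a fraction of the original runtime,
                       treated as a fixed cost (a simplifying assumption)
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if accel_speedup <= 0:
        raise ValueError("accel_speedup must be positive")
    new_time = ((1.0 - parallel_fraction)
                + parallel_fraction / accel_speedup
                + comm_overhead)
    return 1.0 / new_time


if __name__ == "__main__":
    # Hypothetical workload: 90% data-parallel, accelerator 10x faster on it.
    print(round(offload_speedup(0.9, 10), 2))                     # 5.26
    # Same workload, but moving the data costs 10% of the original runtime.
    print(round(offload_speedup(0.9, 10, comm_overhead=0.1), 2))  # 3.45
    # Even an arbitrarily fast accelerator is capped by the serial 10%.
    print(round(offload_speedup(0.9, 1e12), 2))                   # 10.0
```

In this model, a communication cost of 10% of the original runtime removes about a third of the gain, which is one way to read the "communication overheads" constraint.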

Heterogeneous Computing with an APU Architecture

2010 IGP-based ("Danube") platform: a CPU chip (CPU cores, UNB, MC) attached to DDR3 DIMM memory, and a separate FCH chip (GPU, UVD, SB functions) with PCIe®. The diagram labels links of ~7 GB/sec and ~17 GB/sec.

Bandwidth pinch points and latency hold back the GPU capabilities.

2011 APU-based ("Llano") platform: a single APU chip (CPU cores, GPU, UVD, UNB/MC) attached to DDR3 DIMM memory at ~27 GB/sec, with PCIe® and an optional discrete GPU.

Integration provides improvement:

Eliminates the power and latency of an extra chip crossing

3X bandwidth between GPU and memory!

The same-sized GPU is substantially more effective

Power-efficient, advanced technology for both CPU and GPU

Graphics requires memory BW to bring its full capabilities to life
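One standard way to see why extra memory bandwidth makes the same-sized GPU more effective is the roofline model, which is not part of the slide: attainable throughput is the lesser of peak compute and memory bandwidth times arithmetic intensity (floating-point operations per byte moved). In the sketch below the peak (480 GFLOP/s) and the intensity (2 operations/byte) are hypothetical, and the two bandwidth figures quoted on the slide are plugged in only as example inputs:

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline estimate: throughput is capped either by compute or by how
    fast memory can feed data (bandwidth x arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


if __name__ == "__main__":
    PEAK = 480.0       # hypothetical GPU peak, GFLOP/s (same GPU in both cases)
    INTENSITY = 2.0    # hypothetical kernel: 2 floating-point ops per byte
    for label, bw in (("~7 GB/sec", 7.0), ("~27 GB/sec", 27.0)):
        print(label, attainable_gflops(PEAK, bw, INTENSITY))
    # ~7 GB/sec 14.0
    # ~27 GB/sec 54.0
```

For a bandwidth-bound kernel, the GPU's peak never comes into play, and attainable throughput grows in direct proportion to bandwidth.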

The Challenges of Integration

CPU: thick, fast metal; big devices

GPU: dense, thin metal; small devices

[Chart: CPU and GPU on performance vs. density axes.]

CPU flop (flip-flop) area = 2.14; GPU flop area = 1.0 (normalized)

Flop count for 4 Llano CPU cores = 0.66M; flop count for Llano GPU = 3.5M
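The slide's figures can be combined into total flip-flop area, assuming "flop" means flip-flop (which matches the counts and the later "~5X the flops" statement) and that the areas are per-flop values normalized to the GPU. A minimal sketch of that arithmetic; the constant names are illustrative:

```python
CPU_FLOP_AREA = 2.14   # normalized area per flip-flop in the CPU (from the slide)
GPU_FLOP_AREA = 1.0    # normalized area per flip-flop in the GPU (from the slide)
CPU_FLOPS = 0.66e6     # flip-flops in the 4 Llano CPU cores (from the slide)
GPU_FLOPS = 3.5e6      # flip-flops in the Llano GPU (from the slide)


def total_flop_area(count, area_per_flop):
    """Total flip-flop area in normalized units."""
    return count * area_per_flop


if __name__ == "__main__":
    cpu_area = total_flop_area(CPU_FLOPS, CPU_FLOP_AREA)
    gpu_area = total_flop_area(GPU_FLOPS, GPU_FLOP_AREA)
    print(f"GPU/CPU flop count ratio: {GPU_FLOPS / CPU_FLOPS:.1f}")   # 5.3
    print(f"GPU/CPU flop area ratio:  {gpu_area / cpu_area:.2f}")     # 2.48
    # Hypothetical: cost of building the GPU's flops from CPU-style cells
    print(f"GPU flop area with CPU-style cells: "
          f"{GPU_FLOPS * CPU_FLOP_AREA / gpu_area:.2f}x")             # 2.14x
```

Roughly 5.3 times as many flops fit in about 2.5 times the area because each GPU flop is less than half the size, which is why one metal and device recipe cannot suit both blocks.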

How to Balance the Metal Stack?

With the 20nm node, even local metal will see a large RC increase, making compromises more difficult.

[Figure: Cu resistivity (µΩ·cm, 1.5 to 2.5) vs. line width (µm, 0 to 1), with and without barrier.]

Add metal layers?

Thin, dense layers for the GPU

Thick, low-resistance layers for the CPU

Cost issues?

Via resistance?

Technology improvements in BEOL are required.

[Chart: CPU and GPU on performance vs. density axes.]
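A rough sense of why narrow local wires hurt can be had from first-order formulas: resistance per unit length R' = ρ/(W·T), and the Elmore delay of a distributed RC wire, 0.5·R·C. The sketch below is illustrative only: the resistivities are hypothetical values chosen inside the plotted range, and the geometry, capacitance per length, and wire length are assumptions, not data from the slide.

```python
def resistance_per_um(rho_uohm_cm, width_um, thickness_um):
    """Resistance per micron of a rectangular wire: R' = rho / (W * T)."""
    rho_ohm_um = rho_uohm_cm * 1e-2   # 1 uOhm*cm = 1e-8 Ohm*m = 1e-2 Ohm*um
    return rho_ohm_um / (width_um * thickness_um)   # Ohm per um


def elmore_delay_ps(r_ohm_per_um, c_ff_per_um, length_um):
    """Elmore delay of a distributed RC wire, 0.5 * R * C, in picoseconds."""
    r_total = r_ohm_per_um * length_um      # Ohm
    c_total = c_ff_per_um * length_um       # fF
    return 0.5 * r_total * c_total * 1e-3   # Ohm * fF = fs; convert to ps


if __name__ == "__main__":
    C_PER_UM = 0.2     # assumed capacitance, fF/um, held constant
    LENGTH = 100.0     # assumed wire length, um
    # Hypothetical wide and narrow lines; the narrow one also has a higher
    # effective resistivity (size effects plus barrier).
    for name, rho, w, t in (("wide", 1.9, 0.10, 0.20),
                            ("narrow", 2.4, 0.05, 0.10)):
        r = resistance_per_um(rho, w, t)
        print(f"{name}: {r:.2f} Ohm/um, "
              f"{elmore_delay_ps(r, C_PER_UM, LENGTH):.2f} ps")
```

Halving both width and thickness alone would quadruple R'; the rise in resistivity at narrow widths pushes the increase past 5X in this example.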

Device Optimization

To achieve breakthrough APU performance, the Llano GPU has ~5X the flops (flip-flops) and ~5X the device count of the CPUs.

[Chart: speed vs. leakage: device Ioff vs. ring-oscillator (RO) speed for the LVT, LC-RVT, RVT, HVT, and LC-HVT devices, with the desired device ranges for the CPU and the GPU.]

A broader span of devices is required.
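The speed-vs-leakage trade across Vt flavors can be sketched with two textbook approximations: subthreshold leakage falls one decade per subthreshold swing S of added Vt, and on-current follows the alpha-power law, I_on ∝ (Vdd − Vt)^α. The Vt offsets, swing, supply, and α below are hypothetical, and the device names from the slide are used only as labels:

```python
def ioff_ratio(delta_vt_mv, swing_mv_per_decade=90.0):
    """Relative off-current after raising Vt by delta_vt_mv.

    In subthreshold, Ioff falls one decade for every `swing` mV of added Vt,
    so Ioff(Vt + dVt) / Ioff(Vt) = 10 ** (-dVt / S).
    """
    return 10.0 ** (-delta_vt_mv / swing_mv_per_decade)


def speed_ratio(vdd, vt_ref, vt_new, alpha=1.3):
    """Relative drive (and so speed at fixed Vdd) from the alpha-power law,
    I_on ~ (Vdd - Vt) ** alpha."""
    return ((vdd - vt_new) / (vdd - vt_ref)) ** alpha


if __name__ == "__main__":
    VDD = 1.0       # assumed supply, V
    VT_LVT = 0.30   # assumed low-Vt device threshold, V
    # Hypothetical Vt offsets (mV) relative to the low-Vt device
    for name, dvt_mv in (("LVT", 0), ("RVT", 60), ("HVT", 120)):
        leak = ioff_ratio(dvt_mv)
        speed = speed_ratio(VDD, VT_LVT, VT_LVT + dvt_mv / 1000)
        print(f"{name}: Ioff x{leak:.3f}, speed x{speed:.2f}")
```

In this model, each step up in Vt buys a large, exponential cut in leakage at the cost of a modest, polynomial loss in speed. That is why a dense, leakage-sensitive GPU and a speed-critical CPU end up wanting different regions of the device range.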

Power Transfers

[Chart: CPU/GPU power split and temperature for a CPU-centric serial workload, a balanced workload, and a GPU-centric data-parallel workload.]

Voltage range is critical to enabling the efficient power transfers that make for compelling APU performance.
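One way to see why voltage range governs how much power can be shifted between the CPU and the GPU: dynamic power is P = C·V²·f, and if frequency scales roughly linearly with voltage (a first-order assumption), P grows as V³. The span of power a unit can absorb or release is then fixed by its minimum and maximum voltage. A minimal sketch; the voltage limits are hypothetical:

```python
def dynamic_power(v, f, c_eff=1.0):
    """Switching power P = C_eff * V^2 * f (activity folded into C_eff)."""
    return c_eff * v * v * f


def power_range(v_min, v_max, f_per_volt=1.0, c_eff=1.0):
    """Ratio of max to min dynamic power for a unit operated between v_min
    and v_max, assuming f = f_per_volt * V (so P scales as V^3)."""
    p_lo = dynamic_power(v_min, f_per_volt * v_min, c_eff)
    p_hi = dynamic_power(v_max, f_per_volt * v_max, c_eff)
    return p_hi / p_lo


if __name__ == "__main__":
    # Hypothetical operating windows
    print(f"0.8-1.2 V window: {power_range(0.8, 1.2):.2f}x power span")
    print(f"0.9-1.1 V window: {power_range(0.9, 1.1):.2f}x power span")
```

In this model, shrinking the window from 0.8-1.2 V to 0.9-1.1 V cuts the transferable power span from about 3.4X to about 1.8X.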

Operating Voltage Range

Operating voltage requirements:

Low voltage necessary for power efficiency

High voltage necessary for a snappy user experience enabled by turbo mode
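The two requirements pull in opposite directions, as a first-order model shows: switching energy per operation scales as C·V², while attainable frequency under the alpha-power law scales roughly as (V − Vt)^α / V. The threshold voltage and α below are assumed values for illustration, not figures from the talk:

```python
def relative_frequency(v, vt=0.35, alpha=1.3):
    """Alpha-power-law estimate: gate delay ~ V / (V - Vt)**alpha, so
    attainable frequency ~ (V - Vt)**alpha / V (arbitrary units)."""
    if v <= vt:
        raise ValueError("supply must exceed threshold voltage")
    return (v - vt) ** alpha / v


def relative_energy_per_op(v, c_eff=1.0):
    """Switching energy per operation, E = C_eff * V^2 (arbitrary units)."""
    return c_eff * v * v


if __name__ == "__main__":
    v_ref = 1.0   # assumed nominal supply, V
    for v in (0.7, 0.9, 1.1, 1.3):
        f = relative_frequency(v) / relative_frequency(v_ref)
        e = relative_energy_per_op(v) / relative_energy_per_op(v_ref)
        print(f"V={v:.1f}: frequency x{f:.2f}, energy/op x{e:.2f}")
```

Running well below nominal roughly halves the energy per operation, while turbo frequencies require going well above it. A product that needs both behaviors needs both ends of the voltage range.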

Operating Voltage Challenges

To maintain cost-effective performance growth with each technology node, the GPU must:

Hold power density constant

Exploit density gains to add compute units

This necessarily drives operating voltage down.

This would be good for energy efficiency, except …

Variation impacts are much greater at low voltage.
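The growing impact of variation at low voltage can be illustrated with the same alpha-power-law delay model, delay ∝ V / (V − Vt)^α. A fixed random Vt shift eats a larger fraction of the gate overdrive (V − Vt) as V falls. The nominal Vt, the 30 mV shift, and α are hypothetical:

```python
def delay(v, vt, alpha=1.3):
    """Alpha-power-law gate delay, ~ V / (V - Vt)**alpha (arbitrary units)."""
    return v / (v - vt) ** alpha


def delay_shift(v, vt=0.35, dvt=0.03, alpha=1.3):
    """Fractional delay increase for a device whose Vt is dvt above nominal."""
    return delay(v, vt + dvt, alpha) / delay(v, vt, alpha) - 1.0


if __name__ == "__main__":
    # Same hypothetical 30 mV Vt shift evaluated at several supply voltages
    for v in (1.1, 0.9, 0.7, 0.6):
        print(f"V={v:.1f}: +{100 * delay_shift(v):.1f}% delay")
```

In this model, the same 30 mV shift costs roughly 5% in delay at 1.1 V but about 18% at 0.6 V, so worst-case timing margins grow as the operating voltage drops.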

The Operating Voltage Challenge

Many barriers to maintaining both high and low voltage as technology scales:

TDDB vs. SCE control

ULK breakdown vs. denser pitches

Variation control

[Figure: fin device on BOX with poly gate, fin, source (S), drain (D), and current-flow direction.]

FD devices should enable maintaining the functional range for a generation or two.

Will turbo modes be too compromised?

What's next?

3D Integration to the Rescue?

[Figure: 3D stack of a CPU die, a GPU die, and an analog die (SB, power), each with its own metal layers, connected by through-silicon vias (TSVs) and micro-bumps; also shown: DRAM, south bridge, package substrate, TIM (thermal interface material), and heat sink.]

Stacking offers many attractive benefits:

Higher bandwidth to local memory

Lets the parallel and serial compute die each use their own separately optimized technology (interconnect speed vs. density, device optimization, etc.)

Allows IO and southbridge content to remain in older, more analog-friendly technology
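The "higher bandwidth to local memory" benefit comes mostly from interface width: peak bandwidth equals bus width times per-pin data rate. A minimal sketch comparing one 64-bit DDR3-1600 channel with a hypothetical wide TSV interface (the 1024-bit width and 1 Gb/s per-pin rate are assumptions, not figures from the talk):

```python
def peak_bandwidth_gb_s(bus_width_bits, data_rate_gbps_per_pin):
    """Peak bandwidth in GB/s = width (bits) x per-pin rate (Gb/s) / 8."""
    return bus_width_bits * data_rate_gbps_per_pin / 8


if __name__ == "__main__":
    # One 64-bit DDR3-1600 channel: 1.6 Gb/s per pin
    print(f"DDR3-1600 channel: {peak_bandwidth_gb_s(64, 1.6):.1f} GB/s")
    # Hypothetical stacked-DRAM interface: many slower pins through TSVs
    print(f"1024-bit TSV link: {peak_bandwidth_gb_s(1024, 1.0):.1f} GB/s")
```

Because TSVs allow thousands of short connections, a stacked interface can run each pin slower and still deliver several times the bandwidth of a conventional memory channel.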

3D Integration Challenges

Economical 3D stacking in high-volume manufacturing presents many challenges:

Benefits must exceed the additional costs of TSVs and yield fallout

Logistics of testing and assembling die from multiple sources can be immense

Countless mechanical and thermal issues to solve in high-volume manufacturing

Clearly 3D provides compelling solutions to many problems, but the barriers to entry mean heavy R&D spending and partnerships are required.
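The yield-fallout point can be made concrete with a simple compound-yield model: if die are stacked without pre-testing and failures are independent, a stack works only when every die and every bonding step is good, and the cost of every failed stack is spread over the good ones. All yields and costs below are hypothetical:

```python
from math import prod


def stack_yield(die_yields, bond_yield, bonds):
    """Yield of a stack built from untested die: every die and every bonding
    step must be good (independent failures assumed)."""
    return prod(die_yields) * bond_yield ** bonds


def cost_per_good_stack(die_costs, assembly_cost, yield_fraction):
    """Total die and assembly cost spread over the stacks that work."""
    return (sum(die_costs) + assembly_cost) / yield_fraction


if __name__ == "__main__":
    # Hypothetical CPU, GPU, and analog die yields; two bonded interfaces
    y = stack_yield([0.90, 0.85, 0.95], bond_yield=0.99, bonds=2)
    print(f"stack yield: {y:.3f}")
    # Hypothetical die costs and assembly cost (arbitrary units)
    print(f"cost per good stack: {cost_per_good_stack([20, 30, 10], 5, y):.1f}")
```

Testing die before assembly (known-good die) pushes the die terms toward 1 and leaves mainly the bonding yield. That is why the testing and multi-source logistics above weigh so heavily on the economics.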

Summary

Insatiable demand for high-bandwidth computation:

Visual image processing

Natural user interfaces

Massive data mining for associative searches, recognition

Some of these compute needs can be offloaded to servers; some must be done on the mobile device.

Similar compute needs and massive growth in both spaces

Combined serial and parallel computation architectures are key in both spaces.

Huge technology challenges stand in the way of meeting this opportunity:

Interconnect scaling is hitting a wall that must be overcome.

A broad device suite is necessary that operates efficiently at low voltage while enabling high speed for response time.

3D integration offers a promising long-term solution.