10/30/2010
GRAPHICS PROCESSING UNIT
Shashwat Shriparv
dwivedishashwat@gmail.com
InfinitySoft
Presentation Overview
Definition
Comparison with CPU
Architecture
GPU-CPU Interaction
GPU Memory
Why GPU?
To provide separate, dedicated graphics resources, including a graphics processor and memory.
To relieve some of the burden on the main system resources, namely the Central Processing Unit, Main Memory, and the System Bus, which would otherwise get saturated with graphical operations and I/O requests.
There comes the GPU.
What is a GPU?
A Graphics Processing Unit, or GPU (also occasionally called a Visual Processing Unit, or VPU), is a dedicated processor efficient at manipulating and displaying computer graphics. Like the CPU (Central Processing Unit), it is a single-chip processor.
HOWEVER,
The abstract goal of a GPU is to enable a representation of a 3D world as realistically as possible. GPUs are therefore designed to provide additional computational power customized specifically for these 3D tasks.
GPU vs CPU
A GPU is tailored for highly parallel operation while a CPU executes programs serially.
For this reason, GPUs have many parallel execution units, while CPUs have few. GPUs have significantly faster and more advanced memory interfaces, as they need to shift around far more data than CPUs. GPUs also have much deeper pipelines (several thousand stages vs. 10-20 for CPUs).
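The serial-vs-parallel contrast above can be sketched in plain Python. This is a toy illustration, not real GPU code: the "brighten" kernel and the pixel values are made up, and `map()` merely stands in for many parallel execution units.

```python
def cpu_style_brighten(pixels, amount):
    """Serial: one execution unit walks the pixels in order."""
    out = []
    for p in pixels:
        out.append(min(255, p + amount))
    return out

def gpu_style_brighten(pixels, amount):
    """Data-parallel: the same small kernel runs independently per pixel.
    map() stands in for thousands of parallel execution units."""
    kernel = lambda p: min(255, p + amount)
    return list(map(kernel, pixels))

pixels = [10, 120, 250, 30]
# Both styles compute the same result; only the execution model differs.
assert cpu_style_brighten(pixels, 20) == gpu_style_brighten(pixels, 20)
```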
BRIEF HISTORY
First-Generation GPUs: up to 1998; Nvidia's TNT2, ATi's Rage, and 3dfx's Voodoo3; DX6 feature set.
Second-Generation GPUs: 1999-2000; Nvidia's GeForce256 and GeForce2, ATi's Radeon7500, and S3's Savage3D; T&L; OpenGL and DX7; configurable.
Third-Generation GPUs: 2001; GeForce3/4Ti, Radeon8500, MS's Xbox; OpenGL ARB, DX7/8; vertex programmability + ASM.
Fourth-Generation GPUs: 2002 onwards; GeForce FX family, Radeon 9700; OpenGL + extensions, DX9; vertex/pixel programmability + HLSL; 0.13μ process, 125M T/C, 200M T/S.
Fifth-Generation GPUs: GeForce 8X; DirectX 10.
GPU Architecture
How many processing units? Lots.
How many ALUs? Hundreds.
Do you need a cache? Sort of.
What kind of memory? Very fast.
The difference…
Without GPU
With GPU
The GPU pipeline
The GPU receives geometry information from the CPU as input and provides a picture as output.
Let's see how that happens…
host interface → vertex processing → triangle setup → pixel processing → memory interface
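The five stages can be chained together in a toy Python sketch (no graphics API involved; every data type here is a simplified stand-in, and the "transform" is a hypothetical uniform scale):

```python
def host_interface(cmds):
    # Pull the geometry the CPU sent along with its commands.
    return cmds["geometry"]

def vertex_processing(vertices):
    # Hypothetical transform: scale every vertex by 2.
    return [(x * 2, y * 2) for x, y in vertices]

def triangle_setup(vertices):
    # Stub "rasterization": snap screen-space positions to pixel coordinates.
    return [(int(x), int(y)) for x, y in vertices]

def pixel_processing(pixels):
    # Flat white shading for every generated pixel.
    return {p: (255, 255, 255) for p in pixels}

def memory_interface(colored_pixels):
    # "Write" the final colors to the framebuffer.
    return colored_pixels

picture = memory_interface(
    pixel_processing(triangle_setup(vertex_processing(
        host_interface({"geometry": [(0.5, 0.5), (1.0, 1.5)]})
    )))
)
```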
Details…
Host Interface
The host interface is the communication bridge between the CPU and the GPU.
It receives commands from the CPU and also pulls geometry information from system memory. It outputs a stream of vertices in object space with all their associated information (texture coordinates, per-vertex color, etc.).
Vertex Processing
The vertex processing stage receives vertices from the host interface in object space and outputs them in screen space.
This may be a simple linear transformation, or a complex operation involving morphing effects.
No new vertices are created in this stage, and no vertices are discarded (input/output has a 1:1 mapping).
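A minimal sketch of the vertex-processing idea in plain Python: each vertex is pushed through a 4×4 matrix. The matrix below is a hypothetical scale-plus-translation, not a full model-view-projection chain.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component homogeneous vertex."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

transform = [
    [2, 0, 0, 10],   # scale x by 2, translate x by 10
    [0, 2, 0,  5],   # scale y by 2, translate y by 5
    [0, 0, 1,  0],
    [0, 0, 0,  1],
]

object_space = [[0, 0, 0, 1], [1, 1, 0, 1]]
screen_space = [mat_vec(transform, v) for v in object_space]
# Note the 1:1 mapping: two vertices in, two vertices out.
```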
Triangle setup
In this stage, geometry information becomes raster information (screen-space geometry is the input, pixels are the output).
Prior to rasterization, triangles that are backfacing or located outside the viewing frustum are rejected.
Triangle Setup (cont.)
A pixel is generated if and only if its center is inside the triangle.
Every pixel generated has its attributes computed as the perspective-correct interpolation of the three vertices that make up the triangle.
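The "pixel center inside the triangle" rule can be sketched with edge functions: the sign of each cross product tells which side of an edge the point lies on, and the point is inside when all three signs agree (this handles both triangle windings).

```python
def edge(a, b, p):
    """Signed area test: which side of edge a->b does point p lie on?"""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def inside(tri, p):
    """A pixel center p is covered iff all three edge tests agree in sign."""
    s0 = edge(tri[0], tri[1], p)
    s1 = edge(tri[1], tri[2], p)
    s2 = edge(tri[2], tri[0], p)
    return (s0 >= 0 and s1 >= 0 and s2 >= 0) or \
           (s0 <= 0 and s1 <= 0 and s2 <= 0)

tri = [(0, 0), (4, 0), (0, 4)]
assert inside(tri, (1.5, 1.5))      # center inside -> pixel is generated
assert not inside(tri, (3.5, 3.5))  # center outside -> no pixel
```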
Pixel Processing
Each pixel provided by triangle setup is fed into pixel processing as a set of attributes, which are used to compute the final color for that pixel.
The computations taking place here include texture mapping and math operations.
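A toy sketch of pixel processing: a pixel's attributes (here, uv texture coordinates) drive a texture fetch followed by a math operation. The 2×2 "texture" and the brightness scale are hypothetical stand-ins.

```python
# A hypothetical 2x2 RGB texture.
texture = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

def sample(tex, u, v):
    """Nearest-neighbor texture lookup for uv in [0, 1)."""
    x = int(u * len(tex[0]))
    y = int(v * len(tex))
    return tex[y][x]

def shade(u, v, brightness):
    """Texture fetch followed by a math op (brightness scale)."""
    r, g, b = sample(texture, u, v)
    return (int(r * brightness), int(g * brightness), int(b * brightness))
```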
Memory Interface
Pixel colors provided by the previous stage are written to the framebuffer.
This used to be the biggest bottleneck before pixel processing took over.
Before the final write occurs, some pixels are rejected by the z-buffer.
On modern GPUs, z is compressed to reduce framebuffer bandwidth (but not size).
Programmability in GPU pipeline
In current state-of-the-art GPUs, vertex and pixel processing are now programmable.
The programmer can write programs that are executed for every vertex as well as for every pixel.
This allows fully customizable geometry and shading effects that go well beyond the generic look and feel of older 3D applications.
GPU Pipelined Architecture (simplified view)
CPU → Vertex Setup → Vertex Shader → Rasterizer → Pixel Shader → Framebuffer
(the Pixel Shader samples from Texture Storage + Filtering; vertices go in, pixels come out)
One unit can limit the speed of the pipeline…
CPU/GPU interaction
The CPU and GPU inside the PC work in parallel with each other.
There are two "threads" going on, one for the CPU and one for the GPU, which communicate through a command buffer: the CPU writes commands at one end, the GPU reads them from the other, and pending GPU commands sit in between.
CPU/GPU interaction (cont)
If this command buffer is drained empty, we are CPU limited and the GPU will spin, waiting for new input. All the GPU power in the universe isn't going to make your application faster!
If the command buffer fills up, the CPU will spin, waiting for the GPU to consume commands, and we are effectively GPU limited.
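Both limiting cases can be sketched with a bounded queue standing in for the command buffer (the class name, capacity, and command strings are made up for illustration):

```python
from collections import deque

class CommandBuffer:
    """Toy CPU->GPU command buffer: CPU produces, GPU consumes."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pending = deque()

    def cpu_write(self, cmd):
        if len(self.pending) == self.capacity:
            return "GPU limited"       # buffer full: CPU spins waiting
        self.pending.append(cmd)
        return "ok"

    def gpu_read(self):
        if not self.pending:
            return "CPU limited"       # buffer empty: GPU spins waiting
        return self.pending.popleft()

buf = CommandBuffer(capacity=2)
assert buf.gpu_read() == "CPU limited"            # drained empty
buf.cpu_write("draw A"); buf.cpu_write("draw B")
assert buf.cpu_write("draw C") == "GPU limited"   # filled up
assert buf.gpu_read() == "draw A"                 # FIFO consumption
```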
Synchronization issues
The CPU must not overwrite a data block in the buffer until the GPU is done with the pending command that references that data.
Inlining data
One way to avoid these problems is to inline all data into the command buffer and avoid references to separate data.
However, this is also bad for performance, since we may need to copy several megabytes of data instead of merely passing around a pointer.
GPU readbacks
The output of a GPU is a rendered image on the screen; what happens if the CPU tries to read it?
The GPU must be synchronized with the CPU, i.e. it must drain its entire command buffer, and the CPU must wait while this happens.
GPU readbacks (cont)
We lose all parallelism: first the CPU waits for the GPU, then the GPU waits for the CPU (because the command buffer has been drained).
Both CPU and GPU performance take a nosedive.
Bottom line: the image the GPU produces is for your eyes, not for the CPU (treat the CPU → GPU highway as a one-way street).
About GPU memory…
Memory Hierarchy
CPU side: registers → caches → main memory → disk.
GPU side: temporary registers → constant registers → caches → video memory.
Where is GPU Data Stored?
Vertex buffer, frame buffer, and texture.
Vertex Buffer → Vertex Processor → Rasterizer → Fragment Processor → Frame Buffer(s); the Fragment Processor also samples from Texture.
CPU memory vs GPU memory

             CPU                 GPU
Registers    Read/write          Read/write
Local Mem    Read/write stack    None
Global Mem   Read/write heap     Read-only during computation;
                                 write-only at end (to a pre-computed address)
Disk         Read/write disk     None
It looks like…
Some applications…..
Computer-generated holography using a graphics processing unit.
Improving the performance of CAD tools.
Computer graphics in games.
New…
NVIDIA's new graphics processing unit, the GeForce 8X ULTRA, is said to represent the very latest in visual-effects technologies.
THANK YOU
Shashwat Shriparv
dwivedishashwat@gmail.com
InfinitySoft