
Nathan Reed - PowerPoint Presentation

Uploaded On 2016-03-06




Presentation Transcript

Slide1

Nathan Reed, VR Software Engineer

GameWorks VR

Slide2

How is VR rendering different?

Slide3

How is VR rendering different?

High framerate, low latency

90 frames per second

Motion to photons in ≤ 20 ms

Slide4
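As a back-of-the-envelope check, the frame interval and a pipeline budget can be computed directly. Only the 90 Hz and 20 ms figures come from the slide; the stage breakdown below is an illustrative assumption:

```cpp
#include <cassert>

// Frame period in milliseconds at a given refresh rate:
// 90 Hz -> ~11.1 ms, so the 20 ms motion-to-photon budget is
// less than two frame intervals.
constexpr double frameMs(double hz) { return 1000.0 / hz; }

// True if the sum of (illustrative, not measured) pipeline stage
// times fits within the motion-to-photon budget.
constexpr bool fitsBudget(double trackMs, double cpuMs, double gpuMs,
                          double scanoutMs, double budgetMs) {
    return trackMs + cpuMs + gpuMs + scanoutMs <= budgetMs;
}
```

For example, 1 ms tracking + 2 ms CPU + 9 ms GPU + 8 ms scanout just fits 20 ms; stretch the GPU stage to 16 ms and the budget is blown, which is why tracking, submit, render, and scanout must overlap rather than run strictly in sequence.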

How is VR rendering different?

Stereo rendering

Two eyes, same scene

Slide5

How is VR rendering different?

Correcting for lens distortion

[Figure: rendered image vs. distorted image]

Slide6

GameWorks VR

SDK for VR headset and game developers

Increase rendering performance by putting your pixels where they count

Scale performance with multiple GPUs

Minimize head-tracking latency with asynchronous time warp

Plug-and-play compatibility from GPU to headset

Reduce latency by rendering directly to the front buffer

Slide7

DesignWorks VR

Extra VR features for professional graphics

API for geometry and intensity adjustments for seamless multi-monitor display

Provides tear-free VR environments by synchronizing scanout across GPUs

Fine-grained control to pin OGL contexts to specific GPUs

Reduces latency for video transfer to and from the GPU

Slide8

VR SLI

Slide9

VR SLI

Two eyes...two GPUs!

Slide10

Typical SLI

GPUs render alternate frames

[Diagram: timeline with CPU, GPU 0, GPU 1, and Display lanes; GPU 0 renders frame N while GPU 1 renders frame N+1, and the pipelining adds latency before each frame reaches the display]

Slide11

VR SLI

Each GPU renders one eye, for lower latency

[Diagram: timeline with CPU, GPU 0, GPU 1, and Display lanes; for each frame N, GPU 0 renders the left eye while GPU 1 renders the right eye in parallel, shortening the latency to display]

Slide12
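A deliberately crude latency model captures why splitting the eyes across GPUs beats alternate-frame rendering. The model and all numbers here are illustrative assumptions, not figures from the talk:

```cpp
#include <cassert>

// Toy model: with alternate-frame SLI, frames are pipelined two deep,
// so a frame waits roughly one extra frame interval before display.
constexpr double afrLatencyMs(double frameIntervalMs, double gpuFrameMs) {
    return frameIntervalMs + gpuFrameMs;  // queueing + render
}

// With VR SLI, both GPUs render the two eyes of the *same* frame in
// parallel, so there is no extra pipelining interval.
constexpr double vrSliLatencyMs(double gpuFrameMs) {
    return gpuFrameMs;  // render only
}
```

With an 11.1 ms frame interval and 10 ms of GPU work, the toy model puts alternate-frame latency around 21 ms versus 10 ms for the split-eye scheme, which is the qualitative point of the two diagrams.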

VR SLI

GPU affinity masking: full control

[Diagram: shadow maps, GPU physics, etc. issued to both GPUs; left-eye and right-eye rendering each masked to one GPU]

Slide13
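The semantics of affinity masking can be sketched with a toy command queue. The names and structure here are invented for illustration; they are not the real API entry points:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Toy model of GPU affinity masking: each draw call carries the mask
// that was current when it was submitted, and executes only on the
// GPUs whose bits are set.
struct MaskedQueue {
    uint32_t mask = 0x3;  // bit 0 = GPU 0, bit 1 = GPU 1; default: both
    std::vector<std::pair<uint32_t, std::string>> draws;

    void setGpuMask(uint32_t m) { mask = m; }
    void draw(const std::string& name) { draws.emplace_back(mask, name); }

    // Draws that GPU `i` would execute.
    std::vector<std::string> workFor(unsigned i) const {
        std::vector<std::string> out;
        for (const auto& d : draws)
            if (d.first & (1u << i)) out.push_back(d.second);
        return out;
    }
};
```

Shared work such as shadow maps is submitted with both bits set, then the mask is narrowed to one GPU for each eye's rendering, which is exactly the "full control" granularity the slide describes.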

VR SLI

Broadcasting reduces CPU overhead

Render scene once

[Diagram: a single command stream broadcast to both GPUs, producing the left-eye image on one and the right-eye image on the other]

Slide14

VR SLI

Per-GPU constant buffers, viewports, scissors

[Diagram: the engine issues one stream of work through the multi-GPU API; each GPU applies its own constant buffers and viewports to produce its eye's image]

Slide15
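A minimal sketch of the broadcast idea, with a one-float stand-in for the real per-GPU constant buffers and view matrices (everything here is illustrative):

```cpp
#include <array>
#include <cassert>
#include <vector>

// Stand-in for a per-GPU constant buffer holding that eye's view
// transform (reduced to a horizontal offset for illustration).
struct PerGpuConstants { float viewOffsetX; };

// The engine records the scene once; the same stream is executed on
// both GPUs, each reading its own constant buffer.
inline std::array<std::vector<float>, 2>
broadcastDraw(const std::vector<float>& xs,
              const std::array<PerGpuConstants, 2>& cb) {
    std::array<std::vector<float>, 2> out;
    for (int gpu = 0; gpu < 2; ++gpu)   // same recorded stream...
        for (float x : xs)              // ...per-GPU constants applied
            out[gpu].push_back(x + cb[gpu].viewOffsetX);
    return out;
}
```

The CPU pays for one scene traversal, while the per-GPU state (constant buffers, viewports, scissors) is what makes the two executions diverge into left- and right-eye images.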

VR SLI

Cross-GPU data transfer via PCI Express

Slide16

VR SLI

Multi-GPU API extensions for DX11

Explicitly control work distribution for up to 8 GPUs

Not automatic; needs renderer integration

Public beta very soon

OpenGL extension too, under NDA (ask us!)

Slide17

Single-GPU Stereo Rendering

Slide18

Single-GPU stereo

Reducing CPU overhead

GPU still has to draw twice; not much we can do there

CPU has to submit twice; can we solve that?

Slide19

Single-GPU stereo

DX11/12 command lists

Record rendering API calls in a command list

Replay for each eye, with minimal CPU overhead

Store the view matrix in a global constant buffer

Slide20
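The record-once/replay-per-eye pattern can be mimicked in plain C++, with std::function standing in for recorded API calls. This is a sketch of the idea, not the DX11 command list API:

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy command list: recording captures calls once; replaying
// re-issues them with near-zero CPU cost per replay.
struct CommandList {
    std::vector<std::function<void()>> cmds;
    void record(std::function<void()> c) { cmds.push_back(std::move(c)); }
    void replay() const { for (const auto& c : cmds) c(); }
};
```

The per-eye view matrix lives outside the recorded commands, playing the role of the global constant buffer: update it, replay for the left eye, update it again, replay for the right.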

Single-GPU stereo

GL_NV_command_list

Same idea: record once, replay per eye

App writes bytecode-like rendering commands to a buffer

Submit with one API call

Spec: GL_NV_command_list

For more info: GPU-Driven Large Scene Rendering (GTC 2015)

Slide21

Single-GPU stereo

Stereo instancing

Generate two instances of everything

Vertex shader projects one instance to each eye

One viewport + user clip planes to keep views separate

Two viewports + passthrough GS to route (pretty fast)

Or GL_AMD_vertex_shader_viewport_index (supported on recent NV GPUs too)

Slide22
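A CPU-side sketch of the instancing trick. The eye-separation value and function names are made up, and a real implementation would do this work in the vertex shader:

```cpp
#include <cassert>

struct EyeVertex {
    int viewportIndex;  // 0 = left-eye viewport, 1 = right
    float x;            // position after the per-eye offset
};

// Instances are doubled by the app; even instance IDs go to the left
// eye's viewport, odd IDs to the right, each with its eye's
// half-separation offset applied before projection.
inline EyeVertex stereoVS(int instanceId, float x, float eyeSep) {
    int eye = instanceId & 1;
    float sign = (eye == 0) ? -0.5f : 0.5f;
    return { eye, x + sign * eyeSep };
}
```

Routing by viewport index is what GL_AMD_vertex_shader_viewport_index (or the passthrough GS) provides; the clip-plane variant achieves the same separation with a single viewport.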

Multi-Resolution Shading

Slide23

VR headset optics

Distortion and counter-distortion

Slide24

VR headset optics

Distortion and counter-distortion

[Diagram: the optics sit between the displayed image and the user's view]

Slide25

Distorted rendering

Render normally, then resample

[Figure: rendered image resampled into the distorted image]

Slide26
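The resampling step applies a radial remap; a common form is a polynomial in r². The coefficients below are invented for illustration, since real values are calibrated per headset:

```cpp
#include <cassert>

// Barrel-style radial remap: a point at distance r from the lens
// center moves to distortRadius(r). With positive k1 and k2, points
// are pushed outward more the farther they are from the center,
// counteracting the lens's pincushion distortion.
inline double distortRadius(double r, double k1, double k2) {
    double r2 = r * r;
    return r * (1.0 + k1 * r2 + k2 * r2 * r2);
}
```

The key property is that the displacement grows toward the edges, which is why the periphery of the rendered image gets squeezed (and, as the next slide shows, why rendering it at full resolution is wasteful).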

Distorted rendering

Over-rendering the outskirts

[Figure: rendered image vs. distorted image, showing wasted resolution at the edges]

Slide27

Multi-resolution shading

Subdivide the image, and shrink the outskirts

Slide28
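The savings are easy to estimate for a symmetric 3×3 split. The split fraction and scale below are illustrative choices, not NVIDIA's shipping values:

```cpp
#include <cassert>
#include <cmath>

// Fraction of pixels saved by a symmetric 3x3 multi-res split:
// the center cell (centerFrac of each axis) keeps full resolution,
// edge cells shrink in one axis by outerScale, corners in both.
inline double pixelsSavedFraction(double centerFrac, double outerScale) {
    double c = centerFrac, o = 1.0 - centerFrac, s = outerScale;
    double rendered = c * c            // center cell, full resolution
                    + 2.0 * c * o * s  // four edge cells
                    + o * o * s * s;   // four corner cells
    return 1.0 - rendered;
}
```

For example, a 70% center with the outskirts at half resolution saves about 28% of the pixels, in the ballpark of the "25% pixels saved" configuration shown a few slides later.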

Multi-resolution shading

Fast viewport broadcast on NVIDIA Maxwell GPUs

[Diagram: the geometry pipeline broadcasts each primitive to viewports 1 through N]

Slide29

Standard rendering

Maximum density everywhere

[Figure: ideal pixel density vs. rendered pixel density]

Slide30

Multi-resolution shading

25% pixels saved

[Figure: ideal pixel density vs. rendered pixel density]

Slide31

Multi-resolution shading

50% pixels saved

[Figure: ideal pixel density vs. rendered pixel density]

Slide32

Multi-resolution shading

Needs renderer integration (especially postprocessing)

DX11 & OpenGL API extensions in development

Requires a Maxwell GPU: GTX 900 series, Titan X, Quadro M6000

SDK coming soon

Slide33

GPU Multitasking and VR

Slide34

GPU multitasking

How does it work?

GPU is massively parallel under the hood (vertices, tris, pixels)

Fed by a serial command buffer (state changes, draw calls)

Slide35

GPU multitasking

How does it work?

Many running apps may want to use the GPU, including the desktop compositor!

DX/GL contexts create command packets

Windows enqueues packets for the GPU to execute

Slide36

GPU multitasking

Cooperative more than preemptive

Problem: long packets can't be interrupted (before Windows 10!)

One app using the GPU heavily can slow down the entire system!

The desktop compositor needs a reliable 60 Hz for a good experience

Slide37

GPU multitasking

Low-latency node

Extra WDDM "node" (device) on which work can be scheduled

GPU time-slices between nodes in 1 ms intervals

But current GPUs can only switch at draw call boundaries

Preempts and later resumes the main node

Slide38
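The draw-call-boundary limitation can be illustrated with a toy calculation (the draw durations are invented): the 1 ms slice only bounds the low-latency node's wait if every draw in flight is short.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Worst-case extra wait before the low-latency node can take over,
// when switching is only possible at draw call boundaries: however
// long the longest single draw runs.
inline double worstPreemptionDelayMs(const std::vector<double>& drawMs) {
    double longest = 0.0;
    for (double d : drawMs) longest = std::max(longest, d);
    return longest;
}
```

A workload of sub-millisecond draws preempts within the intended slice, but one 5 ms draw stalls the compositor for 5 ms, which is exactly the kind of hitch the next slides set out to avoid.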

GPU multitasking

Low-latency node

Desktop compositor uses the low-latency node

Still runs at 60 Hz, even if other apps don't

Slide39

VR applications

Also need a reliable framerate

Must refresh at 90 Hz for a good experience

Hitches are really bad in VR!

Need protection similar to the desktop compositor

Slide40

VR compositor

A lot like the desktop compositor

Oculus and Valve both use a VR compositor process

VR apps submit frames to the compositor; it owns the display

Combines multiple VR apps, layers, etc.

Warps old frames for a new head pose (asynchronous timewarp)

Safety: if an app hangs or crashes, fall back to a basic VR environment

Slide41
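Asynchronous timewarp, reduced to one axis for illustration (the pixels-per-degree constant is a made-up stand-in for the headset's real projection, and real timewarp does a full 3D reprojection):

```cpp
#include <cassert>

// Re-project last frame's image for the newest head pose: shift it
// by the yaw change since the frame was rendered, scaled by the
// display's angular resolution. This lets the compositor present
// something pose-correct even when the app misses a frame.
inline double timewarpShiftPixels(double renderedYawDeg,
                                  double latestYawDeg,
                                  double pixelsPerDegree) {
    return (latestYawDeg - renderedYawDeg) * pixelsPerDegree;
}
```

If the head turned 2 degrees since the frame was rendered, the stale image is shifted by the corresponding number of pixels before scanout, hiding the missed update from the user.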

Context priority API

Enables control over GPU prioritization

The VR compositor can use the low-latency node too

DX11 extension API to create a low-latency context

Future: take advantage of Win10 scheduling improvements

Slide42

Direct Mode

Plug-and-play compatibility for VR headsets

Hide the headset from the OS; don't extend the desktop to it

VR apps get exclusive access to the display

Low-level access to video modes, vsync timing, flip chain

Slide43

Front buffer rendering

For low-level wizards

Normally not accessible in DX11

Direct Mode enables access to the front buffer

Enables advanced latency optimizations

Slide44

NVIDIA VR toolkits

              GameWorks VR                 DesignWorks VR
Audience      HMD & Game Developers        HMD & Application Developers
Environments  HMD                          HMD, CAVE & Cluster Solutions
APIs          DirectX 11, OpenGL           DirectX 11, OpenGL
Features      VR SLI                       Synchronization
              Context Priority             GPU Direct for Video
              Direct Mode                  Warp & Blend
              Front Buffer Rendering       GPU Affinity
              Multi-Res Shading (Alpha)

Slide45

Questions?

nreed@nvidia.com

pDevice->Flush();

Slides will be posted: http://developer.nvidia.com