
The Intersection of - PowerPoint Presentation

Uploaded On 2016-03-13


The Intersection of Game Engines & GPUs: Current & Future

Johan Andersson, Rendering Architect

2.5

Agenda

Goal: Share and discuss current & future graphics use cases in our games and their implications for graphics hardware
Areas: Engine overview, Shaders, Parallelization, Texturing, Raytracing, GPU compute, Conclusions, Q & A

Frostbite

DICE proprietary engine
  Xbox 360, PS3, Windows (Direct3D 10)
Focus
  Large outdoor environments
  Singleplayer & multiplayer
  Destruction!
New: Content workflows

Battlefield: Bad Company screenshot

Battlefield: Bad Company screenshot

Graph-based surface shaders

Artist-friendly
  Easy to create, tweak & manage
Flexible
  Programmers & artists can extend & expose features
Data-centric
  Encapsulates resources
  Transformable

Rich high-level shading framework
  Used by all content & systems

Shader permutations

Generate shader permutations
  For each used combination of features/data
  HLSL vertex & pixel shaders
Many features = permutation explosion
  Shader graphs, lighting, geometry
Balance performance vs. permutations vs. features
  Dynamic branching
  Live with many permutations
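
To make the permutation explosion concrete, here is a small Python sketch of the bookkeeping side only (no HLSL is compiled). The feature names, the helper functions and the `#define` naming scheme are hypothetical. With n on/off features there are 2**n possible variants, but collecting only the distinct combinations that materials actually use keeps the compiled set much smaller.

```python
from itertools import product

# Hypothetical on/off shader features; the names are illustrative only.
FEATURES = ["normal_map", "specular_map", "alpha_test", "skinning", "shadows"]


def all_permutations(features):
    """Every on/off combination of the features: 2**len(features) variants."""
    return [dict(zip(features, bits))
            for bits in product((False, True), repeat=len(features))]


def permutation_key(material):
    """Order-independent key for the set of features a material enables."""
    return tuple(sorted(name for name, on in material.items() if on))


def used_permutations(materials):
    """Only the distinct feature combinations that content actually uses."""
    return sorted({permutation_key(m) for m in materials})


def defines(key):
    """Preprocessor defines that would select one variant at compile time."""
    return ["#define " + name.upper() + " 1" for name in key]


materials = [
    {"normal_map": True, "shadows": True},
    {"shadows": True, "normal_map": True},  # same variant as the first
    {"skinning": True, "shadows": True, "alpha_test": False},
]
print(len(all_permutations(FEATURES)))  # 32
for key in used_permutations(materials):
    print(key, defines(key))
```

Here three materials need only two compiled variants out of 32 possible ones; every new feature doubles the possible set, which is why the balance against dynamic branching matters.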

Shader subroutines

Next step: Static subroutine linking
  Inline all subroutines at the call site
  Similar to a switch statement
  Reduces the number of permutations
  Implementation moved to the driver or GPU
  Doesn't work with instancing
Future step: Dynamic subroutines
  Control function pointers inside the shader
  Problem solved, but coherency is important
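
A rough model of why subroutines reduce the permutation count: with permutations, the number of compiled shaders is the product of the implementations available per slot, while with linked subroutines it is closer to their sum. The Python below is only an analogy (a dictionary lookup stands in for the call-site switch the driver or GPU would implement), and the slot names, counts and the half-Lambert function are hypothetical examples.

```python
from math import prod


# Hypothetical lighting subroutines that could fill one "slot" of a shader.
def lambert(n_dot_l):
    return max(n_dot_l, 0.0)


def half_lambert(n_dot_l):
    return (0.5 * n_dot_l + 0.5) ** 2


LIGHTING = {"lambert": lambert, "half_lambert": half_lambert}


def shade(lighting, n_dot_l, albedo):
    # The slot is resolved at the call site, much like a switch statement,
    # instead of compiling a separate shader for each lighting model.
    return albedo * LIGHTING[lighting](n_dot_l)


def variants_without_subroutines(slot_sizes):
    """One compiled shader per combination of slot implementations."""
    return prod(slot_sizes.values())


def bodies_with_subroutines(slot_sizes):
    """One shader plus one body per implementation, linked per draw."""
    return sum(slot_sizes.values())


slots = {"lighting": 3, "material": 4, "fog": 2}  # hypothetical counts
print(variants_without_subroutines(slots), bodies_with_subroutines(slots))  # 24 9
```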

Rendering & Parallelization

Jobs

Must utilize multi-core
  6 HW threads on Xbox 360
  6 SPUs on PS3
  2-8 cores on PC
Job definition
  Fully independent, stateless function
  PS3 SPU requirement
  Graph dependencies
  Task-parallel and data-parallel
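
A minimal sketch of a job graph, assuming Python's `concurrent.futures` thread pool as a stand-in for the engine's job scheduler (Python threads do not give the CPU parallelism of SPUs or hardware threads; the point is the structure). The job names and bodies are hypothetical. Jobs whose dependencies are complete run together as one wave (task parallelism); a data-parallel job would be split into several independent chunk jobs in the same graph.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job_graph(jobs, max_workers=4):
    """Run a dependency graph of jobs on a thread pool.

    jobs maps name -> (fn, [dependency names]). Each fn is a stateless
    function that receives its dependencies' results as positional
    arguments, in the listed order. All jobs whose dependencies are
    done are submitted together as one wave."""
    results = {}
    remaining = dict(jobs)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining:
            ready = [name for name, (_, deps) in remaining.items()
                     if all(d in results for d in deps)]
            if not ready:
                raise ValueError("cycle or missing dependency in job graph")
            futures = {}
            for name in ready:
                fn, deps = remaining.pop(name)
                futures[name] = pool.submit(fn, *[results[d] for d in deps])
            for name, future in futures.items():
                results[name] = future.result()
    return results


objects = list(range(10))
jobs = {
    "frustum_cull": (lambda: [o for o in objects if o % 2 == 0], []),
    "occlusion_cull": (lambda visible: [o for o in visible if o != 4],
                       ["frustum_cull"]),
    "particles": (lambda: "simulated", []),
    "command_buffers": (lambda visible, p: (len(visible), p),
                        ["occlusion_cull", "particles"]),
}
print(run_job_graph(jobs)["command_buffers"])  # (4, 'simulated')
```

A wave-based scheduler is the simplest correct version; a real scheduler would start each job as soon as its own dependencies finish rather than waiting for the whole wave.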

Rendering jobs

Refactor rendering systems to jobs
  Most will move to GPU eventually
  One-way data flow
  Compute shaders & stream output
Jobs
  Decal projection
  Particle simulation
  Terrain geometry processing
  Undergrowth generation [2]
  Frustum culling
  Occlusion culling
  Command buffer generation
  PS3: Triangle culling

Parallel command buffer recording

Dispatch draw calls and state to multiple command buffers in parallel
  Scales linearly with the number of cores
  1500-4000 draw calls per frame
Super-important for all platforms, used on:
  Xbox 360
  PS3 (SPU-based)
No support in DX10!
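
A sketch of the idea in Python, assuming a hypothetical command format of `("set_state", ...)` and `("draw", ...)` tuples rather than any real graphics API. Each worker records a contiguous slice of the frame's draw calls into its own buffer without touching shared state, and the buffers are then submitted in slice order so the final command stream matches serial recording. The thread pool only illustrates the structure; it is not how a console or DX10 driver records commands.

```python
from concurrent.futures import ThreadPoolExecutor


def record_commands(draws):
    """Record one command buffer for a contiguous slice of draw calls.
    Each buffer tracks its own current state, so no data is shared."""
    buf, current_state = [], None
    for state, mesh in draws:
        if state != current_state:  # skip redundant state changes
            buf.append(("set_state", state))
            current_state = state
        buf.append(("draw", mesh))
    return buf


def record_parallel(draws, workers=4):
    """Split the draw list into contiguous slices and record them in parallel."""
    chunk = max(1, -(-len(draws) // workers))  # ceiling division
    slices = [draws[i:i + chunk] for i in range(0, len(draws), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        buffers = list(pool.map(record_commands, slices))  # keeps slice order
    # Submitting the buffers in slice order reproduces the serial draw order.
    return [cmd for buf in buffers for cmd in buf]


draws = [("opaque", "a"), ("opaque", "b"), ("alpha", "c"), ("alpha", "d")]
print(record_parallel(draws, workers=2))
```

Because each buffer starts without knowing the previous buffer's state, it re-emits its first state change; that small redundancy is the price of recording the buffers independently.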

DX10 parallel command buffer recording

Single most important DX10 issue
  For us and many others (in the future)
Until future API support
  Reduce draw calls with instancing
  Trade GPU performance for CPU performance
  Reduce state & constant updates
  Slow dynamic constant path
  Manual software command buffers
  Difficult to update dynamic resources efficiently in parallel due to the API

PS3 geometry processing (1/2)

Slow GPU triangle & vertex setup
Unique situation with "free" processors
  Not fully utilized
Solution: SPU triangle culling
  Trade SPU time for GPU performance
  Cull back faces, micro-triangles, frustum
  Sony PS3 EDGE library
  5 jobs process frame geometry in parallel
  Output is a new index buffer for each draw call
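
The three culling tests named above can be sketched in Python as follows. This is an illustration of the tests, not the EDGE library's code or API. It assumes vertices already projected to 2D pixel coordinates and in front of the near plane, counter-clockwise front faces in a y-up pixel space, and one sample per pixel centre. The output is a new index buffer containing only the surviving triangles.

```python
import math


def signed_area2(a, b, c):
    """Twice the signed area of a 2D triangle; > 0 means counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def outside_viewport(a, b, c, width, height):
    """True if all three vertices lie beyond the same viewport edge."""
    xs, ys = (a[0], b[0], c[0]), (a[1], b[1], c[1])
    return max(xs) < 0 or min(xs) > width or max(ys) < 0 or min(ys) > height


def misses_all_samples(a, b, c):
    """Conservative micro-triangle test with samples at pixel centres
    (i + 0.5, j + 0.5): if the bounding box spans no centre along an axis,
    the triangle cannot cover any sample."""
    xs, ys = (a[0], b[0], c[0]), (a[1], b[1], c[1])
    no_x = math.ceil(min(xs) - 0.5) > math.floor(max(xs) - 0.5)
    no_y = math.ceil(min(ys) - 0.5) > math.floor(max(ys) - 0.5)
    return no_x or no_y


def cull_triangles(screen_pos, indices, width, height):
    """Return a new, smaller index buffer for one draw call."""
    kept = []
    for i in range(0, len(indices), 3):
        tri = indices[i:i + 3]
        a, b, c = (screen_pos[j] for j in tri)
        if signed_area2(a, b, c) <= 0:
            continue  # back-facing or degenerate
        if outside_viewport(a, b, c, width, height):
            continue  # outside the frustum (side planes only)
        if misses_all_samples(a, b, c):
            continue  # micro-triangle that produces no pixels
        kept.extend(tri)
    return kept


positions = [(0, 0), (10, 0), (0, 10),            # large, front-facing
             (0.6, 0.6), (1.4, 0.6), (0.6, 1.4),  # micro-triangle
             (-20, -20), (-10, -20), (-20, -10)]  # off-screen
indices = [0, 1, 2, 0, 2, 1, 3, 4, 5, 6, 7, 8]
print(cull_triangles(positions, indices, 100, 100))  # [0, 1, 2]
```

With y-down screen coordinates the sign of `signed_area2` flips, so the back-face test would keep triangles with negative area instead.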

PS3 geometry processing (2/2)

Great flexibility and programmability!
Custom processing
  Partition bounding box culling
  Triangle part culling
  Clip plane triangle trivial accept & reject
  Triangle cull volumes (inverse clip planes)
Future: No vertex & geometry shaders
  DIY compute shaders with fixed-function tessellation and triangle setup units
  Output buffer streaming still important
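
Clip plane trivial accept & reject and triangle cull volumes both reduce to classifying a triangle's vertices against planes. A minimal Python sketch, assuming planes stored as hypothetical `(nx, ny, nz, d)` tuples with the kept side where `n . p + d >= 0`. A triangle that straddles a plane would need real clipping, which is left out.

```python
def plane_distance(plane, p):
    """Signed distance-like value n . p + d (exact distance if n is unit)."""
    nx, ny, nz, d = plane
    return nx * p[0] + ny * p[1] + nz * p[2] + d


def classify_triangle(plane, tri):
    """'accept' if all vertices are on the kept side, 'reject' if all are
    on the other side, otherwise 'clip' (the triangle straddles the plane)."""
    dists = [plane_distance(plane, v) for v in tri]
    if all(d >= 0 for d in dists):
        return "accept"
    if all(d < 0 for d in dists):
        return "reject"
    return "clip"


def inside_cull_volume(planes, tri):
    """Cull volume as inverse clip planes: a triangle entirely inside the
    convex volume (kept side of every plane) is removed."""
    return all(classify_triangle(p, tri) == "accept" for p in planes)


# Hypothetical unit-box cull volume: 0 <= x, y, z <= 1.
UNIT_BOX = [(1, 0, 0, 0), (-1, 0, 0, 1), (0, 1, 0, 0),
            (0, -1, 0, 1), (0, 0, 1, 0), (0, 0, -1, 1)]
print(inside_cull_volume(UNIT_BOX, [(0.2, 0.2, 0.2), (0.8, 0.2, 0.5),
                                    (0.5, 0.8, 0.5)]))  # True
```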

Occlusion culling

Buildings occlude objects
Tons of objects
Difficult to implement
  Building destruction
  Dynamic occludees
  Heavy GPU occlusion queries
Invisible objects still have to
  Update logic & animations
  Generate command buffer
  Be processed on CPU & GPU

Software occlusion culling

Solution: Rasterize a coarse z-buffer on SPU/CPU
  Low-poly occluder meshes
  100 m view distance
  Max 10000 vertices/frame
  Manually conservative
  256x114 float z-buffer
  Created for PS3, now on all platforms
Cull all objects against the z-buffer
  Before they are passed to all other systems = big savings
  Screen-space bounding box test
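
A minimal Python sketch of the screen-space bounding box test. Assumptions: depth is stored in [0, 1] with smaller values closer, the buffer is cleared to the far plane, the object's screen bounding box and nearest depth are already known, and occluders are drawn as camera-facing constant-depth rectangles (a hypothetical stand-in for rasterizing the low-poly occluder triangles). Only the 256x114 resolution comes from the slide.

```python
FAR = 1.0  # cleared depth (far plane); smaller values are closer


def make_zbuffer(width, height):
    return [[FAR] * width for _ in range(height)]


def draw_rect_occluder(zbuf, x0, y0, x1, y1, depth):
    """Stand-in for triangle rasterization: a camera-facing rectangle at
    constant depth covering texels x0..x1-1, y0..y1-1 (keeps the nearest)."""
    for y in range(y0, y1):
        row = zbuf[y]
        for x in range(x0, x1):
            row[x] = min(row[x], depth)


def is_occluded(zbuf, bbox_min, bbox_max, nearest_depth):
    """Screen-space bbox test (inclusive texel coordinates). The object is
    occluded only if every covered texel holds an occluder strictly closer
    than the object's nearest point."""
    h, w = len(zbuf), len(zbuf[0])
    x0, y0 = max(0, bbox_min[0]), max(0, bbox_min[1])
    x1, y1 = min(w - 1, bbox_max[0]), min(h - 1, bbox_max[1])
    if x0 > x1 or y0 > y1:
        return False  # fully off-screen: leave it to frustum culling
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            if zbuf[y][x] >= nearest_depth:
                return False  # something of the object may be visible here
    return True


zbuf = make_zbuffer(256, 114)
draw_rect_occluder(zbuf, 100, 20, 160, 90, depth=0.3)   # e.g. a building
print(is_occluded(zbuf, (110, 30), (150, 80), 0.5))      # True: behind it
```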

GPU occlusion culling

Want GPU rasterization & testing, but:
  Occlusion queries introduce overhead & latency
  Can be manageable, but not ideal
  Conditional rendering only helps the GPU, not CPU, frame memory or draw calls
Future 1: Low-latency extra GPU execution context
  Rasterization and testing done on GPU
  Lockstep with CPU
Future 2: Move entire cull & rendering to GPU
  Scene graph, cull, systems, dispatch. End goal.

Texturing

Texture formats

Using
  DXT1/5 color maps, sRGB
  BC5 (3Dc) normal maps
  BC4 (DXT5A) for grayscale masks
  sRGB support for BC4/5 would be nice
DXT1 replacement needed
  Low quality
  565 color bleeding
  RG/RGB masks compress badly
  HDR envmaps & lightmaps

RGB DXT1 mask

DXT color bleed
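
Why 565 endpoints and RGB masks compress badly can be seen from how a DXT1 (BC1) block is decoded. It stores two RGB565 endpoints plus, in four-colour mode, two colours at 1/3 and 2/3 of the way between them, so every texel in the 4x4 block must lie on one line in RGB space. The Python sketch below models just that decoding. Exact rounding of the interpolated colours varies between hardware, and the round-to-nearest 565 quantization used here is one reasonable choice, not the only one.

```python
def to_565(r, g, b):
    """Quantize 8-bit RGB to a packed 16-bit RGB565 value (round to nearest)."""
    r5 = (r * 31 + 127) // 255
    g6 = (g * 63 + 127) // 255
    b5 = (b * 31 + 127) // 255
    return (r5 << 11) | (g6 << 5) | b5


def from_565(c):
    """Expand RGB565 back to 8 bits per channel by bit replication."""
    r5, g6, b5 = (c >> 11) & 31, (c >> 5) & 63, c & 31
    return ((r5 << 3) | (r5 >> 2), (g6 << 2) | (g6 >> 4), (b5 << 3) | (b5 >> 2))


def dxt1_palette(c0, c1):
    """Four-colour DXT1 palette (the mode used when c0 > c1): the two
    endpoints plus colours at 1/3 and 2/3 along the line between them."""
    a, b = from_565(c0), from_565(c1)

    def lerp(t):  # t in thirds; integer division truncates
        return tuple((x * (3 - t) + y * t) // 3 for x, y in zip(a, b))

    return [a, b, lerp(1), lerp(2)]


print(from_565(to_565(100, 150, 200)))                     # (99, 150, 198)
print(dxt1_palette(to_565(255, 0, 0), to_565(0, 0, 255)))
```

With red and blue endpoints the palette contains no green at all, so a block holding independent R, G and B mask values cannot be represented; even a single colour loses precision in the 565 round trip.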

Future texture sampling

Texture sampling derivatives
  1st-order texel derivatives
  2nd order as well?
  Implement in the sampler unit
  Bad performance or quality with shader sampling
  Artifacts with the ddx/ddy technique
  Replace normal maps with easily compressed bump maps
Bicubic upsampling
  Terrain masks

Terrain heightmap

Derived normals [2]
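
Deriving normals from the heightmap means estimating the height gradient from neighbouring texels. The slides do not say which difference scheme is used, so this Python sketch assumes plain central differences on a y-up heightfield, falling back to one-sided differences at the borders. A pixel shader would do the same with neighbouring texture fetches.

```python
import math


def derived_normal(height, x, y, texel_size=1.0):
    """Unit normal (nx, ny, nz) of a y-up heightfield at texel (x, y).

    height[row][col] holds heights; columns run along world x and rows
    along world z. Central differences in the interior, one-sided
    differences at the borders. Assumes at least 2x2 texels."""
    rows, cols = len(height), len(height[0])
    x0, x1 = max(x - 1, 0), min(x + 1, cols - 1)
    y0, y1 = max(y - 1, 0), min(y + 1, rows - 1)
    dh_dx = (height[y][x1] - height[y][x0]) / ((x1 - x0) * texel_size)
    dh_dz = (height[y1][x] - height[y0][x]) / ((y1 - y0) * texel_size)
    # The surface y = h(x, z) has the (unnormalized) normal (-dh/dx, 1, -dh/dz).
    nx, ny, nz = -dh_dx, 1.0, -dh_dz
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)


ramp = [[float(col) for col in range(4)] for _ in range(4)]  # slope 1 along x
print(derived_normal(ramp, 1, 2))  # about (-0.707, 0.707, 0.0)
```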

Current sparse textures

Save memory for terrain
  Static quadtree mask texture
  Dynamic sparse destruction mask
Implementation
  Indirection texture lookup in atlas
  Texture arrays too small, want 8192 slices
  Correct bilinear filtering by borders
  Siggraph 2007 course for details [2]

Source mask

Atlas texture
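
A minimal Python sketch of the indirection lookup, with hypothetical parameters. The virtual mask is split into `tiles_per_side` x `tiles_per_side` tiles, an indirection table maps each tile to an atlas slot (or `None` when the tile is not stored), and each slot holds `tile_size` texels plus `border` duplicated texels on every side, so bilinear filtering near a tile edge does not pick up an unrelated neighbouring tile.

```python
def virtual_to_atlas(u, v, indirection, tiles_per_side, tile_size, border):
    """Map a virtual UV in [0, 1) to atlas texel coordinates.

    indirection[ty][tx] -> (slot_x, slot_y), or None for tiles that are
    not stored (the caller would fall back to a default mask value)."""
    vx, vy = u * tiles_per_side, v * tiles_per_side
    tx, ty = int(vx), int(vy)          # which virtual tile
    slot = indirection[ty][tx]
    if slot is None:
        return None
    fx, fy = vx - tx, vy - ty          # position inside the tile, [0, 1)
    padded = tile_size + 2 * border    # slot size including the borders
    ax = slot[0] * padded + border + fx * tile_size
    ay = slot[1] * padded + border + fy * tile_size
    return ax, ay


table = [[None] * 4 for _ in range(4)]
table[1][2] = (0, 3)  # virtual tile (2, 1) lives in atlas slot (0, 3)
print(virtual_to_atlas(0.625, 0.3125, table, 4, 64, 1))  # (33.0, 215.0)
```

Dividing the returned texel coordinates by the atlas resolution gives the UV that the second, filtered fetch would use.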

HW sparse textures

Virtual texture
  HW texture filtering & mipmapping
  Fallback on non-resident tile access: lower mipmap, default value or shader bool
  At least 32k x 32k; floating-point precision issues with larger?
Application-controlled tile commit/free
  ~128 x 128 tiles
Feedback mechanism for referenced tiles
  Easy view-dependent allocation
Future: Latency-free allocation & generation
  Alt 1: CPU thread callback & block
  Alt 2: Keep everything on GPU. "Command" shader?
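
The fallback and feedback behaviour could look like this Python sketch. It assumes a hypothetical residency set of `(mip, tile_x, tile_y)` entries and tile coordinates that halve with each coarser mip level. It models the behaviour the slide asks hardware to provide; it is not an existing API.

```python
def sample_tile(resident, tile_x, tile_y, mip, max_mip, feedback):
    """Return the (mip, tile_x, tile_y) tile actually used for a sample,
    walking to coarser mips until a resident tile is found, or None
    (use a default value / report via a shader bool).

    The requested tile is always recorded in `feedback` so the
    application can commit it for later frames."""
    feedback.add((mip, tile_x, tile_y))
    while mip <= max_mip:
        if (mip, tile_x, tile_y) in resident:
            return (mip, tile_x, tile_y)
        # One mip coarser: the same area is covered by half the tiles.
        mip, tile_x, tile_y = mip + 1, tile_x // 2, tile_y // 2
    return None


resident = {(2, 1, 0), (3, 0, 0)}
feedback = set()
print(sample_tile(resident, 5, 2, 0, 3, feedback))  # (2, 1, 0)
print(feedback)                                     # {(0, 5, 2)}
```

At 32k x 32k with ~128 x 128 texel tiles, mip 0 is 256 x 256 tiles, so the walk reaches a single tile after 8 coarser levels.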

Cached Procedural Unique Texturing

Unique dynamic sparse texture on all objects
  Defined by a texture shader graph
  Combine procedurals, compositing, streaming and UV-space geometry
Dynamically commit & render visible tiles
  Highly complex compositing is possible thanks to high frame-to-frame coherency
  Upsample and refine
New dynamic effects made possible
  Affect every surface
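
Frame-to-frame coherency is what makes expensive compositing affordable: only tiles that newly become visible need rendering. A minimal Python sketch of such a caching policy, assuming LRU eviction, a per-frame tile budget and a hypothetical `render_tile` callback standing in for evaluating the texture shader graph. The slides do not specify the actual policy.

```python
from collections import OrderedDict


class TileCache:
    """LRU cache of composited tiles for a unique sparse texture."""

    def __init__(self, capacity, render_tile):
        self.capacity = capacity
        self.render_tile = render_tile  # hypothetical compositing callback
        self.tiles = OrderedDict()

    def update(self, visible, budget):
        """Commit up to `budget` newly visible tiles this frame and return
        how many were rendered. Tiles over budget wait for later frames
        (a renderer would upsample a coarser tile meanwhile)."""
        rendered = 0
        for key in visible:
            if key in self.tiles:
                self.tiles.move_to_end(key)  # mark as recently used
            elif rendered < budget:
                self.tiles[key] = self.render_tile(key)
                rendered += 1
                if len(self.tiles) > self.capacity:
                    self.tiles.popitem(last=False)  # evict least recently used
        return rendered


cache = TileCache(capacity=4, render_tile=lambda key: "tile %s" % (key,))
print([cache.update([1, 2, 3], budget=2) for _ in range(3)])  # [2, 1, 0]
```

The usage line shows the coherency effect: once the view stops changing, the per-frame cost drops to zero.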

Raytracing

Raytracing

Much recent debate & interest in real-time raytracing (RTRT)
What we are interested in:
  Performance!!
  Rasterization for primary rays
  Deterministic
  Easy integration into engines
  Just another method for certain effects & objects, not a replacement for the whole pipeline
  Efficient dynamic geometry
  Procedural & manual animation (foliage, characters)
  Destruction (foliage, buildings, objects)

Mirror’s Edge

Raytraced reflections wanted

Glass & metal
  Mostly planar surfaces
Reflection locality
  Correct reflections for important objects, e.g. the main character
  Simplified world geometry & shading for the rest
  Common for games
  Brickmaps? [3]
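
For mostly planar glass and metal, a reflection ray is the view direction mirrored about the surface normal, r = d - 2 (d . n) n, traced against simplified geometry. This Python sketch assumes spheres as hypothetical proxies for the important objects (for example the main character). It is not the approach from the talk, only an illustration of the reflect-and-trace step.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reflect(d, n):
    """Reflect direction d about unit normal n: r = d - 2 (d . n) n."""
    k = 2.0 * dot(d, n)
    return tuple(di - k * ni for di, ni in zip(d, n))


def ray_sphere(origin, direction, center, radius):
    """Distance along a unit-length ray to the first hit, or None."""
    oc = tuple(o - c for o, c in zip(origin, center))
    b = dot(oc, direction)
    c = dot(oc, oc) - radius * radius
    disc = b * b - c
    if disc < 0:
        return None
    t = -b - math.sqrt(disc)
    if t < 0:
        t = -b + math.sqrt(disc)  # origin inside the sphere
    return t if t >= 0 else None


def reflected_hit(point, view_dir, normal, proxies):
    """Name of the nearest proxy hit by the reflected ray, or None."""
    r = reflect(view_dir, normal)  # stays unit length for unit inputs
    hits = []
    for name, (center, radius) in proxies.items():
        t = ray_sphere(point, r, center, radius)
        if t is not None:
            hits.append((t, name))
    return min(hits)[1] if hits else None


proxies = {"character": ((3.0, 4.0, 0.0), 1.0), "crate": ((-3.0, 4.0, 0.0), 1.0)}
print(reflected_hit((0.0, 0.0, 0.0), (0.6, -0.8, 0.0), (0.0, 1.0, 0.0), proxies))
```

Misses would fall back to a cheaper source such as an environment map, which keeps the expensive tracing limited to the objects where correctness matters.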

Soft reflections

Mirror’s Edge

GPGPU

GPGPU uses

Effect physics
  Particle vs. world soft collision
AI pathfinding
AI visibility
  View rasterization
  Obstruction from smoke & foliage
Procedural animation
  Trees, undergrowth, hair
Post-processing

CUDA DOF post-process filter

Circle of confusion map

Thesis work at DICE [4]
  Test CUDA and performance
  Poisson disc blur
  Multi-pass diffusion
  Separable diffusion
Good:
  Easy to learn (C)
  Map complex algorithms
  Thread & memory control
Bad:
  Performance vs. shaders
  Beta interop
  Vendor-specific

Output
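
The thin-lens circle of confusion and a disc-shaped gather give a simple picture of this kind of DOF filter. This Python sketch is not the thesis implementation [4], which used CUDA and diffusion; it swaps in a plain gather blur with a hand-picked, hypothetical tap pattern, and assumes the CoC has already been converted from scene units to pixels.

```python
# Hypothetical unit-disc tap offsets (a hand-picked pattern, not a
# generated Poisson disc set).
DISC = [(0.0, 0.0), (0.5, 0.0), (-0.5, 0.0), (0.0, 0.5), (0.0, -0.5),
        (0.35, 0.35), (-0.35, -0.35), (0.35, -0.35), (-0.35, 0.35)]


def circle_of_confusion(depth, focus_dist, focal_len, aperture):
    """Thin-lens circle-of-confusion diameter, in the units of the inputs:
    c = A * f * |d - s| / (d * (s - f)); assumes focus_dist > focal_len."""
    return (aperture * focal_len * abs(depth - focus_dist)
            / (depth * (focus_dist - focal_len)))


def disc_blur(image, coc_px):
    """Gather blur: each pixel averages taps placed at DISC offsets scaled
    by its own CoC radius in pixels (nearest texel, clamped to the image)."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            radius = 0.5 * coc_px[y][x]
            total = 0.0
            for ox, oy in DISC:
                sx = min(max(round(x + ox * radius), 0), w - 1)
                sy = min(max(round(y + oy * radius), 0), h - 1)
                total += image[sy][sx]
            out[y][x] = total / len(DISC)
    return out


# 50 mm lens at f/2 (25 mm aperture) focused at 2 m; object at 4 m.
print(circle_of_confusion(4.0, 2.0, 0.05, 0.025))  # CoC diameter in metres
```

A gather like this lets sharp foreground pixels pick up blurred background colour and vice versa; avoiding such artifacts is one reason to look at more elaborate approaches such as diffusion.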

GPU Compute programming model

Wanted:
  Easy & efficient Direct3D 10 interop
  Low-latency compute tasks
  Vendor-independent base interface
  OpenCL?
  Efficient CPU multi-core backend
  Servers, older GPUs, debugging
  MCUDA [5]
  Eventually platform-independent
  Future consoles

Conclusions

Shader subroutines
More software-controlled pipeline
More texture sampler functionality
Limited-case raytracing
GPU compute for games

Questions?

Contact: johan.andersson@dice.se

References

[1] Tatarchuk, Natasha & Andersson, Johan. "Rendering Architecture and Real-time Procedural Shading & Texturing Techniques". GDC 2007.
[2] Andersson, Johan. "Terrain Rendering in Frostbite using Procedural Shader Splatting". Siggraph 2007.
[3] Christensen, Per H. & Batali, Dana. "An Irradiance Atlas for Global Illumination in Complex Production Scenes". Eurographics Symposium on Rendering 2004.
[4] Lonroth, Per & Unger, Mattias. "Advanced Real-time Post-Processing using GPGPU techniques". Master thesis, 2008.
[5] Stratton, John, Stone, Sam & Hwu, Wen-mei. "MCUDA: An Efficient Implementation of CUDA Kernels on Multi-cores". Technical report IMPACT-08-01, University of Illinois at Urbana-Champaign, March 2008.

Bonus slides

Real-time REYES

Very interesting
  Displacement mapping & procedurals
  Stochastic sampling
  Potentially more efficient & general
  Compared to maxed-out rasterization & tessellation on everything = pixel-sized triangles
But
  No experience
  More research & experimentation needed

Terrain detail

Deriving normals from the heightfield works well in the distance
Future: HW tessellation & procedural displacement shaders for close-up ground detail

Texture arrays

Use cases: Everything!
Rich parameterized shaders
  Vary slice index per instance, triangle or texel
  Instancing without compromising on variation or performance
Cascaded shadow maps
  HW PCF only in DX 10.1
  Stable Cascaded Bounding Box Shadow Maps
Sparse textures
  More slices, please!
  For tile pools: 64x64x8192
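
One common way to make cascades stable, assumed here rather than taken from the slides, is to give each cascade a fixed light-space size (from a bounding sphere of its frustum slice) and snap its origin to whole shadow-map texels. A Python sketch of that snapping, with hypothetical names and a light-space center already computed:

```python
import math


def stable_cascade_bounds(center_ls, radius, resolution):
    """Light-space orthographic bounds (min_x, min_y, max_x, max_y) for one
    cascade, plus the world-space size of one shadow-map texel.

    A fixed extent (2 * radius) keeps the texel size constant as the camera
    rotates; snapping the minimum corner to whole texels keeps shadow edges
    from shimmering as the camera translates."""
    extent = 2.0 * radius
    texel = extent / resolution
    min_x = math.floor((center_ls[0] - radius) / texel) * texel
    min_y = math.floor((center_ls[1] - radius) / texel) * texel
    return (min_x, min_y, min_x + extent, min_y + extent), texel


bounds_a, texel = stable_cascade_bounds((3.04, 7.0), 50.0, 1000)
bounds_b, _ = stable_cascade_bounds((3.06, 7.0), 50.0, 1000)
print(texel, bounds_a == bounds_b)  # 0.1 True: sub-texel motion is absorbed
```

In a texture array, each cascade would then be one slice, with the slice index chosen per pixel from its view depth.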

Other raytracing uses

Global Illumination & Ambient Occlusion
  Incremental Photon Mapping?
Async collision raycasts
  AI pathfinding, gameplay, sound obstruction
  Separate collision world from visual world
  CPU job-based now