The Intersection of Game Engines & GPUs

Uploaded On 2022-08-03

Presentation Transcript

Slide1

The Intersection of Game Engines & GPUs: Current & Future

Johan Andersson, Rendering Architect

2.5

Slide2

Agenda

- Goal
  - Share and discuss current & future graphics use cases in our games and implications for graphics hardware
- Areas
  - Engine overview
  - Shaders
  - Parallelization
  - Texturing
  - Raytracing
  - GPU compute
  - Conclusions
  - Q & A

Slide3

Frostbite

- DICE proprietary engine
  - Xbox 360
  - PS3
  - Windows (Direct3D 10)
- Focus
  - Large outdoor environments
  - Singleplayer & multiplayer
  - Destruction!
  - New: Content workflows

Slide4

BFBC screenshot

Slide5

BFBC screenshot

Slide6

Slide7

Graph-based surface shaders

- Artist-friendly
  - Easy to create, tweak & manage
- Flexible
  - Programmers & artists can extend & expose features
- Data-centric
  - Encapsulates resources
  - Transformable
- Rich high-level shading framework
  - Used by all content & systems

Slide8

Slide9

Shader permutations

- Generate shader permutations
  - For each used combination of features/data
  - HLSL vertex & pixel shaders
- Many features = permutation explosion
  - Shader graphs, lighting, geometry
- Balance perf. vs permutations vs features
  - Dynamic branching
  - Live with many permutations
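
To make the permutation explosion concrete, here is a minimal sketch that counts combinations for a made-up feature set. The feature names and option counts are assumptions for illustration and are not taken from Frostbite. It also shows why generating only the used combinations keeps the number manageable.

```python
from itertools import product

# Hypothetical feature axes; the names and option counts are illustrative only.
FEATURES = {
    "lighting": ["none", "point", "spot", "sun"],
    "skinning": [False, True],
    "fog": [False, True],
    "normal_map": [False, True],
}

def all_permutations(features):
    """Every combination of one option per feature axis."""
    names = list(features)
    return [dict(zip(names, combo)) for combo in product(*features.values())]

def used_permutations(materials):
    """Only the distinct combinations that content actually references."""
    return {tuple(sorted(m.items())) for m in materials}

# Hypothetical content: three materials, two of which share a combination.
MATERIALS = [
    {"lighting": "sun", "skinning": False, "fog": True, "normal_map": True},
    {"lighting": "sun", "skinning": False, "fog": True, "normal_map": True},
    {"lighting": "point", "skinning": True, "fog": False, "normal_map": True},
]

total = len(all_permutations(FEATURES))    # 4 * 2 * 2 * 2 combinations
used = len(used_permutations(MATERIALS))   # distinct combinations in use
```

Adding one more two-option feature doubles `total`, while `used` only grows when content actually references a new combination.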

Slide10

Shader subroutines

- Next step: Static subroutine linking
  - Inline in all subroutines at call site
  - Similar to a switch statement
  - Reduces # permutations
  - Implementation moved to driver or GPU
  - Doesn't work with instancing
- Future step: Dynamic subroutines
  - Control function pointers inside shader
  - Problem solved, but coherency important
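
The difference between the two steps can be sketched by analogy on the CPU. This is not shader code, and the subroutine names and lighting math are assumptions. Static linking fixes the subroutine once when the shader is built, while a dynamic subroutine is picked per invocation through something like a function pointer.

```python
# Hypothetical lighting subroutines; names and math are illustrative only.
def light_lambert(n_dot_l):
    return max(n_dot_l, 0.0)

def light_wrapped(n_dot_l):
    return max((n_dot_l + 0.5) / 1.5, 0.0)

SUBROUTINES = {"lambert": light_lambert, "wrapped": light_wrapped}

def link_static(light_name):
    """Static linking: the subroutine is fixed once, when the shader is built."""
    light = SUBROUTINES[light_name]

    def shader(albedo, n_dot_l):
        return albedo * light(n_dot_l)

    return shader

def shader_dynamic(albedo, n_dot_l, light_name):
    """Dynamic subroutines: the function is chosen on every invocation."""
    return albedo * SUBROUTINES[light_name](n_dot_l)
```

With static linking, one shader body serves several lighting models but each linked variant is still a separate program. With the dynamic form, neighbouring invocations may take different paths, which is where coherency starts to matter.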

Slide11

Rendering & Parallelization

Slide12

Jobs

- Must utilize multi-core
  - 6 HW threads on Xbox 360
  - 6 SPUs on PS3
  - 2-8 cores on PC
- Job definition
  - Fully independent stateless function
  - PS3 SPU requirement
  - Graph dependencies
  - Task-parallel and data-parallel
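
A minimal sketch of a job graph, assuming each job is a stateless function that receives only the results of the jobs it depends on. The scheduler, its name and its wave-by-wave strategy are illustrative and not the engine's job system. Python threads show the structure only, not real multi-core speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def run_job_graph(jobs, deps, workers=4):
    """Run stateless jobs in dependency order.

    jobs: name -> function taking a dict of its dependencies' results.
    deps: name -> list of job names that must finish first.
    """
    results = {}
    remaining = set(jobs)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while remaining:
            # A job is ready when all of its dependencies have results.
            ready = [j for j in remaining
                     if all(d in results for d in deps.get(j, []))]
            if not ready:
                raise ValueError("dependency cycle or unknown dependency")
            # Jobs in the same wave are independent, so they run concurrently.
            futures = {
                j: pool.submit(jobs[j], {d: results[d] for d in deps.get(j, [])})
                for j in ready
            }
            for j, fut in futures.items():
                results[j] = fut.result()
            remaining -= set(ready)
    return results
```

For example, a hypothetical "commands" job that depends on "cull" and "particles" runs only after both have finished, while those two run side by side.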

Slide13

Rendering jobs

- Refactor rendering systems to jobs
- Most will move to GPU
  - Eventually
  - One-way data flow
  - Compute shaders & stream output
- Jobs
  - Decal projection
  - Particle simulation
  - Terrain geometry processing
  - Undergrowth generation [2]
  - Frustum culling
  - Occlusion culling
  - Command buffer generation
  - PS3: Triangle culling

Slide14

Parallel command buffer recording

- Dispatch draw calls and state to multiple command buffers in parallel
  - Scales linearly with # cores
  - 1500-4000 draw calls per frame
- Super-important for all platforms, used on:
  - Xbox 360
  - PS3 (SPU-based)
- No support in DX10!
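
The idea can be sketched as follows: split the frame's draw calls into contiguous chunks, record each chunk into its own command buffer on a worker, then submit the buffers in their original order. The command tuples and function names are assumptions, and Python threads illustrate the structure rather than the scaling.

```python
from concurrent.futures import ThreadPoolExecutor

def record_chunk(draw_calls):
    """Record one command buffer from a contiguous slice of draw calls."""
    buffer = []
    for mesh, material in draw_calls:
        buffer.append(("set_material", material))
        buffer.append(("draw", mesh))
    return buffer

def record_parallel(draw_calls, workers=4):
    """Split the frame's draw calls into chunks and record them concurrently."""
    chunk = max(1, -(-len(draw_calls) // workers))  # ceiling division
    chunks = [draw_calls[i:i + chunk] for i in range(0, len(draw_calls), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so submission order is stable.
        return list(pool.map(record_chunk, chunks))
```

Because each chunk is contiguous and the buffers come back in order, concatenating them gives the same command stream as recording on a single thread.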

Slide15

DX10 parallel command buffer recording

- Single most important DX10 issue
  - For us and many others (in the future)
- Until future API support
  - Reduce draw calls with instancing
    - Trade GPU performance for CPU performance
  - Reduce state & constant updates
    - Slow dynamic constant path
  - Manual software command buffers
    - Difficult to update dynamic resources efficiently in parallel due to API
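
Reducing draw calls with instancing amounts to grouping draws that share the same mesh and material. A minimal sketch, with a hypothetical (mesh, material, transform) tuple layout:

```python
from collections import defaultdict

def batch_instances(draw_calls):
    """Group (mesh, material, transform) draws that share mesh and material."""
    batches = defaultdict(list)
    for mesh, material, transform in draw_calls:
        batches[(mesh, material)].append(transform)
    # One instanced draw per unique (mesh, material) pair.
    return [(mesh, material, transforms)
            for (mesh, material), transforms in batches.items()]
```

Each returned entry stands for a single instanced draw whose per-instance data is the list of transforms, so the CPU issues fewer calls while the GPU does the same amount of drawing.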

Slide16

PS3 geometry processing (1/2)

- Slow GPU triangle & vertex setup
- Unique situation with "free" processors
  - Not fully utilized
- Solution: SPU triangle culling
  - Trade SPU time for GPU performance
  - Cull back faces, micro-triangles, frustum
  - Sony PS3 EDGE library
- 5 jobs process frame geometry in parallel
  - Output is new index buffer for each draw call
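
A simplified sketch of the culling step, assuming vertices are already projected to 2D screen space and that counter-clockwise triangles face the viewer. It is not the EDGE library's algorithm: the area threshold is a crude stand-in for a real micro-triangle test, and frustum culling is left out.

```python
def signed_area2(a, b, c):
    """Twice the signed area of a 2D triangle; positive means counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])

def cull_triangles(positions, indices, min_area2=1e-6):
    """Return a new index list without back-facing or near-degenerate triangles.

    positions: list of (x, y) screen-space vertices.
    indices: flat list, three indices per triangle.
    """
    out = []
    for i in range(0, len(indices) - 2, 3):
        tri = indices[i:i + 3]
        area2 = signed_area2(*(positions[k] for k in tri))
        if area2 > min_area2:  # rejects back faces and tiny triangles together
            out.extend(tri)
    return out
```

The vertex data is left untouched; only a shorter index buffer is produced for the draw call.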

Slide17

PS3 geometry processing (2/2)

- Great flexibility and programmability!
- Custom processing
  - Partition bounding box culling
  - Triangle part culling
  - Clip plane triangle trivial accept & reject
  - Triangle cull volumes (inverse clip planes)
- Future: No vertex & geometry shaders
  - DIY compute shaders with fixed-func tessellation and triangle setup units
  - Output buffer streaming still important

Slide18

Occlusion culling

- Buildings occlude objects
  - Tons of objects
- Difficult to implement
  - Building destruction
  - Dynamic occludees
  - Heavy GPU occlusion queries
- Invisible objects still have to
  - Update logic & animations
  - Generate command buffer
  - Processed on CPU & GPU

Slide19

Software occlusion culling

- Solution: Rasterize coarse z-buffer on SPU/CPU
  - Low-poly occluder meshes
    - 100 m view distance
    - Max 10000 vertices/frame
    - Manually conservative
  - 256x114 float z-buffer
  - Created for PS3, now on all
- Cull all objects against z-buffer
  - Before passed to all other systems = big savings
  - Screen-space bbox test
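
A minimal sketch of the screen-space bounding-box test against a coarse z-buffer. To stay short, occluders are written as axis-aligned rectangles at constant depth, which is an assumption standing in for rasterizing the low-poly occluder meshes. Smaller depth means closer to the camera.

```python
WIDTH, HEIGHT = 256, 114  # resolution quoted on the slide
FAR = float("inf")

def make_zbuffer():
    return [[FAR] * WIDTH for _ in range(HEIGHT)]

def draw_occluder_rect(zbuf, x0, y0, x1, y1, depth):
    """Write an occluder covering pixels [x0, x1) x [y0, y1) at constant depth.

    A stand-in for rasterizing low-poly occluder triangles.
    """
    for y in range(max(y0, 0), min(y1, HEIGHT)):
        row = zbuf[y]
        for x in range(max(x0, 0), min(x1, WIDTH)):
            row[x] = min(row[x], depth)

def is_occluded(zbuf, x0, y0, x1, y1, nearest_depth):
    """Screen-space bbox test: hidden only if every covered pixel is closer."""
    x0, y0 = max(x0, 0), max(y0, 0)
    x1, y1 = min(x1, WIDTH), min(y1, HEIGHT)
    if x0 >= x1 or y0 >= y1:
        return True  # bbox is entirely off-screen, nothing to draw
    return all(zbuf[y][x] < nearest_depth
               for y in range(y0, y1) for x in range(x0, x1))
```

The test is conservative: a single pixel where the occluder is missing or farther away keeps the object visible, so objects are never culled by mistake as long as the occluders themselves are conservative.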

Slide20

GPU occlusion culling

- Want GPU rasterization & testing, but:
  - Occlusion queries introduce overhead & latency
    - Can be manageable, not ideal
  - Conditional rendering only helps GPU
    - Not CPU, frame memory or draw calls
- Future 1: Low-latency extra GPU exec context
  - Rasterization and testing done on GPU
  - Lockstep with CPU
- Future 2: Move entire cull & rendering to GPU
  - Scene graph, cull, systems, dispatch. End goal.

Slide21

Texturing

Slide22

Texture formats

- Using
  - DXT1/5 color maps, sRGB
  - BC5 (3Dc) normal maps
  - BC4 (DXT5A) for grayscale masks
- sRGB support for BC4/5 would be nice
- DXT1 replacement needed
  - Low quality
    - 565 color bleeding
    - RG/RGB masks compress badly
  - HDR envmaps & lightmaps

RGB DXT1 mask

DXT color bleed
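
DXT1 stores its two endpoint colors per block as RGB565. The sketch below shows only that endpoint quantization, not the per-block interpolation that also contributes to the artifacts. The function names are made up, and rounding is used for the conversion.

```python
def to_rgb565(r, g, b):
    """Quantize 8-bit channels to 5, 6 and 5 bits."""
    return (round(r * 31 / 255), round(g * 63 / 255), round(b * 31 / 255))

def from_rgb565(r5, g6, b5):
    """Expand 5/6/5-bit channels back to 8 bits."""
    return (round(r5 * 255 / 31), round(g6 * 255 / 63), round(b5 * 255 / 31))

def roundtrip(rgb):
    """8-bit color after a trip through RGB565."""
    return from_rgb565(*to_rgb565(*rgb))

gray = roundtrip((100, 100, 100))  # channels no longer equal after quantization
```

Because green gets one more bit than red and blue, a neutral gray such as (100, 100, 100) comes back as (99, 101, 99): the channels drift apart, which is one source of visible tinting in masks and gradients.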

Slide23

Slide24

Future texture sampling

- Texture sampling derivatives
  - 1st order texel derivatives
  - 2nd order as well?
- Implement in sampler unit
  - Bad performance or quality with shader sampling
  - Artifacts with ddx/ddy technique
- Replace normalmaps with easily compressed bumpmaps
- Bicubic upsampling
  - Terrain masks

Terrain heightmap

Derived normals [2]
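
Deriving a normal from a heightmap needs the first-order texel derivatives mentioned above. A minimal CPU sketch using central differences, assuming z is up and a uniform texel spacing. At the borders the lookups are clamped, so the derivative there is only approximate.

```python
import math

def normal_from_heightmap(height, x, y, texel_size=1.0):
    """Unit normal at texel (x, y) from central differences, with edge clamping.

    height: 2D list indexed [y][x]; z is up.
    """
    h, w = len(height), len(height[0])
    left = height[y][max(x - 1, 0)]
    right = height[y][min(x + 1, w - 1)]
    down = height[max(y - 1, 0)][x]
    up = height[min(y + 1, h - 1)][x]
    # First-order derivatives of the height function.
    dhdx = (right - left) / (2.0 * texel_size)
    dhdy = (up - down) / (2.0 * texel_size)
    nx, ny, nz = -dhdx, -dhdy, 1.0
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)
```

Doing this in a shader costs four extra texture fetches per normal, which is the performance and quality trade-off that a sampler-unit implementation would avoid.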

Slide25

Slide26

Current sparse textures

- Save memory for terrain
  - Static quadtree mask texture
  - Dynamic sparse destruction mask
- Implementation
  - Indirection texture lookup in atlas
  - Arrays too small, want 8192 slices
  - Correct bilinear filtering by borders
- Siggraph'07 course for details [2]

Source mask

Atlas texture
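
The indirection lookup can be sketched as a two-step fetch: find which atlas tile backs the virtual tile, then read the texel inside it. The tile size, data layout and default value are assumptions, and the border texels needed for correct bilinear filtering are omitted.

```python
TILE = 4  # texels per tile side; illustrative, not a value from the slides

def sample_sparse(indirection, atlas, x, y, default=0):
    """Fetch texel (x, y) of the virtual texture through the indirection table.

    indirection: 2D list [tile_y][tile_x] of (atlas_tile_x, atlas_tile_y) or None.
    atlas: 2D list [y][x] holding all stored tiles packed together.
    """
    entry = indirection[y // TILE][x // TILE]
    if entry is None:
        return default  # tile not stored, e.g. a uniform region of the mask
    ax, ay = entry
    return atlas[ay * TILE + y % TILE][ax * TILE + x % TILE]
```

Memory is saved because only tiles with real detail occupy atlas space, while the indirection table itself is small.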

Slide27

HW sparse textures

- Virtual texture
  - HW texture filtering & mipmapping
  - Fallback on non-resident tile access
    - Lower mipmap, default value or shader bool
  - At least 32k x 32k, fp issues with larger?
- Application-controlled tile commit/free
  - ~128 x 128 tiles
- Feedback mechanism for referenced tiles
  - Easy view-dependent allocation
- Future: Latency-free allocation & generation
  - Alt 1: CPU thread callback & block
  - Alt 2: Keep everything on GPU. "Command" shader?
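
One way to picture the fallback and feedback behaviour is the following software sketch. It assumes mip 0 is the finest level and that each coarser mip halves the tile coordinates. The function name and data layout are hypothetical and not a description of any actual hardware.

```python
def resolve_tile(resident, tile_x, tile_y, mip, max_mip, feedback):
    """Find the finest resident tile at or coarser than the requested mip.

    resident: set of (mip, tile_x, tile_y) committed by the application.
    feedback: set collecting tiles that were requested but not resident.
    Returns (mip, tile_x, tile_y), or None if nothing is resident.
    """
    requested = (mip, tile_x, tile_y)
    if requested not in resident:
        feedback.add(requested)  # lets the application commit it later
    for m in range(mip, max_mip + 1):
        shift = m - mip
        key = (m, tile_x >> shift, tile_y >> shift)
        if key in resident:
            return key
    return None
```

Returning `None` corresponds to the "default value or shader bool" case, and the feedback set is what drives view-dependent allocation on the application side.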

Slide28

Cached Procedural Unique Texturing

- Unique dynamic sparse texture on all objects
  - Defined by texture shader graph
  - Combine procedurals, compositing, streaming and uv-space geometry
- Dynamically commit & render visible tiles
  - Highly complex compositing
  - Thanks to high frame-to-frame coherency
  - Upsample and refine
- New dynamic effects made possible
  - Affect every surface

Slide29

Raytracing

Slide30

Raytracing

- Much recent debate & interest in RTRT (real-time raytracing)
- What we are interested in:
  - Performance!!
    - Rasterization for primary rays
  - Deterministic
  - Easy integration into engines
    - Just another method for certain effects & objects
    - Not replace whole pipeline
  - Efficient dynamic geometry
    - Procedural & manual animation (foliage, characters)
    - Destruction (foliage, buildings, objects)

Slide31

Mirror’s Edge

Slide32

Raytraced reflections wanted

- Glass & metal
  - Mostly planar surfaces
- Reflection locality
- Correct reflections for important objects
  - Main character
- Simplified world geometry & shading for rest
  - Common for games
  - Brickmaps? [3]
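
For mostly planar reflectors, the core of a reflection ray is small. A minimal sketch of the standard mirror-reflection formula and a ray-plane intersection follows. The function names and the plane convention n . p + d = 0 are choices made for this example.

```python
def reflect(d, n):
    """Reflect direction d about unit normal n: r = d - 2 (d . n) n."""
    k = 2.0 * sum(di * ni for di, ni in zip(d, n))
    return tuple(di - k * ni for di, ni in zip(d, n))

def intersect_plane(origin, direction, plane_n, plane_d):
    """Ray vs plane n . p + d = 0; returns hit distance t, or None."""
    denom = sum(a * b for a, b in zip(direction, plane_n))
    if abs(denom) < 1e-9:
        return None  # ray is parallel to the plane
    t = -(sum(a * b for a, b in zip(origin, plane_n)) + plane_d) / denom
    return t if t > 0.0 else None
```

The reflected ray would then be traced against the important objects and the simplified world geometry; that part is outside this sketch.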

Slide33

Soft reflections

Mirror’s Edge

Slide34

GPGPU

Slide35

GPGPU uses

- Effect physics
  - Particle vs world soft collision
- AI pathfinding
- AI visibility
  - View rasterization. Obstruction from smoke & foliage
- Procedural animation
  - Trees, undergrowth, hair
- Post-processing

Slide36

CUDA DOF post-process filter

Circle of confusion map

- Thesis work at DICE [4]
  - Test CUDA and performance
  - Poisson disc blur
  - Multi-passed diffusion
  - Separable diffusion
- Good:
  - Easy to learn (C)
  - Map complex algorithms
  - Thread & memory control
- Bad:
  - Performance vs shaders
  - Beta interop
  - Vendor-specific

Output
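
A circle of confusion map stores a blur size per pixel. One common way to compute that size is the thin-lens formula sketched below. Whether the thesis [4] uses this exact formulation is not stated in the slides, so treat it as an assumption. The parameter names are chosen for this example, and all lengths share one unit.

```python
def circle_of_confusion(aperture, focal_length, focus_dist, object_dist):
    """Thin-lens blur circle diameter on the sensor.

    aperture: lens aperture diameter.
    focal_length: lens focal length.
    focus_dist: distance the lens is focused at (must exceed focal_length).
    object_dist: distance of the point being imaged.
    """
    return (aperture * focal_length * abs(object_dist - focus_dist)
            / (object_dist * (focus_dist - focal_length)))
```

Points at the focus distance get a diameter of zero, and the value grows as depth moves away from it, which is what the blur passes then consume.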

Slide37

GPU Compute programming model

- Wanted:
  - Easy & efficient Direct3D 10 interop
  - Low-latency Compute tasks
  - Vendor-independent base interface
    - OpenCL?
  - Efficient CPU multi-core backend
    - Server, older GPUs, debugging
    - MCUDA [5]
  - Eventually platform-independent
    - Future consoles

Slide38

Conclusions

- Shader subroutines
- More software-controlled pipeline
- More texture sampler functionality
- Limited-case raytracing
- GPU compute for games

Slide39

Questions?

Contact: johan.andersson@dice.se

Slide40

References

[1] Tatarchuk, Natasha & Andersson, Johan. "Rendering Architecture and Real-time Procedural Shading & Texturing Techniques". GDC 2007.
[2] Andersson, Johan. "Terrain Rendering in Frostbite using Procedural Shader Splatting". Siggraph 2007.
[3] Christensen, Per H. & Batali, Dana. "An Irradiance Atlas for Global Illumination in Complex Production Scenes". Eurographics Symposium on Rendering 2004.
[4] Lonroth, Per & Unger, Mattias. "Advanced Real-time Post-Processing using GPGPU techniques". Master's thesis, 2008.
[5] Stratton, John, Stone, Sam & Hwu, Wen-mei. "MCUDA: An Efficient Implementation of CUDA Kernels on Multi-cores". Technical report IMPACT-08-01, University of Illinois at Urbana-Champaign, March 2008.

Slide41

Bonus slides

Slide42

Real-time REYES

- Very interesting
  - Displacement mapping & procedurals
  - Stochastic sampling
- Potentially more efficient & general
  - Compared to maxed out rasterization & tessellation on everything = pixel-sized triangles
- But
  - No experience
  - More research & experimentation needed

Slide43

Terrain detail

- Deriving normal from heightfield good in distance
- Future: HW tessellation & procedural displacement shaders for up close ground detail

Slide44

Texture arrays

- Use cases: Everything!
  - Rich parameterized shaders
    - Vary slice index per instance, triangle or texel
    - Instancing without compromising on variation or perf.
  - Cascaded shadow maps
    - HW PCF only in DX 10.1
    - Stable Cascaded Bounding Box Shadow Maps
  - Sparse textures
- More slices plz
  - For tile pools. 64x64x8192

Slide45

Other raytracing uses

- Global Illumination & Ambient Occlusion
  - Incremental Photon Mapping?
- Async collision raycasts
  - AI pathfinding, gameplay, sound obstruction
  - Separate collision world from visual world
  - CPU job-based now