Mantle for developers




Slide1

Mantle for developers

Johan Andersson, technical director
Frostbite
Electronic Arts

Slide2

- Simplify advanced development
- Improve performance
- Enable developers to innovate
- Challenge the status quo

Mantle?

Slide3

Slide4

Control

GPU performance

CPU performance

Programmability

Platforms

Developer impact areas

Slide5

Traditional model: black box
- Middle-ground abstraction: compromise between performance & “usability”
- Hidden resource memory & state
- Resource CPU access tied to device context
- Driver analyzes & synchronizes implicitly

Explicit model: Mantle
- Thin low-level abstraction to expose how hardware works
- App explicit memory management
- Resources are globally accessible
- App explicit resource state transitions

Control

New model

Slide6

- Tell when render target will be used as a texture
  - And many more resource state transitions
- Don’t destroy resources that GPU is using
  - Keep track with fences or frames
- Manual dynamic resource renaming
  - No DISCARD for driver resource renaming
- Resource memory tiling
- Powerful validation layer will help!

App responsibility

Control
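The "don't destroy resources that GPU is using / keep track with fences or frames" responsibility can be sketched as a deferred-deletion queue. This is a minimal illustration only: `DeferredDeleter`, `release` and `collect` are hypothetical names, not Mantle API, and the "completed frame" number is assumed to come from whatever fence mechanism the app uses.

```python
from collections import deque


class DeferredDeleter:
    """Hypothetical helper: destroy a resource only after the GPU has
    finished the last frame that could still reference it."""

    def __init__(self):
        self._pending = deque()  # (frame_index, resource), in release order
        self.destroyed = []      # stand-in for real destroy calls

    def release(self, resource, current_frame):
        # The app is done with the resource, but the GPU may still use it
        # for commands submitted during current_frame.
        self._pending.append((current_frame, resource))

    def collect(self, last_completed_frame):
        # last_completed_frame would be derived from a signalled fence.
        while self._pending and self._pending[0][0] <= last_completed_frame:
            _, resource = self._pending.popleft()
            self.destroyed.append(resource)


deleter = DeferredDeleter()
deleter.release("old_render_target", current_frame=10)
deleter.collect(last_completed_frame=9)   # GPU still on frame 10: nothing freed
deleter.collect(last_completed_frame=10)  # frame 10 finished: safe to destroy
```

Calling `collect` once per frame keeps the bookkeeping cost proportional to the number of resources released, not the number alive.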

Slide7

- App high-level decisions & optimizations
  - Has full scene information
  - Easier to optimize performance & memory
- Flexible & efficient memory management
  - Linear frame allocators
  - Memory pools
  - Pinned memory
- Reduced development time
  - For advanced game engines & apps
  - Easier to get to target performance & robustness

Explicit control enables

Control
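As an illustration of the "linear frame allocators" item, here is a minimal sketch of a bump allocator that hands out offsets into one pre-allocated block and is reset once per frame. The class name, the 256-byte default alignment and the capacity are assumptions for the example, not values from the talk.

```python
class LinearFrameAllocator:
    """Hypothetical bump allocator over one large block of GPU memory.
    Allocation is an offset increment; everything is freed at once by reset()."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.offset = 0

    def allocate(self, size, alignment=256):
        # Round the current offset up to the requested alignment.
        start = (self.offset + alignment - 1) // alignment * alignment
        if start + size > self.capacity:
            raise MemoryError("frame allocator exhausted")
        self.offset = start + size
        return start

    def reset(self):
        # Called once the GPU has finished the frame that used this block.
        self.offset = 0


frame_memory = LinearFrameAllocator(capacity=1024)
constants_offset = frame_memory.allocate(100)  # offset 0
vertices_offset = frame_memory.allocate(100)   # next 256-byte boundary
frame_memory.reset()
```

An engine would typically keep one such allocator per in-flight frame so that a reset never touches memory the GPU is still reading.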

Slide8

- Light-weight driver
  - Easier to develop & maintain
  - Reduced CPU draw call overhead
- Transient resources
  - Alias render targets within frame
  - Major memory savings
  - No need to pre-allocate everything

Explicit control enables

Control
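The memory saving from aliasing transient render targets can be sketched with a greedy lifetime-based assignment: two targets may share memory if their lifetimes within the frame do not overlap. The function name, the tuple layout and the sizes below are made up for illustration; a real engine would also account for alignment and resource state transitions.

```python
def alias_transients(targets):
    """targets: list of (name, size, first_pass, last_pass).
    Returns (slot index per target, total size of all slots)."""
    slots = []        # each slot: {"size": int, "free_after": last pass using it}
    assignment = {}
    for name, size, first, last in sorted(targets, key=lambda t: t[2]):
        for index, slot in enumerate(slots):
            if slot["free_after"] < first:      # previous occupant is dead
                slot["size"] = max(slot["size"], size)
                slot["free_after"] = last
                assignment[name] = index
                break
        else:
            slots.append({"size": size, "free_after": last})
            assignment[name] = len(slots) - 1
    return assignment, sum(slot["size"] for slot in slots)


# Hypothetical frame: sizes in MB, lifetimes as pass indices.
targets = [
    ("gbuffer", 64, 0, 2),
    ("shadow", 16, 1, 2),
    ("bloom", 32, 3, 4),
    ("dof", 48, 3, 5),
]
assignment, total = alias_transients(targets)
unaliased_total = sum(size for _, size, _, _ in targets)
```

In this made-up frame the four targets fit in two shared slots, so `total` is smaller than `unaliased_total`.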

Slide9

CPU performance

Control

Slide10

CPU perf

- Descriptor sets
- Monolithic pipelines
- Command buffers

Core concepts

Slide11

- Table with resource references to bind to graphics or compute pipeline
- Replaces traditional resource stage binding
- Major performance & flexibility advantage
  - Closer to how the hardware works
- App managed - lots of strategies possible!
  - Tiny vs huge sets
  - Single vs multiple
  - Static vs semi-static vs dynamic

Example 1: Single simple dynamic descriptor set
- Bind everything you need for a single draw call
- Close to DX/GL model but share between stages

Descriptor sets

CPU perf

[Diagram: a dynamic descriptor set containing VertexBuffer (VS), Texture0 (VS+PS), Constants (VS), Texture1 (PS), Texture2 (PS) and Sampler0 (VS+PS). Legend: Link, Sampler, Image, Memory.]

Slide12

- Table with resource references to bind to graphics or compute pipeline
- Replaces traditional resource stage binding
- Major performance & flexibility advantage
  - Closer to how the hardware works
- App managed - lots of strategies possible!
  - Tiny vs huge sets
  - Single vs multiple
  - Static vs semi-static vs dynamic

Example 2: Reuse static set with nesting
- Reduce update time & memory usage

Descriptor sets

CPU perf

[Diagram: a dynamic descriptor set containing Constants (VS) and a Link to a static descriptor set. The static descriptor set contains VertexBuffer (VS), Texture0 (VS+PS), Texture1 (PS), Texture2 (PS), Texture3 (PS), Texture4 (PS), Sampler0 (VS+PS) and Sampler1 (PS). Legend: Link, Sampler, Image, Memory.]
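The nesting idea in Example 2 can be modelled as a table whose slots hold either a resource reference or a link to another table. The sketch below is a conceptual model only: `DescriptorSet`, the slot kinds and `flatten` are hypothetical names, and it says nothing about how Mantle lays out descriptor memory.

```python
class DescriptorSet:
    """Hypothetical model: an ordered table of slots.
    Each slot is (kind, value) with kind in {"memory", "image", "sampler", "link"};
    a "link" slot points at another DescriptorSet."""

    def __init__(self, slots):
        self.slots = list(slots)


def flatten(descriptor_set):
    """Resolve links to list every resource reachable from a set."""
    resolved = []
    for kind, value in descriptor_set.slots:
        if kind == "link":
            resolved.extend(flatten(value))
        else:
            resolved.append((kind, value))
    return resolved


# Built once and shared by many draws.
static_set = DescriptorSet([
    ("memory", "VertexBuffer"),
    ("image", "Texture0"),
    ("sampler", "Sampler0"),
])

# Rebuilt per draw: only the constants change, the rest is one link slot.
draw_a = DescriptorSet([("memory", "ConstantsA"), ("link", static_set)])
draw_b = DescriptorSet([("memory", "ConstantsB"), ("link", static_set)])
```

Each per-draw set updates two slots instead of four, which is the "reduce update time & memory usage" point in miniature.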

Slide13

CPU perf

- Shader stages & select graphics state combined into single object
  - No runtime compilation or patching needed!
  - Significantly less runtime overhead to use
- Supports parallel building & caching
  - Fast loading times
- Usage & management up to the app
  - Static vs dynamic creation
  - Amount of pipelines
  - State usage

Monolithic pipelines

[Diagram: a single pipeline state object spanning the stages labelled IA, VS, HS, DS, Tessellator, GS, RS, PS, DB and CB.]
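"Supports parallel building & caching" can be illustrated with a cache keyed by the full pipeline description, filled by a pool of worker threads. Everything here is an assumption for the sketch: `build_pipeline` stands in for an expensive driver compile, and the description is just a hashable tuple of made-up stage and state names.

```python
from concurrent.futures import ThreadPoolExecutor


def build_pipeline(description):
    # Stand-in for the expensive step of compiling all stages plus state
    # into one monolithic object.
    return ("pipeline", description)


class PipelineCache:
    """Hypothetical cache: one prebuilt pipeline per unique description."""

    def __init__(self):
        self._cache = {}
        self.builds = 0

    def prebuild(self, descriptions, workers=4):
        # Drop duplicates and anything already cached, keeping order.
        todo = [d for d in dict.fromkeys(descriptions) if d not in self._cache]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            for description, pipeline in zip(todo, pool.map(build_pipeline, todo)):
                self._cache[description] = pipeline
                self.builds += 1

    def get(self, description):
        # At draw time this is a lookup only; nothing is compiled or patched.
        return self._cache[description]


opaque = (("vs", "mesh_vs"), ("ps", "lit_ps"), ("blend", "opaque"))
alpha = (("vs", "mesh_vs"), ("ps", "lit_ps"), ("blend", "alpha"))
cache = PipelineCache()
cache.prebuild([opaque, alpha, opaque])  # e.g. during level load
```

Because blend state is part of the key, `opaque` and `alpha` are two distinct objects, which is why the number of pipelines is something the app has to manage.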

Slide14

- Issue pipelined graphics & compute commands into a command buffer
  - Bind graphics state, descriptor sets, pipeline
  - Draw calls
  - Render targets
  - Clears
  - Memory transfers
  - NOT: resource mapping
- Fully independent objects
  - Create multiple every frame
  - Or pre-build up front and reuse

Command buffers

CPU perf

Slide15

[Diagram: timeline across CPU 0, CPU 1 and CPU 2 showing Game, Render and Driver Render blocks.]

- Automatically extracts parallelism out of most apps
- Doesn’t scale beyond 2-3 cores
- Additional latency
- Driver thread is often the bottleneck and can collide with app threads

CPU perf

DX/GL parallelism

Slide16

[Diagram: timeline across CPU 0 to CPU 4 showing Game blocks and many Render blocks running in parallel on all cores.]

- App can go fully wide with its rendering: minimal latency
- Close to linear scaling with CPU cores
- No driver threads, no overhead, no contention
- Frostbite’s approach on all consoles, and on PC with Mantle!

CPU perf

Parallel dispatch with Mantle
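The parallel dispatch pattern, where each worker records its own independent command buffer and the buffers are then submitted in order, can be sketched as follows. This is a structural illustration with hypothetical names; in Python the threads will not actually run the recording concurrently because of the interpreter lock, so it shows the shape of the technique rather than the speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def record_command_buffer(draws):
    # Each worker fills its own buffer; no state is shared between buffers.
    commands = []
    for draw in draws:
        commands.append(("bind_pipeline", draw["pipeline"]))
        commands.append(("draw", draw["mesh"]))
    return commands


def record_frame(draws, workers):
    # Split the draw list into contiguous chunks, one command buffer each.
    chunk = max(1, (len(draws) + workers - 1) // workers)
    chunks = [draws[i:i + chunk] for i in range(0, len(draws), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in chunk order, so submission order is stable.
        return list(pool.map(record_command_buffer, chunks))


draws = [{"pipeline": "opaque", "mesh": i} for i in range(10)]
command_buffers = record_frame(draws, workers=4)
# Submitting the buffers in list order reproduces the original draw order.
submitted = [cmd for buffer in command_buffers for cmd in buffer]
```

Since command buffers are fully independent objects, the only ordering decision is the order in which they are handed to the queue.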

Slide17

GPU performance

CPU performance

Slide18

GPU perf

- Thanks to improved CPU performance, CPU will rarely be a bottleneck for the GPU
  - CPU could help GPU more:
    - Less brute force rendering
    - Improve culling
- Shader pipeline object: driver optimizations
  - Can optimize with pipeline state knowledge
  - Can optimize across all shader stages
- Resource states
  - Gives driver a lot more knowledge & flexibility
  - Apps can avoid expensive/redundant transitions, such as surface decompression
- Expose existing GPU functionality
  - Quad & Rect-lists
  - HW-specific MSAA & depth data access
  - Programmable sample patterns
  - And more..

GPU optimizations

Slide19

- Modern GPUs are heterogeneous machines with multiple engines
  - Graphics pipeline
  - Compute pipeline(s)
  - DMA transfer
  - Video encode/decode
  - More…
- Mantle exposes queues for the engines + synchronization primitives

Queues

GPU perf

[Diagram: Graphics, Compute, DMA and further queues feeding the GPU.]

Slide20

Queues

GPU perf

[Diagram: Graphics, Compute, DMA and further queues feeding the GPU.]

Slide21

- Async DMA transfers
  - Copy resources in parallel with graphics or compute

Queue use cases

GPU perf

[Diagram: the Graphics queue runs Render, Other render and Use copy while the DMA queue runs Copy in parallel.]

Slide22

- Async DMA transfers
  - Copy resources in parallel with graphics or compute
- Async compute together with graphics
  - ALU heavy compute work at the same time as memory/ROP bound work to utilize idle units

Queue use cases

GPU perf

[Diagram: the Graphics queue runs GBuffer, Shadowmap 0, Shadowmap 1 and Final lighting while the Compute queue runs Non-shadowed lighting in parallel.]

Slide23

- Async DMA transfers
  - Copy resources in parallel with graphics or compute
- Async compute together with graphics
  - ALU heavy compute work at the same time as memory/ROP bound work to utilize idle units
- Multiple compute kernels collaborating
  - Can be faster than über-kernel
  - Example: Compute geometry backend & compute rasterizer

Queue use cases

GPU perf

[Diagram: Compute 0, Compute 1 and Graphics queues, with blocks labelled Compute Geometry, Compute Rasterizer and Ordinary Rendering.]

Slide24

- Async DMA transfers
  - Copy resources in parallel with graphics or compute
- Async compute together with graphics
  - ALU heavy compute work at the same time as memory/ROP bound work to utilize idle units
- Multiple compute kernels collaborating
  - Can be faster than über-kernel
  - Example: Compute geometry backend & compute rasterizer
- Compute as frontend for graphics pipeline
  - Compute runs asynchronously ahead and prepares & optimizes geometry for graphics pipeline
- Game engines will build large GPU job graphs
  - Move away from single sequential submission
  - Just as we already have done on CPU

Queue use cases

GPU perf

[Diagram: the Compute queue runs Process jobs ahead of the Draw0, Draw1 and Draw2 jobs on the Graphics queue.]
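A GPU job graph of the kind described above can be sketched as a dependency graph whose ready jobs are grouped by target queue. The sketch uses Python's standard `graphlib.TopologicalSorter`; the job names are taken from the async compute example, but the dependencies between them are assumptions made for the illustration, and `schedule` is a hypothetical helper.

```python
from graphlib import TopologicalSorter


def schedule(jobs):
    """jobs: {name: (queue, [dependencies])}.
    Returns a list of waves; each wave maps queue -> jobs that may overlap."""
    sorter = TopologicalSorter({name: deps for name, (_, deps) in jobs.items()})
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        wave = {}
        for name in ready:
            wave.setdefault(jobs[name][0], []).append(name)
        waves.append(wave)
        sorter.done(*ready)
    return waves


# Assumed dependencies: lighting needs the G-buffer; final lighting needs
# both shadow maps and the non-shadowed lighting result.
jobs = {
    "gbuffer": ("graphics", []),
    "shadowmap0": ("graphics", ["gbuffer"]),
    "shadowmap1": ("graphics", ["shadowmap0"]),
    "nonshadowed_lighting": ("compute", ["gbuffer"]),
    "final_lighting": ("graphics", ["shadowmap1", "nonshadowed_lighting"]),
}
waves = schedule(jobs)
```

In the second wave the compute queue runs `nonshadowed_lighting` while the graphics queue renders `shadowmap0`, which is the overlap the slide describes. A real scheduler would replace the wave boundaries with queue synchronization primitives.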

Slide25

GPU performance

Programmability

Slide26

Programmability

- Explicit control of GPU queues and synchronization, finally!
  - Implement your own Alternate-Frame-Rendering
  - Or something more exotic..
- Use case: Workstation rendering with 4-8 GPUs
  - Super high-quality rendering & simulation
  - Load balance graphics & compute job graphs across GPUs
  - 20-40 TFlops in a single machine!
- Use case: Low-latency rendering
  - Important for VR and competitive games
  - Latency optimized GPU job graph scheduling
  - VR: Simultaneously drive 2 GPUs (1 per eye)

Explicit Multi-GPU
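With explicit queues, "implement your own Alternate-Frame-Rendering" reduces at its simplest to the app choosing which GPU each frame goes to. A minimal round-robin sketch, with hypothetical function names and no claim about how a real engine balances load:

```python
def afr_assign(frame_count, gpu_count):
    """Round-robin alternate-frame rendering: frame i goes to GPU i mod N."""
    return [frame % gpu_count for frame in range(frame_count)]


def per_eye_assign(eyes=("left", "right")):
    """VR variant from the slide: one GPU per eye, both driven each frame."""
    return {eye: gpu for gpu, eye in enumerate(eyes)}


frames_to_gpus = afr_assign(frame_count=6, gpu_count=2)
eye_to_gpu = per_eye_assign()
```

The per-eye split keeps both GPUs working on the same frame, which is why it suits low-latency rendering better than alternating whole frames.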

Slide27

Programmability

- Command buffer predication & flow control
  - GPU affecting/skipping submitted commands
  - Go beyond DrawIndirect / DispatchIndirect
  - Advanced variable workloads
  - Advanced culling optimizations
- Write occlusion query results into GPU buffer
  - No CPU roundtrip needed
  - Can drive predicated rendering
  - Or use results directly in shaders (lens flares)

New mechanisms
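Predicated rendering driven by occlusion results in a GPU buffer can be simulated on the CPU to show the control flow. In this sketch a plain list plays the role of the GPU buffer and `execute` plays the role of the GPU command processor; both are stand-ins invented for the example.

```python
def execute(commands, query_buffer):
    """Simulated command processor: skip any command whose predicate slot
    in the buffer holds zero visible samples."""
    executed = []
    for command in commands:
        slot = command.get("predicate")
        if slot is not None and query_buffer[slot] == 0:
            continue  # skipped without the CPU ever reading the result
        executed.append(command["name"])
    return executed


# Visible-sample counts written by earlier occlusion queries.
query_buffer = [120, 0]

commands = [
    {"name": "draw_building", "predicate": 0},
    {"name": "draw_statue", "predicate": 1},  # fully occluded
    {"name": "draw_sky"},                     # unconditional
]
executed = execute(commands, query_buffer)
```

The command list is built before the query results exist, which is what removes the CPU roundtrip.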

Slide28

Programmability

- Mantle supports bindless resources
  - Shaders can select resources to use instead of static binding from CPU
  - Extension of the descriptor set support
- Key component that will open up a lot of opportunities!
- Examples
  - Performance optimizations: less data to update
  - Logic & data structures that live fully on the GPU
    - Scene culling & rendering
    - Material representations
    - Deferred shading
    - Raytracing

Bindless resources
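The core of the bindless idea, that the shader picks a resource by index from a large table instead of the CPU binding one resource per draw, can be modelled in a few lines. The table contents, material layout and `shade` function are all hypothetical and only stand in for shader-side logic.

```python
# One large table, set up once; plays the role of a big descriptor set.
texture_table = ["brick_albedo", "wood_albedo", "metal_albedo"]

# Per-object material data lives in GPU memory and stores indices, not bindings.
materials = [
    {"albedo_index": 2},
    {"albedo_index": 0},
]


def shade(material_id, materials, table):
    """Stand-in for shader code: look up the material, then fetch the
    texture it names from the shared table."""
    material = materials[material_id]
    return table[material["albedo_index"]]


sampled = [shade(i, materials, texture_table) for i in range(len(materials))]
```

Changing an object's texture becomes a write of one index into material data, which is the "less data to update" point.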

Slide29

Programmability

Platforms

Slide30

- Mantle gives us strong benefits on Windows today
  - Console-like performance & programmability on both Windows 7 and Windows 8
  - For us, well worth the dev time!
- DX & GL are the industry standards
  - Needed for platforms that do not support Mantle
  - Needed by devs who do not want/need more control
  - Have to have fallback paths for GL/DX, but not limit oneself to it
- Mantle and PlayStation 4 will drive our future Frostbite designs & optimizations
  - PS4 graphics API has great programmability & performance as well
  - Share concepts, methods & optimization strategies

Today

Platforms

Slide31

- Want to see Mantle on Linux and Mac!
  - Would enable support for our full engine & rendering
  - Significantly easier to do efficient renderer with Mantle than with OpenGL
- Use cases:
  - Workstations
  - R&D: not limited by WDDM
  - Games: Mantle + SteamOS = powerful combination!

Linux & Mac

Platforms

Slide32

- Mobile architectures are getting closer in capabilities to desktop GPUs
- Want graphics API that allows apps to fully utilize the hardware
  - Power efficient
  - High performance
  - Programmable
- Major opportunity with Mantle: leapfrog GL4, DX11
  - For mobile SoC vendors
  - For Google and Apple

Mobile

Platforms

Slide33

- Mantle is designed to be a thin hardware abstraction
  - Not tied to AMD’s GCN architecture
  - Forward compatible
  - Extensions for architecture- and platform-specific functionality
- Mantle would be a much more efficient graphics API for other vendors as well
  - Most Mantle functionality can be supported on today’s modern GPUs
- Want to see future version of Mantle supported on all platforms and on all modern GPUs!
  - Become an active industry standard with IHVs and ISVs collaborating
  - Enable us developers to innovate with great performance & programmability everywhere

Multi-vendor?

Platforms

Slide34

Platforms

Slide35

- Mantle support is in development
  - Core renderer (closer to PS4 than DX11)
  - Implement all rendering techniques used in BF4 (many!)
  - CPU optimizations (parallel dispatch, descriptor sets)
  - GPU optimizations (minimize transitions, MSAA)
  - R&D for advanced GPU optimizations
  - Memory management
  - Multi-GPU support
- ~2 months of work
- Update targeting late December

Battlefield 4

Frostbite

Slide36

- Very different rendering compared to BF4
- Frostbite Mantle renderer will work out of the box
- Focus on APU performance

Plants vs Zombies: Garden Warfare

Frostbite

Slide37

- All Frostbite games designed with Mantle
  - 15 games in development across all of EA
- Advanced Mantle rendering & use cases
  - Lots of exciting R&D opportunities!
- Want multi-vendor & multi-platform support!

Future

Frostbite

Slide38

The end

Email: repi@dice.se
Web: http://frostbite.com
Twitter: @repi

Slide39

Slide40

Slide41

