A Whirlwind - PowerPoint Presentation

Uploaded On 2016-08-11

Presentation Transcript

Slide1
Slide2

A Whirlwind Tour of Vulkan

Graham Sellers, AMD
@grahamsellers

Slide3

Architecture

APPLICATION (your code)
LOADER (Khronos)
DRIVER, DRIVER (IHV / driver)
GPU, GPU, GPU (hardware)

Slide4

Overview

Overview of the Vulkan System
Outline design goals
Show example API usage

Slide5

Goals

Major Vulkan design goals
High performance from a single thread
Scalable to many threads
Scalable across wide range of architectures
Solid foundation for future development
Solve ecosystem issues

Slide6

Application Startup

Vulkan is represented by an “instance”
Application can have multiple Vulkan instances
Instance is owned by the loader
Aggregates drivers from multiple vendors
Responsible for discovery of GPUs
Makes multiple drivers look like one big driver supporting many GPUs

Slide7

Application Startup

Application specifies to loader:
Information about itself
Callback interface for memory allocation
Get back a Vulkan instance

VkApplicationInfo appInfo = { ... };
VkAllocCallbacks allocCb = { ... };
VkInstance instance;
vkCreateInstance(&appInfo, &allocCb, &instance);

Slide8

Physical Devices

Devices are explicitly enumerated in Vulkan
This produces a list of devices
Integrated + discrete
Multiple discrete GPUs in one system
Application manages multiple devices

uint32_t devCount;
VkPhysicalDevice devices[10];
vkEnumeratePhysicalDevices(instance, ARRAYSIZE(devices), &devCount, devices);

Slide9

Device Information

Applications can query information about devices
Returns lots of information about the device
Capabilities, optional features, memory sizes, performance characteristics, etc.

VkPhysicalDeviceFeatures features = {};
vkGetPhysicalDeviceFeatures(physicalDevice, &features);

Slide10

Logical Devices

Logical device is a software representation of a GPU
This is what your application communicates with
Parameters include information about application
What features it will use
Which queues, extensions, etc.

VkDeviceCreateInfo info = { ... };
VkDevice device;
vkCreateDevice(physicalDevice, &info, &device);

Slide11

Queues

Work is performed on queues
Queues run asynchronously to each other
Queues have different capabilities
Graphics, compute, DMA operations
Property of physical device

Slide12

Queues

Get queue handle from the device
Queues are represented as members of families
Each family has specific capabilities
There are one or more queues in each family
Family and index are the two integer parameters of vkGetDeviceQueue

VkQueue queue;
vkGetDeviceQueue(device, 0, 0, &queue);

Slide13
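A sketch for the Queues slides above, showing how an application might choose the family index before asking for a queue handle. The `QueueFamilyInfo` record, the `CAP_*` flags and `find_queue_family` are hypothetical stand-ins for whatever the API reports about each family; none of them are Vulkan identifiers or taken from the talk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits, mirroring the graphics / compute / DMA
 * split named on the slide. These are not Vulkan identifiers. */
enum {
    CAP_GRAPHICS = 1u << 0,
    CAP_COMPUTE  = 1u << 1,
    CAP_DMA      = 1u << 2
};

/* Hypothetical description of one queue family. */
typedef struct {
    uint32_t caps;        /* bitwise OR of CAP_* values */
    uint32_t queue_count; /* number of queues in this family */
} QueueFamilyInfo;

/* Returns the index of the first family that supports every bit in
 * `required` and has at least one queue, or -1 if no family qualifies. */
int find_queue_family(const QueueFamilyInfo *families, size_t count,
                      uint32_t required)
{
    assert(families != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (families[i].queue_count > 0 &&
            (families[i].caps & required) == required) {
            return (int)i;
        }
    }
    return -1;
}
```

The returned index would play the role of the first integer passed to vkGetDeviceQueue; the second integer selects a queue within that family.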

Command Buffers

Commands are sent to a queue in command buffers
Creation parameters include:
Which queue family it will be submitted to
How aggressively drivers should optimize?
etc.

VkCmdBufferCreateInfo info;
VkCmdBuffer cmdBuffer;
vkCreateCommandBuffer(device, &info, &cmdBuffer);

Slide14

Command Buffers

Commands are inserted into command buffers
Driver heavy lifting happens here
State validation, optimization, etc.

VkCmdBufferBeginInfo info = { ... };
vkBeginCommandBuffer(cmdBuf, &info);
vkCmdDoThisThing(cmdBuf, ...);
vkCmdDoSomeOtherThing(cmdBuf, ...);
vkEndCommandBuffer(cmdBuf);

Slide15

Pipelines

Pipelines contain most state
Compiled up front, used in command buffers
Contains compiled shaders, blend, multisample, etc.
Pipelines can be serialized into a cache
Improves application load time

VkGraphicsPipelineCreateInfo info = { ... };
VkPipeline pipeline;
vkCreateGraphicsPipelines(device, cache, 1, &info, &pipeline);

Slide16

Shaders

Shaders are compiled up front
Primary (only) shading language for Vulkan is SPIR-V
Vendor neutral binary intermediate form
Same SPIR-V as used in OpenCL 2.1
Reference GLSL -> SPIR-V compiler available

VkShaderCreateInfo info = { ... };
VkShader shader;
vkCreateShader(device, &info, &shader);

Slide17

Mutable State

A lot of pipeline state is immutable
Some state is dynamic
Represented by smaller chunks of state

VkDynamicViewportStateCreateInfo vpInfo = { ... };
VkDynamicViewportState vpState;
vkCreateDynamicViewportState(device, &vpInfo, &vpState);

VkDynamicDepthStencilCreateInfo dsInfo = { ... };
VkDynamicDepthStencilState dsState;
vkCreateDynamicDepthStencilState(device, &dsInfo, &dsState);

Slide18

State Binding

State is bound to command buffers
State is inherited from draw to draw
It is not inherited across command buffer boundaries
Incremental update by dynamic state binding

vkCmdBindPipeline(cmdBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);
vkCmdBindDynamicViewportState(cmdBuffer, vpState);
vkCmdBindDynamicDepthStencilState(cmdBuffer, dsState);

Slide19

Derivative State

Pipelines can be derived from other pipelines
Create a master pipeline template
Modify creation parameters, create derivative
Provides performance opportunity
During creation, drivers can re-use state
At runtime, fast to switch between related states

Slide20

Vulkan Resources

Resources are data that can be accessed by the device
Examples are buffers and images
Resources represented by API objects
Memory for resources is managed by the application

VkImageCreateInfo imageInfo = { ... };
VkImage image;
vkCreateImage(device, &imageInfo, &image);

VkBufferCreateInfo bufferInfo = { ... };
VkBuffer buffer;
vkCreateBuffer(device, &bufferInfo, &buffer);

Slide21

Device Memory

Applications query objects for their memory needs:

VkMemoryRequirements reqs;
vkGetImageMemoryRequirements(device, image, &reqs);

Application allocates memory for objects:

VkMemoryAllocInfo memInfo = { ... };
VkDeviceMemory mem;
vkAllocMemory(device, &memInfo, &mem);

Application binds memory to the resource:

vkBindImageMemory(device, image, mem, 0);

Slide22

Managing Memory

Application managed memory:
Application does pool management
Multiple resources in a single allocation
Avoid overhead of allocation per object
Recycle memory between objects

Slide23
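The pool management described on the Managing Memory slide can be sketched as a simple linear sub-allocator. This is only one possible scheme and an assumption on my part, not the approach the talk prescribes; `MemoryPool`, `pool_alloc` and the other names are hypothetical. Only offsets are tracked here, since the backing memory would come from a single device allocation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one large device allocation that the
 * application carves up itself. */
typedef struct {
    uint64_t size;   /* total bytes in the backing allocation */
    uint64_t offset; /* first unused byte */
} MemoryPool;

#define POOL_ALLOC_FAILED UINT64_MAX

void pool_init(MemoryPool *pool, uint64_t size)
{
    pool->size = size;
    pool->offset = 0;
}

/* Reserves `size` bytes aligned to `alignment` (a power of two) and
 * returns the offset at which to bind the resource, or
 * POOL_ALLOC_FAILED if the pool has no room left. */
uint64_t pool_alloc(MemoryPool *pool, uint64_t size, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    uint64_t start = (pool->offset + alignment - 1) & ~(alignment - 1);
    if (start > pool->size || size > pool->size - start) {
        return POOL_ALLOC_FAILED;
    }
    pool->offset = start + size;
    return start;
}

/* Recycles the whole pool at once, for example when every object that
 * lived in it has been destroyed. */
void pool_reset(MemoryPool *pool)
{
    pool->offset = 0;
}
```

The offset returned by `pool_alloc` would take the place of the final `0` in the vkBindImageMemory call shown on the Device Memory slide, so that several resources share one allocation.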

Sharing Data

Unlike OpenGL, memory is mapped, not buffers
Bind memory to buffer
Map memory for CPU access
Flags control how memory is allocated and mapped
Control over caching, coherency, etc. provided
Zero-copy and UMA fully supported

vkMapMemory(device, mem, offset, size, flags, &pData);

Slide24

Descriptors

Vulkan resources are represented by descriptors
Descriptors are arranged in sets
Sets are allocated from pools
Sets have layouts, known at pipeline creation time

vkCreateDescriptorPool(...);
vkCreateDescriptorSetLayout(...);
vkAllocDescriptorSets(...);

Slide25

Pipeline Layouts

Layouts represent arrangement of sets used by pipelines
Layout is shared between sets and pipelines
Layout represented by VkPipelineLayout object
Used at pipeline create time
Switch pipelines using sets of the same layout
Pipelines are considered compatible

vkCreatePipelineLayout(...);

Slide26

Render Passes

Frames logically organized into render passes
Render pass contains a lot of information:
Layout and types of framebuffer attachments
What to do when the render pass begins and ends
Part of the framebuffer that the pass may affect

VkRenderPassCreateInfo info = { ... };
VkRenderPass renderPass;
vkCreateRenderPass(device, &info, &renderPass);

Slide27

Merging Passes

Vulkan has the concept of a “sub-pass”
Allows multiple render passes to be merged
Intermediate attachments for transient data
Data passed from pass to pass
Tile-based architectures can keep data on chip
Might reuse memory for temporary surfaces

Slide28

Drawing

Draws are always inside a render pass
All draw types supported – instancing, indirect, etc.

VkRenderPassBegin beginInfo = { renderPass, ... };
vkCmdBeginRenderPass(cmdBuffer, &beginInfo);
vkCmdBindPipeline(cmdBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);
vkCmdBindDescriptorSets(cmdBuffer, ...);
vkCmdDraw(cmdBuffer, 0, 100, 1, 0);
vkCmdEndRenderPass(cmdBuffer, renderPass);

Slide29

Compute

Compute pipelines are special
Possible to have (multiple) compute-only queues
Queues run asynchronously
Yes, asynchronous compute
Compute launched through dispatches

VkComputePipelineCreateInfo info = { ... };
VkPipeline pipeline;
vkCreateComputePipeline(device, cache, 1, &info, &pipeline);

Slide30

Synchronization

Work is synchronized through event primitives
Events may be set, reset, polled and waited on

VkEventCreateInfo info = { ... };
VkEvent event;
vkCreateEvent(device, &info, &event);
vkSetEvent(...);
vkResetEvent(...);
vkGetEventStatus(...);
vkCmdSetEvent(...);
vkCmdResetEvent(...);
vkCmdWaitEvents(...);

Slide31

Resource State

Resources can be in any of many states
Renderable, CPU read, shader read or write, etc.
Drivers used to track this information
Not any more! Now it’s your job…
Pass old state + stages, new state + stages
Driver will take care of the rest

VkImageMemoryBarrier imageBarrier = { ... };
vkCmdPipelineBarrier(cmdBuffer, ..., 1, &imageBarrier);

Slide32
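Since the Resource State slide hands state tracking to the application, here is a minimal sketch of what that bookkeeping might look like. The `ResourceState` values are named after the examples on the slide, and `TrackedImage`, `StateTransition` and `transition_image` are hypothetical; pipeline stages are left out, and the sketch assumes a barrier is needed only when the state actually changes, which a real application may not be able to assume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical resource states, named after the examples on the slide. */
typedef enum {
    STATE_UNDEFINED,
    STATE_RENDERABLE,
    STATE_SHADER_READ,
    STATE_SHADER_WRITE,
    STATE_CPU_READ
} ResourceState;

/* Hypothetical per-image record kept by the application. */
typedef struct {
    ResourceState current;
} TrackedImage;

/* Hypothetical old/new pair the application would describe in a barrier. */
typedef struct {
    ResourceState old_state;
    ResourceState new_state;
} StateTransition;

/* Records that `image` is about to be used in state `wanted`.
 * Returns true and fills `out` when the state changes, so the caller
 * knows to record a barrier; returns false when the image is already
 * in that state and `out` is left untouched. */
bool transition_image(TrackedImage *image, ResourceState wanted,
                      StateTransition *out)
{
    assert(image != NULL && out != NULL);
    if (image->current == wanted) {
        return false;
    }
    out->old_state = image->current;
    out->new_state = wanted;
    image->current = wanted;
    return true;
}
```

When `transition_image` returns true, the old and new states in `out` are what the application would describe in the image barrier before recording the pipeline barrier command.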

Work Submission

Work is submitted to queues for execution
A fence (VkFence) is associated with the submission
This is signaled when work completes
CPU can wait on this fence
Queues marshal resource ownership with semaphores

VkCmdBuffer commandBuffers[] = { cmdBuffer1, cmdBuffer2, ... };
vkQueueSubmit(queue, 1, commandBuffers, fence);
vkQueueSignalSemaphore(queue, semaphore);
vkQueueWaitSemaphore(queue, semaphore);

Slide33

Threading

Threading is a big consideration
API doesn’t lock – that’s the application’s responsibility
Concurrent read access to same object
Concurrent write access to different objects
Performance from one thread will still be good

Slide34

Presentation

Displaying outputs is optional!
We expect some compute-only Vulkan applications
No real need to create a window – console mode
Each platform is different
Presentation is an extension
We define two flavors of the “Window System Interface”
One is for compositors, one is for direct-display

Slide35

Displays

Vulkan also abstracts some display management
Also delegated to WSI extensions
Manage display mode
Turn vsync on and off
Enumerate and take control of displays
This all depends on platform support, of course!

Slide36

Teardown

Application responsible for object destruction
Must be correctly ordered
No reference counting
No implicit object lifetime
Do not delete objects that are still in use!
This includes use by GPU

Slide37
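One way an application might keep the destruction order from the Teardown slide correct is to record a destroy callback as each object is created and then unwind that list in reverse. This is an illustrative pattern rather than anything the talk specifies; `DestroyStack`, `demo_destroy` and the global log are hypothetical, and the sketch assumes the caller has already waited for the GPU to finish using the objects.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TRACKED_OBJECTS 64

typedef void (*DestroyFn)(void *object);

typedef struct {
    DestroyFn destroy; /* how to destroy the object */
    void *object;      /* the object itself */
} DestroyEntry;

/* Objects are pushed in creation order and destroyed in reverse. */
typedef struct {
    DestroyEntry entries[MAX_TRACKED_OBJECTS];
    size_t count;
} DestroyStack;

/* Registers an object for later destruction.
 * Returns 0 on success, -1 if the stack is full. */
int destroy_stack_push(DestroyStack *stack, DestroyFn destroy, void *object)
{
    assert(stack != NULL && destroy != NULL);
    if (stack->count == MAX_TRACKED_OBJECTS) {
        return -1;
    }
    stack->entries[stack->count].destroy = destroy;
    stack->entries[stack->count].object = object;
    stack->count++;
    return 0;
}

/* Destroys every registered object, most recently created first. */
void destroy_stack_unwind(DestroyStack *stack)
{
    assert(stack != NULL);
    while (stack->count > 0) {
        DestroyEntry *entry = &stack->entries[--stack->count];
        entry->destroy(entry->object);
    }
}

/* Illustration only: a callback that logs the id of each destroyed
 * object so the resulting order can be inspected. */
int g_destroy_order[MAX_TRACKED_OBJECTS];
size_t g_destroy_count = 0;

void demo_destroy(void *object)
{
    g_destroy_order[g_destroy_count++] = *(int *)object;
}
```

Pushing objects with ids 1, 2, 3 and then calling `destroy_stack_unwind` destroys them as 3, 2, 1, so an object is always destroyed before the objects it was created from.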

Scalability

Scalability is an important goal
Scales from low power mobile to high end workstation
Many features optional
Queryable upper limits for most things
Still considering how to “bundle” features
Want to avoid “sea of caps” problem
May defer to platform owners

Slide38

Extensibility

Vulkan has a first class extension mechanism
Extensions are opt-in
No more using extensions by accident
Don’t pay driver tax for unused features
Much easier to validate
Still want to expose bleeding edge
Vulkan is a platform for innovation

Slide39

Tools and Debugging

Tools and development are key to success
Strong tools mean better applications
Vulkan is not simple – tools are a must
Khronos is looking to build a strong ecosystem
Tools, loader and other components open source
Well documented hooks for extending API

Slide40

Tools and Debugging

[Diagram: the architecture stack (APPLICATION, LOADER, two DRIVERs, three GPUs) with TOOLS and LAYERS added]

Slide41

Layers

Loader supports layering APIs
Formal hooks for debuggers and tools
No more interceptors, shims, or stub libraries
Validation in intermediate layers
Opt-in, very powerful
Several layers already developed
API trace, parameter validation, API timing, etc.

Slide42
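The layering idea from the Layers slide can be illustrated with a chain of function pointers, where each layer sees a call before passing it toward the driver. This is a conceptual sketch only and not the loader's actual mechanism; the `Layer` struct and the `trace_draw`, `validation_draw` and `driver_draw` functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layer: one entry point plus a link to the next layer
 * down. The last element in the chain plays the role of the driver. */
typedef struct Layer {
    int (*draw)(struct Layer *self, int vertex_count);
    struct Layer *next; /* next layer toward the driver; NULL at the driver */
    int calls;          /* number of calls this layer has seen */
} Layer;

/* Bottom of the chain: stands in for the real driver entry point. */
int driver_draw(Layer *self, int vertex_count)
{
    (void)vertex_count;
    self->calls++;
    return 0;
}

/* Parameter validation layer: rejects bad input before it reaches
 * the driver, otherwise forwards the call unchanged. */
int validation_draw(Layer *self, int vertex_count)
{
    assert(self->next != NULL);
    self->calls++;
    if (vertex_count <= 0) {
        return -1;
    }
    return self->next->draw(self->next, vertex_count);
}

/* API trace layer: counts the call and forwards it. */
int trace_draw(Layer *self, int vertex_count)
{
    assert(self->next != NULL);
    self->calls++;
    return self->next->draw(self->next, vertex_count);
}
```

Because layers are opt-in, an application that enables none of them would call the driver entry directly and pay nothing for the trace or validation steps.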

Layers

Multiple types of layer
Instance level layers
Enabled at instance creation time
Globally available to every device in instance
Device level layers
Specific to device
Enable device-specific extensions, for example

Slide43

Summary

Not really “low-level”, just a better abstraction
Very low overhead:
Low overhead means more application CPU cycles
Explicit threading support means you can go wide without worrying about graphics APIs
Building command buffers once and submitting many times means low amortized cost

Slide44

Summary

Cross-platform, cross-vendor
Not tied to single OS (or OS version)
Not tied to single GPU family or vendor
Not tied to single architecture
Desktop + mobile, forward and deferred, tilers all first class citizens

Slide45

Summary

Open, extensible
Khronos is an open standards body
Collaboration from across the industry, IHVs + ISVs, games, CAD, “Pro” Graphics, AAA + casual
Full support for extensions, layering, debuggers, tools
SPIR-V fully documented – write your own compiler!

Slide46

Thanks!

@grahamsellers
www.khronos.org/vulkan