A Whirlwind Tour of Vulkan
Graham Sellers, AMD
@grahamsellers

Architecture
[Diagram: layered stack, top to bottom]
APPLICATION (your code)
LOADER (Khronos)
DRIVER, DRIVER (IHV / driver)
GPU, GPU, GPU (hardware)

Overview
Overview of the Vulkan system
Outline design goals
Show example API usage

Goals
Major Vulkan design goals
High performance from a single thread
Scalable to many threads
Scalable across wide range of architectures
Solid foundation for future development
Solve ecosystem issues

Application Startup
Vulkan is represented by an “instance”
Application can have multiple Vulkan instances
Instance is owned by the loader
Aggregates drivers from multiple vendors
Responsible for discovery of GPUs
Makes multiple drivers look like one big driver supporting many GPUs

Application Startup
Application specifies to loader:
Information about itself
Callback interface for memory allocation
Get back a Vulkan instance
VkApplicationInfo appInfo = { ... };
VkAllocCallbacks allocCb = { ... };
VkInstance instance;
vkCreateInstance(&appInfo, &allocCb, &instance);

Physical Devices
Devices are explicitly enumerated in Vulkan
This produces a list of devices
Integrated + discrete
Multiple discrete GPUs in one system
Application manages multiple devices
uint32_t devCount;
VkPhysicalDevice devices[10];
vkEnumeratePhysicalDevices(instance, ARRAYSIZE(devices), &devCount, devices);

Device Information
Applications can query information about devices
Returns lots of information about the device
Capabilities, optional features, memory sizes, performance characteristics, etc.
VkPhysicalDeviceFeatures features = {};
vkGetPhysicalDeviceFeatures(physicalDevice, &features);

Logical Devices
Logical device is a software representation of a GPU
This is what your application communicates with
Parameters include information about application
What features it will use
Which queues, extensions, etc.
VkDeviceCreateInfo info = { ... };
VkDevice device;
vkCreateDevice(physicalDevice, &info, &device);

Queues
Work is performed on queues
Queues run asynchronously to each other
Queues have different capabilities
Graphics, compute, DMA operations
Property of physical device

Queues
Get queue handle from the device
Queues are represented as members of families
Each family has specific capabilities
There are one or more queues in each family
Family and index are the two integer parameters to vkGetDeviceQueue
VkQueue queue;
vkGetDeviceQueue(device, 0, 0, &queue);

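As a rough sketch of how an application might choose the family index, the C below scans a table of queue families for the first one whose capabilities cover everything requested. The QueueFamily struct, the CAP_* flags and find_queue_family are hypothetical stand-ins invented for illustration; they are not Vulkan API, and a real application would fill the table from whatever the physical device reports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits; these are not Vulkan identifiers. */
enum {
    CAP_GRAPHICS = 1 << 0,
    CAP_COMPUTE  = 1 << 1,
    CAP_DMA      = 1 << 2
};

/* Hypothetical description of one queue family. */
typedef struct {
    uint32_t caps;        /* bitwise OR of CAP_* flags */
    uint32_t queue_count; /* one or more queues per family */
} QueueFamily;

/* Return the index of the first family that supports every bit in
   `required` and has at least one queue, or -1 if none does. */
int find_queue_family(const QueueFamily *families, size_t count,
                      uint32_t required)
{
    assert(families != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if ((families[i].caps & required) == required &&
            families[i].queue_count > 0) {
            return (int)i;
        }
    }
    return -1;
}
```

The returned index would then be the family argument in a call such as vkGetDeviceQueue(device, family, 0, &queue), with 0 selecting the first queue in that family.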
Command Buffers
Commands are sent to a queue in command buffers
Creation parameters include:
Which queue family it will be submitted to
How aggressively drivers should optimize
etc.
VkCmdBufferCreateInfo info;
VkCmdBuffer cmdBuffer;
vkCreateCommandBuffer(device, &info, &cmdBuffer);

Command Buffers
Commands are inserted into command buffers
Driver heavy lifting happens here
State validation, optimization, etc.
VkCmdBufferBeginInfo info = { ... };
vkBeginCommandBuffer(cmdBuf, &info);
vkCmdDoThisThing(cmdBuf, ...);
vkCmdDoSomeOtherThing(cmdBuf, ...);
vkEndCommandBuffer(cmdBuf);

Pipelines
Pipelines contain most state
Compiled up front, used in command buffers
Contains compiled shaders, blend, multisample, etc.
Pipelines can be serialized into a cache
Improves application load time
VkGraphicsPipelineCreateInfo info = { ... };
VkPipeline pipeline;
vkCreateGraphicsPipelines(device, cache, 1, &info, &pipeline);

Shaders
Shaders are compiled up front
Primary (only) shading language for Vulkan is SPIR-V
Vendor neutral binary intermediate form
Same SPIR-V as used in OpenCL 2.1
Reference GLSL -> SPIR-V compiler available
VkShaderCreateInfo info = { ... };
VkShader shader;
vkCreateShader(device, &info, &shader);

Mutable State
A lot of pipeline state is immutable
Some state is dynamic
Represented by smaller chunks of state
VkDynamicViewportStateCreateInfo vpInfo = { ... };
VkDynamicViewportState vpState;
vkCreateDynamicViewportState(device, &vpInfo, &vpState);
VkDynamicDepthStencilCreateInfo dsInfo = { ... };
VkDynamicDepthStencilState dsState;
vkCreateDynamicDepthStencilState(device, &dsInfo, &dsState);

State Binding
State is bound to command buffers
State is inherited from draw to draw
It is not inherited across command buffer boundaries
Incremental update by dynamic state binding
vkCmdBindPipeline(cmdBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);
vkCmdBindDynamicViewportState(cmdBuffer, vpState);
vkCmdBindDynamicDepthStencilState(cmdBuffer, dsState);

Derivative State
Pipelines can be derived from other pipelines
Create a master pipeline template
Modify creation parameters, create derivative
Provides performance opportunity
During creation, drivers can re-use state
At runtime, fast to switch between related states

Vulkan Resources
Resources are data that can be accessed by the device
Examples are buffers and images
Resources represented by API objects
Memory for resources is managed by the application
VkImageCreateInfo imageInfo = { ... };
VkImage image;
vkCreateImage(device, &imageInfo, &image);
VkBufferCreateInfo bufferInfo = { ... };
VkBuffer buffer;
vkCreateBuffer(device, &bufferInfo, &buffer);

Device Memory
Applications query objects for their memory needs:
VkMemoryRequirements reqs;
vkGetImageMemoryRequirements(device, image, &reqs);
Application allocates memory for objects:
VkMemoryAllocInfo memInfo = { ... };
VkDeviceMemory mem;
vkAllocMemory(device, &memInfo, &mem);
Application binds memory to the resource:
vkBindImageMemory(device, image, mem, 0);

Managing Memory
Application managed memory:
Application does pool management
Multiple resources in a single allocation
Avoid overhead of allocation per object
Recycle memory between objects

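One simple way to put multiple resources in a single allocation is a linear ("bump") sub-allocator that hands out aligned offsets into one large block. The sketch below is a hedged illustration only: MemoryPool, pool_alloc, pool_reset and POOL_ALLOC_FAILED are hypothetical names, nothing here is Vulkan API, and a production pool would usually also support freeing individual ranges.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sub-allocator over one large device allocation. */
typedef struct {
    uint64_t capacity; /* size of the backing allocation in bytes */
    uint64_t head;     /* first unused byte */
} MemoryPool;

#define POOL_ALLOC_FAILED UINT64_MAX

/* Reserve `size` bytes at an offset that is a multiple of `alignment`
   (which must be a power of two). Returns the offset into the backing
   allocation, or POOL_ALLOC_FAILED if the request does not fit. */
uint64_t pool_alloc(MemoryPool *pool, uint64_t size, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* Round the head up to the next multiple of the alignment. */
    uint64_t offset = (pool->head + alignment - 1) & ~(alignment - 1);
    if (offset > pool->capacity || size > pool->capacity - offset) {
        return POOL_ALLOC_FAILED;
    }
    pool->head = offset + size;
    return offset;
}

/* Recycle the whole pool once nothing placed in it is still in use. */
void pool_reset(MemoryPool *pool)
{
    pool->head = 0;
}
```

Each returned offset would presumably be passed as the final argument of a bind call such as vkBindImageMemory(device, image, mem, offset), with the size and alignment taken from the resource's memory requirements.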
Sharing Data
Unlike OpenGL, memory is mapped, not buffers
Bind memory to buffer
Map memory for CPU access
Flags control how memory is allocated and mapped
Control over caching, coherency, etc. provided
Zero-copy and UMA fully supported
vkMapMemory(device, mem, offset, size, flags, &pData);

Descriptors
Vulkan resources are represented by descriptors
Descriptors are arranged in sets
Sets are allocated from pools
Sets have layouts, known at pipeline creation time
vkCreateDescriptorPool(...);
vkCreateDescriptorSetLayout(...);
vkAllocDescriptorSets(...);

Pipeline Layouts
Layouts represent arrangement of sets used by pipelines
Layout is shared between sets and pipelines
Layout represented by VkPipelineLayout object
Used at pipeline create time
Switch pipelines using sets of the same layout
Pipelines are considered compatible
vkCreatePipelineLayout(...);

Render Passes
Frames are logically organized into render passes
Render pass contains a lot of information:
Layout and types of framebuffer attachments
What to do when the render pass begins and ends
Part of the framebuffer that the pass may affect
VkRenderPassCreateInfo info = { ... };
VkRenderPass renderPass;
vkCreateRenderPass(device, &info, &renderPass);

Merging Passes
Vulkan has the concept of a “sub-pass”
Allows multiple render passes to be merged
Intermediate attachments for transient data
Data passed from pass to pass
Tile-based architectures can keep data on chip
Might reuse memory for temporary surfaces

Drawing
Draws are always inside a render pass
All draw types supported: instancing, indirect, etc.
VkRenderPassBegin beginInfo = { renderPass, ... };
vkCmdBeginRenderPass(cmdBuffer, &beginInfo);
vkCmdBindPipeline(cmdBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);
vkCmdBindDescriptorSets(cmdBuffer, ...);
vkCmdDraw(cmdBuffer, 0, 100, 1, 0);
vkCmdEndRenderPass(cmdBuffer, renderPass);

Compute
Compute pipelines are special
Possible to have (multiple) compute-only queues
Queues run asynchronously
Yes, asynchronous compute
Compute launched through dispatches
VkComputePipelineCreateInfo info = { ... };
VkPipeline pipeline;
vkCreateComputePipeline(device, cache, 1, &info, &pipeline);

Synchronization
Work is synchronized through event primitives
Events may be set, reset, polled and waited on
VkEventCreateInfo info = { ... };
VkEvent event;
vkCreateEvent(device, &info, &event);
vkSetEvent(...);
vkResetEvent(...);
vkGetEventStatus(...);
vkCmdSetEvent(...);
vkCmdResetEvent(...);
vkCmdWaitEvents(...);

Resource State
Resources can be in any of many states
Renderable, CPU read, shader read or write, etc.
Drivers used to track this information
Not any more! Now it’s your job…
Pass old state + stages, new state + stages
Driver will take care of the rest
VkImageMemoryBarrier imageBarrier = { ... };
vkCmdPipelineBarrier(cmdBuffer, ..., 1, &imageBarrier);

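Since tracking resource state is now the application's job, one minimal approach is to remember the current state of each resource and produce an old/new pair whenever it changes. The C below sketches that bookkeeping under stated assumptions: ResourceState, TrackedResource, Transition and transition_resource are hypothetical names, not Vulkan API, and the model is deliberately simplified (it ignores pipeline stages and any case where a barrier is needed without a state change).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical resource states, loosely following the slide's list. */
typedef enum {
    STATE_UNDEFINED,
    STATE_RENDERABLE,
    STATE_SHADER_READ,
    STATE_SHADER_WRITE,
    STATE_CPU_READ
} ResourceState;

/* Hypothetical application-side record for one image or buffer. */
typedef struct {
    ResourceState current;
} TrackedResource;

/* Old and new state for a transition the application must record. */
typedef struct {
    ResourceState old_state;
    ResourceState new_state;
} Transition;

/* Move `res` to `wanted`. Returns true and fills `out` when the state
   changes; returns false, leaving `out` untouched, when the resource
   is already in the wanted state. */
bool transition_resource(TrackedResource *res, ResourceState wanted,
                         Transition *out)
{
    assert(res != NULL && out != NULL);
    if (res->current == wanted) {
        return false;
    }
    out->old_state = res->current;
    out->new_state = wanted;
    res->current = wanted;
    return true;
}
```

When the function returns true, the old and new states in the Transition are what the application would translate into a barrier structure before recording a command such as vkCmdPipelineBarrier.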
Work Submission
Work is submitted to queues for execution
A fence (VkFence) is associated with the submission
This is signaled when work completes
CPU can wait on this fence
Queues marshal resource ownership with semaphores
VkCmdBuffer commandBuffers[] = { cmdBuffer1, cmdBuffer2, ... };
vkQueueSubmit(queue, 1, commandBuffers, fence);
vkQueueSignalSemaphore(queue, semaphore);
vkQueueWaitSemaphore(queue, semaphore);

Threading
Threading is a big consideration
API doesn’t lock; that’s the application’s responsibility
Concurrent read access to the same object
Concurrent write access to different objects
Performance from one thread will still be good

Presentation
Displaying outputs is optional!
We expect some compute-only Vulkan applications
No real need to create a window (console mode)
Each platform is different
Presentation is an extension
We define two flavors of the “Window System Interface”
One is for compositors, one is for direct-display

Displays
Vulkan also abstracts some display management
Also delegated to WSI extensions
Manage display mode
Turn vsync on and off
Enumerate and take control of displays
This all depends on platform support, of course!

Teardown
Application is responsible for object destruction
Must be correctly ordered
No reference counting
No implicit object lifetime
Do not delete objects that are still in use!
This includes use by the GPU

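Because there is no reference counting, a common application-side pattern is to defer destruction until the last submission that used an object is known to have completed. The C below is a sketch of such a queue, assuming the application numbers its submissions and learns (for example from fences) which number has finished. DeleteQueue, defer_destroy, flush_completed and count_destroy are hypothetical names for illustration, not Vulkan API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 64

typedef void (*DestroyFn)(void *object);

typedef struct {
    void     *object;
    DestroyFn destroy;
    uint64_t  last_used; /* number of the last submission that used it */
} PendingDelete;

typedef struct {
    PendingDelete items[MAX_PENDING];
    size_t        count;
} DeleteQueue;

/* Queue an object for destruction. Returns 0 on success, -1 if full. */
int defer_destroy(DeleteQueue *q, void *object, DestroyFn destroy,
                  uint64_t last_used)
{
    assert(q != NULL && destroy != NULL);
    if (q->count == MAX_PENDING) {
        return -1;
    }
    q->items[q->count++] = (PendingDelete){ object, destroy, last_used };
    return 0;
}

/* Destroy, in queue order, every object whose last submission number
   is <= `completed`; keep the rest. Returns the number destroyed. */
size_t flush_completed(DeleteQueue *q, uint64_t completed)
{
    size_t kept = 0, destroyed = 0;
    for (size_t i = 0; i < q->count; ++i) {
        if (q->items[i].last_used <= completed) {
            q->items[i].destroy(q->items[i].object);
            ++destroyed;
        } else {
            q->items[kept++] = q->items[i];
        }
    }
    q->count = kept;
    return destroyed;
}

/* Example callback standing in for a real destroy call: treats the
   object as an int counter and increments it. */
void count_destroy(void *object)
{
    ++*(int *)object;
}
```

Objects are destroyed in the order they were queued, so queuing dependents before the objects they depend on is one way to keep destruction correctly ordered.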
Scalability
Scalability is an important goal
Scales from low power mobile to high end workstation
Many features optional
Queryable upper limits for most things
Still considering how to “bundle” features
Want to avoid “sea of caps” problem
May defer to platform owners

Extensibility
Vulkan has a first class extension mechanism
Extensions are opt-in
No more using extensions by accident
Don’t pay driver tax for unused features
Much easier to validate
Still want to expose bleeding edge
Vulkan is a platform for innovation

Tools and Debugging
Tools and development are key to success
Strong tools mean better applications
Vulkan is not simple, so tools are a must
Khronos is looking to build a strong ecosystem
Tools, loader and other components open source
Well documented hooks for extending API

Tools and Debugging
[Diagram: the architecture stack (APPLICATION, LOADER, DRIVER x2, GPU x3) with TOOLS and LAYERS added]

Layers
Loader supports layering APIs
Formal hooks for debuggers and tools
No more interceptors, shims, or stub libraries
Validation in intermediate layers
Opt-in, very powerful
Several layers already developed
API trace, parameter validation, API timing, etc.

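The slides do not describe how the loader chains layers internally, so the following is only a generic sketch of the idea: each layer holds a function pointer plus a pointer to whatever sits below it, does its own work, and forwards the call. Link, DrawFn, driver_draw, validate_draw and trace_draw are hypothetical names invented here; none of them is part of Vulkan or its loader.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry point type; a real API has many of these. */
typedef int (*DrawFn)(void *ctx, int vertex_count);

/* One link in the chain: a function plus whatever sits below it. */
typedef struct Link {
    DrawFn       fn;
    struct Link *next;
    int          calls; /* per-layer statistic, used by the trace layer */
} Link;

/* Bottom of the chain: stands in for the driver. */
int driver_draw(void *ctx, int vertex_count)
{
    (void)ctx;
    return vertex_count; /* pretend this many vertices were drawn */
}

/* Validation layer: rejects bad parameters before the driver sees them. */
int validate_draw(void *ctx, int vertex_count)
{
    Link *self = ctx;
    assert(self->next != NULL);
    if (vertex_count < 0) {
        return -1;
    }
    return self->next->fn(self->next, vertex_count);
}

/* Trace layer: counts calls, then forwards them unchanged. */
int trace_draw(void *ctx, int vertex_count)
{
    Link *self = ctx;
    assert(self->next != NULL);
    ++self->calls;
    return self->next->fn(self->next, vertex_count);
}
```

An application that opts in to both layers would call through the top link; one that opts out would call the driver link directly and pay nothing for the layers it did not enable.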
Layers
Multiple types of layer
Instance level layers
Enabled at instance creation time
Globally available to every device in instance
Device level layers
Specific to device
Enable device-specific extensions, for example

Summary
Not really “low-level”, just a better abstraction
Very low overhead:
Low overhead means more application CPU cycles
Explicit threading support means you can go wide without worrying about graphics APIs
Building command buffers once and submitting many times means low amortized cost

Summary
Cross-platform, cross-vendor
Not tied to single OS (or OS version)
Not tied to single GPU family or vendor
Not tied to single architecture
Desktop + mobile, forward and deferred, tilers all first class citizens

Summary
Open, extensible
Khronos is an open standards body
Collaboration from across the industry, IHVs + ISVs, games, CAD, “Pro” Graphics, AAA + casual
Full support for extensions, layering, debuggers, tools
SPIR-V fully documented: write your own compiler!

Thanks!
@grahamsellers
www.khronos.org/vulkan