In this tutorial, you will be introduced to several classes that will help you to create a robust and flexible framework for building DirectX 12 applications. Some of the problems that are solved with the classes introduced in this lesson are managing CPU descriptors, copying CPU descriptors to GPU visible descriptor heaps, managing resource state across multiple threads, and uploading dynamic buffer data to the GPU. To automatically manage the state and descriptors for resources, a custom command list class is also provided.
Contents
- 1 Introduction
- 2 Upload Buffer
- 2.1 UploadBuffer Class
- 2.1.1 UploadBuffer Header
- 2.1.2 UploadBuffer Preamble
- 2.1.3 UploadBuffer::UploadBuffer
- 2.1.4 UploadBuffer::Allocate
- 2.1.5 UploadBuffer::RequestPage
- 2.1.6 UploadBuffer::Reset
- 2.1.7 UploadBuffer::Page::Page
- 2.1.8 UploadBuffer::Page::~Page
- 2.1.9 UploadBuffer::Page::HasSpace
- 2.1.10 UploadBuffer::Page::Allocate
- 2.1.11 UploadBuffer::Page::Reset
- 3 Descriptor Allocator
- 3.1 DescriptorAllocator Class
- 3.2 DescriptorAllocatorPage Class
- 3.2.1 DescriptorAllocatorPage Header
- 3.2.2 DescriptorAllocatorPage Preamble
- 3.2.3 DescriptorAllocatorPage::DescriptorAllocatorPage
- 3.2.4 DescriptorAllocatorPage::GetHeapType
- 3.2.5 DescriptorAllocatorPage::NumFreeHandles
- 3.2.6 DescriptorAllocatorPage::HasSpace
- 3.2.7 DescriptorAllocatorPage::AddNewBlock
- 3.2.8 DescriptorAllocatorPage::Allocate
- 3.2.9 DescriptorAllocatorPage::ComputeOffset
- 3.2.10 DescriptorAllocatorPage::Free
- 3.2.11 DescriptorAllocatorPage::FreeBlock
- 3.2.12 DescriptorAllocatorPage::ReleaseStaleDescriptors
- 3.3 DescriptorAllocation Class
- 3.3.1 DescriptorAllocation Header
- 3.3.2 DescriptorAllocation Preamble
- 3.3.3 DescriptorAllocation Default Constructor
- 3.3.4 DescriptorAllocation Parameterized Constructor
- 3.3.5 DescriptorAllocation Destructor
- 3.3.6 DescriptorAllocation Move Constructor
- 3.3.7 DescriptorAllocation Move Assignment
- 3.3.8 DescriptorAllocation::Free
- 3.3.9 DescriptorAllocation::IsNull
- 3.3.10 DescriptorAllocation::GetDescriptorHandle
- 3.3.11 DescriptorAllocation::GetNumHandles
- 3.3.12 DescriptorAllocation::GetDescriptorAllocatorPage
- 4 Dynamic Descriptor Heap
- 4.1 DynamicDescriptorHeap Class
- 4.1.1 DynamicDescriptorHeap Header
- 4.1.2 DynamicDescriptorHeap Preamble
- 4.1.3 DynamicDescriptorHeap::DynamicDescriptorHeap
- 4.1.4 DynamicDescriptorHeap::ParseRootSignature
- 4.1.5 DynamicDescriptorHeap::StageDescriptors
- 4.1.6 DynamicDescriptorHeap::ComputeStaleDescriptorCount
- 4.1.7 DynamicDescriptorHeap::RequestDescriptorHeap
- 4.1.8 DynamicDescriptorHeap::CreateDescriptorHeap
- 4.1.9 DynamicDescriptorHeap::CommitStagedDescriptors
- 4.1.10 DynamicDescriptorHeap::CommitStagedDescriptorsForDraw
- 4.1.11 DynamicDescriptorHeap::CommitStagedDescriptorsForDispatch
- 4.1.12 DynamicDescriptorHeap::CopyDescriptor
- 4.1.13 DynamicDescriptorHeap::Reset
- 5 Resource State Tracking
- 5.1 ResourceStateTracker Class
- 5.1.1 ResourceStateTracker Header
- 5.1.2 ResourceStateTracker Preamble
- 5.1.3 ResourceStateTracker::ResourceBarrier
- 5.1.4 ResourceStateTracker::TransitionResource
- 5.1.5 ResourceStateTracker::UAVBarrier
- 5.1.6 ResourceStateTracker::AliasBarrier
- 5.1.7 ResourceStateTracker::FlushResourceBarriers
- 5.1.8 ResourceStateTracker::FlushPendingResourceBarriers
- 5.1.9 ResourceStateTracker::CommitFinalResourceStates
- 5.1.10 ResourceStateTracker::Reset
- 5.1.11 ResourceStateTracker::Lock
- 5.1.12 ResourceStateTracker::Unlock
- 5.1.13 ResourceStateTracker::AddGlobalResourceState
- 5.1.14 ResourceStateTracker::RemoveGlobalResourceState
- 6 Custom Command List
- 7 Conclusion
- 8 Download the Source
- 9 References
The design of these classes prioritizes convenience for the graphics programmer when creating demos (for research purposes) but may not reflect the most optimized implementations that would be used in production game engines. Feel free to share your thoughts in the comments below about how to improve the design of the classes shown here.
Introduction
As you have learned in the previous lessons, compared to DirectX 11 or OpenGL, DirectX 12 introduces a few architectural changes that create some challenges for the graphics programmer. These architectural changes provide a lower-level rendering API but also require a lot of additional code to be written just to get anything to appear on screen. When I first started working with DirectX 12, I really struggled with issues such as memory management, descriptors, and resource state management. What’s the best memory management scheme to use to store resources? How do I make sure I have enough descriptors to render a frame?
In this lesson, I will introduce several classes that will greatly simplify the development of DirectX 12 applications. The first of these classes is the UploadBuffer. The UploadBuffer is a linear allocator that creates resources in an Upload Heap. The purpose of this class is to provide the ability to upload dynamic constant, vertex, and index buffer data (or any buffer data for that matter) to the GPU. The most common use-case for the UploadBuffer class is to upload uniform data to a ConstantBuffer used in a shader. Another typical use-case for the UploadBuffer is particle effects. If the particles are simulated on the CPU, the computed particle attributes need to be uploaded to the GPU every frame. Instead of creating a new upload buffer every frame, the UploadBuffer is used to upload the particle data to the GPU. Another use-case for the UploadBuffer is rendering a User Interface (UI). If the UI is dynamic (for example, if you want to show run-time performance profiling), then the UI needs to be regenerated every frame with the new output. For each of these use cases, it is best to create a large resource in an upload heap, map a CPU pointer to the underlying resource, and copy the required data (using memcpy, for example).
The next class that I will discuss is the DescriptorAllocator class, which is used to allocate a number of CPU visible descriptors. CPU visible descriptors are used for Render Target Views (RTV) and Depth-Stencil Views (DSV). CPU visible descriptors are also used to create Constant Buffer Views (CBV), Shader Resource Views (SRV), Unordered Access Views (UAV), and Samplers, but CBVs, SRVs, UAVs, and Samplers require a corresponding GPU visible descriptor before they can be used in a shader.
Whenever a Draw or Dispatch command is executed on a command list, any resource that is read from or written to in the shader needs to be bound to the graphics or compute pipeline using a GPU visible descriptor. Although buffer resources can be bound to the GPU using inline descriptors (see Lesson 2), texture resources cannot be bound using inline descriptors and must be bound to the GPU using a descriptor table. If the shader uses a lot of textures (as is the case with Physically Based Rendering, for example), then all of the textures needed during the draw or dispatch call must be bound to the graphics or compute pipeline at the same time. Usually, all of the SRVs for the textures are bound in a contiguous block of GPU visible descriptors in a single descriptor table range. But if textures are loaded in random order, or the same texture is used for multiple draw calls, how can one ensure that all of the textures are bound in a contiguous block of GPU visible descriptors? Another issue is that only a single descriptor heap of each type (CBV_SRV_UAV or SAMPLER) can be bound on the command list at any moment. So all GPU visible descriptors must come from a single descriptor heap (the descriptor heaps can only be changed between Draw or Dispatch calls)! Yet another issue arises because descriptors cannot be reused until the command list that is using them has finished executing on the GPU. So how do you know how many GPU visible descriptors need to be allocated up front? In all but the simplest case, it is impossible to know how many GPU visible descriptors will be needed for an entire frame (or 3 frames in the case of triple-buffering). The DynamicDescriptorHeap class described in this lesson solves the problem of ensuring that all of the GPU visible descriptors are copied to a single GPU visible descriptor heap before a Draw or Dispatch command is executed on the GPU.
Another tricky problem to solve in a DirectX 12 renderer is ensuring that resources are always in the correct state when they are needed. To perform a resource transition, both the before and after states of the resource need to be specified in the transition barrier. But if a resource is used in different states in multiple command lists, the graphics programmer needs to know exactly which state it was left in by the previously executed command list. A naïve approach would be to create a class that stores both the resource and its current state. Any time a transition barrier is performed on the resource, the current resource state is checked and used as the before state. This approach would work in a single-threaded renderer but wouldn’t work if the command lists are built on different threads! In this case, there is no way to guarantee the state of the resource across multiple threads. The graphics programmer should only be concerned with implementing the graphics application, not with synchronizing the state of a resource across multiple command lists, multiple command queues, and multiple threads! The ResourceStateTracker class introduced in this lesson strives to solve the problem of tracking the resource state in a multi-threaded renderer.
In order to bring everything together and make the life of a graphics programmer as easy as possible, a custom CommandList class is introduced. It uses the aforementioned classes to simplify loading texture and buffer resources, to track resource state while minimizing transition barriers, and to ensure that all of the resources used in a shader are correctly bound to GPU visible descriptors. The goal of the custom CommandList class described in this lesson is to abstract away the complications of using DirectX 12 and to reduce the game-specific code from thousands of lines of (user) code to just a few hundred.
Upload Buffer
The UploadBuffer class provides a simple wrapper around a resource that is created in an upload heap. The UploadBuffer is implemented as a linear allocator that allocates chunks (or blocks) of memory from memory pages. If a memory page cannot satisfy an allocation request, a new page is created and added to the pool of pages. A linear allocator can’t grow indefinitely, so when a page of memory is no longer in use (for example, when the command list that uses an allocation from that page has finished executing on the GPU), the page can be returned to the list of available pages in the heap. The image below shows an example of a linear allocator.
A linear allocator is probably the simplest allocator to implement since it only needs to store two pointers per memory page (the base pointer and the current offset in the page). The above image shows an example of a linear allocator after several allocations have been made. The red blocks represent allocated blocks while the green blocks represent free blocks within the page. Allocated blocks are never freed individually; instead, once none of the allocations in a page are in use, the entire page is returned to the allocator’s available pages and the page’s offset is reset to the base pointer. The green chunks of free memory between the allocated blocks are a result of external fragmentation caused by the alignment of the allocated blocks. For example, if the first allocation is a block of 64 bytes and the next allocation needs to be aligned to 256 bytes (constant buffers are required to be aligned to 256 bytes), then there are 192 bytes of unused space in the memory page between the first and second allocations.
The linear allocator also suffers from internal fragmentation when the size of a requested block is smaller than the requested alignment. For example, a 64-byte block of memory may be requested with 256-byte alignment (this is typical of a constant buffer that contains only a single 4×4 matrix). The allocation consumes 256 bytes even though only 64 bytes will ever be used.
The shaded area in the second allocation shown in the image above is unused memory resulting from internal fragmentation: only 64 bytes were requested, but the 256-byte alignment leaves the remaining 192 bytes of the block unused.
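The fragmentation arithmetic in the paragraphs above can be checked with a few lines of code. This is a minimal sketch that assumes power-of-two alignments. AlignUp here is a local stand-in written for illustration (the Math::AlignUp helper used later in this lesson lives in Helpers.h, which is not shown), and AllocateAt and Fragmentation are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Local stand-in for Math::AlignUp (Helpers.h is not shown in this lesson).
// Rounds value up to the next multiple of alignment; alignment must be a power of two.
constexpr std::size_t AlignUp(std::size_t value, std::size_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical summary of the space wasted by a single linear allocation.
struct Fragmentation
{
    std::size_t externalBytes; // gap before the block, caused by aligning the offset
    std::size_t internalBytes; // unused tail inside the block, caused by aligning the size
    std::size_t newOffset;     // page offset after the allocation
};

// Computes the waste produced by allocating sizeInBytes at the given page offset.
Fragmentation AllocateAt(std::size_t offset, std::size_t sizeInBytes, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::size_t alignedOffset = AlignUp(offset, alignment);
    const std::size_t alignedSize = AlignUp(sizeInBytes, alignment);
    return { alignedOffset - offset, alignedSize - sizeInBytes, alignedOffset + alignedSize };
}
```

For the example in the text, a first 64-byte block with 16-byte alignment (an assumed value) ends at offset 64. A second 64-byte block with 256-byte alignment then produces 192 bytes of external and 192 bytes of internal fragmentation and moves the offset to 512.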
Despite the internal and external fragmentation, the linear allocator is well suited to this task because of its simplicity and speed. Allocating from a linear allocator only requires the offset pointer to be updated, which can be done in constant time (\(\mathcal{O}(1)\)).
UploadBuffer Class
As mentioned in the introduction, the UploadBuffer class is used to satisfy requests for memory that must be uploaded to the GPU. When the data in the upload buffer is no longer required, the memory pages can be reused. A page only becomes available again when the command list that uses memory from that page has finished executing on the GPU. To simplify the implementation of the UploadBuffer class, it is assumed that each UploadBuffer instance is associated with a single command list/allocator. In the first tutorial, you learned that a command allocator can’t be reset while it is still “in-flight” on the command queue. Similar to the command allocator, the UploadBuffer is only reset when none of its memory allocations are still “in-flight” on the command queue. This is shown later in this lesson when describing the custom CommandList class.
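The reuse rule can be sketched without any Direct3D types. The sketch below is a minimal, hypothetical model: a plain integer stands in for a fence value (as returned by ID3D12Fence::GetCompletedValue), FakeUploadBuffer only counts allocations, and InFlightQueue is an invented name. Each submitted command list remembers the fence value signaled after it, and its upload buffer is reset only once the GPU’s completed fence value has reached that number. It illustrates the idea and is not the CommandList implementation shown later.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <memory>
#include <utility>

// Hypothetical stand-in for UploadBuffer: only tracks how many allocations it holds.
struct FakeUploadBuffer
{
    int liveAllocations = 0;
    void Reset() { liveAllocations = 0; }
};

// A submitted command list's upload buffer and the fence value marking its completion.
struct InFlightEntry
{
    std::uint64_t fenceValue;
    std::shared_ptr<FakeUploadBuffer> uploadBuffer;
};

class InFlightQueue
{
public:
    void Submit(std::uint64_t fenceValue, std::shared_ptr<FakeUploadBuffer> buffer)
    {
        m_InFlight.push_back({ fenceValue, std::move(buffer) });
    }

    // Resets every upload buffer whose command list has finished; returns how many.
    int RetireCompleted(std::uint64_t completedFenceValue)
    {
        int retired = 0;
        while (!m_InFlight.empty() && m_InFlight.front().fenceValue <= completedFenceValue)
        {
            m_InFlight.front().uploadBuffer->Reset(); // safe: the GPU is done with it
            m_InFlight.pop_front();
            ++retired;
        }
        return retired;
    }

    std::size_t Size() const { return m_InFlight.size(); }

private:
    std::deque<InFlightEntry> m_InFlight; // ordered by increasing fence value
};
```

Because fence values increase monotonically with each submission, only the front of the queue ever needs to be checked.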
The implementation of this UploadBuffer class is inspired by the implementation of the LinearAllocator class in the MiniEngine provided with the DirectX-Graphics-Samples repository available on GitHub [1].
The UploadBuffer class provides the following functionality:
- Allocate: Allocates a chunk of memory that can be used to upload data to the GPU.
- Reset: Releases all allocations for reuse.
This provides a very simple interface for the UploadBuffer class.
The header file for the UploadBuffer class is shown next.
UploadBuffer Header
The UploadBuffer header file defines the public and private members of the class. The preamble, which defines the header file dependencies for the class, is shown first.
```cpp
/**
 * An UploadBuffer provides a convenient method to upload resources to the GPU.
 */
#pragma once

#include <Defines.h>

#include <wrl.h>
#include <d3d12.h>

#include <memory>
#include <deque>
```
The Defines.h header file contains a few useful macro definitions. This file is local to the project, but its contents are not shown here for brevity. The source code for this file is available on GitHub here: Defines.h
The wrl.h header file provides access to the ComPtr template class.
The d3d12.h header file contains the interfaces for the DirectX 12 API.
The memory header provides std::shared_ptr, which is used to track the lifetime of memory pages in the allocator. The deque header provides the std::deque container class, which is used to store a pool of memory pages.
```cpp
class UploadBuffer
{
public:
    // Use to upload data to the GPU
    struct Allocation
    {
        void* CPU;
        D3D12_GPU_VIRTUAL_ADDRESS GPU;
    };
```
The Allocation structure is used to return an allocation from the UploadBuffer::Allocate method, which is shown later.
```cpp
    /**
     * @param pageSize The size to use to allocate new pages in GPU memory.
     */
    explicit UploadBuffer(size_t pageSize = _2MB);
```
The UploadBuffer class declares only a single constructor, which takes the size of a memory page as its only argument. The default size of a page of memory is 2MB, which should be sufficient for most cases, depending on usage. A memory page should be approximately large enough to contain all of the allocations for a single command list. If a lot of dynamic memory allocations are made in the command list, it may be worthwhile to use larger pages. It is important to understand that memory pages are never returned to the system. Once a page is allocated, it is not deallocated until the UploadBuffer instance is destroyed. The intention is that the UploadBuffer is reused each frame, so the same allocations will likely be made in the next frame (with different data). Since the pages are never freed, the cost of creating pages every frame is avoided.
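A quick back-of-the-envelope check helps when choosing a page size. The sketch below assumes that _2MB (whose definition in Defines.h is not shown) equals 2 × 1024 × 1024 bytes, and that each allocation is a constant buffer rounded up to the 256-byte placement alignment that D3D12 requires (D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT in d3d12.h). AllocationsPerPage is a hypothetical helper.

```cpp
#include <cstddef>

// Assumption: _2MB in Defines.h is 2 MiB; its definition is not shown in this lesson.
constexpr std::size_t kPageSize = 2 * 1024 * 1024;
// D3D12 places constant buffer data at 256-byte aligned offsets.
constexpr std::size_t kConstantBufferAlignment = 256;

// Number of constant buffer allocations of the given size that fit in one page,
// assuming each one is rounded up to the alignment (as the UploadBuffer does).
constexpr std::size_t AllocationsPerPage(std::size_t sizeInBytes)
{
    const std::size_t alignedSize =
        (sizeInBytes + kConstantBufferAlignment - 1) / kConstantBufferAlignment
        * kConstantBufferAlignment;
    return kPageSize / alignedSize;
}

static_assert(AllocationsPerPage(64) == 8192, "a single 4x4 float matrix still uses 256 bytes");
static_assert(AllocationsPerPage(300) == 4096, "300 bytes rounds up to 512 bytes");
```

In other words, under these assumptions one 2MB page covers 8,192 single-matrix constant buffer updates before a second page is needed.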
```cpp
    /**
     * The maximum size of an allocation is the size of a single page.
     */
    size_t GetPageSize() const { return m_PageSize; }
```
The GetPageSize method simply returns the size of a single page of the allocator. This can be used to check if an allocation can be satisfied by the UploadBuffer. If an allocation can’t be satisfied (if the page size is too small, for example), this might be an indication that the page size needs to be larger.
```cpp
    /**
     * Allocate memory in an Upload heap.
     * An allocation must not exceed the size of a page.
     * Use a memcpy or similar method to copy the
     * buffer data to the CPU pointer in the Allocation structure returned from
     * this function.
     */
    Allocation Allocate(size_t sizeInBytes, size_t alignment);
```
The Allocate method allocates a chunk of memory with the specified size and alignment. The Allocation structure returned from this method contains a CPU pointer that the data is copied to, and the corresponding GPU virtual address that is used to access that data on the GPU.
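The typical calling pattern is: allocate, memcpy into the CPU pointer, then hand the GPU address to the pipeline. The sketch below models this on the CPU only. A std::vector stands in for the mapped upload heap, a std::uint64_t stands in for D3D12_GPU_VIRTUAL_ADDRESS, and FakeUploadPage, Matrix4x4, UploadConstants, and the fixed base address are hypothetical. In real code, the GPU address would typically be passed to ID3D12GraphicsCommandList::SetGraphicsRootConstantBufferView.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// CPU-only stand-in (assumption): the real Allocation holds a mapped CPU pointer
// and a D3D12_GPU_VIRTUAL_ADDRESS into an upload heap resource.
struct Allocation
{
    void* CPU;
    std::uint64_t GPU;
};

struct Matrix4x4 { float m[16]; }; // hypothetical 64-byte constant buffer payload

// Minimal bump allocator over a std::vector that mimics UploadBuffer::Allocate.
class FakeUploadPage
{
public:
    FakeUploadPage(std::size_t size, std::uint64_t gpuBase) : m_Memory(size), m_GPUBase(gpuBase) {}

    Allocation Allocate(std::size_t sizeInBytes, std::size_t alignment)
    {
        const std::size_t offset = (m_Offset + alignment - 1) & ~(alignment - 1);
        const std::size_t alignedSize = (sizeInBytes + alignment - 1) & ~(alignment - 1);
        assert(offset + alignedSize <= m_Memory.size());
        m_Offset = offset + alignedSize;
        return { m_Memory.data() + offset, m_GPUBase + offset };
    }

private:
    std::vector<std::uint8_t> m_Memory; // plays the role of the mapped upload heap
    std::uint64_t m_GPUBase;            // plays the role of GetGPUVirtualAddress()
    std::size_t m_Offset = 0;
};

// Typical usage: allocate with 256-byte alignment, memcpy the CPU data, return the allocation
// so the caller can bind allocation.GPU to the pipeline.
Allocation UploadConstants(FakeUploadPage& page, const Matrix4x4& matrix)
{
    Allocation allocation = page.Allocate(sizeof(Matrix4x4), 256);
    std::memcpy(allocation.CPU, &matrix, sizeof(Matrix4x4));
    return allocation;
}
```

Each call returns a fresh, 256-byte aligned region, so two consecutive uploads of the same matrix land 256 bytes apart.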
```cpp
    /**
     * Release all allocated pages. This should only be done when the command list
     * is finished executing on the CommandQueue.
     */
    void Reset();
```
The Reset method is used to reset any allocations so that the memory can be reused for the next frame.
To keep track of the memory pages, an internal Page struct is defined. The Page struct stores the base CPU and GPU pointers, the offset within the page, and the ID3D12Resource that holds the GPU memory.
```cpp
private:
    // A single page for the allocator.
    struct Page
    {
        Page(size_t sizeInBytes);
        // Unmaps the resource (defined in UploadBuffer.cpp).
        ~Page();
```
The Page structure has only a single constructor, which takes the size of the page as its only argument. This is the same as the pageSize argument that is passed to the constructor of the UploadBuffer class.
```cpp
        // Check to see if the page has room to satisfy the requested
        // allocation.
        bool HasSpace(size_t sizeInBytes, size_t alignment) const;
```
The Page::HasSpace method is used to check if the page can satisfy the requested allocation. If the allocation cannot be satisfied by the current page, the UploadBuffer requests another page (either a previously created page that is available again, or a newly created one).
```cpp
        // Allocate memory from the page.
        // Throws std::bad_alloc if the allocation size is larger
        // than the page size or the size of the allocation exceeds the
        // remaining space in the page.
        Allocation Allocate(size_t sizeInBytes, size_t alignment);
```
The Page::Allocate method is used to perform the actual allocation within the memory page.
```cpp
        // Reset the page for reuse.
        void Reset();
```
The Page::Reset method is used to reset the page for reuse. This resets the offset within the page to 0.
```cpp
    private:

        Microsoft::WRL::ComPtr<ID3D12Resource> m_d3d12Resource;

        // Base pointer.
        void* m_CPUPtr;
        D3D12_GPU_VIRTUAL_ADDRESS m_GPUPtr;

        // Allocated page size.
        size_t m_PageSize;
        // Current allocation offset in bytes.
        size_t m_Offset;
    };
```
The data that is private to the Page structure is the ID3D12Resource that contains the GPU memory for the page, the CPU and GPU base pointers, and the current offset within the page. The m_PageSize variable is also stored to make sure the requested allocation can be satisfied.
The UploadBuffer class needs to keep track of a pool of pages and provide a method to create new pages as required.
```cpp
    // A pool of memory pages.
    using PagePool = std::deque< std::shared_ptr<Page> >;
```
The PagePool type alias defines a std::deque container that stores (shared) pointers to the memory pages.
```cpp
    // Request a page from the pool of available pages
    // or create a new page if there are no available pages.
    std::shared_ptr<Page> RequestPage();
```
The private RequestPage method returns a memory page from the pool of available pages if there is one. If there are no more available pages, a new one is created and added to the page pool.
```cpp
    PagePool m_PagePool;
    PagePool m_AvailablePages;
```
The m_PagePool member variable is a PagePool used to hold all of the pages that have ever been created by the UploadBuffer class. The m_AvailablePages member variable, on the other hand, is a pool of pages that are available for allocation.
```cpp
    std::shared_ptr<Page> m_CurrentPage;

    // The size of each page of memory.
    size_t m_PageSize;

};
```
The m_CurrentPage member variable is used to store a pointer to the current memory page. The m_PageSize variable stores the size of a memory page. This is set to the pageSize constructor argument and is used for allocating new pages.
View the full source code for UploadBuffer.h
UploadBuffer Preamble
The preamble for the source file of the UploadBuffer class contains the header file dependencies that are specific to the implementation of the class.
```cpp
#include <DX12LibPCH.h>

#include <UploadBuffer.h>

#include <Application.h>
#include <Helpers.h>

#include <d3dx12.h>

#include <new> // for std::bad_alloc
```
The DX12LibPCH.h header file is the precompiled header file for the DX12Lib project. All of the classes described in this article are part of the DX12Lib project.
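The UploadBuffer.h header file was described in the previous section. The Application.h header provides the Application class, which is used to retrieve the ID3D12Device (Application::Get().GetDevice()) when a new page is created.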
The Helpers.h header file contains some helper functions that are used by the UploadBuffer class. The source code for this file can be retrieved here: Helpers.h.
The d3dx12.h header provides some helper functions and structs specific to DirectX 12. This file is hosted on GitHub and is not distributed with the Windows 10 SDK. It is good practice to check GitHub for a newer version of this file and to always use the latest version in your own projects.
The new header contains the std::bad_alloc exception class, which is thrown if an allocation larger than the size of a page is requested.
UploadBuffer::UploadBuffer
The UploadBuffer class provides a single parameterized constructor. The constructor takes the size of a memory page as its only argument.
```cpp
UploadBuffer::UploadBuffer(size_t pageSize)
    : m_PageSize(pageSize)
{}
```
Besides setting the m_PageSize member variable, the constructor does nothing. Memory pages are only allocated when an allocation is requested. The UploadBuffer class is intended to be used as an internal class of the custom CommandList class (which is shown later in the lesson). If dynamic allocations are not required by the command list, no pages will be allocated. This is a typical example of Lazy Initialization.
UploadBuffer::Allocate
The Allocate method is used to allocate a chunk (or block) of memory from a memory page. This method returns an UploadBuffer::Allocation struct that was defined in the header file.
```cpp
UploadBuffer::Allocation UploadBuffer::Allocate(size_t sizeInBytes, size_t alignment)
{
    if (sizeInBytes > m_PageSize)
    {
        throw std::bad_alloc();
    }

    // If there is no current page, or the requested allocation exceeds the
    // remaining space in the current page, request a new page.
    if (!m_CurrentPage || !m_CurrentPage->HasSpace(sizeInBytes, alignment))
    {
        m_CurrentPage = RequestPage();
    }

    return m_CurrentPage->Allocate(sizeInBytes, alignment);
}
```
The Allocate method takes two arguments:
- size_t sizeInBytes: The size of the allocation in bytes.
- size_t alignment: The memory alignment of the allocation in bytes. For example, allocations for constant buffers must be aligned to 256 bytes.
If the size of the allocation exceeds the size of a memory page, the method throws a std::bad_alloc exception.
If there is either no memory page (this is the case when the UploadBuffer is first created) or the current page cannot satisfy the request, a new page is requested.
Finally, the actual allocation is made from the current memory page and the resulting allocation is returned to the caller.
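The page-switching logic can be exercised without a GPU. In this hypothetical CPU-only model (LinearAllocatorModel is an invented name, not part of DX12Lib), a page is represented only by its current offset, and a counter records how many pages would have been created. It shows the lazy initialization (no page until the first request), the std::bad_alloc for oversized requests, and the switch to a fresh page when the current one is full.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <optional>

// Hypothetical CPU-only model of UploadBuffer::Allocate; offsets stand in for addresses.
class LinearAllocatorModel
{
public:
    explicit LinearAllocatorModel(std::size_t pageSize) : m_PageSize(pageSize) {}

    // Returns the offset of the allocation within its page.
    std::size_t Allocate(std::size_t sizeInBytes, std::size_t alignment)
    {
        if (sizeInBytes > m_PageSize)
            throw std::bad_alloc();

        // No page yet (lazy initialization) or not enough room: start a new page.
        if (!m_CurrentOffset || !HasSpace(sizeInBytes, alignment))
        {
            m_CurrentOffset = 0;
            ++m_PagesCreated;
        }

        std::size_t& offset = *m_CurrentOffset;
        offset = AlignUp(offset, alignment);
        const std::size_t result = offset;
        offset += AlignUp(sizeInBytes, alignment);
        return result;
    }

    int PagesCreated() const { return m_PagesCreated; }

private:
    static std::size_t AlignUp(std::size_t value, std::size_t alignment)
    {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0); // power of two
        return (value + alignment - 1) & ~(alignment - 1);
    }

    bool HasSpace(std::size_t sizeInBytes, std::size_t alignment) const
    {
        return AlignUp(*m_CurrentOffset, alignment) + AlignUp(sizeInBytes, alignment) <= m_PageSize;
    }

    std::size_t m_PageSize;
    std::optional<std::size_t> m_CurrentOffset; // empty until the first allocation
    int m_PagesCreated = 0;
};
```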
UploadBuffer::RequestPage
If the allocator does not have a page to make an allocation from, or the current page does not have enough space to satisfy the allocation request, a page must be retrieved from the list of available pages or a new page must be created. The RequestPage method returns a memory page that can be used to satisfy allocation requests.
```cpp
std::shared_ptr<UploadBuffer::Page> UploadBuffer::RequestPage()
{
    std::shared_ptr<Page> page;

    if (!m_AvailablePages.empty())
    {
        page = m_AvailablePages.front();
        m_AvailablePages.pop_front();
    }
    else
    {
        page = std::make_shared<Page>(m_PageSize);
        m_PagePool.push_back(page);
    }

    return page;
}
```
If there are pages available in the m_AvailablePages queue, the Page at the front of the queue is retrieved and popped off the queue.
If there are no available pages, a new page is created and pushed to the back of the m_PagePool queue. The m_PagePool queue stores all of the pages created by the allocator. In this case, the page is not added to the m_AvailablePages queue because it is going to be used to satisfy the allocation request. When the UploadBuffer is reset, the m_PagePool queue is used to reset the m_AvailablePages queue (which is shown later when the Reset function is described).
UploadBuffer::Reset
The Reset method is used to reset all of the memory allocations so that they can be reused for the next frame (or, more specifically, the next command list recording).
```cpp
void UploadBuffer::Reset()
{
    m_CurrentPage = nullptr;
    // Reset all available pages.
    m_AvailablePages = m_PagePool;

    for (const auto& page : m_AvailablePages)
    {
        // Reset the page for new allocations.
        page->Reset();
    }
}
```
The Reset method makes all of the pages available again by copying the m_PagePool to the m_AvailablePages queue. The current page is also cleared, so the next allocation requests a page from the available pages.
Each available page is then reset to prepare it for new allocations.
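The interplay between m_PagePool and m_AvailablePages is easier to see in isolation. The sketch below mirrors the RequestPage and Reset policies shown above, using a hypothetical PageModel that only has an offset and an invented PagePoolModel class. It demonstrates that after a reset, requests reuse existing pages instead of creating new ones.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>

// Hypothetical page: only the allocation offset is modeled.
struct PageModel
{
    std::size_t offset = 0;
    void Reset() { offset = 0; }
};

class PagePoolModel
{
public:
    using PagePool = std::deque<std::shared_ptr<PageModel>>;

    // Same policy as UploadBuffer::RequestPage: reuse an available page, else create one.
    std::shared_ptr<PageModel> RequestPage()
    {
        std::shared_ptr<PageModel> page;
        if (!m_AvailablePages.empty())
        {
            page = m_AvailablePages.front();
            m_AvailablePages.pop_front();
        }
        else
        {
            page = std::make_shared<PageModel>();
            m_PagePool.push_back(page); // every page ever created is remembered here
        }
        return page;
    }

    // Same policy as UploadBuffer::Reset: every page becomes available again.
    void Reset()
    {
        m_AvailablePages = m_PagePool;
        for (const auto& page : m_AvailablePages)
            page->Reset();
    }

    std::size_t TotalPages() const { return m_PagePool.size(); }
    std::size_t AvailablePages() const { return m_AvailablePages.size(); }

private:
    PagePool m_PagePool;
    PagePool m_AvailablePages;
};
```

Note that TotalPages only ever grows, matching the statement earlier in this lesson that pages are never returned to the system.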
UploadBuffer::Page::Page
The constructor for a Page takes the size of the page as its only argument.
```cpp
UploadBuffer::Page::Page(size_t sizeInBytes)
    : m_CPUPtr(nullptr)
    , m_GPUPtr(D3D12_GPU_VIRTUAL_ADDRESS(0))
    , m_PageSize(sizeInBytes)
    , m_Offset(0)
{
    auto device = Application::Get().GetDevice();

    // Named locals: taking the address of a temporary is not standard C++.
    CD3DX12_HEAP_PROPERTIES heapProperties(D3D12_HEAP_TYPE_UPLOAD);
    CD3DX12_RESOURCE_DESC resourceDesc = CD3DX12_RESOURCE_DESC::Buffer(m_PageSize);

    ThrowIfFailed(device->CreateCommittedResource(
        &heapProperties,
        D3D12_HEAP_FLAG_NONE,
        &resourceDesc,
        D3D12_RESOURCE_STATE_GENERIC_READ,
        nullptr,
        IID_PPV_ARGS(&m_d3d12Resource)
    ));

    m_GPUPtr = m_d3d12Resource->GetGPUVirtualAddress();
    // Persistently map the resource; upload heap resources may stay mapped.
    ThrowIfFailed(m_d3d12Resource->Map(0, nullptr, &m_CPUPtr));
}
```
The Page constructor also creates the ID3D12Resource as a committed resource in an upload heap. The creation of a committed resource is described in Lesson 2 and, for brevity, is not described again here.
After the resource is created, the GPU and CPU addresses are retrieved using the ID3D12Resource::GetGPUVirtualAddress and ID3D12Resource::Map methods respectively. As long as the resource is created in an upload heap, it is safe to leave the resource mapped until the resource is no longer needed.
UploadBuffer::Page::~Page
The destructor for the Page struct unmaps the resource memory using the ID3D12Resource::Unmap method and resets the CPU and GPU pointers to 0. Since the m_d3d12Resource is stored in a ComPtr, there is no need to release it explicitly; it is released automatically when the Page is destroyed.
```cpp
UploadBuffer::Page::~Page()
{
    m_d3d12Resource->Unmap(0, nullptr);
    m_CPUPtr = nullptr;
    m_GPUPtr = D3D12_GPU_VIRTUAL_ADDRESS(0);
}
```
Before allocating memory from a Page, the Page must have enough space to satisfy the allocation request. The Page::HasSpace method is used to check if the page can satisfy the requested allocation.
UploadBuffer::Page::HasSpace
The Page::HasSpace method checks whether the page can satisfy the requested allocation. This method returns true if the allocation can be satisfied, or false if it cannot.
```cpp
bool UploadBuffer::Page::HasSpace(size_t sizeInBytes, size_t alignment) const
{
    size_t alignedSize = Math::AlignUp(sizeInBytes, alignment);
    size_t alignedOffset = Math::AlignUp(m_Offset, alignment);

    return alignedOffset + alignedSize <= m_PageSize;
}
```
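The HasSpace method must take the alignment into consideration: both the current offset and the requested size are aligned up before they are compared against the page size. If the requested aligned allocation can be satisfied, this method returns true.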
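A worked example of the check, using a hypothetical free function that performs the same computation as Page::HasSpace with the page state passed in (a local power-of-two AlignUp stands in for Math::AlignUp): a page that has already used 100 bytes can fit a 64-byte allocation with 256-byte alignment only if it is at least 512 bytes large, because the offset rounds up from 100 to 256 and the size rounds up from 64 to 256.

```cpp
#include <cstddef>

// Local stand-in for Math::AlignUp; alignment must be a power of two.
constexpr std::size_t AlignUp(std::size_t value, std::size_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

// Same computation as UploadBuffer::Page::HasSpace, with the page state passed in.
constexpr bool HasSpace(std::size_t pageSize, std::size_t offset,
                        std::size_t sizeInBytes, std::size_t alignment)
{
    return AlignUp(offset, alignment) + AlignUp(sizeInBytes, alignment) <= pageSize;
}

static_assert(HasSpace(512, 100, 64, 256), "256 + 256 fits exactly in 512");
static_assert(!HasSpace(500, 100, 64, 256), "256 + 256 exceeds 500");
```

With a small alignment the same request fits easily: with 4-byte alignment, 100 + 64 = 164 bytes are needed.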
UploadBuffer::Page::Allocate
The Page::Allocate method is where the actual allocation occurs. This method returns an Allocation structure that can be used to directly copy (using memcpy, for example) CPU data to the GPU and to bind that GPU address to the pipeline.
```cpp
UploadBuffer::Allocation UploadBuffer::Page::Allocate(size_t sizeInBytes, size_t alignment)
{
    if (!HasSpace(sizeInBytes, alignment))
    {
        // Can't allocate space from page.
        throw std::bad_alloc();
    }

    size_t alignedSize = Math::AlignUp(sizeInBytes, alignment);
    m_Offset = Math::AlignUp(m_Offset, alignment);

    Allocation allocation;
    allocation.CPU = static_cast<uint8_t*>(m_CPUPtr) + m_Offset;
    allocation.GPU = m_GPUPtr + m_Offset;

    m_Offset += alignedSize;

    return allocation;
}
```
If the Page does not have enough space to satisfy the allocation request, this method throws a std::bad_alloc exception.
Since UploadBuffer::Allocate already verifies that the current page has enough space before calling Page::Allocate, the HasSpace check at the start of the Page::Allocate method is redundant. Feel free to remove this check in your own implementation. Both the size and the starting address of an allocation should be aligned to the requested alignment. In most cases, the size of the allocation will already be aligned to the requested alignment (for example, when allocating memory for a vertex or index buffer), but to ensure correctness, the requested allocation size is explicitly aligned up to the requested alignment.

The current offset within the page must also be aligned up to the requested alignment.

The aligned CPU and GPU addresses are then written to the Allocation structure that is returned by this method.

Next, the page’s pointer offset is incremented by the aligned size of the allocation, and finally the Allocation structure is returned to the caller.
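The steps above, together with the optional lock mentioned in the thread-safety note that follows, can be modeled on the CPU. In this hypothetical sketch, a std::vector stands in for the mapped upload heap and a std::uint64_t base stands in for the GPU virtual address. The std::mutex is the optional addition and is not part of the DX12Lib Page; PageModel is an invented name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <new>
#include <vector>

struct Allocation
{
    void* CPU;
    std::uint64_t GPU; // stand-in for D3D12_GPU_VIRTUAL_ADDRESS
};

// Hypothetical CPU-only model of UploadBuffer::Page with an optional mutex.
class PageModel
{
public:
    PageModel(std::size_t pageSize, std::uint64_t gpuBase)
        : m_Memory(pageSize), m_GPUPtr(gpuBase) {}

    Allocation Allocate(std::size_t sizeInBytes, std::size_t alignment)
    {
        std::lock_guard<std::mutex> lock(m_Mutex); // only needed if shared across threads

        const std::size_t alignedSize = AlignUp(sizeInBytes, alignment);
        const std::size_t alignedOffset = AlignUp(m_Offset, alignment);
        if (alignedOffset + alignedSize > m_Memory.size())
            throw std::bad_alloc(); // same failure mode as Page::Allocate

        m_Offset = alignedOffset;
        Allocation allocation{ m_Memory.data() + m_Offset, m_GPUPtr + m_Offset };
        m_Offset += alignedSize;
        return allocation;
    }

    void Reset()
    {
        std::lock_guard<std::mutex> lock(m_Mutex);
        m_Offset = 0;
    }

    const void* Base() const { return m_Memory.data(); }

private:
    static std::size_t AlignUp(std::size_t value, std::size_t alignment)
    {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0); // power of two
        return (value + alignment - 1) & ~(alignment - 1);
    }

    std::vector<std::uint8_t> m_Memory; // stands in for the mapped upload heap memory
    std::uint64_t m_GPUPtr;             // stands in for the resource's GPU base address
    std::size_t m_Offset = 0;
    std::mutex m_Mutex;
};
```

The CPU and GPU addresses always advance by the same offset, which is what allows data written through the CPU pointer to be read at the matching GPU address.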
Note that the Page::Allocate method is not thread safe! If you require thread safety for this method, you may want to add a mutex to the page and acquire it with a std::lock_guard at the beginning of this method. Since I do not use the same instance of an UploadBuffer class across multiple threads, I consider this to be unnecessary overhead (there is some cost associated with locking and unlocking mutexes that I do not want to pay for here).
UploadBuffer::Page::Reset
The Page::Reset method simply resets the page’s pointer offset to 0 so that it can be used to make new allocations.
```cpp
void UploadBuffer::Page::Reset()
{
    m_Offset = 0;
}
```
This concludes the implementation of the UploadBuffer class. In the next section, the DescriptorAllocator class is described. As the name implies, the DescriptorAllocator class is used to allocate (CPU visible) descriptors. CPU visible descriptors are used to create views for resources (for example, Render Target Views (RTV), Depth-Stencil Views (DSV), Constant Buffer Views (CBV), Shader Resource Views (SRV), Unordered Access Views (UAV), and Samplers). Before a CBV, SRV, UAV, or Sampler can be used in a shader, it must be copied to a GPU visible descriptor. The DynamicDescriptorHeap class, which handles copying CPU visible descriptors to GPU visible descriptor heaps, is the subject of the sections that follow.
Descriptor Allocator
The DescriptorAllocator class is used to allocate descriptors from a CPU visible descriptor heap. CPU visible descriptors are useful for “staging” resource descriptors in CPU memory before they are copied to a GPU visible descriptor heap for use in a shader.
CPU visible descriptors are used for describing:
- Render Target Views (RTV)
- Depth-Stencil Views (DSV)
- Constant Buffer Views (CBV)
- Shader Resource Views (SRV)
- Unordered Access Views (UAV)
- Samplers
The DescriptorAllocator class is used to allocate descriptors to the application when loading new resources (like textures). In a typical game engine, resources may need to be loaded into and unloaded from memory at sporadic moments while the player moves around the level. To support large dynamic worlds, it may be necessary to initially load some resources, unload them from memory, and reload different resources. The DescriptorAllocator manages all of the descriptors that are required to describe those resources. Descriptors that are no longer used (for example, when a resource is unloaded from memory) are automatically returned to the descriptor heap for reuse.
The DescriptorAllocator class uses a free list memory allocation scheme inspired by the Variable Size Memory Allocations Manager by Diligent Graphics [2] to manage the descriptors. A free list keeps track of the available (free) blocks of memory in a page. Each entry of the free list stores the offset of a free block from the beginning of the memory page and the size of that block. To satisfy an allocation request, the free list is searched for an entry that is large enough. If the request cannot be satisfied by the current page, a new page is created in memory.
The above image shows an example of pages of memory that are allocated using a free list allocation strategy. The top image shows the initial state of the page before any allocations are made. In this case, the free list contains only a single entry which refers to the entire memory page. The bottom image shows an example of a memory page after several allocations have been made. In this case, the free list contains several entries which represent the available blocks of memory in the page.
To make a new allocation from the page, all of the entries in the free list are searched and the first block that is large enough to satisfy the request is used. If there are no free blocks that can satisfy the request, then a new page is allocated.
This strategy for allocating memory is called first-fit (find the first free block that fits). It is the easiest strategy to implement since it consists only of a linear search through the free list, but it is not the most efficient. A linear search has \(\mathcal{O}(n)\) worst-case complexity, where \(n\) is the number of entries in the free list.
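A first-fit search can be sketched as a single loop over the free list. In this minimal sketch, the `FreeBlock` struct and the `FindFirstFit` function are hypothetical. The free list is a plain `std::vector` of (offset, size) entries kept in address order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct FreeBlock
{
    uint32_t Offset; // offset from the start of the page
    uint32_t Size;   // number of free units in the block
};

// Returns the index of the first block that is large enough for the request,
// or std::nullopt if no block fits. This is a linear, O(n) scan.
std::optional<std::size_t> FindFirstFit(const std::vector<FreeBlock>& freeList, uint32_t size)
{
    for (std::size_t i = 0; i < freeList.size(); ++i)
    {
        if (freeList[i].Size >= size)
            return i;
    }
    return std::nullopt;
}
```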
A better technique is to sort the free blocks by size and perform a binary search over the sizes to find a block that is large enough to satisfy the request. As you may remember from your algorithm analysis class, a binary search has \(\mathcal{O}(\log_2 n)\) complexity (where \(n\) is the number of values to search), which is better than \(\mathcal{O}(n)\).
The above image shows a memory page after several allocations have been made. The binary tree at the bottom of the image represents the entries of the free list sorted by size. Using the binary tree, an allocation of 160 bytes can be satisfied by searching just three nodes, whereas the linear list requires five entries to be searched before the allocation can be satisfied. With only six entries in the free list this may not seem like a significant improvement, but with thousands (or millions) of entries the difference becomes substantial.
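The size-ordered search can be sketched with `std::multimap`, a balanced search tree whose `lower_bound` runs in logarithmic time. In this sketch, `FindBySize` is a hypothetical helper. The keys are block sizes and the values are block offsets; the sizes in the test are made up and do not come from the figure. Note that `lower_bound` returns the *smallest* block that fits, which is a best-fit choice rather than first-fit.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Free blocks keyed by size; several blocks may share a size, hence multimap.
using FreeListBySize = std::multimap<uint32_t /*size*/, uint32_t /*offset*/>;

// Returns the offset of the smallest free block whose size is >= the request,
// found in O(log n) by std::multimap::lower_bound.
std::optional<uint32_t> FindBySize(const FreeListBySize& freeList, uint32_t size)
{
    auto it = freeList.lower_bound(size);
    if (it == freeList.end())
        return std::nullopt; // no block is large enough
    return it->second;
}
```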
Three different classes are used to implement this strategy:
- DescriptorAllocator: This is the main interface to the application for requesting descriptors. The DescriptorAllocator class manages the descriptor pages.
- DescriptorAllocatorPage: This class is a wrapper for an ID3D12DescriptorHeap. The DescriptorAllocatorPage also keeps track of the free list for the page.
- DescriptorAllocation: This class wraps an allocation that is returned from the DescriptorAllocator::Allocate method. The DescriptorAllocation class also stores a pointer back to the page it came from and will automatically free itself if the descriptor(s) are no longer required.
The DescriptorAllocator class is described first.
DescriptorAllocator Class
The implementation of the DescriptorAllocator class is very similar to the UploadBuffer class shown in the previous section. The DescriptorAllocator class stores a pool of DescriptorAllocatorPages. If there are no pages that can satisfy a request, a new page is created and added to the pool. Similar to the UploadBuffer class, the DescriptorAllocator class has a very simple public interface: it provides a method to allocate descriptors and a method to release stale descriptors.
DescriptorAllocator Header
The header file for the DescriptorAllocator class declares the public and private members of the class. The preamble for the header file, which includes the dependencies for the class, is shown first.
```cpp
#include "DescriptorAllocation.h"

#include "d3dx12.h"

#include <cstdint>
#include <mutex>
#include <memory>
#include <set>
#include <vector>

class DescriptorAllocatorPage;
```
The DescriptorAllocator::Allocate method returns a DescriptorAllocation by value, which requires the DescriptorAllocation.h header file to be included (on line 40) in this file.
The ubiquitous d3dx12.h header file included on line 42 is required for the DirectX 12 API and helper structures and functions.
The cstdint header file included on line 44 is required for the fixed-width integer types (uint32_t and uint64_t).
The mutex header file is included for the std::mutex synchronization primitive. The mutex is used in the Allocate and ReleaseStaleDescriptors methods to allow the allocator to be safely accessed from multiple threads.
The memory header file is required for the std::shared_ptr pointer class. Shared pointers are used to track the lifetime of the pages. Each allocation also stores a shared pointer back to the page it came from.
The set header file includes the std::set container class. A set is used to store an ordered collection of indices of the available pages in the page pool.
The vector header file includes the std::vector container class.
The DescriptorAllocatorPage class is used by the DescriptorAllocator class, but its header file does not need to be included since the DescriptorAllocatorPage class is only referenced through pointers (std::shared_ptr) in the DescriptorAllocator class declaration. In this case, it is sufficient to provide a forward declaration of the class (on line 50) without including the header file.
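The following minimal sketch shows why the forward declaration is enough. `Page` and `Pool` are hypothetical names. A `std::shared_ptr<T>` may be declared, stored in a container, and copied while `T` is still incomplete. The complete type is only needed where the object is actually created (here, by `std::make_shared`).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

class Page; // forward declaration: no definition (or header) needed yet

class Pool
{
public:
    std::shared_ptr<Page> CreatePage(); // return type may be incomplete in a declaration
    std::size_t NumPages() const { return m_Pages.size(); }

private:
    std::vector<std::shared_ptr<Page>> m_Pages; // shared_ptr to an incomplete type is allowed
};

// In a real project, this definition would live in its own header that is
// included only by the .cpp file that implements Pool.
class Page
{
public:
    explicit Page(int id) : Id(id) {}
    int Id;
};

std::shared_ptr<Page> Pool::CreatePage()
{
    // The complete Page type is required here, where the object is constructed.
    auto page = std::make_shared<Page>(static_cast<int>(m_Pages.size()));
    m_Pages.push_back(page);
    return page;
}
```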
The DescriptorAllocator class defines two public member functions:
- DescriptorAllocator::Allocate: Allocates a number of contiguous descriptors from a CPU visible descriptor heap.
- DescriptorAllocator::ReleaseStaleDescriptors: Frees any stale descriptors so that they can be returned to the list of available descriptors for reuse. This method should only be called once the descriptors that were freed are no longer referenced by any command list executing on the command queue.
These methods are declared in the header file for the DescriptorAllocator class; their definitions are shown later.
```cpp
class DescriptorAllocator
{
public:
    DescriptorAllocator(D3D12_DESCRIPTOR_HEAP_TYPE type, uint32_t numDescriptorsPerHeap = 256);
    virtual ~DescriptorAllocator();

    /**
     * Allocate a number of contiguous descriptors from a CPU visible descriptor heap.
     *
     * @param numDescriptors The number of contiguous descriptors to allocate.
     * Cannot be more than the number of descriptors per descriptor heap.
     */
    DescriptorAllocation Allocate(uint32_t numDescriptors = 1);

    /**
     * When the frame has completed, the stale descriptors can be released.
     */
    void ReleaseStaleDescriptors( uint64_t frameNumber );
```
The DescriptorAllocator constructor declared on line 55 takes two parameters. The first is the type of descriptors that the DescriptorAllocator will allocate. This can be one of the CBV_SRV_UAV, SAMPLER, RTV, or DSV types.
The second parameter to the constructor is the number of descriptors per descriptor heap. By default, descriptor heaps will be created with 256 descriptors. This value is arbitrary and only needs to be as large as the maximum number of contiguous descriptors that will ever be needed. If all of the descriptors in a descriptor heap have been exhausted, a new heap will be created to satisfy the allocation request.
The DescriptorAllocator::Allocate method allocates a number of contiguous descriptors from a descriptor heap. By default, only a single descriptor is allocated. The numDescriptors argument can be specified if more than one descriptor is required. This method returns a DescriptorAllocation, which is a wrapper for the allocated descriptors. The DescriptorAllocation class is described later.
```cpp
private:
    using DescriptorHeapPool = std::vector< std::shared_ptr<DescriptorAllocatorPage> >;

    // Create a new heap with a specific number of descriptors.
    std::shared_ptr<DescriptorAllocatorPage> CreateAllocatorPage();

    D3D12_DESCRIPTOR_HEAP_TYPE m_HeapType;
    uint32_t m_NumDescriptorsPerHeap;

    DescriptorHeapPool m_HeapPool;
    // Indices of available heaps in the heap pool.
    std::set<size_t> m_AvailableHeaps;

    std::mutex m_AllocationMutex;
};
```
The DescriptorHeapPool defined on line 72 is a type alias for a std::vector of shared pointers to DescriptorAllocatorPage objects.
The DescriptorAllocator::CreateAllocatorPage method declared on line 75 is an internal method that is used to create a new allocator page if there are no pages in the page pool that can satisfy the allocation request.
The m_HeapType variable stores the type of descriptors to allocate. This variable is also used to create new descriptor heaps.
The m_NumDescriptorsPerHeap variable stores the number of descriptors to create per descriptor heap.
The m_HeapPool is a DescriptorHeapPool (a std::vector of shared pointers to DescriptorAllocatorPages). This variable is used to keep track of all allocated pages.
The m_AvailableHeaps is a std::set of indices of the available pages in the m_HeapPool vector. If all of the descriptors in a DescriptorAllocatorPage have been exhausted, then the index of that page in the m_HeapPool vector is removed from the m_AvailableHeaps set. This ensures that full pages are skipped when looking for a DescriptorAllocatorPage that can satisfy the allocation request.
Since the DescriptorAllocator class is intended to be thread safe, a std::mutex is used to guard against multiple threads allocating or deallocating from the DescriptorAllocator at the same time.
In the next sections, the implementation of the DescriptorAllocator is shown.
DescriptorAllocator Preamble
Before defining the methods of the DescriptorAllocator class, a few header files used by the class need to be included.
```cpp
#include <DX12LibPCH.h>

#include <DescriptorAllocator.h>
#include <DescriptorAllocatorPage.h>
```
The DX12LibPCH.h header is the precompiled header file for the DX12Lib project.
The DescriptorAllocator.h header file included on line 3 was just described in the previous section, and the DescriptorAllocatorPage.h header file contains the declaration of the DescriptorAllocatorPage class (which will be shown later).
DescriptorAllocator::DescriptorAllocator
Similar to the constructor for the UploadBuffer class shown previously, the constructor for the DescriptorAllocator class does little more than initialize the class’s member variables.
```cpp
DescriptorAllocator::DescriptorAllocator(D3D12_DESCRIPTOR_HEAP_TYPE type, uint32_t numDescriptorsPerHeap)
    : m_HeapType(type)
    , m_NumDescriptorsPerHeap(numDescriptorsPerHeap)
{
}
```
The m_HeapType and m_NumDescriptorsPerHeap member variables are initialized based on the arguments passed to the constructor.
DescriptorAllocator::CreateAllocatorPage
The CreateAllocatorPage method is used to create a new page of descriptors. The DescriptorAllocatorPage class (which will be shown later) is a wrapper for the ID3D12DescriptorHeap and manages the actual descriptors.
```cpp
std::shared_ptr<DescriptorAllocatorPage> DescriptorAllocator::CreateAllocatorPage()
{
    auto newPage = std::make_shared<DescriptorAllocatorPage>( m_HeapType, m_NumDescriptorsPerHeap );

    m_HeapPool.emplace_back( newPage );
    m_AvailableHeaps.insert( m_HeapPool.size() - 1 );

    return newPage;
}
```
The DescriptorAllocator::CreateAllocatorPage method is very simple. On line 17, a new DescriptorAllocatorPage is created, and on line 19 it is added to the pool. On line 20, the index of the page in the pool is added to the m_AvailableHeaps set.
On line 22, the new page is returned to the calling function.
DescriptorAllocator::Allocate
The Allocate method allocates a contiguous block of descriptors from a descriptor heap. The method iterates through the available descriptor heaps (pages) and tries to allocate the requested number of descriptors until a page is able to fulfill the request. If no page can fulfill the request, then a new descriptor heap (page) that is large enough to fulfill it is created.
```cpp
DescriptorAllocation DescriptorAllocator::Allocate(uint32_t numDescriptors)
{
    std::lock_guard<std::mutex> lock( m_AllocationMutex );

    DescriptorAllocation allocation;

    for ( auto iter = m_AvailableHeaps.begin(); iter != m_AvailableHeaps.end(); )
    {
        auto allocatorPage = m_HeapPool[*iter];

        allocation = allocatorPage->Allocate( numDescriptors );

        if ( allocatorPage->NumFreeHandles() == 0 )
        {
            iter = m_AvailableHeaps.erase( iter );
        }
        else { ++iter; } // erase() already returns the next iterator, so only advance here
        // A valid allocation has been found.
        if ( !allocation.IsNull() )
        {
            break;
        }
    }
```
Before allocating any descriptors, the m_AllocationMutex mutex is locked to ensure the current thread has exclusive access to the allocator.
The result of the allocation is stored in the allocation variable defined on line 29.
On lines 31-47, the available descriptor heaps are iterated, and on line 35 an allocation of the requested number of descriptors is made. If the allocator page was able to satisfy the requested number of descriptors, then a valid descriptor allocation is returned. If the allocation exhausted the allocator page (the number of free descriptor handles reaches 0), then the index of the current page is removed from the set of available heaps (on line 39). Since std::set::erase returns an iterator to the element that follows the erased one, the loop iterator is only incremented (on line 41) when the current index was not erased. Incrementing it after an erase would skip a page or step past the end of the set.
If a valid descriptor handle was allocated from the allocator page (the descriptor handle is not null), then the loop breaks on line 45.
If there were no available allocator pages (which is the case when the DescriptorAllocator is first created) or none of the available allocator pages could satisfy the request, then a new allocator page is created.
```cpp
    // No available heap could satisfy the requested number of descriptors.
    if ( allocation.IsNull() )
    {
        m_NumDescriptorsPerHeap = std::max( m_NumDescriptorsPerHeap, numDescriptors );
        auto newPage = CreateAllocatorPage();

        allocation = newPage->Allocate( numDescriptors );
    }

    return allocation;
}
```
On line 50, the descriptor allocation is checked for validity. If it is still invalid (a null allocation), then m_NumDescriptorsPerHeap is increased on line 52, if necessary, so that it is at least as large as the number of requested descriptors. A new descriptor page is then created on line 53 using the DescriptorAllocator::CreateAllocatorPage method described earlier.
On line 55, the requested allocation is made from the new page (which should always succeed since the page is large enough), and the resulting allocation is returned to the caller on line 58.
DescriptorAllocator::ReleaseStaleDescriptors
The last method of the DescriptorAllocator class is the ReleaseStaleDescriptors method. This method iterates over all of the descriptor heap pages and calls each page’s ReleaseStaleDescriptors method. If, after releasing the stale descriptors, the page has free handles, it is added to the set of available heaps.
```cpp
void DescriptorAllocator::ReleaseStaleDescriptors( uint64_t frameNumber )
{
    std::lock_guard<std::mutex> lock( m_AllocationMutex );

    for ( size_t i = 0; i < m_HeapPool.size(); ++i )
    {
        auto page = m_HeapPool[i];

        page->ReleaseStaleDescriptors( frameNumber );

        if ( page->NumFreeHandles() > 0 )
        {
            m_AvailableHeaps.insert( i );
        }
    }
}
```
In order to prevent modifications of the DescriptorAllocator in other threads, the m_AllocationMutex mutex is locked on line 63.
On lines 65-75, the pages of the heap pool are iterated, and each page’s ReleaseStaleDescriptors method is called. The implementation of the DescriptorAllocatorPage::ReleaseStaleDescriptors method is shown in the following sections.
Pages that have free descriptor handles are added to the set of available heaps on line 73. It is safe to insert the same index into the set multiple times since a std::set only stores unique values.
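As a small illustration of that property: `std::set::insert` returns a pair whose `bool` member is `false` when the value was already present, and in that case the set is left unchanged. `MarkAvailable` is a hypothetical helper name used only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Re-inserting an index that is already present leaves the set unchanged;
// the returned bool reports whether the value was newly added.
inline bool MarkAvailable(std::set<std::size_t>& availableHeaps, std::size_t index)
{
    return availableHeaps.insert(index).second;
}
```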
DescriptorAllocatorPage Class
The purpose of the DescriptorAllocatorPage class is to provide the free list allocation strategy for an ID3D12DescriptorHeap. The DescriptorAllocatorPage class is not intended to be used outside of the DescriptorAllocator class, so end users of the library don’t necessarily need to know the details of this class. These details are more interesting to someone who is writing their own DirectX 12 library, or to someone who wants to understand the implementation of the DX12Lib project that has been created for these tutorials. As previously mentioned, the implementation of this class is heavily inspired by the Variable Size Memory Allocations Manager from Diligent Graphics [2].
The DescriptorAllocatorPage class must be able to satisfy descriptor allocation requests, but it also needs to provide functions to query the number of free handles and to check whether it has sufficient space to satisfy a request. The DescriptorAllocatorPage provides the following (public) methods:
- HasSpace: Checks whether the DescriptorAllocatorPage has a contiguous block of descriptors that is large enough to satisfy a request.
- NumFreeHandles: Returns the number of available descriptor handles in the descriptor heap. Note that due to fragmentation of the free list, allocations that are less than or equal to the number of free handles could still fail.
- Allocate: Allocates a number of contiguous descriptors from the descriptor heap. If the DescriptorAllocatorPage is not able to satisfy the request, this function returns a null DescriptorAllocation.
- Free: Returns a DescriptorAllocation back to the heap. Since descriptors can’t be reused until the command list that is referencing them has finished executing on the command queue, the descriptors are not returned directly to the heap until the render frame has finished executing.
- ReleaseStaleDescriptors: Returns any freed descriptors back to the descriptor heap for reuse.
DescriptorAllocatorPage Header
The declaration of the DescriptorAllocatorPage class is slightly more elaborate than that of the DescriptorAllocator class described in the previous section. The DescriptorAllocatorPage class is not only a wrapper for an ID3D12DescriptorHeap but also implements a free list allocator to manage the descriptors in the heap.
```cpp
#include "DescriptorAllocation.h"

#include <d3d12.h>

#include <wrl.h>

#include <map>
#include <memory>
#include <mutex>
#include <queue>
```
Since the DescriptorAllocatorPage::Allocate method (shown later) returns a DescriptorAllocation object by value, the header file for the DescriptorAllocation class needs to be included on line 37 (a forward declaration is not sufficient).
The d3d12.h header file is required for the ID3D12DescriptorHeap interface.
The wrl.h header file included on line 41 is required for the ComPtr template class.
The map, memory, mutex, and queue headers are required for the STL types that are used by the DescriptorAllocatorPage class.
```cpp
class DescriptorAllocatorPage : public std::enable_shared_from_this<DescriptorAllocatorPage>
{
public:
    DescriptorAllocatorPage( D3D12_DESCRIPTOR_HEAP_TYPE type, uint32_t numDescriptors );

    D3D12_DESCRIPTOR_HEAP_TYPE GetHeapType() const;

    /**
     * Check to see if this descriptor page has a contiguous block of descriptors
     * large enough to satisfy the request.
     */
    bool HasSpace( uint32_t numDescriptors ) const;
```
The DescriptorAllocatorPage class publicly inherits from the std::enable_shared_from_this template class. The std::enable_shared_from_this template class provides the shared_from_this member function, which enables the DescriptorAllocatorPage class to obtain a std::shared_ptr to itself (this will be used in the DescriptorAllocatorPage::Allocate method shown later). This requires the DescriptorAllocatorPage object to be owned by a shared pointer, created using either std::make_shared or std::shared_ptr<T>( new T(...) ). This requirement is acceptable in this case since the DescriptorAllocatorPage class should only be used by the DescriptorAllocator class. On line 17 of the DescriptorAllocator::CreateAllocatorPage method shown previously, the DescriptorAllocatorPage is created using the std::make_shared function.
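The following minimal sketch illustrates the `shared_from_this` requirement. `OwnedPage`, `Allocation`, and `MakeAllocation` are hypothetical names that loosely mirror how an allocation keeps a shared pointer back to its page. Since C++17, calling `shared_from_this` on an object that is not owned by a `std::shared_ptr` throws `std::bad_weak_ptr`.

```cpp
#include <cassert>
#include <memory>

// Hypothetical page type that can hand out shared pointers to itself.
class OwnedPage : public std::enable_shared_from_this<OwnedPage>
{
public:
    std::shared_ptr<OwnedPage> GetShared() { return shared_from_this(); }
};

// Hypothetical allocation that keeps its page alive.
struct Allocation
{
    std::shared_ptr<OwnedPage> Owner;
};

inline Allocation MakeAllocation(OwnedPage& page)
{
    // Only valid if `page` is already owned by a std::shared_ptr.
    return Allocation{ page.GetShared() };
}
```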
The parameterized constructor for the DescriptorAllocatorPage class is declared on line 51. The constructor takes two arguments: the type of descriptor heap to create and the number of descriptors to allocate in the descriptor heap.
The GetHeapType method declared on line 53 simply returns the descriptor heap type that was used to construct the DescriptorAllocatorPage.
The HasSpace method declared on line 59 is used to check whether the DescriptorAllocatorPage has a contiguous block of descriptors in the descriptor heap that is large enough to satisfy a request. It is often more efficient to check whether an allocation request will succeed before making it than to make the request and then check for failure.
```cpp
    /**
     * Get the number of available handles in the heap.
     */
    uint32_t NumFreeHandles() const;

    /**
     * Allocate a number of descriptors from this descriptor heap.
     * If the allocation cannot be satisfied, then a NULL descriptor
     * is returned.
     */
    DescriptorAllocation Allocate( uint32_t numDescriptors );
```
The NumFreeHandles method declared on line 64 returns how many descriptor handles the DescriptorAllocatorPage still has available. Due to fragmentation of the free list, a request for a contiguous block of descriptors that is smaller than the total number of free handles could still fail. For example, the fragmented free list shown in the previous image has 544 free descriptors, but the largest contiguous block is only 128 descriptors wide.
The Allocate method declared on line 71 is used to allocate a number of descriptors from the descriptor heap and returns them as a DescriptorAllocation. If the allocation fails, a null DescriptorAllocation is returned. To check whether the allocation is valid, the DescriptorAllocation::IsNull method is used. This method is shown later in the section about the DescriptorAllocation class.
```cpp
    /**
     * Return a descriptor back to the heap.
     * @param frameNumber Stale descriptors are not freed directly, but put
     * on a stale allocations queue. Stale allocations are returned to the heap
     * using the DescriptorAllocatorPage::ReleaseStaleDescriptors method.
     */
    void Free( DescriptorAllocation&& descriptorHandle, uint64_t frameNumber );

    /**
     * Return the stale descriptors back to the descriptor heap.
     */
    void ReleaseStaleDescriptors( uint64_t frameNumber );
```
The Free method declared on line 79 is used to free a DescriptorAllocation that was previously allocated using the DescriptorAllocatorPage::Allocate method. It is not necessary to call this method directly since the DescriptorAllocation class automatically frees itself back to the DescriptorAllocatorPage it came from when it is no longer in use. This method takes the DescriptorAllocation as an rvalue reference. This signals that the caller gives up ownership of the allocation (the caller has to pass it with std::move), so the original DescriptorAllocation should be treated as invalid after the call.
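The ownership transfer can be illustrated with a small move-only type. In this sketch, `Handle` and `Consume` are hypothetical names; they only model the idea and are not the DescriptorAllocation class. An rvalue-reference parameter does not move anything by itself. The source only becomes null once some code actually moves from it (here, `Consume` does so explicitly).

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical move-only handle: moving transfers ownership and
// leaves the source in a null state.
class Handle
{
public:
    Handle() = default;
    Handle(uint32_t offset, uint32_t count) : m_Offset(offset), m_Count(count) {}

    Handle(const Handle&) = delete;
    Handle& operator=(const Handle&) = delete;

    Handle(Handle&& other) noexcept
        : m_Offset(other.m_Offset), m_Count(other.m_Count)
    {
        other.m_Count = 0; // the source becomes null
    }

    Handle& operator=(Handle&& other) noexcept
    {
        if (this != &other)
        {
            m_Offset = other.m_Offset;
            m_Count  = other.m_Count;
            other.m_Count = 0;
        }
        return *this;
    }

    bool IsNull() const { return m_Count == 0; }
    uint32_t Count() const { return m_Count; }

private:
    uint32_t m_Offset = 0;
    uint32_t m_Count  = 0;
};

// Hypothetical sink that takes ownership of the handle.
inline uint32_t Consume(Handle&& handle)
{
    Handle owned(std::move(handle)); // the caller's handle is now null
    return owned.Count();
}
```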
The ReleaseStaleDescriptors method declared on line 84 releases the stale descriptors back to the descriptor heap for reuse. This method takes the completed frame number as its only argument. All of the descriptors that were freed during that frame will be returned to the heap.
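Deferred release by frame number can be sketched with a `std::queue`. This is a minimal sketch under assumptions that are not taken from the tutorial. `StaleQueue`, `Free`, and `ReleaseStale` are hypothetical names. Ranges are assumed to be freed in non-decreasing frame order, so the queue is FIFO-sorted by frame. Everything freed on or before the completed frame is released. Released ranges are collected into a vector instead of being merged back into a free list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

class StaleQueue
{
public:
    struct Range { uint32_t Offset; uint32_t Size; uint64_t FrameNumber; };

    // Descriptors freed during frameNumber cannot be reused yet: queue them.
    void Free(uint32_t offset, uint32_t size, uint64_t frameNumber)
    {
        m_Stale.push(Range{ offset, size, frameNumber });
    }

    // Release every range that was freed on or before the completed frame.
    void ReleaseStale(uint64_t completedFrame)
    {
        while (!m_Stale.empty() && m_Stale.front().FrameNumber <= completedFrame)
        {
            m_Released.push_back(m_Stale.front());
            m_Stale.pop();
        }
    }

    std::size_t NumStale() const { return m_Stale.size(); }
    const std::vector<Range>& Released() const { return m_Released; }

private:
    std::queue<Range>  m_Stale;
    std::vector<Range> m_Released; // stands in for returning blocks to the free list
};
```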
The DescriptorAllocatorPage defines a few additional methods that are internal to this class.
```cpp
protected:

    // Compute the offset of the descriptor handle from the start of the heap.
    uint32_t ComputeOffset( D3D12_CPU_DESCRIPTOR_HANDLE handle );

    // Adds a new block to the free list.
    void AddNewBlock( uint32_t offset, uint32_t numDescriptors );

    // Free a block of descriptors.
    // This will also merge free blocks in the free list to form larger blocks
    // that can be reused.
    void FreeBlock( uint32_t offset, uint32_t numDescriptors );
```
The ComputeOffset method computes the number of descriptors between the base descriptor of the heap and the specified descriptor handle. This method is used to determine where a descriptor needs to be placed back in the heap when it is freed.
The AddNewBlock method adds a block of descriptors to the free list. This method is used to initialize the free list (with a single block containing all descriptors), when splitting a block of descriptors during allocation, and when merging neighboring blocks as descriptors are freed.
The FreeBlock method is used to free a block of descriptors. This method is used by the ReleaseStaleDescriptors method to commit the stale descriptors back to the descriptor heap. The FreeBlock method also checks whether neighboring blocks in the free list can be merged. Merging neighboring free blocks reduces fragmentation of the free list.
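Merging with neighbors can be sketched with a single offset-ordered `std::map`. This is a minimal sketch. `FreeList` and `FreeAndMerge` are hypothetical names, and the size-ordered multimap that the real class also has to keep in sync is left out for brevity. `upper_bound` finds the first free block after the freed range, and `std::prev` of that iterator gives the block before it.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

using FreeList = std::map<uint32_t /*offset*/, uint32_t /*size*/>;

// Returns [offset, offset + size) to the free list, merging it with the
// neighboring free blocks when they are directly adjacent.
void FreeAndMerge(FreeList& freeList, uint32_t offset, uint32_t size)
{
    // First free block that starts after the freed range (may be end()).
    auto next = freeList.upper_bound(offset);

    // Merge with the previous block if it ends exactly where this range starts.
    if (next != freeList.begin())
    {
        auto prev = std::prev(next);
        if (prev->first + prev->second == offset)
        {
            offset = prev->first;
            size  += prev->second;
            freeList.erase(prev); // does not invalidate `next`
        }
    }

    // Merge with the next block if it starts exactly where this range ends.
    if (next != freeList.end() && offset + size == next->first)
    {
        size += next->second;
        freeList.erase(next);
    }

    freeList.emplace(offset, size);
}
```

For example, freeing 6 descriptors at offset 4 between free blocks [0, 4) and [10, 16) collapses all three into a single block [0, 16).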
The DescriptorAllocatorPage class also defines some private data members.
```cpp
private:
    // The offset (in descriptors) within the descriptor heap.
    using OffsetType = uint32_t;
    // The number of descriptors that are available.
    using SizeType = uint32_t;
```
In order to improve code readability and reduce ambiguity, the OffsetType type alias is defined to refer to an offset (in descriptors) within the descriptor heap. The SizeType type alias is defined to refer to the number of descriptors in a block (in the free list).
```cpp
    struct FreeBlockInfo;
    // A map that lists the free blocks by the offset within the descriptor heap.
    using FreeListByOffset = std::map<OffsetType, FreeBlockInfo>;

    // A map that lists the free blocks by size.
    // Needs to be a multimap since multiple blocks can have the same size.
    using FreeListBySize = std::multimap<SizeType, FreeListByOffset::iterator>;

    struct FreeBlockInfo
    {
        FreeBlockInfo( SizeType size )
            : Size( size )
        {}

        SizeType Size;
        FreeListBySize::iterator FreeListBySizeIt;
    };
```
The FreeBlockInfo struct is forward declared on line 105 and defined on line 113. The forward declaration of the FreeBlockInfo struct is required to create the FreeListByOffset type alias on line 107. The FreeListByOffset type is an alias of a std::map that maps the offset of a free block within the descriptor heap to its FreeBlockInfo.
The FreeListBySize type is an alias of a std::multimap that provides a mechanism to quickly find the first block (in size order) in the free list that can satisfy an allocation request. The FreeListBySize type needs to be a std::multimap since there can be many blocks in the free list with the same size.
The FreeBlockInfo struct simply stores the size of the block in the free list and a reference (iterator) to its entry in the FreeListBySize map. The FreeBlockInfo struct stores the iterator to its entry in the FreeListBySize map so that the entry can be quickly removed (without searching) when merging neighboring blocks in the free list.
The image above shows an example of a free list after several allocations have been made. Each entry in the FreeListByOffset map stores a reference to the corresponding entry in the FreeListBySize map. Similarly, each entry in the FreeListBySize map stores a reference to the corresponding entry in the FreeListByOffset map. This solution resembles a bidirectional map (Boost.Bimap), which provides efficient lookup on both the offset and the size of each entry in the free list.
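The two cross-referencing maps can be sketched as a small allocator. This is a minimal sketch, not the tutorial's DescriptorAllocatorPage. `DualMapFreeList`, `BlockInfo`, and the method names are hypothetical. It follows the same forward-declaration pattern as the header above. Allocation takes the smallest block that fits (`lower_bound` on the size map), removes it from both maps, and returns any remainder to the free list. The stored `BySizeIt` iterator is what a merge step would use to drop a size entry without searching; merging itself is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

class DualMapFreeList
{
public:
    explicit DualMapFreeList(uint32_t numDescriptors) { AddNewBlock(0, numDescriptors); }

    // Allocates `count` contiguous units; returns false if no block is large enough.
    bool Allocate(uint32_t count, uint32_t& outOffset)
    {
        // Smallest block whose size is >= count: O(log n).
        auto bySizeIt = m_BySize.lower_bound(count);
        if (bySizeIt == m_BySize.end())
            return false;

        auto byOffsetIt = bySizeIt->second; // cross-reference into m_ByOffset
        uint32_t offset = byOffsetIt->first;
        uint32_t size   = bySizeIt->first;

        // Remove the block from both maps...
        m_BySize.erase(bySizeIt);
        m_ByOffset.erase(byOffsetIt);

        // ...and give any remainder back to the free list.
        if (size > count)
            AddNewBlock(offset + count, size - count);

        outOffset = offset;
        return true;
    }

    std::size_t NumBlocks() const { return m_ByOffset.size(); }

private:
    struct BlockInfo;
    using ByOffset = std::map<uint32_t, BlockInfo>;
    using BySize   = std::multimap<uint32_t, ByOffset::iterator>;

    struct BlockInfo
    {
        uint32_t Size;
        BySize::iterator BySizeIt; // lets a merge remove the size entry without searching
    };

    void AddNewBlock(uint32_t offset, uint32_t size)
    {
        auto byOffsetIt = m_ByOffset.emplace(offset, BlockInfo{ size, {} }).first;
        byOffsetIt->second.BySizeIt = m_BySize.emplace(size, byOffsetIt);
    }

    ByOffset m_ByOffset;
    BySize   m_BySize;
};
```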
The StaleDescriptorInfo struct is used to keep track of descriptors in the descriptor heap that have been freed but can’t be reused until the frame in which they were freed has finished executing on the GPU.
```cpp
    struct StaleDescriptorInfo
    {
        StaleDescriptorInfo( OffsetType offset, SizeType size, uint64_t frame )
            : Offset( offset )
            , Size( size )
            , FrameNumber( frame )
        {}

        // The offset within the descriptor heap.
        OffsetType Offset;
        // The number of descriptors
        SizeType Size;
        // The frame number that the descriptor was freed.
        uint64_t FrameNumber;
    };
```
The StaleDescriptorInfo struct tracks the offset of the first descriptor and the number of descriptors in the descriptor range. The FrameNumber member stores the frame in which the descriptors were freed.
```cpp
    // Stale descriptors are queued for release until the frame that they were freed
    // has completed.
    using StaleDescriptorQueue = std::queue<StaleDescriptorInfo>;

    FreeListByOffset m_FreeListByOffset;
    FreeListBySize m_FreeListBySize;
    StaleDescriptorQueue m_StaleDescriptors;
```
The StaleDescriptorQueue is a type alias for a std::queue of StaleDescriptorInfo structures.
The m_FreeListByOffset, m_FreeListBySize, and m_StaleDescriptors member variables are the data structures needed to track the state of the free list.
```cpp
    Microsoft::WRL::ComPtr<ID3D12DescriptorHeap> m_d3d12DescriptorHeap;
    D3D12_DESCRIPTOR_HEAP_TYPE m_HeapType;
    CD3DX12_CPU_DESCRIPTOR_HANDLE m_BaseDescriptor;
    uint32_t m_DescriptorHandleIncrementSize;
    uint32_t m_NumDescriptorsInHeap;
    uint32_t m_NumFreeHandles;

    std::mutex m_AllocationMutex;
};
```
On line 147, the underlying ID3D12DescriptorHeap interface is declared.
The m_HeapType variable defines the type of descriptor heap used by the DescriptorAllocatorPage class.
Since the increment size of a descriptor within a descriptor heap is vendor specific, it must be queried at runtime (see Tutorial 1 for more information on descriptor heaps). The descriptor increment size is stored in the m_DescriptorHandleIncrementSize member variable.
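Because descriptors in a heap are laid out at a fixed stride, the handle of the descriptor at a given offset is the heap's base address plus `offset * incrementSize`. The inverse calculation gives the offset that a method like ComputeOffset needs. In this sketch, `CpuHandle` is a hypothetical stand-in for `D3D12_CPU_DESCRIPTOR_HANDLE` (which wraps a single `SIZE_T ptr` member), and `OffsetHandle`/`ComputeOffset` are illustrative free functions. The increment size of 32 used in the test is arbitrary, not a value reported by any particular GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for D3D12_CPU_DESCRIPTOR_HANDLE, which holds a single SIZE_T ptr.
struct CpuHandle
{
    std::size_t ptr;
};

// Handle of the descriptor `offset` slots after the base
// (the same arithmetic that CD3DX12_CPU_DESCRIPTOR_HANDLE::Offset performs).
inline CpuHandle OffsetHandle(CpuHandle base, uint32_t offset, uint32_t incrementSize)
{
    return CpuHandle{ base.ptr + static_cast<std::size_t>(offset) * incrementSize };
}

// Inverse: number of descriptors between the base and `handle`.
inline uint32_t ComputeOffset(CpuHandle base, CpuHandle handle, uint32_t incrementSize)
{
    return static_cast<uint32_t>((handle.ptr - base.ptr) / incrementSize);
}
```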
The total number of descriptors in the descriptor heap is saved in the m_NumDescriptorsInHeap member variable, and the number of remaining (free) descriptors in the heap is stored in the m_NumFreeHandles member variable.
The m_AllocationMutex defined on line 154 is used to make allocations and deallocations safe across multiple threads.
DescriptorAllocatorPage Preamble
The DescriptorAllocatorPage class requires a few additional headers in order to compile.
```cpp
#include <DX12LibPCH.h>

#include <DescriptorAllocatorPage.h>
#include <Application.h>
```
The DX12LibPCH.h header provides a precompiled header file for the DX12Lib project.
The DescriptorAllocatorPage.h header file was described in the previous section.
The Application.h header file provides access to the Application class. The Application class was briefly described in Tutorial 2. The Application class is used to get access to the ID3D12Device object.
DescriptorAllocatorPage::DescriptorAllocatorPage
The parameterized constructor for the DescriptorAllocatorPage class takes the heap type and the number of descriptors to allocate in the heap as arguments.
```cpp
DescriptorAllocatorPage::DescriptorAllocatorPage( D3D12_DESCRIPTOR_HEAP_TYPE type, uint32_t numDescriptors )
    : m_HeapType( type )
    , m_NumDescriptorsInHeap( numDescriptors )
{
    auto device = Application::Get().GetDevice();

    D3D12_DESCRIPTOR_HEAP_DESC heapDesc = {};
    heapDesc.Type = m_HeapType;
    heapDesc.NumDescriptors = m_NumDescriptorsInHeap;

    ThrowIfFailed( device->CreateDescriptorHeap( &heapDesc, IID_PPV_ARGS( &m_d3d12DescriptorHeap ) ) );

    m_BaseDescriptor = m_d3d12DescriptorHeap->GetCPUDescriptorHandleForHeapStart();
    m_DescriptorHandleIncrementSize = device->GetDescriptorHandleIncrementSize( m_HeapType );
    m_NumFreeHandles = m_NumDescriptorsInHeap;

    // Initialize the free lists
    AddNewBlock( 0, m_NumFreeHandles );
}
```
On line 10, a pointer to the ID3D12Device is retrieved from the Application class.
Before creating the ID3D12DescriptorHeap object, it must be described. The D3D12_DESCRIPTOR_HEAP_DESC structure is used to describe the ID3D12DescriptorHeap and has the following members [3]:
- D3D12_DESCRIPTOR_HEAP_TYPE Type: Specifies the types of descriptors in the heap.
- UINT NumDescriptors: The number of descriptors in the heap.
- D3D12_DESCRIPTOR_HEAP_FLAGS Flags: A combination of D3D12_DESCRIPTOR_HEAP_FLAGS values that are combined by using a bitwise OR operation. The following flags are currently available:
  - D3D12_DESCRIPTOR_HEAP_FLAG_NONE: Indicates default usage of a heap.
  - D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE: This flag can optionally be set on a descriptor heap to indicate it is to be bound on a command list for reference by shaders. Descriptor heaps created without this flag allow applications the option to stage descriptors in CPU memory before copying them to a shader visible descriptor heap, as a convenience. But it is also fine for applications to directly create descriptors into shader visible descriptor heaps with no requirement to stage anything on the CPU. This flag only applies to CBV, SRV, UAV and samplers. It does not apply to other descriptor heap types since shaders do not directly reference the other types.
- UINT NodeMask: For single-adapter operation, set this to zero. If there are multiple adapter nodes, set a bit to identify the node (one of the device’s physical adapters) to which the descriptor heap applies. Each bit in the mask corresponds to a single node. Only one bit must be set.
On line 16, the actual ID3D12DescriptorHeap is created using the ID3D12Device::CreateDescriptorHeap method. Since the Flags member of heapDesc is left zero-initialized (D3D12_DESCRIPTOR_HEAP_FLAG_NONE), the resulting descriptor heap is CPU visible only, not shader visible.
On line 18, the m_BaseDescriptor member variable is initialized to the first descriptor handle in the heap, and on line 19, the increment size of a descriptor in the descriptor heap is queried using the ID3D12Device::GetDescriptorHandleIncrementSize method. On line 20, the number of free handles in the DescriptorAllocatorPage is initialized to the number of handles in the ID3D12DescriptorHeap.
On line 23, a single block of descriptors is added to the free list using the AddNewBlock method. The new block has an offset of 0 and a size of m_NumFreeHandles.
DescriptorAllocatorPage::GetHeapType
The GetHeapType method is simply a getter method that returns the heap type.
```cpp
D3D12_DESCRIPTOR_HEAP_TYPE DescriptorAllocatorPage::GetHeapType() const
{
    return m_HeapType;
}
```
DescriptorAllocatorPage::NumFreeHandles
The NumFreeHandles method is simply a getter method that returns the number of free handles that are currently available in the heap.
```cpp
uint32_t DescriptorAllocatorPage::NumFreeHandles() const
{
    return m_NumFreeHandles;
}
```
DescriptorAllocatorPage::HasSpace
The HasSpace method is used to check if the DescriptorAllocatorPage has a free block of descriptors that is large enough to satisfy a request for a particular number of descriptors.
```cpp
bool DescriptorAllocatorPage::HasSpace( uint32_t numDescriptors ) const
{
    return m_FreeListBySize.lower_bound( numDescriptors ) != m_FreeListBySize.end();
}
```
The std::multimap::lower_bound method is used to find the first entry in the free list that is not less than (in other words, greater than or equal to) the requested number of descriptors. If no such element exists, then the past-the-end iterator is returned, which indicates that the free list cannot satisfy the requested number of descriptors. If the DescriptorAllocatorPage is not able to satisfy the request, then the DescriptorAllocator will create a new page (as was shown previously in the DescriptorAllocator::Allocate method).
DescriptorAllocatorPage::AddNewBlock
The AddNewBlock method adds a block to the free list. The block is added to both the FreeListByOffset map and the FreeListBySize multimap. Each entry stores an iterator to its counterpart in the other container, which links the two containers into a bi-directional map for optimized lookups.
```cpp
void DescriptorAllocatorPage::AddNewBlock( uint32_t offset, uint32_t numDescriptors )
{
    auto offsetIt = m_FreeListByOffset.emplace( offset, numDescriptors );
    auto sizeIt = m_FreeListBySize.emplace( numDescriptors, offsetIt.first );
    offsetIt.first->second.FreeListBySizeIt = sizeIt;
}
```
On line 43, the std::map::emplace method is used to emplace an element into the m_FreeListByOffset map. This method returns a std::pair where the first element is an iterator to the inserted element (the second element is a bool that indicates whether the insertion took place). The iterator to the inserted element is used to add an entry to the m_FreeListBySize multimap on line 44.
On line 45, the FreeBlockInfo's FreeListBySizeIt member variable needs to be patched to point to the corresponding iterator in the m_FreeListBySize multimap.
DescriptorAllocatorPage::Allocate
The Allocate method is used to allocate descriptors from the free list. When a block of descriptors is allocated from the free list, the existing free block may need to be split, with the remaining descriptors “returned” to the free list. For example, if only a single descriptor is requested by the caller and the free list has a free block of 100 descriptors, then the free block of 100 descriptors is removed from the free list, 1 descriptor is allocated from that block, and a free block of 99 descriptors is added back to the free list.
```cpp
DescriptorAllocation DescriptorAllocatorPage::Allocate( uint32_t numDescriptors )
{
    std::lock_guard<std::mutex> lock( m_AllocationMutex );
```
To prevent race conditions when multiple threads make allocations on the same DescriptorAllocatorPage, the m_AllocationMutex is locked on line 50.
```cpp
    // There are fewer than the requested number of descriptors left in the heap.
    // Return a NULL descriptor and try another heap.
    if ( numDescriptors > m_NumFreeHandles )
    {
        return DescriptorAllocation();
    }

    // Get the first block that is large enough to satisfy the request.
    auto smallestBlockIt = m_FreeListBySize.lower_bound( numDescriptors );
    if ( smallestBlockIt == m_FreeListBySize.end() )
    {
        // There was no free block that could satisfy the request.
        return DescriptorAllocation();
    }
```
On line 54, the total number of free handles in the page is checked, and on line 61, the free list is checked for a single block that is large enough to satisfy the request (the page may have enough free handles in total, but no contiguous block of the requested size). If either check fails, a default (null) DescriptorAllocation is returned to the calling function. If these checks pass, then smallestBlockIt contains an iterator to the first entry in the FreeListBySize multimap that is not less than the requested number of descriptors.
```cpp
    // The size of the smallest block that satisfies the request.
    auto blockSize = smallestBlockIt->first;

    // The pointer to the same entry in the FreeListByOffset map.
    auto offsetIt = smallestBlockIt->second;

    // The offset in the descriptor heap.
    auto offset = offsetIt->first;
```
The smallestBlockIt iterator is used to retrieve the size of the free block and to get the iterator to the corresponding entry in the FreeListByOffset map in \(\mathcal{O}(1)\) constant time (which is better than the \(\mathcal{O}(\log_2{n})\) logarithmic time complexity of the std::map::find method).
The free block that was found needs to be removed from the free list and a new block that results from splitting the free block needs to be added back to the free list.
```cpp
    // Remove the existing free block from the free list.
    m_FreeListBySize.erase( smallestBlockIt );
    m_FreeListByOffset.erase( offsetIt );

    // Compute the new free block that results from splitting this block.
    auto newOffset = offset + numDescriptors;
    auto newSize = blockSize - numDescriptors;

    if ( newSize > 0 )
    {
        // If the allocation didn't exactly match the requested size,
        // return the left-over to the free list.
        AddNewBlock( newOffset, newSize );
    }
```
On lines 77-78, the free block that was found is removed from the free list.
On lines 81-82, the offset and size of the new free block that results from splitting the current block are computed. If the size is not 0, the new block is added to the free list using the AddNewBlock method on line 88.
```cpp
    // Decrement free handles.
    m_NumFreeHandles -= numDescriptors;

    return DescriptorAllocation(
        CD3DX12_CPU_DESCRIPTOR_HANDLE( m_BaseDescriptor, offset, m_DescriptorHandleIncrementSize ),
        numDescriptors, m_DescriptorHandleIncrementSize, shared_from_this() );
}
```
The total number of free handles is decremented by the number of requested descriptors on line 92, and the resulting DescriptorAllocation is returned to the calling function on line 94.
DescriptorAllocatorPage::ComputeOffset
The ComputeOffset method is used to compute the offset (in descriptor handles) from the base descriptor (the first descriptor in the descriptor heap) to a given descriptor.
```cpp
uint32_t DescriptorAllocatorPage::ComputeOffset( D3D12_CPU_DESCRIPTOR_HANDLE handle )
{
    return static_cast<uint32_t>( handle.ptr - m_BaseDescriptor.ptr ) / m_DescriptorHandleIncrementSize;
}
```
The ComputeOffset method is used by the Free method (shown next) to compute the offset of a descriptor in the descriptor heap. Since a D3D12_CPU_DESCRIPTOR_HANDLE is just a structure that contains a single SIZE_T member variable, computing the offset of a descriptor in a descriptor heap is a matter of simple arithmetic.
DescriptorAllocatorPage::Free
The Free method returns a block of descriptors back to the free list. Descriptors are not immediately returned to the free list but instead are added to a queue of stale descriptors. Descriptors are only returned to the free list once the frame they were freed in has finished executing on the GPU. This ensures that descriptors are not reused while they may still be referenced by a GPU command.
```cpp
void DescriptorAllocatorPage::Free( DescriptorAllocation&& descriptor, uint64_t frameNumber )
{
    // Compute the offset of the descriptor within the descriptor heap.
    auto offset = ComputeOffset( descriptor.GetDescriptorHandle() );

    std::lock_guard<std::mutex> lock( m_AllocationMutex );

    // Don't add the block directly to the free list until the frame has completed.
    m_StaleDescriptors.emplace( offset, descriptor.GetNumHandles(), frameNumber );
}
```
The DescriptorAllocation doesn't store the offset of the descriptor within the descriptor heap, but the offset can be computed using the ComputeOffset method.
To guarantee that the m_StaleDescriptors queue is only modified by a single thread at a time, the m_AllocationMutex mutex is locked on line 109, and the StaleDescriptorInfo is added to the m_StaleDescriptors queue on line 112.
DescriptorAllocatorPage::FreeBlock
The FreeBlock method is executed when the stale descriptors are added back to the free list. When adding a block back to the free list, neighboring blocks should be merged to minimize fragmentation of the free list. Four cases can occur when adding a block back to the free list:
- Case 1: There is a block in the free list that is immediately preceding the block being freed.
- Case 2: There is a block in the free list that is immediately following the block being freed.
- Case 3: There is both a block in the free list immediately preceding and immediately following the block being freed.
- Case 4: There is neither a block in the free list immediately preceding nor immediately following the block being freed.
If Case 1 is true then the previous block in the free list needs to be merged with the block being freed. If Case 2 is true then the next block in the free list needs to be merged with the block being freed.
The above image shows the two cases that can occur when returning a block back to the free list. Case 3 and Case 4 do not need to be handled in any special way since those cases are already handled implicitly.
```cpp
void DescriptorAllocatorPage::FreeBlock( uint32_t offset, uint32_t numDescriptors )
{
    // Find the first element whose offset is greater than the specified offset.
    // This is the block that should appear after the block that is being freed.
    auto nextBlockIt = m_FreeListByOffset.upper_bound( offset );

    // Find the block that appears before the block being freed.
    auto prevBlockIt = nextBlockIt;
    // If it's not the first block in the list.
    if ( prevBlockIt != m_FreeListByOffset.begin() )
    {
        // Go to the previous block in the list.
        --prevBlockIt;
    }
    else
    {
        // Otherwise, just set it to the end of the list to indicate that no
        // block comes before the one being freed.
        prevBlockIt = m_FreeListByOffset.end();
    }

    // Add the number of free handles back to the heap.
    // This needs to be done before merging any blocks since merging
    // blocks modifies the numDescriptors variable.
    m_NumFreeHandles += numDescriptors;
```
On line 119, the block that comes after the block being freed is queried from the FreeListByOffset map using the std::map::upper_bound method. The upper_bound method returns the first element whose key is strictly greater than the specified key. If no such element exists, this method returns the past-the-end (end) iterator.
The previous block in the free list (prevBlockIt) is the one that appears just before the block being freed. The previous block is initialized on line 122 to be the same as the next block (nextBlockIt), and if it is not the first element in the free list, it is decremented on line 127 to point to the previous element. If the free list is completely empty (a special instance of Case 4), then nextBlockIt, prevBlockIt, and the begin iterator will all be equal to the past-the-end (end) iterator.
If there is only a single item in the free list, then it either comes before or after the block being freed. If it comes after the block being freed, then nextBlockIt will point to that element and prevBlockIt will be set to the end iterator on line 133. If it comes before the block being freed, then nextBlockIt will point to the end iterator and prevBlockIt will point to that element after being decremented on line 127.
The number of free handles is incremented by the number of handles being freed on line 139.
First, Case 1 is checked (the previous block immediately precedes the block being freed).
```cpp
    if ( prevBlockIt != m_FreeListByOffset.end() &&
         offset == prevBlockIt->first + prevBlockIt->second.Size )
    {
        // The previous block is exactly behind the block that is to be freed.
        //
        // PrevBlock.Offset           Offset
        // |                          |
        // |<-----PrevBlock.Size----->|<------Size-------->|
        //

        // Increase the block size by the size of merging with the previous block.
        offset = prevBlockIt->first;
        numDescriptors += prevBlockIt->second.Size;

        // Remove the previous block from the free list.
        m_FreeListBySize.erase( prevBlockIt->second.FreeListBySizeIt );
        m_FreeListByOffset.erase( prevBlockIt );
    }
```
If there is a block immediately preceding the block being freed, then that block is merged with the block being freed.
Case 2 is checked next (the next block in the free list immediately follows the block being freed).
```cpp
    if ( nextBlockIt != m_FreeListByOffset.end() &&
         offset + numDescriptors == nextBlockIt->first )
    {
        // The next block is exactly in front of the block that is to be freed.
        //
        // Offset               NextBlock.Offset
        // |                    |
        // |<------Size-------->|<-----NextBlock.Size----->|

        // Increase the block size by the size of merging with the next block.
        numDescriptors += nextBlockIt->second.Size;

        // Remove the next block from the free list.
        m_FreeListBySize.erase( nextBlockIt->second.FreeListBySizeIt );
        m_FreeListByOffset.erase( nextBlockIt );
    }
```
Again, the block immediately following the block being freed is merged with the block being freed.
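Case 3 and Case 4 do not need to be handled explicitly: in Case 3, both of the merges shown above are applied one after the other, and in Case 4, neither merge is applied and the block is added to the free list unchanged.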
The final step is to add the new (merged) block back into the free list.
```cpp
    // Add the freed block to the free list.
    AddNewBlock( offset, numDescriptors );
}
```
On line 178, the new block is added back into the free list using the AddNewBlock method.
DescriptorAllocatorPage::ReleaseStaleDescriptors
Stale descriptors are returned to the free list using the ReleaseStaleDescriptors method when the frame that they were freed in has finished executing on the GPU.
```cpp
void DescriptorAllocatorPage::ReleaseStaleDescriptors( uint64_t frameNumber )
{
    std::lock_guard<std::mutex> lock( m_AllocationMutex );

    while ( !m_StaleDescriptors.empty() && m_StaleDescriptors.front().FrameNumber <= frameNumber )
    {
        auto& staleDescriptor = m_StaleDescriptors.front();

        // The offset of the descriptor in the heap.
        auto offset = staleDescriptor.Offset;
        // The number of descriptors that were allocated.
        auto numDescriptors = staleDescriptor.Size;

        FreeBlock( offset, numDescriptors );

        m_StaleDescriptors.pop();
    }
}
```
To ensure the m_StaleDescriptors queue is not being modified on any other thread, the m_AllocationMutex mutex is locked on line 181.
On lines 183-195, the m_StaleDescriptors queue is processed from front to back. As long as the entry at the front of the queue has a frame number that is less than (or equal to) the completed frame number, it is popped off the queue and its block is returned to the free list using the FreeBlock method described in the previous section.
The final class in the triad of classes that constitute the descriptor allocation scheme used by the DX12Lib project is the DescriptorAllocation class, which is the subject of the next section.
View the full source code for DescriptorAllocatorPage.cpp
DescriptorAllocation Class
The DescriptorAllocation class is used by the DescriptorAllocator to represent a single allocation of contiguous descriptors in a descriptor heap. The DescriptorAllocation class is a move-only, self-freeing type that is used as a wrapper for a D3D12_CPU_DESCRIPTOR_HANDLE. The DescriptorAllocation must be a move-only class to ensure there is only a single instance of a particular allocation. This guarantees that when the descriptor is destroyed or replaced, the original descriptor is returned to the descriptor heap it came from.
The DescriptorAllocation class provides the following (public) methods:
- IsNull: Check whether the DescriptorAllocation is null (does not contain a valid descriptor handle).
- GetDescriptorHandle: Get the underlying D3D12_CPU_DESCRIPTOR_HANDLE (optionally at an offset within the allocation).
- GetNumHandles: Get the number of consecutive descriptors in the DescriptorAllocation.
DescriptorAllocation Header
The header file is used to declare the DescriptorAllocation class. The additional header files that are necessary to compile the DescriptorAllocation class are shown first.
```cpp
#include <d3d12.h>

#include <cstdint>
#include <memory>

class DescriptorAllocatorPage;
```
The d3d12.h header is necessary for the D3D12_CPU_DESCRIPTOR_HANDLE type.
The cstdint header file is included to provide the uint32_t type.
The memory header file is included to provide access to the std::shared_ptr type.
The DescriptorAllocatorPage is forward declared on line 42 to avoid including the header file for that class. The DescriptorAllocatorPage is used as a template argument for a std::shared_ptr, which doesn't require a complete type.
```cpp
class DescriptorAllocation
{
public:
    // Creates a NULL descriptor.
    DescriptorAllocation();

    DescriptorAllocation( D3D12_CPU_DESCRIPTOR_HANDLE descriptor, uint32_t numHandles, uint32_t descriptorSize, std::shared_ptr<DescriptorAllocatorPage> page );

    // The destructor will automatically free the allocation.
    ~DescriptorAllocation();
```
The DescriptorAllocation class provides a default constructor, which initializes the descriptor as a null descriptor.
The parameterized constructor declared on line 50 is used by the DescriptorAllocatorPage::Allocate method to construct a valid DescriptorAllocation.
The destructor declared on line 53 is necessary to ensure the allocation is returned to the DescriptorAllocatorPage that it came from.
```cpp
    // Copies are not allowed.
    DescriptorAllocation( const DescriptorAllocation& ) = delete;
    DescriptorAllocation& operator=( const DescriptorAllocation& ) = delete;

    // Move is allowed.
    DescriptorAllocation( DescriptorAllocation&& allocation );
    DescriptorAllocation& operator=( DescriptorAllocation&& other );
```
Copies of a DescriptorAllocation are not allowed. To prevent accidental copies, the copy constructor and copy assignment operator are deleted so that the compiler does not generate them automatically.
Moving a DescriptorAllocation to another DescriptorAllocation is allowed (and in fact, required). The move constructor and the move assignment operator are declared on lines 60 and 61.
```cpp
    // Check if this is a valid descriptor.
    bool IsNull() const;
```
The IsNull method is used to check whether the DescriptorAllocation is null, that is, whether it does not contain a valid descriptor.
```cpp
    // Get a descriptor at a particular offset in the allocation.
    D3D12_CPU_DESCRIPTOR_HANDLE GetDescriptorHandle( uint32_t offset = 0 ) const;
```
The DescriptorAllocation can contain a block of consecutive descriptors in a descriptor heap. The GetDescriptorHandle method is used to get the underlying D3D12_CPU_DESCRIPTOR_HANDLE at a particular offset within the contiguous block of descriptors.
```cpp
    // Get the number of (consecutive) handles for this allocation.
    uint32_t GetNumHandles() const;
```
The GetNumHandles method is used to get the number of consecutive descriptor handles that are contained in the DescriptorAllocation.
```cpp
    // Get the heap that this allocation came from.
    // (For internal use only).
    std::shared_ptr<DescriptorAllocatorPage> GetDescriptorAllocatorPage() const;
```
The GetDescriptorAllocatorPage method is used to query the DescriptorAllocatorPage that the DescriptorAllocation came from.
```cpp
private:
    // Free the descriptor back to the heap it came from.
    void Free();

    // The base descriptor.
    D3D12_CPU_DESCRIPTOR_HANDLE m_Descriptor;
    // The number of descriptors in this allocation.
    uint32_t m_NumHandles;
    // The offset to the next descriptor.
    uint32_t m_DescriptorSize;

    // A pointer back to the original page where this allocation came from.
    std::shared_ptr<DescriptorAllocatorPage> m_Page;
};
```
The Free method is used by the DescriptorAllocation class to return itself back to the DescriptorAllocatorPage it came from. This method is called when the DescriptorAllocation is destroyed or when another DescriptorAllocation is move-assigned to it.
The m_Descriptor member variable is the handle to the first descriptor in the allocation.
The m_NumHandles member variable stores the total number of descriptors in the DescriptorAllocation.
The m_DescriptorSize member variable stores the increment size for each descriptor. This is used to compute the address of a particular descriptor within the allocation.
The m_Page member variable stores a std::shared_ptr back to the DescriptorAllocatorPage that the DescriptorAllocation came from.
View the full source code for DescriptorAllocation.h
DescriptorAllocation Preamble
The implementation of the DescriptorAllocation class is fairly simple, as it acts as a wrapper class for the underlying D3D12_CPU_DESCRIPTOR_HANDLE and provides a few accessor methods that describe the allocation.
```cpp
#include <DX12LibPCH.h>

#include <DescriptorAllocation.h>

#include <Application.h>
#include <DescriptorAllocatorPage.h>
```
The DX12LibPCH.h header file provides the precompiled header file for the DX12Lib project and must be the first include that appears in the implementation file.
The DescriptorAllocation.h header is included next and provides the declaration of the DescriptorAllocation class that was shown in the previous section.
The Application.h header provides the declaration of the Application class. When freeing a DescriptorAllocation, it is necessary to provide the current frame number, which is provided by the Application class.
The DescriptorAllocatorPage.h header file is necessary to be able to call the DescriptorAllocatorPage::Free method when freeing the DescriptorAllocation.
DescriptorAllocation Default Constructor
The default constructor for the DescriptorAllocation class simply initializes it as a null descriptor.
```cpp
DescriptorAllocation::DescriptorAllocation()
    : m_Descriptor{ 0 }
    , m_NumHandles( 0 )
    , m_DescriptorSize( 0 )
    , m_Page( nullptr )
{}
```
DescriptorAllocation Parameterized Constructor
The parameterized constructor for the DescriptorAllocation class initializes it as a valid descriptor (assuming the parameters are valid).
```cpp
DescriptorAllocation::DescriptorAllocation( D3D12_CPU_DESCRIPTOR_HANDLE descriptor, uint32_t numHandles, uint32_t descriptorSize, std::shared_ptr<DescriptorAllocatorPage> page )
    : m_Descriptor( descriptor )
    , m_NumHandles( numHandles )
    , m_DescriptorSize( descriptorSize )
    , m_Page( page )
{}
```
The member variables being initialized here are described in the DescriptorAllocation Header section and shouldn’t require additional explanation.
DescriptorAllocation Destructor
The destructor for the DescriptorAllocation class must ensure that the descriptor is freed back to the DescriptorAllocatorPage it came from by calling the Free method.
```cpp
DescriptorAllocation::~DescriptorAllocation()
{
    Free();
}
```
DescriptorAllocation Move Constructor
The move constructor allows the DescriptorAllocation to be moved. The original DescriptorAllocation must be left in a null state, but the allocation itself must not be freed.
```cpp
DescriptorAllocation::DescriptorAllocation( DescriptorAllocation&& allocation )
    : m_Descriptor( allocation.m_Descriptor )
    , m_NumHandles( allocation.m_NumHandles )
    , m_DescriptorSize( allocation.m_DescriptorSize )
    , m_Page( std::move( allocation.m_Page ) )
{
    allocation.m_Descriptor.ptr = 0;
    allocation.m_NumHandles = 0;
    allocation.m_DescriptorSize = 0;
}
```
DescriptorAllocation Move Assignment
The move assignment operator behaves similarly to the move constructor, except that the descriptor currently held by this object must first be freed using the Free method before another descriptor is moved into it.
```cpp
DescriptorAllocation& DescriptorAllocation::operator=( DescriptorAllocation&& other )
{
    // Free this descriptor if it points to anything.
    Free();

    m_Descriptor = other.m_Descriptor;
    m_NumHandles = other.m_NumHandles;
    m_DescriptorSize = other.m_DescriptorSize;
    m_Page = std::move( other.m_Page );

    other.m_Descriptor.ptr = 0;
    other.m_NumHandles = 0;
    other.m_DescriptorSize = 0;

    return *this;
}
```
DescriptorAllocation::Free
If the DescriptorAllocation either goes out of scope or is replaced by another descriptor, it must be freed. The Free method is used to return the DescriptorAllocation back to the DescriptorAllocatorPage it came from.
```cpp
void DescriptorAllocation::Free()
{
    if ( !IsNull() && m_Page )
    {
        m_Page->Free( std::move( *this ), Application::GetFrameCount() );

        m_Descriptor.ptr = 0;
        m_NumHandles = 0;
        m_DescriptorSize = 0;
        m_Page.reset();
    }
}
```
If the DescriptorAllocation is valid (not null), it is returned to the DescriptorAllocatorPage it came from using the DescriptorAllocatorPage::Free method, and the DescriptorAllocation is then reset to a null state.
DescriptorAllocation::IsNull
The IsNull method checks whether the underlying D3D12_CPU_DESCRIPTOR_HANDLE is null (its ptr member is 0).
```cpp
// Check if this is a valid descriptor.
bool DescriptorAllocation::IsNull() const
{
    return m_Descriptor.ptr == 0;
}
```
DescriptorAllocation::GetDescriptorHandle
The GetDescriptorHandle method returns a D3D12_CPU_DESCRIPTOR_HANDLE for the descriptor at a particular offset within the DescriptorAllocation.
```cpp
// Get a descriptor at a particular offset in the allocation.
D3D12_CPU_DESCRIPTOR_HANDLE DescriptorAllocation::GetDescriptorHandle( uint32_t offset ) const
{
    assert( offset < m_NumHandles );
    return { m_Descriptor.ptr + ( m_DescriptorSize * offset ) };
}
```
DescriptorAllocation::GetNumHandles
The GetNumHandles method returns the number of descriptor handles in the DescriptorAllocation.
```cpp
uint32_t DescriptorAllocation::GetNumHandles() const
{
    return m_NumHandles;
}
```
DescriptorAllocation::GetDescriptorAllocatorPage
The GetDescriptorAllocatorPage method returns the std::shared_ptr to the DescriptorAllocatorPage from which the DescriptorAllocation originated.
```cpp
std::shared_ptr<DescriptorAllocatorPage> DescriptorAllocation::GetDescriptorAllocatorPage() const
{
    return m_Page;
}
```
This concludes the description of the classes that are used to implement the descriptor allocation strategy used by the DX12Lib project. The DescriptorAllocator class provides a simple interface for allocating and freeing descriptors using a free list memory management scheme. The DescriptorAllocatorPage class is used internally to manage allocations, and the DescriptorAllocation class is used to represent a single allocation from the descriptor heap.
The DynamicDescriptorHeap class provides a flexible solution for ensuring the CPU visible descriptors are copied to the correct location in a GPU visible descriptor heap for rendering on the GPU. The DynamicDescriptorHeap class is the subject of the next section.
View the full source code for DescriptorAllocation.cpp
Dynamic Descriptor Heap
The purpose of the DynamicDescriptorHeap class is to allocate GPU visible descriptors that are used for binding CBVs, SRVs, UAVs, and samplers to the GPU pipeline for rendering or compute invocations. This is necessary since the descriptors provided by the DescriptorAllocator class shown in the previous section are CPU visible and cannot be used to bind resources to the GPU rendering pipeline. The DynamicDescriptorHeap class provides a staging area for CPU visible descriptors that are committed to GPU visible descriptor heaps when a Draw or Dispatch method is invoked on the command list.
Since only a single CBV_SRV_UAV descriptor heap and a single SAMPLER descriptor heap can be bound to the command list at the same time, the DynamicDescriptorHeap class also ensures that the currently bound descriptor heap has a sufficient number of descriptors to commit all of the staged descriptors before a Draw or Dispatch command is executed. If the currently bound descriptor heap runs out of descriptors, then a new descriptor heap is bound to the command list.
The DynamicDescriptorHeap class shown in this article is designed to provide functionality similar to that of DirectX 11, where dynamic descriptor indexing wasn't supported.
The DynamicDescriptorHeap class caches staged descriptors in a descriptor cache that is configured to match the layout of the root signature. For example, if the root signature has the following layout:
Index | Type | Range Type | Num Descriptors |
---|---|---|---|
0 | CBV | – | – |
1 | DESCRIPTOR_TABLE | SRV | 6 |
2 | DESCRIPTOR_TABLE | CBV | 3 |
3 | DESCRIPTOR_TABLE | UAV | 3 |
4 | DESCRIPTOR_TABLE | SAMPLER | 4 |
Then the descriptor table cache for the CBV_SRV_UAV dynamic descriptor heap would look like this:
There are a few interesting things to note in the image above. The first entry (root index 0) in the descriptor table cache is empty because the root signature contains an inline Constant Buffer View (CBV). Since an inline CBV does not require a descriptor, there is no reason to allocate any space for it in the descriptor handle cache.
The second entry in the descriptor table cache has six SRV descriptors and a pointer to the first entry in the descriptor handle cache. Similarly, the third and fourth entries in the descriptor table cache each have three descriptors and a pointer to their corresponding entry in the descriptor handle cache.
The fifth entry (root index 4) in the descriptor table cache is empty despite the fact that the root signature layout has a descriptor table that contains four SAMPLERs. Since CBV_SRV_UAV descriptors and SAMPLER descriptors cannot be stored in the same descriptor heap, there is a separate DynamicDescriptorHeap for each of the CBV_SRV_UAV and SAMPLER descriptor types.
DynamicDescriptorHeap Class
The design of the DynamicDescriptorHeap
class is heavily based on the DynamicDescriptorHeap implementation from Microsoft’s DirectX Samples on GitHub [1].
The DynamicDescriptorHeap class provides the following functionality:
- Stage Descriptors: Stage CPU visible descriptors to the descriptor table cache.
- Commit Staged Descriptors: Commit the staged descriptors to a GPU visible descriptor heap.
- Copy a Descriptor: Directly copy a CPU visible descriptor to a GPU visible descriptor heap. This is useful for the ID3D12GraphicsCommandList::ClearUnorderedAccessViewFloat and the ID3D12GraphicsCommandList::ClearUnorderedAccessViewUint methods.
DynamicDescriptorHeap Header
In this section, the declaration of the DynamicDescriptorHeap class is described. The DynamicDescriptorHeap class provides methods for staging CPU visible descriptors and committing those descriptors to a GPU visible descriptor heap before a Draw or Dispatch command is executed. The DynamicDescriptorHeap class also provides a method to copy a single CPU visible descriptor to a GPU visible descriptor heap. Copying single descriptors is required for the ID3D12GraphicsCommandList::ClearUnorderedAccessViewFloat and the ID3D12GraphicsCommandList::ClearUnorderedAccessViewUint methods, which require both a CPU and a GPU visible descriptor for the resource to be cleared.
A method to parse the root signature and configure the descriptor table cache is also provided. The DX12Lib project provides a RootSignature class for the purpose of determining the layout of the root signature, but this class is not described here. The RootSignature class is a wrapper for an ID3D12RootSignature. For more information on the RootSignature class, refer to the GitHub repository (RootSignature.h and RootSignature.cpp).
```cpp
#include "d3dx12.h"

#include <wrl.h>
#include <cstdint>
#include <functional> // std::function (used by CommitStagedDescriptors)
#include <memory>
#include <queue>

class CommandList;
class RootSignature;
```
The d3dx12.h
header file provides some helper types for working with DirectX 12. The d3dx12.h
header file also includes the d3d12.h
file so it does not need to be included directly.
The wrl.h
header file includes the ComPtr
template class.
The cstdint
header provides access to the standard integer types (such as uint32_t
). The memory
header file is required for the std::unique_ptr
and the queue
header file is required for the std::queue
container class.
The CommandList and RootSignature classes are forward declared at the end of the listing. Their header files are only required in the implementation file for the DynamicDescriptorHeap class.
```cpp
class DynamicDescriptorHeap
{
public:
    DynamicDescriptorHeap(
        D3D12_DESCRIPTOR_HEAP_TYPE heapType,
        uint32_t numDescriptorsPerHeap = 1024);

    virtual ~DynamicDescriptorHeap();
```
The DynamicDescriptorHeap class has a single constructor which takes a D3D12_DESCRIPTOR_HEAP_TYPE argument and the number of descriptors to allocate per heap (1024 by default). The destructor for the DynamicDescriptorHeap class is declared immediately after the constructor.
```cpp
    /**
     * Stages a contiguous range of CPU visible descriptors.
     * Descriptors are not copied to the GPU visible descriptor heap until
     * the CommitStagedDescriptors function is called.
     */
    void StageDescriptors(uint32_t rootParameterIndex, uint32_t offset, uint32_t numDescriptors, const D3D12_CPU_DESCRIPTOR_HANDLE srcDescriptors);
```
CPU visible descriptors are staged to the DynamicDescriptorHeap using the StageDescriptors method. This method has the following arguments:
- uint32_t rootParameterIndex: The index of the root parameter to copy the descriptors to. This must be configured as a DESCRIPTOR_TABLE in the currently bound root signature.
- uint32_t offset: The offset within the descriptor table to copy the descriptors to. This value can span descriptor ranges within the table, but offset + numDescriptors must not exceed the total number of descriptors in the descriptor table.
- uint32_t numDescriptors: The number of contiguous descriptors to copy starting from srcDescriptors.
- const D3D12_CPU_DESCRIPTOR_HANDLE srcDescriptors: The base descriptor to start copying descriptors from.
The StageDescriptors method is used to copy any number of contiguous CPU visible descriptors to the DynamicDescriptorHeap. This method copies only the descriptor handles to the DynamicDescriptorHeap, not the contents of the descriptors. For this reason, the CPU visible descriptors must not be reused or overwritten (using ID3D12Device::CreateShaderResourceView, for example) until the CommitStagedDescriptors method is invoked.
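The consequence of staging handles rather than descriptor contents can be shown with a toy model. In this hypothetical sketch (not Direct3D 12 code), a CPU descriptor heap is a vector of strings, a handle is an index into it, and StagingCache records only indices. Because the descriptor contents are read at commit time, overwriting a CPU descriptor after staging it changes what is committed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: a "descriptor" is a string describing the view, and a
// CPU handle is an index into a CPU-only descriptor heap.
using CpuHandle = std::size_t;

struct CpuDescriptorHeap
{
    std::vector<std::string> descriptors;
};

struct StagingCache
{
    std::vector<CpuHandle> stagedHandles; // only handles are staged

    void Stage(CpuHandle handle) { stagedHandles.push_back(handle); }

    // Commit reads the descriptor contents through the staged handles,
    // so it sees whatever the CPU heap contains at commit time.
    std::vector<std::string> Commit(const CpuDescriptorHeap& heap) const
    {
        std::vector<std::string> gpuVisible;
        for (CpuHandle h : stagedHandles)
            gpuVisible.push_back(heap.descriptors[h]);
        return gpuVisible;
    }
};
```

In the usage below, the committed descriptor describes TextureB rather than the TextureA view that was originally staged, which is why the CPU visible descriptors must stay untouched until the commit.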
```cpp
    /**
     * Copy all of the staged descriptors to the GPU visible descriptor heap and
     * bind the descriptor heap and the descriptor tables to the command list.
     * The passed-in function object is used to set the GPU visible descriptors
     * on the command list. Two possible functions are:
     *   * Before a draw    : ID3D12GraphicsCommandList::SetGraphicsRootDescriptorTable
     *   * Before a dispatch: ID3D12GraphicsCommandList::SetComputeRootDescriptorTable
     *
     * Since the DynamicDescriptorHeap can't know which function will be used, it must
     * be passed as an argument to the function.
     */
    void CommitStagedDescriptors( CommandList& commandList, std::function<void(ID3D12GraphicsCommandList*, UINT, D3D12_GPU_DESCRIPTOR_HANDLE)> setFunc );
    void CommitStagedDescriptorsForDraw(CommandList& commandList);
    void CommitStagedDescriptorsForDispatch(CommandList& commandList);
```
The CommitStagedDescriptors family of methods is used to commit any staged descriptors to the GPU visible descriptor heaps. The CommitStagedDescriptorsForDraw method uses the ID3D12GraphicsCommandList::SetGraphicsRootDescriptorTable method to bind the descriptors to the graphics pipeline, while the CommitStagedDescriptorsForDispatch method uses the ID3D12GraphicsCommandList::SetComputeRootDescriptorTable method to bind the descriptors to the compute pipeline.
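The two commit variants differ only in which command list method receives the GPU handles, which is why the generic method takes a std::function. The sketch below assumes a hypothetical MockGraphicsCommandList with setter methods named after the real ones (GPU handles reduced to uint64_t) and shows one way to pass a pointer to a member function through a std::function whose first parameter is the command list pointer. It is an illustration, not the DX12Lib implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical stand-in for ID3D12GraphicsCommandList; it only records which
// setter was called so the dispatch can be observed.
struct MockGraphicsCommandList
{
    std::string lastCall;
    unsigned lastRootIndex = 0;
    uint64_t lastGpuHandle = 0;

    void SetGraphicsRootDescriptorTable(unsigned rootIndex, uint64_t gpuHandle)
    {
        lastCall = "graphics"; lastRootIndex = rootIndex; lastGpuHandle = gpuHandle;
    }
    void SetComputeRootDescriptorTable(unsigned rootIndex, uint64_t gpuHandle)
    {
        lastCall = "compute"; lastRootIndex = rootIndex; lastGpuHandle = gpuHandle;
    }
};

// A pointer to member function is callable as f(commandListPtr, args...).
using SetFunc = std::function<void(MockGraphicsCommandList*, unsigned, uint64_t)>;

// Generic commit: it does not know which pipeline it is binding to.
void CommitStaged(MockGraphicsCommandList& cmdList, unsigned rootIndex, uint64_t gpuHandle, SetFunc setFunc)
{
    setFunc(&cmdList, rootIndex, gpuHandle);
}

// The Draw/Dispatch variants only choose which member function to pass.
void CommitForDraw(MockGraphicsCommandList& cmdList, unsigned rootIndex, uint64_t gpuHandle)
{
    CommitStaged(cmdList, rootIndex, gpuHandle, &MockGraphicsCommandList::SetGraphicsRootDescriptorTable);
}

void CommitForDispatch(MockGraphicsCommandList& cmdList, unsigned rootIndex, uint64_t gpuHandle)
{
    CommitStaged(cmdList, rootIndex, gpuHandle, &MockGraphicsCommandList::SetComputeRootDescriptorTable);
}
```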
```cpp
    /**
     * Copies a single CPU visible descriptor to a GPU visible descriptor heap.
     * This is useful for the
     *   * ID3D12GraphicsCommandList::ClearUnorderedAccessViewFloat
     *   * ID3D12GraphicsCommandList::ClearUnorderedAccessViewUint
     * methods which require both CPU and GPU visible descriptors for a UAV
     * resource.
     *
     * @param commandList The command list is required in case the GPU visible
     * descriptor heap needs to be updated on the command list.
     * @param cpuDescriptor The CPU descriptor to copy into a GPU visible
     * descriptor heap.
     *
     * @return The GPU visible descriptor.
     */
    D3D12_GPU_DESCRIPTOR_HANDLE CopyDescriptor( CommandList& commandList, D3D12_CPU_DESCRIPTOR_HANDLE cpuDescriptor);
```
When clearing a UAV resource using either the ID3D12GraphicsCommandList::ClearUnorderedAccessViewFloat or the ID3D12GraphicsCommandList::ClearUnorderedAccessViewUint method, both a CPU and a GPU visible descriptor are required. The CopyDescriptor method is used to copy a single CPU visible descriptor into a GPU visible descriptor heap and returns the corresponding GPU visible descriptor. In addition to the CPU descriptor to copy, this method accepts a CommandList argument in case the currently bound descriptor heap needs to be updated on the command list as a result of copying the descriptor.
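The behaviour described for CopyDescriptor can be modelled without Direct3D 12. In this hypothetical sketch, SingleDescriptorCopier and GpuHeap are illustrative types, descriptors are ints, and a GPU handle is a base address plus slot times increment size (the base addresses are made up). When the current heap has no free handle left, a new heap is started; in the real class this is the point where the command list would need the new heap bound.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a shader-visible heap: descriptors are ints and a
// GPU handle is base address + slot * increment.
struct GpuHeap
{
    uint64_t gpuBase;
    std::vector<int> slots;
};

struct SingleDescriptorCopier
{
    SingleDescriptorCopier(uint32_t perHeap, uint64_t inc)
        : numDescriptorsPerHeap(perHeap), increment(inc) {}

    uint32_t numDescriptorsPerHeap;
    uint64_t increment;
    std::vector<GpuHeap> heaps;   // heaps created so far
    uint32_t currentSlot = 0;
    uint32_t numFreeHandles = 0;  // 0 forces a heap on first use
    int heapSwitches = 0;         // how often the bound heap changed

    // Copies one CPU descriptor into the current GPU visible heap and returns
    // its GPU handle, switching to a new heap when the current one is full.
    uint64_t CopyDescriptor(int cpuDescriptor)
    {
        if (numFreeHandles < 1)
        {
            // Assumed base address scheme, for illustration only.
            uint64_t base = 0x10000 * (heaps.size() + 1);
            heaps.push_back(GpuHeap{ base, std::vector<int>(numDescriptorsPerHeap) });
            currentSlot = 0;
            numFreeHandles = numDescriptorsPerHeap;
            ++heapSwitches; // the real class would rebind the heap on the command list here
        }
        GpuHeap& heap = heaps.back();
        heap.slots[currentSlot] = cpuDescriptor;
        uint64_t gpuHandle = heap.gpuBase + currentSlot * increment;
        ++currentSlot;
        --numFreeHandles;
        return gpuHandle;
    }
};
```

With two descriptors per heap, the third copy lands at the start of a second heap.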
```cpp
    /**
     * Parse the root signature to determine which root parameters contain
     * descriptor tables and determine the number of descriptors needed for
     * each table.
     */
    void ParseRootSignature( const RootSignature& rootSignature);
```
Using the ParseRootSignature method, the DynamicDescriptorHeap is informed of any changes to the currently bound root signature on the command list. This method updates the layout of the descriptors in the descriptor cache to match the descriptor layout in the root signature (as described in the introduction to this section).
```cpp
    /**
     * Reset used descriptors. This should only be done if any descriptors
     * that are being referenced by a command list have finished executing on the
     * command queue.
     */
    void Reset();
```
The Reset method is used to reset the allocated descriptor heaps and descriptor cache. This should only be done when the command queue has finished processing any commands that reference descriptors in the DynamicDescriptorHeap.
```cpp
private:
    // Request a descriptor heap if one is available.
    Microsoft::WRL::ComPtr<ID3D12DescriptorHeap> RequestDescriptorHeap();
    // Create a new descriptor heap if no descriptor heap is available.
    Microsoft::WRL::ComPtr<ID3D12DescriptorHeap> CreateDescriptorHeap();
```
The RequestDescriptorHeap
method is used to get an available descriptor heap. If there are no available descriptor heaps, then a new descriptor heap is created using the CreateDescriptorHeap
method.
```cpp
    // Compute the number of stale descriptors that need to be copied
    // to GPU visible descriptor heap.
    uint32_t ComputeStaleDescriptorCount() const;
```
The ComputeStaleDescriptorCount
method returns the number of CPU visible descriptors that need to be copied to the GPU visible descriptor heap.
```cpp
    /**
     * The maximum number of descriptor tables per root signature.
     * A 32-bit mask is used to keep track of the root parameter indices that
     * are descriptor tables.
     */
    static const uint32_t MaxDescriptorTables = 32;
```
The MaxDescriptorTables constant represents the maximum number of descriptor tables that can exist in the root signature. The limit of 32 descriptor tables was chosen because a 32-bit bitmask is used to indicate which entries of the root signature use a descriptor table.
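As a worked example of the bitmask, the following sketch (BuildDescriptorTableBitMask is a hypothetical helper, not part of DX12Lib) sets bit i for every root parameter i that is a descriptor table of the requested heap type. For the example root signature, the CBV_SRV_UAV mask has bits 1, 2, and 3 set (0xE), and the SAMPLER mask has only bit 4 set (0x10).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Builds a mask with bit i set when root parameter i is a descriptor table
// of the requested heap type. isTable[i] is a hypothetical per-parameter flag.
uint32_t BuildDescriptorTableBitMask(const std::vector<bool>& isTable)
{
    assert(isTable.size() <= 32 && "a 32-bit mask can describe at most 32 root parameters");
    uint32_t mask = 0;
    for (uint32_t i = 0; i < isTable.size(); ++i)
    {
        if (isTable[i])
            mask |= (1u << i);
    }
    return mask;
}
```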
```cpp
    /**
     * A structure that represents a descriptor table entry in the root signature.
     */
    struct DescriptorTableCache
    {
        DescriptorTableCache()
            : NumDescriptors(0)
            , BaseDescriptor(nullptr)
        {}

        // Reset the table cache.
        void Reset()
        {
            NumDescriptors = 0;
            BaseDescriptor = nullptr;
        }

        // The number of descriptors in this descriptor table.
        uint32_t NumDescriptors;
        // The pointer to the descriptor in the descriptor handle cache.
        D3D12_CPU_DESCRIPTOR_HANDLE* BaseDescriptor;
    };
```
The DescriptorTableCache struct represents a single entry in the m_DescriptorTableCache array. Each entry in the descriptor table cache stores the number of descriptors in the descriptor table and a pointer to the first descriptor handle for that table in the descriptor handle cache. By default, each entry in the descriptor table cache is empty (0 descriptors and a null pointer), which indicates that the corresponding root parameter in the currently bound root signature does not use a descriptor table.
```cpp
    // Describes the type of descriptors that can be staged using this
    // dynamic descriptor heap.
    // Valid values are:
    //   * D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV
    //   * D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER
    // This parameter also determines the type of GPU visible descriptor heap to
    // create.
    D3D12_DESCRIPTOR_HEAP_TYPE m_DescriptorHeapType;

    // The number of descriptors to allocate in new GPU visible descriptor heaps.
    uint32_t m_NumDescriptorsPerHeap;

    // The increment size of a descriptor.
    uint32_t m_DescriptorHandleIncrementSize;
```
The m_DescriptorHeapType
member variable stores the type of descriptor heap the DynamicDescriptorHeap
uses. This can be either D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV
or D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER
.
The m_NumDescriptorsPerHeap
variable indicates how many descriptors to allocate for each descriptor heap.
The m_DescriptorHandleIncrementSize
variable indicates the offset between descriptors in the descriptor heap. Since the increment size of a descriptor within a descriptor heap is vendor specific, it must be queried at runtime.
```cpp
    // The descriptor handle cache.
    std::unique_ptr<D3D12_CPU_DESCRIPTOR_HANDLE[]> m_DescriptorHandleCache;

    // Descriptor handle cache per descriptor table.
    DescriptorTableCache m_DescriptorTableCache[MaxDescriptorTables];
```
The m_DescriptorHandleCache variable is an array of D3D12_CPU_DESCRIPTOR_HANDLEs. The number of descriptors that can be cached is determined by the numDescriptorsPerHeap argument passed to the parameterized constructor of the DynamicDescriptorHeap class.
The m_DescriptorTableCache variable is an array of DescriptorTableCache structs. This array is statically sized to the maximum number of descriptor tables that can appear in a root signature (MaxDescriptorTables). The layout of the m_DescriptorTableCache array is configured in the ParseRootSignature method shown later.
```cpp
    // Each bit in the bit mask represents the index in the root signature
    // that contains a descriptor table.
    uint32_t m_DescriptorTableBitMask;
    // Each bit set in the bit mask represents a descriptor table
    // in the root signature that has changed since the last time the
    // descriptors were copied.
    uint32_t m_StaleDescriptorTableBitMask;
```
The m_DescriptorTableBitMask variable indicates which entries in the currently bound root signature contain a descriptor table. The m_StaleDescriptorTableBitMask variable is used to indicate which descriptor table entries have been modified since the previous commit. If a root signature has multiple descriptor table entries (as in the example in the introduction to this section) but only one of the descriptor tables is modified between draw (or dispatch) commands, then only the modified descriptor table needs to be copied to the GPU visible descriptor heap. Any unmodified descriptor tables can be left as-is.
```cpp
    using DescriptorHeapPool = std::queue< Microsoft::WRL::ComPtr<ID3D12DescriptorHeap> >;

    DescriptorHeapPool m_DescriptorHeapPool;
    DescriptorHeapPool m_AvailableDescriptorHeaps;
```
The DescriptorHeapPool is an alias type for a std::queue of ComPtr<ID3D12DescriptorHeap>s.
The m_DescriptorHeapPool variable stores all of the descriptor heaps created by the DynamicDescriptorHeap class, and the m_AvailableDescriptorHeaps variable stores only the descriptor heaps that still have free descriptors. When a descriptor heap does not contain enough descriptors to commit all staged descriptors, it is removed from the m_AvailableDescriptorHeaps queue until the DynamicDescriptorHeap is reset.
```cpp
    Microsoft::WRL::ComPtr<ID3D12DescriptorHeap> m_CurrentDescriptorHeap;
    CD3DX12_GPU_DESCRIPTOR_HANDLE m_CurrentGPUDescriptorHandle;
    CD3DX12_CPU_DESCRIPTOR_HANDLE m_CurrentCPUDescriptorHandle;

    uint32_t m_NumFreeHandles;
};
```
The m_CurrentDescriptorHeap
variable points to the current descriptor heap that is bound to the command list.
The m_CurrentGPUDescriptorHandle
and m_CurrentCPUDescriptorHandle
variables store the current GPU and CPU descriptor handles within the m_CurrentDescriptorHeap
descriptor heap.
The m_NumFreeHandles
variable stores the number of descriptor handles that are still available in the currently bound descriptor heap.
DynamicDescriptorHeap Preamble
The preamble for the DynamicDescriptorHeap
implementation file contains the additional headers that are required to compile the class.
```cpp
#include <DX12LibPCH.h>

#include <DynamicDescriptorHeap.h>

#include <Application.h>
#include <CommandList.h>
#include <RootSignature.h>
```
The DX12LibPCH.h
header file is the precompiled header file for the DX12Lib project.
The DynamicDescriptorHeap.h
header file contains the declaration for the DynamicDescriptorHeap
class. This header file is described in the previous section.
The Application.h header file is required to get access to the ID3D12Device, which is owned by the Application class.
The CommandList.h
header file contains the declaration of the CommandList
class and the RootSignature.h
header file contains the declaration of the RootSignature
class. These classes are part of the DX12Lib project but are not described in detail in this lesson.
DynamicDescriptorHeap::DynamicDescriptorHeap
The constructor for the DynamicDescriptorHeap
initializes the variables for the DynamicDescriptorHeap
and allocates storage for the descriptor handle cache based on the maximum number of descriptors per descriptor heap.
```cpp
DynamicDescriptorHeap::DynamicDescriptorHeap(D3D12_DESCRIPTOR_HEAP_TYPE heapType, uint32_t numDescriptorsPerHeap)
    : m_DescriptorHeapType(heapType)
    , m_NumDescriptorsPerHeap(numDescriptorsPerHeap)
    , m_DescriptorTableBitMask(0)
    , m_StaleDescriptorTableBitMask(0)
    , m_CurrentCPUDescriptorHandle(D3D12_DEFAULT)
    , m_CurrentGPUDescriptorHandle(D3D12_DEFAULT)
    , m_NumFreeHandles(0)
{
    m_DescriptorHandleIncrementSize = Application::Get().GetDescriptorHandleIncrementSize(heapType);

    // Allocate space for staging CPU visible descriptors.
    m_DescriptorHandleCache = std::make_unique<D3D12_CPU_DESCRIPTOR_HANDLE[]>(m_NumDescriptorsPerHeap);
}
```
Since the increment size of a descriptor in a descriptor heap is vendor specific, it must be queried at runtime. The increment size is queried from the Application class in the body of the constructor.
At the end of the constructor, the descriptor handle cache is created based on the maximum number of descriptors that can be copied to the GPU visible descriptor heap.
DynamicDescriptorHeap::ParseRootSignature
Before any descriptors can be staged to the DynamicDescriptorHeap, the layout of the descriptor tables in the root signature must be known. The ParseRootSignature method is used to configure the layout of the descriptor cache whenever the root signature is changed on the command list.
```cpp
void DynamicDescriptorHeap::ParseRootSignature(const RootSignature& rootSignature)
{
    // If the root signature changes, all descriptors must be (re)bound to the
    // command list.
    m_StaleDescriptorTableBitMask = 0;

    const auto& rootSignatureDesc = rootSignature.GetRootSignatureDesc();
```
The only argument to the ParseRootSignature method is a reference to a RootSignature. The RootSignature class is part of the DX12Lib project but is not described in any detail in this lesson. The RootSignature class provides a wrapper for an ID3D12RootSignature with some additional methods to query the layout of the root signature.
Whenever the root signature changes on the command list, all descriptors must be (re)bound to the graphics or compute pipeline, so any descriptors that were staged but not yet committed are discarded. The m_StaleDescriptorTableBitMask variable is cleared at the start of the method to indicate that no descriptors should be copied to a GPU visible descriptor heap until new descriptors are staged to the DynamicDescriptorHeap.
The root signature description used to create the root signature is cached in the RootSignature class. This value is queried at the start of the method so that the layout of the root signature can be determined.
```cpp
    // Get a bit mask that represents the root parameter indices that match the
    // descriptor heap type for this dynamic descriptor heap.
    m_DescriptorTableBitMask = rootSignature.GetDescriptorTableBitMask(m_DescriptorHeapType);
    uint32_t descriptorTableBitMask = m_DescriptorTableBitMask;
```
A bitmask that represents the indices of the root parameters that have a descriptor table for a particular descriptor heap type is queried using the RootSignature::GetDescriptorTableBitMask method. The bitmask for the root signature described in the example above looks like this:
The above image shows an example of a descriptor table bitmask for the CBV_SRV_UAV descriptor heap type shown in the example above. In this case, the parameters at root indices 1, 2, and 3 have a descriptor table matching the heap type, so bits 1, 2, and 3 are set (0b01110).
A copy of the descriptor table bitmask is stored in the local descriptorTableBitMask variable so it can be scanned and cleared without modifying the class member variable.
```cpp
    uint32_t currentOffset = 0;
    DWORD rootIndex;
    while (_BitScanForward(&rootIndex, descriptorTableBitMask) && rootIndex < rootSignatureDesc.NumParameters)
    {
        uint32_t numDescriptors = rootSignature.GetNumDescriptors(rootIndex);

        DescriptorTableCache& descriptorTableCache = m_DescriptorTableCache[rootIndex];
        descriptorTableCache.NumDescriptors = numDescriptors;
        descriptorTableCache.BaseDescriptor = m_DescriptorHandleCache.get() + currentOffset;

        currentOffset += numDescriptors;

        // Flip the descriptor table bit so it's not scanned again for the current index.
        descriptorTableBitMask ^= (1 << rootIndex);
    }
```
While there are bits set in the descriptorTableBitMask variable, the root signature is queried for the number of descriptors in the descriptor table at each set index. The corresponding entry of the descriptor table cache is then retrieved, and the number of descriptors and a pointer to the entry in the descriptor handle cache are stored in it.
The _BitScanForward function is actually an MSVC compiler intrinsic (declared in intrin.h) that scans a bitfield from least-significant bit (LSB) to most-significant bit (MSB) and stores the position of the first set bit in the index argument. It returns zero when no bit is set, which is what terminates the while loop once every bit has been cleared. Compiler intrinsics are usually faster than calling an equivalent function because they typically compile down to a single CPU instruction.
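_BitScanForward is specific to MSVC. For readers on other compilers, here is a portable (and slower) stand-in, under the assumption that only 32-bit masks are scanned; GCC and Clang also offer __builtin_ctz, whose result is undefined for a zero argument. BitScanForwardPortable and ScanSetBits are hypothetical names, and ScanSetBits repeats the scan-and-clear pattern used by the while loop in ParseRootSignature.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Portable stand-in for _BitScanForward: stores the index of the lowest set
// bit in *index and returns true, or returns false (leaving *index
// unchanged) when mask is zero.
bool BitScanForwardPortable(uint32_t* index, uint32_t mask)
{
    if (mask == 0)
        return false;
    uint32_t i = 0;
    while ((mask & 1u) == 0)
    {
        mask >>= 1;
        ++i;
    }
    *index = i;
    return true;
}

// Visits the set bits of a descriptor table mask from LSB to MSB and
// returns their indices in that order.
std::vector<uint32_t> ScanSetBits(uint32_t mask)
{
    std::vector<uint32_t> indices;
    uint32_t index;
    while (BitScanForwardPortable(&index, mask))
    {
        indices.push_back(index);
        mask ^= (1u << index); // clear the bit so it is not found again
    }
    return indices;
}
```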
After each descriptor table is processed, the current offset in the descriptor handle cache is advanced by the number of descriptors in the descriptor table.
Finally, the bit for the current index in descriptorTableBitMask is flipped to 0 so that the current index is not scanned again in the while loop.
```cpp
    // Make sure the maximum number of descriptors per descriptor heap has not been exceeded.
    assert(currentOffset <= m_NumDescriptorsPerHeap && "The root signature requires more than the maximum number of descriptors per descriptor heap. Consider increasing the maximum number of descriptors per descriptor heap.");
}
```
Before leaving the ParseRootSignature method, the postcondition that the total number of descriptors in the root signature does not exceed the maximum number of descriptors that can be copied to the GPU visible descriptor heap is checked.
DynamicDescriptorHeap::StageDescriptors
The StageDescriptors method copies CPU descriptor handles into the descriptor handle cache so that they can be committed to the GPU visible descriptor heap later.
```cpp
void DynamicDescriptorHeap::StageDescriptors(uint32_t rootParameterIndex, uint32_t offset, uint32_t numDescriptors, const D3D12_CPU_DESCRIPTOR_HANDLE srcDescriptor)
{
    // Cannot stage more than the maximum number of descriptors per heap.
    // Cannot stage more than MaxDescriptorTables root parameters.
    if (numDescriptors > m_NumDescriptorsPerHeap || rootParameterIndex >= MaxDescriptorTables)
    {
        throw std::bad_alloc();
    }
```
Before copying any descriptors, the preconditions on the arguments are checked to ensure that the user does not try to stage more descriptors than can fit in a descriptor heap, or to stage descriptors at an invalid index in the descriptor table cache. If either of these is the case, a std::bad_alloc exception is thrown.
```cpp
    DescriptorTableCache& descriptorTableCache = m_DescriptorTableCache[rootParameterIndex];

    // Check that the number of descriptors to copy does not exceed the number
    // of descriptors expected in the descriptor table.
    if ((offset + numDescriptors) > descriptorTableCache.NumDescriptors)
    {
        throw std::length_error("Number of descriptors exceeds the number of descriptors in the descriptor table.");
    }
```
A reference to the corresponding entry in the descriptor table cache is retrieved, and an additional check ensures that the user isn't copying more descriptors than the descriptor table is configured for. If the user tries to copy a descriptor beyond the number of descriptors in the descriptor table, a std::length_error exception is thrown.
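The two precondition checks can be exercised in isolation. In this hypothetical sketch, ValidateStage restates the checks from StageDescriptors against a plain array of table sizes (standing in for m_DescriptorTableCache), and TryStage maps the outcome to an integer so it is easy to test. The table sizes in the usage match the CBV_SRV_UAV tables of the example root signature.

```cpp
#include <cassert>
#include <cstdint>
#include <new>
#include <stdexcept>

constexpr uint32_t kMaxDescriptorTables = 32;

// Hypothetical restatement of the argument checks in StageDescriptors.
// tableSizes[i] is the NumDescriptors of descriptor table cache entry i.
void ValidateStage(const uint32_t (&tableSizes)[kMaxDescriptorTables], uint32_t numDescriptorsPerHeap,
                   uint32_t rootParameterIndex, uint32_t offset, uint32_t numDescriptors)
{
    if (numDescriptors > numDescriptorsPerHeap || rootParameterIndex >= kMaxDescriptorTables)
        throw std::bad_alloc();
    if ((offset + numDescriptors) > tableSizes[rootParameterIndex])
        throw std::length_error("Number of descriptors exceeds the number of descriptors in the descriptor table.");
}

// Returns 0 if the call is accepted, 1 for bad_alloc and 2 for length_error.
int TryStage(const uint32_t (&tableSizes)[kMaxDescriptorTables], uint32_t perHeap,
             uint32_t rootIndex, uint32_t offset, uint32_t count)
{
    try
    {
        ValidateStage(tableSizes, perHeap, rootIndex, offset, count);
        return 0;
    }
    catch (const std::bad_alloc&) { return 1; }
    catch (const std::length_error&) { return 2; }
}
```

Staging two descriptors at offset 4 of the six-entry SRV table is accepted, while offset 5 would run past the end of the table.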
```cpp
    D3D12_CPU_DESCRIPTOR_HANDLE* dstDescriptor = (descriptorTableCache.BaseDescriptor + offset);
    for (uint32_t i = 0; i < numDescriptors; ++i)
    {
        dstDescriptor[i] = CD3DX12_CPU_DESCRIPTOR_HANDLE(srcDescriptor, i, m_DescriptorHandleIncrementSize);
    }
```
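First, a pointer to the descriptor handle at the requested offset within the descriptor table's section of the descriptor handle cache is computed. The for loop then copies the descriptor handles to the descriptor handle cache, using CD3DX12_CPU_DESCRIPTOR_HANDLE to offset the source handle by i descriptors.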
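The copy loop relies on the descriptor handle offset arithmetic performed by CD3DX12_CPU_DESCRIPTOR_HANDLE: the i-th handle after a base handle is base.ptr + i * incrementSize. The sketch below models D3D12_CPU_DESCRIPTOR_HANDLE with a hypothetical CpuDescriptorHandle struct. The increment size of 32 used in the usage is an arbitrary example value, since the real value is queried from the device.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of D3D12_CPU_DESCRIPTOR_HANDLE, which wraps a SIZE_T ptr.
struct CpuDescriptorHandle
{
    std::size_t ptr;
};

// The handle of the index-th descriptor after base, i.e. what constructing
// CD3DX12_CPU_DESCRIPTOR_HANDLE(base, index, incrementSize) produces.
CpuDescriptorHandle Offset(CpuDescriptorHandle base, uint32_t index, uint32_t incrementSize)
{
    return CpuDescriptorHandle{ base.ptr + static_cast<std::size_t>(index) * incrementSize };
}

// Stages numDescriptors contiguous handles starting at src into dst,
// mirroring the copy loop in StageDescriptors.
void StageHandles(CpuDescriptorHandle* dst, CpuDescriptorHandle src, uint32_t numDescriptors, uint32_t incrementSize)
{
    for (uint32_t i = 0; i < numDescriptors; ++i)
        dst[i] = Offset(src, i, incrementSize);
}
```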
```cpp
    // Set the root parameter index bit to make sure the descriptor table
    // at that index is bound to the command list.
    m_StaleDescriptorTableBitMask |= (1 << rootParameterIndex);
}
```
To ensure the staged descriptors are committed to the GPU visible descriptor heap when the CommitStagedDescriptors method is invoked, the corresponding bit in the m_StaleDescriptorTableBitMask variable is set to 1.
DynamicDescriptorHeap::ComputeStaleDescriptorCount
The ComputeStaleDescriptorCount
method is used to determine the number of descriptors that need to be committed to the GPU visible descriptor heap.
```cpp
uint32_t DynamicDescriptorHeap::ComputeStaleDescriptorCount() const
{
    uint32_t numStaleDescriptors = 0;
    DWORD i;
    DWORD staleDescriptorsBitMask = m_StaleDescriptorTableBitMask;

    while (_BitScanForward(&i, staleDescriptorsBitMask))
    {
        numStaleDescriptors += m_DescriptorTableCache[i].NumDescriptors;
        staleDescriptorsBitMask ^= (1 << i);
    }

    return numStaleDescriptors;
}
```
The ComputeStaleDescriptorCount
method is fairly simple. It counts the number of descriptors in any descriptor table cache whose corresponding bit in the m_StaleDescriptorTableBitMask
is set.
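A portable equivalent of ComputeStaleDescriptorCount simply tests each of the 32 bits instead of using _BitScanForward. In this hypothetical sketch, tableSizes[i] plays the role of m_DescriptorTableCache[i].NumDescriptors. With the example root signature, committing all three CBV_SRV_UAV tables requires 12 descriptors, whereas restaging only the CBV table at root index 2 leaves just its 3 descriptors stale.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portable version of ComputeStaleDescriptorCount: sums the
// sizes of all descriptor tables whose bit is set in staleMask.
uint32_t CountStaleDescriptors(const uint32_t (&tableSizes)[32], uint32_t staleMask)
{
    uint32_t numStale = 0;
    for (uint32_t i = 0; i < 32; ++i)
    {
        if (staleMask & (1u << i))
            numStale += tableSizes[i];
    }
    return numStale;
}
```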
DynamicDescriptorHeap::RequestDescriptorHeap
The RequestDescriptorHeap method retrieves a descriptor heap from the list of available descriptor heaps. If there are no descriptor heaps available, a new one is created.
```cpp
Microsoft::WRL::ComPtr<ID3D12DescriptorHeap> DynamicDescriptorHeap::RequestDescriptorHeap()
{
    Microsoft::WRL::ComPtr<ID3D12DescriptorHeap> descriptorHeap;
    if (!m_AvailableDescriptorHeaps.empty())
    {
        descriptorHeap = m_AvailableDescriptorHeaps.front();
        m_AvailableDescriptorHeaps.pop();
    }
    else
    {
        descriptorHeap = CreateDescriptorHeap();
        m_DescriptorHeapPool.push(descriptorHeap);
    }

    return descriptorHeap;
}
```
If the m_AvailableDescriptorHeaps queue is not empty, then the first element is popped off the queue. If the m_AvailableDescriptorHeaps queue is empty, then a new descriptor heap is created using the CreateDescriptorHeap method and added to the m_DescriptorHeapPool.
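The interplay between m_DescriptorHeapPool and m_AvailableDescriptorHeaps can be sketched with integer heap IDs in place of ComPtr<ID3D12DescriptorHeap>. HeapPoolModel is hypothetical, and its Reset method is an assumption based on the earlier description that exhausted heaps stay out of the available queue until the DynamicDescriptorHeap is reset; the actual Reset implementation is not shown in this section.

```cpp
#include <cassert>
#include <queue>

// Hypothetical model of the two queues: heaps are identified by ints instead
// of ComPtr<ID3D12DescriptorHeap>.
struct HeapPoolModel
{
    std::queue<int> pool;       // every heap ever created
    std::queue<int> available;  // heaps that can be handed out again
    int nextHeapId = 0;

    int Request()
    {
        int heap;
        if (!available.empty())
        {
            heap = available.front();
            available.pop();
        }
        else
        {
            heap = nextHeapId++; // stands in for CreateDescriptorHeap()
            pool.push(heap);
        }
        return heap;
    }

    // Assumed reset behaviour: all created heaps become available again
    // once the GPU has finished with them.
    void Reset()
    {
        available = pool;
    }
};
```

After a reset, the two existing heaps are handed out again before a third one is created.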
DynamicDescriptorHeap::CreateDescriptorHeap
If the m_AvailableDescriptorHeaps queue is empty, then a new descriptor heap is created using the CreateDescriptorHeap method.
```cpp
Microsoft::WRL::ComPtr<ID3D12DescriptorHeap> DynamicDescriptorHeap::CreateDescriptorHeap()
{
    auto device = Application::Get().GetDevice();

    D3D12_DESCRIPTOR_HEAP_DESC descriptorHeapDesc = {};
    descriptorHeapDesc.Type = m_DescriptorHeapType;
    descriptorHeapDesc.NumDescriptors = m_NumDescriptorsPerHeap;
    descriptorHeapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;

    Microsoft::WRL::ComPtr<ID3D12DescriptorHeap> descriptorHeap;
    ThrowIfFailed(device->CreateDescriptorHeap(&descriptorHeapDesc, IID_PPV_ARGS(&descriptorHeap)));

    return descriptorHeap;
}
```
Descriptor heap creation is described in detail in the first lesson in this series. What is interesting to note here is that the descriptor heap is created with the D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE flag, which allows the heap to be bound to the command list so that its descriptors can be used to access resources in an HLSL shader.
DynamicDescriptorHeap::CommitStagedDescriptors
Arguably the most interesting (and most complex) method of the DynamicDescriptorHeap
class is the CommitStagedDescriptors
method. This method copies the staged descriptors in the descriptor table cache to the GPU visible descriptor heap and binds the descriptors to the command list using the appropriate method.
```cpp
void DynamicDescriptorHeap::CommitStagedDescriptors(CommandList& commandList, std::function<void(ID3D12GraphicsCommandList*, UINT, D3D12_GPU_DESCRIPTOR_HANDLE)> setFunc)
{
    // Compute the number of descriptors that need to be copied
    uint32_t numDescriptorsToCommit = ComputeStaleDescriptorCount();
```
The CommitStagedDescriptors method takes two parameters: the command list used to bind the descriptors and a setter function that is either ID3D12GraphicsCommandList::SetGraphicsRootDescriptorTable or ID3D12GraphicsCommandList::SetComputeRootDescriptorTable, depending on the command being executed on the command list.
The DynamicDescriptorHeap::CommitStagedDescriptors method should not be called directly. The DynamicDescriptorHeap::CommitStagedDescriptorsForDraw and DynamicDescriptorHeap::CommitStagedDescriptorsForDispatch methods should be used instead.
The number of descriptors that need to be committed is computed using the ComputeStaleDescriptorCount method described earlier.
```cpp
    if (numDescriptorsToCommit > 0)
    {
        auto device = Application::Get().GetDevice();
        auto d3d12GraphicsCommandList = commandList.GetGraphicsCommandList().Get();
        assert(d3d12GraphicsCommandList != nullptr);
```
If there are no descriptors to commit, the CommitStagedDescriptors method does nothing. Otherwise, the ID3D12Device is retrieved from the Application class, a pointer to the ID3D12GraphicsCommandList is retrieved from the CommandList, and that pointer is checked to make sure it is not null.
```cpp
        if (!m_CurrentDescriptorHeap || m_NumFreeHandles < numDescriptorsToCommit)
        {
            m_CurrentDescriptorHeap = RequestDescriptorHeap();
            m_CurrentCPUDescriptorHandle = m_CurrentDescriptorHeap->GetCPUDescriptorHandleForHeapStart();
            m_CurrentGPUDescriptorHandle = m_CurrentDescriptorHeap->GetGPUDescriptorHandleForHeapStart();
            m_NumFreeHandles = m_NumDescriptorsPerHeap;

            commandList.SetDescriptorHeap(m_DescriptorHeapType, m_CurrentDescriptorHeap.Get());

            // When updating the descriptor heap on the command list, all descriptor
            // tables must be (re)copied to the new descriptor heap (not just
            // the stale descriptor tables).
            m_StaleDescriptorTableBitMask = m_DescriptorTableBitMask;
        }
```
If either the m_CurrentDescriptorHeap is null (which is the case when the DynamicDescriptorHeap is first created or after it has been reset) or there are not enough free handles to commit the staged descriptors to the descriptor heap, a new heap is retrieved using the RequestDescriptorHeap method.
The CPU and GPU descriptor handles are then set to the first descriptors in the new heap, and the number of free handles is reset to the total number of descriptors in the descriptor heap.
The CommandList::SetDescriptorHeap method is used to ensure the command list has the new descriptor heap bound.
When changing descriptor heaps, it is necessary to copy all of the staged descriptors to the new descriptor heap (not just the ones that have been updated since the last time the descriptors were committed). Setting the m_StaleDescriptorTableBitMask variable to the value of m_DescriptorTableBitMask ensures that all of the staged descriptors are copied to the new descriptor heap.
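The heap selection step can be isolated as follows. CommitState and EnsureHeapSpace are hypothetical; a boolean and a counter stand in for RequestDescriptorHeap and CommandList::SetDescriptorHeap. The point of interest is that binding a new heap marks every descriptor table stale, while staying on the current heap leaves the stale mask untouched.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical state that mirrors the members consulted before committing.
struct CommitState
{
    bool hasHeap = false;
    uint32_t numFreeHandles = 0;
    uint32_t numDescriptorsPerHeap = 0;
    uint32_t descriptorTableBitMask = 0;
    uint32_t staleDescriptorTableBitMask = 0;
    int heapsBound = 0;
};

// Ensures the bound heap can hold numDescriptorsToCommit descriptors. When a
// new heap is bound, every descriptor table is marked stale so that all
// staged tables are copied into the new heap.
void EnsureHeapSpace(CommitState& s, uint32_t numDescriptorsToCommit)
{
    if (!s.hasHeap || s.numFreeHandles < numDescriptorsToCommit)
    {
        s.hasHeap = true; // stands in for RequestDescriptorHeap() + SetDescriptorHeap()
        s.numFreeHandles = s.numDescriptorsPerHeap;
        ++s.heapsBound;
        s.staleDescriptorTableBitMask = s.descriptorTableBitMask;
    }
}
```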
```cpp
        DWORD rootIndex;
        // Scan from LSB to MSB for a bit set in staleDescriptorsBitMask
        while (_BitScanForward(&rootIndex, m_StaleDescriptorTableBitMask))
        {
            UINT numSrcDescriptors = m_DescriptorTableCache[rootIndex].NumDescriptors;
            D3D12_CPU_DESCRIPTOR_HANDLE* pSrcDescriptorHandles = m_DescriptorTableCache[rootIndex].BaseDescriptor;
```
The _BitScanForward intrinsic is used to iterate over the stale descriptor tables that need to be committed to the GPU visible descriptor heap.
For each stale table, the number of descriptors and the pointer to the CPU visible descriptor handles in the descriptor handle cache are retrieved from the descriptor table cache.
```cpp
            D3D12_CPU_DESCRIPTOR_HANDLE pDestDescriptorRangeStarts[] =
            {
                m_CurrentCPUDescriptorHandle
            };
            UINT pDestDescriptorRangeSizes[] =
            {
                numSrcDescriptors
            };
```
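Before the descriptors are copied to the GPU visible descriptor heap, it is necessary to configure an array that contains the destination descriptor range starts and an array that contains the destination descriptor range sizes. Each array contains a single element, since all of the descriptors in the table are copied to one contiguous range starting at m_CurrentCPUDescriptorHandle.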
```cpp
            // Copy the staged CPU visible descriptors to the GPU visible descriptor heap.
            device->CopyDescriptors(1, pDestDescriptorRangeStarts, pDestDescriptorRangeSizes,
                numSrcDescriptors, pSrcDescriptorHandles, nullptr, m_DescriptorHeapType);
```
The CPU descriptors are copied to the GPU visible descriptor heap using the ID3D12Device::CopyDescriptors method. This method has the following signature [4]:
```cpp
void CopyDescriptors(
    UINT                              NumDestDescriptorRanges,
    const D3D12_CPU_DESCRIPTOR_HANDLE *pDestDescriptorRangeStarts,
    const UINT                        *pDestDescriptorRangeSizes,
    UINT                              NumSrcDescriptorRanges,
    const D3D12_CPU_DESCRIPTOR_HANDLE *pSrcDescriptorRangeStarts,
    const UINT                        *pSrcDescriptorRangeSizes,
    D3D12_DESCRIPTOR_HEAP_TYPE        DescriptorHeapsType
);
```
It takes the following arguments:
- UINT NumDestDescriptorRanges: The number of destination descriptor ranges to copy to. In this case, there is only 1 destination descriptor range.
- const D3D12_CPU_DESCRIPTOR_HANDLE *pDestDescriptorRangeStarts: An array of D3D12_CPU_DESCRIPTOR_HANDLEs that mark the start of each destination range.
- const UINT *pDestDescriptorRangeSizes: An array containing the size of each destination descriptor range.
- UINT NumSrcDescriptorRanges: The number of source descriptor ranges to copy from. Because the staged source descriptors are not required to be contiguous in the same CPU visible descriptor heap (or even to come from the same descriptor heap), each one is treated as its own range: the number of source ranges equals the number of descriptors to copy, and each source range has a size of 1.
- const D3D12_CPU_DESCRIPTOR_HANDLE *pSrcDescriptorRangeStarts: An array of D3D12_CPU_DESCRIPTOR_HANDLEs that mark the start of each source range.
- const UINT *pSrcDescriptorRangeSizes: An array containing the size of each source descriptor range. This parameter is optional; if it is null, each source range is treated as having a size of 1 and the descriptors are copied one at a time. Since the source descriptors do not appear in a consecutive range in the source descriptor heaps, this behaviour is exactly what is required.
- D3D12_DESCRIPTOR_HEAP_TYPE DescriptorHeapsType: Specifies the type of descriptor heap to copy with.
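The gather behaviour of the call on line 178 (many scattered single-descriptor source ranges written into one contiguous destination range) can be illustrated with a small model. This is a hypothetical sketch, not the D3D12 implementation: a "descriptor heap" is a std::vector<int>, a "handle" is an index into it, and CopyDescriptorsModel is an invented name. The model also assumes that both sides describe the same total number of descriptors and always requires explicit destination sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: a "descriptor heap" is a vector of ints and a "handle" is an
// index into it. No Direct3D 12 calls are made; only the range semantics are shown.
using Heap = std::vector<int>;

void CopyDescriptorsModel(Heap& dstHeap, unsigned numDestRanges,
                          const std::size_t* destStarts, const unsigned* destSizes,
                          const Heap& srcHeap, unsigned numSrcRanges,
                          const std::size_t* srcStarts, const unsigned* srcSizes)
{
    assert(destSizes != nullptr); // this model always expects explicit destination sizes

    // Gather the source ranges into one flat list; a null size array means
    // every source range holds exactly one descriptor.
    std::vector<int> gathered;
    for (unsigned r = 0; r < numSrcRanges; ++r)
    {
        const unsigned size = srcSizes ? srcSizes[r] : 1u;
        for (unsigned i = 0; i < size; ++i)
            gathered.push_back(srcHeap[srcStarts[r] + i]);
    }

    // The model assumes both sides describe the same total number of descriptors.
    std::size_t destTotal = 0;
    for (unsigned r = 0; r < numDestRanges; ++r)
        destTotal += destSizes[r];
    assert(destTotal == gathered.size());

    // Scatter the gathered descriptors, in order, into the destination ranges.
    std::size_t next = 0;
    for (unsigned r = 0; r < numDestRanges; ++r)
        for (unsigned i = 0; i < destSizes[r]; ++i)
            dstHeap[destStarts[r] + i] = gathered[next++];
}
```

Calling the model with a single destination range of size 3 starting at slot 3, and three scattered source slots (4, 0, 2) with a null source size array, writes those three values contiguously into slots 3 to 5. This mirrors the shape of the CopyDescriptors call, where the destination is one contiguous range in the GPU visible heap.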
```cpp
            // Set the descriptors on the command list using the passed-in setter function.
            setFunc(d3d12GraphicsCommandList, rootIndex, m_CurrentGPUDescriptorHandle);
```
Using the setter function passed to the CommitStagedDescriptors method, the GPU visible descriptors are set on the command list.
```cpp
            // Offset current CPU and GPU descriptor handles.
            m_CurrentCPUDescriptorHandle.Offset(numSrcDescriptors, m_DescriptorHandleIncrementSize);
            m_CurrentGPUDescriptorHandle.Offset(numSrcDescriptors, m_DescriptorHandleIncrementSize);
            m_NumFreeHandles -= numSrcDescriptors;
```
The current CPU and GPU descriptor handles are advanced on lines 186-187 by the number of descriptors that were copied, and the number of free handles in the current descriptor heap is decremented by the same amount on line 188.
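The handle arithmetic on lines 186-188 is a plain linear (bump) allocation. The sketch below uses hypothetical CpuHandle and GpuHandle structs that imitate how the d3dx12.h helpers CD3DX12_CPU_DESCRIPTOR_HANDLE::Offset and CD3DX12_GPU_DESCRIPTOR_HANDLE::Offset advance ptr by offsetInDescriptors * descriptorIncrementSize bytes. The increment size used in the usage example (32) is only an illustrative value; a real application must query it with ID3D12Device::GetDescriptorHandleIncrementSize.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the CD3DX12 handle helpers; only the Offset math is modelled.
struct CpuHandle
{
    std::size_t ptr;
    void Offset(int offsetInDescriptors, unsigned descriptorIncrementSize)
    {
        ptr = static_cast<std::size_t>(static_cast<std::int64_t>(ptr) +
              static_cast<std::int64_t>(offsetInDescriptors) * static_cast<std::int64_t>(descriptorIncrementSize));
    }
};

struct GpuHandle
{
    std::uint64_t ptr;
    void Offset(int offsetInDescriptors, unsigned descriptorIncrementSize)
    {
        ptr = static_cast<std::uint64_t>(static_cast<std::int64_t>(ptr) +
              static_cast<std::int64_t>(offsetInDescriptors) * static_cast<std::int64_t>(descriptorIncrementSize));
    }
};

// Advances both handles past a descriptor table that was just copied and
// consumes the same number of free handles (hypothetical helper name).
void AdvancePastTable(CpuHandle& cpu, GpuHandle& gpu, unsigned& numFreeHandles,
                      unsigned numDescriptors, unsigned descriptorIncrementSize)
{
    assert(numDescriptors <= numFreeHandles); // the caller must have checked for space
    cpu.Offset(static_cast<int>(numDescriptors), descriptorIncrementSize);
    gpu.Offset(static_cast<int>(numDescriptors), descriptorIncrementSize);
    numFreeHandles -= numDescriptors;
}
```

For example, starting from a CPU handle at 1000 with an increment size of 32, copying a 3-descriptor table moves the CPU handle to 1096 and leaves 1021 of 1024 handles free. Because the CPU and GPU handles are advanced by the same amount, they continue to refer to the same slot in the shader visible heap.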
```cpp
            // Flip the stale bit so the descriptor table is not recopied again unless it is updated with a new descriptor.
            m_StaleDescriptorTableBitMask ^= (1u << rootIndex);
        }
    }
}
```
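To ensure the current descriptor table is not copied again (unless its descriptors are updated), the corresponding bit in the m_StaleDescriptorTableBitMask bitmask is flipped on line 191. Since _BitScanForward only returns indices of bits that are set, flipping the bit always clears it, which also guarantees that the while loop terminates.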
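The scan-and-clear loop can be tried outside of Visual Studio. Because _BitScanForward is an MSVC intrinsic, the sketch below substitutes a hypothetical portable helper, BitScanForwardPortable, that follows the same contract (store the index of the lowest set bit and return true, or return false when the mask is zero). It is an illustration of the iteration pattern under that assumption, not the tutorial's code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical portable stand-in for _BitScanForward: writes the index of the
// least significant set bit to *index and returns true, or returns false
// (leaving *index untouched) when mask is zero.
bool BitScanForwardPortable(unsigned long* index, std::uint32_t mask)
{
    assert(index != nullptr);
    if (mask == 0)
        return false;
    unsigned long i = 0;
    while ((mask & 1u) == 0)
    {
        mask >>= 1;
        ++i;
    }
    *index = i;
    return true;
}

// Visits every stale root index from LSB to MSB and returns them in visiting order.
std::vector<unsigned long> CollectStaleRootIndices(std::uint32_t staleMask)
{
    std::vector<unsigned long> visited;
    unsigned long rootIndex = 0;
    while (BitScanForwardPortable(&rootIndex, staleMask))
    {
        visited.push_back(rootIndex);   // descriptors for rootIndex would be copied here
        staleMask ^= (1u << rootIndex); // the bit is known to be set, so XOR clears it
    }
    return visited;
}
```

With a mask of 0x16 (binary 10110), the loop visits root indices 1, 2 and 4 in that order and stops once the mask reaches zero.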
DynamicDescriptorHeap::CommitStagedDescriptorsForDraw
The CommitStagedDescriptorsForDraw method is a helper method that passes the ID3D12GraphicsCommandList::SetGraphicsRootDescriptorTable method to the CommitStagedDescriptors method as the setter function.
```cpp
void DynamicDescriptorHeap::CommitStagedDescriptorsForDraw(CommandList& commandList)
{
    CommitStagedDescriptors(commandList, &ID3D12GraphicsCommandList::SetGraphicsRootDescriptorTable);
}
```
DynamicDescriptorHeap::CommitStagedDescriptorsForDispatch
The CommitStagedDescriptorsForDispatch method is a helper method that passes the ID3D12GraphicsCommandList::SetComputeRootDescriptorTable method to the CommitStagedDescriptors method as the setter function.
```cpp
void DynamicDescriptorHeap::CommitStagedDescriptorsForDispatch(CommandList& commandList)
{
    CommitStagedDescriptors(commandList, &ID3D12GraphicsCommandList::SetComputeRootDescriptorTable);
}
```
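Both helpers rely on the fact that a pointer to a member function can be stored in a callable object and later invoked with the object pointer as its first argument, which is how setFunc(d3d12GraphicsCommandList, rootIndex, ...) reaches the right method. The sketch below assumes the setter parameter is a std::function (any callable wrapper with the same invocation rules would behave the same way) and replaces ID3D12GraphicsCommandList with a hypothetical MockCommandList that simply records each call.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical mock standing in for ID3D12GraphicsCommandList. It records each
// root descriptor table call so the forwarding can be observed without a GPU.
struct MockCommandList
{
    struct Call { char kind; unsigned rootIndex; unsigned long long gpuHandle; };
    std::vector<Call> calls;

    void SetGraphicsRootDescriptorTable(unsigned rootIndex, unsigned long long gpuHandle)
    {
        calls.push_back({ 'G', rootIndex, gpuHandle });
    }
    void SetComputeRootDescriptorTable(unsigned rootIndex, unsigned long long gpuHandle)
    {
        calls.push_back({ 'C', rootIndex, gpuHandle });
    }
};

// Assumed setter type: std::function stores a pointer-to-member function and
// invokes it with the object pointer passed as the first argument.
using SetterFunc = std::function<void(MockCommandList*, unsigned, unsigned long long)>;

// Stand-in for CommitStagedDescriptors: it does not know which pipeline it serves.
void CommitStagedModel(MockCommandList& commandList, unsigned rootIndex,
                       unsigned long long gpuHandle, const SetterFunc& setFunc)
{
    setFunc(&commandList, rootIndex, gpuHandle);
}

void CommitForDraw(MockCommandList& commandList, unsigned rootIndex, unsigned long long gpuHandle)
{
    CommitStagedModel(commandList, rootIndex, gpuHandle, &MockCommandList::SetGraphicsRootDescriptorTable);
}

void CommitForDispatch(MockCommandList& commandList, unsigned rootIndex, unsigned long long gpuHandle)
{
    CommitStagedModel(commandList, rootIndex, gpuHandle, &MockCommandList::SetComputeRootDescriptorTable);
}
```

The design keeps a single copy of the commit logic: only the final "set root descriptor table" step differs between the graphics and compute pipelines, so it is injected by the caller.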
DynamicDescriptorHeap::CopyDescriptor
The CopyDescriptor method is used to copy a single CPU visible descriptor to a GPU visible descriptor heap.
```cpp
D3D12_GPU_DESCRIPTOR_HANDLE DynamicDescriptorHeap::CopyDescriptor(CommandList& commandList, D3D12_CPU_DESCRIPTOR_HANDLE cpuDescriptor)
{
    if (!m_CurrentDescriptorHeap || m_NumFreeHandles < 1)
    {
        m_CurrentDescriptorHeap = RequestDescriptorHeap();
        m_CurrentCPUDescriptorHandle = m_CurrentDescriptorHeap->GetCPUDescriptorHandleForHeapStart();
        m_CurrentGPUDescriptorHandle = m_CurrentDescriptorHeap->GetGPUDescriptorHandleForHeapStart();
        m_NumFreeHandles = m_NumDescriptorsPerHeap;

        commandList.SetDescriptorHeap(m_DescriptorHeapType, m_CurrentDescriptorHeap.Get());

        // When updating the descriptor heap on the command list, all descriptor
        // tables must be (re)copied to the new descriptor heap (not just
        // the stale descriptor tables).
        m_StaleDescriptorTableBitMask = m_DescriptorTableBitMask;
    }
```
Similar to the CommitStagedDescriptors method, there must be at least one descriptor available in the currently bound descriptor heap. If the current descriptor heap is not valid or there are no free descriptors in the descriptor heap, a new descriptor heap is requested on line 210. If the current descriptor heap changes, the new descriptor heap must be bound to the command list. It is also important to reset the m_StaleDescriptorTableBitMask to ensure that all descriptors are copied to the new GPU visible descriptor heap before a draw or dispatch command is executed on the command list.
```cpp
    auto device = Application::Get().GetDevice();

    D3D12_GPU_DESCRIPTOR_HANDLE hGPU = m_CurrentGPUDescriptorHandle;
    device->CopyDescriptorsSimple(1, m_CurrentCPUDescriptorHandle, cpuDescriptor, m_DescriptorHeapType);

    m_CurrentCPUDescriptorHandle.Offset(1, m_DescriptorHandleIncrementSize);
    m_CurrentGPUDescriptorHandle.Offset(1, m_DescriptorHandleIncrementSize);
    m_NumFreeHandles -= 1;

    return hGPU;
}
```
Since only a single descriptor is being copied from the source descriptor to the destination descriptor, the ID3D12Device::CopyDescriptorsSimple method is used. This method has the following signature [5]:
```cpp
void CopyDescriptorsSimple(
    UINT                        NumDescriptors,
    D3D12_CPU_DESCRIPTOR_HANDLE DestDescriptorRangeStart,
    D3D12_CPU_DESCRIPTOR_HANDLE SrcDescriptorRangeStart,
    D3D12_DESCRIPTOR_HEAP_TYPE  DescriptorHeapsType
);
```
It takes the following parameters:
- UINT NumDescriptors: The number of descriptors to copy. Both the source and destination descriptors are considered to be consecutively ordered in their descriptor heaps.
- D3D12_CPU_DESCRIPTOR_HANDLE DestDescriptorRangeStart: A D3D12_CPU_DESCRIPTOR_HANDLE that identifies the first destination descriptor to copy to.
- D3D12_CPU_DESCRIPTOR_HANDLE SrcDescriptorRangeStart: A D3D12_CPU_DESCRIPTOR_HANDLE that identifies the first source descriptor to copy from.
- D3D12_DESCRIPTOR_HEAP_TYPE DescriptorHeapsType: Specifies the type of descriptor heap to copy with.
After copying the descriptor to the GPU visible descriptor heap, the current CPU and GPU handles are incremented, the number of free handles is decremented, and the GPU descriptor handle that was captured before the increment (hGPU) is returned on line 232.
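The bookkeeping in CopyDescriptor (hand out the current slot, advance, and roll over to a fresh heap that forces every descriptor table to be recopied) can be modelled without Direct3D 12. In the hypothetical sketch below, LinearDescriptorStager and GpuSlot are invented names: a "heap" is a std::vector<int>, a GPU handle is a (heap index, slot) pair, and requesting a heap plus binding it to the command list are reduced to appending a new vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: a GPU handle is represented as (heap index, slot).
struct GpuSlot { std::size_t heap; std::uint32_t slot; };

class LinearDescriptorStager
{
public:
    LinearDescriptorStager(std::uint32_t descriptorsPerHeap, std::uint32_t tableBitMask)
        : m_DescriptorsPerHeap(descriptorsPerHeap), m_TableBitMask(tableBitMask)
    {
        assert(descriptorsPerHeap > 0);
    }

    GpuSlot CopyDescriptor(int cpuDescriptor)
    {
        if (m_Heaps.empty() || m_NumFreeHandles < 1)
        {
            // New heap: start at slot 0 and mark every table stale so it is recopied.
            m_Heaps.emplace_back(m_DescriptorsPerHeap, 0);
            m_NextSlot = 0;
            m_NumFreeHandles = m_DescriptorsPerHeap;
            m_StaleBitMask = m_TableBitMask;
        }
        GpuSlot result{ m_Heaps.size() - 1, m_NextSlot }; // handle captured before advancing
        m_Heaps.back()[m_NextSlot] = cpuDescriptor;        // copy a single descriptor
        ++m_NextSlot;
        --m_NumFreeHandles;
        return result;
    }

    std::uint32_t StaleBitMask() const { return m_StaleBitMask; }
    void ClearStale() { m_StaleBitMask = 0; } // stands in for committing the tables
    std::size_t NumHeaps() const { return m_Heaps.size(); }
    int Read(GpuSlot s) const { return m_Heaps[s.heap][s.slot]; }

private:
    std::uint32_t m_DescriptorsPerHeap;
    std::uint32_t m_TableBitMask;
    std::uint32_t m_StaleBitMask = 0;
    std::uint32_t m_NextSlot = 0;
    std::uint32_t m_NumFreeHandles = 0;
    std::vector<std::vector<int>> m_Heaps;
};
```

With two descriptors per heap, the first two copies land in slots 0 and 1 of heap 0; the third copy finds no free handles, starts heap 1 at slot 0, and re-marks every table in the table bit mask as stale.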
DynamicDescriptorHeap::Reset
The Reset method is called on the DynamicDescriptorHeap class when the commands that reference any descriptor in the DynamicDescriptorHeap have finished executing on the GPU. When the DynamicDescriptorHeap is reset, all of the descriptor heaps are made available again and the descriptor table cache is reset.