Sampler Feedback Streaming (SFS) is a DirectX 12 Ultimate feature that Microsoft introduced a couple of years ago. It continuously and intelligently loads and evicts small texture tiles, so that only the parts of a texture that are actually needed are kept in memory at their highest quality. This allows for higher quality assets than previously possible while making better use of GPU memory. A more detailed explanation can be found here, but essentially, this is how the technique works:

This implementation of Sampler Feedback Streaming uses DX12 Sampler Feedback in combination with DX12 Reserved Resources, aka Tiled Resources. A multi-threaded CPU library processes feedback from the GPU, makes decisions about which tiles to load and evict, loads data from disk storage, and submits mapping and uploading requests via GPU copy queues. There is no explicit GPU-side synchronization between the queues, so rendering frame rate is not dependent on completion of copy commands (on GPUs that support concurrent multi-queue operation). In this sample, GPU time is mostly a function of the Sampler Feedback Resolve() operations described below. The CPU threads run continuously and asynchronously from the GPU (pausing when there's no work to do), polling fence completion states to determine when feedback is ready to process or when copies and memory mapping have completed.

  1. Streaming textures are allocated as DX12 Reserved Resources.
  2. A feedback resource with the same dimensions is created for each streaming resource to record which texels were sampled.
  3. Draw objects while recording feedback.
  4. Resolve feedback resource before interpreting on the CPU.
  5. Determine which tiles to load and evict.
  6. Update the residency map.
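
Steps 4 to 6 can be pictured with a small CPU-side sketch. This is not the demo's actual code and it calls no DirectX API: it is a simplified model, and the names `tiles_needed`, `plan_updates`, `residency_map`, and `packed_mip` are hypothetical. The resolved feedback is modelled as a 2D grid in which each entry is the finest mip level the GPU sampled in that region (0 is the most detailed), and `packed_mip` is assumed to be the first mip level that is always resident. Real implementations typically delay evictions and throttle loads, which is left out here.

```python
from typing import List, Set, Tuple

Tile = Tuple[int, int, int]  # (mip level, tile x, tile y)


def tiles_needed(feedback: List[List[int]], packed_mip: int) -> Set[Tile]:
    """Tiles required to satisfy the resolved min-mip feedback.

    Each region needs its requested mip plus every coarser streamable mip,
    so the sampler always has something to fall back on.
    """
    needed: Set[Tile] = set()
    for y, row in enumerate(feedback):
        for x, desired_mip in enumerate(row):
            for mip in range(desired_mip, packed_mip):
                # One tile at mip N covers 2^N x 2^N regions of mip 0.
                needed.add((mip, x >> mip, y >> mip))
    return needed


def plan_updates(feedback, resident: Set[Tile], packed_mip: int):
    """Step 5: decide which tiles to load and which to evict."""
    needed = tiles_needed(feedback, packed_mip)
    to_load = sorted(needed - resident, key=lambda t: -t[0])  # coarse first
    to_evict = sorted(resident - needed)
    return to_load, to_evict


def residency_map(resident: Set[Tile], width: int, height: int,
                  packed_mip: int) -> List[List[int]]:
    """Step 6: per region, the finest mip that is safe to sample."""
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            finest = packed_mip
            for mip in range(packed_mip - 1, -1, -1):
                if (mip, x >> mip, y >> mip) not in resident:
                    break
                finest = mip
            row.append(finest)
        result.append(row)
    return result


# A 2x2-region texture: only the top-left region was sampled at full detail.
feedback = [[0, 2], [2, 2]]
resident = {(0, 1, 1), (1, 0, 0)}
to_load, to_evict = plan_updates(feedback, resident, packed_mip=2)
resident = (resident | set(to_load)) - set(to_evict)
new_map = residency_map(resident, width=2, height=2, packed_mip=2)
```

In this toy run the planner loads the mip 0 tile for the top-left region, evicts the unused mip 0 tile in the bottom-right, and the resulting residency map only advertises mip 0 where the data is actually present.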

Below is a demonstration of the technology at work.

In this video, we ran the demo on the following test system:

RTX 3080 Ti

Ryzen 5 3600

32 GB Crucial Ballistix Sport LT RAM

ADATA XPG SX8200 Pro NVMe SSD 1 TB

Seasonic Focus GX-1000

Arctic Freezer 34 CO

Phanteks Eclipse P500A DRGB

The first 50 seconds of the video feature just a single object: a terrain-like object that allows the user to explore Sampler Feedback Streaming. In the top right corner, the window on the left shows the raw GPU min mip feedback, while the window on the right shows the residency map that results from Sampler Feedback Streaming. This will give you an idea of which tiles are evicted as you move the terrain around the screen. SFS seems to be doing a very effective job of displaying this terrain, even with quick camera cuts, without any visible break-up or pop-in.

From 0:50 onwards is the more interesting part.

The demo starting at the 50 second mark was configured to run in 4K, and each object uses a 16k x 16k BC7 texture. Ordinarily, each of these textures would take up about 350 MB of GPU memory. The demo features 985 such objects, which would add up to roughly 345 GB (yes, that's GB, not MB) of GPU memory. However, as can be seen in the video, allocated VRAM usage is just over 3 GB, and dedicated VRAM usage is just over 2 GB! SFS allows for visuals that simply would not otherwise be possible due to memory constraints.
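
As a rough sanity check of these figures, the sketch below recomputes the memory a fully resident scene would need. It assumes each texture carries a full mip chain, relies on BC7 storing every 4x4 texel block in 16 bytes (1 byte per texel), and uses the standard 64 KB reserved-resource tile, which holds 256 x 256 BC7 texels. The function and constant names are ours, not the demo's.

```python
def bc7_mip_chain_bytes(side_texels: int) -> int:
    """Bytes for a square BC7 texture including its full mip chain.

    Mips smaller than 4x4 texels still occupy one whole 16-byte block.
    """
    total = 0
    side = side_texels
    while side >= 1:
        blocks_per_side = max(1, side // 4)
        total += blocks_per_side * blocks_per_side * 16
        side //= 2
    return total


TEXTURE_SIDE = 16 * 1024   # "16k x 16k"
OBJECT_COUNT = 985         # objects in the demo scene
TILE_SIDE_TEXELS = 256     # one 64 KB tile of BC7 data is 256 x 256 texels

per_texture_bytes = bc7_mip_chain_bytes(TEXTURE_SIDE)
total_bytes = per_texture_bytes * OBJECT_COUNT
mip0_tiles = (TEXTURE_SIDE // TILE_SIDE_TEXELS) ** 2

print(f"per texture: {per_texture_bytes / 1e6:.0f} MB")
print(f"all objects: {total_bytes / 1e9:.0f} GB")
print(f"tiles in the most detailed mip of one texture: {mip0_tiles}")
```

Under these assumptions a single texture comes to about 358 MB, in line with the approximate 350 MB quoted above, and the whole scene lands well above 300 GB however the units are rounded. Each texture also splits into 4096 tiles at its most detailed mip, which is the granularity SFS works at when deciding what to keep resident.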

At 1:11 of the video, things get even more interesting. The benchmark mode of the demo is engaged, which spins the camera, as well as each individual object, at maximum speed, requiring tiles to be loaded and evicted at a very high rate. As you can see, the read speed from our NVMe SSD shoots up to around 1.7 GB/s! While this demo is aimed at showcasing Sampler Feedback Streaming, it can also be useful for testing the sustained speed of your SSD and seeing whether it can maintain such high speeds over a long period of time. As for thermals, we ran the benchmark mode of this demo for nearly 20 minutes and did not see our ADATA XPG SX8200 Pro go over 50 °C, despite it maintaining read speeds of approximately 1.7 GB/s.
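
To put the 1.7 GB/s figure in perspective, here is a rough conversion into tiles. It assumes that every byte read is uncompressed tile data stored as 64 KB tiles, that the renderer runs at a hypothetical 60 frames per second (we did not measure this), and that "nearly 20 minutes" is rounded up to a full 20. Treat the results as ballpark values only.

```python
READ_BYTES_PER_SECOND = 1.7e9  # ~1.7 GB/s observed in benchmark mode
TILE_BYTES = 64 * 1024         # size of one reserved-resource tile
ASSUMED_FPS = 60               # hypothetical frame rate, not measured
RUN_SECONDS = 20 * 60          # "nearly 20 minutes", rounded up

tiles_per_second = READ_BYTES_PER_SECOND / TILE_BYTES
tiles_per_frame = tiles_per_second / ASSUMED_FPS
total_read_terabytes = READ_BYTES_PER_SECOND * RUN_SECONDS / 1e12

print(f"tiles per second: {tiles_per_second:,.0f}")
print(f"tiles per frame at {ASSUMED_FPS} fps: {tiles_per_frame:.0f}")
print(f"data read over the run: {total_read_terabytes:.2f} TB")
```

Under these assumptions the drive delivers roughly 26,000 tiles per second, a little over 400 per frame, and reads about 2 TB over the run, twice the capacity of the 1 TB SSD in our test system.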

When assets are needed quickly, an SSD can page them in promptly thanks to its low latency, even if little bandwidth is required. Given that Sampler Feedback Streaming loads and evicts many tiles at once, this will be crucial in games utilizing SFS. However, as seen in the benchmark mode of this demo, when several GB of high quality assets need to be streamed in within a short period of time, the high bandwidth of an NVMe SSD will also be required. Sampler Feedback Streaming working in tandem with high speed storage will be the gateway to high quality visuals, while keeping memory requirements within a very reasonable range.

Another important aspect of this is DirectStorage. Up until Windows 11, we were dealing with a very antiquated storage architecture that failed to take full advantage of advanced storage devices such as NVMe SSDs. As we have written previously, Windows 11 changes that with the addition of I/O Rings and BypassIO. However, games are still being developed for the old storage APIs. Once developers start utilizing DirectStorage, especially on Windows 11, we will see lower CPU overhead and much more efficient loading of assets from storage. When combining Sampler Feedback Streaming with fast SSDs and DirectStorage, the possibilities will be endless for game developers. SFS will intelligently decide which textures need to be loaded or evicted, and DirectStorage will provide those assets in an efficient manner, paving the way for larger and more detailed worlds with no break-up or pop-in.

Once DirectStorage games or demos become available, we will provide in-depth coverage of its performance, along with detailed storage activity metrics. Make sure to stay in touch with us so you do not miss out.

Stay in touch with Compusemble

To stay up to date on tech news, as well as to see our scores for PC components such as GPUs, CPUs, and SSDs, visit our site and follow us on Twitter.

Visit our YouTube channel for all your tech and gaming content needs.
