Last week, Futuremark released Time Spy, a new DirectX 12 benchmark that takes full advantage of DX12's features and capabilities, including asynchronous compute. While the release of a new benchmark is typically of modest interest, there's been a great deal of confusion, uncertainty, and doubt over Time Spy's benchmark results and what those results mean. Futuremark has since published an updated and expanded guide to how the benchmark functions and what it's designed to do.

Much of the confusion on this topic is related to what Time Spy tests and how it implements support for asynchronous compute in DirectX 12. A graph from PC Perspective's test results from last week will illustrate the question:

These results show performance in Time Spy at the benchmark's default settings with asynchronous compute enabled versus disabled. AMD's GPUs gain a significant amount of performance, with the RX 480 increasing its score by 8.5% while the R9 Nano and Fury X pick up 11.1% and 12.9% respectively. Nvidia's Maxwell cards, in contrast, are flat.

Pascal, however, does gain some performance, with the GTX 1070 gaining 5.4% and the GTX 1080 picking up 6.8%. This stands contrary to what we've seen in most DX12 benchmarks to date, in which enabling async compute on Nvidia cards either led to a small performance decrease or had no impact on performance at all.
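For readers who want to check figures like these against raw scores, the percentage is simply the score difference relative to the async-off score. The sketch below is a minimal illustration; the function name and the two scores are hypothetical placeholders chosen for round arithmetic, not PC Perspective's measurements.

```python
def async_gain_percent(score_async_off, score_async_on):
    """Return the percentage change in score when async compute is enabled."""
    if score_async_off <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (score_async_on - score_async_off) / score_async_off


if __name__ == "__main__":
    # Hypothetical scores, not real benchmark results.
    print(async_gain_percent(1000, 1085))  # 8.5
    print(async_gain_percent(1000, 1000))  # 0.0 (a "flat" result)
```

A negative return value would correspond to the small performance decrease seen on some Nvidia cards in earlier DX12 tests.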

Revisiting asynchronous compute

The debate over whether Time Spy is a valid benchmark, and the questions regarding its implementation of asynchronous compute, speak to a significant amount of confusion in the user community about what asynchronous compute is, how it works, and how it can or should be used in DirectX 12.

While asynchronous compute support is a component of DirectX 12, the details of how to implement that support were left to AMD, Intel, and Nvidia. AMD and Nvidia implemented this capability very differently, and with very different results. We now know that Nvidia has never implemented async compute in-driver for Maxwell v2 GPUs (GM200, GM204, and GM206), which means our attempts to characterize and examine the performance impact of running asynchronous compute workloads on Maxwell in Ashes of the Singularity didn't measure what we thought they were measuring. The current consensus is that Nvidia is unlikely to ever enable asynchronous compute on Maxwell due to the difficulty of implementing the feature in a way that would improve performance.

While it's true that Nvidia could've been far clearer about Maxwell's ability to perform asynchronous compute workloads and the benefits (or lack thereof) of doing so, some in the user community have latched on to asynchronous compute as if it were the sole defining feature of the DX12 API. This is not the case.

The reason asynchronous compute has become such a prominent feature of DirectX 12 is because adopting it tends to significantly improve performance on AMD hardware. To date, AMD's GCN has picked up substantially more performance from the shift to DirectX 12 than Nvidia has, though some of this gain reflects the relative state of driver optimization between the two companies. Nvidia has historically had more cash to spend on driver optimization and developer relations, even if some of its programs, like GameWorks, have been controversial. Asynchronous compute also improves performance on AMD hardware because it exposes functionality that previously went untapped in DirectX 11.

So where does Pascal fit into all this?

Pascal adds support for fine-grained preemption and dynamic load balancing, two critical features that Maxwell lacked. One of the limits of Maxwell's asynchronous compute implementation is that the GPU had to schedule its compute and graphics workloads prior to execution and couldn't shift its strategy mid-stream. This made it comparatively likely that enabling asynchronous compute on a Maxwell v2 chip would result in poor performance due to improper resource allocation.

Maxwell is on top, Pascal is underneath.

Pascal's dynamic load balancing allows the GPU to rapidly shift the resources it dedicates to compute and graphics depending on what's happening in-game. This feature doesn't automatically guarantee that Pascal will benefit from asynchronous compute, but it fixes a major issue with Nvidia's last generation. Pascal's other new capability, fine-grained preemption, allows the GPU to quickly switch between workloads at the pixel level, rather than Maxwell v2's coarse-grained draw-call boundary. AnandTech has just published a longer in-depth look at both these topics if you'd like additional information.
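The difference between a split fixed before execution and one that can shift mid-stream can be sketched with a deliberately crude toy model. This is not how either GPU actually schedules work: the function names, the abstract "work" amounts, and the count of execution units are all assumptions made for illustration, work is treated as perfectly divisible, and fine-grained preemption is not modeled at all.

```python
def static_partition_time(graphics_work, compute_work, total_units, graphics_units):
    """Finish time when the graphics/compute split is fixed before execution.

    Toy stand-in for the Maxwell v2 behavior described above: units assigned
    to one queue sit idle once that queue is empty.
    """
    if not 0 < graphics_units < total_units:
        raise ValueError("graphics_units must leave at least one unit for compute")
    compute_units = total_units - graphics_units
    # Each queue runs only on its own units, so the slower queue sets the time.
    return max(graphics_work / graphics_units, compute_work / compute_units)


def dynamic_balance_time(graphics_work, compute_work, total_units):
    """Finish time when idle units are reassigned immediately.

    Toy stand-in for dynamic load balancing: every unit stays busy until all
    work is done, regardless of the initial split.
    """
    return (graphics_work + compute_work) / total_units


if __name__ == "__main__":
    # Hypothetical frame: 80 units of graphics work, 20 of compute, 10 execution units.
    frame = dict(graphics_work=80, compute_work=20, total_units=10)
    print(static_partition_time(graphics_units=5, **frame))  # 16.0 (bad guess)
    print(static_partition_time(graphics_units=8, **frame))  # 10.0 (ideal guess)
    print(dynamic_balance_time(**frame))                     # 10.0
```

In this model a static split only matches the dynamic result when the up-front guess happens to be ideal; a poor guess leaves half the chip idle for most of the frame, which is the "improper resource allocation" risk described above.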

Enthusiasts will undoubtedly be quick to point out that Pascal, despite these changes, still can't execute an asynchronous compute workload the way AMD can, and they're correct. What gets lost is the fact that Pascal's architecture wouldn't benefit from executing workloads in the same way as AMD, because it isn't designed to do so. The flip side to this is that workloads optimized for Pascal probably wouldn't run all that well on AMD hardware, either. Futuremark built a benchmark that's designed to run well on both vendors' cards without favoring either one.

This approach should prove similar to what we'll see in future titles, given that different applications can and will use asynchronous compute in distinct ways and to varying degrees. GPUs within the same family also respond differently to asynchronous compute; the RX 480 picks up 8.5% in Time Spy while the Fury X gains 12.9%. Does that mean Time Spy is biased against the RX 480 just because the Fury X gets a much larger boost from the feature? Of course not.

Futuremark's Time Spy

The Time Spy-related questions can be broadly summarized as follows:

  • Why does Nvidia's Pascal architecture gain performance in Time Spy when it shows no performance gain from asynchronous compute in other benchmarks?
  • Why doesn't Futuremark implement optimized, vendor-specific code paths for AMD and Nvidia? Isn't this a functional requirement of DX12?

According to Futuremark, Time Spy uses a new engine specifically architected for DirectX 12. The benchmark was designed over a period of two years of active collaboration with Intel, AMD, and Nvidia, all of whom have had source code access and have contributed best practices and technical understanding. Furthermore, all of Futuremark's partners have signed off on releasing the benchmark in its current form.

I'd like to note that this public explanation lines up with what we've heard privately. Neither AMD nor Nvidia's PR teams are known for their reticence when it comes to attacking benchmarks they perceive as flawed or unfair, and neither company has anything negative to say about Time Spy.

Futuremark goes on to say it has considered implementing vendor-specific code paths, but that its partners are invariably against the practice. It writes:

In many cases, an aggressive optimization path would also require altering the work being done, which means the test would no longer provide a common reference point. And with separate paths for each architecture, not only would the outputs not be comparable, but the paths would be obsolete with every new architecture launch.

3DMark benchmarks use a path that is heavily optimized for all hardware. This path is developed by working with all vendors to ensure that our engine runs as efficiently as possible on all available hardware. Without vendor support and participation this would not be possible, but we are lucky in having active and dedicated development partners.

Ultimately, 3DMark aims to predict the performance of games in general. To accomplish this, it needs to be able to predict games that are heavily optimized for one vendor, both vendors, and games that are fairly agnostic. 3DMark is not intended to be a measure of the absolute theoretical maximum performance of hardware.

This statement caused some controversy in the user community because a joint AMD-Nvidia presentation at GDC 2016 prominently claimed that there was no point to implementing DirectX 12 unless you planned to also implement IHV-specific code paths.

[Slide: Engine Considerations]

So, is this proof of skullduggery, bias, or hypocrisy? No. In fact, from my perspective as a reviewer, it's quite the opposite.

Vendor-optimized paths are risky

Back in 2008, when I worked for Ars Technica, I wrote a review of the VIA Nano. During the course of testing that CPU, I decided to use a VIA-provided utility to alter the CPUID string that identifies the microprocessor. Most of the test scores didn't change, but the memory subsystem score changed drastically.

The Nano (AMD) and Nano (Intel) labels mean that the chip identified itself as having been manufactured by AMD and Intel, respectively. The Intel code path is 47% faster than the default path.

Changing the CPUID improved Nano's performance by 47% because a vendor-specific codepath had been implemented and certain optimizations had been tied to it. Futuremark always insisted that this was due to an accident rather than a deliberate effort to skew benchmark results in favor of Intel. When Futuremark announced PCMark 8 I asked the company what had happened after the PCMark05 controversy. Futuremark informed me it had overhauled its developer programs and optimization strategies to avoid vendor-specific, hand-optimized code paths because of the fallout surrounding the PCMark05 issue.
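To make the mechanism concrete, here is a minimal sketch of how tying an optimization to the CPUID vendor string produces this kind of result. It is not PCMark05's code: the function names and the dispatch table are hypothetical, and the "tuned" routine merely stands in for a hand-optimized path. The vendor strings themselves ("GenuineIntel", "AuthenticAMD", and "CentaurHauls" for VIA) are the real CPUID identifiers.

```python
def copy_generic(buffer):
    """Baseline path: stands in for code with no special optimizations."""
    return bytes(buffer)


def copy_tuned(buffer):
    """Stands in for a hand-optimized routine: same result, assumed faster."""
    return bytes(buffer)


# Hypothetical dispatch table: the optimization is keyed on who the chip
# claims to be, not on what the chip can actually do.
VENDOR_PATHS = {"GenuineIntel": copy_tuned}


def select_copy_path(vendor_string):
    """Pick a code path from the CPUID vendor string alone."""
    return VENDOR_PATHS.get(vendor_string, copy_generic)


if __name__ == "__main__":
    # The same physical CPU gets a different path once its reported
    # vendor string is changed.
    print(select_copy_path("CentaurHauls").__name__)  # copy_generic
    print(select_copy_path("GenuineIntel").__name__)  # copy_tuned
```

Because the selection never inspects the hardware's real capabilities, any chip that does not report the favored vendor string is silently routed to the slower path, whether or not it could run the tuned one.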

It would be hypocritical in the extreme to attack Futuremark for using Intel-specific optimizations in one test, only to turn around and attack it for not implementing AMD or NV-specific optimizations in a different test. If I have to choose between a general-case, reasonably fair test that doesn't include vendor-specific optimizations for any architecture, and a benchmark that's been optimized to an unknown degree by multiple vendors, I'll take the former every time, even if it means missing out on seeing the absolute best-case scenario for any given GPU.

A program like Time Spy, Fire Strike, or 3DMark 11 is designed to serve as a general, representative vehicle for measuring performance in a given series of tests. Futuremark's customer base isn't limited to individual gamers. It also sells site licenses to other companies that want to measure their hardware's general performance in a standardized benchmark. 3DMark versions also tend to have longer shelf lives than game benchmarks. Most reviewers refresh their game tests on a one-to-two year cycle, while 3DMark versions typically last three years or more. Writing and updating a benchmark that performs decently well on multiple architectures without being specifically optimized for any single target may prevent any one company from showcasing a specific feature. But it also provides a framework that multiple companies can rely on for qualifying their own designs.

Conclusions

Futuremark's formal statement and updated technical guide contain a great deal of additional information on how asynchronous compute is executed on Maxwell, Pascal, and AMD GPUs. Once again, there's simply no evidence that this test is unfairly or unusually biased towards any vendor.

The only thing this benchmark shows is that Pascal can see a small improvement with async compute enabled. Given the still-early state of DirectX 12, the limited number of games that use it, and the fact that only two engines we are aware of have been written for low-overhead APIs from the ground up (Oxide's Nitrous engine and Time Spy itself), concluding that this benchmark is biased simply because it shows a small gain for Pascal is extremely premature.

Even 12 months after launch, DirectX 12 support in shipping titles is still limited, and our ability to characterize what DirectX 12 performance will look like across the entire industry is similarly constrained. With Pascal just launched and AMD's Vega arriving later this year, there's going to be ample opportunity to watch how the API evolves.