Why Not Use Heterogeneous Multi-GPU?

#gpu #directx #rendering #graphics

There was an interesting discussion recently on a Slack channel about using an integrated GPU (iGPU) together with a discrete GPU (dGPU). Many sound points were made there, so I think it's worth writing them down. But because I have probably never blogged about multi-GPU before, a few words of introduction first:

The idea to use multiple GPUs in one program is not new, but not very widespread either. In old graphics APIs like Direct3D 11 it wasn't easy to implement. Doing it right in a complex game often involved engaging driver engineers from the GPU manufacturer (like AMD, NVIDIA) or using custom vendor extensions (like AMD GPU Services - see for example Explicit Crossfire API).

The new generation of graphics APIs, Direct3D 12 and Vulkan, is lower level and gives more direct access to the hardware. This includes the possibility to implement multi-GPU support on your own. There are two modes of operation. If the GPUs are identical (e.g. two graphics cards of the same model plugged into the motherboard), you can use them as one device object. In D3D12 you then index them as Node 0, Node 1, ... and specify a NodeMask bit mask when allocating GPU memory, submitting commands, and doing all sorts of GPU things. Similarly, in Vulkan the VK_KHR_device_group extension lets you create one logical device object that uses multiple physical devices.
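To make the NodeMask idea concrete, here is a minimal, portable sketch of the bit arithmetic involved. It assumes the D3D12 convention that bit i of a node mask refers to Node i; the type alias and the helper names (`NodeMask`, `nodeBit`, `allNodes`, `isSingleNode`) are made up for illustration and are not part of any API. In a real program the node count would come from `ID3D12Device::GetNodeCount`.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative alias: D3D12 passes node masks as 32-bit unsigned integers.
using NodeMask = std::uint32_t;

// Mask selecting exactly one node (Node 0 -> 0x1, Node 1 -> 0x2, ...).
inline NodeMask nodeBit(unsigned nodeIndex) {
    assert(nodeIndex < 32);
    return NodeMask{1} << nodeIndex;
}

// Mask selecting every node of a linked adapter with `nodeCount` nodes.
inline NodeMask allNodes(unsigned nodeCount) {
    assert(nodeCount >= 1 && nodeCount <= 32);
    return nodeCount == 32 ? ~NodeMask{0} : (NodeMask{1} << nodeCount) - 1;
}

// True when exactly one bit is set, as required for masks that must name
// a single node (for example the node a heap is created on).
inline bool isSingleNode(NodeMask mask) {
    return mask != 0 && (mask & (mask - 1)) == 0;
}
```

For example, memory that lives on Node 1 but should be visible to both GPUs of a two-node system would use `nodeBit(1)` as the creation mask and `allNodes(2)` as the visibility mask.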

But this post is about heterogeneous/asymmetric multi-GPU, where there are two different GPUs installed in the system, e.g. one integrated with the CPU and one discrete. A common example is a laptop with "switchable graphics", which may have an Intel CPU with its integrated “HD” graphics plus an NVIDIA GPU. There may even be two different GPUs from the same manufacturer! My new laptop (ASUS TUF Gaming FX505DY) has AMD Radeon Vega 8 + Radeon RX 560X. Another example is a desktop PC with CPU-integrated graphics and a discrete graphics card installed. Such a combination may still be used by a single app, but to do that, you must create and use two separate Device objects. But just because you can doesn't mean you should…
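Before creating two Device objects, an application has to decide which adapter plays which role. The sketch below shows one possible heuristic, not an established method: among hardware adapters, treat the one with the most dedicated video memory as the dGPU and the one with the least as the iGPU. `AdapterInfo`, `AdapterPair`, and `pickAdapters` are hypothetical names; the struct only stands in for the fields you would read from the API's adapter description (such as `DedicatedVideoMemory` and the software flag in `DXGI_ADAPTER_DESC1`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for data read from the graphics API per adapter.
struct AdapterInfo {
    std::uint64_t dedicatedVideoMemory; // bytes of memory local to the adapter
    bool isSoftware;                    // true for a software device such as WARP
};

struct AdapterPair {
    std::size_t primary;                  // index of the adapter assumed to be the dGPU
    std::optional<std::size_t> secondary; // index of the assumed iGPU, if one was found
};

// Returns std::nullopt when the list contains no hardware adapter.
// Adapters with equal memory are not paired (secondary stays empty).
inline std::optional<AdapterPair> pickAdapters(const std::vector<AdapterInfo>& adapters) {
    std::optional<std::size_t> most, least;
    for (std::size_t i = 0; i < adapters.size(); ++i) {
        if (adapters[i].isSoftware) continue; // skip software rasterizers
        const std::uint64_t mem = adapters[i].dedicatedVideoMemory;
        if (!most || mem > adapters[*most].dedicatedVideoMemory) most = i;
        if (!least || mem < adapters[*least].dedicatedVideoMemory) least = i;
    }
    if (!most) return std::nullopt;
    assert(least.has_value()); // set together with `most` on the first hardware adapter
    AdapterPair pair{*most, std::nullopt};
    if (*least != *most) pair.secondary = *least;
    return pair;
}
```

When `secondary` comes back empty, the application would run everything on the primary adapter, which is the single-GPU path it needs anyway.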

The first question is: Are there games that support this technique? Probably few… There is just one example I have heard of: Ashes of the Singularity by Oxide Games, and it was many years ago, when DX12 was still fresh. Other than that, there are mostly tech demos, e.g. "WITCH CHAPTER 0 [cry]" by Square Enix as described on DirectX Developer Blog (also 5 years old).

An iGPU typically has lower computational power than a dGPU. Still, it could accelerate some of the computations needed each frame. One idea is to hand over the already rendered 3D scene to the iGPU so it can finish it with screen-space postprocessing effects and present it, which sounds even better if the display is connected to the iGPU. Another option is to offload some computations to it, like occlusion culling, particles, or water simulation. There are some excellent learning materials about this technique. The best one I can think of is: Multi-Adapter with Integrated and Discrete GPUs by Allen Hux (Intel), GDC 2020.

However, there are many drawbacks of this technique, which were discussed in the Slack chat I mentioned:

It's difficult to implement multi-GPU support in general and to synchronize things properly.
iGPUs have greatly varying performance, from quite fast to very slow, so implementing it to always give a performance uplift is even harder.
Passing data back and forth between dGPU and iGPU involves multiple copies. The cost of it may be larger than the performance benefit of computing on iGPU.
The iGPU shares the same power and thermal limits, memory bandwidth, and caches as the CPU, so they may slow each other down.
If you offload finishing the rendered frame (postprocessing and Present) to the iGPU, you may improve throughput a bit, but you increase latency a lot.
You need to support systems without an iGPU as well, so your testing matrix expands. (An interesting idea was posted that if it's a DirectX workload, you might fall back to the software-emulated WARP device; it's quite efficient and of good quality in terms of correctness and compliance with GPU-accelerated DX.)
Finishing and presenting a frame on the iGPU sounds like a good idea if the display is connected to the iGPU, but that is not certain. Multi-GPU laptops usually have the built-in display connected to the iGPU, but an external display output (e.g. HDMI) may be connected to the iGPU or to the dGPU (especially in "gaming laptops"). You never know.
Conscientious gamers tend to update their graphics drivers for the dGPU, but the driver for the iGPU is often left in an ancient version, full of bugs.
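
The points above about copy cost and about throughput versus latency can be put into numbers with a back-of-the-envelope model. This is only a sketch under simplifying assumptions: the copy does not stall the dGPU, the two GPUs work on consecutive frames in a perfect pipeline, and all timings and bandwidths are inputs you would have to measure. The names `FrameTiming`, `copyMs`, `singleGpu`, and `offloaded` are invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct FrameTiming {
    double intervalMs; // time between consecutive finished frames (inverse of throughput)
    double latencyMs;  // time from the start of rendering a frame until it is finished
};

// Milliseconds needed to move `bytes` at an effective bandwidth given in GB/s (1 GB = 1e9 bytes).
inline double copyMs(std::uint64_t bytes, double gbPerSec) {
    assert(gbPerSec > 0.0);
    return static_cast<double>(bytes) / (gbPerSec * 1e9) * 1e3;
}

// Everything on the dGPU: render and postprocess run back to back.
inline FrameTiming singleGpu(double renderMs, double postOnDgpuMs) {
    const double total = renderMs + postOnDgpuMs;
    return {total, total};
}

// dGPU renders frame N+1 while the iGPU copies in and postprocesses frame N.
inline FrameTiming offloaded(double renderMs, double copyTotalMs, double postOnIgpuMs) {
    const double igpuStageMs = copyTotalMs + postOnIgpuMs;
    return {std::max(renderMs, igpuStageMs), renderMs + igpuStageMs};
}
```

With made-up inputs of 10 ms rendering, 2 ms postprocessing on the dGPU, 6 ms for the same work on the iGPU, and 2 ms of copying in total, the frame interval drops from 12 ms to 10 ms while latency grows from 12 ms to 18 ms. For scale, one 1920x1080 frame at 4 bytes per pixel is 8,294,400 bytes, so `copyMs(8294400, 8.0)` gives roughly 1.04 ms per copy at an assumed 8 GB/s.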

Conclusion: Supporting heterogeneous multi-GPU in a game engine sounds like an interesting technical challenge, but better think twice before doing it in production code.

BTW, if you want to use just one GPU and worry about selecting the right one, see my old post: Switchable graphics versus D3D11 adapters.
