The Mantle version clearly shows a much more balanced CPU load across the cores – though the total CPU utilisation has only dropped from 23% on DirectX 11, to 21% on Mantle. The more balanced load is exactly as we’d hoped, since all the Mantle API calls are now distributed across the available cores by our Asura engine’s multithreaded task system, just like we do for other systems like AI, animation or physics.
It’s worth noting that Sniper Elite 3 and the Asura engine are already optimised to account for DirectX 11’s weaknesses. For example, we make heavy use of instancing and similar batching techniques to reduce the number of draw calls we make per frame – all the usual things to reduce CPU overhead, which means Mantle will have less easy wins compared to other draw-call heavy titles.
So that’s what the CPU is doing – but what’s the actual framerate? On those settings we’re running at an average of 88fps on DX11, and 100fps on Mantle – around a 14% speed increase. This explains why the total CPU utilisation is still quite similar – with Mantle the CPU has to cope with 12 more frames every second, meaning we’re packing in more work and still using less CPU power. Furthermore because the work is more distributed, if we increase CPU load (say by using a faster graphics card, or by lowering resolution) we’re less likely for a single logical processor to become the bottleneck.
The size of the frame-rate increase is a pleasant surprise, as frankly at this stage in development we were expecting to have a more roughly equal frame-rate when GPU bound.
There’s still a fair amount of scope for increasing performance with Mantle, particularly as we’re not yet taking advantage of the Asynchronous Compute queue. This would allow us to take some of our expensive compute shaders – like our Obscurance Fields technique – and schedule them to run in parallel with the rendering of shadow maps, which are particularly light on ALU work.