I’ve been asked numerous times why we are using Unity instead of any other engine, and for us it always comes back to the same reason: Unity's biggest strength is that you never have to use Unity. Anything in Unity can be replaced with something that better suits your needs. You’re never forced to use what comes out of the box.
In summer 2018, Hellpoint had its first fully dressed and lit scene. This gave us our first full performance test, and the results were, as expected, terrible. Our performance goals are as follows:
60 FPS on PlayStation 4 / Xbox One in single player and 40 FPS in split-screen coop
600 draw calls
In our scene, in the worst sections, we were hitting around:
30 FPS in single player
2000 draw calls
Draw calls aren’t something we will cover here. The triangle count was something we knew was an issue from the start, but also one we knew how to handle: LODs (levels of detail)! Over the following week, our artists created all the LODs needed for our scene.
Importance of LODs
LODs are very important, as they allow meshes with less detail to be used when drawn at a long distance. While drastically increasing performance, they can also improve the overall visual quality by removing the flickering that occurs when drawing details smaller than a screen pixel.
LODs can also be used to cull an object entirely. For example, if you add decals over a wall, it might not be useful to draw them when they take up too little screen real estate.
They can decrease both your triangle count and your draw call count. Pretty much every major game uses levels of detail in some way, from making them manually to generating them with tools, or even using GPU subdivision features to reduce the triangle count at longer distances.
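To make the idea concrete, here is a minimal sketch of how a LOD level could be picked from how much of the screen an object covers. It is only an illustration: the function names, the bounding-sphere approximation, and the threshold values are assumptions made for this example, not Unity's implementation or the one used in Hellpoint.

```python
import math

def screen_relative_height(object_radius, distance, vertical_fov_deg):
    """Approximate fraction of the screen height covered by a bounding sphere."""
    if distance <= 0.0:
        return 1.0
    half_fov = math.radians(vertical_fov_deg) / 2.0
    # Height of the view frustum at this distance from the camera.
    visible_height = 2.0 * distance * math.tan(half_fov)
    return min(1.0, (2.0 * object_radius) / visible_height)

def select_lod(relative_height, thresholds):
    """Return the first LOD index whose threshold is met, or None when culled.

    thresholds must be in descending order. The values are hypothetical, e.g.
    (0.5, 0.2, 0.05): LOD0 from 50% of the screen height, LOD1 from 20%,
    LOD2 from 5%, and nothing drawn below that.
    """
    for level, threshold in enumerate(thresholds):
        if relative_height >= threshold:
            return level
    return None

thresholds = (0.5, 0.2, 0.05)
near = select_lod(screen_relative_height(1.0, 1.0, 90.0), thresholds)
far = select_lod(screen_relative_height(1.0, 100.0, 90.0), thresholds)
```

With these made-up thresholds, the object one unit away gets the most detailed mesh, while the same object a hundred units away covers about 1% of the screen height and is not drawn at all, which is the decal case described above.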
LODGroup First Results
We integrated them and fired it up! Here are the results we first got:
6 FPS in single player
4000 draw calls
Now that is going straight in the wrong direction. It didn’t take long to figure out that LODGroups were not behaving the way we expected:
Unity 5.4: “Enable / Disable the LODGroup - Disabling will turn OFF all renderers.”
Unity 5.5: “Enable / Disable the LODGroup - Disabling will turn ON all renderers.”
Surprisingly, the LODGroup documentation still claims that disabling it turns off renderers, which isn’t the case and hasn't been since Unity 5.4. A story for another time.
Our sectorization system was made to simply disable a LODGroup when a sector isn’t visible. In our case, it meant that all LOD levels of a culled sector were suddenly visible, which easily explained the 12 million triangles!
LODGroup Second Results
Disabling renderers instead of disabling the LODGroup isn’t too hard, even if the CPU cost of doing so was too much for our liking. Once done:
20 FPS in single player
2000 draw calls
Now the triangle count is very close to our target. The remaining 100k can be shaved off by tweaking LODs or by optimizing some meshes. However, the framerate is still worse than having no LODs! As we dug deeper, we also found the following issues with LODGroups that would, in the end, prevent us from using them:
They are computed on every camera, regardless of what the camera renders (no layer culling and no camera flag)
They cost roughly 8 ms of CPU time per 100k LODGroups, per camera
Forcing a specific LOD using ForceLOD has no impact on CPU time
The only way to reduce the CPU cost is by turning LODGroup off, which makes all renderers visible
A LOD update is performed before a camera renders, on the main thread and is blocking
At any point in time, Hellpoint would have between 4 and 20 cameras active in the scene:
Main Player World
Main Player Skybox
Main Player UI
Coop Player World
Coop Player Skybox
Coop Player UI
Main Game UI
Reflection Probe realtime update
And a number of shadow casters (0 to 8)
Our scenes also have between 60k and 120k LODGroups. Even while turning off culled LODGroups, we would still end up with a nasty fixed per-camera cost of about 0.056 ms per 1000 LODGroups, as seen in the following image.
Even in the best-case scenario, we ended up with 1 to 1.5 ms spent on LODs per camera, which adds up to 4 to 6 ms of CPU time. In some of the worst cases, we would end up with 25 to 28 ms spent only on updating LODs. Considering we only have a budget of 16 ms per frame, Unity's LODGroup ended up being way too expensive for us; even the best case wasn't acceptable. Sadly, removing cameras was not an option either.
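The arithmetic behind those numbers is simple enough to write down. The snippet below is just a back-of-the-envelope check using the figures quoted above; the helper names are made up for this example, and it assumes the best case corresponds to the minimum of four active cameras.

```python
FRAME_BUDGET_MS = 16.0  # per-frame CPU budget quoted above

def lod_update_cost_ms(cameras, cost_per_camera_ms):
    """Main-thread time when every active camera pays for its own LOD update."""
    return cameras * cost_per_camera_ms

def budget_share(cost_ms, budget_ms=FRAME_BUDGET_MS):
    """Fraction of the frame budget consumed by the LOD update alone."""
    return cost_ms / budget_ms

# Best case: 4 cameras (assumed) at 1 to 1.5 ms each.
best_case = [lod_update_cost_ms(4, per_camera) for per_camera in (1.0, 1.5)]
best_case_share = [budget_share(cost) for cost in best_case]

# Worst case: the 25 to 28 ms measured totals quoted above.
worst_case_share = [budget_share(cost) for cost in (25.0, 28.0)]
```

Under that assumption, the best case already eats between a quarter and more than a third of the frame before anything is drawn, and the worst case is over budget on its own.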
We needed a LOD system that would fit our needs:
Only use the player world camera(s) (1 camera in single player, 2 in coop)
Only perform work if changes occur between frames
Turn off all renderers when culled from a sector
We can also make a few assumptions:
It doesn't matter if the proper LOD levels are 1 frame late; in our tests, nobody was able to notice it
The camera doesn't move fast in Hellpoint
LODs culled from sectorization can be safely ignored in all processes
We don't have to revert the LOD solution between frames, only update it to the newer frame
A frame will always take longer to update and draw than it takes a separate thread to compute LODs
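The requirements and assumptions above can be sketched roughly as follows. This is not the code that shipped in Hellpoint, only a small model of the idea: a worker thread computes LOD levels from the player camera(s), and the main thread applies the result one frame later, touching only the objects whose level changed. Every name here (`LodSystem`, `pick_level`, `CULLED`), the distance-based selection, and the data layout are assumptions made for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor

CULLED = -1  # no renderer enabled for this object

def pick_level(position, sector_visible, cameras, max_distances):
    """Pick a LOD from the distance to the nearest player camera."""
    if not sector_visible:
        return CULLED  # objects in culled sectors are skipped entirely
    nearest = min(math.dist(position, camera) for camera in cameras)
    for level, limit in enumerate(max_distances):
        if nearest <= limit:
            return level
    return CULLED  # beyond the last LOD distance

class LodSystem:
    def __init__(self, object_count, max_distances):
        self.max_distances = tuple(max_distances)
        self.levels = [CULLED] * object_count     # applied on the main thread
        self._computed = [CULLED] * object_count  # touched only by the worker job
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = None

    def _job(self, objects, cameras):
        """Runs on the worker thread and returns only the levels that changed."""
        changes = []
        for index, (position, sector_visible) in enumerate(objects):
            level = pick_level(position, sector_visible, cameras, self.max_distances)
            if level != self._computed[index]:
                self._computed[index] = level
                changes.append((index, level))
        return changes

    def update(self, objects, cameras):
        """Call once per frame from the main thread; returns the applied changes."""
        changes = []
        if self._pending is not None:
            # Result of the job started last frame, so levels are one frame late.
            changes = self._pending.result()
            for index, level in changes:
                self.levels[index] = level  # a real engine would toggle renderers here
        # Start the next job on a shallow snapshot of the scene description.
        self._pending = self._pool.submit(self._job, tuple(objects), tuple(cameras))
        return changes

    def shutdown(self):
        self._pool.shutdown(wait=True)
```

In this sketch, `objects` is a list of `(position, sector_visible)` tuples and `cameras` is a list of player camera positions, so a coop session simply passes two. The main-thread work in `update` is proportional to the number of changes rather than to the number of objects, and a frame in which nothing moved applies nothing at all.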
Writing this new system took us about three days of work and debugging. In the end, the new system only spends on average 0.05 ms of CPU time on the main thread, and about 5 ms on a separate worker thread. The time spent on the main thread, between 0 and 0.3 ms, depends on how many LODs changed since the last frame.
You can see from the screenshot above that the whole LOD update now takes drastically less time than the camera scene culling! Most importantly, this LOD update is performed only once per frame. We hit our goal of 500k triangles, and with less CPU overhead than we first anticipated.
You can read more about our steps and how we solved it here.