Around a year ago, I was working on a technology demonstration for my WebAssembly back-end, a project that would end up being called wasmCloud. The demo was a game engine that ran in the cloud--a distributed ECS (Entity/Component/System) architecture.
This game allowed players to enter a shared, persistent, 3D sci-fi universe from web browsers and mobile devices alike. The ECS had a system called
radar that would, on each frame, populate the list of entities to appear on a player's radar GUI.
After a few hundred people got into the game, we noticed the radar system would get a bit laggy. After it lagged, it eventually got so bad that the radar contacts being displayed in the GUI were several seconds behind what was actually happening in real-time according to the
movement system. If we left the game running long enough, the radar system lag ultimately made the entire game unplayable and we had to restart it.
Thankfully the fault did not lie with my underlying "WebAssembly in the cloud" technology, but rather with the naive radar algorithm.
Some of us are old enough to have learned computer programming at a time when most professors thought Object-Oriented Programming was "the way".
We were enthralled by our ability to create cats that could also be lions, chairs that could also be containers, animals that could roar or meow. It was the best time to be alive.
But modeling things in a computer using the same perspective with which humans view that thing is often naive and, in many cases, highly inefficient or broken outright.
My naive radar algorithm wasn't quite object oriented, but it was modeled the way a human might think about the problem. How do I know the list of radar beacons that are on my screen? You could model all the empty space in the universe and examine it for beacons, but we all know that would be a ludicrous idea. Surely there was a better way.
So my "better way" was this: During each entity-frame (a frame sent to a system for a specific entity) just find every radar beacon in the universe (this was an ECS, so I didn't care what entity to which the beacon component was associated), do a euclidean distance check against the positions reported by all beacons in the universe and the radar range of the current
transceiver component. Anything less than the transceiver's range away would get dumped into the
radar_contacts component (a simple list with GUI hints for rendering).
You see the problem here? A universe with 1000 radar beacons and 1000 transceivers (aka 1000 players) would make 1,000,000 loop iterations per root game frame. So, if the
radar system operated at 1hz (1 frame per second), it would have to finish those 1,000,000 loop iterations in under a second. If it didn't finish all of its work within its allotted frame time, it would get behind. Once behind, it would get more behind every frame, until it was eventually so behind that the UI was useless to players. Moreover, if the system needed to make message broker requests for any of those calculations, we're also looking at a million messages per second... just for radar.
We knew this was garbage but it was a demo and we literally got the demo working the day before the conference, so nobody was about to go back in and re-architect radar, especially when we weren't demoing radar, we were demoing WebAssembly as a unit of deployed compute. We just rebooted the game universe once radar got behind. We also scaled out the number of
radar system instances we had running (each system was a
.wasm file/actor), and that delayed the inevitable, but it didn't change the overall throughput requirements of just the radar system from 1,000,000 requests/second.
Having had a ton of time to lick my wounds from the defeat that came at the hands of the age-old villain, "rushing to meet a conference deadline", I'd like to think I know better now.
Rather than modeling the universe in a way that required each
radar system's entity-frame (set/archetype of components) to make a loop iteration/data request for every other player in the universe, we need to model the data in a way that supports the way it will be used.
Each frame, for each player, the
radar system needs to know which other nav beacons are "within range" of the current player's radar receiver. So how do we reduce this to O(1) cost instead of O(n^2)?
There are a couple of techniques, and the first involves an optimization that I call "hash key buddies".
If the vector of things that satisfies your request can be contained within a single hash key, then retrieving that list of things can be done in O(1) cost! I'm sure smarter people have a much better name for this, but the idea is that your hash keys are traversal paths through a binary space partition tree.
I woke up one morning with this idea for recursively sub-dividing regions of space so that the path through these sub-divisions would be predictable. Turns out, this was invented decades ago and is called a Binary Space Partition tree.
A BSP is basically a tree structure (there are many specializations, including k-d trees, used for partitioning n-dimensional space) that divides some "space" into a binary tree. The organization of the tree is up to you--you choose what the leaf nodes contain for your needs. I discovered that there are also other kinds of trees that aren't binary that are optimized for different uses (which led me down a rabbit hole of searching and tab opening).
In our case, we could have taken the 3-dimensional universe and sub-divide it in half recursively down to some depth that is a multiple of our universe size. At each "slice" step, we cut the universe in half and the left child and right child refer to the two divisions, respectively. We then recursively slice those children in half, and so on.
What we're left with is "buckets". Many BSPs are designed so that a leaf is the smallest unit, e.g. a point in a modeled space of point regions. Here's where the "optimize for need" comes in.
Identifying a bucket is like playing the Price is Right game of Plinko. You start at the top (root node) and you traverse recursively by "turning left" or "turning right" until you reach a leaf node. You can then map a "left turn" to the number 0 and a "right turn" to the number 1. What can we do with a "path of bits"? Why, we can turn them into real numbers of course, which can then be used as hash keys!
So let's say my world is 256x256x256 and I want to sub-divide it such that each leaf is 16x16x16, which is roughly the 3D radius of my most powerful radar receiver. If I take a player's location and recursively compare it against the "buckets" in the tree until I hit a leaf, I might take the path right-left-left-left. This gives me a "bit path" of 1000, which is the number 8 (side note: while there are math shortcuts to speed this up, the "slow" traversal to find the bucket is O(log2n) cost).
This has the very, very useful side effect of making it so that everything in the same 16x16x16 region of space can have a "bit path", which I'll call the position key, of 8.
If I want the fastest radar approximation, I can simply dump everyone in the same position key region of space onto my radar display and call it done. This is O(1) cost.
What if I need it to be more accurate? You've probably figured out that this algorithm treats the imaginary boundaries between sub-regions of space as hard barriers that can block radar.
In reality, if I'm in the top right corner of a region, my radar receiving sphere (assuming that's the shape) will overlap with a few other regions. Here's what's even more awesome about hash buddy keys. The region of space that is adjacent to mine is also mathematically adjacent to my "buddy key".
If my algorithm does some quick back-of-napkin math to figure out the other region into which my radar sphere extends, I can get an exact list of all the nav beacons in my range by doing a euclidean distance check on those in my own region and the adjacent region, making the cost O(n) where
n in this case is the aggregate number of beacons that could reside in those regions. This is less efficient than the previous, but still deals with a much smaller value of
n. Further, the smaller the sub-division the fewer the number of entities I need to check during each frame (but also the smaller the player's perceived radar range).
So, in short, if you need to support a query like "give me everything in my sub-region of the world" or you do a 2-pass filter, which is "euclidean distance filter everything within my sub-region or collection of sub-regions", then using a BSP traversal path as a key for a hash map of vectors could be a ridiculously fast optimization. If you need to do anything other than just dump the bucket items into some other component, then this solution does not scale. This kind of modeling can cause a "zone crash", where a single "bucket" or approximation region contains so many items that you now have the original problem I described--it takes longer to process the items in the bucket than you have latency in your frame budget.
It was at this point that I started scratching my head. If my needs go beyond the "give me everyone in my bucket" level of precision, can I optimize the traversal of the bucket inhabitants so that I can get the list of people in radar range faster than O(n)?
You bet you can, and this is where things get pretty awesome. If I organize the list of nav beacons in this bucket by the sub-region of space they're in, then I might be able to do a traversal that's cheaper or shorter than by calculating
n distance squares.
Of course, organizing the items in a leaf bucket into a tree is really just the same thing as making the root tree taller or more deep. Remember that any node in a tree is just another tree.
I briefly mentioned earlier that another way you can model this problem is to take a "point list" (the locations of all the nav beacons in the universe) and then every single nav beacon becomes a leaf on this tree.
Because every point in this tree is a leaf, the "hash key" now refers to only one coordinate in space. Coincidentally, this also means that your K-D tree implementation needs to handle multiple "things" existing in the same position.
This is where, at least to this nerd's mind, things get beautiful. Now, to find the list of all nav beacons that are within some euclidean distance of a point of origin, this is just a graph traversal starting at the radar receiver's leaf, and collecting all nav beacons within
n traversal hops.
I want to make one more point before my bloviation comes to an end. This point is that all too often we just assume that data is stored in just one canonical way for its lifetime. This assumption creeps throughout our code and propagates needless inefficiencies everywhere.
In the case of a k-d tree mapping a point region, this structure might not be optimized for keys changing. Mutating a k-d tree in place requires a re-organization of the entire tree.
If a ship's position changes every frame, then it will be re-organized within this tree every frame. If all 1,000 ships are assumed to be moving, then what we really want to do is construct this k-d tree anew at the beginning of each frame rather than incurring the re-sort/re-organization cost of each move during the
movement system frames. The point here is that given what we know of the data and the rate of change, it's faster to build this structure from scratch every frame than it is to mutate it in place.
In summary, my point here is that thinking of space and the objects within space from a human perspective led me down a horribly inefficient road.
Thinking about space and the objects within it as data structures that are read-optimized or write-optimized for specific purposes means that my systems can now get a lot more work done per frame, which means I can have more systems doing more interesting things in my simulation and I don't have to worry about them getting behind and exceeding their frame latency budget.
p.s. - fun fact: Open Location Codes, also called "place keys" are just ascii-encoded traversal paths through polar coordinate sub-divisions of the planet!