TL;DR: Embark on our journey to enhance DynamoDB geospatial queries with custom Z-Order indexing implementation! Discover calculating Z-addresses, fine-tuning queries, and results. Although performance is equivalent to Mongo Atlas, maintaining your own Z-Order indexing is very challenging. We recommend using non DynamoDB off-the-shelf alternatives.
This article will guide you through our experimentation of a Z-Order index on DynamoDB, based on the excellent work by Zack Slayton in his article on Z-order indexing.
When it comes to handling complex, geographic queries in Amazon DynamoDB, there is no off the shelf solution. Z-Order indexing offers an efficient and powerful solution to this problem. It is particularly suited for querying a small zone around a geolocation.
Before diving in, let’s go through some of Z-Order core concepts.
🤖 All our implementations of the functions we are describing are available on this gist
The first step in implementing a Z-Order index is calculating the Z-addresses for your records. A Z-address is a single number representing a tuple of field values. To calculate a Z-address, simply interleave the bits of each field in the record. Check out the examples provided in the base article to see how this is done.
Once you can generate a z-order index, apply it to your dataset and store it in DynamoDB. The SK will be the z-order Index.
With Z-addresses in hand, you can now build queries to retrieve data. This step was much more challenging than we thought.
Begin by defining a minimum and maximum value for the range of items you want to read. Then, create two records representing the lower and upper bounds of this range. Translate these records into Z-addresses to form the query space.
However, if your query range intersects with "seams" in the Z-order curve, you'll need to use two functions to avoid irrelevant regions in the search space:
zDivide. These functions help narrow down the search space and minimize garbage data.
You can find our implementation in this gist. We left it as a PoC and consider it to be far from prod ready
We parametrized the algorithm to be able to optimize the precision of our Z-Order indexes and the data outside of our search box.
The main limitation of Z-Order indexing is the accuracy of your queries. This is due to the conversion from latitude/longitude to an index
You can optimize the precision of your Z-Order indexes by tweaking the parameters of the algorithm.
Our implementation also doesn’t include more than 2D data, you will not be able to query on timestamp as an index for example.
Z-Order indexing significantly improves the efficiency of your DynamoDB queries when fetching geographic data. By implementing the Z-Order index and utilizing functions
zDivide, you can minimize garbage data and optimize your database's performance.
Our feeling is that handling your own geo-index is challenging and hard to maintain. If it is not critical for your use case to handle it yourself, we recommend you use off the shelf solutions such as
- MongoDB Atlas
- Postgis Plugin on your Postgre Database
With the information provided in this article and the base knowledge from Zack Slayton's article on Z-Order indexing, you're now equipped to supercharge your DynamoDB queries with Z-Order indexing. Happy querying! 🚀🎉