geohashing-intro (3 Part Series)
This is the third part in a series of posts. See the links above to jump to the prior posts
In my last post I talked through how we can use S2 cells to fill or cover a shape on the map. In this case a circle, within which we want to search for items of interest. I also mentioned that Uber came up with H3 cells, a technique for covering a shape on the map with hexagons.
The images below show S2 and H3 coverings of the same area in Washington, DC.
The S2 covering is composed of cells of different sizes or levels. Some are larger cells that fit mostly into the circle, others are smaller cells that help cover the edges. This means fewer cells overall are needed to cover the space vs the H3 example, albeit with the downside of a more uneven covering compared to what we see in H3.
With that somewhat uneven covering, we can see how important that distance calculation is in the final step of our S2 geo-search from the prior post. It excludes those points of interest that came from the outer edges of covering cells.
These images also illustrate how S2 cells can nest. Any given S2 cell will fit completely into its parent cell at a higher level. S2 cells nesting allow us to calculate coverings with diverse cell sizes. Nesting also would allow us to roll up location based data into larger aggregates. If data is stored at a small (high level) cell, then we can roll up that data to a neighborhood, city or region level by summing up the data from all the child cells in a bigger cell that covers the target area.
In contrast while H3 provides a more precise covering (at least a high resolution), hexagons do not nest. As we scale up levels, the hexagons overlap incompletely. So we cannot have a meaningful H3 covering with diverse hexagon sizes. Without nesting, aggregation is not as automatic. While we can still roll up data from the smaller hexagons, we cannot map it directly to larger ones.
While watching a brilliant discussion on Twitch between DynamoDB gurus Rick Houlihan and Alex DeBrie, at around an hour in, Rick dropped a quick reference to an AWS customer who had taken the DynamoDB Geo library I looked at last post and improved upon it. Thanks to twitch user switch184 in the comments for pointing me to the repo at: https://github.com/Dash-Labs/dynamodb-geo
#Geo Library for Amazon DynamoDB
This library was forked from the AWS geo library.
Following limitations with the aws geo library were the main reasons that necessitated this fork:
- Usage required a table’s hash and range key to be replaced by geo data. This approach is not feasible as it cannot be used when performing user-centric queries, where the hash and range key have to be domain model specific attributes.
- Developed prior to GSI, hence only used LSI
- No solution for composite queries. For e.g. “Find something within X miles of lat/lng AND category=‘restaurants’;
- The solution executed the queries and returned the final result. It did not provide the client with any control over the query execution.
What methods are available for geo-querying in this library?
- Query for a given lat/long
- Radius query
- Box/Rectangle query
All of the above queries can be run as composite queries, depending on their…
One bit of documentation that stuck out to me was more discussion on hash key length, including the number of queries the library produces for a radius search with a given hash key. Something that tripped me up at first is that the geohash key they are talking about here is the first X digits of an S2 cell id, and the length of that key does not match 1:1 to the level of the cell. As the Dash-Labs repo suggests, a 5 or 6 digit long geohash key is well suited for near proximity searches. I still struggle with understanding the math behind that assertion, but the sample query results are convincing.
Thanks for reading and happy geohashing!