DEV Community

Cover image for System Design: Geohashing and Quadtrees
Karan Pratap Singh
Karan Pratap Singh

Posted on • Originally published at github.com

System Design: Geohashing and Quadtrees

Geohashing

Geohashing is a geocoding method used to encode geographic coordinates such as latitude and longitude into short alphanumeric strings. It was created by Gustavo Niemeyer in 2008.

For example, San Francisco with coordinates 37.7564, -122.4016 can be represented in geohash as 9q8yy9mf.

How does Geohashing work?

Geohash is a hierarchical spatial index that uses Base-32 alphabet encoding, the first character in a geohash identifies the initial location as one of the 32 cells. This cell will also contain 32 cells. This means that to represent a point, the world is recursively divided into smaller and smaller cells with each additional bit until the desired precision is attained. The precision factor also determines the size of the cell.

geohashing

Geohashing guarantees that points are spatially closer if their Geohashes share a longer prefix which means the more characters in the string, the more precise the location. For example, geohashes 9q8yy9mf and 9q8yy9vx are spatially closer as they share the prefix 9q8yy9.

Geohashing can also be used to provide a degree of anonymity as we don't need to expose the exact location of the user because depending on the length of the geohash we just know they are somewhere within an area.

The cell sizes of the geohashes of different lengths are as follows:

Geohash length Cell width Cell height
1 5000 km 5000 km
2 1250 km 1250 km
3 156 km 156 km
4 39.1 km 19.5 km
5 4.89 km 4.89 km
6 1.22 km 0.61 km
7 153 m 153 m
8 38.2 m 19.1 m
9 4.77 m 4.77 m
10 1.19 m 0.596 m
11 149 mm 149 mm
12 37.2 mm 18.6 mm

Use cases

Here are some common use cases for Geohashing:

  • It is a simple way to represent and store a location in a database.
  • It can also be shared on social media as URLs since it is easier to share, and remember than latitudes and longitudes.
  • We can efficiently find the nearest neighbors of a point through very simple string comparisons and efficient searching of indexes.

Examples

Geohashing is widely used and it is supported by popular databases.

Quadtrees

A quadtree is a tree data structure in which each internal node has exactly four children. They are often used to partition a two-dimensional space by recursively subdividing it into four quadrants or regions. Each child or leaf node stores spatial information. Quadtrees are the two-dimensional analog of Octrees which are used to partition three-dimensional space.

quadtree

Types of Quadtrees

Quadtrees may be classified according to the type of data they represent, including areas, points, lines, and curves. The following are common types of quadtrees:

  • Point quadtrees
  • Point-region (PR) quadtrees
  • Polygonal map (PM) quadtrees
  • Compressed quadtrees
  • Edge quadtrees

Why do we need Quadtrees?

Aren't latitude and longitude enough? Why do we need quadtrees? While in theory using latitude and longitude we can determine things such as how close points are to each other using euclidean distance, for practical use cases it is simply not scalable because of its CPU-intensive nature with large data sets.

quadtree-subdivision

Quadtrees enable us to search points within a two-dimensional range efficiently, where those points are defined as latitude/longitude coordinates or as cartesian (x, y) coordinates. Additionally, we can save further computation by only subdividing a node after a certain threshold. And with the application of mapping algorithms such as the Hilbert curve, we can easily improve range query performance.

Use cases

Below are some common uses of quadtrees:

  • Image representation, processing, and compression.
  • Spacial indexing and range queries.
  • Location-based services like Google Maps, Uber, etc.
  • Mesh generation and computer graphics.
  • Sparse data storage.

This article is part of my open source System Design Course available on Github.

Top comments (0)