MarineCadastre.gov AIS vessel density maps with PySpark and h3 | by Jordan Bell | Feb, 2024 | Medium
Office for Coastal Management, 2023: Nationwide Automatic Identification System 2022, https://www.fisheries.noaa.gov/inport/item/67336
See Leveraging ESG Data to Operationalize Sustainability. November 11, 2020. Antoine Amend. Databricks
We ingest AIS message data for June 2022 into a Databricks table. (See https://github.com/jordanbell2357/uscg-nais-data/blob/main/Databricks/Databricks-AIS-01-Data-Loading.ipynb and https://github.com/jordanbell2357/uscg-nais-data/blob/main/Databricks/Databricks-AIS-10-Statistical-Tests.ipynb)
We get the following:
df_ais_june = spark.read.table("ais.june")
df_ais_june.printSchema()
root
|-- MMSI: string (nullable = true)
|-- BaseDateTime: timestamp (nullable = true)
|-- LAT: double (nullable = true)
|-- LON: double (nullable = true)
|-- SOG: double (nullable = true)
|-- COG: double (nullable = true)
|-- Heading: double (nullable = true)
|-- VesselName: string (nullable = true)
|-- IMO: string (nullable = true)
|-- CallSign: string (nullable = true)
|-- VesselType: integer (nullable = true)
|-- Status: integer (nullable = true)
|-- Length: double (nullable = true)
|-- Width: double (nullable = true)
|-- Draft: double (nullable = true)
|-- Cargo: string (nullable = true)
|-- TranscieverClass: string (nullable = true)
|-- filename: string (nullable = true)
In https://github.com/jordanbell2357/uscg-nais-data/blob/main/Databricks/Databricks-AIS-11-Time-Filtering.ipynb we make a plot of vessel density made using pyspark and h3 on a Databricks cluster, for June 4, 2022.
We make similar plots for each day in June in the following, also from https://github.com/jordanbell2357/uscg-nais-data/blob/main/Databricks/Databricks-AIS-11-Time-Filtering.ipynb:
import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Polygon
import h3
import h3_pyspark
from pyspark.sql import functions as F
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType
import matplotlib.colors as colors
# Define the lat/lon boundaries
min_lat, max_lat, min_lon, max_lon = 10, 60, -140, -50
# Define the hexagon resolution
resolution = 4
# Iterate over the days of the month
for day_of_month in range(1, 31):
# Read the spark dataframe
df_ais_june = spark.read.table("ais.june")
# Filter the dataframe for the specific day of the month
df_ais_june = df_ais_june.filter(F.dayofmonth('BaseDateTime') == day_of_month)
# Add a 'resolution' column
df_ais_june = df_ais_june.withColumn('resolution', F.lit(resolution))
# Create an h3 index for each location using the UDF
df_ais_june = df_ais_june.withColumn('hex_id', h3_pyspark.geo_to_h3(df_ais_june['LAT'], df_ais_june['LON'], df_ais_june['resolution']))
# Count the occurrences of each hex_id to get the density
df_density = df_ais_june.groupBy('hex_id').count()
# Convert the density dataframe to Pandas
df_density_pd = df_density.toPandas()
# Create polygons for each unique hex_id, and filter them based on the bounding box
hex_polygons = []
hex_ids = []
for hex_id in df_density_pd['hex_id'].unique():
hex_boundary = h3.h3_to_geo_boundary(hex_id)
polygon = Polygon([(lon, lat) for lat, lon in hex_boundary])
# create a point based on the polygon's centroid and check if it lies within the bounding box
centroid = polygon.centroid
if min_lat <= centroid.y <= max_lat and min_lon <= centroid.x <= max_lon:
hex_polygons.append(polygon)
hex_ids.append(hex_id)
# Filter the density dataframe to only include hexagons within the bounding box
df_density_pd = df_density_pd[df_density_pd['hex_id'].isin(hex_ids)]
# Replace inf with max non-inf value and NaN with 1
df_density_pd.loc[df_density_pd['count'] == np.inf, 'count'] = df_density_pd.loc[df_density_pd['count'] != np.inf, 'count'].max()
df_density_pd['count'] = df_density_pd['count'].replace({0:1}).fillna(1)
# Create a GeoDataFrame
gdf = gpd.GeoDataFrame(df_density_pd, geometry=hex_polygons)
# Save the choropleth map to a file
output_file = f"/dbfs/tmp/ais_june_choropleth_{day_of_month}.png"
fig, ax = plt.subplots(1, 1, figsize=(20, 20))
gdf.plot(column='count', cmap='YlOrRd', norm=colors.LogNorm(vmin=gdf['count'].min(), vmax=gdf['count'].max()), missing_kwds={'color': 'white'}, ax=ax)
plt.xlim(min_lon, max_lon)
plt.ylim(min_lat, max_lat)
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.savefig(output_file)
plt.close(fig)
print(f"Choropleth map for day {day_of_month} saved to: {output_file}")
Choropleth map for day 1 saved to: /dbfs/tmp/ais_june_choropleth_1.png
Choropleth map for day 2 saved to: /dbfs/tmp/ais_june_choropleth_2.png
Choropleth map for day 3 saved to: /dbfs/tmp/ais_june_choropleth_3.png
Choropleth map for day 4 saved to: /dbfs/tmp/ais_june_choropleth_4.png
Choropleth map for day 5 saved to: /dbfs/tmp/ais_june_choropleth_5.png
Choropleth map for day 6 saved to: /dbfs/tmp/ais_june_choropleth_6.png
Choropleth map for day 7 saved to: /dbfs/tmp/ais_june_choropleth_7.png
Choropleth map for day 8 saved to: /dbfs/tmp/ais_june_choropleth_8.png
Choropleth map for day 9 saved to: /dbfs/tmp/ais_june_choropleth_9.png
Choropleth map for day 10 saved to: /dbfs/tmp/ais_june_choropleth_10.png
Choropleth map for day 11 saved to: /dbfs/tmp/ais_june_choropleth_11.png
Choropleth map for day 12 saved to: /dbfs/tmp/ais_june_choropleth_12.png
Choropleth map for day 13 saved to: /dbfs/tmp/ais_june_choropleth_13.png
Choropleth map for day 14 saved to: /dbfs/tmp/ais_june_choropleth_14.png
Choropleth map for day 15 saved to: /dbfs/tmp/ais_june_choropleth_15.png
Choropleth map for day 16 saved to: /dbfs/tmp/ais_june_choropleth_16.png
Choropleth map for day 17 saved to: /dbfs/tmp/ais_june_choropleth_17.png
Choropleth map for day 18 saved to: /dbfs/tmp/ais_june_choropleth_18.png
Choropleth map for day 19 saved to: /dbfs/tmp/ais_june_choropleth_19.png
Choropleth map for day 20 saved to: /dbfs/tmp/ais_june_choropleth_20.png
Choropleth map for day 21 saved to: /dbfs/tmp/ais_june_choropleth_21.png
Choropleth map for day 22 saved to: /dbfs/tmp/ais_june_choropleth_22.png
Choropleth map for day 23 saved to: /dbfs/tmp/ais_june_choropleth_23.png
Choropleth map for day 24 saved to: /dbfs/tmp/ais_june_choropleth_24.png
Choropleth map for day 25 saved to: /dbfs/tmp/ais_june_choropleth_25.png
Choropleth map for day 26 saved to: /dbfs/tmp/ais_june_choropleth_26.png
Choropleth map for day 27 saved to: /dbfs/tmp/ais_june_choropleth_27.png
Choropleth map for day 28 saved to: /dbfs/tmp/ais_june_choropleth_28.png
Choropleth map for day 29 saved to: /dbfs/tmp/ais_june_choropleth_29.png
Choropleth map for day 30 saved to: /dbfs/tmp/ais_june_choropleth_30.png
We then download these daily plot files locally. Next we use ffmpeg
. ffmpeg Documentation
The following bash script uses ImageMagick convert
to make daily progress bars and overlay those with the daily plots. ImageMagick convert documentation
#!/bin/bash
# ffmpeg.sh
# Create 30 images with increasing filled progress bars
for i in {1..30}; do
progress=$((i*1000/30)) # increase size 10 times
convert -size 1000x200 xc:grey50 -fill "rgb(100,149,237)" -draw "rectangle 0,0 $progress,200" progress_$(printf "%02d" $i).png
done
width=1440 # width of original images
# Overlay progress bar onto each image, create new image
for i in {1..30}; do
# Calculate progress bar position
x_offset=$(( (width-1000)/2 )) # center the progress bar
# Overlay progress bar and save as new image
convert ais_june_choropleth_$i.png progress_$(printf "%02d" $i).png -geometry +$x_offset+10 -composite ais_june_choropleth_progress_$(printf "%02d" $i).png
done
# Create video using ffmpeg
ffmpeg -framerate 5 -pattern_type glob -i 'ais_june_choropleth_progress_*.png' -c:v libx264 -r 30 -pix_fmt yuv420p ais_june_choropleth.mp4
# Clean up progress bar and composite images
rm progress_*.png
rm ais_june_choropleth_progress_*.png
This creates ais_june_choropleth.mp4
. This video file is hosted on YouTube:
We remark in passing that the Liquid syntax used for embedding the above YouTube video is
{% youtube Rt0RTpmlaK0 %}
Finally, we use ffmpeg
to convert ais_june_choropleth.mp4
to ais_june_choropleth.gif
.
ffmpeg -i ais_june_choropleth.mp4 -vf "fps=3,scale=1000:-1:flags=lanczos" -c:v gif ais_june_choropleth.gif
The file ais_june_choropleth.gif
is shown below.
Top comments (0)