DEV Community

Cover image for AIS vessel density maps with pyspark and h3 and animations with ffmpeg
Jordan Bell
Jordan Bell

Posted on

AIS vessel density maps with pyspark and h3 and animations with ffmpeg

MarineCadastre.gov AIS vessel density maps with PySpark and h3 | by Jordan Bell | Feb, 2024 | Medium

Office for Coastal Management, 2023: Nationwide Automatic Identification System 2022, https://www.fisheries.noaa.gov/inport/item/67336

See Leveraging ESG Data to Operationalize Sustainability. November 11, 2020. Antoine Amend. Databricks

We ingest AIS message data for June 2022 into a Databricks table. (See https://github.com/jordanbell2357/uscg-nais-data/blob/main/Databricks/Databricks-AIS-01-Data-Loading.ipynb and https://github.com/jordanbell2357/uscg-nais-data/blob/main/Databricks/Databricks-AIS-10-Statistical-Tests.ipynb)

We get the following:

df_ais_june = spark.read.table("ais.june")

df_ais_june.printSchema()

root
 |-- MMSI: string (nullable = true)
 |-- BaseDateTime: timestamp (nullable = true)
 |-- LAT: double (nullable = true)
 |-- LON: double (nullable = true)
 |-- SOG: double (nullable = true)
 |-- COG: double (nullable = true)
 |-- Heading: double (nullable = true)
 |-- VesselName: string (nullable = true)
 |-- IMO: string (nullable = true)
 |-- CallSign: string (nullable = true)
 |-- VesselType: integer (nullable = true)
 |-- Status: integer (nullable = true)
 |-- Length: double (nullable = true)
 |-- Width: double (nullable = true)
 |-- Draft: double (nullable = true)
 |-- Cargo: string (nullable = true)
 |-- TranscieverClass: string (nullable = true)
 |-- filename: string (nullable = true)
Enter fullscreen mode Exit fullscreen mode

In https://github.com/jordanbell2357/uscg-nais-data/blob/main/Databricks/Databricks-AIS-11-Time-Filtering.ipynb we make a plot of vessel density made using pyspark and h3 on a Databricks cluster, for June 4, 2022.

Image description

We make similar plots for each day in June in the following, also from https://github.com/jordanbell2357/uscg-nais-data/blob/main/Databricks/Databricks-AIS-11-Time-Filtering.ipynb:

import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Polygon
import h3
import h3_pyspark
from pyspark.sql import functions as F
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType
import matplotlib.colors as colors

# Define the lat/lon boundaries
min_lat, max_lat, min_lon, max_lon = 10, 60, -140, -50

# Define the hexagon resolution
resolution = 4

# Iterate over the days of the month
for day_of_month in range(1, 31):
    # Read the spark dataframe
    df_ais_june = spark.read.table("ais.june")

    # Filter the dataframe for the specific day of the month
    df_ais_june = df_ais_june.filter(F.dayofmonth('BaseDateTime') == day_of_month)

    # Add a 'resolution' column
    df_ais_june = df_ais_june.withColumn('resolution', F.lit(resolution))

    # Create an h3 index for each location using the UDF
    df_ais_june = df_ais_june.withColumn('hex_id', h3_pyspark.geo_to_h3(df_ais_june['LAT'], df_ais_june['LON'], df_ais_june['resolution']))

    # Count the occurrences of each hex_id to get the density
    df_density = df_ais_june.groupBy('hex_id').count()

    # Convert the density dataframe to Pandas
    df_density_pd = df_density.toPandas()

    # Create polygons for each unique hex_id, and filter them based on the bounding box
    hex_polygons = []
    hex_ids = []
    for hex_id in df_density_pd['hex_id'].unique():
        hex_boundary = h3.h3_to_geo_boundary(hex_id)
        polygon = Polygon([(lon, lat) for lat, lon in hex_boundary])
        # create a point based on the polygon's centroid and check if it lies within the bounding box
        centroid = polygon.centroid
        if min_lat <= centroid.y <= max_lat and min_lon <= centroid.x <= max_lon:
            hex_polygons.append(polygon)
            hex_ids.append(hex_id)

    # Filter the density dataframe to only include hexagons within the bounding box
    df_density_pd = df_density_pd[df_density_pd['hex_id'].isin(hex_ids)]

    # Replace inf with max non-inf value and NaN with 1
    df_density_pd.loc[df_density_pd['count'] == np.inf, 'count'] = df_density_pd.loc[df_density_pd['count'] != np.inf, 'count'].max()
    df_density_pd['count'] = df_density_pd['count'].replace({0:1}).fillna(1)

    # Create a GeoDataFrame
    gdf = gpd.GeoDataFrame(df_density_pd, geometry=hex_polygons)

    # Save the choropleth map to a file
    output_file = f"/dbfs/tmp/ais_june_choropleth_{day_of_month}.png"
    fig, ax = plt.subplots(1, 1, figsize=(20, 20))
    gdf.plot(column='count', cmap='YlOrRd', norm=colors.LogNorm(vmin=gdf['count'].min(), vmax=gdf['count'].max()), missing_kwds={'color': 'white'}, ax=ax)
    plt.xlim(min_lon, max_lon)
    plt.ylim(min_lat, max_lat)
    plt.xlabel('Longitude')
    plt.ylabel('Latitude')
    plt.savefig(output_file)
    plt.close(fig)

    print(f"Choropleth map for day {day_of_month} saved to: {output_file}")
Enter fullscreen mode Exit fullscreen mode
Choropleth map for day 1 saved to: /dbfs/tmp/ais_june_choropleth_1.png
Choropleth map for day 2 saved to: /dbfs/tmp/ais_june_choropleth_2.png
Choropleth map for day 3 saved to: /dbfs/tmp/ais_june_choropleth_3.png
Choropleth map for day 4 saved to: /dbfs/tmp/ais_june_choropleth_4.png
Choropleth map for day 5 saved to: /dbfs/tmp/ais_june_choropleth_5.png
Choropleth map for day 6 saved to: /dbfs/tmp/ais_june_choropleth_6.png
Choropleth map for day 7 saved to: /dbfs/tmp/ais_june_choropleth_7.png
Choropleth map for day 8 saved to: /dbfs/tmp/ais_june_choropleth_8.png
Choropleth map for day 9 saved to: /dbfs/tmp/ais_june_choropleth_9.png
Choropleth map for day 10 saved to: /dbfs/tmp/ais_june_choropleth_10.png
Choropleth map for day 11 saved to: /dbfs/tmp/ais_june_choropleth_11.png
Choropleth map for day 12 saved to: /dbfs/tmp/ais_june_choropleth_12.png
Choropleth map for day 13 saved to: /dbfs/tmp/ais_june_choropleth_13.png
Choropleth map for day 14 saved to: /dbfs/tmp/ais_june_choropleth_14.png
Choropleth map for day 15 saved to: /dbfs/tmp/ais_june_choropleth_15.png
Choropleth map for day 16 saved to: /dbfs/tmp/ais_june_choropleth_16.png
Choropleth map for day 17 saved to: /dbfs/tmp/ais_june_choropleth_17.png
Choropleth map for day 18 saved to: /dbfs/tmp/ais_june_choropleth_18.png
Choropleth map for day 19 saved to: /dbfs/tmp/ais_june_choropleth_19.png
Choropleth map for day 20 saved to: /dbfs/tmp/ais_june_choropleth_20.png
Choropleth map for day 21 saved to: /dbfs/tmp/ais_june_choropleth_21.png
Choropleth map for day 22 saved to: /dbfs/tmp/ais_june_choropleth_22.png
Choropleth map for day 23 saved to: /dbfs/tmp/ais_june_choropleth_23.png
Choropleth map for day 24 saved to: /dbfs/tmp/ais_june_choropleth_24.png
Choropleth map for day 25 saved to: /dbfs/tmp/ais_june_choropleth_25.png
Choropleth map for day 26 saved to: /dbfs/tmp/ais_june_choropleth_26.png
Choropleth map for day 27 saved to: /dbfs/tmp/ais_june_choropleth_27.png
Choropleth map for day 28 saved to: /dbfs/tmp/ais_june_choropleth_28.png
Choropleth map for day 29 saved to: /dbfs/tmp/ais_june_choropleth_29.png
Choropleth map for day 30 saved to: /dbfs/tmp/ais_june_choropleth_30.png
Enter fullscreen mode Exit fullscreen mode

We then download these daily plot files locally. Next we use ffmpeg. ffmpeg Documentation

The following bash script uses ImageMagick convert to make daily progress bars and overlay those with the daily plots. ImageMagick convert documentation

#!/bin/bash
# ffmpeg.sh

# Create 30 images with increasing filled progress bars
for i in {1..30}; do
    progress=$((i*1000/30))  # increase size 10 times
    convert -size 1000x200 xc:grey50 -fill "rgb(100,149,237)" -draw "rectangle 0,0 $progress,200" progress_$(printf "%02d" $i).png
done

width=1440  # width of original images

# Overlay progress bar onto each image, create new image
for i in {1..30}; do
    # Calculate progress bar position
    x_offset=$(( (width-1000)/2 ))  # center the progress bar

    # Overlay progress bar and save as new image
    convert ais_june_choropleth_$i.png progress_$(printf "%02d" $i).png -geometry +$x_offset+10 -composite ais_june_choropleth_progress_$(printf "%02d" $i).png
done

# Create video using ffmpeg
ffmpeg -framerate 5 -pattern_type glob -i 'ais_june_choropleth_progress_*.png' -c:v libx264 -r 30 -pix_fmt yuv420p ais_june_choropleth.mp4

# Clean up progress bar and composite images
rm progress_*.png
rm ais_june_choropleth_progress_*.png
Enter fullscreen mode Exit fullscreen mode

This creates ais_june_choropleth.mp4. This video file is hosted on YouTube:

We remark in passing that the Liquid syntax used for embedding the above YouTube video is

{% youtube Rt0RTpmlaK0 %}
Enter fullscreen mode Exit fullscreen mode

Finally, we use ffmpeg to convert ais_june_choropleth.mp4 to ais_june_choropleth.gif.

ffmpeg -i ais_june_choropleth.mp4 -vf "fps=3,scale=1000:-1:flags=lanczos" -c:v gif ais_june_choropleth.gif
Enter fullscreen mode Exit fullscreen mode

The file ais_june_choropleth.gif is shown below.

Density map of vessel positions for June 2022. From MarineCadastre.gov

Top comments (0)