Sophia Parafina

Speeding up geodata processing with feather

Previously, on speeding up geodata processing...

In this post, I compare the read and write performance of the feather file format against the pickle file format.

From Hadley Wickham's blog:

What is Feather?

Feather is a fast, lightweight, and easy-to-use binary file format for storing data frames. It has a few specific design goals:

  • Lightweight, minimal API: make pushing data frames in and out of memory as simple as possible
  • Language agnostic: Feather files are the same whether written by Python or R code. Other languages can read and write Feather files, too.
  • High read and write performance. When possible, Feather operations should be bound by local disk performance.

GeoPandas has supported the feather format since version 0.8; the test below used version 0.10.

import geopandas as gpd
import time
import pickle
from pyogrio import read_dataframe
import warnings; warnings.filterwarnings('ignore', message='.*initial implementation of Parquet.*')

# read shapefile
read_start = time.process_time()
data = read_dataframe("Streets.shp")
read_end = time.process_time()

# write feather test (to_feather takes only the path; no file mode argument)
write_start = time.process_time()
data.to_feather('test_feather.feather')
write_end = time.process_time()

write_time = write_end - write_start
print(str(write_time/60)+" minutes to write feather file")

# read feather test: read back the file written above as a GeoDataFrame
read_start = time.process_time()
feather_df = gpd.read_feather('test_feather.feather')
read_end = time.process_time()

read_time = read_end - read_start
print(str(read_time/60)+" minutes to read feather file")
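The test above shows only the feather half of the comparison, and it uses time.process_time(), which reports CPU time for the process and can understate time spent waiting on disk. As a rough sketch of how the pickle half could be timed, the helper below records both wall-clock and CPU time. The timed() helper, the file name, and the stand-in list of records are hypothetical and only for illustration; they are not the code or data behind the results in this post.

```python
import pickle
import tempfile
import time
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def timed(label, results):
    """Record wall-clock and CPU seconds for the enclosed block under `label`."""
    wall_start = time.perf_counter()   # wall-clock time, includes I/O waits
    cpu_start = time.process_time()    # CPU time used by this process only
    try:
        yield
    finally:
        results[label] = {
            "wall": time.perf_counter() - wall_start,
            "cpu": time.process_time() - cpu_start,
        }


# Stand-in data (assumption): a plain list of dicts, not a real GeoDataFrame.
records = [{"id": i, "name": f"street {i}"} for i in range(10_000)]
results = {}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "test_pickle.pkl"

    # write pickle test
    with timed("write pickle", results):
        with open(path, "wb") as f:
            pickle.dump(records, f)

    # read pickle test
    with timed("read pickle", results):
        with open(path, "rb") as f:
            restored = pickle.load(f)

for label, t in results.items():
    print(f"{label}: wall {t['wall']:.4f} s, cpu {t['cpu']:.4f} s")
```

Swapping the list for the GeoDataFrame read from the shapefile should give numbers comparable to the feather test, though that has not been verified here.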

Results

r/w (minutes)   pickle   feather
read            0.92     1.07
write           4.36     1.69
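For a quick sanity check on the relative speeds, the ratios can be computed directly from the table. This is only arithmetic on the numbers reported above; the dictionary layout is an arbitrary choice for illustration.

```python
# Timings in minutes, copied from the results table.
timings = {
    "read": {"pickle": 0.92, "feather": 1.07},
    "write": {"pickle": 4.36, "feather": 1.69},
}

# Ratio > 1 means feather was faster than pickle for that operation.
ratios = {op: t["pickle"] / t["feather"] for op, t in timings.items()}

for op, ratio in ratios.items():
    print(f"{op}: pickle time / feather time = {ratio:.2f}")
```

With these numbers the read ratio comes out at about 0.86 and the write ratio at about 2.58.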

Read times are comparable, but feather writes roughly 2.6x faster than pickle (1.69 vs. 4.36 minutes). Feather still takes longer to write than to read, probably because the geometry has to be converted to Well-Known Binary (WKB), which is compatible with the feather format. The caveat is that the feather format is subject to change, as evidenced by the warning suppressed in the import block.
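To make the Well-Known Binary (WKB) conversion mentioned above more concrete, here is a minimal sketch that encodes a single 2D point by hand using only the standard library. It illustrates the byte layout of a WKB point (byte-order flag, geometry type, then x and y as doubles); it is not how GeoPandas performs the conversion internally, and the function names are made up for this example.

```python
import struct

# Little-endian WKB point layout: 1 byte order flag, uint32 geometry type, two doubles.
WKB_POINT_FORMAT = "<BIdd"
LITTLE_ENDIAN = 1
WKB_POINT = 1


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian WKB bytes."""
    return struct.pack(WKB_POINT_FORMAT, LITTLE_ENDIAN, WKB_POINT, x, y)


def wkb_to_point(buf):
    """Decode little-endian WKB bytes for a 2D point back into (x, y)."""
    byte_order, geom_type, x, y = struct.unpack(WKB_POINT_FORMAT, buf)
    if byte_order != LITTLE_ENDIAN or geom_type != WKB_POINT:
        raise ValueError("not a little-endian WKB point")
    return x, y


wkb = point_to_wkb(1.0, 2.0)
print(len(wkb), "bytes:", wkb.hex())
print(wkb_to_point(wkb))
```

Every geometry in the data frame needs a conversion along these lines before it can be stored in a binary column, which is one plausible source of the extra write time.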

Thoughts

If your data is static or distributed, the pickle format may be better. Feather may be the right choice if you need to pass geodata between steps of a processing workflow as a file, e.g., from Python to R.
