There are several reasons you may want to obtain municipal data.
You can present them nicely on your website in the form of an interactive map, or you can perform some geospatial analysis, like finding distances to the closest facilities.
Or maybe you just want to play with the F# language on real, tangible datasets?
In this post, I explain how to get any data from OpenStreetMap, convert it to geojson, and parse it for the places you want, so you can take them and have further adventures with F# and real data.
You can download the notebook here
* You still have to generate the geojson file on your own, as described in the post.
Open Street Map / Open City Data
OpenStreetMap is an excellent source of city data that is absent from, or insufficiently covered by, the open data portals maintained by city authorities.
This is especially useful for small cities and villages (that don't have a data portal at all) or for quickly getting data that have potential commercial value.
Be aware that, in contrast to open city data, OSM data are not complete and hence not perfect for many business scenarios. The data are most often entered by ordinary users, which puts the robustness of the OSM data schema ...at risk.
Even if you are among the lucky citizens whose data portal shines (please check ~200 geojson files for Rostock (Germany): https://www.opendata-hro.de/dataset/?res_format=GeoJSON) you can still find OSM valuable and complementary to it.
I'm using data for London, as it is a huge city that still lacks some publicly available open data. Also, the data volume itself requires special steps not needed for smaller areas. I parse rivers, addresses, shops, and leisure; however, any facility for any city can be obtained in the same (or almost the same) manner.
What I present here is no rocket science: just grabbing data, converting it to geojson, and parsing it. I'm sharing it because it can be tedious for someone who has never worked with geojson in C#/F#.
Yes, you can write an idiomatic functional parser, or a "type provider" as well. However, I'm not sure it would make working with these data more pleasant and approachable for the average programmer (taking into account the OSM data schema flaws).
Getting data from Open Street Map
Pythonistas have plenty of OSM (and other relevant geospatial) libraries at their disposal, and there are some libraries for dotnet as well.
However, I like to work primarily with raw geojson files, as it is a well-known, human-readable format, so I can go through the file to understand how to process the wanted data.
Go to OSM and you will see some map tiles:
You have at least two options to select the city that is relevant to you:
- scroll and zoom the map to cover the wanted area and press export -> overpass API
- or type the name of the wanted area and then click 'export'.
Be aware that these options have size limitations: you cannot download very large areas this way. For example, I'm not able to download the data for the whole of London.
In such scenarios, you can download the appropriate dataset from one of the many collections of pre-generated files. For London you can find it here:
Converting to geojson
After the download, we have a ~1.5 GB OSM file.
To convert it to geojson I'm using an npm package called ... osmtogeojson. For those who do not work with Node.js on a daily basis, it is very simple: just install the tool globally
For the London file I get an out-of-memory error, so I need to run the program as follows, based on the hint from the documentation:
The conversion takes ~1-2 minutes for such a large file.
For smaller cities it can be simpler to download the data in *.shp format and convert it to geojson via mapsharper.com, https://mygeodata.cloud/, ArcGIS or many, many other options.
Now that we have a file, we need something to process it.
In my case, it is the .NET Interactive extension for Visual Studio Code.
The very first notebook cell loads the geojson file and parses it to get the items. Almost 2 million items are processed in 6 seconds. Not that bad.
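As a rough illustration of that first cell, here is a minimal sketch using System.Text.Json. It is not the original notebook code: the inline sample document and the `loadFeatures` name are assumptions made so the snippet runs on its own, and it assumes the converter produced a standard FeatureCollection with a top-level "features" array. For the real file you would read the text from your own path (for example with System.IO.File.ReadAllText) instead of using the inline string.

```fsharp
open System.Text.Json

// Tiny inline stand-in for the converted file (hypothetical sample data).
let sampleGeoJson = """
{
  "type": "FeatureCollection",
  "features": [
    { "type": "Feature",
      "properties": { "waterway": "river", "name": "River Thames" },
      "geometry": { "type": "LineString",
                    "coordinates": [[-0.12, 51.50], [-0.11, 51.51]] } },
    { "type": "Feature",
      "properties": { "shop": "bakery" },
      "geometry": { "type": "Point", "coordinates": [-0.10, 51.52] } }
  ]
}
"""

// Parse the text and materialise the "features" array so it can be reused.
// The JsonDocument is deliberately not disposed: the returned elements
// are views into it and must stay valid.
let loadFeatures (json: string) : JsonElement[] =
    let doc = JsonDocument.Parse(json)
    doc.RootElement.GetProperty("features").EnumerateArray()
    |> Seq.toArray

let features = loadFeatures sampleGeoJson
printfn "Loaded %d features" features.Length
```

Keeping the features as an array of JsonElement means later cells can filter them with the ordinary collection functions without parsing the file again.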
Now let's get some city goodies, starting with ...water.
But only rivers that have some name assigned.
The TryGetProperty call from the JSON deserializer is not perfect here; it could definitely be wrapped somehow, but here I just want to show how things work without a DSL.
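For readers who want to see the shape of such a filter, here is a hedged sketch rather than the original cell. It assumes the OSM tags sit directly under each feature's "properties" object (flat properties) and that rivers carry the tags waterway=river and name. The helper names `tryTag` and `namedRivers` and the sample data are hypothetical.

```fsharp
open System.Text.Json

// Hypothetical helper: turns the TryGetProperty out-parameter pattern into
// an option holding the string value of a tag under "properties".
let tryTag (name: string) (feature: JsonElement) : string option =
    match feature.TryGetProperty("properties") with
    | true, props when props.ValueKind = JsonValueKind.Object ->
        match props.TryGetProperty(name) with
        | true, value when value.ValueKind = JsonValueKind.String ->
            Some (value.GetString())
        | _ -> None
    | _ -> None

// Keep only features tagged as rivers that also have a name; return the names.
let namedRivers (features: JsonElement[]) : string list =
    features
    |> Seq.choose (fun feature ->
        match tryTag "waterway" feature, tryTag "name" feature with
        | Some "river", Some name -> Some name
        | _ -> None)
    |> Seq.toList

// Hypothetical sample: a named river, an unnamed river, and a named stream.
let sampleJson = """
{ "features": [
    { "properties": { "waterway": "river", "name": "River Thames" } },
    { "properties": { "waterway": "river" } },
    { "properties": { "waterway": "stream", "name": "Some Brook" } } ] }
"""

let riverNames =
    JsonDocument.Parse(sampleJson).RootElement.GetProperty("features").EnumerateArray()
    |> Seq.toArray
    |> namedRivers

printfn "%A" riverNames
```

Matching on the pair of options keeps the "river with a name" rule in one place, so adding another tag condition only means extending the tuple.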
The output of the code depends on the extensions you have installed. Unless you have some renderer extension from the marketplace, you will see just raw text.
My favorite one is the Unfolded Map Renderer extension created by Taras Novak; however, it has been made private recently (unless you are a supporter, as I am). You can still use one of his many other extensions available here:
... using as many functions from collection modules as you want
What about addresses? Well, I rarely need to process them this way, as municipal portals typically provide them or they are available via Open Addresses. Apparently London offers neither (I believe this is because of the private sector, etc.).
Addresses in OSM are (to the best of my knowledge) NOT standalone objects; they are just attributes of associated objects.
This can end up with many objects having the same address, and, more importantly, they can be either points or polygons. And we most likely want to have unified shapes, like points.
First, let's get them with a familiar approach that additionally removes duplicates.
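A sketch of what such an address pass could look like is below. It is an illustration under assumptions, not the original cell: it assumes the usual OSM tags addr:street, addr:housenumber and addr:city stored flat under "properties", and the `Address` record, the helper names, and the sample data are all made up for the example.

```fsharp
open System.Text.Json

// Address fields kept from each feature; the record itself is illustrative.
type Address =
    { Street: string
      Number: string
      City: string option
      GeometryType: string }

// Read a string tag from the feature's "properties" object, if present.
let tryTag (name: string) (feature: JsonElement) : string option =
    match feature.TryGetProperty("properties") with
    | true, props when props.ValueKind = JsonValueKind.Object ->
        match props.TryGetProperty(name) with
        | true, value when value.ValueKind = JsonValueKind.String ->
            Some (value.GetString())
        | _ -> None
    | _ -> None

// Geometry type ("Point", "Polygon", ...) or "Unknown" when it is missing.
let geometryType (feature: JsonElement) : string =
    match feature.TryGetProperty("geometry") with
    | true, geometry when geometry.ValueKind = JsonValueKind.Object ->
        match geometry.TryGetProperty("type") with
        | true, kind when kind.ValueKind = JsonValueKind.String -> kind.GetString()
        | _ -> "Unknown"
    | _ -> "Unknown"

// Keep features with a street and a house number, then drop repeats of the
// same (street, number, city); the first occurrence wins.
let addresses (features: JsonElement[]) : Address list =
    features
    |> Seq.choose (fun feature ->
        match tryTag "addr:street" feature, tryTag "addr:housenumber" feature with
        | Some street, Some number ->
            Some
                { Street = street
                  Number = number
                  City = tryTag "addr:city" feature
                  GeometryType = geometryType feature }
        | _ -> None)
    |> Seq.distinctBy (fun a -> (a.Street, a.Number, a.City))
    |> Seq.toList

// Hypothetical sample: the same address once as a point and once as a polygon.
let sampleJson = """
{ "features": [
    { "properties": { "addr:street": "Baker Street", "addr:housenumber": "221B", "addr:city": "London" },
      "geometry": { "type": "Point", "coordinates": [-0.158, 51.523] } },
    { "properties": { "addr:street": "Baker Street", "addr:housenumber": "221B", "addr:city": "London" },
      "geometry": { "type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]] } },
    { "properties": { "addr:street": "Oxford Street", "addr:housenumber": "10" },
      "geometry": { "type": "Point", "coordinates": [-0.14, 51.515] } } ] }
"""

let found =
    JsonDocument.Parse(sampleJson).RootElement.GetProperty("features").EnumerateArray()
    |> Seq.toArray
    |> addresses

printfn "%d distinct addresses" found.Length
```

Recording the geometry type next to each address makes the mix of points and polygons visible before any geometry library is involved.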
Is 113k the correct number of real addresses? I don't think so, but it is still a useful dataset for experiments.
Using NetTopologySuite for geospatial analysis
We have already done a lot, but in order to do more we have to introduce a dedicated library/package. In dotnet it is NetTopologySuite.
You can achieve a lot from the geospatial perspective with this library; here we just want to get centroids for polygons. Additionally, I'm getting rid of unwanted properties, keeping only street, number and city.
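To make the centroid step concrete, here is a small sketch written as a plain F# script. It is not the original notebook code: it assumes the NetTopologySuite NuGet package can be restored, builds a toy polygon by hand instead of reading the London geojson, and `toPoint` is a hypothetical helper name.

```fsharp
#r "nuget: NetTopologySuite"

open NetTopologySuite.Geometries

let factory = GeometryFactory()

// Unify shapes: points stay as they are, anything else becomes its centroid.
let toPoint (geometry: Geometry) : Point =
    match geometry with
    | :? Point as point -> point
    | other -> other.Centroid

// Toy 4 x 4 square standing in for a building footprint; the ring is closed
// by repeating the first coordinate at the end.
let square =
    factory.CreatePolygon(
        [| Coordinate(0.0, 0.0)
           Coordinate(4.0, 0.0)
           Coordinate(4.0, 4.0)
           Coordinate(0.0, 4.0)
           Coordinate(0.0, 0.0) |])

let centre = toPoint square
printfn "Centroid: %f, %f" centre.X centre.Y
```

With every address reduced to a point this way, the polygon and point features from the previous step can be handled uniformly.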
NetTopologySuite doesn't work nicely with .NET Interactive: it will hang forever unless you are using an old version of .NET Interactive.
So if you want to run the rest of the cells, you have to temporarily roll back to the mentioned version:
There is a workaround suggested by the .NET Interactive team to make it work with the latest version; I will update the post as soon as I know how to apply it.
We started to process data with System.Text.Json.
NetTopologySuite has an equivalent package for processing geojson called NetTopologySuite.IO.GeoJSON4STJ. However, here I'm using the regular one, as "STJ" doesn't work properly with my advanced processing (not covered here). Hopefully, at some point, it will all be unified.
Let's render some addresses.
Please note that when using a rendering library built on WebGL (like unfolded), displaying hundreds of thousands of items is very smooth.
I presented how to obtain and process geojson municipal data.
Actually, it should be called "preprocessing", as I haven't touched any processing that is interesting from the user's perspective. Now we could connect this with actual "domain" processing, join it with other data sources, and add true F# expressiveness on top of that.
But this is for another story.