Readers of a certain age and RDBMS background will probably remember Northwind, or HR, or OE databases - or quite possibly not just remember them but still be using them. Hardcoded sample data is fine, and it’s great for repeatable tutorials and examples - but it’s boring as heck if you want to build an example with something that isn’t using the same data set for the 100th time.
I’ve written before about one of my favourite resources for mocking data, Mockaroo, and how you can even use it to stream mock data into Kafka. Other mock data generators for Kafka include kafka-connect-datagen and Voluble.
Sometimes though, you just want some real, live, warts-and-all data. Fortunately, there has been a real shift among governments and public bodies in recent years towards open data. Here is a list of some of my (UK-centric) go-to resources. Many have a mix of live and static datasets.
Northern Data Hub - Bradford City Council data, including the car park live stream that I used in this talk
Data Mill North - 685 published datasets from Leeds City Council
data.gov.uk - Huge listing of open data provided by the UK government
UK Environment Agency flood-monitoring API - this is one of my favourites, because not only do you get a live feed of river levels from around the UK, you get to make awful puns about streams (geddit?!)
Transport for London (TfL) - Great source of data about the capital’s transport system, including lots of live feeds
Network Rail - a nice feed of data all about the UK rail network. I had fun with this data here :)
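To give a flavour of what pulling from one of these live feeds involves, here is a minimal Python sketch that takes a snapshot of the latest readings from the Environment Agency flood-monitoring API. The endpoint URL and the field names (`items`, `measure`, `dateTime`, `value`) are assumptions based on my reading of the API's public documentation, so check them against the current docs before relying on them. The function names are made up for illustration.

```python
import json
from urllib.request import urlopen

# Assumed endpoint: the latest reading for every measure (verify against the API docs).
READINGS_URL = "https://environment.data.gov.uk/flood-monitoring/data/readings?latest"


def extract_readings(payload):
    """Flatten the JSON envelope into (measure, timestamp, value) tuples."""
    rows = []
    for item in payload.get("items", []):
        value = item.get("value")
        # Skip anything that is not a plain number (missing or malformed readings).
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        rows.append((item.get("measure"), item.get("dateTime"), value))
    return rows


def fetch_latest_readings(url=READINGS_URL, timeout=30):
    """Fetch one snapshot of the latest readings over HTTP."""
    with urlopen(url, timeout=timeout) as response:
        return extract_readings(json.load(response))
```

Calling `fetch_latest_readings()` on a timer gives you a crude live feed; from there each tuple could be sent on to Kafka or whatever else you are building your example with.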
What are your go-to sources for real data? Comment below or let me know on Twitter.
Top comments (3)
Worth highlighting the Splitgraph Data Delivery Network. It has 40,000 public datasets. It's essentially a PostgreSQL proxy - so you can access it with any PostgreSQL client. splitgraph.com/connect
My favourite of the UK government data pages is Highways Agency Roadworks.
I have an app running here which I update each week with the latest data. Code on GitHub.
I've also done a local travel app here, based on the data at Transport API.