
Some of my favourite public data sets

Robin Moffatt · Originally published at rmoff.net · 2 min read

Readers of a certain age and RDBMS background will probably remember the Northwind, HR, or OE databases, and quite possibly are still using them. Hardcoded sample data is fine, and it’s great for repeatable tutorials and examples, but it’s boring as heck when you want to build an example and find yourself reaching for the same data set for the 100th time.

I’ve written before about one of my favourite resources for mocking data, Mockaroo, and how you can even use it to stream mock data into Kafka. Other mock data generators for Kafka include kafka-connect-datagen and Voluble.
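To make the idea of a mock data generator concrete, here is a minimal sketch using only the Python standard library. It is not how Mockaroo, kafka-connect-datagen, or Voluble work internally; the `mock_events` function, the field names, and the station list are all hypothetical and exist only for illustration. Each yielded string is a JSON document that you could hand to whichever Kafka producer client you prefer.

```python
import json
import random
from datetime import datetime, timezone

# Hypothetical values, purely for illustration (not taken from any real feed)
STATIONS = ["Leeds", "York", "London Kings Cross", "Manchester Piccadilly"]


def mock_events(count, seed=None):
    """Yield `count` JSON-encoded mock events, one string per event."""
    rng = random.Random(seed)  # a seed makes the random fields repeatable
    for i in range(count):
        event = {
            "event_id": i,
            "station": rng.choice(STATIONS),
            "delay_minutes": rng.randint(0, 30),
            "ts": datetime.now(timezone.utc).isoformat(),
        }
        yield json.dumps(event)


if __name__ == "__main__":
    # Print a handful of events; in practice each line would be sent to a topic
    for line in mock_events(5, seed=42):
        print(line)
```

Passing a seed keeps the station and delay values repeatable between runs, which is handy for tutorials. The timestamp still reflects the time of generation.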

Sometimes, though, you just want some real, live, warts-and-all data. Fortunately, governments and public bodies have made a real shift towards open data in recent years. Here is a list of some of my favourite (UK-centric) resources. Many have a mix of live and static datasets.


  • Transport for London (TfL) - Great source of data about the capital’s transport system, including lots of live feeds

  • Network Rail - a nice feed of data all about the UK rail network. I had fun with this data here :)


What are your go-to sources for real data? Comment below or let me know on Twitter.

Discussion

Simon Aubury

Worth highlighting the Splitgraph Data Delivery Network. It has 40,000 public datasets. It's essentially a PostgreSQL proxy, so you can access it with any PostgreSQL client. splitgraph.com/connect

Julian Nicholls

My favourite of the UK government data pages is Highways Agency Roadworks.

I have an app running here which I update each week with the latest data. Code on GitHub.

I've also done a local travel app here, based on the data at Transport API.