You need a database. Maybe you need more than one.
Fifteen years ago you’d choose MySQL if you were from the open source camp, SQL Server if you were on the Microsoft stack, and Oracle if you were operating in a large enterprise.
Today, we’re spoilt for choice when it comes to storing, querying, and manipulating data. But despite all the noise around NoSQL, Postgres has quietly come to be the default data store for many developers.
So, what is it about Postgres? Why has it become the preferred choice for much of the Rails community? What led Microsoft to buy a major Postgres vendor and Amazon to migrate from Oracle to largely Postgres-based data stores?
When I was looking for an answer, I have to be honest and say that I decided to be lazy. Rather than hunt around the web, I decided to look at one place where there were lots of Postgres installs. With more than two million managed Postgres databases, Heroku seemed like the perfect place to satisfy both my curiosity and my laziness.
Let’s start at the beginning. What do you want from a database?
Priority number one for many people is that the data store should get on and quietly do its job. Unless you’re particularly into databases, you probably want one you don’t have to think about. You want to write data and you want to retrieve data.
Postgres ticks pretty much every box when it comes to being boring in a good way:
Straightforward licensing: Postgres itself has a permissive open source licence and is developed by a global community rather than a single vendor. That reduces the risk of future bills and changes in product availability.
Standards compliant: Postgres meets 150 of 164 of the requirements of the SQL:2003 standard; no database management system meets all 164. That makes it easier to swap out, should you want to.
More than relational: support for object inheritance, plus key-value and JSON data types, mean Postgres can reduce your ops overhead in that you might not need to spin-up a NoSQL data store.
If you’re not worrying about your database, then you can focus on building something great.
Let’s say you’re growing a side-project into a business. You need to invest all those evenings and weekends in your product, not into worrying about the specifics of your data store. For Sugarmate, an app that helps people manage diabetes, Heroku Postgres has enabled its founder Josh to consider the data layer a solved problem, while he grows the app into a business with users worldwide.
Postgres scales. Vertically and horizontally. From a few hundred records to a multi-terabyte monster. Just how you scale with Postgres depends on your data model and data access needs.
In many scenarios, increasing memory and CPU gives just the right capacity boost. What happens, though, when you need more? Before you reach for the nearest NoSQL enthusiast to tell you why their preferred solution is better than Postgres, consider that:
Postgres scales out: for read-heavy workloads, PostgreSQL’s built-in support for read-only replicas simplifies spreading the load across multiple Postgres nodes.
Partitioning isn’t a dirty word: in write-heavy situations, splitting your data set into distinct databases is a perfectly acceptable way of gaining capacity.
Offloading helps: using Redis alongside Postgres can lighten the load, whether through caching or by shifting predictable data requests completely to Redis.
Sure, you’ll need to do some prep work to scale Postgres but there are trade-offs with any kind of data scaling.
Your production database is sacrosanct. But you still need real data for development, testing, and analysis.
Postgres has two ways to help:
Forking: create a new independent database from a snapshot of another database.
Following: live stream changes from one database –– say, your production instance –– to another, such as that used by an analytics team.
Take the example of Bonobos. They’re a fashion retailer with several teams across the company, such as data science, accounts, and warehouse, who need access to production data. Rather than put the burden on their main database, the Bonobos engineering team created read-only follower instances that share the data safely across the company.
An added bonus of Postgres being open source is that there are no additional licensing costs for those secondary instances.
Data warehouses have traditionally been big, complex beasts.
But what do you really want from a data warehouse? A single place to store a lot of data and a flexible way to query it. Both of which are Postgres specialities.
Thanks to the follower database concept, streaming data from other Postgres instances into a Postgres-based data warehouse is simple. Rhino, a provider of an insurance backed alternative to property rental deposits, takes it a step further by streaming data from multiple non-Postgres sources into their Postgres data warehouse. Using homegrown tooling or ETL such as Mulesoft, Postgres can ingest data from pretty much any source and then allow you to query it with good old SQL.
Part of the joy of the Postgres ecosystem lies in the ways that companies have integrated with and extended the core database.
There are extensions that add geographic and time series support, those that make it easier to migrate from other data stores, and yet others that help with partitioning, ACLs, indexing, and more.
One integration, Heroku Connect, takes care of bidirectional sync between a Postgres instance and data held in Salesforce. That opens the possibility of writing apps that work with Salesforce data and also of writing tools that make it easier to query and analyse Salesforce data.
Similar to Rhino’s use of Postgres as a data warehouse, syncing Salesforce data into Postgres unlocks that data for deeper query and analysis. Need to dive deep into, say, the relative order value between two different regions? Postgres can help you to pull out subtleties from the data that wouldn’t be possible otherwise. In fact, Salesforce’s own go to market planning team uses Heroku Connect as a core part of their annual planning process.
Postgres is an excellent foundation precisely because it’s so good in a bunch of different situations.
It might be the only database you ever need. Or perhaps you’ll find some parts of your dataset are better off elsewhere. The social network Dubsmash switched recording likes of their user videos to a simple key-value store instead, while holding onto Postgres for everything else.
The past decade has seen massive upheaval in the world of databases. Through it all, though, Postgres has been there chugging away serving queries, taking in data, and adding new features to its roster. I’m curious to hear from other people in the comments below about how you’re using Postgres, especially if it’s in an unusual or unexpected way.
Anyway, here’s to Postgres, whether it’s self-hosted in a data center, running on a Raspberry Pi cluster, running in a managed PostgreSQL offering, or however you’re using it.