Postgres Sharding and Scalability with Marco Slot
Relational databases have been popular since the 1970s, but in the last 20 years the amount of data that applications need to collect and store has skyrocketed. The raw cost to store that data has decreased. There is a common phrase in software companies: “it costs you less to save the data than to throw it away.”
Saving the data is cheap, but accessing that data in a useful way can be expensive. Developers still need rapid row-wise and column-wise access to the data. Accessing an individual row of a database can be useful if a user is logging in and you want to load all of that user’s data, or if you want to update a banking system with a new financial transaction. Accessing an entire column of a database can be useful if you want to aggregate over all of the entries in a system, such as the sum of all financial transactions in a bank.
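To make the two access patterns concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for any SQL database. The `transactions` table, its columns, and the sample values are hypothetical and chosen only for illustration; they do not come from the episode.

```python
import sqlite3

# In-memory database standing in for any relational store (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, amount_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO transactions (user_id, amount_cents) VALUES (?, ?)",
    [(1, 2500), (2, 1000), (1, -500), (3, 4000)],
)


def rows_for_user(conn, user_id):
    """Row-wise access: fetch the complete rows that belong to one user."""
    cursor = conn.execute(
        "SELECT id, user_id, amount_cents FROM transactions "
        "WHERE user_id = ? ORDER BY id",
        (user_id,),
    )
    return cursor.fetchall()


def total_amount_cents(conn):
    """Column-wise access: aggregate a single column over every row."""
    return conn.execute("SELECT SUM(amount_cents) FROM transactions").fetchone()[0]
```

The first query touches a handful of rows but reads every column of each; the second reads one column but touches every row, which is why the two patterns tend to stress a database differently as the data grows.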
These access patterns are nothing new, but as the scale of data grows, companies are shifting from thinking in terms of individual databases to thinking about distributed “data platforms.”
In a data platform, the data across a company might be put into a variety of storage systems (distributed file systems, databases, in-memory caches, search indexes), but the API for the developer is kept simple. And the simplest, most commonly understood language is SQL.
Marco Slot is an engineer with Citus Data, a company that makes Postgres scalable. Postgres is one of the most common relational databases, and in this episode Marco describes how Postgres can be used to service almost all of the needs of a data platform.
This isn’t easy to do, as it requires sharding your growing relational database into clusters and orchestrating distributed queries between those shards. In this show, Marco and I discuss Citus’s approach to the distributed systems problems of a sharded relational database. This episode is a nice complement to previous episodes we have done with Ozgun and Craig from Citus, in which they gave a history of relational databases, and explained how Postgres compares to the wide variety of relational databases out there. Full disclosure: Citus Data is a sponsor of Software Engineering Daily.
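As a rough illustration of what sharding and distributed queries involve, the sketch below hash-partitions rows across a few in-memory "shards", routes a single-user lookup to one shard, and answers an aggregate by combining per-shard partial results. This is a toy model under stated assumptions, not a description of how Citus works: `ShardedTable`, `shard_for`, the shard count, and the sample data are all hypothetical names and values invented for the example.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count, chosen only for the example


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a distribution key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedTable:
    """Toy sharded table: each shard is a dict of user_id -> list of amounts."""

    def __init__(self, num_shards: int = NUM_SHARDS):
        self.num_shards = num_shards
        self.shards = [defaultdict(list) for _ in range(num_shards)]

    def insert(self, user_id: str, amount_cents: int) -> None:
        # Every row for a given user lands on the same shard.
        shard = self.shards[shard_for(user_id, self.num_shards)]
        shard[user_id].append(amount_cents)

    def rows_for_user(self, user_id: str) -> list:
        # Single-key query: only one shard needs to be consulted.
        shard = self.shards[shard_for(user_id, self.num_shards)]
        return list(shard.get(user_id, []))

    def total(self) -> int:
        # Aggregate query: compute a partial sum on each shard, then combine.
        partials = [
            sum(sum(amounts) for amounts in shard.values())
            for shard in self.shards
        ]
        return sum(partials)


table = ShardedTable()
for user_id, cents in [("alice", 2500), ("bob", 1000), ("alice", -500), ("carol", 4000)]:
    table.insert(user_id, cents)
```

Even in this toy version the two query shapes behave differently: the per-user lookup stays on one shard, while the total has to fan out to every shard and merge the results, which is the kind of coordination a real distributed database has to orchestrate.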
Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.