A famous article, "10 Things I Hate About PostgreSQL", by Rick Branson summarizes the main pain points people encounter with PostgreSQL. It bring an answer to: "No software is perfect, so exactly what are PostgreSQL’s imperfections?"
Those problems are known for a while, work has been started on them, but they have their root in the design of the storage and transactions. This is not easy to change in a database that is used for decades, for critical applications, with, most of the time, no direct contact between the users and the community, because it is free and open source.
The PostgreSQL community brings many features and solutions into every release in the upper layer, the SQL language, the relational features, the advanced semi-structured data types like JSONB. But the lower layer, storage and transaction, is harder to evolve and still uses the Heap tables, B*Tree indexes, full-page WAL, checkpoints, no in-place updates, Vacuum, 32-bit xid... designed from the old times. It works, there are contributions to workaround the problems, but it can be annoying as mentioned in the "10 Things" article.
The primary goal of YugabyteDB is to provide a distributed SQL database compatible with all the features of PostgreSQL, with a cloud-native architecture for the distributed storage: all nodes active, resilient to failures and elastic to scale. But, with the modern storage layer design, it also solves most of the problems mentioned. This post will list the "10 Things" from the article and how they are solved in YugabyteDB. The titles are from Rick Branson article, they are not my words.
🐘PostgreSQL uses a 32-bit transaction ID and the database freezes when it is exhausted. Table *vacuum * must run frequently to avoid this.
🚀YugabyteDB uses Hybrid Logical Time (HLC) for cluster synchronization (see A Matter of Time: Evolving Clock Sync for Distributed Databases) and, as there's a physical clock component in it, it will never exhaust.
🐘PostgreSQL doesn't provide ACID across multiple servers. An async replica can be setup to offload reads, with an acceptable lag, and to provide Disaster Recovery with a Recovery Time Objective in minutes, and a Recovery Point Objective with some data loss. A quorum of sync replicas can reduce the risk of data loss at the price of higher latency for write transactions. This must be carefully managed to avoid data corruption in case of failover.
🚀YugabyteDB is built for efficient sync replication, sharding the tables and indexes into tablets, forming a Raft group, and replicating them to their peers with the Raft protocol. There's the guarantee to always have one and only one "leader" to take the reads and writes for a set of rows or index entries, without any risk of split-brain. And a "follower" can be elected as a new "leader" within 3 seconds.
🐘PostgreSQL streaming replication is based on physical replication though the WAL, at page level. I do not agree with the "Spread Corruption" part. The author of the "10 things" mentions "hardware-induced data corruption" which would need more explanations, because WAL streaming replication is a lot safer, in this area, than storage-based replication. I guess that what the author experienced was a non-full-page WAL record corrupting an existing block in the standby.
🚀YugabyteDB replication is a logical replication, at key-value level (the rows, index entries, transaction intents,.. are all key-value records appended to the Raft group). This is more efficient in the volume transferred (and this is also important for the cost of cross-cloud-region clusters). But it also lets each tablet peer apply the changes to their own files. YugabyteDB writes into Sorted Sequence Table (SST) files, always written sequentially, with no risk that a new change corrupts a previous block. They are compacted on each node independently, and compaction will detect a block corruption, thanks to checksum checking, and repair it.
🐘PostgreSQL accumulates the states by writing entire new rows. Even updating one byte of a single column will copy the whole row. This is known as table bloat, has also some performances on index maintenance especially if you use the default 100% fillfactor, and requires frequent vacuum.
🚀YugabyteDB stores updates on a per column basis. The accumulation of the garbage collection of the oldest versions, kept a short time for MVCC purpose, is managed by the storage with the compaction of SST files in background, per table/index shard. This activity doesn't collide with writes because the SST Files are immutable.
🐘PostgreSQL is not multi-threaded, and forks a new process per connection. Each connection takes memory, and memory is limited on a server. Especially with double buffering, less available memory means more I/O. Connection pools can help to limit this, but, with microservices, this is still a lot of connections. Pgbouncer can help, but that's another component to install and manage.
🚀YugabyteDB does the same, one process per connection, because the protocol and query layer uses PostgreSQL. However, this is usually not a scalability issue because connections can be spread to multiple nodes: all nodes are equal, accepting connections, with read and write transactions. In addition to that, there's ongoing work to add a database resident connection pool based on Odyssey. This will address two cases:
- application that doesn't use a client-side connection pool
- microservices with many too connection pools
🐘PostgreSQL stores rows in heap tables like Oracle, and the primary key is an additional secondary index. This takes more space and more reads (even the Index Only Access) has to read the table.
🚀YugabyteDB stores tables in their primary index structure, a log-structured merge-tree (LSM) with the first level in memory. This ensures fast access by primary key, and real Index Only Access by secondary indexes.
🐘PostgreSQL upgrades are done with a planned outage. The application must be stopped, not only for the duration of the upgrade but also for the statistic gathering that must be done after it (ANALYZE). This makes the downtime window depending on the size of the database.
🚀YugabyteDB, being distributed, allows online upgrade by upgrading the nodes one by one in a rolling fashion. In a Replication Factor RF=3 deployment, one node can be down without impacting the application availability. The restart is fast, but you can also add a new node and remove the old one when data has been rebalanced to it.
🐘PostgreSQL replication is not easy to setup. From a recent discussion with a friend concerned by it, here is a recent example showing how troubleshooting can be complex. Note that it also shows the amazing help you can have from an open-source community.
🚀YugabyteDB is designed for replication and all, sync, async, and change data capture (CDC), are at the same level: the per-shard Raft group writes. This is built to be fully automated. Enabling sync replication is as simple as setting
--replication-factor=3 when starting the cluster.
🐘PostgreSQL core team has always been against the usage of query planner hints. Even when it is accepted that giving directives to the optimizer is a good thing, like materializing a CTE or not, it has been introduced in the SQL syntax, breaking the compatibility with previous versions.
🚀YugabyteDB enables the
pg_hint_plan by default. That's not a reason to put hints everywhere, but it is already there when needed. For testing or quickly workaround a problem. This is simple and can be done with PostgreSQL of course. But, in IT, so many deployments are made without specific configuration, the defaults matter.
🐘PostgreSQL doesn't provide compression and, generally speaking, counts on the operating system to handle anything file-related. I've seen people using ZFS for the many features it provides, but getting acceptable and predictable performance for a database on ZFS is not easy.
🚀YugabyteDB provides compression and encryption, at rest (storage) and in transit (network). This is a must for a cloud-native database given the cost of storage, and egress traffic between AZ or regions.
The author, in conclusion, still recommends: "In general I’d recommend starting with PostgreSQL and then trying to figure out why it won’t work for your use case". He mentions that some of those issues can be reduced by using a managed service. Because, solutions, or workarounds are well known. You just need some DBAs to managed them pro-actively, in-house or as from a managed service. PostgreSQL is the best open-source database I know for a wide area of workloads.
Today, for an OLTP workload where you may need to scale at some point, and where high availability is a thing, I would recommend looking at YugabyteDB. Fully open-source, and compatible with PostgreSQL, you can still change your mind and move to PostgreSQL, managed or not, later. By doing this, you release the burden of dealing with vacuum and replication problems.
I'm not saying that you will not have other problems to deal with, there's no magic. A distributed database requires a bit more upfront design, like on the primary key sharding, and may show lower performance on single-node operations because there's no shared buffer cache. YugabyteDB re-use of PostgreSQL is stateless to be able to scale-out. However, in the decision to start with one or the other, the time you spend on the issues mentioned in this article are wasted when you move to another storage engine. On the other way, when starting with a distributed SQL database, the time spent on data model design for scalability will never be a waste of time. You will still benefit from scalable design in a monolithic database.
In some way, this last paragraph is an answer to
My answer is my subjective opinion, and you should make your own in your context, of course. I love PostgreSQL for its community of users, the ecosystem around it, its open-source model, and the strict SQL standard compatibility. And YugabyteDB reuses all that. Don't forget that I'm a Developer Advocate at Yugabyte. The more people start with YugabyteDB, the more work I have to help, learn and share. And I enjoy it enough to ask for more 😎
There are also many improvements in YugabyteDB when compared to PostgreSQL. For example, Batched Nested Loop, Loose Index Scan, in-database connection pool...