Scaling databases is hard. However, perhaps the lowest hanging fruit is introducing read-only replicas.
A typical load-balancing requirement is to route all "logical" read-only queries to a read-only instance. This requirement can be implemented in two ways:
- Create two database clients (read-write and read-only) and pass them around the application as needed.
- Use a middleware to assign queries to a connection pool based on the query itself.
## Option 1: Two distinct clients
The first option is preferable as it is the most explicit. However, it also has the most implementation overhead – adding this to an existing codebase could mean modifying thousands of lines of code. The other downside of this approach is that it is easy to overlook a read-only query that is using a read-write database client, and while that isn't a major issue, it would be nice if the inverse were true – we would be warned if we are using the wrong connection for the query. That might be possible with a middleware that routes queries.
## Option 2: Middleware
The second option routes queries based on the query itself and the context in which it was initiated. The following has to be true for us to safely route a query to a read-only instance:
- It must be a SELECT query.
- It mustn't be part of a transaction.
- It mustn't execute a volatile function.
The first two requirements are relatively straightforward to implement. The third one requires that we introduce a convention.
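For illustration, the first two checks can be sketched as a small predicate (plain JavaScript; the function name and the `inTransaction` flag are hypothetical – in practice the transaction state comes from the client, as we will see below):

```javascript
// Hypothetical helper sketching the first two routing conditions.
// `sql` is the raw query text; `inTransaction` must be supplied by the caller.
const isCandidateForReadOnly = (sql, inTransaction) => {
  if (inTransaction) {
    // Transactions may mix reads and writes, so never route them.
    return false;
  }

  // Only plain SELECT queries qualify.
  return sql.trim().toUpperCase().startsWith('SELECT ');
};

console.log(isCandidateForReadOnly('SELECT id FROM users', false)); // true
console.log(isCandidateForReadOnly('UPDATE users SET name = $1', false)); // false
console.log(isCandidateForReadOnly('SELECT id FROM users', true)); // false
```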
## Handling volatile functions
A volatile function in PostgreSQL is any function that has side effects (e.g. writes data) or whose output can change even within a single table scan (e.g. `random()`). In our context, only the first kind of volatility is relevant. However, there is no way of telling just by looking at the query whether it is volatile or not.
```sql
SELECT foo(); -- Is foo() volatile?
```
Therefore, we need a convention for marking such queries as not safe for read-only routing. There is more than one way of doing it, but we chose a very simple approach: a magic comment that marks the query as volatile. Just add `@volatile` anywhere in the query (in a comment) to indicate that it has side effects, and have the middleware check for that keyword.
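The convention check itself is trivial – a substring test on the query text (the helper name here is made up, for illustration):

```javascript
// Honors the @volatile convention: any query carrying the marker
// anywhere in its text (typically in a comment) is treated as unsafe.
const isMarkedVolatile = (sql) => {
  return sql.includes('@volatile');
};

console.log(isMarkedVolatile('-- @volatile\nSELECT write_log()')); // true
console.log(isMarkedVolatile('SELECT name FROM users')); // false
```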
We could also do the inverse here and use a convention to mark which queries are safe for routing to the read-only instances using a `@readOnly` keyword. This way we could avoid checking the query for other keywords altogether and rely entirely on what the engineer instruments. However, this approach suffers from the same shortcomings as option 1: it is easy to miss opportunities, and it requires editing every query one by one.
## Implementing query routing using Slonik
We are using the Slonik PostgreSQL client in our project, and lucky for us, Slonik has a `beforePoolConnection` middleware that can be used to implement all three requirements. In short, `beforePoolConnection` is called just before the query is assigned a connection. All we have to do is create a second read-only pool and route queries to that pool when all three conditions are satisfied. The code is mighty straightforward:
```js
const readOnlyPool = await createPool('postgres://read-only');

const pool = await createPool('postgres://main', {
  interceptors: [
    {
      beforePoolConnection: (connectionContext) => {
        if (!connectionContext.query?.sql.trim().toUpperCase().startsWith('SELECT ')) {
          // Returning null falls back to using the DatabasePool from which the query originates.
          return null;
        }

        // This is a convention for the edge cases where a SELECT query includes a volatile function.
        // Adding a @volatile comment anywhere into the query bypasses the read-only route, e.g.
        // sql`
        //   -- @volatile
        //   SELECT write_log()
        // `
        if (connectionContext.query?.sql.includes('@volatile')) {
          return null;
        }

        // Returning an instance of DatabasePool will attempt to run the query using the other connection pool.
        // Note that all other interceptors of the pool that the query originated from are short-circuited.
        return readOnlyPool;
      },
    },
  ],
});

// This query will use the `postgres://read-only` connection.
pool.query(sql`SELECT 1`);

// This query will use the `postgres://main` connection.
pool.query(sql`UPDATE 1`);
```
A few things that are worth clarifying about the above code:
- We are not handling `SELECT INTO` in this middleware. We know that our use case does not require it, so we left it out. If you are using `SELECT INTO`, a simple fix could be to check whether the query contains the `INTO` keyword.
- It is worth emphasizing that if a client initiates a connection using `pool#connect()` or `pool#transaction()`, then `connectionContext.query` is `null`, i.e., we are going to fall back to using the main pool.
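If your codebase does use `SELECT INTO`, the suggested check could be sketched like this (a naive word-boundary test with a made-up helper name, not a substitute for a real parser):

```javascript
// Naive guard for SELECT INTO: any SELECT containing the INTO keyword
// creates a table, so it must stay on the read-write pool.
// \b boundaries avoid false positives on identifiers such as "inventory".
const containsIntoKeyword = (sql) => {
  return /\bINTO\b/i.test(sql);
};

console.log(containsIntoKeyword('SELECT * INTO archived_users FROM users')); // true
console.log(containsIntoKeyword('SELECT * FROM inventory')); // false
```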
And that's it! Short and sweet. We now have an efficient query routing mechanism that reduces the load on our main instance and allows us to scale our service better.
Top comments (7)
Interesting. This, however, seems to need some sort of SQL interpreter, and even if one is provided, it would have to be pretty darn smart. What if my SQL statement is a multi-statement SQL? What if in my multi-statement SQL I start with a `SELECT` but the next statement is an `UPDATE`? If the SQL interpreter/parser cannot cover things like this, I think I would just stick to layered programming and make the decision at the repository level based on the function call. Just my thoughts.

Slonik only allows you to execute one query at a time, so you cannot combine SELECT and UPDATE into one. In the context of the chosen stack, all you need to make a sane decision are the 3 criteria outlined in the post.
Yes, then it's not for me, unfortunately. All my REST API does UPDATE/INSERT then SELECT to return the most current version of the record to pick up database-inserted data like timestamps. I would need a different mechanism. Still, it is interesting for sure. Thanks for sharing.
You can use the `RETURNING` keyword to make UPDATE and INSERT return the filled-in columns. You don't need a SELECT for that.
Nice concept. Should work in a lot of implementations, though it is only 100% correct if you make a perfect SQL parser. The idea to use a read-only replica for reporting and anomaly detection is a valid pattern anyway. It's a great way to maximise performance.
It is already 100% correct in the sense that it won't ever route unsafe queries to a read-only instance. The few edge cases (like `WITH`) that we are not routing to the read-only instance would indeed require a parser.

Choosing the safe route when in doubt is the right thing to do.