Consistent reads go to the tablet leader to work on the current state, being guaranteed to see the latest writes, maybe waiting for concurrent writes to be committed.
In a geo-distributed deployment, where network latency can be higher than the expected response time, you may prefer to read from the nearest replica, leader or follower, to get faster response time.
When follower reads are allowed, YugabyteDB still reads from a consistent snapshot, but accepting a bounded staleness defined by
yb_follower_read_staleness_ms which defaults to 30 seconds. All replicas are guaranteed to be consistent with this snapshot, even without reading from other followers, for two reasons: they exchange heartbeats (by default every 500ms), and the maximum clock skew is known (by default 500ms).
A staleness of 15 or 30 seconds is often acceptable for real-time analytics or reporting where you query a virtual snapshot without making any some reservations (locks).
This behavior is enabled in YugabyteDB by two settings:
- the transaction must be declared as read-only, so that it cannot compromise write consistency. Basically, in an MVCC database, "read-only" is an isolation level that doesn't require any lock acquisition or read intents record.
yb_read_from_followersis set to
onto explicitly allow the possible staleness because that is different from the monolithic PostgreSQL behavior of read-only transactions (which depends on the replication mode and settings). It explicitly allow a staleness of
yb_follower_read_staleness_msbefore the beginning of the query.
To set those, it is recommended to deploy with a dedicated connection pool with those initialization settings. Here are some examples:
- SET on connection initialization declared in the connection pool configuration:
set yb_read_from_followers to on; set default_transaction_read_only to on;
- PGOPTION as environnement variable:
PGOPTIONS="-c yb_read_from_followers=on -c default_transaction_read_only=on"
- PGOPTION in the JDBC URL:
- dedicate a user for it:
alter user reader_role set yb_read_from_followers to on; alter user reader_role set default_transaction_read_only to on;
Typically you need at least two connection pools in a CQRS architecture: one for the consistent reads and writes, and one for the queries.
Multiple connection pools will increase the number of connections and it is also a good practice to limit the number of connections and, in some cases, you want to allow follower reads for a few specific queries only.
Setting those two parameters before the query, and back to their initial value after it, may require additional roundtrips to the server. It is also not easy to maintain: forgetting to set it back, in the normal scenario or after an exception, would be a critical issue.
This blog post explores some way to set those parameters per query without increasing the number of roundtrips.
Before trying some alternatives, I need a reliable way to test it. I create a simple table:
create table demo as select generate_series(1,1000) x;
and use my YBWR script:
\! curl -s https://raw.githubusercontent.com/FranckPachot/ybdemo/main/docker/yb-lab/client/ybwr.sql | grep -v '\watch' > ybwr.sql \i ybwr.sql
This defines two prepared statements,
snap_table to display the read statistics on the tablet (
next() in the LSM-Tree).
First, I set
yb_read_from_followers=on without setting a read-only transaction. This reads from the leaders (
L) which are distributed to multiple nodes (
yugabyte=# execute snap_reset; ybwr metrics -------------- (0 rows) Time: 141.791 ms yugabyte=# set yb_read_from_followers=on; SET Time: 12.080 ms yugabyte=# select count(*) from demo where x between 1 and 42 ; count ------- 42 (1 row) Time: 13.412 ms yugabyte=# execute snap_table; rocksdb_seek | rocksdb_next | rocksdb_insert | dbname / relname / tserver / tabletid / leader --------------+--------------+----------------+------------------------------------------------------------ 1 | 667 | | yugabyte demo 0d4b4b64a93c43239d0ca8ed21286458 L 10.0.0.41 1 | 645 | | yugabyte demo 49b822c41fe34c059e0e24788810012a L 10.0.0.40 1 | 685 | | yugabyte demo aa45e3e86eaa41ca9254100fbe4b8b43 L 10.0.0.39 (3 rows) Time: 450.742 ms
This has read from different nodes, where the table Leader (
L) is. In this case (Sequential Scan on few rows) there's one
seek() per node, which means that the response time will include the latency to each node, once per execution.
Now, if I do the same in a read-only transaction (with
yb_read_from_followers stills set to
yugabyte=# execute snap_reset; ybwr metrics -------------- (0 rows) Time: 141.791 ms yugabyte=# set default_transaction_read_only=on; SET Time: 12.080 ms yugabyte=# select count(*) from demo where x between 1 and 42 ; count ------- 42 (1 row) Time: 13.412 ms yugabyte=# set default_transaction_read_only=off; SET Time: 12.069 ms rocksdb_seek | rocksdb_next | rocksdb_insert | dbname / relname / tserver / tabletid / leader --------------+--------------+----------------+------------------------------------------------------------ 1 | 667 | | yugabyte demo 0d4b4b64a93c43239d0ca8ed21286458 10.0.0.39 1 | 645 | | yugabyte demo 49b822c41fe34c059e0e24788810012a 10.0.0.39 1 | 685 | | yugabyte demo aa45e3e86eaa41ca9254100fbe4b8b43 L 10.0.0.39 (3 rows) Time: 450.742 ms
In this 3 nodes cluster with Replication Factor 3, there's a replica on each node. I have read from the Leader only when it was near the node I'm connected to (
10.0.0.39) and from the followers for the other, still reading from my local node. This validates the read from follower behavior.
I can also test the consequence, that with those settings, a read-only transaction reads from a virtual snapshot taken from 30 seconds ago:
yugabyte=# \dconfig yb*follower* List of configuration parameters Parameter | Value -------------------------------+------- yb_follower_read_staleness_ms | 30000 yb_read_from_followers | on (2 rows) yugabyte=# select count(*) from demo; count ------- 1000 (1 row) yugabyte=# insert into demo values(0); INSERT 0 1 yugabyte=# select count(*) from demo; count ------- 1001 (1 row) yugabyte=# set default_transaction_read_only=on; SET yugabyte=# select count(*) from demo; count ------- 1000 (1 row) yugabyte=# \! sleep 30 yugabyte=# select count(*) from demo; count ------- 1001 (1 row) yugabyte=# set default_transaction_read_only=off; SET yugabyte=#
Now that I know how to test, I can show the different ways to set those parameters for a single query.
I'll start with a way that may seem to work in some situation but only because of some side effects when setting the read point. YugabyteDB installs the
pg_hint_plan which allows a
Set() hint to set a parameter for one query. However, this is set only during the parsing phase, and is relevant only for query planner parameters. Reading from followers is a behavior of the Executor and is not controlled by
Doing so was used in some situation, and even got its way into some documentation examples, by mistake. It was working as a side effect only, because the read time is pro-actively set during the parse phase and, in some limited cases (atomic transaction, no prepared statement, no generic plan) doesn't change at execution time.
Basically there is no way in PostgreSQL to set a parameter for the scope of a query only and I'm not aware of any extension doing this. There was a proposal in 2011 (proposal: set GUC variables for single query) with some discussions about a possible syntax and some doubts about the value of it in addition to the transaction scope with SET LOCAL.
SET LOCAL is in theory the best solution but it requires an explicit transaction:
start transaction; set local transaction_read_only to on; set local yb_read_from_followers to on; select count(*) from demo where x between 1 and 42; commit;
I don't like this solution because I consider that read-only must be at transaction level. Would it make sense to set this parameter back and forth in the same transaction? No, and you would get
ERROR: cannot set transaction read-write mode inside a read-only transaction
yb_read_from_followers=on is already set, it is much better to define READ ONLY at the scope of the transaction:
start transaction read only; select count(*) from demo where x between 1 and 42; commit;
However, as your goal is a single-query, you don't want 3 roundtrips to the PostgreSQL backend to run this. It is not possible to use an autocommit statement (or you would have to set
default_transaction_read_only and then be sure to set it back after).
You can send the explicit transaction in a single SQL command if your driver is able to get the query result from there. this is possible with
ysqlsh but this requires that the
; is escaped (
; as a command separator before sending it as a statement separator. The difference is visible when you set
\timing on as it shows the time for each command:
yugabyte=# \timing on Timing is on. yugabyte=# start transaction read only ; START TRANSACTION Time: 12.096 ms yugabyte=*# select count(*) from demo where x between 1 and 42 ; count ------- 42 (1 row) Time: 13.410 ms yugabyte=*# commit; COMMIT Time: 12.069 ms yugabyte=# yugabyte=# yugabyte=# start transaction read only \; yugabyte-# select count(*) from demo where x between 1 and 42 \; yugabyte-# commit; START TRANSACTION count ------- 42 (1 row) COMMIT Time: 13.756 ms
You should test what happens with the drivers you use, For example do you know what JDBC sends when you
setReadOnly(true) or JPA and Spring with
@Transactional(readOnly = true)? How many calls to the database? Some client library will also allow to batch multi-statement commands so that you can send the
start transaction and the
select in one execution.
pg_stat_statements you will see multiple statements in both cases, because
query is a statement there, but in
pg_stat_activity you will see the whole command in
Another possibility to reduce the calls to the database is encapsulating your query as a function. There you can define parameters with the scope of the function execution:
create or replace function demo_query(a int, b int) returns setof demo as $SQL$ select * from demo where x between a and b; $SQL$ language sql set transaction_read_only to on set yb_read_from_followers=on ;
This reads from followers in one atomic transaction:
yugabyte=# select count(*) from demo_query(1, 42) ; count ------- 42 (1 row) Time: 16.308 ms yugabyte=# execute snap_table; rocksdb_seek | rocksdb_next | rocksdb_insert | dbname / relname / tserver / tabletid / leader --------------+--------------+----------------+------------------------------------------------------------ 1 | 669 | | yugabyte demo 0d4b4b64a93c43239d0ca8ed21286458 10.0.0.39 1 | 645 | | yugabyte demo 49b822c41fe34c059e0e24788810012a 10.0.0.39 1 | 685 | | yugabyte demo aa45e3e86eaa41ca9254100fbe4b8b43 L 10.0.0.39 (3 rows) Time: 454.406 ms
This works but you have to write a query with parameters. As far as I know there is no possibility to do the same with a view.
There is also no possibility to do the same with a DO block as it cannot return a result.
transaction_read_only but there are equivalent syntaxes:
within a session you can
set session characteristics as transaction read onlyto set the transaction mode for the next transactions (it doesn't try to change the current one if you have already started one - and some drivers with auto commit of issue a begin before).
within a transaction, you can
set transaction read only. You don't need SET LOCAL as this is a characteristic of the current transaction and the next one will use the default, but I think it is good to make it explicit.
Be careful, some drivers disable Auto Commit by starting a transaction before the call. Then setting the default for the next transaction will not be applied for the next statements.
- my preference goes to
start transaction read onlyas it clearly associate the mode with the transaction. It is the same as
begin read onlyor
begin work read onlyor
begin work transaction read only. I often type
beginas it is shorter but the SQL starndard is
If you know a simpler way to set a parameter for a single query in PostgreSQL, please let me know.