DEV Community

Cover image for Multi-Statement SQL for reducing write latency in YugabyteDB (and PostgreSQL alternative to INSERT ALL)
Franck Pachot for YugabyteDB

Posted on

Multi-Statement SQL for reducing write latency in YugabyteDB (and PostgreSQL alternative to INSERT ALL)

In this article, I'll discuss different methods for running multiple SQL statements: a multi-statement transaction, a DO block, a call to a stored procedure, a multi-statement command like JDBC batching, or a single SQL statement. We will also see a difference between Read Committed and Repeatable Read.

YugabyteDB is similar to PostgreSQL but with distributed transactions and storage, introducing network latency to read and write operations. Despite the initial concern, this may affect performance but remains scalable because operations can be batched and flushed simultaneously rather than once per row. This results in network latency occurring once per transaction or every thousand rows, which makes it less noticeable to the end users. The performance overhead is similar to that of a traditional database waiting for a synchronous commit to its standby but with better resilience to failures.

In a procedural block, running multiple statements may not allow all operations to be batched. This can cause an intermediate flush and require waiting for the Raft quorum's acknowledgment multiple times per transaction. This is because of the procedural nature of multiple statements within a transaction, where each depends on the state resulting from the previous statement. In other words, all buffered write operations must be flushed before a read. Additionally, any non-transactional action between write operations must flush the buffered ones, like any output sent to the client (user or application).

In contrast, within a single SQL statement, all operations are declarative and read the same state as at the beginning of the statement. This eliminates the need for an intermediate flush. An SQL statement should not see the changes made within the same statement, so there's no need to flush them before the end.

I'll provide some examples to clarify these concepts.


For this demo, I'll exaggerate the network latency using my yb-compose setup:

git clone https://github.com/FranckPachot/yb-compose.git
cd yb-compose

cat > .env.delay <<-'CAT'
 COMPOSE_PROJECT_NAME=yb-compose
 TSERVER_FLAGS=yb_enable_read_committed_isolation=true
 MASTER_FLAGS=
 NET_DELAY_MS=50
CAT

docker compose --env-file=.env.delay up -d

Enter fullscreen mode Exit fullscreen mode

This starts a three nodes cluster, adding a large delay on network calls from the first node (region1) to the others (150ms to region2 and 200ms to region3):
Image description

I set the data placement to have all Raft leaders in the first node, the one I'll connect to so that the network delay applies only to the Raft consensus with the followers:

docker compose --env-file=.env.delay exec yb yugabyted configure data_placement --constraint_value=cloud.region1.zone:1,cloud.region2.zone:2,cloud.region3.zone:3

+-------------------------------------------------------------------------------------------------+
|                                            yugabyted                                            |
+-------------------------------------------------------------------------------------------------+
| Status                  : Configuration successful.                                             |
| Fault Tolerance         : Primary Cluster can survive at most any 1 availability zone failure   |
| Zone Preferences        : Successful in setting preference for zones.                           |
|                         : Following are the preferences for zones                               |
|                         : cloud.region1.zone : 1                                                |
|                         : cloud.region2.zone : 2                                                |
|                         : cloud.region3.zone : 3                                                |
+-------------------------------------------------------------------------------------------------+

Enter fullscreen mode Exit fullscreen mode

I'm connecting to the first node:

docker compose run -it pg psql -h yb-compose-yb-1

Enter fullscreen mode Exit fullscreen mode

With this configuration, I know that the simulated network latency will be added only to the replication synchronization to the Raft followers when the writes are flushed and the leader waits for one to acknowledge.

I create three tables and measure the elapsed time in my session:

create extension if not exists pgcrypto;

create table demo1 (
 id uuid default gen_random_uuid() primary key,
 value text
);
create table demo2 (
 id uuid default gen_random_uuid() primary key,
 value text
);
create table demo3 (
 id uuid default gen_random_uuid() primary key,
 value text
);
\timing on

Enter fullscreen mode Exit fullscreen mode

Multi-Statement Transaction

I insert one row in each table, with single-row insert statements in a Repeatable Read transaction:

yugabyte=# begin transaction isolation level repeatable read;
BEGIN

Time: 50.432 ms
yugabyte=*# insert into demo1(value) values ('Bonjour');
INSERT 0 1
Time: 290.115 ms
yugabyte=*# insert into demo2(value) values ('Ciao');
INSERT 0 1
Time: 202.580 ms
yugabyte=*# insert into demo3(value) values ('Grüezi');
INSERT 0 1
Time: 203.431 ms

yugabyte=*# commit;
COMMIT
Time: 336.672 ms
Enter fullscreen mode Exit fullscreen mode

Each statement pays the high price of network latency, as write operations must be completed before returning feedback to the user or application. Typically, write operations can be applied asynchronously, and if they fail later, the transaction can be rolled back. However, user feedback is non-transactional and cannot be erased, so the database must wait for flushing write operations on each statement. This entails waiting for Raft's consensus, including the latency to the nearest follower.

DO $$block$$ begin ...; end; $$block$$

One way to avoid per-statement feedback is grouping all the statements in a DO block:

yugabyte=# do $$
begin
rollback; -- because of https://github.com/yugabyte/yugabyte-db/issues/12494
set transaction_isolation='repeatable read';
insert into demo1(value) values ('Bonjour');
insert into demo2(value) values ('Ciao');
insert into demo3(value) values ('Grüezi');
commit;
end;
$$;
DO
Time: 465.189 ms
Enter fullscreen mode Exit fullscreen mode

The total time is much faster.

Read Committed

I've run it with Repeatable Read Isolation level because the default, Read Committed, brings another reason for a per-statement flush:

yugabyte=# do $$
begin
insert into demo1(value) values ('Bonjour');
insert into demo2(value) values ('Ciao');
insert into demo3(value) values ('Grüezi');
end;
$$;
DO
Time: 801.634 ms
Enter fullscreen mode Exit fullscreen mode

The total time is higher with the default Read Committed (and Auto-Commit). Read Committed is another reason for a per-statement flush because it runs in a subtransaction, to be able to be rolled back to the implicit savepoint taken before in case of conflict detection and transparent restart.

The reasons for flushing the write operations are documented in the YugabyteDB code: https://github.com/yugabyte/yugabyte-db/blob/2.23.0.2490/src/yb/yql/pggate/README#L90

The best way to avoid those flush is to run all writes in a single SQL statement: no feedback, no procedural dependencies, no savepoints.

Single-Table Multi-Value Insert

This is easy when inserting into a single table:

yugabyte=# insert into demo1(value) values ('Bonjour'),('Ciao'),('Grüezi');
INSERT 0 3
Time: 355.160 ms
Enter fullscreen mode Exit fullscreen mode

All writes are flushed at once, which is visible with the dist option of explain analyze showing Storage Flush Requests:

yugabyte=# explain (analyze, dist, costs off)
       insert into demo1(value) values ('Bonjour'),('Ciao'),('Grüezi');
                                QUERY PLAN
---------------------------------------------------------------------------
 Insert on demo1 (actual time=0.150..0.150 rows=0 loops=1)
   ->  Values Scan on "*VALUES*" (actual time=0.026..0.048 rows=3 loops=1)
         Storage Table Write Requests: 3
 Planning Time: 0.051 ms
 Execution Time: 153.538 ms
 Storage Read Requests: 0
 Storage Rows Scanned: 0
 Storage Write Requests: 3
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Flush Requests: 1
 Storage Flush Execution Time: 153.262 ms
 Storage Execution Time: 153.262 ms
 Peak Memory Usage: 24 kB
(14 rows)

Time: 383.826 ms

Enter fullscreen mode Exit fullscreen mode

There are three write requests, one for each row, but only one flush, waiting for the Raft synchronization to the quorum.

I did this to demonstrate my goal of inserting three rows with only one flush. However, this multi-value insert is not equivalent to the previous transactions because it inserts into a single table, where I wanted to insert one row into three tables.

Stored Procedure

Instead of a DO block, I can declare my logic in a stored procedure and execute it:

yugabyte=# create or replace procedure p() as $$
insert into demo1(value) values ('Bonjour');
insert into demo2(value) values ('Ciao');
insert into demo3(value) values ('Grüezi');
$$ language SQL;

CREATE PROCEDURE
Time: 1268.817 ms (00:01.269)

yugabyte=# begin transaction isolation level repeatable read
\;
call p()
\;
commit;
BEGIN
CALL
COMMIT
Time: 361.638 ms

Enter fullscreen mode Exit fullscreen mode

This was fast, with no intermediate flush, but I've run it with a Repeatable Read isolation level.

With the default Read Committed, the response time increases:

yugabyte=# begin transaction isolation level read committed
\;
call p()
\;
commit;
BEGIN
CALL
COMMIT
Time: 660.963 ms
Enter fullscreen mode Exit fullscreen mode

Multi-Statement Command (like JDBC batching)

Note that depending on the language, there are other ways to send multiple statements simultaneously, like with Java statement batching, and the performance will depend on how they are processed.

With psql, I can send a multi-statement command with \; separators:

yugabyte=# begin transaction isolation level repeatable read
\;
insert into demo1(value) values ('Bonjour')
\;
insert into demo2(value) values ('Ciao')
\;
insert into demo3(value) values ('Grüezi')
\;
commit
;
BEGIN
INSERT 0 1
INSERT 0 1
INSERT 0 1
COMMIT
Time: 661.411 ms
Enter fullscreen mode Exit fullscreen mode

This is not as fast as with a stored procedure because feedback (the number of rows inserted) is collected after each insert.

Who uses multi-statement commands? Object Relational Mappers (ORM) use them to minimize the number of roundtrips when flushing entities and collections, such as setting the JDBC batch_size. However, when using ORM, you won't benefit from the write operations batching, like we have seen above when doing the same from psql.

The number of flushes to the Raft consensus will be reduced only when the inserts go to the same table, and you set reWriteBatchedInserts=true in the PostgreSQL driver. This setting rewrites multiple insert statements into one multi-value insert.


The problem is the procedural sequence of three statements. There's a solution: SQL is not procedural.

Single SQL Statement with Common Table Expressions

Thanks to the PostgreSQL powerful WITH clause, I can write the insert statements as Common Table Expression (CTE):

yugabyte=# with
i1 as (
insert into demo1(value) values ('Bonjour')
),
i2 as (
insert into demo2(value) values ('Ciao')
),
i3 as (
insert into demo3(value) values ('Grüezi')
)
select;
--
(1 row)

Time: 418.613 ms

Enter fullscreen mode Exit fullscreen mode

With this statement, I can insert into three tables with the same performance as the multi-value insert into one table. This is an excellent equivalent to the Oracle INSERT ALL statement.

I mentioned that flushing batched writes is necessary before reading. However, within a single SQL statement, it is not necessary because it doesn't read the updated state. If your logic requires querying the modified state, you can return it from any DML statement.

With a RETURNING clause, I can get the feedback about the rows inserted and get the details within the same response time:

yugabyte=# with
i1 as (
insert into demo1(value) values ('Bonjour')
returning 1 as inserted
),
i2 as (
insert into demo2(value) values ('Ciao')
returning 1 as inserted
),
i3 as (
insert into demo3(value) values ('Grüezi')
returning 1 as inserted
)
select
 (select sum(inserted) from i1) as i1
,(select sum(inserted) from i2) as i2
,(select sum(inserted) from i3) as i3
;
 i1 | i2 | i3
----+----+----
  1 |  1 |  1
(1 row)

Time: 355.848 ms

Enter fullscreen mode Exit fullscreen mode

This is optimal, limits the Raft consensus synchronization when distributed, and provides the same feedback as multiple statements would do.


Note that it is not limited to INSERT statements. That's what we will see in the next post. A WITH clause is a great way to declare multiple DML statements. If you can't group all your statements in a single SQL, you can still perform well by grouping them in a single call. Running them with a Repeatable Read isolation level will be more scalable.


YugabyteDB Resiliency vs. PostgreSQL High Availability Solutions | Yugabyte

Discover the difference between traditional HA database solutions based on enhanced and automated disaster recovery, with cloud-native solutions focused on resiliency built into the database.

favicon yugabyte.com

Top comments (0)