TrailBase is a fast, single-file, open-source application server with type-safe APIs, built-in JS/ES6/TS Runtime, Auth, and Admin UI built on Rust+SQLite+V8.
It's the kind of thing that let's you bootstrap a backend for you rich client-side application - mobile, desktop, PWA - in no time.
You can check out a demo of the admin dashboard online:
- user: admin@localhost
- password: secret itself claims
In this article, we'll compare TrailBase's performance to a few amazing, and certainly more weathered alternatives such as SupaBase, PocketBase, and vanilla SQLite.
Disclaimer
Generally benchmarks are tricky, both to do well and to interpret.
Benchmarks never show how fast something can theoretically go but merely how fast the author managed to make it go. Micro-benchmarks, especially, offer only a key-hole insights, which may be biased and may not apply to your workload.
Performance also doesn't exist in a vacuum. If something is super fast but doesn't do what you need it to do, performance is an illusive luxury. Doing less makes it naturally easier to go fast, which is not a bad thing, however means that comparing a specific aspect of a highly specialized solution to a more general one may be misleading, unfair or irrelevant for you.
We tried our hardest to give all contenders the best chance1 2 and were initially surprised by the observed performance gap ourselves. We suspect that given how quick SQLite itself is for simple queries, even small overheads weigh heavily. If you have any suggestions on how to make anyone go faster, make it more apples-to-apples or generally see any issues, let us know. We hope that the results can still provide some interesting insights even with a chunky grain of salt. Ultimately, nothing beats benchmarking your own setup and workloads.
Insertion Benchmarks
Total Time for 100k Insertions
The graph shows the overall time it takes to insert 100k messages into a mock chat-room table setup. The less time it takes, the better.
Unsurprisingly, in-process vanilla SQLite is the quickest 3. All other setups add additional table look-ups for authorization, IPC overhead4, and layers of features on top. Think of the vanilla SQLite point as an upper bound on how fast one could go or the cost for adopting any of the other systems.
The measurement suggests that for this specific setup TrailBase can insert 100k records almost 70 times faster than Payload2, 9 to 16 times faster than SupaBase5, and roughly 6 to 7 times faster than PocketBase 1.
Total time of inserting a large batch of data tells only part of the story. Let's have a quick look at resource consumption to get an intuition for provisioning or footprint requirements, i.e. what kind of machine one would need:
TrailBase & PocketBase Utilization
The graph shows the CPU utilization and memory consumption (RSS) of both PocketBase and TrailBase. They look fairly similar apart from TrailBase finishing earlier. They both load roughly 3 CPUs with PocketBase's CPU consumption being slightly more variable 6. The little shelf at ~1 CPU after the TrailBase run is likely due to SQLite check-pointing.
Both consume only about 140MB of memory at full tilt, which makes them a great choice for running on a tiny VPS or a toaster.
SupaBase is a bit more involved due to its layered architecture including a dozen separate services providing various functionality:
SupaBase Memory Usage
Looking at SupaBase's memory usage, it increased from from roughly 6GB at rest to 7GB fully loaded. This means that out of the box, SupaBase has roughly 50 times the memory footprint of either PocketBase or TrailBase. In all fairness, a lot of SupaBase' itself claimss functionality isn't needed for this benchmark and it might be possible to shed less critical services, e.g. removing supabase-analytics would save ~40% of memory. That said, we don't know how feasible this is in practice.
SupaBase CPU utilization
Looking at the CPU usage, one can see it jump up to roughly 9 cores (the benchmark ran on a machine with 8 physical cores and 16 threads: 7840U). Most of the CPUs seem to be consumed by supabase-rest, the API frontend, with Postgres itself hovering at only about 0.7 cores. Also, supabase-analytics definitely seems to be in use.
Latency and Read Performance
Let's take a closer look at latency distributions. To keep things manageable we'll focus on PocketBase and TrailBase, which are architecturally simpler and more comparable.
For TrailBase, reads were on average 3.5 and insertions 6 times faster. That latter is in line with the throughput results we've seen above.
Looking at the latency distributions we can see that the spread is well contained for TrailBase. For PocketBase, read latencies are also generally well contained and predictable. However, insert latencies show a more significant "long tail" with the p90 latency being roughly 5 times slower than p50. Slower insertions can take north of 100ms. This may or be related to GC pauses, scheduling, or more generally the CPU variability we observed earlier.
JavaScript-Runtime Benchmarks
The benchmark sets up a custom HTTP endpoint /fibonacci?n=<N>
using the same slow recursive Fibonacci implementation for both, PocketBase and TrailBase. This is meant as a proxy for a computationally heavy workload to primarily benchmark the performance of the underlying JavaScript engines: goja for PocketBase and V8 for TrailBase. In other words, the impact of any overhead within PocketBase or TrailBase is diminished by the time it takes to compute fibonacci(N)
for sufficiently large N
.
We found that for N=40
, V8 (TrailBase) is around 40 times faster than goja (PocketBase):
Interestingly, PocketBase has an initial warm-up of ~30s where it doesn't parallelize. Not being familiar with goja's execution model, one would expect similar behavior for a conservative JIT threshold in combination with a global interpreter lock 🤷. However, even after using all cores completing the benchmark takes significantly longer.
With the addition of V8 to TrailBase, we've experienced a significant increase in the memory baseline dominating the overall footprint. In this setup, TrailBase consumes roughly 4 times more memory than PocketBase. If memory footprint is a major concern for you, constraining the number of V8 threads will be an effective remedy (--js-runtime-threads
).
Final Words
We're very happy to confirm that TrailBase's APIs and JS/ES6/TS runtime are quick. The significant performance gap we observed, especially for the APIs, might just be a consequence of how much even small overheads matter given how quick SQLite itself is.
With the numbers fresh off the press, prudence is of the essence and ultimately nothing beats benchmarking your own specific setup and workloads. In any case, we hope this was at least somewhat insightful. Let us know if you see anything that can or should be improved. The benchmarks are available on GitHub.
-
Trying to give PocketBase the best chance, the binary was built with the latest go compiler (v1.23.1 at the time of writing),
CGO_ENABLED=1
(which according to PB's own documentation will use a faster C-based SQLite driver) andGOAMD64=v4
(for less portable but more aggressive CPU optimizations). We found this setup to be roughly 20% faster than the static, pre-built binary release. ↩ -
We picked Payload as representative of popular Node.js CMS, which itself claims to be many times faster than popular options like Strapi or Directus. We were using a v3 pre-release, as recommended, also using the SQLite/drizzle database adapter marked as beta. We manually turned on WAL mode and filed an issue with payload, otherwise stock payload was ~210x times slower. ↩
-
Our setup with drizzle and node.js is certainly not the fastest possible. For example, we could drop down to SQLite in C or another low-level language with less FFI overhead. That said, drizzle is a great popular choice which mostly serves as a point-of-reference and sanity check. ↩
-
The actual magnitude on IPC overhead will depend on the communication cost. For the benchmarks at hand we're using a loopback network device. ↩
-
The SupaBase benchmark setup skips row-level access checks. Technically, this is in its favor from a performance standpoint, however looking at the overall load on its constituents with PG being only a sliver, it probably would not make much of an overall difference nor would PG17's vectorization, which has been released since the benchmarks were run. That said, these claims deserve re-validation. ↩
-
We're unsure as to what causes these 1-core swings. Runtime-effects, such as garbage collection, may have an effect, however we would have expected these to show on shorter time-scales. This could also indicate a contention or thrashing issue 🤷. ↩
Top comments (0)