Firestore Dataflow Illustrated and Measured

Denis Anisimov

Firestore is more sophisticated than just a managed DBaaS - it comes with real-time updates, client-side caching, offline persistence, and background-triggered Cloud Functions. All of these pieces can be used to create very dynamic and scalable apps with complex dataflows without the headache of managing the infrastructure.

But how do all these pieces fit together, and how can they be used to achieve different user experiences? In this blog post I present my view of a typical app dataflow and do a little testing.

No time to read? Play with the interactive version here: Firestore Dataflow Test

A typical Firestore app dataflow

For this part I'm assuming that we are building an app (web or mobile) that uses Firestore directly through the official SDKs with local caching enabled. Our app may also want to perform some background processing of Firestore documents and write the results back to Firestore to be displayed on the client. All communication is realtime, which means that we use snapshot listeners on the client and Firestore triggers for functions.

An example may be a chat app where we fetch link previews in the background.

Take a look at this diagram, then read the description of each path below.

Firestore dataflow diagram

TC ~15ms
From client write to client read from the cache

The shortest dataflow path is through the local cache. The data written by the client is saved in the cache and immediately triggers a snapshot listener.

You should care about this data path if you rely on the local cache to provide optimistic UI updates.
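In the web SDK, every snapshot carries two metadata flags, `hasPendingWrites` and `fromCache`, which let a listener tell an optimistic local result apart from a committed one. Here is a minimal sketch of how a client might use them. The `SnapshotMeta` type is a structural stand-in for the SDK's metadata object, and `classifySnapshot` and `statusLabel` are hypothetical helper names, not SDK functions.

```typescript
// Structural stand-in for the two flags on the SDK's snapshot metadata.
interface SnapshotMeta {
  hasPendingWrites: boolean; // true while a local write is not yet committed
  fromCache: boolean;        // true when the data was served from the local cache
}

type SnapshotSource = "local-write" | "cache" | "server";

// Hypothetical helper: decide which data path produced a snapshot.
function classifySnapshot(meta: SnapshotMeta): SnapshotSource {
  if (meta.hasPendingWrites) return "local-write"; // optimistic data from the cache path
  if (meta.fromCache) return "cache";              // cached data, not confirmed fresh
  return "server";                                 // committed result from the backend
}

// Example use: show a "sending..." badge only for unconfirmed local writes.
function statusLabel(meta: SnapshotMeta): string {
  return classifySnapshot(meta) === "local-write" ? "sending..." : "sent";
}
```

Note that a listener is only called again for the pending-to-committed transition if it was registered with `includeMetadataChanges: true`. Otherwise it fires again only when the data itself changes.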

TDb ~200ms
From client write to client read from the database

The default path is from the client to the database backend. All writes eventually go there, producing the final document that is streamed back to the client and to all other clients listening to the affected queries. The performance of this path depends on the quality of the client's network connection.

This is the data path of client-to-client realtime communication. It's also the one for transactions, as those need to be committed to the backend.

TTr ~400ms
From Firestore write to Firestore update by the triggered function

Once the data is written to Firestore, a background function is triggered. This path is heavily affected by cold start time and by the regional locations of your functions and Firestore.

This path is important if you're making data changes in response to document creation or update.
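One thing to watch on this path: a function that writes back to the document that triggered it will be triggered again by its own update, so the handler needs a guard. Below is a minimal sketch for the link-preview example, with the trigger wiring left out. `ChatMessage`, `firstUrl`, `planPreviewUpdate`, and the `previewUrl` field are hypothetical names for illustration, not part of any Firebase API.

```typescript
interface ChatMessage {
  text: string;
  previewUrl?: string; // set by the background function once processed
}

// Return the first http(s) URL found in the text, or null if there is none.
function firstUrl(text: string): string | null {
  const match = text.match(/https?:\/\/[^\s]+/);
  return match?.[0] ?? null;
}

// Decide what the triggered function should write back, if anything.
// Returning null means "do nothing", which keeps the function from
// re-triggering itself indefinitely on its own update.
function planPreviewUpdate(message: ChatMessage): Partial<ChatMessage> | null {
  if (message.previewUrl !== undefined) return null; // already processed
  const url = firstUrl(message.text);
  if (url === null) return null;                     // nothing to preview
  return { previewUrl: url };
}
```

Keeping the decision in a pure function like this also makes it easy to test without deploying anything.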

TU ~600ms
From client write to client read of the data updated by the triggered function

Finally, the updated data needs to reach the client. Technically, this is the same snapshot listener receiving document updates. From the user's perspective it is the longest path: with the numbers above, it is roughly the sum of TDb and TTr.

Despite its length, this path may be a good choice for background data processing.

A special case with a callable function

Sometimes talking directly to Firestore is not an option - the operation is too computationally heavy, or it needs to access some sensitive data, or it requires complex validation that cannot be done with security rules. A common approach is to move the logic from the client to a callable function. The obvious downside is that you lose offline capabilities and caching, and need to implement optimistic UI separately.

Let's look at this special case:

Diagram of the Firestore dataflow with a callable function

TCall ~200ms
From client request to response received

The direct path is a callable Cloud Function request. The response may be anything, but it's a good idea to include the document being written to the database so it can be used for an optimistic update on the client. This path is heavily affected by cold start time and network latency.
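As a minimal sketch of returning the written document, the handler body below validates the input, writes the message, and echoes back exactly what it wrote. The callable wiring is omitted, and the Firestore write is hidden behind a hypothetical `MessageStore` interface with an in-memory stand-in. Real Firestore writes are asynchronous. The sketch is synchronous only for brevity, and every name in it is an assumption.

```typescript
interface Message { text: string; author: string; }
interface Stored<T> { id: string; data: T; }

// Hypothetical storage abstraction; a real function would wrap Firestore writes.
interface MessageStore {
  add(data: Message): Stored<Message>;
}

// In-memory stand-in so the handler can be exercised locally.
class InMemoryStore implements MessageStore {
  private docs = new Map<string, Message>();
  private nextId = 1;

  add(data: Message): Stored<Message> {
    const id = `msg-${this.nextId++}`;
    this.docs.set(id, data);
    return { id, data };
  }

  get(id: string): Message | undefined {
    return this.docs.get(id);
  }
}

// Handler body: validate, write, and return the written document so the
// client can render it before the snapshot listener delivers it.
function handleSendMessage(
  input: { text?: unknown },
  author: string,
  store: MessageStore
): Stored<Message> {
  const text = input.text;
  if (typeof text !== "string" || text.trim() === "") {
    throw new Error("text must be a non-empty string");
  }
  return store.add({ text: text.trim(), author });
}
```

Returning the document ID along with the data lets the client match the optimistic entry with the one that later arrives through the snapshot listener.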

TDb ~250ms
From client request to client read from the database

This is the indirect path through Firestore. The data written to Firestore by the function can be read back by the client and by other clients listening to the affected queries. This is the simplest way to get the result back, as you don't need to process the callable function response separately.

Measuring the path lengths

The times above come from measurements I've made, and I present them here to show the relative lengths of the different paths. They are on the optimistic side of the spectrum, especially once you factor in cold start times and proximity to Google Cloud datacenters. All times are for round-trip data paths.
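One way to collect round-trip numbers like these is to stamp the moment a write is issued and record the elapsed time when each stage's event arrives. The sketch below is not the testbed's code. `PathTimer` and the stage names are hypothetical, and the clock is injectable so the helper can be exercised without a network.

```typescript
type Clock = () => number; // returns a timestamp in milliseconds

class PathTimer {
  private now: Clock;
  private startedAt: number | null = null;
  private marks = new Map<string, number>();

  constructor(now: Clock = () => Date.now()) {
    this.now = now;
  }

  // Call right before issuing the client write or the callable request.
  start(): void {
    this.startedAt = this.now();
    this.marks.clear();
  }

  // Call from a snapshot listener or response handler.
  // Only the first arrival per stage is recorded.
  mark(stage: string): void {
    if (this.startedAt === null || this.marks.has(stage)) return;
    this.marks.set(stage, this.now() - this.startedAt);
  }

  // Elapsed milliseconds from start() to the first mark of each stage.
  results(): Record<string, number> {
    const out: Record<string, number> = {};
    this.marks.forEach((ms, stage) => { out[stage] = ms; });
    return out;
  }
}
```

In a client you would call `start()` before the write, then `mark("cache")` in the listener when the snapshot has pending writes and `mark("database")` when it does not. The stage names are up to you.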

I've written a simple web-based testbed to get those numbers:

GitHub: dbanisimov/firestore-dataflow-test

I encourage you to clone the code and deploy your own project to see the performance in your case. Your geographical location and the regions of Firestore and Cloud Functions will change the numbers significantly. The testbed also vividly shows cold start delays, the time to establish the first connection to Firestore, and some other artifacts. It uses HTM + Preact (look ma, no build!) for the web client.

Feel free to play around with the live version of the testbed:

Firestore Dataflow Test

Both Firestore and Functions are in us-central.

Conclusion

Of course, the two cases above don't cover all the possible combinations of dataflow primitives. More complex systems can be built by including Pub/Sub and Cloud Storage, as both of them are well integrated with Cloud Functions. Drawing some high-level pictures and running experiments before building the whole system may save you from some surprises later.

Happy flowing! :)
