Google Firestore is Google's proprietary NoSQL document-based database. Paired with the rest of the Firebase suite, such as Cloud Functions, Firebase Auth and Firebase Storage, this can look like a pretty attractive tech stack for startups or solo developers looking to get an app up and running quickly.
This is exactly what I thought 9 months ago when choosing a tech stack for my mobile app. Firestore had some advantages that I was attracted to. It had a generous free tier, an auto-scaling NoSQL data model, and some sweet integrations with the other Firebase services. If you feel like you're in this happy bubble with this technology now, here's a word of advice…
Make sure you're aware of the downsides of Firestore.
We've all heard the term "vendor lock-in". Well, Firestore is the epitome of this idea. If you think this won't be an issue because your product is simple or small, I'll tell you right now that even with the simplest apps Firestore's vendor lock-in creeps up. I experienced this when trying to do the simple task of deploying a DEV and a PROD version of the database. This is a huge challenge with Firestore.
The first hurdle you run into is the fact that, when I built this, you couldn't have multiple Firestore databases associated with a project. Therefore you have to create separate project-dev and project-prod projects. This isn't too hard initially, and is probably a good design pattern in general, but now your development experience gets twice as complex. Now you have to decide if you want each project to have a separate Firebase Auth, and what about Cloud Functions, storage buckets, and so on? And there are no tools to automate any of this deployment, so if you want to just "copy over" your database data, functions, and auth users to production, you have to do that manually. Some of these operations can be done through the Firebase CLI, but the more important ones, like migrating data, can't be.
Assuming you get production and development environments set up, now you have 20 other issues that crop up. How do you do automated backups? How do you export data from one database to another in an automated way to refresh staging servers? How can you get a local version of this database running to test with? The answer to all these questions is that… it's complicated. These more complicated use cases are hard to do because this database isn't open source, so there's no community around it making tools for these things.
Some of these issues aren't unique to Firestore; they come with any proprietary database vendor. This is why I'll never choose a proprietary database again. There are times to try out the latest and greatest thing, but when it comes to the integrity, security, and accessibility of your company's most important asset (your data), I'd say 10 times out of 10 that it's a better choice to use a battle-tested, open-source solution.
This part really annoyed me while using Firestore. It's the fact that Firestore has two features that are consistently at odds with each other:
Firestore charges per document when you read/write to the database.
Firestore's querying abilities are very primitive, so more complicated filtering, sorting, or merging of data MUST be done client-side.
This deadly combination means that if you have to do a more complicated query (which is almost unavoidable), you will need to overfetch the data and then filter it in a Cloud Function or on the client side before using it. This isn't just wasteful of network bandwidth and client-side processing time; because of Firestore's pricing model, it ends up costing you more money as well. The biggest result I've seen from this is that my database collections and available querying operations end up defining what features I implement in my product, rather than my customers deciding it.
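To make the overfetching cost concrete, here is a minimal sketch in plain Python. It does not call Firestore at all: the `COLLECTION` data, the `fetch_all` helper and the substring search are made up for illustration, and the only thing modelled is the rule above that every document returned counts as one billed read.

```python
# Hypothetical collection contents; in a real app these would be Firestore documents.
COLLECTION = [{"name": f"Trail {i}"} for i in range(1000)]
COLLECTION.append({"name": "Pine Ridge Loop"})
COLLECTION.append({"name": "Ridgeback Pass"})


def fetch_all(collection):
    """Stand-in for reading a whole collection: each returned document is one billed read."""
    docs = list(collection)
    return docs, len(docs)


def search_client_side(collection, needle):
    """Overfetch everything, then filter locally because the database cannot run the filter."""
    docs, billed_reads = fetch_all(collection)
    matches = [d for d in docs if needle.lower() in d["name"].lower()]
    return matches, billed_reads


matches, billed_reads = search_client_side(COLLECTION, "ridge")
print(len(matches), "matches,", billed_reads, "billed reads")
# 2 matches, 1002 billed reads
```

Under these assumptions the feature pays for 1002 reads to show 2 results, and that ratio gets worse as the collection grows.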
Now I'm going to play devil's advocate for a second because I understand why Firestore is set up this way. It's because Firestore is built for one purpose. It's built to make it very difficult for you to write a bad query. Almost every possible query you can make to Firestore takes time proportional to the number of results it returns, not to the size of the collection. This is great because it means your database processing time is short and clients are getting results very quickly. But…
Did you catch that?
Firestore is built to make processing cheap on the server-side. But guess what? You pay per document so whether a query takes 1ms or 100ms doesn't matter to your wallet. This means that Firestore is optimizing to make their costs cheaper. Not yours. And since you have to overfetch data and manually filter it on the client side you actually end up with a more expensive and slower query overall. This is why I moved away from Firestore. After seeing that this was their business model, it proved to me that there's no way I want to try to scale with this product.
One thing that initially attracted me to Firestore was its NoSQL data model. There are other options for NoSQL such as MongoDB or AWS DynamoDB, but Firestore provided a really nice auto-scaling out-of-the-box solution for me that I liked right away. Until I didn't like it anymore.
You see, most data for the typical web or mobile application is going to be highly relational. A typical application will probably have users, as well as things that relate to the users in some way. And these things likely relate to other things as well, and so on. And they might be viewed in a list, or indexed, or queried to see all the things that a user has created. For managing these basic use cases, Firestore is okay, but once it gets more complicated Firestore breaks down.
The NoSQL solution to these problems includes things like data duplication, fan-out writes, etc. These principles take more development time to implement than having a SQL database to begin with. If you're looking towards Firestore as a solution, you're probably looking for something that saves development time, because that's Firebase's selling point, but Firestore is more akin to taking on time-debt that you have to pay off later. To illustrate some really painful hurdles I had to develop around I'll give some quick examples from my project:
Users can create reviews. A user's profile picture and username are attached to each review they create. This is needed because the frontend views a list of reviews. If we have to fetch all the reviews and then make a second query for each review to get the user's profile picture and username, then that 1 query becomes N+1 queries. This is called the N+1 problem. Then a user changes their name. Now you have to code a Cloud Function that notices that change, searches through every review (could be millions), and changes that user's display name on each one that their old name is on. This is a lot of programming for something that in a SQL database would be a feature out-of-the-box.
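For comparison, here is a rough sketch of the review example above in SQL, using Python's built-in `sqlite3` module so it runs anywhere. The table and column names are invented for illustration and are not the schema from my project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        username TEXT NOT NULL,
        avatar_url TEXT NOT NULL
    );
    CREATE TABLE reviews (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        body TEXT NOT NULL
    );
    INSERT INTO users VALUES (1, 'spencer', 'https://example.com/a.png');
    INSERT INTO reviews VALUES (1, 1, 'Great trail'), (2, 1, 'Muddy in spring');
""")


def list_reviews(conn):
    """One query returns each review with its author's current name and picture."""
    return conn.execute("""
        SELECT reviews.body, users.username, users.avatar_url
        FROM reviews JOIN users ON users.id = reviews.user_id
        ORDER BY reviews.id
    """).fetchall()


before = list_reviews(conn)
# Renaming the user touches exactly one row; no review needs rewriting.
conn.execute("UPDATE users SET username = 'spencer_p' WHERE id = 1")
after = list_reviews(conn)
```

Because the username lives in one place and is joined in at read time, there is neither an N+1 fetch nor a fan-out update when it changes.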
Users need to choose a username when they sign up. I want to make sure two users don't have the same username (ignoring capitalization). The solution to this problem in a Firestore NoSQL way? I had to add a lowercaseUsername field to every single user. When a user wants to change their username, the app converts the new name to lowercase, queries whether it already exists, and only changes the username if it doesn't. This is a total pain if your app is in production already, because backfilling every user document to add a lowercaseUsername field requires development time to write a single-use function to execute this migration. I found I had to backfill data all the time and eventually it just got too hard to work with.
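As a point of comparison, a SQL database can enforce the case-insensitive rule with an index instead of an extra field. The sketch below uses Python's built-in `sqlite3` module and assumes SQLite 3.9 or newer, which supports indexes on expressions; the `try_register` helper and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL)")
# Uniqueness is enforced on the lowercased value; there is no extra column to backfill.
conn.execute("CREATE UNIQUE INDEX users_username_lower ON users (lower(username))")


def try_register(conn, username):
    """Return True if the username was free, False if a case-insensitive match exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (username) VALUES (?)", (username,))
        return True
    except sqlite3.IntegrityError:
        return False


results = [try_register(conn, name) for name in ("Spencer", "spencer", "SPENCER", "Pauly")]
print(results)  # [True, False, False, True]
```

The check and the write happen in one statement, so there is also no window between "query if it exists" and "change the username" for two sign-ups to race through.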
Users can follow Trails. Trails can have multiple Users following them. This creates a many-to-many relationship between these objects. Managing this in Firestore was beyond tedious. It's somewhat straightforward when you only have to think about creating data, but having to deal with updating and deleting it too creates a ton of complexity.
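Here is a hedged sketch of how the follow relationship above could look with a join table, again using Python's built-in `sqlite3` module. The `follows` table, the sample rows and the `followers_of` helper are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE trails (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE follows (
        user_id  INTEGER NOT NULL REFERENCES users(id)  ON DELETE CASCADE,
        trail_id INTEGER NOT NULL REFERENCES trails(id) ON DELETE CASCADE,
        PRIMARY KEY (user_id, trail_id)
    );
    INSERT INTO users  VALUES (1, 'ana'), (2, 'ben');
    INSERT INTO trails VALUES (10, 'Pine Ridge'), (11, 'Lake Loop');
    INSERT INTO follows VALUES (1, 10), (1, 11), (2, 10);
""")


def followers_of(conn, trail_id):
    """Usernames following a trail, read through the join table."""
    rows = conn.execute(
        "SELECT users.username FROM follows "
        "JOIN users ON users.id = follows.user_id "
        "WHERE follows.trail_id = ? ORDER BY users.username",
        (trail_id,),
    )
    return [row[0] for row in rows]


followers_before = followers_of(conn, 10)
# Deleting a user also removes their follow rows via ON DELETE CASCADE.
conn.execute("DELETE FROM users WHERE id = 2")
followers_after = followers_of(conn, 10)
remaining_links = conn.execute("SELECT COUNT(*) FROM follows").fetchone()[0]
```

The delete case is the part that was painful for me in Firestore: here the cleanup is declared once in the schema rather than written as application code.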
As you can see, there are so many situations where a NoSQL database trips you up and becomes a development time-sink. SQL databases are so scalable and powerful now that they will serve your needs much better. And guess what? If you want the best of both worlds you can use BOTH. Put your relational data in a SQL database, and put your non-relational data (like millions of live chat messages, for example) in a NoSQL database and get the benefits of both with the tradeoffs of neither.
I still like a couple things about Firestore. Their client SDK that managed client-side offline-support was convenient, and for querying simple data that's non-relational in nature I would still consider it. But unless I know my project has a fixed completion date and won't run into any of the limitations mentioned above, I can't recommend it.
If you're like me and you enjoy getting the nested JSON response from your database, then you should consider using GraphQL. I switched to GraphQL paired with a SQL Database and found it to be the perfect balance where I get everything I liked from before in terms of easy querying, but then I still can query the database directly if I want to do something more involved. I also found that speed was still comparable, and I can add read-replicas if my database begins to slow down as it scales.
For other use cases, here are my recommendations:
If you want something that's just an easy bucket to put data into then consider checking out something like Contentful: https://www.contentful.com/
If you want something that gives you an easy-to-use open-source UI to make CRUD APIs on top of an open-source Postgres database, consider GraphQL with Hasura + Postgres: https://hasura.io/
If you want a SQL database where you don't have to deal with data duplication, but also don't want to use GraphQL or manage database scaling, consider AWS Aurora:
Check me out: https://spencerpauly.com