Author: David Barrat
Date: November 20, 2019
[Originally published on the Fauna blog.](https://fauna.com/blog/building-an-authentication-saas-with-faunadb)
I created [Authdog] to make an authentication layer easy to integrate into any piece of code (mobile, web app, desktop, service-to-service, script, etc.) while keeping the price as low as possible for the end user. Authdog provides a full-featured dashboard in which users can understand their authentication workflows out of the box, without having to integrate a separate tracking service to get log data. It offers the same quality of service to businesses of any size, from start-ups to big corporations, without them having to worry about the scalability of authentication.
Authdog uses Lambda functions as its backend, which lets both capacity and cost scale with usage as the service grows. Serverless was chosen because it makes multiple stages and credentials simple to manage and because of its integrated features (KMS encryption, Webpack bundling, CloudFormation template creation, etc.). As Authdog aims to provide a security layer to companies of any scale, opting for a completely serverless backend (including the database) should have a very positive impact on the company's growth and help reduce maintenance and ops costs as well. FaunaDB’s low latency also means that applications relying on Authdog can authenticate their users, and so grant access, faster.
Client Side (Netlify):
- React (UI components)
- Redux (State Management)
Backend (AWS / API Gateway / SQS):
- Apollo (serverless)
  - OAuth2.0 workflow
  - Authorization / ABAC
- Tenants / Applications
  - Authorization / ABAC
- SQS / Pusher (Notification system)
As for Data Storage, we use FaunaDB to store and manage all Authdog account data (user profile, tenants, security preferences etc.) and external app data (external apps metadata, groups, rules, permissions, users of Authdog-registered apps, etc.).
When designing the architecture of Authdog, I faced the question of scale. I want to build Authdog into a global authentication service that caters to the needs of businesses both large and small, and I opted for a serverless stack in part to address this need. A serverless architecture lets Authdog reach the scale it needs. However, the data layer posed challenges: databases have always been a bottleneck for highly scalable applications.
To address this issue, I first looked at a distributed/replicated pgSQL approach, but quickly realized that a highly available pgSQL infrastructure is costly to run, especially for thousands of concurrent reads. You need to manage and adapt your infrastructure of physical nodes (Master/Slave replication) for each staging environment. In contrast, FaunaDB lets you have multiple staging environments without that DevOps work (no concurrency issues at scale), and lets you keep Dev/UAT environments (with performance similar to production) available 24/7 for testing and development. All of this comes at a very low cost because you pay only when you use the database, whereas with a pgSQL infrastructure you pay all the time, or have to bring nodes up only when needed, which requires extra DevOps.
So I evaluated various NoSQL databases (MongoDB, DynamoDB, Voldemort, Riak) that offered the hope to scale horizontally. However, I still felt limited by the pricing of these solutions, at scale, and none of them had all of the features I needed. They all seemed to scale at the expense of some other capability essential to my application. For example, they might scale, but without consistency, or with excessive DevOps.
Then, along came FaunaDB, offering a very appealing, scalable pricing model with all of the features I needed:
This was the main appeal of FaunaDB. Of course, DynamoDB is serverless as well, but, as far as I know, it isn't ACID-compliant. From my understanding, FaunaDB is the only NoSQL solution that's 100% ACID-compliant, without any caveats with respect to the number of documents, keys, or data sharding. We can trust that our data will be correct, no matter how we organize it, or what types of queries we write against it [1,2,3].
We knew that we couldn't solve every problem with NoSQL. We would need, at some point, to have structured data and to make relationships between entities. And I think this is what new databases like FaunaDB are trying to achieve [1,2].
When you are growing your business, you don't want to spend all your time doing DevOps. You don't want to spend all your time designing an architecture that can support more users. Given that my database would be storing all of my account and external app data, scalability was crucial. With FaunaDB, I knew that I would be able to support millions of users the same way I was using it in development with only one or two users.
For an authentication app, high availability is crucial. If authentication is not available, you can't use any feature of your application. So it's important to be globally distributed in case one datacenter is unavailable. You want multiple replicas all over the world, across AWS, Azure, Google Cloud, and IBM Cloud, and the ability to synchronously load balance between those regions.
I need built-in access control (FaunaDB's native ABAC) to manage my FaunaDB database without having to share my admin credentials with everyone I will be working with. I can create one key for low-level administration, and another key for an end user who might only need to consume my Fauna database as read-only. It's very customizable. This built-in functionality is very useful for what I'm trying to achieve.
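The idea of separate keys with different capabilities can be sketched as a small attribute-based check. This is a minimal illustration, not FaunaDB's API: the `KeyAttributes` and `Resource` shapes, the role names, and the `isAllowed` function are all hypothetical and exist only to show how a decision can depend on attributes of both the key and the resource.

```typescript
// Hypothetical model of scoped keys; none of these names come from FaunaDB.
type Action = "read" | "write";

interface KeyAttributes {
  role: "admin" | "read-only"; // what the key holder may do
  tenantId: string;            // which tenant the key belongs to
}

interface Resource {
  collection: string;
  tenantId: string;
}

// Decide access from attributes of the key and of the resource.
function isAllowed(key: KeyAttributes, action: Action, resource: Resource): boolean {
  // A key never reaches another tenant's data, whatever its role.
  if (key.tenantId !== resource.tenantId) return false;
  // Admin keys may read and write within their tenant.
  if (key.role === "admin") return true;
  // Read-only keys may only read.
  return action === "read";
}

const adminKey: KeyAttributes = { role: "admin", tenantId: "acme" };
const readOnlyKey: KeyAttributes = { role: "read-only", tenantId: "acme" };
const acmeUsers: Resource = { collection: "users", tenantId: "acme" };
const otherUsers: Resource = { collection: "users", tenantId: "globex" };

console.log(isAllowed(adminKey, "write", acmeUsers));    // true
console.log(isAllowed(readOnlyKey, "read", acmeUsers));  // true
console.log(isAllowed(readOnlyKey, "write", acmeUsers)); // false
console.log(isAllowed(adminKey, "read", otherUsers));    // false
```

Keeping the decision in one pure function makes it easy to see, and to test, exactly what a read-only consumer can and cannot do.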
With FaunaDB, you've got this abstraction of objects. You can store JSON objects straight in your database and retrieve them with indexes. It's a bit more complex than MongoDB, because you have to define your index manually, but in the end you get better performance because you are defining exactly what you need to index. In contrast, with MongoDB, the indexes are created quite naively; you're not indexing a specific column, so you don't have the best indexes for your queries.
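To illustrate what "defining exactly what you need to index" means, here is a minimal in-memory sketch. It does not use FaunaDB; the `UserDoc` shape and the `buildIndex` helper are hypothetical. The point is only that the index is built over one explicitly chosen field, so a lookup on that field does not scan every document.

```typescript
// Hypothetical document shape, for illustration only.
interface UserDoc {
  id: string;
  email: string;
  tenantId: string;
}

// Build an index over one explicitly chosen field (the "term").
function buildIndex<T, K extends keyof T>(docs: T[], term: K): Map<T[K], T[]> {
  const index = new Map<T[K], T[]>();
  for (const doc of docs) {
    const bucket = index.get(doc[term]);
    if (bucket) {
      bucket.push(doc);
    } else {
      index.set(doc[term], [doc]);
    }
  }
  return index;
}

const users: UserDoc[] = [
  { id: "1", email: "ann@acme.test", tenantId: "acme" },
  { id: "2", email: "bob@acme.test", tenantId: "acme" },
  { id: "3", email: "cat@globex.test", tenantId: "globex" },
];

// The index is defined once, on exactly the field the queries need.
const usersByTenant = buildIndex(users, "tenantId");

const acmeUserIds = (usersByTenant.get("acme") ?? []).map((u) => u.id);
console.log(acmeUserIds); // [ '1', '2' ]
```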
When I started development on Authdog, FaunaDB had not yet released its native GraphQL API, so I did not use it for this particular project. Historically, I've developed on top of Apollo Serverless, but I would like to use FaunaDB's OpenCRUD API in the future. Even within this project, Authdog will be divided into smaller microservices, so some of them will use this native GraphQL interface, at least to centralize the interfaces of the other existing microservices.
FaunaDB offers a usage-based pricing model, including a free tier that is unthrottled. With FaunaDB, I pay for only the resources consumed by my queries, not idle time. Should my usage cross the free tier, I am billed for the excess. As a service operator, I find comfort in the fact that my application will never be blocked should I see spikes in load (which I do hope to see assuming Authdog is successful!).
Although I've been very happy with FaunaDB, there are a few more features from their roadmap that I'm excited to try out in the future:
This is something I really like in ORMs. You can define all of the relationships between models, and then when you delete a model which is linked to another model, it will delete all of the linked models without you needing to write any custom code. So it will prevent bugs caused by connected collections not being deleted and causing random data to be sitting somewhere without you knowing where it fits in and what it's linked to. In general, I'd like to see a few more relational features in FaunaDB.
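Until cascading deletes are available in the database, the behavior has to live in application code. The following is a minimal in-memory sketch of that idea; the `Tenant`, `App`, and `Store` shapes and the `deleteTenantCascade` function are hypothetical and are not taken from Authdog or FaunaDB.

```typescript
// Hypothetical models: an application belongs to exactly one tenant.
interface Tenant {
  id: string;
}

interface App {
  id: string;
  tenantId: string;
}

interface Store {
  tenants: Tenant[];
  apps: App[];
}

// Remove a tenant together with every application linked to it,
// so no orphaned applications are left behind.
function deleteTenantCascade(store: Store, tenantId: string): Store {
  return {
    tenants: store.tenants.filter((t) => t.id !== tenantId),
    apps: store.apps.filter((a) => a.tenantId !== tenantId),
  };
}

const before: Store = {
  tenants: [{ id: "acme" }, { id: "globex" }],
  apps: [
    { id: "web", tenantId: "acme" },
    { id: "mobile", tenantId: "acme" },
    { id: "portal", tenantId: "globex" },
  ],
};

const after = deleteTenantCascade(before, "acme");
console.log(after.tenants.map((t) => t.id)); // [ 'globex' ]
console.log(after.apps.map((a) => a.id));    // [ 'portal' ]
```

In a real database the two deletions would also need to run in a single transaction, so that a failure cannot leave the tenant removed while its applications remain.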
In all of my applications, I would like to get real-time insights when my data changes and then notify my users, so they don't have to refresh the page to see whether the data has changed. For now, my plan is to use a simple queue service on AWS, but having something provided directly by the database would be a huge plus.
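The notification pattern described here can be sketched as a tiny publish/subscribe feed. This is an in-process illustration only: it stands in for whatever queue or streaming service delivers the changes, and the `ChangeFeed` class and `ProfileChange` shape are hypothetical.

```typescript
type Listener<T> = (change: T) => void;

// Minimal in-process change feed: writers publish, subscribers are notified.
class ChangeFeed<T> {
  private listeners: Listener<T>[] = [];

  // Register a listener and return a function that unsubscribes it.
  subscribe(listener: Listener<T>): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  // Deliver a change to every current listener.
  publish(change: T): void {
    for (const listener of this.listeners) {
      listener(change);
    }
  }
}

// Hypothetical change event for a user profile.
interface ProfileChange {
  userId: string;
  field: string;
}

const feed = new ChangeFeed<ProfileChange>();
const received: string[] = [];

const unsubscribe = feed.subscribe((c) => received.push(`${c.userId}:${c.field}`));

feed.publish({ userId: "u1", field: "email" });
unsubscribe();
feed.publish({ userId: "u1", field: "name" }); // no longer delivered

console.log(received); // [ 'u1:email' ]
```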
I've heard that Fauna's temporality is a sort of built-in auditing/version control that allows you to access a snapshot of your database at any point in time. However, I wish there were an easier way from within the dashboard to replicate a database's previous or current state. From my understanding, I would need to write some custom code to do this.
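As a rough mental model of temporality, a document can be thought of as a list of timestamped versions, with a read "at time t" returning the latest version written at or before t. The sketch below is a simplified in-memory illustration under that assumption; the `VersionedDoc` class is hypothetical and says nothing about how FaunaDB actually stores history.

```typescript
// One stored version; `data: null` marks a deletion.
interface Version<T> {
  ts: number;
  data: T | null;
}

class VersionedDoc<T> {
  private versions: Version<T>[] = [];

  // Record a new version and keep the history ordered by timestamp.
  write(ts: number, data: T | null): void {
    this.versions.push({ ts, data });
    this.versions.sort((a, b) => a.ts - b.ts);
  }

  // Return the state of the document as of `ts`,
  // or null if it did not exist (or was deleted) at that time.
  at(ts: number): T | null {
    let result: T | null = null;
    for (const v of this.versions) {
      if (v.ts > ts) break;
      result = v.data;
    }
    return result;
  }
}

const account = new VersionedDoc<{ plan: string }>();
account.write(100, { plan: "free" });
account.write(200, { plan: "pro" });
account.write(300, null); // deleted

console.log(account.at(50));  // null
console.log(account.at(150)); // { plan: 'free' }
console.log(account.at(250)); // { plan: 'pro' }
console.log(account.at(350)); // null
```

Restoring a previous state then amounts to reading every document at the chosen timestamp and writing the results back, which is the kind of custom code mentioned above.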
FaunaDB offers features that were critical for the development of Authdog, including native ABAC, scalability, and 100% ACID transactions. It’s the only serverless database that checked all of the boxes for me. Furthermore, I’m excited to see even more features like streaming and migration coming in the next few months.
For a startup founder, shipping a product as fast as possible is critical, and FaunaDB has helped me in that mission.
David Barrat, founder of Authdog, is a Data Engineer within the Data Engineering & Dynamic Regulatory Reporting group at Novartis Pharmaceuticals in Basel, Switzerland, with a focus on Architecture, Data Integrity, and Security to make Clinical applications safer and more robust for large audiences.
Note: Authdog is still under development & testing. First public release is expected early 2020 (late Q1/early Q2).