Guido Zuidhof

Posted on Aug 26, 2020

Implementing serverless passwordless login with FaunaDB

#fauna #tutorial #serverless #authentication

In this tutorial we’ll cover:

Why serverless is a good choice for SaaS products
Which database to choose to go together with your serverless functions
How to implement passwordless login into your app using FaunaDB.
What the upsides and downsides of passwordless login are.

Introduction

Most applications need to keep some state related to their users and login flow. I found FaunaDB to be the perfect companion for that in a serverless setting. In this post we'll explore how to implement passwordless (a.k.a. magic link) login in such a setup.

A stateful, serverless, SaaS stack

Why serverless for SaaS?

Serverless has some big advantages compared to the classic load balancer + application servers setup, especially when you are just starting out.

The cost of operating your service is predictable: if nobody is using it the cost will be zero, and you won't pay for idling servers.
If your service becomes unexpectedly popular, it won't die because your application servers can't handle the huge sudden influx of users. In serverless the cost will still be predictable: most serverless providers will serve millions of requests for a single dollar, so it won't break the bank.
Many serverless environments such as Cloudflare Workers and Vercel run on the edge. This means that your serverless function can be physically located close to your end user, which means shorter response times and better SEO.
Your code runs in a more isolated environment: if a mistake in your code puts your server in a weird state, that won't be as disastrous in a serverless environment, because the individual runtimes are usually short-lived.

Choosing a database for serverless

Most real world applications need a single source of truth for data, for example your users table. If you use a traditional database such as MySQL or Postgres you will need a server for this database running 24/7, and you again introduce a single point of failure. Also, if you have a global audience, this database will be geographically distant for many of your users.

Serverless, and in particular serverless on the edge, allows your code to run close to the user, but if that server is constantly talking to a geographically distant database instance you lose this latency advantage.

In recent years so-called "serverless" databases have become the perfect companion database for serverless. For these databases you pay for usage - not for uptime - very similar to serverless functions themselves. Examples are Google Cloud Firestore, Cloudflare Workers KV and FaunaDB. They each have different geographical distribution, guarantees, and pricing. A few words about each:

Workers KV

Workers KV can only be used if you are on the Cloudflare Workers serverless platform. It is a key-value store designed to scale to many reads per second, but with very infrequent writes (up to once per key per second). This makes it a great fit for storing for instance your website's static HTML and JS files, or for caching.

If two writes happen to the same key at nearly the same time in two different datacenters, the two datacenters will briefly hold different values for that key, so reads in different datacenters may return different values for a short while. After around 10 seconds it is guaranteed to be globally consistent again: the latest write wins. There are no transactions or advanced queries.

Storing something like user data or login tokens is problematic in such a database. If two users sign up with the same username, one of the users will just disappear. In this article we will be building a passwordless signup and login service, and using just Workers KV for this would make our life very difficult.

FaunaDB

Unlike Workers KV, FaunaDB supports ACID transactions despite being a globally distributed database. That means that in the case of the two conflicting signups from the previous example, one of the two transactions will fail at the time of writing, not eventually afterwards. Without this guarantee it is very difficult to create a system that doesn't have weird edge cases.

With FaunaDB your data is replicated all over the world which ensures low latency, especially for reading, which fits perfectly with the advantages of serverless. For querying, we can use the Fauna Query Language (FQL), FaunaDB's own query language that feels like a functional programming language, or GraphQL.

Google Cloud Firestore

Firestore is Google Cloud Platform's offering of a pay-per-use database. It features ACID transactions just like FaunaDB, but not across different collections. That limitation means your own application code will have to be more complex when doing transactions across different collections.

Your data is replicated across different datacenters, but only within a single region or multi-region. Multi-region here means two datacenters on the same continent; in Europe that would be Belgium and the Netherlands (being Dutch I can tell you they are only a few hours' drive apart!). This poor geographical distribution means many users will still have poor latency to the database.

To query Firestore you would generally use one of the Google Cloud Platform SDK libraries. I personally have found this to be a big problem: their server SDK only works on Node. Many serverless environments (such as Cloudflare Workers) run in a WebWorker-like environment, not Node.

Decision time

The decision here is between FaunaDB and Google Cloud Firestore.

FaunaDB wins here in a big way because:

It is globally geographically distributed
It gives us the right guarantees to make a robust system.
And quite importantly too: the SDK actually works in serverless JavaScript environments.

The pricing of both options is similar, with FaunaDB being cheaper at the time of writing. Both have the big advantage of a pay-per-use model: you don't pay for uptime but for various operations. This predictability of costs helps me sleep at night.

Firestore is not without its merits too: it is highly integrated with the whole Firebase environment. This makes for a great integrated experience, but those usually come at the cost of less control, more lock-in, and bloat on your clientside code. For comparison: the Firestore component of the Firebase SDK is 85KB (minified gzip), FaunaDB's client library is 13KB (minified gzip). When you add the other components of Firebase (such as auth) we get to 215KB.

All in all: FaunaDB is the right choice for serverless.

Passwordless Login

Magic link login, also called passwordless login, is getting more and more popular. Large platforms like Slack and Medium now feature this option. In a magic login scheme there is no password involved in authentication; instead the user receives a single-use login link in their e-mail inbox whenever they want to log in.

Some benefits:

You don't have to store passwords and users can't forget their password. This also saves support effort spent helping users who struggle to reset their password.
Decreased friction in signup (I believe I read >60% more signups somewhere). You also verify e-mail addresses before account creation, which simplifies things a lot.
You can still have a password-based login too: they are not mutually exclusive.

There are also some downsides to the passwordless login approach, more on that later in this post.

Implementing passwordless login in serverless

Before we dive into the code, something to note: we'll be building a single form and endpoint that can be used both for login and signup. It is important that we distinguish between the two cases and signal clearly to the user which one applies. Users often have multiple e-mail addresses, and they may have forgotten which one they used to sign up for your service. You wouldn't want them to accidentally create a second account.

Aside from that, signup and login links also have different requirements: login links should expire much more quickly and should only be used once as they are essentially a one time password.

Let's go!

The passwordless flow

The user submits the form with the e-mail address they wish to either log in or sign up with. We should verify that the submitted value is a valid e-mail address.
We generate a cryptographically secure random token on the server (i.e. a random string with >32 bytes of randomness). Let's call this the token.
We hash this token. We can use a fast hash like SHA256 because our input has sufficient entropy (don't ever do this for user passwords). Let's call this hash token_hash.
We look up the e-mail address in our database to determine whether this is a user that is logging in, or a new user that we should send a signup link.
In case this is a new user
- We store token_hash and the e-mail address in a signup_tokens database collection with a fairly long expiration time (hours or days).
- We send them an e-mail with a link to www.example.com/signup?token=token
- We redirect the user to a page that makes clear the user has received their signup e-mail. (If the user entered the wrong e-mail address and instead wanted to log in it should be very clear on this page they should check the e-mail address they entered)
In case this is an existing user:
- We insert the token_hash into our login_tokens collection along with the user's metadata (e-mail address, unique account id), with a short expiration time (minutes).
- We send a magic login e-mail to the user with a link to www.example.com/login?token=token
- We redirect the browser to a page saying a magic login link was sent to the specified e-mail address.
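
Steps 2 and 3 of this flow could look roughly like the sketch below. It assumes a runtime that offers Node's built-in node:crypto module; on runtimes that only expose Web Crypto (such as Cloudflare Workers) you would reach for crypto.getRandomValues and crypto.subtle.digest instead. The function names generateToken and hashToken are made up for this example.

```javascript
const { randomBytes, createHash } = require("node:crypto");

// Generate a URL-safe random token. 48 random bytes is comfortably above
// the 32 bytes mentioned in the flow and encodes to 64 base64url characters.
function generateToken() {
  return randomBytes(48).toString("base64url");
}

// Hash the token with SHA-256. A fast hash is acceptable here only because
// the input is a high-entropy random token, never a user-chosen password.
function hashToken(token) {
  return createHash("sha256").update(token).digest("hex");
}

const token = generateToken();      // goes into the link we e-mail
const tokenHash = hashToken(token); // goes into the database
```

Only the hash is stored, so someone who can read the database cannot reconstruct a working login link from it.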

Let's set it up

In FaunaDB your data lives in collections, which are not too different from tables in a relational database. There is no strict schema, but by creating indexes we can still get rules enforced at the database level, for instance that the e-mail address of a user is only used for one account.

These indices also allow you to efficiently find the documents you are looking for. In most applications you know in advance that you will for instance be searching for a post by its ID, or that you want to list all posts by a certain user.

We set up the collections and indexes by running FQL queries, which you can save in a file for reproducibility. Alternatively, you can set them up using the visual UI in the FaunaDB Dashboard.

Setting up the collections and indexes (using FaunaDB FQL shell):

First we will create a collection for our users and an index:

CreateCollection({ name: "accounts" });

CreateIndex({
  name: "unique_emails",
  source: Collection("accounts"),
  terms: [{field: ["data", "email"]}],
  unique: true,
});

CreateIndex({
  name: "accounts_by_email",
  source: Collection("accounts"),
  terms: [{field: ["data", "email"]}],
  values: [{field: ["data", "account_id"]}],
  unique: true,
});

In the terms argument we can specify a collection of terms that we want to index on. Here it is just the e-mail field. In the values argument we specify what we want to return from each document when we execute the query, here we only care about the account id.

By setting the index to be unique, FaunaDB will enforce that the combination of terms and values is unique. If a transaction were to create a new account with an e-mail address that is already in use, it would be rejected. This kind of constraint is very difficult to ensure with an eventually consistent database. The first index created above is there to ensure that an e-mail can only be used for a single account; the second one we will use to retrieve the account associated with an e-mail address.

Next, we create a collection and index for the login and signup tokens:

// Login tokens: the TTL we put here is to make sure they get cleaned up automatically.
// We will set a shorter expiration in our application code.
CreateCollection({ name: "login_tokens", ttl_days: 7 });
CreateIndex({
  name: "login_tokens_by_hash",
  source: Collection("login_tokens"),
  terms: [{field: ["data", "token_hash"]}],
  values: [
      {field: ["data", "expires_at"]},
      {field: ["data", "account_id"]},
      {field: ["data", "used"]}
  ],
});

CreateCollection({ name: "signup_tokens", ttl_days: 7 });
CreateIndex({
  name: "signup_tokens_by_hash",
  source: Collection("signup_tokens"),
  terms: [{field: ["data", "token_hash"]}],
  values: [
      {field: ["data", "email"]},
      {field: ["data", "expires_at"]}
  ],
});

Writing the /magic endpoint

In this endpoint we should first check whether the user supplied e-mail address is valid, and then we generate a long random string (token), and hash it to create token_hash.
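
For the e-mail check, a minimal sketch could look like this. It is deliberately loose: the magic link itself proves the address can receive mail, so we only reject input that is clearly malformed. The function name normalizeEmail is made up for this example, and lowercasing the address is a design choice of this sketch (it keeps the unique index from treating Bob@example.com and bob@example.com as two accounts), not something FaunaDB requires.

```javascript
// Normalize and loosely validate an e-mail address.
// Returns the normalized address, or null if it is clearly malformed.
function normalizeEmail(input) {
  if (typeof input !== "string") return null;
  const email = input.trim().toLowerCase();
  // 254 characters is the practical upper bound for an address.
  if (email.length === 0 || email.length > 254) return null;
  // Exactly one "@", no whitespace, and a dot somewhere in the domain part.
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email) ? email : null;
}
```

The normalized value is what you would pass into the query as the email variable.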

Then we talk to our database. FaunaDB offers an expressive language called FQL for writing complex queries: in a single query we can check whether an account with that e-mail address exists and write the right kind of token.

The FQL query

First I will show you the whole query, and then we'll break it into pieces to see what's going on:

Let(
  {
    token_hash: "hash-of-the-random-token-we-generated",
    email: "bob@example.com",
    account_match: Match(Index("accounts_by_email"), Var("email")),
    account_is_new: Not(Exists(Var("account_match"))),
  },
  If(
    Var("account_is_new"),
    Do( // In case it does not exist
      Create(
        // Store signup token for new account
        Collection("signup_tokens"),
        {
          data: {
            token_hash: Var("token_hash"),
            email: Var("email"),
            expires_at: TimeAdd(Now(), 8, "hours"),
          }
        }
      ),
      "signup"
    ),
    Do( // In case a user exists with this e-mail
      Create(
        // Store token for existing account
        Collection("login_tokens"),
        {
          data: {
            token_hash: Var("token_hash"),
            email: Var("email"),
            account_id: Select(["data", "account_id"], Get(Var("account_match"))),
            used: false,
            expires_at: TimeAdd(Now(), 30, "minutes")
          }
        }
      ),
      "login"
    )
  )
)

Let's break this query up!

Let(
  {
    token_hash: "hash-of-the-random-token-we-generated",
    email: "bob@example.com",
    account_match: Match(Index("accounts_by_email"), Var("email")),
    account_is_new: Not(Exists(Var("account_match"))),
  },
  ...
)

The top level Let call takes two arguments. The first argument allows us to define a number of variables that we can then use in the rest of the query with Var("name_of_the_variable").
To find the account we use the accounts_by_email index we created earlier, which allows us to retrieve the account ID that matches a given e-mail.

With this part of the query we have now determined whether a user already exists with that e-mail address. We'll use that information to either generate a signup token or a login token.

Now let's have a look at the rest of the query:

  If(
    Var("account_is_new"),
    Do( // In case it does not exist
      Create(
        // Store signup token for new account
        Collection("signup_tokens"),
        {
          data: {
            token_hash: Var("token_hash"),
            email: Var("email"),
            expires_at: TimeAdd(Now(), 8, "hours"),
          }
        }
      ),
      "signup"
    ),
    Do( // In case a user exists with this e-mail
      Create(
        // Store token for existing account
        Collection("login_tokens"),
        {
          data: {
            token_hash: Var("token_hash"),
            email: Var("email"),
            account_id: Select(["data", "account_id"], Get(Var("account_match"))),
            used: false,
            expires_at: TimeAdd(Now(), 30, "minutes")
          }
        }
      ),
      "login"
    )
  )

At the top here is an If call, which takes the form If(<condition>, <trueExpression>, <falseExpression>). The condition here is whether the account is new, i.e. whether no account exists yet for this e-mail address: if so we create a signup token, otherwise we create a login token. Let's look at both cases:

Do( // In case the account does not exist
      Create(
        // Store signup token for new account
        Collection("signup_tokens"),
        {
          data: {
            token_hash: Var("token_hash"),
            email: Var("email"),
            expires_at: TimeAdd(Now(), 8, "hours"),
          }
        }
      ),
      "signup"
    ),

In the above query we create a signup token. The Do function at the top allows us to specify multiple expressions that are executed in order. We use that to set the return value of the whole query to the string "signup". Before we return that value we create a new entry in our signup tokens collection.

Do( // In case a user exists with this e-mail
      Create(
        // Store token for existing account
        Collection("login_tokens"),
        {
          data: {
            token_hash: Var("token_hash"),
            email: Var("email"),
            account_id: Select(["data", "account_id"], Get(Var("account_match"))),
            used: false,
            expires_at: TimeAdd(Now(), 30, "minutes")
          }
        }
      ),
      "login"
    )

This is the other case: this gets executed if a user with the e-mail does already exist. In that case we insert a login token with a short expiration time of 30 minutes.

After the query

Our query will return an error if something went wrong, or the string "signup" or "login". Based on that we send the right kind of e-mail: either a welcome e-mail with a link to the /signup?token=... endpoint, or a magic login e-mail with a link to /login?token=....

Afterward we redirect the user to a page with details on what to do next (i.e. open their inbox).
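
Turning the query result into the right e-mail could be sketched as follows. The base URL, subject lines, and the function name buildMagicEmail are placeholders for this example; actually sending the message is left to whichever e-mail provider you use.

```javascript
// Hypothetical base URL; replace with your own domain.
const BASE_URL = "https://www.example.com";

// Turn the query result ("signup" or "login") and the raw token into the
// subject and link of the e-mail to send. The raw token goes into the link;
// only its hash is stored in the database.
function buildMagicEmail(kind, token) {
  if (kind !== "signup" && kind !== "login") {
    throw new Error(`Unexpected query result: ${kind}`);
  }
  const url = new URL(`/${kind}`, BASE_URL);
  url.searchParams.set("token", token); // also URL-encodes the token
  return {
    subject: kind === "signup" ? "Welcome! Confirm your e-mail address" : "Your magic login link",
    link: url.toString(),
  };
}
```

Throwing on any other value means a failed or unexpected query result never results in an e-mail being sent.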

/login and /signup endpoint

Now, the user has received an e-mail with the magic login or signup link. We need to create a handler for the login and signup endpoints. Both need to be HTTP GET endpoints as that's what the user clicking the link will send.
First we will hash the token that was sent in the URL to get token_hash and compare this to our database. The login and signup endpoints both listen to different URLs, so we will know what query to run.
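
Reading the token from the clicked link and hashing it could look like the sketch below. As before this assumes Node's built-in node:crypto module, and tokenHashFromUrl is a name made up for this example. The hash must be computed exactly the same way as when the token was created, otherwise the lookup will never match.

```javascript
const { createHash } = require("node:crypto");

// Pull the token out of the link the user clicked and hash it, so it can
// be looked up against the token_hash values stored in the database.
// Returns null when the token parameter is missing or empty.
function tokenHashFromUrl(requestUrl) {
  const token = new URL(requestUrl).searchParams.get("token");
  if (!token) return null;
  return createHash("sha256").update(token).digest("hex");
}
```

A null result can be treated the same way as an "invalid" token, without querying the database at all.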

The login query

The goal of this query is:

Check if such a login token hash exists at all, else return "invalid"
Check if the login token is not too old, else return "expired"
Check that the token has not been used before, else return "used"
At this point we determined that the token is valid, and we need to mark the token as used.

Let(
  {
    token_hash: "hash-of-the-token-submitted-by-the-user",
    token_match: Match(Index("login_tokens_by_hash"), Var("token_hash")),
    token: If(Exists(Var("token_match")), Get(Var("token_match")), null),
  },
  If(
    IsNull(Var("token")), // Token not found?
    "invalid",
    If(
      GT(TimeDiff(Select(["data", "expires_at"], Var("token")), Now(), "seconds"), 0), // Expired?
      "expired",
      If(
        Select(["data", "used"], Var("token")), // Used?
        "used",
        Do( // Valid token
          Update(Select("ref", Var("token")), {data: {used: true}}), // Set as used
          { account_id: Select(["data", "account_id"], Var("token"))} // Return value
        )
      )
    )
  )
)

I will not dissect this query like the token insertion before (as this article is getting way too long!), but you should be able to read it more or less top to bottom in that it checks each of the valid token requirements one by one.

We use FQL here to write a powerful query that saves us from writing a lot of application code. Being able to write and read queries like a more conventional programming language is a big selling point for me.

After the login query

From this query we will either get the reason of failure as a string ("invalid", "expired", or "used"), or a JSON object such as {account_id: 12345}. Within the query itself we set used: true on the login token to prevent re-use.

From here your serverless application code takes over: you can generate a JWT token or a session cookie to keep the user logged in (or whatever authentication system you use), or in case of failure: let the user know why their token was not valid.
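
A minimal sketch of interpreting the login query result in application code is shown below. The function name interpretLoginResult, the shape of the returned object, and the messages are all illustrative choices, not part of FaunaDB.

```javascript
// User-facing messages for each failure reason the login query can return.
const FAILURE_MESSAGES = new Map([
  ["invalid", "This login link is not valid. Please request a new one."],
  ["expired", "This login link has expired. Please request a new one."],
  ["used", "This login link has already been used. Please request a new one."],
]);

// Map the result of the login query to something the endpoint can act on:
// either the account to create a session for, or a message for the user.
function interpretLoginResult(result) {
  if (result !== null && typeof result === "object" && "account_id" in result) {
    return { ok: true, accountId: result.account_id };
  }
  return { ok: false, message: FAILURE_MESSAGES.get(result) ?? "Something went wrong." };
}
```

On success you would create the session or JWT for accountId; on failure you would show the message along with a way to request a new link.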

The signup query

The signup token query is the same but simpler: we only have to check that the token exists and has not expired. As signup tokens are tied to a single e-mail address we don't have to check whether a token has already been used: the constraint we set that allows only one account per e-mail address will prevent abuse here.

Let(
  {
    token_hash: "hash-of-the-token-submitted-by-the-user",
    token_match: Match(Index("signup_tokens_by_hash"), Var("token_hash")),
    token: If(Exists(Var("token_match")), Get(Var("token_match")), null),
  },
  If(
    IsNull(Var("token")), // Hash not found?
    "invalid",
    If(
      GT(TimeDiff(Select(["data", "expires_at"], Var("token")), Now(), "seconds"), 0), // Expired?
      "expired",
      { email: Select(["data", "email"], Var("token")) }
    )
  )
)

After the signup query

This query will return "invalid", "expired" or a JSON object such as {email: "bob@example.com"}. You can now present this new user with a form to enter additional account details to create an account for them. You can be sure that their e-mail address was already validated and log them in immediately after that form.

Conclusion and notes

And there you have it, magic login and signup for your app 😃. By using FaunaDB we were able to create this without writing much code at all: most of the logic is handled for us by the database.

Important note

Make sure to rate limit e-mails sent, especially to individual addresses. Also introduce some anti-bot measures to the magic signup form (such as captcha or honeypot values) to prevent abuse. This is no different from a normal signup form.
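
To illustrate the idea of a per-address limit, here is a minimal sliding-window sketch. Note the big assumption: it keeps its state in memory, which does not survive across serverless invocations, so in a real serverless setup you would keep these counters in your database instead. The function name createEmailRateLimiter and the limits used are made up for this example.

```javascript
// Allow at most `limit` e-mails per address within `windowMs` milliseconds.
function createEmailRateLimiter(limit, windowMs) {
  const sentAt = new Map(); // e-mail address -> timestamps of recent sends

  return function allowSend(email, now = Date.now()) {
    // Keep only the sends that still fall inside the window.
    const recent = (sentAt.get(email) ?? []).filter((t) => now - t < windowMs);
    const allowed = recent.length < limit;
    if (allowed) recent.push(now);
    sentAt.set(email, recent);
    return allowed;
  };
}

// Example: at most 3 e-mails per address every 15 minutes.
const allowSend = createEmailRateLimiter(3, 15 * 60 * 1000);
```

Calling allowSend(email) before sending tells you whether this address is still within its limit.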

What are the downsides of passwordless login?

The user needs to open their e-mail client on the machine they want to log in on.
The wrong browser may be used to open the link. In this case the user would have to manually copy/paste the link into the right browser.
Power users especially will prefer to use their password manager that auto-fills password login. For them the passwordless login flow is actually more painful.
E-mail deliverability is important: if your signup or login e-mails are delayed a lot or end up in people's spam folders or are rejected completely users will not be able to log in. Use a reputable transactional e-mail provider to reduce the chance of this happening.

Should I add passwordless login? My advice:

Remember that you can still have password or social login authentication aside from magic login links. They are not mutually exclusive.
Magic links are amazing for signup: they increase conversion and not having to somehow verify an e-mail after signup saves a lot of complexity.
Start out by building your product with only magic links and/or social logins, that way you don't have to store users' passwords.
If your users start demanding it, create a password based login alongside magic login. Depending on your audience this may never happen. Note that password reset has almost exactly the same flow as passwordless login, so you can probably re-use a lot from this.
