DEV Community

Raphael Jambalos
Amazon Cognito: The Ugly Parts (and our workarounds)

Don't get us wrong. Amazon Cognito is sufficient as an authentication service, and we've been using it for more than two years now. But as our needs got more complex, we found ourselves boxed in by Cognito's limitations (the ugly parts). In this post, we tackle these limitations one by one, along with how we addressed them.

For those new to Cognito...

Amazon Cognito is an AWS service that allows you to add authentication and authorization to your application. It is designed to scale to support millions of users.

To get started, you create a user pool. It stores your app's users and provides APIs for login, registration, and so on.

Before Cognito, you'd have to find a Python or Laravel package that adds user authentication to your application and stores user data in your own database. With Cognito, you store user data like emails and passwords in the user pool.

During user login, you configure your frontend (FE) to talk to Cognito (using an FE library called Amplify), and Cognito returns 3 tokens that your application can use as the user browses your website. These tokens prove the user's identity and declare which functionality they can access.

With that, let's proceed with the ugly parts...

1. You cannot make changes to attributes after User Pool creation.

When you create a user pool, you set both standard and custom attributes. Standard attributes are preset attributes like name, birthday, and email (Cognito has a list, you just tick whichever you need). Custom attributes are attributes whose names and types you specify (e.g. income level, education level).

But these attributes can only be set when you create the Cognito user pool. After the user pool has been created, they can no longer be changed: no additional attributes, and no removal of attributes. This limitation forces us to pair our Cognito user pool with a separate database like a DynamoDB table, so that our application can accommodate new attributes later on. We keep only essential data in Cognito and use DynamoDB to store all other user information.

When the user updates their account (e.g. name, gender, section, income), our FE has to call an API endpoint that updates both Cognito and DynamoDB. It would have been nice to just call Cognito and not develop our own set of APIs.
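As a rough illustration of what such an endpoint has to do, here is a minimal sketch that splits an update between Cognito and DynamoDB. It assumes `cognito` behaves like `boto3.client("cognito-idp")` and `table` like a boto3 DynamoDB `Table` resource. The attribute split in `COGNITO_ATTRIBUTES` and the `user_sub` key name are assumptions made for the example, not our exact schema.

```python
# Attributes kept in Cognito in this sketch (an assumption for illustration).
COGNITO_ATTRIBUTES = {"name", "gender", "email"}


def update_profile(cognito, table, user_pool_id, username, user_sub, changes):
    """Write essential attributes to Cognito and everything else to DynamoDB.

    `cognito` is expected to behave like boto3.client("cognito-idp") and
    `table` like boto3.resource("dynamodb").Table(...).
    """
    cognito_attrs = [
        {"Name": key, "Value": str(value)}
        for key, value in changes.items()
        if key in COGNITO_ATTRIBUTES
    ]
    extra = {k: v for k, v in changes.items() if k not in COGNITO_ATTRIBUTES}

    if cognito_attrs:
        cognito.admin_update_user_attributes(
            UserPoolId=user_pool_id,
            Username=username,
            UserAttributes=cognito_attrs,
        )
    if extra:
        # Build "SET #a0 = :v0, #a1 = :v1" with placeholders so that
        # attribute names never collide with DynamoDB reserved words.
        names = {f"#a{i}": key for i, key in enumerate(extra)}
        values = {f":v{i}": value for i, value in enumerate(extra.values())}
        assignments = ", ".join(f"#a{i} = :v{i}" for i in range(len(extra)))
        table.update_item(
            Key={"user_sub": user_sub},
            UpdateExpression="SET " + assignments,
            ExpressionAttributeNames=names,
            ExpressionAttributeValues=values,
        )
    # Report which store received which attribute.
    return {"cognito": [a["Name"] for a in cognito_attrs], "dynamodb": list(extra)}
```

The two writes are not atomic, so a real endpoint would also need to decide what to do when one store succeeds and the other fails.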

Unfortunately, if you need to change attributes in a Cognito user pool, you will have to create a new user pool and migrate your users there (an exhausting affair, as you'll see in the migration story below).

2. No support for search by custom attributes.

Cognito cannot search through custom attributes because they aren't indexed. It can only query standard attributes.

This forces us to put even more functionality into a custom API and query DynamoDB instead.
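A sketch of what that custom query can look like, assuming the DynamoDB table has a global secondary index keyed on the custom attribute. The index name and attribute name are whatever you pass in (both hypothetical here), and `table` is assumed to behave like a boto3 DynamoDB `Table` resource.

```python
def find_users_by_attribute(table, index_name, attribute, value):
    """Return all items whose `attribute` equals `value`, using a GSI.

    Follows LastEvaluatedKey so results spanning several pages are collected.
    """
    items = []
    kwargs = {
        "IndexName": index_name,
        "KeyConditionExpression": "#attr = :val",
        "ExpressionAttributeNames": {"#attr": attribute},
        "ExpressionAttributeValues": {":val": value},
    }
    while True:
        page = table.query(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Resume the query where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

For example, `find_users_by_attribute(table, "income_level-index", "income_level", "high")` would return every user item with that income level, provided such an index exists.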

3. Customizing the Hosted UI is very limited, but a custom UI does not have the same feature set

For the Hosted UI, Cognito's OAuth is robust. When the user signs in, it gives you an authorization code that you can exchange for the token set (the ID, access, and refresh tokens). Your backend does this exchange, so it can keep track of the tokens.

For a custom UI, we use Amplify. We cannot use the code method anymore; instead we call Auth.signIn(). It verifies the username and password and sends the token set straight to the frontend. This is troublesome because our FE now needs to call our custom BE to store the session and the tokens associated with it.
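To make the Hosted UI exchange concrete, here is a minimal backend-side sketch of the code-for-tokens request against the user pool domain's `/oauth2/token` endpoint. It assumes the app client has a client secret (hence the Basic auth header). The domain, client ID, and redirect URI are placeholders you would supply.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_token_request(domain, client_id, client_secret, code, redirect_uri):
    """Build the POST request that trades an authorization code for tokens."""
    body = urllib.parse.urlencode({
        "grant_type": "authorization_code",
        "client_id": client_id,
        "code": code,
        "redirect_uri": redirect_uri,
    }).encode("ascii")
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        f"https://{domain}/oauth2/token",
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Authorization": f"Basic {basic}",
        },
        method="POST",
    )


def exchange_code(request):
    """Send the request and return the parsed JSON body with the tokens."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Because the exchange happens on the backend, the tokens can be stored alongside the session there without the FE ever having to forward them.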

4. You can't keep track of the sessions

When your user logs in, Cognito generates 3 JWT tokens: id_token, refresh_token and access_token. Let's call them the token set.

Cognito keeps track of the token set internally but does not give you APIs to do the same. So, if you need a feature like "list all active sessions", you would need to store the sessions on your database. This means that after every login with Cognito, your FE would have to call your BE APIs to send the token set and create the session on your database. All this trouble just so you can show these sessions later.

Cognito has only 3 APIs for token management:

  • Generate token set (after login)
  • Revoke one token
  • Revoke all tokens

Since we are keeping track of the sessions on our database, we have to wrap those Cognito APIs into our own API Endpoint. If the user wants to delete a session, we delete it on our database and revoke the token in Cognito.

We also have to create an extra API endpoint for the Show All Sessions API, which is currently not supported by Cognito.
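The wrapper described above can be sketched roughly as follows. The in-memory dictionary stands in for our database table, and the class and field names are illustrative assumptions. `cognito` is assumed to behave like `boto3.client("cognito-idp")`, and the sketch assumes token revocation is enabled on the app client and that the client has no secret.

```python
import time
import uuid


class SessionStore:
    """Tracks sessions ourselves, since Cognito does not expose a session list."""

    def __init__(self, cognito, client_id):
        self._cognito = cognito
        self._client_id = client_id
        self._sessions = {}  # session_id -> record; a database table in practice

    def create(self, username, refresh_token, device="unknown"):
        """Called by the FE right after login to register the new session."""
        session_id = str(uuid.uuid4())
        self._sessions[session_id] = {
            "session_id": session_id,
            "username": username,
            "device": device,
            "created_at": time.time(),
            "refresh_token": refresh_token,
        }
        return session_id

    def list_for(self, username):
        """The 'show all sessions' view; the refresh token is never returned."""
        return [
            {k: v for k, v in record.items() if k != "refresh_token"}
            for record in self._sessions.values()
            if record["username"] == username
        ]

    def delete(self, session_id):
        """Remove the session locally, then revoke its refresh token in Cognito."""
        record = self._sessions.pop(session_id)
        self._cognito.revoke_token(
            Token=record["refresh_token"],
            ClientId=self._client_id,
        )
```

Revoking the refresh token stops new tokens from being issued for that session, but a locally validated access token can still be accepted until it expires.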

5. Cognito APIs have rate limits (and some can't be increased)

On one of our projects, we had to migrate 3M users from an on-premises database. We also had to migrate their passwords (if they comply with Cognito's password policy). Cognito has a CSV import feature, but it does not allow importing passwords, so we had to find another route.

We created a Lambda function that reads a CSV from S3. It reads the CSV row by row and sends each row to SQS. SQS queues each row, and another Lambda function consumes those tasks. For each user, the consumer Lambda function does the following:

  • Create the user in Cognito [CreateCognitoUser]
  • Create the user in DynamoDB, with the user_sub from Cognito as the primary key
  • Using Cognito Admin SDK, change the user's password with the one in the file [AdminUpdateUser]
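The three consumer steps above can be sketched like this. The sketch uses the boto3 calls `admin_create_user` and `admin_set_user_password`, which we take to be the closest real equivalents of the operation names in the list. The CSV column names (`email`, `name`, `password`) are assumptions for illustration. `cognito` and `table` are assumed to behave like a boto3 `cognito-idp` client and a DynamoDB `Table` resource.

```python
def migrate_user(cognito, table, user_pool_id, row):
    """Create one user in Cognito and DynamoDB, then set their password."""
    email = row["email"]

    # Step 1: create the user without sending Cognito's invitation message.
    created = cognito.admin_create_user(
        UserPoolId=user_pool_id,
        Username=email,
        UserAttributes=[
            {"Name": "email", "Value": email},
            {"Name": "email_verified", "Value": "true"},
        ],
        MessageAction="SUPPRESS",
    )
    attributes = {a["Name"]: a["Value"] for a in created["User"]["Attributes"]}
    user_sub = attributes["sub"]

    # Step 2: mirror the user in DynamoDB, keyed by the Cognito sub.
    table.put_item(
        Item={"user_sub": user_sub, "email": email, "name": row.get("name", "")}
    )

    # Step 3: set the migrated password. Permanent=True makes it a permanent
    # password, so the user does not stay in FORCE_CHANGE_PASSWORD.
    cognito.admin_set_user_password(
        UserPoolId=user_pool_id,
        Username=email,
        Password=row["password"],
        Permanent=True,
    )
    return user_sub
```

If the function dies between steps, the user exists in one store but not the other, so each step should be safe to retry.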


Since we have 3M users, this operation took around 26 hours. And this is where we encountered several issues.

[1] The CreateCognitoUser API has a limit of 100 requests per second. We asked AWS to raise the limit to 200, which they did.

[2] The AdminUpdateUser API has a limit of 30 requests per second, and this limit cannot be raised. This is where we got stumped. We were repeatedly throttled here, so we had to add code in our Lambda that retries when throttled. But that had the unfortunate consequence of also limiting our rate of user creation to 30 requests per second. At that rate, 3M users need about 100,000 seconds (close to 28 hours) of API time, which is in line with the 26+ hours our deployment lasted.

[3] Since our workflow ran for 26+ hours, our Lambda scaled up to accommodate the demand. It eventually consumed all available concurrency in the prod account. This meant none of the other Lambda functions could run anymore, because ours hogged all the concurrency in the account! We had to ask AWS to raise our account-level concurrency to 10,000 and then allocate just 2,000 of it to the Lambda functions involved in the migration. The remaining 8,000 is left for other Lambda functions to use.
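The retry-on-throttle logic can be sketched as exponential backoff with jitter. This is a generic version rather than our exact Lambda code. It assumes throttling errors look like botocore's `ClientError`, with the error code stored under `exc.response["Error"]["Code"]`. The delay parameters are arbitrary starting points.

```python
import random
import time

THROTTLE_CODES = {"TooManyRequestsException", "ThrottlingException"}


def is_throttled(exc):
    """True if the exception carries a throttling error code (botocore style)."""
    code = getattr(exc, "response", {}).get("Error", {}).get("Code", "")
    return code in THROTTLE_CODES


def call_with_backoff(func, *, max_attempts=6, base_delay=0.5, max_delay=20.0,
                      sleep=time.sleep):
    """Call func(), retrying with exponential backoff when it is throttled."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as exc:
            # Re-raise anything that is not throttling, or the final failure.
            if not is_throttled(exc) or attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random time up to the current delay so
            # concurrent workers do not all retry at the same moment.
            sleep(random.uniform(0, delay))
```

A consumer would wrap each Cognito call, for example `call_with_backoff(lambda: cognito.admin_set_user_password(...))`. Retrying does not raise the ceiling, though: throughput still cannot exceed the slowest API's quota.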
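One way to cap the migration functions is reserved concurrency, set per function. The sketch below assumes `lambda_client` behaves like `boto3.client("lambda")`. The function names and the split of the 2,000 between them are hypothetical.

```python
def reserve_concurrency(lambda_client, limits):
    """Apply a reserved-concurrency cap to each function in `limits`.

    `limits` maps function name -> reserved concurrent executions.
    Returns the total concurrency reserved across all listed functions.
    """
    for function_name, reserved in limits.items():
        lambda_client.put_function_concurrency(
            FunctionName=function_name,
            ReservedConcurrentExecutions=reserved,
        )
    return sum(limits.values())


# Hypothetical split of the 2,000 migration budget between the two functions.
MIGRATION_LIMITS = {
    "migration-csv-reader": 100,
    "migration-consumer": 1900,
}
```

Setting this before the run starts keeps the migration from starving other functions in the account, because a function with reserved concurrency can never scale past its cap.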

7. Dirty data

Our 26-hour migration run left us with several data problems:

  1. Some users were created multiple times in Cognito even though they appear only once in the CSV. We had to manually delete these users.
  2. Some users were created in Cognito but not in DynamoDB. This may be because the Lambda function stopped right after creating the user in Cognito. We deleted these users as well.
  3. Some users were created in DynamoDB but not in Cognito.
  4. Some users were created with the state FORCE_CHANGE_PASSWORD, which our application wasn't designed to handle.
  5. Cognito treats emails as case-sensitive, so "Jamby@aws.com" and "jamby@aws.com" end up as two different users.
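Several of these problems can be detected after the fact by comparing exports of both stores. The following is a minimal sketch, assuming you have already pulled `(sub, email)` pairs from Cognito and the list of `user_sub` keys from DynamoDB. How you export them is up to you, and the function and key names are illustrative.

```python
from collections import defaultdict


def reconcile(cognito_users, dynamo_subs):
    """Compare the two stores and report mismatches.

    cognito_users: iterable of (sub, email) pairs exported from Cognito.
    dynamo_subs:   iterable of user_sub keys exported from DynamoDB.
    """
    cognito_users = list(cognito_users)
    cognito_subs = {sub for sub, _ in cognito_users}
    dynamo_subs = set(dynamo_subs)

    # Group Cognito users by lowercased email to catch duplicates that
    # differ only by letter case (or are plain repeats).
    by_email = defaultdict(list)
    for sub, email in cognito_users:
        by_email[email.strip().lower()].append(sub)

    return {
        "missing_in_dynamodb": sorted(cognito_subs - dynamo_subs),
        "missing_in_cognito": sorted(dynamo_subs - cognito_subs),
        "duplicate_emails": {
            email: subs for email, subs in by_email.items() if len(subs) > 1
        },
    }
```

Lowercasing emails before creating users is one way to prevent the case-sensitivity duplicates in the first place.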

8. Our bill spiked during the migration

Of the 3M users our customer had, perhaps only 300K are active at any given point in time. Cognito charges per monthly active user, so we expected to be billed for only 300K users. However, if you create users programmatically, they are considered active upon creation. Hence, for the first month, we had to pay for 3M active users! Not to mention the additional cost of running thousands of Lambda functions for 26 hours.

9. It's very expensive to "dry run" a big migration

With 3M users, we have a lot of variety in our data. It would have been ideal to run the whole dataset through a dry run before going to production. But doing so would cost over 9,000 USD! So we could only test with a subset of the data. This meant we didn't encounter every kind of error during the dry run, and we had to face many of them after the production migration.

That's all folks!

I hope this article showed you what you're getting into with Cognito, and the potential pain points you'd encounter down the line.

Did I miss anything? Comment it below!

Photo by Onur Binay on Unsplash

Top comments (1)

Chris Gracia

Did you find a method to "revoke" a user's access from the system? We are trying to design around the need to remove a user and log them out instantly if an admin disables their access. Running into issues doing so with the way the tokens work and the lack of a session.