Recently, we had an issue on a project which, after some investigation, turned out to be related to AppSync Resolvers 😰. Don't worry - this is not me bashing AppSync or AppSync Resolvers, they are most certainly awesome! We just overlooked how we had implemented one, got caught out, learnt a lesson and are now a little more careful with them. Let me take you on a little journey...
How we used AppSync Resolvers
We were using AppSync and AppSync Resolvers as you would expect. We had a GraphQL schema which defined Queries and Mutations and had a mix of "Data Sources". This is AWS's example below just to give context of what "Data Sources" would be in terms of AppSync and AppSync Resolvers.
Source: https://docs.aws.amazon.com/appsync/latest/devguide/resolver-components.html
Most of our "Data Sources" were lambdas which performed custom operations. But in some cases we would have "Data Sources" which were direct connections to services like DynamoDB or Cognito. These are super nice to use: we would simply declare a data source, e.g. Cognito, in CDK like below, and then create a resolver from it with a fieldName matching the field in the GraphQL schema.
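// HTTP data source pointing straight at the Cognito API, signed with SigV4
const cognitoDataSource = graphQlApi.addHttpDataSource(
  "CognitoDataSource",
  `https://cognito-idp.${region}.amazonaws.com/`,
  {
    authorizationConfig: {
      signingRegion: region,
      signingServiceName: "cognito-idp",
    },
  }
);

// Resolver attaching that data source to a single field in the schema
cognitoDataSource.createResolver("CognitoStatusResolver", {
  typeName: "PersonDetails", // the GraphQL type that owns the field
  fieldName: "cognitoStatus",
  // request and response mapping templates omitted for readability
});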
Note: I've taken out the request and response mapping VTL templates from the example code so it's easier to read. Have a look at the links below if you want to understand more about creating AppSync Resolvers with either JavaScript or VTL templates.
AppSync JavaScript Resolvers
AppSync VTL Template Resolvers
So, we had been using lots of these "Data Source" resolvers and they were working seamlessly. The field names were declared in the schema like below and used wherever we could to provide direct integration to particular AWS services.
type Query {
  getAllPersonDetails: [PersonDetails!]
}
type PersonDetails {
  id: String!
  name: String!
  cognitoStatus: String!
}
As shown above, we could call getAllPersonDetails with the fields inside it and it would get the cognitoStatus directly from Cognito, since that field had its own Cognito resolver already created. No need for it to be passed through a lambda to perform the Cognito call. Fantastic! Except...
This is getting big
...our application began to grow. We added more and more GraphQL fields which could be queried. PersonDetails began to look like this:
type PersonDetails {
  id: String!
  firstName: String!
  lastName: String!
  dateOfBirth: String!
  address: Address!
  phoneNumber: String!
  emailAddress: String!
  firstLogin: String
  lastLogin: String
  cognitoStatus: CognitoStatus!
}
What's wrong with this, you might think? Nothing in particular really, as we were able to query exactly the fields we wanted (thanks GraphQL) directly from the "Data Source". But as the application grew we wrote more and more queries that called these resolvers. And if one of those resolver-backed fields were carelessly included in a heavily used Query, we would be querying the connected service...directly...every time.
cognitoStatus: CognitoStatus!
Oops.
Ouch.
Yes, we had suddenly caused a Denial of Service attack...on ourselves.
Wait, what happened?
In our excitement about using resolvers and connecting up data sources all over the place, we had forgotten that one of them was calling the Cognito API directly...which could be rate limited if used carelessly. And we used it carelessly. And we hit that rate limit, hard.
Actually, there was nothing wrong with us having the cognitoStatus field and it having the Cognito "Data Source". But as the application and its user base grew, the cognitoStatus field was included in a Query when it shouldn't have been. This field was then queried many...many times. This meant it was calling the Cognito API directly...many...many times, and Cognito didn't like that.
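To make the fan-out concrete, here is a minimal TypeScript sketch of what field-level resolution implies. It is a simplified model, not AppSync's actual engine: `resolveCognitoStatus`, `cognitoCalls` and the 200-person list are hypothetical stand-ins. The one assumption carried over from GraphQL is that a resolver attached to a field runs once per parent object whenever a query selects that field.

```typescript
// Hypothetical model of field-level resolution; not AppSync's real engine.
type Person = { id: string; name: string };

// Counts how many requests would have gone to the Cognito API.
let cognitoCalls = 0;

// Stand-in for the Cognito HTTP data source: one invocation = one API request.
function resolveCognitoStatus(_person: Person): string {
  cognitoCalls += 1;
  return "CONFIRMED";
}

// Stand-in for the list query plus its field resolvers.
function getAllPersonDetails(
  people: Person[],
  selectedFields: string[]
): Record<string, string>[] {
  return people.map((person) => {
    const result: Record<string, string> = { id: person.id, name: person.name };
    // The field resolver only runs when the query selects the field...
    if (selectedFields.includes("cognitoStatus")) {
      // ...but then it runs once for every item in the list.
      result.cognitoStatus = resolveCognitoStatus(person);
    }
    return result;
  });
}

// Made-up data set of 200 people.
const people: Person[] = Array.from({ length: 200 }, (_, i) => ({
  id: String(i),
  name: `Person ${i}`,
}));

// Query that leaves the field out: no Cognito traffic at all.
getAllPersonDetails(people, ["id", "name"]);
const callsWithoutField = cognitoCalls;

// Same query with the field added: one Cognito request per person.
getAllPersonDetails(people, ["id", "name", "cognitoStatus"]);
const callsWithField = cognitoCalls;

console.log({ callsWithoutField, callsWithField });
```

Under these assumptions, one extra field in the selection set turns a single query into as many Cognito requests as there are people in the result, and a heavily used screen multiplies that again by the number of times the query runs.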
Cue us hitting the User Pool rate limit and people being unable to log in...yeah.
So what did we learn?
- We do still love AppSync Resolvers. The ability to connect your "Data Source" directly without the need for a lambda resolver is still fantastic. Better speed and less code make them a win.
- Just be considerate when using them. You can easily connect up a "Data Source" like Cognito and leave it there. But this can lead to future you or someone else unknowingly popping it into a Query which is hit so often that it makes Cognito fall over. We will still be using resolvers of course, but we now check every time we write a query that we know what all the fields are actually connected to 😏.
- AWS is your friend and not your enemy in these situations. When it came to understanding this problem, we could use both CloudWatch and CloudTrail to gain a quick insight into our API calls and understand where our "TooManyRequestsException" was originating from. Take time to read up on these services so you are able to use them in a critical situation like this one!
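The "check what every field is connected to" habit could be partly automated. The sketch below is one possible approach, not what we actually run: `DIRECT_DATA_SOURCE_FIELDS` and `findDirectDataSourceFields` are hypothetical names, and the scan is a naive token match rather than a real GraphQL parser, so it can also flag a matching alias or argument name.

```typescript
// Hypothetical, hand-maintained list of fields backed by a direct
// (rate-limited) data source such as Cognito.
const DIRECT_DATA_SOURCE_FIELDS: ReadonlySet<string> = new Set(["cognitoStatus"]);

// Naive check: returns the flagged field names that appear in a query document.
// This is a token scan, not a GraphQL parser.
function findDirectDataSourceFields(
  query: string,
  flagged: ReadonlySet<string> = DIRECT_DATA_SOURCE_FIELDS
): string[] {
  // Drop GraphQL comments so commented-out fields are not reported.
  const withoutComments = query.replace(/#.*$/gm, "");
  // Collect every identifier-like token in the document.
  const names = withoutComments.match(/[A-Za-z_][A-Za-z0-9_]*/g) ?? [];
  // Keep only flagged names, de-duplicated.
  return Array.from(new Set(names.filter((name) => flagged.has(name))));
}

const heavyQuery = `
  query {
    getAllPersonDetails {
      id
      firstName
      cognitoStatus
    }
  }
`;

const hits = findDirectDataSourceFields(heavyQuery);
if (hits.length > 0) {
  console.warn(`Query selects fields backed by a direct data source: ${hits.join(", ")}`);
}
```

A check like this could run in a unit test or pre-commit hook over the client's query files, so that adding such a field to a query becomes a deliberate decision rather than an accident.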
Thank you for listening
Thanks for coming through this journey with me! I hope it was helpful and not too scary. Boo! Bye. 👋