At Daniel Wellington, one of our core principles since 2016 has been serverless first, and almost everything we have developed over the last four years is based upon AWS’ serverless products.
We have built heavily on CloudFormation for five years, and every resource must be defined in a CloudFormation template before it can be created in any environment outside development. At the time of writing this article, we have 4,310 stacks across all our accounts and regions, of which 14.45% are located in AWS China. At this scale, we started to discover various corner cases and undocumented behavior; the internal saying for the service was CloudFrustation. We also identified a need to align our way of working and follow best practices.
It started with the problem of strings with leading zeros in CloudFormation. Putting a string with a leading zero in a YAML-based CloudFormation file will lead to the aws-cli checking if the string contains only numeric characters. If it does, an implicit cast of the string to a numerical data type is made, which automatically strips away any leading zeros. We noticed this behavior the hard way, as an ID with leading zeros embedded in a CloudFormation file led to very confusing errors in our test environment. After this discovery, we decided it was time to come up with a way of detecting these kinds of errors early and in an automated fashion.
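To illustrate the kind of check this calls for, here is a minimal sketch that scans YAML text line by line for unquoted, all-digit values with a leading zero. It is a simplified illustration, not the rule shipped in our tool: the function name `find_leading_zero_scalars` and the sample template are hypothetical, and a real rule would work on the parsed template rather than on raw lines.

```python
import re

# Matches "Key: 0123" or "- 0123" where the value is unquoted, consists only
# of digits, and starts with a zero followed by at least one more digit.
LEADING_ZERO = re.compile(r"^\s*(?:-\s+)?(?:[\w.\-]+:\s+)?(0\d+)\s*(?:#.*)?$")


def find_leading_zero_scalars(yaml_text):
    """Return (line_number, value) pairs for suspicious unquoted values."""
    findings = []
    for line_number, line in enumerate(yaml_text.splitlines(), start=1):
        match = LEADING_ZERO.match(line)
        if match:
            findings.append((line_number, match.group(1)))
    return findings


# Hypothetical template fragment used only for demonstration.
SAMPLE = "\n".join([
    "Resources:",
    "  Topic:",
    "    Properties:",
    "      AccountId: 0123456789",    # unquoted: flagged
    '      Quoted: "0123456789"',     # quoted: safe
    "      Port: 0",                  # a single zero is fine
    "      Ids:",
    "        - 007",                  # unquoted list item: flagged
    "        - 42",
])

for line_number, value in find_leading_zero_scalars(SAMPLE):
    print(f"line {line_number}: quote the value {value!r} to keep its leading zeros")
```

Quoting the value in the template is the fix in every flagged case.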
We started to build our own CloudFormation YAML parser lib but found the awesome cfn-lint project with a great set of built-in linting rules and a plugin API.
The plugin API works as follows: custom rules are implemented as Python classes, and when running the cfn-lint binary, either a file path to a folder containing the class files or an import path to a module containing the rules must be supplied. To simplify packaging, installation and distribution of our custom rules, we decided to create a wrapper around cfn-lint that contains the rules we created.
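The wrapper idea can be sketched roughly like this. This is an assumption-laden illustration rather than our actual wrapper: the helper names `build_cfn_lint_command` and `run_lint` and the `custom_rules` folder are hypothetical, while `--template` and `--append-rules` are cfn-lint's own command-line options.

```python
import subprocess


def build_cfn_lint_command(templates, rules_dir):
    """Build the argv that runs cfn-lint with an extra folder of custom rules."""
    return [
        "cfn-lint",
        "--template", *[str(t) for t in templates],
        "--append-rules", str(rules_dir),
    ]


def run_lint(templates, rules_dir="custom_rules"):
    """Run cfn-lint as a subprocess and return its exit code.

    Assumes cfn-lint is installed and on PATH; "custom_rules" is a
    hypothetical folder bundled with the wrapper.
    """
    return subprocess.run(build_cfn_lint_command(templates, rules_dir)).returncode


print(" ".join(build_cfn_lint_command(["template.yaml"], "custom_rules")))
```

Bundling the rules folder inside the wrapper package means users do not have to know where the rules live.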
Over time, more rules were developed in response to discoveries of new corner cases, common mistakes, cost savings and aligning teams. Below is a timeline where we want to highlight some rules that have supported us well.
During 2018, we had a lot going on, and after two years of increasing Lambda usage, we noticed that our CloudWatch logs were consuming massive amounts of storage space because the default retention period is infinite, even though we rarely needed to look at logs older than a month or a quarter. This led to the creation of a rule that warns if the retention period is not explicitly set on log groups.
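The logic behind such a rule is small. The sketch below works on a template that has already been parsed into a dictionary; the function name is hypothetical and the code illustrates the check only, not the rule class used in our tool.

```python
def find_log_groups_without_retention(template):
    """Return logical IDs of log groups that do not set RetentionInDays."""
    missing = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::Logs::LogGroup":
            continue
        properties = resource.get("Properties") or {}
        if "RetentionInDays" not in properties:
            missing.append(logical_id)
    return missing


# Hypothetical parsed template used only for demonstration.
template = {
    "Resources": {
        "ApiLogs": {"Type": "AWS::Logs::LogGroup", "Properties": {"RetentionInDays": 30}},
        "WorkerLogs": {"Type": "AWS::Logs::LogGroup"},
    }
}
print(find_log_groups_without_retention(template))
```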
At one point, we hit the maximum quota of API Gateways of the endpoint type “edge”. Few of the API Gateways allocated at that time actually needed edge capacity (the same infrastructure as CloudFront). This was a legacy of starting with serverless in 2016, while regional support for API Gateway was not released until the end of 2017. We could also see that the move to regional endpoints was slow because outdated CloudFormation templates from other projects were reused whenever we built something new. To help with the migration and reduce costs, as regional endpoints are cheaper, we created a rule that warns if the endpoint configuration is not explicitly defined.
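A check along these lines could look like the following sketch, which assumes a template parsed into a dictionary and REST APIs declared as `AWS::ApiGateway::RestApi`. The function name is hypothetical and the code is an illustration, not the rule from our tool.

```python
def find_apis_without_endpoint_type(template):
    """Return logical IDs of REST APIs that do not declare an endpoint type."""
    missing = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::ApiGateway::RestApi":
            continue
        properties = resource.get("Properties") or {}
        endpoint_configuration = properties.get("EndpointConfiguration") or {}
        # An explicit declaration looks like: EndpointConfiguration: {Types: [REGIONAL]}
        if not endpoint_configuration.get("Types"):
            missing.append(logical_id)
    return missing


# Hypothetical parsed template used only for demonstration.
template = {
    "Resources": {
        "LegacyApi": {"Type": "AWS::ApiGateway::RestApi", "Properties": {"Name": "legacy"}},
        "NewApi": {
            "Type": "AWS::ApiGateway::RestApi",
            "Properties": {"EndpointConfiguration": {"Types": ["REGIONAL"]}},
        },
    }
}
print(find_apis_without_endpoint_type(template))
```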
In 2019, we had another period of cost-efficiency improvements. It was decided that all DynamoDB tables should use pay-per-request billing, so we made a rule that warns if configuration files containing provisioned throughput are detected. At the end of 2019, AWS dropped support for the Node.js 8.x Lambda runtime. To help with our migration effort, we made a rule that causes an error status if any deprecated Lambda runtime is detected in the CloudFormation files of a project.
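Both checks, and the difference between a warning and an error, can be sketched as below. The function name, the tuple format and the contents of `DEPRECATED_RUNTIMES` are illustrative assumptions; the real rules are cfn-lint rule classes, and the list of deprecated runtimes changes over time.

```python
# Illustrative, not exhaustive: runtime identifiers treated as deprecated.
DEPRECATED_RUNTIMES = {"nodejs4.3", "nodejs6.10", "nodejs8.10"}


def check_template(template):
    """Return (severity, logical_id, message) tuples for a parsed template."""
    findings = []
    for logical_id, resource in template.get("Resources", {}).items():
        resource_type = resource.get("Type")
        properties = resource.get("Properties") or {}
        if resource_type == "AWS::DynamoDB::Table" and "ProvisionedThroughput" in properties:
            findings.append(("warning", logical_id, "uses provisioned throughput"))
        if resource_type == "AWS::Lambda::Function":
            runtime = properties.get("Runtime")
            if runtime in DEPRECATED_RUNTIMES:
                findings.append(("error", logical_id, f"deprecated runtime {runtime}"))
    return findings


# Hypothetical parsed template used only for demonstration.
template = {
    "Resources": {
        "Orders": {
            "Type": "AWS::DynamoDB::Table",
            "Properties": {"ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5}},
        },
        "Customers": {"Type": "AWS::DynamoDB::Table", "Properties": {"BillingMode": "PAY_PER_REQUEST"}},
        "Handler": {"Type": "AWS::Lambda::Function", "Properties": {"Runtime": "nodejs8.10"}},
    }
}
for finding in check_template(template):
    print(finding)
```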
In July 2020, during a period of security improvement work, we added a rule that warns if AWS managed “FullAccess” policies are used, or if “*” is misused for Action or Resource in IAM policies.
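As a rough sketch of what such a check involves, the code below looks for managed policy ARNs ending in "FullAccess" and for Allow statements whose Action or Resource is exactly "*". It is a simplified illustration with hypothetical names, not our actual rule, and it only inspects the `ManagedPolicyArns`, `Policies` and `PolicyDocument` properties of a parsed template.

```python
def _as_list(value):
    """CloudFormation accepts a single value or a list in many places."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_full_access(template):
    """Return (logical_id, reason) pairs for overly broad IAM permissions."""
    findings = []
    for logical_id, resource in template.get("Resources", {}).items():
        properties = resource.get("Properties") or {}
        for arn in _as_list(properties.get("ManagedPolicyArns")):
            # Intrinsic functions (dicts) are skipped in this sketch.
            if isinstance(arn, str) and arn.endswith("FullAccess"):
                findings.append((logical_id, f"managed policy {arn}"))
        # Inline policies on roles, plus the document of a standalone policy.
        documents = [p.get("PolicyDocument", {}) for p in _as_list(properties.get("Policies"))]
        documents.append(properties.get("PolicyDocument", {}))
        for document in documents:
            for statement in _as_list(document.get("Statement")):
                if statement.get("Effect") != "Allow":
                    continue
                for key in ("Action", "Resource"):
                    if "*" in _as_list(statement.get(key)):
                        findings.append((logical_id, f"wildcard {key}"))
    return findings


# Hypothetical parsed template used only for demonstration.
template = {
    "Resources": {
        "Role": {
            "Type": "AWS::IAM::Role",
            "Properties": {
                "ManagedPolicyArns": ["arn:aws:iam::aws:policy/AmazonS3FullAccess"],
                "Policies": [{
                    "PolicyName": "inline",
                    "PolicyDocument": {"Statement": [
                        {"Effect": "Allow", "Action": "*", "Resource": ["arn:aws:s3:::bucket/*"]}
                    ]},
                }],
            },
        }
    }
}
for finding in find_full_access(template):
    print(finding)
```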
At the time of writing this article, we support the following rules:
- No mismatched log groups and subscription filters
- No missing endpoint types
- No missing log retention period
- No use of deprecated Lambda runtime
- No use of full access policies
- No use of leading zeroes in numbers or strings
- No missing/implicit log groups for Lambdas
- No use of old style subscription filters
- No use of provisioned throughput
- No use of reserved environment variable names
- No use of reserved words for DynamoDB column names
- No malformed subscription filters
Now we want to open source this (yet another) linting tool, which can be found here. We believe it's stable and mature, although we expect to find new cases in the future that can be turned into rules for automatic detection and mitigation. If this tool sounds useful to you, please check out the repository and feel free to report issues or submit pull requests.
Do you want to know more about what it is like to work with technology at Daniel Wellington? Take a few minutes to watch our 3 min video story up to the right. If you are open to new challenges, check out our open tech positions.