I was working on my client's migration to AWS (they are an e-retailer) when I received a phone call: “Paul, we are in trouble, our site has been under a denial-of-service attack for a week; we are losing money! Can you help?”
Rushing the migration project was not an option: the client had not yet containerized their app, and we had not run any data migration or load tests. But, as I wrote in a previous blog post, Cloud services can also benefit on-premises infrastructure. Time to prove it!
Initial analysis revealed that the attacker was using multiple IPs (from the Tor network) and targeting the site's login page. This page makes database calls, and the database was overloaded, causing first latency, then outages, throughout the system.
I got to work right away. Thanks to Terraform, within half a day I had a working stack in a test environment, ready to be promoted to production.
The tech stack
Here is a diagram of the technical stack deployed to counter the attack my client faced:
Here are the main additions to the existing stack:
- Instead of pointing directly to my client's on-premises infrastructure, DNS now sends requests to CloudFront.
- CloudFront is a managed Content Delivery Network (CDN): it serves content, cached or not, from edge locations close to users.
- During the incident, what I was initially after was not the caching feature (which reduces both server load and client-side latency), but the ability to expose an HTTPS endpoint acting as a proxy between visitors and my client's infrastructure.
- Before relaying requests to my "origin" (the existing infrastructure), CloudFront passes them through AWS WAF.
- WAF is a Web Application Firewall, which allows the inspection of HTTP requests.
- In AWS WAF, I configured rules based on AWS managed rule groups. Here are the rules that proved most useful in stopping the attack:
- The AWSManagedRulesAnonymousIpList rule group contains one rule that targets known Tor exit nodes as well as the most frequently used VPN services, and another that lists hosting providers (whose machines may be used as zombies). This rule group did 95% of the job.
- The second, AWSManagedRulesATPRuleSet, specifically protects login pages by inspecting login requests: do they include all the expected login form fields? Is an IP responsible for multiple authentication failures?
- In addition to these rules, as a precaution, we put the "usual" rules in place: SQL injection, PHP vulnerabilities, OWASP Top 10, etc.
- Finally, we added a rule to whitelist specific IPs (our e-retailer's business model involves a lot of traffic from a few partners, whose IPs were caught by the hosting provider list mentioned above).
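A minimal Terraform sketch of a web ACL along these lines: an IP set allowlist evaluated first, followed by AWS managed rule groups generated with a dynamic block. The ACL name, metric names and CIDR ranges (taken from the reserved documentation ranges) are placeholders, mapping "the usual rules" to the Common, SQLi and PHP groups is an assumption, and the ATP group is left out of this sketch.

```hcl
# Sketch only: names and CIDR ranges are hypothetical placeholders.
provider "aws" {
  region = "us-east-1" # CLOUDFRONT-scoped WAF resources are managed in us-east-1
}

resource "aws_wafv2_ip_set" "partners" {
  name               = "partner-allowlist"
  scope              = "CLOUDFRONT"
  ip_address_version = "IPV4"
  addresses          = ["203.0.113.10/32", "198.51.100.0/24"] # replace with real partner IPs
}

locals {
  # Evaluated in list order, right after the allowlist (priority 0)
  managed_rule_groups = [
    "AWSManagedRulesAnonymousIpList", # Tor exit nodes, anonymizers, hosting providers
    "AWSManagedRulesCommonRuleSet",   # assumed stand-in for the OWASP-style "usual" rules
    "AWSManagedRulesSQLiRuleSet",
    "AWSManagedRulesPHPRuleSet",
  ]
}

resource "aws_wafv2_web_acl" "main" {
  name  = "frontend-acl"
  scope = "CLOUDFRONT"

  default_action {
    allow {}
  }

  # Partner traffic is allowed before any managed rule can block it
  rule {
    name     = "allow-partners"
    priority = 0

    action {
      allow {}
    }

    statement {
      ip_set_reference_statement {
        arn = aws_wafv2_ip_set.partners.arn
      }
    }

    visibility_config {
      cloudwatch_metrics_enabled = true
      metric_name                = "allow-partners"
      sampled_requests_enabled   = true
    }
  }

  # One rule per managed group; rule.key is the list index, rule.value the group name
  dynamic "rule" {
    for_each = local.managed_rule_groups
    content {
      name     = rule.value
      priority = rule.key + 1

      # Keep each group's own block/count actions
      override_action {
        none {}
      }

      statement {
        managed_rule_group_statement {
          vendor_name = "AWS"
          name        = rule.value
        }
      }

      visibility_config {
        cloudwatch_metrics_enabled = true
        metric_name                = rule.value
        sampled_requests_enabled   = true
      }
    }
  }

  visibility_config {
    cloudwatch_metrics_enabled = true
    metric_name                = "frontend-acl"
    sampled_requests_enabled   = true
  }
}
```

Lower priority numbers are evaluated first, so an allowed partner request terminates evaluation before the hosting provider list can block it.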
Implementation and result
We moved my client's main DNS zone to Route 53 (luckily, all the preparatory inventory work had already been done). This brought at least two benefits:
- The automation offered by Route 53, combined with Terraform, let me quickly generate the DNS validation records that AWS Certificate Manager needs to issue TLS certificates for my client's domain.
- Route 53 can define a dynamic “A” record (an alias) at the root of the domain, whereas a CNAME is not allowed there: per RFC 1034, a CNAME cannot coexist with other records, and the zone apex must carry SOA and NS records.
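As an illustration of the Route 53 side, here is a minimal Terraform sketch that requests an ACM certificate for mondomaine.fr and api.mondomaine.fr, creates the DNS validation records from the certificate's own output, and points the zone apex at CloudFront with an alias record. The variable `cloudfront_domain_name` is a hypothetical input; `Z2FDTNDATAQYW2` is the fixed hosted zone ID AWS uses for CloudFront alias targets.

```hcl
# Sketch only: resource names and the input variable are hypothetical.
provider "aws" {
  region = "us-east-1" # CloudFront only accepts ACM certificates issued in us-east-1
}

variable "cloudfront_domain_name" {
  type        = string
  description = "The distribution's *.cloudfront.net domain name"
}

resource "aws_route53_zone" "main" {
  name = "mondomaine.fr"
}

resource "aws_acm_certificate" "site" {
  domain_name               = "mondomaine.fr"
  subject_alternative_names = ["api.mondomaine.fr"]
  validation_method         = "DNS"
}

# One validation record per name on the certificate, built from ACM's output
resource "aws_route53_record" "cert_validation" {
  for_each = {
    for dvo in aws_acm_certificate.site.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      record = dvo.resource_record_value
      type   = dvo.resource_record_type
    }
  }

  zone_id         = aws_route53_zone.main.zone_id
  name            = each.value.name
  type            = each.value.type
  records         = [each.value.record]
  ttl             = 60
  allow_overwrite = true
}

# Blocks until ACM reports the certificate as issued
resource "aws_acm_certificate_validation" "site" {
  certificate_arn         = aws_acm_certificate.site.arn
  validation_record_fqdns = [for r in aws_route53_record.cert_validation : r.fqdn]
}

# Alias "A" record at the zone apex, where a CNAME is not allowed
resource "aws_route53_record" "apex" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "mondomaine.fr"
  type    = "A"

  alias {
    name                   = var.cloudfront_domain_name
    zone_id                = "Z2FDTNDATAQYW2" # fixed zone ID for CloudFront alias targets
    evaluate_target_health = false
  }
}
```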
We created origin.mydomain.fr-style records in this zone, and my client did the required work on their web server to handle requests made to these addresses (including a TLS certificate, so that traffic between CloudFront and the origin is encrypted in transit).
Once this was tested, we switched the DNS entries for mondomaine.fr and api.mondomaine.fr to CloudFront.
To prevent WAF bypass (in case the attacker discovers the origin hostnames or simply targets my client's server IP directly), CloudFront was configured to send a "secret" header with each origin request, making it easy for the on-premises infrastructure to filter out any traffic that bypasses CloudFront.
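A minimal Terraform sketch of such a distribution: CloudFront fronts origin.mydomain.fr over HTTPS and attaches a shared-secret header to every origin request, which the on-premises web server can then require. The header name `X-Origin-Verify`, the three input variables and the managed policy choices are assumptions for illustration, not the client's actual configuration.

```hcl
# Sketch only: header name and variables are hypothetical.
provider "aws" {
  region = "us-east-1"
}

variable "origin_secret" {
  type      = string
  sensitive = true # keeps the shared secret out of plan output
}

variable "web_acl_arn" {
  type = string # ARN of a CLOUDFRONT-scoped WAFv2 web ACL
}

variable "acm_certificate_arn" {
  type = string # certificate for mondomaine.fr, issued in us-east-1
}

locals {
  origin_domain = "origin.mydomain.fr"
}

resource "aws_cloudfront_distribution" "site" {
  enabled    = true
  aliases    = ["mondomaine.fr", "api.mondomaine.fr"]
  web_acl_id = var.web_acl_arn # WAFv2 web ACLs are referenced by ARN

  origin {
    domain_name = local.origin_domain
    origin_id   = local.origin_domain

    custom_origin_config {
      http_port              = 80
      https_port             = 443
      origin_protocol_policy = "https-only" # encrypts CloudFront-to-origin traffic
      origin_ssl_protocols   = ["TLSv1.2"]
    }

    # Sent with every origin request so the origin can reject requests lacking it
    custom_header {
      name  = "X-Origin-Verify"
      value = var.origin_secret
    }
  }

  default_cache_behavior {
    target_origin_id         = local.origin_domain
    viewer_protocol_policy   = "redirect-to-https"
    allowed_methods          = ["GET", "HEAD", "OPTIONS", "PUT", "POST", "PATCH", "DELETE"]
    cached_methods           = ["GET", "HEAD"]
    cache_policy_id          = "4135ea2d-6df8-44a3-9df3-4b5a84be39ad" # Managed-CachingDisabled
    origin_request_policy_id = "b689b0a8-53d0-40ab-baf2-68738e2966ac" # Managed-AllViewerExceptHostHeader
  }

  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  viewer_certificate {
    acm_certificate_arn      = var.acm_certificate_arn
    ssl_support_method       = "sni-only"
    minimum_protocol_version = "TLSv1.2_2021"
  }
}
```

Caching is disabled here on purpose, matching the initial goal of using CloudFront as a filtering proxy rather than a cache; forwarding everything except Host lets CloudFront present origin.mydomain.fr to the origin.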
The result was immediate: we made the switch at 8 p.m., and the site immediately became fully available again. At 9 p.m., the attacker stopped the attack (and waited until the next day for their next attempt).
The image below shows allowed traffic in orange and blocked traffic in blue. We were receiving 6,000 requests per minute, more than twice the usual traffic:
A word on cost / FinOps
WAF costs $0.60 per million requests analyzed using basic managed rules (the group that includes all of our rules except one). That's less than $5 per day to protect my client.
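A quick sanity check of that figure, ignoring the fixed monthly fees per web ACL and per rule, and assuming usual traffic of at most 3,000 requests per minute (since the 6,000 requests per minute observed during the attack were more than twice the usual level):

$$
\begin{aligned}
6\,000~\tfrac{\text{req}}{\text{min}} \times 1\,440~\tfrac{\text{min}}{\text{day}} &= 8.64 \times 10^{6}~\tfrac{\text{req}}{\text{day}}
&&\Rightarrow\ 8.64 \times \$0.60 \approx \$5.18~\text{per day (attack level, sustained)} \\
3\,000~\tfrac{\text{req}}{\text{min}} \times 1\,440~\tfrac{\text{min}}{\text{day}} &= 4.32 \times 10^{6}~\tfrac{\text{req}}{\text{day}}
&&\Rightarrow\ 4.32 \times \$0.60 \approx \$2.59~\text{per day (usual traffic, upper bound)}
\end{aligned}
$$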
Be careful though! Advanced rules like Account Takeover Prevention are billed, after a free tier of 10,000 calls, at $1 per 1,000 calls (yes, 1,000, not 1,000,000).
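To put the two rates side by side (a rough illustration; the 100,000-call volume below is hypothetical, and the 10,000-call free tier is applied once):

$$
\frac{\$1 / 1\,000~\text{calls}}{\$0.60 / 1\,000\,000~\text{requests}} = \frac{1\,000\,000}{600} \approx 1\,667
$$

$$
\underbrace{\frac{100\,000 - 10\,000}{1\,000} \times \$1}_{\text{ATP}} = \$90
\qquad\text{vs.}\qquad
\underbrace{\frac{100\,000}{10^{6}} \times \$0.60}_{\text{basic rules}} = \$0.06
$$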
At first, our configuration looked like this:
In 24 hours, we burned $700 worth of WAF usage. Fortunately, I had set up cost anomaly alerts when designing the landing zone! A single support ticket (category “dispute a charge”) was all it took for AWS to graciously wipe the slate clean. [NB: in my experience, AWS always waives large bills resulting from configuration errors; this very good commercial policy is, along with the quality of their support, one of the reasons it is my favorite cloud provider.]
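For reference, a cost anomaly alert like this can be declared in Terraform with the Cost Explorer anomaly resources. This is a minimal sketch: the monitor and subscription names, the $100 impact threshold and the e-mail address are hypothetical, and a daily e-mail digest is only one of the available delivery options.

```hcl
# Sketch only: names, threshold and recipient are hypothetical.
provider "aws" {
  region = "us-east-1"
}

# Tracks spend per AWS service and learns its normal pattern
resource "aws_ce_anomaly_monitor" "services" {
  name              = "per-service-monitor"
  monitor_type      = "DIMENSIONAL"
  monitor_dimension = "SERVICE"
}

# Daily e-mail digest of anomalies whose total impact is at least $100
resource "aws_ce_anomaly_subscription" "alerts" {
  name             = "cost-anomaly-alerts"
  frequency        = "DAILY"
  monitor_arn_list = [aws_ce_anomaly_monitor.services.arn]

  subscriber {
    type    = "EMAIL"
    address = "finops@example.com"
  }

  threshold_expression {
    dimension {
      key           = "ANOMALY_TOTAL_IMPACT_ABSOLUTE"
      match_options = ["GREATER_THAN_OR_EQUAL"]
      values        = ["100"]
    }
  }
}
```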
In short, we fixed it by moving the ATP rule to the last position in priority order and, above all, by running it only on requests carrying a label set by another rule, which tags requests to the /connection path.
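A minimal Terraform sketch of that fix: a cheap COUNT rule that only labels requests to /connection, then the ATP rule group evaluated last, with a scope-down statement so it only evaluates labeled requests. The ACL name, the label name `login-path`, the form payload type and the field names are hypothetical.

```hcl
# Sketch only: ACL name, label name and form field names are hypothetical.
provider "aws" {
  region = "us-east-1" # CLOUDFRONT-scoped web ACLs are managed in us-east-1
}

resource "aws_wafv2_web_acl" "frontend" {
  name  = "frontend-acl"
  scope = "CLOUDFRONT"
  default_action {
    allow {}
  }

  # Matches the login path and only adds a label; COUNT never blocks
  rule {
    name     = "label-login-path"
    priority = 10
    action {
      count {}
    }
    rule_label {
      name = "login-path"
    }
    statement {
      byte_match_statement {
        search_string         = "/connection"
        positional_constraint = "STARTS_WITH"
        field_to_match {
          uri_path {}
        }
        text_transformation {
          priority = 0
          type     = "NONE"
        }
      }
    }
    visibility_config {
      cloudwatch_metrics_enabled = true
      metric_name                = "label-login-path"
      sampled_requests_enabled   = true
    }
  }

  # Highest priority number, so evaluated last; the scope-down statement
  # restricts ATP to requests carrying the label added above
  rule {
    name     = "atp"
    priority = 100
    override_action {
      none {}
    }
    statement {
      managed_rule_group_statement {
        vendor_name = "AWS"
        name        = "AWSManagedRulesATPRuleSet"
        managed_rule_group_configs {
          aws_managed_rules_atp_rule_set {
            login_path = "/connection"
            request_inspection {
              payload_type = "FORM_ENCODED"
              username_field {
                identifier = "email" # hypothetical form field name
              }
              password_field {
                identifier = "password" # hypothetical form field name
              }
            }
          }
        }
        scope_down_statement {
          label_match_statement {
            scope = "LABEL"
            key   = "login-path"
          }
        }
      }
    }
    visibility_config {
      cloudwatch_metrics_enabled = true
      metric_name                = "atp"
      sampled_requests_enabled   = true
    }
  }

  visibility_config {
    cloudwatch_metrics_enabled = true
    metric_name                = "frontend-acl"
    sampled_requests_enabled   = true
  }
}
```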
What a relief to see the traffic going through the ATP rule drop!
An additional benefit of CloudFront
After a well-deserved rest, it was time to deliver an additional benefit to my client: enabling CloudFront caching for all the static resources served by the application.
Thanks to Terraform, this is not very complicated: the following block caches all GIFs.
ordered_cache_behavior {
  path_pattern             = "*.gif"
  allowed_methods          = ["GET", "HEAD", "OPTIONS", "PUT", "POST", "PATCH", "DELETE"]
  cached_methods           = ["GET", "HEAD", "OPTIONS"]
  target_origin_id         = local.origin_domain
  viewer_protocol_policy   = "redirect-to-https"
  cache_policy_id          = aws_cloudfront_cache_policy.cachingoptimizez_with_v_header.id
  # Hard-coded managed policy "AllViewerExceptHostHeader":
  # forwards all viewer headers except Host, plus all cookies and query strings
  origin_request_policy_id = "b689b0a8-53d0-40ab-baf2-68738e2966ac"
}
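The same pattern could be extended to the other static resources. The fragment below, meant to sit inside the `aws_cloudfront_distribution` resource (it is not a standalone file), generates one ordered cache behavior per file extension with a Terraform dynamic block. The extension list is an assumption, it swaps the custom cache policy above for the AWS managed CachingOptimized policy, and it restricts methods to read-only ones since static files need nothing else.

```hcl
# Fragment: place inside an aws_cloudfront_distribution resource.
# Extension list is hypothetical; list order sets behavior precedence.
dynamic "ordered_cache_behavior" {
  for_each = ["*.gif", "*.png", "*.jpg", "*.css", "*.js"]
  content {
    path_pattern             = ordered_cache_behavior.value
    allowed_methods          = ["GET", "HEAD", "OPTIONS"]
    cached_methods           = ["GET", "HEAD", "OPTIONS"]
    target_origin_id         = local.origin_domain
    viewer_protocol_policy   = "redirect-to-https"
    cache_policy_id          = "658327ea-f89d-4fab-a63d-7e88639e58f6" # Managed-CachingOptimized
    origin_request_policy_id = "b689b0a8-53d0-40ab-baf2-68738e2966ac" # Managed-AllViewerExceptHostHeader
  }
}
```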
Here too, the effect was immediate: a few minutes later, almost 90% of requests were being served by CloudFront, taking a significant load off my client's infrastructure and improving full page-load times for visitors!
Let's talk!
If you need help migrating to the Cloud or helping your dev teams take advantage of the many services available, do not hesitate to contact me via LinkedIn or my website.
Top comments (2)
Great content! What about Shield Standard? In theory, it is enabled for any network edge solution on AWS.
Hi @acontreras_mp !
Shield Standard only protects your workloads against network-layer (layer 3) and transport-layer (layer 4) attacks, such as bots sending floods of TCP SYN packets.
In this case (setting aside the fact that the actual workloads didn't run on AWS), the attacker went to the trouble of sending application-level queries: they actually completed the TCP handshake, established a TLS-encrypted connection, and then sent an HTTP payload.
Overloading the server this way is more costly for the attacker, but requires fewer connections (thousands, not millions) than network-level attacks.
Such attacks are the ones WAF covers (I'm not familiar enough with Shield Advanced, which also includes WAF, to say how it would have blocked this one).