Frangeris Peguero for Cloud(x);

Posted on Nov 28, 2022

Reverse proxy cache using AWS CloudFront

#tutorial #aws #cache #cloudfront

How to implement a reverse proxy cache in front of any API using AWS CloudFront

A reverse proxy is an application that sits in front of back-end applications and forwards client (e.g. browser) requests to them. Reverse proxies help increase scalability, performance, resilience and security. The resources returned to the client appear as if they originated from the reverse proxy itself.

AWS CloudFront is a content delivery network (CDN) service built for performance and security. It offers many advantages, including a global edge network with low-latency, high-throughput connectivity, which is the one that matters to us here.

One typical case where we may need a reverse proxy cache is when building HTTP APIs (API Gateway v2) on AWS. This type of API is designed with minimal features so that it can be offered at a lower price, and it lacks options such as edge optimization, API keys, per-client throttling (usage plans) and caching (more detailed comparison here). Without a cache, every request has to be processed by the backend on the origin servers, which increases their load and results in high latency on every request.

As it turns out, CloudFront solves this problem nicely.

For simplicity, we will use Serverless Framework v3 to handle the AWS stack creation.

Start by creating an HTTP API (API Gateway v2)

We need a basic, working Serverless API deployment before we can configure a CloudFront distribution on top of it.

If you have no previous experience with Serverless, follow this link to learn how to do it; just remember to select "HTTP API", as it is the API type without built-in cache support.

Let's start by defining the type of API we need and a basic function to use as an example:

```yaml
# serverless.yml

provider:
  name: aws
  # ...
  httpApi:
    name: "myapi"
    cors: true

functions:
  hello:
    handler: src/handler.hello
    events:
      - httpApi:
          path: /
          method: get
```

As for the data we will return, let's run a process that sleeps for 5 seconds using the Timers API, to simulate a background process that "takes too long" to complete, something like this:

🎉 After a successful deployment, the API should be created and ready to use.

Now that the API is live, we can make requests by calling the endpoint provided:

```shell
curl --location --request GET 'https://644z4ooroe.execute-api.us-east-1.amazonaws.com/'
```

This returns the response we explicitly send back from our Lambda, but only after 5 seconds.

Notice that the request took 5.21 s to complete: most of that is the 5 s sleep we set up, and the rest is influenced by the Lambda spin-up time (known as a cold start). Consecutive requests hit an already warm Lambda and return slightly faster, but only by a few milliseconds.
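To measure these timings from code rather than from a GUI client, a small script can record the wall-clock time of a request. This is a sketch assuming Node.js 18+ (global `fetch` and `performance`); `timeRequest` and its injectable `fetchImpl` parameter are illustrative names, not part of the original project.

```javascript
// A sketch for timing a GET request, assuming Node.js 18+ (global fetch and performance).
async function timeRequest(url, fetchImpl = globalThis.fetch) {
  const start = performance.now();
  const res = await fetchImpl(url, { method: "GET" });
  // Read the whole body so the measurement covers the full response, not just the headers.
  const body = await res.text();
  const elapsedMs = performance.now() - start;
  return { status: res.status, elapsedMs, bodyLength: body.length };
}

// Usage: time two consecutive requests (replace the URL with your own endpoint).
async function main() {
  const url = "https://644z4ooroe.execute-api.us-east-1.amazonaws.com/";
  for (let i = 1; i <= 2; i++) {
    const { status, elapsedMs } = await timeRequest(url);
    console.log(`request ${i}: HTTP ${status} in ${elapsedMs.toFixed(0)} ms`);
  }
}
// main();
```

Running the same loop against the API endpoint and, later, against the CloudFront domain gives a like-for-like comparison of the two.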

So what happens if we cache this response, so we don't have to wait those 5 s of processing time?
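Conceptually, a caching reverse proxy stores the first response for a key and serves that copy again until its TTL expires. The toy, in-process sketch below illustrates only that idea; it is not how CloudFront is implemented, and `createTtlCache` and `fetchFromOrigin` are hypothetical names.

```javascript
// Toy in-memory TTL cache in front of an async "origin" function (illustration only).
function createTtlCache(fetchFromOrigin, ttlMs, now = () => Date.now()) {
  const entries = new Map(); // key -> { value, expiresAt }

  return async function get(key) {
    const entry = entries.get(key);
    if (entry && entry.expiresAt > now()) {
      // Fresh entry: answer without touching the origin.
      return { value: entry.value, cache: "Hit" };
    }
    // Miss or expired entry: call the origin and store the fresh response.
    const value = await fetchFromOrigin(key);
    entries.set(key, { value, expiresAt: now() + ttlMs });
    return { value, cache: "Miss" };
  };
}
```

In this sketch, concurrent misses for the same key all reach the origin; CloudFront instead collapses simultaneous requests for the same object into a single origin request.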

Let's configure CloudFront as a reverse proxy

The process consists of creating a distribution that uses the API domain as its origin, enabling the distribution's built-in cache, and controlling how long responses are cached through TTL (time to live) values.
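How long CloudFront keeps a response depends on both the cache policy TTLs and the caching headers sent by the origin. The function below is a simplified model of the expiration rules described in the CloudFront Developer Guide: it only considers `Cache-Control` `s-maxage`/`max-age` and `no-cache`/`no-store`/`private`, ignores `Expires` and other edge cases, and `effectiveTtlSeconds` is a hypothetical name.

```javascript
// Simplified model of CloudFront's TTL selection (ignores Expires and other edge cases).
// policy: { minTTL, defaultTTL, maxTTL } in seconds; cacheControl: origin header value or null.
function effectiveTtlSeconds(policy, cacheControl) {
  const { minTTL, defaultTTL, maxTTL } = policy;
  const directives = (cacheControl ?? "")
    .toLowerCase()
    .split(",")
    .map((d) => d.trim())
    .filter(Boolean);

  // no-cache / no-store / private: not cached (0) unless the policy forces a positive MinTTL.
  if (directives.some((d) => ["no-cache", "no-store", "private"].includes(d))) {
    return minTTL;
  }

  // Read a numeric directive such as "max-age=60"; undefined when absent.
  const readSeconds = (name) => {
    const d = directives.find((x) => x.startsWith(`${name}=`));
    return d === undefined ? undefined : Number(d.slice(name.length + 1));
  };
  // s-maxage (meant for shared caches) takes precedence over max-age.
  const originTtl = readSeconds("s-maxage") ?? readSeconds("max-age");

  // No usable caching directive from the origin: fall back to the policy's DefaultTTL.
  if (originTtl === undefined || Number.isNaN(originTtl)) {
    return defaultTTL;
  }
  // Otherwise clamp the origin's value into [MinTTL, MaxTTL].
  return Math.min(Math.max(originTtl, minTTL), maxTTL);
}
```

With the policy used later in this article (MinTTL 1, DefaultTTL and MaxTTL 86400) and an origin that sends no `Cache-Control` header, this model predicts that responses stay cached for a full day, so the TTL values are the main knob for how stale the API responses may get.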

Following the Amazon CloudFront resource type reference, we will create the distribution directly from the Serverless template and connect it to the previously created API as its origin.

We need two resources to create the distribution:

AWS::CloudFront::CachePolicy
AWS::CloudFront::Distribution

At the end of the serverless.yml file, let's add a new `resources` section. Resources declared there are created for us in AWS by the `sls deploy` command:

```yaml
resources:
  Resources:
    # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-cloudfront-cachepolicy.html
    mycachepolicy:
      Type: AWS::CloudFront::CachePolicy
      Properties:
        CachePolicyConfig:
          Name: mycachepolicy
          # Customize the TTL values (in seconds) below
          DefaultTTL: 86400
          MaxTTL: 86400
          MinTTL: 1
          ParametersInCacheKeyAndForwardedToOrigin:
            EnableAcceptEncodingGzip: true
            EnableAcceptEncodingBrotli: true
            CookiesConfig:
              CookieBehavior: none
            HeadersConfig:
              HeaderBehavior: none
            QueryStringsConfig:
              QueryStringBehavior: none

    # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-cloudfront-distribution.html
    mydistribution:
      Type: AWS::CloudFront::Distribution
      Properties:
        DistributionConfig:
          Enabled: true
          Origins:
            # HttpApi is generated by Serverless; its ApiEndpoint looks like
            # https://<api-id>.execute-api.<region>.amazonaws.com, and DomainName must not
            # include the "https:" scheme, so we keep only the part after "//".
            # If you have a custom API domain, put it here instead.
            - DomainName:
                !Select [1, !Split ["//", !GetAtt HttpApi.ApiEndpoint]]
              # This ID is repeated in TargetOriginId below; consider moving it to a custom variable
              Id: mydistributiondomainid
              CustomOriginConfig:
                OriginProtocolPolicy: https-only
          DefaultCacheBehavior:
            # TTLs come from the cache policy above (the DefaultTTL field here is deprecated)
            CachePolicyId: !Ref mycachepolicy
            TargetOriginId: mydistributiondomainid
            ViewerProtocolPolicy: https-only
            # HTTP methods CloudFront accepts and forwards to the origin; only GET (and HEAD) for our case
            AllowedMethods:
              - GET
              - HEAD
          # PriceClass_All serves from all edge locations (recommended)
          PriceClass: PriceClass_All
```
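The `DomainName` expression in the template splits the `ApiEndpoint` attribute (which looks like `https://<api-id>.execute-api.<region>.amazonaws.com`) on `"//"` and selects element 1, because a CloudFront origin needs a bare host name without the scheme. The same transformation in plain JavaScript, as an illustration only; `originDomainFromEndpoint` and `originDomainViaUrl` are hypothetical helpers, not something Serverless or CloudFormation provides.

```javascript
// Mirrors !Select [1, !Split ["//", endpoint]]: split on "//" and keep element 1.
function originDomainFromEndpoint(apiEndpoint) {
  const parts = apiEndpoint.split("//");
  if (parts.length < 2 || parts[1] === "") {
    throw new Error(`Unexpected endpoint format: ${apiEndpoint}`);
  }
  return parts[1];
}

// The WHATWG URL API yields the same host and rejects strings that are not valid URLs.
function originDomainViaUrl(apiEndpoint) {
  return new URL(apiEndpoint).host;
}
```

The split-based version depends on the endpoint having no path; that holds for the `ApiEndpoint` value, which is why the short `!Split` expression is enough in the template.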

Don't forget to redeploy (`sls deploy`) to update the AWS stack with the new config. If everything works, we should be able to send requests to the CloudFront URL, and CloudFront will cache the responses from the origin.
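Because the cache policy sets `CookieBehavior`, `HeaderBehavior` and `QueryStringBehavior` to `none`, the cache key is essentially host plus path: every query-string, header and cookie variant of the same path shares one cached response, and none of those values are forwarded to the origin. A simplified model of the URL part of that key; `cacheKeyFor` and `groupByCacheKey` are hypothetical helpers, `d111111abcdef8.cloudfront.net` is a placeholder domain, and the real key also includes the Accept-Encoding normalization enabled by the gzip/Brotli settings.

```javascript
// Simplified cache key for a policy with cookies, headers and query strings all set to "none".
// Hypothetical helper: models only the URL part of CloudFront's cache key.
function cacheKeyFor(requestUrl) {
  const url = new URL(requestUrl);
  // The query string is dropped, so ?page=1 and ?page=2 map to the same entry.
  return `${url.host}${url.pathname}`;
}

// Group request URLs by the cache entry they would share.
function groupByCacheKey(urls) {
  const groups = new Map();
  for (const u of urls) {
    const key = cacheKeyFor(u);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(u);
  }
  return groups;
}
```

If an API relies on query strings, headers or cookies, they need to be included in the cache policy; otherwise the origin never sees them and every variant receives the same cached response.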

⚡️ Here are the final requests: the first goes to our origin, the second to the CloudFront cache.

🤯 As we can see, just enabling a cache makes the response time dramatically lower in comparison.

Remember that the very first request (a CloudFront cache miss) takes as long as a request to the origin, because it is the request that populates the cache.
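To check which requests were actually served from the cache, CloudFront adds an `X-Cache` response header with values such as `Miss from cloudfront`, `Hit from cloudfront` or `RefreshHit from cloudfront`. A small sketch that classifies those values; `describeCacheStatus` is a hypothetical helper, and the usage line assumes Node.js 18+ for the global `fetch`.

```javascript
// Classify CloudFront's X-Cache header, e.g. "Hit from cloudfront" -> "Hit".
function describeCacheStatus(xCacheHeader) {
  if (!xCacheHeader) {
    // No header at all usually means the response did not come through CloudFront.
    return { status: "unknown", servedFromCache: false };
  }
  const status = xCacheHeader.trim().split(/\s+/)[0];
  return {
    status,
    // Hit: served from the edge cache. RefreshHit: cached copy revalidated with the origin, then served.
    servedFromCache: status === "Hit" || status === "RefreshHit",
  };
}

// Usage (replace with your distribution's domain):
// fetch("https://<your-distribution>.cloudfront.net/")
//   .then((res) => console.log(describeCacheStatus(res.headers.get("x-cache"))));
```

Sending the same request twice should show `Miss` on the first response and `Hit` on the second, matching the timing difference described above.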

All the code is available here if you want to test it.

Hope it helps, cheers 🍻
