Andrew

Posted on May 7, 2020

What's the Best Way to Set The Cache-Control Header?

#cache #cachecontrol

As a web developer, we spend lots of time minimizing our files and API response in order to allow our customers to have a better user experience. What if we can make it better by reducing the response to just a few hundred bytes or even zero? Implementing a better cache control policy will help us to reach that goal. In this article, I will list a few strategies about how to properly set the response header to allow the browser to handle the cache for us.

Gotchas

Before we start, there are a few gotchas I want to point out first.

When we are testing our cache setting, pressing "enter" on the address bar rather than refresh page with F5. Because some browsers will send the request with header Cache-Control max-age=0 and overwrite our cache policy when we refresh the page. Check here for more information.
Not all CDN providers follow the cache-control header (eg. some CDN providers follow max-age rather than s-maxage), you might want to check with your CDN provider before you start modifying the settings.

HTTP header - Cache-Control

There are tons of settings for cache-control. To make this article easier to understand, here I only mention the attributes that I'm going to apply to my examples. If you're interested to know more about that. You can check here to gain more information.

Cacheability

public: The response can be stored by proxy or browser
no-cache: The response can be stored by proxy or browser, but the stored response MUST always go through validation with the origin server first before using it.

Expiration

max-age: The maximum amount of time a resource is considered fresh.
s-maxage: Overrides max-age or the Expires header, but only for shared caches (e.g., proxies).

Revalidation and reloading

must-revalidate: Indicates that once a resource becomes stale, caches must not use their stale copy without successful validation on the origin server.
immutable: Indicates that the response body will not change over time. The resource, if unexpired, is unchanged on the server and therefore the client should not send a conditional revalidation for it.

HTTP header - ETag

The ETag HTTP response header is an identifier for a specific version of a resource. It lets caches be more efficient and save bandwidth, as a web server does not need to resend a full response if the content has not changed.

	Request	Response
1st request	GET /	200 OK Cache-Control: max-age=0 ETag: W/"a46ba20afd"
ETag haven't changed	GET / If-None-Match: W/"a46ba20afd"	304 Not Modified
ETag was changed	GET / If-None-Match: W/"a46ba20afd"	200 OK Cache-Control: max-age=0 ETag: W/"7a48833148"

Another use case about ETag is avoiding mid-air collisions, it's not related to cache control so I won't dive into that. Check here if you're interested.

Cache Strategies

Apply cache can not only save the bandwidth but also prevent the latency to allow the user to get the result faster.
But at the same time, we also want to make sure our users can always see the latest version of the application. Therefore, it's extremely important to config the cache-control header properly. Normally, we need to base on the type of our pages and files to apply different cache strategies. Here I list a few types of the setting for different scenarios.

1. Always Revalidation

If the page is changed frequently, like the list page for the e-commerce website. Then we should let the user check if there is any new information on every request. Therefore, we can set the cache-control as below.



Cache-Control: public, max-age=0, must-revalidate;

The behavior will looks like the table below.

	Request	Response
1st request	GET /	200 OK ETag: W/"a46ba20afd"
next request and ETag haven't changed	GET / If-None-Match: W/"a46ba20afd"	304 Not Modified
another request after ETag was changed	GET / If-None-Match: W/"a46ba20afd"	200 OK ETag: W/"7a48833148"

You can play around with this demo page. The etag will change every 10 seconds.

2. Long Term Caching

For some content that is hardly changed, then we can set a longer max-age and add the immutable property. This will allow the browser to use the catch from disk or memory if the age is within the range. Therefore, we can prevent sending an unnecessary request for saving bandwidth and also reduce the latency.



Cache-Control: public, max-age=604800, immutable;

The behavior will looks like the table below.

	Request	Response
1st request	GET /	200 OK
within max-age	N/A	200 (from disk cache)
over max-age	GET /	200 OK

You can play around with this demo page. The max-age is 30 seconds.

Long Term Caching is suited for static files like javascript, css, or images. But what if we need to change it within the max-age? Normally, when we apply the Long Term Caching, we will add a hash string into the file name (eg. filename.[hash].js), So we can force the browser request the new file when we change the content.

Benefits

Now let's talk about the benefits of cache-control. The biggest reward will be the cost-saving on the bandwidth. In reference to AWS CloudFront, the transmission cost is around $ 0.1/GB. But how much bandwidth can we save after applying a different cache-control policy?

1. Always Revalidation

If we apply the Always Revalidation policy, our server will respond 304 rather than the full content if the browser already has the latest content and the content isn't changed. Therefore, the bandwidth we can save is based on how many users will revisit the website and how often the content changes.

For example, if our website's content changes once per week. And every week, about 10% of pageview is coming from the revisit user. Then the total bandwidth we can save will be around 10% (if we ignore 304 response size).

2. Long Term Caching

If we apply the Long Term Caching policy, the browser will use the cache directly without sending the validating request. So we save more bandwidth. The drawback is that we will have to change the file name if we need to change the content within the cache time (max-age).

Example

If you still feel a little confused about cache-control, let me try to explain in an example.
If our website has only one page. It contents one html, one css and one javascript file. And Bob is our loyal user who needs to reference the information on our website every day.

The content change about one time per week (change html)
We release a new version every week (change html, css, js)
We upgrade third party libraries every half year (change js )

The table below is how many times Bob needs to download our files in a year. We can see how much bandwidth we can save after applying a better cache-control policy.

	size	without cache	Always Revalidation	Long Term Caching
index.html	50KB	*365	*52	*52
index.css	50KB	*365	*52	*12
index.js	100KB	*365	*52	*12
Total		71.29MB (100%)	10.16MB (14.2%)	4.3MB (6%)

Moreover, because we only update third party libraries twice per year, so if we separate our javascript file into two files, one is our own script - main.js, another is third party library - vendor.js. Then we will be able to save more bandwidth. You can search code-splitting to get more information about it.

		Long Term Caching + code splitting
index.html	50KB	*52
index.css	50KB	*12
main.js	50KB	*12
vendor.js	50KB	*2
Total		3.8MB (5.3%)

Conclusion

That's all. Thanks for reading, I hope this article can help you have a better understanding of cache-control.

DEV Community

What's the Best Way to Set The Cache-Control Header?

Agenda

Gotchas

HTTP header - Cache-Control

Cacheability

Expiration

Revalidation and reloading

HTTP header - ETag

Cache Strategies

1. Always Revalidation

2. Long Term Caching

Benefits

1. Always Revalidation

2. Long Term Caching

Example

Conclusion

Reference

Top comments (0)