Ján Regeš

Posted on Apr 9, 2021 • Edited on Jan 12, 2024

How to build a CDN (1/3): introduction and basic components

#webdev #webperf #devops #performance

If your projects have high traffic and you need to deliver a lot of static files, there is nothing easier than getting a commercial CDN. But if you’re a technology enthusiast like us, you can build a real CDN yourself.

This is the first in a series of three articles. It introduces the topic and describes the basic components that make up a CDN (Content Delivery Network). The next two articles will describe the technologies used and their configurations, as well as various other tips regarding the operation, functionality and monitoring of the CDN.

Motivation to use CDN

The main motivation for using a CDN is clear: to ensure fast and reliable loading of the website and its content for all visitors around the world. But if you are responsible for operating projects with millions of users per month, where the traffic from JS/CSS files alone amounts to tens of TB, then sooner or later you will reach a state where your 1 Gbps internet connection simply stops being sufficient.
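To get a feeling for when a 1 Gbps line becomes too small, the monthly transfer volume can be converted into an average bit rate. The following is only a back-of-the-envelope sketch: the function names and the 3x peak-to-average ratio are assumptions chosen for illustration, not values measured on our projects.

```python
SECONDS_PER_DAY = 86_400


def average_mbps(terabytes_per_month: float, days: int = 30) -> float:
    """Average throughput in Mbit/s for a monthly volume given in decimal TB."""
    bits = terabytes_per_month * 1e12 * 8
    return bits / (days * SECONDS_PER_DAY) / 1e6


def peak_mbps(terabytes_per_month: float, peak_to_average: float = 3.0) -> float:
    """Rough peak estimate; peak_to_average is an assumed ratio, not a measurement."""
    return average_mbps(terabytes_per_month) * peak_to_average


if __name__ == "__main__":
    for tb in (30, 100):
        print(f"{tb} TB/month: {average_mbps(tb):.0f} Mbit/s average, "
              f"{peak_mbps(tb):.0f} Mbit/s at an assumed 3x peak")
```

Under these assumptions, 100 TB per month averages roughly 309 Mbit/s, so a threefold peak already comes close to saturating a 1 Gbps line.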

Websites are usually composed of dynamic and static content. Dynamic content usually includes generated HTML code and data from various APIs (typically REST or GraphQL). Static content is made up of files such as JavaScript, stylesheets, images, fonts, or audio/video. A typical ratio for our projects is that dynamic content makes up 10% and static content 90% of the total data transfer.

If you have a really high number of visitors, introducing a rule that static files are cached in the browser for one year will not help you much. Changing the contents of a file then requires a new file name or some “version” in the query parameter to force the browser to download the new file. If you release every few days, then even if you use JS/CSS chunks, at least some part of the JS/CSS will be recompiled and every visitor must download it again.
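One common way to get a “new name” on every content change is to embed a hash of the file content in the file name. The sketch below only illustrates the idea; the helper name `hashed_name` and the 8-character hash length are assumptions, and real build tools usually do this for you.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(path: str, content: bytes, length: int = 8) -> str:
    """Return the path with a short content hash inserted before the extension."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


if __name__ == "__main__":
    # The same content always maps to the same name, so unchanged files
    # stay cached; any change in content produces a new URL.
    print(hashed_name("/static/app.js", b"console.log('v1');"))
    print(hashed_name("/static/app.js", b"console.log('v2');"))
```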

Then, when you reach a gigabit at the peak of traffic, you start to think about what to do next and thus look for a CDN.

The main benefits of CDN from our point of view

Speed for global visitors: if your project is hosted on servers in only one country, page load time grows with the visitor’s distance from them. The further away in the world, the slower the page appears on the screen. The reason is higher latency and lower throughput. But be careful here: if the vast majority of your visitors come from your own country (for us it is the Czech Republic), make sure that your CDN provider has servers (PoPs) there as well. Otherwise, loading may actually slow down for your primary users after you deploy the CDN. The Czech Republic is a small country, but it has top-quality data centers and connectivity providers. Loading content only from PoPs in foreign countries would put Czech visitors at a disadvantage.
Speed for local visitors: all browsers limit the maximum number of concurrent connections per host. If the browser can load content from several different domains and IP addresses (domain sharding), it can parallelize more and the content loads faster. This is especially important for JS/CSS/images and fonts that are part of the initial rendering of the page. HTTP/2 with multiplexing helps to solve this problem very well, but only to a certain extent. Based on our real request/rendering tests, we conclude that even with HTTP/2, where dozens of files share one connection, the page is displayed more slowly than when a CDN on a domain other than that of the site itself is involved.
Reduced load on primary servers: if you don’t have a CDN, your primary servers and their connectivity must handle both dynamic content requests and relatively trivial static file requests. This is inefficient because the optimal server configuration for dynamic content differs from the one for serving static files.
Content optimization: a good CDN also provides tools for optimizing static content (Brotli compression, WebP or AVIF images). As a result, less data is transferred and pages load faster.
Cost savings: even though it has long been possible to get an almost unlimited “thick” line to your primary servers, the price jumps are quite drastic. Why pay for 10 Gbit when 1 Gbit is enough for us 90% of the time?
Simplifying the life of DevOps: to fine-tune the configuration of file/web and application servers for maximum performance and security, you need all possible metrics from real operation. If the traffic for dynamic and static content is strictly separated, the statistics are cleaner. You can then make better decisions and tune performance and security parameters for the specific workload.

Why we decided to build our own CDN

There are many commercial CDNs on the market; see, for example, CDNPerf. The best known include Cloudflare, Amazon CloudFront, Google, Fastly, Akamai, KeyCDN, and our favorite and recommended BunnyCDN and CDN77.

Our projects are most often visited by clients from the Czech and Slovak Republics. For such a case, CDN77 and their awesome network are unrivaled in terms of features, price and immediate professional support. It is also one of the best CDNs for covering traffic from around the world. Another of their strengths is video streaming for the world’s largest high-traffic projects.

Because we don’t want to reinvent the wheel at SiteOne, we first looked to see if any of the above-mentioned providers would suit us. Our requirements were:

100 TB data transfer per month (majority from Europe).
Low latency and fast transfers in the Czech Republic/Slovakia.
Very good coverage of the whole of Europe, good coverage of North America and sufficient coverage of other continents.
HTTP/2 (and fast deployment of HTTP/3 after it is more standardized).
Brotli compression, which is 15% to 30% more efficient on text files than gzip (LZ77 + dictionary).
Automatic JPG/PNG → WebP/AVIF conversion, if supported by the browser (reduces data transfer without noticeable loss of quality; depending on how well the source JPG/PNG has already been optimized, the source can be 30% to 1,000% larger than the converted file).
TLSv1.3 with 0-RTT (zero round-trip), which significantly reduces the handshake time between browsers and servers.
API for selective cache invalidation using regular expressions. Ideally with support for cache tagging by response HTTP header like X-Cache-Tags or X-Key.
DDoS protection & Web Application Firewall (WAF)
Access to logs and statistics.
100 GB of storage (typically for videos and large image libraries).
Custom HTTPS certificates.
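The “if supported by the browser” condition in the image conversion requirement is usually decided from the request’s Accept header. The following is a minimal sketch of that decision, not the logic of any particular CDN; the function name and the AVIF-before-WebP preference are assumptions, and q-values in the header are deliberately ignored.

```python
def pick_image_format(accept_header: str, source_format: str = "jpeg") -> str:
    """Choose the image format to serve, based on the media types the client lists."""
    offered = {
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
    }
    # Assumed preference order: AVIF first, then WebP, else the original format.
    for candidate in ("avif", "webp"):
        if f"image/{candidate}" in offered:
            return candidate
    return source_format


if __name__ == "__main__":
    print(pick_image_format("image/avif,image/webp,image/apng,*/*;q=0.8"))
    print(pick_image_format("image/webp,*/*"))
    print(pick_image_format("*/*"))
```

Note that a cache serving different formats under one URL also has to vary its cache key on this decision, otherwise one visitor could receive a format their browser cannot display.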

Finding a provider that meets most requirements was not a problem. The problem was the price. For progressive players (such as BunnyCDN or CDN77) you can buy a service for about 1 000 EUR/month, for other leaders in the CDN market, the costs start at 3–4 000 EUR/month and increase in multiples. If you start working with such amounts as a budget for building your own CDN, the return on investment (ROI) will become more than interesting. Of course, there are other price-friendly providers on the global market, but usually their coverage in the Czech Republic/Slovakia is very weak, so they cannot be recommended for primarily local projects.

Combining the above requirements with our enthusiasm for IT challenges, we came to the conclusion that we would build our own CDN. The result is not nearly as robust as the CDNs of commercial providers, but it is our own and it meets all our requirements. A big advantage is that we can scale very quickly according to our real needs, at low cost.

Another motivation for our own CDN is that we have used GraphQL for all web projects in recent years. Unlike REST, GraphQL responses cannot simply be cached on a reverse proxy or CDN, because everything is a POST request to one single URL endpoint. There are already some attempts to solve this, but as far as we know no commercial CDN offers sophisticated caching of POST requests. We have types of projects where clever selective caching of POST requests at the CDN level (probably written in Lua) could greatly relieve the application servers. For us, this is another useful benefit that commercial CDNs are unlikely to offer for a long time.
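To make the idea of caching POST requests more concrete: since the URL is always the same, the cache key has to be derived from the request body. The sketch below shows one possible way to do that and is purely illustrative (the article mentions Lua for a real implementation; Python is used here only for readability). The function name and the rule “never cache mutations or subscriptions” are assumptions.

```python
import hashlib
import json
from typing import Optional


def graphql_cache_key(body: bytes) -> Optional[str]:
    """Return a cache key for a GraphQL POST body, or None if it must not be cached."""
    try:
        payload = json.loads(body)
    except ValueError:
        return None  # not valid JSON: pass the request through uncached
    if not isinstance(payload, dict):
        return None  # e.g. batched requests are out of scope for this sketch
    query = payload.get("query")
    if not isinstance(query, str):
        return None
    # Only read-only operations are candidates for caching.
    if query.lstrip().startswith(("mutation", "subscription")):
        return None
    canonical = json.dumps(
        {"query": query, "variables": payload.get("variables") or {}},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Sorting the keys means that two requests with the same query and the same variables in a different order share one cache entry. A real deployment would also have to include anything that changes the response (for example authorization or language) in the key.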

At the end of this chapter, it should be noted that our CDN is designed primarily for handling static files and that deploying it on a website does not require any DNS changes for the origin domain. Our CDN therefore does not serve as a proxy for absolutely all requests to the website (which is the usual way of deploying commercial CDNs), only for static files. To deploy our CDN, it is necessary to prefix file paths with our main CDN domain. This can also be done very easily without touching the application itself, e.g. using an output filter in Nginx (sub_filter).
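What such an output filter does can be sketched as a simple rewrite of the generated HTML. This is only an illustration of the principle in Python, not our Nginx configuration; the regular expression, the list of file extensions and the function name are assumptions, and “cdn.company.com” is the example domain used in this article.

```python
import re

# Matches src/href attributes that point to root-relative static files.
# Protocol-relative URLs ("//host/...") and absolute URLs are left alone.
STATIC_ATTR = re.compile(
    r'''(?P<attr>\b(?:src|href)=["'])(?P<path>/(?!/)[^"'?#]+\.(?:js|css|png|jpe?g|gif|svg|woff2?))(?=["'?#])''',
    re.IGNORECASE,
)


def prefix_static_urls(html: str, cdn_base: str = "https://cdn.company.com") -> str:
    """Prefix root-relative static file URLs in HTML with the CDN base URL."""
    return STATIC_ATTR.sub(
        lambda m: f"{m.group('attr')}{cdn_base}{m.group('path')}", html
    )


if __name__ == "__main__":
    page = '<script src="/js/app.js?v=3"></script><a href="/about">About</a>'
    print(prefix_static_urls(page))
```

Links to ordinary pages stay untouched, so only static files are loaded through the CDN while the site itself keeps its own DNS.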

CDN components

In order for our CDN to meet all the required parameters, we first had to provide all the components and processes that are needed to operate a quality CDN. And of course learn some new areas. Because we manage more than 120 servers for our other projects, we had everything we needed to handle it technically and procedurally.

The following chapters describe in more detail the individual components of the CDN that you will need:

Domain — used mainly for configuring GeoDNS rules and possible referencing of other domains via CNAME.
GeoDNS — a network service that will direct visitors to the nearest servers according to your settings and requirements.
Servers — strategically located around the world, in order to minimize latency for visitors and maximize transfer speed.
Technologies and their configurations — fine-tuned operating system and reverse proxy with caching and content optimization (brotli compression, WebP, AVIF).
Operational tools — you will have many servers and need to solve orchestration, backup, monitoring, metrics, logs and much more.
Auxiliary applications — background processes that provide, for example, static brotli compression or conversion of images to WebP/AVIF.

Domain

First, choose and buy the second-level domain on which you will run the CDN. Ideally, choose a domain that lets you achieve “cookie-less” requests, i.e. one to which the browser does not send the cookies of your websites. During heavy traffic, every byte saved counts. In the examples in this article we will use “company.com” and its subdomain “cdn.company.com”.

You will manage the DNS zone file for this domain with the GeoDNS provider(s) of your choice.

Get an SSL/TLS certificate for the domain, whether from Let’s Encrypt or a commercial Certification Authority (CA). Consider a wildcard certificate, which will make your life easier if you use more than one subdomain. You can get trusted wildcard certificates from as little as 40 USD/year. I recommend, for example, ssl2buy.com, and spending a few seconds googling for a discount code. You will often get an identical certificate from the same CA for 30–40% of the price charged elsewhere.

To prevent attackers from spoofing other IP addresses for your domain, set up DNSSEC for it. Check the DNS configuration yourself with the Zonemaster tool from CZ.NIC. We had to temporarily deactivate DNSSEC on our CDN because we use two DNS providers in primary-primary mode (GeoDNS rules and failovers are defined differently at each of them). In this mode, setting up DNSSEC at both providers is difficult because they would both have to share the same private key or use some other solution. So far, this requires a manual intervention that is complicated for the providers, but they have promised to allow it in the future.

Whether you use this domain directly in URLs, or only as a hostname to which other domains are pointed via CNAME, is up to you.

GeoDNS with failover support

What you need GeoDNS for

A critical component of a real CDN is what we will call GeoDNS. You can also find it under the names IP intelligence, GeoIP, geo-based routing, latency-based routing, etc.

GeoDNS is a network service that translates a domain name into one or more IP addresses, taking into account the location/country from which the visitor comes. If you are interested in the details, you can study RFC 7871 (Client Subnet in DNS Queries).

As the administrators of the GeoDNS settings of our CDN domain, we can define rules that say to which IP addresses (PoPs in specific countries) the traffic from which continents/countries should be directed. To be precise: a PoP (Point of Presence) can technically be a single server, or several servers behind a load balancer (typically HAProxy, for example).

Because we needed to rent servers abroad from various providers, some of which we do not have many years of experience with, we needed to guarantee high availability. A critical GeoDNS feature is therefore automatic failover: the ability to monitor the availability of individual PoPs and to immediately remove or replace unavailable or non-functional PoPs in the CDN topology.

In practice, our status URL is monitored every minute on each PoP. When it starts to fail from more than one monitoring location at once, the configured failover scenario is automatically activated. Depending on what we have chosen for the given PoP, it takes one of two main forms:

Deactivation of the DNS record: traffic is then directed only to the second PoP in the given location (if there is one), or visitors are directed to the default PoPs (in our case all in the Czech Republic).
Replacing the IP address with another: with this setting you can say “If the PoP in Paris goes down, let French traffic go to the nearby PoP in the Netherlands instead, and if that one happens to be down too, to the PoP in Germany”.

Thanks to the one-minute TTL, a truly non-functional PoP is deactivated or replaced for all end visitors within 2–3 minutes at the latest. However, if you have at least two PoPs defined for each location (DNS resolves to 2 IP addresses), browsers can cope with such an outage themselves, as we describe in the next chapter, and visitors may not even notice the critical 2–3 minutes. If you have only one PoP defined for a location and no backup PoP for it, then in case of failure visitors from this location are routed to the default PoPs, which are set for “the rest of the world”.
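The two failover forms and the fallback to default PoPs can be summarized in a few lines of code. The sketch below only models the decision that the GeoDNS provider makes for us; it is not the provider’s API. All names (`answer_for`, the PoP labels, the data layout) are hypothetical.

```python
from typing import Dict, List, Set


def answer_for(
    region: str,
    primary: Dict[str, List[str]],
    replacements: Dict[str, List[str]],
    healthy: Set[str],
    default_pops: List[str],
) -> List[str]:
    """Return the PoPs that DNS should answer with for a region."""
    answer: List[str] = []
    for pop in primary.get(region, []):
        if pop in healthy:
            candidates = [pop]
        else:
            # Replacement form: the first healthy PoP from the configured chain.
            # An empty chain means the record is simply deactivated.
            candidates = [r for r in replacements.get(pop, []) if r in healthy][:1]
        for candidate in candidates:
            if candidate not in answer:
                answer.append(candidate)
    # Nothing left for the region: fall back to the "rest of the world" PoPs.
    return answer or [p for p in default_pops if p in healthy]


if __name__ == "__main__":
    primary = {"FR": ["paris"]}
    replacements = {"paris": ["amsterdam", "frankfurt"]}
    defaults = ["prague-1", "prague-2"]
    up = {"amsterdam", "frankfurt", "prague-1", "prague-2"}  # paris is down
    print(answer_for("FR", primary, replacements, up, defaults))
```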

Even with a one-minute TTL, it is necessary to think about the speed of DNS resolution, which also has a significant effect on page load speed. We therefore recommend that you choose a DNS provider that has anycast name servers (NS) worldwide. Cloudflare leads the speed rankings, see the benchmarks on DNSPerf.com. With a global DNS provider, you can be sure that your domain will be resolved within a few milliseconds to a few tens of milliseconds around the world.

Browsers also help with high availability

Because high availability is essential for us, we rely on the native ability of browsers to work with a domain that resolves to multiple IP addresses, which in our case belong to different providers in all major locations. In practice, the browser randomly selects one of the IP addresses and tries to make requests to it. If that IP address is unavailable, the browser tries another IP address after a few seconds.
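The behavior described above can be imitated with a few lines of client code. This is not how any particular browser is implemented (we do not know those details); it is only a sketch of the general “try the next address” idea. The function name, the 3-second timeout and the injectable `resolver`/`connector` parameters are assumptions, and unlike a browser the sketch tries the addresses in the order returned instead of picking one at random.

```python
import socket


def connect_first_available(
    host,
    port=443,
    timeout=3.0,
    resolver=socket.getaddrinfo,
    connector=socket.create_connection,
):
    """Try each resolved IP address in turn and return (ip, connection) for the first that works."""
    last_error = None
    tried = []
    for *_, sockaddr in resolver(host, port, type=socket.SOCK_STREAM):
        ip = sockaddr[0]
        if ip in tried:
            continue
        tried.append(ip)
        try:
            return ip, connector((ip, port), timeout)
        except OSError as exc:
            # This address is unreachable or timed out: move on to the next one.
            last_error = exc
    raise ConnectionError(f"no address for {host} accepted a connection") from last_error
```

With two A records per location, a dead PoP costs the client one timeout instead of a failed page load.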

Failure of one of the IP addresses / servers / providers will therefore not make the requested content unavailable. It will only take a little longer to load the page. Today’s browsers are already really smart and very helpful in terms of outage detection, connection recovery and auto-retry logic. The driving force in this area is mainly mobile/portable devices, which see frequent short outages when switching between base stations (BTS) in mobile networks, alternating with WiFi networks, etc.

Unfortunately, we have not yet found any publicly available information/specifications that describe exactly how this behavior is implemented in individual browsers. We therefore rely only on our own tests and analyses of the behavior of current browser versions.

If you have studied this unique issue, share the information in the discussion with us :-)

Which GeoDNS provider to choose?

There are many GeoDNS providers to choose from — it is worth mentioning Amazon Route53, ClouDNS, NS1, Constellix GeoDNS, FastDNS from Akamai, EasyDNS, UltraDNS from Neustar or DNS Made Easy.

For the sake of high availability, we do not recommend relying on only one DNS provider, even if it has name servers worldwide with anycast IP addresses. The distribution of changes is usually handled by one “central brain”, and once every few years a defect occurs that eventually affects several or all name servers at once (real experience from 2019). Therefore, we decided to go the route of a redundant primary-primary setup, where we run all GeoDNS settings at two completely independent providers.

This is a bit annoying, because the AXFR protocol for synchronizing DNS zones does not cover GeoDNS rules, so we have to manage everything manually at two independent providers. We tested six GeoDNS providers, and given how differently each of them approaches “GeoDNS rule modeling” and monitoring, we cannot imagine anyone proposing a uniform specification that would allow GeoDNS zones to be synchronized.

At SiteOne, we chose ClouDNS as our first GeoDNS provider. It offers excellent options for setting up the “geo rules” themselves and automatic failover with multiple behavior options. The provider has DDoS protection, anycast IP addresses and low latency from the Czech Republic/Slovakia. It also provides traffic statistics and has very decent limits and pricing with respect to the number of DNS requests (the basic GeoDNS package includes 100M queries per month).

A big advantage is the 24/7 chat support, which can answer technical questions in a matter of minutes or tailor a pricing plan for you, even if you do not fit into any of the pre-prepared packages.

As the second DNS provider, we chose Constellix (a sister company of DNS Made Easy), which offers similar options for GeoDNS rules, monitoring and failover as ClouDNS. The strength of Constellix is the definition of weights (traffic distribution) in some situations.

At first, we also liked Microsoft Azure and its Traffic Manager, but in the end we gave it up because it didn’t give us the ability to manage traffic in some countries the way we wanted. However, Azure pleasantly surprised us with its pricing policy in the area of DNS compared to other global cloud providers, such as Amazon or Google.

Route53 from Amazon is also worth considering; it is more cost-effective if DNS resolves to IP addresses in AWS. However, if you send tens of TB or more from AWS per month, expect monthly costs in the thousands of USD/EUR. At that point you pay the same as, or more than, you would for conveniently renting a commercial CDN.

For all GeoDNS providers, however, the price depends mainly on the number of DNS requests and on the number and frequency of health checks. In other words, it depends on the number of PoPs in your CDN, on how many places around the world you monitor them from to eliminate false positives and, of course, on the monitoring frequency, which can usually be set from 30 seconds to tens of minutes (our default is one minute). You can also reduce the price for DNS requests many times over by increasing the TTL of individual DNS records. This, of course, comes at the expense of the speed of a possible auto-failover, because recursive name servers will keep the answers in their cache longer.
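The two cost drivers can be estimated with simple arithmetic. In the sketch below, the 20 PoPs and the one-minute interval come from this article, while the three monitoring locations and both function names are assumptions for illustration. The TTL bound assumes a resolver that honors the TTL and ignores per-subnet answers (EDNS Client Subnet).

```python
SECONDS_PER_DAY = 86_400


def monthly_health_checks(pops: int, probe_locations: int,
                          interval_seconds: int, days: int = 30) -> int:
    """Number of health checks per month across all PoPs and monitoring locations."""
    checks_per_probe = days * SECONDS_PER_DAY // interval_seconds
    return pops * probe_locations * checks_per_probe


def max_queries_per_resolver(ttl_seconds: int, days: int = 30) -> int:
    """Upper bound on monthly queries one caching resolver sends for one record."""
    return days * SECONDS_PER_DAY // ttl_seconds


if __name__ == "__main__":
    print(monthly_health_checks(pops=20, probe_locations=3, interval_seconds=60))
    for ttl in (60, 300):
        print(ttl, max_queries_per_resolver(ttl))
```

Raising the TTL from 60 to 300 seconds cuts the per-resolver upper bound to one fifth, at the price of a correspondingly slower failover.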

For the biggest pioneers, there is also the option of building your own GeoDNS service with your own name servers. But for this to make sense and deliver real value, you would need anycast IP addresses, as well as a number of other reliable servers around the world with DDoS protection. You would then have to understand, select and adapt software such as EdgeDNS or the Czech Knot DNS (which Cloudflare also uses). However, commercial GeoDNS services are relatively cheap and reliable, so we can’t imagine an ROI that would make sense for our own small, non-commercial DNS solution.

Servers

GEO server layout and provider selection

If you are going to build your own CDN, keep in mind that a real CDN needs 8–10 servers around the world even in the smallest setup. We currently have twenty production PoPs and three test ones. We also have two development PoPs, available only on the internal network, which developers can use to deploy the CDN on internal development domains as well.

The main goal of CDN is to provide visitors around the world with the lowest possible latency and the highest transfer rate to the data that CDN caches locally.

The ideal situation is when you can analyze the visitors of the projects for which you will use the CDN. If you know how much traffic and data transfer comes from which continents/countries, you can strategically decide on which continents and in which countries to place your PoPs.

In the beginning, you won’t have servers in every country, and probably not on every continent, so think in terms of “catchment areas”. However, real latency and traceroute measurements will often surprise you: the latency between ISPs in individual countries does not correspond to geographical proximity. Peering between countries and individual ISPs varies, and very often “a neighbor is not a neighbor”. E.g. from Finland, you may have significantly lower latency to the Czech Republic than to Poland with some providers. If you do not yet have any servers abroad from which you could perform measurements, the WonderNetwork.com tool can help you. It shows the latency between different cities of the world, in both directions. Of course, the values apply only to the ISPs used by this tool, but they are sufficient for orientation.
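Once you have measured latencies, assigning countries to catchment areas is a simple minimum search: each country goes to the PoP with the lowest measured latency, regardless of the map. The sketch below shows the idea; the function name and the latency values in the example are invented for illustration and are not our measurements.

```python
from typing import Dict


def assign_catchment(latency_ms: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Map each country to the PoP with the lowest measured latency."""
    return {
        country: min(per_pop, key=per_pop.get)
        for country, per_pop in latency_ms.items()
        if per_pop  # skip countries without any measurement
    }


if __name__ == "__main__":
    # Hypothetical measurements in milliseconds.
    measured = {
        "FI": {"prague": 32.0, "warsaw": 41.0},
        "LT": {"prague": 29.0, "warsaw": 14.0},
    }
    print(assign_catchment(measured))
```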

Do good market research when choosing a server and connectivity provider. Of course, price is neither the only nor the least important factor, but it must not be the first. We focused on:

Provider quality and reputation: in each country, 2–3 robust providers usually stand out as the most reliable. Their robust infrastructure should be better able to withstand potential DDoS attacks. We do not recommend small and unverified providers.
Local and global connectivity of the provider: keep in mind that the servers will handle heavy traffic, partly in their own country, while some also serve as catchment areas for other countries. Therefore, focus on studying and comparing their connectivity abroad. A quality provider describes its connectivity on its website because it is usually proud of it. SuperHosting, where we have had part of our infrastructure for 15 years, does a great job for us.
Quality support: sooner or later some problems will definitely occur and it is necessary to react quickly. As a first test, you can ask support what line the server will actually have available (usually 100 or 1,000 Mbps), what the aggregation is, what they mean by “unlimited traffic”, and whether it covers the XY terabytes per month that you estimate the server will need to handle. As a second question, you can ask about the capabilities and functioning of their DDoS protection.
The expected data traffic on a given server should ideally be included in the price, or there should be a clear pricing policy in advance.

Our CDN currently has 20 PoPs, each from a different provider. So far, our primary Czech/Slovak visitors are covered by six PoPs (4 × Prague, Brno and Bratislava). Then come Germany (two PoPs) and Poland (two PoPs) for part of Eastern and Northern Europe. We also have one PoP each in France, Italy, England and Hungary. Two PoPs cover North America. South America is covered by only one PoP in Sao Paulo. Africa is covered by one PoP in Cairo, Australia by one PoP in Sydney, the Russian Federation by one PoP in Moscow and Asia by one PoP in Mumbai. These PoPs also serve selected neighboring countries, where it made sense to us according to the measured latencies.

In the next chapter, you will also find information on how to cover various secondary locations very effectively with the help of a commercial CDN, if it makes functional and economic sense. For our CDN it does, so we have covered most of the non-redundant locations described above with commercial CDNs and keep only some of our own PoPs there as a backup.

Recommendation: select at least two independent providers in each important location, ideally with different foreign connectivity. Try to ensure that DNS resolves to at least two independent PoPs (IP addresses) in each location. If one of the PoPs fails, visitors will not have to wait 2–3 minutes for DNS failover, because browsers detect the failure themselves and switch traffic to the other IP address. In current browser versions, you will only see “Connecting…” for 2–3 seconds and the content will then be loaded from the second IP address.

Tip: You can test the quality of your CDN topology (especially with regard to latencies from different parts of the world) using the online tool MapLatency.com. Its advantage is that it measures latency from endpoints at different ISPs, so it reflects the latency that real visitors to your CDN experience, not just latency from servers/datacenters. For us, the coverage of Europe is key, and it is very good for our needs (see screenshot). The CDN Latency Test from CDNPerf serves the same purpose, but it measures latencies from data centers, not from end devices.

Use of commercial CDN for better coverage

At some point, you will be very sorry (as we were) that you cannot give visitors in remote corners of the world (for us mainly Africa, Asia, Australia and South America) the same comfort (latency and transfer speed) as in Europe. But even that has an effective and simple solution.

You can cover remote corners of the world with a commercial CDN provider that has a robust infrastructure and strong coverage in these locations as well. Because these are low-traffic secondary locations (hundreds of GB to a few TB per month), you can use a pay-as-you-go CDN provider, which will cost you a few tens or hundreds of dollars a month. On the one hand, this may seem like parasitism, but on the other hand, when we examined the IP addresses of commercial CDNs in different countries, we found that some providers share infrastructure with each other in some locations. So it’s not unusual. We all want to deliver maximum value to our clients, but at the same time we have to think about the economy and operating costs.

How to set it up?

The commercial CDN will provide you with a hostname, usually under a third-level domain managed in their GeoDNS (e.g. “mycompany.cdn-provider.com”), to which you can point your CDN domain via CNAME.
At the commercial CDN, configure it to “listen” on your “cdn.company.com” domain in addition to the hostname mentioned above. You will also need to set up an SSL/TLS certificate. The provider will probably offer Let’s Encrypt, but we recommend using your own certificate purchased from a public CA and uniform across all PoPs. If you have different certificates in different locations, and short-lived ones at that, you will not be able to use SSL pinning, which you may need in some situations.
At your GeoDNS provider, point your domain in all secondary locations to the hostname of the commercial CDN via CNAME. Technically illustrated: “(Africa) cdn.company.com → CNAME mycompany.cdn-provider.com”.
You must avoid loops. You must not tell the commercial CDN to listen on “cdn.company.com” and at the same time set that domain as the origin. The African PoP would then resolve the origin to itself. To prevent such looping, make a few major PoPs listen on an additional domain, e.g. “cdn-src.company.com” (its A records point to, for example, the three main PoPs in the EU). You then set “cdn-src.company.com” as the origin, so if a PoP of the commercial CDN does not have a file in its cache, it downloads it from one of the main PoPs in the EU through “cdn-src.company.com”.
If, over time, statistics and billing show that increased traffic makes it more advantageous to cover a location with your own PoPs, you always have that option and can deploy it without an outage.

The disadvantage of secondary locations is that they are very far from the origin servers, so most first visitors will likely wait quite a long time before the cache warms up. It is therefore advisable to prepare a background process that regularly pushes the most requested files, taken from the top-requests statistics, into the commercial CDN storage. There is then a good chance that remote visitors will get content from the local PoP immediately, even when it is requested at that PoP for the first time.
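The first half of such a background process, selecting the most requested files, can be sketched as follows. The log format (“METHOD PATH STATUS” per line), the function name and the limit are assumptions for illustration; the actual upload to the commercial CDN storage depends on the provider’s API and is deliberately left out.

```python
from collections import Counter
from typing import Iterable, List


def top_paths(log_lines: Iterable[str], limit: int = 100) -> List[str]:
    """Return the most frequently requested paths from simplified access log lines."""
    counts: Counter = Counter()
    for line in log_lines:
        parts = line.split()
        # Assumed format: "GET /path 200". Only successful GET requests count.
        if len(parts) >= 3 and parts[0] == "GET" and parts[2] == "200":
            counts[parts[1]] += 1
    return [path for path, _ in counts.most_common(limit)]


if __name__ == "__main__":
    sample = [
        "GET /js/app.js 200",
        "GET /css/site.css 200",
        "GET /js/app.js 200",
        "GET /missing.png 404",
    ]
    print(top_paths(sample, limit=2))
```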

Hardware

Once you have selected providers, you still have to choose a specific physical or virtual server from their offering. This of course depends on your budget, but also consider how important the location is to you and your visitors.

A few of our verified recommendations

Virtual vs. physical server: this is a rather controversial topic and it is not appropriate to generalize. If the budget allows, choose physical servers for critical PoPs, even if only from the basic offering. Redundant disks are a must, ideally with a redundant power supply. With a physical server, you usually get a 1 Gbps uplink and a direct physical connection to the ToR switch. There is a much lower chance that you will struggle with sharing CPU, IO or connectivity than on a physical hypervisor running dozens of virtual servers at best, or hundreds. If you’re lucky, they share a “pipe” of * × 10 Gbit; if not, 1 Gbit. With verified providers you don’t have to worry about virtual servers either, just watch the aggregation and performance (e.g. with the nench benchmark). Over time, the collected metrics will also tell you a lot, especially for redundant PoPs that handle roughly the same traffic (DNS round-robin). This way, we very quickly detected very aggressive CPU throttling or volatile IO performance at some providers.
CPU: if you do it smartly and set up static gzip and brotli compression correctly, you will be able to handle hundreds of Mbps even with 1–2 CPU cores. However, if you do not use static compression and compress each response ad hoc, you need at least 4–8 cores. It is good to choose a modern CPU with a high clock speed (turbo at 3 GHz+). By the way, according to our benchmarks, static compression is something commercial CDNs often lack, and as a result they send text content much more slowly than they could.
RAM: the minimum is 1 GB, but the more, the better. This is because the filesystem cache (page cache) lives in RAM. Usually, this cache will contain most of the small but most downloaded files (typically JS/CSS/fonts). The more of them fit in RAM, the lower the IOPS requirements, so you can more safely afford a larger rotating HDD instead of an SSD. With enough RAM, you can have almost zero IOPS on storage even at hundreds of Mbps.
SSD/NVMe vs. HDD: of course we recommend SSD/NVMe for handling high IOPS, but the real need depends on the actual traffic. We have preferred SSDs over high capacity everywhere; 100–200 GB per server is enough for us. But you also need to take into account that you need space for logs. It is optimal to rotate the logs continuously, send them to a collection point for further processing and clean them up.
Connectivity: it is advisable to have a realistic idea of how much traffic, and especially what peaks, you will handle. For a less important PoP, 100 Mbps will suffice. For a PoP in an important location, however, prefer 1 Gbps and distribute the load among multiple PoPs (round-robin DNS, where several A records are returned). You will achieve higher overall throughput and lower load on specific ISPs, in addition to higher availability of the CDN as a whole. Whoever has the budget and a real need can of course choose a 10 Gbps port, but expect a high price.
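The static compression mentioned in the CPU recommendation means compressing each text file once, in advance, and then serving the stored result instead of compressing on every request. The sketch below shows the idea with gzip only, because gzip is part of the Python standard library (Brotli would need a third-party module). The function name, the list of suffixes and the “.gz next to the source file” layout are assumptions, not our production setup.

```python
import gzip
from pathlib import Path
from typing import List

TEXT_SUFFIXES = {".js", ".css", ".svg", ".html", ".json"}


def precompress(root: Path) -> List[Path]:
    """Write a .gz copy next to every text file that has no up-to-date one yet."""
    written = []
    # sorted() materializes the file list before any new files are created.
    for source in sorted(root.rglob("*")):
        if not source.is_file() or source.suffix not in TEXT_SUFFIXES:
            continue
        target = source.with_name(source.name + ".gz")
        if target.exists() and target.stat().st_mtime >= source.stat().st_mtime:
            continue  # already compressed and not older than the source
        # Maximum compression level: the CPU cost is paid once, not per request.
        target.write_bytes(gzip.compress(source.read_bytes(), compresslevel=9))
        written.append(target)
    return written
```

Because the work is done once per file, the highest compression level costs nothing at request time, which is why a PoP with 1–2 cores can still serve compressed content at high throughput.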

Orchestration

Because you will manage several servers around the world with 99% identical configuration, you need to ensure automated installation, configuration, and mass orchestration.

We use and recommend Ansible. Historically, we also used Puppet, Chef and SaltStack for a while, but only Ansible has met our needs over many years. Over the years we have written more than 80 of our own roles for it, so when preparing each additional server, the most time-consuming part is ordering it and waiting for the activation e-mail. Whether we have 10 or 50 servers doesn’t matter from the orchestration point of view.

Whichever orchestration tool you use to manage the servers, we recommend a few things to help you avoid a “global outage”:

When deploying changes to all servers in bulk, be careful: the deploy to individual servers should run in series rather than in parallel. It can also run in parallel, but only in small batches, for example three servers at a time (in an Ansible playbook this is controlled by the “serial” directive). If the deploy fails on one of the servers, force the whole deploy to abort (in Ansible, the “max_fail_percentage” directive).
Before restarting/reloading components, first check the validity of the configuration (configtest). This eliminates outages caused by an invalid configuration. Some distributions and their init scripts do not do this automatically. Ideally, run configtest before any restart so that the service is never stopped and then unable to start again.
At the end of the deployment to an individual server, run a set of CDN functionality tests against that particular server, e.g. by calling the status URL, and ideally also a functional URL from one of the origins that will be served from the cache, plus one URL that is not in the cache and has to be fetched from the origin. We also have one “service” origin domain for these purposes. In conjunction with serial deployment, you can be sure that you will not cause an outage on more than 1 PoP at a time.
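
The three recommendations above can be sketched together as a small tool-independent routine in Python. This is only an illustration of the logic, not our Ansible setup: all names (`rolling_deploy`, `deploy_host`, `batches`, `DeployAborted`) and the three callbacks are hypothetical, and the servers in a batch are processed one after another here instead of truly in parallel. The abort rule corresponds roughly to tolerating zero failed servers per batch.

```python
from typing import Callable, Sequence


class DeployAborted(RuntimeError):
    """Raised when a batch contains a failed server and the rollout stops."""


def batches(hosts: Sequence[str], size: int) -> list[list[str]]:
    """Split hosts into consecutive groups of at most `size` servers."""
    return [list(hosts[i:i + size]) for i in range(0, len(hosts), size)]


def deploy_host(
    host: str,
    config_ok: Callable[[str], bool],
    reload_service: Callable[[str], None],
    smoke_test: Callable[[str], bool],
) -> bool:
    """Deploy to one server; return True only if it still serves content."""
    # Validate first: an invalid config must never reach a reload.
    if not config_ok(host):
        return False  # the running service is left untouched
    reload_service(host)
    # Hypothetical check: status URL, one cached URL, one uncached URL.
    return smoke_test(host)


def rolling_deploy(
    hosts: Sequence[str],
    config_ok: Callable[[str], bool],
    reload_service: Callable[[str], None],
    smoke_test: Callable[[str], bool],
    batch_size: int = 1,
) -> list[str]:
    """Deploy batch by batch and abort after the first batch with a failure."""
    done: list[str] = []
    for group in batches(hosts, batch_size):
        failed = [
            host for host in group
            if not deploy_host(host, config_ok, reload_service, smoke_test)
        ]
        done.extend(host for host in group if host not in failed)
        if failed:
            # Remaining batches are never touched, so the damage stays local.
            raise DeployAborted(f"aborting after failures on: {', '.join(failed)}")
    return done
```

With `batch_size=1` a broken change can affect at most one PoP before the rollout stops. Larger batches trade some of that safety for a faster deploy.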

Server configuration and reverse proxy (cache)

If you already have prepared servers, the really interesting and creative part awaits you — the preparation of configurations of individual SW components, of which the CDN is composed.

In the next article (in 2–3 weeks), we will focus on operating system settings (with real settings for Debian Linux), the reverse proxy (Nginx as cache) and other aspects related to CDN traffic: content optimization, security, attack protection and settings that affect search engine behavior. And maybe also cache tagging and tag-based invalidation built on Varnish (we are currently working on it). This is a very useful feature that very few CDN providers offer, and only in their most expensive plans.

Thanks for reading, and if you like the article, I will be happy if you share it or leave a comment. Have a nice day :-)

If you are interested in any other CDN-related details, ask in the comments or ask on X/Twitter @janreges. I will be happy to answer.

Test your websites with my analyzer

In conclusion, I would like to recommend one of my personal open-source projects, with which I would like to help improve the quality of websites around the world. The tool is available as a desktop application and also as a command-line tool usable in CI/CD pipelines, for Windows, macOS and Linux.

I launched it at the end of 2023 and I believe that it will help a lot of people to increase security, performance, SEO, accessibility or other important aspects of a quality web presentation or application. It's called SiteOne Crawler - Free Website Analyzer and I also wrote an article about it. Below you will find 3 descriptive videos - the last one also shows what report it will generate for your website.

In addition to various analyses, it also offers, for example, the export of the entire website into an offline form, where you can view the entire website from a local disk without the internet, or the generation of sitemaps.

Sharing this project with your colleagues and friends will be the greatest reward for me for writing these articles. Thank you and I wish you all the best in 2024.

Desktop Application

Command-line tool

HTML report - analysis results

Note: this article was written with the best intentions and without advertising purposes. However, it contains a few partner links in the text to specific providers with whom we have many years of excellent experience.