At this year's KubeCon, the buzz was all about distributed Kubernetes: Edge, Hybrid Cloud, and Multi-Cloud.
For each of these topics there's an endless number of solutions: KubeEdge, OpenShift Edge, Akri, Baetyl, Kubermatic, Lens, Rancher, KubeFed, KubeSphere, Red Hat ACM, Liqo, Skupper, Linkerd, Fleet…
…the list goes on and on and on and on and…you get the point.
One thing not really discussed? You can run a single cluster across servers in different locations. You can skip the new tools and run a cluster like you normally would, but extended to new environments. Sounds like a no-brainer, right?
This is called a "wide cluster" or "stretched cluster", and it's an alternative to the "multi-cluster" model that has been gaining popularity. Before we discuss why you might want to implement a wide cluster vs. a multi-cluster architecture, let's discuss some common concerns.
Etcd (the Kubernetes "brain") is latency intolerant, so if your control plane nodes are too far apart, a wide cluster just plain won't work. You usually don't want to introduce latency between worker nodes either because, well, application performance.
However, both problems can be easily solved by:
- Co-locating Etcd nodes, or using a non-Etcd alternative (dqlite, anyone?)
- Applying node labels for locations and using node selectors for apps
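As a rough sketch of the second point, the snippet below builds a Deployment manifest that is pinned to one location with a node selector. The label key `topology.kubernetes.io/region` is the well-known Kubernetes topology label, but any label key would do. The region value `dc-east`, the app name, and the image are made up for illustration, and the nodes would need to carry the label first (for example via `kubectl label node <node> topology.kubernetes.io/region=dc-east`).

```python
import json

# Well-known Kubernetes label for a node's region. The value used in the
# example below ("dc-east") is a hypothetical location name.
REGION_LABEL = "topology.kubernetes.io/region"


def pinned_deployment(name, image, region, replicas=2):
    """Return a Deployment manifest whose pods only schedule onto nodes
    labeled with the given region."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Only nodes carrying this label/value are eligible.
                    "nodeSelector": {REGION_LABEL: region},
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


if __name__ == "__main__":
    # App name, image, and region are illustrative placeholders.
    manifest = pinned_deployment("web", "nginx:1.27", "dc-east")
    # kubectl accepts JSON as well as YAML, so this can be piped
    # into `kubectl apply -f -`.
    print(json.dumps(manifest, indent=2))
```

Latency-sensitive apps that talk to each other can share a region value, so their traffic never has to cross the slow link between locations.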
If you're running a single cluster between clouds, that likely means running inter-node traffic over a public network, which is scary. It can also be challenging, since you need to provide direct connectivity between all nodes.
The solution here is a mesh VPN (no, not a service mesh).
The VPN will encrypt all your traffic and provide a flexible subnet where all your nodes can communicate directly and securely.
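To make "flexible subnet" a bit more concrete, here is a minimal sketch of a full-mesh plan: every node gets an address from one private VPN subnet and a direct link to every other node, wherever it physically runs. The node names and the `10.10.0.0/24` range are hypothetical, and the sketch is not tied to any particular VPN product.

```python
import ipaddress
from itertools import combinations


def plan_mesh(node_names, vpn_cidr="10.10.0.0/24"):
    """Assign each node a VPN address and list every direct node-to-node link."""
    hosts = ipaddress.ip_network(vpn_cidr).hosts()
    # zip stops at the shorter input, so check capacity explicitly afterwards.
    addresses = {name: str(ip) for name, ip in zip(node_names, hosts)}
    if len(addresses) < len(node_names):
        raise ValueError("VPN subnet is too small (or node names repeat)")
    # Full mesh: one direct link per unordered pair of nodes.
    links = list(combinations(node_names, 2))
    return addresses, links


if __name__ == "__main__":
    # Hypothetical nodes spread across two clouds and a data center.
    nodes = ["aws-worker-1", "gcp-worker-1", "dc-control-1", "dc-worker-1"]
    addresses, links = plan_mesh(nodes)
    for name, ip in addresses.items():
        print(f"{name:14} {ip}")
    print(f"{len(links)} direct links for {len(nodes)} nodes")
```

The number of links grows as n(n-1)/2, which is why you want the VPN tooling to manage peers for you rather than wiring them up by hand.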
You may also have special considerations around access controls. There are ways to manage access from a single cluster, but maybe you'd rather just have multiple clusters as your way of managing access. And that's ok, I won't judge you.
The number one reason you might need to avoid running a wide cluster is cloud costs. Certain giant cloud providers will charge you through the nose for data egress. If you're running worker nodes between such clouds and your data center, you're gonna pay for it. Still, as we'll discuss below, you may actually end up saving money with a wide cluster.
Most cloud-hosted k8s options can't be easily extended to other locations, which makes sense, since cloud providers have every incentive to keep you in their cloud. If you're stuck running certain distributions, you may just be stuck.
We've discussed the concerns, some of which have easy answers and some of which are harder. With all that, here's why you might want to think about running a wide Kubernetes cluster as an alternative to a multi-cluster architecture.
As discussed, there are a thousand tools for running multi-cluster, hybrid cloud, and edge computing with Kubernetes.
A large portion of these tools and platforms require a whole new framework for app deployments which must be adopted across all of your clusters. That's a lot of learning, and a lot of dependency on a new tool. The solution might also require relying on a single k8s distribution or cloud provider.
Alternatively, with the single, multi-cloud cluster approach, you can run your operations exactly as you would with a standard cluster. No new tools, and vastly simplified operations.
If you're running large, complex clusters, you probably have a lot of redundant components across all of them. Components to handle storage, networking, metrics, logging, images, pipelines, and more may need to be replicated across each and every cluster. That overhead adds up.
In addition, each cluster needs its own control plane, and assuming they're all HA, that's 3+ additional nodes per cluster. Compute gets expensive.
Compare this to a single, wide cluster, where there is one control plane and one set of services to support nodes in different locations. You can add in special tooling as needed for particular environments, but you don't have to, which is a key distinction.
This is why, even if you're hesitant because of egress charges in your cloud, you should still weigh them against the cost of a multi-cluster infrastructure.
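A back-of-the-envelope version of that comparison might look like the sketch below. Every number in it is a placeholder, not a real price: plug in your own provider's egress rate, your node cost, and your measured cross-site traffic.

```python
def wide_cluster_monthly_cost(egress_gb, egress_price_per_gb):
    """Extra monthly cost of a wide cluster: cross-site traffic billed as egress."""
    return egress_gb * egress_price_per_gb


def multi_cluster_monthly_cost(extra_clusters, control_plane_nodes, node_price,
                               duplicated_services_price):
    """Extra monthly cost of multi-cluster: each additional cluster brings its
    own control plane plus its own copy of the supporting services."""
    per_cluster = control_plane_nodes * node_price + duplicated_services_price
    return extra_clusters * per_cluster


if __name__ == "__main__":
    # All figures are hypothetical placeholders, not quotes from any provider.
    wide = wide_cluster_monthly_cost(egress_gb=5_000, egress_price_per_gb=0.09)
    multi = multi_cluster_monthly_cost(
        extra_clusters=2,                 # clusters beyond the first
        control_plane_nodes=3,            # one HA control plane per cluster
        node_price=70.0,                  # per node, per month
        duplicated_services_price=150.0,  # logging, metrics, etc. per cluster
    )
    print(f"wide cluster extra:  ${wide:,.2f}/month")
    print(f"multi-cluster extra: ${multi:,.2f}/month")
```

Which side wins depends entirely on how chatty your workloads are across locations, so measure the traffic before deciding.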
It's almost impossible to describe how flexible a cluster on a mesh VPN becomes. You have given it an extensible networking base from which to grow into new environments. A normal cluster will always be just that, a normal cluster occupying a subnet in a data center.
Sure, you can put some tools on top to make it handle some more fancy operations, but the cluster itself is fundamentally limited to that location in the data center.
On the other hand, a k8s cluster built on a mesh VPN can grow. It can expand to new locations. Its nodes can live wherever they need to be. You can cloud burst into a new provider and remove all those nodes when they're no longer needed. The cluster can shift from place to place. The underlying infrastructure becomes incredibly malleable.
This is why, even if you're running just a single cluster on a single cloud, you may still want to deploy it on a mesh VPN just in case that ever changes.
This concludes my Ted Talk. We've discussed the pros and cons of a wide cluster on a mesh VPN vs. the standard multi-cluster architecture. I hope this at least sparks some ideas for you when planning your next Kubernetes deployment.