Federico Sismondi for Camptocamp Infrastructure Solutions

Unveiling the Simplicity of Cluster Mesh for Kubernetes Deployments

During KubeCon EU 2024, among a crowd of tech enthusiasts and Kubernetes aficionados, Liz Rice, the Queen bee, demoed multi-cluster networking. This is Cluster Mesh 101 with Cilium.

Here are a few paragraphs summarizing the experience.

Overview

Cluster Mesh extends the networking plane across multiple clusters. It enables connectivity among the endpoints of all connected clusters. Two notable features are:
i) Network Policy enforcement, as implemented by Cilium, still prevails under this network setup (see the policy sketch after this list)
ii) Services can load-balance requests across clusters simply by using annotations
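As an illustration of point i), policies can even select peers by cluster. The following is a minimal sketch (not from the talk; the cluster name and Pod labels are placeholders borrowed from Cilium's rebel-base/x-wing demo app) that allows ingress to rebel-base Pods only from x-wing Pods running in a cluster named cluster1, via the io.cilium.k8s.policy.cluster label:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-cross-cluster
spec:
  # Applies to rebel-base Pods in the local cluster
  endpointSelector:
    matchLabels:
      name: rebel-base
  ingress:
  - fromEndpoints:
    # Only x-wing Pods coming from cluster1 may connect
    - matchLabels:
        name: x-wing
        io.cilium.k8s.policy.cluster: cluster1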

Networking Adventures Begin

To the surprise of many, Liz announces she will be running the demo over the venue's Wi-Fi.

The presentation gets started with connectivity tests over VPN connections, route propagation validation, checks of the BGP peering, and visualization of routing tables. The setup is one k8s cluster running in GKE and another in EKS. Node-to-node network connectivity is the final objective here, and do not forget that the assigned IP ranges must not overlap. Cilium cannot create a bridge between two cloud providers. No black magic.
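For anyone reproducing this at home, a rough sketch of the Cilium side of the prerequisites could look like the commands below; the kubectl contexts, cluster names and IDs are placeholders, and the non-overlapping Pod CIDRs are defined when the GKE and EKS clusters themselves are created:

# Each cluster needs a unique cluster name and numeric cluster ID
cilium install --context $GKE_CONTEXT --set cluster.name=gke-demo --set cluster.id=1
cilium install --context $EKS_CONTEXT --set cluster.name=eks-demo --set cluster.id=2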

This is the foreplay preparing the ground for the demo. The steps for reproducing this can be found in the official Cilium documentation here.

Enter Cilium's Cluster Mesh

With the foundational work laid, it's time to kickstart the demo. Liz demonstrates how to enable all necessary components with cilium clustermesh enable in both clusters. This triggers the deployment of the clustermesh-apiserver into each cluster, along with the generation of all required certificates. This component goes the extra mile by attempting to auto-detect the best Service type (typically LoadBalancer) for exposing the Cluster Mesh control plane to the other clusters.
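Concretely, enabling it on both clusters boils down to two commands; the contexts below are placeholders for the GKE and EKS kubeconfig contexts:

# Enable Cluster Mesh on both clusters; the CLI deploys the
# clustermesh-apiserver and picks a Service type to expose it
cilium clustermesh enable --context $GKE_CONTEXT
cilium clustermesh enable --context $EKS_CONTEXT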

A simple cilium clustermesh connect builds the bridge between clusters, just as if we had a single network plane between Pods across all clusters.
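In the same spirit, the connect step is a single command (contexts again being placeholders):

# Exchange certificates and configuration so the two clusters can peer
cilium clustermesh connect --context $GKE_CONTEXT --destination-context $EKS_CONTEXT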

Now cilium clustermesh status echoes:

✅ All 2 nodes are connected to all clusters
🔌 Cluster Connections:

  • cilium-cli-ci-multicluster-2-168: 2/2 configured, 2/2 connected

Traffic load balancing and failover in the hive

Until now, the demo has shown and validated the piping between Pods. We can successfully communicate with a Pod on another cluster using its IP address, for example. But this wouldn't be very practical in a real-world scenario. Moreover, no DNS service can help us fetch this dynamic IP. In fact, this is the raison d'être of the k8s Service object.

What Cilium proposes is extending native Service resources into a cluster-mesh-aware Service using annotations.

Cilium provides a pragmatic solution through global services with auto discovery and failover mechanisms.

Adding service.cilium.io/global=true makes a service global, which means it matches Pods across clusters. In other words, we extend the service's backends to include Pods in remote clusters. The service's traffic is then balanced across clusters.

Questions? This is probably better explained by the Cilium documentation here.

apiVersion: v1
kind: Service
metadata:
  name: rebel-base
  annotations:
    service.cilium.io/global: "true"
spec:
  type: ClusterIP
  ports:
  - port: 80
  selector:
    name: rebel-base
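To see the global service in action, a quick test (assuming the x-wing client Deployment from Cilium's demo app is running alongside rebel-base) is to curl the service repeatedly and watch replies arrive from backends in both clusters:

# Replies should alternate between local and remote rebel-base Pods
kubectl exec -ti deployment/x-wing -- curl -s rebel-base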

With service.cilium.io/affinity=local|remote we can fine-tune the global service to prefer local or remote Pods. With local we can designate our local cluster as the primary destination, while the remote Pods serve as a backup.

The following represents a service, which is global and prefers using endpoints found locally:

apiVersion: v1
kind: Service
metadata:
  name: rebel-base
  annotations:
    service.cilium.io/global: "true"
    service.cilium.io/affinity: "local"
spec:
  type: ClusterIP
  ports:
  - port: 80
  selector:
    name: rebel-base

Should the primary cluster encounter any issues or downtime, traffic seamlessly shifts to the backup cluster, ensuring continuity of service.
In essence, Cilium offers a straightforward approach to traffic management, enhancing reliability by providing a failover mechanism that ensures service accessibility remains intact in the face of pod disruptions.
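A simple way to observe this failover (again assuming the rebel-base and x-wing demo Deployments) is to scale the local backends down to zero and check that requests are still answered, now by the remote cluster:

# Remove all local backends of the global service
kubectl --context $GKE_CONTEXT scale deployment rebel-base --replicas=0

# Requests still succeed, served by rebel-base Pods in the other cluster
kubectl --context $GKE_CONTEXT exec -ti deployment/x-wing -- curl -s rebel-base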

Conclusion

A few interesting use cases arise when using Cluster Mesh. The one we focused on in this article is leveraging remote clusters' Pods as a service's backends, as a failover mechanism. We can also think of using a global service for moving workloads around, for example to lower computational costs by shifting to cheaper regions.

So there you have it, folks. A whirlwind tour of multicluster networking traffic management with Cilium, served up with a dose of honey.
Who knew cluster meshing would be that simple?

Contact us

Need a demo, or want to dig into some specifics on Cilium?
Ping us here: https://camptocamp.com/consulting
