More and more companies consider Kubernetes a deployment target for their
production environments. The applications you deploy on Kubernetes often
come with dependencies like databases or message brokers. Running a
message broker like Apache Kafka on Kubernetes is a no-brainer: Kafka is
an essential building block of modern microservice-based architectures,
and running everything on the same platform has its merits. In this blog
post, we will walk through the steps necessary to deploy a new Kafka
cluster into a cloud-native environment.
Bringing together Kubernetes and Stateful Services
Deploying a new service on Kubernetes sounds straightforward initially.
Let's configure a new Kafka and Zookeeper deployment, run a handful of
replicas for each, and we're good. This approach might suffice for a
development setup but won't hold up in a production environment.
Managing State
Like other stateful systems such as databases, Kafka requires specific
guarantees and configuration when running on Kubernetes. It needs to
hold state to keep track of topics and partitions. Specifically, it
needs to read and write that state from and to the hard disk. Because of
this requirement, Kubernetes cannot easily (re-)schedule or scale a
Kafka workload. With a StatefulSet, we get some guarantees around the
uniqueness and ordering of pods.
Service Access
When we deploy a regular application, we manage it with a standard
deployment. Once it is up and running, we couple the deployment with a
service. The service has two jobs:
- Provide access to the pods for other code within the cluster
- Act as a load balancer
When running Kafka, clients need to talk directly to a specific Kafka
broker, as a particular broker manages the topic partition the client
(producer or consumer) works with. While Kafka replicates data to other
brokers, producers and consumers must still connect to a specific
broker.
A regular Kubernetes service does not allow for that. Since it acts as a
load balancer, it forwards requests in a round-robin pattern, so access
to a specific broker is not guaranteed.
To make this possible, we use a so-called headless service. A headless
service does not load balance; instead, it exposes the addresses of all
its pods to clients.
Deploy Kafka on Kubernetes
When you research how to run Kafka on Kubernetes, you find a few
different types of answers:
- The one that says it can't or shouldn't be done.
- The one that shows you how to deploy it but does not configure any disks to persist state. As soon as your pods get rescheduled, all your state is gone - acceptable in development, but not in production.
- The one that only pitches a product.
In this blog post, we will show how to deploy Kafka on Kubernetes (with
persistence) and then look at a way to automate a lot of these tasks.
Let's get into it.
We start with a new network definition. This NetworkPolicy only allows
ingress to the Kafka and Zookeeper pods from other pods that carry the
network/kafka-network: "true" label:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: kafka-network
spec:
  ingress:
    - from:
        - podSelector:
            matchLabels:
              network/kafka-network: "true"
  podSelector:
    matchLabels:
      network/kafka-network: "true"
Next is Zookeeper. We must deploy Zookeeper first so that when we later
spin up Kafka, it has a Zookeeper host to connect to:
apiVersion: v1
kind: Service
metadata:
  labels:
    service: zookeeper
  name: zookeeper
spec:
  ports:
    - name: "2181"
      port: 2181
      targetPort: 2181
  selector:
    service: zookeeper
And the corresponding StatefulSet:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    service: zookeeper
  name: zookeeper
spec:
  serviceName: zookeeper
  replicas: 1
  selector:
    matchLabels:
      service: zookeeper
  template:
    metadata:
      labels:
        network/kafka-network: "true"
        service: zookeeper
    spec:
      securityContext:
        fsGroup: 1000
      enableServiceLinks: false
      containers:
        - name: zookeeper
          imagePullPolicy: Always
          image: confluentinc/cp-zookeeper:7.3.2
          ports:
            - containerPort: 2181
          env:
            - name: ZOOKEEPER_CLIENT_PORT
              value: "2181"
            - name: ZOOKEEPER_DATA_DIR
              value: "/var/lib/zookeeper/data"
            - name: ZOOKEEPER_LOG_DIR
              value: "/var/lib/zookeeper/log"
            - name: ZOOKEEPER_SERVER_ID
              value: "1"
          resources: {}
          volumeMounts:
            - mountPath: /var/lib/zookeeper/data
              name: zookeeper-data
            - mountPath: /var/lib/zookeeper/log
              name: zookeeper-log
      hostname: zookeeper
      restartPolicy: Always
  volumeClaimTemplates:
    - metadata:
        name: zookeeper-data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1024Mi
    - metadata:
        name: zookeeper-log
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1024Mi
In this example, we encounter the first aspect worth explaining in more
detail. Besides the usual deployment configuration (containers, env, …),
we also configure a volume (volumeClaimTemplates). Zookeeper needs to
persist various data to ensure Kafka works properly, and we need to hold
on to this data independently of restarts.
We're not configuring a regular deployment here, but a StatefulSet
instead. Since the Zookeeper pod needs to mount a volume, we configure a
securityContext. It sets the appropriate group id (fsGroup) for
filesystem access. Without this setting, the container could not write
to the mounted volume and would fail to start.
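Assuming you saved the network policy and the two Zookeeper manifests to files (the file names below are just examples), you can apply them and wait for the Zookeeper pod to become ready:
$ kubectl apply -f kafka-network.yaml
$ kubectl apply -f zookeeper-service.yaml
$ kubectl apply -f zookeeper-statefulset.yaml
$ kubectl rollout status statefulset/zookeeper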
Next is the Kafka Service:
apiVersion: v1
kind: Service
metadata:
  labels:
    service: kafka
  name: kafka
spec:
  clusterIP: None
  selector:
    service: kafka
  ports:
    - name: internal
      port: 29092
      targetPort: 29092
    - name: external
      port: 30092
      targetPort: 9092
What's unique about this service is that it is a headless service
(because of clusterIP: None). A regular Kubernetes service acts as a
load balancer; you cannot tell how many pods exist behind a given
service. For Kafka, however, we cannot rely on this behavior because a
client needs to connect to a specific Kafka broker, and each Kafka
broker is responsible for its own data. Therefore, we're using a
headless service, allowing clients to connect to a particular Kafka
broker.
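Because the headless service fronts a StatefulSet, each broker also gets a stable per-pod DNS name of the form <pod-name>.<service-name>. Assuming everything runs in the default namespace, the individual brokers are reachable inside the cluster at, for example:
kafka-0.kafka.default.svc.cluster.local:29092
kafka-1.kafka.default.svc.cluster.local:29092
kafka-2.kafka.default.svc.cluster.local:29092
Next is the Kafka StatefulSet itself: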
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    service: kafka
  name: kafka
spec:
  serviceName: kafka
  replicas: 3
  selector:
    matchLabels:
      service: kafka
  template:
    metadata:
      labels:
        network/kafka-network: "true"
        service: kafka
    spec:
      securityContext:
        fsGroup: 1000
      enableServiceLinks: false
      containers:
        - name: kafka
          imagePullPolicy: IfNotPresent
          image: confluentinc/cp-kafka:7.0.1
          ports:
            - containerPort: 29092
            - containerPort: 9092
          env:
            - name: KAFKA_ADVERTISED_LISTENERS
              value: "INTERNAL://:29092,LISTENER_EXTERNAL://:9092"
            - name: KAFKA_AUTO_CREATE_TOPICS_ENABLE
              value: "true"
            - name: KAFKA_INTER_BROKER_LISTENER_NAME
              value: "INTERNAL"
            - name: KAFKA_LISTENERS
              value: "INTERNAL://:29092,LISTENER_EXTERNAL://:9092"
            - name: KAFKA_LISTENER_SECURITY_PROTOCOL_MAP
              value: "INTERNAL:PLAINTEXT,LISTENER_EXTERNAL:PLAINTEXT"
            - name: KAFKA_ZOOKEEPER_CONNECT
              value: "zookeeper:2181"
          resources: {}
          volumeMounts:
            - mountPath: /var/lib/kafka/
              name: kafka-data
      hostname: kafka
      restartPolicy: Always
  volumeClaimTemplates:
    - metadata:
        name: kafka-data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
Above, you find the configuration for the Kafka StatefulSet. Its format
is very similar to the Zookeeper one.
In this example, we set KAFKA_AUTO_CREATE_TOPICS_ENABLE to make testing
a bit more straightforward. For a production-ready setup, do not set
this environment variable. Instead, configure topics ahead of time.
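As a sketch of what that could look like, you can create a topic explicitly with the kafka-topics tool that ships with the broker image (the topic name and settings below are just examples):
$ kubectl exec kafka-0 -- kafka-topics --bootstrap-server localhost:29092 \
    --create --topic orders --partitions 3 --replication-factor 3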
We also spin up three replicas. As soon as we have more than one
replica, the configuration you might find in Docker Compose examples or
other single-container use cases no longer applies. For instance, we do
not define KAFKA_BROKER_ID. If we had this environment variable set, the
second broker would stop working immediately on startup, since that ID
would already be registered within Zookeeper. The same applies to
KAFKA_ADVERTISED_LISTENERS. Instead of setting a hostname like:
env:
  - name: KAFKA_ADVERTISED_LISTENERS
    value: "INTERNAL://kafka:29092,LISTENER_EXTERNAL://kafka:9092"
We configure it like this:
env:
  - name: KAFKA_ADVERTISED_LISTENERS
    value: "INTERNAL://:29092,LISTENER_EXTERNAL://:9092"
We let the broker configure its ID automatically and let it figure out
the hostname on its own as well.
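With the service and StatefulSet in place, apply them the same way as the Zookeeper manifests (again, the file names are just examples) and wait for the brokers to come up. If you want to double-check that each broker registered an auto-generated ID, you can list them in Zookeeper - assuming the zookeeper-shell CLI is available in the cp-zookeeper image:
$ kubectl apply -f kafka-service.yaml
$ kubectl apply -f kafka-statefulset.yaml
$ kubectl rollout status statefulset/kafka
$ kubectl exec zookeeper-0 -- zookeeper-shell localhost:2181 ls /brokers/ids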
Testing Our Setup
To ensure our setup works, let's create one more deployment:
#kcat.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: kcat
  labels:
    app: kcat
spec:
  selector:
    matchLabels:
      app: kcat
  template:
    metadata:
      labels:
        app: kcat
    spec:
      containers:
        - name: kafka-cat
          image: confluentinc/cp-kafkacat:7.1.6
          command: ["/bin/sh"]
          args: ["-c", "trap : TERM INT; sleep infinity & wait"]
This deployment creates a kcat container we can use to produce and consume messages.
Create the new workload:
$ kubectl apply -f kcat.yaml
Let's find the pod name next:
$ kubectl get pods
kafka-0 1/1 Running 0 17m
kafka-1 1/1 Running 0 18m
kafka-2 1/1 Running 0 18m
kcat-79fff6fcf8-pcqm9 1/1 Running 0 31m
...
With the pod name, we can create a shell session and run our commands:
$ kubectl exec --stdin --tty kcat-79fff6fcf8-pcqm9 -- /bin/bash
Since we have KAFKA_AUTO_CREATE_TOPICS_ENABLE set, we don't need to
create topics beforehand. Let's produce a few messages:
[appuser@kcat-79fff6fcf8-pcqm9 ~]$ echo "TEST" | kafkacat -b kafka:29092 -t newtopic123 -p -1
% Auto-selecting Producer mode (use -P or -C to override)
[appuser@kcat-79fff6fcf8-pcqm9 ~]$ echo "TEST2" | kafkacat -b kafka:29092 -t newtopic123 -p -1
% Auto-selecting Producer mode (use -P or -C to override)
[appuser@kcat-79fff6fcf8-pcqm9 ~]$ echo "TEST3" | kafkacat -b kafka:29092 -t newtopic123 -p -1
% Auto-selecting Producer mode (use -P or -C to override)
With those messages produced, it's time to consume them:
[appuser@kcat-79fff6fcf8-pcqm9 ~]$ kafkacat -C -b kafka:29092 -t newtopic123 -p -1
TEST
TEST2
TEST3
% Reached end of topic newtopic123 [0] at offset 3
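You can also ask any broker for the cluster metadata to confirm that all three brokers have joined the cluster; kafkacat's -L flag prints the brokers, topics, and partitions it sees:
[appuser@kcat-79fff6fcf8-pcqm9 ~]$ kafkacat -L -b kafka:29092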
A Managed Service On Your Premises
In the steps above, we covered parts of a full production setup for
Apache Kafka. We still need to cover monitoring, scaling, and managing
permissions.
Going through the Kafka-on-Kubernetes setup manually requires many steps
and a lot of online research. It can feel overwhelming to stand up a
production-ready cluster.
Are there any alternatives? Yes. You could choose a managed Kafka
provider. Several vendors offer to host and configure Kafka for you,
abstracting away aspects that require more expertise. While there are
many scenarios where going with a managed provider is an excellent
choice, there are also cases where that's not an option - for instance,
when you must keep data on your premises for regulatory reasons. That's
where Calisti comes into play.
Calisti offers you the best of both worlds. It
provides you with a managed Kafka installation - on your premises.
You get to run Kafka on your own Kubernetes cluster, without having to
be an expert right off the bat. Instead, you can build your expertise
step by step while running Kafka in production.
Calisti's free tier lets you try out the product with no strings
attached and no credit card required.
Provision a new Kubernetes Cluster
*If you already have a Kubernetes Cluster up and running, skip this
step.*
We use eksctl in this example to provision a new cluster on AWS. Create
a new config file:
# calisti-kafka.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: calisti-kafka
  region: us-east-2
  version: "1.22"
managedNodeGroups:
  - name: ng-1
    instanceType: m5.xlarge
    desiredCapacity: 5
    volumeSize: 80
    privateNetworking: true
We use this configuration file to provision a new EKS cluster with
m5.xlarge instance types:
$ eksctl create cluster -f calisti-kafka.yaml
This step can take up to 20 minutes to complete.
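eksctl also updates your local kubeconfig by default, so once the command finishes you can verify that the nodes are ready:
$ kubectl get nodes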
Calisti: Installation
Once you have a working Kubernetes cluster, go ahead and sign up for a
free Calisti account. Calisti comes with
a command line application that runs on your local machine. With this
command line application, we will install Calisti on your Kubernetes
cluster.
Step 1 - Download Binaries and Activation Credentials
Visit the Calisti download page to download binaries for your platform
and install them.
Once you have the smm binary in your PATH, we need to configure
activation credentials:
SMM_REGISTRY_PASSWORD=<PASSWORD> ./smm activate \
  --host=registry.eticloud.io \
  --prefix=smm \
  --user=<USER_ID>
(You can find these credentials in your Calisti account.)
Step 2 - Install Calisti
Next, we install the Calisti components into your Kubernetes cluster.
Run:
$ smm install -a --install-sdm
(This step can take up to 10 minutes)
This command installs various components, including all parts needed to
manage Kafka clusters.
We also need to configure a set of permissions (if you use Kubernetes on
AWS, GCP, or Azure):
kubectl create clusterrolebinding user-cluster-admin \
  --clusterrole=cluster-admin \
  --user=<aws/azure/gcp username>
Step 3 - Install Kafka
Let's install a new Kafka broker in your environment. First, let's make
sure Zookeeper is up and running (Calisti automatically installs it):
$ kubectl get pods -n zookeeper
NAME READY STATUS RESTARTS AGE
zookeeper-operator-77df49fd-gb9tm 2/2 Running 1 (10m ago) 10m
zookeeper-operator-post-install-upgrade-bnnww 0/1 Completed 0 10m
With Zookeeper up and running, let's provision a new broker:
$ smm sdm cluster create
This command creates:
- A service account called kafka-cluster used by the broker pods
- A KafkaCluster Custom Resource with:
  - 2 Kafka brokers
  - 2 internal listeners (ports 29092, 29093)
  - 10GB broker log dir storage
  - Zookeeper address: zookeeper-server-client.zookeeper:2181
Confirm that the cluster is up and running:
$ smm sdm cluster get
Namespace Name State Image Alerts Cruise Control Topic Status Rolling Upgrade Errors Rolling Upgrade Last Success
kafka kafka ClusterRunning ghcr.io/banzaicloud/kafka:2.13-3.1.0 0 CruiseControlTopicReady 0 2023-01-31 20:17:25
After this step, you are all set to produce and consume messages.
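As a quick smoke test, you could reuse the kcat deployment from earlier to list the new cluster's metadata. The bootstrap address below assumes the operator's default all-broker service in the kafka namespace; check kubectl get svc -n kafka for the actual service name in your installation:
$ kubectl exec --stdin --tty kcat-79fff6fcf8-pcqm9 -- kafkacat -L -b kafka-all-broker.kafka:29092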
Final Thoughts
It is possible to deploy a new Kafka cluster on Kubernetes from scratch.
This blog post provides you with the foundation to do just that. While
running Kafka in production requires additional configuration, the
instructions in this blog post are a good starting point.
In some cases, running Kafka on your own Kubernetes cluster can be too
much. While managed Kafka providers can sometimes be the answer, there
are also cases where Kafka needs to live in your own infrastructure.
That's where you can leverage Calisti. Calisti abstracts away the more
complex administration aspects so you can focus on providing value to
your users.