More and more companies consider Kubernetes a deployment target for their
production environments. The applications you deploy on Kubernetes often
come with dependencies like databases or message brokers. Running a
message broker like Apache Kafka on Kubernetes is a no-brainer: Kafka is
an essential building block of modern microservice-based architectures,
and running everything on the same platform has its merits. In this blog
post, we will walk through the steps necessary to deploy a new Kafka
cluster into a cloud-native environment.
Bringing together Kubernetes and Stateful Services
Deploying a new service on Kubernetes sounds straightforward initially.
Let's configure a new Kafka and Zookeeper deployment, run a handful of
replicas for each, and we're good. This approach might suffice for a
development setup but won't hold up in a production environment.
Managing State
Like other stateful systems such as databases, Kafka requires specific
guarantees and configuration when running on Kubernetes. It needs to
hold state to keep track of topics and partitions. Specifically, it
needs to read and write that state from and to the hard disk. Because of
this requirement, Kubernetes cannot easily (re-)schedule or scale a
Kafka workload. With a StatefulSet, we get some guarantees around the
uniqueness and ordering of pods.
Service Access
When we deploy a regular application, we manage it with a standard
deployment. Once it is up and running, we couple the deployment with a
service. The service has two jobs:
- Provide access to the pods for other code within the cluster
- Act as a load balancer
When running Kafka, clients need to talk directly to a specific Kafka
broker, as a particular broker manages the topic partition the client
(producer or consumer) works with. While Kafka replicates data to other
brokers, producers and consumers must still connect to a specific
broker.
A regular Kubernetes service does not allow for that. Since it acts as a
load balancer, it forwards requests in a round-robin pattern, so access
to a specific broker is not guaranteed.
To make this possible, we use a so-called headless service. A headless
service does not load balance; instead, it exposes the addresses of all
its pods to clients.
Deploy Kafka on Kubernetes
When you research how to run Kafka on Kubernetes, you find a few
different types of answers:
- The one that says it can't or shouldn't be done.
- The one that shows you how to deploy it but does not configure any disks to persist state. As soon as your pods get rescheduled, all your state is gone - acceptable in development, but not in production.
- The one that only pitches a product.
In this blog post, we will show how to deploy Kafka on Kubernetes (with
persistence) and then look at a way to automate a lot of these tasks.
Let's get into it.
We start with a new network definition. This NetworkPolicy only allows
ingress to the Kafka and Zookeeper pods from other pods that carry the
network/kafka-network: "true" label:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: kafka-network
spec:
  ingress:
    - from:
        - podSelector:
            matchLabels:
              network/kafka-network: "true"
  podSelector:
    matchLabels:
      network/kafka-network: "true"
Next is Zookeeper. We must deploy Zookeeper first so that when we later
spin up Kafka, it has a Zookeeper host to connect to:
apiVersion: v1
kind: Service
metadata:
  labels:
    service: zookeeper
  name: zookeeper
spec:
  ports:
    - name: "2181"
      port: 2181
      targetPort: 2181
  selector:
    service: zookeeper
And the corresponding StatefulSet:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    service: zookeeper
  name: zookeeper
spec:
  serviceName: zookeeper
  replicas: 1
  selector:
    matchLabels:
      service: zookeeper
  template:
    metadata:
      labels:
        network/kafka-network: "true"
        service: zookeeper
    spec:
      securityContext:
        fsGroup: 1000
      enableServiceLinks: false
      containers:
        - name: zookeeper
          imagePullPolicy: Always
          image: confluentinc/cp-zookeeper:7.3.2
          ports:
            - containerPort: 2181
          env:
            - name: ZOOKEEPER_CLIENT_PORT
              value: "2181"
            - name: ZOOKEEPER_DATA_DIR
              value: "/var/lib/zookeeper/data"
            - name: ZOOKEEPER_LOG_DIR
              value: "/var/lib/zookeeper/log"
            - name: ZOOKEEPER_SERVER_ID
              value: "1"
          resources: {}
          volumeMounts:
            - mountPath: /var/lib/zookeeper/data
              name: zookeeper-data
            - mountPath: /var/lib/zookeeper/log
              name: zookeeper-log
      hostname: zookeeper
      restartPolicy: Always
  volumeClaimTemplates:
    - metadata:
        name: zookeeper-data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1024Mi
    - metadata:
        name: zookeeper-log
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1024Mi
In this example, we encounter the first aspect worth explaining in more
detail. Besides the usual deployment configuration (containers, env, …),
we also configure a volume (volumeClaimTemplates). Zookeeper needs to
persist various data to ensure Kafka works properly, and we need to hold
on to this data independently of restarts.
We're not configuring a regular deployment here, but a StatefulSet
instead. Since the Zookeeper pod needs to mount a volume, we configure a
securityContext. It sets the appropriate group id (fsGroup) for
filesystem access. Without this setting, the container could not write
to the mounted volume and would fail to start.
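Assuming you saved the network policy and the two Zookeeper manifests to files (the file names below are just examples), you can apply them and wait for the Zookeeper pod to become ready:
$ kubectl apply -f kafka-network.yaml
$ kubectl apply -f zookeeper-service.yaml
$ kubectl apply -f zookeeper-statefulset.yaml
$ kubectl rollout status statefulset/zookeeper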
Next is the Kafka Service:
apiVersion: v1
kind: Service
metadata:
  labels:
    service: kafka
  name: kafka
spec:
  clusterIP: None
  selector:
    service: kafka
  ports:
    - name: internal
      port: 29092
      targetPort: 29092
    - name: external
      port: 30092
      targetPort: 9092
What's unique about this service is that it is a headless service
(because of clusterIP: None). A regular Kubernetes service acts as a
load balancer; you cannot tell how many pods exist behind a given
service. For Kafka, however, we cannot rely on this behavior because a
client needs to connect to a specific Kafka broker, and each Kafka
broker is responsible for its own data. Therefore, we're using a
headless service, allowing clients to connect to a particular Kafka
broker.
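Because the headless service fronts a StatefulSet, each broker also gets a stable per-pod DNS name of the form <pod-name>.<service-name>. Assuming everything runs in the default namespace, the individual brokers are reachable inside the cluster at, for example:
kafka-0.kafka.default.svc.cluster.local:29092
kafka-1.kafka.default.svc.cluster.local:29092
kafka-2.kafka.default.svc.cluster.local:29092
Next is the Kafka StatefulSet itself: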
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    service: kafka
  name: kafka
spec:
  serviceName: kafka
  replicas: 3
  selector:
    matchLabels:
      service: kafka
  template:
    metadata:
      labels:
        network/kafka-network: "true"
        service: kafka
    spec:
      securityContext:
        fsGroup: 1000
      enableServiceLinks: false
      containers:
        - name: kafka
          imagePullPolicy: IfNotPresent
          image: confluentinc/cp-kafka:7.0.1
          ports:
            - containerPort: 29092
            - containerPort: 9092
          env:
            - name: KAFKA_ADVERTISED_LISTENERS
              value: "INTERNAL://:29092,LISTENER_EXTERNAL://:9092"
            - name: KAFKA_AUTO_CREATE_TOPICS_ENABLE
              value: "true"
            - name: KAFKA_INTER_BROKER_LISTENER_NAME
              value: "INTERNAL"
            - name: KAFKA_LISTENERS
              value: "INTERNAL://:29092,LISTENER_EXTERNAL://:9092"
            - name: KAFKA_LISTENER_SECURITY_PROTOCOL_MAP
              value: "INTERNAL:PLAINTEXT,LISTENER_EXTERNAL:PLAINTEXT"
            - name: KAFKA_ZOOKEEPER_CONNECT
              value: "zookeeper:2181"
          resources: {}
          volumeMounts:
            - mountPath: /var/lib/kafka/
              name: kafka-data
      hostname: kafka
      restartPolicy: Always
  volumeClaimTemplates:
    - metadata:
        name: kafka-data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
Above, you find the configuration for the Kafka StatefulSet. Its format
is very similar to the Zookeeper one.
In this example, we set KAFKA_AUTO_CREATE_TOPICS_ENABLE to make testing
a bit more straightforward. For a production-ready setup, do not set
this environment variable. Instead, configure topics ahead of time.
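As a sketch of what that could look like, you can create a topic explicitly with the kafka-topics tool that ships with the broker image (the topic name and settings below are just examples):
$ kubectl exec kafka-0 -- kafka-topics --bootstrap-server localhost:29092 \
    --create --topic orders --partitions 3 --replication-factor 3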
We also spin up three replicas. As soon as we have more than one
replica, the configuration you might find in Docker Compose examples or
other single-container use cases no longer applies. For instance, we do
not define KAFKA_BROKER_ID. If we had this environment variable set, the
second broker would stop working immediately on startup, since that ID
would already be registered within Zookeeper. The same applies to
KAFKA_ADVERTISED_LISTENERS. Instead of setting a hostname like:
env:
  - name: KAFKA_ADVERTISED_LISTENERS
    value: "INTERNAL://kafka:29092,LISTENER_EXTERNAL://kafka:9092"
We configure it like this:
env:
  - name: KAFKA_ADVERTISED_LISTENERS
    value: "INTERNAL://:29092,LISTENER_EXTERNAL://:9092"
We let the broker configure its ID automatically and let it figure out
the hostname on its own as well.
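With the service and StatefulSet in place, apply them the same way as the Zookeeper manifests (again, the file names are just examples) and wait for the brokers to come up. If you want to double-check that each broker registered an auto-generated ID, you can list them in Zookeeper - assuming the zookeeper-shell CLI is available in the cp-zookeeper image:
$ kubectl apply -f kafka-service.yaml
$ kubectl apply -f kafka-statefulset.yaml
$ kubectl rollout status statefulset/kafka
$ kubectl exec zookeeper-0 -- zookeeper-shell localhost:2181 ls /brokers/ids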
Testing Our Setup
To ensure our setup works, let's create one more deployment:
#kcat.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: kcat
  labels:
    app: kcat
spec:
  selector:
    matchLabels:
      app: kcat
  template:
    metadata:
      labels:
        app: kcat
    spec:
      containers:
        - name: kafka-cat
          image: confluentinc/cp-kafkacat:7.1.6
          command: ["/bin/sh"]
          args: ["-c", "trap : TERM INT; sleep infinity & wait"]
This deployment creates a kcat container we can use to produce and consume messages.
Create the new workload:
$ kubectl apply -f kcat.yaml
Let's find the pod name next:
$ kubectl get pods
kafka-0 1/1 Running 0 17m
kafka-1 1/1 Running 0 18m
kafka-2 1/1 Running 0 18m
kcat-79fff6fcf8-pcqm9 1/1 Running 0 31m
...
With the pod name, we can create a shell session and run our commands:
$ kubectl exec --stdin --tty kcat-79fff6fcf8-pcqm9 -- /bin/bash
Since we have KAFKA_AUTO_CREATE_TOPICS_ENABLE set, we don't need to
create topics beforehand. Let's produce a few messages:
[appuser@kcat-79fff6fcf8-pcqm9 ~]$ echo "TEST" | kafkacat -b kafka:29092 -t newtopic123 -p -1
% Auto-selecting Producer mode (use -P or -C to override)
[appuser@kcat-79fff6fcf8-pcqm9 ~]$ echo "TEST2" | kafkacat -b kafka:29092 -t newtopic123 -p -1
% Auto-selecting Producer mode (use -P or -C to override)
[appuser@kcat-79fff6fcf8-pcqm9 ~]$ echo "TEST3" | kafkacat -b kafka:29092 -t newtopic123 -p -1
% Auto-selecting Producer mode (use -P or -C to override)
With those messages produced, it's time to consume them:
[appuser@kcat-79fff6fcf8-pcqm9 ~]$ kafkacat -C -b kafka:29092 -t newtopic123 -p -1
TEST
TEST2
TEST3
% Reached end of topic newtopic123 [0] at offset 3
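You can also ask any broker for the cluster metadata to confirm that all three brokers have joined the cluster; kafkacat's -L flag prints the brokers, topics, and partitions it sees:
[appuser@kcat-79fff6fcf8-pcqm9 ~]$ kafkacat -L -b kafka:29092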
A Managed Service On Your Premises
In the steps above, we covered parts of a full production setup for
Apache Kafka. We still need to cover monitoring, scaling, and managing
permissions.
Going through the Kafka-on-Kubernetes setup manually requires many steps
and a lot of online research. It can feel overwhelming to stand up a
production-ready cluster.
Are there any alternatives? Yes. You could choose a managed Kafka
provider. Several vendors offer to host and configure Kafka for you,
abstracting away aspects that require more expertise. While there are
many scenarios where going with a managed provider is an excellent
choice, there are also cases where that's not an option - for instance,
when you must keep data on your premises for regulatory reasons. That's
where Calisti comes into play.
Calisti offers you the best of both worlds. It
provides you with a managed Kafka installation - on your premises.
You get to run Kafka on your own Kubernetes cluster, without having to
be an expert right off the bat. Instead, you can build your expertise
step by step while running Kafka in production.
Calisti's free tier lets you try out the product with no strings
attached and no credit card required.
Provision a new Kubernetes Cluster
*If you already have a Kubernetes Cluster up and running, skip this
step.*
We use eksctl in this example to provision a new cluster on AWS. Create
a new config file:
# calisti-kafka.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: calisti-kafka
  region: us-east-2
  version: "1.22"
managedNodeGroups:
  - name: ng-1
    instanceType: m5.xlarge
    desiredCapacity: 5
    volumeSize: 80
    privateNetworking: true
We use this configuration file to provision a new EKS cluster with
m5.xlarge instance types:
$ eksctl create cluster -f calisti-kafka.yaml
This step can take up to 20 minutes to complete.
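eksctl also updates your local kubeconfig by default, so once the command finishes you can verify that the nodes are ready:
$ kubectl get nodes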
Calisti: Installation
Once you have a working Kubernetes cluster, go ahead and sign up for a
free Calisti account. Calisti comes with
a command line application that runs on your local machine. With this
command line application, we will install Calisti on your Kubernetes
cluster.
Step 1 - Download Binaries and Activation Credentials
Visit the Calisti download page to download binaries for your platform
and install them.
Once you have the smm binary in your PATH, we need to configure
activation credentials:
SMM_REGISTRY_PASSWORD=<PASSWORD> ./smm activate \
  --host=registry.eticloud.io \
  --prefix=smm \
  --user=<USER_ID>
(You can find these credentials in your Calisti account.)
Step 2 - Install Calisti
Next, we install the Calisti components into your Kubernetes cluster.
Run:
$ smm install -a --install-sdm
(This step can take up to 10 minutes)
This command installs various components, including all parts needed to
manage Kafka clusters.
We also need to configure a set of permissions (if you use Kubernetes on
AWS, GCP, or Azure):
kubectl create clusterrolebinding user-cluster-admin \
  --clusterrole=cluster-admin \
  --user=<aws/azure/gcp username>
Step 3 - Install Kafka
Let's install a new Kafka broker in your environment. First, let's make
sure Zookeeper is up and running (Calisti automatically installs it):
$ kubectl get pods -n zookeeper
NAME READY STATUS RESTARTS AGE
zookeeper-operator-77df49fd-gb9tm 2/2 Running 1 (10m ago) 10m
zookeeper-operator-post-install-upgrade-bnnww 0/1 Completed 0 10m
With Zookeeper up and running, let's provision a new broker:
$ smm sdm cluster create
This command creates:
- A service account called kafka-cluster used by the broker pods
- A KafkaCluster Custom Resource with:
  - 2 Kafka brokers
  - 2 internal listeners (ports 29092, 29093)
  - 10GB broker log dir storage
  - Zookeeper address: zookeeper-server-client.zookeeper:2181
Confirm that the cluster is up and running:
$ smm sdm cluster get
Namespace Name State Image Alerts Cruise Control Topic Status Rolling Upgrade Errors Rolling Upgrade Last Success
kafka kafka ClusterRunning ghcr.io/banzaicloud/kafka:2.13-3.1.0 0 CruiseControlTopicReady 0 2023-01-31 20:17:25
After this step, you are all set to produce and consume messages.
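As a quick smoke test, you could reuse the kcat deployment from earlier to list the new cluster's metadata. The bootstrap address below assumes the operator's default all-broker service in the kafka namespace; check kubectl get svc -n kafka for the actual service name in your installation:
$ kubectl exec --stdin --tty kcat-79fff6fcf8-pcqm9 -- kafkacat -L -b kafka-all-broker.kafka:29092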
Final Thoughts
It is possible to deploy a new Kafka cluster on Kubernetes from scratch.
This blog post provides you with the foundation to do just that. While
running Kafka in production requires additional configuration, the
instructions in this blog post are a good starting point.
In some cases, running Kafka on your own Kubernetes cluster can be too
much. While managed Kafka providers can sometimes be the answer, there
are also cases where Kafka needs to live in your own infrastructure.
That's where you can leverage Calisti. Calisti abstracts away the more
complex administration aspects so you can focus on providing value to
your users.