Clusterception Part 3: Getting started with Kafka

#kubernetes #apachekafka #strimzi

This post is part of a series on running Kafka on Kubernetes on Azure. You can find links to other posts in the series here. All code is available in my GitHub.

In this part, we'll install Kafka on our Kubernetes cluster using Strimzi and try it out.

As discussed in part 1 of this series, running Kafka on Kubernetes widens the choice of deployment platform. Managed Kubernetes is provided by almost every cloud provider, whereas managed Kafka is rarer.

Strimzi makes it quite simple to start up your Kafka cluster. Note that Strimzi is not the only option for running Kafka in Kubernetes; another prominent alternative is Confluent for Kubernetes.

Installing Strimzi

I'll install Strimzi on the Azure Kubernetes Service (AKS) cluster that we set up in the previous part of this series, following Strimzi's quickstart. Select the "Minikube" option; it should work the same way against AKS.

I'll first create a new namespace for our Kafka resources:

kubectl create namespace kafka

With version 0.32.0, Strimzi has introduced a one-line command for installation:

kubectl create \
  -f 'https://strimzi.io/install/latest?namespace=kafka' \
  -n kafka

The script deploys the Strimzi Cluster Operator, which will run and administer the Kafka resources, along with the service account and RBAC roles and bindings it needs to function. In addition, it installs several Custom Resource Definitions (CRDs), which enable declarative deployment of the Kafka resources that Strimzi supports.
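To check that the operator actually came up, something like the following should work. It's a sketch that assumes the deployment is called strimzi-cluster-operator (the name used by the install files I've seen; adjust if yours differs) and that you installed into the kafka namespace as above.

```shell
# Namespace created earlier in this post.
NAMESPACE=kafka
# Assumed name of the operator deployment; check with "kubectl get deployments".
OPERATOR=strimzi-cluster-operator

# Block until the operator deployment reports Available, or give up after 5 minutes.
kubectl wait "deployment/${OPERATOR}" --for=condition=Available --timeout=300s -n "${NAMESPACE}"

# List the pods; the operator pod should show as Running.
kubectl get pods -n "${NAMESPACE}"
```

If the wait times out, the operator logs (kubectl logs on the same deployment) are usually the first place to look.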

For convenience, I'll set kafka as the default namespace to avoid having to write it out each time:

kubectl config set-context --current --namespace=kafka

You can also check out the CRDs that were installed:

kubectl get crd

The console shows several resources with the word kafka in them. If interested, you can get further details with kubectl describe. For example:

kubectl describe crd kafkas.kafka.strimzi.io

Now that is a long one! Lucky that you don't need to implement all that yourself. 😄

Creating a Kafka cluster

With the CRDs created, I can deploy a Kafka cluster using a single resource definition in a YAML file. I'll start with the sample YAML provided by Strimzi in their quickstart, linked previously. All scripts used in this post are also available in the series' GitHub repository.

The YAML is as follows:
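For reference, here is a sketch of roughly what the quickstart's single-broker sample looked like around Strimzi 0.32.0, written out as a shell heredoc so you can create the file in one go. I've reconstructed it from the quickstart example rather than copied it from the repository, so treat the specific values (storage sizes, replication settings, the 3.3.1 Kafka version taken from the image tag used later in this post) as assumptions and compare against the current quickstart before relying on it.

```shell
# Write a single-broker Kafka cluster definition to kafka-cluster.yaml.
# Values below are assumptions based on the Strimzi quickstart sample.
cat > kafka-cluster.yaml <<'EOF'
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 3.3.1
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
      default.replication.factor: 1
      min.insync.replicas: 1
      inter.broker.protocol.version: "3.3"
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 100Gi
          deleteClaim: false
  zookeeper:
    replicas: 1
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}
EOF
```

The quoted 'EOF' keeps the shell from expanding anything inside the YAML, so the file ends up exactly as written.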

There are a lot of possible configurations when setting up the cluster. I'll not go into those in this post; that's a possible topic for the future. 🙂

I'll create the cluster with kubectl apply:

kubectl apply -f kafka-cluster.yaml
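The cluster takes a few minutes to come up. A small sketch for waiting until it's ready, assuming the cluster is named my-cluster as in the sample YAML and lives in the kafka namespace:

```shell
# Cluster name from metadata.name in the YAML (assumed to be the sample's my-cluster).
CLUSTER=my-cluster

# Block until Strimzi marks the Kafka resource as Ready, or give up after 5 minutes.
kubectl wait "kafka/${CLUSTER}" --for=condition=Ready --timeout=300s -n kafka

# The ZooKeeper, Kafka and entity operator pods should now be Running.
kubectl get pods -n kafka
```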

You can have a look at the Kubernetes Services that this created:

kubectl get service

There are several services prefixed with the name of your cluster; if you used the sample YAML, the prefix is my-cluster. These services include:

ZooKeeper: Apache ZooKeeper is a general-purpose coordination service for distributed systems, used by Kafka and by several other projects such as Hadoop. You shouldn't need to interact with it directly; it just works in the background. Kafka itself is moving away from the ZooKeeper dependency (KRaft mode), and Strimzi is working on supporting that to simplify the setup even further.
Kafka Brokers: As discussed in part 1, brokers are the actual worker servers containing all the topics and messages in Kafka. You can think of them as comparable to "nodes" in most other distributed services. We only have one broker in our setup, but we could scale up our cluster by simply increasing the value in spec.kafka.replicas in the YAML.
Bootstrap: Strimzi simplifies connecting to Kafka by providing a bootstrap service. Clients only need this one address: they use it to fetch the cluster metadata and then connect directly to whichever broker hosts the partitions of the target topic.

Now, you don't see an External IP on any of these services, and you'll need one for connecting to Kafka from outside Kubernetes. For this, you need to add an external listener. This is luckily easy to do - I'll add the following entry to spec.kafka.listeners:

- name: external
  port: 9094
  type: loadbalancer
  tls: false

If you now apply the YAML and list your services, you'll see a service with an external IP for your broker and an external bootstrap service. The external bootstrap works the same as the internal bootstrap already added - you can use this as the single entry point for clients.

I've now created an external listener of type loadbalancer, but there are other types. You can find more information in this series of posts by Strimzi.

WARNING: You'll also see that tls is set to false. This means that communication to Kafka is not encrypted, so it's highly insecure. I'll return to this topic in the next part of this series, where I configure security for the Kafka cluster.

For now, let's continue with these settings; however, don't send anything sensitive to your Kafka. After testing, you can remove the external listener or stop your AKS cluster to limit exposure.

Testing connectivity

The Strimzi quickstart includes instructions for testing the Kafka cluster from inside Kubernetes. In this post, we'll also try it from outside Kubernetes through the external listener. You can do this with the same Docker image used in the quickstart, but run with local Docker instead of Kubernetes, so you'll need Docker running.

First, get the external IP of your external bootstrap service, for example, my-cluster-kafka-external-bootstrap. With this in hand, start the console producer:

docker run -it --rm --name kafka-producer quay.io/strimzi/kafka:0.32.0-kafka-3.3.1 bin/kafka-console-producer.sh --bootstrap-server EXTERNAL_BOOTSTRAP_IP:9094 --topic my-topic

The setting -it connects an interactive terminal to the running container, and --rm automatically removes the container when you exit the console.
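If you'd rather not copy the IP by hand, you could capture it in a shell variable instead. This is a sketch that assumes the cluster is named my-cluster and that the load balancer reports an IP address rather than a hostname (which is what I'd expect on AKS); the variable names are my own.

```shell
# Service name follows the pattern <cluster>-kafka-external-bootstrap.
CLUSTER=my-cluster
SERVICE="${CLUSTER}-kafka-external-bootstrap"

# Read the load balancer IP from the service status; empty until Azure has assigned one.
EXTERNAL_BOOTSTRAP_IP=$(kubectl get service "${SERVICE}" -n kafka \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

echo "Bootstrap address: ${EXTERNAL_BOOTSTRAP_IP}:9094"
```

You can then pass "${EXTERNAL_BOOTSTRAP_IP}:9094" as the --bootstrap-server value in the docker commands.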

In another terminal, start the consumer:

docker run -it --rm --name kafka-consumer quay.io/strimzi/kafka:0.32.0-kafka-3.3.1 bin/kafka-console-consumer.sh --bootstrap-server EXTERNAL_BOOTSTRAP_IP:9094 --topic my-topic --from-beginning

Write a message in the producer console, and you should see it appear in the consumer console. If so, you now have a functioning Kafka cluster in AKS! 🎉

That's it for this post. Next time we'll look into setting up security for Kafka - see you then!