Jan Schulte for Outshift By Cisco

Posted on May 16, 2023

Install and Maintain Apache Kafka® with a GitOps Approach

#kafka #gitops #devops #argocd

Previously, we showed you how you can install and manage Apache Kafka® on Kubernetes with Koperator. While a one-time installation is great, you can unlock more value by automating this process to replicate Kafka deployments as needed for different environments. In this blog post, I will walk you through automating Apache Kafka deployments with ArgoCD.

Setup

Before you get started, you’ll need the following:

a Kubernetes cluster on AWS
eksctl
Helm
ArgoCD

Provision a new cluster

Next, we’ll use eksctl to provision a new cluster. If you don't have eksctl on your machine yet, install it by following these instructions.

eksctl accepts either command-line arguments or a configuration file. Today, we'll use the following YAML configuration, saved as workstation.yaml, to set up our cluster:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: koperator-demo 
  region: us-east-2 
  version: "1.24"

managedNodeGroups:
  - name: ng-1
    labels: 
    instanceType: m5.xlarge
    desiredCapacity: 2
    volumeSize: 80
    privateNetworking: true

With the configuration in place, let's create a new cluster:

eksctl create cluster -f workstation.yaml

This step might take a while since it needs to provision a lot of resources.
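
Since eksctl also accepts command-line arguments, the same cluster can be described without a configuration file. The following is a rough sketch of that alternative, not an additional step: the flags are assumed to map one-to-one onto the fields of the configuration file. The script runs in preview mode and only prints the commands; run it with DRY_RUN set to an empty value to execute them.

```shell
# Preview mode: with DRY_RUN unset, each command is only printed.
# Run the script with DRY_RUN= (empty) to execute the commands for real.
DRY_RUN=${DRY_RUN-echo}

# Flag-based counterpart of the ClusterConfig file (assumed equivalent).
$DRY_RUN eksctl create cluster \
  --name koperator-demo \
  --region us-east-2 \
  --version 1.24 \
  --managed \
  --nodegroup-name ng-1 \
  --node-type m5.xlarge \
  --nodes 2 \
  --node-volume-size 80 \
  --node-private-networking

# After provisioning, the two worker nodes should be listed as Ready.
$DRY_RUN kubectl get nodes
```

The configuration file remains the better fit for a GitOps workflow, since it can be reviewed and versioned like any other file.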

Once the cluster is ready, you can continue installing ArgoCD. Argo CD is an open-source GitOps continuous delivery tool. It monitors your cluster and your declaratively defined infrastructure stored in a Git repository and resolves differences between the two.

Install ArgoCD

Run the following commands:

kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

This will install ArgoCD on your new cluster. Note that ArgoCD can deploy applications within the same cluster or remote clusters. For the sake of scope, we will deploy within the same cluster.

Next you’ll install the ArgoCD CLI:

brew install argocd

(If you're using a different operating system, please visit this link for install instructions.)

By default, the Argo CD API server is not exposed with an external IP. One way to expose it is to change the service type to LoadBalancer:

kubectl patch svc argocd-server -n argocd -p '{"spec": {"type": "LoadBalancer"}}'

Next, retrieve the initial admin password:

argocd admin initial-password -n argocd

For this walkthrough, we'll reach the API server and the web interface through port forwarding rather than an external IP:

kubectl port-forward svc/argocd-server -n argocd 8080:443

Next, you'll log in with the username admin and the initial password from the previous step:

argocd login localhost:8080

After you’re logged in, you’ll also need to configure a deployment target. In this case, we'll use the same cluster ArgoCD runs on:

argocd cluster add <your clustername>.us-east-2.eksctl.io

Now you’re good to go!

Continuous deployment and GitOps

In my previous blog post, I showed you how to set up a new Kafka cluster on Kubernetes in minutes. While that process will get you up and running quickly, the real value comes with an automated infrastructure setup. For that, we want to leverage a GitOps-based approach to deploy Apache ZooKeeper™ and Kafka, using ArgoCD.

We will use https://github.com/banzaicloud/koperator as a starting point and turn its instructions into Helm Charts, which we will deploy with ArgoCD.

Helm Charts

A Helm Chart makes application installations on Kubernetes reproducible. Usually, to install an application on Kubernetes, you'd run a series of kubectl apply commands to create various resources, such as deployments, pods, and services. A Helm Chart bundles these resources and allows us to parameterize them. A Helm Chart can be installed several times within a Kubernetes cluster under different names, so-called releases. We'll use a single repository that holds two Helm Charts: one for ZooKeeper and one for Koperator/Kafka.

ZooKeeper

To install and run ZooKeeper, we'll use the ZooKeeper operator from charts.pravega.io. This operator does the heavy lifting.

The ZooKeeper Helm Chart lists it as a dependency in Chart.yaml:

# Chart.yaml
apiVersion: v2
name: -setup
description: Installs  application
version: 0.1.0
appVersion: "1.16.0"
dependencies:
  - name: zookeeper-operator
    repository: https://charts.pravega.io
    version: "0.2.15"

Next, let's take a look at templates/zookeeper-cluster.yml:

apiVersion: zookeeper.pravega.io/v1beta1
kind: ZookeeperCluster
metadata:
    name: {{.Values.zookeeper.name }} 
    namespace: {{.Values.zookeeper.namespace }} 
spec:
    replicas: {{.Values.zookeeper.replicas }}
    persistence:
        reclaimPolicy: {{.Values.zookeeper.reclaimPolicy }}

When applied, this YAML creates a new ZooKeeper cluster. The template file contains no concrete values, only template variables. Because Helm lets us install a chart more than once, the template is parameterized so that the user can override values such as the cluster name or namespace.

Helm allows us to provide values in different ways. For now, we stick with a few defaults defined in values.yaml:

# Default values for koperator-setup.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

zookeeper:
  replicas: 1
  name: zookeeper
  namespace: zookeeper
  reclaimPolicy: Delete

As you can see here, all variables have sensible default values, allowing you to get up and running without edits. It's possible to override certain values on the command line when executing Helm.
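
As a sketch of such an override, the commands below change the replica count and the cluster name without touching values.yaml. They assume the chart lives in a local directory called zookeeper; the release name zookeeper-staging and the chosen values are made up for illustration. The script only prints the commands unless DRY_RUN is set to an empty value.

```shell
# Preview mode: with DRY_RUN unset, each command is only printed.
# Run the script with DRY_RUN= (empty) to execute the commands for real.
DRY_RUN=${DRY_RUN-echo}

# Fetch the zookeeper-operator dependency declared in Chart.yaml.
$DRY_RUN helm dependency update ./zookeeper

# Render the chart locally with two overrides; nothing is installed.
$DRY_RUN helm template zookeeper-staging ./zookeeper \
  --set zookeeper.replicas=3 \
  --set zookeeper.name=zookeeper-staging
```

Argo CD offers a comparable --helm-set flag on argocd app create and argocd app set, so the same overrides can be attached to an application instead of a local Helm run.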

With this Helm chart in place, let's move on to configuring ArgoCD. ArgoCD can deploy from various sources, such as Helm repositories or Git repositories. In this case, we're going with the latter. While we're choosing to create everything on the command line for conciseness, you can also perform the same steps in the web interface.

Let's apply what we have:

argocd app create zookeeper --repo https://github.com/schultyy/koperator-install.git --path zookeeper --dest-namespace zookeeper --dest-server https://kubernetes.default.svc --sync-option ServerSideApply=true --sync-option CreateNamespace=true
argocd app sync zookeeper

First, we create a new application and provide some configuration options, such as the Git repository, namespace, and server.

Next, we sync the application, which starts a deployment run. While the first step completes quickly, the second might take a few minutes to complete successfully. To confirm that all resources rolled out successfully, visit the ArgoCD dashboard.

Now, whenever we make any changes to our ZooKeeper Helm chart, we sync the app and ArgoCD takes care of rolling out the changes automatically.
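
To avoid the manual sync, Argo CD applications can also be described declaratively and synced automatically. The manifest below is a sketch of what the argocd app create call might look like as an Application resource; the file name zookeeper-app.yaml, the default project, and the automated sync policy with pruning and self-healing are assumptions, not part of the original setup. The final command is only printed unless DRY_RUN is set to an empty value.

```shell
# Preview mode: with DRY_RUN unset, the kubectl command is only printed.
# Run the script with DRY_RUN= (empty) to execute it for real.
DRY_RUN=${DRY_RUN-echo}

# Write the Application manifest (hypothetical file name).
cat > zookeeper-app.yaml <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: zookeeper
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/schultyy/koperator-install.git
    path: zookeeper
    targetRevision: HEAD
  destination:
    server: https://kubernetes.default.svc
    namespace: zookeeper
  syncPolicy:
    automated:
      prune: true      # remove resources that were deleted from Git
      selfHeal: true   # revert manual changes made in the cluster
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
EOF

# Register the application with Argo CD.
$DRY_RUN kubectl apply -n argocd -f zookeeper-app.yaml
```

Committing this manifest to Git alongside the charts keeps the Argo CD configuration itself under version control.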

Koperator and Kafka

Next on our list is Koperator and Kafka. We'll use Koperator to spin up and manage Kafka instances.

This Helm Chart, similar to ZooKeeper, makes use of several Custom Resource Definitions (CRDs). These CRDs will allow us later to:

Define and configure new Kafka clusters
Define Kafka topics

For Koperator itself, the project's README suggests installing the CRDs in a separate step. Therefore, we copy all CRDs from https://github.com/banzaicloud/koperator/releases/download/v0.24.1/kafka-operator.crds.yaml into a separate crd directory in this chart and list them in Chart.yaml:

apiVersion: v2
name: koperator
description: A Helm chart to install Koperator
type: application
version: 0.1.0
appVersion: "1.16.0"
dependencies:
  - name: kafka-operator
    repository: https://kubernetes-charts.banzaicloud.com
    version: "0.24.1"
# CRDs are defined here
crds:
  - kafkatopics.crds.yml
  - cruisecontrol.crds.yml
  - kafkauser.crds.yml
  - kafkacluster.crds.yml

We also declare our dependency on the Koperator Helm Chart.

When the Helm chart gets installed, Helm will automatically install these CRDs for us.

To get Kafka up and running, the key piece is the simplekafkacluster.yml file in the templates directory. This file also comes from the Koperator README. The only change is that some of its values are replaced with template variables so that we can parameterize it.

Additionally, this chart also contains a template to create one or more Kafka topics:

{{- range .Values.topics }}
---
apiVersion: kafka.banzaicloud.io/v1alpha1
kind: KafkaTopic
metadata:
    name: {{ .name }} 
spec:
    clusterRef:
        name: {{ $.Values.kafka.name }}
    name: {{ .name }}
    partitions: {{ .partitions }}
    replicationFactor: {{ .replicationFactor }}
{{- end }}

values.yaml has everything we need to start creating Kafka clusters:

# Install the Kafka cluster
kafka:
  name: kafka
  readOnlyConfig:
    autoCreateTopics: false

#Setup one or more topics
topics:
  - name: my-topic
    partitions: 1
    replicationFactor: 1
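
Because the topic template loops over the topics list, adding a topic only requires another list entry. The sketch below renders the chart with a second, made-up topic; the file name extra-topics.yaml, the topic name orders, and the local chart directory koperator are assumptions. Helm replaces lists from an override file wholesale instead of merging them, so my-topic is repeated. The helm commands are only printed unless DRY_RUN is set to an empty value.

```shell
# Preview mode: with DRY_RUN unset, the helm commands are only printed.
# Run the script with DRY_RUN= (empty) to execute them for real.
DRY_RUN=${DRY_RUN-echo}

# Override file with the original topic plus one additional entry.
cat > extra-topics.yaml <<'EOF'
topics:
  - name: my-topic
    partitions: 1
    replicationFactor: 1
  - name: orders          # hypothetical second topic
    partitions: 3
    replicationFactor: 1
EOF

# Fetch the kafka-operator dependency, then render the chart locally.
# The output should contain one KafkaTopic document per list entry.
$DRY_RUN helm dependency update ./koperator
$DRY_RUN helm template koperator ./koperator -f extra-topics.yaml
```

In the GitOps workflow itself, the same change would be made in the chart's values file, committed, and picked up by the next sync.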

With this in place, we are all set to create a new Kafka cluster and the first topic:

argocd app create koperator --repo https://github.com/schultyy/koperator-install.git --path koperator --dest-namespace kafka --dest-server https://kubernetes.default.svc --sync-option ServerSideApply=true --sync-option Validate=false --sync-option CreateNamespace=true
argocd app sync koperator
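
To check the result from the command line, something like the following should work. It assumes that the KafkaCluster and KafkaTopic resources end up in the kafka namespace and that the CRDs are registered under the plural names kafkaclusters and kafkatopics; the ten-minute timeout is an arbitrary choice. The commands are only printed unless DRY_RUN is set to an empty value.

```shell
# Preview mode: with DRY_RUN unset, each command is only printed.
# Run the script with DRY_RUN= (empty) to execute the commands for real.
DRY_RUN=${DRY_RUN-echo}

# Block until Argo CD reports the application as healthy (timeout in seconds).
$DRY_RUN argocd app wait koperator --health --timeout 600

# List the custom resources created by the chart.
$DRY_RUN kubectl get kafkaclusters,kafkatopics -n kafka

# The broker pods should eventually reach the Running state.
$DRY_RUN kubectl get pods -n kafka
```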

What's next?

With this workflow in place, you can declaratively deploy and manage your Kafka cluster from now on. With a few modifications to the Helm charts, you can create as many Kafka clusters as you like or need.

Check out the Helm charts used for these examples here and make sure to also star the koperator repository!
