This post was originally published on my personal blog.
Apache Kafka, a distributed streaming platform, has become a key component of many organizations' infrastructure and data platforms. As adoption of Kafka grows across an organization, it is important to manage the creation of topics and access control lists (ACLs) in a centralized and standardized manner. Without proper procedures around topic and ACL management, Kafka clusters can quickly become hard to manage from a data governance and security standpoint.
Today, I'll be discussing how to automate Kafka topic and ACL management and how it can be done with a continuous integration/continuous delivery (CI/CD) pipeline. I'll explain how to do this while following GitOps patterns: all topics and ACLs will be stored in version control. This is a model followed by many companies, both small and large, and can be applied to any Kafka cluster.
Although I'll be discussing in terms of organizations, these processes can be applied to local development clusters and smaller Kafka implementations as well.
As most developers who have used Kafka know, it is quite easy to create topics. They can be created with a single invocation of the `kafka-topics` tool or through various user interfaces. Before jumping into our tutorial, let's dive into some background.
Outside of the tools mentioned above, it is even easier to create topics – they are created automatically because the broker configuration `auto.create.topics.enable` is set to `true` by default. Although this configuration makes it easy to create topics, it is considered by most to be a bad practice. On some platforms, such as Confluent Cloud, it is not even possible to enable automatic topic creation.
Allowing the automatic creation of topics can be problematic:
- Security and access control become a lot harder to manage.
- Test topics and unused topics end up in the cluster and likely do not get cleaned up.
- Any developer or any service can create topics without giving thought to proper partitioning and potential overhead.
Outside of a development cluster, every topic should have a purpose that is understood and has an underlying business need to justify its existence. Additionally, allowing automatic topic creation does not solve the need for creating and managing ACLs.
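If you control the broker configuration, automatic topic creation can be disabled explicitly. A minimal `server.properties` fragment:

```properties
# Disable automatic topic creation so every topic must be
# created deliberately (e.g. via kafka-topics or tooling).
auto.create.topics.enable=false
```

With this in place, producing to or fetching from a nonexistent topic fails instead of silently creating it.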
The next logical step most organizations take is to create topics manually through tools such as
kafka-topics or Confluent Control Center. This usually happens when Kafka is fairly new to an organization or used by a small group of people, e.g. a team or two.
Manually creating topics and ACLs only works until the usage of Kafka within an organization starts to grow. There are typically two patterns followed with manual topic creation:
Anyone has access: All developers/operations team members who can access the cluster can create topics as well as ACLs. This leads to topic naming standards and security best practices being thrown out the window. If anyone can make ACLs, there is no real security on the cluster.
Operations has access: A centralized operations team manages topics & ACLs manually through a change management/request process. Although this allows for some governance to be enforced, it leaves an operations team doing manual work.
A major issue with manual topic & ACL creation is that it is not repeatable. It may be enticing to use a web interface to quickly create topics, but more often than not it becomes a pain point in the future.
Imagine a scenario where you want to migrate to a new cluster or spin up a new environment; how easy is it to re-create all of the topics, topic configurations, and ACLs if they are not defined and easily accessible? It's pretty hard.
After manual topic & ACL creation becomes a limiting factor, teams usually seek to build tooling and automation around it. Most organizations in today's world are automating as much as they can. We see automation around immutable infrastructure, deploying applications, managing business processes, and much more.
The first step in automating the creation of Kafka resources is usually a simple Python or Bash script. Teams might define their topics and ACLs in files such as JSON or YAML. These scripts are then either run by teams themselves or included in a continuous integration process.
Unfortunately, these scripts are usually quick-and-dirty. They often cannot easily change topic configurations, delete unneeded topics, or provide insight into what your actual cluster has defined in terms of topics and ACLs. Lastly, ACLs can be quite verbose: it can be hard to understand the needed ACLs depending on the complexity of the application and its security needs (e.g. Kafka Connect is much more complicated than a simple consumer).
GitOps, as commonly found in Kubernetes deployment models, is a pattern centered around using a version control system (such as Git) to house information and code describing a system. This information is then used in an automated fashion to make changes to infrastructure (such as deploying a new Kubernetes workload).
This pattern is essentially how most implementations of Terraform work: infrastructure gets defined in Terraform configuration files, a plan with the desired changes is generated, and then the plan is executed to apply those changes.
Note: This blog post describes how to manage topics & ACLs with GitOps, and not an actual Apache Kafka cluster deployment.
In this tutorial, I'll be introducing a tool called kafka-gitops. This project is a resources-as-code tool which allows users to automate the management of Apache Kafka topics and ACLs. Before we dive in, I'd like to introduce some terminology:
- Desired State: A file describing what your Kafka cluster state should look like.
- Actual State: The live state of what your Kafka cluster currently looks like.
- A Plan: A set of topic and/or ACL changes to apply to your Kafka cluster.
Topics and services are defined in a YAML desired state file. When run,
kafka-gitops compares your desired state to the actual state of the cluster and generates a plan to execute against the cluster. The plan will include any creates, updates, or deletes to topics, topic configurations, and ACLs. After validating the plan looks correct, it can be applied and will make your topics and ACLs match your desired state.
On top of topic management, if your cluster has security,
kafka-gitops can generate the needed ACLs for most applications. There is no need to manually define a bunch of ACLs for Kafka Connect or Kafka Streams. By defining your services,
kafka-gitops will build the applicable ACLs.
The major features of
kafka-gitops compared to other management tools:
- 🚀 Built For CI/CD: Made for CI/CD pipelines to automate the management of topics & ACLs.
- 🔥 Configuration as code: Describe your desired state and manage it from a version-controlled declarative file.
- 👍 Easy to use: Deep knowledge of Kafka administration or ACL management is NOT required.
- ⚡️️ Plan & Apply: Generate and view a plan with or without executing it against your cluster.
- 💻 Portable: Works across self-hosted clusters, managed clusters, and even Confluent Cloud clusters.
- 🦄 Idempotency: Executing the same desired state file on an up-to-date cluster will yield the same result.
- ☀️ Continue from failures: If a specific step fails during an apply, you can fix your desired state and re-run the command. You can execute `kafka-gitops` again without needing to roll back any partial successes.
I'll provide an overview of how
kafka-gitops works and how it can be applied to any Kafka cluster. An in-depth tutorial on how to use it will be posted in the next blog post; otherwise, the documentation has a great getting started guide.
Reminder: This tool works on all newer Kafka clusters, including self-hosted Kafka, managed Kafka solutions, and Confluent Cloud.
Topics and services that interact with your Kafka cluster are defined in a YAML file, named
state.yaml by default.
Example desired state file:
```yaml
topics:
  test-topic:
    partitions: 6
    replication: 3
    configs:
      cleanup.policy: compact

services:
  test-service:
    type: application
    principal: User:testservice
    produces:
      - test-topic
```
This state file defines two things:
- A compacted topic named test-topic with six partitions and a replication factor of three.
- An application service named test-service tied to the principal User:testservice.
The type of the service tells `kafka-gitops` what type of ACLs to generate. For an application service, it will generate the ACLs needed for producing to and/or consuming from the service's specified topics. In this case, `kafka-gitops` will generate a WRITE ACL for the topic test-topic.
Currently, three types of services are supported: application, kafka-connect, and kafka-streams. Each service has a slightly different schema due to the nature of the service.
Example Kafka Streams service:
```yaml
services:
  my-stream:
    type: kafka-streams
    principal: User:mystream
    consumes:
      - test-topic
    produces:
      - test-topic
```
Kafka Streams services have special ACLs included for managing internal streams topics.
Example Kafka Connect service:
```yaml
services:
  my-connect-cluster:
    type: kafka-connect
    principal: User:myconnect
    connectors:
      rabbitmq-sink:
        consumes:
          - test-topic
```
Kafka Connect services have special ACLs for working with their internal topics as well as defined ACLs for each running connector.
Essentially, all topics and all services for a specific cluster get put into this YAML file. If you are not using security, such as on a local development cluster, you can omit the services block.
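As an illustration (the topic names here are hypothetical), a desired state file for an unsecured local development cluster can omit the services block entirely and contain only topic definitions, following the same schema as the earlier example:

```yaml
# Desired state for a local development cluster without security:
# no services block is needed, only topics.
topics:
  orders:
    partitions: 3
    replication: 1
  customers:
    partitions: 3
    replication: 1
    configs:
      cleanup.policy: compact
```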
Note: For full examples and specific requirements for each service, read the services documentation page. The specification for the desired state file and its schema can be found on the specification documentation page.
Once your desired state file is created, you can generate a plan of changes to be applied against the cluster.
`kafka-gitops` is configured to connect to clusters via environment variables. See the documentation for more details.
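As a sketch, connection settings are exported before running the tool. The exact variable names below are assumptions based on common Kafka client conventions, so check the kafka-gitops documentation for the authoritative list:

```shell
# Assumed environment variables for cluster connectivity;
# verify the exact names against the kafka-gitops documentation.
export KAFKA_BOOTSTRAP_SERVERS="broker-1:9092,broker-2:9092"
export KAFKA_SECURITY_PROTOCOL="SASL_SSL"
```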
This does NOT actually change the cluster. We can generate the plan by running:
```shell
kafka-gitops -f state.yaml plan -o plan.json
```
This will output a JSON file containing the plan as well as a prettified output describing the changes. This is an example plan for the first `state.yaml` file described above, including only the topics block:
```text
Generating execution plan...

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create
  ~ update
  - delete

The following actions will be performed:

Topics: 1 to create, 0 to update, 0 to delete.
+ [TOPIC] test-topic

ACLs: 0 to create, 0 to update, 0 to delete.

Plan: 1 to create, 0 to update, 0 to delete.
```
If there are topics or ACLs on the cluster that are not in the desired state file, the plan will include changes to update and/or delete them.
Note: It is possible to disable deletion by passing the --no-delete flag.
Once the plan is created, we can apply the changes to the cluster.
Warning: This WILL change the cluster to match the plan generated from the desired state file. Without the --no-delete flag, this can be destructive.
Changes are applied using the apply command:
```shell
kafka-gitops -f state.yaml apply -p plan.json
```
This will execute the changes to the running Kafka cluster and output the results.
```text
Executing apply...

Applying: [CREATE]
+ [TOPIC] test-topic
Successfully applied.

[SUCCESS] Apply complete! Resources: 1 created, 0 updated, 0 deleted.
```
If there is a partial failure, successes will not be rolled back. Instead, fix the error in the desired state file or manually within the cluster and rerun plan and apply.
After a successful apply, you can re-run the plan command to generate a new plan – except this time, there should be no changes, since your cluster is up to date with your desired state file!
On top of the features briefly described above, `kafka-gitops` also supports:
- Automatically creating Confluent Cloud service accounts.
- Splitting the `services` blocks into their own files.
- Ignoring specific topics from being deleted when not defined in the desired state file.
- Defining custom ACLs to a specific service (e.g. for a service such as Confluent Control Center).
Now that we've had an overview of how kafka-gitops works, we can examine how to put this workflow into action within an organization. First, we can define typical roles within an organization:
- Developers: Engineers who are writing applications and services utilizing Kafka.
- Operations: Engineers who manage, monitor, and maintain Kafka infrastructure.
- Security: Engineers who are responsible for security operations within an organization.
Next, we can define an example setup and process for a GitOps workflow. This is not a one-size-fits-all answer – a lot depends on the organization and culture; however, this is a generalized approach that will work well if implemented correctly.
A scalable implementation of the kafka-gitops workflow within an organization looks like this:
- All desired state files are stored within a repository owned by Operations.
- Operations owns the `master` branch, which should reflect the live state of every cluster.
- Developers fork this repository to make changes to their topics & services.
- Developers create a pull request with their changes and mark it ready to review by Operations and Security.
- Operations and Security review the changes and merge them to `master`.
- A CI/CD system kicks off a `kafka-gitops plan` build to generate a new plan.
- (Optional) The plan output is reviewed by Operations, ensuring it looks correct.
- The plan is then applied, either manually by Operations or automatically, through `kafka-gitops apply`. The desired changes will then be reflected in the live cluster, and the cluster will match the desired state file in `master`.
As described above, all topics and services (which includes ACLs) are defined in version-controlled code. Developers are responsible for their topic and service definitions. Operations is responsible for managing the changes to the cluster (e.g. ensuring teams are not doing crazy things) as well as responsible for deploying the changes. Security is responsible for ensuring sensitive data is being properly locked down to the services that require it.
- Create a centralized git repository for storing Kafka cluster desired state files.
- In that repository, create folders for each environment and/or cluster.
- In each cluster's folder, create its state file. Define any existing topics, services, and ACLs.
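Such a repository might be laid out like this (the environment names are illustrative):

```text
kafka-clusters/
├── development/
│   └── state.yaml
├── staging/
│   └── state.yaml
└── production/
    └── state.yaml
```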
Note: If adding this workflow to an existing Kafka cluster, the easiest way to get set up is to repeatedly run `plan` against the live cluster as you update the desired state file to contain the correct information. Continue to do this until there are no changes planned.
Setting up CI/CD is highly dependent on which build system you are using. This is a general outline of how it could be configured:
- Set up a main CI job that is triggered on changes to the `master` branch.
- The main job should look for changes in each desired state file.
- For each desired state file with a change, trigger a side job.
- The side job(s) should utilize `kafka-gitops plan` to generate an execution plan.
- (Optional) The side job(s) should then wait until Operations can review the generated plan.
- The side job(s) should then utilize `kafka-gitops apply` to execute the planned changes against the specified Kafka cluster.
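As an illustration only, a GitLab CI configuration following this outline might look like the sketch below. The job names, Docker image, and file layout are all assumptions for this example, not part of kafka-gitops itself:

```yaml
# Hypothetical GitLab CI sketch for one cluster's state file.
# Assumes a container image with kafka-gitops installed and
# cluster credentials provided via CI/CD variables.
stages:
  - plan
  - apply

plan-production:
  stage: plan
  image: registry.example.com/kafka-gitops:latest  # hypothetical image
  script:
    - kafka-gitops -f production/state.yaml plan -o plan.json
  artifacts:
    paths:
      - plan.json
  only:
    - master

apply-production:
  stage: apply
  image: registry.example.com/kafka-gitops:latest  # hypothetical image
  script:
    - kafka-gitops -f production/state.yaml apply -p plan.json
  when: manual  # optional gate so Operations can review the plan first
  only:
    - master
```

The `when: manual` gate implements the optional review step: the apply job waits until an Operations engineer triggers it after inspecting the generated plan artifact.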
Once the full process is in place, you gain many benefits that allow you to easily govern the clusters as the adoption of Kafka continues within an organization.
- Developers have a well-defined process to follow to create topics & services.
- Operations has control over what is changing within the cluster and can ensure standards are followed.
- Security can easily audit and monitor access changes to data within the streaming platform.
- A defined process to make any changes to the Kafka cluster; no manual steps.
- A full audit log and history of changes to your cluster via version control.
- Automatic ACL generation for common services, reducing time spent on security.
- The ability to re-create a cluster's complete topic and ACL setup (e.g. for a new environment).
While `kafka-gitops` is actively being used in production, there are a few upcoming features to address some limitations:
- The ability to set a custom `group.id` for consumers & streams applications (currently, this must match the service name)
- The ability to set custom connect topic names (currently, this has a predefined pattern)
- Tooling around creating the initial desired state file from existing clusters
- Eventually, the optional ability to run it as-a-service to actively monitor for changes and source from locations such as git, AWS S3, etc.
Automating the management of Kafka topics and ACLs brings significant benefits to all teams working with Apache Kafka. Whether working with a large enterprise set of clusters or defining topics for your local development cluster, the GitOps pattern allows for easy, repeatable cluster resource definitions.
By adopting a GitOps pattern for managing your Kafka topics and ACLs, your organization can reduce time spent managing Kafka and spend more time providing value to your core business.
In some upcoming blog posts, I will be providing in-depth tutorials on using
kafka-gitops with self-hosted clusters and with Confluent Cloud.