Apache Kafka is a distributed streaming platform that can act as a message broker, as the heart of a stream processing pipeline, or even as the backbone of a large enterprise data synchronization system. Kafka is not only highly available and fault tolerant; it is also designed for significantly higher throughput than traditional message brokers such as RabbitMQ or ActiveMQ.
In this tutorial, you will utilize Docker & Docker Compose to run Apache Kafka & ZooKeeper. Docker with Docker Compose is the quickest way to get started with Apache Kafka and to experiment with clustering and the fault-tolerant properties Kafka provides. A full Docker Compose setup with 3 Kafka brokers and 1 ZooKeeper node can be found here.
Prerequisites
To complete this tutorial, you will need:
- A UNIX environment (Mac or Linux)
- Docker & Docker Compose
Note: Docker can be installed by following the official installation guide.
System Architecture
Before running Kafka with Docker, let's examine the architecture of a simple Apache Kafka setup.
- Kafka Cluster: A group of Kafka brokers forming a distributed system
- Kafka Broker: An instance of Kafka that holds topics of data
- ZooKeeper: A centralized system for storing and managing configuration
- Producer: A client that sends messages to a Kafka topic
- Consumer: A client that reads messages from a Kafka topic
Kafka utilizes ZooKeeper to manage and coordinate brokers within a cluster. Producers and consumers are the main clients that interact with Kafka, which we'll take a look at once we have a running Kafka broker.
The above diagram shows the architecture of the systems we are going to run in this tutorial. It also helps demonstrate how Kafka brokers utilize ZooKeeper and shows the ports of the running services. In this tutorial, we'll start by running one Apache Kafka broker and one ZooKeeper node (seen above in blue). Later on, we'll form a three node cluster by adding in two more Kafka brokers (seen above in green).
Running ZooKeeper in Docker
Ensure you have Docker installed and running. You can verify this by running the following command; you should see output similar to this:
docker -v
> Docker version 18.09.2, build 6247962
Additionally, verify you have Docker Compose installed:
docker-compose -v
> docker-compose version 1.23.2, build 1110ad01
We're ready to begin! Create a directory, such as ~/kafka, to store our Docker Compose files. Using your favorite text editor or IDE, create a file named docker-compose.yml in your new directory.
We'll start by getting ZooKeeper running. In the Docker Compose YAML file, define a zookeeper service as shown below:
version: '3'

services:
  zookeeper:
    image: zookeeper:3.4.9
    hostname: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOO_MY_ID: 1
      ZOO_PORT: 2181
      ZOO_SERVERS: server.1=zookeeper:2888:3888
    volumes:
      - ./data/zookeeper/data:/data
      - ./data/zookeeper/datalog:/datalog
A brief overview of what we're defining:
- Line 1: Docker Compose file format version, set to 3
- Line 4: Starting the definition of a ZooKeeper service
- Line 5: The Docker image to use for ZooKeeper and its version
- Line 6: The hostname the container will use when running
- Lines 7-8: The ports to expose to the host; ZooKeeper's default port
- Line 10: The unique ID of this ZooKeeper instance, set to 1
- Line 11: The port this ZooKeeper instance should run on
- Line 12: The list of ZooKeeper servers; in our case, just one
- Lines 13-15: Mapping volumes on the host to store ZooKeeper data
Note: We've mapped ./data/zookeeper on the host to directories within the container. This allows ZooKeeper to persist data even if you destroy the container.
We can now start ZooKeeper by running the following command in the directory containing the docker-compose.yml file:
docker-compose up
Logs will start printing, and should end with a line similar to this:
zookeeper_1 | ... binding to port 0.0.0.0/0.0.0.0:2181
Congrats! ZooKeeper is running and exposed on port 2181. You can verify this utilizing netcat in a new terminal window:
echo ruok | nc localhost 2181
> imok
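For a bit more detail, ZooKeeper's stat four-letter command (enabled by default in this version) reports version, connection, and mode information. The exact output will vary, but it should look roughly like this:
echo stat | nc localhost 2181
> Zookeeper version: 3.4.9-..., built on ...
> ...
> Mode: standalone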
Running Kafka In Docker
We can now add our first kafka service to our Docker Compose file. We're calling it kafka2 because it will have a broker ID of 2 and run on Kafka's default port of 9092. Later on, we'll add kafka1 and kafka3 to demonstrate that order doesn't matter and that broker IDs are just for identification.
version: '3'

services:
  ...
  kafka2:
    image: confluentinc/cp-kafka:5.3.0
    hostname: kafka2
    ports:
      - "9092:9092"
    environment:
      KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://kafka2:19092,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
      KAFKA_ZOOKEEPER_CONNECT: "zookeeper:2181"
      KAFKA_BROKER_ID: 2
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    volumes:
      - ./data/kafka2/data:/var/lib/kafka/data
    depends_on:
      - zookeeper
If you prefer, copy the full gist found here. A brief overview of what we're defining:
- Line 6: The Docker image to use for Kafka; we're using the Confluent image
- Line 7: The hostname this Kafka broker will use when running
- Lines 8-9: The ports to expose; set to Kafka's default (9092)
- Line 11: Kafka's advertised listeners. Robin Moffatt has a great blog post about this.
- Line 12: Security protocols to use for each listener
- Line 13: The inter-broker listener name (used for internal communication)
- Line 14: The list of ZooKeeper nodes Kafka should use
- Line 15: The broker ID of this Kafka broker
- Line 16: The replication factor of the consumer offsets topic (1, since we currently have a single broker)
- Lines 17-18: Mapping volumes on the host to store Kafka data
- Lines 19-20: Start the ZooKeeper service before the Kafka service
Let's start the Kafka broker! In a new terminal window, run the following command in the same directory:
docker-compose up
ZooKeeper should still be running in another terminal, and if it isn't, Docker Compose will start it. You'll see a lot of logs being printed and then Kafka should be running! We can verify this by creating a topic.
If you have the Kafka command line tools installed, run:
kafka-topics --zookeeper localhost:2181 --create --topic new-topic --partitions 1 --replication-factor 1
> Created topic "new-topic".
If you don't have the Kafka command line tools installed, you can run a command using Docker as well:
docker exec -it kafka_kafka2_1 kafka-topics --zookeeper zookeeper:2181 --create --topic new-topic --partitions 1 --replication-factor 1
> Created topic "new-topic".
If you get any errors, verify both Kafka and ZooKeeper are running with docker ps and check the logs from the terminals running Docker Compose.
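To see the INTERNAL/EXTERNAL listener split from the Compose file in action, you can ask the broker for its supported API versions from both sides of the network boundary. This is an optional check; it assumes the Kafka command line tools are on your host's PATH and that your project directory is named kafka (so the container is kafka_kafka2_1):
# From the host: bootstrap via the EXTERNAL listener advertised as 127.0.0.1:9092
kafka-broker-api-versions --bootstrap-server localhost:9092
# From inside the Docker network: use the INTERNAL listener advertised as kafka2:19092
docker exec -it kafka_kafka2_1 kafka-broker-api-versions --bootstrap-server kafka2:19092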
Yay! You now have the simplest Kafka cluster running within Docker. Kafka with broker ID 2 is exposed on port 9092 and ZooKeeper on port 2181. Data for this Kafka cluster is stored in ./data/kafka2.
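As one more sanity check (not part of the original steps), you can produce and consume a message on the new-topic topic from inside the container, using the internal listener as above:
docker exec -it kafka_kafka2_1 bash -c "echo 'hello kafka' | kafka-console-producer --broker-list kafka2:19092 --topic new-topic"
docker exec -it kafka_kafka2_1 kafka-console-consumer --bootstrap-server kafka2:19092 --topic new-topic --from-beginning --max-messages 1
> hello kafka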
To stop the containers, you can use ctrl + c or cmd + c in the terminal windows running Docker Compose. If they don't stop, you can run docker-compose down. If the containers aren't removed as part of down, you can run docker-compose rm.
Running Three Kafka Brokers In Docker
To run three brokers, we need to add two more kafka services to our Docker Compose file. We'll run broker 1 on port 9091 and broker 3 on port 9093.
Add two more services like so:
version: "3"
services:
...
kafka1:
image: confluentinc/cp-kafka:5.3.0
hostname: kafka1
ports:
- "9091:9091"
environment:
KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://kafka1:19091,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9091
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
KAFKA_ZOOKEEPER_CONNECT: "zookeeper:2181"
KAFKA_BROKER_ID: 1
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
volumes:
- ./data/kafka1/data:/var/lib/kafka/data
depends_on:
- zookeeper
kafka3:
image: confluentinc/cp-kafka:5.3.0
hostname: kafka3
ports:
- "9093:9093"
environment:
KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://kafka3:19093,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9093
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
KAFKA_ZOOKEEPER_CONNECT: "zookeeper:2181"
KAFKA_BROKER_ID: 3
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
volumes:
- ./data/kafka3/data:/var/lib/kafka/data
depends_on:
- zookeeper
You can find a full gist with ZooKeeper and three Kafka brokers here. Essentially, we update the ports, the broker ID, and the data directory on the host.
Note: In a production setup, you'd want the offsets topic replication factor to be higher than 1 (typically 3), but for the purposes of this tutorial I've left it at one since we started with a single broker.
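For reference, bumping it for all three brokers is just a matter of changing that one environment value in each kafka service. Do this before the cluster's first start; the offsets topic is created once, on first use, so changing the value afterwards won't re-replicate an existing topic:
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3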
We can now verify that all three brokers are running by creating a topic with a replication factor of 3:
docker exec -it kafka_kafka2_1 kafka-topics --zookeeper zookeeper:2181 --create --topic three-isr --partitions 1 --replication-factor 3
> Created topic "three-isr".
If you receive an error, ensure all three Kafka brokers are running. Woohoo! You've now got a Kafka cluster with three brokers running.
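You can also describe the topic to confirm its replicas landed on all three brokers. The exact leader and replica ordering in your output may differ:
docker exec -it kafka_kafka2_1 kafka-topics --zookeeper zookeeper:2181 --describe --topic three-isr
> Topic:three-isr  PartitionCount:1  ReplicationFactor:3  Configs:
>   Topic: three-isr  Partition: 0  Leader: 2  Replicas: 2,1,3  Isr: 2,1,3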
Conclusion
Congrats! You've successfully started a local Kafka cluster using Docker and Docker Compose. Data is persisted outside of the containers on the local machine, which means you can delete containers and restart them without losing data. For next steps, I'd suggest playing around with Kafka's fault tolerance and replication features.
For example, you could create a topic with a replication factor of 3, produce some data, delete broker 2 along with its data directory (./data/kafka2), then start broker 2 again and see that the data is replicated back to it. Pretty cool!
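Here's one way to run that experiment, sketched under the same assumptions as before (project directory named kafka, and a hypothetical topic name replicated). The rm -rf step is destructive, so only point it at this tutorial's data directory:
# create a topic replicated across all three brokers
docker exec -it kafka_kafka2_1 kafka-topics --zookeeper zookeeper:2181 --create --topic replicated --partitions 1 --replication-factor 3
# stop broker 2 and wipe its on-disk data
docker-compose stop kafka2
rm -rf ./data/kafka2
# bring broker 2 back; it rejoins the cluster and re-syncs from the other replicas
docker-compose up -d kafka2
# once it has caught up, broker 2 reappears in the Isr list
docker exec -it kafka_kafka2_1 kafka-topics --zookeeper zookeeper:2181 --describe --topic replicated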
For full sets of Docker Compose files for running various Kafka Cluster setups, check out Stephane Maarek's kafka-stack-docker-compose repository. This post was inspired by it. :-).