We have talked about how to design a real-time data pipeline, and one of its important components is Debezium. To load OLTP databases from various microservices into the data warehouse seamlessly, we often rely on Debezium.
In addition, in a previous article, we covered the schema evolution features of Apache Avro and the use of Confluent Schema Registry, which can be integrated to build a more automated real-time data pipeline.
However, starting with Debezium 2.0, the Debezium container images no longer bundle Confluent support, so if we want to use Debezium with Confluent Schema Registry, we have to build the Debezium Connect images ourselves.
The reference is as follows.
https://debezium.io/documentation/reference/stable/configuration/avro.html#deploying-confluent-schema-registry-with-debezium-containers
The original image repository is on GitHub. We can follow the Dockerfile of connect-base/2.0 to build the connect-base image.
In addition to the original dependencies, we need the following Confluent packages.
- kafka-connect-avro-converter
- kafka-connect-avro-data
- kafka-avro-serializer
- kafka-schema-serializer
- kafka-schema-registry-client
- common-config
- common-utils
Therefore, the Dockerfile should be as follows.
ARG DEBEZIUM_DOCKER_REGISTRY_PRIMARY_NAME
FROM $DEBEZIUM_DOCKER_REGISTRY_PRIMARY_NAME/kafka:2.0
LABEL maintainer="Debezium Community"
USER root
RUN microdnf -y install libaio && microdnf clean all
USER kafka
EXPOSE 8083
VOLUME ["/kafka/data","/kafka/logs","/kafka/config"]
COPY docker-entrypoint.sh /
COPY --chown=kafka:kafka log4j.properties $KAFKA_HOME/config/log4j.properties
COPY docker-maven-download.sh /usr/local/bin/docker-maven-download
#
# Set up the plugins directory ...
#
ENV KAFKA_CONNECT_PLUGINS_DIR=$KAFKA_HOME/connect \
    EXTERNAL_LIBS_DIR=$KAFKA_HOME/external_libs \
    CONNECT_PLUGIN_PATH=$KAFKA_CONNECT_PLUGINS_DIR \
    MAVEN_DEP_DESTINATION=$KAFKA_HOME/libs \
    CONFLUENT_VERSION=7.0.1 \
    AVRO_VERSION=1.10.1 \
    APICURIO_VERSION=2.2.5.Final \
    GUAVA_VERSION=31.0.1-jre
RUN mkdir "$KAFKA_CONNECT_PLUGINS_DIR" "$EXTERNAL_LIBS_DIR"
#
# The `docker-entrypoint.sh` script will automatically discover the child directories
# within the $KAFKA_CONNECT_PLUGINS_DIR directory (e.g., `/kafka/connect`), and place
# all of the files in those child directories onto the Java classpath.
#
# The general recommendation is to create a separate child directory for each connector
# (e.g., "debezium-connector-mysql"), and to place that connector's JAR files
# and other resource files in that child directory.
#
# However, use a single directory for connectors when those connectors share dependencies.
# This will prevent the classes in the shared dependencies from appearing in multiple JARs
# on the classpath, which results in arcane NoSuchMethodError exceptions.
#
RUN docker-maven-download confluent kafka-connect-avro-converter "$CONFLUENT_VERSION" fd03a1436f29d39e1807e2fb6f8e415a && \
    docker-maven-download confluent kafka-connect-avro-data "$CONFLUENT_VERSION" d27f30e9eca4ef1129289c626e9ce1f1 && \
    docker-maven-download confluent kafka-avro-serializer "$CONFLUENT_VERSION" c72420603422ef54d61f493ca338187c && \
    docker-maven-download confluent kafka-schema-serializer "$CONFLUENT_VERSION" 9c510db58119ef66d692ae172d5b1204 && \
    docker-maven-download confluent kafka-schema-registry-client "$CONFLUENT_VERSION" 7449df1f5c9a51c3e82e776eb7814bf1 && \
    docker-maven-download confluent common-config "$CONFLUENT_VERSION" aab5670de446af5b6f10710e2eb86894 && \
    docker-maven-download confluent common-utils "$CONFLUENT_VERSION" 74bf5cc6de2748148f5770bccd83a37c && \
    docker-maven-download central org/apache/avro avro "$AVRO_VERSION" 35469fee6d74ecbadce4773bfe3a204c && \
    docker-maven-download apicurio "$APICURIO_VERSION" f7874b2e2a59dfa829242d9a52b44230 && \
    docker-maven-download central com/google/guava guava "$GUAVA_VERSION" bb811ca86cba6506cca5d415cd5559a7
ENTRYPOINT ["/docker-entrypoint.sh"]
CMD ["start"]
Three environment variables are needed to pin the versions of these dependencies.
- CONFLUENT_VERSION: 7.0.1
- AVRO_VERSION: 1.10.1
- GUAVA_VERSION: 31.0.1-jre
The Confluent packages rely on Guava, a common utility library.
At the time of writing, the latest Confluent version is 7.3.3.
Avro and Guava are hosted in the official Maven repository (Maven Central).
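If you bump CONFLUENT_VERSION to a newer release, the checksum at the end of each docker-maven-download line has to change too. The sketch below is one way to compute it. It assumes the value is the JAR's MD5 digest (its 32 hex characters suggest so) and that the `confluent` and `central` keywords resolve to https://packages.confluent.io/maven and https://repo1.maven.org/maven2 with the standard Maven layout. The helper names `jar_url` and `jar_md5` are made up for this example.

```shell
#!/bin/sh
# Assumed repository base URLs (not taken from the Dockerfile itself).
CONFLUENT_REPO="https://packages.confluent.io/maven"
CENTRAL_REPO="https://repo1.maven.org/maven2"

# Print the standard Maven-layout URL of a JAR.
# $1 = repository base URL, $2 = group path, $3 = artifact, $4 = version
jar_url() {
  printf '%s/%s/%s/%s/%s-%s.jar\n' "$1" "$2" "$3" "$4" "$3" "$4"
}

# Download a JAR and print its MD5 digest.
# Takes the same arguments as jar_url; needs network access, curl and md5sum.
jar_md5() {
  curl -fsSL "$(jar_url "$@")" | md5sum | cut -d' ' -f1
}

# Show where one of the Confluent artifacts would be fetched from.
jar_url "$CONFLUENT_REPO" io/confluent kafka-connect-avro-converter 7.3.3
```

Running `jar_md5 "$CONFLUENT_REPO" io/confluent kafka-connect-avro-converter 7.3.3` should then print the value to paste into the matching Dockerfile line. Repeat it for each of the seven Confluent artifacts.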
Once we have connect-base, we can build connect with the main Dockerfile as-is, except that the base image should be our own.
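The two builds can be sketched as the commands below. This assumes a local checkout of the image repository with connect-base/2.0 (holding the modified Dockerfile) and connect/2.0 directories, and that the connect Dockerfile picks its base image through the same DEBEZIUM_DOCKER_REGISTRY_PRIMARY_NAME build argument. `myrepo` is a placeholder namespace, and quay.io/debezium is assumed to be the registry hosting the upstream kafka:2.0 image. The script only prints the commands, so nothing is built until you pipe its output to `sh`.

```shell
#!/bin/sh
UPSTREAM="quay.io/debezium"  # assumption: registry that hosts the upstream kafka image
MY_REPO="myrepo"             # placeholder: namespace for our own images
TAG="2.0"

# Print the docker build command for one image directory.
# $1 = registry/namespace that the Dockerfile's FROM line should resolve against
# $2 = image name, which is also the directory name (connect-base or connect)
build_cmd() {
  printf 'docker build --build-arg DEBEZIUM_DOCKER_REGISTRY_PRIMARY_NAME=%s -t %s/%s:%s %s/%s\n' \
    "$1" "$MY_REPO" "$2" "$TAG" "$2" "$TAG"
}

# connect-base is built on top of the upstream kafka image.
build_cmd "$UPSTREAM" connect-base
# connect is built on top of our own connect-base, so FROM points at MY_REPO.
build_cmd "$MY_REPO" connect
```

Tagging the first image as `myrepo/connect-base:2.0` is what lets the second build find it locally instead of pulling the upstream connect-base.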
Conclusion
Once the image is built, how do we verify the image works?
Here is an official Debezium tutorial.
https://github.com/debezium/debezium-examples/tree/main/tutorial
If the built image can pass the tutorial, then the image works.
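One quick check that goes a little beyond "the tutorial runs" is to ask Schema Registry whether the Avro schemas were actually registered. The sketch below assumes the tutorial's Avro setup is running with Schema Registry published on localhost:8081, and that the connector writes to a topic named dbserver1.inventory.customers, so the default subject name would be dbserver1.inventory.customers-value. The function names are made up for this example.

```shell
#!/bin/sh
# Assumed endpoint: Schema Registry published on its default port 8081.
REGISTRY_URL="http://localhost:8081"

# Succeed if the JSON text on stdin contains the quoted string $1.
json_contains() {
  grep -qF "\"$1\""
}

# Succeed if Schema Registry lists the subject $1.
# GET /subjects returns a JSON array of subject names; needs the stack to be running.
subject_registered() {
  curl -fsS "$REGISTRY_URL/subjects" | json_contains "$1"
}
```

With the stack up and a connector registered, `subject_registered dbserver1.inventory.customers-value && echo "Avro schema registered"` should print the message. If it stays silent, the worker is probably not using the Confluent Avro converter.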
By the way, if you need a Debezium 2.0 image with Confluent 7.0.1, here's a ready-made one.
https://hub.docker.com/r/wirelessr/debezium-connect-avro
Top comments (3)
Hey, just a question: how did you know that it needs Avro & Guava? The documentation doesn't mention them.
It's been a long time and I've forgotten, but I think I found it after testing. By the way, Avro is what I would have needed with Confluent Registry.
Thank you anyway! The post was really helpful. I wasn't able to make Debezium work with Avro despite following all the docs