We have talked about how to design a real-time data pipeline, and one of its important components is Debezium. To load OLTP databases from various microservices into the data warehouse seamlessly, we often rely on Debezium.
In addition, in a previous article, we covered the schema evolution features of Apache Avro and the use of Confluent Schema Registry, which can be integrated to build a more automated real-time data pipeline.
However, Debezium 2.0 removed the built-in Confluent support, so if we want to use Debezium with Confluent Schema Registry, we have to build the Debezium Connect images ourselves.
The original image repository is on GitHub. We can follow the Dockerfile of connect-base/2.0 to build the connect-base image.
In addition to the original dependencies, we need the libraries that provide Confluent support.
Therefore, the Dockerfile should look as follows.
    ARG DEBEZIUM_DOCKER_REGISTRY_PRIMARY_NAME
    FROM $DEBEZIUM_DOCKER_REGISTRY_PRIMARY_NAME/kafka:2.0

    LABEL maintainer="Debezium Community"

    USER root
    RUN microdnf -y install libaio && microdnf clean all

    USER kafka

    EXPOSE 8083
    VOLUME ["/kafka/data","/kafka/logs","/kafka/config"]

    COPY docker-entrypoint.sh /
    COPY --chown=kafka:kafka log4j.properties $KAFKA_HOME/config/log4j.properties
    COPY docker-maven-download.sh /usr/local/bin/docker-maven-download

    #
    # Set up the plugins directory ...
    #
    ENV KAFKA_CONNECT_PLUGINS_DIR=$KAFKA_HOME/connect \
        EXTERNAL_LIBS_DIR=$KAFKA_HOME/external_libs \
        CONNECT_PLUGIN_PATH=$KAFKA_CONNECT_PLUGINS_DIR \
        MAVEN_DEP_DESTINATION=$KAFKA_HOME/libs \
        CONFLUENT_VERSION=7.0.1 \
        AVRO_VERSION=1.10.1 \
        APICURIO_VERSION=2.2.5.Final \
        GUAVA_VERSION=31.0.1-jre

    RUN mkdir "$KAFKA_CONNECT_PLUGINS_DIR" "$EXTERNAL_LIBS_DIR"

    #
    # The `docker-entrypoint.sh` script will automatically discover the child directories
    # within the $KAFKA_CONNECT_PLUGINS_DIR directory (e.g., `/kafka/connect`), and place
    # all of the files in those child directories onto the Java classpath.
    #
    # The general recommendation is to create a separate child directory for each connector
    # (e.g., "debezium-connector-mysql"), and to place that connector's JAR files
    # and other resource files in that child directory.
    #
    # However, use a single directory for connectors when those connectors share dependencies.
    # This will prevent the classes in the shared dependencies from appearing in multiple JARs
    # on the classpath, which results in arcane NoSuchMethodError exceptions.
    #
    RUN docker-maven-download confluent kafka-connect-avro-converter "$CONFLUENT_VERSION" fd03a1436f29d39e1807e2fb6f8e415a && \
        docker-maven-download confluent kafka-connect-avro-data "$CONFLUENT_VERSION" d27f30e9eca4ef1129289c626e9ce1f1 && \
        docker-maven-download confluent kafka-avro-serializer "$CONFLUENT_VERSION" c72420603422ef54d61f493ca338187c && \
        docker-maven-download confluent kafka-schema-serializer "$CONFLUENT_VERSION" 9c510db58119ef66d692ae172d5b1204 && \
        docker-maven-download confluent kafka-schema-registry-client "$CONFLUENT_VERSION" 7449df1f5c9a51c3e82e776eb7814bf1 && \
        docker-maven-download confluent common-config "$CONFLUENT_VERSION" aab5670de446af5b6f10710e2eb86894 && \
        docker-maven-download confluent common-utils "$CONFLUENT_VERSION" 74bf5cc6de2748148f5770bccd83a37c && \
        docker-maven-download central org/apache/avro avro "$AVRO_VERSION" 35469fee6d74ecbadce4773bfe3a204c && \
        docker-maven-download apicurio "$APICURIO_VERSION" f7874b2e2a59dfa829242d9a52b44230 && \
        docker-maven-download central com/google/guava guava "$GUAVA_VERSION" bb811ca86cba6506cca5d415cd5559a7

    ENTRYPOINT ["/docker-entrypoint.sh"]
    CMD ["start"]
Three additional environment variables specify the versions of the new dependencies.
- CONFLUENT_VERSION: 7.0.1
- AVRO_VERSION: 1.10.1
- GUAVA_VERSION: 31.0.1-jre
The Confluent packages depend on Guava, a common utility library.
At the time of writing, the latest Confluent version is 7.3.3.
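As a rough sketch, the build step for this Dockerfile could be scripted as below. The registry value `debezium` (so that the FROM line resolves to debezium/kafka:2.0) and the target repository name `myrepo` are assumptions for illustration, not values from the original repository; the function only prints the command so it can be reviewed first.

```shell
#!/bin/sh
# Assumption: "debezium" makes the FROM line resolve to debezium/kafka:2.0.
UPSTREAM_REGISTRY="debezium"
# Hypothetical repository name for the image we build ourselves.
MY_REPO="myrepo"
# Directory holding the modified Dockerfile and its helper scripts.
CONTEXT_DIR="connect-base/2.0"

# Print (not run) the docker build command for the custom connect-base image.
connect_base_build_cmd() {
  printf 'docker build --build-arg DEBEZIUM_DOCKER_REGISTRY_PRIMARY_NAME=%s -t %s/connect-base:2.0 %s\n' \
    "$UPSTREAM_REGISTRY" "$MY_REPO" "$CONTEXT_DIR"
}

connect_base_build_cmd
```

Piping the printed command to `sh` (for example `connect_base_build_cmd | sh`) would run the actual build.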
Once we have connect-base, we can build connect the same way as the main Dockerfile, changing only the base image to our own.
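One way to swap the base image without editing the connect Dockerfile is sketched below. It assumes the upstream connect/2.0 Dockerfile starts from `$DEBEZIUM_DOCKER_REGISTRY_PRIMARY_NAME/connect-base:2.0`, and that our own base image was tagged under a hypothetical repository named `myrepo`; if the FROM line differs, it would have to be edited by hand instead.

```shell
#!/bin/sh
# Hypothetical repository that already holds our custom connect-base:2.0 image.
MY_REPO="myrepo"
# Directory of the unmodified connect Dockerfile.
CONTEXT_DIR="connect/2.0"

# Print (not run) the docker build command for the connect image.
# Assumption: passing our repository as the build argument makes the
# FROM line resolve to myrepo/connect-base:2.0 instead of the upstream image.
connect_build_cmd() {
  printf 'docker build --build-arg DEBEZIUM_DOCKER_REGISTRY_PRIMARY_NAME=%s -t %s/connect:2.0 %s\n' \
    "$MY_REPO" "$MY_REPO" "$CONTEXT_DIR"
}

connect_build_cmd
```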
Once the image is built, how do we verify the image works?
Here is an official Debezium tutorial.
If the built image can complete the tutorial, then it works.
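Before running the full tutorial, a quick sanity check is to confirm that the added JARs actually landed in the image. The sketch below is not a replacement for the tutorial; it assumes the files are stored as `<artifact>-<version>.jar` under `/kafka/libs` (the MAVEN_DEP_DESTINATION of the Dockerfile, with KAFKA_HOME assumed to be `/kafka`), and the image name `myrepo/connect:2.0` in the usage note is hypothetical.

```shell
#!/bin/sh
# Artifact names taken from the docker-maven-download lines of the Dockerfile.
EXPECTED_JARS="kafka-connect-avro-converter kafka-connect-avro-data kafka-avro-serializer kafka-schema-serializer kafka-schema-registry-client common-config common-utils avro guava"

# Read a directory listing on stdin and print every expected artifact
# that has no matching "<name>-<version>.jar" entry in it.
missing_jars() {
  listing=$(cat)
  for name in $EXPECTED_JARS; do
    printf '%s\n' "$listing" | grep -Eq "^${name}"'-[0-9].*\.jar$' \
      || printf '%s\n' "$name"
  done
}
```

A possible usage is `docker run --rm --entrypoint ls myrepo/connect:2.0 /kafka/libs | missing_jars`; empty output would mean every expected artifact was found.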
By the way, if you need a Debezium 2.0 image with Confluent 7.0.1, here's a ready-made one.