DEV Community

ChunTing Wu


Making Debezium 2.x Support Confluent Schema Registry

We have previously discussed how to design a real-time data pipeline, in which Debezium is one of the key components. To load OLTP databases from various microservices into the data warehouse seamlessly, we often rely on Debezium.

In addition, in a previous article, we covered the schema evolution features of Apache Avro and the usage of Confluent Schema Registry, which can be integrated to build a more automated real-time data pipeline.

However, starting with Debezium 2.0, the Debezium container images no longer include Confluent Schema Registry support. If we want to use Debezium with Confluent Schema Registry, we have to build the Debezium Connect images ourselves.

The reference is as follows.
https://debezium.io/documentation/reference/stable/configuration/avro.html#deploying-confluent-schema-registry-with-debezium-containers

The original image repository is on GitHub. We can start from the Dockerfile in connect-base/2.0 to build the connect-base image.

In addition to the original dependencies, we need to add the following Confluent packages.

  • kafka-connect-avro-converter
  • kafka-connect-avro-data
  • kafka-avro-serializer
  • kafka-schema-serializer
  • kafka-schema-registry-client
  • common-config
  • common-utils

Therefore, the Dockerfile should look as follows.

ARG DEBEZIUM_DOCKER_REGISTRY_PRIMARY_NAME
FROM $DEBEZIUM_DOCKER_REGISTRY_PRIMARY_NAME/kafka:2.0

LABEL maintainer="Debezium Community"

USER root
RUN microdnf -y install libaio && microdnf clean all

USER kafka

EXPOSE 8083
VOLUME ["/kafka/data","/kafka/logs","/kafka/config"]

COPY docker-entrypoint.sh /
COPY --chown=kafka:kafka log4j.properties $KAFKA_HOME/config/log4j.properties
COPY docker-maven-download.sh /usr/local/bin/docker-maven-download

#
# Set up the plugins directory ...
#
ENV KAFKA_CONNECT_PLUGINS_DIR=$KAFKA_HOME/connect \
    EXTERNAL_LIBS_DIR=$KAFKA_HOME/external_libs \
    CONNECT_PLUGIN_PATH=$KAFKA_CONNECT_PLUGINS_DIR \
    MAVEN_DEP_DESTINATION=$KAFKA_HOME/libs \
    CONFLUENT_VERSION=7.0.1 \
    AVRO_VERSION=1.10.1 \
    APICURIO_VERSION=2.2.5.Final \
    GUAVA_VERSION=31.0.1-jre

RUN mkdir "$KAFKA_CONNECT_PLUGINS_DIR" "$EXTERNAL_LIBS_DIR"

#
# The `docker-entrypoint.sh` script will automatically discover the child directories
# within the $KAFKA_CONNECT_PLUGINS_DIR directory (e.g., `/kafka/connect`), and place
# all of the files in those child directories onto the Java classpath.
#
# The general recommendation is to create a separate child directory for each connector
# (e.g., "debezium-connector-mysql"), and to place that connector's JAR files
# and other resource files in that child directory.
#
# However, use a single directory for connectors when those connectors share dependencies.
# This will prevent the classes in the shared dependencies from appearing in multiple JARs
# on the classpath, which results in arcane NoSuchMethodError exceptions.
#
RUN docker-maven-download confluent kafka-connect-avro-converter "$CONFLUENT_VERSION" fd03a1436f29d39e1807e2fb6f8e415a && \
    docker-maven-download confluent kafka-connect-avro-data "$CONFLUENT_VERSION" d27f30e9eca4ef1129289c626e9ce1f1 && \
    docker-maven-download confluent kafka-avro-serializer "$CONFLUENT_VERSION" c72420603422ef54d61f493ca338187c && \
    docker-maven-download confluent kafka-schema-serializer "$CONFLUENT_VERSION" 9c510db58119ef66d692ae172d5b1204 && \
    docker-maven-download confluent kafka-schema-registry-client "$CONFLUENT_VERSION" 7449df1f5c9a51c3e82e776eb7814bf1 && \
    docker-maven-download confluent common-config "$CONFLUENT_VERSION" aab5670de446af5b6f10710e2eb86894 && \
    docker-maven-download confluent common-utils "$CONFLUENT_VERSION" 74bf5cc6de2748148f5770bccd83a37c && \
    docker-maven-download central org/apache/avro avro "$AVRO_VERSION" 35469fee6d74ecbadce4773bfe3a204c && \
    docker-maven-download apicurio "$APICURIO_VERSION" f7874b2e2a59dfa829242d9a52b44230 && \
    docker-maven-download central com/google/guava guava "$GUAVA_VERSION" bb811ca86cba6506cca5d415cd5559a7

ENTRYPOINT ["/docker-entrypoint.sh"]
CMD ["start"]

Three environment variables are required to pin the versions of these dependencies.

  • CONFLUENT_VERSION: 7.0.1
  • AVRO_VERSION: 1.10.1
  • GUAVA_VERSION: 31.0.1-jre

The Confluent packages depend on Guava, a common utility library.

At the time of writing, the latest Confluent version is 7.3.3.

Avro and Guava are hosted in the official Maven Central repository.
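If we move to a different Confluent version, each checksum passed to docker-maven-download has to change too (the 32-character values in the Dockerfile appear to be MD5 hashes). Below is a minimal sketch of how the JAR URLs and checksums could be worked out. It assumes the standard Maven repository layout and the two repository roots shown; the helper names jar_url and jar_md5 are hypothetical and are not part of the Debezium scripts.

```shell
#!/bin/sh
# Assumed repository roots (not taken from the article).
CONFLUENT_REPO="https://packages.confluent.io/maven"
CENTRAL_REPO="https://repo1.maven.org/maven2"

# Build a JAR URL using the standard Maven layout:
#   <repo>/<group path>/<artifact>/<version>/<artifact>-<version>.jar
jar_url() {
    repo="$1"; group_path="$2"; artifact="$3"; version="$4"
    printf '%s/%s/%s/%s/%s-%s.jar\n' \
        "$repo" "$group_path" "$artifact" "$version" "$artifact" "$version"
}

# Print only the MD5 hash of a local file.
jar_md5() {
    md5sum "$1" | cut -d' ' -f1
}
```

For example, `curl -sfSL -o /tmp/converter.jar "$(jar_url "$CONFLUENT_REPO" io/confluent kafka-connect-avro-converter 7.3.3)"` followed by `jar_md5 /tmp/converter.jar` prints the value to place next to that package in the RUN instruction. Avro and Guava would use `$CENTRAL_REPO` with the group paths org/apache/avro and com/google/guava.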

Once connect-base is built, we can build the connect image from the original connect Dockerfile, changing only the base image so that it points to our own connect-base.
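The two builds could be chained roughly as in the sketch below. It assumes a local checkout of the image repository containing connect-base/2.0 (with the modified Dockerfile) and connect/2.0, and that the connect Dockerfile takes the same DEBEZIUM_DOCKER_REGISTRY_PRIMARY_NAME build argument as connect-base. The registry names, the run helper, and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
UPSTREAM_REGISTRY="quay.io/debezium"  # assumed source of the stock kafka:2.0 image
MY_REGISTRY="myrepo"                  # hypothetical namespace for our own images
TAG="2.0"

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_images() {
    # Step 1: connect-base, built on the upstream kafka image.
    run docker build \
        --build-arg "DEBEZIUM_DOCKER_REGISTRY_PRIMARY_NAME=$UPSTREAM_REGISTRY" \
        -t "$MY_REGISTRY/connect-base:$TAG" "connect-base/$TAG"
    # Step 2: connect, whose FROM line now resolves to our own connect-base.
    run docker build \
        --build-arg "DEBEZIUM_DOCKER_REGISTRY_PRIMARY_NAME=$MY_REGISTRY" \
        -t "$MY_REGISTRY/connect:$TAG" "connect/$TAG"
}
```

Calling `DRY_RUN=1 build_images` prints the two docker build commands without running them, which is a cheap way to check the tags before starting a long build.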

Conclusion

Once the image is built, how do we verify the image works?

Here is an official Debezium tutorial.

https://github.com/debezium/debezium-examples/tree/main/tutorial

If the built image can pass the tutorial, then the image works.
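When running the tutorial against the new image, the Connect worker has to be told to use the Confluent Avro converter. The sketch below generates the relevant environment variables, following the settings described in the Debezium Avro documentation as I understand them. The helper names and the registry hostname are hypothetical; the /subjects endpoint belongs to the Schema Registry REST API.

```shell
#!/bin/sh
# Print the environment variables that switch the Connect worker to the
# Confluent Avro converter. The registry URL is supplied by the caller.
avro_worker_env() {
    registry_url="$1"
    cat <<EOF
KEY_CONVERTER=io.confluent.connect.avro.AvroConverter
VALUE_CONVERTER=io.confluent.connect.avro.AvroConverter
CONNECT_KEY_CONVERTER_SCHEMA_REGISTRY_URL=$registry_url
CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL=$registry_url
EOF
}

# List the subjects currently registered in Schema Registry.
list_subjects() {
    curl -sf "$1/subjects"
}
```

One possible use is `avro_worker_env http://schema-registry:8081 > connect-avro.env`, then passing the file to the container with `docker run --env-file connect-avro.env`. After a connector from the tutorial has been registered and has produced events, `list_subjects http://localhost:8081` should return a JSON array of subject names if Avro serialization is working.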

By the way, if you need a Debezium 2.0 image with Confluent 7.0.1, here is a ready-made one.
https://hub.docker.com/r/wirelessr/debezium-connect-avro

Top comments (3)

ngochieu642

Hey, just a question: how did you know that it needs Avro and Guava? The documentation doesn't mention those.

ChunTing Wu

It's been a long time and I've forgotten, but I think I found it after testing. By the way, Avro is what I would have needed with Confluent Registry.

ngochieu642

Thank you anyway! The post was really helpful. I wasn't able to make Debezium work with Avro despite following all the docs