GitOps for Confluent Schema Registry

#kafka #schemaregistry #gitops

If you are like me, you love version control. Not so much the command-line interface (Git isn't very beginner-friendly, after all), but the concept of having a changelog of what has been done and - if you use meaningful commit messages - also the reasons behind each change.

Another thing that I love is Apache Kafka and the fact that it is serialization-format agnostic. Most of the time, however, I use Avro serialization, because having a schema brings a lot to the table: it provides a safe way to change the contracts that producers and consumers share. And as we all know:

Change is inevitable in life. - Jack Canfield

Schema evolutions and registry

Confluent's Schema Registry provides a central storage and retrieval API for the schemas used by Kafka consumers and producers. It also checks the compatibility of schema evolutions, so that changes stay backward- and/or forward-compatible.
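To make the idea of a backward-compatible evolution a bit more concrete, here is a minimal Python sketch. It covers only one rule (a field added to a record needs a default so that old records can still be read) and is not the registry's actual algorithm, which also handles type changes, unions, aliases and nested records. The `User` fields and the helper name `missing_defaults` are made up for illustration.

```python
import json


def missing_defaults(old_schema: dict, new_schema: dict) -> list:
    """Return the names of added fields that would break BACKWARD compatibility.

    Simplified rule: a reader using new_schema can decode records written
    with old_schema only if every field it adds carries a default value.
    Type changes, aliases, unions and nested records are ignored here.
    """
    old_fields = {field["name"] for field in old_schema["fields"]}
    return [
        field["name"]
        for field in new_schema["fields"]
        if field["name"] not in old_fields and "default" not in field
    ]


# Hypothetical first version of a User schema.
user_v1 = json.loads("""
{"type": "record", "name": "User",
 "fields": [{"name": "name", "type": "string"}]}
""")

# Adds "email" with a default: records written with v1 can still be read.
user_v2_ok = json.loads("""
{"type": "record", "name": "User",
 "fields": [{"name": "name", "type": "string"},
            {"name": "email", "type": ["null", "string"], "default": null}]}
""")

# Adds "email" without a default: old records have nothing to fill it with.
user_v2_bad = json.loads("""
{"type": "record", "name": "User",
 "fields": [{"name": "name", "type": "string"},
            {"name": "email", "type": "string"}]}
""")

print(missing_defaults(user_v1, user_v2_ok))   # []
print(missing_defaults(user_v1, user_v2_bad))  # ['email']
```

An empty list means the evolution passes this one simplified rule; any name in the list is a field that a consumer on the new schema could not populate when reading old data.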

Schema evolutions are often performed automatically when producers change the schema they use to write. This may be fine for prototyping, but it can lead to serious issues in production. If the schema contains an error, a rollback might not be as easy as going back to the previous version: a newly introduced field, for example, cannot simply be removed again, and getting back to the previous state might need a lot of manual intervention. Also, if the schema evolution fails because of compatibility issues, it fails only once the changes are deployed and the new code runs for the first time. That one can be mitigated by testing in another environment first (which you should do regardless of this problem), but still.

Open Source ftw!

Now I wanted to combine both technologies, so I open sourced schema-registry-gitops to prevent the issues above: it keeps a version-controlled history of schema changes and pushes them to the registry only when they are ready and have been reviewed in QA. It can be used in CI/CD pipelines to ensure that schema changes are compatible with previous versions, and it can be part of your code review process.
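For a feeling of what such a pipeline check boils down to, here is a rough Python sketch that asks the registry whether a candidate schema is compatible with the latest registered version of a subject. It uses the Schema Registry REST endpoint `POST /compatibility/subjects/{subject}/versions/latest`; it is not how schema-registry-gitops is implemented internally. The function names, the registry URL and the subject name are assumptions chosen for illustration.

```python
import json
import urllib.parse
import urllib.request


def build_compatibility_request(registry_url: str, subject: str, schema: dict) -> urllib.request.Request:
    """Build (but do not send) a compatibility check against the latest version."""
    quoted_subject = urllib.parse.quote(subject, safe="")
    url = f"{registry_url.rstrip('/')}/compatibility/subjects/{quoted_subject}/versions/latest"
    # The registry expects the schema itself as a JSON-encoded string.
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )


def is_compatible(request: urllib.request.Request) -> bool:
    """Send the request and read the "is_compatible" flag from the response."""
    with urllib.request.urlopen(request) as response:
        return bool(json.load(response)["is_compatible"])
```

In a CI job you could call `is_compatible(build_compatibility_request("http://localhost:8081", "user-value", schema))` and fail the build when it returns `False`. Note that `urlopen` raises an `HTTPError` for error responses, for example when the subject does not exist yet, so a real pipeline step would need to handle that case.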

My team at FLYERALARM uses it in our Atlassian Bamboo pipelines to ensure that no bad schema evolutions make it into production, and we've had no issues since! It's written in Kotlin, using the APIs that Confluent provides for Schema Registry and Avro serialization.

Top comments (2)

Ofir Hasbani • Oct 26 '21

Hey Dominik Liebler, thank you very much. I find this application very useful.

I have an important question though - when and where do you handle "Code Generation"? Meaning, how do you produce the Java/Kotlin/Scala/etc. classes that correspond to User.avsc and that will be used by the Consumers and the Producers?

Dominik Liebler • Nov 11 '21

Hi Ofir, thank you! Code generation is provided by the Avro tools themselves. I use Gradle, and there is a plugin to generate Java classes from the schemas: github.com/davidmc24/gradle-avro-p.... So the classes are built at compile time, and I run schema-registry-gitops after that, before deploying but in the same CI/CD pipeline, to ensure that invalid schema evolutions are stopped before they reach production (or even staging) systems.