Alex Yaroslavsky

Posted on Jul 13, 2020

Integration testing Apache Pulsar clients with Docker Compose and TestContainers

#testing #pulsar #docker #testcontainers

Hello, quarantined user!

I would like to share with you my adventures with a complex Docker Compose testing scenario using the TestContainers Java library that, by pure coincidence (not really), also involves the coolest new kid on the block: Apache "The Kafka Killer" Pulsar.

Full source code is available at the end of the article.

Three main tools we are going to use today

Apache Pulsar

A distributed pub-sub messaging system with stream processing capabilities. It combines tenant isolation, seamless scaling, and both the queuing features of RabbitMQ and the streaming features of Kafka in one pretty damn badass package.

Docker Compose

A tool that allows you to keep multiple container configurations in a nice YAML file and to run multiple containers with one command.

TestContainers

A Java library for integration testing that can spin up any Docker container and lets your tests interact with it in an easy and convenient way.

What are we trying to accomplish?

We want to set up an environment with the following components:

Apache Pulsar with three tenants: "internal", "customer1" & "customer2". All messages sent to the "customer" tenants will be routed by a Pulsar Function to a single topic in the "internal" tenant for easy centralized consumption.
Two Producers that send messages to the "customer1" and "customer2" tenants respectively, on the topic "/outbound/corona"
A single Consumer that receives all messages from both Producers via the single topic "internal/inbound/corona"
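Here is how the names fit together. Assuming persistent topics (Pulsar's default), a fully qualified Pulsar topic name has the form persistent://<tenant>/<namespace>/<topic>. So "/outbound/corona" in tenant "customer1" means namespace "outbound" and topic "corona". The stdlib-only sketch below spells out the scenario's names. The `TopicNames` class and its methods are hypothetical helpers, not part of the Pulsar API. The "customer*" routing rule only models what the Pulsar Function does in this setup.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper (not part of the Pulsar client API) that spells out the
// topic names of this scenario, assuming persistent topics:
// persistent://<tenant>/<namespace>/<topic>
final class TopicNames {
    static final String INTERNAL_TOPIC = fullName("internal", "inbound", "corona");

    private TopicNames() {}

    static String fullName(String tenant, String namespace, String topic) {
        return "persistent://" + tenant + "/" + namespace + "/" + topic;
    }

    // Each producer writes to <tenant>/outbound/corona in its own tenant.
    static String producerTopic(String customerTenant) {
        return fullName(customerTenant, "outbound", "corona");
    }

    // Extracts the tenant part from a fully qualified topic name.
    static String tenantOf(String fullTopicName) {
        String rest = fullTopicName.substring(fullTopicName.indexOf("://") + 3);
        return rest.substring(0, rest.indexOf('/'));
    }

    // Assumed routing rule (a model of the Pulsar Function): topics in
    // "customer*" tenants end up in the single internal topic; others are not routed.
    static Optional<String> routedTopic(String sourceTopic) {
        return tenantOf(sourceTopic).startsWith("customer")
                ? Optional.of(INTERNAL_TOPIC)
                : Optional.empty();
    }

    public static void main(String[] args) {
        for (String tenant : List.of("customer1", "customer2")) {
            String source = producerTopic(tenant);
            System.out.println(source + " -> " + routedTopic(source).orElse("(not routed)"));
        }
    }
}
```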

It might seem that a single compose file with all the services above would give us what we need, but that is not the case, because the order matters. First, we must start the Pulsar server, wait for it to finish loading, and configure the tenants and the Function. Then we start the consumer, so it is ready to read messages, and only then the producers. Once all the components are up and running, we want to verify that the producers are sending messages and the consumer is receiving them.
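The ordering constraint can be sketched without any Docker or Pulsar dependencies: each step starts only after the previous one reports ready. In the hypothetical `StartupSequence` class below, the step names, start actions and readiness checks are placeholders you would supply, for example starting a compose file and probing its service. It illustrates the ordering only and is not the code from the article's repository.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Stdlib-only sketch of an ordered start-up: a step is started only after the
// previous one reports ready. All actions and checks are caller-supplied placeholders.
final class StartupSequence {
    private static final class Step {
        final String name;
        final Runnable start;
        final BooleanSupplier ready;

        Step(String name, Runnable start, BooleanSupplier ready) {
            this.name = name;
            this.start = start;
            this.ready = ready;
        }
    }

    private final List<Step> steps = new ArrayList<>();

    StartupSequence then(String name, Runnable start, BooleanSupplier ready) {
        steps.add(new Step(name, start, ready));
        return this;
    }

    // Runs the steps in order and returns their names in the order they became ready.
    List<String> run(Duration timeoutPerStep, Duration pollInterval) {
        List<String> readyOrder = new ArrayList<>();
        for (Step step : steps) {
            step.start.run();
            waitUntil(step.ready, timeoutPerStep, pollInterval, step.name);
            readyOrder.add(step.name);
        }
        return readyOrder;
    }

    // Polls the condition until it holds; throws IllegalStateException on timeout.
    static void waitUntil(BooleanSupplier condition, Duration timeout, Duration poll, String what) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline > 0) {
                throw new IllegalStateException("Timed out waiting for " + what);
            }
            try {
                Thread.sleep(poll.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for " + what, e);
            }
        }
    }
}
```

For this scenario the chain would run Pulsar first, then tenant and Function configuration, then the consumer, then the producers. `run(...)` throws `IllegalStateException` if any step never becomes ready, which fails the test early instead of letting producers publish into a half-configured broker.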

Can we do it manually first?

Before trying to automate the task, let's figure out how to do it manually using only Docker Compose.

We already know that we will need several compose files and that all the components must be able to communicate with each other, which means they have to be on the same network. Docker Compose allows just that: we can define a network in one compose file and then use it in another compose file by marking it as external there.

Here you can see a Pulsar compose file that defines a network called "local-pulsar-net":

And here is a different compose file that uses the same network for its service:

As you can see, the "pulsar-net" network defined in this compose file is linked to an external network called "local-pulsar-net", which happens to be the network we defined in the previous compose file.

Running those compose files with docker-compose up places the containers on the same network, where they can reach each other by their service names.

Let's automate this coronapulsar!

The TestContainers library is quite powerful and easy to use. Here is an example code snippet showing how to start a Pulsar server from a compose file:

Up to this point all is well: we can even create a PulsarAdmin client and configure Pulsar the way we need. Unfortunately, starting the second compose file, which references the network defined in the Pulsar compose file, fails with an exception:

What happens is that TestContainers adds a random prefix to all the components it loads into Docker. This is necessary to prevent name collisions when loading many different compose files. For example, our "local-pulsar-net" was created as "pulsarrpekn3_local-pulsar-net" in the run above, and the name will be different in the next run, and so on…
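Here is a rough imitation of that naming, based on the pattern visible in the run above. An identifier such as "pulsar" plus a short random suffix forms the compose project name, and the network is then named <project>_<network>. The `ComposeNaming` class is hypothetical and does not reproduce TestContainers' own code. It only shows why the final network name cannot be hard-coded in the other compose files.

```java
import java.security.SecureRandom;
import java.util.Locale;

// Hypothetical imitation (not TestContainers' implementation) of the naming
// pattern seen in the article: "pulsar" + random suffix -> "pulsarrpekn3",
// and the network becomes "pulsarrpekn3_local-pulsar-net".
final class ComposeNaming {
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";
    private static final SecureRandom RANDOM = new SecureRandom();

    private ComposeNaming() {}

    // Identifier followed by suffixLength random lowercase alphanumerics.
    static String randomProjectName(String identifier, int suffixLength) {
        StringBuilder sb = new StringBuilder(identifier.toLowerCase(Locale.ROOT));
        for (int i = 0; i < suffixLength; i++) {
            sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    // Docker Compose names networks declared in a file as <project>_<network>.
    static String scopedNetworkName(String projectName, String networkName) {
        return projectName + "_" + networkName;
    }
}
```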

Let's solve this COVID-bug!

It seems that in order to solve our issue we need:

To know the random name of the "local-pulsar-net" network as generated by TestContainers.
To somehow apply this name to the existing additional compose files before loading them.

Fortunately, there is a way to do both. Hooray!

Getting the name of the network:
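One possible approach uses only the JDK and the Docker CLI: list all Docker networks and select the single one whose name ends in "_local-pulsar-net". This is an assumption. The repository may use a different mechanism, such as the Docker client that TestContainers exposes. The `NetworkLookup` class is hypothetical, and `listDockerNetworks()` needs `docker` on the PATH.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: find the randomly prefixed network by its known suffix.
final class NetworkLookup {
    private NetworkLookup() {}

    // Pure selection logic: exactly one network must end with "_" + networkName.
    static String findScopedNetwork(List<String> networkNames, String networkName) {
        String suffix = "_" + networkName;
        List<String> matches = networkNames.stream()
                .filter(name -> name.endsWith(suffix))
                .collect(Collectors.toList());
        if (matches.size() != 1) {
            throw new IllegalStateException(
                    "Expected exactly one network ending in " + suffix + " but found " + matches);
        }
        return matches.get(0);
    }

    // Lists network names via `docker network ls --format {{.Name}}`.
    static List<String> listDockerNetworks() throws IOException, InterruptedException {
        Process process = new ProcessBuilder("docker", "network", "ls", "--format", "{{.Name}}")
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            List<String> names = reader.lines()
                    .map(String::trim)
                    .filter(line -> !line.isEmpty())
                    .collect(Collectors.toList());
            if (process.waitFor() != 0) {
                throw new IOException("docker network ls exited with an error");
            }
            return names;
        }
    }
}
```

Usage would be `NetworkLookup.findScopedNetwork(NetworkLookup.listDockerNetworks(), "local-pulsar-net")`. Requiring exactly one match is deliberate. Leftover networks from a crashed earlier run, or a parallel build on the same Docker host, would otherwise make the lookup silently pick the wrong network.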

To apply the network name to the remaining compose files, we can use the following trick: define the network like this in the YAML files:

Note the last line: name: $PULSAR_NETWORK

This allows us to pass the name of the network as an environment variable when starting the compose file, like this:
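With TestContainers itself, `DockerComposeContainer` offers `withEnv("PULSAR_NETWORK", networkName)` for this. For the manual Docker Compose route, a minimal JDK-only sketch follows. The `ComposeLauncher` class and the "consumer-compose.yml" file name are hypothetical, and it assumes the `docker-compose` binary is on the PATH.

```java
import java.io.File;

// Hypothetical sketch: prepare `docker-compose -f <file> up -d` with
// PULSAR_NETWORK set, so that "name: $PULSAR_NETWORK" in that file resolves
// to the network created by the Pulsar compose run.
final class ComposeLauncher {
    private ComposeLauncher() {}

    static ProcessBuilder composeUp(File composeFile, String pulsarNetwork) {
        ProcessBuilder pb = new ProcessBuilder(
                "docker-compose", "-f", composeFile.getPath(), "up", "-d");
        // Docker Compose interpolates $PULSAR_NETWORK from its process environment.
        pb.environment().put("PULSAR_NETWORK", pulsarNetwork);
        pb.inheritIO();
        return pb;
    }
}
```

A `ProcessBuilder` only describes the process. Nothing runs until you call something like `ComposeLauncher.composeUp(new File("consumer-compose.yml"), networkName).start().waitFor()`.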

The final chapter

The full source code, which starts and configures a standalone Pulsar, then starts a Consumer and two Producers, verifies that they all work correctly and shuts down the environment, is available on my GitHub page:

trexinc / pulsar-integration-testing