To ensure reliability, we use Mule clusters to keep our runtimes synchronized and provide high availability for our applications.
Unlike Server Groups, which only deploy applications to all servers, clusters also allow the runtimes to synchronize with each other: when clustered, their in-memory caches are synchronized and schedulers run only on the primary node.
After upgrading to Mule 4.4, we noticed that our schedulers were running on every runtime in our cluster, so the same operation was processed multiple times each time a scheduler fired. The cluster behaved like a Server Group: the nodes were not synchronized, and schedulers and caches were managed individually on each node.
Understanding the issue: cluster logs
By default, Mule runtimes do not output logs related to clusters. If there is an issue, the runtime will silently fail to create the cluster, and you will have no way of knowing there is a problem, as nothing is reported in Runtime Manager either.
To help you find the root cause, you can enable cluster logs from your runtime's Log4J2 configuration (located in $MULE_HOME/conf/log4j2.xml):
<AsyncLogger name="com.mulesoft.mule.runtime.module.cluster" level="DEBUG"/>
<AsyncLogger name="com.hazelcast.internal.cluster" level="DEBUG"/>
By adding these two lines in the Loggers section of your configuration, the cluster logs will be written to $MULE_HOME/logs/mule_ee.log. Setting up the cluster is one of the first operations performed at runtime startup, so you will find the corresponding log entries just after the runtime boots up.
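Once the loggers are enabled, you can filter the cluster-related entries out of the runtime log. A minimal sketch (the /opt/mule fallback path is an assumption; adjust MULE_HOME to your installation):

```shell
# Show the first cluster/Hazelcast entries written after startup.
# /opt/mule is only a placeholder default for this example.
MULE_HOME="${MULE_HOME:-/opt/mule}"

grep -iE "cluster|hazelcast" "$MULE_HOME/logs/mule_ee.log" 2>/dev/null | head -n 50
```

If the command prints nothing at all right after a restart, the loggers are probably not picked up: double-check that the AsyncLogger lines sit inside the Loggers element and that the runtime was restarted after the change.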
Link to official documentation.
$MULE_HOME/.mule/mule-cluster.properties
If the cluster was not set up properly for some reason, the cluster configuration file might not be created. In this case, you will see an error saying that $MULE_HOME/.mule/mule-cluster.properties could not be found.
This problem is a bit tricky: if you manage your cluster through Runtime Manager, the file cannot be created by hand, and it is only regenerated when the cluster configuration changes.
First, verify whether the file $MULE_HOME/.mule/mule-cluster.properties exists. If it does not, make sure the runtime has the rights to write to $MULE_HOME and $MULE_HOME/.mule/.
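These checks can be scripted on each node. A minimal sketch (the /opt/mule fallback path is an assumption; point MULE_HOME at your actual installation):

```shell
# Verify the cluster file exists and the runtime user can write
# to the directories Mule needs. /opt/mule is a placeholder default.
MULE_HOME="${MULE_HOME:-/opt/mule}"
CLUSTER_PROPS="$MULE_HOME/.mule/mule-cluster.properties"

if [ -f "$CLUSTER_PROPS" ]; then
    echo "cluster file present: $CLUSTER_PROPS"
else
    echo "cluster file MISSING: $CLUSTER_PROPS"
fi

# Both directories must be writable by the user running the runtime.
for dir in "$MULE_HOME" "$MULE_HOME/.mule"; do
    if [ -w "$dir" ]; then
        echo "writable: $dir"
    else
        echo "NOT writable: $dir"
    fi
done
```

Run it as the same user that runs the Mule service, since the permissions that matter are that user's, not yours.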
If the file is missing, we must trigger an update of the cluster configuration, which will rewrite mule-cluster.properties on every node of the cluster. To do so, you must change the number of nodes in the cluster. You can do this in two ways:
- Install the runtime on a new server and add it to the cluster. This is time-consuming if you don't have a server ready and no automation to install the runtime.
- Remove a server from the cluster and add it back. This is the preferred solution, but it will not avoid downtime.
Let's see how to proceed with the second solution:
First, make sure there are at least two nodes up in your cluster with your application deployed.
In Runtime Manager, go to the Servers page and click your cluster's name to open its settings.
In the server list, click the server you would like to remove (we will add it back in a moment).
At the top of the page, click Remove Server From Cluster. The server you removed and the other nodes in the cluster will now restart, and the cluster will show the status Created if only one node remains in it.
You will now have to wait a few minutes so that every application restarts properly in the cluster. You will notice that no application is deployed on the node you just removed from the cluster.
In the meantime, you can SSH into your cluster's nodes and check whether $MULE_HOME/.mule/mule-cluster.properties has been created properly. If so, open it and you will find the property mule.cluster.nodes listing all the nodes in your cluster.
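For reference, the file is a plain Java properties file. A hedged, illustrative example of what you might see there (all identifiers and addresses below are made-up placeholders, and the exact set of keys can vary between runtime versions):

```properties
# Illustrative contents of mule-cluster.properties -- values are placeholders
mule.clusterId=my-cluster-id
mule.clusterNodeId=1
mule.cluster.nodes=192.168.1.10:5701,192.168.1.11:5701
```

The important line for this troubleshooting session is mule.cluster.nodes: if a node is missing from that list, it will not take part in cache synchronization or primary-node election.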
When every application has restarted in the cluster, you can add the node back. This will trigger another restart of the applications, but your nodes should now be able to communicate with each other.
Ensuring the nodes can communicate
By default, all nodes should be on the same network and will use ports 5701, 5702, and 5703 to communicate with each other. If you have an error stating that the cluster nodes could not communicate, the runtimes might not be allowed to reach each other.
Make sure that all runtimes in your cluster fulfil the prerequisites, and reach out to your network team if something is not working properly.
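Before involving the network team, you can check the ports yourself from one node. A minimal sketch using nc (the hostname other-node.internal is a placeholder for the other node's address, and nc must be installed):

```shell
# Probe the default Mule cluster ports on the other node.
# other-node.internal is a placeholder -- replace it with the real host.
NODE_HOST="${NODE_HOST:-other-node.internal}"

for port in 5701 5702 5703; do
    if nc -z -w 3 "$NODE_HOST" "$port" 2>/dev/null; then
        echo "reachable: $NODE_HOST:$port"
    else
        echo "NOT reachable: $NODE_HOST:$port"
    fi
done
```

Run it in both directions (node A to node B and back): firewalls are often asymmetric, and the cluster needs connectivity both ways.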
What's next?
This small guide tries to save you some time but does not cover every issue you might encounter with your cluster. If you still have problems with your cluster, contact MuleSoft support: they know their product much better than I do!