DEV Community

Suave Bajaj


ZooKeeper Chronicles: Navigating EOFException and the Enigma of 0-Length Files

In the realm of technology, ZooKeeper plays a crucial role in maintaining order within distributed systems, providing services such as configuration management, naming, and distributed synchronization. Recently, though, I hit an unexpected stumbling block: a ZooKeeper server that crashed with an EOFException, alongside an intriguing 0-length file.

The Error:

ERROR [main:Util@214] - Last transaction was partial.
ERROR [main:ZooKeeperServerMain@66] - Unexpected exception, exiting abnormally
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)

The EOFException, which DataInputStream.readInt throws when the input ends before it can read 4 bytes, made our ZooKeeper server exit abnormally on startup.


The Mysterious 0-Length File:

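Listing the ZooKeeper data directory, I saw something strange: the newest transaction log was 0 bytes long, while the one before it was 1,342,177,280 bytes (1.25 GiB):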

-rw-rw-r-- 1 zookeeper zookeeper 1342177280 Sep 23 23:17 log.be45c2
-rw-rw-r-- 1 zookeeper zookeeper          0 Oct  1 14:31 log.be85e9
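A small check can spot this condition before a restart turns it into a crash. The function below is a minimal sketch: its name is hypothetical, and it assumes that every transaction log starts with the 4-byte header magic "ZKLG" (based on ZooKeeper's FileTxnLog source; verify it for your version). A 0-byte file cannot contain that header, so reading it with readInt hits end-of-file immediately.

```shell
#!/usr/bin/env bash
# Sketch: flag ZooKeeper transaction logs (log.*) that are empty or whose first
# 4 bytes are not the header magic "ZKLG" (assumed from FileTxnLog; verify it
# for your ZooKeeper version). The search is recursive because ZooKeeper keeps
# its logs in a version-2 subdirectory.

# suspect_txn_logs DIR: print one line per suspicious log file under DIR.
suspect_txn_logs() {
  local dir=$1 f magic
  while IFS= read -r -d '' f; do
    if [ ! -s "$f" ]; then
      # 0 bytes: the very first readInt() on this file hits end-of-file.
      echo "empty: $f"
    else
      # Read the first 4 bytes; drop NUL bytes so bash does not warn about them.
      magic=$(head -c 4 "$f" | tr -d '\000')
      if [ "$magic" != "ZKLG" ]; then
        echo "bad-magic: $f"
      fi
    fi
  done < <(find "$dir" -type f -name 'log.*' -print0)
}
```

Pointed at our data directory with `suspect_txn_logs /var/lib/zookeeper`, it should report log.be85e9 from the listing above as empty.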

Digging Deeper:

The empty file pointed to trouble with our transaction logs. The server log confirmed it ("Last transaction was partial"): when ZooKeeper read the newest log, it found no data where it expected some and failed with the EOFException. But why was the file empty in the first place?


Real Issue: Running Out of Space

The file was 0 bytes long because the disk backing /var/lib/zookeeper was 100% full: the new log file was created, but there was no space left on the device to write anything into it.

Filesystem      Size  Used Avail Use% Mounted on
/dev/sde         25G   25G     0 100% /var/lib/zookeeper
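A full disk is easy to catch before it takes ZooKeeper down. Below is a minimal sketch of a check that could run from cron or a monitoring agent; the function names and the 80% default threshold are illustrative choices, not anything ZooKeeper provides. It reads the Capacity column of POSIX-format df output, the same figure shown as Use% above.

```shell
#!/usr/bin/env bash
# Sketch of a disk-space guard for the ZooKeeper data volume.
# Function names and the 80% default threshold are hypothetical examples.

# disk_usage_pct DIR: print the usage percentage (digits only) of the filesystem holding DIR.
disk_usage_pct() {
  # -P forces POSIX output: one line per filesystem, capacity in column 5 (e.g. "100%").
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# check_zk_disk DIR [THRESHOLD]: warn on stderr and return 1 when usage >= THRESHOLD percent.
check_zk_disk() {
  local dir=$1 threshold=${2:-80} used
  used=$(disk_usage_pct "$dir")
  if [ -z "$used" ]; then
    echo "could not read disk usage for $dir" >&2
    return 2
  fi
  if [ "$used" -ge "$threshold" ]; then
    echo "WARNING: filesystem for $dir is ${used}% full (threshold ${threshold}%)" >&2
    return 1
  fi
  echo "OK: filesystem for $dir is ${used}% full"
}
```

For example, `check_zk_disk /var/lib/zookeeper 80` would start warning (and returning 1) once the volume crossed 80%, well before it reached the 100% shown above.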

As the official ZooKeeper documentation states:

A ZooKeeper server will not remove old snapshots and log files, this is the responsibility of the operator. Every serving environment is different and therefore the requirements of managing these files may differ from install to install (backup for example).

The PurgeTxnLog utility implements a simple retention policy that administrators can use. The API docs contains details on calling conventions (arguments, etc...).
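To make the "simple retention policy" idea concrete, here is a dry-run sketch of its core step: rank snapshots by the hexadecimal transaction ID (zxid) in their file names and list everything beyond the newest N. It is a simplified illustration, not PurgeTxnLog's implementation: the real utility also keeps the transaction logs needed to replay the retained snapshots, so use zkCleanup.sh for actual cleanup. The function name is hypothetical, and it assumes snapshots are named snapshot.<zxid in hex>.

```shell
#!/usr/bin/env bash
# Simplified dry run of a "keep the newest N snapshots" policy (hypothetical helper).
# Unlike PurgeTxnLog, it ignores transaction logs and deletes nothing.

# old_snapshots DIR N: print snapshots in DIR older than the newest N, newest first.
old_snapshots() {
  local dir=$1 keep=$2 f
  for f in "$dir"/snapshot.*; do
    [ -e "$f" ] || continue                       # no matches: the glob stays literal
    case ${f##*.} in ''|*[!0-9a-fA-F]*) continue ;; esac   # skip non-hex suffixes
    # Convert the hex zxid with 16#... so the sort is numeric, not lexical.
    printf '%d %s\n' "$((16#${f##*.}))" "$f"
  done | sort -n -r | tail -n +"$((keep + 1))" | cut -d ' ' -f 2-
}
```

For example, `old_snapshots /var/lib/zookeeper/version-2 5` (the version-2 path is an assumption about the layout) lists the snapshots that a keep-5 policy would treat as old.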


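Using the PurgeTxnLog utility

ZooKeeper ships a wrapper script, zkCleanup.sh, in its bin directory that runs PurgeTxnLog. Running it without arguments prints the usage: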

zookeeper@zk-0:/bin$ zkCleanup.sh
Usage:
PurgeTxnLog dataLogDir [snapDir] -n count
    dataLogDir -- path to the txn log directory
    snapDir -- path to the snapshot directory
    count -- the number of old snaps/logs you want to keep, value should be greater than or equal to 3

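To keep the latest 5 snapshots and logs, pass only -n (zkCleanup.sh reads dataDir and dataLogDir from zoo.cfg and fills in the directory arguments itself):
./zkCleanup.sh -n 5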


Conclusion:

In the dynamic landscape of distributed systems, unexpected errors often point to mundane underlying causes. Our journey through the EOFException and the 0-length file ended at a full disk, which emphasizes the importance of vigilant monitoring of logs and disk space, and of purging old snapshots and transaction logs regularly, since ZooKeeper leaves that job to the operator. As tech custodians, we must be prepared to decode such mysteries, address the root cause, and keep systems like ZooKeeper running smoothly.

ZooKeeper, with its unique quirks and challenges, continues to be an exciting playground for tech enthusiasts eager to navigate the intricacies of system synchronization.
