Disk Space Debugging Checklist

#linux #debug #devops #bash

Many times, smoothly running processes stop working mysteriously. You open the logs and see what is happening, only to find that even the logs have stopped updating. But the process itself is running. You SSH to the server and type cd TAB. Bash weeps "Unable to create temporary file". The machine is out of disk space...

Here is a checklist to make disk space debugging easier, using standard Linux utilities so you can get started without having to install anything extra:

df -h command gives you an overview in a readable format about the number of disks mounted and their total and available capacities.
To get an idea of which folders/directories are eating up the maximum space, try out du -ch / | sort -h | tail -n 30. This gives you the 30 most space consuming directories. If you already know which directories generate maximum disk output e.g logs and temp files, you can replace the '/' with your directory (DIR) and run the command as du -ch DIR | sort -h | tail -n 30
Now that we have identified the directories with maximum space consumed, we may need to delete some files and get our process going again. The rm command is your friend here. You can delete old logs and temporary files to free up space.
Many times, the culprit is a single large file which is already in use by a program e.g catalina.out by Apache Tomcat. If you want to free up space without shutting down the process, the truncate command will help you out. Example: truncate -s0 BIG_LOG.log. This will truncate the file to 0 bytes and still allow the other process to use it without issues (standard Unix permissions apply)
Sometimes, you delete files and still, the space does not seem to be recovered. This can be because some process is still holding on to the file descriptor of the deleted file. Once these processes are stopped, the space will be recovered. The lsof command will help you out here. It stands for list open files. You can find out which processes are using deleted files as follows: lsof | grep deleted | grep OLD_FILENAME. The lsof command gives you the process name and the process id so you can run kill on the process. If you do not know the name of the deleted file, you can still run lsof | grep deleted and see the output to check for any familiar file / process.

Finally, keep in mind that disk space is one of the metrics you should monitor on your server. This checklist must be used in a pinch. If you find yourself constantly having disk space issues, the solution is to set up periodic deletion/rotation of old log files, alerts when the disk space reaches a particular threshold or to increase the disk size if your processes require a lot of disk space e.g Kafka, MySQL and other databases.

Let me know if there are some other tools I am missing out on and your experiences dealing with disk space issues!

DEV Community

Disk Space Debugging Checklist

Top comments (0)