In my previous post, DevOps: Understanding Process Monitoring on Linux, I discussed how a Linux process works and why it's important to monitor processes.
Let's dig deeper into how we can keep an eye on how a Linux server is "performing".
TL;DR:
This article dives into Linux process monitoring tools and techniques, helping you keep an eye on your server's performance. It covers command-line tools like top, htop, vmstat, and sar for in-depth monitoring, along with system utilities like System Monitor for a graphical overview. The article also demonstrates a sample script using top and uptime to monitor CPU, memory, and system uptime, laying the groundwork for integrating push notifications.
Performance Monitoring
1. /proc
In my previous post, I explained that "everything is a file" on a Linux system, so where are these process files stored?
Open your terminal and type ls /proc and you will see your PIDs in there. This is where the information about each Linux process lives. The /proc directory contains files that describe (including but not limited to):
- The current state of the Linux kernel.
- Information about the system hardware.
- The currently running processes.
Try running the commands below to find out more about what /proc contains:
cat /proc/cpuinfo
cat /proc/devices # lists the character and block devices known to the kernel
cat /proc/cmdline # kernel boot parameters, useful when debugging boot failures
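Each numeric directory under /proc is a PID, and its status file holds that process's details as "Key: value" lines. As a minimal sketch of reading one of those files, the helper below prints a single field for a given PID. The name proc_field is hypothetical and only used for illustration; it is not part of any package.

```shell
#!/bin/sh
# proc_field PID FIELD: print one field from /proc/PID/status.
# proc_field is a hypothetical helper name used for illustration.
proc_field() {
    awk -v key="$2:" '$1 == key { print $2; exit }' "/proc/$1/status"
}

# Example: show the name and state of the current shell ($$ is its PID).
echo "Name:  $(proc_field $$ Name)"
echo "State: $(proc_field $$ State)"
```

The same pattern works for other fields in the status file, such as PPid or VmRSS.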
Some files under /proc, notably those under /proc/sys, are writable and can be used to communicate configuration changes directly to the kernel.
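As a rough sketch of what that looks like in practice, kernel tunables live under /proc/sys, and a dotted name such as vm.swappiness maps to the file /proc/sys/vm/swappiness. The helper name tunable_path is an assumption made for this example, and vm.swappiness is just one convenient tunable to read.

```shell
#!/bin/sh
# tunable_path NAME: map a dotted sysctl-style name to its file under /proc/sys.
# tunable_path is a hypothetical helper name used for illustration.
tunable_path() {
    echo "/proc/sys/$(echo "$1" | tr '.' '/')"
}

# Read the current value of vm.swappiness, if the file is readable here.
path=$(tunable_path vm.swappiness)
if [ -r "$path" ]; then
    echo "vm.swappiness = $(cat "$path")"
fi

# Writing needs root and takes effect immediately, but is lost on reboot:
#   echo 10 | sudo tee /proc/sys/vm/swappiness
```

The write is left as a comment on purpose, since changing kernel settings on a live server should be a deliberate step.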
Most Linux distributions ship the procps package, which contains useful tools such as ps, top, vmstat, free, and uptime to help us with performance and process monitoring.
In addition to the previously discussed top and ps, there are alternatives to top that offer additional features or a more graphical view than the traditional top.
htop
htop offers a more visually appealing interface with color-coded bars for CPU and memory utilization.
It can also show processes in a tree-like structure, making it easier to understand the relationships between processes.
atop
atop can be configured and run on remote systems, making it suitable for large-scale monitoring environments.
atop provides long-term monitoring and analysis. It logs system data to a file, which lets you review historical trends and identify performance issues over time.
2. Where's my task manager?
On Windows, it's easy to find out what's going on with my system and processes by just hitting "Ctrl+Alt+Del" and going to "Task Manager". Why doesn't Linux provide something like that?
If you are in a GNOME environment, you can find a similar tool under your apps by searching for "System Monitor". System Monitor has 4 tabs:
- System: Shows basic system info.
- Process: Lists all the running processes. You can sort them and also perform operations such as stopping, terminating, or killing a process.
- Resources: Lists current CPU usage, memory and swap usage, network usage, and disk usage.
- File System: Lists all currently mounted file systems and additional info such as mount point, file system type, and space usage.
3. Virtual Memory statistics: vmstat
As the name suggests, the vmstat command provides detailed info regarding processes, memory, paging, block I/O, traps, and disk and CPU activity.
The first report vmstat prints shows averages since the last reboot. Subsequent reports cover each sampling period of the 'delay' you provide.
Some useful options with vmstat:
vmstat -s # lists memory statistics and event counters
Running vmstat -s gives you info regarding:
- Amount of memory: total memory, currently used memory, active/inactive memory, free, buffer, cache, etc.
- CPU statistics: ticks spent on normal and low-priority (nice) user processes, kernel processes, I/O wait, software interrupts, etc.
- Memory paging: total pages paged in and paged out from virtual memory, total pages read from and written to swap memory.
- Event counters: total interrupts, context switches, boot time, and forks since the last boot.
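To illustrate the sampling behaviour described above, here is a small sketch that runs vmstat with a delay and reads one column from the last report, so the value reflects the most recent sampling period instead of the average since boot. The helper name vmstat_column is hypothetical. It finds the column by its header label (for example id for CPU idle), so no column position is hard-coded.

```shell
#!/bin/sh
# vmstat_column NAME: read vmstat output on stdin and print the value of
# column NAME from the last report line. The column is located by its
# header label, so its position does not need to be hard-coded.
# vmstat_column is a hypothetical helper name used for illustration.
vmstat_column() {
    awk -v col="$1" '
        # Until the header row is found, look for the requested label.
        !idx {
            for (i = 1; i <= NF; i++)
                if ($i == col) idx = i
            next
        }
        # After the header, remember the value from each report line.
        { value = $idx }
        END { if (idx) print value }
    '
}

# Take 3 samples 1 second apart and report the idle CPU of the last one.
# Skipped when vmstat is not installed.
if command -v vmstat >/dev/null 2>&1; then
    echo "CPU idle: $(vmstat 1 3 | vmstat_column id)%"
fi
```

Passing a different label, such as free or cs, reads free memory or context switches from the same output.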
4. System Activity Reporter: sar
Go to your terminal and run the command below:
ls /var/log/sysstat
You will see a bunch of files named either saDD or saYYYYMMDD, where YYYY, MM, and DD stand for year, month, and day. These are the "Standard System Activity Daily Data Files".
These files are created by sar, which collects and reports information about system activity since the system started. It is possible to store the output of sar in a different file with the command below:
sar -o [filename] [interval] [count] # sample at the given interval and save the output to a different file
sar -1 # shows sar output from the previous day
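Since the daily data files can follow either naming scheme, a script that wants to replay a given day has to check both. The sketch below shows one way to do that. The helper name find_sa_file is hypothetical, and /var/log/sysstat is simply the directory used in this post, so adjust it if your distribution stores the files elsewhere.

```shell
#!/bin/sh
# find_sa_file DIR YYYYMMDD: print the daily data file for the given date,
# trying the saYYYYMMDD name first and then the shorter saDD name.
# find_sa_file is a hypothetical helper name used for illustration.
find_sa_file() {
    dir=$1
    ymd=$2
    dd=${ymd#??????}    # drop the first six characters, keeping the day
    for candidate in "$dir/sa$ymd" "$dir/sa$dd"; do
        if [ -f "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Replay today's CPU statistics if sar and a data file are both available.
if command -v sar >/dev/null 2>&1 &&
   file=$(find_sa_file /var/log/sysstat "$(date +%Y%m%d)"); then
    sar -u -f "$file" || echo "could not read $file"
fi
```

Here sar -f reads from an existing data file and -u selects the CPU report.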
Real world example:
Problem statement:
You want to keep a check on the current performance of your Linux server. You want to get notified if either CPU usage, Memory Usage, or System usage is going over a certain threshold and prevent unintentional system overutilization.
Assumptions & lab setup
I will be using the stress command below, which is packaged for most Linux distributions, to put the hardware and software under load. It can generate various types of load, including I/O, CPU, memory, and disk:
stress --cpu 8 --io 4 --vm 4 --vm-bytes 1024M --timeout 10s
I have specified:
- 8 CPU workers, a load equivalent to 8 busy CPU cores.
- 4 concurrent I/O workers.
- 4 virtual memory workers, each allocating 1024 MB.
- A timeout of 10 seconds.
Solution & Explanation:
#!/bin/bash
# Set the interval for monitoring (in seconds)
interval=5
while true; do
    # Get CPU usage (100 minus the idle percentage) and the load averages
    cpu_usage=$(top -bn1 | awk -F'[:,]' '/Cpu\(s\)/ {for (i = 2; i <= NF; i++) if ($i ~ /id/) {print 100 - $i; exit}}')
    avg_load=$(uptime | awk -F'load average: ' '{print $2}')
    # Get memory usage (in MB) from the Mem: row of free
    read -r mem_total mem_used mem_free <<< "$(free -m | awk '/^Mem:/ {print $2, $3, $4}')"
    mem_usage=$(( mem_used * 100 / mem_total ))
    # Get network statistics (ifconfig comes from the net-tools package)
    echo "Network Packets:"
    # Iterate over each interface
    for interface in $(ifconfig | grep 'flags' | cut -d':' -f1); do
        # Get RX packets and bytes
        packets_transferred=$(ifconfig "$interface" | grep 'RX packets')
        # Print the interface name and its RX counters
        echo "$interface : $packets_transferred"
    done
    # Get system uptime
    uptime_hours=$(uptime | awk -F, '{sub(".*up ", "", $1); print $1, $2}')
    echo "CPU Usage: $cpu_usage%"
    echo "Average Load: $avg_load"
    echo "Memory Usage: $mem_usage% ($mem_free MB free)"
    echo "System Uptime: $uptime_hours"
    echo
    sleep "$interval"
done
Code explanation:
Getting the CPU usage:
top -bn1 | awk -F'[:,]' '/Cpu\(s\)/ {for (i = 2; i <= NF; i++) if ($i ~ /id/) {print 100 - $i; exit}}'
- Running top once in batch mode (-b, so it also works without an interactive terminal) and picking out the CPU summary line. The top command shows details regarding the tasks, CPU, swap, and physical memory in the system. The awk command finds the idle (id) percentage on that line and subtracts it from 100 to get the usage.
Getting the Average system load:
In addition to the vmstat and sar commands, we can use the uptime command to get concise details about the system. The uptime command outputs the current time, how long the system has been running, how many users are currently logged on, and the system load averages for the past 1, 5, and 15 minutes.
avg_load=$(uptime | awk -F'load average: ' '{print $2}')
I am simply splitting the output on the "load average: " label to fetch only the load averages from uptime. Splitting on the label instead of counting fields keeps this working no matter how long the uptime part of the line is.
Later in the script, I manipulated the same uptime output to get the current uptime. I am using a regex to strip everything up to "up " so that it accommodates different uptimes, e.g. 15 days, 12 hours, and 2 minutes. Note that on a system that has been up for less than a day, the second field is the user count, so treat this value as a rough one.
uptime_hours=$(uptime | awk -F, '{sub(".*up ", "", $1); print $1, $2}')
Getting the Memory usage:
free is another useful command to get detailed output regarding the memory available, used, and free on the system. You can think of it as a more concise version of vmstat -s.
I am reading the total, used, and free columns of the Mem: row returned by free -m and computing the percentage of memory currently being used.
The output
Here's the output. For now, I am only printing the CPU usage, memory usage, and system uptime. We can extend the bash code to use push notification services such as Pushover, sendmail, etc.
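As a starting point for that extension, here is a minimal sketch of the threshold check from the problem statement. All three function names are hypothetical, and notify only prints the alert; a real script would call a push service or mailer at that point. The readings and limits in the example are made up.

```shell
#!/bin/sh
# is_over VALUE LIMIT: succeed (exit 0) when VALUE is greater than LIMIT.
# awk does the comparison because sh arithmetic cannot handle decimals
# such as the CPU percentage reported by top.
is_over() {
    awk -v v="$1" -v l="$2" 'BEGIN { exit !(v + 0 > l + 0) }'
}

# notify MESSAGE: placeholder that only prints the alert. A real script
# would call a push service or mailer here; this function is hypothetical.
notify() {
    echo "ALERT: $1"
}

# check_metric NAME VALUE LIMIT: raise an alert when VALUE is over LIMIT.
check_metric() {
    if is_over "$2" "$3"; then
        notify "$1 is at $2% (limit $3%)"
    fi
}

# Example with made-up readings and thresholds.
check_metric "CPU usage" 93.5 90
check_metric "Memory usage" 42 80
```

With the sample values above, only the CPU line triggers an alert. Inside the monitoring loop, the same call could be made with the measured CPU and memory values on every iteration.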
Additional Performance Monitoring Tools:
I would just like to list some additional tools that can help you monitor the performance of your Linux server better:
1. stacer: GUI for CPU/memory and other things.
2. saidar: similar to atop or htop.
3. cpu-x: My personal favorite, as it gives very precise details on CPU, memory, and disk usage and feels familiar to use.
You can also run a benchmark, like the stress command that I ran in the beginning, to stress test the CPU directly inside cpu-x.
Conclusion:
This article effectively explores various Linux process monitoring tools. From command-line utilities offering detailed insights to GUI tools providing a visual representation, you're equipped to choose the tools that best suit your needs. The provided script example demonstrates the practical application of these tools and opens doors for further customization with push notifications.
Feel free to ask any questions or share your preferred monitoring tools and techniques! Let's keep the discussion going!