Leon Nunes for Kubernetes Community Days Chennai

Linux Troubleshooting - A simple primer

In the beginning!

Linux has been around for a long time, and most of the web runs on Linux or FreeBSD. Linux is everywhere, but there are a lot of things under the surface that one has to learn when troubleshooting Linux servers. Let's look at some of the commands I used when I was doing tech support for customers.

Commands

netcat/nc

Customers often say that their domain isn't working on a certain port, or that they need to host an application on a port but it isn't working. nc lets you check whether that port is reachable at all.

$ nc --verbose -z google.com 443
Ncat: Connected to 216.58.203.14:443.
Ncat: 0 bytes sent, 0 bytes received in 0.10 seconds.

--verbose Verbose output
  -z      Zero-I/O mode, report connection status only
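
When a customer reports several ports at once, the same zero-I/O check can be looped. The following is a minimal sketch, assuming an nc build that supports both -z and -w (a timeout in seconds); check_ports is a hypothetical helper name, not part of netcat.

```shell
# check_ports HOST PORT...  (hypothetical helper, not shipped with netcat)
# Probes each TCP port on HOST and prints one status line per port.
check_ports() {
    host=$1
    shift
    for port in "$@"; do
        # -z: connect and close without sending data
        # -w 3: give up on the connection attempt after 3 seconds
        if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
            printf '%s:%s open\n' "$host" "$port"
        else
            printf '%s:%s closed or filtered\n' "$host" "$port"
        fi
    done
}
```

For example, `check_ports google.com 80 443` prints one line per port. A failed probe cannot tell a closed port from one dropped by a firewall, which is why the message says "closed or filtered".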

ss

From the man page (because even I didn't know this): "ss - another utility to investigate sockets". Say you have an application running on port 9001: how do you know whether it is actually listening for connections, and whether it is listening on all interfaces (0.0.0.0)? ss helps you figure that out.

ss -patun | grep -w 9001

tcp   LISTEN     0      4096                       *:9001                *:*     users:(("rootlessport",pid=691027,fd=11))

As you can see, this command gives you the protocol, whether the socket is listening, the address it is bound to (*:9001 means all interfaces), and the application that is using it, along with its process ID (PID) and file descriptor (FD).
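
Reading the bound address by eye gets tedious on a busy server. Here is a rough sketch that labels each listening socket by scope, assuming input in the column layout shown above (protocol, state, two queue columns, local address, peer address); listen_scope is a hypothetical helper name.

```shell
# listen_scope (hypothetical helper): reads ss output on stdin and prints
# "<port> <scope>" for every socket in the LISTEN state.
listen_scope() {
    awk '$2 == "LISTEN" {
        addr = $5
        port = addr
        sub(/.*:/, "", port)    # everything after the last colon is the port
        host = substr(addr, 1, length(addr) - length(port) - 1)
        if (host == "*" || host == "0.0.0.0" || host == "[::]")
            scope = "all interfaces"
        else if (host ~ /^127\./ || host == "[::1]")
            scope = "loopback only"
        else
            scope = "specific address " host
        print port, scope
    }'
}
```

Typical use would be `ss -patun | listen_scope`. UDP sockets do not report a LISTEN state, so this sketch skips them.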

Atop

Atop Process manager
Ever had a customer's server, or your own, run out of memory with nothing to pinpoint the reason? atop solves that: by default, Linux servers do not store any history of process activity, but atop does.

vmstat, pidstat, iostat.

vmstat displays virtual memory statistics, along with process, I/O, and CPU activity.
Vmstat output details

vmstat -w -S M 1 9                                                                                                               
--procs-- -----------------------memory---------------------- ---swap-- -----io---- -system-- --------cpu--------
   r    b         swpd         free         buff        cache   si   so    bi    bo   in   cs  us  sy  id  wa  st
   2    0         3545         5650          121         3052    0    0    69   183   61  216  22   9  68   0   0
   0    0         3545         5649          121         3052    0    0     0  1556 1466 4690   2   2  96   0   0
   0    0         3545         5649          121         3053    0    0     0   108 1160 4263   1   1  98   0   0
   0    0         3545         5649          121         3052    0    0     0    68 1584 5925   2   1  96   0   0
   0    0         3545         5648          121         3052    0    0     0    20 1213 4872   1   1  98   0   0
   0    0         3545         5649          121         3052    0    0     0     0  995 4079   1   1  99   0   0
   1    0         3545         5650          121         3052    0    0     0    88 1465 5533   2   1  97   0   0
   0    0         3545         5650          121         3052    0    0     0     0 1290 5131   1   1  98   0   0
   0    0         3545         5650          121         3052    0    0     0    96 1855 6303   2   2  96   0   0
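
The columns worth watching first are si/so (memory swapped in and out) and wa (CPU time spent waiting on I/O). As a small sketch, assuming the 17-column layout shown above, a filter like the following flags suspicious samples; vmstat_flags and its default threshold of 20 are illustrative choices, not part of vmstat.

```shell
# vmstat_flags [MAX_WA]  (hypothetical helper)
# Reads vmstat output on stdin and reports samples that show swap
# activity or an iowait percentage above MAX_WA (default 20).
vmstat_flags() {
    awk -v max_wa="${1:-20}" '
        # Data rows start with a number; header rows do not.
        $1 ~ /^[0-9]+$/ && NF >= 17 {
            sample++
            if ($7 + $8 > 0)
                print "sample " sample ": swapping (si=" $7 ", so=" $8 ")"
            if ($16 + 0 > max_wa + 0)
                print "sample " sample ": high iowait (wa=" $16 "%)"
        }'
}
```

It can be fed directly, for example `vmstat -w -S M 1 9 | vmstat_flags 10`. Keep in mind that the first row vmstat prints is an average since boot, so later rows reflect current behaviour better.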

Then there is iostat, which reports CPU utilization and per-device I/O statistics such as transfers per second and the amount of data read and written.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          22.3%    0.0%    9.2%    0.1%    0.0%   68.4%

      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     7.09        29.8k       143.2k         0.0k       5.4G      26.1G       0.0k sda
     1.74        33.9k        25.6k         0.0k       6.2G       4.7G       0.0k sdb

More details here

And then there is pidstat, which gives you per-process statistics, such as which process is using a lot of memory or CPU. More Details

There are many such tools in the sysstat package.
There is also a really nice article by Netflix; read it here.

Using the system journal correctly will save you a lot of headaches.
A few journal commands I use:

journalctl --since=today -g oom

This essentially greps the journal for the keyword oom, so no more journalctl | grep, please.

journalctl -u httpd.service --since=today

This will give you details about the httpd service only.

Checking Disk space.

This is often overlooked. The quickest way to check disk space is:

$ df -Th

That's it, nothing more.
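
If you would rather have the nearly full filesystems picked out for you, the df output can be filtered. This is a minimal sketch, assuming the POSIX output format of `df -P` (usage percentage in column 5, mount point in column 6) and mount points without spaces; disk_alert and its default limit of 90 are hypothetical.

```shell
# disk_alert [LIMIT]  (hypothetical helper)
# Reads `df -P` output on stdin and prints every mount point whose
# usage is at or above LIMIT percent (default 90).
disk_alert() {
    awk -v limit="${1:-90}" 'NR > 1 {
        use = $5
        sub(/%/, "", use)          # "95%" -> "95"
        if (use + 0 >= limit + 0)
            print $6, use "%"
    }'
}
```

For example, `df -P | disk_alert 80` lists only the filesystems that are at least 80% full.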

Need to find files that are occupying disk? Not an issue.

$ sudo du -ach / | awk '$1 ~/[G]/ {print}' 

This prints every file and directory whose size is reported in gigabytes. You can also use find to locate large files.

top

The easiest way to check server load is to launch the top command. It is an essential tool in Linux troubleshooting.

Checking Server port usage.

If you ever notice an unfamiliar port in use and want to know which process is occupying it, along with the sockets it has open on that port, simply run:

lsof -i :9001

Checking DNS resolution

DNS is one of the most important pieces when it comes to servers and domains.

dig a google.com
;; ANSWER SECTION:
google.com.     188 IN  A   142.250.76.206

This tells you whether DNS resolution is working: an ANSWER SECTION containing a record means the name resolved.
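
To check several names in one go, dig's +short option (which prints only the answer data) is handy. A rough sketch follows; check_dns is a hypothetical helper name, and the sketch only looks at A records.

```shell
# check_dns NAME...  (hypothetical helper)
# Looks up the A record of each NAME and reports whether it resolved.
check_dns() {
    for name in "$@"; do
        # Lines starting with ";" are dig diagnostics (for example
        # timeouts), not answers, so they are dropped before checking.
        answer=$(dig +short "$name" A 2>/dev/null | sed '/^;/d' | tr '\n' ' ')
        if [ -n "$answer" ]; then
            printf '%s resolves to %s\n' "$name" "${answer% }"
        else
            printf '%s did not resolve\n' "$name"
        fi
    done
}
```

Calling `check_dns google.com example.org` prints one line per name. A name that resolves only through a CNAME will show the CNAME target along with the addresses.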

Checking How much Memory is left

The free command is used to check the memory usage

free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi       7.2Gi       4.5Gi       1.5Gi       3.6Gi       6.2Gi
Swap:           10Gi       3.4Gi       7.4Gi
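
Note that the available column, not free, is the better estimate of how much memory new applications can still use, because it counts cache that the kernel can reclaim. As a small sketch, assuming the six-column Mem: row shown above with plain numeric units (for example from `free -m`), the share of available memory can be computed like this; mem_available_pct is a hypothetical helper name.

```shell
# mem_available_pct (hypothetical helper)
# Reads `free -m` style output on stdin and prints available memory
# as a whole-number percentage of total memory.
mem_available_pct() {
    awk '$1 == "Mem:" { printf "%d\n", $7 * 100 / $2 }'
}
```

Use it as `free -m | mem_available_pct`. The human-readable output of `free -h` will not work here, because the unit suffixes (Gi, Mi) can differ between columns.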

That's all. There are probably a few more commands I use; if I remember them, I will let you all know.

If you would like to chat with me or have a discussion, I'm always available at @mediocreDevops.
