I have been on both sides of the table, as interviewer and as interviewee, for DevOps and SRE roles. In this blog I am sharing some of the questions I have been asked or have asked.
Note: This is just to share knowledge, experience and some fun questions.
Linux Troubleshooting
Any DevOps or SRE interview commonly starts with some troubleshooting questions, where the interviewer tries to probe your knowledge of Linux internals and some basic core concepts. Here are some that are top of my mind:
1. What happens when a Linux system boots, until you get a login prompt?
This type of question usually comes from companies that still run bare-metal servers and don't use a public cloud. So let's see what happens.
A detailed answer can be found here
2. What happens when you type ls in a terminal?
These types of questions are used to gauge the interviewee's attention to detail and depth of knowledge of Linux internals. Basically, the interviewer wants to know if you understand the fork() and exec() system calls.
The shell reads what you typed using the getline() function, and a function called strtok() tokenizes the line. The shell then checks whether the first token, ls, is a shell alias or a built-in. If it is neither, the shell searches the directories listed in the PATH variable, which holds the paths where executable binaries live. Once it finds the binary for ls, the shell makes the fork() system call. This creates a child process that will become ls, with the shell as the parent process. fork() returns 0 to the child process, so it knows it has to act as the child, and returns the PID of the child to the parent process (i.e. the shell).
Next, the child executes the execve() system call, which loads the ls program into a brand-new address space. Now ls can start running. The ls utility reads the directory entries from the disk (through readdir(), which uses the getdents64() system call), and with options like -l it also calls stat() on each entry to read the metadata stored in the file's inode.
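Once the ls process is done executing, it calls the _exit() system call with the integer 0, which denotes normal termination, and the kernel frees up its resources. Meanwhile the shell, which has been waiting on the child with wait(), collects that exit status and shows a new prompt.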
Note: you can use strace ls to dig deeper into the system calls.
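To make the fork/exec/wait cycle above concrete, here is a minimal Python sketch of what a toy shell does for one command line, assuming a Linux host. It is not how bash is implemented: it skips aliases and built-ins, uses shlex.split in place of strtok(), and uses shutil.which for the PATH search; run_command is a hypothetical helper name.

```python
import os
import shlex
import shutil


def run_command(line):
    """Run one command line like a toy shell: tokenize, search PATH, fork, exec, wait."""
    argv = shlex.split(line)            # tokenize the line (a C shell might use strtok())
    if not argv:
        return 0                        # empty line: nothing to run
    path = shutil.which(argv[0])        # search each directory listed in $PATH
    if path is None:
        raise FileNotFoundError(f"{argv[0]}: command not found")
    pid = os.fork()                     # duplicate the current (shell) process
    if pid == 0:
        # fork() returned 0, so this is the child: replace its program image
        try:
            os.execv(path, argv)        # on success, execv never returns
        finally:
            os._exit(127)               # reached only if execv failed
    # fork() returned the child's PID, so this is the parent: wait for the child
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

Calling run_command("ls -l /tmp") prints the listing from the child process while the parent blocks in waitpid(), which is the same parent/child split described above. Running the script under strace -f shows the clone and execve calls.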
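3. Explain Linux inodes
An inode number points to an inode. An inode is a data structure that stores information about a file or folder (its size, owner, permissions, timestamps and the location of its data blocks), but not its name: the name lives in the directory entry that maps it to the inode number.
A detailed answer is available here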
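A quick way to see the name-versus-inode split is to compare a hard link with a copy. This is a small Python sketch assuming a Linux filesystem that supports hard links (tmpfs and ext4 both do); inode_demo is a hypothetical helper. The hard link is just a second directory entry for the same inode, while the copy gets an inode of its own.

```python
import os
import tempfile


def inode_demo():
    """Compare inode numbers of a file, a hard link to it, and a copy of it."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original.txt")
        hard_link = os.path.join(d, "hard_link.txt")
        copy = os.path.join(d, "copy.txt")

        with open(original, "w") as f:
            f.write("hello")
        os.link(original, hard_link)          # new directory entry, same inode
        with open(original) as src, open(copy, "w") as dst:
            dst.write(src.read())             # new file, so a new inode

        st = os.stat(original)
        return {
            "original": st.st_ino,
            "hard_link": os.stat(hard_link).st_ino,
            "copy": os.stat(copy).st_ino,
            "link_count": st.st_nlink,        # how many names point at this inode
            "size": st.st_size,               # size is metadata kept in the inode
        }
```

The shell equivalent is ls -i, which prints the inode number next to each name.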
4. Crash vs Panic
A crash usually happens when a trap occurs because the application tries to access memory incorrectly. A panic is usually when the application kills or shuts itself down abruptly. The main difference between a crash and a panic is that a crash is initiated by the hardware or OS, while a panic is usually initiated by the application itself by calling the abort() function. Some applications use a special function called a signal handler to generate information about the trap; others use gdb to collect information about it.
The most common bad-programming signals are SIGSEGV, SIGBUS and SIGILL, usually caused by bad memory management, a bad pointer, uninitialized values or memory corruption.
5. Explain the /proc filesystem
/proc is special in that it is a virtual filesystem. It is sometimes referred to as a process information pseudo-filesystem: it doesn't contain "real" files but runtime system information. A lot of system utilities simply read files in this directory. The /proc filesystem has a directory for every running process, named after its PID. If you do cd /proc/self you will see a lot of files whose size is 0, yet they do contain information:
/proc/self/maps provides information about the memory address space of the process
/proc/self/cmdline contains the command-line arguments of the process
/proc/self/environ provides the process's current environment
/proc/self/fd contains a symbolic link for each file the process currently has a file descriptor open on
Some system-wide entries are useful too:
/proc/locks shows all the file locks that currently exist in the system
/proc/sys/fs contains useful information such as file-nr, which shows the number of allocated file handles, the number of allocated but unused handles, and the system-wide maximum
/proc/sys/vm holds files to tune virtual memory
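Reading a few of these entries from a program shows the "size 0 but full of data" behaviour directly. This is a minimal Python sketch assuming a Linux /proc; proc_self_summary is a hypothetical helper, and the dictionary keys are my own naming.

```python
import os


def proc_self_summary():
    """Collect a few facts about the current process and the system from /proc."""
    # stat() reports size 0: the kernel generates the content only when it is read
    cmdline_size = os.stat("/proc/self/cmdline").st_size

    with open("/proc/self/cmdline", "rb") as f:
        # Arguments are NUL-separated; empty arguments are dropped here for simplicity
        argv = [a.decode(errors="replace") for a in f.read().split(b"\0") if a]

    with open("/proc/self/maps") as f:
        first_mapping = f.readline().split()[0]    # an address range "start-end"

    # One symlink per open file descriptor of this process
    open_fds = sorted(int(fd) for fd in os.listdir("/proc/self/fd"))

    with open("/proc/sys/fs/file-nr") as f:
        allocated, unused, maximum = (int(x) for x in f.read().split())

    return {
        "cmdline_size": cmdline_size,
        "argv": argv,
        "first_mapping": first_mapping,
        "open_fds": open_fds,
        "allocated_handles": allocated,
        "unused_handles": unused,
        "file_max": maximum,
    }
```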
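6. I get a "filesystem is full" error, but df shows there is free space
First check whether the filesystem has run out of inodes: df -i shows an IFree of 0 when every inode is in use, so no new file can be created even though data blocks are free. If that is not the case, check whether deleted files are still held open by a process using lsof (for example lsof +L1, which lists open files whose link count is 0) and restart those processes so the space is released.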
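Both checks can be sketched in Python on Linux, assuming /proc is mounted. inode_usage mirrors the idea behind df -i using os.statvfs, and deleted_but_open approximates lsof +L1 by looking for fd symlinks whose target the kernel has suffixed with " (deleted)". Both helper names are hypothetical, and without root you will only see your own processes.

```python
import os


def inode_usage(path="/"):
    """Return (total_inodes, free_inodes) for the filesystem holding path, like df -i."""
    st = os.statvfs(path)
    return st.f_files, st.f_ffree


def deleted_but_open():
    """Yield (pid, fd, target) for files that were unlinked but are still held open."""
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fd_dir = f"/proc/{pid}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue                    # another user's process, or it already exited
        for fd in fds:
            try:
                target = os.readlink(f"{fd_dir}/{fd}")
            except OSError:
                continue                # fd closed while we were looking
            if target.endswith(" (deleted)"):
                yield int(pid), int(fd), target
```

Some filesystems (btrfs, for example) allocate inodes dynamically and report 0 total inodes, so a zero from inode_usage is only meaningful on filesystems like ext4.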
7. What performance tools would you use on a Linux machine?
uptime
dmesg | tail
vmstat 1
mpstat -P ALL 1
pidstat 1
iostat -xz 1
free -m
sar -n DEV 1
sar -n TCP,ETCP 1
top
A detailed answer is available here
8. Explain the Linux filesystem
The interviewer wants to know how much you understand about Linux filesystems. A filesystem is a specific data storage format, such as EXT3, EXT4, BTRFS, XFS, and so on. Linux supports almost 100 types of filesystems.
A detailed answer is available here
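To see which filesystem types the running kernel currently knows about, you can read /proc/filesystems. A small Python sketch, assuming Linux; supported_filesystems is a hypothetical helper. Entries marked nodev are virtual filesystems that need no block device (proc, sysfs, tmpfs), while the others, such as ext4 or xfs, live on a disk. Only types that are built in or whose module is loaded appear in the list.

```python
def supported_filesystems(path="/proc/filesystems"):
    """Parse /proc/filesystems into {name: needs_block_device}."""
    result = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue
            if parts[0] == "nodev":
                result[parts[1]] = False    # virtual: no backing block device
            else:
                result[parts[0]] = True     # disk-based filesystem
    return result
```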
9. Explain kernel space and user space
This can be a rabbit-hole question: the interviewer can go as deep as they like to find your limits. It is also one of the most interesting topics in Linux: how control flows from user space to kernel space, and why that matters. Why can't we access kernel space directly? What are internal libraries like libc used for, and why do we need system calls?
A detailed answer is available here
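One way to see libc's role is to call its wrappers directly. This rough Python sketch, assuming Linux with a C library loadable through ctypes, calls libc's getpid() and write(). Each is a thin user-space function that executes a system call, switching the CPU into kernel mode, because user code is not allowed to touch kernel data structures or device buffers itself. pid_via_libc and write_via_libc are hypothetical names.

```python
import ctypes
import ctypes.util
import os

# Load the C library (find_library may return None; CDLL(None) then uses the
# symbols already linked into the running process, which include libc on Linux)
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.write.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_size_t]
libc.write.restype = ctypes.c_ssize_t


def pid_via_libc():
    # libc's getpid() issues the getpid system call and returns the kernel's answer
    return libc.getpid()


def write_via_libc(fd, data: bytes):
    # write(2): user space asks the kernel to copy these bytes on its behalf
    n = libc.write(fd, data, len(data))
    if n < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return n
```

Python's own os.getpid() and os.write() go through the same libc wrappers, which is why both paths return the same results.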
10. How would you troubleshoot a high I/O issue?
11. What are processes and threads?
Processes are programs in execution: the scheduler dispatches them from the ready state onto the CPU. The kernel represents each process with a PCB (Process Control Block), which on Linux is the task_struct. A process can create other processes, which are known as child processes. A process takes more time to create and terminate than a thread, and it is isolated, meaning it does not share its memory with any other process. Threads, by contrast, run inside a process and share its address space.
A detailed answer is available here
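The memory-sharing difference is easy to demonstrate. Below is a minimal Python sketch, assuming a POSIX system with fork(); the helper names are hypothetical. A thread increments a value the caller can see afterwards, while a forked child increments its own copy-on-write copy, so the parent's value is untouched.

```python
import os
import threading


def increment(box):
    box[0] += 1


def after_thread():
    box = [0]
    t = threading.Thread(target=increment, args=(box,))
    t.start()
    t.join()
    return box[0]       # the thread wrote to the same memory as the caller


def after_child_process():
    box = [0]
    pid = os.fork()
    if pid == 0:
        increment(box)  # changes only the child's private copy of memory
        os._exit(0)
    os.waitpid(pid, 0)
    return box[0]       # the parent's memory was never touched
```

after_thread() returns 1 and after_child_process() returns 0; sharing data between processes needs an explicit mechanism such as a pipe or shared memory.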
12. Explain kernel memory management
This is not a trivial question; it is very deep and convoluted. So I would hope the interviewer is only trying to see if you understand the basics of kernel memory management.
A detailed answer is available here
13. Explain processes and threads.
14. Explain the different task states.
15. Explain Linux concurrency and race conditions.
16. Explain the stack and the heap in an operating system.
17. Explain memory leaks.
Naive definition: failure to release unreachable memory, which then can no longer be allocated again by any process for as long as the allocating process runs. This can mostly be cured by garbage collection (GC) techniques or detected by automated tools.
Subtle definition: failure to release reachable memory that is no longer needed for your program to function correctly. This is nearly impossible to detect with automated tools or by programmers who are not familiar with the code. While technically it is not a leak, it has the same implications as the naive one. This is not just my own idea: you can come across projects that are written in a garbage-collected language but still mention fixing memory leaks in their changelogs.
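Here is what the subtle kind looks like in a garbage-collected language. This is a minimal Python sketch; handle_request and growth_after are hypothetical names, and tracemalloc (from the standard library) is just one way to watch the growth. The handler keeps every payload in a module-level list, so the memory is always reachable and the garbage collector can never free it, yet the program never uses it again.

```python
import tracemalloc

_history = []  # module-level list: everything appended here stays reachable


def handle_request(payload_size=10_000):
    """Hypothetical handler that keeps every payload 'for debugging' and never drops it."""
    payload = bytearray(payload_size)
    _history.append(payload)            # reachable forever, so no GC can reclaim it
    return len(payload)


def growth_after(n_requests):
    """Measure how much traced Python memory grows while serving n_requests."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    for _ in range(n_requests):
        handle_request()
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before
```

Memory grows by at least 10 KB per request and never comes back; the usual fix is a bounded structure (for example collections.deque with maxlen) or removing the reference once it is no longer needed.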
18. How does Linux handle interrupts?
19. Explain load average.
The best explanation of load average and its internals is here. I would encourage everybody to go through this website for a deeper understanding of the internals.
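The numbers uptime prints come from /proc/loadavg. This small Python sketch, assuming Linux, parses that file; parse_loadavg and read_loadavg are hypothetical helpers. The first three fields are the 1-, 5- and 15-minute averages (on Linux they count runnable tasks plus tasks in uninterruptible sleep), the fourth is runnable/total scheduling entities, and the last is the most recently assigned PID. Python's os.getloadavg() returns the same three averages.

```python
def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict."""
    one, five, fifteen, sched, last_pid = text.split()
    runnable, total = (int(x) for x in sched.split("/"))
    return {
        "1min": float(one),
        "5min": float(five),
        "15min": float(fifteen),
        "runnable": runnable,       # scheduling entities currently runnable
        "total": total,             # scheduling entities that currently exist
        "last_pid": int(last_pid),  # most recently assigned PID
    }


def read_loadavg(path="/proc/loadavg"):
    with open(path) as f:
        return parse_loadavg(f.read())
```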
20. What happens when you curl a website?
This is a very famous question that comes up every now and then. I think we should all be aware of the internal process flow when you run curl www.google.com. One of the best detailed explanations I found is here. One can certainly argue that this is way too much detail, but there is no harm in knowing things completely: you may not say all of this when asked, but you should certainly know it.
Other awesome resources out there for interview preparation
I made this effort to put all of these together in one place. I will keep tracking these and add them here in future parts, so stay tuned!