DEV Community

Watson

Some Notes on Process

What is a process

A process is a running instance of a program. A program is an image, stored on disk, that contains a set of machine-language instructions and some data. You can list the processes on your machine with the "ps" command. Here is an example of "ps" running on Ubuntu 20.04 under WSL.

$ ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 17:23 ?        00:00:00 /init
root         5     1  0 17:23 tty1     00:00:00 /init
watson       6     5  0 17:23 tty1     00:00:00 -bash
watson     111     6  0 19:23 tty1     00:00:00 ps -ef

As you can see, the "ps" command itself is also a process.
Every Linux distribution has an "init" process whose process ID is 1. Init is the first user-space process to run on the machine; to be exact, process 0 has already run inside the kernel by then. Every other user-space process descends from init, so it is said to be the parent of all processes.

Processes have not only a PID but also a PPID, the parent process ID, so processes can be represented as a tree. In this case, the tree looks like this.

[Figure: process tree. init (PID 1) → init (PID 5) → bash (PID 6) → ps -ef (PID 111)]

There are two init processes because I am using WSL. WSL uses its own init process, which is different from the one on a regular Linux system. WSL needs to run a 9P server to enable Windows to access the Linux file system. This task lives in the same init binary but runs as a separate process. That is why there is more than one init process when using WSL.
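The PID/PPID relationship can also be followed from code. The sketch below is a minimal, Linux-only illustration that assumes /proc is mounted; `parent_of` and `ancestry` are hypothetical helper names, not part of any library. It walks from the current process up to the root of the tree by reading the PPid field of /proc/<pid>/status.

```python
import os


def parent_of(pid):
    """Return the parent PID of `pid` by reading /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("PPid:"):
                return int(line.split()[1])
    raise ValueError(f"no PPid field found for pid {pid}")


def ancestry(pid):
    """Return [pid, parent, grandparent, ...] up to the root of the process tree."""
    chain = [pid]
    while chain[-1] > 1:
        ppid = parent_of(chain[-1])
        if ppid == 0:  # no visible parent (e.g. the root of a PID namespace)
            break
        chain.append(ppid)
    return chain


if __name__ == "__main__":
    print(ancestry(os.getpid()))
```

Started from the bash shell in the listing above, the chain would be expected to end with 6, 5, 1, matching the PPID column of the "ps -ef" output.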

Memory Architecture

This is an overview of virtual memory and the executable image. Different UNIX systems use different layouts for processes, but most modern systems use the Executable and Linkable Format (ELF) for their executable images.

[Figure: overview of virtual memory and the executable image]

Kernel space

Process structure

The kernel maintains a process structure for every process. This structure contains the information that the kernel needs to manage the process. The exact set of fields differs among Linux distributions and kernel versions, but most include the following.
・Process id
・Parent process id (or pointer to parent's process structure)
・Pointer to list of children of the process
・Process priority for scheduling, statistics about CPU usage and last priority
・Process state
・Signal information (signals pending, signal mask, etc.)
・Machine state
・Timers
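Several of these fields can be inspected from user space. The sketch below is a Linux-only illustration that assumes /proc is mounted and exposes the usual field names (Pid, PPid, State, SigPnd, SigBlk); `read_status` is a hypothetical helper. It shows a view the kernel exports of the process, not the in-kernel structure itself.

```python
import os


def read_status(pid="self"):
    """Parse /proc/<pid>/status into a dict of field name -> raw string value."""
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields


if __name__ == "__main__":
    status = read_status()
    # Process ID, parent ID, state, pending signals, and blocked-signal mask.
    for key in ("Pid", "PPid", "State", "SigPnd", "SigBlk"):
        print(f"{key:8} {status.get(key, 'n/a')}")
    # Scheduling priority (nice value) of the calling process.
    print(f"{'nice':8} {os.getpriority(os.PRIO_PROCESS, 0)}")
```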

User structure

The user structure holds far less information than the process structure. In Linux, it contains the memory maps and the process control block. The memory maps generally include the start and end addresses of the text, data, and stack segments, the base and limit registers for the rest of the address space, and so on. The process control block contains the CPU state and the virtual memory state.

Kernel stack

Each process has its own kernel stack. Functions in kernel space are carefully designed to be non-recursive, because recursive functions can use a lot of stack space. Without recursion, the maximum possible stack size can be determined by tracing the chain of function calls, so the kernel stack is allocated with a fixed size.
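The reasoning about tracing the call chain can be made concrete with a small worked example. The call graph below is entirely hypothetical: the function names and frame sizes are invented for illustration and do not correspond to real kernel code. Because no function calls itself, directly or indirectly, the graph has no cycles, and the worst-case stack usage is simply the most expensive path through it.

```python
from functools import lru_cache

# Hypothetical call graph: name -> (stack frame size in bytes, callees).
# The names and sizes are invented for illustration only.
CALL_GRAPH = {
    "syscall_entry": (128, ("read_file",)),
    "read_file": (96, ("copy_block", "check_access")),
    "copy_block": (256, ("copy_bytes",)),
    "check_access": (64, ()),
    "copy_bytes": (48, ()),
}


@lru_cache(maxsize=None)
def max_stack(func):
    """Worst-case stack usage in bytes of any call chain starting at `func`."""
    frame, callees = CALL_GRAPH[func]
    # Without recursion the graph has no cycles, so this always terminates.
    return frame + max((max_stack(c) for c in callees), default=0)


if __name__ == "__main__":
    print(max_stack("syscall_entry"))  # 128 + 96 + 256 + 48 = 528
```

If any function in the graph were recursive, no such bound could be computed ahead of time, which is the motivation for the fixed-size design described above.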

User space

Text segment

The text segment is the program's executable code. This segment is read-only and can be shared among all processes running the same program.

Stack segment

The stack segment provides storage for function return addresses, function arguments, and so on. If the stack grows until it meets the top of the heap, an exception is raised.

Data segment

The data segment holds initialized and uninitialized data. Initialized data has a starting value, and its name comes from the symbol table. Uninitialized data has no stored value, so only its offset within the data segment is recorded. The data segment grows or shrinks through explicit memory requests such as the brk() system call. In C, malloc() is a library function that may call brk() (or mmap()) internally.
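The user-space segments described above can be observed for a live process. The sketch below is a Linux-only illustration that assumes /proc is mounted and that the C library exports sbrk(); `read_maps` and `program_break` are hypothetical helper names. It lists the file-backed executable mappings (text), the region labeled [heap] (the part of the data segment grown with brk()), and [stack], then prints the current program break.

```python
import ctypes


def read_maps(pid="self"):
    """Return (start, end, perms, path) tuples parsed from /proc/<pid>/maps."""
    regions = []
    with open(f"/proc/{pid}/maps") as f:
        for line in f:
            # Format: address perms offset dev inode [pathname]
            parts = line.split(None, 5)
            start, end = (int(x, 16) for x in parts[0].split("-"))
            path = parts[5].strip() if len(parts) > 5 else ""
            regions.append((start, end, parts[1], path))
    return regions


def program_break():
    """Return the current end of the data segment by calling sbrk(0)."""
    libc = ctypes.CDLL(None)
    libc.sbrk.restype = ctypes.c_void_p
    libc.sbrk.argtypes = [ctypes.c_ssize_t]
    return libc.sbrk(0)


if __name__ == "__main__":
    for start, end, perms, path in read_maps():
        if "x" in perms and path.startswith("/"):
            label = "text"
        elif path == "[heap]":
            label = "heap"
        elif path == "[stack]":
            label = "stack"
        else:
            continue
        print(f"{start:016x}-{end:016x} {perms} {label:5} {path}")
    print(f"program break: {program_break():#x}")
```

Text mappings normally show permissions such as r-xp (readable and executable, not writable), which matches the read-only property described for the text segment.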

Types of ELF

There are three main types of ELF file. Each one is produced at a different stage of the compilation process.

[Figure: the ELF file types and the compilation steps that produce them]

1 - Object file (*.o)
This is a binary holding code and data to be linked with other object files to create an executable. It is only one part of an executable, so it cannot be executed by itself.

2 - Executable
This is a binary that can be executed. All the object files needed to run the program are linked together to generate the executable. A *.a file is a static library, which is an archive of object files. When a program needs a static library, the linker links it in at build time. If you don't specify a name for the output executable, it will be called a.out, short for "assembler output".

3 - Shared object file (*.so)
This is also known as a dynamically linked library. When a program needs it, it is linked at run time.
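The type of an ELF file is recorded in its header, in the two-byte e_type field that follows the 16-byte e_ident array. The sketch below reads that field; `elf_type` and `elf_type_of_file` are hypothetical helper names. Note that the ELF format also defines a fourth value for core dumps, and that position-independent executables report the same type as shared objects.

```python
import struct
import sys

# e_type values defined by the ELF specification.
ELF_TYPES = {
    1: "ET_REL (relocatable object file)",
    2: "ET_EXEC (executable)",
    3: "ET_DYN (shared object)",
    4: "ET_CORE (core dump)",
}


def elf_type(header):
    """Return a description of e_type given the first 18 bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[5] (EI_DATA): 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return ELF_TYPES.get(e_type, f"unknown ({e_type})")


def elf_type_of_file(path):
    """Read just enough of `path` to classify it."""
    with open(path, "rb") as f:
        return elf_type(f.read(18))


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, "->", elf_type_of_file(path))
```

Passing the paths of a *.o file, an executable, and a *.so file on the command line prints the type recorded in each header.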
