DEV Community

Satoru Takeuchi
How Linux Works: Chapter1 Linux Overview (Part2)

Libraries

In this section, we will discuss libraries provided by the operating system. Many programming languages offer the ability to bundle commonly used functions across multiple programs into libraries. This allows programmers to efficiently develop programs by choosing from a vast array of libraries created by their predecessors. Some libraries, which are expected to be used by a large number of programs, may be provided by the operating system.

The following figure shows the software hierarchy when a process is using a library.

[Figure: software hierarchy when a process uses a library]

The C language has a standard library defined by the International Organization for Standardization (ISO), and Linux provides this standard C library. Typically, glibc, which is provided by the GNU project, is used as the standard C library. In this book, we will refer to glibc as libc.

Almost all programs written in C are linked with libc.

You can use the ldd command to check which libraries a program is linked with. Let's take a look at the ldd output for the echo command.

$ ldd /bin/echo
        linux-vdso.so.1 (0x00007ffef73a9000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2925ebd000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f29260d1000)
$

In the above example, libc.so.6 refers to the standard C library. Also, ld-linux-x86-64.so.2 is a special library for loading shared libraries, which is also one of the libraries provided by the OS.

Let's also check the cat command.

$ ldd /bin/cat
        linux-vdso.so.1 (0x00007ffc3b155000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fabd1194000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fabd13a9000)
$

The cat command is also linked with libc. Let's also look at the python3 command, which is the Python 3 interpreter.

$ ldd /usr/bin/python3
        linux-vdso.so.1 (0x00007ffc91126000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5fb7206000)
 ...
        /lib64/ld-linux-x86-64.so.2 (0x00007f5fb740f000)
$

Again, libc is linked. In other words, the standard C library is used internally when Python programs are executed. Although few people may write C directly nowadays, this shows that C remains important as a foundation at the OS level.

If you run the ldd command for various programs existing in the system, you will see that many of them are linked with libc. Please give it a try.

In addition to libc, Linux provides standard libraries for various other programming languages, such as C++. It also provides libraries that are not part of any standard but that many programmers are likely to use. In Ubuntu, the names of library packages often begin with the string "lib". When I ran dpkg-query -W | grep lib in my environment, over 1000 packages were displayed.

Wrapper Functions for System Calls

libc provides not only the functions of the standard C library but also "wrapper functions" for system calls. Unlike regular functions, system calls cannot be called directly from high-level languages such as C. They must be invoked using architecture-dependent assembly code.

For example, in the x86_64 CPU architecture, the getppid() system call is issued at the assembly code level as follows:

mov    $0x6e,%eax
syscall

In the first line, the system call number for getppid(), 0x6e, is assigned to the eax register, as required by the Linux system call calling convention. In the second line, the syscall instruction issues the system call and switches the CPU to kernel mode, after which the kernel code that handles getppid() runs. If you don't usually write assembly language, you don't need to understand this code in detail. It is enough to notice that it looks clearly different from the source code you normally see.

In the arm64 architecture, which is mainly used in smartphones and tablets, the getppid() system call is issued at the assembly code level as follows:

mov     x8,  <system call number>
svc     #0

Quite different, isn't it? Without the help of libc, every time you issue a system call, you would have to write architecture-dependent assembly source code and call it from a high-level language.

This would make program creation more time-consuming and not portable to other architectures.

To solve these problems, libc provides a set of "wrapper functions" for system calls, each of which simply issues the corresponding system call internally. Wrapper functions are implemented for each architecture. A user program written in a high-level language only needs to call the system call wrapper functions prepared for that language.

Static Libraries and Shared Libraries

Libraries can be classified into two types: static libraries and shared (or dynamic) libraries. Both provide the same functionality, but the way they are incorporated into a program is different.

When creating a program, first, you compile the source code to create a file called an object file. Then, you link the library used by the object file to create the executable file. At link time, static libraries incorporate the functions within the library into the program. In contrast, shared libraries only embed information such as "call this function of this library" in the executable file at link time. Then, at program startup or during execution, the library is loaded into memory, and the program calls the functions within it.

The following figure shows the difference between static and shared linking for a program named pause, which only calls the pause() system call and does nothing else.

[Figure: the pause program linked with a static library versus a shared library]

And here is the source code of pause.

#include <unistd.h>

int main(void) {
    pause();
    return 0;
}

Let's verify this explanation by checking the following two points:

  • The size of pause program
  • Link status with shared libraries

As an example, let's consider linking the libc library to the program. First, let's check the case of using the static library "libc.a" (see footnote 1).

$ cc -static -o pause pause.c
$ ls -l pause
-rwxrwxr-x 1 sat sat 871688  Feb 27 10:29 pause  ... (1)
$ ldd pause
        not a dynamic executable   ... (2)
$

The execution results show the following:

  • (1) The program size is just under 900KB
  • (2) No shared libraries are linked

Since this program already incorporates libc, it will still work if "libc.a" is deleted. However, doing so would be very dangerous because other programs would no longer be able to statically link with libc, so please do not do this.

Next, let's consider the case of using the shared library "libc.so" (see footnote 2).

$ cc -o pause pause.c
$ ls -l pause
-rwxrwxr-x 1 sat sat 16696  Feb 27 10:43 pause
$ ldd pause
        linux-vdso.so.1 (0x00007ffc18a75000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f64ad4e9000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f64ad6f7000)
$

From these results, we can see the following:

  • The size is about 16KB, which is a fraction of the size when libc is statically linked.
  • libc ("/lib/x86_64-linux-gnu/libc.so.6") is dynamically linked.

The pause command with dynamically linked libc will not execute if libc.so is deleted. In fact, doing so is even more dangerous than deleting libc.a, as it would render all programs that link to libc.so inoperable. If this happens, you'll need to use complex methods to recover or reinstall the entire OS. Please do not do this under any circumstances.

The reason for the small size is that libc is not embedded in the program itself but is loaded into memory at runtime. Instead of using separate copies of libc code for each program, all programs using libc share the same instance.

Both static and shared libraries have their pros and cons, so it's hard to say which is better overall. However, shared libraries have been mainly used for the following reasons:

  • They keep the overall storage consumption low.
  • If there's an issue with the library, replacing it with a fixed version of the shared library resolves the problem for all programs using that library.

It might be interesting to run the ldd command on the executable files of the programs you use to see which shared libraries are linked.

Column: The Revival of Static Linking

In this article, I mentioned that shared libraries have been preferred, but the situation has changed slightly in recent years. For example, Go, a language that has become popular over the past few years, statically links most libraries by default. As a result, most Go programs do not depend on any shared libraries.

Let's run ldd on the hello program, which is written in Go, to verify this.

$ ldd hello
        not a dynamic executable

There are various reasons for this shift toward static linking, such as:

  • Executable size matters less now that modern computers have large memory and storage capacities.
  • A program that consists of a single executable file is easier to handle, since you can run it in another environment simply by copying the file.
  • Startup is faster because no shared libraries need to be linked at runtime.
  • Shared libraries have problems of their own: a library upgrade can break some programs, because versions that are supposed to behave identically can differ subtly (so-called "DLL Hell").

There are various ways of thinking, and the appropriate method changes over time.

NOTE

This article is based on my book written in Japanese. Please contact me via satoru.takeuchi@gmail.com if you're interested in publishing this book's English version.


  1. In Ubuntu 20.04, this is provided by the libc6-dev package. 

  2. In Ubuntu 20.04, this is provided by the libc6 package. 
