DEV Community

AinaJ

The CPython interpreter GIL

Introduction

Python has been one of the most popular programming languages of the last decade.
Although its syntax is easy to read, Python has some drawbacks worth considering.
Many people consider Python too slow for CPU-bound multi-threaded programs. This is mainly due to the way the reference implementation of Python, CPython, works: it uses something called the GIL.
In this article I'll try to explain what the GIL is, how it works, and why it is so important for Python.

CPython memory management

To start, let's look at how memory management is done in CPython, the official implementation of Python. For every object, Python stores three things: its value, its type, and its reference count.

The reference count is what CPython uses to decide whether an object must be removed from memory. When a new reference to the object is created, its reference count is incremented; when a reference is removed, the count is decremented. When the count reaches zero, nothing refers to the object any more, and CPython frees it immediately.
The reference count can be examined with the sys.getrefcount function. Note that the value it returns is generally 1 higher than you might expect, because passing the object as an argument to the function temporarily creates one more reference.
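
The same behaviour can be checked from a script. The sketch below uses a freshly created list, which the interpreter does not intern or share, so its counts are easier to predict than those of a short string. It compares each count against a baseline, because the baseline itself (which includes the temporary argument reference) can vary slightly between CPython versions. The variable names are only for illustration.

```python
import sys

data = []                          # a brand-new list, referenced only by 'data'
baseline = sys.getrefcount(data)   # includes the temporary argument reference

holder = [data]                    # storing the list in a container adds a reference
with_container = sys.getrefcount(data)

alias = data                       # binding another name adds one more
with_alias = sys.getrefcount(data)

holder.clear()                     # removing it from the container drops one
del alias                          # removing the extra name drops another
after_cleanup = sys.getrefcount(data)

print(with_container - baseline)   # 1
print(with_alias - baseline)       # 2
print(after_cleanup - baseline)    # 0
```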

>>> import sys
>>> A="toto"
>>> B=A
>>> sys.getrefcount(A)
3

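The reference count of A is 3: the names A and B both refer to the string, and sys.getrefcount holds one more temporary reference while it runs. That temporary reference goes away as soon as the call returns, so the count drops back to 2. The exact numbers can differ between CPython versions, because short strings like "toto" may be interned (shared) by the interpreter, and some versions report a very large fixed count for such objects.
Let's add a reference to A with a variable C and see what happens.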

>>> C=A
>>> sys.getrefcount(A)
4

The reference count increases by 1 and becomes 4.

The del statement does not delete the object itself: it removes a name, and with it one reference to the object. Let's remove C:

>>> del C
>>> sys.getrefcount(A)
3

Bingo! The reference count decreases by 1 because we removed one reference.
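
To see that an object really is freed the moment its count reaches zero, you can attach a callback with weakref.finalize. This is a sketch under a few assumptions: the Payload class is hypothetical (built-in strings can't be weakly referenced, so a small class stands in for "toto"), and the immediate cleanup relies on CPython's reference counting. Implementations with a different memory manager, such as PyPy, may free the object later.

```python
import weakref


class Payload:
    """Hypothetical stand-in object; str instances can't be weakly referenced."""


freed = []

a = Payload()
b = a                                                # two names, two references
weakref.finalize(a, freed.append, "Payload freed")   # runs when the object is freed

del a
after_first_del = list(freed)    # 'b' still keeps the object alive
del b
after_second_del = list(freed)   # count reached zero: freed immediately

print(after_first_del)    # []
print(after_second_del)   # ['Payload freed']
```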

That's pretty good. But what if several parts of a program change the reference count at the same time? With multiple threads, this can happen. Imagine that thread1 and thread2 both use the variable A, and thread1 decrements A's reference count at the same moment thread2 increments it. Because each update is a read followed by a write, one of the two changes can be lost: the reference count may then never reach zero (a memory leak), or reach zero too soon while another thread still holds a reference to the object. This is one of the reasons Python uses the GIL. So what exactly is the GIL?
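
The lost update described above can be reproduced deterministically without real threads. The sketch below is a toy model, not CPython code: each hypothetical "thread" updates a shared count in three separate steps (load, add, store), the way a non-atomic increment works at the machine level, and a hand-written schedule decides the order of the steps.

```python
def run(schedule, start):
    """Replay (thread, step) pairs against a shared reference count."""
    shared = start
    registers = {}                         # each thread's private copy
    delta = {"thread1": -1, "thread2": +1}
    for thread, step in schedule:
        if step == "load":
            registers[thread] = shared     # read the current count
        elif step == "add":
            registers[thread] += delta[thread]
        elif step == "store":
            shared = registers[thread]     # write the result back
    return shared


def steps(thread):
    return [(thread, "load"), (thread, "add"), (thread, "store")]


# One thread finishes before the other starts: -1 then +1 cancel out.
serial = steps("thread1") + steps("thread2")

# Both threads read the count before either writes it back.
interleaved = [
    ("thread1", "load"), ("thread2", "load"),
    ("thread1", "add"), ("thread2", "add"),
    ("thread1", "store"), ("thread2", "store"),
]

print(run(serial, 2))       # 2: correct
print(run(interleaved, 2))  # 3: thread1's decrement was lost
```

Swapping the last two stores in the interleaved schedule gives 1 instead: thread2's increment is lost and the count is too low, which is the "reaches zero too soon" case.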

What is the GIL?

The GIL, or Global Interpreter Lock, is a mutex (a lock) used by language interpreters such as CPython and CRuby to allow only one thread to execute Python bytecode at a time. It ensures safe memory management and makes single-threaded programs faster than they would be with many fine-grained locks. It was first introduced by Guido van Rossum in August 1992.
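
The "only one thread at a time" idea can be modelled with an ordinary threading.Lock. This is a toy model, not how CPython implements the GIL internally: a lock named fake_gil (a hypothetical name) must be held before a thread runs each small slice of work, and a counter records how many threads were ever inside at once.

```python
import threading

fake_gil = threading.Lock()  # hypothetical stand-in for the interpreter lock
inside = 0                   # threads currently holding fake_gil
max_inside = 0               # highest value 'inside' ever reached
slices_done = 0              # total slices of work executed


def worker(slices):
    global inside, max_inside, slices_done
    for _ in range(slices):
        with fake_gil:                  # acquire before running a slice
            inside += 1
            max_inside = max(max_inside, inside)
            slices_done += 1            # the "work" done while holding the lock
            inside -= 1
        # Leaving the 'with' block releases the lock so another thread can run.


threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(max_inside)   # 1: never more than one thread inside at once
print(slices_done)  # 40000
```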

So what does the GIL do for CPython?
Back to the problem of threads and reference counts: the GIL offers a simple solution. Since only one thread runs at any time, only one thread can change a reference count at once. That is one of the main reasons CPython uses the GIL. Another reason is that CPython relies on many C extensions and libraries. These extensions can interact directly with memory and, without protection, could corrupt it. The GIL's lock prevents this, so these extensions can run safely even if they are not thread-safe themselves. The GIL also provides a performance boost for single-threaded programs.
That covers the positive side. But what about the other side of the GIL?

As we saw above, the GIL allows only one thread to run at any time. Let's look at what that means in practice.
We are going to compare two programs that do the same thing: count down from a given number.
The first one is single-threaded:

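import time


def count_back(start):
    # Pure CPU-bound work: decrement until we reach zero.
    while start > 0:
        start = start - 1


START = 50000000

# perf_counter is the recommended clock for measuring elapsed time.
t1 = time.perf_counter()
count_back(START)

print("total execution time is: ", time.perf_counter() - t1)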

total execution time is: 3.8295717239379883

As we can see, the countdown takes about 3.83 seconds.

Now let's do the same with a multi-threaded program, where two threads each count down half of the number.

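import time
from threading import Thread


def count_back(start):
    # Same CPU-bound countdown as in the single-threaded version.
    while start > 0:
        start = start - 1


START = 50000000

# Create two threads, each counting down half of START,
# so the total work is the same as in the single-threaded version.
thread1 = Thread(target=count_back, args=(START // 2,))
thread2 = Thread(target=count_back, args=(START // 2,))

# Start both threads and wait for them to finish.
t1 = time.perf_counter()
thread1.start()
thread2.start()
thread1.join()
thread2.join()

print("total execution time is: ", time.perf_counter() - t1)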

total execution time is: 4.143504619598389

I ran it on a machine with a 2-core CPU.
You might expect it to be faster than the single-threaded version, but it is not: it takes even longer.
Because of the GIL, CPython runs Python code in only one thread at a time, so even on a multi-core processor a CPU-bound multi-threaded program executes like a single-threaded one, plus the overhead of switching between threads.
That is one of the reasons Python is considered slow for multi-threaded programs. Let's see how the GIL works with multiple threads.

How does the GIL work?

When a thread starts, it first needs to acquire the GIL. When it finishes, or when it has to perform an I/O operation, it releases the GIL so that the next thread can acquire it and run. A thread that only computes and never blocks is also asked periodically to release the GIL, so that other threads get a turn.
It's like running a 400m relay. Think of each runner as a thread and the baton as the GIL: a runner must get the baton before running. That is how the GIL works. It's a simple design, but it works well, especially for single-threaded programs, where the performance benefit can be significant. Between Python 2 and Python 3, Antoine Pitrou made notable improvements to the GIL (the new GIL shipped in Python 3.2), which made the Python 3 GIL more reliable.
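
The hand-off on blocking operations is easy to observe. In the sketch below, time.sleep stands in for an I/O wait (an assumption made to keep the example self-contained): CPython releases the GIL while a thread sleeps, so four threads that each sleep for 0.2 seconds should finish in roughly 0.2 seconds in total rather than 0.8. Exact timings depend on the machine.

```python
import time
from threading import Thread


def fake_io(seconds):
    # time.sleep blocks without holding the GIL, like waiting on a socket or file.
    time.sleep(seconds)


threads = [Thread(target=fake_io, args=(0.2,)) for _ in range(4)]

t1 = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - t1

print("total execution time is: ", elapsed)  # roughly 0.2 s, not 0.8 s
```

Contrast this with the countdown example: there the threads never block, so they can only take turns.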

The figure below illustrates this.
[Figure: example of two threads running concurrently]
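
A CPU-bound thread that never blocks still has to give other threads a turn. Since Python 3.2, when another thread is waiting for the GIL, CPython asks the running thread to drop it once the "switch interval" has elapsed, and the sys module exposes that interval. A minimal sketch of reading and temporarily changing it:

```python
import sys

# Default is 0.005 seconds (5 ms) unless something has already changed it.
default = sys.getswitchinterval()
print(default)

# A longer interval means fewer forced hand-offs between CPU-bound threads
# (less switching overhead, but slower reaction to other waiting threads).
sys.setswitchinterval(0.05)
print(sys.getswitchinterval())  # 0.05

sys.setswitchinterval(default)  # restore the previous value
```

Tuning this value rarely makes a CPU-bound multi-threaded program faster overall; it only changes how often threads take turns.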

Alternatives to the GIL

The GIL can make your program slower with CPython, the reference implementation of Python.
Some implementations, like Jython, don't use a GIL at all. PyPy does have a GIL, but its JIT compiler runs pure-Python loops like our countdown much faster. As Guido says, if you want your Python program to run faster, use PyPy.
Here is the output of the program above run with PyPy:

total execution time is: 0.13152098655700684

There is also the multiprocessing module for dealing with performance in CPython. It runs one interpreter per process, so each process has its own GIL and the processes can run in parallel on separate cores.
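
A minimal sketch of the same countdown using processes instead of threads, assuming the standard-library concurrent.futures.ProcessPoolExecutor (which is built on multiprocessing). split_work is a hypothetical helper, and count_back now returns its final value so the results can be checked. No timing is claimed here: on a machine with at least two free cores the two halves can run in parallel, but starting the worker processes adds some overhead. The if __name__ == "__main__" guard is needed when workers are started with a fresh interpreter, which is the default on Windows and macOS.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def count_back(start):
    # Same CPU-bound countdown as before, returning the final value (0).
    while start > 0:
        start = start - 1
    return start


def split_work(total, workers):
    # Hypothetical helper: one equal chunk of the countdown per process.
    return [total // workers] * workers


START = 50000000

if __name__ == "__main__":
    # Each worker is a separate process with its own interpreter and its own GIL.
    t1 = time.perf_counter()
    with ProcessPoolExecutor(max_workers=2) as pool:
        results = list(pool.map(count_back, split_work(START, 2)))
    print("results:", results)  # [0, 0]
    print("total execution time is: ", time.perf_counter() - t1)
```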

Conclusion

We saw that the GIL is an important part of CPython. Even if it makes some programs run slower, it brings Python more advantages than drawbacks: it is one of the reasons Python is so popular today, thanks to its easy support for C extensions.
As a programmer you rarely deal with the GIL directly, but it is always good to know what is behind the wheel. That was the goal of this article.
I hope you enjoyed it.
