Sachin

Posted on Nov 13, 2023 • Originally published at geekpython.in

Multi-Threaded Programs in Python Using threading Module

#python #programming #thread #tutorial

You may have heard the terms "parallelization" and "concurrency", which refer to scheduling tasks so that they run in parallel or concurrently (overlapping in time) to save time and resources. This is a common practice in asynchronous programming, where coroutines are used to execute tasks concurrently.

Threading in Python is used to run multiple tasks concurrently, which can save time and improve efficiency, particularly when the tasks spend most of their time waiting on I/O.

Although multi-threading can save time by overlapping tasks, using it carelessly can lead to safety and reliability issues such as race conditions.

In this article, you'll learn what threading is in Python and how you can use it to make multiple tasks run concurrently.

What is Threading?

Threading, as previously stated, refers to the concurrent execution of multiple tasks in a single process. This is accomplished by utilizing Python's threading module.

Threads are smaller units of execution within a program that run concurrently and share the same memory space.

How to Create Threads and Execute Concurrently

Python provides a module called threading that offers a high-level interface for creating and managing threads.

Create and Start Thread

A thread can be created using the Thread class provided by the threading module. Using this class, you can create a Thread instance and then start it using the .start() method.

import threading

# Creating Target Function
def num_gen(num):
    for n in range(num):
        print("Thread: ", n)

# Main Code of the Program
if __name__ == "__main__":
    print("Statement: Creating and Starting a Thread.")
    thread = threading.Thread(target=num_gen, args=(3,))
    thread.start()
    print("Statement: Thread Execution Finished.")

A thread is created by instantiating the Thread class with a target parameter that takes a callable object (in this case, the num_gen function) and an args parameter that accepts a list or tuple of arguments (in this case, the one-element tuple (3,)).

This means that you are telling Thread to run the num_gen() function and pass 3 as an argument.

If you run the code, you'll get the following output:

Statement: Creating and Starting a Thread.
Statement: Thread Execution Finished.
Thread:  0
Thread:  1
Thread:  2

Notice that the main program's last print statement ran before the thread printed anything. Why does this happen?

The thread starts executing concurrently with the main program, and the main program does not wait for the thread to finish before continuing. That's why the final print statement ran before the thread had finished.

To understand this, you need to understand the execution flow of the program:

First, the "Statement: Creating and Starting a Thread." print statement is executed.
Then the thread is created and started using thread.start().
The thread starts executing concurrently with the main program.
The "Statement: Thread Execution Finished." print statement is executed by the main program.
The thread continues and prints the output.

The thread and the main program run independently, which is why their relative execution order is not fixed.
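Returning briefly to the Thread constructor: alongside args, it also accepts a kwargs dictionary for keyword arguments. The sketch below is a minimal illustration, assuming a hypothetical greet() function and a results list (neither is part of the original example) used to hand the message back to the main program. It calls join(), which is covered in the next section, so that the result is ready before it is read.

```python
import threading

# Hypothetical target function used only for this illustration
def greet(greeting, name="world", results=None):
    # Build the message and store it so the main program can read it later
    results.append(f"{greeting}, {name}!")

def run_greeting():
    results = []
    thread = threading.Thread(
        target=greet,
        args=("Hello",),  # positional arguments
        kwargs={"name": "GeekPython", "results": results},  # keyword arguments
    )
    thread.start()
    thread.join()  # wait for the thread so the result is available
    return results

if __name__ == "__main__":
    print(run_greeting())
```

Passing a container such as a list is one simple way to get data out of a thread, since the target function's return value is discarded by Thread.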

join() Method - The Saviour

Given the situation above, you might wonder how to suspend the main program until the thread has finished executing.

That is what the join() method is for: it blocks the calling thread (here, the main program) until the thread whose join() method was called terminates.

import threading

# Creating Target Function
def num_gen(num):
    for n in range(num):
        print("Thread: ", n)

# Main Code of the Program
if __name__ == "__main__":
    print("Statement: Creating and Starting a Thread.")
    thread = threading.Thread(target=num_gen, args=(3,))
    thread.start()
    thread.join()
    print("Statement: Thread Execution Finished.")

After creating and starting a thread, the join() method is called on the Thread instance (thread). Now run the code, and you'll get the following output.

Statement: Creating and Starting a Thread.
Thread:  0
Thread:  1
Thread:  2
Statement: Thread Execution Finished.

As can be seen, the "Statement: Thread Execution Finished." print statement is executed after the thread terminates.
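join() also accepts an optional timeout in seconds, which is useful when the main program should not wait indefinitely. Because join() always returns None, the is_alive() method is used afterwards to check whether the thread actually finished. The following is a minimal sketch, assuming a hypothetical slow_task() function and illustrative sleep durations.

```python
import threading
import time

# Hypothetical task that takes about half a second
def slow_task():
    time.sleep(0.5)

def join_with_timeout():
    thread = threading.Thread(target=slow_task)
    thread.start()

    # Wait at most 0.05 seconds; the thread is still working at this point
    thread.join(timeout=0.05)
    alive_after_timeout = thread.is_alive()

    # Wait without a timeout until the thread terminates
    thread.join()
    alive_after_join = thread.is_alive()
    return alive_after_timeout, alive_after_join

if __name__ == "__main__":
    still_running, finished_running = join_with_timeout()
    print("Alive after timed join:", still_running)
    print("Alive after full join :", finished_running)
```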

Daemon Threads

Daemon threads run in the background and are stopped abruptly when the main program exits, whether or not they have completed their work.

You can make a daemon thread by passing the daemon parameter when instantiating the Thread class. You can pass a boolean value to indicate whether the thread is a daemon (True) or not (False).

import threading
import time

def daemon_thread():
    while True:
        print("Daemon thread is running.")
        time.sleep(1)
        print("Daemon thread finished executing.")

if __name__ == "__main__":
    thread1 = threading.Thread(target=daemon_thread, daemon=True)

    thread1.start()
    print("Main program exiting.")

A thread is created by instantiating the Thread class with the daemon_thread function as its target, and the daemon parameter is set to True to mark it as a daemon thread.

The daemon_thread() function runs an infinite loop that prints a statement, sleeps for one second, and then prints another statement.

Now when you run the above code, you'll get the following output.

Daemon thread is running.Main program exiting.

You can see that as soon as the main program exits, the daemon thread terminates.

By the time the daemon_thread() function has printed its first statement and gone to sleep, the concurrently running main program exits, so the function never reaches the next print statement, as the output shows.
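The daemon flag can also be set through the thread's daemon attribute, as long as this happens before start() is called; changing it on a thread that has already started raises a RuntimeError. A minimal sketch, assuming a hypothetical background_task() function:

```python
import threading
import time

# Hypothetical placeholder for background work
def background_task():
    time.sleep(0.1)

def daemon_flag_demo():
    worker = threading.Thread(target=background_task)
    worker.daemon = True  # equivalent to passing daemon=True to Thread()
    worker.start()

    try:
        # The daemon status cannot be changed once the thread has started
        worker.daemon = False
        changed_after_start = True
    except RuntimeError:
        changed_after_start = False

    worker.join()
    return worker.daemon, changed_after_start

if __name__ == "__main__":
    is_daemon, changed = daemon_flag_demo()
    print("Is daemon thread   :", is_daemon)
    print("Changed after start:", changed)
```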

threading.Lock - Avoiding Race Conditions

Threads, as you know, run concurrently in a program. If your program has multiple threads, they may access the same shared resource or enter the same critical section of code at the same time, so the result depends on the order in which the threads happen to run. This situation is called a race condition.

This is where the Lock comes into play. It is a synchronization primitive that prevents multiple threads from accessing a particular piece of code or a resource simultaneously.

The thread calls the acquire() method to acquire the Lock and the release() method to release the Lock.

import threading

# Creating Lock instance
lock = threading.Lock()

data = ""

def read_file():
    global data
    with open("sample.txt", "r") as file:
        for info in file:
            data += "\n" + info

def lock_task():
    lock.acquire()
    read_file()
    lock.release()

if __name__ == "__main__":
    thread1 = threading.Thread(target=lock_task)
    thread2 = threading.Thread(target=lock_task)

    thread1.start()
    thread2.start()

    thread1.join()
    thread2.join()

    # Printing the data read from the file
    print(f"Data: {data}")

First, a Lock is created using threading.Lock() and stored in the lock variable.

An empty string (data) is created to store the information read by both threads.

The read_file() function reads the information from the sample.txt file and appends it to data.

The lock_task() function is created and when it is called, the following events occur:

The lock.acquire() method tries to acquire the Lock as soon as the lock_task() function is called. If another thread holds the Lock, the calling thread waits until it is released.
Once the Lock is acquired, the thread executes the read_file() function.
After the read_file() function has finished executing, the lock.release() method releases the Lock to make it available again for other threads.

Within the if __name__ == "__main__" block, two threads, thread1 and thread2, are created, and both run the lock_task() function.

Both threads run concurrently and attempt to execute the read_file() function at the same time, but only one thread can enter read_file() at a time because of the Lock.

The main program waits for both threads to execute completely because of thread1.join() and thread2.join().

Finally, the print statement prints the information read from the file.

Data: 
Hello there! Welcome to GeekPython.
Hello there! Welcome to GeekPython.

As can be seen in the output, one thread at a time reads the file. Because there were two threads, the file was read twice: once by each thread, in whichever order they acquired the Lock.
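A Lock can also be used as a context manager with the with statement, which releases the lock even if the protected code raises an exception. The sketch below applies this to a shared counter rather than a file; the counter variable, the increment() function, and the iteration counts are illustrative assumptions, not part of the example above.

```python
import threading

lock = threading.Lock()
counter = 0  # hypothetical shared state updated by every thread

def increment(times):
    global counter
    for _ in range(times):
        # Only one thread at a time can run the read-modify-write below
        with lock:
            counter += 1

def run_counter(num_threads=4, times=10_000):
    global counter
    counter = 0
    threads = [
        threading.Thread(target=increment, args=(times,))
        for _ in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter

if __name__ == "__main__":
    print(f"Counter: {run_counter()}")
```

With the lock in place, every increment is applied, so four threads doing 10,000 increments each always end at 40,000.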

Semaphore Objects in Threading

A Semaphore allows you to limit the number of threads that can access a shared resource simultaneously. Semaphore has two methods:

acquire(): A thread can acquire the semaphore if it is available. When a thread acquires the semaphore, the semaphore's count is decremented if it is greater than zero. If the count is zero, the thread waits until the semaphore becomes available.
release(): After using the resource, the thread releases the semaphore, which increments the count. This means that the shared resource is available again.

A Semaphore is used to limit access to shared resources, preventing resource exhaustion and ensuring controlled access to resources with limited capacity.

import threading

# Creating a semaphore
sem = threading.Semaphore(2)

def thread_task(num):
    print(f"Thread {num}: Waiting")

    # Acquire the semaphore
    sem.acquire()
    print(f"Thread {num}: Acquired the semaphore")

    # Simulate some work
    for _ in range(5):
        print(f"Thread {num}: In process")

    # Release the semaphore when done
    sem.release()
    print(f"Thread {num}: Released the semaphore.")

if __name__ == "__main__":
    thread1 = threading.Thread(target=thread_task, args=(1,))
    thread2 = threading.Thread(target=thread_task, args=(2,))
    thread3 = threading.Thread(target=thread_task, args=(3,))

    thread1.start()
    thread2.start()
    thread3.start()

    thread1.join()
    thread2.join()
    thread3.join()

    print("All threads have finished.")

In the above code, Semaphore is instantiated with the integer value 2, which means that at most two threads can hold the semaphore at the same time.

Three threads are created, and all of them run the thread_task() function. Because only two threads can hold the semaphore at once, at most two of them can execute the code between sem.acquire() and sem.release() at the same time. When one of them releases the semaphore, a waiting thread can acquire it.

Thread 1: Waiting
Thread 1: Acquired the semaphore
Thread 1: In process
Thread 1: In process
Thread 1: In process
Thread 1: In process
Thread 1: In processThread 2: Waiting
Thread 2: Acquired the semaphore

Thread 1: Released the semaphore.
Thread 2: In process
Thread 2: In process
Thread 3: WaitingThread 2: In process
Thread 3: Acquired the semaphore
Thread 3: In process

Thread 2: In process
Thread 2: In process
Thread 2: Released the semaphore.
Thread 3: In process
Thread 3: In process
Thread 3: In process
Thread 3: In process
Thread 3: Released the semaphore.
All threads have finished.
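Like Lock, a Semaphore works with the with statement. The following sketch, assuming hypothetical active and peak counters guarded by their own lock, records how many threads hold the semaphore at once to check that the limit of 2 is respected.

```python
import threading
import time

sem = threading.Semaphore(2)
state_lock = threading.Lock()  # protects the two counters below
active = 0  # threads currently holding the semaphore
peak = 0    # highest value of active seen so far

def limited_task():
    global active, peak
    with sem:  # acquired here, released automatically on exit
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)  # simulate some work
        with state_lock:
            active -= 1

def run_limited(num_threads=5):
    global active, peak
    active = peak = 0
    threads = [threading.Thread(target=limited_task) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return peak

if __name__ == "__main__":
    print(f"Most threads holding the semaphore at once: {run_limited()}")
```

However many threads are started, the reported peak should never exceed the value passed to Semaphore.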

Using ThreadPoolExecutor to Execute Tasks from a Pool of Worker Threads

The ThreadPoolExecutor is part of the concurrent.futures module and is used to execute multiple tasks concurrently. Using ThreadPoolExecutor, you can run multiple tasks or functions concurrently without having to create and manage threads manually.

from concurrent.futures import ThreadPoolExecutor

# Creating pool of 4 threads
executor = ThreadPoolExecutor(max_workers=4)

# Function to evaluate square number
def square_num(num):
    print(f"Square of {num}: {num * num}.")

task1 = executor.submit(square_num, 5)
task2 = executor.submit(square_num, 2)
task3 = executor.submit(square_num, 55)
task4 = executor.submit(square_num, 4)

# Wait for tasks to complete and then shutdown
executor.shutdown()

The above code creates a ThreadPoolExecutor with max_workers=4, which means the pool can have at most 4 worker threads executing tasks concurrently.

Four tasks are submitted to the ThreadPoolExecutor using the submit() method with the square_num() function and various arguments. Each task executes the function with the specified argument and prints the output.

In the end, the shutdown() method is called so that the ThreadPoolExecutor shuts down after the tasks are completed and its resources are freed.

You don't have to explicitly call the shutdown method if you create ThreadPoolExecutor using the with statement.

from concurrent.futures import ThreadPoolExecutor

# Task
def square_num(num):
    print(f"Square of {num}: {num * num}.")

# Using ThreadPoolExecutor as context manager
with ThreadPoolExecutor(max_workers=4) as executor:
    task1 = executor.submit(square_num, 5)
    task2 = executor.submit(square_num, 2)
    task3 = executor.submit(square_num, 55)
    task4 = executor.submit(square_num, 4)

In the above code, the ThreadPoolExecutor is used with the with statement. When the with block is exited, the ThreadPoolExecutor is automatically shut down and its resources are released.

Both versions produce the same result.

Square of 5: 25.
Square of 2: 4.
Square of 55: 3025.
Square of 4: 16.
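The tasks above only print their results. When a task returns a value, it can be collected from the Future object that submit() returns, or through executor.map(). A minimal sketch, assuming a variant of the function (named square here) that returns the square instead of printing it:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical variant of square_num() that returns its result
def square(num):
    return num * num

def squares_with_submit(numbers):
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(square, n) for n in numbers]
        # result() blocks until the corresponding task has finished
        return [future.result() for future in futures]

def squares_with_map(numbers):
    with ThreadPoolExecutor(max_workers=4) as executor:
        # map() yields results in the same order as the inputs
        return list(executor.map(square, numbers))

if __name__ == "__main__":
    print(squares_with_submit([5, 2, 55, 4]))
    print(squares_with_map([5, 2, 55, 4]))
```

Both calls print [25, 4, 3025, 16]. Reading results from the futures in submission order keeps the output order stable even if the tasks finish in a different order.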

Common Functions in Threading

The threading module provides numerous functions and some of them are explained below.

Getting Main and Current Thread

The threading module has the main_thread() and current_thread() functions, which are used to get the main thread and the currently running thread, respectively.

import threading

def task():
    for _ in range(2):
        # Getting the current thread name
        print(f"Current Thread: {threading.current_thread().name} is running.")

# Getting the main thread name
print(f"Main thread   : {threading.main_thread().name} started.")
thread1 = threading.Thread(target=task)
thread2 = threading.Thread(target=task)

thread1.start()
thread2.start()

thread1.join()
thread2.join()
print(f"Main thread   : {threading.main_thread().name} finished.")

Because the main_thread() and current_thread() functions return a Thread object, threading.main_thread().name is used to get the name of the main thread and threading.current_thread().name is used to get the name of the current thread.

Main thread   : MainThread started.
Current Thread: Thread-1 (task) is running.
Current Thread: Thread-1 (task) is running.
Current Thread: Thread-2 (task) is running.
Current Thread: Thread-2 (task) is running.
Main thread   : MainThread finished.
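The default names (Thread-1, Thread-2, and so on) can be replaced by passing the name parameter to Thread, which makes log output easier to follow. A short sketch, using the hypothetical names downloader and parser and a hypothetical report() function:

```python
import threading

def report(names):
    # Record the name of the thread that is running this function
    names.append(threading.current_thread().name)

def run_named_threads():
    names = []
    threads = [
        threading.Thread(target=report, args=(names,), name="downloader"),
        threading.Thread(target=report, args=(names,), name="parser"),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # Sort because the two threads may record their names in either order
    return sorted(names)

if __name__ == "__main__":
    print(run_named_threads())
```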

Monitoring Currently Active Threads

The threading.enumerate() function returns a list of all Thread objects that are currently alive. The main thread is always included, even when it has terminated; other terminated threads and threads that have not started yet are excluded.

If you want to get the number of Thread objects that are currently alive, you can utilize the threading.active_count() function.

import threading

def task():
    print(f"Current Thread     : {threading.current_thread().name} is running.")

# Getting the main thread name
print(f"Main thread        : {threading.main_thread().name} started.")

threads_list = []

for _ in range(5):
    thread = threading.Thread(target=task)
    thread.start()
    threads_list.append(thread)
    # Getting the active thread count
    print(f"\nActive Thread Count: {threading.active_count()}")

for thread in threads_list:
    thread.join()

print(f"Main thread        : {threading.main_thread().name} finished.")
# Getting the active thread count
print(f"Active Thread Count: {threading.active_count()}")
# Getting the list of active threads
for active in threading.enumerate():
    print(f"Active Thread List: {active.name}")

Output

Main thread        : MainThread started.
Current Thread     : Thread-1 (task) is running.
Active Thread Count: 2

Current Thread     : Thread-2 (task) is running.
Active Thread Count: 2

Current Thread     : Thread-3 (task) is running.
Active Thread Count: 2

Current Thread     : Thread-4 (task) is running.

Active Thread Count: 2
Current Thread     : Thread-5 (task) is running.

Active Thread Count: 1
Main thread        : MainThread finished.
Active Thread Count: 1
Active Thread List: MainThread
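In the output above, the count is hard to predict because each task finishes almost immediately. For a repeatable result, the threads can be kept alive until the count has been read. The sketch below does this with threading.Event, a synchronization primitive not otherwise used in this article; the waiting_task() function and the number of threads are assumptions for illustration.

```python
import threading

def waiting_task(stop_event):
    # Block until the main thread signals that this thread may finish
    stop_event.wait()

def count_active(num_threads=3):
    stop_event = threading.Event()
    before = threading.active_count()

    threads = [
        threading.Thread(target=waiting_task, args=(stop_event,))
        for _ in range(num_threads)
    ]
    for thread in threads:
        thread.start()

    # All started threads are still alive because they are waiting
    during = threading.active_count()

    stop_event.set()  # let the threads finish
    for thread in threads:
        thread.join()

    after = threading.active_count()
    return before, during, after

if __name__ == "__main__":
    before, during, after = count_active()
    print(f"Before: {before}, during: {during}, after: {after}")
```

The count rises by exactly the number of waiting threads and returns to its starting value once they have been joined.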

Getting Thread Id

import threading
import time

def task():
    print(f"Thread {threading.get_ident()} is running.")
    time.sleep(1)
    print(f"Thread {threading.get_ident()} is terminated.")

print("Main thread started.")

threads_list = []

for _ in range(5):
    thread = threading.Thread(target=task)
    thread.start()
    threads_list.append(thread)

for thread in threads_list:
    thread.join()

print("Main thread finished.")

Every thread running in a process is assigned an identifier and the threading.get_ident() function is used to retrieve the identifier of the currently running thread.

Main thread started.
Thread 9824 is running.
Thread 7188 is running.
Thread 4616 is running.
Thread 3264 is running.
Thread 7716 is running.
Thread 7716 is terminated.
Thread 9824 is terminated.
Thread 7188 is terminated.Thread 4616 is terminated.

Thread 3264 is terminated.
Main thread finished.
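The same identifier is also available from outside the thread through the ident attribute of the Thread object, which stays readable after the thread has exited. A minimal sketch, assuming a hypothetical record_ident() function that stores what get_ident() reports inside each thread:

```python
import threading

def record_ident(seen):
    # Store the identifier as seen from inside the thread, keyed by thread name
    seen[threading.current_thread().name] = threading.get_ident()

def idents_match(num_threads=3):
    seen = {}
    threads = [
        threading.Thread(target=record_ident, args=(seen,))
        for _ in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # Compare each thread's ident attribute with the value it recorded
    return all(seen[thread.name] == thread.ident for thread in threads)

if __name__ == "__main__":
    print("Identifiers match:", idents_match())
```

Identifiers may be reused after a thread exits, so they are best treated as labels for threads that exist at the same time rather than as permanent IDs.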

Conclusion

A thread is a smaller unit of execution within a program, and in Python threads are created using the threading module. Running tasks or functions in separate threads lets them execute concurrently, which can save time and resources.

In this article, you've learned:

What threading is and how to create and start a thread
Why the join() method is used
What daemon threads are and how to create one
How to use a Lock to avoid race conditions
How a semaphore limits the number of threads that can access shared resources at the same time
How to execute a group of tasks using ThreadPoolExecutor without creating threads manually
Some common functions provided by the threading module