Many software developers consider multithreaded programming an advanced (and scary) topic. Because a deep understanding of the subject is not strictly necessary to employ multithreading techniques in everyday development, some programmers would rather skip the low-level complexities and go straight to the usage.
However, taking some time to grasp a few concepts helps developers build a better mental picture of this kind of programming, which improves their ability to write and read this sort of code.
In this article, I intend to present the reader with the ideas related to this topic without diving into the specifics of any language or platform, focusing instead on the general behavior.
It’s not an essay for pros. I’ll deliberately simplify a few things here and there for the sake of the big picture.
In computer science, a thread is no more than a stream of sequential instructions that a CPU will eventually process.
The operating system is usually responsible for managing the threads. Nevertheless, sometimes the platform where the application runs (the JVM, for instance) implements its own threading system, which may or may not rely on the host OS.
A multithreaded program or application is one that runs more than one thread in its process. In the real world, that describes basically every application out there.
Applications that feature a graphical interface often reserve a specific thread just to handle the user interface elements. This thread is frequently referred to as the main thread. The goal is to avoid blocking or freezing the user interface while the application’s inner logic runs. This way, the only responsibility of the UI thread is to handle UI interactions and updates, while all the other threads take care of the dirty work.
On the other hand, too many threads within a single process might indicate a bottleneck somewhere. Not necessarily in the code, though: it may be a slow disk in an I/O-intensive system, for instance, or high latency when performing web requests.
It’s also important to notice that a process only truly terminates when all of its threads have finished. So, even after closing all the windows or screens (and thus finishing the main thread), the process may continue to run as long as at least one of its threads is still alive. Keep that in mind if that’s not the expected behavior.
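A minimal Python sketch of that behavior (the `background_work` function is just a stand-in for any lingering task): a regular, non-daemon thread keeps the process alive until it finishes, even after the main flow is done.

```python
import threading
import time

results = []

def background_work():
    time.sleep(0.1)  # pretend this is a long-running task
    results.append("finished")

# A non-daemon thread (the default here) keeps the process alive until it
# ends, even after the main thread has run to completion. Marking it
# daemon=True would instead let the process kill it abruptly on exit.
worker = threading.Thread(target=background_work)
worker.start()
worker.join()  # wait explicitly; the interpreter would wait for it anyway
print(results)
```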
Now, how many threads are too many? How many should be enough? Those are tough questions to answer. Theoretically, the maximum number of threads should match the number of CPUs available on the device where the application is running, in order to avoid context switches (don’t worry, we’ll get there). Of course, if the application is built on a framework, the number of threads that the framework itself requires should also be taken into account.
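For reference, most platforms let you query that theoretical limit; in Python, for example:

```python
import os

# Number of CPUs the OS reports for this machine: the "theoretical"
# upper bound on truly simultaneous threads.
cpus = os.cpu_count()
print(cpus)
```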
A utopian assumption, as you may have concluded as well. Open your task manager and check how many threads are running on your computer right now. I bet it will be far more than the number of CPUs it has.
Therefore, the answer to the “right” number of threads is neither a fixed number nor a mathematical formula; the application must have the number of threads that it needs.
Now, let’s dive into how the computer works (and by “computer” I mean every kind of electronic device that uses a CPU, just to be thorough).
CPUs were designed to execute a sequence of instructions. Nevertheless, a single CPU can only process a single stream of instructions (a thread) at a time. So, how does a quad-core computer run more than just four threads?
The OS, along with the CPUs, quickly iterates through all the threads, across all the processes, to make it seem that all the applications are running simultaneously. The OS distributes the threads among the available CPUs, and each CPU then runs as many instructions as it can, in a fraction of a second, for each given thread. Switching between these threads means pausing the execution of the current thread and resuming the execution of the next one.
“Context switch” is the name given to the act of a CPU changing from one thread to another.
This process of iterating through the threads obviously has a cost. CPU usage takes time. Some instructions and operations run quickly, others are slower. It also means electricity or battery consumption, which is why an application with intensive processing can drain a device’s battery.
You should be careful when creating a brand-new OS thread. It requires memory allocation and CPU instructions both to set it up and to tear it down.
So, in order to make better use of threads and avoid creating new ones, operating systems and platforms provide a thread pool feature, which allows the application to take an already existing thread for its own use.
That’s a much more efficient way to handle multiple threads without dealing with their creation or destruction. Furthermore, the OS knows when a thread from the thread pool is not actively in use, so it can automatically “skip” it during the thread iteration.
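As a rough sketch of the idea in Python, `concurrent.futures.ThreadPoolExecutor` hands tasks to a small set of reusable worker threads (the `square` function is just a placeholder workload):

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    return n * n

# Borrow threads from a pool instead of creating and destroying one per task.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(square, range(10)))

print(results)
```

The ten tasks here share four pooled threads; none of the task code ever creates or destroys a thread itself.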
Unlike a regular thread, a thread-pool thread doesn’t keep running after the main thread is done, making it especially useful for background operations that should terminate along with the program.
You might have been told, or read somewhere, that you should never block a thread. People are willing to scream it out loud but seldom explain why.
When you block a thread, you force the CPU to keep attending to a thread that is merely waiting for something. In doing so, the code slows down not only the application but the entire system. In short, it’s a waste of CPU time.
Although it sometimes seems easier to make the thread wait for something before moving on, it’s not recommended. A blocked thread forces the system to do nothing for a while when it could be working on something else.
Synchronous operations are, indeed, more straightforward to understand and read, but turning all asynchronous operations into synchronous ones for the sake of maintainability is plain wrong. I know it might be painful to deal with asynchronous operations the right way in some languages or platforms, but that cannot become an excuse not to cope with them.
Part of becoming a developer is learning how to deal with both the nice and the awful parts of the technology you are working with.
Asynchronous methods or functions are procedures that depend on the response of something external. Querying a web service, reading a file, waiting for user input: none of these depend on the thread or on the CPU.
The whole point of implementing and using asynchronous procedures the right way is to let the CPU work on other threads and other processes instead of wasting time doing nothing.
There are two kinds of asynchronous methods/functions: CPU-bound and I/O-bound.
I/O-bound asynchronous code is code in which the CPU does not directly do anything. Drivers that communicate with specific hardware are responsible for running it. Network requests and responses and disk read/write operations, for instance, are naturally I/O-bound asynchronous actions, as they rely on the hardware to do the work.
The reason most I/O operations do not require a thread to wait for data is that the underlying low-level protocol uses a queue structure, which happens to suit the async/await approach quite well.
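A minimal Python sketch of that behavior, using `asyncio.sleep` as a stand-in for a real I/O wait: while one coroutine is waiting, the event loop is free to run the other, and no thread sits blocked in the meantime.

```python
import asyncio

order = []

async def io_task(name, delay):
    order.append(f"{name} starts")
    # While this coroutine awaits its "I/O", the event loop runs the other
    # one; no thread is blocked waiting for the result.
    await asyncio.sleep(delay)
    order.append(f"{name} ends")

async def main():
    await asyncio.gather(io_task("slow", 0.2), io_task("fast", 0.05))

asyncio.run(main())
print(order)  # the fast task finishes while the slow one is still waiting
```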
On the other hand, CPU-bound asynchronous code is code that does run on the CPU. Performing a time-consuming calculation or evaluating a heavy regular expression are examples of operations that can be set aside to run on a separate thread in an asynchronous manner.
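In Python, for instance, such a CPU-bound job can be pushed onto a separate thread with `asyncio.to_thread` (the `heavy_calculation` function here is just an illustrative workload):

```python
import asyncio

def heavy_calculation(n):
    # CPU-bound work: it genuinely runs on the CPU, unlike an I/O wait.
    return sum(i * i for i in range(n))

async def main():
    # Hand the calculation to a separate thread so the current one stays free.
    return await asyncio.to_thread(heavy_calculation, 1_000)

result = asyncio.run(main())
print(result)
```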
It’s important to notice that merely fitting the async/await pattern doesn’t necessarily mean that the code you wrote performs asynchronously.
If you write a for-statement and, inside each iteration, the program waits for an async method, you are just turning asynchronous code back into synchronous code. Asynchronous code must be dispatched so that other (synchronous) code can run; the CPU should only wait for the asynchronous operations to complete when it needs their results.
If you have a facade method that waits on each line, and two or more of those lines don’t relate to each other, it’s not asynchronous either. In this case, the method should fire all the unrelated tasks and, as they return, proceed to run the remaining tasks that depend on their respective outcomes.
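Both anti-patterns can be contrasted in a short Python sketch; `fetch_user` and `fetch_settings` are hypothetical, unrelated “requests” simulated with `asyncio.sleep`:

```python
import asyncio
import time

async def fetch_user():
    await asyncio.sleep(0.2)  # stands in for one web request
    return "user"

async def fetch_settings():
    await asyncio.sleep(0.2)  # an unrelated, independent request
    return "settings"

async def sequential():
    # Anti-pattern: awaiting each call on its own line serializes them,
    # taking roughly 0.4s in total.
    return [await fetch_user(), await fetch_settings()]

async def dispatched():
    # Fire both tasks first, await only when the results are needed:
    # roughly 0.2s in total.
    return await asyncio.gather(fetch_user(), fetch_settings())

start = time.monotonic()
asyncio.run(sequential())
sequential_time = time.monotonic() - start

start = time.monotonic()
results = asyncio.run(dispatched())
dispatched_time = time.monotonic() - start

print(results, dispatched_time < sequential_time)
```

Same code shape, same results, but only the dispatched version actually overlaps the two waits.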
I hope that with those two examples you can realize how the asynchronous pattern is supposed to work.
Both concurrency and parallelism relate to the same concept: distributing the work across multiple units in a way that doesn’t compromise the final product while minimizing the total execution time.
Concurrent execution is the possibility of two (or more) tasks running at the same time, whereas parallel execution is those two (or more) tasks actually running at the very same time.
Concurrency stands for the possibility. Parallelism is the reality.
Turning concurrency into parallelism requires more than one CPU. You might write concurrent code, but it will only actually run in parallel in the presence of multiple CPUs. A single-core device running concurrent code can only execute it sequentially.
However amazing parallel execution may be, some resources, sometimes shared among all the threads, sometimes shared among all the processes, require coordination in order to work properly. Take writing to a stream (a file, or the console), for instance: if you don’t want to corrupt the file, or display randomly interleaved messages in the output, it’s necessary to coordinate the reading and writing operations.
The tactic of caching data for the sake of performance has a long history in computer science. It’s heavily applied in web environments, as it also reduces bandwidth consumption when surfing the Internet. What some people might not be aware of is that CPUs employ this same technique as well.
The CPU may cache a variable’s value in order to improve the speed of the code execution.
The problem starts when a thread reads a variable’s value and stores it in its CPU’s internal cache, and then another thread, running on a different CPU, sets the value of the same variable: the first thread might keep using the outdated value because of its cache.
It’s not common, though, and it certainly doesn’t happen to all kinds of data structures.
Furthermore, if you ever face this situation, know that it’s possible to tell the CPU not to cache the value of a specific variable. The “volatile” keyword, applied when declaring the variable, should do the trick. Nonetheless, this keyword should not be used recklessly. Analyze each case carefully because, even though the cache may hold obsolete information, it provides better overall performance for the application.
It’s quite common to read the expression “thread-safe” in relation to asynchronous operations or multithreading contexts. A data structure is thread-safe if different threads can access the same instance of it and perform operations on it at the same time without corrupting its state or producing incorrect results.
Take a dictionary: a regular list of key/value pairs in which each key must be unique. When working with multiple threads, two (or more) threads can evaluate the same if-statement simultaneously and get the same result. Now imagine that this statement checks whether some key exists in the dictionary, and the block of code it protects adds that new (unique) key to the dictionary. The first thread to reach the “add” operation will run to completion, but the second one will throw an exception.
That happens because the dictionary in the example is not thread-safe. In order to work properly with multiple threads, it’s necessary to defer the second thread’s entrance into that block of code. That way, when the second thread evaluates the if-statement, it will get a different result.
There are a few distinct ways to implement concurrency control and eventually you will find out the best technique for each occasion.
A solution for the situation above is to apply a lock around the checking and adding statements. Locking a piece of code means that everything inside the lock block runs one thread at a time; if other threads hit the beginning of the block, they have to wait until the thread inside the lock completes its execution.
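A Python sketch of that idea, assuming a hypothetical requirement that the key "id-1" be inserted exactly once: the lock makes the check and the add behave as a single, indivisible step.

```python
import threading

shared = {}
rejections = []
lock = threading.Lock()

def add_unique(key, value):
    # The lock makes the check-then-add pair atomic: only one thread at a
    # time can be between the membership test and the insertion.
    with lock:
        if key not in shared:
            shared[key] = value
        else:
            rejections.append(value)

# Eight threads race to insert the same "unique" key.
threads = [threading.Thread(target=add_unique, args=("id-1", i)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared), len(rejections))  # exactly one insertion wins
```

Without the lock, two threads could both pass the `not in` check before either inserts, which is precisely the race described above.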
The problem with applying a lock to certain blocks of statements is that it serializes concurrent access to that part of the code, which is virtually the opposite of what you want when using multiple threads. Of course, even in asynchronous methods and multithreaded environments, there will be routines that require sequential or controlled execution. Therefore, it’s important to understand the code and only use a lock when it’s actually necessary. Otherwise, it just undoes part of all the effort.
Multithreading is an undeniable part of modern software development. It’s supported by programming languages and platforms and goes all the way down to the operating system. Knowing how to work with multiple threads can definitely lead developers to build better applications.
Hence, I hope this article has cast some light on the subject and helped you leverage your knowledge.