Let’s refresh some basics first. As we know, everything we execute in a computer program boils down to an OS thread that is executed by the CPU. Each core in a CPU can execute a single thread at a time. A program may have several types of tasks, such as doing some computation, waiting on a condition, or blocking on I/O, so not all the tasks in a program are CPU intensive. For instance, when there is an I/O operation, the thread gets blocked and the CPU time is wasted. To avoid that, a new thread is created for another task and the OS switches the CPU to it; we call this context switching.
When we have a lot of blocking threads, more and more new threads keep getting created. When there are a lot of threads to execute, the OS has to switch between them to give each one a chance to run. This may lead to a decrease in performance because, as the number of OS threads in the system increases, the context switch overhead starts to dominate and each thread has to wait longer to be scheduled.
So, in order to get better performance, either the programmer has to write clever code or the programming language has to provide a better abstraction on top of the OS threads.
A strand can be called a lightweight thread. In Ballerina, a sequence of execution is called a strand. A program starts with a default strand. Every time you create a new worker or make an async call, a new strand is created. In service mode, each and every request runs on a new strand.
In a programming language like Java, a Java thread is mapped to an OS thread, and it’s up to the OS to schedule those threads. But several Ballerina strands can run on the same OS thread. That’s why we call a strand a lightweight thread. This is similar to how Go implements its goroutines.
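To make the "several strands on one OS thread" idea concrete, here is a minimal sketch in Python. It is not Ballerina code and not how the Ballerina runtime is written; the names `strand` and `run_round_robin` are hypothetical, and Python generators merely stand in for strands. It shows two sequences of execution taking turns while only a single OS thread is ever involved.

```python
import threading
from collections import deque


def strand(name, steps, seen_threads, order):
    """A hypothetical 'strand': a generator that pauses after every step."""
    for i in range(steps):
        seen_threads.add(threading.get_ident())  # record which OS thread ran this step
        order.append(f"{name}:{i}")
        yield  # hand control back so another strand can run


def run_round_robin(strands):
    """Run all strands on the calling thread, one step at a time."""
    queue = deque(strands)
    while queue:
        current = queue.popleft()
        try:
            next(current)          # execute one step of the strand
            queue.append(current)  # not finished yet, so queue it again
        except StopIteration:
            pass                   # the strand has finished


seen_threads, order = set(), []
run_round_robin([
    strand("a", 2, seen_threads, order),
    strand("b", 2, seen_threads, order),
])
print(order)              # ['a:0', 'b:0', 'a:1', 'b:1']
print(len(seen_threads))  # 1
```

Both strands make progress in an interleaved fashion, yet no additional OS thread was created for either of them.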
The Ballerina scheduler is responsible for executing the Ballerina strands. The scheduler is where we have implemented the non-blocking architecture of Ballerina. The scheduler has a fixed-size thread pool. Since the Ballerina scheduler is non-blocking, we do not need something like a growing thread pool. Let’s see how that works.
In the scheduler, we have a set of threads that are ready to accept strands to be executed. Once a strand is submitted, one of these threads starts executing it. If a strand reaches a state where it is blocked on another strand (or an external library), the strand yields and execution returns to the scheduler. The thread is released from the old strand and picks a new strand from the list of submitted runnable strands. It is the duty of the strand that blocked the old strand to put it back on the runnable strand list when it is ready to continue. The structure of the strand makes sure that it continues from where it got blocked.
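The yield-and-resume cycle described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the actual Ballerina scheduler: the `Future` and `Scheduler` classes are hypothetical, strands are modelled as generators, and a single loop plays the role of the pool threads so that the output is deterministic.

```python
from collections import deque


class Future:
    """Result slot that one strand fills and another strand waits on."""

    def __init__(self):
        self.done = False
        self.value = None
        self.waiters = []  # strands currently blocked on this future


class Scheduler:
    def __init__(self):
        self.runnable = deque()  # strands that are ready to run
        self.trace = []

    def submit(self, name, gen, resume_value=None):
        self.runnable.append((name, gen, resume_value))

    def complete(self, future, value):
        # Called by the strand that produces the result: it puts every
        # strand it was blocking back on the runnable list.
        future.done, future.value = True, value
        for name, gen in future.waiters:
            self.submit(name, gen, value)
        future.waiters.clear()

    def run(self):
        # Each iteration models a pool thread picking up one runnable strand.
        while self.runnable:
            name, gen, resume_value = self.runnable.popleft()
            try:
                future = gen.send(resume_value)  # run until the strand blocks
            except StopIteration:
                self.trace.append(f"{name} finished")
                continue
            if future.done:
                self.submit(name, gen, future.value)  # nothing to wait for
            else:
                self.trace.append(f"{name} yielded")
                future.waiters.append((name, gen))    # thread is now free


def consumer(sched, future):
    value = yield future  # blocked here until the producer completes the future
    sched.trace.append(f"consumer got {value}")


def producer(sched, future):
    sched.complete(future, 42)
    yield from ()  # no blocking points; only keeps this function a generator


sched, result = Scheduler(), Future()
sched.submit("consumer", consumer(sched, result))
sched.submit("producer", producer(sched, result))
sched.run()
trace = sched.trace
print(trace)
# ['consumer yielded', 'producer finished', 'consumer got 42', 'consumer finished']
```

The consumer never holds on to the thread while it waits. The generator keeps its own position, which is what lets it continue from exactly where it got blocked.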
With this non-blocking design, since Ballerina itself reuses the threads that would otherwise stay blocked, the OS does not have to create new threads and assign tasks to them. We believe that this program-level switching is more efficient than OS-level thread creation. Also, if the Ballerina program has a thread pool size equal to the number of cores, there will be minimal context switching in the CPU, since the program does not create new threads.
Currently, the default thread pool size in Ballerina 1.0 is set to twice the number of available cores, for some other reasons. This can be configured by setting the system variable “BALLERINA_MAX_POOL_SIZE”.
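As a rough illustration of the rule above, the pool size could be resolved as in the following Python sketch. The function `resolve_pool_size` is hypothetical, and it assumes the variable is read from the process environment and parsed as a positive integer; how Ballerina actually parses and validates the value is not covered in this post.

```python
import os


def resolve_pool_size(env=os.environ, cores=None):
    """Hypothetical helper: pick the scheduler pool size.

    Uses BALLERINA_MAX_POOL_SIZE when it is set, otherwise falls back to
    twice the number of cores (the Ballerina 1.0 default described above).
    """
    cores = cores or os.cpu_count() or 1
    raw = env.get("BALLERINA_MAX_POOL_SIZE")
    if raw is None:
        return 2 * cores
    size = int(raw)  # assumption: the value is a plain integer
    if size < 1:
        raise ValueError("pool size must be at least 1")
    return size


print(resolve_pool_size(env={}, cores=4))                                 # 8
print(resolve_pool_size(env={"BALLERINA_MAX_POOL_SIZE": "16"}, cores=4))  # 16
```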
A Ballerina program that is written without using any standard libraries will run 100% non-blocking. But Ballerina is a language for network-distributed systems, and a modern program cannot live without interacting with external parties. For that, Ballerina has a set of standard libraries to be used for network interactions and other use cases.
Most of these libraries have their own thread pools and will not block the Ballerina scheduler threads. For instance, the HTTP standard library uses its own non-blocking thread pool. The Ballerina scheduler couples well with this and delivers optimal performance with a thread pool size equal to the number of cores.
But there are some libraries, such as JDBC, that are still blocking. Those libraries will not release the scheduler thread while they are blocked. For such cases, we need to set the Ballerina thread pool size to a larger number.
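The effect of blocking calls on a fixed-size pool can be seen in the small Python experiment below. It is only a sketch: `run_blocking_calls` is a hypothetical helper, `time.sleep` stands in for a blocking JDBC query, and the numbers (8 calls of 50 ms each) are arbitrary. With a pool of 2, at most two calls are in flight at once, so the 8 calls need at least four rounds; a pool of 8 lets them all block at the same time.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_blocking_calls(pool_size, calls=8, block_seconds=0.05):
    """Run `calls` blocking tasks on a fixed pool; return (peak concurrency, elapsed)."""
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def blocking_call():
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(block_seconds)  # the pool thread is held for the whole wait
        with lock:
            state["active"] -= 1

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        for _ in range(calls):
            pool.submit(blocking_call)
    # Leaving the `with` block waits for all submitted calls to finish.
    elapsed = time.perf_counter() - start
    return state["peak"], elapsed


small_peak, small_elapsed = run_blocking_calls(pool_size=2)
large_peak, large_elapsed = run_blocking_calls(pool_size=8)
print(f"pool=2: peak={small_peak}, elapsed={small_elapsed:.2f}s")
print(f"pool=8: peak={large_peak}, elapsed={large_elapsed:.2f}s")
```

The exact timings depend on the machine, but the small pool can never finish in less than about 0.2 seconds, which is why a larger pool size helps when blocking libraries are in use.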
Anyway, this can be improved. One option is to share the same Ballerina scheduler with the standard libraries as well. Any suggestions on this are welcome.
Please note that this post is based on Ballerina 1.0.
So it’s time for your ballet in the non-blocking style. Happy coding!