Can you guess why snippet 1 outperforms snippet 2 by 400%? 🤯
This is a thread pool implementation for a simple HTTP server. It uses an mpsc channel as a queue: the main thread sends each incoming request's handler (a closure) down the channel, and the worker threads receive those closures and execute them in parallel.
Snippet 1:
use std::sync::{mpsc::Receiver, Arc, Mutex};
use std::thread::{self, JoinHandle};

type Job = Box<dyn FnOnce() + Send + 'static>;

struct Worker {
    id: usize,
    thread: JoinHandle<()>,
}

impl Worker {
    fn new(id: usize, receiver: Arc<Mutex<Receiver<Job>>>) -> Worker {
        let thread = thread::spawn(move || loop {
            // here (start)
            let job = receiver.lock().unwrap().recv().unwrap();
            // here (end)
            println!("executing on thread: {}", id);
            job();
        });
        Worker { id, thread }
    }
}
Snippet 2:
use std::sync::{mpsc::Receiver, Arc, Mutex};
use std::thread::{self, JoinHandle};

type Job = Box<dyn FnOnce() + Send + 'static>;

struct Worker {
    id: usize,
    thread: JoinHandle<()>,
}

impl Worker {
    fn new(id: usize, receiver: Arc<Mutex<Receiver<Job>>>) -> Worker {
        let thread = thread::spawn(move || loop {
            // here (start)
            let lock = receiver.lock().unwrap();
            let job = lock.recv().unwrap();
            // here (end)
            println!("executing on thread: {}", id);
            job();
        });
        Worker { id, thread }
    }
}
Benchmarks
Tested with Artillery on a CPU with 16 logical processors. I also checked the graphs to confirm that all threads were really running in parallel and not just concurrently.
Snippet 1: [Artillery results graph]
Snippet 2: [Artillery results graph]
Really? A single line can make that much difference? Aren't they the same, though?
Let's go over a few things first to understand where the problem is 👨‍🏫
mpsc, as its name says, stands for "multiple producers, single consumer". In this case we want a single producer (the main thread) to add request handlers to the queue, and multiple consumers, so that multiple threads can consume messages in parallel.
To achieve that, we wrapped the receiver we got from mpsc in an Arc, which gives the receiver multiple owners so it can be passed to the different threads. To make sure only one thread is responsible for executing a given handler, we also wrapped it in a Mutex, which allows only one thread to access the receiver at any given time, because we don't want two threads responding to the same request.
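To make the wiring concrete, here is a minimal sketch of how the receiver could be shared. The `run_jobs` helper and its return value are made up for illustration (the server's real pool is not shown in this post), and the workers exit when the channel closes instead of looping forever:

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical helper: runs `jobs` on `workers` threads and returns how many jobs ran.
fn run_jobs(workers: usize, jobs: Vec<Job>) -> usize {
    let (sender, receiver) = mpsc::channel::<Job>();
    // Arc gives every worker its own handle to the single receiver;
    // Mutex makes sure only one worker calls recv() at a time.
    let receiver = Arc::new(Mutex::new(receiver));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let receiver = Arc::clone(&receiver);
            thread::spawn(move || {
                let mut done = 0usize;
                loop {
                    // Same one-statement form as snippet 1.
                    let message = receiver.lock().unwrap().recv();
                    match message {
                        Ok(job) => {
                            job();
                            done += 1;
                        }
                        // The sender is gone and the queue is empty: stop.
                        Err(_) => break,
                    }
                }
                done
            })
        })
        .collect();

    // Single producer: the calling thread queues every job.
    for job in jobs {
        sender.send(job).unwrap();
    }
    // Closing the channel lets the workers leave their loops.
    drop(sender);

    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let jobs: Vec<Job> = (0..8)
        .map(|i| Box::new(move || println!("job {i}")) as Job)
        .collect();
    println!("ran {} jobs", run_jobs(4, jobs));
}
```

Each job is received by exactly one worker, so the per-worker counts always add up to the number of jobs sent.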
To access the data in a mutex, a thread must first signal that it wants access by asking to acquire the mutex's lock. The lock keeps track of who currently has exclusive access to the data, and it is released when the MutexGuard is dropped.
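Here is a small sketch of that rule in isolation. The helper names `is_locked_elsewhere` and `guard_demo` are hypothetical; the idea is just to ask a second thread whether it could take the lock, once while the guard is alive and once after it has been dropped:

```rust
use std::sync::Mutex;
use std::thread;

/// Hypothetical helper: asks another thread whether `m` is locked right now.
fn is_locked_elsewhere(m: &Mutex<i32>) -> bool {
    thread::scope(|s| {
        // try_lock never blocks; it returns Err while someone else holds the lock.
        let handle = s.spawn(|| m.try_lock().is_err());
        handle.join().unwrap()
    })
}

/// Returns (locked while the guard is alive, locked after the guard is dropped).
fn guard_demo() -> (bool, bool) {
    let data = Mutex::new(0);

    let guard = data.lock().unwrap();
    let while_held = is_locked_elsewhere(&data);

    // Dropping the MutexGuard is what releases the lock.
    drop(guard);
    let after_drop = is_locked_elsewhere(&data);

    (while_held, after_drop)
}

fn main() {
    let (while_held, after_drop) = guard_demo();
    println!("locked while the guard is alive: {while_held}");
    println!("locked after the guard is dropped: {after_drop}");
}
```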
Notice in snippet 1:
//        ⬇⬇⬇⬇ Temporary Value ⬇⬇⬇⬇
let job = receiver.lock().unwrap().recv().unwrap();
✨ The trick here is that, with let, any temporary values used in the expression on the right-hand side of the equals sign are dropped immediately when the let statement ends!
So the lock is released before the thread even starts executing the handler, which means other threads can acquire the lock, receive incoming request handlers, and process them 💪
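You can check the temporary rule without any threads at all. In this sketch (both function names are made up for the demo) the same channel-behind-a-Mutex setup is used, and right after receiving a message we try to take the lock again with `try_lock`, which fails while a guard is still alive:

```rust
use std::sync::{mpsc, Mutex};

/// One-statement form, like snippet 1: the guard is a temporary.
fn free_after_one_statement() -> bool {
    let (sender, receiver) = mpsc::channel();
    sender.send("job").unwrap();
    let receiver = Mutex::new(receiver);

    // The MutexGuard created by lock().unwrap() is dropped when this `let` ends.
    let _job = receiver.lock().unwrap().recv().unwrap();

    // Bound with `let` so that this second, temporary guard is dropped here too.
    let free = receiver.try_lock().is_ok();
    free
}

/// Two-statement form, like snippet 2: the guard is a named variable.
fn free_after_two_statements() -> bool {
    let (sender, receiver) = mpsc::channel();
    sender.send("job").unwrap();
    let receiver = Mutex::new(receiver);

    // `lock` lives until the end of this function body.
    let lock = receiver.lock().unwrap();
    let _job = lock.recv().unwrap();

    // The lock is still held by `lock`, so try_lock cannot succeed.
    let free = receiver.try_lock().is_ok();
    free
}

fn main() {
    println!("one statement, lock free afterwards: {}", free_after_one_statement());
    println!("two statements, lock free afterwards: {}", free_after_two_statements());
}
```

In a worker loop, "afterwards" is exactly the point where `job()` runs, which is why the two forms behave so differently.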
As you can see here, we're running 16 threads in parallel, which is freaking awesome!
executing on thread: 0
executing on thread: 1
executing on thread: 2
executing on thread: 5
executing on thread: 4
executing on thread: 3
executing on thread: 10
executing on thread: 7
executing on thread: 13
executing on thread: 9
executing on thread: 6
executing on thread: 11
executing on thread: 12
On the other hand, with snippet 2:
let thread = thread::spawn(move || loop {
    let acquired_lock = receiver.lock().unwrap();
    let job = acquired_lock.recv().unwrap();
    println!("executing on thread: {}", id);
    job();
    // `acquired_lock` is dropped here, at the end of the loop body, and the lock is released.
});
the lock is released only after the handler has finished executing, specifically at the end of the scope where the guard was defined.
Therefore the threads in this case run concurrently and not in parallel, because each thread has to wait for the thread that currently holds the lock to release it before it can read the next closure from the receiver.
On top of that, the thread that just released the lock will probably pick it up again before the other threads do (std's Mutex makes no fairness guarantee), because its loop looks like this:
- acquire lock.
- release lock.
- next iteration of the loop.
- acquire lock.
- ...
which in the end leads to effectively single-threaded execution, even though there are threads spawned and waiting in the pool, as shown below:
executing on thread: 0
executing on thread: 0
executing on thread: 0
executing on thread: 0
executing on thread: 0
executing on thread: 0
executing on thread: 0
executing on thread: 0
executing on thread: 0
executing on thread: 0
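If you like the two-statement shape of snippet 2, one possible fix (a sketch, not the only option) is to drop the guard explicitly before running the job. The `peak_parallel_jobs` helper below is hypothetical: it uses sleeping jobs as stand-ins for request handlers and reports the highest number of jobs seen running at the same time.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{mpsc, Arc, Mutex};
use std::thread;
use std::time::Duration;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical helper: runs `job_count` sleeping jobs on `workers` threads and
/// returns the highest number of jobs observed running at the same time.
fn peak_parallel_jobs(workers: usize, job_count: usize) -> usize {
    let (sender, receiver) = mpsc::channel::<Job>();
    let receiver = Arc::new(Mutex::new(receiver));
    let running = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let receiver = Arc::clone(&receiver);
            thread::spawn(move || loop {
                let lock = receiver.lock().unwrap();
                let message = lock.recv();
                // Release the lock *before* running the job.
                drop(lock);
                match message {
                    Ok(job) => job(),
                    Err(_) => break, // channel closed and drained
                }
            })
        })
        .collect();

    for _ in 0..job_count {
        let running = Arc::clone(&running);
        let peak = Arc::clone(&peak);
        sender
            .send(Box::new(move || {
                // Count this job as running and remember the highest count seen.
                let now = running.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                thread::sleep(Duration::from_millis(100));
                running.fetch_sub(1, Ordering::SeqCst);
            }))
            .unwrap();
    }
    drop(sender);

    for handle in handles {
        handle.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}

fn main() {
    println!("peak parallel jobs: {}", peak_parallel_jobs(4, 8));
}
```

Wrapping the two `let` lines in their own `{ ... }` block would have the same effect, since the guard is then dropped at the end of that inner block.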
Very tricky, right?
The lesson here is to be careful when dealing with Mutex, because the compiler can't guarantee the threading behavior you want to achieve, whether in a scenario like this one or in others like deadlocks, which are so much worse.
That's why I've included the picture of Ferris that means "This code compiles but does not produce the desired behavior." from the Rust book.
Without the logs, I wouldn't even have noticed that there was a problem 🤷‍♂️
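Besides logs, a rough timing check can also expose this kind of problem. The sketch below is not the Artillery benchmark from this post. It's a hypothetical `time_pool` helper that runs sleeping jobs through both worker styles and measures the wall-clock time, assuming 50 ms per job as a stand-in for real request handling:

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical helper: times `jobs` sleeping jobs on `workers` threads.
/// `hold_lock_while_running` switches between the snippet 2 and snippet 1 styles.
fn time_pool(workers: usize, jobs: usize, hold_lock_while_running: bool) -> Duration {
    let (sender, receiver) = mpsc::channel::<Job>();
    let receiver = Arc::new(Mutex::new(receiver));
    let started = Instant::now();

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let receiver = Arc::clone(&receiver);
            thread::spawn(move || loop {
                if hold_lock_while_running {
                    // Snippet 2 style: `lock` lives until the end of this block,
                    // so the job runs while the lock is still held.
                    let lock = receiver.lock().unwrap();
                    let message = lock.recv();
                    match message {
                        Ok(job) => job(),
                        Err(_) => break,
                    }
                } else {
                    // Snippet 1 style: the guard is a temporary, gone after this `let`.
                    let message = receiver.lock().unwrap().recv();
                    match message {
                        Ok(job) => job(),
                        Err(_) => break,
                    }
                }
            })
        })
        .collect();

    for _ in 0..jobs {
        sender
            .send(Box::new(|| thread::sleep(Duration::from_millis(50))))
            .unwrap();
    }
    drop(sender); // close the channel so the workers can exit

    for handle in handles {
        handle.join().unwrap();
    }
    started.elapsed()
}

fn main() {
    println!("lock held during job:     {:?}", time_pool(4, 8, true));
    println!("lock released before job: {:?}", time_pool(4, 8, false));
}
```

With the lock held during the job, the jobs can only run one after another, so the total can never drop below the number of jobs times 50 ms, no matter how many workers there are.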
Hope you enjoyed it!