Introduction
Multithreaded Ruby is a niche topic in our community, and that is no surprise. Most Ruby applications are web applications built on Rails or Sinatra, where the framework and the application server take care of threading, so developers on such projects rarely even need to know about threads. The framework usually has got your back.
Even if you do not use it, some basic knowledge of multithreading (and its basic concepts) in an interpreted language like Ruby will surely come in handy throughout your career.
I assume you know about the GIL (Global Interpreter Lock). In case you don't know what it is, you can read my article "Ruby's GIL in a nutshell".
GIL != the end of the world
Even though it limits parallelism, Ruby's GIL does not completely stop it. As we know, it exists to guard the interpreter's internal state. As such, it only applies to Ruby operations. In our normal day-to-day code there are a lot of operations that are not the job of Ruby's interpreter to handle.
A good example is I/O operations. While waiting for an external service to load something, there is no need to hold the GIL, as this external service cannot harm our internal state.
Ruby's PostgreSQL library is written in C and its method call for a DB query releases the GIL. The following example shows that:
require 'thwait' # part of the standard library up to Ruby 2.6; newer Rubies need the thwait gem
require 'pg'

start = Time.now

first_sleep = Thread.new do
  puts 'Starting sleep 1'
  conn = PG::Connection.open(dbname: 'test')
  conn.exec('SELECT pg_sleep(1);')
  puts 'Finished sleep 1'
end

second_sleep = Thread.new do
  puts 'Starting sleep 2'
  conn = PG::Connection.open(dbname: 'test2')
  conn.exec('SELECT pg_sleep(1);')
  puts 'Finished sleep 2'
end

random = Thread.new do
  puts 'In a random thread'
end

# Block until all three threads have finished
ThWait.all_waits(first_sleep, second_sleep, random)
puts "Time it took: #{Time.now - start}"
Here we spin up two threads that each connect to a different database and run a sleep query for one second, plus a third thread that only prints a message. Without parallelism, this should take at least 2 seconds.
> enether$ ruby async_pg.rb
> Starting sleep 2
> Starting sleep 1
> In a random thread
> Finished sleep 2
> Finished sleep 1
> Time it took: 1.074824
But it runs in 1 second!
This shows that the PostgreSQL query does not hold the GIL and lets the other thread take control. Not only does it not lock the interpreter, it actually runs in parallel with the other query. That is the only way two one-second sleep queries could finish in about 1 second!
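The same overlap can be observed without a database. The sketch below is an illustration, not part of the benchmark above: it swaps the PostgreSQL query for `sleep`, which is also a blocking wait during which Ruby lets other threads run. The method name `timed_parallel_sleep` is made up for this example.

```ruby
# Runs `count` threads that each block for `seconds` and returns the
# wall-clock time the whole batch took.
def timed_parallel_sleep(count, seconds)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)

  threads = Array.new(count) do
    Thread.new { sleep seconds } # blocking wait: other threads may run meanwhile
  end
  threads.each(&:join) # wait for every thread to finish

  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

elapsed = timed_parallel_sleep(2, 0.5)
puts "Two 0.5s sleeps took #{elapsed.round(2)}s"
```

If the waits were serialized, the total would be about 1 second. With the GIL released during the wait, it should land close to 0.5 seconds.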
Reminder: The GIL does not protect you
A problem can occur when two or more threads access shared data and try to change it. This is called a race condition.
Because Ruby's thread scheduling algorithm can swap between threads at any time, you don't know the order in which the threads will attempt to access the shared data. Therefore, the result of the change in data is dependent on the algorithm and seemingly out of your control.
It is therefore possible for two threads to modify data in such a sequence where you get an unexpected outcome.
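A lost update is one such sequence. The sketch below is a made-up illustration (the method name `unsafe_deposits` and the deposit scenario are assumptions, not taken from any real code): every thread reads a shared value, gets paused, and then writes back a result based on what it read earlier. The `sleep` stands in for any point where the scheduler might switch threads.

```ruby
# Each thread reads the shared balance, pauses, then writes back its own
# result. The pause stands in for any point where the scheduler switches threads.
def unsafe_deposits(times, amount)
  balance = 0

  threads = Array.new(times) do
    Thread.new do
      current = balance          # read
      sleep 0.2                  # another thread gets to run here
      balance = current + amount # write back a value based on a stale read
    end
  end
  threads.each(&:join)

  balance
end

puts "Balance after two $10 deposits: $#{unsafe_deposits(2, 10)}"
```

Both threads read a balance of 0 before either one writes, so the second write overwrites the first and the balance typically ends up at $10 instead of $20.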
Here is an example of the so-called check-then-act race condition, where you check a variable's value and then act based on it.
require 'thwait'

def send_money(amount)
  puts "Sending $#{amount}"
  # Simulate the network call that sends the money.
  # This is a blocking wait, so Ruby releases the GIL here.
  sleep 1
end

threads = []
money_is_sent = false

2.times do
  th = Thread.new do
    unless money_is_sent # check
      send_money 10      # act
      money_is_sent = true
    end
  end
  threads << th
end

ThWait.all_waits(*threads)
We obviously want to send the money only once, but running the code shows that this is not the case:
> enether$ ruby balling.rb
> Sending $10
> Sending $10
As you saw, what looks like straightforward code can end up producing a huge problem (losing us money!) when executed concurrently. It is up to you to make your code thread-safe.
How to protect yourself
So how could we avoid such race conditions?
Simple: you can take the same approach as the Ruby core team and introduce your own lock (kind of like the GIL), a local lock on a block of code.
This is called a Mutex (Mutual Exclusion) and it helps you synchronize access to blocks of code, acting like a gatekeeper.
require 'thwait'

def send_money(amount)
  puts "Sending $#{amount}"
  sleep 1 # Simulate network call sending of money
end

lock = Mutex.new
threads = []
money_is_sent = false

2.times do
  th = Thread.new do
    lock.synchronize {
      unless money_is_sent
        send_money 10
        money_is_sent = true
      end
    }
  end
  threads << th
end

ThWait.all_waits(*threads)
We define a Mutex and call its synchronize method. When a thread enters the block passed to synchronize, our mutex gets locked. If another thread tries to run code through lock.synchronize, it will see that the lock is taken and pause until it is unlocked.
> enether$ ruby balling_on_a_budget.rb
> Sending $10
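To make the pausing visible, here is a small sketch (an illustration with made-up names such as `guarded_events`) in which two threads log when they enter and leave a block guarded by the same mutex. Even though each thread sleeps inside the block, the other one cannot get in until the lock is released.

```ruby
# Two threads append to a shared log while holding the same mutex.
# Returns the log so the order of events can be inspected.
def guarded_events
  lock = Mutex.new
  events = []

  threads = %w[a b].map do |name|
    Thread.new do
      lock.synchronize do
        events << "#{name} enters"
        sleep 0.1 # the other thread may be scheduled, but it blocks on the lock
        events << "#{name} leaves"
      end
    end
  end
  threads.each(&:join)

  events
end

p guarded_events
```

Whichever thread wins the lock first, its "enters" and "leaves" entries stay next to each other in the log. They are never interleaved with the other thread's entries.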
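Be sure to note that lock.synchronize only keeps out other threads that want to execute code wrapped inside the same lock variable! The scheduler can still pause a thread inside the block, but no other thread can enter a block guarded by that lock until it is released.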
Creating two different locks will obviously not work.
2.times do
  Thread.new do
    # Each thread creates its own Mutex, so the threads never contend for the same lock
    Mutex.new.synchronize {
      unless money_is_sent
        send_money 10
        money_is_sent = true
      end
    }
  end
end
> enether$ ruby lock_city.rb
> Sending $10
> Sending $10
yeah, no way
Mutexes are not perfect
Now that we know about these locks, we need to pay attention to how we use them. They offer protection, but they can also backfire if used incorrectly.
It is possible to end up in a so-called deadlock (sounds scary, doesn't it?). A deadlock is a situation where one thread that holds mutex A waits for a mutex B to be released but the thread that holds mutex B is waiting for mutex A.
require 'thread'
require 'thwait'

first_lock = Mutex.new
second_lock = Mutex.new

a = Thread.new {
  first_lock.synchronize {
    sleep 1 # essentially forces a context switch
    second_lock.synchronize {
      puts 'Locked #1 then #2'
    }
  }
}

b = Thread.new {
  second_lock.synchronize {
    sleep 1 # essentially forces a context switch
    first_lock.synchronize {
      puts 'Locked #2 then #1'
    }
  }
}

ThWait.all_waits(a, b)
> enether$ ruby dead_lock.rb
> /Users/enether/.rvm/rubies/ruby-2.4.1/lib/ruby/2.4.0/thwait.rb:112:in `pop': No live threads left. Deadlock? (fatal)
They are both holding what the other thread wants and waiting for what the other thread has.
Of course, this is a pretty specific example and there are not many cases in which you might use two mutexes in such a way, but it is essential to know about this pitfall.
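One common way to avoid the deadlock above is to make every thread acquire the locks in the same order. The sketch below is a minimal illustration of that idea under made-up names (`ordered_locking`, threads `a` and `b`). It is not the only remedy, just the simplest one to show.

```ruby
# Both threads take first_lock before second_lock, so neither can hold one
# lock while waiting for a thread that holds the other.
def ordered_locking
  first_lock = Mutex.new
  second_lock = Mutex.new
  finished = Queue.new # thread-safe collection for the results

  threads = %w[a b].map do |name|
    Thread.new do
      first_lock.synchronize do
        sleep 0.1 # encourage a context switch while holding the first lock
        second_lock.synchronize { finished << name }
      end
    end
  end
  threads.each(&:join)

  Array.new(finished.size) { finished.pop }
end

puts "Finished: #{ordered_locking.sort.join(', ')}"
```

Because the second thread has to wait for first_lock before it can touch second_lock, the circular wait from the deadlock example cannot form and both threads run to completion.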
Summary
We saw that regardless of the GIL you can still do tasks asynchronously (I/O and native libraries) and confirmed that it won't save you from your thread-unsafe code.
You learned about the most common pitfall, the check-then-act race condition. We introduced a way of handling the problem through our own little GIL-esque lock (Mutex), and we saw that even that can backfire.
I hope I've managed to showcase how tricky multithreaded programming can turn out to be and how it can introduce problems you would not consider when programming synchronously.