Phuong Le

Posted on Oct 29, 2024 • Originally published at victoriametrics.com

Go sync.Cond, the Most Overlooked Sync Mechanism

#go

This is an excerpt of the post; the full post is available here: https://victoriametrics.com/blog/go-sync-cond/

This post is part of a series about handling concurrency in Go:

Go sync.Mutex: Normal and Starvation Mode
Go sync.WaitGroup and The Alignment Problem
Go sync.Pool and the Mechanics Behind It
Go sync.Cond, the Most Overlooked Sync Mechanism (We're here)
Go sync.Map: The Right Tool for the Right Job
Go Singleflight Melts in Your Code, Not in Your DB

In Go, sync.Cond is a synchronization primitive, though it's not as commonly used as its siblings like sync.Mutex or sync.WaitGroup. You'll rarely see it in most projects or even in the standard library, where other sync mechanisms tend to take its place.

That said, as a Go engineer, you don't really want to find yourself reading through code that uses sync.Cond and not have a clue what's going on, because it is part of the standard library, after all.

So, this discussion will help you close that gap, and even better, it'll give you a clearer sense of how it actually works in practice.

What is sync.Cond?

So, let's break down what sync.Cond is all about.

When a goroutine needs to wait for something specific to happen, like some shared data changing, it can "block," meaning it just pauses its work until it gets the go-ahead to continue. The most basic way to do this is with a loop, maybe even adding a time.Sleep to prevent the CPU from going crazy with busy-waiting.

Here's what that might look like:

// wait until condition is true
for !condition {  
}

// or 
for !condition {
    time.Sleep(100 * time.Millisecond)
}

Now, this isn't really efficient. The bare loop burns through CPU cycles even when nothing's changed, and the sleeping version trades that for latency, since it only notices a change the next time it wakes up.

That's where sync.Cond steps in, a better way to let goroutines coordinate their work. Technically, it's a "condition variable" if you're coming from a more academic background.

When one goroutine is waiting for something to happen (waiting for a certain condition to become true), it can call Wait().
Another goroutine, once it knows that the condition might be met, can call Signal() or Broadcast() to wake up the waiting goroutine(s) and let them know it's time to move on.

Here's the basic interface sync.Cond provides:

// Suspends the calling goroutine until it is woken up by Signal() or Broadcast()
func (c *Cond) Wait() {}

// Wakes up one waiting goroutine, if there is one
func (c *Cond) Signal() {}

// Wakes up all waiting goroutines
func (c *Cond) Broadcast() {}

Alright, let's check out a quick pseudo-example. This time we've got a Pokémon theme going on: imagine we're waiting for a specific Pokémon, and we want to notify other goroutines when it shows up.

package main

import (
    "math/rand"
    "sync"
    "time"
)

var pokemonList = []string{"Pikachu", "Charmander", "Squirtle", "Bulbasaur", "Jigglypuff"}
var cond = sync.NewCond(&sync.Mutex{})
var pokemon = "" // shared state, guarded by cond.L

func main() {
    // Consumer
    go func() {
        cond.L.Lock()
        defer cond.L.Unlock()

        // waits until Pikachu appears
        for pokemon != "Pikachu" {
            cond.Wait()
        }
        println("Caught " + pokemon)
        pokemon = ""
    }()

    // Producer
    go func() {
        // Every 1ms, a random Pokémon appears
        for i := 0; i < 100; i++ {
            time.Sleep(time.Millisecond)

            cond.L.Lock()
            pokemon = pokemonList[rand.Intn(len(pokemonList))]
            cond.L.Unlock()

            cond.Signal()
        }
    }()

    time.Sleep(100 * time.Millisecond) // lazy wait
}

// Output (typical; the Pokémon are random, so a run can miss Pikachu):
// Caught Pikachu

In this example, one goroutine is waiting for Pikachu to show up, while another one (the producer) randomly selects a Pokémon from the list and signals the consumer when a new one appears.

When the producer sends the signal, the consumer wakes up and checks if the right Pokémon has appeared. If it has, we catch the Pokémon, if not, the consumer goes back to sleep and waits for the next one.

The problem is, there's a gap between the producer sending the signal and the consumer actually waking up. In the meantime, the Pokémon could change, either because the consumer goroutine wakes up more than 1ms later (rare) or because another goroutine modifies the shared pokemon variable. So sync.Cond is basically saying: 'Hey, something changed! Wake up and check it out, but if you're too late, it might change again.'

If the consumer wakes up late, the Pokémon might run away, and the goroutine will go back to sleep.

"Huh, I could use a channel to send the pokemon name or signal to the other goroutine"

Absolutely. In fact, channels are generally preferred over sync.Cond in Go because they're simpler, more idiomatic, and familiar to most developers.

In the case above, you could easily send the Pokémon name through a channel, or just use an empty struct{} to signal without sending any data. But our issue isn't just about passing messages through channels, it's about dealing with a shared state.

Our example is pretty simple, but suppose multiple goroutines are accessing the shared pokemon variable. Let's look at what happens if we use a channel:

If we use a channel to send the Pokémon name, we'd still need a mutex to protect the shared pokemon variable.
If we use a channel just to signal, a mutex is still necessary to manage access to the shared state.
If we check for Pikachu in the producer and then send it through the channel, we'd also need a mutex. On top of that, we'd violate the separation of concerns principle, where the producer is taking on the logic that really belongs to the consumer.

That said, when multiple goroutines are modifying shared data, a mutex is still necessary to protect it. You'll often see a combination of channels and mutexes in these cases to ensure proper synchronization and data safety.

"Okay, but what about broadcasting signals?"

Good question! You can indeed mimic a broadcast signal to all waiting goroutines using a channel by simply closing it (close(ch)). When you close a channel, all goroutines receiving from that channel get notified. But keep in mind that a closed channel can't be reused: once it's closed, it stays closed.
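To make the close-as-broadcast trick concrete, here's a minimal sketch. The function name broadcastByClose and the channel name ready are made up for this illustration; the only real mechanism at work is that a receive on a closed channel returns immediately.

```go
package main

import (
	"fmt"
	"sync"
)

// broadcastByClose starts n waiters that all block on the same channel and
// releases them at once by closing it. It returns how many waiters woke up.
func broadcastByClose(n int) int {
	ready := make(chan struct{}) // hypothetical name; closed exactly once
	var wg sync.WaitGroup
	var mu sync.Mutex
	woken := 0 // guarded by mu

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-ready // a receive on a closed channel returns immediately
			mu.Lock()
			woken++
			mu.Unlock()
		}()
	}

	close(ready) // one-shot broadcast: closing it a second time would panic
	wg.Wait()
	return woken
}

func main() {
	fmt.Println(broadcastByClose(3)) // prints 3
}
```

Notice that the counter still needs its own mutex, and that a second "broadcast" would need a brand-new channel.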

By the way, there's actually been talk about removing sync.Cond in Go 2: proposal: sync: remove the Cond type.

"So, what's sync.Cond good for, then?"

Well, there are certain scenarios where sync.Cond can be more appropriate than channels.

With a channel, you can either send a signal to one goroutine by sending a value or notify all goroutines by closing the channel, but you can't do both. sync.Cond gives you more fine-grained control. You can call Signal() to wake up a single goroutine or Broadcast() to wake up all of them.
And you can call Broadcast() as many times as you need, which channels can't do once they're closed (closing a closed channel will trigger a panic).
Channels don't provide a built-in way to protect shared data, so you'd need to manage that separately with a mutex. sync.Cond, on the other hand, gives you a more integrated approach by combining locking and signaling in one package (and it can perform better in some workloads).
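As a rough illustration of the "broadcast as many times as you need" point, here's a small sketch in which one coordinator calls Broadcast() once per round on the same sync.Cond. The function rounds and its variables are hypothetical names for this example, not something from the standard library.

```go
package main

import (
	"fmt"
	"sync"
)

// rounds runs `workers` goroutines through `n` rounds. Each worker waits
// until the shared round counter moves past the last round it handled, and
// the coordinator announces every new round with Broadcast() on one Cond.
// It returns the total number of rounds observed across all workers.
func rounds(workers, n int) int {
	var mu sync.Mutex
	cond := sync.NewCond(&mu)
	round := 0    // shared state, guarded by mu
	observed := 0 // shared state, guarded by mu

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			defer mu.Unlock()
			for seen := 0; seen < n; seen++ {
				// Re-check the shared state after every wake-up. A worker
				// that fell behind catches up without waiting at all.
				for round == seen {
					cond.Wait()
				}
				observed++
			}
		}()
	}

	for i := 0; i < n; i++ {
		mu.Lock()
		round++
		mu.Unlock()
		cond.Broadcast() // the same Cond is reused for every round
	}

	wg.Wait()
	return observed
}

func main() {
	fmt.Println(rounds(3, 2)) // prints 6
}
```

Swapping Broadcast() for Signal() here would wake only one worker per round, so the others could stay blocked forever.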

"Why is the Lock embedded in sync.Cond?"

In theory, a condition variable like sync.Cond doesn't have to be tied to a lock for its signaling to work.

You could have the users manage their own locks outside of the condition variable, which might sound like it gives more flexibility. It's not really a technical limitation but more about human error.

Managing it manually can easily lead to mistakes because the pattern isn't really intuitive: you have to unlock the mutex before calling Wait(), then lock it again when the goroutine wakes up. This process can feel awkward and is pretty prone to errors, like forgetting to lock or unlock at the right time.

But why does the pattern seem a little off?

Typically, goroutines that call cond.Wait() need to check some shared state in a loop, like this:

for !checkSomeSharedState() {
    cond.Wait()
}

The lock embedded in sync.Cond handles the lock/unlock process for us, making the code cleaner and less error-prone. We will discuss the pattern in detail soon.

How to use it?

If you look closely at the previous example, you'll notice a consistent pattern in the consumer: we always lock the mutex before waiting (.Wait()) on the condition, and we unlock it after the condition is met.

Plus, we wrap the waiting condition inside a loop, here's a refresher:

// Consumer
go func() {
    cond.L.Lock()
    defer cond.L.Unlock()

    // waits until Pikachu appears
    for pokemon != "Pikachu" {
        cond.Wait()
    }
    println("Caught " + pokemon)
}()

Cond.Wait()

When we call Wait() on a sync.Cond, we're telling the current goroutine to hang tight until some condition is met.

Here's what's happening behind the scenes:

The goroutine gets added to a list of other goroutines that are also waiting on this same condition. All these goroutines are blocked, meaning they can't continue until they're "woken up" by either a Signal() or Broadcast() call.
The key part here is that the mutex must be locked before calling Wait() because Wait() does something important, it automatically releases the lock (calls Unlock()) before putting the goroutine to sleep. This allows other goroutines to grab the lock and do their work while the original goroutine is waiting.
When the waiting goroutine gets woken up (by Signal() or Broadcast()), it doesn't immediately resume work. First, it has to re-acquire the lock (Lock()).
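One way to see these steps in action is the sketch below. It records which goroutine holds the lock and when. The function waitReleasesLock and the channels holding and done are names invented for this illustration. The idea is that the main goroutine can only get the lock because Wait() released it.

```go
package main

import (
	"fmt"
	"sync"
)

// waitReleasesLock returns the order in which two goroutines held the lock.
func waitReleasesLock() []string {
	var mu sync.Mutex
	cond := sync.NewCond(&mu)
	ready := false
	var events []string // guarded by mu

	holding := make(chan struct{}) // closed once the waiter owns the lock
	done := make(chan struct{})    // closed after the waiter has unlocked

	go func() {
		defer close(done)
		mu.Lock()
		defer mu.Unlock()
		events = append(events, "waiter: holds the lock")
		close(holding)
		for !ready {
			cond.Wait() // unlocks mu while suspended, re-locks before returning
		}
		events = append(events, "waiter: woke up holding the lock again")
	}()

	<-holding
	// The waiter never calls Unlock() itself before it wakes up, so this
	// Lock() can only succeed because Wait() released the mutex.
	mu.Lock()
	events = append(events, "main: got the lock while the waiter is in Wait()")
	ready = true
	mu.Unlock()
	cond.Signal()

	<-done
	return events
}

func main() {
	for _, e := range waitReleasesLock() {
		fmt.Println(e)
	}
}
```

If Wait() didn't release the lock, the mu.Lock() call in the main goroutine would block forever and the program would deadlock.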

Here's a look at how Wait() works under the hood:

func (c *Cond) Wait() {
    // Check if Cond has been copied
    c.checker.check()

    // Get the ticket number
    t := runtime_notifyListAdd(&c.notify)

    // Unlock the mutex     
    c.L.Unlock()

    // Suspend the goroutine until being woken up
    runtime_notifyListWait(&c.notify, t)

    // Re-lock the mutex
    c.L.Lock()
}

Even though it's simple, we can take away 4 main points:

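There's a checker that detects copies of the Cond instance; using a copied Cond causes a panic.
Calling cond.Wait() immediately unlocks the mutex, so the mutex must be locked before calling cond.Wait(). Otherwise the Unlock() fails, and with a sync.Mutex that is a fatal "unlock of unlocked mutex" error that recover() cannot catch.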
After being woken up, cond.Wait() re-locks the mutex, which means you'll need to unlock it again after you're done with the shared data.
Most of sync.Cond's functionality is implemented in the Go runtime with an internal data structure called notifyList, which uses a ticket-based system for notifications.
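The copy check from the first point is easy to trigger on purpose. The sketch below uses a Cond once, copies it by value, and then uses the copy. The helper useCopiedCond is a made-up name, and the copy line is exactly the kind of thing go vet's copylocks check warns about, so treat this as a demonstration rather than something to write in real code.

```go
package main

import (
	"fmt"
	"sync"
)

// useCopiedCond uses a Cond, copies it by value, and then uses the copy.
// It returns the recovered panic value, or nil if nothing panicked.
func useCopiedCond() (recovered interface{}) {
	defer func() { recovered = recover() }()

	original := sync.NewCond(&sync.Mutex{})
	original.Signal() // the first use lets the checker record its own address

	copied := *original // copy by value (go vet would flag this line)
	copied.Signal()     // the checker now sees an address mismatch and panics
	return nil
}

func main() {
	fmt.Println("recovered:", useCopiedCond())
}
```

Passing a *sync.Cond around (which is what sync.NewCond returns) avoids the problem entirely.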

Because of this lock/unlock behavior, there's a typical pattern you'll follow when using sync.Cond.Wait() to avoid common mistakes:

c.L.Lock()
for !condition() {
    c.Wait()
}
// ... make use of condition ...
c.L.Unlock()

The typical pattern for using sync.Cond.Wait()
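To see the pattern in a slightly more realistic setting, here's a minimal sketch of a blocking queue. The Queue type and its NewQueue, Put and Get functions are hypothetical and exist only for this illustration. Get() follows the lock, loop, Wait(), unlock sequence shown above.

```go
package main

import (
	"fmt"
	"sync"
)

// Queue is an unbounded FIFO queue whose Get blocks while it is empty.
type Queue struct {
	mu    sync.Mutex
	cond  *sync.Cond
	items []int // guarded by mu
}

// NewQueue ties the condition variable to the queue's own mutex.
func NewQueue() *Queue {
	q := &Queue{}
	q.cond = sync.NewCond(&q.mu)
	return q
}

// Put appends a value and wakes up one waiting Get call, if there is one.
func (q *Queue) Put(v int) {
	q.mu.Lock()
	q.items = append(q.items, v)
	q.mu.Unlock()
	q.cond.Signal()
}

// Get removes and returns the oldest value, waiting until one is available.
func (q *Queue) Get() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	for len(q.items) == 0 { // the condition is re-checked after every wake-up
		q.cond.Wait()
	}
	v := q.items[0]
	q.items = q.items[1:]
	return v
}

func main() {
	q := NewQueue()
	go func() {
		for i := 1; i <= 3; i++ {
			q.Put(i)
		}
	}()
	fmt.Println(q.Get(), q.Get(), q.Get()) // prints 1 2 3
}
```

With a single consumer, a buffered channel would do the same job with less code. The sketch is only meant to show where each step of the pattern lands.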

"Why not just use c.Wait() directly without a loop?"
