Discussion on: Is Cooperative Concurrency Here to Stay?

View post

Replies for: Thank you, yes, for eg NGINX runs a worker process per cpu core as you describe (nginx.com/blog/inside-nginx-how-we...). And as you point out, it d...

We need to be careful about what we're measuring and comparing. A "context switch" will happen anytime you make an OS call, this includes the IO functions I mentioned. There is definitely overhead in this alone, even if we don't get into scheduling.

For a thread switch there will be more overhead, as Zuodian mentions. On Linux, or any OS, I don't think the actual process selection takes much time (these aren't complex data structures). The switching of page/privilege tables possibly.

But, here's an important aspect. When you switch to a new thread you have different memory pointers and your caches, especially the L0/L1 levels may not have that data in memory. This causes new loads to be requested. This will also happen if you have green threads and switch in user space, since your cooperative "threads" also have different memory.

Without a concrete comparison of use-cases written in both approaches I still think it'd be hard to say whether the actual context/thread switch is a significant part of the problem.

Nested Software • Sep 18 '18 • Edited

That's a good point that we should compare apples to apples.

In a very rough way, it seems that the difference in performance between NGINX and Apache (with worker/event model) suggests that preemptive multitasking has enough overhead to make a significant difference. NGINX can handle at least twice the number of requests per second.

Why that happens is less clear to me. Is context switching a thread significantly slower than context switching in-process? Do both have the same effect on the CPU cache? If that doesn't make enough of a difference, maybe there's something else going on.

I think threads/processes are preempted in Linux something like every 1-10 milliseconds in CFS, so that's a fixed, guaranteed cost. I wonder if maybe the frequency of context switching is significantly lower with a cooperative approach, since we only context switch voluntarily, usually because of I/O. There wouldn't be any context switching at all between tasks while they're using the CPU. Could that be the difference?

This suggests that might be the case:

When an NGINX server is active, only the worker processes are busy. Each worker process handles multiple connections in a nonblocking fashion, reducing the number of context switches.

Zuodian Hu • Sep 18 '18

When you change from one user process to another, the virtual page table must change since each process has its own virtual address space. That's the main thing I can think of that makes in-process context switches faster.

You're right in a sense; a cooperative event loop avoids OS-managed context switches entirely by running all tasks in the same thread context. So the only time that event loop has to give up CPU time to the OS is when the OS preempt the entire event loop thread, or some task in the event loop performs a system call. It's a technicality, but Linux will still preempt the event loop periodically to run other user processes, handle interrupts, and run kernel threads. The event loop just keeps all the tasks that it owns from fragmenting into multiple user threads.

Nested Software • Sep 19 '18

This is a good point, though I think in Linux at least, it only applies to actual process switching. That is, if we're switching between threads that are all running under the same process (as would be the case for Apache process running on a given core), I believe the memory isn't switched out the way it is when we switch from one process to another. I am not familiar with the details though, just that heap memory is generally shared, so I may be missing something.

Zuodian Hu • Sep 19 '18

I would encourage finding some reading material on operating systems if you're interested in this kind of stuff, I personally love it. Here is the one my OS professor wrote.