
A Brief Intro to Multi-Threaded Programming

Chet Chopra
I like to tinker
・4 min read

Before we start talking about multithreaded programs, we need some background information. There is a lot of background for this topic, but we will only cover the bare minimum.

What is a Thread?

A thread is a sequential flow of control in a program. It's essentially a sequence of program instructions that are managed by the scheduler so they can be executed.

What is a scheduler?

Don't worry about it for now. Just know that it's a part of the operating system. There is a link in the references section if you want to know more.

Let's say you're working on a system with multiple physical cores, or multiple virtual cores in a hyper-threaded environment. Multithreading is a way of telling your operating system: these are the different threads, or chunks of instructions, that I want to run on my computer. The operating system's scheduler is responsible for choosing which thread gets time on the CPU cores, whether they are physical or virtual. Running a single program on multiple threads poses some interesting issues and offers some very powerful advantages.

Issue

Take the code below for example.

Bad Multi-Threaded Code

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>

volatile long int a = 0;

/* Adds 1..499,999 into the shared variable a. */
void *threadOne(void *arg) {
  int i;
  for (i = 1; i < 500000; i++) {
    a = a + i;
  }
  return NULL;
}

/* Adds 500,000..1,000,000 into the shared variable a. */
void *threadTwo(void *arg) {
  int i;
  for (i = 500000; i <= 1000000; i++) {
    a = a + i;
  }
  return NULL;
}

int main (int argc, char **argv) {
  pthread_t one, two;
  pthread_create(&one, NULL, threadOne, NULL);
  pthread_create(&two, NULL, threadTwo, NULL);
  pthread_join(one, NULL);
  pthread_join(two, NULL);
  printf("%ld\n", a);
  return 0;
}

All this program does is add up the numbers from 1 to 1,000,000, which should come out to 500000500000. But the catch is that it does it using two threads of execution. So one thread adds up the numbers from 1 up to 499,999 and the other from 500,000 to 1,000,000.

Let's run this code multiple times and observe.

Run #1 Output - 370752617319
Run #2 Output - 336751282108
Run #3 Output - 250846382315
Run #4 Output - 314774608001

Hmmm… why do we get a different answer every time we run this code? In theory, we should get the right answer in half the time compared to using a single thread.

Here is another program that does the same thing but uses only one thread.

Single Thread

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

volatile long int a;

int main (int argc, char **argv) {
  int i;
  a = 0;
  for (i = 1; i <= 1000000; i++) {
    a = a + i;
  }
  printf("%ld\n", a);
  return 0;
}

As expected, we get the same value every time. So what is the multithreaded program doing that makes it produce different, incorrect answers?

Each thread loads the variable a from memory, makes a local copy, adds a value to it, and then stores the result back into a. The issue is that both threads use the global variable a to accumulate their sums, but they aren't working in sync. They could both load a at the same time, each add a value to its own copy, and then store it back. Depending on which one stores its value back to a first, one update will be overwritten by the other.
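One interleaving that loses an update might look like this (a sketch of the race, not real program output):

```text
        Thread 1                      Thread 2
t0      load a        (reads 0)
t1                                    load a        (reads 0)
t2      add 1         (copy = 1)
t3                                    add 500000    (copy = 500000)
t4      store a       (a = 1)
t5                                    store a       (a = 500000)

Thread 1's update (a = 1) is overwritten at t5, so the "+1" is lost.
```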

The solution to this issue is to synchronize the two threads so that they don't try to operate on or access a at the same time. One way to do this is to use a mutex (mutual exclusion). You can use tokens as an analogy: each thread can only access a if it holds the token. This means that if thread one holds the token and is working with a, then thread two cannot access a and is forced to wait until thread one releases the token.

Let's look at some code.

Better but still bad Multi-Threaded Code

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>

pthread_mutex_t a_mutex = PTHREAD_MUTEX_INITIALIZER;
volatile long int a = 0;

void *threadOne(void *arg) {
  int i;
  for (i = 1; i < 500000; i++) {
    pthread_mutex_lock(&a_mutex);   /* take the "token" before touching a */
    a = a + i;
    pthread_mutex_unlock(&a_mutex); /* release it so the other thread can proceed */
  }
  return NULL;
}

void *threadTwo(void *arg) {
  int i;
  for (i = 500000; i <= 1000000; i++) {
    pthread_mutex_lock(&a_mutex);
    a = a + i;
    pthread_mutex_unlock(&a_mutex);
  }
  return NULL;
}

int main (int argc, char **argv) {
  pthread_t one, two;
  pthread_create(&one, NULL, threadOne, NULL);
  pthread_create(&two, NULL, threadTwo, NULL);
  pthread_join(one, NULL);
  pthread_join(two, NULL);
  printf("%ld\n", a);
  return 0;
}

Run #1 Output - 500000500000
Run #2 Output - 500000500000
Run #3 Output - 500000500000
Run #4 Output - 500000500000

You can see that there are some new lines that lock and unlock the shared variable around each access. Now we get the correct output! But there is still a problem: using threads this way is pointless, because while thread one is working with a, thread two is forced to wait until thread one releases the lock. In theory this is as slow as using a single thread, and in practice the locking overhead on every iteration makes it even slower.

Time to fix it!

Advantage

Good Multi-Threaded Code

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>

pthread_mutex_t a_mutex = PTHREAD_MUTEX_INITIALIZER;
volatile long int a = 0;

void *threadOne(void *arg) {
  int i;
  long int localA = 0;
  for (i = 1; i < 500000; i++) {
    localA = localA + i;            /* no lock needed: localA is private */
  }
  pthread_mutex_lock(&a_mutex);     /* lock only once, for the final update */
  a = a + localA;
  pthread_mutex_unlock(&a_mutex);
  return NULL;
}

void *threadTwo(void *arg) {
  int i;
  long int localA = 0;
  for (i = 500000; i <= 1000000; i++) {
    localA = localA + i;
  }
  pthread_mutex_lock(&a_mutex);
  a = a + localA;
  pthread_mutex_unlock(&a_mutex);
  return NULL;
}

int main (int argc, char **argv) {
  pthread_t one, two;
  pthread_create(&one, NULL, threadOne, NULL);
  pthread_create(&two, NULL, threadTwo, NULL);
  pthread_join(one, NULL);
  pthread_join(two, NULL);
  printf("%ld\n", a);
  return 0;
}

In this code we are using a local variable called localA in each function. We do this so that thread one adds the numbers from 1 to 499,999 to its own localA while, at the same time, thread two adds the numbers from 500,000 to 1,000,000 to its own localA. After a thread is done adding, it locks a, adds its local sum in, then unlocks it so the other thread can do the same. Now each thread takes the lock only once, so we have significantly reduced the time one thread waits on the other. In theory this should run about twice as fast as using a single thread.

References

This blog is based on a great video on YouTube made by Computerphile:

https://youtu.be/7ENFeb-J75k

https://pdfs.semanticscholar.org/a23b/22fb7bc0286101bdf4794e1e61bcad2fa896.pdf

https://en.wikipedia.org/wiki/Scheduling_(computing)
