DEV Community

Cover image for Implementing Parallel Merge Sort: 25sec vs 1.5sec
server side digest
server side digest

Posted on

Implementing Parallel Merge Sort: 25sec vs 1.5sec

🥵 When i was in my college days, one constant thing that I didn't get why the heck are we learning CS subjects like operating systems, computer networks and data structures and algorithms.

Computer science memes

🔥 Yes, exactly I was like this only till i discovered how to dig deeper in CS subjects and how I can use problem solving with my cs subjects.

⚡️ Last weekend, I was getting bored hence I was thinking constantly about what I should implement. So, i went back to my college days and thought of tweaking some basic sorting algorithm.

AND, Yes i implemented a Multithreaded Merge sort algorithm which took 1.5 sec on my mac m1 pro in comparison to 25 sec taken by a normal Merge sort algorithm.

If you prefer videos, Click on this one where You can go through the implementation details of the algorithm with theory:-
https://www.youtube.com/watch?v=t43U7UuVAHQ&t=247s

🫡 Merge Sort Theory (Recursion)

The fastest sorting algorithm has O(nlogn) time complexity and the there are two cool algorithms in sorting space: Merge sort & Quick Sort.

Because both of em uses recursion which is a super cool thing in Data structures and Algorithms subject (We will combine recursion with multithreading further in parallel merge sort)

Recursion meme

  • Merge sort relies on recursion where we divide array/list into two parts and continue to do so, until we get single element array (Base case of recursion).
  • Now, Single element array is always a sorted array and this is the base case where we assume that both left and right arrays are sorted
  • then we try to combine them both in a resulting array and return to the upper step of our recursion tree

merge sort

  • Now, you can notice one thing which is whenever we are dividing the array in two parts both of them are being processed in an independent manner, right?

Code of Merge sort goes like:-

#include "mergeSort.hpp"
#include <iostream>
MergeSort::MergeSort(std::vector<int> *nums){
    this->nums = nums;
}

MergeSort::~MergeSort(){}

void MergeSort::recursiveSort(int left, int right){
    if(left>=right){
        return;
    }

    int mid = left + (right-left)/2;

    recursiveSort(left, mid);
    recursiveSort(mid+1, right);

    std::vector<int> result;
    int i = left;
    int j = mid+1;

    while(i<=mid && j<=right){
        if((*nums)[i]<=(*nums)[j]){
            result.push_back((*nums)[i]);
            i++;
        }else{
            result.push_back((*nums)[j]);
            j++;
        }
    }

    while(i<=mid){
        result.push_back((*nums)[i]);
        i++;
    }

    while(j<=right){
        result.push_back((*nums)[j]);
        j++;
    }

    for(int k = 0; k<result.size(); k++){
        (*nums)[left+k] = result[k];
    }

    return;

}

void MergeSort::sort(){
    if((*nums).size() == 0){
        exit(1);
    }
    recursiveSort(0, (*nums).size());

}
Enter fullscreen mode Exit fullscreen mode

🚀 You can go through the code and it will be easy to gothrough it if you can dry run a case

📍 Parallel merge sort (recursion + multithreading)

Now, there is a catch, why can't we use our 8 cores machine parallel power. instead of waiting for left part of the array to get sorted we can hand over the right array to different thread right.

Confused?? dont worry let's look at it:-

Parallel merge sort

Code of the same is:-

#include "parallelMergeSort.hpp"
#include <thread>
#include <iostream>

ParallelMergeSort::ParallelMergeSort(std::vector<int> *nums) 
    : nums(nums){
}

ParallelMergeSort::~ParallelMergeSort(){}

void ParallelMergeSort::recursiveSort(int left, int right) {

    if (left >= right) {
        return;
    }
    int mid = left + (right - left) / 2;

    std::thread thread_1([this, left, mid]{ this-> recursiveSort(left, mid);});
    std::thread thread_2([this, mid, right]{ this-> recursiveSort(mid+1, right);});
    thread_1.join();
    thread_2.join();


    std::vector<int> result;
    int i = left;
    int j = mid + 1;

    while (i <= mid && j <= right) {
        if ((*nums)[i] <= (*nums)[j]) {
            result.push_back((*nums)[i]);
            i++;
        } else {
            result.push_back((*nums)[j]);
            j++;
        }
    }

    while (i <= mid) {
        result.push_back((*nums)[i]);
        i++;
    }

    while (j <= right) {
        result.push_back((*nums)[j]);
        j++;
    }

    for (int k = 0; k < result.size(); k++) {
        (*nums)[left + k] = result[k];
    }
}

void ParallelMergeSort::sort(){
   if((*nums).size()==0) {
        return;
   }
   std::thread thread_1([this] { this->recursiveSort(0, (*nums).size()-1);});
   thread_1.join();

}
Enter fullscreen mode Exit fullscreen mode

You can find the complete code at:- https://github.com/singhdevhub-lovepreet/parallelMergeSort

Note:- We have also used threshold of threads here, because depending on your machine our capacity of creating new threads might get exhaust. Hence, we have to make sure when you reach a low size array, sort that in normal manner, there is no need to spawn more threads

Now, you can look that we are delegating our work to different threads and waiting for them to get finish their work. Catch here is that because we dont have any shared resource where two threads can modify stuff at the same time.

Both left and right modifications in threads are independent of each other and hence we dont have to worry about synchronization premitives/acquiring a lock here.

The difference in sorting 10^7 elements is:-

Sorting benchmarking

That's it for today Guys. Follow for more.....

And If you have come so far, you can also subscribe to our newsletter:- https://www.serversidedigest.com/

Thanks ❤️

Top comments (1)

Collapse
 
green_apple_216c6bfb9dd4f profile image
GREEN APPLE

Ohh good one ✅