This article was co-authored by @richelle_123drc
Even though most programming languages ship with built-in sorting functions, it is still worth understanding the many different approaches to sorting, whether you are a beginner or an experienced programmer. Knowing the fundamentals of sorting algorithms helps you understand how they behave in different systems, gives you an educational foundation you can apply to other problems, and makes debugging and optimizing your programs easier. Since sorting data is such a common task, the ability to optimize our programs should be one of the goals we aim for. As programmers, carefully choosing a sorting algorithm that fits our purpose is one of the key decisions that determines how well our software performs.
How should we choose the best algorithm for our program?
First, understand the problem and its requirements; then observe and compare several candidate solutions; finally, assess and test the proposed solution. Here are some things you might need to consider:
Complexity
When your program needs to perform a simple task, it is advisable to use a straightforward algorithm that addresses the task directly. To compare algorithms by how efficiently they scale, it helps to understand Big-O notation.
What is Big-O Notation?
Big-O notation measures how an algorithm scales in the worst case as the input size increases, describing how the program's running time grows with larger datasets. In Big-O notation, "n" is commonly used to denote the input size. For instance, when comparing an algorithm with quadratic time complexity O(n²) to one with linear time complexity O(n), both with an input size of 1000, the quadratic algorithm performs roughly 1,000,000 operations while the linear algorithm performs about 1000.
Symbols and Ranking (from best to worst scaling):
1. O(1)
2. O(log n)
3. O(n)
4. O(n log n)
5. O(n²)
6. O(n!)
For an in-depth explanation, see: https://www.youtube.com/watch?v=XMUe3zFhM5c
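As a rough sketch of the difference described above, the snippet below simply counts how many times the innermost statement runs for a single loop versus a nested pair of loops over the same n. The counts are plain loop counters, not real benchmark timings.
In C,
// Rough illustration of O(n) vs O(n^2) growth by counting loop iterations
#include <stdio.h>
int main(void) {
long n = 1000;
long linear = 0, quadratic = 0;
// One pass over the data: O(n)
for (long i = 0; i < n; i++)
linear++;
// A nested pass for every element: O(n^2)
for (long i = 0; i < n; i++)
for (long j = 0; j < n; j++)
quadratic++;
// With n = 1000 this prints 1000 and 1000000
printf("n = %ld: linear = %ld operations, quadratic = %ld operations\n",
n, linear, quadratic);
return 0;
}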
Memory Allocation
Some algorithms demand a larger amount of memory while they run, which can cause complications on devices with lower memory capacity. In such cases it is advisable to use a more memory-efficient algorithm, such as one that sorts in place. Algorithms that allocate only the storage they actually need also leave more memory available to the rest of the system.
Security
Certain algorithms are susceptible to buffer overflow issues, making the program more vulnerable to security breaches.
Adaptability
With the variety of devices and operating systems that people use, algorithms ought to be able to adjust to different scenarios and still be user-friendly. Their adaptability improves their use on various platforms.
Type of Data
The data you wish to process also influences which algorithm is appropriate. Some algorithms do better with almost-sorted data, while others perform better with data that is randomly arranged or reverse-sorted.
What are some of the Common Sorting Algorithms?
Bubble Sort
This algorithm uses arrays and loops, providing a simple method of sorting. It serves as an excellent introductory tool for beginners, helping them grasp problem-solving concepts. Bubble Sort repeatedly loops over the array a number of times based on the number of elements, comparing adjacent pairs and swapping them if they are out of order. However, it is not recommended for larger datasets because it is unoptimized: it keeps repeating passes that are no longer necessary.
How does it work?
Given: [7, 2, 4]
We will have two passes of the outer loop, because the array has 3 elements and the number of passes is the size minus one.
- First Iteration/Cycle
Compare 7 and 2. Swap them because 2 is smaller.
Array becomes [2, 7, 4]
Compare 7 and 4. Swap them because 4 is smaller.
Array becomes [2, 4, 7]
- Second Iteration/Cycle
Compare 7 and 2. No swap needed.
Array remains [2, 4, 7]
Compare 7 and 4. No swap needed.
Array remains [2, 4, 7]
What did you observe?
Certain comparisons did not need to be carried out again: the array was already sorted after the first pass. This makes the algorithm less optimized, but since the given problem is simple and the dataset is small, Bubble Sort is still acceptable in this kind of situation. A common improvement, sketched below, is to stop early once a full pass makes no swaps.
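As one possible way to avoid the redundant passes observed above, the sketch below adds a swapped flag to the same Bubble Sort idea and exits as soon as a pass makes no swaps. The function name bubbleSortOptimized is just an illustrative choice; this is a variant of the full implementation shown later in this section, not a replacement for it.
In C,
// Bubble sort with an early-exit flag: stop when a full pass makes no swaps
#include <stdio.h>
#include <stdbool.h>
void bubbleSortOptimized(int array[], int size) {
for (int step = 0; step < size - 1; ++step) {
bool swapped = false;
for (int i = 0; i < size - step - 1; ++i) {
if (array[i] > array[i + 1]) {
int temp = array[i];
array[i] = array[i + 1];
array[i + 1] = temp;
swapped = true;
}
}
// If no swaps happened, the array is already sorted
if (!swapped)
break;
}
}
int main(void) {
int data[] = {7, 2, 4};
int size = sizeof(data) / sizeof(data[0]);
bubbleSortOptimized(data, size);
for (int i = 0; i < size; i++)
printf("%d ", data[i]); // prints: 2 4 7
printf("\n");
return 0;
}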
In Summary,
Pros | Cons |
---|---|
Great for Beginners | Not suitable for large datasets due to its O(n²) time complexity |
No additional memory storage needed | Lack of adaptability to partially sorted data. |
Applicable to simple programs for a small data set | Worst case: reverse sorted array |
In C,
// Bubble sort in C
#include <stdio.h>
// perform the bubble sort
void bubbleSort(int array[], int size) {
// loop to access each array element
for (int step = 0; step < size - 1; ++step) {
// loop to compare array elements
for (int i = 0; i < size - step - 1; ++i) {
// compare two adjacent elements
// change > to < to sort in descending order
if (array[i] > array[i + 1]) {
// swapping occurs if elements
// are not in the intended order
int temp = array[i];
array[i] = array[i + 1];
array[i + 1] = temp;
}
}
}
}
// print array
void printArray(int array[], int size) {
for (int i = 0; i < size; ++i) {
printf("%d ", array[i]);
}
printf("\n");
}
int main() {
int data[] = {-2, 45, 0, 11, -9};
// find the array's length
int size = sizeof(data) / sizeof(data[0]);
bubbleSort(data, size);
printf("Sorted Array in Ascending Order:\n");
printArray(data, size);
}
Taken from: https://www.programiz.com/dsa/bubble-sort
Insertion Sort
Another simple algorithm is Insertion Sort, which is likewise not suitable for large datasets. Like Bubble Sort, it uses loops and arrays.
How does it work?
Given: [6,4,7]
There will be two passes, the same as in Bubble Sort. Starting from the 2nd element, each element is moved leftward, pushing larger elements to the right, and stops as soon as the comparison condition no longer holds.
Assume that the first element [6] is in the correct position.
- First Loop
Start at the 2nd element [4]. Compare [4] with [6]; since 4 is smaller, shift 6 to the right and insert 4 in front of it.
Array becomes [4, 6, 7]
- Second Loop
Move to the next element which is [7].
Compare it to the element on its left, which is [6], and stop since [7] is greater.
Array becomes [4,6,7]
What did you observe?
Compared to Bubble Sort, it stops shifting as soon as the condition is no longer satisfied and moves on to the next iteration, requiring fewer steps when dealing with small or partially sorted datasets.
In Summary,
Pros | Cons |
---|---|
Simple | Not applicable for larger datasets |
Efficient for small datasets | Less efficient for linked lists |
Better compared to Bubble Sort | May lose stability if the comparison considers only the values and not the original order of equal elements. |
Adapts to partially sorted data | |
In C,
// Insertion sort in C
#include <stdio.h>
// Function to print an array
void printArray(int array[], int size) {
for (int i = 0; i < size; i++) {
printf("%d ", array[i]);
}
printf("\n");
}
void insertionSort(int array[], int size) {
for (int step = 1; step < size; step++) {
int key = array[step];
int j = step - 1;
// Compare key with each element on the left of it until an element smaller than
// it is found.
// For descending order, change key<array[j] to key>array[j].
// Check j >= 0 first so array[j] is never read at a negative index
while (j >= 0 && key < array[j]) {
array[j + 1] = array[j];
--j;
}
array[j + 1] = key;
}
}
// Driver code
int main() {
int data[] = {9, 5, 1, 4, 3};
int size = sizeof(data) / sizeof(data[0]);
insertionSort(data, size);
printf("Sorted array in ascending order:\n");
printArray(data, size);
}
Taken from: https://www.programiz.com/dsa/insertion-sort
Selection Sort
Selection Sort is a simple method that divides the input array into two sections: a sorted region and an unsorted region. As the name suggests, it selects an element, either the smallest or the largest, from the unsorted region and swaps it into place at the boundary of the sorted region. The process repeats: choose the smallest (or largest) element from the unsorted section and move it to the end of the sorted section.
How does it work?
Given: [2, 1, 5]
- First Loop
Find the minimum in the unsorted region [2, 1, 5], which is 1.
Swap 1 with the first element.
Array becomes [1, 2, 5]
- Second Loop
Find the minimum in the unsorted region [2, 5], which is 2.
Swap 2 with the first element in the unsorted region
Array is still [1, 2, 5]
In Summary,
Pros | Cons |
---|---|
Simple | Not suitable for larger datasets |
No additional memory storage needed | Lack of adaptability to partially sorted data. |
In C,
// C program for implementation of selection sort
#include <stdio.h>
void swap(int *xp, int *yp)
{
int temp = *xp;
*xp = *yp;
*yp = temp;
}
void selectionSort(int arr[], int n)
{
int i, j, min_idx;
// One by one move boundary of unsorted subarray
for (i = 0; i < n-1; i++)
{
// Find the minimum element in unsorted array
min_idx = i;
for (j = i+1; j < n; j++)
if (arr[j] < arr[min_idx])
min_idx = j;
// Swap the found minimum element with the element at arr[i]
if(min_idx != i)
swap(&arr[min_idx], &arr[i]);
}
}
/* Function to print an array */
void printArray(int arr[], int size)
{
int i;
for (i=0; i < size; i++)
printf("%d ", arr[i]);
printf("\n");
}
// Driver program to test above functions
int main()
{
int arr[] = {64, 25, 12, 22, 11};
int n = sizeof(arr)/sizeof(arr[0]);
selectionSort(arr, n);
printf("Sorted array: \n");
printArray(arr, n);
return 0;
}
Taken from: https://www.geeksforgeeks.org/selection-sort/
Merge Sort
As its name suggests, Merge Sort creates subarrays and merges them to produce the final sorted result. It divides the original array into two halves, sorts each half, and then merges the sorted halves back together.
How does it work?
Given: [5,3,7,4]
- Divide into Two Halves
Left Half: [5, 3]
Right Half: [7, 4]
- Sort Each Half
- For the left half [5, 3]:
Divide into Two Halves:
Left half: [5]
Right half: [3]
Sort Each Half:
Both halves are sorted individually since
they have only one element each.
Merge Sorted Halves:
Merge the sorted left ([5]) and right ([3]) halves
into a single sorted array: [3, 5]
- For the right half [7, 4]
Divide into Two Halves:
Left half: [7]
Right half: [4]
Sort Each Half:
Both halves are sorted individually since
they have only one element each.
Merge Sorted Halves:
Merge the sorted left ([7]) and right ([4]) halves
into a single sorted array: [4, 7]
- Merge the Final Sorted Halves
Merge the previously sorted left ([3, 5]) and right ([4, 7]) halves into a single sorted array
Array becomes [3, 4, 5, 7]
In Summary,
Pros | Cons |
---|---|
Works well with linked lists | Requires additional space |
Efficiently sorts a list in O(n log n) time | Implementing the algorithm can be more complex. |
Well-suited for large datasets. | Slower for small datasets |
Consistently performs well across different types of input data. | |
In C,
// C program for Merge Sort
#include <stdio.h>
#include <stdlib.h>
// Merges two subarrays of arr[].
// First subarray is arr[l..m]
// Second subarray is arr[m+1..r]
void merge(int arr[], int l, int m, int r)
{
int i, j, k;
int n1 = m - l + 1;
int n2 = r - m;
// Create temp arrays
int L[n1], R[n2];
// Copy data to temp arrays L[] and R[]
for (i = 0; i < n1; i++)
L[i] = arr[l + i];
for (j = 0; j < n2; j++)
R[j] = arr[m + 1 + j];
// Merge the temp arrays back into arr[l..r]
i = 0;
j = 0;
k = l;
while (i < n1 && j < n2) {
if (L[i] <= R[j]) {
arr[k] = L[i];
i++;
}
else {
arr[k] = R[j];
j++;
}
k++;
}
// Copy the remaining elements of L[],
// if there are any
while (i < n1) {
arr[k] = L[i];
i++;
k++;
}
// Copy the remaining elements of R[],
// if there are any
while (j < n2) {
arr[k] = R[j];
j++;
k++;
}
}
// l is for left index and r is right index of the
// sub-array of arr to be sorted
void mergeSort(int arr[], int l, int r)
{
if (l < r) {
int m = l + (r - l) / 2;
// Sort first and second halves
mergeSort(arr, l, m);
mergeSort(arr, m + 1, r);
merge(arr, l, m, r);
}
}
// Function to print an array
void printArray(int A[], int size)
{
int i;
for (i = 0; i < size; i++)
printf("%d ", A[i]);
printf("\n");
}
// Driver code
int main()
{
int arr[] = { 12, 11, 13, 5, 6, 7 };
int arr_size = sizeof(arr) / sizeof(arr[0]);
printf("Given array is \n");
printArray(arr, arr_size);
mergeSort(arr, 0, arr_size - 1);
printf("\nSorted array is \n");
printArray(arr, arr_size);
return 0;
}
Taken from: https://www.geeksforgeeks.org/merge-sort/
Quick Sort
Quick Sort, as its name suggests, is a fast and efficient sorting algorithm. Like Merge Sort, it uses a recursive "divide and conquer" method, but instead of merging sorted subarrays it partitions the array around a pivot so that, once every partition is sorted, the whole array is sorted in place.
How does it work?
As quick sort requires a lot of steps on paper, only a summarized view of the behavior will be shown here. For a more in-depth explanation, refer to https://www.youtube.com/watch?v=Vtckgz38QHs
Given: [8, 3, 6, 4, 7, 9, 1, 2, 5]
- Arrange into Two Halves
Using the final number (5) as the pivot, compare each element to the pivot: elements smaller than the pivot are swapped toward the front of the array (the first four positions here) using a temporary variable, while larger elements are left where they are. Finally, swap the pivot with the element at the 5th index so it sits between the two groups.
Left Half: [3, 4, 1, 2]
Right Half: [9, 6, 8, 7]
- Recursion
Recursively divide each half, or partitions as it’s properly called, into two sides with the same process as the first step.
Left Half: [1, 2, 3, 4]
Right Half: [6, 7, 8, 9]
- Combine the Two Sorted Halves
The two sorted halves already sit on either side of the first pivot (5) from the first step, so no extra merging work is needed.
Array becomes [1, 2, 3, 4, 5, 6, 7, 8, 9]
In Summary,
Pros | Cons |
---|---|
Best and average case time complexity of O(n log n) | Worst case complexity of O(n²) |
Space complexity of O(log n) | Is considered an unstable sorting algorithm |
In C,
#include <stdio.h>
void swap(int* a, int* b) { // Swap two elements
int t = *a; // Temporary variable to store the value of a
*a = *b;
*b = t;
}
int partition(int array[], int start, int end) {
int pivot = array[end]; // Setting the pivot element to the last element
int i = (start - 1); // Creating a variable to store the index for swapping the smaller than the pivot elements
for (int j = start; j <= end - 1; j++) {
if (array[j] <= pivot) {
i++;
swap(&array[i], &array[j]); // Swap the smaller than the pivot element with the element at the index i
}
}
swap(&array[i + 1], &array[end]); // Swap the pivot element with the element at the index i + 1
return (i + 1);
}
void quicksort(int array[], int start, int end) {
if (start < end) { // If the start index is less than the end index
int pivot = partition(array, start, end); // Get the pivot element
quicksort(array, start, pivot - 1); // Sort the left subarray
quicksort(array, pivot + 1, end); // Sort the right subarray
}
}
int main() {
int array[] = {5, 3, 7, 6, 2, 8, 4, 1, 9}; // 9 elements
int n = sizeof(array) / sizeof(array[0]); // Size of array
quicksort(array, 0, n - 1); // Sort the array
for (int i = 0; i < n; i++) {
printf("%d ", array[i]); // Print the sorted array
}
}
Derived from: https://www.youtube.com/watch?v=Vtckgz38QHs
Heap Sort
Derived from the Binary Heap data structure, Heap Sort is a comparison-based algorithm: it builds a max-heap, then repeatedly removes the root (the largest element) and places it at the end of the array. In that sense it is similar to Selection Sort.
How does it work?
Given: [4, 10, 3, 5, 1]
- Turn the Given Array into a Heap and Max-Heapify it
Treating the array as a binary tree and heapifying from the last parent upward gives the max-heap [10, 5, 3, 4, 1].
- Remove the Root Node and Place it at the End of the Array
Swap the root (10) with the last element, shrink the heap by one, and re-heapify the remaining elements: [5, 4, 3, 1, 10].
- Repeat the Process of Step 2
Each repetition moves the current largest element to the end of the unsorted region until only one element remains.
Array becomes [1, 3, 4, 5, 10]
In Summary,
Pros | Cons |
---|---|
Time complexity of O(n log n) in all cases | It is considered unstable |
Simple and memory efficient | Costly to run in practice compared to Quick Sort |
 | Inefficient when handling complex data |
In C,
// Heap Sort in C
#include <stdio.h>
// Function to swap the position of two elements
void swap(int* a, int* b)
{
int temp = *a;
*a = *b;
*b = temp;
}
// To heapify a subtree rooted with node i
// which is an index in arr[].
// n is size of heap
void heapify(int arr[], int N, int i)
{
// Find largest among root,
// left child and right child
// Initialize largest as root
int largest = i;
// left = 2*i + 1
int left = 2 * i + 1;
// right = 2*i + 2
int right = 2 * i + 2;
// If left child is larger than root
if (left < N && arr[left] > arr[largest])
largest = left;
// If right child is larger than largest
// so far
if (right < N && arr[right] > arr[largest])
largest = right;
// Swap and continue heapifying
// if root is not largest
// If largest is not root
if (largest != i) {
swap(&arr[i], &arr[largest]);
// Recursively heapify the affected
// sub-tree
heapify(arr, N, largest);
}
}
// Main function to do heap sort
void heapSort(int arr[], int N)
{
// Build max heap
for (int i = N / 2 - 1; i >= 0; i--)
heapify(arr, N, i);
// Heap sort
for (int i = N - 1; i >= 0; i--) {
swap(&arr[0], &arr[i]);
// Heapify root element
// to get highest element at
// root again
heapify(arr, i, 0);
}
}
// A utility function to print array of size n
void printArray(int arr[], int N)
{
for (int i = 0; i < N; i++)
printf("%d ", arr[i]);
printf("\n");
}
// Driver's code
int main()
{
int arr[] = { 12, 11, 13, 5, 6, 7 };
int N = sizeof(arr) / sizeof(arr[0]);
// Function call
heapSort(arr, N);
printf("Sorted array is\n");
printArray(arr, N);
}
Taken from: https://www.geeksforgeeks.org/heap-sort/
Radix Sort
An efficient sorting algorithm for integers and strings with fixed-length keys. It sorts items digit by digit in linear passes and comes in two variations: Least Significant Digit (LSD) and Most Significant Digit (MSD).
How does it work?
Radix Sort relies on a stable sub-sorting algorithm, such as Counting Sort, which is used in this example. Refer to https://www.youtube.com/watch?v=XiuSW_mEn7g.
Given: [712, 26, 61, 832, 512]
- Sort the Array by the 3rd (Ones) Digit of Each Number
Using the last digit of each number, order them from smallest to largest (Counting Sort).
Array becomes [61, 712, 832, 512, 26]
- Repeat the Process with the 2nd (Tens) Digit
Array becomes [712, 512, 26, 832, 61]
- Repeat the Process with the 1st (Hundreds) Digit
The number of passes is determined by the number with the most digits, here 3. Since some numbers have only 2 digits, we treat the missing leading digit as 0 (i.e. 026, 061).
Array becomes [26, 61, 512, 712, 832]
In Summary,
Pros | Cons |
---|---|
Linear Time Complexity of O(d(n+b)), wherein d is the number of digits, n is the number of elements, and b is the base system being used | Is not space-efficient |
Is stable, and in practice it can outperform comparison-based algorithms on suitable inputs | Needs to be modified to handle floating-point numbers, negative numbers, and strings |
Can handle multiple key-types | Becomes less efficient when range of input data exceeds the number of data points |
In C,
#include <stdio.h>
// Function to get maximum value in arr[]
int getMax(int arr[], int n) {
int mx = arr[0];
for (int i = 1; i < n; i++)
if (arr[i] > mx)
mx = arr[i];
return mx;
}
// A function to do counting sort of arr[] according to the digit represented by exp.
void countSort(int arr[], int n, int exp) {
int output[n]; // output array
int i, count[10] = {0};
// Store count of occurrences in count[]
for (i = 0; i < n; i++)
count[(arr[i] / exp) % 10]++;
// Change count[i] so that count[i] now contains actual position of this digit in output[]
for (i = 1; i < 10; i++)
count[i] += count[i - 1];
// Build the output array
for (i = n - 1; i >= 0; i--) {
output[count[(arr[i] / exp) % 10] - 1] = arr[i];
count[(arr[i] / exp) % 10]--;
}
// Copy the output array to arr[], so that arr[] now contains sorted numbers according to current digit
for (i = 0; i < n; i++)
arr[i] = output[i];
}
// The main function that sorts arr[] using Radix Sort
void radixsort(int arr[], int n) {
// Find the maximum number to know number of digits
int m = getMax(arr, n);
// Do counting sort for every digit. Note that instead of passing the digit number, exp is passed. exp is 10^i where i is current digit number
for (int exp = 1; m / exp > 0; exp *= 10)
countSort(arr, n, exp);
}
// A utility function to print an array
void print(int arr[], int n) {
for (int i = 0; i < n; i++)
printf("%d ", arr[i]);
printf("\n");
}
// Driver code
int main() {
int arr[] = {543, 986, 217, 765, 329};
int n = sizeof(arr) / sizeof(arr[0]);
// Function Call
radixsort(arr, n);
print(arr, n);
return 0;
}
Derived from: https://www.geeksforgeeks.org/radix-sort/
Built-in Sorting Algorithms
Programming languages have built-in sorting functions, such as the sort() function in Python, and behind every sorting function is a sorting algorithm. In the real world, these functions are what are mostly used instead of implementing the sorting algorithms yourself.
What are some examples of built-in sorting algorithms?
- Python
As mentioned earlier, Python has a sort() function, and the sorting algorithm behind it is called Timsort. Created by Tim Peters in 2002 for Python, Timsort is a hybrid sorting algorithm derived from Merge Sort and Insertion Sort, designed to handle real-world data well.
- Java
Java uses a dual-pivot Quick Sort for Arrays.sort() on arrays of primitive types, but because sorting objects calls for a stable algorithm, current Java versions use Timsort for Collections.sort() and Arrays.sort() on object arrays.
- C++
C++ has the function std::sort(), which uses a three-phase sorting algorithm that is a mixture of Quick Sort, Heap Sort, and Insertion Sort. This combination is called Introsort, and since all three algorithms are comparison-based, Introsort is also considered a comparison-based algorithm.
- C
C has a function called qsort(), which internally uses a Quick Sort algorithm, hence the name qsort(). It can handle any data type in C.
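To show what using the built-in function looks like in this article's language, here is a minimal sketch of calling the standard library's qsort() on an integer array. The comparator name compareInts is just an illustrative choice; the contract qsort() expects is a function returning negative, zero, or positive depending on how the two elements compare.
In C,
// Sorting an int array with the C standard library's qsort()
#include <stdio.h>
#include <stdlib.h>
// Comparison callback: negative if a < b, zero if equal, positive if a > b
static int compareInts(const void *a, const void *b) {
int x = *(const int *)a;
int y = *(const int *)b;
return (x > y) - (x < y); // avoids overflow that x - y could cause
}
int main(void) {
int data[] = {64, 25, 12, 22, 11};
size_t n = sizeof(data) / sizeof(data[0]);
qsort(data, n, sizeof(data[0]), compareInts);
for (size_t i = 0; i < n; i++)
printf("%d ", data[i]); // prints: 11 12 22 25 64
printf("\n");
return 0;
}
The same call works for any element type, as long as the comparison function knows how to interpret the pointers it receives.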
References
- https://www.youtube.com/watch?v=XMUe3zFhM5c
- https://www.programiz.com/dsa/bubble-sort
- https://www.programiz.com/dsa/insertion-sort
- https://www.geeksforgeeks.org/selection-sort
- https://www.geeksforgeeks.org/merge-sort/
- https://www.youtube.com/watch?v=Vtckgz38QHs
- https://www.geeksforgeeks.org/quick-sort/
- https://www.geeksforgeeks.org/heap-sort/
- https://studysmarter.co.uk/explanations/computer-science/algorithms-in-computer-science/radix-sort/#:~:text=The%20advantages%20of%20using%20Radix,data%20with%20many%20unique%20elements.
- https://www.youtube.com/watch?v=XiuSW_mEn7g
- https://www.geeksforgeeks.org/radix-sort/
- https://en.wikipedia.org/wiki/Timsort
- https://en.wikipedia.org/wiki/Introsort
- https://www.javatpoint.com/qsort-in-c#:~:text=qsort()%20is%20a%20pre,type%2C%20including%20strings%20and%20structures.
- https://www.digitalocean.com/community/tutorials/sort-in-c-plus-plus
- https://www.linkedin.com/pulse/java-collectionssort-arrayssort-under-hood-surinder-kumar-mehra/