This article explains how to implement and use the Counting Sort algorithm. Since I am a Java developer, the examples would be using Java, but it is applicable to any other programing language. If you want to have a summary, just scroll to a summary section.
Suppose there is an array given [1, 3, 1, 5]
which needs to be sorted in ascending order. Pretty simple isn't it?
Counting Sort
Here is a pseudocode from wiki that demonstrates provides a basic algorithm:
function CountingSort(input, hi)
count ← array of hi + 1 zeros //1
for i = 0 to length(input) - 1 do //2
j = key(input[i])
count[j] = count[j] + 1
for i = 1 to hi do //3
count[i] = count[i] + count[i - 1]
output ← array of same length as input //4
for i = length(input) - 1 down to 0 do //5
j = key(input[i])
count[j] = count[j] - 1
output[count[j]] = input[i]
return output
Don't worry there will be an explanation for each individual step, but first, let's state some terms:
input
- is a given array.
output
- is a resulting sorted array.
count
- is an array that would be used for counting.
hi
- is the biggest element in the input
.
First part. Create a count array
Create the count
array filled with 0
(zeros) of hi + 1
length. It will store the count of each element placed at the number as an index. For instance, 3
from the input
array, the count of it's occurrences will be stored at the count[3]
cell.
Now, it is obvious why we need to know the hi
element from the input
array and why an extra cell is needed for 5th index.
int[] countingSort(int[] input, int hi) {
var count = new int[hi + 1]; //1
//Rest of the code
}
Java by default fills an array with zeros, but for other languages, there might be some handling to prepare the count
array.
Second part. Count number of elements
Next, using the count
array, all elements need to be counted by looping over the input
array, it will help us with the following steps. With [1, 3, 1, 5]
as the input
array, the result would be [0, 2, 0, 1, 0, 1]
.
Here is an example with Java:
int[] countingSort(int[] input, int hi) {
var count = new int[hi + 1]; //1
for (var n : input) { //2
count[n]++;
}
//Rest of code
}
Third Part. Calculate indexes
This time loop over the count
array to calculate indexes, where count[i] = count[i] + count[i - 1]
, except for the 0
indexed element, which stays unchanged. By doing this we would know where each element needs to be placed (see the following steps). For instance, for the input
array [1, 3, 1, 5]
with the counts [0, 2, 0, 1, 0, 5]
the indexes array would be [0, 2, 2, 3, 3, 4]
.
int[] countingSort(int[] input, int hi) {
var count = new int[hi + 1]; //1
for (var n : input) { //2
count[n]++;
}
for (var i = 1; i < count.length; i++) { //3
count[i] += count[i - 1];
}
//Rest of code
}
Fourth part. Create output array
Create the output
array, with the same length as original one. Nothing special here, but it will be important later on.
int[] countingSort(int[] input, int hi) {
var count = new int[hi + 1]; //1
for (var n : input) { //2
count[n]++;
}
for (var i = 1; i < count.length; i++) { //3
count[i] += count[i - 1];
}
var output = new int[input.length]; //4
//Rest of code
return output;
}
Fifth part. Fill output array
The [0, 2, 2, 3, 3, 4]
indexes array from the previous steps means the following: each element can be expressed as a range with the element itself as an exclusive end and the previous one as an inclusive beginning (zero-indexed element takes 0
as inclusive lo range):
- Element
0
is in a range[0, 0)
, zero repetitions - Element
1
is in a range[0, 2)
, two repetitions - Element
2
is in a range[2, 2)
, zero repetitions - Element
3
is in a range[2, 3)
, one repetition - Element
4
is in a range[3, 3)
, zero repetitions - Element
5
is in a range[3, 4)
, one repetition
At this point everything to form the output
array is available. Loop over the input
array inserting the elements from it into the output
array.
int[] countingSort(int[] input, int hi) {
var count = new int[hi + 1]; //1
for (var n : input) { //2
count[n]++;
}
for (var i = 1; i < count.length; i++) { //3
count[i] += count[i - 1];
}
var output = new int[input.length]; //4
for (var i = input.length - 1; i >= 0; i--) { //5
output[--count[input[i]]] = input[i];
}
return output;
}
This might be a little tricky, but nothing too special. At first, input[i]
returns the current element in the input
array. count[input[i]]
returns an exclusive end range for the element. For instance, for the number 5
it would be 4
. Then --
decrease it to 3 -> --count[input[i]]
with is inclusive now. Finally, set the element to a position of the output
array output[--count[input[i]]] = input[i]
. Looping backward helps to preserve it stable. There will be a paragraph about this term.
Possible improvements
OK, it is a good enough implementation, but there are a couple of improvements that could be made.
Check input
What if the input
is null or empty, or contains only one element? In such a case, there is nothing we could or should do.
int[] countingSortImproved(int[] input) {
if (input == null || input.length < 2) return input;
//The rest of code
}
Unknown max element
What if we don't know the hi
element in the array? Then the length of the count
array is unknown as well. Let's find it on our own.
int[] countingSortImproved(int[] input) {
if (input == null || input.length < 2) return input;
//safe to do, since we know that the length is at least 2
var hi = input[0];
for (var i = 1; i < input.length; i++) {
hi = Math.max(hi, input[i]);
}
var count = new int[hi + 1]; //1
//The rest of code
}
Already sorted input
What if the input
array is already sorted? It would be good to check it first to avoid a lot of manipulations.
int[] countingSortImproved(int[] input) {
if (input == null || input.length < 2) return input;
var hi = input[0];
var sorted = true;
for (var i = 1; i < input.length; i++) {
hi = Math.max(hi, input[i]);
sorted &= input[i - 1] <= input[i];
}
if (sorted) return input;
var count = new int[hi + 1]; //1
//The rest of code
}
Together with finding hi
a check can be performed to examine the current element if it is bigger or equal to the previous. &=
accumulates all measurements to sorted
. Only if all checks returned true
then the input
could be returned back as a result.
Big and/or negative numbers
So far so good, but what if there are negative numbers [-3, 2, 1]
? Or big numbers [1030-1, 1030-2, 1030]? The algorithm will fail for the first case, as there is no -3
index in the count array. In the second case, it will work but the count
array would be 1030 long.
To solve both issues a shift can be introduced. Let's take our lo
element and agree that it will be our 0
index.
The first case: -3 -> 0; 2 -> 5; 1 -> 4.
The second case: 1030-2 -> 0; 1030-1 -> 1; 1030-> 2
The updated code would look like this:
int[] countingSortImproved(int[] input) {
if (input == null || input.length < 2) return input;
var hi = input[0];
var lo = input[0];
var sorted = true;
for (var i = 1; i < input.length; i++) {
hi = Math.max(hi, input[i]);
lo = Math.min(lo, input[i]);
sorted &= input[i - 1] <= input[i];
}
if (sorted) return input;
//now the count would fit in the [lo, hi] range
var count = new int[hi - lo + 1]; //1
for (var n : input) { //2
count[n - lo]++; //transform the index, -3 or 10^30-2 to 0;
}
for (var i = 1; i < count.length; i++) { //3
count[i] += count[i - 1];
}
var output = new int[input.length]; //4
for (var i = input.length - 1; i >= 0; i--) { //5
//count index now input[i] - lo
output[--count[input[i] - lo]] = input[i];
}
return output;
}
The updated steps are 1, 3, and 5.
Combine step 4 and 5
OK, let's consider the following input
and count
arrays: [3, 2, 3, 5]
and [1, 2, 0, 1]
, lo = 2
. It is possible to loop over the count
array and add as many numbers as count for it's number. So even the original array is not needed anymore.
-
2
one time (index 0) -
3
two times, (index 1) -
4
zero times, (index 2) -
5
one time, (index 3)
[2, 3, 3, 5]
as a result. But be aware that this change makes the algorithm not stable. It is not important for the ints, but if some Objects are being sorted and it is critical to have original ones, this improvement will not work.
int[] countingSortImprovedAndCombined(int[] input) {
if (input == null || input.length < 2) return input;
var hi = input[0];
var lo = input[0];
var sotred = true;
for (var i = 1; i < input.length; i++) {
hi = Math.max(hi, input[i]);
lo = Math.min(lo, input[i]);
sorted &= input[i - 1] <= input[i];
}
if (sorted) return input;
var count = new int[hi - lo + 1]; //1
for (var n : input) { //2
count[n - lo]++;
}
var output = new int[input.length]; //4
var index = 0;
for (var i = 0; i < count.length; i++) { // 3 & 5 combined
for (var j = 0; j < count[i]; j++) {
output[index++] = i + lo;
}
}
return output;
}
Problems can be solved
Now let's consider some problems other than initial ascending order sorting with the counting algorithm.
Sort descending order
To sort in descending order, all that needs to be done is alter step 3 calculating indexes.
Original:
for (var i = 1; i < count.length; i++) { //3
count[i] += count[i - 1];
}
Changed:
for (var i = count.length - 2; i >= 0 ; i--) { //3
count[i] += count[i + 1];
}
Just counting indexes from the end of the count
array.
Finding max occurrence
The main point here is to find the number with max occurrence. In such a case all that we need is the count
array, making the algorithm even simpler.
int countingSortImprovedMaxOccurrence(int[] input) {
if (input.length < 2) return input[0];
var hi = input[0];
var lo = input[0];
for (var i = 1; i < input.length; i++) {
hi = Math.max(hi, input[i]);
lo = Math.min(lo, input[i]);
}
var count = new int[hi - lo + 1]; //1
var maxOccurrence = input[0];
for (var n : input) { //2
if (++count[n - lo] > count[maxOccurrence - lo]) {
maxOccurrence = n;
}
}
return maxOccurrence;
}
The first steps are the same and the 3rd, 4th, and 5th are not needed anymore. The small trick here is to find the maximum occurrence element during calculating counts and avoid an additional loop.
Removing duplicates
The task is simple, together with sorting required removing duplicates. The count
array can be modified to store boolean
as the count for an element always would be 1 if it appears at least once. In addition, the elements that appear more than once needs to be counted so that the output
array is created with the appropriate size.
int[] countingSortImprovedDistinct(int[] input) {
if (input == null || input.length < 2) return input;
var hi = input[0];
var lo = hi;
var sorted = true;
for (var i = 1; i < input.length; i++) {
hi = Math.max(hi, input[i]);
lo = Math.min(lo, input[i]);
sorted &= input[i - 1] <= input[i];
}
if (sorted) return input;
var count = new boolean[hi - lo + 1]; //1
var removed = 0;
for (var n : input) { //2
if (count[n - lo]) {
removed++;
} else {
count[n - lo] = true;
}
}
var output = new int[input.length - removed]; //4
var outputIndex = 0;
for (var i = 0; i < count.length; i++) { //5
if (count[i]) {
output[outputIndex++] = i + lo;
}
}
return output;
}
There are many other problems that can be solved with an adaptation of or modification of the algorithm.
Complexity and Characteristics
One of the biggest limitations is distribution of numbers. We have solved couple of issues related to it but in the following case using the counting sort is inefficient. [2147483647, -2147483648]
(Java's max and min numbers), although it is obvious that sorting is pretty simple but the count
array would be extremally huge and even will not fit into int
range. So that previous implementation would work only if the hi - lo
is in [0, 2147483647)
range.
The algorithm is stable unless steps 4 and 5 are combined. Stable in the context of the sorting algorithms means that the elements that originally were in the correct position will not be moved from that place during sorting. This could be important when mowing is an expensive operation. For instance, if books on a bookshelf are being sorted, it would be great not to move books already in their place.
It is not in-place by default, but if modified to combine steps 4 and 5 it will be in-place. In-place means that the original array will be modified to supply the result. Since for the result a new array is being created, it is not fit. If take the bookshelf analogy, a new bookshelf would be built and filled with books from the old one.
The Space Complexity of the algorithm is O(n+k)
, where k
is the size of the count array and n
is the size of the original array. The implementation requires having the count array and the result array making the major space waisted on. If steps 4 and 5 are combined, then O(k)
, since no more need for the result array.
The Time Complexity is again O(n+k)
. First, there is a loop over the count array to create it and set 0
to each element - k
operations. Then loop over the input array to count elements - n
operations. Next loop over the count array, to calculate indexes - k
operations. n
operations to create the result array. And finally, loop over the result array to fill it with sorted values - n
operations. As a result there are k + n + k + n + n = 3n + 2k
and simplified to O(n+k)
.
Summary
The Counting Sort is one of the best sorting algorithms with O(n+k)
time complexity and O(n+k)
space complexity. It consists of steps: create count array, count elements in the input, loop over count array to calculate indexes count[i] += count[i-1]
, look over input array in backward to fill output array decreasing index.
It is stable, but not in-place. Could be modified to be in-place, but not stable. Inefficient if hi - lo
is significantly bigger than number of elements.
Links
https://en.wikipedia.org/wiki/Counting_sort
https://www.bigocheatsheet.com/
https://en.wikipedia.org/wiki/Category:Stable_sorts
https://en.wikipedia.org/wiki/In-place_algorithm
https://www.amazon.com/Algorithms-Java-Parts-1-4-Pts-1-4/dp/0201361205
https://www.amazon.com/Introduction-Algorithms-fourth-Thomas-Cormen/dp/026204630X/ref=sr_1_1
Top comments (1)
mind blowing
Very nicely explained!