QuickSelect is a selection algorithm, to understand it better, let's solve a Facebook interview question.
A sneak peek of what we're trying to achieve :
112ms : quicksort ,
56ms : quickselect.
We are going to double the speed!
Question: Find Kth Largest Element in an Array
Eg : Input: [3,2,3,1,2,4,5,5,6] and k = 4
Output: 4
Let's start from a brute force approach to optimized quickselect and at each step, you'll learn better about quicksort and quickselect algorithms.
Brute Force
The brute force approach would be to sort the array, select the kth largest element.
I am going to discuss quicksort algorithm, if you know it then feel free to skip this part.
Quick Sort in layman terms
Quick Sort consists of three parts.
1> Divide the array at an index, it can be anything within the range which we call as a pivot.
2> Group all elements less than the pivot index element to the left of the pivot and all elements greater than the pivot index to the right of the pivot.
3> Perform steps 1 and 2 on the two sub-arrays.
Here, we are selecting the element at the end as the pivot to make our lives easier.
function partition(arr,low,high){
let pivot = arr[high];
let idx = low;
for(let j=low;j<high;j++){
if(arr[j]<pivot){
swap(arr,idx,j);
idx++;
}
}
swap(arr,idx,high);
return idx;
}
function swap(arr,i,j){
let temp = arr[i];
arr[i] = arr[j];
arr[j] = temp;
}
function quickSort(arr,low,high){
if(low<high){
let part = partition(arr,low,high);
quickSort(arr,low,part-1);
quickSort(arr,part+1,high);
}
}
let arr = [-11,6,-4,5,2,0,12,5,-42];
quickSort(arr,0,arr.length-1);
console.log(arr);
If you're interested in reading more about quicksort go here.
Back to the main question, to get the Kth largest element, we will just apply quicksort on the array and find the Kth largest element on the sorted array which would be
KthLargest = arr.length - k
But quicksort has a major flaw, it runs at an average of O(nlogn) and has worst case of O(n^2). So how do we improve this?
Priorty Queues / Min Heap
Since Priority Queue in itself is a huge topic, if you're interested in its working, please go to this article
To get a gist, we implement a min heap using priority queue, why min heap?
Tip : if you're asked to find the Kth largest element then use min heap and kth smallest element the implement max heap.
So basic idea is that since min heap always gives us the minimum value,
1 > Use min heap of size K.
2 > Add elements to min heap.
3 > At each step, if the size of min heap exceeds K .
4 > Pop from min heap, ie the minimum value in heap, after parsing all the elements, we will have a heap with size K.
5 > But also we will have the Kth largest element at root since all the elements less than Kth largest were already popped out and all the elements larger than the Kth largest are present after the root, so popping the root will give us Kth largest element.
The guarantee that the algorithm will always work in O(nlogn) is a huge bump but uses O(nlogn) space.
So let's optimize it to O(n).
Quickselect
Let's first understand what does Quicksort actually accomplishes.
In one sentence, quicksort finds the "right" position for the current pivot index element.
Think about it, we perform quicksort in the following order, select a pivot, shift all element less than pivot to left and shift all elements greater than pivot to right, so essentially we're sandwiching the pivot element to its right place.
To visualize :
consider you're making a: '🥪'
this is what it looks when in proper order :
['🍞','🧀','🥬','🧈','🧅','🥩','🍅','🍞']
and currently, you've ingredients in the following order :
['🥬','🍅','🧀','🍞','🥩','🧈','🍞','🧅']
from the quicksort algorithm, selecting '🧅' as the pivot,
after the first iteration, the order will be :
['🥬','🧀','🍞','🧈','🧅','🍅','🥩','🍞']
So '🧅' is now on its correct position in the sandwich and
we won't disturb its position again.
In short, we found the index where '🧅' must be.
Then we continue the same on the left and right side.
So what's with quickselect?
In quicksort we were sorting the entire array, in quickselect we shall sort only a partial array. How? let's see.
Going back to the quicksort algorithm, we perform a step where we partition the array, get the pivot index, and then perform quicksort on subarray left and right of the pivot index.
We shall use this pivot index for our advantage and we do it as follows :
if(pivot+1 == k) return nums[pivot] //since index 0 is 1st element
if the pivot is the Kth element, return the pivot element.
if(pivot < k) return quicksort(nums, k, pivot+1, high)
if the pivot index is less than Kth index, we perform quicksort only on the right subarray.
else return quicksort(nums,k,low,pivot-1)
else perform quicksort only on the left subarray.
Putting it together:
let arr = [0,1,2,3,0];
function partition(arr,low,high){
let pivot = arr[high];
let idx = low;
for(let i=low;i<high;i++){
if(arr[i]<=pivot){
let temp = arr[i];
arr[i] = arr[idx];
arr[idx] = temp;
idx++;
}
}
let temp = arr[idx];
arr[idx] = arr[high];
arr[high] = temp;
return idx;
}
function quickSelect(arr,low,high,k){
if(low>high) return;
let pivot = partition(arr,low,high);
if(pivot+1 == k){
return part;
}
if(pivot<k){
return quickSelect(arr,pivot+1,high,k);
}else{
return quickSelect(arr,low,pivot-1,k);
}
}
let res = quickSelect(arr,0,arr.length-1,4);
console.log(arr[res]);
But this algorithm suffers from problems of quicksort, ie what if the array is already sorted? In that case our algorithm will work in O(n^2) instead of O(n).
So how to optimize this even further?
We have to choose a good pivot index to ensure O(n) average runtime. And the best way to do this is to randomize the pivot.
So instead of choosing the last element as pivot, we choose a random pivot.
Updated code :
//nums.length-k because we want Kth largest element.
var findKthLargest = function(nums, k) {
return quickSelect(nums,nums.length-k,0,nums.length-1);
};
function quickSelect(nums,k,low,high){
// base case optimization
if(low == high){
return nums[low];
}
// pivot index
let pivot = partition(nums,low,high);
if(pivot == k) return nums[pivot];
if(pivot<k){
return quickSelect(nums,k,pivot+1,high);
}else{
return quickSelect(nums,k,low,pivot-1);
}
}
function partition(nums,low,high){
// using mid as random index since native JS randomization was taking
//too long
let idx = Math.floor(low+(high-low)/2);
//swap idx with low and set it as pivot element
swap(nums,low,idx);
let pivot = low;
low++;
// perform swapping
while(low<=high){
while(low<=high && nums[low] < nums[pivot])
low++;
while(low<=high && nums[high] >= nums[pivot])
high--;
if(low>high){
swap(nums,pivot,high);
return high;
}
swap(nums,low,high);
}
return high;
}
I was amazed by its performance :
112 ms : Quicksort
56 ms : Quickselect
When used native javascript randomization, it would be awesome if someone explains why performance suffers in this case :
Now you know why companies like Facebook asks this, they want you to come up with such crazy algorithms why will make things work faster.
Even though we were able to achieve O(n) average, we can't be sure if it will be O(n) in the worst case even with randomization, so if the interviewer asks which one to go for, Quikselect or minheap, use that as a topic to discuss both approaches in length to get brownie points :P
I hope you understood and liked my article.
Please comment if I messed up somewhere or you've a better optimized approach to the same !
github: https://github.com/AKHILP96/Data-Structures-and-Algorithms/blob/master/Algorithm/quickSelect.js
Latest comments (2)
Good article. Just to inform you, you have typos in the headings Visualization.
Thanks for pointing out : )