Posted on Jan 16, 2020 • Edited on Jun 12, 2020

Searching Algorithms

#computerscience #algorithms #datastructures #csharp

Linear search

Resources

Linear search overview

Takeaways

Very simple. Most developers already know it.
Iterate over an input and for each element compare it with the target element, when the current element and the target are equal the search is complete.
Inefficient as it scans the entire input. O(n) (linear) time complexity, O(1) space.

Binary search

Resources

Takeaways

Only works on sorted inputs.
It is a divide & conquer algorithm.
Works by finding the midpoint of an input and comparing the middle element with the target element.
If the middle element is equal to our target element then the search is over.
If the target element is less than the middle element, then we search the left half of the input (and repeat the process of finding the middle and comparing).
If the target element is larger than the middle element, then we search the right half of the input (and repeat).
Binary search reduces the time complexity to O(log n) (logarithmic) - as we do not need to scan the entire input (like with linear search). Space is O(1).

Binary vs linear search visualization

Finding the K'th smallest/largest element

Resources

Video explaining K'th smallest/largest + approaches to solve

Takeaways

K'th smallest problem statement: Given an array and an integer k, where k is smaller than the size of the array, find the k’th smallest element in the array. For example, given [13,532,667,33] and k = 2 - the k'th smallest element will be 33 (second smallest).
K'th largest works in the same way, but for largest instead of smallest. So given the same input of [13,532,667,33] and k = 2 - the k'th largest element will be 532 (second largest).
There are three main ways to solve k'th largest/smallest:
- Sort the input array, then grab the desired element. For k'th smallest the index of our target element would be k-1 (as indexes are 0 based). For k'th largest the index would be input.Length - k (as we want to start from the end of the array). This approach is O(n log n) (linearithmic).
- Use a min/max-heap and add each element of the input to the heap. Every time we add a new element to the heap, check that the heap's size is less than, or equal to, k. If the heap's size is greater than k, remove the root element (smallest or largest, depending on the type of heap). After adding every element to the heap, and removing as we go along, we will be left with k items in the heap. The root will now be the target element we want to return. This approach is O(k log n) (at least, the way I've seen it commonly implemented - you can optimize beyond this though)
- Use Quickselect (a sort of modified/adapted version of Quicksort). This runs in O(n) in the average case. See below for more on quickselect.

Quickselect

Resources

I really encourage those unfamiliar with quicksort to first read the quicksort section in my previous post on common sorting algorithms. I won't be repeating everything about the partitioning/sorting process of the quicksort algorithm here. It will be very helpful to understand quicksort before trying to understand quickselect.

Takeaways

Quickselect is a divide & conquer algorithm.
Like quicksort, it was invented by Tony Hoare and is also known as Hoare's selection algorithm.
It's time complexity, like with quicksort, depends on the pivot selection. As I mentioned in my previous post, randomly selecting the pivot guards well against the worst case O(n^2) (quadratic) running time.
Another approach to selecting the pivot, which Sedgewick suggests is an improvement to random selection (see improvements section), is median-of-three.
Median-of-three pivot selection takes the first, middle, and last elements of an input and returns the index of the median. For example, given [2,54,5,3,11,33,1] the start, middle, end would be 2,3,1, and the median of those is 2. The index of the median is therefore 0 - which is returned as the index of the pivot.
Quickselect, similar to binary search, works by reducing the amount of elements we need to search.
Quickselect does this by borrowing quicksort's partitioning - effectively splitting the input into two virtual lists (two partitions).
After sorting (which happens during partitioning), elements left of the pivot's index are smaller than it, elements to the right are larger. If k is less than the index of the pivot, then the partitioning process is repeated for the left partition only. If k is larger, then the process is repeated for the right partition.
By only searching/partitioning one of two partitions each time, quickselect reduces the overall number of elements we search (similar to binary search).
A side effect of quickselect is that the input array ends up partially sorted (not fully, as each partition is not dealt with). This detail isn't incredibly important, but interesting to note.
As mentioned previously, to understand quickselect it really helps to understand quicksort.

Bonus

Introspective sort (introsort) - In my last post I mentioned that sorting algorithms will sometimes switch to insertion sort under the hood for small inputs (as insertion sort is fast for small inputs). Whilst learning about searching, I came across what this process is known as (as it relates to quicksort): introsort. Introsort is what Microsoft uses for C#'s Array.Sort. Introsort will begin sorting an input using quicksort. Introsort will use insertion sort for tiny inputs, and will switch to heapsort if the number of recursions (recursion depth) is greater than log^n. This is also the same as saying when the number of partitions is greater than 2 * log^n (remember that for each recursive call of quicksort we have 2 partitions). Here is the Wikipedia for introsort.
Median of medians - Another way to select a pivot for quickselect. Median of medians attempts to find a pivot as close to the true median as possible. It has overhead in calculating the pivot and is more complex than median-of-three, but can produce better pivots as a result. This is the best explanation of median of medians I have come across (slide-show). And here is the Wikipedia for median of medians

Below you will find implementations of all the searching algorithms discussed:

As always, if you found any errors in this post please let me know!