ruthmoog

Posted on Jul 24, 2023 • Edited on Sep 3, 2023

Book Club: Grokking Algorithms. 4: Quicksort, Hash tables, and Breadth-first search

#discuss #books #algorithms #bookclub

Grokking Algorithms An illustrated guide for programmers and other curious people, by Aditya Y. Bhargava

Chapter 4: Quicksort

Chapter 5: Hash tables

Chapter 6: Breadth-first search

If you read any of these chapters, please go ahead and discuss what you thought in the comments!

📝 you can read my notes below.

📆 The next post will be on chapters 7 - 8. I'll aim for August 2023.
If you're on Storygraph and want to join me, say hi and I'll start a Buddy Read.

Quicksort

key topics in this chapter: base case, recursive case, divide and conquer, big O notation, average vs worst case...

Base and recursive cases for binary search

A base case is the simplest scenario for an algorithm.

The base case for binary search is an array with 1 item, that is true (matches) or false (doesn't match).
The recursive case for binary search, is to split the array in half, then call binary search on the true half.

Average case vs. worst case

The pivot you choose will impact quicksort's performance!

cake, copyright Manning Publications, drawn by adit.io

A worst case scenario is a pivot of 1 on a sorted scenario O_(n)
It's much faster to choose the middle as the pivot, which is the best case scenario O_{(log n)}
The best case scenario is the average case! If you always choose a random element in an array as the pivot, quicksort will complete in _O_{(n log n)} time (in average)

Hash Tables

Key topics covered Common hash table use cases, hash functions, caching, collisions, load factor...

Hash functions

A function where you put in a sequence of data and you get back a number: it maps 'strings' to numbers
- the number you get back is consistent
- no two 'strings' are mapped to the same number

Hash tables

This data structure has logic behind it, unlike an array which maps straight to memory. A hash function and an array together make a hash table.

some examples from standard code libraries...

Use cases

Contacts/phonebook
DNS resolution (mapping a web address to an IP address)
Limiting survey submissions to one per user
Caching (websites remembering data instead of recalculating it)

Performance

On average, hash tables take O₍₁₎ which is constant time; the time taken will stay the same, regardless of the size of the table.
In the worst case, they take linear time for everything, O_(n), which is really slow.

A collision is when two keys are assigned to the same slot. If this happens, start a linked list in that slot. To avoid collisions, you need a low load factor and a good hash function.

Load factor measures how many empty slots remain in the hash table. Use resizing when the table is getting full, e.g. when the load factor is >0.7

load factor = ^{number of items in hash table}/_{total number of slots}

Breadth-first Search

This chapter covers graphs, network modelling, data structures, breadth-first search (BFS), queues, directed graphs, undirected graphs, topological sort, dependencies between nodes...

What is a graph?

A graph models a set of connections, it's made up of nodes and edges, where connected nodes are called neighbors.

Breadth-first Search (BFS)

BFS is an algorithm to solve the "shortest-path" problem (e.g. how to get from A to B with the fewest transport changes). To use BFS you need to,

Model a problem as a graph
Solve the problem with BFS

BFS can answer

Is there a path from node A to node B? (Search all 1st degree neighbors, then all 2nd degree connections, etc...)
What is the shortest path from node A to node B? (Searching all connections in order of their degree to the starting node, will find the shortest path)

Queues

Are have a "first in, first out" structure (FIFO) structure and have only two operations, enqueue and dequeue

Implementing the graph

A hash table data structure allows you to represent relationships, by mapping a node (the key) to all its neighbors (the value).

An undirected graph doesn't have any arrows, and a directed graph does.
Making an ordered list out of a graph is a topological sort.
If all directions are one-way, the graph is a tree.
nodes with shared neighbors will cause an infinate loop, so you need to check off nodes that have been processed.

Implementing the algorithm

Code example in Python

from collections import dequeue
queue = dequeue() # create a queue
queue += graph["Ruth"] # add all my neighbors to the queue

searched[] # keep track of searched nodes

# While the queue isnt empty
while queue:
   person = queue.popleft() # pop a node from the queue
   if not name in searched : 
      if friend_match_criteria(name) :
         print name + " matches the criteria!"
         return True
      else:
         queue += graph[name] # enqueue all the node's neighbors to the queue
         searched.append(name) # check off this node so it's not re-processed
return False # no friends met the criteria :(

def friend_match_criteria(name) :
   return name[-1] == '!' # checks if their name ends in an exclaimation

Running time

The running time will be at least O_{(number of edges)}
Adding a neighbor node to the queue takes constant time O₍₁₎, so doing this for every node or vertice will be O_{(number of vertices)}
Breadth-first search takes O_{(number of vertices + number of edges)} which is written as O_(V+E)

Key Takeaways

"Divide and conquer" in algorithms land is breaking a problem into smaller & smaller chunks
- the base case is probably an empty array/array with 1 element
Choose a random element for quicksort's pivot since the average runtime of quicksort is O_{(n log n)}
Sometimes the constant in Big O matters
but it almost never matters for simple Vs binary search
Using a lang's hash table will give you average case performance of constant time (the same amount of time regardless of the map size)
Collisions are bad, a good hash function will avoid them
Hashes are good for
- Catching duplicates
- Caching data
- Mapping a key to a value
- Fast search, insert, and delete behaviours
Breadth-first search (BFS) tells you if there's a path from A to B, and the shortest path
- the search list needed to be a queue
- nodes should not be re-checked (to avoid infinite looping)
An undirected graph has no arrows and goes both ways. A directed graph has arrows. If all arrows go one-way, the graph is a tree
Queues are FIFO (first in, first out) and stacks are LIFO (last in, first out)

DEV Community