loading...

Union-find (Disjoint-set)

jjb profile image JB ・3 min read

Resources:

  1. Youtube playlist
  2. Video explanation
  3. Princeton explanation & implementations
  4. Wikipedia article

Takeaways:

  • The union-find data structure (also known as disjoint-set) keeps track of elements that are partitioned into disjoint subsets (also called components).
  • When elements are initially added to union-find, they are all considered members of their own component. That means if we create a union-find with 3 elements it will also contain 3 components (each component being a single element). This is achieved by having each element specify itself as its parent (self-referencing).
  • Union-find has three main operations: MakeSet(), Union(), & Find().
    • MakeSet() takes an element, adds it to the underlying collection/tree, gives it either a rank of 0 or size of 1 (explained later on), and specifies a parent pointer to itself (which creates a new component).
    • Find() takes an element x and traverses it's parent chain. Find() tells us what component x is in by finding the root element of that component. If two elements have the same root, they are in the same component.
    • Union() takes two elements x & y and merges the components they belong to together. Union() leverages Find() to determine what components x & y are in. If the roots are the same, x & y are in the same component, and no action is taken. If the roots are different, this means x & y are in separate components. These components are merged by pointing the root of one of the components to the root of the other (one root becomes the parent of the other).
      • How do we determine which component is the one to merge into the other? By using rank or size mentioned earlier.
  • Union by size merges the smallest component (fewest elements) into the larger one.
  • Union by rank merges the component with the shorter tree into the taller one. Each componenet has a rank, which starts as 0. If two components are unioned and they have the same rank then the resulting component's rank is increased by 1. If the unioned components have a different rank, then the resulting component's rank is equal to the highest rank between the two.
    • We use rank instead of tree height or depth because of a techninque called path compression that changes the height/depth of the components over time
  • Path compression is utilized in the Find() operation to flatten a components tree. It does this by making every element of the component point to the components root (every element's parent is now the root of the component). This makes the Find() operation faster for that component, as it has a flatter chain to traverse when looking for a component's root.
  • Union-find is often implemented with either an array or using a more object-oriented approach. In my implementations I use an array.
  • Union-find is a useful data structure when used on graphs. Union-find can be used for things such as efficiently connecting vertices, finding connected components, & determining minimum spanning tree.
  • Using path compression, both union-find using rank or size have an amortized time complexity of O(n) for Union() & Find() operations. Space is O(n). Without path compression, both Union() & Find() operations have a complexity of O(log n).

Below are array based union-find implementations by size & rank:

As always, if you found any errors in this post please let me know!

Discussion

pic
Editor guide