# Union Find | Disjoint Set Union Data Structure Introduction

Prereq: Depth First Search Review

Once we have a strong grasp of recursion and Depth First Search we now introduce Disjoint Set Union (DSU).

For this data structure we are motivated by the following problem. Suppose we have sets of elements and we are asked to check if
a certain element belongs to a particular set. In addition, we want to have our data structure support updates as well through
merging two sets into one set. One might think to use a list of hashsets in order
to accomplish this but our end goal is a data structure that can handle the merge and query operations both in effectively `O(1)`

time, the merge
operation will take O(n) time as we must add each element individually. Therefore, we support the following operations

- We want to be able to query for the Set ID of a given node (find operation)
- We want to merge two disjoint sets into one set (union operation)

We can imagine the disjoint set data structure as a series of trees such that a particular element within a tree belongs solely to that set and no other set. The following graphic illustrates this idea.

Now what do we mean by set ID and elements. The elements are the particular nodes of the tree that belong to that set and
we nominate a particular node to be the parent of all the nodes which will act as an identifier. We know that if two nodes
share this same parent they must therefore belong to the same set. Furthermore, if they donβt share the same parent they donβt
belong in the same set. In this example we have Set ID 0 and 4 which means nodes 0 and 4 act as the parents of the set. We can accomplish
this with a singular Hash Map where key i in the Hash Map represents node iβs parent node. Initially, we set every node's parent to itself
as every node is in a set by itself. We can then merge two sets by setting one node's parent to the other node's parent.
We can find what the Set ID node is by recursively moving up the chain of parents to find the parent which points back to itself. The
following code accomplishes a Union operation in `O(1)`

and a Find operation that has best case `O(1)`

, average case `O(log(n))`

since we have
a randomized trees which have average depth `O(log(n))`

and a worst case of `O(n)`

for a maximum depth tree.

```
1public static class UnionFind<T> {
2 // initialize the data structure that maps the node to its set ID
3 private HashMap<T, T> id = new HashMap<>();
4
5 // find the set ID of Node x
6 public T find(T x) {
7 // get the value associated with key x, if it's not in the map return x
8 T y = id.getOrDefault(x, x);
9 // check if the current node is a Set ID node
10 if (y != x) {
11 // set the value to Set ID node of node y
12 y = find(y);
13 }
14 return y;
15 }
16
17 // union two different sets setting one Set's parent to the other parent
18 public void union(T x, T y) {
19 id.put(find(x), find(y));
20 }
21}
22
```

```
1class UnionFind:
2 # initialize the data structure that maps the node to its set ID
3 def __init__(self):
4 self.id = {}
5
6 # find the Set ID of Node x
7 def find(self, x):
8 # get the value associated with key x, if it's not in the map return x
9 y = self.id.get(x, x)
10 # check if the current node is a Set ID node
11 if y != x:
12 # set the value to Set ID node of node y
13 y = self.find(y)
14 return y
15
16 # union two different sets setting one Set's parent to the other parent
17 def union(self, x, y):
18 self.id[self.find(x)] = self.find(y)
19
```

```
1class UnionFind {
2 // initialize the data structure that maps the node to its set ID
3 constructor() {
4 this.id = new Map();
5 }
6
7 // find the Set ID of Node x
8 find(x) {
9 let y = this.id.has(x) ? this.id.get(x) : x;
10 // check if the current node is a Set ID node
11 if (y !== x) {
12 y = this.find(y);
13 }
14 return y;
15 }
16
17 // union two different sets setting one Set's parent to the other parent
18 union(x, y) {
19 this.id.get(this.find(x)) = this.find(y);
20 }
21}
22
```

```
1template <class T> class UnionFind {
2private:
3 std::unordered_map<T, T> id;
4
5public:
6 T find(T x) {
7 T y = id.count(x) ? id[x] : x;
8 if (y != x) {
9 y = find(y);
10 id[x] = y;
11 }
12 return y;
13 }
14
15 // add trailing _ to avoid name collision with C++ keyword "union"
16 void union_(T x, T y) {
17 id[find(x)] = find(y);
18 }
19};
20
```

```
1public class UnionFind<T> {
2 private Dictionary<T, T> id = new Dictionary<T, T>();
3
4 // find the Set ID of Node x
5 public T Find(T x) {
6 T y = id.GetValueOrDefault(x, x);
7 //check if the current node is a Set ID node
8 if (!y.Equals(x)) {
9 y = Find(y);
10 }
11 return y;
12 }
13
14 // union two different sets setting one Set's parent to the other parent
15 public void Union(T x, T y) {
16 id[Find(x)] = Find(y);
17 }
18}
19
```

Disjoint Set Union is also called **Union Find** because of its two operations - union and find.

## Tree Compression Optimization

Now that we have a general idea of the data structure and how it is implemented, we now introduce an optimization. We can imagine there may be scenarios where our tree is not particularly balanced. Therefore, while recursing up the tree we make sure to store the Set ID node value and while moving back down the recursive stack set every node to the Set ID value. Here is a graphic to demonstrate this idea and should be a good visual indication of why this technique is referred to as tree compression as we eventually reach a tree with depth of 2 after querying every node.

While we try to find the parent node of node id 8 we set every node along the path to the Set ID node which is 5 which means later queries within our data structure only take O(log(n)) time.
This technique is called tree compression and allows us to achieve amortized `O(log(n))`

time complexity.
As a reminder, amortized time complexity is referring to the time complexity over a large number of operations.
We eventually get worst, average and best case of `O(log(n))`

over a large number of queries which allows us to avoid the worst case of `O(n)`

every operation.

```
1public static class UnionFind<T> {
2 // initialize the data structure that maps the node to its set ID
3 private HashMap<T, T> id = new HashMap<>();
4
5 // find the Set ID of Node x
6 public T find(T x) {
7 // Get the value associated with key x, if it's not in the map return x
8 T y = id.getOrDefault(x, x);
9 // check if the current node is a Set ID node
10 if (y != x) {
11 // set the value to Set ID node of node y
12 y = find(y);
13 // change the hash value of node x to Set ID value of node y
14 id.put(x, y);
15 }
16 return y;
17 }
18
19 // union two different sets setting one Set's parent to the other parent
20 public void union(T x, T y) {
21 id.put(find(x), find(y));
22 }
23}
24
```

```
1class UnionFind:
2 # initialize the data structure that maps the node to its set ID
3 def __init__(self):
4 self.id = {}
5
6 # find the Set ID of Node x
7 def find(self, x):
8 y = self.id.get(x, x)
9 # check if the current node is a Set ID node
10 if y != x:
11 # set the value to Set ID node of node y, and change the hash value of node x to Set ID value of node y
12 self.id[x] = y = self.find(y)
13 return y
14
15 # union two different sets setting one Set's parent to the other parent
16 def union(self, x, y):
17 self.id[self.find(x)] = self.find(y)
18
```

```
1class UnionFind {
2 // initialize the data structure that maps the node to its set ID
3 constructor() {
4 this.id = new Map();
5 }
6
7 // find the Set ID of Node x
8 find(x) {
9 let y = this.id.has(x) ? this.id.get(x) : x;
10 // check if the current node is a Set ID node
11 if (y !== x) {
12 // set the value to Set ID node of node y
13 y = this.find(y);
14 // change the hash value of node x to Set ID value of node y
15 this.id.set(x, y);
16 }
17 return y;
18 }
19
20 // union two different sets setting one Set's parent to the other parent
21 union(x, y) {
22 this.id.set(this.find(x), this.find(y));
23 }
24}
25
```

```
1public class UnionFind<T> {
2 private Dictionary<T, T> id = new Dictionary<T, T>();
3
4 // find the Set ID of Node x
5 public T Find(T x) {
6 T y = id.GetValueOrDefault(x, x);
7 //check if the current node is a Set ID node
8 if (!y.Equals(x)) {
9 y = Find(y);
10 // change the hash value of node x to Set ID value of node y
11 id[x] = y;
12 }
13 return y;
14 }
15
16 // union two different sets setting one Set's parent to the other parent
17 public void Union(T x, T y) {
18 id[Find(x)] = Find(y);
19 }
20}
21
```

## Union by Rank (Advanced, Optional)

Can we improve this even more though?
We have already discussed tree compression to optimize our later queries but there actually does exist a way to improve the time complexity.
This uses a technique called union by rank where we assign ranks to our nodes where the ranks represent the relative depths of our trees.
Each time we merge 2 sets we always set the parent of the node with the smaller ranks to that of the larger rank and update ranks.
This technique improves our `O(log(n))`

algorithm to that of `O(alpha(n))`

where `alpha(n)`

represents the inverse Ackermann Function which grows very slowly relative to `n`

.
The proof for the time complexity is a bit complicated so we will not touch upon it here.
It should be noted that since the inverse Ackermann Function grows so slowly it will never practically speaking exceed `O(4)`

effectively making it `O(1)`

but precisely speaking the time complexity is defined by `O(alpha(n))`

.

As a final note, in most cases using union by rank is likely not necessary as `O(log(n))`

is likely to be sufficiently fast for most cases but this is still a good trick to know for cases where it is required.
Now our final code taking into account union by rank,

```
1public static class UnionFind<T> {
2 private HashMap<T, T> id = new HashMap<>();
3 private HashMap<T, Integer> rank = new HashMap<>();
4
5 // find the Set ID of Node x
6 public T find(T x) {
7 // Get the value associated with key x, if it's not in the map return x
8 T y = id.getOrDefault(x, x);
9 // check if the current node is a Set ID node
10 if (y != x) {
11 // set the value to Set ID node of node y
12 y = find(y);
13 // change the hash value of node x to Set ID value of node y
14 id.put(x, y);
15 }
16 return y;
17 }
18
19 // union two different sets setting one Set's parent to the other parent
20 public void union(T x, T y) {
21 // check if keys exist in our rank map; if not, add them
22 if (!rank.containsKey(find(x))) rank.put(find(x), 0);
23 if (!rank.containsKey(find(y))) rank.put(find(y), 0);
24 if (rank.get(find(x)) < rank.get(find(y))) {
25 id.put(find(x), find(y));
26 }
27 else {
28 id.put(find(y), find(x));
29 // if rank is the same then we update x rank and increment by 1
30 // we make y's parent equal to x's parent, so x has increased depth
31 if (rank.get(find(x)) == rank.get(find(y))) {
32 rank.put(find(x), rank.get(find(x)) + 1);
33 }
34 }
35 }
36}
37
```

```
1class UnionFind:
2 def __init__(self):
3 self.id = {}
4 self.rank = {}
5
6 # find the Set ID of Node x
7 def find(self, x):
8 # Get the value associated with key x, if it's not in the map return x
9 y = self.id.get(x, x)
10 # check if the current node is a Set ID node
11 if y != x:
12 # change the hash value of node x to Set ID value of node y
13 self.id[x] = y = self.find(y)
14 return y
15
16
17 # union two different sets setting one Set's parent to the other parent
18 def union(self, x, y):
19 # check if keys exist in our rank map; if not, add them
20 if self.find(x) not in self.rank:
21 self.rank[self.find(x)] = 0
22 if self.find(y) not in self.rank:
23 self.rank[self.find(y)] = 0
24 if self.rank[self.find(x)] < self.rank[self.find(y)]:
25 self.id[self.find(x)] = self.find(y)
26 else:
27 self.id[self.find(y)] = self.find(x)
28 # if rank is the same then we update x rank and increment by 1
29 # we make y's parent equal to x's parent, so x has increased depth
30 if self.rank[self.find(x)] == self.rank[self.find(y)]:
31 self.rank[self.find(x)] = self.rank[self.find(x)] + 1
32
```

```
1class UnionFind {
2 constructor() {
3 this.id = new Map();
4 this.rank = new Map();
5 }
6
7 // find the Set ID of Node x
8 find(x) {
9 // get the value associated with key x, if it's not in the map return x
10 let y = this.id.has(x) ? this.id.get(x) : x;
11 // check if the current node is a Set ID node
12 if (y !== x) {
13 // change the hash value of node x to Set ID value of node y
14 y = this.find(y);
15 this.id.set(x, y);
16 }
17 return y;
18 }
19
20 // union two different sets setting one Set's parent to the other parent
21 union(x, y) {
22 // check if keys exist in our rank map; if not, add them
23 if (!this.rank.has(this.find(x))) {
24 this.rank.set(this.find(x), 0);
25 }
26 if (!this.rank.has(this.find(y))) {
27 this.rank.set(this.find(y), 0);
28 }
29 if (this.rank.get(this.find(x)) < this.rank.get(this.find(y))) {
30 this.id.set(this.find(x), this.find(y));
31 } else {
32 this.id.set(this.find(y), this.find(x));
33 // if rank is the same then we update x rank and increment by 1
34 // we make y's parent equal to x's parent, so x has increased depth
35 if (this.rank.get(this.find(x)) == this.rank.get(this.find(y))) {
36 this.rank.set(this.find(x), this.rank.get(this.find(x)) + 1);
37 }
38 }
39 }
40}
41
```

## Do I Need to Know or Implement Union Find for the Interview?

It's slightly more advanced. For implementation, knowing the version with Tree Compression Optimization is enough. No interviewer would expect you to write union by rank version in the short interview time.

Still not clear?Β SubmitΒ the part you don't understand to our editors.