Topological Sort | Topological Order

Prereq: Breadth First Search Review

What is a topological order?

Topological sort or topological ordering of a directed graph is an ordering of nodes such that every node appears in the ordering before all the nodes it points to. Topological sort is not unique. For example, for a graph like this

Both [4, 5, 2] and [5, 4, 2] are valid topological sort.

And for the following graph:

Task 3 is completely independent of task 2 and 4, and it can be anywhere in the order as long as it is before task 1 which depends on it. All of the following ordering are valid topological orderings.

[4, 2, 3, 1], [4, 3, 2, 1], [3, 4, 2, 1]

Graphs with Cycles Do Not Have Topological Ordering

Cyclic graph

It should be obvious that if a graph with a cycle does not have a topological ordering. In the above example, 2 has to come before 5 which has come before 4 which has to come before 2 which has to come before 5... which is impossible.

Circular dependency is a common problem in real-world software engineering. If you have experience with web development, you probably have seen dreaded errors like these 😅:

Kahn's Algorithm

To obtain a topological order, we can use Kahn's algorithm which is very similar to Breadth First Search.

In order to understand why we must use this algorithm we first look at the problem it is trying to solve.

Given a directed graph does there exist a way to remove the nodes such that each time we remove a node we guarantee that no other nodes point to that particular node?

For this algorithm we systematically remove one node at a time, each time removing a node such that no other nodes point to that node. If no such node exists then there exists a cycle and there does not exist a solution to the problem (because if there isn't a cycle then we can always find a node that does not depend on any other node). For each of the neighbouring nodes to that node we then check if it has no other nodes that point to it. If there isn't one then we push it into the queue. Notice it is important to keep track of the number of nodes pointing to the node in question as we only push the node into the queue once all the parents have been removed. Here is a graphic to demonstrate the idea.

Let's summarize the steps here:

  1. Initialize a hashmap of node to parent count.
  2. Go through the nodes, count how many parents each node has (a parent node means another node pointing to the current).
  3. Push the nodes with 0 parents into the queue.
  4. Pop each node from the queue, subtract 1 from the parent count of each node it points to.
  5. If a node's parent count drops to 0, then push it into the queue.
  6. repeat until the queue is empty. If the queue is not empty, then there must be a cycle.

Topological Sort vs BFS

Notice the topological sort algorithm is very similar to BFS. The main difference is that we only push nodes with 0 parents into the queue in topological sort whereas in BFS we push all the neighboring nodes into the queue.

Topological Sort Implementation in Python, Java, Javascript and C++

Similar to BFS, we keep things short and clear by separating the logic into functions count_parents (step 1) and topo_sort (step 2-5).

1from collections import deque
2
3def count_parents(graph):
4    counts = { node: 0 for node in graph }
5    for parent in graph:
6        for node in graph[parent]:
7            counts[node] += 1
8    return counts
9
10
11def topo_sort(graph):
12    res = []
13    q = deque()
14    counts = count_parents(graph)
15    for node in counts:
16        if counts[node] == 0:
17            q.append(node)
18    while len(q) > 0:
19        node = q.popleft()
20        res.append(node)
21        for child in graph[node]:
22            counts[child] -= 1
23            if counts[child] == 0:
24                q.append(child)
25    return res if len(graph) == len(res) else None
26

Topological Sort Example

An example problem is Task Scheduling.