Topological Sort

Prereq: Breadth First Search Review

Topological sort or topological ordering of a directed graph is an ordering of nodes such that every node appears in the ordering before all the nodes it points to. Topological sort is not unique. For example, for a graph like this

Both [4, 5, 2] and [5, 4, 2] are valid topological sort.

And for the following graph:

Task 3 is completely independent of task 2 and 4, and it can be anywhere in the order as long as it is before task 1 which depends on it. All of the following ordering are valid topological orderings.

[4, 2, 3, 1], [4, 3, 2, 1], [3, 4, 2, 1]

Graph with Cycles Do Not Have Topological Ordering

Cyclic graph

It should be obvious that if a graph with a cycle does not have a topological ordering. In the above example, 2 has to come before 5 which has come before 4 which has to come before 2 which has to come before 5... which is impossible.

Circular dependency is a common problem in real-world software engineering:

Kahn's Algorithm

To obtain a topological order, we can use Kahn's algorithm which is very similar to Breadth First Search.

In order to understand why we must use this algorithm we first look at the problem it is trying to solve.

Given a directed graph does there exist a way to remove the nodes such that each time we remove a node we guarantee that no other nodes point to that particular node?

For this algorithm we systematically remove one node at a time, each time removing a node such that no other nodes point to that node. If no such node exists then there exists a cycle and there does not exist a solution to the problem (because if there isn't a cycle then we can always find a node that does not depend on any other node). For each of the neighbouring nodes to that node we then check if it has no other nodes that point to it. If there isn't one then we push it into the queue. Notice it is important to keep track of the number of nodes pointing to the node in question as we only push the node into the queue once all the parents have been removed. Here is a graphic to demonstrate the idea.

Now we show an implementation of these ideas in the code below.

1from collections import deque
3def count_parents(graph):
4    counts = { node: 0 for node in graph }
5    for parent in graph:
6        for node in graph[parent]:
7            counts[node] += 1
8    return counts
11def topo_sort(graph):
12    res = []
13    q = deque()
14    counts = count_parents(graph)
15    for node in counts:
16        if counts[node] == 0:
17            q.append(node)
18    while len(q) > 0:
19        node = q.popleft()
20        res.append(node)
21        for child in graph[node]:
22            counts[child] -= 1
23            if counts[child] == 0:
24                q.append(child)
25    return res

An example problem is Task Scheduling.