Topological Sort | Topological Order
Prereq: Breadth First Search Review
Directed Graph
Before we get started on topological order, let's talk about directed graphs first. A graph is directed when its edges have directions (these edges are also called arcs).
Suppose that v, w
are vertices in a directed graph. Let's get familiar with a few concepts.
- in-edge - an edge pointing to or coming into the node
- out-edge - an edge pointing away or going out from the node (we will refer those nodes on the other end of the out-edges of
v
tov
'sneighbors
) - in-degree - the number of in-edges coming into the node
- out-degree - the number of out-edges going out from the node
- dependency - "
v
depends onw
" / "v
is dependent onw
" means that there is a directed path fromw
tov
wherev, w
are vertices in a directed graph. With these new terms under our belt, it will make more sense when we talk about directed graphs.
What is a topological order?
Topological sort or topological ordering of a directed graph is an ordering of nodes such that every node appears in the ordering before all the nodes it points to. Topological sort is not unique. For example, for a graph like this
Both [4, 5, 2]
and [5, 4, 2]
are valid topological sort.
And for the following graph:
Task 3 is completely independent of task 2 and 4, and it can be anywhere in the order as long as it is before task 1 which depends on it. All of the following ordering are valid topological orderings.
[4, 2, 3, 1]
, [4, 3, 2, 1]
, [3, 4, 2, 1]
Graphs with Cycles Do Not Have Topological Ordering
It should be obvious that if a graph with a cycle does not have a topological ordering. In the above example, 2 has to come before 5 which has come before 4 which has to come before 2 which has to come before 5... which is impossible.
Circular dependency is a common problem in real-world software engineering. If you have experience with web development, you probably have seen dreaded errors like these 😅:
Kahn's Algorithm
To obtain a topological order, we can use Kahn's algorithm which is very similar to Breadth First Search.
In order to understand why we must use this algorithm we first look at the problem it is trying to solve.
Given a directed graph does there exist a way to remove the nodes such that each time we remove a node we guarantee that no other nodes point to that particular node?
For this algorithm, we systematically remove one node at a time, each time removing a node such that no other nodes point to that node (in-degree is 0). If no such node exists, then there must be a cycle, and there is no way to order the nodes such that "every node appears in the ordering before all the nodes it points to" (no solution). After removing the current node, for each neighboring node the current node points to, we check whether any nodes point to this node. If there isn't any, we push this node into the queue. Notice it is important to keep track of the number of nodes pointing to a node (in-degree) in the question as we only push the node into the queue once all nodes the current node depended on have been removed. Here is a graphic to demonstrate the idea.
Let's summarize the steps here:
- Initialize a hashmap to store the in-degrees.
- Go through the nodes, count the in-degree of each node.
- Push the nodes with 0 in-degree into the queue.
- Pop each node from the queue, subtract 1 from the in-degree of each of its neighbors (each node it points to).
- If a node's in-degree drops to 0, then push it into the queue.
- repeat until the queue is empty. If any nodes remain unprocessed, then there must be a cycle.
Topological Sort vs BFS
Notice the topological sort algorithm is very similar to BFS. The main difference is that we only push nodes with 0 in-degree into the queue in topological sort whereas in BFS we push all the neighboring nodes into the queue.
Topological Sort Implementation in Python, Java, Javascript and C++
Similar to BFS, we keep things short and clear by separating the logic into functions find_indegree
(step 1) and topo_sort
(step 2-5).
1from collections import deque
2
3def find_indegree(graph):
4 indegree = { node: 0 for node in graph } # dict
5 for node in graph:
6 for neighbor in graph[node]:
7 indegree[neighbor] += 1
8 return indegree
9
10
11def topo_sort(graph):
12 res = []
13 q = deque()
14 indegree = find_indegree(graph)
15 for node in indegree:
16 if indegree[node] == 0:
17 q.append(node)
18 while len(q) > 0:
19 node = q.popleft()
20 res.append(node)
21 for neighbor in graph[node]:
22 indegree[neighbor] -= 1
23 if indegree[neighbor] == 0:
24 q.append(neighbor)
25 return res if len(graph) == len(res) else None
26
1// find the indegree of a node by adding in-edge count
2public static <T> Map<T, Integer> findInDegree(Map<T, List<T>> graph) {
3 Map<T, Integer> inDegree = new HashMap<>();
4 graph.keySet().forEach(node -> {
5 inDegree.put(node, 0);
6 });
7 // loop through every node and add 1 in-edge count to its neighbors
8 graph.entrySet().forEach(entry -> {
9 for (T neighbor : entry.getValue()) {
10 inDegree.put(neighbor, inDegree.get(neighbor) + 1);
11 }
12 });
13 return inDegree;
14}
15
16// topological sort the list
17public static <T> List<T> topoSort(Map<T, List<T>> graph) {
18 // return a list of the topological sorted list
19 List<T> res = new ArrayList<>();
20 // make a queue that we will use for our solution
21 Queue<T> q = new ArrayDeque<>();
22 // loop through all nodes and add all nodes that have 0 in-degree
23 Map<T, Integer> inDegree = findInDegree(graph);
24 inDegree.entrySet().forEach(entry -> {
25 if (entry.getValue() == 0) {
26 q.add(entry.getKey());
27 }
28 });
29 // perform bfs with queue, mostly the same as template bfs
30 while (!q.isEmpty()) {
31 T node = q.poll();
32 // add node to list to keep track of topological order
33 res.add(node);
34 for (T neighbor : graph.get(node)) {
35 // subtract one from every neighbour
36 inDegree.put(neighbor, inDegree.get(neighbor) - 1);
37 // once the in-degree reaches 0 you add it to the queue
38 if (inDegree.get(neighbor) == 0) {
39 q.add(neighbor);
40 }
41 }
42 }
43 // check for cycle
44 return (graph.size() == res.size()) ? res : null;
45}
46
1function findInDegree(graph) {
2 const inDegree = new Map();
3 for (let node of graph.keys()) {
4 inDegree.set(node, 0);
5 }
6 for (let node of graph.keys()) {
7 for (let neighbor of graph.get(node)) {
8 inDegree.set(neighbor, inDegree.get(neighbor) + 1);
9 }
10 }
11 return inDegree;
12}
13
14function topoSort(graph) {
15 const res = [];
16 const q = [];
17 const inDegree = findInDegree(graph);
18 for (let node of inDegree.keys()) {
19 if (inDegree.get(node) == 0) {
20 q.push(node);
21 }
22 }
23 while (q.length > 0) {
24 const node = q.shift();
25 res.push(node);
26 for (let neighbor of graph.get(node)) {
27 inDegree.set(neighbor, inDegree.get(neighbor) - 1);
28 if (inDegree.get(neighbor) == 0) {
29 q.push(neighbor);
30 }
31 }
32 }
33 return (graph.size === res.length) ? res : null;
34}
35
1template <typename T> std::unordered_map<T, int> find_indegree(std::unordered_map<T, std::vector<T>> graph) {
2 std::unordered_map<T, int> indegree;
3 for (auto entry : graph) {
4 indegree[entry.first] = 0;
5 }
6 for (auto node : graph) {
7 for (auto neighbor : node.second) {
8 indegree[neighbor] += 1;
9 }
10 }
11 return indegree;
12}
13
14template <typename T> std::vector<T> topo_sort(std::unordered_map<T, std::vector<T>> graph) {
15 std::vector<T> res;
16 std::queue<T> q;
17 std::unordered_map<T, int> indegree = find_indegree(graph);
18 for (auto entry : indegree) {
19 if (entry.second == 0) q.push(entry.first);
20 }
21 while (q.size() > 0) {
22 T node = q.front();
23 res.emplace_back(node);
24 for (T neighbor : graph[node]) {
25 indegree[neighbor] -= 1;
26 if (indegree[neighbor] == 0) q.push(neighbor);
27 }
28 q.pop();
29 }
30 if (graph.size() == res.size()) return res;
31 else {
32 std::cout << "Invalid topo sort: graph contains cycle" <<'\n';
33 return std::vector<T>();
34 }
35}
36
Topological Sort Example
An example problem is Task Scheduling.