269. Alien Dictionary


Problem Description

In the given problem, we have a set of strings that are words from an alien language that uses the same characters as the English alphabet, but with an unknown order. We are tasked with determining the order of characters in the alien language. The input is a list of words from the alien language. These words are supposed to be sorted lexicographically according to the rules of the alien language. We need to verify if this claim of sorted order is true, and if so, deduce the order of the characters in the alien language based on the words provided.

If we cannot determine a valid character order that would result in the words being sorted lexicographically, we return an empty string. Otherwise, we should return any valid string that represents the characters in the alien language in lexicographically increasing order according to the rules of the language. There may be more than one correct order, in which case any one of them is an acceptable answer.

Intuition

The solution to this problem lies in treating the problem as a graph problem. Each character can be considered as a node in a graph, and a directed edge between two nodes indicates that one character precedes another in the alien language's lexicographical order. This can be derived from two consecutive words in the list: if the first differing characters between the two words are c1 and c2, then c1 must come before c2 in the alien language order.

Our goal is to build this directed graph and then find a topological ordering of the characters that satisfies all the constraints imposed by the consecutive words. This problem, therefore, boils down to the detection of cycles in a directed graph, and if no cycles are present, finding a topological sort of the graph's nodes.

The steps in arriving at the solution are as follows:

  1. Initialize a graph data structure to represent the ordering rules between characters.
  2. Populate the graph with nodes and directed edges by comparing consecutive words from the provided dictionary. If words are sorted lexicographically, the first different character from word[i] and word[i+1] sets an ordering rule.
  3. Detect if there is any contradiction in the ordering rules, such as a cycle in the graph which would make it impossible to determine the order, or an improper ordering like a word being a prefix of a previous word but appearing later in the list.
  4. Assuming the ordering rules do not contain contradictions, perform a topological sort on the graph to find the alien language character order.
  5. If we can complete the topological sort with all characters accounted for, we have successfully identified a possible alien language order. Otherwise, we return an empty string.

Learn more about Depth-First Search, Breadth-First Search, Graph and Topological Sort patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece๏ผš

Which of the following is a min heap?

Solution Approach

The implementation of the solution starts by constructing a directed graph and initializing some auxiliary data structures.

  1. We begin by creating a graph g which is represented as a 2D boolean list of size 26x26, initializing every element to False. Here, g[i][j] being True represents there is a directed edge from i to j, implying letter i comes before letter j in the alien language.

  2. An array s of size 26 is created to keep track of which characters are present in the given words, so we don't have to consider characters not in the language. Initially, every element is set to False.

  3. A variable cnt is used to count the unique letters that have appeared so far as we process the list of words.

  4. We process every word. Within each word, we mark the characters in s as True to signify their existence and increment cnt appropriately.

  5. The comparison between consecutive words is done character by character. We look for the first non-matching character pair c1 and c2 between words i and i+1. If c1 and c2 differ, it means c1 must come before c2 in the alphabet order. We encode this by setting g[ord(c1) - ord('a')][ord(c2) - ord('a')] = True.

  6. If c2 appears before c1 in any subsequent words, it contradicts the previously established order, and in such a case, we immediately return an empty string '', indicating that there's no valid order that can sort the words lexicographically.

  7. An array indegree of size 26 is maintained to keep track of the number of incoming edges for each node in the graph, which is crucial for topological sorting.

  8. We iterate through the graph to populate indegree by counting how many times a letter has been placed after another letter.

  9. We employ a queue to perform a topological sort. We start by adding all characters with zero indegree (no incoming edges) to the queue, which means these characters could appear at the beginning of the alien language's order.

  10. While the queue is not empty, we continuously pop elements from the queue, append their corresponding character to the answer, and reduce the indegree of their connected nodes by 1. If any node's indegree becomes zero, we add it to the queue, as it could now come next in order.

  11. Finally, we check if the length of our answer matches the count of unique letters (cnt). If it does not, that means it was impossible to establish a topological sort due to a cycle or other issues, and we return an empty string ''. Otherwise, we join the answer list into a string and return it as our character order.

Here's an example of how this works:

Consider the words ["wrt", "wrf", "er", "ett", "rftt"]. By comparing consecutive words, we conclude the order: w before e, t before f and r before t. When we try to find a topological order of the nodes, we correctly deduce the order to be wertf (or any order that follows the deduced rules, such as wetrf).

This approach is essentially a graph creation followed by a topological sort - a classic pattern for deducing order-based relationships between entities, often used in scheduling, course prerequisite structures, and similarly structured problems.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Breadth first search can be used to find the shortest path between two nodes in a directed graph.

Example Walkthrough

Let's consider a small example to illustrate the solution approach. We are provided with the following alien words:

1["z", "x", "z"]

According to the problem, these words are sorted according to the alien language's lexicographical order. We need to determine if this order is correct and, if so, deduce the character order in the alien language.

  1. We create a graph g as a 2D boolean list with a size of 26x26 representing the characters a to z. We also create an array s of size 26 to track present characters and a variable cnt to count unique letters.

  2. As we process our first word, "z", we mark z as present by setting s[ord('z') - ord('a')] to True. Then we do the same for "x" in the next word. At this point, cnt has been incremented twice, for z and x.

  3. Next, we compare consecutive words to establish ordering rules. We try to compare "z" and "x", but since there is only one letter in each word, we cannot define any order here and move on to the next pair of words.

  4. Comparing "x" and "z" from the second and third words, we would expect x to come before z if the words are sorted correctly. However, here we have the same situation as in the previous step: with only one letter, no relative order can be determined.

  5. At this point, we realize that according to the given words, z was supposed to come before x based on the first and the last word. However, there is no such rule established between them since the second word is 'x', which shows that we cannot determine a valid character order, and thus we need to return an empty string.

  6. For contradictions in the case where c2 appears before c1, like in our example ("z" appears before and after "x"), we would return an empty string ''.

Since this example demonstrates that the given words are not sorted lexicographically (the first and the last word are the same, "z", yet there's a word "x" in between), our function would return an empty string, indicating an invalid character order. There is no need to complete a topological sort because a contradiction in the order has already been detected.

In conclusion, this example illustrates how the given solution approach can process each word and compare consecutive words to establish a directed graph representing the alien language order. However, if a contradiction is foundโ€”such as the given words are not sorted or a word acts as both predecessor and successor of another wordโ€”the algorithm detects it and returns an empty string without proceeding to topological sorting.

Solution Implementation

1from typing import List
2from collections import deque
3
4class Solution:
5    def alienOrder(self, words: List[str]) -> str:
6        # Graph representation - 26 characters for 26 lowercase letters
7        graph = [[False] * 26 for _ in range(26)]
8      
9        # Seen array to keep track of which characters are present in the words list
10        seen = [False] * 26
11      
12        # Counter for the number of unique characters
13        unique_char_count = 0
14      
15        # Length of the words list
16        words_count = len(words)
17      
18        # Process all the words for character presence and ordering
19        for i in range(words_count - 1):
20            # Record the characters in current word as seen
21            for char in words[i]:
22                if unique_char_count == 26:
23                    break
24                index = ord(char) - ord('a')
25                if not seen[index]:
26                    unique_char_count += 1
27                    seen[index] = True
28            word_length = len(words[i])
29          
30            # Compare characters and establish ordering
31            for j in range(word_length):
32                # If the next word is shorter than the current word, there is an issue with the order
33                if j >= len(words[i + 1]):
34                    return ''
35              
36                char1, char2 = words[i][j], words[i + 1][j]
37                if char1 == char2:
38                    continue
39                index1, index2 = ord(char1) - ord('a'), ord(char2) - ord('a')
40                if graph[index2][index1]:  # If there is a cycle, return empty string
41                    return ''
42                graph[index1][index2] = True
43                break
44      
45        # Check the last word for new characters
46        for char in words[words_count - 1]:
47            if unique_char_count == 26:
48                break
49            index = ord(char) - ord('a')
50            if not seen[index]:
51                unique_char_count += 1
52                seen[index] = True
53      
54        # Calculate in-degree for each character that is seen
55        in_degree = [0] * 26
56        for i in range(26):
57            for j in range(26):
58                if i != j and seen[i] and seen[j] and graph[i][j]:
59                    in_degree[j] += 1
60      
61        # Initialize the queue for topological sort
62        queue = deque()
63        # Hold characters of the alien language in order
64        ordered_chars = []
65      
66        # Find characters with in-degree of 0 to start topological sort
67        for i in range(26):
68            if seen[i] and in_degree[i] == 0:
69                queue.append(i)
70      
71        # Run topological sort
72        while queue:
73            current_char_index = queue.popleft()
74            ordered_chars.append(chr(current_char_index + ord('a')))
75          
76            # Decrease the in-degree of adjacent characters and add new 0 in-degree characters to the queue
77            for i in range(26):
78                if seen[i] and i != current_char_index and graph[current_char_index][i]:
79                    in_degree[i] -= 1
80                    if in_degree[i] == 0:
81                        queue.append(i)
82      
83        # If the length of the ordered characters is less than total unique characters, ordering is not possible
84        return '' if len(ordered_chars) < unique_char_count else ''.join(ordered_chars)
85
1class Solution {
2    public String alienOrder(String[] words) {
3        // Graph represented as a 2D boolean array; g represents the edges.
4        boolean[][] graph = new boolean[26][26];
5        // Seen array to keep track of the characters encountered in the words.
6        boolean[] seen = new boolean[26];
7        // Counter for the number of unique letters.
8        int uniqueLettersCount = 0;
9        int totalWords = words.length;
10      
11        // First, process all the words except the last one for building the graph.
12        for (int i = 0; i < totalWords - 1; ++i) {
13            for (char c : words[i].toCharArray()) {
14                if (uniqueLettersCount == 26) {
15                    break;
16                }
17                int index = c - 'a';
18                if (!seen[index]) {
19                    ++uniqueLettersCount;
20                    seen[index] = true;
21                }
22            }
23          
24            // Build the graph based on the given dictionary (words array).
25            int wordLength = words[i].length();
26            for (int j = 0; j < wordLength; ++j) {
27                if (j >= words[i + 1].length()) {
28                    return ""; // If the next word is a prefix of the current word, it is invalid.
29                }
30              
31                char currentChar = words[i].charAt(j), nextChar = words[i + 1].charAt(j);
32                if (currentChar == nextChar) {
33                    continue; // No specific order is extracted from this character comparison.
34                }
35              
36                // Detect cycle or invalid sequence.
37                if (graph[nextChar - 'a'][currentChar - 'a']) {
38                    return "";
39                }
40                // Establish the edge relationship in the graph.
41                graph[currentChar - 'a'][nextChar - 'a'] = true;
42                break;
43            }
44        }
45      
46        // Process the last word to account for its letters.
47        for (char c : words[totalWords - 1].toCharArray()) {
48            if (uniqueLettersCount == 26) {
49                break;
50            }
51            int index = c - 'a';
52            if (!seen[index]) {
53                ++uniqueLettersCount;
54                seen[index] = true;
55            }
56        }
57
58        // Compute the in-degree for each node/character in the graph.
59        int[] inDegree = new int[26];
60        for (int i = 0; i < 26; ++i) {
61            for (int j = 0; j < 26; ++j) {
62                if (i != j && seen[i] && seen[j] && graph[i][j]) {
63                    ++inDegree[j];
64                }
65            }
66        }
67      
68        // Queue for topological sorting (Kahn's algorithm).
69        Deque<Integer> queue = new LinkedList<>();
70        for (int i = 0; i < 26; ++i) {
71            if (seen[i] && inDegree[i] == 0) {
72                queue.offerLast(i); // Enqueue characters with in-degree of 0.
73            }
74        }
75      
76        // Perform topological sort to determine the alien language order.
77        StringBuilder alienOrder = new StringBuilder();
78        while (!queue.isEmpty()) {
79            int current = queue.pollFirst();
80            alienOrder.append((char) (current + 'a'));
81            for (int i = 0; i < 26; ++i) {
82                if (i != current && seen[i] && graph[current][i]) {
83                    // Decrement in-degree and enqueue if in-degree becomes 0.
84                    if (--inDegree[i] == 0) {
85                        queue.offerLast(i);
86                    }
87                }
88            }
89        }
90        // If the final sorted string has a length less than the count of unique letters, there is an inconsistency.
91        return alienOrder.length() < uniqueLettersCount ? "" : alienOrder.toString();
92    }
93}
94
1class Solution {
2public:
3    string alienOrder(vector<string>& words) {
4        // Use a graph (represented as a 2D array) to represent the letter precedence
5        vector<vector<bool>> graph(26, vector<bool>(26, false));
6        // An array indicating whether each letter is present in the dictionary or not.
7        vector<bool> seen(26, false);
8        // Total number of unique characters seen so far
9        int uniqueCharsCount = 0;
10        int wordCount = words.size();
11      
12        // Build the graph
13        for (int i = 0; i < wordCount - 1; ++i) {
14            for (char ch : words[i]) {
15                int charIndex = ch - 'a';
16                if (!seen[charIndex]) {
17                    ++uniqueCharsCount;
18                    seen[charIndex] = true;
19                    if (uniqueCharsCount == 26) break;
20                }
21            }
22            int wordLength = words[i].size();
23            for (int j = 0; j < wordLength; ++j) {
24                if (j >= words[i + 1].size()) {
25                    return ""; // Next word cannot be prefix of the current word
26                }
27                char char1 = words[i][j], char2 = words[i + 1][j];
28                if (char1 == char2) continue;
29                if (graph[char2 - 'a'][char1 - 'a']) {
30                    return ""; // Invalid ordering found
31                }
32                graph[char1 - 'a'][char2 - 'a'] = true;
33                break; // Found the first difference, so move on to next word
34            }
35        }
36      
37        // Register the characters seen in the last word
38        for (char ch : words[wordCount - 1]) {
39            int charIndex = ch - 'a';
40            if (!seen[charIndex]) {
41                ++uniqueCharsCount;
42                seen[charIndex] = true;
43                if (uniqueCharsCount == 26) break;
44            }
45        }
46      
47        // Count the indegrees of each node/letter
48        vector<int> indegree(26, 0);
49        for (int i = 0; i < 26; ++i) {
50            for (int j = 0; j < 26; ++j) {
51                if (i != j && seen[i] && seen[j] && graph[i][j]) {
52                    ++indegree[j];
53                }
54            }
55        }
56      
57        // Start with all nodes/letters with 0 indegree
58        queue<int> queue;
59        for (int i = 0; i < 26; ++i) {
60            if (seen[i] && indegree[i] == 0) {
61                queue.push(i);
62            }
63        }
64      
65        // Perform topological sort
66        string result;
67        while (!queue.empty()) {
68            int current = queue.front();
69            result += (current + 'a');
70            queue.pop();
71            for (int i = 0; i < 26; ++i) {
72                // Reduce the indegree of the neighbors and add to queue if indegree becomes 0
73                if (i != current && seen[i] && graph[current][i]) {
74                    if (--indegree[i] == 0) {
75                        queue.push(i);
76                    }
77                }
78            }
79        }
80      
81        // If the number of characters in the result does not match the unique count
82        // it means there are some nodes left in the graph due to a cycle,
83        // hence no valid ordering
84        if (result.size() < uniqueCharsCount) {
85            return "";
86        }
87      
88        return result;
89    }
90};
91
1// Initialize type alias for readability
2type AlienGraph = boolean[][];
3type SeenArray = boolean[];
4
5// Abstracting the letter-to-index conversion into a function
6function charToIndex(ch: string): number {
7    return ch.charCodeAt(0) - 'a'.charCodeAt(0);
8}
9
10// Build the graph from the given list of words
11function buildGraph(words: string[], graph: AlienGraph, seen: SeenArray): number {
12    let uniqueCharsCount = 0;
13    const wordCount = words.length;
14  
15    for (let i = 0; i < wordCount - 1; ++i) {
16        for (const ch of words[i]) {
17            const charIndex = charToIndex(ch);
18            if (!seen[charIndex]) {
19                ++uniqueCharsCount;
20                seen[charIndex] = true;
21                if (uniqueCharsCount === 26) break;
22            }
23        }
24        const wordLength = words[i].length;
25        for (let j = 0; j < wordLength; ++j) {
26            if (j >= words[i + 1].length) return -1; // Next word cannot be prefix of the current word
27          
28            const char1 = words[i][j], char2 = words[i + 1][j];
29            if (char1 === char2) continue;
30            if (graph[charToIndex(char2)][charToIndex(char1)]) return -1; // Invalid ordering found
31          
32            graph[charToIndex(char1)][charToIndex(char2)] = true;
33            break; // Found the first difference, so move on to next word
34        }
35    }
36
37    for (const ch of words[wordCount - 1]) {
38        const charIndex = charToIndex(ch);
39        if (!seen[charIndex]) {
40            ++uniqueCharsCount;
41            seen[charIndex] = true;
42            if (uniqueCharsCount === 26) break;
43        }
44    }
45
46    return uniqueCharsCount;
47}
48
49// Perform topological sort to get alien language order
50function alienOrder(words: string[]): string {
51    const graph: AlienGraph = Array(26).fill(null).map(() => Array(26).fill(false));
52    const seen: SeenArray = Array(26).fill(false);
53    let uniqueCharsCount = buildGraph(words, graph, seen);
54  
55    // Return empty string if invalid prefix or ordering found
56    if (uniqueCharsCount === -1) return "";
57
58    // Initialize indegree array and calculate indegrees
59    const indegree: number[] = Array(26).fill(0);
60    for (let i = 0; i < 26; ++i) {
61        for (let j = 0; j < 26; ++j) {
62            if (i !== j && seen[i] && seen[j] && graph[i][j]) {
63                ++indegree[j];
64            }
65        }
66    }
67
68    // Create a queue and add all the characters with 0 indegree
69    const queue: number[] = [];
70    for (let i = 0; i < 26; ++i) {
71        if (seen[i] && indegree[i] === 0) {
72            queue.push(i);
73        }
74    }
75
76    // Perform the topological sort
77    let result = "";
78    while (queue.length) {
79        const current = queue.shift();
80        result += String.fromCharCode(current + 'a'.charCodeAt(0));
81      
82        for (let i = 0; i < 26; ++i) {
83            if (i !== current && seen[i] && graph[current][i]) {
84                if (--indegree[i] === 0) {
85                    queue.push(i);
86                }
87            }
88        }
89    }
90
91    // Check for cycles (if result size doesn't match unique char count)
92    if (result.length < uniqueCharsCount) {
93        return "";
94    }
95  
96    return result;
97}
98
Not Sure What to Study? Take the 2-min Quiz๏ผš

What is the best way of checking if an element exists in a sorted array once in terms of time complexity? Select the best that applies.

Time and Space Complexity

Time Complexity

The main time complexity of the provided code comes from multiple nested loops, iterating over the given words and then characters within those words, as well as operations related to constructing and traversing the graph representation of the alien dictionary. Analyzing the various components:

  1. Constructing the Character Graph: The code iterates through each pair of consecutive words and compares their characters until it finds a differing pair. This takes at most O(n * l) time, where n is the number of words and l is the average length of the words since it stops comparing once there is a discrepancy or the end of the word is reached.

  2. Populating visit and graph arrays: Similarly, the loop goes through each word's characters, and so does so in O(n * l) time as well. This also includes the check for early termination if an invalid order is detected.

  3. Calculating In-degrees: The nested loop has a fixed size of 26, so it takes O(26^2) which is essentially O(1) since it's a constant not based on input size.

  4. The Breadth-First Search (BFS) on the character graph involves dequeuing and enqueuing each character once. Since every character is enqueued and dequeued exactly once, and there are at most 26 characters, this step takes O(26) time, which is again constant.

Overall, the primary time complexity driver is determined by the comparison of characters in the given words, which is O(n * l). The rest of the steps are either constant time or linear time with a fixed upper bound unrelated to the input size. Hence, the time complexity can be summarized as O(n * l).

Space Complexity

For space complexity, we account for the data structures used in relation to the input size:

  1. The graph g is a 26x26 matrix that takes up O(26^2) which simplifies to O(1) space since it's constant space, irrespective of the input.

  2. The visit array s takes up O(26) space, again a constant O(1).

  3. The indegree array is also of size 26, hence O(1).

  4. The queue and answer list will at most hold all 26 characters, so they both take O(1) space.

Like the time complexity, the largest space consideration is the character graph, which is fixed and not dependent on input size. Thus, the space complexity is O(1) - constant space complexity.

Learn more about how to find time and space complexity quickly using problem constraints.

Fast Track Your Learning with Our Quick Skills Quiz:

Which data structure is used in a depth first search?


Recommended Readings


Got a question?ย Ask the Teaching Assistantย anything you don't understand.

Still not clear? Ask in the Forum, ย Discordย orย Submitย the part you don't understand to our editors.

โ†
โ†‘TA ๐Ÿ‘จโ€๐Ÿซ