820. Short Encoding of Words
Problem Description
You need to find the shortest possible encoding string that can represent all words in a given array.
The encoding works as follows:
- You create a reference string
s
that contains all the encoded words separated by#
characters - Each word in the original array is represented by a starting index in this reference string
- Reading from each index until the next
#
character gives you the corresponding word
For example, if you have words ["time", "me", "bell"]
, one valid encoding could be:
- Reference string:
"time#bell#"
- Indices:
[0, 2, 5]
- Reading from index 0 to the first
#
gives "time" - Reading from index 2 to the first
#
gives "me" (which is a suffix of "time") - Reading from index 5 to the next
#
gives "bell"
The key insight is that if one word is a suffix of another word, they can share the same encoding space. In the example above, "me" is a suffix of "time", so we don't need a separate "me#" in our encoding.
The solution uses a Trie (prefix tree) built with reversed words to identify which words are suffixes of others:
- Insert all words into the Trie in reverse order (so suffixes become prefixes)
- Traverse the Trie using DFS to find all leaf nodes
- Only leaf nodes represent words that need their own encoding (they're not suffixes of any other word)
- Sum up the lengths of all paths to leaf nodes, adding 1 for each
#
character
The function returns the minimum total length of the reference string s
needed to encode all words.
Intuition
The key observation is that we want to share as much encoding space as possible between words. When can two words share the same encoding? When one word is a suffix of another.
Think about it this way: if we have "time" and "me", we can encode them as "time#"
with indices [0, 2]
. The word "me" doesn't need its own separate encoding because it's already contained within "time" as a suffix.
This naturally leads us to think about suffix relationships. How can we efficiently determine which words are suffixes of others? If we reverse all the words, suffix relationships become prefix relationships. For example:
- "time" reversed becomes "emit"
- "me" reversed becomes "em"
- Now "em" is a prefix of "emit"
Prefix relationships are perfectly handled by a Trie data structure! When we insert reversed words into a Trie:
- Words that share prefixes will share paths in the Trie
- A word that is a complete prefix of another (originally a suffix) won't be a leaf node
- Only leaf nodes represent words that need their own encoding
For example, with ["time", "me"]
:
- Insert "emit" and "em" into the Trie
- "em" is not a leaf (since "emit" extends beyond it)
- Only "emit" is at a leaf, so only "time#" is needed in the encoding
The final step is calculating the encoding length. We traverse the Trie with DFS, and for each leaf node we encounter, we add the path length plus 1 (for the #
character). This gives us the minimum encoding length because we've maximally shared encoding space between words with suffix relationships.
Learn more about Trie patterns.
Solution Approach
The implementation uses a Trie data structure to identify suffix relationships by working with reversed words.
Step 1: Build the Trie with Reversed Words
First, we create a Trie class with an array of 26 children (for each lowercase letter):
class Trie:
def __init__(self):
self.children = [None] * 26
Then we insert each word in reverse order into the Trie:
root = Trie()
for w in words:
cur = root
for c in w[::-1]: # Reverse the word
idx = ord(c) - ord("a") # Get character index (0-25)
if cur.children[idx] == None:
cur.children[idx] = Trie()
cur = cur.children[idx]
For example, with words ["time", "me", "bell"]
:
- "time" becomes "emit" and creates path: root → e → m → i → t
- "me" becomes "em" and follows existing path: root → e → m
- "bell" becomes "lleb" and creates path: root → l → l → e → b
Step 2: Calculate Minimum Encoding Length with DFS
We traverse the Trie using depth-first search to find all leaf nodes:
def dfs(cur: Trie, l: int) -> int:
isLeaf, ans = True, 0
for i in range(26):
if cur.children[i] != None:
isLeaf = False
ans += self.dfs(cur.children[i], l + 1)
if isLeaf:
ans += l
return ans
The DFS works as follows:
- Start from the root with length 1
- For each node, check all 26 possible children
- If a node has children, it's not a leaf - recursively explore children with incremented length
- If a node is a leaf (no children), add its path length to the answer
- The path length includes the
#
character that will follow the word
In our example:
- The path to "t" (from "emit") has length 5, contributing
5
to the answer - The path to "m" (from "em") is not a leaf, contributing
0
- The path to "b" (from "lleb") has length 5, contributing
5
to the answer - Total:
5 + 5 = 10
, representing"time#bell#"
The algorithm correctly identifies that "me" doesn't need separate encoding since it's a suffix of "time", resulting in the minimum encoding length.
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through the solution with the example words = ["time", "me", "bell"]
.
Step 1: Build the Trie with reversed words
We reverse each word and insert it into the Trie:
- "time" → "emit"
- "me" → "em"
- "bell" → "lleb"
Building the Trie:
- Insert "emit": Creates path
root → e → m → i → t
- Insert "em": Follows existing path
root → e → m
(no new nodes created!) - Insert "lleb": Creates new path
root → l → l → e → b
The Trie structure looks like:
root / \ e l | | m l | | i e | | t b
Step 2: DFS traversal to find leaf nodes
Starting DFS from root with initial length 1:
-
From root, explore child 'e':
- From 'e', explore child 'm':
- From 'm', explore child 'i':
- From 'i', explore child 't':
- 't' is a leaf node! Add length 5 to answer
- From 'i', explore child 't':
- From 'm', explore child 'i':
- 'm' is not a leaf (has child 'i'), so contributes 0
- From 'e', explore child 'm':
-
From root, explore child 'l':
- From 'l', explore child 'l':
- From second 'l', explore child 'e':
- From 'e', explore child 'b':
- 'b' is a leaf node! Add length 5 to answer
- From 'e', explore child 'b':
- From second 'l', explore child 'e':
- From 'l', explore child 'l':
Step 3: Calculate result
Total encoding length = 5 (for "time#") + 5 (for "bell#") = 10
The minimum encoding string is "time#bell#"
with:
- Indices
[0, 2, 5]
- "time" starts at index 0
- "me" starts at index 2 (suffix of "time")
- "bell" starts at index 5
Notice how "me" doesn't need its own encoding because it's already contained within "time" as a suffix. This is exactly what our algorithm detected when "em" didn't create a leaf node in the Trie!
Solution Implementation
1class Trie:
2 """Trie node for storing reversed words."""
3 def __init__(self) -> None:
4 # Array to store 26 possible children (one for each lowercase letter)
5 self.children: list[Optional['Trie']] = [None] * 26
6
7
8class Solution:
9 def minimumLengthEncoding(self, words: List[str]) -> int:
10 """
11 Find the minimum length of a reference string that encodes all words.
12 Words that are suffixes of other words can share the same encoding.
13
14 Args:
15 words: List of words to encode
16
17 Returns:
18 Minimum length of the encoding string including '#' separators
19 """
20 # Build a trie with reversed words to identify suffix relationships
21 root = Trie()
22
23 # Insert each word in reverse order into the trie
24 for word in words:
25 current_node = root
26 # Traverse the word from end to beginning
27 for char in word[::-1]:
28 char_index = ord(char) - ord('a')
29 # Create new node if path doesn't exist
30 if current_node.children[char_index] is None:
31 current_node.children[char_index] = Trie()
32 current_node = current_node.children[char_index]
33
34 # Calculate total encoding length by summing leaf node depths
35 return self.dfs(root, 1)
36
37 def dfs(self, current_node: Trie, depth: int) -> int:
38 """
39 Depth-first search to calculate encoding length.
40 Only leaf nodes (words not being suffixes) contribute to the total.
41
42 Args:
43 current_node: Current trie node being visited
44 depth: Current depth in the trie (includes '#' separator)
45
46 Returns:
47 Total encoding length for this subtree
48 """
49 is_leaf = True
50 total_length = 0
51
52 # Check all possible children
53 for i in range(26):
54 if current_node.children[i] is not None:
55 is_leaf = False
56 # Recursively calculate length for child subtree
57 total_length += self.dfs(current_node.children[i], depth + 1)
58
59 # If this is a leaf node, add its depth to total
60 # (represents a word that needs to be encoded)
61 if is_leaf:
62 total_length += depth
63
64 return total_length
65
1class Trie {
2 // Array to store 26 possible children (one for each lowercase letter)
3 Trie[] children = new Trie[26];
4}
5
6class Solution {
7 /**
8 * Finds the minimum length of the reference string that encodes all words.
9 * Words that are suffixes of other words can share the same encoding.
10 *
11 * @param words Array of words to encode
12 * @return Minimum length of encoding string including '#' separators
13 */
14 public int minimumLengthEncoding(String[] words) {
15 // Initialize the root of the Trie
16 Trie root = new Trie();
17
18 // Build a Trie by inserting each word in reverse order
19 // This allows us to identify words that are suffixes of other words
20 for (String word : words) {
21 Trie currentNode = root;
22
23 // Traverse the word from right to left (reverse order)
24 for (int i = word.length() - 1; i >= 0; i--) {
25 // Calculate the index for the current character
26 int charIndex = word.charAt(i) - 'a';
27
28 // Create a new Trie node if it doesn't exist
29 if (currentNode.children[charIndex] == null) {
30 currentNode.children[charIndex] = new Trie();
31 }
32
33 // Move to the child node
34 currentNode = currentNode.children[charIndex];
35 }
36 }
37
38 // Calculate the total encoding length using DFS
39 // Start with length 1 to account for the '#' separator
40 return dfs(root, 1);
41 }
42
43 /**
44 * Performs depth-first search to calculate the encoding length.
45 * Only leaf nodes contribute to the total encoding length.
46 *
47 * @param currentNode Current Trie node being processed
48 * @param currentLength Current depth/length from root (including '#')
49 * @return Total encoding length for this subtree
50 */
51 private int dfs(Trie currentNode, int currentLength) {
52 // Flag to check if current node is a leaf
53 boolean isLeaf = true;
54
55 // Accumulator for the total encoding length
56 int totalLength = 0;
57
58 // Check all possible children
59 for (int i = 0; i < 26; i++) {
60 if (currentNode.children[i] != null) {
61 // Node has at least one child, so it's not a leaf
62 isLeaf = false;
63
64 // Recursively calculate encoding length for child subtree
65 totalLength += dfs(currentNode.children[i], currentLength + 1);
66 }
67 }
68
69 // If this is a leaf node, add its length to the total
70 // Leaf nodes represent complete words that need encoding
71 if (isLeaf) {
72 totalLength += currentLength;
73 }
74
75 return totalLength;
76 }
77}
78
1struct TrieNode {
2 TrieNode* children[26] = {nullptr}; // Array to store 26 lowercase letter children
3};
4
5class Solution {
6public:
7 int minimumLengthEncoding(vector<string>& words) {
8 // Create the root of the Trie
9 TrieNode* root = new TrieNode();
10
11 // Build a reverse Trie by inserting each word from end to beginning
12 // This helps identify words that are suffixes of other words
13 for (const string& word : words) {
14 TrieNode* current = root;
15
16 // Insert characters in reverse order (from last to first)
17 for (int i = word.size() - 1; i >= 0; --i) {
18 int charIndex = word[i] - 'a'; // Convert character to index (0-25)
19
20 // Create new node if path doesn't exist
21 if (current->children[charIndex] == nullptr) {
22 current->children[charIndex] = new TrieNode();
23 }
24
25 // Move to the child node
26 current = current->children[charIndex];
27 }
28 }
29
30 // Calculate the minimum encoding length by traversing the Trie
31 // Start with length 1 to account for the '#' delimiter
32 return calculateEncodingLength(root, 1);
33 }
34
35private:
36 int calculateEncodingLength(TrieNode* node, int currentLength) {
37 bool isLeafNode = true;
38 int totalLength = 0;
39
40 // Check all possible children (26 lowercase letters)
41 for (int i = 0; i < 26; ++i) {
42 if (node->children[i] != nullptr) {
43 isLeafNode = false; // Node has at least one child, so it's not a leaf
44
45 // Recursively calculate length for the subtree
46 totalLength += calculateEncodingLength(node->children[i], currentLength + 1);
47 }
48 }
49
50 // If this is a leaf node, it represents the end of a word that needs encoding
51 // Add its length (including the '#' delimiter) to the total
52 if (isLeafNode) {
53 totalLength += currentLength;
54 }
55
56 return totalLength;
57 }
58};
59
1// Define the TrieNode structure
2interface TrieNode {
3 children: (TrieNode | null)[]; // Array to store 26 lowercase letter children
4}
5
6// Create a new TrieNode
7function createTrieNode(): TrieNode {
8 return {
9 children: new Array(26).fill(null) // Initialize with 26 null values for a-z
10 };
11}
12
13function minimumLengthEncoding(words: string[]): number {
14 // Create the root of the Trie
15 const root: TrieNode = createTrieNode();
16
17 // Build a reverse Trie by inserting each word from end to beginning
18 // This helps identify words that are suffixes of other words
19 for (const word of words) {
20 let current: TrieNode = root;
21
22 // Insert characters in reverse order (from last to first)
23 for (let i = word.length - 1; i >= 0; i--) {
24 const charIndex: number = word.charCodeAt(i) - 'a'.charCodeAt(0); // Convert character to index (0-25)
25
26 // Create new node if path doesn't exist
27 if (current.children[charIndex] === null) {
28 current.children[charIndex] = createTrieNode();
29 }
30
31 // Move to the child node
32 current = current.children[charIndex] as TrieNode;
33 }
34 }
35
36 // Calculate the minimum encoding length by traversing the Trie
37 // Start with length 1 to account for the '#' delimiter
38 return calculateEncodingLength(root, 1);
39}
40
41function calculateEncodingLength(node: TrieNode, currentLength: number): number {
42 let isLeafNode: boolean = true;
43 let totalLength: number = 0;
44
45 // Check all possible children (26 lowercase letters)
46 for (let i = 0; i < 26; i++) {
47 if (node.children[i] !== null) {
48 isLeafNode = false; // Node has at least one child, so it's not a leaf
49
50 // Recursively calculate length for the subtree
51 totalLength += calculateEncodingLength(node.children[i] as TrieNode, currentLength + 1);
52 }
53 }
54
55 // If this is a leaf node, it represents the end of a word that needs encoding
56 // Add its length (including the '#' delimiter) to the total
57 if (isLeafNode) {
58 totalLength += currentLength;
59 }
60
61 return totalLength;
62}
63
Time and Space Complexity
Time Complexity: O(N * M)
- Building the Trie: We iterate through all
N
words, and for each word, we iterate through all its characters (average lengthM
). For each character, we perform constant time operations (checking if child exists, creating new node, moving pointer). This gives usO(N * M)
for Trie construction. - DFS traversal: We visit each node in the Trie exactly once. In the worst case, the Trie contains
O(N * M)
nodes (when all words are completely different). For each node, we check all 26 children (constant time). Therefore, DFS takesO(N * M)
. - Overall time complexity:
O(N * M) + O(N * M) = O(N * M)
Space Complexity: O(N * M)
- Trie storage: In the worst case, when all words share no common suffixes (since we're inserting reversed words), we need to store all characters from all words. Each Trie node contains an array of 26 pointers. Maximum nodes needed is
O(N * M)
. - DFS recursion stack: The maximum depth of recursion equals the length of the longest word, which is at most
M
. This contributesO(M)
to space complexity. - Overall space complexity:
O(N * M) + O(M) = O(N * M)
where N
is the number of words and M
is the average/maximum length of the words.
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Not Handling Duplicate Words
A common mistake is assuming all words in the input array are unique. If duplicates exist, inserting the same word multiple times into the Trie doesn't create additional paths, but naive implementations might count them separately or handle them incorrectly.
Example Problem:
words = ["time", "me", "time"] # "time" appears twice
Solution: Use a set to remove duplicates before processing:
def minimumLengthEncoding(self, words: List[str]) -> int:
# Remove duplicates first
words = list(set(words))
root = Trie()
# Continue with the rest of the implementation...
2. Incorrect Initial Depth Value
Starting the DFS with depth 0 instead of 1 is a frequent error. The depth should start at 1 because each word in the encoding needs a '#' separator after it.
Wrong:
return self.dfs(root, 0) # Incorrect - misses the '#' character
Correct:
return self.dfs(root, 1) # Correct - accounts for '#' separator
3. Building Regular Trie Instead of Suffix Trie
Attempting to solve this problem with a regular (forward) Trie is a conceptual error. The problem requires identifying suffix relationships, which becomes prefix relationships when words are reversed.
Wrong Approach:
# Inserting words forward - WRONG for word in words: current = root for char in word: # Should be word[::-1] # This won't identify suffix relationships correctly
Correct Approach:
# Insert reversed words to identify suffixes for word in words: current = root for char in word[::-1]: # Reverse the word # Now suffixes become prefixes in the Trie
4. Memory Optimization Overlooked
Creating a full 26-element array for each Trie node can be wasteful if the input contains words with limited character variety. For large datasets with sparse character usage, this approach uses unnecessary memory.
Alternative Solution Using Dictionary:
class Trie:
def __init__(self):
self.children = {} # Use dictionary instead of array
# In the insertion logic:
for char in word[::-1]:
if char not in current_node.children:
current_node.children[char] = Trie()
current_node = current_node.children[char]
# In the DFS logic:
for child in current_node.children.values():
is_leaf = False
total_length += self.dfs(child, depth + 1)
5. Not Considering Edge Cases
Failing to handle edge cases like empty arrays or single-character words.
Edge Cases to Consider:
# Empty array words = [] # Should return 0 # Single character words words = ["a", "b", "a"] # Should return 4 ("a#b#") # All words are suffixes of one word words = ["me", "time", "ime"] # Should return 5 ("time#")
Which of the following uses divide and conquer strategy?
Recommended Readings
Trie Introduction Imagine you have a magic dictionary where as soon as you start spelling a word it immediately tells you how many words that start with those letters How cool would that be This is exactly the kind of magic a data structure called trie performs What's a Trie
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Want a Structured Path to Master System Design Too? Don’t Miss This!