820. Short Encoding of Words

MediumTrieArrayHash TableString

Problem Description

You need to find the shortest possible encoding string that can represent all words in a given array.

The encoding works as follows:

You create a reference string s that contains all the encoded words separated by # characters
Each word in the original array is represented by a starting index in this reference string
Reading from each index until the next # character gives you the corresponding word

For example, if you have words ["time", "me", "bell"], one valid encoding could be:

Reference string: "time#bell#"
Indices: [0, 2, 5]
Reading from index 0 to the first # gives "time"
Reading from index 2 to the first # gives "me" (which is a suffix of "time")
Reading from index 5 to the next # gives "bell"

The key insight is that if one word is a suffix of another word, they can share the same encoding space. In the example above, "me" is a suffix of "time", so we don't need a separate "me#" in our encoding.

The solution uses a Trie (prefix tree) built with reversed words to identify which words are suffixes of others:

Insert all words into the Trie in reverse order (so suffixes become prefixes)
Traverse the Trie using DFS to find all leaf nodes
Only leaf nodes represent words that need their own encoding (they're not suffixes of any other word)
Sum up the lengths of all paths to leaf nodes, adding 1 for each # character

The function returns the minimum total length of the reference string s needed to encode all words.

Quick Interview Experience

Help others by sharing your interview experience

Have you seen this problem before?

Intuition

The key observation is that we want to share as much encoding space as possible between words. When can two words share the same encoding? When one word is a suffix of another.

Think about it this way: if we have "time" and "me", we can encode them as "time#" with indices [0, 2]. The word "me" doesn't need its own separate encoding because it's already contained within "time" as a suffix.

This naturally leads us to think about suffix relationships. How can we efficiently determine which words are suffixes of others? If we reverse all the words, suffix relationships become prefix relationships. For example:

"time" reversed becomes "emit"
"me" reversed becomes "em"
Now "em" is a prefix of "emit"

Prefix relationships are perfectly handled by a Trie data structure! When we insert reversed words into a Trie:

Words that share prefixes will share paths in the Trie
A word that is a complete prefix of another (originally a suffix) won't be a leaf node
Only leaf nodes represent words that need their own encoding

For example, with ["time", "me"]:

Insert "emit" and "em" into the Trie
"em" is not a leaf (since "emit" extends beyond it)
Only "emit" is at a leaf, so only "time#" is needed in the encoding

The final step is calculating the encoding length. We traverse the Trie with DFS, and for each leaf node we encounter, we add the path length plus 1 (for the # character). This gives us the minimum encoding length because we've maximally shared encoding space between words with suffix relationships.

Learn more about Trie patterns.

Solution Approach

The implementation uses a Trie data structure to identify suffix relationships by working with reversed words.

Step 1: Build the Trie with Reversed Words

First, we create a Trie class with an array of 26 children (for each lowercase letter):

class Trie:
    def __init__(self):
        self.children = [None] * 26

Then we insert each word in reverse order into the Trie:

root = Trie()
for w in words:
    cur = root
    for c in w[::-1]:  # Reverse the word
        idx = ord(c) - ord("a")  # Get character index (0-25)
        if cur.children[idx] == None:
            cur.children[idx] = Trie()
        cur = cur.children[idx]

For example, with words ["time", "me", "bell"]:

"time" becomes "emit" and creates path: root → e → m → i → t
"me" becomes "em" and follows existing path: root → e → m
"bell" becomes "lleb" and creates path: root → l → l → e → b

Step 2: Calculate Minimum Encoding Length with DFS

We traverse the Trie using depth-first search to find all leaf nodes:

def dfs(cur: Trie, l: int) -> int:
    isLeaf, ans = True, 0
    for i in range(26):
        if cur.children[i] != None:
            isLeaf = False
            ans += self.dfs(cur.children[i], l + 1)
    if isLeaf:
        ans += l
    return ans

The DFS works as follows:

Start from the root with length 1
For each node, check all 26 possible children
If a node has children, it's not a leaf - recursively explore children with incremented length
If a node is a leaf (no children), add its path length to the answer
The path length includes the # character that will follow the word

In our example:

The path to "t" (from "emit") has length 5, contributing 5 to the answer
The path to "m" (from "em") is not a leaf, contributing 0
The path to "b" (from "lleb") has length 5, contributing 5 to the answer
Total: 5 + 5 = 10, representing "time#bell#"

The algorithm correctly identifies that "me" doesn't need separate encoding since it's a suffix of "time", resulting in the minimum encoding length.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through the solution with the example words = ["time", "me", "bell"].

Step 1: Build the Trie with reversed words

We reverse each word and insert it into the Trie:

"time" → "emit"
"me" → "em"
"bell" → "lleb"

Building the Trie:

Insert "emit": Creates path root → e → m → i → t
Insert "em": Follows existing path root → e → m (no new nodes created!)
Insert "lleb": Creates new path root → l → l → e → b

The Trie structure looks like:

Step 2: DFS traversal to find leaf nodes

Starting DFS from root with initial length 1:

From root, explore child 'e':
- From 'e', explore child 'm':
  - From 'm', explore child 'i':
    - From 'i', explore child 't':
      - 't' is a leaf node! Add length 5 to answer
- 'm' is not a leaf (has child 'i'), so contributes 0
From root, explore child 'l':
- From 'l', explore child 'l':
  - From second 'l', explore child 'e':
    - From 'e', explore child 'b':
      - 'b' is a leaf node! Add length 5 to answer

Step 3: Calculate result

Total encoding length = 5 (for "time#") + 5 (for "bell#") = 10

The minimum encoding string is "time#bell#" with:

Indices [0, 2, 5]
"time" starts at index 0
"me" starts at index 2 (suffix of "time")
"bell" starts at index 5

Notice how "me" doesn't need its own encoding because it's already contained within "time" as a suffix. This is exactly what our algorithm detected when "em" didn't create a leaf node in the Trie!

Solution Implementation

1class Trie:
2    """Trie node for storing reversed words."""
3    def __init__(self) -> None:
4        # Array to store 26 possible children (one for each lowercase letter)
5        self.children: list[Optional['Trie']] = [None] * 26
6
7
8class Solution:
9    def minimumLengthEncoding(self, words: List[str]) -> int:
10        """
11        Find the minimum length of a reference string that encodes all words.
12        Words that are suffixes of other words can share the same encoding.
13      
14        Args:
15            words: List of words to encode
16          
17        Returns:
18            Minimum length of the encoding string including '#' separators
19        """
20        # Build a trie with reversed words to identify suffix relationships
21        root = Trie()
22      
23        # Insert each word in reverse order into the trie
24        for word in words:
25            current_node = root
26            # Traverse the word from end to beginning
27            for char in word[::-1]:
28                char_index = ord(char) - ord('a')
29                # Create new node if path doesn't exist
30                if current_node.children[char_index] is None:
31                    current_node.children[char_index] = Trie()
32                current_node = current_node.children[char_index]
33      
34        # Calculate total encoding length by summing leaf node depths
35        return self.dfs(root, 1)
36
37    def dfs(self, current_node: Trie, depth: int) -> int:
38        """
39        Depth-first search to calculate encoding length.
40        Only leaf nodes (words not being suffixes) contribute to the total.
41      
42        Args:
43            current_node: Current trie node being visited
44            depth: Current depth in the trie (includes '#' separator)
45          
46        Returns:
47            Total encoding length for this subtree
48        """
49        is_leaf = True
50        total_length = 0
51      
52        # Check all possible children
53        for i in range(26):
54            if current_node.children[i] is not None:
55                is_leaf = False
56                # Recursively calculate length for child subtree
57                total_length += self.dfs(current_node.children[i], depth + 1)
58      
59        # If this is a leaf node, add its depth to total
60        # (represents a word that needs to be encoded)
61        if is_leaf:
62            total_length += depth
63          
64        return total_length
65

1class Trie {
2    // Array to store 26 possible children (one for each lowercase letter)
3    Trie[] children = new Trie[26];
4}
5
6class Solution {
7    /**
8     * Finds the minimum length of the reference string that encodes all words.
9     * Words that are suffixes of other words can share the same encoding.
10     * 
11     * @param words Array of words to encode
12     * @return Minimum length of encoding string including '#' separators
13     */
14    public int minimumLengthEncoding(String[] words) {
15        // Initialize the root of the Trie
16        Trie root = new Trie();
17      
18        // Build a Trie by inserting each word in reverse order
19        // This allows us to identify words that are suffixes of other words
20        for (String word : words) {
21            Trie currentNode = root;
22          
23            // Traverse the word from right to left (reverse order)
24            for (int i = word.length() - 1; i >= 0; i--) {
25                // Calculate the index for the current character
26                int charIndex = word.charAt(i) - 'a';
27              
28                // Create a new Trie node if it doesn't exist
29                if (currentNode.children[charIndex] == null) {
30                    currentNode.children[charIndex] = new Trie();
31                }
32              
33                // Move to the child node
34                currentNode = currentNode.children[charIndex];
35            }
36        }
37      
38        // Calculate the total encoding length using DFS
39        // Start with length 1 to account for the '#' separator
40        return dfs(root, 1);
41    }
42  
43    /**
44     * Performs depth-first search to calculate the encoding length.
45     * Only leaf nodes contribute to the total encoding length.
46     * 
47     * @param currentNode Current Trie node being processed
48     * @param currentLength Current depth/length from root (including '#')
49     * @return Total encoding length for this subtree
50     */
51    private int dfs(Trie currentNode, int currentLength) {
52        // Flag to check if current node is a leaf
53        boolean isLeaf = true;
54      
55        // Accumulator for the total encoding length
56        int totalLength = 0;
57      
58        // Check all possible children
59        for (int i = 0; i < 26; i++) {
60            if (currentNode.children[i] != null) {
61                // Node has at least one child, so it's not a leaf
62                isLeaf = false;
63              
64                // Recursively calculate encoding length for child subtree
65                totalLength += dfs(currentNode.children[i], currentLength + 1);
66            }
67        }
68      
69        // If this is a leaf node, add its length to the total
70        // Leaf nodes represent complete words that need encoding
71        if (isLeaf) {
72            totalLength += currentLength;
73        }
74      
75        return totalLength;
76    }
77}
78

1struct TrieNode {
2    TrieNode* children[26] = {nullptr};  // Array to store 26 lowercase letter children
3};
4
5class Solution {
6public:
7    int minimumLengthEncoding(vector<string>& words) {
8        // Create the root of the Trie
9        TrieNode* root = new TrieNode();
10      
11        // Build a reverse Trie by inserting each word from end to beginning
12        // This helps identify words that are suffixes of other words
13        for (const string& word : words) {
14            TrieNode* current = root;
15          
16            // Insert characters in reverse order (from last to first)
17            for (int i = word.size() - 1; i >= 0; --i) {
18                int charIndex = word[i] - 'a';  // Convert character to index (0-25)
19              
20                // Create new node if path doesn't exist
21                if (current->children[charIndex] == nullptr) {
22                    current->children[charIndex] = new TrieNode();
23                }
24              
25                // Move to the child node
26                current = current->children[charIndex];
27            }
28        }
29      
30        // Calculate the minimum encoding length by traversing the Trie
31        // Start with length 1 to account for the '#' delimiter
32        return calculateEncodingLength(root, 1);
33    }
34
35private:
36    int calculateEncodingLength(TrieNode* node, int currentLength) {
37        bool isLeafNode = true;
38        int totalLength = 0;
39      
40        // Check all possible children (26 lowercase letters)
41        for (int i = 0; i < 26; ++i) {
42            if (node->children[i] != nullptr) {
43                isLeafNode = false;  // Node has at least one child, so it's not a leaf
44              
45                // Recursively calculate length for the subtree
46                totalLength += calculateEncodingLength(node->children[i], currentLength + 1);
47            }
48        }
49      
50        // If this is a leaf node, it represents the end of a word that needs encoding
51        // Add its length (including the '#' delimiter) to the total
52        if (isLeafNode) {
53            totalLength += currentLength;
54        }
55      
56        return totalLength;
57    }
58};
59

1// Define the TrieNode structure
2interface TrieNode {
3    children: (TrieNode | null)[];  // Array to store 26 lowercase letter children
4}
5
6// Create a new TrieNode
7function createTrieNode(): TrieNode {
8    return {
9        children: new Array(26).fill(null)  // Initialize with 26 null values for a-z
10    };
11}
12
13function minimumLengthEncoding(words: string[]): number {
14    // Create the root of the Trie
15    const root: TrieNode = createTrieNode();
16  
17    // Build a reverse Trie by inserting each word from end to beginning
18    // This helps identify words that are suffixes of other words
19    for (const word of words) {
20        let current: TrieNode = root;
21      
22        // Insert characters in reverse order (from last to first)
23        for (let i = word.length - 1; i >= 0; i--) {
24            const charIndex: number = word.charCodeAt(i) - 'a'.charCodeAt(0);  // Convert character to index (0-25)
25          
26            // Create new node if path doesn't exist
27            if (current.children[charIndex] === null) {
28                current.children[charIndex] = createTrieNode();
29            }
30          
31            // Move to the child node
32            current = current.children[charIndex] as TrieNode;
33        }
34    }
35  
36    // Calculate the minimum encoding length by traversing the Trie
37    // Start with length 1 to account for the '#' delimiter
38    return calculateEncodingLength(root, 1);
39}
40
41function calculateEncodingLength(node: TrieNode, currentLength: number): number {
42    let isLeafNode: boolean = true;
43    let totalLength: number = 0;
44  
45    // Check all possible children (26 lowercase letters)
46    for (let i = 0; i < 26; i++) {
47        if (node.children[i] !== null) {
48            isLeafNode = false;  // Node has at least one child, so it's not a leaf
49          
50            // Recursively calculate length for the subtree
51            totalLength += calculateEncodingLength(node.children[i] as TrieNode, currentLength + 1);
52        }
53    }
54  
55    // If this is a leaf node, it represents the end of a word that needs encoding
56    // Add its length (including the '#' delimiter) to the total
57    if (isLeafNode) {
58        totalLength += currentLength;
59    }
60  
61    return totalLength;
62}
63

Time and Space Complexity

Time Complexity: O(N * M)

Building the Trie: We iterate through all N words, and for each word, we iterate through all its characters (average length M). For each character, we perform constant time operations (checking if child exists, creating new node, moving pointer). This gives us O(N * M) for Trie construction.
DFS traversal: We visit each node in the Trie exactly once. In the worst case, the Trie contains O(N * M) nodes (when all words are completely different). For each node, we check all 26 children (constant time). Therefore, DFS takes O(N * M).
Overall time complexity: O(N * M) + O(N * M) = O(N * M)

Space Complexity: O(N * M)

Trie storage: In the worst case, when all words share no common suffixes (since we're inserting reversed words), we need to store all characters from all words. Each Trie node contains an array of 26 pointers. Maximum nodes needed is O(N * M).
DFS recursion stack: The maximum depth of recursion equals the length of the longest word, which is at most M. This contributes O(M) to space complexity.
Overall space complexity: O(N * M) + O(M) = O(N * M)

where N is the number of words and M is the average/maximum length of the words.

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Not Handling Duplicate Words

A common mistake is assuming all words in the input array are unique. If duplicates exist, inserting the same word multiple times into the Trie doesn't create additional paths, but naive implementations might count them separately or handle them incorrectly.

Example Problem:

words = ["time", "me", "time"]  # "time" appears twice

Solution: Use a set to remove duplicates before processing:

def minimumLengthEncoding(self, words: List[str]) -> int:
    # Remove duplicates first
    words = list(set(words))
    root = Trie()
    # Continue with the rest of the implementation...

2. Incorrect Initial Depth Value

Starting the DFS with depth 0 instead of 1 is a frequent error. The depth should start at 1 because each word in the encoding needs a '#' separator after it.

Wrong:

return self.dfs(root, 0)  # Incorrect - misses the '#' character

Correct:

return self.dfs(root, 1)  # Correct - accounts for '#' separator

3. Building Regular Trie Instead of Suffix Trie

Attempting to solve this problem with a regular (forward) Trie is a conceptual error. The problem requires identifying suffix relationships, which becomes prefix relationships when words are reversed.

Wrong Approach:

# Inserting words forward - WRONG
for word in words:
    current = root
    for char in word:  # Should be word[::-1]
        # This won't identify suffix relationships correctly

Correct Approach:

# Insert reversed words to identify suffixes
for word in words:
    current = root
    for char in word[::-1]:  # Reverse the word
        # Now suffixes become prefixes in the Trie

4. Memory Optimization Overlooked

Creating a full 26-element array for each Trie node can be wasteful if the input contains words with limited character variety. For large datasets with sparse character usage, this approach uses unnecessary memory.

Alternative Solution Using Dictionary:

class Trie:
    def __init__(self):
        self.children = {}  # Use dictionary instead of array
      
# In the insertion logic:
for char in word[::-1]:
    if char not in current_node.children:
        current_node.children[char] = Trie()
    current_node = current_node.children[char]

# In the DFS logic:
for child in current_node.children.values():
    is_leaf = False
    total_length += self.dfs(child, depth + 1)

5. Not Considering Edge Cases

Failing to handle edge cases like empty arrays or single-character words.

Edge Cases to Consider:

# Empty array
words = []  # Should return 0

# Single character words
words = ["a", "b", "a"]  # Should return 4 ("a#b#")

# All words are suffixes of one word
words = ["me", "time", "ime"]  # Should return 5 ("time#")

Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan: