527. Word Abbreviation


Problem Description

The problem requires creating minimal unique abbreviations for an array of distinct words. An abbreviation for a word is a shorthand representation that includes the first letter of the word, a number indicating the count of characters that are in between the first and the last letter, and then the last letter itself (e.g. "internationalization" can be abbreviated to "i18n").

The challenge comes when two words have the same abbreviation. In such a case, we need to find a way to make them unique by increasing the length of the abbreviation incrementally, without defeating the purpose of creating abbreviations, which is to shorten the word. Specifically, we start to increase the abbreviation from the beginning of the word until all abbreviations are unique. However, if at any point, the abbreviation is as long as the original word, then the abbreviation is discarded, and we keep the word as is.

Intuition

The intuition behind the solution is to first track the uniqueness of each abbreviation. As we need to ensure abbreviations are unique across all words, we can use a Trie to keep track of this. A Trie is a tree-like data structure that is often used to store strings. Each node in the Trie would contain links to other nodes corresponding to different characters of the alphabet, and we can also store at each node how many times a certain abbreviation ending in a specific character and having specific length has been seen.

The solution proceeds by inserting each word into the Trie, character by character. As we insert each character, we keep track of the count for the last character and total length at that point in the Trie.

When searching for an abbreviation of a word, we traverse the Trie by each character of the word until we find a unique abbreviation — this means looking for the part of the abbreviation until which it remains unique (the count is 1). Then we create an abbreviation by taking this unique starting substring, the number of characters skipped, and the last letter of the word. If this abbreviation doesn't make the word shorter, we discard it and return the original word.

Essentially, the Trie data structure guides us to find the shortest unique abbreviation for each word, using both insertion and search operations which are efficient in terms of both time and space, as they only go as deep as necessary to ensure uniqueness of the abbreviation.

Solution Approach

The solution approach leverages a Trie data structure, where each node potentially represents a letter in a set of strings. This Trie is specialized with a dictionary or map in each node, which counts how many times a certain abbreviation with a specific final character and length is represented in the Trie. Here is a breakdown of the key components of the solution:

  1. Trie Data Structure: Each node of the Trie (TrieNode) contains an array of 26 elements, representing each letter of the alphabet, and a dictionary (implemented using defaultdict(int)) that maps a tuple of end character and word length to its frequency.

  2. Insert Words into the Trie: The insert method is used to add words into the Trie. During insertion, the frequency map is updated for each prefix ending with the current character, taking into account the overall length and the final character of the word.

  3. Searching for Unique Abbreviations: The search method takes a word and tries to find its unique abbreviation using the Trie. It traverses the Trie based on each character of the word, collecting characters until it finds a unique abbreviation (where the frequency is 1). It then forms the abbreviation by appending the number of characters skipped if any, and the last character of the word. If the abbreviation turns out to be as long as or longer than the original word, the word is kept unchanged.

  4. Converting Words to Abbreviations: The wordsAbbreviation method creates an instance of Trie and inserts all given words. It then iterates through the words, calling search to find the minimal abbreviation for each. The result is a list of these minimal abbreviations.

The algorithm works by gradually creating longer abbreviations for the words that share the same shorter abbreviation. It keeps increasing the distinct portion of the abbreviation until all are unique.

The time complexity of this approach is optimal in the sense that each letter in each word is processed exactly once during Trie construction. The search will also traverse no longer than the length of the word. The space complexity involves the storage of all the words within the Trie, alongside the frequency mapping for uniqueness checking.

💪
Level Up Your
Algo Skills

Example Walkthrough

Let’s take a small example with the following distinct words to illustrate how the solution approach works: ["god", "good", "goat", "gold"]. We want to find minimal unique abbreviations for them.

  1. Construction of Trie:

    We start by inserting each word into the Trie.

    • Insert "god":

      Each letter 'g', 'o', 'd' is inserted into the Trie. When inserting 'd', we update our frequency map for the tuple ('d', 1) to 1 since we've encountered this abbreviation "g1d" once.

    • Insert "good":

      We proceed similarly with "good". The frequency map is updated for the tuple ('d', 2) to 1 (for abbreviation "g2d").

    • Insert "goat":

      For "goat”, "g2t" becomes the abbreviation. The frequency map for ('t', 2) is set to 1.

    • Insert "gold":

      Finally, inserting "gold", we encounter a collision with "god" since both would abbreviate to "g2d". Here, the frequency of the tuple ('d', 2) is updated to 2.

    The Trie is now constructed with the words and their potential abbreviations along with their frequencies.

  2. Searching for Unique Abbreviations:

    Next, we look for unique abbreviations.

    • For "god", the initial abbreviation "g1d" was unique and remains so.
    • "gold" now conflicts with "god", so we need an abbreviation with one more character, resulting in "go1d".
    • "good" already has a unique abbreviation "g2d".
    • "goat" has "g2t", which is also unique.
  3. Producing the Final Abbreviations:

    Now that we have determined the unique abbreviations for our words, we can produce the final list.

    • "god" -> "g1d"
    • "good" -> "g2d"
    • "goat" -> "g2t"
    • "gold" -> "go1d"

In this case, we found unique abbreviations by increasing the length of the abbreviation incrementally until they became distinct. For longer and more complex arrays of words, the approach remains the same, but the Trie might grow in depth and complexity to accommodate all unique distinctions. The Trie data structure efficiently supports the formation of these abbreviations by only needing to update the relevant parts of the structure when necessary and keeping track of the frequencies of each abbreviation tail ('end character', 'length').

Python Solution

1from collections import defaultdict
2
3class Trie:
4    def __init__(self):
5        self.children = [None] * 26  # Initialize children for each letter.
6        self.frequency_counter = defaultdict(int)  # To store frequencies of the words.
7
8    def insert(self, word):
9        node = self
10        for char in word:
11            index = ord(char) - ord('a')  # Convert character to index (0-25).
12            if node.children[index] is None:
13                node.children[index] = Trie()  # Create new trie node if it doesn't exist.
14            node = node.children[index]
15            # Increment the frequency for this word's last char and length tuple.
16            node.frequency_counter[(word[-1], len(word))] += 1
17
18    def search(self, word):
19        node = self
20        prefix = []  # To store the prefix of the word.
21        for char in word[:-1]:  # Skip the last character of the word.
22            index = ord(char) - ord('a')
23            node = node.children[index]
24            prefix.append(char)
25            # Check if this word's last char and length is unique in this trie node.
26            if node.frequency_counter[(word[-1], len(word))] == 1:
27                break
28        # Calculate how many characters to represent with an abbreviation number.
29        num_to_abbreviate = len(word) - len(prefix) - 1
30        if num_to_abbreviate:
31            prefix.append(str(num_to_abbreviate))  # Add abbreviation number.
32        prefix.append(word[-1])  # Add last character of the word.
33        abbreviation = ''.join(prefix)
34        # Return the abbreviation if it's shorter than the word, otherwise the word itself.
35        return abbreviation if len(abbreviation) < len(word) else word
36
37class Solution:
38    def wordsAbbreviation(self, words):
39        trie = Trie()
40        for word in words:
41            trie.insert(word)  # Insert each word into the trie.
42        # Search for the abbreviation of each word in the input list using the trie.
43        return [trie.search(word) for word in words]
44

Java Solution

1import java.util.*;
2
3// Trie node class to hold the trie structure
4class TrieNode {
5    // children array of TrieNode to store references to child nodes
6    TrieNode[] children = new TrieNode[26];
7    // count array to store number of occurrences of each character at the end of the word
8    int[] count = new int[26];
9
10    // Method to insert a word into the trie
11    void insert(String word) {
12        TrieNode node = this;
13        int lastCharIndex = word.charAt(word.length() - 1) - 'a'; // Get index of the last character
14        for (char c : word.toCharArray()) {
15            int index = c - 'a'; // Convert char to index
16            if (node.children[index] == null) {
17                node.children[index] = new TrieNode();
18            }
19            node = node.children[index];
20            node.count[lastCharIndex]++;
21        }
22    }
23
24    // Method to search for the abbreviation of a word in the trie
25    String search(String word) {
26        TrieNode node = this;
27        StringBuilder abbreviation = new StringBuilder();
28        int lastCharIndex = word.charAt(word.length() - 1) - 'a';
29        for (int i = 0; i < word.length() - 1; ++i) {
30            char c = word.charAt(i);
31            node = node.children[c - 'a'];
32            abbreviation.append(c);
33            if (node.count[lastCharIndex] == 1) {
34                break;
35            }
36        }
37        int charsLeft = word.length() - 1 - abbreviation.length();
38        if (charsLeft > 0) {
39            abbreviation.append(charsLeft);
40        }
41        abbreviation.append(word.charAt(word.length() - 1));
42        return abbreviation.length() < word.length() ? abbreviation.toString() : word;
43    }
44}
45
46class Solution {
47    // Method to generate abbreviations for a list of words
48    public List<String> wordsAbbreviation(List<String> words) {
49        Map<Integer, TrieNode> trees = new HashMap<>();
50        // First pass: prepare TrieNodes for each unique word length
51        for (String word : words) {
52            trees.computeIfAbsent(word.length(), k -> new TrieNode());
53        }
54        // Second pass: insert words into corresponding trie based on word length
55        for (String word : words) {
56            trees.get(word.length()).insert(word);
57        }
58        List<String> abbreviations = new ArrayList<>();
59        // Third pass: search for the abbreviation of each word
60        for (String word : words) {
61            abbreviations.add(trees.get(word.length()).search(word));
62        }
63        return abbreviations;
64    }
65}
66

C++ Solution

1#include <iostream>
2#include <vector>
3#include <unordered_map>
4#include <string>
5
6// TrieNode class to represent each node in the trie
7class TrieNode {
8public:
9    // Array of pointers to child TrieNodes
10    TrieNode* children[26];
11    // Count array to store the frequency of the last character of the words ending at this node
12    int count[26];
13
14    // Constructor to initialize the TrieNode with null children and zero counts
15    TrieNode() {
16        for (int i = 0; i < 26; ++i) {
17            children[i] = nullptr;
18            count[i] = 0;
19        }
20    }
21
22    // Method to insert a word into the trie
23    void Insert(const std::string& word) {
24        TrieNode* node = this;
25        int lastCharIndex = word.back() - 'a'; // Get index of the last character
26        for (char c : word) {
27            int index = c - 'a';
28            if (node->children[index] == nullptr) {
29                node->children[index] = new TrieNode();
30            }
31            node = node->children[index];
32            node->count[lastCharIndex]++;
33        }
34    }
35
36    // Method to search for the abbreviation of a word in the trie
37    std::string SearchAbbreviation(const std::string& word) {
38        TrieNode* node = this;
39        std::string abbreviation;
40        int lastCharIndex = word.back() - 'a';
41        for (size_t i = 0; i < word.length() - 1; ++i) {
42            char c = word[i];
43            node = node->children[c - 'a'];
44            abbreviation += c;
45            if (node->count[lastCharIndex] == 1) {
46                break;
47            }
48        }
49        int charsLeft = word.length() - 1 - abbreviation.length();
50        if (charsLeft > 0) {
51            abbreviation += std::to_string(charsLeft);
52        }
53        abbreviation += word.back();
54        return abbreviation.length() < word.length() ? abbreviation : word;
55    }
56};
57
58// Solution class to handle the abbreviation process
59class Solution {
60public:
61    // Method to generate abbreviations for a list of words
62    std::vector<std::string> WordsAbbreviation(const std::vector<std::string>& words) {
63        std::unordered_map<int, TrieNode*> trees;
64        // First pass: create TrieNodes for each unique word length
65        for (const std::string& word : words) {
66            trees.emplace(word.length(), new TrieNode());
67        }
68        // Second pass: insert words into the corresponding trie based on word length
69        for (const std::string& word : words) {
70            trees[word.length()]->Insert(word);
71        }
72        std::vector<std::string> abbreviations;
73        // Third pass: search for the abbreviation of each word
74        for (const std::string& word : words) {
75            abbreviations.push_back(trees[word.length()]->SearchAbbreviation(word));
76        }
77        return abbreviations;
78    }
79};
80
81int main() {
82    // Code for testing (not part of the solution)
83    Solution solution;
84    std::vector<std::string> words = {"like", "god", "internal", "me", "internet", "interval", "intension", "face", "intrusion"};
85    std::vector<std::string> abbreviations = solution.WordsAbbreviation(words);
86
87    for (const std::string& abbr : abbreviations) {
88        std::cout << abbr << std::endl;
89    }
90
91    return 0;
92}
93

Typescript Solution

1// Create a structure for the Trie node
2class TrieNode {
3    // References to child nodes
4    children: TrieNode[] = new Array(26).fill(null);
5    // Store the number of occurrences of each character
6    count: number[] = new Array(26).fill(0);
7
8    // Inserts a word into the trie
9    insert(word: string): void {
10        let node: TrieNode = this;
11        let lastCharIndex: number = word.charCodeAt(word.length - 1) - 'a'.charCodeAt(0);
12        for (let c of word) {
13            let index: number = c.charCodeAt(0) - 'a'.charCodeAt(0);
14            if (node.children[index] === null) {
15                node.children[index] = new TrieNode();
16            }
17            node = node.children[index];
18            // Increment the count for the last character of the word
19            node.count[lastCharIndex]++;
20        }
21    }
22
23    // Searches for the abbreviation of a word in the trie
24    search(word: string): string {
25        let node: TrieNode = this;
26        let abbreviation: string = "";
27        let lastCharIndex: number = word.charCodeAt(word.length - 1) - 'a'.charCodeAt(0);
28        for (let i = 0; i < word.length - 1; ++i) {
29            let c: string = word[i];
30            node = node.children[c.charCodeAt(0) - 'a'.charCodeAt(0)];
31            abbreviation += c;
32            if (node.count[lastCharIndex] === 1) {
33                break;
34            }
35        }
36        let charsLeft: number = word.length - 1 - abbreviation.length;
37        if (charsLeft > 0) {
38            abbreviation += charsLeft.toString();
39        }
40        abbreviation += word[word.length - 1];
41        return abbreviation.length < word.length ? abbreviation : word;
42    }
43}
44
45// Global function to generate abbreviations for a list of words
46function wordsAbbreviation(words: string[]): string[] {
47    let trees: Map<number, TrieNode> = new Map<number, TrieNode>();
48    // Prepare TrieNodes for each unique word length
49    words.forEach(word => {
50        trees.set(word.length, trees.get(word.length) || new TrieNode());
51    });
52    // Insert words into corresponding trie based on word length
53    words.forEach(word => {
54        trees.get(word.length).insert(word);
55    });
56    let abbreviations: string[] = [];
57    // Search for the abbreviation of each word
58    words.forEach(word => {
59        abbreviations.push(trees.get(word.length).search(word));
60    });
61    return abbreviations;
62}
63

Time and Space Complexity

The given Python code defines a Trie data structure to compress and represent a list of words in an abbreviated form. It includes two main operations: insert and search.

Time Complexity

  • Trie Insertion: The insert method of the Trie class inserts words into the trie and updates a counter of occurrences of a particular ending character with the corresponding word length. This operation has a time complexity of O(L) for each word, where L is the length of the word. As the insert is called for each word in the list containing N words, the total time complexity for building the trie becomes O(N * L).

  • Trie Search: The search method traverses the trie for each word, again in O(L) time (due to the same reasoning as above), to find each unique abbreviation. For N words, this gives us a total time complexity of O(N * L).

Assuming all words have around the same length, the overall time complexity for processing all N words would be O(N * L).

Space Complexity

  • Trie Storage: The space complexity of a trie depends on the number of nodes and the size of each node. Each node potentially has 26 children (for each letter of the alphabet) and a dictionary to store occurrences of ending characters and word lengths. Given a worst-case scenario where none of the words share a common prefix and all have maximal length L, the space complexity would be O(N * L * 26) for storing the trie. However, in practical scenarios, common prefixes will exist which can reduce this space requirement significantly.

  • Occurrences Dictionary: This stores various combinations of ending characters and word lengths. In the worst-case scenario where all words have unique lengths and endings, this would add up to O(N * L) space complexity.

Considering the trie storage and the occurrences dictionary, the total space complexity of the trie data structure can be considered O(N * L * 26), which simplifies to O(N * L) since the constant 26 can be dropped in the Big O notation.

In summary, the time complexity is O(N * L) and space complexity is O(N * L) for the provided code.

😈
Become an
Algo Monster

Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫