527. Word Abbreviation
Problem Description
The problem requires creating minimal unique abbreviations for an array of distinct words. An abbreviation for a word is a shorthand representation that includes the first letter of the word, a number indicating the count of characters that are in between the first and the last letter, and then the last letter itself (e.g. "internationalization" can be abbreviated to "i18n").
The challenge comes when two words have the same abbreviation. In such a case, we need to find a way to make them unique by increasing the length of the abbreviation incrementally, without defeating the purpose of creating abbreviations, which is to shorten the word. Specifically, we start to increase the abbreviation from the beginning of the word until all abbreviations are unique. However, if at any point, the abbreviation is as long as the original word, then the abbreviation is discarded, and we keep the word as is.
Intuition
The intuition behind the solution is to first track the uniqueness of each abbreviation. As we need to ensure abbreviations are unique across all words, we can use a Trie to keep track of this. A Trie is a tree-like data structure that is often used to store strings. Each node in the Trie would contain links to other nodes corresponding to different characters of the alphabet, and we can also store at each node how many times a certain abbreviation ending in a specific character and having specific length has been seen.
The solution proceeds by inserting each word into the Trie, character by character. As we insert each character, we keep track of the count for the last character and total length at that point in the Trie.
When searching for an abbreviation of a word, we traverse the Trie by each character of the word until we find a unique abbreviation — this means looking for the part of the abbreviation until which it remains unique (the count is 1). Then we create an abbreviation by taking this unique starting substring, the number of characters skipped, and the last letter of the word. If this abbreviation doesn't make the word shorter, we discard it and return the original word.
Essentially, the Trie data structure guides us to find the shortest unique abbreviation for each word, using both insertion and search operations which are efficient in terms of both time and space, as they only go as deep as necessary to ensure uniqueness of the abbreviation.
Solution Approach
The solution approach leverages a Trie data structure, where each node potentially represents a letter in a set of strings. This Trie is specialized with a dictionary or map in each node, which counts how many times a certain abbreviation with a specific final character and length is represented in the Trie. Here is a breakdown of the key components of the solution:
-
Trie Data Structure: Each node of the Trie (
TrieNode
) contains an array of 26 elements, representing each letter of the alphabet, and a dictionary (implemented usingdefaultdict(int)
) that maps a tuple of end character and word length to its frequency. -
Insert Words into the Trie: The
insert
method is used to add words into the Trie. During insertion, the frequency map is updated for each prefix ending with the current character, taking into account the overall length and the final character of the word. -
Searching for Unique Abbreviations: The
search
method takes a word and tries to find its unique abbreviation using the Trie. It traverses the Trie based on each character of the word, collecting characters until it finds a unique abbreviation (where the frequency is 1). It then forms the abbreviation by appending the number of characters skipped if any, and the last character of the word. If the abbreviation turns out to be as long as or longer than the original word, the word is kept unchanged. -
Converting Words to Abbreviations: The
wordsAbbreviation
method creates an instance of[Trie](/problems/trie_intro)
and inserts all given words. It then iterates through the words, callingsearch
to find the minimal abbreviation for each. The result is a list of these minimal abbreviations.
The algorithm works by gradually creating longer abbreviations for the words that share the same shorter abbreviation. It keeps increasing the distinct portion of the abbreviation until all are unique.
The time complexity of this approach is optimal in the sense that each letter in each word is processed exactly once during Trie construction. The search will also traverse no longer than the length of the word. The space complexity involves the storage of all the words within the Trie, alongside the frequency mapping for uniqueness checking.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let’s take a small example with the following distinct words to illustrate how the solution approach works: ["god", "good", "goat", "gold"]. We want to find minimal unique abbreviations for them.
-
Construction of Trie:
We start by inserting each word into the Trie.
-
Insert "god":
Each letter 'g', 'o', 'd' is inserted into the Trie. When inserting 'd', we update our frequency map for the tuple ('d', 1) to 1 since we've encountered this abbreviation "g1d" once.
-
Insert "good":
We proceed similarly with "good". The frequency map is updated for the tuple ('d', 2) to 1 (for abbreviation "g2d").
-
Insert "goat":
For "goat”, "g2t" becomes the abbreviation. The frequency map for ('t', 2) is set to 1.
-
Insert "gold":
Finally, inserting "gold", we encounter a collision with "god" since both would abbreviate to "g2d". Here, the frequency of the tuple ('d', 2) is updated to 2.
The Trie is now constructed with the words and their potential abbreviations along with their frequencies.
-
-
Searching for Unique Abbreviations:
Next, we look for unique abbreviations.
- For "god", the initial abbreviation "g1d" was unique and remains so.
- "gold" now conflicts with "god", so we need an abbreviation with one more character, resulting in "go1d".
- "good" already has a unique abbreviation "g2d".
- "goat" has "g2t", which is also unique.
-
Producing the Final Abbreviations:
Now that we have determined the unique abbreviations for our words, we can produce the final list.
- "god" -> "g1d"
- "good" -> "g2d"
- "goat" -> "g2t"
- "gold" -> "go1d"
In this case, we found unique abbreviations by increasing the length of the abbreviation incrementally until they became distinct. For longer and more complex arrays of words, the approach remains the same, but the Trie might grow in depth and complexity to accommodate all unique distinctions. The Trie data structure efficiently supports the formation of these abbreviations by only needing to update the relevant parts of the structure when necessary and keeping track of the frequencies of each abbreviation tail ('end character', 'length').
Solution Implementation
1from collections import defaultdict
2
3class Trie:
4 def __init__(self):
5 self.children = [None] * 26 # Initialize children for each letter.
6 self.frequency_counter = defaultdict(int) # To store frequencies of the words.
7
8 def insert(self, word):
9 node = self
10 for char in word:
11 index = ord(char) - ord('a') # Convert character to index (0-25).
12 if node.children[index] is None:
13 node.children[index] = Trie() # Create new trie node if it doesn't exist.
14 node = node.children[index]
15 # Increment the frequency for this word's last char and length tuple.
16 node.frequency_counter[(word[-1], len(word))] += 1
17
18 def search(self, word):
19 node = self
20 prefix = [] # To store the prefix of the word.
21 for char in word[:-1]: # Skip the last character of the word.
22 index = ord(char) - ord('a')
23 node = node.children[index]
24 prefix.append(char)
25 # Check if this word's last char and length is unique in this trie node.
26 if node.frequency_counter[(word[-1], len(word))] == 1:
27 break
28 # Calculate how many characters to represent with an abbreviation number.
29 num_to_abbreviate = len(word) - len(prefix) - 1
30 if num_to_abbreviate:
31 prefix.append(str(num_to_abbreviate)) # Add abbreviation number.
32 prefix.append(word[-1]) # Add last character of the word.
33 abbreviation = ''.join(prefix)
34 # Return the abbreviation if it's shorter than the word, otherwise the word itself.
35 return abbreviation if len(abbreviation) < len(word) else word
36
37class Solution:
38 def wordsAbbreviation(self, words):
39 trie = Trie()
40 for word in words:
41 trie.insert(word) # Insert each word into the trie.
42 # Search for the abbreviation of each word in the input list using the trie.
43 return [trie.search(word) for word in words]
44
1import java.util.*;
2
3// Trie node class to hold the trie structure
4class TrieNode {
5 // children array of TrieNode to store references to child nodes
6 TrieNode[] children = new TrieNode[26];
7 // count array to store number of occurrences of each character at the end of the word
8 int[] count = new int[26];
9
10 // Method to insert a word into the trie
11 void insert(String word) {
12 TrieNode node = this;
13 int lastCharIndex = word.charAt(word.length() - 1) - 'a'; // Get index of the last character
14 for (char c : word.toCharArray()) {
15 int index = c - 'a'; // Convert char to index
16 if (node.children[index] == null) {
17 node.children[index] = new TrieNode();
18 }
19 node = node.children[index];
20 node.count[lastCharIndex]++;
21 }
22 }
23
24 // Method to search for the abbreviation of a word in the trie
25 String search(String word) {
26 TrieNode node = this;
27 StringBuilder abbreviation = new StringBuilder();
28 int lastCharIndex = word.charAt(word.length() - 1) - 'a';
29 for (int i = 0; i < word.length() - 1; ++i) {
30 char c = word.charAt(i);
31 node = node.children[c - 'a'];
32 abbreviation.append(c);
33 if (node.count[lastCharIndex] == 1) {
34 break;
35 }
36 }
37 int charsLeft = word.length() - 1 - abbreviation.length();
38 if (charsLeft > 0) {
39 abbreviation.append(charsLeft);
40 }
41 abbreviation.append(word.charAt(word.length() - 1));
42 return abbreviation.length() < word.length() ? abbreviation.toString() : word;
43 }
44}
45
46class Solution {
47 // Method to generate abbreviations for a list of words
48 public List<String> wordsAbbreviation(List<String> words) {
49 Map<Integer, TrieNode> trees = new HashMap<>();
50 // First pass: prepare TrieNodes for each unique word length
51 for (String word : words) {
52 trees.computeIfAbsent(word.length(), k -> new TrieNode());
53 }
54 // Second pass: insert words into corresponding trie based on word length
55 for (String word : words) {
56 trees.get(word.length()).insert(word);
57 }
58 List<String> abbreviations = new ArrayList<>();
59 // Third pass: search for the abbreviation of each word
60 for (String word : words) {
61 abbreviations.add(trees.get(word.length()).search(word));
62 }
63 return abbreviations;
64 }
65}
66
1#include <iostream>
2#include <vector>
3#include <unordered_map>
4#include <string>
5
6// TrieNode class to represent each node in the trie
7class TrieNode {
8public:
9 // Array of pointers to child TrieNodes
10 TrieNode* children[26];
11 // Count array to store the frequency of the last character of the words ending at this node
12 int count[26];
13
14 // Constructor to initialize the TrieNode with null children and zero counts
15 TrieNode() {
16 for (int i = 0; i < 26; ++i) {
17 children[i] = nullptr;
18 count[i] = 0;
19 }
20 }
21
22 // Method to insert a word into the trie
23 void Insert(const std::string& word) {
24 TrieNode* node = this;
25 int lastCharIndex = word.back() - 'a'; // Get index of the last character
26 for (char c : word) {
27 int index = c - 'a';
28 if (node->children[index] == nullptr) {
29 node->children[index] = new TrieNode();
30 }
31 node = node->children[index];
32 node->count[lastCharIndex]++;
33 }
34 }
35
36 // Method to search for the abbreviation of a word in the trie
37 std::string SearchAbbreviation(const std::string& word) {
38 TrieNode* node = this;
39 std::string abbreviation;
40 int lastCharIndex = word.back() - 'a';
41 for (size_t i = 0; i < word.length() - 1; ++i) {
42 char c = word[i];
43 node = node->children[c - 'a'];
44 abbreviation += c;
45 if (node->count[lastCharIndex] == 1) {
46 break;
47 }
48 }
49 int charsLeft = word.length() - 1 - abbreviation.length();
50 if (charsLeft > 0) {
51 abbreviation += std::to_string(charsLeft);
52 }
53 abbreviation += word.back();
54 return abbreviation.length() < word.length() ? abbreviation : word;
55 }
56};
57
58// Solution class to handle the abbreviation process
59class Solution {
60public:
61 // Method to generate abbreviations for a list of words
62 std::vector<std::string> WordsAbbreviation(const std::vector<std::string>& words) {
63 std::unordered_map<int, TrieNode*> trees;
64 // First pass: create TrieNodes for each unique word length
65 for (const std::string& word : words) {
66 trees.emplace(word.length(), new TrieNode());
67 }
68 // Second pass: insert words into the corresponding trie based on word length
69 for (const std::string& word : words) {
70 trees[word.length()]->Insert(word);
71 }
72 std::vector<std::string> abbreviations;
73 // Third pass: search for the abbreviation of each word
74 for (const std::string& word : words) {
75 abbreviations.push_back(trees[word.length()]->SearchAbbreviation(word));
76 }
77 return abbreviations;
78 }
79};
80
81int main() {
82 // Code for testing (not part of the solution)
83 Solution solution;
84 std::vector<std::string> words = {"like", "god", "internal", "me", "internet", "interval", "intension", "face", "intrusion"};
85 std::vector<std::string> abbreviations = solution.WordsAbbreviation(words);
86
87 for (const std::string& abbr : abbreviations) {
88 std::cout << abbr << std::endl;
89 }
90
91 return 0;
92}
93
1// Create a structure for the Trie node
2class TrieNode {
3 // References to child nodes
4 children: TrieNode[] = new Array(26).fill(null);
5 // Store the number of occurrences of each character
6 count: number[] = new Array(26).fill(0);
7
8 // Inserts a word into the trie
9 insert(word: string): void {
10 let node: TrieNode = this;
11 let lastCharIndex: number = word.charCodeAt(word.length - 1) - 'a'.charCodeAt(0);
12 for (let c of word) {
13 let index: number = c.charCodeAt(0) - 'a'.charCodeAt(0);
14 if (node.children[index] === null) {
15 node.children[index] = new TrieNode();
16 }
17 node = node.children[index];
18 // Increment the count for the last character of the word
19 node.count[lastCharIndex]++;
20 }
21 }
22
23 // Searches for the abbreviation of a word in the trie
24 search(word: string): string {
25 let node: TrieNode = this;
26 let abbreviation: string = "";
27 let lastCharIndex: number = word.charCodeAt(word.length - 1) - 'a'.charCodeAt(0);
28 for (let i = 0; i < word.length - 1; ++i) {
29 let c: string = word[i];
30 node = node.children[c.charCodeAt(0) - 'a'.charCodeAt(0)];
31 abbreviation += c;
32 if (node.count[lastCharIndex] === 1) {
33 break;
34 }
35 }
36 let charsLeft: number = word.length - 1 - abbreviation.length;
37 if (charsLeft > 0) {
38 abbreviation += charsLeft.toString();
39 }
40 abbreviation += word[word.length - 1];
41 return abbreviation.length < word.length ? abbreviation : word;
42 }
43}
44
45// Global function to generate abbreviations for a list of words
46function wordsAbbreviation(words: string[]): string[] {
47 let trees: Map<number, TrieNode> = new Map<number, TrieNode>();
48 // Prepare TrieNodes for each unique word length
49 words.forEach(word => {
50 trees.set(word.length, trees.get(word.length) || new TrieNode());
51 });
52 // Insert words into corresponding trie based on word length
53 words.forEach(word => {
54 trees.get(word.length).insert(word);
55 });
56 let abbreviations: string[] = [];
57 // Search for the abbreviation of each word
58 words.forEach(word => {
59 abbreviations.push(trees.get(word.length).search(word));
60 });
61 return abbreviations;
62}
63
Time and Space Complexity
The given Python code defines a Trie
data structure to compress and represent a list of words in an abbreviated form. It includes two main operations: insert
and search
.
Time Complexity
-
Trie Insertion: The
insert
method of theTrie
class inserts words into the trie and updates a counter of occurrences of a particular ending character with the corresponding word length. This operation has a time complexity ofO(L)
for each word, whereL
is the length of the word. As the insert is called for each word in the list containingN
words, the total time complexity for building the trie becomesO(N * L)
. -
Trie Search: The
search
method traverses the trie for each word, again inO(L)
time (due to the same reasoning as above), to find each unique abbreviation. ForN
words, this gives us a total time complexity ofO(N * L)
.
Assuming all words have around the same length, the overall time complexity for processing all N
words would be O(N * L)
.
Space Complexity
-
Trie Storage: The space complexity of a trie depends on the number of nodes and the size of each node. Each node potentially has 26 children (for each letter of the alphabet) and a dictionary to store occurrences of ending characters and word lengths. Given a worst-case scenario where none of the words share a common prefix and all have maximal length
L
, the space complexity would beO(N * L * 26)
for storing the trie. However, in practical scenarios, common prefixes will exist which can reduce this space requirement significantly. -
Occurrences Dictionary: This stores various combinations of ending characters and word lengths. In the worst-case scenario where all words have unique lengths and endings, this would add up to
O(N * L)
space complexity.
Considering the trie storage and the occurrences dictionary, the total space complexity of the trie data structure can be considered O(N * L * 26)
, which simplifies to O(N * L)
since the constant 26 can be dropped in the Big O notation.
In summary, the time complexity is O(N * L)
and space complexity is O(N * L)
for the provided code.
Learn more about how to find time and space complexity quickly using problem constraints.
How many ways can you arrange the three letters A, B and C?
Recommended Readings
Greedy Introduction div class responsive iframe iframe src https www youtube com embed WTslqPbj7I title YouTube video player frameborder 0 allow accelerometer autoplay clipboard write encrypted media gyroscope picture in picture web share allowfullscreen iframe div When do we use greedy Greedy algorithms tend to solve optimization problems Typically they will ask you to calculate the max min of some value Commonly you may see this phrased in the problem as max min longest shortest largest smallest etc These keywords can be identified by just scanning
Trie Introduction Imagine you have a magic dictionary where as soon as you start spelling a word it immediately tells you how many words that start with those letters How cool would that be This is exactly the kind of magic a data structure called trie performs What's a Trie
Sorting Summary Comparisons We presented quite a few sorting algorithms and it is essential to know the advantages and disadvantages of each one The basic algorithms are easy to visualize and easy to learn for beginner programmers because of their simplicity As such they will suffice if you don't know any advanced
Want a Structured Path to Master System Design Too? Don’t Miss This!