616. Add Bold Tag in String

MediumTrieArrayHash TableStringString Matching

Problem Description

You are provided with two inputs:

A string s, which is the text you are working on.
An array of strings words, which are the substrings you need to find in s.

Your task is to add HTML-like bold tags ( and ) around the parts of s that are exactly present in words. There are some rules to follow while adding these bold tags:

If the substrings in s that match any of the words overlap, you should wrap this entire overlapped section in a single set of bold tags.
If there are consecutive substrings already wrapped in bold tags, they need to be merged into a single set of bold tags.

Your goal is to return the modified string s with the appropriate bold tags added.

Intuition

The intuition behind the solution is to efficiently identify all the substrings in s that match any of the words in words and then determine how to add bold tags around these substrings according to the specified rules.

To achieve this, a Trie data structure can be utilized to store all the words. Trie is particularly useful for matching prefices of strings which is necessary since we are looking to find substrings in s that match any in words.

After populating the Trie with all the words, we then iterate through each character in the string s, using the Trie to check for matches as we extend our current substring character by character. Each time we find a matching end of a word in words, we store this as a pair of indices marking the start and end of the bold tag.

Once we have all the index pairs, we need to merge overlapping ranges to comply with the given requirements. We initialize a new list that will hold the merged index ranges and iterate over the pairs, merging them if necessary.

Finally, we go through the merged index ranges, inserting the bold tags into the correct positions. We ensure that we only add non-bolded parts of s first, add an opening bold tag, append the bolded part, and then append a closing bold tag until all bolded ranges are processed.

The implementation uses a list to keep track of the modified string's parts and then joins them at the end to form the final result with the bold tags in place.

Learn more about Trie patterns.

Solution Approach

The solution consists of several steps, utilizing a Trie data structure, list manipulation, and string manipulation. Below is an explanation of the implementation steps:

Building the Trie: The Trie class is defined, which has an insert method allowing us to insert a word into the Trie. Each node has a fixed size array of child pointers to cover all the possible ASCII characters (128 in this case). The is_end boolean indicates whether the path to the node represents the end of a word. We iterate over all the words and insert them into the Trie.
Searching for Matching Substrings: The addBoldTag function iterates over each character in the string s. It does this by fixing a start point and extending the end point character by character, checking if the current substring is in the Trie. If a match is found (signified by is_end being True), the indices (start, end) are added to the pairs list.
Merging Overlapping and Consecutive Ranges: Before wrapping substrings in bold tags, we first need to merge overlapping or consecutive index ranges. The pairs list containing start and end index pairs is iterated over, and if the current end index is greater than or equal to the start index of the next pair, the ranges are merged. This is achieved by updating the current end index to be the maximum of the current end index and the end index of the next pair.
Building the Output String with Bold Tags: The ans list will contain parts of the final string, including strings without bold tags and strings within bold tags. We iterate over the indices of the string s, checking against the merged index list t. If the current index i falls in a bolded range, we add the bold tags and the bolded substring to ans. If i is before the next bolded range, we add the non-bolded substring. This process continues until all characters have been processed or until the end of the t list is reached.
Returning the Final Result: Finally, we join all the parts in the ans list into a single string, which now contains the required bold tags, and return it.

The code is efficient since it identifies and processes overlaps in a single pass after Trie construction, and building the string from the list of parts is done in O(n) time, where n is the length of the input string s.

Ready to land your dream job?

Unlock your dream job with a 2-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's use a small example to illustrate the solution approach.

Suppose we have the string s:

"abcxyz123"

And the array of strings words:

["abc","123"]

Here are the steps we would follow based on the solution approach:

Building the Trie: First, we create a Trie and insert the words "abc" and "123" into it. After this, our Trie will have two paths, one ending in 'c' and another ending in '3', each marked as the endpoint of a word.
Searching for Matching Substrings: We then iterate over each character in s. Starting at 'a', we look into the Trie and find that "abc" is a word that ends here. So, we add the pair (0, 2) to our list of pairs because the substring from the index 0 to index 2 matches a word in words. We perform similar checks for other indices and find that the substring "123" starts at index 6 and ends at index 8, so we add the pair (6,8) to our list.
Merging Overlapping and Consecutive Ranges: In this simple example, there are no overlapping or consecutive ranges in the pairs list. If there were, we would merge them into a single range. For illustration, if words also included "xyz", the pair list would be [(0, 2), (3, 5), (6, 8)], and we would merge (0, 2) and (3, 5) into (0, 5).
Building the Output String with Bold Tags: Next, we build the output string. We start from index 0, and since it's the start of a bold range, we insert  before "abc", and because (0, 2) marks the end of the bold segment, we insert  after "c". We do the same for the "123" substring starting at index 6 and insert  after "3".
Returning the Final Result: Lastly, we combine all parts together. After inserting the necessary bold tags, the resulting string parts are ["abc", "xyz", "123"]. We join these parts to get the final result: "abcxyz123".

The final output is a string with the substrings "abc" and "123" wrapped in bold tags since they match words from the words array. The sequence within the string has been kept, and tags have been appropriately placed without any overlaps or unnecessary tag insertions, following the described solution's approach.

Solution Implementation

1class TrieNode:
2    def __init__(self):
3        # Initialize a list of children of size 128 for all ASCII characters
4        self.children = [None] * 128
5        # Flag indicating the end of a word
6        self.is_end = False
7
8    def insert(self, word):
9        # Insert a word into the Trie
10        node = self
11        for char in word:
12            index = ord(char)  # Get ASCII value to find the index in children
13            if node.children[index] is None:
14                node.children[index] = TrieNode()
15            node = node.children[index]
16        node.is_end = True  # Mark the end of the word
17
18      
19class Solution:
20    def addBoldTag(self, string: str, words: list[str]) -> str:
21        trie = TrieNode()
22        # Insert every word into the trie
23        for word in words:
24            trie.insert(word)
25          
26        length = len(string)
27        bold_pairs = []  # To store start and end indices of substrings
28      
29        # Find the substrings that are present in Trie and store their start and end indices
30        for start in range(length):
31            node = trie
32            for end in range(start, length):
33                index = ord(string[end])
34                if node.children[index] is None:
35                    break
36                node = node.children[index]
37                if node.is_end:
38                    bold_pairs.append([start, end])
39
40        # If there's no substring found, return the original string
41        if not bold_pairs:
42            return string
43      
44        # Merge overlapping or contiguous intervals
45        merged_pairs = []
46        current_start, current_end = bold_pairs[0]
47        for start, end in bold_pairs[1:]:
48            if current_end + 1 < start:
49                merged_pairs.append([current_start, current_end])
50                current_start, current_end = start, end
51            else:
52                current_end = max(current_end, end)
53        merged_pairs.append([current_start, current_end])
54      
55        # Build the result string with bold tags
56        result = []
57        index = 0  # Keep track of the current position in the string
58      
59        for start, end in merged_pairs:
60            # Append the non-bold part of the string
61            if index < start:
62                result.append(string[index:start])
63            # Append the bold tag and the bold part of the string
64            result.append('<b>')
65            result.append(string[start:end + 1])
66            result.append('</b>')
67            index = end + 1  # Move the current position past the bold part
68      
69        # Append the remaining part of the string, if any
70        if index < length:
71            result.append(string[index:])
72      
73        # Join all parts together to form the final string
74        return ''.join(result)
75

1class Trie {
2    // Trie node array covering ASCII characters
3    Trie[] children = new Trie[128]; 
4    // Flag to indicate the end of a word
5    boolean isEndOfWord; 
6
7    // Insert a word into the trie
8    public void insert(String word) {
9        Trie node = this;
10        for (char ch : word.toCharArray()) {
11            if (node.children[ch] == null) {
12                node.children[ch] = new Trie();
13            }
14            node = node.children[ch];
15        }
16        node.isEndOfWord = true;
17    }
18}
19
20class Solution {
21  
22    // Function to add bold tags around substrings in 's' that appear in 'words'
23    public String addBoldTag(String s, String[] words) {
24        Trie trie = new Trie();
25        // Insert all words into the trie
26        for (String word : words) {
27            trie.insert(word);
28        }
29      
30        List<int[]> intervals = new ArrayList<>();
31        int length = s.length();
32      
33        // Find all substrings in 's' that are a prefix in the trie
34        for (int i = 0; i < length; ++i) {
35            Trie node = trie;
36            for (int j = i; j < length; ++j) {
37                char currentChar = s.charAt(j);
38                if (node.children[currentChar] == null) {
39                    break;
40                }
41                node = node.children[currentChar];
42                // If a word ends here, add the interval [i, j] to the list
43                if (node.isEndOfWord) {
44                    intervals.add(new int[] {i, j});
45                }
46            }
47        }
48      
49        // If there are no intervals, return the original string
50        if (intervals.isEmpty()) {
51            return s;
52        }
53      
54        // Merge overlapping and consecutive intervals
55        List<int[]> mergedIntervals = new ArrayList<>();
56        int start = intervals.get(0)[0], end = intervals.get(0)[1];
57        for (int i = 1; i < intervals.size(); ++i) {
58            int currentStart = intervals.get(i)[0], currentEnd = intervals.get(i)[1];
59            if (end + 1 < currentStart) {
60                mergedIntervals.add(new int[] {start, end});
61                start = currentStart;
62                end = currentEnd;
63            } else {
64                end = Math.max(end, currentEnd);
65            }
66        }
67        // Add the last interval
68        mergedIntervals.add(new int[] {start, end});
69
70        // Construct the final string with <b> tags
71        StringBuilder result = new StringBuilder();
72        int currentIndex = 0, mergedIndex = 0;
73
74        while (currentIndex < length) {
75            if (mergedIndex == mergedIntervals.size()) {
76                result.append(s.substring(currentIndex));
77                break;
78            }
79            start = mergedIntervals.get(mergedIndex)[0];
80            end = mergedIntervals.get(mergedIndex)[1];
81          
82            // Add non-bold part of the string
83            if (currentIndex < start) {
84                result.append(s.substring(currentIndex, start));
85            }
86          
87            // Move to the next interval
88            ++mergedIndex;
89          
90            // Enclose the substring within <b> tags
91            result.append("<b>")
92                  .append(s.substring(start, end + 1))
93                  .append("</b>");
94          
95            // Update the currentIndex to the end of the bold section
96            currentIndex = end + 1;
97        }
98
99        return result.toString();
100    }
101}
102

1// Trie node class
2class Trie {
3public:
4    vector<Trie*> children; // Children of the current node
5    bool isEndOfWord; // Flag to check if the node represents end of a valid word
6
7    // Constructor initializes the Trie node
8    Trie() : children(128, nullptr), isEndOfWord(false) {}
9
10    // Insert a word into the Trie
11    void insert(const string& word) {
12        Trie* node = this;
13        for (char character : word) {
14            if (node->children[character] == nullptr) {
15                node->children[character] = new Trie();
16            }
17            node = node->children[character];
18        }
19        node->isEndOfWord = true;
20    }
21};
22
23class Solution {
24public:
25    // Method to add bold tags around substrings in 's' that appear in 'words'
26    string addBoldTag(string s, const vector<string>& words) {
27        Trie* trie = new Trie();
28        // Build the Trie from the list of words
29        for (const string& word : words) {
30            trie->insert(word);
31        }
32        int strLength = s.size();
33        vector<pair<int, int>> intervals; // Pairs to remember the start and end indices for bold tags
34      
35        // Find all substrings of 's' that appear in 'words'
36        for (int i = 0; i < strLength; ++i) {
37            Trie* node = trie;
38            for (int j = i; j < strLength; ++j) {
39                char currentChar = s[j];
40                if (!node->children[currentChar]) break; // No further match
41                node = node->children[currentChar];
42                if (node->isEndOfWord) {
43                    intervals.emplace_back(i, j); // Found a substring that is in words
44                }
45            }
46        }
47        if (intervals.empty()) return s; // No substrings found, return the original string
48      
49        // Merge overlapping intervals
50        vector<pair<int, int>> mergedIntervals;
51        int start = intervals[0].first, end = intervals[0].second;
52        for (int i = 1; i < intervals.size(); ++i) {
53            int currentStart = intervals[i].first, currentEnd = intervals[i].second;
54            if (end + 1 < currentStart) {
55                mergedIntervals.emplace_back(start, end);
56                start = currentStart;
57                end = currentEnd;
58            } else {
59                end = max(end, currentEnd);
60            }
61        }
62        mergedIntervals.emplace_back(start, end); // Add the last interval
63      
64        // Construct the result by inserting <b> and </b> tags
65        string result;
66        int currentIndex = 0, intervalIndex = 0;
67        while (currentIndex < strLength) {
68            if (intervalIndex == mergedIntervals.size()) {
69                result += s.substr(currentIndex); // Add the remainder of the string
70                break;
71            }
72            start = mergedIntervals[intervalIndex].first;
73            end = mergedIntervals[intervalIndex].second;
74            if (currentIndex < start) {
75                result += s.substr(currentIndex, start - currentIndex); // Add non-bold part
76            }
77            result += "<b>" + s.substr(start, end - start + 1) + "</b>"; // Add bold part
78            currentIndex = end + 1; // Move past the bold part
79            ++intervalIndex; // Move to the next interval
80        }
81        return result;
82    }
83};
84

1// Define a TrieNode type with children and isEndOfWord properties
2type TrieNode = {
3    children: (TrieNode | null)[];
4    isEndOfWord: boolean;
5};
6
7// Initializer function to create a new TrieNode
8function createTrieNode(): TrieNode {
9    return {
10        children: Array(128).fill(null), // Create array with 128 slots initialized to null
11        isEndOfWord: false // Flag for end of a word is initially false
12    };
13}
14
15// Function to insert a word into the Trie
16function insertWord(root: TrieNode, word: string): void {
17    let node: TrieNode = root;
18    for (const char of word) {
19        const charCode: number = char.charCodeAt(0);
20        if (node.children[charCode] === null) {
21            node.children[charCode] = createTrieNode(); // Create new TrieNode if it doesn't exist
22        }
23        node = node.children[charCode] as TrieNode;
24    }
25    node.isEndOfWord = true; // Set the flag at the end of word
26}
27
28// Method to add bold tags around substrings in 'text' that appear in 'dictionaryWords'
29function addBoldTag(text: string, dictionaryWords: string[]): string {
30    const root: TrieNode = createTrieNode(); // Create the root of the Trie
31    // Build the Trie from the list of words
32    for (const word of dictionaryWords) {
33        insertWord(root, word);
34    }
35    const textLength: number = text.length;
36    let intervals: Array<[number, number]> = []; // Pairs to remember the start and end indices for bold tags
37
38    // Find all substrings of 'text' that appear in 'dictionaryWords'
39    for (let i = 0; i < textLength; ++i) {
40        let node: TrieNode | null = root;
41        for (let j = i; j < textLength; ++j) {
42            const currentChar: number = text.charCodeAt(j);
43            if (node.children[currentChar] === null) break; // No further match
44            node = node.children[currentChar] as TrieNode;
45            if (node.isEndOfWord) {
46                intervals.push([i, j]); // Found a substring that is within dictionaryWords
47            }
48        }
49    }
50    if (intervals.length === 0) return text; // No substrings found, return the original string
51
52    // Merge overlapping intervals
53    let mergedIntervals: Array<[number, number]> = [];
54    let [start, end] = intervals[0];
55
56    for (let i = 1; i < intervals.length; ++i) {
57        let [currentStart, currentEnd] = intervals[i];
58        if (end + 1 < currentStart) {
59            mergedIntervals.push([start, end]);
60            start = currentStart; // Start of new interval
61            end = currentEnd; // End of new interval
62        } else {
63            end = Math.max(end, currentEnd); // Extend the end if overlapping
64        }
65    }
66    mergedIntervals.push([start, end]); // Add the last interval
67
68    // Construct the result by inserting <b> and </b> tags
69    let result: string = '';
70    let currentIndex: number = 0;
71    let intervalIndex: number = 0;
72
73    while (currentIndex < textLength) {
74        if (intervalIndex === mergedIntervals.length) {
75            result += text.substring(currentIndex); // Add the remainder of the string
76            break;
77        }
78        [start, end] = mergedIntervals[intervalIndex];
79        if (currentIndex < start) {
80            result += text.substring(currentIndex, start); // Add non-bold part
81        }
82        result += `<b>${text.substring(start, end + 1)}</b>`; // Add bold part
83        currentIndex = end + 1; // Move past the bold part
84        intervalIndex++; // Move to the next interval
85    }
86
87    return result;
88}
89

Time and Space Complexity

Time Complexity

The time complexity of building the Trie is O(W * L) where W is the number of words in the input list words and L is the maximum length of a word since we iterate through each character in every word.

For searching all the substrings in s, in the worst case, every substring starting from i to the end of s could be a potential match. This results in a time complexity of O(N^2) where N is the length of the string s.

The merging of intervals has a complexity of O(M) where M is the total number of intervals before merging. In the worst case, M could be O(N) since there could be overlapping intervals for every pair of characters in the string s.

Finally, building the resultant string with bold tags involves iterating over the merged intervals and the string s. The complexity for this step is O(N).

Combining all the steps, the total time complexity is O(W * L + N^2 + M + N), which simplifies to O(W * L + N^2) given that M is at most N.

Space Complexity

The space complexity of the Trie is O(AL) where A is the size of the character set and L is the maximum length of a word due to the space taken by nodes of the Trie.

The space required for the list of intervals (pairs) and the merged list (t) is at most O(N) since there can be at most one interval per character.

Then, there is additional space used by the resultant ans list. In the worst case, when every character in s is within a bold tag, we would need space for the characters of s as well as the  and  tags, resulting in a space complexity of O(N).

Therefore, the total space complexity is O(AL + N + N), which is O(AL + N) because A is a constant (128 in this case), and L is typically much smaller than N.

Learn more about how to find time and space complexity quickly using problem constraints.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Consider the classic dynamic programming of longest increasing subsequence:

Find the length of the longest subsequence of a given sequence such that all elements of the subsequence are sorted in increasing order.

For example, the length of LIS for [50, 3, 10, 7, 40, 80] is 4 and LIS is [3, 7, 40, 80].

What is the recurrence relation?

dp[i] = dp[i] + 1

dp[i] = dp[i] + dp[i - 1]

dp[i] = (dp[i] + dp[j] + 1) for j in 0 to i

dp[i] = max(dp[i], dp[j] + 1) for j in 0 to i

I don't know