758. Bold Words in String

MediumTrieArrayHash TableStringString Matching
Leetcode Link

Problem Description

Given an array of keywords words and a string s, your task is to make all appearances of any keyword words[i] in the string s bold. In HTML, text is made bold by wrapping it with <b> and </b> tags. You need to return the string s after adding the bold tags such that any letters between these tags are bold in the HTML sense. The string returned must use the fewest number of tags necessary, and the tags should form a valid markup without overlap or redundant tags.

Intuition

To solve this problem effectively, we need to perform a couple of different steps in sequence. First, we need to be able to find all appearances of each keyword in the original string. One efficient way to do this is through the use of a trie, which is a tree-like data structure that can store a dynamic set or associative array where the keys are usually strings.

The intuition behind using a trie is that it allows us to quickly and efficiently search for multiple keywords in a given input string. By inserting all the keywords into the trie and then parsing the input string while traversing the trie, we can identify the start and end points of each keyword occurrence.

Once we've found these occurrences, the next step is to merge any overlapping or contiguous intervals of bold text so that we can minimize the number of bold tags used. We process the intervals such that when we find overlapping or contiguous intervals, we merge them into a single interval by extending the end to the maximum end among the overlapping intervals.

Finally, with all intervals determined and merged as needed, we construct the resulting string. We include the correct text segments inside bold tags while ensuring that non-bold text remains unchanged. Care is taken to avoid overlapping tags and to ensure the final string is a valid HTML string with bold tags only where necessary.

The provided code implements this approach effectively, using a class for the trie structure, inserting keywords, finding matches in the input string, merging intervals, and building the final string with appropriate tags.

Solution Approach

The implementation of the solution follows a multi-step approach leveraging the Trie data structure, array manipulations, and string concatenation.

Firstly, a Trie class is created with operations to insert words into the trie. Each node of the trie consists of an array of children (to store subsequent characters of words) and a boolean flag is_end to mark the end of a word.

Insertion into the trie is straightforward: for each character c in a word, check if the character exists as a child node. If not, create a new Trie node. On reaching the end of the word, set is_end to True.

To find occurrences of keywords in s, we iterate through each index i of s, using nested loops to check for keywords starting at that index. For each character s[j], where j is from i to the end of the string, we use the trie to check if the character sequence matches any word. If the trie indicates that we reach the end of a word, add the interval [i, j] to the array pairs.

After locating all matching intervals, the next operation is to merge overlapping or adjacent intervals. This ensures that we do not insert duplicate or unnecessary bold tags into the final answer. To do this, a temporary list t is created, which holds the merged intervals. We loop through pairs, check if the current interval overlaps with the last added interval in t, and accordingly merge or append intervals.

The final step is string construction with the proper bold tags. We create the ans list and loop through the string s and the merged intervals in t. We append regular text to ans and then for each interval [st, ed] in t, append the bold tags and the bolded text to ans. After appending the last interval and any remaining text, we join all parts from ans into a single string using ''.join(ans) and return it.

The algorithm uses efficient data structures (trie for keyword matching and a list for interval tracking) and minimizes the use of string concatenation by using a list to build the final result, which is later converted into a string only once. The complexity of the algorithm is dependent on the length of s and the number of keywords to be searched, making it efficient for multiple keyword searches in a large text.

💪
Level Up Your
Algo Skills

Example Walkthrough

Let's walk through a simplified example to illustrate the solution approach. Suppose we have the following array of keywords and string s:

  • Keywords: ["ab", "bc"]
  • String s: "aabc"

Step 1: We create a trie and insert the keywords "ab" and "bc" into it.

1Root
2 |
3 a - b (is_end=True)
4 |
5 b - c (is_end=True)

Step 2: We start at each character in s and look for keywords.

  • Starting at index 0 "a", we find nothing.
  • Starting at index 1 "a", we traverse to "ab" and find a match, adding the interval [1, 2] to our pairs.
  • Starting at index 2 "b", we find a match with "bc", adding the interval [2, 3] to our pairs.

The pairs array now contains [[1,2], [2,3]].

Step 3: Merge overlapping or adjacent intervals.

  • The interval [1,2] and [2,3] are adjacent. They can be merged to form [1,3].

After merging, we have just one interval [1,3].

Step 4: Construct the final string with bold tags.

  • We iterate through s and the merged intervals.
  • Before reaching the first interval, we add "a" to ans.
  • We encounter interval [1,3] and add bold tags and text, resulting in ans being ["a", "<b>abc</b>"].

Step 5: Join and return the final result.

  • We join ans into a single string "a<b>abc</b>" and return it.

The final output for the input given is "a<b>abc</b>", which correctly encloses the appearances of the keywords "ab" and "bc" in bold tags while using the fewest number of tags necessary.

Python Solution

1from typing import List
2
3class TrieNode:
4    def __init__(self):
5        self.children = [None] * 128  # All ASCII characters
6        self.is_end_of_word = False  # Flag to mark the end of a word
7
8    def insert(self, word: str) -> None:
9        # Insert a word into the trie
10        node = self
11        for char in word:
12            index = ord(char)  # Get ASCII value for the character
13            if node.children[index] is None:
14                node.children[index] = TrieNode()  # Create a new node if necessary
15            node = node.children[index]
16        node.is_end_of_word = True  # Mark the end of the word
17
18
19class Solution:
20    def boldWords(self, words: List[str], s: str) -> str:
21        # Initialize the Trie and insert all the words
22        trie = TrieNode()
23        for word in words:
24            trie.insert(word)
25
26        n = len(s)
27        pairs = []  # To hold start and end indexes of bold areas
28
29        # Find all occurrences of words in 's'
30        for start in range(n):
31            node = trie
32            for end in range(start, n):
33                index = ord(s[end])
34                if node.children[index] is None:
35                    break
36                node = node.children[index]
37                if node.is_end_of_word:
38                    pairs.append([start, end])
39
40        # If no matches are found, return the original string
41        if not pairs:
42            return s
43
44        # Merge the overlapping intervals
45        merged_intervals = []
46        start, end = pairs[0]
47        for interval_start, interval_end in pairs[1:]:
48            if end + 1 < interval_start:
49                merged_intervals.append([start, end])
50                start, end = interval_start, interval_end
51            else:
52                end = max(end, interval_end)
53        merged_intervals.append([start, end])
54
55        # Build the final string with bold tags
56        answer = []
57        index = 0
58        for start, end in merged_intervals:
59            # Add non-bold part of string
60            if index < start:
61                answer.append(s[index:start])
62            # Add the bold tag and the bold part of string
63            answer.append('<b>' + s[start:end + 1] + '</b>')
64            # Update the index to the end of the bold part
65            index = end + 1
66
67        # Add any remaining non-bold part of the string
68        if index < n:
69            answer.append(s[index:])
70
71        # Join all parts to form the final string
72        return ''.join(answer)
73
74# Example usage:
75# solution = Solution()
76# print(solution.boldWords(["ab", "bc"], "aabcd"))  # Output should be: "<b>aab</b>cd"
77

Java Solution

1class Trie {
2    Trie[] children = new Trie[128]; // A large array to accommodate all ASCII characters
3    boolean isEndOfWord; // Indicates the end of a word
4
5    // Inserts a word into the Trie
6    public void insert(String word) {
7        Trie node = this;
8        for (char ch : word.toCharArray()) {
9            if (node.children[ch] == null) {
10                node.children[ch] = new Trie(); // Initialize the Trie node for the character if it doesn't exist
11            }
12            node = node.children[ch]; // Move to the child node
13        }
14        node.isEndOfWord = true; // Mark the end of a valid word
15    }
16}
17
18class Solution {
19    public String boldWords(String[] words, String s) {
20        Trie trie = new Trie();
21        // Insert all pattern words into the Trie
22        for (String word : words) {
23            trie.insert(word);
24        }
25      
26        List<int[]> intervals = new ArrayList<>();
27        int length = s.length();
28      
29        // Go through the string to find all substrings that match words in the Trie
30        for (int i = 0; i < length; ++i) {
31            Trie node = trie;
32            for (int j = i; j < length; ++j) {
33                char currentChar = s.charAt(j);
34                if (node.children[currentChar] == null) {
35                    break; // Stop if there is no matching character in Trie
36                }
37              
38                node = node.children[currentChar];
39              
40                // If a word ends here, add the interval to the list
41                if (node.isEndOfWord) {
42                    intervals.add(new int[] {i, j});
43                }
44            }
45        }
46      
47        // If no words to be bolded are found, return the original string
48        if (intervals.isEmpty()) {
49            return s;
50        }
51      
52        // Merge overlapping and adjacent intervals
53        List<int[]> mergedIntervals = new ArrayList<>();
54        int start = intervals.get(0)[0], end = intervals.get(0)[1];
55        for (int i = 1; i < intervals.size(); ++i) {
56            int newStart = intervals.get(i)[0], newEnd = intervals.get(i)[1];
57            if (end + 1 < newStart) {
58                // If the current end is before the new start, add the interval and update to the new one
59                mergedIntervals.add(new int[] {start, end});
60                start = newStart;
61            }
62            end = Math.max(end, newEnd); // Extend the current interval to include the new end
63        }
64        // Add the last merged interval
65        mergedIntervals.add(new int[] {start, end});
66
67        // Build the resulting string with bold tags
68        StringBuilder result = new StringBuilder();
69        int currentIndex = 0, intervalIndex = 0;
70        while (currentIndex < length) {
71            if (intervalIndex == mergedIntervals.size()) {
72                // Append remaining part of the string if there are no more intervals
73                result.append(s.substring(currentIndex));
74                break;
75            }
76          
77            start = mergedIntervals.get(intervalIndex)[0];
78            end = mergedIntervals.get(intervalIndex)[1];
79          
80            // Append non-bold part of the string
81            if (currentIndex < start) {
82                result.append(s.substring(currentIndex, start));
83            }
84          
85            // Move to the next interval
86            ++intervalIndex;
87          
88            // Append the bold part of the string
89            result.append("<b>").append(s.substring(start, end + 1)).append("</b>");
90            currentIndex = end + 1; // Update current index to the end of the bold interval
91        }
92      
93        return result.toString();
94    }
95}
96

C++ Solution

1#include <vector>
2#include <string>
3#include <algorithm>
4using namespace std;
5
6// Definition of a Trie node.
7class Trie {
8public:
9    vector<Trie*> children; // Vector to store children nodes.
10    bool isEndOfWord;       // Flag to mark the end of a word.
11
12    // Constructor initializes the children with 128 null pointers for ASCII and sets isEndOfWord to false.
13    Trie() : children(128, nullptr), isEndOfWord(false) {}
14
15    // Function to insert a word into the Trie.
16    void insert(const string& word) {
17        Trie* node = this;
18        for (char c : word) {
19            if (!node->children[c]) {
20                node->children[c] = new Trie();
21            }
22            node = node->children[c];
23        }
24        node->isEndOfWord = true;
25    }
26};
27
28// Solution class to handle bold words problem.
29class Solution {
30public:
31    // Function to add bold tags around specific words found in a string.
32    string boldWords(vector<string>& words, string& s) {
33        Trie* trie = new Trie();
34        // Insert all words into the Trie.
35        for (const string& w : words) {
36            trie->insert(w);
37        }
38
39        int n = s.size();
40        // Pairs to store intervals that need to be bolded.
41        vector<pair<int, int>> pairs;
42
43        // Find all occurrences of given words in the string s.
44        for (int i = 0; i < n; ++i) {
45            Trie* node = trie;
46            for (int j = i; j < n; ++j) {
47                int idx = s[j];
48                // If there is no child, break current iteration.
49                if (!node->children[idx]) break;
50                node = node->children[idx];
51                // If a word ends here, store the interval.
52                if (node->isEndOfWord) pairs.emplace_back(i, j);
53            }
54        }
55
56        if (pairs.empty()) {
57            return s;
58        }
59
60        // Merge intervals if they overlap.
61        vector<pair<int, int>> mergedIntervals;
62        int start = pairs[0].first, end = pairs[0].second;
63        for (int i = 1; i < pairs.size(); ++i) {
64            int a = pairs[i].first, b = pairs[i].second;
65            if (end + 1 < a) {
66                mergedIntervals.emplace_back(start, end);
67                start = a;
68                end = b;
69            } else {
70                end = max(end, b);
71            }
72        }
73        mergedIntervals.emplace_back(start, end);
74
75        // Construct the result with bold tags.
76        string result;
77        int currentIndex = 0, intervalIndex = 0;
78        while (currentIndex < n) {
79            if (intervalIndex == mergedIntervals.size()) {
80                result += s.substr(currentIndex);
81                break;
82            }
83            start = mergedIntervals[intervalIndex].first, end = mergedIntervals[intervalIndex].second;
84            if (currentIndex < start) {
85                result += s.substr(currentIndex, start - currentIndex);
86            }
87            result += "<b>" + s.substr(start, end - start + 1) + "</b>";
88            currentIndex = end + 1;
89            ++intervalIndex;
90        }
91
92        delete trie; // Clean up the Trie.
93        return result;
94    }
95};
96

Typescript Solution

1// A vector is an array in TypeScript
2type TrieNode = Trie | null;
3
4// Definition of the Trie Node.
5class Trie {
6  // An array to store children nodes for ASCII characters.
7  public children: TrieNode[] = new Array(128).fill(null);
8  // A flag that indicates the end of the word.
9  public isEndOfWord: boolean = false;
10
11  // Method to insert a word into the Trie.
12  public insert(word: string): void {
13    let node: Trie = this;
14    for (const char of word) {
15      const index = char.charCodeAt(0);
16      if (!node.children[index]) {
17        node.children[index] = new Trie();
18      }
19      node = node.children[index]!;
20    }
21    node.isEndOfWord = true;
22  }
23}
24
25// Utility function to add bold tags around specified words found in a string.
26function boldWords(words: string[], s: string): string {
27  // Create a Trie and insert all words into it.
28  const trie = new Trie();
29  for (const word of words) {
30    trie.insert(word);
31  }
32
33  const n: number = s.length;
34  // Array to store intervals that need to be bolded.
35  const intervals: Array<[number, number]> = [];
36
37  // Find all occurrences of words in the string s.
38  for (let i = 0; i < n; ++i) {
39    let node: Trie | null = trie;
40    for (let j = i; j < n; ++j) {
41      const index = s.charCodeAt(j);
42      // If no child node is found, break from the loop.
43      if (!node.children[index]) break;
44      node = node.children[index]!;
45      // If a word ends here, store the interval.
46      if (node.isEndOfWord) intervals.push([i, j]);
47    }
48  }
49
50  // Merge overlapping intervals.
51  const mergedIntervals: Array<[number, number]> = [];
52  for (let i = 0; i < intervals.length; i++) {
53    const [start, end] = intervals[i];
54    // If the intervals do not overlap, add to the merged list.
55    if (mergedIntervals.length === 0 || mergedIntervals[mergedIntervals.length - 1][1] < start) {
56      mergedIntervals.push([start, end]);
57    } else {
58      // If they do overlap, merge with the last interval.
59      mergedIntervals[mergedIntervals.length - 1][1] = Math.max(end, mergedIntervals[mergedIntervals.length - 1][1]);
60    }
61  }
62
63  // Construct the result with bold tags.
64  let result = '';
65  let currentIndex = 0;
66  for (const [start, end] of mergedIntervals) {
67    // Append non-bold text to the result.
68    result += s.slice(currentIndex, start);
69    // Append bold text to the result.
70    result += `<b>${s.slice(start, end + 1)}</b>`;
71    currentIndex = end + 1;
72  }
73  // Append any remaining non-bold text.
74  result += s.slice(currentIndex);
75
76  // Return the final result with added bold tags.
77  return result;
78}
79
80export { Trie, boldWords };
81

Time and Space Complexity

Time Complexity

The overall time complexity of the provided code can be split into two main parts: building the trie and finding & merging the intervals.

  1. Building the Trie: Inserting a word into the trie has a time complexity of O(m), where m is the length of the word being inserted. Since we insert all k words in the words list into the trie, and let's assume the average length of the words is m, the total time for building the trie is O(k * m).

  2. Finding & Merging Intervals: For each starting position i in the string s of length n, we look for words in trie that match the substring starting at i. Comparing the substring of s with the trie has a worst-case time complexity of O(n) if the substring includes the whole string s. Thus, the worst-case scenario for iterating through the entire string with overlaps can be O(n^2).

Then, we merge the intervals in the pairs list. Merging intervals is linear with respect to the number of intervals found, let's call this number p. In the worst case, p can be n (for example, when every character is the start of a word in the trie), so this operation is O(p) = O(n).

The total time complexity of the bolding part is then O(n^2 + n), which simplifies to O(n^2).

Therefore, the total time complexity is O(k * m + n^2).

Space Complexity

The space complexity is dominated by the trie storage and the intervals list:

  1. Space for Trie: Since we are using a fixed-size array 128 for each trie node to represent all possible ASCII characters, and assuming an average depth of m, the space for the trie in the worst case can be O(128^m), which simplifies to O(1) since m is bounded by the fixed character set size. In practical average case scenarios (limited number of total characters), the space complexity is O(k * m) where k is the number of unique words and m is the average length of the words.

  2. Space for Interval Pairs: The pairs list, in the worst case, can have an entry for every character in s, which would give us a space complexity of O(n). Since we are merging intervals, this list can potentially be smaller, but we consider the worst case.

Additionally, the space for the output list ans is O(n) since it contains, at most, the characters from s along with additional tags.

Thus, the total space complexity of the function is O(k * m + n).

😈
Become an
Algo Monster

Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫