758. Bold Words in String
Problem Description
Given an array of keywords words
and a string s
, your task is to make all appearances of any keyword words[i]
in the string s
bold. In HTML, text is made bold by wrapping it with <b>
and </b>
tags. You need to return the string s
after adding the bold tags such that any letters between these tags are bold in the HTML sense. The string returned must use the fewest number of tags necessary, and the tags should form a valid markup without overlap or redundant tags.
Intuition
To solve this problem effectively, we need to perform a couple of different steps in sequence. First, we need to be able to find all appearances of each keyword in the original string. One efficient way to do this is through the use of a trie, which is a tree-like data structure that can store a dynamic set or associative array where the keys are usually strings.
The intuition behind using a trie is that it allows us to quickly and efficiently search for multiple keywords in a given input string. By inserting all the keywords into the trie and then parsing the input string while traversing the trie, we can identify the start and end points of each keyword occurrence.
Once we've found these occurrences, the next step is to merge any overlapping or contiguous intervals of bold text so that we can minimize the number of bold tags used. We process the intervals such that when we find overlapping or contiguous intervals, we merge them into a single interval by extending the end to the maximum end among the overlapping intervals.
Finally, with all intervals determined and merged as needed, we construct the resulting string. We include the correct text segments inside bold tags while ensuring that non-bold text remains unchanged. Care is taken to avoid overlapping tags and to ensure the final string is a valid HTML string with bold tags only where necessary.
The provided code implements this approach effectively, using a class for the trie structure, inserting keywords, finding matches in the input string, merging intervals, and building the final string with appropriate tags.
Learn more about Trie patterns.
Solution Approach
The implementation of the solution follows a multi-step approach leveraging the Trie data structure, array manipulations, and string concatenation.
Firstly, a [Trie](/problems/trie_intro)
class is created with operations to insert words into the trie. Each node of the trie consists of an array of children (to store subsequent characters of words) and a boolean flag is_end
to mark the end of a word.
Insertion into the trie is straightforward: for each character c
in a word, check if the character exists as a child node. If not, create a new Trie node. On reaching the end of the word, set is_end
to True
.
To find occurrences of keywords in s
, we iterate through each index i
of s
, using nested loops to check for keywords starting at that index. For each character s[j]
, where j
is from i
to the end of the string, we use the trie to check if the character sequence matches any word. If the trie indicates that we reach the end of a word, add the interval [i, j]
to the array pairs
.
After locating all matching intervals, the next operation is to merge overlapping or adjacent intervals. This ensures that we do not insert duplicate or unnecessary bold tags into the final answer. To do this, a temporary list t
is created, which holds the merged intervals. We loop through pairs
, check if the current interval overlaps with the last added interval in t
, and accordingly merge or append intervals.
The final step is string construction with the proper bold tags. We create the ans
list and loop through the string s
and the merged intervals in t
. We append regular text to ans
and then for each interval [st, ed]
in t
, append the bold tags and the bolded text to ans
. After appending the last interval and any remaining text, we join all parts from ans
into a single string using ''.join(ans)
and return it.
The algorithm uses efficient data structures (trie for keyword matching and a list for interval tracking) and minimizes the use of string concatenation by using a list to build the final result, which is later converted into a string only once. The complexity of the algorithm is dependent on the length of s
and the number of keywords to be searched, making it efficient for multiple keyword searches in a large text.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a simplified example to illustrate the solution approach. Suppose we have the following array of keywords and string s
:
- Keywords:
["ab", "bc"]
- String
s
:"aabc"
Step 1: We create a trie and insert the keywords "ab" and "bc" into it.
1Root 2 | 3 a - b (is_end=True) 4 | 5 b - c (is_end=True)
Step 2: We start at each character in s
and look for keywords.
- Starting at index 0
"a"
, we find nothing. - Starting at index 1
"a"
, we traverse to"ab"
and find a match, adding the interval[1, 2]
to ourpairs
. - Starting at index 2
"b"
, we find a match with"bc"
, adding the interval[2, 3]
to ourpairs
.
The pairs
array now contains [[1,2], [2,3]]
.
Step 3: Merge overlapping or adjacent intervals.
- The interval
[1,2]
and[2,3]
are adjacent. They can be merged to form[1,3]
.
After merging, we have just one interval [1,3]
.
Step 4: Construct the final string with bold tags.
- We iterate through
s
and the merged intervals. - Before reaching the first interval, we add
"a"
toans
. - We encounter interval
[1,3]
and add bold tags and text, resulting inans
being["a", "<b>abc</b>"]
.
Step 5: Join and return the final result.
- We join
ans
into a single string"a<b>abc</b>"
and return it.
The final output for the input given is "a<b>abc</b>"
, which correctly encloses the appearances of the keywords "ab" and "bc" in bold tags while using the fewest number of tags necessary.
Solution Implementation
1from typing import List
2
3class TrieNode:
4 def __init__(self):
5 self.children = [None] * 128 # All ASCII characters
6 self.is_end_of_word = False # Flag to mark the end of a word
7
8 def insert(self, word: str) -> None:
9 # Insert a word into the trie
10 node = self
11 for char in word:
12 index = ord(char) # Get ASCII value for the character
13 if node.children[index] is None:
14 node.children[index] = TrieNode() # Create a new node if necessary
15 node = node.children[index]
16 node.is_end_of_word = True # Mark the end of the word
17
18
19class Solution:
20 def boldWords(self, words: List[str], s: str) -> str:
21 # Initialize the Trie and insert all the words
22 trie = TrieNode()
23 for word in words:
24 trie.insert(word)
25
26 n = len(s)
27 pairs = [] # To hold start and end indexes of bold areas
28
29 # Find all occurrences of words in 's'
30 for start in range(n):
31 node = trie
32 for end in range(start, n):
33 index = ord(s[end])
34 if node.children[index] is None:
35 break
36 node = node.children[index]
37 if node.is_end_of_word:
38 pairs.append([start, end])
39
40 # If no matches are found, return the original string
41 if not pairs:
42 return s
43
44 # Merge the overlapping intervals
45 merged_intervals = []
46 start, end = pairs[0]
47 for interval_start, interval_end in pairs[1:]:
48 if end + 1 < interval_start:
49 merged_intervals.append([start, end])
50 start, end = interval_start, interval_end
51 else:
52 end = max(end, interval_end)
53 merged_intervals.append([start, end])
54
55 # Build the final string with bold tags
56 answer = []
57 index = 0
58 for start, end in merged_intervals:
59 # Add non-bold part of string
60 if index < start:
61 answer.append(s[index:start])
62 # Add the bold tag and the bold part of string
63 answer.append('<b>' + s[start:end + 1] + '</b>')
64 # Update the index to the end of the bold part
65 index = end + 1
66
67 # Add any remaining non-bold part of the string
68 if index < n:
69 answer.append(s[index:])
70
71 # Join all parts to form the final string
72 return ''.join(answer)
73
74# Example usage:
75# solution = Solution()
76# print(solution.boldWords(["ab", "bc"], "aabcd")) # Output should be: "<b>aab</b>cd"
77
1class Trie {
2 Trie[] children = new Trie[128]; // A large array to accommodate all ASCII characters
3 boolean isEndOfWord; // Indicates the end of a word
4
5 // Inserts a word into the Trie
6 public void insert(String word) {
7 Trie node = this;
8 for (char ch : word.toCharArray()) {
9 if (node.children[ch] == null) {
10 node.children[ch] = new Trie(); // Initialize the Trie node for the character if it doesn't exist
11 }
12 node = node.children[ch]; // Move to the child node
13 }
14 node.isEndOfWord = true; // Mark the end of a valid word
15 }
16}
17
18class Solution {
19 public String boldWords(String[] words, String s) {
20 Trie trie = new Trie();
21 // Insert all pattern words into the Trie
22 for (String word : words) {
23 trie.insert(word);
24 }
25
26 List<int[]> intervals = new ArrayList<>();
27 int length = s.length();
28
29 // Go through the string to find all substrings that match words in the Trie
30 for (int i = 0; i < length; ++i) {
31 Trie node = trie;
32 for (int j = i; j < length; ++j) {
33 char currentChar = s.charAt(j);
34 if (node.children[currentChar] == null) {
35 break; // Stop if there is no matching character in Trie
36 }
37
38 node = node.children[currentChar];
39
40 // If a word ends here, add the interval to the list
41 if (node.isEndOfWord) {
42 intervals.add(new int[] {i, j});
43 }
44 }
45 }
46
47 // If no words to be bolded are found, return the original string
48 if (intervals.isEmpty()) {
49 return s;
50 }
51
52 // Merge overlapping and adjacent intervals
53 List<int[]> mergedIntervals = new ArrayList<>();
54 int start = intervals.get(0)[0], end = intervals.get(0)[1];
55 for (int i = 1; i < intervals.size(); ++i) {
56 int newStart = intervals.get(i)[0], newEnd = intervals.get(i)[1];
57 if (end + 1 < newStart) {
58 // If the current end is before the new start, add the interval and update to the new one
59 mergedIntervals.add(new int[] {start, end});
60 start = newStart;
61 }
62 end = Math.max(end, newEnd); // Extend the current interval to include the new end
63 }
64 // Add the last merged interval
65 mergedIntervals.add(new int[] {start, end});
66
67 // Build the resulting string with bold tags
68 StringBuilder result = new StringBuilder();
69 int currentIndex = 0, intervalIndex = 0;
70 while (currentIndex < length) {
71 if (intervalIndex == mergedIntervals.size()) {
72 // Append remaining part of the string if there are no more intervals
73 result.append(s.substring(currentIndex));
74 break;
75 }
76
77 start = mergedIntervals.get(intervalIndex)[0];
78 end = mergedIntervals.get(intervalIndex)[1];
79
80 // Append non-bold part of the string
81 if (currentIndex < start) {
82 result.append(s.substring(currentIndex, start));
83 }
84
85 // Move to the next interval
86 ++intervalIndex;
87
88 // Append the bold part of the string
89 result.append("<b>").append(s.substring(start, end + 1)).append("</b>");
90 currentIndex = end + 1; // Update current index to the end of the bold interval
91 }
92
93 return result.toString();
94 }
95}
96
1#include <vector>
2#include <string>
3#include <algorithm>
4using namespace std;
5
6// Definition of a Trie node.
7class Trie {
8public:
9 vector<Trie*> children; // Vector to store children nodes.
10 bool isEndOfWord; // Flag to mark the end of a word.
11
12 // Constructor initializes the children with 128 null pointers for ASCII and sets isEndOfWord to false.
13 Trie() : children(128, nullptr), isEndOfWord(false) {}
14
15 // Function to insert a word into the Trie.
16 void insert(const string& word) {
17 Trie* node = this;
18 for (char c : word) {
19 if (!node->children[c]) {
20 node->children[c] = new Trie();
21 }
22 node = node->children[c];
23 }
24 node->isEndOfWord = true;
25 }
26};
27
28// Solution class to handle bold words problem.
29class Solution {
30public:
31 // Function to add bold tags around specific words found in a string.
32 string boldWords(vector<string>& words, string& s) {
33 Trie* trie = new Trie();
34 // Insert all words into the Trie.
35 for (const string& w : words) {
36 trie->insert(w);
37 }
38
39 int n = s.size();
40 // Pairs to store intervals that need to be bolded.
41 vector<pair<int, int>> pairs;
42
43 // Find all occurrences of given words in the string s.
44 for (int i = 0; i < n; ++i) {
45 Trie* node = trie;
46 for (int j = i; j < n; ++j) {
47 int idx = s[j];
48 // If there is no child, break current iteration.
49 if (!node->children[idx]) break;
50 node = node->children[idx];
51 // If a word ends here, store the interval.
52 if (node->isEndOfWord) pairs.emplace_back(i, j);
53 }
54 }
55
56 if (pairs.empty()) {
57 return s;
58 }
59
60 // Merge intervals if they overlap.
61 vector<pair<int, int>> mergedIntervals;
62 int start = pairs[0].first, end = pairs[0].second;
63 for (int i = 1; i < pairs.size(); ++i) {
64 int a = pairs[i].first, b = pairs[i].second;
65 if (end + 1 < a) {
66 mergedIntervals.emplace_back(start, end);
67 start = a;
68 end = b;
69 } else {
70 end = max(end, b);
71 }
72 }
73 mergedIntervals.emplace_back(start, end);
74
75 // Construct the result with bold tags.
76 string result;
77 int currentIndex = 0, intervalIndex = 0;
78 while (currentIndex < n) {
79 if (intervalIndex == mergedIntervals.size()) {
80 result += s.substr(currentIndex);
81 break;
82 }
83 start = mergedIntervals[intervalIndex].first, end = mergedIntervals[intervalIndex].second;
84 if (currentIndex < start) {
85 result += s.substr(currentIndex, start - currentIndex);
86 }
87 result += "<b>" + s.substr(start, end - start + 1) + "</b>";
88 currentIndex = end + 1;
89 ++intervalIndex;
90 }
91
92 delete trie; // Clean up the Trie.
93 return result;
94 }
95};
96
1// A vector is an array in TypeScript
2type TrieNode = Trie | null;
3
4// Definition of the Trie Node.
5class Trie {
6 // An array to store children nodes for ASCII characters.
7 public children: TrieNode[] = new Array(128).fill(null);
8 // A flag that indicates the end of the word.
9 public isEndOfWord: boolean = false;
10
11 // Method to insert a word into the Trie.
12 public insert(word: string): void {
13 let node: Trie = this;
14 for (const char of word) {
15 const index = char.charCodeAt(0);
16 if (!node.children[index]) {
17 node.children[index] = new Trie();
18 }
19 node = node.children[index]!;
20 }
21 node.isEndOfWord = true;
22 }
23}
24
25// Utility function to add bold tags around specified words found in a string.
26function boldWords(words: string[], s: string): string {
27 // Create a Trie and insert all words into it.
28 const trie = new Trie();
29 for (const word of words) {
30 trie.insert(word);
31 }
32
33 const n: number = s.length;
34 // Array to store intervals that need to be bolded.
35 const intervals: Array<[number, number]> = [];
36
37 // Find all occurrences of words in the string s.
38 for (let i = 0; i < n; ++i) {
39 let node: Trie | null = trie;
40 for (let j = i; j < n; ++j) {
41 const index = s.charCodeAt(j);
42 // If no child node is found, break from the loop.
43 if (!node.children[index]) break;
44 node = node.children[index]!;
45 // If a word ends here, store the interval.
46 if (node.isEndOfWord) intervals.push([i, j]);
47 }
48 }
49
50 // Merge overlapping intervals.
51 const mergedIntervals: Array<[number, number]> = [];
52 for (let i = 0; i < intervals.length; i++) {
53 const [start, end] = intervals[i];
54 // If the intervals do not overlap, add to the merged list.
55 if (mergedIntervals.length === 0 || mergedIntervals[mergedIntervals.length - 1][1] < start) {
56 mergedIntervals.push([start, end]);
57 } else {
58 // If they do overlap, merge with the last interval.
59 mergedIntervals[mergedIntervals.length - 1][1] = Math.max(end, mergedIntervals[mergedIntervals.length - 1][1]);
60 }
61 }
62
63 // Construct the result with bold tags.
64 let result = '';
65 let currentIndex = 0;
66 for (const [start, end] of mergedIntervals) {
67 // Append non-bold text to the result.
68 result += s.slice(currentIndex, start);
69 // Append bold text to the result.
70 result += `<b>${s.slice(start, end + 1)}</b>`;
71 currentIndex = end + 1;
72 }
73 // Append any remaining non-bold text.
74 result += s.slice(currentIndex);
75
76 // Return the final result with added bold tags.
77 return result;
78}
79
80export { Trie, boldWords };
81
Time and Space Complexity
Time Complexity
The overall time complexity of the provided code can be split into two main parts: building the trie and finding & merging the intervals.
-
Building the Trie: Inserting a word into the trie has a time complexity of
O(m)
, wherem
is the length of the word being inserted. Since we insert allk
words in thewords
list into the trie, and let's assume the average length of the words ism
, the total time for building the trie isO(k * m)
. -
Finding & Merging Intervals: For each starting position
i
in the strings
of lengthn
, we look for words in trie that match the substring starting ati
. Comparing the substring ofs
with the trie has a worst-case time complexity ofO(n)
if the substring includes the whole strings
. Thus, the worst-case scenario for iterating through the entire string with overlaps can beO(n^2)
.
Then, we merge the intervals in the pairs
list. Merging intervals is linear with respect to the number of intervals found, let's call this number p
. In the worst case, p
can be n
(for example, when every character is the start of a word in the trie), so this operation is O(p) = O(n)
.
The total time complexity of the bolding part is then O(n^2 + n)
, which simplifies to O(n^2)
.
Therefore, the total time complexity is O(k * m + n^2)
.
Space Complexity
The space complexity is dominated by the trie storage and the intervals list:
-
Space for Trie: Since we are using a fixed-size array
128
for each trie node to represent all possible ASCII characters, and assuming an average depth ofm
, the space for the trie in the worst case can beO(128^m)
, which simplifies toO(1)
sincem
is bounded by the fixed character set size. In practical average case scenarios (limited number of total characters), the space complexity isO(k * m)
wherek
is the number of unique words andm
is the average length of the words. -
Space for Interval Pairs: The pairs list, in the worst case, can have an entry for every character in
s
, which would give us a space complexity ofO(n)
. Since we are merging intervals, this list can potentially be smaller, but we consider the worst case.
Additionally, the space for the output list ans
is O(n)
since it contains, at most, the characters from s
along with additional tags.
Thus, the total space complexity of the function is O(k * m + n)
.
Learn more about how to find time and space complexity quickly using problem constraints.
Consider the classic dynamic programming of longest increasing subsequence:
Find the length of the longest subsequence of a given sequence such that all elements of the subsequence are sorted in increasing order.
For example, the length of LIS for [50, 3, 10, 7, 40, 80]
is 4
and LIS is
[3, 7, 40, 80]
.
What is the recurrence relation?
Recommended Readings
Trie Introduction Imagine you have a magic dictionary where as soon as you start spelling a word it immediately tells you how many words that start with those letters How cool would that be This is exactly the kind of magic a data structure called trie performs What's a Trie
Patterns The Shortest Path Algorithm for Coding Interviews The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we can determine the
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Got a question? Ask the Monster Assistant anything you don't understand.
Still not clear? Submit the part you don't understand to our editors. Or join our Discord and ask the community.