2781. Length of the Longest Valid Substring


Problem Description

You're given a string word and an array of strings forbidden. The goal is to find the length of the longest substring within word that doesn't contain any sequence of characters present in forbidden. A substring is simply a contiguous sequence of characters within another string and it can even be an empty string.

To make a string "valid" in the context of this problem, none of its substrings can match any string present in forbidden. It's your task to figure out what's the maximum length of such a "valid" substring within the given word.

Intuition

The intuition behind the solution revolves around the concept of dynamic window sliding coupled with efficient substring checking. The brute force method of checking every possible substring would be too slow, especially since string word can be quite large. Instead, we need to find a quicker way of validating substrings against the forbidden list.

We create a sliding window that expands and contracts as needed while scanning the word from left to right. The set s allows for constant-time checks if a substring is in forbidden—this is much faster than repeatedly scanning a list.

Our approach is as follows:

  • Start with two pointers i and j, initializing i to the beginning of the word and j to increment over each character.
  • At every step, check the current window from j down to i to see if any substring is in forbidden. If it is, move i past the forbidden substring's start to make a valid window again.
  • Keep track of the maximum valid window seen so far as ans.

One key optimization lies in the knowledge that the forbidden substrings have a maximum possible length. Thus the inner loop doesn't need to check further back than this maximum length from j, which prevents needless comparisons and significantly speeds up the algorithm.

This solution effectively balances between the thoroughness of checking all relevant substrings and the speed of skipping unnecessary checks, achieving the task within acceptable time complexity.

Learn more about Sliding Window patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

Which of the following is a min heap?

Solution Approach

In the provided solution, a set data structure is utilized to store the forbidden strings. The choice of a set is crucial because it provides O(1) lookup time complexity, which means checking if a substring is in forbidden is very fast, regardless of the size of forbidden.

Here is the step-by-step explanation of the code:

  1. The set s is created from the forbidden array to take advantage of fast lookups.
  2. Two pointers are introduced:
    • i is initialized to 0, pointing to the beginning of the current valid substring.
    • j is used to iterate through each character in word.
  3. A variable ans is used to keep track of the length of the longest valid substring found so far.
  4. We iterate over the string word with the help of j.
    • During each iteration, we look for any potential forbidden substrings within a window --- the substring from [k : j + 1], where k will iterate backwards from j but not further back than 10 places (owing to the maximum length of forbidden substrings) or before i, whichever is lesser.
    • The use of max(j - 10, i - 1) ensures that we don't check beyond the necessary window backwards from the current character.
  5. If a forbidden substring is found, i is moved to one character past the start of the forbidden substring. This effectively skips the forbidden part, searching for the next valid starting point.
  6. ans is updated to be the maximum of its current value or j - i + 1, which represents the length of the new valid substring from the current position j back to i.
  7. The iteration continues until the end of word is reached, with ans being updated as necessary.
  8. At the end, ans holds the length of the longest valid substring, and it is returned as the solution.

The key algorithmic pattern here is the sliding window, employed to dynamically adjust the substring being inspected. Effective use of data structuring and algorithmic patterns are what make this solution both elegant and efficient.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Consider the classic dynamic programming of fibonacci numbers, what is the recurrence relation?

Example Walkthrough

Let's take an example to understand how this solution works. Suppose we have word = "abcdefgh" and forbidden = ["bc", "fg"]. We want to find the longest substring of word that doesn't contain any sequence of characters from forbidden.

We start by creating a set s containing forbidden strings for quick lookup: s = {"bc", "fg"}. We initialize two pointers: i at index 0, and j which will traverse through word.

The ans variable starts at 0 to keep track of the length of the longest valid substring found so far. Now, we begin iterating over word:

  1. With j at 0, our window is ["a"], and since it doesn't match any forbidden strings, we move on.

  2. With j at 1, our window is ["ab"], which is also not forbidden. We continue.

  3. With j at 2, our window is ["abc"], and checking from the previous 10 characters (or from i if less than 10), we find a match with s because "bc" is in the window. We update i to one position after where "bc" started, so i becomes 2.

  4. Now j moves to 3, and our new window starts from i (2) to j (3), which is ["cd"]. It is valid, so we continue.

  5. With j at 4, our window is ["cde"], still valid.

  6. With j moving up to 5, our window is now ["cdef"], which is still not in forbidden.

  7. At j = 6, our window becomes ["cdefg"], which contains "fg". We update i to move past the start of "fg", making i = 6.

  8. As j moves to 7, our window is ["h"], since i was just updated to 6. This window is valid.

Throughout the process, we update ans to be the maximum length of valid substrings we have found. The sequence "cde" was the longest valid substring before we encountered "fg", with a length of 3. Therefore, ans is 3.

In this case, the longest substring of word without any forbidden sequences would be "cde" which has a length of 3. This is the value we return.

Solution Implementation

1class Solution:
2    def longestValidSubstring(self, word: str, forbidden: list[str]) -> int:
3        # Create a set from the `forbidden` list for O(1) access time during checks.
4        forbidden_set = set(forbidden)
5      
6        # Initialize the maximum length of the valid substring (ans) and the starting index (start) of the current substring.
7        ans = start = 0
8      
9        # Iterate over each character index `end` of the `word`.
10        for end in range(len(word)):
11            # Check all substrings that end at `end` and do not exceed 10 characters in length.
12            # `max(end - 10, start - 1)` ensures we only consider substrings of up to 10 characters
13            # and do not reexamine substrings already invalidated by a forbidden word.
14            for check_start in range(end, max(end - 10, start - 1), -1):
15                # If the substring from `check_start` to `end` (inclusive) is in the forbidden set,
16                # update the `start` index to one position after `check_start`
17                # and break from the loop to start checking the next end position.
18                if word[check_start:end + 1] in forbidden_set:
19                    start = check_start + 1
20                    break
21          
22            # Calculate the length of the current valid substring and update `ans` if it's greater than the previous maximum.
23            ans = max(ans, end - start + 1)
24      
25        # Return the length of the longest valid substring that doesn't contain a forbidden word.
26        return ans
27
1class Solution {
2    public int longestValidSubstring(String word, List<String> forbiddenSubstrings) {
3        // A set to store the forbidden substrings for faster lookup
4        Set<String> forbiddenSet = new HashSet<>(forbiddenSubstrings);
5      
6        // Initialize variables for the answer and the length of the word
7        int maxValidLength = 0;
8        int wordLength = word.length();
9      
10        // Two pointers for iterating through the string
11        // i - start of the current valid substring
12        // j - end of the current valid substring
13        for (int i = 0, j = 0; j < wordLength; ++j) {
14            // Check substrings within the current window from [i,j]
15            for (int k = j; k > Math.max(j - 10, i - 1); --k) {
16                // If a forbidden substring is found within this window
17                if (forbiddenSet.contains(word.substring(k, j + 1))) {
18                    // Move the start index after the forbidden substring
19                    i = k + 1;
20                    break;
21                }
22            }
23            // Calculate the length of the current valid substring
24            // And update the maximum if this is the longest so far
25            maxValidLength = Math.max(maxValidLength, j - i + 1);
26        }
27      
28        // Return the length of the longest valid substring found
29        return maxValidLength;
30    }
31}
32
1#include <string>
2#include <vector>
3#include <unordered_set>
4using namespace std;
5
6class Solution {
7public:
8    /**
9     * Finds the longest valid substring that does not contain any forbidden substrings.
10     * @param word The main string in which to look for the substring.
11     * @param forbidden A vector of forbidden substrings.
12     * @return The length of the longest valid substring.
13     */
14    int longestValidSubstring(string word, vector<string>& forbidden) {
15        // Convert the forbidden vector to a set for efficient lookup
16        unordered_set<string> forbiddenSet(forbidden.begin(), forbidden.end());
17      
18        // Variable to store the answer, i.e., the length of the longest valid substring
19        int maxLength = 0;
20        // Variable to store the length of the word
21        int wordLength = word.size();
22
23        // Two pointers technique to iterate through the string
24        // `start` denotes the start index of the current valid substring
25        // `end` denotes the end index of the current valid substring
26        for (int start = 0, end = 0; end < wordLength; ++end) {
27            // Check all possible forbidden substrings within the last 10 characters
28            for (int k = end; k > max(end - 10, start - 1); --k) {
29                // If a forbidden substring is found, move the start pointer past it
30                if (forbiddenSet.count(word.substr(k, end - k + 1))) {
31                    start = k + 1;
32                    break;
33                }
34            }
35            // Update the maximum length with the length of the current valid substring
36            maxLength = max(maxLength, end - start + 1);
37        }
38        // Return the length of the longest valid substring found
39        return maxLength;
40    }
41};
42
1// Defines a function to find the length of the longest substring without any forbidden words.
2function longestValidSubstring(word: string, forbidden: string[]): number {
3    // Create a Set to efficiently check if a substring is forbidden.
4    const forbiddenSet: Set<string> = new Set(forbidden);
5    // Get the length of the word to iterate through.
6    const wordLength = word.length;
7    // Initialize the answer to store the length of the longest valid substring.
8    let maxLength = 0;
9  
10    // Initialize two pointers for the sliding window technique.
11    // 'start' tracks the start index of the current substring.
12    // 'end' tracks the end index of the current substring.
13    for (let start = 0, end = 0; end < wordLength; ++end) {
14        // Check all possible substrings from the current 'end' to 'start',
15        // but we limit the substring length to a maximum of 10 characters.
16        for (let k = end; k > Math.max(end - 10, start - 1); --k) {
17            // If a forbidden substring is found, move 'start' past the forbidden word,
18            // effectively shrinking the current window and continue searching.
19            if (forbiddenSet.has(word.substring(k, end + 1))) {
20                start = k + 1;
21                break;
22            }
23        }
24        // Update the length of the longest valid substring found.
25        maxLength = Math.max(maxLength, end - start + 1);
26    }
27    // Return the length of the longest valid substring.
28    return maxLength;
29}
30
Not Sure What to Study? Take the 2-min Quiz:

In a binary min heap, the maximum element can be found in:

Time and Space Complexity

Time Complexity

The time complexity of the given code can be analyzed by going through the nested loops and operations within them. The outer loop runs for len(word) times, where 'word' is the input string and its length is denoted by 'n'. The inner loop at worst case runs for a maximum of 10 iterations since it looks back from the current position j only up to 10 characters because of max(j - 10, i - 1) condition.

Hence, in the worst-case scenario, for each character we might look at the next 10 characters to check if the substring is in the forbidden set. Since checking whether a substring is in a set is O(1) operation due to hashing, the worst case would consider these constant time checks up to 10 times for each character.

Combining both loops, the worst-case time complexity can therefore be simplified to O(n), where each character is checked against a constant number of potential forbidden substrings.

Space Complexity

The space complexity consists of the space required for the set 's' and a few integer variables. Since 's' contains the forbidden substrings, its space complexity will be O(m), where 'm' is the total length of all forbidden substrings combined.

The other variables, 'ans', 'i', and 'j' are constant space, adding O(1) to the total space complexity.

Therefore, the total space complexity of the code is O(m + 1) which simplifies to O(m), denoting the space taken up by the set of forbidden substrings.

Learn more about how to find time and space complexity quickly using problem constraints.

Fast Track Your Learning with Our Quick Skills Quiz:

Which of the following array represent a max heap?


Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫