2982. Find Longest Special Substring That Occurs Thrice II

MediumHash TableStringBinary SearchCountingSliding Window
Leetcode Link

Problem Description

In the given problem, we work with a string s comprised entirely of lowercase English letters. We are concerned with identifying what is termed as a "special" substring within this string. A "special" substring is one that is made up only of a single character. For instance, a substring such as "aaa" is considered special because it contains only the letter 'a', while "abc" would not be considered special as it contains multiple distinct characters.

The task is to find the length of the longest special substring that occurs in the string s at least three times. If there exists no such substring, the function should return -1. The requirement of a substring to be contiguous and non-empty means that the characters must be adjacent and the length must be at least one. This problem thereby asks us to analyze the string for repeated patterns of single letters while optimizing for the longest pattern that appears three or more times.

Intuition

To solve this problem efficiently, we can consider the properties that would make our search quicker. Here, the intuition can involve a recognition that if a special substring of a particular length exists at least three times, then a shorter substring made from cutting any character from the ends of the longer special substring would also exist at least three times. This property of the problem lends itself to a binary search approach. Why? Because we can find the boundary points between lengths that can form such substrings and lengths that cannot.

We use binary search to hone in on the longest length of a special substring which occurs three or more times. We start with a range that begins at 0 (no special substring) and ends at the length of the string (meaning the whole string is a special substring). The aim is to narrow down this range until we find the longest length that matches our criteria.

Our binary search relies on a helper function, check(x), which validates whether a special substring of length x exists at least three times in s. This function counts occurrences of each letter, as part of a potential special substring, and sees if it can form a valid substring of length x that appears at least three times.

The solution's approach entails iterating over the string, tracking and counting the lengths of uninterrupted sequences of the same character, and then applying the binary search technique to determine the longest special substring length that appears thrice or more. This requires loop controls and conditional checks to handle the unique aspects of counting uninterrupted sequences and appropriately updating our binary search bounds.

Learn more about Binary Search and Sliding Window patterns.

Solution Approach

The provided solution utilizes binary search to find the length of the longest special substring that occurs at least thrice. Binary search is an efficient algorithm for finding an item from a sorted list of items. It works by repeatedly dividing in half the portion of the list that could contain the item, until you've narrowed the possible locations to just one.

However, binary search alone is not enough to solve this problem. Therefore, it's combined with a sliding window counting technique to verify the binary search condition. The sliding window technique is useful for problems where we need to examine a contiguous sequence of elements from an array or string, and this fits perfectly for counting special substrings.

In the solution, the binary search algorithm is used to narrow down the maximum length of the special substring that repeats at least three times:

  1. We initiate the search with the left boundary l set to 0 and the right boundary r set to the length of the string n.
  2. As customary with binary search, the middle point mid is calculated as (l + r + 1) >> 1. The >> operator here is a bitwise shift to the right, which effectively divides by two.
  3. At each mid, we verify if a special substring of that length appears three times using the check(x) function.

The check(x) function works as follows:

  • It initializes a defaultdict cnt to count occurrences of characters for possible special substrings of length x.
  • A loop iterates through the string and when a sequence of identical characters is found, it increments the count for that character in the count dictionary by the number of times a substring of length x can be formed from the sequence.
  • This count is calculated by taking the length of the sequence j - i and subtracting x - 1 (the adjustment -1 is because a substring of length x will start within the first x - 1 positions of a special sequence).
  • It returns True if any character has been found to form a substring of length x at least three times, indicated by a count of three or more in cnt.

This process is repeated until the binary search boundaries converge. If check(mid) is True, l is updated to mid; otherwise, r is updated to mid - 1. This narrows down the search space and hones in on the longest special substring.

The binary search ends when l >= r, at which point we have found the maximum length that satisfies the condition, or we have determined that no such length exists. If l == 0, then there is no special substring that occurs at least three times, and -1 is returned. Otherwise, l itself represents the length of the longest special substring that meets the conditions.

Ready to land your dream job?

Unlock your dream job with a 2-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's illustrate the solution approach with an example.

Suppose our string s is "aaabaaaacaaabaa". We are looking for the longest special substring that occurs at least three times.

  1. Initiate Boundary Values:

    • Let's start with the left boundary l = 0 and right boundary r = length of s, which is 15.
  2. Binary Search:

    • We now perform a binary search for the maximum length of a special substring. The mid-point mid is calculated as (l + r + 1) >> 1.
    • Initially, mid will be (0 + 15 + 1) >> 1 = 8.
  3. Check Mid-Point:

    • We need to check if there's a special substring of length 8 that appears at least three times using our check(8) function.
  4. Apply check(x) function:

    • We traverse through the string s and count occurrences of each character that could make up a special substring of length x = 8.
    • As we scan s, we find a sequence "aaaabaaa" from index 0 to 7. Since this doesn't form a special substring of length 8, we continue.
    • Then we encounter "aaaa" from index 4 to 7 and "aaaa" from index 8 to 11, but each is too short.
    • Finally, from index 11 to the end we see "aaabaa". Again, we don't have a continuous stretch of 8 identical characters.
  5. Cannot Form a Special Substring of Mid Length:

    • Since we cannot form a special substring of length 8 three times, we update r to be mid - 1, so r becomes 7.
  6. Repeat Binary Search with New Bounds:

    • We calculate a new mid with l = 0 and r = 7. The new mid is (0 + 7 + 1) >> 1 = 4.
    • We now check for a special substring of length 4 using our check(4) function.
  7. Apply check(x) function again:

    • Traversing through s again, we find subsequence "aaaa" from index 4 to 7 and another "aaaa" from index 8 to 11. Each of these is a special substring of length 4, and they occur twice.
    • Continuing our search, from index 11 to 14, we have "aaab". This part includes three 'a's, which can form a substring of length 4 starting from indices 11, 12, and 13.
  8. Special Substring of Mid Length Found:

    • Since we can form a special substring of length 4 at least three times, our check(4) returns True.
    • Now we set l to mid, so l becomes 4.
  9. Narrow Down Further:

    • With the updated l, we now have l = 4 and r = 7. A new mid calculation gives us (4 + 7 + 1) >> 1 = 6.
    • Checking for a special substring of length 6, we find we cannot form such a substring thrice in our string.
  10. Converge Boundaries:

  • Since check(6) returns False, we update r to mid - 1, so r becomes 5.
  • l = 4 and r = 5 now, and our next mid is (4 + 5 + 1) >> 1 = 5.
  • check(5) also fails as we can't find a special substring of this length appearing three times.
  1. Final Boundaries:
  • We end up with l = 4 and r = 4, since we reduced r to 4 after the last check failed. At this point, since l = r, binary search concludes.
  1. Result:
  • Since l is not updated from 4 after the last iteration, it means the longest special substring that occurs at least three times has a length of 4.
  • If l was still 0, we would return -1, indicating no such substring was found.

Using this example, we have walked through the binary search and sliding window counting technique to find the length of the longest special substring that occurs at least three times, which in this case is 4.

Solution Implementation

1from collections import defaultdict
2
3class Solution:
4    def maximumLength(self, s: str) -> int:
5        # Helper function that checks whether there's a substring of length x
6        # with at least 3 repeated characters after removing x - 1 characters
7        def is_valid(x: int) -> bool:
8            count = defaultdict(int)
9            i = 0
10            while i < string_length:
11                j = i + 1
12                while j < string_length and s[j] == s[i]:
13                    j += 1
14                count[s[i]] += max(0, j - i - x + 1)
15                i = j
16            return max(count.values()) >= 3
17
18        # Main function logic
19        string_length = len(s)  # Length of the input string
20        left, right = 0, string_length  # Initialize the binary search bounds
21
22        # Perform a binary search to find the maximum length
23        while left < right:
24            mid = (left + right + 1) // 2
25            if is_valid(mid):
26                # If valid, search the upper half
27                left = mid
28            else:
29                # If not valid, search the lower half
30                right = mid - 1
31
32        # Return -1 if left is 0, indicating no such length is found
33        # Otherwise, return the length found
34        return -1 if left == 0 else left
35
1class Solution {
2    private String inputString;
3    private int stringLength;
4
5    // Calculates the maximum length of the substring that satisfies the conditions
6    public int maximumLength(String s) {
7        this.inputString = s;
8        this.stringLength = s.length();
9      
10        int left = 0, right = stringLength;
11      
12        // Binary search for the maximum length of the valid substring
13        while (left < right) {
14            int mid = (left + right + 1) >> 1;  // Same as (left + right + 1) / 2, with integer division
15            if (isValidSubstringLength(mid)) {
16                left = mid;
17            } else {
18                right = mid - 1;
19            }
20        }
21      
22        // Return the appropriate length or -1 if no valid length is found
23        return left == 0 ? -1 : left;
24    }
25
26    // Helper method to check if a specific length x can be a valid substring length
27    private boolean isValidSubstringLength(int x) {
28        int[] charCount = new int[26];  // Array to count occurrences of each letter
29
30        // Iterate over characters in the input string
31        for (int i = 0; i < stringLength;) {
32            int j = i + 1;
33            // Continue until end of string or until a different character is found
34            while (j < stringLength && inputString.charAt(j) == inputString.charAt(i)) {
35                j++;
36            }
37            int charIndex = inputString.charAt(i) - 'a';  // Calculate index for character 'a' to 'z'
38            charCount[charIndex] += Math.max(0, j - i - x + 1);
39          
40            // If any character count reaches 3 or more, return true (valid substring length)
41            if (charCount[charIndex] >= 3) {
42                return true;
43            }
44          
45            // Move to the next group of characters
46            i = j;
47        }
48      
49        // No valid substring length found for the given x
50        return false;
51    }
52}
53
1class Solution {
2public:
3    int maximumLength(string s) {
4        int stringSize = s.size(); // Size of the input string
5        int left = 0;  // Initialize the left pointer for binary search
6        int right = stringSize; // Initialize the right pointer for binary search
7
8        // "check" is a lambda function that checks if there's a valid substring when "maxLength" is applied
9        auto check = [&](int maxLength) {
10            int charCount[26] = {}; // Initialize character count array with 0s
11          
12            // Iterate over the string 
13            for (int i = 0; i < stringSize;) {
14                int j = i + 1;
15                // Count the length of a sequence of identical characters starting from position i
16                while (j < stringSize && s[j] == s[i]) {
17                    ++j;
18                }
19              
20                int charIndex = s[i] - 'a'; // Convert character to an index (0-25)
21              
22                // Increase the count of this character sequence by the excess amount
23                // over the allowed maxLength, if any
24                charCount[charIndex] += max(0, j - i - maxLength + 1);
25              
26                // If there are 3 or more occurrences, the check failed
27                if (charCount[charIndex] >= 3) {
28                    return true;
29                }
30              
31                // Move to the next sequence of different characters
32                i = j;
33            }
34            // If we've gone through the string without failing, the maxLength is valid
35            return false;
36        };
37      
38        // Perform binary search to find the maximum valid substring length
39        while (left < right) {
40            int mid = (left + right + 1) >> 1; // Calulate middle point
41
42            // If the check passes for this mid value then we can try higher values
43            if (check(mid)) {
44                left = mid;
45            } else {
46                // Otherwise we look for a smaller maximum by reducing our 'right' bound
47                right = mid - 1;
48            }
49        }
50      
51        // If left is still 0 then there's no valid substring, otherwise return the max length found
52        return left == 0 ? -1 : left;
53    }
54};
55
1function maximumLength(s: string): number {
2    // n represents the length of the string
3    const n = s.length;
4
5    // 'left' and 'right' will represent the bounds of the binary search
6    let left = 0;
7    let right = n;
8
9    /**
10     * Helper function to check if there's a sequence of three or more identical characters
11     * after removing 'x' number of each character.
12     * 
13     * @param removeCount - the count of each character to remove
14     * @returns a boolean value indicating whether the condition is satisfied
15     */
16    const isExceedingSequence = (removeCount: number): boolean => {
17        // countArray stores the number of excess characters after removals
18        const countArray: number[] = Array(26).fill(0);
19    
20        // Iterate over the string
21        for (let i = 0; i < n; ) {
22            // Find the end of the current sequence of the same character
23            let j = i + 1;
24            while (j < n && s[j] === s[i]) {
25                j++;
26            }
27        
28            // Find the index of the current character in the alphabet
29            const charIndex = s[i].charCodeAt(0) - 'a'.charCodeAt(0);
30        
31            // Update the countArray for this character to include excess characters
32            countArray[charIndex] += Math.max(0, j - i - removeCount);
33        
34            // If more than 2 characters remain, sequence exceeds the limit
35            if (countArray[charIndex] >= 3) {
36                return true;
37            }
38        
39            // Move to the next sequence of characters
40            i = j;
41        }
42        return false;
43    };
44
45    // Perform binary search to find the maximum length
46    while (left < right) {
47        // Find the mid point, leaning to the right to prevent infinite loop
48        const mid = Math.floor((left + right + 1) / 2);
49      
50        // Use isExceedingSequence to check if mid is a valid remove count
51        if (isExceedingSequence(mid)) {
52            left = mid; // If true, a larger remove count still satisfies the condition
53        } else {
54            right = mid - 1; // If false, we need a smaller remove count
55        }
56    }
57
58    // If left is 0, no valid configuration was found, so return -1.
59    // Otherwise, return the found maximum length.
60    return left === 0 ? -1 : left;
61}
62

Time and Space Complexity

The time complexity of the code is O(n log n), where n is the length of the input string s. This complexity arises because the code performs a binary search (which has a complexity of log n) within the range of the length of the string. During each step of the binary search, the check function is called, which in the worst case can iterate over the entire string, leading to O(n) complexity for each search step.

The binary search is bounded by [0, n], and for each midpoint calculated, the check function is utilized. The check function does a single pass over the string while maintaining a running count of subsequences which exceed the length x - 1. In total, we combine the binary search and the linear check for each mid, amounting to O(n log n) complexity.

The space complexity of the code is O(1) for the size of the character set (denoted |\Sigma|) as the problem specifies a fixed lowercase English letters character set which contains 26 letters. However, because we store a count of characters, the space complexity is actually O(|\Sigma|), which in this case remains constant because the character set does not grow with the input size n. Hence the space complexity is O(26), which simplifies to O(1).

Learn more about how to find time and space complexity quickly using problem constraints.


Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:
Question 1 out of 10

The three-steps of Depth First Search are:

  1. Identify states;
  2. Draw the state-space tree;
  3. DFS on the state-space tree.

Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!