2981. Find Longest Special Substring That Occurs Thrice I

MediumHash TableStringBinary SearchCountingSliding Window
Leetcode Link

Problem Description

In this problem, we are asked to find the longest special substring in a given string s consisting of lowercase English letters. A special substring is defined as a substring composed of a single unique character. The catch is that we are only interested in special substrings that occur at least three times within the original string. If such a substring doesn't exist, then we must return -1. A key thing to note is that these occurrences can overlap.

For example, in the string "aaabaaa", the special substring "aaa" occurs twice and it is made up of a single character a.

Intuition

To approach this problem, the solution employs binary search in conjunction with a sliding window counting mechanism. This combination is effective because of the nature of special substrings: the property of being special is preserved in a smaller substring of a special substring. In simpler terms, if you have a special substring of length 5, it is guaranteed that there is a special substring of length 4, 3, 2 and 1 – that's the monotonicity which enables binary search.

The binary search operates by trying to find the longest possible length x for which a special substring of that length appears at least thrice. We start with a lower bound l = 0 (as initially, we are not sure if there is a special substring) and an upper bound r = n which is the length of the string because no substring can be longer than the string itself.

At every step, the binary search checks the midpoint mid of the current l and r. The check(x) function helps in determining if there is a special substring of length mid that occurs thrice. If such a substring exists, we shift our lower bound to mid to see if there's an even longer special substring meeting the criteria. If it doesn't exist, we reduce our upper bound to mid - 1.

The check(x) function uses a sliding window to count the frequency of each character substring of length x. The sliding window keeps extending as long as the characters are the same. As soon as a character changes, it checks if the length of this window minus x plus 1 (which is the count of valid substrings of the desired length) is non-negative and adds this number to the count of occurrences for the current character. If the maximum value among these counts is at least 3, then a special substring of length x appears at least thrice, otherwise, it does not.

Finally, binary search uses these outcomes to narrow down the maximum length of the special substring that appears at least thrice, with the search ending when l and r converge. The result is either -1 (if l is 0, meaning no special substring is found) or l, the length of the longest special substring found.

Learn more about Binary Search and Sliding Window patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

Which of the following array represent a max heap?

Solution Approach

The given solution uses a binary search algorithm in conjunction with a sliding window technique to efficiently find the special substring.

Binary Search: Given the monotonic nature of the problem, where a longer special substring guarantees the existence of a shorter one, binary search can be used. Binary search is a divide-and-conquer algorithm that divides the search interval in half repeatedly to find the target value. In this case, instead of locating a specific value, we're trying to find the maximum length x for which a valid special substring exists.

  1. Initialize the binary search boundaries with l = 0 and r = n, where n is the length of s.
  2. In each iteration, calculate the mid value using the formula mid = (l + r + 1) >> 1.
  3. Use the check(x) function to determine whether there is a special substring of length mid that occurs at least three times.
  4. If such a substring exists, move the lower boundary l to mid, since we want to explore the possibility of a longer valid substring.
  5. If not, adjust the upper boundary r to mid - 1 since no special substring of length mid satisfies the condition and thus, we need to look for shorter lengths.

This process continues until l and r meet, and the longest length for a valid special substring is found.

Sliding Window: For checking the existence of special substrings of a certain length x, a sliding window mechanism is utilized within check(x):

  1. A counter dictionary cnt keeps track of the number of windows (special substrings) of length x for each character.
  2. Iterate through the string using an index i that starts at the beginning of a character run and continues until the run ends, marked by index j.
  3. While the characters at i and j are the same, increment j. This stretch from i to j-1 is homogeneous and thus forms a special substring.
  4. Once we stop at j where s[j] != s[i], we update the count for character s[i] by adding the number of special substrings of length x that can be formed between i and j-1. The count is maxed with zero to avoid negative addition, which could occur when the window doesn't support a substring of length x.
  5. Move i to j and repeat the counting for the new special substring.

The purpose of the sliding window is to avoid re-computation by dynamically changing the size of the considered substring.

Returning the Result: After binary search concludes, we have the boundary l as the length of the potential longest special substring. If l is 0, it means no such substring is found that satisfies the problem's requirements and thus we return -1. Otherwise, l is the length of the longest special substring that occurs at least thrice, and it is returned as the solution.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Which of the following is equvalent to O(3*2^n + n^3 + n!+ log n)?

Example Walkthrough

Let’s consider the string "aabbaaaaabb". We want to find the length of the longest special substring where a special substring is a substring composed of a single unique character, and it must occur at least three times.

Step-by-Step Execution:

Binary Search Initialization: We start with l = 0 and r = 11 (the length of our example string).

First Binary Search Iteration - Checking Middle Length: mid = (l + r + 1) >> 1, yields mid = 6. We must now check if there is a special substring of length 6 that appears at least three times using our check(x) function.

Sliding Window Check For mid=6: A sliding window from s[0] to s[5] is "aabbaa". This is not a special substring as it contains more than one unique character. We move to the next set of characters.

Our sliding window now checks from s[4] to s[9] which is "aaaaab". Once again, this is not a special substring. We continue this approach and find that no special substring of length 6 appears thrice. So we move our upper boundary, r = mid - 1, to r = 5.

Second Binary Search Iteration: Now, mid = (l + r + 1) >> 1, yielding mid = 3. Using our check(x) function with mid = 3, we setup a sliding window.

Sliding Window Check For mid=3: For i = 0 to j = 2, our substring is "aab". This does not meet our criteria.

For i = 2 to j = 4, our substring is "bba". Again, this does not meet our criteria.

For i = 4 to j = 6, our substring is "aaa". This does meet our criteria and occurs consecutively until j = 10 for a total of 7 occurrences.

Since we have found at least three occurrences of a special substring of length 3, our check(x) function returns true.

We now move our lower boundary l to mid, making l = 3.

Continuing Binary Search Iteration: Assuming the binary search continues, we would repeatedly check the middle length using the check(x) function with the sliding window technique, updating l and r accordingly, until l and r converge.

Since in our example, l = 3 already meets the condition and we know that a substring of length 6 is not valid, subsequent iterations, not shown here for brevity, would find that length 4 or 5 does not have three occurrences, meaning ultimately l would remain 3.

Returning the Result: After the binary search concludes, l would be the length of the longest special substring that occurs at least thrice. Given our previous example and assuming no longer valid substring was found beyond length 3, we would return l = 3 as the solution to our example input "aabbaaaaabb".

Solution Implementation

1from collections import defaultdict
2
3class Solution:
4    def maximumLength(self, s: str) -> int:
5        # Helper function to check if after removing at most x adjacent duplicates 
6        # from the string, any character occurs at least 3 times in a row.
7        def is_valid(x: int) -> bool:
8            char_counts = defaultdict(int)
9            index = 0
10            while index < string_length:
11                next_index = index + 1
12                # Count the length of the current sequence of identical characters.
13                while next_index < string_length and s[next_index] == s[index]:
14                    next_index += 1
15                # Update the count for this character by the length of the sequence 
16                # minus the allowed number x, adding at least 1 to count each sequence.
17                char_counts[s[index]] += max(0, next_index - index - x + 1)
18                # Move to the next sequence of characters.
19                index = next_index
20            # Check if any character's count is at least 3 after the removal.
21            return max(char_counts.values()) >= 3
22
23        # Initialize the start and end bounds of the binary search.
24        string_length = len(s)
25        left, right = 0, string_length
26        # Perform a binary search to find the maximum valid x.
27        while left < right:
28            mid = (left + right + 1) >> 1  # Right-biased mid calculation
29            # If the current mid value satisfies the condition, move
30            # the search space to the right half including mid itself.
31            if is_valid(mid):
32                left = mid
33            else:
34                # Otherwise, move the search space to the left half excluding mid.
35                right = mid - 1
36      
37        # If left equals 0, then it's not possible to remove characters 
38        # such that any character appears at least 3 times. Return -1.
39        # Otherwise, return the length of the maximum sequence after removal.
40        return -1 if left == 0 else left
41
1class Solution {
2    private String inputString;
3    private int stringLength;
4
5    // Method to find the maximum length of substring satisfying specific criteria
6    public int maximumLength(String inputString) {
7        this.inputString = inputString;
8        this.stringLength = inputString.length();
9
10        int left = 0, right = stringLength;
11
12        // Binary search to find the maximum substring length
13        while (left < right) {
14            int mid = (left + right + 1) >> 1;
15            if (isCriteriaSatisfied(mid)) {
16                left = mid;
17            } else {
18                right = mid - 1;
19            }
20        }
21
22        // If unable to find such a length, return -1. Otherwise, return the length found.
23        return left == 0 ? -1 : left;
24    }
25
26    // Helper method to check if a specific length satisfies the criteria
27    private boolean isCriteriaSatisfied(int targetLength) {
28        int[] frequencyCounter = new int[26];
29
30        // Iterate over the string to check the frequency of each character
31        for (int i = 0; i < stringLength;) {
32            // Find the end index of the current character sequence
33            int endIndex = i + 1;
34            while (endIndex < stringLength && inputString.charAt(endIndex) == inputString.charAt(i)) {
35                endIndex++;
36            }
37
38            // Calculate the character's index in the alphabet
39            int charIndex = inputString.charAt(i) - 'a';
40
41            // Update the frequency counter for each character based on the criteria
42            frequencyCounter[charIndex] += Math.max(0, endIndex - i - targetLength + 1);
43
44            // If any character occurs >= 3 times after removing `targetLength` elements, return true
45            if (frequencyCounter[charIndex] >= 3) {
46                return true;
47            }
48
49            // Move to the next different character
50            i = endIndex;
51        }
52
53        // If no character meets the criteria, return false
54        return false;
55    }
56}
57
1class Solution {
2public:
3    int maximumLength(string s) {
4        int n = s.size();
5        int left = 0, right = n; // Use 'left' and 'right' to represent the search range bounds.
6      
7        // Lambda function to check if a valid subsequence of length 'x' exists.
8        auto check = [&](int x) {
9            int charCount[26] = {}; // Initialize an array to count characters.
10            for (int i = 0; i < n;) {
11                int j = i + 1;
12              
13                // Extend 'j' while characters are identical.
14                while (j < n && s[j] == s[i]) {
15                    ++j;
16                }
17              
18                int charIndex = s[i] - 'a'; // Calculate the index of the current character.
19              
20                // Increment the count for this character by the number of repeat appearances
21                // exceeding x - 1 (since we're allowed x appearances).
22                charCount[charIndex] += max(0, j - i - x + 1);
23              
24                // If any character appears three or more times, return true.
25                if (charCount[charIndex] >= 3) {
26                    return true;
27                }
28                i = j; // Move to the next sequence of characters.
29            }
30            return false; // No character appears three or more times.
31        };
32      
33        // Perform binary search to find the largest 'x' for which 
34        // the 'check' function returns true.
35        while (left < right) {
36            int mid = (left + right + 1) / 2; // Use integer division.
37          
38            // If 'check' function is true for 'mid', we update 'left' to search the upper half.
39            if (check(mid)) {
40                left = mid;
41            } else {
42                // Else search the lower half.
43                right = mid - 1;
44            }
45        }
46      
47        // If left is 0, it means no such subsequence exists, return -1,
48        // otherwise return the largest 'x' found.
49        return left == 0 ? -1 : left;
50    }
51};
52
1// Function to calculate the maximum length of fragments after removals
2function maximumLength(s: string): number {
3    const lengthOfString = s.length;
4    let leftPointer = 0;
5    let rightPointer = lengthOfString;
6    // Function to check if the current fragment size satisfies the condition
7    const doesFragmentSizeWork = (fragmentSize: number): boolean => {
8        // Create an array to count occurrences of each letter
9        const letterCounts: number[] = Array(26).fill(0);
10        // Iterate over the string
11        for (let i = 0; i < lengthOfString; ) {
12            // Find the end index of the current group of identical letters
13            let endIndex = i + 1;
14            while (endIndex < lengthOfString && s[endIndex] === s[i]) {
15                endIndex++;
16            }
17            // Calculate the index of the letter in the alphabet
18            const letterIndex = s[i].charCodeAt(0) - 'a'.charCodeAt(0);
19            letterCounts[letterIndex] += Math.max(0, endIndex - i - fragmentSize + 1);
20            // If a letter count reaches 3 or more, the size doesn't work
21            if (letterCounts[letterIndex] >= 3) {
22                return true;
23            }
24            i = endIndex;
25        }
26        return false;
27    };
28  
29    // Binary search to find the maximum valid fragment size
30    while (leftPointer < rightPointer) {
31        // Calculate the middle point using bitwise operator for efficiency
32        const midPoint = (leftPointer + rightPointer + 1) >> 1;
33        // If the current fragment size works, go right
34        if (doesFragmentSizeWork(midPoint)) {
35            leftPointer = midPoint;
36        } else { // If it doesn't, go left
37            rightPointer = midPoint - 1;
38        }
39    }
40  
41    // If no valid size was found, return -1; otherwise, return the found size
42    return leftPointer === 0 ? -1 : leftPointer;
43}
44
Not Sure What to Study? Take the 2-min Quiz:

What is the worst case running time for finding an element in a binary tree (not necessarily binary search tree) of size n?

Time and Space Complexity

The time complexity of the given code depends on two main operations: the binary search and the check function which iterates over the string in the worst case.

The binary search is performed over the range from 0 to n, where n is the length of the string s. This results in a logarithmic time complexity factor of O(log n) because with each iteration we halve our search space.

Within each step of the binary search, the check function is called, which has a linear time complexity in the size n because it potentially iterates over the entire string. Additionally, it updates the count of each character which is a constant time operation since the size of the character set |\Sigma| is constant (26 for the lowercase English letters).

Hence, for each binary search step, the check function contributes a complexity of O(n). Therefore, the combined time complexity for these operations is O(n log n).

However, the given reference answer mentions a time complexity of O((n + |\Sigma|) * log n), which likely accounts for the operation of finding the maximum value in the dictionary, which adds O(|\Sigma|) to each binary search step since |\Sigma| is the size of the character count dictionary. Ultimately, since |\Sigma| is a constant (26), this does not change the overall order of magnitude and it can be simplified to O(n log n) for large n.

The space complexity of the code is primarily due to the storage requirements of the character count dictionary. Since |\Sigma| is the size of the character set, the space complexity is O(|\Sigma|).

In summary:

  • Time Complexity: O(n log n) where n is the length of the string s.
  • Space Complexity: O(|\Sigma|) where |\Sigma| is the size of the character set, in this case, 26.

Learn more about how to find time and space complexity quickly using problem constraints.

Fast Track Your Learning with Our Quick Skills Quiz:

Which of the following uses divide and conquer strategy?


Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫