2982. Find Longest Special Substring That Occurs Thrice II
Problem Description
In the given problem, we work with a string s
comprised entirely of lowercase English letters. We are concerned with identifying what is termed as a "special" substring within this string. A "special" substring is one that is made up only of a single character. For instance, a substring such as "aaa" is considered special because it contains only the letter 'a', while "abc" would not be considered special as it contains multiple distinct characters.
The task is to find the length of the longest special substring that occurs in the string s
at least three times. If there exists no such substring, the function should return -1. The requirement of a substring to be contiguous and non-empty means that the characters must be adjacent and the length must be at least one. This problem thereby asks us to analyze the string for repeated patterns of single letters while optimizing for the longest pattern that appears three or more times.
Intuition
To solve this problem efficiently, we can consider the properties that would make our search quicker. Here, the intuition can involve a recognition that if a special substring of a particular length exists at least three times, then a shorter substring made from cutting any character from the ends of the longer special substring would also exist at least three times. This property of the problem lends itself to a binary search approach. Why? Because we can find the boundary points between lengths that can form such substrings and lengths that cannot.
We use binary search to hone in on the longest length of a special substring which occurs three or more times. We start with a range that begins at 0 (no special substring) and ends at the length of the string (meaning the whole string is a special substring). The aim is to narrow down this range until we find the longest length that matches our criteria.
Our binary search relies on a helper function, check(x)
, which validates whether a special substring of length x
exists at least three times in s
. This function counts occurrences of each letter, as part of a potential special substring, and sees if it can form a valid substring of length x
that appears at least three times.
The solution's approach entails iterating over the string, tracking and counting the lengths of uninterrupted sequences of the same character, and then applying the binary search technique to determine the longest special substring length that appears thrice or more. This requires loop controls and conditional checks to handle the unique aspects of counting uninterrupted sequences and appropriately updating our binary search bounds.
Learn more about Binary Search and Sliding Window patterns.
Solution Approach
The provided solution utilizes binary search to find the length of the longest special substring that occurs at least thrice. Binary search is an efficient algorithm for finding an item from a sorted list of items. It works by repeatedly dividing in half the portion of the list that could contain the item, until you've narrowed the possible locations to just one.
However, binary search alone is not enough to solve this problem. Therefore, it's combined with a sliding window counting technique to verify the binary search condition. The sliding window technique is useful for problems where we need to examine a contiguous sequence of elements from an array or string, and this fits perfectly for counting special substrings.
In the solution, the binary search algorithm is used to narrow down the maximum length of the special substring that repeats at least three times:
- We initiate the search with the left boundary
l
set to 0 and the right boundaryr
set to the length of the stringn
. - As customary with binary search, the middle point
mid
is calculated as(l + r + 1) >> 1
. The>>
operator here is a bitwise shift to the right, which effectively divides by two. - At each mid, we verify if a special substring of that length appears three times using the
check(x)
function.
The check(x)
function works as follows:
- It initializes a defaultdict
cnt
to count occurrences of characters for possible special substrings of lengthx
. - A loop iterates through the string and when a sequence of identical characters is found, it increments the count for that character in the count dictionary by the number of times a substring of length
x
can be formed from the sequence. - This count is calculated by taking the length of the sequence
j - i
and subtractingx - 1
(the adjustment-1
is because a substring of lengthx
will start within the firstx - 1
positions of a special sequence). - It returns
True
if any character has been found to form a substring of lengthx
at least three times, indicated by a count of three or more incnt
.
This process is repeated until the binary search boundaries converge. If check(mid)
is True
, l
is updated to mid
; otherwise, r
is updated to mid - 1
. This narrows down the search space and hones in on the longest special substring.
The binary search ends when l >= r
, at which point we have found the maximum length that satisfies the condition, or we have determined that no such length exists. If l == 0
, then there is no special substring that occurs at least three times, and -1
is returned. Otherwise, l
itself represents the length of the longest special substring that meets the conditions.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's illustrate the solution approach with an example.
Suppose our string s
is "aaabaaaacaaabaa"
. We are looking for the longest special substring that occurs at least three times.
-
Initiate Boundary Values:
- Let's start with the left boundary
l = 0
and right boundaryr = length of s
, which is15
.
- Let's start with the left boundary
-
Binary Search:
- We now perform a binary search for the maximum length of a special substring. The mid-point
mid
is calculated as(l + r + 1) >> 1
. - Initially,
mid
will be(0 + 15 + 1) >> 1 = 8
.
- We now perform a binary search for the maximum length of a special substring. The mid-point
-
Check Mid-Point:
- We need to check if there's a special substring of length 8 that appears at least three times using our
check(8)
function.
- We need to check if there's a special substring of length 8 that appears at least three times using our
-
Apply
check(x)
function:- We traverse through the string
s
and count occurrences of each character that could make up a special substring of lengthx = 8
. - As we scan
s
, we find a sequence"aaaabaaa"
from index 0 to 7. Since this doesn't form a special substring of length 8, we continue. - Then we encounter
"aaaa"
from index 4 to 7 and"aaaa"
from index 8 to 11, but each is too short. - Finally, from index 11 to the end we see
"aaabaa"
. Again, we don't have a continuous stretch of 8 identical characters.
- We traverse through the string
-
Cannot Form a Special Substring of Mid Length:
- Since we cannot form a special substring of length 8 three times, we update
r
to bemid - 1
, sor
becomes7
.
- Since we cannot form a special substring of length 8 three times, we update
-
Repeat Binary Search with New Bounds:
- We calculate a new
mid
withl = 0
andr = 7
. The newmid
is(0 + 7 + 1) >> 1 = 4
. - We now check for a special substring of length 4 using our
check(4)
function.
- We calculate a new
-
Apply
check(x)
function again:- Traversing through
s
again, we find subsequence"aaaa"
from index 4 to 7 and another"aaaa"
from index 8 to 11. Each of these is a special substring of length 4, and they occur twice. - Continuing our search, from index 11 to 14, we have
"aaab"
. This part includes three 'a's, which can form a substring of length 4 starting from indices 11, 12, and 13.
- Traversing through
-
Special Substring of Mid Length Found:
- Since we can form a special substring of length 4 at least three times, our
check(4)
returnsTrue
. - Now we set
l
tomid
, sol
becomes4
.
- Since we can form a special substring of length 4 at least three times, our
-
Narrow Down Further:
- With the updated
l
, we now havel = 4
andr = 7
. A newmid
calculation gives us(4 + 7 + 1) >> 1 = 6
. - Checking for a special substring of length 6, we find we cannot form such a substring thrice in our string.
- With the updated
-
Converge Boundaries:
- Since
check(6)
returnsFalse
, we updater
tomid - 1
, sor
becomes5
. l = 4
andr = 5
now, and our nextmid
is(4 + 5 + 1) >> 1 = 5
.check(5)
also fails as we can't find a special substring of this length appearing three times.
- Final Boundaries:
- We end up with
l = 4
andr = 4
, since we reducedr
to4
after the last check failed. At this point, sincel = r
, binary search concludes.
- Result:
- Since
l
is not updated from4
after the last iteration, it means the longest special substring that occurs at least three times has a length of4
. - If
l
was still0
, we would return-1
, indicating no such substring was found.
Using this example, we have walked through the binary search and sliding window counting technique to find the length of the longest special substring that occurs at least three times, which in this case is 4.
Solution Implementation
1from collections import defaultdict
2
3class Solution:
4 def maximumLength(self, s: str) -> int:
5 # Helper function that checks whether there's a substring of length x
6 # with at least 3 repeated characters after removing x - 1 characters
7 def is_valid(x: int) -> bool:
8 count = defaultdict(int)
9 i = 0
10 while i < string_length:
11 j = i + 1
12 while j < string_length and s[j] == s[i]:
13 j += 1
14 count[s[i]] += max(0, j - i - x + 1)
15 i = j
16 return max(count.values()) >= 3
17
18 # Main function logic
19 string_length = len(s) # Length of the input string
20 left, right = 0, string_length # Initialize the binary search bounds
21
22 # Perform a binary search to find the maximum length
23 while left < right:
24 mid = (left + right + 1) // 2
25 if is_valid(mid):
26 # If valid, search the upper half
27 left = mid
28 else:
29 # If not valid, search the lower half
30 right = mid - 1
31
32 # Return -1 if left is 0, indicating no such length is found
33 # Otherwise, return the length found
34 return -1 if left == 0 else left
35
1class Solution {
2 private String inputString;
3 private int stringLength;
4
5 // Calculates the maximum length of the substring that satisfies the conditions
6 public int maximumLength(String s) {
7 this.inputString = s;
8 this.stringLength = s.length();
9
10 int left = 0, right = stringLength;
11
12 // Binary search for the maximum length of the valid substring
13 while (left < right) {
14 int mid = (left + right + 1) >> 1; // Same as (left + right + 1) / 2, with integer division
15 if (isValidSubstringLength(mid)) {
16 left = mid;
17 } else {
18 right = mid - 1;
19 }
20 }
21
22 // Return the appropriate length or -1 if no valid length is found
23 return left == 0 ? -1 : left;
24 }
25
26 // Helper method to check if a specific length x can be a valid substring length
27 private boolean isValidSubstringLength(int x) {
28 int[] charCount = new int[26]; // Array to count occurrences of each letter
29
30 // Iterate over characters in the input string
31 for (int i = 0; i < stringLength;) {
32 int j = i + 1;
33 // Continue until end of string or until a different character is found
34 while (j < stringLength && inputString.charAt(j) == inputString.charAt(i)) {
35 j++;
36 }
37 int charIndex = inputString.charAt(i) - 'a'; // Calculate index for character 'a' to 'z'
38 charCount[charIndex] += Math.max(0, j - i - x + 1);
39
40 // If any character count reaches 3 or more, return true (valid substring length)
41 if (charCount[charIndex] >= 3) {
42 return true;
43 }
44
45 // Move to the next group of characters
46 i = j;
47 }
48
49 // No valid substring length found for the given x
50 return false;
51 }
52}
53
1class Solution {
2public:
3 int maximumLength(string s) {
4 int stringSize = s.size(); // Size of the input string
5 int left = 0; // Initialize the left pointer for binary search
6 int right = stringSize; // Initialize the right pointer for binary search
7
8 // "check" is a lambda function that checks if there's a valid substring when "maxLength" is applied
9 auto check = [&](int maxLength) {
10 int charCount[26] = {}; // Initialize character count array with 0s
11
12 // Iterate over the string
13 for (int i = 0; i < stringSize;) {
14 int j = i + 1;
15 // Count the length of a sequence of identical characters starting from position i
16 while (j < stringSize && s[j] == s[i]) {
17 ++j;
18 }
19
20 int charIndex = s[i] - 'a'; // Convert character to an index (0-25)
21
22 // Increase the count of this character sequence by the excess amount
23 // over the allowed maxLength, if any
24 charCount[charIndex] += max(0, j - i - maxLength + 1);
25
26 // If there are 3 or more occurrences, the check failed
27 if (charCount[charIndex] >= 3) {
28 return true;
29 }
30
31 // Move to the next sequence of different characters
32 i = j;
33 }
34 // If we've gone through the string without failing, the maxLength is valid
35 return false;
36 };
37
38 // Perform binary search to find the maximum valid substring length
39 while (left < right) {
40 int mid = (left + right + 1) >> 1; // Calulate middle point
41
42 // If the check passes for this mid value then we can try higher values
43 if (check(mid)) {
44 left = mid;
45 } else {
46 // Otherwise we look for a smaller maximum by reducing our 'right' bound
47 right = mid - 1;
48 }
49 }
50
51 // If left is still 0 then there's no valid substring, otherwise return the max length found
52 return left == 0 ? -1 : left;
53 }
54};
55
1function maximumLength(s: string): number {
2 // n represents the length of the string
3 const n = s.length;
4
5 // 'left' and 'right' will represent the bounds of the binary search
6 let left = 0;
7 let right = n;
8
9 /**
10 * Helper function to check if there's a sequence of three or more identical characters
11 * after removing 'x' number of each character.
12 *
13 * @param removeCount - the count of each character to remove
14 * @returns a boolean value indicating whether the condition is satisfied
15 */
16 const isExceedingSequence = (removeCount: number): boolean => {
17 // countArray stores the number of excess characters after removals
18 const countArray: number[] = Array(26).fill(0);
19
20 // Iterate over the string
21 for (let i = 0; i < n; ) {
22 // Find the end of the current sequence of the same character
23 let j = i + 1;
24 while (j < n && s[j] === s[i]) {
25 j++;
26 }
27
28 // Find the index of the current character in the alphabet
29 const charIndex = s[i].charCodeAt(0) - 'a'.charCodeAt(0);
30
31 // Update the countArray for this character to include excess characters
32 countArray[charIndex] += Math.max(0, j - i - removeCount);
33
34 // If more than 2 characters remain, sequence exceeds the limit
35 if (countArray[charIndex] >= 3) {
36 return true;
37 }
38
39 // Move to the next sequence of characters
40 i = j;
41 }
42 return false;
43 };
44
45 // Perform binary search to find the maximum length
46 while (left < right) {
47 // Find the mid point, leaning to the right to prevent infinite loop
48 const mid = Math.floor((left + right + 1) / 2);
49
50 // Use isExceedingSequence to check if mid is a valid remove count
51 if (isExceedingSequence(mid)) {
52 left = mid; // If true, a larger remove count still satisfies the condition
53 } else {
54 right = mid - 1; // If false, we need a smaller remove count
55 }
56 }
57
58 // If left is 0, no valid configuration was found, so return -1.
59 // Otherwise, return the found maximum length.
60 return left === 0 ? -1 : left;
61}
62
Time and Space Complexity
The time complexity of the code is O(n log n)
, where n
is the length of the input string s
. This complexity arises because the code performs a binary search (which has a complexity of log n
) within the range of the length of the string. During each step of the binary search, the check
function is called, which in the worst case can iterate over the entire string, leading to O(n)
complexity for each search step.
The binary search is bounded by [0, n]
, and for each midpoint calculated, the check
function is utilized. The check
function does a single pass over the string while maintaining a running count of subsequences which exceed the length x - 1
. In total, we combine the binary search and the linear check for each mid
, amounting to O(n log n)
complexity.
The space complexity of the code is O(1)
for the size of the character set (denoted |\Sigma|
) as the problem specifies a fixed lowercase English letters character set which contains 26 letters. However, because we store a count of characters, the space complexity is actually O(|\Sigma|)
, which in this case remains constant because the character set does not grow with the input size n
. Hence the space complexity is O(26)
, which simplifies to O(1)
.
Learn more about how to find time and space complexity quickly using problem constraints.
The three-steps of Depth First Search are:
- Identify states;
- Draw the state-space tree;
- DFS on the state-space tree.
Recommended Readings
Binary Search Speedrun For each of the Speedrun questions you will be given a binary search related problem and a corresponding multiple choice question The multiple choice questions are related to the techniques and template s introduced in the binary search section It's recommended that you have gone through at
https algomonster s3 us east 2 amazonaws com cover_photos stack svg Sliding Window Maximum Monotonic Stack We have an array and a sliding window defined by a start index and an end index The sliding window moves from left of the array to right There are always k elements in
LeetCode Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we
Want a Structured Path to Master System Design Too? Don’t Miss This!