2981. Find Longest Special Substring That Occurs Thrice I
Problem Description
In this problem, we are asked to find the longest special substring in a given string s
consisting of lowercase English letters. A special substring is defined as a substring composed of a single unique character. The catch is that we are only interested in special substrings that occur at least three times within the original string. If such a substring doesn't exist, then we must return -1
. A key thing to note is that these occurrences can overlap.
For example, in the string "aaabaaa"
, the special substring "aaa"
occurs twice and it is made up of a single character a
.
Intuition
To approach this problem, the solution employs binary search in conjunction with a sliding window counting mechanism. This combination is effective because of the nature of special substrings: the property of being special is preserved in a smaller substring of a special substring. In simpler terms, if you have a special substring of length 5
, it is guaranteed that there is a special substring of length 4
, 3
, 2
and 1
– that's the monotonicity which enables binary search.
The binary search operates by trying to find the longest possible length x
for which a special substring of that length appears at least thrice. We start with a lower bound l = 0
(as initially, we are not sure if there is a special substring) and an upper bound r = n
which is the length of the string because no substring can be longer than the string itself.
At every step, the binary search checks the midpoint mid
of the current l
and r
. The check(x)
function helps in determining if there is a special substring of length mid
that occurs thrice. If such a substring exists, we shift our lower bound to mid
to see if there's an even longer special substring meeting the criteria. If it doesn't exist, we reduce our upper bound to mid - 1
.
The check(x)
function uses a sliding window to count the frequency of each character substring of length x
. The sliding window keeps extending as long as the characters are the same. As soon as a character changes, it checks if the length of this window minus x
plus 1
(which is the count of valid substrings of the desired length) is non-negative and adds this number to the count of occurrences for the current character. If the maximum value among these counts is at least 3
, then a special substring of length x
appears at least thrice, otherwise, it does not.
Finally, binary search uses these outcomes to narrow down the maximum length of the special substring that appears at least thrice, with the search ending when l
and r
converge. The result is either -1
(if l
is 0
, meaning no special substring is found) or l
, the length of the longest special substring found.
Learn more about Binary Search and Sliding Window patterns.
Solution Approach
The given solution uses a binary search algorithm in conjunction with a sliding window technique to efficiently find the special substring.
Binary Search:
Given the monotonic nature of the problem, where a longer special substring guarantees the existence of a shorter one, binary search can be used. Binary search is a divide-and-conquer algorithm that divides the search interval in half repeatedly to find the target value. In this case, instead of locating a specific value, we're trying to find the maximum length x
for which a valid special substring exists.
- Initialize the binary search boundaries with
l = 0
andr = n
, wheren
is the length ofs
. - In each iteration, calculate the
mid
value using the formulamid = (l + r + 1) >> 1
. - Use the
check(x)
function to determine whether there is a special substring of lengthmid
that occurs at least three times. - If such a substring exists, move the lower boundary
l
tomid
, since we want to explore the possibility of a longer valid substring. - If not, adjust the upper boundary
r
tomid - 1
since no special substring of lengthmid
satisfies the condition and thus, we need to look for shorter lengths.
This process continues until l
and r
meet, and the longest length for a valid special substring is found.
Sliding Window:
For checking the existence of special substrings of a certain length x
, a sliding window mechanism is utilized within check(x)
:
- A counter dictionary
cnt
keeps track of the number of windows (special substrings) of lengthx
for each character. - Iterate through the string using an index
i
that starts at the beginning of a character run and continues until the run ends, marked by indexj
. - While the characters at
i
andj
are the same, incrementj
. This stretch fromi
toj-1
is homogeneous and thus forms a special substring. - Once we stop at
j
wheres[j] != s[i]
, we update the count for characters[i]
by adding the number of special substrings of lengthx
that can be formed betweeni
andj-1
. The count is maxed with zero to avoid negative addition, which could occur when the window doesn't support a substring of lengthx
. - Move
i
toj
and repeat the counting for the new special substring.
The purpose of the sliding window is to avoid re-computation by dynamically changing the size of the considered substring.
Returning the Result:
After binary search concludes, we have the boundary l
as the length of the potential longest special substring. If l
is 0
, it means no such substring is found that satisfies the problem's requirements and thus we return -1
. Otherwise, l
is the length of the longest special substring that occurs at least thrice, and it is returned as the solution.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let’s consider the string "aabbaaaaabb"
. We want to find the length of the longest special substring where a special substring is a substring composed of a single unique character, and it must occur at least three times.
Step-by-Step Execution:
Binary Search Initialization:
We start with l = 0
and r = 11
(the length of our example string).
First Binary Search Iteration - Checking Middle Length:
mid = (l + r + 1) >> 1
, yields mid = 6
. We must now check if there is a special substring of length 6
that appears at least three times using our check(x)
function.
Sliding Window Check For mid=6:
A sliding window from s[0]
to s[5]
is "aabbaa"
. This is not a special substring as it contains more than one unique character. We move to the next set of characters.
Our sliding window now checks from s[4]
to s[9]
which is "aaaaab"
. Once again, this is not a special substring. We continue this approach and find that no special substring of length 6 appears thrice. So we move our upper boundary, r = mid - 1
, to r = 5
.
Second Binary Search Iteration:
Now, mid = (l + r + 1) >> 1
, yielding mid = 3
.
Using our check(x)
function with mid = 3
, we setup a sliding window.
Sliding Window Check For mid=3:
For i = 0
to j = 2
, our substring is "aab"
. This does not meet our criteria.
For i = 2
to j = 4
, our substring is "bba"
. Again, this does not meet our criteria.
For i = 4
to j = 6
, our substring is "aaa"
. This does meet our criteria and occurs consecutively until j = 10
for a total of 7
occurrences.
Since we have found at least three occurrences of a special substring of length 3
, our check(x)
function returns true
.
We now move our lower boundary l
to mid
, making l = 3
.
Continuing Binary Search Iteration:
Assuming the binary search continues, we would repeatedly check the middle length using the check(x)
function with the sliding window technique, updating l
and r
accordingly, until l
and r
converge.
Since in our example, l = 3
already meets the condition and we know that a substring of length 6
is not valid, subsequent iterations, not shown here for brevity, would find that length 4
or 5
does not have three occurrences, meaning ultimately l
would remain 3
.
Returning the Result:
After the binary search concludes, l
would be the length of the longest special substring that occurs at least thrice. Given our previous example and assuming no longer valid substring was found beyond length 3
, we would return l = 3
as the solution to our example input "aabbaaaaabb"
.
Solution Implementation
1from collections import defaultdict
2
3class Solution:
4 def maximumLength(self, s: str) -> int:
5 # Helper function to check if after removing at most x adjacent duplicates
6 # from the string, any character occurs at least 3 times in a row.
7 def is_valid(x: int) -> bool:
8 char_counts = defaultdict(int)
9 index = 0
10 while index < string_length:
11 next_index = index + 1
12 # Count the length of the current sequence of identical characters.
13 while next_index < string_length and s[next_index] == s[index]:
14 next_index += 1
15 # Update the count for this character by the length of the sequence
16 # minus the allowed number x, adding at least 1 to count each sequence.
17 char_counts[s[index]] += max(0, next_index - index - x + 1)
18 # Move to the next sequence of characters.
19 index = next_index
20 # Check if any character's count is at least 3 after the removal.
21 return max(char_counts.values()) >= 3
22
23 # Initialize the start and end bounds of the binary search.
24 string_length = len(s)
25 left, right = 0, string_length
26 # Perform a binary search to find the maximum valid x.
27 while left < right:
28 mid = (left + right + 1) >> 1 # Right-biased mid calculation
29 # If the current mid value satisfies the condition, move
30 # the search space to the right half including mid itself.
31 if is_valid(mid):
32 left = mid
33 else:
34 # Otherwise, move the search space to the left half excluding mid.
35 right = mid - 1
36
37 # If left equals 0, then it's not possible to remove characters
38 # such that any character appears at least 3 times. Return -1.
39 # Otherwise, return the length of the maximum sequence after removal.
40 return -1 if left == 0 else left
41
1class Solution {
2 private String inputString;
3 private int stringLength;
4
5 // Method to find the maximum length of substring satisfying specific criteria
6 public int maximumLength(String inputString) {
7 this.inputString = inputString;
8 this.stringLength = inputString.length();
9
10 int left = 0, right = stringLength;
11
12 // Binary search to find the maximum substring length
13 while (left < right) {
14 int mid = (left + right + 1) >> 1;
15 if (isCriteriaSatisfied(mid)) {
16 left = mid;
17 } else {
18 right = mid - 1;
19 }
20 }
21
22 // If unable to find such a length, return -1. Otherwise, return the length found.
23 return left == 0 ? -1 : left;
24 }
25
26 // Helper method to check if a specific length satisfies the criteria
27 private boolean isCriteriaSatisfied(int targetLength) {
28 int[] frequencyCounter = new int[26];
29
30 // Iterate over the string to check the frequency of each character
31 for (int i = 0; i < stringLength;) {
32 // Find the end index of the current character sequence
33 int endIndex = i + 1;
34 while (endIndex < stringLength && inputString.charAt(endIndex) == inputString.charAt(i)) {
35 endIndex++;
36 }
37
38 // Calculate the character's index in the alphabet
39 int charIndex = inputString.charAt(i) - 'a';
40
41 // Update the frequency counter for each character based on the criteria
42 frequencyCounter[charIndex] += Math.max(0, endIndex - i - targetLength + 1);
43
44 // If any character occurs >= 3 times after removing `targetLength` elements, return true
45 if (frequencyCounter[charIndex] >= 3) {
46 return true;
47 }
48
49 // Move to the next different character
50 i = endIndex;
51 }
52
53 // If no character meets the criteria, return false
54 return false;
55 }
56}
57
1class Solution {
2public:
3 int maximumLength(string s) {
4 int n = s.size();
5 int left = 0, right = n; // Use 'left' and 'right' to represent the search range bounds.
6
7 // Lambda function to check if a valid subsequence of length 'x' exists.
8 auto check = [&](int x) {
9 int charCount[26] = {}; // Initialize an array to count characters.
10 for (int i = 0; i < n;) {
11 int j = i + 1;
12
13 // Extend 'j' while characters are identical.
14 while (j < n && s[j] == s[i]) {
15 ++j;
16 }
17
18 int charIndex = s[i] - 'a'; // Calculate the index of the current character.
19
20 // Increment the count for this character by the number of repeat appearances
21 // exceeding x - 1 (since we're allowed x appearances).
22 charCount[charIndex] += max(0, j - i - x + 1);
23
24 // If any character appears three or more times, return true.
25 if (charCount[charIndex] >= 3) {
26 return true;
27 }
28 i = j; // Move to the next sequence of characters.
29 }
30 return false; // No character appears three or more times.
31 };
32
33 // Perform binary search to find the largest 'x' for which
34 // the 'check' function returns true.
35 while (left < right) {
36 int mid = (left + right + 1) / 2; // Use integer division.
37
38 // If 'check' function is true for 'mid', we update 'left' to search the upper half.
39 if (check(mid)) {
40 left = mid;
41 } else {
42 // Else search the lower half.
43 right = mid - 1;
44 }
45 }
46
47 // If left is 0, it means no such subsequence exists, return -1,
48 // otherwise return the largest 'x' found.
49 return left == 0 ? -1 : left;
50 }
51};
52
1// Function to calculate the maximum length of fragments after removals
2function maximumLength(s: string): number {
3 const lengthOfString = s.length;
4 let leftPointer = 0;
5 let rightPointer = lengthOfString;
6 // Function to check if the current fragment size satisfies the condition
7 const doesFragmentSizeWork = (fragmentSize: number): boolean => {
8 // Create an array to count occurrences of each letter
9 const letterCounts: number[] = Array(26).fill(0);
10 // Iterate over the string
11 for (let i = 0; i < lengthOfString; ) {
12 // Find the end index of the current group of identical letters
13 let endIndex = i + 1;
14 while (endIndex < lengthOfString && s[endIndex] === s[i]) {
15 endIndex++;
16 }
17 // Calculate the index of the letter in the alphabet
18 const letterIndex = s[i].charCodeAt(0) - 'a'.charCodeAt(0);
19 letterCounts[letterIndex] += Math.max(0, endIndex - i - fragmentSize + 1);
20 // If a letter count reaches 3 or more, the size doesn't work
21 if (letterCounts[letterIndex] >= 3) {
22 return true;
23 }
24 i = endIndex;
25 }
26 return false;
27 };
28
29 // Binary search to find the maximum valid fragment size
30 while (leftPointer < rightPointer) {
31 // Calculate the middle point using bitwise operator for efficiency
32 const midPoint = (leftPointer + rightPointer + 1) >> 1;
33 // If the current fragment size works, go right
34 if (doesFragmentSizeWork(midPoint)) {
35 leftPointer = midPoint;
36 } else { // If it doesn't, go left
37 rightPointer = midPoint - 1;
38 }
39 }
40
41 // If no valid size was found, return -1; otherwise, return the found size
42 return leftPointer === 0 ? -1 : leftPointer;
43}
44
Time and Space Complexity
The time complexity of the given code depends on two main operations: the binary search and the check
function which iterates over the string in the worst case.
The binary search is performed over the range from 0 to n
, where n
is the length of the string s
. This results in a logarithmic time complexity factor of O(log n)
because with each iteration we halve our search space.
Within each step of the binary search, the check
function is called, which has a linear time complexity in the size n
because it potentially iterates over the entire string. Additionally, it updates the count of each character which is a constant time operation since the size of the character set |\Sigma|
is constant (26 for the lowercase English letters).
Hence, for each binary search step, the check
function contributes a complexity of O(n)
. Therefore, the combined time complexity for these operations is O(n log n)
.
However, the given reference answer mentions a time complexity of O((n + |\Sigma|) * log n)
, which likely accounts for the operation of finding the maximum value in the dictionary, which adds O(|\Sigma|)
to each binary search step since |\Sigma|
is the size of the character count dictionary. Ultimately, since |\Sigma|
is a constant (26), this does not change the overall order of magnitude and it can be simplified to O(n log n)
for large n
.
The space complexity of the code is primarily due to the storage requirements of the character count dictionary. Since |\Sigma|
is the size of the character set, the space complexity is O(|\Sigma|)
.
In summary:
- Time Complexity:
O(n log n)
wheren
is the length of the strings
. - Space Complexity:
O(|\Sigma|)
where|\Sigma|
is the size of the character set, in this case, 26.
Learn more about how to find time and space complexity quickly using problem constraints.
Which of the two traversal algorithms (BFS and DFS) can be used to find whether two nodes are connected?
Recommended Readings
Binary Search Speedrun For each of the Speedrun questions you will be given a binary search related problem and a corresponding multiple choice question The multiple choice questions are related to the techniques and template s introduced in the binary search section It's recommended that you have gone through at
https algomonster s3 us east 2 amazonaws com cover_photos stack svg Sliding Window Maximum Monotonic Stack We have an array and a sliding window defined by a start index and an end index The sliding window moves from left of the array to right There are always k elements in
LeetCode Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we
Want a Structured Path to Master System Design Too? Don’t Miss This!