395. Longest Substring with At Least K Repeating Characters
Problem Description
The problem is about finding the longest substring within a given string s
where the minimum frequency of each unique character in that substring is at least k
. The frequency of a character means the number of times that character appears in the substring. A substring is a contiguous sequence of characters within a string. If there is no such substring that meets the criteria, the output should be 0.
To summarize:
- Suppose we have a string "aaabb" and k = 3.
- A valid substring would be "aaa" because the character 'a' appears at least 3 times.
- The substring "aaabb" is not valid because the character 'b' only appears twice, which is less than k.
The goal is to return the length of the longest valid substring.
Intuition
The intuition behind the solution is to use a divide-and-conquer approach. This approach is suitable because we can check for the validity of character frequencies in substrings and can recursively apply this to smaller substrings if we find a character not meeting the frequency criteria.
Here's the step-by-step intuition for the provided code:
-
We begin the solution by defining a recursive function
dfs(l, r)
, which will be responsible for solving the problem for the substring ofs
starting at indexl
and ending at indexr
. -
We use a
Counter
from Python'scollections
module to count the frequencies of characters in the current substrings[l:r+1]
. -
We look for the first character (called split) in the substring whose frequency is less than
k
by iterating over the count items. If we don't find such a character, it means the entire current substring is valid, and we return its length. -
If we find such a character, we use it to split our current substring into smaller substrings on which we will perform the same check recursively.
-
We iterate over the indices between
l
andr
, looking for the split character. Once the split character is found, we perform a recursive call to thedfs
function for the substring before the split character appears. -
We maintain a variable
ans
which keeps track of the length of the longest valid substring found so far and update it with the values returned from recursive calls. -
After the loop,
ans
will contain the length of the longest valid substring of the original substrings[l:r+1]
where all characters' frequencies are at leastk
, and we return this value.
The overall function longestSubstring
initiates the process by calling dfs(0, len(s) - 1)
(i.e., for the whole string) and returns the result.
The beauty of this approach lies in its ability to narrow down the problem into smaller instances of the same problem. Thus, focusing on the character frequency condition, we further reduce the problem size by eliminating segments that do not meet the condition until a valid solution is reached or all possibilities are exhausted.
Learn more about Divide and Conquer and Sliding Window patterns.
Solution Approach
The solution to the given problem involves several key concepts of algorithm design and usage of data structures, which are elaborated below:
Data Structures:
- Counter: From Python's
collections
module. It's a subclass of dictionary used to count hashable objects (characters in this case). It's used to compute the frequency of each character within the current substring.
Algorithm:
- Divide and Conquer: This is a recursive strategy, where the problem is divided into subproblems, each of which is solved independently. In this case, whenever a character with frequency less than
k
is encountered, the problem is split and the process is repeated on each resulting substring.
Step-by-Step Implementation:
-
Recursive Function: The
dfs(l, r)
function is defined to apply a divide-and-conquer strategy. It is responsible for returning the length of the longest valid substring withins[l:r+1]
. -
Counting Frequencies:
Counter(s[l:r+1])
calculates the frequency of each character in the current substring. -
Split Character Check:
split = next((c for c, v in cnt.items() if v < k), '')
looks for a character that does not satisfy the frequency condition (frequency is less thank
). If such a character is found, it will be used to split the substring. If no such character exists, the condition is satisfied for the entire substring, which is then a candidate for the answer. -
Iterate and Split: If a split character is found, the function iterates over
s[l:r+1]
, looking for occurrences of this character. Each time it's found, that denotes a potential end of a valid substring, which is then processed recursively:- Skip past any consecutive split characters.
- Find the next segment that doesn't contain the split character.
- Recursively apply
dfs
on this segment to find the length of its longest valid substring.
-
Updating the Answer: Within the iteration, the
ans
variable is updated with the maximum value returned by the recursive calls, ensuring that the maximum length valid substring is kept. -
Recursive Calls for Substrings: The recursive call
t = dfs(i, j - 1)
computes the length of the longest valid substring for the segment determined between the indicesi
toj-1
. -
Combining Results: After the loop concludes, the variable
ans
holds the maximum length among all valid substrings found, which is returned by the recursive functiondfs
. -
Initialization: The recursive process is started off by the
longestSubstring
function with a call todfs(0, len(s) - 1)
, which passes the entire string into thedfs
function. -
Result: Finally, the maximal length is returned by the
longestSubstring
function, indicating the length of the longest valid substring found in the initial strings
.
By combining this divide-and-conquer strategy with efficient count operations and recursion, this solution effectively breaks down the string into smaller segments, each of which can be validated independently, resulting in a powerful and elegant solution to the problem.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's use the string "ababbc" and k = 2
as an example to illustrate the solution approach.
-
Initialization: We call the recursive function
dfs
with the entire string, sodfs(0, 5)
. -
Counting Frequencies: The
Counter
is used to count characters, and we get{'a': 2, 'b': 3, 'c': 1}
. -
Split Character Check: We check for a character with frequency less than
k
. The character'c'
has a frequency less than 2, so it is chosen as the split character. -
Iterate and Split: We use
'c'
to split the string into valid substrings to search for our answer. The string "ababbc" is split into "ababb" and "", since there are no characters after'c'
. -
Recursive Calls for Substrings: Now we consider the substring "ababb":
- We count the frequencies within "ababb" and get
{'a': 2, 'b': 3}
. - All characters meet the frequency criteria, so "ababb" is a valid substring.
- We update
ans
to 5, which is the length of "ababb".
- We count the frequencies within "ababb" and get
-
Combining Results: At this point,
ans
is 5, and there are no more substrings to consider. -
Result: The
longestSubstring
function returnsans
, which is 5, indicating the length of the longest valid substring found in the initial string"ababbc"
with a minimum character frequency of 2.
This walkthrough shows how we use a divide-and-conquer strategy to split the initial problem into a simpler subproblem. By applying the same logic to smaller and smaller substrings, the algorithm finds the length of the longest substring that satisfies the minimum character frequency requirement.
Solution Implementation
1from collections import Counter
2
3class Solution:
4 def longestSubstring(self, s: str, k: int) -> int:
5 # Helper function that performs depth-first search to find the longest substring.
6 def dfs(left, right):
7 # Count the frequency of each character in the current substring.
8 counter = Counter(s[left:right + 1])
9
10 # Find the first character which has fewer occurrences than k
11 # and use it as a split point to divide the string.
12 split_char = next((char for char, freq in counter.items() if freq < k), None)
13
14 # If no such character that occurred fewer than k times was found,
15 # the whole substring [left:right+1] qualifies and its length is returned.
16 if not split_char:
17 return right - left + 1
18
19 max_length = 0 # Initialize the max length of a valid substring.
20 i = left # Start of the next substring to examine.
21
22 # Iterate through the string to find and process all valid substrings.
23 while i <= right:
24 # Skip all occurrences of the split character.
25 while i <= right and s[i] == split_char:
26 i += 1
27
28 # If the end of the string is reached, break out of the loop.
29 if i > right:
30 break
31
32 # Find the end of the next substring that doesn't contain the split character.
33 j = i
34 while j <= right and s[j] != split_char:
35 j += 1
36
37 # Recursively call dfs on the next valid substring.
38 current_length = dfs(i, j - 1)
39
40 # Update the maximum length found so far.
41 max_length = max(max_length, current_length)
42
43 # Start the next iteration from the end of the discovered substring.
44 i = j
45
46 # Return the maximum length of a valid substring found.
47 return max_length
48
49 # Call our helper function with the entire string to start.
50 return dfs(0, len(s) - 1)
51
52# Example usage.
53solution = Solution()
54print(solution.longestSubstring("aaabb", 3)) # Output should be 3, as "aaa" is the longest valid substring.
55
1class Solution {
2 // Class instance variables to hold the input string and the minimum repeat-count 'k'.
3 private String inputString;
4 private int minRepeats;
5
6 // Public method to initiate the longest substring search.
7 public int longestSubstring(String s, int k) {
8 this.inputString = s;
9 this.minRepeats = k;
10 // Start the depth-first search for the longest substring.
11 return depthFirstSearch(0, s.length() - 1);
12 }
13
14 // A private helper method for the depth-first search to find the longest substring.
15 private int depthFirstSearch(int left, int right) {
16 // Array to count occurrences of each character.
17 int[] charCounts = new int[26];
18 for (int i = left; i <= right; ++i) {
19 charCounts[inputString.charAt(i) - 'a']++;
20 }
21
22 // Initialize the variable ‘splitChar’ that holds the character to split on.
23 char splitChar = 0;
24 for (int i = 0; i < 26; ++i) {
25 // Find the first character that occurs less than 'k' times, if any.
26 if (charCounts[i] > 0 && charCounts[i] < minRepeats) {
27 splitChar = (char) (i + 'a');
28 break;
29 }
30 }
31
32 // If no split character is found (all characters occur at least k times), return the substring length.
33 if (splitChar == 0) {
34 return right - left + 1;
35 }
36
37 // Initialize the start index for the next segment of the search.
38 int start = left;
39 int maximumLength = 0;
40 while (start <= right) {
41 // Skip all occurrences of the split character.
42 while (start <= right && inputString.charAt(start) == splitChar) {
43 start++;
44 }
45 if (start > right) { // If there is no non-split character left, break.
46 break;
47 }
48
49 // Find the next segment without split character.
50 int end = start;
51 while (end <= right && inputString.charAt(end) != splitChar) {
52 end++;
53 }
54
55 // Calculate the maximum length for the current segment.
56 int segmentLength = depthFirstSearch(start, end - 1);
57
58 // Update the maximum length if segment length is larger.
59 maximumLength = Math.max(maximumLength, segmentLength);
60
61 // Move to the next potential segment.
62 start = end;
63 }
64
65 // Return the maximum length found.
66 return maximumLength;
67 }
68}
69
1#include <algorithm>
2#include <string>
3#include <functional>
4
5class Solution {
6public:
7 int longestSubstring(std::string s, int k) {
8 // Define a lambda function to perform the depth-first search.
9 std::function<int(int, int)> dfs = [&](int start, int end) -> int {
10 int counts[26] = {0};
11 // Count occurrences of each character in the current substring.
12 for (int i = start; i <= end; ++i) {
13 counts[s[i] - 'a']++;
14 }
15
16 // Find a character that occurs less than k times to split the problem.
17 char separator = 0;
18 for (int i = 0; i < 26; ++i) {
19 if (counts[i] > 0 && counts[i] < k) {
20 separator = 'a' + i;
21 break;
22 }
23 }
24
25 // If there is no such character, the whole substring is valid.
26 if (separator == 0) {
27 return end - start + 1;
28 }
29
30 // Initialize start index for the next segment.
31 int currentStart = start;
32 int maxLength = 0;
33
34 // Split the string and use recursion to find the longest valid substring.
35 while (currentStart <= end) {
36 // Skip the segment of invalid characters (separator).
37 while (currentStart <= end && s[currentStart] == separator) {
38 ++currentStart;
39 }
40
41 // If we have reached the end of the string, break the loop.
42 if (currentStart > end) {
43 break;
44 }
45
46 // Find the end index of the next segment that does not include the separator.
47 int currentEnd = currentStart;
48 while (currentEnd <= end && s[currentEnd] != separator) {
49 ++currentEnd;
50 }
51
52 // Perform a recursion call for the current segment.
53 int currentLength = dfs(currentStart, currentEnd - 1);
54
55 // Update the maximum length found so far.
56 maxLength = std::max(maxLength, currentLength);
57
58 // Move to the next segment.
59 currentStart = currentEnd;
60 }
61
62 // Return the maximum length of all segments.
63 return maxLength;
64 };
65
66 // Call the lambda function with the full string as the initial segment.
67 return dfs(0, s.size() - 1);
68 }
69};
70
1function longestSubstring(s: string, k: number): number {
2 // Recursive helper function to perform a depth-first search for the longest substring.
3 const dfs = (start: number, end: number): number => {
4 const counts = new Array(26).fill(0);
5 // Count the occurrences of each character in the current substring.
6 for (let i = start; i <= end; ++i) {
7 counts[s.charCodeAt(i) - 'a'.charCodeAt(0)]++;
8 }
9
10 // Find a character that occurs fewer than 'k' times to split the problem.
11 let separator = 0;
12 for (let i = 0; i < 26; ++i) {
13 if (counts[i] > 0 && counts[i] < k) {
14 separator = 'a'.charCodeAt(0) + i;
15 break;
16 }
17 }
18
19 // If there is no such character, the whole substring is valid.
20 if (separator === 0) {
21 return end - start + 1;
22 }
23
24 // Initialize start index for the next segment.
25 let currentStart = start;
26 let maxLength = 0;
27
28 // Split the string and use recursion to find the longest valid substring.
29 while (currentStart <= end) {
30 // Skip the segment of characters that are invalid (those being the separator).
31 while (currentStart <= end && s.charCodeAt(currentStart) === separator) {
32 ++currentStart;
33 }
34
35 // If we have reached the end of the string, exit the loop.
36 if (currentStart > end) {
37 break;
38 }
39
40 // Find the end index of the next segment that does not include the separator.
41 let currentEnd = currentStart;
42 while (currentEnd <= end && s.charCodeAt(currentEnd) !== separator) {
43 ++currentEnd;
44 }
45
46 // Perform a recursive call for the current segment.
47 let currentLength = dfs(currentStart, currentEnd - 1);
48
49 // Update the maximum length found so far.
50 maxLength = Math.max(maxLength, currentLength);
51
52 // Proceed to the next segment.
53 currentStart = currentEnd;
54 }
55
56 // Return the maximum length of all segments.
57 return maxLength;
58 };
59
60 // Call the recursive function with the full string as the initial segment.
61 return dfs(0, s.length - 1);
62}
63
Time and Space Complexity
The time complexity of the given code can vary significantly based on the input string s
and the integer k
. In the best-case scenario, where no character frequency is smaller than k
, the function dfs
runs once, and the time complexity is O(n)
where n
is the length of s
, since it only requires a single pass to count the character frequencies. However, in the worst case, the recursive function dfs
may be called multiple times due to characters not meeting the k
threshold and splitting the string into subproblems.
For every recursive call, a new Counter object is created, which takes O(m)
time, where m
is the size of the substring being examined. In the worst case, the code could potentially split after every character if all characters are unique or the k
requirement is high relative to the number of times a character appears. Thus, the recursive splitting can lead to the algorithm having a time complexity of O(n^2)
in the worst case for a string with mostly unique characters, since the Counter object is built for each call and there could be n
recursive calls with string splits at each character.
As for the space complexity, the main space usage comes from the recursive call stack and the Counter objects being created. The recursive call stack can go as deep as O(n)
in the worst case when the function splits at every character. Additionally, O(n)
space would be required for each Counter object in the worst case, each one existing during its own frame of the call stack. However, only one Counter exists at a time during a single path of execution, so it doesn’t multiply by the depth of recursion. Therefore, space complexity is also O(n)
as it is determined by the depth of the recursion stack and the space for the Counter which holds at most n
characters at the initial call.
Learn more about how to find time and space complexity quickly using problem constraints.
Which technique can we use to find the middle of a linked list?
Recommended Readings
https algomonster s3 us east 2 amazonaws com cover_photos dnd svg Divide and Conquer The main idea of divide and conquer is to split the main problem into two smaller components assuming that each one of the components had already been solved recursively and then try to solve the bigger
https algomonster s3 us east 2 amazonaws com cover_photos stack svg Sliding Window Maximum Monotonic Stack We have an array and a sliding window defined by a start index and an end index The sliding window moves from left of the array to right There are always k elements in
LeetCode Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we
Want a Structured Path to Master System Design Too? Don’t Miss This!