395. Longest Substring with At Least K Repeating Characters


Problem Description

The problem is about finding the longest substring within a given string s where the minimum frequency of each unique character in that substring is at least k. The frequency of a character means the number of times that character appears in the substring. A substring is a contiguous sequence of characters within a string. If there is no such substring that meets the criteria, the output should be 0.

To summarize:

  • Suppose we have a string "aaabb" and k = 3.
  • A valid substring would be "aaa" because the character 'a' appears at least 3 times.
  • The substring "aaabb" is not valid because the character 'b' only appears twice, which is less than k.

The goal is to return the length of the longest valid substring.

Intuition

The intuition behind the solution is to use a divide-and-conquer approach. This approach is suitable because we can check for the validity of character frequencies in substrings and can recursively apply this to smaller substrings if we find a character not meeting the frequency criteria.

Here's the step-by-step intuition for the provided code:

  1. We begin the solution by defining a recursive function dfs(l, r), which will be responsible for solving the problem for the substring of s starting at index l and ending at index r.

  2. We use a Counter from Python's collections module to count the frequencies of characters in the current substring s[l:r+1].

  3. We look for the first character (called split) in the substring whose frequency is less than k by iterating over the count items. If we don't find such a character, it means the entire current substring is valid, and we return its length.

  4. If we find such a character, we use it to split our current substring into smaller substrings on which we will perform the same check recursively.

  5. We iterate over the indices between l and r, looking for the split character. Once the split character is found, we perform a recursive call to the dfs function for the substring before the split character appears.

  6. We maintain a variable ans which keeps track of the length of the longest valid substring found so far and update it with the values returned from recursive calls.

  7. After the loop, ans will contain the length of the longest valid substring of the original substring s[l:r+1] where all characters' frequencies are at least k, and we return this value.

The overall function longestSubstring initiates the process by calling dfs(0, len(s) - 1) (i.e., for the whole string) and returns the result.

The beauty of this approach lies in its ability to narrow down the problem into smaller instances of the same problem. Thus, focusing on the character frequency condition, we further reduce the problem size by eliminating segments that do not meet the condition until a valid solution is reached or all possibilities are exhausted.

Learn more about Divide and Conquer and Sliding Window patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

Problem: Given a list of tasks and a list of requirements, compute a sequence of tasks that can be performed, such that we complete every task once while satisfying all the requirements.

Which of the following method should we use to solve this problem?

Solution Approach

The solution to the given problem involves several key concepts of algorithm design and usage of data structures, which are elaborated below:

Data Structures:

  • Counter: From Python's collections module. It's a subclass of dictionary used to count hashable objects (characters in this case). It's used to compute the frequency of each character within the current substring.

Algorithm:

  • Divide and Conquer: This is a recursive strategy, where the problem is divided into subproblems, each of which is solved independently. In this case, whenever a character with frequency less than k is encountered, the problem is split and the process is repeated on each resulting substring.

Step-by-Step Implementation:

  1. Recursive Function: The dfs(l, r) function is defined to apply a divide-and-conquer strategy. It is responsible for returning the length of the longest valid substring within s[l:r+1].

  2. Counting Frequencies: Counter(s[l:r+1]) calculates the frequency of each character in the current substring.

  3. Split Character Check: split = next((c for c, v in cnt.items() if v < k), '') looks for a character that does not satisfy the frequency condition (frequency is less than k). If such a character is found, it will be used to split the substring. If no such character exists, the condition is satisfied for the entire substring, which is then a candidate for the answer.

  4. Iterate and Split: If a split character is found, the function iterates over s[l:r+1], looking for occurrences of this character. Each time it's found, that denotes a potential end of a valid substring, which is then processed recursively:

    • Skip past any consecutive split characters.
    • Find the next segment that doesn't contain the split character.
    • Recursively apply dfs on this segment to find the length of its longest valid substring.
  5. Updating the Answer: Within the iteration, the ans variable is updated with the maximum value returned by the recursive calls, ensuring that the maximum length valid substring is kept.

  6. Recursive Calls for Substrings: The recursive call t = dfs(i, j - 1) computes the length of the longest valid substring for the segment determined between the indices i to j-1.

  7. Combining Results: After the loop concludes, the variable ans holds the maximum length among all valid substrings found, which is returned by the recursive function dfs.

  8. Initialization: The recursive process is started off by the longestSubstring function with a call to dfs(0, len(s) - 1), which passes the entire string into the dfs function.

  9. Result: Finally, the maximal length is returned by the longestSubstring function, indicating the length of the longest valid substring found in the initial string s.

By combining this divide-and-conquer strategy with efficient count operations and recursion, this solution effectively breaks down the string into smaller segments, each of which can be validated independently, resulting in a powerful and elegant solution to the problem.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

What data structure does Breadth-first search typically uses to store intermediate states?

Example Walkthrough

Let's use the string "ababbc" and k = 2 as an example to illustrate the solution approach.

  1. Initialization: We call the recursive function dfs with the entire string, so dfs(0, 5).

  2. Counting Frequencies: The Counter is used to count characters, and we get {'a': 2, 'b': 3, 'c': 1}.

  3. Split Character Check: We check for a character with frequency less than k. The character 'c' has a frequency less than 2, so it is chosen as the split character.

  4. Iterate and Split: We use 'c' to split the string into valid substrings to search for our answer. The string "ababbc" is split into "ababb" and "", since there are no characters after 'c'.

  5. Recursive Calls for Substrings: Now we consider the substring "ababb":

    • We count the frequencies within "ababb" and get {'a': 2, 'b': 3}.
    • All characters meet the frequency criteria, so "ababb" is a valid substring.
    • We update ans to 5, which is the length of "ababb".
  6. Combining Results: At this point, ans is 5, and there are no more substrings to consider.

  7. Result: The longestSubstring function returns ans, which is 5, indicating the length of the longest valid substring found in the initial string "ababbc" with a minimum character frequency of 2.

This walkthrough shows how we use a divide-and-conquer strategy to split the initial problem into a simpler subproblem. By applying the same logic to smaller and smaller substrings, the algorithm finds the length of the longest substring that satisfies the minimum character frequency requirement.

Solution Implementation

1from collections import Counter
2
3class Solution:
4    def longestSubstring(self, s: str, k: int) -> int:
5        # Helper function that performs depth-first search to find the longest substring.
6        def dfs(left, right):
7            # Count the frequency of each character in the current substring.
8            counter = Counter(s[left:right + 1])
9
10            # Find the first character which has fewer occurrences than k
11            # and use it as a split point to divide the string.
12            split_char = next((char for char, freq in counter.items() if freq < k), None)
13
14            # If no such character that occurred fewer than k times was found,
15            # the whole substring [left:right+1] qualifies and its length is returned.
16            if not split_char:
17                return right - left + 1
18          
19            max_length = 0  # Initialize the max length of a valid substring.
20            i = left  # Start of the next substring to examine.
21          
22            # Iterate through the string to find and process all valid substrings.
23            while i <= right:
24                # Skip all occurrences of the split character.
25                while i <= right and s[i] == split_char:
26                    i += 1
27              
28                # If the end of the string is reached, break out of the loop.
29                if i > right:
30                    break
31
32                # Find the end of the next substring that doesn't contain the split character.
33                j = i
34                while j <= right and s[j] != split_char:
35                    j += 1
36
37                # Recursively call dfs on the next valid substring.
38                current_length = dfs(i, j - 1)
39              
40                # Update the maximum length found so far.
41                max_length = max(max_length, current_length)
42
43                # Start the next iteration from the end of the discovered substring.
44                i = j
45          
46            # Return the maximum length of a valid substring found.
47            return max_length
48
49        # Call our helper function with the entire string to start.
50        return dfs(0, len(s) - 1)
51
52# Example usage.
53solution = Solution()
54print(solution.longestSubstring("aaabb", 3))  # Output should be 3, as "aaa" is the longest valid substring.
55
1class Solution {
2    // Class instance variables to hold the input string and the minimum repeat-count 'k'.
3    private String inputString;
4    private int minRepeats;
5
6    // Public method to initiate the longest substring search.
7    public int longestSubstring(String s, int k) {
8        this.inputString = s;
9        this.minRepeats = k;
10        // Start the depth-first search for the longest substring.
11        return depthFirstSearch(0, s.length() - 1);
12    }
13
14    // A private helper method for the depth-first search to find the longest substring.
15    private int depthFirstSearch(int left, int right) {
16        // Array to count occurrences of each character.
17        int[] charCounts = new int[26];
18        for (int i = left; i <= right; ++i) {
19            charCounts[inputString.charAt(i) - 'a']++;
20        }
21
22        // Initialize the variable ‘splitChar’ that holds the character to split on.
23        char splitChar = 0;
24        for (int i = 0; i < 26; ++i) {
25            // Find the first character that occurs less than 'k' times, if any.
26            if (charCounts[i] > 0 && charCounts[i] < minRepeats) {
27                splitChar = (char) (i + 'a');
28                break;
29            }
30        }
31      
32        // If no split character is found (all characters occur at least k times), return the substring length.
33        if (splitChar == 0) {
34            return right - left + 1;
35        }
36
37        // Initialize the start index for the next segment of the search.
38        int start = left;
39        int maximumLength = 0;
40        while (start <= right) {
41            // Skip all occurrences of the split character.
42            while (start <= right && inputString.charAt(start) == splitChar) {
43                start++;
44            }
45            if (start > right) { // If there is no non-split character left, break.
46                break;
47            }
48          
49            // Find the next segment without split character.
50            int end = start;
51            while (end <= right && inputString.charAt(end) != splitChar) {
52                end++;
53            }
54          
55            // Calculate the maximum length for the current segment.
56            int segmentLength = depthFirstSearch(start, end - 1);
57          
58            // Update the maximum length if segment length is larger.
59            maximumLength = Math.max(maximumLength, segmentLength);
60          
61            // Move to the next potential segment.
62            start = end;
63        }
64      
65        // Return the maximum length found.
66        return maximumLength;
67    }
68}
69
1#include <algorithm>
2#include <string>
3#include <functional>
4
5class Solution {
6public:
7    int longestSubstring(std::string s, int k) {
8        // Define a lambda function to perform the depth-first search.
9        std::function<int(int, int)> dfs = [&](int start, int end) -> int {
10            int counts[26] = {0};
11            // Count occurrences of each character in the current substring.
12            for (int i = start; i <= end; ++i) {
13                counts[s[i] - 'a']++;
14            }
15          
16            // Find a character that occurs less than k times to split the problem.
17            char separator = 0;
18            for (int i = 0; i < 26; ++i) {
19                if (counts[i] > 0 && counts[i] < k) {
20                    separator = 'a' + i;
21                    break;
22                }
23            }
24
25            // If there is no such character, the whole substring is valid.
26            if (separator == 0) {
27                return end - start + 1;
28            }
29
30            // Initialize start index for the next segment.
31            int currentStart = start;
32            int maxLength = 0;
33
34            // Split the string and use recursion to find the longest valid substring.
35            while (currentStart <= end) {
36                // Skip the segment of invalid characters (separator).
37                while (currentStart <= end && s[currentStart] == separator) {
38                    ++currentStart;
39                }
40
41                // If we have reached the end of the string, break the loop.
42                if (currentStart > end) {
43                    break;
44                }
45
46                // Find the end index of the next segment that does not include the separator.
47                int currentEnd = currentStart;
48                while (currentEnd <= end && s[currentEnd] != separator) {
49                    ++currentEnd;
50                }
51
52                // Perform a recursion call for the current segment.
53                int currentLength = dfs(currentStart, currentEnd - 1);
54
55                // Update the maximum length found so far.
56                maxLength = std::max(maxLength, currentLength);
57
58                // Move to the next segment.
59                currentStart = currentEnd;
60            }
61
62            // Return the maximum length of all segments.
63            return maxLength;
64        };
65
66        // Call the lambda function with the full string as the initial segment.
67        return dfs(0, s.size() - 1);
68    }
69};
70
1function longestSubstring(s: string, k: number): number {
2    // Recursive helper function to perform a depth-first search for the longest substring.
3    const dfs = (start: number, end: number): number => {
4        const counts = new Array(26).fill(0);
5        // Count the occurrences of each character in the current substring.
6        for (let i = start; i <= end; ++i) {
7            counts[s.charCodeAt(i) - 'a'.charCodeAt(0)]++;
8        }
9
10        // Find a character that occurs fewer than 'k' times to split the problem.
11        let separator = 0;
12        for (let i = 0; i < 26; ++i) {
13            if (counts[i] > 0 && counts[i] < k) {
14                separator = 'a'.charCodeAt(0) + i;
15                break;
16            }
17        }
18
19        // If there is no such character, the whole substring is valid.
20        if (separator === 0) {
21            return end - start + 1;
22        }
23
24        // Initialize start index for the next segment.
25        let currentStart = start;
26        let maxLength = 0;
27
28        // Split the string and use recursion to find the longest valid substring.
29        while (currentStart <= end) {
30            // Skip the segment of characters that are invalid (those being the separator).
31            while (currentStart <= end && s.charCodeAt(currentStart) === separator) {
32                ++currentStart;
33            }
34
35            // If we have reached the end of the string, exit the loop.
36            if (currentStart > end) {
37                break;
38            }
39
40            // Find the end index of the next segment that does not include the separator.
41            let currentEnd = currentStart;
42            while (currentEnd <= end && s.charCodeAt(currentEnd) !== separator) {
43                ++currentEnd;
44            }
45
46            // Perform a recursive call for the current segment.
47            let currentLength = dfs(currentStart, currentEnd - 1);
48
49            // Update the maximum length found so far.
50            maxLength = Math.max(maxLength, currentLength);
51
52            // Proceed to the next segment.
53            currentStart = currentEnd;
54        }
55
56        // Return the maximum length of all segments.
57        return maxLength;
58    };
59
60    // Call the recursive function with the full string as the initial segment.
61    return dfs(0, s.length - 1);
62}
63
Not Sure What to Study? Take the 2-min Quiz:

Which of the following is the prefix sum of array [1, 2, 3, 4, 5]?

Time and Space Complexity

The time complexity of the given code can vary significantly based on the input string s and the integer k. In the best-case scenario, where no character frequency is smaller than k, the function dfs runs once, and the time complexity is O(n) where n is the length of s, since it only requires a single pass to count the character frequencies. However, in the worst case, the recursive function dfs may be called multiple times due to characters not meeting the k threshold and splitting the string into subproblems.

For every recursive call, a new Counter object is created, which takes O(m) time, where m is the size of the substring being examined. In the worst case, the code could potentially split after every character if all characters are unique or the k requirement is high relative to the number of times a character appears. Thus, the recursive splitting can lead to the algorithm having a time complexity of O(n^2) in the worst case for a string with mostly unique characters, since the Counter object is built for each call and there could be n recursive calls with string splits at each character.

As for the space complexity, the main space usage comes from the recursive call stack and the Counter objects being created. The recursive call stack can go as deep as O(n) in the worst case when the function splits at every character. Additionally, O(n) space would be required for each Counter object in the worst case, each one existing during its own frame of the call stack. However, only one Counter exists at a time during a single path of execution, so it doesn’t multiply by the depth of recursion. Therefore, space complexity is also O(n) as it is determined by the depth of the recursion stack and the space for the Counter which holds at most n characters at the initial call.

Learn more about how to find time and space complexity quickly using problem constraints.

Fast Track Your Learning with Our Quick Skills Quiz:

In a binary min heap, the minimum element can be found in:


Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫