Facebook Pixel

395. Longest Substring with At Least K Repeating Characters

Problem Description

You are given a string s and an integer k. Your task is to find the length of the longest possible substring where every character appears at least k times.

For example, if s = "aaabb" and k = 3, the longest valid substring would be "aaa" with length 3, since 'a' appears 3 times (meeting the requirement), while the full string wouldn't work because 'b' only appears 2 times (less than k=3).

The solution uses a divide-and-conquer approach. It first counts the frequency of all characters in the current substring. If any character appears fewer than k times, that character cannot be part of any valid substring, so it acts as a "split point". The algorithm then recursively checks all segments between these split points to find the maximum length valid substring.

The key insight is that any character appearing less than k times in a range cannot be included in a valid substring within that range. So the algorithm:

  1. Counts character frequencies in the current range [l, r]
  2. Finds a character that appears less than k times (if any)
  3. If no such character exists, the entire range is valid, return its length
  4. Otherwise, split the range at all occurrences of this character and recursively check each segment
  5. Return the maximum length found among all segments

If no valid substring exists (all characters appear fewer than k times), the function returns 0.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key observation is that if a character appears fewer than k times in our string, it can never be part of any valid substring. This character essentially "breaks" the string into smaller segments.

Think about it this way: if we have a string like "aaabccc" and k = 3, the character 'b' appears only once. No matter what substring we choose that includes 'b', it will violate our requirement. So 'b' acts as a natural divider - any valid substring must come entirely from the left side of 'b' or entirely from the right side.

This leads us to a divide-and-conquer strategy. We can identify these "invalid" characters that appear less than k times and use them as split points. Then we recursively examine each segment between these split points.

Why does this work? Because we're essentially eliminating impossible regions. Once we split by an invalid character, we know:

  1. That character cannot be in our answer
  2. Our answer must come from one of the remaining continuous segments
  3. Each segment is independent - we can find the best answer in each and take the maximum

The recursion naturally handles cases where after splitting, some segments might have new characters that now appear less than k times (since we're looking at a smaller range). The process continues until we either find a segment where all characters meet the frequency requirement, or we've split everything down to segments that are too small to be valid.

This approach is efficient because we're not checking every possible substring. Instead, we're intelligently pruning the search space by recognizing which characters make certain substrings impossible, allowing us to focus only on potentially valid regions.

Solution Approach

The implementation uses a recursive divide-and-conquer approach with a helper function dfs(l, r) that processes substrings from index l to r.

Step-by-step breakdown:

  1. Count character frequencies: For the current substring s[l:r+1], we use Counter to get the frequency of each character in the range.

  2. Find split character: We search for any character whose frequency is less than k. This is done using:

    split = next((c for c, v in cnt.items() if v < k), '')

    If no such character exists (returns empty string), the entire substring is valid.

  3. Base case: If no split character is found, all characters in the current range appear at least k times, so we return the length r - l + 1.

  4. Divide the problem: When a split character exists, we iterate through the range and skip over all occurrences of the split character while identifying continuous segments that don't contain it:

    while i <= r:
        while i <= r and s[i] == split:  # Skip split characters
            i += 1
        if i >= r:
            break
        j = i
        while j <= r and s[j] != split:  # Find end of valid segment
            j += 1
        t = dfs(i, j - 1)  # Recursively process this segment
        ans = max(ans, t)   # Keep track of maximum
        i = j
  5. Segment processing: For each segment between split characters:

    • Start at position i after skipping split characters
    • Find the end position j where the next split character appears
    • Recursively call dfs(i, j-1) on this segment
    • Update the maximum length found so far
  6. Return maximum: After processing all segments, return the maximum valid substring length found. If no valid segments exist, this will be 0.

The main function simply initiates the process by calling dfs(0, len(s) - 1) to examine the entire string.

Time Complexity: O(n) in the best case when no splitting is needed, but can be O(n²) in the worst case when we need to split at every level and recount characters.

Space Complexity: O(n) for the recursion stack in the worst case and the Counter objects created at each recursive level.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's trace through the algorithm with s = "aaabbbccd" and k = 3.

Initial call: dfs(0, 8) (examining "aaabbbccd")

  1. Count frequencies: {'a': 3, 'b': 3, 'c': 2, 'd': 1}
  2. Find split character: 'c' has frequency 2 (< 3), so we'll split on 'c'
  3. Split the string at all occurrences of 'c':
    • Segment 1: "aaabbb" (indices 0-5)
    • Skip 'c' at indices 6-7
    • Segment 2: "d" (index 8)

Recursive call 1: dfs(0, 5) (examining "aaabbb")

  1. Count frequencies: {'a': 3, 'b': 3}
  2. Find split character: None! Both 'a' and 'b' appear exactly 3 times
  3. Base case reached: return length = 5 - 0 + 1 = 6

Recursive call 2: dfs(8, 8) (examining "d")

  1. Count frequencies: {'d': 1}
  2. Find split character: 'd' has frequency 1 (< 3), so we'll split on 'd'
  3. After skipping all 'd' characters, no segments remain
  4. Return 0

Back to initial call:

  • Maximum of all segments: max(6, 0) = 6
  • Return 6

The answer is 6, corresponding to the substring "aaabbb" where both 'a' and 'b' appear exactly 3 times.

Key observations from this walkthrough:

  • Character 'c' (appearing 2 times) acted as our first split point, dividing the string into "aaabbb" and "d"
  • The segment "aaabbb" was valid since all its characters met the k=3 requirement
  • The segment "d" couldn't form any valid substring since 'd' appears only once
  • The algorithm efficiently identified the longest valid substring without checking all possible substrings

Solution Implementation

1from collections import Counter
2from typing import Optional
3
4class Solution:
5    def longestSubstring(self, s: str, k: int) -> int:
6        """
7        Find the length of the longest substring where every character appears at least k times.
8      
9        Args:
10            s: Input string
11            k: Minimum frequency requirement for each character
12          
13        Returns:
14            Length of the longest valid substring
15        """
16      
17        def divide_and_conquer(left: int, right: int) -> int:
18            """
19            Recursively find the longest valid substring in s[left:right+1].
20          
21            Strategy: Find characters that appear less than k times and use them as split points.
22            Then recursively check substrings between these split points.
23          
24            Args:
25                left: Start index (inclusive)
26                right: End index (inclusive)
27              
28            Returns:
29                Length of the longest valid substring in the given range
30            """
31            # Count character frequencies in the current substring
32            char_count = Counter(s[left:right + 1])
33          
34            # Find a character that appears less than k times (if any)
35            # This character cannot be part of any valid substring
36            invalid_char = next((char for char, freq in char_count.items() if freq < k), '')
37          
38            # Base case: if no invalid character found, entire substring is valid
39            if not invalid_char:
40                return right - left + 1
41          
42            # Split the substring using the invalid character as delimiter
43            current_index = left
44            max_length = 0
45          
46            while current_index <= right:
47                # Skip over consecutive occurrences of the invalid character
48                while current_index <= right and s[current_index] == invalid_char:
49                    current_index += 1
50              
51                # Check if we've reached the end
52                if current_index >= right:
53                    break
54              
55                # Find the next occurrence of the invalid character
56                next_invalid_index = current_index
57                while next_invalid_index <= right and s[next_invalid_index] != invalid_char:
58                    next_invalid_index += 1
59              
60                # Recursively check the substring between invalid characters
61                substring_length = divide_and_conquer(current_index, next_invalid_index - 1)
62                max_length = max(max_length, substring_length)
63              
64                # Move to the next segment
65                current_index = next_invalid_index
66          
67            return max_length
68      
69        # Start the divide and conquer process with the entire string
70        return divide_and_conquer(0, len(s) - 1)
71
1class Solution {
2    private String inputString;
3    private int minFrequency;
4
5    /**
6     * Finds the length of the longest substring where every character appears at least k times
7     * @param s The input string
8     * @param k The minimum frequency required for each character
9     * @return The length of the longest valid substring
10     */
11    public int longestSubstring(String s, int k) {
12        this.inputString = s;
13        this.minFrequency = k;
14        return divideAndConquer(0, s.length() - 1);
15    }
16
17    /**
18     * Recursively finds the longest valid substring using divide and conquer approach
19     * @param left The left boundary of the current substring (inclusive)
20     * @param right The right boundary of the current substring (inclusive)
21     * @return The length of the longest valid substring in the range [left, right]
22     */
23    private int divideAndConquer(int left, int right) {
24        // Count frequency of each character in the current range
25        int[] charFrequency = new int[26];
26        for (int i = left; i <= right; i++) {
27            charFrequency[inputString.charAt(i) - 'a']++;
28        }
29      
30        // Find a character that appears but doesn't meet the minimum frequency requirement
31        // This character will be used to split the string
32        char splitChar = 0;
33        for (int i = 0; i < 26; i++) {
34            if (charFrequency[i] > 0 && charFrequency[i] < minFrequency) {
35                splitChar = (char) (i + 'a');
36                break;
37            }
38        }
39      
40        // If no split character found, all characters meet the requirement
41        // Return the length of the current substring
42        if (splitChar == 0) {
43            return right - left + 1;
44        }
45      
46        // Split the string by the invalid character and recursively process each segment
47        int currentIndex = left;
48        int maxLength = 0;
49      
50        while (currentIndex <= right) {
51            // Skip all occurrences of the split character
52            while (currentIndex <= right && inputString.charAt(currentIndex) == splitChar) {
53                currentIndex++;
54            }
55          
56            // Check if we've reached the end
57            if (currentIndex > right) {
58                break;
59            }
60          
61            // Find the end of the current valid segment (before next split character)
62            int segmentEnd = currentIndex;
63            while (segmentEnd <= right && inputString.charAt(segmentEnd) != splitChar) {
64                segmentEnd++;
65            }
66          
67            // Recursively process this segment and update the maximum length
68            int segmentLength = divideAndConquer(currentIndex, segmentEnd - 1);
69            maxLength = Math.max(maxLength, segmentLength);
70          
71            // Move to the next segment
72            currentIndex = segmentEnd;
73        }
74      
75        return maxLength;
76    }
77}
78
1class Solution {
2public:
3    int longestSubstring(string s, int k) {
4        // Define a recursive function to find the longest valid substring in range [left, right]
5        function<int(int, int)> findLongestValidSubstring = [&](int left, int right) -> int {
6            // Count frequency of each character in the current range
7            int charFrequency[26] = {0};
8            for (int i = left; i <= right; ++i) {
9                charFrequency[s[i] - 'a']++;
10            }
11          
12            // Find a character that appears but doesn't meet the minimum frequency k
13            // This character will be used as a separator to split the string
14            char separator = 0;
15            for (int i = 0; i < 26; ++i) {
16                if (charFrequency[i] > 0 && charFrequency[i] < k) {
17                    separator = 'a' + i;
18                    break;
19                }
20            }
21          
22            // If no separator found, all characters meet the requirement
23            // Return the length of the current range
24            if (separator == 0) {
25                return right - left + 1;
26            }
27          
28            // Split the string by the separator and recursively process each valid segment
29            int currentIndex = left;
30            int maxLength = 0;
31          
32            while (currentIndex <= right) {
33                // Skip all consecutive separator characters
34                while (currentIndex <= right && s[currentIndex] == separator) {
35                    ++currentIndex;
36                }
37              
38                // Check if we've reached the end
39                if (currentIndex >= right) {
40                    break;
41                }
42              
43                // Find the end of the current valid segment (before next separator)
44                int segmentEnd = currentIndex;
45                while (segmentEnd <= right && s[segmentEnd] != separator) {
46                    ++segmentEnd;
47                }
48              
49                // Recursively find the longest valid substring in this segment
50                int segmentLength = findLongestValidSubstring(currentIndex, segmentEnd - 1);
51                maxLength = max(maxLength, segmentLength);
52              
53                // Move to the next segment
54                currentIndex = segmentEnd;
55            }
56          
57            return maxLength;
58        };
59      
60        // Start the recursive search from the entire string
61        return findLongestValidSubstring(0, s.size() - 1);
62    }
63};
64
1function longestSubstring(s: string, k: number): number {
2    /**
3     * Recursively finds the longest valid substring where every character appears at least k times
4     * @param left - Starting index of the substring range
5     * @param right - Ending index of the substring range
6     * @returns Length of the longest valid substring in the given range
7     */
8    const findLongestValidSubstring = (left: number, right: number): number => {
9        // Count frequency of each character in the current range
10        const charFrequency: number[] = new Array(26).fill(0);
11        for (let i = left; i <= right; i++) {
12            charFrequency[s.charCodeAt(i) - 'a'.charCodeAt(0)]++;
13        }
14      
15        // Find a character that appears but doesn't meet the minimum frequency k
16        // This character will be used as a separator to split the string
17        let separator: string | null = null;
18        for (let i = 0; i < 26; i++) {
19            if (charFrequency[i] > 0 && charFrequency[i] < k) {
20                separator = String.fromCharCode('a'.charCodeAt(0) + i);
21                break;
22            }
23        }
24      
25        // If no separator found, all characters meet the requirement
26        // Return the length of the current range
27        if (separator === null) {
28            return right - left + 1;
29        }
30      
31        // Split the string by the separator and recursively process each valid segment
32        let currentIndex = left;
33        let maxLength = 0;
34      
35        while (currentIndex <= right) {
36            // Skip all consecutive separator characters
37            while (currentIndex <= right && s[currentIndex] === separator) {
38                currentIndex++;
39            }
40          
41            // Check if we've reached the end
42            if (currentIndex > right) {
43                break;
44            }
45          
46            // Find the end of the current valid segment (before next separator)
47            let segmentEnd = currentIndex;
48            while (segmentEnd <= right && s[segmentEnd] !== separator) {
49                segmentEnd++;
50            }
51          
52            // Recursively find the longest valid substring in this segment
53            const segmentLength = findLongestValidSubstring(currentIndex, segmentEnd - 1);
54            maxLength = Math.max(maxLength, segmentLength);
55          
56            // Move to the next segment
57            currentIndex = segmentEnd;
58        }
59      
60        return maxLength;
61    };
62  
63    // Start the recursive search from the entire string
64    return findLongestValidSubstring(0, s.length - 1);
65}
66

Time and Space Complexity

Time Complexity: O(n^2) in the worst case, where n is the length of the string.

The algorithm uses a divide-and-conquer approach with recursion. In each recursive call:

  • We count character frequencies using Counter(s[l : r + 1]), which takes O(r - l + 1) time
  • We search for a split character with frequency less than k, which takes O(26) = O(1) for lowercase English letters
  • We iterate through the substring to find segments separated by the split character, which takes O(r - l + 1) time

The key insight is that each character can be a split point at most once across all recursion levels. When a character becomes a split point, it divides the problem and is never considered again in the subproblems. With at most 26 unique characters, the recursion depth is bounded by O(26) = O(1).

At each recursion level, we process each character in the current range exactly once. Since characters are partitioned and not revisited in the same context, the total work across all recursive calls at the same depth level is O(n). With constant recursion depth, the overall time complexity is O(n) * O(26) = O(n).

However, the string slicing operation s[l : r + 1] in the Counter creates a new string, which takes O(r - l + 1) time. In the worst case where no valid substring exists and we recursively process many small segments, this could lead to O(n^2) time complexity due to repeated substring creation.

Space Complexity: O(n) in the worst case.

The space complexity consists of:

  • Recursion stack depth: O(26) = O(1) in the worst case (limited by unique characters)
  • Counter storage at each recursion level: O(26) = O(1) for character counts
  • String slicing s[l : r + 1] creates new strings: O(r - l + 1) per call, which could be O(n) for the initial call
  • Total space: O(n) dominated by the string slicing operation

Common Pitfalls

1. Off-by-One Errors in Index Management

One of the most common pitfalls in this divide-and-conquer approach is mishandling indices when splitting the string. The algorithm needs to carefully track inclusive/exclusive boundaries when:

  • Finding the next occurrence of the split character
  • Passing indices to recursive calls
  • Calculating substring lengths

Example of the pitfall:

# Incorrect: Using wrong boundary
substring_length = divide_and_conquer(current_index, next_invalid_index)  # Should be next_invalid_index - 1

Solution: Always be consistent with inclusive boundaries. When next_invalid_index points to the split character, the recursive call should use next_invalid_index - 1 as the right boundary.

2. Inefficient Character Frequency Counting

Creating a new Counter for the entire substring at each recursive level can lead to redundant work, especially when the string is long and has many split points.

Example of the pitfall:

# This creates a new Counter and processes the entire substring every time
char_count = Counter(s[left:right + 1])

Solution: For better performance, consider maintaining a frequency map that gets updated as you traverse, or use a sliding window approach for certain variations of the problem:

def divide_and_conquer_optimized(left: int, right: int, char_count: Optional[Counter] = None) -> int:
    # Reuse or update char_count when possible
    if char_count is None:
        char_count = Counter(s[left:right + 1])
    # ... rest of the logic

3. Not Handling Edge Cases Properly

The algorithm might fail or return incorrect results for edge cases like:

  • Empty string
  • k = 0 or k = 1
  • String where all characters appear less than k times

Example of the pitfall:

# Missing edge case handling
def longestSubstring(self, s: str, k: int) -> int:
    # Directly calling divide_and_conquer without checking edge cases
    return divide_and_conquer(0, len(s) - 1)

Solution: Add proper edge case handling:

def longestSubstring(self, s: str, k: int) -> int:
    if not s or len(s) < k:
        return 0
    if k <= 1:
        return len(s)
    return divide_and_conquer(0, len(s) - 1)

4. Infinite Recursion or Stack Overflow

If the boundary conditions aren't handled correctly, especially when current_index >= right, the algorithm might enter infinite recursion.

Example of the pitfall:

# Incorrect boundary check
if current_index > right:  # Should be >= to handle single character at boundary
    break

Solution: Ensure proper boundary checks and add a base case for single characters or empty ranges:

def divide_and_conquer(left: int, right: int) -> int:
    # Add base case for invalid or single-character ranges
    if left > right:
        return 0
    if left == right:
        return 1 if k <= 1 else 0
    # ... rest of the logic
Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

Which data structure is used in a depth first search?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More