Facebook Pixel

1759. Count Number of Homogenous Substrings

MediumMathString
Leetcode Link

Problem Description

You are given a string s consisting of characters. Your task is to count the total number of homogenous substrings in the string.

A homogenous substring is a substring where all characters are identical. For example, in the string "aaa", the homogenous substrings are "a", "a", "a", "aa", "aa", and "aaa".

A substring is any contiguous sequence of characters taken from the original string.

Since the count can be very large, you need to return the result modulo 10^9 + 7.

For example:

  • If s = "abbcccaa", the homogenous substrings are:
    • From "a": 1 substring ("a")
    • From "bb": 3 substrings ("b", "b", "bb")
    • From "ccc": 6 substrings ("c", "c", "c", "cc", "cc", "ccc")
    • From "aa": 3 substrings ("a", "a", "aa")
    • Total: 1 + 3 + 6 + 3 = 13

The key insight is that for a segment of n identical consecutive characters, the number of homogenous substrings that can be formed is n * (n + 1) / 2 (which is the sum of numbers from 1 to n).

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

When we look at a string like "aaabbc", we can observe that homogenous substrings can only come from continuous segments of identical characters. We can't form a homogenous substring by mixing characters from different segments.

Let's think about what happens when we have a segment of identical characters, say "aaa". How many homogenous substrings can we form?

  • Substrings of length 1: "a", "a", "a" → 3 substrings
  • Substrings of length 2: "aa", "aa" → 2 substrings
  • Substrings of length 3: "aaa" → 1 substring
  • Total: 3 + 2 + 1 = 6 substrings

This follows a pattern! For a segment of n identical characters, we can form:

  • n substrings of length 1
  • n-1 substrings of length 2
  • n-2 substrings of length 3
  • ... and so on
  • 1 substring of length n

This gives us n + (n-1) + (n-2) + ... + 1, which is the sum of first n natural numbers = n * (n + 1) / 2.

So our strategy becomes clear:

  1. Find each continuous segment of identical characters
  2. For each segment of length cnt, add cnt * (cnt + 1) / 2 to our answer
  3. Move to the next segment and repeat

This way, we only need to traverse the string once, identifying the boundaries where characters change, making this an efficient linear time solution.

Learn more about Math patterns.

Solution Approach

The implementation uses a two-pointer technique to identify and process continuous segments of identical characters:

  1. Initialize variables:

    • mod = 10^9 + 7 for the modulo operation
    • i = 0 as the starting pointer
    • ans = 0 to accumulate the result
  2. Main loop to process segments:

    while i < n:
        j = i
        while j < n and s[j] == s[i]:
            j += 1
    • Start with pointer i at the beginning of a segment
    • Use pointer j to find the end of the current segment of identical characters
    • The inner while loop continues as long as s[j] equals s[i], effectively finding all consecutive identical characters
  3. Calculate contribution of current segment:

    cnt = j - i
    ans += (1 + cnt) * cnt // 2
    ans %= mod
    • cnt = j - i gives the length of the current segment
    • Apply the formula (1 + cnt) * cnt // 2 which equals cnt * (cnt + 1) / 2
    • This formula calculates the sum 1 + 2 + 3 + ... + cnt
    • Apply modulo to keep the answer within bounds
  4. Move to next segment:

    i = j
    • Set i to j to start processing the next segment
    • Since j is already at the first character of the next different segment, we don't miss any characters

The algorithm efficiently processes the entire string in a single pass with O(n) time complexity and O(1) space complexity, as we only use a few variables to track positions and counts.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through the solution with the string s = "abbcccaa".

Initial Setup:

  • mod = 10^9 + 7
  • i = 0 (starting position)
  • ans = 0 (total count)
  • n = 8 (length of string)

First Segment - "a" at position 0:

  • Start: i = 0, s[i] = 'a'
  • Set j = 0 and expand while s[j] == 'a'
    • j = 0: s[0] = 'a' ✓, increment to j = 1
    • j = 1: s[1] = 'b' ✗, stop
  • Segment length: cnt = j - i = 1 - 0 = 1
  • Contribution: (1 + 1) * 1 // 2 = 1
  • Update: ans = 0 + 1 = 1
  • Move: i = 1

Second Segment - "bb" at positions 1-2:

  • Start: i = 1, s[i] = 'b'
  • Set j = 1 and expand while s[j] == 'b'
    • j = 1: s[1] = 'b' ✓, increment to j = 2
    • j = 2: s[2] = 'b' ✓, increment to j = 3
    • j = 3: s[3] = 'c' ✗, stop
  • Segment length: cnt = j - i = 3 - 1 = 2
  • Contribution: (1 + 2) * 2 // 2 = 3
  • Update: ans = 1 + 3 = 4
  • Move: i = 3

Third Segment - "ccc" at positions 3-5:

  • Start: i = 3, s[i] = 'c'
  • Set j = 3 and expand while s[j] == 'c'
    • j = 3: s[3] = 'c' ✓, increment to j = 4
    • j = 4: s[4] = 'c' ✓, increment to j = 5
    • j = 5: s[5] = 'c' ✓, increment to j = 6
    • j = 6: s[6] = 'a' ✗, stop
  • Segment length: cnt = j - i = 6 - 3 = 3
  • Contribution: (1 + 3) * 3 // 2 = 6
  • Update: ans = 4 + 6 = 10
  • Move: i = 6

Fourth Segment - "aa" at positions 6-7:

  • Start: i = 6, s[i] = 'a'
  • Set j = 6 and expand while s[j] == 'a'
    • j = 6: s[6] = 'a' ✓, increment to j = 7
    • j = 7: s[7] = 'a' ✓, increment to j = 8
    • j = 8: out of bounds, stop
  • Segment length: cnt = j - i = 8 - 6 = 2
  • Contribution: (1 + 2) * 2 // 2 = 3
  • Update: ans = 10 + 3 = 13
  • Move: i = 8

Termination:

  • i = 8 >= n, exit loop
  • Final answer: 13

The homogenous substrings counted are:

  • From "a": 1 substring
  • From "bb": 3 substrings ("b", "b", "bb")
  • From "ccc": 6 substrings ("c", "c", "c", "cc", "cc", "ccc")
  • From "aa": 3 substrings ("a", "a", "aa")

Total: 1 + 3 + 6 + 3 = 13

Solution Implementation

1class Solution:
2    def countHomogenous(self, s: str) -> int:
3        """
4        Count the number of homogeneous substrings in the given string.
5        A homogeneous substring consists of a single repeated character.
6      
7        Args:
8            s: Input string to analyze
9          
10        Returns:
11            Count of homogeneous substrings modulo 10^9 + 7
12        """
13        # Define the modulo value for preventing integer overflow
14        MOD = 10**9 + 7
15      
16        # Initialize pointers and result counter
17        current_index = 0
18        string_length = len(s)
19        total_count = 0
20      
21        # Process the string by finding consecutive identical characters
22        while current_index < string_length:
23            # Start of a new group of identical characters
24            group_start = current_index
25          
26            # Find the end of the current group of identical characters
27            while current_index < string_length and s[current_index] == s[group_start]:
28                current_index += 1
29          
30            # Calculate the length of the current homogeneous group
31            group_length = current_index - group_start
32          
33            # Add the count of all possible substrings in this group
34            # For a group of length n, there are n*(n+1)/2 substrings
35            total_count += (1 + group_length) * group_length // 2
36          
37            # Apply modulo to prevent overflow
38            total_count %= MOD
39      
40        return total_count
41
1class Solution {
2    // Define modulo constant for preventing integer overflow
3    private static final int MOD = (int) 1e9 + 7;
4
5    /**
6     * Counts the number of homogenous substrings in the given string.
7     * A homogenous substring consists of only a single repeating character.
8     * 
9     * @param s the input string
10     * @return the total count of homogenous substrings modulo 10^9 + 7
11     */
12    public int countHomogenous(String s) {
13        int length = s.length();
14        long totalCount = 0;
15      
16        // Use two pointers to identify consecutive groups of same characters
17        int start = 0;
18        while (start < length) {
19            int end = start;
20          
21            // Expand end pointer while characters are the same
22            while (end < length && s.charAt(end) == s.charAt(start)) {
23                end++;
24            }
25          
26            // Calculate the length of the current homogenous group
27            int groupLength = end - start;
28          
29            // For a group of n identical characters, the number of homogenous substrings
30            // is n + (n-1) + (n-2) + ... + 1 = n * (n + 1) / 2
31            totalCount += (long) (1 + groupLength) * groupLength / 2;
32          
33            // Apply modulo to prevent overflow
34            totalCount %= MOD;
35          
36            // Move start pointer to the beginning of the next group
37            start = end;
38        }
39      
40        return (int) totalCount;
41    }
42}
43
1class Solution {
2public:
3    const int MOD = 1e9 + 7;
4
5    int countHomogenous(string s) {
6        int n = s.size();
7        long totalCount = 0;
8      
9        // Process each homogeneous substring
10        for (int start = 0, end = 0; start < n; start = end) {
11            // Find the end of current homogeneous substring
12            end = start;
13            while (end < n && s[end] == s[start]) {
14                ++end;
15            }
16          
17            // Calculate length of current homogeneous substring
18            int length = end - start;
19          
20            // Add count of all substrings within this homogeneous substring
21            // For a substring of length n, number of substrings = n*(n+1)/2
22            totalCount += 1LL * (1 + length) * length / 2;
23            totalCount %= MOD;
24        }
25      
26        return totalCount;
27    }
28};
29
1/**
2 * Counts the number of homogenous substrings in a given string.
3 * A homogenous substring consists of only one unique character.
4 * 
5 * @param s - The input string to analyze
6 * @returns The total count of homogenous substrings modulo 10^9 + 7
7 */
8function countHomogenous(s: string): number {
9    // Define modulo constant to prevent integer overflow
10    const MOD: number = 1e9 + 7;
11  
12    // Get the length of the input string
13    const length: number = s.length;
14  
15    // Initialize the answer counter
16    let totalCount: number = 0;
17  
18    // Use two pointers to track homogenous segments
19    // startIndex: marks the beginning of current homogenous segment
20    // currentIndex: iterates through the string
21    let startIndex: number = 0;
22  
23    for (let currentIndex: number = 0; currentIndex < length; currentIndex++) {
24        // When characters don't match, reset the start of homogenous segment
25        if (s[startIndex] !== s[currentIndex]) {
26            startIndex = currentIndex;
27        }
28      
29        // Add the count of homogenous substrings ending at currentIndex
30        // The number of such substrings equals (currentIndex - startIndex + 1)
31        // This represents all substrings from startIndex to currentIndex
32        totalCount = (totalCount + currentIndex - startIndex + 1) % MOD;
33    }
34  
35    return totalCount;
36}
37

Time and Space Complexity

Time Complexity: O(n), where n is the length of the string s.

The algorithm uses a two-pointer approach with variables i and j. While there are nested while loops, each character in the string is visited exactly once:

  • The outer while loop iterates through the string with pointer i
  • The inner while loop advances pointer j to find the end of each homogeneous substring
  • After processing a homogeneous substring, i jumps directly to j, skipping all processed characters
  • Each character is examined once when j passes over it, resulting in a total of n character comparisons

The arithmetic operations inside the loop (calculating cnt, the sum formula (1 + cnt) * cnt // 2, and the modulo operation) all take O(1) time.

Space Complexity: O(1)

The algorithm uses only a constant amount of extra space:

  • Integer variables: mod, i, j, n, ans, cnt
  • No additional data structures are created
  • The space used does not depend on the input size

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Integer Overflow When Calculating Substring Count

The Pitfall: When calculating (1 + group_length) * group_length // 2, the intermediate multiplication (1 + group_length) * group_length can cause integer overflow before the division, especially for very long segments of identical characters. Even though Python handles arbitrary precision integers, in languages like Java or C++, this would cause incorrect results. Additionally, applying modulo only at the end could theoretically lead to very large intermediate values.

Example of the Issue:

# Problematic approach - accumulates large values before modulo
total_count += (1 + group_length) * group_length // 2
total_count %= MOD  # Modulo applied after potentially large addition

Solution: Apply modulo operation more frequently to keep numbers manageable:

# Better approach - apply modulo after each calculation
substring_count = (1 + group_length) * group_length // 2
total_count = (total_count + substring_count) % MOD

2. Off-by-One Error in Formula Application

The Pitfall: Confusing the formula for counting substrings. Some might incorrectly use n * (n - 1) / 2 or (n + 1) * n / 2 with wrong interpretation of what n represents.

Common Mistakes:

# Wrong: Using n*(n-1)/2 (formula for combinations, not substring count)
total_count += group_length * (group_length - 1) // 2

# Wrong: Misunderstanding the formula
total_count += group_length * (group_length + 1) // 2 + 1

Solution: Remember that for n consecutive identical characters, the number of homogeneous substrings is exactly n * (n + 1) / 2. This counts:

  • n substrings of length 1
  • n-1 substrings of length 2
  • ...
  • 1 substring of length n

Total = n + (n-1) + ... + 1 = n * (n + 1) / 2

3. Incorrect Pointer Movement

The Pitfall: Incrementing the outer loop pointer incorrectly, causing characters to be skipped or processed multiple times.

Common Mistake:

while current_index < string_length:
    group_start = current_index
    while current_index < string_length and s[current_index] == s[group_start]:
        current_index += 1
    # ... calculate count ...
    current_index += 1  # WRONG: This skips the first character of the next group!

Solution: Don't increment current_index after the inner loop since it's already pointing to the start of the next different character group:

while current_index < string_length:
    group_start = current_index
    while current_index < string_length and s[current_index] == s[group_start]:
        current_index += 1
    # ... calculate count ...
    # No need to increment current_index here - it's already at the right position
Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

Which of the following is a min heap?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More