1759. Count Number of Homogenous Substrings
Problem Description
You are given a string s
consisting of characters. Your task is to count the total number of homogenous substrings in the string.
A homogenous substring is a substring where all characters are identical. For example, in the string "aaa", the homogenous substrings are "a", "a", "a", "aa", "aa", and "aaa".
A substring is any contiguous sequence of characters taken from the original string.
Since the count can be very large, you need to return the result modulo 10^9 + 7
.
For example:
- If
s = "abbcccaa"
, the homogenous substrings are:- From "a": 1 substring ("a")
- From "bb": 3 substrings ("b", "b", "bb")
- From "ccc": 6 substrings ("c", "c", "c", "cc", "cc", "ccc")
- From "aa": 3 substrings ("a", "a", "aa")
- Total: 1 + 3 + 6 + 3 = 13
The key insight is that for a segment of n
identical consecutive characters, the number of homogenous substrings that can be formed is n * (n + 1) / 2
(which is the sum of numbers from 1 to n).
Intuition
When we look at a string like "aaabbc", we can observe that homogenous substrings can only come from continuous segments of identical characters. We can't form a homogenous substring by mixing characters from different segments.
Let's think about what happens when we have a segment of identical characters, say "aaa". How many homogenous substrings can we form?
- Substrings of length 1: "a", "a", "a" → 3 substrings
- Substrings of length 2: "aa", "aa" → 2 substrings
- Substrings of length 3: "aaa" → 1 substring
- Total: 3 + 2 + 1 = 6 substrings
This follows a pattern! For a segment of n
identical characters, we can form:
n
substrings of length 1n-1
substrings of length 2n-2
substrings of length 3- ... and so on
1
substring of length n
This gives us n + (n-1) + (n-2) + ... + 1
, which is the sum of first n
natural numbers = n * (n + 1) / 2
.
So our strategy becomes clear:
- Find each continuous segment of identical characters
- For each segment of length
cnt
, addcnt * (cnt + 1) / 2
to our answer - Move to the next segment and repeat
This way, we only need to traverse the string once, identifying the boundaries where characters change, making this an efficient linear time solution.
Learn more about Math patterns.
Solution Approach
The implementation uses a two-pointer technique to identify and process continuous segments of identical characters:
-
Initialize variables:
mod = 10^9 + 7
for the modulo operationi = 0
as the starting pointerans = 0
to accumulate the result
-
Main loop to process segments:
while i < n: j = i while j < n and s[j] == s[i]: j += 1
- Start with pointer
i
at the beginning of a segment - Use pointer
j
to find the end of the current segment of identical characters - The inner while loop continues as long as
s[j]
equalss[i]
, effectively finding all consecutive identical characters
- Start with pointer
-
Calculate contribution of current segment:
cnt = j - i ans += (1 + cnt) * cnt // 2 ans %= mod
cnt = j - i
gives the length of the current segment- Apply the formula
(1 + cnt) * cnt // 2
which equalscnt * (cnt + 1) / 2
- This formula calculates the sum
1 + 2 + 3 + ... + cnt
- Apply modulo to keep the answer within bounds
-
Move to next segment:
i = j
- Set
i
toj
to start processing the next segment - Since
j
is already at the first character of the next different segment, we don't miss any characters
- Set
The algorithm efficiently processes the entire string in a single pass with O(n)
time complexity and O(1)
space complexity, as we only use a few variables to track positions and counts.
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through the solution with the string s = "abbcccaa"
.
Initial Setup:
mod = 10^9 + 7
i = 0
(starting position)ans = 0
(total count)n = 8
(length of string)
First Segment - "a" at position 0:
- Start:
i = 0
,s[i] = 'a'
- Set
j = 0
and expand whiles[j] == 'a'
j = 0
:s[0] = 'a'
✓, increment toj = 1
j = 1
:s[1] = 'b'
✗, stop
- Segment length:
cnt = j - i = 1 - 0 = 1
- Contribution:
(1 + 1) * 1 // 2 = 1
- Update:
ans = 0 + 1 = 1
- Move:
i = 1
Second Segment - "bb" at positions 1-2:
- Start:
i = 1
,s[i] = 'b'
- Set
j = 1
and expand whiles[j] == 'b'
j = 1
:s[1] = 'b'
✓, increment toj = 2
j = 2
:s[2] = 'b'
✓, increment toj = 3
j = 3
:s[3] = 'c'
✗, stop
- Segment length:
cnt = j - i = 3 - 1 = 2
- Contribution:
(1 + 2) * 2 // 2 = 3
- Update:
ans = 1 + 3 = 4
- Move:
i = 3
Third Segment - "ccc" at positions 3-5:
- Start:
i = 3
,s[i] = 'c'
- Set
j = 3
and expand whiles[j] == 'c'
j = 3
:s[3] = 'c'
✓, increment toj = 4
j = 4
:s[4] = 'c'
✓, increment toj = 5
j = 5
:s[5] = 'c'
✓, increment toj = 6
j = 6
:s[6] = 'a'
✗, stop
- Segment length:
cnt = j - i = 6 - 3 = 3
- Contribution:
(1 + 3) * 3 // 2 = 6
- Update:
ans = 4 + 6 = 10
- Move:
i = 6
Fourth Segment - "aa" at positions 6-7:
- Start:
i = 6
,s[i] = 'a'
- Set
j = 6
and expand whiles[j] == 'a'
j = 6
:s[6] = 'a'
✓, increment toj = 7
j = 7
:s[7] = 'a'
✓, increment toj = 8
j = 8
: out of bounds, stop
- Segment length:
cnt = j - i = 8 - 6 = 2
- Contribution:
(1 + 2) * 2 // 2 = 3
- Update:
ans = 10 + 3 = 13
- Move:
i = 8
Termination:
i = 8 >= n
, exit loop- Final answer:
13
The homogenous substrings counted are:
- From "a": 1 substring
- From "bb": 3 substrings ("b", "b", "bb")
- From "ccc": 6 substrings ("c", "c", "c", "cc", "cc", "ccc")
- From "aa": 3 substrings ("a", "a", "aa")
Total: 1 + 3 + 6 + 3 = 13
Solution Implementation
1class Solution:
2 def countHomogenous(self, s: str) -> int:
3 """
4 Count the number of homogeneous substrings in the given string.
5 A homogeneous substring consists of a single repeated character.
6
7 Args:
8 s: Input string to analyze
9
10 Returns:
11 Count of homogeneous substrings modulo 10^9 + 7
12 """
13 # Define the modulo value for preventing integer overflow
14 MOD = 10**9 + 7
15
16 # Initialize pointers and result counter
17 current_index = 0
18 string_length = len(s)
19 total_count = 0
20
21 # Process the string by finding consecutive identical characters
22 while current_index < string_length:
23 # Start of a new group of identical characters
24 group_start = current_index
25
26 # Find the end of the current group of identical characters
27 while current_index < string_length and s[current_index] == s[group_start]:
28 current_index += 1
29
30 # Calculate the length of the current homogeneous group
31 group_length = current_index - group_start
32
33 # Add the count of all possible substrings in this group
34 # For a group of length n, there are n*(n+1)/2 substrings
35 total_count += (1 + group_length) * group_length // 2
36
37 # Apply modulo to prevent overflow
38 total_count %= MOD
39
40 return total_count
41
1class Solution {
2 // Define modulo constant for preventing integer overflow
3 private static final int MOD = (int) 1e9 + 7;
4
5 /**
6 * Counts the number of homogenous substrings in the given string.
7 * A homogenous substring consists of only a single repeating character.
8 *
9 * @param s the input string
10 * @return the total count of homogenous substrings modulo 10^9 + 7
11 */
12 public int countHomogenous(String s) {
13 int length = s.length();
14 long totalCount = 0;
15
16 // Use two pointers to identify consecutive groups of same characters
17 int start = 0;
18 while (start < length) {
19 int end = start;
20
21 // Expand end pointer while characters are the same
22 while (end < length && s.charAt(end) == s.charAt(start)) {
23 end++;
24 }
25
26 // Calculate the length of the current homogenous group
27 int groupLength = end - start;
28
29 // For a group of n identical characters, the number of homogenous substrings
30 // is n + (n-1) + (n-2) + ... + 1 = n * (n + 1) / 2
31 totalCount += (long) (1 + groupLength) * groupLength / 2;
32
33 // Apply modulo to prevent overflow
34 totalCount %= MOD;
35
36 // Move start pointer to the beginning of the next group
37 start = end;
38 }
39
40 return (int) totalCount;
41 }
42}
43
1class Solution {
2public:
3 const int MOD = 1e9 + 7;
4
5 int countHomogenous(string s) {
6 int n = s.size();
7 long totalCount = 0;
8
9 // Process each homogeneous substring
10 for (int start = 0, end = 0; start < n; start = end) {
11 // Find the end of current homogeneous substring
12 end = start;
13 while (end < n && s[end] == s[start]) {
14 ++end;
15 }
16
17 // Calculate length of current homogeneous substring
18 int length = end - start;
19
20 // Add count of all substrings within this homogeneous substring
21 // For a substring of length n, number of substrings = n*(n+1)/2
22 totalCount += 1LL * (1 + length) * length / 2;
23 totalCount %= MOD;
24 }
25
26 return totalCount;
27 }
28};
29
1/**
2 * Counts the number of homogenous substrings in a given string.
3 * A homogenous substring consists of only one unique character.
4 *
5 * @param s - The input string to analyze
6 * @returns The total count of homogenous substrings modulo 10^9 + 7
7 */
8function countHomogenous(s: string): number {
9 // Define modulo constant to prevent integer overflow
10 const MOD: number = 1e9 + 7;
11
12 // Get the length of the input string
13 const length: number = s.length;
14
15 // Initialize the answer counter
16 let totalCount: number = 0;
17
18 // Use two pointers to track homogenous segments
19 // startIndex: marks the beginning of current homogenous segment
20 // currentIndex: iterates through the string
21 let startIndex: number = 0;
22
23 for (let currentIndex: number = 0; currentIndex < length; currentIndex++) {
24 // When characters don't match, reset the start of homogenous segment
25 if (s[startIndex] !== s[currentIndex]) {
26 startIndex = currentIndex;
27 }
28
29 // Add the count of homogenous substrings ending at currentIndex
30 // The number of such substrings equals (currentIndex - startIndex + 1)
31 // This represents all substrings from startIndex to currentIndex
32 totalCount = (totalCount + currentIndex - startIndex + 1) % MOD;
33 }
34
35 return totalCount;
36}
37
Time and Space Complexity
Time Complexity: O(n)
, where n
is the length of the string s
.
The algorithm uses a two-pointer approach with variables i
and j
. While there are nested while loops, each character in the string is visited exactly once:
- The outer while loop iterates through the string with pointer
i
- The inner while loop advances pointer
j
to find the end of each homogeneous substring - After processing a homogeneous substring,
i
jumps directly toj
, skipping all processed characters - Each character is examined once when
j
passes over it, resulting in a total ofn
character comparisons
The arithmetic operations inside the loop (calculating cnt
, the sum formula (1 + cnt) * cnt // 2
, and the modulo operation) all take O(1)
time.
Space Complexity: O(1)
The algorithm uses only a constant amount of extra space:
- Integer variables:
mod
,i
,j
,n
,ans
,cnt
- No additional data structures are created
- The space used does not depend on the input size
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Integer Overflow When Calculating Substring Count
The Pitfall:
When calculating (1 + group_length) * group_length // 2
, the intermediate multiplication (1 + group_length) * group_length
can cause integer overflow before the division, especially for very long segments of identical characters. Even though Python handles arbitrary precision integers, in languages like Java or C++, this would cause incorrect results. Additionally, applying modulo only at the end could theoretically lead to very large intermediate values.
Example of the Issue:
# Problematic approach - accumulates large values before modulo total_count += (1 + group_length) * group_length // 2 total_count %= MOD # Modulo applied after potentially large addition
Solution: Apply modulo operation more frequently to keep numbers manageable:
# Better approach - apply modulo after each calculation substring_count = (1 + group_length) * group_length // 2 total_count = (total_count + substring_count) % MOD
2. Off-by-One Error in Formula Application
The Pitfall:
Confusing the formula for counting substrings. Some might incorrectly use n * (n - 1) / 2
or (n + 1) * n / 2
with wrong interpretation of what n
represents.
Common Mistakes:
# Wrong: Using n*(n-1)/2 (formula for combinations, not substring count) total_count += group_length * (group_length - 1) // 2 # Wrong: Misunderstanding the formula total_count += group_length * (group_length + 1) // 2 + 1
Solution:
Remember that for n
consecutive identical characters, the number of homogeneous substrings is exactly n * (n + 1) / 2
. This counts:
n
substrings of length 1n-1
substrings of length 2- ...
1
substring of length n
Total = n + (n-1) + ... + 1 = n * (n + 1) / 2
3. Incorrect Pointer Movement
The Pitfall: Incrementing the outer loop pointer incorrectly, causing characters to be skipped or processed multiple times.
Common Mistake:
while current_index < string_length: group_start = current_index while current_index < string_length and s[current_index] == s[group_start]: current_index += 1 # ... calculate count ... current_index += 1 # WRONG: This skips the first character of the next group!
Solution:
Don't increment current_index
after the inner loop since it's already pointing to the start of the next different character group:
while current_index < string_length: group_start = current_index while current_index < string_length and s[current_index] == s[group_start]: current_index += 1 # ... calculate count ... # No need to increment current_index here - it's already at the right position
Which of the following is a min heap?
Recommended Readings
Math for Technical Interviews How much math do I need to know for technical interviews The short answer is about high school level math Computer science is often associated with math and some universities even place their computer science department under the math faculty However the reality is that you
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Want a Structured Path to Master System Design Too? Don’t Miss This!