2083. Substrings That Begin and End With the Same Letter 🔒
Problem Description
You need to find the total number of substrings in a given string where the first and last characters are the same.
Given a string s
that contains only lowercase English letters (indexed starting from 0), count all possible substrings that begin and end with the same character.
A substring is defined as any contiguous sequence of characters within the string that is not empty.
For example, if s = "aba"
:
- Single character substrings:
"a"
,"b"
,"a"
(all begin and end with the same character) - Two character substrings:
"ab"
,"ba"
(neither begins and ends with the same character) - Three character substring:
"aba"
(begins and ends with'a'
) - Total count: 4
The solution uses a counting approach: for each character encountered while traversing the string, it keeps track of how many times that character has appeared so far. Each occurrence of a character can form valid substrings with all its previous occurrences (including itself as a single-character substring), so the count for that character is added to the answer.
Intuition
When we need to count substrings that begin and end with the same character, let's think about what happens when we encounter a character while traversing the string.
Consider a character 'a'
appearing at different positions in the string. If 'a'
appears at positions i1
, i2
, i3
, ..., then:
- The
'a'
at positioni1
forms 1 valid substring (itself) - The
'a'
at positioni2
forms 2 valid substrings: itself and the substring fromi1
toi2
- The
'a'
at positioni3
forms 3 valid substrings: itself, the substring fromi1
toi3
, and the substring fromi2
toi3
The pattern becomes clear: when we encounter a character c
that has already appeared k
times before, this new occurrence can form valid substrings with each of the previous k
occurrences, plus itself as a single-character substring. That's k + 1
new valid substrings.
This leads to the key insight: instead of checking all possible substrings (which would be O(n²)), we can simply keep a running count of how many times each character has appeared. When we see a character, we increment its count and add that count to our answer. This works because:
- The first occurrence of a character contributes 1 substring
- The second occurrence contributes 2 substrings
- The third occurrence contributes 3 substrings
- And so on...
This transforms the problem into a simple counting exercise that can be solved in a single pass through the string with O(n) time complexity.
Learn more about Math and Prefix Sum patterns.
Solution Approach
The implementation uses a hash table (or an array of size 26 for lowercase English letters) to track the count of each character as we traverse the string.
Here's how the algorithm works step by step:
-
Initialize data structures: Create a counter
cnt
(using Python'sCounter
class or a hash table) to store the frequency of each character encountered so far. Initialize the answer variableans
to 0. -
Single pass traversal: Iterate through each character
c
in the strings
. -
Update character count: For each character
c
:- Increment its count:
cnt[c] += 1
- This new count represents how many times we've seen this character up to the current position
- Increment its count:
-
Add to answer: Add
cnt[c]
to the answer. This is the crucial step because:- If
cnt[c] = 1
(first occurrence), it contributes 1 substring (itself) - If
cnt[c] = 2
(second occurrence), it can form substrings with the first occurrence and itself, contributing 2 more substrings - If
cnt[c] = k
, it can form substrings with allk-1
previous occurrences plus itself, contributingk
substrings
- If
-
Return result: After processing all characters, return
ans
.
The beauty of this approach is its efficiency:
- Time Complexity: O(n) where n is the length of the string, as we only need one pass
- Space Complexity: O(1) since we only need to store counts for at most 26 lowercase letters (or O(k) where k is the size of the character set)
For example, with s = "abcab"
:
'a'
at index 0:cnt['a'] = 1
, add 1 to answer'b'
at index 1:cnt['b'] = 1
, add 1 to answer'c'
at index 2:cnt['c'] = 1
, add 1 to answer'a'
at index 3:cnt['a'] = 2
, add 2 to answer (substrings "a" and "abca")'b'
at index 4:cnt['b'] = 2
, add 2 to answer (substrings "b" and "bcab")- Final answer: 1 + 1 + 1 + 2 + 2 = 7
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through the algorithm with the string s = "ababa"
.
We'll track our character counts and running answer as we process each character:
Initial state:
cnt = {}
(empty counter)ans = 0
Step 1: Process 'a' at index 0
- Increment count:
cnt['a'] = 1
- Add to answer:
ans = 0 + 1 = 1
- Valid substrings ending here:
"a"
(1 substring)
Step 2: Process 'b' at index 1
- Increment count:
cnt['b'] = 1
- Add to answer:
ans = 1 + 1 = 2
- Valid substrings ending here:
"b"
(1 substring)
Step 3: Process 'a' at index 2
- Increment count:
cnt['a'] = 2
- Add to answer:
ans = 2 + 2 = 4
- Valid substrings ending here:
"a"
and"aba"
(2 substrings)- The 'a' at index 2 can form substrings with the 'a' at index 0 and itself
Step 4: Process 'b' at index 3
- Increment count:
cnt['b'] = 2
- Add to answer:
ans = 4 + 2 = 6
- Valid substrings ending here:
"b"
and"bab"
(2 substrings)- The 'b' at index 3 can form substrings with the 'b' at index 1 and itself
Step 5: Process 'a' at index 4
- Increment count:
cnt['a'] = 3
- Add to answer:
ans = 6 + 3 = 9
- Valid substrings ending here:
"a"
,"aba"
, and"ababa"
(3 substrings)- The 'a' at index 4 can form substrings with both 'a's at indices 0 and 2, plus itself
Final Result: 9
All valid substrings that begin and end with the same character:
"a"
(index 0)"b"
(index 1)"a"
(index 2)"aba"
(indices 0-2)"b"
(index 3)"bab"
(indices 1-3)"a"
(index 4)"aba"
(indices 2-4)"ababa"
(indices 0-4)
The key insight: when we encounter the third 'a', it contributes 3 new valid substrings because it can pair with each of the 2 previous 'a's to form substrings, plus count itself as a single-character substring.
Solution Implementation
1from collections import Counter
2
3class Solution:
4 def numberOfSubstrings(self, s: str) -> int:
5 # Counter to track frequency of each character seen so far
6 char_count = Counter()
7 # Total count of substrings
8 total_substrings = 0
9
10 # Iterate through each character in the string
11 for char in s:
12 # Increment the count for current character
13 char_count[char] += 1
14
15 # Add the count of current character to total
16 # This represents all substrings ending at current position
17 # that start from any previous occurrence of the same character
18 total_substrings += char_count[char]
19
20 return total_substrings
21
1class Solution {
2 /**
3 * Counts the total number of substrings that contain at least one occurrence
4 * of each character that appears in them.
5 *
6 * The algorithm works by counting how many substrings end at each position.
7 * For each character, the number of substrings ending at that character
8 * equals the number of times that character has appeared so far (including current).
9 *
10 * @param s the input string
11 * @return the total count of valid substrings
12 */
13 public long numberOfSubstrings(String s) {
14 // Array to store the frequency count of each letter (a-z)
15 int[] characterCount = new int[26];
16
17 // Variable to accumulate the total number of valid substrings
18 long totalSubstrings = 0;
19
20 // Iterate through each character in the string
21 for (char currentChar : s.toCharArray()) {
22 // Convert character to index (0-25) by subtracting 'a'
23 int charIndex = currentChar - 'a';
24
25 // Increment the count for this character and add to total
26 // The pre-increment returns the new count value
27 // This represents all substrings ending at current position
28 // that start from any previous occurrence of the same character
29 characterCount[charIndex]++;
30 totalSubstrings += characterCount[charIndex];
31 }
32
33 return totalSubstrings;
34 }
35}
36
1class Solution {
2public:
3 long long numberOfSubstrings(string s) {
4 // Array to store frequency count of each character (a-z)
5 int charCount[26] = {0};
6
7 // Total count of substrings where all characters are the same
8 long long totalSubstrings = 0;
9
10 // Iterate through each character in the string
11 for (char& currentChar : s) {
12 // Increment the count for current character and add to result
13 // For each occurrence of a character, it can form n substrings
14 // with previous occurrences of the same character
15 // where n is the new count after incrementing
16 int charIndex = currentChar - 'a';
17 charCount[charIndex]++;
18 totalSubstrings += charCount[charIndex];
19 }
20
21 return totalSubstrings;
22 }
23};
24
1/**
2 * Counts the total number of substrings that end at each position in the string.
3 * For each character, it counts how many times that character has appeared so far,
4 * which represents the number of substrings ending at the current position
5 * that start with any previous occurrence of the same character (including itself).
6 *
7 * @param s - The input string to analyze
8 * @returns The total count of all substrings
9 */
10function numberOfSubstrings(s: string): number {
11 // Object to store the frequency count of each character seen so far
12 const characterCount: Record<string, number> = {};
13
14 // Accumulator for the total number of substrings
15 let totalSubstrings = 0;
16
17 // Iterate through each character in the string
18 for (const currentChar of s) {
19 // Increment the count for the current character (initialize to 0 if first occurrence)
20 characterCount[currentChar] = (characterCount[currentChar] || 0) + 1;
21
22 // Add the current character's count to the total
23 // This represents all substrings ending at this position that begin with this character
24 totalSubstrings += characterCount[currentChar];
25 }
26
27 return totalSubstrings;
28}
29
Time and Space Complexity
The time complexity is O(n)
, where n
is the length of the string s
. The algorithm iterates through each character in the string exactly once, and for each character, it performs constant-time operations (incrementing the counter and adding to the answer).
The space complexity is O(|Σ|)
, where Σ
is the character set. The Counter
object stores at most one entry for each unique character that appears in the string. Since the problem deals with strings that could contain lowercase English letters, the maximum number of unique characters is 26, so |Σ| = 26
. Therefore, the space complexity is O(26) = O(1)
in terms of constant space for this specific constraint.
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Misunderstanding the Counting Logic
Pitfall: Many developers initially think they need to explicitly find all substrings and check if their first and last characters match, leading to an O(n²) or O(n³) solution.
# Incorrect approach - brute force checking all substrings
def numberOfSubstrings_incorrect(s: str) -> int:
count = 0
n = len(s)
for i in range(n):
for j in range(i, n):
substring = s[i:j+1]
if substring[0] == substring[-1]:
count += 1
return count
Solution: Understand that when we encounter a character at position i
, it can form valid substrings with all previous occurrences of the same character. The count at each step represents the number of valid substrings ending at the current position.
2. Off-by-One Error in Counting
Pitfall: Forgetting to count the single-character substring or incorrectly calculating the contribution of each character occurrence.
# Incorrect - forgetting to include the character itself
def numberOfSubstrings_wrong(s: str) -> int:
char_count = Counter()
total = 0
for char in s:
total += char_count[char] # Missing the increment before adding
char_count[char] += 1
return total
Solution: Always increment the character count BEFORE adding to the total, ensuring each character contributes correctly (including single-character substrings).
3. Using Inefficient Data Structure
Pitfall: Using a list to track all positions of each character and then calculating distances, which unnecessarily complicates the solution.
# Overly complex approach
def numberOfSubstrings_complex(s: str) -> int:
positions = defaultdict(list)
total = 0
for i, char in enumerate(s):
positions[char].append(i)
total += len(positions[char])
return total
Solution: Simply maintain a count for each character. You don't need to track positions since we only care about the number of previous occurrences.
4. Memory Optimization Misconception
Pitfall: Trying to optimize memory by using an array of size 26 but making indexing errors.
# Error-prone array indexing
def numberOfSubstrings_array_error(s: str) -> int:
counts = [0] * 26
total = 0
for char in s:
counts[ord(char) - ord('a')] += 1
total += counts[ord(char) - ord('A')] # Wrong offset!
return total
Solution: If using an array, be consistent with the character-to-index conversion. Using Counter()
is cleaner and prevents such errors:
def numberOfSubstrings_correct(s: str) -> int:
char_count = Counter()
total = 0
for char in s:
char_count[char] += 1
total += char_count[char]
return total
What does the following code do?
1def f(arr1, arr2):
2 i, j = 0, 0
3 new_arr = []
4 while i < len(arr1) and j < len(arr2):
5 if arr1[i] < arr2[j]:
6 new_arr.append(arr1[i])
7 i += 1
8 else:
9 new_arr.append(arr2[j])
10 j += 1
11 new_arr.extend(arr1[i:])
12 new_arr.extend(arr2[j:])
13 return new_arr
14
1public static List<Integer> f(int[] arr1, int[] arr2) {
2 int i = 0, j = 0;
3 List<Integer> newArr = new ArrayList<>();
4
5 while (i < arr1.length && j < arr2.length) {
6 if (arr1[i] < arr2[j]) {
7 newArr.add(arr1[i]);
8 i++;
9 } else {
10 newArr.add(arr2[j]);
11 j++;
12 }
13 }
14
15 while (i < arr1.length) {
16 newArr.add(arr1[i]);
17 i++;
18 }
19
20 while (j < arr2.length) {
21 newArr.add(arr2[j]);
22 j++;
23 }
24
25 return newArr;
26}
27
1function f(arr1, arr2) {
2 let i = 0, j = 0;
3 let newArr = [];
4
5 while (i < arr1.length && j < arr2.length) {
6 if (arr1[i] < arr2[j]) {
7 newArr.push(arr1[i]);
8 i++;
9 } else {
10 newArr.push(arr2[j]);
11 j++;
12 }
13 }
14
15 while (i < arr1.length) {
16 newArr.push(arr1[i]);
17 i++;
18 }
19
20 while (j < arr2.length) {
21 newArr.push(arr2[j]);
22 j++;
23 }
24
25 return newArr;
26}
27
Recommended Readings
Math for Technical Interviews How much math do I need to know for technical interviews The short answer is about high school level math Computer science is often associated with math and some universities even place their computer science department under the math faculty However the reality is that you
Prefix Sum The prefix sum is an incredibly powerful and straightforward technique Its primary goal is to allow for constant time range sum queries on an array What is Prefix Sum The prefix sum of an array at index i is the sum of all numbers from index 0 to i By
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Want a Structured Path to Master System Design Too? Don’t Miss This!