Facebook Pixel

2083. Substrings That Begin and End With the Same Letter 🔒

MediumHash TableMathStringCountingPrefix Sum
Leetcode Link

Problem Description

You need to find the total number of substrings in a given string where the first and last characters are the same.

Given a string s that contains only lowercase English letters (indexed starting from 0), count all possible substrings that begin and end with the same character.

A substring is defined as any contiguous sequence of characters within the string that is not empty.

For example, if s = "aba":

  • Single character substrings: "a", "b", "a" (all begin and end with the same character)
  • Two character substrings: "ab", "ba" (neither begins and ends with the same character)
  • Three character substring: "aba" (begins and ends with 'a')
  • Total count: 4

The solution uses a counting approach: for each character encountered while traversing the string, it keeps track of how many times that character has appeared so far. Each occurrence of a character can form valid substrings with all its previous occurrences (including itself as a single-character substring), so the count for that character is added to the answer.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

When we need to count substrings that begin and end with the same character, let's think about what happens when we encounter a character while traversing the string.

Consider a character 'a' appearing at different positions in the string. If 'a' appears at positions i1, i2, i3, ..., then:

  • The 'a' at position i1 forms 1 valid substring (itself)
  • The 'a' at position i2 forms 2 valid substrings: itself and the substring from i1 to i2
  • The 'a' at position i3 forms 3 valid substrings: itself, the substring from i1 to i3, and the substring from i2 to i3

The pattern becomes clear: when we encounter a character c that has already appeared k times before, this new occurrence can form valid substrings with each of the previous k occurrences, plus itself as a single-character substring. That's k + 1 new valid substrings.

This leads to the key insight: instead of checking all possible substrings (which would be O(n²)), we can simply keep a running count of how many times each character has appeared. When we see a character, we increment its count and add that count to our answer. This works because:

  • The first occurrence of a character contributes 1 substring
  • The second occurrence contributes 2 substrings
  • The third occurrence contributes 3 substrings
  • And so on...

This transforms the problem into a simple counting exercise that can be solved in a single pass through the string with O(n) time complexity.

Learn more about Math and Prefix Sum patterns.

Solution Approach

The implementation uses a hash table (or an array of size 26 for lowercase English letters) to track the count of each character as we traverse the string.

Here's how the algorithm works step by step:

  1. Initialize data structures: Create a counter cnt (using Python's Counter class or a hash table) to store the frequency of each character encountered so far. Initialize the answer variable ans to 0.

  2. Single pass traversal: Iterate through each character c in the string s.

  3. Update character count: For each character c:

    • Increment its count: cnt[c] += 1
    • This new count represents how many times we've seen this character up to the current position
  4. Add to answer: Add cnt[c] to the answer. This is the crucial step because:

    • If cnt[c] = 1 (first occurrence), it contributes 1 substring (itself)
    • If cnt[c] = 2 (second occurrence), it can form substrings with the first occurrence and itself, contributing 2 more substrings
    • If cnt[c] = k, it can form substrings with all k-1 previous occurrences plus itself, contributing k substrings
  5. Return result: After processing all characters, return ans.

The beauty of this approach is its efficiency:

  • Time Complexity: O(n) where n is the length of the string, as we only need one pass
  • Space Complexity: O(1) since we only need to store counts for at most 26 lowercase letters (or O(k) where k is the size of the character set)

For example, with s = "abcab":

  • 'a' at index 0: cnt['a'] = 1, add 1 to answer
  • 'b' at index 1: cnt['b'] = 1, add 1 to answer
  • 'c' at index 2: cnt['c'] = 1, add 1 to answer
  • 'a' at index 3: cnt['a'] = 2, add 2 to answer (substrings "a" and "abca")
  • 'b' at index 4: cnt['b'] = 2, add 2 to answer (substrings "b" and "bcab")
  • Final answer: 1 + 1 + 1 + 2 + 2 = 7

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through the algorithm with the string s = "ababa".

We'll track our character counts and running answer as we process each character:

Initial state:

  • cnt = {} (empty counter)
  • ans = 0

Step 1: Process 'a' at index 0

  • Increment count: cnt['a'] = 1
  • Add to answer: ans = 0 + 1 = 1
  • Valid substrings ending here: "a" (1 substring)

Step 2: Process 'b' at index 1

  • Increment count: cnt['b'] = 1
  • Add to answer: ans = 1 + 1 = 2
  • Valid substrings ending here: "b" (1 substring)

Step 3: Process 'a' at index 2

  • Increment count: cnt['a'] = 2
  • Add to answer: ans = 2 + 2 = 4
  • Valid substrings ending here: "a" and "aba" (2 substrings)
    • The 'a' at index 2 can form substrings with the 'a' at index 0 and itself

Step 4: Process 'b' at index 3

  • Increment count: cnt['b'] = 2
  • Add to answer: ans = 4 + 2 = 6
  • Valid substrings ending here: "b" and "bab" (2 substrings)
    • The 'b' at index 3 can form substrings with the 'b' at index 1 and itself

Step 5: Process 'a' at index 4

  • Increment count: cnt['a'] = 3
  • Add to answer: ans = 6 + 3 = 9
  • Valid substrings ending here: "a", "aba", and "ababa" (3 substrings)
    • The 'a' at index 4 can form substrings with both 'a's at indices 0 and 2, plus itself

Final Result: 9

All valid substrings that begin and end with the same character:

  1. "a" (index 0)
  2. "b" (index 1)
  3. "a" (index 2)
  4. "aba" (indices 0-2)
  5. "b" (index 3)
  6. "bab" (indices 1-3)
  7. "a" (index 4)
  8. "aba" (indices 2-4)
  9. "ababa" (indices 0-4)

The key insight: when we encounter the third 'a', it contributes 3 new valid substrings because it can pair with each of the 2 previous 'a's to form substrings, plus count itself as a single-character substring.

Solution Implementation

1from collections import Counter
2
3class Solution:
4    def numberOfSubstrings(self, s: str) -> int:
5        # Counter to track frequency of each character seen so far
6        char_count = Counter()
7        # Total count of substrings
8        total_substrings = 0
9      
10        # Iterate through each character in the string
11        for char in s:
12            # Increment the count for current character
13            char_count[char] += 1
14          
15            # Add the count of current character to total
16            # This represents all substrings ending at current position
17            # that start from any previous occurrence of the same character
18            total_substrings += char_count[char]
19      
20        return total_substrings
21
1class Solution {
2    /**
3     * Counts the total number of substrings that contain at least one occurrence
4     * of each character that appears in them.
5     * 
6     * The algorithm works by counting how many substrings end at each position.
7     * For each character, the number of substrings ending at that character
8     * equals the number of times that character has appeared so far (including current).
9     * 
10     * @param s the input string
11     * @return the total count of valid substrings
12     */
13    public long numberOfSubstrings(String s) {
14        // Array to store the frequency count of each letter (a-z)
15        int[] characterCount = new int[26];
16      
17        // Variable to accumulate the total number of valid substrings
18        long totalSubstrings = 0;
19      
20        // Iterate through each character in the string
21        for (char currentChar : s.toCharArray()) {
22            // Convert character to index (0-25) by subtracting 'a'
23            int charIndex = currentChar - 'a';
24          
25            // Increment the count for this character and add to total
26            // The pre-increment returns the new count value
27            // This represents all substrings ending at current position
28            // that start from any previous occurrence of the same character
29            characterCount[charIndex]++;
30            totalSubstrings += characterCount[charIndex];
31        }
32      
33        return totalSubstrings;
34    }
35}
36
1class Solution {
2public:
3    long long numberOfSubstrings(string s) {
4        // Array to store frequency count of each character (a-z)
5        int charCount[26] = {0};
6      
7        // Total count of substrings where all characters are the same
8        long long totalSubstrings = 0;
9      
10        // Iterate through each character in the string
11        for (char& currentChar : s) {
12            // Increment the count for current character and add to result
13            // For each occurrence of a character, it can form n substrings
14            // with previous occurrences of the same character
15            // where n is the new count after incrementing
16            int charIndex = currentChar - 'a';
17            charCount[charIndex]++;
18            totalSubstrings += charCount[charIndex];
19        }
20      
21        return totalSubstrings;
22    }
23};
24
1/**
2 * Counts the total number of substrings that end at each position in the string.
3 * For each character, it counts how many times that character has appeared so far,
4 * which represents the number of substrings ending at the current position
5 * that start with any previous occurrence of the same character (including itself).
6 * 
7 * @param s - The input string to analyze
8 * @returns The total count of all substrings
9 */
10function numberOfSubstrings(s: string): number {
11    // Object to store the frequency count of each character seen so far
12    const characterCount: Record<string, number> = {};
13  
14    // Accumulator for the total number of substrings
15    let totalSubstrings = 0;
16  
17    // Iterate through each character in the string
18    for (const currentChar of s) {
19        // Increment the count for the current character (initialize to 0 if first occurrence)
20        characterCount[currentChar] = (characterCount[currentChar] || 0) + 1;
21      
22        // Add the current character's count to the total
23        // This represents all substrings ending at this position that begin with this character
24        totalSubstrings += characterCount[currentChar];
25    }
26  
27    return totalSubstrings;
28}
29

Time and Space Complexity

The time complexity is O(n), where n is the length of the string s. The algorithm iterates through each character in the string exactly once, and for each character, it performs constant-time operations (incrementing the counter and adding to the answer).

The space complexity is O(|Σ|), where Σ is the character set. The Counter object stores at most one entry for each unique character that appears in the string. Since the problem deals with strings that could contain lowercase English letters, the maximum number of unique characters is 26, so |Σ| = 26. Therefore, the space complexity is O(26) = O(1) in terms of constant space for this specific constraint.

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Misunderstanding the Counting Logic

Pitfall: Many developers initially think they need to explicitly find all substrings and check if their first and last characters match, leading to an O(n²) or O(n³) solution.

# Incorrect approach - brute force checking all substrings
def numberOfSubstrings_incorrect(s: str) -> int:
    count = 0
    n = len(s)
    for i in range(n):
        for j in range(i, n):
            substring = s[i:j+1]
            if substring[0] == substring[-1]:
                count += 1
    return count

Solution: Understand that when we encounter a character at position i, it can form valid substrings with all previous occurrences of the same character. The count at each step represents the number of valid substrings ending at the current position.

2. Off-by-One Error in Counting

Pitfall: Forgetting to count the single-character substring or incorrectly calculating the contribution of each character occurrence.

# Incorrect - forgetting to include the character itself
def numberOfSubstrings_wrong(s: str) -> int:
    char_count = Counter()
    total = 0
    for char in s:
        total += char_count[char]  # Missing the increment before adding
        char_count[char] += 1
    return total

Solution: Always increment the character count BEFORE adding to the total, ensuring each character contributes correctly (including single-character substrings).

3. Using Inefficient Data Structure

Pitfall: Using a list to track all positions of each character and then calculating distances, which unnecessarily complicates the solution.

# Overly complex approach
def numberOfSubstrings_complex(s: str) -> int:
    positions = defaultdict(list)
    total = 0
    for i, char in enumerate(s):
        positions[char].append(i)
        total += len(positions[char])
    return total

Solution: Simply maintain a count for each character. You don't need to track positions since we only care about the number of previous occurrences.

4. Memory Optimization Misconception

Pitfall: Trying to optimize memory by using an array of size 26 but making indexing errors.

# Error-prone array indexing
def numberOfSubstrings_array_error(s: str) -> int:
    counts = [0] * 26
    total = 0
    for char in s:
        counts[ord(char) - ord('a')] += 1
        total += counts[ord(char) - ord('A')]  # Wrong offset!
    return total

Solution: If using an array, be consistent with the character-to-index conversion. Using Counter() is cleaner and prevents such errors:

def numberOfSubstrings_correct(s: str) -> int:
    char_count = Counter()
    total = 0
    for char in s:
        char_count[char] += 1
        total += char_count[char]
    return total
Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

What does the following code do?

1def f(arr1, arr2):
2  i, j = 0, 0
3  new_arr = []
4  while i < len(arr1) and j < len(arr2):
5      if arr1[i] < arr2[j]:
6          new_arr.append(arr1[i])
7          i += 1
8      else:
9          new_arr.append(arr2[j])
10          j += 1
11  new_arr.extend(arr1[i:])
12  new_arr.extend(arr2[j:])
13  return new_arr
14
1public static List<Integer> f(int[] arr1, int[] arr2) {
2  int i = 0, j = 0;
3  List<Integer> newArr = new ArrayList<>();
4
5  while (i < arr1.length && j < arr2.length) {
6      if (arr1[i] < arr2[j]) {
7          newArr.add(arr1[i]);
8          i++;
9      } else {
10          newArr.add(arr2[j]);
11          j++;
12      }
13  }
14
15  while (i < arr1.length) {
16      newArr.add(arr1[i]);
17      i++;
18  }
19
20  while (j < arr2.length) {
21      newArr.add(arr2[j]);
22      j++;
23  }
24
25  return newArr;
26}
27
1function f(arr1, arr2) {
2  let i = 0, j = 0;
3  let newArr = [];
4  
5  while (i < arr1.length && j < arr2.length) {
6      if (arr1[i] < arr2[j]) {
7          newArr.push(arr1[i]);
8          i++;
9      } else {
10          newArr.push(arr2[j]);
11          j++;
12      }
13  }
14  
15  while (i < arr1.length) {
16      newArr.push(arr1[i]);
17      i++;
18  }
19  
20  while (j < arr2.length) {
21      newArr.push(arr2[j]);
22      j++;
23  }
24  
25  return newArr;
26}
27

Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More