2083. Substrings That Begin and End With the Same Letter

MediumHash TableMathStringCountingPrefix Sum
Leetcode Link

Problem Description

The problem presents a situation where we are given a string s, which contains only lowercase English letters. Our goal is to determine the total number of substrings within this string that start and end with the same character. It's essential to remember that a substring is simply a sequence of characters that are adjacent in the string and that the string in question uses 0-based indexing.

For example, if given the string s = "ababa", some valid substrings that begin and end with the same character would be ["a", "aba", "ababa", "b", "bab", "a"]. In this challenge, we need to enumerate all possible substrings that meet this criterion and return their count.

Intuition

The solution strategy revolves around the observation that for each occurrence of a character in the string, it can potentially form a substring with each of its previous appearances. This is because any sequence of characters between the two same characters (including no characters at all) will still result in a substring that starts and ends with that character.

For example, if we have the string s = "aa", then there are three such substrings: "a", "a", "aa". For the first 'a', there is 1 substring which is itself. For the second 'a', there are 2 substrings: itself and the substring "aa" that includes both the first and the second 'a'.

To implement this, we utilize a dictionary to keep track of the count of each character as we iterate through the string. Whenever we encounter a character, we increment its count in the dictionary. The current count of the character gives us the number of substrings that end with this character and start with a previous instance of the same character. We add this count to our answer for each character we encounter. The sum of all these counts will give us the total number of desired substrings.

The Counter class from the collections module in Python provides a convenient way to count occurrences of each character. The variable cnt is our counter, and ans is used to accumulate the count of valid substrings. We iterate through each character in the string, update its count in cnt, and add the updated count to ans, yielding the total number of substrings that start and end with the same character by the time we've processed the entire string.

Learn more about Math and Prefix Sum patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

Which data structure is used to implement recursion?

Solution Approach

The implementation uses a hash table to keep track of how many times each character has appeared in the string so far. The hash table is implemented using the Counter class from Python's collections module, which efficiently handles the counting of hashable objects.

Here's a step-by-step breakdown of the implementation:

  1. A Counter object named cnt is created to keep track of the occurrences of each character in the string.
  2. An integer variable ans is initialized to 0. It will serve as an accumulator for the total count of substrings satisfying the given condition.
  3. The code enters a loop that iterates over each character c in the string s.
  4. For each character, the loop increments its count in cnt. This is achieved by the expression cnt[c] += 1. In the Counter, if the character c doesn't exist yet, it would be initialized to 0 before being incremented. So, after this step, cnt[c] contains the number of times c has appeared so far.
  5. The current value of cnt[c] is then added to ans. This is based on the observation that if we have seen a character n times so far, there are n substrings that can end with this instance of the character, each starting with one of the previously seen instances of this character (including a substring just consisting of this one character).
  6. After the loop ends, ans contains the total number of qualified substrings, and the function returns this value.

This approach is effective for the problem at hand because it leverages the fact that the number of substrings ending with a certain character equates to the instances of that character seen so far. This allows the algorithm to count the requisite substrings in a single linear pass over the string, thus having a time complexity of O(n), where n is the length of the string.

Here is the reference solution approach implemented in Python:

1class Solution:
2    def numberOfSubstrings(self, s: str) -> int:
3        cnt = Counter()
4        ans = 0
5        for c in s:
6            cnt[c] += 1
7            ans += cnt[c]
8        return ans

In this code, for c in s iterates over the characters in the string. The Counter object cnt keeps track of the occurrence count, and the accumulator ans aggregates the number of substrings. After visiting all characters, the total count is held in ans, which is returned as the final result.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Which type of traversal does breadth first search do?

Example Walkthrough

Let's use a small example to illustrate the solution approach described above. Suppose we have the string s = "abcab".

  1. We start with an empty Counter object cnt and an accumulator ans, initialized to 0.
  2. We iterate through each character in the string, in order.
  3. Upon encountering the first character a, we update the counter cnt['a'] += 1. Now, cnt = Counter({'a': 1}), and we increment ans by cnt['a'], so ans = 1.
  4. Next, we see the character b. We update cnt['b'] += 1, making cnt = Counter({'a': 1, 'b': 1}), and append cnt['b'] to ans, resulting in ans = 2.
  5. The third character is c. We update cnt['c'] += 1, giving us cnt = Counter({'a': 1, 'b': 1, 'c': 1}), and increment ans by cnt['c'], leading to ans = 3.
  6. The fourth character is a again. This time, cnt['a'] += 1 makes cnt = Counter({'a': 2, 'b': 1, 'c': 1}) because a has appeared twice. We add cnt['a'] to ans, and now ans = 5 (since cnt['a'] is now 2, we add 2).
  7. Finally, we see another b. We increment cnt['b'], so cnt = Counter({'a': 2, 'b': 2, 'c': 1}), and then ans += cnt['b'] makes ans = 7.

At the end of the loop, we have considered every character and updated our accumulator ans based on the occurrences of each character. The final value of ans is 7, which is the number of substrings that start and end with the same character in the string s = "abcab".

These substrings are specifically "a", "b", "c", "abca", "bcab", "aba", and "bab". Notice how substrings that start and end with the same character and contain other characters in the middle are also included, as illustrated by "abca" and "bcab".

Solution Implementation

1from collections import Counter
2
3class Solution:
4    def numberOfSubstrings(self, string: str) -> int:
5        # Initialize an empty Counter to keep track of character frequencies.
6        char_count = Counter()
7      
8        # Initialize the answer variable to store the result.
9        total_substrings = 0
10      
11        # Iterate over each character in the string.
12        for char in string:
13            # Increment the count of the current character.
14            char_count[char] += 1
15          
16            # The number of substrings ending with the current character is equal
17            # to its current count (since any substring ending before could be extended by this character).
18            total_substrings += char_count[char]
19      
20        # Return the total number of substrings.
21        return total_substrings
22
1class Solution {
2    public long numberOfSubstrings(String s) {
3        // Create an array to store the count of each letter.
4        int[] letterCount = new int[26];
5      
6        // Initialize a variable to store the total number of substrings.
7        long totalSubstrings = 0;
8      
9        // Iterate over each character in the string.
10        for (int i = 0; i < s.length(); ++i) {
11            // Calculate the index of the current character 'a' being 0, 'b' being 1, ... 'z' being 25.
12            int index = s.charAt(i) - 'a';
13          
14            // Increment the count of the current letter in the letter count array.
15            ++letterCount[index];
16          
17            // Each time a new character is processed, increment totalSubstrings by the count of that character.
18            // The count represents the number of substrings ending with that character.
19            totalSubstrings += letterCount[index];
20        }
21      
22        // Return the total number of substrings.
23        return totalSubstrings;
24    }
25}
26
1class Solution {
2public:
3    long long numberOfSubstrings(string s) {
4        int count[26] = {}; // Initialize array to store the count of each letter
5        long long answer = 0; // Initialize the answer variable to store the total number of substrings
6
7        // Loop through each character in the string
8        for (char& c : s) {
9            // Increment the count of the current letter and add the new count to the answer
10            // The new count represents the number of substrings ending with the current letter
11            // that have not been counted before
12            answer += ++count[c - 'a'];
13        }
14
15        return answer; // Return the total number of substrings
16    }
17};
18
1// Function to count the number of substrings
2function numberOfSubstrings(s: string): number {
3  // Initialize an array to store the count of each letter ('a' to 'z')
4  let count: number[] = new Array(26).fill(0);
5  // Initialize the answer variable to store the total number of substrings
6  let answer: number = 0;
7
8  // Loop through each character in the string
9  for (const c of s) {
10    // Increment the count of the current letter (using its ASCII value to map to an index)
11    // The new count represents the number of substrings ending with the current letter
12    // that have not been counted before
13    answer += ++count[c.charCodeAt(0) - 'a'.charCodeAt(0)];
14  }
15
16  // Return the total number of substrings
17  return answer;
18}
19
20// Note: Since this function is defined globally, it can be called directly with a string parameter.
21
Not Sure What to Study? Take the 2-min Quiz:

What's the output of running the following function using the following tree as input?

1def serialize(root):
2    res = []
3    def dfs(root):
4        if not root:
5            res.append('x')
6            return
7        res.append(root.val)
8        dfs(root.left)
9        dfs(root.right)
10    dfs(root)
11    return ' '.join(res)
12
1import java.util.StringJoiner;
2
3public static String serialize(Node root) {
4    StringJoiner res = new StringJoiner(" ");
5    serializeDFS(root, res);
6    return res.toString();
7}
8
9private static void serializeDFS(Node root, StringJoiner result) {
10    if (root == null) {
11        result.add("x");
12        return;
13    }
14    result.add(Integer.toString(root.val));
15    serializeDFS(root.left, result);
16    serializeDFS(root.right, result);
17}
18
1function serialize(root) {
2    let res = [];
3    serialize_dfs(root, res);
4    return res.join(" ");
5}
6
7function serialize_dfs(root, res) {
8    if (!root) {
9        res.push("x");
10        return;
11    }
12    res.push(root.val);
13    serialize_dfs(root.left, res);
14    serialize_dfs(root.right, res);
15}
16

Time and Space Complexity

The given code calculates the number of substrings that can be formed in a string with each character being counted once it is part of a substring as it's introduced.

Time Complexity:

Let's analyze the time complexity of the provided function:

  1. The function iterates through each character in the string s which has a length of n.
  2. In each iteration, it performs an update on the Counter() dictionary and an addition operation to the ans variable.

The Counter() update operation and addition operation are both O(1) operations.

Hence, the loop runs n times with constant-time operations within it, resulting in a time complexity of O(n).

Space Complexity:

Now, let's analyze the space complexity:

  1. The Counter() data structure is used to keep a count of each character. In the worst case, this would store as many keys as there are unique characters in the string s.
  2. Assuming the input string s could include all possible characters in the charset used (let's say ASCII for simplicity), there is a constant number of possible characters, which means the Counter would have at most C keys, where C is the size of the charset.

Therefore, since the size of the charset is fixed and does not grow with the input size n, the space required by the Counter is O(1).

As a result, the space complexity of the code is O(1) if we consider the size of the charset to be constant and not dependent on the size of the input string. If considering unicode or a variable character set that indeed scales with n, it would be O(n) for space complexity.

Learn more about how to find time and space complexity quickly using problem constraints.

Fast Track Your Learning with Our Quick Skills Quiz:

Which of the tree traversal order can be used to obtain elements in a binary search tree in sorted order?


Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫