2955. Number of Same-End Substrings

MediumArrayHash TableStringCountingPrefix Sum
Leetcode Link

Problem Description

The task is to handle a series of queries on a given string s. For each query, you are given two indices l and r, which define a substring of s that starts at index l and ends at index r (both inclusive). The objective is to count how many substrings within this defined substring have the same character at the beginning and the end. These are called "same-end" substrings. It's important to include all substrings, even if they are just a single character, as a character is considered a "same-end" substring of itself. The final result should be an array where each element corresponds to the count for each query.

Intuition

To efficiently address multiple queries without re-calculating substring counts for all characters each time, the solution uses a Prefix Sum approach. A Prefix Sum is an array that helps to quickly calculate the sum of elements in a given range of an array. Here, it's adapted to count characters by building a prefix sum array for each unique character in string s. This array records how many times each character has appeared up to every position in the string. Thus, with this pre-computed information, for any given range [l, r], you can quickly find out how often a character appears within that range.

To get to the number of "same-end" substrings for a given range, we sum up the counts for each character. For any character that appears more than once in the range, we calculate the number of ways to pick any two instances of that character, as they will form a "same-end" substring. This count is given by the formula for combinations: for x instances of a character, there are x * (x - 1) / 2 ways to pick two. This is combined with the count of single-character substrings (which is always a "same-end" substring), which is just the length of the range. The sum of these calculations for each character gives the total count of "same-end" substrings for the range.

Learn more about Prefix Sum patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

What is the worst case running time for finding an element in a binary search tree(not necessarily balanced) of size n?

Solution Approach

The implementation of the solution is based on the following key steps:

  1. Character Set Creation: First, we create a set cs of all unique characters found in the string s. This set is used to identify which characters we need to build prefix sums for.

  2. Prefix Sum Calculation: Next, we create a dictionary cnt where each key is a character from our set cs, and the value is an array with n + 1 elements (where n is the length of the string s). The extra element at the beginning of the array is used to handle cases where the substring starts at index 0. This array is used to store prefix sums, such that cnt[c][i] represents the count of occurrences of character c up to the i-th (0-indexed) position in the string. We populate this array with the actual prefix sums by iterating through the string and updating the counts for each character at each position.

  3. Evaluating Queries: With the prefix sum arrays ready, we can now efficiently evaluate each query. For every query [l, r], we initialize the count t with the length of the query range (i.e., r - l + 1), as every single character is considered a "same-end" substring.

  4. Counting Same-End Substrings: We then enumerate each character c from our set cs and use the prefix sums to calculate the number of times c appears in the interval [l, r]. This number is calculated using the formula x = cnt[c][r + 1] - cnt[c][l]. As mentioned previously, we calculate the total possible "same-end" substrings that can be formed using this character by the formula x * (x - 1) / 2. We add this to our ongoing count t.

  5. Query Results Collection: After all characters have been evaluated for a given query, the final count t represents the total number of "same-end" substrings for this particular substring of s. We append this count to our answer list ans.

The full enumeration for each character and swift calculations using prefix sums allow us to efficiently process multiple queries without recalculating the entire substring each time, which significantly optimizes the performance for a large number of queries.

In terms of the algorithmic pattern, this implementation uses a combination of prefix sums for precomputation and enumeration over a set to perform the actual calculation. This pattern reduces the time complexity from potentially O(n^2) per query to O(n + k) preprocessing and O(u) per query, where n is the length of the string, k is the total number of queries, and u is the number of unique characters.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

What is an advantages of top-down dynamic programming vs bottom-up dynamic programming?

Example Walkthrough

Let's say we have a string s = "abaac" and we need to handle the following queries: [[0, 3], [1, 4], [2, 2]].

  1. Character Set Creation: From the string s, we identify the set of unique characters: cs = {'a', 'b', 'c'}.

  2. Prefix Sum Calculation: We create prefix sum arrays for each character in cs:

    For a: cnt['a'] = [0, 1, 2, 2, 2, 3] (There's one a by index 0, two by index 1, and so on.)

    For b: cnt['b'] = [0, 0, 1, 1, 1, 1] (The first b appears by index 1.)

    For c: cnt['c'] = [0, 0, 0, 0, 1, 1] (The first c appears by index 4.)

  3. Evaluating Queries:

    Query [0, 3]: We initialize count t with 3 - 0 + 1 = 4, since there are four single characters within this range.

  4. Counting Same-End Substrings: For each unique character:

    For a: x = cnt['a'][3 + 1] - cnt['a'][0] = 2 - 1 = 1. There's only one 'a' so no same-end substrings can be formed apart from the single character itself.

    For b: x = cnt['b'][3 + 1] - cnt['b'][0] = 1 - 0 = 1. Only one 'b' is present.

    For c: x = cnt['c'][3 + 1] - cnt['c'][0] = 0 - 0 = 0. No 'c' is present in this range.

    Adding them up gives t = 4 + 0 + 0 + 0 = 4. There are 4 "same-end" substrings for this query.

    Query [1, 4]: Initialize t with 4 - 1 + 1 = 4.

    For a: x = cnt['a'][4 + 1] - cnt['a'][1] = 3 - 2 = 1.

    For b: x = cnt['b'][4 + 1] - cnt['b'][1] = 1 - 1 = 0.

    For c: x = cnt['c'][4 + 1] - cnt['c'][1] = 1 - 0 = 1. There's one 'c' so no same-end substrings can be formed apart from the single character itself.

    Total t = 4 + 0 + 0 = 4. There are 4 "same-end" substrings for this query.

    Query [2, 2]: Initialize t with 2 - 2 + 1 = 1.

    Since this is a single-character query, it only contains the character itself as a "same-end" substring.

    Total t = 1. There is 1 "same-end" substring for this query.

  5. Query Results Collection: We collect the results for each query in our answer list: ans = [4, 4, 1].

The final result of the queries is [4, 4, 1], which efficiently gives us the count of same-end substrings in each query range without calculating from scratch each time thanks to the prefix sum arrays.

Solution Implementation

1from typing import List
2
3class Solution:
4    def sameEndSubstringCount(self, s: str, queries: List[List[int]]) -> List[int]:
5        # Get the length of the string.
6        n = len(s)
7      
8        # Create a set with all unique characters in the string s.
9        unique_chars = set(s)
10      
11        # Initialize a dictionary to store prefix counts for each character.
12        prefix_count = {char: [0] * (n + 1) for char in unique_chars}
13      
14        # Populate the prefix count for characters in the string.
15        for i, char in enumerate(s, 1):
16            for unique_char in unique_chars:
17                prefix_count[unique_char][i] = prefix_count[unique_char][i - 1]
18            prefix_count[char][i] += 1
19      
20        # Initialize a list to hold the answer for each query.
21        answers = []
22      
23        # Process each query in the queries list.
24        for left, right in queries:
25            # Calculate the length of the substring.
26            length = right - left + 1
27          
28            # Initialize the total count of end-matching substrings with the length of substring.
29            total_count = length
30          
31            # Count the occurrences of each character within the current query's substring.
32            for char in unique_chars:
33                # Calculate the frequency of the current character in the substring.
34                char_count = prefix_count[char][right + 1] - prefix_count[char][left]
35              
36                # Include the count of substrings with the same ending character.
37                total_count += char_count * (char_count - 1) // 2
38          
39            # Add the total count of end-matching substrings to the answers list.
40            answers.append(total_count)
41      
42        # Return the list of answers for each query.
43        return answers
44
1class Solution {
2    public int[] sameEndSubstringCount(String s, int[][] queries) {
3        int strLength = s.length();
4        // Initialize the count array to keep track of character frequencies up to each index
5        int[][] charCount = new int[26][strLength + 1];
6
7        // Precompute the number of each character up to each index j
8        for (int j = 1; j <= strLength; ++j) {
9            // Copy counts from the previous index
10            for (int i = 0; i < 26; ++i) {
11                charCount[i][j] = charCount[i][j - 1];
12            }
13            // Update the count of the current character
14            charCount[s.charAt(j - 1) - 'a'][j]++;
15        }
16
17        int numQueries = queries.length;
18        int[] answer = new int[numQueries];
19
20        // Process each query in queries array
21        for (int k = 0; k < numQueries; ++k) {
22            int left = queries[k][0], right = queries[k][1];
23            // Initial count is the length of the substring
24            answer[k] = right - left + 1;
25          
26            // Count the number of substrings that start and end with the same character
27            for (int i = 0; i < 26; ++i) {
28                int countInSubstring = charCount[i][right + 1] - charCount[i][left];
29                // Add the combination of two same characters to the answer
30                answer[k] += countInSubstring * (countInSubstring - 1) / 2;
31            }
32        }
33
34        return answer;
35    }
36}
37
1#include <vector>
2#include <string>
3#include <cstring>
4using namespace std;
5
6class Solution {
7public:
8    vector<int> sameEndSubstringCount(string s, vector<vector<int>>& queries) {
9        int strLength = s.size();
10        int charCount[26][strLength + 1];
11        memset(charCount, 0, sizeof(charCount)); // Initialize all elements to 0
12
13        // Populate the character count array, storing the cumulative
14        // count of each character up to current index
15        for (int index = 1; index <= strLength; ++index) {
16            for (int i = 0; i < 26; ++i) {
17                charCount[i][index] = charCount[i][index - 1];
18            }
19            charCount[s[index - 1] - 'a'][index]++;
20        }
21
22        vector<int> results; // Initializing the result vector
23
24        // Iterate through each query to calculate results
25        for (auto& query : queries) {
26            int left = query[0], right = query[1];
27            // Directly adding the length of the substring as part of the result
28            results.push_back(right - left + 1);
29          
30            // Iterate through each character in the alphabet
31            for (int i = 0; i < 26; ++i) {
32                // Compute the count of the current character within the given range
33                int charInRange = charCount[i][right + 1] - charCount[i][left];
34                // Use the count to calculate combinations and add to current result
35                results.back() += charInRange * (charInRange - 1) / 2;
36            }
37        }
38      
39        return results; // Return the final computed result vector
40    }
41};
42
1function sameEndSubstringCount(s: string, queries: number[][]): number[] {
2  const lengthOfString: number = s.length;
3  // Initialize a 2D array to store counts of characters up to each index
4  const characterCount: number[][] = Array.from({ length: 26 }, () => Array(lengthOfString + 1).fill(0));
5
6  // Fill the character count array with the cumulative counts of each character
7  for (let endIndex = 1; endIndex <= lengthOfString; endIndex++) {
8    for (let i = 0; i < 26; i++) {
9      characterCount[i][endIndex] = characterCount[i][endIndex - 1];
10    }
11    characterCount[s.charCodeAt(endIndex - 1) - 'a'.charCodeAt(0)][endIndex]++;
12  }
13
14  // Initialize an array to store the answers to the queries
15  const answer: number[] = [];
16  // Process each query to count substrings ending with the same letter
17  for (const [leftIndex, rightIndex] of queries) {
18  
19    // Start with the length of the substring as the base count
20    answer.push(rightIndex - leftIndex + 1);
21    // Count the pairs of identical characters within the query range
22    for (let i = 0; i < 26; i++) {
23      const countInRange: number = characterCount[i][rightIndex + 1] - characterCount[i][leftIndex];
24      answer[answer.length - 1] += (countInRange * (countInRange - 1)) / 2;
25    }
26  }
27  return answer;
28}
29
Not Sure What to Study? Take the 2-min Quiz:

Which two pointer techniques do you use to check if a string is a palindrome?

Time and Space Complexity

Time Complexity

The time complexity of the given code is O((n + m) * |Σ|), where n is the length of the string s and m is the number of queries, while |Σ| represents the size of the set of letters appearing in the string s.

This complexity arises because for each of the n characters in the string s, we update a counter for each character in the character set Σ. This process has a complexity of O(n * |Σ|). Additionally, for each query, we calculate a number of substrings based on character counts, which involves iterating over the set of characters again, leading to a complexity of O(m * |Σ|). Combining both parts results in O((n + m) * |Σ|) total time complexity.

Space Complexity

The space complexity of the code is O(n * |Σ|) because we store a count for each character of Σ at each index of the string, leading to a two-dimensional data structure with dimensions n (length of the string) by |Σ| (number of unique characters in the character set).

Learn more about how to find time and space complexity quickly using problem constraints.

Fast Track Your Learning with Our Quick Skills Quiz:

Problem: Given a list of tasks and a list of requirements, compute a sequence of tasks that can be performed, such that we complete every task once while satisfying all the requirements.

Which of the following method should we use to solve this problem?


Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫