2955. Number of Same-End Substrings
Problem Description
The task is to handle a series of queries on a given string s
. For each query, you are given two indices l
and r
, which define a substring of s
that starts at index l
and ends at index r
(both inclusive). The objective is to count how many substrings within this defined substring have the same character at the beginning and the end. These are called "same-end" substrings. It's important to include all substrings, even if they are just a single character, as a character is considered a "same-end" substring of itself. The final result should be an array where each element corresponds to the count for each query.
Intuition
To efficiently address multiple queries without re-calculating substring counts for all characters each time, the solution uses a Prefix Sum approach. A Prefix Sum is an array that helps to quickly calculate the sum of elements in a given range of an array. Here, it's adapted to count characters by building a prefix sum array for each unique character in string s
. This array records how many times each character has appeared up to every position in the string. Thus, with this pre-computed information, for any given range [l, r], you can quickly find out how often a character appears within that range.
To get to the number of "same-end" substrings for a given range, we sum up the counts for each character. For any character that appears more than once in the range, we calculate the number of ways to pick any two instances of that character, as they will form a "same-end" substring. This count is given by the formula for combinations: for x
instances of a character, there are x * (x - 1) / 2
ways to pick two. This is combined with the count of single-character substrings (which is always a "same-end" substring), which is just the length of the range. The sum of these calculations for each character gives the total count of "same-end" substrings for the range.
Learn more about Prefix Sum patterns.
Solution Approach
The implementation of the solution is based on the following key steps:
-
Character Set Creation: First, we create a set
cs
of all unique characters found in the strings
. This set is used to identify which characters we need to build prefix sums for. -
Prefix Sum Calculation: Next, we create a dictionary
cnt
where each key is a character from our setcs
, and the value is an array withn + 1
elements (wheren
is the length of the strings
). The extra element at the beginning of the array is used to handle cases where the substring starts at index 0. This array is used to store prefix sums, such thatcnt[c][i]
represents the count of occurrences of characterc
up to thei
-th (0-indexed) position in the string. We populate this array with the actual prefix sums by iterating through the string and updating the counts for each character at each position. -
Evaluating Queries: With the prefix sum arrays ready, we can now efficiently evaluate each query. For every query
[l, r]
, we initialize the countt
with the length of the query range (i.e.,r - l + 1
), as every single character is considered a "same-end" substring. -
Counting Same-End Substrings: We then enumerate each character
c
from our setcs
and use the prefix sums to calculate the number of timesc
appears in the interval[l, r]
. This number is calculated using the formulax = cnt[c][r + 1] - cnt[c][l]
. As mentioned previously, we calculate the total possible "same-end" substrings that can be formed using this character by the formulax * (x - 1) / 2
. We add this to our ongoing countt
. -
Query Results Collection: After all characters have been evaluated for a given query, the final count
t
represents the total number of "same-end" substrings for this particular substring ofs
. We append this count to our answer listans
.
The full enumeration for each character and swift calculations using prefix sums allow us to efficiently process multiple queries without recalculating the entire substring each time, which significantly optimizes the performance for a large number of queries.
In terms of the algorithmic pattern, this implementation uses a combination of prefix sums for precomputation and enumeration over a set to perform the actual calculation. This pattern reduces the time complexity from potentially O(n^2) per query to O(n + k) preprocessing and O(u) per query, where n
is the length of the string, k
is the total number of queries, and u
is the number of unique characters.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's say we have a string s = "abaac"
and we need to handle the following queries: [[0, 3], [1, 4], [2, 2]]
.
-
Character Set Creation: From the string
s
, we identify the set of unique characters:cs = {'a', 'b', 'c'}
. -
Prefix Sum Calculation: We create prefix sum arrays for each character in
cs
:For
a
:cnt['a'] = [0, 1, 2, 2, 2, 3]
(There's onea
by index 0, two by index 1, and so on.)For
b
:cnt['b'] = [0, 0, 1, 1, 1, 1]
(The firstb
appears by index 1.)For
c
:cnt['c'] = [0, 0, 0, 0, 1, 1]
(The firstc
appears by index 4.) -
Evaluating Queries:
Query
[0, 3]
: We initialize countt
with3 - 0 + 1 = 4
, since there are four single characters within this range. -
Counting Same-End Substrings: For each unique character:
For
a
:x = cnt['a'][3 + 1] - cnt['a'][0] = 2 - 1 = 1
. There's only one 'a' so no same-end substrings can be formed apart from the single character itself.For
b
:x = cnt['b'][3 + 1] - cnt['b'][0] = 1 - 0 = 1
. Only one 'b' is present.For
c
:x = cnt['c'][3 + 1] - cnt['c'][0] = 0 - 0 = 0
. No 'c' is present in this range.Adding them up gives
t = 4 + 0 + 0 + 0 = 4
. There are 4 "same-end" substrings for this query.Query
[1, 4]
: Initializet
with4 - 1 + 1 = 4
.For
a
:x = cnt['a'][4 + 1] - cnt['a'][1] = 3 - 2 = 1
.For
b
:x = cnt['b'][4 + 1] - cnt['b'][1] = 1 - 1 = 0
.For
c
:x = cnt['c'][4 + 1] - cnt['c'][1] = 1 - 0 = 1
. There's one 'c' so no same-end substrings can be formed apart from the single character itself.Total
t = 4 + 0 + 0 = 4
. There are 4 "same-end" substrings for this query.Query
[2, 2]
: Initializet
with2 - 2 + 1 = 1
.Since this is a single-character query, it only contains the character itself as a "same-end" substring.
Total
t = 1
. There is 1 "same-end" substring for this query. -
Query Results Collection: We collect the results for each query in our answer list:
ans = [4, 4, 1]
.
The final result of the queries is [4, 4, 1]
, which efficiently gives us the count of same-end substrings in each query range without calculating from scratch each time thanks to the prefix sum arrays.
Solution Implementation
1from typing import List
2
3class Solution:
4 def sameEndSubstringCount(self, s: str, queries: List[List[int]]) -> List[int]:
5 # Get the length of the string.
6 n = len(s)
7
8 # Create a set with all unique characters in the string s.
9 unique_chars = set(s)
10
11 # Initialize a dictionary to store prefix counts for each character.
12 prefix_count = {char: [0] * (n + 1) for char in unique_chars}
13
14 # Populate the prefix count for characters in the string.
15 for i, char in enumerate(s, 1):
16 for unique_char in unique_chars:
17 prefix_count[unique_char][i] = prefix_count[unique_char][i - 1]
18 prefix_count[char][i] += 1
19
20 # Initialize a list to hold the answer for each query.
21 answers = []
22
23 # Process each query in the queries list.
24 for left, right in queries:
25 # Calculate the length of the substring.
26 length = right - left + 1
27
28 # Initialize the total count of end-matching substrings with the length of substring.
29 total_count = length
30
31 # Count the occurrences of each character within the current query's substring.
32 for char in unique_chars:
33 # Calculate the frequency of the current character in the substring.
34 char_count = prefix_count[char][right + 1] - prefix_count[char][left]
35
36 # Include the count of substrings with the same ending character.
37 total_count += char_count * (char_count - 1) // 2
38
39 # Add the total count of end-matching substrings to the answers list.
40 answers.append(total_count)
41
42 # Return the list of answers for each query.
43 return answers
44
1class Solution {
2 public int[] sameEndSubstringCount(String s, int[][] queries) {
3 int strLength = s.length();
4 // Initialize the count array to keep track of character frequencies up to each index
5 int[][] charCount = new int[26][strLength + 1];
6
7 // Precompute the number of each character up to each index j
8 for (int j = 1; j <= strLength; ++j) {
9 // Copy counts from the previous index
10 for (int i = 0; i < 26; ++i) {
11 charCount[i][j] = charCount[i][j - 1];
12 }
13 // Update the count of the current character
14 charCount[s.charAt(j - 1) - 'a'][j]++;
15 }
16
17 int numQueries = queries.length;
18 int[] answer = new int[numQueries];
19
20 // Process each query in queries array
21 for (int k = 0; k < numQueries; ++k) {
22 int left = queries[k][0], right = queries[k][1];
23 // Initial count is the length of the substring
24 answer[k] = right - left + 1;
25
26 // Count the number of substrings that start and end with the same character
27 for (int i = 0; i < 26; ++i) {
28 int countInSubstring = charCount[i][right + 1] - charCount[i][left];
29 // Add the combination of two same characters to the answer
30 answer[k] += countInSubstring * (countInSubstring - 1) / 2;
31 }
32 }
33
34 return answer;
35 }
36}
37
1#include <vector>
2#include <string>
3#include <cstring>
4using namespace std;
5
6class Solution {
7public:
8 vector<int> sameEndSubstringCount(string s, vector<vector<int>>& queries) {
9 int strLength = s.size();
10 int charCount[26][strLength + 1];
11 memset(charCount, 0, sizeof(charCount)); // Initialize all elements to 0
12
13 // Populate the character count array, storing the cumulative
14 // count of each character up to current index
15 for (int index = 1; index <= strLength; ++index) {
16 for (int i = 0; i < 26; ++i) {
17 charCount[i][index] = charCount[i][index - 1];
18 }
19 charCount[s[index - 1] - 'a'][index]++;
20 }
21
22 vector<int> results; // Initializing the result vector
23
24 // Iterate through each query to calculate results
25 for (auto& query : queries) {
26 int left = query[0], right = query[1];
27 // Directly adding the length of the substring as part of the result
28 results.push_back(right - left + 1);
29
30 // Iterate through each character in the alphabet
31 for (int i = 0; i < 26; ++i) {
32 // Compute the count of the current character within the given range
33 int charInRange = charCount[i][right + 1] - charCount[i][left];
34 // Use the count to calculate combinations and add to current result
35 results.back() += charInRange * (charInRange - 1) / 2;
36 }
37 }
38
39 return results; // Return the final computed result vector
40 }
41};
42
1function sameEndSubstringCount(s: string, queries: number[][]): number[] {
2 const lengthOfString: number = s.length;
3 // Initialize a 2D array to store counts of characters up to each index
4 const characterCount: number[][] = Array.from({ length: 26 }, () => Array(lengthOfString + 1).fill(0));
5
6 // Fill the character count array with the cumulative counts of each character
7 for (let endIndex = 1; endIndex <= lengthOfString; endIndex++) {
8 for (let i = 0; i < 26; i++) {
9 characterCount[i][endIndex] = characterCount[i][endIndex - 1];
10 }
11 characterCount[s.charCodeAt(endIndex - 1) - 'a'.charCodeAt(0)][endIndex]++;
12 }
13
14 // Initialize an array to store the answers to the queries
15 const answer: number[] = [];
16 // Process each query to count substrings ending with the same letter
17 for (const [leftIndex, rightIndex] of queries) {
18
19 // Start with the length of the substring as the base count
20 answer.push(rightIndex - leftIndex + 1);
21 // Count the pairs of identical characters within the query range
22 for (let i = 0; i < 26; i++) {
23 const countInRange: number = characterCount[i][rightIndex + 1] - characterCount[i][leftIndex];
24 answer[answer.length - 1] += (countInRange * (countInRange - 1)) / 2;
25 }
26 }
27 return answer;
28}
29
Time and Space Complexity
Time Complexity
The time complexity of the given code is O((n + m) * |Σ|)
, where n
is the length of the string s
and m
is the number of queries, while |Σ|
represents the size of the set of letters appearing in the string s
.
This complexity arises because for each of the n
characters in the string s
, we update a counter for each character in the character set Σ
. This process has a complexity of O(n * |Σ|)
. Additionally, for each query, we calculate a number of substrings based on character counts, which involves iterating over the set of characters again, leading to a complexity of O(m * |Σ|)
. Combining both parts results in O((n + m) * |Σ|)
total time complexity.
Space Complexity
The space complexity of the code is O(n * |Σ|)
because we store a count for each character of Σ
at each index of the string, leading to a two-dimensional data structure with dimensions n
(length of the string) by |Σ|
(number of unique characters in the character set).
Learn more about how to find time and space complexity quickly using problem constraints.
Which technique can we use to find the middle of a linked list?
Recommended Readings
Prefix Sum The prefix sum is an incredibly powerful and straightforward technique Its primary goal is to allow for constant time range sum queries on an array What is Prefix Sum The prefix sum of an array at index i is the sum of all numbers from index 0 to i By
LeetCode Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Want a Structured Path to Master System Design Too? Don’t Miss This!