828. Count Unique Characters of All Substrings of a Given String
Problem Description
The problem is about counting unique characters within all possible substrings of a given string s
. The goal is to compute the sum of the number of unique characters (countUniqueChars
) for each substring in s
. A unique character in a substring is a character that appears exactly once in it.
To clarify the requirement, let's consider if s = "ABC"
, we then look at each substring:
- "A" has 1 unique character,
- "B" has 1 unique character,
- "C" has 1 unique character,
- "AB" has 2 unique characters,
- "BC" has 2 unique characters,
- "ABC" has 3 unique characters.
Adding these counts together, the sum would be 1 + 1 + 1 + 2 + 2 + 3 = 10.
The challenge of the problem is that a straightforward approach to finding and counting unique characters for each substring could be very inefficient, especially when s
is long, because the number of substrings grows quadratically with the length of s
.
Intuition
To efficiently solve this problem, we need an approach that avoids redundantly counting unique characters in overlapping substrings. This is where the intuition for the provided solution comes into play.
We create a dictionary, d
, that maps each character in the string s
to the list of indices where it appears. This allows us to quickly understand where each character contributes to the uniqueness in the substrings.
By iterating through the dictionary, we can consider each character independently and determine its contribution to the overall count. A key insight is that the contribution of a character at position i
to the sum is determined by the distance to the previous occurrence of the same character and the distance to the next occurrence of the same character. This is because a character contributes to the uniqueness of all substrings that start after its previous occurrence and end before its next occurrence.
For example, if character 'A' appears at indices 3
, 5
, and 8
in s
, then for the substring s[4:6]
(which includes character at index 5), 'A' is unique. The number of such substrings is the product of the distance from the previous index (5 - 3
) and the distance to the next index (8 - 5
).
The solution efficiently calculates the contribution of each character by iterating through the list of indices in v
(enhanced with start and end markers at -1
and len(s)
, respectively) for each character, multiplying the distances from the current index to its previous and next, and summing up these products to get the total count.
Overall, the intuition is to transform the original problem into calculating character contributions based on their positions, rather than evaluating each substring individually.
Learn more about Dynamic Programming patterns.
Solution Approach
The solution uses a combination of dictionary and list data structures alongside some clever indexing to solve the problem efficiently.
Here's a step-by-step explanation:
-
Dictionary Creation: Create a dictionary
d
where each key-value pair consists of a character from the string and a list of indices where that character appears. This is achieved through enumeration of strings
. -
Initializing the Answer: Start with a variable
ans
initialized to0
. This variable will hold the sum of unique characters counts for all substrings. -
Iterate Over Each Character's Occurrences:
-
For each character, we get its list of indices where it appears in the string and add two sentinel indices at the beginning and end
-1
andlen(s)
. These sentinels represent fictive occurrences before the start and after the end of the string to handle edge cases for the first and last characters. -
Recursive Sub-problem Identification: Each unique occurrence of a character can be viewed as the center of a range extending to its previous and next occurrences. This character is unique in all substrings that start after its previous occurrence and end before its next occurrence.
-
-
Accumulate Contribution:
- Iterate over the list of indices (now with added sentinels) for the given character. For index
i
that corresponds to a true occurrence of the character (not the-1
orlen(s)
), calculate:- The distance to the previous index:
left = v[i] - v[i - 1]
. - The distance to the next index:
right = v[i + 1] - v[i]
.
- The distance to the previous index:
- The contribution of the character at this particular index to the overall unique character count is given by
left * right
. It represents the count of substrings where this character is unique, by combining the number of possible start points (left) with the number of possible endpoints (right).
- Iterate over the list of indices (now with added sentinels) for the given character. For index
-
Sum Up Contributions:
- Add each character's contributions to the
ans
variable.
- Add each character's contributions to the
-
Return the Result: After processing all characters, return the final accumulated value in
ans
, which represents the sum of unique characters for all substrings ofs
.
This solution approach leverages the indexing pattern to avoid redundant calculations by using the distances between character occurrences to infer the number of substrings where those characters are unique. It's a great example of how considering the problem from a different angle can lead to an efficient approach that avoids brute-force inefficiency.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's go through an example to illustrate the solution approach using the string s = "ABA"
.
-
Dictionary Creation:
- Iterate over characters of the string:
A
,B
,A
. - Create a dictionary
d
where we record the indices of each character:d = {'A': [0, 2], 'B': [1]}
.
- Iterate over characters of the string:
-
Initializing the Answer:
- Initialize
ans
to0
.
- Initialize
-
Iterate Over Each Character's Occurrences:
- We add sentinel values to the list of indices for each character in
d
. The updated lists will be{'A': [-1, 0, 2, 3], 'B': [-1, 1, 3]}
.
- We add sentinel values to the list of indices for each character in
-
Accumulate Contribution:
- For character
A
:- For the first true occurrence at index
0
: The left distance is0 - (-1) = 1
and the right distance is2 - 0 = 2
. The contribution is1 * 2 = 2
. - For the second true occurrence at index
2
: The left distance is2 - 0 = 2
and the right distance is3 - 2 = 1
. The contribution is2 * 1 = 2
.
- For the first true occurrence at index
- The total contribution for
A
is2 + 2 = 4
. - For character
B
:- For the true occurrence at index
1
: The left distance is1 - (-1) = 2
and the right distance is3 - 1 = 2
. The contribution is2 * 2 = 4
.
- For the true occurrence at index
- The total contribution for
B
is4
.
- For character
-
Sum Up Contributions:
- Sum up the contributions:
4 (from A) + 4 (from B) = 8
. - Update
ans
to8
.
- Sum up the contributions:
-
Return the Result:
- After processing all characters, return the accumulated value in
ans
, which is8
.
- After processing all characters, return the accumulated value in
This example demonstrates the efficiency of the solution approach, which counts the unique characters in all substrings without directly examining each substring.
Solution Implementation
1from collections import defaultdict
2
3class Solution:
4 def unique_letter_string(self, input_string: str) -> int:
5 # Create a default dictionary to store the indices of each character.
6 index_map = defaultdict(list)
7
8 # Iterate through the characters in the string, along with their indices.
9 for index, character in enumerate(input_string):
10 index_map[character].append(index)
11
12 # Initialize the answer to 0.
13 unique_count = 0
14
15 # Iterate through the values in the index map dictionary.
16 for indices in index_map.values():
17 # Add pseudo-indices at the beginning and end.
18 indices = [-1] + indices + [len(input_string)]
19
20 # Iterate through the indices of the current character.
21 for i in range(1, len(indices) - 1):
22 # Calculate the contribution of each index to the unique count.
23 # The idea is that for each index, we count contributions
24 # from the previous occurrence to the current (v[i] - v[i-1]),
25 # and from the current to the next occurrence (v[i+1] - v[i]).
26 unique_count += (indices[i] - indices[i - 1]) * (indices[i + 1] - indices[i])
27
28 # Return the total count of unique substrings.
29 return unique_count
30
1class Solution {
2 public int uniqueLetterString(String s) {
3 // Create a list of lists to keep track of the occurrences of each letter
4 List<Integer>[] indexList = new List[26];
5 // Initialize lists for each character 'A' to 'Z'
6 Arrays.setAll(indexList, x -> new ArrayList<>());
7
8 // Add a starting index -1 for each character,
9 // representing the position before the start of the string
10 for (int i = 0; i < 26; ++i) {
11 indexList[i].add(-1);
12 }
13
14 // Iterate over the string and add the index of each character to the corresponding list
15 for (int i = 0; i < s.length(); ++i) {
16 indexList[s.charAt(i) - 'A'].add(i);
17 }
18
19 // Initialize a variable to hold the sum of unique letter strings
20 int ans = 0;
21
22 // Iterate through each list in indexList
23 for (var occurences : indexList) {
24 // Add the length of the string as the last index for each character
25 occurences.add(s.length());
26
27 // Calculate contributions for each index
28 for (int i = 1; i < occurences.size() - 1; ++i) {
29 // The count for a unique letter string is determined by the product of the distance to
30 // the previous and next occurrence of the same character
31 ans += (occurences.get(i) - occurences.get(i - 1)) * (occurences.get(i + 1) - occurences.get(i));
32 }
33 }
34
35 // Return the total sum of unique letter strings
36 return ans;
37 }
38}
39
1class Solution {
2public:
3 // Function to calculate the sum of counts of unique characters in all substrings of the given string.
4 int uniqueLetterString(string s) {
5 // Create a 2D vector with 26 rows to store the indices of each character's occurrence.
6 // Initialize the first index as -1, which is used as a dummy index for calculation.
7 vector<vector<int>> index(26, {-1});
8
9 // Loop through the given string to fill in the actual indices of each character.
10 for (int i = 0; i < s.size(); ++i) {
11 index[s[i] - 'A'].push_back(i);
12 }
13
14 // Initialize the counter for the answer.
15 int count = 0;
16
17 // Loop through the 2D vector to calculate the count for each character.
18 for (auto& indices : index) {
19 // Push the dummy index equal to the length of the string for the calculation.
20 indices.push_back(s.size());
21
22 // Loop through each group of indices for the character.
23 for (int i = 1; i < indices.size() - 1; ++i) {
24 // Calculate the contribution of the character at position 'i' and add to the answer.
25 // The multiplier is the number of substrings where this character is unique.
26 count += (indices[i] - indices[i - 1]) * (indices[i + 1] - indices[i]);
27 }
28 }
29
30 // Return the total count of unique characters in all substrings of the string.
31 return count;
32 }
33};
34
1// Function to calculate the sum of counts of unique characters in all substrings of the given string.
2function uniqueLetterString(s: string): number {
3 // Create a 2D array with 26 elements to store the indices of each character's occurrence.
4 // Initialize the first index as -1, which is used as a dummy index for calculation.
5 let indices: number[][] = Array.from({ length: 26 }, () => [-1]);
6
7 // Loop through the given string to fill in the actual indices of each character.
8 for (let i = 0; i < s.length; i++) {
9 indices[s.charCodeAt(i) - 'A'.charCodeAt(0)].push(i);
10 }
11
12 // Initialize the counter for the answer.
13 let count: number = 0;
14
15 // Loop through the 2D array to calculate the count for each character.
16 indices.forEach((charIndices: number[]) => {
17 // Push the dummy index equal to the length of the string for the calculation.
18 charIndices.push(s.length);
19
20 // Loop through each group of indices for the character.
21 for (let i = 1; i < charIndices.length - 1; i++) {
22 // Calculate the contribution of the character at position 'i' and add to the answer.
23 // The contribution is the number of substrings where this character is unique.
24 count += (charIndices[i] - charIndices[i - 1]) * (charIndices[i + 1] - charIndices[i]);
25 }
26 });
27
28 // Return the total count of unique characters in all substrings of the string.
29 return count;
30}
31
Time and Space Complexity
Time Complexity
The given Python code computes the number of substrings where each character occurs exactly once. To understand the time complexity, let's analyze the steps in the code:
- A dictionary
d
is created using adefaultdict
to store positions of each character in strings
. - The string
s
is enumerated over once, so this isO(n)
wheren
is the length of the strings
. - For each character, the positions stored are iterated over in a nested loop to calculate the contribution of each character occurrence towards the unique substrings.
- There are
m
characters in strings
, and the nested loop iterates overk_i
positions for thei
-th character (k_i
is the number of times thei
-th character appears ins
). This results in a time complexity ofO(m * k_i)
.
Hence, the overall time complexity is O(n + m * k_i)
. However, since a character cannot appear more times than the string length n
, and there are at most "26" English letters, the m
and k_i
can be bound by n
, leading to a simplified bound of O(n^2)
.
Space Complexity
For space complexity,
- A dictionary
d
is used to store the index positions for each character ins
. Ifs
contains all unique characters, the space complexity of this operation would beO(n)
. - Temporary lists are created for the positions of individual characters with two extra elements for boundary positions; however, their impact on space complexity is negligible as they don't grow with
n
.
The predominant factor is the space taken by d
which is O(n)
.
So, the space complexity of the code is O(n)
.
Note that the auxiliary storage created within the loop is small and does not exceed the length of the string s
. Since there are a fixed number of english characters, this does not affect the overall space complexity significantly.
Learn more about how to find time and space complexity quickly using problem constraints.
Which algorithm should you use to find a node that is close to the root of the tree?
Recommended Readings
What is Dynamic Programming Prerequisite DFS problems dfs_intro Backtracking problems backtracking Memoization problems memoization_intro Pruning problems backtracking_pruning Dynamic programming is an algorithmic optimization technique that breaks down a complicated problem into smaller overlapping sub problems in a recursive manner and uses solutions to the sub problems to construct a solution
Patterns The Shortest Path Algorithm for Coding Interviews The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we can determine the
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Got a question?ย Ask the Monster Assistantย anything you don't understand.
Still not clear? ย Submitย the part you don't understand to our editors. Or join ourย Discord and ask the community.
Why do you need to multiply left with right (indices[i] - indices[i - 1]) * (indices[i + 1] - indices[i])? I don't figure out the sense