828. Count Unique Characters of All Substrings of a Given String
Problem Description
We need to find the sum of unique character counts across all possible substrings of a given string s
.
First, let's understand what "unique characters" means. A character is considered unique in a string if it appears exactly once in that string. For example:
- In
"LEETCODE"
, the unique characters are'L'
,'T'
,'C'
,'O'
,'D'
(they each appear once), while'E'
appears three times so it's not unique. The count of unique characters is 5. - In
"AAA"
, there are no unique characters since'A'
appears three times. The count is 0. - In
"ABC"
, all characters are unique. The count is 3.
The problem asks us to:
- Generate all possible substrings of the input string
s
- For each substring, count how many characters appear exactly once in that substring
- Sum up all these counts
For example, if s = "ABC"
:
- Substrings of length 1:
"A"
(1 unique),"B"
(1 unique),"C"
(1 unique) - Substrings of length 2:
"AB"
(2 unique),"BC"
(2 unique) - Substrings of length 3:
"ABC"
(3 unique) - Total sum = 1 + 1 + 1 + 2 + 2 + 3 = 10
Note that if a substring appears multiple times, we count each occurrence separately. For instance, if s = "ABA"
:
- The substring
"A"
appears at positions 0 and 2, so we count it twice - Each occurrence contributes to the final sum
The challenge is to efficiently calculate this sum without explicitly generating and checking every substring, as that would be computationally expensive for long strings.
Intuition
Instead of looking at every substring and counting unique characters in each one, let's flip our perspective: for each character in the string, we can ask "in how many substrings does this character appear exactly once?"
Consider a character at position i
in the string. For this character to be unique in a substring, that substring must include position i
but cannot include any other position where the same character appears.
Let's say character 'A'
appears at positions [2, 5, 8]
in our string. For the 'A'
at position 5 to be unique in a substring:
- The substring cannot extend left beyond position 2 (otherwise it would include another
'A'
) - The substring cannot extend right beyond position 8 (otherwise it would include another
'A'
) - The substring must include position 5
So the valid substrings are those that:
- Start anywhere from position
3
to position5
(positions after the previous'A'
up to and including position 5) - End anywhere from position
5
to position7
(from position 5 up to but not including the next'A'
)
The number of such substrings is (5 - 2) × (8 - 5) = 3 × 3 = 9
.
This gives us the key insight: for each character at position p
, if the previous occurrence of the same character is at position l
and the next occurrence is at position r
, then the character at p
contributes (p - l) × (r - p)
to the final answer.
By calculating this contribution for every character position in the string and summing them up, we get the total count of unique characters across all substrings. This approach is much more efficient than examining every substring individually.
To handle edge cases, we can treat the boundaries before the first occurrence as position -1
and after the last occurrence as position len(s)
, ensuring our formula works consistently for all characters.
Learn more about Dynamic Programming patterns.
Solution Approach
We implement the contribution-based approach using a hash table to track character positions:
-
Build Position Map: First, we create a dictionary
d
where each key is a character and each value is a list of indices where that character appears in the string. We iterate through the string once, recording each character's position:d = defaultdict(list) for i, c in enumerate(s): d[c].append(i)
For example, if
s = "LEETCODE"
, thend['E'] = [1, 2, 7]
. -
Calculate Contributions: For each character that appears in the string, we process its list of positions:
for v in d.values(): v = [-1] + v + [len(s)]
We pad the position list with
-1
at the beginning andlen(s)
at the end. This represents the boundaries - there's no occurrence of the character before index 0 or after the last index. -
Apply the Formula: For each position
i
in the padded list (excluding the boundary values), we calculate how many substrings have the character at positionv[i]
as unique:for i in range(1, len(v) - 1): ans += (v[i] - v[i - 1]) * (v[i + 1] - v[i])
v[i] - v[i - 1]
gives the number of possible starting positions for substrings (from right after the previous occurrence up to the current position)v[i + 1] - v[i]
gives the number of possible ending positions for substrings (from the current position up to right before the next occurrence)- Their product gives the total number of substrings where this character appears exactly once
-
Sum All Contributions: We accumulate the contributions from all character positions to get the final answer.
Example Walkthrough:
If s = "ABA"
:
d['A'] = [0, 2]
,d['B'] = [1]
- For
'A'
: padded list =[-1, 0, 2, 3]
- Position 0:
(0 - (-1)) × (2 - 0) = 1 × 2 = 2
- Position 2:
(2 - 0) × (3 - 2) = 2 × 1 = 2
- Position 0:
- For
'B'
: padded list =[-1, 1, 3]
- Position 1:
(1 - (-1)) × (3 - 1) = 2 × 2 = 4
- Position 1:
- Total =
2 + 2 + 4 = 8
The time complexity is O(n)
where n
is the length of the string, and space complexity is O(n)
for storing the position lists.
Ready to land your dream job?
Unlock your dream job with a 3-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through the solution with s = "ABCB"
:
Step 1: Build Position Map We track where each character appears:
'A'
appears at position[0]
'B'
appears at positions[1, 3]
'C'
appears at position[2]
So our dictionary d
becomes:
d['A'] = [0]
d['B'] = [1, 3]
d['C'] = [2]
Step 2: Pad Position Lists with Boundaries
We add -1
before and len(s) = 4
after each position list:
'A'
:[-1, 0, 4]
'B'
:[-1, 1, 3, 4]
'C'
:[-1, 2, 4]
Step 3: Calculate Contributions for Each Character Position
For 'A'
at position 0:
- Previous occurrence boundary:
-1
- Current position:
0
- Next occurrence boundary:
4
- Contribution:
(0 - (-1)) × (4 - 0) = 1 × 4 = 4
- This means the
'A'
at position 0 appears as unique in 4 substrings:"A"
,"AB"
,"ABC"
,"ABCB"
For 'B'
at position 1:
- Previous occurrence boundary:
-1
- Current position:
1
- Next occurrence:
3
- Contribution:
(1 - (-1)) × (3 - 1) = 2 × 2 = 4
- This means the
'B'
at position 1 appears as unique in 4 substrings:"B"
,"BC"
,"AB"
,"ABC"
For 'B'
at position 3:
- Previous occurrence:
1
- Current position:
3
- Next occurrence boundary:
4
- Contribution:
(3 - 1) × (4 - 3) = 2 × 1 = 2
- This means the
'B'
at position 3 appears as unique in 2 substrings:"CB"
,"B"
For 'C'
at position 2:
- Previous occurrence boundary:
-1
- Current position:
2
- Next occurrence boundary:
4
- Contribution:
(2 - (-1)) × (4 - 2) = 3 × 2 = 6
- This means the
'C'
at position 2 appears as unique in 6 substrings:"C"
,"BC"
,"CB"
,"ABC"
,"BCB"
,"ABCB"
Step 4: Sum All Contributions
Total = 4 + 4 + 2 + 6 = 16
We can verify this by manually checking all substrings:
- Length 1:
"A"
(1),"B"
(1),"C"
(1),"B"
(1) = 4 total - Length 2:
"AB"
(2),"BC"
(2),"CB"
(2) = 6 total - Length 3:
"ABC"
(2),"BCB"
(1) = 3 total - Length 4:
"ABCB"
(2) = 3 total - Sum = 4 + 6 + 3 + 3 = 16 ✓
Solution Implementation
1from collections import defaultdict
2from typing import List
3
4class Solution:
5 def uniqueLetterString(self, s: str) -> int:
6 # Dictionary to store all positions where each character appears
7 char_positions = defaultdict(list)
8
9 # Record the index position for each character in the string
10 for index, char in enumerate(s):
11 char_positions[char].append(index)
12
13 total_count = 0
14
15 # For each unique character, calculate its contribution to all substrings
16 for positions in char_positions.values():
17 # Add boundary markers: -1 at start and len(s) at end
18 # This helps calculate the range where each occurrence is unique
19 positions_with_bounds = [-1] + positions + [len(s)]
20
21 # For each occurrence of the character, calculate how many substrings
22 # contain this occurrence as the unique instance of that character
23 for i in range(1, len(positions_with_bounds) - 1):
24 # Left distance: from previous occurrence (or start boundary)
25 left_distance = positions_with_bounds[i] - positions_with_bounds[i - 1]
26 # Right distance: to next occurrence (or end boundary)
27 right_distance = positions_with_bounds[i + 1] - positions_with_bounds[i]
28 # Total substrings where this character occurrence is unique
29 total_count += left_distance * right_distance
30
31 return total_count
32
1class Solution {
2 public int uniqueLetterString(String s) {
3 // Create an array of lists to store indices for each letter (A-Z)
4 // Each list will contain all positions where that letter appears
5 List<Integer>[] letterIndices = new List[26];
6 Arrays.setAll(letterIndices, k -> new ArrayList<>());
7
8 // Initialize each list with -1 as a boundary marker for the left side
9 for (int i = 0; i < 26; i++) {
10 letterIndices[i].add(-1);
11 }
12
13 // Record the index of each character in the string
14 for (int i = 0; i < s.length(); i++) {
15 int letterIndex = s.charAt(i) - 'A';
16 letterIndices[letterIndex].add(i);
17 }
18
19 int totalCount = 0;
20
21 // Calculate contribution of each letter to the total count
22 for (List<Integer> positions : letterIndices) {
23 // Add string length as a boundary marker for the right side
24 positions.add(s.length());
25
26 // For each occurrence of the letter, calculate how many substrings
27 // contain this occurrence as a unique character
28 for (int i = 1; i < positions.size() - 1; i++) {
29 int previousPosition = positions.get(i - 1);
30 int currentPosition = positions.get(i);
31 int nextPosition = positions.get(i + 1);
32
33 // Number of substrings where current occurrence is unique:
34 // (number of possible left boundaries) * (number of possible right boundaries)
35 int leftChoices = currentPosition - previousPosition;
36 int rightChoices = nextPosition - currentPosition;
37 totalCount += leftChoices * rightChoices;
38 }
39 }
40
41 return totalCount;
42 }
43}
44
1class Solution {
2public:
3 int uniqueLetterString(string s) {
4 // Create a 2D vector to store positions of each character ('A' to 'Z')
5 // Initialize each character's position list with -1 as a sentinel value
6 vector<vector<int>> charPositions(26, {-1});
7
8 // Record the position of each character in the string
9 for (int i = 0; i < s.size(); ++i) {
10 int charIndex = s[i] - 'A'; // Convert character to index (0-25)
11 charPositions[charIndex].push_back(i);
12 }
13
14 int totalCount = 0;
15
16 // Process each character's position list
17 for (auto& positions : charPositions) {
18 // Add string length as the right boundary sentinel
19 positions.push_back(s.size());
20
21 // For each occurrence of the character, calculate its contribution
22 // The contribution is: (distance to previous occurrence) * (distance to next occurrence)
23 // This counts how many substrings contain this character exactly once
24 for (int i = 1; i < positions.size() - 1; ++i) {
25 int leftDistance = positions[i] - positions[i - 1];
26 int rightDistance = positions[i + 1] - positions[i];
27 totalCount += leftDistance * rightDistance;
28 }
29 }
30
31 return totalCount;
32 }
33};
34
1/**
2 * Counts the sum of unique characters in all substrings of the given string.
3 * A character is unique in a substring if it appears exactly once in that substring.
4 *
5 * @param s - Input string containing only uppercase letters
6 * @returns The sum of count of unique characters across all substrings
7 */
8function uniqueLetterString(s: string): number {
9 // Initialize a 2D array to store positions of each letter (A-Z)
10 // Each sub-array starts with -1 as a boundary marker
11 const letterPositions: number[][] = Array.from({ length: 26 }, () => [-1]);
12
13 // Record the position of each character in the string
14 for (let i = 0; i < s.length; ++i) {
15 const letterIndex = s.charCodeAt(i) - 'A'.charCodeAt(0);
16 letterPositions[letterIndex].push(i);
17 }
18
19 let totalUniqueCount = 0;
20
21 // Process each letter's contribution to the total count
22 for (const positions of letterPositions) {
23 // Add string length as right boundary marker
24 positions.push(s.length);
25
26 // For each occurrence of the letter, calculate its contribution
27 // The contribution is the number of substrings where this occurrence is unique
28 for (let i = 1; i < positions.length - 1; ++i) {
29 const leftDistance = positions[i] - positions[i - 1];
30 const rightDistance = positions[i + 1] - positions[i];
31 // Multiply left and right distances to get number of substrings
32 // where this character occurrence is the only one
33 totalUniqueCount += leftDistance * rightDistance;
34 }
35 }
36
37 return totalUniqueCount;
38}
39
Time and Space Complexity
Time Complexity: O(n)
, where n
is the length of the string s
.
The algorithm consists of two main parts:
- First loop: Iterates through the string once to build the dictionary, which takes
O(n)
time. - Second loop: Iterates through each unique character's position list. Since each position in the string is processed exactly once across all characters (each index appears in exactly one character's list), the total work done is
O(n)
.
Even though there's a nested loop structure, each index of the string contributes to exactly one calculation, making the overall time complexity linear.
Space Complexity: O(n)
, where n
is the length of the string s
.
The space is used for:
- Dictionary
d
: Stores position lists for each unique character. In the worst case (all characters are the same), we storen
positions in one list. In the best case (all characters are different), we storen
positions distributed across multiple lists. Either way, the total space for all positions isO(n)
. - Temporary list
v
: For each character, we create a new list with at mostn + 2
elements (original positions plus two sentinels). However, this doesn't add to the overall complexity as it's still bounded byO(n)
.
Therefore, the total space complexity is O(n)
.
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Incorrect Boundary Handling
One of the most common mistakes is forgetting to add proper boundaries (-1
and len(s)
) to the position lists. Without these boundaries, the calculation for characters at the beginning or end of the string will be incorrect.
Incorrect approach:
for positions in char_positions.values():
# Missing boundaries - will cause index errors or wrong calculations
for i in range(len(positions)):
left = positions[i] - positions[i-1] if i > 0 else positions[i] + 1
right = positions[i+1] - positions[i] if i < len(positions)-1 else len(s) - positions[i]
total_count += left * right
Solution: Always add boundaries before processing:
positions_with_bounds = [-1] + positions + [len(s)]
2. Off-by-One Errors in Distance Calculation
Another pitfall is making off-by-one errors when calculating the left and right distances. Some might incorrectly add or subtract 1 from the distances.
Incorrect approach:
# Wrong: Adding unnecessary +1 or -1 left_distance = positions_with_bounds[i] - positions_with_bounds[i - 1] - 1 right_distance = positions_with_bounds[i + 1] - positions_with_bounds[i] - 1
Solution: The distances should be calculated directly without any adjustments:
left_distance = positions_with_bounds[i] - positions_with_bounds[i - 1] right_distance = positions_with_bounds[i + 1] - positions_with_bounds[i]
3. Using Regular Dictionary Instead of defaultdict
Using a regular dictionary requires checking if a key exists before appending, which adds unnecessary complexity and potential for errors.
Incorrect approach:
char_positions = {}
for index, char in enumerate(s):
if char not in char_positions:
char_positions[char] = []
char_positions[char].append(index)
Solution: Use defaultdict(list)
for cleaner code:
char_positions = defaultdict(list)
for index, char in enumerate(s):
char_positions[char].append(index)
4. Modifying the Original Position List
A subtle but dangerous pitfall is modifying the original position list when adding boundaries, which could cause issues if the list is used elsewhere.
Incorrect approach:
for positions in char_positions.values():
positions.insert(0, -1) # Modifies original list
positions.append(len(s)) # Modifies original list
# Process positions...
Solution: Create a new list with boundaries:
positions_with_bounds = [-1] + positions + [len(s)]
5. Misunderstanding the Problem - Counting Characters Instead of Contributions
Some might try to count unique characters in each substring explicitly, leading to an O(n³) solution that times out for large inputs.
Incorrect approach:
total = 0
for i in range(len(s)):
for j in range(i, len(s)):
substring = s[i:j+1]
char_count = {}
for char in substring:
char_count[char] = char_count.get(char, 0) + 1
unique_count = sum(1 for count in char_count.values() if count == 1)
total += unique_count
Solution: Use the contribution-based approach that calculates how many substrings contain each character position as unique, achieving O(n) time complexity.
Which technique can we use to find the middle of a linked list?
Recommended Readings
What is Dynamic Programming Prerequisite DFS problems dfs_intro Backtracking problems backtracking Memoization problems memoization_intro Pruning problems backtracking_pruning Dynamic programming is an algorithmic optimization technique that breaks down a complicated problem into smaller overlapping sub problems in a recursive manner and uses solutions to the sub problems to construct a solution
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Want a Structured Path to Master System Design Too? Don’t Miss This!