828. Count Unique Characters of All Substrings of a Given String


Problem Description

The problem is about counting unique characters within all possible substrings of a given string s. The goal is to compute the sum of the number of unique characters (countUniqueChars) for each substring in s. A unique character in a substring is a character that appears exactly once in it.

To clarify the requirement, let's consider if s = "ABC", we then look at each substring:

  • "A" has 1 unique character,
  • "B" has 1 unique character,
  • "C" has 1 unique character,
  • "AB" has 2 unique characters,
  • "BC" has 2 unique characters,
  • "ABC" has 3 unique characters.

Adding these counts together, the sum would be 1 + 1 + 1 + 2 + 2 + 3 = 10.

The challenge of the problem is that a straightforward approach to finding and counting unique characters for each substring could be very inefficient, especially when s is long, because the number of substrings grows quadratically with the length of s.

Intuition

To efficiently solve this problem, we need an approach that avoids redundantly counting unique characters in overlapping substrings. This is where the intuition for the provided solution comes into play.

We create a dictionary, d, that maps each character in the string s to the list of indices where it appears. This allows us to quickly understand where each character contributes to the uniqueness in the substrings.

By iterating through the dictionary, we can consider each character independently and determine its contribution to the overall count. A key insight is that the contribution of a character at position i to the sum is determined by the distance to the previous occurrence of the same character and the distance to the next occurrence of the same character. This is because a character contributes to the uniqueness of all substrings that start after its previous occurrence and end before its next occurrence.

For example, if character 'A' appears at indices 3, 5, and 8 in s, then for the substring s[4:6] (which includes character at index 5), 'A' is unique. The number of such substrings is the product of the distance from the previous index (5 - 3) and the distance to the next index (8 - 5).

The solution efficiently calculates the contribution of each character by iterating through the list of indices in v (enhanced with start and end markers at -1 and len(s), respectively) for each character, multiplying the distances from the current index to its previous and next, and summing up these products to get the total count.

Overall, the intuition is to transform the original problem into calculating character contributions based on their positions, rather than evaluating each substring individually.

Learn more about Dynamic Programming patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece

Which data structure is used in a depth first search?

Solution Approach

The solution uses a combination of dictionary and list data structures alongside some clever indexing to solve the problem efficiently.

Here's a step-by-step explanation:

  1. Dictionary Creation: Create a dictionary d where each key-value pair consists of a character from the string and a list of indices where that character appears. This is achieved through enumeration of string s.

  2. Initializing the Answer: Start with a variable ans initialized to 0. This variable will hold the sum of unique characters counts for all substrings.

  3. Iterate Over Each Character's Occurrences:

    • For each character, we get its list of indices where it appears in the string and add two sentinel indices at the beginning and end -1 and len(s). These sentinels represent fictive occurrences before the start and after the end of the string to handle edge cases for the first and last characters.

    • Recursive Sub-problem Identification: Each unique occurrence of a character can be viewed as the center of a range extending to its previous and next occurrences. This character is unique in all substrings that start after its previous occurrence and end before its next occurrence.

  4. Accumulate Contribution:

    • Iterate over the list of indices (now with added sentinels) for the given character. For index i that corresponds to a true occurrence of the character (not the -1 or len(s)), calculate:
      • The distance to the previous index: left = v[i] - v[i - 1].
      • The distance to the next index: right = v[i + 1] - v[i].
    • The contribution of the character at this particular index to the overall unique character count is given by left * right. It represents the count of substrings where this character is unique, by combining the number of possible start points (left) with the number of possible endpoints (right).
  5. Sum Up Contributions:

    • Add each character's contributions to the ans variable.
  6. Return the Result: After processing all characters, return the final accumulated value in ans, which represents the sum of unique characters for all substrings of s.

This solution approach leverages the indexing pattern to avoid redundant calculations by using the distances between character occurrences to infer the number of substrings where those characters are unique. It's a great example of how considering the problem from a different angle can lead to an efficient approach that avoids brute-force inefficiency.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

What is the worst case running time for finding an element in a binary tree (not necessarily binary search tree) of size n?

Example Walkthrough

Let's go through an example to illustrate the solution approach using the string s = "ABA".

  1. Dictionary Creation:

    • Iterate over characters of the string: A, B, A.
    • Create a dictionary d where we record the indices of each character: d = {'A': [0, 2], 'B': [1]}.
  2. Initializing the Answer:

    • Initialize ans to 0.
  3. Iterate Over Each Character's Occurrences:

    • We add sentinel values to the list of indices for each character in d. The updated lists will be {'A': [-1, 0, 2, 3], 'B': [-1, 1, 3]}.
  4. Accumulate Contribution:

    • For character A:
      • For the first true occurrence at index 0: The left distance is 0 - (-1) = 1 and the right distance is 2 - 0 = 2. The contribution is 1 * 2 = 2.
      • For the second true occurrence at index 2: The left distance is 2 - 0 = 2 and the right distance is 3 - 2 = 1. The contribution is 2 * 1 = 2.
    • The total contribution for A is 2 + 2 = 4.
    • For character B:
      • For the true occurrence at index 1: The left distance is 1 - (-1) = 2 and the right distance is 3 - 1 = 2. The contribution is 2 * 2 = 4.
    • The total contribution for B is 4.
  5. Sum Up Contributions:

    • Sum up the contributions: 4 (from A) + 4 (from B) = 8.
    • Update ans to 8.
  6. Return the Result:

    • After processing all characters, return the accumulated value in ans, which is 8.

This example demonstrates the efficiency of the solution approach, which counts the unique characters in all substrings without directly examining each substring.

Solution Implementation

1from collections import defaultdict
2
3class Solution:
4    def unique_letter_string(self, input_string: str) -> int:
5        # Create a default dictionary to store the indices of each character.
6        index_map = defaultdict(list)
7      
8        # Iterate through the characters in the string, along with their indices.
9        for index, character in enumerate(input_string):
10            index_map[character].append(index)
11      
12        # Initialize the answer to 0.
13        unique_count = 0
14      
15        # Iterate through the values in the index map dictionary.
16        for indices in index_map.values():
17            # Add pseudo-indices at the beginning and end.
18            indices = [-1] + indices + [len(input_string)]
19          
20            # Iterate through the indices of the current character.
21            for i in range(1, len(indices) - 1):
22                # Calculate the contribution of each index to the unique count.
23                # The idea is that for each index, we count contributions
24                # from the previous occurrence to the current (v[i] - v[i-1]),
25                # and from the current to the next occurrence (v[i+1] - v[i]).
26                unique_count += (indices[i] - indices[i - 1]) * (indices[i + 1] - indices[i])
27      
28        # Return the total count of unique substrings.
29        return unique_count
30
1class Solution {
2    public int uniqueLetterString(String s) {
3        // Create a list of lists to keep track of the occurrences of each letter
4        List<Integer>[] indexList = new List[26];
5        // Initialize lists for each character 'A' to 'Z'
6        Arrays.setAll(indexList, x -> new ArrayList<>());
7      
8        // Add a starting index -1 for each character,
9        // representing the position before the start of the string 
10        for (int i = 0; i < 26; ++i) {
11            indexList[i].add(-1);
12        }
13      
14        // Iterate over the string and add the index of each character to the corresponding list
15        for (int i = 0; i < s.length(); ++i) {
16            indexList[s.charAt(i) - 'A'].add(i);
17        }
18      
19        // Initialize a variable to hold the sum of unique letter strings
20        int ans = 0;
21      
22        // Iterate through each list in indexList
23        for (var occurences : indexList) {
24            // Add the length of the string as the last index for each character
25            occurences.add(s.length());
26
27            // Calculate contributions for each index
28            for (int i = 1; i < occurences.size() - 1; ++i) {
29                // The count for a unique letter string is determined by the product of the distance to
30                // the previous and next occurrence of the same character
31                ans += (occurences.get(i) - occurences.get(i - 1)) * (occurences.get(i + 1) - occurences.get(i));
32            }
33        }
34      
35        // Return the total sum of unique letter strings
36        return ans;
37    }
38}
39
1class Solution {
2public:
3    // Function to calculate the sum of counts of unique characters in all substrings of the given string.
4    int uniqueLetterString(string s) {
5        // Create a 2D vector with 26 rows to store the indices of each character's occurrence.
6        // Initialize the first index as -1, which is used as a dummy index for calculation.
7        vector<vector<int>> index(26, {-1});
8      
9        // Loop through the given string to fill in the actual indices of each character.
10        for (int i = 0; i < s.size(); ++i) {
11            index[s[i] - 'A'].push_back(i);
12        }
13      
14        // Initialize the counter for the answer.
15        int count = 0;
16      
17        // Loop through the 2D vector to calculate the count for each character.
18        for (auto& indices : index) {
19            // Push the dummy index equal to the length of the string for the calculation.
20            indices.push_back(s.size());
21          
22            // Loop through each group of indices for the character.
23            for (int i = 1; i < indices.size() - 1; ++i) {
24                // Calculate the contribution of the character at position 'i' and add to the answer.
25                // The multiplier is the number of substrings where this character is unique.
26                count += (indices[i] - indices[i - 1]) * (indices[i + 1] - indices[i]);
27            }
28        }
29      
30        // Return the total count of unique characters in all substrings of the string.
31        return count;
32    }
33};
34
1// Function to calculate the sum of counts of unique characters in all substrings of the given string.
2function uniqueLetterString(s: string): number {
3    // Create a 2D array with 26 elements to store the indices of each character's occurrence.
4    // Initialize the first index as -1, which is used as a dummy index for calculation.
5    let indices: number[][] = Array.from({ length: 26 }, () => [-1]);
6
7    // Loop through the given string to fill in the actual indices of each character.
8    for (let i = 0; i < s.length; i++) {
9        indices[s.charCodeAt(i) - 'A'.charCodeAt(0)].push(i);
10    }
11
12    // Initialize the counter for the answer.
13    let count: number = 0;
14
15    // Loop through the 2D array to calculate the count for each character.
16    indices.forEach((charIndices: number[]) => {
17        // Push the dummy index equal to the length of the string for the calculation.
18        charIndices.push(s.length);
19
20        // Loop through each group of indices for the character.
21        for (let i = 1; i < charIndices.length - 1; i++) {
22            // Calculate the contribution of the character at position 'i' and add to the answer.
23            // The contribution is the number of substrings where this character is unique.
24            count += (charIndices[i] - charIndices[i - 1]) * (charIndices[i + 1] - charIndices[i]);
25        }
26    });
27
28    // Return the total count of unique characters in all substrings of the string.
29    return count;
30}
31
Not Sure What to Study? Take the 2-min Quiz

What's the relationship between a tree and a graph?

Time and Space Complexity

Time Complexity

The given Python code computes the number of substrings where each character occurs exactly once. To understand the time complexity, let's analyze the steps in the code:

  1. A dictionary d is created using a defaultdict to store positions of each character in string s.
  2. The string s is enumerated over once, so this is O(n) where n is the length of the string s.
  3. For each character, the positions stored are iterated over in a nested loop to calculate the contribution of each character occurrence towards the unique substrings.
  4. There are m characters in string s, and the nested loop iterates over k_i positions for the i-th character (k_i is the number of times the i-th character appears in s). This results in a time complexity of O(m * k_i).

Hence, the overall time complexity is O(n + m * k_i). However, since a character cannot appear more times than the string length n, and there are at most "26" English letters, the m and k_i can be bound by n, leading to a simplified bound of O(n^2).

Space Complexity

For space complexity,

  1. A dictionary d is used to store the index positions for each character in s. If s contains all unique characters, the space complexity of this operation would be O(n).
  2. Temporary lists are created for the positions of individual characters with two extra elements for boundary positions; however, their impact on space complexity is negligible as they don't grow with n.

The predominant factor is the space taken by d which is O(n).

So, the space complexity of the code is O(n).

Note that the auxiliary storage created within the loop is small and does not exceed the length of the string s. Since there are a fixed number of english characters, this does not affect the overall space complexity significantly.

Learn more about how to find time and space complexity quickly using problem constraints.

Fast Track Your Learning with Our Quick Skills Quiz:

What are the most two important steps in writing a depth first search function? (Select 2)


Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.

←
↑TA đŸ‘šâ€đŸ«