Leetcode 828. Unique Letter String

Problem Explanation

The problem at hand asks to calculate the sum of unique characters for all possible substrings of a given string. It requires to iterate over all substrings of a given string and compute the count of unique letters in each. For instance, for a given string "ABC", all possible substrings are "A","B","C","AB","BC" and "ABC". Each of these substrings has unique characters so the total sum would be 1 + 1 + 1 + 2 + 2 + 3 = 10.

The challenge of this problem lies in calculating the sum of unique characters for substrings efficiently, as there can be a large number of substrings, especially for long strings.

Walkthrough of Solution

The approach used for the solution is to iterate over the string while keeping track of the count of unique letters until the current position and the last seen position for each letter. For each position in the string, the count is updated by subtracting the last count for the current letter and adding the current count. The new count is then added to the total sum of unique characters.

The data structures used are two vector arrays:

  1. lastCount array that stores the last count of each letter.
  2. lastSeen array that keeps the position of the last occurrence of each letter.

Starting from the beginning of the string, for each character, we calculate how many substrings it's the end point with this new unique character, which is the length from its previous occurrence to the current position.

For example, consider a string "ABCA". For the first three characters, we have unique characters and there are 6 substrings as result("A","B","C","AB","BC","ABC"). When we come to the last 'A', "ABCA" is not a unique string and the count should be decreased by one because there is an 'A' already. But "BCA" is still a unique string and we should count it in. So, the substrings that contribute to the result when we add 'A' to "ABC" are "A", "CA", "BCA". The length of these strings to the current position is i - lastSeen[c].

Solution in Python

1
2python
3class Solution:
4    def uniqueLetterString(self, s: str) -> int:
5        ans = 0
6        count = 0
7        lastCount = [0]*26
8        lastSeen = [-1]*26
9
10        # Iterate over the string
11        for i in range(len(s)):
12            c = ord(s[i]) - ord('A')
13            currentCount = i - lastSeen[c]
14
15            # Update the count and total sum of unique characters
16            count = count - lastCount[c] + currentCount
17            lastCount[c] = currentCount
18            lastSeen[c] = i
19            ans += count
20
21        # Return the total sum of unique characters
22        return ans % (10**9 + 7)

Solution in JAVA

1
2java
3class Solution {
4    public int uniqueLetterString(String s) {
5        int ans = 0;
6        int count = 0;
7        int[] lastCount = new int[26];
8        int[] lastSeen = new int[26];
9
10        // Initialize lastSeen array to -1
11        Arrays.fill(lastSeen, -1);
12
13        // Iterate over the string
14        for (int i = 0; i < s.length(); i++) {
15            int c = s.charAt(i) - 'A';
16            int currentCount = i - lastSeen[c];
17
18            // Update the count and total sum of unique characters
19            count = count - lastCount[c] + currentCount;
20            lastCount[c] = currentCount;
21            lastSeen[c] = i;
22            ans += count;
23        }
24
25        // Return the total sum of unique characters
26        return ans % (1000000000 + 7);
27    }
28}

Solution in Javascript

1
2javascript
3var uniqueLetterString = function(s) {
4     let ans = 0;
5     let count = 0;
6     let lastCount = new Array(26).fill(0);
7     let lastSeen = new Array(26).fill(-1);
8
9     for (let i = 0; i < s.length; i++) {
10         let c = s.charCodeAt(i) - 65;
11         let currentCount = i - lastSeen[c];
12
13         // update count and ans
14         count = count - lastCount[c] + currentCount;
15         lastCount[c] = currentCount;
16         lastSeen[c] = i;
17         ans += count;
18     }
19
20     return ans % 1000000007;
21};

Solution in C++

1
2c++
3class Solution {
4public:
5    int uniqueLetterString(string s) {
6        int ans = 0, count = 0;
7        vector<int> lastCount(26), lastSeen(26, -1);
8
9        for (int i = 0; i < s.length(); ++i) {
10            int c = s[i] - 'A';
11            int currentCount = i - lastSeen[c];
12
13            // update count and ans
14            count = count - lastCount[c] + currentCount;
15            lastCount[c] = currentCount;
16            lastSeen[c] = i;
17            ans += count;
18        }
19
20        return ans % 1000000007;
21    }
22};

Solution in C#

1
2csharp
3public class Solution {
4    public int UniqueLetterString(string s) {
5        int ans = 0, count = 0;
6        int[] lastCount = new int[26];
7        int[] lastSeen = new int[26];
8
9        for(int i=0; i<26; i++) {
10            lastSeen[i] = -1;
11        }
12
13        for (int i = 0; i < s.Length; ++i) {
14            int c = s[i] - 'A';
15            int currentCount = i - lastSeen[c];
16
17            // update count and ans
18            count = count - lastCount[c] + currentCount;
19            lastCount[c] = currentCount;
20            lastSeen[c] = i;
21            ans += count;
22        }
23
24        return ans % 1000000007;
25    }
26}

Overall, it is clear that tracking the nth-to-last position of occurrence of a character being analyzed can be a powerful strategy for this type of problem. As the solution requires us to persistently track two data points (the character's nth-to-last position and the count of unique substrings), we make use of fixed-size arrays to keep track of these data points for all 26 alphabetic characters.

This type of problem exposes the programmer to thinking about how to use data structures like arrays to record past information to solve a current problem efficiently, a strategy commonly referred to as dynamic programming.

Additionally, the problem also demonstrates that hashing to arbitrary indices can be done to improve program performance, as with the use of the ord() function in Python and charCodeAt() in Javascript which maps a character to its ASCII equivalent, then subtracting the ASCII value of 'A' or 'a' to get '0' through '25' values for each letter.

C++, C#, and Java provide a similar solution to the explained Python solution in their respective syntax. All of them also rely on persisting count information in arrays or vectors indexed by character from 'A' to 'Z' to solve this problem.

Conclusion

The varying solutions contain algorithms that enable you to find the count of unique characters for all possible substrings of a string. Taking you through Python, Java, JavaScript, C++, and C# languages, this problem along with these diverse solutions provide not only resolutions to finding the track of unique characters in substrings but also gives an insight into the dynamic programming strategy. This can be beneficial in solving many problems that ask for optimization or counting the number of ways to do something, and are a common theme in competitive programming and many interview questions. It also demonstrates the effective use of basic data structures and built-in functions in different programming languages.


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫