940. Distinct Subsequences II
Problem Description
Given a string s
, the task is to calculate the total number of distinct non-empty subsequences that can be formed from the characters of the string. Note that a subsequence maintains the original order of characters but does not necessarily include all characters. Importantly, if the solution is a very large number, it should be returned modulo 10^9 + 7
to keep the number within manageable bounds. This modular operation ensures that we deal with smaller numbers that are more practical for computation and comparison.
Intuition
The intuition for solving this problem lies in dynamic programming - a method used for solving complex problems by breaking them down into simpler subproblems. The key insight is to build up the number of distinct subsequences as we iterate through each character of the given string.
We maintain an array dp
of size 26 (to account for each letter of the alphabet) which keeps track of the contribution of each character towards the number of distinct subsequences so far. The variable ans
stores the total count of distinct subsequences at any given point.
As we traverse the string:
- We calculate the index
i
corresponding to the characterc
(by finding the difference between the ASCII value ofc
and that ofa
). - The variable
add
is set to the difference between the current total number of subsequencesans
and the old count of subsequences ending with the characterc
(dp[i]
). We add 1 to this because a new subsequence consisting of just the characterc
can also be formed. - The
ans
is updated to include the new subsequences introduced by including the characterc
. The newans
is also taken modulo10^9 + 7
to handle the large numbers. - The count of subsequences ending with the character
c
(dp[i]
) is incremented byadd
to reflect the updated state.
This approach ensures that each character's contribution is accounted for exactly once, and by the end of the iteration, ans
contains the count of all possible distinct subsequences modulo 10^9 + 7
.
Learn more about Dynamic Programming patterns.
Solution Approach
The implementation of the solution is fairly straightforward once we've understood the intuition behind it. This problem employs dynamic programming and a single array to keep track of the contributing counts. Here's the breakdown of the approach:
-
Initialize the Modulus and DP Array:
- We use a modulus
mod
set to10**9 + 7
to ensure we manage the large numbers effectively by returning the number of subsequences modulo this value. - The data structure
dp
is an array initialized to size 26 (representing the English alphabet) with all zeroes. This array will store the count of subsequences that end with a particular character.
- We use a modulus
-
Iterate Over the String:
- For each character
c
in the strings
, we do the following steps:
- For each character
-
Determine the Index for
c
:- We calculate an index
i
by taking the ASCII value of the characterc
, subtracting the ASCII value of 'a' from it (ord(c) - ord('a')
). This gives us a unique index from 0 to 25 for each lowercase letter of the alphabet.
- We calculate an index
-
Calculate the Additive Contribution:
- Compute
add
, which will determine how much the current characterc
will contribute to the new subsequences count. It is determined by the current total of distinct subsequencesans
minus the previous count stored indp[i]
for that characterc
. We add one toadd
to account for the subsequence consisting solely of the new characterc
.
- Compute
-
Update the Total Count of Subsequences:
- Update
ans
by adding the new contribution from the characterc
. Applying modulomod
here ensures we handle overflow and large numbers.
- Update
-
Update the DP Array:
- Increase the count in
dp[i]
by the value ofadd
. This means that any subsequences that now end with the current characterc
include the old subsequences plus any new subsequences formed due to addingc
.
- Increase the count in
Using this method, we build up the solution incrementally, considering the influence of each character on the subsequences. The reason we have to subtract the old count of subsequences for the character c
before adding it again (after increasing with add
) is to ensure that we do not double-count subsequences already accounted for by previous appearances of c
.
The complexity of this solution is O(n), where n is the length of the string since we go through every character of the string once, performing O(1) operations for each.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
To illustrate the solution approach, let's go through a small example using the string "aba"
.
-
Initialize the Modulus and DP Array:
We setmod
to10^9 + 7
. Ourdp
array, which has 26 elements for each letter of the English alphabet, is initialized to zeroes. -
Iterate Over the String:
We begin iterating through the string"aba"
character by character. -
First Character - 'a':
- Determine the Index for 'a': We calculate the index for 'a' as
0
(since'a' - 'a' = 0
). - Calculate the Additive Contribution: Since this is the first character, and there are no previous subsequences, we calculate
add = ans (initially 0) - dp[0] + 1 = 0 - 0 + 1 = 1
. - Update the Total Count of Subsequences: We set
ans = ans + add = 0 + 1 = 1
(the subsequences are now""
and"a"
). - Update the DP Array: We update
dp[0]
todp[0] + add = 0 + 1 = 1
.
- Determine the Index for 'a': We calculate the index for 'a' as
-
Second Character - 'b':
- Determine the Index for 'b': The index for 'b' is
1
('b' - 'a' = 1
). - Calculate the Additive Contribution: The current total count of distinct subsequences,
ans
, is 1.add = ans - dp[1] + 1 = 1 - 0 + 1 = 2
(which corresponds to new subsequences"b"
and"ab"
). - Update the Total Count of Subsequences: Now
ans = ans + add = 1 + 2 = 3
(the subsequences are""
,"a"
,"b"
, and"ab"
). - Update the DP Array: We update
dp[1]
todp[1] + add = 0 + 2 = 2
.
- Determine the Index for 'b': The index for 'b' is
-
Third Character - 'a' (again):
- Determine the Index for 'a': The index is still
0
. - Calculate the Additive Contribution: We know
ans
is 3, and since we've encountered 'a' before, we subtract its previous contribution fromans
and add 1:add = ans - dp[0] + 1 = 3 - 1 + 1 = 3
(these are new subsequences:"a"
,"ba"
, and"aba"
). - Update the Total Count of Subsequences: The new
ans
will beans + add = 3 + 3 = 6
(the subsequences now are""
,"a"
,"b"
,"ab"
,"ba"
,"aa"
, and"aba"
; note that we don't count subsequences like"aa"
as distinct since the order must be maintained). - Update the DP Array: We set
dp[0]
todp[0] + add = 1 + 3 = 4
.
- Determine the Index for 'a': The index is still
At the end of this process, the final answer for the total count of distinct non-empty subsequences is 6
. However, if ans
were larger, we would apply the modulo operation to ensure we obtain a result within the bounded range.
This walkthrough demonstrates how the algorithm incrementally computes the count of distinct subsequences using dynamic programming and efficient arithmetic operations, handling each character's contribution exactly once.
Solution Implementation
1class Solution:
2 def distinctSubseqII(self, s: str) -> int:
3 # Define the modulo value to handle large numbers
4 MOD = 10**9 + 7
5
6 # Initialize an array to keep track of the last count of subsequences
7 # ending with each letter of the alphabet
8 last_count = [0] * 26
9
10 # Initialize the total count of distinct subsequences
11 total_count = 0
12
13 # Iterate over each character in the string
14 for char in s:
15 # Get the index of the current character in the alphabet (0-25)
16 index = ord(char) - ord('a')
17
18 # Calculate how many new subsequences are added by this character:
19 # It is total_count (all previous subsequences) minus last_count[index]
20 # (which we have already counted with the current character) plus 1 for the character itself
21 added_subseq = total_count - last_count[index] + 1
22
23 # Update the total count of distinct subsequences
24 total_count = (total_count + added_subseq) % MOD
25
26 # Update the last count of subsequences for the current character
27 last_count[index] = (last_count[index] + added_subseq) % MOD
28
29 # Return the total count of distinct subsequences
30 return total_count
31
1class Solution {
2 private static final int MOD = (int) 1e9 + 7; // Modulus value for handling large numbers
3
4 public int distinctSubseqII(String s) {
5 int[] lastOccurrenceCount = new int[26]; // Array to store the last occurrence count of each character
6 int totalDistinctSubsequences = 0; // Variable to store the total count of distinct subsequences
7
8 // Iterate through each character in the string
9 for (int i = 0; i < s.length(); ++i) {
10 // Determine the alphabet index of current character
11 int alphabetIndex = s.charAt(i) - 'a';
12
13 // Calculate the number to add. This number represents the new subsequences that will be formed by adding the new character.
14 // Subtract the last occurrence count of this character to avoid counting subsequences formed by prior occurrences of this character.
15 // And add 1 for the subsequence consisting of the character itself.
16 int newSubsequences = (totalDistinctSubsequences - lastOccurrenceCount[alphabetIndex] + 1 + MOD) % MOD;
17
18 // Update the totalDistinctSubsequences by adding newSubsequences
19 totalDistinctSubsequences = (totalDistinctSubsequences + newSubsequences) % MOD;
20
21 // Update the last occurrence count for this character in the lastOccurrenceCount array
22 lastOccurrenceCount[alphabetIndex] = (lastOccurrenceCount[alphabetIndex] + newSubsequences) % MOD;
23 }
24
25 // Since the result can be negative due to the subtraction during the loop,
26 // we add MOD and then take the modulus to ensure a non-negative result
27 return (totalDistinctSubsequences + MOD) % MOD;
28 }
29}
30
1class Solution {
2public:
3 const int MODULO = 1e9 + 7;
4
5 // Method to calculate the number of distinct subsequences in the string `s`.
6 int distinctSubseqII(string s) {
7 vector<long> lastOccurrence(26, 0); // Array to store the last occurrence contribution for each character
8 long totalCount = 0; // Total count of distinct subsequences
9
10 // Loop through each character in the string
11 for (char& c : s) {
12 int index = c - 'a'; // Map character 'a' to 'z' to index 0 to 25
13 long additionalCount = (totalCount - lastOccurrence[index] + 1 + MODULO) % MODULO; // Calculate the additional count for current character
14
15 totalCount = (totalCount + additionalCount) % MODULO; // Update the total count
16 lastOccurrence[index] = (lastOccurrence[index] + additionalCount) % MODULO; // Update the last occurrence contribution for current character
17 }
18
19 return (int)totalCount; // Return the total count of distinct subsequences as an integer
20 }
21};
22
1function distinctSubseqII(s: string): number {
2 const MODULO: number = 1e9 + 7; // A constant for the modulo operation to prevent overflow
3 const lastOccurrence = new Array<number>(26).fill(0); // Store the count of distinct subsequences ending with each letter
4
5 // Iterate over each character in the string
6 for (const char of s) {
7 const charIndex: number = char.charCodeAt(0) - 'a'.charCodeAt(0); // Map 'a' to 0, 'b' to 1, etc.
8
9 // Calculate the number of new distinct subsequences ending with the current character
10 // It is the sum of all distinct subsequences seen so far plus 1 (for the char itself)
11 lastOccurrence[charIndex] = lastOccurrence.reduce((runningTotal, currentValue) =>
12 (runningTotal + currentValue) % MODULO, 0) + 1;
13 }
14
15 // Return the sum of all distinct subsequences modulo the defined constant to avoid overflow
16 return lastOccurrence.reduce((runningTotal, currentValue) =>
17 (runningTotal + currentValue) % MODULO, 0);
18}
19
Time and Space Complexity
The provided code computes the count of distinct subsequences in a string using dynamic programming.
Time Complexity
The key operation is iterating over each character in the input string s
. For each character, the algorithm performs a constant amount of work; it updates ans
and modifies an element in the dp
array, which is of fixed size 26 (corresponding to the lowercase English alphabet). The time complexity is therefore O(N), where N is the length of the string s
.
Space Complexity
The space complexity of the algorithm is defined by the space needed to store the dp
array and the variables used. The dp
array requires space for 26 integers, regardless of the length of the input string. Hence, the space complexity is O(1) since the required space does not scale with the input size but is a constant size due to the fixed alphabet size.
Learn more about how to find time and space complexity quickly using problem constraints.
What are the most two important steps in writing a depth first search function? (Select 2)
Recommended Readings
What is Dynamic Programming Prerequisite DFS problems dfs_intro Backtracking problems backtracking Memoization problems memoization_intro Pruning problems backtracking_pruning Dynamic programming is an algorithmic optimization technique that breaks down a complicated problem into smaller overlapping sub problems in a recursive manner and uses solutions to the sub problems to construct a solution
Patterns The Shortest Path Algorithm for Coding Interviews The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we can determine the
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Got a question?ย Ask the Monster Assistantย anything you don't understand.
Still not clear? ย Submitย the part you don't understand to our editors. Or join ourย Discord and ask the community.