2930. Number of Strings Which Can Be Rearranged to Contain Substring


Problem Description

You are given an integer n. The task is to determine the total number of "good" strings of length n. A string is considered good if it contains only lowercase English characters and can be rearranged in such a way that the resulting string has "leet" as a contiguous sequence of characters (also known as a substring). For instance, the string "lteer" is good because it can be rearranged to form "leetr", which contains "leet" as a substring.

It is required to calculate the total number of such good strings and since this number can be quite large, the answer should be given modulo 10^9 + 7.

A "substring" is a specified sequence of characters within a string that is formed by deleting any characters before or after the sequence. This problem, therefore, challenges us to figure out all possible combinations of "good" strings given a specific length.

Intuition

To determine a solution for this problem, we can use a memorization search technique that employs a Dynamic Programming (DP) approach. The idea is to recursively construct strings and determine if they can potentially form a good string by including "leet" as they progress.

To achieve this, we can define a recursive function dfs(i, l, e, t) that tracks the number of ways to form good strings with i remaining positions to fill and with at least l 'l's, e 'e's, and t 't's already planned to be a part of the string. The base case of this recursive function is when i is zero, which means all positions have been filled. If at this point, the counts of 'l', 'e', and 't' correspond to the string "leet" (l = 1, e = 2, t = 1), we have successfully formed one good string.

As we construct the string, at every step we can:

  1. Add any lowercase letter that is not 'l', 'e', 't', of which there are 23 possibilities, without changing the counts of 'l', 'e', 't' (dfs(i - 1, l, e, t) * 23).
  2. Add an 'l' to increment the count of 'l's by 1, capped at 1, because we need only one 'l' for "leet" (dfs(i - 1, min(1, l + 1), e, t)).
  3. Add an 'e' to increment the count of 'e's by 1, capped at 2, since we need two 'e's for "leet" (dfs(i - 1, l, min(2, e + 1), t)).
  4. Add a 't' to increment the count of 't's by 1, capped at 1, as we need only one 't' for "leet" (dfs(i - 1, l, e, min(1, t + 1))).

All these possibilities need to be added up and taken modulo 10^9 + 7 to avoid integer overflow. To improve efficiency and avoid recalculating the same states, we memoize the results of the dfs function calls. The entry point for our search is dfs(n, 0, 0, 0), which signifies the start of constructing the string with n positions to fill and no 'l', 'e', or 't' counted yet.

The Python code provided uses the @cache decorator from Pythonโ€™s functools to automatically handle memorization, reducing the need for implementing explicit memoization logic.

Learn more about Math, Dynamic Programming and Combinatorics patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece๏ผš

Given a sorted array of integers and an integer called target, find the element that equals to the target and return its index. Select the correct code that fills the ___ in the given code snippet.

1def binary_search(arr, target):
2    left, right = 0, len(arr) - 1
3    while left ___ right:
4        mid = (left + right) // 2
5        if arr[mid] == target:
6            return mid
7        if arr[mid] < target:
8            ___ = mid + 1
9        else:
10            ___ = mid - 1
11    return -1
12
1public static int binarySearch(int[] arr, int target) {
2    int left = 0;
3    int right = arr.length - 1;
4
5    while (left ___ right) {
6        int mid = left + (right - left) / 2;
7        if (arr[mid] == target) return mid;
8        if (arr[mid] < target) {
9            ___ = mid + 1;
10        } else {
11            ___ = mid - 1;
12        }
13    }
14    return -1;
15}
16
1function binarySearch(arr, target) {
2    let left = 0;
3    let right = arr.length - 1;
4
5    while (left ___ right) {
6        let mid = left + Math.trunc((right - left) / 2);
7        if (arr[mid] == target) return mid;
8        if (arr[mid] < target) {
9            ___ = mid + 1;
10        } else {
11            ___ = mid - 1;
12        }
13    }
14    return -1;
15}
16

Solution Approach

The provided Python solution uses a depth-first search (DFS) function dfs that employs recursion and memorization to compute the number of good strings of length i. Memorization is a technique that stores the results of expensive function calls and reuses these results when the same inputs occur again, avoiding the need for repeated computations.

The algorithm can be broken down into several steps:

  1. Define the recursive function dfs(i, l, e, t). This function calculates the number of good strings when there are i positions left to fill, making sure that by the end of the string construction, there are at least l 'l' characters, e 'e' characters, and t 't' characters.

  2. If i is zero, check if we have the exact counts of 'l', 'e', and 't' required to form the substring "leet". If l == 1, e == 2, and t == 1, return 1, signifying one good string has been found; otherwise, return 0 for no valid string.

  3. If i is not zero, continue constructing the string by trying all possible next moves:

    • Add any of the 23 other characters: dfs(i - 1, l, e, t) * 23.

    • Add an 'l': dfs(i - 1, min(1, l + 1), e, t).

    • Add an 'e': dfs(i - 1, l, min(2, e + 1), t).

    • Add a 't': dfs(i - 1, l, e, min(1, t + 1)).

    • Take all these possibilities, sum them up and apply modulus with 10^9 + 7.

  4. The @cache decorator from Pythonโ€™s functools module is used to automatically memorize the results of dfs function calls, eliminating the necessity for an explicit memoization data structure.

  5. The recursion starts from dfs(n, 0, 0, 0) which represents the starting state with n characters to place and no 'l', 'e', or 't' characters placed yet.

The DFS algorithm used in the solution is efficient in exploring all possible string configurations, and due to memorization, it ensures that each unique state is computed only once. This approach significantly reduces the total number of recursive calls, as the search space is pruned by returning cached results for previously encountered states.

Since the problem involves finding a substring in permutations of a string, we implement a strategy that only tracks the number of certain needed characters, leveraging combinatorics and the knowledge that the order of the other characters does not matter.

Ultimately, the technique optimizes our DFS algorithm, enabling it to compute the total number of valid permutations that include the substring "leet" within the constraints of the problem.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Which data structure is used to implement recursion?

Example Walkthrough

Let's illustrate the solution approach with a small example by finding the number of "good" strings of length n = 3.

To identify "good" strings of length 3, we must recognize that "leet" cannot be formed as a substring because "leet" itself is of length 4 which is longer than our string length. Therefore, the number of good strings for any n < 4 should be zero as it is impossible to form the substring "leet".

Let's walk through this with our method dfs(i, l, e, t), where i is the length of the string still to be constructed and l, e, and t keep track of the number of occurrences of each respective character, ensuring that by the end they match or exceed the counts in the word "leet".

Starting with dfs(3, 0, 0, 0), we explore the following states:

  • If we add a non-'leet' letter, we call dfs(2, 0, 0, 0) and multiply the result by 23.
  • If we add an 'l', we call dfs(2, 1, 0, 0).
  • If we add an 'e', we call dfs(2, 0, 1, 0).
  • If we add a 't', we call dfs(2, 0, 0, 1).

However, no matter how we choose the characters in these recursive steps, we will eventually reach the base case dfs(0, l, e, t) for some combination of l, e, and t.

For the example where n = 3, the base case dfs(0, l, e, t) will never have l == 1, e == 2, t == 1 because we don't have enough characters to meet these requirements. So, every base case returns 0. As a result, the overall sum will also be zero because any number multiplied by zero is zero.

Thus, in this specific example, the total number of "good" strings of length 3 is zero.

For a greater length n >= 4, the dfs function would work similarly by exploring all possible combinations, except in some cases, it would return 1 when the base case is reached and the required number of 'l', 'e', and 't' characters are present, indicating a "good" string has been formed.

In conclusion, this DFS and memoization method provides a systematic way to count all "good" strings up to a certain length by building them character by character, while memoization optimizes the process by remembering the counts for substrings already assessed. This walk-through is a basic illustration that shows the algorithm would produce a result of zero for any string length less than 4 as those cannot possibly contain the substring "leet".

Solution Implementation

1class Solution:
2    def stringCount(self, n: int) -> int:
3        # Use functools cache decorator to memoize the results of subproblems
4        from functools import cache
5
6        # Define a helper function for depth-first search to count valid strings
7        @cache
8        def dfs(index: int, count_l: int, count_e: int, count_t: int) -> int:
9            # Base case: When index is 0, check if counts of l, e, t are valid
10            if index == 0:
11                return int(count_l == 1 and count_e == 2 and count_t == 1)
12
13            # Calculate possible strings without adding any of 'leet'
14            a = dfs(index - 1, count_l, count_e, count_t) * 23 % mod
15
16            # Calculate possible strings with one additional 'l'
17            b = dfs(index - 1, min(1, count_l + 1), count_e, count_t)
18
19            # Calculate possible strings with one additional 'e'
20            c = dfs(index - 1, count_l, min(2, count_e + 1), count_t)
21
22            # Calculate possible strings with one additional 't'
23            d = dfs(index - 1, count_l, count_e, min(1, count_t + 1))
24
25            # Return the sum of different possibilities, modulo mod
26            return (a + b + c + d) % mod
27
28        # Define the modulo constant
29        mod = 10**9 + 7
30
31        # Start the DFS from index n with the counts of 'l', 'e', 't' all zero
32        return dfs(n, 0, 0, 0)
33
1class Solution {
2    private static final int MOD = (int) 1e9 + 7; // Declare the modulus value as a constant
3    private Long[][][][] memoizationCache; // Define a 4D array for memoization
4
5    public int stringCount(int n) {
6        memoizationCache = new Long[n + 1][2][3][2]; // Initialize the memoization array
7        return (int) depthFirstSearch(n, 0, 0, 0); // Start the DFS with initial parameters
8    }
9
10    // Helper DFS method with memoization
11    private long depthFirstSearch(int remainingChars, int hasL, int hasE, int hasT) {
12        // Base case: if we've placed all characters, check if we meet the conditions
13        if (remainingChars == 0) {
14            return (hasL == 1 && hasE == 2 && hasT == 1) ? 1 : 0;
15        }
16        // Check if we've computed this state before to avoid recomputation
17        if (memoizationCache[remainingChars][hasL][hasE][hasT] != null) {
18            return memoizationCache[remainingChars][hasL][hasE][hasT];
19        }
20
21        // Recursive case: Try placing a non-LET character ('*' can place 23 different chars)
22        long placeNonLET = depthFirstSearch(remainingChars - 1, hasL, hasE, hasT) * 23 % MOD;
23
24        // Try placing 'L' if it wasn't placed before
25        long placeL = depthFirstSearch(remainingChars - 1, Math.min(1, hasL + 1), hasE, hasT);
26
27        // Try placing 'E' if it wasn't placed twice before
28        long placeE = depthFirstSearch(remainingChars - 1, hasL, Math.min(2, hasE + 1), hasT);
29
30        // Try placing 'T' if it wasn't placed before
31        long placeT = depthFirstSearch(remainingChars - 1, hasL, hasE, Math.min(1, hasT + 1));
32
33        // Save the result in the cache after combining all possibilities
34        return memoizationCache[remainingChars][hasL][hasE][hasT] = (placeNonLET + placeL + placeE + placeT) % MOD;
35    }
36}
37
1class Solution {
2public:
3    int stringCount(int n) {
4        const int MOD = 1e9 + 7; // The modulus value for avoiding integer overflow.
5      
6        // Define typedef for easy use of long long type.
7        using ll = long long;
8      
9        // f[i][l][e][t] represents the number of ways to form a string with conditions:
10        // i = number of characters left to place, 
11        // l = whether a 'c' has been placed before (1) or not (0), 
12        // e = counts 'b' before first 'c', capped at 2.
13        // t = counts 'a's after the last 'b' and before first 'c', capped at 1.
14        ll f[n + 1][2][3][2];
15      
16        // Initialize the memory to -1, which denotes unused states.
17        memset(f, -1, sizeof(f));
18      
19        // The dfs (depth-first search) function recursively counts the number of valid strings.
20        function<ll(int, int, int, int)> dfs = [&](int i, int hasC, int bCount, int aCount) -> ll {
21            // Base case: if no characters are left to place, 
22            // check if the required conditions are met.
23            if (i == 0) {
24                return (hasC == 1 && bCount == 2 && aCount == 1) ? 1 : 0;
25            }
26          
27            // If this state has already been computed, return the stored value.
28            if (f[i][hasC][bCount][aCount] != -1) {
29                return f[i][hasC][bCount][aCount];
30            }
31          
32            // Recursive case: try all possible next moves and accumulate results.
33            // Place one of the 23 other letters when there's no 'c' influence.
34            ll caseOther = dfs(i - 1, hasC, bCount, aCount) * 23 % MOD;
35          
36            // Place the letter 'c' if it hasn't been placed yet.
37            ll caseC = hasC == 0 ? dfs(i - 1, 1, bCount, aCount) % MOD : 0;
38          
39            // Place the letter 'b' if we haven't placed two 'b's before the first 'c' yet.
40            ll caseB = bCount < 2 ? dfs(i - 1, hasC, bCount + 1, aCount) % MOD : 0;
41          
42            // Place the letter 'a' if 'a' has not been placed after the last 'b' before the first 'c'.
43            ll caseA = aCount == 0 && bCount == 2 ? dfs(i - 1, hasC, bCount, aCount + 1) % MOD : 0;
44          
45            // Sum the above four cases and take modulus to avoid overflow.
46            // Then store the result in the memoization table before returning.
47            return f[i][hasC][bCount][aCount] = (caseOther + caseC + caseB + caseA) % MOD;
48        };
49      
50        // Kick off the recursion with the initial state.
51        return dfs(n, 0, 0, 0);
52    }
53};
54
1function stringCount(n: number): number {
2    const mod = 10 ** 9 + 7;  // Define the modulus to keep numbers within bounds
3
4    // Initialize a 4-dimensional array for memoization, with default value -1
5    const memo: number[][][][] = Array.from({ length: n + 1 }, () =>
6        Array.from({ length: 2 }, () =>
7            Array.from({ length: 3 }, () => Array.from({ length: 2 }, () => -1)),
8        ),
9    );
10
11    // Recursive function to calculate the number of strings with conditions
12    const dfs = (index: number, hasL: number, countE: number, hasT: number): number => {
13        if (index === 0) {
14            // If the required conditions hold, return 1, otherwise 0
15            return hasL === 1 && countE === 2 && hasT === 1 ? 1 : 0;
16        }
17
18        // Check the memo array to avoid recomputing
19        if (memo[index][hasL][countE][hasT] !== -1) {
20            return memo[index][hasL][countE][hasT];
21        }
22
23        // Recursive steps considering 4 scenarios:
24        // a) No new character 'L', 'E', or 'T' i.e., any other character 23 times (since there are 26-3 possible characters)
25        const a = (dfs(index - 1, hasL, countE, hasT) * 23) % mod;
26      
27        // b) Adding a new 'L' if we haven't had an 'L' before
28        const b = dfs(index - 1, Math.min(1, hasL + 1), countE, hasT);
29      
30        // c) Adding a new 'E' if we haven't exceeded the limit of 2 'E's
31        const c = dfs(index - 1, hasL, Math.min(2, countE + 1), hasT);
32      
33        // d) Adding a new 'T' if we haven't had a 'T' before
34        const d = dfs(index - 1, hasL, countE, Math.min(1, hasT + 1));
35      
36        // Store the computed value in the memo array, so it's available for the next calls
37        return memo[index][hasL][countE][hasT] = (a + b + c + d) % mod;
38    };
39
40    // Kick off the recursive function
41    return dfs(n, 0, 0, 0);
42}
43
Not Sure What to Study? Take the 2-min Quiz๏ผš

Which algorithm should you use to find a node that is close to the root of the tree?

Time and Space Complexity

The provided code uses a depth-first search (DFS) approach to count the number of strings of a specific length n that contain a particular combination of letters with memoization (note the @cache decorator).

Time Complexity

The function dfs is called recursively with different parameter states (i, l, e, t). Here, i tracks the current string length, while l, e, and t track the occurrences of specific characters. At every function call to dfs, it branches out into four recursive calls unless i == 0, i.e., the base case is reached. However, due to memoization, each unique state (i, l, e, t) is computed only once.

Since l, e, and t have upper limits (1, 2, and 1, respectively), the total number of states is bounded by n * 2 * 3 * 2, where n is the string length. This is because l can be 0 or 1 (2 states), e can be 0, 1, or 2 (3 states), and t can be 0 or 1 (2 states).

Hence, the total time complexity is O(n) because each state is solved only once, and we have a linear number of states with respect to n.

Space Complexity

Due to memoization, we need to store the result of each unique state (i, l, e, t) only once. As found in the time complexity analysis, there are n * 2 * 3 * 2 unique states. Since we store an integer for each state, and the size directly scales with n, the space complexity is O(n).

In summary, the analysis concurs with the reference answer for both time and space complexity of the code.

Learn more about how to find time and space complexity quickly using problem constraints.

Fast Track Your Learning with Our Quick Skills Quiz:

Which of the following uses divide and conquer strategy?


Recommended Readings


Got a question?ย Ask the Teaching Assistantย anything you don't understand.

Still not clear? Ask in the Forum, ย Discordย orย Submitย the part you don't understand to our editors.

โ†
โ†‘TA ๐Ÿ‘จโ€๐Ÿซ