1898. Maximum Number of Removable Characters


Problem Description

You are provided with two strings s and p, where p must be a subsequence of s. There is also an array called removable with distinct indices from s. The goal is to find out the maximum number of characters you can remove from s (using the indices provided in removable from lowest to highest) so that p remains a subsequence of s. The steps for removing characters are as follows: mark characters in s at indices indicated by the first k elements of removable, remove them, and then check if p remains a subsequence of the modified s. A subsequence is a sequence that can be derived from another sequence by deleting some elements without changing the order of the remaining elements.

Intuition

The key to solving this problem lies in understanding that once you remove characters from s, you must check whether each subsequent removal maintains p as a subsequence. Since removable contains distinct indices, and we want to maximize k, we can approach this problem using binary search.

The intuition behind using binary search here is that if removing k characters still leaves p as a subsequence, removing fewer than k characters will also leave p as a subsequence. Conversely, if p is not a subsequence after removing k characters, removing more than k characters will certainly not leave p as a subsequence. Therefore, there is a threshold value for k which we want to find - the maximum k for which p remains a subsequence. Binary search allows us to efficiently home in on this threshold by halving the search space with each iteration and identifying the exact point where p stops being a subsequence.

In the provided solution, the function check(k) takes a number k and simulates the process of marking and removing k characters from s to verify if p remains a subsequence. The binary search then adjusts the search interval (left, right) based on the outcomes of check(k). If p is still a subsequence after removing k characters, it means we could possibly remove more, so the search space is adjusted to [mid, right]. If not, we have removed too many characters, and the search space is adjusted to [left, mid-1]. The search continues until the maximum k is found.

Learn more about Binary Search patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

Which of these properties could exist for a graph but not a tree?

Solution Approach

The implementation makes use of a binary search pattern, specifically what's often referred to as "Binary Search Template 2". This is due to the way the mid point is calculated and how the search space is adjusted. The pattern is conducive when we want to search for the element or condition which requires accessing the current index and its immediate right neighbour in the array.

Here's the breakdown of the solution approach:

  1. Binary Search Algorithm: We want to find the maximum value of k such that removing k characters specified by the first k indices in removable from s leaves p as a subsequence. We perform a binary search over the length of the array removable because the problem has a "yes" or "no" nature (is p still a subsequence after the removal?) and because the removal condition is monotonic—that is, if removing k characters works, then removing any less than k will also work.

  2. check Function: This function simulates the removal of characters from s and checks if p is still a subsequence. It uses two pointers, i for the string s and j for the string p. As it iterates over s, it skips over the characters at the indices specified in the first k entries of removable. If a non-removable character in s matches the current character in p, it moves the pointer j forward. If the end of p is reached (j == n), then p is still a subsequence of s after the removals.

  3. Set Data Structure: A set called ids is used to store the indices from removable that we're proposing to remove. This allows constant-time checks to determine if a character at a particular index in s should be skipped during the subsequence check.

  4. Calculating mid: The variable mid is calculated as (left + right + 1) // 2 to ensure that the search space moves towards the higher numbers when the check(mid) is successful, ultimately helping to identify the upper bound of k.

  5. Adjusting Search Space: Based on the results of check(mid), the binary search narrows the search space. If removing mid characters works, it means we should look for a possibly higher k, so the new search interval is [mid, right]. If not, we search the space [left, mid - 1]. The search ends when left and right meet, which will be the point just before p stops being a subsequence, thus giving us the maximum k.

The combination of these components results in a solution that efficiently pinpoints the largest k for which p remains a subsequence in s, despite the removals.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Which data structure is used to implement recursion?

Example Walkthrough

Let's consider a small example to illustrate the solution approach.

Suppose:

  • s = "abcacb"
  • p = "ab"
  • removable = [3, 1, 0]

Applying Binary Search

  1. Initial Setup: We set up our binary search with bounds left = 0 and right = len(removable) - 1 = 2. The goal is to find the maximum k such that p is still a subsequence of the modified s after the removal.

  2. First Iteration:

    • Calculate mid = (left + right + 1) // 2 = (0 + 2 + 1) // 2 = 1.
    • Use check(1) to simulate the removal of characters at indices removable[0] and removable[1], which are 0 and 1.
    • After removal, s = "_b_acb". p = "ab" is still a subsequence of this new string s.
    • Since check(1) is true, we adjust the search range to [mid, right] which means [1, 2].
  3. Second Iteration:

    • Calculate mid = (1 + 2 + 1) // 2 = 2.
    • Use check(2) to simulate the removal of characters at indices removable[0], removable[1], and removable[2], which are 0, 1, and 3.
    • After removal, s = "_b__cb". p = "ab" is no longer a subsequence of s.
    • Since check(2) is false, we adjust the search range to [left, mid - 1] which means [1, 1].

Since left and right are the same, we conclude the binary search. We determined that the maximum k for which p = "ab" remains a subsequence of s after removal is 1.

Therefore, the maximum number of characters that can be removed from s while ensuring p remains a subsequence is 1 (removing the characters at indices [0, 1] from s).

Solution Implementation

1from typing import List
2
3class Solution:
4    def maximum_removals(self, string: str, pattern: str, removable: List[int]) -> int:
5        def can_form_pattern_after_removals(k: int) -> bool:
6            # Initialize pointers for pattern and string
7            pattern_index = 0
8            string_index = 0
9            # Convert the list of removable indices up to k into a set for faster look-ups
10            removal_set = set(removable[:k])
11            # Iterate through string and pattern
12            while pattern_index < pattern_length and string_index < string_length:
13                # Check if current index is not removed, and characters match
14                if string_index not in removal_set and string[string_index] == pattern[pattern_index]:
15                    # Move to the next character in the pattern
16                    pattern_index += 1
17                # Move to the next character in the string
18                string_index += 1
19            # We can form the pattern if we have gone through all characters of the pattern
20            return pattern_index == pattern_length
21
22        # Get string and pattern lengths
23        string_length, pattern_length = len(string), len(pattern)
24        # Set the search space for the binary search
25        left, right = 0, len(removable)
26        # Perform binary search to find the maximum number of removable characters
27        while left < right:
28            mid = (left + right + 1) // 2  # Pick the middle value
29            # Check if the pattern can be formed after removing mid characters
30            if can_form_pattern_after_removals(mid):
31                left = mid  # If it is possible, try removing more characters
32            else:
33                right = mid - 1  # Otherwise, try with fewer removable characters
34        return left  # The maximum number of removable characters is left
35
36# Explanation:
37# The above code checks the maximum number of indices that can be removed from the string `s`
38# such that it is still possible to find the pattern `p` as a subsequence.
39# It uses binary search to efficiently find this maximum number.
40
1class Solution {
2    // Method to find the maximum number of characters that can be removed from 
3    // string s so that string p is still a subsequence of s
4    public int maximumRemovals(String s, String p, int[] removable) {
5        int left = 0, right = removable.length; // Define binary search boundaries
6      
7        // Perform binary search to find the maximum index 'mid'
8        while (left < right) {
9            int mid = (left + right + 1) / 2; // Choose a middle point (avoid overflow)
10            // Check if string p is still a subsequence of s after removing mid characters
11            if (checkSubsequence(s, p, removable, mid)) {
12                left = mid; // Move left pointer to mid if p is still a subsequence
13            } else {
14                right = mid - 1; // Move right pointer if p is not a subsequence
15            }
16        }
17        return left; // Returns maximum removables that still keep p as a subsequence of s
18    }
19
20    // Helper method to check if p is a subsequence of s after removing certain characters
21    private boolean checkSubsequence(String s, String p, int[] removable, int mid) {
22        int sLen = s.length(); // Length of string s
23        int pLen = p.length(); // Length of string p
24        int i = 0; // Index for iterating over s
25        int j = 0; // Index for iterating over p
26      
27        // Create a hash set to store the indices of characters to be removed
28        Set<Integer> removalSet = new HashSet<>();
29        for (int k = 0; k < mid; k++) {
30            removalSet.add(removable[k]); // Add removable characters' indices up to mid
31        }
32      
33        // Iterate over string s and p to check if p is a subsequence
34        while (i < sLen && j < pLen) {
35            // If current index is not in the removal set and characters match, move j
36            if (!removalSet.contains(i) && s.charAt(i) == p.charAt(j)) {
37                j++;
38            }
39            i++; // Always move i to the next character
40        }
41      
42        // If we've found the whole string p, return true; otherwise, return false
43        return j == pLen;
44    }
45}
46
1#include <vector>
2#include <string>
3#include <unordered_set>
4
5class Solution {
6public:
7    int maximumRemovals(std::string s, std::string p, std::vector<int>& removable) {
8        int left = 0, right = removable.size();
9
10        // Binary search to find the maximum number of characters that can be removed
11        while (left < right) {
12            // Choose the midpoint of the current range
13            int mid = (left + right + 1) / 2;
14          
15            // Check if the subsequence 'p' is still in 's' after removing 'mid' characters
16            if (checkSubsequence(s, p, removable, mid)) {
17                left = mid; // If it is, increase the lower bound of the search range
18            } else {
19                right = mid - 1; // If not, decrease the upper bound of the search range
20            }
21        }
22        return left; // The maximum number of characters we can remove
23    }
24
25    // Helper function to check if 'p' is a subsequence of 's' after removing 'mid' characters
26    bool checkSubsequence(std::string& s, std::string& p, std::vector<int>& removable, int mid) {
27        int sLength = s.size();
28        int pLength = p.size();
29        int i = 0, j = 0; // Pointers for string 's' and the subsequence 'p'
30      
31        // Using a set to keep track of the indices that have been removed
32        std::unordered_set<int> removedIndices;
33        for (int k = 0; k < mid; ++k) {
34            removedIndices.insert(removable[k]);
35        }
36      
37        // Iterate over both strings and check if 'p' is a subsequence of 's'
38        while (i < sLength && j < pLength) {
39            // If the current character is not removed and matches with `p[j]`,
40            // move the pointer `j` of subsequence `p`
41            if (removedIndices.count(i) == 0 && s[i] == p[j]) {
42                ++j;
43            }
44            ++i; // Always move the pointer `i` of string `s`
45        }
46      
47        // If we have iterated through the entire subsequence `p`, it's a subsequence of `s`
48        return j == pLength;
49    }
50};
51
1// Function to determine the maximum number of characters that can be removed
2// from string 's' such that string 'p' is still a subsequence of 's'.
3function maximumRemovals(s: string, p: string, removable: number[]): number {
4    let leftIndex = 0;
5    let rightIndex = removable.length;
6
7    while (leftIndex < rightIndex) {
8        // Midpoint calculation to use as the possible number of characters to be removed
9        let midPoint = Math.floor((leftIndex + rightIndex + 1) / 2);
10
11        // Check if 'p' is still a subsequence after removing 'midPoint' characters
12        if (isSubsequence(s, p, new Set(removable.slice(0, midPoint)))) {
13            leftIndex = midPoint;
14        } else {
15            rightIndex = midPoint - 1;
16        }
17    }
18    // The maximum number of characters that can be removed
19    return leftIndex;
20}
21
22// Function to check if 'sub' is a subsequence of 'str' when indices in 'removedIndices' are removed.
23function isSubsequence(str: string, sub: string, removedIndices: Set<number>): boolean {
24    let strLength = str.length;
25    let subLength = sub.length;
26    let indexStr = 0;
27    let indexSub = 0;
28
29    // Iterate over each character of 'str' and 'sub' to check for subsequence
30    while (indexStr < strLength && indexSub < subLength) {
31        // If the current index is not removed, and characters match, increment 'sub' index
32        if (!removedIndices.has(indexStr) && str.charAt(indexStr) === sub.charAt(indexSub)) {
33            indexSub++;
34        }
35        indexStr++;
36    }
37    // Check if the end of 'sub' was reached, indicating it is a subsequence
38    return indexSub === subLength;
39}
40
Not Sure What to Study? Take the 2-min Quiz:

Which of the tree traversal order can be used to obtain elements in a binary search tree in sorted order?

Time and Space Complexity

Time Complexity:

The time complexity of the function mainly comes from two parts: the binary search and repeatedly checking whether p is a subsequence of s after certain characters have been removed.

The binary search is done over the length of removable, which has a complexity of O(log k) where k is the length of removable. For each iteration of binary search, the check function is called, which iterates over the characters of s and p.

The check function can take O(m + k) time in the worst case, where m is the length of s and k is the length of the prefix of removable considered in the check function. Although k changes with each binary search iteration, we need to consider the worst-case scenario where the function checks the entire prefix, hence k.

Combining these two parts, the overall time complexity is O((m + k) * log k).

Space Complexity:

The space complexity is primarily due to the set ids used in the check function, which stores the indices that are removable up to k. This set can grow up to size k.

Thus, the space complexity is O(k).

Learn more about how to find time and space complexity quickly using problem constraints.

Fast Track Your Learning with Our Quick Skills Quiz:

Which of the following is a min heap?


Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫