3008. Find Beautiful Indices in the Given Array II

HardTwo PointersStringBinary SearchString MatchingHash FunctionRolling Hash

Problem Description

You are given four inputs: a 0-indexed string s, and strings a, b, as well as an integer k. The task is to find all the "beautiful" indices in the string s. An index i is considered "beautiful" if it satisfies the following conditions:

It's within the bounds of s length when considering string a, meaning: 0 <= i <= s.length - a.length.
The substring of s starting at index i and having the length of a is equal to a. Formally, s[i..(i + a.length - 1)] == a.
There exists an index j (for the string b) within the bounds of s, which means: 0 <= j <= s.length - b.length.
The substring of s starting at index j and having the length of b is equal to b. Formally, s[j..(j + b.length - 1)] == b.
The absolute difference between i and j is less than or equal to k, which can be expressed as |j - i| <= k.

The output should be an array of these "beautiful" indices, sorted from the smallest index to the largest.

Intuition

To find the beautiful indices efficiently, we can use the Knuth-Morris-Pratt (KMP) algorithm, which is a string searching (or string matching) algorithm that seeks the occurrences of a "word" W within a main "text string" S by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin, thus bypassing re-examination of previously matched characters.

Here's the intuition broken down:

The code defines two functions build_prefix_function and kmp_search that are classic implementations of the KMP algorithm. The prefix function is used to avoid re-scanning the text s by pre-computing the longest prefix which is also a suffix, for substrings in pattern strings a and b.
After getting prefix arrays for a and b, the kmp_search function is used to find all occurrences of a and b inside the text s.
Once we have occurrences of a and b in s, the algorithm then checks which index of a can be paired with an index of b such that they meet the criteria of being beautiful - their position is not farther than k apart.

The solution involves iterating through each occurrence of a and trying to find the nearest occurrence of b that satisfies the distance constraint. It keeps track of the current positions while scanning through both sets of occurrences to avoid unnecessary comparisons and thus optimize the search.

Learn more about Two Pointers and Binary Search patterns.

Solution Approach

To address the problem, we use the Knuth-Morris-Pratt (KMP) algorithm due to its efficiency in string searching over a naïve approach which can be significantly slower when dealing with large strings. The provided solution comprises two main parts: building the prefix functions for strings a and b, and searching for occurrences of a and b using the KMP search algorithm within the main string s. Let's walk through the implementation of the solution:

Building the Prefix Functions:
- The build_prefix_function function takes a pattern (either a or b) and precomputes an array that contains the lengths of the longest proper prefix that is also a suffix for every prefix of the pattern.
- This array is essential to optimize the KMP search by avoiding the repetition of the comparisons already done.
KMP Search:
- Once the prefix functions are prepared, the kmp_search function scans the string s for the occurrences of patterns a and b.
- When scanning s, if a mismatch is found, the function uses the prefix function to skip characters in the text that are known to match the pattern up to a certain point, effectively reducing the number of comparisons.
- Every time a complete match of the pattern is found in s, the start index is stored in the occurrences list.
Finding Beautiful Indices:
- With the indices of occurrences of a and b in hands, we iterate through each index in the occurrences of a, while simultaneously iterating through the occurrences of b to check for the distance constraint.
- A while loop is used to find the nearest occurrence of b for each occurrence of a. If the condition |resb[j] - resa[i]| <= k is satisfied, then the index resa[i] is a beautiful index, and it is appended to the result list, after which we break to move on to the next index in resa.
- We maintain two pointers, i and j, that move through the lists resa and resb respectively, seeking to find the smallest j such that the distance between a and b is within k.
- To ensure that we don't miss any closer b occurrences, we check if moving j forward gets us closer to the current i, and if not, we have found the nearest b for the current i and move on.

Finally, we return the list res which contains all the beautiful indices sorted from smallest to largest as by their natural order of finding during the scan.

This algorithm is efficient because it uses the precomputed information in the prefix arrays to avoid redundant scanning and comparisons, enabling a linear-time complexity for the search process, which is significantly faster than quadratic time complexity for the brute force method.

Ready to land your dream job?

Unlock your dream job with a 2-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's illustrate the solution approach with a small example. Assume we have a string s = "abcxabcdabxabcdabcdabcy", pattern a = "abcdabcy", and pattern b = "abc" with a distance constraint k = 10.

Building the Prefix Functions:
- We first build the prefix function for a = "abcdabcy". This results in a prefix array [0, 0, 0, 0, 1, 2, 3, 0], indicating where we can safely resume the comparison if a mismatch happens.
- Next, we build the prefix function for b = "abc", leading to [0, 0, 0], meaning there is no proper prefix in b that matches a suffix in the case of a mismatch.
KMP Search:
- Using the kmp_search function, we search for occurrences of a within s. The only match is found when i = 15, which is stored in resa.
- Searching for b in s results in matches at i = 0, i = 6, and i = 12, which are recorded in resb.
Finding Beautiful Indices:
- With resa = [15] and resb = [0, 6, 12], we begin by comparing each index in resa with indices in resb to find any that meet the distance constraint.
- Starting with the first and only index in resa (i = 15), we check the indices in resb. The index j = 12 from resb matches the condition |15 - 12| <= 10. Thus index i = 15 from resa is considered "beautiful".
- The condition is satisfied for j = 12, and it is not necessary to check for j = 6 or j = 0 since these indices would not provide a smaller absolute difference with i = 15 within the constraint k.

The result list res will thus contain the single index [15], indicating the position of the beautiful index in s with respect to a.

This example demonstrates how the KMP algorithm efficiently searches for occurrences of the patterns and how we only need to compare the indices of these occurrences to find the beautiful indices without re-scanning large parts of the string s.

Solution Implementation

1from typing import List
2
3class Solution:
4    def beautifulIndices(self, s: str, a: str, b: str, k: int) -> List[int]:
5        # Helper function to build the prefix function for a given pattern,
6        # used by the Knuth-Morris-Pratt (KMP) algorithm.
7        def build_prefix_function(pattern: str) -> List[int]:
8            prefix_function = [0] * len(pattern)
9            j = 0
10            # Loop through pattern to calculate the longest prefix that is also a suffix.
11            for i in range(1, len(pattern)):
12                while j > 0 and pattern[i] != pattern[j]:
13                    j = prefix_function[j - 1]
14                if pattern[i] == pattern[j]:
15                    j += 1
16                prefix_function[i] = j
17            return prefix_function
18
19        # KMP search algorithm to find all occurrences of a pattern in a text.
20        def kmp_search(pattern: str, text: str, prefix_function: List[int]) -> List[int]:
21            occurrences = []
22            j = 0
23            # Loop through text to find all matches of the pattern.
24            for i in range(len(text)):
25                while j > 0 and text[i] != pattern[j]:
26                    j = prefix_function[j - 1]
27                if text[i] == pattern[j]:
28                    j += 1
29                if j == len(pattern):
30                    occurrences.append(i - j + 1)
31                    j = prefix_function[j - 1]
32            return occurrences
33
34        # Build prefix functions for both patterns a and b.
35        prefix_a = build_prefix_function(a)
36        prefix_b = build_prefix_function(b)
37
38        # Use KMP search to find all occurrences of a and b in string s.
39        matches_a = kmp_search(a, s, prefix_a)
40        matches_b = kmp_search(b, s, prefix_b)
41
42        # List to hold all beautiful indices.
43        beautiful_indices = []
44      
45        # Initialize pointers for iterating through matches of a and b.
46        i, j = 0, 0
47
48        # Process all matches of pattern a.
49        while i < len(matches_a):
50            while j < len(matches_b):
51                # Check if the current match of b is within k distance from match a.
52                if abs(matches_b[j] - matches_a[i]) <= k:
53                    beautiful_indices.append(matches_a[i])
54                    break
55                # If not, and there is another b ahead that's closer to match a,
56                # move to the next match of b.
57                elif j + 1 < len(matches_b) and abs(matches_b[j + 1] - matches_a[i]) < abs(matches_b[j] - matches_a[i]):
58                    j += 1
59                else:
60                    break
61            i += 1
62      
63        # Return the list of beautiful indices.
64        return beautiful_indices
65

1import java.util.ArrayList;
2import java.util.List;
3
4public class Solution {
5
6    // Computes the Longest Prefix Suffix (LPS) array used in the KMP algorithm
7    public void computeLPS(String pattern, int[] lps) {
8        int lengthOfPattern = pattern.length();
9        int length = 0; // length of the previous longest prefix suffix
10
11        lps[0] = 0; // LPS[0] is always 0 because a single character doesn't have a proper prefix and suffix
12
13        int index = 1;
14        while (index < lengthOfPattern) {
15            // check if pattern continued from previous lps
16            if (pattern.charAt(index) == pattern.charAt(length)) {
17                length++;
18                lps[index] = length;
19                index++;
20            } else { // (pattern[i] != pattern[len])
21                // fallback to previous longest proper prefix and suffix
22                if (length != 0) {
23                    length = lps[length - 1];
24                } else { // if no proper prefix and suffix, set to 0
25                    lps[index] = 0;
26                    index++;
27                }
28            }
29        }
30    }
31
32    // KMP algorithm to find all occurrences of a pattern in a text
33    public List<Integer> KMP_codestorywithMIK(String pattern, String text) {
34        int lengthOfText = text.length();
35        int lengthOfPattern = pattern.length();
36        List<Integer> result = new ArrayList<>();
37
38        int[] lps = new int[lengthOfPattern];
39        computeLPS(pattern, lps);
40
41        int indexOfText = 0; // Index for the text
42        int indexOfPattern = 0; // Index for the pattern
43
44        while (indexOfText < lengthOfText) {
45            if (pattern.charAt(indexOfPattern) == text.charAt(indexOfText)) {
46                indexOfText++;
47                indexOfPattern++;
48            }
49
50            // Check if we have found a pattern match
51            if (indexOfPattern == lengthOfPattern) {
52                // Pattern found at index (indexOfText - indexOfPattern); add to result
53                result.add(indexOfText - indexOfPattern);
54                indexOfPattern = lps[indexOfPattern - 1];
55            } 
56            // Mismatch after `indexOfPattern` matches
57            else if (indexOfText < lengthOfText && pattern.charAt(indexOfPattern) != text.charAt(indexOfText)) {
58                // Do not match lps[0..lps[indexOfPattern-1]] characters,
59                // they will match anyway
60                if (indexOfPattern != 0) {
61                    indexOfPattern = lps[indexOfPattern - 1];
62                } else {
63                    indexOfText++;
64                }
65            }
66        }
67
68        return result;
69    }
70
71    // Implements the Lower Bound function to find the smallest index of an element greater or equal to 'target'
72    private int lowerBound(List<Integer> list, int target) {
73        int left = 0, right = list.size() - 1, result = list.size();
74
75        while (left <= right) {
76            int mid = left + (right - left) / 2;
77
78            if (list.get(mid) >= target) {
79                result = mid;
80                right = mid - 1;
81            } else {
82                left = mid + 1;
83            }
84        }
85
86        return result;
87    }
88
89    // Finds indices such that substring starting at those indices and pattern a match,
90    // and there exists a substring within the k-neighborhood that matches pattern b
91    public List<Integer> beautifulIndices(String s, String a, String b, int k) {
92        int lengthOfString = s.length();
93
94        // Get all the starting indices of pattern a in string s
95        List<Integer> aIndices = KMP_codestorywithMIK(a, s);
96        // Get all the starting indices of pattern b in string s
97        List<Integer> bIndices = KMP_codestorywithMIK(b, s);
98
99        List<Integer> result = new ArrayList<>();
100
101        // Loop through all the starting indices of pattern a
102        for (int indexA : aIndices) {
103
104            // Define the boundaries of the k-neighborhood
105            int leftLimit = Math.max(0, indexA - k);
106            int rightLimit = Math.min(lengthOfString - 1, indexA + k);
107
108            // Find the first index in bIndices which is greater or equal to leftLimit
109            int lowerBoundIndex = lowerBound(bIndices, leftLimit);
110
111            // Check if within bounds and add to result if conditions are met
112            if (lowerBoundIndex < bIndices.size() && bIndices.get(lowerBoundIndex) <= rightLimit) {
113                result.add(indexA);
114            }
115        }
116
117        return result;
118    }
119}
120

1#include <vector>
2#include <string>
3#include <algorithm>
4
5class Solution {
6public:
7    // Finds and returns a vector containing the indices of the 's' where
8    // 'patternA' and 'patternB' occur, respecting the condition that any index in
9    // 'patternB' should not be at a distance of more than 'k' from 'patternA'.
10    std::vector<int> beautifulIndices(const std::string& s, const std::string& pattern_a, const std::string& pattern_b, int k) {
11        // Find all occurrences of patternA in s
12        std::vector<int> beautiful_indices_a = kmpSearch(s, pattern_a);
13        // Find all occurrences of patternB in s
14        std::vector<int> beautiful_indices_b = kmpSearch(s, pattern_b);
15
16        // Sort indices of patternB's occurrences for binary search
17        std::sort(beautiful_indices_b.begin(), beautiful_indices_b.end());
18
19        std::vector<int> result;
20        // Iterate over each index where patternA is found
21        for (int index_a : beautiful_indices_a) {
22            // Find the lower bound for valid indices of patternB
23            int left = std::lower_bound(beautiful_indices_b.begin(), beautiful_indices_b.end(), index_a - k) - beautiful_indices_b.begin();
24            // Find the upper bound for valid indices of patternB
25            int right = std::lower_bound(beautiful_indices_b.begin(), beautiful_indices_b.end(), index_a + k + pattern_b.length()) - beautiful_indices_b.begin();
26
27            // Check within the valid range for a beautiful index and add it to the result
28            for (int index_b = left; index_b < right; index_b++) {
29                if (std::abs(beautiful_indices_b[index_b] - index_a) <= k) {
30                    result.push_back(index_a);
31                    break;
32                }
33            }
34        }
35
36        return result;
37    }
38
39private:
40    // Knuth-Morris-Pratt (KMP) Algorithm for String Matching
41    std::vector<int> kmpSearch(const std::string& text, const std::string& pattern) {
42        std::vector<int> indices;
43        std::vector<int> prefix_function = computePrefixFunction(pattern);
44
45        int matched_characters = 0;
46        // Iterate through text to find all occurrences of the pattern
47        for (int i = 0; i < text.length(); i++) {
48            while (matched_characters > 0 && pattern[matched_characters] != text[i]) {
49                matched_characters = prefix_function[matched_characters - 1];
50            }
51            if (pattern[matched_characters] == text[i]) {
52                matched_characters++;
53            }
54            if (matched_characters == pattern.length()) {
55                // Pattern found, push starting index to the list
56                indices.push_back(i - matched_characters + 1);
57                matched_characters = prefix_function[matched_characters - 1];
58            }
59        }
60
61        return indices;
62    }
63
64    // Preprocessing function to calculate the longest prefix suffix (LPS) values
65    std::vector<int> computePrefixFunction(const std::string& pattern) {
66        int m = pattern.length();
67        std::vector<int> prefix_function(m, 0);
68        int length_of_longest_prefix_suffix = 0;
69        // Applying KMP preprocessing to calculate LPS values
70        for (int i = 1; i < m; i++) {
71            while (length_of_longest_prefix_suffix > 0 && pattern[length_of_longest_prefix_suffix] != pattern[i]) {
72                length_of_longest_prefix_suffix = prefix_function[length_of_longest_prefix_suffix - 1];
73            }
74            if (pattern[length_of_longest_prefix_suffix] == pattern[i]) {
75                length_of_longest_prefix_suffix++;
76            }
77            prefix_function[i] = length_of_longest_prefix_suffix;
78        }
79        return prefix_function;
80    }
81};
82

1function beautifulIndices(s: string, patternA: string, patternB: string, k: number): number[] {
2    // Find all occurrences of patternA in s using KMP algorithm
3    let indicesOfA: number[] = kmpSearch(s, patternA);
4    // Find all occurrences of patternB in s using KMP algorithm
5    let indicesOfB: number[] = kmpSearch(s, patternB);
6
7    // Sort indices of patternB's occurrences for binary search
8    indicesOfB.sort((a, b) => a - b);
9
10    let result: number[] = [];
11    // Iterate over each index where patternA is found
12    for (let indexA of indicesOfA) {
13        // Find the lower bound for valid indices of patternB
14        let left = lowerBound(indicesOfB, indexA - k);
15        // Find the upper bound for valid indices of patternB
16        let right = lowerBound(indicesOfB, indexA + k + patternB.length);
17
18        // Check within the valid range for a beautiful index and add it to the result
19        for (let i = left; i < right; i++) {
20            if (Math.abs(indicesOfB[i] - indexA) <= k) {
21                result.push(indexA);
22                break;
23            }
24        }
25    }
26
27    return result;
28}
29
30// Implementation of KMP search algorithm
31function kmpSearch(text: string, pattern: string): number[] {
32    let indices: number[] = [];
33    let prefixFunction: number[] = computePrefixFunction(pattern);
34
35    let matchedChars = 0;
36    // Iterate through text to find all occurrences of the pattern
37    for (let i = 0; i < text.length; i++) {
38        while (matchedChars > 0 && pattern[matchedChars] !== text[i]) {
39            matchedChars = prefixFunction[matchedChars - 1];
40        }
41        if (pattern[matchedChars] === text[i]) {
42            matchedChars++;
43        }
44        if (matchedChars === pattern.length) {
45            // Pattern found, push starting index to the list
46            indices.push(i - matchedChars + 1);
47            matchedChars = prefixFunction[matchedChars - 1];
48        }
49    }
50
51    return indices;
52}
53
54// Preprocessing function to calculate the prefix function (longest prefix suffix array)
55function computePrefixFunction(pattern: string): number[] {
56    let m: number = pattern.length;
57    let prefixFunction: number[] = new Array(m).fill(0);
58    let len: number = 0; // length of the longest prefix suffix
59    // Applying KMP preprocessing to calculate prefix function values
60    for (let i = 1; i < m; i++) {
61        while (len > 0 && pattern[len] !== pattern[i]) {
62            len = prefixFunction[len - 1];
63        }
64        if (pattern[len] === pattern[i]) {
65            len++;
66        }
67        prefixFunction[i] = len;
68    }
69    return prefixFunction;
70}
71
72// Helper function to find the lower bound of a value in a sorted array
73function lowerBound(arr: number[], value: number): number {
74    let low: number = 0;
75    let high: number = arr.length;
76    while (low < high) {
77        let mid: number = Math.floor((low + high) / 2);
78        if (arr[mid] < value) {
79            low = mid + 1;
80        } else {
81            high = mid;
82        }
83    }
84    return low;
85}
86

Time and Space Complexity

The given Python code implements the Knuth-Morris-Pratt (KMP) string-searching algorithm to find 'beautiful' indices in a string based on two patterns a and b, and a distance constraint k.

Time Complexity

Building Prefix Functions: The build_prefix_function function constructs the prefix table for pattern a and pattern b. The complexity of building the prefix function for a pattern of length m is O(m). Since we build it for both patterns a and b, the total time for this step is O(len(a) + len(b)).
KMP Search: The kmp_search function uses the prefix table to search the occurrences of patterns a and b in string s. The KMP search runs in O(n + m) for a pattern of length m and text of length n. Since we search for both patterns a and b separately in string s, the complexity is O(len(s) + len(a)) + O(len(s) + len(b)), which simplifies to O(2*len(s) + len(a) + len(b)).
Finding Beautiful Indices: In the worst case, every index can be a match, leading to O(len(s)) matches for both patterns. Iterating through all matches of a and checking for the condition with matches of b has the worst-case complexity of O(len(s)^2) since for each index in resa, we potentially compare it to all indices in resb.

Combining all the steps, the overall worst-case time complexity is O(len(a) + len(b) + 2*len(s) + len(s)^2). Since O(len(s)^2) is the dominant term, we can simplify the total time complexity to O(len(s)^2).

Space Complexity

Prefix Functions: The space used to store the prefix tables of patterns a and b is proportional to the lengths of a and b, making the space complexity for this step O(len(a) + len(b)).
Occurrences: The space used to store the occurrences of a and b in s can be up to O(len(s)) for both resa and resb in the worst case where each character of s starts a match.
Final Result: We store the final result in res, which in the worst case can also be of size O(len(s)).

Hence, the total space complexity of the algorithm is O(len(a) + len(b) + 3*len(s)). Simplifying, we get O(len(a) + len(b) + len(s)), with O(len(s)) being the dominant term if len(s) is much longer than patterns a and b.

Learn more about how to find time and space complexity quickly using problem constraints.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan: