2156. Find Substring With Given Hash Value

HardStringSliding WindowHash FunctionRolling Hash
Leetcode Link

Problem Description

In this problem, we're given a string s and four integers named power, modulo, k, and hashValue. The objective is to find a specific substring of s of length k whose hash, calculated by a given hash function, equals hashValue. The hash function multiplies the value of each character in the substring (decided by its position in the alphabet where 'a' = 1, 'b' = 2, ..., 'z' = 26) by power raised to the index of the character, sums these products, and then takes the modulo with modulo.

More formally, for a substring of s starting at index i and ending at index i + k - 1, where 0 ≤ i ≤ len(s) - k, the hash is defined as:

1(hash(s[i]) * power^0 + hash(s[i + 1]) * power^1 + ... + hash(s[i + k - 1]) * power^(k-1)) mod modulo

We must find the first such substring from the left that matches the given hashValue.

Intuition

The problem can be solved by using a sliding window technique that moves across the string from right to left. Since a straightforward approach would lead to a time complexity of O(n * k) which could be prohibitive for large strings, we need to devise a more efficient way to update the hash value as we slide the window. The algorithm uses the property that the hash of a window (substring) depends on the character leaving the window and the character entering it.

The solution relies on calculating the hash backwards, starting from the end of the string. As we move the window to the left by one character, we update the calculated hash by subtracting the value of the character that is leaving the window, multiplying the entire hash by power (to account for the positional shifts), and adding the value of the new character entering the window on the left. We take the modulo after each operation to ensure the result stays within bounds considering the modulo parameter.

Calculating power^k for each new window would be computationally expensive, so we compute it only once at the beginning, and as we move the window, we use this precomputed value to subtract the influence of the character that left the window.

The implementation uses BigInt for arithmetic operations, to handle the case where the numbers get very large. Each time when a new hash matches the given hashValue, we update the position of the answer. Once we've scanned through the string, we extract the substring from the correct position and return it. If by any chance we don't find any such substring, which shouldn't happen according to the problem statement, we'd return an empty string.

Learn more about Sliding Window patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

How does quick sort divide the problem into subproblems?

Solution Approach

The solution approach involves the use of the sliding window technique along with modular arithmetic principles to efficiently calculate and update the hash values of substrings. Let's dissect the solution code to understand its implementation:

  1. The power and modulo parameters are converted to BigInt, as we will be dealing with potentially large integers due to the hash calculations and the power operations.

  2. The pk variable is initialized to 1n, which represents power^k in BigInt form. This will be used later to discount the character value that is leaving the window as we slide.

  3. The ac variable holds the current hash value of the windowed substring as it slides. It is also BigInt.

  4. We first compute the hash value of the last k characters of the string s, since we are dealing with a reverse iteration. We multiply ac by power and add the character value of the current character using the getCode function, then take the remainder after dividing by modulo.

  5. If the hash value matches hashValue at this initial position, we mark this position as ans. It is the starting index of a substring with the desired hash value.

  6. Then, for each iteration moving the window one step to the left, we perform the following operations:

    • We calculate pre, which is the modular value that needs to be removed from the current hash. It represents the influence of the character that is about to leave the window.
    • Then we update ac by removing pre, and adding the new character's value, after which we multiply by power. We also need to add modulo before taking the remainder to ensure we are dealing with non-negative numbers.
  7. If the newly calculated ac equals hashValue, we update ans with the current starting index of the windowed substring. Since we are iterating from right to left, the first match we find will be the first from the left in the string s.

  8. The getCode helper function calculates the character value based on the alphabet index: str.charCodeAt(index) - 'a'.charCodeAt(0) + 1.

After the loop is finished, ans will hold the starting index of the correct substring or -1 if no such substring is found (as per the problem constraints, an answer always exists). We return the substring using s.substring(ans, ans + k) if a match is found; otherwise, we return an empty string.

The elegance of this approach is in the sliding window and the way we update the hash value in constant time, avoiding recomputation from scratch for each new window. This optimizes the solution and allows it to run efficiently even for large inputs.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Which algorithm should you use to find a node that is close to the root of the tree?

Example Walkthrough

Let's consider a small example to illustrate the solution approach:

Suppose we have a string s = "abcde", and the integers power = 2, modulo = 100, k = 2, and hashValue = 9.

Here, the character values based on their position in the alphabet are 'a' = 1, 'b' = 2, 'c' = 3, 'd' = 4, and 'e' = 5.

The hash of a substring is calculated as follows:

1(hash(s[i]) * power^0 + hash(s[i + 1]) * power^1 + ... + hash(s[i + k - 1]) * power^(k-1)) mod modulo

We want to find a substring of length k = 2 whose hash value is 9.

  1. Convert power and modulo into BigInt: power = 2n and modulo = 100n.

  2. The initial value of pk (which represents power^k) will be 2^2 = 4. So pk = 4n.

  3. Calculate the hash value ac of the last 2 characters 'de': ac = (4 * power^0 + 5 * power^1) mod modulo = (4 + 5 * 2) mod 100 = (4 + 10) mod 100 = 14 mod 100 = 14. Since 14 is not equal to hashValue, we proceed.

  4. Now slide the window to the left to compute the hash for the substring 'cd':

    • pre = hash(s[i + k - 1]) * pk mod modulo = 4 * 4 mod 100 = 16 mod 100 = 16.
    • Update ac: ac = (ac * power - pre + hash(s[i])) mod modulo = (14 * 2 - 16 + 3) mod 100 = (28 - 16 + 3) mod 100 = 15 mod 100 = 15. Again, 15 is not equal to hashValue, so we continue to move the window.
  5. Next, for 'bc':

    • pre = hash(s[i + k - 1]) * pk mod modulo = 3 * 4 mod 100 = 12 mod 100 = 12.
    • Update ac: ac = (ac * power - pre + hash(s[i])) mod modulo = (15 * 2 - 12 + 2) mod 100 = (30 - 12 + 2) mod 100 = 20 mod 100 = 20. 20 does not match hashValue.
  6. Lastly, check 'ab':

    • pre = hash(s[i + k - 1]) * pk mod modulo = 2 * 4 mod 100 = 8 mod 100 = 8.
    • Update ac: ac = (ac * power - pre + hash(s[i])) mod modulo = (20 * 2 - 8 + 1) mod 100 = (40 - 8 + 1) mod 100 = 33 mod 100 = 33. 33 does not match hashValue.

In this case, no substring of length 2 with the hash value of 9 can be found using the hash function described. However, this is purely an illustrative example, and according to the problem constraints, there will always be a solution.

Here, the power of the described method lies in its constant-time updates of the hash value for each sliding step, rather than recalculating from scratch. Each step involves just a few arithmetic operations, regardless of k, which makes the approach efficient for larger data.

Solution Implementation

1def sub_str_hash(s, power, modulo, k, hash_value):
2    """
3    This function searches for a substring in a given string 's' where the hash value of the
4    substring matches 'hash_value'. It uses a sliding window and rolling hash technique to 
5    compute the hash of substrings in an efficient manner.
6  
7    :param s: The string to search within.
8    :param power: The base value for hashing.
9    :param modulo: The modulo value for hashing to prevent overflow.
10    :param k: The length of the substring to find.
11    :param hash_value: The target hash value.
12    :return: The first substring that matches the hash_value, or an empty string if not found.
13    """
14    power_bigint = pow(power, 1, modulo)  # Power without 'n', in Python we use pow with 3 arguments to manage modulo
15    hash_value_bigint = hash_value  # No need for conversion, in Python all integers are of arbitrary size
16    n = len(s)
17    power_k = 1
18    rolling_hash = 0
19  
20    # Prepare the hash for the last 'k' characters of the string 's'
21    for i in range(n - 1, n - k - 1, -1):  # Use range for reverse counting
22        rolling_hash = (rolling_hash * power_bigint + get_char_value(s, i)) % modulo
23        power_k = (power_k * power_bigint) % modulo
24  
25    found_position = -1
26    # If the rolling hash of the last 'k' characters matches 'hash_value', store this position
27    if rolling_hash == hash_value_bigint:
28        found_position = n - k
29  
30    # Slide the window, now processing from right to left
31    for i in range(n - k - 1, -1, -1):
32        pre_value = (get_char_value(s, i + k) * power_k) % modulo
33        rolling_hash = (rolling_hash * power_bigint + get_char_value(s, i) - pre_value + modulo) % modulo
34        if rolling_hash == hash_value_bigint:
35            found_position = i
36  
37    if found_position == -1:
38        return ''
39    else:
40        return s[found_position: found_position + k]
41
42def get_char_value(str, index):
43    """
44    This function computes the hash code of a character at a specific index in a string.
45  
46    :param str: The string which contains the character.
47    :param index: The index of the character.
48    :return: The hash code of the character.
49    """
50    return ord(str[index]) - ord('a') + 1
51
52# Example usage:
53print(sub_str_hash("abcdef", 31, 100, 3, 53))  # Example output: "abc"
54
1public class SubStringHasher {
2
3    /**
4     * Converts a character to its corresponding hash value.
5     *
6     * @param ch - The character to be converted.
7     * @return - The hash code of the character.
8     */
9    private static long getCharValue(char ch) {
10        return ch - 'a' + 1;
11    }
12
13    /**
14     * Finds the substring of a given string 's' where the hash value of the substring
15     * matches 'hashValue', using a sliding window and rolling hash technique.
16     *
17     * @param s         - The string to search within.
18     * @param power     - The base value for hashing.
19     * @param modulo    - The modulo value for hashing to avoid overflow.
20     * @param k         - The length of the substring to find.
21     * @param hashValue - The target hash value.
22     * @return - The first substring matching the hashValue, or an empty string if not found.
23     */
24    public String subStrHash(String s, int power, int modulo, int k, int hashValue) {
25        long powerLong = power;
26        long moduloLong = modulo;
27        long hashValueLong = hashValue;
28        int n = s.length();
29
30        long powerK = 1;
31        long rollingHash = 0;
32
33        // Prepare the initial value of powerK to be power^k mod modulo
34        for (int i = 0; i < k; i++) {
35            powerK = (powerK * powerLong) % moduloLong;
36        }
37
38        // Calculate the hash of the last k characters of string 's'
39        for (int i = n - 1; i >= n - k; i--) {
40            rollingHash = (rollingHash * powerLong + getCharValue(s.charAt(i))) % moduloLong;
41        }
42
43        // Start as "not found"
44        int foundPosition = -1;
45
46        // Check if the initial hash already matches the target hash
47        if (rollingHash == hashValueLong) {
48            foundPosition = n - k;
49        }
50
51        // Slide the window from the end to the beginning of the string 's'
52        for (int i = n - k - 1; i >= 0; i--) {
53            // Remove the value of the previous character and add the next character's value to update rolling hash
54            rollingHash =
55                    ((rollingHash - getCharValue(s.charAt(i + k)) * powerK % moduloLong + moduloLong) * powerLong
56                            + getCharValue(s.charAt(i))) % moduloLong;
57
58            // If a matching hash is found, record the position
59            if (rollingHash == hashValueLong) {
60                foundPosition = i;
61            }
62        }
63
64        // Return the found substring or an empty string if no match is found
65        return foundPosition != -1 ? s.substring(foundPosition, foundPosition + k) : "";
66    }
67}
68
1#include <string>
2
3/**
4 * Calculates the rolling hash value of a substring.
5 * Uses a reverse sliding window and rolling hash technique to efficiently calculate the hash of substrings.
6 * 
7 * @param s The string to search within.
8 * @param power The base value for hashing.
9 * @param modulo The modulo value for hashing to prevent overflow.
10 * @param k The length of the substring to find.
11 * @param hash_value The target hash value.
12 * @return The first substring matching the hash value or an empty string if not found.
13 */
14std::string subStrHash(const std::string& s, int power, int modulo, int k, int hash_value) {
15    long long power_k = 1;
16    long long rolling_hash = 0;
17    int n = s.length();
18  
19    // Calculate the power of base raised to the length of the substring (k)
20    for (int i = 0; i < k; ++i) {
21        power_k = (power_k * power) % modulo;
22    }
23  
24    // Calculate the hash of the last k characters of the string using the reverse sliding window technique
25    // Start from the end and work backwards
26    for (int i = n - 1; i >= n - k; --i) {
27        rolling_hash = (rolling_hash * power + getCharValue(s[i])) % modulo;
28    }
29  
30    int found_position = -1;
31    // Check if the hash matches the target hash value
32    if (rolling_hash == hash_value) {
33        found_position = n - k;
34    }
35  
36    // Slide the window from the end towards the beginning of the string 's'
37    for (int i = n - k - 1; i >= 0; --i) {
38        // Roll the hash one position backward and include the new character at the beginning while excluding the last character
39        rolling_hash = (rolling_hash * power - getCharValue(s[i + k]) * power_k % modulo + getCharValue(s[i])) % modulo;
40        // Since modulo operation can yield negative values, ensure the hash value is positive
41        if (rolling_hash < 0) {
42            rolling_hash += modulo;
43        }
44        // If a matching hash is found, update the found position
45        if (rolling_hash == hash_value) {
46            found_position = i;
47        }
48    }
49  
50    // Return the matched substring or an empty string if no match is found
51    return found_position == -1 ? "" : s.substr(found_position, k);
52}
53
54/**
55 * Converts a character to an integer value based on its alphabet position.
56 *
57 * @param c The character to be converted.
58 * @return The integer value of the character.
59 */
60long long getCharValue(char c) {
61    // Convert character to its position in the alphabet (1-based indexing)
62    return c - 'a' + 1;
63}
64
1/**
2 * Finds the substring of a given string 's', where the hash value of the substring matches 'hashValue'.
3 * Uses a sliding window and rolling hash technique to efficiently calculate the hash of substrings.
4 *
5 * @param {string} s - The string to search within.
6 * @param {number} power - The base value for hashing.
7 * @param {number} modulo - The modulo value for hashing to avoid overflow.
8 * @param {number} k - The length of the substring to find.
9 * @param {number} hashValue - The target hash value.
10 * @returns {string} - The first substring matching the hashValue, or an empty string if not found.
11 */
12function subStrHash(s: string, power: number, modulo: number, k: number, hashValue: number): string {
13    const powerBigInt: bigint = BigInt(power);
14    const moduloBigInt: bigint = BigInt(modulo);
15    const hashValueBigInt: bigint = BigInt(hashValue);
16    const n: number = s.length;
17    let powerK: bigint = 1n;
18    let rollingHash: bigint = 0n;
19    // Reverse sliding window to calculate the hash of the substring
20    for (let i: number = n - 1; i > n - 1 - k; i--) {
21        rollingHash = (rollingHash * powerBigInt + getCharValue(s, i)) % moduloBigInt;
22        powerK = (powerK * powerBigInt) % moduloBigInt;
23    }
24    let foundPosition: number = -1;
25    // Check if the last k characters match the hashValue
26    if (rollingHash == hashValueBigInt) {
27        foundPosition = n - k;
28    }
29    // Slide the window from the end to the beginning of string 's'
30    for (let i: number = n - k - 1; i >= 0; i--) {
31        const preValue: bigint = (getCharValue(s, i + k) * powerK) % moduloBigInt;
32        rollingHash = (rollingHash * powerBigInt + getCharValue(s, i) - preValue + moduloBigInt) % moduloBigInt;
33        // If a matching hash is found, update the found position
34        if (rollingHash == hashValueBigInt) {
35            foundPosition = i;
36        }
37    }
38    // Return the found substring or an empty string if no match is found
39    return foundPosition === -1 ? '' : s.substring(foundPosition, foundPosition + k);
40}
41
42/**
43 * Gets the hash code of a character at a specific index in a string.
44 *
45 * @param {string} str - The string containing the character.
46 * @param {number} index - The index of the character in the string.
47 * @returns {bigint} - The hash code as a bigint.
48 */
49function getCharValue(str: string, index: number): bigint {
50    return BigInt(str.charCodeAt(index) - 'a'.charCodeAt(0) + 1);
51}
52
Not Sure What to Study? Take the 2-min Quiz:

Which of the following array represent a max heap?

Time and Space Complexity

Time Complexity

The time complexity of the function subStrHash is O(n * k). Here's the breakdown:

  • The first for loop runs for k iterations (where k is the length of substring we're looking for). Inside this loop, operations are constant time (with the exception of BigInt operations, which we'll assume to be O(1) for large numbers that can fit in the memory). This loop contributes O(k).
  • The second for loop runs for n - k iterations where n is the length of the string s. The operations inside this loop are also constant time, thus this loop contributes O(n - k).
  • Therefore, when you add both the contributions together, the worst case time complexity is O(n).

Space Complexity

The space complexity of the function subStrHash is O(1). Here's why:

  • No additional data structures that grow with the input size are used. Only a fixed number of BigInt variables (ac, pk, power, modulo, hashValue) and integers (n, ans, i) are used.
  • Since the algorithm doesn't utilize any space relative to the input size (ignoring the space taken by input arguments), the space complexity is constant.

Learn more about how to find time and space complexity quickly using problem constraints.

Fast Track Your Learning with Our Quick Skills Quiz:

Breadth first search can be used to find the shortest path between two nodes in a directed graph.


Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫