1316. Distinct Echo Substrings

HardTrieStringHash FunctionRolling Hash
Leetcode Link

Problem Description

The challenge is to count the unique non-empty substrings of a given text string that can be expressed as the concatenation of the same string twice. In other words, for a given string text, we are looking for all the distinguishable substrings which satisfy the condition that substring X can be written as X = a + a, where a is some string. We regard two substrings as distinct if they appear at different positions in text, even if they are composed of the same characters.

Intuition

To solve this problem efficiently, we utilize a hash function to represent substrings, which allows us to compare the equality of two substrings in constant time instead of linear time. This hash function needs to be carefully designed to avoid collisions where different substrings may have the same hash value.

The solution involves the following steps:

  1. Precompute hash values for all prefixes of the text string, as well as powers of the base used in the hash function. This is to ensure we can calculate the hash for any substring quickly.
  2. Iterate over all possible lengths of 2*a, where a is the substring we are trying to find. We do this by using two nested loops, the outer loop going from the start of the text to the second last character, and the inner loop stepping through the text with steps of size 2 (since we're considering pairs of substring a).
  3. For every such potential concatenation, use the prefix hashes to compute the hashes of the two halves and compare them.
  4. If they are the same, it indicates we found a substring a such that a + a is a valid concatenated form present in text.
  5. To count the distinct substrings, we use a set to store the hash of any a we find. The set will automatically handle the uniqueness part.
  6. In the end, the size of this set gives us the number of distinct echo substrings.

Learn more about Trie patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

What is an advantages of top-down dynamic programming vs bottom-up dynamic programming?

Solution Approach

This solution approach heavily relies on computational string hashing to efficiently compare substrings. The idea behind string hashing is to transform a string into a numerical value (the hash) in such a way that if two strings are equal, their hashes should also be equal. String hashing supports constant-time substring comparison, which is crucial for this solution.

Here's the detailed solution approach:

  1. Hashing Initialization: Before we start comparing substrings, we precompute and store the hash values for all possible prefixes of the text, as well as the powers of our base number (chosen arbitrarily to be 131). The modulus value mod is chosen to be a large prime number to reduce the chance of collisions. These precomputations enable us to later obtain the hash of any substring in constant time.

  2. Computing Prefix Hashes and Powers: We iterate through the entire text, character by character. For each character, we calculate the running hash and corresponding power of the base.

    • The hash for the (i+1)th prefix h[i + 1] is computed as (h[i] * base + t) % mod, where t is the current character's numerical value.
    • The power p[i + 1] is simply the previous power times the base, modulo mod.
  3. Enumerating Echo Substrings: An echo substring is a string that can be evenly split into two identical halves (a + a). We iterate over all potential starting positions for a substring by using an outer loop. For each of these starting points, the inner loop tries to find the echo substrings with that starting point by varying the length in steps of 2.

  4. Hash Comparison: To check if a potential substring is an echo substring, we compute the hash values for both halves using the precomputed prefix hash array and compare them. For example, for a substring starting at index i and ending at index j, we want to find if the substring partitioned at index k (i + j) >> 1) is an echo substring, so we calculate get(i + 1, k + 1) and get(k + 2, j + 1) and check for equality.

  5. Storing Unique Hashes: If the two halves are identical (their hashes match), we add the hash of the first half to a set named vis. Since a set automatically dismisses duplicates, this ensures that only unique echo substrings' hashes are stored.

  6. Return the Answer: The total number of unique echo substrings is the size of the vis set, which is returned as the final result.

An important thing to note is that since we are dealing with indices, we're careful to use 1-based indexing for the hashes and powers, because the initial hash (empty string hash) and initial power should be 0 and 1 respectively.

Lastly, by utilizing a hash function, the complexity of checking whether two substrings are equal is reduced from O(n) to O(1), which makes this solution computationally efficient for larger texts.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Which of the two traversal algorithms (BFS and DFS) can be used to find whether two nodes are connected?

Example Walkthrough

Let's illustrate the solution approach with an example. Consider the text string text = "ababaa" and we want to find all distinct echo substrings.

  1. Hashing Initialization:

    • We choose a base, let's say 131, and a large prime as the modulus for hashing.
    • Prefix hashes and powers of the base will be precomputed. Assume an arbitrary prime number, for example, mod = 10^9 + 7.
  2. Computing Prefix Hashes and Powers:

    • Let's initialize our hash array h and power array p with h[0] = 0 and p[0] = 1.
    • Now, we calculate h[1] = (h[0] * 131 + 'a') % mod, h[2] = (h[1] * 131 + 'b') % mod, and so on.
    • Similarly, we calculate p[1] = (p[0] * 131) % mod, p[2] = (p[1] * 131) % mod, and so forth.
  3. Enumerating Echo Substrings:

    • We explore all possible echo substrings. Start from the first character and consider all even-length substrings.
    • For instance, we'll look at substring lengths of 2, 4, up to the length of text.
  4. Hash Comparison:

    • To check for echo substrings, we compare the computed hashes. For example, with the substring text[0:1] ("ab"), we compare the hash of text[0:0] ("a") and text[1:1] ("b").
    • Since these hashes will differ, we move on.
  5. Storing Unique Hashes:

    • When we come across the substring text[0:3] ("abab"), we can split this into text[0:1] ("ab") and text[2:3] ("ab"). If hashes match, it's an echo substring.
    • We would then store the hash of text[0:1] in our vis set.
  6. Return the Answer:

    • As we continue this process along text, we find that the only other echo substring is text[2:5] ("abaa"), which splits into text[2:3] ("ab") and text[4:5] ("aa").
    • The hash of text[2:3] ("ab") is already in vis, but text[4:5] ("aa") is new and will be added to vis.
    • Hence, the number of unique echo substrings is the size of vis, which contains two elements, so the answer is 2.

By following these steps, we efficiently process the text to find all unique echo substrings, ensuring that the operation scales well for larger strings by using constant-time hash comparisons.

Solution Implementation

1class Solution:
2    def distinctEchoSubstrings(self, text: str) -> int:
3        # Calculate the hash value of the substring from index l to index r
4        def get_hash(l: int, r: int) -> int:
5            return (prefix_hashes[r] - prefix_hashes[l - 1] * pow_base[r - l + 1]) % mod
6
7        length_of_text = len(text)
8        base = 131
9        mod = 10**9 + 7
10
11        # Initialize prefix hash and base power arrays
12        prefix_hashes = [0] * (length_of_text + 1)
13        pow_base = [1] * (length_of_text + 1)
14
15        # Pre-compute hashes and powers of base for all prefixes
16        for i, character in enumerate(text):
17            char_code = ord(character) - ord('a') + 1
18            prefix_hashes[i + 1] = (prefix_hashes[i] * base + char_code) % mod
19            pow_base[i + 1] = (pow_base[i] * base) % mod
20
21        # Use a set to record unique echo substrings by their hash values
22        seen_echo_hashes = set()
23
24        # Check for echo substrings in the text
25        for i in range(length_of_text - 1):
26            for j in range(i + 1, length_of_text, 2):
27                middle = (i + j) // 2
28                hash_first_half = get_hash(i + 1, middle + 1)
29                hash_second_half = get_hash(middle + 2, j + 1)
30                # If both halves are identical, record the hash of the first half
31                if hash_first_half == hash_second_half:
32                    seen_echo_hashes.add(hash_first_half)
33
34        # The number of distinct echo substrings is the size of the hash set
35        return len(seen_echo_hashes)
36
1import java.util.HashSet;
2import java.util.Set;
3
4class Solution {
5    private long[] prefixHashes;
6    private long[] powersOfBase;
7
8    public int distinctEchoSubstrings(String text) {
9        int length = text.length();
10        int base = 131; // A prime number chosen as the base for hashing
11        prefixHashes = new long[length + 10];
12        powersOfBase = new long[length + 10];
13        powersOfBase[0] = 1;
14
15        // Precompute prefix hashes and powers of base for the text
16        for (int i = 0; i < length; ++i) {
17            int charValue = text.charAt(i) - 'a' + 1;
18            prefixHashes[i + 1] = prefixHashes[i] * base + charValue;
19            powersOfBase[i + 1] = powersOfBase[i] * base;
20        }
21
22        // Use a hash set to store echo substrings' hashes without duplication
23        Set<Long> vis = new HashSet<>();
24        for (int i = 0; i < length - 1; ++i) {
25            // The j index is incremented by 2 to ensure the substring can be split into two equal parts
26            for (int j = i + 1; j < length; j += 2) {
27                int mid = (i + j) / 2;
28                long firstHalfHash = getHash(i + 1, mid + 1);
29                long secondHalfHash = getHash(mid + 2, j + 1);
30
31                // If the hashes match, the substrings are equal, so add the hash to the set
32                if (firstHalfHash == secondHalfHash) {
33                    vis.add(firstHalfHash);
34                }
35            }
36        }
37
38        // The number of distinct echo substrings is the size of the set
39        return vis.size();
40    }
41
42    // Helper function to compute the hash of a substring using prefix hashes and the precomputed powers of base
43    private long getHash(int start, int end) {
44        return prefixHashes[end] - prefixHashes[start - 1] * powersOfBase[end - start + 1];
45    }
46}
47
1#include <string>
2#include <vector>
3#include <unordered_set>
4
5typedef unsigned long long ull; // Define `ull` for simplicity.
6
7class Solution {
8public:
9    // Function to calculate the number of distinct echo substrings in the given text.
10    int distinctEchoSubstrings(std::string text) {
11        int length = text.size();
12        int base = 131; // Base for polynomial rolling hash function.
13        std::vector<ull> powers(length + 10); // Precompute the powers of base.
14        std::vector<ull> hashValues(length + 10); // Hash values for prefixes of text.
15        powers[0] = 1;
16      
17        // Initialize the powers and hashValues vectors.
18        for (int i = 0; i < length; ++i) {
19            int letterValue = text[i] - 'a' + 1;
20            powers[i + 1] = powers[i] * base;
21            hashValues[i + 1] = hashValues[i] * base + letterValue;
22        }
23      
24        std::unordered_set<ull> uniqueEchoHashes; // Set to keep track of unique echo substrings.
25      
26        // Iterate over the text to find echo substrings.
27        for (int i = 0; i < length - 1; ++i) {
28            // Only need to check even length substrings for "echoes".
29            for (int j = i + 1; j < length; j += 2) {
30                int mid = (i + j) / 2;
31                ull firstHalfHash = calculateHash(i + 1, mid + 1, powers, hashValues);
32                ull secondHalfHash = calculateHash(mid + 2, j + 1, powers, hashValues);
33              
34                // If both halves match, add the hash to the set.
35                if (firstHalfHash == secondHalfHash) {
36                    uniqueEchoHashes.insert(firstHalfHash);
37                }
38            }
39        }
40      
41        // The number of distinct echo substrings is the size of the set.
42        return uniqueEchoHashes.size();
43    }
44
45private:
46    // Helper function to calculate the hash of a substring using precomputed powers and hash values.
47    ull calculateHash(int left, int right, const std::vector<ull>& powers, const std::vector<ull>& hashValues) {
48        return hashValues[right] - hashValues[left - 1] * powers[right - left + 1];
49    }
50};
51
1// Define 'ull' for simplicity.
2type ull = bigint;
3
4// Base for polynomial rolling hash function.
5const base: ull = 131n;
6
7// Function to calculate the hash of a substring using precomputed powers and hash values.
8function calculateHash(left: number, right: number, powers: ull[], hashValues: ull[]): ull {
9  return hashValues[right] - hashValues[left - 1] * powers[right - left + 1];
10}
11
12// Function to calculate the number of distinct echo substrings in the given text.
13function distinctEchoSubstrings(text: string): number {
14  const length: number = text.length;
15
16  // Precompute the powers of base.
17  const powers: ull[] = Array(length + 10).fill(0n);
18  // Hash values for prefixes of text.
19  const hashValues: ull[] = Array(length + 10).fill(0n);
20  powers[0] = 1n;
21
22  // Initialize the powers and hashValues arrays.
23  for (let i = 0; i < length; ++i) {
24    const letterValue: ull = BigInt(text.charCodeAt(i) - 'a'.charCodeAt(0) + 1);
25    powers[i + 1] = powers[i] * base;
26    hashValues[i + 1] = hashValues[i] * base + letterValue;
27  }
28
29  // Set to keep track of unique echo substring hashes.
30  const uniqueEchoHashes: Set<ull> = new Set();
31
32  // Iterate over the text to find echo substrings.
33  for (let i = 0; i < length - 1; ++i) {
34    // Only need to check even length substrings for "echoes".
35    for (let j = i + 1; j < length; j += 2) {
36      const mid: number = Math.floor((i + j) / 2);
37      const firstHalfHash: ull = calculateHash(i + 1, mid + 1, powers, hashValues);
38      const secondHalfHash: ull = calculateHash(mid + 2, j + 1, powers, hashValues);
39
40      // If both halves match, add the hash to the set.
41      if (firstHalfHash === secondHalfHash) {
42        uniqueEchoHashes.add(firstHalfHash);
43      }
44    }
45  }
46
47  // The number of distinct echo substrings is the size of the set.
48  return uniqueEchoHashes.size;
49}
50
51// Example usage:
52// const result: number = distinctEchoSubstrings("yourtext");
53// console.log(result);
54
Not Sure What to Study? Take the 2-min Quiz:

Which of the following array represent a max heap?

Time and Space Complexity

Time Complexity

The time complexity of the code is mainly determined by the two nested loops that run over the string to find all possible echo substrings. For a string of length n, the outer loop runs n - 1 times and the inner loop runs in the order of n/2 times on average (since it increments by 2 each time). Within the inner loop, the time to calculate hashes is constant, thanks to the precomputed hash and power values.

So, the total time complexity is O(n^2) considering the nested loops and the fact that each hash computation within the loops is done in constant time.

Space Complexity

Space complexity comes from the storage of precomputed hash values h and power values p, as well as the hash set vis used to store unique hash values corresponding to echo substrings. This results in O(n) space complexity.

  • h and p each require O(n) space.
  • vis may potentially store up to O(n/2) distinct hashes (in the worst case, every substring might be an echo substring).

Therefore, the space complexity is O(n) + O(n) + O(n/2), which simplifies to O(n).

Learn more about how to find time and space complexity quickly using problem constraints.

Fast Track Your Learning with Our Quick Skills Quiz:

Which one best describes the time complexity of the following code?

1int factorial(int n) {
2  if (n < 0) {
3    return -1;
4  } else if (n == 0) {
5    return 1;
6  } else {
7    return n * factorial(n - 1);
8  }
9}

Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫