1316. Distinct Echo Substrings
Problem Description
The challenge is to count the unique non-empty substrings of a given text string that can be expressed as the concatenation of the same string twice. In other words, for a given string text
, we are looking for all the distinguishable substrings which satisfy the condition that substring X
can be written as X = a + a
, where a
is some string. We regard two substrings as distinct if they appear at different positions in text
, even if they are composed of the same characters.
Intuition
To solve this problem efficiently, we utilize a hash function to represent substrings, which allows us to compare the equality of two substrings in constant time instead of linear time. This hash function needs to be carefully designed to avoid collisions where different substrings may have the same hash value.
The solution involves the following steps:
- Precompute hash values for all prefixes of the text string, as well as powers of the base used in the hash function. This is to ensure we can calculate the hash for any substring quickly.
- Iterate over all possible lengths of
2*a
, wherea
is the substring we are trying to find. We do this by using two nested loops, the outer loop going from the start of the text to the second last character, and the inner loop stepping through the text with steps of size 2 (since we're considering pairs of substringa
). - For every such potential concatenation, use the prefix hashes to compute the hashes of the two halves and compare them.
- If they are the same, it indicates we found a substring
a
such thata + a
is a valid concatenated form present intext
. - To count the distinct substrings, we use a set to store the hash of any
a
we find. The set will automatically handle the uniqueness part. - In the end, the size of this set gives us the number of distinct echo substrings.
Learn more about Trie patterns.
Solution Approach
This solution approach heavily relies on computational string hashing to efficiently compare substrings. The idea behind string hashing is to transform a string into a numerical value (the hash) in such a way that if two strings are equal, their hashes should also be equal. String hashing supports constant-time substring comparison, which is crucial for this solution.
Here's the detailed solution approach:
-
Hashing Initialization: Before we start comparing substrings, we precompute and store the hash values for all possible prefixes of the text, as well as the powers of our base number (chosen arbitrarily to be
131
). The modulus valuemod
is chosen to be a large prime number to reduce the chance of collisions. These precomputations enable us to later obtain the hash of any substring in constant time. -
Computing Prefix Hashes and Powers: We iterate through the entire text, character by character. For each character, we calculate the running hash and corresponding power of the base.
- The hash for the (i+1)th prefix
h[i + 1]
is computed as(h[i] * base + t) % mod
, wheret
is the current character's numerical value. - The power
p[i + 1]
is simply the previous power times the base, modulomod
.
- The hash for the (i+1)th prefix
-
Enumerating Echo Substrings: An echo substring is a string that can be evenly split into two identical halves (
a + a
). We iterate over all potential starting positions for a substring by using an outer loop. For each of these starting points, the inner loop tries to find the echo substrings with that starting point by varying the length in steps of 2. -
Hash Comparison: To check if a potential substring is an echo substring, we compute the hash values for both halves using the precomputed prefix hash array and compare them. For example, for a substring starting at index
i
and ending at indexj
, we want to find if the substring partitioned at indexk
(i + j) >> 1
) is an echo substring, so we calculateget(i + 1, k + 1)
andget(k + 2, j + 1)
and check for equality. -
Storing Unique Hashes: If the two halves are identical (their hashes match), we add the hash of the first half to a set named
vis
. Since a set automatically dismisses duplicates, this ensures that only unique echo substrings' hashes are stored. -
Return the Answer: The total number of unique echo substrings is the size of the
vis
set, which is returned as the final result.
An important thing to note is that since we are dealing with indices, we're careful to use 1-based indexing for the hashes and powers, because the initial hash (empty string hash) and initial power should be 0
and 1
respectively.
Lastly, by utilizing a hash function, the complexity of checking whether two substrings are equal is reduced from O(n)
to O(1)
, which makes this solution computationally efficient for larger texts.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's illustrate the solution approach with an example. Consider the text string text = "ababaa"
and we want to find all distinct echo substrings.
-
Hashing Initialization:
- We choose a base, let's say 131, and a large prime as the modulus for hashing.
- Prefix hashes and powers of the base will be precomputed. Assume an arbitrary prime number, for example,
mod = 10^9 + 7
.
-
Computing Prefix Hashes and Powers:
- Let's initialize our hash array
h
and power arrayp
withh[0] = 0
andp[0] = 1
. - Now, we calculate
h[1] = (h[0] * 131 + 'a') % mod
,h[2] = (h[1] * 131 + 'b') % mod
, and so on. - Similarly, we calculate
p[1] = (p[0] * 131) % mod
,p[2] = (p[1] * 131) % mod
, and so forth.
- Let's initialize our hash array
-
Enumerating Echo Substrings:
- We explore all possible echo substrings. Start from the first character and consider all even-length substrings.
- For instance, we'll look at substring lengths of 2, 4, up to the length of
text
.
-
Hash Comparison:
- To check for echo substrings, we compare the computed hashes. For example, with the substring
text[0:1]
("ab"), we compare the hash oftext[0:0]
("a") andtext[1:1]
("b"). - Since these hashes will differ, we move on.
- To check for echo substrings, we compare the computed hashes. For example, with the substring
-
Storing Unique Hashes:
- When we come across the substring
text[0:3]
("abab"), we can split this intotext[0:1]
("ab") andtext[2:3]
("ab"). If hashes match, it's an echo substring. - We would then store the hash of
text[0:1]
in ourvis
set.
- When we come across the substring
-
Return the Answer:
- As we continue this process along
text
, we find that the only other echo substring istext[2:5]
("abaa"), which splits intotext[2:3]
("ab") andtext[4:5]
("aa"). - The hash of
text[2:3]
("ab") is already invis
, buttext[4:5]
("aa") is new and will be added tovis
. - Hence, the number of unique echo substrings is the size of
vis
, which contains two elements, so the answer is 2.
- As we continue this process along
By following these steps, we efficiently process the text to find all unique echo substrings, ensuring that the operation scales well for larger strings by using constant-time hash comparisons.
Solution Implementation
1class Solution:
2 def distinctEchoSubstrings(self, text: str) -> int:
3 # Calculate the hash value of the substring from index l to index r
4 def get_hash(l: int, r: int) -> int:
5 return (prefix_hashes[r] - prefix_hashes[l - 1] * pow_base[r - l + 1]) % mod
6
7 length_of_text = len(text)
8 base = 131
9 mod = 10**9 + 7
10
11 # Initialize prefix hash and base power arrays
12 prefix_hashes = [0] * (length_of_text + 1)
13 pow_base = [1] * (length_of_text + 1)
14
15 # Pre-compute hashes and powers of base for all prefixes
16 for i, character in enumerate(text):
17 char_code = ord(character) - ord('a') + 1
18 prefix_hashes[i + 1] = (prefix_hashes[i] * base + char_code) % mod
19 pow_base[i + 1] = (pow_base[i] * base) % mod
20
21 # Use a set to record unique echo substrings by their hash values
22 seen_echo_hashes = set()
23
24 # Check for echo substrings in the text
25 for i in range(length_of_text - 1):
26 for j in range(i + 1, length_of_text, 2):
27 middle = (i + j) // 2
28 hash_first_half = get_hash(i + 1, middle + 1)
29 hash_second_half = get_hash(middle + 2, j + 1)
30 # If both halves are identical, record the hash of the first half
31 if hash_first_half == hash_second_half:
32 seen_echo_hashes.add(hash_first_half)
33
34 # The number of distinct echo substrings is the size of the hash set
35 return len(seen_echo_hashes)
36
1import java.util.HashSet;
2import java.util.Set;
3
4class Solution {
5 private long[] prefixHashes;
6 private long[] powersOfBase;
7
8 public int distinctEchoSubstrings(String text) {
9 int length = text.length();
10 int base = 131; // A prime number chosen as the base for hashing
11 prefixHashes = new long[length + 10];
12 powersOfBase = new long[length + 10];
13 powersOfBase[0] = 1;
14
15 // Precompute prefix hashes and powers of base for the text
16 for (int i = 0; i < length; ++i) {
17 int charValue = text.charAt(i) - 'a' + 1;
18 prefixHashes[i + 1] = prefixHashes[i] * base + charValue;
19 powersOfBase[i + 1] = powersOfBase[i] * base;
20 }
21
22 // Use a hash set to store echo substrings' hashes without duplication
23 Set<Long> vis = new HashSet<>();
24 for (int i = 0; i < length - 1; ++i) {
25 // The j index is incremented by 2 to ensure the substring can be split into two equal parts
26 for (int j = i + 1; j < length; j += 2) {
27 int mid = (i + j) / 2;
28 long firstHalfHash = getHash(i + 1, mid + 1);
29 long secondHalfHash = getHash(mid + 2, j + 1);
30
31 // If the hashes match, the substrings are equal, so add the hash to the set
32 if (firstHalfHash == secondHalfHash) {
33 vis.add(firstHalfHash);
34 }
35 }
36 }
37
38 // The number of distinct echo substrings is the size of the set
39 return vis.size();
40 }
41
42 // Helper function to compute the hash of a substring using prefix hashes and the precomputed powers of base
43 private long getHash(int start, int end) {
44 return prefixHashes[end] - prefixHashes[start - 1] * powersOfBase[end - start + 1];
45 }
46}
47
1#include <string>
2#include <vector>
3#include <unordered_set>
4
5typedef unsigned long long ull; // Define `ull` for simplicity.
6
7class Solution {
8public:
9 // Function to calculate the number of distinct echo substrings in the given text.
10 int distinctEchoSubstrings(std::string text) {
11 int length = text.size();
12 int base = 131; // Base for polynomial rolling hash function.
13 std::vector<ull> powers(length + 10); // Precompute the powers of base.
14 std::vector<ull> hashValues(length + 10); // Hash values for prefixes of text.
15 powers[0] = 1;
16
17 // Initialize the powers and hashValues vectors.
18 for (int i = 0; i < length; ++i) {
19 int letterValue = text[i] - 'a' + 1;
20 powers[i + 1] = powers[i] * base;
21 hashValues[i + 1] = hashValues[i] * base + letterValue;
22 }
23
24 std::unordered_set<ull> uniqueEchoHashes; // Set to keep track of unique echo substrings.
25
26 // Iterate over the text to find echo substrings.
27 for (int i = 0; i < length - 1; ++i) {
28 // Only need to check even length substrings for "echoes".
29 for (int j = i + 1; j < length; j += 2) {
30 int mid = (i + j) / 2;
31 ull firstHalfHash = calculateHash(i + 1, mid + 1, powers, hashValues);
32 ull secondHalfHash = calculateHash(mid + 2, j + 1, powers, hashValues);
33
34 // If both halves match, add the hash to the set.
35 if (firstHalfHash == secondHalfHash) {
36 uniqueEchoHashes.insert(firstHalfHash);
37 }
38 }
39 }
40
41 // The number of distinct echo substrings is the size of the set.
42 return uniqueEchoHashes.size();
43 }
44
45private:
46 // Helper function to calculate the hash of a substring using precomputed powers and hash values.
47 ull calculateHash(int left, int right, const std::vector<ull>& powers, const std::vector<ull>& hashValues) {
48 return hashValues[right] - hashValues[left - 1] * powers[right - left + 1];
49 }
50};
51
1// Define 'ull' for simplicity.
2type ull = bigint;
3
4// Base for polynomial rolling hash function.
5const base: ull = 131n;
6
7// Function to calculate the hash of a substring using precomputed powers and hash values.
8function calculateHash(left: number, right: number, powers: ull[], hashValues: ull[]): ull {
9 return hashValues[right] - hashValues[left - 1] * powers[right - left + 1];
10}
11
12// Function to calculate the number of distinct echo substrings in the given text.
13function distinctEchoSubstrings(text: string): number {
14 const length: number = text.length;
15
16 // Precompute the powers of base.
17 const powers: ull[] = Array(length + 10).fill(0n);
18 // Hash values for prefixes of text.
19 const hashValues: ull[] = Array(length + 10).fill(0n);
20 powers[0] = 1n;
21
22 // Initialize the powers and hashValues arrays.
23 for (let i = 0; i < length; ++i) {
24 const letterValue: ull = BigInt(text.charCodeAt(i) - 'a'.charCodeAt(0) + 1);
25 powers[i + 1] = powers[i] * base;
26 hashValues[i + 1] = hashValues[i] * base + letterValue;
27 }
28
29 // Set to keep track of unique echo substring hashes.
30 const uniqueEchoHashes: Set<ull> = new Set();
31
32 // Iterate over the text to find echo substrings.
33 for (let i = 0; i < length - 1; ++i) {
34 // Only need to check even length substrings for "echoes".
35 for (let j = i + 1; j < length; j += 2) {
36 const mid: number = Math.floor((i + j) / 2);
37 const firstHalfHash: ull = calculateHash(i + 1, mid + 1, powers, hashValues);
38 const secondHalfHash: ull = calculateHash(mid + 2, j + 1, powers, hashValues);
39
40 // If both halves match, add the hash to the set.
41 if (firstHalfHash === secondHalfHash) {
42 uniqueEchoHashes.add(firstHalfHash);
43 }
44 }
45 }
46
47 // The number of distinct echo substrings is the size of the set.
48 return uniqueEchoHashes.size;
49}
50
51// Example usage:
52// const result: number = distinctEchoSubstrings("yourtext");
53// console.log(result);
54
Time and Space Complexity
Time Complexity
The time complexity of the code is mainly determined by the two nested loops that run over the string to find all possible echo substrings. For a string of length n
, the outer loop runs n - 1
times and the inner loop runs in the order of n/2
times on average (since it increments by 2
each time). Within the inner loop, the time to calculate hashes is constant, thanks to the precomputed hash and power values.
So, the total time complexity is O(n^2) considering the nested loops and the fact that each hash computation within the loops is done in constant time.
Space Complexity
Space complexity comes from the storage of precomputed hash values h
and power values p
, as well as the hash set vis
used to store unique hash values corresponding to echo substrings. This results in O(n) space complexity.
h
andp
each require O(n) space.vis
may potentially store up to O(n/2) distinct hashes (in the worst case, every substring might be an echo substring).
Therefore, the space complexity is O(n) + O(n) + O(n/2), which simplifies to O(n).
Learn more about how to find time and space complexity quickly using problem constraints.
Which algorithm should you use to find a node that is close to the root of the tree?
Recommended Readings
Trie Introduction Imagine you have a magic dictionary where as soon as you start spelling a word it immediately tells you how many words that start with those letters How cool would that be This is exactly the kind of magic a data structure called trie performs What's a Trie
Patterns The Shortest Path Algorithm for Coding Interviews The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we can determine the
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Got a question?ย Ask the Monster Assistantย anything you don't understand.
Still not clear? ย Submitย the part you don't understand to our editors. Or join ourย Discord and ask the community.