1316. Distinct Echo Substrings
Problem Description
The problem asks you to find the number of distinct substrings in a given string text
that can be formed by concatenating a string with itself.
In other words, you need to count how many unique substrings have the pattern a + a
, where a
is some non-empty string. This means the substring can be split into two identical halves.
For example:
- In the string
"abcabc"
, the substring"abcabc"
itself is an echo substring because it equals"abc" + "abc"
- In the string
"aaaa"
, substrings like"aa"
(which is"a" + "a"
),"aaaa"
(which is"aa" + "aa"
), are echo substrings
The key requirements are:
- The substrings must be non-empty
- Each substring must be able to be split into two identical consecutive parts
- Count only distinct echo substrings (if the same echo substring appears multiple times at different positions, count it only once)
The solution uses rolling hash to efficiently compare whether two halves of a substring are identical. It iterates through all possible even-length substrings (since echo substrings must have even length), checks if their two halves match using hash values, and stores the unique echo substrings in a set to avoid counting duplicates.
Intuition
To find substrings that can be written as a + a
, we need to check if any substring can be split into two identical halves. The straightforward approach would be to:
- Generate all possible substrings
- Check if each substring has even length (odd-length strings can't be split into two equal parts)
- Split each even-length substring in half and compare if both halves are identical
However, directly comparing strings character by character for every possible substring would be inefficient, especially for longer strings. This is where the key insight comes in: we can use rolling hash to speed up string comparisons.
Rolling hash allows us to compute a hash value for any substring in O(1)
time after preprocessing. The idea is:
- Precompute hash values for all prefixes of the string
- For any substring from position
i
toj
, we can calculate its hash value using the prefix hashes - If two substrings have the same hash value, they are likely identical (with very high probability when using a good hash function)
The algorithm flow becomes:
- For each possible starting position
i
in the string - For each possible ending position
j
that makes the substring length even (by incrementingj
by 2 each time) - Find the middle point
k
of the substring fromi
toj
- Use hash values to quickly check if substring
[i, k]
equals substring[k+1, j]
- If they match, we found an echo substring - add its hash to a set to track distinct ones
Using a set to store the hash values of valid echo substrings automatically handles the "distinct" requirement, as duplicate echo substrings will have the same hash and won't be counted twice.
Learn more about Trie patterns.
Solution Approach
The solution implements a rolling hash algorithm to efficiently find all distinct echo substrings. Here's the step-by-step implementation:
1. Rolling Hash Setup
base = 131
mod = int(1e9) + 7
h = [0] * (n + 10) # Prefix hash array
p = [1] * (n + 10) # Powers of base array
- We use
base = 131
as the base for our polynomial hash function mod = 10^9 + 7
to prevent integer overflowh[i]
stores the hash value of the prefixtext[0...i-1]
p[i]
storesbase^i % mod
for quick computation
2. Preprocessing - Computing Prefix Hashes
for i, c in enumerate(text):
t = ord(c) - ord('a') + 1
h[i + 1] = (h[i] * base) % mod + t
p[i + 1] = (p[i] * base) % mod
- Convert each character to a number (1-26 for 'a'-'z')
- Build the prefix hash array using the formula:
h[i+1] = h[i] * base + text[i]
- Precompute powers of base for later use
3. Hash Extraction Function
def get(l, r):
return (h[r] - h[l - 1] * p[r - l + 1]) % mod
- This function extracts the hash value of substring
text[l-1...r-1]
in O(1) time - Formula:
hash(s[l...r]) = (h[r] - h[l-1] * base^(r-l+1)) % mod
- This works by removing the contribution of the prefix before position
l
4. Finding Echo Substrings
vis = set()
for i in range(n - 1):
for j in range(i + 1, n, 2):
k = (i + j) >> 1
a = get(i + 1, k + 1)
b = get(k + 2, j + 1)
if a == b:
vis.add(a)
i
is the starting index (0-indexed in the loop, but 1-indexed for theget
function)j
is the ending index, incremented by 2 to ensure even-length substringsk = (i + j) >> 1
finds the middle point (using bit shift for division by 2)- Compare hash values of the first half
[i+1, k+1]
and second half[k+2, j+1]
- If hashes match, the substring is an echo substring - add its hash to the set
- Using a set automatically handles duplicates
5. Return Result
return len(vis)
- The size of the set gives us the count of distinct echo substrings
The time complexity is O(nΒ²)
for checking all possible even-length substrings, with O(1)
hash comparisons. The space complexity is O(n)
for storing the hash arrays and the set of distinct echo substrings.
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through the solution with the string text = "abcabc"
.
Step 1: Initialize Rolling Hash
n = 6
(length of "abcabc")- Initialize arrays:
h = [0, 0, 0, 0, 0, 0, 0]
andp = [1, 0, 0, 0, 0, 0, 0]
base = 131
,mod = 10^9 + 7
Step 2: Build Prefix Hash Array
- For 'a' (index 0):
t = 1
,h[1] = 0 * 131 + 1 = 1
,p[1] = 131
- For 'b' (index 1):
t = 2
,h[2] = 1 * 131 + 2 = 133
,p[2] = 131^2
- For 'c' (index 2):
t = 3
,h[3] = 133 * 131 + 3 = 17426
,p[3] = 131^3
- For 'a' (index 3):
t = 1
,h[4] = 17426 * 131 + 1 = 2282807
,p[4] = 131^4
- For 'b' (index 4):
t = 2
,h[5] = 2282807 * 131 + 2 = 299047719
,p[5] = 131^5
- For 'c' (index 5):
t = 3
,h[6] = 299047719 * 131 + 3 = (computed with mod)
Step 3: Find Echo Substrings
Let's trace through some iterations:
-
i = 0 (starting at 'a'):
-
j = 1: Substring "ab" (length 2)
- Middle point:
k = (0 + 1) >> 1 = 0
- First half:
get(1, 1)
= hash of "a" = 1 - Second half:
get(2, 2)
= hash of "b" = 2 - 1 β 2, not an echo substring
- Middle point:
-
j = 3: Substring "abca" (length 4)
- Middle point:
k = (0 + 3) >> 1 = 1
- First half:
get(1, 2)
= hash of "ab" = 133 - Second half:
get(3, 4)
= hash of "ca" = different hash - Not an echo substring
- Middle point:
-
j = 5: Substring "abcabc" (length 6)
- Middle point:
k = (0 + 5) >> 1 = 2
- First half:
get(1, 3)
= hash of "abc" = 17426 - Second half:
get(4, 6)
= hash of "abc" = 17426 - 17426 = 17426, this IS an echo substring!
- Add 17426 to
vis
set
- Middle point:
-
-
i = 2 (starting at 'c'):
-
j = 3: Substring "ca" (length 2)
- First half: hash of "c", Second half: hash of "a"
- Different hashes, not an echo substring
-
j = 5: Substring "cabc" (length 4)
- First half: hash of "ca", Second half: hash of "bc"
- Different hashes, not an echo substring
-
-
i = 3 (starting at 'a'):
- j = 4: Substring "ab" (length 2)
- This is the same "ab" we saw earlier
- Not an echo substring (different halves)
- j = 4: Substring "ab" (length 2)
Step 4: Return Result
vis = {17426}
contains only one distinct echo substring ("abcabc" = "abc" + "abc")- Return
len(vis) = 1
The algorithm correctly identifies that "abcabc" has exactly one distinct echo substring, which is the entire string itself formed by concatenating "abc" with itself.
Solution Implementation
1class Solution:
2 def distinctEchoSubstrings(self, text: str) -> int:
3 """
4 Find the number of distinct non-empty substrings that can be written
5 as the concatenation of some string with itself (echo substrings).
6
7 Uses rolling hash technique for efficient substring comparison.
8 """
9 def get_hash(left: int, right: int) -> int:
10 """
11 Calculate the hash value of substring text[left-1:right].
12 Uses 1-indexed positions for convenience with the hash array.
13 """
14 return (hash_values[right] - hash_values[left - 1] * powers[right - left + 1]) % MOD
15
16 # Initialize constants and variables
17 n = len(text)
18 BASE = 131 # Prime base for rolling hash
19 MOD = 10**9 + 7 # Large prime modulus to avoid collisions
20
21 # Precompute hash values and powers of base
22 # hash_values[i] stores hash of text[0:i]
23 # powers[i] stores BASE^i mod MOD
24 hash_values = [0] * (n + 10)
25 powers = [1] * (n + 10)
26
27 # Build hash array and power array
28 for i, char in enumerate(text):
29 # Convert character to number (a=1, b=2, ...)
30 char_value = ord(char) - ord('a') + 1
31 # Calculate cumulative hash
32 hash_values[i + 1] = (hash_values[i] * BASE + char_value) % MOD
33 # Calculate powers of base
34 powers[i + 1] = (powers[i] * BASE) % MOD
35
36 # Set to store unique echo substring hashes
37 unique_echo_hashes = set()
38
39 # Check all possible echo substrings
40 # i is the starting position (0-indexed)
41 for i in range(n - 1):
42 # j is the ending position (0-indexed), step by 2 for even length
43 for j in range(i + 1, n, 2):
44 # Calculate midpoint of substring
45 midpoint = (i + j) >> 1
46
47 # Get hash of first half [i, midpoint]
48 first_half_hash = get_hash(i + 1, midpoint + 1)
49
50 # Get hash of second half [midpoint+1, j]
51 second_half_hash = get_hash(midpoint + 2, j + 1)
52
53 # If both halves are identical, it's an echo substring
54 if first_half_hash == second_half_hash:
55 unique_echo_hashes.add(first_half_hash)
56
57 return len(unique_echo_hashes)
58
1class Solution {
2 private long[] prefixHash; // Stores prefix hash values for the string
3 private long[] basePowers; // Stores powers of the base for hash calculation
4
5 public int distinctEchoSubstrings(String text) {
6 int textLength = text.length();
7 int hashBase = 131; // Prime number base for rolling hash
8
9 // Initialize arrays with extra space to avoid boundary issues
10 prefixHash = new long[textLength + 10];
11 basePowers = new long[textLength + 10];
12 basePowers[0] = 1; // Base^0 = 1
13
14 // Build prefix hash array and base powers array
15 // prefixHash[i+1] represents hash of text[0...i]
16 for (int i = 0; i < textLength; ++i) {
17 // Convert character to 1-based value (a=1, b=2, ..., z=26)
18 int charValue = text.charAt(i) - 'a' + 1;
19
20 // Calculate prefix hash: hash(s[0...i]) = hash(s[0...i-1]) * base + s[i]
21 prefixHash[i + 1] = prefixHash[i] * hashBase + charValue;
22
23 // Calculate base^(i+1) for future hash calculations
24 basePowers[i + 1] = basePowers[i] * hashBase;
25 }
26
27 // Set to store unique hash values of echo substrings
28 Set<Long> uniqueEchoHashes = new HashSet<>();
29
30 // Iterate through all possible starting positions
31 for (int startIndex = 0; startIndex < textLength - 1; ++startIndex) {
32 // Check all odd-length substrings (echo substrings have even length)
33 // endIndex increments by 2 to ensure even-length substrings
34 for (int endIndex = startIndex + 1; endIndex < textLength; endIndex += 2) {
35 // Calculate midpoint of the substring
36 int midPoint = (startIndex + endIndex) >> 1;
37
38 // Get hash of first half: text[startIndex...midPoint]
39 // Using 1-based indexing for the get method
40 long firstHalfHash = getSubstringHash(startIndex + 1, midPoint + 1);
41
42 // Get hash of second half: text[midPoint+1...endIndex]
43 long secondHalfHash = getSubstringHash(midPoint + 2, endIndex + 1);
44
45 // If both halves have the same hash, it's an echo substring
46 if (firstHalfHash == secondHalfHash) {
47 uniqueEchoHashes.add(firstHalfHash);
48 }
49 }
50 }
51
52 return uniqueEchoHashes.size();
53 }
54
55 /**
56 * Calculates the hash value of substring from position i to j (1-based indexing)
57 * Uses the formula: hash(s[i-1...j-1]) = (prefixHash[j] - prefixHash[i-1] * base^(j-i+1))
58 *
59 * @param i Starting position (1-based)
60 * @param j Ending position (1-based)
61 * @return Hash value of the substring
62 */
63 private long getSubstringHash(int i, int j) {
64 return prefixHash[j] - prefixHash[i - 1] * basePowers[j - i + 1];
65 }
66}
67
1typedef unsigned long long ull;
2
3class Solution {
4public:
5 int distinctEchoSubstrings(string text) {
6 int n = text.size();
7 const int BASE = 131; // Base for polynomial rolling hash
8
9 // power[i] stores BASE^i for efficient hash computation
10 vector<ull> power(n + 10);
11 // prefixHash[i] stores the hash value of text[0...i-1]
12 vector<ull> prefixHash(n + 10);
13
14 // Initialize power array and compute prefix hashes
15 power[0] = 1;
16 for (int i = 0; i < n; ++i) {
17 // Convert character to 1-based value (a=1, b=2, ...)
18 int charValue = text[i] - 'a' + 1;
19 power[i + 1] = power[i] * BASE;
20 prefixHash[i + 1] = prefixHash[i] * BASE + charValue;
21 }
22
23 // Set to store unique hash values of echo substrings
24 unordered_set<ull> uniqueEchoHashes;
25
26 // Iterate through all possible echo substrings
27 // i is the starting index (0-based)
28 for (int i = 0; i < n - 1; ++i) {
29 // j is the ending index (0-based), increment by 2 to ensure even length
30 for (int j = i + 1; j < n; j += 2) {
31 // Calculate midpoint to split the substring into two equal halves
32 int mid = (i + j) >> 1;
33
34 // Get hash of first half: text[i...mid]
35 ull firstHalfHash = getSubstringHash(i + 1, mid + 1, power, prefixHash);
36 // Get hash of second half: text[mid+1...j]
37 ull secondHalfHash = getSubstringHash(mid + 2, j + 1, power, prefixHash);
38
39 // If both halves are identical, we found an echo substring
40 if (firstHalfHash == secondHalfHash) {
41 uniqueEchoHashes.insert(firstHalfHash);
42 }
43 }
44 }
45
46 return uniqueEchoHashes.size();
47 }
48
49private:
50 // Compute hash value of substring text[left-1...right-1] using 1-based indexing
51 ull getSubstringHash(int left, int right, vector<ull>& power, vector<ull>& prefixHash) {
52 return prefixHash[right] - prefixHash[left - 1] * power[right - left + 1];
53 }
54};
55
1type ull = bigint;
2
3function distinctEchoSubstrings(text: string): number {
4 const n = text.length;
5 const BASE = 131n; // Base for polynomial rolling hash
6
7 // power[i] stores BASE^i for efficient hash computation
8 const power: ull[] = new Array(n + 10);
9 // prefixHash[i] stores the hash value of text[0...i-1]
10 const prefixHash: ull[] = new Array(n + 10);
11
12 // Initialize power array and compute prefix hashes
13 power[0] = 1n;
14 prefixHash[0] = 0n;
15
16 for (let i = 0; i < n; i++) {
17 // Convert character to 1-based value (a=1, b=2, ...)
18 const charValue = BigInt(text.charCodeAt(i) - 'a'.charCodeAt(0) + 1);
19 power[i + 1] = power[i] * BASE;
20 prefixHash[i + 1] = prefixHash[i] * BASE + charValue;
21 }
22
23 // Set to store unique hash values of echo substrings
24 const uniqueEchoHashes = new Set<string>();
25
26 // Iterate through all possible echo substrings
27 // i is the starting index (0-based)
28 for (let i = 0; i < n - 1; i++) {
29 // j is the ending index (0-based), increment by 2 to ensure even length
30 for (let j = i + 1; j < n; j += 2) {
31 // Calculate midpoint to split the substring into two equal halves
32 const mid = (i + j) >> 1;
33
34 // Get hash of first half: text[i...mid]
35 const firstHalfHash = getSubstringHash(i + 1, mid + 1, power, prefixHash);
36 // Get hash of second half: text[mid+1...j]
37 const secondHalfHash = getSubstringHash(mid + 2, j + 1, power, prefixHash);
38
39 // If both halves are identical, we found an echo substring
40 if (firstHalfHash === secondHalfHash) {
41 // Convert bigint to string for Set storage
42 uniqueEchoHashes.add(firstHalfHash.toString());
43 }
44 }
45 }
46
47 return uniqueEchoHashes.size;
48}
49
50// Compute hash value of substring text[left-1...right-1] using 1-based indexing
51function getSubstringHash(left: number, right: number, power: ull[], prefixHash: ull[]): ull {
52 return prefixHash[right] - prefixHash[left - 1] * power[right - left + 1];
53}
54
Time and Space Complexity
Time Complexity: O(nΒ²)
The algorithm consists of two main parts:
-
Preprocessing phase: Computing the hash array
h
and power arrayp
takesO(n)
time, wheren
is the length of the input string. -
Main computation phase: The nested loops iterate through all possible starting positions
i
(from0
ton-1
) and examine substrings of even length starting at positioni
. For each starting positioni
, the inner loop runs approximatelyn/2
times in the worst case (whenj
goes fromi+1
ton
with step size 2). This gives usO(n) * O(n/2) = O(nΒ²)
iterations. Inside the loops, theget
function performs constant time operations using the precomputed hash values.
Therefore, the overall time complexity is O(n) + O(nΒ²) = O(nΒ²)
.
Space Complexity: O(n)
The space usage breaks down as follows:
- Hash array
h
:O(n + 10) = O(n)
space - Power array
p
:O(n + 10) = O(n)
space - Set
vis
to store unique hash values: In the worst case, this could storeO(nΒ²)
different substrings, but since we're only storing hash values of echo substrings (substrings that can be split into two identical halves), the maximum number of unique echo substrings is bounded byO(nΒ²)
. However, in practice, the set typically stores much fewer elements.
The dominant space usage comes from the auxiliary arrays and the set, giving us an overall space complexity of O(n)
for the auxiliary arrays plus potentially O(nΒ²)
for the set in the worst case, resulting in O(nΒ²)
worst-case space complexity. However, if we consider only the additional space used beyond the input, the practical space complexity is often closer to O(n)
due to the limited number of actual echo substrings.
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Hash Collision Leading to False Positives
The most critical pitfall in this rolling hash solution is that hash collisions can occur - two different strings might produce the same hash value, leading to incorrect counting of echo substrings.
Problem Example:
# Even with a good hash function, collisions can happen # String "ab" and "xy" might hash to the same value # This would incorrectly identify "abxy" as an echo substring
Solution: Use double hashing or verify matches with actual string comparison:
def distinctEchoSubstrings(self, text: str) -> int:
# ... (hash setup code remains the same)
unique_echo_substrings = set() # Store actual substrings, not hashes
for i in range(n - 1):
for j in range(i + 1, n, 2):
midpoint = (i + j) >> 1
first_half_hash = get_hash(i + 1, midpoint + 1)
second_half_hash = get_hash(midpoint + 2, j + 1)
if first_half_hash == second_half_hash:
# Verify with actual string comparison to avoid false positives
first_half = text[i:midpoint + 1]
second_half = text[midpoint + 1:j + 1]
if first_half == second_half:
unique_echo_substrings.add(text[i:j + 1])
return len(unique_echo_substrings)
2. Integer Overflow in Hash Calculation
Even with modulo operations, intermediate calculations can overflow, especially hash_values[left - 1] * powers[right - left + 1]
.
Problem Example:
# This calculation can overflow before applying modulo (hash_values[left - 1] * powers[right - left + 1]) % MOD
Solution: Apply modulo at each step and handle negative values properly:
def get_hash(left: int, right: int) -> int:
# Apply modulo to each multiplication step
hash_diff = hash_values[right] - (hash_values[left - 1] * powers[right - left + 1]) % MOD
# Handle negative values correctly
return (hash_diff % MOD + MOD) % MOD
3. Off-by-One Errors in Index Conversion
Mixing 0-indexed loop variables with 1-indexed hash array access is error-prone.
Problem Example:
# Easy to confuse these indices:
for i in range(n - 1): # i is 0-indexed
get_hash(i + 1, k + 1) # get_hash expects 1-indexed positions
Solution: Use consistent indexing throughout or add clear documentation:
def distinctEchoSubstrings(self, text: str) -> int:
# Alternative: Use 0-indexed throughout
def get_hash_0indexed(left: int, right: int) -> int:
"""Get hash of text[left:right+1] using 0-based indexing"""
if left == 0:
return hash_values[right + 1]
return (hash_values[right + 1] - hash_values[left] * powers[right - left + 1]) % MOD
# Now the loop is cleaner:
for start in range(n - 1):
for end in range(start + 1, n, 2):
mid = (start + end) // 2
first_half = get_hash_0indexed(start, mid)
second_half = get_hash_0indexed(mid + 1, end)
# ...
4. Using Non-Prime Base or Small Modulus
Using a composite number as base or a small modulus increases collision probability.
Problem Example:
BASE = 256 # Power of 2 - higher collision probability MOD = 10007 # Too small for large strings
Solution: Always use large prime numbers:
# Good choices for rolling hash BASE = 131 # or 137, 139, 149 (prime numbers) MOD = 10**9 + 7 # or 10**9 + 9, 2**61 - 1 (large primes) # For extra security, use multiple hash functions BASE1, MOD1 = 131, 10**9 + 7 BASE2, MOD2 = 137, 10**9 + 9
Depth first search is equivalent to which of the tree traversal order?
Recommended Readings
Trie Introduction Imagine you have a magic dictionary where as soon as you start spelling a word it immediately tells you how many words that start with those letters How cool would that be This is exactly the kind of magic a data structure called trie performs What's a Trie
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Want a Structured Path to Master System Design Too? Donβt Miss This!