Facebook Pixel

1316. Distinct Echo Substrings

HardTrieStringHash FunctionRolling Hash
Leetcode Link

Problem Description

The problem asks you to find the number of distinct substrings in a given string text that can be formed by concatenating a string with itself.

In other words, you need to count how many unique substrings have the pattern a + a, where a is some non-empty string. This means the substring can be split into two identical halves.

For example:

  • In the string "abcabc", the substring "abcabc" itself is an echo substring because it equals "abc" + "abc"
  • In the string "aaaa", substrings like "aa" (which is "a" + "a"), "aaaa" (which is "aa" + "aa"), are echo substrings

The key requirements are:

  1. The substrings must be non-empty
  2. Each substring must be able to be split into two identical consecutive parts
  3. Count only distinct echo substrings (if the same echo substring appears multiple times at different positions, count it only once)

The solution uses rolling hash to efficiently compare whether two halves of a substring are identical. It iterates through all possible even-length substrings (since echo substrings must have even length), checks if their two halves match using hash values, and stores the unique echo substrings in a set to avoid counting duplicates.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

To find substrings that can be written as a + a, we need to check if any substring can be split into two identical halves. The straightforward approach would be to:

  1. Generate all possible substrings
  2. Check if each substring has even length (odd-length strings can't be split into two equal parts)
  3. Split each even-length substring in half and compare if both halves are identical

However, directly comparing strings character by character for every possible substring would be inefficient, especially for longer strings. This is where the key insight comes in: we can use rolling hash to speed up string comparisons.

Rolling hash allows us to compute a hash value for any substring in O(1) time after preprocessing. The idea is:

  • Precompute hash values for all prefixes of the string
  • For any substring from position i to j, we can calculate its hash value using the prefix hashes
  • If two substrings have the same hash value, they are likely identical (with very high probability when using a good hash function)

The algorithm flow becomes:

  1. For each possible starting position i in the string
  2. For each possible ending position j that makes the substring length even (by incrementing j by 2 each time)
  3. Find the middle point k of the substring from i to j
  4. Use hash values to quickly check if substring [i, k] equals substring [k+1, j]
  5. If they match, we found an echo substring - add its hash to a set to track distinct ones

Using a set to store the hash values of valid echo substrings automatically handles the "distinct" requirement, as duplicate echo substrings will have the same hash and won't be counted twice.

Learn more about Trie patterns.

Solution Approach

The solution implements a rolling hash algorithm to efficiently find all distinct echo substrings. Here's the step-by-step implementation:

1. Rolling Hash Setup

base = 131
mod = int(1e9) + 7
h = [0] * (n + 10)  # Prefix hash array
p = [1] * (n + 10)  # Powers of base array
  • We use base = 131 as the base for our polynomial hash function
  • mod = 10^9 + 7 to prevent integer overflow
  • h[i] stores the hash value of the prefix text[0...i-1]
  • p[i] stores base^i % mod for quick computation

2. Preprocessing - Computing Prefix Hashes

for i, c in enumerate(text):
    t = ord(c) - ord('a') + 1
    h[i + 1] = (h[i] * base) % mod + t
    p[i + 1] = (p[i] * base) % mod
  • Convert each character to a number (1-26 for 'a'-'z')
  • Build the prefix hash array using the formula: h[i+1] = h[i] * base + text[i]
  • Precompute powers of base for later use

3. Hash Extraction Function

def get(l, r):
    return (h[r] - h[l - 1] * p[r - l + 1]) % mod
  • This function extracts the hash value of substring text[l-1...r-1] in O(1) time
  • Formula: hash(s[l...r]) = (h[r] - h[l-1] * base^(r-l+1)) % mod
  • This works by removing the contribution of the prefix before position l

4. Finding Echo Substrings

vis = set()
for i in range(n - 1):
    for j in range(i + 1, n, 2):
        k = (i + j) >> 1
        a = get(i + 1, k + 1)
        b = get(k + 2, j + 1)
        if a == b:
            vis.add(a)
  • i is the starting index (0-indexed in the loop, but 1-indexed for the get function)
  • j is the ending index, incremented by 2 to ensure even-length substrings
  • k = (i + j) >> 1 finds the middle point (using bit shift for division by 2)
  • Compare hash values of the first half [i+1, k+1] and second half [k+2, j+1]
  • If hashes match, the substring is an echo substring - add its hash to the set
  • Using a set automatically handles duplicates

5. Return Result

return len(vis)
  • The size of the set gives us the count of distinct echo substrings

The time complexity is O(nΒ²) for checking all possible even-length substrings, with O(1) hash comparisons. The space complexity is O(n) for storing the hash arrays and the set of distinct echo substrings.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through the solution with the string text = "abcabc".

Step 1: Initialize Rolling Hash

  • n = 6 (length of "abcabc")
  • Initialize arrays: h = [0, 0, 0, 0, 0, 0, 0] and p = [1, 0, 0, 0, 0, 0, 0]
  • base = 131, mod = 10^9 + 7

Step 2: Build Prefix Hash Array

  • For 'a' (index 0): t = 1, h[1] = 0 * 131 + 1 = 1, p[1] = 131
  • For 'b' (index 1): t = 2, h[2] = 1 * 131 + 2 = 133, p[2] = 131^2
  • For 'c' (index 2): t = 3, h[3] = 133 * 131 + 3 = 17426, p[3] = 131^3
  • For 'a' (index 3): t = 1, h[4] = 17426 * 131 + 1 = 2282807, p[4] = 131^4
  • For 'b' (index 4): t = 2, h[5] = 2282807 * 131 + 2 = 299047719, p[5] = 131^5
  • For 'c' (index 5): t = 3, h[6] = 299047719 * 131 + 3 = (computed with mod)

Step 3: Find Echo Substrings

Let's trace through some iterations:

  • i = 0 (starting at 'a'):

    • j = 1: Substring "ab" (length 2)

      • Middle point: k = (0 + 1) >> 1 = 0
      • First half: get(1, 1) = hash of "a" = 1
      • Second half: get(2, 2) = hash of "b" = 2
      • 1 β‰  2, not an echo substring
    • j = 3: Substring "abca" (length 4)

      • Middle point: k = (0 + 3) >> 1 = 1
      • First half: get(1, 2) = hash of "ab" = 133
      • Second half: get(3, 4) = hash of "ca" = different hash
      • Not an echo substring
    • j = 5: Substring "abcabc" (length 6)

      • Middle point: k = (0 + 5) >> 1 = 2
      • First half: get(1, 3) = hash of "abc" = 17426
      • Second half: get(4, 6) = hash of "abc" = 17426
      • 17426 = 17426, this IS an echo substring!
      • Add 17426 to vis set
  • i = 2 (starting at 'c'):

    • j = 3: Substring "ca" (length 2)

      • First half: hash of "c", Second half: hash of "a"
      • Different hashes, not an echo substring
    • j = 5: Substring "cabc" (length 4)

      • First half: hash of "ca", Second half: hash of "bc"
      • Different hashes, not an echo substring
  • i = 3 (starting at 'a'):

    • j = 4: Substring "ab" (length 2)
      • This is the same "ab" we saw earlier
      • Not an echo substring (different halves)

Step 4: Return Result

  • vis = {17426} contains only one distinct echo substring ("abcabc" = "abc" + "abc")
  • Return len(vis) = 1

The algorithm correctly identifies that "abcabc" has exactly one distinct echo substring, which is the entire string itself formed by concatenating "abc" with itself.

Solution Implementation

1class Solution:
2    def distinctEchoSubstrings(self, text: str) -> int:
3        """
4        Find the number of distinct non-empty substrings that can be written 
5        as the concatenation of some string with itself (echo substrings).
6      
7        Uses rolling hash technique for efficient substring comparison.
8        """
9        def get_hash(left: int, right: int) -> int:
10            """
11            Calculate the hash value of substring text[left-1:right].
12            Uses 1-indexed positions for convenience with the hash array.
13            """
14            return (hash_values[right] - hash_values[left - 1] * powers[right - left + 1]) % MOD
15      
16        # Initialize constants and variables
17        n = len(text)
18        BASE = 131  # Prime base for rolling hash
19        MOD = 10**9 + 7  # Large prime modulus to avoid collisions
20      
21        # Precompute hash values and powers of base
22        # hash_values[i] stores hash of text[0:i]
23        # powers[i] stores BASE^i mod MOD
24        hash_values = [0] * (n + 10)
25        powers = [1] * (n + 10)
26      
27        # Build hash array and power array
28        for i, char in enumerate(text):
29            # Convert character to number (a=1, b=2, ...)
30            char_value = ord(char) - ord('a') + 1
31            # Calculate cumulative hash
32            hash_values[i + 1] = (hash_values[i] * BASE + char_value) % MOD
33            # Calculate powers of base
34            powers[i + 1] = (powers[i] * BASE) % MOD
35      
36        # Set to store unique echo substring hashes
37        unique_echo_hashes = set()
38      
39        # Check all possible echo substrings
40        # i is the starting position (0-indexed)
41        for i in range(n - 1):
42            # j is the ending position (0-indexed), step by 2 for even length
43            for j in range(i + 1, n, 2):
44                # Calculate midpoint of substring
45                midpoint = (i + j) >> 1
46              
47                # Get hash of first half [i, midpoint]
48                first_half_hash = get_hash(i + 1, midpoint + 1)
49              
50                # Get hash of second half [midpoint+1, j]
51                second_half_hash = get_hash(midpoint + 2, j + 1)
52              
53                # If both halves are identical, it's an echo substring
54                if first_half_hash == second_half_hash:
55                    unique_echo_hashes.add(first_half_hash)
56      
57        return len(unique_echo_hashes)
58
1class Solution {
2    private long[] prefixHash;  // Stores prefix hash values for the string
3    private long[] basePowers;  // Stores powers of the base for hash calculation
4  
5    public int distinctEchoSubstrings(String text) {
6        int textLength = text.length();
7        int hashBase = 131;  // Prime number base for rolling hash
8      
9        // Initialize arrays with extra space to avoid boundary issues
10        prefixHash = new long[textLength + 10];
11        basePowers = new long[textLength + 10];
12        basePowers[0] = 1;  // Base^0 = 1
13      
14        // Build prefix hash array and base powers array
15        // prefixHash[i+1] represents hash of text[0...i]
16        for (int i = 0; i < textLength; ++i) {
17            // Convert character to 1-based value (a=1, b=2, ..., z=26)
18            int charValue = text.charAt(i) - 'a' + 1;
19          
20            // Calculate prefix hash: hash(s[0...i]) = hash(s[0...i-1]) * base + s[i]
21            prefixHash[i + 1] = prefixHash[i] * hashBase + charValue;
22          
23            // Calculate base^(i+1) for future hash calculations
24            basePowers[i + 1] = basePowers[i] * hashBase;
25        }
26      
27        // Set to store unique hash values of echo substrings
28        Set<Long> uniqueEchoHashes = new HashSet<>();
29      
30        // Iterate through all possible starting positions
31        for (int startIndex = 0; startIndex < textLength - 1; ++startIndex) {
32            // Check all odd-length substrings (echo substrings have even length)
33            // endIndex increments by 2 to ensure even-length substrings
34            for (int endIndex = startIndex + 1; endIndex < textLength; endIndex += 2) {
35                // Calculate midpoint of the substring
36                int midPoint = (startIndex + endIndex) >> 1;
37              
38                // Get hash of first half: text[startIndex...midPoint]
39                // Using 1-based indexing for the get method
40                long firstHalfHash = getSubstringHash(startIndex + 1, midPoint + 1);
41              
42                // Get hash of second half: text[midPoint+1...endIndex]
43                long secondHalfHash = getSubstringHash(midPoint + 2, endIndex + 1);
44              
45                // If both halves have the same hash, it's an echo substring
46                if (firstHalfHash == secondHalfHash) {
47                    uniqueEchoHashes.add(firstHalfHash);
48                }
49            }
50        }
51      
52        return uniqueEchoHashes.size();
53    }
54  
55    /**
56     * Calculates the hash value of substring from position i to j (1-based indexing)
57     * Uses the formula: hash(s[i-1...j-1]) = (prefixHash[j] - prefixHash[i-1] * base^(j-i+1))
58     * 
59     * @param i Starting position (1-based)
60     * @param j Ending position (1-based)
61     * @return Hash value of the substring
62     */
63    private long getSubstringHash(int i, int j) {
64        return prefixHash[j] - prefixHash[i - 1] * basePowers[j - i + 1];
65    }
66}
67
1typedef unsigned long long ull;
2
3class Solution {
4public:
5    int distinctEchoSubstrings(string text) {
6        int n = text.size();
7        const int BASE = 131;  // Base for polynomial rolling hash
8      
9        // power[i] stores BASE^i for efficient hash computation
10        vector<ull> power(n + 10);
11        // prefixHash[i] stores the hash value of text[0...i-1]
12        vector<ull> prefixHash(n + 10);
13      
14        // Initialize power array and compute prefix hashes
15        power[0] = 1;
16        for (int i = 0; i < n; ++i) {
17            // Convert character to 1-based value (a=1, b=2, ...)
18            int charValue = text[i] - 'a' + 1;
19            power[i + 1] = power[i] * BASE;
20            prefixHash[i + 1] = prefixHash[i] * BASE + charValue;
21        }
22      
23        // Set to store unique hash values of echo substrings
24        unordered_set<ull> uniqueEchoHashes;
25      
26        // Iterate through all possible echo substrings
27        // i is the starting index (0-based)
28        for (int i = 0; i < n - 1; ++i) {
29            // j is the ending index (0-based), increment by 2 to ensure even length
30            for (int j = i + 1; j < n; j += 2) {
31                // Calculate midpoint to split the substring into two equal halves
32                int mid = (i + j) >> 1;
33              
34                // Get hash of first half: text[i...mid]
35                ull firstHalfHash = getSubstringHash(i + 1, mid + 1, power, prefixHash);
36                // Get hash of second half: text[mid+1...j]
37                ull secondHalfHash = getSubstringHash(mid + 2, j + 1, power, prefixHash);
38              
39                // If both halves are identical, we found an echo substring
40                if (firstHalfHash == secondHalfHash) {
41                    uniqueEchoHashes.insert(firstHalfHash);
42                }
43            }
44        }
45      
46        return uniqueEchoHashes.size();
47    }
48
49private:
50    // Compute hash value of substring text[left-1...right-1] using 1-based indexing
51    ull getSubstringHash(int left, int right, vector<ull>& power, vector<ull>& prefixHash) {
52        return prefixHash[right] - prefixHash[left - 1] * power[right - left + 1];
53    }
54};
55
1type ull = bigint;
2
3function distinctEchoSubstrings(text: string): number {
4    const n = text.length;
5    const BASE = 131n;  // Base for polynomial rolling hash
6  
7    // power[i] stores BASE^i for efficient hash computation
8    const power: ull[] = new Array(n + 10);
9    // prefixHash[i] stores the hash value of text[0...i-1]
10    const prefixHash: ull[] = new Array(n + 10);
11  
12    // Initialize power array and compute prefix hashes
13    power[0] = 1n;
14    prefixHash[0] = 0n;
15  
16    for (let i = 0; i < n; i++) {
17        // Convert character to 1-based value (a=1, b=2, ...)
18        const charValue = BigInt(text.charCodeAt(i) - 'a'.charCodeAt(0) + 1);
19        power[i + 1] = power[i] * BASE;
20        prefixHash[i + 1] = prefixHash[i] * BASE + charValue;
21    }
22  
23    // Set to store unique hash values of echo substrings
24    const uniqueEchoHashes = new Set<string>();
25  
26    // Iterate through all possible echo substrings
27    // i is the starting index (0-based)
28    for (let i = 0; i < n - 1; i++) {
29        // j is the ending index (0-based), increment by 2 to ensure even length
30        for (let j = i + 1; j < n; j += 2) {
31            // Calculate midpoint to split the substring into two equal halves
32            const mid = (i + j) >> 1;
33          
34            // Get hash of first half: text[i...mid]
35            const firstHalfHash = getSubstringHash(i + 1, mid + 1, power, prefixHash);
36            // Get hash of second half: text[mid+1...j]
37            const secondHalfHash = getSubstringHash(mid + 2, j + 1, power, prefixHash);
38          
39            // If both halves are identical, we found an echo substring
40            if (firstHalfHash === secondHalfHash) {
41                // Convert bigint to string for Set storage
42                uniqueEchoHashes.add(firstHalfHash.toString());
43            }
44        }
45    }
46  
47    return uniqueEchoHashes.size;
48}
49
50// Compute hash value of substring text[left-1...right-1] using 1-based indexing
51function getSubstringHash(left: number, right: number, power: ull[], prefixHash: ull[]): ull {
52    return prefixHash[right] - prefixHash[left - 1] * power[right - left + 1];
53}
54

Time and Space Complexity

Time Complexity: O(nΒ²)

The algorithm consists of two main parts:

  1. Preprocessing phase: Computing the hash array h and power array p takes O(n) time, where n is the length of the input string.

  2. Main computation phase: The nested loops iterate through all possible starting positions i (from 0 to n-1) and examine substrings of even length starting at position i. For each starting position i, the inner loop runs approximately n/2 times in the worst case (when j goes from i+1 to n with step size 2). This gives us O(n) * O(n/2) = O(nΒ²) iterations. Inside the loops, the get function performs constant time operations using the precomputed hash values.

Therefore, the overall time complexity is O(n) + O(nΒ²) = O(nΒ²).

Space Complexity: O(n)

The space usage breaks down as follows:

  1. Hash array h: O(n + 10) = O(n) space
  2. Power array p: O(n + 10) = O(n) space
  3. Set vis to store unique hash values: In the worst case, this could store O(nΒ²) different substrings, but since we're only storing hash values of echo substrings (substrings that can be split into two identical halves), the maximum number of unique echo substrings is bounded by O(nΒ²). However, in practice, the set typically stores much fewer elements.

The dominant space usage comes from the auxiliary arrays and the set, giving us an overall space complexity of O(n) for the auxiliary arrays plus potentially O(nΒ²) for the set in the worst case, resulting in O(nΒ²) worst-case space complexity. However, if we consider only the additional space used beyond the input, the practical space complexity is often closer to O(n) due to the limited number of actual echo substrings.

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Hash Collision Leading to False Positives

The most critical pitfall in this rolling hash solution is that hash collisions can occur - two different strings might produce the same hash value, leading to incorrect counting of echo substrings.

Problem Example:

# Even with a good hash function, collisions can happen
# String "ab" and "xy" might hash to the same value
# This would incorrectly identify "abxy" as an echo substring

Solution: Use double hashing or verify matches with actual string comparison:

def distinctEchoSubstrings(self, text: str) -> int:
    # ... (hash setup code remains the same)
  
    unique_echo_substrings = set()  # Store actual substrings, not hashes
  
    for i in range(n - 1):
        for j in range(i + 1, n, 2):
            midpoint = (i + j) >> 1
          
            first_half_hash = get_hash(i + 1, midpoint + 1)
            second_half_hash = get_hash(midpoint + 2, j + 1)
          
            if first_half_hash == second_half_hash:
                # Verify with actual string comparison to avoid false positives
                first_half = text[i:midpoint + 1]
                second_half = text[midpoint + 1:j + 1]
              
                if first_half == second_half:
                    unique_echo_substrings.add(text[i:j + 1])
  
    return len(unique_echo_substrings)

2. Integer Overflow in Hash Calculation

Even with modulo operations, intermediate calculations can overflow, especially hash_values[left - 1] * powers[right - left + 1].

Problem Example:

# This calculation can overflow before applying modulo
(hash_values[left - 1] * powers[right - left + 1]) % MOD

Solution: Apply modulo at each step and handle negative values properly:

def get_hash(left: int, right: int) -> int:
    # Apply modulo to each multiplication step
    hash_diff = hash_values[right] - (hash_values[left - 1] * powers[right - left + 1]) % MOD
    # Handle negative values correctly
    return (hash_diff % MOD + MOD) % MOD

3. Off-by-One Errors in Index Conversion

Mixing 0-indexed loop variables with 1-indexed hash array access is error-prone.

Problem Example:

# Easy to confuse these indices:
for i in range(n - 1):  # i is 0-indexed
    get_hash(i + 1, k + 1)  # get_hash expects 1-indexed positions

Solution: Use consistent indexing throughout or add clear documentation:

def distinctEchoSubstrings(self, text: str) -> int:
    # Alternative: Use 0-indexed throughout
    def get_hash_0indexed(left: int, right: int) -> int:
        """Get hash of text[left:right+1] using 0-based indexing"""
        if left == 0:
            return hash_values[right + 1]
        return (hash_values[right + 1] - hash_values[left] * powers[right - left + 1]) % MOD
  
    # Now the loop is cleaner:
    for start in range(n - 1):
        for end in range(start + 1, n, 2):
            mid = (start + end) // 2
            first_half = get_hash_0indexed(start, mid)
            second_half = get_hash_0indexed(mid + 1, end)
            # ...

4. Using Non-Prime Base or Small Modulus

Using a composite number as base or a small modulus increases collision probability.

Problem Example:

BASE = 256  # Power of 2 - higher collision probability
MOD = 10007  # Too small for large strings

Solution: Always use large prime numbers:

# Good choices for rolling hash
BASE = 131  # or 137, 139, 149 (prime numbers)
MOD = 10**9 + 7  # or 10**9 + 9, 2**61 - 1 (large primes)

# For extra security, use multiple hash functions
BASE1, MOD1 = 131, 10**9 + 7
BASE2, MOD2 = 137, 10**9 + 9
Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

Depth first search is equivalent to which of the tree traversal order?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More