Facebook Pixel

1044. Longest Duplicate Substring

HardStringBinary SearchSuffix ArraySliding WindowHash FunctionRolling Hash
Leetcode Link

Problem Description

You are given a string s and need to find the longest substring that appears at least twice in the string. These duplicate substrings can overlap with each other.

A duplicated substring is a contiguous sequence of characters from s that occurs 2 or more times anywhere in the string. For example, in the string "banana", the substring "ana" appears twice (at positions 1-3 and 3-5), and these occurrences overlap at position 3.

Your task is to return any duplicated substring that has the maximum possible length. If no substring appears more than once in s, return an empty string "".

The solution uses binary search on the length of the potential duplicate substring. The check function verifies if there exists a duplicate substring of a given length l by iterating through all possible substrings of that length and using a set to track which substrings have been seen. If a substring is encountered that's already in the set, it means we've found a duplicate.

The binary search works by:

  • Setting the search range from 0 to the length of the string
  • For each middle value mid, checking if a duplicate substring of length mid exists
  • If found, searching for longer duplicates (moving left pointer up)
  • If not found, searching for shorter duplicates (moving right pointer down)
  • Keeping track of the longest valid duplicate found so far in ans
Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key insight is that if a substring of length k appears twice in the string, then we know the answer is at least k characters long. However, there might be an even longer duplicate substring of length k+1, k+2, and so on. We want to find the maximum such length.

A brute force approach would check all possible substrings, but this would be inefficient. Instead, we can observe that the problem has a monotonic property: if there exists a duplicate substring of length k, then there must also exist duplicate substrings of all lengths from 1 to k-1. This is because any substring of a duplicate substring is also a duplicate substring.

This monotonic property suggests using binary search on the answer. We can search for the maximum length L such that there exists a duplicate substring of length L.

For each candidate length during binary search, we need to verify if any substring of that length appears twice. We can do this efficiently by:

  1. Generating all substrings of the given length
  2. Using a hash set to track which substrings we've seen
  3. If we encounter a substring that's already in the set, we've found a duplicate

The binary search narrows down the search space from O(n) possible lengths to O(log n) checks. For each check, we examine O(n) substrings, and comparing strings takes O(L) time where L is the current length being checked. This gives us a more efficient solution than checking all possible substring pairs.

The approach elegantly combines the efficiency of binary search with the simplicity of hash-based duplicate detection, making it practical even for longer strings.

Learn more about Binary Search and Sliding Window patterns.

Solution Approach

The solution implements a binary search combined with a hash set-based duplicate detection function.

Binary Search Setup:

  • Initialize left = 0 and right = n (length of string) as the search boundaries for possible substring lengths
  • Use ans to keep track of the longest duplicate substring found so far

Binary Search Logic: The main loop continues while left < right:

  1. Calculate mid = (left + right + 1) >> 1 (equivalent to (left + right + 1) // 2)
    • The +1 ensures we round up to avoid infinite loops when left and right are adjacent
  2. Call check(mid) to verify if a duplicate substring of length mid exists
  3. Update the answer: ans = t or ans keeps the non-empty result (either the new found substring t or the previous ans)
  4. Adjust search boundaries:
    • If a duplicate of length mid is found, search for longer substrings: left = mid
    • Otherwise, search for shorter substrings: right = mid - 1

The check Function: This helper function determines if any substring of length l appears at least twice:

  1. Initialize an empty set vis to track seen substrings
  2. Iterate through all possible starting positions from 0 to n - l
  3. For each position i, extract substring t = s[i : i + l]
  4. If t is already in vis, we found a duplicate - return it immediately
  5. Otherwise, add t to vis and continue
  6. If no duplicate is found after checking all positions, return empty string

Time Complexity Analysis:

  • Binary search performs O(log n) iterations
  • Each check call examines O(n) substrings
  • String slicing and hashing takes O(L) time for length L
  • Overall: O(n * L * log n) where L is the average substring length checked

The algorithm efficiently narrows down the search space using binary search while maintaining simplicity through straightforward string comparison using Python's built-in set operations.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's trace through the algorithm with the string s = "banana".

Initial Setup:

  • String: "banana" (length n = 6)
  • Binary search range: left = 0, right = 6
  • Answer: ans = ""

Iteration 1:

  • mid = (0 + 6 + 1) >> 1 = 3
  • Check if any substring of length 3 appears twice:
    • i=0: "ban" → add to vis = {"ban"}
    • i=1: "ana" → add to vis = {"ban", "ana"}
    • i=2: "nan" → add to vis = {"ban", "ana", "nan"}
    • i=3: "ana" → already in vis! Return "ana"
  • Found duplicate "ana", so ans = "ana"
  • Search for longer: left = 3, right = 6

Iteration 2:

  • mid = (3 + 6 + 1) >> 1 = 5
  • Check if any substring of length 5 appears twice:
    • i=0: "banan" → add to vis = {"banan"}
    • i=1: "anana" → add to vis = {"banan", "anana"}
  • No duplicates found, return ""
  • No duplicate found, search for shorter: left = 3, right = 4

Iteration 3:

  • mid = (3 + 4 + 1) >> 1 = 4
  • Check if any substring of length 4 appears twice:
    • i=0: "bana" → add to vis = {"bana"}
    • i=1: "anan" → add to vis = {"bana", "anan"}
    • i=2: "nana" → add to vis = {"bana", "anan", "nana"}
  • No duplicates found, return ""
  • No duplicate found, search for shorter: left = 3, right = 3

Termination:

  • left = right = 3, loop ends
  • Return ans = "ana"

The algorithm correctly identifies "ana" as the longest duplicated substring. It appears at positions [1:4] and [3:6], with an overlap at position 3. Through binary search, we efficiently narrowed down from checking all possible lengths to finding the optimal length in just 3 iterations.

Solution Implementation

1class Solution:
2    def longestDupSubstring(self, s: str) -> str:
3        """
4        Find the longest duplicate substring in a given string.
5        Uses binary search on the length of substring.
6        """
7
8        def has_duplicate_of_length(length: int) -> str:
9            """Check if a duplicate substring of given length exists."""
10            seen_substrings = set()
11
12            for start_idx in range(len(s) - length + 1):
13                current_substring = s[start_idx : start_idx + length]
14                if current_substring in seen_substrings:
15                    return current_substring
16                seen_substrings.add(current_substring)
17
18            return ''
19
20        def feasible(length: int) -> bool:
21            """
22            Returns True if NO duplicate of this length exists.
23            This inverts the condition so we can find "first true" = first length with no duplicate.
24            """
25            return has_duplicate_of_length(length) == ''
26
27        n = len(s)
28        left, right = 1, n - 1
29        first_true_index = -1
30        longest_duplicate = ''
31
32        # Binary search template: find first length where no duplicate exists
33        while left <= right:
34            mid = (left + right) // 2
35            if feasible(mid):
36                first_true_index = mid
37                right = mid - 1  # Search for smaller length with no duplicate
38            else:
39                # Duplicate exists at this length, record it and search longer
40                longest_duplicate = has_duplicate_of_length(mid)
41                left = mid + 1
42
43        return longest_duplicate
44
1class Solution {
2    private long[] basePowers;
3    private long[] prefixHash;
4
5    public String longestDupSubstring(String s) {
6        final int BASE = 131;
7        int n = s.length();
8
9        // Precompute powers and prefix hash
10        basePowers = new long[n + 10];
11        prefixHash = new long[n + 10];
12        basePowers[0] = 1;
13        for (int i = 0; i < n; i++) {
14            basePowers[i + 1] = basePowers[i] * BASE;
15            prefixHash[i + 1] = prefixHash[i] * BASE + s.charAt(i);
16        }
17
18        // Binary search template
19        int left = 1;
20        int right = n - 1;
21        int firstTrueIndex = -1;
22        String longestDuplicate = "";
23
24        // Find first length where no duplicate exists (feasible = true means no duplicate)
25        while (left <= right) {
26            int mid = left + (right - left) / 2;
27            String duplicateFound = checkDuplicateOfLength(s, mid);
28
29            if (duplicateFound.isEmpty()) {
30                // No duplicate at this length (feasible = true)
31                firstTrueIndex = mid;
32                right = mid - 1;
33            } else {
34                // Duplicate exists, record it and search for longer
35                longestDuplicate = duplicateFound;
36                left = mid + 1;
37            }
38        }
39
40        return longestDuplicate;
41    }
42
43    private String checkDuplicateOfLength(String s, int length) {
44        int n = s.length();
45        Set<Long> seenHashes = new HashSet<>();
46
47        for (int startPos = 1; startPos + length - 1 <= n; startPos++) {
48            int endPos = startPos + length - 1;
49            long hashValue = prefixHash[endPos] - prefixHash[startPos - 1] * basePowers[endPos - startPos + 1];
50
51            if (seenHashes.contains(hashValue)) {
52                return s.substring(startPos - 1, endPos);
53            }
54            seenHashes.add(hashValue);
55        }
56
57        return "";
58    }
59}
60
1typedef unsigned long long ULL;
2
3class Solution {
4public:
5    ULL powers[30010];
6    ULL prefixHash[30010];
7
8    string longestDupSubstring(string s) {
9        const int BASE = 131;
10        int n = s.size();
11
12        // Precompute powers and prefix hash
13        powers[0] = 1;
14        for (int i = 0; i < n; ++i) {
15            powers[i + 1] = powers[i] * BASE;
16            prefixHash[i + 1] = prefixHash[i] * BASE + s[i];
17        }
18
19        // Binary search template
20        int left = 1;
21        int right = n - 1;
22        int firstTrueIndex = -1;
23        string longestDuplicate = "";
24
25        // Find first length where no duplicate exists
26        while (left <= right) {
27            int mid = left + (right - left) / 2;
28            string duplicateFound = findDuplicateOfLength(s, mid);
29
30            if (duplicateFound.empty()) {
31                // No duplicate at this length (feasible = true)
32                firstTrueIndex = mid;
33                right = mid - 1;
34            } else {
35                // Duplicate exists, record it and search for longer
36                longestDuplicate = duplicateFound;
37                left = mid + 1;
38            }
39        }
40
41        return longestDuplicate;
42    }
43
44private:
45    string findDuplicateOfLength(string& s, int targetLength) {
46        int n = s.size();
47        unordered_set<ULL> seenHashes;
48
49        for (int startIdx = 1; startIdx + targetLength - 1 <= n; ++startIdx) {
50            int endIdx = startIdx + targetLength - 1;
51            ULL currentHash = prefixHash[endIdx] - prefixHash[startIdx - 1] * powers[endIdx - startIdx + 1];
52
53            if (seenHashes.count(currentHash)) {
54                return s.substr(startIdx - 1, targetLength);
55            }
56            seenHashes.insert(currentHash);
57        }
58
59        return "";
60    }
61};
62
1type ULL = bigint;
2
3const powers: ULL[] = new Array(30010);
4const prefixHash: ULL[] = new Array(30010);
5
6function longestDupSubstring(s: string): string {
7    const BASE: bigint = 131n;
8    const n: number = s.length;
9
10    // Precompute powers and prefix hash
11    powers[0] = 1n;
12    for (let i = 0; i < n; ++i) {
13        powers[i + 1] = powers[i] * BASE;
14        prefixHash[i + 1] = prefixHash[i] * BASE + BigInt(s.charCodeAt(i));
15    }
16
17    // Binary search template
18    let left = 1;
19    let right = n - 1;
20    let firstTrueIndex = -1;
21    let longestDuplicate = "";
22
23    // Find first length where no duplicate exists
24    while (left <= right) {
25        const mid = Math.floor((left + right) / 2);
26        const duplicateFound = findDuplicateOfLength(s, mid);
27
28        if (duplicateFound === "") {
29            // No duplicate at this length (feasible = true)
30            firstTrueIndex = mid;
31            right = mid - 1;
32        } else {
33            // Duplicate exists, record it and search for longer
34            longestDuplicate = duplicateFound;
35            left = mid + 1;
36        }
37    }
38
39    return longestDuplicate;
40}
41
42function findDuplicateOfLength(s: string, targetLength: number): string {
43    const n: number = s.length;
44    const seenHashes: Set<ULL> = new Set();
45
46    for (let startIdx = 1; startIdx + targetLength - 1 <= n; ++startIdx) {
47        const endIdx: number = startIdx + targetLength - 1;
48        const currentHash: ULL = prefixHash[endIdx] - prefixHash[startIdx - 1] * powers[endIdx - startIdx + 1];
49
50        if (seenHashes.has(currentHash)) {
51            return s.substring(startIdx - 1, startIdx - 1 + targetLength);
52        }
53        seenHashes.add(currentHash);
54    }
55
56    return "";
57}
58

Time and Space Complexity

Time Complexity: O(n² * log n)

The algorithm uses binary search to find the longest duplicate substring. The binary search runs for O(log n) iterations where n is the length of the string. In each iteration, the check function is called.

The check function iterates through all possible substrings of length l, which takes O(n - l + 1) ≈ O(n) iterations. For each substring, it performs:

  • String slicing s[i : i + l] which takes O(l) time
  • Set membership check and insertion which takes O(l) time on average (due to string hashing)

Therefore, the check function has time complexity O(n * l). In the worst case, l can be up to n/2 (when the longest duplicate substring is half the string length), making each check call O(n²) in the worst case.

Overall time complexity: O(log n) * O(n²) = O(n² * log n)

Space Complexity: O(n²)

The space complexity is dominated by the vis set in the check function. In the worst case:

  • The set can store up to O(n) different substrings
  • Each substring can have length up to O(n)
  • Therefore, the total space used can be O(n²)

Additionally, the ans variable stores a string of length up to O(n), but this doesn't change the overall space complexity.

Overall space complexity: O(n²)

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Using Wrong Binary Search Template Variant

A common mistake is using while left < right with left = mid when a duplicate is found:

# WRONG - different template variant
while left < right:
    mid = (left + right + 1) >> 1
    if has_duplicate(mid):
        left = mid
    else:
        right = mid - 1
return result

Correct template:

first_true_index = -1
while left <= right:
    mid = (left + right) // 2
    if feasible(mid):  # no duplicate exists
        first_true_index = mid
        right = mid - 1
    else:  # duplicate exists, record and search longer
        longest_duplicate = has_duplicate_of_length(mid)
        left = mid + 1
return longest_duplicate

For "find maximum length" problems, invert the feasible condition to find "first length where no duplicate exists".

2. Memory Limit Exceeded with Large Strings

Storing full substring strings in a set consumes excessive memory. Use rolling hash (Rabin-Karp algorithm) instead.

3. Off-by-One Error in Substring Extraction

Using range(n - length) instead of range(n - length + 1) misses the last valid substring.

for start_idx in range(string_length - length + 1):  # Correct

4. Hash Collision Issues (When Using Rolling Hash)

Hash collisions can lead to false positives. Perform secondary verification or use multiple hash functions when a hash match is found.

Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

Which of the following is a min heap?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More