1044. Longest Duplicate Substring
Problem Description
You are given a string s and need to find the longest substring that appears at least twice in the string. These duplicate substrings can overlap with each other.
A duplicated substring is a contiguous sequence of characters from s that occurs 2 or more times anywhere in the string. For example, in the string "banana", the substring "ana" appears twice (at positions 1-3 and 3-5), and these occurrences overlap at position 3.
Your task is to return any duplicated substring that has the maximum possible length. If no substring appears more than once in s, return an empty string "".
The solution uses binary search on the length of the potential duplicate substring. The check function verifies if there exists a duplicate substring of a given length l by iterating through all possible substrings of that length and using a set to track which substrings have been seen. If a substring is encountered that's already in the set, it means we've found a duplicate.
The binary search works by:
- Setting the search range from 0 to the length of the string
- For each middle value
mid, checking if a duplicate substring of lengthmidexists - If found, searching for longer duplicates (moving
leftpointer up) - If not found, searching for shorter duplicates (moving
rightpointer down) - Keeping track of the longest valid duplicate found so far in
ans
Intuition
The key insight is that if a substring of length k appears twice in the string, then we know the answer is at least k characters long. However, there might be an even longer duplicate substring of length k+1, k+2, and so on. We want to find the maximum such length.
A brute force approach would check all possible substrings, but this would be inefficient. Instead, we can observe that the problem has a monotonic property: if there exists a duplicate substring of length k, then there must also exist duplicate substrings of all lengths from 1 to k-1. This is because any substring of a duplicate substring is also a duplicate substring.
This monotonic property suggests using binary search on the answer. We can search for the maximum length L such that there exists a duplicate substring of length L.
For each candidate length during binary search, we need to verify if any substring of that length appears twice. We can do this efficiently by:
- Generating all substrings of the given length
- Using a hash set to track which substrings we've seen
- If we encounter a substring that's already in the set, we've found a duplicate
The binary search narrows down the search space from O(n) possible lengths to O(log n) checks. For each check, we examine O(n) substrings, and comparing strings takes O(L) time where L is the current length being checked. This gives us a more efficient solution than checking all possible substring pairs.
The approach elegantly combines the efficiency of binary search with the simplicity of hash-based duplicate detection, making it practical even for longer strings.
Learn more about Binary Search and Sliding Window patterns.
Solution Approach
The solution implements a binary search combined with a hash set-based duplicate detection function.
Binary Search Setup:
- Initialize
left = 0andright = n(length of string) as the search boundaries for possible substring lengths - Use
ansto keep track of the longest duplicate substring found so far
Binary Search Logic:
The main loop continues while left < right:
- Calculate
mid = (left + right + 1) >> 1(equivalent to(left + right + 1) // 2)- The
+1ensures we round up to avoid infinite loops whenleftandrightare adjacent
- The
- Call
check(mid)to verify if a duplicate substring of lengthmidexists - Update the answer:
ans = t or anskeeps the non-empty result (either the new found substringtor the previousans) - Adjust search boundaries:
- If a duplicate of length
midis found, search for longer substrings:left = mid - Otherwise, search for shorter substrings:
right = mid - 1
- If a duplicate of length
The check Function:
This helper function determines if any substring of length l appears at least twice:
- Initialize an empty set
visto track seen substrings - Iterate through all possible starting positions from
0ton - l - For each position
i, extract substringt = s[i : i + l] - If
tis already invis, we found a duplicate - return it immediately - Otherwise, add
ttovisand continue - If no duplicate is found after checking all positions, return empty string
Time Complexity Analysis:
- Binary search performs
O(log n)iterations - Each
checkcall examinesO(n)substrings - String slicing and hashing takes
O(L)time for lengthL - Overall:
O(n * L * log n)whereLis the average substring length checked
The algorithm efficiently narrows down the search space using binary search while maintaining simplicity through straightforward string comparison using Python's built-in set operations.
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's trace through the algorithm with the string s = "banana".
Initial Setup:
- String: "banana" (length n = 6)
- Binary search range: left = 0, right = 6
- Answer: ans = ""
Iteration 1:
- mid = (0 + 6 + 1) >> 1 = 3
- Check if any substring of length 3 appears twice:
- i=0: "ban" → add to vis = {"ban"}
- i=1: "ana" → add to vis = {"ban", "ana"}
- i=2: "nan" → add to vis = {"ban", "ana", "nan"}
- i=3: "ana" → already in vis! Return "ana"
- Found duplicate "ana", so ans = "ana"
- Search for longer: left = 3, right = 6
Iteration 2:
- mid = (3 + 6 + 1) >> 1 = 5
- Check if any substring of length 5 appears twice:
- i=0: "banan" → add to vis = {"banan"}
- i=1: "anana" → add to vis = {"banan", "anana"}
- No duplicates found, return ""
- No duplicate found, search for shorter: left = 3, right = 4
Iteration 3:
- mid = (3 + 4 + 1) >> 1 = 4
- Check if any substring of length 4 appears twice:
- i=0: "bana" → add to vis = {"bana"}
- i=1: "anan" → add to vis = {"bana", "anan"}
- i=2: "nana" → add to vis = {"bana", "anan", "nana"}
- No duplicates found, return ""
- No duplicate found, search for shorter: left = 3, right = 3
Termination:
- left = right = 3, loop ends
- Return ans = "ana"
The algorithm correctly identifies "ana" as the longest duplicated substring. It appears at positions [1:4] and [3:6], with an overlap at position 3. Through binary search, we efficiently narrowed down from checking all possible lengths to finding the optimal length in just 3 iterations.
Solution Implementation
1class Solution:
2 def longestDupSubstring(self, s: str) -> str:
3 """
4 Find the longest duplicate substring in a given string.
5 Uses binary search on the length of substring.
6 """
7
8 def has_duplicate_of_length(length: int) -> str:
9 """Check if a duplicate substring of given length exists."""
10 seen_substrings = set()
11
12 for start_idx in range(len(s) - length + 1):
13 current_substring = s[start_idx : start_idx + length]
14 if current_substring in seen_substrings:
15 return current_substring
16 seen_substrings.add(current_substring)
17
18 return ''
19
20 def feasible(length: int) -> bool:
21 """
22 Returns True if NO duplicate of this length exists.
23 This inverts the condition so we can find "first true" = first length with no duplicate.
24 """
25 return has_duplicate_of_length(length) == ''
26
27 n = len(s)
28 left, right = 1, n - 1
29 first_true_index = -1
30 longest_duplicate = ''
31
32 # Binary search template: find first length where no duplicate exists
33 while left <= right:
34 mid = (left + right) // 2
35 if feasible(mid):
36 first_true_index = mid
37 right = mid - 1 # Search for smaller length with no duplicate
38 else:
39 # Duplicate exists at this length, record it and search longer
40 longest_duplicate = has_duplicate_of_length(mid)
41 left = mid + 1
42
43 return longest_duplicate
441class Solution {
2 private long[] basePowers;
3 private long[] prefixHash;
4
5 public String longestDupSubstring(String s) {
6 final int BASE = 131;
7 int n = s.length();
8
9 // Precompute powers and prefix hash
10 basePowers = new long[n + 10];
11 prefixHash = new long[n + 10];
12 basePowers[0] = 1;
13 for (int i = 0; i < n; i++) {
14 basePowers[i + 1] = basePowers[i] * BASE;
15 prefixHash[i + 1] = prefixHash[i] * BASE + s.charAt(i);
16 }
17
18 // Binary search template
19 int left = 1;
20 int right = n - 1;
21 int firstTrueIndex = -1;
22 String longestDuplicate = "";
23
24 // Find first length where no duplicate exists (feasible = true means no duplicate)
25 while (left <= right) {
26 int mid = left + (right - left) / 2;
27 String duplicateFound = checkDuplicateOfLength(s, mid);
28
29 if (duplicateFound.isEmpty()) {
30 // No duplicate at this length (feasible = true)
31 firstTrueIndex = mid;
32 right = mid - 1;
33 } else {
34 // Duplicate exists, record it and search for longer
35 longestDuplicate = duplicateFound;
36 left = mid + 1;
37 }
38 }
39
40 return longestDuplicate;
41 }
42
43 private String checkDuplicateOfLength(String s, int length) {
44 int n = s.length();
45 Set<Long> seenHashes = new HashSet<>();
46
47 for (int startPos = 1; startPos + length - 1 <= n; startPos++) {
48 int endPos = startPos + length - 1;
49 long hashValue = prefixHash[endPos] - prefixHash[startPos - 1] * basePowers[endPos - startPos + 1];
50
51 if (seenHashes.contains(hashValue)) {
52 return s.substring(startPos - 1, endPos);
53 }
54 seenHashes.add(hashValue);
55 }
56
57 return "";
58 }
59}
601typedef unsigned long long ULL;
2
3class Solution {
4public:
5 ULL powers[30010];
6 ULL prefixHash[30010];
7
8 string longestDupSubstring(string s) {
9 const int BASE = 131;
10 int n = s.size();
11
12 // Precompute powers and prefix hash
13 powers[0] = 1;
14 for (int i = 0; i < n; ++i) {
15 powers[i + 1] = powers[i] * BASE;
16 prefixHash[i + 1] = prefixHash[i] * BASE + s[i];
17 }
18
19 // Binary search template
20 int left = 1;
21 int right = n - 1;
22 int firstTrueIndex = -1;
23 string longestDuplicate = "";
24
25 // Find first length where no duplicate exists
26 while (left <= right) {
27 int mid = left + (right - left) / 2;
28 string duplicateFound = findDuplicateOfLength(s, mid);
29
30 if (duplicateFound.empty()) {
31 // No duplicate at this length (feasible = true)
32 firstTrueIndex = mid;
33 right = mid - 1;
34 } else {
35 // Duplicate exists, record it and search for longer
36 longestDuplicate = duplicateFound;
37 left = mid + 1;
38 }
39 }
40
41 return longestDuplicate;
42 }
43
44private:
45 string findDuplicateOfLength(string& s, int targetLength) {
46 int n = s.size();
47 unordered_set<ULL> seenHashes;
48
49 for (int startIdx = 1; startIdx + targetLength - 1 <= n; ++startIdx) {
50 int endIdx = startIdx + targetLength - 1;
51 ULL currentHash = prefixHash[endIdx] - prefixHash[startIdx - 1] * powers[endIdx - startIdx + 1];
52
53 if (seenHashes.count(currentHash)) {
54 return s.substr(startIdx - 1, targetLength);
55 }
56 seenHashes.insert(currentHash);
57 }
58
59 return "";
60 }
61};
621type ULL = bigint;
2
3const powers: ULL[] = new Array(30010);
4const prefixHash: ULL[] = new Array(30010);
5
6function longestDupSubstring(s: string): string {
7 const BASE: bigint = 131n;
8 const n: number = s.length;
9
10 // Precompute powers and prefix hash
11 powers[0] = 1n;
12 for (let i = 0; i < n; ++i) {
13 powers[i + 1] = powers[i] * BASE;
14 prefixHash[i + 1] = prefixHash[i] * BASE + BigInt(s.charCodeAt(i));
15 }
16
17 // Binary search template
18 let left = 1;
19 let right = n - 1;
20 let firstTrueIndex = -1;
21 let longestDuplicate = "";
22
23 // Find first length where no duplicate exists
24 while (left <= right) {
25 const mid = Math.floor((left + right) / 2);
26 const duplicateFound = findDuplicateOfLength(s, mid);
27
28 if (duplicateFound === "") {
29 // No duplicate at this length (feasible = true)
30 firstTrueIndex = mid;
31 right = mid - 1;
32 } else {
33 // Duplicate exists, record it and search for longer
34 longestDuplicate = duplicateFound;
35 left = mid + 1;
36 }
37 }
38
39 return longestDuplicate;
40}
41
42function findDuplicateOfLength(s: string, targetLength: number): string {
43 const n: number = s.length;
44 const seenHashes: Set<ULL> = new Set();
45
46 for (let startIdx = 1; startIdx + targetLength - 1 <= n; ++startIdx) {
47 const endIdx: number = startIdx + targetLength - 1;
48 const currentHash: ULL = prefixHash[endIdx] - prefixHash[startIdx - 1] * powers[endIdx - startIdx + 1];
49
50 if (seenHashes.has(currentHash)) {
51 return s.substring(startIdx - 1, startIdx - 1 + targetLength);
52 }
53 seenHashes.add(currentHash);
54 }
55
56 return "";
57}
58Time and Space Complexity
Time Complexity: O(n² * log n)
The algorithm uses binary search to find the longest duplicate substring. The binary search runs for O(log n) iterations where n is the length of the string. In each iteration, the check function is called.
The check function iterates through all possible substrings of length l, which takes O(n - l + 1) ≈ O(n) iterations. For each substring, it performs:
- String slicing
s[i : i + l]which takesO(l)time - Set membership check and insertion which takes
O(l)time on average (due to string hashing)
Therefore, the check function has time complexity O(n * l). In the worst case, l can be up to n/2 (when the longest duplicate substring is half the string length), making each check call O(n²) in the worst case.
Overall time complexity: O(log n) * O(n²) = O(n² * log n)
Space Complexity: O(n²)
The space complexity is dominated by the vis set in the check function. In the worst case:
- The set can store up to
O(n)different substrings - Each substring can have length up to
O(n) - Therefore, the total space used can be
O(n²)
Additionally, the ans variable stores a string of length up to O(n), but this doesn't change the overall space complexity.
Overall space complexity: O(n²)
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Using Wrong Binary Search Template Variant
A common mistake is using while left < right with left = mid when a duplicate is found:
# WRONG - different template variant while left < right: mid = (left + right + 1) >> 1 if has_duplicate(mid): left = mid else: right = mid - 1 return result
Correct template:
first_true_index = -1 while left <= right: mid = (left + right) // 2 if feasible(mid): # no duplicate exists first_true_index = mid right = mid - 1 else: # duplicate exists, record and search longer longest_duplicate = has_duplicate_of_length(mid) left = mid + 1 return longest_duplicate
For "find maximum length" problems, invert the feasible condition to find "first length where no duplicate exists".
2. Memory Limit Exceeded with Large Strings
Storing full substring strings in a set consumes excessive memory. Use rolling hash (Rabin-Karp algorithm) instead.
3. Off-by-One Error in Substring Extraction
Using range(n - length) instead of range(n - length + 1) misses the last valid substring.
for start_idx in range(string_length - length + 1): # Correct
4. Hash Collision Issues (When Using Rolling Hash)
Hash collisions can lead to false positives. Perform secondary verification or use multiple hash functions when a hash match is found.
Which of the following is a min heap? 
Recommended Readings
https assets algo monster cover_photos Binary_Search svg Binary Search Intuition Binary search is an efficient array search algorithm It works by narrowing down the search range by half each time If you have looked up a word in a physical dictionary you've already used binary search in real life Let's
https assets algo monster cover_photos stack svg Sliding Window Maximum Monotonic Stack We have an array and a sliding window defined by a start index and an end index The sliding window moves from left of the array to right There are always k elements in the window The window
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Want a Structured Path to Master System Design Too? Don’t Miss This!