1044. Longest Duplicate Substring
Problem Description
The problem requires us to find the longest duplicated substring within a given string s
; these are substrings that appear more than once. A substring is considered a consecutive sequence of characters within the string. The key challenge here is handling the potential overlap of occurrences—this means that the same characters can be part of different duplicated substrings as long as those substrings start at different positions in s
.
For example, in the string "banana", the substring "ana" is duplicated with overlapping occurrences: "banana" contains "ana" starting at the second character and again starting at the fourth character.
If no such substring exists, our function should return an empty string. The desired solution can be any one of the longest duplicated substrings if there are multiple with the same maximum length.
Intuition
The intuition behind the solution involves a [binary search](/problems/binary-search-speedrun)
algorithm combined with a rolling hash
technique or a hash set
to quickly check for duplicate substrings.
Binary search is used to reduce the search space for the length of the longest duplicated substring. Instead of checking every possible length, we can start in the middle of the possible range of lengths and then increase or decrease our search based on whether we find a duplicate of that length.
Hash sets are used to store previously seen substrings of a certain length. As we increase or decrease the length of substrings we are looking at, we can add these to the set or check against it to see if we have seen them before. If a substring is already in the set, it means we've found a duplicate.
The process works as follows:
- We define
left
as the minimum length of the substring andright
as the maximum possible length of a duplicated substring, which isn - 1
(wheren
is the length of the input string). - We use binary search to find the largest
mid
value such thatmid
is the length of a possible duplicated substring. We check if there is a duplicate substring of this length using the helper functioncheck
. - The
check
function iterates through all possible substrings of lengthl
(wherel
is the current midpoint of the binary search), adding them to a hash set or returning one immediately if a duplicate is found.
In this solution, we leverage the efficiency of hash sets and the power of binary search to arrive at an efficient solution.
Learn more about Binary Search and Sliding Window patterns.
Solution Approach
The solution provided is a classic application of binary search with a rolling hash or a hash set to efficiently find the longest duplicated substring. Below is an explanation of the code implementation:
-
- Instead of checking all possible lengths, binary search is used to find the length of the longest duplicate substring. We initialize
left
as 0 andright
asn - 1
, wheren
is the length of the input strings
. - Within a loop, we calculate
mid
as the average ofleft
andright
, rounded up. The>> 1
is a bitwise operation that effectively divides by 2. We are rounding up to ensure the search space reduces on each iteration. - We check if there is a duplicate with the length
mid
using a helper function. If a duplicate exists,left
is set tomid
. If not,right
is set tomid - 1
. - This process continues until
left < right
is no longer true.
- Instead of checking all possible lengths, binary search is used to find the length of the longest duplicate substring. We initialize
-
Helper Function - check:
- This function takes a length
l
and checks if any substring of this length occurs more than once. - We use of a
set
to store substrings hashes, which helps in constant-time lookups to check for duplicates. - The function iterates over all possible substrings of length
l
, creating a substringt
by slicing the input strings
from indexi
to indexi + l
. - If
t
is already in thevis
, we have found our duplicate and return it immediately. Otherwise, we addt
tovis
and continue checking the next substrings. - If no duplicate is found, an empty string is returned.
- This function takes a length
-
Maintaining the Longest Duplicate Substring:
- We maintain the variable
ans
to store the longest duplicate substring found so far. - After each successful check, we update
ans
if the length of the found duplicatet
is greater than the length ofans
.
- We maintain the variable
-
Efficiency:
- By using binary search, we reduce the potential search space from
n(n-1)/2
tolog(n)
, which is a significant improvement, particularly for larger strings. - The use of a
set
to store hashes implies anO(1)
search time for duplicates. However, the creation of substrings and their hashes, done once per substring per iteration, has its own cost.
- By using binary search, we reduce the potential search space from
-
Termination and Result:
- The loop terminates when the binary search space collapses, meaning
left
is equal toright
, which also implies we have either found the longest duplicate substring or determined that no duplicate exists. - The final result is returned as
ans
, which is the longest duplicate found during the binary search process.
- The loop terminates when the binary search space collapses, meaning
This approach is efficient for large strings because it avoids inspecting every possible substring directly. Instead, it quickly narrows down the range of possible lengths and checks for duplicates using a HashSet, considerably reducing the time complexity.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's take a small example to illustrate the solution approach. Consider the string "axaxa":
-
Initial Setup:
- We initialize
left
to 0 andright
to 4 (since the length of "axaxa" is 5, and the max length of a duplicate substring can ben - 1
).
- We initialize
-
First Iteration of Binary Search:
- Calculate
mid
as (0 + 4) / 2, which is 2. - Check for duplicates of length
mid
(2). Here, we find that "ax" is a substring starting at index 0, and it also appears starting at index 2.
- Calculate
-
Update Binary Search Bounds:
- Since we found a duplicate, we update
left
tomid
, making it 2. We keepright
at 4.
- Since we found a duplicate, we update
-
Second Iteration of Binary Search:
- Calculate the new
mid
as (2 + 4) / 2, which is 3. - Check for duplicates of length 3. We see "axa" starting at index 0 and again at index 2, so a duplicate is found.
- Calculate the new
-
Keep Updating Bounds:
- We update
left
to the newmid
, makingleft
now 3.Right
remains 4.
- We update
-
Third Iteration of Binary Search:
- Calculate
mid
as (3 + 4) / 2, which gets rounded down to 3 (since we need integers for substrings lengths). - Since
mid
is still 3 and we've already found a duplicate for length 3, the bounds don't need to be changed.
- Calculate
-
No Further Progress:
- As
left
andright
are no longer converging, the loop will terminate.
- As
-
Termination and Result:
- In this case, the binary search would stop here as it cannot progress further and we have found that the longest duplicated substring is "axa".
By employing binary search, we quickly identify the longest duplicate without checking every length. We start from the midpoint and either increase or decrease depending on the search results. Thus, the longest duplicated substring in "axaxa" is "axa".
Solution Implementation
1class Solution:
2 def longestDupSubstring(self, s: str) -> str:
3 # Helper function to check if a duplicate substring of given length exists
4 def has_duplicate(length):
5 seen = set()
6 for i in range(n - length + 1):
7 # Extract a substring of given length
8 substring = s[i : i + length]
9 # If this substring has already been seen, return it as a duplicate
10 if substring in seen:
11 return substring
12 # If not seen, add this substring to the set of seen substrings
13 seen.add(substring)
14 # Return an empty string if no duplicate is found
15 return ''
16
17 # Length of input string
18 n = len(s)
19 # Initialize binary search bounds
20 left, right = 0, n
21 # variable to store the longest duplicate substring found so far
22 longest_dup = ''
23
24 # Perform binary search to find the length of the longest duplicate substring
25 while left < right:
26 # Calculate mid-point
27 mid = (left + right + 1) // 2
28 # Use the helper function to check for a duplicate of the current mid length
29 current_dup = has_duplicate(mid)
30 # Keep the longest duplicate seen so far
31 longest_dup = current_dup or longest_dup
32
33 # If a duplicate of the current length exists, search in the upper half
34 if current_dup:
35 left = mid
36 # If no duplicate of the current length exists, search in the lower half
37 else:
38 right = mid - 1
39
40 # Return the longest duplicate substring found
41 return longest_dup
42
1import java.util.HashSet;
2import java.util.Set;
3
4public class Solution {
5 // Precomputed hashes and powers of the base
6 private long[] powers;
7 private long[] hashes;
8
9 // Finds the longest duplicate substring in a given string
10 public String longestDupSubstring(String s) {
11 int base = 131; // Base value for polynomial hash calculation
12 int n = s.length(); // Length of the input string
13 powers = new long[n + 10];
14 hashes = new long[n + 10];
15 powers[0] = 1;
16
17 // Precompute the powers and hashes
18 for (int i = 0; i < n; ++i) {
19 powers[i + 1] = powers[i] * base;
20 hashes[i + 1] = hashes[i] * base + s.charAt(i);
21 }
22
23 String longestDuplicate = ""; // Store the longest duplicate substring
24 int leftBound = 0, rightBound = n; // Define search bounds
25
26 // Perform binary search on substring length
27 while (leftBound < rightBound) {
28 int mid = (leftBound + rightBound + 1) >>> 1; // Middle point (length of substring)
29 String duplicate = checkForDuplicate(s, mid);
30 if (!duplicate.isEmpty()) {
31 leftBound = mid; // If a duplicate is found, search in the upper half
32 longestDuplicate = duplicate;
33 } else {
34 rightBound = mid - 1; // Otherwise, search in the lower half
35 }
36 }
37
38 return longestDuplicate; // Return the longest duplicate substring found
39 }
40
41 // Checks for a duplicate substring of a given length
42 private String checkForDuplicate(String s, int len) {
43 int n = s.length();
44 Set<Long> seenHashes = new HashSet<>(); // Set to store previously encountered hashes
45
46 // Iterate over each possible substring of the given length
47 for (int i = 1; i + len - 1 <= n; ++i) {
48 int j = i + len - 1;
49 // Compute the hash for the current substring
50 long currentHash = hashes[j] - hashes[i - 1] * powers[j - i + 1];
51 if (seenHashes.contains(currentHash)) {
52 // If the hash is already in the set, we found a duplicate
53 return s.substring(i - 1, j);
54 }
55 // Add the current hash to the set
56 seenHashes.add(currentHash);
57 }
58 // If no duplicate is found, return an empty string
59 return "";
60 }
61}
62
1#include <string>
2#include <unordered_set>
3using namespace std;
4
5typedef unsigned long long ull;
6
7class Solution {
8public:
9 ull power[30010];
10 ull hash[30010];
11
12 // Function to find the longest duplicated substring in the given string s.
13 string longestDupSubstring(string s) {
14 int base = 131; // Base used for polynomial rolling hash function.
15 int n = s.size();
16 // Pre-computing the powers of base.
17 power[0] = 1;
18 for (int i = 0; i < n; ++i) {
19 power[i + 1] = power[i] * base;
20 // Calculating hash values for the prefix ending at current character.
21 hash[i + 1] = hash[i] * base + s[i];
22 }
23 // Binary search on the length of the duplicated substring.
24 int left = 0, right = n;
25 string result = "";
26 while (left < right) {
27 // Trying out the middle length in the binary search.
28 int mid = (left + right + 1) >> 1;
29 // Checking if there's a duplicated substring of length 'mid'.
30 string candidate = check(s, mid);
31 if (candidate.empty())
32 right = mid - 1; // If not found, look for shorter substrings.
33 else {
34 left = mid; // If found, look for longer substrings.
35 result = candidate;
36 }
37 }
38 return result;
39 }
40
41private:
42 // Check function to find a duplicated substring of given length.
43 string check(const string& s, int len) {
44 int n = s.size();
45 unordered_set<ull> visited; // Hashes of substrings we have seen.
46 for (int i = 1; i + len - 1 <= n; ++i) {
47 int j = i + len - 1;
48 // Compute the hash of the current substring using a sliding window.
49 ull current_hash = hash[j] - hash[i - 1] * power[j - i + 1];
50 // Check if the current substring's hash has already been seen.
51 if (visited.count(current_hash))
52 return s.substr(i - 1, len); // If so, return the duplicated substring.
53 visited.insert(current_hash);
54 }
55 return ""; // No duplicated substring of this length exists.
56 }
57};
58
1// Define a base for the polynomial rolling hash function
2const base = 131;
3
4// Arrays for storing power values of the base and hash values of the prefixes
5let power: bigint[] = [];
6let hash: bigint[] = [];
7
8// Function to pre-compute the powers of the base up to the given length
9function computePowers(length: number): void {
10 power[0] = BigInt(1);
11 for (let i = 0; i < length; ++i) {
12 power[i + 1] = power[i] * BigInt(base);
13 }
14}
15
16// Function to pre-compute the hash values of the prefixes of the string
17function computeHashes(s: string): void {
18 const n = s.length;
19 hash[0] = BigInt(0);
20 for (let i = 0; i < n; ++i) {
21 hash[i + 1] = hash[i] * BigInt(base) + BigInt(s.charCodeAt(i));
22 }
23}
24
25// Function to find the longest duplicated substring in the given string
26function longestDupSubstring(s: string): string {
27 const n = s.length;
28
29 // Pre-compute powers and hashes
30 computePowers(n);
31 computeHashes(s);
32
33 // Binary search for the longest duplicated substring length
34 let left = 0, right = n;
35 let result = "";
36
37 while (left < right) {
38 const mid = Math.floor((left + right + 1) / 2);
39 const candidate = check(s, mid);
40
41 if (candidate) {
42 left = mid;
43 result = candidate;
44 } else {
45 right = mid - 1;
46 }
47 }
48
49 return result;
50}
51
52// Function to check for a duplicated substring of the given length
53function check(s: string, len: number): string | null {
54 const n = s.length;
55 const visited = new Set<bigint>();
56
57 for (let i = 1; i + len - 1 <= n; ++i) {
58 const j = i + len - 1;
59 const currentHash = hash[j] - hash[i - 1] * power[len];
60
61 if (visited.has(currentHash)) {
62 return s.substring(i - 1, i - 1 + len);
63 }
64
65 visited.add(currentHash);
66 }
67
68 return null;
69}
70
71// Example usage
72const inputString = "banana";
73const result = longestDupSubstring(inputString);
74console.log(result); // Outputs the longest duplicated substring, if found.
75
Time and Space Complexity
Time Complexity
The provided code uses a binary search combined with a rolling hash (or in this case, a direct string check) for finding the longest duplicated substring in a given string. The main while
loop (which is effectively the binary search) will run for O(log n)
iterations, where n
is the length of the input string s
.
Inside the binary search, there's a call to the check
function which runs for each possible substring length determined by mid
. The check
function iterates for each possible starting position of a substring (n - l + 1
times, with l
being the length of the current substring we're checking). For each iteration, it checks if that substring has already been seen (O(1)
operation for a hash set lookup on average), and if not, it adds the substring to the set (O(l)
operation for hashing the substring). Therefore, the time complexity of each check
call is O((n - l + 1) * l)
.
Since the outer loop runs O(log n)
times and the check
function is O((n - l + 1) * l)
for each iteration, and l
ranges from 0 to n
, the overall time complexity is O(n log n * l)
, which can also be simplified to O(n^2 log n)
as the worst-case scenario when l
is close to n
.
Space Complexity
The space complexity is dominated by the vis
set, which at most will store n
substrings of length l
. Since Python strings are immutable and slicing a string creates a new string, the space required to store each individual substring is O(l)
, where l
can be at most n
.
Hence, in the worst case, the space complexity is O(n^2)
, occurring when l
is close to n
and we store all possible substrings of that length.
Learn more about how to find time and space complexity quickly using problem constraints.
Which of these pictures shows the visit order of a depth-first search?
Recommended Readings
Binary Search Speedrun For each of the Speedrun questions you will be given a binary search related problem and a corresponding multiple choice question The multiple choice questions are related to the techniques and template s introduced in the binary search section It's recommended that you have gone through at
https algomonster s3 us east 2 amazonaws com cover_photos stack svg Sliding Window Maximum Monotonic Stack We have an array and a sliding window defined by a start index and an end index The sliding window moves from left of the array to right There are always k elements in
LeetCode Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we
Want a Structured Path to Master System Design Too? Don’t Miss This!