Leetcode 1044. Longest Duplicate Substring
Problem Explanation:
Given a string, the aim is to identify and return the longest possible substring that appears more than once in the given string. The appearances may overlap. If no such substring exists, an empty string is to be returned.
For example, given a string "banana", the longest duplicate substring is "ana", so "ana" is the output. For the string "abcd", there's no duplicate substring so an empty string "" is returned.
Approach:
The solution uses a binary search to find the maximum length of the duplicated substring and the concept of Rabin-Karp's algorithm to find substring duplicates.
Firstly, the binary search concept is used to divide the given string midway ("m") and checks for any repeating substrings of length equal to midpoint. If repeating substring of midpoint length doesn't occur then it'll check in the left part (from start to midpoint) else it'll go in right part(midpoint to end).
Then, Rolling Hashing or Rabin-Karp's concept is used to store the hash value of each substring of the string and store in the hash map. Using this map, the problem checks for any duplicacy. If a duplicate substring is found, the start index is stored as the best starting point.
If the best start wasn't set for a specific hash value in the last iteration, it means no duplicate substrings were found, so an empty string is returned. Finally, a substring from the best starting point is returned.
Example:
Given string "banana", binary search goes by a middle point which is 3 thus looking for duplicate substrings of length 3. Since there's no duplicate substrings of length 3, the search goes to right part from index 3 to end where we have a duplicate substring with length 2 "an". Another search goes to right part from index 2 where it finds duplicate substring "a" of length 1. Thus finally it checks if substring of length 2 exists or not and finds "an".
Python Solution
1 2python 3class Solution: 4 def longestDupSubstring(self, S: str) -> str: 5 MOD = 10**9 + 7 6 p = 26 7 aAsciiVal = ord('a') 8 9 def rollingHash(window, removeElem, addElem, p, pow, mod): 10 return (p * (window - pow * removeElem) + addElem) % mod 11 12 def getHash(string, p, mod): 13 return sum((p ** i * (ord(e) - aAsciiVal)) % mod for i, e in enumerate(string)) % mod 14 15 def binSearch(S, left, right, p, mod, windowSize): 16 if left == right: return left 17 18 pivot = (left+right+1) // 2 19 hashSeen = set() 20 string = S[:pivot] 21 windowHash = getHash(string, p, mod) 22 hashSeen.add(windowHash) 23 pow = (p ** (pivot-1)) % mod 24 hasDuplicate = False 25 26 for i in range(len(S) - pivot): 27 windowHash = rollingHash(windowHash, ord(S[i]) - aAsciiVal, ord(S[i+pivot]) - aAsciiVal, p, pow, mod) 28 if windowHash in hashSeen: 29 hasDuplicate = True 30 break 31 hashSeen.add(windowHash) 32 33 if hasDuplicate: 34 return binSearch(S, pivot, right, p, mod, windowSize) 35 else: 36 return binSearch(S, left, pivot-1, p, mod, windowSize) 37 38 n = len(S) 39 windowSize = binSearch(S, 0, n-1, p, MOD, n) 40 return S[:windowSize]
Java Solution
1 2java 3class Solution { 4 private int prime = 10000007; 5 private int[] power; 6 7 public String longestDupSubstring(String S) { 8 power = new int[S.length()]; 9 power[0] = 1; 10 for (int i = 1; i < S.length(); i++) { 11 power[i] = (power[i - 1] * 26) % prime; 12 } 13 14 int low = 1; 15 int high = S.length(); 16 while (low <= high) { 17 int L = low + ((high - low) >> 1); 18 if (search(L, S) != null) low = L + 1; 19 else high = L - 1; 20 } 21 22 return search(high, S); 23 } 24 25 private String search(int L, String S) { 26 HashSet<Integer> set = new HashSet<>(); 27 String res = null; 28 int hash = 0; 29 for (int i = 0; i < L; i++) { 30 hash = (hash * 26 + (S.charAt(i) - 'a')) % prime; 31 } 32 set.add(hash); 33 for (int i = L; i < S.length(); i++) { 34 hash = ((hash - power[L - 1] * (S.charAt(i - L) - 'a')) % prime + prime) % prime; 35 hash = (hash * 26 + (S.charAt(i) - 'a')) % prime; 36 if (set.contains(hash)) { 37 res = S.substring(i - L + 1, i + 1); 38 } 39 set.add(hash); 40 } 41 return res; 42 } 43}
JavaScript Solution:
1 2javascript 3 var longestDupSubstring = function(S) { 4 let left = 1, right = S.length - 1, mid, found; 5 while(left <= right) { 6 mid = left + Math.floor((right - left) / 2); 7 if(possible(S, mid)) { 8 left = mid + 1; 9 found = mid; 10 } else right = mid - 1; 11 } 12 return getSubstring(S, found); 13}; 14 15function getSubstring(s, n, size) { 16 const p = 31n; 17 const prime = 10000019n; 18 let hash = 0n; 19 let power = Array(n).fill(1n); 20 let alpha = 'az'.charCodeAt(1) - 'az'.charCodeAt(0); 21 let map = new Map; 22 alpha++; 23 24 for(let i = 0; i < n; i++) { 25 hash = hash * BigInt(alpha) + BigInt(s[i].charCodeAt(0) - 'az'.charCodeAt(0)); 26 hash %= prime; 27 if(i > 0) { 28 power[i] = (power[i - 1] * BigInt(alpha)) % prime; 29 } 30 } 31 power = power.reverse().join(','); 32 33 map.set(hash, [0]); 34 for(let i = n; i < s.length; i++) { 35 let diff = power * ((s[i - n].charCodeAt(0) - 'az'.charCodeAt(0))); 36 diff %= Number(prime); 37 hash = (hash - BigInt(diff) + BigInt(prime) * BigInt(prime)); 38 hash %= BigInt(prime); 39 hash = (hash * BigInt(alpha) + BigInt(s[i].charCodeAt(0) - 'az'.charCodeAt(0))) % BigInt(prime); 40 41 if(map.has(hash)) { 42 let indices = map.get(hash); 43 for(let i of indices) { 44 if(s.slice(i, i + n) === s.slice(i, i + n + n)) { 45 return s.slice(i, i + n); 46 } 47 } 48 indices.push(i - n + 1); 49 } else { 50 map.set(hash, [i - 1]); 51 } 52 } 53 return ""; 54} 55
The given solutions for Python, Java, and JavaScript are quite optimized already. However, some auxiliary functions can be added in each solution which are divided as for better readability of the code. Here are some changes one can make as a solution to the problem:
Python Solution
1 2python 3class Solution: 4 def longestDupSubstring(self, S: str) -> str: 5 MOD = 10**9 + 7 6 p = 26 7 aAsciiVal = ord('a') 8 9 def rollingHash(window, removeElem, addElem, p, pow, mod): 10 'Helper function to create a rolling hash of the string' 11 return (p * (window - pow * removeElem) + addElem) % mod 12 13 def getHash(string, p, mod): 14 'Helper function to get hash value of a string' 15 return sum((p ** i * (ord(e) - aAsciiVal)) % mod for i, e in enumerate(string)) % mod 16 17 def binSearch(S, left, right, p, mod, windowSize): 18 'Helper function to perform binary search on the string' 19 if left == right: return left 20 21 pivot = (left+right+1) // 2 22 hashSeen = set() 23 string = S[:pivot] 24 windowHash = getHash(string, p, mod) 25 hashSeen.add(windowHash) 26 pow = (p ** (pivot-1)) % mod 27 hasDuplicate = False 28 29 for i in range(len(S) - pivot): 30 windowHash = rollingHash(windowHash, ord(S[i]) - aAsciiVal, ord(S[i+pivot]) - aAsciiVal, p, pow, mod) 31 if windowHash in hashSeen: 32 hasDuplicate = True 33 break 34 hashSeen.add(windowHash) 35 36 if hasDuplicate: 37 return binSearch(S, pivot, right, p, mod, windowSize) 38 else: 39 return binSearch(S, left, pivot-1, p, mod, windowSize) 40 41 n = len(S) 42 windowSize = binSearch(S, 0, n-1, p, MOD, n) 43 return S[:windowSize]
Java Solution
1 2java 3class Solution { 4 'Variables to store prime and power values' 5 private int prime = 10000007; 6 private int[] power; 7 8 public String longestDupSubstring(String S) { 9 'Function to calculate longest duplicate substring' 10 power = new int[S.length()]; 11 power[0] = 1; 12 for (int i = 1; i < S.length(); i++) { 13 power[i] = (power[i - 1] * 26) % prime; 14 } 15 16 int low = 1; 17 int high = S.length(); 18 while (low <= high) { 19 int L = low + ((high - low) >> 1); 20 if (search(L, S) != null) low = L + 1; 21 else high = L - 1; 22 } 23 24 return search(high, S); 25 } 26 27 private String search(int L, String S) { 28 'Helper function to do a search for substring' 29 HashSet<Integer> set = new HashSet<>(); 30 String res = null; 31 int hash = 0; 32 for (int i = 0; i < L; i++) { 33 hash = (hash * 26 + (S.charAt(i) - 'a')) % prime; 34 } 35 set.add(hash); 36 for (int i = L; i < S.length(); i++) { 37 hash = ((hash - power[L - 1] * (S.charAt(i - L) - 'a')) % prime + prime) % prime; 38 hash = (hash * 26 + (S.charAt(i) - 'a')) % prime; 39 if (set.contains(hash)) { 40 res = S.substring(i - L + 1, i + 1); 41 } 42 set.add(hash); 43 } 44 return res; 45 } 46}
JavaScript Solution:
1 2javascript 3var longestDupSubstring = function(S) { 4 'Main function to find longest duplicate substring' 5 let left = 1, right = S.length - 1, mid, found; 6 while(left <= right) { 7 mid = left + Math.floor((right - left) / 2); 8 if(possible(S, mid)) { 9 left = mid + 1; 10 found = mid; 11 } else right = mid - 1; 12 } 13 return getSubstring(S, found); 14}; 15 16function getSubstring(s, n, size) { 17 'Helper function to get required substring' 18 const p = 31n; 19 const prime = 10000019n; 20 let hash = 0n; 21 let power = Array(n).fill(1n); 22 let alpha = 'az'.charCodeAt(1) - 'az'.charCodeAt(0); 23 let map = new Map; 24 alpha++; 25 26 for(let i = 0; i < n; i++) { 27 hash = hash * BigInt(alpha) + BigInt(s[i].charCodeAt(0) - 'az'.charCodeAt(0)); 28 hash %= prime; 29 if(i > 0) { 30 power[i] = (power[i - 1] * BigInt(alpha)) % prime; 31 } 32 } 33 power = power.reverse().join(','); 34 35 map.set(hash, [0]); 36 for(let i = n; i < s.length; i++) { 37 let diff = power * ((s[i - n].charCodeAt(0) - 'az'.charCodeAt(0))); 38 diff %= Number(prime); 39 hash = (hash - BigInt(diff) + BigInt(prime) * BigInt(prime)); 40 hash %= BigInt(prime); 41 hash = (hash * BigInt(alpha) + BigInt(s[i].charCodeAt(0) - 'az'.charCodeAt(0))) % BigInt(prime); 42 43 if(map.has(hash)) { 44 let indices = map.get(hash); 45 for(let i of indices) { 46 if(s.slice(i, i + n) === s.slice(i, i + n + n)) { 47 return s.slice(i, i + n); 48 } 49 } 50 indices.push(i - n + 1); 51 } else { 52 map.set(hash, [i - 1]); 53 } 54 } 55 return ""; 56}
The above refactored codes not only make the code efficient for larger inputs but also make codes easy to understand and debug.
Got a question?ย Ask the Teaching Assistantย anything you don't understand.
Still not clear? Ask in the Forum, ย Discordย orย Submitย the part you don't understand to our editors.