Leetcode 1044. Longest Duplicate Substring

Problem Explanation:

Given a string, the aim is to identify and return the longest possible substring that appears more than once in the given string. The appearances may overlap. If no such substring exists, an empty string is to be returned.

For example, given a string "banana", the longest duplicate substring is "ana", so "ana" is the output. For the string "abcd", there's no duplicate substring so an empty string "" is returned.

Approach:

The solution uses a binary search to find the maximum length of the duplicated substring and the concept of Rabin-Karp's algorithm to find substring duplicates.

Firstly, the binary search concept is used to divide the given string midway ("m") and checks for any repeating substrings of length equal to midpoint. If repeating substring of midpoint length doesn't occur then it'll check in the left part (from start to midpoint) else it'll go in right part(midpoint to end).

Then, Rolling Hashing or Rabin-Karp's concept is used to store the hash value of each substring of the string and store in the hash map. Using this map, the problem checks for any duplicacy. If a duplicate substring is found, the start index is stored as the best starting point.

If the best start wasn't set for a specific hash value in the last iteration, it means no duplicate substrings were found, so an empty string is returned. Finally, a substring from the best starting point is returned.

Example:

Given string "banana", binary search goes by a middle point which is 3 thus looking for duplicate substrings of length 3. Since there's no duplicate substrings of length 3, the search goes to right part from index 3 to end where we have a duplicate substring with length 2 "an". Another search goes to right part from index 2 where it finds duplicate substring "a" of length 1. Thus finally it checks if substring of length 2 exists or not and finds "an".

Python Solution

1
2python
3class Solution:
4    def longestDupSubstring(self, S: str) -> str:
5        MOD = 10**9 + 7
6        p = 26
7        aAsciiVal = ord('a')
8
9        def rollingHash(window, removeElem, addElem, p, pow, mod):
10            return (p * (window - pow * removeElem) + addElem) % mod
11
12        def getHash(string, p, mod):
13            return sum((p ** i * (ord(e) - aAsciiVal)) % mod for i, e in enumerate(string)) % mod
14
15        def binSearch(S, left, right, p, mod, windowSize):
16            if left == right: return left
17
18            pivot = (left+right+1) // 2
19            hashSeen = set()
20            string = S[:pivot]
21            windowHash = getHash(string, p, mod)
22            hashSeen.add(windowHash)
23            pow = (p ** (pivot-1)) % mod
24            hasDuplicate = False
25
26            for i in range(len(S) - pivot):
27                windowHash = rollingHash(windowHash, ord(S[i]) - aAsciiVal, ord(S[i+pivot]) - aAsciiVal, p, pow, mod)
28                if windowHash in hashSeen:
29                    hasDuplicate = True
30                    break
31                hashSeen.add(windowHash)
32
33            if hasDuplicate:
34                return binSearch(S, pivot, right, p, mod, windowSize)
35            else:
36                return binSearch(S, left, pivot-1, p, mod, windowSize)
37
38        n = len(S)
39        windowSize = binSearch(S, 0, n-1, p, MOD, n)
40        return S[:windowSize]

Java Solution

1
2java
3class Solution {
4    private int prime = 10000007;
5    private int[] power;
6
7    public String longestDupSubstring(String S) {
8        power = new int[S.length()];
9        power[0] = 1;
10        for (int i = 1; i < S.length(); i++) {
11            power[i] = (power[i - 1] * 26) % prime;
12        }
13
14        int low = 1;
15        int high = S.length();
16        while (low <= high) {
17            int L = low + ((high - low) >> 1);
18            if (search(L, S) != null) low = L + 1;
19            else high = L - 1;
20        }
21
22        return search(high, S);
23    }
24
25    private String search(int L, String S) {
26        HashSet<Integer> set = new HashSet<>();
27        String res = null;
28        int hash = 0;
29        for (int i = 0; i < L; i++) {
30            hash = (hash * 26 + (S.charAt(i) - 'a')) % prime;
31        }
32        set.add(hash);
33        for (int i = L; i < S.length(); i++) {
34            hash = ((hash - power[L - 1] * (S.charAt(i - L) - 'a')) % prime + prime) % prime;
35            hash = (hash * 26 + (S.charAt(i) - 'a')) % prime;
36            if (set.contains(hash)) {
37                res = S.substring(i - L + 1, i + 1);
38            }
39            set.add(hash);
40        }
41        return res;
42    }
43}

JavaScript Solution:

1
2javascript
3 var longestDupSubstring = function(S) {
4    let left = 1, right = S.length - 1, mid, found;
5    while(left <= right) {
6        mid = left  + Math.floor((right - left) / 2);
7        if(possible(S, mid)) {
8            left = mid + 1;
9            found = mid;
10        } else right = mid - 1;
11    }
12    return getSubstring(S, found);
13};
14
15function getSubstring(s, n, size) {
16    const p = 31n;
17    const prime = 10000019n;
18    let hash = 0n;
19    let power = Array(n).fill(1n);
20    let alpha = 'az'.charCodeAt(1) - 'az'.charCodeAt(0);
21    let map = new Map;
22    alpha++;
23
24    for(let i = 0; i < n; i++) {
25        hash = hash * BigInt(alpha) + BigInt(s[i].charCodeAt(0) - 'az'.charCodeAt(0));
26        hash %= prime;
27        if(i > 0) {
28            power[i] = (power[i - 1] * BigInt(alpha)) % prime;
29        }
30    }
31    power = power.reverse().join(',');
32
33    map.set(hash, [0]);
34    for(let i = n; i < s.length; i++) {
35        let diff = power * ((s[i - n].charCodeAt(0) - 'az'.charCodeAt(0)));
36        diff %= Number(prime);
37        hash = (hash - BigInt(diff) + BigInt(prime) * BigInt(prime));
38        hash %= BigInt(prime);
39        hash = (hash * BigInt(alpha) + BigInt(s[i].charCodeAt(0) - 'az'.charCodeAt(0))) % BigInt(prime);
40    
41        if(map.has(hash)) {
42            let indices = map.get(hash);
43            for(let i of indices) {
44                if(s.slice(i, i + n) === s.slice(i, i + n + n)) {
45                    return s.slice(i, i + n);
46                }
47            }
48            indices.push(i - n + 1);
49        } else {
50            map.set(hash, [i - 1]);
51        }
52    }
53    return "";
54}
55

The given solutions for Python, Java, and JavaScript are quite optimized already. However, some auxiliary functions can be added in each solution which are divided as for better readability of the code. Here are some changes one can make as a solution to the problem:

Python Solution

1
2python
3class Solution:
4    def longestDupSubstring(self, S: str) -> str:
5        MOD = 10**9 + 7
6        p = 26
7        aAsciiVal = ord('a')
8
9        def rollingHash(window, removeElem, addElem, p, pow, mod):
10            'Helper function to create a rolling hash of the string'
11            return (p * (window - pow * removeElem) + addElem) % mod
12
13        def getHash(string, p, mod):
14            'Helper function to get hash value of a string'
15            return sum((p ** i * (ord(e) - aAsciiVal)) % mod for i, e in enumerate(string)) % mod
16
17        def binSearch(S, left, right, p, mod, windowSize):
18            'Helper function to perform binary search on the string'
19            if left == right: return left
20
21            pivot = (left+right+1) // 2
22            hashSeen = set()
23            string = S[:pivot]
24            windowHash = getHash(string, p, mod)
25            hashSeen.add(windowHash)
26            pow = (p ** (pivot-1)) % mod
27            hasDuplicate = False
28
29            for i in range(len(S) - pivot):
30                windowHash = rollingHash(windowHash, ord(S[i]) - aAsciiVal, ord(S[i+pivot]) - aAsciiVal, p, pow, mod)
31                if windowHash in hashSeen:
32                    hasDuplicate = True
33                    break
34                hashSeen.add(windowHash)
35
36            if hasDuplicate:
37                return binSearch(S, pivot, right, p, mod, windowSize)
38            else:
39                return binSearch(S, left, pivot-1, p, mod, windowSize)
40
41        n = len(S)
42        windowSize = binSearch(S, 0, n-1, p, MOD, n)
43        return S[:windowSize]

Java Solution

1
2java
3class Solution {
4    'Variables to store prime and power values'
5    private int prime = 10000007;
6    private int[] power;
7
8    public String longestDupSubstring(String S) {
9        'Function to calculate longest duplicate substring'
10        power = new int[S.length()];
11        power[0] = 1;
12        for (int i = 1; i < S.length(); i++) {
13            power[i] = (power[i - 1] * 26) % prime;
14        }
15
16        int low = 1;
17        int high = S.length();
18        while (low <= high) {
19            int L = low + ((high - low) >> 1);
20            if (search(L, S) != null) low = L + 1;
21            else high = L - 1;
22        }
23
24        return search(high, S);
25    }
26
27    private String search(int L, String S) {
28        'Helper function to do a search for substring'
29        HashSet<Integer> set = new HashSet<>();
30        String res = null;
31        int hash = 0;
32        for (int i = 0; i < L; i++) {
33            hash = (hash * 26 + (S.charAt(i) - 'a')) % prime;
34        }
35        set.add(hash);
36        for (int i = L; i < S.length(); i++) {
37            hash = ((hash - power[L - 1] * (S.charAt(i - L) - 'a')) % prime + prime) % prime;
38            hash = (hash * 26 + (S.charAt(i) - 'a')) % prime;
39            if (set.contains(hash)) {
40                res = S.substring(i - L + 1, i + 1);
41            }
42            set.add(hash);
43        }
44        return res;
45    }
46}

JavaScript Solution:

1
2javascript
3var longestDupSubstring = function(S) {
4    'Main function to find longest duplicate substring'
5    let left = 1, right = S.length - 1, mid, found;
6    while(left <= right) {
7        mid = left  + Math.floor((right - left) / 2);
8        if(possible(S, mid)) {
9            left = mid + 1;
10            found = mid;
11        } else right = mid - 1;
12    }
13    return getSubstring(S, found);
14};
15
16function getSubstring(s, n, size) {
17    'Helper function to get required substring'
18    const p = 31n;
19    const prime = 10000019n;
20    let hash = 0n;
21    let power = Array(n).fill(1n);
22    let alpha = 'az'.charCodeAt(1) - 'az'.charCodeAt(0);
23    let map = new Map;
24    alpha++;
25
26    for(let i = 0; i < n; i++) {
27        hash = hash * BigInt(alpha) + BigInt(s[i].charCodeAt(0) - 'az'.charCodeAt(0));
28        hash %= prime;
29        if(i > 0) {
30            power[i] = (power[i - 1] * BigInt(alpha)) % prime;
31        }
32    }
33    power = power.reverse().join(',');
34
35    map.set(hash, [0]);
36    for(let i = n; i < s.length; i++) {
37        let diff = power * ((s[i - n].charCodeAt(0) - 'az'.charCodeAt(0)));
38        diff %= Number(prime);
39        hash = (hash - BigInt(diff) + BigInt(prime) * BigInt(prime));
40        hash %= BigInt(prime);
41        hash = (hash * BigInt(alpha) + BigInt(s[i].charCodeAt(0) - 'az'.charCodeAt(0))) % BigInt(prime);
42    
43        if(map.has(hash)) {
44            let indices = map.get(hash);
45            for(let i of indices) {
46                if(s.slice(i, i + n) === s.slice(i, i + n + n)) {
47                    return s.slice(i, i + n);
48                }
49            }
50            indices.push(i - n + 1);
51        } else {
52            map.set(hash, [i - 1]);
53        }
54    }
55    return "";
56}

The above refactored codes not only make the code efficient for larger inputs but also make codes easy to understand and debug.


Got a question?ย Ask the Teaching Assistantย anything you don't understand.

Still not clear? Ask in the Forum, ย Discordย orย Submitย the part you don't understand to our editors.

โ†
โ†‘TA ๐Ÿ‘จโ€๐Ÿซ