3076. Shortest Uncommon Substring in an Array

MediumTrieArrayHash TableString
Leetcode Link

Problem Description

You are provided with an array called arr where each element is a non-empty string. The size of the array is denoted as n. The goal is to identify for each element in arr, the shortest substring that does not appear in any other strings within the array. If there are several such substrings, the lexicographically smallest one should be chosen. If no such unique substring exists for an element, an empty string should be the output for that element. The result should be returned as an array answer with the same size as arr, where answer[i] contains the shortest unique substring for arr[i].

To grasp what is being asked, one needs to understand basic string operations:

  1. A substring is a contiguous sequence of characters within a string.
  2. To be lexicographically smaller means to come first in alphabetical order; for example, 'abc' is lexicographically smaller than 'acd'.

Thus, the problem is essentially about finding a part of each string that is distinctive to that string alone when compared to the entire array of strings.

Intuition

The direct approach to this problem is to enumerate substrings of each string systematically and check their uniqueness. To achieve this, for every string arr[i] in the input array, we generate all possible substrings beginning with the shortest length (1 character) and increasing in size. For each substring sub of arr[i], we check whether sub is contained in any other string in the array.

We have to do this methodically:

  1. Fixed Length Enumeration: Start with the shortest possible substrings and increase their length incrementally. This ensures that the first unique substring we find will also be the shortest for arr[i].

  2. Checking Uniqueness: For each potential substring, verify that it does not occur in any other strings except arr[i]. If it is unique, we immediately stop the search for this string and move to the next one since this is the shortest unique substring for arr[i].

  3. Lexicographical Order: By starting with the shortest substrings and examining each possible substring from left to right, we guarantee that the first match will be lexicographically the smallest since we are essentially exploring them in "dictionary" order.

Implementing this is quite straightforward, through nested loops, as reflected in the provided Python code. In the first loop, we go through each string arr[i], then we enumerate substring lengths j, and for each length, we slide over the string with another loop starting at position l to get sub. With a few if-conditions and an all-encompassing loop, we ensure all conditions are met before setting the unique substring into our answer array.

It is worth noting that this approach is not the most efficient in terms of time complexity, especially for large arrays or strings. However, the problem statement seems to suggest that the array and strings are of reasonable length, allowing for a brute-force enumeration to be a viable solution.

Learn more about Trie patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

Which of the following shows the order of node visit in a Breadth-first Search?

Solution Approach

The solution to this problem uses brute force enumeration, which means trying out all the possibilities systematically until the correct answer is found. In terms of algorithmic patterns, this is a straightforward approach without involving any advanced data structures or optimizations. The code executes a series of nested loops to examine each substring and verify its uniqueness among all strings in the input array.

Here's how the algorithm is implemented step by step:

  1. Initialize an answer list ans with empty strings for each element in arr. This list will eventually hold the shortest unique substrings for each string in arr.

  2. Start with the outermost loop that goes through each string s in arr using its index i. This allows us to keep track of the current string and its corresponding position in the answer array.

  3. For each string s, determine its length m. Enumerate the length j of all possible substrings starting from 1 up to m inclusive. This is done using a loop for the length of the substring, ensuring we start with the shortest length and move to the longer ones.

  4. For each substring length j, use another loop to iterate over all possible starting positions l in string s. A substring sub is formed from string s starting at position l and spanning j characters.

  5. Within the innermost loop, check the following conditions for each sub:

    • If sub is lexigraphically smaller than the current answer for s in ans[i] or if ans[i] is still an empty string, proceed to the next condition.
    • Verify that sub is not a substring of any other string t in arr, except the current string s. Use the built-in Python not in operator to check if sub is not part of string t.
  6. If sub passes these conditions, it means sub is unique and lexicographically smaller than any previously considered valid substrings. Therefore, update ans[i] with this sub.

  7. Once a valid sub is found and ans[i] is updated, there is no need to consider longer substrings for the current string s. So, break the loop that is iterating over the length j to move to the next string s in arr.

  8. After all strings have been processed, the answer list ans is complete with the shortest unique substrings and is returned.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Is the following code DFS or BFS?

1void search(Node root) {
2  if (!root) return;
3  visit(root);
4  root.visited = true;
5  for (Node node in root.adjacent) {
6    if (!node.visited) {
7      search(node);
8    }
9  }
10}

Example Walkthrough

Let's illustrate the solution approach with a small example. Consider the array arr = ["abc", "bca", "dac"].

  1. Initialize an empty answer list ans with the same size as arr. In this case, ans = ["", "", ""].

  2. Start an outer loop over the elements of arr. We first look at arr[0] which is "abc".

  3. For string "abc", with length m = 3, consider all possible substrings. We start with length j = 1.

  4. Investigate all starting positions l of substrings of "abc" with length j = 1:

    • The first substring is "a". It appears in "dac" as well, so it is not unique.
    • The next substring is "b", which also appears in "bca".
    • The last substring is "c", which appears in "bca" as well.
  5. Increase j to 2 since no unique substring was found with length 1. Check all substrings with j = 2:

    • Substring "ab" is not in "bca" or "dac", so it is unique.
    • Even though we could check "bc", we found the shortest and lexicographically smallest unique substring "ab". Update ans[0] to "ab".
  6. Move to the next string in arr, which is "bca".

  7. Repeat the same process: check all unique substrings of "bca" starting with j = 1.

    • "b" isn't unique as it appears in "abc"; same for "c".
    • "a" does not appear in the other strings, so it is unique.
  8. Update ans[1] to "a" since we found the shortest unique substring for the second string in arr.

  9. Apply the same approach to "dac":

    • "d" is unique since it doesn't appear in "abc" or "bca".
  10. Now we can update ans[2] to "d".

The process is done, and we've populated our answer list: ans = ["ab", "a", "d"] which contains, for each string in arr, the shortest unique substring that doesn't appear in any other string in arr.

This example has demonstrated the effectiveness of the brute force enumeration approach in finding the shortest unique substrings.

Solution Implementation

1from typing import List
2
3class Solution:
4    def shortestSubstrings(self, strings: List[str]) -> List[str]:
5        # Initialize the answer list with empty strings for each input string
6        shortest_substrings = [""] * len(strings)
7
8        # Iterate over each string and its index in the list
9        for index, string in enumerate(strings):
10            string_length = len(string)
11
12            # Iterate over all possible substring lengths
13            for length in range(1, string_length + 1):
14                # Iterate over each starting position for the substring
15                for start_pos in range(string_length - length + 1):
16                    # Extract the substring
17                    substring = string[start_pos : start_pos + length]
18
19                    # Check if this substring is a valid candidate:
20                    # - Either it's the first such substring found or
21                    # - It's lexicographically smaller than the previously found substring
22                    # - And it's not a substring of any of the other input strings
23                    if not shortest_substrings[index] or shortest_substrings[index] > substring:
24                        if all(substring not in other_string for other_index, other_string in enumerate(strings) if other_index != index):
25                            shortest_substrings[index] = substring
26
27                # Move onto the next string if we found a valid substring
28                if shortest_substrings[index]:
29                    break
30
31        # Return the list of shortest substrings
32        return shortest_substrings
33
1class Solution {
2
3    /**
4     * Finds the shortest substring for each string in an array
5     * such that the substring is not present in any other string in the array.
6     *
7     * @param strings An array of strings to find substrings from.
8     * @return An array containing the shortest unique substrings.
9     */
10    public String[] shortestSubstrings(String[] strings) {
11        int arrayLength = strings.length;
12        String[] result = new String[arrayLength];
13        // Initialize result array with empty strings
14        Arrays.fill(result, "");
15
16        // Iterate through each string in the array
17        for (int i = 0; i < arrayLength; ++i) {
18            int stringLength = strings[i].length();
19
20            // Try all possible substring lengths starting from 1
21            for (int len = 1; len <= stringLength && result[i].isEmpty(); ++len) {
22                // Try all possible starting positions for the substring of length 'len'
23                for (int start = 0; start <= stringLength - len; ++start) {
24                    // Extract the substring
25                    String substr = strings[i].substring(start, start + len);
26
27                    // Check if we already found a shorter unique substring
28                    if (result[i].isEmpty() || substr.compareTo(result[i]) < 0) {
29                        boolean isUnique = true;
30
31                        // Check if the substring exists in any other string
32                        for (int k = 0; k < arrayLength && isUnique; ++k) {
33                            if (k != i && strings[k].contains(substr)) {
34                                isUnique = false;
35                            }
36                        }
37
38                        // If the substring is unique, update the result
39                        if (isUnique) {
40                            result[i] = substr;
41                        }
42                    }
43                }
44            }
45        }
46        return result;
47    }
48}
49
1#include <vector>
2#include <string>
3
4class Solution {
5public:
6    vector<string> shortestSubstrings(vector<string>& strings) {
7        int numberOfStrings = strings.size(); // Get the number of strings in the array
8        vector<string> answers(numberOfStrings); // Initialize the answer vector with the same size
9
10        // Loop through each string in the array of strings
11        for (int i = 0; i < numberOfStrings; ++i) {
12            int stringLength = strings[i].size(); // Get the length of the current string
13
14            // Iterate over all possible substring lengths starting from 1 to the size of the string
15            for (int length = 1; length <= stringLength && answers[i].empty(); ++length) {
16
17                // Try all substrings of the current length
18                for (int startPosition = 0; startPosition <= stringLength - length; ++startPosition) {
19                    string substring = strings[i].substr(startPosition, length); // Extract the substring
20
21                    // If the answer for the current string is not found or the current substring is lexicographically smaller
22                    if (answers[i].empty() || substring < answers[i]) {
23                        bool isUnique = true; // Flag to check if the substring is unique among all strings
24
25                        // Check if the substring appears in any other string
26                        for (int k = 0; k < numberOfStrings && isUnique; ++k) {
27                            if (k != i && strings[k].find(substring) != string::npos) {
28                                isUnique = false; // The substring is not unique as it is found in another string
29                            }
30                        }
31
32                        // If the substring is unique, update the answer for the current string
33                        if (isUnique) {
34                            answers[i] = substring;
35                        }
36                    }
37                }
38            }
39        }
40
41        return answers; // Return the vector containing the shortest unique substrings
42    }
43};
44
1function shortestSubstrings(arr: string[]): string[] {
2    const arrayLength: number = arr.length; // Length of the input array.
3    const result: string[] = Array(arrayLength).fill(''); // Placeholder for the shortest unique substrings.
4
5    // Iterate over each string in the input array.
6    for (let i = 0; i < arrayLength; ++i) {
7        const stringLength: number = arr[i].length; // Length of the current string.
8
9        // Iterate over all possible substring lengths.
10        for (let currentLength = 1; currentLength <= stringLength && result[i] === ''; ++currentLength) {
11            // Iterate over all possible starting indices for the current substring length.
12            for (let startIdx = 0; startIdx <= stringLength - currentLength; ++startIdx) {
13                const substring: string = arr[i].slice(startIdx, startIdx + currentLength); // Extract the substring.
14
15                // If we haven't found a substring yet or the current one is lexicographically smaller.
16                if (result[i] === '' || substring.localeCompare(result[i]) < 0) {
17                    let isUnique: boolean = true; // Flag to check if the substring is unique.
18                  
19                    // Check the uniqueness of the substring across all strings in the array.
20                    for (let k = 0; k < arrayLength && isUnique; ++k) {
21                        if (k !== i && arr[k].includes(substring)) {
22                            isUnique = false; // Substring is not unique.
23                        }
24                    }
25
26                    // If the substring is unique, set it as the result for the current string.
27                    if (isUnique) {
28                        result[i] = substring;
29                    }
30                }
31            }
32        }
33    }
34
35    return result; // Return the array containing the shortest unique substrings.
36}
37
Not Sure What to Study? Take the 2-min Quiz:

Which data structure is used to implement priority queue?

Time and Space Complexity

Time Complexity

The time complexity of the provided code can be elaborated as follows:

  • We have n strings in our array arr, where n is the length of the array.

  • For each string s of maximum length m, we are looking at all possible substrings. The total number of such substrings that can be formed from a string of length m is m * (m + 1) / 2, which is derived from the sum of the series 1 + 2 + 3 + ... + m.

  • For each substring sub, we check if it exists in any of the other strings with n - 1 comparisons and each comparison could take up to m operations in the worst case.

  • Combining the above points, we have n (for each string) times m * (m + 1) / 2 (for all substrings in a particular string) times n - 1 (for checking each substring against all other strings) times m(for the comparison operation), yielding a worst case complexity of O(n * (m^2) * (n-1) * m) => O(n^2 * m^4).

Space Complexity

The space complexity can be determined as follows:

  • The code uses an extra list ans to store the shortest unique substrings, which will be the same size as the input array arr. Hence, the space taken by this list is O(n).

  • Within each loop iteration, a temporary substring sub is created, which, in the worst case, has the length m. Since these strings are not stored together and only exist temporarily one at a time, the space for this is O(m).

  • Considering the fact that the space required for storing the input array arr is not included in the space complexity of the algorithm (as it is given), the significant extra space used is for the ans list and the temporary substrings, leading to a total space complexity of O(n + m), which is O(m) since the problem constraints guarantee m <= 20, and thus m will not exceed n.

Therefore, given that n is the length of the input array of strings and m is the maximum length of a single string within the array (m <= 20), the time complexity of the code is O(n^2 * m^4) and the space complexity is O(m).

Learn more about how to find time and space complexity quickly using problem constraints.

Fast Track Your Learning with Our Quick Skills Quiz:

What is the running time of the following code?

1int sqrt(int n) {
2  for (int guess = 1; guess * guess <= n; guess++) {
3    if (guess * guess == n) {
4      return guess;
5    }
6  }
7  return -1;
8}

Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫