1408. String Matching in an Array

EasyArrayStringString Matching

Problem Description

You are given an array of strings called words. Your task is to find all strings in this array that appear as a substring within at least one other string in the same array.

A substring is a contiguous sequence of characters within a string. For example, "abc" is a substring of "zabcd", but "acd" is not.

The goal is to return a list containing all strings from words that can be found inside another string from the same array. The strings in your result can be returned in any order.

For example, if words = ["mass", "as", "hero", "superhero"]:

"as" is a substring of "mass"
"hero" is a substring of "superhero"
So the output would be ["as", "hero"]

The solution iterates through each word in the array and checks if it appears as a substring in any other word (excluding itself). If a word is found to be a substring of another word, it gets added to the answer list.

The implementation uses a nested approach where for each word at index i, it checks all other words at different indices j (where i != j) to see if the current word appears within them using Python's in operator for substring checking.

Quick Interview Experience

Help others by sharing your interview experience

Have you seen this problem before?

Intuition

The key insight here is that we need to identify which strings are "contained" within other strings. Since we're looking for substrings, the most straightforward approach is to check each string against all other strings in the array.

Think of it this way: for each word, we want to answer the question "Does this word appear inside any other word in the array?" If the answer is yes, then this word should be included in our result.

The natural approach that comes to mind is:

Pick a word from the array
Check if this word appears as a substring in any of the other words
If it does, add it to our answer

Why does this work? Because we're essentially performing an exhaustive search - we're not missing any potential substring relationships by checking every possible pair.

The condition i != j is crucial because we don't want to compare a word with itself (a word is always a substring of itself, but that's not what the problem is asking for).

Using Python's in operator makes the substring check elegant and simple - s in t directly tells us if string s is a substring of string t. The any() function helps us determine if at least one other word contains our current word as a substring, which is exactly what we need to decide whether to include it in our answer.

This brute force approach works well for this problem because we need to examine all possible pairs anyway to ensure we don't miss any substring relationships.

Solution Approach

Following the brute force enumeration strategy, we iterate through each string in the words array and check if it appears as a substring in any other string.

Here's the step-by-step implementation:

Initialize an empty result list: We create ans = [] to store all strings that are substrings of other strings.
Enumerate through all strings: We use enumerate(words) to get both the index i and the string s for each word in the array. The index is crucial to avoid comparing a string with itself.
Check substring relationship: For each string s at index i, we need to check if it appears in any other string in the array. This is accomplished using:
```
any(i != j and s in t for j, t in enumerate(words))
```
This expression does the following:
- Iterates through all strings again with index j and string t
- Checks two conditions:
  - i != j: Ensures we don't compare the string with itself
  - s in t: Checks if string s is a substring of string t
- The any() function returns True if at least one string t (where j != i) contains s as a substring
Add to result: If the any() function returns True, meaning the current string s is found as a substring in at least one other string, we append it to our answer list:
```
ans.append(s)
```
Return the result: After checking all strings, we return the ans list containing all strings that are substrings of other strings.

The time complexity is O(n² × m) where n is the number of words and m is the average length of the strings, since we check each pair of strings and the substring check takes O(m) time. The space complexity is O(k) where k is the number of strings that are substrings of others.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a small example with words = ["abc", "ab", "c", "abcd"].

Step 1: Initialize result list

ans = []

Step 2: Check first word "abc" (index 0)

Compare with "ab" (index 1): Is "abc" in "ab"? No
Compare with "c" (index 2): Is "abc" in "c"? No
Compare with "abcd" (index 3): Is "abc" in "abcd"? Yes!
Since we found a match, add "abc" to result: ans = ["abc"]

Step 3: Check second word "ab" (index 1)

Compare with "abc" (index 0): Is "ab" in "abc"? Yes!
Since we found a match, add "ab" to result: ans = ["abc", "ab"]

Step 4: Check third word "c" (index 2)

Compare with "abc" (index 0): Is "c" in "abc"? Yes!
Since we found a match, add "c" to result: ans = ["abc", "ab", "c"]

Step 5: Check fourth word "abcd" (index 3)

Compare with "abc" (index 0): Is "abcd" in "abc"? No
Compare with "ab" (index 1): Is "abcd" in "ab"? No
Compare with "c" (index 2): Is "abcd" in "c"? No
No matches found, so "abcd" is not added to result

Final Result: ["abc", "ab", "c"]

All three strings "abc", "ab", and "c" appear as substrings in at least one other string in the array. The string "abcd" is the only one that doesn't appear as a substring in any other string.

Solution Implementation

1class Solution:
2    def stringMatching(self, words: List[str]) -> List[str]:
3        """
4        Find all strings in the array that are substrings of another string.
5      
6        Args:
7            words: List of strings to check
8          
9        Returns:
10            List of strings that are substrings of at least one other string in the array
11        """
12        result = []
13      
14        # Iterate through each word with its index
15        for current_index, current_word in enumerate(words):
16            # Check if current word is a substring of any other word in the list
17            is_substring = False
18            for other_index, other_word in enumerate(words):
19                # Skip comparing the word with itself
20                if current_index != other_index:
21                    # Check if current word is a substring of other word
22                    if current_word in other_word:
23                        is_substring = True
24                        break
25          
26            # If current word is a substring of at least one other word, add to result
27            if is_substring:
28                result.append(current_word)
29              
30        return result
31

1class Solution {
2    /**
3     * Find all strings that are substrings of another string in the array.
4     * 
5     * @param words Array of strings to check
6     * @return List of strings that appear as substrings in other strings
7     */
8    public List<String> stringMatching(String[] words) {
9        // Initialize result list to store substring matches
10        List<String> result = new ArrayList<>();
11      
12        // Get the total number of words in the array
13        int wordCount = words.length;
14      
15        // Check each word to see if it's a substring of any other word
16        for (int currentIndex = 0; currentIndex < wordCount; ++currentIndex) {
17            // Compare current word with all other words in the array
18            for (int compareIndex = 0; compareIndex < wordCount; ++compareIndex) {
19                // Skip if comparing the same word with itself
20                // Check if current word is a substring of the compared word
21                if (currentIndex != compareIndex && words[compareIndex].contains(words[currentIndex])) {
22                    // Add to result if it's a substring of another word
23                    result.add(words[currentIndex]);
24                    // Break inner loop since we only need to find one match
25                    break;
26                }
27            }
28        }
29      
30        return result;
31    }
32}
33

1class Solution {
2public:
3    vector<string> stringMatching(vector<string>& words) {
4        // Result vector to store all substring words
5        vector<string> result;
6      
7        // Get the total number of words in the input vector
8        int wordCount = words.size();
9      
10        // Iterate through each word as a potential substring
11        for (int i = 0; i < wordCount; ++i) {
12            // Check if current word is a substring of any other word
13            for (int j = 0; j < wordCount; ++j) {
14                // Skip if comparing the same word with itself
15                // Check if words[i] is a substring of words[j]
16                if (i != j && words[j].find(words[i]) != string::npos) {
17                    // Found words[i] as substring, add to result
18                    result.push_back(words[i]);
19                    // Break inner loop as we only need to add once
20                    break;
21                }
22            }
23        }
24      
25        return result;
26    }
27};
28

1/**
2 * Finds all strings in the array that are substrings of another string in the array
3 * @param words - Array of strings to check
4 * @returns Array of strings that are substrings of other strings in the input array
5 */
6function stringMatching(words: string[]): string[] {
7    // Initialize result array to store substring matches
8    const result: string[] = [];
9  
10    // Get the total number of words
11    const wordCount: number = words.length;
12  
13    // Iterate through each word as a potential substring
14    for (let i = 0; i < wordCount; ++i) {
15        // Check if current word is a substring of any other word
16        for (let j = 0; j < wordCount; ++j) {
17            // Skip if comparing the same word with itself
18            // Add to result if current word is found within another word
19            if (i !== j && words[j].includes(words[i])) {
20                result.push(words[i]);
21                // Break inner loop once a match is found to avoid duplicates
22                break;
23            }
24        }
25    }
26  
27    return result;
28}
29

Time and Space Complexity

Time Complexity: O(n² × m) where n is the number of words and m is the average length of the words.

The outer loop iterates through all n words. For each word, the any() function checks against all other words (up to n-1 comparisons). Each substring check using the in operator takes O(m) time in the worst case, where m is the average length of the strings being compared. Therefore, the total time complexity is O(n × n × m) = O(n² × m).

Since the reference answer states O(n³), this suggests that n represents the total length of all strings combined (or treats the length of individual strings as proportional to n). In this interpretation, if we consider n as the total character count across all words, then the number of words would be at most n, and each substring operation would be O(n), giving us O(n × n × n) = O(n³).

Space Complexity: O(n) where n is either the number of words or the total length of the output list.

The space is primarily used by the ans list to store the results. In the worst case, all words except one could be substrings of another word, so the output list could contain up to n-1 elements. The enumerate function and loop variables use O(1) additional space. Therefore, the space complexity is O(n).

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Not Handling Duplicate Strings in the Array

One common pitfall is when the input array contains duplicate strings. If we have words = ["abc", "abc", "xyzabc"], a naive implementation might incorrectly identify "abc" as a substring of itself (the other "abc" in the array).

Why this happens: When checking if a string is a substring of others, we only verify i != j to avoid self-comparison. However, if the same string appears multiple times at different indices, the condition i != j would pass, and "abc" would match with its duplicate.

Solution: While the current implementation actually handles this correctly (since we want to find strings that appear as substrings in any other string, including duplicates), you should be aware that:

If duplicates exist and a string only appears as a substring of its own duplicates, it will still be included in the result
If you need to exclude such cases, you'd need to deduplicate the array first or add additional logic

2. Inefficient Repeated Substring Checks

The current implementation might check the same substring relationship multiple times unnecessarily, especially when dealing with larger arrays.

Why this happens: The nested loop structure checks every pair of strings, which can be redundant when we've already found that a string is a substring of another.

Solution: Use early termination with a break statement (as shown in the provided code) or use a set to track already-found substrings:

def stringMatching(self, words: List[str]) -> List[str]:
    result = set()  # Use set to avoid duplicates
  
    for i in range(len(words)):
        for j in range(len(words)):
            if i != j and words[i] in words[j]:
                result.add(words[i])
                break  # Early termination once found
  
    return list(result)

3. Not Considering Empty Strings or Single Character Edge Cases

The solution might behave unexpectedly with edge cases like empty strings ["", "abc", "def"] or single characters ["a", "aa", "aaa"].

Why this happens: An empty string is technically a substring of every string, and single characters might have special substring relationships that aren't immediately obvious.

Solution: Add validation for edge cases if needed:

def stringMatching(self, words: List[str]) -> List[str]:
    result = []
  
    for i, current_word in enumerate(words):
        # Skip empty strings if needed
        if not current_word:
            continue
          
        for j, other_word in enumerate(words):
            if i != j and current_word in other_word:
                result.append(current_word)
                break
  
    return result

Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

Which of the following shows the order of node visit in a Breadth-first Search?

I don't know