Facebook Pixel

1813. Sentence Similarity III

Problem Description

You are given two strings sentence1 and sentence2, each representing a sentence composed of words. A sentence is a list of words that are separated by a single space with no leading or trailing spaces. Each word consists of only uppercase and lowercase English characters.

Two sentences are considered similar if you can insert an arbitrary sentence (possibly empty) inside one of these sentences to make them equal. The key requirement is that the inserted sentence must be separated from existing words by spaces.

For example:

  • sentence1 = "Hello Jane" and sentence2 = "Hello my name is Jane" are similar because you can insert "my name is" between "Hello" and "Jane" in sentence1 to make them equal.
  • sentence1 = "Frog cool" and sentence2 = "Frogs are cool" are not similar. Even though you could think of inserting "s are" into sentence1, this would require modifying the word "Frog" to "Frogs", which violates the rule that inserted text must be separated by spaces.

The task is to determine if sentence1 and sentence2 are similar according to this definition. Return true if they are similar, otherwise return false.

The solution uses a two-pointer approach. After splitting both sentences into word arrays, it checks how many words match from the beginning and from the end of both sentences. If the total number of matching words from both ends covers the entire shorter sentence, then the sentences are similar (since the remaining words in the longer sentence could be the "inserted" portion).

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key insight is that if two sentences are similar, one sentence can be obtained from the other by inserting a continuous block of words at some position. This means the beginning and/or ending portions of both sentences must match.

Think about it this way: if we can insert words into the shorter sentence to make it equal to the longer sentence, then the shorter sentence must be a "subsequence" of the longer sentence where the matching parts are either at the beginning, at the end, or split between both ends.

For instance, if we have "A B C D E" and "A B E", we can see that "A B" matches from the start and "E" matches from the end. The words "C D" in the middle of the longer sentence are what we would need to insert into the shorter sentence between "B" and "E".

This leads us to a two-pointer strategy:

  1. Count how many words match from the beginning of both sentences (let's call this i)
  2. Count how many words match from the end of both sentences (let's call this j)
  3. If i + j >= n (where n is the length of the shorter sentence), then the sentences are similar

Why does i + j >= n work? Because if the sum of matching words from both ends covers or exceeds the total words in the shorter sentence, it means all words in the shorter sentence have a corresponding match in the longer sentence, with possibly some words in between that could be the "inserted" portion. The >= handles the case where the matching portions might overlap, which would happen when the sentences are identical or when the insertion point is at the very beginning or end.

Learn more about Two Pointers patterns.

Solution Approach

The implementation follows a two-pointer approach to check if the sentences are similar:

Step 1: Parse and Prepare

  • Split both sentences into word arrays using split(): words1 and words2
  • Get the lengths m and n of the two word arrays
  • To simplify the logic, ensure words1 is always the longer array by swapping if necessary (if m < n, swap the arrays and their lengths)

Step 2: Match from the Beginning

  • Initialize pointer i = 0
  • While i < n and words1[i] == words2[i], increment i
  • This counts how many words match from the start of both sentences

Step 3: Match from the End

  • Initialize pointer j = 0
  • While j < n and words1[m - 1 - j] == words2[n - 1 - j], increment j
  • This counts how many words match from the end of both sentences
  • Note: We use m - 1 - j and n - 1 - j to access elements from the end

Step 4: Check Similarity Condition

  • Return i + j >= n
  • If the total number of matching words from both ends covers the entire shorter sentence, then the sentences are similar
  • The >= handles cases where the matching portions might overlap (when sentences are identical or when the insertion is at the very beginning or end)

Example Walkthrough:

  • sentence1 = "Hello Jane", sentence2 = "Hello my name is Jane"
  • After splitting: words1 = ["Hello", "Jane"], words2 = ["Hello", "my", "name", "is", "Jane"]
  • After ensuring longer array: words1 = ["Hello", "my", "name", "is", "Jane"], words2 = ["Hello", "Jane"]
  • Matching from start: i = 1 (only "Hello" matches)
  • Matching from end: j = 1 (only "Jane" matches)
  • Check: i + j = 2 >= 2 (length of shorter sentence), so return true

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a concrete example to understand how the solution works.

Example:

  • sentence1 = "My name is Haley"
  • sentence2 = "My Haley"

Step 1: Parse and Prepare

  • Split into words:
    • words1 = ["My", "name", "is", "Haley"] (length m = 4)
    • words2 = ["My", "Haley"] (length n = 2)
  • Since m > n, no swap needed (words1 remains the longer array)

Step 2: Match from the Beginning

  • Start with i = 0
  • Compare words1[0] with words2[0]: "My" == "My" βœ“
  • Increment i to 1
  • Compare words1[1] with words2[1]: "name" != "Haley" βœ—
  • Stop here, so i = 1

Step 3: Match from the End

  • Start with j = 0
  • Compare words1[3] with words2[1]: "Haley" == "Haley" βœ“
  • Increment j to 1
  • Compare words1[2] with words2[0]: "is" != "My" βœ—
  • Stop here, so j = 1

Step 4: Check Similarity

  • Total matches: i + j = 1 + 1 = 2
  • Length of shorter sentence: n = 2
  • Since 2 >= 2, return true

This makes sense because we can insert "name is" between "My" and "Haley" in sentence2 to get sentence1. The algorithm correctly identifies that all words in the shorter sentence have matching positions in the longer sentence.

Solution Implementation

1class Solution:
2    def areSentencesSimilar(self, sentence1: str, sentence2: str) -> bool:
3        """
4        Check if two sentences are similar by determining if one can be obtained 
5        from the other by inserting a sentence in the middle.
6      
7        Args:
8            sentence1: First sentence as a string
9            sentence2: Second sentence as a string
10          
11        Returns:
12            True if sentences are similar, False otherwise
13        """
14        # Split sentences into word lists
15        words1 = sentence1.split()
16        words2 = sentence2.split()
17      
18        # Get lengths of both word lists
19        len1 = len(words1)
20        len2 = len(words2)
21      
22        # Ensure words1 is the longer list for easier comparison
23        if len1 < len2:
24            words1, words2 = words2, words1
25            len1, len2 = len2, len1
26      
27        # Count matching words from the beginning (prefix)
28        prefix_match_count = 0
29        while prefix_match_count < len2 and words1[prefix_match_count] == words2[prefix_match_count]:
30            prefix_match_count += 1
31      
32        # Count matching words from the end (suffix)
33        suffix_match_count = 0
34        while suffix_match_count < len2 and words1[len1 - 1 - suffix_match_count] == words2[len2 - 1 - suffix_match_count]:
35            suffix_match_count += 1
36      
37        # Check if the total matches cover the entire shorter sentence
38        # This means the shorter sentence can be formed by removing middle part of longer sentence
39        return prefix_match_count + suffix_match_count >= len2
40
1class Solution {
2    public boolean areSentencesSimilar(String sentence1, String sentence2) {
3        // Split both sentences into word arrays
4        String[] wordsArray1 = sentence1.split(" ");
5        String[] wordsArray2 = sentence2.split(" ");
6      
7        // Ensure wordsArray1 is the longer array for consistent comparison
8        if (wordsArray1.length < wordsArray2.length) {
9            String[] temp = wordsArray1;
10            wordsArray1 = wordsArray2;
11            wordsArray2 = temp;
12        }
13      
14        // Store lengths for easier access
15        int longerLength = wordsArray1.length;
16        int shorterLength = wordsArray2.length;
17      
18        // Count matching words from the beginning (prefix)
19        int prefixMatchCount = 0;
20        while (prefixMatchCount < shorterLength && 
21               wordsArray1[prefixMatchCount].equals(wordsArray2[prefixMatchCount])) {
22            prefixMatchCount++;
23        }
24      
25        // Count matching words from the end (suffix)
26        int suffixMatchCount = 0;
27        while (suffixMatchCount < shorterLength && 
28               wordsArray1[longerLength - 1 - suffixMatchCount].equals(
29                   wordsArray2[shorterLength - 1 - suffixMatchCount])) {
30            suffixMatchCount++;
31        }
32      
33        // Check if the total matched words cover the entire shorter sentence
34        // This means a sentence can be inserted in the middle of the longer one
35        return prefixMatchCount + suffixMatchCount >= shorterLength;
36    }
37}
38
1class Solution {
2public:
3    bool areSentencesSimilar(string sentence1, string sentence2) {
4        // Split both sentences into word vectors
5        vector<string> words1 = splitString(sentence1, ' ');
6        vector<string> words2 = splitString(sentence2, ' ');
7      
8        // Ensure words1 is the longer sentence for easier comparison
9        if (words1.size() < words2.size()) {
10            swap(words1, words2);
11        }
12      
13        int longerSize = words1.size();
14        int shorterSize = words2.size();
15      
16        // Count matching words from the beginning
17        int prefixMatchCount = 0;
18        while (prefixMatchCount < shorterSize && 
19               words1[prefixMatchCount] == words2[prefixMatchCount]) {
20            ++prefixMatchCount;
21        }
22      
23        // Count matching words from the end
24        int suffixMatchCount = 0;
25        while (suffixMatchCount < shorterSize && 
26               words1[longerSize - 1 - suffixMatchCount] == words2[shorterSize - 1 - suffixMatchCount]) {
27            ++suffixMatchCount;
28        }
29      
30        // Check if the shorter sentence can be formed by inserting a sentence
31        // in the middle of itself to get the longer sentence
32        // This happens when prefix + suffix matches cover all words in shorter sentence
33        return prefixMatchCount + suffixMatchCount >= shorterSize;
34    }
35
36private:
37    // Helper function to split a string by delimiter into a vector of words
38    vector<string> splitString(const string& str, char delimiter) {
39        stringstream stringStream(str);
40        string word;
41        vector<string> result;
42      
43        // Extract words separated by the delimiter
44        while (getline(stringStream, word, delimiter)) {
45            result.emplace_back(word);
46        }
47      
48        return result;
49    }
50};
51
1/**
2 * Determines if two sentences are similar by checking if one can be made equal to the other
3 * by inserting a sentence (possibly empty) in the middle.
4 * 
5 * @param sentence1 - First sentence string
6 * @param sentence2 - Second sentence string
7 * @returns true if sentences are similar, false otherwise
8 */
9function areSentencesSimilar(sentence1: string, sentence2: string): boolean {
10    // Split sentences into word arrays
11    const words1: string[] = sentence1.split(' ');
12    const words2: string[] = sentence2.split(' ');
13  
14    // Get the lengths of both word arrays
15    const length1: number = words1.length;
16    const length2: number = words2.length;
17
18    // Ensure words1 is the shorter array for consistent processing
19    if (length1 > length2) {
20        return areSentencesSimilar(sentence2, sentence1);
21    }
22
23    // Initialize pointers for matching from left and right
24    let leftMatchCount: number = 0;
25    let rightMatchCount: number = 0;
26  
27    // Iterate through the longer sentence to find matching prefixes and suffixes
28    for (let i = 0; i < length2; i++) {
29        // Count matching words from the left side
30        if (leftMatchCount === i && words1[i] === words2[i]) {
31            leftMatchCount++;
32        }
33      
34        // Count matching words from the right side
35        if (rightMatchCount === i && words2[length2 - i - 1] === words1[length1 - rightMatchCount - 1]) {
36            rightMatchCount++;
37        }
38    }
39
40    // Check if the total matched words cover the entire shorter sentence
41    return leftMatchCount + rightMatchCount >= length1;
42}
43

Time and Space Complexity

The time complexity is O(L), where L is the sum of the lengths of the two sentences. This is because:

  • Splitting both sentences takes O(L) time as we need to traverse through all characters to identify word boundaries
  • The two while loops together iterate through at most n words (the length of the shorter word list), and comparing words involves character-by-character comparison
  • In the worst case, we compare all characters in the sentences, which is bounded by O(L)

The space complexity is O(L), where L is the sum of the lengths of the two sentences. This is because:

  • The split() operation creates two new lists words1 and words2 that store all the words from both sentences
  • The total space needed to store these word lists is proportional to the total number of characters in both sentences
  • The additional variables m, n, i, and j only use O(1) extra space

Learn more about how to find time and space complexity quickly.

Common Pitfalls

Pitfall 1: Overcounting When Prefix and Suffix Overlap

The Problem: When the prefix and suffix matching regions overlap, we might count the same words twice. For example, if sentence1 = "A B C" and sentence2 = "A B C", the prefix match would count all 3 words, and the suffix match would also count all 3 words, giving us prefix_match_count + suffix_match_count = 6, which is greater than the actual length of 3.

Why This Isn't Actually a Problem: The current solution handles this correctly by checking prefix_match_count + suffix_match_count >= len2. The >= operator ensures that even if we overcount due to overlap, the condition still evaluates correctly. When sentences are identical or when the insertion point is at the very beginning or end, overlap is expected and handled properly.

Pitfall 2: Not Handling Empty Sentences

The Problem: If either sentence is empty or contains only whitespace, the split operation might produce unexpected results or the algorithm might not handle edge cases correctly.

Solution: Add explicit checks for empty or whitespace-only strings:

def areSentencesSimilar(self, sentence1: str, sentence2: str) -> bool:
    # Handle edge cases with empty or whitespace strings
    if not sentence1.strip() or not sentence2.strip():
        return sentence1.strip() == sentence2.strip()
  
    words1 = sentence1.split()
    words2 = sentence2.split()
    # ... rest of the code remains the same

Pitfall 3: Incorrect Index Calculation When Matching from End

The Problem: A common mistake is using incorrect indices when matching from the end. For instance, using words1[len1 - j] instead of words1[len1 - 1 - j] would cause an off-by-one error, potentially leading to index out of bounds or incorrect comparisons.

Solution: Always remember that for 0-indexed arrays:

  • The last element is at index length - 1
  • The j-th element from the end is at index length - 1 - j
# Correct way to access from the end
while suffix_match_count < len2 and words1[len1 - 1 - suffix_match_count] == words2[len2 - 1 - suffix_match_count]:
    suffix_match_count += 1

Pitfall 4: Not Ensuring Consistent Array Order

The Problem: If you forget to swap the arrays when len1 < len2, the algorithm logic breaks because it assumes words1 is always the longer (or equal length) array. This could lead to incorrect results or index out of bounds errors.

Solution: Always ensure the arrays are ordered consistently before processing:

# Essential swap to maintain the invariant that words1 is longer
if len1 < len2:
    words1, words2 = words2, words1
    len1, len2 = len2, len1

Pitfall 5: Misunderstanding the Problem Requirements

The Problem: A common misunderstanding is thinking that word modifications are allowed (like changing "Frog" to "Frogs"). The problem strictly requires that insertions must be complete words separated by spaces, not modifications to existing words.

Solution: The two-pointer approach naturally handles this correctly by only comparing complete words. No partial word matching or modification is performed - words must match exactly or they don't match at all.

Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

Which data structure is used to implement priority queue?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More