1813. Sentence Similarity III


Problem Description

A sentence in this context is a collection of words that are separated by a single space and do not have any spaces at the beginning or end. These words consist solely of uppercase and lowercase English letters. Two sentences are described as "similar" if you can insert any arbitrary sentence (which might even be empty) into one of them such that both sentences are exactly the same in the end. This problem asks us to determine if two given sentences are similar according to this definition.

Intuition

To determine if two sentences are similar, we can look for a common starting sequence of words and a common ending sequence of words between them. If we have these sequences, whatever is left in the middle of one sentence can potentially be the insert needed to make the sentences similar.

We begin by splitting both sentences into two arrays of words. Then, we compare the words at the beginning of both arrays until we find a pair that doesn't match. We then do the same from the end of both arrays. If sufficiently many words from the beginning and end overlap (the sum of the number of matching words from the beginning and end is greater than or equal to the total words in the shorter sentence), it means all the words in the shorter sentence are included in the longer sentence in the right order, and the sentences are similar.

Let's break this down further:

  • We split the sentences into words and compare them from the start.
  • We keep a count of how many words are the same from the start.
  • Next, we compare the sentences from the end.
  • We then count how many words are the same from the end.
  • If the sum of these two counts is greater than or equal to the length of the shorter sentence, we can conclude that the sentences are indeed similar.

This approach works because we essentially check if the words of the shorter sentence can be found in the longer sentence in the same order, allowing for an arbitrary insertion in the longer one that doesn't disrupt the sequence of the common words.

Learn more about Two Pointers patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

What is the best way of checking if an element exists in an unsorted array once in terms of time complexity? Select the best that applies.

Solution Approach

The implementation of the solution uses a two-pointer technique and basic list operations provided by Python. The code first splits the given sentences into two lists (words1 and words2) using split(), which is a string method that divides a string into a list at each space.

With two pointers (i and j initialized to 0), the algorithm operates in two main stages:

  1. Check from the beginning of both lists for similar words. Increment pointer i as long as words1[i] is equal to words2[i]. This loop continues until a mismatch is encountered or the end of one of the lists is reached.

  2. Check from the end of both lists for similar words. Just like the first stage, increment pointer j as long as the last elements of both lists match (words1[m - 1 - j] is equal to words2[n - 1 - j]). The index calculation here is adjusted for zero-based indexing and goes in reverse because of the backwards check.

The lengths of the lists (m for words1 and n for words2) are taken into account, and if words1 is shorter than words2, they are swapped. This is to make sure that we are always trying to fit the shorter list into the longer one.

Finally, the condition is checked: if the sum of i and j is greater than or equal to n (the length of the shorter list after a potential swap), the sentences are determined to be similar and True is returned; else False is returned.

By using the two-pointer technique, we avoid unnecessary checks and manage to achieve the comparison with a linear time complexity relative to the length of the sentences.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Depth first search can be used to find whether two components in a graph are connected.

Example Walkthrough

To illustrate the solution approach, let's consider the following two sentences:

  • Sentence A: I have a fast car
  • Sentence B: I have a very fast car today

First, we need to split both sentences into arrays of words to compare them word by word. So, Sentence A becomes ["I", "have", "a", "fast", "car"] and Sentence B becomes ["I", "have", "a", "very", "fast", "car", "today"].

Next, the algorithm uses two pointers. We'll describe each step:

  1. From the Beginning: Starting with pointer i at index 0, we compare the words at the current index in both sentences.

    • words1[0] is I, and words2[0] is also I. Since they are the same, we increment i to 1.
    • This process repeats until words1[i] does not equal words2[i].
    • In our case, they match until i is 3 (pointing to the word fast), and at index 4 they diverge: words1[4] is car whereas words2[4] is very.
  2. From the End: Now, we initialize pointer j to 0, and we'll compare the words starting from the end of both sentences.

    • words1[4 - 0] ("car") is the same as words2[6 - 0] ("today"). It's not a match, so j remains 0 for now.
    • We then check the next word from the end, so we increment j to 1 and compare words1[4 - 1] ("fast") with words2[6 - 1] ("car").
    • They still don't match, so we increment j again.
    • Now, words1[4 - 2] ("a") matches with words2[6 - 2] ("fast"). There's still no match.
    • Continuing this process, we find that words1[4 - 3] ("have") matches with words2[6 - 3] ("very"), and so on. However, for this example, no other matches from the end will be found.

After this process, we have i equal to 3 (since 3 words matched from the start) and j equal to 0 (since no words matched from the end).

We compare the sum of i and j to the length of the shorter list, which is len(words1) or 5.

  • i + j is 3 + 0, which is 3.
  • Since 3 is less than 5, the condition i + j >= len(words1) is not met.

Therefore, the sentences are not similar according to the definition, and the algorithm would return False in this case. If instead, B was "I have a fast car today", the result would have been True, as all words in A could be found in B with the addition of "today" at the end.

Not Sure What to Study? Take the 2-min Quiz:

What is the best way of checking if an element exists in an unsorted array once in terms of time complexity? Select the best that applies.

Python Solution

1class Solution:
2    def are_sentences_similar(self, sentence1: str, sentence2: str) -> bool:
3        # Split the sentences into lists of words.
4        words1, words2 = sentence1.split(), sentence2.split()
5      
6        # Determine the length of each list.
7        len_words1, len_words2 = len(words1), len(words2)
8      
9        # Ensure that words1 is the longer list.
10        if len_words1 < len_words2:
11            words1, words2 = words2, words1
12            len_words1, len_words2 = len_words2, len_words1
13      
14        # Initialize two pointers to compare words from the beginning and end.
15        start_index = end_index = 0
16      
17        # Compare words from the beginning of both lists.
18        while start_index < len_words2 and words1[start_index] == words2[start_index]:
19            start_index += 1
20      
21        # Compare words from the end of both lists.
22        while end_index < len_words2 and words1[len_words1 - 1 - end_index] == words2[len_words2 - 1 - end_index]:
23            end_index += 1
24      
25        # Check if all words have been matched when appending the shorter list at any position in the longer list.
26        return start_index + end_index >= len_words2
27

Java Solution

1class Solution {
2    public boolean areSentencesSimilar(String sentence1, String sentence2) {
3        // Split both sentences into arrays of words
4        String[] words1 = sentence1.split(" ");
5        String[] words2 = sentence2.split(" ");
6
7        // Ensure words1 is the longer array
8        if (words1.length < words2.length) {
9            String[] temp = words1;
10            words1 = words2;
11            words2 = temp;
12        }
13
14        // Initialize lengths and indices
15        int length1 = words1.length; // Length of the longer array
16        int length2 = words2.length; // Length of the shorter array
17        int startMatchCount = 0; // Count how many words match from the beginning
18        int endMatchCount = 0; // Count how many words match from the end
19
20        // Count matching words from the start
21        while (startMatchCount < length2 && words1[startMatchCount].equals(words2[startMatchCount])) {
22            startMatchCount++;
23        }
24
25        // Count matching words from the end
26        while (endMatchCount < length2 && words1[length1 - 1 - endMatchCount].equals(words2[length2 - 1 - endMatchCount])) {
27            endMatchCount++;
28        }
29
30        // Check if the total matching words are at least the number of words in the shorter sentence
31        return startMatchCount + endMatchCount >= length2;
32    }
33}
34

C++ Solution

1#include <vector>
2#include <string>
3#include <sstream>
4#include <algorithm>
5
6class Solution {
7public:
8    // Determine if two sentences are similar according to the problem's definition.
9    bool areSentencesSimilar(std::string sentence1, std::string sentence2) {
10        // Split the sentences into words.
11        std::vector<std::string> words1 = split(sentence1, ' ');
12        std::vector<std::string> words2 = split(sentence2, ' ');
13
14        // Ensure that words1 is the longer sentence to simplify the logic.
15        if (words1.size() < words2.size()) {
16            std::swap(words1, words2);
17        }
18
19        int length1 = words1.size(); // Length of the longer sentence.
20        int length2 = words2.size(); // Length of the shorter sentence.
21
22        int prefixMatch = 0; // Number of matching words from the start.
23        int suffixMatch = 0; // Number of matching words from the end.
24
25        // Count the number of matching words from the start of both sentences.
26        while (prefixMatch < length2 && words1[prefixMatch] == words2[prefixMatch]) {
27            ++prefixMatch;
28        }
29
30        // Count the number of matching words from the end of both sentences.
31        while (suffixMatch < length2 && words1[length1 - 1 - suffixMatch] == words2[length2 - 1 - suffixMatch]) {
32            ++suffixMatch;
33        }
34
35        // If the sum of matches is greater than or equal to length2, the sentences are similar.
36        return prefixMatch + suffixMatch >= length2;
37    }
38
39private:
40    // Utility function to split a string by a delimiter, returning a vector of words.
41    std::vector<std::string> split(std::string& s, char delimiter) {
42        std::stringstream stream(s);
43        std::string item;
44        std::vector<std::string> result;
45
46        while (std::getline(stream, item, delimiter)) {
47            result.emplace_back(item);
48        }
49
50        return result;
51    }
52};
53

Typescript Solution

1function areSentencesSimilar(sentence1: string, sentence2: string): boolean {
2    // Split the sentences into arrays of words.
3    const words1 = sentence1.split(' ');
4    const words2 = sentence2.split(' ');
5
6    // If the first sentence has fewer words than the second, swap them to standardize the processing.
7    if (words1.length < words2.length) {
8        return areSentencesSimilar(sentence2, sentence1);
9    }
10
11    // Record the lengths of the word arrays.
12    const words1Length = words1.length;
13    const words2Length = words2.length;
14
15    // Initialize pointers for traversing the words from the start and from the end.
16    let forwardIndex = 0;
17    let backwardIndex = 0;
18
19    // Move the forward pointer as long as the words are similar from the start.
20    while (forwardIndex < words2Length && words1[forwardIndex] === words2[forwardIndex]) {
21        forwardIndex++;
22    }
23
24    // Move the backward pointer as long as the words are similar from the end.
25    while (
26        backwardIndex < words2Length &&
27        words1[words1Length - 1 - backwardIndex] === words2[words2Length - 1 - backwardIndex]
28    ) {
29        backwardIndex++;
30    }
31
32    // Determine if the combined length of similar words from start and end covers the shorter sentence.
33    // Return true if all words from the shorter sentence are covered in the sequence, thus similar.
34    return forwardIndex + backwardIndex >= words2Length;
35}
36
Fast Track Your Learning with Our Quick Skills Quiz:

What data structure does Breadth-first search typically uses to store intermediate states?

Time and Space Complexity

The time complexity of the given areSentencesSimilar function is O(L) where L is the sum of the lengths of the two input sentences. This is because the primary computation in the function involves splitting the sentences into words and then iterating over the word lists, which happens linearly relative to the size of the sentences.

The space complexity of the function is also O(L) since it creates two arrays words1 and words2 to store the words from sentence1 and sentence2, respectively. The size of these arrays is directly proportionate to the length of the input sentences, hence the linear space complexity.

Learn more about how to find time and space complexity quickly.


Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫