737. Sentence Similarity II


Problem Description

The problem provides us with two sentences, each represented as an array of words, and an array of word pairs called similarPairs. Each pair consists of words that are considered similar to each other. The task is to determine if the two sentences are similar by adhering to two conditions:

  1. Both sentences must be of the same length, meaning they should contain the same number of words.
  2. For each corresponding pair of words from sentence1 and sentence2, the words must be similar. It's given that the similarity relationship is transitive. That means if a is similar to b and b is similar to c, then a is similar to c.

A word is always similar to itself. The output should be true if the sentences are similar, otherwise false.

Intuition

To solve this problem, we need to figure out a way to keep track of which words are similar to each other. This requires not just checking direct similarities, but also indirect (transitive) connections between words.

The intuition behind the solution is to use Union-Find, a data structure that efficiently handles the equivalence relations, which is precisely what we need to determine the similarity between words. The core idea of Union-Find is to group similar words together so that we can quickly find out if two words are in the same group.

Here is how we approach the solution:

  • First, we need to check if both sentences are of the same length. If not, we can immediately return false.
  • Then, we initialize Union-Find. For that purpose, we use a list called p, which will serve as the parent pointer for each element. At the start, each element is its own parent, indicating different sets. To efficiently track the indices for each word, a dictionary named words maps each word to a unique index.
  • We iterate through each pair in similarPairs and perform the union operation, merging the sets that contain the two words in each pair.
  • Next, we compare words in the same positions from both sentences. If they are the same, we move on to the next pair. If not, we check if they are indirectly similar through the Union-Find data structure by finding their root parents and comparing them. If they have different root parents, we conclude the sentences are not similar and return false.
  • If none of the above returns false, it means all words are sufficiently similar, and we return true.

By using Union-Find, we can handle the transitivity of similarity and quickly query any two words to see if they are part of the same group, thus determining if the sentences are similar or not.

Learn more about Depth-First Search, Breadth-First Search and Union Find patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece๏ผš

Which of the following is a min heap?

Solution Approach

The solution to this problem uses the Union-Find algorithm. Let's break down how the code implements this algorithm and how it applies to the problem.

  • Initially, a p list is created, using list comprehension, with n << 1 elements where n is the number of pairs in similarPairs. The expression n << 1 is equivalent to multiplying n by 2 since it shifts the bits of n to the left by one, effectively doubling the number. This creates a list with indices from 0 to 2n - 1, and each element is initialized to its own index, representing the initial parent of itself.

  • The find function is a recursive function that takes an index x and finds the root parent of x. It does so by checking if p[x] is not equal to x, which means x is not its own parent and it must have a parent somewhere up in its ancestor chain. The function continues to call itself with the parent of x until it reaches the root parent. To optimize this process for future lookups, path compression is used by setting p[x] to the root parent found. This means the next time we call find on x, it will directly reference its root parent.

  • The words dictionary is used to assign each unique word a unique index. When we encounter a new word in similarPairs, we assign it the next available index in the sequence.

  • To merge two word sets, the union operation is performed by setting the parent of one representative (found by find) to be the representative of the other. This effectively merges the sets, making them part of a single group.

  • The main logic iterates through each pair of words in sentence1 and sentence2. If the current words are the same, they are trivially similar, and the algorithm moves on to the next pair. Otherwise, if one of the words is not in the words dictionary or if the root parents of the two words differ (found via find), the sentences are not similar, and the algorithm returns false.

If all word pairs are found to be similar, the algorithm returns true after iterating through all word pairs. This means that all pairs have the same root parent (are in the same set), satisfying the similarity condition according to the transitivity property of the given relationship.

By using a combination of Union-Find with path compression and an efficient mapping of words to indices, the solution effectively checks the similarity of the sentences in relatively quick operations for each word pair.

In summary, the solution approach employs the Union-Find algorithm with path compression to determine if two sentences represented by arrays of words are similar based on the similarity of word pairs provided.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Which one best describes the time complexity of the following code?

1int factorial(int n) {
2  if (n < 0) {
3    return -1;
4  } else if (n == 0) {
5    return 1;
6  } else {
7    return n * factorial(n - 1);
8  }
9}

Example Walkthrough

Let's consider an example to illustrate how the Union-Find algorithm is applied to determine if two sentences are similar.

Suppose we have the following inputs:

  • sentence1 = ["great", "acting", "skills"]
  • sentence2 = ["fine", "drama", "talent"]
  • similarPairs = [("great", "good"), ("good", "fine"), ("acting", "drama"), ("skills", "talent")]
  1. Check Length: First, we verify that both sentences are the same length. Here, both have three words, so we can proceed.

  2. Initialize Union-Find: We initialize p with indices up to 2n - 1, where n is the number of unique words across all similarPairs. In this case, n = 6 (since "great", "good", "fine", "acting", "drama", and "skills" are all unique), so p = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11].

  3. Map Words to Indices: We create a words dictionary where each word is mapped to a unique index. It would look like this after processing all similarPairs:

    1{
    2    "great": 0,
    3    "good": 1,
    4    "fine": 2,
    5    "acting": 3,
    6    "drama": 4,
    7    "skills": 5,
    8    "talent": 6
    9}
  4. Union operation: We iterate through similarPairs and perform union operations using the indices from words dictionary. After performing unions, the representative parents might look like this:

    1// "good" and "great" are similar.
    2p[find(1)] = find(0) // p[1] = 0
    3
    4// "good" and "fine" are similar, and "good" is linked to "great", so indirectly "fine" is linked to "great".
    5p[find(2)] = find(1) // p[2] = 0
    6
    7// "acting" and "drama" are similar.
    8p[find(4)] = find(3) // p[4] = 3
    9
    10// "skills" and "talent" are similar.
    11p[find(6)] = find(5) // p[6] = 5
  5. Comparing Sentences: Finally, we compare sentence1 and sentence2 word by word.

    • For "great" and "fine", look up each word in the words dictionary, get their indices, and use the find function to check if they have the same parent. Since "fine" has been indirectly linked to "great", they have the same parent.
    • For "acting" and "drama", do the same, and it will reveal that they have the same parent because they have been directly linked.
    • For "skills" and "talent", they have also been directly linked, so they share the same parent.

    Since for all pairs of words, either they are the same word or they belong to the same set (have the same representative parent in Union-Find terminologies), the algorithm will return true.

    The execution concludes that sentence1 and sentence2 are indeed similar.

Solution Implementation

1from typing import List
2
3class Solution:
4    def areSentencesSimilarTwo(self, sentence1: List[str], sentence2: List[str], similarPairs: List[List[str]]) -> bool:
5        # Check if sentences are of different lengths, return False directly if they are
6        if len(sentence1) != len(sentence2):
7            return False
8      
9        # Initialize the parent list for union-find
10        numPairs = len(similarPairs)
11        parent = list(range(numPairs * 2))
12      
13        # Union-find find function to find the root of the set to which 'x' belongs
14        def find(x):
15            if parent[x] != x:
16                parent[x] = find(parent[x])
17            return parent[x]
18      
19        # Dictionary to store the mapping from word to its index in the union-find structure
20        wordsToIndex = {}
21        index = 0
22        # Iterate over similar pairs and perform union operations
23        for a, b in similarPairs:
24            # If the word 'a' is not in the dictionary, add it and assign a corresponding index
25            if a not in wordsToIndex:
26                wordsToIndex[a] = index
27                index += 1
28            # If the word 'b' is not in the dictionary, add it and assign a corresponding index
29            if b not in wordsToIndex:
30                wordsToIndex[b] = index
31                index += 1
32            # Link the indices of the two words in the parent array to denote their similarity
33            pa, pb = find(wordsToIndex[a]), find(wordsToIndex[b])
34            parent[pa] = pb
35      
36        # Iterate through the sentences and check if each pair of words is similar
37        for word1, word2 in zip(sentence1, sentence2):
38            # If the words are identical, continue
39            if word1 == word2:
40                continue
41            # If either word is not in the wordsToIndex, or their roots are not the same, return False
42            if (word1 not in wordsToIndex or
43                    word2 not in wordsToIndex or
44                    find(wordsToIndex[word1]) != find(wordsToIndex[word2])):
45                return False
46      
47        # If all word pairs are either the same or similar, return True
48        return True
49
1class Solution {
2    private int[] parents; // Array to keep track of the root parent of each element in the Union-Find data structure
3
4    // Method to check if two sentences are similar based on similar word pairs
5    public boolean areSentencesSimilarTwo(
6        String[] sentence1, String[] sentence2, List<List<String>> similarPairs) {
7
8        // Sentences must be of the same length to be similar
9        if (sentence1.length != sentence2.length) {
10            return false;
11        }
12
13        // Initialize Union-Find data structure
14        int pairCount = similarPairs.size();
15        parents = new int[pairCount << 1]; // Twice the size, as each pair contains two elements
16        for (int i = 0; i < parents.length; ++i) {
17            parents[i] = i;
18        }
19
20        // Map to store the index of each unique word
21        Map<String, Integer> wordIndexMap = new HashMap<>();
22        int index = 0;
23        // Create the connections in Union-find for each similar pair
24        for (List<String> pair : similarPairs) {
25            String word1 = pair.get(0);
26            String word2 = pair.get(1);
27
28            // If the word is not already in our map, add it with a unique index
29            if (!wordIndexMap.containsKey(word1)) {
30                wordIndexMap.put(word1, index++);
31            }
32            if (!wordIndexMap.containsKey(word2)) {
33                wordIndexMap.put(word2, index++);
34            }
35
36            // Union the two words in the Union-Find data structure
37            parents[find(wordIndexMap.get(word1))] = find(wordIndexMap.get(word2));
38        }
39
40        // Check for each pair of words in the sentences if they are similar
41        for (int i = 0; i < sentence1.length; ++i) {
42            // If words are identical, they are trivially similar
43            if (Objects.equals(sentence1[i], sentence2[i])) {
44                continue;
45            }
46            // If either word is not known, or their roots in Union-Find are different, they are not similar
47            if (!wordIndexMap.containsKey(sentence1[i]) || !wordIndexMap.containsKey(sentence2[i]) ||
48                find(wordIndexMap.get(sentence1[i])) != find(wordIndexMap.get(sentence2[i]))) {
49                return false;
50            }
51        }
52        // If all word pairs are similar, sentences are similar
53        return true;
54    }
55
56    // Helper method to find the root parent in the Union-Find structure
57    private int find(int x) {
58        if (parents[x] != x) {
59            parents[x] = find(parents[x]); // Path compression for optimization
60        }
61        return parents[x];
62    }
63}
64
1#include <vector>
2#include <string>
3#include <unordered_map>
4using namespace std;
5
6class Solution {
7public:
8    vector<int> parent;
9
10    // Helper function to perform find operation in Union-Find
11    int find(int x) {
12        if (parent[x] != x)
13            parent[x] = find(parent[x]); // Path compression
14        return parent[x];
15    }
16
17    // Main function to check if two sentences are similar
18    bool areSentencesSimilarTwo(vector<string>& sentence1, vector<string>& sentence2, vector<vector<string>>& similarPairs) {
19        // Sentences are not similar if they are of different lengths
20        if (sentence1.size() != sentence2.size())
21            return false;
22    
23        // Initialize Union-Find structure
24        int n = similarPairs.size();
25        parent.resize(n << 1);
26        for (int i = 0; i < parent.size(); ++i)
27            parent[i] = i;
28
29        // Mapping each word to an index
30        unordered_map<string, int> wordIndex;
31        int idx = 0;
32        for (auto& pair : similarPairs) {
33            string word1 = pair[0], word2 = pair[1];
34            // Assign new index if word hasn't been seen before
35            if (!wordIndex.count(word1))
36                wordIndex[word1] = idx++;
37            if (!wordIndex.count(word2))
38                wordIndex[word2] = idx++;
39            // Perform union operation for each similar pair
40            parent[find(wordIndex[word1])] = find(wordIndex[word2]);
41        }
42
43        // Check if each corresponding pair of words in the two sentences are similar
44        for (int i = 0; i < sentence1.size(); ++i) {
45            if (sentence1[i] == sentence2[i]) // Words are identical
46                continue;
47          
48            // If either word isn't in the Union-Find structure or they are not connected,
49            // the sentences are not similar.
50            if (!wordIndex.count(sentence1[i]) || !wordIndex.count(sentence2[i]) || find(wordIndex[sentence1[i]]) != find(wordIndex[sentence2[i]]))
51                return false;
52        }
53      
54        // All checks passed, sentences are similar
55        return true;
56    }
57};
58
1import { string } from "prop-types";
2
3// Define a vector-like structure to emulate the C++ vector behavior.
4let parent: number[] = [];
5
6// Helper function to perform find operation in Union-Find
7function find(x: number): number {
8    if (parent[x] != x)
9        parent[x] = find(parent[x]); // Path compression to optimize future searches
10    return parent[x];
11}
12
13// Function to check if two sentences are similar
14function areSentencesSimilarTwo(sentence1: string[], sentence2: string[], similarPairs: [string, string][]): boolean {
15    // Sentences are not similar if they are of different lengths
16    if (sentence1.length !== sentence2.length)
17        return false;
18  
19    // Initialize Union-Find structure
20    const n = similarPairs.length;
21    parent = Array.from({ length: n << 1 }, (_, i) => i);
22
23    // Mapping each word to an index
24    const wordIndex: { [word: string]: number } = {};
25    let idx = 0;
26    for (const pair of similarPairs) {
27        const [word1, word2] = pair;
28        // Assign new index if word hasn't been seen before
29        if (!(word1 in wordIndex))
30            wordIndex[word1] = idx++;
31        if (!(word2 in wordIndex))
32            wordIndex[word2] = idx++;
33        // Perform union operation for each similar pair
34        parent[find(wordIndex[word1])] = find(wordIndex[word2]);
35    }
36
37    // Check if each corresponding pair of words in the two sentences are similar
38    for (let i = 0; i < sentence1.length; i++) {
39        if (sentence1[i] === sentence2[i]) // Words are identical
40            continue;
41      
42        // If either word isn't in the Union-Find structure or they are not connected,
43        // the sentences are not similar.
44        if (!(sentence1[i] in wordIndex) || !(sentence2[i] in wordIndex) || find(wordIndex[sentence1[i]]) !== find(wordIndex[sentence2[i]]))
45            return false;
46    }
47  
48    // All checks passed, sentences are similar
49    return true;
50}
51
Not Sure What to Study? Take the 2-min Quiz๏ผš

What data structure does Breadth-first search typically uses to store intermediate states?

Time and Space Complexity

Time Complexity

The time complexity of the provided code is analyzed as follows:

  1. Initializing the parent array p with n << 1 elements takes O(2n) time where n is the number of similar pairs. Since each pair provides two elements, we have a maximum of 2n unique words.
  2. The find function is a path compression optimized union-find operation. This results in an amortized time complexity of O(ฮฑ(n)) per find operation, where ฮฑ is the inverse Ackermann function, which is a very slow-growing function and can be considered almost constant for all practical purposes.
  3. Building the words dictionary for all unique words in similarPairs takes O(n) operations, as we're doing constant time insertions.
  4. The union operations in the loop for creating the words dictionary will run n times which will have a complexity of O(nฮฑ(n)), since each union (by the find operation) has a complexity of O(ฮฑ(n)).
  5. The for loop to compare actual sentences runs len(sentence1) times. In the worst case, each look-up and each find operation is O(ฮฑ(n)). Since there are two find operations for each word-pair comparison, the total time complexity for this part is O(2mฮฑ(n)), where m is the number of words in sentence1/sentence2.

Therefore, the overall time complexity of the code is:

O(2n) + O(n) + O(nฮฑ(n)) + O(2mฮฑ(n))

Simplifying, since ฮฑ(n) is a very slow-growing function and can be treated as almost constant:

O(2n + n + n + 2m)

O(4n + 2m)

The time complexity is essentially linear with respect to the total number of elements in similarPairs and the length of the sentence1 and sentence2. When assuming ฮฑ(n) is constant, this can be simplified further to:

O(n + m)

Space Complexity

For space complexity, the following is taken into consideration:

  1. The parent array p has a size of 2n, which is O(2n).
  2. The words dictionary storage depends on the number of unique words across similarPairs, so in the worst case if all words are unique, it will be O(2n).
  3. The find function uses recursive stack space, but due to path compression, the depth will be very shallow on average, contributing insignificantly to the overall space complexity.

Hence, the overall space complexity of the program is:

O(2n) + O(2n)

Which simplifies to:

O(4n)

Considering just the unique elements count, this is:

O(n)

The space complexity is linear with respect to the number of unique elements present in similarPairs.

Learn more about how to find time and space complexity quickly using problem constraints.

Fast Track Your Learning with Our Quick Skills Quiz:

Which data structure is used in a depth first search?


Recommended Readings


Got a question?ย Ask the Teaching Assistantย anything you don't understand.

Still not clear? Ask in the Forum, ย Discordย orย Submitย the part you don't understand to our editors.

โ†
โ†‘TA ๐Ÿ‘จโ€๐Ÿซ