# 737. Sentence Similarity II

## Problem Description

The problem provides us with two sentences, each represented as an array of words, and an array of word pairs called `similarPairs`. Each pair consists of words that are considered similar to each other. The task is to determine if the two sentences are similar by adhering to two conditions:

1. Both sentences must be of the same length, meaning they should contain the same number of words.
2. For each corresponding pair of words from `sentence1` and `sentence2`, the words must be similar. It's given that the similarity relationship is transitive. That means if `a` is similar to `b` and `b` is similar to `c`, then `a` is similar to `c`.

A word is always similar to itself. The output should be `true` if the sentences are similar, otherwise `false`.

## Intuition

To solve this problem, we need to figure out a way to keep track of which words are similar to each other. This requires not just checking direct similarities, but also indirect (transitive) connections between words.

The intuition behind the solution is to use Union-Find, a data structure that efficiently handles the equivalence relations, which is precisely what we need to determine the similarity between words. The core idea of Union-Find is to group similar words together so that we can quickly find out if two words are in the same group.

Here is how we approach the solution:

• First, we need to check if both sentences are of the same length. If not, we can immediately return `false`.
• Then, we initialize Union-Find. For that purpose, we use a list called `p`, which will serve as the parent pointer for each element. At the start, each element is its own parent, indicating different sets. To efficiently track the indices for each word, a dictionary named `words` maps each word to a unique index.
• We iterate through each pair in `similarPairs` and perform the union operation, merging the sets that contain the two words in each pair.
• Next, we compare words in the same positions from both sentences. If they are the same, we move on to the next pair. If not, we check if they are indirectly similar through the Union-Find data structure by finding their root parents and comparing them. If they have different root parents, we conclude the sentences are not similar and return `false`.
• If none of the above returns `false`, it means all words are sufficiently similar, and we return `true`.

By using Union-Find, we can handle the transitivity of similarity and quickly query any two words to see if they are part of the same group, thus determining if the sentences are similar or not.

## Solution Approach

The solution to this problem uses the Union-Find algorithm. Let's break down how the code implements this algorithm and how it applies to the problem.

• Initially, a `p` list is created, using list comprehension, with `n << 1` elements where `n` is the number of pairs in `similarPairs`. The expression `n << 1` is equivalent to multiplying `n` by 2 since it shifts the bits of `n` to the left by one, effectively doubling the number. This creates a list with indices from `0` to `2n - 1`, and each element is initialized to its own index, representing the initial parent of itself.

• The `find` function is a recursive function that takes an index `x` and finds the root parent of `x`. It does so by checking if `p[x]` is not equal to `x`, which means `x` is not its own parent and it must have a parent somewhere up in its ancestor chain. The function continues to call itself with the parent of `x` until it reaches the root parent. To optimize this process for future lookups, path compression is used by setting `p[x]` to the root parent found. This means the next time we call `find` on `x`, it will directly reference its root parent.

• The `words` dictionary is used to assign each unique word a unique index. When we encounter a new word in `similarPairs`, we assign it the next available index in the sequence.

• To merge two word sets, the union operation is performed by setting the parent of one representative (found by `find`) to be the representative of the other. This effectively merges the sets, making them part of a single group.

• The main logic iterates through each pair of words in `sentence1` and `sentence2`. If the current words are the same, they are trivially similar, and the algorithm moves on to the next pair. Otherwise, if one of the words is not in the `words` dictionary or if the root parents of the two words differ (found via `find`), the sentences are not similar, and the algorithm returns `false`.

If all word pairs are found to be similar, the algorithm returns `true` after iterating through all word pairs. This means that all pairs have the same root parent (are in the same set), satisfying the similarity condition according to the transitivity property of the given relationship.

By using a combination of Union-Find with path compression and an efficient mapping of words to indices, the solution effectively checks the similarity of the sentences in relatively quick operations for each word pair.

In summary, the solution approach employs the Union-Find algorithm with path compression to determine if two sentences represented by arrays of words are similar based on the similarity of word pairs provided.

๐ช
Level Up Your
Algo Skills

### Example Walkthrough

Let's consider an example to illustrate how the Union-Find algorithm is applied to determine if two sentences are similar.

Suppose we have the following inputs:

• `sentence1 = ["great", "acting", "skills"]`
• `sentence2 = ["fine", "drama", "talent"]`
• `similarPairs = [("great", "good"), ("good", "fine"), ("acting", "drama"), ("skills", "talent")]`
1. Check Length: First, we verify that both sentences are the same length. Here, both have three words, so we can proceed.

2. Initialize Union-Find: We initialize `p` with indices up to `2n - 1`, where `n` is the number of unique words across all `similarPairs`. In this case, `n = 6` (since "great", "good", "fine", "acting", "drama", and "skills" are all unique), so `p = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]`.

3. Map Words to Indices: We create a `words` dictionary where each word is mapped to a unique index. It would look like this after processing all `similarPairs`:

``````1{
2    "great": 0,
3    "good": 1,
4    "fine": 2,
5    "acting": 3,
6    "drama": 4,
7    "skills": 5,
8    "talent": 6
9}``````
4. Union operation: We iterate through `similarPairs` and perform union operations using the indices from `words` dictionary. After performing unions, the representative parents might look like this:

``````1// "good" and "great" are similar.
2p[find(1)] = find(0) // p[1] = 0
3
4// "good" and "fine" are similar, and "good" is linked to "great", so indirectly "fine" is linked to "great".
5p[find(2)] = find(1) // p[2] = 0
6
7// "acting" and "drama" are similar.
8p[find(4)] = find(3) // p[4] = 3
9
10// "skills" and "talent" are similar.
11p[find(6)] = find(5) // p[6] = 5``````
5. Comparing Sentences: Finally, we compare `sentence1` and `sentence2` word by word.

• For "great" and "fine", look up each word in the `words` dictionary, get their indices, and use the `find` function to check if they have the same parent. Since "fine" has been indirectly linked to "great", they have the same parent.
• For "acting" and "drama", do the same, and it will reveal that they have the same parent because they have been directly linked.
• For "skills" and "talent", they have also been directly linked, so they share the same parent.

Since for all pairs of words, either they are the same word or they belong to the same set (have the same representative parent in Union-Find terminologies), the algorithm will return `true`.

The execution concludes that `sentence1` and `sentence2` are indeed similar.

## Python Solution

``````1from typing import List
2
3class Solution:
4    def areSentencesSimilarTwo(self, sentence1: List[str], sentence2: List[str], similarPairs: List[List[str]]) -> bool:
5        # Check if sentences are of different lengths, return False directly if they are
6        if len(sentence1) != len(sentence2):
7            return False
8
9        # Initialize the parent list for union-find
10        numPairs = len(similarPairs)
11        parent = list(range(numPairs * 2))
12
13        # Union-find find function to find the root of the set to which 'x' belongs
14        def find(x):
15            if parent[x] != x:
16                parent[x] = find(parent[x])
17            return parent[x]
18
19        # Dictionary to store the mapping from word to its index in the union-find structure
20        wordsToIndex = {}
21        index = 0
22        # Iterate over similar pairs and perform union operations
23        for a, b in similarPairs:
24            # If the word 'a' is not in the dictionary, add it and assign a corresponding index
25            if a not in wordsToIndex:
26                wordsToIndex[a] = index
27                index += 1
28            # If the word 'b' is not in the dictionary, add it and assign a corresponding index
29            if b not in wordsToIndex:
30                wordsToIndex[b] = index
31                index += 1
32            # Link the indices of the two words in the parent array to denote their similarity
33            pa, pb = find(wordsToIndex[a]), find(wordsToIndex[b])
34            parent[pa] = pb
35
36        # Iterate through the sentences and check if each pair of words is similar
37        for word1, word2 in zip(sentence1, sentence2):
38            # If the words are identical, continue
39            if word1 == word2:
40                continue
41            # If either word is not in the wordsToIndex, or their roots are not the same, return False
42            if (word1 not in wordsToIndex or
43                    word2 not in wordsToIndex or
44                    find(wordsToIndex[word1]) != find(wordsToIndex[word2])):
45                return False
46
47        # If all word pairs are either the same or similar, return True
48        return True
49``````

## Java Solution

``````1class Solution {
2    private int[] parents; // Array to keep track of the root parent of each element in the Union-Find data structure
3
4    // Method to check if two sentences are similar based on similar word pairs
5    public boolean areSentencesSimilarTwo(
6        String[] sentence1, String[] sentence2, List<List<String>> similarPairs) {
7
8        // Sentences must be of the same length to be similar
9        if (sentence1.length != sentence2.length) {
10            return false;
11        }
12
13        // Initialize Union-Find data structure
14        int pairCount = similarPairs.size();
15        parents = new int[pairCount << 1]; // Twice the size, as each pair contains two elements
16        for (int i = 0; i < parents.length; ++i) {
17            parents[i] = i;
18        }
19
20        // Map to store the index of each unique word
21        Map<String, Integer> wordIndexMap = new HashMap<>();
22        int index = 0;
23        // Create the connections in Union-find for each similar pair
24        for (List<String> pair : similarPairs) {
25            String word1 = pair.get(0);
26            String word2 = pair.get(1);
27
28            // If the word is not already in our map, add it with a unique index
29            if (!wordIndexMap.containsKey(word1)) {
30                wordIndexMap.put(word1, index++);
31            }
32            if (!wordIndexMap.containsKey(word2)) {
33                wordIndexMap.put(word2, index++);
34            }
35
36            // Union the two words in the Union-Find data structure
37            parents[find(wordIndexMap.get(word1))] = find(wordIndexMap.get(word2));
38        }
39
40        // Check for each pair of words in the sentences if they are similar
41        for (int i = 0; i < sentence1.length; ++i) {
42            // If words are identical, they are trivially similar
43            if (Objects.equals(sentence1[i], sentence2[i])) {
44                continue;
45            }
46            // If either word is not known, or their roots in Union-Find are different, they are not similar
47            if (!wordIndexMap.containsKey(sentence1[i]) || !wordIndexMap.containsKey(sentence2[i]) ||
48                find(wordIndexMap.get(sentence1[i])) != find(wordIndexMap.get(sentence2[i]))) {
49                return false;
50            }
51        }
52        // If all word pairs are similar, sentences are similar
53        return true;
54    }
55
56    // Helper method to find the root parent in the Union-Find structure
57    private int find(int x) {
58        if (parents[x] != x) {
59            parents[x] = find(parents[x]); // Path compression for optimization
60        }
61        return parents[x];
62    }
63}
64``````

## C++ Solution

``````1#include <vector>
2#include <string>
3#include <unordered_map>
4using namespace std;
5
6class Solution {
7public:
8    vector<int> parent;
9
10    // Helper function to perform find operation in Union-Find
11    int find(int x) {
12        if (parent[x] != x)
13            parent[x] = find(parent[x]); // Path compression
14        return parent[x];
15    }
16
17    // Main function to check if two sentences are similar
18    bool areSentencesSimilarTwo(vector<string>& sentence1, vector<string>& sentence2, vector<vector<string>>& similarPairs) {
19        // Sentences are not similar if they are of different lengths
20        if (sentence1.size() != sentence2.size())
21            return false;
22
23        // Initialize Union-Find structure
24        int n = similarPairs.size();
25        parent.resize(n << 1);
26        for (int i = 0; i < parent.size(); ++i)
27            parent[i] = i;
28
29        // Mapping each word to an index
30        unordered_map<string, int> wordIndex;
31        int idx = 0;
32        for (auto& pair : similarPairs) {
33            string word1 = pair[0], word2 = pair[1];
34            // Assign new index if word hasn't been seen before
35            if (!wordIndex.count(word1))
36                wordIndex[word1] = idx++;
37            if (!wordIndex.count(word2))
38                wordIndex[word2] = idx++;
39            // Perform union operation for each similar pair
40            parent[find(wordIndex[word1])] = find(wordIndex[word2]);
41        }
42
43        // Check if each corresponding pair of words in the two sentences are similar
44        for (int i = 0; i < sentence1.size(); ++i) {
45            if (sentence1[i] == sentence2[i]) // Words are identical
46                continue;
47
48            // If either word isn't in the Union-Find structure or they are not connected,
49            // the sentences are not similar.
50            if (!wordIndex.count(sentence1[i]) || !wordIndex.count(sentence2[i]) || find(wordIndex[sentence1[i]]) != find(wordIndex[sentence2[i]]))
51                return false;
52        }
53
54        // All checks passed, sentences are similar
55        return true;
56    }
57};
58``````

## Typescript Solution

``````1import { string } from "prop-types";
2
3// Define a vector-like structure to emulate the C++ vector behavior.
4let parent: number[] = [];
5
6// Helper function to perform find operation in Union-Find
7function find(x: number): number {
8    if (parent[x] != x)
9        parent[x] = find(parent[x]); // Path compression to optimize future searches
10    return parent[x];
11}
12
13// Function to check if two sentences are similar
14function areSentencesSimilarTwo(sentence1: string[], sentence2: string[], similarPairs: [string, string][]): boolean {
15    // Sentences are not similar if they are of different lengths
16    if (sentence1.length !== sentence2.length)
17        return false;
18
19    // Initialize Union-Find structure
20    const n = similarPairs.length;
21    parent = Array.from({ length: n << 1 }, (_, i) => i);
22
23    // Mapping each word to an index
24    const wordIndex: { [word: string]: number } = {};
25    let idx = 0;
26    for (const pair of similarPairs) {
27        const [word1, word2] = pair;
28        // Assign new index if word hasn't been seen before
29        if (!(word1 in wordIndex))
30            wordIndex[word1] = idx++;
31        if (!(word2 in wordIndex))
32            wordIndex[word2] = idx++;
33        // Perform union operation for each similar pair
34        parent[find(wordIndex[word1])] = find(wordIndex[word2]);
35    }
36
37    // Check if each corresponding pair of words in the two sentences are similar
38    for (let i = 0; i < sentence1.length; i++) {
39        if (sentence1[i] === sentence2[i]) // Words are identical
40            continue;
41
42        // If either word isn't in the Union-Find structure or they are not connected,
43        // the sentences are not similar.
44        if (!(sentence1[i] in wordIndex) || !(sentence2[i] in wordIndex) || find(wordIndex[sentence1[i]]) !== find(wordIndex[sentence2[i]]))
45            return false;
46    }
47
48    // All checks passed, sentences are similar
49    return true;
50}
51``````

## Time and Space Complexity

### Time Complexity

The time complexity of the provided code is analyzed as follows:

1. Initializing the parent array `p` with `n << 1` elements takes `O(2n)` time where `n` is the number of similar pairs. Since each pair provides two elements, we have a maximum of `2n` unique words.
2. The `find` function is a path compression optimized union-find operation. This results in an amortized time complexity of `O(ฮฑ(n))` per find operation, where `ฮฑ` is the inverse Ackermann function, which is a very slow-growing function and can be considered almost constant for all practical purposes.
3. Building the `words` dictionary for all unique words in `similarPairs` takes `O(n)` operations, as we're doing constant time insertions.
4. The union operations in the loop for creating the `words` dictionary will run `n` times which will have a complexity of `O(nฮฑ(n))`, since each union (by the `find` operation) has a complexity of `O(ฮฑ(n))`.
5. The for loop to compare actual sentences runs `len(sentence1)` times. In the worst case, each look-up and each `find` operation is `O(ฮฑ(n))`. Since there are two `find` operations for each word-pair comparison, the total time complexity for this part is `O(2mฮฑ(n))`, where `m` is the number of words in sentence1/sentence2.

Therefore, the overall time complexity of the code is:

`O(2n) + O(n) + O(nฮฑ(n)) + O(2mฮฑ(n))`

Simplifying, since `ฮฑ(n)` is a very slow-growing function and can be treated as almost constant:

`O(2n + n + n + 2m)`

`O(4n + 2m)`

The time complexity is essentially linear with respect to the total number of elements in `similarPairs` and the length of the `sentence1` and `sentence2`. When assuming `ฮฑ(n)` is constant, this can be simplified further to:

`O(n + m)`

### Space Complexity

For space complexity, the following is taken into consideration:

1. The parent array `p` has a size of `2n`, which is `O(2n)`.
2. The `words` dictionary storage depends on the number of unique words across `similarPairs`, so in the worst case if all words are unique, it will be `O(2n)`.
3. The `find` function uses recursive stack space, but due to path compression, the depth will be very shallow on average, contributing insignificantly to the overall space complexity.

Hence, the overall space complexity of the program is:

`O(2n) + O(2n)`

Which simplifies to:

`O(4n)`

Considering just the unique elements count, this is:

`O(n)`

The space complexity is linear with respect to the number of unique elements present in `similarPairs`.

๐
Become an
Algo Monster

Got a question?ย Ask the Teaching Assistantย anything you don't understand.

Still not clear? Ask in the Forum, ย Discordย orย Submitย the part you don't understand to our editors.

โ
โTA ๐จโ๐ซ