Facebook Pixel

737. Sentence Similarity II πŸ”’

Problem Description

You are given two sentences represented as arrays of words and a list of similar word pairs. Your task is to determine if the two sentences are similar based on specific rules.

Input:

  • sentence1: An array of words representing the first sentence
  • sentence2: An array of words representing the second sentence
  • similarPairs: An array of word pairs [xi, yi] where each pair indicates that words xi and yi are similar

Rules for Sentence Similarity:

  1. Same Length Requirement: Both sentences must have the same number of words

  2. Word-by-Word Similarity: For sentences to be similar, each word at position i in sentence1 must be similar to the word at position i in sentence2

  3. Similarity Properties:

    • A word is always similar to itself
    • Similarity is transitive: If word a is similar to b, and b is similar to c, then a is similar to c

Example:

If we have:

  • sentence1 = ["I", "am", "happy"]
  • sentence2 = ["I", "am", "joyful"]
  • similarPairs = [["happy", "joyful"]]

The sentences would be similar because:

  • Both have 3 words (same length)
  • "I" = "I" (identical)
  • "am" = "am" (identical)
  • "happy" is similar to "joyful" (from similarPairs)

The solution uses a Union-Find (Disjoint Set Union) data structure to efficiently group words that are similar through transitive relationships. It first builds connected components of similar words, then checks if corresponding words in both sentences belong to the same component.

Flowchart Walkthrough

First, let's pin down the algorithm using the Flowchart. Here's a step-by-step walkthrough:

Is it a graph?

  • Yes: The problem involves words and their similarity relationships, which can be modeled as a graph where words are nodes and similarity pairs are edges.

Is it a tree?

  • No: The similarity relationships form a general graph, not a tree structure. Words can have multiple connections, and there can be cycles (e.g., if word A is similar to B, B to C, and C back to A through other words).

Is the problem related to directed acyclic graphs?

  • No: The similarity relationship is bidirectional (if A is similar to B, then B is similar to A), making it an undirected graph. Also, cycles can exist.

Is the problem related to shortest paths?

  • No: We're not looking for the shortest path between words. We need to determine if words belong to the same connected component.

Does the problem involve connectivity?

  • Yes: The core of the problem is determining whether pairs of words are connected through similarity relationships (transitive property). We need to check if words belong to the same connected component.

Disjoint Set Union (DSU)

  • Yes: DSU is the optimal choice here as it efficiently handles:
    • Grouping similar words into connected components
    • Checking if two words belong to the same component
    • Managing the transitive property of similarity

Conclusion: The flowchart leads us to use Disjoint Set Union (DSU) for this problem. While DFS could also solve the connectivity problem by exploring connected components, DSU is more efficient for this use case where we need to repeatedly check if pairs of words are in the same component. The solution implements DSU with path compression in the find function to optimize performance.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key insight is recognizing that word similarity forms a graph where the transitive property creates connected components. If word A is similar to B, and B is similar to C, then A and C are similar - they all belong to the same "similarity group."

Think of it like social networks: if person A knows person B, and person B knows person C, then A and C are in the same social circle even if they don't know each other directly. Similarly, words connected through any chain of similarities are all similar to each other.

To check if two sentences are similar, we need to verify that corresponding words either:

  1. Are identical (a word is always similar to itself), or
  2. Belong to the same similarity group

The challenge is efficiently determining which words belong to the same group, especially when we have many similarity pairs that might form long chains of connections.

This is where Union-Find (Disjoint Set Union) becomes the perfect tool. Instead of repeatedly traversing the graph to check connectivity, we can:

  • Build the similarity groups by "unioning" words that are similar
  • Quickly check if two words are in the same group using the "find" operation

The algorithm assigns each unique word an index and uses a parent array p where p[i] points to the parent of word i in its similarity group. Words in the same group will share the same root parent. When we process each similarity pair [a, b], we union their groups by making one root point to the other.

For sentence comparison, we iterate through both sentences position by position. If words at the same position are different, we check if they belong to the same similarity group using the find operation. If they don't (or if one word has never appeared in the similarity pairs), the sentences aren't similar.

The path compression optimization in the find function (p[x] = find(p[x])) ensures that future lookups are faster by directly connecting nodes to their root, making the overall solution very efficient.

Learn more about Depth-First Search, Breadth-First Search and Union Find patterns.

Solution Approach

The implementation uses Union-Find (Disjoint Set Union) to efficiently manage word similarity groups. Let's walk through the key components:

1. Initial Setup and Length Check:

if len(sentence1) != len(sentence2):
    return False

First, we verify both sentences have the same length - a necessary condition for similarity.

2. Initialize Union-Find Structure:

n = len(similarPairs)
p = list(range(n << 1))

We create a parent array p with size 2n (using bit shift n << 1 for efficiency). This accounts for the worst case where all words in similarity pairs are unique. Each index initially points to itself, representing individual sets.

3. Find Operation with Path Compression:

def find(x):
    if p[x] != x:
        p[x] = find(p[x])
    return p[x]

This recursive function finds the root parent of a word's group. The path compression optimization (p[x] = find(p[x])) directly connects nodes to their root during traversal, improving future lookups from O(log n) to nearly O(1).

4. Build Word-to-Index Mapping and Union Groups:

words = {}
idx = 0
for a, b in similarPairs:
    if a not in words:
        words[a] = idx
        idx += 1
    if b not in words:
        words[b] = idx
        idx += 1
    p[find(words[a])] = find(words[b])

For each similarity pair:

  • Assign unique indices to new words in the words dictionary
  • Union the two words by making one group's root point to the other's root: p[find(words[a])] = find(words[b])

This builds connected components where all transitively similar words share the same root parent.

5. Verify Sentence Similarity:

for i in range(len(sentence1)):
    if sentence1[i] == sentence2[i]:
        continue
    if (
        sentence1[i] not in words
        or sentence2[i] not in words
        or find(words[sentence1[i]]) != find(words[sentence2[i]])
    ):
        return False
return True

For each position in the sentences:

  • Skip if words are identical (always similar)
  • For different words, check if:
    • Both words exist in our similarity mapping
    • Both words belong to the same group (have the same root parent)
  • Return False if any position fails these checks

Time Complexity: O(N + P), where N is the length of sentences and P is the number of similarity pairs. The Union-Find operations are nearly O(1) with path compression.

Space Complexity: O(P) for storing the parent array and word mapping dictionary.

Ready to land your dream job?

Unlock your dream job with a 3-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a concrete example to illustrate how the Union-Find solution works:

Input:

  • sentence1 = ["great", "acting", "skills"]
  • sentence2 = ["fine", "drama", "talent"]
  • similarPairs = [["great", "fine"], ["acting", "drama"], ["skills", "talent"]]

Step 1: Length Check Both sentences have 3 words, so we continue.

Step 2: Initialize Union-Find Create parent array p = [0, 1, 2, 3, 4, 5] (size = 2 Γ— 3 = 6)

Step 3: Process Similarity Pairs

Processing ["great", "fine"]:

  • Assign words["great"] = 0, increment idx to 1
  • Assign words["fine"] = 1, increment idx to 2
  • Union: p[find(0)] = find(1) β†’ p[0] = 1
  • Parent array: [1, 1, 2, 3, 4, 5]

Processing ["acting", "drama"]:

  • Assign words["acting"] = 2, increment idx to 3
  • Assign words["drama"] = 3, increment idx to 4
  • Union: p[find(2)] = find(3) β†’ p[2] = 3
  • Parent array: [1, 1, 3, 3, 4, 5]

Processing ["skills", "talent"]:

  • Assign words["skills"] = 4, increment idx to 5
  • Assign words["talent"] = 5, increment idx to 6
  • Union: p[find(4)] = find(5) β†’ p[4] = 5
  • Parent array: [1, 1, 3, 3, 5, 5]

Final word mapping:

words = {
    "great": 0, "fine": 1,
    "acting": 2, "drama": 3,
    "skills": 4, "talent": 5
}

Step 4: Compare Sentences Position by Position

Position 0: "great" vs "fine"

  • Different words, so check similarity
  • find(words["great"]) = find(0) = 1
  • find(words["fine"]) = find(1) = 1
  • Same group (both have root 1) βœ“

Position 1: "acting" vs "drama"

  • Different words, so check similarity
  • find(words["acting"]) = find(2) = 3
  • find(words["drama"]) = find(3) = 3
  • Same group (both have root 3) βœ“

Position 2: "skills" vs "talent"

  • Different words, so check similarity
  • find(words["skills"]) = find(4) = 5
  • find(words["talent"]) = find(5) = 5
  • Same group (both have root 5) βœ“

Result: All corresponding words are similar, so return True.

This example demonstrates how Union-Find efficiently groups similar words and allows quick checking of whether two words belong to the same similarity group.

Solution Implementation

1class Solution:
2    def areSentencesSimilarTwo(
3        self, sentence1: List[str], sentence2: List[str], similarPairs: List[List[str]]
4    ) -> bool:
5        # If sentences have different lengths, they cannot be similar
6        if len(sentence1) != len(sentence2):
7            return False
8      
9        # Initialize parent array for Union-Find
10        # Size is 2 * number of pairs to accommodate all possible unique words
11        num_pairs = len(similarPairs)
12        parent = list(range(num_pairs * 2))
13      
14        def find(node: int) -> int:
15            """Find the root parent of a node with path compression."""
16            if parent[node] != node:
17                parent[node] = find(parent[node])  # Path compression
18            return parent[node]
19      
20        # Map each unique word to a unique index
21        word_to_index = {}
22        current_index = 0
23      
24        # Build the Union-Find structure from similar pairs
25        for word1, word2 in similarPairs:
26            # Assign index to word1 if not seen before
27            if word1 not in word_to_index:
28                word_to_index[word1] = current_index
29                current_index += 1
30          
31            # Assign index to word2 if not seen before
32            if word2 not in word_to_index:
33                word_to_index[word2] = current_index
34                current_index += 1
35          
36            # Union the two words by connecting their root parents
37            root1 = find(word_to_index[word1])
38            root2 = find(word_to_index[word2])
39            parent[root1] = root2
40      
41        # Check if corresponding words in both sentences are similar
42        for i in range(len(sentence1)):
43            word_from_s1 = sentence1[i]
44            word_from_s2 = sentence2[i]
45          
46            # If words are identical, they are similar
47            if word_from_s1 == word_from_s2:
48                continue
49          
50            # Check if both words exist in similarity pairs and belong to same group
51            if (word_from_s1 not in word_to_index or 
52                word_from_s2 not in word_to_index or 
53                find(word_to_index[word_from_s1]) != find(word_to_index[word_from_s2])):
54                return False
55      
56        return True
57
1class Solution {
2    // Parent array for Union-Find data structure
3    private int[] parent;
4
5    /**
6     * Determines if two sentences are similar based on transitive similarity pairs.
7     * Two sentences are similar if they have the same length and each corresponding
8     * word pair is either identical or belongs to the same similarity group.
9     * 
10     * @param sentence1 First sentence as array of words
11     * @param sentence2 Second sentence as array of words
12     * @param similarPairs List of word pairs that are similar
13     * @return true if sentences are similar, false otherwise
14     */
15    public boolean areSentencesSimilarTwo(
16        String[] sentence1, String[] sentence2, List<List<String>> similarPairs) {
17      
18        // Sentences must have the same length to be similar
19        if (sentence1.length != sentence2.length) {
20            return false;
21        }
22      
23        // Initialize Union-Find structure
24        // Size is 2 * number of pairs to accommodate all possible unique words
25        int pairCount = similarPairs.size();
26        parent = new int[pairCount << 1];  // Equivalent to pairCount * 2
27        for (int i = 0; i < parent.length; ++i) {
28            parent[i] = i;  // Each element is initially its own parent
29        }
30      
31        // Map each unique word to a unique index for Union-Find
32        Map<String, Integer> wordToIndex = new HashMap<>();
33        int currentIndex = 0;
34      
35        // Build similarity groups using Union-Find
36        for (List<String> pair : similarPairs) {
37            String word1 = pair.get(0);
38            String word2 = pair.get(1);
39          
40            // Assign index to word1 if not seen before
41            if (!wordToIndex.containsKey(word1)) {
42                wordToIndex.put(word1, currentIndex++);
43            }
44          
45            // Assign index to word2 if not seen before
46            if (!wordToIndex.containsKey(word2)) {
47                wordToIndex.put(word2, currentIndex++);
48            }
49          
50            // Union the two words in the same similarity group
51            int root1 = find(wordToIndex.get(word1));
52            int root2 = find(wordToIndex.get(word2));
53            parent[root1] = root2;
54        }
55      
56        // Check if corresponding words in both sentences are similar
57        for (int i = 0; i < sentence1.length; ++i) {
58            // If words are identical, they are similar
59            if (Objects.equals(sentence1[i], sentence2[i])) {
60                continue;
61            }
62          
63            // Check if both words exist in similarity pairs and belong to same group
64            if (!wordToIndex.containsKey(sentence1[i]) || 
65                !wordToIndex.containsKey(sentence2[i]) ||
66                find(wordToIndex.get(sentence1[i])) != find(wordToIndex.get(sentence2[i]))) {
67                return false;
68            }
69        }
70      
71        return true;
72    }
73
74    /**
75     * Find operation with path compression for Union-Find.
76     * Finds the root parent of element x and compresses the path.
77     * 
78     * @param x Element to find root for
79     * @return Root parent of element x
80     */
81    private int find(int x) {
82        if (parent[x] != x) {
83            // Path compression: make x point directly to root
84            parent[x] = find(parent[x]);
85        }
86        return parent[x];
87    }
88}
89
1class Solution {
2public:
3    vector<int> parent;  // Parent array for Union-Find data structure
4  
5    bool areSentencesSimilarTwo(vector<string>& sentence1, vector<string>& sentence2, vector<vector<string>>& similarPairs) {
6        // If sentences have different lengths, they cannot be similar
7        if (sentence1.size() != sentence2.size()) {
8            return false;
9        }
10      
11        int numPairs = similarPairs.size();
12        // Initialize parent array with size 2 * numPairs (worst case: all words are unique)
13        parent.resize(numPairs * 2);
14        for (int i = 0; i < parent.size(); ++i) {
15            parent[i] = i;  // Initially, each element is its own parent
16        }
17      
18        // Map to store word to index mapping for Union-Find
19        unordered_map<string, int> wordToIndex;
20        int currentIndex = 0;
21      
22        // Build Union-Find structure from similar pairs
23        for (const auto& pair : similarPairs) {
24            string word1 = pair[0];
25            string word2 = pair[1];
26          
27            // Assign indices to new words
28            if (wordToIndex.find(word1) == wordToIndex.end()) {
29                wordToIndex[word1] = currentIndex++;
30            }
31            if (wordToIndex.find(word2) == wordToIndex.end()) {
32                wordToIndex[word2] = currentIndex++;
33            }
34          
35            // Union the two similar words
36            parent[find(wordToIndex[word1])] = find(wordToIndex[word2]);
37        }
38      
39        // Check if corresponding words in both sentences are similar
40        for (int i = 0; i < sentence1.size(); ++i) {
41            // If words are identical, they are similar
42            if (sentence1[i] == sentence2[i]) {
43                continue;
44            }
45          
46            // Check if both words exist in similarity groups and belong to the same group
47            if (wordToIndex.find(sentence1[i]) == wordToIndex.end() || 
48                wordToIndex.find(sentence2[i]) == wordToIndex.end() || 
49                find(wordToIndex[sentence1[i]]) != find(wordToIndex[sentence2[i]])) {
50                return false;
51            }
52        }
53      
54        return true;
55    }
56  
57private:
58    // Find operation with path compression for Union-Find
59    int find(int x) {
60        if (parent[x] != x) {
61            parent[x] = find(parent[x]);  // Path compression
62        }
63        return parent[x];
64    }
65};
66
1let parent: number[] = [];  // Parent array for Union-Find data structure
2
3function areSentencesSimilarTwo(sentence1: string[], sentence2: string[], similarPairs: string[][]): boolean {
4    // If sentences have different lengths, they cannot be similar
5    if (sentence1.length !== sentence2.length) {
6        return false;
7    }
8  
9    const numPairs = similarPairs.length;
10    // Initialize parent array with size 2 * numPairs (worst case: all words are unique)
11    parent = new Array(numPairs * 2);
12    for (let i = 0; i < parent.length; i++) {
13        parent[i] = i;  // Initially, each element is its own parent
14    }
15  
16    // Map to store word to index mapping for Union-Find
17    const wordToIndex = new Map<string, number>();
18    let currentIndex = 0;
19  
20    // Build Union-Find structure from similar pairs
21    for (const pair of similarPairs) {
22        const word1 = pair[0];
23        const word2 = pair[1];
24      
25        // Assign indices to new words
26        if (!wordToIndex.has(word1)) {
27            wordToIndex.set(word1, currentIndex++);
28        }
29        if (!wordToIndex.has(word2)) {
30            wordToIndex.set(word2, currentIndex++);
31        }
32      
33        // Union the two similar words
34        parent[find(wordToIndex.get(word1)!)] = find(wordToIndex.get(word2)!);
35    }
36  
37    // Check if corresponding words in both sentences are similar
38    for (let i = 0; i < sentence1.length; i++) {
39        // If words are identical, they are similar
40        if (sentence1[i] === sentence2[i]) {
41            continue;
42        }
43      
44        // Check if both words exist in similarity groups and belong to the same group
45        if (!wordToIndex.has(sentence1[i]) || 
46            !wordToIndex.has(sentence2[i]) || 
47            find(wordToIndex.get(sentence1[i])!) !== find(wordToIndex.get(sentence2[i])!)) {
48            return false;
49        }
50    }
51  
52    return true;
53}
54
55// Find operation with path compression for Union-Find
56function find(x: number): number {
57    if (parent[x] !== x) {
58        parent[x] = find(parent[x]);  // Path compression
59    }
60    return parent[x];
61}
62

Time and Space Complexity

Time Complexity: O(P * Ξ±(P) + S)

  • Building the union-find structure requires iterating through all P similar pairs, where P = len(similarPairs)
  • For each pair, we perform at most 2 insertions into the dictionary (constant time each) and 2 find operations with union
  • The find operation with path compression has an amortized time complexity of O(Ξ±(P)), where Ξ± is the inverse Ackermann function, which is effectively constant for practical purposes
  • The union operation (setting parent) also involves 2 find calls, so it's O(Ξ±(P))
  • Processing similar pairs: O(P * Ξ±(P))
  • Comparing sentences requires iterating through S words where S = len(sentence1), and for each word we may perform up to 2 find operations: O(S * Ξ±(P))
  • Total: O(P * Ξ±(P) + S * Ξ±(P)) which simplifies to O((P + S) * Ξ±(P)) β‰ˆ O(P + S) for practical purposes

Space Complexity: O(P)

  • The parent array p is initialized with size 2 * P (since there can be at most 2 * P unique words from similar pairs): O(P)
  • The words dictionary stores at most 2 * P unique words: O(P)
  • The recursive call stack for find operation can go up to O(log P) in the worst case before path compression, but with path compression it becomes O(Ξ±(P)) which is effectively constant
  • Total: O(P)

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Forgetting to Handle Self-Similarity

A critical edge case is when words appear in sentences but NOT in the similarity pairs. Many solutions incorrectly return False when a word from sentence1 matches a word from sentence2 but neither appears in similarPairs.

Incorrect Approach:

# This would fail for sentence1=["great"], sentence2=["great"], similarPairs=[]
for i in range(len(sentence1)):
    if (sentence1[i] not in word_to_index or 
        sentence2[i] not in word_to_index or 
        find(word_to_index[sentence1[i]]) != find(word_to_index[sentence2[i]])):
        return False

Correct Solution: Always check for word equality first:

for i in range(len(sentence1)):
    if sentence1[i] == sentence2[i]:
        continue  # Words are identical, so they're similar
    # Only then check similarity pairs

2. Incorrect Union Operation

A common mistake is directly assigning parents without using the find operation, which breaks the Union-Find structure.

Incorrect:

# This doesn't properly unite the groups
parent[word_to_index[word1]] = word_to_index[word2]

Correct:

# Properly unite the root parents of both groups
parent[find(word_to_index[word1])] = find(word_to_index[word2])

3. Insufficient Parent Array Size

Allocating exactly n spaces for the parent array when there could be up to 2n unique words.

Incorrect:

parent = list(range(len(similarPairs)))  # Too small!

Correct:

parent = list(range(len(similarPairs) * 2))  # Accounts for all possible unique words

4. Missing Path Compression

Implementing find without path compression leads to degraded performance in skewed trees.

Inefficient:

def find(x):
    while parent[x] != x:
        x = parent[x]
    return x

Optimized:

def find(x):
    if parent[x] != x:
        parent[x] = find(parent[x])  # Path compression
    return parent[x]

5. Not Handling Empty Similarity Pairs

When similarPairs is empty, the solution should still work correctly by relying on word equality checks.

Test Case to Verify:

sentence1 = ["I", "love", "coding"]
sentence2 = ["I", "love", "coding"]
similarPairs = []
# Should return True (all words are identical)
Discover Your Strengths and Weaknesses: Take Our 3-Minute Quiz to Tailor Your Study Plan:

What is an advantages of top-down dynamic programming vs bottom-up dynamic programming?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More