737. Sentence Similarity II π
Problem Description
You are given two sentences represented as arrays of words and a list of similar word pairs. Your task is to determine if the two sentences are similar based on specific rules.
Input:
sentence1
: An array of words representing the first sentencesentence2
: An array of words representing the second sentencesimilarPairs
: An array of word pairs[xi, yi]
where each pair indicates that wordsxi
andyi
are similar
Rules for Sentence Similarity:
-
Same Length Requirement: Both sentences must have the same number of words
-
Word-by-Word Similarity: For sentences to be similar, each word at position
i
insentence1
must be similar to the word at positioni
insentence2
-
Similarity Properties:
- A word is always similar to itself
- Similarity is transitive: If word
a
is similar tob
, andb
is similar toc
, thena
is similar toc
Example:
If we have:
sentence1 = ["I", "am", "happy"]
sentence2 = ["I", "am", "joyful"]
similarPairs = [["happy", "joyful"]]
The sentences would be similar because:
- Both have 3 words (same length)
- "I" = "I" (identical)
- "am" = "am" (identical)
- "happy" is similar to "joyful" (from similarPairs)
The solution uses a Union-Find (Disjoint Set Union) data structure to efficiently group words that are similar through transitive relationships. It first builds connected components of similar words, then checks if corresponding words in both sentences belong to the same component.
Flowchart Walkthrough
First, let's pin down the algorithm using the Flowchart. Here's a step-by-step walkthrough:
Is it a graph?
- Yes: The problem involves words and their similarity relationships, which can be modeled as a graph where words are nodes and similarity pairs are edges.
Is it a tree?
- No: The similarity relationships form a general graph, not a tree structure. Words can have multiple connections, and there can be cycles (e.g., if word A is similar to B, B to C, and C back to A through other words).
Is the problem related to directed acyclic graphs?
- No: The similarity relationship is bidirectional (if A is similar to B, then B is similar to A), making it an undirected graph. Also, cycles can exist.
Is the problem related to shortest paths?
- No: We're not looking for the shortest path between words. We need to determine if words belong to the same connected component.
Does the problem involve connectivity?
- Yes: The core of the problem is determining whether pairs of words are connected through similarity relationships (transitive property). We need to check if words belong to the same connected component.
Disjoint Set Union (DSU)
- Yes: DSU is the optimal choice here as it efficiently handles:
- Grouping similar words into connected components
- Checking if two words belong to the same component
- Managing the transitive property of similarity
Conclusion: The flowchart leads us to use Disjoint Set Union (DSU) for this problem. While DFS could also solve the connectivity problem by exploring connected components, DSU is more efficient for this use case where we need to repeatedly check if pairs of words are in the same component. The solution implements DSU with path compression in the find
function to optimize performance.
Intuition
The key insight is recognizing that word similarity forms a graph where the transitive property creates connected components. If word A is similar to B, and B is similar to C, then A and C are similar - they all belong to the same "similarity group."
Think of it like social networks: if person A knows person B, and person B knows person C, then A and C are in the same social circle even if they don't know each other directly. Similarly, words connected through any chain of similarities are all similar to each other.
To check if two sentences are similar, we need to verify that corresponding words either:
- Are identical (a word is always similar to itself), or
- Belong to the same similarity group
The challenge is efficiently determining which words belong to the same group, especially when we have many similarity pairs that might form long chains of connections.
This is where Union-Find (Disjoint Set Union) becomes the perfect tool. Instead of repeatedly traversing the graph to check connectivity, we can:
- Build the similarity groups by "unioning" words that are similar
- Quickly check if two words are in the same group using the "find" operation
The algorithm assigns each unique word an index and uses a parent array p
where p[i]
points to the parent of word i
in its similarity group. Words in the same group will share the same root parent. When we process each similarity pair [a, b]
, we union their groups by making one root point to the other.
For sentence comparison, we iterate through both sentences position by position. If words at the same position are different, we check if they belong to the same similarity group using the find operation. If they don't (or if one word has never appeared in the similarity pairs), the sentences aren't similar.
The path compression optimization in the find
function (p[x] = find(p[x])
) ensures that future lookups are faster by directly connecting nodes to their root, making the overall solution very efficient.
Learn more about Depth-First Search, Breadth-First Search and Union Find patterns.
Solution Approach
The implementation uses Union-Find (Disjoint Set Union) to efficiently manage word similarity groups. Let's walk through the key components:
1. Initial Setup and Length Check:
if len(sentence1) != len(sentence2):
return False
First, we verify both sentences have the same length - a necessary condition for similarity.
2. Initialize Union-Find Structure:
n = len(similarPairs)
p = list(range(n << 1))
We create a parent array p
with size 2n
(using bit shift n << 1
for efficiency). This accounts for the worst case where all words in similarity pairs are unique. Each index initially points to itself, representing individual sets.
3. Find Operation with Path Compression:
def find(x):
if p[x] != x:
p[x] = find(p[x])
return p[x]
This recursive function finds the root parent of a word's group. The path compression optimization (p[x] = find(p[x])
) directly connects nodes to their root during traversal, improving future lookups from O(log n) to nearly O(1).
4. Build Word-to-Index Mapping and Union Groups:
words = {} idx = 0 for a, b in similarPairs: if a not in words: words[a] = idx idx += 1 if b not in words: words[b] = idx idx += 1 p[find(words[a])] = find(words[b])
For each similarity pair:
- Assign unique indices to new words in the
words
dictionary - Union the two words by making one group's root point to the other's root:
p[find(words[a])] = find(words[b])
This builds connected components where all transitively similar words share the same root parent.
5. Verify Sentence Similarity:
for i in range(len(sentence1)):
if sentence1[i] == sentence2[i]:
continue
if (
sentence1[i] not in words
or sentence2[i] not in words
or find(words[sentence1[i]]) != find(words[sentence2[i]])
):
return False
return True
For each position in the sentences:
- Skip if words are identical (always similar)
- For different words, check if:
- Both words exist in our similarity mapping
- Both words belong to the same group (have the same root parent)
- Return
False
if any position fails these checks
Time Complexity: O(N + P), where N is the length of sentences and P is the number of similarity pairs. The Union-Find operations are nearly O(1) with path compression.
Space Complexity: O(P) for storing the parent array and word mapping dictionary.
Ready to land your dream job?
Unlock your dream job with a 3-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a concrete example to illustrate how the Union-Find solution works:
Input:
sentence1 = ["great", "acting", "skills"]
sentence2 = ["fine", "drama", "talent"]
similarPairs = [["great", "fine"], ["acting", "drama"], ["skills", "talent"]]
Step 1: Length Check Both sentences have 3 words, so we continue.
Step 2: Initialize Union-Find
Create parent array p = [0, 1, 2, 3, 4, 5]
(size = 2 Γ 3 = 6)
Step 3: Process Similarity Pairs
Processing ["great", "fine"]
:
- Assign
words["great"] = 0
, increment idx to 1 - Assign
words["fine"] = 1
, increment idx to 2 - Union:
p[find(0)] = find(1)
βp[0] = 1
- Parent array:
[1, 1, 2, 3, 4, 5]
Processing ["acting", "drama"]
:
- Assign
words["acting"] = 2
, increment idx to 3 - Assign
words["drama"] = 3
, increment idx to 4 - Union:
p[find(2)] = find(3)
βp[2] = 3
- Parent array:
[1, 1, 3, 3, 4, 5]
Processing ["skills", "talent"]
:
- Assign
words["skills"] = 4
, increment idx to 5 - Assign
words["talent"] = 5
, increment idx to 6 - Union:
p[find(4)] = find(5)
βp[4] = 5
- Parent array:
[1, 1, 3, 3, 5, 5]
Final word mapping:
words = { "great": 0, "fine": 1, "acting": 2, "drama": 3, "skills": 4, "talent": 5 }
Step 4: Compare Sentences Position by Position
Position 0: "great" vs "fine"
- Different words, so check similarity
find(words["great"]) = find(0) = 1
find(words["fine"]) = find(1) = 1
- Same group (both have root 1) β
Position 1: "acting" vs "drama"
- Different words, so check similarity
find(words["acting"]) = find(2) = 3
find(words["drama"]) = find(3) = 3
- Same group (both have root 3) β
Position 2: "skills" vs "talent"
- Different words, so check similarity
find(words["skills"]) = find(4) = 5
find(words["talent"]) = find(5) = 5
- Same group (both have root 5) β
Result: All corresponding words are similar, so return True
.
This example demonstrates how Union-Find efficiently groups similar words and allows quick checking of whether two words belong to the same similarity group.
Solution Implementation
1class Solution:
2 def areSentencesSimilarTwo(
3 self, sentence1: List[str], sentence2: List[str], similarPairs: List[List[str]]
4 ) -> bool:
5 # If sentences have different lengths, they cannot be similar
6 if len(sentence1) != len(sentence2):
7 return False
8
9 # Initialize parent array for Union-Find
10 # Size is 2 * number of pairs to accommodate all possible unique words
11 num_pairs = len(similarPairs)
12 parent = list(range(num_pairs * 2))
13
14 def find(node: int) -> int:
15 """Find the root parent of a node with path compression."""
16 if parent[node] != node:
17 parent[node] = find(parent[node]) # Path compression
18 return parent[node]
19
20 # Map each unique word to a unique index
21 word_to_index = {}
22 current_index = 0
23
24 # Build the Union-Find structure from similar pairs
25 for word1, word2 in similarPairs:
26 # Assign index to word1 if not seen before
27 if word1 not in word_to_index:
28 word_to_index[word1] = current_index
29 current_index += 1
30
31 # Assign index to word2 if not seen before
32 if word2 not in word_to_index:
33 word_to_index[word2] = current_index
34 current_index += 1
35
36 # Union the two words by connecting their root parents
37 root1 = find(word_to_index[word1])
38 root2 = find(word_to_index[word2])
39 parent[root1] = root2
40
41 # Check if corresponding words in both sentences are similar
42 for i in range(len(sentence1)):
43 word_from_s1 = sentence1[i]
44 word_from_s2 = sentence2[i]
45
46 # If words are identical, they are similar
47 if word_from_s1 == word_from_s2:
48 continue
49
50 # Check if both words exist in similarity pairs and belong to same group
51 if (word_from_s1 not in word_to_index or
52 word_from_s2 not in word_to_index or
53 find(word_to_index[word_from_s1]) != find(word_to_index[word_from_s2])):
54 return False
55
56 return True
57
1class Solution {
2 // Parent array for Union-Find data structure
3 private int[] parent;
4
5 /**
6 * Determines if two sentences are similar based on transitive similarity pairs.
7 * Two sentences are similar if they have the same length and each corresponding
8 * word pair is either identical or belongs to the same similarity group.
9 *
10 * @param sentence1 First sentence as array of words
11 * @param sentence2 Second sentence as array of words
12 * @param similarPairs List of word pairs that are similar
13 * @return true if sentences are similar, false otherwise
14 */
15 public boolean areSentencesSimilarTwo(
16 String[] sentence1, String[] sentence2, List<List<String>> similarPairs) {
17
18 // Sentences must have the same length to be similar
19 if (sentence1.length != sentence2.length) {
20 return false;
21 }
22
23 // Initialize Union-Find structure
24 // Size is 2 * number of pairs to accommodate all possible unique words
25 int pairCount = similarPairs.size();
26 parent = new int[pairCount << 1]; // Equivalent to pairCount * 2
27 for (int i = 0; i < parent.length; ++i) {
28 parent[i] = i; // Each element is initially its own parent
29 }
30
31 // Map each unique word to a unique index for Union-Find
32 Map<String, Integer> wordToIndex = new HashMap<>();
33 int currentIndex = 0;
34
35 // Build similarity groups using Union-Find
36 for (List<String> pair : similarPairs) {
37 String word1 = pair.get(0);
38 String word2 = pair.get(1);
39
40 // Assign index to word1 if not seen before
41 if (!wordToIndex.containsKey(word1)) {
42 wordToIndex.put(word1, currentIndex++);
43 }
44
45 // Assign index to word2 if not seen before
46 if (!wordToIndex.containsKey(word2)) {
47 wordToIndex.put(word2, currentIndex++);
48 }
49
50 // Union the two words in the same similarity group
51 int root1 = find(wordToIndex.get(word1));
52 int root2 = find(wordToIndex.get(word2));
53 parent[root1] = root2;
54 }
55
56 // Check if corresponding words in both sentences are similar
57 for (int i = 0; i < sentence1.length; ++i) {
58 // If words are identical, they are similar
59 if (Objects.equals(sentence1[i], sentence2[i])) {
60 continue;
61 }
62
63 // Check if both words exist in similarity pairs and belong to same group
64 if (!wordToIndex.containsKey(sentence1[i]) ||
65 !wordToIndex.containsKey(sentence2[i]) ||
66 find(wordToIndex.get(sentence1[i])) != find(wordToIndex.get(sentence2[i]))) {
67 return false;
68 }
69 }
70
71 return true;
72 }
73
74 /**
75 * Find operation with path compression for Union-Find.
76 * Finds the root parent of element x and compresses the path.
77 *
78 * @param x Element to find root for
79 * @return Root parent of element x
80 */
81 private int find(int x) {
82 if (parent[x] != x) {
83 // Path compression: make x point directly to root
84 parent[x] = find(parent[x]);
85 }
86 return parent[x];
87 }
88}
89
1class Solution {
2public:
3 vector<int> parent; // Parent array for Union-Find data structure
4
5 bool areSentencesSimilarTwo(vector<string>& sentence1, vector<string>& sentence2, vector<vector<string>>& similarPairs) {
6 // If sentences have different lengths, they cannot be similar
7 if (sentence1.size() != sentence2.size()) {
8 return false;
9 }
10
11 int numPairs = similarPairs.size();
12 // Initialize parent array with size 2 * numPairs (worst case: all words are unique)
13 parent.resize(numPairs * 2);
14 for (int i = 0; i < parent.size(); ++i) {
15 parent[i] = i; // Initially, each element is its own parent
16 }
17
18 // Map to store word to index mapping for Union-Find
19 unordered_map<string, int> wordToIndex;
20 int currentIndex = 0;
21
22 // Build Union-Find structure from similar pairs
23 for (const auto& pair : similarPairs) {
24 string word1 = pair[0];
25 string word2 = pair[1];
26
27 // Assign indices to new words
28 if (wordToIndex.find(word1) == wordToIndex.end()) {
29 wordToIndex[word1] = currentIndex++;
30 }
31 if (wordToIndex.find(word2) == wordToIndex.end()) {
32 wordToIndex[word2] = currentIndex++;
33 }
34
35 // Union the two similar words
36 parent[find(wordToIndex[word1])] = find(wordToIndex[word2]);
37 }
38
39 // Check if corresponding words in both sentences are similar
40 for (int i = 0; i < sentence1.size(); ++i) {
41 // If words are identical, they are similar
42 if (sentence1[i] == sentence2[i]) {
43 continue;
44 }
45
46 // Check if both words exist in similarity groups and belong to the same group
47 if (wordToIndex.find(sentence1[i]) == wordToIndex.end() ||
48 wordToIndex.find(sentence2[i]) == wordToIndex.end() ||
49 find(wordToIndex[sentence1[i]]) != find(wordToIndex[sentence2[i]])) {
50 return false;
51 }
52 }
53
54 return true;
55 }
56
57private:
58 // Find operation with path compression for Union-Find
59 int find(int x) {
60 if (parent[x] != x) {
61 parent[x] = find(parent[x]); // Path compression
62 }
63 return parent[x];
64 }
65};
66
1let parent: number[] = []; // Parent array for Union-Find data structure
2
3function areSentencesSimilarTwo(sentence1: string[], sentence2: string[], similarPairs: string[][]): boolean {
4 // If sentences have different lengths, they cannot be similar
5 if (sentence1.length !== sentence2.length) {
6 return false;
7 }
8
9 const numPairs = similarPairs.length;
10 // Initialize parent array with size 2 * numPairs (worst case: all words are unique)
11 parent = new Array(numPairs * 2);
12 for (let i = 0; i < parent.length; i++) {
13 parent[i] = i; // Initially, each element is its own parent
14 }
15
16 // Map to store word to index mapping for Union-Find
17 const wordToIndex = new Map<string, number>();
18 let currentIndex = 0;
19
20 // Build Union-Find structure from similar pairs
21 for (const pair of similarPairs) {
22 const word1 = pair[0];
23 const word2 = pair[1];
24
25 // Assign indices to new words
26 if (!wordToIndex.has(word1)) {
27 wordToIndex.set(word1, currentIndex++);
28 }
29 if (!wordToIndex.has(word2)) {
30 wordToIndex.set(word2, currentIndex++);
31 }
32
33 // Union the two similar words
34 parent[find(wordToIndex.get(word1)!)] = find(wordToIndex.get(word2)!);
35 }
36
37 // Check if corresponding words in both sentences are similar
38 for (let i = 0; i < sentence1.length; i++) {
39 // If words are identical, they are similar
40 if (sentence1[i] === sentence2[i]) {
41 continue;
42 }
43
44 // Check if both words exist in similarity groups and belong to the same group
45 if (!wordToIndex.has(sentence1[i]) ||
46 !wordToIndex.has(sentence2[i]) ||
47 find(wordToIndex.get(sentence1[i])!) !== find(wordToIndex.get(sentence2[i])!)) {
48 return false;
49 }
50 }
51
52 return true;
53}
54
55// Find operation with path compression for Union-Find
56function find(x: number): number {
57 if (parent[x] !== x) {
58 parent[x] = find(parent[x]); // Path compression
59 }
60 return parent[x];
61}
62
Time and Space Complexity
Time Complexity: O(P * Ξ±(P) + S)
- Building the union-find structure requires iterating through all
P
similar pairs, whereP = len(similarPairs)
- For each pair, we perform at most 2 insertions into the dictionary (constant time each) and 2
find
operations with union - The
find
operation with path compression has an amortized time complexity ofO(Ξ±(P))
, whereΞ±
is the inverse Ackermann function, which is effectively constant for practical purposes - The union operation (setting parent) also involves 2
find
calls, so it'sO(Ξ±(P))
- Processing similar pairs:
O(P * Ξ±(P))
- Comparing sentences requires iterating through
S
words whereS = len(sentence1)
, and for each word we may perform up to 2find
operations:O(S * Ξ±(P))
- Total:
O(P * Ξ±(P) + S * Ξ±(P))
which simplifies toO((P + S) * Ξ±(P))
βO(P + S)
for practical purposes
Space Complexity: O(P)
- The parent array
p
is initialized with size2 * P
(since there can be at most2 * P
unique words from similar pairs):O(P)
- The
words
dictionary stores at most2 * P
unique words:O(P)
- The recursive call stack for
find
operation can go up toO(log P)
in the worst case before path compression, but with path compression it becomesO(Ξ±(P))
which is effectively constant - Total:
O(P)
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Forgetting to Handle Self-Similarity
A critical edge case is when words appear in sentences but NOT in the similarity pairs. Many solutions incorrectly return False
when a word from sentence1
matches a word from sentence2
but neither appears in similarPairs
.
Incorrect Approach:
# This would fail for sentence1=["great"], sentence2=["great"], similarPairs=[]
for i in range(len(sentence1)):
if (sentence1[i] not in word_to_index or
sentence2[i] not in word_to_index or
find(word_to_index[sentence1[i]]) != find(word_to_index[sentence2[i]])):
return False
Correct Solution: Always check for word equality first:
for i in range(len(sentence1)):
if sentence1[i] == sentence2[i]:
continue # Words are identical, so they're similar
# Only then check similarity pairs
2. Incorrect Union Operation
A common mistake is directly assigning parents without using the find
operation, which breaks the Union-Find structure.
Incorrect:
# This doesn't properly unite the groups parent[word_to_index[word1]] = word_to_index[word2]
Correct:
# Properly unite the root parents of both groups parent[find(word_to_index[word1])] = find(word_to_index[word2])
3. Insufficient Parent Array Size
Allocating exactly n
spaces for the parent array when there could be up to 2n
unique words.
Incorrect:
parent = list(range(len(similarPairs))) # Too small!
Correct:
parent = list(range(len(similarPairs) * 2)) # Accounts for all possible unique words
4. Missing Path Compression
Implementing find
without path compression leads to degraded performance in skewed trees.
Inefficient:
def find(x):
while parent[x] != x:
x = parent[x]
return x
Optimized:
def find(x):
if parent[x] != x:
parent[x] = find(parent[x]) # Path compression
return parent[x]
5. Not Handling Empty Similarity Pairs
When similarPairs
is empty, the solution should still work correctly by relying on word equality checks.
Test Case to Verify:
sentence1 = ["I", "love", "coding"] sentence2 = ["I", "love", "coding"] similarPairs = [] # Should return True (all words are identical)
What is an advantages of top-down dynamic programming vs bottom-up dynamic programming?
Recommended Readings
https assets algo monster cover_photos dfs svg Depth First Search Prereqs Recursion Review problems recursion_intro Trees problems tree_intro With a solid understanding of recursion under our belts we are now ready to tackle one of the most useful techniques in coding interviews Depth First Search DFS As the name suggests
https assets algo monster cover_photos bfs svg Breadth First Search on Trees Hopefully by this time you've drunk enough DFS Kool Aid to understand its immense power and seen enough visualization to create a call stack in your mind Now let me introduce the companion spell Breadth First Search BFS
Union Find Disjoint Set Union Data Structure Introduction Prerequisite Depth First Search Review problems dfs_intro Once we have a strong grasp of recursion and Depth First Search we can now introduce Disjoint Set Union DSU This data structure is motivated by the following problem Suppose we have sets of elements
Want a Structured Path to Master System Design Too? Donβt Miss This!