839. Similar String Groups
Problem Description
You are given a list of strings where every string is an anagram of every other string (they contain the same characters but possibly in different order).
Two strings X
and Y
are considered similar if:
- They are identical, OR
- We can make them equal by swapping at most two letters at distinct positions in string
X
For example:
"tars"
and"rats"
are similar (swap positions 0 and 2:tβr
andaβt
)"rats"
and"arts"
are similar (swap positions 0 and 2:rβa
andaβr
)"star"
is NOT similar to"tars"
,"rats"
, or"arts"
Strings form groups based on similarity. A string belongs to a group if it is similar to at least one other string in that group. This creates a transitive relationship - if string A is similar to B, and B is similar to C, then A and C belong to the same group even if A and C are not directly similar.
From the example above:
{"tars", "rats", "arts"}
form one group (even though"tars"
and"arts"
are not directly similar, they are connected through"rats"
){"star"}
forms another group by itself
Your task is to determine how many such groups exist in the given list of anagram strings.
The solution uses a Union-Find data structure to track connected components. For each pair of strings, if they differ in at most 2 positions, they are similar and should be in the same group. The union operation merges their groups, and the final answer is the number of remaining distinct groups (connected components).
Flowchart Walkthrough
First, let's pin down the algorithm using the Flowchart. Here's a step-by-step walkthrough:
Is it a graph?
- Yes: This problem involves strings that form connections based on similarity. Each string can be viewed as a node, and two nodes are connected if the strings are similar (differ by at most 2 character positions). We need to find connected components in this graph.
Is it a tree?
- No: The graph is not necessarily a tree. Multiple strings can be similar to each other forming cycles, and there can be multiple disconnected components.
Is the problem related to directed acyclic graphs?
- No: The similarity relationship is symmetric (if A is similar to B, then B is similar to A), making this an undirected graph problem.
Is the problem related to shortest paths?
- No: We're not finding shortest paths between nodes. We need to count the number of connected components (groups).
Does the problem involve connectivity?
- Yes: The core of the problem is finding how many connected groups exist. Strings in the same group are connected through similarity relationships, either directly or transitively.
Disjoint Set Union
- Yes: DSU (Union-Find) is the recommended approach for connectivity problems, especially when we need to count connected components.
Conclusion: The flowchart leads us to use Disjoint Set Union (Union-Find) for this connectivity problem. While DFS could also solve this by exploring each connected component, Union-Find is more efficient for counting groups as we can merge similar strings and track the number of distinct components directly.
Note: Although the solution uses Union-Find, DFS would also work by building an adjacency list of similar strings and performing DFS to mark all strings in each connected component, counting how many times we initiate a new DFS traversal.
Intuition
The key insight is recognizing that this is fundamentally a graph connectivity problem in disguise. Each string is like a person in a social network, and two people are "friends" if their names (strings) are similar enough (differ by at most 2 characters). We want to find how many friend groups exist where everyone in a group is connected through a chain of friendships.
Consider the example with "tars"
, "rats"
, "arts"
, and "star"
. Even though "tars"
and "arts"
aren't directly similar, they're in the same group because both are similar to "rats"
. This transitivity property is exactly what connected components in a graph represent.
Since all strings are anagrams, comparing any two strings for similarity is straightforward - we just count how many positions have different characters. If this count is 0 (identical) or 2 (one swap away), they're similar.
The challenge is efficiently grouping these strings. We could use DFS to explore each connected component, but Union-Find is particularly elegant here because:
- We're iterating through pairs of strings anyway to check similarity
- When we find two similar strings, we simply union them into the same group
- The number of successful unions tells us how many times we merged groups
- Starting with
n
individual groups (one per string), each successful union reduces the group count by 1
The beauty of this approach is that Union-Find handles the transitivity automatically. When we union "tars"
with "rats"
, and later union "rats"
with "arts"
, the data structure ensures all three end up in the same group without us explicitly checking if "tars"
and "arts"
should be together.
The final answer is simply the number of remaining distinct groups after all possible unions have been performed.
Learn more about Depth-First Search, Breadth-First Search and Union Find patterns.
Solution Approach
The solution implements Union-Find (Disjoint Set Union) with path compression and union by size optimization to efficiently manage the grouping of similar strings.
Union-Find Data Structure Setup:
The UnionFind
class maintains two arrays:
p
: Parent array wherep[x]
stores the parent of nodex
. Initially, each node is its own parent (p[i] = i
)size
: Size array tracking the size of each component for union by size optimization
The find
method implements path compression - when finding the root of a node, it updates all nodes along the path to point directly to the root, flattening the tree structure for faster future lookups.
The union
method merges two components by:
- Finding the roots of both nodes using
find
- If they're already in the same component, return
False
- Otherwise, attach the smaller component to the larger one (union by size) to keep the tree balanced
- Return
True
to indicate a successful merge
Main Algorithm:
-
Initialize variables:
n
= number of strings (also serves as our group counter)m
= length of each string- Create a Union-Find structure with
n
nodes (one for each string)
-
For each pair of strings
(i, s)
and(j, t)
wherej < i
:- Count the number of differing positions:
sum(s[k] != t[k] for k in range(m))
- If the count is β€ 2, the strings are similar
- Attempt to union indices
i
andj
- If union is successful (they weren't already in the same group), decrement
n
by 1
- Count the number of differing positions:
-
Return
n
as the final number of groups
Why this works:
Starting with n
groups (each string in its own group), every successful union operation merges two groups into one, reducing the total count by 1. The condition sum(s[k] != t[k] for k in range(m)) <= 2
efficiently checks if two anagram strings are similar:
- 0 differences: strings are identical
- 2 differences: exactly one swap can make them equal
- More than 2: not similar
The nested loop structure for i, s in enumerate(strs): for j, t in enumerate(strs[:i])
ensures we check each pair exactly once, avoiding redundant comparisons.
Ready to land your dream job?
Unlock your dream job with a 3-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through the algorithm with a small example: strs = ["tars", "rats", "arts", "star"]
Initial Setup:
n = 4
(number of groups, initially each string is its own group)m = 4
(length of each string)- Union-Find structure initialized with 4 nodes:
- Parent array:
[0, 1, 2, 3]
(each node is its own parent) - Size array:
[1, 1, 1, 1]
(each group has size 1)
- Parent array:
Step-by-Step Comparison:
-
i=1, s="rats": Compare with all previous strings
- Compare with j=0, t="tars":
- Differences at positions: 0 (rβ t), 2 (tβ r) β 2 differences
- Similar! Union(1, 0) succeeds
- Parent array becomes:
[0, 0, 2, 3]
(node 1 now points to 0) n = 3
(one group merged)
- Compare with j=0, t="tars":
-
i=2, s="arts": Compare with all previous strings
- Compare with j=0, t="tars":
- Differences at positions: 0 (aβ t), 1 (rβ a) β 2 differences
- Similar! Union(2, 0) succeeds
- Parent array becomes:
[0, 0, 0, 3]
(node 2 joins group with root 0) n = 2
- Compare with j=1, t="rats":
- Differences at positions: 0 (aβ r), 2 (tβ a) β 2 differences
- Similar! Union(2, 1) attempted
- Both already have root 0, so no merge needed
n
remains 2
- Compare with j=0, t="tars":
-
i=3, s="star": Compare with all previous strings
- Compare with j=0, t="tars":
- Differences at positions: 0 (sβ t), 2 (aβ r), 3 (rβ s) β 3 differences
- Not similar (>2 differences)
- Compare with j=1, t="rats":
- Differences at positions: 0 (sβ r), 1 (tβ a), 2 (aβ t), 3 (rβ s) β 4 differences
- Not similar
- Compare with j=2, t="arts":
- Differences at positions: 0 (sβ a), 1 (tβ r), 3 (rβ s) β 3 differences
- Not similar
- Compare with j=0, t="tars":
Final Result:
- Parent array:
[0, 0, 0, 3]
- Groups: {0, 1, 2} and {3}
- Return
n = 2
The algorithm correctly identifies 2 groups:
- Group 1:
["tars", "rats", "arts"]
- all connected through similarity - Group 2:
["star"]
- isolated, not similar to any other string
Solution Implementation
1from typing import List
2
3
4class UnionFind:
5 """
6 Union-Find (Disjoint Set Union) data structure with path compression and union by size.
7 """
8
9 def __init__(self, n: int) -> None:
10 """
11 Initialize Union-Find structure with n elements.
12
13 Args:
14 n: Number of elements (0 to n-1)
15 """
16 # Each element is initially its own parent (self-loop indicates root)
17 self.parent = list(range(n))
18 # Track size of each component for union by size optimization
19 self.component_size = [1] * n
20
21 def find(self, x: int) -> int:
22 """
23 Find the root/representative of the component containing element x.
24 Uses path compression to flatten the tree structure.
25
26 Args:
27 x: Element to find the root for
28
29 Returns:
30 Root element of the component containing x
31 """
32 if self.parent[x] != x:
33 # Path compression: make every node point directly to root
34 self.parent[x] = self.find(self.parent[x])
35 return self.parent[x]
36
37 def union(self, a: int, b: int) -> bool:
38 """
39 Unite the components containing elements a and b.
40 Uses union by size to keep the tree balanced.
41
42 Args:
43 a: First element
44 b: Second element
45
46 Returns:
47 True if elements were in different components (union performed),
48 False if already in same component
49 """
50 root_a = self.find(a)
51 root_b = self.find(b)
52
53 # Already in the same component
54 if root_a == root_b:
55 return False
56
57 # Union by size: attach smaller tree under larger tree's root
58 if self.component_size[root_a] > self.component_size[root_b]:
59 self.parent[root_b] = root_a
60 self.component_size[root_a] += self.component_size[root_b]
61 else:
62 self.parent[root_a] = root_b
63 self.component_size[root_b] += self.component_size[root_a]
64
65 return True
66
67
68class Solution:
69 def numSimilarGroups(self, strs: List[str]) -> int:
70 """
71 Find the number of groups of similar strings.
72 Two strings are similar if they differ in at most 2 positions
73 (can be made equal by swapping at most one pair of characters).
74
75 Args:
76 strs: List of strings to group
77
78 Returns:
79 Number of similar groups
80 """
81 num_strings = len(strs)
82 string_length = len(strs[0])
83
84 # Initialize Union-Find with one component per string
85 union_find = UnionFind(num_strings)
86
87 # Start with assumption that each string is its own group
88 num_groups = num_strings
89
90 # Compare each string with all previous strings
91 for i, current_string in enumerate(strs):
92 for j, previous_string in enumerate(strs[:i]):
93 # Count positions where characters differ
94 diff_count = sum(
95 current_string[k] != previous_string[k]
96 for k in range(string_length)
97 )
98
99 # Strings are similar if they differ in at most 2 positions
100 # (0 differences = identical, 2 differences = one swap away)
101 if diff_count <= 2:
102 # If union is successful (strings were in different groups),
103 # decrease the total number of groups
104 if union_find.union(i, j):
105 num_groups -= 1
106
107 return num_groups
108
1/**
2 * Union-Find (Disjoint Set Union) data structure with path compression and union by size
3 */
4class UnionFind {
5 private final int[] parent; // parent[i] stores the parent of node i
6 private final int[] size; // size[i] stores the size of the tree rooted at i
7
8 /**
9 * Initialize Union-Find structure with n elements (0 to n-1)
10 * @param n Number of elements
11 */
12 public UnionFind(int n) {
13 parent = new int[n];
14 size = new int[n];
15
16 // Initially, each element is its own parent (separate component)
17 for (int i = 0; i < n; ++i) {
18 parent[i] = i;
19 size[i] = 1;
20 }
21 }
22
23 /**
24 * Find the root of the component containing element x
25 * Uses path compression optimization for better performance
26 * @param x Element to find root for
27 * @return Root of the component containing x
28 */
29 public int find(int x) {
30 // Path compression: make every node point directly to root
31 if (parent[x] != x) {
32 parent[x] = find(parent[x]);
33 }
34 return parent[x];
35 }
36
37 /**
38 * Union two components containing elements a and b
39 * Uses union by size optimization (smaller tree attached to larger tree)
40 * @param a First element
41 * @param b Second element
42 * @return true if union was performed (elements were in different components), false otherwise
43 */
44 public boolean union(int a, int b) {
45 int rootA = find(a);
46 int rootB = find(b);
47
48 // Already in the same component
49 if (rootA == rootB) {
50 return false;
51 }
52
53 // Union by size: attach smaller tree to larger tree
54 if (size[rootA] > size[rootB]) {
55 parent[rootB] = rootA;
56 size[rootA] += size[rootB];
57 } else {
58 parent[rootA] = rootB;
59 size[rootB] += size[rootA];
60 }
61
62 return true;
63 }
64}
65
66/**
67 * Solution for finding number of similar string groups
68 * Two strings are similar if they differ by exactly 0 or 2 characters (can be made equal by swapping)
69 */
70class Solution {
71 /**
72 * Find the number of groups of similar strings
73 * @param strs Array of strings (all anagrams of each other)
74 * @return Number of groups where strings in same group are similar
75 */
76 public int numSimilarGroups(String[] strs) {
77 int n = strs.length; // Number of strings
78 int m = strs[0].length(); // Length of each string
79
80 // Initialize Union-Find with n strings (indexed 0 to n-1)
81 UnionFind unionFind = new UnionFind(n);
82
83 // Start with n groups (each string is its own group)
84 int groupCount = n;
85
86 // Check all pairs of strings
87 for (int i = 0; i < n; ++i) {
88 for (int j = 0; j < i; ++j) {
89 // Count character differences between strings i and j
90 int differenceCount = 0;
91 for (int k = 0; k < m; ++k) {
92 if (strs[i].charAt(k) != strs[j].charAt(k)) {
93 ++differenceCount;
94 }
95 }
96
97 // If strings are similar (differ by 0 or 2 characters)
98 // and successfully united (were in different groups)
99 if (differenceCount <= 2 && unionFind.union(i, j)) {
100 // Decrease group count as two groups merged into one
101 --groupCount;
102 }
103 }
104 }
105
106 return groupCount;
107 }
108}
109
1class UnionFind {
2public:
3 /**
4 * Constructor: Initialize Union-Find data structure with n elements
5 * @param n: Number of elements (0 to n-1)
6 */
7 UnionFind(int n) {
8 parent = vector<int>(n);
9 rank = vector<int>(n, 1);
10 // Initialize each element as its own parent (self-loop)
11 iota(parent.begin(), parent.end(), 0);
12 }
13
14 /**
15 * Unite two elements into the same set
16 * @param a: First element
17 * @param b: Second element
18 * @return: true if union was performed (elements were in different sets), false otherwise
19 */
20 bool unite(int a, int b) {
21 int rootA = find(a);
22 int rootB = find(b);
23
24 // Already in the same set
25 if (rootA == rootB) {
26 return false;
27 }
28
29 // Union by rank: attach smaller tree under root of larger tree
30 if (rank[rootA] > rank[rootB]) {
31 parent[rootB] = rootA;
32 rank[rootA] += rank[rootB];
33 } else {
34 parent[rootA] = rootB;
35 rank[rootB] += rank[rootA];
36 }
37
38 return true;
39 }
40
41 /**
42 * Find the root/representative of the set containing element x
43 * Uses path compression for optimization
44 * @param x: Element to find root for
45 * @return: Root element of the set containing x
46 */
47 int find(int x) {
48 // Path compression: make every node point directly to root
49 if (parent[x] != x) {
50 parent[x] = find(parent[x]);
51 }
52 return parent[x];
53 }
54
55private:
56 vector<int> parent; // Parent array for Union-Find
57 vector<int> rank; // Rank/size array for union by rank optimization
58};
59
60class Solution {
61public:
62 /**
63 * Find number of groups of similar strings
64 * Two strings are similar if they differ by at most 2 characters
65 * @param strs: Vector of strings to group
66 * @return: Number of similar groups
67 */
68 int numSimilarGroups(vector<string>& strs) {
69 int numStrings = strs.size();
70 int stringLength = strs[0].size();
71
72 // Initially, each string is its own group
73 int numGroups = numStrings;
74
75 UnionFind uf(numStrings);
76
77 // Check all pairs of strings
78 for (int i = 0; i < numStrings; ++i) {
79 for (int j = 0; j < i; ++j) {
80 // Count number of differing characters between strs[i] and strs[j]
81 int diffCount = 0;
82 for (int k = 0; k < stringLength; ++k) {
83 if (strs[i][k] != strs[j][k]) {
84 diffCount++;
85 }
86 }
87
88 // If strings are similar (differ by at most 2 chars) and not already in same group
89 if (diffCount <= 2 && uf.unite(i, j)) {
90 // Successfully merged two groups, decrease group count
91 numGroups--;
92 }
93 }
94 }
95
96 return numGroups;
97 }
98};
99
1// Parent array for Union-Find structure
2let parent: number[];
3// Size array for union by rank optimization
4let size: number[];
5
6/**
7 * Initializes the Union-Find data structure
8 * @param n - Number of elements
9 */
10function initUnionFind(n: number): void {
11 // Initialize each element as its own parent (self-loop)
12 parent = Array.from({ length: n }, (_, index) => index);
13 // Initialize size of each component as 1
14 size = Array(n).fill(1);
15}
16
17/**
18 * Unites two elements into the same set
19 * @param a - First element
20 * @param b - Second element
21 * @returns true if union was performed, false if already in same set
22 */
23function union(a: number, b: number): boolean {
24 // Find root parents of both elements
25 const rootA = find(a);
26 const rootB = find(b);
27
28 // If already in the same set, no union needed
29 if (rootA === rootB) {
30 return false;
31 }
32
33 // Union by rank: attach smaller tree under larger tree
34 if (size[rootA] > size[rootB]) {
35 parent[rootB] = rootA;
36 size[rootA] += size[rootB];
37 } else {
38 parent[rootA] = rootB;
39 size[rootB] += size[rootA];
40 }
41
42 return true;
43}
44
45/**
46 * Finds the root parent of an element with path compression
47 * @param x - Element to find root for
48 * @returns Root parent of the element
49 */
50function find(x: number): number {
51 // Path compression: make all nodes point directly to root
52 if (parent[x] !== x) {
53 parent[x] = find(parent[x]);
54 }
55 return parent[x];
56}
57
58/**
59 * Counts the number of similar string groups
60 * Two strings are similar if they differ by at most 2 characters
61 * @param strs - Array of strings to group
62 * @returns Number of distinct groups
63 */
64function numSimilarGroups(strs: string[]): number {
65 const n = strs.length;
66 const stringLength = strs[0].length;
67
68 // Initialize Union-Find structure
69 initUnionFind(n);
70
71 // Start with n groups (each string is its own group)
72 let groupCount = n;
73
74 // Compare all pairs of strings
75 for (let i = 0; i < n; i++) {
76 for (let j = 0; j < i; j++) {
77 // Count character differences between strings
78 let differenceCount = 0;
79 for (let charIndex = 0; charIndex < stringLength; charIndex++) {
80 if (strs[i][charIndex] !== strs[j][charIndex]) {
81 differenceCount++;
82 }
83 }
84
85 // If strings are similar (differ by at most 2 chars) and not yet connected
86 if (differenceCount <= 2 && union(i, j)) {
87 // Decrease group count when two groups merge
88 groupCount--;
89 }
90 }
91 }
92
93 return groupCount;
94}
95
Time and Space Complexity
Time Complexity: O(nΒ² Γ (m + Ξ±(n)))
The time complexity breaks down as follows:
- The outer loop iterates through all
n
strings:O(n)
- The inner loop iterates through up to
i
strings for eachi
, giving usO(nΒ²)
total iterations across both loops - For each pair of strings being compared:
- String comparison takes
O(m)
time to check character differences - The
union
operation callsfind
twice, each takingO(Ξ±(n))
amortized time due to path compression - Total per pair:
O(m + Ξ±(n))
- String comparison takes
- Overall:
O(nΒ² Γ (m + Ξ±(n)))
Where n
is the number of strings, m
is the length of each string, and Ξ±(n)
is the inverse Ackermann function (effectively a very small constant in practice).
Space Complexity: O(n)
The space complexity consists of:
- UnionFind data structure stores two arrays:
- Parent array
p
:O(n)
- Size array
size
:O(n)
- Parent array
- No additional significant space is used in the main algorithm
- Total:
O(n)
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Incorrect Similarity Check for Anagrams
A critical pitfall is assuming that two anagram strings differing in exactly 2 positions can always be made equal with one swap. This is only true if those 2 positions contain swapped characters.
Incorrect approach:
# Simply counting differences is NOT sufficient!
diff_count = sum(s[k] != t[k] for k in range(len(s)))
if diff_count <= 2: # This can give false positives
# Mark as similar
The problem: Consider "abcd"
and "abdc"
. They differ at positions 2 and 3 (cβ d
and dβ c
). These ARE similar because swapping positions 2 and 3 in the first string gives the second. However, "abcd"
and "efcd"
also differ in exactly 2 positions (0 and 1), but they are NOT similar because no single swap can transform one into the other.
Correct approach:
def are_similar(s: str, t: str) -> bool:
if s == t:
return True
diff_positions = []
for i in range(len(s)):
if s[i] != t[i]:
diff_positions.append(i)
# Must have exactly 2 differences for a swap to work
if len(diff_positions) != 2:
return False
# Check if swapping would actually make them equal
i, j = diff_positions
return s[i] == t[j] and s[j] == t[i]
2. Inefficient String Comparison for Large Datasets
When the number of strings is small but string length is large, comparing all pairs becomes expensive. The current solution has O(nΒ² Γ m) complexity where n is the number of strings and m is string length.
Optimization for when n >> m: Instead of comparing all pairs, group strings by their sorted character representation first, then only compare within groups:
def numSimilarGroups(self, strs: List[str]) -> int:
from collections import defaultdict
# Group anagrams together first
anagram_groups = defaultdict(list)
for i, s in enumerate(strs):
key = ''.join(sorted(s))
anagram_groups[key].append(i)
n = len(strs)
uf = UnionFind(n)
num_groups = n
# Only compare strings within the same anagram group
for indices in anagram_groups.values():
for i in range(len(indices)):
for j in range(i + 1, len(indices)):
idx1, idx2 = indices[i], indices[j]
if are_similar(strs[idx1], strs[idx2]):
if uf.union(idx1, idx2):
num_groups -= 1
return num_groups
3. Memory Issues with Large String Lists
The Union-Find structure uses O(n) space, but the nested loop approach can cause memory issues when creating intermediate data structures.
Solution: Use generators and iterate efficiently:
# Instead of creating a slice with strs[:i]
for j in range(i): # More memory efficient
if are_similar(strs[i], strs[j]):
# Process similarity
4. Not Handling Edge Cases
Common missed cases:
- Empty list of strings
- Single string in the list
- All identical strings
- Strings of different lengths (though the problem states they're anagrams)
Robust implementation:
def numSimilarGroups(self, strs: List[str]) -> int:
if not strs:
return 0
if len(strs) == 1:
return 1
# Validate that all strings are anagrams (same length, same character frequency)
if len(set(len(s) for s in strs)) > 1:
raise ValueError("All strings must have the same length")
# Continue with main algorithm...
The most critical pitfall is #1 - the similarity check must verify that the differing positions actually contain swapped characters, not just count the differences. This is essential for correctness when working with anagram strings.
Which of the following uses divide and conquer strategy?
Recommended Readings
https assets algo monster cover_photos dfs svg Depth First Search Prereqs Recursion Review problems recursion_intro Trees problems tree_intro With a solid understanding of recursion under our belts we are now ready to tackle one of the most useful techniques in coding interviews Depth First Search DFS As the name suggests
https assets algo monster cover_photos bfs svg Breadth First Search on Trees Hopefully by this time you've drunk enough DFS Kool Aid to understand its immense power and seen enough visualization to create a call stack in your mind Now let me introduce the companion spell Breadth First Search BFS
Union Find Disjoint Set Union Data Structure Introduction Prerequisite Depth First Search Review problems dfs_intro Once we have a strong grasp of recursion and Depth First Search we can now introduce Disjoint Set Union DSU This data structure is motivated by the following problem Suppose we have sets of elements
Want a Structured Path to Master System Design Too? Donβt Miss This!