Facebook Pixel

839. Similar String Groups

Problem Description

You are given a list of strings where every string is an anagram of every other string (they contain the same characters but possibly in different order).

Two strings X and Y are considered similar if:

  • They are identical, OR
  • We can make them equal by swapping at most two letters at distinct positions in string X

For example:

  • "tars" and "rats" are similar (swap positions 0 and 2: t↔r and a↔t)
  • "rats" and "arts" are similar (swap positions 0 and 2: r↔a and a↔r)
  • "star" is NOT similar to "tars", "rats", or "arts"

Strings form groups based on similarity. A string belongs to a group if it is similar to at least one other string in that group. This creates a transitive relationship - if string A is similar to B, and B is similar to C, then A and C belong to the same group even if A and C are not directly similar.

From the example above:

  • {"tars", "rats", "arts"} form one group (even though "tars" and "arts" are not directly similar, they are connected through "rats")
  • {"star"} forms another group by itself

Your task is to determine how many such groups exist in the given list of anagram strings.

The solution uses a Union-Find data structure to track connected components. For each pair of strings, if they differ in at most 2 positions, they are similar and should be in the same group. The union operation merges their groups, and the final answer is the number of remaining distinct groups (connected components).

Flowchart Walkthrough

First, let's pin down the algorithm using the Flowchart. Here's a step-by-step walkthrough:

Is it a graph?

  • Yes: This problem involves strings that form connections based on similarity. Each string can be viewed as a node, and two nodes are connected if the strings are similar (differ by at most 2 character positions). We need to find connected components in this graph.

Is it a tree?

  • No: The graph is not necessarily a tree. Multiple strings can be similar to each other forming cycles, and there can be multiple disconnected components.

Is the problem related to directed acyclic graphs?

  • No: The similarity relationship is symmetric (if A is similar to B, then B is similar to A), making this an undirected graph problem.

Is the problem related to shortest paths?

  • No: We're not finding shortest paths between nodes. We need to count the number of connected components (groups).

Does the problem involve connectivity?

  • Yes: The core of the problem is finding how many connected groups exist. Strings in the same group are connected through similarity relationships, either directly or transitively.

Disjoint Set Union

  • Yes: DSU (Union-Find) is the recommended approach for connectivity problems, especially when we need to count connected components.

Conclusion: The flowchart leads us to use Disjoint Set Union (Union-Find) for this connectivity problem. While DFS could also solve this by exploring each connected component, Union-Find is more efficient for counting groups as we can merge similar strings and track the number of distinct components directly.

Note: Although the solution uses Union-Find, DFS would also work by building an adjacency list of similar strings and performing DFS to mark all strings in each connected component, counting how many times we initiate a new DFS traversal.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key insight is recognizing that this is fundamentally a graph connectivity problem in disguise. Each string is like a person in a social network, and two people are "friends" if their names (strings) are similar enough (differ by at most 2 characters). We want to find how many friend groups exist where everyone in a group is connected through a chain of friendships.

Consider the example with "tars", "rats", "arts", and "star". Even though "tars" and "arts" aren't directly similar, they're in the same group because both are similar to "rats". This transitivity property is exactly what connected components in a graph represent.

Since all strings are anagrams, comparing any two strings for similarity is straightforward - we just count how many positions have different characters. If this count is 0 (identical) or 2 (one swap away), they're similar.

The challenge is efficiently grouping these strings. We could use DFS to explore each connected component, but Union-Find is particularly elegant here because:

  1. We're iterating through pairs of strings anyway to check similarity
  2. When we find two similar strings, we simply union them into the same group
  3. The number of successful unions tells us how many times we merged groups
  4. Starting with n individual groups (one per string), each successful union reduces the group count by 1

The beauty of this approach is that Union-Find handles the transitivity automatically. When we union "tars" with "rats", and later union "rats" with "arts", the data structure ensures all three end up in the same group without us explicitly checking if "tars" and "arts" should be together.

The final answer is simply the number of remaining distinct groups after all possible unions have been performed.

Learn more about Depth-First Search, Breadth-First Search and Union Find patterns.

Solution Approach

The solution implements Union-Find (Disjoint Set Union) with path compression and union by size optimization to efficiently manage the grouping of similar strings.

Union-Find Data Structure Setup:

The UnionFind class maintains two arrays:

  • p: Parent array where p[x] stores the parent of node x. Initially, each node is its own parent (p[i] = i)
  • size: Size array tracking the size of each component for union by size optimization

The find method implements path compression - when finding the root of a node, it updates all nodes along the path to point directly to the root, flattening the tree structure for faster future lookups.

The union method merges two components by:

  1. Finding the roots of both nodes using find
  2. If they're already in the same component, return False
  3. Otherwise, attach the smaller component to the larger one (union by size) to keep the tree balanced
  4. Return True to indicate a successful merge

Main Algorithm:

  1. Initialize variables:

    • n = number of strings (also serves as our group counter)
    • m = length of each string
    • Create a Union-Find structure with n nodes (one for each string)
  2. For each pair of strings (i, s) and (j, t) where j < i:

    • Count the number of differing positions: sum(s[k] != t[k] for k in range(m))
    • If the count is ≀ 2, the strings are similar
    • Attempt to union indices i and j
    • If union is successful (they weren't already in the same group), decrement n by 1
  3. Return n as the final number of groups

Why this works:

Starting with n groups (each string in its own group), every successful union operation merges two groups into one, reducing the total count by 1. The condition sum(s[k] != t[k] for k in range(m)) <= 2 efficiently checks if two anagram strings are similar:

  • 0 differences: strings are identical
  • 2 differences: exactly one swap can make them equal
  • More than 2: not similar

The nested loop structure for i, s in enumerate(strs): for j, t in enumerate(strs[:i]) ensures we check each pair exactly once, avoiding redundant comparisons.

Ready to land your dream job?

Unlock your dream job with a 3-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through the algorithm with a small example: strs = ["tars", "rats", "arts", "star"]

Initial Setup:

  • n = 4 (number of groups, initially each string is its own group)
  • m = 4 (length of each string)
  • Union-Find structure initialized with 4 nodes:
    • Parent array: [0, 1, 2, 3] (each node is its own parent)
    • Size array: [1, 1, 1, 1] (each group has size 1)

Step-by-Step Comparison:

  1. i=1, s="rats": Compare with all previous strings

    • Compare with j=0, t="tars":
      • Differences at positions: 0 (rβ‰ t), 2 (tβ‰ r) β†’ 2 differences
      • Similar! Union(1, 0) succeeds
      • Parent array becomes: [0, 0, 2, 3] (node 1 now points to 0)
      • n = 3 (one group merged)
  2. i=2, s="arts": Compare with all previous strings

    • Compare with j=0, t="tars":
      • Differences at positions: 0 (aβ‰ t), 1 (rβ‰ a) β†’ 2 differences
      • Similar! Union(2, 0) succeeds
      • Parent array becomes: [0, 0, 0, 3] (node 2 joins group with root 0)
      • n = 2
    • Compare with j=1, t="rats":
      • Differences at positions: 0 (aβ‰ r), 2 (tβ‰ a) β†’ 2 differences
      • Similar! Union(2, 1) attempted
      • Both already have root 0, so no merge needed
      • n remains 2
  3. i=3, s="star": Compare with all previous strings

    • Compare with j=0, t="tars":
      • Differences at positions: 0 (sβ‰ t), 2 (aβ‰ r), 3 (rβ‰ s) β†’ 3 differences
      • Not similar (>2 differences)
    • Compare with j=1, t="rats":
      • Differences at positions: 0 (sβ‰ r), 1 (tβ‰ a), 2 (aβ‰ t), 3 (rβ‰ s) β†’ 4 differences
      • Not similar
    • Compare with j=2, t="arts":
      • Differences at positions: 0 (sβ‰ a), 1 (tβ‰ r), 3 (rβ‰ s) β†’ 3 differences
      • Not similar

Final Result:

  • Parent array: [0, 0, 0, 3]
  • Groups: {0, 1, 2} and {3}
  • Return n = 2

The algorithm correctly identifies 2 groups:

  • Group 1: ["tars", "rats", "arts"] - all connected through similarity
  • Group 2: ["star"] - isolated, not similar to any other string

Solution Implementation

1from typing import List
2
3
4class UnionFind:
5    """
6    Union-Find (Disjoint Set Union) data structure with path compression and union by size.
7    """
8  
9    def __init__(self, n: int) -> None:
10        """
11        Initialize Union-Find structure with n elements.
12      
13        Args:
14            n: Number of elements (0 to n-1)
15        """
16        # Each element is initially its own parent (self-loop indicates root)
17        self.parent = list(range(n))
18        # Track size of each component for union by size optimization
19        self.component_size = [1] * n
20  
21    def find(self, x: int) -> int:
22        """
23        Find the root/representative of the component containing element x.
24        Uses path compression to flatten the tree structure.
25      
26        Args:
27            x: Element to find the root for
28          
29        Returns:
30            Root element of the component containing x
31        """
32        if self.parent[x] != x:
33            # Path compression: make every node point directly to root
34            self.parent[x] = self.find(self.parent[x])
35        return self.parent[x]
36  
37    def union(self, a: int, b: int) -> bool:
38        """
39        Unite the components containing elements a and b.
40        Uses union by size to keep the tree balanced.
41      
42        Args:
43            a: First element
44            b: Second element
45          
46        Returns:
47            True if elements were in different components (union performed),
48            False if already in same component
49        """
50        root_a = self.find(a)
51        root_b = self.find(b)
52      
53        # Already in the same component
54        if root_a == root_b:
55            return False
56      
57        # Union by size: attach smaller tree under larger tree's root
58        if self.component_size[root_a] > self.component_size[root_b]:
59            self.parent[root_b] = root_a
60            self.component_size[root_a] += self.component_size[root_b]
61        else:
62            self.parent[root_a] = root_b
63            self.component_size[root_b] += self.component_size[root_a]
64      
65        return True
66
67
68class Solution:
69    def numSimilarGroups(self, strs: List[str]) -> int:
70        """
71        Find the number of groups of similar strings.
72        Two strings are similar if they differ in at most 2 positions
73        (can be made equal by swapping at most one pair of characters).
74      
75        Args:
76            strs: List of strings to group
77          
78        Returns:
79            Number of similar groups
80        """
81        num_strings = len(strs)
82        string_length = len(strs[0])
83      
84        # Initialize Union-Find with one component per string
85        union_find = UnionFind(num_strings)
86      
87        # Start with assumption that each string is its own group
88        num_groups = num_strings
89      
90        # Compare each string with all previous strings
91        for i, current_string in enumerate(strs):
92            for j, previous_string in enumerate(strs[:i]):
93                # Count positions where characters differ
94                diff_count = sum(
95                    current_string[k] != previous_string[k] 
96                    for k in range(string_length)
97                )
98              
99                # Strings are similar if they differ in at most 2 positions
100                # (0 differences = identical, 2 differences = one swap away)
101                if diff_count <= 2:
102                    # If union is successful (strings were in different groups),
103                    # decrease the total number of groups
104                    if union_find.union(i, j):
105                        num_groups -= 1
106      
107        return num_groups
108
1/**
2 * Union-Find (Disjoint Set Union) data structure with path compression and union by size
3 */
4class UnionFind {
5    private final int[] parent;     // parent[i] stores the parent of node i
6    private final int[] size;        // size[i] stores the size of the tree rooted at i
7
8    /**
9     * Initialize Union-Find structure with n elements (0 to n-1)
10     * @param n Number of elements
11     */
12    public UnionFind(int n) {
13        parent = new int[n];
14        size = new int[n];
15      
16        // Initially, each element is its own parent (separate component)
17        for (int i = 0; i < n; ++i) {
18            parent[i] = i;
19            size[i] = 1;
20        }
21    }
22
23    /**
24     * Find the root of the component containing element x
25     * Uses path compression optimization for better performance
26     * @param x Element to find root for
27     * @return Root of the component containing x
28     */
29    public int find(int x) {
30        // Path compression: make every node point directly to root
31        if (parent[x] != x) {
32            parent[x] = find(parent[x]);
33        }
34        return parent[x];
35    }
36
37    /**
38     * Union two components containing elements a and b
39     * Uses union by size optimization (smaller tree attached to larger tree)
40     * @param a First element
41     * @param b Second element
42     * @return true if union was performed (elements were in different components), false otherwise
43     */
44    public boolean union(int a, int b) {
45        int rootA = find(a);
46        int rootB = find(b);
47      
48        // Already in the same component
49        if (rootA == rootB) {
50            return false;
51        }
52      
53        // Union by size: attach smaller tree to larger tree
54        if (size[rootA] > size[rootB]) {
55            parent[rootB] = rootA;
56            size[rootA] += size[rootB];
57        } else {
58            parent[rootA] = rootB;
59            size[rootB] += size[rootA];
60        }
61      
62        return true;
63    }
64}
65
66/**
67 * Solution for finding number of similar string groups
68 * Two strings are similar if they differ by exactly 0 or 2 characters (can be made equal by swapping)
69 */
70class Solution {
71    /**
72     * Find the number of groups of similar strings
73     * @param strs Array of strings (all anagrams of each other)
74     * @return Number of groups where strings in same group are similar
75     */
76    public int numSimilarGroups(String[] strs) {
77        int n = strs.length;          // Number of strings
78        int m = strs[0].length();     // Length of each string
79      
80        // Initialize Union-Find with n strings (indexed 0 to n-1)
81        UnionFind unionFind = new UnionFind(n);
82      
83        // Start with n groups (each string is its own group)
84        int groupCount = n;
85      
86        // Check all pairs of strings
87        for (int i = 0; i < n; ++i) {
88            for (int j = 0; j < i; ++j) {
89                // Count character differences between strings i and j
90                int differenceCount = 0;
91                for (int k = 0; k < m; ++k) {
92                    if (strs[i].charAt(k) != strs[j].charAt(k)) {
93                        ++differenceCount;
94                    }
95                }
96              
97                // If strings are similar (differ by 0 or 2 characters)
98                // and successfully united (were in different groups)
99                if (differenceCount <= 2 && unionFind.union(i, j)) {
100                    // Decrease group count as two groups merged into one
101                    --groupCount;
102                }
103            }
104        }
105      
106        return groupCount;
107    }
108}
109
1class UnionFind {
2public:
3    /**
4     * Constructor: Initialize Union-Find data structure with n elements
5     * @param n: Number of elements (0 to n-1)
6     */
7    UnionFind(int n) {
8        parent = vector<int>(n);
9        rank = vector<int>(n, 1);
10        // Initialize each element as its own parent (self-loop)
11        iota(parent.begin(), parent.end(), 0);
12    }
13
14    /**
15     * Unite two elements into the same set
16     * @param a: First element
17     * @param b: Second element
18     * @return: true if union was performed (elements were in different sets), false otherwise
19     */
20    bool unite(int a, int b) {
21        int rootA = find(a);
22        int rootB = find(b);
23      
24        // Already in the same set
25        if (rootA == rootB) {
26            return false;
27        }
28      
29        // Union by rank: attach smaller tree under root of larger tree
30        if (rank[rootA] > rank[rootB]) {
31            parent[rootB] = rootA;
32            rank[rootA] += rank[rootB];
33        } else {
34            parent[rootA] = rootB;
35            rank[rootB] += rank[rootA];
36        }
37      
38        return true;
39    }
40
41    /**
42     * Find the root/representative of the set containing element x
43     * Uses path compression for optimization
44     * @param x: Element to find root for
45     * @return: Root element of the set containing x
46     */
47    int find(int x) {
48        // Path compression: make every node point directly to root
49        if (parent[x] != x) {
50            parent[x] = find(parent[x]);
51        }
52        return parent[x];
53    }
54
55private:
56    vector<int> parent;  // Parent array for Union-Find
57    vector<int> rank;    // Rank/size array for union by rank optimization
58};
59
60class Solution {
61public:
62    /**
63     * Find number of groups of similar strings
64     * Two strings are similar if they differ by at most 2 characters
65     * @param strs: Vector of strings to group
66     * @return: Number of similar groups
67     */
68    int numSimilarGroups(vector<string>& strs) {
69        int numStrings = strs.size();
70        int stringLength = strs[0].size();
71      
72        // Initially, each string is its own group
73        int numGroups = numStrings;
74      
75        UnionFind uf(numStrings);
76      
77        // Check all pairs of strings
78        for (int i = 0; i < numStrings; ++i) {
79            for (int j = 0; j < i; ++j) {
80                // Count number of differing characters between strs[i] and strs[j]
81                int diffCount = 0;
82                for (int k = 0; k < stringLength; ++k) {
83                    if (strs[i][k] != strs[j][k]) {
84                        diffCount++;
85                    }
86                }
87              
88                // If strings are similar (differ by at most 2 chars) and not already in same group
89                if (diffCount <= 2 && uf.unite(i, j)) {
90                    // Successfully merged two groups, decrease group count
91                    numGroups--;
92                }
93            }
94        }
95      
96        return numGroups;
97    }
98};
99
1// Parent array for Union-Find structure
2let parent: number[];
3// Size array for union by rank optimization
4let size: number[];
5
6/**
7 * Initializes the Union-Find data structure
8 * @param n - Number of elements
9 */
10function initUnionFind(n: number): void {
11    // Initialize each element as its own parent (self-loop)
12    parent = Array.from({ length: n }, (_, index) => index);
13    // Initialize size of each component as 1
14    size = Array(n).fill(1);
15}
16
17/**
18 * Unites two elements into the same set
19 * @param a - First element
20 * @param b - Second element
21 * @returns true if union was performed, false if already in same set
22 */
23function union(a: number, b: number): boolean {
24    // Find root parents of both elements
25    const rootA = find(a);
26    const rootB = find(b);
27  
28    // If already in the same set, no union needed
29    if (rootA === rootB) {
30        return false;
31    }
32  
33    // Union by rank: attach smaller tree under larger tree
34    if (size[rootA] > size[rootB]) {
35        parent[rootB] = rootA;
36        size[rootA] += size[rootB];
37    } else {
38        parent[rootA] = rootB;
39        size[rootB] += size[rootA];
40    }
41  
42    return true;
43}
44
45/**
46 * Finds the root parent of an element with path compression
47 * @param x - Element to find root for
48 * @returns Root parent of the element
49 */
50function find(x: number): number {
51    // Path compression: make all nodes point directly to root
52    if (parent[x] !== x) {
53        parent[x] = find(parent[x]);
54    }
55    return parent[x];
56}
57
58/**
59 * Counts the number of similar string groups
60 * Two strings are similar if they differ by at most 2 characters
61 * @param strs - Array of strings to group
62 * @returns Number of distinct groups
63 */
64function numSimilarGroups(strs: string[]): number {
65    const n = strs.length;
66    const stringLength = strs[0].length;
67  
68    // Initialize Union-Find structure
69    initUnionFind(n);
70  
71    // Start with n groups (each string is its own group)
72    let groupCount = n;
73  
74    // Compare all pairs of strings
75    for (let i = 0; i < n; i++) {
76        for (let j = 0; j < i; j++) {
77            // Count character differences between strings
78            let differenceCount = 0;
79            for (let charIndex = 0; charIndex < stringLength; charIndex++) {
80                if (strs[i][charIndex] !== strs[j][charIndex]) {
81                    differenceCount++;
82                }
83            }
84          
85            // If strings are similar (differ by at most 2 chars) and not yet connected
86            if (differenceCount <= 2 && union(i, j)) {
87                // Decrease group count when two groups merge
88                groupCount--;
89            }
90        }
91    }
92  
93    return groupCount;
94}
95

Time and Space Complexity

Time Complexity: O(nΒ² Γ— (m + Ξ±(n)))

The time complexity breaks down as follows:

  • The outer loop iterates through all n strings: O(n)
  • The inner loop iterates through up to i strings for each i, giving us O(nΒ²) total iterations across both loops
  • For each pair of strings being compared:
    • String comparison takes O(m) time to check character differences
    • The union operation calls find twice, each taking O(Ξ±(n)) amortized time due to path compression
    • Total per pair: O(m + Ξ±(n))
  • Overall: O(nΒ² Γ— (m + Ξ±(n)))

Where n is the number of strings, m is the length of each string, and Ξ±(n) is the inverse Ackermann function (effectively a very small constant in practice).

Space Complexity: O(n)

The space complexity consists of:

  • UnionFind data structure stores two arrays:
    • Parent array p: O(n)
    • Size array size: O(n)
  • No additional significant space is used in the main algorithm
  • Total: O(n)

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Incorrect Similarity Check for Anagrams

A critical pitfall is assuming that two anagram strings differing in exactly 2 positions can always be made equal with one swap. This is only true if those 2 positions contain swapped characters.

Incorrect approach:

# Simply counting differences is NOT sufficient!
diff_count = sum(s[k] != t[k] for k in range(len(s)))
if diff_count <= 2:  # This can give false positives
    # Mark as similar

The problem: Consider "abcd" and "abdc". They differ at positions 2 and 3 (c≠d and d≠c). These ARE similar because swapping positions 2 and 3 in the first string gives the second. However, "abcd" and "efcd" also differ in exactly 2 positions (0 and 1), but they are NOT similar because no single swap can transform one into the other.

Correct approach:

def are_similar(s: str, t: str) -> bool:
    if s == t:
        return True
  
    diff_positions = []
    for i in range(len(s)):
        if s[i] != t[i]:
            diff_positions.append(i)
  
    # Must have exactly 2 differences for a swap to work
    if len(diff_positions) != 2:
        return False
  
    # Check if swapping would actually make them equal
    i, j = diff_positions
    return s[i] == t[j] and s[j] == t[i]

2. Inefficient String Comparison for Large Datasets

When the number of strings is small but string length is large, comparing all pairs becomes expensive. The current solution has O(nΒ² Γ— m) complexity where n is the number of strings and m is string length.

Optimization for when n >> m: Instead of comparing all pairs, group strings by their sorted character representation first, then only compare within groups:

def numSimilarGroups(self, strs: List[str]) -> int:
    from collections import defaultdict
  
    # Group anagrams together first
    anagram_groups = defaultdict(list)
    for i, s in enumerate(strs):
        key = ''.join(sorted(s))
        anagram_groups[key].append(i)
  
    n = len(strs)
    uf = UnionFind(n)
    num_groups = n
  
    # Only compare strings within the same anagram group
    for indices in anagram_groups.values():
        for i in range(len(indices)):
            for j in range(i + 1, len(indices)):
                idx1, idx2 = indices[i], indices[j]
                if are_similar(strs[idx1], strs[idx2]):
                    if uf.union(idx1, idx2):
                        num_groups -= 1
  
    return num_groups

3. Memory Issues with Large String Lists

The Union-Find structure uses O(n) space, but the nested loop approach can cause memory issues when creating intermediate data structures.

Solution: Use generators and iterate efficiently:

# Instead of creating a slice with strs[:i]
for j in range(i):  # More memory efficient
    if are_similar(strs[i], strs[j]):
        # Process similarity

4. Not Handling Edge Cases

Common missed cases:

  • Empty list of strings
  • Single string in the list
  • All identical strings
  • Strings of different lengths (though the problem states they're anagrams)

Robust implementation:

def numSimilarGroups(self, strs: List[str]) -> int:
    if not strs:
        return 0
    if len(strs) == 1:
        return 1
  
    # Validate that all strings are anagrams (same length, same character frequency)
    if len(set(len(s) for s in strs)) > 1:
        raise ValueError("All strings must have the same length")
  
    # Continue with main algorithm...

The most critical pitfall is #1 - the similarity check must verify that the differing positions actually contain swapped characters, not just count the differences. This is essential for correctness when working with anagram strings.

Discover Your Strengths and Weaknesses: Take Our 3-Minute Quiz to Tailor Your Study Plan:

Which of the following uses divide and conquer strategy?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More