839. Similar String Groups


Problem Description

The problem requires us to determine the number of groups of similar strings within a list called strs. Two strings are considered similar if they can be made identical by swapping at most two letters in distinct positions within one of the strings. A group of similar strings is defined such that any string belongs to the group if it is similar to at least one other string in the group. Unlike simple string matching, two strings could be in the same group without being directly similar if they are connected through a chain of similarities. This forms a kind of "similarity network". The task is to count how many such networks or groups are present in the given list of strings.

A key factor to note is that every string in strs is an anagram of every other string in strs, which means they all contain the same characters but in different orders.

Flowchart Walkthrough

First, let's pin down the algorithm using the Flowchart. Here's a step-by-step walkthrough:

Is it a graph?

  • Yes: The problem can be interpreted as a graph where each string is a node, and there is an edge between two nodes if the strings are similar according to the problem criteria.

Is it a tree?

  • No: The graph may have multiple connected components, and cycles can exist because if string A is similar to string B, and string B is similar to string C, then string A might also be similar to string C.

Is the problem related to directed acyclic graphs (DAGs)?

  • No: This problem deals with undirected graph since the similarity between strings is bidirectional.

Is the problem related to shortest paths?

  • No: We are looking for groups of similar strings, not shortest paths.

Does the problem involve connectivity?

  • Yes: We need to determine how the strings are connected to each other in terms of similarity to form groups.

Conclusion: Using the flowchart, since this is a connectivity problem without the concern for shortest paths and not involving a weighted graph, DFS is suggested for exploring each connected component effectively. This makes Depth-First Search (DFS) an appropriate choice for the task of finding all connected components in the graph created by "similar string groups".

Intuition

To solve this problem, we can use a graph-based approach, treating each string as a node and connecting nodes if the strings are similar (considering the condition of being able to swap at most two letters). Effectively, each connected component in the graph would represent a group of similar strings. We can then count how many connected components exist in the graph to answer the question.

The solution uses the Union-Find data structure, which is a very efficient way to track and merge connected components. Each node (string) initially represents a separate group (its own parent). As we iterate over pairs of strings and find that they are similar, we unify them by assigning them to the same group (or in Union-Find terms, making one of them's root parent the parent of the other). The find function is used to determine the root parent of a node, and it recursively updates each node's parent to be the top-most parent, thereby flattening the structure and speeding up future queries. We respect the rule that two strings are considered similar if they have two or less mismatched characters positions since all strings are anagrams of one another.

As we proceed with the unification process for every similar pair of strings, the number of groups gets reduced until all similar strings are connected. Once we have processed all strings, the number of groups is the count of strings that are their own parent, which indicates that they are the root of their respective groups.

The essence of the solution is in identifying these connected components using the similarity rule provided, and efficiently keeping track of and merging these components with Union-Find.

Learn more about Depth-First Search, Breadth-First Search and Union Find patterns.

Solution Approach

The solution to this problem implements the Union-Find data structure, which is key to solving many problems that involve the grouping or partitioning of elements into disjoint sets. The basic operations of Union-Find are find, which gets the representative of a set (in this case, the index of a string that serves as the root parent), and union, which combines two sets into one (achieved by linking the parent of one set to the other).

Here's a step-by-step explanation of how the provided code works:

  1. Initialization: Each string is assigned its own group initially. This is represented in the code as a list p where each element is initialized to its own index, meaning it is its own parent:

    p = list(range(n))
  2. Find Function: A recursive find function is used to look for the root parent of a string. Along the way, it performs path compression by directly connecting the nodes to their root parent, which speeds up subsequent find operations:

    def find(x):
        if p[x] != x:
            p[x] = find(p[x])
        return p[x]
  3. Iterating Over String Pairs: The main logic iterates over all pairs of strings and counts the differences in their characters. If there are at most two differences (swaps), it means that the strings are similar:

    for i in range(n):
        for j in range(i + 1, n):
            if sum(strs[i][k] != strs[j][k] for k in range(l)) <= 2:
  4. Union Operation: When two strings are found to be similar, their groups are unified by setting the parent of one's group to be the parent of the other's group. This effectively merges their sets:

    p[find(i)] = find(j)
  5. Counting Groups: After all pairs have been considered, the number of groups is the count of strings that are still their own parent, which are the roots of the disjoint sets:

    return sum(i == find(i) for i in range(n))

In essence, the code uses Union-Find with path compression to group the similar strings efficiently. It all comes down to iterative comparisons of strings and updating the data structure to reflect these relationships, which allows us to count the separate groups without needing to explicitly maintain or search the graph of connected components.

Ready to land your dream job?

Unlock your dream job with a 2-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's use a small example to illustrate the solution approach described above. Suppose our list of strings strs is as follows:

strs = ["tars", "rats", "arts", "star"]

Each string in strs is an anagram of every other string, containing the same characters in different orders.

Step 1: Initialization

We start by initializing the parent list p with the indices of the strings, assuming each string is its own group:

p = [0, 1, 2, 3]

Step 2: Find Function

We define our find function, which will help us find the root parent of a string and apply path compression:

def find(x):
    if p[x] != x:
        p[x] = find(p[x])
    return p[x]

Step 3: Iterating Over String Pairs

Next, we compare each pair of strings to see if they are similar, which means there can be at most two character swaps to make them identical:

  • Comparing "tars" and "rats": There is one swap difference (t-r and r-t), so they are similar.
  • Comparing "tars" and "arts": There are two swap differences (t-a and a-t), so they are similar.
  • No need to compare "tars" and "star" since "tars" has already been compared to "rats" and "arts", which share the same set of characters.
  • Comparing "rats" and "arts": There is one swap difference (r-a and a-r), so they are similar.
  • No need to compare "rats" and "star" since "rats" has already been shown to be part of a group with "tars" and "arts", and "star" shares the same set of characters. They are all similar by association.
  • No need to compare "arts" and "star" as they are already part of the same similarity group.

Step 4: Union Operation

When we find pairs that are similar, we perform the union by setting the parent of one to the root parent of the other:

  • For "tars" and "rats" we do p[find(0)] = find(1), which makes them part of the same group.
  • For "tars" and "arts" we do p[find(0)] = find(2), but since "tars" and "rats" are already connected, this groups "arts" with them.
  • For "rats" and "arts" we do p[find(1)] = find(2), but all three are already connected, so no changes are needed.

After these operations, p might look like [1, 1, 1, 3] indicating that "tars", "rats", and "arts" have a common root which is the index of "rats".

Step 5: Counting Groups

To count the number of groups, we count how many strings are their own parent:

return sum(i == find(i) for i in range(4))

Since "star" hasn't been merged with any other string, it remains its own parent. Therefore, the count of groups is 2, representing the two sets, one with "tars", "rats", and "arts", and the separate one with "star".

In summary, using the provided strings as inputs for the implementation of the Union-Find based algorithm, we could identify and count the separate similarity groups within the list by analyzing the string pairs and applying the union operations accordingly.

Solution Implementation

1class Solution:
2    def numSimilarGroups(self, strs: List[str]) -> int:
3        def find_leader(member):
4            # Recursively find the leader for the current member
5            # and perform path compression along the way.
6            if parent[member] != member:
7                parent[member] = find_leader(parent[member])
8            return parent[member]
9
10        # Get the total number of strings and the length of each string.
11        num_strings = len(strs)
12        length_of_strings = len(strs[0])
13
14        # Initialize the parent array for union-find structure.
15        parent = list(range(num_strings))
16
17        # Compare every pair of strings in the array.
18        for i in range(num_strings):
19            for j in range(i + 1, num_strings):
20                # Count the differences between two strings and
21                # connect them if they have at most 2 differences.
22                if sum(strs[i][k] != strs[j][k] for k in range(length_of_strings)) <= 2:
23                    # Union of the sets that contain i and j
24                    # by assigning the leader of i’s set to the leader of j's set.
25                    parent[find_leader(i)] = find_leader(j)
26
27        # Count the number of unique sets by checking the number of nodes
28        # that are their own leader.
29        return sum(i == find_leader(i) for i in range(num_strings))
30
1class Solution {
2    private int[] parent; // Array representing the parent of each element in the disjoint-set
3
4    // Method to find the number of similarity groups in the input strings
5    public int numSimilarGroups(String[] strs) {
6        int n = strs.length;      // Number of strings in the array
7        parent = new int[n];      // Initialize the parent array for disjoint-set
8      
9        // Initialize each string to be its own parent (self loop)
10        for (int i = 0; i < n; ++i) {
11            parent[i] = i;
12        }
13      
14        // Compare each string with every other string to check for similarity
15        for (int i = 0; i < n; ++i) {
16            for (int j = i + 1; j < n; ++j) {
17                if (isSimilar(strs[i], strs[j])) {
18                    // If strings are similar, union their sets
19                    parent[findParent(i)] = findParent(j);
20                }
21            }
22        }
23      
24        int count = 0; // Counter for the number of distinct similarity groups
25        // Count the number of root elements which represents distinct groups
26        for (int i = 0; i < n; ++i) {
27            if (i == findParent(i)) {
28                count++;
29            }
30        }
31        return count;
32    }
33
34    // Helper method to find the root parent of x using path compression
35    private int findParent(int x) {
36        if (parent[x] != x) {
37            parent[x] = findParent(parent[x]);
38        }
39        return parent[x];
40    }
41
42    // Method to check if two strings are similar
43    // Two strings are similar if they are the same, or their difference is only by two characters
44    private boolean isSimilar(String a, String b) {
45        int differences = 0; // Counter for the number of differences between the strings
46        // Iterate through the characters of the strings and count differences
47        for (int i = 0; i < a.length(); ++i) {
48            if (a.charAt(i) != b.charAt(i)) {
49                differences++;
50            }
51        }
52        // Strings are similar if they have 2 or less differences
53        return differences <= 2;
54    }
55}
56
1#include <vector>
2#include <string>
3
4using std::vector;
5using std::string;
6
7class Solution {
8public:
9    // Parents array for the union-find structure
10    vector<int> parents;
11
12    // Method to find the number of similar groups in the given vector of strings
13    int numSimilarGroups(vector<string>& strs) {
14        int numberOfStrings = strs.size();
15        parents.resize(numberOfStrings);
16
17        // Initialize each string to be its own parent
18        for (int i = 0; i < numberOfStrings; ++i) parents[i] = i;
19
20        // Compare each pair of strings and union them if similar
21        for (int i = 0; i < numberOfStrings; ++i) {
22            for (int j = i + 1; j < numberOfStrings; ++j) {
23                if (isSimilar(strs[i], strs[j])) {
24                    int parentI = findParent(i);
25                    int parentJ = findParent(j);
26                    if (parentI != parentJ) {
27                        parents[parentI] = parentJ;
28                    }
29                }
30            }
31        }
32
33        // Count the number of distinct groups by checking how many strings are their own parent
34        int numGroups = 0;
35        for (int i = 0; i < numberOfStrings; ++i)
36            if (i == findParent(i))
37                ++numGroups;
38
39        return numGroups;
40    }
41
42    // Helper method to check if two strings are similar
43    // Two strings are similar if they can be made equal by swapping at most two characters.
44    bool isSimilar(string& a, string& b) {
45        int numDifferences = 0;
46
47        for (int i = 0; i < a.size(); ++i) {
48            if (a[i] != b[i]) {
49                ++numDifferences;
50                // If there are more than two differences, the strings cannot be similar
51                if (numDifferences > 2) return false;
52            }
53        }
54
55        return true;
56    }
57
58    // Recursive method to find the root parent of string at index x
59    int findParent(int x) {
60        // Path compression: directly connect the current node to the root
61        if (parents[x] != x) parents[x] = findParent(parents[x]);
62        return parents[x];
63    }
64};
65
1// Typescript does not use includes or the same namespace syntax as C++
2
3// Declare a global parents array for the union-find structure
4let parents: number[];
5
6// Function to find the number of similar groups in the given array of strings
7function numSimilarGroups(strs: string[]): number {
8    const numberOfStrings: number = strs.length;
9    parents = new Array(numberOfStrings);
10
11    // Initialize each string to be its own parent
12    for (let i = 0; i < numberOfStrings; ++i) parents[i] = i;
13
14    // Compare each pair of strings and union them if similar
15    for (let i = 0; i < numberOfStrings; ++i) {
16        for (let j = i + 1; j < numberOfStrings; ++j) {
17            if (isSimilar(strs[i], strs[j])) {
18                let parentI: number = findParent(i);
19                let parentJ: number = findParent(j);
20                if (parentI !== parentJ) {
21                    parents[parentI] = parentJ;
22                }
23            }
24        }
25    }
26
27    // Count the number of distinct groups by checking how many strings are their own parent
28    let numGroups: number = 0;
29    for (let i = 0; i < numberOfStrings; ++i)
30        if (i === findParent(i))
31            ++numGroups;
32
33    return numGroups;
34}
35
36// Helper function to check if two strings are similar
37// Two strings are similar if they can be made equal by swapping at most two characters.
38function isSimilar(a: string, b: string): boolean {
39    let numDifferences: number = 0;
40
41    for (let i = 0; i < a.length; ++i) {
42        if (a[i] !== b[i]) {
43            ++numDifferences;
44            // If there are more than two differences, the strings cannot be similar
45            if (numDifferences > 2) return false;
46        }
47    }
48
49    return true;
50}
51
52// Recursive function to find the root parent of string at index x
53function findParent(x: number): number {
54    // Path compression: directly connect the current node to the root
55    if (parents[x] !== x) parents[x] = findParent(parents[x]);
56    return parents[x];
57}
58

Time and Space Complexity

Time Complexity

The given code performs a comparison between each pair of strings within the strs list to determine if they belong to the same group. For each pair of strings, it checks if they differ by at most 2 characters, executing an operation that can be considered as O(L) where L is the length of the string.

There are N strings in the list, and we are comparing each pair of strings. In the worst-case scenario, it requires O(N^2) comparisons. Combining this with the comparison cost for each string gives us a complexity of O(N^2 * L).

However, we have to also consider the union-find algorithm used in the code. In practice, the amortized time complexity of union-find operations (find and union) is O(alpha(N)), where alpha is the Inverse Ackermann function, which grows very slowly and is less than 5 for any practical value of N.

Multiplying the O(N^2) for comparisons by O(alpha(N)) for each union-find operation, we get the total time complexity to be O(N^2 * L * alpha(N)).

Space Complexity

The space complexity is O(N) because the code only uses a parent array p that stores the representative for each string in the group. No additional significant space is required outside the input and the parent array itself.

Learn more about how to find time and space complexity quickly using problem constraints.


Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:
Question 1 out of 10

You are given an array of intervals where intervals[i] = [start_i, end_i] represent the start and end of the ith interval. You need to merge all overlapping intervals and return an array of the non-overlapping intervals that cover all the intervals in the input.


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!