Leetcode 839. Similar String Groups

Problem Explanation

Given a list of strings, we have to identify the number of groups based on their similarity. Two strings are deemed similar if we can swap two letters at different positions of one string to make it equal to the other string.

For example, if we have the strings "tars", "rats", "arts", and "star", we can swap the letters at positions 0 and 2 in "tars" to get "rats". Therefore, these two strings are similar. We can also swap the letters at positions 0 and 2 in "rats" to get "arts", so these strings are also similar.

Here, it's important to note that we can create groups of similar strings even though some strings within the group are not directly similar to each other. For instance, "tars" and "arts" cannot be directly converted to each other using a single swap, but since they are similar to "rats", they belong to the same group. Hence, the strings "tars", "rats", and "arts" form one group. The string "star" cannot be converted to any other string using a single swap and thus forms its own group.

In this problem, we have to return the number of such groups formed from the given list of strings.

Let's take the example where input is ["tars","rats","arts","star"]. We can swap the letters at positions 0 and 2 in "tars" to get "rats". Therefore, these two strings are similar. We can also swap the letters at positions 0 and 2 in "rats" to get "arts", so these strings are also similar. Hence, "tars", "rats", and "arts" form one group. The string "star" cannot be converted to any other string using a single swap and thus forms its own group. Therefore, the total number of groups is 2.

Solution Approach

The problem can be solved by using depth-first search (DFS). Here, we consider each string as a node of a graph. Two nodes are connected if their corresponding strings are similar. Thus, each group of similar strings represents a connected component in the graph. The task is to find the number of these connected components.

We initialise a boolean array seen to keep track of the nodes (strings) that have already been visited. For each unvisited node, we perform DFS and increment the count ans.

In the DFS, we visit all the unvisited nodes that are similar to the current node. This is done by comparing the current node with all other nodes using the isSimilar function. This function returns true if the two given strings are similar (can be converted to each other using a single swap), and false otherwise.

While comparing two strings in isSimilar, we count the number of positions where the strings have different characters. If this count is more than 2, we return false, else true.

The solution iterates over the given list of strings and performs DFS on those strings that have not been visited yet. It keeps track of the visited strings in the seen array. Once the DFS is completed on an unvisited string, it increments the ans count. After iterating through all the strings, the value of ans reflects the number of groups of similar strings.

Python Solution

1
2python
3class Solution:
4    def numSimilarGroups(self, A):
5        seen = [0]*len(A)
6        ans = 0
7        for i in range(len(A)):
8            if not seen[i]:
9                self.dfs(A, i, seen)
10                ans += 1
11        return ans
12
13    def dfs(self, A, i, seen):
14        seen[i] = 1
15        for j in range(len(A)):
16            if not seen[j] and self.isSimilar(A[i], A[j]):
17                self.dfs(A, j, seen)
18
19    def isSimilar(self, X, Y):
20        diff = 0
21        for i in range(len(X)):
22            if X[i] != Y[i]:
23                diff += 1
24            if diff > 2:
25                return False
26        return True

Java Solution

1
2java
3class Solution {
4    private boolean[] seen;
5
6    public int numSimilarGroups(String[] A) {
7        seen = new boolean[A.length];
8        int ans = 0;
9        for (int i = 0; i < A.length; i++) {
10            if (!seen[i]) {
11                dfs(A, i);
12                ans++;
13            }
14        }
15        return ans;
16    }
17
18    private void dfs(String[] A, int i) {
19        seen[i] = true;
20        for (int j = 0; j < A.length; j++) {
21            if (!seen[j] && isSimilar(A[i], A[j])) {
22                dfs(A, j);
23            }
24        }
25    }
26
27    private boolean isSimilar(String X, String Y) {
28        int diff = 0;
29        for (int i = 0; i < X.length(); i++) {
30            if (X.charAt(i) != Y.charAt(i)) {
31                diff++;
32            }
33            if (diff > 2) {
34                return false;
35            }
36        }
37        return true;
38    }
39}

JavaScript Solution

1
2javascript
3class Solution {
4    constructor() {
5        this.seen = [];
6    }
7    
8    numSimilarGroups(A) {
9        this.seen = new Array(A.length).fill(false);
10        let ans = 0;
11
12        for (let i = 0; i < A.length; i++) {
13            if (!this.seen[i]) {
14                this.dfs(A, i);
15                ans++;
16            }
17        }
18        return ans;
19    }
20
21    dfs(A, i) {
22        this.seen[i] = true;
23        
24        for (let j = 0; j < A.length; j++) {
25            if (!this.seen[j] && this.isSimilar(A[i], A[j])) {
26                this.dfs(A, j);
27            }
28        }
29    }
30
31    isSimilar(X, Y) {
32        let diff = 0;
33
34        for (let i = 0; i < X.length; i++) {
35            if (X[i] !== Y[i]) diff++;
36            if (diff > 2) return false;
37        }
38        return true;
39    }
40}

C++ Solution

1
2C++
3class Solution {
4public:
5    int numSimilarGroups(vector<string>& A) {
6        vector<bool> seen(A.size());
7        int ans = 0;
8        for (int i = 0; i < A.size(); ++i) {
9            if (!seen[i]) {
10                dfs(A, i, seen);
11                ++ans;
12            }
13        }
14        return ans;
15    }
16
17private:
18    void dfs(const vector<string>& A, int i, vector<bool>& seen) {
19        seen[i] = true;
20        for (int j = 0; j < A.size(); ++j)
21            if (!seen[j] && isSimilar(A[i], A[j]))
22               dfs(A, j, seen);
23    }
24
25    bool isSimilar(const string& X, const string& Y) {
26        int diff = 0;
27        for (int i = 0; i < X.length(); ++i) {
28            if (X[i] != Y[i] && ++diff > 2)
29               return false;
30        }
31        return true;
32    }
33};

C# Solution

1
2C#
3public class Solution {
4    private bool[] seen;
5    
6    public int NumSimilarGroups(string[] A) {
7        seen = new bool[A.Length];
8        int ans = 0;
9        for (int i = 0; i < A.Length; i++) {
10            if (!seen[i]) {
11                DFS(A, i);
12                ans++;
13            }
14        }
15        return ans;
16    }
17
18    private void DFS(string[] A, int i) {
19        seen[i] = true;
20        for (int j = 0; j < A.Length; j++) {
21            if (!seen[j] && isSimilar(A[i], A[j])) {
22                DFS(A, j);
23            }
24        }
25    }
26
27    private bool isSimilar(string X, string Y) {
28        int diff = 0;
29        for (int i = 0; i < X.Length; i++) {
30            if (X[i] != Y[i]) {
31                diff++;
32            }
33            if (diff > 2) {
34                return false;
35            }
36        }
37        return true;
38    }
39}

The time complexity for this solution is O(n^2 * m), where n is the number of strings and m is the length of each string. The space complexity is O(n), for the seen array and the recursive call stack.The problem of identifying similar string groups is common in data analysis where we need to categorize or group similar data. This problem also finds its applications in coding competitions and interviews where algorithmic skills are tested. Solutions for such problems often involve string manipulations and graph traversals, which are fundamental topics in many computer science and programming courses.

Nonetheless, while efficient for certain inputs, the solution presented can be slow if the list of strings (n) or the length of the strings (m) is very large. This is due to the fact that in the worst case, the algorithm needs to compare every pair of strings, resulting in a time complexity of O(n^2 * m).

To handle large inputs more efficiently, we could employ a few strategies such as:

  1. Sorting the Characters: Instead of comparing all pairs of strings, we could simply sort the characters within each string and then check if the sorted strings are the same to decide if the original strings are similar. This way we can then use a hashmap or a similar data structure to group all the strings that have the same sorted string together. This strategy would reduce the time complexity to O(n * m log m), where n is the number of strings, and m is the maximum length of a string.

  2. Disjoint Set Union (DSU): DSU is a data structure that allows to quickly merge several disjoint sets of objects into a single set, as well as check whether two objects belong to the same set. Using DSU, we can represent each string as a node and add an edge between two nodes if the corresponding strings are similar. Then the number of sets in the DSU will be equal to the number of similar groups. The time complexity using such method is much lower due to the operations of the DSU being very fast.

Considering these strategies, one can choose the appropriate solution depending on the size of the data and the specific uncertainties found in the problem.


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫