2157. Groups of Strings
Problem Description
In this problem, you are given an array of strings, words
, with each string containing unique lowercase English letters. There are two unique concepts defined:
-
Connectivity between strings: Two strings are considered "connected" if you can perform exactly one of the following operations to one string to get the other:
- Add a single letter.
- Remove a single letter.
- Replace a single letter with another letter (could be the same letter).
-
Grouping: The strings can be divided into "groups" with the following rules:
- A string can join a group if it is connected to at least one other string in the group.
- A string can also form a group by itself if it's not connected to any other.
The task is to find a unique division of words
into groups such that:
- No string belonging to one group is connected to a string in another group.
- The division results in the maximum number of groups possible.
You're to return an array ans
of size 2
, where:
ans[0]
is the maximum number of groups that thewords
can be divided into.ans[1]
is the size of the largest group.
The solution requires ensuring that when a string is grouped, it isn't connected to any other string outside of its group, which guarantees the groups are non-intersecting and uniquely determined.
Intuition
The intuition behind the solution involves bit manipulation and disjoint set (also known as union-find) data structure to keep track of groups efficiently.
- Since each letter can occur only once, and there are 26 lowercase letters, each word can be uniquely represented by a 26-bit number, marking which letters are present in the word.
- Utilize the union-find data structure to merge sets (or groups) when strings are "connected."
- Track the size of each set to determine the maximum group size.
The approach:
- Convert each word into a 26-bit number and initialize its parent to itself and initialize the size of each set.
- Perform union-find operations to merge sets:
- Consider all possible single letter changes for each word's bit representation:
- For adding or deleting a letter, flip the corresponding bit.
- For replacing a letter, flip the two corresponding bits.
- Perform union operation if the altered bit representation corresponds to another word.
- Consider all possible single letter changes for each word's bit representation:
- Count the number of unique parents as the maximum number of groups.
- Keep track of the size of the largest group during the process.
By the end of this process, the disjoint sets have been formed, representing the unique groups, and the largest group size has been captured.
Learn more about Union Find patterns.
Solution Approach
The given solution uses bit manipulation and union-find data structures to achieve the task.
Bit Manipulation: Since each word consists of lowercase letters without repetition, each word can be represented as a 26-bit integer. Each bit represents whether a particular letter (from 'a' to 'z') exists in the word or not. This bit representation is obtained by iterating through each character of the word, calculating the difference of the ASCII value of that character from 'a', and setting the corresponding bit.
Union-Find:
Union-find, also known as disjoint set, is a data structure that keeps track of elements which are split into one or more disjoint sets. Its main operations are find
- to find the representative of a set that an element belongs to, and union
- to merge two sets if they are not already merged.
Here's an explanation of the key parts of the implemented solution:
-
Initialize the necessary data structures:
- Dictionary
p
to track the parent of each 26-bit number. - Counter
size
to track the size of each group. - Int variables
mx
to keep the size of the largest group, andn
, which is the total number of different groups.
- Dictionary
-
Process each word in
words
:- Convert the word to its bit representation and store its initial parent (itself) and increment its count in the size counter.
- Track the maximum size of groups with
mx
.
-
Explore all possible connections via bit manipulation:
- Loop through each 26-bit number and simulate adding, removing, or replacing a letter:
- Toggle a bit to simulate adding or deleting a letter.
- Toggle a pair of bits to simulate replacing a letter.
- If the resulting bit pattern is in the parent dictionary, attempt to
union
the original and altered patterns.
- Loop through each 26-bit number and simulate adding, removing, or replacing a letter:
-
The
find
andunion
functions work together to manage the disjoint sets:find
recursively locates the representative of the set for a given bit pattern, compressing paths along the way.union
joins two sets if they are not already joined, updating the parent mapping, and merges the sizes while updating themx
andn
variables.
The solution dynamically builds connections between words, merging groups when a connection is found. By the end of the iteration, we have the maximum number of groups as n
and the size of the largest group as mx
, which is returned as the final result [n, mx]
.
By using a bitwise representation of the words and the efficient union-find data structure, the solution scales well with the input size, handling the connectivity logic with minimal overhead.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's demonstrate how the solution approach works with a small example.
Suppose we have the following array of words:
words = ["cat", "bat", "rat", "drat", "dart", "carts"]
-
Convert each word to a 26-bit number (we'll use pseudocode for the bit manipulation):
- "cat" -> 0110010... (bits representing 'c', 'a', 't')
- "bat" -> 0110001... (bits representing 'b', 'a', 't')
- "rat" -> 0100001... (bits representing 'r', 'a', 't')
- "drat" -> 1100001... (bits for 'd', 'r', 'a', 't')
- "dart" -> 1110000... (bits for 'd', 'a', 'r', 't')
- "carts" -> 0110110... (bits for 'c', 'a', 'r', 't', 's')
-
Initialize the parent and size for each unique 26-bit number:
p["cat"] = "cat", size["cat"] = 1 p["bat"] = "bat", size["bat"] = 1 p["rat"] = "rat", size["rat"] = 1 p["drat"] = "drat", size["drat"] = 1 p["dart"] = "dart", size["dart"] = 1 p["carts"] = "carts", size["carts"] = 1
-
Explore all possible connections via bit manipulation:
- For "cat" we toggle bits corresponding to adding, removing, or replacing a letter:
- When we flip the 'c' bit, we get "at" which isn't in the array so we ignore it.
- When we flip the 'a' bit, we get "ct" which isn't in the array so we ignore it.
- When we flip the 't' bit, we get "ca" which isn't in the array so we ignore it.
- When we flip the 'c' and 'b', we get a match with "bat", so we union "cat" and "bat". Now they are in the same set.
- We would perform similar operations for "bat," "rat," "drat," "dart," and "carts."
- For "cat" we toggle bits corresponding to adding, removing, or replacing a letter:
-
After doing the union operations for each word, we would find that:
- "cat" and "bat" are connected; they form one group.
- "rat," "drat," and "dart" are connected; they form another group.
- "carts" stands alone as it can't connect with a single letter change.
Which would result in:
p["cat"] = "cat", size["cat"] = 2 (since "bat" is also connected to "cat") p["rat"] = "rat", size["rat"] = 3 (connected to both "drat" and "dart") p["carts"] = "carts", size["carts"] = 1
The number of unique parents is 3, so we have 3 unique groups. The maximum size of a group is 3 (the group with "rat", "drat", and "dart").
So the final output is:
ans = [3, 3]
This output indicates that there are 3 maximum groups and the largest group contains 3 words.
Solution Implementation
1from collections import Counter
2
3class Solution:
4 def groupStrings(self, words: List[str]) -> List[int]:
5 # Helper function to find the root of a set
6 def find(word_bitmask):
7 if parent[word_bitmask] != word_bitmask:
8 parent[word_bitmask] = find(parent[word_bitmask])
9 return parent[word_bitmask]
10
11 # Helper function to union two sets
12 def union(bitmask_a, bitmask_b):
13 nonlocal max_group_size, distinct_groups
14 if bitmask_b not in parent:
15 return
16 parent_a, parent_b = find(bitmask_a), find(bitmask_b)
17 if parent_a == parent_b:
18 return
19 parent[parent_a] = parent_b
20 size[parent_b] += size[parent_a]
21 max_group_size = max(max_group_size, size[parent_b])
22 distinct_groups -= 1
23
24 # Initialization of the disjoint-set union structure
25 parent = {}
26 size = Counter()
27 distinct_groups = len(words)
28 max_group_size = 0
29
30 # First pass: Populate disjoint set and calculate initial group sizes
31 for word in words:
32 bitmask = 0
33 for char in word:
34 bitmask |= 1 << (ord(char) - ord('a'))
35 parent[bitmask] = bitmask
36 size[bitmask] += 1
37 max_group_size = max(max_group_size, size[bitmask])
38 if size[bitmask] > 1:
39 distinct_groups -= 1
40
41 # Second pass: Perform union operations on sets based on shared or differing characters
42 for bitmask in parent.keys():
43 for i in range(26):
44 union(bitmask, bitmask ^ (1 << i))
45 if (bitmask >> i) & 1:
46 for j in range(26):
47 if ((bitmask >> j) & 1) == 0:
48 union(bitmask, bitmask ^ (1 << i) | (1 << j))
49
50 # Return the number of distinct groups and the size of the largest group
51 return [distinct_groups, max_group_size]
52
1class Solution {
2 private Map<Integer, Integer> parent;
3 private Map<Integer, Integer> groupSize;
4 private int maxGroupSize;
5 private int groupCount;
6
7 public int[] groupStrings(String[] words) {
8 parent = new HashMap<>();
9 groupSize = new HashMap<>();
10 groupCount = words.length;
11 maxGroupSize = 0; // Initialize max group size as zero
12
13 // Convert each word into a bitmask and initialize disjoint sets
14 for (String word : words) {
15 int bitmask = 0;
16 for (char c : word.toCharArray()) {
17 bitmask |= 1 << (c - 'a');
18 }
19 // Initialize or update the size of the set that the bitmask belongs to
20 parent.put(bitmask, bitmask);
21 groupSize.put(bitmask, groupSize.getOrDefault(bitmask, 0) + 1);
22 // Update max group size if necessary
23 maxGroupSize = Math.max(maxGroupSize, groupSize.get(bitmask));
24 // Decrement group count for duplicate bitmasks
25 if (groupSize.get(bitmask) > 1) {
26 groupCount--;
27 }
28 }
29
30 // Attempt to union sets based on the shifting of bits
31 for (int x : parent.keySet()) {
32 for (int i = 0; i < 26; ++i) {
33 union(x, x ^ (1 << i)); // Shift bit for each alphabet position
34 if ((x >> i & 1) != 0) { // If the bit at position i is set to 1
35 for (int j = 0; j < 26; ++j) { // Try shifting the bit from position i to position j
36 if ((x >> j & 1) == 0) {
37 union(x, x ^ (1 << i) | (1 << j));
38 }
39 }
40 }
41 }
42 }
43
44 // Return the number of distinct groups and the size of the largest group
45 return new int[] {groupCount, maxGroupSize};
46 }
47
48 // Find the root of the set that the element x belongs to
49 private int find(int x) {
50 if (parent.get(x) != x) {
51 parent.put(x, find(parent.get(x))); // Path compression
52 }
53 return parent.get(x);
54 }
55
56 // Union two sets that a and b belong to, if they are different sets
57 private void union(int a, int b) {
58 // Ignore if set b is not present
59 if (!parent.containsKey(b)) {
60 return;
61 }
62 int rootA = find(a), rootB = find(b);
63 if (rootA == rootB) {
64 return;
65 }
66 // Merge the two sets and update their size
67 parent.put(rootA, rootB);
68 groupSize.put(rootB, groupSize.get(rootB) + groupSize.get(rootA));
69 // Update the max group size after merging
70 maxGroupSize = Math.max(maxGroupSize, groupSize.get(rootB));
71 // Decrement the group count since two groups have been merged
72 groupCount--;
73 }
74}
75
1#include <vector>
2#include <string>
3#include <unordered_map>
4#include <algorithm>
5
6class Solution {
7public:
8 int maxGroupSize, groupCount; // Variables to maintain max group size and total group count
9
10 // Returns a vector containing the number of different groups and the size of the largest group
11 vector<int> groupStrings(vector<string>& words) {
12 unordered_map<int, int> parent; // Parent mapping for disjoint sets
13 unordered_map<int, int> groupSize; // Size of each group
14 maxGroupSize = 0; // Initialize max group size
15 groupCount = words.size(); // Initial group count is the total number of words
16
17 // Initialize the disjoint set and group sizes
18 for (auto& word : words) {
19 int mask = 0;
20 // Create bit mask for each character present in the word
21 for (auto& c : word) mask |= 1 << (c - 'a');
22 parent[mask] = mask; // The parent of each set is itself initially
23 ++groupSize[mask]; // Increment the group size for the bit mask
24 maxGroupSize = max(maxGroupSize, groupSize[mask]); // Update max group size
25 if (groupSize[mask] > 1) --groupCount; // Decrement the group count if we have a duplicate
26 }
27
28 // Union sets by bit manipulation, trying all possible one-bit changes
29 for (auto& [mask, _] : parent) {
30 for (int i = 0; i < 26; ++i) {
31 // Try uniting the current set with sets that have one bit different
32 uniteGroups(mask, mask ^ (1 << i), parent, groupSize);
33 // If current bit is set, try uniting with sets that have this bit unset
34 if ((mask >> i) & 1) {
35 for (int j = 0; j < 26; ++j) {
36 if (((mask >> j) & 1) == 0)
37 uniteGroups(mask, mask ^ (1 << i) | (1 << j), parent, groupSize);
38 }
39 }
40 }
41 }
42 return {groupCount, maxGroupSize};
43 }
44
45private:
46 // Find the root of the set that 'x' belongs to
47 int findSetRoot(int x, unordered_map<int, int>& parent) {
48 if (parent[x] != x) parent[x] = findSetRoot(parent[x], parent);
49 return parent[x];
50 }
51
52 // Unite two disjoint sets by their representatives
53 void uniteGroups(int a, int b, unordered_map<int, int>& parent, unordered_map<int, int>& groupSize) {
54 if (!parent.count(b)) return; // Only unite if 'b' is an existing parent
55 int rootA = findSetRoot(a, parent), rootB = findSetRoot(b, parent);
56 if (rootA == rootB) return; // They are already in the same set
57
58 // Union by size, the larger group absorbs the smaller
59 parent[rootA] = rootB;
60 groupSize[rootB] += groupSize[rootA];
61
62 // Update max group size after union
63 maxGroupSize = max(maxGroupSize, groupSize[rootB]);
64
65 --groupCount; // Decrement the group count as two groups have merged
66 }
67};
68
1// TypeScript's way to implement the same functionality
2// We start by defining required types and global variables
3
4// Represents the parent mapping for disjoint sets
5const parent: { [key: number]: number } = {};
6
7// Size of each group mapped by bit mask representative
8const groupSize: { [key: number]: number } = {};
9
10// Variable to maintain max group size
11let maxGroupSize: number = 0;
12
13// Variable to maintain total group count
14let groupCount: number = 0;
15
16// Returns an array containing the number of different groups and the size of the largest group
17function groupStrings(words: string[]): number[] {
18 maxGroupSize = 0; // Initialize max group size
19 groupCount = words.length; // Initial group count is the total number of words
20
21 // Initialize the disjoint set and group sizes
22 words.forEach(word => {
23 let mask = 0;
24 // Create a bitmask for each character present in the word
25 [...word].forEach(c => mask |= 1 << (c.charCodeAt(0) - 'a'.charCodeAt(0)));
26
27 if (!parent[mask]) {
28 // The parent of each set is itself initially
29 parent[mask] = mask;
30 // The group size for the bitmask starts at 1
31 groupSize[mask] = 1;
32 } else {
33 // Increment the group size for the existing bitmask
34 ++groupSize[mask];
35 // If group size is incremented, decrement group count if it was a unique group before
36 if (groupSize[mask] === 2) --groupCount;
37 }
38 // Update max group size
39 maxGroupSize = Math.max(maxGroupSize, groupSize[mask]);
40 });
41
42 // Union sets by bit manipulation, trying all possible one-bit changes
43 Object.keys(parent).forEach(maskStr => {
44 const mask = Number(maskStr);
45 for (let i = 0; i < 26; ++i) {
46 // Try uniting the current set with sets that have one bit different
47 uniteGroups(mask, mask ^ (1 << i));
48 // If current bit is set, try uniting with sets that have this bit unset
49 if ((mask >> i) & 1) {
50 for (let j = 0; j < 26; ++j) {
51 if (((mask >> j) & 1) === 0) {
52 uniteGroups(mask, mask ^ (1 << i) | (1 << j));
53 }
54 }
55 }
56 }
57 });
58
59 return [groupCount, maxGroupSize];
60}
61
62// Find the root of the set that 'x' belongs to
63function findSetRoot(x: number): number {
64 if (parent[x] !== x) parent[x] = findSetRoot(parent[x]);
65 return parent[x];
66}
67
68// Unite two disjoint sets by their representatives
69function uniteGroups(a: number, b: number): void {
70 if (parent[b] === undefined) return; // Only unite if 'b' is an existing parent
71 const rootA = findSetRoot(a), rootB = findSetRoot(b);
72 if (rootA === rootB) return; // They are already in the same set
73
74 // Union by size, the larger group absorbs the smaller one
75 parent[rootA] = rootB;
76 groupSize[rootB] += groupSize[rootA];
77
78 // Update max group size after union
79 maxGroupSize = Math.max(maxGroupSize, groupSize[rootB]);
80
81 // Decrement the group count as two groups have merged
82 --groupCount;
83}
84
Time and Space Complexity
Time Complexity
The time complexity of the groupStrings
function can be broken down based on the operations it performs:
-
Initialization of data structures:
- Initializing the parent dictionary
p
and the size countersize
can be considered O(1) since they are empty at the start. - Populating the size counter according to the input words is O(N * L), where N is the number of words and L is the average length of a word because each character is processed individually.
- Initializing the parent dictionary
-
Parent-Finding and Union Operations:
- The
find
function performs the path compression technique, which, after initial setup, has a nearly constant time complexity, amortized O(α(N)), where α(N) is the inverse Ackermann function, which is a very slow-growing function. - The two nested
for
loops, with 26 iterations each, result in a potential O(26^2) operations per word. This predominantly dominates the time complexity within the loop. - The
union
operations inside the loops are performed only if certain bitwise conditions are met; however, in the worst case, those conditions will always be true, resulting in 26^2 union operations for each of the N words.
- The
Therefore, the total worst-case time complexity is O(N * (L + 26^2 * α(N))), considering that α(N) is a very slow-growing function and can be considered nearly a constant, we can simplify this to O(N * (L + 26^2)).
Space Complexity
The space complexity can be analyzed by considering the memory used by data structures and the recursion stack for the find
function:
-
Data Structures:
- The
p
andsize
data structures will at most have a distinct entry for each unique bit representation of words, which is at most O(N) space complexity since there can be at most N unique bit masks. - The
Counter
object forsize
also has at most O(N) space complexity.
- The
-
Recursion Stack:
- The
find
function could have a recursion depth equal to the number of elements inp
, which is at most O(N) in the worst case. However, due to path compression, the average recursion depth will be O(α(N)), which is nearly constant.
- The
Therefore, the total space complexity is O(N), as the recursion stack does not significantly contribute to the space complexity due to the path compression resulting in an almost flat tree.
Learn more about how to find time and space complexity quickly using problem constraints.
Which algorithm should you use to find a node that is close to the root of the tree?
Recommended Readings
Union Find Disjoint Set Union Data Structure Introduction Prerequisite Depth First Search Review problems dfs_intro Once we have a strong grasp of recursion and Depth First Search we can now introduce Disjoint Set Union DSU This data structure is motivated by the following problem Suppose we have sets of elements
LeetCode Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Want a Structured Path to Master System Design Too? Don’t Miss This!