2157. Groups of Strings

HardBit ManipulationUnion FindString
Leetcode Link

Problem Description

In this problem, you are given an array of strings, words, with each string containing unique lowercase English letters. There are two unique concepts defined:

  1. Connectivity between strings: Two strings are considered "connected" if you can perform exactly one of the following operations to one string to get the other:

    • Add a single letter.
    • Remove a single letter.
    • Replace a single letter with another letter (could be the same letter).
  2. Grouping: The strings can be divided into "groups" with the following rules:

    • A string can join a group if it is connected to at least one other string in the group.
    • A string can also form a group by itself if it's not connected to any other.

The task is to find a unique division of words into groups such that:

  • No string belonging to one group is connected to a string in another group.
  • The division results in the maximum number of groups possible.

You're to return an array ans of size 2, where:

  • ans[0] is the maximum number of groups that the words can be divided into.
  • ans[1] is the size of the largest group.

The solution requires ensuring that when a string is grouped, it isn't connected to any other string outside of its group, which guarantees the groups are non-intersecting and uniquely determined.

Intuition

The intuition behind the solution involves bit manipulation and disjoint set (also known as union-find) data structure to keep track of groups efficiently.

  • Since each letter can occur only once, and there are 26 lowercase letters, each word can be uniquely represented by a 26-bit number, marking which letters are present in the word.
  • Utilize the union-find data structure to merge sets (or groups) when strings are "connected."
  • Track the size of each set to determine the maximum group size.

The approach:

  1. Convert each word into a 26-bit number and initialize its parent to itself and initialize the size of each set.
  2. Perform union-find operations to merge sets:
    • Consider all possible single letter changes for each word's bit representation:
      • For adding or deleting a letter, flip the corresponding bit.
      • For replacing a letter, flip the two corresponding bits.
      • Perform union operation if the altered bit representation corresponds to another word.
  3. Count the number of unique parents as the maximum number of groups.
  4. Keep track of the size of the largest group during the process.

By the end of this process, the disjoint sets have been formed, representing the unique groups, and the largest group size has been captured.

Learn more about Union Find patterns.

Solution Approach

The given solution uses bit manipulation and union-find data structures to achieve the task.

Bit Manipulation: Since each word consists of lowercase letters without repetition, each word can be represented as a 26-bit integer. Each bit represents whether a particular letter (from 'a' to 'z') exists in the word or not. This bit representation is obtained by iterating through each character of the word, calculating the difference of the ASCII value of that character from 'a', and setting the corresponding bit.

Union-Find: Union-find, also known as disjoint set, is a data structure that keeps track of elements which are split into one or more disjoint sets. Its main operations are find - to find the representative of a set that an element belongs to, and union - to merge two sets if they are not already merged.

Here's an explanation of the key parts of the implemented solution:

  1. Initialize the necessary data structures:

    • Dictionary p to track the parent of each 26-bit number.
    • Counter size to track the size of each group.
    • Int variables mx to keep the size of the largest group, and n, which is the total number of different groups.
  2. Process each word in words:

    • Convert the word to its bit representation and store its initial parent (itself) and increment its count in the size counter.
    • Track the maximum size of groups with mx.
  3. Explore all possible connections via bit manipulation:

    • Loop through each 26-bit number and simulate adding, removing, or replacing a letter:
      • Toggle a bit to simulate adding or deleting a letter.
      • Toggle a pair of bits to simulate replacing a letter.
      • If the resulting bit pattern is in the parent dictionary, attempt to union the original and altered patterns.
  4. The find and union functions work together to manage the disjoint sets:

    • find recursively locates the representative of the set for a given bit pattern, compressing paths along the way.
    • union joins two sets if they are not already joined, updating the parent mapping, and merges the sizes while updating the mx and n variables.

The solution dynamically builds connections between words, merging groups when a connection is found. By the end of the iteration, we have the maximum number of groups as n and the size of the largest group as mx, which is returned as the final result [n, mx].

By using a bitwise representation of the words and the efficient union-find data structure, the solution scales well with the input size, handling the connectivity logic with minimal overhead.

Ready to land your dream job?

Unlock your dream job with a 2-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's demonstrate how the solution approach works with a small example.

Suppose we have the following array of words:

words = ["cat", "bat", "rat", "drat", "dart", "carts"]
  1. Convert each word to a 26-bit number (we'll use pseudocode for the bit manipulation):

    • "cat" -> 0110010... (bits representing 'c', 'a', 't')
    • "bat" -> 0110001... (bits representing 'b', 'a', 't')
    • "rat" -> 0100001... (bits representing 'r', 'a', 't')
    • "drat" -> 1100001... (bits for 'd', 'r', 'a', 't')
    • "dart" -> 1110000... (bits for 'd', 'a', 'r', 't')
    • "carts" -> 0110110... (bits for 'c', 'a', 'r', 't', 's')
  2. Initialize the parent and size for each unique 26-bit number:

p["cat"] = "cat", size["cat"] = 1
p["bat"] = "bat", size["bat"] = 1
p["rat"] = "rat", size["rat"] = 1
p["drat"] = "drat", size["drat"] = 1
p["dart"] = "dart", size["dart"] = 1
p["carts"] = "carts", size["carts"] = 1
  1. Explore all possible connections via bit manipulation:

    • For "cat" we toggle bits corresponding to adding, removing, or replacing a letter:
      • When we flip the 'c' bit, we get "at" which isn't in the array so we ignore it.
      • When we flip the 'a' bit, we get "ct" which isn't in the array so we ignore it.
      • When we flip the 't' bit, we get "ca" which isn't in the array so we ignore it.
      • When we flip the 'c' and 'b', we get a match with "bat", so we union "cat" and "bat". Now they are in the same set.
    • We would perform similar operations for "bat," "rat," "drat," "dart," and "carts."
  2. After doing the union operations for each word, we would find that:

    • "cat" and "bat" are connected; they form one group.
    • "rat," "drat," and "dart" are connected; they form another group.
    • "carts" stands alone as it can't connect with a single letter change.

    Which would result in:

p["cat"] = "cat", size["cat"] = 2 (since "bat" is also connected to "cat")
p["rat"] = "rat", size["rat"] = 3 (connected to both "drat" and "dart")
p["carts"] = "carts", size["carts"] = 1

The number of unique parents is 3, so we have 3 unique groups. The maximum size of a group is 3 (the group with "rat", "drat", and "dart").

So the final output is:

ans = [3, 3]

This output indicates that there are 3 maximum groups and the largest group contains 3 words.

Solution Implementation

1from collections import Counter
2
3class Solution:
4    def groupStrings(self, words: List[str]) -> List[int]:
5        # Helper function to find the root of a set
6        def find(word_bitmask):
7            if parent[word_bitmask] != word_bitmask:
8                parent[word_bitmask] = find(parent[word_bitmask])
9            return parent[word_bitmask]
10      
11        # Helper function to union two sets
12        def union(bitmask_a, bitmask_b):
13            nonlocal max_group_size, distinct_groups
14            if bitmask_b not in parent:
15                return
16            parent_a, parent_b = find(bitmask_a), find(bitmask_b)
17            if parent_a == parent_b:
18                return
19            parent[parent_a] = parent_b
20            size[parent_b] += size[parent_a]
21            max_group_size = max(max_group_size, size[parent_b])
22            distinct_groups -= 1
23
24        # Initialization of the disjoint-set union structure
25        parent = {}
26        size = Counter()
27        distinct_groups = len(words)
28        max_group_size = 0
29      
30        # First pass: Populate disjoint set and calculate initial group sizes
31        for word in words:
32            bitmask = 0
33            for char in word:
34                bitmask |= 1 << (ord(char) - ord('a'))
35            parent[bitmask] = bitmask
36            size[bitmask] += 1
37            max_group_size = max(max_group_size, size[bitmask])
38            if size[bitmask] > 1:
39                distinct_groups -= 1
40      
41        # Second pass: Perform union operations on sets based on shared or differing characters
42        for bitmask in parent.keys():
43            for i in range(26):
44                union(bitmask, bitmask ^ (1 << i))
45                if (bitmask >> i) & 1:
46                    for j in range(26):
47                        if ((bitmask >> j) & 1) == 0:
48                            union(bitmask, bitmask ^ (1 << i) | (1 << j))
49      
50        # Return the number of distinct groups and the size of the largest group
51        return [distinct_groups, max_group_size]
52
1class Solution {
2    private Map<Integer, Integer> parent;
3    private Map<Integer, Integer> groupSize;
4    private int maxGroupSize;
5    private int groupCount;
6
7    public int[] groupStrings(String[] words) {
8        parent = new HashMap<>();
9        groupSize = new HashMap<>();
10        groupCount = words.length;
11        maxGroupSize = 0; // Initialize max group size as zero
12      
13        // Convert each word into a bitmask and initialize disjoint sets
14        for (String word : words) {
15            int bitmask = 0;
16            for (char c : word.toCharArray()) {
17                bitmask |= 1 << (c - 'a');
18            }
19            // Initialize or update the size of the set that the bitmask belongs to
20            parent.put(bitmask, bitmask);
21            groupSize.put(bitmask, groupSize.getOrDefault(bitmask, 0) + 1);
22            // Update max group size if necessary
23            maxGroupSize = Math.max(maxGroupSize, groupSize.get(bitmask));
24            // Decrement group count for duplicate bitmasks
25            if (groupSize.get(bitmask) > 1) {
26                groupCount--;
27            }
28        }
29
30        // Attempt to union sets based on the shifting of bits
31        for (int x : parent.keySet()) {
32            for (int i = 0; i < 26; ++i) {
33                union(x, x ^ (1 << i)); // Shift bit for each alphabet position
34                if ((x >> i & 1) != 0) { // If the bit at position i is set to 1
35                    for (int j = 0; j < 26; ++j) { // Try shifting the bit from position i to position j
36                        if ((x >> j & 1) == 0) {
37                            union(x, x ^ (1 << i) | (1 << j));
38                        }
39                    }
40                }
41            }
42        }
43
44        // Return the number of distinct groups and the size of the largest group
45        return new int[] {groupCount, maxGroupSize};
46    }
47
48    // Find the root of the set that the element x belongs to
49    private int find(int x) {
50        if (parent.get(x) != x) {
51            parent.put(x, find(parent.get(x))); // Path compression
52        }
53        return parent.get(x);
54    }
55
56    // Union two sets that a and b belong to, if they are different sets
57    private void union(int a, int b) {
58        // Ignore if set b is not present
59        if (!parent.containsKey(b)) {
60            return;
61        }
62        int rootA = find(a), rootB = find(b);
63        if (rootA == rootB) {
64            return;
65        }
66        // Merge the two sets and update their size
67        parent.put(rootA, rootB);
68        groupSize.put(rootB, groupSize.get(rootB) + groupSize.get(rootA));
69        // Update the max group size after merging
70        maxGroupSize = Math.max(maxGroupSize, groupSize.get(rootB));
71        // Decrement the group count since two groups have been merged
72        groupCount--;
73    }
74}
75
1#include <vector>
2#include <string>
3#include <unordered_map>
4#include <algorithm>
5
6class Solution {
7public:
8    int maxGroupSize, groupCount; // Variables to maintain max group size and total group count
9
10    // Returns a vector containing the number of different groups and the size of the largest group
11    vector<int> groupStrings(vector<string>& words) {
12        unordered_map<int, int> parent; // Parent mapping for disjoint sets
13        unordered_map<int, int> groupSize; // Size of each group
14        maxGroupSize = 0; // Initialize max group size
15        groupCount = words.size(); // Initial group count is the total number of words
16      
17        // Initialize the disjoint set and group sizes
18        for (auto& word : words) {
19            int mask = 0;
20            // Create bit mask for each character present in the word
21            for (auto& c : word) mask |= 1 << (c - 'a');
22            parent[mask] = mask; // The parent of each set is itself initially
23            ++groupSize[mask]; // Increment the group size for the bit mask
24            maxGroupSize = max(maxGroupSize, groupSize[mask]); // Update max group size
25            if (groupSize[mask] > 1) --groupCount; // Decrement the group count if we have a duplicate
26        }
27      
28        // Union sets by bit manipulation, trying all possible one-bit changes
29        for (auto& [mask, _] : parent) {
30            for (int i = 0; i < 26; ++i) {
31                // Try uniting the current set with sets that have one bit different
32                uniteGroups(mask, mask ^ (1 << i), parent, groupSize);
33                // If current bit is set, try uniting with sets that have this bit unset
34                if ((mask >> i) & 1) {
35                    for (int j = 0; j < 26; ++j) {
36                        if (((mask >> j) & 1) == 0)
37                            uniteGroups(mask, mask ^ (1 << i) | (1 << j), parent, groupSize);
38                    }
39                }
40            }
41        }
42        return {groupCount, maxGroupSize};
43    }
44
45private:
46    // Find the root of the set that 'x' belongs to
47    int findSetRoot(int x, unordered_map<int, int>& parent) {
48        if (parent[x] != x) parent[x] = findSetRoot(parent[x], parent);
49        return parent[x];
50    }
51
52    // Unite two disjoint sets by their representatives
53    void uniteGroups(int a, int b, unordered_map<int, int>& parent, unordered_map<int, int>& groupSize) {
54        if (!parent.count(b)) return; // Only unite if 'b' is an existing parent
55        int rootA = findSetRoot(a, parent), rootB = findSetRoot(b, parent);
56        if (rootA == rootB) return; // They are already in the same set
57
58        // Union by size, the larger group absorbs the smaller
59        parent[rootA] = rootB;
60        groupSize[rootB] += groupSize[rootA]; 
61      
62        // Update max group size after union
63        maxGroupSize = max(maxGroupSize, groupSize[rootB]);
64      
65        --groupCount; // Decrement the group count as two groups have merged
66    }
67};
68
1// TypeScript's way to implement the same functionality
2// We start by defining required types and global variables
3
4// Represents the parent mapping for disjoint sets
5const parent: { [key: number]: number } = {};
6
7// Size of each group mapped by bit mask representative
8const groupSize: { [key: number]: number } = {};
9
10// Variable to maintain max group size
11let maxGroupSize: number = 0;
12
13// Variable to maintain total group count
14let groupCount: number = 0;
15
16// Returns an array containing the number of different groups and the size of the largest group
17function groupStrings(words: string[]): number[] {
18    maxGroupSize = 0; // Initialize max group size
19    groupCount = words.length; // Initial group count is the total number of words
20
21    // Initialize the disjoint set and group sizes
22    words.forEach(word => {
23        let mask = 0;
24        // Create a bitmask for each character present in the word
25        [...word].forEach(c => mask |= 1 << (c.charCodeAt(0) - 'a'.charCodeAt(0)));
26      
27        if (!parent[mask]) {
28            // The parent of each set is itself initially
29            parent[mask] = mask;
30            // The group size for the bitmask starts at 1
31            groupSize[mask] = 1;
32        } else {
33            // Increment the group size for the existing bitmask
34            ++groupSize[mask];
35            // If group size is incremented, decrement group count if it was a unique group before
36            if (groupSize[mask] === 2) --groupCount;
37        }
38        // Update max group size
39        maxGroupSize = Math.max(maxGroupSize, groupSize[mask]);
40    });
41
42    // Union sets by bit manipulation, trying all possible one-bit changes
43    Object.keys(parent).forEach(maskStr => {
44        const mask = Number(maskStr);
45        for (let i = 0; i < 26; ++i) {
46            // Try uniting the current set with sets that have one bit different
47            uniteGroups(mask, mask ^ (1 << i));
48            // If current bit is set, try uniting with sets that have this bit unset
49            if ((mask >> i) & 1) {
50                for (let j = 0; j < 26; ++j) {
51                    if (((mask >> j) & 1) === 0) {
52                        uniteGroups(mask, mask ^ (1 << i) | (1 << j));
53                    }
54                }
55            }
56        }
57    });
58
59    return [groupCount, maxGroupSize];
60}
61
62// Find the root of the set that 'x' belongs to
63function findSetRoot(x: number): number {
64    if (parent[x] !== x) parent[x] = findSetRoot(parent[x]);
65    return parent[x];
66}
67
68// Unite two disjoint sets by their representatives
69function uniteGroups(a: number, b: number): void {
70    if (parent[b] === undefined) return; // Only unite if 'b' is an existing parent
71    const rootA = findSetRoot(a), rootB = findSetRoot(b);
72    if (rootA === rootB) return; // They are already in the same set
73
74    // Union by size, the larger group absorbs the smaller one
75    parent[rootA] = rootB;
76    groupSize[rootB] += groupSize[rootA];
77
78    // Update max group size after union
79    maxGroupSize = Math.max(maxGroupSize, groupSize[rootB]);
80
81    // Decrement the group count as two groups have merged
82    --groupCount;
83}
84

Time and Space Complexity

Time Complexity

The time complexity of the groupStrings function can be broken down based on the operations it performs:

  1. Initialization of data structures:

    • Initializing the parent dictionary p and the size counter size can be considered O(1) since they are empty at the start.
    • Populating the size counter according to the input words is O(N * L), where N is the number of words and L is the average length of a word because each character is processed individually.
  2. Parent-Finding and Union Operations:

    • The find function performs the path compression technique, which, after initial setup, has a nearly constant time complexity, amortized O(α(N)), where α(N) is the inverse Ackermann function, which is a very slow-growing function.
    • The two nested for loops, with 26 iterations each, result in a potential O(26^2) operations per word. This predominantly dominates the time complexity within the loop.
    • The union operations inside the loops are performed only if certain bitwise conditions are met; however, in the worst case, those conditions will always be true, resulting in 26^2 union operations for each of the N words.

Therefore, the total worst-case time complexity is O(N * (L + 26^2 * α(N))), considering that α(N) is a very slow-growing function and can be considered nearly a constant, we can simplify this to O(N * (L + 26^2)).

Space Complexity

The space complexity can be analyzed by considering the memory used by data structures and the recursion stack for the find function:

  1. Data Structures:

    • The p and size data structures will at most have a distinct entry for each unique bit representation of words, which is at most O(N) space complexity since there can be at most N unique bit masks.
    • The Counter object for size also has at most O(N) space complexity.
  2. Recursion Stack:

    • The find function could have a recursion depth equal to the number of elements in p, which is at most O(N) in the worst case. However, due to path compression, the average recursion depth will be O(α(N)), which is nearly constant.

Therefore, the total space complexity is O(N), as the recursion stack does not significantly contribute to the space complexity due to the path compression resulting in an almost flat tree.

Learn more about how to find time and space complexity quickly using problem constraints.


Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:
Question 1 out of 10

Which of the two traversal algorithms (BFS and DFS) can be used to find whether two nodes are connected?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!