2506. Count Pairs Of Similar Strings
Problem Description
You are given an array of strings called words
that is 0-indexed (meaning positions start from 0).
Two strings are considered similar if they contain exactly the same set of unique characters, regardless of frequency or order.
For example:
"abca"
and"cba"
are similar because both contain the characters'a'
,'b'
, and'c'
"abacba"
and"bcfd"
are not similar because they don't share the same set of characters
Your task is to count how many pairs of indices (i, j)
exist where:
0 <= i < j <= words.length - 1
(meaningi
comes beforej
in the array)words[i]
andwords[j]
are similar strings
The solution uses bit manipulation to represent each string as a binary number. Each bit position (0-25) represents whether a specific letter ('a'-'z') appears in the string. For instance, if a string contains 'a', 'c', and 'e', the 0th, 2nd, and 4th bits would be set to 1.
The algorithm:
- For each string, create its binary representation by setting bits corresponding to its characters
- Use a hash table to count how many strings have the same binary representation
- When processing a string, add the current count of strings with the same binary representation to the answer (these form valid pairs with the current string)
- Increment the count for this binary representation
This efficiently finds all similar string pairs in a single pass through the array.
Intuition
The key insight is recognizing that two strings are similar if they have the same set of unique characters. This means we need a way to represent the "character set" of each string in a form that makes comparison easy.
Initially, we might think of creating a sorted string of unique characters for each word (like converting "abca"
to "abc"
). However, this requires string manipulation and sorting, which can be inefficient.
A more elegant approach emerges when we realize that:
- There are only 26 possible lowercase letters
- For similarity, we only care about which letters are present, not their frequency or order
- This is essentially a "set membership" problem
This naturally leads to thinking about bit manipulation. We can use a 26-bit integer where each bit represents whether a specific letter exists in the string. For example:
- Bit 0 represents 'a'
- Bit 1 represents 'b'
- Bit 25 represents 'z'
So a string like "abc"
would be represented as ...0000111
(rightmost 3 bits set), and "cab"
would have the same representation.
Once we have this representation, counting similar pairs becomes straightforward. As we process each string:
- Convert it to its bit representation
- Check how many previously processed strings have the same bit pattern (these form valid pairs with the current string)
- Add this count to our answer
- Update our count for this bit pattern
This approach is efficient because:
- Converting a string to its bit representation takes
O(length of string)
- Comparing two bit representations is just comparing two integers -
O(1)
- We only need one pass through the array
Solution Approach
The solution uses hash table and bit manipulation to efficiently count similar string pairs.
Data Structures Used:
- A hash table (
Counter
) to store the frequency of each bit pattern - An integer variable to accumulate the answer
Implementation Steps:
-
Initialize variables:
ans = 0
to store the total count of similar pairscnt = Counter()
to track how many strings have each unique bit pattern
-
Process each string in the array:
- For each string
s
, initializex = 0
as the bit representation
- For each string
-
Convert string to bit pattern:
- Iterate through each character
c
in the string - Calculate the bit position:
c - ord("a")
gives us a value from 0 to 25 - Set the corresponding bit using OR operation:
x |= 1 << (c - ord("a"))
- For example, if
c = 'b'
, thenc - ord("a") = 1
, and1 << 1 = 2
(binary:10
), so we set bit 1
- Iterate through each character
-
Count pairs with current string:
- Before updating the counter, add
cnt[x]
to the answer - This represents all previously seen strings that have the same bit pattern (and thus are similar to the current string)
- Before updating the counter, add
-
Update the counter:
- Increment
cnt[x]
by 1 to include the current string for future comparisons
- Increment
-
Return the result:
- After processing all strings,
ans
contains the total number of similar pairs
- After processing all strings,
Example Walkthrough:
For words = ["abc", "bca", "xyz", "cba"]
:
"abc"
→ bit pattern:111
(bits 0,1,2 set) →cnt[7] = 0
, add 0 to ans, updatecnt[7] = 1
"bca"
→ bit pattern:111
→cnt[7] = 1
, add 1 to ans, updatecnt[7] = 2
"xyz"
→ bit pattern:111000...000
(bits 23,24,25 set) →cnt[new_pattern] = 0
, add 0 to ans"cba"
→ bit pattern:111
→cnt[7] = 2
, add 2 to ans, updatecnt[7] = 3
- Final answer:
1 + 2 = 3
pairs
Time Complexity: O(n * m)
where n
is the number of words and m
is the average length of each word
Space Complexity: O(n)
for the hash table storing at most n
different bit patterns
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through the solution with words = ["aba", "aabb", "abcd", "bac", "aabc"]
.
Step 1: Process "aba"
- Convert to bit pattern: 'a' sets bit 0, 'b' sets bit 1
- Bit pattern:
...0011
(decimal: 3) - Check
cnt[3]
= 0 (no previous strings with this pattern) - Add 0 to answer (ans = 0)
- Update
cnt[3]
= 1
Step 2: Process "aabb"
- Convert to bit pattern: 'a' sets bit 0, 'b' sets bit 1
- Bit pattern:
...0011
(decimal: 3) - Check
cnt[3]
= 1 (one previous string "aba" has same pattern) - Add 1 to answer (ans = 1) - forms pair (0,1)
- Update
cnt[3]
= 2
Step 3: Process "abcd"
- Convert to bit pattern: 'a' sets bit 0, 'b' sets bit 1, 'c' sets bit 2, 'd' sets bit 3
- Bit pattern:
...01111
(decimal: 15) - Check
cnt[15]
= 0 (no previous strings with this pattern) - Add 0 to answer (ans = 1)
- Update
cnt[15]
= 1
Step 4: Process "bac"
- Convert to bit pattern: 'b' sets bit 1, 'a' sets bit 0, 'c' sets bit 2
- Bit pattern:
...0111
(decimal: 7) - Check
cnt[7]
= 0 (no previous strings with this pattern) - Add 0 to answer (ans = 1)
- Update
cnt[7]
= 1
Step 5: Process "aabc"
- Convert to bit pattern: 'a' sets bit 0, 'b' sets bit 1, 'c' sets bit 2
- Bit pattern:
...0111
(decimal: 7) - Check
cnt[7]
= 1 (one previous string "bac" has same pattern) - Add 1 to answer (ans = 2) - forms pair (3,4)
- Update
cnt[7]
= 2
Final Result: 2 similar pairs: (0,1) and (3,4)
- Pair (0,1): "aba" and "aabb" both contain only {a, b}
- Pair (3,4): "bac" and "aabc" both contain only {a, b, c}
Solution Implementation
1class Solution:
2 def similarPairs(self, words: List[str]) -> int:
3 # Count of similar pairs found
4 pair_count = 0
5
6 # Dictionary to store frequency of each unique character set (represented as bitmask)
7 bitmask_frequency = Counter()
8
9 # Process each word in the list
10 for word in words:
11 # Create a bitmask to represent unique characters in the word
12 # Each bit position represents a letter (a=0, b=1, ..., z=25)
13 bitmask = 0
14
15 # Set corresponding bit for each character in the word
16 for char in word:
17 # Calculate bit position for the character (0-25)
18 bit_position = ord(char) - ord('a')
19 # Set the bit at that position using bitwise OR
20 bitmask |= 1 << bit_position
21
22 # Add count of previously seen words with same character set
23 # These form pairs with the current word
24 pair_count += bitmask_frequency[bitmask]
25
26 # Increment frequency count for this bitmask
27 bitmask_frequency[bitmask] += 1
28
29 return pair_count
30
1class Solution {
2 public int similarPairs(String[] words) {
3 int pairCount = 0;
4 // Map to store bitmask of unique characters -> frequency count
5 Map<Integer, Integer> bitmaskFrequency = new HashMap<>();
6
7 // Process each word in the array
8 for (String word : words) {
9 // Create bitmask representing unique characters in current word
10 int bitmask = 0;
11 for (char character : word.toCharArray()) {
12 // Set the bit corresponding to this character (a=0, b=1, ..., z=25)
13 bitmask |= 1 << (character - 'a');
14 }
15
16 // Update frequency map and count pairs
17 // merge() returns the new value after merging
18 // If bitmask exists, increment its count; otherwise set to 1
19 // Subtract 1 because we want pairs with previous occurrences
20 int currentFrequency = bitmaskFrequency.merge(bitmask, 1, Integer::sum);
21 pairCount += currentFrequency - 1;
22 }
23
24 return pairCount;
25 }
26}
27
1class Solution {
2public:
3 int similarPairs(vector<string>& words) {
4 int pairCount = 0;
5 // Map to store frequency of each unique character bitmask
6 unordered_map<int, int> bitmaskFrequency;
7
8 // Process each word in the input array
9 for (const auto& word : words) {
10 // Create a bitmask to represent unique characters in the word
11 // Each bit position represents a letter (0 for 'a', 1 for 'b', etc.)
12 int bitmask = 0;
13
14 // Set the corresponding bit for each character in the word
15 for (const char& ch : word) {
16 // Set the bit at position (ch - 'a') to 1
17 bitmask |= 1 << (ch - 'a');
18 }
19
20 // Add the count of words with the same bitmask seen so far
21 // This represents the number of pairs that can be formed with current word
22 pairCount += bitmaskFrequency[bitmask];
23
24 // Increment the frequency of the current bitmask
25 bitmaskFrequency[bitmask]++;
26 }
27
28 return pairCount;
29 }
30};
31
1/**
2 * Counts the number of similar pairs in the array where two words are similar
3 * if they contain the exact same set of unique characters
4 * @param words - Array of words to check for similar pairs
5 * @returns Number of similar pairs found
6 */
7function similarPairs(words: string[]): number {
8 let pairCount: number = 0;
9 // Map to store the frequency of each unique character set (represented as bitmask)
10 const bitmaskFrequency: Map<number, number> = new Map<number, number>();
11
12 for (const word of words) {
13 // Create a bitmask representing the unique characters in the current word
14 let characterBitmask: number = 0;
15
16 for (const character of word) {
17 // Set the bit corresponding to the character (a=0, b=1, ..., z=25)
18 const bitPosition: number = character.charCodeAt(0) - 97;
19 characterBitmask |= 1 << bitPosition;
20 }
21
22 // Add the count of previously seen words with the same character set
23 const previousCount: number = bitmaskFrequency.get(characterBitmask) || 0;
24 pairCount += previousCount;
25
26 // Update the frequency map with the current word's character set
27 bitmaskFrequency.set(characterBitmask, previousCount + 1);
28 }
29
30 return pairCount;
31}
32
Time and Space Complexity
Time Complexity: O(L)
, where L
is the total length of all strings in the input list.
- The outer loop iterates through each word in
words
, which isn
iterations - For each word, the inner loop iterates through each character in that word
- The total number of character iterations across all words is
L
(the sum of lengths of all strings) - Inside the inner loop, operations like bitwise OR (
|=
), bit shift (<<
), and character conversion are allO(1)
- Dictionary access (
cnt[x]
) and update (cnt[x] += 1
) areO(1)
on average - Therefore, the overall time complexity is
O(L)
Space Complexity: O(n)
, where n
is the number of strings in the input list.
- The
cnt
Counter dictionary stores at mostn
distinct entries (one for each unique bit pattern) - In the worst case, each word has a unique set of characters, resulting in
n
different bit patterns - The variable
x
usesO(1)
space as it's just a single integer (even though it represents a bitmask of up to 26 bits) - Other variables (
ans
,s
,c
) useO(1)
space - Therefore, the overall space complexity is
O(n)
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Using String/Set Comparison Instead of Bit Manipulation
Pitfall: A common mistake is to directly compare character sets using Python sets or sorted strings, which is less efficient.
# Inefficient approach - DON'T DO THIS
def similarPairs(self, words: List[str]) -> int:
pair_count = 0
char_sets = []
for word in words:
# Creating a set for each word
current_set = set(word)
# Comparing with all previous sets
for prev_set in char_sets:
if current_set == prev_set:
pair_count += 1
char_sets.append(current_set)
return pair_count
Problem: This approach has O(n²) time complexity for comparisons and uses more memory to store sets.
Solution: Use bit manipulation as shown in the original solution, which reduces comparison to O(1) hash lookups.
2. Incorrect Bit Position Calculation
Pitfall: Using the character directly instead of calculating its position relative to 'a'.
# WRONG - This will set incorrect bit positions
bitmask |= 1 << ord(char) # Sets bits at positions 97-122 for 'a'-'z'
# CORRECT
bitmask |= 1 << (ord(char) - ord('a')) # Sets bits at positions 0-25
Problem: Without subtracting ord('a')
, you'll use bit positions 97-122 instead of 0-25, potentially causing integer overflow or using unnecessary memory.
3. Counting Pairs Incorrectly
Pitfall: Adding the frequency after incrementing it, or trying to count all pairs at the end.
# WRONG - Counts each string with itself for word in words: bitmask = # ... calculate bitmask bitmask_frequency[bitmask] += 1 # Increment first pair_count += bitmask_frequency[bitmask] # Then add - includes self! # WRONG - Trying to calculate pairs after processing all words for word in words: bitmask = # ... calculate bitmask bitmask_frequency[bitmask] += 1 # This misses the ordering requirement (i < j) for count in bitmask_frequency.values(): pair_count += count * (count - 1) // 2
Solution: Always add the current frequency before incrementing it. This ensures each word only forms pairs with words that appeared before it in the array.
4. Not Handling Empty Strings or Special Cases
Pitfall: Assuming all strings are non-empty or contain only lowercase letters.
# Add validation if needed
for word in words:
if not word: # Handle empty string case
continue
bitmask = 0
for char in word:
if 'a' <= char <= 'z': # Validate character range
bitmask |= 1 << (ord(char) - ord('a'))
5. Using Regular Dictionary Instead of Counter
Pitfall: Using a regular dictionary without proper initialization.
# WRONG - KeyError when accessing non-existent key bitmask_frequency = {} pair_count += bitmask_frequency[bitmask] # KeyError if bitmask not in dict # CORRECT - Use Counter which returns 0 for missing keys bitmask_frequency = Counter() pair_count += bitmask_frequency[bitmask] # Returns 0 if key doesn't exist
Solution: Use Counter()
from collections, which automatically handles missing keys by returning 0.
What is the best way of checking if an element exists in a sorted array once in terms of time complexity? Select the best that applies.
Recommended Readings
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Runtime Overview When learning about algorithms and data structures you'll frequently encounter the term time complexity This concept is fundamental in computer science and offers insights into how long an algorithm takes to complete given a certain input size What is Time Complexity Time complexity represents the amount of time
Want a Structured Path to Master System Design Too? Don’t Miss This!