Facebook Pixel

2506. Count Pairs Of Similar Strings

EasyBit ManipulationArrayHash TableStringCounting
Leetcode Link

Problem Description

You are given an array of strings called words that is 0-indexed (meaning positions start from 0).

Two strings are considered similar if they contain exactly the same set of unique characters, regardless of frequency or order.

For example:

  • "abca" and "cba" are similar because both contain the characters 'a', 'b', and 'c'
  • "abacba" and "bcfd" are not similar because they don't share the same set of characters

Your task is to count how many pairs of indices (i, j) exist where:

  • 0 <= i < j <= words.length - 1 (meaning i comes before j in the array)
  • words[i] and words[j] are similar strings

The solution uses bit manipulation to represent each string as a binary number. Each bit position (0-25) represents whether a specific letter ('a'-'z') appears in the string. For instance, if a string contains 'a', 'c', and 'e', the 0th, 2nd, and 4th bits would be set to 1.

The algorithm:

  1. For each string, create its binary representation by setting bits corresponding to its characters
  2. Use a hash table to count how many strings have the same binary representation
  3. When processing a string, add the current count of strings with the same binary representation to the answer (these form valid pairs with the current string)
  4. Increment the count for this binary representation

This efficiently finds all similar string pairs in a single pass through the array.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key insight is recognizing that two strings are similar if they have the same set of unique characters. This means we need a way to represent the "character set" of each string in a form that makes comparison easy.

Initially, we might think of creating a sorted string of unique characters for each word (like converting "abca" to "abc"). However, this requires string manipulation and sorting, which can be inefficient.

A more elegant approach emerges when we realize that:

  1. There are only 26 possible lowercase letters
  2. For similarity, we only care about which letters are present, not their frequency or order
  3. This is essentially a "set membership" problem

This naturally leads to thinking about bit manipulation. We can use a 26-bit integer where each bit represents whether a specific letter exists in the string. For example:

  • Bit 0 represents 'a'
  • Bit 1 represents 'b'
  • Bit 25 represents 'z'

So a string like "abc" would be represented as ...0000111 (rightmost 3 bits set), and "cab" would have the same representation.

Once we have this representation, counting similar pairs becomes straightforward. As we process each string:

  • Convert it to its bit representation
  • Check how many previously processed strings have the same bit pattern (these form valid pairs with the current string)
  • Add this count to our answer
  • Update our count for this bit pattern

This approach is efficient because:

  • Converting a string to its bit representation takes O(length of string)
  • Comparing two bit representations is just comparing two integers - O(1)
  • We only need one pass through the array

Solution Approach

The solution uses hash table and bit manipulation to efficiently count similar string pairs.

Data Structures Used:

  • A hash table (Counter) to store the frequency of each bit pattern
  • An integer variable to accumulate the answer

Implementation Steps:

  1. Initialize variables:

    • ans = 0 to store the total count of similar pairs
    • cnt = Counter() to track how many strings have each unique bit pattern
  2. Process each string in the array:

    • For each string s, initialize x = 0 as the bit representation
  3. Convert string to bit pattern:

    • Iterate through each character c in the string
    • Calculate the bit position: c - ord("a") gives us a value from 0 to 25
    • Set the corresponding bit using OR operation: x |= 1 << (c - ord("a"))
    • For example, if c = 'b', then c - ord("a") = 1, and 1 << 1 = 2 (binary: 10), so we set bit 1
  4. Count pairs with current string:

    • Before updating the counter, add cnt[x] to the answer
    • This represents all previously seen strings that have the same bit pattern (and thus are similar to the current string)
  5. Update the counter:

    • Increment cnt[x] by 1 to include the current string for future comparisons
  6. Return the result:

    • After processing all strings, ans contains the total number of similar pairs

Example Walkthrough:

For words = ["abc", "bca", "xyz", "cba"]:

  • "abc" → bit pattern: 111 (bits 0,1,2 set) → cnt[7] = 0, add 0 to ans, update cnt[7] = 1
  • "bca" → bit pattern: 111cnt[7] = 1, add 1 to ans, update cnt[7] = 2
  • "xyz" → bit pattern: 111000...000 (bits 23,24,25 set) → cnt[new_pattern] = 0, add 0 to ans
  • "cba" → bit pattern: 111cnt[7] = 2, add 2 to ans, update cnt[7] = 3
  • Final answer: 1 + 2 = 3 pairs

Time Complexity: O(n * m) where n is the number of words and m is the average length of each word Space Complexity: O(n) for the hash table storing at most n different bit patterns

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through the solution with words = ["aba", "aabb", "abcd", "bac", "aabc"].

Step 1: Process "aba"

  • Convert to bit pattern: 'a' sets bit 0, 'b' sets bit 1
  • Bit pattern: ...0011 (decimal: 3)
  • Check cnt[3] = 0 (no previous strings with this pattern)
  • Add 0 to answer (ans = 0)
  • Update cnt[3] = 1

Step 2: Process "aabb"

  • Convert to bit pattern: 'a' sets bit 0, 'b' sets bit 1
  • Bit pattern: ...0011 (decimal: 3)
  • Check cnt[3] = 1 (one previous string "aba" has same pattern)
  • Add 1 to answer (ans = 1) - forms pair (0,1)
  • Update cnt[3] = 2

Step 3: Process "abcd"

  • Convert to bit pattern: 'a' sets bit 0, 'b' sets bit 1, 'c' sets bit 2, 'd' sets bit 3
  • Bit pattern: ...01111 (decimal: 15)
  • Check cnt[15] = 0 (no previous strings with this pattern)
  • Add 0 to answer (ans = 1)
  • Update cnt[15] = 1

Step 4: Process "bac"

  • Convert to bit pattern: 'b' sets bit 1, 'a' sets bit 0, 'c' sets bit 2
  • Bit pattern: ...0111 (decimal: 7)
  • Check cnt[7] = 0 (no previous strings with this pattern)
  • Add 0 to answer (ans = 1)
  • Update cnt[7] = 1

Step 5: Process "aabc"

  • Convert to bit pattern: 'a' sets bit 0, 'b' sets bit 1, 'c' sets bit 2
  • Bit pattern: ...0111 (decimal: 7)
  • Check cnt[7] = 1 (one previous string "bac" has same pattern)
  • Add 1 to answer (ans = 2) - forms pair (3,4)
  • Update cnt[7] = 2

Final Result: 2 similar pairs: (0,1) and (3,4)

  • Pair (0,1): "aba" and "aabb" both contain only {a, b}
  • Pair (3,4): "bac" and "aabc" both contain only {a, b, c}

Solution Implementation

1class Solution:
2    def similarPairs(self, words: List[str]) -> int:
3        # Count of similar pairs found
4        pair_count = 0
5      
6        # Dictionary to store frequency of each unique character set (represented as bitmask)
7        bitmask_frequency = Counter()
8      
9        # Process each word in the list
10        for word in words:
11            # Create a bitmask to represent unique characters in the word
12            # Each bit position represents a letter (a=0, b=1, ..., z=25)
13            bitmask = 0
14          
15            # Set corresponding bit for each character in the word
16            for char in word:
17                # Calculate bit position for the character (0-25)
18                bit_position = ord(char) - ord('a')
19                # Set the bit at that position using bitwise OR
20                bitmask |= 1 << bit_position
21          
22            # Add count of previously seen words with same character set
23            # These form pairs with the current word
24            pair_count += bitmask_frequency[bitmask]
25          
26            # Increment frequency count for this bitmask
27            bitmask_frequency[bitmask] += 1
28      
29        return pair_count
30
1class Solution {
2    public int similarPairs(String[] words) {
3        int pairCount = 0;
4        // Map to store bitmask of unique characters -> frequency count
5        Map<Integer, Integer> bitmaskFrequency = new HashMap<>();
6      
7        // Process each word in the array
8        for (String word : words) {
9            // Create bitmask representing unique characters in current word
10            int bitmask = 0;
11            for (char character : word.toCharArray()) {
12                // Set the bit corresponding to this character (a=0, b=1, ..., z=25)
13                bitmask |= 1 << (character - 'a');
14            }
15          
16            // Update frequency map and count pairs
17            // merge() returns the new value after merging
18            // If bitmask exists, increment its count; otherwise set to 1
19            // Subtract 1 because we want pairs with previous occurrences
20            int currentFrequency = bitmaskFrequency.merge(bitmask, 1, Integer::sum);
21            pairCount += currentFrequency - 1;
22        }
23      
24        return pairCount;
25    }
26}
27
1class Solution {
2public:
3    int similarPairs(vector<string>& words) {
4        int pairCount = 0;
5        // Map to store frequency of each unique character bitmask
6        unordered_map<int, int> bitmaskFrequency;
7      
8        // Process each word in the input array
9        for (const auto& word : words) {
10            // Create a bitmask to represent unique characters in the word
11            // Each bit position represents a letter (0 for 'a', 1 for 'b', etc.)
12            int bitmask = 0;
13          
14            // Set the corresponding bit for each character in the word
15            for (const char& ch : word) {
16                // Set the bit at position (ch - 'a') to 1
17                bitmask |= 1 << (ch - 'a');
18            }
19          
20            // Add the count of words with the same bitmask seen so far
21            // This represents the number of pairs that can be formed with current word
22            pairCount += bitmaskFrequency[bitmask];
23          
24            // Increment the frequency of the current bitmask
25            bitmaskFrequency[bitmask]++;
26        }
27      
28        return pairCount;
29    }
30};
31
1/**
2 * Counts the number of similar pairs in the array where two words are similar
3 * if they contain the exact same set of unique characters
4 * @param words - Array of words to check for similar pairs
5 * @returns Number of similar pairs found
6 */
7function similarPairs(words: string[]): number {
8    let pairCount: number = 0;
9    // Map to store the frequency of each unique character set (represented as bitmask)
10    const bitmaskFrequency: Map<number, number> = new Map<number, number>();
11  
12    for (const word of words) {
13        // Create a bitmask representing the unique characters in the current word
14        let characterBitmask: number = 0;
15      
16        for (const character of word) {
17            // Set the bit corresponding to the character (a=0, b=1, ..., z=25)
18            const bitPosition: number = character.charCodeAt(0) - 97;
19            characterBitmask |= 1 << bitPosition;
20        }
21      
22        // Add the count of previously seen words with the same character set
23        const previousCount: number = bitmaskFrequency.get(characterBitmask) || 0;
24        pairCount += previousCount;
25      
26        // Update the frequency map with the current word's character set
27        bitmaskFrequency.set(characterBitmask, previousCount + 1);
28    }
29  
30    return pairCount;
31}
32

Time and Space Complexity

Time Complexity: O(L), where L is the total length of all strings in the input list.

  • The outer loop iterates through each word in words, which is n iterations
  • For each word, the inner loop iterates through each character in that word
  • The total number of character iterations across all words is L (the sum of lengths of all strings)
  • Inside the inner loop, operations like bitwise OR (|=), bit shift (<<), and character conversion are all O(1)
  • Dictionary access (cnt[x]) and update (cnt[x] += 1) are O(1) on average
  • Therefore, the overall time complexity is O(L)

Space Complexity: O(n), where n is the number of strings in the input list.

  • The cnt Counter dictionary stores at most n distinct entries (one for each unique bit pattern)
  • In the worst case, each word has a unique set of characters, resulting in n different bit patterns
  • The variable x uses O(1) space as it's just a single integer (even though it represents a bitmask of up to 26 bits)
  • Other variables (ans, s, c) use O(1) space
  • Therefore, the overall space complexity is O(n)

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Using String/Set Comparison Instead of Bit Manipulation

Pitfall: A common mistake is to directly compare character sets using Python sets or sorted strings, which is less efficient.

# Inefficient approach - DON'T DO THIS
def similarPairs(self, words: List[str]) -> int:
    pair_count = 0
    char_sets = []
  
    for word in words:
        # Creating a set for each word
        current_set = set(word)
      
        # Comparing with all previous sets
        for prev_set in char_sets:
            if current_set == prev_set:
                pair_count += 1
      
        char_sets.append(current_set)
  
    return pair_count

Problem: This approach has O(n²) time complexity for comparisons and uses more memory to store sets.

Solution: Use bit manipulation as shown in the original solution, which reduces comparison to O(1) hash lookups.

2. Incorrect Bit Position Calculation

Pitfall: Using the character directly instead of calculating its position relative to 'a'.

# WRONG - This will set incorrect bit positions
bitmask |= 1 << ord(char)  # Sets bits at positions 97-122 for 'a'-'z'

# CORRECT
bitmask |= 1 << (ord(char) - ord('a'))  # Sets bits at positions 0-25

Problem: Without subtracting ord('a'), you'll use bit positions 97-122 instead of 0-25, potentially causing integer overflow or using unnecessary memory.

3. Counting Pairs Incorrectly

Pitfall: Adding the frequency after incrementing it, or trying to count all pairs at the end.

# WRONG - Counts each string with itself
for word in words:
    bitmask = # ... calculate bitmask
    bitmask_frequency[bitmask] += 1  # Increment first
    pair_count += bitmask_frequency[bitmask]  # Then add - includes self!

# WRONG - Trying to calculate pairs after processing all words
for word in words:
    bitmask = # ... calculate bitmask
    bitmask_frequency[bitmask] += 1

# This misses the ordering requirement (i < j)
for count in bitmask_frequency.values():
    pair_count += count * (count - 1) // 2

Solution: Always add the current frequency before incrementing it. This ensures each word only forms pairs with words that appeared before it in the array.

4. Not Handling Empty Strings or Special Cases

Pitfall: Assuming all strings are non-empty or contain only lowercase letters.

# Add validation if needed
for word in words:
    if not word:  # Handle empty string case
        continue
  
    bitmask = 0
    for char in word:
        if 'a' <= char <= 'z':  # Validate character range
            bitmask |= 1 << (ord(char) - ord('a'))

5. Using Regular Dictionary Instead of Counter

Pitfall: Using a regular dictionary without proper initialization.

# WRONG - KeyError when accessing non-existent key
bitmask_frequency = {}
pair_count += bitmask_frequency[bitmask]  # KeyError if bitmask not in dict

# CORRECT - Use Counter which returns 0 for missing keys
bitmask_frequency = Counter()
pair_count += bitmask_frequency[bitmask]  # Returns 0 if key doesn't exist

Solution: Use Counter() from collections, which automatically handles missing keys by returning 0.

Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

What is the best way of checking if an element exists in a sorted array once in terms of time complexity? Select the best that applies.


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More