1239. Maximum Length of a Concatenated String with Unique Characters

MediumBit ManipulationArrayStringBacktracking
Leetcode Link

Problem Description

In this problem, you're given an array of strings named arr. Your task is to create the longest possible string s by concatenating some or all strings from arr. However, there's an important rule: the string s must consist of unique characters only.

In other words, you need to pick a subsequence of strings from arr such that when these strings are combined, no character appears more than once. The length of the resulting string s is your main objective, and you must return the maximum possible length. Remember that a subsequence is formed from the original array by potentially removing some elements, but the order of the remaining elements must be preserved.

Flowchart Walkthrough

Let's analyze the problem using the the Flowchart. Here's a step-by-step walkthrough:

  1. Is it a graph?

    • No: The problem does not involve direct graph concepts like traversal or node edges relationship. It revolves around strings and character arrays.
  2. Need to solve for kth smallest/largest?

    • No: The question does not ask for kth smallest or largest element. It wants the maximum length of the concatenated string that contains unique characters.
  3. Involves Linked Lists?

    • No: This problem is related to arrays of strings, not linked lists.
  4. Does the problem have small constraints?

    • Yes: The problem can have small constraints because the construction of different combinations from a list falls into exponential growth with respect to input size but can still be considered small enough for backtracking.
  5. Brute force / Backtracking?

    • Yes: Given the requirement for uniqueness and the combination of strings, backtracking is suitable as it allows exploration of all combinations of strings to ensure the maximum length where all characters are unique.

Conclusion: The flowchart leads us to conclude using a backtracking approach, where we recursively try to add each string to the existing set and backtrack when the maximum cannot be enhanced or duplicates are introduced.

Intuition

To find the optimal solution for the maximum length of s, we use a bit manipulation technique known as "state compression". This technique involves using a 32-bit integer to represent the presence of each letter where each bit corresponds to a letter in the alphabet.

The intuition behind this approach is that it allows us to efficiently check whether two strings contain any common characters. We can do this by performing an AND operation (&) on their respective compressed states (masks). If the result is zero, it means there are no common characters.

Here's a step-by-step breakdown of how we arrive at the solution:

  1. Initialize a variable ans to zero. This will hold the maximum length found.
  2. Create a list masks with a single element, zero, to keep track of all unique character combinations we've seen so far.
  3. Iterate through each string s in the array arr.
    • For each string, create a bit mask mask that represents the unique characters in the string.
    • If the string has duplicate characters, we set mask to zero and skip it since such a string cannot contribute to a valid s.
  4. Iterate through the current masks in masks.
    • We use the AND operation to check if the current string's mask has any overlap with mask. If it doesn't (i.e., the result is zero), it means we can concatenate this string without repeating any character.
    • In such a case, we combine (OR operation |) the current mask with mask and add the new mask to our masks list.
    • Update ans with the count of set bits in the new mask, which represents the length of the unique character string formed up to this point. This is done using the built-in bit_count() method on the new mask.
  5. Finally, return the maximum length ans found during the process.

The use of state compression and bit manipulation makes the solution efficient, as it simplifies the process of tracking which characters are present in the strings and eliminates the need for complex data structures or string operations.

Learn more about Backtracking patterns.

Solution Approach

The solution uses two significant concepts: bit manipulation for state compression and backtracking to explore all combinations. The algorithm follows these steps:

  1. Initialize an integer ans with the value 0, which will track the maximum length of a string with unique characters found during the process.

  2. We begin with an array masks that starts with a single element, 0, to record the baseline state (the empty string).

  3. Loop through each string s in the input array arr.

    • For each string s, we create an integer mask that serves as its unique character identifier. The binary representation of mask will have a 1 in the position corresponding to a letter (where a is the least significant bit, and z is the most significant).

    • As we iterate over each character c in the string s, we calculate the difference between the ASCII value of c and that of 'a' to find the corresponding bit position.

    i = ord(c) - ord('a')
    • We then check if the bit at that position is already set. If so, it means the character has appeared before, and we break the loop, setting mask to 0, as this string cannot be part of the result due to the duplicate character.
    if mask >> i & 1:
        mask = 0
        break
    • Otherwise, we set the bit corresponding to the character in the mask.
    mask |= 1 << i
    • If the current string s has any duplicate characters, we disregard this mask and move on to the next string in arr.
  4. Next, for each mask computed from the current string, we examine the masks collected so far.

    • For each existing m in masks, we ensure there's no overlap between m and the current mask using the bitwise AND operation.
    if m & mask == 0:
    • If there's no overlap, it means that adding the current string's unique characters to the characters represented by m would still result in a string with all unique characters.

    • In that case, we combine m and mask using the bitwise OR operation. The result is a new mask representing a unique combination of characters from both masks. We add this new mask to masks.

    masks.append(m | mask)
    • We then calculate the total number of unique characters we have so far with the new mask by counting the number of set bits. In Python, this can be done with the .bit_count() method. We update our answer ans if this count is greater than the previous maximum.
    ans = max(ans, (m | mask).bit_count())
  5. Once we've checked all strings and recorded all possible unique character combinations, we return the value of ans, which represents the maximum length of a string with all unique characters we can create by concatenating a subsequence of arr.

The use of bit masks elegantly handles the uniqueness constraint, and the iterative approach ensures that all potential combinations are considered without duplication, leading to an efficient solution.

Ready to land your dream job?

Unlock your dream job with a 2-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a small example to illustrate the solution approach.

Consider the input array of strings arr as ["un","iq","ue"].

  1. Start by initializing ans to 0 and masks to [0].

  2. The first string is un.

    • For u, ord('u') - ord('a') gives us 20. So, mask becomes 0 | (1 << 20).
    • For n, ord('n') - ord('a') gives us 13. So, mask becomes mask | (1 << 13).

    After processing un, our mask is 10000100000000000000 in binary, which is 1048576 decimal.

  3. There are no duplicates in un, so we move on and create a new combination by OR-ing this mask with 0 from masks. We update our masks to be [0, 1048576].

  4. The next string is iq.

    • For i, ord('i') - ord('a') gives us 8. So, mask becomes 0 | (1 << 8).
    • For q, ord('q') - ord('a') gives us 16. So, mask becomes mask | (1 << 16).

    After processing iq, our mask is 1000000010000 in binary, which is 65792 decimal.

  5. Check this mask against all in masks. There's no common bit set with 1048576 (previous mask). Therefore, we can combine them to form a new mask: 1048576 | 65792 which in binary is 1000010010000, and in decimal is 1111368.

  6. Update ans with the bit count of the new mask. It has 4 bits set, representing 4 unique characters. ans becomes 4.

  7. The masks array becomes [0, 1048576, 65792, 1111368].

  8. Lastly, we have the string ue.

    • For u, there's already a bit set in the previous mask, so we skip the creation of a new mask.
  9. There are no more strings to process. We return the ans, which is 4. This is the maximum length of a string with all unique characters we can create by concatenating strings from arr.

This process of using bit masks allows us to efficiently manage and combine the unique characters from various strings without having to deal with actual string concatenation and character counting.

Solution Implementation

1from typing import List
2
3class Solution:
4    def maxLength(self, arr: List[str]) -> int:
5        max_length = 0  # Variable to store the maximum length of unique characters
6        masks = [0]  # List to store the unique character sets as bit masks
7      
8        # Iterate through each string in the input list
9        for string in arr:
10            mask = 0  # Initialize mask for current string
11          
12            # Check each character in the string
13            for char in string:
14                char_index = ord(char) - ord('a')  # Map 'a'-'z' to 0-25
15              
16                # If the character is already in the mask, reset mask and break
17                if mask >> char_index & 1:
18                    mask = 0
19                    break
20                # Add the character to the mask
21                mask |= 1 << char_index
22          
23            # If mask is not zero, it means the string had unique characters
24            if mask != 0:
25                # Check the new mask with existing masks for no overlap of characters
26                for existing_mask in masks.copy():
27                    if existing_mask & mask == 0:
28                        new_mask = existing_mask | mask  # Combine the masks
29                        masks.append(new_mask)  # Append the new mask to masks
30                        max_length = max(max_length, bin(new_mask).count('1'))  # Update max length
31                      
32        return max_length  # return the maximum length found
33
1class Solution {
2
3    public int maxLength(List<String> arr) {
4        int maxLen = 0; // This will hold the maximum length of unique characters.
5        List<Integer> bitMasks = new ArrayList<>(); // This list will store unique character combinations using bit masking.
6        bitMasks.add(0); // Initialize the list with a bitmask of 0 (no characters).
7
8        // Iterate over each string in the list.
9        for (String s : arr) {
10            int bitMask = 0;
11            // Iterate over the characters in the string to create a bitmask.
12            for (int i = 0; i < s.length(); ++i) {
13                int bitIndex = s.charAt(i) - 'a'; // Convert character to a bitmask index (0-25).
14                // If the character is already in the bitmask (duplicate character),
15                // set the bitmask to 0 and break out of the loop.
16                if (((bitMask >> bitIndex) & 1) == 1) {
17                    bitMask = 0;
18                    break;
19                }
20                bitMask |= 1 << bitIndex; // Add the character into the bitmask.
21            }
22
23            // If bitmask is 0, the string contains duplicates and should be ignored.
24            if (bitMask == 0) {
25                continue;
26            }
27
28            int currentSize = bitMasks.size();
29            // Iterate over existing masks and combine them with the new mask if possible.
30            for (int i = 0; i < currentSize; ++i) {
31                int combinedMask = bitMasks.get(i);
32                // If there is no collision between the current mask and the new mask,
33                // we can combine them into a new mask.
34                if ((combinedMask & bitMask) == 0) {
35                    bitMasks.add(combinedMask | bitMask); // Combine masks by OR operation.
36                    maxLen = Math.max(maxLen, Integer.bitCount(combinedMask | bitMask)); // Update maxLen if necessary.
37                }
38            }
39        }
40
41        return maxLen; // Return the maximum length found.
42    }
43}
44
1#include <vector>
2#include <string>
3#include <algorithm>
4
5class Solution {
6public:
7    int maxLength(std::vector<std::string>& arr) {
8        int max_length = 0; // to store the max length of unique characters
9        // masks initially contains only one element: "0", which represents an empty string
10        std::vector<int> masks = {0};
11      
12        // Iterate over all strings in the input vector
13        for (std::string& str : arr) {
14          
15            int mask = 0; // Bitmask to represent the current string
16          
17            // Iterate over each character in the current string
18            for (char& ch : str) {
19                int bitIndex = ch - 'a';
20              
21                // Check if this character has already appeared in the string (mask)
22                if ((mask >> bitIndex) & 1) {
23                    // If the character repeats, discard this string by setting the mask to 0 and break
24                    mask = 0;
25                    break;
26                }
27              
28                // Add the current character to the mask
29                mask |= 1 << bitIndex;
30            }
31          
32            // Only proceed if mask is not zero (valid string without any repeating character)
33            if (mask == 0) continue;
34          
35            int currentMasksCount = masks.size();
36            // Iterate over existing combinations of strings represented by masks
37            for (int i = 0; i < currentMasksCount; ++i) {
38                int combinedMask = masks[i];
39              
40                // Check if current mask and combined mask have no characters in common
41                if ((combinedMask & mask) == 0) {
42                    // Combine current string with the string represented by combinedMask
43                    masks.push_back(combinedMask | mask);
44                    // Update max_length using the number of unique characters in the new combination
45                    max_length = std::max(max_length, __builtin_popcount(combinedMask | mask));
46                }
47            }
48        }
49      
50        // Return the maximum length of a string with all unique characters
51        return max_length;
52    }
53};
54
1// Import necessary functions from built-in modules
2import { max } from 'lodash';
3
4// Utility function to count the number of set bits (1s) in the binary representation of a number
5const popCount = (n: number): number => {
6  let count = 0;
7  while (n) {
8    count += n & 1;
9    n >>= 1;
10  }
11  return count;
12};
13
14// Function to compute the max length of a concatenated string of unique characters
15const maxLength = (arr: string[]): number => {
16  let maxLength = 0; // To store the max length of unique characters
17  // Masks initially contains only one element: 0, which represents an empty string
18  let masks: number[] = [0];
19
20  // Iterate over all strings in the input array
21  arr.forEach(str => {
22    let mask = 0; // Bitmask to represent the current string
23
24    // Iterate over each character in the current string
25    for (const ch of str) {
26      let bitIndex = ch.charCodeAt(0) - 'a'.charCodeAt(0);
27
28      // Check if this character has already appeared in the string (mask)
29      if ((mask >> bitIndex) & 1) {
30        // If the character repeats, discard this string by setting the mask to 0 and break
31        mask = 0;
32        break;
33      }
34
35      // Add the current character to the mask
36      mask |= 1 << bitIndex;
37    }
38
39    // Only proceed if mask is not zero (valid string without any repeating character)
40    if (mask === 0) return;
41
42    // Create a copy of the current masks to iterate over
43    const currentMasks = [...masks];
44  
45    // Iterate over existing combinations of strings represented by masks
46    currentMasks.forEach(combinedMask => {
47      // Check if the current mask and combined mask have no characters in common
48      if ((combinedMask & mask) === 0) {
49        // Combine current string with the string represented by combinedMask
50        masks.push(combinedMask | mask);
51        // Update maxLength using the number of unique characters in the new combination
52        maxLength = max([maxLength, popCount(combinedMask | mask)])!;
53      }
54    });
55  });
56
57  // Return the maximum length of a string with all unique characters
58  return maxLength;
59};
60
61// Example usage:
62// console.log(maxLength(["un", "iq", "ue"])); // Should print 4
63

Time and Space Complexity

Time Complexity

The time complexity of the code primarily stems from two nested loops: the outer loop iterating over each string s in arr, and the inner loop iterating over the set of bitmasks masks. For every character c in each string s, we perform a constant-time operation to check if that character has already been seen by using a bitmask. If it has, we break early and the mask is set to 0. The worst-case scenario for this operation is O(k) where k is the length of the longest string.

Considering the generation of new masks, for every mask variable generated from string s, the code iterates through the masks list to check if there's any overlap with existing masks (m & mask == 0). In the worst-case scenario, if all characters in arr are unique and n is the length of arr, there could be up to 2^n combinations as every element in arr could be included or excluded from the combination.

The bit count operation (.bit_count()) is generally considered a constant-time operation, but since it's applied for each newly created mask, it doesn't dominate the time complexity.

Therefore, the overall time complexity is O(n * 2^n * k).

Space Complexity

The space complexity is dictated by the masks list which stores the unique combinations of characters that have been seen. In the worst-case scenario, this could be as large as 2^n where n is the length of arr. No other data structure in the code uses space that scales with the size of the input to the same degree.

Therefore, the space complexity of the algorithm is O(2^n).

Learn more about how to find time and space complexity quickly using problem constraints.


Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:
Question 1 out of 10

Which of the following problems can be solved with backtracking (select multiple)


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!