Facebook Pixel

645. Set Mismatch

EasyBit ManipulationArrayHash TableSorting
Leetcode Link

Problem Description

You start with a set of integers containing all numbers from 1 to n. Due to an error, one number in this set got duplicated, which means:

  • One number appears twice (the duplicate)
  • One number is missing (got replaced by the duplicate)

Given an integer array nums that represents this corrupted set, you need to find both:

  1. The number that appears twice
  2. The number that is missing

Return these two numbers as an array [duplicate, missing].

For example, if the original set was [1, 2, 3, 4] and after the error it became [1, 2, 2, 4], then:

  • The number 2 appears twice (duplicate)
  • The number 3 is missing
  • You would return [2, 3]

The solution uses mathematical relationships between sums:

  • s1 = (1 + n) * n // 2 calculates the sum of all numbers from 1 to n
  • s2 = sum(set(nums)) calculates the sum of unique numbers in the corrupted array
  • s = sum(nums) calculates the sum of all numbers in the corrupted array (including the duplicate)
  • The duplicate number equals s - s2 (since s includes the duplicate twice while s2 includes it once)
  • The missing number equals s1 - s2 (since s1 has the complete sum while s2 is missing one number)
Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key insight is to think about what information we can extract from different sums of the array.

Let's consider what we know:

  • The original set should have numbers from 1 to n with each appearing exactly once
  • The corrupted set has one number appearing twice and one number missing

If we calculate the sum of the corrupted array sum(nums), this sum includes the duplicate number twice but is missing the original number that got replaced.

Now, if we remove duplicates from the corrupted array using set(nums) and calculate its sum, we get the sum of all unique numbers present in the corrupted array. This sum includes the duplicate number only once but still doesn't include the missing number.

The difference between these two sums (sum(nums) - sum(set(nums))) gives us the duplicate number because:

  • sum(nums) counts the duplicate twice
  • sum(set(nums)) counts the duplicate once
  • Their difference is exactly the duplicate number

To find the missing number, we need to know what the sum should have been if there were no errors. The sum of numbers from 1 to n is (1 + n) * n // 2.

The difference between this ideal sum and the sum of unique numbers in our corrupted array (s1 - s2) gives us the missing number because:

  • s1 includes all numbers from 1 to n
  • s2 includes all numbers except the missing one
  • Their difference is exactly the missing number

This approach cleverly uses the mathematical properties of sums to extract both pieces of information we need without having to track individual numbers or use extra space for counting.

Learn more about Sorting patterns.

Solution Approach

The implementation follows the mathematical approach outlined in the reference solution:

  1. Calculate the length of the array: n = len(nums) gives us the size of the set, which tells us the numbers should range from 1 to n.

  2. Calculate the ideal sum s1: Using the formula for the sum of an arithmetic sequence, s1 = (1 + n) * n // 2 computes what the sum should be if we had all numbers from 1 to n exactly once.

  3. Calculate the sum of unique elements s2: s2 = sum(set(nums)) first converts the array to a set to remove duplicates, then sums all unique elements. This gives us the sum of all distinct numbers present in the corrupted array.

  4. Calculate the actual sum s: s = sum(nums) computes the sum of all elements in the corrupted array, including the duplicate counted twice.

  5. Find the duplicate and missing numbers:

    • The duplicate number = s - s2: Since s includes the duplicate twice and s2 includes it once, their difference gives us the duplicate.
    • The missing number = s1 - s2: Since s1 includes all numbers from 1 to n and s2 is missing one number, their difference gives us the missing number.
  6. Return the result: return [s - s2, s1 - s2] returns an array with the duplicate number first and the missing number second.

This solution has a time complexity of O(n) for computing the sums and creating the set, and a space complexity of O(n) for storing the set. The approach elegantly uses mathematical properties to avoid explicit searching or counting of individual elements.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a concrete example where nums = [1, 2, 2, 4].

Step 1: Identify what we have

  • Array length: n = 4
  • The array should contain [1, 2, 3, 4] but instead has [1, 2, 2, 4]
  • Number 2 appears twice (duplicate)
  • Number 3 is missing

Step 2: Calculate the ideal sum (s1)

  • Sum of numbers from 1 to n: s1 = (1 + 4) Γ— 4 Γ· 2 = 5 Γ— 2 = 10
  • This represents: 1 + 2 + 3 + 4 = 10

Step 3: Calculate sum of unique elements (s2)

  • Convert to set: set([1, 2, 2, 4]) = {1, 2, 4}
  • Sum the set: s2 = 1 + 2 + 4 = 7
  • Notice this sum excludes the duplicate occurrence and the missing number

Step 4: Calculate actual array sum (s)

  • Sum all elements: s = 1 + 2 + 2 + 4 = 9
  • This includes the duplicate counted twice

Step 5: Find the duplicate and missing numbers

  • Duplicate: s - s2 = 9 - 7 = 2

    • Why? The actual sum (9) counts 2 twice, while the unique sum (7) counts it once
    • The difference is exactly the duplicate number
  • Missing: s1 - s2 = 10 - 7 = 3

    • Why? The ideal sum (10) includes all numbers 1-4, while the unique sum (7) is missing number 3
    • The difference is exactly the missing number

Step 6: Return result

  • Return [2, 3] where 2 is the duplicate and 3 is the missing number

The elegance of this approach is that we never need to explicitly search for or count individual numbers - the mathematical relationships between different sums reveal both the duplicate and missing values directly.

Solution Implementation

1from typing import List
2
3class Solution:
4    def findErrorNums(self, nums: List[int]) -> List[int]:
5        """
6        Find the duplicate number and the missing number in an array.
7      
8        The array should contain numbers from 1 to n, but one number appears twice
9        and another number is missing.
10      
11        Args:
12            nums: List of integers from 1 to n with one duplicate and one missing
13          
14        Returns:
15            List containing [duplicate_number, missing_number]
16        """
17        n = len(nums)
18      
19        # Calculate the expected sum of numbers from 1 to n using arithmetic series formula
20        expected_sum = (1 + n) * n // 2
21      
22        # Calculate the sum of unique numbers in the array (removes the duplicate)
23        unique_sum = sum(set(nums))
24      
25        # Calculate the actual sum of all numbers in the array (includes the duplicate)
26        actual_sum = sum(nums)
27      
28        # The duplicate number = actual_sum - unique_sum
29        duplicate_number = actual_sum - unique_sum
30      
31        # The missing number = expected_sum - unique_sum
32        missing_number = expected_sum - unique_sum
33      
34        return [duplicate_number, missing_number]
35
1class Solution {
2    public int[] findErrorNums(int[] nums) {
3        int n = nums.length;
4      
5        // Calculate the expected sum of numbers from 1 to n using arithmetic sequence formula
6        int expectedSum = (1 + n) * n / 2;
7      
8        // Track unique numbers and their sum
9        Set<Integer> uniqueNumbers = new HashSet<>();
10        int uniqueSum = 0;
11      
12        // Calculate the actual sum of all elements (including duplicate)
13        int actualSum = 0;
14      
15        // Iterate through the array
16        for (int num : nums) {
17            // If the number is added to set successfully (not a duplicate)
18            if (uniqueNumbers.add(num)) {
19                uniqueSum += num;
20            }
21            // Add to actual sum regardless of duplication
22            actualSum += num;
23        }
24      
25        // The duplicate number = actualSum - uniqueSum (extra occurrence)
26        int duplicate = actualSum - uniqueSum;
27      
28        // The missing number = expectedSum - uniqueSum (what should be there but isn't)
29        int missing = expectedSum - uniqueSum;
30      
31        return new int[] {duplicate, missing};
32    }
33}
34
1class Solution {
2public:
3    vector<int> findErrorNums(vector<int>& nums) {
4        int n = nums.size();
5      
6        // Calculate the expected sum of numbers from 1 to n using the formula: n * (n + 1) / 2
7        int expectedSum = (1 + n) * n / 2;
8      
9        // Create a set to store unique elements from the array
10        // This will automatically remove the duplicate number
11        unordered_set<int> uniqueNumbers(nums.begin(), nums.end());
12      
13        // Calculate the sum of unique elements (missing the duplicate)
14        int sumOfUniqueElements = 0;
15        for (int number : uniqueNumbers) {
16            sumOfUniqueElements += number;
17        }
18      
19        // Calculate the actual sum of all elements in the array (including the duplicate)
20        int actualSum = accumulate(nums.begin(), nums.end(), 0);
21      
22        // The duplicate number = actualSum - sumOfUniqueElements
23        // The missing number = expectedSum - sumOfUniqueElements
24        return {actualSum - sumOfUniqueElements, expectedSum - sumOfUniqueElements};
25    }
26};
27
1/**
2 * Finds the duplicate number and missing number in an array containing numbers from 1 to n
3 * @param nums - Array containing numbers from 1 to n with one duplicate and one missing
4 * @returns Array with [duplicate, missing] numbers
5 */
6function findErrorNums(nums: number[]): number[] {
7    // Get the length of the input array
8    const arrayLength: number = nums.length;
9  
10    // Calculate the expected sum of numbers from 1 to n using formula: n * (n + 1) / 2
11    const expectedSum: number = (arrayLength * (arrayLength + 1)) >> 1;
12  
13    // Calculate the sum of unique elements by removing duplicates using Set
14    const uniqueElementsSum: number = [...new Set(nums)].reduce((accumulator, current) => accumulator + current, 0);
15  
16    // Calculate the actual sum of all elements in the array (including duplicate)
17    const actualSum: number = nums.reduce((accumulator, current) => accumulator + current, 0);
18  
19    // The duplicate number is: actualSum - uniqueElementsSum
20    // The missing number is: expectedSum - uniqueElementsSum
21    return [actualSum - uniqueElementsSum, expectedSum - uniqueElementsSum];
22}
23

Time and Space Complexity

The time complexity is O(n), where n is the length of the array nums. This is because:

  • Computing the sum of integers from 1 to n using the formula (1 + n) * n // 2 takes O(1) time
  • Creating a set from nums takes O(n) time
  • Computing sum(set(nums)) takes O(n) time after set creation
  • Computing sum(nums) takes O(n) time
  • The overall time complexity is dominated by the linear operations, resulting in O(n)

The space complexity is O(n), where n is the length of the array nums. This is because:

  • The set(nums) operation creates a new set that stores the unique elements from nums, which in the worst case (when there's only one duplicate) contains n-1 elements
  • This extra space requirement for storing the deduplicated array results in O(n) space complexity
  • Other variables like s1, s2, s, and n only require O(1) additional space

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Integer Overflow in Large Arrays

When dealing with very large values of n, calculating (1 + n) * n might cause integer overflow in languages with fixed integer sizes. While Python handles arbitrary precision integers automatically, this could be problematic in other languages.

Solution: Use the division first to avoid overflow: n * (n + 1) // 2 or be mindful of the data type limits in your chosen language.

2. Assuming Array is Sorted or Contains Valid Range

The solution assumes the array contains numbers from 1 to n, but doesn't validate this assumption. If the input array contains numbers outside this range or has multiple duplicates, the mathematical approach will produce incorrect results.

Solution: Add input validation or use a frequency-based approach:

def findErrorNums(self, nums: List[int]) -> List[int]:
    n = len(nums)
    count = [0] * (n + 1)  # Index 0 unused, indices 1 to n for numbers 1 to n
  
    for num in nums:
        count[num] += 1
  
    duplicate = missing = 0
    for i in range(1, n + 1):
        if count[i] == 2:
            duplicate = i
        elif count[i] == 0:
            missing = i
  
    return [duplicate, missing]

3. Memory Usage with Set Creation

Creating a set from the array requires O(n) additional space. In memory-constrained environments, this might be inefficient.

Solution: Use XOR bit manipulation or cycle detection algorithms that work in O(1) space:

def findErrorNums(self, nums: List[int]) -> List[int]:
    duplicate = missing = 0
    for i, num in enumerate(nums):
        index = abs(num) - 1
        if nums[index] < 0:
            duplicate = abs(num)
        else:
            nums[index] = -nums[index]
  
    for i in range(len(nums)):
        if nums[i] > 0:
            missing = i + 1
          
    return [duplicate, missing]

4. Confusion About Formula Order

The formula (1 + n) * n // 2 might be confusing. Some might incorrectly write it as n * (n - 1) // 2 or forget the integer division.

Solution: Remember the arithmetic series sum formula clearly: Sum = n * (first_term + last_term) / 2, where first_term = 1 and last_term = n.

Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

Which data structure is used in a depth first search?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More