645. Set Mismatch
Problem Description
You start with a set of integers containing all numbers from 1
to n
. Due to an error, one number in this set got duplicated, which means:
- One number appears twice (the duplicate)
- One number is missing (got replaced by the duplicate)
Given an integer array nums
that represents this corrupted set, you need to find both:
- The number that appears twice
- The number that is missing
Return these two numbers as an array [duplicate, missing]
.
For example, if the original set was [1, 2, 3, 4]
and after the error it became [1, 2, 2, 4]
, then:
- The number
2
appears twice (duplicate) - The number
3
is missing - You would return
[2, 3]
The solution uses mathematical relationships between sums:
s1 = (1 + n) * n // 2
calculates the sum of all numbers from 1 to ns2 = sum(set(nums))
calculates the sum of unique numbers in the corrupted arrays = sum(nums)
calculates the sum of all numbers in the corrupted array (including the duplicate)- The duplicate number equals
s - s2
(sinces
includes the duplicate twice whiles2
includes it once) - The missing number equals
s1 - s2
(sinces1
has the complete sum whiles2
is missing one number)
Intuition
The key insight is to think about what information we can extract from different sums of the array.
Let's consider what we know:
- The original set should have numbers from
1
ton
with each appearing exactly once - The corrupted set has one number appearing twice and one number missing
If we calculate the sum of the corrupted array sum(nums)
, this sum includes the duplicate number twice but is missing the original number that got replaced.
Now, if we remove duplicates from the corrupted array using set(nums)
and calculate its sum, we get the sum of all unique numbers present in the corrupted array. This sum includes the duplicate number only once but still doesn't include the missing number.
The difference between these two sums (sum(nums) - sum(set(nums))
) gives us the duplicate number because:
sum(nums)
counts the duplicate twicesum(set(nums))
counts the duplicate once- Their difference is exactly the duplicate number
To find the missing number, we need to know what the sum should have been if there were no errors. The sum of numbers from 1
to n
is (1 + n) * n // 2
.
The difference between this ideal sum and the sum of unique numbers in our corrupted array (s1 - s2
) gives us the missing number because:
s1
includes all numbers from1
ton
s2
includes all numbers except the missing one- Their difference is exactly the missing number
This approach cleverly uses the mathematical properties of sums to extract both pieces of information we need without having to track individual numbers or use extra space for counting.
Learn more about Sorting patterns.
Solution Approach
The implementation follows the mathematical approach outlined in the reference solution:
-
Calculate the length of the array:
n = len(nums)
gives us the size of the set, which tells us the numbers should range from1
ton
. -
Calculate the ideal sum
s1
: Using the formula for the sum of an arithmetic sequence,s1 = (1 + n) * n // 2
computes what the sum should be if we had all numbers from1
ton
exactly once. -
Calculate the sum of unique elements
s2
:s2 = sum(set(nums))
first converts the array to a set to remove duplicates, then sums all unique elements. This gives us the sum of all distinct numbers present in the corrupted array. -
Calculate the actual sum
s
:s = sum(nums)
computes the sum of all elements in the corrupted array, including the duplicate counted twice. -
Find the duplicate and missing numbers:
- The duplicate number =
s - s2
: Sinces
includes the duplicate twice ands2
includes it once, their difference gives us the duplicate. - The missing number =
s1 - s2
: Sinces1
includes all numbers from1
ton
ands2
is missing one number, their difference gives us the missing number.
- The duplicate number =
-
Return the result:
return [s - s2, s1 - s2]
returns an array with the duplicate number first and the missing number second.
This solution has a time complexity of O(n)
for computing the sums and creating the set, and a space complexity of O(n)
for storing the set. The approach elegantly uses mathematical properties to avoid explicit searching or counting of individual elements.
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a concrete example where nums = [1, 2, 2, 4]
.
Step 1: Identify what we have
- Array length:
n = 4
- The array should contain
[1, 2, 3, 4]
but instead has[1, 2, 2, 4]
- Number 2 appears twice (duplicate)
- Number 3 is missing
Step 2: Calculate the ideal sum (s1)
- Sum of numbers from 1 to n:
s1 = (1 + 4) Γ 4 Γ· 2 = 5 Γ 2 = 10
- This represents:
1 + 2 + 3 + 4 = 10
Step 3: Calculate sum of unique elements (s2)
- Convert to set:
set([1, 2, 2, 4]) = {1, 2, 4}
- Sum the set:
s2 = 1 + 2 + 4 = 7
- Notice this sum excludes the duplicate occurrence and the missing number
Step 4: Calculate actual array sum (s)
- Sum all elements:
s = 1 + 2 + 2 + 4 = 9
- This includes the duplicate counted twice
Step 5: Find the duplicate and missing numbers
-
Duplicate:
s - s2 = 9 - 7 = 2
- Why? The actual sum (9) counts 2 twice, while the unique sum (7) counts it once
- The difference is exactly the duplicate number
-
Missing:
s1 - s2 = 10 - 7 = 3
- Why? The ideal sum (10) includes all numbers 1-4, while the unique sum (7) is missing number 3
- The difference is exactly the missing number
Step 6: Return result
- Return
[2, 3]
where 2 is the duplicate and 3 is the missing number
The elegance of this approach is that we never need to explicitly search for or count individual numbers - the mathematical relationships between different sums reveal both the duplicate and missing values directly.
Solution Implementation
1from typing import List
2
3class Solution:
4 def findErrorNums(self, nums: List[int]) -> List[int]:
5 """
6 Find the duplicate number and the missing number in an array.
7
8 The array should contain numbers from 1 to n, but one number appears twice
9 and another number is missing.
10
11 Args:
12 nums: List of integers from 1 to n with one duplicate and one missing
13
14 Returns:
15 List containing [duplicate_number, missing_number]
16 """
17 n = len(nums)
18
19 # Calculate the expected sum of numbers from 1 to n using arithmetic series formula
20 expected_sum = (1 + n) * n // 2
21
22 # Calculate the sum of unique numbers in the array (removes the duplicate)
23 unique_sum = sum(set(nums))
24
25 # Calculate the actual sum of all numbers in the array (includes the duplicate)
26 actual_sum = sum(nums)
27
28 # The duplicate number = actual_sum - unique_sum
29 duplicate_number = actual_sum - unique_sum
30
31 # The missing number = expected_sum - unique_sum
32 missing_number = expected_sum - unique_sum
33
34 return [duplicate_number, missing_number]
35
1class Solution {
2 public int[] findErrorNums(int[] nums) {
3 int n = nums.length;
4
5 // Calculate the expected sum of numbers from 1 to n using arithmetic sequence formula
6 int expectedSum = (1 + n) * n / 2;
7
8 // Track unique numbers and their sum
9 Set<Integer> uniqueNumbers = new HashSet<>();
10 int uniqueSum = 0;
11
12 // Calculate the actual sum of all elements (including duplicate)
13 int actualSum = 0;
14
15 // Iterate through the array
16 for (int num : nums) {
17 // If the number is added to set successfully (not a duplicate)
18 if (uniqueNumbers.add(num)) {
19 uniqueSum += num;
20 }
21 // Add to actual sum regardless of duplication
22 actualSum += num;
23 }
24
25 // The duplicate number = actualSum - uniqueSum (extra occurrence)
26 int duplicate = actualSum - uniqueSum;
27
28 // The missing number = expectedSum - uniqueSum (what should be there but isn't)
29 int missing = expectedSum - uniqueSum;
30
31 return new int[] {duplicate, missing};
32 }
33}
34
1class Solution {
2public:
3 vector<int> findErrorNums(vector<int>& nums) {
4 int n = nums.size();
5
6 // Calculate the expected sum of numbers from 1 to n using the formula: n * (n + 1) / 2
7 int expectedSum = (1 + n) * n / 2;
8
9 // Create a set to store unique elements from the array
10 // This will automatically remove the duplicate number
11 unordered_set<int> uniqueNumbers(nums.begin(), nums.end());
12
13 // Calculate the sum of unique elements (missing the duplicate)
14 int sumOfUniqueElements = 0;
15 for (int number : uniqueNumbers) {
16 sumOfUniqueElements += number;
17 }
18
19 // Calculate the actual sum of all elements in the array (including the duplicate)
20 int actualSum = accumulate(nums.begin(), nums.end(), 0);
21
22 // The duplicate number = actualSum - sumOfUniqueElements
23 // The missing number = expectedSum - sumOfUniqueElements
24 return {actualSum - sumOfUniqueElements, expectedSum - sumOfUniqueElements};
25 }
26};
27
1/**
2 * Finds the duplicate number and missing number in an array containing numbers from 1 to n
3 * @param nums - Array containing numbers from 1 to n with one duplicate and one missing
4 * @returns Array with [duplicate, missing] numbers
5 */
6function findErrorNums(nums: number[]): number[] {
7 // Get the length of the input array
8 const arrayLength: number = nums.length;
9
10 // Calculate the expected sum of numbers from 1 to n using formula: n * (n + 1) / 2
11 const expectedSum: number = (arrayLength * (arrayLength + 1)) >> 1;
12
13 // Calculate the sum of unique elements by removing duplicates using Set
14 const uniqueElementsSum: number = [...new Set(nums)].reduce((accumulator, current) => accumulator + current, 0);
15
16 // Calculate the actual sum of all elements in the array (including duplicate)
17 const actualSum: number = nums.reduce((accumulator, current) => accumulator + current, 0);
18
19 // The duplicate number is: actualSum - uniqueElementsSum
20 // The missing number is: expectedSum - uniqueElementsSum
21 return [actualSum - uniqueElementsSum, expectedSum - uniqueElementsSum];
22}
23
Time and Space Complexity
The time complexity is O(n)
, where n
is the length of the array nums
. This is because:
- Computing the sum of integers from 1 to n using the formula
(1 + n) * n // 2
takesO(1)
time - Creating a set from
nums
takesO(n)
time - Computing
sum(set(nums))
takesO(n)
time after set creation - Computing
sum(nums)
takesO(n)
time - The overall time complexity is dominated by the linear operations, resulting in
O(n)
The space complexity is O(n)
, where n
is the length of the array nums
. This is because:
- The
set(nums)
operation creates a new set that stores the unique elements fromnums
, which in the worst case (when there's only one duplicate) containsn-1
elements - This extra space requirement for storing the deduplicated array results in
O(n)
space complexity - Other variables like
s1
,s2
,s
, andn
only requireO(1)
additional space
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Integer Overflow in Large Arrays
When dealing with very large values of n
, calculating (1 + n) * n
might cause integer overflow in languages with fixed integer sizes. While Python handles arbitrary precision integers automatically, this could be problematic in other languages.
Solution: Use the division first to avoid overflow: n * (n + 1) // 2
or be mindful of the data type limits in your chosen language.
2. Assuming Array is Sorted or Contains Valid Range
The solution assumes the array contains numbers from 1 to n, but doesn't validate this assumption. If the input array contains numbers outside this range or has multiple duplicates, the mathematical approach will produce incorrect results.
Solution: Add input validation or use a frequency-based approach:
def findErrorNums(self, nums: List[int]) -> List[int]:
n = len(nums)
count = [0] * (n + 1) # Index 0 unused, indices 1 to n for numbers 1 to n
for num in nums:
count[num] += 1
duplicate = missing = 0
for i in range(1, n + 1):
if count[i] == 2:
duplicate = i
elif count[i] == 0:
missing = i
return [duplicate, missing]
3. Memory Usage with Set Creation
Creating a set from the array requires O(n) additional space. In memory-constrained environments, this might be inefficient.
Solution: Use XOR bit manipulation or cycle detection algorithms that work in O(1) space:
def findErrorNums(self, nums: List[int]) -> List[int]:
duplicate = missing = 0
for i, num in enumerate(nums):
index = abs(num) - 1
if nums[index] < 0:
duplicate = abs(num)
else:
nums[index] = -nums[index]
for i in range(len(nums)):
if nums[i] > 0:
missing = i + 1
return [duplicate, missing]
4. Confusion About Formula Order
The formula (1 + n) * n // 2
might be confusing. Some might incorrectly write it as n * (n - 1) // 2
or forget the integer division.
Solution: Remember the arithmetic series sum formula clearly: Sum = n * (first_term + last_term) / 2, where first_term = 1 and last_term = n.
Which data structure is used in a depth first search?
Recommended Readings
Sorting Summary Comparisons We presented quite a few sorting algorithms and it is essential to know the advantages and disadvantages of each one The basic algorithms are easy to visualize and easy to learn for beginner programmers because of their simplicity As such they will suffice if you don't know any advanced
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Want a Structured Path to Master System Design Too? Donβt Miss This!