287. Find the Duplicate Number
Problem Description
You are given an array nums
containing n + 1
integers, where each integer is in the range [1, n]
inclusive.
The array has exactly one repeated number that appears more than once. Your task is to find and return this duplicate number.
Constraints:
- You cannot modify the input array
nums
- You must use only constant extra space (O(1) space complexity)
Example understanding:
If n = 4
, the array has 5 elements, and each element is between 1 and 4. Since there are 5 elements but only 4 possible distinct values, by the Pigeonhole Principle, at least one number must appear twice.
The solution uses binary search on the value range [1, n]
rather than on array indices. For each mid-value x
, it counts how many elements in the array are less than or equal to x
. If this count is greater than x
, it means there must be a duplicate in the range [1, x]
. Otherwise, the duplicate is in the range [x+1, n]
.
The key insight is that without a duplicate, there should be exactly x
numbers that are ≤ x
. If we count more than x
numbers that are ≤ x
, then at least one number in the range [1, x]
appears multiple times.
Intuition
The key observation starts with understanding what happens in a normal scenario without duplicates. If we have numbers from 1
to n
with no duplicates, then for any number x
in this range, there should be exactly x
numbers that are less than or equal to x
. For example, if x = 3
, there should be exactly 3 numbers: 1, 2, 3
.
But what happens when there's a duplicate? If a number in the range [1, x]
appears twice, then when we count how many numbers are ≤ x
, we'll get more than x
. This is because the duplicate creates an "excess" count.
Think of it like this: imagine you're looking for a duplicate in [1, 2, 3, 3, 4]
. If you ask "how many numbers are ≤ 3?", you get 4 numbers (1, 2, 3, 3
), but in a normal case without duplicates, there should only be 3. This excess tells us the duplicate must be in the range [1, 3]
.
This observation naturally leads to a binary search approach on the value range rather than array indices. We can repeatedly divide the range [1, n]
in half and check which half contains the duplicate by counting elements. If count(≤ mid) > mid
, the duplicate is in [1, mid]
; otherwise, it's in [mid+1, n]
.
The beauty of this approach is that we're using the mathematical property of the problem itself - the constraint that numbers are in range [1, n]
- rather than needing to track array positions or modify the array. We're essentially using the Pigeonhole Principle in a binary search fashion to narrow down where the duplicate must be.
Solution Approach
The solution implements binary search on the value range [1, n]
to find the duplicate number. Here's how the implementation works:
Binary Search Setup:
We use bisect_left
from Python's bisect module to perform the binary search. The search range is range(len(nums))
, which represents values from 0
to n
(since the array has n+1
elements).
Key Function:
The helper function f(x)
is the core logic:
def f(x: int) -> bool:
return sum(v <= x for v in nums) > x
This function returns True
if the count of elements ≤ x
is greater than x
, indicating that the duplicate is in the range [1, x]
.
How Binary Search Works Here:
bisect_left
searches for the leftmost position wheref(x)
returnsTrue
- It starts with the middle value of the range
- If
f(mid)
isTrue
, it means the duplicate is in[1, mid]
, so we search the left half - If
f(mid)
isFalse
, the duplicate is in[mid+1, n]
, so we search the right half - The search continues until we converge on the exact duplicate value
Example Walkthrough:
For array [1, 3, 4, 2, 2]
:
- Check
x = 2
: count of elements≤ 2
is 3 (which are1, 2, 2
). Since3 > 2
,f(2) = True
- Check
x = 1
: count of elements≤ 1
is 1. Since1 = 1
,f(1) = False
- Binary search narrows down to
x = 2
as the duplicate
Time and Space Complexity:
- Time:
O(n log n)
- Binary search takesO(log n)
iterations, and each iteration counts elements inO(n)
- Space:
O(1)
- Only using a few variables, meeting the constant space requirement
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through the solution with array [1, 3, 4, 2, 2]
, where n = 4
(array has 5 elements).
Initial Setup:
- Search range:
[1, 4]
(possible values) - We need to find which value appears twice
Binary Search Process:
Iteration 1:
mid = (1 + 4) / 2 = 2
- Count elements ≤ 2: We find
1
and2, 2
→ count = 3 - Is
3 > 2
? Yes! So duplicate is in range[1, 2]
- Update:
right = 2
Iteration 2:
- New range:
[1, 2]
mid = (1 + 2) / 2 = 1
- Count elements ≤ 1: We find only
1
→ count = 1 - Is
1 > 1
? No! So duplicate is in range[2, 2]
- Update:
left = 2
Convergence:
left = right = 2
- The duplicate number is
2
Verification: Let's verify our logic at each step:
- When checking
mid = 2
: We counted 3 elements (1, 2, 2) that are ≤ 2. Without duplicates, there should only be 2 elements (1, 2). The excess tells us the duplicate is somewhere in[1, 2]
. - When checking
mid = 1
: We counted 1 element (1) that is ≤ 1. This is exactly what we'd expect without duplicates, so the duplicate must be greater than 1.
The algorithm efficiently narrows down the search space by half each time, converging on the duplicate value 2
without modifying the array or using extra space beyond a few variables.
Solution Implementation
1class Solution:
2 def findDuplicate(self, nums: List[int]) -> int:
3 """
4 Find the duplicate number in an array containing n+1 integers where each integer
5 is between 1 and n inclusive, with exactly one duplicate.
6
7 Uses binary search on the value range [1, n] to find the duplicate.
8 The key insight: if a number x is the duplicate (or greater than it),
9 then count of numbers <= x will be greater than x.
10 """
11
12 def count_less_than_or_equal(target: int) -> bool:
13 """
14 Check if the count of numbers less than or equal to target exceeds target.
15 This property holds true for the duplicate number and all numbers after it.
16
17 Args:
18 target: The number to check
19
20 Returns:
21 True if count of numbers <= target is greater than target
22 """
23 count = sum(1 for num in nums if num <= target)
24 return count > target
25
26 # Binary search on the range [1, n] where n = len(nums) - 1
27 # We're searching for the first position where count_less_than_or_equal returns True
28 # range(len(nums)) creates [0, 1, 2, ..., len(nums)-1]
29 # Since we need [1, 2, ..., n], bisect_left will return the correct 1-indexed result
30 from bisect import bisect_left
31 duplicate = bisect_left(range(len(nums)), True, key=count_less_than_or_equal)
32
33 return duplicate
34
1class Solution {
2 /**
3 * Finds the duplicate number in an array containing n+1 integers where each integer
4 * is between 1 and n (inclusive). Uses binary search on the value range.
5 *
6 * @param nums Array containing integers with exactly one duplicate
7 * @return The duplicate number
8 */
9 public int findDuplicate(int[] nums) {
10 // Binary search on the range of possible values [1, n]
11 int left = 1;
12 int right = nums.length - 1;
13
14 while (left < right) {
15 // Calculate middle value of the current range
16 int mid = left + (right - left) / 2;
17
18 // Count how many numbers in the array are less than or equal to mid
19 int count = 0;
20 for (int num : nums) {
21 if (num <= mid) {
22 count++;
23 }
24 }
25
26 // By pigeonhole principle: if count > mid, duplicate must be in [left, mid]
27 // Otherwise, duplicate must be in [mid + 1, right]
28 if (count > mid) {
29 // Duplicate is in the lower half
30 right = mid;
31 } else {
32 // Duplicate is in the upper half
33 left = mid + 1;
34 }
35 }
36
37 // When left == right, we've found the duplicate
38 return left;
39 }
40}
41
1class Solution {
2public:
3 int findDuplicate(vector<int>& nums) {
4 // Binary search on the value range [1, n-1]
5 // where n is the size of the array
6 int left = 0;
7 int right = nums.size() - 1;
8
9 while (left < right) {
10 // Calculate the middle value of the current range
11 int mid = (left + right) >> 1; // Equivalent to (left + right) / 2
12
13 // Count how many numbers in the array are less than or equal to mid
14 int count = 0;
15 for (int& value : nums) {
16 count += (value <= mid);
17 }
18
19 // By Pigeonhole Principle:
20 // If count > mid, the duplicate must be in the range [1, mid]
21 // Otherwise, the duplicate must be in the range [mid+1, right]
22 if (count > mid) {
23 // Duplicate is in the left half, narrow the search range
24 right = mid;
25 } else {
26 // Duplicate is in the right half, narrow the search range
27 left = mid + 1;
28 }
29 }
30
31 // When left == right, we've found the duplicate number
32 return left;
33 }
34};
35
1/**
2 * Finds the duplicate number in an array containing n+1 integers where each integer is between 1 and n
3 * Uses binary search on the value range (not array indices) with pigeonhole principle
4 * @param nums - Array containing numbers from 1 to n with one duplicate
5 * @returns The duplicate number in the array
6 */
7function findDuplicate(nums: number[]): number {
8 // Initialize binary search boundaries for the value range [1, n]
9 let left: number = 0;
10 let right: number = nums.length - 1;
11
12 // Binary search on the value range
13 while (left < right) {
14 // Calculate middle value of the current range
15 const middle: number = (left + right) >> 1; // Bit shift for integer division by 2
16
17 // Count how many numbers in the array are less than or equal to middle
18 let count: number = 0;
19 for (const value of nums) {
20 if (value <= middle) {
21 count++;
22 }
23 }
24
25 // By pigeonhole principle: if count > middle, duplicate is in range [1, middle]
26 // Otherwise, duplicate is in range [middle + 1, n]
27 if (count > middle) {
28 // Duplicate is in the lower half
29 right = middle;
30 } else {
31 // Duplicate is in the upper half
32 left = middle + 1;
33 }
34 }
35
36 // When left == right, we've found the duplicate number
37 return left;
38}
39
Time and Space Complexity
The time complexity is O(n × log n)
, where n
is the length of the array nums
. This is because:
- The
bisect_left
function performs binary search on the range[0, len(nums))
, which requiresO(log n)
iterations - For each iteration of the binary search, the key function
f(x)
is called, which computessum(v <= x for v in nums)
, takingO(n)
time to iterate through all elements innums
- Therefore, the total time complexity is
O(n) × O(log n) = O(n × log n)
The space complexity is O(1)
. This is because:
- The binary search only uses a constant amount of extra space for variables
- The function
f(x)
uses a generator expressionsum(v <= x for v in nums)
which doesn't create additional data structures, only iterating through the existing array - No additional arrays or data structures proportional to the input size are created
Common Pitfalls
1. Off-by-One Error in Binary Search Range
The most common pitfall is incorrectly setting up the binary search range. Since the valid numbers are [1, n]
but bisect_left
with range(len(nums))
gives us [0, n]
, the code might return incorrect results.
The Issue:
range(len(nums))
produces[0, 1, 2, ..., n]
- But our valid numbers are
[1, 2, ..., n]
- If the duplicate is
1
, the functionf(0)
would be checked, which doesn't make logical sense
Solution:
Use range(1, len(nums))
to search only valid values:
duplicate = bisect_left(range(1, len(nums)), True, key=count_less_than_or_equal)
2. Incorrect Count Comparison Logic
Another pitfall is using the wrong comparison in the counting function. Some might mistakenly use >=
instead of >
.
The Issue:
# Incorrect - this would fail for non-duplicate numbers
def count_less_than_or_equal(target: int) -> bool:
count = sum(1 for num in nums if num <= target)
return count >= target # Wrong! Should be >
Why it matters:
- For a non-duplicate number
x
, there are exactlyx
numbers≤ x
- Only when there's a duplicate in range
[1, x]
will the count exceedx
- Using
>=
would incorrectly flag all numbers as potential duplicates
3. Using Floyd's Cycle Detection Without Understanding Constraints
Many developers default to Floyd's cycle detection algorithm (tortoise and hare) because it's a classic solution for this problem. However, the given code uses binary search instead.
The Issue: Mixing approaches or trying to "optimize" by switching algorithms mid-implementation:
# Don't mix approaches - stick to one algorithm
def findDuplicate(self, nums: List[int]) -> int:
# Started with binary search setup
def count_less_than_or_equal(target: int) -> bool:
# ... binary search logic
# Then suddenly switching to cycle detection
slow = fast = nums[0] # This creates confusion!
Solution: Commit to one approach. Binary search is more intuitive here and doesn't require treating the array as a linked list.
4. Inefficient Counting Implementation
Using sum()
with a generator expression is elegant but can be misunderstood or implemented inefficiently.
Less Readable Alternative:
def count_less_than_or_equal(target: int) -> bool:
count = len([num for num in nums if num <= target]) # Creates unnecessary list
return count > target
Better Implementation:
def count_less_than_or_equal(target: int) -> bool:
count = 0
for num in nums:
if num <= target:
count += 1
if count > target: # Early termination optimization
return True
return False
This version can return early once we know the count exceeds the target, potentially saving iterations.
Which data structure is used to implement priority queue?
Recommended Readings
Tech Interview Pattern Two Pointers Introduction If you prefer videos here's a super quick introduction to Two Pointers div class responsive iframe iframe src https www youtube com embed xZ4AfXHQ1VQ title YouTube video player frameborder 0 allow accelerometer autoplay clipboard write encrypted media gyroscope picture in picture allowfullscreen iframe div Two pointers is a common interview
Binary Search Speedrun For each of the Speedrun questions you will be given a binary search related problem and a corresponding multiple choice question The multiple choice questions are related to the techniques and template s introduced in the binary search section It's recommended that you have gone through at
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Want a Structured Path to Master System Design Too? Don’t Miss This!