287. Find the Duplicate Number
Problem Description
You are given an array nums containing n + 1 integers, where each integer is in the range [1, n] inclusive.
The array has exactly one repeated number that appears more than once. Your task is to find and return this duplicate number.
Constraints:
- You cannot modify the input array
nums - You must use only constant extra space (O(1) space complexity)
Example understanding:
If n = 4, the array has 5 elements, and each element is between 1 and 4. Since there are 5 elements but only 4 possible distinct values, by the Pigeonhole Principle, at least one number must appear twice.
The solution uses binary search on the value range [1, n] rather than on array indices. For each mid-value x, it counts how many elements in the array are less than or equal to x. If this count is greater than x, it means there must be a duplicate in the range [1, x]. Otherwise, the duplicate is in the range [x+1, n].
The key insight is that without a duplicate, there should be exactly x numbers that are ≤ x. If we count more than x numbers that are ≤ x, then at least one number in the range [1, x] appears multiple times.
Intuition
The key observation starts with understanding what happens in a normal scenario without duplicates. If we have numbers from 1 to n with no duplicates, then for any number x in this range, there should be exactly x numbers that are less than or equal to x. For example, if x = 3, there should be exactly 3 numbers: 1, 2, 3.
But what happens when there's a duplicate? If a number in the range [1, x] appears twice, then when we count how many numbers are ≤ x, we'll get more than x. This creates a pattern:
- For values below the duplicate:
count(≤ x) == x(normal, no excess) - For the duplicate value and above:
count(≤ x) > x(excess due to duplicate)
This creates a false, false, ..., true, true pattern where feasible(x) = count(≤ x) > x.
For example, in [1, 3, 4, 2, 2] (duplicate is 2):
x = 1: count = 1,1 > 1is falsex = 2: count = 3,3 > 2is true (the duplicate!)x = 3: count = 4,4 > 3is truex = 4: count = 5,5 > 4is true
We want to find the first true - the first value where the count exceeds the expected amount. This is exactly what our binary search template does!
Solution Implementation
1class Solution:
2 def findDuplicate(self, nums: List[int]) -> int:
3 """
4 Find the duplicate number using the binary search template.
5 Feasible condition: count of numbers <= mid > mid
6 """
7 n = len(nums) - 1 # n is the max value (array has n+1 elements)
8
9 # Binary search template on value range [1, n]
10 left, right = 1, n
11 first_true_index = -1
12
13 while left <= right:
14 mid = (left + right) // 2
15
16 # Count how many numbers are <= mid
17 count = sum(1 for num in nums if num <= mid)
18
19 # Feasible: is there excess? (duplicate is <= mid)
20 if count > mid:
21 first_true_index = mid
22 right = mid - 1 # Search for smaller duplicate
23 else:
24 left = mid + 1
25
26 return first_true_index
271class Solution {
2 /**
3 * Find the duplicate number using the binary search template.
4 * Feasible condition: count of numbers <= mid > mid
5 */
6 public int findDuplicate(int[] nums) {
7 int n = nums.length - 1; // n is the max value
8
9 // Binary search template on value range [1, n]
10 int left = 1;
11 int right = n;
12 int firstTrueIndex = -1;
13
14 while (left <= right) {
15 int mid = left + (right - left) / 2;
16
17 // Count how many numbers are <= mid
18 int count = 0;
19 for (int num : nums) {
20 if (num <= mid) {
21 count++;
22 }
23 }
24
25 // Feasible: is there excess?
26 if (count > mid) {
27 firstTrueIndex = mid;
28 right = mid - 1; // Search for smaller duplicate
29 } else {
30 left = mid + 1;
31 }
32 }
33
34 return firstTrueIndex;
35 }
36}
371class Solution {
2public:
3 int findDuplicate(vector<int>& nums) {
4 int n = nums.size() - 1; // n is the max value
5
6 // Binary search template on value range [1, n]
7 int left = 1;
8 int right = n;
9 int firstTrueIndex = -1;
10
11 while (left <= right) {
12 int mid = left + (right - left) / 2;
13
14 // Count how many numbers are <= mid
15 int count = 0;
16 for (int num : nums) {
17 if (num <= mid) {
18 count++;
19 }
20 }
21
22 // Feasible: is there excess?
23 if (count > mid) {
24 firstTrueIndex = mid;
25 right = mid - 1; // Search for smaller duplicate
26 } else {
27 left = mid + 1;
28 }
29 }
30
31 return firstTrueIndex;
32 }
33};
341/**
2 * Find the duplicate number using the binary search template.
3 * Feasible condition: count of numbers <= mid > mid
4 */
5function findDuplicate(nums: number[]): number {
6 const n: number = nums.length - 1; // n is the max value
7
8 // Binary search template on value range [1, n]
9 let left: number = 1;
10 let right: number = n;
11 let firstTrueIndex: number = -1;
12
13 while (left <= right) {
14 const mid: number = Math.floor((left + right) / 2);
15
16 // Count how many numbers are <= mid
17 let count: number = 0;
18 for (const num of nums) {
19 if (num <= mid) {
20 count++;
21 }
22 }
23
24 // Feasible: is there excess?
25 if (count > mid) {
26 firstTrueIndex = mid;
27 right = mid - 1; // Search for smaller duplicate
28 } else {
29 left = mid + 1;
30 }
31 }
32
33 return firstTrueIndex;
34}
35Solution Approach
We use the standard binary search template to find the duplicate. The key insight is defining the feasible function:
Feasible Function:
feasible(x) = count of numbers ≤ x > x
This returns true when there's excess (meaning the duplicate is ≤ x), creating a false, false, ..., true, true pattern.
Template Implementation:
left, right = 1, n
first_true_index = -1
while left <= right:
mid = (left + right) // 2
count = sum(1 for num in nums if num <= mid)
if count > mid: # feasible condition
first_true_index = mid
right = mid - 1 # search for smaller duplicate
else:
left = mid + 1
return first_true_index
How it works:
- Initialize
left = 1,right = n(value range, not indices), andfirst_true_index = -1 - Use
while left <= rightloop - Count how many numbers are
≤ mid - If
count > mid, we found excess, sofirst_true_index = midand search left withright = mid - 1 - If
count ≤ mid, no excess yet, search right withleft = mid + 1 - Return
first_true_indexas the duplicate number
Time and Space Complexity:
- Time:
O(n log n)- Binary search takesO(log n)iterations, each counting inO(n) - Space:
O(1)- Only using a few variables
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through the solution with array [1, 3, 4, 2, 2] using the binary search template.
Initial Setup:
left = 1,right = 4(value range)first_true_index = -1- Feasible condition:
count(≤ mid) > mid
Iteration 1:
left = 1,right = 4mid = (1 + 4) // 2 = 2- Count elements ≤ 2:
1, 2, 2→ count = 3 - Is
3 > 2? Yes! (feasible) - Update:
first_true_index = 2,right = mid - 1 = 1
Iteration 2:
left = 1,right = 1mid = (1 + 1) // 2 = 1- Count elements ≤ 1:
1→ count = 1 - Is
1 > 1? No! (not feasible) - Update:
left = mid + 1 = 2
Termination:
left = 2 > right = 1, exit loop- Return
first_true_index = 2
Verification:
- At
mid = 2: count = 3 > 2, meaning there's excess in[1, 2], so the duplicate is 2 or less - At
mid = 1: count = 1 = 1, no excess, so the duplicate is greater than 1 - Therefore, the duplicate is exactly 2 ✓
Time and Space Complexity
The time complexity is O(n × log n), where n is the length of the array nums. This is because:
- The
bisect_leftfunction performs binary search on the range[0, len(nums)), which requiresO(log n)iterations - For each iteration of the binary search, the key function
f(x)is called, which computessum(v <= x for v in nums), takingO(n)time to iterate through all elements innums - Therefore, the total time complexity is
O(n) × O(log n) = O(n × log n)
The space complexity is O(1). This is because:
- The binary search only uses a constant amount of extra space for variables
- The function
f(x)uses a generator expressionsum(v <= x for v in nums)which doesn't create additional data structures, only iterating through the existing array - No additional arrays or data structures proportional to the input size are created
Common Pitfalls
1. Using Wrong Binary Search Template Variant
Pitfall: Using while left < right with right = mid instead of the standard template with while left <= right and right = mid - 1.
# WRONG - different template variant while left < right: mid = (left + right) // 2 if count > mid: right = mid else: left = mid + 1 return left
Solution: Use the standard binary search template with first_true_index:
# CORRECT - standard template
left, right = 1, n
first_true_index = -1
while left <= right:
mid = (left + right) // 2
count = sum(1 for num in nums if num <= mid)
if count > mid:
first_true_index = mid
right = mid - 1
else:
left = mid + 1
return first_true_index
2. Wrong Search Range (Indices vs Values)
Pitfall: Using array indices [0, n] instead of value range [1, n].
# WRONG - searching indices
left, right = 0, len(nums) - 1
Solution: Search on the value range since we're looking for which value is duplicated:
# CORRECT - searching values
n = len(nums) - 1 # max value is n
left, right = 1, n
3. Incorrect Count Comparison
Pitfall: Using count >= mid instead of count > mid.
# WRONG - this would incorrectly flag non-duplicates if count >= mid: # Should be >
Why it matters:
- For a non-duplicate value
x, count(≤ x) = x exactly - Only when there's a duplicate does count(≤ x) > x
- Using
>=incorrectly flags all values as potential duplicates
4. Confusing with Floyd's Cycle Detection
Pitfall: This problem has two classic solutions - binary search and Floyd's cycle detection. Don't mix them.
Solution: The binary search approach (shown here) is more intuitive and directly uses the template. Floyd's cycle detection is a different approach that treats the array as a linked list.
Which technique can we use to find the middle of a linked list?
Recommended Readings
Tech Interview Pattern Two Pointers Introduction If you prefer videos here's a super quick introduction to Two Pointers div class responsive iframe iframe src https www youtube com embed xZ4AfXHQ1VQ title YouTube video player frameborder 0 allow accelerometer autoplay clipboard write encrypted media gyroscope picture in picture allowfullscreen iframe div Two pointers is a common interview
https assets algo monster cover_photos Binary_Search svg Binary Search Intuition Binary search is an efficient array search algorithm It works by narrowing down the search range by half each time If you have looked up a word in a physical dictionary you've already used binary search in real life Let's
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Want a Structured Path to Master System Design Too? Don’t Miss This!