287. Find the Duplicate Number

MediumBit ManipulationArrayTwo PointersBinary Search

Problem Description

You are given an array nums containing n + 1 integers, where each integer is in the range [1, n] inclusive.

The array has exactly one repeated number that appears more than once. Your task is to find and return this duplicate number.

Constraints:

You cannot modify the input array nums
You must use only constant extra space (O(1) space complexity)

Example understanding: If n = 4, the array has 5 elements, and each element is between 1 and 4. Since there are 5 elements but only 4 possible distinct values, by the Pigeonhole Principle, at least one number must appear twice.

The solution uses binary search on the value range [1, n] rather than on array indices. For each mid-value x, it counts how many elements in the array are less than or equal to x. If this count is greater than x, it means there must be a duplicate in the range [1, x]. Otherwise, the duplicate is in the range [x+1, n].

The key insight is that without a duplicate, there should be exactly x numbers that are ≤ x. If we count more than x numbers that are ≤ x, then at least one number in the range [1, x] appears multiple times.

Quick Interview Experience

Help others by sharing your interview experience

Have you seen this problem before?

Intuition

The key observation starts with understanding what happens in a normal scenario without duplicates. If we have numbers from 1 to n with no duplicates, then for any number x in this range, there should be exactly x numbers that are less than or equal to x. For example, if x = 3, there should be exactly 3 numbers: 1, 2, 3.

But what happens when there's a duplicate? If a number in the range [1, x] appears twice, then when we count how many numbers are ≤ x, we'll get more than x. This is because the duplicate creates an "excess" count.

Think of it like this: imagine you're looking for a duplicate in [1, 2, 3, 3, 4]. If you ask "how many numbers are ≤ 3?", you get 4 numbers (1, 2, 3, 3), but in a normal case without duplicates, there should only be 3. This excess tells us the duplicate must be in the range [1, 3].

This observation naturally leads to a binary search approach on the value range rather than array indices. We can repeatedly divide the range [1, n] in half and check which half contains the duplicate by counting elements. If count(≤ mid) > mid, the duplicate is in [1, mid]; otherwise, it's in [mid+1, n].

The beauty of this approach is that we're using the mathematical property of the problem itself - the constraint that numbers are in range [1, n] - rather than needing to track array positions or modify the array. We're essentially using the Pigeonhole Principle in a binary search fashion to narrow down where the duplicate must be.

Solution Approach

The solution implements binary search on the value range [1, n] to find the duplicate number. Here's how the implementation works:

Binary Search Setup: We use bisect_left from Python's bisect module to perform the binary search. The search range is range(len(nums)), which represents values from 0 to n (since the array has n+1 elements).

Key Function: The helper function f(x) is the core logic:

def f(x: int) -> bool:
    return sum(v <= x for v in nums) > x

This function returns True if the count of elements ≤ x is greater than x, indicating that the duplicate is in the range [1, x].

How Binary Search Works Here:

bisect_left searches for the leftmost position where f(x) returns True
It starts with the middle value of the range
If f(mid) is True, it means the duplicate is in [1, mid], so we search the left half
If f(mid) is False, the duplicate is in [mid+1, n], so we search the right half
The search continues until we converge on the exact duplicate value

Example Walkthrough: For array [1, 3, 4, 2, 2]:

Check x = 2: count of elements ≤ 2 is 3 (which are 1, 2, 2). Since 3 > 2, f(2) = True
Check x = 1: count of elements ≤ 1 is 1. Since 1 = 1, f(1) = False
Binary search narrows down to x = 2 as the duplicate

Time and Space Complexity:

Time: O(n log n) - Binary search takes O(log n) iterations, and each iteration counts elements in O(n)
Space: O(1) - Only using a few variables, meeting the constant space requirement

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through the solution with array [1, 3, 4, 2, 2], where n = 4 (array has 5 elements).

Initial Setup:

Search range: [1, 4] (possible values)
We need to find which value appears twice

Binary Search Process:

Iteration 1:

mid = (1 + 4) / 2 = 2
Count elements ≤ 2: We find 1 and 2, 2 → count = 3
Is 3 > 2? Yes! So duplicate is in range [1, 2]
Update: right = 2

Iteration 2:

New range: [1, 2]
mid = (1 + 2) / 2 = 1
Count elements ≤ 1: We find only 1 → count = 1
Is 1 > 1? No! So duplicate is in range [2, 2]
Update: left = 2

Convergence:

left = right = 2
The duplicate number is 2

Verification: Let's verify our logic at each step:

When checking mid = 2: We counted 3 elements (1, 2, 2) that are ≤ 2. Without duplicates, there should only be 2 elements (1, 2). The excess tells us the duplicate is somewhere in [1, 2].
When checking mid = 1: We counted 1 element (1) that is ≤ 1. This is exactly what we'd expect without duplicates, so the duplicate must be greater than 1.

The algorithm efficiently narrows down the search space by half each time, converging on the duplicate value 2 without modifying the array or using extra space beyond a few variables.

Solution Implementation

1class Solution:
2    def findDuplicate(self, nums: List[int]) -> int:
3        """
4        Find the duplicate number in an array containing n+1 integers where each integer 
5        is between 1 and n inclusive, with exactly one duplicate.
6      
7        Uses binary search on the value range [1, n] to find the duplicate.
8        The key insight: if a number x is the duplicate (or greater than it), 
9        then count of numbers <= x will be greater than x.
10        """
11      
12        def count_less_than_or_equal(target: int) -> bool:
13            """
14            Check if the count of numbers less than or equal to target exceeds target.
15            This property holds true for the duplicate number and all numbers after it.
16          
17            Args:
18                target: The number to check
19              
20            Returns:
21                True if count of numbers <= target is greater than target
22            """
23            count = sum(1 for num in nums if num <= target)
24            return count > target
25      
26        # Binary search on the range [1, n] where n = len(nums) - 1
27        # We're searching for the first position where count_less_than_or_equal returns True
28        # range(len(nums)) creates [0, 1, 2, ..., len(nums)-1]
29        # Since we need [1, 2, ..., n], bisect_left will return the correct 1-indexed result
30        from bisect import bisect_left
31        duplicate = bisect_left(range(len(nums)), True, key=count_less_than_or_equal)
32      
33        return duplicate
34

1class Solution {
2    /**
3     * Finds the duplicate number in an array containing n+1 integers where each integer 
4     * is between 1 and n (inclusive). Uses binary search on the value range.
5     * 
6     * @param nums Array containing integers with exactly one duplicate
7     * @return The duplicate number
8     */
9    public int findDuplicate(int[] nums) {
10        // Binary search on the range of possible values [1, n]
11        int left = 1;
12        int right = nums.length - 1;
13      
14        while (left < right) {
15            // Calculate middle value of the current range
16            int mid = left + (right - left) / 2;
17          
18            // Count how many numbers in the array are less than or equal to mid
19            int count = 0;
20            for (int num : nums) {
21                if (num <= mid) {
22                    count++;
23                }
24            }
25          
26            // By pigeonhole principle: if count > mid, duplicate must be in [left, mid]
27            // Otherwise, duplicate must be in [mid + 1, right]
28            if (count > mid) {
29                // Duplicate is in the lower half
30                right = mid;
31            } else {
32                // Duplicate is in the upper half
33                left = mid + 1;
34            }
35        }
36      
37        // When left == right, we've found the duplicate
38        return left;
39    }
40}
41

1class Solution {
2public:
3    int findDuplicate(vector<int>& nums) {
4        // Binary search on the value range [1, n-1]
5        // where n is the size of the array
6        int left = 0;
7        int right = nums.size() - 1;
8      
9        while (left < right) {
10            // Calculate the middle value of the current range
11            int mid = (left + right) >> 1;  // Equivalent to (left + right) / 2
12          
13            // Count how many numbers in the array are less than or equal to mid
14            int count = 0;
15            for (int& value : nums) {
16                count += (value <= mid);
17            }
18          
19            // By Pigeonhole Principle:
20            // If count > mid, the duplicate must be in the range [1, mid]
21            // Otherwise, the duplicate must be in the range [mid+1, right]
22            if (count > mid) {
23                // Duplicate is in the left half, narrow the search range
24                right = mid;
25            } else {
26                // Duplicate is in the right half, narrow the search range
27                left = mid + 1;
28            }
29        }
30      
31        // When left == right, we've found the duplicate number
32        return left;
33    }
34};
35

1/**
2 * Finds the duplicate number in an array containing n+1 integers where each integer is between 1 and n
3 * Uses binary search on the value range (not array indices) with pigeonhole principle
4 * @param nums - Array containing numbers from 1 to n with one duplicate
5 * @returns The duplicate number in the array
6 */
7function findDuplicate(nums: number[]): number {
8    // Initialize binary search boundaries for the value range [1, n]
9    let left: number = 0;
10    let right: number = nums.length - 1;
11  
12    // Binary search on the value range
13    while (left < right) {
14        // Calculate middle value of the current range
15        const middle: number = (left + right) >> 1;  // Bit shift for integer division by 2
16      
17        // Count how many numbers in the array are less than or equal to middle
18        let count: number = 0;
19        for (const value of nums) {
20            if (value <= middle) {
21                count++;
22            }
23        }
24      
25        // By pigeonhole principle: if count > middle, duplicate is in range [1, middle]
26        // Otherwise, duplicate is in range [middle + 1, n]
27        if (count > middle) {
28            // Duplicate is in the lower half
29            right = middle;
30        } else {
31            // Duplicate is in the upper half
32            left = middle + 1;
33        }
34    }
35  
36    // When left == right, we've found the duplicate number
37    return left;
38}
39

Time and Space Complexity

The time complexity is O(n × log n), where n is the length of the array nums. This is because:

The bisect_left function performs binary search on the range [0, len(nums)), which requires O(log n) iterations
For each iteration of the binary search, the key function f(x) is called, which computes sum(v <= x for v in nums), taking O(n) time to iterate through all elements in nums
Therefore, the total time complexity is O(n) × O(log n) = O(n × log n)

The space complexity is O(1). This is because:

The binary search only uses a constant amount of extra space for variables
The function f(x) uses a generator expression sum(v <= x for v in nums) which doesn't create additional data structures, only iterating through the existing array
No additional arrays or data structures proportional to the input size are created

Common Pitfalls

1. Off-by-One Error in Binary Search Range

The most common pitfall is incorrectly setting up the binary search range. Since the valid numbers are [1, n] but bisect_left with range(len(nums)) gives us [0, n], the code might return incorrect results.

The Issue:

range(len(nums)) produces [0, 1, 2, ..., n]
But our valid numbers are [1, 2, ..., n]
If the duplicate is 1, the function f(0) would be checked, which doesn't make logical sense

Solution: Use range(1, len(nums)) to search only valid values:

duplicate = bisect_left(range(1, len(nums)), True, key=count_less_than_or_equal)

2. Incorrect Count Comparison Logic

Another pitfall is using the wrong comparison in the counting function. Some might mistakenly use >= instead of >.

The Issue:

# Incorrect - this would fail for non-duplicate numbers
def count_less_than_or_equal(target: int) -> bool:
    count = sum(1 for num in nums if num <= target)
    return count >= target  # Wrong! Should be >

Why it matters:

For a non-duplicate number x, there are exactly x numbers ≤ x
Only when there's a duplicate in range [1, x] will the count exceed x
Using >= would incorrectly flag all numbers as potential duplicates

3. Using Floyd's Cycle Detection Without Understanding Constraints

Many developers default to Floyd's cycle detection algorithm (tortoise and hare) because it's a classic solution for this problem. However, the given code uses binary search instead.

The Issue: Mixing approaches or trying to "optimize" by switching algorithms mid-implementation:

# Don't mix approaches - stick to one algorithm
def findDuplicate(self, nums: List[int]) -> int:
    # Started with binary search setup
    def count_less_than_or_equal(target: int) -> bool:
        # ... binary search logic
  
    # Then suddenly switching to cycle detection
    slow = fast = nums[0]  # This creates confusion!

Solution: Commit to one approach. Binary search is more intuitive here and doesn't require treating the array as a linked list.

4. Inefficient Counting Implementation

Using sum() with a generator expression is elegant but can be misunderstood or implemented inefficiently.

Less Readable Alternative:

def count_less_than_or_equal(target: int) -> bool:
    count = len([num for num in nums if num <= target])  # Creates unnecessary list
    return count > target

Better Implementation:

def count_less_than_or_equal(target: int) -> bool:
    count = 0
    for num in nums:
        if num <= target:
            count += 1
            if count > target:  # Early termination optimization
                return True
    return False

This version can return early once we know the count exceeds the target, potentially saving iterations.

Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

Which type of traversal does breadth first search do?

Pre-order traversal

Post-order traversal

In-order traversal

Level-order traversal

I don't know