Facebook Pixel

287. Find the Duplicate Number

Problem Description

You are given an array nums containing n + 1 integers, where each integer is in the range [1, n] inclusive.

The array has exactly one repeated number that appears more than once. Your task is to find and return this duplicate number.

Constraints:

  • You cannot modify the input array nums
  • You must use only constant extra space (O(1) space complexity)

Example understanding: If n = 4, the array has 5 elements, and each element is between 1 and 4. Since there are 5 elements but only 4 possible distinct values, by the Pigeonhole Principle, at least one number must appear twice.

The solution uses binary search on the value range [1, n] rather than on array indices. For each mid-value x, it counts how many elements in the array are less than or equal to x. If this count is greater than x, it means there must be a duplicate in the range [1, x]. Otherwise, the duplicate is in the range [x+1, n].

The key insight is that without a duplicate, there should be exactly x numbers that are ≤ x. If we count more than x numbers that are ≤ x, then at least one number in the range [1, x] appears multiple times.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key observation starts with understanding what happens in a normal scenario without duplicates. If we have numbers from 1 to n with no duplicates, then for any number x in this range, there should be exactly x numbers that are less than or equal to x. For example, if x = 3, there should be exactly 3 numbers: 1, 2, 3.

But what happens when there's a duplicate? If a number in the range [1, x] appears twice, then when we count how many numbers are ≤ x, we'll get more than x. This creates a pattern:

  • For values below the duplicate: count(≤ x) == x (normal, no excess)
  • For the duplicate value and above: count(≤ x) > x (excess due to duplicate)

This creates a false, false, ..., true, true pattern where feasible(x) = count(≤ x) > x.

For example, in [1, 3, 4, 2, 2] (duplicate is 2):

  • x = 1: count = 1, 1 > 1 is false
  • x = 2: count = 3, 3 > 2 is true (the duplicate!)
  • x = 3: count = 4, 4 > 3 is true
  • x = 4: count = 5, 5 > 4 is true

We want to find the first true - the first value where the count exceeds the expected amount. This is exactly what our binary search template does!

Solution Implementation

1class Solution:
2    def findDuplicate(self, nums: List[int]) -> int:
3        """
4        Find the duplicate number using the binary search template.
5        Feasible condition: count of numbers <= mid > mid
6        """
7        n = len(nums) - 1  # n is the max value (array has n+1 elements)
8
9        # Binary search template on value range [1, n]
10        left, right = 1, n
11        first_true_index = -1
12
13        while left <= right:
14            mid = (left + right) // 2
15
16            # Count how many numbers are <= mid
17            count = sum(1 for num in nums if num <= mid)
18
19            # Feasible: is there excess? (duplicate is <= mid)
20            if count > mid:
21                first_true_index = mid
22                right = mid - 1  # Search for smaller duplicate
23            else:
24                left = mid + 1
25
26        return first_true_index
27
1class Solution {
2    /**
3     * Find the duplicate number using the binary search template.
4     * Feasible condition: count of numbers <= mid > mid
5     */
6    public int findDuplicate(int[] nums) {
7        int n = nums.length - 1;  // n is the max value
8
9        // Binary search template on value range [1, n]
10        int left = 1;
11        int right = n;
12        int firstTrueIndex = -1;
13
14        while (left <= right) {
15            int mid = left + (right - left) / 2;
16
17            // Count how many numbers are <= mid
18            int count = 0;
19            for (int num : nums) {
20                if (num <= mid) {
21                    count++;
22                }
23            }
24
25            // Feasible: is there excess?
26            if (count > mid) {
27                firstTrueIndex = mid;
28                right = mid - 1;  // Search for smaller duplicate
29            } else {
30                left = mid + 1;
31            }
32        }
33
34        return firstTrueIndex;
35    }
36}
37
1class Solution {
2public:
3    int findDuplicate(vector<int>& nums) {
4        int n = nums.size() - 1;  // n is the max value
5
6        // Binary search template on value range [1, n]
7        int left = 1;
8        int right = n;
9        int firstTrueIndex = -1;
10
11        while (left <= right) {
12            int mid = left + (right - left) / 2;
13
14            // Count how many numbers are <= mid
15            int count = 0;
16            for (int num : nums) {
17                if (num <= mid) {
18                    count++;
19                }
20            }
21
22            // Feasible: is there excess?
23            if (count > mid) {
24                firstTrueIndex = mid;
25                right = mid - 1;  // Search for smaller duplicate
26            } else {
27                left = mid + 1;
28            }
29        }
30
31        return firstTrueIndex;
32    }
33};
34
1/**
2 * Find the duplicate number using the binary search template.
3 * Feasible condition: count of numbers <= mid > mid
4 */
5function findDuplicate(nums: number[]): number {
6    const n: number = nums.length - 1;  // n is the max value
7
8    // Binary search template on value range [1, n]
9    let left: number = 1;
10    let right: number = n;
11    let firstTrueIndex: number = -1;
12
13    while (left <= right) {
14        const mid: number = Math.floor((left + right) / 2);
15
16        // Count how many numbers are <= mid
17        let count: number = 0;
18        for (const num of nums) {
19            if (num <= mid) {
20                count++;
21            }
22        }
23
24        // Feasible: is there excess?
25        if (count > mid) {
26            firstTrueIndex = mid;
27            right = mid - 1;  // Search for smaller duplicate
28        } else {
29            left = mid + 1;
30        }
31    }
32
33    return firstTrueIndex;
34}
35

Solution Approach

We use the standard binary search template to find the duplicate. The key insight is defining the feasible function:

Feasible Function: feasible(x) = count of numbers ≤ x > x

This returns true when there's excess (meaning the duplicate is ≤ x), creating a false, false, ..., true, true pattern.

Template Implementation:

left, right = 1, n
first_true_index = -1

while left <= right:
    mid = (left + right) // 2
    count = sum(1 for num in nums if num <= mid)
    if count > mid:  # feasible condition
        first_true_index = mid
        right = mid - 1  # search for smaller duplicate
    else:
        left = mid + 1

return first_true_index

How it works:

  1. Initialize left = 1, right = n (value range, not indices), and first_true_index = -1
  2. Use while left <= right loop
  3. Count how many numbers are ≤ mid
  4. If count > mid, we found excess, so first_true_index = mid and search left with right = mid - 1
  5. If count ≤ mid, no excess yet, search right with left = mid + 1
  6. Return first_true_index as the duplicate number

Time and Space Complexity:

  • Time: O(n log n) - Binary search takes O(log n) iterations, each counting in O(n)
  • Space: O(1) - Only using a few variables

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through the solution with array [1, 3, 4, 2, 2] using the binary search template.

Initial Setup:

  • left = 1, right = 4 (value range)
  • first_true_index = -1
  • Feasible condition: count(≤ mid) > mid

Iteration 1:

  • left = 1, right = 4
  • mid = (1 + 4) // 2 = 2
  • Count elements ≤ 2: 1, 2, 2 → count = 3
  • Is 3 > 2? Yes! (feasible)
  • Update: first_true_index = 2, right = mid - 1 = 1

Iteration 2:

  • left = 1, right = 1
  • mid = (1 + 1) // 2 = 1
  • Count elements ≤ 1: 1 → count = 1
  • Is 1 > 1? No! (not feasible)
  • Update: left = mid + 1 = 2

Termination:

  • left = 2 > right = 1, exit loop
  • Return first_true_index = 2

Verification:

  • At mid = 2: count = 3 > 2, meaning there's excess in [1, 2], so the duplicate is 2 or less
  • At mid = 1: count = 1 = 1, no excess, so the duplicate is greater than 1
  • Therefore, the duplicate is exactly 2 ✓

Time and Space Complexity

The time complexity is O(n × log n), where n is the length of the array nums. This is because:

  • The bisect_left function performs binary search on the range [0, len(nums)), which requires O(log n) iterations
  • For each iteration of the binary search, the key function f(x) is called, which computes sum(v <= x for v in nums), taking O(n) time to iterate through all elements in nums
  • Therefore, the total time complexity is O(n) × O(log n) = O(n × log n)

The space complexity is O(1). This is because:

  • The binary search only uses a constant amount of extra space for variables
  • The function f(x) uses a generator expression sum(v <= x for v in nums) which doesn't create additional data structures, only iterating through the existing array
  • No additional arrays or data structures proportional to the input size are created

Common Pitfalls

1. Using Wrong Binary Search Template Variant

Pitfall: Using while left < right with right = mid instead of the standard template with while left <= right and right = mid - 1.

# WRONG - different template variant
while left < right:
    mid = (left + right) // 2
    if count > mid:
        right = mid
    else:
        left = mid + 1
return left

Solution: Use the standard binary search template with first_true_index:

# CORRECT - standard template
left, right = 1, n
first_true_index = -1

while left <= right:
    mid = (left + right) // 2
    count = sum(1 for num in nums if num <= mid)
    if count > mid:
        first_true_index = mid
        right = mid - 1
    else:
        left = mid + 1

return first_true_index

2. Wrong Search Range (Indices vs Values)

Pitfall: Using array indices [0, n] instead of value range [1, n].

# WRONG - searching indices
left, right = 0, len(nums) - 1

Solution: Search on the value range since we're looking for which value is duplicated:

# CORRECT - searching values
n = len(nums) - 1  # max value is n
left, right = 1, n

3. Incorrect Count Comparison

Pitfall: Using count >= mid instead of count > mid.

# WRONG - this would incorrectly flag non-duplicates
if count >= mid:  # Should be >

Why it matters:

  • For a non-duplicate value x, count(≤ x) = x exactly
  • Only when there's a duplicate does count(≤ x) > x
  • Using >= incorrectly flags all values as potential duplicates

4. Confusing with Floyd's Cycle Detection

Pitfall: This problem has two classic solutions - binary search and Floyd's cycle detection. Don't mix them.

Solution: The binary search approach (shown here) is more intuitive and directly uses the template. Floyd's cycle detection is a different approach that treats the array as a linked list.

Loading...
Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

Which technique can we use to find the middle of a linked list?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More