Facebook Pixel

287. Find the Duplicate Number

Problem Description

You are given an array nums containing n + 1 integers, where each integer is in the range [1, n] inclusive.

The array has exactly one repeated number that appears more than once. Your task is to find and return this duplicate number.

Constraints:

  • You cannot modify the input array nums
  • You must use only constant extra space (O(1) space complexity)

Example understanding: If n = 4, the array has 5 elements, and each element is between 1 and 4. Since there are 5 elements but only 4 possible distinct values, by the Pigeonhole Principle, at least one number must appear twice.

The solution uses binary search on the value range [1, n] rather than on array indices. For each mid-value x, it counts how many elements in the array are less than or equal to x. If this count is greater than x, it means there must be a duplicate in the range [1, x]. Otherwise, the duplicate is in the range [x+1, n].

The key insight is that without a duplicate, there should be exactly x numbers that are ≤ x. If we count more than x numbers that are ≤ x, then at least one number in the range [1, x] appears multiple times.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key observation starts with understanding what happens in a normal scenario without duplicates. If we have numbers from 1 to n with no duplicates, then for any number x in this range, there should be exactly x numbers that are less than or equal to x. For example, if x = 3, there should be exactly 3 numbers: 1, 2, 3.

But what happens when there's a duplicate? If a number in the range [1, x] appears twice, then when we count how many numbers are ≤ x, we'll get more than x. This creates a pattern:

  • For values below the duplicate: count(≤ x) == x (normal, no excess)
  • For the duplicate value and above: count(≤ x) > x (excess due to duplicate)

This creates a false, false, ..., true, true pattern where feasible(x) = count(≤ x) > x.

For example, in [1, 3, 4, 2, 2] (duplicate is 2):

  • x = 1: count = 1, 1 > 1 is false
  • x = 2: count = 3, 3 > 2 is true (the duplicate!)
  • x = 3: count = 4, 4 > 3 is true
  • x = 4: count = 5, 5 > 4 is true

We want to find the first true - the first value where the count exceeds the expected amount. This is exactly what our binary search template does!

Solution Approach

We use the standard binary search template to find the duplicate. The key insight is defining the feasible function:

Feasible Function: feasible(x) = count of numbers ≤ x > x

This returns true when there's excess (meaning the duplicate is ≤ x), creating a false, false, ..., true, true pattern.

Template Implementation:

left, right = 1, n
first_true_index = -1

while left <= right:
    mid = (left + right) // 2
    count = sum(1 for num in nums if num <= mid)
    if count > mid:  # feasible condition
        first_true_index = mid
        right = mid - 1  # search for smaller duplicate
    else:
        left = mid + 1

return first_true_index

How it works:

  1. Initialize left = 1, right = n (value range, not indices), and first_true_index = -1
  2. Use while left <= right loop
  3. Count how many numbers are ≤ mid
  4. If count > mid, we found excess, so first_true_index = mid and search left with right = mid - 1
  5. If count ≤ mid, no excess yet, search right with left = mid + 1
  6. Return first_true_index as the duplicate number

Time and Space Complexity:

  • Time: O(n log n) - Binary search takes O(log n) iterations, each counting in O(n)
  • Space: O(1) - Only using a few variables

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through the solution with array [1, 3, 4, 2, 2] using the binary search template.

Initial Setup:

  • left = 1, right = 4 (value range)
  • first_true_index = -1
  • Feasible condition: count(≤ mid) > mid

Iteration 1:

  • left = 1, right = 4
  • mid = (1 + 4) // 2 = 2
  • Count elements ≤ 2: 1, 2, 2 → count = 3
  • Is 3 > 2? Yes! (feasible)
  • Update: first_true_index = 2, right = mid - 1 = 1

Iteration 2:

  • left = 1, right = 1
  • mid = (1 + 1) // 2 = 1
  • Count elements ≤ 1: 1 → count = 1
  • Is 1 > 1? No! (not feasible)
  • Update: left = mid + 1 = 2

Termination:

  • left = 2 > right = 1, exit loop
  • Return first_true_index = 2

Verification:

  • At mid = 2: count = 3 > 2, meaning there's excess in [1, 2], so the duplicate is 2 or less
  • At mid = 1: count = 1 = 1, no excess, so the duplicate is greater than 1
  • Therefore, the duplicate is exactly 2 ✓

Solution Implementation

1class Solution:
2    def findDuplicate(self, nums: List[int]) -> int:
3        """
4        Find the duplicate number using the binary search template.
5        Feasible condition: count of numbers <= mid > mid
6        """
7        n = len(nums) - 1  # n is the max value (array has n+1 elements)
8
9        # Binary search template on value range [1, n]
10        left, right = 1, n
11        first_true_index = -1
12
13        while left <= right:
14            mid = (left + right) // 2
15
16            # Count how many numbers are <= mid
17            count = sum(1 for num in nums if num <= mid)
18
19            # Feasible: is there excess? (duplicate is <= mid)
20            if count > mid:
21                first_true_index = mid
22                right = mid - 1  # Search for smaller duplicate
23            else:
24                left = mid + 1
25
26        return first_true_index
27
1class Solution {
2    /**
3     * Find the duplicate number using the binary search template.
4     * Feasible condition: count of numbers <= mid > mid
5     */
6    public int findDuplicate(int[] nums) {
7        int n = nums.length - 1;  // n is the max value
8
9        // Binary search template on value range [1, n]
10        int left = 1;
11        int right = n;
12        int firstTrueIndex = -1;
13
14        while (left <= right) {
15            int mid = left + (right - left) / 2;
16
17            // Count how many numbers are <= mid
18            int count = 0;
19            for (int num : nums) {
20                if (num <= mid) {
21                    count++;
22                }
23            }
24
25            // Feasible: is there excess?
26            if (count > mid) {
27                firstTrueIndex = mid;
28                right = mid - 1;  // Search for smaller duplicate
29            } else {
30                left = mid + 1;
31            }
32        }
33
34        return firstTrueIndex;
35    }
36}
37
1class Solution {
2public:
3    int findDuplicate(vector<int>& nums) {
4        int n = nums.size() - 1;  // n is the max value
5
6        // Binary search template on value range [1, n]
7        int left = 1;
8        int right = n;
9        int firstTrueIndex = -1;
10
11        while (left <= right) {
12            int mid = left + (right - left) / 2;
13
14            // Count how many numbers are <= mid
15            int count = 0;
16            for (int num : nums) {
17                if (num <= mid) {
18                    count++;
19                }
20            }
21
22            // Feasible: is there excess?
23            if (count > mid) {
24                firstTrueIndex = mid;
25                right = mid - 1;  // Search for smaller duplicate
26            } else {
27                left = mid + 1;
28            }
29        }
30
31        return firstTrueIndex;
32    }
33};
34
1/**
2 * Find the duplicate number using the binary search template.
3 * Feasible condition: count of numbers <= mid > mid
4 */
5function findDuplicate(nums: number[]): number {
6    const n: number = nums.length - 1;  // n is the max value
7
8    // Binary search template on value range [1, n]
9    let left: number = 1;
10    let right: number = n;
11    let firstTrueIndex: number = -1;
12
13    while (left <= right) {
14        const mid: number = Math.floor((left + right) / 2);
15
16        // Count how many numbers are <= mid
17        let count: number = 0;
18        for (const num of nums) {
19            if (num <= mid) {
20                count++;
21            }
22        }
23
24        // Feasible: is there excess?
25        if (count > mid) {
26            firstTrueIndex = mid;
27            right = mid - 1;  // Search for smaller duplicate
28        } else {
29            left = mid + 1;
30        }
31    }
32
33    return firstTrueIndex;
34}
35

Time and Space Complexity

The time complexity is O(n × log n), where n is the length of the array nums. This is because:

  • The bisect_left function performs binary search on the range [0, len(nums)), which requires O(log n) iterations
  • For each iteration of the binary search, the key function f(x) is called, which computes sum(v <= x for v in nums), taking O(n) time to iterate through all elements in nums
  • Therefore, the total time complexity is O(n) × O(log n) = O(n × log n)

The space complexity is O(1). This is because:

  • The binary search only uses a constant amount of extra space for variables
  • The function f(x) uses a generator expression sum(v <= x for v in nums) which doesn't create additional data structures, only iterating through the existing array
  • No additional arrays or data structures proportional to the input size are created

Common Pitfalls

1. Using Wrong Binary Search Template Variant

Pitfall: Using while left < right with right = mid instead of the standard template with while left <= right and right = mid - 1.

# WRONG - different template variant
while left < right:
    mid = (left + right) // 2
    if count > mid:
        right = mid
    else:
        left = mid + 1
return left

Solution: Use the standard binary search template with first_true_index:

# CORRECT - standard template
left, right = 1, n
first_true_index = -1

while left <= right:
    mid = (left + right) // 2
    count = sum(1 for num in nums if num <= mid)
    if count > mid:
        first_true_index = mid
        right = mid - 1
    else:
        left = mid + 1

return first_true_index

2. Wrong Search Range (Indices vs Values)

Pitfall: Using array indices [0, n] instead of value range [1, n].

# WRONG - searching indices
left, right = 0, len(nums) - 1

Solution: Search on the value range since we're looking for which value is duplicated:

# CORRECT - searching values
n = len(nums) - 1  # max value is n
left, right = 1, n

3. Incorrect Count Comparison

Pitfall: Using count >= mid instead of count > mid.

# WRONG - this would incorrectly flag non-duplicates
if count >= mid:  # Should be >

Why it matters:

  • For a non-duplicate value x, count(≤ x) = x exactly
  • Only when there's a duplicate does count(≤ x) > x
  • Using >= incorrectly flags all values as potential duplicates

4. Confusing with Floyd's Cycle Detection

Pitfall: This problem has two classic solutions - binary search and Floyd's cycle detection. Don't mix them.

Solution: The binary search approach (shown here) is more intuitive and directly uses the template. Floyd's cycle detection is a different approach that treats the array as a linked list.

Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

In a binary min heap, the maximum element can be found in:


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More