275. H-Index II
Problem Description
You are given an array citations
where citations[i]
represents the number of citations a researcher received for their i-th paper. The array is sorted in non-descending order (ascending order).
Your task is to find the researcher's h-index.
The h-index is defined as the maximum value of h
such that the researcher has published at least h
papers that have each been cited at least h
times.
For example:
- If a researcher has 5 papers with citations [0, 1, 3, 5, 6], their h-index is 3 because:
- They have 3 papers with at least 3 citations each (papers with 3, 5, and 6 citations)
- They don't have 4 papers with at least 4 citations each
The key insight for the binary search approach is that we're looking for the maximum h
where at least h
papers have h
or more citations. Since the array is sorted, if we want to check if there are at least mid
papers with at least mid
citations, we need to verify that citations[n - mid] >= mid
. This works because:
- The last
mid
papers in the sorted array are from indexn - mid
ton - 1
- If
citations[n - mid] >= mid
, then all papers from indexn - mid
onwards have at leastmid
citations - This means we have at least
mid
papers with at leastmid
citations
The algorithm must run in logarithmic time complexity, which suggests using binary search to find the optimal h-index value.
Intuition
The key observation is that the h-index has a monotonic property. If a researcher has an h-index of x
, it means they have at least x
papers with x
or more citations. This automatically implies they also have at least y
papers with y
or more citations for any y < x
.
Think of it this way: if you have 5 papers with at least 5 citations each, you definitely also have 4 papers with at least 4 citations each, 3 papers with at least 3 citations each, and so on.
This monotonic property suggests we can use binary search. We're essentially searching for the maximum value of h
where the condition "at least h
papers have h
or more citations" is still true.
Since the array is sorted in ascending order, checking if we have at least h
papers with h
or more citations becomes straightforward:
- The papers with the highest citations are at the end of the array
- If we want
h
papers with at leasth
citations, we look at the lasth
papers - The last
h
papers start from indexn - h
- The minimum citation among these last
h
papers iscitations[n - h]
- So we need
citations[n - h] >= h
The binary search works by:
- Starting with a search range from 0 to n (the maximum possible h-index is n)
- Checking the middle value
mid
to see if it's a valid h-index - If
citations[n - mid] >= mid
, it meansmid
is a valid h-index, but there might be a larger one, so we search the right half - Otherwise,
mid
is too large to be an h-index, so we search the left half - We continue until we find the maximum valid h-index
Solution Approach
We implement a binary search algorithm to find the maximum h-index value. The search space is from 0 to n
(the length of the citations array), as the h-index cannot exceed the total number of papers.
The implementation follows these steps:
-
Initialize search boundaries: Set
left = 0
andright = n
, wheren
is the length of the citations array. -
Binary search loop: While
left < right
:- Calculate the midpoint:
mid = (left + right + 1) >> 1
- The
>> 1
operation is a bitwise right shift, equivalent to dividing by 2 - We use
(left + right + 1)
instead of(left + right)
to ensure the search converges correctly when searching for the maximum value
- The
- Calculate the midpoint:
-
Check validity condition: For each
mid
value, we check if it's a valid h-index:- We need at least
mid
papers with at leastmid
citations - Since the array is sorted, the last
mid
papers have the highest citations - These papers start from index
n - mid
- If
citations[n - mid] >= mid
, then all papers from indexn - mid
ton - 1
have at leastmid
citations
- We need at least
-
Update search boundaries:
- If
citations[n - mid] >= mid
: This meansmid
is a valid h-index, but there might be a larger one. Updateleft = mid
to search for larger values - Otherwise:
mid
is too large to be a valid h-index. Updateright = mid - 1
to search for smaller values
- If
-
Return result: When the loop ends,
left
contains the maximum h-index value.
The algorithm achieves O(log n)
time complexity through binary search, meeting the problem's requirement for logarithmic time. The space complexity is O(1)
as we only use a constant amount of extra space for the pointers.
Ready to land your dream job?
Unlock your dream job with a 3-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through the solution with a concrete example: citations = [0, 1, 3, 5, 6]
We need to find the maximum h-index where at least h
papers have h
or more citations.
Initial Setup:
- Array length
n = 5
- Search range:
left = 0
,right = 5
Iteration 1:
- Calculate
mid = (0 + 5 + 1) >> 1 = 3
- Check if h-index of 3 is valid:
- We need at least 3 papers with at least 3 citations
- The last 3 papers start at index
n - mid = 5 - 3 = 2
- Check
citations[2] >= 3
: Is3 >= 3
? Yes! - This means papers at indices 2, 3, 4 (with citations 3, 5, 6) all have at least 3 citations
- Since valid, search for larger h-index:
left = 3
,right = 5
Iteration 2:
- Calculate
mid = (3 + 5 + 1) >> 1 = 4
- Check if h-index of 4 is valid:
- We need at least 4 papers with at least 4 citations
- The last 4 papers start at index
n - mid = 5 - 4 = 1
- Check
citations[1] >= 4
: Is1 >= 4
? No! - The paper at index 1 only has 1 citation, which is less than 4
- Since invalid, search for smaller h-index:
left = 3
,right = 3
Termination:
left == right
, so we exit the loop- Return
left = 3
Verification: The h-index is 3 because:
- Papers with citations [3, 5, 6] (3 papers) each have at least 3 citations ✓
- We cannot have h-index of 4 because we don't have 4 papers with at least 4 citations ✗
The binary search efficiently found the maximum h-index in just 2 iterations instead of checking all possible values from 0 to 5.
Solution Implementation
1class Solution:
2 def hIndex(self, citations: List[int]) -> int:
3 """
4 Calculate the H-Index from a sorted list of citations.
5
6 H-Index is defined as the maximum value h such that the researcher has
7 published at least h papers that have each been cited at least h times.
8
9 Args:
10 citations: A sorted list of citation counts in ascending order
11
12 Returns:
13 The H-Index value
14 """
15 n = len(citations)
16
17 # Binary search for the maximum h-index
18 # We're searching for the largest h where citations[n-h] >= h
19 left, right = 0, n
20
21 while left < right:
22 # Calculate mid point, using (left + right + 1) // 2 to bias toward right
23 # This ensures we check the upper half when the range has even length
24 mid = (left + right + 1) // 2
25
26 # Check if we have at least 'mid' papers with >= 'mid' citations
27 # citations[n - mid] is the (n - mid + 1)-th smallest citation count
28 # If this value >= mid, then we have at least mid papers with >= mid citations
29 if citations[n - mid] >= mid:
30 # Valid h-index found, try to find a larger one
31 left = mid
32 else:
33 # Current mid is too large, search in the lower half
34 right = mid - 1
35
36 return left
37
1class Solution {
2 /**
3 * Calculates the H-index using binary search on a sorted citations array.
4 * H-index is defined as the maximum value h such that the researcher has
5 * published at least h papers with h or more citations each.
6 *
7 * @param citations sorted array of citation counts in ascending order
8 * @return the H-index value
9 */
10 public int hIndex(int[] citations) {
11 int arrayLength = citations.length;
12 int leftBoundary = 0;
13 int rightBoundary = arrayLength;
14
15 // Binary search to find the leftmost position where
16 // citations[position] >= papers count from this position to end
17 while (leftBoundary < rightBoundary) {
18 // Calculate middle index using unsigned right shift to avoid overflow
19 int middleIndex = (leftBoundary + rightBoundary) >>> 1;
20
21 // Check if current paper has enough citations
22 // arrayLength - middleIndex represents the number of papers
23 // with citations >= citations[middleIndex]
24 if (citations[middleIndex] >= arrayLength - middleIndex) {
25 // Move right boundary to search in left half
26 rightBoundary = middleIndex;
27 } else {
28 // Move left boundary to search in right half
29 leftBoundary = middleIndex + 1;
30 }
31 }
32
33 // The H-index is the number of papers with sufficient citations
34 return arrayLength - leftBoundary;
35 }
36}
37
1class Solution {
2public:
3 int hIndex(vector<int>& citations) {
4 // Get the size of the citations array
5 int n = citations.size();
6
7 // Initialize binary search boundaries
8 // left: minimum possible h-index (0)
9 // right: maximum possible h-index (n)
10 int left = 0, right = n;
11
12 // Binary search for the maximum h-index
13 while (left < right) {
14 // Calculate mid point (using +1 to bias towards right when even)
15 // This ensures we check the upper half when the range is even
16 int mid = (left + right + 1) >> 1;
17
18 // Check if we have at least 'mid' papers with >= 'mid' citations
19 // citations[n - mid] is the paper at position (n - mid) in sorted array
20 // If this paper has >= mid citations, then all papers from index (n - mid)
21 // to (n - 1) have >= mid citations, giving us exactly 'mid' such papers
22 if (citations[n - mid] >= mid) {
23 // Valid h-index found, try to find a larger one
24 left = mid;
25 } else {
26 // h-index too large, search in the lower half
27 right = mid - 1;
28 }
29 }
30
31 // Return the maximum valid h-index found
32 return left;
33 }
34};
35
1/**
2 * Calculate the H-index from a sorted array of citations.
3 * H-index is defined as the maximum value h such that the researcher has
4 * published at least h papers that have each been cited at least h times.
5 *
6 * @param citations - A sorted array of citation counts in ascending order
7 * @returns The H-index value
8 */
9function hIndex(citations: number[]): number {
10 const totalPapers: number = citations.length;
11
12 // Binary search boundaries
13 // left: minimum possible h-index value
14 // right: maximum possible h-index value
15 let left: number = 0;
16 let right: number = totalPapers;
17
18 // Binary search to find the maximum h-index
19 while (left < right) {
20 // Calculate mid point, using (left + right + 1) >> 1 to round up
21 // This ensures we check the upper half when the range has even length
22 const mid: number = (left + right + 1) >> 1;
23
24 // Check if we have at least 'mid' papers with at least 'mid' citations
25 // citations[totalPapers - mid] is the citation count of the 'mid'-th highest cited paper
26 if (citations[totalPapers - mid] >= mid) {
27 // Valid h-index found, try to find a larger one
28 left = mid;
29 } else {
30 // h-index too large, search in the lower half
31 right = mid - 1;
32 }
33 }
34
35 // Return the maximum valid h-index found
36 return left;
37}
38
Time and Space Complexity
The time complexity is O(log n)
, where n
is the length of the array citations
. This is because the algorithm uses binary search to find the h-index. In each iteration of the while loop, the search space is reduced by half (either left = mid
or right = mid - 1
), resulting in at most log n
iterations.
The space complexity is O(1)
. The algorithm only uses a constant amount of extra space for the variables n
, left
, right
, and mid
, regardless of the input size. No additional data structures that scale with the input are created.
Common Pitfalls
1. Incorrect Midpoint Calculation in Binary Search
Pitfall: Using mid = (left + right) // 2
instead of mid = (left + right + 1) // 2
when searching for the maximum value.
When searching for the maximum valid h-index, if we use the standard midpoint calculation, the binary search can get stuck in an infinite loop. Consider when left = 4
and right = 5
:
- With
mid = (4 + 5) // 2 = 4
, if the condition is satisfied, we setleft = mid = 4
, resulting in no progress - The loop continues indefinitely with the same values
Solution: Use mid = (left + right + 1) // 2
to bias toward the right when the range has even length. This ensures the search space always decreases:
mid = (left + right + 1) // 2 # Correct for finding maximum
2. Array Index Out of Bounds
Pitfall: Not handling edge cases where mid = 0
or mid = n
, leading to invalid array access at citations[n - mid]
.
When mid = 0
, we're checking if there are at least 0 papers with 0+ citations (always true for non-empty arrays). When mid = n
, we check citations[0]
, which might cause confusion about what we're actually validating.
Solution: Add boundary checks or handle the edge cases explicitly:
if mid == 0: return 0 # h-index of 0 is always valid if mid > 0 and citations[n - mid] >= mid: left = mid
3. Misunderstanding the Condition Check
Pitfall: Using citations[mid] >= n - mid
instead of citations[n - mid] >= mid
.
Some might incorrectly think about checking from the beginning of the array. The confusion arises from trying to check "if the paper at position mid
has enough citations for the remaining papers."
Solution: Remember that we need the last mid
papers (highest citations) to each have at least mid
citations:
# Correct: Check if the (n-mid)th paper has at least mid citations if citations[n - mid] >= mid: # We have at least mid papers with >= mid citations
4. Off-by-One Error in Search Range
Pitfall: Initializing right = n - 1
instead of right = n
.
The h-index can be equal to the total number of papers n
. For example, if all 5 papers have 5+ citations each, the h-index is 5. Setting right = n - 1
would miss this valid case.
Solution: Initialize the search range correctly:
left, right = 0, n # Include n as a possible h-index value
What's the output of running the following function using input [30, 20, 10, 100, 33, 12]
?
1def fun(arr: List[int]) -> List[int]:
2 import heapq
3 heapq.heapify(arr)
4 res = []
5 for i in range(3):
6 res.append(heapq.heappop(arr))
7 return res
8
1public static int[] fun(int[] arr) {
2 int[] res = new int[3];
3 PriorityQueue<Integer> heap = new PriorityQueue<>();
4 for (int i = 0; i < arr.length; i++) {
5 heap.add(arr[i]);
6 }
7 for (int i = 0; i < 3; i++) {
8 res[i] = heap.poll();
9 }
10 return res;
11}
12
1class HeapItem {
2 constructor(item, priority = item) {
3 this.item = item;
4 this.priority = priority;
5 }
6}
7
8class MinHeap {
9 constructor() {
10 this.heap = [];
11 }
12
13 push(node) {
14 // insert the new node at the end of the heap array
15 this.heap.push(node);
16 // find the correct position for the new node
17 this.bubble_up();
18 }
19
20 bubble_up() {
21 let index = this.heap.length - 1;
22
23 while (index > 0) {
24 const element = this.heap[index];
25 const parentIndex = Math.floor((index - 1) / 2);
26 const parent = this.heap[parentIndex];
27
28 if (parent.priority <= element.priority) break;
29 // if the parent is bigger than the child then swap the parent and child
30 this.heap[index] = parent;
31 this.heap[parentIndex] = element;
32 index = parentIndex;
33 }
34 }
35
36 pop() {
37 const min = this.heap[0];
38 this.heap[0] = this.heap[this.size() - 1];
39 this.heap.pop();
40 this.bubble_down();
41 return min;
42 }
43
44 bubble_down() {
45 let index = 0;
46 let min = index;
47 const n = this.heap.length;
48
49 while (index < n) {
50 const left = 2 * index + 1;
51 const right = left + 1;
52
53 if (left < n && this.heap[left].priority < this.heap[min].priority) {
54 min = left;
55 }
56 if (right < n && this.heap[right].priority < this.heap[min].priority) {
57 min = right;
58 }
59 if (min === index) break;
60 [this.heap[min], this.heap[index]] = [this.heap[index], this.heap[min]];
61 index = min;
62 }
63 }
64
65 peek() {
66 return this.heap[0];
67 }
68
69 size() {
70 return this.heap.length;
71 }
72}
73
74function fun(arr) {
75 const heap = new MinHeap();
76 for (const x of arr) {
77 heap.push(new HeapItem(x));
78 }
79 const res = [];
80 for (let i = 0; i < 3; i++) {
81 res.push(heap.pop().item);
82 }
83 return res;
84}
85
Recommended Readings
Binary Search Speedrun For each of the Speedrun questions you will be given a binary search related problem and a corresponding multiple choice question The multiple choice questions are related to the techniques and template s introduced in the binary search section It's recommended that you have gone through at
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Want a Structured Path to Master System Design Too? Don’t Miss This!