274. H-Index
Problem Description
You are given an array citations
where citations[i]
represents the number of citations a researcher received for their i-th paper. Your task is to find and return the researcher's h-index.
The h-index is defined as the maximum value of h
such that the researcher has published at least h
papers that have each been cited at least h
times.
For example, if a researcher has 5 papers with citations [3, 0, 6, 1, 5]
:
- They have 3 papers with at least 3 citations each (papers with 3, 6, and 5 citations)
- They have 2 papers with at least 4 citations each (papers with 6 and 5 citations)
- They have 2 papers with at least 5 citations each (papers with 6 and 5 citations)
- They have only 1 paper with at least 6 citations (the paper with 6 citations)
The h-index would be 3 because the researcher has 3 papers with at least 3 citations each, but does not have 4 papers with at least 4 citations each.
The solution approach sorts the citations in descending order, then checks from the highest possible h-value (which is the total number of papers) down to 1. For each potential h-value, it verifies if the h-th most cited paper has at least h
citations. The first h-value that satisfies this condition is returned as the h-index.
Intuition
The key insight is that if we have h
papers with at least h
citations each, then when we sort the papers by citation count in descending order, the h-th paper (at index h-1
) must have at least h
citations.
Think about it this way: if we line up all papers from most cited to least cited, and we want to find the maximum h
where we have h
papers with at least h
citations, we need to find the point where the paper's position number equals or exceeds its citation count.
For example, with sorted citations [6, 5, 3, 1, 0]
:
- Position 1: Paper has 6 citations, we have 1 paper with β₯1 citations β
- Position 2: Paper has 5 citations, we have 2 papers with β₯2 citations β
- Position 3: Paper has 3 citations, we have 3 papers with β₯3 citations β
- Position 4: Paper has 1 citation, we DON'T have 4 papers with β₯4 citations β
The h-index is 3, the last position where the condition holds.
By sorting in descending order, we guarantee that if the paper at position h
has at least h
citations, then all papers before it (which have more citations) also satisfy the requirement. This allows us to check from the largest possible h
value downward and return the first valid h
we find, ensuring we get the maximum h-index.
The beauty of this approach is that once sorted, we only need to check if citations[h-1] >= h
for each potential h
value, making the logic clean and straightforward.
Solution Approach
The implementation follows a straightforward sorting-based approach:
-
Sort the citations array in descending order: We use
citations.sort(reverse=True)
to arrange the papers from highest to lowest citation count. This ordering is crucial because it allows us to easily check if we have at leasth
papers withh
or more citations. -
Iterate through possible h-values from largest to smallest: We loop through
h
values starting fromlen(citations)
(the total number of papers, which is the maximum possible h-index) down to 1. We userange(len(citations), 0, -1)
to generate this sequence. -
Check the h-index condition: For each potential
h
value, we check ifcitations[h-1] >= h
. This condition verifies that the h-th most cited paper (at indexh-1
due to 0-based indexing) has at leasth
citations. If this paper has at leasth
citations, then all papers before it in the sorted array also have at leasth
citations, meaning we have found at leasth
papers withh
or more citations each. -
Return the first valid h-value: Since we're checking from the largest possible
h
downward, the firsth
that satisfies our condition is the maximum h-index. We immediately return this value. -
Handle the edge case: If no valid
h
is found in the loop (meaning even the most cited paper has 0 citations), we return 0.
The time complexity is O(n log n)
due to sorting, where n
is the number of papers. The space complexity is O(1)
if we're allowed to modify the input array in-place, or O(n)
if we need to preserve the original array.
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through the solution with a small example: citations = [3, 0, 6, 1, 5]
Step 1: Sort in descending order
After sorting: [6, 5, 3, 1, 0]
Step 2: Check each possible h-value from largest to smallest
-
h = 5: Check if
citations[4] >= 5
citations[4] = 0
, and0 < 5
β- We don't have 5 papers with at least 5 citations
-
h = 4: Check if
citations[3] >= 4
citations[3] = 1
, and1 < 4
β- We don't have 4 papers with at least 4 citations
-
h = 3: Check if
citations[2] >= 3
citations[2] = 3
, and3 >= 3
β- We have 3 papers (with citations 6, 5, 3) that each have at least 3 citations
- Return 3
The h-index is 3 because:
- Papers at indices 0, 1, 2 (citations: 6, 5, 3) all have β₯ 3 citations
- This gives us exactly 3 papers with at least 3 citations each
- We cannot have h = 4 because the 4th paper only has 1 citation
The key insight: when sorted descending, if the paper at position h has at least h citations, then all papers before it (with higher citations) automatically satisfy the requirement, giving us exactly h qualifying papers.
Solution Implementation
1class Solution:
2 def hIndex(self, citations: List[int]) -> int:
3 # Sort citations in descending order to check from highest to lowest
4 citations.sort(reverse=True)
5
6 # Try each possible h-index value from largest to smallest
7 # h represents the potential h-index value we're testing
8 for h in range(len(citations), 0, -1):
9 # Check if the h-th paper (1-indexed) has at least h citations
10 # citations[h-1] is the h-th paper when counting from 1
11 # If the h-th paper has >= h citations, then all papers before it
12 # also have >= h citations (since sorted in descending order)
13 if citations[h - 1] >= h:
14 return h
15
16 # If no valid h-index found, return 0
17 return 0
18
1class Solution {
2 public int hIndex(int[] citations) {
3 // Sort the citations array in ascending order
4 Arrays.sort(citations);
5
6 // Get the total number of papers
7 int n = citations.length;
8
9 // Try each possible h-index value from highest to lowest
10 for (int h = n; h > 0; h--) {
11 // Check if we have at least h papers with h or more citations
12 // citations[n - h] is the h-th paper from the end (after sorting)
13 // If this paper has at least h citations, then all papers after it
14 // (which are h papers total) also have at least h citations
15 if (citations[n - h] >= h) {
16 return h;
17 }
18 }
19
20 // If no valid h-index found, return 0
21 return 0;
22 }
23}
24
1class Solution {
2public:
3 int hIndex(vector<int>& citations) {
4 // Sort citations in descending order to check from highest to lowest
5 sort(citations.rbegin(), citations.rend());
6
7 // Try each possible h-index value from highest (n) to lowest (1)
8 // h represents the potential h-index value we're testing
9 for (int h = citations.size(); h > 0; --h) {
10 // Check if we have at least h papers with h or more citations
11 // citations[h-1] is the h-th paper (0-indexed)
12 // If the h-th paper has at least h citations, then all papers
13 // before it (1st to h-th) also have at least h citations
14 if (citations[h - 1] >= h) {
15 return h;
16 }
17 }
18
19 // No valid h-index found, return 0
20 return 0;
21 }
22};
23
1/**
2 * Calculates the H-index of a researcher based on their citation counts.
3 * H-index is the largest number h such that the researcher has h papers
4 * with at least h citations each.
5 *
6 * @param citations - Array of citation counts for each paper
7 * @returns The H-index value
8 */
9function hIndex(citations: number[]): number {
10 // Sort citations in descending order to check from highest citations first
11 citations.sort((a, b) => b - a);
12
13 // Try each possible h-value from largest to smallest
14 // h represents the potential H-index value
15 for (let h = citations.length; h > 0; h--) {
16 // Check if the h-th paper (0-indexed at h-1) has at least h citations
17 // If true, we found the maximum h where h papers have >= h citations
18 if (citations[h - 1] >= h) {
19 return h;
20 }
21 }
22
23 // If no valid H-index found, return 0
24 return 0;
25}
26
Time and Space Complexity
Time Complexity: O(n log n)
The dominant operation in this code is the sorting of the citations array using citations.sort(reverse=True)
, which has a time complexity of O(n log n)
where n
is the length of the citations array. The subsequent for loop iterates at most n
times with O(1)
operations in each iteration, contributing O(n)
to the overall complexity. Since O(n log n) + O(n) = O(n log n)
, the overall time complexity is O(n log n)
.
Space Complexity: O(log n)
Python's Timsort algorithm (used in the sort()
method) requires O(log n)
space for its recursion stack in the worst case. The sort operation is performed in-place, modifying the original array without creating a new one. The only additional variables used are the loop variable h
and the implicit iterator, which require O(1)
space. Therefore, the overall space complexity is O(log n)
.
Common Pitfalls
Pitfall 1: Index Out of Bounds Error
The most common mistake in this implementation is accessing citations[h-1]
when h
equals len(citations)
. When we have n
papers and check for h = n
, we're trying to access citations[n-1]
, which is valid. However, many developers mistakenly think they need to check if there are at least h
papers with h
citations by verifying all h
papers, leading to attempted access of indices beyond the array bounds.
Incorrect approach:
for h in range(len(citations), 0, -1):
# Trying to check if ALL h papers have at least h citations
valid = True
for i in range(h): # This will cause index error when h = len(citations)
if citations[i] < h:
valid = False
break
if valid:
return h
Correct approach: Simply check the h-th paper (at index h-1). If it has at least h citations, then due to the descending sort, all papers before it also have at least h citations.
Pitfall 2: Misunderstanding the H-Index Definition
Another common error is checking citations[h] >= h
instead of citations[h-1] >= h
. This happens when developers forget about 0-based indexing or misinterpret which paper to check.
Incorrect:
for h in range(len(citations), 0, -1):
if h < len(citations) and citations[h] >= h: # Wrong index!
return h
Correct:
for h in range(len(citations), 0, -1):
if citations[h-1] >= h: # h-1 because of 0-based indexing
return h
Pitfall 3: Not Handling Edge Cases Properly
Some implementations fail when all papers have 0 citations or when the array is empty.
Better solution with explicit edge case handling:
class Solution:
def hIndex(self, citations: List[int]) -> int:
if not citations: # Handle empty array
return 0
citations.sort(reverse=True)
# Alternative approach that's more intuitive
h_index = 0
for i, citation_count in enumerate(citations):
# i+1 represents the number of papers we've seen so far
# We can have an h-index of at most min(i+1, citation_count)
h_index = max(h_index, min(i + 1, citation_count))
return h_index
This alternative approach iterates through the sorted citations and keeps track of the maximum valid h-index found so far, which some find more intuitive than checking from largest to smallest h values.
What's the output of running the following function using input [30, 20, 10, 100, 33, 12]
?
1def fun(arr: List[int]) -> List[int]:
2 import heapq
3 heapq.heapify(arr)
4 res = []
5 for i in range(3):
6 res.append(heapq.heappop(arr))
7 return res
8
1public static int[] fun(int[] arr) {
2 int[] res = new int[3];
3 PriorityQueue<Integer> heap = new PriorityQueue<>();
4 for (int i = 0; i < arr.length; i++) {
5 heap.add(arr[i]);
6 }
7 for (int i = 0; i < 3; i++) {
8 res[i] = heap.poll();
9 }
10 return res;
11}
12
1class HeapItem {
2 constructor(item, priority = item) {
3 this.item = item;
4 this.priority = priority;
5 }
6}
7
8class MinHeap {
9 constructor() {
10 this.heap = [];
11 }
12
13 push(node) {
14 // insert the new node at the end of the heap array
15 this.heap.push(node);
16 // find the correct position for the new node
17 this.bubble_up();
18 }
19
20 bubble_up() {
21 let index = this.heap.length - 1;
22
23 while (index > 0) {
24 const element = this.heap[index];
25 const parentIndex = Math.floor((index - 1) / 2);
26 const parent = this.heap[parentIndex];
27
28 if (parent.priority <= element.priority) break;
29 // if the parent is bigger than the child then swap the parent and child
30 this.heap[index] = parent;
31 this.heap[parentIndex] = element;
32 index = parentIndex;
33 }
34 }
35
36 pop() {
37 const min = this.heap[0];
38 this.heap[0] = this.heap[this.size() - 1];
39 this.heap.pop();
40 this.bubble_down();
41 return min;
42 }
43
44 bubble_down() {
45 let index = 0;
46 let min = index;
47 const n = this.heap.length;
48
49 while (index < n) {
50 const left = 2 * index + 1;
51 const right = left + 1;
52
53 if (left < n && this.heap[left].priority < this.heap[min].priority) {
54 min = left;
55 }
56 if (right < n && this.heap[right].priority < this.heap[min].priority) {
57 min = right;
58 }
59 if (min === index) break;
60 [this.heap[min], this.heap[index]] = [this.heap[index], this.heap[min]];
61 index = min;
62 }
63 }
64
65 peek() {
66 return this.heap[0];
67 }
68
69 size() {
70 return this.heap.length;
71 }
72}
73
74function fun(arr) {
75 const heap = new MinHeap();
76 for (const x of arr) {
77 heap.push(new HeapItem(x));
78 }
79 const res = [];
80 for (let i = 0; i < 3; i++) {
81 res.push(heap.pop().item);
82 }
83 return res;
84}
85
Recommended Readings
Sorting Summary Comparisons We presented quite a few sorting algorithms and it is essential to know the advantages and disadvantages of each one The basic algorithms are easy to visualize and easy to learn for beginner programmers because of their simplicity As such they will suffice if you don't know any advanced
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Want a Structured Path to Master System Design Too? Donβt Miss This!