Facebook Pixel

274. H-Index

MediumArrayCounting SortSorting
Leetcode Link

Problem Description

You are given an array citations where citations[i] represents the number of citations a researcher received for their i-th paper. Your task is to find and return the researcher's h-index.

The h-index is defined as the maximum value of h such that the researcher has published at least h papers that have each been cited at least h times.

For example, if a researcher has 5 papers with citations [3, 0, 6, 1, 5]:

  • They have 3 papers with at least 3 citations each (papers with 3, 6, and 5 citations)
  • They have 2 papers with at least 4 citations each (papers with 6 and 5 citations)
  • They have 2 papers with at least 5 citations each (papers with 6 and 5 citations)
  • They have only 1 paper with at least 6 citations (the paper with 6 citations)

The h-index would be 3 because the researcher has 3 papers with at least 3 citations each, but does not have 4 papers with at least 4 citations each.

The solution approach sorts the citations in descending order, then checks from the highest possible h-value (which is the total number of papers) down to 1. For each potential h-value, it verifies if the h-th most cited paper has at least h citations. The first h-value that satisfies this condition is returned as the h-index.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key insight is that if we have h papers with at least h citations each, then when we sort the papers by citation count in descending order, the h-th paper (at index h-1) must have at least h citations.

Think about it this way: if we line up all papers from most cited to least cited, and we want to find the maximum h where we have h papers with at least h citations, we need to find the point where the paper's position number equals or exceeds its citation count.

For example, with sorted citations [6, 5, 3, 1, 0]:

  • Position 1: Paper has 6 citations, we have 1 paper with β‰₯1 citations βœ“
  • Position 2: Paper has 5 citations, we have 2 papers with β‰₯2 citations βœ“
  • Position 3: Paper has 3 citations, we have 3 papers with β‰₯3 citations βœ“
  • Position 4: Paper has 1 citation, we DON'T have 4 papers with β‰₯4 citations βœ—

The h-index is 3, the last position where the condition holds.

By sorting in descending order, we guarantee that if the paper at position h has at least h citations, then all papers before it (which have more citations) also satisfy the requirement. This allows us to check from the largest possible h value downward and return the first valid h we find, ensuring we get the maximum h-index.

The beauty of this approach is that once sorted, we only need to check if citations[h-1] >= h for each potential h value, making the logic clean and straightforward.

Solution Approach

The implementation follows a straightforward sorting-based approach:

  1. Sort the citations array in descending order: We use citations.sort(reverse=True) to arrange the papers from highest to lowest citation count. This ordering is crucial because it allows us to easily check if we have at least h papers with h or more citations.

  2. Iterate through possible h-values from largest to smallest: We loop through h values starting from len(citations) (the total number of papers, which is the maximum possible h-index) down to 1. We use range(len(citations), 0, -1) to generate this sequence.

  3. Check the h-index condition: For each potential h value, we check if citations[h-1] >= h. This condition verifies that the h-th most cited paper (at index h-1 due to 0-based indexing) has at least h citations. If this paper has at least h citations, then all papers before it in the sorted array also have at least h citations, meaning we have found at least h papers with h or more citations each.

  4. Return the first valid h-value: Since we're checking from the largest possible h downward, the first h that satisfies our condition is the maximum h-index. We immediately return this value.

  5. Handle the edge case: If no valid h is found in the loop (meaning even the most cited paper has 0 citations), we return 0.

The time complexity is O(n log n) due to sorting, where n is the number of papers. The space complexity is O(1) if we're allowed to modify the input array in-place, or O(n) if we need to preserve the original array.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through the solution with a small example: citations = [3, 0, 6, 1, 5]

Step 1: Sort in descending order After sorting: [6, 5, 3, 1, 0]

Step 2: Check each possible h-value from largest to smallest

  • h = 5: Check if citations[4] >= 5

    • citations[4] = 0, and 0 < 5 βœ—
    • We don't have 5 papers with at least 5 citations
  • h = 4: Check if citations[3] >= 4

    • citations[3] = 1, and 1 < 4 βœ—
    • We don't have 4 papers with at least 4 citations
  • h = 3: Check if citations[2] >= 3

    • citations[2] = 3, and 3 >= 3 βœ“
    • We have 3 papers (with citations 6, 5, 3) that each have at least 3 citations
    • Return 3

The h-index is 3 because:

  • Papers at indices 0, 1, 2 (citations: 6, 5, 3) all have β‰₯ 3 citations
  • This gives us exactly 3 papers with at least 3 citations each
  • We cannot have h = 4 because the 4th paper only has 1 citation

The key insight: when sorted descending, if the paper at position h has at least h citations, then all papers before it (with higher citations) automatically satisfy the requirement, giving us exactly h qualifying papers.

Solution Implementation

1class Solution:
2    def hIndex(self, citations: List[int]) -> int:
3        # Sort citations in descending order to check from highest to lowest
4        citations.sort(reverse=True)
5      
6        # Try each possible h-index value from largest to smallest
7        # h represents the potential h-index value we're testing
8        for h in range(len(citations), 0, -1):
9            # Check if the h-th paper (1-indexed) has at least h citations
10            # citations[h-1] is the h-th paper when counting from 1
11            # If the h-th paper has >= h citations, then all papers before it
12            # also have >= h citations (since sorted in descending order)
13            if citations[h - 1] >= h:
14                return h
15      
16        # If no valid h-index found, return 0
17        return 0
18
1class Solution {
2    public int hIndex(int[] citations) {
3        // Sort the citations array in ascending order
4        Arrays.sort(citations);
5      
6        // Get the total number of papers
7        int n = citations.length;
8      
9        // Try each possible h-index value from highest to lowest
10        for (int h = n; h > 0; h--) {
11            // Check if we have at least h papers with h or more citations
12            // citations[n - h] is the h-th paper from the end (after sorting)
13            // If this paper has at least h citations, then all papers after it
14            // (which are h papers total) also have at least h citations
15            if (citations[n - h] >= h) {
16                return h;
17            }
18        }
19      
20        // If no valid h-index found, return 0
21        return 0;
22    }
23}
24
1class Solution {
2public:
3    int hIndex(vector<int>& citations) {
4        // Sort citations in descending order to check from highest to lowest
5        sort(citations.rbegin(), citations.rend());
6      
7        // Try each possible h-index value from highest (n) to lowest (1)
8        // h represents the potential h-index value we're testing
9        for (int h = citations.size(); h > 0; --h) {
10            // Check if we have at least h papers with h or more citations
11            // citations[h-1] is the h-th paper (0-indexed)
12            // If the h-th paper has at least h citations, then all papers
13            // before it (1st to h-th) also have at least h citations
14            if (citations[h - 1] >= h) {
15                return h;
16            }
17        }
18      
19        // No valid h-index found, return 0
20        return 0;
21    }
22};
23
1/**
2 * Calculates the H-index of a researcher based on their citation counts.
3 * H-index is the largest number h such that the researcher has h papers 
4 * with at least h citations each.
5 * 
6 * @param citations - Array of citation counts for each paper
7 * @returns The H-index value
8 */
9function hIndex(citations: number[]): number {
10    // Sort citations in descending order to check from highest citations first
11    citations.sort((a, b) => b - a);
12  
13    // Try each possible h-value from largest to smallest
14    // h represents the potential H-index value
15    for (let h = citations.length; h > 0; h--) {
16        // Check if the h-th paper (0-indexed at h-1) has at least h citations
17        // If true, we found the maximum h where h papers have >= h citations
18        if (citations[h - 1] >= h) {
19            return h;
20        }
21    }
22  
23    // If no valid H-index found, return 0
24    return 0;
25}
26

Time and Space Complexity

Time Complexity: O(n log n)

The dominant operation in this code is the sorting of the citations array using citations.sort(reverse=True), which has a time complexity of O(n log n) where n is the length of the citations array. The subsequent for loop iterates at most n times with O(1) operations in each iteration, contributing O(n) to the overall complexity. Since O(n log n) + O(n) = O(n log n), the overall time complexity is O(n log n).

Space Complexity: O(log n)

Python's Timsort algorithm (used in the sort() method) requires O(log n) space for its recursion stack in the worst case. The sort operation is performed in-place, modifying the original array without creating a new one. The only additional variables used are the loop variable h and the implicit iterator, which require O(1) space. Therefore, the overall space complexity is O(log n).

Common Pitfalls

Pitfall 1: Index Out of Bounds Error

The most common mistake in this implementation is accessing citations[h-1] when h equals len(citations). When we have n papers and check for h = n, we're trying to access citations[n-1], which is valid. However, many developers mistakenly think they need to check if there are at least h papers with h citations by verifying all h papers, leading to attempted access of indices beyond the array bounds.

Incorrect approach:

for h in range(len(citations), 0, -1):
    # Trying to check if ALL h papers have at least h citations
    valid = True
    for i in range(h):  # This will cause index error when h = len(citations)
        if citations[i] < h:
            valid = False
            break
    if valid:
        return h

Correct approach: Simply check the h-th paper (at index h-1). If it has at least h citations, then due to the descending sort, all papers before it also have at least h citations.

Pitfall 2: Misunderstanding the H-Index Definition

Another common error is checking citations[h] >= h instead of citations[h-1] >= h. This happens when developers forget about 0-based indexing or misinterpret which paper to check.

Incorrect:

for h in range(len(citations), 0, -1):
    if h < len(citations) and citations[h] >= h:  # Wrong index!
        return h

Correct:

for h in range(len(citations), 0, -1):
    if citations[h-1] >= h:  # h-1 because of 0-based indexing
        return h

Pitfall 3: Not Handling Edge Cases Properly

Some implementations fail when all papers have 0 citations or when the array is empty.

Better solution with explicit edge case handling:

class Solution:
    def hIndex(self, citations: List[int]) -> int:
        if not citations:  # Handle empty array
            return 0
          
        citations.sort(reverse=True)
      
        # Alternative approach that's more intuitive
        h_index = 0
        for i, citation_count in enumerate(citations):
            # i+1 represents the number of papers we've seen so far
            # We can have an h-index of at most min(i+1, citation_count)
            h_index = max(h_index, min(i + 1, citation_count))
      
        return h_index

This alternative approach iterates through the sorted citations and keeps track of the maximum valid h-index found so far, which some find more intuitive than checking from largest to smallest h values.

Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

What's the output of running the following function using input [30, 20, 10, 100, 33, 12]?

1def fun(arr: List[int]) -> List[int]:
2    import heapq
3    heapq.heapify(arr)
4    res = []
5    for i in range(3):
6        res.append(heapq.heappop(arr))
7    return res
8
1public static int[] fun(int[] arr) {
2    int[] res = new int[3];
3    PriorityQueue<Integer> heap = new PriorityQueue<>();
4    for (int i = 0; i < arr.length; i++) {
5        heap.add(arr[i]);
6    }
7    for (int i = 0; i < 3; i++) {
8        res[i] = heap.poll();
9    }
10    return res;
11}
12
1class HeapItem {
2    constructor(item, priority = item) {
3        this.item = item;
4        this.priority = priority;
5    }
6}
7
8class MinHeap {
9    constructor() {
10        this.heap = [];
11    }
12
13    push(node) {
14        // insert the new node at the end of the heap array
15        this.heap.push(node);
16        // find the correct position for the new node
17        this.bubble_up();
18    }
19
20    bubble_up() {
21        let index = this.heap.length - 1;
22
23        while (index > 0) {
24            const element = this.heap[index];
25            const parentIndex = Math.floor((index - 1) / 2);
26            const parent = this.heap[parentIndex];
27
28            if (parent.priority <= element.priority) break;
29            // if the parent is bigger than the child then swap the parent and child
30            this.heap[index] = parent;
31            this.heap[parentIndex] = element;
32            index = parentIndex;
33        }
34    }
35
36    pop() {
37        const min = this.heap[0];
38        this.heap[0] = this.heap[this.size() - 1];
39        this.heap.pop();
40        this.bubble_down();
41        return min;
42    }
43
44    bubble_down() {
45        let index = 0;
46        let min = index;
47        const n = this.heap.length;
48
49        while (index < n) {
50            const left = 2 * index + 1;
51            const right = left + 1;
52
53            if (left < n && this.heap[left].priority < this.heap[min].priority) {
54                min = left;
55            }
56            if (right < n && this.heap[right].priority < this.heap[min].priority) {
57                min = right;
58            }
59            if (min === index) break;
60            [this.heap[min], this.heap[index]] = [this.heap[index], this.heap[min]];
61            index = min;
62        }
63    }
64
65    peek() {
66        return this.heap[0];
67    }
68
69    size() {
70        return this.heap.length;
71    }
72}
73
74function fun(arr) {
75    const heap = new MinHeap();
76    for (const x of arr) {
77        heap.push(new HeapItem(x));
78    }
79    const res = [];
80    for (let i = 0; i < 3; i++) {
81        res.push(heap.pop().item);
82    }
83    return res;
84}
85

Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More