274. H-Index

MediumArrayCounting SortSorting

Problem Description

You are given an array citations where citations[i] represents the number of citations a researcher received for their i-th paper. Your task is to find and return the researcher's h-index.

The h-index is defined as the maximum value of h such that the researcher has published at least h papers that have each been cited at least h times.

For example, if a researcher has 5 papers with citations [3, 0, 6, 1, 5]:

They have 3 papers with at least 3 citations each (papers with 3, 6, and 5 citations)
They have 2 papers with at least 4 citations each (papers with 6 and 5 citations)
They have 2 papers with at least 5 citations each (papers with 6 and 5 citations)
They have only 1 paper with at least 6 citations (the paper with 6 citations)

The h-index would be 3 because the researcher has 3 papers with at least 3 citations each, but does not have 4 papers with at least 4 citations each.

The solution approach sorts the citations in descending order, then checks from the highest possible h-value (which is the total number of papers) down to 1. For each potential h-value, it verifies if the h-th most cited paper has at least h citations. The first h-value that satisfies this condition is returned as the h-index.

Quick Interview Experience

Help others by sharing your interview experience

Have you seen this problem before?

Intuition

The key insight is that if we have h papers with at least h citations each, then when we sort the papers by citation count in descending order, the h-th paper (at index h-1) must have at least h citations.

Think about it this way: if we line up all papers from most cited to least cited, and we want to find the maximum h where we have h papers with at least h citations, we need to find the point where the paper's position number equals or exceeds its citation count.

For example, with sorted citations [6, 5, 3, 1, 0]:

Position 1: Paper has 6 citations, we have 1 paper with ≥1 citations ✓
Position 2: Paper has 5 citations, we have 2 papers with ≥2 citations ✓
Position 3: Paper has 3 citations, we have 3 papers with ≥3 citations ✓
Position 4: Paper has 1 citation, we DON'T have 4 papers with ≥4 citations ✗

The h-index is 3, the last position where the condition holds.

By sorting in descending order, we guarantee that if the paper at position h has at least h citations, then all papers before it (which have more citations) also satisfy the requirement. This allows us to check from the largest possible h value downward and return the first valid h we find, ensuring we get the maximum h-index.

The beauty of this approach is that once sorted, we only need to check if citations[h-1] >= h for each potential h value, making the logic clean and straightforward.

Solution Approach

The implementation follows a straightforward sorting-based approach:

Sort the citations array in descending order: We use citations.sort(reverse=True) to arrange the papers from highest to lowest citation count. This ordering is crucial because it allows us to easily check if we have at least h papers with h or more citations.
Iterate through possible h-values from largest to smallest: We loop through h values starting from len(citations) (the total number of papers, which is the maximum possible h-index) down to 1. We use range(len(citations), 0, -1) to generate this sequence.
Check the h-index condition: For each potential h value, we check if citations[h-1] >= h. This condition verifies that the h-th most cited paper (at index h-1 due to 0-based indexing) has at least h citations. If this paper has at least h citations, then all papers before it in the sorted array also have at least h citations, meaning we have found at least h papers with h or more citations each.
Return the first valid h-value: Since we're checking from the largest possible h downward, the first h that satisfies our condition is the maximum h-index. We immediately return this value.
Handle the edge case: If no valid h is found in the loop (meaning even the most cited paper has 0 citations), we return 0.

The time complexity is O(n log n) due to sorting, where n is the number of papers. The space complexity is O(1) if we're allowed to modify the input array in-place, or O(n) if we need to preserve the original array.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through the solution with a small example: citations = [3, 0, 6, 1, 5]

Step 1: Sort in descending order After sorting: [6, 5, 3, 1, 0]

Step 2: Check each possible h-value from largest to smallest

h = 5: Check if citations[4] >= 5
- citations[4] = 0, and 0 < 5 ✗
- We don't have 5 papers with at least 5 citations
h = 4: Check if citations[3] >= 4
- citations[3] = 1, and 1 < 4 ✗
- We don't have 4 papers with at least 4 citations
h = 3: Check if citations[2] >= 3
- citations[2] = 3, and 3 >= 3 ✓
- We have 3 papers (with citations 6, 5, 3) that each have at least 3 citations
- Return 3

The h-index is 3 because:

Papers at indices 0, 1, 2 (citations: 6, 5, 3) all have ≥ 3 citations
This gives us exactly 3 papers with at least 3 citations each
We cannot have h = 4 because the 4th paper only has 1 citation

The key insight: when sorted descending, if the paper at position h has at least h citations, then all papers before it (with higher citations) automatically satisfy the requirement, giving us exactly h qualifying papers.

Solution Implementation

1class Solution:
2    def hIndex(self, citations: List[int]) -> int:
3        # Sort citations in descending order to check from highest to lowest
4        citations.sort(reverse=True)
5      
6        # Try each possible h-index value from largest to smallest
7        # h represents the potential h-index value we're testing
8        for h in range(len(citations), 0, -1):
9            # Check if the h-th paper (1-indexed) has at least h citations
10            # citations[h-1] is the h-th paper when counting from 1
11            # If the h-th paper has >= h citations, then all papers before it
12            # also have >= h citations (since sorted in descending order)
13            if citations[h - 1] >= h:
14                return h
15      
16        # If no valid h-index found, return 0
17        return 0
18

1class Solution {
2    public int hIndex(int[] citations) {
3        // Sort the citations array in ascending order
4        Arrays.sort(citations);
5      
6        // Get the total number of papers
7        int n = citations.length;
8      
9        // Try each possible h-index value from highest to lowest
10        for (int h = n; h > 0; h--) {
11            // Check if we have at least h papers with h or more citations
12            // citations[n - h] is the h-th paper from the end (after sorting)
13            // If this paper has at least h citations, then all papers after it
14            // (which are h papers total) also have at least h citations
15            if (citations[n - h] >= h) {
16                return h;
17            }
18        }
19      
20        // If no valid h-index found, return 0
21        return 0;
22    }
23}
24

1class Solution {
2public:
3    int hIndex(vector<int>& citations) {
4        // Sort citations in descending order to check from highest to lowest
5        sort(citations.rbegin(), citations.rend());
6      
7        // Try each possible h-index value from highest (n) to lowest (1)
8        // h represents the potential h-index value we're testing
9        for (int h = citations.size(); h > 0; --h) {
10            // Check if we have at least h papers with h or more citations
11            // citations[h-1] is the h-th paper (0-indexed)
12            // If the h-th paper has at least h citations, then all papers
13            // before it (1st to h-th) also have at least h citations
14            if (citations[h - 1] >= h) {
15                return h;
16            }
17        }
18      
19        // No valid h-index found, return 0
20        return 0;
21    }
22};
23

1/**
2 * Calculates the H-index of a researcher based on their citation counts.
3 * H-index is the largest number h such that the researcher has h papers 
4 * with at least h citations each.
5 * 
6 * @param citations - Array of citation counts for each paper
7 * @returns The H-index value
8 */
9function hIndex(citations: number[]): number {
10    // Sort citations in descending order to check from highest citations first
11    citations.sort((a, b) => b - a);
12  
13    // Try each possible h-value from largest to smallest
14    // h represents the potential H-index value
15    for (let h = citations.length; h > 0; h--) {
16        // Check if the h-th paper (0-indexed at h-1) has at least h citations
17        // If true, we found the maximum h where h papers have >= h citations
18        if (citations[h - 1] >= h) {
19            return h;
20        }
21    }
22  
23    // If no valid H-index found, return 0
24    return 0;
25}
26

Time and Space Complexity

Time Complexity: O(n log n)

The dominant operation in this code is the sorting of the citations array using citations.sort(reverse=True), which has a time complexity of O(n log n) where n is the length of the citations array. The subsequent for loop iterates at most n times with O(1) operations in each iteration, contributing O(n) to the overall complexity. Since O(n log n) + O(n) = O(n log n), the overall time complexity is O(n log n).

Space Complexity: O(log n)

Python's Timsort algorithm (used in the sort() method) requires O(log n) space for its recursion stack in the worst case. The sort operation is performed in-place, modifying the original array without creating a new one. The only additional variables used are the loop variable h and the implicit iterator, which require O(1) space. Therefore, the overall space complexity is O(log n).

Common Pitfalls

Pitfall 1: Index Out of Bounds Error

The most common mistake in this implementation is accessing citations[h-1] when h equals len(citations). When we have n papers and check for h = n, we're trying to access citations[n-1], which is valid. However, many developers mistakenly think they need to check if there are at least h papers with h citations by verifying all h papers, leading to attempted access of indices beyond the array bounds.

Incorrect approach:

for h in range(len(citations), 0, -1):
    # Trying to check if ALL h papers have at least h citations
    valid = True
    for i in range(h):  # This will cause index error when h = len(citations)
        if citations[i] < h:
            valid = False
            break
    if valid:
        return h

Correct approach: Simply check the h-th paper (at index h-1). If it has at least h citations, then due to the descending sort, all papers before it also have at least h citations.

Pitfall 2: Misunderstanding the H-Index Definition

Another common error is checking citations[h] >= h instead of citations[h-1] >= h. This happens when developers forget about 0-based indexing or misinterpret which paper to check.

Incorrect:

for h in range(len(citations), 0, -1):
    if h < len(citations) and citations[h] >= h:  # Wrong index!
        return h

Correct:

for h in range(len(citations), 0, -1):
    if citations[h-1] >= h:  # h-1 because of 0-based indexing
        return h

Pitfall 3: Not Handling Edge Cases Properly

Some implementations fail when all papers have 0 citations or when the array is empty.

Better solution with explicit edge case handling:

class Solution:
    def hIndex(self, citations: List[int]) -> int:
        if not citations:  # Handle empty array
            return 0
          
        citations.sort(reverse=True)
      
        # Alternative approach that's more intuitive
        h_index = 0
        for i, citation_count in enumerate(citations):
            # i+1 represents the number of papers we've seen so far
            # We can have an h-index of at most min(i+1, citation_count)
            h_index = max(h_index, min(i + 1, citation_count))
      
        return h_index

This alternative approach iterates through the sorted citations and keeps track of the maximum valid h-index found so far, which some find more intuitive than checking from largest to smallest h values.

Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan: