Facebook Pixel

274. H-Index

MediumArrayCounting SortSorting
Leetcode Link

Problem Description

You are given an array citations where citations[i] represents the number of citations a researcher received for their i-th paper. Your task is to find and return the researcher's h-index.

The h-index is defined as the maximum value of h such that the researcher has published at least h papers that have each been cited at least h times.

For example, if a researcher has 5 papers with citations [3, 0, 6, 1, 5]:

  • They have 3 papers with at least 3 citations each (papers with 3, 6, and 5 citations)
  • They have 2 papers with at least 4 citations each (papers with 6 and 5 citations)
  • They have 2 papers with at least 5 citations each (papers with 6 and 5 citations)
  • They have only 1 paper with at least 6 citations (the paper with 6 citations)

The h-index would be 3 because the researcher has 3 papers with at least 3 citations each, but does not have 4 papers with at least 4 citations each.

The solution approach sorts the citations in descending order, then checks from the highest possible h-value (which is the total number of papers) down to 1. For each potential h-value, it verifies if the h-th most cited paper has at least h citations. The first h-value that satisfies this condition is returned as the h-index.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key insight is that if we have h papers with at least h citations each, then when we sort the papers by citation count in descending order, the h-th paper (at index h-1) must have at least h citations.

Think about it this way: if we line up all papers from most cited to least cited, and we want to find the maximum h where we have h papers with at least h citations, we need to find the point where the paper's position number equals or exceeds its citation count.

For example, with sorted citations [6, 5, 3, 1, 0]:

  • Position 1: Paper has 6 citations, we have 1 paper with ≥1 citations ✓
  • Position 2: Paper has 5 citations, we have 2 papers with ≥2 citations ✓
  • Position 3: Paper has 3 citations, we have 3 papers with ≥3 citations ✓
  • Position 4: Paper has 1 citation, we DON'T have 4 papers with ≥4 citations ✗

The h-index is 3, the last position where the condition holds.

By sorting in descending order, we guarantee that if the paper at position h has at least h citations, then all papers before it (which have more citations) also satisfy the requirement. This allows us to check from the largest possible h value downward and return the first valid h we find, ensuring we get the maximum h-index.

The beauty of this approach is that once sorted, we only need to check if citations[h-1] >= h for each potential h value, making the logic clean and straightforward.

Solution Approach

The implementation follows a straightforward sorting-based approach:

  1. Sort the citations array in descending order: We use citations.sort(reverse=True) to arrange the papers from highest to lowest citation count. This ordering is crucial because it allows us to easily check if we have at least h papers with h or more citations.

  2. Iterate through possible h-values from largest to smallest: We loop through h values starting from len(citations) (the total number of papers, which is the maximum possible h-index) down to 1. We use range(len(citations), 0, -1) to generate this sequence.

  3. Check the h-index condition: For each potential h value, we check if citations[h-1] >= h. This condition verifies that the h-th most cited paper (at index h-1 due to 0-based indexing) has at least h citations. If this paper has at least h citations, then all papers before it in the sorted array also have at least h citations, meaning we have found at least h papers with h or more citations each.

  4. Return the first valid h-value: Since we're checking from the largest possible h downward, the first h that satisfies our condition is the maximum h-index. We immediately return this value.

  5. Handle the edge case: If no valid h is found in the loop (meaning even the most cited paper has 0 citations), we return 0.

The time complexity is O(n log n) due to sorting, where n is the number of papers. The space complexity is O(1) if we're allowed to modify the input array in-place, or O(n) if we need to preserve the original array.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through the solution with a small example: citations = [3, 0, 6, 1, 5]

Step 1: Sort in descending order After sorting: [6, 5, 3, 1, 0]

Step 2: Check each possible h-value from largest to smallest

  • h = 5: Check if citations[4] >= 5

    • citations[4] = 0, and 0 < 5
    • We don't have 5 papers with at least 5 citations
  • h = 4: Check if citations[3] >= 4

    • citations[3] = 1, and 1 < 4
    • We don't have 4 papers with at least 4 citations
  • h = 3: Check if citations[2] >= 3

    • citations[2] = 3, and 3 >= 3
    • We have 3 papers (with citations 6, 5, 3) that each have at least 3 citations
    • Return 3

The h-index is 3 because:

  • Papers at indices 0, 1, 2 (citations: 6, 5, 3) all have ≥ 3 citations
  • This gives us exactly 3 papers with at least 3 citations each
  • We cannot have h = 4 because the 4th paper only has 1 citation

The key insight: when sorted descending, if the paper at position h has at least h citations, then all papers before it (with higher citations) automatically satisfy the requirement, giving us exactly h qualifying papers.

Solution Implementation

1class Solution:
2    def hIndex(self, citations: List[int]) -> int:
3        # Sort citations in descending order to check from highest to lowest
4        citations.sort(reverse=True)
5      
6        # Try each possible h-index value from largest to smallest
7        # h represents the potential h-index value we're testing
8        for h in range(len(citations), 0, -1):
9            # Check if the h-th paper (1-indexed) has at least h citations
10            # citations[h-1] is the h-th paper when counting from 1
11            # If the h-th paper has >= h citations, then all papers before it
12            # also have >= h citations (since sorted in descending order)
13            if citations[h - 1] >= h:
14                return h
15      
16        # If no valid h-index found, return 0
17        return 0
18
1class Solution {
2    public int hIndex(int[] citations) {
3        // Sort the citations array in ascending order
4        Arrays.sort(citations);
5      
6        // Get the total number of papers
7        int n = citations.length;
8      
9        // Try each possible h-index value from highest to lowest
10        for (int h = n; h > 0; h--) {
11            // Check if we have at least h papers with h or more citations
12            // citations[n - h] is the h-th paper from the end (after sorting)
13            // If this paper has at least h citations, then all papers after it
14            // (which are h papers total) also have at least h citations
15            if (citations[n - h] >= h) {
16                return h;
17            }
18        }
19      
20        // If no valid h-index found, return 0
21        return 0;
22    }
23}
24
1class Solution {
2public:
3    int hIndex(vector<int>& citations) {
4        // Sort citations in descending order to check from highest to lowest
5        sort(citations.rbegin(), citations.rend());
6      
7        // Try each possible h-index value from highest (n) to lowest (1)
8        // h represents the potential h-index value we're testing
9        for (int h = citations.size(); h > 0; --h) {
10            // Check if we have at least h papers with h or more citations
11            // citations[h-1] is the h-th paper (0-indexed)
12            // If the h-th paper has at least h citations, then all papers
13            // before it (1st to h-th) also have at least h citations
14            if (citations[h - 1] >= h) {
15                return h;
16            }
17        }
18      
19        // No valid h-index found, return 0
20        return 0;
21    }
22};
23
1/**
2 * Calculates the H-index of a researcher based on their citation counts.
3 * H-index is the largest number h such that the researcher has h papers 
4 * with at least h citations each.
5 * 
6 * @param citations - Array of citation counts for each paper
7 * @returns The H-index value
8 */
9function hIndex(citations: number[]): number {
10    // Sort citations in descending order to check from highest citations first
11    citations.sort((a, b) => b - a);
12  
13    // Try each possible h-value from largest to smallest
14    // h represents the potential H-index value
15    for (let h = citations.length; h > 0; h--) {
16        // Check if the h-th paper (0-indexed at h-1) has at least h citations
17        // If true, we found the maximum h where h papers have >= h citations
18        if (citations[h - 1] >= h) {
19            return h;
20        }
21    }
22  
23    // If no valid H-index found, return 0
24    return 0;
25}
26

Time and Space Complexity

Time Complexity: O(n log n)

The dominant operation in this code is the sorting of the citations array using citations.sort(reverse=True), which has a time complexity of O(n log n) where n is the length of the citations array. The subsequent for loop iterates at most n times with O(1) operations in each iteration, contributing O(n) to the overall complexity. Since O(n log n) + O(n) = O(n log n), the overall time complexity is O(n log n).

Space Complexity: O(log n)

Python's Timsort algorithm (used in the sort() method) requires O(log n) space for its recursion stack in the worst case. The sort operation is performed in-place, modifying the original array without creating a new one. The only additional variables used are the loop variable h and the implicit iterator, which require O(1) space. Therefore, the overall space complexity is O(log n).

Common Pitfalls

Pitfall 1: Index Out of Bounds Error

The most common mistake in this implementation is accessing citations[h-1] when h equals len(citations). When we have n papers and check for h = n, we're trying to access citations[n-1], which is valid. However, many developers mistakenly think they need to check if there are at least h papers with h citations by verifying all h papers, leading to attempted access of indices beyond the array bounds.

Incorrect approach:

for h in range(len(citations), 0, -1):
    # Trying to check if ALL h papers have at least h citations
    valid = True
    for i in range(h):  # This will cause index error when h = len(citations)
        if citations[i] < h:
            valid = False
            break
    if valid:
        return h

Correct approach: Simply check the h-th paper (at index h-1). If it has at least h citations, then due to the descending sort, all papers before it also have at least h citations.

Pitfall 2: Misunderstanding the H-Index Definition

Another common error is checking citations[h] >= h instead of citations[h-1] >= h. This happens when developers forget about 0-based indexing or misinterpret which paper to check.

Incorrect:

for h in range(len(citations), 0, -1):
    if h < len(citations) and citations[h] >= h:  # Wrong index!
        return h

Correct:

for h in range(len(citations), 0, -1):
    if citations[h-1] >= h:  # h-1 because of 0-based indexing
        return h

Pitfall 3: Not Handling Edge Cases Properly

Some implementations fail when all papers have 0 citations or when the array is empty.

Better solution with explicit edge case handling:

class Solution:
    def hIndex(self, citations: List[int]) -> int:
        if not citations:  # Handle empty array
            return 0
          
        citations.sort(reverse=True)
      
        # Alternative approach that's more intuitive
        h_index = 0
        for i, citation_count in enumerate(citations):
            # i+1 represents the number of papers we've seen so far
            # We can have an h-index of at most min(i+1, citation_count)
            h_index = max(h_index, min(i + 1, citation_count))
      
        return h_index

This alternative approach iterates through the sorted citations and keeps track of the maximum valid h-index found so far, which some find more intuitive than checking from largest to smallest h values.

Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

Problem: Given a list of tasks and a list of requirements, compute a sequence of tasks that can be performed, such that we complete every task once while satisfying all the requirements.

Which of the following method should we use to solve this problem?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More