275. H-Index II


Problem Description

This problem involves finding the h-index of a researcher based on the array of their paper citations, which is sorted in ascending order. The h-index represents a metric that aims to measure both the productivity and citation impact of the publications of a scientist or scholar. It's defined as the maximum number 'h' such that the researcher has at least 'h' papers cited 'h' times each. Thus, if a researcher has an h-index of 5, they have published at least 5 papers with 5 or more citations each. The challenge in this problem includes implementing an efficient algorithm that runs in logarithmic time - a clear indicator that a binary search approach is likely the most appropriate solution given the sorted nature of the array.

Intuition

The intuition behind using a binary search for this problem arises from the sorted nature of the citations array and the specific property we're trying to find - the h-index. By definition, we know that an h-index of value 'h' means that there must be 'h' papers with at least 'h' citations each. This implies a monotonic relationship between the number of papers and the number of citations in a sorted list.

For a sorted array, if we find a point where the citation number is greater than or equal to the number of papers counted from the end of the array backwards, it's possible that our current position is part of the h-index count. If the citation count at a given array position is equal to or more than the papers count from that point to the array end, then any smaller number of papers will also have an equal or greater number of citations.

Binary search fits well here since this algorithm efficiently narrows down the search space by half with each step. We perform a binary search to pinpoint the exact point in the array where the number of citations crosses from being less than the h-index count to satisfying the h-index condition. Iteratively, we search for the largest 'mid' such that citations[n - mid] >= mid, which means there are at least 'mid' papers with 'mid' or more citations. This process of splitting the search space in half and validating the h-index condition at each midpoint quickly locates the correct h-index value in logarithmic time.

The implemented solution starts with binary search boundaries from 0 to the number of citations (left and right, respectively). We iteratively adjust these boundaries based on whether the mid-value satisfies the h-index condition until left and right converge, which will happen when we find the maximum 'h' that satisfies the condition.

Learn more about Binary Search patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

Depth first search can be used to find whether two components in a graph are connected.

Solution Approach

The given solution approach employs a binary search algorithm that takes advantage of the sorted nature of the input array 'citations'. Given that binary search is a divide-and-conquer algorithm that halves the search space with each iteration, it is a perfect match for the requirement of the problem to run in logarithmic time.

The binary search process in this problem is somewhat unique. The goal is to find the maximum h index possible. To facilitate this, we set two variables, left and right, which represent the current range we consider for our binary search. The variable left starts at 0 (the lowest possible h-index), and right starts at n (the length of the citations array which is the highest initial guess for h-index).

Within the while loop, the algorithm calculates the middle point mid by taking the average of left and right and incrementing it by one to handle the integer division correctly. The condition (left < right) ensures that we continue the search as long as there is a range to check.

At each iteration of the binary search, we check if there are at least mid papers with mid or more citations, which translates to the condition citations[n - mid] >= mid. High values of mid mean higher possible h-index values, and conversely, lower values mean lower possible h-index values.

  • If the condition is true, it means we have at least mid papers with at least mid citations, and we should look to see if there's a higher h-index possible, so we move the left pointer to mid.
  • Conversely, if the condition is false, it means we have fewer than mid papers with at least mid citations, so we must lower our expectations and thus move the right pointer to mid - 1.

The loop continues adjusting left and right until they converge, at which point left will hold the maximum h-index that satisfies the h-index condition (citations[n - left] >= left). This left value is what's returned as the solution to the problem.

The entire process requires no additional data structures as it operates directly on the input array and uses a handful of variables for the pointers and the middle index. The binary search pattern itself serves as the primary algorithmic strategy, exploiting the ordered nature of the input and requiring only O(log n) time to complete.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

How does quick sort divide the problem into subproblems?

Example Walkthrough

Let's consider a researcher whose papers have the following citation counts, sorted in ascending order: [0, 1, 3, 5, 6]. The goal is to find the h-index using the binary search approach explained above.

There are 5 papers, so we set left to 0 and right to 5, representing possibility for the h-index from 0 to 5.

Step 1: while (left < right)

  • left = 0, right = 5
  • Calculate mid = (0 + 5)/2 = 2 (since we're using integer division, the fractional part is discarded)
  • Check the condition citations[n - mid] >= mid, with n being the number of papers.
  • citations[5 - 2] is citations[3] which is 5. Since 5 >= 2, the condition is true.
  • We might have a higher h-index, so we set left to mid. Now, left = 2 and right = 5.

Step 2: while (left < right)

  • left = 2, right = 5
  • Calculate mid = (2 + 5)/2 = 3
  • citations[5 - 3] is citations[2] which is 3. Since 3 >= 3, the condition is true.
  • We could still have a higher h-index, so again set left to mid. Now, left = 3 and right = 5.

Step 3: while (left < right)

  • left = 3, right = 5
  • Calculate mid = (3 + 5)/2 = 4
  • citations[5 - 4] is citations[1] which is 1. Since 1 < 4, the condition is false.
  • We've overshot the possible h-index, so we need to lower it, set right to mid - 1. Now, left = 3 and right = 4 - 1 = 3.

In the next step, left will equal right and the loop will stop. left, which also equals right, is our h-index which is 3. This means the researcher has 3 papers with at least 3 citations each. This matches the definition of the h-index, and since the search ended, this is the maximum h possible for the given citations array.

Not Sure What to Study? Take the 2-min Quiz:

Which of the following is equvalent to O(3*2^n + n^3 + n!+ log n)?

Python Solution

1from typing import List
2
3class Solution:
4    def hIndex(self, citations: List[int]) -> int:
5        # Initialise the length of the citations list.
6        number_of_citations = len(citations)
7        # Set initial search range from 0 to number_of_citations.
8        left, right = 0, number_of_citations
9        # Perform binary search to find the h-index.
10        while left < right:
11            # Mid-point of the current search range calculated using bitwise right shift, equivalent to dividing by 2.
12            # Adding 1 to include the mid element in the right part of the search during the next iteration.
13            mid = (left + right + 1) // 2  # Using '//' for integer division in Python3
14            # Check if mid citations have at least 'mid' each.
15            if citations[number_of_citations - mid] >= mid:
16                # If condition satisfied, search in the right half.
17                left = mid
18            else:
19                # Else search in the left half, exclude 'mid' by subtracting one.
20                right = mid - 1
21        # The left pointer indicates the maximum number of citations that at least 'left' papers have.
22        return left
23

Java Solution

1class Solution {
2    public int hIndex(int[] citations) {
3        int numPapers = citations.length; // Number of papers
4        int low = 0;                      // Lower bound of binary search
5        int high = numPapers;             // Upper bound of binary search
6
7        // Perform binary search
8        while (low < high) {
9            // Calculate the middle index. "+1" ensures we go to upper half if not found.
10            int mid = (low + high + 1) >>> 1; // Using unsigned right shift for safe mid calculation
11
12            // Check if mid papers have at least 'mid' citations each
13            if (citations[numPapers - mid] >= mid) {
14                // If yes, move the lower bound up
15                low = mid;
16            } else {
17                // Otherwise, bring the upper bound down
18                high = mid - 1;
19            }
20        }
21
22        // After exiting the loop, low is the h-index
23        return low;
24    }
25}
26

C++ Solution

1class Solution {
2public:
3    int hIndex(vector<int>& citations) {
4        // The size of the citations vector
5        int numPapers = citations.size();
6
7        // Initialize binary search bounds
8        int left = 0, right = numPapers;
9
10        while (left < right) {
11            // Calculate the mid point, avoid overflow by using left + (right - left) / 2
12            // The '+1' is to make sure we find the upper bound in case of even number of elements
13            int mid = left + (right - left + 1) / 2;
14
15            // Check if mid number of papers have at least mid citations each
16            // Since 'citations' is sorted in non-increasing order,
17            // we check the value at index 'numPapers - mid'
18            if (citations[numPapers - mid] >= mid)
19                // If mid papers have at least mid citations, this could be our h-index,
20                // but there may be a larger one. Move the search to the right (upper half).
21                left = mid;
22            else
23                // If mid papers do not have at least mid citations, look for lower h-index
24                // in the left (lower half).
25                right = mid - 1;
26        }
27        // After the loop, 'left' points to the largest number such that there are 'left' papers
28        // that have at least 'left' citations.
29        return left;
30    }
31};
32

Typescript Solution

1function hIndex(citations: number[]): number {
2    // Initialize the size of the citations array.
3    const length = citations.length;
4    // Define binary search bounds.
5    let left = 0,
6        right = length;
7  
8    // Perform a binary search to find the h-index.
9    while (left < right) {
10        // Calculate the mid point. The bitwise operation is used to perform integer division by 2.
11        const mid = (left + right + 1) >> 1;
12        // Check if the mid element fulfills the h-index property.
13        if (citations[length - mid] >= mid) {
14            // If the condition is met, move the lower bound up, as we are looking for the maximum h.
15            left = mid;
16        } else {
17            // Otherwise, move the upper bound down.
18            right = mid - 1;
19        }
20    }
21
22    // The left variable will have settled on the correct h-index value by the end of the loop.
23    return left;
24}
25
Fast Track Your Learning with Our Quick Skills Quiz:

The three-steps of Depth First Search are:

  1. Identify states;
  2. Draw the state-space tree;
  3. DFS on the state-space tree.

Time and Space Complexity

The time complexity of the code is O(log n), where n is the length of the citations list. This is because the algorithm uses a binary search technique, repeatedly dividing the search interval in half, which results in a logarithmic number of iterations in relation to the size of the input list.

The space complexity of the code is O(1). There are no extra data structures used that grow with the input size. The additional space used for variables left, right, and mid is constant, and thus does not depend on the size of the input list.

Learn more about how to find time and space complexity quickly.


Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫