275. H-Index II
Problem Description
This problem involves finding the h-index of a researcher based on the array of their paper citations, which is sorted in ascending order. The h-index represents a metric that aims to measure both the productivity and citation impact of the publications of a scientist or scholar. It's defined as the maximum number 'h' such that the researcher has at least 'h' papers cited 'h' times each. Thus, if a researcher has an h-index of 5, they have published at least 5 papers with 5 or more citations each. The challenge in this problem includes implementing an efficient algorithm that runs in logarithmic time - a clear indicator that a binary search approach is likely the most appropriate solution given the sorted nature of the array.
Intuition
The intuition behind using a binary search for this problem arises from the sorted nature of the citations array and the specific property we're trying to find - the h-index. By definition, we know that an h-index of value 'h' means that there must be 'h' papers with at least 'h' citations each. This implies a monotonic relationship between the number of papers and the number of citations in a sorted list.
For a sorted array, if we find a point where the citation number is greater than or equal to the number of papers counted from the end of the array backwards, it's possible that our current position is part of the h-index count. If the citation count at a given array position is equal to or more than the papers count from that point to the array end, then any smaller number of papers will also have an equal or greater number of citations.
Binary search fits well here since this algorithm efficiently narrows down the search space by half with each step. We perform a binary search to pinpoint the exact point in the array where the number of citations crosses from being less than the h-index count to satisfying the h-index condition. Iteratively, we search for the largest 'mid' such that citations[n - mid] >= mid
, which means there are at least 'mid' papers with 'mid' or more citations. This process of splitting the search space in half and validating the h-index condition at each midpoint quickly locates the correct h-index value in logarithmic time.
The implemented solution starts with binary search boundaries from 0 to the number of citations (left and right, respectively). We iteratively adjust these boundaries based on whether the mid-value satisfies the h-index condition until left and right converge, which will happen when we find the maximum 'h' that satisfies the condition.
Learn more about Binary Search patterns.
Solution Approach
The given solution approach employs a binary search algorithm that takes advantage of the sorted nature of the input array 'citations'. Given that binary search is a divide-and-conquer algorithm that halves the search space with each iteration, it is a perfect match for the requirement of the problem to run in logarithmic time.
The binary search process in this problem is somewhat unique. The goal is to find the maximum h
index possible. To facilitate this, we set two variables, left
and right
, which represent the current range we consider for our binary search. The variable left
starts at 0 (the lowest possible h-index), and right
starts at n
(the length of the citations array which is the highest initial guess for h-index).
Within the while
loop, the algorithm calculates the middle point mid
by taking the average of left
and right
and incrementing it by one to handle the integer division correctly. The condition (left < right)
ensures that we continue the search as long as there is a range to check.
At each iteration of the binary search, we check if there are at least mid
papers with mid
or more citations, which translates to the condition citations[n - mid] >= mid
. High values of mid
mean higher possible h-index values, and conversely, lower values mean lower possible h-index values.
- If the condition is true, it means we have at least
mid
papers with at leastmid
citations, and we should look to see if there's a higher h-index possible, so we move theleft
pointer tomid
. - Conversely, if the condition is false, it means we have fewer than
mid
papers with at leastmid
citations, so we must lower our expectations and thus move theright
pointer tomid - 1
.
The loop continues adjusting left
and right
until they converge, at which point left
will hold the maximum h-index that satisfies the h-index condition (citations[n - left] >= left
). This left
value is what's returned as the solution to the problem.
The entire process requires no additional data structures as it operates directly on the input array and uses a handful of variables for the pointers and the middle index. The binary search pattern itself serves as the primary algorithmic strategy, exploiting the ordered nature of the input and requiring only O(log n)
time to complete.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's consider a researcher whose papers have the following citation counts, sorted in ascending order: [0, 1, 3, 5, 6]
. The goal is to find the h-index using the binary search approach explained above.
There are 5 papers, so we set left
to 0 and right
to 5, representing possibility for the h-index from 0 to 5.
Step 1: while (left < right)
left
= 0,right
= 5- Calculate
mid
= (0 + 5)/2 = 2 (since we're using integer division, the fractional part is discarded) - Check the condition
citations[n - mid] >= mid
, withn
being the number of papers. citations[5 - 2]
iscitations[3]
which is5
. Since5 >= 2
, the condition is true.- We might have a higher h-index, so we set
left
tomid
. Now,left
= 2 andright
= 5.
Step 2: while (left < right)
left
= 2,right
= 5- Calculate
mid
= (2 + 5)/2 = 3 citations[5 - 3]
iscitations[2]
which is3
. Since3 >= 3
, the condition is true.- We could still have a higher h-index, so again set
left
tomid
. Now,left
= 3 andright
= 5.
Step 3: while (left < right)
left
= 3,right
= 5- Calculate
mid
= (3 + 5)/2 = 4 citations[5 - 4]
iscitations[1]
which is1
. Since1 < 4
, the condition is false.- We've overshot the possible h-index, so we need to lower it, set
right
tomid - 1
. Now,left
= 3 andright
= 4 - 1 = 3.
In the next step, left
will equal right
and the loop will stop. left
, which also equals right
, is our h-index which is 3
. This means the researcher has 3 papers with at least 3 citations each. This matches the definition of the h-index, and since the search ended, this is the maximum h
possible for the given citations array.
Solution Implementation
1from typing import List
2
3class Solution:
4 def hIndex(self, citations: List[int]) -> int:
5 # Initialise the length of the citations list.
6 number_of_citations = len(citations)
7 # Set initial search range from 0 to number_of_citations.
8 left, right = 0, number_of_citations
9 # Perform binary search to find the h-index.
10 while left < right:
11 # Mid-point of the current search range calculated using bitwise right shift, equivalent to dividing by 2.
12 # Adding 1 to include the mid element in the right part of the search during the next iteration.
13 mid = (left + right + 1) // 2 # Using '//' for integer division in Python3
14 # Check if mid citations have at least 'mid' each.
15 if citations[number_of_citations - mid] >= mid:
16 # If condition satisfied, search in the right half.
17 left = mid
18 else:
19 # Else search in the left half, exclude 'mid' by subtracting one.
20 right = mid - 1
21 # The left pointer indicates the maximum number of citations that at least 'left' papers have.
22 return left
23
1class Solution {
2 public int hIndex(int[] citations) {
3 int numPapers = citations.length; // Number of papers
4 int low = 0; // Lower bound of binary search
5 int high = numPapers; // Upper bound of binary search
6
7 // Perform binary search
8 while (low < high) {
9 // Calculate the middle index. "+1" ensures we go to upper half if not found.
10 int mid = (low + high + 1) >>> 1; // Using unsigned right shift for safe mid calculation
11
12 // Check if mid papers have at least 'mid' citations each
13 if (citations[numPapers - mid] >= mid) {
14 // If yes, move the lower bound up
15 low = mid;
16 } else {
17 // Otherwise, bring the upper bound down
18 high = mid - 1;
19 }
20 }
21
22 // After exiting the loop, low is the h-index
23 return low;
24 }
25}
26
1class Solution {
2public:
3 int hIndex(vector<int>& citations) {
4 // The size of the citations vector
5 int numPapers = citations.size();
6
7 // Initialize binary search bounds
8 int left = 0, right = numPapers;
9
10 while (left < right) {
11 // Calculate the mid point, avoid overflow by using left + (right - left) / 2
12 // The '+1' is to make sure we find the upper bound in case of even number of elements
13 int mid = left + (right - left + 1) / 2;
14
15 // Check if mid number of papers have at least mid citations each
16 // Since 'citations' is sorted in non-increasing order,
17 // we check the value at index 'numPapers - mid'
18 if (citations[numPapers - mid] >= mid)
19 // If mid papers have at least mid citations, this could be our h-index,
20 // but there may be a larger one. Move the search to the right (upper half).
21 left = mid;
22 else
23 // If mid papers do not have at least mid citations, look for lower h-index
24 // in the left (lower half).
25 right = mid - 1;
26 }
27 // After the loop, 'left' points to the largest number such that there are 'left' papers
28 // that have at least 'left' citations.
29 return left;
30 }
31};
32
1function hIndex(citations: number[]): number {
2 // Initialize the size of the citations array.
3 const length = citations.length;
4 // Define binary search bounds.
5 let left = 0,
6 right = length;
7
8 // Perform a binary search to find the h-index.
9 while (left < right) {
10 // Calculate the mid point. The bitwise operation is used to perform integer division by 2.
11 const mid = (left + right + 1) >> 1;
12 // Check if the mid element fulfills the h-index property.
13 if (citations[length - mid] >= mid) {
14 // If the condition is met, move the lower bound up, as we are looking for the maximum h.
15 left = mid;
16 } else {
17 // Otherwise, move the upper bound down.
18 right = mid - 1;
19 }
20 }
21
22 // The left variable will have settled on the correct h-index value by the end of the loop.
23 return left;
24}
25
Time and Space Complexity
The time complexity of the code is O(log n)
, where n
is the length of the citations
list. This is because the algorithm uses a binary search technique, repeatedly dividing the search interval in half, which results in a logarithmic number of iterations in relation to the size of the input list.
The space complexity of the code is O(1)
. There are no extra data structures used that grow with the input size. The additional space used for variables left
, right
, and mid
is constant, and thus does not depend on the size of the input list.
Learn more about how to find time and space complexity quickly using problem constraints.
Which technique can we use to find the middle of a linked list?
Recommended Readings
Binary Search Speedrun For each of the Speedrun questions you will be given a binary search related problem and a corresponding multiple choice question The multiple choice questions are related to the techniques and template s introduced in the binary search section It's recommended that you have gone through at
LeetCode Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Want a Structured Path to Master System Design Too? Don’t Miss This!