A clear explanation of the H-Index II problem using binary search on a sorted citations array.
Problem Restatement
We are given an integer array citations.
The array is sorted in ascending order.
Each value:
citations[i]is the number of citations received by one paper.
Return the researcher’s h-index.
The h-index is the maximum value h such that the researcher has at least h papers with at least h citations each.
The required runtime is logarithmic, so we need binary search. The official examples include [0,1,3,5,6] -> 3 and [1,2,100] -> 2.
Input and Output
| Item | Meaning |
|---|---|
| Input | Sorted ascending array citations |
| Output | The h-index |
| Goal | Find the largest valid h |
| Required time | O(log n) |
Example function shape:
def hIndex(citations: list[int]) -> int:
...Examples
Example 1:
citations = [0, 1, 3, 5, 6]There are 5 papers.
If h = 3, then the last three papers have citation counts:
[3, 5, 6]All three have at least 3 citations.
If h = 4, then we would need four papers with at least four citations, but only two papers have at least four citations.
Answer:
3Example 2:
citations = [1, 2, 100]There are two papers with at least two citations:
[2, 100]There are not three papers with at least three citations.
Answer:
2First Thought: Linear Scan
Since the array is sorted, we can scan from left to right.
At index i, the number of papers from i to the end is:
n - iIf:
citations[i] >= n - ithen there are at least n - i papers with at least n - i citations.
The first index where this becomes true gives the h-index.
class Solution:
def hIndex(self, citations: list[int]) -> int:
n = len(citations)
for i, c in enumerate(citations):
h = n - i
if c >= h:
return h
return 0This is correct, but it runs in O(n).
Problem With Linear Scan
The problem explicitly asks for logarithmic time.
So we need to use the sorted order more aggressively.
The condition:
citations[i] >= n - ihas a monotonic shape.
At earlier indices, citations[i] is smaller and n - i is larger.
At later indices, citations[i] is larger and n - i is smaller.
So the condition changes from false to true at most once.
That is exactly a binary search pattern.
Key Insight
For an index i, define:
h = n - iThis means there are h papers from index i through the end.
Because the array is sorted ascending, all papers from index i onward have at least citations[i] citations.
So if:
citations[i] >= hthen these h papers all have at least h citations.
This makes h a valid h-index candidate.
We want the earliest index where this condition becomes true.
Then the answer is:
n - indexAlgorithm
- Let
n = len(citations). - Binary search over indices
0..n - 1. - At each
mid, compute:h = n - mid - If:then
citations[mid] >= hmidis feasible, so search left for an earlier feasible index. - Otherwise, search right.
- When binary search ends,
leftis the first feasible index. - Return:
n - left
If no index is feasible, left == n, so the answer becomes 0.
Walkthrough
Use:
citations = [0, 1, 3, 5, 6]Start:
n = 5
left = 0
right = 5We use a half-open search interval [left, right).
First middle:
mid = 2
h = 5 - 2 = 3
citations[2] = 3Since:
3 >= 3index 2 is feasible.
Search left:
right = 2Next middle:
mid = 1
h = 5 - 1 = 4
citations[1] = 1Since:
1 < 4index 1 is not feasible.
Search right:
left = 2Now:
left == right == 2The first feasible index is 2.
Return:
n - left = 5 - 2 = 3Correctness
For each index i, there are exactly n - i papers from i to the end of the sorted array.
Because the array is sorted ascending, every paper from index i onward has at least citations[i] citations.
Therefore, if citations[i] >= n - i, then the researcher has at least n - i papers with at least n - i citations. So n - i is a valid h-index candidate.
The condition is monotonic. When it becomes true at some index, it remains true for later indices because citation counts do not decrease while n - i decreases.
Binary search finds the first index where the condition is true.
This first feasible index gives the largest valid h-index, because earlier indices correspond to larger h values. If no feasible index exists, the h-index is 0.
Therefore the algorithm returns the correct h-index.
Complexity
| Metric | Value | Why |
|---|---|---|
| Time | O(log n) | Binary search over the sorted array |
| Space | O(1) | Only a few variables are used |
Implementation
class Solution:
def hIndex(self, citations: list[int]) -> int:
n = len(citations)
left = 0
right = n
while left < right:
mid = (left + right) // 2
h = n - mid
if citations[mid] >= h:
right = mid
else:
left = mid + 1
return n - leftCode Explanation
We search in the half-open interval:
[left, right)Initialize it to the whole array:
left = 0
right = nAt each index mid, the possible h-index is:
h = n - midIf the current paper has enough citations to support that h:
if citations[mid] >= h:then this index works, and maybe an earlier index also works.
So we move left:
right = midOtherwise, this index cannot support its candidate h, so we search to the right:
left = mid + 1At the end, left is the first feasible index.
The answer is:
return n - leftIf no index works, then left == n, and the function returns 0.
Testing
def run_tests():
s = Solution()
assert s.hIndex([0, 1, 3, 5, 6]) == 3
assert s.hIndex([1, 2, 100]) == 2
assert s.hIndex([0]) == 0
assert s.hIndex([100]) == 1
assert s.hIndex([0, 0, 0]) == 0
assert s.hIndex([1, 1, 1]) == 1
assert s.hIndex([4, 4, 4, 4]) == 4
assert s.hIndex([0, 1, 4, 5, 6]) == 3
print("all tests passed")
run_tests()Test meaning:
| Test | Why |
|---|---|
[0, 1, 3, 5, 6] | Official example |
[1, 2, 100] | Official example |
[0] | Single paper with zero citations |
[100] | H-index capped by paper count |
| All zero | No positive h-index |
| All one | H-index is one |
| All high enough | H-index equals n |
| Mixed sorted values | Confirms binary search boundary |