A clear explanation of the H-Index problem using sorting, then an optimized counting approach.
Problem Restatement
We are given an integer array citations.
Each value:
citations[i]is the number of citations received by the researcher’s ith paper.
Return the researcher’s h-index.
The h-index is the maximum value h such that the researcher has at least h papers with at least h citations each.
The constraints are 1 <= citations.length <= 5000 and 0 <= citations[i] <= 1000. The official examples include [3,0,6,1,5] -> 3 and [1,3,1] -> 1.
Input and Output
| Item | Meaning |
|---|---|
| Input | An array citations |
| Output | The researcher’s h-index |
| Goal | Find the largest valid h |
| Condition | At least h papers have at least h citations each |
Example function shape:
def hIndex(citations: list[int]) -> int:
...Examples
Example 1:
citations = [3, 0, 6, 1, 5]Sort descending:
[6, 5, 3, 1, 0]Check possible h-index values:
h | Need | Result |
|---|---|---|
1 | At least 1 paper with at least 1 citation | Yes |
2 | At least 2 papers with at least 2 citations | Yes |
3 | At least 3 papers with at least 3 citations | Yes |
4 | At least 4 papers with at least 4 citations | No |
Answer:
3Example 2:
citations = [1, 3, 1]Sort descending:
[3, 1, 1]There is at least one paper with at least one citation.
There are not two papers with at least two citations.
Answer:
1Example 3:
citations = [0, 0, 0]No paper has at least one citation.
Answer:
0First Thought: Try Every h
A direct solution is to try every possible value of h from 0 to n.
For each h, count how many papers have at least h citations.
class Solution:
def hIndex(self, citations: list[int]) -> int:
n = len(citations)
best = 0
for h in range(n + 1):
count = 0
for c in citations:
if c >= h:
count += 1
if count >= h:
best = h
return bestThis works, but it repeatedly scans the whole array.
Problem With Brute Force
There are up to n possible values of h.
For each value, we scan all papers.
That gives:
O(n^2)We can do better by sorting.
Key Insight
After sorting citations in descending order, the position tells us how many papers have at least that many citations.
For example:
[6, 5, 3, 1, 0]At index 0, we have seen 1 paper.
At index 1, we have seen 2 papers.
At index 2, we have seen 3 papers.
If the citation count at index i is at least i + 1, then there are at least i + 1 papers with at least i + 1 citations.
So we scan the sorted list and keep increasing the answer while this condition holds.
Algorithm
- Sort
citationsin descending order. - Initialize
h = 0. - For each citation count
c:- If
c >= h + 1, increaseh. - Otherwise stop.
- If
- Return
h.
Walkthrough
Use:
citations = [3, 0, 6, 1, 5]After sorting:
[6, 5, 3, 1, 0]Start:
h = 0Read 6.
6 >= 1So:
h = 1Read 5.
5 >= 2So:
h = 2Read 3.
3 >= 3So:
h = 3Read 1.
1 < 4Stop.
Return:
3Correctness
After sorting citations in descending order, the first i + 1 papers are the i + 1 most cited papers.
When we see a citation count c at index i, if:
c >= i + 1then every previous paper has at least c citations or more, so the first i + 1 papers each have at least i + 1 citations. Therefore i + 1 is a valid h-index candidate.
If the condition fails at index i, then the i + 1th most cited paper has fewer than i + 1 citations. Since all later papers have no more citations, there cannot be i + 1 papers with at least i + 1 citations.
Thus the largest value reached by the scan is exactly the h-index.
Complexity
| Metric | Value | Why |
|---|---|---|
| Time | O(n log n) | Sorting dominates |
| Space | O(1) or O(n) | Depends on the sorting implementation |
Implementation
class Solution:
def hIndex(self, citations: list[int]) -> int:
citations.sort(reverse=True)
h = 0
for c in citations:
if c >= h + 1:
h += 1
else:
break
return hCode Explanation
Sort from most cited to least cited:
citations.sort(reverse=True)The variable h stores the largest valid h-index found so far:
h = 0For each citation count:
for c in citations:we check whether this paper can support increasing the h-index by one:
if c >= h + 1:
h += 1If it cannot, the later papers cannot help because they have no more citations.
So we stop:
else:
breakAlternative: Counting Solution
Since the h-index cannot be larger than n, citation counts above n can be grouped together.
Create buckets:
bucket[0], bucket[1], ..., bucket[n]where:
bucket[i]counts papers with exactlyicitations fori < nbucket[n]counts papers with at leastncitations
Then scan from n down to 0.
class Solution:
def hIndex(self, citations: list[int]) -> int:
n = len(citations)
bucket = [0] * (n + 1)
for c in citations:
if c >= n:
bucket[n] += 1
else:
bucket[c] += 1
papers = 0
for h in range(n, -1, -1):
papers += bucket[h]
if papers >= h:
return h
return 0This version runs in:
| Metric | Value |
|---|---|
| Time | O(n) |
| Space | O(n) |
Testing
def run_tests():
s = Solution()
assert s.hIndex([3, 0, 6, 1, 5]) == 3
assert s.hIndex([1, 3, 1]) == 1
assert s.hIndex([0, 0, 0]) == 0
assert s.hIndex([100]) == 1
assert s.hIndex([0]) == 0
assert s.hIndex([4, 4, 4, 4]) == 4
assert s.hIndex([10, 8, 5, 4, 3]) == 4
assert s.hIndex([25, 8, 5, 3, 3]) == 3
print("all tests passed")
run_tests()Test meaning:
| Test | Why |
|---|---|
[3, 0, 6, 1, 5] | Official example |
[1, 3, 1] | Small unsorted input |
| All zero | H-index is zero |
| Single highly cited paper | H-index capped by paper count |
| Single zero-citation paper | No valid positive h-index |
| All papers have enough citations | H-index equals n |
| Descending input | Already sorted case |
| Mixed high citations | Confirms h-index cap logic |