Find the longest run of consecutive integers in an unsorted array using a hash set and sequence-start detection.
Problem Restatement
We are given an unsorted integer array nums.
We need to return the length of the longest consecutive sequence of numbers.
The numbers do not need to be adjacent in the array. They only need to exist somewhere in the array.
The problem also requires an O(n) time algorithm. Sorting would take O(n log n), so sorting does not satisfy the required bound.
Examples
Example 1:
nums = [100, 4, 200, 1, 3, 2]The longest consecutive sequence is:
[1, 2, 3, 4]Its length is:
4So the output is:
4Example 2:
nums = [0, 3, 7, 2, 5, 8, 4, 6, 0, 1]The longest consecutive sequence is:
[0, 1, 2, 3, 4, 5, 6, 7, 8]Its length is:
9So the output is:
9Example 3:
nums = [1, 0, 1, 2]The longest consecutive sequence is:
[0, 1, 2]Even though 1 appears twice, duplicates do not extend the sequence.
The output is:
3Input and Output
| Item | Meaning |
|---|---|
| Input | nums: list[int] |
| Output | Length of the longest consecutive integer sequence |
| Important detail | The array is unsorted |
| Required time | O(n) |
Function shape:
class Solution:
def longestConsecutive(self, nums: list[int]) -> int:
...First Thought: Sort the Array
A natural idea is to sort the array.
For example:
nums = [100, 4, 200, 1, 3, 2]After sorting:
[1, 2, 3, 4, 100, 200]Then we can scan from left to right and count consecutive numbers.
This is easy, but sorting costs:
O(n log n)The problem requires O(n), so we need another method.
Key Insight
We need fast membership checks.
For any number x, we want to quickly ask:
x in numsA hash set gives expected O(1) lookup.
So we store all numbers in a set:
num_set = set(nums)Now we can test whether x + 1, x + 2, and so on exist.
But we should not start counting from every number.
For example, in this set:
{1, 2, 3, 4}Starting from 1 is useful.
Starting from 2 repeats work, because 2 is already inside the sequence that starts at 1.
So we only start counting from a number x when x - 1 does not exist.
That means x is the beginning of a sequence.
Sequence Start Rule
A number starts a consecutive sequence if its previous number is missing.
if num - 1 not in num_set:
# num is the start of a sequenceExample:
nums = [100, 4, 200, 1, 3, 2]
num_set = {1, 2, 3, 4, 100, 200}Check each number:
| Number | Previous exists? | Start? |
|---|---|---|
100 | 99 missing | Yes |
4 | 3 exists | No |
200 | 199 missing | Yes |
1 | 0 missing | Yes |
3 | 2 exists | No |
2 | 1 exists | No |
Only 100, 200, and 1 are sequence starts.
From 1, we count:
1 -> 2 -> 3 -> 4Length is 4.
Algorithm
Create a set from nums.
Initialize best = 0.
For each number num in the set:
- Check whether
num - 1exists. - If it exists, skip
numbecause it is not the beginning of a sequence. - If it does not exist, start a new sequence from
num. - Count upward while
num + lengthexists. - Update
best.
Return best.
Correctness
Every consecutive sequence has exactly one smallest number.
For the sequence:
[1, 2, 3, 4]the smallest number is 1.
Only 1 satisfies:
1 - 1 not in num_setThe other numbers do not satisfy the start rule:
2 - 1 in num_set
3 - 1 in num_set
4 - 1 in num_setSo the algorithm counts each sequence once, starting from its smallest number.
When it starts from that smallest number, it checks x + 1, x + 2, and so on until the next number is missing. Therefore, it counts the full length of that consecutive sequence.
Since every sequence is counted once and best stores the maximum length found, the algorithm returns the length of the longest consecutive sequence.
Complexity
Let n be the length of nums.
| Metric | Value | Why |
|---|---|---|
| Time | O(n) | Each number is used in constant expected-time set operations |
| Space | O(n) | The set stores unique numbers |
Although there is a nested while loop, the total work is linear.
The while loop only runs from sequence starts. Across all starts, each unique number belongs to one counted sequence.
Implementation
class Solution:
def longestConsecutive(self, nums: list[int]) -> int:
num_set = set(nums)
best = 0
for num in num_set:
if num - 1 in num_set:
continue
length = 1
while num + length in num_set:
length += 1
best = max(best, length)
return bestCode Explanation
We first put every number into a set:
num_set = set(nums)This also removes duplicates.
For example:
[1, 0, 1, 2]becomes:
{0, 1, 2}Then we scan through unique numbers:
for num in num_set:If the previous number exists, this number is somewhere in the middle of a sequence:
if num - 1 in num_set:
continueSo we skip it.
If the previous number does not exist, num is the start of a sequence.
We count forward:
length = 1
while num + length in num_set:
length += 1For num = 1 and num_set = {1, 2, 3, 4, 100, 200}, this checks:
2 exists
3 exists
4 exists
5 missingSo the length is 4.
Finally, we keep the largest length:
best = max(best, length)Testing
def run_tests():
s = Solution()
assert s.longestConsecutive([100, 4, 200, 1, 3, 2]) == 4
assert s.longestConsecutive([0, 3, 7, 2, 5, 8, 4, 6, 0, 1]) == 9
assert s.longestConsecutive([1, 0, 1, 2]) == 3
assert s.longestConsecutive([]) == 0
assert s.longestConsecutive([5]) == 1
assert s.longestConsecutive([10, 30, 20]) == 1
assert s.longestConsecutive([-2, -1, 0, 1, 5]) == 4
print("all tests passed")
run_tests()Test meaning:
| Test | Why |
|---|---|
[100, 4, 200, 1, 3, 2] | Standard unordered case |
[0, 3, 7, 2, 5, 8, 4, 6, 0, 1] | Long sequence with duplicate |
[1, 0, 1, 2] | Duplicate value should not extend length |
[] | Empty input |
[5] | Single element |
[10, 30, 20] | No consecutive pair |
[-2, -1, 0, 1, 5] | Negative numbers |