A clear explanation of maintaining disjoint sorted intervals from a stream using insertion and merging.
Problem Restatement
We need to design a data structure called SummaryRanges.
It receives non-negative integers from a data stream, one number at a time.
After each insertion, it should be able to return all numbers seen so far as a sorted list of disjoint intervals.
For example, if the stream gives:
1, 3, 7, 2, 6Then the intervals change like this:
after 1: [[1, 1]]
after 3: [[1, 1], [3, 3]]
after 7: [[1, 1], [3, 3], [7, 7]]
after 2: [[1, 3], [7, 7]]
after 6: [[1, 3], [6, 7]]The number 2 connects [1, 1] and [3, 3], so they become [1, 3].
The number 6 extends [7, 7] to [6, 7].
Input and Output
| Method | Meaning |
|---|---|
SummaryRanges() | Initializes an empty stream |
addNum(value) | Adds one integer to the stream |
getIntervals() | Returns the current disjoint intervals sorted by start |
The output of getIntervals() is a list of intervals:
[[start1, end1], [start2, end2], ...]Each interval is inclusive.
Examples
Start with an empty object:
summaryRanges = SummaryRanges()Add 1:
summaryRanges.addNum(1)
summaryRanges.getIntervals()Output:
[[1, 1]]Add 3:
summaryRanges.addNum(3)
summaryRanges.getIntervals()Output:
[[1, 1], [3, 3]]Add 7:
summaryRanges.addNum(7)
summaryRanges.getIntervals()Output:
[[1, 1], [3, 3], [7, 7]]Add 2:
summaryRanges.addNum(2)
summaryRanges.getIntervals()Output:
[[1, 3], [7, 7]]Add 6:
summaryRanges.addNum(6)
summaryRanges.getIntervals()Output:
[[1, 3], [6, 7]]First Thought: Store All Numbers
The simplest design is to store every number in a set.
When getIntervals() is called:
- Sort all numbers.
- Scan them from left to right.
- Merge consecutive numbers into intervals.
Code idea:
class SummaryRanges:
def __init__(self):
self.values = set()
def addNum(self, value: int) -> None:
self.values.add(value)
def getIntervals(self) -> list[list[int]]:
nums = sorted(self.values)
result = []
for x in nums:
if not result or result[-1][1] + 1 < x:
result.append([x, x])
else:
result[-1][1] = x
return resultThis is simple and correct.
But every call to getIntervals() sorts all numbers again.
If there are many queries, this becomes expensive.
Problem With Brute Force
Suppose we have k unique numbers.
| Operation | Cost |
|---|---|
addNum | O(1) average |
getIntervals | O(k log k) |
This design delays the work until query time.
A better design maintains the intervals incrementally when each number is added.
Then getIntervals() can simply return the current interval list.
Key Insight
The intervals are always sorted and disjoint.
When we insert a new value, it can only interact with the intervals immediately around its position.
There are four cases.
Case 1: the value is already inside an interval.
intervals = [[1, 5]]
value = 3Nothing changes.
Case 2: the value extends the left interval.
intervals = [[1, 3]]
value = 4Result:
[[1, 4]]Case 3: the value extends the right interval.
intervals = [[5, 8]]
value = 4Result:
[[4, 8]]Case 4: the value connects two intervals.
intervals = [[1, 2], [4, 5]]
value = 3Result:
[[1, 5]]So for every new value, we only need to find its insertion position among the current intervals.
Algorithm
Maintain a sorted list:
self.intervalsEach item is:
[start, end]For addNum(value):
- Use binary search to find the first interval whose start is greater than
value. - Let
ibe that position. - The possible left neighbor is
i - 1. - The possible right neighbor is
i. - Check whether
valueis already inside the left interval. - Check whether
valueconnects to the left interval. - Check whether
valueconnects to the right interval. - Merge or insert as needed.
For getIntervals():
Return the current intervals.
Correctness
The data structure starts empty, so the interval list is initially sorted and disjoint.
Assume before adding a value, all intervals are sorted and disjoint.
Binary search finds where the value belongs by start position. Therefore, only the interval immediately before that position and the interval at that position can overlap with or touch the new value.
No earlier interval can touch the value because it ends before the left neighbor. No later interval can touch the value because it starts after the right neighbor.
The algorithm handles all valid possibilities:
If the value lies inside the left interval, the set of represented numbers stays the same.
If the value touches the left interval, the algorithm extends that interval.
If the value touches the right interval, the algorithm extends that interval.
If the value touches both, the algorithm merges both intervals through the new value.
If it touches neither, the algorithm inserts [value, value].
After each case, the resulting intervals remain sorted and disjoint, and they represent exactly all numbers seen so far.
Therefore, every call to getIntervals() returns the correct summary.
Complexity
Let r be the number of current intervals.
| Operation | Time | Space |
|---|---|---|
addNum | O(log r + r) | O(1) extra |
getIntervals | O(r) | O(r) if returning a copy |
| Storage | O(r) | Stores only intervals |
The binary search costs O(log r).
Insertion or deletion in a Python list can cost O(r) because elements may shift.
In languages with a balanced tree map, addNum can be implemented in O(log r).
Implementation
from bisect import bisect_left
class SummaryRanges:
def __init__(self):
self.intervals = []
def addNum(self, value: int) -> None:
intervals = self.intervals
i = bisect_left(intervals, [value, value])
left_touch = i > 0 and intervals[i - 1][1] + 1 >= value
right_touch = i < len(intervals) and value + 1 >= intervals[i][0]
if left_touch and right_touch:
intervals[i - 1][1] = max(intervals[i - 1][1], intervals[i][1])
intervals.pop(i)
elif left_touch:
intervals[i - 1][1] = max(intervals[i - 1][1], value)
elif right_touch:
intervals[i][0] = min(intervals[i][0], value)
else:
intervals.insert(i, [value, value])
def getIntervals(self) -> list[list[int]]:
return self.intervalsCode Explanation
We keep intervals sorted by their start:
self.intervals = []For each new value, we find where [value, value] would be inserted:
i = bisect_left(intervals, [value, value])The left neighbor, if it exists, is:
intervals[i - 1]The right neighbor, if it exists, is:
intervals[i]The value touches or lies inside the left interval when:
intervals[i - 1][1] + 1 >= valueThis covers both cases:
[1, 3] with value 4
[1, 5] with value 3The value touches or lies before the right interval when:
value + 1 >= intervals[i][0]This covers:
[5, 7] with value 4If both sides touch, the value connects two intervals:
[[1, 2], [4, 5]], value = 3So we merge them:
intervals[i - 1][1] = max(intervals[i - 1][1], intervals[i][1])
intervals.pop(i)If only the left side touches, extend the left interval.
If only the right side touches, extend the right interval.
If neither side touches, insert a new single-value interval.
Testing
def run_tests():
sr = SummaryRanges()
assert sr.getIntervals() == []
sr.addNum(1)
assert sr.getIntervals() == [[1, 1]]
sr.addNum(3)
assert sr.getIntervals() == [[1, 1], [3, 3]]
sr.addNum(7)
assert sr.getIntervals() == [[1, 1], [3, 3], [7, 7]]
sr.addNum(2)
assert sr.getIntervals() == [[1, 3], [7, 7]]
sr.addNum(6)
assert sr.getIntervals() == [[1, 3], [6, 7]]
sr.addNum(3)
assert sr.getIntervals() == [[1, 3], [6, 7]]
sr.addNum(4)
assert sr.getIntervals() == [[1, 4], [6, 7]]
sr.addNum(5)
assert sr.getIntervals() == [[1, 7]]
print("all tests passed")
run_tests()Test meaning:
| Test | Why |
|---|---|
| Empty intervals | Checks initial state |
| Add isolated values | Creates separate intervals |
| Add bridge value | Merges two intervals |
| Add duplicate value | Should not change anything |
| Add value before interval | Extends right interval |
| Add final bridge | Merges everything into one interval |