Exploit near-sorted structure in inputs like append-only logs or incremental updates to sort in linear or near-linear time.
Nearly sorted inputs appear frequently in practice. Logs appended over time, incremental updates, and cached results often differ only slightly from sorted order. Algorithms that exploit this structure can run in linear or near-linear time.
Problem
You have an array A[0..n-1] that is close to sorted. You want to sort it faster than general-purpose O(n log n) algorithms.
Measuring “Nearly Sorted”
Several measures capture how far an array is from being sorted.
Number of inversions
An inversion is a pair (i, j) with i < j and A[i] > A[j]. A sorted array has zero inversions.
Maximum displacement
Each element sits at most d positions from its final sorted location.
Runs
The array can be decomposed into a small number of already sorted segments.
Different algorithms exploit different measures.
Insertion Sort on Nearly Sorted Data
Insertion sort adapts to low inversion count.
For each element, the cost is proportional to how far it must move left. If most elements are already near their correct positions, the total work is small.
Time complexity becomes:
O(n + number_of_inversions)

If inversions are few, this is close to linear.
Example
[1, 2, 3, 5, 4, 6, 7]

Only one inversion exists: (5, 4). Insertion sort shifts 5 once and finishes quickly.
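The adaptive behavior is easy to see in code. A minimal Python sketch (the function name is illustrative):

```python
def insertion_sort(a):
    """Sort a list in place; runs in O(n + inversions) time."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Shift larger elements right; each shift removes exactly one inversion.
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a
```

On the example above, the inner while loop executes only once in total, so the whole sort does linear work.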
Heap-Based Method for Bounded Displacement
If each element is at most d positions away from its correct position, use a min-heap of size d + 1.
```
sort_k_sorted(A, d):
    heap = first d + 1 elements of A
    for i = d + 1 to n - 1:
        output extract_min(heap)
        insert A[i] into heap
    while heap not empty:
        output extract_min(heap)
```

Invariant
At each step, the smallest remaining element must lie within the next d + 1 elements of the input. The heap stores exactly those candidates.
Complexity
Each heap operation costs:
O(log d)

Total time:

O(n log d)

When d is small, this is close to linear.
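A sketch of this method in Python, using the standard heapq module (the function name sort_d_sorted is illustrative):

```python
import heapq

def sort_d_sorted(a, d):
    """Sort a list where every element is at most d positions from its
    sorted position, in O(n log d) time with a (d+1)-element min-heap."""
    heap = list(a[:d + 1])
    heapq.heapify(heap)
    out = []
    for x in a[d + 1:]:
        # The smallest remaining element must be inside the current window.
        out.append(heapq.heappop(heap))
        heapq.heappush(heap, x)
    while heap:                      # drain the last d + 1 candidates
        out.append(heapq.heappop(heap))
    return out
```

If d is passed as n - 1, this degrades gracefully into an ordinary O(n log n) heap sort, so overestimating d costs only time, never correctness.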
Run-Based Sorting
If the array consists of sorted runs, detect them and merge.
- identify runs
- merge runs using a k-way merge

If there are r runs, merging costs:

O(n log r)

This is efficient when r is small.
This idea is used in practical algorithms such as Timsort, which detects natural runs and merges them.
Adaptive Merge Strategy
Instead of fixed-size splits, scan the array to find increasing or decreasing segments:
[1, 2, 3] [9, 8, 7] [10, 11]

Reverse decreasing runs to make them increasing:

[1, 2, 3] [7, 8, 9] [10, 11]

Then merge these runs efficiently.
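One way to sketch run detection with reversal in Python, merging the runs with the standard heapq.merge (helper names are illustrative; a greedy scan may draw run boundaries slightly differently than the example above, which does not affect correctness):

```python
import heapq

def find_runs(a):
    """Split a into maximal sorted runs, reversing descending ones."""
    runs, i, n = [], 0, len(a)
    while i < n:
        j = i + 1
        if j < n and a[j] < a[i]:              # strictly descending run
            while j < n and a[j] < a[j - 1]:
                j += 1
            runs.append(a[i:j][::-1])          # reverse to make it ascending
        else:                                  # non-descending run
            while j < n and a[j] >= a[j - 1]:
                j += 1
            runs.append(a[i:j])
        i = j
    return runs

def run_merge_sort(a):
    """Sort by detecting r runs and k-way merging them: O(n log r)."""
    return list(heapq.merge(*find_runs(a)))
```

Note that only strictly descending segments are reversed; this keeps the reversal stable for equal elements, the same convention Timsort uses.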
Correctness
Each method relies on a structural invariant:
- Insertion sort maintains a sorted prefix and shifts only necessary elements
- Heap method maintains a candidate window containing the next correct element
- Run-based methods maintain sorted subarrays and merge them correctly
All methods preserve permutation because they only move existing elements.
Choosing a Strategy
| Property | Recommended method |
|---|---|
| Few inversions | Insertion sort |
| Bounded displacement d | Heap-based sort |
| Few sorted runs | Run detection + merge |
| Unknown structure | Hybrid adaptive sort |
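Before committing to a method, it can help to measure the structure directly. A sketch of two such diagnostics in Python (function names are illustrative): counting inversions with a merge-sort pass, and computing the maximum displacement:

```python
def count_inversions(a):
    """Count inversions in O(n log n) via merge sort."""
    def sort(xs):
        if len(xs) <= 1:
            return xs, 0
        mid = len(xs) // 2
        left, li = sort(xs[:mid])
        right, ri = sort(xs[mid:])
        merged, inv, i, j = [], li + ri, 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                inv += len(left) - i   # every remaining left element exceeds right[j]
                merged.append(right[j]); j += 1
        merged += left[i:] + right[j:]
        return merged, inv
    return sort(list(a))[1]

def max_displacement(a):
    """Largest distance any element sits from its sorted position."""
    # pos[i] = original index of the element that belongs at sorted position i
    pos = sorted(range(len(a)), key=lambda i: (a[i], i))
    return max((abs(i - p) for i, p in enumerate(pos)), default=0)
```

A cheap sampled estimate of the inversion count is often enough to decide between insertion sort and a general-purpose algorithm.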
Practical Algorithms
Modern standard libraries use adaptive algorithms.
Timsort, used in Python and Java, combines:
- run detection
- insertion sort for small runs
- merge sort for combining runs

It guarantees:

- O(n log n) worst case
- O(n) on nearly sorted input

Common Bugs
A common mistake is assuming “nearly sorted” without measuring it. Algorithms like insertion sort degrade to quadratic time if the assumption is false.
Another bug is choosing d incorrectly in the heap method. If d is underestimated, the true minimum can lie outside the heap's window, and the output will be out of order; overestimating d only costs extra time.
A third bug is failing to reverse descending runs before merging, which breaks the assumption that runs are sorted.
Takeaway
Nearly sorted data allows algorithms to skip unnecessary work. By exploiting structure such as few inversions, bounded displacement, or existing runs, sorting can approach linear time in practice.