Detect natural runs in input and normalize them for efficient merging in Timsort.
Timsort run detection is the first phase of Timsort. It scans the array, detects natural runs, and normalizes them into a form suitable for efficient merging.
You use it as part of adaptive sorting where exploiting existing order matters.
Problem
Given an array ( A ), partition it into runs such that:
- each run is monotonic (non-decreasing or strictly decreasing)
- runs satisfy a minimum length constraint
Idea
Timsort improves on simple natural run detection by:
- Detecting both ascending and descending runs
- Reversing descending runs
- Extending short runs to a minimum size using insertion sort
This guarantees balanced merging later.
Algorithm
timsort_run_detection(A, min_run):
runs = []
n = length(A)
i = 0
while i < n:
j = i + 1
if j < n and A[j] < A[i]:
while j < n and A[j] < A[j-1]:
j = j + 1
reverse(A[i:j])
else:
while j < n and A[j] >= A[j-1]:
j = j + 1
run_length = j - i
if run_length < min_run:
end = min(i + min_run, n)
insertion_sort(A, i, end)
j = end
runs.append((i, j))
i = j
return runsExample
Input:
[ A = [5, 4, 3, 7, 8, 2, 1, 6] ]
Detected runs:
- [5, 4, 3] → reversed → [3, 4, 5]
- [7, 8]
- [2, 1] → reversed → [1, 2]
- [6]
After enforcing minimum run size, small runs may be extended.
Correctness
Each run is transformed into a non-decreasing sequence. Reversing preserves all elements. Extending with insertion sort produces sorted segments. These runs form a valid partition of the array and can be merged to produce a fully sorted result.
Complexity
| phase | cost |
|---|---|
| run detection | ( O(n) ) |
| extension | ( O(n) ) amortized |
Total overhead is linear before merging.
Why Minimum Run Matters
Without a minimum run size, many tiny runs would lead to inefficient merging. Timsort ensures:
- fewer runs
- balanced merge tree
- improved cache locality
Typical minimum run size:
[ 32 \le \text{min_run} \le 64 ]
When to Use
Run detection is useful when:
- input contains ordered segments
- adaptive sorting is desired
- stable and high performance sorting is required
This phase is a key reason why Timsort performs well on real world data.