# Timsort

# Timsort

Timsort is a hybrid stable sorting algorithm designed for real world data. It combines ideas from merge sort and insertion sort, with strong emphasis on exploiting existing order.

It is the default sorting algorithm in languages such as Python and Java (for objects). Its design focuses on practical performance rather than purely theoretical bounds.

## Problem

Given an array $A$ of length $n$, reorder it such that:

$$
A[0] \le A[1] \le \dots \le A[n-1]
$$

while preserving stability.

## Core Ideas

Timsort builds on three key ideas:

| idea           | purpose                          |
| -------------- | -------------------------------- |
| run detection  | identify already sorted segments |
| insertion sort | efficiently extend short runs    |
| merge strategy | merge runs with size invariants  |

### Run

A run is a monotonic sequence:

* increasing: $A[i] \le A[i+1]$
* or decreasing (reversed to increasing)

Short runs are extended to a minimum size using insertion sort.

### Stack of Runs

Runs are pushed onto a stack. Timsort enforces size constraints between runs to ensure efficient merging.

Typical invariant:

$$
|X| > |Y| + |Z|
$$

for the top runs $X, Y, Z$ on the stack.

## Algorithm

```text id="7v2qpf"
timsort(A):
    runs = []

    while not at end of A:
        run = detect_run(A)

        if length(run) < min_run:
            extend run using insertion sort

        push run onto stack
        maintain_merge_invariants(runs)

    merge all runs on stack
```

Merge invariants ensure that runs are merged in a balanced way, avoiding worst case behavior.

## Example

Let:

$$
A = [1, 2, 3, 7, 6, 5, 8, 9]
$$

Detected runs:

* [1, 2, 3, 7]
* [6, 5] → reversed → [5, 6]
* [8, 9]

Stack:

| run       | size |
| --------- | ---- |
| [1,2,3,7] | 4    |
| [5,6]     | 2    |
| [8,9]     | 2    |

Merging proceeds based on invariants:

Final result:

$$
[1,2,3,5,6,7,8,9]
$$

## Correctness

Each run is sorted by construction. The merge step preserves order and stability. The stack invariants guarantee that merges occur in a structured way, preventing unbalanced recursion patterns. By induction over the merge process, the final array is sorted.

## Complexity

| case                  | time          |
| --------------------- | ------------- |
| best (already sorted) | $O(n)$        |
| worst                 | $O(n \log n)$ |
| average               | $O(n \log n)$ |

Space:

$$
O(n)
$$

## Properties

| property              | value     |
| --------------------- | --------- |
| stable                | yes       |
| adaptive              | yes       |
| in-place              | no        |
| practical performance | excellent |

## Notes

Timsort is highly optimized for partially sorted data. It minimizes comparisons and memory movement in typical workloads.

A key optimization is **galloping mode**, where merging switches to exponential search when one run dominates comparisons.

This algorithm is widely used in standard libraries due to its strong real world performance.

## Implementation

A full Timsort implementation is large. The following simplified version shows the structure without all optimizations.

```python id="p8s2kx"
def timsort(a):
    min_run = 32
    n = len(a)

    def insertion_sort(left, right):
        for i in range(left + 1, right + 1):
            key = a[i]
            j = i - 1
            while j >= left and a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = key

    size = min_run
    for start in range(0, n, size):
        end = min(start + size - 1, n - 1)
        insertion_sort(start, end)

    size = min_run
    while size < n:
        for left in range(0, n, 2 * size):
            mid = min(left + size - 1, n - 1)
            right = min(left + 2 * size - 1, n - 1)

            if mid < right:
                merge(a, left, mid, right)

        size *= 2

    return a

def merge(a, l, mid, r):
    left = a[l:mid+1]
    right = a[mid+1:r+1]

    i = j = 0
    k = l

    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            a[k] = left[i]
            i += 1
        else:
            a[k] = right[j]
            j += 1
        k += 1

    while i < len(left):
        a[k] = left[i]
        i += 1
        k += 1

    while j < len(right):
        a[k] = right[j]
        j += 1
        k += 1
```

