# 6.25 Choosing the Right Sort

There is no single best sorting algorithm. The correct choice depends on input size, data characteristics, memory constraints, and required guarantees such as stability or worst-case bounds.

## Problem

You need to select a sorting algorithm for a specific workload. The goal is to balance performance, memory usage, stability, and implementation complexity.

## Decision Factors

Key dimensions that determine the choice:

```text id="f2j1kx"
input size n
data distribution
key type and range
stability requirement
memory constraints
need for worst-case guarantees
parallel or external environment
```

## Quick Reference Table

| Scenario                        | Recommended Approach             |
| ------------------------------- | -------------------------------- |
| General-purpose in-memory sort  | quick sort or hybrid (introsort) |
| Need stability                  | merge sort or Timsort            |
| Small arrays                    | insertion sort                   |
| Nearly sorted data              | insertion sort or adaptive sort  |
| Integer keys with small range   | counting sort                    |
| Fixed-width integers            | radix sort                       |
| Floating-point in [0,1) uniform | bucket sort                      |
| Top k elements                  | heap or quickselect              |
| External data (disk)            | external merge sort              |
| Parallel environment            | parallel merge or sample sort    |

## General-Purpose Sorting

For most in-memory cases:

```text id="0b8y9t"
introsort = quick sort + heap sort fallback + insertion sort for small ranges
```

This combination provides:

* fast average performance
* protection against worst-case behavior
* low memory usage

It is used in many standard libraries.
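The scheme above can be sketched in Python. This is an illustrative implementation, not a production one: the small-range threshold (16), the pivot choice, and the use of `heapq` for the fallback are all simplifications.

```python
import heapq
import math

def introsort(a):
    """Sort the list a in place: quick sort, with a heap sort fallback
    when recursion gets too deep and insertion sort for small ranges."""
    max_depth = 2 * max(1, math.floor(math.log2(len(a)))) if a else 0
    _introsort(a, 0, len(a), max_depth)

def _introsort(a, lo, hi, depth):
    while hi - lo > 16:            # small ranges fall through to insertion sort
        if depth == 0:             # recursion too deep: heap sort fallback
            _heapsort(a, lo, hi)
            return
        depth -= 1
        p = _partition(a, lo, hi)
        _introsort(a, p + 1, hi, depth)   # recurse on the right half
        hi = p                            # loop on the left half

    _insertion_sort(a, lo, hi)

def _partition(a, lo, hi):
    mid = (lo + hi) // 2
    a[mid], a[hi - 1] = a[hi - 1], a[mid]   # middle element as pivot
    pivot = a[hi - 1]
    i = lo
    for j in range(lo, hi - 1):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi - 1] = a[hi - 1], a[i]
    return i

def _heapsort(a, lo, hi):
    # Sketch only: a true introsort heap-sorts the subarray in place,
    # but heapq on a copied slice keeps the example short.
    h = a[lo:hi]
    heapq.heapify(h)
    for i in range(lo, hi):
        a[i] = heapq.heappop(h)

def _insertion_sort(a, lo, hi):
    for i in range(lo + 1, hi):
        key = a[i]
        j = i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
```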

## When Stability Matters

Choose a stable algorithm when equal-key order is important.

```text id="m1q3sw"
merge sort
Timsort
counting sort (stable variant)
radix sort (with stable digit sort)
```

Use cases:

* multi-key sorting
* records with secondary fields
* user-visible ordering
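Stability is what makes multi-key sorting composable: sort by the secondary key first, then by the primary key, and the second pass preserves the first pass's order among equal primary keys. Python's built-in sort is Timsort and is stable, so this works directly (the records below are made-up sample data):

```python
# Hypothetical records: (name, department, seniority)
records = [
    ("alice", "sales", 3),
    ("bob",   "eng",   1),
    ("carol", "sales", 1),
    ("dave",  "eng",   3),
]

# Pass 1: sort by the secondary key (seniority).
by_seniority = sorted(records, key=lambda r: r[2])

# Pass 2: sort by the primary key (department). Because Timsort is
# stable, seniority order is preserved within each department.
by_dept_then_seniority = sorted(by_seniority, key=lambda r: r[1])
```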

## When Memory is Limited

Use in-place algorithms:

```text id="y6g5tz"
quick sort
heap sort
```

Heap sort gives strong worst-case guarantees:

```text id="m7f1rs"
O(n log n) time
O(1) extra space
```

Quick sort is usually faster in practice, but it needs safeguards (randomized or median-of-three pivot selection, or a depth-limited fallback as in introsort) to avoid quadratic worst-case behavior on adversarial input.
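A minimal in-place heap sort in Python, showing the O(1)-extra-space property: the heap lives inside the array being sorted.

```python
def heapsort(a):
    """In-place heap sort: O(n log n) worst case, O(1) extra space."""
    n = len(a)

    def sift_down(root, end):
        # Restore the max-heap property for a[root:end].
        while 2 * root + 1 < end:
            child = 2 * root + 1
            if child + 1 < end and a[child] < a[child + 1]:
                child += 1                      # pick the larger child
            if a[root] < a[child]:
                a[root], a[child] = a[child], a[root]
                root = child
            else:
                return

    for start in range(n // 2 - 1, -1, -1):     # build a max-heap bottom-up
        sift_down(start, n)
    for end in range(n - 1, 0, -1):             # repeatedly move the max to the end
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
```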

## When Data is Nearly Sorted

Exploit structure:

```text id="p4q8vb"
insertion sort → O(n + inversions)
Timsort → detects runs and merges efficiently
```

These can approach linear time.
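Insertion sort's adaptivity is visible in the code: the inner loop runs once per inversion, so input with few out-of-place elements costs close to O(n).

```python
def insertion_sort(a):
    """In-place insertion sort: O(n + I) time, where I is the number
    of inversions. Nearly sorted input is close to linear."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:   # each shift removes one inversion
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
```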

## When Keys Have Structure

Use non-comparison sorting.

```text id="6s8t9e"
counting sort → small integer range
radix sort → fixed-width integers or strings
bucket sort → uniform distribution
```

These achieve linear or near-linear time by avoiding comparisons.
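A stable counting sort illustrates the idea: keys index directly into a count array, and a left-to-right output pass keeps equal keys in their original order (which is also what radix sort relies on per digit).

```python
def counting_sort(keys, k):
    """Stable counting sort for integer keys in range(k): O(n + k)."""
    count = [0] * k
    for x in keys:
        count[x] += 1

    # Exclusive prefix sums: count[x] becomes the index of the first
    # output slot for key x.
    total = 0
    for x in range(k):
        count[x], total = total, total + count[x]

    out = [0] * len(keys)
    for x in keys:          # left-to-right pass keeps equal keys in order
        out[count[x]] = x
        count[x] += 1
    return out
```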

## When Only Part of the Order is Needed

Do not sort everything.

```text id="n5h2zy"
top k → heap or quickselect
median → quickselect
```

This reduces unnecessary work.
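Both approaches in Python: `heapq.nlargest` handles top-k in O(n log k), and quickselect finds an order statistic in expected O(n). The quickselect here is a sketch that trades in-place partitioning for clarity.

```python
import heapq
import random

def quickselect(a, k):
    """Return the k-th smallest element (0-based) in expected O(n) time.
    Sketch: partition around a random pivot, recurse into one side only."""
    pivot = random.choice(a)
    lows   = [x for x in a if x < pivot]
    pivots = [x for x in a if x == pivot]
    highs  = [x for x in a if x > pivot]
    if k < len(lows):
        return quickselect(lows, k)
    if k < len(lows) + len(pivots):
        return pivot
    return quickselect(highs, k - len(lows) - len(pivots))

data = [17, 4, 23, 8, 42, 15, 16]
top3 = heapq.nlargest(3, data)               # O(n log k), not O(n log n)
median = quickselect(data, len(data) // 2)   # expected O(n)
```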

## External Sorting

For datasets larger than memory:

```text id="z1p3kd"
external merge sort
```

Design for:

* sequential disk access
* large block transfers
* minimal passes over data
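The design points above can be sketched in Python for a toy input format (one integer per line; `chunk_lines` is an illustrative stand-in for "what fits in memory"). Chunks are sorted in memory, spilled as sorted runs, then combined in a single k-way merge with `heapq.merge`:

```python
import heapq
import os
import tempfile

def external_sort(infile, outfile, chunk_lines=100_000):
    """External merge sort sketch for a file of one integer per line."""
    # Phase 1: read chunks that fit in memory, sort, spill as runs.
    runs = []
    with open(infile) as f:
        while True:
            chunk = [int(line) for _, line in zip(range(chunk_lines), f)]
            if not chunk:
                break
            chunk.sort()
            tmp = tempfile.NamedTemporaryFile(
                "w", delete=False, suffix=".run")
            tmp.writelines(f"{x}\n" for x in chunk)
            tmp.close()
            runs.append(tmp.name)

    # Phase 2: single k-way merge over all runs (one sequential pass).
    files = [open(r) for r in runs]
    try:
        streams = ((int(line) for line in fh) for fh in files)
        with open(outfile, "w") as out:
            out.writelines(f"{x}\n" for x in heapq.merge(*streams))
    finally:
        for fh in files:
            fh.close()
        for r in runs:
            os.remove(r)
```

A real implementation would add buffered block I/O and possibly multi-pass merging when the number of runs exceeds the open-file or memory budget.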

## Parallel Sorting

Use algorithms that divide work cleanly:

```text id="x9c7vu"
parallel merge sort
parallel quick sort
sample sort (distributed)
radix sort (GPU)
```

Focus on load balancing and minimizing synchronization.
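The decomposition common to these algorithms (split, sort chunks concurrently, merge) can be sketched as follows. Note this uses CPython threads, so it illustrates the structure rather than a real speedup; actual CPU parallelism needs processes or a runtime without a global interpreter lock.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def parallel_sort(data, workers=4):
    """Sketch of the parallel merge-sort structure: partition into
    roughly equal chunks (load balancing), sort chunks concurrently,
    then k-way merge the sorted runs in one sequential pass."""
    if not data:
        return []
    step = -(-len(data) // workers)          # ceiling division
    chunks = [data[i:i + step] for i in range(0, len(data), step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        runs = list(pool.map(sorted, chunks))
    return list(heapq.merge(*runs))
```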

## Practical Defaults

In most environments:

```text id="d2k4nv"
use the language’s built-in sort
```

Built-in sorts are highly optimized, tested, and often adaptive.

Examples:

* Timsort for objects (Python's `sorted`, Java's `Arrays.sort` for objects)
* introsort for primitive types (C++ `std::sort`)

Custom implementations are justified when:

* constraints differ from standard assumptions
* specialized key structure exists
* the goal is learning or research

## Tradeoffs Summary

| Algorithm      | Time           | Space    | Stable | Notes                           |
| -------------- | -------------- | -------- | ------ | ------------------------------- |
| Insertion sort | O(n²)          | O(1)     | Yes    | fast for small or nearly sorted |
| Merge sort     | O(n log n)     | O(n)     | Yes    | predictable, stable             |
| Quick sort     | O(n log n) avg | O(log n) | No     | fast in practice                |
| Heap sort      | O(n log n)     | O(1)     | No     | worst-case guarantee            |
| Counting sort  | O(n + k)       | O(n + k) | Yes    | small integer keys              |
| Radix sort     | O(d(n + b))    | O(n + b) | Yes    | structured keys                 |
| Bucket sort    | O(n) avg       | O(n)     | Yes    | uniform distribution            |

## Common Mistakes

* Using quick sort without pivot safeguards on adversarial input
* Using counting sort when key range is large
* Ignoring stability requirements
* Sorting entire data when only top k is needed
* Reimplementing standard algorithms without necessity

## Selection Strategy

A practical decision process:

```text id="c7t9wr"
if n is small:
  use insertion sort

else if keys are integers with small range:
  use counting or radix sort

else if stability required:
  use merge sort or Timsort

else if memory constrained:
  use quick sort or heap sort

else:
  use built-in hybrid sort
```

Adjust based on environment and constraints.
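The decision process above can be encoded as a small helper. The thresholds here (small-array cutoff, key-range-to-n ratio) are illustrative, not canonical:

```python
def choose_sort(n, key_range=None, stable=False, memory_limited=False,
                small_threshold=32):
    """Return a suggested algorithm for the given constraints,
    following the decision process above. Thresholds are illustrative."""
    if n <= small_threshold:
        return "insertion sort"
    if key_range is not None and key_range <= 10 * n:
        return "counting or radix sort"     # keys are small-range integers
    if stable:
        return "merge sort or Timsort"
    if memory_limited:
        return "quick sort or heap sort"
    return "built-in hybrid sort"
```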

## Takeaway

Choosing the right sort is a decision problem. Match algorithm properties to data characteristics and system constraints. When in doubt, rely on well-engineered standard library implementations.

