# 6.18 Sorting Records

# 6.18 Sorting Records

Sorting records means sorting structured values by one or more fields. The record itself usually contains more information than the key used for ordering. The sorting algorithm must move the whole record while comparing only the selected fields.

## Problem

You have records such as:

```text id="nx9x1p"
(name, department, salary, hired_at)
```

You want to sort them by a rule such as:

```text id="7fks3x"
department ascending
salary descending
hired_at ascending
```

## Solution

Define a key that represents the intended order.

```text id="hsq5rb"
key(record):
  return (
    record.department,
    -record.salary,
    record.hired_at
  )
```

Then sort records by that key.

```text id="oqxino"
sort(records, key = key)
```

The sorted order is lexicographic. First compare `department`. If departments are equal, compare `salary` in descending order. If salaries are equal, compare `hired_at`.

## Example

Input:

```text id="wqgpya"
[
  (Ana,  Eng, 120, 2021),
  (Bo,   Ops, 100, 2020),
  (Cy,   Eng, 140, 2022),
  (Dee,  Eng, 120, 2019)
]
```

Sort by department ascending, salary descending, hired year ascending:

```text id="f8stnp"
[
  (Cy,   Eng, 140, 2022),
  (Dee,  Eng, 120, 2019),
  (Ana,  Eng, 120, 2021),
  (Bo,   Ops, 100, 2020)
]
```

`Cy` comes first among engineering records because salary `140` is highest. `Dee` comes before `Ana` because both have salary `120`, and `2019` comes before `2021`.

## Comparator Form

The same rule can be written as a comparator.

```text id="9dh7dz"
compare(a, b):
  if a.department < b.department: return -1
  if a.department > b.department: return 1

  if a.salary > b.salary: return -1
  if a.salary < b.salary: return 1

  if a.hired_at < b.hired_at: return -1
  if a.hired_at > b.hired_at: return 1

  return 0
```

This form is explicit, but longer. The key-extraction form is usually easier to read and test.

## Stability and Original Order

If the key does not include every field, multiple records may compare equal. A stable sort keeps equal-key records in their input order.

For example, if you sort only by department:

```text id="6q1cnr"
stable_sort(records, key = department)
```

then employees within the same department remain in their original order.

With an unstable sort, that internal order may change.

## Multi-Pass Sorting

Stable sorting allows multi-pass record sorting.

To sort by department first and salary second, apply the less important key first:

```text id="xt7qrq"
stable_sort(records, key = salary)
stable_sort(records, key = department)
```

The final order is by department. Within each department, salary order remains because the second pass is stable.

For descending salary, use:

```text id="ls5lyi"
stable_sort(records, key = salary, descending = true)
stable_sort(records, key = department)
```

## Correctness

The key function maps each record to an ordering value. Sorting by the key is correct when the key order matches the desired record order.

For two records `a` and `b`, the output order is determined by:

```text id="c58suc"
key(a) <= key(b)
```

If the key is lexicographic, earlier key components dominate later ones. Later components only matter when earlier components are equal.

The permutation condition holds because the algorithm moves records, not only keys. Every output record must be one of the input records.

## Preserving Record Identity

Do not sort keys separately from records unless you maintain a link between them.

Incorrect pattern:

```text id="9y92ts"
keys = [key(r) for r in records]
sort(keys)
```

This loses the association between keys and records.

Correct pattern:

```text id="0e40yo"
pairs = [(key(r), r) for r in records]
sort(pairs)
records = [r for (_, r) in pairs]
```

This is the decorate-sort-undecorate pattern.

## Handling Missing Fields

Real records often contain missing values. Define a policy before sorting.

Common policies:

```text id="y1o4qw"
missing values first
missing values last
missing values treated as zero
missing values rejected as invalid
```

The policy must be consistent. Inconsistent handling can break comparator transitivity.

Example:

```text id="d61oje"
key(record):
  if record.salary is missing:
    return (record.department, 1, 0)
  else:
    return (record.department, 0, -record.salary)
```

Here the second key component places present salaries before missing salaries.

## Complexity

If sorting uses comparison sort, the number of comparisons is usually:

```text id="0ymb1s"
O(n log n)
```

If key extraction costs `c`, extracting the key inside every comparison can be expensive.

A comparator may compute keys repeatedly:

```text id="uubb3y"
O(c n log n)
```

Decorating records computes each key once:

```text id="6g1d8a"
O(c n)
```

then sorts decorated pairs:

```text id="qhz8y0"
O(n log n)
```

This is often faster when key extraction is nontrivial.

## Implementation Notes

Use key extraction when the language supports it. It is concise, testable, and usually safer than hand-written comparator code.

Use comparators when the order cannot be expressed cleanly as a tuple key, or when comparison must short-circuit expensive fields.

Keep the original index if stable behavior is required but the sort implementation is unstable:

```text id="2x3cbn"
key(record, index):
  return (primary_key(record), index)
```

This forces equal keys to preserve input order.

## Common Bugs

A common bug is sorting only the key array and forgetting to move records with their keys.

Another bug is reversing the whole tuple when only one field should be descending. For mixed ascending and descending fields, transform only the descending component.

A third bug is leaving missing values undefined. Sorting must know how to compare every pair of records.

A fourth bug is using an unstable sort for multi-pass sorting. The second pass may destroy the order created by the first pass.

## Takeaway

Sorting records is mostly about defining the right key. Once the key is correct, ordinary sorting machinery applies. Stability, missing-value policy, and key extraction cost determine the practical implementation.

