# 6.19 Coordinate Compression

# 6.19 Coordinate Compression

Coordinate compression replaces large or sparse keys with small dense ranks. It preserves order while reducing the value range. This makes range-based algorithms, counting arrays, Fenwick trees, segment trees, and bucketed structures practical on data with large numeric keys.

## Problem

You have values such as:

```text
[1000000000, -7, 42, 1000000000, 500000]
```

The values are too large or too sparse to use directly as array indices. You want to replace each distinct value with a compact integer rank while preserving order.

## Solution

Collect the distinct values, sort them, and map each value to its index in the sorted distinct list.

```text
coordinate_compress(A):
  values = sorted(unique(A))
  rank = empty map

  for i = 0 to length(values) - 1:
    rank[values[i]] = i

  return [rank[x] for x in A]
```

For the input:

```text
[1000000000, -7, 42, 1000000000, 500000]
```

The sorted distinct values are:

```text
[-7, 42, 500000, 1000000000]
```

The rank map is:

```text
-7         -> 0
42         -> 1
500000     -> 2
1000000000 -> 3
```

The compressed array is:

```text
[3, 0, 1, 3, 2]
```

## Order Preservation

Coordinate compression preserves order:

```text
x < y  if and only if  rank[x] < rank[y]
```

This property is the reason compression is safe for comparison-based range logic. It allows an algorithm to use compact indices without changing relative ordering.

Compression does not preserve distances. If:

```text
x = 10
y = 1000000
```

their ranks may differ by only `1` if no value lies between them. Therefore coordinate compression is valid for order queries, but not for arithmetic that depends on numeric distance.

## Example: Counting Inversions

To count inversions with a Fenwick tree, values must be usable as indices.

```text
count_inversions(A):
  C = coordinate_compress(A)
  tree = Fenwick(size = number of distinct values)
  inversions = 0

  for i = length(C) - 1 down to 0:
    inversions += tree.sum(C[i] - 1)
    tree.add(C[i], 1)

  return inversions
```

Here `tree.sum(C[i] - 1)` counts how many smaller values have appeared to the right of `A[i]`.

The original values may be negative or huge. The compressed ranks are small nonnegative integers, so the Fenwick tree remains compact.

## Correctness

The compression map is built from the sorted list of distinct values. Therefore, if `x < y`, then `x` appears before `y` in that list, so `rank[x] < rank[y]`.

If `rank[x] < rank[y]`, then `x` appears before `y` in the sorted distinct list, so `x < y`.

Equal values receive the same rank because the map is built from distinct values. Thus equality is preserved as well:

```text
x = y  if and only if  rank[x] = rank[y]
```

Any algorithm that depends only on order and equality can safely use compressed coordinates.

## Complexity

Let:

```text
n = number of input elements
m = number of distinct values
```

Building the distinct sorted list costs:

```text
O(n log n)
```

or more precisely:

```text
O(n log m)
```

if values are inserted into an ordered set.

Building the rank map costs:

```text
O(m)
```

Mapping the input costs:

```text
O(n)
```

Total time is usually:

```text
O(n log n)
```

Space usage is:

```text
O(n)
```

or:

```text
O(m)
```

if the compressed output can overwrite the original array.

## Lower Bound and Upper Bound Queries

Sometimes you need to map a value that was not present in the original array.

For example, given compressed coordinates from:

```text
[10, 20, 50]
```

you may need the first coordinate greater than or equal to `30`.

Use binary search on the sorted distinct list:

```text
lower_bound(values, x):
  return first index i such that values[i] >= x
```

For `x = 30`, the result is index `2`, which corresponds to value `50`.

Similarly:

```text
upper_bound(values, x):
  return first index i such that values[i] > x
```

This is useful for offline range queries and interval endpoints.

## Coordinate Compression for Intervals

For intervals, compress all endpoints:

```text
intervals = [(l1, r1), (l2, r2), ...]
values = sorted(unique(all li and ri))
```

Then replace each endpoint with its rank.

Be careful about interval semantics. Closed intervals, half-open intervals, and gaps between coordinates behave differently. If the algorithm must represent actual covered lengths, plain compression is not enough; you must also keep the original coordinate values and compute segment lengths from adjacent coordinates.

## Implementation Notes

Use coordinate compression when:

```text
values are large
values are sparse
only order matters
array indexing is needed
```

Do not use it when:

```text
numeric distance matters
missing coordinates must be represented explicitly
the value domain is already small
```

When the algorithm needs both compact indices and original values, keep the sorted distinct array. It serves as the inverse map:

```text
original_value = values[rank]
```

## Common Bugs

A common bug is assuming compressed ranks preserve differences. They do not.

Another bug is giving duplicate values different ranks. Equal input values must compress to the same integer.

A third bug is compressing only left interval endpoints and forgetting right endpoints. Range algorithms usually need all relevant boundaries.

A fourth bug is using one-based and zero-based ranks inconsistently. Fenwick trees are often one-based internally, while compressed coordinates are naturally zero-based. Define the conversion explicitly.

## Takeaway

Coordinate compression turns large sparse keys into small dense ranks while preserving order and equality. It is a bridge between arbitrary values and array-indexed data structures.

