Replace large or sparse keys with small dense ranks that preserve order, making range-based and indexed structures practical on wide-valued data.
Coordinate compression replaces large or sparse keys with small dense ranks. It preserves order while reducing the value range. This makes range-based algorithms, counting arrays, Fenwick trees, segment trees, and bucketed structures practical on data with large numeric keys.
Problem
You have values such as:
[1000000000, -7, 42, 1000000000, 500000]The values are too large or too sparse to use directly as array indices. You want to replace each distinct value with a compact integer rank while preserving order.
Solution
Collect the distinct values, sort them, and map each value to its index in the sorted distinct list.
coordinate_compress(A):
values = sorted(unique(A))
rank = empty map
for i = 0 to length(values) - 1:
rank[values[i]] = i
return [rank[x] for x in A]For the input:
[1000000000, -7, 42, 1000000000, 500000]The sorted distinct values are:
[-7, 42, 500000, 1000000000]The rank map is:
-7 -> 0
42 -> 1
500000 -> 2
1000000000 -> 3The compressed array is:
[3, 0, 1, 3, 2]Order Preservation
Coordinate compression preserves order:
x < y if and only if rank[x] < rank[y]This property is the reason compression is safe for comparison-based range logic. It allows an algorithm to use compact indices without changing relative ordering.
Compression does not preserve distances. If:
x = 10
y = 1000000their ranks may differ by only 1 if no value lies between them. Therefore coordinate compression is valid for order queries, but not for arithmetic that depends on numeric distance.
Example: Counting Inversions
To count inversions with a Fenwick tree, values must be usable as indices.
count_inversions(A):
C = coordinate_compress(A)
tree = Fenwick(size = number of distinct values)
inversions = 0
for i = length(C) - 1 down to 0:
inversions += tree.sum(C[i] - 1)
tree.add(C[i], 1)
return inversionsHere tree.sum(C[i] - 1) counts how many smaller values have appeared to the right of A[i].
The original values may be negative or huge. The compressed ranks are small nonnegative integers, so the Fenwick tree remains compact.
Correctness
The compression map is built from the sorted list of distinct values. Therefore, if x < y, then x appears before y in that list, so rank[x] < rank[y].
If rank[x] < rank[y], then x appears before y in the sorted distinct list, so x < y.
Equal values receive the same rank because the map is built from distinct values. Thus equality is preserved as well:
x = y if and only if rank[x] = rank[y]Any algorithm that depends only on order and equality can safely use compressed coordinates.
Complexity
Let:
n = number of input elements
m = number of distinct valuesBuilding the distinct sorted list costs:
O(n log n)or more precisely:
O(n log m)if values are inserted into an ordered set.
Building the rank map costs:
O(m)Mapping the input costs:
O(n)Total time is usually:
O(n log n)Space usage is:
O(n)or:
O(m)if the compressed output can overwrite the original array.
Lower Bound and Upper Bound Queries
Sometimes you need to map a value that was not present in the original array.
For example, given compressed coordinates from:
[10, 20, 50]you may need the first coordinate greater than or equal to 30.
Use binary search on the sorted distinct list:
lower_bound(values, x):
return first index i such that values[i] >= xFor x = 30, the result is index 2, which corresponds to value 50.
Similarly:
upper_bound(values, x):
return first index i such that values[i] > xThis is useful for offline range queries and interval endpoints.
Coordinate Compression for Intervals
For intervals, compress all endpoints:
intervals = [(l1, r1), (l2, r2), ...]
values = sorted(unique(all li and ri))Then replace each endpoint with its rank.
Be careful about interval semantics. Closed intervals, half-open intervals, and gaps between coordinates behave differently. If the algorithm must represent actual covered lengths, plain compression is not enough; you must also keep the original coordinate values and compute segment lengths from adjacent coordinates.
Implementation Notes
Use coordinate compression when:
values are large
values are sparse
only order matters
array indexing is neededDo not use it when:
numeric distance matters
missing coordinates must be represented explicitly
the value domain is already smallWhen the algorithm needs both compact indices and original values, keep the sorted distinct array. It serves as the inverse map:
original_value = values[rank]Common Bugs
A common bug is assuming compressed ranks preserve differences. They do not.
Another bug is giving duplicate values different ranks. Equal input values must compress to the same integer.
A third bug is compressing only left interval endpoints and forgetting right endpoints. Range algorithms usually need all relevant boundaries.
A fourth bug is using one-based and zero-based ranks inconsistently. Fenwick trees are often one-based internally, while compressed coordinates are naturally zero-based. Define the conversion explicitly.
Takeaway
Coordinate compression turns large sparse keys into small dense ranks while preserving order and equality. It is a bridge between arbitrary values and array-indexed data structures.