6.20 Inversion Counting

An inversion is a pair of positions that appears in the opposite order from the final sorted order. Inversion counting measures how far an array is from being sorted. It is useful for ranking distance, sorting analysis, permutation comparison, and adaptive algorithm design.

Problem

You have an array A[0..n-1]. You want to count the number of pairs (i, j) such that:

i < j and A[i] > A[j]

A sorted array has zero inversions. A reverse sorted array has the maximum number of inversions:

n(n - 1) / 2

Example

For:

A = [2, 4, 1, 3]

The inversions are:

(2, 1)
(4, 1)
(4, 3)

So the inversion count is:

Brute Force Baseline

The direct method checks every pair.

count_inversions_bruteforce(A):
  count = 0

  for i = 0 to n - 1:
    for j = i + 1 to n - 1:
      if A[i] > A[j]:
        count = count + 1

  return count

This is easy to verify but slow.

time:  O(n^2)
space: O(1)

It is useful as a test oracle for small arrays.

Merge Sort Method

Inversions can be counted during merge sort. The key observation is that when merging two sorted halves, if an element from the right half is smaller than an element from the left half, then it forms inversions with all remaining elements in the left half.

count_inversions(A):
  buffer = new array of length n
  return sort_count(A, buffer, 0, n)

sort_count(A, buffer, lo, hi):
  if hi - lo <= 1:
    return 0

  mid = lo + (hi - lo) / 2

  left_count = sort_count(A, buffer, lo, mid)
  right_count = sort_count(A, buffer, mid, hi)
  split_count = merge_count(A, buffer, lo, mid, hi)

  return left_count + right_count + split_count

merge_count(A, buffer, lo, mid, hi):
  i = lo
  j = mid
  k = lo
  count = 0

  while i < mid and j < hi:
    if A[i] <= A[j]:
      buffer[k] = A[i]
      i = i + 1
    else:
      buffer[k] = A[j]
      count = count + (mid - i)
      j = j + 1

    k = k + 1

  copy remaining A[i..mid-1] into buffer
  copy remaining A[j..hi-1] into buffer
  copy buffer[lo..hi-1] back into A[lo..hi-1]

  return count

Why the Merge Count Works

During merge, both halves are already sorted. Suppose:

A[i] > A[j]

where i is in the left half and j is in the right half. Since the left half is sorted, every element from A[i] through A[mid-1] is also greater than A[j].

Therefore A[j] creates:

mid - i

split inversions at once.

This avoids checking each pair separately.

Correctness

The recursive calls count inversions entirely inside the left half and entirely inside the right half.

The merge step counts split inversions, where the first element lies in the left half and the second lies in the right half.

Every inversion belongs to exactly one of these three categories:

inside the left half
inside the right half
across the split

The categories are disjoint, so their counts can be added.

The merge step also sorts the range, which is required by the parent call. Thus the algorithm simultaneously maintains sorted ranges and accumulates inversion counts.

Complexity

The recurrence is the same as merge sort:

T(n) = 2T(n/2) + O(n)

So the running time is:

O(n log n)

The auxiliary space is:

O(n)

for the merge buffer.

Fenwick Tree Method

Another approach uses coordinate compression and a Fenwick tree.

Scan from right to left. For each value, count how many smaller values have already appeared to its right.

count_inversions_fenwick(A):
  C = coordinate_compress(A)
  tree = Fenwick(size = number of distinct values)
  count = 0

  for i = n - 1 down to 0:
    count = count + tree.sum(C[i] - 1)
    tree.add(C[i], 1)

  return count

Here tree.sum(C[i] - 1) returns the number of elements to the right that are smaller than A[i].

Handling Duplicates

Duplicates are not inversions when the definition uses A[i] > A[j].

This is why the merge method uses:

if A[i] <= A[j]

When values are equal, the left element is emitted first and no inversion is counted.

In the Fenwick method, query only ranks strictly smaller than C[i]:

tree.sum(C[i] - 1)

Querying tree.sum(C[i]) would incorrectly count equal values as inversions.

Integer Size

The inversion count can be large. The maximum count is:

n(n - 1) / 2

For n = 100000, this is:

4999950000

This exceeds a 32-bit signed integer. Use a 64-bit integer or larger.

Implementation Notes

Use the merge sort method when you already need a sorted array or want a simple deterministic algorithm.

Use the Fenwick tree method when you need online-style counting over compressed values, or when the operation is part of a larger frequency-query algorithm.

Keep a brute force implementation for tests. Generate small random arrays and compare the optimized method against brute force.

Common Bugs

A common bug is counting equal values as inversions. Use strict comparison.

Another bug is forgetting that merge-based counting mutates the array. Copy the input first if the original order must be preserved.

A third bug is storing the count in a 32-bit integer.

A fourth bug is using uncompressed large values as Fenwick indices.

Takeaway

Inversion counting is a sorting-adjacent problem. Merge sort counts inversions by detecting cross-order violations during merging, while Fenwick trees count smaller elements to the right using compressed ranks.