15.14 Counting Inversions

You need to measure how far an array is from being sorted.

15.14 Counting Inversions

Problem

You need to measure how far an array is from being sorted.

Given an array:

[2, 4, 1, 3, 5]

an inversion is a pair of indexes:

$$ i < j $$

such that:

$$ a[i] > a[j] $$

In this array, the inversions are:

(2, 1)
(4, 1)
(4, 3)

So the inversion count is:

3

A sorted array has zero inversions.

A reverse-sorted array has the maximum number of inversions:

$$ \frac{n(n-1)}{2} $$

A direct solution compares every pair and runs in:

$$ O(n^2) $$

Can we count inversions faster?

Solution

Use a merge-sort style divide-and-conquer algorithm.

The key observation is that inversions fall into three categories:

  1. Inversions entirely inside the left half
  2. Inversions entirely inside the right half
  3. Cross inversions, where the larger element is in the left half and the smaller element is in the right half

The first two categories are counted recursively.

The third category is counted during the merge step.

This gives the same recurrence as merge sort:

$$ T(n)=2T(n/2)+O(n) $$

so the running time is:

$$ O(n\log n) $$

Brute Force Baseline

The direct method is simple:

func CountInversionsBrute(nums []int) int64 {
    var count int64

    for i := 0; i < len(nums); i++ {
        for j := i + 1; j < len(nums); j++ {
            if nums[i] > nums[j] {
                count++
            }
        }
    }

    return count
}

This is useful for testing, but not for large inputs.

With (n = 1,000,000), the number of pairs is roughly:

$$ 5 \times 10^{11} $$

That is too much work for ordinary use.

Divide Step

Split the array:

[2, 4, 1, 3, 5]

into:

left  = [2, 4]
right = [1, 3, 5]

Recursively count inversions in each half.

Then count inversions crossing the boundary.

The crucial condition for a cross inversion is:

left element > right element

Because both halves are sorted before merging, we can count many inversions at once.

Counting During Merge

Suppose the recursive calls have sorted the halves:

left  = [2, 4]
right = [1, 3, 5]

Now merge them.

Compare:

2 and 1

Since:

2 > 1

the right element 1 forms an inversion with every remaining element in the left half:

2
4

That contributes:

2 inversions

Then compare:

2 and 3

Since:

2 <= 3

there is no inversion. Take 2.

Then compare:

4 and 3

Since:

4 > 3

the right element 3 forms an inversion with every remaining element in the left half:

4

That contributes:

1 inversion

Total cross inversions:

3

Why This Works

During merge, both halves are sorted.

If:

left[i] > right[j]

then every element from:

left[i]
left[i+1]
...
left[end]

is also greater than right[j].

So instead of counting one inversion, we count:

len(left) - i

inversions immediately.

This is the entire algorithmic advantage.

Implementation

func CountInversions(nums []int) int64 {
    if len(nums) <= 1 {
        return 0
    }

    work := append([]int(nil), nums...)
    buf := make([]int, len(nums))

    return countSort(work, buf, 0, len(nums))
}

func countSort(a, buf []int, lo, hi int) int64 {
    if hi-lo <= 1 {
        return 0
    }

    mid := lo + (hi-lo)/2

    count := countSort(a, buf, lo, mid)
    count += countSort(a, buf, mid, hi)
    count += mergeAndCount(a, buf, lo, mid, hi)

    return count
}

func mergeAndCount(a, buf []int, lo, mid, hi int) int64 {
    i := lo
    j := mid
    k := lo

    var count int64

    for i < mid && j < hi {
        if a[i] <= a[j] {
            buf[k] = a[i]
            i++
        } else {
            buf[k] = a[j]
            j++

            count += int64(mid - i)
        }

        k++
    }

    for i < mid {
        buf[k] = a[i]
        i++
        k++
    }

    for j < hi {
        buf[k] = a[j]
        j++
        k++
    }

    copy(a[lo:hi], buf[lo:hi])

    return count
}

This implementation preserves the input by copying it first.

If mutation is acceptable, remove the copy and sort the input in place.

Example Trace

Input:

[2, 4, 1, 3, 5]

Recursive splits:

[2, 4]       [1, 3, 5]
[2] [4]      [1] [3, 5]

Left half:

[2] and [4]

No inversion.

Right half:

[3] and [5]

No inversion.

Merge:

[1] and [3, 5]

No inversion.

Final merge:

[2, 4] and [1, 3, 5]

Cross inversions:

2 > 1  contributes 2
4 > 3  contributes 1

Total:

3

Handling Duplicates

Use:

if a[i] <= a[j]

not:

if a[i] < a[j]

An inversion requires:

a[i] > a[j]

Equal values are not inversions.

Choosing the left element first when equal also keeps the merge stable.

Example:

[2, 2, 1]

Inversions:

(2, 1)
(2, 1)

Count:

2

The two equal 2 values do not form an inversion with each other.

Complexity Analysis

The algorithm follows merge sort.

Each level performs one full merge pass:

$$ O(n) $$

The number of levels is:

$$ O(\log n) $$

Total time:

$$ O(n\log n) $$

Extra space:

$$ O(n) $$

The inversion count itself may be as large as:

$$ \frac{n(n-1)}{2} $$

so use a 64-bit integer.

Maximum Inversion Count

For a reverse-sorted array:

[5, 4, 3, 2, 1]

every pair is inverted.

The number of inversions is:

$$ \binom{n}{2} = \frac{n(n-1)}{2} $$

For (n=100{,}000), this is:

4,999,950,000

which does not fit in a 32-bit signed integer.

Use:

int64

for the result.

Applications

Inversion counting appears in several domains.

Ranking Distance

If two rankings contain the same items, the number of inversions measures disagreement.

Example:

ranking A: [A, B, C, D]
ranking B: [B, A, D, C]

After mapping one ranking into the order of the other, inversion counting gives a Kendall tau style distance.

Sorting Difficulty

The inversion count measures how much work insertion sort must perform.

Insertion sort runs quickly when the number of inversions is small.

Data Quality

Unexpectedly high inversion counts can reveal timestamps, sequence numbers, or version vectors that are out of order.

Competitive Programming

Many problems reduce to counting pairs:

i < j and a[i] > a[j]

or variants such as:

a[i] > 2*a[j]

The merge-sort pattern often applies.

Variant: Reverse Pairs

A common variant counts pairs:

$$ i < j $$

and:

$$ a[i] > 2a[j] $$

The merge step must count these before merging.

Pseudo-code:

j = mid

for i in left half:
    while j < hi and a[i] > 2*a[j]:
        j++

    count += j - mid

Then perform the normal merge.

The same divide-and-conquer structure remains.

Common Mistakes

Counting Equal Values

Equal values are not inversions.

Use <= during merge to avoid counting duplicates incorrectly.

Using 32-Bit Counts

The count can exceed int32.

Use int64.

Forgetting to Merge

The recursive count is only correct if each half is sorted before returning.

Counting and sorting are coupled.

Mutating Unexpectedly

The algorithm sorts the working array.

Copy the input if callers expect the original order.

Counting Only Adjacent Swaps

An inversion is any out-of-order pair, not only neighboring elements.

Adjacent swaps are related, but the definition is global.

Discussion

Counting inversions is a model divide-and-conquer problem because the recursive split alone is not enough. The value comes from using sorted subproblems to count many cross-boundary pairs in one operation. A single comparison during merge may account for thousands or millions of inversions.

This pattern is broader than inversion counting. Many pair-counting problems can be solved by sorting one dimension and counting across the divide in another. Once the subarrays are ordered, an apparent quadratic comparison problem often becomes a linear merge scan.

The practical lesson is to look for batch counting opportunities. If a condition holds for one element and an ordered suffix, count the whole suffix at once. That is the same structural advantage that makes merge sort useful, repurposed here to compute a global statistic rather than produce only a sorted array.