# Weighted Median

# Weighted Median

Weighted Median generalizes the ordinary median by assigning a nonnegative weight to each element. Instead of balancing the number of elements on both sides, it balances the total weight.

An element is a weighted median if the total weight of smaller elements is at most half of the total weight, and the total weight of larger elements is also at most half.

## Problem

Given pairs

$$
(x_1, w_1), (x_2, w_2), \dots, (x_n, w_n)
$$

where each $w_i \ge 0$, find a value $x_m$ such that:

$$
\sum_{x_i < x_m} w_i \le \frac{W}{2}
$$

and

$$
\sum_{x_i > x_m} w_i \le \frac{W}{2}
$$

where:

$$
W = \sum_{i=1}^{n} w_i
$$

## Algorithm

The simple method sorts by value, then scans cumulative weight.

```id="wm1"
weighted_median(items):
    sort items by value
    W = total weight
    prefix = 0

    for each (x, w) in sorted items:
        prefix = prefix + w
        if prefix >= W / 2:
            return x
```

A linear-time version can be built from selection and partitioning, but sorting is usually simpler and sufficient.

## Example

Let:

$$
[(2, 1), (4, 3), (7, 2), (9, 4)]
$$

The total weight is:

$$
W = 10
$$

Half of the total weight is:

$$
5
$$

Cumulative weights after sorting:

| value | weight | cumulative |
| ----- | -----: | ---------: |
| 2     |      1 |          1 |
| 4     |      3 |          4 |
| 7     |      2 |          6 |
| 9     |      4 |         10 |

The first cumulative weight at least $5$ occurs at value $7$, so the weighted median is $7$.

## Correctness

After sorting, all values before the current item are smaller and all values after it are larger. The scan returns the first item whose cumulative weight reaches at least half of the total weight.

At that point, the weight of smaller elements is less than or equal to $W/2$, because the previous prefix was below $W/2$. The weight of larger elements is also less than or equal to $W/2$, because the current prefix has reached at least $W/2$.

Therefore the returned value satisfies the weighted median conditions.

## Complexity

| method          |            time |            space |
| --------------- | --------------: | ---------------: |
| sort and scan   |   $O(n \log n)$ | $O(1)$ or $O(n)$ |
| selection based | $O(n)$ expected |  $O(1)$ possible |

The sorting method is easier to implement and less error prone.

## When to Use

Use Weighted Median when observations have unequal importance. Common examples include weighted statistics, facility location, robust aggregation, voting, load balancing, and percentile calculations with sample weights.

## Implementation

```id="wm2"
def weighted_median(items):
    items = sorted(items, key=lambda p: p[0])
    total = sum(w for _, w in items)

    prefix = 0
    for x, w in items:
        prefix += w
        if prefix >= total / 2:
            return x

    raise ValueError("empty input")
```

```id="wm3"
package main

import "sort"

type WeightedItem struct {
	Value  int
	Weight float64
}

func WeightedMedian(items []WeightedItem) (int, bool) {
	if len(items) == 0 {
		return 0, false
	}

	sort.Slice(items, func(i, j int) bool {
		return items[i].Value < items[j].Value
	})

	total := 0.0
	for _, item := range items {
		total += item.Weight
	}

	prefix := 0.0
	for _, item := range items {
		prefix += item.Weight
		if prefix >= total/2 {
			return item.Value, true
		}
	}

	return 0, false
}
```

