# LeetCode 528: Random Pick with Weight

## Problem Restatement

We are given a 0-indexed array of positive integers `w`.

Each `w[i]` is the weight of index `i`.

We need to implement a class with a method:

```python
pickIndex()
```

This method returns an index from `0` to `len(w) - 1`.

The probability of returning index `i` must be:

```python
w[i] / sum(w)
```

So larger weights should be picked more often.

For example, if:

```python
w = [1, 3]
```

then index `0` should be picked with probability:

```python
1 / 4
```

and index `1` should be picked with probability:

```python
3 / 4
```

Since this is a randomized problem, many output sequences can be accepted. The important part is that the long-run distribution matches the weights.

## Input and Output

| Item | Meaning |
|---|---|
| Constructor input | An array of positive integers `w` |
| Method output | One random index |
| Probability rule | Index `i` is returned with probability `w[i] / sum(w)` |
| Array length | `1 <= w.length <= 10^4` |
| Weight value | `1 <= w[i] <= 10^5` |
| Calls | `pickIndex` is called at most `10^4` times |

Class shape:

```python
class Solution:

    def __init__(self, w: list[int]):
        ...

    def pickIndex(self) -> int:
        ...
```

## Examples

Consider:

```python
w = [1, 3]
```

The total weight is:

```python
1 + 3 = 4
```

So we can think of four slots:

```python
[0, 1, 1, 1]
```

Index `0` owns one slot.

Index `1` owns three slots.

If we pick one slot uniformly at random, then index `0` has probability `1 / 4`, and index `1` has probability `3 / 4`.

Now consider:

```python
w = [2, 5, 3]
```

The total weight is:

```python
10
```

The probability distribution should be:

| Index | Weight | Probability |
|---|---:|---:|
| `0` | `2` | `2 / 10` |
| `1` | `5` | `5 / 10` |
| `2` | `3` | `3 / 10` |

We do not need to actually build this repeated list:

```python
[0, 0, 1, 1, 1, 1, 1, 2, 2, 2]
```

That list explains the idea, but it is not memory-efficient.

## First Thought: Expand the Array

A simple approach is to create a large array where each index appears as many times as its weight.

For:

```python
w = [2, 5, 3]
```

we would build:

```python
expanded = [0, 0, 1, 1, 1, 1, 1, 2, 2, 2]
```

Then `pickIndex()` chooses one random element from `expanded`.

This gives the correct probability because every slot is equally likely.

But this approach can use too much memory.

The sum of weights can be large:

```python
10^4 * 10^5 = 10^9
```

So we cannot build an expanded array.

## Key Insight

Instead of storing every repeated index, store the boundary where each index ends.

For:

```python
w = [2, 5, 3]
```

the prefix sums are:

```python
[2, 7, 10]
```

These prefix sums divide the range `1` through `10` into intervals:

| Random value | Picked index |
|---|---:|
| `1, 2` | `0` |
| `3, 4, 5, 6, 7` | `1` |
| `8, 9, 10` | `2` |

Index `0` gets `2` values.

Index `1` gets `5` values.

Index `2` gets `3` values.

So the interval sizes exactly match the weights.

Now `pickIndex()` can:

1. Generate a random integer from `1` to `total_weight`.
2. Find the first prefix sum greater than or equal to that random integer.
3. Return that prefix sum's index.

The search is binary search because the prefix sums are sorted.

## Algorithm

During initialization:

1. Create an empty array `prefix`.
2. Keep a running total.
3. For each weight, add it to the running total.
4. Append the running total to `prefix`.

During `pickIndex()`:

1. Generate a random integer `target` in the inclusive range:

```python
1 <= target <= total_weight
```

2. Binary search for the leftmost index `i` such that:

```python
prefix[i] >= target
```

3. Return `i`.

## Correctness

The prefix sum array partitions the integers from `1` to `total_weight` into consecutive ranges.

Index `0` receives the range:

```python
1 through prefix[0]
```

For every later index `i`, it receives the range:

```python
prefix[i - 1] + 1 through prefix[i]
```

The number of integers in this range is:

```python
prefix[i] - prefix[i - 1]
```

By construction, that value is exactly `w[i]`.

Therefore, index `i` owns exactly `w[i]` random values out of `sum(w)` total values.

Since `target` is chosen uniformly from `1` through `sum(w)`, the probability that `target` lands inside index `i`'s range is:

```python
w[i] / sum(w)
```

The binary search returns the first prefix sum greater than or equal to `target`, which is exactly the index whose range contains `target`.

So `pickIndex()` returns each index with the required probability.

## Complexity

Let `n = len(w)`.

| Operation | Time | Space |
|---|---:|---:|
| Constructor | `O(n)` | `O(n)` |
| `pickIndex()` | `O(log n)` | `O(1)` |

The constructor builds one prefix sum array.

Each call to `pickIndex()` performs one binary search.

## Implementation

```python
import random
from bisect import bisect_left

class Solution:

    def __init__(self, w: list[int]):
        self.prefix = []
        total = 0

        for weight in w:
            total += weight
            self.prefix.append(total)

        self.total = total

    def pickIndex(self) -> int:
        target = random.randint(1, self.total)
        return bisect_left(self.prefix, target)
```

## Code Explanation

The constructor builds cumulative weights:

```python
self.prefix = []
total = 0
```

For each weight:

```python
total += weight
self.prefix.append(total)
```

If `w = [2, 5, 3]`, then `self.prefix` becomes:

```python
[2, 7, 10]
```

The total weight is stored separately:

```python
self.total = total
```

In `pickIndex()`, we choose a random target:

```python
target = random.randint(1, self.total)
```

This is inclusive on both ends.

Then we find the first prefix sum that is at least `target`:

```python
bisect_left(self.prefix, target)
```

For example, if:

```python
self.prefix = [2, 7, 10]
```

then:

| Target | First prefix `>= target` | Returned index |
|---|---:|---:|
| `1` | `2` | `0` |
| `2` | `2` | `0` |
| `3` | `7` | `1` |
| `7` | `7` | `1` |
| `8` | `10` | `2` |
| `10` | `10` | `2` |

This gives exactly the weighted distribution.

## Manual Binary Search Version

Some interviews prefer writing the binary search directly.

```python
import random

class Solution:

    def __init__(self, w: list[int]):
        self.prefix = []
        total = 0

        for weight in w:
            total += weight
            self.prefix.append(total)

        self.total = total

    def pickIndex(self) -> int:
        target = random.randint(1, self.total)

        left = 0
        right = len(self.prefix) - 1

        while left < right:
            mid = (left + right) // 2

            if self.prefix[mid] >= target:
                right = mid
            else:
                left = mid + 1

        return left
```

The invariant is that the answer is always inside `[left, right]`.

If `self.prefix[mid] >= target`, then `mid` might be the answer, but there could be an earlier valid index. So we move `right` to `mid`.

If `self.prefix[mid] < target`, then `mid` cannot be the answer. So we move `left` to `mid + 1`.

When the loop ends, `left == right`, and that index is the first prefix sum greater than or equal to `target`.

## Testing

Randomized algorithms are tested differently from deterministic algorithms.

We should not expect an exact sequence of outputs.

Instead, we can test that every returned index is valid and that the distribution is approximately correct after many trials.

```python
from collections import Counter

def run_tests():
    s = Solution([1, 3])

    for _ in range(100):
        idx = s.pickIndex()
        assert idx in [0, 1]

    s = Solution([2, 5, 3])
    counts = Counter(s.pickIndex() for _ in range(10000))

    assert counts[0] < counts[1]
    assert counts[2] < counts[1]

    s = Solution([10])
    for _ in range(100):
        assert s.pickIndex() == 0

    print("all tests passed")

run_tests()
```

Test meaning:

| Test | Why |
|---|---|
| `[1, 3]` | Checks valid indices for a simple weighted case |
| `[2, 5, 3]` | Checks that the largest weight is picked most often |
| `[10]` | Checks single-index input |