# Array Vectorization

# Array Vectorization

Array vectorization processes multiple elements at once using wide CPU instructions. Instead of applying an operation to one value per instruction, the CPU applies it to a vector of values.

You use it when arrays are large, element types are uniform, and the same operation applies to many consecutive elements.

## Problem

Given arrays $A$ and $B$ of length $n$, compute an output array $C$ such that:

$$
C[i] = A[i] + B[i]
$$

for every valid index $i$.

## Structure

Vectorization works best when data is contiguous:

$$
A = [a_0, a_1, a_2, \dots, a_{n-1}]
$$

A vector register can load several adjacent values at once:

$$
[a_i, a_{i+1}, a_{i+2}, a_{i+3}]
$$

## Algorithm

Process elements in fixed-width chunks, then handle the remainder.

```id="array-vectorization"
vector_add(A, B):
    n = length(A)
    C = allocate(n)

    width = vector_width()

    i = 0
    while i + width <= n:
        va = load_vector(A, i)
        vb = load_vector(B, i)
        vc = va + vb
        store_vector(C, i, vc)
        i += width

    while i < n:
        C[i] = A[i] + B[i]
        i += 1

    return C
```

## Example

Let

$$
A = [1, 2, 3, 4, 5, 6]
$$

and

$$
B = [10, 20, 30, 40, 50, 60]
$$

With vector width $4$:

| step | indices | operation                       | result           |
| ---- | ------- | ------------------------------- | ---------------- |
| 1    | 0 to 3  | [1, 2, 3, 4] + [10, 20, 30, 40] | [11, 22, 33, 44] |
| 2    | 4 to 5  | scalar remainder                | [55, 66]         |

Final result:

$$
C = [11, 22, 33, 44, 55, 66]
$$

## Correctness

The vector loop processes disjoint contiguous blocks of width `vector_width()`. For each block, corresponding elements from $A$ and $B$ are loaded, added, and stored at the same positions in $C$.

The scalar loop handles every remaining index after the last full vector block. Since the vector blocks and remainder cover exactly the range $[0, n)$, every output value satisfies $C[i] = A[i] + B[i]$.

## Complexity

| operation  |   time |
| ---------- | -----: |
| vector add | $O(n)$ |

Vectorization does not change asymptotic complexity. It improves constant factors by doing more work per instruction.

Space usage:

$$
O(n)
$$

for the output array.

## When to Use

Array vectorization is appropriate when:

* data is contiguous
* operations are simple and uniform
* arrays are large enough to amortize setup cost
* branches inside the loop are minimal

It is less suitable when:

* access is irregular
* each element needs different control flow
* data contains pointer-heavy objects
* memory alignment or layout prevents efficient vector loads

## Implementation

```python id="array-vectorization-python"
def vector_add(a, b):
    if len(a) != len(b):
        raise ValueError("length mismatch")

    c = [0] * len(a)
    for i in range(len(a)):
        c[i] = a[i] + b[i]

    return c
```

```go id="array-vectorization-go"
func VectorAdd(a, b []int) ([]int, bool) {
    if len(a) != len(b) {
        return nil, false
    }

    c := make([]int, len(a))
    for i := 0; i < len(a); i++ {
        c[i] = a[i] + b[i]
    }

    return c, true
}
```