Skip to content

SIMD Linear Search

Search several array elements at once using vector instructions.

SIMD linear search is a linear search variant that compares several values at the same time using vector instructions. SIMD means single instruction, multiple data. Instead of comparing one array element per loop iteration, the algorithm loads a block of elements into a vector register and compares all lanes against the target in one operation.

The logical algorithm remains linear search. The speedup comes from processing multiple elements per CPU instruction.

Problem

Given an array AA of length nn and a target value xx, find an index ii such that

A[i]=x A[i] = x

If no such index exists, return 1-1.

Idea

Use a vector width ww. Each iteration loads ww elements:

A[i],A[i+1],,A[i+w1] A[i], A[i+1], \ldots, A[i+w-1]

Compare all ww elements against xx in parallel. The comparison produces a bit mask. If the mask is nonzero, at least one lane matched the target. The first set bit gives the first matching position inside the vector.

Algorithm

simd_linear_search(A, x):
    i = 0
    w = vector_width()

    while i + w <= length(A):
        values = vector_load(A, i)
        mask = vector_compare_equal(values, x)

        if mask != 0:
            lane = first_set_bit(mask)
            return i + lane

        i = i + w

    while i < length(A):
        if A[i] == x:
            return i
        i = i + 1

    return -1

The final scalar loop handles the remaining elements when the array length is not a multiple of the vector width.

Example

Suppose the vector width is 44:

A=[7,9,12,15,18,21,24,30] A = [7, 9, 12, 15, 18, 21, 24, 30]

and

x=21 x = 21

The first vector compares:

[7,9,12,15] [7, 9, 12, 15]

No match. The second vector compares:

[18,21,24,30] [18, 21, 24, 30]

The mask has lane 11 set, so the answer is:

4+1=5 4 + 1 = 5

Correctness

The algorithm compares every array element with the target. The SIMD loop compares elements in contiguous groups of width ww. If a group contains a match, the mask identifies at least one matching lane, and selecting the first set bit returns the earliest match in that group.

If no SIMD group contains the target, the scalar tail checks every remaining element. Therefore the algorithm returns a valid matching index if one exists, and returns 1-1 only after all elements have been checked.

Complexity

casescalar comparisonsvector iterationstime
best11 lane group11O(1)O(1)
worstnn lanes checkedn/w\lceil n / w \rceilO(n/w)O(n / w) vector steps

The asymptotic model usually still describes the algorithm as:

O(n) O(n)

because ww is a fixed hardware constant.

Space usage is:

O(1) O(1)

When to Use

Use SIMD linear search when:

  • the data is unsorted
  • many elements must be scanned
  • values are stored contiguously
  • vector instructions are available
  • the comparison operation is simple

It is especially effective for byte arrays, integer arrays, string scanning, filtering, and parsing workloads.

Implementation

Python does not expose portable low level SIMD directly, but NumPy uses vectorized native loops internally.

import numpy as np

def simd_linear_search_numpy(a, x):
    arr = np.asarray(a)
    matches = np.flatnonzero(arr == x)
    if len(matches) == 0:
        return -1
    return int(matches[0])

A Go implementation can show the block scanning structure. Actual SIMD requires architecture specific intrinsics or assembly.

func SIMDLinearSearchShape(a []int, x int) int {
    const width = 4
    n := len(a)
    i := 0

    for i+width <= n {
        if a[i] == x {
            return i
        }
        if a[i+1] == x {
            return i + 1
        }
        if a[i+2] == x {
            return i + 2
        }
        if a[i+3] == x {
            return i + 3
        }
        i += width
    }

    for i < n {
        if a[i] == x {
            return i
        }
        i++
    }

    return -1
}