SIMD linear search is a linear search variant that compares several values at the same time using vector instructions. SIMD means single instruction, multiple data. Instead of comparing one array element per loop iteration, the algorithm loads a block of elements into a vector register and compares all lanes against the target in one operation.
The logical algorithm remains linear search. The speedup comes from processing multiple elements per CPU instruction.
Problem
Given an array of length and a target value , find an index such that
If no such index exists, return .
Idea
Use a vector width . Each iteration loads elements:
Compare all elements against in parallel. The comparison produces a bit mask. If the mask is nonzero, at least one lane matched the target. The first set bit gives the first matching position inside the vector.
Algorithm
simd_linear_search(A, x):
i = 0
w = vector_width()
while i + w <= length(A):
values = vector_load(A, i)
mask = vector_compare_equal(values, x)
if mask != 0:
lane = first_set_bit(mask)
return i + lane
i = i + w
while i < length(A):
if A[i] == x:
return i
i = i + 1
return -1The final scalar loop handles the remaining elements when the array length is not a multiple of the vector width.
Example
Suppose the vector width is :
and
The first vector compares:
No match. The second vector compares:
The mask has lane set, so the answer is:
Correctness
The algorithm compares every array element with the target. The SIMD loop compares elements in contiguous groups of width . If a group contains a match, the mask identifies at least one matching lane, and selecting the first set bit returns the earliest match in that group.
If no SIMD group contains the target, the scalar tail checks every remaining element. Therefore the algorithm returns a valid matching index if one exists, and returns only after all elements have been checked.
Complexity
| case | scalar comparisons | vector iterations | time |
|---|---|---|---|
| best | lane group | ||
| worst | lanes checked | vector steps |
The asymptotic model usually still describes the algorithm as:
because is a fixed hardware constant.
Space usage is:
When to Use
Use SIMD linear search when:
- the data is unsorted
- many elements must be scanned
- values are stored contiguously
- vector instructions are available
- the comparison operation is simple
It is especially effective for byte arrays, integer arrays, string scanning, filtering, and parsing workloads.
Implementation
Python does not expose portable low level SIMD directly, but NumPy uses vectorized native loops internally.
import numpy as np
def simd_linear_search_numpy(a, x):
arr = np.asarray(a)
matches = np.flatnonzero(arr == x)
if len(matches) == 0:
return -1
return int(matches[0])A Go implementation can show the block scanning structure. Actual SIMD requires architecture specific intrinsics or assembly.
func SIMDLinearSearchShape(a []int, x int) int {
const width = 4
n := len(a)
i := 0
for i+width <= n {
if a[i] == x {
return i
}
if a[i+1] == x {
return i + 1
}
if a[i+2] == x {
return i + 2
}
if a[i+3] == x {
return i + 3
}
i += width
}
for i < n {
if a[i] == x {
return i
}
i++
}
return -1
}