17.19 Spatial Indexes
You need to search large collections of geometric objects efficiently.
17.19 Spatial Indexes
Problem
You need to search large collections of geometric objects efficiently.
Common queries include:
find all points inside this rectangle
find objects near this location
find candidate collisions
find the nearest object
find all shapes intersecting this window
A brute-force scan checks every object. For n objects, each query costs:
O(n)
That may be fine for small datasets, but it fails for maps, games, CAD files, simulations, image processing, and spatial databases.
A spatial index organizes geometry so that most irrelevant objects can be skipped.
Solution
Choose a spatial index that matches the data and query pattern.
The most common options are:
| Index | Best For |
|---|---|
| Uniform grid | Uniform data, fixed-radius queries |
| Spatial hash | Sparse grids, broad-phase collision |
| KD-tree | Point data, nearest-neighbor queries |
| Quadtree | 2D adaptive subdivision |
| R-tree | Rectangles, polygons, spatial databases |
| Bounding volume hierarchy | Collision and ray queries |
No single index dominates every workload. The right structure depends on object type, update frequency, distribution, and query type.
Uniform Grid
A uniform grid divides the plane into equal-sized cells.
+---+---+---+
| | | |
+---+---+---+
| | x | |
+---+---+---+
| | | |
+---+---+---+
Each object is inserted into the cell or cells it overlaps.
A query examines only relevant cells.
For a point:
from math import floor
def grid_key(point, cell_size):
return (
floor(point.x / cell_size),
floor(point.y / cell_size)
)
Uniform grids are simple and fast when objects have similar sizes and are roughly evenly distributed.
They perform poorly when objects are highly clustered or vary greatly in size.
KD-Tree
A KD-tree recursively splits points by coordinate.
At one level it splits by x.
At the next level it splits by y.
Then x again, and so on.
split x
split y
split x
split x
split y
split x
split x
A simple node:
from dataclasses import dataclass
@dataclass
class KDNode:
point: object
axis: int
left: object = None
right: object = None
Build:
def build_kdtree(points, depth=0):
if not points:
return None
axis = depth % 2
points = sorted(
points,
key=lambda p: (p.x, p.y) if axis == 0 else (p.y, p.x)
)
mid = len(points) // 2
return KDNode(
point=points[mid],
axis=axis,
left=build_kdtree(points[:mid], depth + 1),
right=build_kdtree(points[mid + 1:], depth + 1)
)
KD-trees are effective for static point sets and nearest-neighbor search.
Range Query in a KD-Tree
A rectangular query can prune subtrees that cannot intersect the query region.
For clarity, suppose each recursive call receives a bounding box for the subtree. Then:
def range_query(node, rect, result):
if node is None:
return
p = node.point
if rect.contains(p):
result.append(p)
axis = node.axis
if axis == 0:
if rect.left <= p.x:
range_query(node.left, rect, result)
if p.x <= rect.right:
range_query(node.right, rect, result)
else:
if rect.bottom <= p.y:
range_query(node.left, rect, result)
if p.y <= rect.top:
range_query(node.right, rect, result)
This simple version prunes by split coordinate only. More advanced versions store bounding boxes per node and prune more aggressively.
Nearest Neighbor in a KD-Tree
Nearest-neighbor search descends toward the side containing the query point first, then checks whether the other side could still contain a closer point.
def distance_squared(a, b):
dx = a.x - b.x
dy = a.y - b.y
return dx * dx + dy * dy
def nearest(node, query, best=None, best_dist=float("inf")):
if node is None:
return best, best_dist
point = node.point
d = distance_squared(point, query)
if d < best_dist:
best = point
best_dist = d
axis = node.axis
if axis == 0:
delta = query.x - point.x
else:
delta = query.y - point.y
first = node.left if delta < 0 else node.right
second = node.right if delta < 0 else node.left
best, best_dist = nearest(first, query, best, best_dist)
if delta * delta < best_dist:
best, best_dist = nearest(second, query, best, best_dist)
return best, best_dist
The pruning condition:
delta² < best distance
means the splitting line is close enough that the opposite subtree might contain a better answer.
Quadtree
A quadtree recursively subdivides a two-dimensional region into four quadrants.
+-----+-----+
| NW | NE |
+-----+-----+
| SW | SE |
+-----+-----+
When a cell contains too many objects, split it.
Quadtrees are useful when data is clustered. Dense regions get more subdivisions. Sparse regions remain coarse.
They are commonly used in:
games
image processing
collision detection
map systems
particle simulations
The main tuning parameters are:
maximum objects per node
maximum depth
minimum cell size
Poor choices can produce either excessive subdivision or overloaded leaves.
R-Tree
An R-tree stores bounding rectangles.
Each internal node contains a bounding rectangle that encloses its children.
root rectangle
child rectangle
object rectangles
child rectangle
object rectangles
R-trees are designed for spatial databases and geometry collections such as rectangles, line strings, and polygons.
They support queries like:
find all objects whose bounding boxes intersect this query box
The exact geometry test happens afterward.
This mirrors the broad-phase then narrow-phase pattern:
R-tree filter by bounding boxes
exact intersection test
R-trees are especially useful when objects are not points.
Bounding Volume Hierarchy
A bounding volume hierarchy, or BVH, stores objects in a tree of bounding boxes or other bounding volumes.
BVHs are widely used in:
ray tracing
collision detection
physics engines
robotics
A query traverses the tree and ignores branches whose bounding volume cannot match.
For example, a ray query first intersects bounding boxes. Only if the ray hits a box does the traversal continue to the contained objects.
BVHs are often faster than general-purpose spatial indexes for rendering and physics workloads because they can be tuned to the query pattern.
Static vs Dynamic Data
Spatial indexes differ strongly in update behavior.
Static data:
build once
query many times
allows expensive preprocessing.
Dynamic data:
objects move
insertions and deletions happen frequently
requires cheap updates.
Typical choices:
| Workload | Good Index |
|---|---|
| Static nearest-neighbor points | KD-tree |
| Moving game objects | Spatial hash or dynamic BVH |
| Spatial database polygons | R-tree |
| Uniform particle simulation | Uniform grid |
| Clustered 2D points | Quadtree |
| Ray tracing scene | BVH |
Bounding Boxes
Most spatial indexes begin with bounding boxes.
For a point:
box = [x, x] × [y, y]
For a segment:
def segment_box(a, b):
return Rectangle(
left=min(a.x, b.x),
bottom=min(a.y, b.y),
right=max(a.x, b.x),
top=max(a.y, b.y)
)
For a polygon, use the min and max of all vertices.
Bounding boxes are cheap to compute and cheap to test.
def boxes_intersect(a, b):
return (
a.left <= b.right
and b.left <= a.right
and a.bottom <= b.top
and b.bottom <= a.top
)
They may produce false positives, but they should not produce false negatives.
Query Pipeline
Most spatial systems use a two-stage query pipeline.
candidate generation
exact filtering
Example:
def query_polygons(index, query_box):
candidates = index.search(query_box)
return [
polygon
for polygon in candidates
if polygon_intersects_box(polygon, query_box)
]
The spatial index should be fast and conservative.
The exact predicate should be correct.
This separation keeps the index simple and the geometry reliable.
Complexity Analysis
Spatial-index complexity depends on distribution and workload.
Typical expectations:
| Operation | Typical Cost |
|---|---|
| Grid insertion | O(1) expected |
| KD-tree build | O(n log n) |
| KD-tree nearest query | O(log n) average |
| Quadtree query | O(log n + k) average |
| R-tree query | O(log n + k) average |
| Brute-force query | O(n) |
Here k is the number of reported candidates or results.
Worst cases can be worse, especially with clustered, adversarial, or highly overlapping data.
Common Pitfalls
Building the Wrong Index
A KD-tree is good for points. It is not a general polygon index.
Expecting Exact Answers from an Index
Most spatial indexes return candidates. Exact predicates still matter.
Ignoring Object Size
Uniform grids perform poorly when some objects are much larger than the cell size.
Rebuilding Too Often
For dynamic workloads, rebuild cost can dominate query savings.
Skipping Bounding-Box Tests
Bounding boxes are cheap and should usually be the first filter.
Discussion
Spatial indexes are performance tools. They do not replace geometric predicates; they reduce the number of times those predicates need to run. A good index matches the shape of the workload: point queries use one structure, rectangle and polygon queries another, and moving-object systems often require a different design from static datasets.
The most reliable architecture is layered: use the index to produce candidates, then use exact or robust geometry to decide the final result. This keeps performance concerns separated from correctness concerns.