17.19 Spatial Indexes

You need to search large collections of geometric objects efficiently.

17.19 Spatial Indexes

Problem

You need to search large collections of geometric objects efficiently.

Common queries include:

find all points inside this rectangle
find objects near this location
find candidate collisions
find the nearest object
find all shapes intersecting this window

A brute-force scan checks every object. For n objects, each query costs:

O(n)

That may be fine for small datasets, but it fails for maps, games, CAD files, simulations, image processing, and spatial databases.

A spatial index organizes geometry so that most irrelevant objects can be skipped.

Solution

Choose a spatial index that matches the data and query pattern.

The most common options are:

Index Best For
Uniform grid Uniform data, fixed-radius queries
Spatial hash Sparse grids, broad-phase collision
KD-tree Point data, nearest-neighbor queries
Quadtree 2D adaptive subdivision
R-tree Rectangles, polygons, spatial databases
Bounding volume hierarchy Collision and ray queries

No single index dominates every workload. The right structure depends on object type, update frequency, distribution, and query type.

Uniform Grid

A uniform grid divides the plane into equal-sized cells.

+---+---+---+
|   |   |   |
+---+---+---+
|   | x |   |
+---+---+---+
|   |   |   |
+---+---+---+

Each object is inserted into the cell or cells it overlaps.

A query examines only relevant cells.

For a point:

from math import floor

def grid_key(point, cell_size):
    return (
        floor(point.x / cell_size),
        floor(point.y / cell_size)
    )

Uniform grids are simple and fast when objects have similar sizes and are roughly evenly distributed.

They perform poorly when objects are highly clustered or vary greatly in size.

KD-Tree

A KD-tree recursively splits points by coordinate.

At one level it splits by x.

At the next level it splits by y.

Then x again, and so on.

split x
  split y
    split x
    split x
  split y
    split x
    split x

A simple node:

from dataclasses import dataclass

@dataclass
class KDNode:
    point: object
    axis: int
    left: object = None
    right: object = None

Build:

def build_kdtree(points, depth=0):
    if not points:
        return None

    axis = depth % 2

    points = sorted(
        points,
        key=lambda p: (p.x, p.y) if axis == 0 else (p.y, p.x)
    )

    mid = len(points) // 2

    return KDNode(
        point=points[mid],
        axis=axis,
        left=build_kdtree(points[:mid], depth + 1),
        right=build_kdtree(points[mid + 1:], depth + 1)
    )

KD-trees are effective for static point sets and nearest-neighbor search.

Range Query in a KD-Tree

A rectangular query can prune subtrees that cannot intersect the query region.

For clarity, suppose each recursive call receives a bounding box for the subtree. Then:

def range_query(node, rect, result):
    if node is None:
        return

    p = node.point

    if rect.contains(p):
        result.append(p)

    axis = node.axis

    if axis == 0:
        if rect.left <= p.x:
            range_query(node.left, rect, result)
        if p.x <= rect.right:
            range_query(node.right, rect, result)
    else:
        if rect.bottom <= p.y:
            range_query(node.left, rect, result)
        if p.y <= rect.top:
            range_query(node.right, rect, result)

This simple version prunes by split coordinate only. More advanced versions store bounding boxes per node and prune more aggressively.

Nearest Neighbor in a KD-Tree

Nearest-neighbor search descends toward the side containing the query point first, then checks whether the other side could still contain a closer point.

def distance_squared(a, b):
    dx = a.x - b.x
    dy = a.y - b.y
    return dx * dx + dy * dy

def nearest(node, query, best=None, best_dist=float("inf")):
    if node is None:
        return best, best_dist

    point = node.point
    d = distance_squared(point, query)

    if d < best_dist:
        best = point
        best_dist = d

    axis = node.axis

    if axis == 0:
        delta = query.x - point.x
    else:
        delta = query.y - point.y

    first = node.left if delta < 0 else node.right
    second = node.right if delta < 0 else node.left

    best, best_dist = nearest(first, query, best, best_dist)

    if delta * delta < best_dist:
        best, best_dist = nearest(second, query, best, best_dist)

    return best, best_dist

The pruning condition:

delta² < best distance

means the splitting line is close enough that the opposite subtree might contain a better answer.

Quadtree

A quadtree recursively subdivides a two-dimensional region into four quadrants.

+-----+-----+
| NW  | NE  |
+-----+-----+
| SW  | SE  |
+-----+-----+

When a cell contains too many objects, split it.

Quadtrees are useful when data is clustered. Dense regions get more subdivisions. Sparse regions remain coarse.

They are commonly used in:

games
image processing
collision detection
map systems
particle simulations

The main tuning parameters are:

maximum objects per node
maximum depth
minimum cell size

Poor choices can produce either excessive subdivision or overloaded leaves.

R-Tree

An R-tree stores bounding rectangles.

Each internal node contains a bounding rectangle that encloses its children.

root rectangle
  child rectangle
    object rectangles
  child rectangle
    object rectangles

R-trees are designed for spatial databases and geometry collections such as rectangles, line strings, and polygons.

They support queries like:

find all objects whose bounding boxes intersect this query box

The exact geometry test happens afterward.

This mirrors the broad-phase then narrow-phase pattern:

R-tree filter by bounding boxes
exact intersection test

R-trees are especially useful when objects are not points.

Bounding Volume Hierarchy

A bounding volume hierarchy, or BVH, stores objects in a tree of bounding boxes or other bounding volumes.

BVHs are widely used in:

ray tracing
collision detection
physics engines
robotics

A query traverses the tree and ignores branches whose bounding volume cannot match.

For example, a ray query first intersects bounding boxes. Only if the ray hits a box does the traversal continue to the contained objects.

BVHs are often faster than general-purpose spatial indexes for rendering and physics workloads because they can be tuned to the query pattern.

Static vs Dynamic Data

Spatial indexes differ strongly in update behavior.

Static data:

build once
query many times

allows expensive preprocessing.

Dynamic data:

objects move
insertions and deletions happen frequently

requires cheap updates.

Typical choices:

Workload Good Index
Static nearest-neighbor points KD-tree
Moving game objects Spatial hash or dynamic BVH
Spatial database polygons R-tree
Uniform particle simulation Uniform grid
Clustered 2D points Quadtree
Ray tracing scene BVH

Bounding Boxes

Most spatial indexes begin with bounding boxes.

For a point:

box = [x, x] × [y, y]

For a segment:

def segment_box(a, b):
    return Rectangle(
        left=min(a.x, b.x),
        bottom=min(a.y, b.y),
        right=max(a.x, b.x),
        top=max(a.y, b.y)
    )

For a polygon, use the min and max of all vertices.

Bounding boxes are cheap to compute and cheap to test.

def boxes_intersect(a, b):
    return (
        a.left <= b.right
        and b.left <= a.right
        and a.bottom <= b.top
        and b.bottom <= a.top
    )

They may produce false positives, but they should not produce false negatives.

Query Pipeline

Most spatial systems use a two-stage query pipeline.

candidate generation
exact filtering

Example:

def query_polygons(index, query_box):
    candidates = index.search(query_box)

    return [
        polygon
        for polygon in candidates
        if polygon_intersects_box(polygon, query_box)
    ]

The spatial index should be fast and conservative.

The exact predicate should be correct.

This separation keeps the index simple and the geometry reliable.

Complexity Analysis

Spatial-index complexity depends on distribution and workload.

Typical expectations:

Operation Typical Cost
Grid insertion O(1) expected
KD-tree build O(n log n)
KD-tree nearest query O(log n) average
Quadtree query O(log n + k) average
R-tree query O(log n + k) average
Brute-force query O(n)

Here k is the number of reported candidates or results.

Worst cases can be worse, especially with clustered, adversarial, or highly overlapping data.

Common Pitfalls

Building the Wrong Index

A KD-tree is good for points. It is not a general polygon index.

Expecting Exact Answers from an Index

Most spatial indexes return candidates. Exact predicates still matter.

Ignoring Object Size

Uniform grids perform poorly when some objects are much larger than the cell size.

Rebuilding Too Often

For dynamic workloads, rebuild cost can dominate query savings.

Skipping Bounding-Box Tests

Bounding boxes are cheap and should usually be the first filter.

Discussion

Spatial indexes are performance tools. They do not replace geometric predicates; they reduce the number of times those predicates need to run. A good index matches the shape of the workload: point queries use one structure, rectangle and polygon queries another, and moving-object systems often require a different design from static datasets.

The most reliable architecture is layered: use the index to produce candidates, then use exact or robust geometry to decide the final result. This keeps performance concerns separated from correctness concerns.