20.19 Building a Geometry Query Engine

Design a geometry query engine that stores spatial objects and answers geometric questions efficiently.

20.19 Building a Geometry Query Engine

Problem

Design a geometry query engine that stores spatial objects and answers geometric questions efficiently.

Geometry query engines appear in maps, CAD tools, games, robotics, logistics, GIS systems, layout engines, collision detection systems, and spatial databases. The input is a set of objects such as points, rectangles, line segments, or polygons. The query asks for objects that match a spatial condition.

Common queries include:

Which points are inside this rectangle?
Which objects intersect this area?
Which object is nearest to this point?
Which line segments cross this segment?
Which polygons contain this location?

A naive engine scans every object for every query. That is simple but too slow once the dataset grows. The central algorithmic idea is spatial indexing: use geometry to discard most objects before running exact tests.

Solution

Start with axis-aligned bounding boxes and a grid index. A bounding box gives a cheap approximation of an object. A grid index maps space into cells, so a query only inspects objects near the query region.

For a first engine, store points and rectangles.

from dataclasses import dataclass
from collections import defaultdict
import math

Geometry primitives:

@dataclass(frozen=True)
class Point:
    x: float
    y: float

@dataclass(frozen=True)
class Rect:
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains_point(self, point):
        return (
            self.min_x <= point.x <= self.max_x
            and self.min_y <= point.y <= self.max_y
        )

    def intersects(self, other):
        return not (
            self.max_x < other.min_x
            or other.max_x < self.min_x
            or self.max_y < other.min_y
            or other.max_y < self.min_y
        )

The intersects method is the basic filter for rectangular range queries.

Bounding Boxes

Every indexed object should expose a bounding box.

For a point, the bounding box is a zero-area rectangle.

def point_bounds(point):
    return Rect(
        min_x=point.x,
        min_y=point.y,
        max_x=point.x,
        max_y=point.y,
    )

For a rectangle, the bounding box is itself.

def rect_bounds(rect):
    return rect

For a polygon or line segment, compute the minimum and maximum x and y values. The bounding box may include empty space, but it is fast to test.

Bounding boxes are conservative:

If two objects do not have intersecting bounding boxes, they cannot intersect.
If two bounding boxes intersect, the objects may or may not intersect.

This makes them useful as a first-stage filter.

A Brute-Force Engine

Start with a correct baseline.

class BruteForceGeometryEngine:
    def __init__(self):
        self.objects = {}

    def insert(self, object_id, geometry):
        self.objects[object_id] = geometry

    def query_rect(self, area):
        results = []

        for object_id, geometry in self.objects.items():
            bounds = bounds_of(geometry)

            if bounds.intersects(area):
                results.append(object_id)

        return results

Dispatch bounding boxes by object type:

def bounds_of(geometry):
    if isinstance(geometry, Point):
        return point_bounds(geometry)

    if isinstance(geometry, Rect):
        return rect_bounds(geometry)

    raise TypeError(f"unsupported geometry: {type(geometry)!r}")

This baseline is O(n) per query. It is valuable because it gives you a reference implementation for tests.

Uniform Grid Index

A uniform grid divides space into square cells.

+---+---+---+
|   |   |   |
+---+---+---+
|   | X |   |
+---+---+---+
|   |   |   |
+---+---+---+

Each object is assigned to every cell its bounding box overlaps. A query examines only the cells overlapped by the query rectangle.

class GridIndex:
    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)
        self.objects = {}

    def _cell_coord(self, value):
        return math.floor(value / self.cell_size)

    def _cells_for_rect(self, rect):
        min_cx = self._cell_coord(rect.min_x)
        max_cx = self._cell_coord(rect.max_x)
        min_cy = self._cell_coord(rect.min_y)
        max_cy = self._cell_coord(rect.max_y)

        for cx in range(min_cx, max_cx + 1):
            for cy in range(min_cy, max_cy + 1):
                yield cx, cy

Insert objects:

class GridIndex(GridIndex):
    def insert(self, object_id, geometry):
        self.objects[object_id] = geometry
        bounds = bounds_of(geometry)

        for cell in self._cells_for_rect(bounds):
            self.cells[cell].add(object_id)

Query by rectangle:

class GridIndex(GridIndex):
    def query_rect(self, area):
        candidates = set()

        for cell in self._cells_for_rect(area):
            candidates.update(self.cells.get(cell, set()))

        results = []

        for object_id in candidates:
            geometry = self.objects[object_id]

            if bounds_of(geometry).intersects(area):
                results.append(object_id)

        return results

The grid index reduces the candidate set when objects are spatially distributed.

Choosing Cell Size

Cell size controls index quality.

If cells are too large, each query sees too many objects.

If cells are too small, large objects are duplicated across many cells and queries may scan many empty cells.

Cell size Problem
Too large Poor filtering
Too small High overhead
Near object/query scale Usually reasonable

There is no universal value. Choose cell size from data distribution and query workload.

A simple rule of thumb:

cell_size should be near the average query width or average object size

Then benchmark.

Exact Tests After Filtering

The grid index returns candidates based on bounding boxes. For points and rectangles, the bounding box test may already be exact enough. For line segments and polygons, run a second-stage exact test.

The query pipeline should look like this:

query area
  -> spatial index candidates
  -> exact geometry predicate
  -> final results

This two-stage design is common in spatial systems.

Point-in-Rectangle Query

For point data, a rectangular range query should return points inside the area.

class PointGridIndex(GridIndex):
    def query_points_in_rect(self, area):
        candidates = set()

        for cell in self._cells_for_rect(area):
            candidates.update(self.cells.get(cell, set()))

        results = []

        for object_id in candidates:
            point = self.objects[object_id]

            if area.contains_point(point):
                results.append(object_id)

        return results

This is useful for map markers, sensor locations, warehouse locations, and scatterplot selection.

Nearest Neighbor Query

A nearest-neighbor query asks for the closest object to a point.

A brute-force implementation:

def squared_distance(left, right):
    dx = left.x - right.x
    dy = left.y - right.y

    return dx * dx + dy * dy
def nearest_point_bruteforce(points, query):
    best_id = None
    best_distance = math.inf

    for object_id, point in points.items():
        distance = squared_distance(point, query)

        if distance < best_distance:
            best_distance = distance
            best_id = object_id

    return best_id, math.sqrt(best_distance)

Using a grid, search outward from the query cell until no unseen cell can contain a closer point. A simple version expands rings.

def cells_in_ring(cx, cy, radius):
    if radius == 0:
        yield cx, cy
        return

    for x in range(cx - radius, cx + radius + 1):
        yield x, cy - radius
        yield x, cy + radius

    for y in range(cy - radius + 1, cy + radius):
        yield cx - radius, y
        yield cx + radius, y

A complete nearest-neighbor search needs careful stopping conditions. For many applications, a bounded search radius is enough.

Segment Intersection

For line segments, the basic exact predicate is orientation.

@dataclass(frozen=True)
class Segment:
    a: Point
    b: Point

Orientation:

def orientation(a, b, c):
    value = (
        (b.x - a.x) * (c.y - a.y)
        - (b.y - a.y) * (c.x - a.x)
    )

    if value > 0:
        return 1

    if value < 0:
        return -1

    return 0

Segment bounding box:

def segment_bounds(segment):
    return Rect(
        min(segment.a.x, segment.b.x),
        min(segment.a.y, segment.b.y),
        max(segment.a.x, segment.b.x),
        max(segment.a.y, segment.b.y),
    )

Intersection:

def on_segment(a, b, c):
    return (
        min(a.x, c.x) <= b.x <= max(a.x, c.x)
        and min(a.y, c.y) <= b.y <= max(a.y, c.y)
    )

def segments_intersect(left, right):
    a = left.a
    b = left.b
    c = right.a
    d = right.b

    o1 = orientation(a, b, c)
    o2 = orientation(a, b, d)
    o3 = orientation(c, d, a)
    o4 = orientation(c, d, b)

    if o1 != o2 and o3 != o4:
        return True

    if o1 == 0 and on_segment(a, c, b):
        return True

    if o2 == 0 and on_segment(a, d, b):
        return True

    if o3 == 0 and on_segment(c, a, d):
        return True

    if o4 == 0 and on_segment(c, b, d):
        return True

    return False

For a segment query, first use bounding boxes to find candidates, then run segments_intersect.

Extending bounds_of

Add segment support:

def bounds_of(geometry):
    if isinstance(geometry, Point):
        return point_bounds(geometry)

    if isinstance(geometry, Rect):
        return rect_bounds(geometry)

    if isinstance(geometry, Segment):
        return segment_bounds(geometry)

    raise TypeError(f"unsupported geometry: {type(geometry)!r}")

Query segments that intersect a given segment:

class SegmentGridIndex(GridIndex):
    def query_segment(self, segment):
        area = segment_bounds(segment)
        candidates = set()

        for cell in self._cells_for_rect(area):
            candidates.update(self.cells.get(cell, set()))

        results = []

        for object_id in candidates:
            geometry = self.objects[object_id]

            if not isinstance(geometry, Segment):
                continue

            if segments_intersect(segment, geometry):
                results.append(object_id)

        return results

This pattern scales to polygons and circles: index by bounding box, then apply exact predicates.

R-Trees

Uniform grids work well when objects are evenly distributed and similarly sized. They work poorly when geometry is clustered or object sizes vary widely.

An R-tree is a spatial index designed for rectangles. It groups nearby bounding boxes into hierarchical nodes.

Conceptually:

root bounding boxes
  -> child bounding boxes
      -> leaf objects

A query descends only into nodes whose bounding boxes intersect the query area.

R-trees are a standard choice for GIS and spatial databases because they handle nonuniform distributions better than uniform grids.

A full R-tree implementation is beyond this recipe, but the interface is similar:

class SpatialIndex:
    def insert(self, object_id, geometry):
        raise NotImplementedError

    def query_rect(self, area):
        raise NotImplementedError

Design your engine around this interface so you can replace the grid with an R-tree later.

Precision and Robustness

Floating-point geometry is fragile.

Problems include:

  • Nearly collinear points.
  • Tiny rounding errors.
  • Points near boundaries.
  • Very large and very small coordinates mixed together.
  • Different coordinate systems.

A strict orientation check:

if value > 0

may be unstable when value is near zero.

Use an epsilon policy:

EPSILON = 1e-9

def orientation_eps(a, b, c):
    value = (
        (b.x - a.x) * (c.y - a.y)
        - (b.y - a.y) * (c.x - a.x)
    )

    if value > EPSILON:
        return 1

    if value < -EPSILON:
        return -1

    return 0

For serious computational geometry, prefer robust predicates and well-defined coordinate systems. For integer grid geometry, exact integer arithmetic is often simpler and safer.

Coordinate Systems

A geometry engine should make coordinate assumptions explicit.

Examples:

Domain Coordinate model
Screen UI Pixels, y may increase downward
GIS Latitude and longitude
CAD Cartesian coordinates
Games World coordinates
Robotics Local or global frame

Euclidean distance on latitude and longitude is usually wrong over large geographic areas. For maps, use appropriate projections or geodesic distance formulas.

The algorithms in this recipe assume a flat Cartesian plane.

Deletion and Updates

A spatial index must support updates.

For a grid index, deletion requires removing the object from every cell it occupies.

Store the object's previous cells.

class MutableGridIndex(GridIndex):
    def __init__(self, cell_size):
        super().__init__(cell_size)
        self.object_cells = {}

    def insert(self, object_id, geometry):
        if object_id in self.objects:
            self.delete(object_id)

        self.objects[object_id] = geometry
        bounds = bounds_of(geometry)
        cells = set(self._cells_for_rect(bounds))

        for cell in cells:
            self.cells[cell].add(object_id)

        self.object_cells[object_id] = cells

    def delete(self, object_id):
        cells = self.object_cells.pop(object_id, set())

        for cell in cells:
            self.cells[cell].discard(object_id)

        self.objects.pop(object_id, None)

Updates become delete plus insert.

def update(self, object_id, geometry):
    self.insert(object_id, geometry)

Metrics

Track index quality.

Useful metrics:

Metric Meaning
object_count Stored objects
cell_count Non-empty cells
average_objects_per_cell Filtering quality
max_objects_per_cell Hotspot indicator
average_cells_per_object Duplication overhead
query_candidate_count Work before exact filtering
query_result_count Final result size
false_positive_ratio Candidate count divided by result count

If most queries inspect thousands of candidates to return a few objects, the index parameters are poor.

Testing

Test rectangle intersection.

def test_rect_intersection():
    a = Rect(0, 0, 10, 10)
    b = Rect(5, 5, 15, 15)
    c = Rect(20, 20, 30, 30)

    assert a.intersects(b) is True
    assert a.intersects(c) is False

Test point query.

def test_points_in_rect():
    index = PointGridIndex(cell_size=10)

    index.insert("a", Point(1, 1))
    index.insert("b", Point(20, 20))

    result = set(index.query_points_in_rect(
        Rect(0, 0, 10, 10)
    ))

    assert result == {"a"}

Test segment intersection.

def test_segments_intersect():
    left = Segment(Point(0, 0), Point(10, 10))
    right = Segment(Point(0, 10), Point(10, 0))

    assert segments_intersect(left, right) is True

Test deletion.

def test_delete_removes_from_cells():
    index = MutableGridIndex(cell_size=10)

    index.insert("a", Rect(0, 0, 5, 5))
    index.delete("a")

    assert index.query_rect(Rect(0, 0, 10, 10)) == []

Small geometry tests should include boundary cases. Boundary semantics must be explicit: points on edges either count as inside or outside.

Common Bugs

The most common geometry engine bug is relying on bounding boxes as exact geometry tests. Bounding boxes are filters. They may produce false positives.

Another common bug is choosing a grid cell size without measuring candidate counts.

Objects that span many cells can dominate memory. Large rectangles, long roads, or country-sized polygons need special handling.

Floating-point comparisons near boundaries can produce inconsistent results. Use an epsilon policy or exact arithmetic where appropriate.

Deletion bugs leave stale object IDs in cells. Track object-to-cell mappings.

Coordinate systems are often mixed. Latitude-longitude values should not be treated as flat x-y coordinates unless the domain permits it.

Nearest-neighbor queries are easy to implement incorrectly. Stopping conditions must prove that no unseen cell can contain a closer object.

Recipe

Build the engine in layers.

Start with geometry primitives and exact predicates. Add a brute-force engine as a correctness baseline. Add bounding boxes for every object type. Implement a uniform grid index for rectangular candidate filtering. Add exact second-stage tests for points, rectangles, and segments. Track candidate counts and tune cell size. Add deletion by storing object-to-cell mappings. Move to R-trees when data is clustered, object sizes vary, or grid tuning becomes difficult. Define coordinate and boundary semantics before exposing the API.

The main lesson is that spatial query engines use approximation before precision. Bounding boxes and indexes quickly narrow the search space; exact geometry tests confirm the final answer. This separation keeps the engine fast without giving up correctness.