20.19 Building a Geometry Query Engine
Design a geometry query engine that stores spatial objects and answers geometric questions efficiently.
20.19 Building a Geometry Query Engine
Problem
Design a geometry query engine that stores spatial objects and answers geometric questions efficiently.
Geometry query engines appear in maps, CAD tools, games, robotics, logistics, GIS systems, layout engines, collision detection systems, and spatial databases. The input is a set of objects such as points, rectangles, line segments, or polygons. The query asks for objects that match a spatial condition.
Common queries include:
Which points are inside this rectangle?
Which objects intersect this area?
Which object is nearest to this point?
Which line segments cross this segment?
Which polygons contain this location?
A naive engine scans every object for every query. That is simple but too slow once the dataset grows. The central algorithmic idea is spatial indexing: use geometry to discard most objects before running exact tests.
Solution
Start with axis-aligned bounding boxes and a grid index. A bounding box gives a cheap approximation of an object. A grid index maps space into cells, so a query only inspects objects near the query region.
For a first engine, store points and rectangles.
from dataclasses import dataclass
from collections import defaultdict
import math
Geometry primitives:
@dataclass(frozen=True)
class Point:
x: float
y: float
@dataclass(frozen=True)
class Rect:
min_x: float
min_y: float
max_x: float
max_y: float
def contains_point(self, point):
return (
self.min_x <= point.x <= self.max_x
and self.min_y <= point.y <= self.max_y
)
def intersects(self, other):
return not (
self.max_x < other.min_x
or other.max_x < self.min_x
or self.max_y < other.min_y
or other.max_y < self.min_y
)
The intersects method is the basic filter for rectangular range queries.
Bounding Boxes
Every indexed object should expose a bounding box.
For a point, the bounding box is a zero-area rectangle.
def point_bounds(point):
return Rect(
min_x=point.x,
min_y=point.y,
max_x=point.x,
max_y=point.y,
)
For a rectangle, the bounding box is itself.
def rect_bounds(rect):
return rect
For a polygon or line segment, compute the minimum and maximum x and y values. The bounding box may include empty space, but it is fast to test.
Bounding boxes are conservative:
If two objects do not have intersecting bounding boxes, they cannot intersect.
If two bounding boxes intersect, the objects may or may not intersect.
This makes them useful as a first-stage filter.
A Brute-Force Engine
Start with a correct baseline.
class BruteForceGeometryEngine:
def __init__(self):
self.objects = {}
def insert(self, object_id, geometry):
self.objects[object_id] = geometry
def query_rect(self, area):
results = []
for object_id, geometry in self.objects.items():
bounds = bounds_of(geometry)
if bounds.intersects(area):
results.append(object_id)
return results
Dispatch bounding boxes by object type:
def bounds_of(geometry):
if isinstance(geometry, Point):
return point_bounds(geometry)
if isinstance(geometry, Rect):
return rect_bounds(geometry)
raise TypeError(f"unsupported geometry: {type(geometry)!r}")
This baseline is O(n) per query. It is valuable because it gives you a reference implementation for tests.
Uniform Grid Index
A uniform grid divides space into square cells.
+---+---+---+
| | | |
+---+---+---+
| | X | |
+---+---+---+
| | | |
+---+---+---+
Each object is assigned to every cell its bounding box overlaps. A query examines only the cells overlapped by the query rectangle.
class GridIndex:
def __init__(self, cell_size):
self.cell_size = cell_size
self.cells = defaultdict(set)
self.objects = {}
def _cell_coord(self, value):
return math.floor(value / self.cell_size)
def _cells_for_rect(self, rect):
min_cx = self._cell_coord(rect.min_x)
max_cx = self._cell_coord(rect.max_x)
min_cy = self._cell_coord(rect.min_y)
max_cy = self._cell_coord(rect.max_y)
for cx in range(min_cx, max_cx + 1):
for cy in range(min_cy, max_cy + 1):
yield cx, cy
Insert objects:
class GridIndex(GridIndex):
def insert(self, object_id, geometry):
self.objects[object_id] = geometry
bounds = bounds_of(geometry)
for cell in self._cells_for_rect(bounds):
self.cells[cell].add(object_id)
Query by rectangle:
class GridIndex(GridIndex):
def query_rect(self, area):
candidates = set()
for cell in self._cells_for_rect(area):
candidates.update(self.cells.get(cell, set()))
results = []
for object_id in candidates:
geometry = self.objects[object_id]
if bounds_of(geometry).intersects(area):
results.append(object_id)
return results
The grid index reduces the candidate set when objects are spatially distributed.
Choosing Cell Size
Cell size controls index quality.
If cells are too large, each query sees too many objects.
If cells are too small, large objects are duplicated across many cells and queries may scan many empty cells.
| Cell size | Problem |
|---|---|
| Too large | Poor filtering |
| Too small | High overhead |
| Near object/query scale | Usually reasonable |
There is no universal value. Choose cell size from data distribution and query workload.
A simple rule of thumb:
cell_size should be near the average query width or average object size
Then benchmark.
Exact Tests After Filtering
The grid index returns candidates based on bounding boxes. For points and rectangles, the bounding box test may already be exact enough. For line segments and polygons, run a second-stage exact test.
The query pipeline should look like this:
query area
-> spatial index candidates
-> exact geometry predicate
-> final results
This two-stage design is common in spatial systems.
Point-in-Rectangle Query
For point data, a rectangular range query should return points inside the area.
class PointGridIndex(GridIndex):
def query_points_in_rect(self, area):
candidates = set()
for cell in self._cells_for_rect(area):
candidates.update(self.cells.get(cell, set()))
results = []
for object_id in candidates:
point = self.objects[object_id]
if area.contains_point(point):
results.append(object_id)
return results
This is useful for map markers, sensor locations, warehouse locations, and scatterplot selection.
Nearest Neighbor Query
A nearest-neighbor query asks for the closest object to a point.
A brute-force implementation:
def squared_distance(left, right):
dx = left.x - right.x
dy = left.y - right.y
return dx * dx + dy * dy
def nearest_point_bruteforce(points, query):
best_id = None
best_distance = math.inf
for object_id, point in points.items():
distance = squared_distance(point, query)
if distance < best_distance:
best_distance = distance
best_id = object_id
return best_id, math.sqrt(best_distance)
Using a grid, search outward from the query cell until no unseen cell can contain a closer point. A simple version expands rings.
def cells_in_ring(cx, cy, radius):
if radius == 0:
yield cx, cy
return
for x in range(cx - radius, cx + radius + 1):
yield x, cy - radius
yield x, cy + radius
for y in range(cy - radius + 1, cy + radius):
yield cx - radius, y
yield cx + radius, y
A complete nearest-neighbor search needs careful stopping conditions. For many applications, a bounded search radius is enough.
Segment Intersection
For line segments, the basic exact predicate is orientation.
@dataclass(frozen=True)
class Segment:
a: Point
b: Point
Orientation:
def orientation(a, b, c):
value = (
(b.x - a.x) * (c.y - a.y)
- (b.y - a.y) * (c.x - a.x)
)
if value > 0:
return 1
if value < 0:
return -1
return 0
Segment bounding box:
def segment_bounds(segment):
return Rect(
min(segment.a.x, segment.b.x),
min(segment.a.y, segment.b.y),
max(segment.a.x, segment.b.x),
max(segment.a.y, segment.b.y),
)
Intersection:
def on_segment(a, b, c):
return (
min(a.x, c.x) <= b.x <= max(a.x, c.x)
and min(a.y, c.y) <= b.y <= max(a.y, c.y)
)
def segments_intersect(left, right):
a = left.a
b = left.b
c = right.a
d = right.b
o1 = orientation(a, b, c)
o2 = orientation(a, b, d)
o3 = orientation(c, d, a)
o4 = orientation(c, d, b)
if o1 != o2 and o3 != o4:
return True
if o1 == 0 and on_segment(a, c, b):
return True
if o2 == 0 and on_segment(a, d, b):
return True
if o3 == 0 and on_segment(c, a, d):
return True
if o4 == 0 and on_segment(c, b, d):
return True
return False
For a segment query, first use bounding boxes to find candidates, then run segments_intersect.
Extending bounds_of
Add segment support:
def bounds_of(geometry):
if isinstance(geometry, Point):
return point_bounds(geometry)
if isinstance(geometry, Rect):
return rect_bounds(geometry)
if isinstance(geometry, Segment):
return segment_bounds(geometry)
raise TypeError(f"unsupported geometry: {type(geometry)!r}")
Query segments that intersect a given segment:
class SegmentGridIndex(GridIndex):
def query_segment(self, segment):
area = segment_bounds(segment)
candidates = set()
for cell in self._cells_for_rect(area):
candidates.update(self.cells.get(cell, set()))
results = []
for object_id in candidates:
geometry = self.objects[object_id]
if not isinstance(geometry, Segment):
continue
if segments_intersect(segment, geometry):
results.append(object_id)
return results
This pattern scales to polygons and circles: index by bounding box, then apply exact predicates.
R-Trees
Uniform grids work well when objects are evenly distributed and similarly sized. They work poorly when geometry is clustered or object sizes vary widely.
An R-tree is a spatial index designed for rectangles. It groups nearby bounding boxes into hierarchical nodes.
Conceptually:
root bounding boxes
-> child bounding boxes
-> leaf objects
A query descends only into nodes whose bounding boxes intersect the query area.
R-trees are a standard choice for GIS and spatial databases because they handle nonuniform distributions better than uniform grids.
A full R-tree implementation is beyond this recipe, but the interface is similar:
class SpatialIndex:
def insert(self, object_id, geometry):
raise NotImplementedError
def query_rect(self, area):
raise NotImplementedError
Design your engine around this interface so you can replace the grid with an R-tree later.
Precision and Robustness
Floating-point geometry is fragile.
Problems include:
- Nearly collinear points.
- Tiny rounding errors.
- Points near boundaries.
- Very large and very small coordinates mixed together.
- Different coordinate systems.
A strict orientation check:
if value > 0
may be unstable when value is near zero.
Use an epsilon policy:
EPSILON = 1e-9
def orientation_eps(a, b, c):
value = (
(b.x - a.x) * (c.y - a.y)
- (b.y - a.y) * (c.x - a.x)
)
if value > EPSILON:
return 1
if value < -EPSILON:
return -1
return 0
For serious computational geometry, prefer robust predicates and well-defined coordinate systems. For integer grid geometry, exact integer arithmetic is often simpler and safer.
Coordinate Systems
A geometry engine should make coordinate assumptions explicit.
Examples:
| Domain | Coordinate model |
|---|---|
| Screen UI | Pixels, y may increase downward |
| GIS | Latitude and longitude |
| CAD | Cartesian coordinates |
| Games | World coordinates |
| Robotics | Local or global frame |
Euclidean distance on latitude and longitude is usually wrong over large geographic areas. For maps, use appropriate projections or geodesic distance formulas.
The algorithms in this recipe assume a flat Cartesian plane.
Deletion and Updates
A spatial index must support updates.
For a grid index, deletion requires removing the object from every cell it occupies.
Store the object's previous cells.
class MutableGridIndex(GridIndex):
def __init__(self, cell_size):
super().__init__(cell_size)
self.object_cells = {}
def insert(self, object_id, geometry):
if object_id in self.objects:
self.delete(object_id)
self.objects[object_id] = geometry
bounds = bounds_of(geometry)
cells = set(self._cells_for_rect(bounds))
for cell in cells:
self.cells[cell].add(object_id)
self.object_cells[object_id] = cells
def delete(self, object_id):
cells = self.object_cells.pop(object_id, set())
for cell in cells:
self.cells[cell].discard(object_id)
self.objects.pop(object_id, None)
Updates become delete plus insert.
def update(self, object_id, geometry):
self.insert(object_id, geometry)
Metrics
Track index quality.
Useful metrics:
| Metric | Meaning |
|---|---|
| object_count | Stored objects |
| cell_count | Non-empty cells |
| average_objects_per_cell | Filtering quality |
| max_objects_per_cell | Hotspot indicator |
| average_cells_per_object | Duplication overhead |
| query_candidate_count | Work before exact filtering |
| query_result_count | Final result size |
| false_positive_ratio | Candidate count divided by result count |
If most queries inspect thousands of candidates to return a few objects, the index parameters are poor.
Testing
Test rectangle intersection.
def test_rect_intersection():
a = Rect(0, 0, 10, 10)
b = Rect(5, 5, 15, 15)
c = Rect(20, 20, 30, 30)
assert a.intersects(b) is True
assert a.intersects(c) is False
Test point query.
def test_points_in_rect():
index = PointGridIndex(cell_size=10)
index.insert("a", Point(1, 1))
index.insert("b", Point(20, 20))
result = set(index.query_points_in_rect(
Rect(0, 0, 10, 10)
))
assert result == {"a"}
Test segment intersection.
def test_segments_intersect():
left = Segment(Point(0, 0), Point(10, 10))
right = Segment(Point(0, 10), Point(10, 0))
assert segments_intersect(left, right) is True
Test deletion.
def test_delete_removes_from_cells():
index = MutableGridIndex(cell_size=10)
index.insert("a", Rect(0, 0, 5, 5))
index.delete("a")
assert index.query_rect(Rect(0, 0, 10, 10)) == []
Small geometry tests should include boundary cases. Boundary semantics must be explicit: points on edges either count as inside or outside.
Common Bugs
The most common geometry engine bug is relying on bounding boxes as exact geometry tests. Bounding boxes are filters. They may produce false positives.
Another common bug is choosing a grid cell size without measuring candidate counts.
Objects that span many cells can dominate memory. Large rectangles, long roads, or country-sized polygons need special handling.
Floating-point comparisons near boundaries can produce inconsistent results. Use an epsilon policy or exact arithmetic where appropriate.
Deletion bugs leave stale object IDs in cells. Track object-to-cell mappings.
Coordinate systems are often mixed. Latitude-longitude values should not be treated as flat x-y coordinates unless the domain permits it.
Nearest-neighbor queries are easy to implement incorrectly. Stopping conditions must prove that no unseen cell can contain a closer object.
Recipe
Build the engine in layers.
Start with geometry primitives and exact predicates. Add a brute-force engine as a correctness baseline. Add bounding boxes for every object type. Implement a uniform grid index for rectangular candidate filtering. Add exact second-stage tests for points, rectangles, and segments. Track candidate counts and tune cell size. Add deletion by storing object-to-cell mappings. Move to R-trees when data is clustered, object sizes vary, or grid tuning becomes difficult. Define coordinate and boundary semantics before exposing the API.
The main lesson is that spatial query engines use approximation before precision. Bounding boxes and indexes quickly narrow the search space; exact geometry tests confirm the final answer. This separation keeps the engine fast without giving up correctness.