17.18 Geometric Hashing
You need a fast way to group or search geometric objects by location.
17.18 Geometric Hashing
Problem
You need a fast way to group or search geometric objects by location.
Examples include:
nearby point lookup
collision broad-phase detection
duplicate point detection
spatial clustering
map tile indexing
particle simulation
nearest-neighbor candidates
A brute-force search compares every object with every other object.
For n objects, that costs:
O(n²)
Geometric hashing reduces this cost by placing objects into spatial buckets. Instead of searching the whole dataset, you search only nearby buckets.
Solution
Divide space into a regular grid and hash each object into one or more grid cells.
For a point:
(x, y)
and cell size:
s
the grid cell is:
cell_x = floor(x / s)
cell_y = floor(y / s)
The pair:
(cell_x, cell_y)
becomes the hash key.
In Python:
from collections import defaultdict
from math import floor
class SpatialHash:
def __init__(self, cell_size):
self.cell_size = cell_size
self.cells = defaultdict(list)
def key(self, point):
return (
floor(point.x / self.cell_size),
floor(point.y / self.cell_size)
)
def insert(self, point):
self.cells[self.key(point)].append(point)
This gives a simple spatial index.
Querying Nearby Points
To find points near a query point, inspect the query cell and neighboring cells.
def nearby_cells(cell):
cx, cy = cell
for dx in (-1, 0, 1):
for dy in (-1, 0, 1):
yield (cx + dx, cy + dy)
Then:
def candidates(self, point):
result = []
for cell in nearby_cells(self.key(point)):
result.extend(self.cells.get(cell, []))
return result
The result is a candidate set, not necessarily the final answer.
You still need an exact distance test afterward.
def distance_squared(a, b):
dx = a.x - b.x
dy = a.y - b.y
return dx * dx + dy * dy
For a radius query:
def radius_query(self, point, radius):
radius2 = radius * radius
result = []
for candidate in self.candidates(point):
if distance_squared(point, candidate) <= radius2:
result.append(candidate)
return result
Choosing Cell Size
Cell size controls performance.
If cells are too small:
many empty buckets
many neighboring cells to inspect
more hash overhead
If cells are too large:
too many objects per bucket
candidate sets become large
performance approaches brute force
For radius queries, a common choice is:
cell size = query radius
Then any point within the radius must lie in the same cell or one of the eight neighboring cells.
For variable-radius queries, no single cell size is ideal. You may need a hierarchy, an R-tree, or a KD-tree.
Duplicate Point Detection
Spatial hashing is useful for merging points that are equal or nearly equal.
For exact integer coordinates:
def find_duplicates(points):
seen = set()
duplicates = []
for point in points:
key = (point.x, point.y)
if key in seen:
duplicates.append(point)
else:
seen.add(key)
return duplicates
For approximate floating-point points, hash by grid cell and check nearby cells.
def almost_duplicate(hash_index, point, eps):
for candidate in hash_index.radius_query(point, eps):
return candidate
return None
This is common in mesh cleanup and point-cloud processing.
Collision Broad Phase
In physics engines and games, collision detection often has two phases:
broad phase: find possible collisions
narrow phase: test exact geometry
A spatial hash is a common broad-phase structure.
For each object, insert its bounding box into every grid cell it overlaps.
def box_cells(box, cell_size):
min_x = floor(box.left / cell_size)
max_x = floor((box.right) / cell_size)
min_y = floor(box.bottom / cell_size)
max_y = floor((box.top) / cell_size)
for x in range(min_x, max_x + 1):
for y in range(min_y, max_y + 1):
yield (x, y)
Objects sharing a cell become collision candidates.
def insert_box(cells, obj, box, cell_size):
for cell in box_cells(box, cell_size):
cells[cell].append(obj)
Then, for each bucket, test pairs only inside that bucket.
Avoiding Duplicate Pair Tests
If two boxes overlap multiple cells, the same object pair may appear multiple times.
Use a set of tested pairs.
def pair_key(a, b):
if id(a) < id(b):
return (id(a), id(b))
return (id(b), id(a))
During broad-phase testing:
tested = set()
for bucket in cells.values():
for i in range(len(bucket)):
for j in range(i + 1, len(bucket)):
key = pair_key(bucket[i], bucket[j])
if key in tested:
continue
tested.add(key)
if exact_collision(bucket[i], bucket[j]):
report_collision(bucket[i], bucket[j])
The spatial hash reduces the number of candidate pairs. The final exact test preserves correctness.
Hashing Segments
For line segments, insert the segment into every grid cell it crosses.
A simple approach is to insert its bounding box:
def insert_segment_by_box(cells, segment, cell_size):
box = bounding_box(segment)
for cell in box_cells(box, cell_size):
cells[cell].append(segment)
This is conservative: it may add the segment to cells it does not actually cross.
That is acceptable for broad-phase detection.
For tighter indexing, use a grid traversal algorithm that walks through only cells intersected by the segment.
Geometric Hashing for Pattern Matching
The term geometric hashing also refers to a technique for recognizing shapes under transformations.
The high-level idea:
- Pick a basis from points in a model.
- Express other model points relative to that basis.
- Store those transformed coordinates in a hash table.
- For a query shape, repeat the process and vote for matching models.
This version is used in computer vision and pattern recognition.
In this section, the focus is the spatial-indexing version because it appears more often in algorithmic geometry and simulation code.
Complexity Analysis
For uniformly distributed points and a suitable cell size:
insertion: O(1) expected
candidate query: O(1) expected
radius query: O(k) expected
where k is the number of reported nearby points.
For poor distributions, performance can degrade.
If many points fall into the same cell, queries become:
O(n)
A spatial hash is simple and fast when the data distribution and cell size are reasonable.
Common Pitfalls
Choosing the Wrong Cell Size
Cell size determines bucket density. Bad cell sizes destroy performance.
Forgetting Neighboring Cells
A nearby point may lie just across a cell boundary.
Always inspect adjacent cells for radius queries.
Treating Candidates as Answers
A spatial hash returns candidates. Run exact geometry afterward.
Ignoring Negative Coordinates
Use floor, not integer truncation toward zero.
For example:
floor(-0.2) = -1
but truncation may produce:
0
which places points in the wrong cell.
Duplicate Pair Reports
Objects spanning multiple cells may create repeated candidate pairs.
Track tested pairs when reporting collisions.
Discussion
Geometric hashing is one of the simplest spatial indexing techniques. It trades perfect selectivity for speed and implementation simplicity. For uniformly distributed objects and predictable query radii, it is often faster and easier than a tree-based index.
The important design rule is to treat the hash as a filter. It narrows the search to plausible candidates, but the exact geometric predicate still decides the answer. This broad-phase then narrow-phase split is a recurring pattern in practical geometry systems.