17.18 Geometric Hashing

You need a fast way to group or search geometric objects by location.

17.18 Geometric Hashing

Problem

You need a fast way to group or search geometric objects by location.

Examples include:

nearby point lookup
collision broad-phase detection
duplicate point detection
spatial clustering
map tile indexing
particle simulation
nearest-neighbor candidates

A brute-force search compares every object with every other object.

For n objects, that costs:

O(n²)

Geometric hashing reduces this cost by placing objects into spatial buckets. Instead of searching the whole dataset, you search only nearby buckets.

Solution

Divide space into a regular grid and hash each object into one or more grid cells.

For a point:

(x, y)

and cell size:

s

the grid cell is:

cell_x = floor(x / s)
cell_y = floor(y / s)

The pair:

(cell_x, cell_y)

becomes the hash key.

In Python:

from collections import defaultdict
from math import floor

class SpatialHash:
    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def key(self, point):
        return (
            floor(point.x / self.cell_size),
            floor(point.y / self.cell_size)
        )

    def insert(self, point):
        self.cells[self.key(point)].append(point)

This gives a simple spatial index.

Querying Nearby Points

To find points near a query point, inspect the query cell and neighboring cells.

def nearby_cells(cell):
    cx, cy = cell

    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            yield (cx + dx, cy + dy)

Then:

def candidates(self, point):
    result = []

    for cell in nearby_cells(self.key(point)):
        result.extend(self.cells.get(cell, []))

    return result

The result is a candidate set, not necessarily the final answer.

You still need an exact distance test afterward.

def distance_squared(a, b):
    dx = a.x - b.x
    dy = a.y - b.y

    return dx * dx + dy * dy

For a radius query:

def radius_query(self, point, radius):
    radius2 = radius * radius
    result = []

    for candidate in self.candidates(point):
        if distance_squared(point, candidate) <= radius2:
            result.append(candidate)

    return result

Choosing Cell Size

Cell size controls performance.

If cells are too small:

many empty buckets
many neighboring cells to inspect
more hash overhead

If cells are too large:

too many objects per bucket
candidate sets become large
performance approaches brute force

For radius queries, a common choice is:

cell size = query radius

Then any point within the radius must lie in the same cell or one of the eight neighboring cells.

For variable-radius queries, no single cell size is ideal. You may need a hierarchy, an R-tree, or a KD-tree.

Duplicate Point Detection

Spatial hashing is useful for merging points that are equal or nearly equal.

For exact integer coordinates:

def find_duplicates(points):
    seen = set()
    duplicates = []

    for point in points:
        key = (point.x, point.y)

        if key in seen:
            duplicates.append(point)
        else:
            seen.add(key)

    return duplicates

For approximate floating-point points, hash by grid cell and check nearby cells.

def almost_duplicate(hash_index, point, eps):
    for candidate in hash_index.radius_query(point, eps):
        return candidate

    return None

This is common in mesh cleanup and point-cloud processing.

Collision Broad Phase

In physics engines and games, collision detection often has two phases:

broad phase: find possible collisions
narrow phase: test exact geometry

A spatial hash is a common broad-phase structure.

For each object, insert its bounding box into every grid cell it overlaps.

def box_cells(box, cell_size):
    min_x = floor(box.left / cell_size)
    max_x = floor((box.right) / cell_size)

    min_y = floor(box.bottom / cell_size)
    max_y = floor((box.top) / cell_size)

    for x in range(min_x, max_x + 1):
        for y in range(min_y, max_y + 1):
            yield (x, y)

Objects sharing a cell become collision candidates.

def insert_box(cells, obj, box, cell_size):
    for cell in box_cells(box, cell_size):
        cells[cell].append(obj)

Then, for each bucket, test pairs only inside that bucket.

Avoiding Duplicate Pair Tests

If two boxes overlap multiple cells, the same object pair may appear multiple times.

Use a set of tested pairs.

def pair_key(a, b):
    if id(a) < id(b):
        return (id(a), id(b))

    return (id(b), id(a))

During broad-phase testing:

tested = set()

for bucket in cells.values():
    for i in range(len(bucket)):
        for j in range(i + 1, len(bucket)):
            key = pair_key(bucket[i], bucket[j])

            if key in tested:
                continue

            tested.add(key)

            if exact_collision(bucket[i], bucket[j]):
                report_collision(bucket[i], bucket[j])

The spatial hash reduces the number of candidate pairs. The final exact test preserves correctness.

Hashing Segments

For line segments, insert the segment into every grid cell it crosses.

A simple approach is to insert its bounding box:

def insert_segment_by_box(cells, segment, cell_size):
    box = bounding_box(segment)

    for cell in box_cells(box, cell_size):
        cells[cell].append(segment)

This is conservative: it may add the segment to cells it does not actually cross.

That is acceptable for broad-phase detection.

For tighter indexing, use a grid traversal algorithm that walks through only cells intersected by the segment.

Geometric Hashing for Pattern Matching

The term geometric hashing also refers to a technique for recognizing shapes under transformations.

The high-level idea:

  1. Pick a basis from points in a model.
  2. Express other model points relative to that basis.
  3. Store those transformed coordinates in a hash table.
  4. For a query shape, repeat the process and vote for matching models.

This version is used in computer vision and pattern recognition.

In this section, the focus is the spatial-indexing version because it appears more often in algorithmic geometry and simulation code.

Complexity Analysis

For uniformly distributed points and a suitable cell size:

insertion: O(1) expected
candidate query: O(1) expected
radius query: O(k) expected

where k is the number of reported nearby points.

For poor distributions, performance can degrade.

If many points fall into the same cell, queries become:

O(n)

A spatial hash is simple and fast when the data distribution and cell size are reasonable.

Common Pitfalls

Choosing the Wrong Cell Size

Cell size determines bucket density. Bad cell sizes destroy performance.

Forgetting Neighboring Cells

A nearby point may lie just across a cell boundary.

Always inspect adjacent cells for radius queries.

Treating Candidates as Answers

A spatial hash returns candidates. Run exact geometry afterward.

Ignoring Negative Coordinates

Use floor, not integer truncation toward zero.

For example:

floor(-0.2) = -1

but truncation may produce:

0

which places points in the wrong cell.

Duplicate Pair Reports

Objects spanning multiple cells may create repeated candidate pairs.

Track tested pairs when reporting collisions.

Discussion

Geometric hashing is one of the simplest spatial indexing techniques. It trades perfect selectivity for speed and implementation simplicity. For uniformly distributed objects and predictable query radii, it is often faster and easier than a tree-based index.

The important design rule is to treat the hash as a filter. It narrows the search to plausible candidates, but the exact geometric predicate still decides the answer. This broad-phase then narrow-phase split is a recurring pattern in practical geometry systems.