17.8 Closest Pair of Points

Problem

Given a set of points in the plane, find the pair with the smallest Euclidean distance.

A brute-force solution compares every pair of points. For n points, that requires:

O(n²)

distance checks.

That is acceptable for small input, but it becomes impractical for large point sets. The closest-pair problem needs a faster method.

Solution

Use divide and conquer.

The high-level structure is:

sort points by x-coordinate
split into left and right halves
solve recursively on each half
merge by checking points near the split line

The key insight is that after solving the left and right halves, only a small number of cross-boundary point pairs can improve the answer.

Let:

d = best distance found in the left or right half

Any better pair crossing the split line must have both points within distance d of the split line.

This gives a narrow vertical strip to inspect.

Brute Force Baseline

Always start with the simple version. It is useful for testing the optimized implementation.

def distance_squared(a, b):
    dx = a.x - b.x
    dy = a.y - b.y
    return dx * dx + dy * dy

def closest_pair_brute(points):
    n = len(points)

    if n < 2:
        return None

    best = None
    best_distance = float("inf")

    for i in range(n):
        for j in range(i + 1, n):
            d = distance_squared(points[i], points[j])

            if d < best_distance:
                best_distance = d
                best = (points[i], points[j])

    return best, best_distance

This implementation returns the pair and the squared distance. Squared distances avoid unnecessary square roots.

Divide-and-Conquer Implementation

Use a point type:

from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: float
    y: float

The main function sorts the points once, then calls a recursive helper.

def closest_pair(points):
    if len(points) < 2:
        return None

    points_by_x = sorted(points, key=lambda p: (p.x, p.y))
    points_by_y = sorted(points, key=lambda p: (p.y, p.x))

    pair, distance = closest_pair_recursive(points_by_x, points_by_y)

    return pair, distance

The recursive function receives the same points sorted two ways: by x and by y.

def closest_pair_recursive(points_by_x, points_by_y):
    n = len(points_by_x)

    if n <= 3:
        return closest_pair_brute(points_by_x)

    mid = n // 2
    midpoint = points_by_x[mid]

    left_by_x = points_by_x[:mid]
    right_by_x = points_by_x[mid:]

    left_set = set(left_by_x)

    left_by_y = []
    right_by_y = []

    for point in points_by_y:
        if point in left_set:
            left_by_y.append(point)
        else:
            right_by_y.append(point)

    left_pair, left_distance = closest_pair_recursive(left_by_x, left_by_y)
    right_pair, right_distance = closest_pair_recursive(right_by_x, right_by_y)

    if left_distance <= right_distance:
        best_pair = left_pair
        best_distance = left_distance
    else:
        best_pair = right_pair
        best_distance = right_distance

    strip = [
        point
        for point in points_by_y
        if (point.x - midpoint.x) * (point.x - midpoint.x) < best_distance
    ]

    for i in range(len(strip)):
        for j in range(i + 1, min(i + 8, len(strip))):
            d = distance_squared(strip[i], strip[j])

            if d < best_distance:
                best_distance = d
                best_pair = (strip[i], strip[j])

    return best_pair, best_distance

The loop over the strip checks only the next few points in y order. The constant 8 is enough in the standard planar closest-pair proof. Many implementations use 7; using 8 is harmless and easier to write clearly.

Why the Strip Is Small

After the recursive calls, no two points in the left half are closer than d, and no two points in the right half are closer than d.

Only pairs split across the dividing line can improve the answer.

So both points must be inside this vertical strip:

        split
          |
     d    |    d
<-------->|<-------->
          |

Now sort the strip by y.

For each point, only a constant number of following points need to be examined. Geometrically, too many points in the same d by 2d neighborhood would force two of them to be closer than d, contradicting the recursive result.

That is the reason the merge step remains linear.

Example

Input:

(0,0)
(5,4)
(3,1)
(9,6)
(4,2)
(8,7)

The algorithm splits the points by x:

left:  (0,0), (3,1), (4,2)
right: (5,4), (8,7), (9,6)

The left side finds:

(3,1), (4,2)

with squared distance:

The right side finds:

(8,7), (9,6)

with squared distance:

The strip then checks cross-boundary candidates. If none is closer, either pair is a valid closest pair.

Handling Duplicate Points

Duplicate points make the closest distance zero.

You can detect them before recursion:

def has_duplicates(points):
    return len(set(points)) != len(points)

If duplicates exist, return the duplicate pair immediately.

In practice, this is often worth doing. A zero distance is the best possible answer.

Returning the Actual Distance

The algorithm above returns squared distance.

To return Euclidean distance:

import math

pair, distance2 = closest_pair(points)
distance = math.sqrt(distance2)

For comparisons, keep squared distance. Convert only for final reporting.

Correctness Argument

The closest pair is either entirely in the left half, entirely in the right half, or split across the dividing line.

The recursive calls solve the first two cases.

Let d be the smaller of those two answers. Any split pair with distance less than d must lie within distance d of the vertical split line, so it must appear in the strip.

The strip is sorted by y. For each point in the strip, only a constant number of later points can be close enough to improve the answer. Checking those candidates therefore covers every possible split pair that can beat d.

Since all three cases are covered, the algorithm returns the closest pair.

Complexity Analysis

Initial sorting costs:

O(n log n)

At each recursive level, the merge work is linear:

O(n)

The recurrence is:

T(n) = 2T(n/2) + O(n)

Therefore:

T(n) = O(n log n)

Space usage is:

O(n)

for sorted lists and recursive working arrays.

Common Pitfalls

Do not sort the strip independently inside every recursive call unless you accept a slower implementation. Keeping points_by_y through recursion preserves the linear merge step.

Do not compare actual distances with square roots in the inner loop. Squared distances are faster and avoid extra floating-point work.

Do not forget duplicate points. They produce distance zero and can simplify the answer immediately.

Do not check every point in the strip against every other point. That turns the merge step back into quadratic work.

Do not split points_by_y by comparing only x < midpoint.x when duplicate x values exist. Use membership in the left set, as shown above.

Discussion

The closest-pair algorithm is a model divide-and-conquer algorithm. The recursive split is straightforward, but the value comes from the merge step. Once the best distance inside each half is known, geometry sharply limits the number of cross-boundary candidates.

This pattern appears often in computational geometry: sort once, preserve useful order through recursion, and use geometric packing arguments to reduce the number of candidates.