15.18 Closest Pair of Points

Given \(n\) points in a plane, find the pair with the smallest Euclidean distance.

15.18 Closest Pair of Points

Problem

Given (n) points in a plane, find the pair with the smallest Euclidean distance.

Example:

(2,3)
(12,30)
(40,50)
(5,1)
(12,10)
(3,4)

The closest pair is:

(2,3)
(3,4)

with distance:

$$ \sqrt{(3-2)^2+(4-3)^2} = \sqrt{2} $$

A straightforward solution compares every pair of points.

point 1 vs point 2
point 1 vs point 3
...
point n-1 vs point n

The number of comparisons is:

$$ \binom{n}{2} = O(n^2) $$

Can divide and conquer find the closest pair more efficiently?

Solution

Use the closest-pair divide-and-conquer algorithm.

The algorithm:

  1. Sort points by x-coordinate.
  2. Split the points into left and right halves.
  3. Recursively find the closest pair in each half.
  4. Find the closest pair that crosses the dividing line.
  5. Return the best result.

The surprising result is that the merge step can be completed in linear time.

This yields:

$$ T(n)=2T(n/2)+O(n) $$

and therefore:

$$ O(n\log n) $$

overall.

Brute Force Baseline

The direct implementation is simple.

func ClosestPairBrute(points []Point) float64 {
    best := math.Inf(1)

    for i := 0; i < len(points); i++ {
        for j := i + 1; j < len(points); j++ {
            best = math.Min(
                best,
                Distance(points[i], points[j]),
            )
        }
    }

    return best
}

Complexity:

$$ O(n^2) $$

For a few thousand points this is acceptable.

For millions of points it becomes expensive.

Geometric Observation

Suppose we split the points by a vertical line.

left points  |  right points

Recursively compute:

dL = closest distance in left half
dR = closest distance in right half

Then:

$$ d=\min(d_L,d_R) $$

Any closer pair must cross the dividing line.

Therefore the merge step only needs to search for crossing pairs closer than (d).

This observation eliminates most comparisons.

Divide Step

After sorting by x-coordinate:

P0 P1 P2 P3 | P4 P5 P6 P7

Split around the median x-coordinate.

left
right

Recursively solve:

closest(left)
closest(right)

This is straightforward divide and conquer.

The challenge is the merge step.

The Strip

Let:

$$ d=\min(d_L,d_R) $$

Any crossing pair closer than (d) must lie within distance (d) of the dividing line.

Construct the strip:

|<---- d ---->| line |<---- d ---->|

Points farther away cannot improve the answer.

Only points inside this strip matter.

Why the Strip Is Small

At first glance, the strip might contain many points.

The key geometric fact is that points inside each half are already separated by at least (d).

That limits how densely points can be packed.

As a result, each point only needs to be compared against a constant number of nearby points.

This is the insight that transforms the merge step from quadratic to linear.

Sorting by Y-Coordinate

Inside the strip, sort points by y-coordinate.

Example:

(4,2)
(5,4)
(4,5)
(6,7)
(5,9)
...

Now process points from bottom to top.

For each point:

compare only nearby points above it

The geometry guarantees that at most a constant number of comparisons are necessary.

The Famous Seven-Point Bound

Consider a point:

p

inside the strip.

Any point farther than (d) vertically cannot improve the answer.

Therefore only points inside a rectangle:

width  = 2d
height = d

need consideration.

A packing argument shows that at most seven subsequent points can lie in the relevant region.

Therefore:

compare against next 7 points

is sufficient.

This makes the strip scan linear.

Point Structure

type Point struct {
    X float64
    Y float64
}

Distance:

func Distance(a, b Point) float64 {
    dx := a.X - b.X
    dy := a.Y - b.Y

    return math.Sqrt(dx*dx + dy*dy)
}

Recursive Structure

Pseudo-code:

closest(points)

    if few points
        brute force

    split

    dLeft  = closest(left)
    dRight = closest(right)

    d = min(dLeft, dRight)

    build strip

    scan strip

    return best distance

This resembles merge sort.

The strip scan replaces the merge operation.

Efficient Implementation Strategy

Maintain:

Px = points sorted by x
Py = points sorted by y

throughout recursion.

This avoids repeated sorting.

If we sort inside every recursive call, complexity becomes:

$$ O(n\log^2 n) $$

Maintaining both orderings gives the optimal:

$$ O(n\log n) $$

algorithm.

Recursive Skeleton

func ClosestPair(points []Point) float64 {
    px := append([]Point(nil), points...)
    py := append([]Point(nil), points...)

    sort.Slice(px, func(i, j int) bool {
        return px[i].X < px[j].X
    })

    sort.Slice(py, func(i, j int) bool {
        return py[i].Y < py[j].Y
    })

    return closest(px, py)
}

The recursive helper receives both sorted views.

Base Case

For small inputs:

2 points
3 points
4 points

brute force is simpler.

if len(px) <= 3 {
    return bruteForce(px)
}

This keeps the recursion clean.

Strip Construction

Suppose:

midX = median x-coordinate

After computing:

d

build:

strip := make([]Point, 0)

for _, p := range py {
    if math.Abs(p.X-midX) < d {
        strip = append(strip, p)
    }
}

Because py is already sorted by y-coordinate, the strip inherits that ordering automatically.

Strip Scan

best := d

for i := 0; i < len(strip); i++ {
    for j := i + 1;
        j < len(strip) &&
        strip[j].Y-strip[i].Y < best;
        j++ {

        best = math.Min(
            best,
            Distance(strip[i], strip[j]),
        )
    }
}

The theoretical analysis shows that the inner loop performs only a constant number of iterations per point.

Therefore:

$$ O(n) $$

total strip work.

Recurrence Analysis

Each recursive call processes:

left half
right half

The merge step performs:

build strip
scan strip

in linear time.

Therefore:

$$ T(n)=2T(n/2)+O(n) $$

Using the Master Theorem:

$$ T(n)=O(n\log n) $$

This is exponentially better than:

$$ O(n^2) $$

for large datasets.

Visual Example

Suppose:

left minimum  = 5
right minimum = 4

Then:

d = 4

Any crossing pair farther than:

4

from the dividing line cannot improve the answer.

The strip contains only candidate points.

The geometric packing argument then reduces thousands of potential comparisons to a handful per point.

This is the entire reason the algorithm succeeds.

Closest Pair in Higher Dimensions

The same idea extends beyond two dimensions.

Examples:

3D points
4D points
k-dimensional points

However, the constant factors increase.

The elegant seven-point bound becomes more complicated.

For large dimensions, spatial data structures often become preferable.

Examples include:

  • k-d trees
  • ball trees
  • cover trees
  • locality-sensitive hashing

The divide-and-conquer algorithm remains most attractive in low dimensions.

Applications

Geographic Information Systems

Finding nearby locations.

Examples:

  • restaurants
  • hospitals
  • delivery hubs

Collision Detection

Identifying objects that may collide.

Examples:

  • physics engines
  • games
  • simulations

Clustering

Many clustering algorithms begin with nearest-neighbor relationships.

Computer Vision

Matching features across images.

Computational Geometry

The closest-pair problem is one of the classic geometry problems that demonstrates divide-and-conquer power.

Common Mistakes

Sorting During Every Recursive Call

Repeated sorting increases complexity.

Maintain sorted structures across recursion.

Comparing Every Point in the Strip

The strip scan is not quadratic.

Use the geometric bound.

Ignoring Y Ordering

The linear merge step depends on processing points by y-coordinate.

Using Square Roots Excessively

Distance comparisons can use squared distance:

dx*dx + dy*dy

This avoids many expensive square-root operations.

Take the square root only when returning the final answer.

Mishandling Duplicate Points

If two points are identical:

distance = 0

The algorithm should immediately recognize this as the optimal answer.

Relationship to Merge Sort

The closest-pair algorithm is often described as the geometric analogue of merge sort.

Both algorithms:

  1. Split the input in half.
  2. Recursively solve each half.
  3. Perform a linear merge step.

The difference is that merge sort combines sorted sequences, while closest-pair combines geometric information.

The recurrence is identical:

$$ T(n)=2T(n/2)+O(n) $$

which is why both achieve:

$$ O(n\log n) $$

time.

Discussion

Closest-pair is one of the most elegant divide-and-conquer algorithms because the difficult part is not the recursion. The difficult part is proving that the merge step is linear. At first glance, points near the dividing line seem to require many comparisons. The geometric packing argument reveals hidden structure and collapses that apparent complexity into a constant number of checks per point.

This is a recurring lesson in advanced algorithm design. The recursive decomposition is often obvious. The real breakthrough comes from discovering additional properties that make the combine phase efficient. Merge sort uses ordering. FFT uses algebraic symmetry. Closest-pair uses geometry. In each case, divide and conquer succeeds because the merge step is far cheaper than it initially appears.