15.18 Closest Pair of Points
Given \(n\) points in a plane, find the pair with the smallest Euclidean distance.
15.18 Closest Pair of Points
Problem
Given (n) points in a plane, find the pair with the smallest Euclidean distance.
Example:
(2,3)
(12,30)
(40,50)
(5,1)
(12,10)
(3,4)
The closest pair is:
(2,3)
(3,4)
with distance:
$$ \sqrt{(3-2)^2+(4-3)^2} = \sqrt{2} $$
A straightforward solution compares every pair of points.
point 1 vs point 2
point 1 vs point 3
...
point n-1 vs point n
The number of comparisons is:
$$ \binom{n}{2} = O(n^2) $$
Can divide and conquer find the closest pair more efficiently?
Solution
Use the closest-pair divide-and-conquer algorithm.
The algorithm:
- Sort points by x-coordinate.
- Split the points into left and right halves.
- Recursively find the closest pair in each half.
- Find the closest pair that crosses the dividing line.
- Return the best result.
The surprising result is that the merge step can be completed in linear time.
This yields:
$$ T(n)=2T(n/2)+O(n) $$
and therefore:
$$ O(n\log n) $$
overall.
Brute Force Baseline
The direct implementation is simple.
func ClosestPairBrute(points []Point) float64 {
best := math.Inf(1)
for i := 0; i < len(points); i++ {
for j := i + 1; j < len(points); j++ {
best = math.Min(
best,
Distance(points[i], points[j]),
)
}
}
return best
}
Complexity:
$$ O(n^2) $$
For a few thousand points this is acceptable.
For millions of points it becomes expensive.
Geometric Observation
Suppose we split the points by a vertical line.
left points | right points
Recursively compute:
dL = closest distance in left half
dR = closest distance in right half
Then:
$$ d=\min(d_L,d_R) $$
Any closer pair must cross the dividing line.
Therefore the merge step only needs to search for crossing pairs closer than (d).
This observation eliminates most comparisons.
Divide Step
After sorting by x-coordinate:
P0 P1 P2 P3 | P4 P5 P6 P7
Split around the median x-coordinate.
left
right
Recursively solve:
closest(left)
closest(right)
This is straightforward divide and conquer.
The challenge is the merge step.
The Strip
Let:
$$ d=\min(d_L,d_R) $$
Any crossing pair closer than (d) must lie within distance (d) of the dividing line.
Construct the strip:
|<---- d ---->| line |<---- d ---->|
Points farther away cannot improve the answer.
Only points inside this strip matter.
Why the Strip Is Small
At first glance, the strip might contain many points.
The key geometric fact is that points inside each half are already separated by at least (d).
That limits how densely points can be packed.
As a result, each point only needs to be compared against a constant number of nearby points.
This is the insight that transforms the merge step from quadratic to linear.
Sorting by Y-Coordinate
Inside the strip, sort points by y-coordinate.
Example:
(4,2)
(5,4)
(4,5)
(6,7)
(5,9)
...
Now process points from bottom to top.
For each point:
compare only nearby points above it
The geometry guarantees that at most a constant number of comparisons are necessary.
The Famous Seven-Point Bound
Consider a point:
p
inside the strip.
Any point farther than (d) vertically cannot improve the answer.
Therefore only points inside a rectangle:
width = 2d
height = d
need consideration.
A packing argument shows that at most seven subsequent points can lie in the relevant region.
Therefore:
compare against next 7 points
is sufficient.
This makes the strip scan linear.
Point Structure
type Point struct {
X float64
Y float64
}
Distance:
func Distance(a, b Point) float64 {
dx := a.X - b.X
dy := a.Y - b.Y
return math.Sqrt(dx*dx + dy*dy)
}
Recursive Structure
Pseudo-code:
closest(points)
if few points
brute force
split
dLeft = closest(left)
dRight = closest(right)
d = min(dLeft, dRight)
build strip
scan strip
return best distance
This resembles merge sort.
The strip scan replaces the merge operation.
Efficient Implementation Strategy
Maintain:
Px = points sorted by x
Py = points sorted by y
throughout recursion.
This avoids repeated sorting.
If we sort inside every recursive call, complexity becomes:
$$ O(n\log^2 n) $$
Maintaining both orderings gives the optimal:
$$ O(n\log n) $$
algorithm.
Recursive Skeleton
func ClosestPair(points []Point) float64 {
px := append([]Point(nil), points...)
py := append([]Point(nil), points...)
sort.Slice(px, func(i, j int) bool {
return px[i].X < px[j].X
})
sort.Slice(py, func(i, j int) bool {
return py[i].Y < py[j].Y
})
return closest(px, py)
}
The recursive helper receives both sorted views.
Base Case
For small inputs:
2 points
3 points
4 points
brute force is simpler.
if len(px) <= 3 {
return bruteForce(px)
}
This keeps the recursion clean.
Strip Construction
Suppose:
midX = median x-coordinate
After computing:
d
build:
strip := make([]Point, 0)
for _, p := range py {
if math.Abs(p.X-midX) < d {
strip = append(strip, p)
}
}
Because py is already sorted by y-coordinate, the strip inherits that ordering automatically.
Strip Scan
best := d
for i := 0; i < len(strip); i++ {
for j := i + 1;
j < len(strip) &&
strip[j].Y-strip[i].Y < best;
j++ {
best = math.Min(
best,
Distance(strip[i], strip[j]),
)
}
}
The theoretical analysis shows that the inner loop performs only a constant number of iterations per point.
Therefore:
$$ O(n) $$
total strip work.
Recurrence Analysis
Each recursive call processes:
left half
right half
The merge step performs:
build strip
scan strip
in linear time.
Therefore:
$$ T(n)=2T(n/2)+O(n) $$
Using the Master Theorem:
$$ T(n)=O(n\log n) $$
This is exponentially better than:
$$ O(n^2) $$
for large datasets.
Visual Example
Suppose:
left minimum = 5
right minimum = 4
Then:
d = 4
Any crossing pair farther than:
4
from the dividing line cannot improve the answer.
The strip contains only candidate points.
The geometric packing argument then reduces thousands of potential comparisons to a handful per point.
This is the entire reason the algorithm succeeds.
Closest Pair in Higher Dimensions
The same idea extends beyond two dimensions.
Examples:
3D points
4D points
k-dimensional points
However, the constant factors increase.
The elegant seven-point bound becomes more complicated.
For large dimensions, spatial data structures often become preferable.
Examples include:
- k-d trees
- ball trees
- cover trees
- locality-sensitive hashing
The divide-and-conquer algorithm remains most attractive in low dimensions.
Applications
Geographic Information Systems
Finding nearby locations.
Examples:
- restaurants
- hospitals
- delivery hubs
Collision Detection
Identifying objects that may collide.
Examples:
- physics engines
- games
- simulations
Clustering
Many clustering algorithms begin with nearest-neighbor relationships.
Computer Vision
Matching features across images.
Computational Geometry
The closest-pair problem is one of the classic geometry problems that demonstrates divide-and-conquer power.
Common Mistakes
Sorting During Every Recursive Call
Repeated sorting increases complexity.
Maintain sorted structures across recursion.
Comparing Every Point in the Strip
The strip scan is not quadratic.
Use the geometric bound.
Ignoring Y Ordering
The linear merge step depends on processing points by y-coordinate.
Using Square Roots Excessively
Distance comparisons can use squared distance:
dx*dx + dy*dy
This avoids many expensive square-root operations.
Take the square root only when returning the final answer.
Mishandling Duplicate Points
If two points are identical:
distance = 0
The algorithm should immediately recognize this as the optimal answer.
Relationship to Merge Sort
The closest-pair algorithm is often described as the geometric analogue of merge sort.
Both algorithms:
- Split the input in half.
- Recursively solve each half.
- Perform a linear merge step.
The difference is that merge sort combines sorted sequences, while closest-pair combines geometric information.
The recurrence is identical:
$$ T(n)=2T(n/2)+O(n) $$
which is why both achieve:
$$ O(n\log n) $$
time.
Discussion
Closest-pair is one of the most elegant divide-and-conquer algorithms because the difficult part is not the recursion. The difficult part is proving that the merge step is linear. At first glance, points near the dividing line seem to require many comparisons. The geometric packing argument reveals hidden structure and collapses that apparent complexity into a constant number of checks per point.
This is a recurring lesson in advanced algorithm design. The recursive decomposition is often obvious. The real breakthrough comes from discovering additional properties that make the combine phase efficient. Merge sort uses ordering. FFT uses algebraic symmetry. Closest-pair uses geometry. In each case, divide and conquer succeeds because the merge step is far cheaper than it initially appears.