Nearest neighbor search in metric space using hierarchical covers with exponential scale separation.
Cover tree search is a metric space search structure designed for nearest neighbor queries. It builds a hierarchy of points where each level covers the level below with exponentially shrinking distance bounds.
It provides strong theoretical guarantees in spaces with bounded intrinsic dimension.
Problem
Given a set of points in a metric space and a query point , find:
Structure
Each level of the tree has a scale:
Each node satisfies:
| property | meaning |
|---|---|
| nesting | every point appears in all lower levels |
| covering | children are within bounded distance |
| separation | nodes at same level are well separated |
Each node stores a point and pointers to children at the next finer level.
Algorithm
Search maintains a candidate best distance and prunes using scale bounds.
cover_tree_search(node, q, best):
if node is null:
return best
d = distance(q, node.point)
best = min(best, d)
if node.is_leaf:
return best
for child in node.children:
if distance(q, child.point) <= best + scale(child.level):
best = cover_tree_search(child, q, best)
return bestExample
Points in 2D:
| point |
|---|
| (1,1) |
| (2,2) |
| (4,4) |
| (8,8) |
Hierarchy:
| level | scale |
|---|---|
| 3 | 8 |
| 2 | 4 |
| 1 | 2 |
| 0 | 1 |
Query:
Nearest neighbor:
Correctness
The cover tree enforces two invariants:
- Covering: every point at level is within distance of its parent
- Separation: distinct nodes at level are at least apart
These properties guarantee that any point closer than the current best candidate must lie within a reachable subtree under the pruning condition.
Since all candidates that can improve the result are explored, and all pruned subtrees are provably too far to contain a closer point, the algorithm returns the true nearest neighbor.
Complexity
Let be number of points and intrinsic dimension be bounded.
| case | time |
|---|---|
| average nearest neighbor | |
| worst case |
Space complexity:
When to Use
Cover trees are useful when:
- data lies in general metric spaces
- intrinsic dimension is low or moderate
- nearest neighbor queries dominate workload
- triangle inequality holds but coordinates are not available
Compared to k-d trees and ball trees, cover trees provide stronger theoretical guarantees in high dimensional metric spaces.
Implementation
import math
def dist(a, b):
return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
class Node:
def __init__(self, point, level):
self.point = point
self.level = level
self.children = []
def scale(level):
return 2 ** level
def cover_tree_search(node, q, best=float("inf")):
if node is None:
return best
d = dist(q, node.point)
if d < best:
best = d
for child in node.children:
if dist(q, child.point) <= best + scale(child.level):
best = cover_tree_search(child, q, best)
return bestimport "math"
type Point []float64
func dist(a, b Point) float64 {
sum := 0.0
for i := range a {
d := a[i] - b[i]
sum += d * d
}
return math.Sqrt(sum)
}
type Node struct {
Point Point
Level int
Children []*Node
}
func scale(level int) float64 {
return math.Pow(2, float64(level))
}
func CoverTreeSearch(node *Node, q Point, best float64) float64 {
if node == nil {
return best
}
d := dist(q, node.Point)
if d < best {
best = d
}
for _, child := range node.Children {
if dist(q, child.Point) <= best+scale(child.Level) {
best = CoverTreeSearch(child, q, best)
}
}
return best
}