# Cover Tree Search

# Cover Tree Search

Cover tree search is a metric space search structure designed for nearest neighbor queries. It builds a hierarchy of points where each level covers the level below with exponentially shrinking distance bounds.

It provides strong theoretical guarantees in spaces with bounded intrinsic dimension.

## Problem

Given a set of points $S$ in a metric space and a query point $q$, find:

$$
\arg\min_{p \in S} d(p, q)
$$

## Structure

Each level $i$ of the tree has a scale:

$$
2^i
$$

Each node satisfies:

| property   | meaning                                 |
| ---------- | --------------------------------------- |
| nesting    | every point appears in all lower levels |
| covering   | children are within bounded distance    |
| separation | nodes at same level are well separated  |

Each node stores a point and pointers to children at the next finer level.

## Algorithm

Search maintains a candidate best distance and prunes using scale bounds.

```text id="ct0k7q"
cover_tree_search(node, q, best):
    if node is null:
        return best

    d = distance(q, node.point)
    best = min(best, d)

    if node.is_leaf:
        return best

    for child in node.children:
        if distance(q, child.point) <= best + scale(child.level):
            best = cover_tree_search(child, q, best)

    return best
```

## Example

Points in 2D:

| point |
| ----- |
| (1,1) |
| (2,2) |
| (4,4) |
| (8,8) |

Hierarchy:

| level | scale |
| ----- | ----- |
| 3     | 8     |
| 2     | 4     |
| 1     | 2     |
| 0     | 1     |

Query:

$$
q = (3,3)
$$

Nearest neighbor:

$$
(2,2)
$$

## Correctness

The cover tree enforces two invariants:

* Covering: every point at level $i$ is within distance $2^i$ of its parent
* Separation: distinct nodes at level $i$ are at least $2^i$ apart

These properties guarantee that any point closer than the current best candidate must lie within a reachable subtree under the pruning condition.

Since all candidates that can improve the result are explored, and all pruned subtrees are provably too far to contain a closer point, the algorithm returns the true nearest neighbor.

## Complexity

Let $n$ be number of points and intrinsic dimension be bounded.

| case                     | time        |
| ------------------------ | ----------- |
| average nearest neighbor | $O(\log n)$ |
| worst case               | $O(n)$      |

Space complexity:

$$
O(n)
$$

## When to Use

Cover trees are useful when:

* data lies in general metric spaces
* intrinsic dimension is low or moderate
* nearest neighbor queries dominate workload
* triangle inequality holds but coordinates are not available

Compared to k-d trees and ball trees, cover trees provide stronger theoretical guarantees in high dimensional metric spaces.

## Implementation

```python id="ct1m8v"
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class Node:
    def __init__(self, point, level):
        self.point = point
        self.level = level
        self.children = []

def scale(level):
    return 2 ** level

def cover_tree_search(node, q, best=float("inf")):
    if node is None:
        return best

    d = dist(q, node.point)
    if d < best:
        best = d

    for child in node.children:
        if dist(q, child.point) <= best + scale(child.level):
            best = cover_tree_search(child, q, best)

    return best
```

```go id="ct2n7x"
import "math"

type Point []float64

func dist(a, b Point) float64 {
	sum := 0.0
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

type Node struct {
	Point    Point
	Level    int
	Children []*Node
}

func scale(level int) float64 {
	return math.Pow(2, float64(level))
}

func CoverTreeSearch(node *Node, q Point, best float64) float64 {
	if node == nil {
		return best
	}

	d := dist(q, node.Point)
	if d < best {
		best = d
	}

	for _, child := range node.Children {
		if dist(q, child.Point) <= best+scale(child.Level) {
			best = CoverTreeSearch(child, q, best)
		}
	}

	return best
}
```

