A VP tree (vantage-point tree) organizes points in a metric space by selecting a pivot (the vantage point) and partitioning the remaining points by their distance from it. It supports efficient nearest-neighbor search by pruning subtrees with the triangle inequality.
## Problem
Given a set of points `P` in a metric space with distance function `d` and a query point `q`, find the point in `P` that minimizes the distance to `q` (the nearest neighbor).
## Structure
Each node stores:
| field | meaning |
|---|---|
| vantage point | pivot point |
| threshold | median distance to partition points |
| left | points closer than threshold |
| right | points farther than threshold |
Partition rule: points whose distance to the vantage point is less than the threshold go into the left subtree; points at or beyond the threshold go into the right subtree.
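Construction is not otherwise shown in this section, so here is a minimal sketch of a recursive build under the partition rule above, assuming Euclidean distance. It uses the simplest possible pivot choice (the first point) and makes every point an interior vantage point rather than grouping points into leaf buckets; a production build would select the vantage point more carefully, e.g. by sampling.

```python
import math
import statistics

def dist(a, b):
    # Euclidean distance between two points given as coordinate tuples.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class Node:
    def __init__(self, vp, threshold=0.0):
        self.vp = vp                # vantage point stored at this node
        self.threshold = threshold  # median distance used for the split
        self.left = None
        self.right = None

def build(points):
    # Hypothetical, simplest pivot choice: take the first point as vantage point.
    if not points:
        return None
    vp, rest = points[0], points[1:]
    node = Node(vp)
    if not rest:
        return node
    dists = [dist(vp, p) for p in rest]
    node.threshold = statistics.median(dists)
    # Partition rule: strictly closer than the threshold goes left, rest goes right.
    node.left = build([p for p, d in zip(rest, dists) if d < node.threshold])
    node.right = build([p for p, d in zip(rest, dists) if d >= node.threshold])
    return node
```

Each recursive call removes the chosen vantage point from the working set, so the recursion always terminates even when all remaining distances are equal.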
## Algorithm
Search maintains the best distance found so far and prunes subtrees using triangle inequality.
```
vp_search(node, q, best):
    if node is null:
        return best
    d = distance(q, node.vp)
    best = min(best, d)
    if node.is_leaf:
        for p in node.points:
            best = min(best, distance(q, p))
        return best
    if d < node.threshold:
        best = vp_search(node.left, q, best)
        if d + best >= node.threshold:
            best = vp_search(node.right, q, best)
    else:
        best = vp_search(node.right, q, best)
        if d - best <= node.threshold:
            best = vp_search(node.left, q, best)
    return best
```

## Example
Points:
| point |
|---|
| (1,1) |
| (2,2) |
| (8,8) |
| (9,9) |
Choose (1,1) as the vantage point. Its distances to the remaining points are √2 ≈ 1.41 for (2,2), 7√2 ≈ 9.90 for (8,8), and 8√2 ≈ 11.31 for (9,9), so the median distance 7√2 becomes the threshold.

The threshold splits the remaining points into:

| region | points |
|---|---|
| left (distance < threshold) | (2,2) |
| right (distance ≥ threshold) | (8,8), (9,9) |
Query: take q = (3,3), for example. The search computes the distance to the vantage point (1,1), which is 2√2 ≈ 2.83, descends into the left subtree, and finds (2,2) at distance √2 ≈ 1.41. Since d + best ≈ 4.24 is well below the threshold, the right subtree is pruned.

Nearest neighbor: (2,2), at distance √2 ≈ 1.41.
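The answer for this example can be checked against a brute-force linear scan, which is the ground truth any VP tree search must reproduce. The query point (3,3) below is a hypothetical choice for illustration:

```python
import math

def dist(a, b):
    # Euclidean distance between two points given as coordinate tuples.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

points = [(1, 1), (2, 2), (8, 8), (9, 9)]
q = (3, 3)  # hypothetical query point

# Linear scan over all points: the reference result for nearest-neighbor search.
nearest = min(points, key=lambda p: dist(q, p))
```

The scan finds (2,2) at distance √2, matching the tree search.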
## Correctness
VP trees rely on the triangle inequality: for any query q, vantage point vp, and point p, d(q, p) ≥ |d(q, vp) − d(vp, p)|, so the distance from the query to the vantage point bounds the distance to every point in each subtree.
This allows pruning. If the minimum possible distance from the query point to any point in a subtree exceeds the best known distance, that subtree cannot contain a closer point.
Since all points are either examined directly or excluded only when provably farther, the algorithm returns the true nearest neighbor.
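The bound the search relies on, the reverse triangle inequality |d(q, vp) − d(vp, p)| ≤ d(q, p), can be sanity-checked numerically on random points:

```python
import math
import random

def dist(a, b):
    # Euclidean distance between two points given as coordinate tuples.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

random.seed(0)
ok = True
for _ in range(1000):
    # Three random 2-D points: query, vantage point, and candidate.
    q, vp, p = (tuple(random.uniform(-10, 10) for _ in range(2)) for _ in range(3))
    # Reverse triangle inequality: the lower bound used to prune subtrees.
    if abs(dist(q, vp) - dist(vp, p)) > dist(q, p) + 1e-9:
        ok = False
```

If the bound ever failed, pruning could discard a subtree containing the true nearest neighbor; for a genuine metric it never does.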
## Complexity
Let n be the number of points.

| case | time |
|---|---|
| average nearest-neighbor query | O(log n) |
| worst case | O(n) |

Space complexity: O(n), since each point is stored in exactly one node.
## When to Use
VP trees are useful when:
- data lies in a metric space
- distance function is expensive
- nearest neighbor queries are frequent
- the triangle inequality gives tight bounds, so pruning is effective (low intrinsic dimensionality)
Compared to k-d trees, VP trees do not depend on coordinate axes and work better in non-Euclidean spaces.
Compared to ball trees, VP trees often use simpler partition logic.
## Implementation
```python
import math

def dist(a, b):
    # Euclidean distance between two points given as coordinate tuples.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class Node:
    def __init__(self, vp, threshold):
        self.vp = vp                # vantage point stored at this node
        self.threshold = threshold  # median distance used for the split
        self.left = None            # points closer than threshold
        self.right = None           # points at or beyond threshold
        self.points = None          # leaf bucket (set only on leaf nodes)

def vp_search(node, q, best=float("inf")):
    if node is None:
        return best
    d = dist(q, node.vp)
    best = min(best, d)
    if node.points is not None:
        # Leaf: scan the bucket directly.
        for p in node.points:
            best = min(best, dist(q, p))
        return best
    if d < node.threshold:
        # Query falls inside the threshold: search the near side first,
        # then the far side only if it could hold a closer point.
        best = vp_search(node.left, q, best)
        if d + best >= node.threshold:
            best = vp_search(node.right, q, best)
    else:
        best = vp_search(node.right, q, best)
        if d - best <= node.threshold:
            best = vp_search(node.left, q, best)
    return best
```

```go
package main

import "math"

type Point []float64

// dist returns the Euclidean distance between two points.
func dist(a, b Point) float64 {
	sum := 0.0
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

type Node struct {
	VP        Point   // vantage point stored at this node
	Threshold float64 // median distance used for the split
	Left      *Node   // points closer than Threshold
	Right     *Node   // points at or beyond Threshold
	Points    []Point // leaf bucket (non-nil only on leaves)
}

// VPSearch returns the distance from q to its nearest neighbor in the tree.
// Call it with best = math.Inf(1).
func VPSearch(node *Node, q Point, best float64) float64 {
	if node == nil {
		return best
	}
	d := dist(q, node.VP)
	if d < best {
		best = d
	}
	if node.Points != nil {
		// Leaf: scan the bucket directly.
		for _, p := range node.Points {
			if d2 := dist(q, p); d2 < best {
				best = d2
			}
		}
		return best
	}
	if d < node.Threshold {
		// Near side first, then the far side only if it could hold a closer point.
		best = VPSearch(node.Left, q, best)
		if d+best >= node.Threshold {
			best = VPSearch(node.Right, q, best)
		}
	} else {
		best = VPSearch(node.Right, q, best)
		if d-best <= node.Threshold {
			best = VPSearch(node.Left, q, best)
		}
	}
	return best
}
```
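As a usage sketch, the tree from the example section can be assembled by hand and queried. The Python definitions are repeated here so the snippet is self-contained; the structure (root (1,1), threshold 7√2, leaf buckets on both sides) matches the worked example.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class Node:
    def __init__(self, vp, threshold):
        self.vp = vp
        self.threshold = threshold
        self.left = None
        self.right = None
        self.points = None

def vp_search(node, q, best=float("inf")):
    if node is None:
        return best
    d = dist(q, node.vp)
    best = min(best, d)
    if node.points is not None:
        for p in node.points:
            best = min(best, dist(q, p))
        return best
    if d < node.threshold:
        best = vp_search(node.left, q, best)
        if d + best >= node.threshold:
            best = vp_search(node.right, q, best)
    else:
        best = vp_search(node.right, q, best)
        if d - best <= node.threshold:
            best = vp_search(node.left, q, best)
    return best

# Hand-built tree for the four example points, with (1,1) as the root.
root = Node((1, 1), threshold=7 * math.sqrt(2))
root.left = Node((2, 2), threshold=0.0)
root.left.points = []          # leaf with an empty bucket
root.right = Node((8, 8), threshold=0.0)
root.right.points = [(9, 9)]   # leaf bucket
```

A query near the left cluster, such as (3,3), returns √2 (the distance to (2,2)) and prunes the right subtree entirely.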