A VP tree (vantage-point tree) organizes points in a metric space by selecting a pivot (the vantage point) and partitioning the remaining points by their distance from it. It supports efficient nearest-neighbor search by pruning subtrees with the triangle inequality.
## Problem
Given a set of points `P` in a metric space with distance function `d` and a query point `q`, find the point in `P` that minimizes the distance to `q` (the nearest neighbor).
## Structure
Each node stores:
| field | meaning |
|---|---|
| vantage point | pivot point |
| threshold | median distance to partition points |
| left | points closer than threshold |
| right | points farther than threshold |
Partition rule: points whose distance to the vantage point is less than the threshold go into the left subtree; points at or beyond the threshold go into the right subtree.
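Construction is not otherwise shown in this section, so here is a minimal sketch of a recursive build under the partition rule above, assuming Euclidean distance. It uses the simplest possible pivot choice (the first point) and makes every point an interior vantage point rather than grouping points into leaf buckets; a production build would select the vantage point more carefully, e.g. by sampling.

```python
import math
import statistics

def dist(a, b):
    # Euclidean distance between two points given as coordinate tuples.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class Node:
    def __init__(self, vp, threshold=0.0):
        self.vp = vp                # vantage point stored at this node
        self.threshold = threshold  # median distance used for the split
        self.left = None
        self.right = None

def build(points):
    # Hypothetical, simplest pivot choice: take the first point as vantage point.
    if not points:
        return None
    vp, rest = points[0], points[1:]
    node = Node(vp)
    if not rest:
        return node
    dists = [dist(vp, p) for p in rest]
    node.threshold = statistics.median(dists)
    # Partition rule: strictly closer than the threshold goes left, rest goes right.
    node.left = build([p for p, d in zip(rest, dists) if d < node.threshold])
    node.right = build([p for p, d in zip(rest, dists) if d >= node.threshold])
    return node
```

Each recursive call removes the chosen vantage point from the working set, so the recursion always terminates even when all remaining distances are equal.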
## Algorithm
Search maintains the best distance found so far and prunes subtrees using triangle inequality.
```
vp_search(node, q, best):
    if node is null:
        return best
    d = distance(q, node.vp)
    best = min(best, d)
    if node.is_leaf:
        for p in node.points:
            best = min(best, distance(q, p))
        return best
    if d < node.threshold:
        best = vp_search(node.left, q, best)
        if d + best >= node.threshold:
            best = vp_search(node.right, q, best)
    else:
        best = vp_search(node.right, q, best)
        if d - best <= node.threshold:
            best = vp_search(node.left, q, best)
    return best
```

## Example
Points:
| point |
|---|
| (1,1) |
| (2,2) |
| (8,8) |
| (9,9) |
Choose (1,1) as the vantage point. Its distances to the remaining points are √2 ≈ 1.41 for (2,2), 7√2 ≈ 9.90 for (8,8), and 8√2 ≈ 11.31 for (9,9), so the median distance 7√2 becomes the threshold.

The threshold splits the remaining points into:

| region | points |
|---|---|
| left (distance < threshold) | (2,2) |
| right (distance ≥ threshold) | (8,8), (9,9) |
Query: take q = (3,3), for example. The search computes the distance to the vantage point (1,1), which is 2√2 ≈ 2.83, descends into the left subtree, and finds (2,2) at distance √2 ≈ 1.41. Since d + best ≈ 4.24 is well below the threshold, the right subtree is pruned.

Nearest neighbor: (2,2), at distance √2 ≈ 1.41.
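The answer for this example can be checked against a brute-force linear scan, which is the ground truth any VP tree search must reproduce. The query point (3,3) below is a hypothetical choice for illustration:

```python
import math

def dist(a, b):
    # Euclidean distance between two points given as coordinate tuples.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

points = [(1, 1), (2, 2), (8, 8), (9, 9)]
q = (3, 3)  # hypothetical query point

# Linear scan over all points: the reference result for nearest-neighbor search.
nearest = min(points, key=lambda p: dist(q, p))
```

The scan finds (2,2) at distance √2, matching the tree search.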
## Correctness
VP trees rely on the triangle inequality: for any query q, vantage point vp, and point p, d(q, p) ≥ |d(q, vp) − d(vp, p)|, so the distance from the query to the vantage point bounds the distance to every point in each subtree.
This allows pruning. If the minimum possible distance from the query point to any point in a subtree exceeds the best known distance, that subtree cannot contain a closer point.
Since all points are either examined directly or excluded only when provably farther, the algorithm returns the true nearest neighbor.
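The bound the search relies on, the reverse triangle inequality |d(q, vp) − d(vp, p)| ≤ d(q, p), can be sanity-checked numerically on random points:

```python
import math
import random

def dist(a, b):
    # Euclidean distance between two points given as coordinate tuples.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

random.seed(0)
ok = True
for _ in range(1000):
    # Three random 2-D points: query, vantage point, and candidate.
    q, vp, p = (tuple(random.uniform(-10, 10) for _ in range(2)) for _ in range(3))
    # Reverse triangle inequality: the lower bound used to prune subtrees.
    if abs(dist(q, vp) - dist(vp, p)) > dist(q, p) + 1e-9:
        ok = False
```

If the bound ever failed, pruning could discard a subtree containing the true nearest neighbor; for a genuine metric it never does.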
## Complexity
Let n be the number of points.

| case | time |
|---|---|
| average nearest-neighbor query | O(log n) |
| worst case | O(n) |

Space complexity: O(n), since each point is stored in exactly one node.
## When to Use
VP trees are useful when:
- data lies in a metric space
- distance function is expensive
- nearest neighbor queries are frequent
- the triangle inequality gives tight bounds, so pruning is effective (low intrinsic dimensionality)
Compared to k-d trees, VP trees do not depend on coordinate axes and work better in non-Euclidean spaces.
Compared to ball trees, VP trees often use simpler partition logic.
## Implementation
```python
import math

def dist(a, b):
    # Euclidean distance between two points given as coordinate tuples.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class Node:
    def __init__(self, vp, threshold):
        self.vp = vp                # vantage point stored at this node
        self.threshold = threshold  # median distance used for the split
        self.left = None            # points closer than threshold
        self.right = None           # points at or beyond threshold
        self.points = None          # leaf bucket (set only on leaf nodes)

def vp_search(node, q, best=float("inf")):
    if node is None:
        return best
    d = dist(q, node.vp)
    best = min(best, d)
    if node.points is not None:
        # Leaf: scan the bucket directly.
        for p in node.points:
            best = min(best, dist(q, p))
        return best
    if d < node.threshold:
        # Query falls inside the threshold: search the near side first,
        # then the far side only if it could hold a closer point.
        best = vp_search(node.left, q, best)
        if d + best >= node.threshold:
            best = vp_search(node.right, q, best)
    else:
        best = vp_search(node.right, q, best)
        if d - best <= node.threshold:
            best = vp_search(node.left, q, best)
    return best
```

```go
package main

import "math"

type Point []float64

// dist returns the Euclidean distance between two points.
func dist(a, b Point) float64 {
	sum := 0.0
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

type Node struct {
	VP        Point   // vantage point stored at this node
	Threshold float64 // median distance used for the split
	Left      *Node   // points closer than Threshold
	Right     *Node   // points at or beyond Threshold
	Points    []Point // leaf bucket (non-nil only on leaves)
}

// VPSearch returns the distance from q to its nearest neighbor in the tree.
// Call it with best = math.Inf(1).
func VPSearch(node *Node, q Point, best float64) float64 {
	if node == nil {
		return best
	}
	d := dist(q, node.VP)
	if d < best {
		best = d
	}
	if node.Points != nil {
		// Leaf: scan the bucket directly.
		for _, p := range node.Points {
			if d2 := dist(q, p); d2 < best {
				best = d2
			}
		}
		return best
	}
	if d < node.Threshold {
		// Near side first, then the far side only if it could hold a closer point.
		best = VPSearch(node.Left, q, best)
		if d+best >= node.Threshold {
			best = VPSearch(node.Right, q, best)
		}
	} else {
		best = VPSearch(node.Right, q, best)
		if d-best <= node.Threshold {
			best = VPSearch(node.Left, q, best)
		}
	}
	return best
}
```
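As a usage sketch, the tree from the example section can be assembled by hand and queried. The Python definitions are repeated here so the snippet is self-contained; the structure (root (1,1), threshold 7√2, leaf buckets on both sides) matches the worked example.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class Node:
    def __init__(self, vp, threshold):
        self.vp = vp
        self.threshold = threshold
        self.left = None
        self.right = None
        self.points = None

def vp_search(node, q, best=float("inf")):
    if node is None:
        return best
    d = dist(q, node.vp)
    best = min(best, d)
    if node.points is not None:
        for p in node.points:
            best = min(best, dist(q, p))
        return best
    if d < node.threshold:
        best = vp_search(node.left, q, best)
        if d + best >= node.threshold:
            best = vp_search(node.right, q, best)
    else:
        best = vp_search(node.right, q, best)
        if d - best <= node.threshold:
            best = vp_search(node.left, q, best)
    return best

# Hand-built tree for the four example points, with (1,1) as the root.
root = Node((1, 1), threshold=7 * math.sqrt(2))
root.left = Node((2, 2), threshold=0.0)
root.left.points = []          # leaf with an empty bucket
root.right = Node((8, 8), threshold=0.0)
root.right.points = [(9, 9)]   # leaf bucket
```

A query near the left cluster, such as (3,3), returns √2 (the distance to (2,2)) and prunes the right subtree entirely.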