Compute the sorted-order rank of a key in a search tree augmented with subtree sizes.
Order Statistic Tree Rank computes how many keys are smaller than a given key. It uses a binary search tree where each node stores the size of its subtree.
This operation is the inverse of Order Statistic Tree Select. Select maps a rank to a key. Rank maps a key to its position in sorted order.
Problem
Given an order statistic tree and a key , compute the zero-based rank of :
If duplicates are allowed, the definition must specify whether rank means first occurrence, last occurrence, or count of strictly smaller keys.
Data Structure
Each node stores its key, children, parent if upward traversal is used, and subtree size.
node:
key
left
right
sizeThe size field satisfies:
size(node) = 1 + size(node.left) + size(node.right)Algorithm
Walk from the root as in binary search. Whenever the search moves right, all keys in the left subtree and the current node are smaller than the target, so add them to the rank.
rank(root, x):
r = 0
node = root
while node is not null:
if x < node.key:
node = node.left
else if x > node.key:
r = r + size(node.left) + 1
node = node.right
else:
return r + size(node.left)
return not foundExample
For sorted order:
The rank of is:
because three elements are smaller than :
Correctness
At every node, the binary search tree invariant separates the key space into smaller keys on the left and larger keys on the right.
When the algorithm moves left, no skipped key is smaller than , so the rank does not change. When it moves right, the current node and all keys in its left subtree are smaller than , so their count is added. When the key is found, all remaining smaller keys are exactly those in its left subtree.
Therefore the returned value is the number of keys smaller than .
Complexity
| tree type | time |
|---|---|
| balanced tree | |
| unbalanced tree |
Space:
for the iterative version.
When to Use
Use Order Statistic Tree Rank when:
- the set changes over time
- you need dynamic ranks
- insertions and deletions must remain efficient
- sorted arrays are too costly to maintain under updates
Typical applications include dynamic medians, inversion counting, online percentiles, leaderboard positions, and rank queries in ordered sets.
Implementation
class Node:
def __init__(self, key):
self.key = key
self.left = None
self.right = None
self.size = 1
def size(node):
return node.size if node else 0
def rank(root, x):
r = 0
node = root
while node:
if x < node.key:
node = node.left
elif x > node.key:
r += size(node.left) + 1
node = node.right
else:
return r + size(node.left)
return Nonetype OSTNode struct {
Key int
Left, Right *OSTNode
Size int
}
func nodeSize(x *OSTNode) int {
if x == nil {
return 0
}
return x.Size
}
func OrderStatisticRank(root *OSTNode, key int) (int, bool) {
r := 0
x := root
for x != nil {
if key < x.Key {
x = x.Left
} else if key > x.Key {
r += nodeSize(x.Left) + 1
x = x.Right
} else {
return r + nodeSize(x.Left), true
}
}
return 0, false
}