20.12 Building a Cache Eviction Policy
Design a cache that stores recently used values and evicts entries when capacity is full.
20.12 Building a Cache Eviction Policy
Problem
Design a cache that stores recently used values and evicts entries when capacity is full.
Caches appear in databases, operating systems, compilers, web servers, API clients, CDNs, search systems, build tools, and application frameworks. They reduce latency by keeping expensive results close to where they are needed.
A cache answers two questions:
Do we already have this value?
If the cache is full, which value should we remove?
Example:
capacity = 3
put A
put B
put C
get A
put D
At the final step, the cache must evict one entry. A reasonable policy removes B, because A was just used and C was more recent than B.
This is the least recently used policy, usually called LRU.
Solution
Use a hash table plus a doubly linked list.
The hash table maps keys to nodes, so lookup is O(1). The linked list stores usage order, so the cache can move recently used nodes to the front and evict the least recently used node from the back.
The design looks like this:
Hash table:
A -> node(A)
B -> node(B)
C -> node(C)
List:
most recent A <-> C <-> B least recent
Operations:
| Operation | Required behavior |
|---|---|
get(key) |
Return value and mark entry as recently used |
put(key, value) |
Insert or update value and mark entry as recently used |
| Eviction | Remove least recently used entry |
Node Structure
A doubly linked list node stores a key, a value, and links to neighbors.
class Node:
def __init__(self, key=None, value=None):
self.key = key
self.value = value
self.prev = None
self.next = None
Use sentinel nodes for the head and tail. Sentinels avoid special cases when the list is empty or has one element.
class LRUCache:
def __init__(self, capacity):
if capacity <= 0:
raise ValueError("capacity must be positive")
self.capacity = capacity
self.items = {}
self.head = Node()
self.tail = Node()
self.head.next = self.tail
self.tail.prev = self.head
The list invariant is:
head <-> most recent ... least recent <-> tail
List Operations
The cache needs three small list operations:
- Remove a node from its current position.
- Add a node immediately after the head.
- Move a node to the front.
class LRUCache:
def __init__(self, capacity):
if capacity <= 0:
raise ValueError("capacity must be positive")
self.capacity = capacity
self.items = {}
self.head = Node()
self.tail = Node()
self.head.next = self.tail
self.tail.prev = self.head
def _remove(self, node):
previous_node = node.prev
next_node = node.next
previous_node.next = next_node
next_node.prev = previous_node
node.prev = None
node.next = None
def _add_front(self, node):
first = self.head.next
node.prev = self.head
node.next = first
self.head.next = node
first.prev = node
def _move_front(self, node):
self._remove(node)
self._add_front(node)
These helpers are private because callers should not manipulate the list directly.
Getting Values
A successful lookup marks the entry as recently used.
class LRUCache:
def __init__(self, capacity):
if capacity <= 0:
raise ValueError("capacity must be positive")
self.capacity = capacity
self.items = {}
self.head = Node()
self.tail = Node()
self.head.next = self.tail
self.tail.prev = self.head
def _remove(self, node):
previous_node = node.prev
next_node = node.next
previous_node.next = next_node
next_node.prev = previous_node
node.prev = None
node.next = None
def _add_front(self, node):
first = self.head.next
node.prev = self.head
node.next = first
self.head.next = node
first.prev = node
def _move_front(self, node):
self._remove(node)
self._add_front(node)
def get(self, key):
node = self.items.get(key)
if node is None:
return None
self._move_front(node)
return node.value
Returning None for a missing key is simple, but it creates ambiguity if None is a valid cached value. A production cache may return (found, value) instead.
def get_entry(cache, key):
value = cache.get(key)
if value is None:
return False, None
return True, value
A cleaner design is to raise KeyError or use a sentinel object.
Inserting and Updating Values
A put operation has three cases:
- Key already exists: update value and move it to the front.
- Key is new and cache has space: insert at the front.
- Key is new and cache is full: evict least recently used entry, then insert.
class LRUCache:
def __init__(self, capacity):
if capacity <= 0:
raise ValueError("capacity must be positive")
self.capacity = capacity
self.items = {}
self.head = Node()
self.tail = Node()
self.head.next = self.tail
self.tail.prev = self.head
def _remove(self, node):
previous_node = node.prev
next_node = node.next
previous_node.next = next_node
next_node.prev = previous_node
node.prev = None
node.next = None
def _add_front(self, node):
first = self.head.next
node.prev = self.head
node.next = first
self.head.next = node
first.prev = node
def _move_front(self, node):
self._remove(node)
self._add_front(node)
def _evict_lru(self):
node = self.tail.prev
if node is self.head:
return
self._remove(node)
del self.items[node.key]
def get(self, key):
node = self.items.get(key)
if node is None:
return None
self._move_front(node)
return node.value
def put(self, key, value):
node = self.items.get(key)
if node is not None:
node.value = value
self._move_front(node)
return
if len(self.items) >= self.capacity:
self._evict_lru()
node = Node(key, value)
self.items[key] = node
self._add_front(node)
Now both get and put run in O(1) expected time.
Walking Through an Example
Capacity:
3
Operations:
put A
put B
put C
get A
put D
State after put A:
A
State after put B:
B, A
State after put C:
C, B, A
State after get A:
A, C, B
State after put D:
D, A, C
B is evicted because it is least recently used.
Complexity
| Operation | Complexity |
|---|---|
get |
O(1) expected |
put |
O(1) expected |
| Move to front | O(1) |
| Evict least recent | O(1) |
| Memory | O(capacity) |
The O(1) bounds depend on two structures working together. The hash table gives direct access to nodes. The doubly linked list gives constant-time reordering and eviction.
A hash table alone cannot identify the least recently used item efficiently. A linked list alone cannot find arbitrary keys efficiently.
FIFO Eviction
LRU is not the only policy. The simplest eviction policy is FIFO: first in, first out.
FIFO evicts the oldest inserted item, regardless of whether it was recently used.
from collections import deque
class FIFOCache:
def __init__(self, capacity):
self.capacity = capacity
self.values = {}
self.order = deque()
def get(self, key):
return self.values.get(key)
def put(self, key, value):
if key in self.values:
self.values[key] = value
return
if len(self.values) >= self.capacity:
oldest = self.order.popleft()
del self.values[oldest]
self.values[key] = value
self.order.append(key)
FIFO is easy to implement but often performs worse than LRU because it ignores access patterns.
LFU Eviction
Least frequently used, or LFU, evicts the item with the lowest access count.
This helps when some items are consistently popular but not always recently used.
Example:
A accessed 100 times yesterday
B accessed once one second ago
LRU may keep B. LFU may keep A.
A simplified LFU design stores access counts:
class SimpleLFUCache:
def __init__(self, capacity):
self.capacity = capacity
self.values = {}
self.counts = {}
def get(self, key):
if key not in self.values:
return None
self.counts[key] += 1
return self.values[key]
def put(self, key, value):
if key in self.values:
self.values[key] = value
self.counts[key] += 1
return
if len(self.values) >= self.capacity:
victim = min(
self.counts,
key=lambda item: self.counts[item],
)
del self.values[victim]
del self.counts[victim]
self.values[key] = value
self.counts[key] = 1
This version has O(n) eviction because it scans all counts. A production LFU cache uses frequency buckets and linked lists to achieve O(1) operations.
LFU also needs aging. Without aging, old popularity can dominate forever.
TTL Expiration
Many caches should expire entries after a fixed time.
Example:
cache user profile for 5 minutes
Add an expiration timestamp.
import time
class TTLCache:
def __init__(self, ttl_seconds):
self.ttl_seconds = ttl_seconds
self.values = {}
def put(self, key, value, now=None):
if now is None:
now = time.time()
self.values[key] = {
"value": value,
"expires_at": now + self.ttl_seconds,
}
def get(self, key, now=None):
if now is None:
now = time.time()
entry = self.values.get(key)
if entry is None:
return None
if entry["expires_at"] <= now:
del self.values[key]
return None
return entry["value"]
TTL and eviction solve different problems.
| Mechanism | Purpose |
|---|---|
| Eviction | Remove entries when capacity is full |
| TTL | Remove entries when they are stale |
A real cache often uses both.
Weighted Capacity
Not all entries cost the same amount of memory.
A cache of strings, images, query results, or serialized objects may need a byte budget rather than an item count.
def approximate_size(value):
return len(str(value).encode("utf-8"))
A weighted cache tracks total size:
class WeightedLRUCache(LRUCache):
def __init__(self, max_weight):
super().__init__(capacity=10**18)
self.max_weight = max_weight
self.current_weight = 0
self.weights = {}
def put(self, key, value):
weight = approximate_size(value)
old_node = self.items.get(key)
if old_node is not None:
self.current_weight -= self.weights[key]
old_node.value = value
self.weights[key] = weight
self.current_weight += weight
self._move_front(old_node)
else:
node = Node(key, value)
self.items[key] = node
self.weights[key] = weight
self.current_weight += weight
self._add_front(node)
while self.current_weight > self.max_weight:
self._evict_lru_weighted()
def _evict_lru_weighted(self):
node = self.tail.prev
if node is self.head:
return
self._remove(node)
del self.items[node.key]
self.current_weight -= self.weights[node.key]
del self.weights[node.key]
This pattern is common in memory-sensitive services.
Cache Miss Handling
A cache usually wraps an expensive loader.
def get_or_load(cache, key, loader):
value = cache.get(key)
if value is not None:
return value
value = loader(key)
cache.put(key, value)
return value
This is simple but can cause duplicate work under concurrency. If many requests miss the same key at once, they may all call loader.
This is called a cache stampede.
Avoiding Cache Stampedes
A cache stampede occurs when many callers try to regenerate the same value at the same time.
Common mitigations:
| Technique | Behavior |
|---|---|
| Per-key lock | Only one caller regenerates a key |
| Stale-while-revalidate | Return stale value while refreshing |
| Jittered TTL | Avoid synchronized expiration |
| Request coalescing | Share one in-flight computation |
| Negative caching | Cache misses for absent data |
A simple per-key lock approach:
import threading
from collections import defaultdict
class LoadingCache:
def __init__(self, cache):
self.cache = cache
self.locks = defaultdict(threading.Lock)
def get_or_load(self, key, loader):
value = self.cache.get(key)
if value is not None:
return value
with self.locks[key]:
value = self.cache.get(key)
if value is not None:
return value
value = loader(key)
self.cache.put(key, value)
return value
The second cache check inside the lock is essential. Another thread may have already loaded the value.
Negative Caching
Sometimes the expensive result is “not found.”
Example:
load user 999 -> not found
Without negative caching, repeated requests for the same missing user hit the backing store repeatedly.
Use a sentinel for negative entries.
NOT_FOUND = object()
def get_or_load_with_negative_cache(cache, key, loader):
value = cache.get(key)
if value is NOT_FOUND:
return None
if value is not None:
return value
loaded = loader(key)
if loaded is None:
cache.put(key, NOT_FOUND)
return None
cache.put(key, loaded)
return loaded
Negative entries should usually have shorter TTLs than positive entries, because missing data may appear later.
Metrics
A cache needs metrics to be useful.
Track:
| Metric | Meaning |
|---|---|
| hit_count | Number of successful cache reads |
| miss_count | Number of failed cache reads |
| hit_rate | hit_count divided by total reads |
| eviction_count | Number of removed entries |
| load_count | Number of calls to backing loader |
| load_latency | Time spent loading values |
| current_size | Number or weight of entries |
| stale_served_count | Number of stale values returned |
A cache with a low hit rate may be too small, keyed incorrectly, or caching the wrong workload.
Testing
Test basic insertion and lookup.
def test_lru_get_existing_value():
cache = LRUCache(capacity=2)
cache.put("A", 1)
assert cache.get("A") == 1
Test eviction order.
def test_lru_evicts_least_recently_used():
cache = LRUCache(capacity=2)
cache.put("A", 1)
cache.put("B", 2)
assert cache.get("A") == 1
cache.put("C", 3)
assert cache.get("B") is None
assert cache.get("A") == 1
assert cache.get("C") == 3
Test update behavior.
def test_lru_update_refreshes_entry():
cache = LRUCache(capacity=2)
cache.put("A", 1)
cache.put("B", 2)
cache.put("A", 10)
cache.put("C", 3)
assert cache.get("A") == 10
assert cache.get("B") is None
These tests protect the linked-list invariant and eviction behavior.
Common Bugs
The most common LRU bug is updating the hash table but forgetting to update the linked list. This causes stale nodes and incorrect eviction.
Another common bug is failing to remove evicted entries from the hash table. The cache then returns values that should no longer exist.
Sentinel handling can be wrong at boundaries. Use head and tail sentinels to avoid empty-list special cases.
Returning None for a miss is ambiguous when None is a valid value. Use a sentinel or an explicit result type.
TTL caches often return expired values because expiration is checked only during cleanup. Check expiration on read.
Concurrent caches often duplicate expensive loads. Use per-key locking or request coalescing.
Weighted caches can exceed memory budgets if updates do not subtract the old weight before adding the new one.
Recipe
Build the cache in layers.
Start with an LRU cache using a hash table and a doubly linked list. Use sentinels to simplify list manipulation. Make get refresh recency. Make put update existing entries and evict from the tail when full. Add TTL only after capacity eviction works. Add byte-based capacity when item sizes vary. Add loader integration, per-key locks, and negative caching when the cache fronts an expensive source. Track hit rate, miss rate, eviction count, and load latency.
The main lesson is that a cache eviction policy is a contract about what the system chooses to remember. LRU is a strong default because recent use often predicts future use, but the right policy depends on workload shape, memory budget, staleness tolerance, and concurrency behavior.