20.12 Building a Cache Eviction Policy

Design a cache that stores recently used values and evicts entries when capacity is full.

20.12 Building a Cache Eviction Policy

Problem

Design a cache that stores recently used values and evicts entries when capacity is full.

Caches appear in databases, operating systems, compilers, web servers, API clients, CDNs, search systems, build tools, and application frameworks. They reduce latency by keeping expensive results close to where they are needed.

A cache answers two questions:

Do we already have this value?
If the cache is full, which value should we remove?

Example:

capacity = 3

put A
put B
put C
get A
put D

At the final step, the cache must evict one entry. A reasonable policy removes B, because A was just used and C was more recent than B.

This is the least recently used policy, usually called LRU.

Solution

Use a hash table plus a doubly linked list.

The hash table maps keys to nodes, so lookup is O(1). The linked list stores usage order, so the cache can move recently used nodes to the front and evict the least recently used node from the back.

The design looks like this:

Hash table:
A -> node(A)
B -> node(B)
C -> node(C)

List:
most recent  A <-> C <-> B  least recent

Operations:

Operation Required behavior
get(key) Return value and mark entry as recently used
put(key, value) Insert or update value and mark entry as recently used
Eviction Remove least recently used entry

Node Structure

A doubly linked list node stores a key, a value, and links to neighbors.

class Node:
    def __init__(self, key=None, value=None):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None

Use sentinel nodes for the head and tail. Sentinels avoid special cases when the list is empty or has one element.

class LRUCache:
    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")

        self.capacity = capacity
        self.items = {}

        self.head = Node()
        self.tail = Node()

        self.head.next = self.tail
        self.tail.prev = self.head

The list invariant is:

head <-> most recent ... least recent <-> tail

List Operations

The cache needs three small list operations:

  1. Remove a node from its current position.
  2. Add a node immediately after the head.
  3. Move a node to the front.
class LRUCache:
    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")

        self.capacity = capacity
        self.items = {}

        self.head = Node()
        self.tail = Node()

        self.head.next = self.tail
        self.tail.prev = self.head

    def _remove(self, node):
        previous_node = node.prev
        next_node = node.next

        previous_node.next = next_node
        next_node.prev = previous_node

        node.prev = None
        node.next = None

    def _add_front(self, node):
        first = self.head.next

        node.prev = self.head
        node.next = first

        self.head.next = node
        first.prev = node

    def _move_front(self, node):
        self._remove(node)
        self._add_front(node)

These helpers are private because callers should not manipulate the list directly.

Getting Values

A successful lookup marks the entry as recently used.

class LRUCache:
    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")

        self.capacity = capacity
        self.items = {}

        self.head = Node()
        self.tail = Node()

        self.head.next = self.tail
        self.tail.prev = self.head

    def _remove(self, node):
        previous_node = node.prev
        next_node = node.next

        previous_node.next = next_node
        next_node.prev = previous_node

        node.prev = None
        node.next = None

    def _add_front(self, node):
        first = self.head.next

        node.prev = self.head
        node.next = first

        self.head.next = node
        first.prev = node

    def _move_front(self, node):
        self._remove(node)
        self._add_front(node)

    def get(self, key):
        node = self.items.get(key)

        if node is None:
            return None

        self._move_front(node)

        return node.value

Returning None for a missing key is simple, but it creates ambiguity if None is a valid cached value. A production cache may return (found, value) instead.

def get_entry(cache, key):
    value = cache.get(key)

    if value is None:
        return False, None

    return True, value

A cleaner design is to raise KeyError or use a sentinel object.

Inserting and Updating Values

A put operation has three cases:

  1. Key already exists: update value and move it to the front.
  2. Key is new and cache has space: insert at the front.
  3. Key is new and cache is full: evict least recently used entry, then insert.
class LRUCache:
    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")

        self.capacity = capacity
        self.items = {}

        self.head = Node()
        self.tail = Node()

        self.head.next = self.tail
        self.tail.prev = self.head

    def _remove(self, node):
        previous_node = node.prev
        next_node = node.next

        previous_node.next = next_node
        next_node.prev = previous_node

        node.prev = None
        node.next = None

    def _add_front(self, node):
        first = self.head.next

        node.prev = self.head
        node.next = first

        self.head.next = node
        first.prev = node

    def _move_front(self, node):
        self._remove(node)
        self._add_front(node)

    def _evict_lru(self):
        node = self.tail.prev

        if node is self.head:
            return

        self._remove(node)
        del self.items[node.key]

    def get(self, key):
        node = self.items.get(key)

        if node is None:
            return None

        self._move_front(node)

        return node.value

    def put(self, key, value):
        node = self.items.get(key)

        if node is not None:
            node.value = value
            self._move_front(node)
            return

        if len(self.items) >= self.capacity:
            self._evict_lru()

        node = Node(key, value)
        self.items[key] = node
        self._add_front(node)

Now both get and put run in O(1) expected time.

Walking Through an Example

Capacity:

3

Operations:

put A
put B
put C
get A
put D

State after put A:

A

State after put B:

B, A

State after put C:

C, B, A

State after get A:

A, C, B

State after put D:

D, A, C

B is evicted because it is least recently used.

Complexity

Operation Complexity
get O(1) expected
put O(1) expected
Move to front O(1)
Evict least recent O(1)
Memory O(capacity)

The O(1) bounds depend on two structures working together. The hash table gives direct access to nodes. The doubly linked list gives constant-time reordering and eviction.

A hash table alone cannot identify the least recently used item efficiently. A linked list alone cannot find arbitrary keys efficiently.

FIFO Eviction

LRU is not the only policy. The simplest eviction policy is FIFO: first in, first out.

FIFO evicts the oldest inserted item, regardless of whether it was recently used.

from collections import deque

class FIFOCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.values = {}
        self.order = deque()

    def get(self, key):
        return self.values.get(key)

    def put(self, key, value):
        if key in self.values:
            self.values[key] = value
            return

        if len(self.values) >= self.capacity:
            oldest = self.order.popleft()
            del self.values[oldest]

        self.values[key] = value
        self.order.append(key)

FIFO is easy to implement but often performs worse than LRU because it ignores access patterns.

LFU Eviction

Least frequently used, or LFU, evicts the item with the lowest access count.

This helps when some items are consistently popular but not always recently used.

Example:

A accessed 100 times yesterday
B accessed once one second ago

LRU may keep B. LFU may keep A.

A simplified LFU design stores access counts:

class SimpleLFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.values = {}
        self.counts = {}

    def get(self, key):
        if key not in self.values:
            return None

        self.counts[key] += 1
        return self.values[key]

    def put(self, key, value):
        if key in self.values:
            self.values[key] = value
            self.counts[key] += 1
            return

        if len(self.values) >= self.capacity:
            victim = min(
                self.counts,
                key=lambda item: self.counts[item],
            )

            del self.values[victim]
            del self.counts[victim]

        self.values[key] = value
        self.counts[key] = 1

This version has O(n) eviction because it scans all counts. A production LFU cache uses frequency buckets and linked lists to achieve O(1) operations.

LFU also needs aging. Without aging, old popularity can dominate forever.

TTL Expiration

Many caches should expire entries after a fixed time.

Example:

cache user profile for 5 minutes

Add an expiration timestamp.

import time

class TTLCache:
    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self.values = {}

    def put(self, key, value, now=None):
        if now is None:
            now = time.time()

        self.values[key] = {
            "value": value,
            "expires_at": now + self.ttl_seconds,
        }

    def get(self, key, now=None):
        if now is None:
            now = time.time()

        entry = self.values.get(key)

        if entry is None:
            return None

        if entry["expires_at"] <= now:
            del self.values[key]
            return None

        return entry["value"]

TTL and eviction solve different problems.

Mechanism Purpose
Eviction Remove entries when capacity is full
TTL Remove entries when they are stale

A real cache often uses both.

Weighted Capacity

Not all entries cost the same amount of memory.

A cache of strings, images, query results, or serialized objects may need a byte budget rather than an item count.

def approximate_size(value):
    return len(str(value).encode("utf-8"))

A weighted cache tracks total size:

class WeightedLRUCache(LRUCache):
    def __init__(self, max_weight):
        super().__init__(capacity=10**18)
        self.max_weight = max_weight
        self.current_weight = 0
        self.weights = {}

    def put(self, key, value):
        weight = approximate_size(value)

        old_node = self.items.get(key)

        if old_node is not None:
            self.current_weight -= self.weights[key]
            old_node.value = value
            self.weights[key] = weight
            self.current_weight += weight
            self._move_front(old_node)
        else:
            node = Node(key, value)
            self.items[key] = node
            self.weights[key] = weight
            self.current_weight += weight
            self._add_front(node)

        while self.current_weight > self.max_weight:
            self._evict_lru_weighted()

    def _evict_lru_weighted(self):
        node = self.tail.prev

        if node is self.head:
            return

        self._remove(node)
        del self.items[node.key]

        self.current_weight -= self.weights[node.key]
        del self.weights[node.key]

This pattern is common in memory-sensitive services.

Cache Miss Handling

A cache usually wraps an expensive loader.

def get_or_load(cache, key, loader):
    value = cache.get(key)

    if value is not None:
        return value

    value = loader(key)
    cache.put(key, value)

    return value

This is simple but can cause duplicate work under concurrency. If many requests miss the same key at once, they may all call loader.

This is called a cache stampede.

Avoiding Cache Stampedes

A cache stampede occurs when many callers try to regenerate the same value at the same time.

Common mitigations:

Technique Behavior
Per-key lock Only one caller regenerates a key
Stale-while-revalidate Return stale value while refreshing
Jittered TTL Avoid synchronized expiration
Request coalescing Share one in-flight computation
Negative caching Cache misses for absent data

A simple per-key lock approach:

import threading
from collections import defaultdict

class LoadingCache:
    def __init__(self, cache):
        self.cache = cache
        self.locks = defaultdict(threading.Lock)

    def get_or_load(self, key, loader):
        value = self.cache.get(key)

        if value is not None:
            return value

        with self.locks[key]:
            value = self.cache.get(key)

            if value is not None:
                return value

            value = loader(key)
            self.cache.put(key, value)

            return value

The second cache check inside the lock is essential. Another thread may have already loaded the value.

Negative Caching

Sometimes the expensive result is “not found.”

Example:

load user 999 -> not found

Without negative caching, repeated requests for the same missing user hit the backing store repeatedly.

Use a sentinel for negative entries.

NOT_FOUND = object()

def get_or_load_with_negative_cache(cache, key, loader):
    value = cache.get(key)

    if value is NOT_FOUND:
        return None

    if value is not None:
        return value

    loaded = loader(key)

    if loaded is None:
        cache.put(key, NOT_FOUND)
        return None

    cache.put(key, loaded)
    return loaded

Negative entries should usually have shorter TTLs than positive entries, because missing data may appear later.

Metrics

A cache needs metrics to be useful.

Track:

Metric Meaning
hit_count Number of successful cache reads
miss_count Number of failed cache reads
hit_rate hit_count divided by total reads
eviction_count Number of removed entries
load_count Number of calls to backing loader
load_latency Time spent loading values
current_size Number or weight of entries
stale_served_count Number of stale values returned

A cache with a low hit rate may be too small, keyed incorrectly, or caching the wrong workload.

Testing

Test basic insertion and lookup.

def test_lru_get_existing_value():
    cache = LRUCache(capacity=2)

    cache.put("A", 1)

    assert cache.get("A") == 1

Test eviction order.

def test_lru_evicts_least_recently_used():
    cache = LRUCache(capacity=2)

    cache.put("A", 1)
    cache.put("B", 2)

    assert cache.get("A") == 1

    cache.put("C", 3)

    assert cache.get("B") is None
    assert cache.get("A") == 1
    assert cache.get("C") == 3

Test update behavior.

def test_lru_update_refreshes_entry():
    cache = LRUCache(capacity=2)

    cache.put("A", 1)
    cache.put("B", 2)
    cache.put("A", 10)
    cache.put("C", 3)

    assert cache.get("A") == 10
    assert cache.get("B") is None

These tests protect the linked-list invariant and eviction behavior.

Common Bugs

The most common LRU bug is updating the hash table but forgetting to update the linked list. This causes stale nodes and incorrect eviction.

Another common bug is failing to remove evicted entries from the hash table. The cache then returns values that should no longer exist.

Sentinel handling can be wrong at boundaries. Use head and tail sentinels to avoid empty-list special cases.

Returning None for a miss is ambiguous when None is a valid value. Use a sentinel or an explicit result type.

TTL caches often return expired values because expiration is checked only during cleanup. Check expiration on read.

Concurrent caches often duplicate expensive loads. Use per-key locking or request coalescing.

Weighted caches can exceed memory budgets if updates do not subtract the old weight before adding the new one.

Recipe

Build the cache in layers.

Start with an LRU cache using a hash table and a doubly linked list. Use sentinels to simplify list manipulation. Make get refresh recency. Make put update existing entries and evict from the tail when full. Add TTL only after capacity eviction works. Add byte-based capacity when item sizes vary. Add loader integration, per-key locks, and negative caching when the cache fronts an expensive source. Track hit rate, miss rate, eviction count, and load latency.

The main lesson is that a cache eviction policy is a contract about what the system chooses to remember. LRU is a strong default because recent use often predicts future use, but the right policy depends on workload shape, memory budget, staleness tolerance, and concurrency behavior.