# 73. Inline Caches

Inline caches are small cache records stored near bytecode instructions. They let CPython remember facts discovered during previous executions of an instruction, then reuse those facts on later executions.

The goal is simple: avoid repeating expensive dynamic lookups when the runtime situation has not changed.

For example, this expression looks small:

```python
obj.name
```

But generic attribute lookup may involve:

```text
look at obj type
check descriptors
check instance dictionary
check class dictionary
check base classes
handle __getattribute__
handle __getattr__
raise AttributeError if missing
```

If the same bytecode sees the same object type many times, CPython can cache the lookup path.

```text
LOAD_ATTR
    cache: expected type
    cache: dictionary version
    cache: offset or descriptor data
```

The next execution can check the cache quickly. If the check succeeds, the interpreter takes a fast path. If it fails, CPython falls back to the generic slow path.

## 73.1 Why Inline Caches Exist

Python operations are dynamic.

This code:

```python
x.value
```

does not statically mean “load field at offset 8”.

It means:

```text
perform Python attribute access semantics
```

That semantic operation may involve many runtime decisions.

However, real programs often behave predictably:

```python
for user in users:
    total += user.score
```

Inside the loop, `user` is usually the same type each time. The attribute name is fixed. The class layout is usually stable. The lookup result is often the same kind of operation.

Inline caches exploit that regularity.

They preserve Python semantics while optimizing the common case.

## 73.2 Inline Cache Position

An inline cache sits beside the instruction it supports.

Conceptually:

```text
LOAD_ATTR score
CACHE
CACHE
CACHE
```

The cache entries are part of the bytecode stream layout, but they are not normal source-level operations. They are reserved storage used by the interpreter.

The instruction owns its cache records.

This differs from a global hash table cache:

```text
global cache:
    key = operation + type + name
    value = resolved lookup
```

Inline caches are local:

```text
bytecode offset 42:
    LOAD_ATTR score
    cache for this exact LOAD_ATTR
```

Locality matters. The interpreter can reach the cache directly from the instruction pointer.

## 73.3 Monomorphic Caches

The simplest useful inline cache is monomorphic.

It remembers one observed shape.

Example:

```python
def f(obj):
    return obj.x
```

If `f` is called repeatedly with instances of the same class, the cache can store:

```text
expected type: Point
attribute name: x
lookup data: instance dict offset or slot offset
version tag: current class dictionary version
```

Then execution becomes:

```text
if type(obj) == Point and cached versions still match:
    use cached fast path
else:
    use generic lookup
```

This is called monomorphic because the call site or attribute site has one dominant type.

Many Python programs have many monomorphic sites.
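The check-then-fast-path logic above can be sketched in pure Python. This is a toy model of a monomorphic cache, not CPython's actual implementation: the cache remembers one expected type, counts hits and misses, and always falls back to the generic lookup so semantics are preserved.

```python
class InlineCache:
    """Toy monomorphic cache: one expected type per site."""

    def __init__(self):
        self.expected_type = None   # the single type this site has seen
        self.hits = 0
        self.misses = 0

    def load_attr(self, obj, name):
        if type(obj) is self.expected_type:
            self.hits += 1          # fast path: the type check passed
        else:
            self.misses += 1        # slow path: re-specialize for the new type
            self.expected_type = type(obj)
        return getattr(obj, name)   # result always matches generic lookup


class Point:
    def __init__(self, x):
        self.x = x


cache = InlineCache()
for p in [Point(1), Point(2), Point(3)]:
    cache.load_attr(p, "x")

# The first execution misses and specializes; the rest hit.
assert (cache.misses, cache.hits) == (1, 2)
```

Note that the fallback path still calls `getattr`, so a miss is only slower, never wrong.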

## 73.4 Polymorphism and Cache Misses

A bytecode location may observe more than one type.

```python
def get_name(obj):
    return obj.name

get_name(user)
get_name(team)
get_name(project)
```

The same `LOAD_ATTR name` instruction sees `User`, `Team`, and `Project`.

A simple monomorphic cache can only remember one of them. When a different type appears, the cache misses.

A miss does not produce incorrect behavior. It only means CPython uses the generic path and may update or deoptimize the cache.

Possible outcomes:

```text
cache hit
    fast path

cache miss with useful new pattern
    adapt or respecialize

cache miss with unstable pattern
    stay generic or back off
```

Inline caches are speculative. They optimize what actually happens, not what the source code might do.

## 73.5 Attribute Lookup Caches

Attribute access is one of the most important cache targets.

The bytecode instruction:

```text
LOAD_ATTR
```

supports expressions such as:

```python
obj.x
```

Generic lookup must respect the descriptor protocol and class hierarchy.

A cache may remember:

```text
object type
type version
dictionary version
attribute offset
descriptor pointer
whether result comes from instance or class
```

A slot-based class can be especially efficient:

```python
class Point:
    __slots__ = ("x", "y")
```

For `p.x`, the cache can often reduce lookup to a checked offset load.

For normal instance dictionaries, the cache may remember dictionary layout information.

## 73.6 Method Lookup Caches

Method calls are another major target.

Consider:

```python
obj.method(arg)
```

This involves two logical steps:

```text
load method
call method
```

Naively, loading a method creates a bound method object:

```text
function + self
```

Creating bound method objects repeatedly is expensive.

CPython has fast paths for method calls that avoid temporary bound method allocation in common cases.

Conceptually (the exact opcodes differ between CPython versions; recent versions fold method loading into the attribute instruction):

```text
LOAD_METHOD method
PRECALL
CALL
```

The method lookup cache can remember the resolved method and the assumptions that make it valid.

Fast method calls matter because object-oriented Python code performs method dispatch constantly.
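The bound-method cost described above is observable from Python. Each attribute access that resolves to a method allocates a fresh bound method object wrapping the function and `self`; the pair it wraps is exactly what a method-call fast path conceptually caches instead of allocating:

```python
import types


class Greeter:
    def hello(self):
        return "hi"


g = Greeter()

# Each attribute access creates a fresh bound method object:
m1 = g.hello
m2 = g.hello
assert m1 == m2 and m1 is not m2        # equal, but distinct allocations
assert isinstance(m1, types.MethodType)

# The function and self can be recovered from the bound method.
# Calling through the class skips the temporary bound method entirely:
assert m1.__func__ is Greeter.hello
assert m1.__self__ is g
assert Greeter.hello(g) == g.hello()    # same semantics either way
```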

## 73.7 Global and Builtin Caches

Global name access is dynamic.

```python
len(xs)
```

The name `len` is resolved through namespaces:

```text
locals
globals
builtins
```

For a name that is not local, the lookup checks the globals dictionary and then the builtins dictionary.

A cache can remember:

```text
globals dictionary version
builtins dictionary version
resolved object
```

If neither dictionary has changed, the cached object remains valid.

This makes repeated global and builtin access faster.

Example:

```python
for item in xs:
    total += len(item)
```

The `len` lookup can usually be cached after the first few iterations.

## 73.8 Binary Operation Caches

Binary operations are dynamic.

```python
a + b
```

The operation depends on runtime types:

```text
int + int
str + str
list + list
custom __add__
custom __radd__
unsupported pair
```

The generic path must handle all of this.

An inline cache can specialize the site for common pairs:

```text
int + int
float + float
str + str
```

For integer addition, the fast path can avoid much of the generic numeric dispatch. It still must handle overflow and object allocation rules, but it can skip broad type-dispatch logic.
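A single binary-operation site must stay correct for every pair it sees, including user-defined operators. The example below shows one `a + b` site handling built-in fast-path candidates and a custom `__add__`; a site specialized for `int + int` must deoptimize the moment the custom type appears:

```python
class Meters:
    """A type with a custom __add__, forcing the generic path."""

    def __init__(self, n):
        self.n = n

    def __add__(self, other):
        return Meters(self.n + other.n)


def add(a, b):
    return a + b                # one bytecode site for every pair below


assert add(1, 2) == 3                      # int + int: fast-path candidate
assert add("a", "b") == "ab"               # str + str: fast-path candidate
assert add(Meters(1), Meters(2)).n == 3    # custom __add__: generic path
```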

## 73.9 Subscript Caches

Subscript access is also dynamic:

```python
obj[key]
```

Common cases include:

```text
list[index]
tuple[index]
dict[key]
str[index]
custom __getitem__
```

A cache can specialize:

```text
list with integer index
tuple with integer index
dict with exact key type
```

Example:

```python
for i in range(len(xs)):
    item = xs[i]
```

When `xs` is a list and `i` is an integer, the interpreter can use a specialized fast path.

## 73.10 Cache Validation

Every cache needs validation.

A cached result is valid only while its assumptions remain true.

For attribute lookup, assumptions may include:

```text
object has expected type
type dictionary has not changed
instance dictionary layout has not changed
descriptor has not changed
```

For global lookup:

```text
globals dictionary has not changed
builtins dictionary has not changed
```

For binary operations:

```text
left operand has expected exact type
right operand has expected exact type
```

Validation must be cheaper than the full operation. Otherwise the cache does not help.

A typical fast path is:

```text
check type pointer
check version tag
load cached value
```

## 73.11 Version Tags

Version tags are small change counters used to detect mutation.

For example, a dictionary can carry a version value that changes when the dictionary changes.

A cache can store:

```text
expected dictionary version = 12345
```

Later:

```text
if dict.version == 12345:
    cached lookup remains valid
else:
    cache miss
```

This avoids scanning the dictionary to prove that nothing changed.

Version tags turn invalidation into a cheap comparison.
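The comparison-based validation above can be modeled with a toy versioned dictionary. This sketch only hooks `__setitem__`; a real implementation would bump the version in every mutating method, and CPython tracks versions internally rather than in Python code:

```python
class VersionedDict(dict):
    """Toy dict whose version changes on every __setitem__."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1           # any change invalidates cached lookups


ns = VersionedDict(len=len)

# Cache the lookup result together with the version it was valid for:
cached_value, cached_version = ns["len"], ns.version

assert ns.version == cached_version     # cache hit: one integer compare
ns["len"] = lambda x: 42                # mutation bumps the version
assert ns.version != cached_version     # cache miss: must look up again
assert ns["len"]([1, 2, 3]) == 42
```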

## 73.12 Cache Invalidation

Some virtual machines use active invalidation: when a class changes, all dependent caches are found and cleared.

CPython primarily favors cheap validation at the use site.

That means the cache often remains physically present, but it stops matching once the relevant version tag changes.

Example:

```python
class C:
    x = 1

def f(obj):
    return obj.x

C.x = 2
```

After `C.x = 2`, the class dictionary version changes. The cached lookup for `obj.x` fails validation and falls back to generic lookup.

The bytecode site can then adapt again.

## 73.13 Cache Warmup

Inline caches need warmup.

At first execution, the interpreter does not know what types a site will see.

Execution starts generic:

```text
LOAD_ATTR
```

After enough executions, CPython gathers enough evidence to specialize:

```text
LOAD_ATTR_INSTANCE_VALUE
```

or another specialized form.

The exact opcode names and thresholds vary by Python version.

The pattern is stable:

```text
start generic
observe behavior
specialize
hit fast path
deoptimize if assumptions fail
```

## 73.14 Adaptive Instructions

Modern CPython uses adaptive instructions as part of specialization. (CPython 3.11 used separate `_ADAPTIVE` opcode variants; later versions keep the counter inside the base instruction's cache entries instead.)

An adaptive instruction counts executions and misses.

Conceptually:

```text
LOAD_ATTR_ADAPTIVE
    counter
    cache entries
```

When the counter reaches a threshold, CPython attempts specialization.

If specialization succeeds, the instruction changes into a more specific opcode.

If specialization fails repeatedly, CPython can delay future specialization attempts.

This prevents unstable code from wasting time on constant respecialization.
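The counter-threshold-backoff cycle can be sketched as a small state machine. The thresholds here are made up for illustration; CPython's real warmup and backoff values differ by version:

```python
SPECIALIZE_AFTER = 3     # hypothetical warmup threshold
BACKOFF_AFTER = 2        # hypothetical failed-attempt limit


class AdaptiveSite:
    """Toy adaptive site: warm up, try to specialize, back off on failure."""

    def __init__(self):
        self.counter = 0
        self.failures = 0
        self.state = "generic"

    def execute(self, can_specialize):
        self.counter += 1
        if self.state == "generic" and self.counter >= SPECIALIZE_AFTER:
            if can_specialize:
                self.state = "specialized"
            else:
                self.failures += 1
                if self.failures >= BACKOFF_AFTER:
                    self.state = "backed_off"   # stop trying for a while
                self.counter = 0                # restart warmup
        return self.state


site = AdaptiveSite()
for _ in range(3):
    site.execute(can_specialize=True)
assert site.state == "specialized"
```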

## 73.15 Deoptimization

Deoptimization means returning from a specialized form to a more generic form.

Example:

```text
LOAD_ATTR_INSTANCE_VALUE
    ↓
LOAD_ATTR_ADAPTIVE
    ↓
LOAD_ATTR
```

A specialized instruction may deoptimize when assumptions fail too often.

Reasons include:

```text
many unrelated object types
mutating class dictionaries
custom attribute hooks
changing globals
unusual descriptors
```

Deoptimization preserves correctness.

Optimization is optional. Semantics always come from the generic operation.

## 73.16 Cache Entries Are Hidden From Normal Disassembly

Inline caches occupy bytecode space, but normal disassembly hides them by default.

You can ask `dis` to show caches in recent Python versions:

```python
import dis

def f(obj):
    return obj.x

dis.dis(f, show_caches=True)
```

Conceptually, you may see:

```text
LOAD_FAST                0 (obj)
LOAD_ATTR                0 (x)
CACHE
CACHE
CACHE
CACHE
RETURN_VALUE
```

The exact number of cache entries depends on the instruction.

This makes `dis` useful for studying interpreter specialization.

## 73.17 Specialized Bytecode Is Runtime State

Specialization mutates executable bytecode state in memory.

The source code and logical code object remain the same from the language perspective, but the interpreter’s internal instruction stream may change as the program runs.

This has several consequences:

```text
bytecode execution can become faster after warmup
the same source can specialize differently in different runs
debug and tracing modes may inhibit specialization
different Python versions may show different opcodes
```

A performance investigation should account for warmup.

A single cold execution may measure generic bytecode more than specialized execution.
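Warmup can be observed directly. On Python 3.11 and later, `dis.dis` accepts `adaptive=True`, which shows the instruction stream as it currently exists in memory, specialized opcodes included (the exact names depend on the version):

```python
import dis
import sys


class P:
    def __init__(self):
        self.x = 1


def f(obj):
    return obj.x


# Warm the attribute site up so it can specialize:
for _ in range(1000):
    f(P())

# Disassemble the in-memory, possibly specialized bytecode:
if sys.version_info >= (3, 11):
    dis.dis(f, adaptive=True)
```

Comparing this output before and after the warmup loop is a simple way to watch specialization happen.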

## 73.18 Inline Caches and Correctness

Inline caches must preserve full Python semantics.

They cannot assume that Python behaves like a static language.

For example, this must still work:

```python
class C:
    x = 1

obj = C()

def f():
    return obj.x

print(f())
C.x = 2
print(f())
```

The second call must see the updated value.

Therefore the cache must notice the class dictionary change.

Similarly, monkey patching builtins must remain visible:

```python
import builtins

old_len = builtins.len
builtins.len = lambda x: 42

try:
    print(len([1, 2, 3]))
finally:
    builtins.len = old_len
```

A cache for `len` must not ignore the mutated builtins dictionary.

## 73.19 Inline Caches and the C API

The C API complicates caching.

Native extensions can mutate objects, dictionaries, types, and descriptors through C-level operations.

CPython’s caches must remain valid under those mutations.

Version tags and runtime checks provide the bridge:

```text
C extension mutates dictionary
    ↓
dictionary version changes
    ↓
cached lookup fails validation
    ↓
generic path runs
```

This is one reason CPython cannot freely use aggressive assumptions without careful invalidation rules.

## 73.20 Inline Caches and Type Stability

Inline caches reward type-stable code.

Type-stable code repeatedly presents the same types at the same bytecode sites.

Example:

```python
def area(rectangles):
    total = 0
    for r in rectangles:
        total += r.width * r.height
    return total
```

If every `r` is a `Rectangle`, the attribute sites specialize well.

Less stable code:

```python
def read(obj):
    return obj.value
```

called with many unrelated object types may remain generic or miss often.

This does not make dynamic code wrong. It just changes the optimization profile.

## 73.21 Inline Caches and Object Layout

Object layout affects cache quality.

Instance dictionaries are flexible but require dictionary machinery.

Slots provide fixed storage:

```python
class Point:
    __slots__ = ("x", "y")
```

A slot access can often be cached as:

```text
expected type
slot offset
```

This is closer to field access in static languages.

However, `__slots__` changes object semantics and should be used for concrete reasons such as memory use, layout stability, or very frequent attribute access.

## 73.22 Inline Caches and Descriptors

Descriptors make attribute access powerful.

Examples include:

```text
functions
property objects
staticmethod
classmethod
custom descriptors
```

A descriptor can define:

```text
__get__
__set__
__delete__
```

This affects whether an attribute is loaded from the instance, class, or descriptor result.

Inline caches must encode descriptor-sensitive paths.

For a normal method:

```python
obj.method()
```

the cache may optimize method lookup.

For a property:

```python
obj.value
```

the property getter must still execute.

The cache cannot replace a descriptor call with a raw value unless semantics allow it.
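A property with a side effect makes this concrete: the cache may remember *which* getter to call, but it must not cache the getter's result, because the result can change on every access:

```python
class Sensor:
    def __init__(self):
        self._reads = 0

    @property
    def value(self):
        self._reads += 1        # side effect: every access is observable
        return self._reads


s = Sensor()
assert s.value == 1
assert s.value == 2             # caching the first value would be wrong
```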

## 73.23 Inline Caches and Globals

Global caching depends on namespace stability.

Example:

```python
def f(xs):
    return len(xs)
```

The function’s global dictionary and the builtins dictionary define name resolution.

A cache for `len` remains valid while both dictionaries keep the same relevant version state.

If the module assigns a new global:

```python
len = lambda x: 0
```

the global dictionary changes. The lookup must be redone.

This keeps Python’s dynamic namespace behavior intact.

## 73.24 Inline Caches and Imports

Imports often create global names:

```python
import math

def f(x):
    return math.sqrt(x)
```

This involves two cacheable operations:

```text
LOAD_GLOBAL math
LOAD_ATTR sqrt
```

After warmup:

```text
math lookup can be cached
sqrt attribute lookup can be cached
```

This is why moving imports outside hot loops helps, but repeated module attribute access can still become relatively efficient after specialization.
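A related, hedged micro-optimization is hoisting the attribute lookup into a local alias. This removes the repeated `LOAD_ATTR` from the loop entirely, though on modern CPython the cached lookup is already cheap enough that the difference is often small:

```python
import math


def norm(xs):
    sqrt = math.sqrt            # one attribute lookup instead of one per item
    return [sqrt(x) for x in xs]


assert norm([4.0, 9.0]) == [2.0, 3.0]
```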

## 73.25 Inline Caches and Loops

Loops amplify cache benefits.

A small loop body may execute millions of times:

```python
for p in points:
    total += p.x
```

The `LOAD_ATTR x` instruction may execute once per iteration.

A cache miss on the first few iterations matters little. Cache hits on the remaining iterations matter a lot.

This is the main performance argument for inline caches:

```text
pay small warmup cost
reduce repeated dynamic overhead
```

## 73.26 Inline Caches vs Memoization

Inline caching differs from user-level memoization.

Memoization caches function results:

```text
same input
same output
```

Inline caching caches operation resolution:

```text
same runtime shape
same fast path
```

Example:

```python
obj.x
```

The cache does not necessarily store the final value of `obj.x`. It may store how to find it quickly.

For mutable objects, storing the final value would often be wrong.

## 73.27 Inline Caches vs CPU Caches

Inline caches are interpreter-level data structures.

CPU caches are hardware-level memory caches.

They are unrelated mechanisms, but they interact.

Inline caches improve interpreter logic by avoiding expensive lookups.

CPU caches improve memory access by keeping recently used memory close to the processor.

A good inline cache design also considers hardware locality:

```text
cache entries near bytecode
small fixed-size records
few pointer chases
cheap validation
```

## 73.28 Performance Shape

Inline caches improve common dynamic operations:

```text
attribute access
method calls
global lookup
binary operations
subscript access
unpacking
calls
```

They help most when code has:

```text
hot loops
stable types
stable globals
stable class dictionaries
repeated operations at the same bytecode sites
```

They help less when code has:

```text
frequent monkey patching
many unrelated types at one site
custom dynamic lookup hooks
heavy tracing
mostly cold execution
```

## 73.29 Reading Inline Cache Code

When reading CPython source, look for:

```text
adaptive opcode families
specialization counters
cache structures
version checks
deoptimization paths
miss handlers
generic fallback calls
```

Relevant areas include:

```text
Python/bytecodes.c
Python/generated_cases.c.h
Python/specialize.c
Include/internal/pycore_code.h
Lib/dis.py
```

The exact file organization can change between CPython versions, but the concepts remain recognizable.

## 73.30 Mental Model

A useful model:

```text
An inline cache turns repeated dynamic lookup into checked direct access.
```

The check preserves correctness.

The direct access improves performance.

The fallback preserves Python semantics.

```text
generic operation
    ↓
observe
    ↓
specialize
    ↓
validate cache
    ↓
fast path
    ↓
fallback if invalid
```

## 73.31 Chapter Summary

Inline caches are local runtime caches attached to bytecode instructions.

They store facts such as:

```text
observed types
dictionary versions
attribute offsets
resolved descriptors
global lookup results
operation-specific fast paths
```

They are central to modern CPython performance because they reduce repeated dynamic dispatch overhead without changing Python semantics.

Inline caches work best when a bytecode site sees stable runtime behavior. When assumptions fail, CPython falls back to generic execution and may respecialize or deoptimize later.
