# 75. Function Call Fast Paths

Function calls are one of the most important performance paths in CPython.

Python programs call functions constantly:

```python
len(xs)
str(x)
range(n)
obj.method()
f(a, b)
callback(event)
decorator(fn)
```

A call looks simple at the source level, but CPython must support many callable forms:

```text
Python functions
bound methods
builtin functions
builtin methods
C extension callables
classes
objects with __call__
classmethod objects
staticmethod objects
partial objects
decorated wrappers
```

A generic call path must also handle:

```text
positional arguments
keyword arguments
default values
keyword-only arguments
positional-only arguments
*args
**kwargs
bound self
descriptors
argument errors
trace hooks
profiling hooks
```

Function call fast paths exist to make common calls avoid the full generic machinery.

## 75.1 Why Calls Are Expensive

A Python function call does much more than jump to a code address.

For this call:

```python
result = f(x, y)
```

CPython may need to:

```text
load f
load x
load y
determine whether f is callable
prepare arguments
bind arguments to parameters
create or initialize a frame
set locals
handle defaults
handle keyword arguments
enter the evaluation loop
execute bytecode
return a result
deallocate or recycle frame state
```

For a method call:

```python
obj.f(x)
```

there is additional work:

```text
look up attribute f
apply descriptor protocol
bind obj as self
avoid or create bound method object
prepare final argument list
call target
```

Fast paths reduce this overhead for common call shapes.

## 75.2 Calls in the Bytecode Stream

A call is represented by several bytecode instructions.

A simple call:

```python
f(x)
```

conceptually becomes the following in CPython 3.11 (3.12 removed `PRECALL` and folded its work into `CALL`):

```text
LOAD_GLOBAL f
LOAD_FAST x
PRECALL
CALL
```

A method call:

```python
obj.f(x)
```

conceptually becomes the following in CPython 3.11 (3.12 merged `LOAD_METHOD` into `LOAD_ATTR` and removed `PRECALL`):

```text
LOAD_FAST obj
LOAD_METHOD f
LOAD_FAST x
PRECALL
CALL
```

The exact bytecode changes by Python version, but the model is stable:

```text
load callable
load arguments
prepare call
execute call
```

This split allows CPython to specialize different parts of the call sequence.
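The instruction sequence for a particular interpreter can be checked with the standard `dis` module. The following is a minimal sketch; the helper names (`call_plain`, `call_method`, `opnames`) are illustrative only, and the opcode names printed depend on the CPython version in use.

```python
import dis

def call_plain(f, x):
    return f(x)

def call_method(obj, x):
    return obj.f(x)

def opnames(func):
    """Return the opcode names compiled for func, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]

plain_ops = opnames(call_plain)
method_ops = opnames(call_method)

# The names differ between releases (for example CALL_FUNCTION,
# PRECALL followed by CALL, or CALL alone), so no fixed output is shown.
print(plain_ops)
print(method_ops)
```

Comparing the two lists across Python versions is a quick way to see which parts of the sequence have been merged or renamed.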

## 75.3 The Generic Call Path

The generic call path must support every callable.

Conceptually:

```c
PyObject *result = PyObject_Call(callable, args_tuple, kwargs_dict);
```

This representation is flexible, but expensive.

It often requires:

```text
allocating a tuple for positional arguments
allocating a dict for keyword arguments
normalizing argument layout
checking callable protocol
dispatching through tp_call
handling errors
```

A fully generic call is necessary for correctness, but it is too expensive for hot loops and small functions.

## 75.4 Vectorcall

Vectorcall is CPython’s main fast calling convention for many callables.

Instead of packaging arguments into a tuple and dictionary, CPython passes arguments as a C array of `PyObject *`.

Conceptually:

```c
result = vectorcallfunc(
    callable,
    args_array,
    nargsf,
    kwnames
);
```

The important idea:

```text
arguments are already on the stack
call target reads them directly
no temporary args tuple required
no temporary kwargs dict required in common cases
```

For a call like:

```python
f(a, b, c)
```

the stack already contains:

```text
f
a
b
c
```

Vectorcall lets the callee consume that layout directly.

## 75.5 Argument Stack Layout

Before the call executes, CPython has already pushed the callable and arguments onto the frame stack.

Conceptually:

```text
stack before CALL:

callable
arg0
arg1
arg2
```

The call instruction knows the argument count.

Instead of constructing:

```python
(args_tuple, kwargs_dict)
```

the interpreter can pass a pointer into the stack.

This matters because small calls dominate Python workloads.

Avoiding temporary tuple and dict allocation removes significant overhead.

## 75.6 Positional Calls

The fastest common case is a call that passes only positional arguments to a callable of known shape:

```python
f(a, b)
```

No keyword matching is needed.

The call path can:

```text
verify callable supports vectorcall
pass argument pointer and count
create frame if Python function
run directly if C builtin
return result
```

This path avoids:

```text
keyword dictionary construction
argument tuple construction
complex binding logic
```

Many inner-loop calls fit this shape.

## 75.7 Keyword Calls

Keyword calls are more complex:

```python
f(x=1, y=2)
```

CPython must preserve keyword names and match them to parameters.

Vectorcall still avoids a full kwargs dictionary in many cases by passing keyword names separately.

Conceptually:

```text
args array:
    1
    2

kwnames:
    ("x", "y")
```

The callee can bind keyword values using the compact `kwnames` tuple.

This avoids building a full dictionary unless the function actually needs one, such as for `**kwargs`.
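The compact keyword-name tuple is visible from Python: the compiler stores it as a constant on the calling function's code object. This is a small sketch; `caller` is a hypothetical function written for the example, and the instruction that consumes the tuple varies by CPython version.

```python
def caller(f):
    return f(x=1, y=2)

# Keyword names are compiled into a constant tuple on the caller's code object.
kw_tuples = [c for c in caller.__code__.co_consts if isinstance(c, tuple)]
print(kw_tuples)
```

The values `1` and `2` are pushed as ordinary arguments; only the names travel in the tuple.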

## 75.8 Python Function Calls

Calling a Python function usually means creating or initializing a new frame.

For:

```python
def add(a, b):
    return a + b

add(1, 2)
```

CPython must:

```text
load function object
read code object
map arguments to local slots
initialize frame
execute bytecode
return value
```

A Python function object contains:

```text
code object
globals dictionary
defaults
keyword defaults
closure cells
annotations
qualname
module
```

Fast paths optimize argument binding and frame setup.
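The fields listed above are exposed as attributes on any Python function. The sketch below uses a hypothetical closure (`make_scaler` and `scale` are names invented for the example) so that every field has something to show.

```python
def make_scaler(factor):
    def scale(value: float, offset=0.0, *, clamp=None) -> float:
        result = value * factor + offset
        return result if clamp is None else min(result, clamp)
    return scale

scale = make_scaler(2.0)

print(scale.__code__.co_name)               # code object
print(type(scale.__globals__))              # globals dictionary
print(scale.__defaults__)                   # positional defaults
print(scale.__kwdefaults__)                 # keyword-only defaults
print(scale.__closure__[0].cell_contents)   # closure cell holding factor
print(sorted(scale.__annotations__))        # annotated names
print(scale.__qualname__)
print(scale.__module__)
```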

## 75.9 Fast Locals and Argument Binding

Python function parameters are stored in fast local slots.

For:

```python
def f(a, b, c):
    return a + b + c
```

the frame can store:

```text
localsplus[0] = a
localsplus[1] = b
localsplus[2] = c
```

A positional call can copy stack arguments directly into these slots.

This is much faster than creating a dictionary for locals.

The dictionary returned by `locals()` is a snapshot built from these slots; it is not the storage the interpreter uses while executing an optimized function.
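The slot layout can be inspected through the code object, and a short experiment shows that writing to the `locals()` dictionary does not change a fast local. This is a sketch; `snapshot_demo` is a hypothetical helper for the example.

```python
def f(a, b, c):
    total = a + b + c
    return total

code = f.__code__
print(code.co_argcount)   # number of positional parameters
print(code.co_varnames)   # parameters first, then other local names
print(code.co_nlocals)    # total number of local slots

def snapshot_demo():
    value = 1
    snapshot = locals()
    snapshot["value"] = 99   # edits the dictionary, not the fast local slot
    return value

print(snapshot_demo())
```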

## 75.10 Defaults

Default arguments complicate binding:

```python
def f(a, b=10):
    return a + b
```

A call:

```python
f(1)
```

must fill `b` from the function’s defaults.

CPython stores defaults on the function object.

Fast binding logic can fill missing trailing positional parameters from the defaults tuple.

Conceptually:

```text
provided:
    a = 1

defaults:
    b = 10

locals:
    a = 1
    b = 10
```

This remains cheaper than the fully generic argument binding path.
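The trailing-defaults rule can be modelled in a few lines of Python. This is an illustrative sketch, not CPython's binding code: `bind_positional` is a hypothetical helper that handles positional arguments only and ignores keywords, `*args`, and keyword-only parameters.

```python
def bind_positional(func, args):
    """Map positional args to parameter names, filling gaps from defaults."""
    code = func.__code__
    names = code.co_varnames[:code.co_argcount]
    defaults = func.__defaults__ or ()
    missing = len(names) - len(args)
    if missing < 0:
        raise TypeError("too many positional arguments")
    if missing > len(defaults):
        raise TypeError("missing required positional arguments")
    # Defaults always belong to the last parameters, so take the tail.
    filled = tuple(args) + defaults[len(defaults) - missing:]
    return dict(zip(names, filled))

def f(a, b=10):
    return a + b

print(bind_positional(f, (1,)))
print(bind_positional(f, (1, 2)))
```

Because defaults align with the end of the parameter list, the fill step is a single slice rather than a per-name search.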

## 75.11 Keyword-Only Arguments

Keyword-only arguments require separate handling:

```python
def f(a, *, limit=10):
    return a + limit
```

A call:

```python
f(3, limit=5)
```

must bind `limit` by name.

The interpreter must check:

```text
required keyword-only parameters
keyword-only defaults
unexpected keywords
duplicate bindings
```

Fast paths can still help when keyword names are known and the shape is stable.
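A short sketch of how keyword-only parameters appear at runtime, reusing the function above. The variable `positional_error` is introduced only for the example.

```python
def f(a, *, limit=10):
    return a + limit

print(f.__kwdefaults__)                # keyword-only defaults live in a dict
print(f.__code__.co_kwonlyargcount)    # number of keyword-only parameters
print(f(3, limit=5))
print(f(3))

positional_error = None
try:
    f(3, 5)                            # limit cannot be passed positionally
except TypeError as exc:
    positional_error = str(exc)
print(positional_error)
```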

## 75.12 Positional-Only Arguments

CPython also supports positional-only arguments:

```python
def f(a, b, /, c):
    return a + b + c
```

Many builtins use positional-only parameters.

This simplifies binding because those arguments cannot be passed by keyword.

A positional-only fast path can avoid keyword name checks for those parameters.
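The following sketch shows the positional-only marker in action (Python 3.8 or later). The variable `keyword_error` is introduced only for the example.

```python
def f(a, b, /, c):
    return a + b + c

print(f.__code__.co_posonlyargcount)   # a and b are positional-only
print(f(1, 2, 3))
print(f(1, 2, c=3))                    # c may still be passed by keyword

keyword_error = None
try:
    f(a=1, b=2, c=3)                   # a and b cannot be passed by keyword
except TypeError as exc:
    keyword_error = str(exc)
print(keyword_error)
```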

## 75.13 Builtin Function Calls

Builtin functions are often implemented in C.

Example:

```python
len(xs)
```

At runtime, this can call a C function directly through a fast calling convention.

A builtin fast path can avoid creating a Python frame.

Execution shape:

```text
LOAD_GLOBAL len
LOAD_FAST xs
PRECALL
CALL
    call C function
    return PyObject *
```

This is much cheaper than calling a Python function, although it still works with Python objects and reference counts.

## 75.14 Method Calls

Method calls are especially important.

Source:

```python
obj.method(x)
```

Naive execution would do this:

```text
look up obj.method
create bound method object
call bound method with x
destroy bound method
```

A bound method object packages:

```text
function
self
```

CPython avoids creating this temporary object for common method calls.

The bytecode sequence uses method-aware instructions.

Conceptually:

```text
LOAD_FAST obj
LOAD_METHOD method
LOAD_FAST x
CALL
```

The interpreter can keep the function and `self` separately on the stack.

## 75.15 Avoiding Bound Method Allocation

For a normal instance method:

```python
class C:
    def f(self, x):
        return x + 1

obj = C()
obj.f(10)
```

The optimized path can behave like:

```text
find function C.f
push function
push self
push argument 10
call function with self inserted
```

No bound method object needs to be allocated.

This saves:

```text
object allocation
reference count operations
temporary object lifetime management
extra indirection
```

Method-call optimization is one of CPython’s most important object-oriented fast paths.
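The equivalence the optimized path relies on can be checked from Python: calling the plain function with `self` passed explicitly gives the same result as the method call. This is a sketch of the semantics, not of the interpreter's internal steps; the variable names are illustrative.

```python
class C:
    def f(self, x):
        return x + 1

obj = C()

func = C.__dict__["f"]         # the plain function stored on the class
via_method = obj.f(10)         # ordinary method call
via_function = func(obj, 10)   # same call with self passed explicitly

bound = obj.f                  # attribute access alone builds a bound method
print(via_method, via_function)
print(bound.__func__ is func, bound.__self__ is obj)
```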

## 75.16 Descriptor-Aware Method Lookup

Method lookup must respect the descriptor protocol.

A function stored on a class is a descriptor. Accessing it through an instance normally produces a bound method.

But not every attribute access is a simple method:

```text
property
staticmethod
classmethod
custom descriptor
plain callable object
data descriptor
non-data descriptor
```

The fast path must only skip bound method creation when semantics allow it.

If the descriptor behavior is unusual, CPython falls back to the generic path.
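A small sketch of why the check is needed: attributes that look alike at the call site resolve very differently through the descriptor protocol. The class and attribute names are hypothetical.

```python
class C:
    def method(self):
        return "method"

    @staticmethod
    def static():
        return "static"

    @classmethod
    def cls_method(cls):
        return cls.__name__

    @property
    def prop(self):
        return "prop"

obj = C()

raw = C.__dict__["method"]
bound = raw.__get__(obj, C)          # what instance attribute access does

print(bound.__self__ is obj)         # plain function: binds the instance
print(obj.static is C.__dict__["static"].__func__)  # staticmethod: no binding
print(obj.cls_method.__self__ is C)  # classmethod: binds the class
print(obj.prop)                      # property: runs the getter immediately
```

Only the first case matches the "function plus `self`" shape that lets the interpreter skip the bound method object.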

## 75.17 CALL Specialization

Modern CPython specializes call instructions.

Common specialized cases include:

```text
Python function with positional arguments
builtin C function with vectorcall
method descriptor
bound method
class construction
```

Specialization lets the `CALL` instruction skip general dispatch once the call site becomes stable.

A hot call site may repeatedly see the same callable type and same argument shape.

That pattern is exactly what adaptive specialization exploits.
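On CPython 3.11 and later, `dis` can show the specialized instructions after a call site has warmed up. The sketch below is hedged accordingly: `hot` and `add` are hypothetical functions, the `adaptive` argument exists only from 3.11, and whether a specialized name appears depends on the interpreter build and version.

```python
import dis
import sys

def add(a, b):
    return a + b

def hot():
    total = 0
    for i in range(2000):
        total = add(total, i)   # stable call site: same callable, same shape
    return total

total = hot()   # warm up so the interpreter has a chance to specialize

if sys.version_info >= (3, 11):
    instructions = dis.get_instructions(hot, adaptive=True)
else:
    instructions = dis.get_instructions(hot)

call_ops = [ins.opname for ins in instructions if ins.opname.startswith("CALL")]
print(call_ops)   # may show a specialized variant instead of the generic name
```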

## 75.18 PRECALL

`PRECALL` was a CPython 3.11 instruction that prepared and specialized calls. CPython 3.12 removed it and folded its work into `CALL`.

In 3.11, it gave the interpreter a separate point to optimize before the final call happened.

Conceptually:

```text
PRECALL:
    inspect callable shape
    specialize call path
    prepare cache state

CALL:
    execute call
```

Separating preparation from execution gave CPython 3.11 more room to optimize call sequences. Later releases get the same effect by specializing `CALL` directly.

## 75.19 Vectorcall and Classes

Classes are callable.

```python
obj = C(1, 2)
```

This involves object construction:

```text
call metaclass __call__
allocate instance through __new__
initialize instance through __init__
return instance
```

Fast paths can help for common class construction patterns, but class calls are still more complex than ordinary function calls.

Instance construction involves descriptors, metaclasses, allocation, and initialization.
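The order of these steps can be observed by logging each one. This is a minimal sketch; `Meta`, `C`, and `log` are names made up for the example.

```python
log = []

class Meta(type):
    def __call__(cls, *args, **kwargs):
        log.append("Meta.__call__")
        return super().__call__(*args, **kwargs)   # runs __new__ then __init__

class C(metaclass=Meta):
    def __new__(cls, *args):
        log.append("__new__")
        return super().__new__(cls)

    def __init__(self, a, b):
        log.append("__init__")
        self.a, self.b = a, b

obj = C(1, 2)
print(log)
```

Any of these three hooks can be overridden, which is why a fast path for class calls has to verify that none of them has unusual behavior.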

## 75.20 `__call__` Objects

Any object can be callable if its type defines `__call__`.

Example:

```python
class Adder:
    def __call__(self, x):
        return x + 1

add_one = Adder()
add_one(10)
```

The generic callable protocol handles this.

Fast paths may help if the call target is stable, but the interpreter must still look up `__call__` on the object's type and honor descriptor semantics when binding it.
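One detail worth seeing in code: call syntax looks up `__call__` on the type, not on the instance. The sketch below reuses the `Adder` class; the instance attribute is added only to demonstrate the difference.

```python
class Adder:
    def __call__(self, x):
        return x + 1

add_one = Adder()
add_one.__call__ = lambda x: -1    # instance attribute; call syntax ignores it

implicit = add_one(10)             # uses type(add_one).__call__
explicit = add_one.__call__(10)    # ordinary attribute lookup finds the lambda

print(implicit, explicit)          # 11 -1
```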

## 75.21 Star Arguments

Star arguments force more general handling.

```python
f(*args)
```

CPython must unpack `args` into positional arguments.

```python
f(**kwargs)
```

must unpack a mapping into keyword arguments.

```python
f(*args, **kwargs)
```

may require merging and duplicate checking.

These forms often reduce fast-path opportunities because the final argument shape is known only at runtime.
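The sketch below shows a forwarding wrapper that uses both star forms, plus the duplicate-keyword check that can only happen at runtime. `forward`, `star_ops`, and `duplicate_error` are hypothetical names, and the opcode used for star calls is version-dependent.

```python
import dis

def f(a, b, c=0):
    return (a, b, c)

def forward(*args, **kwargs):
    return f(*args, **kwargs)      # argument shape is unknown until runtime

star_ops = [ins.opname for ins in dis.get_instructions(forward)]
print(star_ops)

result = forward(1, 2, c=3)
print(result)

duplicate_error = None
try:
    f(1, 2, c=3, **{"c": 4})       # same keyword supplied twice
except TypeError as exc:
    duplicate_error = str(exc)
print(duplicate_error)
```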

## 75.22 Argument Error Reporting

Fast paths still need precise error behavior.

Example:

```python
def f(a, b):
    pass

f(1)
```

must raise an error like:

```text
missing required positional argument
```

Similarly:

```python
f(1, 2, 3)
```

must report too many arguments.

Keyword errors must also be exact:

```text
unexpected keyword argument
multiple values for argument
missing keyword-only argument
```

The fast path cannot produce vague or incorrect errors.
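The messages can be collected directly. This is a small sketch; `error_message` and `messages` are helper names invented for the example, and the exact wording may vary slightly between Python versions.

```python
def f(a, b):
    pass

def error_message(call):
    """Run call() and return the TypeError text, or None if it succeeds."""
    try:
        call()
    except TypeError as exc:
        return str(exc)
    return None

messages = {
    "missing": error_message(lambda: f(1)),
    "too_many": error_message(lambda: f(1, 2, 3)),
    "unexpected": error_message(lambda: f(1, 2, c=3)),
    "multiple": error_message(lambda: f(1, a=2)),
}

for kind, text in messages.items():
    print(kind, "->", text)
```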

## 75.23 Recursion Checks

Each Python call can increase recursion depth.

CPython must check recursion limits to prevent uncontrolled C stack growth.

Example:

```python
def f():
    return f()

f()
```

This eventually raises:

```text
RecursionError
```

Fast call paths must preserve recursion checks.

The call may be optimized, but entering Python execution still needs recursion accounting.
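The recursion accounting can be observed by counting how deep a function gets before the limit triggers. This is a sketch; `measure_depth` is a hypothetical helper, and the depth reached depends on how many frames are already active when it runs.

```python
import sys

def measure_depth():
    depth = 0

    def recurse():
        nonlocal depth
        depth += 1
        recurse()

    try:
        recurse()
    except RecursionError:
        pass               # the limit was enforced despite any call fast path
    return depth

reached = measure_depth()
limit = sys.getrecursionlimit()
print(reached, limit)
```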

## 75.24 Tracing and Profiling

Tracing and profiling hooks complicate calls.

Tools may need events for:

```text
call
return
exception
line
opcode
```

When tracing or profiling is active, CPython may take slower paths so it can produce correct events.

Fast call paths are most effective when tracing is disabled.
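A minimal sketch of a profiling hook that records call and return events using `sys.setprofile`. The names `profiler`, `target`, and `events` are illustrative.

```python
import sys

events = []

def profiler(frame, event, arg):
    # Keep only Python-level call and return events for readability.
    if event in ("call", "return"):
        events.append((event, frame.f_code.co_name))

def target():
    return 42

sys.setprofile(profiler)
try:
    target()
finally:
    sys.setprofile(None)   # always remove the hook

print(events)
```

While the hook is installed, every Python call has to report these events, which is the extra work the paragraph above refers to.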

## 75.25 Reference Ownership in Calls

Calls are dense with reference count operations.

The interpreter must manage:

```text
callable reference
argument references
default references
temporary references
return value reference
exception state references
frame references
```

A correct fast path must be as careful as the generic path.

A small mistake can create:

```text
memory leak
double free
use after free
incorrect object lifetime
```

Call optimization is therefore performance-sensitive and correctness-sensitive.

## 75.26 Return Value Handling

A call returns either:

```text
new reference to result
exception indicator
```

The caller must:

```text
push result on stack
or propagate exception
```

For Python functions, returning from the callee frame produces the result.

For C functions, the C API convention is usually:

```text
return PyObject * on success
return NULL with exception set on failure
```

The call instruction must handle both conventions correctly.

## 75.27 Calls and Exceptions

Any call can raise.

```python
f()
```

may raise because:

```text
f explicitly raises
argument binding fails
descriptor lookup fails
__call__ fails
allocation fails
recursion limit is exceeded
C extension reports error
```

Fast paths must always include a clean exception exit path.

That path must unwind stack state and maintain frame invariants.

## 75.28 Calls and Inline Caches

Call specialization uses inline caches.

A call site may cache:

```text
callable kind
call target
argument count
keyword shape
function version
type version
descriptor result
```

The cache allows the interpreter to say:

```text
this site is still calling the same kind of thing
use the fast path
```

If the callable changes, validation fails and CPython falls back.

## 75.29 Calls and Global Lookup

Many calls start with a global lookup:

```python
len(xs)
print(x)
range(n)
isinstance(x, T)
```

Optimization often involves two stages:

```text
LOAD_GLOBAL specialization
CALL specialization
```

For `len(xs)`:

```text
cache builtin len
recognize builtin call shape
call C implementation efficiently
```

The combined effect is larger than either optimization alone.

## 75.30 Calls and Attribute Lookup

Method calls combine attribute lookup and call execution.

```python
obj.append(x)
```

The optimized path may include (instruction names as of CPython 3.11):

```text
specialized LOAD_METHOD
specialized PRECALL
specialized CALL
```

For common built-in types such as list, dict, and str, this can be highly optimized.

Example:

```python
items.append(x)
```

is a very common operation. CPython invests heavily in making this shape efficient.

## 75.31 Calls and Frame Allocation

Frame allocation used to be a significant cost.

Modern CPython reduces this cost through internal frame representations and frame object laziness.

A Python call needs execution state, but it does not always need a full heap-allocated Python frame object visible to user code.

A full frame object may be materialized only when needed, such as for:

```text
tracebacks
inspect.currentframe()
debuggers
profilers
generators and coroutines
```

This reduces ordinary call overhead.

## 75.32 Inlining

CPython generally does not inline Python functions in the classic compiler sense.

For example:

```python
def add(a, b):
    return a + b

x = add(1, 2)
```

CPython does not normally replace the call with:

```python
x = 1 + 2
```

Function inlining is difficult in Python because:

```text
functions can be rebound
globals can change
defaults can change
closures exist
tracing expects frames
introspection expects call structure
exceptions need correct tracebacks
```

Instead, CPython focuses on reducing call overhead rather than eliminating calls entirely.
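One of the obstacles listed above is easy to demonstrate: a function's behavior can change after its callers have been compiled. The sketch swaps a function's code object at runtime; `add`, `sub`, and `compute` are hypothetical names.

```python
def add(a, b=0):
    return a + b

def sub(a, b=0):
    return a - b

def compute():
    return add(5, 2)

before = compute()            # 7: add behaves as written

add.__code__ = sub.__code__   # same function object, different body
after = compute()             # 3: an inlined "5 + 2" would now be wrong

print(before, after)
```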

## 75.33 Tail Calls

CPython does not perform general tail call optimization.

Example:

```python
def fact(n, acc=1):
    if n == 0:
        return acc
    return fact(n - 1, acc * n)
```

This still consumes one Python call frame per recursive call.

Reasons include:

```text
debuggability
traceback preservation
introspection behavior
semantic expectations
implementation simplicity
```

Function call fast paths reduce overhead, but they do not turn recursive Python into loops.
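When recursion depth is a concern, the conversion to a loop has to be done by hand. The sketch below shows a manual rewrite of the tail-recursive function; `fact_loop` is a name chosen for the example.

```python
def fact_recursive(n, acc=1):
    if n == 0:
        return acc
    return fact_recursive(n - 1, acc * n)   # one frame per step

def fact_loop(n, acc=1):
    # Same accumulator logic, with the tail call replaced by iteration.
    while n > 0:
        acc *= n
        n -= 1
    return acc

print(fact_recursive(10), fact_loop(10))

# Deep inputs work in the loop version because no frames accumulate.
big = fact_loop(5000)
print(big.bit_length())
```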

## 75.34 Calls to C Extensions

C extension calls can be fast when they use modern calling conventions.

Older extension functions may use:

```c
METH_VARARGS
```

which receives a tuple of arguments.

Newer forms can use:

```c
METH_FASTCALL
METH_FASTCALL | METH_KEYWORDS
```

These integrate better with vectorcall-style argument passing.

Extension authors can significantly affect call overhead by choosing the right calling convention.

## 75.35 Bound Methods

A bound method object stores:

```text
underlying function
bound self object
```

Example:

```python
m = obj.method
m(1)
```

Here the bound method object is observable and must exist.

But for immediate calls:

```python
obj.method(1)
```

CPython can often avoid allocating it.

This distinction is important:

```python
obj.method
```

requires an object result.

```python
obj.method()
```

may use a call-specific optimization.

## 75.36 Constructors

Calling a class:

```python
C(x)
```

normally performs:

```text
type.__call__
C.__new__
C.__init__
```

Fast paths are limited by Python’s object construction semantics.

Custom metaclasses, custom `__new__`, and custom `__init__` can all affect behavior.

Still, common built-in constructors such as:

```python
list(x)
dict(x)
tuple(x)
int(x)
str(x)
```

often use efficient C-level call paths.

## 75.37 Fast Paths Are Conservative

CPython call fast paths are conservative.

They optimize when assumptions are cheap to validate and safe.

They fall back when:

```text
call target changes
argument shape changes
tracing is active
descriptor behavior is unusual
callable type is unknown
keyword handling is complex
```

This is the central rule:

```text
fast path for common case
generic path for everything else
```

## 75.38 Performance Guidelines From Internals

Understanding call fast paths suggests practical Python guidelines:

| Pattern | Reason |
|---|---|
| Prefer simple positional calls in hot loops | Cheapest argument binding |
| Avoid unnecessary wrappers in inner loops | Each wrapper adds another call |
| Move repeated dynamic lookup out of hot loops when useful | Reduces lookup plus call overhead |
| Use builtins directly where appropriate | Many builtins have efficient C paths |
| Avoid excessive `*args` and `**kwargs` in hot paths | Forces general argument handling |
| Keep call sites type-stable | Helps adaptive specialization |

These are not rigid rules. They are performance heuristics.

Correct design comes first. Optimize only where measurement shows call overhead matters.
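As an example of the "move repeated lookup out of the loop" row, the sketch below shows the two forms side by side. The function names are hypothetical. On recent CPython versions with specialized method calls the difference may be small, so any gain should be confirmed with `timeit` on the target interpreter.

```python
def squares_plain(values):
    out = []
    for v in values:
        out.append(v * v)      # attribute lookup plus call on every iteration
    return out

def squares_hoisted(values):
    out = []
    append = out.append        # one bound method created before the loop
    for v in values:
        append(v * v)          # plain call on a local variable
    return out

data = list(range(10))
print(squares_plain(data) == squares_hoisted(data))
```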

## 75.39 Reading Call Paths in CPython

Important areas to study:

| Area | Purpose |
|---|---|
| `Python/ceval.c` | Evaluation loop that executes call opcodes (instruction bodies are defined in `Python/bytecodes.c` since 3.12) |
| `Objects/call.c` | Generic and vectorcall helpers |
| `Include/cpython/abstract.h` | Public call APIs |
| `Include/internal/pycore_call.h` | Internal call helpers |
| `Objects/methodobject.c` | Builtin function and method objects |
| `Objects/funcobject.c` | Python function objects |
| `Objects/typeobject.c` | Type calls and descriptor machinery |

The exact layout changes across releases, but these areas contain the main machinery.

## 75.40 Mental Model

A useful model:

```text
A Python call is argument layout plus callable dispatch plus frame or C entry.
```

Fast paths optimize all three:

```text
argument layout:
    avoid tuple and dict construction

callable dispatch:
    recognize common callable kinds

execution entry:
    use direct Python frame setup or C vectorcall
```

The generic call protocol remains available for every unusual case.

## 75.41 Chapter Summary

Function call fast paths reduce the overhead of one of Python’s most frequent operations.

They rely on:

```text
vectorcall
fast locals
method-call optimization
bound method avoidance
call-site specialization
inline caches
efficient frame setup
C fastcall conventions
```

These optimizations do not remove Python’s dynamic call semantics. They make common call shapes cheaper while preserving correctness for descriptors, keyword arguments, `*args`, `**kwargs`, tracing, exceptions, recursion checks, and the C API.
