75. Function Call Fast Paths

CALL_PY_EXACT_ARGS, CALL_BUILTIN_FAST, and the fast-path conditions that bypass the generic call machinery.

Function calls are one of the most important performance paths in CPython.

Python programs call functions constantly:

len(xs)
str(x)
range(n)
obj.method()
f(a, b)
callback(event)
decorator(fn)

A call looks simple at the source level, but CPython must support many callable forms:

Python functions
bound methods
builtin functions
builtin methods
C extension callables
classes
objects with __call__
classmethod objects
staticmethod objects
partial objects
decorated wrappers

A generic call path must also handle:

positional arguments
keyword arguments
default values
keyword-only arguments
positional-only arguments
*args
**kwargs
bound self
descriptors
argument errors
trace hooks
profiling hooks

Function call fast paths exist to make common calls avoid the full generic machinery.

75.1 Why Calls Are Expensive

A Python function call does much more than jump to a code address.

For this call:

result = f(x, y)

CPython may need to:

load f
load x
load y
determine whether f is callable
prepare arguments
bind arguments to parameters
create or initialize a frame
set locals
handle defaults
handle keyword arguments
enter the evaluation loop
execute bytecode
return a result
deallocate or recycle frame state

For a method call:

obj.f(x)

there is additional work:

look up attribute f
apply descriptor protocol
bind obj as self
avoid or create bound method object
prepare final argument list
call target

Fast paths reduce this overhead for common call shapes.

75.2 Calls in the Bytecode Stream

A call is represented by several bytecode instructions.

A simple call:

f(x)

conceptually becomes:

LOAD_GLOBAL f
LOAD_FAST x
PRECALL
CALL

A method call:

obj.f(x)

conceptually becomes:

LOAD_FAST obj
LOAD_METHOD f
LOAD_FAST x
PRECALL
CALL

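The exact bytecode changes by Python version (PRECALL appears only in CPython 3.11, and 3.12 folds PRECALL into CALL and LOAD_METHOD into LOAD_ATTR), but the model is stable: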

load callable
load arguments
prepare call
execute call

This split allows CPython to specialize different parts of the call sequence.
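The instruction sequence for a given interpreter can be inspected with the standard dis module. The following is a small sketch; the helper name call_opnames is an assumption introduced for illustration, and the opcode names it prints depend on the CPython version in use.

```python
import dis


def plain_call(f, x):
    return f(x)


def method_call(obj, x):
    return obj.f(x)


def call_opnames(func):
    """Return the names of call-related opcodes found in func's bytecode."""
    return [ins.opname for ins in dis.get_instructions(func) if "CALL" in ins.opname]


# The exact names differ between releases (for example CALL_FUNCTION,
# PRECALL + CALL, or just CALL), so no fixed output is shown here.
print(call_opnames(plain_call))
print(call_opnames(method_call))
```

Running dis.dis(plain_call) prints the full sequence, including the loads that precede the call.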

75.3 The Generic Call Path

The generic call path must support every callable.

Conceptually:

PyObject *result = PyObject_Call(callable, args_tuple, kwargs_dict);

This representation is flexible, but expensive.

It often requires:

allocating a tuple for positional arguments
allocating a dict for keyword arguments
normalizing argument layout
checking callable protocol
dispatching through tp_call
handling errors

A fully generic call is necessary for correctness, but it is too expensive for hot loops and small functions.

75.4 Vectorcall

Vectorcall is CPython’s main fast calling convention for many callables.

Instead of packaging arguments into a tuple and dictionary, CPython passes arguments as a C array of PyObject *.

Conceptually:

result = vectorcallfunc(
    callable,
    args_array,
    nargsf,
    kwnames
);

The important idea:

arguments are already on the stack
call target reads them directly
no temporary args tuple required
no temporary kwargs dict required in common cases

For a call like:

f(a, b, c)

the stack already contains:

f
a
b
c

Vectorcall lets the callee consume that layout directly.

75.5 Argument Stack Layout

Before the call executes, CPython has already pushed the callable and arguments onto the frame stack.

Conceptually:

stack before CALL:

callable
arg0
arg1
arg2

The call instruction knows the argument count.

Instead of constructing:

(args_tuple, kwargs_dict)

the interpreter can pass a pointer into the stack.

This matters because small calls dominate Python workloads.

Avoiding temporary tuple and dict allocation removes significant overhead.

75.6 Positional Calls

The fastest common case is a call that passes only positional arguments to a callable with a known shape:

f(a, b)

No keyword matching is needed.

The call path can:

verify callable supports vectorcall
pass argument pointer and count
create frame if Python function
run directly if C builtin
return result

This path avoids:

keyword dictionary construction
argument tuple construction
complex binding logic

Many inner-loop calls fit this shape.

75.7 Keyword Calls

Keyword calls are more complex:

f(x=1, y=2)

CPython must preserve keyword names and match them to parameters.

Vectorcall still avoids a full kwargs dictionary in many cases by passing keyword names separately.

Conceptually:

args array:
    1
    2

kwnames:
    ("x", "y")

The callee can bind keyword values using the compact kwnames tuple.

This avoids building a full dictionary unless the function actually needs one, such as for **kwargs.
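The layout can be modelled in plain Python. This is a toy sketch, not CPython code: the function name split_vectorcall_args is hypothetical, and it builds a dict only to make the pairing visible, which is exactly the step the real convention tries to avoid.

```python
def split_vectorcall_args(args, kwnames):
    """Split a flat argument array into positional values and keyword pairs.

    Toy model: `args` holds the positional values followed by the keyword
    values, and `kwnames` names those trailing keyword values in order.
    """
    nkw = len(kwnames)
    npos = len(args) - nkw
    positional = list(args[:npos])
    keywords = dict(zip(kwnames, args[npos:]))
    return positional, keywords


# f(10, x=1, y=2) modelled as one flat array plus a tuple of names.
positional, keywords = split_vectorcall_args([10, 1, 2], ("x", "y"))
print(positional)  # [10]
print(keywords)    # {'x': 1, 'y': 2}
```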

75.8 Python Function Calls

Calling a Python function usually means creating or initializing a new frame.

For:

def add(a, b):
    return a + b

add(1, 2)

CPython must:

load function object
read code object
map arguments to local slots
initialize frame
execute bytecode
return value

A Python function object contains:

code object
globals dictionary
defaults
keyword defaults
closure cells
annotations
qualname
module

Fast paths optimize argument binding and frame setup.
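Most of the fields listed above are visible from Python as function attributes. A short sketch, using a hypothetical function add with one default and one keyword-only default:

```python
def add(a, b=2, *, scale=1):
    return (a + b) * scale


print(add.__code__.co_name)   # add
print(add.__defaults__)       # (2,)
print(add.__kwdefaults__)     # {'scale': 1}
print(add.__closure__)        # None
print(add.__qualname__)
print(add.__module__)
print(type(add.__globals__).__name__)  # dict
```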

75.9 Fast Locals and Argument Binding

Python function parameters are stored in fast local slots.

For:

def f(a, b, c):
    return a + b + c

the frame can store:

localsplus[0] = a
localsplus[1] = b
localsplus[2] = c

A positional call can copy stack arguments directly into these slots.

This is much faster than creating a dictionary for locals.

The dictionary returned by locals() is a view built on request. It is not the storage the interpreter executes against inside an optimized function.
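The slot order is recorded on the code object, where parameters come first and other local variables follow. A brief sketch (the extra local total is added here only to show that ordering):

```python
def f(a, b, c):
    total = a + b + c
    return total


code = f.__code__
print(code.co_argcount)  # 3
print(code.co_varnames)  # ('a', 'b', 'c', 'total')
print(code.co_nlocals)   # 4
```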

75.10 Defaults

Default arguments complicate binding:

def f(a, b=10):
    return a + b

A call:

f(1)

must fill b from the function’s defaults.

CPython stores defaults on the function object.

Fast binding logic can fill missing trailing positional parameters from the defaults tuple.

Conceptually:

provided:
    a = 1

defaults:
    b = 10

locals:
    a = 1
    b = 10

This remains cheaper than the fully generic argument binding path.
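The trailing-defaults rule can be sketched in Python. The helper bind_positional below is hypothetical and handles only plain positional parameters; it ignores keywords, *args, and keyword-only parameters, and its error message is simplified.

```python
def bind_positional(func, provided):
    """Toy binder: map positional values to parameter names, filling
    missing trailing parameters from func.__defaults__."""
    code = func.__code__
    defaults = func.__defaults__ or ()
    nparams = code.co_argcount
    missing = nparams - len(provided)
    if missing < 0 or missing > len(defaults):
        raise TypeError("wrong number of positional arguments")
    # Defaults align with the *last* parameters, so take the final
    # `missing` entries of the defaults tuple.
    filled = list(provided) + list(defaults[len(defaults) - missing:])
    return dict(zip(code.co_varnames[:nparams], filled))


def f(a, b=10):
    return a + b


print(bind_positional(f, (1,)))    # {'a': 1, 'b': 10}
print(bind_positional(f, (1, 2)))  # {'a': 1, 'b': 2}
```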

75.11 Keyword-Only Arguments

Keyword-only arguments require separate handling:

def f(a, *, limit=10):
    return a + limit

A call:

f(3, limit=5)

must bind limit by name.

The interpreter must check:

required keyword-only parameters
keyword-only defaults
unexpected keywords
duplicate bindings

Fast paths can still help when keyword names are known and the shape is stable.
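The data the interpreter consults for these checks is visible on the function and its code object. A small sketch, with a hypothetical required keyword-only parameter mode added to show the missing-argument case:

```python
def f(a, *, limit=10, mode):
    return a + limit, mode


print(f.__code__.co_kwonlyargcount)  # 2
print(f.__kwdefaults__)              # {'limit': 10}
print(f(3, mode="fast"))             # (13, 'fast')

try:
    f(3)  # required keyword-only parameter `mode` is missing
except TypeError as exc:
    missing_error = str(exc)
    print(missing_error)
```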

75.12 Positional-Only Arguments

CPython also supports positional-only arguments:

def f(a, b, /, c):
    return a + b + c

Many builtins use positional-only parameters.

This simplifies binding because those arguments cannot be passed by keyword.

A positional-only fast path can avoid keyword name checks for those parameters.
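A quick check of this behavior, as a sketch that requires Python 3.8 or later for the / syntax:

```python
def f(a, b, /, c):
    return a + b + c


print(f.__code__.co_posonlyargcount)  # 2
print(f(1, 2, c=3))                   # 6

try:
    f(a=1, b=2, c=3)  # a and b cannot be passed by keyword
except TypeError as exc:
    posonly_error = str(exc)
    print(posonly_error)
```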

75.13 Builtin Function Calls

Builtin functions are often implemented in C.

Example:

len(xs)

At runtime, this can call a C function directly through a fast calling convention.

A builtin fast path can avoid creating a Python frame.

Execution shape:

LOAD_GLOBAL len
LOAD_FAST xs
PRECALL
CALL
    call C function
return PyObject *

This is much cheaper than calling a Python function, although it still works with Python objects and reference counts.
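The difference between a builtin and a Python function is observable from Python: a builtin has no code object, so there is no bytecode to run and no Python frame to set up. A short sketch, using a hypothetical wrapper py_len for comparison:

```python
import types


def py_len(xs):
    return len(xs)


print(type(len).__name__)                          # builtin_function_or_method
print(isinstance(len, types.BuiltinFunctionType))  # True
print(hasattr(len, "__code__"))                    # False
print(hasattr(py_len, "__code__"))                 # True
```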

75.14 Method Calls

Method calls are especially important.

Source:

obj.method(x)

Naive execution would do this:

look up obj.method
create bound method object
call bound method with x
destroy bound method

A bound method object packages:

function
self

CPython avoids creating this temporary object for common method calls.

The bytecode sequence uses method-aware instructions.

Conceptually:

LOAD_METHOD method
LOAD_FAST x
CALL

The interpreter can keep the function and self separately on the stack.

75.15 Avoiding Bound Method Allocation

For a normal instance method:

class C:
    def f(self, x):
        return x + 1

obj = C()
obj.f(10)

The optimized path can behave like:

find function C.f
push function
push self
push argument 10
call function with self inserted

No bound method object needs to be allocated.

This saves:

object allocation
reference count operations
temporary object lifetime management
extra indirection

Method-call optimization is one of CPython’s most important object-oriented fast paths.

75.16 Descriptor-Aware Method Lookup

Method lookup must respect the descriptor protocol.

A function stored on a class is a descriptor. Accessing it through an instance normally produces a bound method.

But not every attribute access is a simple method:

property
staticmethod
classmethod
custom descriptor
plain callable object
data descriptor
non-data descriptor

The fast path must only skip bound method creation when semantics allow it.

If the descriptor behavior is unusual, CPython falls back to the generic path.
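The different attribute kinds can be told apart by looking at what is stored in the class dictionary versus what attribute access returns. A sketch with a hypothetical class C:

```python
class C:
    def method(self):
        return "method"

    @staticmethod
    def static():
        return "static"

    @classmethod
    def klass(cls):
        return cls.__name__

    @property
    def prop(self):
        return "prop"


obj = C()

# What the class stores: the raw descriptors.
print(type(C.__dict__["method"]).__name__)  # function
print(type(C.__dict__["static"]).__name__)  # staticmethod
print(type(C.__dict__["klass"]).__name__)   # classmethod
print(type(C.__dict__["prop"]).__name__)    # property

# What attribute access produces after the descriptor protocol runs.
print(type(obj.method).__name__)  # method
print(type(obj.static).__name__)  # function
print(type(obj.klass).__name__)   # method
print(obj.prop)                   # prop
```

Only the first case, a plain function found on the class, matches the shape where self can simply be inserted as the first argument.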

75.17 CALL Specialization

Modern CPython specializes call instructions.

Common specialized cases include:

Python function with positional arguments
builtin C function with vectorcall
method descriptor
bound method
class construction

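Specialization lets the CALL instruction skip general dispatch once the call site becomes stable. For example, CALL_PY_EXACT_ARGS covers a Python function called with exactly the positional arguments it expects, and CALL_BUILTIN_FAST covers builtin C functions that use the fastcall convention. The names of the specialized instructions vary between releases.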

A hot call site may repeatedly see the same callable type and same argument shape.

That pattern is exactly what adaptive specialization exploits.

75.18 PRECALL

In CPython 3.11, PRECALL exists to prepare and specialize calls. CPython 3.12 removed the instruction and folded its work into CALL.

It gives the interpreter a separate point to optimize before the final call happens.

Conceptually:

PRECALL:
    inspect callable shape
    specialize call path
    prepare cache state

CALL:
    execute call

Separating preparation from execution gives CPython more room to optimize call sequences.

75.19 Vectorcall and Classes

Classes are callable.

obj = C(1, 2)

This involves object construction:

call metaclass __call__
allocate instance through __new__
initialize instance through __init__
return instance

Fast paths can help for common class construction patterns, but class calls are still more complex than ordinary function calls.

Class creation and instance construction involve descriptors, metaclasses, allocation, and initialization.
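The construction sequence can be observed by logging each step. This is an illustrative sketch; the names Meta, C, and events are assumptions introduced here.

```python
events = []


class Meta(type):
    def __call__(cls, *args, **kwargs):
        events.append("Meta.__call__")
        return super().__call__(*args, **kwargs)


class C(metaclass=Meta):
    def __new__(cls, *args):
        events.append("__new__")
        return super().__new__(cls)

    def __init__(self, x, y):
        events.append("__init__")
        self.x, self.y = x, y


obj = C(1, 2)
print(events)  # ['Meta.__call__', '__new__', '__init__']
```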

75.20 __call__ Objects

Any object can be callable if its type defines __call__.

Example:

class Adder:
    def __call__(self, x):
        return x + 1

add_one = Adder()
add_one(10)

The generic callable protocol handles this.

Fast paths may help if the call target is stable, but the interpreter must still honor the lookup of __call__ on the object's type, including descriptor semantics.
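One consequence is that call syntax looks up __call__ on the type, not on the instance. The sketch below assigns an instance attribute named __call__ to show that add_one(10) ignores it:

```python
class Adder:
    def __call__(self, x):
        return x + 1


add_one = Adder()

# An instance attribute named __call__ does not affect call syntax,
# because the implicit lookup goes through type(add_one).
add_one.__call__ = lambda x: x + 100

print(add_one(10))           # 11
print(add_one.__call__(10))  # 110
```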

75.21 Star Arguments

Star arguments force more general handling.

f(*args)

CPython must unpack args into positional arguments.

f(**kwargs)

must unpack a mapping into keyword arguments.

f(*args, **kwargs)

may require merging and duplicate checking.

These forms often reduce fast-path opportunities because the final argument shape is known only at runtime.
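A short sketch of the runtime work involved, using a hypothetical function f. The same call expression can deliver different argument shapes on each execution, and duplicates are only detectable once the values are unpacked:

```python
def f(a, b=0, **extra):
    return a, b, extra


args = (1, 2)
kwargs = {"c": 3}
print(f(*args, **kwargs))  # (1, 2, {'c': 3})

# The same call expression with a different runtime shape.
args = (1,)
kwargs = {}
print(f(*args, **kwargs))  # (1, 0, {})

try:
    f(*(1, 2), **{"a": 9})  # `a` arrives both positionally and by name
except TypeError as exc:
    duplicate_error = str(exc)
    print(duplicate_error)
```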

75.22 Argument Error Reporting

Fast paths still need precise error behavior.

Example:

def f(a, b):
    pass

f(1)

must raise an error like:

missing required positional argument

Similarly:

f(1, 2, 3)

must report too many arguments.

Keyword errors must also be exact:

unexpected keyword argument
multiple values for argument
missing keyword-only argument

The fast path cannot produce vague or incorrect errors.
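The messages can be collected directly. In this sketch the helper error_message is hypothetical; the printed text includes the function name, so only the stable part of each message is described in the comments.

```python
def f(a, b):
    pass


def error_message(call):
    """Run `call` and return the TypeError text, or None if it succeeds."""
    try:
        call()
    except TypeError as exc:
        return str(exc)
    return None


print(error_message(lambda: f(1)))           # missing 1 required positional argument
print(error_message(lambda: f(1, 2, 3)))     # takes 2 positional arguments but 3 were given
print(error_message(lambda: f(1, 2, c=3)))   # unexpected keyword argument
print(error_message(lambda: f(1, a=1)))      # multiple values for argument
```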

75.23 Recursion Checks

Each Python call can increase recursion depth.

CPython must check recursion limits so that runaway recursion raises an exception instead of exhausting memory or overflowing the C stack.

Example:

def f():
    return f()

f()

Eventually raises:

RecursionError

Fast call paths must preserve recursion checks.

The call may be optimized, but entering Python execution still needs recursion accounting.
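The limit is observable and configurable through the sys module. A minimal sketch (the flag hit_limit is introduced here for illustration):

```python
import sys


def f(depth=0):
    return f(depth + 1)


hit_limit = False
try:
    f()
except RecursionError:
    hit_limit = True

print(hit_limit)                 # True
print(sys.getrecursionlimit())   # interpreter-dependent, commonly 1000
```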

75.24 Tracing and Profiling

Tracing and profiling hooks complicate calls.

Tools may need events for:

call
return
exception
line
opcode

When tracing or profiling is active, CPython may take slower paths so it can produce correct events.

Fast call paths are most effective when tracing is disabled.
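A profiling hook can be installed with sys.setprofile to see the events a call generates. This is a sketch: the names profiler and events are assumptions, and the exact list printed can include extra entries (for example the C call that removes the hook), so no fixed output is shown.

```python
import sys


def add(a, b):
    return a + b


events = []


def profiler(frame, event, arg):
    # "call" is reported for Python functions, "c_call" for C functions.
    if event == "call":
        events.append((event, frame.f_code.co_name))
    elif event == "c_call":
        events.append((event, getattr(arg, "__name__", repr(arg))))


sys.setprofile(profiler)
try:
    add(1, 2)
    len("abc")
finally:
    sys.setprofile(None)  # always remove the hook

print(events)
```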

75.25 Reference Ownership in Calls

Calls are dense with reference count operations.

The interpreter must manage:

callable reference
argument references
default references
temporary references
return value reference
exception state references
frame references

A correct fast path must be as careful as the generic path.

A small mistake can create:

memory leak
double free
use after free
incorrect object lifetime

Call optimization is therefore performance-sensitive and correctness-sensitive.

75.26 Return Value Handling

A call returns either:

new reference to result
exception indicator

The caller must:

push result on stack
or propagate exception

For Python functions, returning from the callee frame produces the result.

For C functions, the C API convention is usually:

return PyObject * on success
return NULL with exception set on failure

The call instruction must handle both conventions correctly.

75.27 Calls and Exceptions

Any call can raise.

f()

may raise because:

f explicitly raises
argument binding fails
descriptor lookup fails
__call__ fails
allocation fails
recursion limit is exceeded
C extension reports error

Fast paths must always include a clean exception exit path.

That path must unwind stack state and maintain frame invariants.

75.28 Calls and Inline Caches

Call specialization uses inline caches.

A call site may cache:

callable kind
call target
argument count
keyword shape
function version
type version
descriptor result

The cache allows the interpreter to say:

this site is still calling the same kind of thing
use the fast path

If the callable changes, validation fails and CPython falls back.
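The guard-then-fast-path idea can be modelled with a toy class. This is not how CPython stores its caches (those live in the bytecode stream next to the instruction); CallSite and its fields are hypothetical names used only to show the validation logic.

```python
class CallSite:
    """Toy call site that caches one target and argument count."""

    def __init__(self):
        self.cached_target = None
        self.cached_nargs = None
        self.hits = 0
        self.misses = 0

    def call(self, target, *args):
        if target is self.cached_target and len(args) == self.cached_nargs:
            self.hits += 1    # guard passed: this models the fast path
        else:
            self.misses += 1  # guard failed: fall back and re-cache
            self.cached_target = target
            self.cached_nargs = len(args)
        return target(*args)


site = CallSite()
for _ in range(3):
    site.call(len, [1, 2])  # first call misses, next two hit
site.call(str, 5)           # different callable: guard fails

print(site.hits, site.misses)  # 2 2
```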

75.29 Calls and Global Lookup

Many calls start with a global lookup:

len(xs)
print(x)
range(n)
isinstance(x, T)

Optimization often involves two stages:

LOAD_GLOBAL specialization
CALL specialization

For len(xs):

cache builtin len
recognize builtin call shape
call C implementation efficiently

The combined effect is larger than either optimization alone.

75.30 Calls and Attribute Lookup

Method calls combine attribute lookup and call execution.

obj.append(x)

The optimized path may include:

specialized LOAD_METHOD
specialized PRECALL
specialized CALL

For common built-in types such as list, dict, and str, this can be highly optimized.

Example:

items.append(x)

is a very common operation. CPython invests heavily in making this shape efficient.

75.31 Calls and Frame Allocation

Frame allocation used to be a significant cost.

Modern CPython reduces this cost through internal frame representations and frame object laziness.

A Python call needs execution state, but it does not always need a full heap-allocated Python frame object visible to user code.

A full frame object may be materialized only when needed, such as for:

tracebacks
inspect.currentframe()
debuggers
profilers
frame introspection on generators and coroutines (such as gi_frame)

This reduces ordinary call overhead.

75.32 Inlining

CPython generally does not inline Python functions in the classic compiler sense.

For example:

def add(a, b):
    return a + b

x = add(1, 2)

CPython does not normally replace the call with:

x = 1 + 2

Function inlining is difficult in Python because:

functions can be rebound
globals can change
defaults can change
closures exist
tracing expects frames
introspection expects call structure
exceptions need correct tracebacks

Instead, CPython focuses on reducing call overhead rather than eliminating calls entirely.
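The first reason in the list is easy to demonstrate: the name a call site uses can be rebound between two executions of the same code. A small sketch with hypothetical functions add and caller:

```python
def add(a, b):
    return a + b


def caller():
    return add(1, 2)


first = caller()


# Rebind the name `add`; caller() was compiled earlier and is unchanged.
def add(a, b):
    return "rebound"


second = caller()

print(first)   # 3
print(second)  # rebound
```

Had the first definition been inlined into caller, the second result would be wrong.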

75.33 Tail Calls

CPython does not perform general tail call optimization.

Example:

def fact(n, acc=1):
    if n == 0:
        return acc
    return fact(n - 1, acc * n)

This still consumes one Python call frame per recursive call.

Reasons include:

debuggability
traceback preservation
introspection behavior
semantic expectations
implementation simplicity

Function call fast paths reduce overhead, but they do not turn recursive Python into loops.
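When depth is a concern, the conversion to a loop has to be done by hand. A sketch comparing the recursive form with a manual rewrite (fact_loop is a hypothetical name):

```python
import sys


def fact(n, acc=1):
    if n == 0:
        return acc
    return fact(n - 1, acc * n)


def fact_loop(n):
    """Same computation with the tail call rewritten as a loop."""
    acc = 1
    while n > 0:
        acc *= n
        n -= 1
    return acc


deep = sys.getrecursionlimit() + 100

try:
    fact(deep)
    recursive_ok = True
except RecursionError:
    recursive_ok = False

print(recursive_ok)                 # False
print(fact_loop(10) == fact(10))    # True
print(fact_loop(deep) > 0)          # True
```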

75.34 Calls to C Extensions

C extension calls can be fast when they use modern calling conventions.

Older extension functions may use:

METH_VARARGS

which receives a tuple of arguments.

Newer forms can use:

METH_FASTCALL
METH_FASTCALL | METH_KEYWORDS

These integrate better with vectorcall-style argument passing.

Extension authors can significantly affect call overhead by choosing the right calling convention.

75.35 Bound Methods

A bound method object stores:

underlying function
bound self object

Example:

m = obj.method
m(1)

Here the bound method object is observable and must exist.

But for immediate calls:

obj.method(1)

CPython can often avoid allocating it.

This distinction is important:

obj.method

requires an object result.

obj.method()

may use a call-specific optimization.
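When the attribute access is not immediately called, the bound method is an ordinary object with its own identity. A short sketch with a hypothetical class C:

```python
class C:
    def method(self, x):
        return x + 1


obj = C()
m1 = obj.method
m2 = obj.method

print(m1 is m2)  # False: each access creates a new bound method
print(m1 == m2)  # True: same function bound to the same object
print(m1.__self__ is obj, m1.__func__ is C.method)  # True True
print(m1(1), C.method(obj, 1))                      # 2 2
```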

75.36 Constructors

Calling a class:

C(x)

normally performs:

type.__call__
C.__new__
C.__init__

Fast paths are limited by Python’s object construction semantics.

Custom metaclasses, custom __new__, and custom __init__ can all affect behavior.

Still, common built-in constructors such as:

list(x)
dict(x)
tuple(x)
int(x)
str(x)

often use efficient C-level call paths.

75.37 Fast Paths Are Conservative

CPython call fast paths are conservative.

They optimize when assumptions are cheap to validate and safe.

They fall back when:

call target changes
argument shape changes
tracing is active
descriptor behavior is unusual
callable type is unknown
keyword handling is complex

This is the central rule:

fast path for common case
generic path for everything else

75.38 Performance Guidelines From Internals

Understanding call fast paths suggests practical Python guidelines:

| Pattern | Reason |
| --- | --- |
| Prefer simple positional calls in hot loops | Cheapest argument binding |
| Avoid unnecessary wrappers in inner loops | Each wrapper adds another call |
| Move repeated dynamic lookup out of hot loops when useful | Reduces lookup plus call overhead |
| Use builtins directly where appropriate | Many builtins have efficient C paths |
| Avoid excessive *args and **kwargs in hot paths | Forces general argument handling |
| Keep call sites type-stable | Helps adaptive specialization |

These are not rigid rules. They are performance heuristics.

Correct design comes first. Optimize only where measurement shows call overhead matters.
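Measuring is straightforward with the standard timeit module. The sketch below compares a direct positional call with the same call routed through a *args/**kwargs wrapper; the function names and the iteration count are assumptions, and the timings depend on the machine and interpreter version, so no numbers are shown.

```python
import timeit


def target(a, b):
    return a


def wrapper(*args, **kwargs):
    # Extra call layer with generic argument forwarding.
    return target(*args, **kwargs)


def direct():
    return target(1, 2)


def via_wrapper():
    return wrapper(1, 2)


t_direct = timeit.timeit(direct, number=200_000)
t_wrapped = timeit.timeit(via_wrapper, number=200_000)

print(f"direct:  {t_direct:.4f}s")
print(f"wrapped: {t_wrapped:.4f}s")
```

Repeating the measurement several times and comparing the minimum values gives a more stable picture than a single run.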

75.39 Reading Call Paths in CPython

Important areas to study:

| Area | Purpose |
| --- | --- |
| Python/ceval.c | Executes call opcodes |
| Objects/call.c | Generic and vectorcall helpers |
| Include/cpython/abstract.h | Public call APIs |
| Include/internal/pycore_call.h | Internal call helpers |
| Objects/methodobject.c | Builtin function and method objects |
| Objects/funcobject.c | Python function objects |
| Objects/typeobject.c | Type calls and descriptor machinery |

The exact layout changes across releases, but these areas contain the main machinery.

75.40 Mental Model

A useful model:

A Python call is argument layout plus callable dispatch plus frame or C entry.

Fast paths optimize all three:

argument layout:
    avoid tuple and dict construction

callable dispatch:
    recognize common callable kinds

execution entry:
    use direct Python frame setup or C vectorcall

The generic call protocol remains available for every unusual case.

75.41 Chapter Summary

Function call fast paths reduce the overhead of one of Python’s most frequent operations.

They rely on:

vectorcall
fast locals
method-call optimization
bound method avoidance
call-site specialization
inline caches
efficient frame setup
C fastcall conventions

These optimizations do not remove Python’s dynamic call semantics. They make common call shapes cheaper while preserving correctness for descriptors, keyword arguments, *args, **kwargs, tracing, exceptions, recursion checks, and the C API.