CALL_PY_EXACT_ARGS, CALL_BUILTIN_FAST, and the fast-path conditions that bypass the generic call machinery.
Function calls are one of the most important performance paths in CPython.
Python programs call functions constantly:
len(xs)
str(x)
range(n)
obj.method()
f(a, b)
callback(event)
decorator(fn)

A call looks simple at the source level, but CPython must support many callable forms:
Python functions
bound methods
builtin functions
builtin methods
C extension callables
classes
objects with __call__
classmethod objects
staticmethod objects
partial objects
decorated wrappers

A generic call path must also handle:
positional arguments
keyword arguments
default values
keyword-only arguments
positional-only arguments
*args
**kwargs
bound self
descriptors
argument errors
trace hooks
profiling hooks

Function call fast paths exist to make common calls avoid the full generic machinery.
75.1 Why Calls Are Expensive
A Python function call does much more than jump to a code address.
For this call:
result = f(x, y)

CPython may need to:
load f
load x
load y
determine whether f is callable
prepare arguments
bind arguments to parameters
create or initialize a frame
set locals
handle defaults
handle keyword arguments
enter the evaluation loop
execute bytecode
return a result
deallocate or recycle frame state

For a method call:

obj.f(x)

there is additional work:
look up attribute f
apply descriptor protocol
bind obj as self
avoid or create bound method object
prepare final argument list
call target

Fast paths reduce this overhead for common call shapes.
75.2 Calls in the Bytecode Stream
A call is represented by several bytecode instructions.
A simple call:
f(x)

conceptually becomes:
LOAD_GLOBAL f
LOAD_FAST x
PRECALL
CALL

A method call:

obj.f(x)

conceptually becomes:
LOAD_FAST obj
LOAD_METHOD f
LOAD_FAST x
PRECALL
CALL

The exact bytecode changes by Python version (PRECALL appears only in CPython 3.11, and 3.12 folded LOAD_METHOD into LOAD_ATTR), but the model is stable:
load callable
load arguments
prepare call
execute call

This split allows CPython to specialize different parts of the call sequence.
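The instruction sequence for a particular interpreter can be inspected with the standard `dis` module. The following is a minimal sketch; `call_site` and `opnames` are hypothetical names chosen for illustration, and the opcode names produced depend on the CPython version in use.

```python
import dis

def call_site(f, x):
    # A single call expression to disassemble.
    return f(x)

def opnames(func):
    # Collect the opcode names of a function's bytecode, in order.
    return [ins.opname for ins in dis.get_instructions(func)]

names = opnames(call_site)
# Keep only call-related instructions; the exact names vary by release.
call_ops = [name for name in names if "CALL" in name]
print(names)
print(call_ops)
```

Running this on different releases is a quick way to see how the load, prepare, and execute steps are encoded in each version.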
75.3 The Generic Call Path
The generic call path must support every callable.
Conceptually:
PyObject *result = PyObject_Call(callable, args_tuple, kwargs_dict);

This representation is flexible, but expensive.
It often requires:
allocating a tuple for positional arguments
allocating a dict for keyword arguments
normalizing argument layout
checking callable protocol
dispatching through tp_call
handling errors

A fully generic call is necessary for correctness, but it is too expensive for hot loops and small functions.
75.4 Vectorcall
Vectorcall is CPython’s main fast calling convention for many callables.
Instead of packaging arguments into a tuple and dictionary, CPython passes arguments as a C array of PyObject *.
Conceptually:
result = vectorcallfunc(
    callable,
    args_array,
    nargsf,
    kwnames
);

The important idea:
arguments are already on the stack
call target reads them directly
no temporary args tuple required
no temporary kwargs dict required in common cases

For a call like:

f(a, b, c)

the stack already contains:
f
a
b
c

Vectorcall lets the callee consume that layout directly.
75.5 Argument Stack Layout
Before the call executes, CPython has already pushed the callable and arguments onto the frame stack.
Conceptually:
stack before CALL:
callable
arg0
arg1
arg2

The call instruction knows the argument count.
Instead of constructing:
(args_tuple, kwargs_dict)

the interpreter can pass a pointer into the stack.
This matters because small calls dominate Python workloads.
Avoiding temporary tuple and dict allocation removes significant overhead.
75.6 Positional Calls
The fastest common case is a call that passes only positional arguments to a callable of known shape:

f(a, b)

No keyword matching is needed.
The call path can:
verify callable supports vectorcall
pass argument pointer and count
create frame if Python function
run directly if C builtin
return result

This path avoids:
keyword dictionary construction
argument tuple construction
complex binding logic

Many inner-loop calls fit this shape.
75.7 Keyword Calls
Keyword calls are more complex:
f(x=1, y=2)

CPython must preserve keyword names and match them to parameters.
Vectorcall still avoids a full kwargs dictionary in many cases by passing keyword names separately.
Conceptually:
args array:
    1
    2
kwnames:
    ("x", "y")

The callee can bind keyword values using the compact kwnames tuple.
This avoids building a full dictionary unless the function actually needs one, such as for **kwargs.
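The flat layout described above can be modeled in pure Python. This is a toy sketch, not CPython's binding code: `bind_from_vector` is a hypothetical helper, and it ignores defaults, `*args`, and `**kwargs`. It only illustrates how a single argument array plus a tuple of keyword names is enough to bind parameters without building a dictionary of keyword arguments first.

```python
def bind_from_vector(param_names, args, nargs, kwnames=()):
    """Toy model: bind a flat argument array plus a kwnames tuple.

    args[:nargs] are positional values; args[nargs:] are keyword values
    whose names are given, in order, by kwnames.
    """
    if nargs > len(param_names):
        raise TypeError("too many positional arguments")
    # Positional values map onto the leading parameters.
    bound = dict(zip(param_names, args[:nargs]))
    # Keyword values are matched by name against the remaining parameters.
    for name, value in zip(kwnames, args[nargs:]):
        if name not in param_names:
            raise TypeError(f"unexpected keyword argument {name!r}")
        if name in bound:
            raise TypeError(f"multiple values for argument {name!r}")
        bound[name] = value
    missing = [p for p in param_names if p not in bound]
    if missing:
        raise TypeError(f"missing arguments: {missing}")
    return bound

# f(x=1, y=2): no positional values, two keyword values.
print(bind_from_vector(("x", "y"), [1, 2], 0, ("x", "y")))
# f(1, y=2): one positional value, one keyword value.
print(bind_from_vector(("x", "y"), [1, 2], 1, ("y",)))
```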
75.8 Python Function Calls
Calling a Python function usually means creating or initializing a new frame.
For:
def add(a, b):
    return a + b

add(1, 2)

CPython must:
load function object
read code object
map arguments to local slots
initialize frame
execute bytecode
return value

A Python function object contains:
code object
globals dictionary
defaults
keyword defaults
closure cells
annotations
qualname
module

Fast paths optimize argument binding and frame setup.
75.9 Fast Locals and Argument Binding
Python function parameters are stored in fast local slots.
For:
def f(a, b, c):
    return a + b + c

the frame can store:
localsplus[0] = a
localsplus[1] = b
localsplus[2] = c

A positional call can copy stack arguments directly into these slots.
This is much faster than creating a dictionary for locals.
The local dictionary seen through locals() is not the primary execution storage for optimized function execution.
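The slot order is visible from Python through the code object. The sketch below uses the documented `co_varnames` and `co_argcount` attributes; the extra local `total` is added only to show that parameters occupy the leading slots.

```python
def f(a, b, c):
    total = a + b + c
    return total

code = f.__code__
# Parameters come first in co_varnames, followed by other local variables.
param_slots = code.co_varnames[:code.co_argcount]
all_slots = code.co_varnames
print(param_slots)
print(all_slots)
```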
75.10 Defaults
Default arguments complicate binding:
def f(a, b=10):
    return a + b

A call:

f(1)

must fill b from the function’s defaults.
CPython stores defaults on the function object.
Fast binding logic can fill missing trailing positional parameters from the defaults tuple.
Conceptually:
provided:
    a = 1
defaults:
    b = 10
locals:
    a = 1
    b = 10

This remains cheaper than the fully generic argument binding path.
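The defaults tuple is an ordinary attribute of the function object, which can be seen by reading and replacing `__defaults__`. This short sketch only demonstrates where the values live; it says nothing about how the interpreter copies them internally.

```python
def f(a, b=10):
    return a + b

defaults = f.__defaults__      # tuple of trailing positional defaults
first = f(1)                   # b is filled from the defaults tuple

f.__defaults__ = (100,)        # defaults are read at call time
second = f(1)

print(defaults, first, second)
```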
75.11 Keyword-Only Arguments
Keyword-only arguments require separate handling:
def f(a, *, limit=10):
    return a + limit

A call:

f(3, limit=5)

must bind limit by name.
The interpreter must check:
required keyword-only parameters
keyword-only defaults
unexpected keywords
duplicate bindings

Fast paths can still help when keyword names are known and the shape is stable.
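Keyword-only defaults are stored separately from positional defaults, in `__kwdefaults__`. The sketch below shows that storage and one of the checks listed above; the exact wording of the error message may differ between versions.

```python
def f(a, *, limit=10):
    return a + limit

kwdefaults = f.__kwdefaults__                 # mapping of keyword-only defaults
kwonly_count = f.__code__.co_kwonlyargcount   # number of keyword-only parameters
by_name = f(3, limit=5)

try:
    f(3, 5)                                   # limit cannot be passed positionally
except TypeError as exc:
    error = str(exc)

print(kwdefaults, kwonly_count, by_name, error)
```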
75.12 Positional-Only Arguments
CPython also supports positional-only arguments:
def f(a, b, /, c):
    return a + b + c

Many builtins use positional-only parameters.
This simplifies binding because those arguments cannot be passed by keyword.
A positional-only fast path can avoid keyword name checks for those parameters.
75.13 Builtin Function Calls
Builtin functions are often implemented in C.
Example:
len(xs)

At runtime, this can call a C function directly through a fast calling convention.
A builtin fast path can avoid creating a Python frame.
Execution shape:
LOAD_GLOBAL len
LOAD_FAST xs
PRECALL
CALL
call C function
return PyObject *

This is much cheaper than calling a Python function, although it still works with Python objects and reference counts.
75.14 Method Calls
Method calls are especially important.
Source:
obj.method(x)

Naive execution would do this:
look up obj.method
create bound method object
call bound method with x
destroy bound method

A bound method object packages:
function
self

CPython avoids creating this temporary object for common method calls.
The bytecode sequence uses method-aware instructions.
Conceptually:
LOAD_METHOD method
LOAD_FAST x
CALL

The interpreter can keep the function and self separately on the stack.
75.15 Avoiding Bound Method Allocation
For a normal instance method:
class C:
    def f(self, x):
        return x + 1

obj = C()
obj.f(10)

The optimized path can behave like:
find function C.f
push function
push self
push argument 10
call function with self inserted

No bound method object needs to be allocated.
This saves:
object allocation
reference count operations
temporary object lifetime management
extra indirection

Method-call optimization is one of CPython’s most important object-oriented fast paths.
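The equivalence that the optimized path relies on can be checked from Python. The sketch below is illustrative only: it looks up the plain function in the class dictionary and calls it with `self` passed explicitly, then contrasts that with storing `obj.f`, where a bound method object has to exist.

```python
class C:
    def f(self, x):
        return x + 1

obj = C()
func = type(obj).__dict__["f"]   # plain function stored on the class
via_method = obj.f(10)           # ordinary method call
via_function = func(obj, 10)     # same call with self passed explicitly
bound = obj.f                    # attribute access alone yields a bound method

print(via_method, via_function, bound.__func__ is func, bound.__self__ is obj)
```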
75.16 Descriptor-Aware Method Lookup
Method lookup must respect the descriptor protocol.
A function stored on a class is a descriptor. Accessing it through an instance normally produces a bound method.
But not every attribute access is a simple method:
property
staticmethod
classmethod
custom descriptor
plain callable object
data descriptor
non-data descriptor

The fast path must only skip bound method creation when semantics allow it.
If the descriptor behavior is unusual, CPython falls back to the generic path.
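The difference between these attribute kinds shows up when comparing what is stored in the class dictionary with what attribute access returns. This is a small sketch with hypothetical class and attribute names; it does not show which cases CPython specializes.

```python
import types

class C:
    def method(self):
        return "method"

    @staticmethod
    def static():
        return "static"

    @classmethod
    def cls_method(cls):
        return cls.__name__

    @property
    def prop(self):
        return "prop"

obj = C()
raw = vars(C)
# Type of the object stored on the class, before the descriptor protocol runs.
kinds = {name: type(raw[name]).__name__
         for name in ("method", "static", "cls_method", "prop")}
print(kinds)

# Result of instance attribute access, after the descriptor protocol runs.
print(isinstance(obj.method, types.MethodType))    # bound to obj
print(isinstance(obj.static, types.FunctionType))  # plain function, no self
print(obj.cls_method.__self__ is C)                # bound to the class
print(obj.prop)                                    # getter already called
```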
75.17 CALL Specialization
Modern CPython specializes call instructions.
Common specialized cases include:
Python function with positional arguments
builtin C function with vectorcall
method descriptor
bound method
class construction

Specialization lets the CALL instruction skip general dispatch once the call site becomes stable.
A hot call site may repeatedly see the same callable type and same argument shape.
That pattern is exactly what adaptive specialization exploits.
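On CPython 3.11 and later, `dis` can display the specialized form of instructions through its `adaptive=True` option. The sketch below warms up a call site and then lists its call-related opcode names. Whether a specialized name such as CALL_PY_EXACT_ARGS appears depends on the interpreter version and its warm-up thresholds, so the code reports what it finds without assuming any particular result. `hot_loop` and `call_opnames` are hypothetical names.

```python
import dis
import sys

def add(a, b):
    return a + b

def hot_loop(n):
    total = 0
    for i in range(n):
        total = add(total, i)   # stable call site: same callee, same shape
    return total

# Run the loop enough times for adaptive specialization to have a chance.
result = hot_loop(10_000)

def call_opnames(func):
    # adaptive=True is only accepted by dis on CPython 3.11 and later.
    kwargs = {"adaptive": True} if sys.version_info >= (3, 11) else {}
    return [ins.opname
            for ins in dis.get_instructions(func, **kwargs)
            if "CALL" in ins.opname]

observed = call_opnames(hot_loop)
print(observed)
```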
75.18 PRECALL
In CPython 3.11, PRECALL exists to prepare and specialize calls; later releases fold this work into CALL.
It gives the interpreter a separate point to optimize before the final call happens.
Conceptually:
PRECALL:
    inspect callable shape
    specialize call path
    prepare cache state
CALL:
    execute call

Separating preparation from execution gives CPython more room to optimize call sequences.
75.19 Vectorcall and Classes
Classes are callable.
obj = C(1, 2)

This involves object construction:
call metaclass __call__
allocate instance through __new__
initialize instance through __init__
return instance

Fast paths can help for common class construction patterns, but class calls are still more complex than ordinary function calls.
Class creation and instance construction involve descriptors, metaclasses, allocation, and initialization.
75.20 __call__ Objects
Any object can be callable if its type defines __call__.
Example:
class Adder:
    def __call__(self, x):
        return x + 1

add_one = Adder()
add_one(10)

The generic callable protocol handles this.
Fast paths may help if the call target is stable, but the interpreter must still honor normal attribute lookup and descriptor semantics for __call__.
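One lookup rule the interpreter has to honor is that call syntax finds `__call__` on the type, not on the instance. The sketch below demonstrates this by shadowing `__call__` with an instance attribute, which the call expression ignores.

```python
class Adder:
    def __call__(self, x):
        return x + 1

add_one = Adder()
# An instance attribute named __call__ does not affect add_one(...).
add_one.__call__ = lambda x: -1

via_call_syntax = add_one(10)        # uses type(add_one).__call__
via_attribute = add_one.__call__(10) # explicit attribute access finds the lambda

print(via_call_syntax, via_attribute)
```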
75.21 Star Arguments
Star arguments force more general handling.
f(*args)

CPython must unpack args into positional arguments.

f(**kwargs)

must unpack a mapping into keyword arguments.

f(*args, **kwargs)

may require merging and duplicate checking.
These forms often reduce fast-path opportunities because the final argument shape is known only at runtime.
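The merging and duplicate checking can be observed directly. In this small sketch the argument shape is only known once `args` and `kwargs` are unpacked at runtime; the error text shown in the comment may vary slightly between versions.

```python
def f(a, b, c=0):
    return (a, b, c)

args = (1, 2)
kwargs = {"c": 3}
merged = f(*args, **kwargs)     # positional and keyword parts are combined

try:
    f(*args, **{"a": 9})        # a is supplied both positionally and by name
except TypeError as exc:
    duplicate_error = str(exc)  # e.g. "... got multiple values for argument 'a'"

print(merged, duplicate_error)
```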
75.22 Argument Error Reporting
Fast paths still need precise error behavior.
Example:
def f(a, b):
    pass

f(1)

must raise an error like:

missing required positional argument

Similarly:

f(1, 2, 3)

must report too many arguments.
Keyword errors must also be exact:
unexpected keyword argument
multiple values for argument
missing keyword-only argument

The fast path cannot produce vague or incorrect errors.
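These errors are all reported as TypeError with a message that names the problem. The sketch below collects the messages for three bad calls; `error_for` is a hypothetical helper, and the exact wording is an implementation detail that can change between releases.

```python
def f(a, b):
    pass

def error_for(call):
    # Run a zero-argument callable and return the TypeError message it raises.
    try:
        call()
    except TypeError as exc:
        return str(exc)
    return None

too_few = error_for(lambda: f(1))
too_many = error_for(lambda: f(1, 2, 3))
unexpected = error_for(lambda: f(1, 2, z=3))

print(too_few)
print(too_many)
print(unexpected)
```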
75.23 Recursion Checks
Each Python call can increase recursion depth.
CPython must check recursion limits to prevent uncontrolled C stack growth.
Example:
def f():
    return f()

f()

Eventually raises:

RecursionError

Fast call paths must preserve recursion checks.
The call may be optimized, but entering Python execution still needs recursion accounting.
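The limit is observable and catchable from Python. This sketch triggers unbounded recursion and records the resulting exception; the depth at which it fires depends on `sys.getrecursionlimit()` and on how deep the calling code already is.

```python
import sys

def recurse(n):
    return recurse(n + 1)

limit = sys.getrecursionlimit()

try:
    recurse(0)
except RecursionError as exc:
    caught = type(exc).__name__

print(limit, caught)
```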
75.24 Tracing and Profiling
Tracing and profiling hooks complicate calls.
Tools may need events for:
call
return
exception
line
opcode

When tracing or profiling is active, CPython may take slower paths so it can produce correct events.
Fast call paths are most effective when tracing is disabled.
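A profiling hook makes the call and return events visible. The sketch below installs a hook with `sys.setprofile`, records the events reported for one function, and then removes the hook. `profiler` and `target` are hypothetical names, and the filter on the code name keeps unrelated events out of the list.

```python
import sys

events = []

def profiler(frame, event, arg):
    # Record only events that belong to the function under observation.
    if frame.f_code.co_name == "target":
        events.append(event)

def target():
    return 42

sys.setprofile(profiler)
try:
    value = target()
finally:
    sys.setprofile(None)   # always remove the hook again

print(value, events)
```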
75.25 Reference Ownership in Calls
Calls are dense with reference count operations.
The interpreter must manage:
callable reference
argument references
default references
temporary references
return value reference
exception state references
frame references

A correct fast path must be as careful as the generic path.
A small mistake can create:
memory leak
double free
use after free
incorrect object lifetime

Call optimization is therefore performance-sensitive and correctness-sensitive.
75.26 Return Value Handling
A call returns either:
new reference to result
exception indicator

The caller must:
push result on stack
or propagate exception

For Python functions, returning from the callee frame produces the result.
For C functions, the C API convention is usually:
return PyObject * on success
return NULL with exception set on failure

The call instruction must handle both conventions correctly.
75.27 Calls and Exceptions
Any call can raise.
f()

may raise because:
f explicitly raises
argument binding fails
descriptor lookup fails
__call__ fails
allocation fails
recursion limit is exceeded
C extension reports error

Fast paths must always include a clean exception exit path.
That path must unwind stack state and maintain frame invariants.
75.28 Calls and Inline Caches
Call specialization uses inline caches.
A call site may cache:
callable kind
call target
argument count
keyword shape
function version
type version
descriptor result

The cache allows the interpreter to say:
this site is still calling the same kind of thing
use the fast path

If the callable changes, validation fails and CPython falls back.
75.29 Calls and Global Lookup
Many calls start with a global lookup:
len(xs)
print(x)
range(n)
isinstance(x, T)

Optimization often involves two stages:
LOAD_GLOBAL specialization
CALL specialization

For len(xs):
cache builtin len
recognize builtin call shape
call C implementation efficiently

The combined effect is larger than either optimization alone.
75.30 Calls and Attribute Lookup
Method calls combine attribute lookup and call execution.
obj.append(x)

The optimized path may include:
specialized LOAD_METHOD
specialized PRECALL
specialized CALL

For common built-in types such as list, dict, and str, this can be highly optimized.
Example:
items.append(x)

is a very common operation. CPython invests heavily in making this shape efficient.
75.31 Calls and Frame Allocation
Frame allocation used to be a significant cost.
Modern CPython reduces this cost through internal frame representations and frame object laziness.
A Python call needs execution state, but it does not always need a full heap-allocated Python frame object visible to user code.
A full frame object may be materialized only when needed, such as for:
tracebacks
inspect.currentframe()
debuggers
profilers
generators and coroutines

This reduces ordinary call overhead.
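Asking for the current frame is one way user code forces a frame object into existence. The sketch below uses `sys._getframe`, a CPython-specific function, to read the callee's own frame and its caller's frame. It shows only that the frame object is available on request, not how or when the interpreter allocates it.

```python
import sys

def callee():
    frame = sys._getframe()   # requests a frame object for this call
    return frame.f_code.co_name, frame.f_back.f_code.co_name

def caller():
    return callee()

names = caller()
print(names)
```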
75.32 Inlining
CPython generally does not inline Python functions in the classic compiler sense.
For example:
def add(a, b):
    return a + b

x = add(1, 2)

CPython does not normally replace the call with:

x = 1 + 2

Function inlining is difficult in Python because:
functions can be rebound
globals can change
defaults can change
closures exist
tracing expects frames
introspection expects call structure
exceptions need correct tracebacks

Instead, CPython focuses on reducing call overhead rather than eliminating calls entirely.
75.33 Tail Calls
CPython does not perform general tail call optimization.
Example:
def fact(n, acc=1):
    if n == 0:
        return acc
    return fact(n - 1, acc * n)

This still consumes one Python call frame per recursive call.
Reasons include:
debuggability
traceback preservation
introspection behavior
semantic expectations
implementation simplicity

Function call fast paths reduce overhead, but they do not turn recursive Python into loops.
75.34 Calls to C Extensions
C extension calls can be fast when they use modern calling conventions.
Older extension functions may use:
METH_VARARGS

which receives a tuple of arguments.
Newer forms can use:
METH_FASTCALL
METH_FASTCALL | METH_KEYWORDS

These integrate better with vectorcall-style argument passing.
Extension authors can significantly affect call overhead by choosing the right calling convention.
75.35 Bound Methods
A bound method object stores:
underlying function
bound self object

Example:
m = obj.method
m(1)

Here the bound method object is observable and must exist.
But for immediate calls:
obj.method(1)

CPython can often avoid allocating it.
This distinction is important:
obj.method

requires an object result.

obj.method()

may use a call-specific optimization.
75.36 Constructors
Calling a class:
C(x)normally performs:
type.__call__
C.__new__
C.__init__

Fast paths are limited by Python’s object construction semantics.
Custom metaclasses, custom __new__, and custom __init__ can all affect behavior.
Still, common built-in constructors such as:
list(x)
dict(x)
tuple(x)
int(x)
str(x)

often use efficient C-level call paths.
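The construction order for an ordinary class can be made visible by logging from `__new__` and `__init__`. This is a minimal sketch with a hypothetical class `Point`; it shows the semantics any constructor fast path has to preserve, not the fast path itself.

```python
log = []

class Point:
    def __new__(cls, x):
        log.append("__new__")          # allocation step
        return super().__new__(cls)

    def __init__(self, x):
        log.append("__init__")         # initialization step
        self.x = x

p = Point(5)                           # dispatched through type.__call__
print(log, p.x)
```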
75.37 Fast Paths Are Conservative
CPython call fast paths are conservative.
They optimize when assumptions are cheap to validate and safe.
They fall back when:
call target changes
argument shape changes
tracing is active
descriptor behavior is unusual
callable type is unknown
keyword handling is complex

This is the central rule:
fast path for common case
generic path for everything else

75.38 Performance Guidelines From Internals
Understanding call fast paths suggests practical Python guidelines:
| Pattern | Reason |
|---|---|
| Prefer simple positional calls in hot loops | Cheapest argument binding |
| Avoid unnecessary wrappers in inner loops | Each wrapper adds another call |
| Move repeated dynamic lookup out of hot loops when useful | Reduces lookup plus call overhead |
| Use builtins directly where appropriate | Many builtins have efficient C paths |
| Avoid excessive *args and **kwargs in hot paths | Forces general argument handling |
| Keep call sites type-stable | Helps adaptive specialization |
These are not rigid rules. They are performance heuristics.
Correct design comes first. Optimize only where measurement shows call overhead matters.
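The advice to measure can be followed with the standard `timeit` module. The sketch below compares a loop that looks up `out.append` on every iteration with one that hoists the lookup out of the loop. The two functions are hypothetical examples, and no particular speedup is claimed: on recent CPython versions with specialized method calls the difference may be small or absent, which is exactly why measuring matters.

```python
import timeit

def build_direct(n):
    out = []
    for i in range(n):
        out.append(i * i)      # attribute lookup plus call on each iteration
    return out

def build_hoisted(n):
    out = []
    append = out.append        # bound method created once, outside the loop
    for i in range(n):
        append(i * i)
    return out

# Time both variants on the same workload; results vary by machine and version.
t_direct = timeit.timeit(lambda: build_direct(1000), number=200)
t_hoisted = timeit.timeit(lambda: build_hoisted(1000), number=200)
print(f"direct:  {t_direct:.4f}s")
print(f"hoisted: {t_hoisted:.4f}s")
```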
75.39 Reading Call Paths in CPython
Important areas to study:
| Area | Purpose |
|---|---|
| Python/ceval.c | Executes call opcodes |
| Objects/call.c | Generic and vectorcall helpers |
| Include/cpython/abstract.h | Public call APIs |
| Include/internal/pycore_call.h | Internal call helpers |
| Objects/methodobject.c | Builtin function and method objects |
| Objects/funcobject.c | Python function objects |
| Objects/typeobject.c | Type calls and descriptor machinery |
The exact layout changes across releases, but these areas contain the main machinery.
75.40 Mental Model
A useful model:
A Python call is argument layout plus callable dispatch plus frame or C entry.

Fast paths optimize all three:

argument layout:
    avoid tuple and dict construction
callable dispatch:
    recognize common callable kinds
execution entry:
    use direct Python frame setup or C vectorcall

The generic call protocol remains available for every unusual case.
75.41 Chapter Summary
Function call fast paths reduce the overhead of one of Python’s most frequent operations.
They rely on:
vectorcall
fast locals
method-call optimization
bound method avoidance
call-site specialization
inline caches
efficient frame setup
C fastcall conventions

These optimizations do not remove Python’s dynamic call semantics. They make common call shapes cheaper while preserving correctness for descriptors, keyword arguments, *args, **kwargs, tracing, exceptions, recursion checks, and the C API.