PEP 590 vectorcall protocol, Py_TPFLAGS_HAVE_VECTORCALL (spelled _Py_TPFLAGS_HAVE_VECTORCALL in CPython 3.8), and stack-based argument passing for low-overhead calls.
Vectorcall is CPython’s fast calling convention for many callable objects.
Its purpose is to avoid building temporary argument containers for common function calls. Older generic calls often packaged positional arguments into a tuple and keyword arguments into a dictionary. That representation is flexible, but expensive. Vectorcall passes arguments as a compact C array of PyObject * pointers, often pointing directly into the interpreter stack.
Conceptually:
generic call:
callable(args_tuple, kwargs_dict)
vectorcall:
callable(args_array, nargs, kwnames)
The result is lower allocation pressure, fewer reference count operations, and faster dispatch for common calls.
76.1 The Problem Vectorcall Solves
Consider:
f(a, b, c)
A generic call representation may need to create:
args_tuple = (a, b, c)
kwargs_dict = NULL
For keyword calls:
f(a, b, limit=10)
it may need:
args_tuple = (a, b)
kwargs_dict = {"limit": 10}
That creates overhead before the callee does any useful work.
In hot code, this cost is significant:
for item in items:
    total += f(item)
If f(item) runs millions of times, allocating temporary argument containers millions of times is wasteful.
Vectorcall avoids that packaging when the callable supports it.
76.2 Core Calling Shape
At the C level, vectorcall is based on a function pointer type roughly shaped like:
typedef PyObject *(*vectorcallfunc)(
    PyObject *callable,
    PyObject *const *args,
    size_t nargsf,
    PyObject *kwnames
);
The arguments mean:
| Parameter | Meaning |
|---|---|
| callable | The object being called |
| args | Pointer to an array of positional and keyword argument values |
| nargsf | Encoded positional argument count, plus possible flag bits |
| kwnames | Tuple of keyword names, or NULL when no keywords are present |
The array does not own the objects. It is a view over existing references, commonly values already on the interpreter stack.
76.3 Positional Arguments
For a call:
f(x, y, z)
the vectorcall argument array is conceptually:
args[0] = x
args[1] = y
args[2] = z
nargs = 3
kwnames = NULL
No tuple needs to be created.
The callee receives a pointer and a count. It reads arguments directly from the array.
This is the fastest and simplest vectorcall case.
76.4 Keyword Arguments
For a call:
f(x, y, limit=10, strict=True)
the vectorcall layout is conceptually:
args[0] = x
args[1] = y
args[2] = 10
args[3] = True
nargs = 2
kwnames = ("limit", "strict")
The keyword names tuple describes the trailing keyword values.
The first nargs entries are positional arguments. The remaining entries correspond to names in kwnames.
keyword value for kwnames[0] is args[nargs + 0]
keyword value for kwnames[1] is args[nargs + 1]
This layout avoids building a keyword dictionary for many calls.
A dictionary is still needed when the callee accepts **kwargs or when generic APIs demand dictionary form.
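The layout above can be modelled in plain Python. The sketch below is an illustration only: split_vectorcall_args is a hypothetical helper, not a CPython API, and it rebuilds the tuple-and-dict form that a dictionary-based callee would need.

```python
def split_vectorcall_args(args, nargs, kwnames):
    """Split a vectorcall-style flat array into (positional, keywords).

    Hypothetical helper for illustration; not part of CPython.
    """
    positional = tuple(args[:nargs])
    keywords = {}
    # kwnames[i] names the value stored at args[nargs + i].
    for i, name in enumerate(kwnames or ()):
        keywords[name] = args[nargs + i]
    return positional, keywords


# f(x, y, limit=10, strict=True) laid out as one flat array.
flat = ["x", "y", 10, True]
positional, keywords = split_vectorcall_args(flat, 2, ("limit", "strict"))
print(positional, keywords)
```

A purely positional call passes kwnames as None, in which case the keyword dictionary stays empty.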
76.5 nargsf
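The nargsf parameter is not just a plain count.
It encodes:
number of positional arguments
optional call flags
The positional count is recovered with PyVectorcall_NARGS(nargsf). The flag that matters in practice is PY_VECTORCALL_ARGUMENTS_OFFSET. When the caller sets it, the callee may temporarily overwrite args[-1], provided it restores the original value before returning.
This lets a callee such as a bound method prepend self in place, so CPython can use stack layouts efficiently without moving elements.
Conceptually:
args[-1] is a scratch slot the callee may borrow when PY_VECTORCALL_ARGUMENTS_OFFSET is set
The flag occupies the top bit of nargsf. The design goal is clear: avoid stack shuffling and allocation.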
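The encoding can be modelled with integer arithmetic. This is a sketch under stated assumptions: it assumes a 64-bit size_t, the constant mirrors the C macro PY_VECTORCALL_ARGUMENTS_OFFSET (the top bit of nargsf), and the two Python functions are hypothetical stand-ins for what PyVectorcall_NARGS does in C.

```python
SIZE_T_BITS = 64  # assumption: 64-bit platform

# Mirrors the C macro: the highest bit of a size_t.
PY_VECTORCALL_ARGUMENTS_OFFSET = 1 << (SIZE_T_BITS - 1)


def vectorcall_nargs(nargsf):
    """Strip the flag bit and return the positional argument count."""
    return nargsf & ~PY_VECTORCALL_ARGUMENTS_OFFSET


def has_arguments_offset(nargsf):
    """Report whether the caller allows the callee to borrow args[-1]."""
    return bool(nargsf & PY_VECTORCALL_ARGUMENTS_OFFSET)


# Three positional arguments, with the offset flag set by the caller.
nargsf = 3 | PY_VECTORCALL_ARGUMENTS_OFFSET
print(vectorcall_nargs(nargsf), has_arguments_offset(nargsf))
```

Keeping the flag in the same word as the count means the calling signature needs no extra parameter.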
76.6 Stack-Based Argument Passing
The CPython evaluation loop already stores values on a stack.
For:
f(a, b)
the stack near the call site may look like:
f
a
b
A slow generic call would copy a and b into a tuple.
Vectorcall can pass a pointer into this stack region:
args -> [a, b]
nargs = 2
The callable consumes the argument view.
After the call returns, the interpreter removes the callable and arguments from the stack and pushes the result.
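The push, call, and pop sequence can be imitated with a Python list standing in for the evaluation stack. This is a toy model, not the interpreter's code: call_from_stack is a hypothetical name, and the slice copies the arguments, whereas the C implementation passes a pointer into the stack.

```python
def call_from_stack(stack, nargs):
    """Call the callable sitting just below its nargs arguments."""
    base = len(stack) - nargs        # index of the first argument
    callable_obj = stack[base - 1]   # callable sits just below the arguments
    # In C the callee receives a pointer to stack[base]; the slice here
    # copies, which is a limitation of this model, not of vectorcall.
    result = callable_obj(*stack[base:])
    del stack[base - 1:]             # remove callable and arguments
    stack.append(result)             # push the result
    return result


stack = ["unrelated", max, 3, 7]
call_from_stack(stack, 2)
print(stack)
```

Values below the callable are left untouched, which matches the description of the interpreter removing only the callable and its arguments.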
76.7 Why It Is Called Vectorcall
The word “vector” refers to passing arguments as a vector: a contiguous array of object pointers.
It does not mean SIMD vectorization.
Vectorcall is about call ABI layout inside CPython, not CPU vector instructions.
The “vector” is:
PyObject *const *args
a pointer to consecutive Python object pointers.
76.8 Which Objects Support Vectorcall
Many callable types can support vectorcall:
Python functions
builtin functions
builtin methods
method descriptors
classes
some C extension callables
partial objects
Support depends on the object’s type and its internal call slot.
A callable that does not support vectorcall still works through the generic call protocol.
The fast path is optional.
Correctness does not depend on vectorcall.
76.9 Python Function Vectorcall
Python functions support vectorcall.
For:
def add(a, b):
    return a + b
add(1, 2)
vectorcall helps CPython pass arguments into the new frame efficiently.
The callee still executes Python bytecode. Vectorcall does not turn Python code into C code. It reduces the overhead of entering the function.
The function call still needs:
argument binding
frame initialization
recursion checks
bytecode execution
return handling
But it can skip temporary argument tuple construction in common cases.
76.10 Builtin Function Vectorcall
Builtins benefit strongly from vectorcall.
Example:
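len(xs)
A vectorcall-aware builtin receives arguments directly.
Conceptually (the real len uses the single-argument METH_O convention, so this only illustrates the fastcall shape):
static PyObject *
builtin_len(PyObject *self, PyObject *const *args, Py_ssize_t nargs)
{
    if (nargs != 1) {
        PyErr_SetString(PyExc_TypeError, "len() takes exactly one argument");
        return NULL;
    }
    /* PyObject_Length returns -1 with an exception set on failure. */
    Py_ssize_t size = PyObject_Length(args[0]);
    if (size < 0) {
        return NULL;
    }
    return PyLong_FromSsize_t(size);
}
No Python frame is created. No argument tuple is needed.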
This is why builtins with simple signatures can be much cheaper than equivalent Python wrappers.
76.11 Method Calls and Vectorcall
Method calls combine method lookup and calling convention.
For:
obj.method(x)
CPython tries to avoid creating a temporary bound method object.
The optimized shape is:
function
self
x
The call then proceeds through vectorcall with self inserted as the first argument.
Conceptually:
obj.method(x)
becomes close to:
C.method(obj, x)
internally, when semantics allow it.
This avoids allocation of:
bound method object
which would otherwise contain:
function reference
self reference
76.12 Bound Method Vectorcall
A bound method object can also use vectorcall.
Example:
m = obj.method
m(x)
Here, the bound method object is observable and must exist.
But when called, it can still pass arguments efficiently:
self
x
The method object stores self, then vectorcall builds an argument view that includes it.
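The pairing of function and self inside a bound method is visible from Python. The short example below uses a made-up Greeter class; it shows only the observable attributes and the equivalence of the two call spellings, not the interpreter's internal fast path.

```python
class Greeter:
    def greet(self, name):
        return f"hello {name}"


g = Greeter()
bound = g.greet  # the bound method object must exist here

# A bound method stores the underlying function and the instance.
same_function = bound.__func__ is Greeter.greet
same_self = bound.__self__ is g

# Calling through the bound method matches calling the function with self first.
via_bound = bound("Ada")
via_class = Greeter.greet(g, "Ada")
print(same_function, same_self, via_bound == via_class)
```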
76.13 PyObject_Vectorcall
CPython exposes helper APIs for vectorcall.
Conceptually:
PyObject *PyObject_Vectorcall(
    PyObject *callable,
    PyObject *const *args,
    size_t nargsf,
    PyObject *kwnames
);
This calls a Python object using the vectorcall convention when possible.
There are also helper APIs for method calls and calls with known argument layouts.
Extension code can use these helpers to avoid unnecessary tuple and dict allocation.
76.14 Relationship to PyObject_Call
The traditional generic API is:
PyObject *PyObject_Call(
    PyObject *callable,
    PyObject *args,
    PyObject *kwargs
);
where:
args is a tuple
kwargs is a dict or NULL
This remains important because it is flexible and stable.
Vectorcall is the faster path when the caller already has arguments in array form.
Comparison:
| API | Argument representation | Typical cost |
|---|---|---|
| PyObject_Call | Tuple plus dict | More allocation and normalization |
| PyObject_Vectorcall | Array plus keyword-name tuple | Less allocation in common cases |
76.15 Vectorcall and C Extensions
C extension authors can expose vectorcall support.
This matters for performance-sensitive extension modules.
Older calling conventions include:
METH_VARARGS
where arguments arrive as a tuple.
Faster conventions include:
METH_FASTCALL
METH_FASTCALL | METH_KEYWORDS
These receive arguments in array form, closer to vectorcall.
A C function declared with fastcall-style conventions can avoid tuple creation during calls.
76.16 METH_FASTCALL
A fastcall C function is conceptually shaped like:
static PyObject *
func(PyObject *self, PyObject *const *args, Py_ssize_t nargs)
{
    ...
}
This handles positional arguments only.
For:
func(a, b)
the function receives:
args[0] = a
args[1] = b
nargs = 2
The function must validate the argument count and types.
76.17 METH_FASTCALL | METH_KEYWORDS
For keyword support, a C function can use:
static PyObject *
func(PyObject *self,
     PyObject *const *args,
     Py_ssize_t nargs,
     PyObject *kwnames)
{
    ...
}
This receives positional values and keyword values in the same array, with keyword names separated into kwnames.
This is the C extension analogue of vectorcall keyword layout.
76.18 Argument Clinic
Many CPython builtins use Argument Clinic.
Argument Clinic generates argument parsing code for C functions.
It can generate fastcall-compatible wrappers, which gives builtins efficient signatures without hand-writing all parsing logic.
It also helps keep the following consistent:
C implementation
Python signature
documentation
error messages
For CPython internals work, Argument Clinic matters because many builtin call paths go through generated wrappers.
76.19 Vectorcall and functools.partial
functools.partial stores a callable plus pre-bound arguments.
Example:
from functools import partial
two_to_the = partial(pow, 2)
two_to_the(10)
This means:
pow(2, 10)
A vectorcall-aware partial can combine stored arguments and new arguments efficiently.
It still may need temporary storage for the merged argument view, but it avoids heavier generic call paths where possible.
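The merge a partial performs can be spelled out using its public attributes func, args, and keywords. The helper merged_call below is hypothetical and goes through ordinary argument unpacking, so it shows the semantics of the merge rather than the C fast path.

```python
from functools import partial

two_to_the = partial(pow, 2)


def merged_call(p, *new_args, **new_kwargs):
    """Call p.func with stored arguments first, then the new ones.

    Hypothetical helper that spells out what calling a partial means.
    """
    merged_kwargs = {**p.keywords, **new_kwargs}
    return p.func(*p.args, *new_args, **merged_kwargs)


print(two_to_the.func is pow, two_to_the.args, two_to_the.keywords)
print(merged_call(two_to_the, 10), two_to_the(10))
```

Stored positional arguments always come before the new ones, while new keyword arguments override stored keywords of the same name.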
76.20 Vectorcall and Class Construction
Classes are callable.
obj = C(a, b)
The call path goes through the type’s call behavior:
metaclass call
__new__
__init__
Vectorcall can optimize parts of this path, especially argument passing.
However, construction remains semantically rich:
custom metaclass
custom __new__
custom __init__
descriptor behavior
allocation
initialization
So class calls are usually more complex than simple function calls.
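The order of the construction steps can be observed by logging each hook. The classes Meta and C below are made up for the illustration; the log shows Python-level ordering only and says nothing about which steps use vectorcall internally.

```python
calls = []


class Meta(type):
    def __call__(cls, *args, **kwargs):
        calls.append("metaclass __call__")
        # type.__call__ runs __new__ and then __init__.
        return super().__call__(*args, **kwargs)


class C(metaclass=Meta):
    def __new__(cls, a, b):
        calls.append("__new__")
        return super().__new__(cls)

    def __init__(self, a, b):
        calls.append("__init__")
        self.a, self.b = a, b


obj = C(1, 2)
print(calls)
```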
76.21 Vectorcall and Descriptors
Descriptor binding interacts with vectorcall.
A function stored in a class is a descriptor. When accessed through an instance, it binds self.
class C:
    def f(self, x):
        return x + 1
c = C()
c.f(10)
The optimized method path avoids creating a bound method object, but it must still preserve descriptor semantics.
For custom descriptors:
class D:
    def __get__(self, obj, typ):
        ...
CPython often needs the generic path because binding behavior is user-defined.
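Both kinds of binding can be exercised by calling __get__ explicitly. In this sketch Doubler and Holder are hypothetical classes; the example demonstrates descriptor semantics as seen from Python, which any optimized path has to reproduce.

```python
class C:
    def f(self, x):
        return x + 1


c = C()
plain_function = C.__dict__["f"]       # the function as stored in the class
bound = plain_function.__get__(c, C)   # what attribute access on c performs
print(bound(10), c.f(10))


class Doubler:
    """Custom descriptor: its binding behavior is entirely user-defined."""

    def __get__(self, obj, typ=None):
        return lambda x: 2 * x


class Holder:
    g = Doubler()


print(Holder().g(5))
```

The function case has a fixed, known binding rule, which is why it can be optimized. The Doubler case can return anything, so the interpreter has to run __get__ to find out what is being called.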
76.22 Vectorcall and Keyword Semantics
Vectorcall improves layout, but it does not simplify Python’s keyword rules away.
The callee must still detect:
unexpected keyword arguments
duplicate values
missing required arguments
keyword-only parameters
positional-only violations
Example:
def f(a, /, b):
    return a + b
f(a=1, b=2)
must still fail because a is positional-only.
Vectorcall optimizes transport of arguments, not the semantic rules of binding.
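These binding checks are easy to confirm from Python. The helper raises_type_error is a hypothetical convenience for this example; each rejected call raises TypeError regardless of how the arguments were transported.

```python
def f(a, /, b):
    return a + b


def raises_type_error(call):
    """Return True if calling the zero-argument callable raises TypeError."""
    try:
        call()
    except TypeError:
        return True
    return False


positional_only_violation = raises_type_error(lambda: f(a=1, b=2))
duplicate_value = raises_type_error(lambda: f(1, 2, b=3))
unexpected_keyword = raises_type_error(lambda: f(1, 2, c=3))
valid_result = f(1, b=2)
print(positional_only_violation, duplicate_value, unexpected_keyword, valid_result)
```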
76.23 Vectorcall and *args
Calls using *args are less direct:
f(*args)
CPython must expand the iterable or tuple into call arguments.
If args is already a tuple, CPython can often use its internal array of items efficiently.
If it is an arbitrary iterable, CPython must first materialize the positional arguments.
Vectorcall can be used after expansion, but it cannot avoid the semantic need to expand.
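The need to materialize an arbitrary iterable is observable: a generator passed with * is fully consumed before the callee runs. The function name collect is made up for this small demonstration.

```python
def collect(*received):
    return received


squares = (n * n for n in range(3))
result = collect(*squares)

# The generator was drained while the call arguments were being built.
leftover = next(squares, None)
print(result, leftover)
```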
76.24 Vectorcall and **kwargs
For:
f(**kwargs)
CPython must expand a mapping into keyword names and values.
This is more expensive than a static keyword call:
f(x=1, y=2)
because the keys and values are dynamic.
Vectorcall may still be the final call convention, but the keyword mapping must first be processed.
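Two visible effects of that processing are shown below: the callee receives a fresh dictionary rather than the caller's mapping, and keys are validated as strings. The function name collect_keywords is made up for the example.

```python
def collect_keywords(**received):
    return received


options = {"x": 1, "y": 2}
received = collect_keywords(**options)
print(received == options, received is options)

# Keys must be strings; the check happens while the mapping is expanded.
try:
    collect_keywords(**{1: "bad"})
except TypeError:
    non_string_key_rejected = True
else:
    non_string_key_rejected = False
print(non_string_key_rejected)
```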
76.25 Vectorcall and Introspection
Python supports rich introspection.
Examples:
inspect.signature(f)
f.__defaults__
f.__kwdefaults__
f.__code__
Vectorcall must remain compatible with this model.
It changes how calls are transported internally. It does not change the callable’s Python-visible signature.
A Python function called through vectorcall still behaves like the same Python function.
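The introspection attributes listed above can be read back directly. The function scaled_sum is a made-up example; nothing in its visible signature depends on the calling convention used to invoke it.

```python
import inspect


def scaled_sum(a, b=2, *, c=3):
    return (a + b) * c


signature_text = str(inspect.signature(scaled_sum))
print(signature_text)
print(scaled_sum.__defaults__, scaled_sum.__kwdefaults__)
print(scaled_sum.__code__.co_argcount, scaled_sum.__code__.co_kwonlyargcount)
```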
76.26 Vectorcall and Error Messages
Call errors must remain precise.
Example:
def f(a, b):
    pass
f(1)
must report a missing argument.
Example:
f(1, 2, 3)
must report too many arguments.
Vectorcall fast paths still need exact error behavior.
This is a major reason call optimization is complicated: the fast path must produce the same visible errors as the slow path.
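The messages can be captured and inspected. The helper call_error is hypothetical; the exact wording of the messages belongs to CPython and may vary slightly between versions, so the example prints them rather than relying on a fixed string.

```python
def f(a, b):
    pass


def call_error(*args):
    """Return the TypeError message from calling f, or None on success."""
    try:
        f(*args)
    except TypeError as exc:
        return str(exc)
    return None


too_few = call_error(1)
too_many = call_error(1, 2, 3)
just_right = call_error(1, 2)
print(too_few)
print(too_many)
```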
76.27 Vectorcall and Reference Counts
The argument array usually contains borrowed references or stack-owned references.
The vectorcall callee must not assume ownership unless documented.
Return values follow normal CPython conventions:
success:
    return new reference
failure:
    return NULL and set exception
Callers and callees must carefully follow ownership rules.
Vectorcall reduces temporary object creation, but it does not remove reference-count discipline.
76.28 Vectorcall and Recursion Checks
Vectorcall does not bypass recursion protection.
A Python function called through vectorcall still enters Python execution.
Recursive code:
def f():
    return f()
f()
must still raise:
RecursionError
Fast calling convention and recursion accounting are separate concerns.
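The limit can be seen in action by counting how deep the recursion gets before the error. The counter and the name recurse are made up for this sketch; the exact depth reached depends on how many frames are already active.

```python
import sys

entered = [0]  # mutable counter shared with the recursive function


def recurse():
    entered[0] += 1
    return recurse()


try:
    recurse()
except RecursionError:
    hit_limit = True
else:
    hit_limit = False

print(hit_limit, entered[0] <= sys.getrecursionlimit())
```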
76.29 Vectorcall and Tracing
When tracing or profiling is active, CPython may need to produce call events.
Examples:
function call event
function return event
exception event
line event
Vectorcall still works, but the effective performance benefit may be smaller because tracing forces extra runtime work.
Fast paths are most visible when tracing and profiling are disabled.
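The events can be recorded with sys.settrace. This is a minimal sketch: the tracer keeps only events from the made-up function inc, and it restores whatever trace function was installed before.

```python
import sys

events = []


def tracer(frame, event, arg):
    # Record only events that belong to inc.
    if frame.f_code.co_name == "inc":
        events.append(event)
    return tracer  # keep tracing inside the frame


def inc(x):
    return x + 1


previous = sys.gettrace()
sys.settrace(tracer)
try:
    result = inc(1)
finally:
    sys.settrace(previous)

print(result, events)
```

Every recorded event is a Python-level callback, which is the extra work that shrinks the relative benefit of a faster calling convention.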
76.30 Vectorcall and the Stable ABI
Vectorcall touches C API and ABI design.
Some vectorcall details are CPython-specific. Extension authors targeting the limited API or stable ABI must be careful about which APIs and struct fields they rely on. For example, PyObject_Vectorcall is part of the stable ABI only from Python 3.12 onward.
The design goal is to expose useful fast call mechanisms while avoiding unnecessary commitment to internal layout details.
76.31 Why Vectorcall Matters
Vectorcall matters because calls are everywhere.
A small reduction in call overhead helps:
builtin-heavy code
method-heavy code
small helper functions
iterator pipelines
decorator wrappers
numeric dispatch into C extensions
standard library internals
It is not a magic optimization. It does not make Python bytecode execute like native code. It removes unnecessary packaging around calls.
That removal is valuable because it occurs on a very hot path.
76.32 Performance Example
Consider:
def inc(x):
    return x + 1
total = 0
for i in range(1_000_000):
    total += inc(i)
Each loop iteration performs a Python function call.
Vectorcall helps with entering inc, but the call still needs:
argument binding
frame setup
bytecode execution
return handling
Now compare:
total = 0
for i in range(1_000_000):
    total += i + 1
This avoids the function call entirely.
Vectorcall reduces call overhead. It does not make calls free.
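The two loops can be timed with timeit. The wrapper names with_call and inlined are made up, the iteration count is reduced to keep the run short, and the measured times depend on the machine and Python version, so no specific numbers are claimed here.

```python
import timeit


def inc(x):
    return x + 1


def with_call(n):
    total = 0
    for i in range(n):
        total += inc(i)  # one Python function call per iteration
    return total


def inlined(n):
    total = 0
    for i in range(n):
        total += i + 1   # same arithmetic, no call
    return total


n = 100_000
call_seconds = timeit.timeit(lambda: with_call(n), number=5)
inline_seconds = timeit.timeit(lambda: inlined(n), number=5)
print(f"with call: {call_seconds:.4f}s, inlined: {inline_seconds:.4f}s")
```

Both functions compute the same total, so any difference in time comes from the call itself.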
76.33 Extension Author Guidelines
For C extension authors, useful rules are:
| Pattern | Reason |
|---|---|
| Prefer fastcall-compatible signatures for hot callables | Avoid tuple allocation |
| Avoid unnecessary PyObject_Call when arguments are already in an array | Preserve fast layout |
| Use Argument Clinic where appropriate | Generates correct fast wrappers |
| Keep error handling exact | Fast paths must match Python semantics |
| Respect ownership rules | Vectorcall still uses normal reference counting |
These choices can materially affect extension performance.
76.34 Reading Vectorcall in CPython
Important source areas include:
| Area | Purpose |
|---|---|
| Objects/call.c | Generic call and vectorcall helpers |
| Objects/funcobject.c | Python function call behavior |
| Objects/methodobject.c | Builtin functions and method objects |
| Objects/typeobject.c | Type call and descriptor behavior |
| Include/cpython/abstract.h | Public call declarations |
| Include/internal/pycore_call.h | Internal call helpers |
| Python/ceval.c | Bytecode-level call execution |
When reading the source, track three things:
where the callable comes from
how arguments are laid out
which call slot or helper is used
76.35 Mental Model
A useful model:
Vectorcall passes arguments as a borrowed view over a contiguous array.
The core benefit:
do not build a tuple unless a tuple is needed
do not build a dict unless a dict is needed
do not move stack values unless movement is needed
The rest of Python’s call semantics remain intact.
76.36 Chapter Summary
Vectorcall is CPython’s fast internal calling convention.
It represents arguments as:
PyObject *const *args
nargsf
kwnames
rather than:
args tuple
kwargs dict
This reduces allocation and reference-count overhead on one of CPython’s hottest paths.
Vectorcall works with Python functions, builtins, methods, classes, and extension callables that support it. It preserves Python’s full call semantics, including keyword binding, descriptors, recursion checks, tracing, error messages, and reference ownership rules.