76. Vectorcall

PEP 590 vectorcall protocol, Py_TPFLAGS_HAVE_VECTORCALL, and stack-based argument passing for low-overhead calls.

Vectorcall is CPython’s fast calling convention for many callable objects.

Its purpose is to avoid building temporary argument containers for common function calls. Older generic calls often packaged positional arguments into a tuple and keyword arguments into a dictionary. That representation is flexible, but expensive. Vectorcall passes arguments as a compact C array of PyObject * pointers, often pointing directly into the interpreter stack.

Conceptually:

generic call:
    callable(args_tuple, kwargs_dict)

vectorcall:
    callable(args_array, nargs, kwnames)

The result is lower allocation pressure, fewer reference count operations, and faster dispatch for common calls.

76.1 The Problem Vectorcall Solves

Consider:

f(a, b, c)

A generic call representation may need to create:

args_tuple = (a, b, c)
kwargs_dict = NULL

For keyword calls:

f(a, b, limit=10)

it may need:

args_tuple = (a, b)
kwargs_dict = {"limit": 10}

That creates overhead before the callee does any useful work.

In hot code, this cost is significant:

for item in items:
    total += f(item)

If f(item) runs millions of times, allocating temporary argument containers millions of times is wasteful.

Vectorcall avoids that packaging when the callable supports it.

76.2 Core Calling Shape

At the C level, vectorcall is based on a function pointer type roughly shaped like:

typedef PyObject *(*vectorcallfunc)(
    PyObject *callable,
    PyObject *const *args,
    size_t nargsf,
    PyObject *kwnames
);

The arguments mean:

| Parameter | Meaning |
| --- | --- |
| callable | The object being called |
| args | Pointer to an array of positional and keyword argument values |
| nargsf | Encoded positional argument count, plus possible flag bits |
| kwnames | Tuple of keyword names, or NULL when no keywords are present |

The array does not own the objects. It is a view over existing references, commonly values already on the interpreter stack.

76.3 Positional Arguments

For a call:

f(x, y, z)

the vectorcall argument array is conceptually:

args[0] = x
args[1] = y
args[2] = z

nargs = 3
kwnames = NULL

No tuple needs to be created.

The callee receives a pointer and a count. It reads arguments directly from the array.

This is the fastest and simplest vectorcall case.

76.4 Keyword Arguments

For a call:

f(x, y, limit=10, strict=True)

the vectorcall layout is conceptually:

args[0] = x
args[1] = y
args[2] = 10
args[3] = True

nargs = 2
kwnames = ("limit", "strict")

The keyword names tuple describes the trailing keyword values.

The first nargs entries are positional arguments. The remaining entries correspond to names in kwnames.

keyword value for kwnames[0] is args[nargs + 0]
keyword value for kwnames[1] is args[nargs + 1]

This layout avoids building a keyword dictionary for many calls.

A dictionary is still needed when the callee accepts **kwargs or when generic APIs demand dictionary form.
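
The layout can be modelled in plain Python. The sketch below is only an illustration of the indexing rule, not CPython code; the helper name unpack_vectorcall is hypothetical, and a list stands in for the C array.

```python
def unpack_vectorcall(args, nargs, kwnames):
    """Split a vectorcall-style argument array into (positional, keywords).

    args    -- sequence standing in for the C array of PyObject * values
    nargs   -- number of leading positional values
    kwnames -- tuple of keyword names, or None when there are no keywords
    """
    names = kwnames or ()
    if len(args) != nargs + len(names):
        raise ValueError("array length must equal nargs + len(kwnames)")
    positional = tuple(args[:nargs])
    # The value for kwnames[i] lives at args[nargs + i].
    keywords = {name: args[nargs + i] for i, name in enumerate(names)}
    return positional, keywords


# Layout for f(x, y, limit=10, strict=True), with strings standing in for x and y.
positional, keywords = unpack_vectorcall(["x", "y", 10, True], 2, ("limit", "strict"))
print(positional)
print(keywords)
```

Building the dictionary here is exactly the work that vectorcall lets a callee skip when it can match names against kwnames directly.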

76.5 nargsf

The nargsf parameter is not just a plain count.

It encodes:

number of positional arguments
optional call flags

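The most important flag is PY_VECTORCALL_ARGUMENTS_OFFSET. When the caller sets it, the callee is allowed to temporarily overwrite args[-1], the slot just before the first argument, as long as it restores the original value before returning.

This lets a bound method call prepend self by writing it into args[-1] and calling the underlying function with args - 1, without allocating a new array or moving elements.

Conceptually:

args[-1] is scratch space for the callee when the flag is set

Because of the flag bit, the positional count must be extracted with PyVectorcall_NARGS(nargsf) rather than read from nargsf directly. The design goal is clear: avoid stack shuffling and allocation.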
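
The encoding can be sketched in Python. This is a model under stated assumptions, not the C API: it assumes the flag occupies the top bit of a size_t, as in CPython's PY_VECTORCALL_ARGUMENTS_OFFSET, and the function name vectorcall_nargs is a hypothetical stand-in for the C macro PyVectorcall_NARGS.

```python
import ctypes

# Width of size_t on this platform, in bits.
SIZE_T_BITS = 8 * ctypes.sizeof(ctypes.c_size_t)

# Assumed to mirror CPython's definition: the highest bit of a size_t.
PY_VECTORCALL_ARGUMENTS_OFFSET = 1 << (SIZE_T_BITS - 1)


def vectorcall_nargs(nargsf):
    """Strip the flag bit and return the positional argument count."""
    return nargsf & ~PY_VECTORCALL_ARGUMENTS_OFFSET


# A caller passing three positional arguments with the offset flag set.
nargsf = 3 | PY_VECTORCALL_ARGUMENTS_OFFSET
print(vectorcall_nargs(nargsf))
print(bool(nargsf & PY_VECTORCALL_ARGUMENTS_OFFSET))
```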

76.6 Stack-Based Argument Passing

The CPython evaluation loop already stores values on a stack.

For:

f(a, b)

the stack near the call site may look like:

f
a
b

A slow generic call would copy a and b into a tuple.

Vectorcall can pass a pointer into this stack region:

args -> [a, b]
nargs = 2

The callable consumes the argument view.

After the call returns, the interpreter removes the callable and arguments from the stack and pushes the result.
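
One way to see the stack-based shape from Python is to disassemble a call site. The sketch below only lists call-related opcodes; their exact names differ between CPython versions, so no particular output is claimed.

```python
import dis


def caller(f, a, b):
    # The callable and both arguments are pushed on the value stack,
    # then a single call instruction consumes them.
    return f(a, b)


call_ops = [
    ins.opname
    for ins in dis.get_instructions(caller)
    if ins.opname.startswith("CALL")
]
print(call_ops)
```

Running dis.dis(caller) shows the loads that precede the call instruction, which is the stack region the argument pointer refers to.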

76.7 Why It Is Called Vectorcall

The word “vector” refers to passing arguments as a vector: a contiguous array of object pointers.

It does not mean SIMD vectorization.

Vectorcall is about call ABI layout inside CPython, not CPU vector instructions.

The “vector” is:

PyObject *const *args

a pointer to consecutive Python object pointers.

76.8 Which Objects Support Vectorcall

Many callable types can support vectorcall:

Python functions
builtin functions
builtin methods
method descriptors
classes
some C extension callables
partial objects

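Support depends on the object’s type: the type sets Py_TPFLAGS_HAVE_VECTORCALL and uses tp_vectorcall_offset to say where each instance stores its vectorcall function pointer.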

A callable that does not support vectorcall still works through the generic call protocol.

The fast path is optional.

Correctness does not depend on vectorcall.
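
Type support can be probed from Python through type.__flags__. This is a rough sketch: it assumes the flag value 1 << 11 used by CPython's headers, and the helper name type_has_vectorcall is hypothetical. It reports on the type of the object being called and says nothing about other implementations.

```python
# Assumed value of Py_TPFLAGS_HAVE_VECTORCALL in CPython's Include/object.h.
PY_TPFLAGS_HAVE_VECTORCALL = 1 << 11


def type_has_vectorcall(obj):
    """Return True if type(obj) advertises vectorcall support."""
    return bool(type(obj).__flags__ & PY_TPFLAGS_HAVE_VECTORCALL)


def py_func():
    return None


class CallableInstance:
    def __call__(self):
        return None


for name, obj in [
    ("builtin len", len),
    ("Python function", py_func),
    ("instance with __call__", CallableInstance()),
]:
    print(name, type_has_vectorcall(obj))
```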

76.9 Python Function Vectorcall

Python functions support vectorcall.

For:

def add(a, b):
    return a + b

add(1, 2)

vectorcall helps CPython pass arguments into the new frame efficiently.

The callee still executes Python bytecode. Vectorcall does not turn Python code into C code. It reduces the overhead of entering the function.

The function call still needs:

argument binding
frame initialization
recursion checks
bytecode execution
return handling

But it can skip temporary argument tuple construction in common cases.

76.10 Builtin Function Vectorcall

Builtins benefit strongly from vectorcall.

Example:

len(xs)

A vectorcall-aware builtin receives arguments directly.

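Conceptually, written in fastcall style (the real len is declared METH_O and receives its single argument directly):

static PyObject *
builtin_len(PyObject *self, PyObject *const *args, Py_ssize_t nargs)
{
    if (nargs != 1) {
        PyErr_SetString(PyExc_TypeError, "len() takes exactly one argument");
        return NULL;
    }
    Py_ssize_t n = PyObject_Length(args[0]);
    if (n < 0) {
        return NULL;  /* PyObject_Length already set the exception */
    }
    return PyLong_FromSsize_t(n);
}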

No Python frame is created. No argument tuple is needed.

This is why builtins with simple signatures can be much cheaper than equivalent Python wrappers.
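
The difference can be measured with timeit. The sketch below compares len with a hypothetical Python wrapper py_len; the numbers depend on the machine and CPython version, so none are quoted.

```python
import timeit


def py_len(xs):
    # A Python-level wrapper: adds a frame and a second call on top of len.
    return len(xs)


namespace = {"xs": [1, 2, 3], "py_len": py_len}
t_builtin = timeit.timeit("len(xs)", globals=namespace, number=200_000)
t_wrapper = timeit.timeit("py_len(xs)", globals=namespace, number=200_000)

print(f"builtin len: {t_builtin:.4f}s")
print(f"py_len:      {t_wrapper:.4f}s")
```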

76.11 Method Calls and Vectorcall

Method calls combine method lookup and calling convention.

For:

obj.method(x)

CPython tries to avoid creating a temporary bound method object.

The optimized shape is:

function
self
x

The call then proceeds through vectorcall with self inserted as the first argument.

Conceptually:

obj.method(x)

becomes close to:

C.method(obj, x)

internally, when semantics allow it.

This avoids allocation of:

bound method object

which would otherwise contain:

function pointer
self reference

76.12 Bound Method Vectorcall

A bound method object can also use vectorcall.

Example:

m = obj.method
m(x)

Here, the bound method object is observable and must exist.

But when called, it can still pass arguments efficiently:

self
x

The method object stores self, then vectorcall builds an argument view that includes it.
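
The pieces a bound method carries are visible from Python. A minimal sketch, with a hypothetical class C:

```python
class C:
    def method(self, x):
        return (self, x)


obj = C()
m = obj.method  # the bound method object is observable here

print(m.__self__ is obj)        # the stored self reference
print(m.__func__ is C.method)   # the underlying function

# All three spellings pass the same arguments to the same function.
same = obj.method(10) == C.method(obj, 10) == m(10)
print(same)
```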

76.13 PyObject_Vectorcall

CPython exposes helper APIs for vectorcall.

Conceptually:

PyObject *PyObject_Vectorcall(
    PyObject *callable,
    PyObject *const *args,
    size_t nargsf,
    PyObject *kwnames
);

This calls a Python object using the vectorcall convention when possible.

There are also helper APIs for method calls and calls with known argument layouts.

Extension code can use these helpers to avoid unnecessary tuple and dict allocation.

76.14 Relationship to PyObject_Call

The traditional generic API is:

PyObject *PyObject_Call(
    PyObject *callable,
    PyObject *args,
    PyObject *kwargs
);

where:

args is a tuple
kwargs is a dict or NULL

This remains important because it is flexible and stable.

Vectorcall is the faster path when the caller already has arguments in array form.

Comparison:

| API | Argument representation | Typical cost |
| --- | --- | --- |
| PyObject_Call | Tuple plus dict | More allocation and normalization |
| PyObject_Vectorcall | Array plus keyword-name tuple | Less allocation in common cases |

76.15 Vectorcall and C Extensions

C extension authors can expose vectorcall support.

This matters for performance-sensitive extension modules.

Older calling conventions include:

METH_VARARGS

where arguments arrive as a tuple.

Faster conventions include:

METH_FASTCALL
METH_FASTCALL | METH_KEYWORDS

These receive arguments in array form, closer to vectorcall.

A C function declared with fastcall-style conventions can avoid tuple creation during calls.

76.16 METH_FASTCALL

A fastcall C function is conceptually shaped like:

static PyObject *
func(PyObject *self, PyObject *const *args, Py_ssize_t nargs)
{
    ...
}

This handles positional arguments only.

For:

func(a, b)

the function receives:

args[0] = a
args[1] = b
nargs = 2

The function must validate the argument count and types.

76.17 METH_FASTCALL | METH_KEYWORDS

For keyword support, a C function can use:

static PyObject *
func(PyObject *self,
     PyObject *const *args,
     Py_ssize_t nargs,
     PyObject *kwnames)
{
    ...
}

This receives positional values and keyword values in the same array, with keyword names separated into kwnames.

This is the C extension analogue of vectorcall keyword layout.

76.18 Argument Clinic

Many CPython builtins use Argument Clinic.

Argument Clinic generates argument parsing code for C functions.

It can generate fastcall-compatible wrappers, which gives builtins efficient signatures without hand-writing all parsing logic.

It also helps keep:

C implementation
Python signature
documentation
error messages

consistent.

For CPython internals work, Argument Clinic matters because many builtin call paths go through generated wrappers.

76.19 Vectorcall and functools.partial

functools.partial stores a callable plus pre-bound arguments.

Example:

from functools import partial

power_of_two = partial(pow, 2)
power_of_two(10)

This means:

pow(2, 10)

A vectorcall-aware partial can combine stored arguments and new arguments efficiently.

It still may need temporary storage for the merged argument view, but it avoids heavier generic call paths where possible.
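
The stored state of a partial can be inspected directly. A short sketch; the variable name power_of_two is chosen for illustration:

```python
from functools import partial

power_of_two = partial(pow, 2)

# A partial keeps the wrapped callable and the pre-bound arguments.
print(power_of_two.func is pow)
print(power_of_two.args)
print(power_of_two.keywords)

# Calling it merges the stored arguments with the new ones: pow(2, 10).
result = power_of_two(10)
print(result)
```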

76.20 Vectorcall and Class Construction

Classes are callable.

obj = C(a, b)

The call path goes through the type’s call behavior:

metaclass call
__new__
__init__

Vectorcall can optimize parts of this path, especially argument passing.

However, construction remains semantically rich:

custom metaclass
custom __new__
custom __init__
descriptor behavior
allocation
initialization

So class calls are usually more complex than simple function calls.
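
The order of these steps can be observed by logging each hook. A small sketch with hypothetical classes Meta and C:

```python
events = []


class Meta(type):
    def __call__(cls, *args, **kwargs):
        # Calling the class enters the metaclass first.
        events.append("metaclass __call__")
        return super().__call__(*args, **kwargs)


class C(metaclass=Meta):
    def __new__(cls, a, b):
        events.append("__new__")
        return super().__new__(cls)

    def __init__(self, a, b):
        events.append("__init__")
        self.a = a
        self.b = b


obj = C(1, 2)
print(events)
```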

76.21 Vectorcall and Descriptors

Descriptor binding interacts with vectorcall.

A function stored in a class is a descriptor. When accessed through an instance, it binds self.

class C:
    def f(self, x):
        return x + 1

c = C()
c.f(10)

The optimized method path avoids creating a bound method object, but it must still preserve descriptor semantics.

For custom descriptors:

class D:
    def __get__(self, obj, typ):
        ...

CPython often needs the generic path because binding behavior is user-defined.
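
A minimal sketch of the two cases, using a hypothetical descriptor class Doubling whose __get__ returns an arbitrary callable:

```python
class Doubling:
    """Hypothetical descriptor: binding is fully user-defined."""

    def __get__(self, obj, typ=None):
        if obj is None:
            return self
        # Whatever is returned here is what gets called.
        return lambda x: obj.base + 2 * x


class C:
    g = Doubling()

    def __init__(self, base):
        self.base = base

    def f(self, x):
        return self.base + x


c = C(1)
print(c.f(10))  # plain function: standard method binding
print(c.g(10))  # custom descriptor: result of Doubling.__get__ is called

# Spelling out the function's own descriptor step by hand.
bound = C.__dict__["f"].__get__(c, C)
print(bound(10))
```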

76.22 Vectorcall and Keyword Semantics

Vectorcall improves layout, but it does not simplify Python’s keyword rules away.

The callee must still detect:

unexpected keyword arguments
duplicate values
missing required arguments
keyword-only parameters
positional-only violations

Example:

def f(a, /, b):
    return a + b

f(a=1, b=2)

must still fail because a is positional-only.

Vectorcall optimizes transport of arguments, not the semantic rules of binding.
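
The failure is easy to reproduce. A short sketch that captures the error instead of letting it propagate:

```python
def f(a, /, b):
    return a + b


message = None
try:
    f(a=1, b=2)
except TypeError as exc:
    # a is positional-only, so passing it by keyword is rejected.
    message = str(exc)

print(message)
print(f(1, b=2))  # b may still be passed by keyword
```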

76.23 Vectorcall and *args

Calls using *args are less direct:

f(*args)

CPython must expand the iterable or tuple into call arguments.

If args is already a tuple, CPython can often use its internal array of items efficiently.

If it is an arbitrary iterable, CPython must first materialize the positional arguments.

Vectorcall can be used after expansion, but it cannot avoid the semantic need to expand.
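
The expansion step is observable when the iterable has side effects. In this sketch, the generator numbers and the log list are illustrative names:

```python
log = []


def numbers():
    for n in (1, 2, 3):
        log.append(f"yield {n}")
        yield n


def f(*args):
    log.append("enter f")
    return sum(args)


# The generator is fully consumed before f starts running.
result = f(*numbers())
print(log)
print(result)
```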

76.24 Vectorcall and **kwargs

For:

f(**kwargs)

CPython must expand a mapping into keyword names and values.

This is more expensive than a static keyword call:

f(x=1, y=2)

because the keys and values are dynamic.

Vectorcall may still be the final call convention, but the keyword mapping must first be processed.
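
The processing step shows up when the mapping logs its own reads. A sketch with a hypothetical LoggingMapping class:

```python
from collections.abc import Mapping


class LoggingMapping(Mapping):
    """Hypothetical mapping that records every item lookup."""

    def __init__(self, data):
        self._data = dict(data)
        self.reads = []

    def __getitem__(self, key):
        self.reads.append(key)
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


def f(x, y):
    return x + y


m = LoggingMapping({"x": 1, "y": 2})
result = f(**m)  # the interpreter reads keys and values out of m first
print(result)
print(m.reads)
```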

76.25 Vectorcall and Introspection

Python supports rich introspection.

Examples:

inspect.signature(f)
f.__defaults__
f.__kwdefaults__
f.__code__

Vectorcall must remain compatible with this model.

It changes how calls are transported internally. It does not change the callable’s Python-visible signature.

A Python function called through vectorcall still behaves like the same Python function.
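
A brief sketch of the attributes in question, using an illustrative function f:

```python
import inspect


def f(a, b=2, *, c=3):
    return a + b + c


signature = inspect.signature(f)
print(signature)
print(f.__defaults__)
print(f.__kwdefaults__)
print(f.__code__.co_varnames)
```

None of these values depend on which calling convention the interpreter used to reach f.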

76.26 Vectorcall and Error Messages

Call errors must remain precise.

Example:

def f(a, b):
    pass

f(1)

must report a missing argument.

Example:

f(1, 2, 3)

must report too many arguments.

Vectorcall fast paths still need exact error behavior.

This is a major reason call optimization is complicated: the fast path must produce the same visible errors as the slow path.
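
Both errors can be captured and inspected. In this sketch the helper call_error is hypothetical, and the exact prefix of each message can vary with the CPython version:

```python
def f(a, b):
    pass


def call_error(*args):
    """Call f and return the TypeError message, or None on success."""
    try:
        f(*args)
    except TypeError as exc:
        return str(exc)
    return None


missing = call_error(1)
too_many = call_error(1, 2, 3)
print(missing)
print(too_many)
```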

76.27 Vectorcall and Reference Counts

The argument array usually contains borrowed references or stack-owned references.

The vectorcall callee must treat them as borrowed: it must not release them, and it must take its own reference to any argument it keeps beyond the call.

Return values follow normal CPython conventions:

success:
    return new reference

failure:
    return NULL and set exception

Callers and callees must carefully follow ownership rules.

Vectorcall reduces temporary object creation, but it does not remove reference-count discipline.

76.28 Vectorcall and Recursion Checks

Vectorcall does not bypass recursion protection.

A Python function called through vectorcall still enters Python execution.

Recursive code:

def f():
    return f()

f()

must still raise:

RecursionError

Fast calling convention and recursion accounting are separate concerns.
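
A small sketch that catches the error rather than letting it end the program:

```python
def f():
    return f()


caught = False
try:
    f()
except RecursionError:
    # The interpreter's depth limit still applies to every nested call.
    caught = True

print(caught)
```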

76.29 Vectorcall and Tracing

When tracing or profiling is active, CPython may need to produce call events.

Examples:

function call event
function return event
exception event
line event

Vectorcall still works, but the effective performance benefit may be smaller because tracing forces extra runtime work.

Fast paths are most visible when tracing and profiling are disabled.
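
Call and return events can be observed with sys.setprofile. A minimal sketch; the profiler function and the events list are illustrative names:

```python
import sys

events = []


def profiler(frame, event, arg):
    # Record only Python-level call and return events.
    if event in ("call", "return"):
        events.append((event, frame.f_code.co_name))


def inc(x):
    return x + 1


sys.setprofile(profiler)
try:
    inc(1)
finally:
    sys.setprofile(None)

inc_events = [e for e in events if e[1] == "inc"]
print(inc_events)
```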

76.30 Vectorcall and the Stable ABI

Vectorcall touches C API and ABI design.

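Some vectorcall details are CPython-specific. Helpers such as PyObject_Vectorcall joined the stable ABI only in Python 3.12, so extension authors targeting the limited API or stable ABI must check which functions and struct fields are available for their minimum supported version.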

The design goal is to expose useful fast call mechanisms while avoiding unnecessary commitment to internal layout details.

76.31 Why Vectorcall Matters

Vectorcall matters because calls are everywhere.

A small reduction in call overhead helps:

builtin-heavy code
method-heavy code
small helper functions
iterator pipelines
decorator wrappers
numeric dispatch into C extensions
standard library internals

It is not a magic optimization. It does not make Python bytecode execute like native code. It removes unnecessary packaging around calls.

That removal is valuable because it occurs on a very hot path.

76.32 Performance Example

Consider:

def inc(x):
    return x + 1

total = 0
for i in range(1_000_000):
    total += inc(i)

Each loop iteration performs a Python function call.

Vectorcall helps with entering inc, but the call still needs:

argument binding
frame setup
bytecode execution
return handling

Now compare:

total = 0
for i in range(1_000_000):
    total += i + 1

This avoids the function call entirely.

Vectorcall reduces call overhead. It does not make calls free.
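
The two loops can be timed side by side. In this sketch the function names with_call and inlined are illustrative, and the timings depend on the machine and CPython version, so none are quoted.

```python
import timeit


def inc(x):
    return x + 1


def with_call(n):
    total = 0
    for i in range(n):
        total += inc(i)  # one Python function call per iteration
    return total


def inlined(n):
    total = 0
    for i in range(n):
        total += i + 1  # same arithmetic, no call
    return total


N = 100_000
t_call = timeit.timeit(lambda: with_call(N), number=5)
t_inline = timeit.timeit(lambda: inlined(N), number=5)

print(f"with call: {t_call:.4f}s")
print(f"inlined:   {t_inline:.4f}s")
```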

76.33 Extension Author Guidelines

For C extension authors, useful rules are:

| Pattern | Reason |
| --- | --- |
| Prefer fastcall-compatible signatures for hot callables | Avoid tuple allocation |
| Avoid unnecessary PyObject_Call when arguments are already in an array | Preserve fast layout |
| Use Argument Clinic where appropriate | Generates correct fast wrappers |
| Keep error handling exact | Fast paths must match Python semantics |
| Respect ownership rules | Vectorcall still uses normal reference counting |

These choices can materially affect extension performance.

76.34 Reading Vectorcall in CPython

Important source areas include:

| Area | Purpose |
| --- | --- |
| Objects/call.c | Generic call and vectorcall helpers |
| Objects/funcobject.c | Python function call behavior |
| Objects/methodobject.c | Builtin functions and method objects |
| Objects/typeobject.c | Type call and descriptor behavior |
| Include/cpython/abstract.h | Public call declarations |
| Include/internal/pycore_call.h | Internal call helpers |
| Python/ceval.c | Bytecode-level call execution |

When reading the source, track three things:

where the callable comes from
how arguments are laid out
which call slot or helper is used

76.35 Mental Model

A useful model:

Vectorcall passes arguments as a borrowed view over a contiguous array.

The core benefit:

do not build a tuple unless a tuple is needed
do not build a dict unless a dict is needed
do not move stack values unless movement is needed

The rest of Python’s call semantics remain intact.

76.36 Chapter Summary

Vectorcall is CPython’s fast internal calling convention.

It represents arguments as:

PyObject *const *args
nargsf
kwnames

rather than:

args tuple
kwargs dict

This reduces allocation and reference-count overhead on one of CPython’s hottest paths.

Vectorcall works with Python functions, builtins, methods, classes, and extension callables that support it. It preserves Python’s full call semantics, including keyword binding, descriptors, recursion checks, tracing, error messages, and reference ownership rules.