Skip to content

20. The AST

AST node types defined in Parser/Python.asdl, the ast module, and how AST nodes map to source locations.

The abstract syntax tree, usually called the AST, is the structured representation of Python source code after parsing.

The tokenizer produces a flat stream of tokens. The parser recognizes grammar. The AST records the result as a tree of statements and expressions.

For this source:

x = 1 + 2

the AST is shaped like this:

Module
    Assign
        targets:
            Name id="x", ctx=Store
        value:
            BinOp
                left: Constant value=1
                op: Add
                right: Constant value=2

The AST discards many surface details. It does not preserve most whitespace, comments, or exact formatting. It keeps the structure needed for semantic analysis and bytecode generation.

20.1 Position in the Compilation Pipeline

The AST sits between parsing and compiler analysis.

source bytes
tokenization
parsing
AST
AST validation
symbol table
compiler
code object
bytecode execution

The parser builds the AST. Later phases consume it.

The AST answers questions such as:

What statements are in this module?
What expression is assigned to this name?
What is the body of this function?
What names are loaded, stored, or deleted?
What is the test expression of this if statement?
What is the element expression of this comprehension?
What source position produced this node?

It does not answer questions such as:

Is this name local or global?
What bytecode should be emitted?
What stack size is needed?
What constants should be placed in co_consts?
What exception table entries are needed?

Those belong to later compiler stages.

20.2 Concrete Syntax Tree vs Abstract Syntax Tree

A parser can produce a concrete syntax tree or an abstract syntax tree.

A concrete syntax tree preserves grammar detail. It represents the source closely.

An abstract syntax tree preserves semantic structure. It removes details that no later compiler phase needs.

For example:

x = (((1)))

The concrete syntax contains three pairs of parentheses.

The AST can represent this simply as:

Assign
    target: Name "x"
    value: Constant 1

The parentheses affected parsing, but they do not create runtime behavior by themselves.

Likewise:

x = 1 + 2

The AST keeps the addition because it affects runtime behavior.

BinOp
    left: Constant 1
    op: Add
    right: Constant 2

The AST is therefore abstract in the practical compiler sense. It keeps operations, not all spelling.

20.3 Python-Level AST Access

Python exposes AST functionality through the ast module.

Example:

import ast

tree = ast.parse("x = 1 + 2\n")
print(ast.dump(tree, indent=4))

Typical output:

Module(
    body=[
        Assign(
            targets=[
                Name(id='x', ctx=Store())],
            value=BinOp(
                left=Constant(value=1),
                op=Add(),
                right=Constant(value=2)))],
    type_ignores=[])

This is the same general structure CPython uses internally, exposed as Python objects.

The ast module is useful for:

static analysis
code generation
linting
refactoring
macro-like source transforms
security review tools
documentation tools
teaching compiler structure

But the public AST API is still an abstraction. CPython’s internal C representation and public Python AST classes are related, but they are not the same physical objects.

20.4 Module Nodes

The root of a normal Python file is a Module.

Example:

x = 1
print(x)

AST shape:

Module
    body:
        Assign
        Expr
    type_ignores:
        []

The body is an ordered list of statements.

At module level, the compiler emits bytecode that executes statements from top to bottom.

There are different root node types depending on parse mode:

Parse modeRoot nodeExample
execModulex = 1
evalExpression1 + 2
singleInteractiveREPL input
function type modeFunctionTypelegacy type comments

Example:

import ast

print(type(ast.parse("x = 1", mode="exec")).__name__)
print(type(ast.parse("1 + 2", mode="eval")).__name__)
print(type(ast.parse("print(1)", mode="single")).__name__)

The grammar start mode changes the root AST shape.

20.5 Statement Nodes

Statements represent actions.

Common statement nodes include:

NodeSource example
Assignx = 1
AnnAssignx: int = 1
AugAssignx += 1
Returnreturn x
Ifif x: ...
Forfor x in xs: ...
Whilewhile x: ...
FunctionDefdef f(): ...
AsyncFunctionDefasync def f(): ...
ClassDefclass C: ...
Trytry: ... except: ...
Withwith open(p) as f: ...
Importimport os
ImportFromfrom os import path
Exprf() as a statement
Passpass
Breakbreak
Continuecontinue
Raiseraise ValueError()
Assertassert x
Deletedel x
Globalglobal x
Nonlocalnonlocal x

A statement node appears where the grammar expects a statement. Statement nodes usually live in a body, orelse, finalbody, or similar list.

Example:

if x:
    a = 1
else:
    a = 2

AST shape:

If
    test: Name "x", Load
    body:
        Assign a = 1
    orelse:
        Assign a = 2

The else block is represented by the orelse field.

20.6 Expression Nodes

Expressions produce values.

Common expression nodes include:

NodeSource example
Constant1, "x", None
Namex
BinOpa + b
UnaryOp-x
BoolOpa and b
Comparea < b
Callf(x)
Attributeobj.name
Subscriptxs[i]
List[1, 2]
Tuple(1, 2)
Dict{"a": 1}
Set{1, 2}
ListComp[x for x in xs]
GeneratorExp(x for x in xs)
Lambdalambda x: x + 1
IfExpa if cond else b
Yieldyield x
Awaitawait task
NamedExpr(x := f())

Example:

result = obj.method(x, y=2)

AST shape:

Assign
    target:
        Name "result", Store
    value:
        Call
            func:
                Attribute
                    value: Name "obj", Load
                    attr: "method"
                    ctx: Load
            args:
                Name "x", Load
            keywords:
                keyword arg="y", value=Constant 2

The AST records the call target, positional arguments, keyword arguments, and attribute access.

20.7 Expression Contexts

Some expression nodes carry context.

The most important contexts are:

ContextMeaning
LoadRead a value
StoreAssign to a target
DelDelete a binding or target

Example:

x = x + 1

AST shape:

Assign
    target:
        Name "x", Store
    value:
        BinOp
            left: Name "x", Load
            op: Add
            right: Constant 1

The same spelling, x, appears twice with different meaning.

Example with attributes:

obj.value = obj.value + 1

AST shape:

Assign
    target:
        Attribute obj.value, Store
    value:
        BinOp
            left: Attribute obj.value, Load
            op: Add
            right: Constant 1

Context is essential for code generation. Loading a name, storing a name, and deleting a name use different bytecode instructions.

20.8 Constants

Modern Python AST uses Constant for many literal values.

Examples:

1
3.14
"hello"
b"bytes"
None
True
False
...

Each can appear as:

Constant(value=...)

Container literals are separate nodes:

[1, 2]
(1, 2)
{"a": 1}
{1, 2}

AST shape:

List
    elts:
        Constant 1
        Constant 2

Tuple
    elts:
        Constant 1
        Constant 2

Dict
    keys:
        Constant "a"
    values:
        Constant 1

Set
    elts:
        Constant 1
        Constant 2

The compiler may later fold some constants. For example:

x = 1 + 2

can compile to a constant 3 in bytecode depending on optimization rules. But the AST still records a binary operation before optimization.

20.9 Operators as AST Nodes

Operators are represented as small AST node objects.

Examples:

SourceAST operator
a + bAdd
a - bSub
a * bMult
a / bDiv
a // bFloorDiv
a % bMod
a ** bPow
a @ bMatMult
a << bLShift
a >> bRShift
a | bBitOr
a & bBitAnd
a ^ bBitXor

Example:

x = a + b

AST:

BinOp
    left: Name "a"
    op: Add
    right: Name "b"

The operator is not stored as a string. It is represented structurally.

This makes downstream compiler code simpler. It can switch on operator node type rather than parse textual operators again.

20.10 Comparisons

Comparisons use a special AST shape because Python supports chained comparison.

Example:

a < b < c

This means:

a < b and b < c

except that b is evaluated once.

AST shape:

Compare
    left: Name "a"
    ops:
        Lt
        Lt
    comparators:
        Name "b"
        Name "c"

This differs from binary operations.

A binary expression like:

a + b + c

is nested:

BinOp
    left:
        BinOp
            left: a
            op: Add
            right: b
    op: Add
    right: c

A chained comparison is represented as one Compare node with multiple operators and comparators.

20.11 Boolean Operations

Boolean operations also have a distinct AST shape.

Example:

a and b and c

AST shape:

BoolOp
    op: And
    values:
        Name "a"
        Name "b"
        Name "c"

Similarly:

a or b or c

uses BoolOp with Or.

This shape preserves short-circuit structure. The compiler emits bytecode that evaluates operands left to right and stops when the result is known.

Boolean operations are not ordinary function calls. They have special evaluation rules.

20.12 Function Definitions

A function definition is represented by FunctionDef.

Example:

def add(a: int, b: int = 0) -> int:
    return a + b

AST shape:

FunctionDef
    name: "add"
    args:
        arguments
            args:
                arg "a", annotation: Name "int"
                arg "b", annotation: Name "int"
            defaults:
                Constant 0
    returns:
        Name "int"
    body:
        Return
            BinOp
                Name "a"
                Add
                Name "b"
    decorator_list:
        []

A function definition node stores:

function name
argument structure
body statements
decorators
return annotation
type comment
source location

The function body is not executed while parsing. It becomes part of the function’s code object later. At runtime, executing the def statement creates a function object.

20.13 Arguments

Function arguments are represented by an arguments node.

It must support all Python parameter forms:

def f(a, b=1, /, c=2, *args, d, e=3, **kwargs):
    pass

The AST must distinguish:

positional-only parameters
positional-or-keyword parameters
varargs parameter
keyword-only parameters
keyword-only defaults
kwargs parameter
defaults
annotations

This structure is more detailed than a simple list of names.

The compiler later uses it to build a code object with the correct argument counts and flags.

20.14 Class Definitions

A class definition is represented by ClassDef.

Example:

class Point(Base):
    x = 0
    y = 0

AST shape:

ClassDef
    name: "Point"
    bases:
        Name "Base"
    keywords:
        []
    body:
        Assign x = 0
        Assign y = 0
    decorator_list:
        []

A class body is compiled as an executable block. At runtime, CPython executes the class body to produce a namespace, then calls the metaclass machinery to create the class object.

The AST only records the class definition structure. It does not build the class.

20.15 Comprehensions

Comprehensions have dedicated AST nodes.

Examples:

[x for x in xs]
{x for x in xs}
{k: v for k, v in pairs}
(x for x in xs)

AST node types:

Source formAST node
List comprehensionListComp
Set comprehensionSetComp
Dict comprehensionDictComp
Generator expressionGeneratorExp

Each uses one or more comprehension nodes.

Example:

[x * x for x in xs if x > 0]

AST shape:

ListComp
    elt:
        BinOp x * x
    generators:
        comprehension
            target: Name "x", Store
            iter: Name "xs", Load
            ifs:
                Compare x > 0
            is_async: 0

Comprehensions are expression nodes, but they introduce local binding behavior. Later compiler phases handle their scope carefully.

20.16 Pattern Matching AST

Python pattern matching uses dedicated AST nodes.

Example:

match value:
    case 0:
        result = "zero"
    case [x, y]:
        result = x + y

AST shape:

Match
    subject:
        Name "value"
    cases:
        match_case
            pattern: MatchValue Constant 0
            body:
                Assign result = "zero"
        match_case
            pattern: MatchSequence
                MatchAs name="x"
                MatchAs name="y"
            body:
                Assign result = x + y

Pattern nodes are separate from expression nodes because pattern syntax has different meaning.

For example:

case x:

does not load a variable named x. It captures a value into the name x.

That is why pattern matching cannot reuse ordinary expression AST nodes everywhere. It needs pattern-specific structure.

20.17 Async AST Nodes

Async syntax also appears in the AST.

Examples:

async def fetch():
    await client.get(url)

AST shape:

AsyncFunctionDef
    name: "fetch"
    body:
        Expr
            Await
                Call
                    Attribute client.get

Async loops and context managers also have dedicated forms:

async for item in stream:
    process(item)

async with lock:
    run()

AST nodes:

AsyncFor
AsyncWith
Await
AsyncFunctionDef

These nodes allow later compiler phases to generate coroutine-aware bytecode.

20.18 Source Locations

AST nodes carry source positions.

Common fields include:

lineno
col_offset
end_lineno
end_col_offset

Example:

import ast

tree = ast.parse("x = 1 + 2\n")
assign = tree.body[0]

print(assign.lineno)
print(assign.col_offset)
print(assign.end_lineno)
print(assign.end_col_offset)

Source locations are used by:

tracebacks
syntax errors
debuggers
profilers
coverage tools
AST transformers
editor tooling

When generating AST manually, missing source locations can cause compilation errors. The helper ast.fix_missing_locations() fills missing location data where possible.

Example:

import ast

tree = ast.Module(
    body=[
        ast.Assign(
            targets=[ast.Name(id="x", ctx=ast.Store())],
            value=ast.Constant(value=1),
        )
    ],
    type_ignores=[],
)

tree = ast.fix_missing_locations(tree)
code = compile(tree, "<ast>", "exec")
exec(code)

20.19 AST Validation

Before compiling an AST, CPython validates it.

Validation catches malformed trees such as:

Name with no context
assignment target with Load context
expression where statement is required
statement where expression is required
invalid source location fields
invalid operator node placement
missing required fields

Example of an invalid AST:

import ast

tree = ast.Module(
    body=[
        ast.Assign(
            targets=[ast.Name(id="x", ctx=ast.Load())],
            value=ast.Constant(value=1),
        )
    ],
    type_ignores=[],
)

tree = ast.fix_missing_locations(tree)
compile(tree, "<ast>", "exec")

The target x has Load context, but assignment requires Store. CPython rejects this.

This matters because the public ast module allows users to construct trees manually. The compiler must not assume that every user-created AST is valid.

20.20 AST Transformations

AST tools can modify code before compilation.

Example: replace every integer constant 1 with 42.

import ast

class ReplaceOne(ast.NodeTransformer):
    def visit_Constant(self, node):
        if node.value == 1:
            return ast.copy_location(ast.Constant(value=42), node)
        return node

tree = ast.parse("x = 1\n")
tree = ReplaceOne().visit(tree)
tree = ast.fix_missing_locations(tree)

code = compile(tree, "<ast>", "exec")
ns = {}
exec(code, ns)

print(ns["x"])

This prints:

42

AST transformation is powerful but constrained. The transformed tree must still obey AST validity rules.

20.21 AST vs Bytecode

AST and bytecode represent different levels.

For this source:

x = 1 + 2

AST:

Assign
    Name "x", Store
    BinOp
        Constant 1
        Add
        Constant 2

Bytecode may look like:

LOAD_CONST 3
STORE_NAME x
RETURN_CONST None

The bytecode may already contain optimized constants. It also contains execution instructions, not source syntax.

Comparison:

LayerRepresentsPreserves
TokensLexical source unitsspelling, comments in public tokenizer, positions
ASTSyntax structurestatements, expressions, contexts, positions
BytecodeExecution planstack operations, constants, names, jumps

The AST is closer to source than bytecode, but less exact than tokens.

20.22 AST and Symbol Tables

The symbol table pass reads the AST to classify names.

Example:

x = 1

def f():
    return x

The AST says:

module assigns x
function f loads x

The symbol table decides:

x is global from inside f
f is local to module scope

Example with closure:

def outer():
    x = 1

    def inner():
        return x

    return inner

The AST records nested functions and name usage. The symbol table determines that:

x is local to outer
x is free in inner
x must be stored in a closure cell

This is not purely parsing. It requires scope analysis over the AST.

20.23 AST and Compiler Generation

The compiler walks the AST and emits bytecode.

Simplified model:

compile Module:
    compile each statement

compile Assign:
    compile right-hand expression
    compile each target in Store context

compile BinOp:
    compile left expression
    compile right expression
    emit binary operation instruction

compile If:
    compile test
    emit conditional jump
    compile body
    emit jump over else
    compile else body

The compiler is a tree walker with control-flow and stack-management logic.

It depends on AST structure being correct. A malformed AST could cause wrong bytecode, crashes, or invalid runtime state, which is why validation exists.

20.24 CPython AST Source Files

Important CPython areas include:

Parser/Python.asdl
Python/Python-ast.c
Python/ast.c
Python/ast_opt.c
Python/symtable.c
Python/compile.c
Lib/ast.py

The ASDL file defines the AST node schema. Generated C code provides AST structures and constructors. Parser and AST code build nodes from grammar matches. The optimizer and compiler consume the tree.

Conceptual roles:

AreaRole
ASDL schemaDefines AST node types and fields
Parser actionsCreate AST nodes from grammar matches
AST validationReject malformed trees
AST optimizerPerforms tree-level simplifications
Symbol tableClassifies names and scopes
CompilerEmits bytecode from AST
Lib/ast.pyPython-level AST helpers

20.25 Minimal Mental Model

Use this model:

The AST is CPython’s structured syntax tree.
It is built by the parser.
It removes most formatting details.
It records statements, expressions, operators, contexts, and source locations.
It separates syntax structure from later semantic analysis.
The symbol table pass reads the AST to classify names.
The compiler walks the AST to emit bytecode.

The AST is the central handoff object between syntax and execution.