AST node types defined in Parser/Python.asdl, the ast module, and how AST nodes map to source locations.
The abstract syntax tree, usually called the AST, is the structured representation of Python source code after parsing.
The tokenizer produces a flat stream of tokens. The parser recognizes grammar. The AST records the result as a tree of statements and expressions.
For this source:
x = 1 + 2the AST is shaped like this:
Module
Assign
targets:
Name id="x", ctx=Store
value:
BinOp
left: Constant value=1
op: Add
right: Constant value=2The AST discards many surface details. It does not preserve most whitespace, comments, or exact formatting. It keeps the structure needed for semantic analysis and bytecode generation.
20.1 Position in the Compilation Pipeline
The AST sits between parsing and compiler analysis.
source bytes
↓
tokenization
↓
parsing
↓
AST
↓
AST validation
↓
symbol table
↓
compiler
↓
code object
↓
bytecode executionThe parser builds the AST. Later phases consume it.
The AST answers questions such as:
What statements are in this module?
What expression is assigned to this name?
What is the body of this function?
What names are loaded, stored, or deleted?
What is the test expression of this if statement?
What is the element expression of this comprehension?
What source position produced this node?It does not answer questions such as:
Is this name local or global?
What bytecode should be emitted?
What stack size is needed?
What constants should be placed in co_consts?
What exception table entries are needed?Those belong to later compiler stages.
20.2 Concrete Syntax Tree vs Abstract Syntax Tree
A parser can produce a concrete syntax tree or an abstract syntax tree.
A concrete syntax tree preserves grammar detail. It represents the source closely.
An abstract syntax tree preserves semantic structure. It removes details that no later compiler phase needs.
For example:
x = (((1)))The concrete syntax contains three pairs of parentheses.
The AST can represent this simply as:
Assign
target: Name "x"
value: Constant 1The parentheses affected parsing, but they do not create runtime behavior by themselves.
Likewise:
x = 1 + 2The AST keeps the addition because it affects runtime behavior.
BinOp
left: Constant 1
op: Add
right: Constant 2The AST is therefore abstract in the practical compiler sense. It keeps operations, not all spelling.
20.3 Python-Level AST Access
Python exposes AST functionality through the ast module.
Example:
import ast
tree = ast.parse("x = 1 + 2\n")
print(ast.dump(tree, indent=4))Typical output:
Module(
body=[
Assign(
targets=[
Name(id='x', ctx=Store())],
value=BinOp(
left=Constant(value=1),
op=Add(),
right=Constant(value=2)))],
type_ignores=[])This is the same general structure CPython uses internally, exposed as Python objects.
The ast module is useful for:
static analysis
code generation
linting
refactoring
macro-like source transforms
security review tools
documentation tools
teaching compiler structureBut the public AST API is still an abstraction. CPython’s internal C representation and public Python AST classes are related, but they are not the same physical objects.
20.4 Module Nodes
The root of a normal Python file is a Module.
Example:
x = 1
print(x)AST shape:
Module
body:
Assign
Expr
type_ignores:
[]The body is an ordered list of statements.
At module level, the compiler emits bytecode that executes statements from top to bottom.
There are different root node types depending on parse mode:
| Parse mode | Root node | Example |
|---|---|---|
exec | Module | x = 1 |
eval | Expression | 1 + 2 |
single | Interactive | REPL input |
| function type mode | FunctionType | legacy type comments |
Example:
import ast
print(type(ast.parse("x = 1", mode="exec")).__name__)
print(type(ast.parse("1 + 2", mode="eval")).__name__)
print(type(ast.parse("print(1)", mode="single")).__name__)The grammar start mode changes the root AST shape.
20.5 Statement Nodes
Statements represent actions.
Common statement nodes include:
| Node | Source example |
|---|---|
Assign | x = 1 |
AnnAssign | x: int = 1 |
AugAssign | x += 1 |
Return | return x |
If | if x: ... |
For | for x in xs: ... |
While | while x: ... |
FunctionDef | def f(): ... |
AsyncFunctionDef | async def f(): ... |
ClassDef | class C: ... |
Try | try: ... except: ... |
With | with open(p) as f: ... |
Import | import os |
ImportFrom | from os import path |
Expr | f() as a statement |
Pass | pass |
Break | break |
Continue | continue |
Raise | raise ValueError() |
Assert | assert x |
Delete | del x |
Global | global x |
Nonlocal | nonlocal x |
A statement node appears where the grammar expects a statement. Statement nodes usually live in a body, orelse, finalbody, or similar list.
Example:
if x:
a = 1
else:
a = 2AST shape:
If
test: Name "x", Load
body:
Assign a = 1
orelse:
Assign a = 2The else block is represented by the orelse field.
20.6 Expression Nodes
Expressions produce values.
Common expression nodes include:
| Node | Source example |
|---|---|
Constant | 1, "x", None |
Name | x |
BinOp | a + b |
UnaryOp | -x |
BoolOp | a and b |
Compare | a < b |
Call | f(x) |
Attribute | obj.name |
Subscript | xs[i] |
List | [1, 2] |
Tuple | (1, 2) |
Dict | {"a": 1} |
Set | {1, 2} |
ListComp | [x for x in xs] |
GeneratorExp | (x for x in xs) |
Lambda | lambda x: x + 1 |
IfExp | a if cond else b |
Yield | yield x |
Await | await task |
NamedExpr | (x := f()) |
Example:
result = obj.method(x, y=2)AST shape:
Assign
target:
Name "result", Store
value:
Call
func:
Attribute
value: Name "obj", Load
attr: "method"
ctx: Load
args:
Name "x", Load
keywords:
keyword arg="y", value=Constant 2The AST records the call target, positional arguments, keyword arguments, and attribute access.
20.7 Expression Contexts
Some expression nodes carry context.
The most important contexts are:
| Context | Meaning |
|---|---|
Load | Read a value |
Store | Assign to a target |
Del | Delete a binding or target |
Example:
x = x + 1AST shape:
Assign
target:
Name "x", Store
value:
BinOp
left: Name "x", Load
op: Add
right: Constant 1The same spelling, x, appears twice with different meaning.
Example with attributes:
obj.value = obj.value + 1AST shape:
Assign
target:
Attribute obj.value, Store
value:
BinOp
left: Attribute obj.value, Load
op: Add
right: Constant 1Context is essential for code generation. Loading a name, storing a name, and deleting a name use different bytecode instructions.
20.8 Constants
Modern Python AST uses Constant for many literal values.
Examples:
1
3.14
"hello"
b"bytes"
None
True
False
...Each can appear as:
Constant(value=...)Container literals are separate nodes:
[1, 2]
(1, 2)
{"a": 1}
{1, 2}AST shape:
List
elts:
Constant 1
Constant 2
Tuple
elts:
Constant 1
Constant 2
Dict
keys:
Constant "a"
values:
Constant 1
Set
elts:
Constant 1
Constant 2The compiler may later fold some constants. For example:
x = 1 + 2can compile to a constant 3 in bytecode depending on optimization rules. But the AST still records a binary operation before optimization.
20.9 Operators as AST Nodes
Operators are represented as small AST node objects.
Examples:
| Source | AST operator |
|---|---|
a + b | Add |
a - b | Sub |
a * b | Mult |
a / b | Div |
a // b | FloorDiv |
a % b | Mod |
a ** b | Pow |
a @ b | MatMult |
a << b | LShift |
a >> b | RShift |
a | b | BitOr |
a & b | BitAnd |
a ^ b | BitXor |
Example:
x = a + bAST:
BinOp
left: Name "a"
op: Add
right: Name "b"The operator is not stored as a string. It is represented structurally.
This makes downstream compiler code simpler. It can switch on operator node type rather than parse textual operators again.
20.10 Comparisons
Comparisons use a special AST shape because Python supports chained comparison.
Example:
a < b < cThis means:
a < b and b < cexcept that b is evaluated once.
AST shape:
Compare
left: Name "a"
ops:
Lt
Lt
comparators:
Name "b"
Name "c"This differs from binary operations.
A binary expression like:
a + b + cis nested:
BinOp
left:
BinOp
left: a
op: Add
right: b
op: Add
right: cA chained comparison is represented as one Compare node with multiple operators and comparators.
20.11 Boolean Operations
Boolean operations also have a distinct AST shape.
Example:
a and b and cAST shape:
BoolOp
op: And
values:
Name "a"
Name "b"
Name "c"Similarly:
a or b or cuses BoolOp with Or.
This shape preserves short-circuit structure. The compiler emits bytecode that evaluates operands left to right and stops when the result is known.
Boolean operations are not ordinary function calls. They have special evaluation rules.
20.12 Function Definitions
A function definition is represented by FunctionDef.
Example:
def add(a: int, b: int = 0) -> int:
return a + bAST shape:
FunctionDef
name: "add"
args:
arguments
args:
arg "a", annotation: Name "int"
arg "b", annotation: Name "int"
defaults:
Constant 0
returns:
Name "int"
body:
Return
BinOp
Name "a"
Add
Name "b"
decorator_list:
[]A function definition node stores:
function name
argument structure
body statements
decorators
return annotation
type comment
source locationThe function body is not executed while parsing. It becomes part of the function’s code object later. At runtime, executing the def statement creates a function object.
20.13 Arguments
Function arguments are represented by an arguments node.
It must support all Python parameter forms:
def f(a, b=1, /, c=2, *args, d, e=3, **kwargs):
passThe AST must distinguish:
positional-only parameters
positional-or-keyword parameters
varargs parameter
keyword-only parameters
keyword-only defaults
kwargs parameter
defaults
annotationsThis structure is more detailed than a simple list of names.
The compiler later uses it to build a code object with the correct argument counts and flags.
20.14 Class Definitions
A class definition is represented by ClassDef.
Example:
class Point(Base):
x = 0
y = 0AST shape:
ClassDef
name: "Point"
bases:
Name "Base"
keywords:
[]
body:
Assign x = 0
Assign y = 0
decorator_list:
[]A class body is compiled as an executable block. At runtime, CPython executes the class body to produce a namespace, then calls the metaclass machinery to create the class object.
The AST only records the class definition structure. It does not build the class.
20.15 Comprehensions
Comprehensions have dedicated AST nodes.
Examples:
[x for x in xs]
{x for x in xs}
{k: v for k, v in pairs}
(x for x in xs)AST node types:
| Source form | AST node |
|---|---|
| List comprehension | ListComp |
| Set comprehension | SetComp |
| Dict comprehension | DictComp |
| Generator expression | GeneratorExp |
Each uses one or more comprehension nodes.
Example:
[x * x for x in xs if x > 0]AST shape:
ListComp
elt:
BinOp x * x
generators:
comprehension
target: Name "x", Store
iter: Name "xs", Load
ifs:
Compare x > 0
is_async: 0Comprehensions are expression nodes, but they introduce local binding behavior. Later compiler phases handle their scope carefully.
20.16 Pattern Matching AST
Python pattern matching uses dedicated AST nodes.
Example:
match value:
case 0:
result = "zero"
case [x, y]:
result = x + yAST shape:
Match
subject:
Name "value"
cases:
match_case
pattern: MatchValue Constant 0
body:
Assign result = "zero"
match_case
pattern: MatchSequence
MatchAs name="x"
MatchAs name="y"
body:
Assign result = x + yPattern nodes are separate from expression nodes because pattern syntax has different meaning.
For example:
case x:does not load a variable named x. It captures a value into the name x.
That is why pattern matching cannot reuse ordinary expression AST nodes everywhere. It needs pattern-specific structure.
20.17 Async AST Nodes
Async syntax also appears in the AST.
Examples:
async def fetch():
await client.get(url)AST shape:
AsyncFunctionDef
name: "fetch"
body:
Expr
Await
Call
Attribute client.getAsync loops and context managers also have dedicated forms:
async for item in stream:
process(item)
async with lock:
run()AST nodes:
AsyncFor
AsyncWith
Await
AsyncFunctionDefThese nodes allow later compiler phases to generate coroutine-aware bytecode.
20.18 Source Locations
AST nodes carry source positions.
Common fields include:
lineno
col_offset
end_lineno
end_col_offsetExample:
import ast
tree = ast.parse("x = 1 + 2\n")
assign = tree.body[0]
print(assign.lineno)
print(assign.col_offset)
print(assign.end_lineno)
print(assign.end_col_offset)Source locations are used by:
tracebacks
syntax errors
debuggers
profilers
coverage tools
AST transformers
editor toolingWhen generating AST manually, missing source locations can cause compilation errors. The helper ast.fix_missing_locations() fills missing location data where possible.
Example:
import ast
tree = ast.Module(
body=[
ast.Assign(
targets=[ast.Name(id="x", ctx=ast.Store())],
value=ast.Constant(value=1),
)
],
type_ignores=[],
)
tree = ast.fix_missing_locations(tree)
code = compile(tree, "<ast>", "exec")
exec(code)20.19 AST Validation
Before compiling an AST, CPython validates it.
Validation catches malformed trees such as:
Name with no context
assignment target with Load context
expression where statement is required
statement where expression is required
invalid source location fields
invalid operator node placement
missing required fieldsExample of an invalid AST:
import ast
tree = ast.Module(
body=[
ast.Assign(
targets=[ast.Name(id="x", ctx=ast.Load())],
value=ast.Constant(value=1),
)
],
type_ignores=[],
)
tree = ast.fix_missing_locations(tree)
compile(tree, "<ast>", "exec")The target x has Load context, but assignment requires Store. CPython rejects this.
This matters because the public ast module allows users to construct trees manually. The compiler must not assume that every user-created AST is valid.
20.20 AST Transformations
AST tools can modify code before compilation.
Example: replace every integer constant 1 with 42.
import ast
class ReplaceOne(ast.NodeTransformer):
def visit_Constant(self, node):
if node.value == 1:
return ast.copy_location(ast.Constant(value=42), node)
return node
tree = ast.parse("x = 1\n")
tree = ReplaceOne().visit(tree)
tree = ast.fix_missing_locations(tree)
code = compile(tree, "<ast>", "exec")
ns = {}
exec(code, ns)
print(ns["x"])This prints:
42AST transformation is powerful but constrained. The transformed tree must still obey AST validity rules.
20.21 AST vs Bytecode
AST and bytecode represent different levels.
For this source:
x = 1 + 2AST:
Assign
Name "x", Store
BinOp
Constant 1
Add
Constant 2Bytecode may look like:
LOAD_CONST 3
STORE_NAME x
RETURN_CONST NoneThe bytecode may already contain optimized constants. It also contains execution instructions, not source syntax.
Comparison:
| Layer | Represents | Preserves |
|---|---|---|
| Tokens | Lexical source units | spelling, comments in public tokenizer, positions |
| AST | Syntax structure | statements, expressions, contexts, positions |
| Bytecode | Execution plan | stack operations, constants, names, jumps |
The AST is closer to source than bytecode, but less exact than tokens.
20.22 AST and Symbol Tables
The symbol table pass reads the AST to classify names.
Example:
x = 1
def f():
return xThe AST says:
module assigns x
function f loads xThe symbol table decides:
x is global from inside f
f is local to module scopeExample with closure:
def outer():
x = 1
def inner():
return x
return innerThe AST records nested functions and name usage. The symbol table determines that:
x is local to outer
x is free in inner
x must be stored in a closure cellThis is not purely parsing. It requires scope analysis over the AST.
20.23 AST and Compiler Generation
The compiler walks the AST and emits bytecode.
Simplified model:
compile Module:
compile each statement
compile Assign:
compile right-hand expression
compile each target in Store context
compile BinOp:
compile left expression
compile right expression
emit binary operation instruction
compile If:
compile test
emit conditional jump
compile body
emit jump over else
compile else bodyThe compiler is a tree walker with control-flow and stack-management logic.
It depends on AST structure being correct. A malformed AST could cause wrong bytecode, crashes, or invalid runtime state, which is why validation exists.
20.24 CPython AST Source Files
Important CPython areas include:
Parser/Python.asdl
Python/Python-ast.c
Python/ast.c
Python/ast_opt.c
Python/symtable.c
Python/compile.c
Lib/ast.pyThe ASDL file defines the AST node schema. Generated C code provides AST structures and constructors. Parser and AST code build nodes from grammar matches. The optimizer and compiler consume the tree.
Conceptual roles:
| Area | Role |
|---|---|
| ASDL schema | Defines AST node types and fields |
| Parser actions | Create AST nodes from grammar matches |
| AST validation | Reject malformed trees |
| AST optimizer | Performs tree-level simplifications |
| Symbol table | Classifies names and scopes |
| Compiler | Emits bytecode from AST |
Lib/ast.py | Python-level AST helpers |
20.25 Minimal Mental Model
Use this model:
The AST is CPython’s structured syntax tree.
It is built by the parser.
It removes most formatting details.
It records statements, expressions, operators, contexts, and source locations.
It separates syntax structure from later semantic analysis.
The symbol table pass reads the AST to classify names.
The compiler walks the AST to emit bytecode.The AST is the central handoff object between syntax and execution.