# 20. The AST

# 20. The AST

The abstract syntax tree, usually called the AST, is the structured representation of Python source code after parsing.

The tokenizer produces a flat stream of tokens. The parser recognizes grammar. The AST records the result as a tree of statements and expressions.

For this source:

```python
x = 1 + 2
```

the AST is shaped like this:

```text
Module
    Assign
        targets:
            Name id="x", ctx=Store
        value:
            BinOp
                left: Constant value=1
                op: Add
                right: Constant value=2
```

The AST discards many surface details. It does not preserve most whitespace, comments, or exact formatting. It keeps the structure needed for semantic analysis and bytecode generation.

## 20.1 Position in the Compilation Pipeline

The AST sits between parsing and compiler analysis.

```text
source bytes
    ↓
tokenization
    ↓
parsing
    ↓
AST
    ↓
AST validation
    ↓
symbol table
    ↓
compiler
    ↓
code object
    ↓
bytecode execution
```

The parser builds the AST. Later phases consume it.

The AST answers questions such as:

```text
What statements are in this module?
What expression is assigned to this name?
What is the body of this function?
What names are loaded, stored, or deleted?
What is the test expression of this if statement?
What is the element expression of this comprehension?
What source position produced this node?
```

It does not answer questions such as:

```text
Is this name local or global?
What bytecode should be emitted?
What stack size is needed?
What constants should be placed in co_consts?
What exception table entries are needed?
```

Those belong to later compiler stages.

## 20.2 Concrete Syntax Tree vs Abstract Syntax Tree

A parser can produce a concrete syntax tree or an abstract syntax tree.

A concrete syntax tree preserves grammar detail. It represents the source closely.

An abstract syntax tree preserves semantic structure. It removes details that no later compiler phase needs.

For example:

```python
x = (((1)))
```

The concrete syntax contains three pairs of parentheses.

The AST can represent this simply as:

```text
Assign
    target: Name "x"
    value: Constant 1
```

The parentheses affected parsing, but they do not create runtime behavior by themselves.

Likewise:

```python
x = 1 + 2
```

The AST keeps the addition because it affects runtime behavior.

```text
BinOp
    left: Constant 1
    op: Add
    right: Constant 2
```

The AST is therefore abstract in the practical compiler sense. It keeps operations, not all spelling.

## 20.3 Python-Level AST Access

Python exposes AST functionality through the `ast` module.

Example:

```python
import ast

tree = ast.parse("x = 1 + 2\n")
print(ast.dump(tree, indent=4))
```

Typical output:

```text
Module(
    body=[
        Assign(
            targets=[
                Name(id='x', ctx=Store())],
            value=BinOp(
                left=Constant(value=1),
                op=Add(),
                right=Constant(value=2)))],
    type_ignores=[])
```

This is the same general structure CPython uses internally, exposed as Python objects.

The `ast` module is useful for:

```text
static analysis
code generation
linting
refactoring
macro-like source transforms
security review tools
documentation tools
teaching compiler structure
```

But the public AST API is still an abstraction. CPython’s internal C representation and public Python AST classes are related, but they are not the same physical objects.

## 20.4 Module Nodes

The root of a normal Python file is a `Module`.

Example:

```python
x = 1
print(x)
```

AST shape:

```text
Module
    body:
        Assign
        Expr
    type_ignores:
        []
```

The `body` is an ordered list of statements.

At module level, the compiler emits bytecode that executes statements from top to bottom.

There are different root node types depending on parse mode:

| Parse mode         | Root node      | Example              |
| ------------------ | -------------- | -------------------- |
| `exec`             | `Module`       | `x = 1`              |
| `eval`             | `Expression`   | `1 + 2`              |
| `single`           | `Interactive`  | REPL input           |
| function type mode | `FunctionType` | legacy type comments |

Example:

```python
import ast

print(type(ast.parse("x = 1", mode="exec")).__name__)
print(type(ast.parse("1 + 2", mode="eval")).__name__)
print(type(ast.parse("print(1)", mode="single")).__name__)
```

The grammar start mode changes the root AST shape.

## 20.5 Statement Nodes

Statements represent actions.

Common statement nodes include:

| Node               | Source example           |
| ------------------ | ------------------------ |
| `Assign`           | `x = 1`                  |
| `AnnAssign`        | `x: int = 1`             |
| `AugAssign`        | `x += 1`                 |
| `Return`           | `return x`               |
| `If`               | `if x: ...`              |
| `For`              | `for x in xs: ...`       |
| `While`            | `while x: ...`           |
| `FunctionDef`      | `def f(): ...`           |
| `AsyncFunctionDef` | `async def f(): ...`     |
| `ClassDef`         | `class C: ...`           |
| `Try`              | `try: ... except: ...`   |
| `With`             | `with open(p) as f: ...` |
| `Import`           | `import os`              |
| `ImportFrom`       | `from os import path`    |
| `Expr`             | `f()` as a statement     |
| `Pass`             | `pass`                   |
| `Break`            | `break`                  |
| `Continue`         | `continue`               |
| `Raise`            | `raise ValueError()`     |
| `Assert`           | `assert x`               |
| `Delete`           | `del x`                  |
| `Global`           | `global x`               |
| `Nonlocal`         | `nonlocal x`             |

A statement node appears where the grammar expects a statement. Statement nodes usually live in a `body`, `orelse`, `finalbody`, or similar list.

Example:

```python
if x:
    a = 1
else:
    a = 2
```

AST shape:

```text
If
    test: Name "x", Load
    body:
        Assign a = 1
    orelse:
        Assign a = 2
```

The `else` block is represented by the `orelse` field.

## 20.6 Expression Nodes

Expressions produce values.

Common expression nodes include:

| Node           | Source example     |
| -------------- | ------------------ |
| `Constant`     | `1`, `"x"`, `None` |
| `Name`         | `x`                |
| `BinOp`        | `a + b`            |
| `UnaryOp`      | `-x`               |
| `BoolOp`       | `a and b`          |
| `Compare`      | `a < b`            |
| `Call`         | `f(x)`             |
| `Attribute`    | `obj.name`         |
| `Subscript`    | `xs[i]`            |
| `List`         | `[1, 2]`           |
| `Tuple`        | `(1, 2)`           |
| `Dict`         | `{"a": 1}`         |
| `Set`          | `{1, 2}`           |
| `ListComp`     | `[x for x in xs]`  |
| `GeneratorExp` | `(x for x in xs)`  |
| `Lambda`       | `lambda x: x + 1`  |
| `IfExp`        | `a if cond else b` |
| `Yield`        | `yield x`          |
| `Await`        | `await task`       |
| `NamedExpr`    | `(x := f())`       |

Example:

```python
result = obj.method(x, y=2)
```

AST shape:

```text
Assign
    target:
        Name "result", Store
    value:
        Call
            func:
                Attribute
                    value: Name "obj", Load
                    attr: "method"
                    ctx: Load
            args:
                Name "x", Load
            keywords:
                keyword arg="y", value=Constant 2
```

The AST records the call target, positional arguments, keyword arguments, and attribute access.

## 20.7 Expression Contexts

Some expression nodes carry context.

The most important contexts are:

| Context | Meaning                    |
| ------- | -------------------------- |
| `Load`  | Read a value               |
| `Store` | Assign to a target         |
| `Del`   | Delete a binding or target |

Example:

```python
x = x + 1
```

AST shape:

```text
Assign
    target:
        Name "x", Store
    value:
        BinOp
            left: Name "x", Load
            op: Add
            right: Constant 1
```

The same spelling, `x`, appears twice with different meaning.

Example with attributes:

```python
obj.value = obj.value + 1
```

AST shape:

```text
Assign
    target:
        Attribute obj.value, Store
    value:
        BinOp
            left: Attribute obj.value, Load
            op: Add
            right: Constant 1
```

Context is essential for code generation. Loading a name, storing a name, and deleting a name use different bytecode instructions.

## 20.8 Constants

Modern Python AST uses `Constant` for many literal values.

Examples:

```python
1
3.14
"hello"
b"bytes"
None
True
False
...
```

Each can appear as:

```text
Constant(value=...)
```

Container literals are separate nodes:

```python
[1, 2]
(1, 2)
{"a": 1}
{1, 2}
```

AST shape:

```text
List
    elts:
        Constant 1
        Constant 2

Tuple
    elts:
        Constant 1
        Constant 2

Dict
    keys:
        Constant "a"
    values:
        Constant 1

Set
    elts:
        Constant 1
        Constant 2
```

The compiler may later fold some constants. For example:

```python
x = 1 + 2
```

can compile to a constant `3` in bytecode depending on optimization rules. But the AST still records a binary operation before optimization.

## 20.9 Operators as AST Nodes

Operators are represented as small AST node objects.

Examples:

| Source   | AST operator |
| -------- | ------------ |
| `a + b`  | `Add`        |
| `a - b`  | `Sub`        |
| `a * b`  | `Mult`       |
| `a / b`  | `Div`        |
| `a // b` | `FloorDiv`   |
| `a % b`  | `Mod`        |
| `a ** b` | `Pow`        |
| `a @ b`  | `MatMult`    |
| `a << b` | `LShift`     |
| `a >> b` | `RShift`     |
| `a \| b` | `BitOr`      |
| `a & b`  | `BitAnd`     |
| `a ^ b`  | `BitXor`     |

Example:

```python
x = a + b
```

AST:

```text
BinOp
    left: Name "a"
    op: Add
    right: Name "b"
```

The operator is not stored as a string. It is represented structurally.

This makes downstream compiler code simpler. It can switch on operator node type rather than parse textual operators again.

## 20.10 Comparisons

Comparisons use a special AST shape because Python supports chained comparison.

Example:

```python
a < b < c
```

This means:

```python
a < b and b < c
```

except that `b` is evaluated once.

AST shape:

```text
Compare
    left: Name "a"
    ops:
        Lt
        Lt
    comparators:
        Name "b"
        Name "c"
```

This differs from binary operations.

A binary expression like:

```python
a + b + c
```

is nested:

```text
BinOp
    left:
        BinOp
            left: a
            op: Add
            right: b
    op: Add
    right: c
```

A chained comparison is represented as one `Compare` node with multiple operators and comparators.

## 20.11 Boolean Operations

Boolean operations also have a distinct AST shape.

Example:

```python
a and b and c
```

AST shape:

```text
BoolOp
    op: And
    values:
        Name "a"
        Name "b"
        Name "c"
```

Similarly:

```python
a or b or c
```

uses `BoolOp` with `Or`.

This shape preserves short-circuit structure. The compiler emits bytecode that evaluates operands left to right and stops when the result is known.

Boolean operations are not ordinary function calls. They have special evaluation rules.

## 20.12 Function Definitions

A function definition is represented by `FunctionDef`.

Example:

```python
def add(a: int, b: int = 0) -> int:
    return a + b
```

AST shape:

```text
FunctionDef
    name: "add"
    args:
        arguments
            args:
                arg "a", annotation: Name "int"
                arg "b", annotation: Name "int"
            defaults:
                Constant 0
    returns:
        Name "int"
    body:
        Return
            BinOp
                Name "a"
                Add
                Name "b"
    decorator_list:
        []
```

A function definition node stores:

```text
function name
argument structure
body statements
decorators
return annotation
type comment
source location
```

The function body is not executed while parsing. It becomes part of the function’s code object later. At runtime, executing the `def` statement creates a function object.

## 20.13 Arguments

Function arguments are represented by an `arguments` node.

It must support all Python parameter forms:

```python
def f(a, b=1, /, c=2, *args, d, e=3, **kwargs):
    pass
```

The AST must distinguish:

```text
positional-only parameters
positional-or-keyword parameters
varargs parameter
keyword-only parameters
keyword-only defaults
kwargs parameter
defaults
annotations
```

This structure is more detailed than a simple list of names.

The compiler later uses it to build a code object with the correct argument counts and flags.

## 20.14 Class Definitions

A class definition is represented by `ClassDef`.

Example:

```python
class Point(Base):
    x = 0
    y = 0
```

AST shape:

```text
ClassDef
    name: "Point"
    bases:
        Name "Base"
    keywords:
        []
    body:
        Assign x = 0
        Assign y = 0
    decorator_list:
        []
```

A class body is compiled as an executable block. At runtime, CPython executes the class body to produce a namespace, then calls the metaclass machinery to create the class object.

The AST only records the class definition structure. It does not build the class.

## 20.15 Comprehensions

Comprehensions have dedicated AST nodes.

Examples:

```python
[x for x in xs]
{x for x in xs}
{k: v for k, v in pairs}
(x for x in xs)
```

AST node types:

| Source form          | AST node       |
| -------------------- | -------------- |
| List comprehension   | `ListComp`     |
| Set comprehension    | `SetComp`      |
| Dict comprehension   | `DictComp`     |
| Generator expression | `GeneratorExp` |

Each uses one or more `comprehension` nodes.

Example:

```python
[x * x for x in xs if x > 0]
```

AST shape:

```text
ListComp
    elt:
        BinOp x * x
    generators:
        comprehension
            target: Name "x", Store
            iter: Name "xs", Load
            ifs:
                Compare x > 0
            is_async: 0
```

Comprehensions are expression nodes, but they introduce local binding behavior. Later compiler phases handle their scope carefully.

## 20.16 Pattern Matching AST

Python pattern matching uses dedicated AST nodes.

Example:

```python
match value:
    case 0:
        result = "zero"
    case [x, y]:
        result = x + y
```

AST shape:

```text
Match
    subject:
        Name "value"
    cases:
        match_case
            pattern: MatchValue Constant 0
            body:
                Assign result = "zero"
        match_case
            pattern: MatchSequence
                MatchAs name="x"
                MatchAs name="y"
            body:
                Assign result = x + y
```

Pattern nodes are separate from expression nodes because pattern syntax has different meaning.

For example:

```python
case x:
```

does not load a variable named `x`. It captures a value into the name `x`.

That is why pattern matching cannot reuse ordinary expression AST nodes everywhere. It needs pattern-specific structure.

## 20.17 Async AST Nodes

Async syntax also appears in the AST.

Examples:

```python
async def fetch():
    await client.get(url)
```

AST shape:

```text
AsyncFunctionDef
    name: "fetch"
    body:
        Expr
            Await
                Call
                    Attribute client.get
```

Async loops and context managers also have dedicated forms:

```python
async for item in stream:
    process(item)

async with lock:
    run()
```

AST nodes:

```text
AsyncFor
AsyncWith
Await
AsyncFunctionDef
```

These nodes allow later compiler phases to generate coroutine-aware bytecode.

## 20.18 Source Locations

AST nodes carry source positions.

Common fields include:

```text
lineno
col_offset
end_lineno
end_col_offset
```

Example:

```python
import ast

tree = ast.parse("x = 1 + 2\n")
assign = tree.body[0]

print(assign.lineno)
print(assign.col_offset)
print(assign.end_lineno)
print(assign.end_col_offset)
```

Source locations are used by:

```text
tracebacks
syntax errors
debuggers
profilers
coverage tools
AST transformers
editor tooling
```

When generating AST manually, missing source locations can cause compilation errors. The helper `ast.fix_missing_locations()` fills missing location data where possible.

Example:

```python
import ast

tree = ast.Module(
    body=[
        ast.Assign(
            targets=[ast.Name(id="x", ctx=ast.Store())],
            value=ast.Constant(value=1),
        )
    ],
    type_ignores=[],
)

tree = ast.fix_missing_locations(tree)
code = compile(tree, "<ast>", "exec")
exec(code)
```

## 20.19 AST Validation

Before compiling an AST, CPython validates it.

Validation catches malformed trees such as:

```text
Name with no context
assignment target with Load context
expression where statement is required
statement where expression is required
invalid source location fields
invalid operator node placement
missing required fields
```

Example of an invalid AST:

```python
import ast

tree = ast.Module(
    body=[
        ast.Assign(
            targets=[ast.Name(id="x", ctx=ast.Load())],
            value=ast.Constant(value=1),
        )
    ],
    type_ignores=[],
)

tree = ast.fix_missing_locations(tree)
compile(tree, "<ast>", "exec")
```

The target `x` has `Load` context, but assignment requires `Store`. CPython rejects this.

This matters because the public `ast` module allows users to construct trees manually. The compiler must not assume that every user-created AST is valid.

## 20.20 AST Transformations

AST tools can modify code before compilation.

Example: replace every integer constant `1` with `42`.

```python
import ast

class ReplaceOne(ast.NodeTransformer):
    def visit_Constant(self, node):
        if node.value == 1:
            return ast.copy_location(ast.Constant(value=42), node)
        return node

tree = ast.parse("x = 1\n")
tree = ReplaceOne().visit(tree)
tree = ast.fix_missing_locations(tree)

code = compile(tree, "<ast>", "exec")
ns = {}
exec(code, ns)

print(ns["x"])
```

This prints:

```text
42
```

AST transformation is powerful but constrained. The transformed tree must still obey AST validity rules.

## 20.21 AST vs Bytecode

AST and bytecode represent different levels.

For this source:

```python
x = 1 + 2
```

AST:

```text
Assign
    Name "x", Store
    BinOp
        Constant 1
        Add
        Constant 2
```

Bytecode may look like:

```text
LOAD_CONST 3
STORE_NAME x
RETURN_CONST None
```

The bytecode may already contain optimized constants. It also contains execution instructions, not source syntax.

Comparison:

| Layer    | Represents           | Preserves                                         |
| -------- | -------------------- | ------------------------------------------------- |
| Tokens   | Lexical source units | spelling, comments in public tokenizer, positions |
| AST      | Syntax structure     | statements, expressions, contexts, positions      |
| Bytecode | Execution plan       | stack operations, constants, names, jumps         |

The AST is closer to source than bytecode, but less exact than tokens.

## 20.22 AST and Symbol Tables

The symbol table pass reads the AST to classify names.

Example:

```python
x = 1

def f():
    return x
```

The AST says:

```text
module assigns x
function f loads x
```

The symbol table decides:

```text
x is global from inside f
f is local to module scope
```

Example with closure:

```python
def outer():
    x = 1

    def inner():
        return x

    return inner
```

The AST records nested functions and name usage. The symbol table determines that:

```text
x is local to outer
x is free in inner
x must be stored in a closure cell
```

This is not purely parsing. It requires scope analysis over the AST.

## 20.23 AST and Compiler Generation

The compiler walks the AST and emits bytecode.

Simplified model:

```text
compile Module:
    compile each statement

compile Assign:
    compile right-hand expression
    compile each target in Store context

compile BinOp:
    compile left expression
    compile right expression
    emit binary operation instruction

compile If:
    compile test
    emit conditional jump
    compile body
    emit jump over else
    compile else body
```

The compiler is a tree walker with control-flow and stack-management logic.

It depends on AST structure being correct. A malformed AST could cause wrong bytecode, crashes, or invalid runtime state, which is why validation exists.

## 20.24 CPython AST Source Files

Important CPython areas include:

```text
Parser/Python.asdl
Python/Python-ast.c
Python/ast.c
Python/ast_opt.c
Python/symtable.c
Python/compile.c
Lib/ast.py
```

The ASDL file defines the AST node schema. Generated C code provides AST structures and constructors. Parser and AST code build nodes from grammar matches. The optimizer and compiler consume the tree.

Conceptual roles:

| Area           | Role                                  |
| -------------- | ------------------------------------- |
| ASDL schema    | Defines AST node types and fields     |
| Parser actions | Create AST nodes from grammar matches |
| AST validation | Reject malformed trees                |
| AST optimizer  | Performs tree-level simplifications   |
| Symbol table   | Classifies names and scopes           |
| Compiler       | Emits bytecode from AST               |
| `Lib/ast.py`   | Python-level AST helpers              |

## 20.25 Minimal Mental Model

Use this model:

```text
The AST is CPython’s structured syntax tree.
It is built by the parser.
It removes most formatting details.
It records statements, expressions, operators, contexts, and source locations.
It separates syntax structure from later semantic analysis.
The symbol table pass reads the AST to classify names.
The compiler walks the AST to emit bytecode.
```

The AST is the central handoff object between syntax and execution.

