# 2.19 Parsing Expressions

Parsing expressions converts a sequence of tokens into a structured form that reflects operator precedence and associativity. The result is typically an abstract syntax tree (AST) or a computed value. This section presents a practical method, the shunting-yard algorithm, followed by a direct recursive descent approach.

## Problem

Given an expression:

```text id="a1yq3r"
3 + 4 * (2 - 1)
```

Produce either:

* a value: `7`
* or a structured representation such as an AST

The parser must respect precedence:

```text id="c8h6sz"
* and / bind tighter than + and -
```

and associativity:

```text id="g5l2x9"
left-associative: a - b - c means (a - b) - c
```
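
Associativity only matters for non-commutative operators. Grouping `10 - 4 - 3` both ways makes the difference concrete (the numbers are illustrative):

```go
package main

import "fmt"

func main() {
	fmt.Println((10 - 4) - 3) // left-associative grouping: 3
	fmt.Println(10 - (4 - 3)) // right-associative grouping: 9
}
```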

## Token Assumptions

Assume the tokenizer (Section 2.11) produces tokens:

```text id="q0v4kf"
number, operator, left paren, right paren
```

We use these token kinds:

```go id="n4s1av"
type TokenKind int

const (
	Number TokenKind = iota
	Operator
	LParen
	RParen
)

// Token pairs a kind with its source text.
type Token struct {
	Kind TokenKind
	Text string
}
```

Each token has a `Text` field such as `"3"` or `"+"`.

## Operator Table

Define precedence and associativity.

```go id="u7g1xk"
type OpInfo struct {
	Prec int
	Left bool
}

var ops = map[string]OpInfo{
	"+": {Prec: 1, Left: true},
	"-": {Prec: 1, Left: true},
	"*": {Prec: 2, Left: true},
	"/": {Prec: 2, Left: true},
}
```

## Shunting-Yard Algorithm

Convert infix tokens into postfix (Reverse Polish Notation, RPN), then evaluate.

### Idea

Maintain two structures:

* output list
* operator stack

Process tokens from left to right.

### Implementation

```go id="1n9o0s"
func ToRPN(tokens []Token) []Token {
	var out []Token
	var stack []Token

	for _, tok := range tokens {
		switch tok.Kind {

		case Number:
			out = append(out, tok)

		case Operator:
			for len(stack) > 0 {
				top := stack[len(stack)-1]

				if top.Kind != Operator {
					break
				}

				o1 := ops[tok.Text]
				o2 := ops[top.Text]

				if (o1.Left && o1.Prec <= o2.Prec) ||
					(!o1.Left && o1.Prec < o2.Prec) {

					out = append(out, top)
					stack = stack[:len(stack)-1]
					continue
				}
				break
			}
			stack = append(stack, tok)

		case LParen:
			stack = append(stack, tok)

		case RParen:
			for len(stack) > 0 && stack[len(stack)-1].Kind != LParen {
				out = append(out, stack[len(stack)-1])
				stack = stack[:len(stack)-1]
			}
			if len(stack) > 0 {
				stack = stack[:len(stack)-1] // discard the '('
			}
		}
	}

	for len(stack) > 0 {
		out = append(out, stack[len(stack)-1])
		stack = stack[:len(stack)-1]
	}

	return out
}
```

### Invariant

```text id="x4k7pt"
The output list is a valid postfix form of all tokens processed so far
The stack contains operators not yet emitted, ordered by precedence rules
```

## Evaluating RPN

```go id="s0z1fr"
func EvalRPN(tokens []Token) int {
	var stack []int

	for _, tok := range tokens {
		if tok.Kind == Number {
			v, _ := strconv.Atoi(tok.Text) // tokenizer guarantees numeric text
			stack = append(stack, v)
			continue
		}

		// Pop the two operands; well-formed RPN guarantees they exist.
		b := stack[len(stack)-1]
		a := stack[len(stack)-2]
		stack = stack[:len(stack)-2]

		var res int

		switch tok.Text {
		case "+":
			res = a + b
		case "-":
			res = a - b
		case "*":
			res = a * b
		case "/":
			res = a / b
		}

		stack = append(stack, res)
	}

	return stack[0]
}
```

Requires:

```go id="5c7t8j"
import "strconv"
```

Invariant:

```text id="r6h1vz"
Stack contains values of partially evaluated subexpressions
```

## Example

Input:

```text id="r2q1we"
3 + 4 * (2 - 1)
```

RPN:

```text id="u9x8ka"
3 4 2 1 - * +
```

Evaluation:

```text id="l3p6gf"
= 7
```

## Recursive Descent Parsing

An alternative is direct parsing using grammar rules.

Grammar:

```text id="m7v3ot"
expr   := term { (+|-) term }
term   := factor { (*|/) factor }
factor := number | "(" expr ")"
```

### Implementation

```go id="n3x0zb"
type Parser struct {
	tokens []Token
	pos    int
}

// peek returns the current token, or the zero Token once input is
// exhausted. Note the zero Token's Kind is Number (iota 0), so
// malformed input may parse silently; see Common Pitfalls.
func (p *Parser) peek() Token {
	if p.pos >= len(p.tokens) {
		return Token{}
	}
	return p.tokens[p.pos]
}

func (p *Parser) next() Token {
	t := p.peek()
	p.pos++
	return t
}

func (p *Parser) parseExpr() int {
	val := p.parseTerm()

	for {
		tok := p.peek()
		if tok.Kind == Operator && (tok.Text == "+" || tok.Text == "-") {
			p.next()
			right := p.parseTerm()

			if tok.Text == "+" {
				val += right
			} else {
				val -= right
			}
		} else {
			break
		}
	}

	return val
}

func (p *Parser) parseTerm() int {
	val := p.parseFactor()

	for {
		tok := p.peek()
		if tok.Kind == Operator && (tok.Text == "*" || tok.Text == "/") {
			p.next()
			right := p.parseFactor()

			if tok.Text == "*" {
				val *= right
			} else {
				val /= right
			}
		} else {
			break
		}
	}

	return val
}

func (p *Parser) parseFactor() int {
	tok := p.next()

	if tok.Kind == Number {
		v, _ := strconv.Atoi(tok.Text)
		return v
	}

	if tok.Kind == LParen {
		val := p.parseExpr()
		p.next() // consume ')'; assumes balanced parentheses
		return val
	}

	panic("invalid input")
}
```

Invariant:

```text id="k3r8xt"
Each parse function consumes tokens belonging to its grammar rule and returns the correct value
```

## Complexity

```text id="w0c6ms"
Time:  O(n)
Space: O(n) for stacks or recursion
```

Each token is processed a constant number of times.

## Common Pitfalls

* Ignoring operator precedence leads to incorrect evaluation.
* Mishandling associativity breaks expressions like `a - b - c`.
* Skipping parenthesis handling causes stack underflow or incorrect grouping.
* Integer division in Go truncates toward zero, which may produce unintended results.
* Unvalidated input can cause panics or silently incorrect parses.

## Design Pattern

Expression parsing follows:

1. Define grammar
2. Encode precedence
3. Maintain structure (stack or recursion)
4. Ensure each token is consumed exactly once

## Takeaway

Parsing expressions transforms flat token sequences into structured meaning. The shunting-yard algorithm separates parsing from evaluation and works well for simple calculators. Recursive descent maps directly to grammar rules and is easier to extend. The critical elements are precedence, associativity, and correct token consumption.

