Skip to content

2.19 Parsing Expressions

Parsing expressions converts a sequence of tokens into a structured form that reflects operator precedence and associativity.

Parsing expressions converts a sequence of tokens into a structured form that reflects operator precedence and associativity. The result is typically an abstract syntax tree (AST) or a computed value. This section presents a practical method, the shunting-yard algorithm, followed by a direct recursive descent approach.

Problem

Given an expression:

3 + 4 * (2 - 1)

Produce either:

  • a value: 7
  • or a structured representation such as an AST

The parser must respect precedence:

* and / bind tighter than + and -

and associativity:

left-associative: a - b - c means (a - b) - c

Token Assumptions

Assume the tokenizer (Section 2.11) produces tokens:

number, operator, left paren, right paren

We use these token kinds:

type TokenKind int

const (
	Number TokenKind = iota
	Operator
	LParen
	RParen
)

Each token has a Text field such as "3" or "+".

Operator Table

Define precedence and associativity.

type OpInfo struct {
	Prec int
	Left bool
}

var ops = map[string]OpInfo{
	"+": {Prec: 1, Left: true},
	"-": {Prec: 1, Left: true},
	"*": {Prec: 2, Left: true},
	"/": {Prec: 2, Left: true},
}

Shunting-Yard Algorithm

Convert infix tokens into postfix (Reverse Polish Notation, RPN), then evaluate.

Idea

Maintain two structures:

  • output list
  • operator stack

Process tokens from left to right.

Implementation

func ToRPN(tokens []Token) []Token {
	var out []Token
	var stack []Token

	for _, tok := range tokens {
		switch tok.Kind {

		case Number:
			out = append(out, tok)

		case Operator:
			for len(stack) > 0 {
				top := stack[len(stack)-1]

				if top.Kind != Operator {
					break
				}

				o1 := ops[tok.Text]
				o2 := ops[top.Text]

				if (o1.Left && o1.Prec <= o2.Prec) ||
					(!o1.Left && o1.Prec < o2.Prec) {

					out = append(out, top)
					stack = stack[:len(stack)-1]
					continue
				}
				break
			}
			stack = append(stack, tok)

		case LParen:
			stack = append(stack, tok)

		case RParen:
			for len(stack) > 0 && stack[len(stack)-1].Kind != LParen {
				out = append(out, stack[len(stack)-1])
				stack = stack[:len(stack)-1]
			}
			if len(stack) > 0 {
				stack = stack[:len(stack)-1]
			}
		}
	}

	for len(stack) > 0 {
		out = append(out, stack[len(stack)-1])
		stack = stack[:len(stack)-1]
	}

	return out
}

Invariant

The output list is a valid postfix form of all tokens processed so far
The stack contains operators not yet emitted, ordered by precedence rules

Evaluating RPN

func EvalRPN(tokens []Token) int {
	var stack []int

	for _, tok := range tokens {
		if tok.Kind == Number {
			v, _ := strconv.Atoi(tok.Text)
			stack = append(stack, v)
			continue
		}

		b := stack[len(stack)-1]
		a := stack[len(stack)-2]
		stack = stack[:len(stack)-2]

		var res int

		switch tok.Text {
		case "+":
			res = a + b
		case "-":
			res = a - b
		case "*":
			res = a * b
		case "/":
			res = a / b
		}

		stack = append(stack, res)
	}

	return stack[0]
}

Requires:

import "strconv"

Invariant:

Stack contains values of partially evaluated subexpressions

Example

Input:

3 + 4 * (2 - 1)

RPN:

3 4 2 1 - * +

Evaluation:

= 7

Recursive Descent Parsing

An alternative is direct parsing using grammar rules.

Grammar:

expr   := term { (+|-) term }
term   := factor { (*|/) factor }
factor := number | "(" expr ")"

Implementation

type Parser struct {
	tokens []Token
	pos    int
}

func (p *Parser) peek() Token {
	if p.pos >= len(p.tokens) {
		return Token{}
	}
	return p.tokens[p.pos]
}

func (p *Parser) next() Token {
	t := p.peek()
	p.pos++
	return t
}

func (p *Parser) parseExpr() int {
	val := p.parseTerm()

	for {
		tok := p.peek()
		if tok.Kind == Operator && (tok.Text == "+" || tok.Text == "-") {
			p.next()
			right := p.parseTerm()

			if tok.Text == "+" {
				val += right
			} else {
				val -= right
			}
		} else {
			break
		}
	}

	return val
}

func (p *Parser) parseTerm() int {
	val := p.parseFactor()

	for {
		tok := p.peek()
		if tok.Kind == Operator && (tok.Text == "*" || tok.Text == "/") {
			p.next()
			right := p.parseFactor()

			if tok.Text == "*" {
				val *= right
			} else {
				val /= right
			}
		} else {
			break
		}
	}

	return val
}

func (p *Parser) parseFactor() int {
	tok := p.next()

	if tok.Kind == Number {
		v, _ := strconv.Atoi(tok.Text)
		return v
	}

	if tok.Kind == LParen {
		val := p.parseExpr()
		p.next() // consume ')'
		return val
	}

	panic("invalid input")
}

Invariant:

Each parse function consumes tokens belonging to its grammar rule and returns the correct value

Complexity

Time:  O(n)
Space: O(n) for stacks or recursion

Each token is processed a constant number of times.

Common Pitfalls

Forgetting operator precedence leads to incorrect evaluation.

Not handling associativity correctly breaks expressions like a - b - c.

Missing parentheses handling causes stack underflow or incorrect grouping.

Using integer division without considering truncation rules may produce unintended results.

Not validating input can cause panics or incorrect parsing.

Design Pattern

Expression parsing follows:

  1. Define grammar
  2. Encode precedence
  3. Maintain structure (stack or recursion)
  4. Ensure each token is consumed exactly once

Takeaway

Parsing expressions transforms flat token sequences into structured meaning. The shunting-yard algorithm separates parsing from evaluation and works well for simple calculators. Recursive descent maps directly to grammar rules and is easier to extend. The critical elements are precedence, associativity, and correct token consumption.