Parsing expressions converts a sequence of tokens into a structured form that reflects operator precedence and associativity.
Parsing expressions converts a sequence of tokens into a structured form that reflects operator precedence and associativity. The result is typically an abstract syntax tree (AST) or a computed value. This section presents a practical method, the shunting-yard algorithm, followed by a direct recursive descent approach.
Problem
Given an expression:
3 + 4 * (2 - 1)Produce either:
- a value:
7 - or a structured representation such as an AST
The parser must respect precedence:
* and / bind tighter than + and -and associativity:
left-associative: a - b - c means (a - b) - cToken Assumptions
Assume the tokenizer (Section 2.11) produces tokens:
number, operator, left paren, right parenWe use these token kinds:
type TokenKind int
const (
Number TokenKind = iota
Operator
LParen
RParen
)Each token has a Text field such as "3" or "+".
Operator Table
Define precedence and associativity.
type OpInfo struct {
Prec int
Left bool
}
var ops = map[string]OpInfo{
"+": {Prec: 1, Left: true},
"-": {Prec: 1, Left: true},
"*": {Prec: 2, Left: true},
"/": {Prec: 2, Left: true},
}Shunting-Yard Algorithm
Convert infix tokens into postfix (Reverse Polish Notation, RPN), then evaluate.
Idea
Maintain two structures:
- output list
- operator stack
Process tokens from left to right.
Implementation
func ToRPN(tokens []Token) []Token {
var out []Token
var stack []Token
for _, tok := range tokens {
switch tok.Kind {
case Number:
out = append(out, tok)
case Operator:
for len(stack) > 0 {
top := stack[len(stack)-1]
if top.Kind != Operator {
break
}
o1 := ops[tok.Text]
o2 := ops[top.Text]
if (o1.Left && o1.Prec <= o2.Prec) ||
(!o1.Left && o1.Prec < o2.Prec) {
out = append(out, top)
stack = stack[:len(stack)-1]
continue
}
break
}
stack = append(stack, tok)
case LParen:
stack = append(stack, tok)
case RParen:
for len(stack) > 0 && stack[len(stack)-1].Kind != LParen {
out = append(out, stack[len(stack)-1])
stack = stack[:len(stack)-1]
}
if len(stack) > 0 {
stack = stack[:len(stack)-1]
}
}
}
for len(stack) > 0 {
out = append(out, stack[len(stack)-1])
stack = stack[:len(stack)-1]
}
return out
}Invariant
The output list is a valid postfix form of all tokens processed so far
The stack contains operators not yet emitted, ordered by precedence rulesEvaluating RPN
func EvalRPN(tokens []Token) int {
var stack []int
for _, tok := range tokens {
if tok.Kind == Number {
v, _ := strconv.Atoi(tok.Text)
stack = append(stack, v)
continue
}
b := stack[len(stack)-1]
a := stack[len(stack)-2]
stack = stack[:len(stack)-2]
var res int
switch tok.Text {
case "+":
res = a + b
case "-":
res = a - b
case "*":
res = a * b
case "/":
res = a / b
}
stack = append(stack, res)
}
return stack[0]
}Requires:
import "strconv"Invariant:
Stack contains values of partially evaluated subexpressionsExample
Input:
3 + 4 * (2 - 1)RPN:
3 4 2 1 - * +Evaluation:
= 7Recursive Descent Parsing
An alternative is direct parsing using grammar rules.
Grammar:
expr := term { (+|-) term }
term := factor { (*|/) factor }
factor := number | "(" expr ")"Implementation
type Parser struct {
tokens []Token
pos int
}
func (p *Parser) peek() Token {
if p.pos >= len(p.tokens) {
return Token{}
}
return p.tokens[p.pos]
}
func (p *Parser) next() Token {
t := p.peek()
p.pos++
return t
}
func (p *Parser) parseExpr() int {
val := p.parseTerm()
for {
tok := p.peek()
if tok.Kind == Operator && (tok.Text == "+" || tok.Text == "-") {
p.next()
right := p.parseTerm()
if tok.Text == "+" {
val += right
} else {
val -= right
}
} else {
break
}
}
return val
}
func (p *Parser) parseTerm() int {
val := p.parseFactor()
for {
tok := p.peek()
if tok.Kind == Operator && (tok.Text == "*" || tok.Text == "/") {
p.next()
right := p.parseFactor()
if tok.Text == "*" {
val *= right
} else {
val /= right
}
} else {
break
}
}
return val
}
func (p *Parser) parseFactor() int {
tok := p.next()
if tok.Kind == Number {
v, _ := strconv.Atoi(tok.Text)
return v
}
if tok.Kind == LParen {
val := p.parseExpr()
p.next() // consume ')'
return val
}
panic("invalid input")
}Invariant:
Each parse function consumes tokens belonging to its grammar rule and returns the correct valueComplexity
Time: O(n)
Space: O(n) for stacks or recursionEach token is processed a constant number of times.
Common Pitfalls
Forgetting operator precedence leads to incorrect evaluation.
Not handling associativity correctly breaks expressions like a - b - c.
Missing parentheses handling causes stack underflow or incorrect grouping.
Using integer division without considering truncation rules may produce unintended results.
Not validating input can cause panics or incorrect parsing.
Design Pattern
Expression parsing follows:
- Define grammar
- Encode precedence
- Maintain structure (stack or recursion)
- Ensure each token is consumed exactly once
Takeaway
Parsing expressions transforms flat token sequences into structured meaning. The shunting-yard algorithm separates parsing from evaluation and works well for simple calculators. Recursive descent maps directly to grammar rules and is easier to extend. The critical elements are precedence, associativity, and correct token consumption.