# Writing a Virtual Machine

### Writing a Virtual Machine

A virtual machine is a program that runs another program.

The “machine” part means it has its own instruction format, its own execution rules, and usually its own memory model. The “virtual” part means this machine is implemented in software.

A CPU runs machine code. A virtual machine runs bytecode.

For example, instead of executing native CPU instructions directly, a VM may execute instructions like:

```text
push 1
push 2
add
print
halt
```

A VM reads these instructions one by one and performs the requested operations.

#### Why Write a Virtual Machine

A VM is useful when you want a controlled execution environment.

You might write one for:

```text
a scripting language
a bytecode interpreter
a game engine
a rule engine
a query engine
a calculator language
a teaching language
a sandboxed plugin system
```

A VM lets you separate the language from the hardware. Your compiler or parser can produce bytecode. The VM runs that bytecode.

#### A Stack-Based VM

The simplest VM design is a stack machine.

A stack machine stores temporary values on a stack.

Example program:

```text
push 1
push 2
add
print
halt
```

Execution:

```text
push 1       stack: [1]
push 2       stack: [1, 2]
add          stack: [3]
print        prints 3
halt         stop
```

The `add` instruction removes two values from the stack, adds them, and pushes the result back.

#### Defining Instructions

Start with a small instruction set.

```zig
const OpCode = enum(u8) {
    push_i64,
    add,
    sub,
    mul,
    div,
    print,
    halt,
};
```

Each instruction is one operation the VM understands.

Some instructions need extra data. For example, `push_i64` needs the integer value to push.

We can represent instructions as a tagged union:

```zig
const Instruction = union(OpCode) {
    push_i64: i64,
    add: void,
    sub: void,
    mul: void,
    div: void,
    print: void,
    halt: void,
};
```

Now an instruction can carry data only when it needs data.

#### A Simple Program

Here is bytecode for:

```text
1 + 2
```

```zig
const program = [_]Instruction{
    .{ .push_i64 = 1 },
    .{ .push_i64 = 2 },
    .add,
    .print,
    .halt,
};
```

The VM will execute these instructions in order.

#### The VM State

A VM needs state.

At minimum:

```text
instruction pointer
value stack
running flag
```

The instruction pointer tells the VM which instruction to execute next.

```zig
const std = @import("std");

const Vm = struct {
    instructions: []const Instruction,
    ip: usize,
    stack: std.ArrayList(i64),

    pub fn init(
        allocator: std.mem.Allocator,
        instructions: []const Instruction,
    ) Vm {
        return .{
            .instructions = instructions,
            .ip = 0,
            .stack = std.ArrayList(i64).init(allocator),
        };
    }

    pub fn deinit(self: *Vm) void {
        self.stack.deinit();
    }
};
```

The stack stores `i64` values for now.

Later, a real language may need multiple value types, such as booleans, strings, arrays, objects, and functions.

#### Stack Helpers

Stack code should check errors carefully.

```zig
fn push(self: *Vm, value: i64) !void {
    try self.stack.append(value);
}

fn pop(self: *Vm) !i64 {
    if (self.stack.items.len == 0) {
        return error.StackUnderflow;
    }

    return self.stack.pop();
}
```

`StackUnderflow` means the program tried to remove a value from an empty stack.

For example:

```text
add
halt
```

This is invalid because `add` needs two values.

#### The Execution Loop

A VM usually has a loop:

```text
fetch instruction
advance instruction pointer
execute instruction
repeat
```

In Zig:

```zig
pub fn run(self: *Vm) !void {
    while (true) {
        if (self.ip >= self.instructions.len) {
            return error.InstructionPointerOutOfBounds;
        }

        const instruction = self.instructions[self.ip];
        self.ip += 1;

        switch (instruction) {
            .push_i64 => |value| {
                try self.push(value);
            },
            .add => {
                const b = try self.pop();
                const a = try self.pop();
                try self.push(a + b);
            },
            .sub => {
                const b = try self.pop();
                const a = try self.pop();
                try self.push(a - b);
            },
            .mul => {
                const b = try self.pop();
                const a = try self.pop();
                try self.push(a * b);
            },
            .div => {
                const b = try self.pop();
                const a = try self.pop();

                if (b == 0) {
                    return error.DivisionByZero;
                }

                try self.push(@divTrunc(a, b));
            },
            .print => {
                const value = try self.pop();
                std.debug.print("{}\n", .{value});
            },
            .halt => {
                return;
            },
        }
    }
}
```

This is the heart of the VM.

It fetches one instruction, then uses `switch` to execute it.

#### Full Minimal VM

Here is the full version:

```zig
const std = @import("std");

const OpCode = enum(u8) {
    push_i64,
    add,
    sub,
    mul,
    div,
    print,
    halt,
};

const Instruction = union(OpCode) {
    push_i64: i64,
    add: void,
    sub: void,
    mul: void,
    div: void,
    print: void,
    halt: void,
};

const Vm = struct {
    instructions: []const Instruction,
    ip: usize,
    stack: std.ArrayList(i64),

    pub fn init(
        allocator: std.mem.Allocator,
        instructions: []const Instruction,
    ) Vm {
        return .{
            .instructions = instructions,
            .ip = 0,
            .stack = std.ArrayList(i64).init(allocator),
        };
    }

    pub fn deinit(self: *Vm) void {
        self.stack.deinit();
    }

    fn push(self: *Vm, value: i64) !void {
        try self.stack.append(value);
    }

    fn pop(self: *Vm) !i64 {
        if (self.stack.items.len == 0) {
            return error.StackUnderflow;
        }

        return self.stack.pop();
    }

    pub fn run(self: *Vm) !void {
        while (true) {
            if (self.ip >= self.instructions.len) {
                return error.InstructionPointerOutOfBounds;
            }

            const instruction = self.instructions[self.ip];
            self.ip += 1;

            switch (instruction) {
                .push_i64 => |value| {
                    try self.push(value);
                },
                .add => {
                    const b = try self.pop();
                    const a = try self.pop();
                    try self.push(a + b);
                },
                .sub => {
                    const b = try self.pop();
                    const a = try self.pop();
                    try self.push(a - b);
                },
                .mul => {
                    const b = try self.pop();
                    const a = try self.pop();
                    try self.push(a * b);
                },
                .div => {
                    const b = try self.pop();
                    const a = try self.pop();

                    if (b == 0) {
                        return error.DivisionByZero;
                    }

                    try self.push(@divTrunc(a, b));
                },
                .print => {
                    const value = try self.pop();
                    std.debug.print("{}\n", .{value});
                },
                .halt => {
                    return;
                },
            }
        }
    }
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    const program = [_]Instruction{
        .{ .push_i64 = 1 },
        .{ .push_i64 = 2 },
        .{ .push_i64 = 3 },
        .mul,
        .add,
        .print,
        .halt,
    };

    var vm = Vm.init(gpa.allocator(), &program);
    defer vm.deinit();

    try vm.run();
}
```

This prints:

```text
7
```

The bytecode computes:

```text
1 + (2 * 3)
```

#### Why the Stack Works

The stack removes the need for explicit temporary variable names.

Instead of saying:

```text
tmp1 = 2 * 3
tmp2 = 1 + tmp1
print tmp2
```

the bytecode says:

```text
push 1
push 2
push 3
mul
add
print
```

The order of stack operations carries the structure.

This is why stack machines are common in simple interpreters.

#### Instruction Pointer

The instruction pointer is often called `ip`.

```zig
ip: usize
```

It stores the index of the next instruction.

At the start:

```text
ip = 0
```

After fetching an instruction:

```zig
const instruction = self.instructions[self.ip];
self.ip += 1;
```

The VM advances to the next instruction.

Some instructions, such as jumps, will modify `ip` directly.

#### Adding Jumps

Without jumps, the VM only runs straight-line code.

To support `if`, `while`, and loops, we need jumps.

Add instructions:

```zig
const OpCode = enum(u8) {
    push_i64,
    add,
    sub,
    mul,
    div,
    jump,
    jump_if_zero,
    print,
    halt,
};
```

And add payloads:

```zig
const Instruction = union(OpCode) {
    push_i64: i64,
    add: void,
    sub: void,
    mul: void,
    div: void,
    jump: usize,
    jump_if_zero: usize,
    print: void,
    halt: void,
};
```

A jump sets the instruction pointer.

```zig
.jump => |target| {
    if (target >= self.instructions.len) {
        return error.InvalidJumpTarget;
    }

    self.ip = target;
},
```

A conditional jump pops a value and jumps only when the value is zero.

```zig
.jump_if_zero => |target| {
    const value = try self.pop();

    if (value == 0) {
        if (target >= self.instructions.len) {
            return error.InvalidJumpTarget;
        }

        self.ip = target;
    }
},
```

Now the VM can represent branches.

#### Local Variables

A stack is enough for simple expressions, but languages need variables.

Add local storage:

```zig
locals: []i64
```

Then add instructions:

```zig
load_local
store_local
```

Conceptually:

```text
push 10
store_local 0
load_local 0
print
halt
```

This stores `10` in local slot `0`, loads it again, and prints it.

A simple instruction design:

```zig
load_local: usize,
store_local: usize,
```

The VM checks that the index is valid.

#### Values

Our VM only supports `i64`.

A more realistic VM needs a `Value` type.

```zig
const Value = union(enum) {
    integer: i64,
    boolean: bool,
    nil,
};
```

Then the stack becomes:

```zig
stack: std.ArrayList(Value)
```

Arithmetic instructions check the value types.

```zig
fn addValues(a: Value, b: Value) !Value {
    if (a == .integer and b == .integer) {
        return .{
            .integer = a.integer + b.integer,
        };
    }

    return error.TypeMismatch;
}
```

A typed value representation is the beginning of a real language runtime.

#### Bytecode Encoding

The examples above use Zig tagged unions. That is easy to read and good for learning.

A production VM often uses compact bytecode:

```text
u8 opcode
immediate bytes
u8 opcode
immediate bytes
```

Example layout:

```text
01 00 00 00 00 00 00 00 2A
06
07
```

Where:

```text
01 means push_i64
next 8 bytes are the integer
06 means print
07 means halt
```

Compact bytecode is smaller and faster to load, but harder to debug.

For learning, start with tagged union instructions. Optimize the representation later.

#### Disassembly

A disassembler prints bytecode in a readable form.

Example output:

```text
0000 push_i64 1
0001 push_i64 2
0002 add
0003 print
0004 halt
```

A disassembler is extremely useful when debugging a VM.

```zig
fn disassemble(instructions: []const Instruction) void {
    for (instructions, 0..) |instruction, index| {
        std.debug.print("{d:0>4} ", .{index});

        switch (instruction) {
            .push_i64 => |value| std.debug.print("push_i64 {}\n", .{value}),
            .add => std.debug.print("add\n", .{}),
            .sub => std.debug.print("sub\n", .{}),
            .mul => std.debug.print("mul\n", .{}),
            .div => std.debug.print("div\n", .{}),
            .print => std.debug.print("print\n", .{}),
            .halt => std.debug.print("halt\n", .{}),
        }
    }
}
```

When execution goes wrong, first look at the bytecode.

#### The VM and the Parser

A parser builds an AST.

A compiler can walk that AST and emit bytecode.

For expression:

```text
1 + 2 * 3
```

The AST is:

```text
+
  1
  *
    2
    3
```

The compiler emits:

```text
push 1
push 2
push 3
mul
add
```

The VM runs it.

That gives a full pipeline:

```text
source text -> tokens -> AST -> bytecode -> VM execution
```

#### Error Handling in a VM

A VM should return errors for invalid execution.

Examples:

```text
stack underflow
division by zero
invalid jump target
unknown instruction
type mismatch
local index out of bounds
instruction pointer out of bounds
```

These are runtime errors of the virtual machine.

Do not let invalid bytecode silently corrupt VM state.

Zig’s error unions are a good fit here.

#### Testing a VM

A VM is easy to test because bytecode is data.

Example:

```zig
test "vm adds numbers" {
    const program = [_]Instruction{
        .{ .push_i64 = 20 },
        .{ .push_i64 = 22 },
        .add,
        .halt,
    };

    var vm = Vm.init(std.testing.allocator, &program);
    defer vm.deinit();

    try vm.run();

    try std.testing.expectEqual(@as(i64, 42), vm.stack.items[0]);
}
```

You can test every instruction directly.

Good VM tests are small and precise.

#### Common Mistakes

A common mistake is adding too many instructions too early.

Start with arithmetic, printing, jumps, and locals.

Another mistake is ignoring error cases. Every stack pop, local access, jump target, and type operation needs validation.

A third mistake is mixing parser, compiler, and VM logic together. Keep them separate.

A fourth mistake is optimizing bytecode encoding before the instruction semantics are clear.

#### A Practical Build Order

Build the VM in this order:

```text
define instructions
define VM state
implement stack helpers
implement arithmetic
implement print and halt
add tests
add jumps
add locals
add Value union
write a disassembler
write a compiler from AST to bytecode
```

Each step should be testable before moving on.

#### The Main Idea

A virtual machine executes instructions inside a software-defined machine.

In Zig, a simple VM can be built from enums, tagged unions, arrays, switches, and explicit error handling.

Start with a stack machine. Keep the instruction set small. Test each instruction. Add jumps and locals only after straight-line execution works.