A virtual machine is a program that runs another program.
The “machine” part means it has its own instruction format, its own execution rules, and usually its own memory model. The “virtual” part means this machine is implemented in software.
A CPU runs machine code. A virtual machine runs bytecode.
For example, instead of executing native CPU instructions directly, a VM may execute instructions like:
push 1
push 2
add
print
halt

A VM reads these instructions one by one and performs the requested operations.
Why Write a Virtual Machine
A VM is useful when you want a controlled execution environment.
You might write one for:
a scripting language
a bytecode interpreter
a game engine
a rule engine
a query engine
a calculator language
a teaching language
a sandboxed plugin system

A VM lets you separate the language from the hardware. Your compiler or parser can produce bytecode. The VM runs that bytecode.
A Stack-Based VM
The simplest VM design is a stack machine.
A stack machine stores temporary values on a stack.
Example program:
push 1
push 2
add
print
halt

Execution:
push 1    stack: [1]
push 2    stack: [1, 2]
add       stack: [3]
print     prints 3
halt      stop

The add instruction removes two values from the stack, adds them, and pushes the result back.
Defining Instructions
Start with a small instruction set.
const OpCode = enum(u8) {
    push_i64,
    add,
    sub,
    mul,
    div,
    print,
    halt,
};

Each instruction is one operation the VM understands.
Some instructions need extra data. For example, push_i64 needs the integer value to push.
We can represent instructions as a tagged union:
const Instruction = union(OpCode) {
    push_i64: i64,
    add: void,
    sub: void,
    mul: void,
    div: void,
    print: void,
    halt: void,
};

Now an instruction can carry data only when it needs data.
A Simple Program
Here is bytecode for:
1 + 2

const program = [_]Instruction{
    .{ .push_i64 = 1 },
    .{ .push_i64 = 2 },
    .add,
    .print,
    .halt,
};

The VM will execute these instructions in order.
The VM State
A VM needs state.
At minimum:
instruction pointer
value stack
running flag

The instruction pointer tells the VM which instruction to execute next.

const std = @import("std");

const Vm = struct {
    instructions: []const Instruction,
    ip: usize, // index of the next instruction to execute
    stack: std.ArrayList(i64),

    pub fn init(
        allocator: std.mem.Allocator,
        instructions: []const Instruction,
    ) Vm {
        return .{
            .instructions = instructions,
            .ip = 0,
            .stack = std.ArrayList(i64).init(allocator),
        };
    }

    pub fn deinit(self: *Vm) void {
        self.stack.deinit();
    }
};

The stack stores i64 values for now.
Later, a real language may need multiple value types, such as booleans, strings, arrays, objects, and functions.
Stack Helpers
Stack code should check errors carefully.
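fn push(self: *Vm, value: i64) !void {
    try self.stack.append(value);
}

fn pop(self: *Vm) !i64 {
    if (self.stack.items.len == 0) {
        return error.StackUnderflow;
    }
    // This assumes a Zig version in which ArrayList.pop() returns the element
    // directly. Some newer releases return an optional instead; there,
    // `self.stack.pop() orelse error.StackUnderflow` can replace the length check.
    return self.stack.pop();
}

StackUnderflow means the program tried to remove a value from an empty stack.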
For example:
add
halt

This is invalid because add needs two values.
The Execution Loop
A VM usually has a loop:
fetch instruction
advance instruction pointer
execute instruction
repeat

In Zig:

pub fn run(self: *Vm) !void {
    while (true) {
        // Fetch: refuse to read past the end of the program.
        if (self.ip >= self.instructions.len) {
            return error.InstructionPointerOutOfBounds;
        }
        const instruction = self.instructions[self.ip];
        // Advance before executing, so jumps can simply overwrite ip.
        self.ip += 1;

        // Execute.
        switch (instruction) {
            .push_i64 => |value| {
                try self.push(value);
            },
            .add => {
                // The right operand was pushed last, so it is popped first.
                const b = try self.pop();
                const a = try self.pop();
                try self.push(a + b);
            },
            .sub => {
                const b = try self.pop();
                const a = try self.pop();
                try self.push(a - b);
            },
            .mul => {
                const b = try self.pop();
                const a = try self.pop();
                try self.push(a * b);
            },
            .div => {
                const b = try self.pop();
                const a = try self.pop();
                if (b == 0) {
                    return error.DivisionByZero;
                }
                try self.push(@divTrunc(a, b));
            },
            .print => {
                const value = try self.pop();
                std.debug.print("{}\n", .{value});
            },
            .halt => {
                return;
            },
        }
    }
}

This is the heart of the VM.
It fetches one instruction, then uses switch to execute it.
Full Minimal VM
Here is the full version:
const std = @import("std");

const OpCode = enum(u8) {
    push_i64,
    add,
    sub,
    mul,
    div,
    print,
    halt,
};

const Instruction = union(OpCode) {
    push_i64: i64,
    add: void,
    sub: void,
    mul: void,
    div: void,
    print: void,
    halt: void,
};

const Vm = struct {
    instructions: []const Instruction,
    ip: usize,
    stack: std.ArrayList(i64),

    pub fn init(
        allocator: std.mem.Allocator,
        instructions: []const Instruction,
    ) Vm {
        return .{
            .instructions = instructions,
            .ip = 0,
            .stack = std.ArrayList(i64).init(allocator),
        };
    }

    pub fn deinit(self: *Vm) void {
        self.stack.deinit();
    }

    fn push(self: *Vm, value: i64) !void {
        try self.stack.append(value);
    }

    fn pop(self: *Vm) !i64 {
        if (self.stack.items.len == 0) {
            return error.StackUnderflow;
        }
        return self.stack.pop();
    }

    pub fn run(self: *Vm) !void {
        while (true) {
            if (self.ip >= self.instructions.len) {
                return error.InstructionPointerOutOfBounds;
            }
            const instruction = self.instructions[self.ip];
            self.ip += 1;

            switch (instruction) {
                .push_i64 => |value| {
                    try self.push(value);
                },
                .add => {
                    const b = try self.pop();
                    const a = try self.pop();
                    try self.push(a + b);
                },
                .sub => {
                    const b = try self.pop();
                    const a = try self.pop();
                    try self.push(a - b);
                },
                .mul => {
                    const b = try self.pop();
                    const a = try self.pop();
                    try self.push(a * b);
                },
                .div => {
                    const b = try self.pop();
                    const a = try self.pop();
                    if (b == 0) {
                        return error.DivisionByZero;
                    }
                    try self.push(@divTrunc(a, b));
                },
                .print => {
                    const value = try self.pop();
                    std.debug.print("{}\n", .{value});
                },
                .halt => {
                    return;
                },
            }
        }
    }
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    const program = [_]Instruction{
        .{ .push_i64 = 1 },
        .{ .push_i64 = 2 },
        .{ .push_i64 = 3 },
        .mul,
        .add,
        .print,
        .halt,
    };

    var vm = Vm.init(gpa.allocator(), &program);
    defer vm.deinit();

    try vm.run();
}

This prints:
7

The bytecode computes:
1 + (2 * 3)

Why the Stack Works
The stack removes the need for explicit temporary variable names.
Instead of saying:
tmp1 = 2 * 3
tmp2 = 1 + tmp1
print tmp2

the bytecode says:
push 1
push 2
push 3
mul
add
print

The order of stack operations carries the structure.
This is why stack machines are common in simple interpreters.
Instruction Pointer
The instruction pointer is often called ip.
ip: usize

It stores the index of the next instruction.
At the start:
ip = 0

After fetching an instruction:
const instruction = self.instructions[self.ip];
self.ip += 1;

The VM advances to the next instruction.
Some instructions, such as jumps, will modify ip directly.
Adding Jumps
Without jumps, the VM only runs straight-line code.
To support if statements and loops such as while, we need jumps.
Add instructions:
const OpCode = enum(u8) {
    push_i64,
    add,
    sub,
    mul,
    div,
    jump,
    jump_if_zero,
    print,
    halt,
};

And add payloads:

const Instruction = union(OpCode) {
    push_i64: i64,
    add: void,
    sub: void,
    mul: void,
    div: void,
    jump: usize,
    jump_if_zero: usize,
    print: void,
    halt: void,
};

A jump sets the instruction pointer.

.jump => |target| {
    if (target >= self.instructions.len) {
        return error.InvalidJumpTarget;
    }
    self.ip = target;
},

A conditional jump pops a value and jumps only when the value is zero.

.jump_if_zero => |target| {
    const value = try self.pop();
    if (value == 0) {
        if (target >= self.instructions.len) {
            return error.InvalidJumpTarget;
        }
        self.ip = target;
    }
},

Now the VM can represent branches.
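As a minimal sketch of how the two jump instructions combine into an if/else, the program below pushes a condition, branches on it, and leaves either 111 or 222 on the stack. The helper names runAndPeek and branchProgram, the fixed 16-slot stack, and the reduced instruction set are assumptions made to keep the example self-contained; they are not part of the VM built above.

```zig
const std = @import("std");

// Reduced instruction set for this sketch: just enough to show branching.
const Instruction = union(enum) {
    push_i64: i64,
    jump: usize,
    jump_if_zero: usize,
    halt,
};

const VmError = error{
    StackOverflow,
    StackUnderflow,
    InvalidJumpTarget,
    InstructionPointerOutOfBounds,
};

// Hypothetical helper: runs a program and returns the value on top of the
// stack at `halt`. A fixed array stands in for the growable stack in the text.
fn runAndPeek(instructions: []const Instruction) VmError!i64 {
    var stack: [16]i64 = undefined;
    var sp: usize = 0; // number of values currently on the stack
    var ip: usize = 0;

    while (true) {
        if (ip >= instructions.len) return error.InstructionPointerOutOfBounds;
        const instruction = instructions[ip];
        ip += 1;

        switch (instruction) {
            .push_i64 => |value| {
                if (sp == stack.len) return error.StackOverflow;
                stack[sp] = value;
                sp += 1;
            },
            .jump => |target| {
                if (target >= instructions.len) return error.InvalidJumpTarget;
                ip = target;
            },
            .jump_if_zero => |target| {
                if (sp == 0) return error.StackUnderflow;
                sp -= 1; // the condition is consumed whether or not we jump
                if (stack[sp] == 0) {
                    if (target >= instructions.len) return error.InvalidJumpTarget;
                    ip = target;
                }
            },
            .halt => {
                if (sp == 0) return error.StackUnderflow;
                return stack[sp - 1];
            },
        }
    }
}

// Bytecode for: if (condition != 0) 111 else 222
fn branchProgram(condition: i64) [6]Instruction {
    return [6]Instruction{
        .{ .push_i64 = condition }, // 0
        .{ .jump_if_zero = 4 }, // 1: skip the "then" branch when condition == 0
        .{ .push_i64 = 111 }, // 2: then
        .{ .jump = 5 }, // 3: skip the "else" branch
        .{ .push_i64 = 222 }, // 4: else
        .halt, // 5
    };
}

pub fn main() !void {
    const taken = branchProgram(1);
    const skipped = branchProgram(0);
    std.debug.print("{d} {d}\n", .{ try runAndPeek(&taken), try runAndPeek(&skipped) });
}
```

Run as written, this prints 111 222. A compiler for a real language would compute the jump targets itself, typically by emitting a placeholder and patching it once the end of the branch is known.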
Local Variables
A stack is enough for simple expressions, but languages need variables.
Add local storage:
locals: []i64

Then add instructions:
load_local
store_local

Conceptually:
push 10
store_local 0
load_local 0
print
halt

This stores 10 in local slot 0, loads it again, and prints it.
A simple instruction design:
load_local: usize,
store_local: usize,

The VM checks that the index is valid.
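One way the two handlers and their bounds check could look is sketched below. The four-slot locals array, the fixed 16-slot stack, the sp field, and the error name LocalIndexOutOfBounds are assumptions chosen to keep the sketch self-contained and free of allocation; the VM in the text would use its own stack and a locals slice.

```zig
const std = @import("std");

const Instruction = union(enum) {
    push_i64: i64,
    load_local: usize,
    store_local: usize,
    add,
    halt,
};

const Vm = struct {
    instructions: []const Instruction,
    ip: usize = 0,
    stack: [16]i64 = undefined, // fixed-size stand-in for the growable stack
    sp: usize = 0, // number of values currently on the stack
    locals: [4]i64 = [_]i64{0} ** 4, // assumed slot count for this sketch

    fn push(self: *Vm, value: i64) !void {
        if (self.sp == self.stack.len) return error.StackOverflow;
        self.stack[self.sp] = value;
        self.sp += 1;
    }

    fn pop(self: *Vm) !i64 {
        if (self.sp == 0) return error.StackUnderflow;
        self.sp -= 1;
        return self.stack[self.sp];
    }

    fn run(self: *Vm) !void {
        while (true) {
            if (self.ip >= self.instructions.len) return error.InstructionPointerOutOfBounds;
            const instruction = self.instructions[self.ip];
            self.ip += 1;

            switch (instruction) {
                .push_i64 => |value| try self.push(value),
                .load_local => |index| {
                    // Validate before reading so bad bytecode cannot read outside locals.
                    if (index >= self.locals.len) return error.LocalIndexOutOfBounds;
                    try self.push(self.locals[index]);
                },
                .store_local => |index| {
                    // Validate before popping so a bad index leaves the stack untouched.
                    if (index >= self.locals.len) return error.LocalIndexOutOfBounds;
                    self.locals[index] = try self.pop();
                },
                .add => {
                    const b = try self.pop();
                    const a = try self.pop();
                    try self.push(a + b);
                },
                .halt => return,
            }
        }
    }
};

pub fn main() !void {
    // Store 10 in slot 0, then compute slot0 + slot0.
    const program = [_]Instruction{
        .{ .push_i64 = 10 },
        .{ .store_local = 0 },
        .{ .load_local = 0 },
        .{ .load_local = 0 },
        .add,
        .halt,
    };
    var vm = Vm{ .instructions = &program };
    try vm.run();
    std.debug.print("{d}\n", .{vm.stack[vm.sp - 1]});
}
```

This prints 20. Note that store_local consumes the value on the stack, which is why the program loads the slot twice before adding.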
Values
Our VM only supports i64.
A more realistic VM needs a Value type.
const Value = union(enum) {
    integer: i64,
    boolean: bool,
    nil,
};

Then the stack becomes:
stack: std.ArrayList(Value)

Arithmetic instructions check the value types.

fn addValues(a: Value, b: Value) !Value {
    if (a == .integer and b == .integer) {
        return .{
            .integer = a.integer + b.integer,
        };
    }
    return error.TypeMismatch;
}

A typed value representation is the beginning of a real language runtime.
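To make the type check concrete, here is a small usage sketch that calls addValues once with two integers and once with mismatched types. The printValue helper is hypothetical; it only illustrates how a print instruction might switch over the variants once the stack holds Value instead of i64.

```zig
const std = @import("std");

const Value = union(enum) {
    integer: i64,
    boolean: bool,
    nil,
};

fn addValues(a: Value, b: Value) !Value {
    // Comparing a tagged union with an enum literal checks only the active tag.
    if (a == .integer and b == .integer) {
        return .{ .integer = a.integer + b.integer };
    }
    return error.TypeMismatch;
}

// Hypothetical helper: one way a `print` instruction could render each variant.
fn printValue(value: Value) void {
    switch (value) {
        .integer => |n| std.debug.print("{d}\n", .{n}),
        .boolean => |flag| std.debug.print("{}\n", .{flag}),
        .nil => std.debug.print("nil\n", .{}),
    }
}

pub fn main() !void {
    // Two integers add normally.
    const sum = try addValues(.{ .integer = 20 }, .{ .integer = 22 });
    printValue(sum);

    // An integer plus a boolean is rejected rather than silently coerced.
    if (addValues(.{ .integer = 1 }, .{ .boolean = true })) |value| {
        printValue(value);
    } else |err| {
        std.debug.print("error: {s}\n", .{@errorName(err)});
    }
}
```

This prints 42 followed by error: TypeMismatch. Because the switch in printValue must be exhaustive, adding a new variant to Value later forces every such handler to be updated.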
Bytecode Encoding
The examples above use Zig tagged unions. That is easy to read and good for learning.
A production VM often uses compact bytecode:
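u8 opcode
immediate bytes
u8 opcode
immediate bytes

Example layout:
01 00 00 00 00 00 00 00 2A
06
07

Where:
01 means push_i64
next 8 bytes are the integer (42 if read as big-endian)
06 means print
07 means halt

The opcode numbers in this layout are illustrative: they count from 1, whereas the OpCode enum above starts at 0 by default.
Compact bytecode is smaller and faster to load, but harder to debug.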
For learning, start with tagged union instructions. Optimize the representation later.
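For a sense of what the compact form costs, the sketch below decodes a byte stream back into tagged union instructions. It assumes the example layout's opcode numbers (0x01, 0x06, 0x07) and a big-endian 8-byte immediate; the names decode and readI64BigEndian, the error names, and the caller-provided output buffer are all hypothetical.

```zig
const std = @import("std");

const Instruction = union(enum) {
    push_i64: i64,
    print,
    halt,
};

const DecodeError = error{ UnknownInstruction, TruncatedImmediate, OutputFull };

// Assembles a big-endian byte sequence (at most 8 bytes) into an i64.
fn readI64BigEndian(bytes: []const u8) i64 {
    var raw: u64 = 0;
    for (bytes) |byte| {
        raw = (raw << 8) | byte;
    }
    return @bitCast(raw);
}

// Decodes `code` into `out` and returns the filled prefix of `out`.
fn decode(code: []const u8, out: []Instruction) DecodeError![]Instruction {
    var pc: usize = 0; // byte offset into `code`
    var count: usize = 0; // instructions written so far

    while (pc < code.len) {
        if (count == out.len) return error.OutputFull;
        const opcode = code[pc];
        pc += 1;

        switch (opcode) {
            0x01 => {
                // push_i64 is followed by an 8-byte immediate.
                if (code.len - pc < 8) return error.TruncatedImmediate;
                out[count] = .{ .push_i64 = readI64BigEndian(code[pc .. pc + 8]) };
                pc += 8;
            },
            0x06 => {
                out[count] = .print;
            },
            0x07 => {
                out[count] = .halt;
            },
            else => return error.UnknownInstruction,
        }
        count += 1;
    }
    return out[0..count];
}

pub fn main() !void {
    const code = [_]u8{ 0x01, 0, 0, 0, 0, 0, 0, 0, 0x2A, 0x06, 0x07 };
    var buffer: [8]Instruction = undefined;
    const decoded = try decode(&code, &buffer);

    for (decoded) |instruction| {
        switch (instruction) {
            .push_i64 => |value| std.debug.print("push_i64 {d}\n", .{value}),
            .print => std.debug.print("print\n", .{}),
            .halt => std.debug.print("halt\n", .{}),
        }
    }
}
```

This prints push_i64 42, print, and halt on separate lines. Unlike the tagged union form, raw bytes can name an opcode that does not exist or end in the middle of an immediate, so the decoder needs the two extra error cases.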
Disassembly
A disassembler prints bytecode in a readable form.
Example output:
0000 push_i64 1
0001 push_i64 2
0002 add
0003 print
0004 halt

A disassembler is extremely useful when debugging a VM.

fn disassemble(instructions: []const Instruction) void {
    for (instructions, 0..) |instruction, index| {
        // Zero-padded instruction index, matching the jump targets.
        std.debug.print("{d:0>4} ", .{index});
        switch (instruction) {
            .push_i64 => |value| std.debug.print("push_i64 {}\n", .{value}),
            .add => std.debug.print("add\n", .{}),
            .sub => std.debug.print("sub\n", .{}),
            .mul => std.debug.print("mul\n", .{}),
            .div => std.debug.print("div\n", .{}),
            .print => std.debug.print("print\n", .{}),
            .halt => std.debug.print("halt\n", .{}),
        }
    }
}

When execution goes wrong, first look at the bytecode.
The VM and the Parser
A parser builds an AST.
A compiler can walk that AST and emit bytecode.
For expression:
1 + 2 * 3

The AST is:
+
  1
  *
    2
    3

The compiler emits:
push 1
push 2
push 3
mul
add

The VM runs it.
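The emit step is a post-order walk of the tree: compile the left operand, then the right operand, then emit the operator. The sketch below shows that walk for the expression above. The Node and Emitter types, the fixed output buffer, and the error name ProgramTooLong are assumptions for illustration; a real compiler would take its AST from the parser.

```zig
const std = @import("std");

const Instruction = union(enum) {
    push_i64: i64,
    add,
    mul,
};

// Hypothetical AST shape; a real parser would likely have more node kinds.
const Node = union(enum) {
    number: i64,
    add: Binary,
    mul: Binary,

    const Binary = struct { left: *const Node, right: *const Node };
};

const CompileError = error{ProgramTooLong};

const Emitter = struct {
    buffer: []Instruction,
    len: usize = 0,

    fn emit(self: *Emitter, instruction: Instruction) CompileError!void {
        if (self.len == self.buffer.len) return error.ProgramTooLong;
        self.buffer[self.len] = instruction;
        self.len += 1;
    }

    // Post-order walk: operands first, then the operator. The explicit
    // error set is required because the function is recursive.
    fn compile(self: *Emitter, node: *const Node) CompileError!void {
        switch (node.*) {
            .number => |value| try self.emit(.{ .push_i64 = value }),
            .add => |operands| {
                try self.compile(operands.left);
                try self.compile(operands.right);
                try self.emit(.add);
            },
            .mul => |operands| {
                try self.compile(operands.left);
                try self.compile(operands.right);
                try self.emit(.mul);
            },
        }
    }
};

pub fn main() !void {
    // AST for 1 + 2 * 3, built by hand in place of a parser.
    const one = Node{ .number = 1 };
    const two = Node{ .number = 2 };
    const three = Node{ .number = 3 };
    const product = Node{ .mul = .{ .left = &two, .right = &three } };
    const sum = Node{ .add = .{ .left = &one, .right = &product } };

    var storage: [16]Instruction = undefined;
    var emitter = Emitter{ .buffer = &storage };
    try emitter.compile(&sum);

    for (emitter.buffer[0..emitter.len]) |instruction| {
        switch (instruction) {
            .push_i64 => |value| std.debug.print("push {d}\n", .{value}),
            .add => std.debug.print("add\n", .{}),
            .mul => std.debug.print("mul\n", .{}),
        }
    }
}
```

The output is the same five instructions listed above, in the same order. Operator precedence never appears in the bytecode; it was already resolved by the shape of the tree.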
That gives a full pipeline:
source text -> tokens -> AST -> bytecode -> VM execution

Error Handling in a VM
A VM should return errors for invalid execution.
Examples:
stack underflow
division by zero
invalid jump target
unknown instruction
type mismatch
local index out of bounds
instruction pointer out of bounds

These are runtime errors of the virtual machine.
Do not let invalid bytecode silently corrupt VM state.
Zig’s error unions are a good fit here.
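One case the list above does not name is integer overflow. With Zig's plain operators, overflow in expressions such as a + b is caught as a panic in safe build modes and is undefined in unsafe ones, so a bytecode program could take down the host instead of producing a VM error. A minimal sketch of turning it into an ordinary error, using the checked helpers in std.math, might look like this; the ArithmeticOp enum and the checkedArithmetic and report functions are hypothetical names.

```zig
const std = @import("std");

const ArithmeticOp = enum { add, sub, mul, div };

// Hypothetical helper: performs one arithmetic instruction and reports
// overflow or division by zero as an error instead of trapping.
fn checkedArithmetic(op: ArithmeticOp, a: i64, b: i64) !i64 {
    return switch (op) {
        .add => try std.math.add(i64, a, b),
        .sub => try std.math.sub(i64, a, b),
        .mul => try std.math.mul(i64, a, b),
        .div => try std.math.divTrunc(i64, a, b),
    };
}

// Prints either the result or the name of the runtime error.
fn report(op: ArithmeticOp, a: i64, b: i64) void {
    if (checkedArithmetic(op, a, b)) |value| {
        std.debug.print("ok: {d}\n", .{value});
    } else |err| {
        std.debug.print("runtime error: {s}\n", .{@errorName(err)});
    }
}

pub fn main() void {
    report(.add, 20, 22);
    report(.add, std.math.maxInt(i64), 1);
    report(.div, 1, 0);
}
```

This prints ok: 42, then runtime error: Overflow, then runtime error: DivisionByZero. In the VM's run loop, the same calls could replace the bare operators so that the error propagates through try like any other.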
Testing a VM
A VM is easy to test because bytecode is data.
Example:
test "vm adds numbers" {
    const program = [_]Instruction{
        .{ .push_i64 = 20 },
        .{ .push_i64 = 22 },
        .add,
        .halt,
    };

    var vm = Vm.init(std.testing.allocator, &program);
    defer vm.deinit();

    try vm.run();

    // No print instruction ran, so the sum is still on the stack.
    try std.testing.expectEqual(@as(i64, 42), vm.stack.items[0]);
}

You can test every instruction directly.
Good VM tests are small and precise.
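Error paths deserve the same treatment as the success path. The sketch below tests both with std.testing.expectError. To stay self-contained it uses a hypothetical eval helper with a fixed eight-slot stack and a reduced instruction set rather than the Vm struct from the text; the test bodies would look much the same against the real VM.

```zig
const std = @import("std");

const Instruction = union(enum) {
    push_i64: i64,
    add,
    div,
    halt,
};

const EvalError = error{
    StackOverflow,
    StackUnderflow,
    DivisionByZero,
    InstructionPointerOutOfBounds,
};

// Hypothetical test helper: runs a program on a small fixed stack and
// returns the value on top of the stack when `halt` is reached.
fn eval(instructions: []const Instruction) EvalError!i64 {
    var stack: [8]i64 = undefined;
    var sp: usize = 0; // number of values currently on the stack
    var ip: usize = 0;

    while (ip < instructions.len) {
        const instruction = instructions[ip];
        ip += 1;

        switch (instruction) {
            .push_i64 => |value| {
                if (sp == stack.len) return error.StackOverflow;
                stack[sp] = value;
                sp += 1;
            },
            .add, .div => {
                if (sp < 2) return error.StackUnderflow;
                const b = stack[sp - 1];
                const a = stack[sp - 2];
                if (instruction == .div and b == 0) return error.DivisionByZero;
                // Replace the two operands with the single result.
                stack[sp - 2] = if (instruction == .add) a + b else @divTrunc(a, b);
                sp -= 1;
            },
            .halt => {
                if (sp == 0) return error.StackUnderflow;
                return stack[sp - 1];
            },
        }
    }
    // Running off the end without a halt is an error, as in the text.
    return error.InstructionPointerOutOfBounds;
}

test "add leaves the sum on the stack" {
    const program = [_]Instruction{ .{ .push_i64 = 20 }, .{ .push_i64 = 22 }, .add, .halt };
    try std.testing.expectEqual(@as(i64, 42), try eval(&program));
}

test "add on an empty stack reports underflow" {
    const program = [_]Instruction{ .add, .halt };
    try std.testing.expectError(error.StackUnderflow, eval(&program));
}

test "division by zero is reported" {
    const program = [_]Instruction{ .{ .push_i64 = 1 }, .{ .push_i64 = 0 }, .div, .halt };
    try std.testing.expectError(error.DivisionByZero, eval(&program));
}
```

Saved as a single file, these can be run with zig test. Each test names one behavior and builds only the bytecode it needs, which keeps failures easy to locate.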
Common Mistakes
A common mistake is adding too many instructions too early.
Start with arithmetic, printing, jumps, and locals.
Another mistake is ignoring error cases. Every stack pop, local access, jump target, and type operation needs validation.
A third mistake is mixing parser, compiler, and VM logic together. Keep them separate.
A fourth mistake is optimizing bytecode encoding before the instruction semantics are clear.
A Practical Build Order
Build the VM in this order:
define instructions
define VM state
implement stack helpers
implement arithmetic
implement print and halt
add tests
add jumps
add locals
add Value union
write a disassembler
write a compiler from AST to bytecode

Each step should be testable before moving on.
The Main Idea
A virtual machine executes instructions inside a software-defined machine.
In Zig, a simple VM can be built from enums, tagged unions, arrays, switches, and explicit error handling.
Start with a stack machine. Keep the instruction set small. Test each instruction. Add jumps and locals only after straight-line execution works.