Research substrate for Mochi MEP-45 (May 2026). A 12-file deep dive into ahead-of-time transpilation from Mochi to ISO C23: language surface, design philosophy, prior-art transpilers, runtime building blocks, codegen design, type-system lowering, C-target portability, dataset pipeline, streams and agents, build system, testing gates, and risks.
Background research for Mochi MEP-45: the deep-dive specification for the C-as-target AOT half of MEP-42. The transpiler takes compiler3 IR, lowers it to ISO C23 plus a thin libmochi.a runtime, and ships a statically linked single-file native binary for every tier-1 triple (x86_64-linux-{gnu,musl}, aarch64-linux-{gnu,musl}, aarch64-darwin, x86_64-darwin, x86_64-windows-msvc, x86_64-windows-gnu, wasm32-wasi). The master correctness gate is that the produced binary's stdout must be byte-equal to vm3's on the entire fixture corpus.
Each file in this section pins down one piece of the lowering contract. The notes are first-principles design work, not summaries of the current implementation. They were written ahead of code so the spec leads the build.
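The byte-equal stdout gate described above comes down to a strict stream comparison. A minimal sketch in C, assuming the two outputs have already been captured as readable streams; the helper name streams_byte_equal is hypothetical and not part of any Mochi tool:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: returns true only if both streams yield exactly the
   same bytes and end at the same offset. No newline or whitespace
   normalisation is applied, so "\n" versus "\r\n" counts as a mismatch. */
bool streams_byte_equal(FILE *a, FILE *b) {
    assert(a != NULL && b != NULL);
    for (;;) {
        int ca = fgetc(a);
        int cb = fgetc(b);
        if (ca != cb) {
            return false; /* differing byte, or one stream ended early */
        }
        if (ca == EOF) {
            /* Both streams stopped together; treat a read error as a failure. */
            return !ferror(a) && !ferror(b);
        }
    }
}
```

In a differential harness the two streams would be the captured stdout of the AOT binary and of vm3 for the same fixture. How each is invoked is outside the scope of this sketch.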
Files
- Language surface – every Mochi construct the codegen must lower, walked through one section per topic (value core, function core, collection core, ADT core, query DSL, stream/agent core, logic, AI/FFI, tests, modules, error model, concurrency).
- Design philosophy – the five guiding principles (spec-first, boring C, no ABI surprises, portability over performance, verifiable output) plus the runtime shape and a sample C output.
- Prior-art transpilers – Nim, Crystal, Vala, OCaml, Roc, Koka, MLton, Cosmopolitan, zig cc, Cython, ATS, Soufflé, plus 12 distilled lessons and a full sources list.
- Runtime building blocks – GC (BDWGC, MMTk, Perceus), allocator (mimalloc, scudo), coroutines (minicoro), I/O (libuv, libxev), strings (utf8proc, simdutf), hash tables (cwisstable), JSON/YAML/CSV, HTTP (libcurl), LLM, FFI.
- Codegen design – pipeline, name mangling, type lowering table, expression/statement lowering, setjmp/longjmp errors, Maranget pattern matching, modules, debug info.
- Type-system lowering – monomorphisation, records, sum types with niche optimisation, closures with fat pointer, strings with short-string optimisation, lists, Swiss-table maps, sets, time, errors.
- C target portability – C23 features used, compiler matrix (clang, gcc, msvc, zig cc, cosmocc, tcc), tier-1/2/3 targets, ABI per arch, libc matrix, sanitisers, reproducibility, hardening, style guide for emitted C.
- Dataset pipeline lowering – query DSL lowering with operator fusion, joins (inner, left, cross), group-by, order-by, distinct/union/intersect/except, arena allocation, load/save adapters.
- Streams and agents – stream/agent/mailbox lowering, M:N work-stealing scheduler over minicoro fibers, channels, shutdown protocol.
- Build system – mochi build command surface, cache layout (BLAKE3 content-addressed), cross-compile via bundled zig cc, APE via cosmocc, WASM via wasi-sdk, distribution, versioning.
- Testing and CI gates – differential testing vs vm3, BG corpus, sanitiser matrix (ASan/UBSan/TSan/MSan/LeakSan), property tests, fuzzing, reproducibility check, 16 phased gates.
- Risks and alternatives – semantic, build, supply-chain, performance risks; explicit rejection of LLVM IR / WASM / Rust / JIT / C++ / Zig as primary; kill switches; comparable industrial precedent.
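As a rough illustration of the kind of lowering the type-system note covers, a sum type can be expressed in plain C as a tag plus a union, with a match compiled to a switch on the tag. This is a generic sketch, not the scheme the notes specify: every name below is hypothetical, the example type (a circle with a radius, a rectangle with width and height) is invented for illustration, and no niche optimisation is shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical lowering of a two-variant sum type. */
typedef enum { SHAPE_CIRCLE, SHAPE_RECT } shape_tag;

typedef struct {
    shape_tag tag; /* discriminant selecting the active union member */
    union {
        struct { int64_t radius; } circle;
        struct { int64_t w, h; } rect;
    } as;
} shape;

/* Constructors: one per variant, each setting the tag and its payload. */
shape shape_circle(int64_t radius) {
    shape s = { .tag = SHAPE_CIRCLE };
    s.as.circle.radius = radius;
    return s;
}

shape shape_rect(int64_t w, int64_t h) {
    shape s = { .tag = SHAPE_RECT };
    s.as.rect.w = w;
    s.as.rect.h = h;
    return s;
}

/* A match over the variants, lowered to a switch on the tag.
   Computes the area of the axis-aligned bounding box. */
int64_t shape_bounding_area(shape s) {
    switch (s.tag) {
    case SHAPE_CIRCLE: {
        int64_t side = 2 * s.as.circle.radius;
        return side * side;
    }
    case SHAPE_RECT:
        return s.as.rect.w * s.as.rect.h;
    }
    assert(!"unreachable: unknown tag");
    return 0;
}
```

Calling shape_bounding_area(shape_circle(3)) takes the circle arm, and shape_bounding_area(shape_rect(2, 5)) takes the rectangle arm. Listing every tag without a default case lets compilers that warn on unhandled enum values flag a variant that is added later but not handled.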
See also
- Native Code Emission – the parent research substrate for MEP-42, of which this is the C-AOT half. Pair with the copy-and-patch JIT notes there for the full picture.
- Memory Management – the MEP-41 substrate; the GC choice and capability story in note 04 and the hardening defaults in note 07 inherit from this.