Mochi-to-C Transpiler

Research substrate for Mochi MEP-45 (May 2026). A 12-file deep dive into ahead-of-time transpilation from Mochi to ISO C23: language surface, design philosophy, prior-art transpilers, runtime building blocks, codegen design, type-system lowering, C-target portability, dataset pipeline, streams and agents, build system, testing gates, and risks.

Background research for Mochi MEP-45: the deep-dive specification for the C-as-target AOT half of MEP-42. The transpiler takes compiler3 IR, lowers it to ISO C23 plus a thin libmochi.a runtime, and ships a statically-linked single-file native binary for every tier-1 triple (x86_64-linux-{gnu,musl}, aarch64-linux-{gnu,musl}, aarch64-darwin, x86_64-darwin, x86_64-windows-msvc, x86_64-windows-gnu, wasm32-wasi). The master correctness gate is byte-equal stdout from the produced binary versus vm3 on the entire fixture corpus.
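The byte-equal stdout gate described above can be pictured with a small comparison helper. This is a minimal sketch only: the function name `first_stdout_mismatch` and its parameters are hypothetical, not part of any Mochi tool or of libmochi, and it assumes a harness has already captured both stdout streams into memory.

```c
#include <stddef.h>

/* Hypothetical helper (an assumption for illustration, not a Mochi API).
 * Compares the stdout captured from the native binary against the stdout
 * captured from vm3. Returns -1 when the two buffers are byte-equal,
 * otherwise the offset of the first differing byte. When one buffer is a
 * strict prefix of the other, the offset is the length of the shorter one. */
long first_stdout_mismatch(const unsigned char *native_out, size_t native_len,
                           const unsigned char *vm_out, size_t vm_len)
{
    size_t common = native_len < vm_len ? native_len : vm_len;

    /* Scan the shared prefix byte by byte. */
    for (size_t i = 0; i < common; i++) {
        if (native_out[i] != vm_out[i]) {
            return (long)i;
        }
    }

    /* Same prefix but different lengths still fails the gate. */
    if (native_len != vm_len) {
        return (long)common;
    }
    return -1;
}
```

A harness built this way would treat any return value other than -1 as a gate failure for that fixture and could report the offset to help locate the divergence.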

Each file in this section pins down one piece of the lowering contract. The notes are first-principles design work, not summaries of the current implementation. They were written ahead of code so the spec leads the build.

Files

  1. Language surface – every Mochi construct the codegen must lower, walked through one section per topic (value core, function core, collection core, ADT core, query DSL, stream/agent core, logic, AI/FFI, tests, modules, error model, concurrency).
  2. Design philosophy – the five guiding principles (spec-first, boring C, no ABI surprises, portability over performance, verifiable output) plus the runtime shape and a sample C output.
  3. Prior-art transpilers – Nim, Crystal, Vala, OCaml, Roc, Koka, MLton, Cosmopolitan, zig cc, Cython, ATS, Soufflé, plus 12 distilled lessons and a full sources list.
  4. Runtime building blocks – GC (BDWGC, MMTk, Perceus), allocator (mimalloc, scudo), coroutines (minicoro), I/O (libuv, libxev), strings (utf8proc, simdutf), hash tables (cwisstable), JSON/YAML/CSV, HTTP (libcurl), LLM, FFI.
  5. Codegen design – pipeline, name mangling, type lowering table, expression/statement lowering, setjmp/longjmp errors, Maranget pattern matching, modules, debug info.
  6. Type-system lowering – monomorphisation, records, sum types with niche optimisation, closures with fat pointer, strings with short-string optimisation, lists, Swiss-table maps, sets, time, errors.
  7. C target portability – C23 features used, compiler matrix (clang, gcc, msvc, zig cc, cosmocc, tcc), tier-1/2/3 targets, ABI per arch, libc matrix, sanitisers, reproducibility, hardening, style guide for emitted C.
  8. Dataset pipeline lowering – query DSL lowering with operator fusion, joins (inner, left, cross), group-by, order-by, distinct/union/intersect/except, arena allocation, load/save adapters.
  9. Streams and agents – stream/agent/mailbox lowering, M:N work-stealing scheduler over minicoro fibers, channels, shutdown protocol.
  10. Build system – mochi build command surface, cache layout (BLAKE3 content-addressed), cross-compile via bundled zig cc, APE via cosmocc, WASM via wasi-sdk, distribution, versioning.
  11. Testing and CI gates – differential testing vs vm3, BG corpus, sanitiser matrix (ASan/UBSan/TSan/MSan/LeakSan), property tests, fuzzing, reproducibility check, 16 phased gates.
  12. Risks and alternatives – semantic, build, supply-chain, performance risks; explicit rejection of LLVM IR / WASM / Rust / JIT / C++ / Zig as primary; kill switches; comparable industrial precedent.

See also

  • Native Code Emission – the parent research substrate for MEP-42, of which this is the C-AOT half. Pair with the copy-and-patch JIT notes there for the full picture.
  • Memory Management – the MEP-41 substrate; the GC choice and capability story in note 04 and the hardening defaults in note 07 inherit from this.

Language surface – Every Mochi construct the MEP-45 codegen must lower: value core, function core, collection core, ADT core, query DSL, stream/agent core, logic, AI/FFI, tests, modules, error model, concurrency semantics.
14 min
Design philosophy – The five guiding principles behind the Mochi-to-C transpiler (spec-first, boring C, no ABI surprises, portability over performance, verifiable output), plus the runtime shape and a sample C output.
10 min
Prior-art transpilers – Survey of transpilers and AOT compilers that emit C or behave like a C-target system: Nim, Crystal, Vala, OCaml, Roc, Koka, MLton, Cosmopolitan, zig cc, Cython, ATS, Soufflé. Twelve distilled lessons.
26 min
Runtime building blocks – Inventory of the third-party and home-grown components the C runtime can stand on: GC (BDWGC, MMTk, Perceus), allocator (mimalloc, scudo), coroutines (minicoro), I/O (libuv, libxev), strings, hash tables, JSON/YAML/CSV, HTTP, LLM, FFI.
13 min
Codegen design – Codegen pipeline, why a C IR, name mangling rules, type-lowering table, value representation with `mochi_value` boxed type, expression lowering, statement lowering, for-loop lowering, try/catch via setjmp, Maranget pattern matching, modules, amalgamation.
11 min
Type-system lowering – Type-system lowering details: generics/monomorphisation, records, sum types with niche optimisation, closures with fat pointer, strings with SSO, lists, maps with Swiss-table, sets, time/duration, error values with built-in code table.
8 min
C target and portability – The C target itself: C23 features used, compiler matrix (clang, gcc, msvc, zig cc, cosmocc, tcc), tier-1/2/3 architectures and OSes, ABI per arch, libc matrix, sanitisers, reproducibility, hardening, style guide for emitted C.
8 min
Dataset pipeline lowering – Lowering the Mochi query DSL (LINQ-style from/where/select/join/group by/order/limit/union/intersect/except) to C with arena allocation, operator fusion, and load/save adapters.
7 min
Streams and agents – Lowering Mochi `stream<T>`, stream definitions, `on`-handlers, agent records, and `intent` methods, plus the M:N work-stealing scheduler over minicoro fibers that runs them.
6 min
Build system – Build pipeline: `mochi build` command surface, output layout, amalgamated runtime, cross-compilation via bundled zig cc, APE via cosmocc, WASM via wasi-sdk, content-addressed caching, reproducibility.
6 min
Testing and CI gates – Testing strategy: differential testing against vm3, BG corpus, fuzzing, sanitiser matrix (ASan/UBSan/TSan/MSan/LeakSan), property tests, reproducibility check, 16 phased CI gates.
6 min
Risks and alternatives – Risks (semantic, build, supply chain, performance, ergonomic), explicit alternatives considered (LLVM IR, WASM, Rust, JIT, C++, Zig), kill switches that demote the transpiler back to optional, comparable industrial precedent.
8 min