Skip to content

Language surface

Every Mochi construct the MEP-45 codegen must lower: value core, function core, collection core, ADT core, query DSL, stream/agent core, logic, AI/FFI, tests, modules, error model, concurrency semantics.

MEP-45 research note 01, Mochi language surface

Author: research pass for MEP-45 (Mochi → C transpiler). Date: 2026-05-22 (GMT+7). Sources: docs/features/*.md, docs/index.md, docs/common-language-errors.md, mcp/cheatsheet.mochi, ROADMAP.md, examples/v0.2v0.7, normative security specs docs/security/threat-model.md and docs/security/memory-safety.md.

This note records the user-visible language surface that the C target must faithfully reproduce. It is deliberately written from the spec downward and ignores the existing Go runtime, the vm3 bytecode, and any backend codegen files. The goal is a transpiler design that would be correct against the language, not against the present implementation.

The surface decomposes into eight orthogonal sub-languages: (1) the value core, (2) the function and method core, (3) the collection core, (4) the algebraic-data-type core, (5) the query DSL, (6) the stream / agent core, (7) the logic-programming core, and (8) the AI / FFI shells. Each section below names every form a Mochi program can write, then states a lowering obligation the C backend must honour.

1. Value core

1.1 Bindings

Mochi has exactly two binding forms.

  • let name = expr, immutable. Re-assignment is a compile-time error, not a runtime panic. The compiler must emit the binding as a const-ish C l-value: either const T name = ...; for primitive types, or a flag in the symbol table that forbids = lowering after first definition.
  • var name = expr, mutable. Re-assignment is unrestricted within the variable’s lexical scope.

A binding may carry an explicit type: let x: int = 0. When the annotation is absent the type is inferred from the initializer. The inference is unidirectional (left-to-right, expression-down), so var xs: list<int> = [] is the canonical idiom for an empty mutable list. (See VM enhance spec 0951 §1 for the “mutable literal in a function body must be cloned” rule; this is a semantic rule, not a backend hack, and the C lowering must respect it.)

Mochi supports destructuring at let:

let [a, b] = [1, 2]
let {"name": n, "age": age} = {"name": "Ana", "age": 22}

The list pattern is positional; the map pattern is key-driven. Both bind fresh names; both are immutable. C lowering: emit a temp for the source expression and then bind each name to a field/index access of the temp.

Scoping is lexical and block-based. Inner blocks shadow outer bindings.

1.2 Primitive types

Surfaced by the docs and the cheatsheet:

MochiWidth / semanticsC lowering
int64-bit signed integer (inferred from integer literals)int64_t
float64-bit IEEE 754 doubledouble
booltrue / falsebool (C23 keyword)
stringUTF-8 text, indexable as code points, immutabletagged mochi_str (see runtime note)
timeabsolute timestamp (used by streams)mochi_time (i64 ns since Unix epoch, UTC)
durationtime interval (std/time API)mochi_dur (i64 ns)
image (preview)binary blob (load "cat.png")mochi_bytes

Implicit numeric conversions are not allowed (per the type-checker discipline implied by MEP-4/5/6 referenced from the threat model). int + float is a type error; the program must float(x) first. This shapes the C lowering: emit no implicit C conversions; the C output is dense with explicit cast helpers that the type-checker has already proven safe.

1.3 Operators

Arithmetic + - * / %; comparison == != < <= > >=; boolean && || !; membership in; string concatenation overloads +. Lowering: 1-for-1 to C operators for integer types (with overflow trap in debug builds); floating-point uses C’s / (IEEE 754); == on records is field-wise structural; == on collections is element-wise; in on map dispatches on the map’s hash table; in on string is strstr-style substring.

1.4 Strings as read-only character sequences

let text = "hello"
print(text[1])     // "e"
for ch in text { ... }

Indexing yields a 1-character string (not a char). Iteration yields 1-character strings in code-point order, not byte order. This forces a UTF-8 cursor in the runtime (not raw char *).

1.5 Literals

Integer literals; floating literals (3.14); boolean; string with C-style escapes; triple-quoted multi-line strings ("""..."""); list [...]; map {key: val, ...}; set {a, b, c}; record constructor T { field: val }.

The set literal {a, b, c} is distinguished from the empty/map literal {} by the absence of : after the first element. The grammar must keep these unambiguous; the C lowering picks the right constructor accordingly.

2. Function and method core

2.1 Top-level functions

fun add(a: int, b: int): int { return a + b }

Parameters are typed; return type is required (even void is implicit when no return-type clause appears, as in fun describe() { print(...) } from examples/v0.3/method.mochi). Recursion is supported. The docs explicitly warn there is no implicit tail-call optimisation, so the C lowering is free to (and should) emit ordinary C calls with no obligation to TCO. Optional opt-in via [[gnu::musttail]] / [[clang::musttail]] is a Phase 2 enhancement.

2.2 First-class function values

let square = fun(x: int): int => x * x
fun apply(f: fun(int): int, value: int): int { return f(value) }

Function values are first-class. They have a structural type fun(int): int. Lowering: closures need an environment struct because the arrow form can capture lexical variables (make_adder in mcp/cheatsheet.mochi closes over n). The natural lowering is a 2-word fat pointer { void (*code)(env*, args...); env* env; }. See codegen note for the alternative monomorphisation strategies.

2.3 Methods on type blocks

type Circle {
  radius: float
  fun area(): float { return 3.14 * radius * radius }
}

A method receives an implicit self; field names inside the block are unqualified (radius, not self.radius). Lowering: every method becomes a C function Circle_area(struct Circle *self) and field access desugars to self->radius.

2.4 Built-in print

Variadic, prints with default formatting and inserts spaces (cheatsheet: print("name = ", name, ...)); newline at end. No format-string interface at the source layer.

3. Collection core

Three primitive containers, all with structural typing:

  • list<T>, ordered, growable. Access by integer index. Append via xs.push(v) or append(xs, v). Concatenation with +. Iteration with for x in xs.
  • map<K, V>, keyed lookup with m[k]. Iteration over keys: for k in m.keys(). Membership: k in m.
  • set<T>, unordered, unique members. Membership: x in s. Set ops via the query layer (union, intersect, except).
  • string, read-only list<char>-ish (see §1.4).

Lowering obligations:

  • list<T> is the workhorse. The runtime needs a small-buffer-optimised growable vector: pointer + len + cap, with the first ~4–8 elements inline.
  • map<K, V> defaults to a hash table for K: int | string; an ordered variant is needed for query stability. Swiss-table style is the modern default; see runtime note.
  • set<T> is a map<T, unit> internally. The query layer (§5) needs the insertion-ordered semantics for union/except to be deterministic.
  • All collections are value-semantically copied at language level; the VM enhancement spec 0951 §1 makes this explicit (“each function call must allocate a fresh copy of any list/map literal”). The C lowering must therefore use copy-on-write or per-call allocation; persistent data structures (HAMT) are a candidate.

4. Algebraic-data-type core

Two type-declaration shapes:

  • Records (struct-like):
    type Book { title: string, author: Person, pages: int }
    Construction Book { title: ..., author: ..., pages: ... }. Field access b.title. Equality is structural.
  • Sum types with payload-carrying variants:
    type Tree = Leaf | Node(left: Tree, value: int, right: Tree)
    Lowering: a tagged union. Nullary variants like Leaf can be folded into the tag word; positional variants get a payload struct.

Pattern matching deconstructs both:

return match t {
  Leaf => 0
  Node(l, v, r) => sum(l) + v + sum(r)
}

The C lowering must implement an exhaustive decision-tree compilation (Maranget’s algorithm) and reject non-exhaustive matches at compile time; the wildcard _ arm makes exhaustiveness explicit. Variable bindings inside a pattern (Node(l, v, r)) are immutable in the arm body.

Type declarations may carry methods (§2.3) inside the block. Methods are indexed by the principal type, so Tree cannot carry methods that depend on which variant self is; methods that need that dispatch internally via match self.

5. Query DSL

A complete in-language LINQ. The grammar of a single query is:

query := 'from' x 'in' src
         ( 'from' y 'in' src2
         | 'join' alias 'in' src 'on' expr
         | 'left' 'join' alias 'in' src 'on' expr
         | 'right' 'join' alias 'in' src 'on' expr
         | 'outer' 'join' alias 'in' src 'on' expr
         )*
         ( 'where' expr )?
         ( 'group' 'by' expr 'into' g )?
         ( 'sort' 'by' expr (',' expr)* )?
         ( 'skip' expr )?
         ( 'take' expr )?
         'select' expr

Set operations operate on whole queries:

(from x in a select x) union (from y in b select y)
(from x in a select x) union all ...
(from x in a select x) intersect ...
(from x in a select x) except ...

Sort direction is encoded by leading -: sort by -p.age.

group by returns a synthetic g per group, with .key and aggregation functions count(g), sum(g, e), avg(g, e), min(g, e), max(g, e).

External sources are loaded with load PATH as T (CSV / JSON / JSONL / YAML auto-detected by extension, override with with { format: "..." }). Results are written with save expr to PATH. extern type T { ... } declares a record shape that the loader is allowed to populate from an externally-defined schema (the load_with.mochi example).

Lowering: the query is not a function call to a library; it is a source-level DSL that the compiler translates into a pipeline of iterator combinators. The naive lowering is a nested loop; the modern lowering is a fused stream (Csharp LINQ-to-objects style) that respects short-circuiting (take cuts off the upstream). The C target must support both: small queries inline as nested loops, large queries materialise into the runtime’s mochi_dataset type.

6. Stream / agent core

6.1 Streams

A stream T { fields } declaration introduces a global event channel keyed by the type name. Events are emitted with emit T { ... } and consumed by on T as x { body } handler blocks. Handlers can be either top-level (the cheatsheet example) or nested inside an agent (§6.2).

The roadmap pins the dispatch contract:

  • Events are queued and replayed deterministically.
  • Multiple on blocks for the same stream are all invoked, concurrently per the docs (docs/features/streams.md).
  • Optional timestamp: time field; auto-assigned via now() if absent.
  • Events emitted from inside a handler are queued (FIFO) and processed after the current handler returns.

The “concurrently” claim is a real semantic obligation. The C lowering needs an M:N scheduler with a per-stream FIFO and a worker pool. Test runs require the scheduler to also support deterministic single-threaded replay (the cheatsheet’s test blocks use emit and expect and require ordering by timestamp).

6.2 Agents

An agent T { ... } block bundles state, handlers, and exposed methods. State is per-instance (var count: int = 0); handlers (on Sensor as s { ... }) react to streams; intent foo(): U { ... } methods are externally callable. Construction is let m = T {}; intents are invoked as methods: m.status().

Lowering: an agent is a record-with-handlers. The C output is a struct with the state fields plus a fixed-size handler table that the scheduler walks on event delivery. Intent methods compile like ordinary methods (§2.3). The agent struct is heap-allocated and lives for the duration of the program (roadmap: “single instance per agent definition” was the v0.5 contract; future versions may allow many).

7. Logic-programming core

fact parent("Alice", "Bob")
rule grandparent(x, z) :- parent(x, y), parent(y, z)
let gp = query grandparent(x, z)

Bottom-up Datalog. Facts are stored in an indexed relation. Rules are non-recursive in syntax but the engine must support transitive closures (grandparent over parent). Queries return a list<map<string, T>> keyed by the variable names in the head, so gp[0].x and gp[0].z are accessible.

Lowering: the modern path is to emit a semi-naive evaluator. The pragmatic path for a transpiler is to translate each rule into a nested loop and a result accumulator, with deduplication via a per-relation hash set. Sliding window: emit both, gate on relation size at runtime.

8. AI and FFI shells

8.1 Generative AI

generate text { prompt: ..., temperature: ..., max_tokens: ..., stop: ..., model: ..., tools: [...] } returns a string; generate embedding { text: ..., normalize: bool, model: ... } returns list<float>; generate T { prompt: ... } (where T is a user record) returns a T, with the model coerced to emit JSON matching the record schema (cheatsheet §6, examples/v0.3/generate-struct.mochi).

Models are declared once:

model quick { provider: "openai", name: "gpt-3.5-turbo", temperature: 0.7 }

and referenced by name from generate blocks. Tools are ordinary fun references with optional { description: "..." } metadata.

Lowering: generate is a runtime call dispatched to a mochi_llm_* shim in the C runtime. The shim talks HTTP (libcurl) to the provider; the record-return form uses the same JSON-driven loader as load. Models live in a global symbol table populated at static-init time.

8.2 HTTP fetch

fetch "url" as T and the with { method, headers, body } long form. Errors propagate as exceptions; try { ... } catch err { ... } catches. Lowering: libcurl + JSON parser. The try/catch form needs an exception-like construct in C; setjmp/longjmp is the pragmatic choice (see runtime note for the algebraic-effects-vs-setjmp tradeoff).

8.3 FFI

import go "math" as math
extern fun math.Sqrt(x: float): float

Three host languages are explicitly named: go, python, typescript. At the C target, “import go” and “import typescript” cannot mean linking against Go or Deno objects directly; the only universal interop story is subprocess RPC. The transpiler must generate stubs that marshall the call to a sidecar process for each host, using a JSON-over-pipe protocol matching the existing runtime/ffi/* shape (named in the docs but not read here).

A fourth host implicit in the design is “import c”, direct linkage against a C library. The transpiler should add this case explicitly: an import c "header.h" as foo declaration that maps to a #include plus direct call generation, no marshalling.

9. Tests

test "name" { expect bool_expr ; ... }

test blocks are top-level. Each expect is a boolean expression. On failure, the reported diagnostic carries the line and the rendered expression text (so the compiler has to keep the source span for every expect). Tests participate in build via mochi test.

Lowering: every test becomes a C function mochi_test_<n>(test_ctx*); every expect becomes a macro that records pass/fail and the source-text expansion. A small driver iterates the test table at main() time and prints results in a stable format.

10. Module and package system

A directory is a package. Files share a namespace. package foo at the top of a file sets the package name; import "path" (relative ./...) brings another package in; aliasing via as. export makes a name visible; unmarked names are package-private.

Lowering: each package becomes a C translation unit (or a header+source pair). Cross-package references go through the alias as a C namespace prefix: import "mathutils" as mu; mu.add(...)mathutils_add(...). The transpiler must produce a deterministic mangling scheme; see codegen note for the rules.

11. Error model

Two distinct error paths:

  • Compile time: type errors, exhaustiveness, re-assignment of let, module-cycle, undeclared extern, schema mismatch in load, etc.
  • Runtime: fetch failure, parse failure, division by zero, integer overflow (per the threat-model “logic bugs trap deterministically” clause).

Runtime errors are recovered with try { ... } catch err { ... }. The caught err is a value with at least .message: string. There is no explicit error-type hierarchy in the docs; the C lowering uses a single mochi_error discriminated value with a string message plus an optional machine-readable code, threaded through try / catch via a stacked longjmp buffer.

12. Concurrency semantics summary

Distilled from §6:

  • Streams: M handlers per stream type, all invoked per event, concurrently per the docs but replayable deterministically in test mode.
  • Agents: own per-agent state. Handlers inside an agent are serialised against that agent’s state (single-thread per agent is the simplest reading); inter-agent dispatch can be concurrent.
  • The threat model excludes “concurrent / multi-actor Mochi” from the vm3 memory-safety claim, which means the language does admit concurrency but the safety story is best-effort there.

For the C target this resolves to: per-agent mailbox, one fiber per agent, M:N scheduler over OS threads, channels for inter-agent events. See the runtime note for the libdill / minicoro / boost.context tradeoff.

13. Reflection / introspection

Nothing in the language surface requires runtime reflection. print(x) needs a per-type formatter, which the transpiler can emit statically by walking the record’s field list at compile time (no runtime type descriptors needed beyond what match already requires for sum types). The C lowering can therefore avoid a full RTTI system, keeping the runtime small.

14. Lowering-obligation summary

Compressed checklist that the MEP body uses as the source of truth for the codegen design:

  1. Preserve let/var, structural records, sum types, methods, pattern matching.
  2. Preserve UTF-8 string semantics for indexing and iteration.
  3. Preserve copy-on-allocate semantics for list/map/set literals inside function bodies (VM-enhance §1).
  4. Preserve LINQ query semantics including set operations and group-by.
  5. Preserve concurrent multi-handler stream dispatch with deterministic test replay.
  6. Preserve agent encapsulation and per-instance state.
  7. Preserve try/catch with stack unwinding.
  8. Preserve Datalog fact/rule/query evaluation with deduplication.
  9. Preserve load/save for CSV/JSON/JSONL/YAML.
  10. Preserve fetch with header/body/method customisation and JSON decoding.
  11. Preserve generate text/generate T/generate embedding with per-model defaults and tool-callbacks.
  12. Preserve test discoverability and pretty-printed failures.
  13. Preserve package-private symbol scoping with deterministic mangling.
  14. Preserve extern interop with Go / Python / TypeScript / C.

Each obligation maps to a specific C-output construct documented in 05-codegen-design.md.