Type-system lowering details: generics/monomorphisation, records, sum types with niche optimisation, closures with fat pointer, strings with SSO, lists, maps with Swiss-table, sets, time/duration, error values with built-in code table.
MEP-45 research note 06, Type-system lowering details
Author: research pass for MEP-45. Date: 2026-05-22 (GMT+7).
Companion to note 05. Focused on the type-system corners and how they become C types and C invariants. This note assumes the codegen scaffold in note 05 and elaborates one piece at a time.
1. Generics and monomorphisation
Mochi’s surface generics are limited to type parameters on collections and on user record/sum types (no higher-kinded types in the docs, no trait/typeclass system surfaced). The transpiler performs full monomorphisation:
- After type-checking, the elaborator records every concrete
instantiation observed in the program (
list<int>,Pair<string, list<float>>,Option<Tree>). - For each instantiation, a fresh C type and a fresh set of method/helper functions are emitted.
- De-duplication: identical instantiations across packages collapse to a single C type using the mangling rules in note 05 §3.
- Polymorphic recursion is rejected at type-check time (a generic
function cannot call itself at a strictly larger type), this keeps
the monomorphisation set finite. (The MEP body must verify the spec
rules this out; if not, a Phase-2 “polymorphic stub” using a boxed
mochi_valueis the fallback.)
The set of instantiations is a closed analysis over the whole program,
so it requires the transpiler to see all packages at once. For the
mochi build path this is fine; for separate compilation (Phase 3) the
transpiler would have to emit specialisation stubs lazily.
2. Records (struct types)
type Book {
title: string
author: Person
pages: int
tags: list<string>
metadata: map<string, string>
published: bool
}Lowering: a packed C struct. Field order in the C struct is the source
order (so published: bool lives at the end of Book, and the
compiler does not silently re-order for cache reasons). This rule
exists for two reasons:
- Deterministic ABI: a Book emitted by this transpiler must lay out identically on every supported target.
- Round-trip with
print(book): pretty-printers walk fields in source order.
C struct layout still has C padding rules. The transpiler does not
inject __attribute__((packed)) (that would trigger UB on misaligned
loads on arm64). Instead, the transpiler may re-order at the source
level if and only if the user opted in with @layout(compact) on the
type (Phase 2). For v1, source order wins.
Constructors:
struct pkg_Book pkg_Book__new(
mochi_str title, struct pkg_Person author, mochi_int pages,
mochi_list__str tags, mochi_map__str_str metadata, bool published);Field access b.title lowers to b.title (the C compiler matches the
field name). Field update b.pages = 5 is not allowed if the
binding holding b is let; under var, the lowering is plain C
assignment.
Structural equality is field-wise:
bool pkg_Book__eq(struct pkg_Book a, struct pkg_Book b);emitted automatically when the type is used in == / != /
set<Book> / map<Book, ...>.
3. Sum types
type Tree = Leaf | Node(left: Tree, value: int, right: Tree)Lowering:
typedef enum {
PKG_TREE_TAG__Leaf,
PKG_TREE_TAG__Node,
} pkg_Tree_tag;
struct pkg_Tree {
pkg_Tree_tag tag;
union {
struct { /* empty */ } Leaf;
struct {
struct pkg_Tree *left; // boxed, to break recursion
mochi_int value;
struct pkg_Tree *right;
} Node;
} u;
};Constructors emit factory functions:
struct pkg_Tree pkg_Tree__Leaf(void);
struct pkg_Tree pkg_Tree__Node(struct pkg_Tree *l, mochi_int v, struct pkg_Tree *r);Recursive variant payloads use heap-allocated boxes; non-recursive payloads inline. The transpiler proves recursion at type-check time and boxes accordingly.
Equality: tag compare, then per-variant field compare. Auto-derived; shared with the records path.
3.1 Niche optimisation
Two special cases:
?T(type Option<T> = None | Some(T)) withTa pointer-shaped value (string, list, map, record handle):Noneisnullptr,Some(x)isx. No tag word.- A two-variant nullary enum (e.g.
type Side = Left | Right) lowers to a singleuint8_t.
A third case considered: pointer tagging on platforms where the low 3 bits of a pointer are unused. Deferred to Phase 2, the codegen saving is real (~6 % on the BG corpus per the Lobster paper) but the portability cost is high (wasm32 pointers, arm64 with MTE).
3.2 GADTs and refined patterns
Not in the language surface. No work to do.
4. Function types and closures
Mochi function types are structural: fun(int, string): bool matches
any function (free or method or arrow) with that signature.
Lowering:
typedef struct {
bool (*code)(void *env, mochi_int a, mochi_str b);
void *env;
} mochi_closure__bool_i64_str;
bool mochi_closure__bool_i64_str__call(
mochi_closure__bool_i64_str f, mochi_int a, mochi_str b) {
return f.code(f.env, a, b);
}Free functions get an env == NULL adapter. Arrow functions get an
env-struct holding their captures. Method values get an env == self
shim.
The transpiler tries to devirtualise where it can (note 05 §14, item 5), but a closure call site never compiles into something less than this two-load + indirect-call sequence. The cost is ~3-4 cycles on modern x86 and arm64, well-predicted by the branch target predictor as long as the call site is monomorphic.
4.1 Closure environment heap discipline
If the closure escapes its lexical scope (returned upwards, stored in a record, sent on a channel), the env struct is heap-allocated and GC-managed. If the closure does not escape (single use inside the same function), the env struct is stack-allocated.
The escape analysis is conservative: any closure passed to a function
parameter is treated as escaping unless the callee is annotated
@no_escape (Phase 2 attribute). For v1, all closures heap-allocate
their env when they capture anything. This is the safe-and-correct
default; performance work is Phase 2.
5. Strings
mochi_str is an immutable UTF-8 byte slice with a precomputed hash:
typedef struct {
const uint8_t *bytes;
size_t len; // byte length
uint32_t hash; // FxHash or wyhash, computed at construction
uint32_t flags; // interned, gc-managed, …
} mochi_str;Indexing s[i] returns the i-th code point as a one-character
mochi_str. The cursor:
typedef struct { const uint8_t *p; const uint8_t *end; } mochi_str_iter;
mochi_str_iter mochi_str_iter_init(mochi_str s);
bool mochi_str_iter_next(mochi_str_iter *it, mochi_str *out_char);The cursor handles BOM, surrogates, and invalid UTF-8 (replacement character per Unicode TR-36).
Concatenation +:
mochi_str mochi_str_cat(mochi_str a, mochi_str b);Short-string optimisation: strings of ≤ 22 bytes are stored inline in the
mochi_str struct (bytes reinterpreted as a buffer, length in low 6
bits of flags). This is a Phase-2 optimisation, v1 always heap-allocates.
Interning: string literals in the source are deduplicated per-module at
codegen time. Runtime interning is not automatic; an opt-in
mochi_str_intern(s) returns a canonical handle for map<string, T>
hot loops.
6. Lists
mochi_list__T is a growable, dense vector:
typedef struct {
T *data;
size_t len;
size_t cap;
uint32_t flags; // gc, frozen, owned, ...
} mochi_list__T;Capacity policy: amortised doubling with a floor of 4. Append is
amortised O(1). Concatenation a + b allocates a fresh list of length
a.len + b.len.
The “list literal is cloned per call” rule (VM-enhance 0951 §1):
- Compile-time constant literal in a
letbinding at top level: emitted as astatic constarray; the wrapper is built once at startup. - Constant literal in a function body: emitted as a
mochi_list__T_new3(1, 2, 3)call, which always allocates. - Variable literal in a function body: same.
[]always returns a fresh empty list.
This is per the VM-enhance spec which is normative for the language.
7. Maps
mochi_map__K_V is a Swiss-table (Abseil-style) for K hashable. The
hash function is statically resolved from K:
int,bool, enum-tag-like sum tags: identity (then mixed with multiplicative hash to break clustering)string: precomputedhashfield onmochi_str- record types: field-wise hash, combined with a rotating mixer
typedef struct {
/* opaque slot array */
void *slots;
size_t len;
size_t cap;
uint32_t flags;
} mochi_map__K_V;API:
V mochi_map__K_V__get(mochi_map__K_V m, K key); // panic on miss
bool mochi_map__K_V__try_get(mochi_map__K_V m, K key, V *out);
void mochi_map__K_V__set(mochi_map__K_V *m, K key, V value);
bool mochi_map__K_V__contains(mochi_map__K_V m, K key);
size_t mochi_map__K_V__len(mochi_map__K_V m);The query layer needs insertion-ordered iteration. A second flavour,
mochi_omap__K_V, holds an auxiliary mochi_list__K of insertion order
keys and is the type the query DSL emits for from x in m. This is a
separate type so the cost is not paid by hot non-query maps.
8. Sets
A mochi_set__T is structurally mochi_map__T_unit where unit is a
zero-byte sentinel. The map’s value slot is omitted by the codegen,
saving the value-side storage.
9. Time and duration
typedef int64_t mochi_time; // ns since Unix epoch UTC
typedef int64_t mochi_dur; // ns
std/time.now() calls clock_gettime(CLOCK_REALTIME, &ts) on POSIX,
GetSystemTimePreciseAsFileTime on Windows. parse_iso and
format_iso are runtime functions. Arithmetic is plain integer
arithmetic with overflow checks.
10. Error values
typedef struct mochi_error {
int32_t code; // 0 = no-error, negative = built-in, positive = user
mochi_str message;
struct mochi_error *cause; // chained, optional
} mochi_error;Built-in codes are documented in mochi/errors.h:
| Code | Name | Source |
|---|---|---|
| -1 | MOCHI_ERR_FETCH | network or HTTP non-2xx |
| -2 | MOCHI_ERR_PARSE | JSON / YAML / CSV decode |
| -3 | MOCHI_ERR_TYPE | runtime type mismatch (from mochi_value unwrap) |
| -4 | MOCHI_ERR_INDEX | OOB index / missing key |
| -5 | MOCHI_ERR_DIVZERO | integer divide by zero |
| -6 | MOCHI_ERR_OVERFLOW | integer overflow (debug builds only) |
| -7 | MOCHI_ERR_FFI | FFI subprocess failure |
| -8 | MOCHI_ERR_LLM | provider error from generate |
| -9 | MOCHI_ERR_ASSERT | expect false |
User errors get codes ≥ 1; the runtime issues them via a mochi_panic
counterpart that constructs a mochi_error and mochi_throws.
11. Effect labels (future)
The current language surface has no algebraic effect system in the
source. The MEP body should record that if a future MEP adds one (the
threat-model document mentions a meta effect on OpUnseal), the C
target is prepared: a per-effect capability is threaded as a hidden
parameter through any function that uses it, matching the Koka
capability-passing translation.
For v1: no extra parameters.
12. Type-checker → codegen interface
The MIR carries every type as a fully-instantiated MType. The codegen
visits each MType exactly once via a memoised lowering function
lower_type(MType) -> CType. Memoisation ensures that two identical
MTypes share a single C type and a single set of helpers.
The lowering is total: every MType the type-checker emits has a C
lowering. The MEP body must list every case and the test gate must
cover all of them with at least one fixture each.