Self-hosting feasibility study (Phase 30)
Scope
Phase 30 of the implementation plan asks whether the Gossamer compiler can, in its current shape, begin compiling itself. This document captures the state of that question at the end of Phase 30 and lists every gap standing between today's surface language and a full self-hosted front-end.
Companion ports under examples/selfhost/:
- lexer.gos: minimal tokeniser over a subset of Gossamer source.
- parser.gos: minimal recursive-descent parser over a synthetic token stream.
Both files parse cleanly through gos parse, which is asserted by
crates/gossamer-cli/tests/cli.rs::selfhost_ports_parse_cleanly.
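For a sense of scale, a minimal tokeniser of this kind can be sketched as follows. The sketch is in Rust, the language of the reference implementation, not in Gossamer, and it is not the contents of lexer.gos: the TokenKind variants and the tokenize function are hypothetical names chosen for illustration, and the input is assumed to be ASCII.

```rust
/// Hypothetical token kinds for a tiny subset; not the set used by lexer.gos.
#[derive(Debug, Clone, PartialEq)]
pub enum TokenKind {
    Ident(String),
    Int(i64),
    Punct(char),
}

/// Splits `src` into tokens, skipping ASCII whitespace.
/// Any byte that starts neither an identifier nor a number becomes a `Punct`.
/// Assumption: ASCII input; integer literals are not checked for overflow.
pub fn tokenize(src: &str) -> Vec<TokenKind> {
    let bytes = src.as_bytes();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        if b.is_ascii_whitespace() {
            i += 1;
        } else if b.is_ascii_alphabetic() || b == b'_' {
            // Identifier: [A-Za-z_][A-Za-z0-9_]*
            let start = i;
            while i < bytes.len() && (bytes[i].is_ascii_alphanumeric() || bytes[i] == b'_') {
                i += 1;
            }
            tokens.push(TokenKind::Ident(src[start..i].to_string()));
        } else if b.is_ascii_digit() {
            // Decimal integer literal, accumulated digit by digit.
            let mut value: i64 = 0;
            while i < bytes.len() && bytes[i].is_ascii_digit() {
                value = value * 10 + i64::from(bytes[i] - b'0');
                i += 1;
            }
            tokens.push(TokenKind::Int(value));
        } else {
            tokens.push(TokenKind::Punct(b as char));
            i += 1;
        }
    }
    tokens
}
```

Calling tokenize("let x = 42;") yields two identifiers, a punctuation token, an integer and a final punctuation token. Note that the sketch already leans on a growable Vec, byte literals and range slicing, all of which appear in the gap list further down.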
What works today
- Enums with data, pattern-matched exhaustively; viable for TokenKind, Expr, Stmt, Ty.
- Structs with named fields, & and &mut references; viable for Lexer, Parser, TyCtxt-shaped contexts.
- for, while, loop, match, if/else; enough control flow for a recursive-descent parser.
- Function calls, method calls, tuples, slices, mutable locals.
- println / print / eprintln / format builtins.
- Package manager (Phase 27–28): project layout, lockfile, fetcher, vendor directory. Ready to host a gossamer-selfhost crate once the front-end ports land.
- Build graph (Phase 29): incremental, parallel, content-addressable cache. Ready to compile the ports once the back-end gaps below are closed.
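The language features listed above are the ones a recursive-descent parser exercises: an enum with data for the tree, a struct with named fields mutated through &mut, and match for dispatch. The sketch below shows that shape in Rust, not Gossamer, over a hand-built synthetic token stream. Tok, Expr, Parser and eval are hypothetical names, and Box, Option and ? are Rust conveniences that the Gossamer ports cannot necessarily rely on yet.

```rust
/// Hypothetical tokens; the stream is built by hand, with no lexer involved.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tok {
    Num(i64),
    Plus,
    Star,
    LParen,
    RParen,
}

/// An enum with data, matched exhaustively in `eval`.
#[derive(Debug, PartialEq)]
pub enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// A struct with named fields, advanced through `&mut self`.
pub struct Parser<'a> {
    toks: &'a [Tok],
    pos: usize,
}

impl<'a> Parser<'a> {
    pub fn new(toks: &'a [Tok]) -> Self {
        Parser { toks, pos: 0 }
    }

    fn peek(&self) -> Option<Tok> {
        self.toks.get(self.pos).copied()
    }

    /// expr := term ('+' term)*
    pub fn expr(&mut self) -> Option<Expr> {
        let mut lhs = self.term()?;
        while self.peek() == Some(Tok::Plus) {
            self.pos += 1;
            let rhs = self.term()?;
            lhs = Expr::Add(Box::new(lhs), Box::new(rhs));
        }
        Some(lhs)
    }

    /// term := atom ('*' atom)*
    fn term(&mut self) -> Option<Expr> {
        let mut lhs = self.atom()?;
        while self.peek() == Some(Tok::Star) {
            self.pos += 1;
            let rhs = self.atom()?;
            lhs = Expr::Mul(Box::new(lhs), Box::new(rhs));
        }
        Some(lhs)
    }

    /// atom := NUM | '(' expr ')'
    fn atom(&mut self) -> Option<Expr> {
        match self.peek()? {
            Tok::Num(n) => {
                self.pos += 1;
                Some(Expr::Num(n))
            }
            Tok::LParen => {
                self.pos += 1;
                let inner = self.expr()?;
                if self.peek() == Some(Tok::RParen) {
                    self.pos += 1;
                    Some(inner)
                } else {
                    None // missing closing parenthesis
                }
            }
            Tok::Plus | Tok::Star | Tok::RParen => None,
        }
    }
}

/// Evaluates the tree; every `Expr` variant is handled, with no wildcard arm.
pub fn eval(e: &Expr) -> i64 {
    match e {
        Expr::Num(n) => *n,
        Expr::Add(a, b) => eval(a) + eval(b),
        Expr::Mul(a, b) => eval(a) * eval(b),
    }
}
```

Parsing the stream for 2 + 3 * (4 + 1) and evaluating the result gives 17, which shows that the grammar encodes precedence through the expr/term/atom layering. The sketch does not check for trailing tokens after a complete expression.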
Gaps blocking a real front-end port
The following are required by the Rust implementation of the front-end and are not yet first-class in Gossamer source:
- Dynamic arrays / Vec-shaped growable collections. The ports lean on [T] slice-push; the runtime supports it through stubs, but the language surface needs a stable story for push, pop, extend, with_capacity.
- Hash maps. Every non-trivial compiler phase has a symbol table. A std::collections::HashMap-equivalent needs a public surface (BTreeMap would do; neither is wired).
- String / byte indexing. The lexer port calls byte_at(i) and slice(start, end). These are convenient shorthands over str::as_bytes() and &str[start..end]; the language needs the equivalent range-slice notation or methods exposed stably.
- ? error propagation on Result<T, E>. Already parses; the typechecker still needs the generic arithmetic pass that distinguishes Ok from Err.
- Generic functions and types. The parser needs Vec<T>, Option<T>, Result<T, E>. Monomorphisation lands in Phase 10 of the plan but is not yet wired through to codegen.
- Traits / interfaces. Display, Debug, Iterator are all used pervasively in the Rust original; Phase 08 delivered the trait surface, but trait objects through the VM are still stubbed.
- Stdlib io::Read-style traits. The lexer needs a source of bytes that is not necessarily a fully-loaded String. Today we load the whole file.
- Byte literals, b"...", b'x'. The Rust lexer proper uses byte literals; the Gossamer port had to synthesise integer constants instead.
- Macros. The Rust parser uses matches!, format!, vec![], and thiserror::Error-derived enums. Macros are not planned for the 1.0.0 release; the port must inline these.
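To make several of these gaps concrete, the following Rust sketch shows the idioms the Rust front-end depends on: a scoped symbol table built from Vec and HashMap, byte and range indexing, ? on Result, and a matches! call inlined by hand as the port would have to do. SymbolTable, parse_sum and is_digit_or_underscore are hypothetical names, not code from the Gossamer repository. byte_at and slice borrow the port's names but are written here as non-panicking variants, which is an assumption about their behaviour.

```rust
use std::collections::HashMap;

/// A growable stack of scopes, each mapping a name to a hypothetical symbol id.
pub struct SymbolTable {
    scopes: Vec<HashMap<String, u32>>,
}

impl SymbolTable {
    pub fn new() -> Self {
        let mut scopes = Vec::with_capacity(8);
        scopes.push(HashMap::new()); // global scope
        SymbolTable { scopes }
    }

    pub fn enter(&mut self) {
        self.scopes.push(HashMap::new());
    }

    pub fn leave(&mut self) {
        // Never pop the global scope.
        if self.scopes.len() > 1 {
            self.scopes.pop();
        }
    }

    pub fn define(&mut self, name: &str, id: u32) {
        if let Some(scope) = self.scopes.last_mut() {
            scope.insert(name.to_string(), id);
        }
    }

    /// Innermost definition wins.
    pub fn lookup(&self, name: &str) -> Option<u32> {
        self.scopes.iter().rev().find_map(|s| s.get(name).copied())
    }
}

/// Bounds-checked byte indexing over `str::as_bytes()`.
pub fn byte_at(src: &str, i: usize) -> Option<u8> {
    src.as_bytes().get(i).copied()
}

/// Range slicing; `None` if the range is out of bounds or splits a UTF-8 character.
pub fn slice(src: &str, start: usize, end: usize) -> Option<&str> {
    src.get(start..end)
}

/// `?` returns early with the `Err` value and otherwise unwraps the `Ok` value.
pub fn parse_sum(a: &str, b: &str) -> Result<i64, std::num::ParseIntError> {
    let x = a.trim().parse::<i64>()?;
    let y = b.trim().parse::<i64>()?;
    Ok(x + y)
}

/// Hand-inlined equivalent of `matches!(b, b'0'..=b'9' | b'_')`.
pub fn is_digit_or_underscore(b: u8) -> bool {
    match b {
        b'0'..=b'9' | b'_' => true,
        _ => false,
    }
}
```

A Gossamer equivalent of SymbolTable alone needs the first two gaps closed plus generics, which is why collections dominate the list.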
Benchmarks
A throughput benchmark (tokens/sec, LoC/s) is tracked as a work item under Phase 31, to start once gos test and gos bench are wired. For now the only measurement is the no-op incremental build budget from Phase 29 (tests/build_graph.rs::no_op_rebuild_completes_within_the_phase_29_budget).
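The two throughput figures reduce to a count divided by wall-clock time. The Rust sketch below shows one way such a measurement could be taken. It is not the planned gos bench harness: per_second, timed and count_tokens are hypothetical helpers, and count_tokens is a stand-in that counts whitespace-separated words instead of running a real lexer.

```rust
use std::time::{Duration, Instant};

/// Units of work per second; returns 0.0 for a zero-length measurement.
pub fn per_second(count: usize, elapsed: Duration) -> f64 {
    let secs = elapsed.as_secs_f64();
    if secs > 0.0 {
        count as f64 / secs
    } else {
        0.0
    }
}

/// Runs `work` once and returns its result with the elapsed wall-clock time.
pub fn timed<T>(work: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = work();
    (out, start.elapsed())
}

/// Stand-in for a lexer: treats whitespace-separated words as tokens.
pub fn count_tokens(src: &str) -> usize {
    src.split_whitespace().count()
}
```

Tokens/sec is then per_second(tokens, elapsed) for a timed lexer run, and LoC/s is the same call with src.lines().count() as the count. A single run is noisy, so a real harness would repeat the measurement and report a median or minimum.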
Conclusion
Self-hosting the front-end is reachable but not yet practical. The single biggest prerequisite is the stdlib surface for collections (the first two gaps above: growable arrays and hash maps); the second is closing the generics/trait-object loop in the VM. With those two in place, porting the Rust lexer and parser verbatim is a roughly 6 KLoC translation exercise, not a research project.
No work is proposed for Phase 30 beyond the ports and this write-up. Subsequent phases will revisit self-hosting once the 1.0.0 release ships.