commonsformat-parser-bootstrap
A minimal parser for Commons Format modules, sized for bootstrapping. Reads enough of the format to load the production parser, runner, and resolver spec modules. Intended for D0 deployment only — local use during initial toolchain bootstrap. The production parser (commonsformat-parser) supersedes this module once available.
Premise
The bootstrap parser exists for one job: read enough of the Commons Format to load the production parser, eval runner, and resolver spec modules so that those tools can be generated and verified.
It is not a production parser. It does not need to handle every TOML edge case, every Markdown construct, every adversarial input, or any deployment context other than D0. It exists to break the chicken-and-egg of self-hosting and is retired once the production parser is verified.
The bootstrap parser is the only Commons Format module that gets hand-shepherded into existence. Its first instance is generated by a human reading the format-spec prose and prompting a code generator, then iterating against this module's small eval suite until the implementation conforms. Once that first instance exists, it is used mechanically to read the production parser spec so that a production parser can be generated, after which the bootstrap parser is retired.
Deployment tier
This module is D0 only. It is intended for local use during toolchain bootstrap. Implementations of this module must not be deployed in any context where they process input from untrusted sources, run as part of CI, or otherwise face adversarial input.
A consumer who needs a parser for production use generates an implementation from commonsformat-parser (the production module), which targets D1 or higher.
Interface
A bootstrap parser is a function or component that takes a path to a module directory on disk and returns the logical contents specified in format-spec §10, sufficient to support reading the production parser, runner, and resolver spec modules.
The bootstrap parser handles a deliberately narrow subset of the format. It is not required to handle every construct the format permits — only those that appear in the spec modules it must read during bootstrap.
Schema
This module ships a schema.sql declaring the shape of its data. Per §8 this is a shape commitment, not a storage commitment — the generated implementation chooses a runtime representation appropriate to its intent.
-- This DDL describes data shape, not storage. Runtime representation
-- is implementation-defined; choose what is appropriate to the target
-- language and the module's intent. The bootstrap parser is a pure
-- function from a module directory to a partial Module structure;
-- the tables below are the shape of its output, not a database.
--
-- This schema is a deliberate subset of commonsformat-parser's schema.
-- The bootstrap parser handles only the fields needed during bootstrap
-- (metadata, prose with tagged sections, evals, path), as documented in
-- the module's <constraints>.
CREATE TABLE modules (
    path TEXT NOT NULL PRIMARY KEY,
    name TEXT NOT NULL,
    version TEXT NOT NULL,
    commonsformat TEXT NOT NULL,
    has_eval_suite BOOLEAN NOT NULL
);
CREATE TABLE tagged_sections (
    module_path TEXT NOT NULL,
    tag_name TEXT NOT NULL,
    ordinal INTEGER NOT NULL,
    content TEXT NOT NULL,
    PRIMARY KEY (module_path, tag_name, ordinal),
    FOREIGN KEY (module_path) REFERENCES modules (path)
);
CREATE TABLE named_constraints (
    module_path TEXT NOT NULL,
    constraint_name TEXT NOT NULL,
    description TEXT NOT NULL,
    PRIMARY KEY (module_path, constraint_name),
    FOREIGN KEY (module_path) REFERENCES modules (path)
);
CREATE TABLE eval_cases (
    module_path TEXT NOT NULL,
    case_name TEXT NOT NULL,
    case_class TEXT NOT NULL CHECK (case_class IN ('functional', 'adversarial', 'generator_adversary')),
    category TEXT NOT NULL,
    input_blob BLOB NOT NULL,
    expect_blob BLOB NOT NULL,
    PRIMARY KEY (module_path, case_name),
    FOREIGN KEY (module_path) REFERENCES modules (path)
);
CREATE TABLE parse_violations (
    module_path TEXT NOT NULL,
    ordinal INTEGER NOT NULL,
    kind TEXT NOT NULL,
    detail TEXT NOT NULL,
    PRIMARY KEY (module_path, ordinal)
);
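Because the DDL commits to shape rather than storage, one possible in-memory representation is a set of plain data classes. The sketch below is an assumption about how a Python implementation might mirror the tables: the class names are hypothetical, the foreign keys become nesting under a single Module object, and the CHECK constraint on case_class becomes a constructor check.

```python
from dataclasses import dataclass, field

ALLOWED_CASE_CLASSES = {"functional", "adversarial", "generator_adversary"}


@dataclass(frozen=True)
class TaggedSection:
    tag_name: str
    ordinal: int
    content: str


@dataclass(frozen=True)
class EvalCase:
    case_name: str
    case_class: str
    category: str
    input_blob: bytes
    expect_blob: bytes

    def __post_init__(self):
        # Mirrors the CHECK constraint on eval_cases.case_class.
        if self.case_class not in ALLOWED_CASE_CLASSES:
            raise ValueError(f"unknown case_class: {self.case_class!r}")


@dataclass
class Module:
    path: str
    name: str
    version: str
    commonsformat: str
    has_eval_suite: bool
    tagged_sections: list = field(default_factory=list)
    # Maps constraint_name to description, matching the named_constraints key.
    named_constraints: dict = field(default_factory=dict)
    eval_cases: list = field(default_factory=list)
    # (kind, detail) pairs; list position plays the role of ordinal.
    parse_violations: list = field(default_factory=list)
```

Nesting the child rows under Module is only one reading of the schema; an implementation in another language might equally use records, structs, or maps.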
What the bootstrap parser must handle
- read-format-spec-module: parses the format-spec module (commonsformat.toml, commonsformat.md, evals.toml, LICENSE) successfully
- read-parser-spec-module: parses the production parser spec module successfully
- read-runner-spec-module: parses the eval runner spec module successfully
- read-resolver-spec-module: parses the resolver spec module successfully
- extract-required-tags: extracts the tagged sections (<intent>, <constraints>, <avoid>, <interface>, <threat-model>) used by the bootstrap-relevant spec modules
- handle-utf8: input files in UTF-8 are accepted; non-UTF-8 input is rejected
- produce-logical-contents: produces the metadata, prose, constraints, evals, and path fields per format-spec §10
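Two of these requirements, handle-utf8 and extract-required-tags, are small enough to sketch directly. The code below assumes that tagged sections are written as literal <tag>...</tag> pairs that do not nest, and that ordinals count occurrences per tag name starting from zero; neither detail is confirmed by this module's text, and the function names are hypothetical.

```python
import re

REQUIRED_TAGS = ("intent", "constraints", "avoid", "interface", "threat-model")

# Matches <tag>body</tag> for the required tags only; assumes no nesting.
_TAG_RE = re.compile(
    r"<(?P<tag>" + "|".join(re.escape(t) for t in REQUIRED_TAGS) + r")>"
    r"(?P<body>.*?)"
    r"</(?P=tag)>",
    re.DOTALL,
)


def decode_utf8(raw):
    """Decode bytes as strict UTF-8; return None (rejected) on invalid input."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


def extract_tagged_sections(prose):
    """Return (tag_name, ordinal, content) tuples in document order."""
    counts = {}
    sections = []
    for match in _TAG_RE.finditer(prose):
        tag = match.group("tag")
        ordinal = counts.get(tag, 0)  # assumed: ordinal is per tag name
        counts[tag] = ordinal + 1
        sections.append((tag, ordinal, match.group("body").strip()))
    return sections
```

A regular expression is adequate here only because the inputs are a fixed, trusted set of spec modules; it makes no attempt to handle malformed or hostile markup.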
What the bootstrap parser is NOT required to handle
- Adversarial input. The bootstrap parser is not required to remain bounded under hostile input, to detect malicious modules, or to resist denial-of-service. It runs locally on trusted seed material.
- Performance. Slow parsing is acceptable for bootstrap. The production parser handles performance-sensitive cases.
- Edge cases of the TOML or Markdown subsets that don't appear in the bootstrap-relevant spec modules. If a future spec module uses a TOML construct the bootstrap parser doesn't recognize, that's fine — the bootstrap parser doesn't need to read it. The production parser will.
- Comprehensive error reporting. Returning "rejected" without detailed diagnostics is acceptable at D0.
- Streaming, incremental parsing, or any optimization concern.
Threat model
The bootstrap parser runs on the consumer's local machine, processing seed material the consumer chose to clone (the format-spec, parser, runner, and resolver spec modules from a trusted source). The threat model is "the consumer might have made a mistake," not "an adversary fed me input."
This is the D0 threat model. Implementations of this module are explicitly not intended for any context where the threat model is stronger.
A consumer using a bootstrap parser to read modules from untrusted sources is misusing it. The format makes this misuse loud — the deployment tier is declared in the lockfile, and tools should refuse to operate at higher tiers using a bootstrap-only implementation.
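The refusal described above could look something like the following sketch. It assumes that tier labels have the form "D" followed by an integer and that a tool has already read the declared tier from the lockfile as a string; the lockfile layout is not described here, and the function names are hypothetical.

```python
BOOTSTRAP_MAX_TIER = "D0"


def tier_rank(tier):
    """Map a tier label such as 'D0' or 'D1' to its integer rank."""
    if len(tier) < 2 or tier[0] != "D" or not tier[1:].isdigit():
        raise ValueError(f"unrecognized deployment tier: {tier!r}")
    return int(tier[1:])


def ensure_tier_allowed(declared_tier, implementation_max_tier=BOOTSTRAP_MAX_TIER):
    """Raise if the declared tier exceeds what the implementation supports."""
    if tier_rank(declared_tier) > tier_rank(implementation_max_tier):
        raise PermissionError(
            f"implementation is limited to {implementation_max_tier}; "
            f"lockfile declares {declared_tier}"
        )
```

Under these assumptions, ensure_tier_allowed("D0") returns quietly, while ensure_tier_allowed("D1") raises, which is the loud failure the format intends.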
Bootstrap procedure
A consumer encountering Commons Format for the first time follows this sequence:
- Clone the canonical repository containing the format-spec, bootstrap parser, parser, runner, and resolver spec modules.
- Read the format-spec module's commonsformat.md (with eyes, since no parser exists yet).
- Read this module's commonsformat.md to understand what the bootstrap parser needs to do.
- Use a code generator to produce a bootstrap parser implementation in the consumer's target language. The format spec provides the grammar; this module provides the narrowed scope.
- Verify the bootstrap parser against this module's eval suite by running the cases manually (computing expected outputs by hand and comparing) or by writing a simple test harness in the consumer's language.
- Iterate until the bootstrap parser passes this module's eval suite.
- Use the bootstrap parser to read the production parser spec module's files.
- Generate a production parser from the merged spec (format-spec + parser-spec).
- The production parser inherits the bootstrap eval suite plus the production eval suite, and must pass both. The consumer can verify this manually by running both suites through whatever testing approach they have.
- Once the production parser passes, retire the bootstrap parser. Subsequent tools (eval runner, resolver) are bootstrapped using the production parser.
The bootstrap parser is a temporary artifact. Its existence is finite and intentional.
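The "simple test harness" mentioned in the verification step can be very small. The sketch below assumes each eval case has already been loaded by hand into a (case_name, input, expected) tuple and that a parser signals rejection by raising an exception, which the harness maps to the string "rejected". The on-disk layout of evals.toml is not described here, so loading is deliberately left out, and all names are hypothetical.

```python
def run_eval_cases(parse, cases):
    """Run each case through parse and return (case_name, passed) pairs.

    cases: iterable of (case_name, input_value, expected_value) tuples.
    """
    results = []
    for case_name, input_value, expected in cases:
        try:
            actual = parse(input_value)
        except Exception:
            # Assumed convention: any failure counts as a bare rejection.
            actual = "rejected"
        results.append((case_name, actual == expected))
    return results


def all_passed(results):
    """True when every case in the results list passed."""
    return all(passed for _, passed in results)
```

The consumer would iterate on the generated parser until all_passed reports success for this module's cases, then reuse the same harness for the production parser's two suites.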
Verification
This module's eval suite verifies that a bootstrap parser can correctly read the bootstrap-relevant spec modules (format-spec, production parser, runner, resolver). Cases use real spec module content as fixtures, so a passing bootstrap parser is empirically ready for its narrow job.
A consumer generating a bootstrap parser iterates against this eval suite until conformance. The cases are deliberately few — the bootstrap parser doesn't need extensive testing because it operates on a fixed, trusted set of inputs.