module · canonical release 0.2

commonsformat-parser-bootstrap

version 0.1.1 · targets format 0.2 · D0 only

A minimal parser for Commons Format modules, sized for bootstrapping. Reads enough of the format to load the production parser, runner, and resolver spec modules. Intended for D0 deployment only — local use during initial toolchain bootstrap. The production parser (commonsformat-parser) supersedes this module once available.

depends on github.com/commonsformat/commonsformat-format ^0.2.0

license MPL-2.0

verifies ./evals.toml

deployment D0 only

Premise

The bootstrap parser exists for one job: read enough of the Commons Format to load the production parser, eval runner, and resolver spec modules so that those tools can be generated and verified.

It is not a production parser. It does not need to handle every TOML edge case, every Markdown construct, every adversarial input, or any deployment context other than D0. It exists to break the chicken-and-egg problem of self-hosting and is retired once the production parser is verified.

The bootstrap parser is the only Commons Format module that gets hand-shepherded into existence. Its first instance is generated by a human reading the format-spec prose and prompting a code generator, then iterating against this module's small eval suite until it conforms. After that first instance exists, it is used mechanically to read the production parser spec and generate a production parser, at which point the bootstrap parser retires.

Deployment tier

This module is D0 only. It is intended for local use during toolchain bootstrap. Implementations of this module must not be deployed in any context where they process input from untrusted sources, run as part of CI, or otherwise face adversarial input.

A consumer who needs a parser for production use generates an implementation from commonsformat-parser (the production module), which targets D1 or higher.

Interface

A bootstrap parser is a function or component that takes a path to a module directory on disk and returns the logical contents specified in format-spec §10, sufficient to support reading the production parser, runner, and resolver spec modules.

The bootstrap parser handles a deliberately narrow subset of the format. It is not required to handle every construct the format permits — only those that appear in the spec modules it must read during bootstrap.

Schema

This module ships a schema.sql declaring the shape of its data. Per §8 this is a shape commitment, not a storage commitment — the generated implementation chooses a runtime representation appropriate to its intent.

-- This DDL describes data shape, not storage. Runtime representation
-- is implementation-defined; choose what is appropriate to the target
-- language and the module's intent. The bootstrap parser is a pure
-- function from a module directory to a partial Module structure;
-- the tables below are the shape of its output, not a database.
--
-- This schema is a deliberate subset of commonsformat-parser's schema.
-- The bootstrap parser handles only the fields needed during bootstrap
-- (metadata, prose with tagged sections, evals, path), as documented in
-- the module's <constraints>.

CREATE TABLE modules (
    path                 TEXT    NOT NULL PRIMARY KEY,
    name                 TEXT    NOT NULL,
    version              TEXT    NOT NULL,
    commonsformat        TEXT    NOT NULL,
    has_eval_suite       BOOLEAN NOT NULL
);

CREATE TABLE tagged_sections (
    module_path          TEXT    NOT NULL,
    tag_name             TEXT    NOT NULL,
    ordinal              INTEGER NOT NULL,
    content              TEXT    NOT NULL,
    PRIMARY KEY (module_path, tag_name, ordinal),
    FOREIGN KEY (module_path) REFERENCES modules (path)
);

CREATE TABLE named_constraints (
    module_path          TEXT    NOT NULL,
    constraint_name      TEXT    NOT NULL,
    description          TEXT    NOT NULL,
    PRIMARY KEY (module_path, constraint_name),
    FOREIGN KEY (module_path) REFERENCES modules (path)
);

CREATE TABLE eval_cases (
    module_path          TEXT    NOT NULL,
    case_name            TEXT    NOT NULL,
    case_class           TEXT    NOT NULL CHECK (case_class IN ('functional', 'adversarial', 'generator_adversary')),
    category             TEXT    NOT NULL,
    input_blob           BLOB    NOT NULL,
    expect_blob          BLOB    NOT NULL,
    PRIMARY KEY (module_path, case_name),
    FOREIGN KEY (module_path) REFERENCES modules (path)
);

CREATE TABLE parse_violations (
    module_path          TEXT    NOT NULL,
    ordinal              INTEGER NOT NULL,
    kind                 TEXT    NOT NULL,
    detail               TEXT    NOT NULL,
    PRIMARY KEY (module_path, ordinal)
);
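
Because the DDL commits to shape rather than storage, an implementation might hold the same data as plain in-memory records. The Python sketch below mirrors three of the tables as dataclasses. The class names are hypothetical, and nesting sections and cases inside the module record (instead of carrying a `module_path` column) is one possible design choice, not something the schema requires.

```python
from dataclasses import dataclass, field

# Allowed values, copied from the CHECK constraint on eval_cases.case_class.
CASE_CLASSES = ("functional", "adversarial", "generator_adversary")


@dataclass(frozen=True)
class TaggedSection:
    tag_name: str
    ordinal: int
    content: str


@dataclass(frozen=True)
class EvalCase:
    case_name: str
    case_class: str
    category: str
    input_blob: bytes
    expect_blob: bytes

    def __post_init__(self) -> None:
        # Mirror the CHECK constraint at construction time.
        if self.case_class not in CASE_CLASSES:
            raise ValueError(f"invalid case_class: {self.case_class!r}")


@dataclass
class Module:
    path: str
    name: str
    version: str
    commonsformat: str
    has_eval_suite: bool
    # Child rows are nested here, so the foreign key becomes containment.
    tagged_sections: list[TaggedSection] = field(default_factory=list)
    eval_cases: list[EvalCase] = field(default_factory=list)
```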

What the bootstrap parser must handle

read-format-spec-module — parses the format-spec module (commonsformat.toml, commonsformat.md, evals.toml, LICENSE) successfully
read-parser-spec-module — parses the production parser spec module successfully
read-runner-spec-module — parses the eval runner spec module successfully
read-resolver-spec-module — parses the resolver spec module successfully
extract-required-tags — extracts the tagged sections (<intent>, <constraints>, <avoid>, <interface>, <threat-model>) used by the bootstrap-relevant spec modules
handle-utf8 — input files in UTF-8 are accepted; non-UTF-8 input is rejected
produce-logical-contents — produces the metadata, prose, constraints, evals, and path fields per format-spec §10
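
Two of the constraints above, handle-utf8 and extract-required-tags, can be sketched in a few lines of Python. This is an illustration under assumptions, not the format's grammar: it assumes tagged sections are written as plain `<tag>` and `</tag>` pairs without attributes or nesting, and it assumes the ordinal counts repeated occurrences of the same tag from zero. The function names are hypothetical.

```python
import re

REQUIRED_TAGS = ("intent", "constraints", "avoid", "interface", "threat-model")

# Matches <tag>...</tag> for the listed tags, non-greedy, across lines.
_TAG_RE = re.compile(
    r"<(?P<tag>" + "|".join(re.escape(t) for t in REQUIRED_TAGS) + r")>"
    r"(?P<body>.*?)</(?P=tag)>",
    re.DOTALL,
)


def decode_utf8(raw: bytes) -> str:
    """Accept UTF-8 input; raise UnicodeDecodeError for anything else."""
    return raw.decode("utf-8", errors="strict")


def extract_tagged_sections(prose: str) -> list[tuple[str, int, str]]:
    """Return (tag_name, ordinal, content) triples in document order."""
    counts: dict[str, int] = {}
    sections = []
    for match in _TAG_RE.finditer(prose):
        tag = match.group("tag")
        ordinal = counts.get(tag, 0)  # assumed: per-tag, zero-based
        counts[tag] = ordinal + 1
        sections.append((tag, ordinal, match.group("body").strip()))
    return sections
```

A regular expression is enough here only because the inputs are a fixed, trusted set of spec modules; it makes no attempt to stay bounded on hostile input.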

What the bootstrap parser is NOT required to handle

Adversarial input. The bootstrap parser is not required to remain bounded under hostile input, to detect malicious modules, or to resist denial-of-service. It runs locally on trusted seed material.
Performance. Slow parsing is acceptable for bootstrap. The production parser handles performance-sensitive cases.
Edge cases of the TOML or Markdown subsets that don't appear in the bootstrap-relevant spec modules. If a future spec module uses a TOML construct the bootstrap parser doesn't recognize, that's fine — the bootstrap parser doesn't need to read it. The production parser will.
Comprehensive error reporting. Returning "rejected" without detailed diagnostics is acceptable at D0.
Streaming, incremental parsing, or any optimization concern.

Threat model

The bootstrap parser runs on the consumer's local machine, processing seed material the consumer chose to clone (the format-spec, parser, runner, and resolver spec modules from a trusted source). The threat model is "the consumer might have made a mistake," not "an adversary fed me input."

This is the D0 threat model. Implementations of this module are explicitly not intended for any context where the threat model is stronger.

A consumer using a bootstrap parser to read modules from untrusted sources is misusing it. The format makes this misuse loud: the deployment tier is declared in the lockfile, and tools should refuse to operate at any tier above D0 when the only parser available is a bootstrap implementation.
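
The refusal described above might be enforced with a guard like the following Python sketch. It assumes tier labels have the form `D` followed by an integer and that larger numbers mean stronger tiers, which is consistent with "D0" and "D1 or higher" in this document. How the declared tier is read from the lockfile is out of scope, and the function names are hypothetical.

```python
import re

BOOTSTRAP_MAX_TIER = 0  # this module is D0 only


def tier_level(tier: str) -> int:
    """Convert a tier label such as 'D0' or 'D2' to its numeric level."""
    match = re.fullmatch(r"D(\d+)", tier)
    if match is None:
        raise ValueError(f"unrecognized deployment tier: {tier!r}")
    return int(match.group(1))


def require_tier_supported(declared_tier: str,
                           max_tier: int = BOOTSTRAP_MAX_TIER) -> None:
    """Refuse to proceed when the declared tier exceeds what the parser supports."""
    if tier_level(declared_tier) > max_tier:
        raise PermissionError(
            f"tier {declared_tier} requires a production parser; "
            f"this implementation supports up to D{max_tier}"
        )
```

Calling `require_tier_supported("D0")` returns silently, while `require_tier_supported("D1")` raises, so the misuse fails before any parsing starts.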

Bootstrap procedure

A consumer encountering Commons Format for the first time follows this sequence:

Clone the canonical repository containing the format-spec, bootstrap parser, parser, runner, and resolver spec modules.
Read the format-spec module's commonsformat.md (with eyes, since no parser exists yet).
Read this module's commonsformat.md to understand what the bootstrap parser needs to do.
Use a code generator to produce a bootstrap parser implementation in the consumer's target language. The format spec provides the grammar; this module provides the narrowed scope.
Verify the bootstrap parser against this module's eval suite by running the cases manually (computing expected outputs by hand and comparing) or by writing a simple test harness in the consumer's language.
Iterate until the bootstrap parser passes this module's eval suite.
Use the bootstrap parser to read the production parser spec module's files.
Generate a production parser from the merged spec (format-spec + parser-spec).
The production parser inherits the bootstrap eval suite in addition to the production eval suite and must pass both. The consumer can verify this manually by running both suites through whatever testing approach they have.
Once the production parser passes, retire the bootstrap parser. Subsequent tools (eval runner, resolver) are bootstrapped using the production parser.
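
The "simple test harness" mentioned in the verification step can be very small. The Python sketch below is one hypothetical shape: the `Case` fields borrow their names from the `eval_cases` table, but comparing `expect_blob` by byte equality is a placeholder assumption, since the real encoding and comparison rules come from the format spec and this module's evals.toml.

```python
from typing import Callable, Iterable, NamedTuple


class Case(NamedTuple):
    case_name: str
    input_blob: bytes
    expect_blob: bytes


def run_cases(parse: Callable[[bytes], bytes],
              cases: Iterable[Case]) -> list[str]:
    """Run every case and return the names of the cases that failed."""
    failed = []
    for case in cases:
        try:
            actual = parse(case.input_blob)
        except Exception:
            # At D0 a crash simply counts as a failed case.
            actual = None
        if actual != case.expect_blob:
            failed.append(case.case_name)
    return failed
```

Iterating on the generated parser then means rerunning `run_cases` until it returns an empty list.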

The bootstrap parser is a temporary artifact. Its existence is finite and intentional.

Verification

This module's eval suite verifies that a bootstrap parser can correctly read the bootstrap-relevant spec modules (format-spec, production parser, runner, resolver). Cases use real spec module content as fixtures, so a passing bootstrap parser is empirically ready for its narrow job.

A consumer generating a bootstrap parser iterates against this eval suite until conformance. The cases are deliberately few — the bootstrap parser doesn't need extensive testing because it operates on a fixed, trusted set of inputs.

describes commonsformat-parser-bootstrap 0.1.1 · targets commons format 0.2 · generated from release 0.2 · 2026-05-28