module · canonical release 0.2

commonsformat-parser

version 0.1.2 · targets format 0.2

A parser for Commons Format modules. Reads a module directory from disk and produces a structured representation of its contents. Verifies that the module conforms to the format's structural and encoding requirements.

depends on github.com/commonsformat/commonsformat-format ^0.2.0
license MPL-2.0
verifies ./evals.toml

Premise

The parser reads a module directory from disk and produces a structured representation of the module's contents. The structured representation contains the parsed metadata, the parsed prose with extracted tagged sections, and (if present) the parsed eval suite.

The parser does not resolve dependencies, does not fetch from Git, does not verify lockfiles, and does not run evals. Those are separate concerns handled by separate tools, each specified by its own module.

This module describes what a conformant parser must do. It does not prescribe how a parser is implemented, what language it is written in, what its calling conventions are, or how it reports errors. Those are decisions consumers make when generating their parser.

Interface

A parser is a function or component that takes a path to a module directory on disk and returns a Module structure containing:

  • The parsed metadata from commonsformat.toml
  • The parsed prose from commonsformat.md, with tagged sections extracted
  • The parsed eval suite from evals.toml if present
  • The parsed metadata from any auxiliary files referenced from commonsformat.toml

If the module directory does not contain a valid Commons Format module per the format specification, the parser reports the violation and does not return a Module structure. The format of the violation report is the consumer's choice; the violations themselves are determined by the format.
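
One way to picture this contract is the minimal Python sketch below. The names `Module`, `Violation`, `ParseError`, and `parse_module`, the `not-a-directory` violation kind, and the choice to report violations by raising are all illustrative assumptions; the module leaves calling conventions and error reporting to the consumer. The `Module` fields are a small subset of the shape declared in the schema, and the body stands in for real parsing.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass(frozen=True)
class Violation:
    """One reported violation; fields mirror a subset of parse_violations."""
    kind: str                      # hypothetical kind label, not defined by the format
    detail: str
    file_path: Optional[str] = None


class ParseError(Exception):
    """Raised instead of returning a Module when the input is not valid."""

    def __init__(self, violations: List[Violation]):
        super().__init__(f"{len(violations)} violation(s)")
        self.violations = list(violations)


@dataclass(frozen=True)
class Module:
    """Abbreviated, hypothetical Module structure."""
    path: str
    has_eval_suite: bool
    has_schema: bool


def parse_module(module_dir: str) -> Module:
    """Return a Module for a valid directory, or raise ParseError."""
    root = Path(module_dir)
    if not root.is_dir():
        raise ParseError(
            [Violation("not-a-directory", f"{module_dir} is not a directory")]
        )
    # A real parser would validate and parse every file here, per the format.
    return Module(
        path=str(root),
        has_eval_suite=(root / "evals.toml").is_file(),
        has_schema=(root / "schema.sql").is_file(),
    )
```

The property this sketch tries to capture is that a caller receives either a Module or a violation report, never a partially populated Module.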

Schema

This module ships a schema.sql declaring the shape of its data. Per §8 this is a shape commitment, not a storage commitment: the generated implementation chooses a runtime representation appropriate to its intent.

-- This DDL describes data shape, not storage. Runtime representation
-- is implementation-defined; choose what is appropriate to the target
-- language and the module's intent. The parser is a pure function
-- from a module directory to a Module structure; the tables below are
-- the shape of that structure, not a database the parser maintains.

CREATE TABLE modules (
    path                 TEXT    NOT NULL PRIMARY KEY,
    name                 TEXT    NOT NULL,
    version              TEXT    NOT NULL,
    description          TEXT    NOT NULL,
    license              TEXT    NOT NULL,
    commonsformat        TEXT    NOT NULL,
    has_eval_suite       BOOLEAN NOT NULL,
    has_schema           BOOLEAN NOT NULL
);

CREATE TABLE module_authors (
    module_path          TEXT    NOT NULL,
    ordinal              INTEGER NOT NULL,
    author_name          TEXT    NOT NULL,
    email                TEXT,
    url                  TEXT,
    PRIMARY KEY (module_path, ordinal),
    FOREIGN KEY (module_path) REFERENCES modules (path)
);

CREATE TABLE module_dependencies (
    module_path          TEXT    NOT NULL,
    ordinal              INTEGER NOT NULL,
    spec_url             TEXT    NOT NULL,
    version_constraint   TEXT,
    git_ref              TEXT,
    commit_sha           TEXT,
    edge_kind            TEXT    NOT NULL CHECK (edge_kind IN ('depends_on', 'extends')),
    PRIMARY KEY (module_path, ordinal),
    FOREIGN KEY (module_path) REFERENCES modules (path)
);

CREATE TABLE tagged_sections (
    module_path          TEXT    NOT NULL,
    tag_name             TEXT    NOT NULL,
    ordinal              INTEGER NOT NULL,
    content              TEXT    NOT NULL,
    PRIMARY KEY (module_path, tag_name, ordinal),
    FOREIGN KEY (module_path) REFERENCES modules (path)
);

CREATE TABLE named_constraints (
    module_path          TEXT    NOT NULL,
    constraint_name      TEXT    NOT NULL,
    description          TEXT    NOT NULL,
    PRIMARY KEY (module_path, constraint_name),
    FOREIGN KEY (module_path) REFERENCES modules (path)
);

CREATE TABLE module_examples (
    module_path          TEXT    NOT NULL,
    example_name         TEXT    NOT NULL,
    content              TEXT    NOT NULL,
    PRIMARY KEY (module_path, example_name),
    FOREIGN KEY (module_path) REFERENCES modules (path)
);

CREATE TABLE eval_cases (
    module_path          TEXT    NOT NULL,
    case_name            TEXT    NOT NULL,
    case_class           TEXT    NOT NULL CHECK (case_class IN ('functional', 'adversarial', 'generator_adversary')),
    category             TEXT    NOT NULL,
    description          TEXT    NOT NULL,
    input_blob           BLOB    NOT NULL,
    expect_blob          BLOB    NOT NULL,
    severity             TEXT    NOT NULL DEFAULT 'error' CHECK (severity IN ('info', 'warn', 'error', 'critical')),
    PRIMARY KEY (module_path, case_name),
    FOREIGN KEY (module_path) REFERENCES modules (path)
);

CREATE TABLE eval_case_verifies (
    module_path          TEXT    NOT NULL,
    case_name            TEXT    NOT NULL,
    constraint_name      TEXT    NOT NULL,
    PRIMARY KEY (module_path, case_name, constraint_name)
);

CREATE TABLE eval_properties (
    module_path          TEXT    NOT NULL,
    property_name        TEXT    NOT NULL,
    property_value       TEXT    NOT NULL,
    PRIMARY KEY (module_path, property_name),
    FOREIGN KEY (module_path) REFERENCES modules (path)
);

CREATE TABLE schema_tables (
    module_path          TEXT    NOT NULL,
    table_name           TEXT    NOT NULL,
    PRIMARY KEY (module_path, table_name),
    FOREIGN KEY (module_path) REFERENCES modules (path)
);

CREATE TABLE schema_columns (
    module_path          TEXT    NOT NULL,
    table_name           TEXT    NOT NULL,
    column_name          TEXT    NOT NULL,
    column_type          TEXT    NOT NULL CHECK (column_type IN ('INTEGER', 'REAL', 'TEXT', 'BLOB', 'TIMESTAMP', 'BOOLEAN')),
    ordinal              INTEGER NOT NULL,
    is_not_null          BOOLEAN NOT NULL DEFAULT FALSE,
    is_in_primary_key    BOOLEAN NOT NULL DEFAULT FALSE,
    default_literal      TEXT,
    PRIMARY KEY (module_path, table_name, column_name),
    FOREIGN KEY (module_path, table_name) REFERENCES schema_tables (module_path, table_name)
);

CREATE TABLE parse_violations (
    module_path          TEXT    NOT NULL,
    ordinal              INTEGER NOT NULL,
    kind                 TEXT    NOT NULL,
    detail               TEXT    NOT NULL,
    file_path            TEXT,
    line_number          INTEGER,
    PRIMARY KEY (module_path, ordinal)
);
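
Because the DDL commits to shape rather than storage, an implementation might carry each row shape as a plain in-memory record. The Python sketch below does this for module_dependencies, expressing the edge_kind CHECK as a constructor check. The class name `ModuleDependency`, the use of a dataclass, and the example `module_path` value are assumptions made for illustration; the `spec_url` and version constraint are taken from this module's own dependency line.

```python
from dataclasses import dataclass
from typing import Optional

# Mirrors the CHECK constraint on module_dependencies.edge_kind.
EDGE_KINDS = frozenset({"depends_on", "extends"})


@dataclass(frozen=True)
class ModuleDependency:
    """In-memory counterpart of one module_dependencies row (hypothetical name)."""
    module_path: str
    ordinal: int
    spec_url: str
    edge_kind: str
    version_constraint: Optional[str] = None   # nullable in the DDL
    git_ref: Optional[str] = None              # nullable in the DDL
    commit_sha: Optional[str] = None           # nullable in the DDL

    def __post_init__(self) -> None:
        if self.edge_kind not in EDGE_KINDS:
            raise ValueError(f"edge_kind must be one of {sorted(EDGE_KINDS)}")


dep = ModuleDependency(
    module_path="./commonsformat-parser",      # illustrative path
    ordinal=0,
    spec_url="github.com/commonsformat/commonsformat-format",
    edge_kind="depends_on",
    version_constraint="^0.2.0",
)
```

The composite primary key (module_path, ordinal) is not enforced here; in this representation it would be a property of whatever collection holds the records.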

What the parser must accept

  • accepts-conformant-modules: any module directory that conforms to the format specification produces a Module structure containing all fields the format defines
  • preserves-tagged-content: tagged section content is extracted verbatim, preserving whitespace and Markdown formatting inside the tags
  • preserves-prose: prose outside tagged sections is preserved, either as parsed Markdown structure or as raw text; both are conformant
  • handles-utf8: input files in UTF-8 are accepted; non-UTF-8 input is rejected with an encoding violation
  • handles-line-endings: LF and CRLF line endings are both accepted; the internal representation is normalized to LF
  • deterministic: parsing the same module twice produces equivalent Module structures
  • offline: parsing does not require network access
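
The handles-utf8 and handles-line-endings requirements can be sketched together in Python as follows. The function name `decode_module_text` and the use of `ValueError` to signal an encoding violation are assumptions. The sketch also does not address a byte-order mark or a lone CR, since this module does not say how those are treated.

```python
def decode_module_text(raw: bytes) -> str:
    """Decode strictly as UTF-8, then normalize CRLF line endings to LF."""
    try:
        # errors="strict" makes any invalid byte sequence raise.
        text = raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        # Hypothetical reporting style: surface the offending byte offset.
        raise ValueError(f"encoding violation at byte {exc.start}") from exc
    return text.replace("\r\n", "\n")
```

Decoding before normalizing means the CRLF replacement operates on characters rather than raw bytes, so it cannot split a multi-byte sequence.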

What the parser must reject

  • rejects-missing-required-files: a directory missing any of commonsformat.toml, commonsformat.md, or LICENSE is rejected
  • rejects-malformed-toml: TOML files violating the format's TOML subset are rejected
  • rejects-malformed-markdown: Markdown files violating the format's Markdown subset are rejected. This applies where the subset is restrictive; the parser need not reject CommonMark constructs that the subset allows a parser to ignore
  • rejects-malformed-frontmatter: a commonsformat.toml with missing required fields or invalid field values is rejected
  • rejects-malformed-tags: tagged sections that are unclosed, improperly nested, or in violation of uniqueness rules are rejected
  • rejects-non-utf8: files containing non-UTF-8 bytes are rejected
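
As an illustration of the first rule above, a required-files check might look like the Python sketch below. The helper name `missing_required_files` and its return convention (a list of missing names, empty when all are present) are assumptions; the file names come from the rule itself.

```python
from pathlib import Path
from typing import List

# The three files named by rejects-missing-required-files.
REQUIRED_FILES = ("commonsformat.toml", "commonsformat.md", "LICENSE")


def missing_required_files(module_dir: str) -> List[str]:
    """Return the required file names that are absent from module_dir."""
    root = Path(module_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

A parser built this way would reject the directory whenever the returned list is non-empty, reporting one violation per missing file.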

Anti-patterns

  • Performing dependency resolution. The parser handles a single module in isolation.
  • Fetching from network. The parser operates on local filesystem only.
  • Executing eval cases. The parser parses the eval suite into a structured representation; running cases is the eval runner's job.
  • Auto-correcting malformed input. If the input violates the format, the parser rejects it. It does not silently fix things.
  • Tolerating non-subset TOML or Markdown features. The parser enforces the subsets defined in the format spec.

Module structure produced

The Module structure produced by the parser contains the logical contents specified in format-spec §10 (Logical Module Structure): metadata, prose with extracted tagged sections, constraints, interface, avoid, threat_model, examples, evals (when verifies is declared), and path.

Concrete representation is at the implementation's discretion. Two conformant parsers may use different in-memory data structures or field names; what matters is that the logical contents specified by the format are accessible.

Threat model

The parser operates on potentially adversarial input. A malicious module on disk should not cause the parser to:

  • Crash unexpectedly (controlled rejection is fine; uncontrolled failure is not)
  • Consume unbounded memory or time (the parser must enforce resource bounds)
  • Execute arbitrary code (parser does not interpret content as code)
  • Access network or filesystem outside the module directory
  • Leak information through timing channels in security-relevant comparisons (e.g., when checking checksums or signatures, though those are not the parser's primary responsibility)

The parser is the first line of defense against malformed or hostile modules. Subsequent tools (resolver, eval runner) trust that the parser has validated the input's structural correctness.
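
Two of the protections listed above, staying inside the module directory and bounding memory use, can be sketched in Python as follows. The helper name `read_within_module`, the exception types, and the 1,000,000-byte default are assumptions; this module does not specify a particular limit.

```python
from pathlib import Path

MAX_FILE_BYTES = 1_000_000  # illustrative limit; the module does not specify one


def read_within_module(module_dir: str, relative_name: str,
                       max_bytes: int = MAX_FILE_BYTES) -> bytes:
    """Read a file only if it resolves inside module_dir and fits the limit."""
    root = Path(module_dir).resolve()
    target = (root / relative_name).resolve()
    # resolve() follows symlinks and collapses "..", so an escape shows up
    # as a path that no longer has the module root among its ancestors.
    if root not in target.parents:
        raise PermissionError(
            f"{relative_name} resolves outside the module directory"
        )
    with open(target, "rb") as handle:
        # Read one byte past the limit so oversized files are detected
        # without loading them in full.
        data = handle.read(max_bytes + 1)
    if len(data) > max_bytes:
        raise ValueError(f"{relative_name} exceeds {max_bytes} bytes")
    return data
```

This sketch checks the path and then opens it, which leaves a window in which the filesystem could change; an implementation facing hostile local users would likely need a stricter approach.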

Verification

This module's eval suite (evals.toml) defines what conformance means for a parser implementation. A parser is conformant when it passes every case in the merged eval suite: this module's evals plus the eval cases inherited from the format-spec module through the dependency.

The format-spec module already contains extensive parsing-related eval cases (TOML subset acceptance and rejection, Markdown subset parsing, tagged section extraction, module loading, encoding handling). This module's evals add interface-level cases that test the parser as a callable component, separately from the format content it processes.

Consumers generating a parser implementation iterate against the merged eval suite until conformance is achieved. The first generation rarely passes everything; the iteration is normal and expected.

describes commonsformat-parser 0.1.2 · targets commons format 0.2 · generated from release 0.2 · 2026-05-28