module · canonical release 0.2

commonsformat-fetcher

version 0.1.2 · targets format 0.2

A fetcher for Commons Format Git-rooted dependencies. Given a spec URL and a version constraint or commit SHA, returns the module content fetched from the source repository at the resolved revision.

depends on github.com/commonsformat/commonsformat-format ^0.2.0

license MPL-2.0

verifies ./evals.toml

Premise

The fetcher takes a Git-rooted spec URL plus either a version constraint or a specific commit SHA, and returns the module content fetched from the source repository at the resolved revision.

The fetcher does not parse the fetched content (that is the parser's job). It does not resolve transitive dependencies (that is the resolver's job). It does not merge specs. It does not verify checksums against a lockfile (that is the lockfile tool's job).

The fetcher is the boundary between the local toolchain and remote repositories. Once the fetcher returns, all subsequent processing happens locally on the consumer's machine.

Interface

A fetcher is a function or component that takes:

A Git-rooted spec URL of the form host/owner/repo/path@reference, where:
- host is the Git host (github.com, gitlab.com, etc.)
- owner/repo is the repository identifier
- path is the module directory within the repository (may be empty for repository-root modules)
- reference is a tag name, branch name, or commit SHA

And returns:

The module content as raw file bytes for the files at that path in the repository at that reference, OR
A fetch error describing why the fetch failed

The returned module content is the raw bytes of commonsformat.toml, commonsformat.md, evals.toml, LICENSE, and any references/ files present at the specified path. The fetcher does not interpret these bytes — that is the parser's concern.
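
As a rough illustration of the input side of this interface, the following Python sketch splits a spec URL into the parts listed above. The names `SpecUrl` and `parse_spec_url` are hypothetical, and the choice to split on the first "@" is an assumption; the format itself may define stricter rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecUrl:
    host: str
    owner: str
    repo: str
    path: str        # "" for a repository-root module
    reference: str   # tag name, branch name, or commit SHA


def parse_spec_url(spec_url: str) -> SpecUrl:
    """Split host/owner/repo/path@reference into its parts."""
    # Assumption: the first "@" separates location from reference, so a
    # module path containing "@" is not supported by this sketch.
    location, sep, reference = spec_url.partition("@")
    if not sep or not reference:
        raise ValueError(f"missing @reference in {spec_url!r}")
    parts = location.split("/")
    if len(parts) < 3 or any(p == "" for p in parts):
        raise ValueError(f"expected host/owner/repo[/path] in {spec_url!r}")
    host, owner, repo, *rest = parts
    return SpecUrl(host, owner, repo, "/".join(rest), reference)
```

Because only the part before the "@" is split on "/", a branch name such as feature/x survives intact as the reference.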

Schema

This module ships a schema.sql declaring the shape of its data. Per §8, this is a shape commitment, not a storage commitment: the generated implementation chooses a runtime representation appropriate to its intent.

-- This DDL describes data shape, not storage. Runtime representation
-- is implementation-defined; choose what is appropriate to the target
-- language and the module's intent. The fetcher's outputs are raw
-- bytes plus resolution metadata; the tables below declare those
-- shapes. The optional cache may or may not be persisted; the cache
-- table's existence here describes shape, not storage policy.

CREATE TABLE fetch_results (
    spec_url             TEXT    NOT NULL,
    reference            TEXT    NOT NULL,
    resolved_commit_sha  TEXT    NOT NULL,
    reference_kind       TEXT    NOT NULL CHECK (reference_kind IN ('tag', 'branch', 'commit')),
    fetched_at_ms        INTEGER NOT NULL,
    PRIMARY KEY (spec_url, reference)
);

CREATE TABLE fetched_files (
    spec_url             TEXT    NOT NULL,
    reference            TEXT    NOT NULL,
    file_path            TEXT    NOT NULL,
    content              BLOB    NOT NULL,
    PRIMARY KEY (spec_url, reference, file_path),
    FOREIGN KEY (spec_url, reference) REFERENCES fetch_results (spec_url, reference)
);

CREATE TABLE fetch_errors (
    spec_url             TEXT    NOT NULL,
    reference            TEXT    NOT NULL,
    error_kind           TEXT    NOT NULL CHECK (error_kind IN ('network', 'not-found', 'reference-not-found', 'size-exceeded', 'recursion-depth-exceeded', 'authentication-required', 'other')),
    detail               TEXT,
    PRIMARY KEY (spec_url, reference)
);

CREATE TABLE cache_entries (
    spec_url             TEXT    NOT NULL,
    resolved_commit_sha  TEXT    NOT NULL,
    file_path            TEXT    NOT NULL,
    content              BLOB    NOT NULL,
    cached_at_ms         INTEGER NOT NULL,
    PRIMARY KEY (spec_url, resolved_commit_sha, file_path)
);

CREATE INDEX cache_entries_by_commit ON cache_entries (resolved_commit_sha);
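
One way to sanity-check these shapes, without committing to SQLite as a runtime store, is to load part of the DDL into an in-memory database and let the constraints reject malformed rows. The sketch below uses Python's standard sqlite3 module; `open_shape_db` and `record_fetch` are hypothetical helper names, and only two of the four tables are loaded to keep it short.

```python
import sqlite3

# Two of the tables above, copied from the schema.
DDL = """
CREATE TABLE fetch_results (
    spec_url             TEXT    NOT NULL,
    reference            TEXT    NOT NULL,
    resolved_commit_sha  TEXT    NOT NULL,
    reference_kind       TEXT    NOT NULL CHECK (reference_kind IN ('tag', 'branch', 'commit')),
    fetched_at_ms        INTEGER NOT NULL,
    PRIMARY KEY (spec_url, reference)
);
CREATE TABLE fetched_files (
    spec_url             TEXT    NOT NULL,
    reference            TEXT    NOT NULL,
    file_path            TEXT    NOT NULL,
    content              BLOB    NOT NULL,
    PRIMARY KEY (spec_url, reference, file_path),
    FOREIGN KEY (spec_url, reference) REFERENCES fetch_results (spec_url, reference)
);
"""


def open_shape_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(DDL)
    return conn


def record_fetch(conn, spec_url, reference, sha, kind, fetched_at_ms, files):
    """Insert one fetch result plus its files; roll back everything on failure."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "INSERT INTO fetch_results VALUES (?, ?, ?, ?, ?)",
            (spec_url, reference, sha, kind, fetched_at_ms),
        )
        conn.executemany(
            "INSERT INTO fetched_files VALUES (?, ?, ?, ?)",
            [(spec_url, reference, path, content) for path, content in files.items()],
        )
```

A row whose reference_kind falls outside 'tag', 'branch', or 'commit' raises sqlite3.IntegrityError, as does a fetched_files row with no matching fetch_results row.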

What the fetcher must do

resolves-tags-to-commits — when given a tag reference, the fetcher resolves it to the underlying commit SHA and reports both the tag and the SHA in its result
resolves-branches-to-commits — when given a branch reference, the fetcher resolves it to the current head commit SHA and reports both
exact-commit-fetching — when given a commit SHA directly, the fetcher retrieves content at exactly that commit
reports-resolved-revision — every successful fetch reports the resolved commit SHA, regardless of whether the input was a tag, branch, or SHA
limits-to-module-path — the fetcher returns only the files at the specified module path, not the whole repository
reports-fetch-failure — network failures, repository not found, reference not found, and other failures produce a structured fetch error rather than crashing
offline-with-cache — a fetcher implementation may cache previously fetched content; subsequent fetches of the same URL+SHA may return cached content without network access
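
The resolution and path-limiting requirements above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: `Resolution`, `resolve`, and `limit_to_module_path` are hypothetical names, the remote's tags and branches are assumed to be available as name-to-SHA maps, commit SHAs are assumed to be full 40-character hex strings, and tags are assumed to win over branches when a name exists as both (the spec above does not settle that case).

```python
import re
from dataclasses import dataclass

SHA_RE = re.compile(r"[0-9a-f]{40}")  # assumption: full SHA-1 object names only
MODULE_FILES = {"commonsformat.toml", "commonsformat.md", "evals.toml", "LICENSE"}


@dataclass(frozen=True)
class Resolution:
    reference: str            # what the caller asked for
    reference_kind: str       # 'tag', 'branch', or 'commit'
    resolved_commit_sha: str  # always reported, whatever the input kind


def resolve(reference: str, tags: dict, branches: dict) -> Resolution:
    """Resolve a reference against name -> commit SHA maps from the remote."""
    # Assumption: a tag takes precedence over a branch of the same name.
    if reference in tags:
        return Resolution(reference, "tag", tags[reference])
    if reference in branches:
        return Resolution(reference, "branch", branches[reference])
    if SHA_RE.fullmatch(reference):
        return Resolution(reference, "commit", reference)
    raise LookupError(f"reference-not-found: {reference}")


def limit_to_module_path(tree: dict, module_path: str) -> dict:
    """Keep only the module's files under module_path, keyed relative to it."""
    prefix = module_path.strip("/")
    prefix = prefix + "/" if prefix else ""
    kept = {}
    for repo_path, data in tree.items():
        if not repo_path.startswith(prefix):
            continue
        rel = repo_path[len(prefix):]
        if rel in MODULE_FILES or rel.startswith("references/"):
            kept[rel] = data
    return kept
```

Returning the same `Resolution` shape for tags, branches, and SHAs is what lets a caller record the resolved commit without caring which kind of reference was supplied.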

What the fetcher must NOT do

Parsing the fetched files. The fetcher returns raw bytes; the parser interprets them.
Resolving transitive dependencies. If a fetched module declares its own dependencies, the fetcher does not follow them. The resolver orchestrates that traversal, calling the fetcher for each dependency separately.
Verifying checksums against an external lockfile. The fetcher reports the resolved SHA; whether that matches a recorded SHA in a lockfile is the lockfile tool's concern.
Modifying fetched content. Bytes returned are bytes received.
Authenticating. Public repositories are accessed anonymously. Private repositories require authentication arranged outside the fetcher's scope (e.g., the consumer's Git credentials).

Implementation strategy

The fetcher does not implement the Git protocol. Implementations typically use one of these strategies:

Shell out to git (clone, checkout, read files). Simple and reliable, but requires git to be installed.
Use a Git library (libgit2 bindings, go-git, etc.). More complex, but allows finer control and needs no external git binary.
Use the host's HTTP API (GitHub API, GitLab API, etc.). Faster for single-file reads, but host-specific and subject to API rate limits.

Implementations choose a strategy based on the consumer's environment. The contract is the same regardless of strategy: the input is a URL plus reference, and the output is either files plus the resolved SHA or an error.
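
For the shell-out strategy, reference resolution can be done without cloning by reading the output of git ls-remote, which prints one "SHA, tab, ref name" line per ref and an extra line ending in ^{} for the commit behind an annotated tag. The Python sketch below is one possible shape; `parse_ls_remote` and `list_remote_refs` are hypothetical names, and the 30-second timeout is an arbitrary illustrative value.

```python
import subprocess


def parse_ls_remote(output: str) -> dict:
    """Map ref names to commit SHAs from `git ls-remote` output."""
    refs = {}
    peeled = {}
    for line in output.splitlines():
        sha, _, name = line.partition("\t")
        if not name:
            continue  # skip blank or malformed lines
        if name.endswith("^{}"):
            # Peeled entry: the commit an annotated tag points at.
            peeled[name[:-3]] = sha
        else:
            refs[name] = sha
    refs.update(peeled)  # prefer the commit over the tag object's own SHA
    return refs


def list_remote_refs(remote_url: str, timeout_s: float = 30.0) -> dict:
    """Run `git ls-remote` against remote_url. Requires git on PATH."""
    if remote_url.startswith("-"):
        raise ValueError("remote URL must not look like a command-line option")
    proc = subprocess.run(
        ["git", "ls-remote", remote_url],
        capture_output=True, text=True, check=True, timeout=timeout_s,
    )
    return parse_ls_remote(proc.stdout)
```

Keeping the parsing separate from the subprocess call means the parser can be exercised against fixture text, with no network access.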

Threat model

The fetcher operates on remote content that is potentially adversarial. A compromised or malicious source repository could return:

Files that do not conform to the Commons Format
Tags that move to point at different commits than they did previously
Content much larger than expected (decompression-bomb style)
References that recurse (a tag pointing at a branch pointing at a tag, etc.)

The fetcher mitigates by:

Returning raw bytes, not parsed content. Malformed content fails in the parser, which has its own threat model. The fetcher does not need to validate format.
Always reporting the resolved commit SHA. A consumer with a lockfile can detect when a tag has moved by comparing the reported SHA against the recorded one.
Bounding response size and time. A fetch attempt that exceeds configurable limits is aborted with a fetch error.
Refusing to follow non-trivial reference indirection. Chains of references (a tag pointing at a tag pointing at a tag, for example) are detected and refused beyond a small depth.
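
The size and indirection bounds above might look like the following Python sketch. `FetchError`, `read_bounded`, and `follow_reference` are hypothetical names; the error kinds reuse the error_kind values from the schema; the default depth of 2 is an illustrative assumption, not a value the spec fixes. Time limits are not shown and would normally come from timeouts on whatever transport the implementation uses.

```python
import io
from typing import BinaryIO


class FetchError(Exception):
    """Structured fetch error; `kind` is one of the schema's error_kind values."""

    def __init__(self, kind: str, detail: str = ""):
        super().__init__(f"{kind}: {detail}")
        self.kind = kind
        self.detail = detail


def read_bounded(stream: BinaryIO, max_bytes: int, chunk_size: int = 65536) -> bytes:
    """Read a stream to the end, aborting once more than max_bytes have arrived."""
    chunks, total = [], 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return b"".join(chunks)
        total += len(chunk)
        if total > max_bytes:
            # Stop reading instead of buffering an oversized response.
            raise FetchError("size-exceeded", f"more than {max_bytes} bytes")
        chunks.append(chunk)


def follow_reference(targets: dict, name: str, max_depth: int = 2) -> str:
    """Follow name -> target links; return the first name with no further link."""
    current = name
    for _ in range(max_depth + 1):
        if current not in targets:
            return current  # assumed to be a commit SHA
        current = targets[current]
    # Either a chain longer than max_depth hops or a cycle.
    raise FetchError("recursion-depth-exceeded", f"{name} exceeds depth {max_depth}")
```

Because the loop is bounded by hop count, a reference cycle is refused the same way as an overly long chain, without separate cycle detection.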

Trust in the fetched content is established by the consumer's choice of source URL plus subsequent verification (parser validation, lockfile checksum match). The fetcher itself does not try to determine trust; it transports.

Verification

This module's eval suite uses mock Git endpoints and test fixtures to verify fetcher behavior without requiring real network access during eval runs. Cases cover URL parsing, reference resolution, content retrieval, error reporting, and bounded resource use.

A consumer generating a fetcher implementation iterates against this eval suite. Implementations that satisfy the contract are interchangeable from the resolver's perspective.

describes commonsformat-fetcher 0.1.2 · targets commons format 0.2 · generated from release 0.2 · 2026-05-28