commonsformat-fetcher
A fetcher for Commons Format Git-rooted dependencies. Given a spec URL and a version constraint or commit SHA, returns the module content fetched from the source repository at the resolved revision.
Premise
The fetcher takes a Git-rooted spec URL plus either a version constraint or a specific commit SHA, and returns the module content fetched from the source repository at the resolved revision.
The fetcher does not parse the fetched content (that is the parser's job). It does not resolve transitive dependencies (that is the resolver's job). It does not merge specs. It does not verify checksums against a lockfile (that is the lockfile tool's job).
The fetcher is the boundary between the local toolchain and remote repositories. Once the fetcher returns, all subsequent processing happens locally on the consumer's machine.
Interface
A fetcher is a function or component that takes:
- A Git-rooted spec URL of the form host/owner/repo/path@reference, where:
  - host is the Git host (github.com, gitlab.com, etc.)
  - owner/repo is the repository identifier
  - path is the module directory within the repository (may be empty for repository-root modules)
  - reference is a tag name, branch name, or commit SHA
And returns:
- The module content as raw file bytes for the files at that path in the repository at that reference, OR
- A fetch error describing why the fetch failed
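The URL form described above can be split into its parts with a few lines of string handling. The following is a minimal sketch, not part of the contract: the names SpecUrl and parse_spec_url are hypothetical, and it assumes the reference is whatever follows the last "@" in the URL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecUrl:
    host: str
    owner: str
    repo: str
    path: str        # empty string for repository-root modules
    reference: str   # tag name, branch name, or commit SHA


def parse_spec_url(url: str) -> SpecUrl:
    """Split host/owner/repo/path@reference into its parts."""
    # Assumption: the reference is everything after the last "@".
    location, sep, reference = url.rpartition("@")
    if not sep or not reference:
        raise ValueError(f"missing @reference in {url!r}")
    parts = location.split("/")
    if len(parts) < 3 or not all(parts[:3]):
        raise ValueError(f"expected host/owner/repo in {url!r}")
    host, owner, repo = parts[:3]
    # Anything after owner/repo is the module directory; empty segments are dropped.
    path = "/".join(p for p in parts[3:] if p)
    return SpecUrl(host, owner, repo, path, reference)
```

For example, parse_spec_url("github.com/acme/specs/modules/fetcher@v1.2.0") yields a path of "modules/fetcher" and a reference of "v1.2.0". The repository name here is made up for illustration.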
The returned module content is the raw bytes of commonsformat.toml, commonsformat.md, evals.toml, LICENSE, and any references/ files present at the specified path. The fetcher does not interpret these bytes; that is the parser's concern.
Schema
This module ships a schema.sql declaring the shape of its data. Per §8, this is a shape commitment, not a storage commitment: the generated implementation chooses a runtime representation appropriate to its intent.
-- This DDL describes data shape, not storage. Runtime representation
-- is implementation-defined; choose what is appropriate to the target
-- language and the module's intent. The fetcher's outputs are raw
-- bytes plus resolution metadata; the tables below declare those
-- shapes. The optional cache may or may not be persisted; the cache
-- table's existence here describes shape, not storage policy.
CREATE TABLE fetch_results (
    spec_url            TEXT NOT NULL,
    reference           TEXT NOT NULL,
    resolved_commit_sha TEXT NOT NULL,
    reference_kind      TEXT NOT NULL CHECK (reference_kind IN ('tag', 'branch', 'commit')),
    fetched_at_ms       INTEGER NOT NULL,
    PRIMARY KEY (spec_url, reference)
);

CREATE TABLE fetched_files (
    spec_url  TEXT NOT NULL,
    reference TEXT NOT NULL,
    file_path TEXT NOT NULL,
    content   BLOB NOT NULL,
    PRIMARY KEY (spec_url, reference, file_path),
    FOREIGN KEY (spec_url, reference) REFERENCES fetch_results (spec_url, reference)
);

CREATE TABLE fetch_errors (
    spec_url   TEXT NOT NULL,
    reference  TEXT NOT NULL,
    error_kind TEXT NOT NULL CHECK (error_kind IN ('network', 'not-found', 'reference-not-found', 'size-exceeded', 'recursion-depth-exceeded', 'authentication-required', 'other')),
    detail     TEXT,
    PRIMARY KEY (spec_url, reference)
);

CREATE TABLE cache_entries (
    spec_url            TEXT NOT NULL,
    resolved_commit_sha TEXT NOT NULL,
    file_path           TEXT NOT NULL,
    content             BLOB NOT NULL,
    cached_at_ms        INTEGER NOT NULL,
    PRIMARY KEY (spec_url, resolved_commit_sha, file_path)
);

CREATE INDEX cache_entries_by_commit ON cache_entries (resolved_commit_sha);
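Because the DDL describes shape rather than storage, an implementation is free to hold the same data in memory. As one possible illustration, a Python implementation might mirror the fetch_results, fetched_files, and fetch_errors shapes with plain dataclasses. The class and alias names below are hypothetical, and folding the fetched_files rows into a dict on the result is an assumption of this sketch, not something the schema requires.

```python
from dataclasses import dataclass, field
from typing import Dict, Literal, Optional, Union

ReferenceKind = Literal["tag", "branch", "commit"]
ErrorKind = Literal[
    "network", "not-found", "reference-not-found", "size-exceeded",
    "recursion-depth-exceeded", "authentication-required", "other",
]


@dataclass(frozen=True)
class FetchResult:
    """Mirrors one fetch_results row plus its fetched_files rows."""
    spec_url: str
    reference: str
    resolved_commit_sha: str
    reference_kind: ReferenceKind
    fetched_at_ms: int
    # file_path -> raw bytes, one entry per fetched_files row
    files: Dict[str, bytes] = field(default_factory=dict)


@dataclass(frozen=True)
class FetchError:
    """Mirrors one fetch_errors row."""
    spec_url: str
    reference: str
    error_kind: ErrorKind
    detail: Optional[str] = None


# A fetch produces exactly one of the two shapes.
FetchOutcome = Union[FetchResult, FetchError]
```

Literal types are only checked by static type checkers, so a runtime implementation that wants the CHECK constraints enforced would need to validate the values itself.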
What the fetcher must do
- resolves-tags-to-commits: when given a tag reference, the fetcher resolves it to the underlying commit SHA and reports both the tag and the SHA in its result
- resolves-branches-to-commits: when given a branch reference, the fetcher resolves it to the current head commit SHA and reports both
- exact-commit-fetching: when given a commit SHA directly, the fetcher retrieves content at exactly that commit
- reports-resolved-revision: every successful fetch reports the resolved commit SHA, regardless of whether the input was a tag, branch, or SHA
- limits-to-module-path: the fetcher returns only the files at the specified module path, not the whole repository
- reports-fetch-failure: network failures, repository not found, reference not found, and other failures produce a structured fetch error rather than crashing
- offline-with-cache: a fetcher implementation may cache previously fetched content; subsequent fetches of the same URL+SHA may return cached content without network access
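The three resolution requirements above can be illustrated against the text that git ls-remote prints, which is one "<sha><TAB><refname>" pair per line, with an extra "^{}" entry carrying the peeled commit for annotated tags. The sketch below is only one way to do this. The function name is hypothetical, and two choices are assumptions of the sketch rather than part of the contract: tags are checked before branches, and only full 40-character hexadecimal SHAs are accepted as commits.

```python
import re
from typing import Optional, Tuple

_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def resolve_reference(ls_remote_output: str, reference: str) -> Optional[Tuple[str, str]]:
    """Return (reference_kind, commit_sha), or None if the reference is unknown."""
    # Build a refname -> sha map from the tab-separated ls-remote lines.
    refs = {}
    for line in ls_remote_output.splitlines():
        sha, _, name = line.partition("\t")
        if name:
            refs[name] = sha

    tag = f"refs/tags/{reference}"
    if tag in refs:
        # Annotated tags list the tag object first; the "^{}" entry is the commit.
        return "tag", refs.get(tag + "^{}", refs[tag])
    branch = f"refs/heads/{reference}"
    if branch in refs:
        return "branch", refs[branch]
    if _FULL_SHA.fullmatch(reference):
        # Assumption: a bare SHA is taken at face value and verified at fetch time.
        return "commit", reference
    return None
```

A None result would map to the reference-not-found error kind. The returned pair gives the caller both the kind and the resolved SHA to report alongside the original reference.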
What the fetcher must NOT do
- Parsing the fetched files. The fetcher returns raw bytes; the parser interprets them.
- Resolving transitive dependencies. If a fetched module declares its own dependencies, the fetcher does not follow them. The resolver orchestrates that traversal, calling the fetcher for each dependency separately.
- Verifying checksums against an external lockfile. The fetcher reports the resolved SHA; whether that matches a recorded SHA in a lockfile is the lockfile tool's concern.
- Modifying fetched content. Bytes returned are bytes received.
- Authenticating. Public repositories are accessed anonymously. Private repositories require authentication arranged outside the fetcher's scope (e.g., the consumer's Git credentials).
Implementation strategy
The fetcher does not implement the Git protocol. Implementations typically use one of these strategies:
- Shell out to git (clone, checkout, read files). Simple, reliable, requires git installed.
- Use a Git library (libgit2 bindings, go-git, etc.). More complex, allows finer control, no external dependency.
- Use the host's HTTP API (GitHub API, GitLab API, etc.). Faster for single-file reads, but host-specific and subject to API rate limits.
Implementations choose based on consumer environment. The contract is the same regardless of strategy: input URL+reference, output files+resolved SHA or error.
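As a rough illustration of the shell-out strategy, the sketch below reads a module's files at a given commit from a local clone using git ls-tree and git cat-file. It assumes the clone already exists at repo_dir and contains the commit. The function names, the injectable run parameter (there so the logic can be exercised without a real repository), and the MODULE_FILES set are all hypothetical choices for this example.

```python
import subprocess
from typing import Callable, Dict, List

# Top-level files named in the Interface section.
MODULE_FILES = {"commonsformat.toml", "commonsformat.md", "evals.toml", "LICENSE"}

Runner = Callable[[List[str]], bytes]


def run_git(args: List[str]) -> bytes:
    """Default runner: invoke the git executable and return its stdout."""
    return subprocess.run(["git", *args], check=True, capture_output=True).stdout


def read_module_files(repo_dir: str, commit: str, module_path: str,
                      run: Runner = run_git) -> Dict[str, bytes]:
    """Return {relative_path: raw_bytes} for the module's files at `commit`."""
    directory = module_path.strip("/")
    prefix = directory + "/" if directory else ""
    cmd = ["-C", repo_dir, "ls-tree", "-r", "--name-only", "--full-name", "-z", commit]
    if directory:
        cmd.append(directory)  # restrict the listing to the module directory
    files: Dict[str, bytes] = {}
    # With -z, ls-tree separates paths with NUL bytes.
    for raw in run(cmd).split(b"\0"):
        full = raw.decode("utf-8")
        if not full or not full.startswith(prefix):
            continue
        rel = full[len(prefix):]
        # Keep only the module's own files, never the rest of the repository.
        if rel in MODULE_FILES or rel.startswith("references/"):
            files[rel] = run(["-C", repo_dir, "cat-file", "blob", f"{commit}:{full}"])
    return files
```

Reading blobs by commit and path avoids a checkout and returns the bytes unmodified. A library-based or HTTP-based implementation would replace the runner but keep the same inputs and outputs.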
Threat model
The fetcher operates on remote content that is potentially adversarial. A compromised or malicious source repository could return:
- Files that don't conform to Commons Format
- Tags that move to point at different commits than they did previously
- Content much larger than expected (decompression-bomb style)
- References that chain through several levels of indirection (for example, an annotated tag pointing at another tag, which points at yet another tag)
The fetcher mitigates by:
- Returning raw bytes, not parsed content. Malformed content fails in the parser, which has its own threat model. The fetcher does not need to validate format.
- Always reporting the resolved commit SHA. A consumer with a lockfile can detect when a tag has moved by comparing the reported SHA against the recorded one.
- Bounding response size and time. A fetch attempt that exceeds configurable limits is aborted with a fetch error.
- Refusing to follow non-trivial reference indirection. Tags pointing at tags pointing at tags should be detected and refused beyond a small depth.
Trust in the fetched content is established by the consumer's choice of source URL plus subsequent verification (parser validation, lockfile checksum match). The fetcher itself does not try to determine trust; it transports.
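The size bound among the mitigations above can be enforced while reading, so an oversized response is abandoned before it is fully buffered. The following is a minimal sketch under stated assumptions: read_bounded and SizeExceeded are hypothetical names, the limit applies to the bytes as received from the stream, and time limits are left out.

```python
import io
from typing import BinaryIO


class SizeExceeded(Exception):
    """Hypothetical error; would map to the 'size-exceeded' error kind."""


def read_bounded(stream: BinaryIO, max_bytes: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read the whole stream, aborting once more than max_bytes have arrived."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return b"".join(chunks)
        total += len(chunk)
        if total > max_bytes:
            # Stop reading immediately instead of buffering the remainder.
            raise SizeExceeded(f"response exceeded {max_bytes} bytes")
        chunks.append(chunk)


# Usage: a 10-byte payload passes a 16-byte limit.
data = read_bounded(io.BytesIO(b"x" * 10), max_bytes=16)
```

Reading in chunks means at most one chunk beyond the limit is ever held in memory. If the transport decompresses content, the limit would need to apply to the decompressed stream for it to help against the decompression-bomb case.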
Verification
This module's eval suite uses mock Git endpoints and test fixtures to verify fetcher behavior without requiring real network access during eval runs. Cases cover URL parsing, reference resolution, content retrieval, error reporting, and bounded resource use.
A consumer generating a fetcher implementation iterates against this eval suite. Implementations that satisfy the contract are interchangeable from the resolver's perspective.