module · canonical release 0.2

commonsformat-orchestrator

version 0.1.1 · targets format 0.2

Orchestrates the generation-verification loop for Commons Format implementations. Takes a merged spec, invokes a code generator, runs the eval suite against the candidate, and iterates with refined prompts until conformance is achieved or budget is exhausted. Renamed from commonsformat-generator-harness in v0.1.2 to reflect its expanded role owning the iteration loop.

depends on github.com/commonsformat/commonsformat-format ^0.2.0

github.com/commonsformat/commonsformat-eval-runner ^0.1.1

license MPL-2.0

verifies ./evals.toml

Premise

The orchestrator is the workflow component that ties code generation to verification. It takes a merged virtual spec, prompts a generator to produce a candidate implementation, runs the spec's eval suite against that candidate via the eval runner, and — when cases fail — refines its prompt with the failures and regenerates, until either all cases pass or a configured budget is exhausted.

This module was named commonsformat-generator-harness in v0.1.0 and v0.1.1. It is renamed commonsformat-orchestrator in v0.1.2 because the old name described only one step of what the workflow needs. Without an iteration owner, the bootstrap procedure described a loop nobody implemented; the orchestrator closes that gap.

The orchestrator does not choose which generator to use (consumers configure that), does not implement the eval runner (it depends on that module), and does not enforce deployment tier policy (the verifier does that). It owns the workflow that turns a spec into a candidate that has been tested and refined to conformance.

Interface

An orchestrator is a function or component that takes:

A merged virtual spec (produced by a resolver applying the merge algorithm per format-spec §11.4)
An orchestration configuration including:
- Target language identifier
- Generator selection (which model, service, or process to invoke)
- Generation parameters (temperature, token budgets, deterministic vs sampled — generator-specific)
- Eval runner configuration (how to invoke the runner against candidates)
- Iteration budget (maximum attempts, total time, total token spend, or any combination)
- Optional context augmentations (additional examples, style preferences, hand-written code seeds — consumer-supplied)

And produces one of:

A verified implementation candidate that passes the merged spec's eval suite, along with metadata about the generation history (number of iterations, generator invocations, parameter variations, intermediate failures)
A budget-exhausted result indicating that no candidate satisfied the eval suite within the configured budget, including the closest candidate and the cases it failed
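
The inputs and outputs listed above could be modeled roughly as follows. This is a minimal sketch and not the module's actual types: every name here (Budget, OrchestrationConfig, Attempt, VerifiedCandidate, BudgetExhausted) is hypothetical, and the field types are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional, Union


@dataclass(frozen=True)
class Budget:
    """Iteration budget; any combination of limits may be set (None = no limit)."""
    max_attempts: Optional[int] = None
    max_seconds: Optional[float] = None
    max_tokens: Optional[int] = None

    def __post_init__(self) -> None:
        # Assumption: require at least one limit so the loop always terminates.
        if (self.max_attempts is None and self.max_seconds is None
                and self.max_tokens is None):
            raise ValueError("budget must set at least one limit")


@dataclass(frozen=True)
class OrchestrationConfig:
    target_language: str
    generator: str                       # which model, service, or process
    generation_params: dict[str, Any]    # generator-specific
    eval_runner: dict[str, Any]          # how to invoke the runner
    budget: Budget
    context_augmentations: tuple[str, ...] = ()  # optional, consumer-supplied


@dataclass(frozen=True)
class Attempt:
    index: int
    params: dict[str, Any]
    failed_cases: tuple[str, ...]


@dataclass(frozen=True)
class VerifiedCandidate:
    code: str
    history: tuple[Attempt, ...]


@dataclass(frozen=True)
class BudgetExhausted:
    closest_code: Optional[str]
    closest_failed_cases: tuple[str, ...]
    history: tuple[Attempt, ...]


# The orchestrator produces exactly one of the two result shapes.
OrchestrationResult = Union[VerifiedCandidate, BudgetExhausted]
```

Modeling the two outcomes as separate types keeps the budget-exhausted case a structured result rather than an exception or an empty value.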

What the orchestrator must do

own-iteration-loop — when a candidate fails the eval suite, the orchestrator decides whether to retry, what to change about the prompt, and when to give up; it does not return failed candidates for an external loop to handle
pass-merged-spec-to-generator — each generation invocation receives the merged spec as context, with all sections (intent, constraints, avoid, interface, threat-model, examples) included
include-failure-context-on-retry — when a candidate fails, the failing eval cases are included in the next prompt as context, so the generator can see what to fix
record-full-history — the result includes every generation attempt along with what eval cases each attempt failed, allowing later audits to reconstruct what was tried
record-spec-hash — the result records a hash of the merged spec the iteration was performed against, allowing later verification that the candidate matches a specific spec state
enforce-budget — iteration stops when the configured budget (attempts, time, tokens) is exhausted; the orchestrator does not silently exceed limits
separation-from-verification — the orchestrator's "candidate passes the eval suite" is a workflow result, not a deployment-tier verification; the verifier reads orchestrator output along with lockfile state to gate deployment
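
Several of the requirements above (owning the loop, passing the merged spec, feeding failures back, recording history and the spec hash, enforcing the budget) can be seen together in a small loop. The sketch below is illustrative only: `generate` and `run_evals` are hypothetical callables, the prompt layout is invented, and hashing a canonical JSON encoding with SHA-256 is an assumption, since this document does not say how the spec hash is computed.

```python
import hashlib
import json
import time


def spec_hash(merged_spec):
    """SHA-256 of a canonical JSON encoding (the encoding is an assumption)."""
    canonical = json.dumps(merged_spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_prompt(merged_spec, failed_cases):
    """Every prompt carries the whole merged spec; retries append the failures."""
    prompt = "SPEC:\n" + json.dumps(merged_spec, sort_keys=True, indent=2)
    if failed_cases:
        prompt += "\nFAILING EVAL CASES FROM THE PREVIOUS ATTEMPT:\n"
        prompt += "\n".join(failed_cases)
    return prompt


def orchestrate(merged_spec, generate, run_evals,
                max_attempts, max_seconds, max_tokens):
    """Iterate until the evals pass or any budget limit is reached.

    generate(prompt) -> (code, tokens_used) and run_evals(code) -> list of
    failing case names are hypothetical callables supplied by the caller.
    """
    started = time.monotonic()
    tokens_spent = 0
    history = []      # one entry per attempt, kept for later audits
    closest = None    # attempt with the fewest failing cases so far
    failed = []

    def result(status, candidate):
        return {"status": status, "candidate": candidate,
                "spec_hash": spec_hash(merged_spec),
                "tokens_spent": tokens_spent, "history": history}

    # Limits are checked before each attempt, so a single generation can still
    # overshoot the time or token limit; a stricter version would pass the
    # remaining budget down to the generator.
    while (len(history) < max_attempts
           and time.monotonic() - started < max_seconds
           and tokens_spent < max_tokens):
        code, tokens = generate(build_prompt(merged_spec, failed))
        tokens_spent += tokens
        failed = list(run_evals(code))
        history.append({"attempt": len(history) + 1, "failed_cases": failed})
        if not failed:
            return result("verified", code)
        if closest is None or len(failed) < len(closest["failed_cases"]):
            closest = {"code": code, "failed_cases": failed}
    return result("budget-exhausted", closest)
```

Both exits go through the same `result` helper, so the history, token spend, and spec hash are recorded whether or not a candidate passes.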

What the orchestrator must NOT do

Implement eval execution. The eval runner does that; the orchestrator depends on it.
Choose a generator. Consumers configure which generator the orchestrator invokes.
Enforce deployment tier policy. The verifier does that, reading the orchestrator's output and the lockfile.
Modify the candidate implementation. The orchestrator records what the generator produced; post-processing is out of scope.
Mix multiple generators within a single iteration loop. Multi-generator conformance is achieved by running the orchestrator multiple times with different generator configurations, not by combining generators internally.
Run indefinitely. Budget exhaustion produces a structured result, not a hang.

Multi-generator conformance

The verification axis "generator diversity" (format-spec §13.2) requires that the same merged spec produce conformant implementations across multiple generators. This is achieved by:

Invoking the orchestrator once per generator. Each invocation produces either a verified candidate or a budget-exhausted result.
Recording each generator's outcome in the lockfile (format-spec §14).

The orchestrator itself sees only one generator at a time. Multi-generator coordination happens at a higher orchestration layer that calls the orchestrator multiple times — typically the consumer's bootstrap procedure or CI pipeline, not a separate Commons Format module.

This separation matters: the orchestrator's contract is about a single generator's iteration loop, which keeps the contract small. Multi-generator strategy is a consumer concern that the format supports through verification axis declarations and lockfile structure.
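
As an illustration of that higher layer, a consumer's bootstrap script might call the orchestrator once per generator and collect the outcomes. This is a sketch under stated assumptions: `run_orchestrator` is a hypothetical stand-in for one full orchestrator invocation, the outcome shape is invented, and the `required` threshold is a placeholder for whatever format-spec §13.2 actually demands. The real lockfile schema (format-spec §14) is not reproduced here.

```python
def conformance_across_generators(merged_spec, generator_ids, run_orchestrator):
    """Invoke the orchestrator once per generator and collect each outcome.

    run_orchestrator(merged_spec, generator_id) -> result dict is hypothetical;
    it represents one complete, single-generator iteration loop.
    """
    outcomes = {}
    for generator_id in generator_ids:
        result = run_orchestrator(merged_spec, generator_id)
        outcomes[generator_id] = {
            "status": result["status"],
            "attempts": len(result.get("history", [])),
        }
    return outcomes


def verified_generators(outcomes):
    """Names of the generators whose run ended with a verified candidate."""
    return sorted(g for g, o in outcomes.items() if o["status"] == "verified")


def meets_diversity(outcomes, required):
    """True when at least `required` generators produced a verified candidate.

    `required` is an assumed consumer-chosen threshold, not a format constant.
    """
    return len(verified_generators(outcomes)) >= required
```

The caller would then write `outcomes` into the lockfile in whatever structure the format prescribes; the orchestrator itself never sees more than one generator.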

Iteration strategies

The format does not specify a particular iteration strategy. Some strategies that fit within the orchestrator's contract:

Naive retry: when a candidate fails, regenerate with identical parameters. Cheap; often sufficient for stochastic generators.
Failure-feedback prompting: include the specific failing eval cases in the next prompt's context, asking the generator to fix them. Most common strategy.
Parameter variation: increase temperature or change generator parameters between attempts to escape local minima.
Hand-written seed mixing: include consumer-supplied code as seeds the generator builds around; useful for forcing divergence from the generator's default patterns.

These are consumer-strategy concerns. The orchestrator records which strategy it used in its result so that a run can be audited and repeated with the same settings. Strategies for amplifying divergence and escaping convergent generator patterns are an active research area outside the format itself; the orchestrator provides the substrate those strategies operate within.
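
One way to picture these strategies is as small functions that plan the next attempt from the previous one. The sketch below is hypothetical throughout: the function names, the plan dictionary, the `temperature` key, and the step size are illustrative assumptions, not part of the format.

```python
def naive_retry(params, failed_cases, attempt):
    """Regenerate with identical parameters; the strategy adds nothing itself."""
    return {"strategy": "naive-retry", "params": dict(params),
            "extra_context": []}


def failure_feedback(params, failed_cases, attempt):
    """Ask the generator to fix the specific failing eval cases."""
    return {"strategy": "failure-feedback", "params": dict(params),
            "extra_context": ["Fix failing eval case: " + c for c in failed_cases]}


def parameter_variation(params, failed_cases, attempt, step=0.2, ceiling=1.0):
    """Raise the temperature with each failed attempt, capped at `ceiling`."""
    varied = dict(params)  # copy so the base parameters stay untouched
    base = params.get("temperature", 0.0)
    varied["temperature"] = min(ceiling, base + step * attempt)
    return {"strategy": "parameter-variation", "params": varied,
            "extra_context": []}


def seed_mixing(seeds):
    """Build a strategy that offers consumer-supplied code seeds every attempt."""
    def strategy(params, failed_cases, attempt):
        return {"strategy": "seed-mixing", "params": dict(params),
                "extra_context": ["Build around this seed:\n" + s for s in seeds]}
    return strategy
```

Because each plan carries a `strategy` label alongside the parameters it chose, the orchestrator can copy the plan into its result for every attempt.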

Threat model

The orchestrator invokes external code generators, which return candidate implementations the consumer will run. Threats:

A compromised generator returning malicious code that passes the spec's evals because the evals didn't anticipate the malice. The orchestrator cannot detect this; the mitigation is multi-generator conformance, since compromising several independent generators at once so that they all produce the same passing behavior is much harder than compromising one.
A compromised orchestrator that fails to pass the merged spec faithfully, feeding the generator a different (perhaps weakened) spec. The orchestrator records the spec hash in its result, allowing later audits to detect this.
A generator that emits content larger or more complex than expected. The orchestrator bounds generation per the configured budget; abuse of this kind results in budget exhaustion, not unbounded resource use.
A generator running with elevated privileges or accessing resources beyond what the orchestrator intends. Generator invocation is sandboxed according to the orchestration configuration; the orchestrator does not assume generators will behave well and isolates every invocation.

The orchestrator is a sandbox boundary as much as it is a workflow adapter. Implementations should treat generator invocation as running untrusted code, regardless of the generator's reputation.
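
The spec-hash audit mentioned in the threat list can be sketched from the auditor's side. Treat this as an assumption-laden example: the document does not define the hash algorithm or the result layout, so SHA-256 over canonical JSON, the `spec_hash` result key, and the function names are all hypothetical.

```python
import hashlib
import hmac
import json


def spec_hash(merged_spec):
    """SHA-256 of a canonical JSON encoding (the encoding is an assumption)."""
    canonical = json.dumps(merged_spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def audit_spec_hash(result, expected_spec):
    """Return True when the result records the hash of `expected_spec`.

    The auditor recomputes the hash from its own copy of the merged spec
    instead of trusting any spec text carried inside the result.
    """
    recorded = result.get("spec_hash", "")
    return hmac.compare_digest(recorded, spec_hash(expected_spec))
```

Sorting keys before hashing means two specs that differ only in key order hash identically, while any change to a section's content changes the hash.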

Verification

This module's eval suite tests orchestrator behavior using mock generators that simulate failure-then-success patterns, budget-exhaustion patterns, and pathological cases. The orchestrator's contract is about the iteration loop; the eval suite verifies that loop's mechanics under reproducible mock inputs.

Real generators (Claude, GPT, etc.) are not part of conformance testing because their outputs are non-deterministic; mock generators provide reproducible inputs to the orchestrator.
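
A mock generator of the kind described here can be very small. The following is a sketch, not the module's actual test fixtures: `ScriptedGenerator`, the helper names, the `"GOOD"`/`"BAD"` outputs, and the single-case mock eval runner are all hypothetical.

```python
class ScriptedGenerator:
    """Mock generator that replays a fixed script of outputs.

    Once the script runs out, the last output is repeated, so the mock never
    raises in the middle of an orchestrator's loop.
    """

    def __init__(self, outputs):
        if not outputs:
            raise ValueError("script must contain at least one output")
        self._outputs = list(outputs)
        self.prompts = []  # every prompt received, for assertions in tests

    def __call__(self, prompt):
        self.prompts.append(prompt)
        index = min(len(self.prompts) - 1, len(self._outputs) - 1)
        return self._outputs[index]


def fail_then_succeed(failures):
    """Emit `failures` bad candidates, then a good one."""
    return ScriptedGenerator(["BAD"] * failures + ["GOOD"])


def always_fail():
    """Never emit a passing candidate; drives the budget-exhaustion path."""
    return ScriptedGenerator(["BAD"])


def mock_eval_runner(code):
    """Return the failing case names; only the literal "GOOD" passes."""
    return [] if code == "GOOD" else ["case-1"]
```

Since the script is fixed, the same orchestrator configuration sees the same sequence of candidates on every run, which is what makes the loop's behavior checkable.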

describes commonsformat-orchestrator 0.1.1 · targets commons format 0.2 · generated from release 0.2 · 2026-05-28