
Glossary

The production system's working vocabulary, defined once. Every page on this site uses these terms exactly as written here, and each entry has a stable anchor, so a claim elsewhere can link straight to the definition it depends on.

Verification gap

The distance between how much teams distrust AI-generated code and how rarely they actually verify it before shipping.

The term is Sonar's, from its January 2026 State of Code survey: 96% of developers say they don't fully trust AI-generated code, yet only 48% always verify it before committing. The gap is the space between those two numbers, and it widens as generation speeds up, because verification done as a human ritual cannot keep pace with code written by machines. The factory exists to close it by making verification automatic and auditable rather than manual and optional: the gate transcript is the evidence the gap was closed on a given run.

Verification debt

The accumulating, unreviewed distance between code that has shipped and code a person has actually understood.

The term is Werner Vogels', from his December 2025 re:Invent keynote: AI generates code faster than anyone can understand it, so software reaches production before a human has confirmed what it does, and every merged change nobody fully read is a quiet loan against the day the system surprises you. A behavior catalog cited to source and a gate transcript are how the debt is paid down: they make what shipped legible without requiring anyone to read every line.

Behavior catalog

A typed, machine-checkable inventory of every observable behavior in a system, with source provenance: each behavior cites the file it was read from and carries a note on why.

The catalog is the single source of truth the factory projects from: test skeletons, documentation, and parity assertions all derive from it deterministically, with the LLM filling only the irreducible parts under gate enforcement. Every catalog ships with an explicit boundary stating what was read and what wasn't, because the absence of a finding is a claim only about what lies inside the boundary. When a system is rebuilt, the approved catalog becomes the contractual definition of parity. An excerpt is published at /proof/catalog.
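As an illustration only, here is a minimal Python sketch of what a catalog entry with provenance and an explicit boundary could look like. The field names, the `provenance_errors` check, and the sample file paths are all hypothetical; the published excerpt at /proof/catalog is the authority on the real shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Behavior:
    behavior_id: str   # hypothetical stable identifier
    description: str   # the observable behavior, in plain language
    source_file: str   # the file the behavior was read from
    note: str          # why that file is cited


@dataclass(frozen=True)
class Catalog:
    behaviors: tuple[Behavior, ...]
    files_read: frozenset[str]       # the boundary: what was read
    files_not_read: frozenset[str]   # the boundary: what was not


def provenance_errors(catalog: Catalog) -> list[str]:
    """Return one message per provenance problem found in the catalog."""
    errors = []
    for b in catalog.behaviors:
        if not b.note.strip():
            errors.append(f"{b.behavior_id}: missing provenance note")
        if b.source_file not in catalog.files_read:
            errors.append(
                f"{b.behavior_id}: cites {b.source_file}, which is outside the boundary"
            )
    return errors


# Illustrative data; none of these paths come from a real catalog.
catalog = Catalog(
    behaviors=(
        Behavior("login-lockout", "Account locks after repeated failed logins",
                 "auth/login.py", "the lockout counter is incremented here"),
        Behavior("export-csv", "Reports export as CSV", "reports/export.py", ""),
    ),
    files_read=frozenset({"auth/login.py"}),
    files_not_read=frozenset({"reports/export.py"}),
)
errors = provenance_errors(catalog)
```

The second entry fails both checks: it has no note, and it cites a file the boundary says was never read. That is the sense in which the inventory is machine-checkable.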

Gate

An automated quality check, deterministic or LLM-judged against a rubric, that work must pass before it moves forward in the factory.

Deterministic gates are scripts: a compile, a grep, a test run. LLM-judged gates score work against a rubric that is fixed before the run, with a fixed pass threshold, and a low score blocks the line the same way a failing compile does. Gate names are public and appear in transcripts; prompts and orchestration internals don't.
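The two kinds of gate can be sketched in a few lines of Python. This is an assumption-laden illustration, not the factory's code: the gate names, the `GateResult` shape, and the "score at or above the threshold passes" comparison are hypothetical, and how a judge produces the score is deliberately left out.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    name: str
    kind: str      # "deterministic" or "llm-judged"
    passed: bool


def deterministic_gate(name, command):
    """Run a script; the gate passes only if the command exits with status 0."""
    completed = subprocess.run(command, capture_output=True)
    return GateResult(name, "deterministic", completed.returncode == 0)


def judged_gate(name, score, threshold):
    """Compare a rubric score against a threshold fixed before the run.

    Assumption: a score at or above the threshold passes.
    """
    return GateResult(name, "llm-judged", score >= threshold)


def line_may_advance(results):
    """Any failing gate, of either kind, blocks the line."""
    return all(r.passed for r in results)


# A trivially succeeding command stands in for a real compile step.
compile_ok = deterministic_gate("compile", [sys.executable, "-c", "pass"])
rubric_low = judged_gate("readability-rubric", score=0.62, threshold=0.80)
advance = line_may_advance([compile_ok, rubric_low])
```

Here `advance` is false even though the compile succeeded, because the low rubric score blocks the line in exactly the same way a failing compile would.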

Gate transcript

The logged record of every gate a piece of work passed or failed, available for inspection. The factory's receipt.

A transcript records the gate name, its kind, the result, and what happened on failure, including the resubmission that fixed it. Failures stay in the log; a transcript with no red entries would read as theater, and the fix loop carries most of the information. An annotated sample is published at /proof/transcript.
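A transcript with those four fields could be modeled as an append-only log. The following Python sketch is hypothetical throughout (class names, field names, and the sample gate name are ours for illustration); the annotated sample at /proof/transcript shows the real format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TranscriptEntry:
    gate: str              # gate name
    kind: str              # "deterministic" or "llm-judged"
    result: str            # "pass" or "fail"
    on_failure: str = ""   # what happened on failure; empty for a pass


@dataclass
class Transcript:
    entries: list = field(default_factory=list)

    def record(self, entry):
        # Append-only: a failure is never overwritten by the later pass.
        self.entries.append(entry)

    def failures(self):
        return [e for e in self.entries if e.result == "fail"]

    def final_status(self):
        """Latest result per gate, with the full history left intact."""
        latest = {}
        for e in self.entries:
            latest[e.gate] = e.result
        return latest


transcript = Transcript()
transcript.record(TranscriptEntry("unit-tests", "deterministic", "fail",
                                  "two tests red; work sent back for a fix"))
transcript.record(TranscriptEntry("unit-tests", "deterministic", "pass"))
```

The final status for the gate is a pass, but the failed attempt and the note on what followed remain in the log, which is where the fix loop becomes visible.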

Dual-green

The same behavioral test suite passing against two systems, typically a legacy system and its rebuild, proving behavioral parity.

Order matters. The suite derives from an approved behavior catalog and must first run green against the original system, proving it tests reality before it judges anything new; only then does a green run on the rebuild mean parity. Dual-green is the acceptance standard for rebuild engagements: parity becomes a test result against criteria the client approved rather than a promise.
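The ordering rule can be made concrete with a small Python sketch. Everything here is a toy under stated assumptions: systems are modeled as plain functions, tests as callables that take a system and return true on a pass, and the three status strings are hypothetical labels.

```python
def dual_green(suite, original, rebuild):
    """Run the suite on the original first, and on the rebuild only if that is green."""
    if not all(test(original) for test in suite):
        # Red on the original: the suite does not describe reality yet,
        # so it is not allowed to judge the rebuild at all.
        return "suite-not-validated"
    if not all(test(rebuild) for test in suite):
        return "parity-failed"
    return "dual-green"


# Toy systems standing in for a legacy system and two candidate rebuilds.
def original(text):
    return text.strip().lower()


def faithful_rebuild(text):
    return text.lower().strip()


def lossy_rebuild(text):
    return text.lower()   # forgets to strip whitespace


def trims_and_lowercases(system):
    return system("  Hello ") == "hello"


def lowercases_plain_input(system):
    return system("WORLD") == "world"


def wrong_expectation(system):
    return system("x") == "y"   # not true of the original


suite = [trims_and_lowercases, lowercases_plain_input]
ok = dual_green(suite, original, faithful_rebuild)
broken = dual_green(suite, original, lossy_rebuild)
unvalidated = dual_green([wrong_expectation], original, faithful_rebuild)
```

The third call never reaches the rebuild: a suite that is red against the original proves nothing about parity, whatever the rebuild does.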

Parity replay

Replaying recorded operations against a rebuilt system and diffing the results against the original, asserting zero behavioral difference.

Where dual-green runs a projected test suite on both systems, parity replay re-runs what actually happened: recorded operations, executed against the new system, with every result diffed against the original's. Its proven scope today is internal; it's how our own platform moved between factory generations (see the changelog, v3).
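In outline, a replay is a loop over recorded operations with a diff at each step. The Python sketch below assumes a hypothetical recording format, a list of (operation, original result) pairs, and a toy rebuilt system; it does not describe how our platform actually records or replays operations.

```python
def parity_replay(recorded, rebuilt_system):
    """Replay each recorded operation and collect every behavioral difference.

    recorded: list of (operation, original_result) pairs.
    Returns a list of (operation, original_result, new_result) tuples.
    """
    differences = []
    for operation, original_result in recorded:
        new_result = rebuilt_system(operation)
        if new_result != original_result:
            differences.append((operation, original_result, new_result))
    return differences


# Toy recording: the original system used integer division.
recorded = [
    (("add", 2, 3), 5),
    (("div", 7, 2), 3),
]


def rebuilt_system(operation):
    name, a, b = operation
    if name == "add":
        return a + b
    if name == "div":
        return a / b   # true division: behaves differently from the original
    raise ValueError(f"unknown operation: {name}")


differences = parity_replay(recorded, rebuilt_system)
parity_holds = not differences   # the assertion is zero behavioral difference
```

The replay surfaces the division change even though nobody wrote a test for it, because it compares against what the original actually returned.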

Coverage debt

The enumerated, classified list of behaviors a system has that its test suite doesn't cover, tiered by the test level each behavior actually needs.

The tiering is the point. Flat coverage tools treat every gap alike, which inflates the browser-test number; routing each uncovered behavior to the cheapest test level that actually exercises it (unit, integration, or end-to-end) keeps the backlog honest and reserves end-to-end tests for the behaviors that need them. A coverage-debt report is a standard deliverable of a verification engagement.
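A tiered report can be sketched as a grouping step. In this Python illustration the behavior names are invented, and each behavior is assumed to arrive already annotated with the cheapest test level that exercises it; how that level is decided is not shown.

```python
TIERS = ("unit", "integration", "e2e")   # cheapest first


def coverage_debt(required_tier, covered_ids):
    """Group uncovered behaviors by the test level each one needs.

    required_tier: dict mapping behavior id to its required tier.
    covered_ids: set of behavior ids the existing suite already covers.
    """
    debt = {tier: [] for tier in TIERS}
    for behavior_id, tier in sorted(required_tier.items()):
        if behavior_id not in covered_ids:
            debt[tier].append(behavior_id)
    return debt


# Hypothetical behaviors and annotations.
required_tier = {
    "price-rounding": "unit",
    "tax-lookup": "unit",
    "order-persists": "integration",
    "checkout-flow": "e2e",
}
covered_ids = {"price-rounding"}

debt = coverage_debt(required_tier, covered_ids)
```

Of the three uncovered behaviors, only one lands in the end-to-end tier; a flat count would have reported three gaps with no indication of which test level each one needs.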

Probe candidate

A finding traced in source but not yet confirmed against a running system. Never presented as a confirmed defect.

Probe candidate is one of three epistemic tags the factory applies to its own claims: proven (an artifact is on disk or a test ran green), traced in source (read from code, not yet run), and probe candidate (needs a live environment to settle). Each probe candidate ships with the exact API call that settles it, and it stays a candidate until that call has run. The tags appear wherever findings or capabilities are described on this site.


Enforcement matrix

The generated cross of every frontend gate against every backend guard, exposing rules enforced in only one layer.

Each cell falls into one of three classes: both-layers, the control group where frontend and backend agree; UI-only, the finding class, rules the frontend promises that the backend doesn't enforce; and backend-only, the inverse, where the server refuses what the interface lets you click. A sample matrix is published at /proof/catalog.
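The three classes can be illustrated with a simplified Python sketch. It assumes each frontend gate and backend guard has already been mapped to a shared rule name, which collapses the full cross into one row per rule; the rule names are hypothetical, and the sample at /proof/catalog shows the real matrix.

```python
def classify_rules(frontend_rules, backend_rules):
    """Label each rule by the layer or layers that enforce it."""
    matrix = {}
    for rule in sorted(frontend_rules | backend_rules):
        in_frontend = rule in frontend_rules
        in_backend = rule in backend_rules
        if in_frontend and in_backend:
            matrix[rule] = "both-layers"    # the control group
        elif in_frontend:
            matrix[rule] = "UI-only"        # the finding class
        else:
            matrix[rule] = "backend-only"   # the inverse
    return matrix


# Hypothetical rule names.
frontend_rules = {"title-required", "max-attachments"}
backend_rules = {"title-required", "admin-only-delete"}

matrix = classify_rules(frontend_rules, backend_rules)
ui_only = [rule for rule, cls in matrix.items() if cls == "UI-only"]
```

Filtering for UI-only rows yields the rules the interface promises but the server does not enforce, which is the class worth investigating first.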

Factory version

The named generations of the production line: v1, v2, v3. Higher versions automate stations that previously required human review.

v1 built a production platform with an architect reviewing at stations that are now automated gates; v2 ported Bugzilla end to end without those reviews; v3 added catalog-driven verification and turned it on our own prior work. The changelog records each generation with its receipts.

The vocabulary is defined; the proof library uses it.
