The old output and candidate fix are compared side by side.
Compare prompts, responses, expected behavior, failure families, and replay results side by side.
An escaped regression usually has nowhere to go. Here the failure is captured, replayed against the fix, and promoted to a release check the next deploy must pass, with a signed gate verdict on the record.
durable failure families
recorded replay executions
current release-gate outcome
The promoted case gets owner, severity, family, scope, and the threshold required to ship.
The verdict includes suite status, replay evidence, reviewer note, and release linkage.
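As a rough illustration of the two records described above, the sketch below models a promoted case and a gate verdict as plain data, then signs the verdict with an HMAC so later tampering is detectable. All class names, field names, and the HMAC-SHA256 scheme are assumptions for illustration; the product's actual schema and signing method are not described here.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PromotedCase:
    # Hypothetical fields mirroring: owner, severity, family, scope, threshold.
    case_id: str
    owner: str
    severity: str
    family: str
    scope: str
    ship_threshold: float  # minimum replay pass rate required to ship


@dataclass(frozen=True)
class GateVerdict:
    # Hypothetical fields mirroring: suite status, replay evidence,
    # reviewer note, release linkage.
    release: str
    suite_status: str
    replay_passed: int
    replay_total: int
    reviewer_note: str


def sign_verdict(verdict: GateVerdict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 over a canonical JSON form of the verdict."""
    payload = json.dumps(asdict(verdict), sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_verdict(verdict: GateVerdict, signature: str, key: bytes) -> bool:
    """Check a signature in constant time."""
    return hmac.compare_digest(sign_verdict(verdict, key), signature)
```

Serializing with sorted keys keeps the signed payload stable regardless of field order, so the same verdict always yields the same signature under the same key.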
Inspect the work, the gate, the owner, and the record that remains after every decision.
Record the failure with prompts, payloads, model version, and output.
Run the candidate fix against the failure family and related cases.
Turn the incident into a suite case with an owner and threshold.
Attach the signed pass or fail verdict to the release decision.
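The four steps above can be sketched as a small loop. This is a minimal sketch under stated assumptions, not the product's implementation: the `Failure` record, the function names, and the exact-match comparison against expected behavior are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Failure:
    # Step 1: the captured failure (hypothetical field names).
    family: str
    prompt: str
    model_version: str
    output: str
    expected: str


def replay(failures: List[Failure], candidate: Callable[[str], str]) -> List[bool]:
    """Step 2: run the candidate fix on each captured prompt.

    Assumes a case passes when the candidate's output equals the expected text.
    """
    return [candidate(f.prompt) == f.expected for f in failures]


def promote(failure: Failure, owner: str, threshold: float) -> Dict[str, object]:
    """Step 3: turn the incident into a suite case with an owner and threshold."""
    return {
        "family": failure.family,
        "prompt": failure.prompt,
        "expected": failure.expected,
        "owner": owner,
        "threshold": threshold,
    }


def gate(results: List[bool], threshold: float) -> Dict[str, object]:
    """Step 4: reduce replay results to a pass or fail verdict."""
    pass_rate = sum(results) / len(results) if results else 0.0
    verdict = "pass" if pass_rate >= threshold else "fail"
    return {"verdict": verdict, "pass_rate": pass_rate}
```

An empty replay set is treated as a failing gate here, on the assumption that a release with no replay evidence should not ship by default.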
Owner: Release quality. Status: gate blocking.
policy bypass
refusal reason
assistant candidate
This walkthrough is read-only. Start a pilot to run the same loop on your own work, with figures drawn from your live metrics.