Weights, pass criteria, and scorecards are visible before the run.
Teams define criteria, weight risk, and keep judge instructions attached.
A score on its own does not survive examination. Here the weighted rubric, judge confidence, human concordance, and bias and cost gates stay linked to the release, so an eval becomes a decision you can defend.
Scorecard panel (current run): median scorecard outcome, median confidence band, reviewer agreement signal.
Scorecards surface low-confidence cases, disagreement, and the reviewer queue they create.
The deploy check keeps the scorecard, traces, and gates linked to the release decision.
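As a rough illustration of how a weighted rubric and its gates might combine into a deploy check, here is a minimal Python sketch. The names `Criterion`, `weighted_score`, and `deploy_check`, the example weights, and the thresholds are all assumptions made for illustration; they are not the product's API or its actual scoring formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rubric line (hypothetical shape, for illustration only)."""
    name: str
    weight: float     # relative importance; riskier criteria get more weight
    threshold: float  # minimum acceptable score in [0, 1] for this criterion


def weighted_score(criteria, scores):
    """Weight-normalized mean of the per-criterion scores."""
    total_weight = sum(c.weight for c in criteria)
    return sum(c.weight * scores[c.name] for c in criteria) / total_weight


def deploy_check(criteria, scores, pass_mark):
    """Return (passed, failed_gates).

    Assumed policy: every criterion must clear its own threshold AND the
    weighted score must reach pass_mark.
    """
    failed = [c.name for c in criteria if scores[c.name] < c.threshold]
    passed = not failed and weighted_score(criteria, scores) >= pass_mark
    return passed, failed


rubric = [
    Criterion("accuracy", weight=3.0, threshold=0.5),
    Criterion("safety", weight=2.0, threshold=0.75),
    Criterion("tone", weight=1.0, threshold=0.25),
]
scores = {"accuracy": 1.0, "safety": 0.5, "tone": 1.0}

print(weighted_score(rubric, scores))
print(deploy_check(rubric, scores, pass_mark=0.75))
```

In this example the weighted score clears the pass mark, yet the check still fails because the safety criterion misses its own threshold. That is the point of keeping gates separate from the aggregate: a high average cannot hide a failed high-risk criterion.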
Inspect the work, the gate, the owner, and the record that remains after every decision.
Create the scorecard, weights, judge prompt, and acceptance thresholds.
Score model outputs, traces, and multi-turn cases against the rubric.
Send uncertain cases to the right inbox with context attached.
Attach the result to release review and deploy checks.
Owner: Evaluation lead. Status: decision ready.
weighted rubric
tool + answer
12 overrides
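The routing step in the loop above could be sketched roughly as follows. This assumes each judged case carries a judge score, a judge confidence, and optionally a human score; the `JudgedCase` fields, the cutoff values, and the function names are hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JudgedCase:
    """A scored case (hypothetical fields, for illustration only)."""
    case_id: str
    judge_score: float                   # judge's score in [0, 1]
    judge_confidence: float              # judge's confidence in [0, 1]
    human_score: Optional[float] = None  # set only if a reviewer also scored it


def review_reason(case, min_confidence=0.7, max_gap=0.25):
    """Return why a case needs human review, or None if it does not."""
    if case.judge_confidence < min_confidence:
        return "low confidence"
    if case.human_score is not None:
        if abs(case.judge_score - case.human_score) > max_gap:
            return "judge/human disagreement"
    return None


def build_review_queue(cases, min_confidence=0.7, max_gap=0.25):
    """Collect the cases that need review, with the reason attached as context."""
    queue = []
    for case in cases:
        reason = review_reason(case, min_confidence, max_gap)
        if reason is not None:
            queue.append({"case_id": case.case_id, "reason": reason})
    return queue


cases = [
    JudgedCase("a", judge_score=0.8, judge_confidence=0.9, human_score=0.75),
    JudgedCase("b", judge_score=0.6, judge_confidence=0.4),
    JudgedCase("c", judge_score=1.0, judge_confidence=0.9, human_score=0.5),
]
for item in build_review_queue(cases):
    print(item)
```

Attaching the reason to each queued item is one simple way to send a case to a reviewer "with context attached"; a fuller version would likely also carry the trace and the rubric criteria in question.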
This walkthrough is read-only. Start a pilot to run the same loop on your own work, with the figures reading from your live metrics.
Rubric Studio Cloud walkthrough
This block mirrors the shipped PR #1 path: author a rubric, create an AI draft, get expert approval, send work to grading, and write the contribution used by scorecards.
Read-only PR #1 path
Name the task type, domain, risk level, and first criteria.
Generate a draft with warnings and review mode attached.
AI-drafted rubrics stay blocked until an expert approves them.
Grade model output criterion by criterion with evidence gates.
A submitted grade writes the scorecard contribution path.
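The approval gate in this path can be pictured with a small sketch. It assumes a rubric records whether it was AI-drafted and who approved it, and that each graded criterion must come with evidence; the `Rubric` class, `ApprovalRequired` exception, and `grade` function are hypothetical names, not the shipped implementation.

```python
from dataclasses import dataclass
from typing import Optional


class ApprovalRequired(Exception):
    """Raised when an AI-drafted rubric is used before expert approval."""


@dataclass
class Rubric:
    """Hypothetical rubric record, for illustration only."""
    task_type: str
    domain: str
    risk_level: str
    criteria: list
    ai_drafted: bool = False
    approved_by: Optional[str] = None


def approve(rubric, expert):
    """Record the expert who approved the rubric."""
    rubric.approved_by = expert


def grade(rubric, criterion_scores, evidence):
    """Grade criterion by criterion and return a scorecard contribution.

    Assumed gates: AI-drafted rubrics need approval first, and every
    criterion needs a score plus a non-empty evidence note.
    """
    if rubric.ai_drafted and rubric.approved_by is None:
        raise ApprovalRequired("AI-drafted rubric has not been approved")
    missing = [
        c for c in rubric.criteria
        if c not in criterion_scores or not evidence.get(c)
    ]
    if missing:
        raise ValueError(f"missing score or evidence for: {missing}")
    return {
        "task_type": rubric.task_type,
        "approved_by": rubric.approved_by,
        "scores": {c: criterion_scores[c] for c in rubric.criteria},
    }


draft = Rubric("support reply", "billing", "high",
               criteria=["accuracy", "tone"], ai_drafted=True)
approve(draft, "expert@example.com")
contribution = grade(
    draft,
    {"accuracy": 1.0, "tone": 0.5},
    {"accuracy": "matches invoice record", "tone": "polite, slightly terse"},
)
print(contribution)
```

Calling `grade` on the draft before `approve` would raise `ApprovalRequired`, which mirrors the rule that AI-drafted rubrics stay blocked until an expert signs off.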