APP DATA OS · EVALUATION STUDIO LIVE DEMO

Weighted rubrics, calibrated judges, gates, traces, and consumer inboxes.

A score on its own does not survive examination. Here the weighted rubric, judge confidence, human concordance, and bias and cost gates stay linked to the release, so an eval becomes a decision you can defend.

Run it on your work · See the product

RUBRIC PASS RATE

94%

median scorecard outcome

JUDGE CONFIDENCE

91%

median confidence band

HUMAN CONCORDANCE

89%

reviewer agreement signal

COST PER CALL

$0.0008

current run

READ-ONLY SURFACES

Production workflow states, evidence, and owners at a glance.

EVALUATION RUN

Support assistant release candidate

● DECISION-READY

RUBRIC

weighted

JUDGES

calibrated

GATES

3 checked

SURFACE READING · SEED DATA HERE, YOUR METRICS IN A PILOT

WHAT’S ON SCREEN

Three panels, one record.

RUBRIC EDITOR

Weights, pass criteria, and scorecards are visible before the run.

Teams define criteria, weight risk, and keep judge instructions attached.

Accuracy · 40% weight · REQUIRED

Safety · 35% weight · REQUIRED

Tone · 25% weight · WATCHED
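A minimal sketch of how a weighted rubric like the one above could be scored. The 40/35/25 weights and the REQUIRED/WATCHED split come from the panel; the names `Criterion` and `score_output`, the 0-to-1 score scale, and the 0.8 pass mark are assumptions made for illustration, not the product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    weight_pct: int   # share of the total score, in percent
    required: bool    # REQUIRED criteria must pass on their own; WATCHED ones only add weight


# Weights as shown in the Rubric Editor panel.
RUBRIC = [
    Criterion("accuracy", 40, True),
    Criterion("safety", 35, True),
    Criterion("tone", 25, False),
]


def score_output(rubric, scores, pass_mark=0.8):
    """Return (weighted_score, passed) for per-criterion scores in [0, 1].

    pass_mark is a hypothetical threshold; a real rubric would carry its own.
    """
    total = sum(c.weight_pct for c in rubric)
    weighted = sum(c.weight_pct * scores[c.name] for c in rubric) / total
    # A required criterion below the pass mark fails the output regardless of the total.
    required_ok = all(scores[c.name] >= pass_mark for c in rubric if c.required)
    return weighted, required_ok and weighted >= pass_mark


if __name__ == "__main__":
    print(score_output(RUBRIC, {"accuracy": 0.9, "safety": 0.9, "tone": 0.9}))
```

Treating REQUIRED as a per-criterion floor means a strong Tone score cannot compensate for a failed Safety check, which is one reasonable reading of the labels.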

CALIBRATED JUDGES

Confidence and concordance decide what needs human review.

Scorecards surface low-confidence cases, disagreement, and the reviewer queue they create.

Median confidence · 91% · STABLE

Concordance · 89% · WATCHED

Review queue · 12 cases · OPEN
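One way to picture the routing described in this panel: a case goes to human review when the judge's confidence is low or when a reviewer's label disagrees with the judge. This is a sketch under assumptions; `JudgedCase`, `needs_review`, `concordance`, and the 0.75 confidence floor are hypothetical and do not reflect how the product computes its figures.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class JudgedCase:
    case_id: str
    judge_pass: bool
    confidence: float                  # judge confidence in [0, 1]
    human_pass: Optional[bool] = None  # present only when a reviewer also labeled the case


def needs_review(case, min_confidence=0.75):
    """Queue a case when confidence is low or the reviewer disagrees with the judge."""
    low_confidence = case.confidence < min_confidence
    disagreement = case.human_pass is not None and case.human_pass != case.judge_pass
    return low_confidence or disagreement


def concordance(cases):
    """Fraction of human-labeled cases where judge and reviewer agree, or None if unlabeled."""
    labeled = [c for c in cases if c.human_pass is not None]
    if not labeled:
        return None
    return sum(c.judge_pass == c.human_pass for c in labeled) / len(labeled)


def review_queue(cases, min_confidence=0.75):
    return [c.case_id for c in cases if needs_review(c, min_confidence)]


if __name__ == "__main__":
    cases = [
        JudgedCase("a", True, 0.95, True),
        JudgedCase("b", True, 0.60),
        JudgedCase("c", False, 0.90, True),
    ]
    print(median(c.confidence for c in cases), concordance(cases), review_queue(cases))
```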

RELEASE GATES

Bias, cost, SLO, and regression checks travel with the release.

The deploy check keeps the scorecard, traces, and gates linked to the release decision.

Bias Sentinel · No hold · PASSING

Cost ceiling · $0.0010 · PASSING

Regression · 1 warning · REVIEW
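The three gates shown here can be sketched as simple checks whose statuses roll up into one release outcome. The status strings PASSING and REVIEW mirror the panel; the FAILING status, the `check_gates` and `release_outcome` functions, and the outcome labels are assumptions for illustration.

```python
def check_gates(bias_hold, cost_per_call, cost_ceiling, regression_warnings):
    """Return a status per gate. FAILING is a hypothetical status not shown in the demo."""
    return {
        "bias": "FAILING" if bias_hold else "PASSING",
        "cost": "PASSING" if cost_per_call <= cost_ceiling else "FAILING",
        "regression": "PASSING" if regression_warnings == 0 else "REVIEW",
    }


def release_outcome(gates):
    """Roll gate statuses up into one outcome (labels are illustrative)."""
    statuses = set(gates.values())
    if "FAILING" in statuses:
        return "BLOCKED"
    if "REVIEW" in statuses:
        return "NEEDS_REVIEW"
    return "CLEAR"


if __name__ == "__main__":
    # Seed values from the demo: no bias hold, $0.0008 per call against a
    # $0.0010 ceiling, one regression warning.
    gates = check_gates(False, 0.0008, 0.0010, 1)
    print(gates, release_outcome(gates))
```

With the demo's seed values the cost gate passes because the per-call cost sits under the ceiling, while the single regression warning keeps the run in review rather than blocking it.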

DEMO PATH

Four steps, one defensible record.

Inspect the work, the gate, the owner, and the record that remains after every decision.

STEP 01

Define

Create the scorecard, weights, judge prompt, and acceptance thresholds.

STEP 02

Run

Score model outputs, traces, and multi-turn cases against the rubric.

STEP 03

Review

Send uncertain cases to the right inbox with context attached.

STEP 04

Gate

Attach the result to release review and deploy checks.

WHAT COMES OUT

The evaluation run packet, attached.

Owner: Evaluation lead. Status: decision ready.

Scorecard

weighted rubric

↳ SIGNED

Trace record

tool + answer

↳ ATTACHED

Reviewer strip

12 overrides

↳ QUEUED

NEXT PATH

See the proof. Then run it.

This walkthrough is read-only. Start a pilot to run the same loop on your own work, with the figures reading from your live metrics.


Rubric Studio Cloud walkthrough

From rubric draft to model scorecard contribution.

This block mirrors the shipped PR #1 path: author a rubric, create an AI draft, get expert approval, send work to grading, and write the contribution used by scorecards.

Coming with QA Review / Adjudication · Coming with Scorecards · Coming with Exports


Read-only PR #1 path

Model output safety rubric

seeded path

Author rubric

Name the task type, domain, risk level, and first criteria.

PR #1 live

AI draft

Generate a draft with warnings and review mode attached.

PR #1 live

Expert approval

AI-drafted rubrics stay blocked until an expert approves them.

PR #1 live

Worker grading

Grade model output criterion by criterion with evidence gates.

PR #1 live

Scorecard contribution

A submitted grade writes the contribution that scorecards use.

PR #1 live
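The five steps above read like a small state machine, with the approval rule as its key constraint: an AI-drafted rubric cannot reach grading without passing through expert approval. A minimal sketch, assuming a strictly linear path; the `Stage` names and the `advance` function are hypothetical and are not taken from the shipped code.

```python
from enum import Enum


class Stage(Enum):
    AUTHORED = "authored"
    AI_DRAFTED = "ai_drafted"
    APPROVED = "approved"
    GRADING = "grading"
    CONTRIBUTED = "contributed"


# Allowed transitions. AI_DRAFTED can only move to APPROVED, so grading
# stays blocked until an expert signs off.
ALLOWED = {
    Stage.AUTHORED: {Stage.AI_DRAFTED},
    Stage.AI_DRAFTED: {Stage.APPROVED},
    Stage.APPROVED: {Stage.GRADING},
    Stage.GRADING: {Stage.CONTRIBUTED},
    Stage.CONTRIBUTED: set(),
}


def advance(current, target):
    """Move to the target stage, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


if __name__ == "__main__":
    stage = Stage.AUTHORED
    for nxt in (Stage.AI_DRAFTED, Stage.APPROVED, Stage.GRADING, Stage.CONTRIBUTED):
        stage = advance(stage, nxt)
    print(stage.value)
```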