OPEN · RUBRIC STUDIO · THE IDE FOR THE RUBRIC

The rubric becomes the artifact.

Local, file-based, git-friendly authoring for the criterion-level evaluations that now shape frontier AI. Author, test, calibrate, diff, and export.

RUBRIC · behavior-shaping-v3
AGREEMENT · 0.81 κ
BIAS PROBES · 2 warn
DIFF IMPACT · 14/200
AUTHOR
On disk

Criteria, judges, samples, calibration data — a project folder a reviewer can diff.

CALIBRATE
Agreement first

Cohen's κ, Fleiss' κ, Krippendorff's α, bootstrap intervals, and support for ordinal scales.

EXPORT
Portable

rubric-spec, judge cards, manifests, and adapters for Inspect, OpenAI Evals, and Promptfoo.

HOW IT WORKS

Three steps. Git-native authoring.

Write the criteria. Score with a mock judge or bring your own. Diff the wording, see the score impact.

STEP 01
WHAT WE WRITE

Author the criteria

Criterion-level rubrics in a project folder with schema validation, examples, evidence requirements, and theme tags.
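
As an illustration of what criterion-level validation can look like, here is a minimal Python sketch. The field names (id, description, evidence_required, themes, examples) and the three checks are assumptions made for this example; they are not the rubric-spec schema.

```python
from dataclasses import dataclass, field


@dataclass
class Criterion:
    # Hypothetical fields, chosen for illustration only.
    id: str
    description: str
    evidence_required: bool = False
    themes: list = field(default_factory=list)
    examples: list = field(default_factory=list)


def validate(criteria):
    """Return a list of human-readable problems; empty means the rubric passes."""
    problems = []
    seen = set()
    for c in criteria:
        if c.id in seen:
            problems.append(f"{c.id}: duplicate id")
        seen.add(c.id)
        if not c.description.strip():
            problems.append(f"{c.id}: empty description")
        if c.evidence_required and not c.examples:
            problems.append(f"{c.id}: requires evidence but has no examples")
    return problems


# Two made-up criteria, each with one deliberate problem.
rubric = [
    Criterion("cites_evidence", "Response quotes the source it relies on.",
              evidence_required=True, themes=["grounding"]),
    Criterion("stays_on_task", "", themes=["focus"]),
]
problems = validate(rubric)
for line in problems:
    print(line)
```

Returning a list of problems instead of raising on the first one lets an author see every issue in a single pass.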

STEP 02
WHAT WE MEASURE

Calibrate against gold

Bring expert scores into the calibration tab. Compute agreement. Probe judge bias. Rank criteria that need work.
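
To make "compute agreement" and "rank criteria" concrete, the sketch below computes Cohen's κ between expert (gold) scores and judge scores for each criterion, then sorts criteria from lowest agreement up. The criterion names and scores are invented for illustration, and this is a plain-Python rendering of the standard formula, not Rubric Studio's own code.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two raters' categorical labels on the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(a)
    # Observed agreement: fraction of items where both raters match.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical gold (expert) and judge scores, keyed by criterion name.
gold = {
    "cites_evidence": [1, 1, 0, 1, 0, 0, 1, 0],
    "stays_on_task": [1, 0, 1, 1, 0, 1, 0, 0],
}
judge = {
    "cites_evidence": [1, 1, 0, 1, 0, 0, 1, 0],
    "stays_on_task": [1, 1, 1, 0, 0, 1, 1, 0],
}

kappas = {name: cohen_kappa(gold[name], judge[name]) for name in gold}
# Lowest agreement first: these are the criteria most in need of rewording.
needs_work = sorted(kappas, key=kappas.get)
```

With this toy data, stays_on_task lands at κ = 0.25 and sorts ahead of cites_evidence at κ = 1.0.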

STEP 03
WHAT WE SHIP

Diff and export

Semantic rubric changes next to score-impact overlays. Export rubric-spec, manifests, conformance badges, intake packets.
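
One way to read a score-impact figure such as 14/200 is "samples whose score changed, out of samples scored under both versions." That reading is an assumption, and the sketch below follows it with invented sample ids and scores.

```python
def diff_impact(old_scores, new_scores):
    """Return (changed_ids, total) for samples scored under both rubric versions."""
    shared = sorted(old_scores.keys() & new_scores.keys())
    changed = [sid for sid in shared if old_scores[sid] != new_scores[sid]]
    return changed, len(shared)


# Hypothetical scores for the same samples before and after a wording change.
old = {"s1": 1, "s2": 0, "s3": 1, "s4": 1}
new = {"s1": 1, "s2": 1, "s3": 1, "s4": 0}

changed, total = diff_impact(old, new)
print(f"{len(changed)}/{total} samples changed: {changed}")
# prints: 2/4 samples changed: ['s2', 's4']
```

Keeping the changed ids, not just the count, lets a reviewer open the exact samples a wording change moved.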

WHAT COMES OUT

Files a reviewer can diff.

Every project leaves a folder. Every export is portable. Reviewers run it without a hosted account.

01

rubric.toml

Portable rubric in the rubric-spec schema. Validated, linted, diffable, and adapter-ready.

↳ ARTIFACT
02

judge-card.md

Disclosure card for the judge prompt: calibration results, known bias, use envelope, limits.

↳ ARTIFACT
03

eval-run-manifest.json

Reproducible scoring envelope with provenance, hashes, and the exact data the run touched.

↳ ARTIFACT
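
A run manifest that records hashes of the data it touched could be assembled roughly as follows. This is a sketch under stated assumptions: the "files", "path", and "sha256" keys and the glob patterns are hypothetical, not the actual eval-run-manifest.json format.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large sample sets do not load into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(project_dir, patterns=("*.toml", "*.jsonl")):
    """Collect a content hash for every matching file under project_dir."""
    root = Path(project_dir)
    files = sorted(p for pat in patterns for p in root.rglob(pat) if p.is_file())
    return {
        # Hypothetical manifest layout: relative path plus content hash.
        "files": [
            {"path": p.relative_to(root).as_posix(), "sha256": sha256_of(p)}
            for p in files
        ]
    }
```

Calling json.dumps(build_manifest("."), indent=2) would serialize the result. Sorted, relative paths keep the output stable across machines, so the manifest itself diffs cleanly.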
04

framework adapters

Exports for Inspect, OpenAI Evals, Promptfoo, Hugging Face, and lm-eval-harness.

↳ ARTIFACT
05

intake packets

Signed .auraonepkg with a privacy preview before handoff to AuraOne reviewers.

↳ ARTIFACT
RELATED OPEN SURFACES

Next to this in AuraOne Open.

AGENT STUDIO OPEN

The debug loop, on your laptop.

Local-first IDE for MCP and A2A agents. Replay, compare, export.

ROBOTICS STUDIO OPEN

Review teleop and VLA datasets, on disk.

Scrub sensor streams. Cluster failures. Export reviewed subsets.

OPEN V2

Trust gates for agentic and embodied AI.

Twelve installable packages, including rubric-spec, iaa-kit, judge-bench, and judge-card.

RUBRIC STUDIO OPEN

A folder a reviewer can diff.

Open is not a trial. It is the IDE. Cloud begins when multi-author review is the actual problem.
