Apache-2.0 · release proof required
Risk lint for MCP manifests.
Eval manifests, regression banks, dataset and embodiment cards, contamination audits. The provenance an audit asks for, with release and runtime proof required before availability is marketed. Nothing you review is represented as uploaded or pooled.
The same checks that sign the evidence inside Human Data OS and App Data OS. Source links are listed; release proof is required.
Risk linting, contract tests, trace replay, OTEL bridges, trace cards. CI runtime proof required.
Quality gates, recovery metrics, VLA probes, embodiment cards. Runtime and data-boundary proof required.
Reproducible failure cases. Review packets a reader can open without an account.
Each one is a command-line diagnostic, a portable file a reviewer can diff, a GitHub workflow, or a review packet a reader can open without an account. Apache-2.0 source links are listed, with release proof required before runtime availability is marketed. These are the open primitives the signed evidence in Human Data OS and App Data OS is built on. The reliability layer should be inspectable.
Repo-level regression gates and replay harnesses the agent cannot game. Risk linting, contract tests, OTEL bridges, and trace cards a reviewer can open. CI runtime proof required.
Apache-2.0 · release proof required
Risk taxonomy, CLI, and GitHub Action wrapper for MCP manifests, permissions, claims, and unsafe tool surfaces.
Apache-2.0 · release proof required
Offline contract tests for A2A agent cards, task lifecycle states, structured payloads, and errors.
Apache-2.0 · release proof required
Deterministic replay harness that turns failed agent tool-call traces into local regression tests.
Apache-2.0 · release proof required
Portable Markdown and JSON cards for one agent run: tools, retries, data touched, outcome, failure mode.
Apache-2.0 · release proof required
Bridge that converts OpenTelemetry and Phoenix GenAI spans into redacted eval regression cases.
Apache-2.0 · release proof required
No-model PR review notes for prompt and rubric changes: weights, criteria, boundaries, injection exposure.
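To make the trace-card idea concrete, here is a minimal sketch of a card for one agent run, rendered as both JSON and Markdown. The `TraceCard` class, its field names, and the sample values are assumptions made up for illustration; only the five content areas (tools, retries, data touched, outcome, failure mode) come from the description above, and the real schema may look different.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class TraceCard:
    """Hypothetical card for one agent run; field names are illustrative."""

    run_id: str
    tools: List[str]
    retries: int
    data_touched: List[str]
    outcome: str
    failure_mode: Optional[str] = None

    def to_json(self) -> str:
        # Sorted keys and fixed indentation keep the file stable,
        # so two cards can be compared with an ordinary text diff.
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    def to_markdown(self) -> str:
        lines = [
            f"# Trace card: {self.run_id}",
            "",
            f"- Tools: {', '.join(self.tools) or 'none'}",
            f"- Retries: {self.retries}",
            f"- Data touched: {', '.join(self.data_touched) or 'none'}",
            f"- Outcome: {self.outcome}",
            f"- Failure mode: {self.failure_mode or 'none'}",
        ]
        return "\n".join(lines) + "\n"


# Sample values are invented for the example.
card = TraceCard(
    run_id="run-001",
    tools=["search", "write_file"],
    retries=2,
    data_touched=["notes/todo.md"],
    outcome="failed",
    failure_mode="tool timeout",
)
print(card.to_markdown())
```

Because both renderings are plain text with a fixed field order, a reviewer can diff two runs without any hosted service.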
Local quality gates, failure and recovery metrics, VLA robustness probes, and release cards for teleop and VLA datasets. Review robot datasets without uploading the robot.
Apache-2.0 · release proof required
Local quality gates for LeRobot-style datasets: metadata, episodes, sensors, action and state fields.
Apache-2.0 · release proof required
Schema and metrics for human intervention and recovery segments, including repeated-failure clusters.
Apache-2.0 · release proof required
Simulator-light VLA diagnostics for language, vision, metadata, task-phase, and embodiment perturbations.
Apache-2.0 · release proof required
Structured robot dataset and VLA release cards for sensors, action spaces, frames, control rate, limits.
The failure bank in your hands: reproducible failure cases that preserve each mistake so it is caught the next time it appears. Review packets a reader can open without an account.
Apache-2.0 · release proof required
Synthetic agent and robotics failure cases with reproducible commands and expected review labels.
Apache-2.0 · release proof required
Technical review packets, no-endorsement language, and a feedback-log schema for lab review requests.
The twelve tools above, plus the rest of AuraOne Open on GitHub, npm, PyPI, and Homebrew: CI Actions, SDKs, CLIs, schema tools, and example repos from one page. Apache-2.0 source links are listed, with package and runtime proof required before install availability is marketed. Nothing is represented as pooled on our servers.
Portable rubric specs, deterministic scoring, judge diagnostics, IAA, contamination checks, adapters, and eval-run provenance. The provenance machinery an audit asks for; package and local-runtime proof are required before availability is marketed.
Local evaluation tooling for rubric validation, linting, and deterministic scoring. Source link listed; package proof required.
Portable AuraOne Rubric Schema v1 validator and adapters.
Modern inter-annotator agreement metrics with bootstrap confidence intervals.
Judge Card schema, validator, and renderer for judge-model disclosure.
Diagnostic probes for LLM-as-judge reliability and bias checks.
Adapters between rubric-spec and common evaluation framework inputs.
Signed manifest envelope for eval runs, artifacts, and reproducibility.
N-gram, embedding, canary, answer-pattern, and corpus contamination auditor.
Controlled synthetic annotator disagreement generator for adjudication workflows.
Executable rubric-spec v1 conformance suite and badges.
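As an illustration of what an agreement metric with a bootstrap confidence interval involves, the sketch below computes Cohen's kappa for two annotators and a percentile bootstrap interval by resampling items. Cohen's kappa is used here only as a familiar stand-in; the function names, defaults, and sample labels are assumptions, not the package's actual API or metric set.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each annotator's marginal label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label; kappa is undefined,
        # so this sketch reports 1.0 by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def bootstrap_ci(
    a: Sequence[str],
    b: Sequence[str],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval, resampling items with replacement."""
    rng = random.Random(seed)  # fixed seed keeps the interval reproducible
    n = len(a)
    stats: List[float] = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(cohen_kappa([a[i] for i in idx], [b[i] for i in idx]))
    stats.sort()
    low = stats[int((alpha / 2) * n_resamples)]
    high = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return low, high


# Invented labels: the annotators disagree on one of eight items.
ann_a = ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "pass"]
ann_b = ["pass", "fail", "fail", "fail", "pass", "fail", "pass", "pass"]
print(cohen_kappa(ann_a, ann_b))  # 0.75
print(bootstrap_ci(ann_a, ann_b))
```

With only eight items the interval will be wide, which is the point of reporting it: the point estimate alone overstates how much is known about agreement.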
CI surfaces for eval validation, dataset-card checks, rubric PR feedback, and action smoke fixtures. Action runtime proof is required before workflow availability is marketed.
GitHub Action for running EvalKit validation, scoring, and reporting in CI.
GitHub Action for validating dataset cards and required metadata in pull requests.
Rubric diffs and lint feedback for pull requests that change evaluation criteria.
Public smoke tests for AuraOne Open GitHub Action workflows.
Hosted API SDKs, headless Studio engines, and the browser UI and 3D SDKs you keep. Editable source, your code. Apache-2.0.
TypeScript SDK for the AuraOne hosted API.
Python SDK and CLI for the AuraOne hosted API.
Headless Agent Studio Open protocol, trace-store, sidecar, and export CLI.
Headless Robotics Studio Open dataset adapters, QA, clustering, orchestration, and exports.
The AI-native 3D SDK for the web. Browser-native scenes and typed GLB and glTF assets.
The Liquid Glass app-surface system for React and Next.js. Your interface, your dependencies.
Rubric Studio Open VS Code and JavaScript integration package.
Homebrew and desktop Studio listings require current cask, release, and platform proof before install availability is marketed.
Homebrew listing requires current release proof.
Homebrew cask listing requires current desktop release proof.
Worked MCP, A2A, OTEL, replay, and CI examples for Agent Studio Open.
After a competitor lost four terabytes of data, including the identities of its workers, nobody wants tooling that pools their data. Open never does. These are the same QA and provenance checks that sign the evidence in Human Data OS and App Data OS. Source links are listed, and package and runtime proof are required before availability is marketed. Start open. Bring AuraOne in when the problem becomes shared authorship, approval queues, or governance across a team.