Capture schema
How a demonstration is represented on disk and on the wire, in enough detail that a training engineer on a humanoid robotics team can integrate against it without a call.
Session
A session is the top-level unit. One session per recording. A session contains one or more clips, each aligned to a task brief.
{
  "session_id": "sess_01J...",
  "operator_id": "op_01J...",
  "tenant_id": "ten_01J...",
  "task_brief_id": "tb_01J...",
  "device": {
    "model": "iphone_15_pro",
    "tier": 1,
    "has_lidar": true
  },
  "created_at": "2026-04-21T14:22:00Z",
  "consent": {
    "brief_accepted_at": "2026-04-21T14:21:58Z",
    "safety_rules_version": "v4.2"
  },
  "clips": ["clip_01J...", "clip_02J..."]
}
Clip
A clip is the reviewed unit. Safety, smoothness, and quality are scored at the clip level. Pose and trajectory are emitted per clip.
{
  "clip_id": "clip_01J...",
  "session_id": "sess_01J...",
  "duration_ms": 18420,
  "video": {
    "codec": "hevc",
    "frames": 553,
    "fps": 30,
    "resolution": [1920, 1080]
  },
  "depth": {
    "modality": "lidar",
    "frames": 553
  },
  "pose": {
    "format": "coco_wholebody",
    "frames": 553
  },
  "trajectory": {
    "end_effector_frames": 553
  },
  "environment": {
    "room_scale_mesh": true,
    "blur_enforced": true
  },
  "scoring": {
    "safety": 0.94,
    "smoothness": 0.81,
    "quality": 0.88,
    "status": "approved"
  }
}
Safety signals
Safety is scored machine-first, reviewed human-second. A reviewer can override a machine score; the override is recorded with the rationale.
- contact_severity: peak contact force, normalized by end-effector mass.
- jerk_peak: peak L2 norm of the third derivative of end-effector position over the clip.
- privacy_enforced: boolean; whether on-device blur completed before upload.
- reviewer_override: nullable; if non-null, it supersedes the machine score and the rationale is stored.
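A minimal sketch of two of these signals, under stated assumptions. jerk_peak is estimated here with a third-order forward difference over end-effector positions sampled at the clip's fps; the schema does not specify the differentiation scheme, the units, or any smoothing, so treat that choice as illustrative. The shape of reviewer_override (a dict with "score" and "rationale" keys) and both function names are hypothetical, not part of the schema.

```python
import math
from typing import Optional, Sequence


def jerk_peak(positions: Sequence[Sequence[float]], fps: float) -> float:
    """Peak L2 norm of the third derivative of position over a clip.

    Assumption: positions are per-frame end-effector coordinates at a
    constant frame rate, and the derivative is approximated with a
    third-order forward difference (p3 - 3*p2 + 3*p1 - p0) / dt**3.
    """
    if len(positions) < 4:
        raise ValueError("need at least 4 frames for a third difference")
    dt = 1.0 / fps
    peak = 0.0
    for i in range(len(positions) - 3):
        p0, p1, p2, p3 = positions[i:i + 4]
        # Per-axis third difference, scaled to a derivative estimate.
        jerk = [
            (d - 3.0 * c + 3.0 * b - a) / dt ** 3
            for a, b, c, d in zip(p0, p1, p2, p3)
        ]
        peak = max(peak, math.sqrt(sum(j * j for j in jerk)))
    return peak


def effective_safety(machine_score: float,
                     reviewer_override: Optional[dict]) -> float:
    """Return the score a consumer should use for a clip.

    Assumption: a non-null override looks like
    {"score": float, "rationale": str}; these key names are hypothetical.
    """
    if reviewer_override is None:
        return machine_score
    # A non-null override supersedes the machine score.
    return reviewer_override["score"]
```

Raw finite differences amplify tracking noise, so a pipeline that computes jerk this way would likely filter the trajectory first.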
Provenance
Every clip carries provenance back to the operator, the device, the task brief, and the safety-rules version. This is what makes the review auditable.
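One way a consumer might resolve that chain, as a sketch. It assumes provenance is recovered by joining a clip to its session through session_id, using only the fields shown in the session and clip records; the schema does not say whether provenance is also stored on the clip itself. The function name is hypothetical.

```python
from typing import Any, Dict


def provenance(clip: Dict[str, Any], session: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the audit trail for one clip from its parent session.

    Assumption: clip and session are parsed JSON records with the field
    names used in this document.
    """
    if clip["session_id"] != session["session_id"]:
        raise ValueError("clip does not belong to this session")
    if clip["clip_id"] not in session["clips"]:
        raise ValueError("session does not list this clip")
    return {
        "clip_id": clip["clip_id"],
        "session_id": session["session_id"],
        "operator_id": session["operator_id"],
        "device": session["device"],
        "task_brief_id": session["task_brief_id"],
        "safety_rules_version": session["consent"]["safety_rules_version"],
    }
```

Checking the link in both directions (the clip names the session, and the session lists the clip) catches records that were re-parented or partially uploaded.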