Mental model

Session, clip, signals, provenance.

A session is the top-level recording event. A clip is the reviewed training unit. Signals carry quality and safety decisions. Provenance keeps the operator, task brief, device, consent receipt, license scope, and rule version attached to each reviewed clip.

Session

sess_01J...

One capture event with operator, task, device, consent, license, and child clip ids.

Clip

clip_01J...

The reviewed unit used by Robotics Studio for scrub, filter, tag, and export.

Safety

machine + human

Machine scores can be overridden by a reviewer with rationale preserved.

Provenance

task · consent · license · device · rules

Every reviewed clip can explain where it came from, what rights attach, and which policy judged it.

Large-media model

Long captures become sessions, segments, parts, and derivatives.

One or two hours of phone footage should not be treated as one loose upload. AuraOne models a parent media session, segment records, multipart upload parts, raw media, proxy derivatives, and provider proof so review and buyer delivery can start from verified assets.

Media session

50 GB+ total

The parent recording event that rolls up segment progress, review outcomes, payout state, and buyer readiness.

Media segment

5-10 minute unit

The upload and review unit with consent, task, quality, raw/proxy, and segment-level payout metadata.

Media part

64-256 MB

A signed object-store part with byte range, status, ETag, expiry, and retry state.

Derivative

proxy, thumbnail, transcript, label

Review and export aids linked back to the source segment with their own checksums and lifecycle policy.
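As a rough illustration of how a segment breaks down into media parts, the sketch below splits a raw file into inclusive byte ranges. The `PartPlan` type, the `plan_parts` helper, and the 64,000,000-byte default part size are assumptions made for this example; they are not part of the AuraOne API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartPlan:
    part_number: int  # 1-based, as most multipart upload APIs number parts
    start: int        # first byte offset, inclusive
    end: int          # last byte offset, inclusive


def plan_parts(size_bytes: int, part_size: int = 64_000_000) -> list[PartPlan]:
    """Split a file of size_bytes into consecutive byte ranges (hypothetical helper)."""
    if size_bytes <= 0 or part_size <= 0:
        raise ValueError("size_bytes and part_size must be positive")
    parts = []
    offset = 0
    number = 1
    while offset < size_bytes:
        # The final part may be shorter than part_size.
        end = min(offset + part_size, size_bytes) - 1
        parts.append(PartPlan(number, offset, end))
        offset = end + 1
        number += 1
    return parts


parts = plan_parts(1_099_511_628)
print(len(parts))  # 18
```

Because each part has a fixed byte range, a failed part can be retried on its own without re-sending the rest of the segment.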

Session object

The capture envelope.

A session records the operator/device context and references one or more clips aligned to a task brief.

{
  "session_id": "sess_01J...",
  "operator_id": "op_01J...",
  "tenant_id": "ten_01J...",
  "task_brief_id": "tb_01J...",
  "device": {
    "model": "iphone_15_pro",
    "tier": 1,
    "has_lidar": true
  },
  "created_at": "2026-04-21T14:22:00Z",
  "consent": {
    "receipt_id": "consent_01J...",
    "brief_accepted_at": "2026-04-21T14:21:58Z",
    "co_resident_required": false
  },
  "license": {
    "scope_id": "lic_01J...",
    "allowed_use": ["robotics_training", "internal_eval"]
  },
  "safety_rules_version": "v4.2",
  "clips": ["clip_01J...", "clip_02J..."]
}
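A consumer of the session object might read the license scope and the provenance references as sketched below. The helper names `is_use_allowed` and `provenance_summary`, and the `"resale"` use value, are hypothetical; only the field names come from the example above.

```python
import json

# Trimmed copy of the session example above; elided ids are kept as-is.
SESSION = json.loads("""
{
  "session_id": "sess_01J...",
  "task_brief_id": "tb_01J...",
  "device": {"model": "iphone_15_pro", "tier": 1, "has_lidar": true},
  "consent": {"receipt_id": "consent_01J..."},
  "license": {
    "scope_id": "lic_01J...",
    "allowed_use": ["robotics_training", "internal_eval"]
  },
  "safety_rules_version": "v4.2",
  "clips": ["clip_01J...", "clip_02J..."]
}
""")


def is_use_allowed(session: dict, use: str) -> bool:
    """Return True when `use` is listed in the session's license scope."""
    return use in session.get("license", {}).get("allowed_use", [])


def provenance_summary(session: dict) -> dict:
    """Collect the task, consent, license, device, and rules references."""
    return {
        "task": session["task_brief_id"],
        "consent": session["consent"]["receipt_id"],
        "license": session["license"]["scope_id"],
        "device": session["device"]["model"],
        "rules": session["safety_rules_version"],
    }


print(is_use_allowed(SESSION, "robotics_training"))  # True
print(is_use_allowed(SESSION, "resale"))             # False ("resale" is a made-up value)
print(provenance_summary(SESSION)["rules"])          # v4.2
```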

Segment object

The upload envelope for large media.

Segments are independently resumable and independently reviewable. A failed segment should not force a two-hour session to start over.

{
  "media_session_id": "msess_01J...",
  "segment_id": "seg_01J...",
  "capture_id": "cap_01J...",
  "segment_index": 4,
  "started_at": "2026-06-04T18:00:00Z",
  "ended_at": "2026-06-04T18:08:52Z",
  "boundary_reason": "task_step",
  "raw": {
    "object_key": "robotics/raw/org_1/msess_01J/seg_01J.mov",
    "size_bytes": 1099511628,
    "sha256": "c5f8...",
    "provider_proof_status": "verified"
  },
  "proxy": {
    "object_key": "robotics/proxy/org_1/msess_01J/seg_01J.mp4",
    "review_ready": true
  },
  "upload": {
    "status": "verified",
    "part_count": 18,
    "missing_part_count": 0
  },
  "review": {
    "status": "proxy_ready",
    "decision": "pending"
  }
}
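One way to decide whether a segment can enter review is to list what still blocks it. The sketch below assumes that review needs a verified upload, no missing parts, verified provider proof, and a ready proxy; that rule set, the `review_blockers` helper, and the `"uploading"` status value are assumptions for illustration, and actual gating may differ by buyer policy.

```python
def review_blockers(segment: dict) -> list[str]:
    """Return human-readable reasons a segment is not ready for review."""
    blockers = []
    upload = segment.get("upload", {})
    if upload.get("status") != "verified":
        blockers.append("upload not verified")
    missing = upload.get("missing_part_count", 0)
    if missing > 0:
        blockers.append(f"{missing} part(s) missing")
    if segment.get("raw", {}).get("provider_proof_status") != "verified":
        blockers.append("provider proof not verified")
    if not segment.get("proxy", {}).get("review_ready", False):
        blockers.append("proxy not ready")
    return blockers


ready = {
    "segment_id": "seg_01J...",
    "raw": {"provider_proof_status": "verified"},
    "proxy": {"review_ready": True},
    "upload": {"status": "verified", "part_count": 18, "missing_part_count": 0},
}
# "uploading" is a hypothetical status used only for this example.
partial = {
    "segment_id": "seg_01J...",
    "raw": {"provider_proof_status": "verified"},
    "proxy": {"review_ready": True},
    "upload": {"status": "uploading", "part_count": 18, "missing_part_count": 2},
}

print(review_blockers(ready))    # []
print(review_blockers(partial))  # ['upload not verified', '2 part(s) missing']
```

Returning reasons rather than a single boolean keeps a failed segment diagnosable on its own, without touching the rest of the session.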

Clip object

The reviewed training unit.

Clips carry video, depth, pose, trajectory, environment metadata, and scoring state; the example below shows every group except pose. Robotics Studio Open can inspect local equivalents and export reviewed manifests.

{
  "clip_id": "clip_01J...",
  "session_id": "sess_01J...",
  "duration_ms": 18420,
  "video": {
    "codec": "hevc",
    "frames": 553,
    "fps": 30,
    "resolution": [1920, 1080]
  },
  "depth": {
    "modality": "lidar",
    "frames": 553
  },
  "trajectory": {
    "end_effector_frames": 553
  },
  "environment": {
    "room_scale_mesh": true,
    "blur_enforced": true
  },
  "scoring": {
    "safety": 0.94,
    "smoothness": 0.81,
    "quality": 0.88,
    "status": "approved"
  }
}
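The frame counts in a clip can be sanity-checked against its duration and frame rate. The sketch below assumes that depth and trajectory streams are expected to match the video frame count exactly and that a one-frame tolerance on the duration estimate is acceptable; both are illustrative assumptions, and `frame_issues` is a hypothetical helper.

```python
def frame_issues(clip: dict, tolerance: int = 1) -> list[str]:
    """Report frame-count inconsistencies within a clip record."""
    video = clip["video"]
    # Expected frame count from duration and frame rate, rounded to a whole frame.
    expected = round(clip["duration_ms"] / 1000 * video["fps"])
    issues = []
    if abs(video["frames"] - expected) > tolerance:
        issues.append(f"video has {video['frames']} frames, expected about {expected}")
    # Compare each per-frame stream against the video frame count.
    for group, key in (("depth", "frames"), ("trajectory", "end_effector_frames")):
        count = clip.get(group, {}).get(key)
        if count is not None and count != video["frames"]:
            issues.append(f"{group} has {count} frames, video has {video['frames']}")
    return issues


clip = {
    "clip_id": "clip_01J...",
    "duration_ms": 18420,
    "video": {"frames": 553, "fps": 30},
    "depth": {"modality": "lidar", "frames": 553},
    "trajectory": {"end_effector_frames": 553},
}
print(frame_issues(clip))  # []
```

For the example clip, 18.42 seconds at 30 fps gives 552.6, which rounds to the recorded 553 frames.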

Safety signals

Make review decisions portable.

Core fields include contact severity, jerk peak, privacy enforcement, dropped-frame status, timestamp drift, and reviewer override rationale. These fields become filterable review state and export evidence.
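A reviewer override that preserves both the machine score and the rationale could be modeled as below. The `SafetyDecision` class, its field names, and the reviewer id are hypothetical; the source only states that machine scores can be overridden with the rationale preserved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SafetyDecision:
    machine_score: float
    reviewer_score: Optional[float] = None
    reviewer_id: Optional[str] = None
    rationale: Optional[str] = None

    def override(self, reviewer_id: str, score: float, rationale: str) -> None:
        """Record a human override; the machine score is left untouched."""
        if not rationale.strip():
            raise ValueError("an override needs a rationale")
        self.reviewer_id = reviewer_id
        self.reviewer_score = score
        self.rationale = rationale.strip()

    @property
    def effective_score(self) -> float:
        """The reviewer's score when present, otherwise the machine score."""
        if self.reviewer_score is None:
            return self.machine_score
        return self.reviewer_score


decision = SafetyDecision(machine_score=0.94)
decision.override("rev_hypothetical", 0.40, "Unplanned contact near the end of the clip")
print(decision.effective_score, decision.machine_score)  # 0.4 0.94
```

Keeping the original score next to the override means an export can show which judgment was applied and why.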

Provider proof

Raw media is not deliverable until storage agrees.

For large captures, the backend records object-store proof before final review and buyer handoff: provider, bucket, safe key prefix, byte size, checksum metadata, encryption/lifecycle tags, request id, and integrity status. Mismatches block review or delivery according to buyer policy.
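The size and checksum part of this check can be sketched with the standard library alone. The `proof` dictionary shape, the `integrity_status` helper, and the `"mismatch"` value are assumptions for illustration; a real provider proof would carry the additional fields listed above.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_stream(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> tuple[str, int]:
    """Hash a binary stream in chunks; return (hex digest, total bytes read)."""
    digest = hashlib.sha256()
    size = 0
    while chunk := stream.read(chunk_size):
        digest.update(chunk)
        size += len(chunk)
    return digest.hexdigest(), size


def integrity_status(stream: BinaryIO, proof: dict) -> str:
    """Compare local bytes against the size and checksum recorded in the proof."""
    checksum, size = sha256_stream(stream)
    if size != proof["size_bytes"] or checksum != proof["sha256"]:
        return "mismatch"  # hypothetical status name
    return "verified"


payload = b"example raw media bytes"
proof = {"size_bytes": len(payload), "sha256": hashlib.sha256(payload).hexdigest()}

print(integrity_status(io.BytesIO(payload), proof))         # verified
print(integrity_status(io.BytesIO(payload + b"x"), proof))  # mismatch
```

Reading in chunks keeps memory use flat, which matters when the raw file is a gigabyte or more.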

The review unit is a clip, not a loose video.