robotrace.log_episode

The single one-shot entry point for ingesting an episode. Equivalent to start_episode(...) → upload all artifacts → finalize. Use this for the 95% case of "I have files on disk, log them and move on."

The contract is sacred

This signature is the sacred SDK contract. Once we cut 1.0.0, breaking it requires:

  • A major version bump (1.x → 2.0)
  • At least one minor release that emits a DeprecationWarning before the break ships, so existing training scripts get an early warning instead of a TypeError

Until 1.0.0 (we're at 0.1.0a6 today) we may still iterate on the shape - every change lands in the SDK changelog.
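
To make the deprecation step concrete, here is a minimal sketch of how a renamed kwarg could warn for one minor release before it is removed. It is not the SDK's code: log_run, the old kwarg policy, and the rename itself are hypothetical names chosen for illustration.

```python
import warnings


def log_run(*, policy_version=None, **legacy):
    """Hypothetical function whose `policy` kwarg was renamed to `policy_version`."""
    if "policy" in legacy:
        # The old name keeps working during the deprecation window, but warns.
        warnings.warn(
            "'policy' is deprecated; pass 'policy_version' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        old_value = legacy.pop("policy")
        if policy_version is None:
            policy_version = old_value
    if legacy:
        # Anything else unknown fails the way a normal signature would.
        raise TypeError(f"unexpected keyword arguments: {sorted(legacy)}")
    return {"policy_version": policy_version}
```

Old call sites keep running and see a warning; only the next major version would turn the old name into a TypeError.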

Already on OpenTelemetry? Calling log_episode(...) inside an active OTel span attaches trace_id / span_id / traceparent to the episode automatically - no new kwargs. See OpenTelemetry trace correlation for the install step (pip install 'robotrace-dev[otel]==0.1.0a6') and how the portal deep-links into your APM.
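
For orientation, the three values relate as follows: in the W3C Trace Context format, traceparent is a string built from the trace id and span id. The sketch below formats one from integer ids (the OpenTelemetry Python API exposes both ids as ints). The helper name format_traceparent is made up for this example, and how the SDK stores these values on the episode is not specified here.

```python
def format_traceparent(trace_id: int, span_id: int, sampled: bool = True) -> str:
    """Build a W3C Trace Context `traceparent` value (version 00)."""
    if not 0 < trace_id < 1 << 128:
        raise ValueError("trace_id must be a non-zero 128-bit integer")
    if not 0 < span_id < 1 << 64:
        raise ValueError("span_id must be a non-zero 64-bit integer")
    flags = 0x01 if sampled else 0x00
    # Layout: version - 32 hex chars of trace id - 16 hex chars of span id - flags
    return f"00-{trace_id:032x}-{span_id:016x}-{flags:02x}"


header = format_traceparent(0x4BF92F3577B34DA6A3CE929D0E0E4736, 0x00F067AA0BA902B7)
```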

Signature

def log_episode(
    *,
    # Identification
    name: str | None = None,
    source: Literal["real", "sim", "replay"] = "real",
    robot: str | None = None,
 
    # Reproducibility - load-bearing
    policy_version: str | None = None,
    env_version: str | None = None,
    git_sha: str | None = None,
    seed: int | None = None,
 
    # Artifact paths (uploaded inline via signed PUT URLs)
    video: str | Path | None = None,
    sensors: str | Path | None = None,
    actions: str | Path | None = None,
 
    # Run details
    duration_s: float | None = None,
    fps: float | None = None,
    metadata: Mapping[str, Any] | None = None,
 
    # Final state - defaults to "ready". Pass "failed" when the run
    # errored before producing usable data.
    status: Literal["ready", "failed"] = "ready",
) -> Episode

All arguments are keyword-only - positional calls raise TypeError. This is intentional: it lets us add new params without breaking older call sites.
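
The keyword-only behavior comes from the bare * in the signature. A minimal sketch with a stand-in function (log_episode_stub is not the real SDK call, and it only mirrors two of the parameters):

```python
def log_episode_stub(*, name=None, source="real"):
    """Stand-in with the same keyword-only shape as log_episode."""
    return {"name": name, "source": source}


# Keyword call: works.
ok = log_episode_stub(name="pick_place_001", source="sim")

# Positional call: Python rejects it before the function body runs.
try:
    log_episode_stub("pick_place_001")
except TypeError as exc:
    error = str(exc)
```

Because callers can never rely on argument order, a new parameter can be added anywhere in the signature without shifting the meaning of existing calls.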

Identification

name: str | None

Human-readable label for the run, shown in the episodes list. Falls back to episode_<short_id> when omitted. Use the same naming scheme across runs of the same task - it makes the list filterable.

source: "real" | "sim" | "replay"

Where the episode came from:

  • real - physical robot. The default.
  • sim - simulator (MuJoCo, Genesis, Isaac, Drake, etc.).
  • replay - generated by re-rolling a policy against a previously recorded observation stream. The eval engine sets this for you; you generally don't pass it manually.

robot: str | None

Stable identifier for the physical robot or sim configuration that produced the episode. Recommend a short slug (halcyon-bimanual-01, franka-right, ur5-cell-3) so the portal can group runs across days.

Reproducibility (load-bearing)

These four fields exist so future-you can re-roll a new policy against this episode and know what changed. Don't drop them to "simplify" - the eval engine literally can't run without them.

policy_version: str | None

A stable identifier for the policy / model checkpoint that produced this episode. Conventions we recommend:

StyleExample
SL/ILckpt_2026-05-01_step_180k
RLppo_2026-05-01_seed42
Frozen baselinebaseline_v1
VLApap-v3.2.1 (semver against the policy)

Whatever you pick - make it resolvable. The re-roll feature can only re-run a policy version it can locate, so don't put random hashes here unless your registry can map them back to weights.

env_version: str | None

The environment / world version. For sim, the build hash or config tag (mujoco_warehouse_v3, genesis-rev412). For real-world, the workcell setup version (cell_a_2026-04-12). Required so re-rolls know whether comparing across policy_versions is fair.

git_sha: str | None

The git SHA of your training/inference code at the time the episode was produced. We don't validate that the SHA exists in any specific repo - that's between you and your CI. Seven characters minimum is the convention.
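
One way to fill git_sha automatically is to ask git at launch time. This is a sketch under assumptions: the helper name current_git_sha is ours, it requires the git CLI on PATH, and it returns None rather than raising when the directory is not a repository.

```python
from __future__ import annotations

import subprocess


def current_git_sha(repo_dir: str = ".") -> str | None:
    """Return the full SHA of HEAD in `repo_dir`, or None if it can't be read."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git missing, directory missing, or not a git repository
        return None
    return result.stdout.strip() or None
```

The result can be passed straight through as git_sha=current_git_sha(); a None simply leaves the field unset.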

seed: int | None

The seed used by the policy / env. If your stack uses multiple seeds, pass the highest-level one and stash the rest in metadata.
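
A small sketch of that split, assuming you keep the extra seeds under a "seeds" key in metadata. The key name and the seed_kwargs helper are our own convention, not something the SDK defines.

```python
def seed_kwargs(primary_seed, **other_seeds):
    """Put the top-level seed in `seed` and any remaining seeds in metadata."""
    kwargs = {"seed": primary_seed}
    if other_seeds:
        # "seeds" is a hypothetical metadata key chosen for this example.
        kwargs["metadata"] = {"seeds": dict(other_seeds)}
    return kwargs


kwargs = seed_kwargs(42, env=1337, torch=7)
```

The resulting dict can be unpacked into the call alongside the other fields, merging with any metadata you already pass.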

Artifacts

Local file paths. Each is uploaded to Cloudflare R2 via a short-lived signed PUT URL - bytes never touch the RoboTrace origin server. The SDK streams from disk so memory stays flat regardless of file size.
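
To illustrate why streaming keeps memory flat, here is the kind of chunked read loop involved. It is a sketch only: the SDK's actual chunk size and HTTP client are not documented on this page, and iter_chunks is a name made up for the example.

```python
from pathlib import Path


def iter_chunks(path, chunk_size=1024 * 1024):
    """Yield the file at `path` in pieces of at most `chunk_size` bytes."""
    with Path(path).open("rb") as fh:
        # Only one chunk is held in memory at a time, whatever the file size.
        while chunk := fh.read(chunk_size):
            yield chunk
```

Peak memory is bounded by chunk_size rather than by the size of the artifact.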

video: str | Path | None

A video file (.mp4, .webm, .mov). The signed URL is minted with Content-Type: video/mp4, so the file's actual content type needs to match. Files up to 8 GB are supported in Phase 1; split episodes that would exceed that.

sensors: str | Path | None

A serialized sensor blob - typically a .npy, .npz, .h5, or .bin file containing per-step sensor arrays ((T, ...) shaped, time axis first). Format is opaque to the server: we store the bytes and let your replay tooling deserialize them.

actions: str | Path | None

A serialized actions blob - typically a .parquet, .feather, or .npy file containing the (T, action_dim) action array, one row per step. Required if you want to re-roll a different policy on this episode later.

The SDK sanity-checks file extensions against slot names. Passing actions="run.mp4" raises ConfigurationError - likely the kwargs got swapped.
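
If you want the same guard in your own pre-flight code, it can be approximated as below. The extension table is assembled from the slot descriptions on this page, so the SDK's real table may differ; ConfigurationError here is a local stand-in class and check_slot is a hypothetical helper.

```python
from pathlib import Path


class ConfigurationError(Exception):
    """Local stand-in for the SDK exception of the same name."""


# Assumed mapping, taken from the slot descriptions above.
ALLOWED_EXTENSIONS = {
    "video": {".mp4", ".webm", ".mov"},
    "sensors": {".npy", ".npz", ".h5", ".bin"},
    "actions": {".parquet", ".feather", ".npy"},
}


def check_slot(slot, path):
    """Raise ConfigurationError if `path` has an extension the slot doesn't expect."""
    suffix = Path(path).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS[slot]:
        raise ConfigurationError(
            f"{slot}={str(path)!r} has unexpected extension {suffix!r}"
        )
```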

Run details

duration_s: float | None

Wall-clock duration of the run in seconds. Shown on the detail page and used by the dashboard heatmap to weight cells.

fps: float | None

Sampling rate for the recorded sensors / actions. Used by the replay viewer (when it ships) to align video and sensor tracks.

metadata: Mapping[str, Any] | None

Free-form JSON metadata stored as metadata jsonb on the episode row. Use it for anything that doesn't fit the standard fields: operator, lighting, shift, hardware revision, task outcome, etc.

Don't put bytes or raw sensor values here - that's what the artifact slots are for. The column is indexed for JSON search but not designed for multi-MB blobs.
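
A quick client-side check can catch both mistakes before the request is sent. This is a sketch: the 64 KiB budget is a number we picked for illustration, not a documented server limit, and check_metadata is a hypothetical helper.

```python
import json

MAX_METADATA_BYTES = 64 * 1024  # hypothetical budget, not an SDK constant


def check_metadata(metadata):
    """Return the encoded size of `metadata`, or raise if it isn't small, plain JSON."""
    # json.dumps raises TypeError for bytes and other non-JSON values.
    encoded = json.dumps(metadata).encode("utf-8")
    if len(encoded) > MAX_METADATA_BYTES:
        raise ValueError(
            f"metadata is {len(encoded)} bytes; move large payloads to an artifact slot"
        )
    return len(encoded)
```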

status: "ready" | "failed"

Final state to flip the episode into. Defaults to "ready". Pass "failed" when you know the run errored before producing usable data - the episode still appears in the list but is filtered out of "recent successful runs" cards.
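
A typical pattern is to wrap the run and choose the status from whether it raised. The sketch below takes the logging function as a parameter so it can be exercised with a stand-in; run_and_log and the "error" metadata key are hypothetical names of ours.

```python
def run_and_log(run_episode, log_episode, **fields):
    """Run one episode, then log it as "ready" or "failed".

    `run_episode` returns a dict of artifact kwargs (video=..., actions=...).
    `log_episode` is injected so a stand-in can replace the real SDK call.
    """
    try:
        artifacts = run_episode()
    except Exception as exc:
        # Nothing usable was produced: record the attempt and keep the reason.
        log_episode(status="failed", metadata={"error": repr(exc)}, **fields)
        raise
    return log_episode(status="ready", **artifacts, **fields)
```

Re-raising after logging keeps the failure visible to your own job scheduler while the portal still shows the attempt.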

Return value

@dataclass
class Episode:
    id: str                               # uuid, as str
    status: str                           # "ready" or "failed"
    storage: Literal["r2", "unconfigured"]
    upload_urls: dict[ArtifactKind, UploadUrl]

You rarely need the return value from log_episode - by the time it returns, everything's already uploaded and finalized. It is mainly useful for capturing the episode id in your own logs:

ep = rt.log_episode(...)
my_logger.info("logged episode", episode_id=ep.id)

Errors

log_episode raises typed exceptions on every failure path. See Errors for the full hierarchy and recovery patterns. The most common ones in this call:

Exception            When
ConfigurationError   api_key / base_url missing, or a file path doesn't exist
AuthError            API key bad / revoked
ValidationError      Payload didn't pass server-side validation
ConflictError        (rare) Episode is somehow already archived
TransportError       Network / DNS / timeout
ServerError          5xx - flag for retries

If an upload fails partway through, the SDK auto-flips the run to status="failed" with the failure reason in metadata.failure_reason before re-raising - so you don't end up with ghostly "recording" runs in the portal.
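
For the retryable cases, a small backoff wrapper is usually enough. This is a sketch, not SDK functionality: the two exception classes are local stand-ins for the SDK's types, treating TransportError as retryable alongside ServerError is our assumption, and with_retries is a hypothetical helper.

```python
import time


class TransportError(Exception):
    """Local stand-in for the SDK's TransportError."""


class ServerError(Exception):
    """Local stand-in for the SDK's ServerError."""


def with_retries(call, attempts=3, base_delay_s=1.0, sleep=time.sleep):
    """Call `call()` and retry transient failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except (TransportError, ServerError):
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay_s * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
```

In use this would look like with_retries(lambda: rt.log_episode(...)). Since a failed attempt is flipped to status="failed" rather than removed, each retry may leave an extra failed episode behind in the portal.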

Don'ts

  • Don't call log_episode from inside your training inner loop. Rate-limit at episode boundaries, not per step.
  • Don't put episode bytes in metadata. The DB is for metadata, R2 is for bytes.
  • Don't log the API key in your training script - use environment variables. The SDK never logs the key value.
  • Don't pass positional arguments. The contract is keyword-only on purpose.
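
The first rule, sketched as a loop: step in the inner loop, log once at the episode boundary. The collect function and the naming scheme are illustrative only, and log_episode is passed in so a stand-in can replace the real call.

```python
def collect(num_episodes, steps_per_episode, step, log_episode):
    """Step the policy in the inner loop; log once per finished episode."""
    for episode_idx in range(num_episodes):
        for _ in range(steps_per_episode):
            step()  # hot path: no network calls here
        # Episode boundary: exactly one log_episode call per episode.
        log_episode(name=f"pick_place_{episode_idx:04d}")
```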