Changelog

What we shipped to the SDK.

Every release that changes how you log, replay, or store episodes. SDK calls, ingest pipeline, storage, portal - the surfaces a developer touches. Marketing-site changes land on the team blog instead.

  1. SDK 0.1.0a5.post1 - PyPI README + CLI on the tin

    PEP 440 post-release: same code as `0.1.0a5`, with a refreshed PyPI project description. The Quickstart now contrasts a portal API key with `robotrace login`, documents `robotrace whoami`, adds a CLI command table, and clarifies how credentials resolve from environment variables vs `~/.robotrace/credentials`.

    • Install pin · `pip install robotrace-dev==0.1.0a5.post1` matches this README drop. `==0.1.0a5` ships byte-identical code, so either pin works for reproducible CI.
  2. SDK 0.1.0a5 - OTel traceparent respects sampling flag

    The `traceparent` string we attach to episodes now preserves the upstream OpenTelemetry `trace_flags` byte instead of overwriting it with `01` (sampled). That matches W3C Trace Context: downstream systems that ingest the header (curl replay, sidecars, APMs) no longer contradict the customer's sampler.

    • Behavior change · Previously `capture_trace_context()` forced `…-01` even when `SpanContext.trace_flags` indicated unsampled (`…-00`). Portal deep-links still use `trace_id` / `span_id` only; only the propagation header semantics changed. Upgrade from `0.1.0a4` if you propagate `metadata.otel.traceparent` outside RoboTrace.
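    The sketch below shows the W3C layout involved. It is not the SDK's `capture_trace_context()`. The helpers take OpenTelemetry-style trace_id / span_id / trace_flags values as plain ints and copy the flags byte through unchanged, which is the behaviour this release adopts. The strict version-00 parser is an illustrative assumption of what a shape check can look like. It also shows how a downstream consumer reads the sampled bit.

```python
import re
from typing import Tuple

# W3C Trace Context, version 00: lowercase hex trace-id (32), parent-id (16), flags (2).
_TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def format_traceparent(trace_id: int, span_id: int, trace_flags: int) -> str:
    """Build a version-00 traceparent, copying the caller's flags byte unchanged."""
    # The bug fixed in 0.1.0a5 was equivalent to hard-coding flags to "01" here.
    flags = trace_flags & 0xFF
    return f"00-{trace_id:032x}-{span_id:016x}-{flags:02x}"


def parse_traceparent(header: str) -> Tuple[int, int, bool]:
    """Return (trace_id, span_id, sampled); raise ValueError if the header is malformed."""
    match = _TRACEPARENT_RE.fullmatch(header)
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    trace_id, span_id, flags = (int(group, 16) for group in match.groups())
    if trace_id == 0 or span_id == 0:
        raise ValueError("all-zero trace-id or parent-id is invalid")
    return trace_id, span_id, bool(flags & 0x01)  # bit 0 is the sampled flag
```

    Under the old behaviour, an unsampled span (flags `00`) would still have produced a header ending in `-01`. Anything replaying that header would then treat a trace the customer's sampler had dropped as sampled.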
  3. SDK 0.1.0a4 - Replay regression harness

    Re-roll a candidate policy against historical episodes for real. The customer-side runner downloads baseline actions/sensors from R2, replays them through a Python callable on the customer's own hardware, and uploads per-episode diff metrics. The portal renders the same 5-metric DiffCard the marketing site has been promising - now with actual numbers from actual training runs.
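    The sketch below shows the customer-side loop this paragraph describes. It assumes baselines are already downloaded and decoded into per-episode observation lists. The real runner pulls actions.npz / sensors.npz from R2, computes the full metric set and uploads results, none of which is modelled here. The sketch shows two rules from the bullets that follow: per-baseline exception isolation with a truncated traceback, and reading the optional `_outcome` dict from the last action. `replay_baselines`, `MAX_TRACEBACK_CHARS`, the result keys and the "completed" status are hypothetical names, not the robotrace.evals API.

```python
import traceback
from typing import Any, Callable, Dict, List, Sequence

MAX_TRACEBACK_CHARS = 2000  # hypothetical limit; the real truncation length isn't stated


def replay_baselines(
    baselines: Dict[str, Sequence[Any]],
    policy_callable: Callable[[Any], Any],
) -> List[Dict[str, Any]]:
    """Run the candidate policy over each baseline's observations, one result per baseline."""
    results: List[Dict[str, Any]] = []
    for episode_id, observations in baselines.items():
        try:
            actions = [policy_callable(obs) for obs in observations]
        except Exception:
            # Isolate the failure: record it and move on to the next baseline.
            results.append({
                "baseline_episode_id": episode_id,
                "status": "failed",
                "error": traceback.format_exc()[-MAX_TRACEBACK_CHARS:],  # keep the tail
            })
            continue
        last = actions[-1] if actions else None
        # Optional sentinel: {"_outcome": {...}} in the final action carries policy-side results.
        outcome = last.get("_outcome", {}) if isinstance(last, dict) else {}
        results.append({
            "baseline_episode_id": episode_id,
            "status": "completed",
            "num_steps": len(actions),
            "candidate_success": outcome.get("success"),
            "candidate_reward_total": outcome.get("reward_total"),
        })
    return results
```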

    • robotrace.evals · three new verbs · create_run(candidate_policy_version, baseline_episode_ids) opens a campaign and seeds one eval_results row per baseline. run_against(run, policy_callable=...) walks every baseline, fetches its actions.npz + sensors.npz via the signed-GET artifact resolver, runs the customer's policy locally, computes success / reward / collision / time-to-goal / L2 / OOD-share metrics, and posts each result. complete_run(run) triggers the server-side rollup and returns the same summary shape the portal DiffCard renders.
    • robotrace replay run · new CLI verb · Drives the same loop from the command line. Flags: --policy module:fn (importable callable, gunicorn-style), --candidate-version, --baseline-episodes ep1,ep2,… or @file.txt, --baseline-version, --dry-run (skip uploads, useful while iterating on the callable). Prints per-episode progress with clickable portal links and a final summary table. See /docs/sdk/evals for the full reference.
    • Customer-side runner, by design · Per AGENTS.md the policy weights never touch RoboTrace infrastructure - the customer's policy_callable runs on their hardware, the SDK only uploads the per-episode metric blob plus a metadata-only source="replay" episode so the portal can drill from the eval row back to its replay. Reuses the existing artifact resolver route - same RLS guard, no new bytes-egress path.
    • Per-episode safety + the _outcome sentinel · Failures inside policy_callable are caught per-baseline and recorded as status="failed" rows with a truncated traceback - one bad observation can't sink a sweep. Customers who can compute success at the policy layer return a {"_outcome": {success, reward_total, …}} dict in the last action; the runner pulls those values into the candidate columns of the metric blob so the DiffCard shows real movement instead of a delta-of-zero.
    • New portal surface · /portal/evals lists every campaign (success-delta pill, candidate vs baseline policy, status). /portal/evals/[id] shows the rollup DiffCard alongside a per-episode results table that links into each replay episode. Episode detail pages render a 'Part of eval run' pill when metadata.eval_run_id is set, so navigation closes the loop in both directions. Three new server routes (POST /api/ingest/eval-run, .../[id]/result, .../[id]/finalize) carry the ingest contract, with cross-tenant guards on every baseline_episode_id.
    • Rate-limit ergonomics on 429 · New typed RateLimitError(APIError) with a retry_after int parsed from the Retry-After header, so a robot rig that hits a quota now sees "wait 30s" instead of an opaque APIError. The SDK retries transparently at the safe call sites (start_episode create, signed-PUT uploads, evals.create_run, evals run_against per-result upserts). It waits Retry-After seconds when the header is present (capped at 30s) and uses exponential backoff (1, 2, 4s) otherwise. Episode.finalize and evals.complete_run deliberately do NOT auto-retry: the server may have processed the mutation before the 429 was sent back, and re-issuing it on future paid tiers could double-bill artifact storage. For those two calls, catch RateLimitError at the call site, sleep for exc.retry_after (or 30s if it is unset), then retry yourself. See /docs/sdk/errors#ratelimiterror-429.
    • robotrace logout --revoke · Self-revoke from the CLI. Until now logout only removed the local credentials file - the key on the server stayed alive until you opened the portal, which was the wrong default for stolen-laptop and rig-decommission scenarios. Passing --revoke POSTs to a new /api/cli/auth/revoke endpoint authenticated with the stored Bearer key, flips revoked_at on the matching client_api_keys row, then deletes the local file. The route is scoped to "the key you authenticated with" - you can't use one CLI key to revoke a sibling, which keeps the blast radius of a leaked key consistent (attacker holding key A can only kill A, never B / C). Network failure or 5xx still wipes the local file (the point of logout is a local guarantee) but exits non-zero so CI catches it. See /docs/sdk/cli-login#logout-revoke-kill-the-key-server-side-too.
    • Still planned for V1 · Webhooks (eval_run.completed), the Team-tier bullet on /pricing. Hosted runner: the eval_runs.runner_kind column is already in place, so the V1 schema bump is a default change, not a migration. Cross-run trendlines (v13 vs v14 vs v15). CI-triggered regressions.
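    The two list-style flags on `robotrace replay run` can be sketched with the standard library. `load_policy` and `parse_baseline_episodes` are hypothetical helpers, not the CLI's code. The dotted-attribute support (`module:obj.fn`) and the `@file` format (one ID per line, blank lines and `#` comments skipped) are assumptions.

```python
import importlib
from pathlib import Path
from typing import Callable, List


def load_policy(spec: str) -> Callable:
    """Resolve a gunicorn-style 'package.module:function' string to a callable."""
    module_name, sep, attr = spec.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected module:fn, got {spec!r}")
    target = importlib.import_module(module_name)
    for part in attr.split("."):  # assumption: allow module:obj.method
        target = getattr(target, part)
    if not callable(target):
        raise TypeError(f"{spec!r} does not resolve to a callable")
    return target


def parse_baseline_episodes(value: str) -> List[str]:
    """Accept 'ep1,ep2,...' or '@file.txt' and return the episode IDs in order."""
    if value.startswith("@"):
        lines = Path(value[1:]).read_text(encoding="utf-8").splitlines()
        ids = [ln.strip() for ln in lines if ln.strip() and not ln.strip().startswith("#")]
    else:
        ids = [item.strip() for item in value.split(",") if item.strip()]
    if not ids:
        raise ValueError("no baseline episode IDs given")
    return ids
```

    For example, `load_policy("os:path.join")` returns the standard library's `os.path.join`, the same way `--policy` would resolve a customer module.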
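    The retry schedule in the rate-limit bullet can be sketched under a few assumptions. `APIError` and `RateLimitError` here are local stand-ins that mirror the names in this entry; the real classes live in the SDK. Only the delta-seconds form of Retry-After is parsed. `call_with_retry` is a hypothetical helper, and injecting `sleep` keeps it testable.

```python
import time
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


class APIError(Exception):
    """Local stand-in for the SDK's base API error."""


class RateLimitError(APIError):
    """Local stand-in for the SDK's 429 error; retry_after is in seconds or None."""

    def __init__(self, message: str, retry_after: Optional[int] = None) -> None:
        super().__init__(message)
        self.retry_after = retry_after


def parse_retry_after(value: Optional[str]) -> Optional[int]:
    """Parse the delta-seconds form of Retry-After; anything else yields None."""
    if value is None or not value.strip().isdecimal():
        return None  # the HTTP-date form is not handled in this sketch
    return int(value.strip())


def call_with_retry(
    fn: Callable[[], T],
    backoff: Tuple[float, ...] = (1, 2, 4),
    max_wait: float = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError once per backoff step before re-raising."""
    attempt = 0
    while True:
        try:
            return fn()
        except RateLimitError as exc:
            if attempt >= len(backoff):
                raise  # retries exhausted: surface the 429 to the caller
            if exc.retry_after is not None:
                delay = min(exc.retry_after, max_wait)  # honour the server, capped
            else:
                delay = backoff[attempt]  # 1s, 2s, 4s
            sleep(delay)
            attempt += 1
```

    Only wrap calls that are safe to repeat. Episode.finalize and evals.complete_run are excluded for the reason given in the rate-limit bullet.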
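    The ordering guarantee of `logout --revoke` can be sketched as below: attempt the server-side revoke, always delete the local file, and exit non-zero if the revoke did not succeed. `revoke_remote` is a hypothetical injected callable standing in for the POST to /api/cli/auth/revoke. Treating every non-2xx status as a failure, and any OSError as a network failure, are assumptions.

```python
from pathlib import Path
from typing import Callable


def logout(credentials_path: Path, revoke: bool, revoke_remote: Callable[[], int]) -> int:
    """Return a process exit code; the local credentials file is deleted in every case."""
    exit_code = 0
    try:
        if revoke:
            status = revoke_remote()  # hypothetical: POST /api/cli/auth/revoke, returns HTTP status
            if not 200 <= status < 300:
                exit_code = 1  # server did not confirm the revoke
    except OSError:
        exit_code = 1  # network failure, e.g. ConnectionError
    finally:
        # Logout is a local guarantee: remove the key file even if the revoke failed.
        credentials_path.unlink(missing_ok=True)
    return exit_code
```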
  4. Episode delete cascade, admin audit log, platform polish

    Fixes the eval-baseline foreign key that blocked episode hard deletes, ships an append-only admin audit trail, and closes out the legal and docs share-surface work. The CLI login polish that shipped in SDK 0.1.0a6 has its own entry (May 16).

    • Migration 0013 · eval_results baseline CASCADE · Deleting an episode referenced as an eval-run baseline used to fail on FK RESTRICT. The baseline FK is now ON DELETE CASCADE - per-episode eval_results rows drop with the baseline episode; eval run rollups stay until you archive the run. Portal and admin delete flows work again when regression history exists.
    • Admin audit log · /admin/audit is live - append-only trail of sensitive CMS actions (access decisions, client invites, maintenance toggles, API key mint/revoke, episode deletes, role changes). Rows insert via the service role inside Server Actions; admins read through RLS. Apply migration 0014_audit_log.sql and supabase/policies/audit_log.sql if your deployment predates this ship.
    • Accessibility statement · New /accessibility route (LegalShell, sitemap, maintenance gate). It documents that the UserWay widget is lazy-loaded on the marketing site only; portal and admin ship without overlays. Linked from the footer Company column and a portal Help link for procurement conversations.
    • Legal contact routing · /terms and /privacy drop placeholder legal-entity lines and add hello@robotrace.dev for general routing alongside the existing legal@ address.
    • Docs share surfaces (P1f partial) · /docs/quickstart ships a dedicated Open Graph card plus aligned openGraph/twitter metadata (Start-here ribbon) so Slack previews read as onboarding, not the generic docs hub. /request-access, /about, and /contact got the same shared-copy metadata pass.
    • SDK version pins · Admin and portal Powered-by chips read live SDK_VERSION from apps/web/lib/sdk/version.ts (0.1.0a6 at this ship).
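    The effect of migration 0013 can be reproduced with SQLite from the Python standard library. This is an illustration only: production runs on Postgres via Supabase, and every column other than baseline_episode_id is hypothetical. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3
from typing import Tuple

SCHEMA = """
CREATE TABLE episodes (id TEXT PRIMARY KEY);
CREATE TABLE eval_results (
    id INTEGER PRIMARY KEY,
    eval_run_id TEXT NOT NULL,
    baseline_episode_id TEXT NOT NULL
        REFERENCES episodes (id) ON DELETE CASCADE  -- RESTRICT before migration 0013
);
"""


def demo_cascade() -> Tuple[int, int]:
    """Count eval_results rows before and after deleting one baseline episode."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
        conn.executescript(SCHEMA)
        conn.executemany("INSERT INTO episodes (id) VALUES (?)", [("ep-1",), ("ep-2",)])
        conn.executemany(
            "INSERT INTO eval_results (eval_run_id, baseline_episode_id) VALUES (?, ?)",
            [("run-1", "ep-1"), ("run-1", "ep-2")],
        )
        count_sql = "SELECT COUNT(*) FROM eval_results"
        before = conn.execute(count_sql).fetchone()[0]
        # Under ON DELETE RESTRICT this statement would raise sqlite3.IntegrityError.
        conn.execute("DELETE FROM episodes WHERE id = ?", ("ep-1",))
        after = conn.execute(count_sql).fetchone()[0]
        return before, after
    finally:
        conn.close()
```

    `demo_cascade()` returns `(2, 1)`: the ep-1 result row is dropped with its baseline, while ep-2's row survives.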
  5. SDK 0.1.0a6 - friendlier `robotrace login` terminal output

    CLI sign-in reads warmer and clearer: a welcome line, a reworked verification block, ANSI hints on capable TTYs, and a tighter in-place spinner. Pin `pip install robotrace-dev==0.1.0a6` for this drop.

    • Portal · The `/cli/auth` countdown now hydrates cleanly (timer starts after mount) so approving a device login no longer flashes a React mismatch in devtools.
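    One common way to emit ANSI only on capable terminals, and to draw an in-place spinner, is sketched below. The detection heuristic (real TTY, TERM not `dumb`, NO_COLOR unset or empty) and the frame set are assumptions, not a description of how `robotrace login` decides.

```python
import itertools
import os
import time
from typing import Optional, TextIO

SPINNER_FRAMES = "|/-\\"  # hypothetical frames: | / - \


def supports_ansi(stream: TextIO) -> bool:
    """Heuristic: NO_COLOR unset or empty, TERM not 'dumb', and the stream is a TTY."""
    if os.environ.get("NO_COLOR"):
        return False
    if os.environ.get("TERM") == "dumb":
        return False
    return stream.isatty()


def render_spinner(
    stream: TextIO,
    message: str,
    ticks: int,
    interval: float = 0.1,
    ansi: Optional[bool] = None,
) -> None:
    """Redraw one line in place on capable terminals; print one plain line otherwise."""
    if ansi is None:
        ansi = supports_ansi(stream)
    if not ansi:
        stream.write(message + "\n")  # logs and CI get no control codes
        return
    for frame in itertools.islice(itertools.cycle(SPINNER_FRAMES), ticks):
        # "\r" returns to column 0; ESC[K erases to end of line, so no stale characters remain.
        stream.write(f"\r\x1b[K{frame} {message}")
        stream.flush()
        time.sleep(interval)
    stream.write("\r\x1b[K")  # leave a clean line for whatever prints next
    stream.flush()
```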
  6. SDK 0.1.0a3 - LeRobot adapter

    Hugging Face LeRobot datasets are now first-class. One `pip install`, one call, every trajectory in a Hub dataset becomes its own RoboTrace episode with frame-accurate video, sensor / action NPZ files, and reward / outcome rolled into metadata.
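    Rolling frame-level outcome columns up into one metadata block per trajectory can be sketched as below. The sketch assumes each frame row is a dict carrying episode_index and frame_index. The entry does not say how the adapter reduces next.done / next.success, so taking them from the last frame is an assumption. `summarize_trajectories` and `num_frames` are hypothetical names.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, Iterable, List

Row = Dict[str, Any]


def summarize_trajectories(rows: Iterable[Row]) -> Dict[int, Row]:
    """Group frame rows by episode_index; roll outcome columns into per-trajectory metadata."""
    frames_by_episode: DefaultDict[int, List[Row]] = defaultdict(list)
    for row in rows:
        frames_by_episode[row["episode_index"]].append(row)

    metadata: Dict[int, Row] = {}
    for episode_index, frames in frames_by_episode.items():
        frames.sort(key=lambda frame: frame["frame_index"])
        last = frames[-1]
        metadata[episode_index] = {
            "num_frames": len(frames),
            # Per-frame rewards collapse into one scalar for the whole trajectory.
            "next.reward_sum": sum(frame.get("next.reward", 0.0) for frame in frames),
            # Assumption: terminal flags are taken from the final frame.
            "next.done": last.get("next.done"),
            "next.success": last.get("next.success"),
        }
    return metadata
```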

    • robotrace.adapters.lerobot · Four verbs mirror the ROS 2 adapter shape - scan_dataset (read meta only, fast Hub probe), encode_episode (write video.mp4 + sensors.npz + actions.npz for one trajectory), upload_episode (one-shot single episode), upload_dataset (bulk walk every trajectory with optional on_progress callback). Each LeRobot trajectory becomes one RoboTrace episode - natural mapping, no policy decisions to make.
    • Lean install, no torch baggage · The [lerobot] extra deliberately does NOT depend on the `lerobot` PyPI package (which would pull torch + torchvision + pyav + several CUDA wheels). We read the v2.1 on-disk format directly with pyarrow + huggingface_hub. ~20 MB install on top of the base SDK - same footprint as [ros2].
    • Auto-classification of LeRobot columns · observation.images.<cam> → video, action[.x] → actions, next.{reward,done,success,*} → episode-level metadata, observation.* + unknown columns → sensors. Internal LeRobot bookkeeping columns (timestamp, frame_index, etc.) are filtered out. Multi-camera datasets are tiled horizontally into one video. Pass canonical_camera=... to pin one camera and skip the OpenCV path entirely; the single-camera path copies the source mp4 byte-for-byte.
    • Episode outcome surfaces in metadata · next.reward gets summed into a single per-trajectory next.reward_sum on the episode's metadata block, alongside next.done / next.success. Training pipelines can read it without unpacking the actions NPZ.
    • v3.0 dataset format · Multi-episode parquet shards (LeRobot v3.0, late 2025) are NOT yet supported - the adapter raises a clear ConfigurationError pointing at the v2.1 revision fallback. Most public lerobot/* Hub datasets are still v2.1 as of this release; v3.0 lands in a follow-up once we see real-user demand.
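    The column routing in the auto-classification bullet can be expressed as an ordered prefix match. `classify_columns` is a hypothetical helper. The bookkeeping set holds only the two columns this entry names, because the adapter's full filter list is elided ("etc."). action[.x] is read as "action" or any "action." prefix.

```python
from typing import Dict, Iterable, List

# Only the bookkeeping columns named in this entry; the adapter's full list is elided.
BOOKKEEPING = {"timestamp", "frame_index"}


def classify_columns(columns: Iterable[str]) -> Dict[str, List[str]]:
    """Route LeRobot column names into RoboTrace buckets using ordered prefix rules."""
    buckets: Dict[str, List[str]] = {"video": [], "actions": [], "metadata": [], "sensors": []}
    for name in columns:
        if name in BOOKKEEPING:
            continue  # internal bookkeeping, dropped
        if name.startswith("observation.images."):
            buckets["video"].append(name)  # one camera stream per column
        elif name == "action" or name.startswith("action."):
            buckets["actions"].append(name)
        elif name.startswith("next."):
            buckets["metadata"].append(name)  # reward / done / success and friends
        else:
            buckets["sensors"].append(name)  # observation.* and anything unrecognised
    return buckets
```

    Rule order matters: observation.images.* must be tested before the catch-all, or camera columns would fall into sensors alongside observation.state.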
  7. SDK 0.1.0a2, R2 storage, and portal polish

    A long day of shipping. Storage went from local-only to a real Cloudflare R2 bucket behind signed URLs, the SDK earned its first OpenTelemetry release, and the portal closed three of its biggest day-1 friction points.

    • SDK 0.1.0a2 · OpenTelemetry trace correlation · New optional [otel] extra (opentelemetry-api only - no heavy SDK). When the SDK detects an active span, it attaches trace_id / span_id / traceparent to every start_episode call. Server validates the W3C trace-context shape and persists it on the episode. The portal episode page renders a Tracing card with copy buttons and an optional one-click 'Open trace' deep-link via NEXT_PUBLIC_TRACE_URL_TEMPLATE (Datadog, Honeycomb, Grafana Tempo, Jaeger). Zero new kwargs - turn it on by installing the extra.
    • Cloudflare R2 wired end-to-end · Episode bytes (.mp4, .npz, .parquet) now flow from the SDK straight to a real R2 bucket via signed PUT URLs minted by the ingest route. The bucket stays private - the DB stores canonical R2 object keys, and a new /api/episodes/[id]/artifact/[kind] route handler mints fresh 1-hour signed GET URLs on every read, gated by the caller's tenant.
    • Episode delete in portal + admin · Three-dot row menu on the episode list (and a matching admin variant) with Archive / Restore / Delete. Delete asks you to type DELETE in a confirmation dialog, which makes accidental loss expensive. Note: bytes in R2 are not yet swept; deleting the row clears the DB record only. A reaper worker is on the roadmap.
    • Demo episode for empty portal · First-time-approved users used to land on 'No episodes yet'. They now land on the same empty state, but the preview row is a real clickable Sample run - clicking opens a canonical, read-only sample episode with a synthetic pick-and-place video. Implemented as a sentinel-UUID short-circuit at three boundaries (list page, detail page, artifact resolver) - no migration, no fake DB row, gated behind DEMO_EPISODE_VIDEO_KEY so unseeded deployments can't ship a broken player.
    • Profile vs. Workspace · Settings used to have one editable name field that - silently - also wrote to the workspace name when the caller was the owner. So 'Acme Robotics' became both your personal name (greeting: 'Good afternoon Acme') and your workspace label. The two are now split into a Profile card (personal display name) and a Workspace card (owner-only rename), with clear copy explaining which is which.
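    The sentinel-UUID short-circuit from the demo-episode bullet can be illustrated in Python, although the portal's real boundaries (list page, detail page, artifact resolver) live in the web app. The sentinel value, `get_episode` and the returned fields are hypothetical. Gating on DEMO_EPISODE_VIDEO_KEY follows this entry.

```python
import os
import uuid
from typing import Any, Callable, Dict, Optional

Episode = Dict[str, Any]

# Hypothetical sentinel; the portal's real sentinel UUID is not given in this entry.
DEMO_EPISODE_ID = uuid.UUID("00000000-0000-4000-8000-000000000000")


def get_episode(
    episode_id: str, load_from_db: Callable[[str], Optional[Episode]]
) -> Optional[Episode]:
    """Resolve an episode, answering the demo ID without touching the database."""
    try:
        parsed = uuid.UUID(episode_id)
    except ValueError:
        return None  # malformed IDs are treated as "not found"
    if parsed == DEMO_EPISODE_ID:
        # Gate: an unseeded deployment has no demo video, so hide the sample entirely.
        if not os.environ.get("DEMO_EPISODE_VIDEO_KEY"):
            return None
        return {"id": str(parsed), "title": "Sample run", "read_only": True}
    return load_from_db(str(parsed))
```

    Because the check runs before any query, no migration or fake row is needed. The same comparison would be repeated at each boundary that can receive the sentinel ID.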
  8. ROS 2 adapter - rosbag2 in, episode out

    • ROS 2 adapter (rosbag2 → episode) · The packages/sdk-python [ros2] extra is no longer empty. New scan_bag / encode_bag / upload_bag helpers walk a rosbag2 directory, encode camera topics to .mp4 with OpenCV, and hand the result to the same upload pipeline the rest of the SDK uses. ROS 2 Humble and Jazzy are supported, and rclpy is not needed at runtime, so you can read bags without a sourced ROS environment.
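    Choosing which topics to encode can be sketched from a bag's metadata.yaml once it has been parsed into a dict. YAML loading is left out to keep the block stdlib-only. The rosbag2_bagfile_information / topics_with_message_count layout is the standard rosbag2 metadata format. Treating sensor_msgs/msg/Image and sensor_msgs/msg/CompressedImage as the camera types is an assumption about what scan_bag selects, and `camera_topics` is a hypothetical helper.

```python
from typing import Any, Dict

# Assumption: these message types count as camera streams.
CAMERA_TYPES = {"sensor_msgs/msg/Image", "sensor_msgs/msg/CompressedImage"}


def camera_topics(bag_metadata: Dict[str, Any]) -> Dict[str, int]:
    """Map each camera topic in a parsed rosbag2 metadata.yaml to its message count."""
    info = bag_metadata["rosbag2_bagfile_information"]
    cameras: Dict[str, int] = {}
    for entry in info.get("topics_with_message_count", []):
        topic = entry["topic_metadata"]
        if topic["type"] in CAMERA_TYPES:
            cameras[topic["name"]] = entry.get("message_count", 0)
    return cameras
```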