Early access · Observability for AI robots

The black box recorder for AI robots.

RoboTrace records every robot run - video, sensors, actions, model version, code version, and environment - so teams can replay failures, compare models, and catch regressions before they reach real hardware.

“We stopped shipping blind regressions to the robot.”
Replay-grade telemetry means something concrete when every run ships with full context - not thumbnails in Slack.

Canonical install · v0.1.0a6

python3 -m pip install robotrace-dev

The PyPI package is robotrace-dev; import robotrace in code. Tabs use python3 -m pip / py -m pip so the install targets the interpreter you intend.

robotrace login · ROS 2 native · LeRobot ready · Open SDK on GitHub
The release gate

If you can't replay it, your robot isn't ready for production.

Production-grade observability isn't a nice-to-have - it's the bar a robot run has to clear before it counts as shippable. A run is replayable when the recording is enough to reconstruct it byte-for-byte, months later, on a fresh laptop, by someone who wasn't in the room.

  • 01

    What the robot saw

    Every camera stream, frame-aligned to the moment the policy made each decision. The same .mp4 your model was looking at - not a thumbnail, not a screenshot.

  • 02

    What it felt and did

    Every joint state, IMU sample, force reading on the input side; every command vector on the output side. Time-aligned, no manual stitching.

  • 03

    Why it did it

    policy_version, env_version, git_sha, seed. The exact weights, the exact code, the exact RNG that produced this run - so a year from now you can re-roll a new candidate against it.

That's what RoboTrace captures on every log_episode(...). If a run can't be replayed from those four things alone, we treat it as a bug - not a missing feature.
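A minimal sketch of the "why it did it" record, assuming nothing about SDK internals. The RunContext class and its helpers are hypothetical illustrations, not RoboTrace API, and the example values are made up; only the four field names come from the text above.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class RunContext:
    """Hypothetical container for the provenance fields named above."""
    policy_version: Optional[str] = None
    env_version: Optional[str] = None
    git_sha: Optional[str] = None
    seed: Optional[int] = None

    def missing_fields(self) -> list:
        # A field counts as missing when it was never set.
        return [name for name, value in asdict(self).items() if value is None]

    def is_replayable(self) -> bool:
        # Treat a run as replayable only when every provenance field is present.
        return not self.missing_fields()


# Illustrative values only.
complete = RunContext("vla-7b@v12", "tabletop-v3", "9f2c1ab", 1234)
partial = RunContext(policy_version="vla-7b@v12", seed=1234)

print(complete.is_replayable())  # True
print(partial.missing_fields())  # ['env_version', 'git_sha']
```

A check like this, run before a recording is accepted, is one way to make "can't be replayed" a hard failure instead of something discovered months later.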

Why RoboTrace

Yes, your robot already writes logs. Here's why that isn't enough to ship a policy.

Every robotics team starts the same way - rosbags piling up on a dev machine, an MP4 from the wrist cam in someone's Downloads folder, regression checks in a notebook nobody can rerun. It works until a model regresses in the field and you can't tell which run, which sensor, or which scene broke it.

Built for robotics ML teams shipping to hardware

Who gets mileage first: imitation / RL / VLA engineers wiring real logs into training - robotics platform leads bridging sim, datasets, and the floor - teams already on ROS 2 or HF LeRobot and tired of rewriting the same ingest scaffolding every quarter.

VLA · imitation · RL engineers
Platform · infra leads
ROS 2 + LeRobot workloads

Python SDK on PyPI

Install mirrors docs - wired for signed uploads to object storage.

Rosbag · LeRobot ready

ROS 2 bag scans + HF LeRobot v2.1 → episodes without heavyweight extras.

Replay regression harness

Candidate policies rerun against baseline episodes inside your own infra.

Invite-first Phase 1

Access is approved by hand - we onboard every org before they ship real bytes.

What you already have

Logs your robot writes today - every team starts here, and almost everyone hits the same wall three months later.

  • Rosbags, MP4s, and joint-encoder CSVs sit in different files at different rates. Replaying “what the robot saw at t=14.3s” means stitching them together by hand.
  • 30 GB rosbags live on the laptop that recorded them. They never get backed up, never get searched, and never make it into training data.
  • A new policy can’t run against a rosbag - a rosbag is a recording, not an environment. Re-rolling a candidate means putting the robot back on the table.
  • Regression checks are a notebook on someone’s laptop. When v8 breaks reflective objects, you hear about it from a customer email.
  • “Which policy ran on which robot during the Tuesday demo?” is a 4-hour archaeology session through Slack DMs and shared drives.
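To make the first bullet concrete, here is roughly what stitching by hand looks like: a nearest-timestamp lookup across streams recorded at different rates. This is a generic sketch, not RoboTrace code; nearest_index is a hypothetical helper, and the stream rates and durations are made up.

```python
from bisect import bisect_left


def nearest_index(timestamps, t):
    """Index of the sample in sorted `timestamps` closest to time t (seconds)."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbour is closer to t.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


# Illustrative streams at different rates (values are made up).
video_ts = [k / 30 for k in range(600)]    # 30 fps camera, 20 s
joint_ts = [k / 100 for k in range(2000)]  # 100 Hz joint encoder, 20 s

t = 14.3
frame = nearest_index(video_ts, t)
sample = nearest_index(joint_ts, t)
print(f"t={t}s -> video frame {frame}, joint sample {sample}")
```

Every additional stream needs the same lookup, plus agreement on whose clock counts as t=0, which is where the hand-built version usually goes wrong.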

What RoboTrace adds

Recommended

One pip install behind the ingest path you'd eventually have to build yourself - without losing the quarter to plumbing.

  • One log_episode(...) call ships video, sensors, and motor commands as a single synced episode. Resumable uploads, signed URLs, and R2 storage are already wired up.
  • Every episode is searchable by robot, policy version, scene, and git SHA. The dataset that trained v8 is still there, byte-identical, in week 12.
  • Re-roll any candidate policy against thousands of past episodes. Per-step performance deltas, side-by-side, before the robot leaves the rack.
  • Out-of-distribution alerts and crash replays the moment a deployed robot drifts. You hear about regressions from RoboTrace, not from your customer.
  • Tag, slice, and snapshot runs into reproducible training sets. No git-LFS, no “Maria sent it in Slack on March 12,” no archaeology.

Your engineers' job is the policy that makes the robot do the right thing. RoboTrace is everything between that policy and the engineer - the SDK, the storage, the replay, the regression harness - so the next quarter goes into the model, not into rebuilding the dashboard your team complains about every standup.

The product

One platform. Four problems your team has been duct-taping for months.

RoboTrace is the single surface for replay, regressions, datasets, and ops - without gluing FFmpeg folders, Jupyter reruns, and bespoke Slack bots together every sprint.

We didn't need louder dashboards - we needed the same synced episode twice: once when we debated a model change, once when hardware in the field disagreed with the spreadsheet.
Phrasing we hear across invite onboarding - summarized and anonymized; not a named customer endorsement.
01

Replay every robot run

Walk through a synced timeline of video, sensors, and motor commands. Pause on the exact frame your robot did the wrong thing.

  • Multi-camera support
  • Frame-accurate playback
  • Bookmark moments
02

Test before you ship

Run a new model against thousands of past runs. See exactly where it improves and where it regresses - before it touches a real robot.

  • Side-by-side model diffs
  • Per-step performance delta
  • Regression dashboards
03

Versioned datasets

Tag, slice, and snapshot runs into reproducible training sets. Reach for the same data weeks later without juggling git-LFS or shared drives.

  • Tag and filter runs
  • Saved slices
  • S3-compatible exports
04

Fleet observability

Stream live telemetry from your deployed robots. Get notified the moment a model behaves out of distribution or crashes in the field.

  • Out-of-distribution alerts
  • Crash replays
  • Per-robot drilldowns
How it works

From pip install to your first synced replay URL in five minutes.

The pillars above assume episodes that survive the lab. Here is how a team earns them without rewiring training: bolt the CLI onto the rollout you already ship, mint a replay link when ingest finishes, review motion and sensors in one synced portal pane, then line up robotrace replay run batches against your archive before you book the arm for another slot.

  • STEP 01

    Install and log in

    One pip install plus one `robotrace login` and your machine is authorized - no copy-pasting API keys out of a portal. The CLI opens your browser, you click Authorize, and the credentials drop into ~/.robotrace/credentials.

    Terminal · bash
    $ python3 -m pip install robotrace-dev
    Successfully installed robotrace-dev-0.1.0a6
    
    $ robotrace login
    
    Welcome to RoboTrace!
    RoboTrace - sign in via your browser
    https://app.robotrace.dev
    
    Verification link (already includes your user code):
    [robotrace] https://app.robotrace.dev/cli/auth?code=XKDF-PQ4N
    Opening your default browser…
    
    Confirm this matches the page before Authorize
      XKDF-PQ4N
    
    Signed in as art@robotrace.dev
      Credentials · ~/.robotrace/credentials
      Profile · default
      Portal · https://app.robotrace.dev/portal
    
  • STEP 02

    Wrap your run loop

    Three lines around the loop you already have - LeRobot rollouts, ROS 2 bags, a Gymnasium env, or bespoke training glue. Once the episode opens, the SDK prints a replay URL you paste into Slack: one permalink to the synced run your whole team rewatches.

    train.py - output
    # import robotrace as rt
    # 
    # with rt.start_episode(name="pick-and-place v3",
    #                        policy_version="vla-7b@v12") as ep:
    #     rollout(env, policy, ep)
    #     ep.upload_video('./run.mp4')
    
    $ python train.py
    [robotrace] https://app.robotrace.dev/portal/episodes/ep_8f3c
    uploading run.mp4 … 100%   sensors.bin … 100%
    Episode ready in 4.2s.
    
  • STEP 03

    Open the dashboard

    Runs surface in your project within seconds - scrub frame-accurate video with ingest-time sensors and torque traces pinned to one playhead, bookmark the frame where things went wrong, and hand teammates the identical scrub via a permalink.

  • STEP 04

    Test on past runs

    Once runs live in object storage instead of a disappearing laptop folder, rerun them offline with robotrace replay run and robotrace.evals. Seed baseline episode IDs inline or from file, attach your callable so checkpoints stay on your runners, publish metrics upward, then read the DiffCard under /portal/evals before the next torque pass.

    robotrace replay run \
      --policy my_team.policies:candidate_v13 \
      --candidate-version pap-v13 \
      --baseline-episodes a1b2c3d4-e5f6-7890-abcd-ef1234567890 \
        f9e8d7c6-b5a4-3210-fedc-ba0987654321 \
      --baseline-version pap-v12 \
      --name nightly-vs-v13
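
The --policy value above looks like the common module:attribute convention for pointing a CLI at a Python callable. Assuming that is how it is read (the CLI's actual loader is not shown on this page), a spec string like that can be resolved as sketched here. resolve_callable is a hypothetical helper, demonstrated against the standard library so it runs anywhere.

```python
import importlib


def resolve_callable(spec: str):
    """Resolve a 'package.module:attribute' string to the object it names."""
    module_name, sep, attr = spec.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    module = importlib.import_module(module_name)
    obj = getattr(module, attr)
    if not callable(obj):
        raise TypeError(f"{spec!r} does not name a callable")
    return obj


# Standard-library target standing in for something like my_team.policies:candidate_v13.
sqrt = resolve_callable("math:sqrt")
print(sqrt(9.0))  # 3.0
```

Under that assumption, the module has to be importable from wherever the command runs, so it helps to confirm the import works in a plain Python shell first.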
Synced episode replay

Every frame. Every joint. Every action.

Scrub once and the whole run stays locked: cameras, joint torque, and policy output share one timeline. Pause where something breaks, bookmark that instant, and paste the link - whoever opens it sees the same frame you did.

  • One scrubber

    Every stream shares the same clock - video, sensors, and actions move together.

  • Shareable bookmarks

    Copy a link with ?t=…ms and teammates land on your exact frame.

  • Inspect before re-roll

    Find the failure in the portal before you queue another eval or book the arm.
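
A small sketch of how a ?t=…ms bookmark relates to the playhead, using the demo values shown on this page. The helper names are hypothetical, and the only assumption about the link format is what the page itself displays: an offset in milliseconds passed as a t query parameter.

```python
from urllib.parse import urlencode


def playhead_to_ms(playhead: str) -> int:
    """Convert an 'MM:SS.mmm' playhead string to whole milliseconds."""
    minutes, seconds = playhead.split(":")
    whole, _, frac = seconds.partition(".")
    # Pad or trim the fractional part to exactly three digits.
    millis = int(frac.ljust(3, "0")[:3]) if frac else 0
    return (int(minutes) * 60 + int(whole)) * 1000 + millis


def bookmark_url(episode_url: str, playhead: str) -> str:
    # Assumed format: the offset travels as a `t` query parameter in ms.
    offset = f"{playhead_to_ms(playhead)}ms"
    return f"{episode_url}?{urlencode({'t': offset})}"


url = bookmark_url("https://app.robotrace.dev/portal/episodes/ep_8f3c", "04:21.042")
print(url)  # https://app.robotrace.dev/portal/episodes/ep_8f3c?t=261042ms
```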

Example episode replay: reference camera on the left, synced sensor tracks on the right, one shared scrubber at the bottom, and a shareable bookmark link for the current frame.
app.robotrace.dev/portal/episodes/ep_8f3c · pick-and-place-v3
Shared moment · Paused at 04:21.042
?t=261042ms · anyone on your team opens this frame

Reference video

Live replay
Front camera · 720p · 30 fps
04:21 / 06:48
Pick · Place · gripper tracked

Synced signals

  • Front camera: Main RGB feed
  • Wrist camera: Close-up angle
  • Joint torque: Right arm · 1 kHz
  • Policy actions: Model output per step

Shared timeline

04:21 / 06:48

Demo run: pick-and-place with two cameras, joint torque, and policy actions on one scrubber. The playhead you see moves all four tracks and the arm animation together.

Test on past runs

Don't ship a regression to a real robot.

Every run RoboTrace records becomes a test case. Use robotrace.evals or robotrace replay run to replay a candidate against thousands of baseline episodes, see exactly where it does better and where it does worse, and land the rollup in /portal/evals - without booking another hour on the arm.

  • No extra rack time

    Re-roll candidates against archived observations - train the comparison loop without booking the arm.

  • Compare labeled policy versions

    Baseline and candidate versions roll up side-by-side in the portal DiffCard - same metrics every run.

  • Curate any baseline list

    Pass episode IDs inline or from a file - build the slice however you query your archive today.

  • Weights stay on your machines

    Signed URLs fetch tensors for replay; your callable executes locally - RoboTrace only receives the metric payloads.
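
One way to picture the "weights stay on your machines" split is a local loop that runs the candidate over archived observations and hands back only aggregate numbers. This is a hedged sketch, not the robotrace.evals API: evaluate_locally, the episode layout, and the action-error metric are all assumptions made for illustration.

```python
from statistics import mean


def evaluate_locally(policy, episodes):
    """Run `policy` over archived observations; return only aggregate metrics."""
    per_episode_error = []
    for episode in episodes:
        # Compare the candidate's action with the recorded action at each step.
        errors = [
            abs(policy(step["observation"]) - step["action"])
            for step in episode["steps"]
        ]
        per_episode_error.append(mean(errors))
    # Only these numbers would be published; weights and tensors stay local.
    return {
        "episodes": len(episodes),
        "mean_action_error": round(mean(per_episode_error), 4),
    }


# Toy archive and a stand-in policy (both made up for illustration).
archive = [
    {"steps": [{"observation": 1.0, "action": 2.0},
               {"observation": 2.0, "action": 4.0}]},
    {"steps": [{"observation": 3.0, "action": 6.5}]},
]


def candidate(obs):
    return 2.0 * obs


payload = evaluate_locally(candidate, archive)
print(payload)  # {'episodes': 2, 'mean_action_error': 0.25}
```

The point of the shape is that the returned payload contains no observations, actions, or weights, only the rollup.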

Eval · 1,000 episodes · pick-and-place-v3 · DURATION 03:42

Metric                 v12      v13      Δ
Success rate           62.4%    78.1%    +15.7
Avg. reward / step     0.412    0.521    +0.109
Collision rate         8.2%     3.1%     −5.1
Time-to-goal (s)       14.7     11.2     −3.5
OOD action share       1.9%     0.4%     −1.5

Recommend: ship v13 · 5 of 5 better
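
The "5 of 5 better" line follows from the deltas once each metric has a direction. A minimal sketch of that rollup, using the numbers from the card above: the higher_is_better flags are an assumption about each metric, and this is not how the portal is known to compute it.

```python
# (name, baseline v12, candidate v13, higher_is_better)
# Values come from the card above; the direction flags are assumed.
METRICS = [
    ("Success rate (%)", 62.4, 78.1, True),
    ("Avg. reward / step", 0.412, 0.521, True),
    ("Collision rate (%)", 8.2, 3.1, False),
    ("Time-to-goal (s)", 14.7, 11.2, False),
    ("OOD action share (%)", 1.9, 0.4, False),
]


def rollup(metrics):
    """Return (name, delta, improved) for each metric."""
    rows = []
    for name, base, cand, higher_is_better in metrics:
        delta = round(cand - base, 3)
        improved = delta > 0 if higher_is_better else delta < 0
        rows.append((name, delta, improved))
    return rows


rows = rollup(METRICS)
better = sum(improved for _, _, improved in rows)
for name, delta, improved in rows:
    print(f"{name:<22}{delta:+.3f}  {'better' if improved else 'worse'}")
print(f"{better} of {len(rows)} better")  # 5 of 5 better
```

Flip any one direction flag and the count changes, which is why the direction of each metric is worth writing down next to the number.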
Integrations

Plays nicely with your stack.

Bring your simulator, policy, or dataset format. RoboTrace ships lean optional extras ([ros2], [lerobot]) so heavy dependencies never land on rigs that do not need them. Everything else can go straight through log_episode with NumPy-friendly buffers and paths.

On the roadmap
  • MuJoCo
  • Isaac Sim
  • Genesis
  • Gymnasium
  • Hugging Face
FAQ

The questions we get every week.

No buzzwords - what robotics teams actually ask before they wire log_episode into training.

5 lines of Python

Start logging in minutes.

Join the early-access cohort. We're onboarding teams one at a time so we can stay close to the painful parts. Tell us what you're building and we'll get back to you this week.

ROS 2 native · LeRobot ready · Open SDK

Phase 1 · Free for early-access teams · English only