Early access · Observability for AI robots

The black box recorder for AI robots.

RoboTrace records every robot run - video, sensors, actions, model version, code version, and environment - so teams can replay failures, compare models, and catch regressions before they reach real hardware.

“We stopped shipping blind regressions to the robot.”

Replay-grade telemetry means something concrete when every run ships with full context, not thumbnails in Slack.

Canonical install · v0.1.0a6

The PyPI package is robotrace-dev; in code you import robotrace. The install tabs use python3 -m pip / py -m pip so the install targets the interpreter you intend.

Get early access · Read the docs

robotrace login · ROS 2 native · LeRobot ready · Open SDK on GitHub


The release gate

If you can't replay it, your robot isn't ready for production.

Production-grade observability isn't a nice-to-have - it's the bar a robot run has to clear before it counts as shippable. A run is replayable when the recording is enough to reconstruct it byte-for-byte, months later, on a fresh laptop, by someone who wasn't in the room.

01
What the robot saw
Every camera stream, frame-aligned to the moment the policy made each decision. The same .mp4 your model was looking at - not a thumbnail, not a screenshot.
02
What it felt and did
Every joint state, IMU sample, force reading on the input side; every command vector on the output side. Time-aligned, no manual stitching.
03
Why it did it
policy_version, env_version, git_sha, seed. The exact weights, the exact code, the exact RNG that produced this run - so a year from now you can re-roll a new candidate against it.

That's what RoboTrace captures on every log_episode(...). If a run can't be replayed from those four things alone, we treat it as a bug - not a missing feature.
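
As a rough illustration of that rule, the sketch below checks whether an episode record carries the provenance fields named above. The `EpisodeRecord` class and both helper functions are hypothetical names invented for this example and are not part of the RoboTrace SDK; only the field names policy_version, env_version, git_sha, and seed come from this page, and the sample values are made up.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class EpisodeRecord:
    """Hypothetical provenance record; not a RoboTrace SDK type."""
    policy_version: Optional[str] = None
    env_version: Optional[str] = None
    git_sha: Optional[str] = None
    seed: Optional[int] = None


def missing_provenance(record: EpisodeRecord) -> list[str]:
    """Return the names of provenance fields that are unset."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


def is_replayable(record: EpisodeRecord) -> bool:
    """Treat a run as replayable only when every field is present."""
    return not missing_provenance(record)


# Illustrative values only.
complete = EpisodeRecord("vla-7b@v12", "tabletop@3", "a1b2c3d", seed=0)
partial = EpisodeRecord(policy_version="vla-7b@v12", git_sha="a1b2c3d")
print(is_replayable(complete))      # True
print(missing_provenance(partial))  # ['env_version', 'seed']
```

A check like this could run in CI before a run is archived, so an episode without a seed or git SHA is flagged as a bug rather than discovered months later.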

Why RoboTrace

Yes, your robot already writes logs. Here's why that isn't enough to ship a policy.

Every robotics team starts the same way - rosbags piling up on a dev machine, an MP4 from the wrist cam in someone's Downloads folder, regression checks in a notebook nobody can rerun. It works until a model regresses in the field and you can't tell which run, which sensor, or which scene broke it.

Built for robotics ML teams shipping to hardware

Who gets mileage first: imitation / RL / VLA engineers wiring real logs into training; robotics platform leads bridging sim, datasets, and the floor; and teams already on ROS 2 or HF LeRobot who are tired of rewriting the same ingest scaffolding every quarter.

VLA · imitation · RL engineers
Platform · infra leads
ROS 2 + LeRobot workloads

Python SDK on PyPI

Install mirrors docs - wired for signed uploads to object storage.

Rosbag · LeRobot ready

ROS 2 bag scans + HF LeRobot v2.1 → episodes without heavyweight extras.

Replay regression harness

Candidate policies rerun against baseline episodes inside your own infra.

Invite-first Phase 1

Access is approved by hand - we onboard every org before they ship real bytes.

What you already have

Logs your robot writes today - every team starts here, and almost everyone hits the same wall three months later.

Rosbags, MP4s, and joint-encoder CSVs sit in different files at different rates. Replaying “what the robot saw at t=14.3s” means stitching them together by hand.
30 GB rosbags live on the laptop that recorded them. They never get backed up, never get searched, and never make it into training data.
A new policy can’t run against a rosbag - a rosbag is a recording, not an environment. Re-rolling a candidate means putting the robot back on the table.
Regression checks are a notebook on someone’s laptop. When v8 breaks reflective objects, you hear about it from a customer email.
“Which policy ran on which robot during the Tuesday demo?” is a 4-hour archaeology session through Slack DMs and shared drives.
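
To make the hand-stitching problem concrete, here is a minimal sketch of what answering "what did the robot see at t=14.3s" involves when streams are sampled at different rates: a nearest-timestamp lookup per stream. The stream names and sample rates are illustrative assumptions, and this is not how RoboTrace itself aligns data.

```python
from bisect import bisect_left


def nearest_index(timestamps: list[float], t: float) -> int:
    """Index of the sample closest to time t (timestamps must be sorted)."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbour is closer to t.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def snapshot(streams: dict[str, list[float]], t: float) -> dict[str, int]:
    """For each stream, the index of the sample nearest to time t."""
    return {name: nearest_index(ts, t) for name, ts in streams.items()}


# Illustrative streams: 30 fps video and a 100 Hz joint encoder, 20 s each.
streams = {
    "video": [i / 30 for i in range(600)],
    "joints": [i / 100 for i in range(2000)],
}
print(snapshot(streams, 14.3))  # {'video': 429, 'joints': 1430}
```

Even this toy version assumes every file shares one clock; in practice clock offsets between recorders are usually the harder part of the stitching.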

What RoboTrace adds

Recommended

One pip install behind the ingest path you'd eventually have to build yourself - without losing the quarter to plumbing.

One log_episode(...) call ships video, sensors, and motor commands as a single synced episode. Resumable uploads, signed URLs, and R2 storage are already wired up.
Every episode is searchable by robot, policy version, scene, and git SHA. The dataset that trained v8 is still there, byte-identical, in week 12.
Re-roll any candidate policy against thousands of past episodes. Per-step performance deltas, side-by-side, before the robot leaves the rack.
Out-of-distribution alerts and crash replays the moment a deployed robot drifts. You hear about regressions from RoboTrace, not from your customer.
Tag, slice, and snapshot runs into reproducible training sets. No git-LFS, no “Maria sent it in Slack on March 12,” no archaeology.
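
One way to think about "byte-identical in week 12" is a content-hash manifest: record a digest per file when the snapshot is taken, then recompute later and compare. The sketch below uses only the Python standard library; the function names and the file layout are assumptions for illustration, and the page does not say how RoboTrace verifies snapshots internally.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large recordings fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


# Demo on a throwaway directory standing in for a dataset snapshot.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "ep_0001.bin").write_bytes(b"\x00\x01\x02")
    before = manifest(root)
    unchanged = manifest(root) == before
    (root / "ep_0001.bin").write_bytes(b"\x00\x01\x03")  # simulate drift
    drifted = manifest(root) != before

print(unchanged, drifted)  # True True
```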

Your engineers' job is the policy that makes the robot do the right thing. RoboTrace is everything between that policy and the engineer - the SDK, the storage, the replay, the regression harness - so the next quarter goes into the model, not into rebuilding the dashboard your team complains about every standup.

The product

One platform. Four problems your team has been duct-taping for months.

RoboTrace is the single surface for replay, regressions, datasets, and ops - without gluing FFmpeg folders, Jupyter reruns, and bespoke Slack bots together every sprint.

We didn't need louder dashboards - we needed the same synced episode twice: once when we debated a model change, once when hardware in the field disagreed with the spreadsheet.

Phrasing we hear across invite onboarding - summarized and anonymized; not a named customer endorsement.

Replay every robot run

Walk through a synced timeline of video, sensors, and motor commands. Pause on the exact frame your robot did the wrong thing.

Multi-camera support
Frame-accurate playback
Bookmark moments

Test before you ship

Run a new model against thousands of past runs. See exactly where it improves and where it regresses - before it touches a real robot.

Side-by-side model diffs
Per-step performance delta
Regression dashboards

Versioned datasets

Tag, slice, and snapshot runs into reproducible training sets. Reach for the same data weeks later without juggling git-LFS or shared drives.

Tag and filter runs
Saved slices
S3-compatible exports

Fleet observability

Stream live telemetry from your deployed robots. Get notified the moment a model behaves out of distribution or crashes in the field.

Out-of-distribution alerts
Crash replays
Per-robot drilldowns

How it works

From `pip install` to your first synced replay URL in five minutes.

The pillars above assume episodes that survive the lab - here is how a team earns them without rewiring training: bolt the CLI onto the rollout you already ship, mint a replay link when ingest finishes, review motion and sensors inside one synced portal pane, then line up robotrace replay run batches against your archive before the arm books another slot.

STEP 01

Install and log in

One pip install plus one `robotrace login` and your machine is authorized - no copy-pasting API keys out of a portal. The CLI opens your browser, you click Authorize, and the credentials drop into ~/.robotrace/credentials.

Terminal · bash

$ python3 -m pip install robotrace-dev
Successfully installed robotrace-dev-0.1.0a6

$ robotrace login

Welcome to RoboTrace!
RoboTrace - sign in via your browser
https://app.robotrace.dev

Verification link (already includes your user code):
[robotrace] → https://app.robotrace.dev/cli/auth?code=XKDF-PQ4N
Opening your default browser…

Confirm this matches the page before Authorize
  XKDF-PQ4N

✓ Signed in as art@robotrace.dev
  Credentials · ~/.robotrace/credentials
  Profile · default
  Portal · https://app.robotrace.dev/portal

STEP 02

Wrap your run loop

Three lines around the loop you already have - LeRobot rollouts, ROS 2 bags, a Gymnasium env, or bespoke training glue. Once the episode opens, the SDK prints a replay URL you paste into Slack: one permalink to the synced run your whole team rewatches.

train.py - output

# import robotrace as rt
# 
# with rt.start_episode(name="pick-and-place v3",
#                        policy_version="vla-7b@v12") as ep:
#     rollout(env, policy, ep)
#     ep.upload_video('./run.mp4')

$ python train.py
[robotrace] → https://app.robotrace.dev/portal/episodes/ep_8f3c
uploading run.mp4 … 100%   sensors.bin … 100%
✓ Episode ready in 4.2s.

STEP 03

Open the dashboard

Runs surface in your project within seconds - scrub frame-accurate video with ingest-time sensors and torque traces pinned to one playhead, bookmark the frame where things went wrong, and hand teammates the identical scrub via a permalink.

STEP 04

Test on past runs

Once runs live in object storage instead of a disappearing laptop folder, rerun them offline with robotrace replay run and robotrace.evals. Seed baseline episode IDs inline or from file, attach your callable so checkpoints stay on your runners, publish metrics upward, then read the DiffCard under /portal/evals before the next torque pass.

robotrace replay run \
  --policy my_team.policies:candidate_v13 \
  --candidate-version pap-v13 \
  --baseline-episodes a1b2c3d4-e5f6-7890-abcd-ef1234567890 \
                      f9e8d7c6-b5a4-3210-fedc-ba0987654321 \
  --baseline-version pap-v12 \
  --name nightly-vs-v13
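
The --policy value in the command above looks like the common "module:attribute" convention for naming a Python callable. This page does not document how the CLI resolves it, so the sketch below only shows how such a string is typically turned into an object with the standard library; `resolve` is a hypothetical helper, not a RoboTrace function.

```python
import importlib
from typing import Any


def resolve(spec: str) -> Any:
    """Resolve a 'package.module:attribute' string to the object it names."""
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    obj = importlib.import_module(module_name)
    # Walk dotted attributes, e.g. 'Class.method'.
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    return obj


# Demonstrated with a standard-library target, because my_team.policies
# in the command above is a placeholder for your own module.
sqrt = resolve("math:sqrt")
print(sqrt(16.0))  # 4.0
```

If this convention holds, the practical consequence is that the module must be importable from wherever the command runs, for example by installing your package into the same environment.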

Synced episode replay

Every frame. Every joint. Every action.

Scrub once and the whole run stays locked: cameras, joint torque, and policy output share one timeline. Pause where something breaks, bookmark that instant, and paste the link - whoever opens it sees the same frame you did.

One scrubber
Every stream shares the same clock - video, sensors, and actions move together.
Shareable bookmarks
Copy a link with ?t=…ms and teammates land on your exact frame.
Inspect before re-roll
Find the failure in the portal before you queue another eval or book the arm.

app.robotrace.dev/portal/episodes/ep_8f3c · pick-and-place-v3

Shared moment · Paused at 04:21.042 · frame 7956

?t=261042ms · anyone on your team opens this frame
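
The bookmark above is just the paused timecode expressed in milliseconds. A small sketch of that arithmetic follows; the helper names are hypothetical, and the `?t=<ms>ms` query form is copied from the demo display rather than from documented URL syntax.

```python
def timecode_to_ms(timecode: str) -> int:
    """Convert 'MM:SS.mmm' (as shown in the player) to whole milliseconds."""
    minutes, rest = timecode.split(":")
    seconds, _, millis = rest.partition(".")
    millis = (millis + "000")[:3]  # pad or truncate to three digits
    return (int(minutes) * 60 + int(seconds)) * 1000 + int(millis)


def bookmark_url(episode_url: str, timecode: str) -> str:
    """Append a ?t=<ms>ms query in the form the demo displays (assumed)."""
    return f"{episode_url}?t={timecode_to_ms(timecode)}ms"


url = bookmark_url(
    "https://app.robotrace.dev/portal/episodes/ep_8f3c", "04:21.042"
)
print(url)
# https://app.robotrace.dev/portal/episodes/ep_8f3c?t=261042ms
```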

Reference video · Live replay

Front camera · 720p · 30 fps · 04:21 / 06:48

Synced signals

Front camera: main RGB feed
Wrist camera: close-up angle
Joint torque: right arm · 1 kHz
Policy actions: model output per step

Shared timeline: 04:21 / 06:48

Demo run: pick-and-place with two cameras, joint torque, and policy actions on one scrubber. The playhead you see moves all four tracks and the arm animation together.

Test on past runs

Don't ship a regression to a real robot.

Every run RoboTrace records becomes a test case. Use robotrace.evals or robotrace replay run to replay a candidate against thousands of baseline episodes, see exactly where it does better and where it does worse, and land the rollup in /portal/evals - without booking another hour on the arm.

No extra rack time
Re-roll candidates against archived observations - train the comparison loop without booking the arm.
Compare labeled policy versions
Baseline and candidate versions roll up side-by-side in the portal DiffCard - same metrics every run.
Curate any baseline list
Pass episode IDs inline or from a file - build the slice however you query your archive today.
Weights stay on your machines
Signed URLs fetch tensors for replay; your callable executes locally - RoboTrace only receives the metric payloads.
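
The "callable runs locally, only metrics leave" idea can be sketched as follows: the policy is invoked on archived observations on your own machine, and the only thing produced for upload is a small dictionary of numbers. The function name, the metric names, and the toy data are all assumptions for illustration; the actual metric payload format is not described on this page.

```python
from statistics import fmean
from typing import Callable, Sequence

Observation = Sequence[float]
Action = Sequence[float]


def action_error(
    policy: Callable[[Observation], Action],
    observations: Sequence[Observation],
    logged_actions: Sequence[Action],
) -> dict[str, float]:
    """Run the policy locally and summarize its deviation from the log."""
    per_step = []
    for obs, logged in zip(observations, logged_actions):
        predicted = policy(obs)
        # Mean absolute difference across action dimensions for this step.
        per_step.append(fmean(abs(p - l) for p, l in zip(predicted, logged)))
    return {"mean_abs_error": fmean(per_step), "max_abs_error": max(per_step)}


# Toy stand-in policy: doubles each observation value.
def candidate(obs: Observation) -> Action:
    return [2.0 * x for x in obs]


observations = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.5]]
logged_actions = [[0.0, 2.0], [2.0, 2.5], [4.0, 1.0]]
metrics = action_error(candidate, observations, logged_actions)
print(metrics)
```

Nothing about the policy's weights appears in `metrics`; only the two summary numbers would need to be sent anywhere.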

Eval · 1,000 episodes · pick-and-place-v3 · Duration 03:42

Metric               v12      v13      Δ
Success rate         62.4%    78.1%    +15.7
Avg. reward / step   0.412    0.521    +0.109
Collision rate       8.2%     3.1%     −5.1
Time-to-goal (s)     14.7     11.2     −3.5
OOD action share     1.9%     0.4%     −1.5

Recommend: ship v13 · 5 of 5 better
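
The Δ column and the "5 of 5 better" verdict in the demo table can be reproduced with a few lines of arithmetic. The numbers below are copied from the table; which metrics count as higher-is-better versus lower-is-better is an assumption of this sketch, not something the page states.

```python
# Values copied from the demo table: (v12, v13, higher_is_better).
# The higher/lower-is-better flags are an assumption of this sketch.
results = {
    "Success rate (%)": (62.4, 78.1, True),
    "Avg. reward / step": (0.412, 0.521, True),
    "Collision rate (%)": (8.2, 3.1, False),
    "Time-to-goal (s)": (14.7, 11.2, False),
    "OOD action share (%)": (1.9, 0.4, False),
}


def summarize(table):
    """Return per-metric deltas and how many metrics the candidate improves."""
    deltas, better = {}, 0
    for name, (baseline, candidate, higher_is_better) in table.items():
        delta = round(candidate - baseline, 3)  # trim float noise
        deltas[name] = delta
        improved = delta > 0 if higher_is_better else delta < 0
        better += improved
    return deltas, better


deltas, better = summarize(results)
for name, delta in deltas.items():
    print(f"{name:<22}{delta:+g}")
print(f"{better} of {len(results)} better")  # 5 of 5 better
```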

Integrations

Plays nicely with your stack.

Bring your simulator, policy, or dataset format. RoboTrace ships lean optional extras ([ros2], [lerobot]) so heavy dependencies never land on rigs that do not need them. Everything else can go straight through log_episode with NumPy-friendly buffers and paths.

ROS 2

humble · jazzy · rosbag2 (sqlite3 · mcap)

Ready

Install

python3 -m pip install 'robotrace-dev[ros2]'

↳robotrace.adapters.ros2

Read reference

LeRobot

Hugging Face datasets · v2.1 on disk

Ready

Install

python3 -m pip install 'robotrace-dev[lerobot]'

↳robotrace.adapters.lerobot

Read reference

Raw NumPy

arrays, tensors, and file paths - no bridge

Ready

Install (core)

python3 -m pip install robotrace-dev

↳robotrace.log_episode

Read reference

On the roadmap

MuJoCo
Isaac Sim
Genesis
Gymnasium
Hugging Face

FAQ

The questions we get every week.

No buzzwords - what robotics teams actually ask before they wire log_episode into training.

How does replay regression work for stochastic policies?

Replay regression ships today in the Python SDK (`robotrace.evals`) and CLI (`robotrace replay run`): you pass baseline episode IDs (inline or `@file`), RoboTrace serves signed GET URLs for stored observations/actions, and your policy callable runs on your own hardware - weights never upload to us. Per-episode metrics sync back and finalize into a rollup under `/portal/evals`. Deterministic policies produce straightforward metric deltas. Stochastic policies usually need a fixed seed for apples-to-apples rerolls, or you compare sampled actions against the logged trace - richer built-in Monte Carlo overlays are still on the roadmap. Episodes already retain policy_version, env_version, git_sha, and seed for this workflow.
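
Why a fixed seed matters for stochastic policies can be shown with a toy example: when the random generator is seeded once per episode, replaying the same observations yields the same sampled actions, so metric differences reflect the policy rather than the dice. The policy and helper below are made up for illustration and use only Python's standard `random` module.

```python
import random


def stochastic_policy(observation: float, rng: random.Random) -> float:
    """Toy policy: a deterministic term plus Gaussian exploration noise."""
    return 0.5 * observation + rng.gauss(0.0, 0.1)


def rollout(observations: list[float], seed: int) -> list[float]:
    """Replay logged observations with an RNG seeded once per episode."""
    rng = random.Random(seed)
    return [stochastic_policy(obs, rng) for obs in observations]


observations = [0.0, 0.2, 0.4, 0.6]
same_seed = rollout(observations, seed=7) == rollout(observations, seed=7)
different_seed = rollout(observations, seed=7) == rollout(observations, seed=8)
print(same_seed, different_seed)  # True False
```

Real policies often draw randomness from several sources (framework RNGs, GPU kernels), so a single seed may not be sufficient on its own; that caveat is general and not specific to RoboTrace.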

5 lines of Python

Start logging in minutes.

Join the early-access cohort. We're onboarding teams one at a time so we can stay close to the painful parts. Tell us what you're building and we'll get back to you this week.

Get early access · Read the docs

ROS 2 native · LeRobot ready · Open SDK

Phase 1 · Free for early-access teams · English only

Does it work with my custom robot rig?

Where is the data stored?

Is it self-hostable?

Is it open source?

How does replay regression work for stochastic policies?

How much does it cost?

Do you support real-time fleet streaming?
