RL Training KPIs

Code RL systems use compilation success as the reward signal. For Rust projects, compilation accounts for 70–90% of the rollout phase, making it the main bottleneck of the training loop. cargo-slicer rl-bench translates compile speedup into the KPI language used by MLOps teams.

Usage

# Measure current project (2 cold builds per mode)
cargo-slicer rl-bench

# Custom options
cargo-slicer rl-bench --runs 3 --rollout-fraction 0.85 \
  --gpus 16 --project /tmp/your-project

# Persist to bench-results.db
cargo-slicer rl-bench --db bench-results.db

KPIs reported

KPI 1 — Cold-build throughput (samples/hour)

samples/hour = 3600 / compile_time_seconds

KPI 2 — Incremental feedback latency

Time from a one-line edit to the first cargo check result.

KPI 3 — Compute cost per valid sample

cost = compile_time / pass_rate

Expected compile seconds spent per sample that actually compiles: a failing sample still costs a full build, so at a 50% pass rate each valid sample costs two compiles' worth of time.

KPI 4 — Cluster-hour equivalent

How many RL samples fit in one GPU-cluster-hour at a given rollout fraction.
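The KPI arithmetic above can be sketched in a few lines. This is an illustrative reimplementation, not rl-bench's actual code; the function names are made up, and the assumption that KPI 4 runs one build pipeline per GPU (so the rollout fraction appears only in the report header, matching the example output below) is mine.

```python
def samples_per_hour(compile_secs: float) -> int:
    """KPI 1: cold builds completable in one wall-clock hour."""
    return int(3600 / compile_secs)

def cost_per_valid_sample(compile_secs: float, pass_rate: float) -> float:
    """KPI 3: expected compile seconds per sample that actually compiles."""
    return compile_secs / pass_rate

def cluster_hour_samples(compile_secs: float, gpus: int) -> int:
    """KPI 4: samples one GPU cluster can generate per hour,
    assuming one build pipeline per GPU (assumption, see lead-in)."""
    return samples_per_hour(compile_secs) * gpus

print(samples_per_hour(103.0))                      # baseline: 34
print(samples_per_hour(82.0))                       # cargo-slicer: 43
print(cluster_hour_samples(103.0, 8))               # baseline: 272
print(cluster_hour_samples(82.0, 8))                # cargo-slicer: 344
print(cost_per_valid_sample(82.0, 0.5))             # 164.0 s at a 50% pass rate
```

These numbers reproduce the nushell example output in the next section.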

Example output (nushell, 1.26× speedup)

Numbers below are for nushell, verified Apr 2026 with identical RUSTFLAGS for both modes (-Z threads=8, wild linker). An earlier version of this example claimed 5.1× for nushell; that was an apples-to-oranges comparison in which the baseline lacked the parallel frontend and fast linker. The honest speedup is 1.26× (103 s → 82 s).

  KPI 1 — Cold-Build Throughput (samples/hour)
    Baseline   :   103.0s  →      34 samples/hr
    cargo-slicer:   82.0s  →      43 samples/hr  (1.26× faster)

  KPI 2 — Incremental Feedback Latency (cargo check)
    Baseline   :    12.4s  →     290 feedback-loops/hr
    cargo-slicer:    4.1s  →     878 feedback-loops/hr  (3.0× faster)

  Cluster-Hour Equivalent (8 GPUs, 80% rollout fraction)
    Baseline   :     272 samples / cluster-hour
    cargo-slicer:    344 samples / cluster-hour  (1.26× more data)

Persisting results

Results are written to the rl_kpi table in bench-results.db:

SELECT project, baseline_cold_secs, slicer_cold_secs, speedup,
       slicer_throughput_per_hr, ts
FROM rl_kpi ORDER BY ts DESC LIMIT 10;
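For programmatic access, the same query works from Python's stdlib sqlite3 module. The schema below is inferred from the columns in the query above and may not match what cargo-slicer actually writes; the example uses an in-memory database with one hand-inserted row so it is self-contained. Point connect() at bench-results.db for real data.

```python
import sqlite3

# In-memory stand-in for bench-results.db (schema inferred, see lead-in).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE rl_kpi (
        project TEXT, baseline_cold_secs REAL, slicer_cold_secs REAL,
        speedup REAL, slicer_throughput_per_hr INTEGER, ts TEXT
    )
""")
# One sample row, taken from the nushell example above.
conn.execute(
    "INSERT INTO rl_kpi VALUES (?, ?, ?, ?, ?, ?)",
    ("nushell", 103.0, 82.0, 1.26, 43, "2026-04-01T00:00:00Z"),
)

# Same shape as the SQL query in this section.
for row in conn.execute(
    "SELECT project, baseline_cold_secs, slicer_cold_secs, speedup, "
    "slicer_throughput_per_hr, ts FROM rl_kpi ORDER BY ts DESC LIMIT 10"
):
    print(row)
```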