Benchmarks
All numbers are cold builds (after cargo clean) on a 48-core Linux server
with nightly Rust.
Virtual slicer — rust-perf standard suite (not yet re-verified)
These single-crate numbers were measured without -Z threads=8 or the wild
linker. They have not been re-verified with the current fair-RUSTFLAGS
protocol and may overstate speedups (the same apples-to-oranges issue as the
retracted workspace numbers).
| Project | Baseline | cargo-slicer | Speedup |
|---|---|---|---|
| image 0.25.6 (lib) | 40,742 ms | 1,461 ms | 27.9× |
| ripgrep 14.1.1 (bin) | 24,094 ms | 5,891 ms | 4.09× |
| cargo 0.87.1 (workspace) | 133,797 ms | 61,922 ms | 2.16× |
| diesel 2.2.10 (lib) | 25,854 ms | 14,339 ms | 1.80× |
| syn 2.0.101 (lib) | 6,711 ms | 4,157 ms | 1.61× |
| serde 1.0.219 (lib) | 3,951 ms | 3,966 ms | 1.00× |
serde is already minimal — almost all of its code is reachable via derive
macros. The slicer correctly identifies this.
Virtual slicer — real binary projects
All measurements use identical RUSTFLAGS for both baseline and vslice-cc
(-Z threads=8 -C linker=clang -C link-arg=--ld-path=wild). 48-core machine,
Apr 2026, 2–3 runs per mode.
| Project | Baseline | vslice-cc | Speedup | Notes |
|---|---|---|---|---|
| helix (16 local crates) | 68 s | 44 s | 1.55× | |
| ripgrep (50K LOC) | 10.5 s | 7 s | 1.50× | |
| zed (209 local crates) | 1098 s | 767 s | 1.43× | 76 driver, 131 skip |
| zeroclaw (4 local crates) | 686 s | 522 s | 1.31× | 3,786 stubs / ~241k mono items (1.6% overall, 4.4% bin) |
| nushell (41 local crates) | 103 s | 82 s | 1.26× | |
Retracted claims: nushell was previously reported at 5.1×, but that baseline used different RUSTFLAGS from the slicer run (an apples-to-oranges comparison); with identical RUSTFLAGS the re-measured speedup is 1.26×. cargo-slicer (self) was claimed at 1.74× but re-verified at 1.00× (only 1 driver crate, 0 stubs).
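The fair-RUSTFLAGS protocol comes down to refusing to compute a speedup unless both legs ran with identical flags. Below is a minimal Rust sketch of that guard. `Run` and `fair_speedup` are hypothetical names used for illustration. They are not part of cargo-slicer's scripts.

```rust
/// One timed build: the RUSTFLAGS it ran with and its wall-clock time in seconds.
struct Run {
    rustflags: &'static str,
    wall_s: f64,
}

/// Speedup of `treated` over `baseline`, or an error if the two runs used
/// different RUSTFLAGS (the mismatch behind the retracted 5.1× figure).
fn fair_speedup(baseline: &Run, treated: &Run) -> Result<f64, String> {
    if baseline.rustflags != treated.rustflags {
        return Err(format!(
            "RUSTFLAGS differ: baseline {:?} vs treated {:?}",
            baseline.rustflags, treated.rustflags
        ));
    }
    Ok(baseline.wall_s / treated.wall_s)
}

fn main() {
    let flags = "-Z threads=8 -C linker=clang -C link-arg=--ld-path=wild";
    // nushell wall times from the bare-metal table.
    let baseline = Run { rustflags: flags, wall_s: 103.0 };
    let sliced = Run { rustflags: flags, wall_s: 82.0 };
    println!("{:.2}x", fair_speedup(&baseline, &sliced).unwrap());
    // A baseline built without the extra flags is rejected, not compared.
    let unfair = Run { rustflags: "", wall_s: 103.0 };
    assert!(fair_speedup(&unfair, &sliced).is_err());
}
```

For the nushell row this prints 1.26x.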
Docker benchmarks (docker run cargo-slicer bench)
Fair comparison inside Docker: same nightly toolchain, cargo fetch before
timing (excludes download time), cargo clean between baseline and slicer.
Slicer timing includes cargo-slicer pre-analyze overhead.
| Project | Baseline | Slicer | Speedup |
|---|---|---|---|
| zed (209 crates) | 1149 s | 545 s | 2.11× |
| helix (16 crates) | 95 s | 59 s | 1.61× |
| zeroclaw (4 crates) | 842 s | 542 s | 1.55× |
| ripgrep (17 crates) | 15 s | 12 s | 1.31× |
| nushell (41 crates) | 118 s | 94 s | 1.25× |
Docker speedups exceed the bare-metal ones for large projects (zed: 2.11× vs 1.43×) because the container runs on fewer cores. With less parallelism available, each function whose codegen is eliminated saves more wall-clock time.
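A simplified Amdahl-style model makes the core-count argument concrete. It treats a build as a serial part plus parallel codegen work spread across cores, and then removes some of the parallel work. The workload numbers below are hypothetical and are not fitted to zed. Real builds are also constrained by the crate dependency graph, which this model ignores.

```rust
/// Simplified model: wall time = serial part + parallel CPU work / cores.
fn wall_time(serial_s: f64, parallel_cpu_s: f64, cores: f64) -> f64 {
    serial_s + parallel_cpu_s / cores
}

/// Speedup from removing `eliminated_cpu_s` of parallelizable codegen work.
fn speedup(serial_s: f64, parallel_cpu_s: f64, eliminated_cpu_s: f64, cores: f64) -> f64 {
    wall_time(serial_s, parallel_cpu_s, cores)
        / wall_time(serial_s, parallel_cpu_s - eliminated_cpu_s, cores)
}

fn main() {
    // Hypothetical workload: 60 s serial, 4,000 CPU-s parallel, 1,600 CPU-s eliminated.
    for cores in [8.0, 16.0, 48.0] {
        println!("{cores:>2} cores: {:.2}x", speedup(60.0, 4000.0, 1600.0, cores));
    }
}
```

With these inputs the model gives about 1.56× at 8 cores but only 1.30× at 48 cores. At high core counts the serial part dominates the wall time.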
```sh
# Run the benchmark yourself
docker build -t cargo-slicer .
docker run --rm -v /path/to/project:/workspace/project cargo-slicer bench
```
Warm-cache daemon — verified (Apr 2026)
Both baseline and warmed use nightly + -Z threads=8. Interleaved rounds,
dispatch pre-warmed, rm -rf target/ before each run.
| Crate | Baseline | Warmed | Speedup |
|---|---|---|---|
| image 0.25 | 4.9 s | 2.1 s | 2.3× |
| syn 2.0 | 1.0 s | 0.66 s | 1.5× |
An earlier version of this table claimed 8.5× for image (40.7 s → 4.8 s) and 1.7× for syn (6.7 s → 4.0 s). Those baselines were measured without
-Z threads=8 and the wild linker, while the warmed runs used both, which is the same apples-to-oranges error as the nushell 5.1×. cargo 0.87.1 (claimed 2.3×) is a regression under fair RUSTFLAGS: the baseline takes 15 s versus 64 s warmed (about 4.3× slower), because dispatch overhead serializes work that -Z threads=8 parallelizes across 48 cores.
A warm cache populated by one project is reused across all projects on the same machine.
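Cross-project reuse stays safe only if cache entries are keyed on everything that affects the result. The sketch below assumes a key made of crate name, version, and RUSTFLAGS. The key the cargo-slicer daemon actually uses is not described here, and `CacheKey`, `WarmCache`, and `key` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical cache key: entries are shared only when crate name, version,
/// and RUSTFLAGS all match, so results never leak across incompatible builds.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct CacheKey {
    krate: String,
    version: String,
    rustflags: String,
}

/// In-memory stand-in for a machine-wide cache shared by every project.
#[derive(Default)]
struct WarmCache {
    entries: HashMap<CacheKey, String>,
}

impl WarmCache {
    /// Returns the cached value and whether it was already present (a hit).
    fn get_or_insert_with(&mut self, key: CacheKey, compute: impl FnOnce() -> String) -> (&String, bool) {
        let hit = self.entries.contains_key(&key);
        let value: &String = self.entries.entry(key).or_insert_with(compute);
        (value, hit)
    }
}

fn key(krate: &str, version: &str, rustflags: &str) -> CacheKey {
    CacheKey { krate: krate.into(), version: version.into(), rustflags: rustflags.into() }
}

fn main() {
    let mut cache = WarmCache::default();
    // Project A analyzes syn first: a miss, so the analysis runs.
    let (_, hit_a) = cache.get_or_insert_with(key("syn", "2.0.101", "-Z threads=8"), || "analysis".to_string());
    // Project B depends on the same syn with the same flags: a hit.
    let (_, hit_b) = cache.get_or_insert_with(key("syn", "2.0.101", "-Z threads=8"), || {
        unreachable!("second project should hit the cache")
    });
    println!("project A hit: {hit_a}, project B hit: {hit_b}");
}
```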
Upstream -Z dead-fn-elimination patch
These numbers come from the in-tree rustc patch (src/upstream_patch/), which
implements the same algorithm natively in the compiler.
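This sketch assumes the algorithm is a breadth-first reachability pass from a seed set, which matches the oracle section's references to a post-BFS set and a seed-set invariant. It shows the idea on a toy call graph: every function not reached from the seeds is a candidate for elimination. Functions are plain strings here, while the rustc patch works on compiler-internal items, so treat this as an illustration only.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first reachability over a call graph, starting from the seed set
/// (e.g. `main`). Returns every function reached.
fn reachable<'a>(calls: &HashMap<&'a str, Vec<&'a str>>, seeds: &[&'a str]) -> HashSet<&'a str> {
    let mut seen: HashSet<&'a str> = seeds.iter().copied().collect();
    let mut queue: VecDeque<&'a str> = seeds.iter().copied().collect();
    while let Some(f) = queue.pop_front() {
        let callees: &[&'a str] = calls.get(f).map(Vec::as_slice).unwrap_or(&[]);
        for &callee in callees {
            // `insert` returns true only the first time a function is seen.
            if seen.insert(callee) {
                queue.push_back(callee);
            }
        }
    }
    seen
}

/// Defined functions never reached from the seeds, sorted by name.
fn dead_fns<'a>(calls: &HashMap<&'a str, Vec<&'a str>>, seeds: &[&'a str]) -> Vec<&'a str> {
    let live = reachable(calls, seeds);
    let mut dead: Vec<&'a str> = calls.keys().copied().filter(|f| !live.contains(f)).collect();
    dead.sort_unstable();
    dead
}

fn main() {
    let calls: HashMap<&str, Vec<&str>> = HashMap::from([
        ("main", vec!["parse", "run"]),
        ("parse", vec!["lex"]),
        ("run", vec![]),
        ("lex", vec![]),
        ("debug_dump", vec!["lex"]), // never called from main
    ]);
    println!("{:?}", dead_fns(&calls, &["main"])); // ["debug_dump"]
}
```

When every function is reachable, `dead_fns` returns an empty list and there is nothing to remove. That is the ripgrep break-even case.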
| Project | Baseline | -Z dead-fn-elimination | Reduction |
|---|---|---|---|
| zed | 1,790 s | 1,238 s | −31%, 9.2 min saved |
| rustc workspace (67 crates)¹ | 336 s | 176 s | −48%, 2.7 min saved |
| ripgrep | 13 s | 13 s | break-even (all fns reachable) |
¹ Per @petrochenkov's V2 review feedback, this row reflects
x.py build compiler/rustc --stage 1: the 67
workspace crates that make up librustc_driver.so, not the ~70-line rustc
binary crate. The row was originally labelled "rustc", which was misleading.
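The Reduction column follows directly from the two wall times. The small sketch below recomputes it. `reduction` is a hypothetical helper name.

```rust
/// Percentage reduction and minutes saved between two wall-clock times (seconds).
fn reduction(baseline_s: f64, patched_s: f64) -> (f64, f64) {
    let saved_s = baseline_s - patched_s;
    (100.0 * saved_s / baseline_s, saved_s / 60.0)
}

fn main() {
    for (name, base, patched) in [("zed", 1790.0, 1238.0), ("rustc workspace", 336.0, 176.0)] {
        let (pct, min) = reduction(base, patched);
        println!("{name}: -{pct:.0}%, {min:.1} min saved");
    }
}
```

It prints -31% and 9.2 min for zed, and -48% and 2.7 min for the rustc workspace, matching the table.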
Patched stage1 oracle (rust-1.90.0 stable, 2026-04-26)
The in-tree patch was rebuilt against rust-1.90.0 (commit 1159e78c) with
[rust] debug-assertions = true, overflow-checks = true to runtime-check
the V9 invariant reachable_set ⊆ post-BFS-set.
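`debug_assert!` is compiled out unless debug assertions are enabled, so the oracle needs debug-assertions = true for the check to run at all. The sketch below shows such a check. Functions are represented by hypothetical `u32` IDs rather than rustc's own item types, and `check_seed_invariant` is an illustrative name.

```rust
use std::collections::HashSet;

/// Runtime check of the invariant reachable_set ⊆ post-BFS-set.
/// This is a no-op unless debug assertions are enabled.
fn check_seed_invariant(reachable_set: &HashSet<u32>, post_bfs_set: &HashSet<u32>) {
    debug_assert!(
        reachable_set.is_subset(post_bfs_set),
        "functions in reachable_set missing from post-BFS set: {:?}",
        reachable_set.difference(post_bfs_set).collect::<Vec<_>>()
    );
}

fn main() {
    let reachable_set: HashSet<u32> = [1, 2, 3].into();
    let post_bfs_set: HashSet<u32> = [1, 2, 3, 7].into();
    check_seed_invariant(&reachable_set, &post_bfs_set); // holds: no panic
    println!("invariant holds");
}
```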
| Run | Wall time | Fns eliminated | Output check |
|---|---|---|---|
| stage1 baseline (ripgrep) | 62.1 s | 0 | runs |
| stage1 + -Z dead-fn-elim | 59.9 s | 904 | identical to baseline |
The debug_assert holds: no ICE, and the binary is correct (output identical to the baseline) under the seed-set invariant.
ASE 2026 corpus sweep — top 2,669 crates by downloads
Correctness validation on a representative slice of the ecosystem. Run via
scripts/bench_ase_corpus.sh; library crates gated on build success, binary
crates additionally smoke-tested with --version / --help.
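A smoke test of this kind can be written with `std::process::Command`. The sketch assumes a binary passes if either --version or --help exits successfully. The corpus script may be stricter. `smoke_test` and the binary path are hypothetical.

```rust
use std::path::Path;
use std::process::{Command, Stdio};

/// Smoke test: passes if either `--version` or `--help` exits successfully.
fn smoke_test(bin: &Path) -> bool {
    ["--version", "--help"].iter().any(|flag| {
        Command::new(bin)
            .arg(flag)
            .stdout(Stdio::null())
            .stderr(Stdio::null())
            .status()
            .map(|s| s.success())
            .unwrap_or(false) // failing to spawn counts as a failure
    })
}

fn main() {
    // Hypothetical path to a binary built by the slicer leg.
    let bin = Path::new("target/release/rg");
    println!("{}: {}", bin.display(), if smoke_test(bin) { "ok" } else { "FAILED" });
}
```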
V10/V11 reframing (2026-04-29): numbers are split by crate kind. The in-tree
-Z dead-fn-elimination flag is a no-op on libraries today (V1 early-return), while the userspace cargo-slicer tool's RUSTC_WRAPPER pipeline does run on libraries. The two are reported separately so that the in-tree claim applies only to the binaries the flag actually runs on.
Binary subset (n=65) — relevant to in-tree -Z dead-fn-elimination:
| Metric | Value |
|---|---|
| Binary crates attempted | 65 |
| Both legs built | 59 |
| Slicer-only failures | 0 |
| Median build speedup | 1.38× |
| Mean build speedup | 2.45× |
| % speedup ≥ 1.0× | 69.5% |
| % speedup ≥ 1.5× | 45.8% |
| % speedup ≥ 2.0× | 27.1% |
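The mean (2.45×) is well above the median (1.38×), which points to a right-skewed distribution: a minority of crates with large speedups pulls the mean up. The percentage rows match 41, 27 and 16 of the 59 crates that built in both legs. The sketch below computes these summary statistics on hypothetical speedups rather than the corpus data.

```rust
/// Median of per-crate speedups (baseline time / slicer time).
fn median(xs: &[f64]) -> f64 {
    let mut v = xs.to_vec();
    v.sort_by(|a, b| a.total_cmp(b));
    let n = v.len();
    if n % 2 == 1 { v[n / 2] } else { (v[n / 2 - 1] + v[n / 2]) / 2.0 }
}

fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Percentage of crates whose speedup is at least `threshold`.
fn pct_at_least(xs: &[f64], threshold: f64) -> f64 {
    100.0 * xs.iter().filter(|&&x| x >= threshold).count() as f64 / xs.len() as f64
}

fn main() {
    // Hypothetical speedups: one large outlier pulls the mean above the median.
    let speedups = [0.9, 1.0, 1.2, 1.4, 1.6, 2.1, 9.0];
    println!("median {:.2}x, mean {:.2}x", median(&speedups), mean(&speedups));
    for t in [1.0, 1.5, 2.0] {
        println!(">= {t:.1}x: {:.1}%", pct_at_least(&speedups, t));
    }
}
```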
Library subset (n=2,538) — userspace cargo-slicer only, NOT the -Z
flag: 2,393 of 2,538 libraries built under both legs with zero
slicer-only failures; userspace median 1.50×. This number measures
cross-crate orchestration in the userspace tool, not the single-crate
in-tree flag.
Full corpus catalog (all 2,669 crates with rank, version, downloads, build times, and slicer status): ASE 2026 Corpus · CSV.
Full point-by-point response to the @petrochenkov V1–V11 review and
reproduction instructions live in vadim-response-results.md
on the cargo-slicer repository.
C/C++ projects — clang-daemon PCH acceleration
build-accelerate.sh (included in the image) auto-detects C/C++ projects and
injects a precompiled header via clang-daemon. The technique eliminates
repeated header parsing across parallel compilation units.
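With clang, a precompiled header is built once (for example clang -x c-header common.h -o common.h.pch) and then loaded into each compile with -include-pch. How build-accelerate.sh and clang-daemon rewrite commands is not described here. The sketch below is therefore a hypothetical illustration of the argument rewrite only, and `inject_pch` and the paths are assumptions. The PCH must be built with flags compatible with every compile that loads it.

```rust
/// Hypothetical rewrite of one compile command: insert `-include-pch <pch>`
/// right after the compiler executable so clang loads the pre-parsed headers
/// instead of re-parsing them for every translation unit.
fn inject_pch(args: &[&str], pch: &str) -> Vec<String> {
    let mut out = Vec::with_capacity(args.len() + 2);
    if let Some((compiler, rest)) = args.split_first() {
        out.push(compiler.to_string());
        out.push("-include-pch".to_string());
        out.push(pch.to_string());
        out.extend(rest.iter().map(|a| a.to_string()));
    }
    out
}

fn main() {
    let cmd = ["clang", "-O2", "-Iinclude", "-c", "src/foo.c", "-o", "foo.o"];
    println!("{}", inject_pch(&cmd, "build/common.h.pch").join(" "));
}
```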
Already benchmarked (48-core server, Clang 21, -j48):
| Project | Stars | Files | Baseline | Accelerated | Speedup | Notes |
|---|---|---|---|---|---|---|
| Linux kernel 6.14 | 227k | 26,339 | ~890 s | ~730 s | 1.22× | GCC fallback for asm-heavy files |
| LLVM 20 | — | ~2,873 | measured | measured | 1.22× | Clang 21 compiling Clang 20 |
| LLVM 21 | — | ~2,873 | measured | measured | 1.24× | Self-hosted build |
| vim | — | ~300 | baseline | accelerated | 1.3× | Small project, overhead minimal |
| sqlite3 | — | 1 (amalgam) | 20 s | 20.2 s | 1.01× | Single-file; PCH gives nothing |
Predicted speedup for top starred projects (based on file count × header density model):
| Rank | Project | Stars | Lang | Files | LOC | Build | Predicted | Reason |
|---|---|---|---|---|---|---|---|---|
| 1 | Linux | 227k | C | 26,339 | ~20M | Make | 1.2× ✅ benchmarked | |
| 2 | TensorFlow | 195k | C++ | ~650 | ~2.5M | Bazel/CMake | 1.15–1.25× | Heavy STL + proto headers |
| 3 | Godot | 109k | C++ | ~3,500 | ~8.6M | SCons | 1.2–1.3× | Large header graph |
| 4 | Electron | 121k | C++ | (Chromium) | ~25M | ninja | 1.2× | Chromium-scale header reuse |
| 5 | OpenCV | 87k | C++ | ~1,000 | ~600K | CMake | 1.15–1.2× | Dense OpenCV headers |
| 6 | FFmpeg | 58k | C | ~500 | ~1M | autotools | 1.1–1.2× | libav* headers per file |
| 7 | Bitcoin | 89k | C++ | ~500 | ~750K | CMake | 1.1–1.2× | Boost + secp256k1 headers |
| 8 | Netdata | 78k | C | ~700 | ~700K | CMake | 1.1–1.15× | Moderate header depth |
| 9 | Redis | 74k | C | ~250 | ~330K | Make | 1.05–1.1× | Shallow headers, small codebase |
| 10 | Git | 60k | C | ~400 | ~140K | Make | 1.05–1.1× | Minimal headers |
| — | llama.cpp | 102k | C++ | ~150 | ~250K | CMake | 1.05× | Small; GGML headers not dense |
| — | sqlite3 | — | C | 1 | ~255K | Make | ≈1× | Amalgamation; no parallelism |
Key insight: speedup scales with (files × header parse fraction). Projects with thousands of files each including the same heavyweight headers (Linux, Godot, TensorFlow, Chromium) get the most benefit. Single-file amalgamations (sqlite3) and projects with shallow headers (Redis, Git) get little to none.
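The simple model below shows why both factors matter. It is supplied here as an assumption and is not the model behind the predictions table. Removing header parsing caps the speedup at 1/(1 - h), where h is the fraction of each file's compile time spent on shared headers. The file count determines how well the one-time PCH cost is amortized. All inputs in `main` are hypothetical.

```rust
/// Simplified PCH model (an assumption): each file spends a fraction `h` of its
/// compile time `t_s` parsing shared headers, the PCH removes that work, and
/// building the PCH costs `pch_s` once up front.
fn predicted_speedup(files: u32, t_s: f64, h: f64, pch_s: f64, jobs: u32) -> f64 {
    // Cannot run more files in parallel than exist.
    let lanes = files.min(jobs).max(1) as f64;
    let baseline = files as f64 * t_s / lanes;
    let accelerated = pch_s + files as f64 * t_s * (1.0 - h) / lanes;
    baseline / accelerated
}

fn main() {
    // Hypothetical inputs: many files with heavy shared headers vs. one amalgamation.
    let many = predicted_speedup(26_339, 1.5, 0.2, 2.0, 48);
    let single = predicted_speedup(1, 20.0, 0.01, 0.5, 48);
    println!("many files: {many:.2}x, single file: {single:.2}x");
}
```

With many header-heavy files the prediction approaches the 1/(1 - h) ceiling. With a single file the PCH build cost can push it slightly below 1×, as in the sqlite3 case.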
To run against any of these projects:
```sh
# Clone and accelerate (auto-detects C/C++ via compile_commands.json or Makefile)
git clone https://github.com/torvalds/linux
build-accelerate.sh ./linux
# Or via Docker (mounts your checkout)
docker run --rm --cpus=48 \
  -v $(pwd)/linux:/workspace/project \
  ghcr.io/yijunyu/cargo-slicer:latest
```
For projects using SCons (Godot) or Bazel (TensorFlow), generate
compile_commands.json first:
```sh
# Godot
scons compiledb
# TensorFlow (CMake path)
cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -B build && cp build/compile_commands.json .
```
Running benchmarks yourself
```sh
# Multi-crate CI benchmark (7 projects, baseline vs vslice-cc, 3 runs each)
./scripts/ci_bench_multicrate.sh
# Individual project
./scripts/bench_fresh_build.sh nushell baseline 3
./scripts/bench_fresh_build.sh nushell vslice-cc 3
# RL training KPI report
cargo-slicer rl-bench --project /tmp/your-project --runs 2
```
Results are stored in bench-results.db (SQLite).