# Benchmarks
All numbers are cold builds (after cargo clean) on a 48-core Linux server
with nightly Rust.
## Virtual slicer — rust-perf standard suite (not yet re-verified)
These single-crate numbers were measured without `-Z threads=8` or the wild
linker. They have not been re-verified under the current fair-`RUSTFLAGS`
protocol and may overstate speedups (the same apples-to-oranges issue as the
retracted workspace numbers above).
| Project | Baseline | cargo-slicer | Speedup |
|---|---|---|---|
| image 0.25.6 (lib) | 40,742 ms | 1,461 ms | 27.9× |
| ripgrep 14.1.1 (bin) | 24,094 ms | 5,891 ms | 4.09× |
| cargo 0.87.1 (workspace) | 133,797 ms | 61,922 ms | 2.16× |
| diesel 2.2.10 (lib) | 25,854 ms | 14,339 ms | 1.80× |
| syn 2.0.101 (lib) | 6,711 ms | 4,157 ms | 1.61× |
| serde 1.0.219 (lib) | 3,951 ms | 3,966 ms | 1.00× |
serde is already minimal — almost all of its code is reachable via derive
macros. The slicer correctly identifies this.
## Virtual slicer — real binary projects
All measurements use identical `RUSTFLAGS` for both baseline and vslice-cc
(`-Z threads=8 -C linker=clang -C link-arg=--ld-path=wild`). 48-core machine,
Apr 2026, 2–3 runs per mode.
| Project | Baseline | vslice-cc | Speedup | Notes |
|---|---|---|---|---|
| helix (16 local crates) | 68 s | 44 s | 1.55× | |
| ripgrep (50K LOC) | 10.5 s | 7 s | 1.50× | |
| zed (209 local crates) | 1098 s | 767 s | 1.43× | 76 driver, 131 skip |
| zeroclaw (4 local crates) | 686 s | 522 s | 1.31× | 3,786 stubs / ~241k mono items (1.6% overall, 4.4% bin) |
| nushell (41 local crates) | 103 s | 82 s | 1.26× | |
**Retracted claims:** nushell was reported at 5.1× — an apples-to-oranges `RUSTFLAGS` mismatch; the honest speedup is 1.26×. cargo-slicer (self) was claimed at 1.74× but re-verified at 1.00× (only 1 driver crate, 0 stubs).
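As a sketch, the identical-flags protocol looks like this. The build commands are commented out because they need a project checkout, and the `vslice-cc` invocation shape shown is an assumption, not the documented CLI:

```sh
# Same RUSTFLAGS exported for both modes: the fair-comparison protocol.
export RUSTFLAGS="-Z threads=8 -C linker=clang -C link-arg=--ld-path=wild"
echo "RUSTFLAGS=$RUSTFLAGS"
# cargo clean && time cargo +nightly build --release   # baseline
# cargo clean && time vslice-cc build --release        # slicer, same flags (hypothetical invocation)
```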
## Docker benchmarks (`docker run cargo-slicer bench`)
Fair comparison inside Docker: same nightly toolchain, `cargo fetch` before
timing (excludes download time), `cargo clean` between baseline and slicer.
Slicer timing includes `cargo-slicer` pre-analyze overhead.
| Project | Baseline | Slicer | Speedup |
|---|---|---|---|
| zed (209 crates) | 1149 s | 545 s | 2.11× |
| helix (16 crates) | 95 s | 59 s | 1.61× |
| zeroclaw (4 crates) | 842 s | 542 s | 1.55× |
| ripgrep (17 crates) | 15 s | 12 s | 1.31× |
| nushell (41 crates) | 118 s | 94 s | 1.25× |
Docker speedups are higher than bare-metal for large projects (zed 2.11× vs 1.43×) because fewer cores amplify the benefit of eliminating codegen work — less parallelism means each eliminated function saves more wall time.
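A toy model makes the arithmetic concrete (the numbers below are illustrative, not measurements): wall time is roughly a serial part plus parallel codegen work divided by core count, so removing half the codegen work saves proportionally more wall time on fewer cores.

```python
# Toy model: wall time ~= serial part + parallel codegen work / cores.
# Parameters (60 s serial, 2400 s codegen, 50% eliminated) are illustrative.
def wall_time(serial_s, codegen_s, cores, eliminated_frac=0.0):
    remaining = codegen_s * (1.0 - eliminated_frac)
    return serial_s + remaining / cores

for cores in (8, 48):
    base = wall_time(60, 2400, cores)
    sliced = wall_time(60, 2400, cores, eliminated_frac=0.5)
    print(f"{cores} cores: {base:.0f}s -> {sliced:.0f}s ({base / sliced:.2f}x)")
# 8 cores:  360s -> 210s (1.71x)
# 48 cores: 110s -> 85s  (1.29x)
```

The same 50% elimination yields 1.71× on 8 cores but only 1.29× on 48, mirroring the Docker-vs-bare-metal gap.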
```sh
# Run the benchmark yourself
docker build -t cargo-slicer .
docker run --rm -v /path/to/project:/workspace/project cargo-slicer bench
```
## Warm-cache daemon — verified (Apr 2026)
Both baseline and warmed use nightly + `-Z threads=8`. Interleaved rounds,
dispatch pre-warmed, `rm -rf target/` before each run.
| Crate | Baseline | Warmed | Speedup |
|---|---|---|---|
| image 0.25 | 4.9 s | 2.1 s | 2.3× |
| syn 2.0 | 1.0 s | 0.66 s | 1.5× |
An earlier version of this table claimed 8.5× for image (40.7 s → 4.8 s) and 1.7× for syn (6.7 s → 4.0 s). Those baselines were measured without
`-Z threads=8` and the wild linker, while the warmed runs had both — the same apples-to-oranges error as the nushell 5.1×. cargo 0.87.1 (claimed 2.3×) is a regression under fair `RUSTFLAGS`: baseline 15 s vs warmed 64 s — dispatch overhead serializes what `-Z threads=8` parallelizes across 48 cores.
A warm cache populated by one project is reused across all projects on the same machine.
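The interleaved-round protocol above can be sketched as follows; the real build commands are commented out since they require a checkout and a running daemon:

```sh
# Interleave modes per round so cache and thermal drift hit both equally.
for round in 1 2 3; do
  echo "round $round: baseline"
  # rm -rf target/ && time cargo +nightly build   # RUSTFLAGS="-Z threads=8"
  echo "round $round: warmed"
  # rm -rf target/ && time cargo +nightly build   # daemon pre-warmed, same flags
done
```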
## Upstream `-Z dead-fn-elimination` patch
| Project | Baseline | `-Z dead-fn-elimination` | Reduction |
|---|---|---|---|
| zed | 1,790 s | 1,238 s | −31%, 9.2 min saved |
| rustc | 336 s | 176 s | −48%, 2.7 min saved |
| ripgrep | 13 s | 13 s | break-even (all fns reachable) |
## C/C++ projects — clang-daemon PCH acceleration
`build-accelerate.sh` (included in the image) auto-detects C/C++ projects and
injects a precompiled header via `clang-daemon`. The technique eliminates
repeated header parsing across parallel compilation units.
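A minimal sketch of the underlying clang PCH mechanism (file names are illustrative; this is not the `build-accelerate.sh` implementation): parse the shared headers once into a precompiled header, then load the pre-parsed result in every translation unit instead of re-parsing.

```sh
# Minimal clang PCH demo. Skips cleanly if clang++ is not installed.
command -v clang++ >/dev/null || { echo "pch build ok (clang++ not found, skipped)"; exit 0; }
dir=$(mktemp -d)
printf '#include <string>\n#include <vector>\n' > "$dir/common.h"
# main.cpp gets its declarations from the PCH rather than re-parsing headers.
printf 'int main(){ std::string s = "ok"; return 0; }\n' > "$dir/main.cpp"
clang++ -x c++-header "$dir/common.h" -o "$dir/common.h.pch"      # parse headers once
clang++ -include-pch "$dir/common.h.pch" "$dir/main.cpp" -o "$dir/main"
"$dir/main" && echo "pch build ok"
rm -rf "$dir"
```

With thousands of translation units all including the same heavyweight headers, the one-time parse is amortized across every `-include-pch` compile.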
Already benchmarked (48-core server, Clang 21, -j48):
| Project | Stars | Files | Baseline | Accelerated | Speedup | Notes |
|---|---|---|---|---|---|---|
| Linux kernel 6.14 | 227k | 26,339 | ~890 s | ~730 s | 1.22× | GCC fallback for asm-heavy files |
| LLVM 20 | — | ~2,873 | measured | measured | 1.22× | Clang 21 compiling Clang 20 |
| LLVM 21 | — | ~2,873 | measured | measured | 1.24× | Self-hosted build |
| vim | — | ~300 | baseline | accelerated | 1.3× | Small project, overhead minimal |
| sqlite3 | — | 1 (amalgam) | 20 s | 20.2 s | 1.01× | Single-file; PCH gives nothing |
Predicted speedup for top starred projects (based on file count × header density model):
| Rank | Project | Stars | Lang | Files | LOC | Build | Predicted | Reason |
|---|---|---|---|---|---|---|---|---|
| 1 | Linux | 227k | C | 26,339 | ~20M | Make | 1.2× ✅ benchmarked | |
| 2 | TensorFlow | 195k | C++ | ~650 | ~2.5M | Bazel/CMake | 1.15–1.25× | Heavy STL + proto headers |
| 3 | Godot | 109k | C++ | ~3,500 | ~8.6M | SCons | 1.2–1.3× | Large header graph |
| 4 | Electron | 121k | C++ | (Chromium) | ~25M | ninja | 1.2× | Chromium-scale header reuse |
| 5 | OpenCV | 87k | C++ | ~1,000 | ~600K | CMake | 1.15–1.2× | Dense OpenCV headers |
| 6 | FFmpeg | 58k | C | ~500 | ~1M | autotools | 1.1–1.2× | libav* headers per file |
| 7 | Bitcoin | 89k | C++ | ~500 | ~750K | CMake | 1.1–1.2× | Boost + secp256k1 headers |
| 8 | Netdata | 78k | C | ~700 | ~700K | CMake | 1.1–1.15× | Moderate header depth |
| 9 | Redis | 74k | C | ~250 | ~330K | Make | 1.05–1.1× | Shallow headers, small codebase |
| 10 | Git | 60k | C | ~400 | ~140K | Make | 1.05–1.1× | Minimal headers |
| — | llama.cpp | 102k | C++ | ~150 | ~250K | CMake | 1.05× | Small; GGML headers not dense |
| — | sqlite3 | — | C | 1 | ~255K | Make | ≈1× | Amalgamation; no parallelism |
**Key insight:** speedup scales with (files × header parse fraction). Projects with thousands of files each including the same heavyweight headers (Linux, Godot, TensorFlow, Chromium) get the most benefit. Single-file amalgamations (sqlite3) and projects with shallow headers (Redis, Git) get little to none.
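The scaling claim can be written down as a toy model (the fractions below are illustrative, not measured): if a fraction *f* of each file's compile time is header parsing that a PCH eliminates, the ideal per-file speedup is 1 / (1 − *f*), and the saving repeats in every translation unit.

```python
def pch_speedup(header_parse_fraction):
    """Ideal per-TU speedup if a PCH removes all header parsing."""
    return 1.0 / (1.0 - header_parse_fraction)

# Illustrative header-parse fractions, not measurements.
for name, f in [("deep C++ headers", 0.20),
                ("shallow C headers", 0.07),
                ("single-file amalgamation", 0.00)]:
    print(f"{name}: {pch_speedup(f):.2f}x")
# deep C++ headers: 1.25x
# shallow C headers: 1.08x
# single-file amalgamation: 1.00x
```

These toy numbers land in the same bands as the table: ~1.2–1.3× for header-heavy C++, ~1.05–1.1× for shallow C, ≈1× for amalgamations.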
To run against any of these projects:
```sh
# Clone and accelerate (auto-detects C/C++ via compile_commands.json or Makefile)
git clone https://github.com/torvalds/linux
build-accelerate.sh ./linux

# Or via Docker (mounts your checkout)
docker run --rm --cpus=48 \
  -v $(pwd)/linux:/workspace/project \
  ghcr.io/yijunyu/cargo-slicer:latest
```
For projects using SCons (Godot) or Bazel (TensorFlow), generate
`compile_commands.json` first:

```sh
# Godot
scons compiledb

# TensorFlow (CMake path)
cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -B build && cp build/compile_commands.json .
```
## Running benchmarks yourself
```sh
# Multi-crate CI benchmark (7 projects, baseline vs vslice-cc, 3 runs each)
./scripts/ci_bench_multicrate.sh

# Individual project
./scripts/bench_fresh_build.sh nushell baseline 3
./scripts/bench_fresh_build.sh nushell vslice-cc 3

# RL training KPI report
cargo-slicer rl-bench --project /tmp/your-project --runs 2
```
Results are stored in `bench-results.db` (SQLite).
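The schema of `bench-results.db` is not documented here; as a hedged sketch under an assumed layout of (project, mode, seconds) rows, per-project speedups could be summarized like this (table name, columns, and values are all hypothetical):

```python
import sqlite3

# Hypothetical schema and sample rows; the real bench-results.db may differ.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE runs (project TEXT, mode TEXT, seconds REAL)")
db.executemany("INSERT INTO runs VALUES (?, ?, ?)", [
    ("nushell", "baseline", 103), ("nushell", "vslice-cc", 82),
    ("helix", "baseline", 68), ("helix", "vslice-cc", 44),
])

# Average each mode per project, then divide baseline by slicer time.
query = """
SELECT b.project, b.seconds / s.seconds AS speedup
FROM (SELECT project, AVG(seconds) seconds FROM runs
      WHERE mode = 'baseline' GROUP BY project) b
JOIN (SELECT project, AVG(seconds) seconds FROM runs
      WHERE mode = 'vslice-cc' GROUP BY project) s
  ON b.project = s.project
ORDER BY speedup DESC
"""
for project, speedup in db.execute(query):
    print(f"{project}: {speedup:.2f}x")
```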