Benchmarks

All numbers are cold builds (after cargo clean) on a 48-core Linux server with nightly Rust.

Virtual slicer — rust-perf standard suite (not yet re-verified)

These single-crate numbers were measured without -Z threads=8 or the wild linker. They have not been re-verified with the current fair-RUSTFLAGS protocol and may overstate speedups (the same apples-to-oranges issue as the retracted workspace numbers below).

| Project | Baseline | cargo-slicer | Speedup |
|---|---|---|---|
| image 0.25.6 (lib) | 40,742 ms | 1,461 ms | 27.9× |
| ripgrep 14.1.1 (bin) | 24,094 ms | 5,891 ms | 4.09× |
| cargo 0.87.1 (workspace) | 133,797 ms | 61,922 ms | 2.16× |
| diesel 2.2.10 (lib) | 25,854 ms | 14,339 ms | 1.80× |
| syn 2.0.101 (lib) | 6,711 ms | 4,157 ms | 1.61× |
| serde 1.0.219 (lib) | 3,951 ms | 3,966 ms | 1.00× |

serde is already minimal — almost all of its code is reachable via derive macros. The slicer correctly identifies this.

Virtual slicer — real binary projects

All measurements use identical RUSTFLAGS for both baseline and vslice-cc (-Z threads=8 -C linker=clang -C link-arg=--ld-path=wild). 48-core machine, Apr 2026, 2–3 runs per mode.

| Project | Baseline | vslice-cc | Speedup | Notes |
|---|---|---|---|---|
| helix (16 local crates) | 68 s | 44 s | 1.55× | |
| ripgrep (50K LOC) | 10.5 s | 7 s | 1.50× | |
| zed (209 local crates) | 1098 s | 767 s | 1.43× | 76 driver, 131 skip |
| zeroclaw (4 local crates) | 686 s | 522 s | 1.31× | 3,786 stubs / ~241k mono items (1.6% overall, 4.4% bin) |
| nushell (41 local crates) | 103 s | 82 s | 1.26× | |

Retracted claims: nushell was reported at 5.1× — apples-to-oranges RUSTFLAGS mismatch; re-measured speedup is 1.26×. cargo-slicer (self) was claimed at 1.74× but re-verified at 1.00× (only 1 driver crate, 0 stubs).
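One way to guard against the RUSTFLAGS mismatch behind the retracted numbers is to derive both legs' environments from a single flag string. The following is a minimal sketch, not the project's actual harness: the helper name leg_env and the wrapper path are hypothetical, and it assumes the slicer leg is enabled through RUSTC_WRAPPER, as the corpus section of this page suggests for the userspace tool.

```python
import os

# Flags quoted from the protocol above; both legs must share them exactly.
FAIR_RUSTFLAGS = "-Z threads=8 -C linker=clang -C link-arg=--ld-path=wild"


def leg_env(wrapper=None, base=None):
    """Build the environment for one benchmark leg.

    `wrapper` is a hypothetical path to the slicer's rustc wrapper;
    None means the plain baseline leg.
    """
    env = dict(os.environ if base is None else base)
    env["RUSTFLAGS"] = FAIR_RUSTFLAGS
    if wrapper is None:
        env.pop("RUSTC_WRAPPER", None)  # baseline: no wrapper at all
    else:
        env["RUSTC_WRAPPER"] = wrapper
    return env


baseline_env = leg_env()
slicer_env = leg_env(wrapper="/path/to/slicer-wrapper")  # hypothetical path

# The only permitted difference between the legs is the wrapper itself.
assert baseline_env["RUSTFLAGS"] == slicer_env["RUSTFLAGS"]
```

Passing these dictionaries as the env argument of subprocess.run for each timed build would keep the comparison apples-to-apples by construction.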

Docker benchmarks (docker run cargo-slicer bench)

Fair comparison inside Docker: same nightly toolchain, cargo fetch before timing (excludes download time), cargo clean between baseline and slicer. Slicer timing includes cargo-slicer pre-analyze overhead.

| Project | Baseline | Slicer | Speedup |
|---|---|---|---|
| zed (209 crates) | 1149 s | 545 s | 2.11× |
| helix (16 crates) | 95 s | 59 s | 1.61× |
| zeroclaw (4 crates) | 842 s | 542 s | 1.55× |
| ripgrep (17 crates) | 15 s | 12 s | 1.31× |
| nushell (41 crates) | 118 s | 94 s | 1.25× |

Docker speedups are higher than bare-metal for large projects (zed 2.11× vs 1.43×) because fewer cores amplify the benefit of eliminating codegen work — less parallelism means each eliminated function saves more wall time.
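The core-count effect can be illustrated with a toy Amdahl-style model. This is a sketch under stated assumptions, not a fit to the measurements: the build is split into a serial part and perfectly parallel codegen work, and the workload numbers (300 s serial, 24,000 core-seconds of codegen, 40% eliminated) and the 8-core container size are hypothetical.

```python
def wall_time(serial_s, parallel_work_s, cores):
    """Wall time for a serial phase plus perfectly parallel codegen work."""
    return serial_s + parallel_work_s / cores


def speedup(serial_s, parallel_work_s, cores, eliminated_fraction):
    """Speedup from removing a fraction of the parallel codegen work."""
    base = wall_time(serial_s, parallel_work_s, cores)
    sliced = wall_time(serial_s, parallel_work_s * (1 - eliminated_fraction), cores)
    return base / sliced


# Hypothetical workload, compared on a small container and a 48-core host.
for cores in (8, 48):
    print(cores, round(speedup(300, 24_000, cores, 0.4), 2))
# 8 cores  -> 1.57
# 48 cores -> 1.33
```

With fewer cores the codegen term dominates wall time, so removing the same work yields a larger ratio. On 48 cores the fixed serial part takes up more of the build and caps the gain.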

# Run the benchmark yourself
docker build -t cargo-slicer .
docker run --rm -v /path/to/project:/workspace/project cargo-slicer bench

Warm-cache daemon — verified (Apr 2026)

Both baseline and warmed use nightly + -Z threads=8. Interleaved rounds, dispatch pre-warmed, rm -rf target/ before each run.

| Crate | Baseline | Warmed | Speedup |
|---|---|---|---|
| image 0.25 | 4.9 s | 2.1 s | 2.3× |
| syn 2.0 | 1.0 s | 0.66 s | 1.5× |

An earlier version of this table claimed 8.5× for image (40.7 s → 4.8 s) and 1.7× for syn (6.7 s → 4.0 s). Those baselines were measured without -Z threads=8 and the wild linker, while the warmed runs had them — the same apples-to-oranges error as the nushell 5.1×. cargo 0.87.1 (claimed 2.3×) is a regression with fair RUSTFLAGS: baseline 15 s vs warmed 64 s — dispatch overhead serializes what -Z threads=8 parallelizes across 48 cores.
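The size of the distortion can be checked with the figures quoted in this section. The snippet below is only arithmetic on those numbers; the function name is illustrative.

```python
def speedup(baseline_s, accelerated_s):
    """Speedup as baseline wall time divided by accelerated wall time."""
    return baseline_s / accelerated_s


# image: earlier claim (mismatched flags) vs re-measured (matched flags)
mismatched = speedup(40.7, 4.8)
fair = speedup(4.9, 2.1)
print(f"image: {mismatched:.1f}x claimed vs {fair:.1f}x verified")

# How much the baseline alone shrank between the two measurements
print(f"image baseline shrank {40.7 / 4.9:.1f}x")

# cargo 0.87.1 with fair flags: a value below 1 means a slowdown
print(f"cargo: {speedup(15, 64):.2f}x")
```

Most of the earlier 8.5× figure came from the slower baseline rather than from the warm cache: the baseline alone dropped by roughly 8× between the two measurements.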

A warm cache populated by one project is reused across all projects on the same machine.

Upstream -Z dead-fn-elimination patch

These numbers come from the in-tree rustc patch (src/upstream_patch/), which implements the same algorithm natively in the compiler.

| Project | Baseline | -Z dead-fn-elimination | Reduction |
|---|---|---|---|
| zed | 1,790 s | 1,238 s | −31%, 9.2 min saved |
| rustc workspace (67 crates) [1] | 336 s | 176 s | −48%, 2.7 min saved |
| ripgrep | 13 s | 13 s | break-even (all fns reachable) |

[1] Per @petrochenkov's V2 review feedback, the "rustc" row reflects x.py build compiler/rustc --stage 1, that is, the 67 workspace crates that make up librustc_driver.so, not the ~70-line rustc binary crate. The original "rustc" label was misleading.

Patched stage1 oracle (rust-1.90.0 stable, 2026-04-26)

The in-tree patch was rebuilt against rust-1.90.0 (commit 1159e78c) with [rust] debug-assertions = true, overflow-checks = true to runtime-check the V9 invariant reachable_set ⊆ post-BFS-set.

| Run | Wall time | Fns eliminated | Output check |
|---|---|---|---|
| stage1 baseline (ripgrep) | 62.1 s | 0 | runs |
| stage1 + -Z dead-fn-elim | 59.9 s | 904 | identical to baseline |

debug_assert holds — no ICE, binary correct under the seed-set invariant.

ASE 2026 corpus sweep — top 2,669 crates by downloads

Correctness validation on a representative slice of the ecosystem. Run via scripts/bench_ase_corpus.sh; library crates gated on build success, binary crates additionally smoke-tested with --version / --help.

V10/V11 reframing (2026-04-29): numbers split by crate kind. The in-tree -Z dead-fn-elimination flag is a no-op on libraries today (V1 early-return); the userspace cargo-slicer tool's RUSTC_WRAPPER pipeline does run on libraries. They are reported separately so the in-tree claim applies only to the binaries it actually runs on.

Binary subset (n=65) — relevant to in-tree -Z dead-fn-elimination:

| Metric | Value |
|---|---|
| Binary crates attempted | 65 |
| Both legs built | 59 |
| Slicer-only failures | 0 |
| Median build speedup | 1.38× |
| Mean build speedup | 2.45× |
| % speedup ≥ 1.0× | 69.5% |
| % speedup ≥ 1.5× | 45.8% |
| % speedup ≥ 2.0× | 27.1% |
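The gap between the median (1.38×) and the mean (2.45×) points to a right-skewed distribution, where a few crates with very large speedups pull the mean up. The sketch below shows how such a summary could be computed from per-crate speedups. The sample values are hypothetical and are not taken from the corpus CSV, and the function name is illustrative.

```python
from statistics import mean, median


def summarize(speedups):
    """Summary statistics in the same shape as the table above."""
    n = len(speedups)
    return {
        "n": n,
        "median": median(speedups),
        "mean": mean(speedups),
        "share_ge_1.0": sum(s >= 1.0 for s in speedups) / n,
        "share_ge_1.5": sum(s >= 1.5 for s in speedups) / n,
        "share_ge_2.0": sum(s >= 2.0 for s in speedups) / n,
    }


# Hypothetical sample: mostly modest speedups plus one large outlier.
sample = [0.8, 0.9, 1.0, 1.2, 1.4, 1.6, 2.0, 9.0]
for key, value in summarize(sample).items():
    print(f"{key}: {value:.3g}")
```

In this sample the single 9.0× outlier lifts the mean well above the median, the same pattern as in the binary subset. The median is therefore the more representative headline number.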

Library subset (n=2,538) — userspace cargo-slicer only, NOT the -Z flag: 2,393 of 2,538 libraries built under both legs with zero slicer-only failures; userspace median 1.50×. This number measures cross-crate orchestration in the userspace tool, not the single-crate in-tree flag.

Full corpus catalog (all 2,669 crates with rank, version, downloads, build times, and slicer status): ASE 2026 Corpus · CSV.

Full point-by-point response to the @petrochenkov V1–V11 review and reproduction instructions live in vadim-response-results.md on the cargo-slicer repository.

C/C++ projects — clang-daemon PCH acceleration

build-accelerate.sh (included in the image) auto-detects C/C++ projects and injects a precompiled header via clang-daemon. The technique eliminates repeated header parsing across parallel compilation units.

Already benchmarked (48-core server, Clang 21, -j48):

| Project | Stars | Files | Baseline | Accelerated | Speedup | Notes |
|---|---|---|---|---|---|---|
| Linux kernel 6.14 | 227k | 26,339 | ~890 s | ~730 s | 1.22× | GCC fallback for asm-heavy files |
| LLVM 20 | | ~2,873 | measured | measured | 1.22× | Clang 21 compiling Clang 20 |
| LLVM 21 | | ~2,873 | measured | measured | 1.24× | Self-hosted build |
| vim | | ~300 | baseline | accelerated | 1.3× | Small project, overhead minimal |
| sqlite3 | | 1 (amalgam) | 20 s | 20.2 s | 1.01× | Single-file; PCH gives nothing |

Predicted speedup for top starred projects (based on file count × header density model):

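| Rank | Project | Stars | Lang | Files | LOC | Build | Predicted | Reason |
|---|---|---|---|---|---|---|---|---|
| 1 | Linux | 227k | C | 26,339 | ~20M | Make | 1.2× ✅ | benchmarked |
| 2 | TensorFlow | 195k | C++ | ~650 | ~2.5M | Bazel/CMake | 1.15–1.25× | Heavy STL + proto headers |
| 3 | Godot | 109k | C++ | ~3,500 | ~8.6M | SCons | 1.2–1.3× | Large header graph |
| 4 | Electron | 121k | C++ | (Chromium) | ~25M | ninja | 1.2× | Chromium-scale header reuse |
| 5 | OpenCV | 87k | C++ | ~1,000 | ~600K | CMake | 1.15–1.2× | Dense OpenCV headers |
| 6 | FFmpeg | 58k | C | ~500 | ~1M | autotools | 1.1–1.2× | libav* headers per file |
| 7 | Bitcoin | 89k | C++ | ~500 | ~750K | CMake | 1.1–1.2× | Boost + secp256k1 headers |
| 8 | Netdata | 78k | C | ~700 | ~700K | CMake | 1.1–1.15× | Moderate header depth |
| 9 | Redis | 74k | C | ~250 | ~330K | Make | 1.05–1.1× | Shallow headers, small codebase |
| 10 | Git | 60k | C | ~400 | ~140K | Make | 1.05–1.1× | Minimal headers |
| | llama.cpp | 102k | C++ | ~150 | ~250K | CMake | 1.05× | Small; GGML headers not dense |
| | sqlite3 | | C | 1 | ~255K | Make | ≈1× | Amalgamation; no parallelism |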

Key insight: speedup scales with (files × header parse fraction). Projects with thousands of files each including the same heavyweight headers (Linux, Godot, TensorFlow, Chromium) get the most benefit. Single-file amalgamations (sqlite3) and projects with shallow headers (Redis, Git) get little to none.
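One simple way to read this relationship is an Amdahl-style toy model. It is an illustrative simplification, not the prediction model used for the table: every file is assumed to spend the same fraction of its compile time parsing shared headers, and the precompiled header is assumed to replace all of those parses with a single one. The function name and the header fractions below are hypothetical.

```python
def pch_speedup(files, header_fraction):
    """Toy estimate of PCH speedup.

    Baseline: each of `files` units costs 1.0, of which `header_fraction`
    is shared-header parsing. Accelerated: the non-header part of every
    unit, plus one header parse to build the PCH.
    """
    baseline = files * 1.0
    accelerated = files * (1.0 - header_fraction) + header_fraction
    return baseline / accelerated


# Hypothetical header fractions; file counts taken from the tables above.
print(f"1 file, 18% headers:      {pch_speedup(1, 0.18):.2f}x")
print(f"26,339 files, 18% headers: {pch_speedup(26_339, 0.18):.2f}x")
print(f"250 files, 6% headers:     {pch_speedup(250, 0.06):.2f}x")
```

In this model a single-file amalgamation gains nothing, because its one header parse is still needed. For large file counts the speedup approaches 1 / (1 − header_fraction), so header density rather than raw file count sets the ceiling.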

To run against any of these projects:

# Clone and accelerate (auto-detects C/C++ via compile_commands.json or Makefile)
git clone https://github.com/torvalds/linux
build-accelerate.sh ./linux

# Or via Docker (mounts your checkout)
docker run --rm --cpus=48 \
  -v $(pwd)/linux:/workspace/project \
  ghcr.io/yijunyu/cargo-slicer:latest

For projects using SCons (Godot) or Bazel (TensorFlow), generate compile_commands.json first:

# Godot
scons compiledb
# TensorFlow (CMake path)
cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -B build && cp build/compile_commands.json .

Running benchmarks yourself

# Multi-crate CI benchmark (7 projects, baseline vs vslice-cc, 3 runs each)
./scripts/ci_bench_multicrate.sh

# Individual project
./scripts/bench_fresh_build.sh nushell baseline 3
./scripts/bench_fresh_build.sh nushell vslice-cc 3

# RL training KPI report
cargo-slicer rl-bench --project /tmp/your-project --runs 2

Results are stored in bench-results.db (SQLite).