Benchmarks

Unless a section states otherwise, all numbers are cold builds (after cargo clean) on a 48-core Linux server with nightly Rust.

Virtual slicer — rust-perf standard suite (not yet re-verified)

These single-crate numbers were measured without -Z threads=8 or the wild linker. They have not been re-verified under the current fair-RUSTFLAGS protocol and may overstate speedups, for the same apples-to-oranges reason as the retracted numbers listed under "Retracted claims" below.

Project	Baseline	cargo-slicer	Speedup
image 0.25.6 (lib)	40,742 ms	1,461 ms	27.9×
ripgrep 14.1.1 (bin)	24,094 ms	5,891 ms	4.09×
cargo 0.87.1 (workspace)	133,797 ms	61,922 ms	2.16×
diesel 2.2.10 (lib)	25,854 ms	14,339 ms	1.80×
syn 2.0.101 (lib)	6,711 ms	4,157 ms	1.61×
serde 1.0.219 (lib)	3,951 ms	3,966 ms	1.00×

serde is already minimal — almost all of its code is reachable via derive macros. The slicer correctly identifies this.
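
The Speedup column is the baseline time divided by the cargo-slicer time. The short Python sketch below recomputes it from the millisecond values in the table; the names `TIMINGS_MS` and `speedup` are illustrative and are not part of cargo-slicer.

```python
# Timings in milliseconds copied from the table above: (baseline, cargo-slicer).
TIMINGS_MS = {
    "image 0.25.6": (40_742, 1_461),
    "ripgrep 14.1.1": (24_094, 5_891),
    "cargo 0.87.1": (133_797, 61_922),
    "diesel 2.2.10": (25_854, 14_339),
    "syn 2.0.101": (6_711, 4_157),
    "serde 1.0.219": (3_951, 3_966),
}


def speedup(baseline_ms: float, sliced_ms: float) -> float:
    """Baseline time divided by sliced time; values above 1.0 mean a faster build."""
    return baseline_ms / sliced_ms


if __name__ == "__main__":
    for project, (baseline, sliced) in TIMINGS_MS.items():
        print(f"{project:16s} {speedup(baseline, sliced):.2f}x")
```

A value that rounds to 1.00×, as for serde, means the two legs are within measurement noise of each other.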

Virtual slicer — real binary projects

All measurements use identical RUSTFLAGS for both baseline and vslice-cc (-Z threads=8 -C linker=clang -C link-arg=--ld-path=wild). 48-core machine, Apr 2026, 2–3 runs per mode.

Project	Baseline	vslice-cc	Speedup	Notes
helix (16 local crates)	68 s	44 s	1.55×
ripgrep (50K LOC)	10.5 s	7 s	1.50×
zed (209 local crates)	1098 s	767 s	1.43×	76 driver, 131 skip
zeroclaw (4 local crates)	686 s	522 s	1.31×	3,786 stubs / ~241k mono items (1.6% overall, 4.4% bin)
nushell (41 local crates)	103 s	82 s	1.26×

Retracted claims: nushell was reported at 5.1× — apples-to-oranges RUSTFLAGS mismatch; re-measured speedup is 1.26×. cargo-slicer (self) was claimed at 1.74× but re-verified at 1.00× (only 1 driver crate, 0 stubs).
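
One way to guard against this kind of mismatch is to refuse to compute a speedup when the two legs were built with different flags. The sketch below is a hypothetical helper, not code from the cargo-slicer benchmark scripts; `Leg` and `fair_speedup` are names invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leg:
    """One side of a benchmark: a label, the RUSTFLAGS used, and wall time in seconds."""
    name: str
    rustflags: str
    seconds: float


def fair_speedup(baseline: Leg, candidate: Leg) -> float:
    """Return baseline/candidate, rejecting legs that were built with different flags."""
    # Compare token by token so that extra whitespace does not cause a false mismatch.
    if baseline.rustflags.split() != candidate.rustflags.split():
        raise ValueError(
            f"RUSTFLAGS differ between {baseline.name!r} and {candidate.name!r}"
        )
    return baseline.seconds / candidate.seconds


# Flags from the protocol described above; timings from the nushell row.
FAIR_FLAGS = "-Z threads=8 -C linker=clang -C link-arg=--ld-path=wild"

if __name__ == "__main__":
    base = Leg("baseline", FAIR_FLAGS, 103.0)
    sliced = Leg("vslice-cc", FAIR_FLAGS, 82.0)
    print(f"nushell: {fair_speedup(base, sliced):.2f}x")
```

Passing a baseline leg with an empty flag string and a candidate leg with `FAIR_FLAGS` raises `ValueError` instead of producing an inflated ratio.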

Docker benchmarks (`docker run cargo-slicer bench`)

Fair comparison inside Docker: same nightly toolchain, cargo fetch before timing (excludes download time), cargo clean between baseline and slicer. Slicer timing includes cargo-slicer pre-analyze overhead.

Project	Baseline	Slicer	Speedup
zed (209 crates)	1149 s	545 s	2.11×
helix (16 crates)	95 s	59 s	1.61×
zeroclaw (4 crates)	842 s	542 s	1.55×
ripgrep (17 crates)	15 s	12 s	1.31×
nushell (41 crates)	118 s	94 s	1.25×

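Docker speedups are higher than bare-metal for the large projects (zed 2.11× vs 1.43×, zeroclaw 1.55× vs 1.31×) because fewer cores amplify the benefit of eliminating codegen work: with less parallelism available, each eliminated function saves more wall time. The effect does not carry over to the smaller projects: ripgrep drops from 1.50× to 1.31×, and nushell is essentially unchanged (1.25× vs 1.26×).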
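
The core-count argument can be illustrated with a toy Amdahl-style model. This is a sketch under stated assumptions, not a measurement: the split into a serial part plus perfectly parallel codegen, and every numeric input below (serial seconds, codegen core-seconds, a 40% elimination rate, an 8-core container) are hypothetical.

```python
def wall_time(serial_s: float, codegen_core_s: float, cores: int,
              eliminated: float = 0.0) -> float:
    """Wall time = serial part + remaining codegen work spread evenly over the cores."""
    return serial_s + codegen_core_s * (1.0 - eliminated) / cores


def model_speedup(serial_s: float, codegen_core_s: float, cores: int,
                  eliminated: float) -> float:
    """Speedup obtained by removing a fraction `eliminated` of the codegen work."""
    before = wall_time(serial_s, codegen_core_s, cores)
    after = wall_time(serial_s, codegen_core_s, cores, eliminated)
    return before / after


if __name__ == "__main__":
    # Hypothetical workload: 100 s serial, 10,000 core-seconds of codegen, 40% removed.
    for cores in (8, 48):
        print(cores, "cores:", round(model_speedup(100.0, 10_000.0, cores, 0.4), 2))
```

In this model the serial part is a larger share of wall time on many cores, so removing codegen work helps less there; the speedup can never exceed 1 / (1 - eliminated).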

# Run the benchmark yourself
docker build -t cargo-slicer .
docker run --rm -v /path/to/project:/workspace/project cargo-slicer bench

Warm-cache daemon — verified (Apr 2026)

Both baseline and warmed use nightly + -Z threads=8. Interleaved rounds, dispatch pre-warmed, rm -rf target/ before each run.

Crate	Baseline	Warmed	Speedup
image 0.25	4.9 s	2.1 s	2.3×
syn 2.0	1.0 s	0.66 s	1.5×

An earlier version of this table claimed 8.5× for image (40.7 s → 4.8 s) and 1.7× for syn (6.7 s → 4.0 s). Those baselines were measured without -Z threads=8 and the wild linker, while the warmed runs had them. This is the same apples-to-oranges error as the nushell 5.1× claim.

cargo 0.87.1, previously claimed at 2.3×, is a regression under fair RUSTFLAGS: baseline 15 s vs warmed 64 s (about 0.23×). Dispatch overhead serializes work that -Z threads=8 parallelizes across 48 cores.

A warm cache populated by one project is reused across all projects on the same machine.

Upstream `-Z dead-fn-elimination` patch

These numbers come from the in-tree rustc patch (src/upstream_patch/), which implements the same algorithm natively in the compiler.

Project	Baseline	-Z dead-fn-elimination	Reduction
zed	1,790 s	1,238 s	−31%, 9.2 min saved
rustc workspace (67 crates) ¹	336 s	176 s	−48%, 2.7 min saved
ripgrep	13 s	13 s	break-even (all fns reachable)

¹ Per @petrochenkov's V2 review feedback, the "rustc" row reflects x.py build compiler/rustc --stage 1, that is, the 67 workspace crates that make up librustc_driver.so, not the ~70-line rustc binary crate. The original "rustc" label was misleading.
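
The Reduction column combines a percentage and the absolute time saved. A minimal sketch of that arithmetic, using the values from the table (the function name `reduction` is illustrative):

```python
def reduction(baseline_s: float, patched_s: float) -> tuple[float, float]:
    """Return (percent reduction in wall time, minutes saved)."""
    saved_s = baseline_s - patched_s
    return saved_s / baseline_s * 100.0, saved_s / 60.0


if __name__ == "__main__":
    for project, baseline, patched in [
        ("zed", 1790, 1238),
        ("rustc workspace", 336, 176),
        ("ripgrep", 13, 13),
    ]:
        pct, minutes = reduction(baseline, patched)
        print(f"{project}: -{pct:.0f}%, {minutes:.1f} min saved")
```

Note that a reduction percentage is not the same as a speedup factor: a 31% reduction corresponds to 1790 / 1238, or roughly 1.45×.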

Patched stage1 oracle (rust-1.90.0 stable, 2026-04-26)

The in-tree patch was rebuilt against rust-1.90.0 (commit 1159e78c) with [rust] debug-assertions = true, overflow-checks = true to runtime-check the V9 invariant reachable_set ⊆ post-BFS-set.

Run	Wall time	Fns eliminated	Output check
stage1 baseline (ripgrep)	62.1 s	0	runs
stage1 + `-Z dead-fn-elimination`	59.9 s	904	identical to baseline

debug_assert holds — no ICE, binary correct under the seed-set invariant.
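
To make the invariant concrete, the sketch below runs a breadth-first search over a toy call graph and checks that nothing reachable from the seed set is dropped. It is an illustration only: the graph, the function names, and this particular reading of the invariant are assumptions, and the code is not taken from the rustc patch.

```python
from collections import deque


def bfs_reachable(call_graph: dict, seeds: list) -> set:
    """Return every function reachable from the seeds by following call edges."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def closed_under_calls(call_graph: dict, kept: set) -> bool:
    """True if every callee of a kept function is itself kept."""
    return all(callee in kept for fn in kept for callee in call_graph.get(fn, ()))


def eliminate(call_graph: dict, seeds: list) -> tuple:
    """Split functions into (kept, eliminated) and check the toy invariant."""
    kept = bfs_reachable(call_graph, seeds)
    # Toy analogue of the debug assertion: the seeds are kept and the kept set is
    # closed under calls, so no function reachable from a seed has been dropped.
    assert set(seeds) <= kept and closed_under_calls(call_graph, kept)
    return kept, set(call_graph) - kept


# Hypothetical call graph: caller -> list of callees.
TOY_GRAPH = {
    "main": ["parse_args", "run"],
    "run": ["search"],
    "search": [],
    "parse_args": [],
    "unused_helper": ["unused_leaf"],
    "unused_leaf": [],
}

if __name__ == "__main__":
    kept, eliminated = eliminate(TOY_GRAPH, ["main"])
    print("kept:", sorted(kept))
    print("eliminated:", sorted(eliminated))
```

In a real compiler the seed set would also have to cover entry points that are not visible as direct calls, which is why a runtime check of this kind is useful.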

ASE 2026 corpus sweep — top 2,669 crates by downloads

Correctness validation on a representative slice of the ecosystem. Run via scripts/bench_ase_corpus.sh; library crates gated on build success, binary crates additionally smoke-tested with --version / --help.

V10/V11 reframing (2026-04-29): numbers split by crate kind. The in-tree -Z dead-fn-elimination flag is a no-op on libraries today (V1 early-return); the userspace cargo-slicer tool's RUSTC_WRAPPER pipeline does run on libraries. They are reported separately so the in-tree claim applies only to the binaries it actually runs on.

Binary subset (n=65) — relevant to in-tree -Z dead-fn-elimination:

Metric	Value
Binary crates attempted	65
Both legs built	59
Slicer-only failures	0
Median build speedup	1.38×
Mean build speedup	2.45×
% speedup ≥ 1.0×	69.5%
% speedup ≥ 1.5×	45.8%
% speedup ≥ 2.0×	27.1%
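
The gap between the median (1.38×) and the mean (2.45×) indicates a skewed distribution in which a few crates with large speedups pull the mean up. The sketch below shows how such a summary can be computed; the sample list is hypothetical and is not the corpus data.

```python
from statistics import mean, median


def summarize(speedups: list) -> dict:
    """Median, mean, and the share of crates at or above each speedup threshold."""
    n = len(speedups)

    def share(threshold: float) -> float:
        # Percentage of entries whose speedup is at least `threshold`.
        return 100.0 * sum(s >= threshold for s in speedups) / n

    return {
        "median": median(speedups),
        "mean": mean(speedups),
        "pct_ge_1.0": share(1.0),
        "pct_ge_1.5": share(1.5),
        "pct_ge_2.0": share(2.0),
    }


# Hypothetical per-crate speedups with one large outlier.
SAMPLE = [0.8, 0.9, 1.0, 1.2, 1.4, 1.6, 2.2, 9.0]

if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(f"{key}: {value:.2f}")
```

Because of this skew, the median is the more representative figure for a typical crate.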

Library subset (n=2,538) — userspace cargo-slicer only, NOT the -Z flag: 2,393 of 2,538 libraries built under both legs with zero slicer-only failures; userspace median 1.50×. This number measures cross-crate orchestration in the userspace tool, not the single-crate in-tree flag.

Full corpus catalog (all 2,669 crates with rank, version, downloads, build times, and slicer status): ASE 2026 Corpus · CSV.

Full point-by-point response to the @petrochenkov V1–V11 review and reproduction instructions live in vadim-response-results.md on the cargo-slicer repository.

C/C++ projects — clang-daemon PCH acceleration

build-accelerate.sh (included in the image) auto-detects C/C++ projects and injects a precompiled header via clang-daemon. The technique eliminates repeated header parsing across parallel compilation units.

Already benchmarked (48-core server, Clang 21, -j48):

Project	Stars	Files	Baseline	Accelerated	Speedup	Notes
Linux kernel 6.14	227k	26,339	~890 s	~730 s	1.22×	GCC fallback for asm-heavy files
LLVM 20	—	~2,873	measured	measured	1.22×	Clang 21 compiling Clang 20
LLVM 21	—	~2,873	measured	measured	1.24×	Self-hosted build
vim	—	~300	baseline	accelerated	1.3×	Small project, overhead minimal
sqlite3	—	1 (amalgam)	20 s	20.2 s	0.99×	Single-file; PCH gives nothing

Predicted speedup for top starred projects (based on file count × header density model):

Rank	Project	Stars	Lang	Files	LOC	Build	Predicted	Reason
1	Linux	227k	C	26,339	~20M	Make	1.2×	✅ benchmarked
2	TensorFlow	195k	C++	~650	~2.5M	Bazel/CMake	1.15–1.25×	Heavy STL + proto headers
3	Godot	109k	C++	~3,500	~8.6M	SCons	1.2–1.3×	Large header graph
4	Electron	121k	C++	(Chromium)	~25M	ninja	1.2×	Chromium-scale header reuse
5	OpenCV	87k	C++	~1,000	~600K	CMake	1.15–1.2×	Dense OpenCV headers
6	FFmpeg	58k	C	~500	~1M	autotools	1.1–1.2×	libav* headers per file
7	Bitcoin	89k	C++	~500	~750K	CMake	1.1–1.2×	Boost + secp256k1 headers
8	Netdata	78k	C	~700	~700K	CMake	1.1–1.15×	Moderate header depth
9	Redis	74k	C	~250	~330K	Make	1.05–1.1×	Shallow headers, small codebase
10	Git	60k	C	~400	~140K	Make	1.05–1.1×	Minimal headers
—	llama.cpp	102k	C++	~150	~250K	CMake	1.05×	Small; GGML headers not dense
—	sqlite3	—	C	1	~255K	Make	≈1×	Amalgamation; no parallelism

Key insight: speedup scales with (files × header parse fraction). Projects with thousands of files each including the same heavyweight headers (Linux, Godot, TensorFlow, Chromium) get the most benefit. Single-file amalgamations (sqlite3) and projects with shallow headers (Redis, Git) get little to none.
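
As a rough way to reason about this, suppose a precompiled header removes a fraction h of total build time; the speedup is then 1 / (1 - h). The sketch below uses that simplification, which is an assumption made here and is not the prediction model behind the table: it ignores parallelism, the cost of building the PCH, and files that fall back to another compiler.

```python
def speedup_from_fraction(header_fraction: float) -> float:
    """Speedup if a fraction `header_fraction` of build time is eliminated."""
    if not 0.0 <= header_fraction < 1.0:
        raise ValueError("header_fraction must be in [0, 1)")
    return 1.0 / (1.0 - header_fraction)


def implied_fraction(measured_speedup: float) -> float:
    """Invert the model: build-time fraction that a measured speedup would imply."""
    return 1.0 - 1.0 / measured_speedup


if __name__ == "__main__":
    # Measured speedups taken from the benchmark table above.
    for project, measured in [("Linux kernel 6.14", 1.22), ("LLVM 21", 1.24)]:
        print(f"{project}: about {implied_fraction(measured):.0%} of build time removed")
```

Under this simplification, a project where shared header parsing is a negligible share of build time, such as a single-file amalgamation, has h close to 0 and a speedup close to 1×.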

To run against any of these projects:

# Clone and accelerate (auto-detects C/C++ via compile_commands.json or Makefile)
git clone https://github.com/torvalds/linux
build-accelerate.sh ./linux

# Or via Docker (mounts your checkout)
docker run --rm --cpus=48 \
  -v "$(pwd)/linux:/workspace/project" \
  ghcr.io/yijunyu/cargo-slicer:latest

For projects using SCons (Godot) or Bazel (TensorFlow), generate compile_commands.json first:
# Godot
scons compiledb=yes
# TensorFlow (CMake path)
cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -B build && cp build/compile_commands.json .

Running benchmarks yourself

# Multi-crate CI benchmark (7 projects, baseline vs vslice-cc, 3 runs each)
./scripts/ci_bench_multicrate.sh

# Individual project
./scripts/bench_fresh_build.sh nushell baseline 3
./scripts/bench_fresh_build.sh nushell vslice-cc 3

# RL training KPI report
cargo-slicer rl-bench --project /tmp/your-project --runs 2

Results are stored in bench-results.db (SQLite).