Benchmarks

All numbers are cold builds (after cargo clean) on a 48-core Linux server with nightly Rust.

Virtual slicer — rust-perf standard suite (not yet re-verified)

These single-crate numbers were measured without -Z threads=8 or the wild linker. They have not been re-verified with the current fair-RUSTFLAGS protocol and may overstate speedups (the same apples-to-oranges issue as the retracted workspace numbers below).

| Project | Baseline | cargo-slicer | Speedup |
|---|---|---|---|
| image 0.25.6 (lib) | 40,742 ms | 1,461 ms | 27.9× |
| ripgrep 14.1.1 (bin) | 24,094 ms | 5,891 ms | 4.09× |
| cargo 0.87.1 (workspace) | 133,797 ms | 61,922 ms | 2.16× |
| diesel 2.2.10 (lib) | 25,854 ms | 14,339 ms | 1.80× |
| syn 2.0.101 (lib) | 6,711 ms | 4,157 ms | 1.61× |
| serde 1.0.219 (lib) | 3,951 ms | 3,966 ms | 1.00× |

serde is already minimal — almost all of its code is reachable via derive macros. The slicer correctly identifies this.

Virtual slicer — real binary projects

All measurements use identical RUSTFLAGS for both baseline and vslice-cc (-Z threads=8 -C linker=clang -C link-arg=--ld-path=wild). 48-core machine, Apr 2026, 2–3 runs per mode.
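The fair protocol boils down to pinning one environment variable identically for both modes. A minimal sketch (the clang and wild linker paths are machine-specific assumptions; adjust for your toolchain):

```sh
# Same RUSTFLAGS for baseline and vslice-cc -- letting these flags differ
# between modes is exactly the apples-to-oranges error behind the retracted
# numbers. The linker / ld-path values below are machine-specific assumptions.
export RUSTFLAGS='-Z threads=8 -C linker=clang -C link-arg=--ld-path=wild'
echo "both modes build with: $RUSTFLAGS"
```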

| Project | Baseline | vslice-cc | Speedup | Notes |
|---|---|---|---|---|
| helix (16 local crates) | 68 s | 44 s | 1.55× | |
| ripgrep (50K LOC) | 10.5 s | 7 s | 1.50× | |
| zed (209 local crates) | 1098 s | 767 s | 1.43× | 76 driver, 131 skip |
| zeroclaw (4 local crates) | 686 s | 522 s | 1.31× | 3,786 stubs / ~241k mono items (1.6% overall, 4.4% bin) |
| nushell (41 local crates) | 103 s | 82 s | 1.26× | |

Retracted claims: nushell was reported at 5.1× — apples-to-oranges RUSTFLAGS mismatch; honest speedup is 1.26×. cargo-slicer (self) was claimed at 1.74× but re-verified at 1.00× (only 1 driver crate, 0 stubs).

Docker benchmarks (docker run cargo-slicer bench)

Fair comparison inside Docker: same nightly toolchain, cargo fetch before timing (excludes download time), cargo clean between baseline and slicer. Slicer timing includes cargo-slicer pre-analyze overhead.

| Project | Baseline | Slicer | Speedup |
|---|---|---|---|
| zed (209 crates) | 1149 s | 545 s | 2.11× |
| helix (16 crates) | 95 s | 59 s | 1.61× |
| zeroclaw (4 crates) | 842 s | 542 s | 1.55× |
| ripgrep (17 crates) | 15 s | 12 s | 1.31× |
| nushell (41 crates) | 118 s | 94 s | 1.25× |

Docker speedups are higher than bare-metal for large projects (zed 2.11× vs 1.43×) because fewer cores amplify the benefit of eliminating codegen work — less parallelism means each eliminated function saves more wall time.

```sh
# Run the benchmark yourself
docker build -t cargo-slicer .
docker run --rm -v /path/to/project:/workspace/project cargo-slicer bench
```

Warm-cache daemon — verified (Apr 2026)

Both baseline and warmed use nightly + -Z threads=8. Interleaved rounds, dispatch pre-warmed, rm -rf target/ before each run.
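The interleaving can be sketched as a small harness. This is an illustrative shape, not the project's actual script: the build commands are placeholders, and the warmed rounds assume the dispatch daemon was started beforehand.

```sh
# Interleaved A/B rounds: alternating modes within each round cancels
# thermal and page-cache drift that two back-to-back batches would bake in.
run_round() {                        # $1 = label, $2 = build command
  rm -rf target/                     # cold build: wipe artifacts each round
  start=$(date +%s)
  sh -c "$2" >/dev/null 2>&1
  echo "$1: $(( $(date +%s) - start )) s"
}
for round in 1 2 3; do               # placeholder commands; substitute your own
  run_round baseline "cargo +nightly build"
  run_round warmed   "cargo +nightly build"   # daemon pre-warmed beforehand
done
```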

| Crate | Baseline | Warmed | Speedup |
|---|---|---|---|
| image 0.25 | 4.9 s | 2.1 s | 2.3× |
| syn 2.0 | 1.0 s | 0.66 s | 1.5× |

An earlier version of this table claimed 8.5× for image (40.7 s → 4.8 s) and 1.7× for syn (6.7 s → 4.0 s). Those baselines were measured without -Z threads=8 and the wild linker, while the warmed runs had them — the same apples-to-oranges error as the nushell 5.1×. cargo 0.87.1 (claimed 2.3×) is a regression with fair RUSTFLAGS: baseline 15 s vs warmed 64 s — dispatch overhead serializes what -Z threads=8 parallelizes across 48 cores.

A warm cache populated by one project is reused across all projects on the same machine.

Upstream -Z dead-fn-elimination patch

| Project | Baseline | -Z dead-fn-elimination | Reduction |
|---|---|---|---|
| zed | 1,790 s | 1,238 s | −31%, 9.2 min saved |
| rustc | 336 s | 176 s | −48%, 2.7 min saved |
| ripgrep | 13 s | 13 s | break-even (all fns reachable) |

C/C++ projects — clang-daemon PCH acceleration

build-accelerate.sh (included in the image) auto-detects C/C++ projects and injects a precompiled header via clang-daemon. The technique eliminates repeated header parsing across parallel compilation units.
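The injection can be sketched by hand with plain clang flags. This is a toy version of the technique (file names and header contents are illustrative); build-accelerate.sh and clang-daemon apply the same idea across the real compile graph automatically.

```sh
# Hand-rolled PCH demo: parse the shared headers once, reuse across TUs.
# File names and header contents are illustrative, not from the tool.
command -v clang >/dev/null 2>&1 || { echo "clang not found; skipping"; exit 0; }
dir=$(mktemp -d) && cd "$dir"
printf '#include <stdio.h>\n#include <stdlib.h>\n' > common.h
printf 'int main(void){ puts("a"); return 0; }\n' > a.c
printf 'int main(void){ puts("b"); return 0; }\n' > b.c
clang -x c-header common.h -o common.h.pch   # parse the headers exactly once
clang -include-pch common.h.pch a.c -o a     # each TU reuses the parsed header
clang -include-pch common.h.pch b.c -o b
./a && ./b
```

With `-include-pch`, the declarations from `common.h` are injected into each translation unit without re-parsing the header text, which is where the savings come from.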

Already benchmarked (48-core server, Clang 21, -j48):

| Project | Stars | Files | Baseline | Accelerated | Speedup | Notes |
|---|---|---|---|---|---|---|
| Linux kernel 6.14 | 227k | 26,339 | ~890 s | ~730 s | 1.22× | GCC fallback for asm-heavy files |
| LLVM 20 | — | ~2,873 | measured | measured | 1.22× | Clang 21 compiling Clang 20 |
| LLVM 21 | — | ~2,873 | measured | measured | 1.24× | Self-hosted build |
| vim | — | ~300 | baseline | accelerated | 1.3× | Small project, overhead minimal |
| sqlite3 | — | 1 (amalgam) | 20 s | 20.2 s | 1.01× | Single-file; PCH gives nothing |

Predicted speedup for top starred projects (based on file count × header density model):

| Rank | Project | Stars | Lang | Files | LOC | Build | Predicted | Reason |
|---|---|---|---|---|---|---|---|---|
| 1 | Linux | 227k | C | 26,339 | ~20M | Make | 1.2× | ✅ benchmarked |
| 2 | TensorFlow | 195k | C++ | ~650 | ~2.5M | Bazel/CMake | 1.15–1.25× | Heavy STL + proto headers |
| 3 | Godot | 109k | C++ | ~3,500 | ~8.6M | SCons | 1.2–1.3× | Large header graph |
| 4 | Electron | 121k | C++ | (Chromium) | ~25M | ninja | 1.2× | Chromium-scale header reuse |
| 5 | OpenCV | 87k | C++ | ~1,000 | ~600K | CMake | 1.15–1.2× | Dense OpenCV headers |
| 6 | FFmpeg | 58k | C | ~500 | ~1M | autotools | 1.1–1.2× | libav* headers per file |
| 7 | Bitcoin | 89k | C++ | ~500 | ~750K | CMake | 1.1–1.2× | Boost + secp256k1 headers |
| 8 | Netdata | 78k | C | ~700 | ~700K | CMake | 1.1–1.15× | Moderate header depth |
| 9 | Redis | 74k | C | ~250 | ~330K | Make | 1.05–1.1× | Shallow headers, small codebase |
| 10 | Git | 60k | C | ~400 | ~140K | Make | 1.05–1.1× | Minimal headers |
| — | llama.cpp | 102k | C++ | ~150 | ~250K | CMake | 1.05× | Small; GGML headers not dense |
| — | sqlite3 | — | C | 1 | ~255K | Make | ≈1× | Amalgamation; no parallelism |

Key insight: speedup scales with (files × header parse fraction). Projects with thousands of files each including the same heavyweight headers (Linux, Godot, TensorFlow, Chromium) get the most benefit. Single-file amalgamations (sqlite3) and projects with shallow headers (Redis, Git) get little to none.
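That scaling can be put in Amdahl form: if the PCH eliminates the header-parse fraction p of every translation unit's compile time, the whole-build speedup is 1/(1−p). A quick sketch (the p values below are illustrative guesses, not measurements from the table):

```sh
# Amdahl-style prediction: PCH removes fraction p of each TU's compile time,
# so the overall speedup is 1 / (1 - p). The p values are illustrative.
predict() { awk -v p="$1" 'BEGIN { printf "%.2f\n", 1 / (1 - p) }'; }
predict 0.17   # Linux-like header density   -> 1.20
predict 0.05   # shallow headers (Redis/Git) -> 1.05
predict 0.00   # amalgamation (sqlite3)      -> 1.00
```

This is why file count matters: p is a per-file fraction, so a build with thousands of files pays the header tax thousands of times, while an amalgamation pays it once.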

To run against any of these projects:

```sh
# Clone and accelerate (auto-detects C/C++ via compile_commands.json or Makefile)
git clone https://github.com/torvalds/linux
build-accelerate.sh ./linux

# Or via Docker (mounts your checkout)
docker run --rm --cpus=48 \
  -v $(pwd)/linux:/workspace/project \
  ghcr.io/yijunyu/cargo-slicer:latest
```

For projects using SCons (Godot) or Bazel (TensorFlow), generate compile_commands.json first:

```sh
# Godot
scons compiledb
# TensorFlow (CMake path)
cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -B build && cp build/compile_commands.json .
```

Running benchmarks yourself

```sh
# Multi-crate CI benchmark (7 projects, baseline vs vslice-cc, 3 runs each)
./scripts/ci_bench_multicrate.sh

# Individual project
./scripts/bench_fresh_build.sh nushell baseline 3
./scripts/bench_fresh_build.sh nushell vslice-cc 3

# RL training KPI report
cargo-slicer rl-bench --project /tmp/your-project --runs 2
```

Results are stored in bench-results.db (SQLite).
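Since the results database is plain SQLite, it can be queried directly. The table and column names below (`runs`, `project`/`mode`/`seconds`) are assumptions about the schema, not confirmed by the source; inspect the real layout with `sqlite3 bench-results.db .schema` before adapting the query. The sketch builds a throwaway demo database in that assumed shape:

```sh
# Sketch: average per-mode build time from a bench database. Table `runs`
# and its columns are an ASSUMED schema -- verify against the real
# bench-results.db with `.schema` first.
command -v sqlite3 >/dev/null 2>&1 || { echo "sqlite3 not found; skipping"; exit 0; }
db=$(mktemp -u).db   # throwaway demo db standing in for bench-results.db
sqlite3 "$db" 'CREATE TABLE runs (project TEXT, mode TEXT, seconds REAL);
               INSERT INTO runs VALUES ("nushell","baseline",103),
                                       ("nushell","vslice-cc",82);'
sqlite3 "$db" 'SELECT project, mode, AVG(seconds) FROM runs GROUP BY project, mode;'
```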