ASE 2026 Corpus

The ASE 2026 corpus is the empirical evaluation set for cargo-slicer's correctness and speedup claims. It is the top 2,669 crates by all-time downloads on crates.io, fetched 2026-04-26.

This page is the canonical reference for the corpus. Other documents and README sections that mention "the ASE 2026 corpus sweep" link here.

Headline numbers

Note (2026-04-29 reframing). Per follow-up review V10/V11 from @petrochenkov, speedup numbers are split by crate kind. The in-tree -Z dead-fn-elimination flag is a no-op on libraries today (early-return when entry_fn().is_none()), so library numbers cannot be folded into a single headline alongside binary numbers. The userspace cargo-slicer tool's RUSTC_WRAPPER pipeline does run on libraries, and its numbers are reported separately below. See docs/vadim-response-results.md for the full V10/V11 discussion.

Corpus shape (single sweep, both legs)

| Metric | Value |
|---|---|
| Crates fetched | 2,669 |
| Tarball / extract errors | 10 |
| Crates that ran | 2,603 |
| Library crates | 2,538 |
| Binary crates | 65 |
| Both legs built (clean compare) | 2,452 |
| Baseline-only failures | 151 |
| Slicer-only correctness regressions | 0 |

Across the full corpus the slicer leg never failed when the baseline succeeded — this correctness statement holds for both kinds.
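
The counts above and in the two per-kind tables below can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python: the numbers are copied from this page, while the variable names and the `adds_up` helper are illustrative and not part of the harness.

```python
# Counts copied from the tables on this page. Names are illustrative only.
fetched = 2669
tarball_errors = 10
ran = 2603

lib = {"attempted": 2538, "both_built": 2393, "baseline_only": 145, "slicer_only": 0}
binary = {"attempted": 65, "both_built": 59, "baseline_only": 6, "slicer_only": 0}


def adds_up(kind):
    """True if every attempted crate lands in exactly one outcome row."""
    outcomes = kind["both_built"] + kind["baseline_only"] + kind["slicer_only"]
    return outcomes == kind["attempted"]


# Per-kind rows summed column by column should reproduce the corpus-shape table.
totals = {key: lib[key] + binary[key] for key in lib}

# Crates that were fetched but neither failed at extraction nor ran.
unaccounted = fetched - tarball_errors - ran

print(adds_up(lib), adds_up(binary))
print(totals)
print(unaccounted)
```

The 56 crates left over between "fetched" and "ran" are not broken out in the table; they presumably correspond to the not_run status described under "Full per-crate data".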

Binary subset (n=65) — relevant to in-tree -Z dead-fn-elimination

| Metric | Value |
|---|---|
| Binary crates attempted | 65 |
| Both legs built | 59 |
| Baseline-only failures | 6 |
| Slicer-only failures | 0 |
| Median build speedup | 1.38× |
| Mean build speedup | 2.45× |
| % speedup ≥ 1.0× | 69.5% |
| % speedup ≥ 1.5× | 45.8% |
| % speedup ≥ 2.0× | 27.1% |
| 10th percentile | 0.67× |
| 90th percentile | 3.58× |

Library subset (n=2,538) — userspace cargo-slicer only, NOT the -Z flag

| Metric | Value |
|---|---|
| Library crates attempted | 2,538 |
| Both legs built | 2,393 |
| Baseline-only failures | 145 |
| Slicer-only failures | 0 |
| Median build speedup (userspace tool) | 1.50× |
| Mean build speedup (userspace tool) | 3.99× |
| 10th percentile | 0.65× |
| 90th percentile | 7.42× |

The library median is not a claim about -Z dead-fn-elimination. It is a measurement of cross-crate orchestration in the userspace tool, which is where (per V11) the algorithm actually earns its keep — single-crate elimination overlaps heavily with -Wunused + monomorphization + LLVM DCE.
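
The summary rows in the two per-kind tables (median, mean, threshold shares, percentiles) can be derived from a list of per-crate speedups. The sketch below shows one way to do that in Python. The `summarize` function, its key names, and the sample values are hypothetical, and the percentile interpolation method is an assumption, since this page does not say which one scripts/aggregate_ase_results.py uses.

```python
import statistics


def summarize(speedups):
    """Summary rows for one crate kind, given per-crate speedups
    (baseline seconds divided by slicer seconds)."""
    n = len(speedups)
    # Nine cut points: the first is the 10th percentile, the last the 90th.
    # The "inclusive" interpolation method is an assumption.
    deciles = statistics.quantiles(speedups, n=10, method="inclusive")

    def share_at_least(threshold):
        # Percentage of crates whose speedup meets the threshold.
        return 100.0 * sum(s >= threshold for s in speedups) / n

    return {
        "median": statistics.median(speedups),
        "mean": statistics.mean(speedups),
        "pct_ge_1.0": share_at_least(1.0),
        "pct_ge_1.5": share_at_least(1.5),
        "pct_ge_2.0": share_at_least(2.0),
        "p10": deciles[0],
        "p90": deciles[-1],
    }


# Hypothetical speedups for illustration, not corpus data.
sample = [0.5, 0.8, 1.0, 1.2, 1.5, 2.0, 4.0, 9.0]
summary = summarize(sample)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```

A mean well above the median, as in both tables, indicates a right-skewed distribution: a few very large speedups pull the mean up, which is why the median is the more representative single figure.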

Speedup distribution

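| Bucket | Crates | % of compared |
|---|---|---|
| < 0.5× (regression) | 128 | 5.2% |
| 0.5 – 0.8× | 281 | 11.5% |
| 0.8 – 1.0× | 251 | 10.2% |
| 1.0 – 1.5× | 571 | 23.3% |
| 1.5 – 2.0× | 341 | 13.9% |
| 2.0 – 5.0× | 519 | 21.2% |
| 5.0 – 20.0× | 272 | 11.1% |
| ≥ 20.0× | 89 | 3.6% |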

The wall-time regression tail (speedup < 1.0×, 660 crates, 26.9% of those compared) is concentrated on tiny crates where the slicer's per-invocation overhead dominates a sub-1-second baseline build. None of these 660 crates is a correctness regression: each one passed the correctness check on both legs (build success for libraries, smoke tests for binaries) and simply took longer to build under the slicer than under the baseline. For full reproducibility, all of them are kept in the corpus and in the published CSV.
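
The percentages in the distribution table and the 26.9% figure follow directly from the bucket counts. Here is a short Python sketch that recomputes them and shows how a single speedup could be assigned to a bucket. The counts and edges are copied from the table; the names, and the choice to treat each bucket's lower bound as inclusive, are assumptions.

```python
from bisect import bisect_right

# Bucket edges and crate counts copied from the distribution table.
EDGES = [0.5, 0.8, 1.0, 1.5, 2.0, 5.0, 20.0]
LABELS = ["< 0.5", "0.5 - 0.8", "0.8 - 1.0", "1.0 - 1.5",
          "1.5 - 2.0", "2.0 - 5.0", "5.0 - 20.0", ">= 20.0"]
COUNTS = [128, 281, 251, 571, 341, 519, 272, 89]


def bucket_index(speedup):
    """Index of the bucket holding `speedup`.
    Lower bounds are treated as inclusive (an assumption)."""
    return bisect_right(EDGES, speedup)


compared = sum(COUNTS)  # crates where both legs built
shares = [round(100.0 * count / compared, 1) for count in COUNTS]

# The first three buckets are the ones below 1.0x.
slower = sum(COUNTS[:3])
slower_share = round(100.0 * slower / compared, 1)

for label, count, share in zip(LABELS, COUNTS, shares):
    print(f"{label:>10}  {count:>4}  {share:>5}%")
print("slower than baseline:", slower, f"({slower_share}%)")
```

The bucket counts sum to 2,452, the number of crates where both legs built, so the percentages are relative to compared crates rather than to all crates fetched.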

Methodology

# 1. Fetch top 2,669 crates by downloads
ase2026d/crates-corpus/fetch-crates.sh

# 2. Run baseline + slicer back-to-back on each tarball, 8-way parallel
for f in fetch/*.crate; do
    while [ "$(jobs -r -p | wc -l)" -ge 8 ]; do wait -n; done
    NAME=...; VERSION=...
    ./scripts/bench_ase_corpus.sh "$NAME" "$VERSION" &
done
wait

# 3. Aggregate
python3 scripts/aggregate_ase_results.py > results/aggregate_full.json

For each crate the harness:

  1. Extracts the tarball.
  2. Runs cargo +nightly build --release --offline (baseline); if that fails, retries once online so that missing transitive deps can be fetched.
  3. Runs cargo +nightly build --release again under RUSTC_WRAPPER=cargo_slicer_dispatch with CARGO_SLICER_VIRTUAL=1 CARGO_SLICER_CODEGEN_FILTER=1.
  4. For binary crates: smoke-tests the produced binary with --version then --help (both legs must succeed for "correctness_ok").
  5. For library crates: build success is the correctness signal — cargo test is intentionally skipped on the slicer leg because the userspace slicer over-stubs #[test] functions (V7 issue, orthogonal to dead-fn-elimination's binary-output property).
  6. Records (baseline_secs, slicer_secs, speedup, correctness_ok) to results/<name>-<version>.bench.json.
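
Steps 4 and 6 above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code in scripts/bench_ase_corpus.sh: the function names, the 30-second timeout, the three-decimal rounding, and the flat JSON layout are hypothetical. Only the field names and the results/<name>-<version>.bench.json path pattern come from this page.

```python
import json
import subprocess
from pathlib import Path


def smoke_test(binary):
    """Step 4: run the binary with --version, then --help; both must exit 0."""
    for flag in ("--version", "--help"):
        try:
            result = subprocess.run(
                [str(binary), flag],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=30,  # assumed limit, not taken from the harness
            )
        except (OSError, subprocess.TimeoutExpired):
            return False
        if result.returncode != 0:
            return False
    return True


def make_record(baseline_secs, slicer_secs, correctness_ok):
    """Step 6: speedup is baseline time divided by slicer time."""
    return {
        "baseline_secs": baseline_secs,
        "slicer_secs": slicer_secs,
        "speedup": round(baseline_secs / slicer_secs, 3),
        "correctness_ok": correctness_ok,
    }


def write_record(results_dir, name, version, record):
    """Write the record to <results_dir>/<name>-<version>.bench.json."""
    path = Path(results_dir) / f"{name}-{version}.bench.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(record, indent=2))
    return path


# Timings for syn 2.0.117 as reported on this page.
record = make_record(5.42, 4.33, True)
print(json.dumps(record, indent=2))
```

A speedup above 1.0 means the slicer leg was faster; below 1.0 means it was slower. For library crates, correctness_ok would reflect build success only, per step 5.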

Top 25 by speedup

These are the crates with the largest wall-time speedup. The Rank column is the crate's download rank in the corpus, not its position in this table. The pattern: tiny library crates whose baseline is dominated by codegen of a few large unreachable functions that the slicer eliminates entirely.

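| Rank | Crate | Version | Kind | Baseline | Slicer | Speedup |
|---|---|---|---|---|---|---|
| 2170 | aws-sig-auth | 0.60.3 | lib | 39.98 s | 0.21 s | 190.38× |
| 1193 | retain_mut | 0.1.9 | lib | 37.12 s | 0.25 s | 148.48× |
| 2630 | mutate_once | 0.1.2 | lib | 28.07 s | 0.20 s | 140.35× |
| 866 | raw-window-handle | 0.6.2 | lib | 49.13 s | 0.42 s | 116.98× |
| 958 | utf8-width | 0.1.8 | lib | 14.49 s | 0.14 s | 103.50× |
| 2470 | line-wrap | 0.2.0 | lib | 17.40 s | 0.19 s | 91.58× |
| 2277 | sptr | 0.3.2 | lib | 16.77 s | 0.21 s | 79.86× |
| 799 | deadpool-runtime | 0.3.1 | lib | 18.78 s | 0.24 s | 78.25× |
| 598 | md5 | 0.8.0 | lib | 19.76 s | 0.26 s | 76.00× |
| 1811 | htmlescape | 0.3.1 | lib | 26.21 s | 0.36 s | 72.81× |
| 695 | indenter | 0.3.4 | lib | 14.26 s | 0.20 s | 71.30× |
| 1382 | cached_proc_macro_types | 0.1.1 | lib | 18.57 s | 0.28 s | 66.32× |
| 884 | endian-type | 0.2.0 | lib | 14.51 s | 0.23 s | 63.09× |
| 1941 | renderdoc-sys | 1.1.0 | lib | 14.92 s | 0.24 s | 62.17× |
| 2290 | subtle-ng | 2.5.0 | lib | 16.14 s | 0.26 s | 62.08× |
| 925 | local-waker | 0.1.4 | lib | 10.29 s | 0.17 s | 60.53× |
| 1073 | safemem | 0.3.3 | lib | 9.54 s | 0.16 s | 59.62× |
| 1996 | replace_with | 0.1.8 | lib | 11.53 s | 0.20 s | 57.65× |
| 1069 | deunicode | 1.6.2 | lib | 21.24 s | 0.39 s | 54.46× |
| 2195 | aws-endpoint | 0.60.3 | lib | 11.85 s | 0.22 s | 53.86× |
| 1694 | khronos_api | 3.1.0 | lib | 46.14 s | 0.88 s | 52.43× |
| 984 | nodrop | 0.1.14 | lib | 9.35 s | 0.18 s | 51.94× |
| 787 | tagptr | 0.2.0 | lib | 12.76 s | 0.26 s | 49.08× |
| 2023 | unscanny | 0.1.0 | lib | 21.43 s | 0.45 s | 47.62× |
| 625 | precomputed-hash | 0.1.1 | lib | 8.92 s | 0.19 s | 46.95× |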

Top 25 by downloads

These are the most-depended-upon crates in the corpus. Speedups are more modest because their baselines are already small and their workspaces already have most code reachable through derive macros and re-exports.

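| Rank | Crate | Version | Downloads | Kind | Baseline | Slicer | Speedup |
|---|---|---|---|---|---|---|---|
| 1 | syn | 2.0.117 | 1,595,761,057 | lib | 5.42 s | 4.33 s | 1.25× |
| 2 | hashbrown | 0.17.0 | 1,469,613,376 | lib | 1.92 s | 1.49 s | 1.29× |
| 3 | bitflags | 2.11.1 | 1,226,802,506 | lib | 0.17 s | 0.47 s | 0.36× |
| 4 | getrandom | 0.4.2 | 1,183,144,905 | lib | 2.44 s | 1.81 s | 1.35× |
| 5 | rand_core | 0.10.1 | 1,106,779,248 | lib | 0.29 s | 0.21 s | 1.38× |
| 6 | proc-macro2 | 1.0.106 | 1,102,185,702 | lib | 1.87 s | 1.12 s | 1.67× |
| 7 | libc | 0.2.186 | 1,097,915,001 | lib | 1.70 s | 1.53 s | 1.11× |
| 8 | base64 | 0.22.1 | 1,091,953,499 | lib | 0.28 s | 0.58 s | 0.48× |
| 9 | quote | 1.0.45 | 1,087,228,740 | lib | 1.39 s | 2.37 s | 0.59× |
| 10 | rand | 0.8.6 | 1,080,778,604 | lib | 8.48 s | 7.97 s | 1.06× |
| 11 | regex-syntax | 0.8.10 | 1,019,226,974 | lib | 3.80 s | 2.97 s | 1.28× |
| 12 | indexmap | 2.14.0 | 1,013,962,675 | lib | 2.24 s | 2.06 s | 1.09× |
| 13 | itertools | 0.14.0 | 1,013,787,654 | lib | 0.58 s | 2.21 s | 0.26× |
| 14 | cfg-if | 1.0.4 | 975,444,268 | lib | 0.15 s | 0.13 s | 1.15× |
| 15 | serde | 1.0.228 | 952,229,993 | lib | 4.33 s | 4.13 s | 1.05× |
| 16 | thiserror-impl | 2.0.18 | 929,278,419 | lib | 3.43 s | 2.07 s | 1.66× |
| 17 | thiserror | 2.0.18 | 929,103,435 | lib | 4.06 s | 4.37 s | 0.93× |
| 18 | rand_chacha | 0.10.0 | 927,875,277 | lib | 4.14 s | 4.15 s | 1.00× |
| 19 | windows-sys | 0.61.2 | 920,841,453 | lib | 0.60 s | 0.35 s | 1.71× |
| 20 | memchr | 2.8.0 | 907,112,303 | lib | 1.61 s | 1.19 s | 1.35× |
| 21 | unicode-ident | 1.0.24 | 892,277,911 | lib | 0.29 s | 0.19 s | 1.53× |
| 22 | serde_derive | 1.0.228 | 889,654,647 | lib | 6.92 s | 6.18 s | 1.12× |
| 23 | itoa | 1.0.18 | 882,296,719 | lib | 0.41 s | 0.24 s | 1.71× |
| 24 | autocfg | 1.5.0 | 880,421,082 | lib | 0.76 s | 0.45 s | 1.69× |
| 25 | heck | 0.5.0 | 865,719,042 | lib | 0.75 s | 0.70 s | 1.07× |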

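bitflags, base64, quote, itertools, and thiserror all come in below 1.0×. bitflags, base64, and itertools have sub-second baselines, where the slicer's per-invocation overhead dominates; quote (1.39 s baseline) and thiserror (4.06 s baseline) are also slower under the slicer despite their longer baselines. None of the five is a correctness regression.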

Full per-crate data

The complete table — all 2,669 crates with rank, version, downloads, build times, and slicer status — lives in a single CSV in the source tree:

docs/ase2026-corpus.csv

Schema:

rank,name,version,downloads,kind,baseline_build_secs,slicer_build_secs,speedup,status
1,syn,2.0.117,1595761057,lib,5.42,4.33,1.252,both_built
2,hashbrown,0.17.0,1469613376,lib,1.92,1.49,1.295,both_built
...

status values:

  • both_built — baseline + slicer both succeeded; speedup is meaningful.
  • baseline_failed — the baseline cargo build itself failed (missing feature combos, target dependencies, etc.); the slicer leg is not attempted.
  • slicer_regression — baseline succeeded, slicer failed. The corpus contains zero entries with this status — that is the headline correctness claim.
  • error_tarball_missing — fetch failure during fetch-crates.sh.
  • not_run — corpus entry without a matching result file (typically a version with semver build metadata that the harness's name/version split mis-parsed).
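
The schema and status values above are enough to tally outcomes and recompute per-kind medians from the CSV. The following Python sketch does so with the standard csv module. The `summarize_rows` function is hypothetical, the sample consists only of the two data rows shown in the schema example, and it assumes rows with other statuses may lack a usable speedup, so those are excluded from the medians.

```python
import csv
import io
import statistics
from collections import Counter, defaultdict

# Header and the two data rows shown in the schema example above.
SAMPLE = """\
rank,name,version,downloads,kind,baseline_build_secs,slicer_build_secs,speedup,status
1,syn,2.0.117,1595761057,lib,5.42,4.33,1.252,both_built
2,hashbrown,0.17.0,1469613376,lib,1.92,1.49,1.295,both_built
"""


def summarize_rows(rows):
    """Count rows per status and compute the median speedup per crate kind,
    using only rows where both legs built."""
    status_counts = Counter()
    speedups = defaultdict(list)
    for row in rows:
        status_counts[row["status"]] += 1
        if row["status"] == "both_built":
            speedups[row["kind"]].append(float(row["speedup"]))
    medians = {kind: statistics.median(vals) for kind, vals in speedups.items()}
    return status_counts, medians


status_counts, medians = summarize_rows(csv.DictReader(io.StringIO(SAMPLE)))
print(dict(status_counts))
print(medians)
print("slicer regressions:", status_counts["slicer_regression"])
```

To run the same tally on the full data, open docs/ase2026-corpus.csv with `newline=""` and pass `csv.DictReader(f)` to `summarize_rows`. If the headline correctness claim holds, the count for slicer_regression is zero.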

Where this corpus is referenced