Benchmark Methodology

This chapter describes the rigorous methodology used to measure the performance characteristics of the aspect-rs framework. Understanding how we benchmark ensures you can trust the results and reproduce them yourself.

Overview

Performance benchmarking for aspect-oriented programming frameworks requires careful measurement to separate:

  • Aspect overhead from business logic execution time
  • Compile-time costs from runtime costs
  • Framework overhead from application complexity
  • Microbenchmark results from real-world performance

We use industry-standard tools and methodologies to ensure accurate, reproducible results.

Benchmarking Tools

Criterion.rs

All benchmarks use Criterion.rs, the gold standard for Rust benchmarking:

[dev-dependencies]
criterion = { version = "0.5", features = ["html_reports"] }

[[bench]]
name = "aspect_overhead"
harness = false

Why Criterion?

  • Statistical analysis of measurements with outlier detection
  • HTML reports with interactive graphs
  • Warmup periods to reach stable CPU state
  • Automatic comparison against saved baselines
  • Confidence intervals and significance testing
  • Guards against measurement bias

Benchmark Structure

use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn benchmark_baseline(c: &mut Criterion) {
    c.bench_function("no_aspect", |b| {
        b.iter(|| {
            baseline_function(black_box(42))
        })
    });
}

fn benchmark_with_aspect(c: &mut Criterion) {
    c.bench_function("with_logging", |b| {
        b.iter(|| {
            aspected_function(black_box(42))
        })
    });
}

criterion_group!(benches, benchmark_baseline, benchmark_with_aspect);
criterion_main!(benches);

Key elements:

  • black_box() prevents compiler optimization of benchmarked code
  • bench_function() runs multiple iterations automatically
  • Statistical analysis determines confidence intervals
  • Results include mean, median, standard deviation

Measurement Categories

1. Aspect Overhead

Measures the performance cost of the aspect framework itself:

// Baseline: no aspects
#[inline(never)]
fn baseline_add(a: i32, b: i32) -> i32 {
    a + b
}

// With no-op aspect  
#[aspect(NoOpAspect)]
#[inline(never)]
fn aspected_add(a: i32, b: i32) -> i32 {
    a + b
}

// Benchmark both
c.bench_function("baseline", |b| b.iter(|| baseline_add(black_box(1), black_box(2))));
c.bench_function("no-op aspect", |b| b.iter(|| aspected_add(black_box(1), black_box(2))));

What we measure:

  • JoinPoint structure allocation and initialization
  • Aspect trait virtual method dispatch overhead
  • before/after/around advice execution time
  • Result boxing and unboxing costs
  • Error handling propagation

Expected result: No-op aspect overhead should be <5ns on modern CPUs.
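
For reference, the no-op aspect used above can be as small as an empty implementation of the advice hooks. The following is a hypothetical sketch: the trait and method names (Aspect, JoinPoint, before, after) follow this chapter's terminology and may differ from the real aspect-core API:

struct NoOpAspect;

impl Aspect for NoOpAspect {
    // Empty advice bodies: any time measured for aspected_add beyond
    // baseline_add is therefore pure framework dispatch overhead.
    fn before(&self, _ctx: &JoinPoint) {}
    fn after(&self, _ctx: &JoinPoint) {}
}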

2. Component Costs

Isolates individual framework components to identify bottlenecks:

// Just JoinPoint creation
c.bench_function("joinpoint_creation", |b| {
    b.iter(|| {
        let ctx = JoinPoint {
            function_name: "test",
            module_path: "bench",
            location: Location { file: "bench.rs", line: 10 },
        };
        black_box(ctx);
    })
});

// Just aspect method call
c.bench_function("aspect_before_call", |b| {
    let aspect = LoggingAspect::new();
    let ctx = create_joinpoint();
    
    b.iter(|| {
        aspect.before(black_box(&ctx));
    })
});

// Just ProceedingJoinPoint proceed
c.bench_function("pjp_proceed", |b| {
    b.iter(|| {
        let pjp = create_proceeding_joinpoint(|| Ok(42));
        black_box(pjp.proceed().unwrap());
    })
});

This helps us understand where optimization efforts should focus.
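
The create_joinpoint() helper used above is assumed to build a fixed context outside the timed closure, so that only the before call itself is measured. A hypothetical definition, reusing the JoinPoint fields from the first benchmark:

fn create_joinpoint() -> JoinPoint {
    // Static metadata mirroring what the #[aspect] macro would generate.
    JoinPoint {
        function_name: "bench_target",
        module_path: "bench",
        location: Location { file: "bench.rs", line: 1 },
    }
}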

3. Scaling Behavior

Tests performance as complexity increases with multiple aspects:

// 1 aspect
#[aspect(LoggingAspect::new())]
fn one_aspect() { do_work(); }

// 3 aspects
#[aspect(LoggingAspect::new())]
#[aspect(TimingAspect::new())]
#[aspect(MetricsAspect::new())]
fn three_aspects() { do_work(); }

// 5 aspects
#[aspect(LoggingAspect::new())]
#[aspect(TimingAspect::new())]
#[aspect(MetricsAspect::new())]
#[aspect(CachingAspect::new())]
#[aspect(RetryAspect::new(3, 100))]
fn five_aspects() { do_work(); }

Expected results:

  • Linear scaling: O(n) where n = number of aspects
  • No quadratic behavior or pathological cases
  • Consistent per-aspect overhead (~2-5ns each)
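
A Criterion harness for the scaling comparison simply registers each variant side by side (a sketch reusing the functions above):

use criterion::{criterion_group, criterion_main, Criterion};

fn benchmark_scaling(c: &mut Criterion) {
    // If scaling is linear, (three - one) / 2 and (five - three) / 2
    // should agree on a consistent per-aspect cost.
    c.bench_function("one_aspect", |b| b.iter(|| one_aspect()));
    c.bench_function("three_aspects", |b| b.iter(|| three_aspects()));
    c.bench_function("five_aspects", |b| b.iter(|| five_aspects()));
}

criterion_group!(scaling, benchmark_scaling);
criterion_main!(scaling);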

4. Real-World Scenarios

Benchmarks that simulate actual production usage patterns:

// API request simulation
c.bench_function("api_request_baseline", |b| {
    let db = setup_test_database();
    b.iter(|| {
        let request = create_request(black_box(123));
        process_request_baseline(black_box(&db), black_box(request))
    })
});

c.bench_function("api_request_with_aspects", |b| {
    let db = setup_test_database();
    b.iter(|| {
        let request = create_request(black_box(123));
        process_request_with_aspects(black_box(&db), black_box(request))
    })
});

These scenarios include realistic I/O, database operations, and business logic complexity.

Benchmark Configurations

Compiler Optimization Flags

All benchmarks run with production-level optimizations:

[profile.bench]
opt-level = 3           # Maximum optimization
lto = "fat"            # Link-time optimization across all crates
codegen-units = 1      # Better optimization (slower compile)
panic = "abort"        # Smaller code, faster unwinding

Rationale:

  • opt-level = 3: Enables all LLVM optimizations
  • lto = "fat": Allows cross-crate inlining of aspect code
  • codegen-units = 1: Gives optimizer maximum visibility
  • panic = "abort": Removes unwinding overhead (note: Cargo ignores this setting for the built-in bench profile, so it mainly describes production release builds)

This configuration represents how aspect-rs would be deployed in production.

System Configuration

For reproducible results, benchmarks should run on:

  • CPU: Modern x86_64 processor (2+ GHz, consistent clock)
  • RAM: 8+ GB available
  • OS: Linux (Ubuntu 22.04 LTS) or macOS (latest)
  • System Load: Minimal background processes

Preparing system for benchmarking:

# Disable CPU frequency scaling (Linux)
sudo cpupower frequency-set --governor performance

# Stop unnecessary services
sudo systemctl stop bluetooth cups avahi-daemon

# Clear system caches
sudo sync && echo 3 | sudo tee /proc/sys/vm/drop_caches

# Verify no other heavy processes
htop  # Should show <10% CPU usage at idle

# Run benchmarks
cargo bench --workspace

Statistical Rigor

Criterion automatically provides robust statistics:

  • Warmup: 3 seconds to reach stable CPU state
  • Sample size: 100 samples minimum
  • Iterations: 10,000+ per sample (adjusted for duration)
  • Outlier detection: Tukey's method (box-plot fences)
  • Confidence intervals: 95% by default
  • Significance testing: Student’s t-test (p < 0.05)
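
These defaults can be tuned per benchmark group through Criterion's builder API when a measurement needs a longer warmup or more samples. A sketch (the targets reuse the functions from the benchmark structure example):

use std::time::Duration;
use criterion::{criterion_group, criterion_main, Criterion};

fn configured() -> Criterion {
    Criterion::default()
        .warm_up_time(Duration::from_secs(3)) // reach a stable CPU state
        .sample_size(100)                     // minimum sample count
        .confidence_level(0.95)               // width of reported intervals
        .significance_level(0.05)             // p-value threshold for "changed"
}

criterion_group! {
    name = benches;
    config = configured();
    targets = benchmark_baseline, benchmark_with_aspect
}
criterion_main!(benches);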

Example output with interpretation:

no_aspect               time:   [2.1234 ns 2.1456 ns 2.1678 ns]
                        change: [-0.5123% +0.1234% +0.7890%] (p = 0.23 > 0.05)
                        No change in performance detected.

with_logging            time:   [2.2345 ns 2.2567 ns 2.2789 ns]
                        change: [-0.3456% +0.2345% +0.8901%] (p = 0.34 > 0.05)
                        No change in performance detected.

Calculated overhead: 0.1111 ns (5.18% increase)
95% confidence interval: [4.85%, 5.51%]

The three time values represent: [lower bound, estimate, upper bound] of the 95% confidence interval.

Controlling Variables

Preventing Compiler Optimization

The Rust compiler optimizes aggressively and may eliminate benchmarked code entirely:

// BAD: Compiler might optimize away unused result
fn bad_benchmark(c: &mut Criterion) {
    c.bench_function("bad", |b| {
        b.iter(|| {
            aspected_function(42)  // Result unused, may be eliminated!
        })
    });
}

// GOOD: Use black_box to prevent optimization
fn good_benchmark(c: &mut Criterion) {
    c.bench_function("good", |b| {
        b.iter(|| {
            black_box(aspected_function(black_box(42)))
        })
    });
}

Why this matters:

  • Without black_box(), the compiler may inline and optimize away the entire function
  • The benchmark could then report ~0ns even though the real code takes nanoseconds
  • Results would be misleading and non-representative

Avoiding Measurement Noise

Common sources of noise in benchmarks:

  1. CPU throttling: Use performance governor, not powersave
  2. Background processes: Close browsers, IDEs, chat apps
  3. Network activity: Disable WiFi/Ethernet during benchmarks
  4. Disk I/O: Use tmpfs (/dev/shm) for temporary files
  5. System updates: Disable auto-updates temporarily
  6. Thermal throttling: Ensure adequate cooling
  7. Turbo boost: Can cause inconsistent results; disable if needed

Isolation with #[inline(never)]

Prevents cross-function optimization for fair comparison:

#[inline(never)]
fn baseline_function(x: i32) -> i32 {
    x * 2
}

#[aspect(LoggingAspect::new())]
#[inline(never)]
fn aspected_function(x: i32) -> i32 {
    x * 2
}

This ensures:

  • Each function is compiled as a separate unit
  • No inlining across benchmark boundaries
  • Fair comparison of actual runtime costs
  • Results reflect real-world function call overhead

Baseline Comparison Methodology

Manual vs Aspect-Based Implementation

The critical comparison is the aspect framework against a hand-written equivalent:

// Manual logging (baseline - what developers write without aspects)
#[inline(never)]
fn manual_logging(x: i32) -> i32 {
    println!("[ENTRY] manual_logging");
    let result = x * 2;
    println!("[EXIT] manual_logging");
    result
}

// Aspect-based logging (what aspect-rs provides)
#[aspect(LoggingAspect::new())]
#[inline(never)]
fn aspect_logging(x: i32) -> i32 {
    x * 2
}

// Benchmark both approaches
c.bench_function("manual_logging", |b| {
    b.iter(|| manual_logging(black_box(42)))
});

c.bench_function("aspect_logging", |b| {
    b.iter(|| aspect_logging(black_box(42)))
});

Success criterion: Aspect overhead should be <5% relative to the manual implementation.

If overhead exceeds 10%, we investigate and optimize the framework.

Benchmark Organization

Microbenchmarks

Located in aspect-core/benches/:

aspect-core/benches/
├── aspect_overhead.rs      # Basic aspect overhead measurement
├── joinpoint_creation.rs   # JoinPoint allocation cost
├── advice_dispatch.rs      # Virtual method dispatch timing
├── multiple_aspects.rs     # Scaling with aspect count
├── around_advice.rs        # ProceedingJoinPoint overhead
└── error_handling.rs       # AspectError propagation cost

Each file focuses on one specific performance aspect.

Integration Benchmarks

Located in aspect-examples/benches/:

aspect-examples/benches/
├── api_server_bench.rs     # Full API request/response cycle
├── database_bench.rs       # Transaction aspect overhead
├── security_bench.rs       # Authorization check performance
├── resilience_bench.rs     # Retry/circuit breaker costs
└── caching_bench.rs        # Cache lookup/store overhead

These measure realistic, end-to-end scenarios.

Regression Detection

Using saved baselines to detect performance regressions:

# Save baseline from main branch
git checkout main
cargo bench --workspace -- --save-baseline main

# Switch to feature branch
git checkout feature/new-optimization
cargo bench --workspace -- --baseline main

Criterion output:

no_aspect               time:   [2.1456 ns 2.1678 ns 2.1890 ns]
                        change: [-1.2% +0.5% +2.1%] (p = 0.42 > 0.05)
                        No significant change detected.

with_logging            time:   [2.3456 ns 2.3678 ns 2.3890 ns]
                        change: [+8.2% +9.5% +10.8%] (p = 0.001 < 0.05)
                        Performance has regressed.

A regression >5% triggers investigation before merge.

Metrics Collected

Primary Performance Metrics

  1. Mean execution time - Average across all samples
  2. Median execution time - Middle value (robust against outliers)
  3. Standard deviation - Measure of variance
  4. Min/Max - Best and worst case timings

Secondary Metrics

  1. Memory allocations - Tracked via dhat profiler
  2. Binary size - Measured via cargo-bloat
  3. Compile time - Via cargo build --timings
  4. LLVM IR size - Via cargo-llvm-lines

Derived Metrics

  1. Overhead percentage: (aspect_time - baseline_time) / baseline_time * 100
  2. Per-aspect cost: total_overhead / number_of_aspects
  3. Throughput: Operations per second
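
These derived values are simple arithmetic over the Criterion estimates; a sketch of the calculation (the numbers plug in the example output shown earlier):

fn overhead_percent(baseline_ns: f64, aspect_ns: f64) -> f64 {
    (aspect_ns - baseline_ns) / baseline_ns * 100.0
}

fn per_aspect_cost_ns(baseline_ns: f64, aspect_ns: f64, n_aspects: u32) -> f64 {
    (aspect_ns - baseline_ns) / n_aspects as f64
}

fn main() {
    // 2.1456 ns baseline vs 2.2567 ns with one logging aspect:
    println!("overhead: {:.2}%", overhead_percent(2.1456, 2.2567));      // ~5.18%
    println!("per-aspect: {:.4} ns", per_aspect_cost_ns(2.1456, 2.2567, 1));
}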

Interpreting Results

Statistical Significance

Criterion uses Student’s t-test with threshold p < 0.05:

  • p < 0.05: Change is statistically significant
  • p ≥ 0.05: Change is within noise/variance

Example interpretation:

time:   [2.2567 ns 2.2789 ns 2.3012 ns]
change: [+5.12% +5.45% +5.78%] (p = 0.002 < 0.05)
Performance has regressed.

This indicates a true regression, not measurement noise.

Acceptable Variance

Normal variance in nanosecond-level microbenchmarks:

  • 0-2%: Excellent stability
  • 2-5%: Good stability (typical for microbenchmarks)
  • 5-10%: Acceptable (environmental factors)
  • >10%: Investigate (possible actual regression or system issue)

Regression Investigation Thresholds

When performance degrades:

  • <3% slower: Likely noise; monitor trend
  • 3-5% slower: Verify across multiple runs
  • 5-10% slower: Worth investigating cause
  • >10% slower: Definite regression; requires fix before merge
  • >25% slower: Critical regression; blocks PR immediately
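
In CI tooling this policy can be encoded directly; a minimal sketch of the decision rule above:

fn regression_action(slowdown_pct: f64) -> &'static str {
    match slowdown_pct {
        s if s < 3.0 => "likely noise; monitor trend",
        s if s < 5.0 => "verify across multiple runs",
        s if s < 10.0 => "investigate cause",
        s if s < 25.0 => "definite regression; fix before merge",
        _ => "critical regression; block PR",
    }
}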

Best Practices

DO:

  1. ✅ Use black_box() for all inputs and outputs
  2. ✅ Run on dedicated hardware when possible
  3. ✅ Use #[inline(never)] for fair comparison
  4. ✅ Benchmark realistic workloads, not just microbenchmarks
  5. ✅ Save baselines for regression detection
  6. ✅ Run benchmarks multiple times to verify stability
  7. ✅ Document system configuration and environment
  8. ✅ Compare against hand-written alternatives
  9. ✅ Use appropriate sample sizes (100+ samples)
  10. ✅ Warm up before measuring

DON’T:

  1. ❌ Run benchmarks on laptop battery power
  2. ❌ Run with heavy background processes active
  3. ❌ Compare debug vs release builds
  4. ❌ Trust single-run results
  5. ❌ Ignore compiler warnings about dead code elimination
  6. ❌ Benchmark without black_box() protection
  7. ❌ Compare results from different machines directly
  8. ❌ Cherry-pick favorable results

Reproducibility

Version Control

All benchmark code is version controlled:

aspect-rs/
├── aspect-core/benches/       # Framework benchmarks
├── aspect-examples/benches/   # Application benchmarks
├── BENCHMARKS.md             # Results documentation
└── benches/README.md         # Running instructions

Running Benchmarks

Anyone can reproduce our results:

# Clone repository
git clone https://github.com/user/aspect-rs
cd aspect-rs

# Install Rust (if needed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Run all benchmarks
cargo bench --workspace

# View detailed HTML reports
open target/criterion/report/index.html

# Or view specific benchmark
open target/criterion/aspect_overhead/report/index.html

Sharing Results

Criterion generates multiple output formats:

  • HTML: Interactive charts and detailed statistics
  • JSON: Raw data in target/criterion/<bench>/base/estimates.json
  • CSV: Can be exported for spreadsheet analysis

# Generate comparison report
cargo bench --workspace -- --baseline previous

# Export results
cp -r target/criterion/report benchmark-results-2024-02-16/

Validation

Cross-Platform Testing

We run benchmarks on multiple platforms to ensure consistency:

  • Linux: Ubuntu 22.04 LTS (x86_64)
  • macOS: Latest version (ARM64 M1/M2)
  • Windows: Windows 11 (x86_64)

Overhead percentages should be similar across platforms (within 2-3%).

Manual Verification

Spot-check Criterion results with manual timing:

use std::hint::black_box; // or criterion::black_box; prevents dead-code elimination
use std::time::Instant;

fn manual_timing() {
    let iterations = 10_000_000;

    // Baseline timing
    let start = Instant::now();
    for i in 0..iterations {
        black_box(baseline_function(black_box(i as i32)));
    }
    let baseline_time = start.elapsed();

    // With aspect timing
    let start = Instant::now();
    for i in 0..iterations {
        black_box(aspected_function(black_box(i as i32)));
    }
    let aspect_time = start.elapsed();

    let baseline_ns = baseline_time.as_nanos() / iterations as u128;
    let aspect_ns = aspect_time.as_nanos() / iterations as u128;
    
    println!("Baseline: {} ns", baseline_ns);
    println!("With aspect: {} ns", aspect_ns);
    println!("Overhead: {:.2}%",
        (aspect_ns as f64 - baseline_ns as f64) / baseline_ns as f64 * 100.0
    );
}

Results should match Criterion within ±10%.

Key Takeaways

  1. Criterion.rs provides statistical rigor - Use it for all benchmarks
  2. Control variables carefully - Minimize environmental noise
  3. Prevent unwanted optimization - Use black_box() and #[inline(never)]
  4. Compare fairly - Benchmark against equivalent hand-written code
  5. Save baselines - Enable regression detection over time
  6. Run multiple times - Verify stability and reproducibility
  7. Document everything - Record system config, compiler flags, environment
  8. Validate results - Cross-check on multiple platforms

Understanding the methodology builds confidence in the results. When you see “5% overhead”, you know exactly what that means and how it was measured.
