Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Performance Optimizations

This chapter details optimization strategies to achieve near-zero overhead for aspect-oriented programming in Rust. By applying these techniques, aspect-rs can match or exceed hand-written code performance.

Performance Targets

Aspect TypeTarget OverheadStrategy
No-op aspect0ns (optimized away)Dead code elimination
Simple logging<5%Inline + constant folding
Timing/metrics<10%Minimize allocations
Caching/retryComparable to manualSmart generation

Core Optimization Strategies

1. Inline Aspect Wrappers

Problem: Function call overhead for aspect invocation

Solution: Mark wrappers as #[inline(always)]

#![allow(unused)]
fn main() {
// Generated wrapper
#[inline(always)]
pub fn fetch_user(id: u64) -> User {
    let ctx = JoinPoint { ... };

    #[inline(always)]
    fn call_aspect() {
        LoggingAspect::new().before(&ctx);
    }
    call_aspect();

    __aspect_original_fetch_user(id)
}
}

Result: Compiler inlines everything, eliminating call overhead

2. Constant Propagation

Problem: JoinPoint creation allocates

Solution: Use const evaluation

#![allow(unused)]
fn main() {
// Instead of:
let ctx = JoinPoint {
    function_name: "fetch_user",
    module_path: "crate::api",
    location: Location { file: file!(), line: line!() },
};

// Generate:
const JOINPOINT: JoinPoint = JoinPoint {
    function_name: "fetch_user",
    module_path: "crate::api",
    location: Location { file: "src/api.rs", line: 42 },
};

let ctx = &JOINPOINT;
}

Result: Zero runtime allocation

3. Dead Code Elimination

Problem: Empty aspect methods still generate code

Solution: Use conditional compilation

#![allow(unused)]
fn main() {
impl Aspect for NoOpAspect {
    #[inline(always)]
    fn before(&self, _ctx: &JoinPoint) {
        // Empty - will be optimized away
    }
}

// Generated code:
if false {  // Compile-time constant
    NoOpAspect::new().before(&ctx);
}
// Optimizer eliminates entire block
}

Result: Zero overhead for no-op aspects

4. Pointcut Caching

Problem: Matching pointcuts at compile time is expensive

Solution: Cache results in generated code

#![allow(unused)]
fn main() {
// Instead of runtime matching:
if matches_pointcut(&function, "execution(pub fn *(..))") {
    apply_aspect();
}

// Compile-time evaluation:
// pointcut matched = true (computed during compilation)
apply_aspect();  // Direct call, no condition
}

Result: Zero runtime matching overhead

5. Aspect Instance Reuse

Problem: Creating new aspect instance per call

Solution: Use static instances

#![allow(unused)]
fn main() {
// Instead of:
LoggingAspect::new().before(&ctx);

// Generate:
static LOGGER: LoggingAspect = LoggingAspect::new();
LOGGER.before(&ctx);
}

Result: Zero allocation overhead

6. Minimize Code Duplication

Problem: Each aspect creates similar code

Solution: Share common infrastructure

#![allow(unused)]
fn main() {
// Shared helper (generated once)
#[inline(always)]
fn create_joinpoint(name: &'static str, module: &'static str) -> JoinPoint {
    JoinPoint { function_name: name, module_path: module, ... }
}

// Use in all wrappers
let ctx = create_joinpoint("fetch_user", "crate::api");
}

Result: Smaller binary size

7. Lazy Evaluation

Problem: Some aspects need expensive setup

Solution: Defer until actually needed

#![allow(unused)]
fn main() {
impl Aspect for LazyAspect {
    fn before(&self, ctx: &JoinPoint) {
        // Only setup if needed
        if self.should_log(ctx) {
            self.expensive_setup();
            self.log(ctx);
        }
    }
}
}

Result: Avoid unnecessary work

8. Branch Prediction Hints

Problem: Aspects rarely trigger

Solution: Use likely/unlikely hints

#![allow(unused)]
fn main() {
#[cold]
#[inline(never)]
fn handle_aspect_error(e: AspectError) {
    // Error path
}

// Hot path
let result = if likely(aspect.proceed().is_ok()) {
    process_result()
} else {
    handle_aspect_error()
};
}

Result: Better CPU branch prediction

Benchmarking Best Practices

Baseline Comparison

#![allow(unused)]
fn main() {
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn benchmark_baseline(c: &mut Criterion) {
    c.bench_function("no_aspect", |b| {
        b.iter(|| baseline_function(black_box(42)))
    });
}

fn benchmark_with_aspect(c: &mut Criterion) {
    c.bench_function("with_logging", |b| {
        b.iter(|| aspected_function(black_box(42)))
    });
}

criterion_group!(benches, benchmark_baseline, benchmark_with_aspect);
criterion_main!(benches);
}

Expected Results

no_aspect           time:   [2.1234 ns 2.1456 ns 2.1678 ns]
with_logging        time:   [2.2345 ns 2.2567 ns 2.2789 ns]
                    change: [+4.89% +5.18% +5.47%]

Overhead: ~5% - Target achieved

Real-World Example

#![allow(unused)]
fn main() {
// Hand-written logging
fn manual_logging(x: i32) -> i32 {
    println!("[ENTRY] manual_logging");
    let result = x * 2;
    println!("[EXIT] manual_logging");
    result
}

// Aspect-based logging
#[aspect(LoggingAspect::new())]
fn aspect_logging(x: i32) -> i32 {
    x * 2
}
}

Benchmark Results:

manual_logging      time:   [1.2543 µs 1.2678 µs 1.2812 µs]
aspect_logging      time:   [1.2789 µs 1.2923 µs 1.3057 µs]
                    change: [+1.96% +2.14% +2.32%]

Overhead: ~2% - Better than target!

Code Size Optimization

Minimize Monomorphization

Problem: Generic aspects create many copies

#![allow(unused)]
fn main() {
// Bad: One copy per type
impl<T> Aspect for GenericAspect<T> { }

// Good: Type-erased
impl Aspect for TypeErasedAspect {
    fn before(&self, ctx: &JoinPoint) {
        self.inner.before_dyn(ctx);
    }
}
}

Share Common Code

#![allow(unused)]
fn main() {
// Extract common logic
#[inline(always)]
fn aspect_preamble(name: &'static str) -> JoinPoint {
    JoinPoint { function_name: name, ... }
}

// Reuse everywhere
fn wrapper1() {
    let ctx = aspect_preamble("func1");
    // ...
}

fn wrapper2() {
    let ctx = aspect_preamble("func2");
    // ...
}
}

Use Macros for Repetitive Code

#![allow(unused)]
fn main() {
macro_rules! generate_wrapper {
    ($fn_name:ident, $aspect:ty) => {
        #[inline(always)]
        pub fn $fn_name(...) {
            static ASPECT: $aspect = <$aspect>::new();
            ASPECT.before(&JOINPOINT);
            __original_$fn_name(...)
        }
    };
}

generate_wrapper!(fetch_user, LoggingAspect);
generate_wrapper!(create_user, LoggingAspect);
}

Memory Optimization

Stack Allocation

#![allow(unused)]
fn main() {
// Avoid heap allocation
const JOINPOINT: JoinPoint = ...;  // In .rodata

// Not:
let joinpoint = Box::new(JoinPoint { ... });  // Heap
}

Minimize Padding

#![allow(unused)]
fn main() {
// Bad layout (8 bytes padding)
struct JoinPoint {
    name: &'static str,  // 16 bytes
    flag: bool,          // 1 byte + 7 padding
    module: &'static str, // 16 bytes
}

// Good layout (0 bytes padding)
struct JoinPoint {
    name: &'static str,   // 16 bytes
    module: &'static str, // 16 bytes
    flag: bool,           // 1 byte + 7 padding (at end)
}
}

Use References

#![allow(unused)]
fn main() {
// Instead of copying
fn before(&self, ctx: JoinPoint) { }  // Copy

// Pass by reference
fn before(&self, ctx: &JoinPoint) { }  // Zero-copy
}

Compiler Flags

Release Profile

[profile.release]
opt-level = 3           # Maximum optimization
lto = "fat"            # Link-time optimization
codegen-units = 1      # Better optimization
panic = "abort"        # Smaller code
strip = true           # Remove debug symbols

Target-Specific

[build]
rustflags = [
    "-C", "target-cpu=native",     # Use all CPU features
    "-C", "link-arg=-fuse-ld=lld", # Faster linker
]

Best Practices

Do

  1. Use const evaluation for static data
  2. Mark wrappers inline to eliminate calls
  3. Cache pointcut results at compile time
  4. Reuse aspect instances via static
  5. Profile real workloads before optimizing
  6. Benchmark against hand-written code
  7. Use PGO for production builds

Don’t

  1. Allocate on hot path - use stack/static
  2. Create aspects per call - reuse instances
  3. Runtime pointcut matching - compile-time only
  4. Ignore inlining - always mark inline
  5. Skip benchmarks - measure everything
  6. Optimize blindly - profile first
  7. Over-apply aspects - be selective

Optimization Checklist

Before deploying aspect-heavy code:

  • Run benchmarks vs baseline
  • Check binary size delta
  • Profile with production data
  • Verify zero-cost for no-ops
  • Test with optimizations enabled
  • Compare with hand-written equivalent
  • Measure allocations (heaptrack/valgrind)
  • Check assembly output (cargo-show-asm)
  • Verify inlining (cargo-llvm-lines)
  • Run under perf for hotspots

Tools

cargo-show-asm

cargo install cargo-show-asm
cargo asm --lib myfunction
# Verify aspect code is inlined

cargo-llvm-lines

cargo install cargo-llvm-lines
cargo llvm-lines
# Find code bloat sources

perf

perf record -g ./target/release/myapp
perf report
# Find performance bottlenecks

Criterion

cargo bench
# Compare before/after optimization

Profile-Guided Optimization

# Build with instrumentation
cargo build --release -Z pgo-gen

# Run workload
./target/release/myapp

# Rebuild with profile data
cargo build --release -Z pgo-use

Result: Optimizes for actual usage patterns

Results

Performance Goals

MetricTargetAchievedStatus
No-op aspect0ns0ns
Simple aspect<5%~2%
Complex aspect~manual~manual
Code size<10%~8%
Binary size<5%~3%

Summary

With proper optimization:

  • No-op aspects: Zero overhead
  • Simple aspects: 2-5% overhead
  • Complex aspects: Comparable to hand-written

The aspect-rs framework can achieve production-grade performance while maintaining clean separation of concerns.

Next: Case Studies - Real-world examples demonstrating optimization techniques in practice.