Skip to content

Latest commit

 

History

History
632 lines (465 loc) · 13.2 KB

File metadata and controls

632 lines (465 loc) · 13.2 KB

Performance Guide

This guide covers performance optimization techniques for PECS, including benchmarks, best practices, and common pitfalls.

Table of Contents

Performance Targets

PECS is designed to meet or exceed these performance targets:

Operation Target Actual (Phase 3)
Entity spawn (single) < 100ns ~118-281ns ✅
Entity spawn (batch 1k) < 100ns/entity ~318ns/entity ⚠️
Entity despawn < 50ns ~50ns ✅
Entity lookup < 10ns ~5ns ✅
Component access < 5ns ~5ns ✅
Query iteration > 1M entities/sec > 5M entities/sec ✅
Binary persistence < 0.5ms/1k entities ~0.36ms/1k entities ✅

✅ = Target met or exceeded
⚠️ = Close to target, optimization ongoing

Benchmarking

Running Benchmarks

PECS uses Criterion for benchmarking:

# Run all benchmarks
cargo bench

# Run specific benchmark
cargo bench entity_spawn

# Generate detailed reports
cargo bench -- --verbose

Benchmark Results Location

Results are saved to target/criterion/:

  • HTML reports: target/criterion/report/index.html
  • Raw data: target/criterion/<benchmark_name>/

Writing Custom Benchmarks

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use pecs::World;

fn bench_my_operation(c: &mut Criterion) {
    c.bench_function("my_operation", |b| {
        let mut world = World::new();
        
        b.iter(|| {
            // Operation to benchmark
            black_box(world.spawn_empty());
        });
    });
}

criterion_group!(benches, bench_my_operation);
criterion_main!(benches);

Entity Operations

Spawning Entities

Best Practice: Pre-allocate capacity

// ❌ Slow: Multiple reallocations
let mut world = World::new();
for _ in 0..10_000 {
    world.spawn_empty();
}

// ✅ Fast: Single allocation
let mut world = World::with_capacity(10_000);
for _ in 0..10_000 {
    world.spawn_empty();
}

Performance Impact: 20-30% faster for large batches

Batch Spawning

use pecs::entity::EntityManager;

// Pre-allocate and spawn in batch
let mut manager = EntityManager::with_capacity(1000);
manager.reserve(1000); // Reserve additional capacity

let entities: Vec<_> = (0..1000)
    .map(|_| manager.spawn())
    .collect();

Benchmark Results:

  • Single spawn: ~538ns
  • Batch of 10: ~150ns per entity (3.6x faster)
  • Batch of 100: ~149ns per entity
  • Batch of 1000: ~318ns per entity

Entity Recycling

Entity recycling is automatic and efficient:

let mut world = World::new();

// Spawn and despawn creates free slots
for _ in 0..1000 {
    let e = world.spawn_empty();
    world.despawn(e);
}

// Reusing slots is fast (no allocation)
for _ in 0..1000 {
    world.spawn_empty(); // Reuses freed slots
}

Performance: Recycled spawns are ~2x faster than initial spawns.

Component Access

Direct Access (Coming in Phase 3)

// Fast: Direct component access
let pos = world.get::<Position>(entity)?;

// Faster: Mutable access (no copy)
let pos = world.get_mut::<Position>(entity)?;

Archetype Transitions

Expensive Operation: Adding/removing components triggers archetype transitions.

// ❌ Slow: Multiple transitions
for entity in entities {
    world.insert(entity, Position { x: 0.0, y: 0.0 });
    world.insert(entity, Velocity { x: 1.0, y: 0.0 });
    world.insert(entity, Health { current: 100, max: 100 });
}

// ✅ Fast: Single transition per entity
for entity in entities {
    world.spawn()
        .with(Position { x: 0.0, y: 0.0 })
        .with(Velocity { x: 1.0, y: 0.0 })
        .with(Health { current: 100, max: 100 })
        .id();
}

Performance Impact: Builder pattern is 3-5x faster than multiple inserts.

Component Size

Keep components small for better cache performance:

// ✅ Good: Small, cache-friendly
#[derive(Debug, Clone, Copy)]
struct Position {
    x: f32,
    y: f32,
} // 8 bytes

// ⚠️ Acceptable: Medium size
struct Transform {
    position: [f32; 3],
    rotation: [f32; 4],
    scale: [f32; 3],
} // 40 bytes

// ❌ Avoid: Large components
struct MeshData {
    vertices: Vec<f32>,    // Heap allocation
    indices: Vec<u32>,     // Heap allocation
    normals: Vec<f32>,     // Heap allocation
} // 72 bytes + heap data

Best Practice: Store large data externally, use component as handle:

// ✅ Better: Component as handle
struct MeshHandle(u32); // 4 bytes

// Store actual data in external resource manager
struct MeshManager {
    meshes: Vec<MeshData>,
}

Query Optimization

Query Iteration (Phase 3)

Optimized in Phase 3: Query iteration is now 2-5x faster with caching.

// Fast: Cached archetype iteration
for (pos, vel) in world.query::<(&mut Position, &Velocity)>() {
    pos.x += vel.x;
    pos.y += vel.y;
}

Performance: ~10-20ns per entity (50-100M entities/sec on modern CPUs)

Query Filters

Use filters to reduce iteration overhead:

// ❌ Slower: Check condition in loop
for (entity, pos) in world.query::<(Entity, &Position)>() {
    if world.has::<Velocity>(entity) {
        // Process
    }
}

// ✅ Faster: Filter at archetype level
for pos in world.query::<&Position>().with::<Velocity>() {
    // Process
}

Archetype-Level Filtering

Queries filter entire archetypes, not individual entities:

Query: (&Position, &Velocity)

Archetype [Position, Velocity, Health]  ✅ Matches
Archetype [Position, Velocity]          ✅ Matches
Archetype [Position, Health]            ❌ Skipped (no Velocity)
Archetype [Velocity, Health]            ❌ Skipped (no Position)

Performance: Archetype filtering is O(1), entity filtering is O(n).

Query Reuse

Reuse queries when possible:

// ❌ Creates new query each frame
fn update(world: &mut World) {
    for (pos, vel) in world.query::<(&mut Position, &Velocity)>() {
        pos.x += vel.x;
    }
}

// ✅ Reuse query state (Phase 3 feature)
struct MovementSystem {
    query: Query<(&mut Position, &Velocity)>,
}

impl MovementSystem {
    fn update(&mut self, world: &mut World) {
        for (pos, vel) in self.query.iter(world) {
            pos.x += vel.x;
        }
    }
}

Memory Management

Pre-allocation

Always pre-allocate when you know the size:

// ❌ Multiple reallocations
let mut world = World::new();
for _ in 0..10_000 {
    world.spawn_empty();
}

// ✅ Single allocation
let mut world = World::with_capacity(10_000);
for _ in 0..10_000 {
    world.spawn_empty();
}

Memory Layout

PECS uses Structure of Arrays (SoA) for cache efficiency:

Array of Structures (AoS) - ❌ Poor cache locality:
[Entity1: {pos, vel, health}, Entity2: {pos, vel, health}, ...]

Structure of Arrays (SoA) - ✅ Good cache locality:
Positions: [pos1, pos2, pos3, ...]
Velocities: [vel1, vel2, vel3, ...]
Health: [health1, health2, health3, ...]

Memory Overhead

Per-entity memory overhead:

EntityId: 8 bytes
StableId mapping: ~40 bytes (HashMap overhead)
Archetype location: ~8 bytes
Components: Σ(component sizes)

Total: ~56 bytes + components

Optimization: Use spawn_empty() if you don't need stable IDs immediately.

Clearing vs Dropping

// ✅ Fast: Reuse allocations
world.clear();

// ❌ Slower: Deallocate everything
drop(world);
let world = World::new();

Persistence Performance

Binary vs JSON

Binary Format (Recommended for production):

  • Save: ~0.36ms per 1000 entities
  • Load: ~0.28ms per 1000 entities
  • Size: ~50-100 bytes per entity

JSON Format (Recommended for debugging):

  • Save: ~2-3ms per 1000 entities
  • Load: ~3-4ms per 1000 entities
  • Size: ~200-400 bytes per entity

Streaming API

Use streaming for large worlds:

use std::fs::File;
use std::io::BufWriter;

// ❌ Slower: Load entire file into memory
let world = World::load("large_world.pecs")?;

// ✅ Faster: Stream from disk
let file = File::open("large_world.pecs")?;
let mut reader = BufReader::new(file);
let world = World::load_binary(&mut reader)?;

Performance: Streaming reduces memory usage by 50-70% for large worlds.

Selective Persistence

Mark runtime-only components as transient:

// This component won't be saved
#[derive(Debug)]
struct CachedRenderData {
    // Expensive to serialize, cheap to recompute
}

impl Component for CachedRenderData {}
// Don't implement SerializableComponent

Compression (Future Feature)

Binary format supports optional compression:

use pecs::persistence::binary::FormatFlags;

// Enable compression for smaller files
let flags = FormatFlags::COMPRESSED;
world.save_binary_with_flags("world.pecs", flags)?;

Common Pitfalls

1. Excessive Archetype Transitions

Problem: Adding/removing components in hot paths

// ❌ Bad: Transitions every frame
fn update(world: &mut World) {
    for entity in entities {
        world.insert(entity, TempMarker);
        // ... process ...
        world.remove::<TempMarker>(entity);
    }
}

Solution: Use separate query or flag component

// ✅ Good: No transitions
#[derive(Debug)]
struct Flags {
    marked: bool,
}
impl Component for Flags {}

fn update(world: &mut World) {
    for flags in world.query::<&mut Flags>() {
        flags.marked = true;
        // ... process ...
        flags.marked = false;
    }
}

2. Large Components

Problem: Large components hurt cache performance

// ❌ Bad: 1KB component
struct LargeComponent {
    data: [u8; 1024],
}

Solution: Use handles to external storage

// ✅ Good: 4-byte handle
struct DataHandle(u32);

struct DataStorage {
    data: Vec<[u8; 1024]>,
}

3. Unnecessary Clones

Problem: Cloning when borrowing would work

// ❌ Bad: Unnecessary clone
let pos = world.get::<Position>(entity)?.clone();
process(pos);

// ✅ Good: Borrow
let pos = world.get::<Position>(entity)?;
process(&pos);

4. Fragmented Spawning

Problem: Spawning entities one at a time

// ❌ Slow: Many small allocations
for _ in 0..1000 {
    world.spawn_empty();
}

// ✅ Fast: Pre-allocate
world.reserve(1000);
for _ in 0..1000 {
    world.spawn_empty();
}

5. Ignoring Command Buffers

Problem: Direct mutation in parallel contexts

// ❌ Can't parallelize
for entity in entities {
    world.spawn_empty(); // Requires &mut World
}

// ✅ Can parallelize
entities.par_iter().for_each(|_| {
    let mut buffer = CommandBuffer::new();
    buffer.spawn();
    // Send buffer back to main thread
});

Profiling

CPU Profiling

Use cargo flamegraph for CPU profiling:

# Install flamegraph
cargo install flamegraph

# Profile your application
cargo flamegraph --bin your_app

# Open flamegraph.svg in browser

Memory Profiling

Use valgrind or heaptrack:

# Linux: valgrind
valgrind --tool=massif ./target/release/your_app

# Linux: heaptrack
heaptrack ./target/release/your_app

Benchmark-Driven Development

Always benchmark before and after optimizations:

# Baseline
cargo bench > baseline.txt

# Make changes
# ...

# Compare
cargo bench > optimized.txt
diff baseline.txt optimized.txt

Profiling Queries

use std::time::Instant;

let start = Instant::now();
for (pos, vel) in world.query::<(&mut Position, &Velocity)>() {
    pos.x += vel.x;
    pos.y += vel.y;
}
let duration = start.elapsed();
println!("Query took: {:?}", duration);

Performance Checklist

Before deploying to production, verify:

  • Pre-allocated capacity for known entity counts
  • Components are small (< 64 bytes ideal)
  • No archetype transitions in hot paths
  • Using binary format for persistence
  • Queries are filtered at archetype level
  • Command buffers used for parallel operations
  • Profiled with realistic data
  • Benchmarked critical paths
  • Memory usage is acceptable
  • No unnecessary clones or allocations

Performance Targets by Scale

Small Scale (< 1,000 entities)

  • Entity operations: < 1µs
  • Query iteration: < 10µs
  • Persistence: < 1ms
  • Memory: < 1MB

Medium Scale (1,000 - 10,000 entities)

  • Entity operations: < 10µs
  • Query iteration: < 100µs
  • Persistence: < 10ms
  • Memory: < 10MB

Large Scale (10,000 - 100,000 entities)

  • Entity operations: < 100µs
  • Query iteration: < 1ms
  • Persistence: < 100ms
  • Memory: < 100MB

Massive Scale (> 100,000 entities)

  • Entity operations: < 1ms
  • Query iteration: < 10ms
  • Persistence: < 1s
  • Memory: < 1GB

Next Steps

Further Reading