Phase 13: Performance Optimization Devlog
Status: Complete (+ postmortem cleanup)
Tests: 142 (+ 7 benchmark, + 11 determinism, + 28 postmortem = 188 total phase tests)
Total Suite: 4,247 tests passing (up from 4,077)
New Source Files: 2 (core/numba_utils.py, simulation/aggregation.py)
Modified Source Files: ~10 (terrain/infrastructure.py, terrain/los.py, detection/estimation.py, combat/ballistics.py, movement/pathfinding.py, simulation/battle.py, simulation/engine.py, simulation/scenario.py, validation/monte_carlo.py, pyproject.toml)
New Scripts: scripts/profile_golan.py (Golan campaign profiling)
Overview
Phase 13 delivers performance optimization across three tracks: algorithmic improvements (13a), compiled extensions (13b), and parallelism enhancement (13c). All changes are backward-compatible via `enable_*` config flags and safe defaults. The optional numba dependency (`uv sync --extra perf`) enables JIT compilation for hot paths; without it, pure-Python fallbacks are used.
Sub-phase Summary
13a-1: Benchmark Infrastructure (7 benchmark tests)
- Added `benchmark` marker to `pyproject.toml`, excluded from default test runs
- Created `tests/benchmarks/test_phase13_benchmarks.py` with baseline measurements for spatial queries, Kalman predict, LOS checks, pathfinding, RK4 trajectory, MC import, and viewshed
13a-2: STRtree Spatial Indexing (14 tests)
- Rewrote `terrain/infrastructure.py` to use `shapely.STRtree` for all spatial queries
- Built 3 STRtree indices: `_road_tree`, `_building_tree`, `_airfield_tree`
- Rewrote `roads_near()`, `nearest_road()`, `buildings_at()`, `buildings_near()`, `airfields_near()` to use STRtree queries
- Post-filter for condition > 0 (damaged features)
13a-3: Kalman F/Q Matrix Caching (6 tests)
- Added `_cached_dt`, `_cached_F`, `_cached_Q` fields to `StateEstimator`
- `predict()` reuses cached matrices when dt matches (eliminates ~2499 redundant matrix constructions per tick)
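
The caching idea can be sketched in a few lines. This is a minimal illustration, not the project's `StateEstimator`: the class name `CachedMotionModel`, the 2-D constant-velocity state layout, the white-noise-acceleration form of Q, and the `builds` counter are assumptions made for the example. Only the `_cached_dt`/`_cached_F`/`_cached_Q` field names come from the notes above.

```python
from typing import Optional

Matrix = list[list[float]]


def build_F(dt: float) -> Matrix:
    """Transition matrix for an assumed state layout [x, y, vx, vy]."""
    return [
        [1.0, 0.0, dt, 0.0],
        [0.0, 1.0, 0.0, dt],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def build_Q(dt: float, q: float) -> Matrix:
    """Process noise for an assumed white-noise-acceleration model."""
    pp, pv, vv = q * dt**4 / 4.0, q * dt**3 / 2.0, q * dt**2
    return [
        [pp, 0.0, pv, 0.0],
        [0.0, pp, 0.0, pv],
        [pv, 0.0, vv, 0.0],
        [0.0, pv, 0.0, vv],
    ]


class CachedMotionModel:
    """Rebuilds F and Q only when dt differs from the last call."""

    def __init__(self, q: float = 0.1) -> None:
        self._q = q
        self._cached_dt: Optional[float] = None
        self._cached_F: Optional[Matrix] = None
        self._cached_Q: Optional[Matrix] = None
        self.builds = 0  # hypothetical counter, for demonstration only

    def matrices(self, dt: float) -> tuple[Matrix, Matrix]:
        if dt != self._cached_dt:
            self._cached_F = build_F(dt)
            self._cached_Q = build_Q(dt, self._q)
            self._cached_dt = dt
            self.builds += 1
        return self._cached_F, self._cached_Q
```

With a fixed tick length, every `predict()` after the first hits the cache, which is where the saving comes from when many tracks share the same dt.
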
13a-4: Multi-tick LOS Cache (11 tests)
- Added `invalidate_cells(dirty_cells)` to `LOSEngine`
- Selectively removes cache entries involving moved-unit grid cells
- More efficient than full `clear_los_cache()` when few units moved
- Added `los_cache_size` property for monitoring
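
Selective invalidation can be sketched as follows. The cache layout is an assumption: entries are keyed by an (observer cell, target cell) pair, and `LOSCacheSketch` and `store()` are hypothetical names. The method and property names `invalidate_cells`, `clear_los_cache`, and `los_cache_size` follow the notes above.

```python
Cell = tuple[int, int]


class LOSCacheSketch:
    """Toy LOS cache keyed by (observer_cell, target_cell)."""

    def __init__(self) -> None:
        self._cache: dict[tuple[Cell, Cell], bool] = {}

    def store(self, observer: Cell, target: Cell, visible: bool) -> None:
        self._cache[(observer, target)] = visible

    def invalidate_cells(self, dirty_cells: set[Cell]) -> int:
        """Drop only entries with an endpoint in a dirty cell; return the count."""
        stale = [
            key for key in self._cache
            if key[0] in dirty_cells or key[1] in dirty_cells
        ]
        for key in stale:
            del self._cache[key]
        return len(stale)

    def clear_los_cache(self) -> None:
        """Full clear, the conservative alternative."""
        self._cache.clear()

    @property
    def los_cache_size(self) -> int:
        return len(self._cache)
```

The scan is linear in cache size, so the benefit comes from keeping the surviving entries rather than from the invalidation call itself being cheap.
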
13a-5: Viewshed Vectorization (8 tests)
- `visible_area()` uses numpy broadcasting for distance computation
- Skips out-of-range cells before running per-cell LOS checks
- Benefits from per-tick LOS cache
13a-6: Auto-Resolve for Minor Battles (17 tests)
- Added `auto_resolve()` to `BattleManager` with simplified Lanchester attrition
- 10 time steps, exponent 0.5, morale and supply modifiers
- `auto_resolve_enabled` and `auto_resolve_max_units` config flags
- `AutoResolveResult` dataclass with winner, side_losses, duration
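
A rough sketch of a stepped Lanchester-style auto-resolve follows. The notes above give only the step count, the exponent, and the existence of morale and supply modifiers, so the rest is assumed for illustration: the `SideState` class, the `rate` constant, treating morale and supply as multipliers, the way the exponent scales losses by the force ratio, and the winner rule. The actual math in `battle.py` may differ.

```python
from dataclasses import dataclass


@dataclass
class SideState:
    name: str
    strength: float       # aggregate combat power, assumed positive
    morale: float = 1.0   # assumed multiplier in [0, 1]
    supply: float = 1.0   # assumed multiplier in [0, 1]


@dataclass
class AutoResolveResult:
    winner: str
    side_losses: dict
    duration: int


def auto_resolve(a: SideState, b: SideState, steps: int = 10,
                 exponent: float = 0.5, rate: float = 0.05) -> AutoResolveResult:
    sa, sb = a.strength, b.strength
    mod_a, mod_b = a.morale * a.supply, b.morale * b.supply
    duration = 0
    for _ in range(steps):
        eff_a, eff_b = sa * mod_a, sb * mod_b
        if eff_a <= 0.0 or eff_b <= 0.0:
            break  # one side can no longer fight
        # Assumed form: losses grow with enemy power and the force ratio.
        loss_a = rate * eff_b * (eff_b / eff_a) ** exponent
        loss_b = rate * eff_a * (eff_a / eff_b) ** exponent
        sa, sb = max(0.0, sa - loss_a), max(0.0, sb - loss_b)
        duration += 1
    # Assumed winner rule: the side keeping the larger share of its strength.
    frac_a, frac_b = sa / a.strength, sb / b.strength
    if frac_a > frac_b:
        winner = a.name
    elif frac_b > frac_a:
        winner = b.name
    else:
        winner = "draw"
    losses = {a.name: a.strength - sa, b.name: b.strength - sb}
    return AutoResolveResult(winner, losses, duration)
```

The function uses no random draws, so repeated calls with the same inputs give the same result; a version with stochastic attrition would need a seeded PRNG passed in to stay deterministic.
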
13a-7: Force Aggregation/Disaggregation (27 tests)
- New `simulation/aggregation.py` (~350 lines)
- `AggregationEngine` manages complete lifecycle: snapshot → aggregate → disaggregate
- `UnitSnapshot` captures all subsystem state (unit, morale, weapons, sensors, supply)
- `AggregateUnit` represents composite formation with constituent snapshots
- `check_aggregation_candidates()` finds eligible groups (distance from battle, minimum size)
- `check_disaggregation_triggers()` finds aggregates needing breakup (approaching battle)
- State persistence via `get_state()`/`set_state()`
- Deterministic aggregate IDs
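
The snapshot → aggregate → disaggregate round trip can be illustrated with a much smaller model. Everything here is a simplified assumption: the `Unit` fields, using deep copies as snapshots, the centroid position, and deriving the aggregate ID from a hash of the sorted member IDs. The real `UnitSnapshot` covers more subsystems.

```python
import copy
import hashlib
from dataclasses import dataclass, field


@dataclass
class Unit:
    unit_id: str
    position: tuple[float, float]
    morale: float
    ammo: dict[str, int] = field(default_factory=dict)


@dataclass
class AggregateUnit:
    aggregate_id: str
    position: tuple[float, float]
    snapshots: list[Unit]


def make_aggregate_id(unit_ids) -> str:
    """Same members in any order give the same ID (assumed scheme)."""
    joined = "|".join(sorted(unit_ids))
    return "agg_" + hashlib.sha256(joined.encode("utf-8")).hexdigest()[:12]


def aggregate(units: list[Unit]) -> AggregateUnit:
    ordered = sorted(units, key=lambda u: u.unit_id)
    # Deep copies so later changes to the live units cannot leak in.
    snapshots = [copy.deepcopy(u) for u in ordered]
    cx = sum(u.position[0] for u in ordered) / len(ordered)
    cy = sum(u.position[1] for u in ordered) / len(ordered)
    agg_id = make_aggregate_id(u.unit_id for u in ordered)
    return AggregateUnit(agg_id, (cx, cy), snapshots)


def disaggregate(agg: AggregateUnit) -> list[Unit]:
    """Restore independent copies of the constituent units."""
    return [copy.deepcopy(s) for s in agg.snapshots]
```

Sorting by `unit_id` before snapshotting keeps both the ID and the restore order independent of the order in which candidates were discovered.
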
13b-1: Numba Utils Infrastructure (5 tests)
- New `core/numba_utils.py` with `NUMBA_AVAILABLE` flag and `@optional_jit` decorator
- Falls back to identity decorator when Numba not installed
- Supports both `@optional_jit` and `@optional_jit(cache=False)` patterns
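
A decorator with this behavior could be written as below. This is a sketch of the pattern rather than the contents of `core/numba_utils.py`; in particular, the choice of `numba.njit` as the underlying compiler call is an assumption, and `add` and `scale` are placeholder functions.

```python
try:
    from numba import njit
    NUMBA_AVAILABLE = True
except ImportError:
    njit = None
    NUMBA_AVAILABLE = False


def optional_jit(*args, **kwargs):
    """Use as @optional_jit or @optional_jit(cache=False)."""
    # Bare form: the decorated function arrives as the only argument.
    if len(args) == 1 and callable(args[0]) and not kwargs:
        func = args[0]
        return njit(func) if NUMBA_AVAILABLE else func

    # Parameterized form: return a decorator that forwards the options.
    def decorator(func):
        if NUMBA_AVAILABLE:
            return njit(*args, **kwargs)(func)
        return func  # identity fallback, options are ignored

    return decorator


@optional_jit
def add(a, b):
    return a + b


@optional_jit(cache=False)
def scale(x):
    return 2.0 * x
```

Callers never need to check `NUMBA_AVAILABLE` themselves; the flag is mainly useful for tests and diagnostics.
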
13b-2: Numba JIT for RK4 Trajectory (18 tests)
- Extracted `_derivs_kernel()` and `_rk4_trajectory_kernel()` as JIT functions
- `compute_trajectory()` delegates to kernel, returns 2-point trajectory (start + impact)
- `@optional_jit` on `_speed_of_sound()` and `_mach_drag_multiplier()`
- Boolean config flags converted to int for JIT compatibility
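
The kernel/wrapper split might look like the sketch below. It is written in plain Python so it runs without Numba, and the physics is deliberately reduced to a flat-earth 2-D shot with a single quadratic drag constant. The signatures, the `drag_k` parameter, and the linear interpolation to ground level are assumptions; only the function names and the int-flag and 2-point-return conventions come from the notes above.

```python
import math

G = 9.81  # m/s^2


def _derivs_kernel(vx, vz, drag_k, enable_drag):
    """Acceleration from gravity plus optional quadratic drag."""
    speed = math.sqrt(vx * vx + vz * vz)
    k = drag_k * enable_drag  # enable_drag is an int (0 or 1), not a bool
    return -k * speed * vx, -G - k * speed * vz


def _rk4_trajectory_kernel(v0, elev_rad, drag_k, enable_drag, dt, max_steps):
    """Scalars in, scalars out: returns (impact_x, flight_time)."""
    x, z = 0.0, 0.0
    vx, vz = v0 * math.cos(elev_rad), v0 * math.sin(elev_rad)
    for step in range(max_steps):
        ax1, az1 = _derivs_kernel(vx, vz, drag_k, enable_drag)
        vx2, vz2 = vx + 0.5 * dt * ax1, vz + 0.5 * dt * az1
        ax2, az2 = _derivs_kernel(vx2, vz2, drag_k, enable_drag)
        vx3, vz3 = vx + 0.5 * dt * ax2, vz + 0.5 * dt * az2
        ax3, az3 = _derivs_kernel(vx3, vz3, drag_k, enable_drag)
        vx4, vz4 = vx + dt * ax3, vz + dt * az3
        ax4, az4 = _derivs_kernel(vx4, vz4, drag_k, enable_drag)
        x_new = x + dt / 6.0 * (vx + 2.0 * vx2 + 2.0 * vx3 + vx4)
        z_new = z + dt / 6.0 * (vz + 2.0 * vz2 + 2.0 * vz3 + vz4)
        if z_new < 0.0:
            # Interpolate linearly within the last step to ground level.
            frac = z / (z - z_new)
            return x + frac * (x_new - x), (step + frac) * dt
        x, z = x_new, z_new
        vx += dt / 6.0 * (ax1 + 2.0 * ax2 + 2.0 * ax3 + ax4)
        vz += dt / 6.0 * (az1 + 2.0 * az2 + 2.0 * az3 + az4)
    return x, max_steps * dt  # step limit reached before impact


def compute_trajectory(v0, elev_deg, drag_k=0.0, enable_drag=True, dt=0.01):
    """Wrapper: converts the bool flag to int and builds the Python result."""
    impact_x, _ = _rk4_trajectory_kernel(
        v0, math.radians(elev_deg), drag_k, int(enable_drag), dt, 100_000
    )
    return [(0.0, 0.0), (impact_x, 0.0)]  # start + impact only
```

Keeping the kernel free of lists, dicts, and objects is what makes it a candidate for JIT compilation; all Python-level structure is assembled in the wrapper.
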
13b-3: Numba JIT for DDA Raycasting (8 tests)
- `_los_terrain_kernel()` JIT function for terrain-only LOS ray march
- Inline bilinear interpolation and earth curvature correction
- `_check_los_terrain_jit()` method on `LOSEngine`
13b-4: A* Difficulty Grid Pre-computation (11 tests)
- `_compute_difficulty_grid()` pre-computes cell difficulty into numpy array
- `find_path()` uses array lookup (O(1)) instead of per-cell dict cache
- Bounding box + 10-cell margin with fallback for out-of-bounds cells
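
The pre-computation approach can be sketched with a small grid A*. To stay dependency-free the example stores the grid as nested lists rather than a numpy array, and the `difficulty_fn` callback, 4-connected movement, and Manhattan heuristic are assumptions for illustration.

```python
import heapq
import math


def compute_difficulty_grid(difficulty_fn, start, goal, margin=10):
    """Evaluate difficulty once per cell inside the padded bounding box."""
    r0 = min(start[0], goal[0]) - margin
    r1 = max(start[0], goal[0]) + margin
    c0 = min(start[1], goal[1]) - margin
    c1 = max(start[1], goal[1]) + margin
    grid = [[difficulty_fn(r, c) for c in range(c0, c1 + 1)]
            for r in range(r0, r1 + 1)]
    return grid, (r0, c0)


def find_path(difficulty_fn, start, goal, margin=10):
    grid, (r0, c0) = compute_difficulty_grid(difficulty_fn, start, goal, margin)
    rows, cols = len(grid), len(grid[0])

    def cost(cell):
        r, c = cell[0] - r0, cell[1] - c0
        if 0 <= r < rows and 0 <= c < cols:
            return grid[r][c]        # O(1) lookup inside the grid
        return difficulty_fn(*cell)  # fallback for out-of-bounds cells

    def heuristic(cell):
        # Manhattan distance; admissible while every cell costs at least 1.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best = {start: 0.0}
    came_from = {}
    frontier = [(heuristic(start), 0.0, start)]
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if g > best.get(cell, math.inf):
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cell[0] + dr, cell[1] + dc)
            new_g = g + cost(nxt)
            if new_g < best.get(nxt, math.inf):
                best[nxt] = new_g
                came_from[nxt] = cell
                heapq.heappush(frontier, (new_g + heuristic(nxt), new_g, nxt))
    return None
```

The search loop itself stays ordinary Python with `heapq` and dicts; the expensive terrain evaluation is paid once per cell up front instead of on every neighbor expansion.
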
13c-1: MC Parallelism Enhancement (6 tests)
- Both `MonteCarloHarness` and `CampaignMonteCarloHarness` use `submit()` + `as_completed()`
- Results sorted by seed for deterministic ordering
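
The submit-then-sort pattern looks roughly like this. The notes do not say which executor the harnesses use, so the thread pool here is an assumption chosen to keep the example lightweight (`ProcessPoolExecutor` offers the same `submit()` interface), and `run_one` is a stand-in for a full simulation run.

```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_one(seed: int) -> tuple[int, float]:
    """Stand-in for one simulation run with its own seeded PRNG."""
    rng = random.Random(seed)
    return seed, sum(rng.random() for _ in range(1000)) / 1000


def run_monte_carlo(seeds, max_workers: int = 4):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_one, seed) for seed in seeds]
        # Completion order depends on scheduling, so it is not stable.
        results = [future.result() for future in as_completed(futures)]
    # Sorting by seed restores a deterministic order for downstream stats.
    results.sort(key=lambda item: item[0])
    return results
```

Each run owns its PRNG, so the value for a given seed does not depend on which worker picked it up; sorting only fixes the order of the result list.
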
13c-2: Integration Benchmarks + Determinism Tests (11 determinism tests)
- LOS cache: selective invalidation matches full clear
- Kalman cache: cached predict identical to uncached
- RK4 trajectory: deterministic across runs
- Aggregation: round-trip preserves unit state
- Auto-resolve: deterministic given same PRNG
- Viewshed: deterministic and matches individual LOS checks
Implementation Decisions
- STRtree `buildings_at()` uses `predicate='covered_by'`, not `'contains'`: Shapely 2.0 evaluates an STRtree query predicate as predicate(query geometry, tree geometry), so finding the buildings that hold a query point means testing that the point is covered by each building.
- RK4 kernel returns a 2-point trajectory: Numba's nopython mode cannot construct arbitrary Python objects, so the fast path returns only impact data. Full trajectory recording (for visualization) would require a separate slow path.
- A* uses a pre-computed numpy grid instead of Numba JIT: A* involves heapq and dicts, which Numba doesn't support well. Pre-computing difficulty into a numpy array eliminates the per-cell method call overhead (the actual bottleneck) while keeping the A* logic in pure Python.
- Force aggregation captures all subsystem state: `UnitSnapshot` stores unit state, morale, weapons, sensors, and supply inventory. This ensures disaggregation fully restores the original unit across all simulation systems.
- Auto-resolve adapted from COA wargaming: The Lanchester attrition model in `battle.py` is adapted from `c2/planning/coa.py::wargame_coa()`, using the same exponent and attrition math.
Known Limitations
- Numba not installed by default: JIT kernels only activate with `uv sync --extra perf`. Without Numba, all code paths use pure-Python fallbacks with identical behavior.
- Auto-resolve is simplified: Uses aggregate combat power, not per-unit engagement. Suitable for minor/distant battles, not main battles.
- Aggregation does not handle orders: Active orders are lost on aggregation. Disaggregated units await new orders.
- Thread-pool per-side parallelism not implemented: The plan included per-side thread-pool parallelism within ticks, but this was deferred because it requires careful PRNG stream partitioning, and the risk of non-determinism outweighs the benefit at current scenario scales.
Postmortem Cleanup (28 tests)
Benchmarking revealed that two major features — force aggregation and selective LOS invalidation — were implemented but not wired into the simulation loop. Auto-resolve was already wired. The postmortem wires the remaining features and adds profiling infrastructure.
Changes
- `simulation/scenario.py`: Added `aggregation_engine: Any = None` field to `SimulationContext`. Included in `get_state()`/`set_state()` engine lists. Instantiated `AggregationEngine` in `ScenarioLoader._create_engines()`.
- `simulation/engine.py`: Wired aggregation into strategic tick: between `update_strategic()` and `detect_engagements()`, runs `check_disaggregation_triggers()` → `disaggregate()`, then `check_aggregation_candidates()` → `aggregate()`. Added `_compute_battle_positions()` helper (computes centroids from active battle unit_ids). Added `enable_selective_los_invalidation` config flag to `EngineConfig`. Replaced monolithic LOS cache clear with dirty-cell tracking: `_snapshot_unit_cells()` before/after movement, `invalidate_cells()` for changed cells.
- `scripts/profile_golan.py`: New profiling script using `PerformanceProfiler` for the Golan Heights campaign.
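
The dirty-cell tracking described for `simulation/engine.py` could work along these lines. The position format, the `cell_size` parameter, and the free-function form are assumptions; the real `_snapshot_unit_cells()` is a method whose signature is not shown in these notes.

```python
def snapshot_unit_cells(positions, cell_size):
    """Map each unit ID to the grid cell containing its (x, y) position."""
    return {
        unit_id: (int(x // cell_size), int(y // cell_size))
        for unit_id, (x, y) in positions.items()
    }


def dirty_cells(before, after):
    """Cells vacated or entered by units that changed cell, appeared, or vanished."""
    dirty = set()
    for unit_id in before.keys() | after.keys():
        old, new = before.get(unit_id), after.get(unit_id)
        if old != new:
            if old is not None:
                dirty.add(old)
            if new is not None:
                dirty.add(new)
    return dirty


# Usage: snapshot, move, snapshot again, then invalidate only what changed.
before = snapshot_unit_cells({"a": (10.0, 10.0), "b": (250.0, 40.0)}, 100.0)
after = snapshot_unit_cells({"a": (12.0, 15.0), "b": (310.0, 40.0)}, 100.0)
changed = dirty_cells(before, after)  # would be passed to invalidate_cells()
```

A unit that moves within its cell contributes nothing to the dirty set, which is why the approach pays off when most units are stationary or slow relative to the grid resolution.
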
Known Limitations
- Aggregation still disabled by default: `AggregationConfig.enable_aggregation = False`. Explicit opt-in required via scenario config.
- Selective LOS disabled by default: `EngineConfig.enable_selective_los_invalidation = False`. Full clear remains the safe default.
- Aggregation does not preserve active orders: Disaggregated units await new orders (unchanged from Phase 13a-7).
Test Files
| File | Tests | Focus |
|---|---|---|
| `test_phase13_benchmarks.py` | 7 | Performance baselines |
| `test_phase_13a2_strtree.py` | 14 | STRtree spatial indexing |
| `test_phase_13a3_kalman_cache.py` | 6 | Kalman F/Q caching |
| `test_phase_13a4_los_cache.py` | 11 | Selective LOS invalidation |
| `test_phase_13a5_viewshed.py` | 8 | Viewshed vectorization |
| `test_phase_13a6_auto_resolve.py` | 17 | Auto-resolve |
| `test_phase_13a7_aggregation.py` | 27 | Force aggregation/disaggregation |
| `test_phase_13b1_numba_utils.py` | 5 | Numba utils infrastructure |
| `test_phase_13b2_numba_rk4.py` | 18 | RK4 JIT kernel |
| `test_phase_13b3_numba_dda.py` | 8 | DDA raycasting JIT |
| `test_phase_13b4_astar_precompute.py` | 11 | A* difficulty grid |
| `test_phase_13c1_mc_parallel.py` | 6 | MC parallelism |
| `test_phase13_determinism.py` | 11 | Determinism verification |
| `test_phase_13_postmortem.py` | 28 | Aggregation wiring, selective LOS wiring, integration |