# Story 4.5: Time-Budgeted Solving

Status: done

## Story

As a HIL engineer (Sarah),
I want a strict timeout with graceful degradation,
so that real-time constraints are never violated.

## Acceptance Criteria

1. **Strict Timeout Enforcement** (AC: #1)
   - Given a solver with timeout = 1000 ms
   - When the time budget is exceeded
   - Then the solver stops immediately (no iteration continues past the timeout)
   - And the timeout is checked at the start of each iteration

2. **Best State Return on Timeout** (AC: #2)
   - Given a solver that times out
   - When returning from the timeout
   - Then it returns `ConvergedState` with `status = TimedOutWithBestState`
   - And `state` contains the best-known state (lowest residual norm encountered)
   - And `iterations` contains the count of completed iterations
   - And `final_residual` contains the best residual norm

3. **HIL Zero-Order Hold (ZOH) Support** (AC: #3)
   - Given a HIL scenario with a previous state available
   - When a timeout occurs
   - Then the solver can optionally return the previous state instead of the current best
   - And the `zoh_fallback: bool` config option controls this behavior

4. **Timeout Across Fallback Switches** (AC: #4)
   - Given `FallbackSolver` with a timeout configured
   - When fallback occurs between Newton and Picard
   - Then the timeout applies to the total solving time (already implemented in Story 4.4)
   - And the best state is preserved across solver switches

5. **Pre-Allocated Buffers** (AC: #5)
   - Given a finalized `System`
   - When the solver initializes
   - Then all buffers for tracking the best state are pre-allocated
   - And no heap allocation occurs during the iteration loop

6. **Configurable Timeout Behavior** (AC: #6)
   - Given the `TimeoutConfig` struct
   - When setting `return_best_state_on_timeout: false`
   - Then the solver returns `SolverError::Timeout` instead of `ConvergedState`
   - And both `zoh_fallback` and `return_best_state_on_timeout` are configurable

## Tasks / Subtasks

- [ ] Implement `TimeoutConfig` struct in `crates/solver/src/solver.rs` (AC: #6)
  - [ ] Add `return_best_state_on_timeout: bool` (default: true)
  - [ ] Add `zoh_fallback: bool` (default: false)
  - [ ] Implement `Default` trait
- [ ] Add best-state tracking to `NewtonConfig` (AC: #1, #2, #5)
  - [ ] Add a `best_state: Vec<f64>` buffer, pre-allocated in `solve()` before the iteration loop
  - [ ] Add a `best_residual: f64` tracking variable
  - [ ] Update the best state when the residual improves
  - [ ] Return `ConvergedState` with `TimedOutWithBestState` on timeout
- [ ] Add best-state tracking to `PicardConfig` (AC: #1, #2, #5)
  - [ ] Add a `best_state: Vec<f64>` buffer, pre-allocated in `solve()` before the iteration loop
  - [ ] Add a `best_residual: f64` tracking variable
  - [ ] Update the best state when the residual improves
  - [ ] Return `ConvergedState` with `TimedOutWithBestState` on timeout
- [ ] Update `FallbackSolver` for best-state preservation (AC: #4)
  - [ ] Track the best state across solver switches
  - [ ] Return the best state on timeout, regardless of which solver was active
- [ ] Implement ZOH fallback support (AC: #3)
  - [ ] Add `previous_state: Option<Vec<f64>>` to the solver configs
  - [ ] On timeout with `zoh_fallback: true`, return the previous state if available
- [ ] Integration tests (AC: #1-#6)
  - [ ] Test that timeout returns the best state (not an error)
  - [ ] Test that the best state is actually the lowest residual encountered
  - [ ] Test that ZOH fallback returns the previous state
  - [ ] Test timeout behavior with `return_best_state_on_timeout: false`
  - [ ] Test that timeout across fallback switches preserves the best state
  - [ ] Test that no heap allocation occurs during iteration with best-state tracking

## Dev Notes

### Epic Context

**Epic 4: Intelligent Solver Engine.** Solve any system with a < 1 s guarantee, using Newton-Raphson ↔ Sequential Substitution
fallback.

**Story Dependencies:**

- **Story 4.1 (Solver Trait Abstraction)**: DONE. `Solver` trait, `SolverError`, `ConvergedState` defined
- **Story 4.2 (Newton-Raphson Implementation)**: DONE. Full Newton-Raphson with line search, timeout, divergence detection
- **Story 4.3 (Sequential Substitution)**: DONE. Picard implementation with relaxation, timeout, divergence detection
- **Story 4.4 (Intelligent Fallback Strategy)**: DONE. `FallbackSolver` with timeout across switches
- **Story 4.6 (Smart Initialization Heuristic)**: NEXT. Automatic initial guesses from temperatures

**FRs covered:** FR17 (configurable timeout), FR18 (best state on timeout), FR20 (convergence criterion)

### Architecture Context

**Technical Stack:**

- `thiserror` for error handling (already in solver)
- `tracing` for observability (already in solver)
- `std::time::Instant` for timeout enforcement

**Code Structure:**

- `crates/solver/src/solver.rs`: `NewtonConfig`, `PicardConfig`, `FallbackSolver` modifications
- `crates/solver/src/system.rs`: EXISTING `System` with `compute_residuals()`

**Relevant Architecture Decisions:**

- **No allocation in hot path:** Pre-allocate best-state buffers before the iteration loop [Source: architecture.md]
- **Error handling:** Centralized error enum with `thiserror` [Source: architecture.md]
- **Zero-panic policy:** All operations return `Result` [Source: architecture.md]
- **HIL latency < 20 ms:** Real-time constraints must be respected [Source: prd.md NFR6]

### Developer Context

**Existing Implementation (Stories 4.1, 4.2, 4.3, 4.4):**

```rust
// crates/solver/src/solver.rs - EXISTING
pub enum ConvergenceStatus {
    Converged,
    TimedOutWithBestState, // Already defined for this story!
}

pub struct ConvergedState {
    pub state: Vec<f64>,
    pub iterations: usize,
    pub final_residual: f64,
    pub status: ConvergenceStatus,
}

// Current timeout behavior (Stories 4.2/4.3):
//   returns Err(SolverError::Timeout { timeout_ms }) on timeout.
// This story changes it to return
//   Ok(ConvergedState { status: TimedOutWithBestState, .. }).
```

**Current Timeout Implementation:**

```rust
// In NewtonConfig::solve() and PicardConfig::solve()
if let Some(timeout) = self.timeout {
    if start_time.elapsed() > timeout {
        tracing::info!(...);
        return Err(SolverError::Timeout { timeout_ms: ... });
    }
}
```

**What Needs to Change:**

1. Track the best state during iteration (pre-allocated buffer).
2. On timeout, return `Ok(ConvergedState { status: TimedOutWithBestState, ... })`.
3. Make this behavior configurable via `TimeoutConfig`.

### Technical Requirements

**Best-State Tracking Algorithm:**

```
Input:  System, timeout
Output: ConvergedState (Converged or TimedOutWithBestState)

1. Initialize:
   - best_state    = pre-allocated buffer (copy of initial state)
   - best_residual = initial residual norm
   - start_time    = Instant::now()

2. Each iteration:
   a. Check timeout BEFORE starting the iteration
   b. Compute residuals and update state
   c. If new residual < best_residual:
      - Copy current state into best_state (in place, no allocation)
      - Update best_residual = new residual
   d. Check convergence

3. On timeout:
   - If return_best_state_on_timeout:
       Return Ok(ConvergedState {
           state: best_state,
           iterations: completed_iterations,
           final_residual: best_residual,
           status: TimedOutWithBestState,
       })
   - Else:
       Return Err(SolverError::Timeout { timeout_ms })
```

**Key Design Decisions:**

| Decision | Rationale |
|----------|-----------|
| Check timeout at iteration start | No new iteration starts once the budget is spent; any overrun is bounded by the duration of one iteration |
| Pre-allocate `best_state` buffer | No heap allocation in hot path (NFR4) |
| Track best residual, not latest | The best state is more useful for HIL |
| Configurable return behavior | Some users prefer an error on timeout |
| ZOH fallback optional | HIL-specific feature, not always needed |

**TimeoutConfig Structure:**

```rust
pub struct TimeoutConfig {
    /// Return best-known state on timeout instead of error.
    /// Default: true (graceful degradation for HIL)
    pub return_best_state_on_timeout: bool,

    /// On timeout, return previous state (ZOH) instead of current best.
    /// Requires `previous_state` to be set before solving.
    /// Default: false
    pub zoh_fallback: bool,
}
```

**Integration with Existing Configs:**

```rust
use std::time::Duration;

pub struct NewtonConfig {
    // ... existing fields ...
    pub timeout: Option<Duration>,

    // NEW: Timeout behavior configuration
    pub timeout_config: TimeoutConfig,

    // NEW: Previous state for ZOH fallback (set via `with_previous_state`)
    pub previous_state: Option<Vec<f64>>,

    // NOTE: The best-state buffer is allocated once in solve(),
    // before the iteration loop; it is not stored in the config.
}

pub struct PicardConfig {
    // ... existing fields ...
    pub timeout: Option<Duration>,

    // NEW: Timeout behavior configuration
    pub timeout_config: TimeoutConfig,

    // NEW: Previous state for ZOH fallback
    pub previous_state: Option<Vec<f64>>,
}
```

**ZOH (Zero-Order Hold) for HIL:**

```rust
impl NewtonConfig {
    /// Set previous state for ZOH fallback on timeout.
    pub fn with_previous_state(mut self, state: Vec<f64>) -> Self {
        self.previous_state = Some(state);
        self
    }

    // In solve():
    // On timeout with zoh_fallback = true and previous_state available,
    // return previous_state instead of best_state.
}
```

### Architecture Compliance

- **NewType pattern:** Use `Pressure` and `Temperature` from core where applicable
- **No bare f64** in the public API where a physical meaning exists
- **tracing:** Use `tracing::info!` for timeout events and `tracing::debug!` for best-state updates
- **Result:** On timeout with `return_best_state_on_timeout: true`, return `Ok(ConvergedState)`
- **approx:** Use `assert_relative_eq!` in tests for floating-point comparisons
- **Pre-allocation:** Best-state buffer allocated once, before the iteration loop

### Library/Framework Requirements

- **thiserror**: Error enum derive (already in solver)
- **tracing**: Structured logging (already in solver)
- **std::time::Instant**: Timeout enforcement

### File Structure Requirements

**Modified files:**

- `crates/solver/src/solver.rs`: Add `TimeoutConfig`; modify `NewtonConfig`, `PicardConfig`, `FallbackSolver`

**Tests:**

- Unit tests in `solver.rs` (timeout behavior, best-state tracking, ZOH fallback)
- Integration tests in the `tests/` directory (full system solving with timeout)

### Testing Requirements

**Unit Tests:**

- `TimeoutConfig` defaults are sensible
- The best state is tracked correctly during iteration
- Timeout returns `ConvergedState` with `TimedOutWithBestState`
- ZOH fallback returns the previous state when configured
- `return_best_state_on_timeout: false` returns an error on timeout

**Integration Tests:**

- A system that times out returns the best state (not an error)
- The best state has a lower residual than the initial state
- Timeout across fallback switches preserves the best state
- HIL scenario: ZOH fallback returns the previous state

**Performance Tests:**

- No heap allocation during iteration with best-state tracking
- Timeout check overhead is negligible (< 1 μs per check)

### Previous Story Intelligence (4.4)

**FallbackSolver Implementation Complete:**

- `FallbackConfig` with `fallback_enabled`, `return_to_newton_threshold`, `max_fallback_switches`
- `FallbackSolver` wrapping `NewtonConfig` and `PicardConfig`
- Timeout applies to the total solving time across switches
- Pre-allocated buffers pattern established

**Key Patterns to Follow:**

- Use the `residual_norm()` helper for the L2 norm calculation
- Use `tracing::debug!` for iteration logging
- Use `tracing::info!` for timeout events
- Return `ConvergedState::new()` on success

**Best-State Tracking Considerations:**

- Track the best state in `FallbackSolver` across solver switches
- Each underlying solver (Newton/Picard) tracks its own best state
- `FallbackSolver` preserves the best state when switching

### Git Intelligence

Recent commits show:

- `be70a7a`: feat(core): implement physical types with NewType pattern
- Epics 1-3 complete (components, fluids, topology)
- Stories 4.1-4.4 complete (Solver trait, Newton, Picard, Fallback)
- Ready for Time-Budgeted Solving implementation

### Project Context Reference

- **FR17:** Solver respects configurable time budget (timeout) [Source: epics.md]
- **FR18:** On timeout, solver returns best known state with NonConverged status [Source: epics.md]
- **FR20:** Convergence criterion checks Delta Pressure < 1 Pa (1e-5 bar) [Source: epics.md]
- **NFR1:** Steady State convergence time < 1 second for standard cycle in Cold Start [Source: prd.md]
- **NFR4:** No dynamic allocation in solver loop (pre-calculated allocation only) [Source: prd.md]
- **NFR6:** HIL latency < 20 ms for real-time integration with PLC [Source: prd.md]
- **NFR10:** Graceful error handling: timeout, non-convergence, saturation return explicit Result [Source: prd.md]
- **Solver Architecture:** Trait-based static polymorphism with enum dispatch [Source: architecture.md]
- **Error Handling:** Centralized error enum with thiserror [Source: architecture.md]

### Story Completion Status

- **Status:** ready-for-dev
- **Completion note:** Ultimate context engine analysis completed; comprehensive developer guide created
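
As a reference for the **Best-State Tracking Algorithm** and `TimeoutConfig` under Technical Requirements, here is a minimal self-contained sketch in plain Rust. It is an illustration under stated assumptions, not the `solver.rs` implementation. It re-declares simplified `ConvergenceStatus`, `ConvergedState`, `SolverError` and `TimeoutConfig` so that it compiles on its own. The `solve_with_budget` function, its `step` closure and the `SolverError::NonConvergence` variant are all hypothetical. The `step` closure stands for one Newton or Picard update applied in place, and it returns the new residual norm. ZOH selection is left out of this sketch.

```rust
use std::time::{Duration, Instant};

/// Simplified mirror of the existing `ConvergenceStatus`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ConvergenceStatus {
    Converged,
    TimedOutWithBestState,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ConvergedState {
    pub state: Vec<f64>,
    pub iterations: usize,
    pub final_residual: f64,
    pub status: ConvergenceStatus,
}

/// Simplified stand-in for `SolverError` (only the variants this sketch needs).
#[derive(Debug, Clone, PartialEq)]
pub enum SolverError {
    Timeout { timeout_ms: u64 },
    NonConvergence { iterations: usize }, // hypothetical variant
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TimeoutConfig {
    pub return_best_state_on_timeout: bool,
    pub zoh_fallback: bool,
}

impl Default for TimeoutConfig {
    fn default() -> Self {
        Self { return_best_state_on_timeout: true, zoh_fallback: false }
    }
}

/// Hypothetical budgeted loop: `step` updates `state` in place and returns the
/// residual norm of the updated state.
pub fn solve_with_budget<F>(
    initial: &[f64],
    initial_residual: f64,
    timeout: Duration,
    tolerance: f64,
    max_iterations: usize,
    cfg: TimeoutConfig,
    mut step: F,
) -> Result<ConvergedState, SolverError>
where
    F: FnMut(&mut [f64]) -> f64,
{
    // Both buffers are allocated once, before the iteration loop.
    let mut state = initial.to_vec();
    let mut best_state = initial.to_vec();
    let mut best_residual = initial_residual;
    let start = Instant::now();

    for completed in 0..max_iterations {
        // Timeout is checked BEFORE the iteration starts; `>=` makes a zero
        // budget stop before the first iteration.
        if start.elapsed() >= timeout {
            if !cfg.return_best_state_on_timeout {
                return Err(SolverError::Timeout { timeout_ms: timeout.as_millis() as u64 });
            }
            return Ok(ConvergedState {
                state: best_state, // moved out, not copied
                iterations: completed,
                final_residual: best_residual,
                status: ConvergenceStatus::TimedOutWithBestState,
            });
        }

        let residual = step(state.as_mut_slice());

        // Keep the lowest-residual state; copy_from_slice reuses the buffer.
        if residual < best_residual {
            best_state.copy_from_slice(&state);
            best_residual = residual;
        }

        if residual < tolerance {
            return Ok(ConvergedState {
                state,
                iterations: completed + 1,
                final_residual: residual,
                status: ConvergenceStatus::Converged,
            });
        }
    }
    Err(SolverError::NonConvergence { iterations: max_iterations })
}
```

The timeout is only checked at iteration boundaries, so a call can overrun its budget by at most one iteration. The sketch's own loop body does not allocate, because `state` and `best_state` are sized up front and updated in place. The `step` closure is responsible for any buffers it uses internally.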
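
For AC #3 and AC #4, one possible design is for `FallbackSolver` to own a single best-state tracker. Whenever Newton ↔ Picard switch, or on timeout, it offers each underlying solver's best state into that tracker. The choice between the ZOH previous state and the best state is then made only when the timeout fires. The sketch below assumes that design. `BestStateTracker`, `offer`, `best` and `timeout_output` are hypothetical names, not existing APIs.

```rust
/// Hypothetical helper owned by `FallbackSolver`: one buffer, sized once from
/// the system's unknown count, that survives Newton <-> Picard switches.
pub struct BestStateTracker {
    best_state: Vec<f64>,
    best_residual: f64,
}

impl BestStateTracker {
    /// Allocates the buffer once, before any solver runs.
    pub fn new(n_unknowns: usize) -> Self {
        Self {
            best_state: vec![0.0; n_unknowns],
            best_residual: f64::INFINITY,
        }
    }

    /// Offers a candidate from whichever solver is active. It is kept only if
    /// its residual is strictly lower; returns true when it was kept.
    pub fn offer(&mut self, state: &[f64], residual: f64) -> bool {
        // A length mismatch is rejected rather than panicking in copy_from_slice;
        // `!(a < b)` also rejects a NaN residual.
        if state.len() != self.best_state.len() || !(residual < self.best_residual) {
            return false;
        }
        self.best_state.copy_from_slice(state); // in place, no allocation
        self.best_residual = residual;
        true
    }

    /// Best state and its residual, or None if nothing has been accepted yet.
    pub fn best(&self) -> Option<(&[f64], f64)> {
        if self.best_residual.is_finite() {
            Some((self.best_state.as_slice(), self.best_residual))
        } else {
            None
        }
    }
}

/// Chooses the state handed back on timeout: the ZOH previous state when
/// `zoh_fallback` is enabled and one is available, otherwise the best state.
pub fn timeout_output<'a>(
    tracker: &'a BestStateTracker,
    zoh_fallback: bool,
    previous_state: Option<&'a [f64]>,
) -> Option<&'a [f64]> {
    match previous_state {
        Some(prev) if zoh_fallback => Some(prev),
        _ => tracker.best().map(|(state, _)| state),
    }
}
```

`best_residual` starts at `f64::INFINITY`, so the first finite residual is always accepted. A NaN residual from a diverging iteration can never replace the stored state. The length check in `offer` keeps the helper panic-free, in line with the zero-panic policy.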