Sepehr 2d3d19665b docs(bmad): sync status and renumber Epic 8 (Fluid-Component integration)

2026-02-20 22:29:42 +01:00

Story 4.5: Time-Budgeted Solving

Status: done

Story

As a HIL engineer (Sarah), I want a strict timeout with graceful degradation, so that real-time constraints are never violated.

Acceptance Criteria

Strict Timeout Enforcement (AC: #1)
- Given a solver with timeout = 1000 ms
- When the time budget is exceeded
- Then the solver stops at the next iteration boundary (no new iteration starts once the timeout has elapsed)
- And the timeout is checked at the start of each iteration
Best State Return on Timeout (AC: #2)
- Given solver that times out
- When returning from timeout
- Then returns ConvergedState with status = TimedOutWithBestState
- And state contains the best-known state (lowest residual norm encountered)
- And iterations contains the count of completed iterations
- And final_residual contains the best residual norm
HIL Zero-Order Hold (ZOH) Support (AC: #3)
- Given HIL scenario with previous state available
- When timeout occurs
- Then solver can optionally return previous state instead of current best
- And zoh_fallback: bool config option controls this behavior
Timeout Across Fallback Switches (AC: #4)
- Given FallbackSolver with timeout configured
- When fallback occurs between Newton and Picard
- Then timeout applies to total solving time (already implemented in Story 4.4)
- And best state is preserved across solver switches
Pre-Allocated Buffers (AC: #5)
- Given a finalized System
- When the solver initializes
- Then all buffers for tracking best state are pre-allocated
- And no heap allocation occurs during iteration loop
Configurable Timeout Behavior (AC: #6)
- Given TimeoutConfig struct
- When setting return_best_state_on_timeout: false
- Then solver returns SolverError::Timeout instead of ConvergedState
- And zoh_fallback and return_best_state_on_timeout are configurable

Tasks / Subtasks

Implement TimeoutConfig struct in crates/solver/src/solver.rs (AC: #6)
- Add return_best_state_on_timeout: bool (default: true)
- Add zoh_fallback: bool (default: false)
- Implement Default trait
Add best-state tracking to NewtonConfig (AC: #1, #2, #5)
- Add best_state: Vec<f64> buffer, pre-allocated once in solve() before the iteration loop
- Add best_residual: f64 tracking variable
- Update best state when residual improves
- Return ConvergedState with TimedOutWithBestState on timeout
Add best-state tracking to PicardConfig (AC: #1, #2, #5)
- Add best_state: Vec<f64> buffer, pre-allocated once in solve() before the iteration loop
- Add best_residual: f64 tracking variable
- Update best state when residual improves
- Return ConvergedState with TimedOutWithBestState on timeout
Update FallbackSolver for best-state preservation (AC: #4)
- Track best state across solver switches
- Return best state on timeout regardless of which solver was active
Implement ZOH fallback support (AC: #3)
- Add previous_state: Option<Vec<f64>> to solver configs
- On timeout with zoh_fallback: true, return previous state if available
Integration tests (AC: #1-#6)
- Test timeout returns best state (not error)
- Test best state is actually the lowest residual encountered
- Test ZOH fallback returns previous state
- Test timeout behavior with return_best_state_on_timeout: false
- Test timeout across fallback switches preserves best state
- Test no heap allocation during iteration with best-state tracking

Dev Notes

Epic Context

Epic 4: Intelligent Solver Engine — Solve any system with < 1s guarantee, Newton-Raphson ↔ Sequential Substitution fallback.

Story Dependencies:

Story 4.1 (Solver Trait Abstraction) — DONE: Solver trait, SolverError, ConvergedState defined
Story 4.2 (Newton-Raphson Implementation) — DONE: Full Newton-Raphson with line search, timeout, divergence detection
Story 4.3 (Sequential Substitution) — DONE: Picard implementation with relaxation, timeout, divergence detection
Story 4.4 (Intelligent Fallback Strategy) — DONE: FallbackSolver with timeout across switches
Story 4.6 (Smart Initialization Heuristic) — NEXT: Automatic initial guesses from temperatures

FRs covered: FR17 (configurable timeout), FR18 (best state on timeout), FR20 (convergence criterion)

Architecture Context

Technical Stack:

thiserror for error handling (already in solver)
tracing for observability (already in solver)
std::time::Instant for timeout enforcement

Code Structure:

crates/solver/src/solver.rs — NewtonConfig, PicardConfig, FallbackSolver modifications
crates/solver/src/system.rs — EXISTING: System with compute_residuals()

Relevant Architecture Decisions:

No allocation in hot path: Pre-allocate best-state buffers before iteration loop [Source: architecture.md]
Error Handling: Centralized error enum with thiserror [Source: architecture.md]
Zero-panic policy: All operations return Result [Source: architecture.md]
HIL latency < 20ms: Real-time constraints must be respected [Source: prd.md NFR6]

Developer Context

Existing Implementation (Story 4.1 + 4.2 + 4.3 + 4.4):

// crates/solver/src/solver.rs - EXISTING

pub enum ConvergenceStatus {
    Converged,
    TimedOutWithBestState,  // Already defined for this story!
}

pub struct ConvergedState {
    pub state: Vec<f64>,
    pub iterations: usize,
    pub final_residual: f64,
    pub status: ConvergenceStatus,
}

// Current timeout behavior (Story 4.2/4.3):
// Returns Err(SolverError::Timeout { timeout_ms }) on timeout
// This story changes it to return Ok(ConvergedState { status: TimedOutWithBestState })

Current Timeout Implementation:

// In NewtonConfig::solve() and PicardConfig::solve()
if let Some(timeout) = self.timeout {
    if start_time.elapsed() > timeout {
        tracing::info!(...);
        return Err(SolverError::Timeout { timeout_ms: ... });
    }
}

What Needs to Change:

Track best state during iteration (pre-allocated buffer)
On timeout, return Ok(ConvergedState { status: TimedOutWithBestState, ... })
Make this behavior configurable via TimeoutConfig

Technical Requirements

Best-State Tracking Algorithm:

Input: System, timeout
Output: ConvergedState (Converged or TimedOutWithBestState)

1. Initialize:
   - best_state = pre-allocated buffer (copy of initial state)
   - best_residual = initial residual norm
   - start_time = Instant::now()

2. Each iteration:
   a. Check timeout BEFORE starting iteration
   b. Compute residuals and update state
   c. If new residual < best_residual:
      - Copy current state to best_state
      - Update best_residual = new residual
   d. Check convergence

3. On timeout:
   - If return_best_state_on_timeout:
     - Return Ok(ConvergedState {
         state: best_state,
         iterations: completed_iterations,
         final_residual: best_residual,
         status: TimedOutWithBestState,
       })
   - Else:
     - Return Err(SolverError::Timeout { timeout_ms })
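The algorithm above can be sketched as a generic Rust loop. This is a minimal, self-contained sketch, not the crate's implementation: `solve_with_budget`, its closure parameters and the `SolverError::NonConvergence` variant are hypothetical, and `ConvergenceStatus`, `ConvergedState` and `SolverError` are simplified stand-ins for the real types in solver.rs (no thiserror derive).

```rust
use std::time::{Duration, Instant};

// Simplified stand-ins for the solver crate's types (assumed shapes).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConvergenceStatus {
    Converged,
    TimedOutWithBestState,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ConvergedState {
    pub state: Vec<f64>,
    pub iterations: usize,
    pub final_residual: f64,
    pub status: ConvergenceStatus,
}

#[derive(Debug, Clone, PartialEq)]
pub enum SolverError {
    Timeout { timeout_ms: u64 },
    NonConvergence { iterations: usize }, // hypothetical variant
}

/// Hypothetical time-budgeted loop: `residual_norm_of` evaluates the residual
/// norm of a state, `update` performs one solver step in place.
pub fn solve_with_budget(
    mut state: Vec<f64>,
    residual_norm_of: impl Fn(&[f64]) -> f64,
    update: impl Fn(&mut [f64]),
    tolerance: f64,
    max_iterations: usize,
    timeout: Duration,
    return_best_state_on_timeout: bool,
) -> Result<ConvergedState, SolverError> {
    // 1. Initialize: the best-state buffer is allocated once, before the loop.
    let mut best_state = state.clone();
    let mut best_residual = residual_norm_of(state.as_slice());
    let start_time = Instant::now();

    for iteration in 0..max_iterations {
        // 2a. Check the budget BEFORE starting the iteration;
        //     `iteration` is the number of completed iterations at this point.
        if start_time.elapsed() > timeout {
            return if return_best_state_on_timeout {
                Ok(ConvergedState {
                    state: best_state,
                    iterations: iteration,
                    final_residual: best_residual,
                    status: ConvergenceStatus::TimedOutWithBestState,
                })
            } else {
                Err(SolverError::Timeout { timeout_ms: timeout.as_millis() as u64 })
            };
        }

        // 2b. One solver step, then evaluate the new residual norm.
        update(state.as_mut_slice());
        let residual = residual_norm_of(state.as_slice());

        // 2c. Keep the lowest-residual state: copy into the existing buffer (no allocation).
        if residual < best_residual {
            best_state.copy_from_slice(&state);
            best_residual = residual;
        }

        // 2d. Convergence check.
        if residual < tolerance {
            return Ok(ConvergedState {
                state,
                iterations: iteration + 1,
                final_residual: residual,
                status: ConvergenceStatus::Converged,
            });
        }
    }
    Err(SolverError::NonConvergence { iterations: max_iterations })
}
```

Because the budget is checked only at iteration boundaries, an iteration that starts just before the deadline still runs to completion; the check bounds when the last iteration may start, not when it ends.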

Key Design Decisions:

Decision	Rationale
Check timeout at iteration start	No new iteration starts once the budget is spent; worst-case overrun is the duration of one iteration
Pre-allocate best_state buffer	No heap allocation in hot path (NFR4)
Track best residual, not latest	Iterates can oscillate or diverge before the timeout; the lowest-residual state is the most useful fallback for HIL
Configurable return behavior	Some users prefer error on timeout
ZOH fallback optional	HIL-specific feature, not always needed

TimeoutConfig Structure:

pub struct TimeoutConfig {
    /// Return best-known state on timeout instead of error.
    /// Default: true (graceful degradation for HIL)
    pub return_best_state_on_timeout: bool,
    
    /// On timeout, return previous state (ZOH) instead of current best.
    /// Requires `previous_state` to be set before solving.
    /// Default: false
    pub zoh_fallback: bool,
}
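A possible `Default` implementation matching the documented defaults is sketched below; the derive list is an assumption. Note that `#[derive(Default)]` would set both flags to `false`, contradicting the intended `return_best_state_on_timeout: true`, so a manual impl appears necessary.

```rust
/// Sketch of TimeoutConfig with the defaults stated in this story.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TimeoutConfig {
    /// Return best-known state on timeout instead of error.
    pub return_best_state_on_timeout: bool,
    /// On timeout, return previous state (ZOH) instead of current best.
    pub zoh_fallback: bool,
}

impl Default for TimeoutConfig {
    fn default() -> Self {
        Self {
            // Graceful degradation for HIL is the default.
            return_best_state_on_timeout: true,
            // ZOH is opt-in.
            zoh_fallback: false,
        }
    }
}
```

An HIL caller could then opt into ZOH with struct-update syntax, e.g. `TimeoutConfig { zoh_fallback: true, ..TimeoutConfig::default() }`.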

Integration with Existing Configs:

pub struct NewtonConfig {
    // ... existing fields ...
    pub timeout: Option<Duration>,
    
    // NEW: Timeout behavior configuration
    pub timeout_config: TimeoutConfig,
    
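    // NEW: Previous state for ZOH fallback (set via with_previous_state())
    pub previous_state: Option<Vec<f64>>,

    // The best-state buffer is NOT a config field: it is pre-allocated
    // once in solve(), before the iteration loop.
}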

pub struct PicardConfig {
    // ... existing fields ...
    pub timeout: Option<Duration>,
    
    // NEW: Timeout behavior configuration
    pub timeout_config: TimeoutConfig,

    // NEW: Previous state for ZOH fallback
    pub previous_state: Option<Vec<f64>>,
}

ZOH (Zero-Order Hold) for HIL:

impl NewtonConfig {
    /// Set previous state for ZOH fallback on timeout.
    pub fn with_previous_state(mut self, state: Vec<f64>) -> Self {
        self.previous_state = Some(state);
        self
    }
    
    // In solve():
    // On timeout with zoh_fallback=true and previous_state available:
    // Return previous_state instead of best_state
}
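One way to let Newton and Picard share the timeout exit is a small helper, sketched below under stated assumptions: `timeout_outcome` is a hypothetical name; `return_best_state_on_timeout: false` is assumed to take precedence over `zoh_fallback`; and on the ZOH path `final_residual` is reported as `NaN` because the previous state is not re-evaluated against the current inputs (the story does not specify this). The types are simplified stand-ins.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TimeoutConfig {
    pub return_best_state_on_timeout: bool,
    pub zoh_fallback: bool,
}

impl Default for TimeoutConfig {
    fn default() -> Self {
        Self { return_best_state_on_timeout: true, zoh_fallback: false }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConvergenceStatus {
    Converged,
    TimedOutWithBestState,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ConvergedState {
    pub state: Vec<f64>,
    pub iterations: usize,
    pub final_residual: f64,
    pub status: ConvergenceStatus,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SolverError {
    Timeout { timeout_ms: u64 },
}

/// Hypothetical helper called once, after the loop, when the budget is exhausted.
pub fn timeout_outcome(
    config: TimeoutConfig,
    best_state: Vec<f64>,
    best_residual: f64,
    previous_state: Option<&[f64]>,
    iterations: usize,
    timeout: Duration,
) -> Result<ConvergedState, SolverError> {
    // Assumption: the strict (error) mode overrides ZOH.
    if !config.return_best_state_on_timeout {
        return Err(SolverError::Timeout { timeout_ms: timeout.as_millis() as u64 });
    }
    match (config.zoh_fallback, previous_state) {
        // ZOH: hold the previous output; its residual was not recomputed (NaN by assumption).
        (true, Some(prev)) => Ok(ConvergedState {
            state: prev.to_vec(),
            iterations,
            final_residual: f64::NAN,
            status: ConvergenceStatus::TimedOutWithBestState,
        }),
        // Default, or ZOH requested without a previous state: return the best state.
        _ => Ok(ConvergedState {
            state: best_state,
            iterations,
            final_residual: best_residual,
            status: ConvergenceStatus::TimedOutWithBestState,
        }),
    }
}
```

Copying `previous_state` allocates, but only once on the exit path after the loop, so it does not conflict with the no-allocation rule for the iteration loop.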

Architecture Compliance

NewType pattern: Use Pressure, Temperature from core where applicable
No bare f64 in public API where physical meaning exists
tracing: Use tracing::info! for timeout events, tracing::debug! for best-state updates
Result<T, E>: On timeout with return_best_state_on_timeout: true, return Ok(ConvergedState)
approx: Use assert_relative_eq! in tests for floating-point comparisons
Pre-allocation: Best-state buffer allocated once before iteration loop

Library/Framework Requirements

thiserror — Error enum derive (already in solver)
tracing — Structured logging (already in solver)
std::time::Instant — Timeout enforcement

File Structure Requirements

Modified files:

crates/solver/src/solver.rs — Add TimeoutConfig, modify NewtonConfig, PicardConfig, FallbackSolver

Tests:

Unit tests in solver.rs (timeout behavior, best-state tracking, ZOH fallback)
Integration tests in tests/ directory (full system solving with timeout)

Testing Requirements

Unit Tests:

TimeoutConfig defaults are sensible
Best state is tracked correctly during iteration
Timeout returns ConvergedState with TimedOutWithBestState
ZOH fallback returns previous state when configured
return_best_state_on_timeout: false returns error on timeout

Integration Tests:

System that times out returns best state (not error)
Best state has lower residual than initial state
Timeout across fallback switches preserves best state
HIL scenario: ZOH fallback returns previous state

Performance Tests:

No heap allocation during iteration with best-state tracking
Timeout check overhead is negligible (< 1μs per check)
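For the allocation test, one common approach is a counting wrapper around the system allocator installed with `#[global_allocator]`. The sketch below is test instrumentation only; `CountingAlloc`, `allocation_count` and `update_best` are hypothetical names. The counter is process-wide, so under `cargo test` the measured test should run in isolation (for example with `--test-threads=1`) to keep other threads' allocations out of the count.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts every allocation routed through the global allocator.
pub struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
    // Default `realloc` and `alloc_zeroed` call `alloc`, so they are counted too.
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

pub fn allocation_count() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}

/// Best-state update that only writes into pre-allocated buffers.
/// Lengths are equal by construction (both sized from the same system).
pub fn update_best(current: &[f64], residual: f64, best_state: &mut [f64], best_residual: &mut f64) -> bool {
    if residual < *best_residual {
        best_state.copy_from_slice(current);
        *best_residual = residual;
        true
    } else {
        false
    }
}
```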

Previous Story Intelligence (4.4)

FallbackSolver Implementation Complete:

FallbackConfig with fallback_enabled, return_to_newton_threshold, max_fallback_switches
FallbackSolver wrapping NewtonConfig and PicardConfig
Timeout applies to total solving time across switches
Pre-allocated buffers pattern established

Key Patterns to Follow:

Use residual_norm() helper for L2 norm calculation
Use tracing::debug! for iteration logging
Use tracing::info! for timeout events
Return ConvergedState::new() on success

Best-State Tracking Considerations:

Track best state in FallbackSolver across solver switches
Each underlying solver (Newton/Picard) tracks its own best state
FallbackSolver preserves best state when switching
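A hypothetical `BestStateTracker`, owned by `FallbackSolver` and fed by whichever solver is active, illustrates how the best state can survive a Newton/Picard switch with a single pre-allocated buffer. The names and the `LengthMismatch` error are assumptions; the length check returns an error instead of letting `copy_from_slice` panic, in line with the zero-panic policy.

```rust
/// Hypothetical error for a candidate whose length differs from the system size.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LengthMismatch {
    pub expected: usize,
    pub got: usize,
}

/// Hypothetical tracker shared across solver switches.
pub struct BestStateTracker {
    state: Vec<f64>,
    residual: f64,
}

impl BestStateTracker {
    /// Allocate the buffer once, sized from the finalized system.
    pub fn new(n: usize) -> Self {
        Self { state: vec![0.0; n], residual: f64::INFINITY }
    }

    /// Offer an iterate from the active solver; Ok(true) if it became the new best.
    pub fn offer(&mut self, candidate: &[f64], residual: f64) -> Result<bool, LengthMismatch> {
        if candidate.len() != self.state.len() {
            return Err(LengthMismatch { expected: self.state.len(), got: candidate.len() });
        }
        // NaN never compares as lower, so a NaN iterate is never kept.
        if residual < self.residual {
            self.state.copy_from_slice(candidate);
            self.residual = residual;
            Ok(true)
        } else {
            Ok(false)
        }
    }

    /// Best state and its residual norm so far.
    pub fn best(&self) -> (&[f64], f64) {
        (self.state.as_slice(), self.residual)
    }
}
```

In this design the underlying solvers can keep their own local best, and `FallbackSolver` simply offers each phase's iterates (or each phase's best) to the shared tracker.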

Git Intelligence

Recent commits show:

be70a7a — feat(core): implement physical types with NewType pattern
Epic 1-3 complete (components, fluids, topology)
Story 4.1-4.4 complete (Solver trait, Newton, Picard, Fallback)
Ready for Time-Budgeted Solving implementation

Project Context Reference

FR17: [Source: epics.md — Solver respects configurable time budget (timeout)]
FR18: [Source: epics.md — On timeout, solver returns best known state with NonConverged status]
FR20: [Source: epics.md — Convergence criterion checks Delta Pressure < 1 Pa (1e-5 bar)]
NFR1: [Source: prd.md — Steady State convergence time < 1 second for standard cycle in Cold Start]
NFR4: [Source: prd.md — No dynamic allocation in solver loop (pre-calculated allocation only)]
NFR6: [Source: prd.md — HIL latency < 20 ms for real-time integration with PLC]
NFR10: [Source: prd.md — Graceful error handling: timeout, non-convergence, saturation return explicit Result<T, Error>]
Solver Architecture: [Source: architecture.md — Trait-based static polymorphism with enum dispatch]
Error Handling: [Source: architecture.md — Centralized error enum with thiserror]

Story Completion Status

Status: ready-for-dev
Completion note: Ultimate context engine analysis completed — comprehensive developer guide created
