# Story 4.5: Time-Budgeted Solving

Status: done

<!-- Note: Validation is optional. Run validate-create-story for quality check before dev-story. -->

## Story

As a HIL engineer (Sarah),
I want a strict timeout with graceful degradation,
so that real-time constraints are never violated.

## Acceptance Criteria

1. **Strict Timeout Enforcement** (AC: #1)
   - Given a solver with timeout = 1000 ms
   - When the time budget is exceeded
   - Then the solver stops before starting another iteration (no new iteration starts past the timeout)
   - And the timeout is checked at each iteration start

2. **Best State Return on Timeout** (AC: #2)
   - Given a solver that times out
   - When returning from timeout
   - Then it returns `ConvergedState` with `status = TimedOutWithBestState`
   - And `state` contains the best-known state (lowest residual norm encountered)
   - And `iterations` contains the count of completed iterations
   - And `final_residual` contains the best residual norm

3. **HIL Zero-Order Hold (ZOH) Support** (AC: #3)
   - Given a HIL scenario with a previous state available
   - When a timeout occurs
   - Then the solver can optionally return the previous state instead of the current best
   - And the `zoh_fallback: bool` config option controls this behavior

4. **Timeout Across Fallback Switches** (AC: #4)
   - Given a `FallbackSolver` with a timeout configured
   - When fallback occurs between Newton and Picard
   - Then the timeout applies to total solving time (already implemented in Story 4.4)
   - And the best state is preserved across solver switches

5. **Pre-Allocated Buffers** (AC: #5)
   - Given a finalized `System`
   - When the solver initializes
   - Then all buffers for tracking best state are pre-allocated
   - And no heap allocation occurs during the iteration loop

6. **Configurable Timeout Behavior** (AC: #6)
   - Given a `TimeoutConfig` struct
   - When setting `return_best_state_on_timeout: false`
   - Then the solver returns `SolverError::Timeout` instead of `ConvergedState`
   - And `zoh_fallback` and `return_best_state_on_timeout` are configurable

## Tasks / Subtasks

- [ ] Implement `TimeoutConfig` struct in `crates/solver/src/solver.rs` (AC: #6)
  - [ ] Add `return_best_state_on_timeout: bool` (default: true)
  - [ ] Add `zoh_fallback: bool` (default: false)
  - [ ] Implement `Default` trait

- [ ] Add best-state tracking to `NewtonConfig` (AC: #1, #2, #5)
  - [ ] Add `best_state: Vec<f64>` pre-allocated buffer
  - [ ] Add `best_residual: f64` tracking variable
  - [ ] Update best state when residual improves
  - [ ] Return `ConvergedState` with `TimedOutWithBestState` on timeout

- [ ] Add best-state tracking to `PicardConfig` (AC: #1, #2, #5)
  - [ ] Add `best_state: Vec<f64>` pre-allocated buffer
  - [ ] Add `best_residual: f64` tracking variable
  - [ ] Update best state when residual improves
  - [ ] Return `ConvergedState` with `TimedOutWithBestState` on timeout

- [ ] Update `FallbackSolver` for best-state preservation (AC: #4)
  - [ ] Track best state across solver switches
  - [ ] Return best state on timeout regardless of which solver was active

- [ ] Implement ZOH fallback support (AC: #3)
  - [ ] Add `previous_state: Option<Vec<f64>>` to solver configs
  - [ ] On timeout with `zoh_fallback: true`, return previous state if available

- [ ] Integration tests (AC: #1-#6)
  - [ ] Test timeout returns best state (not error)
  - [ ] Test best state is actually the lowest residual encountered
  - [ ] Test ZOH fallback returns previous state
  - [ ] Test timeout behavior with `return_best_state_on_timeout: false`
  - [ ] Test timeout across fallback switches preserves best state
  - [ ] Test no heap allocation during iteration with best-state tracking

## Dev Notes

### Epic Context

**Epic 4: Intelligent Solver Engine** — Solve any system with < 1s guarantee, Newton-Raphson ↔ Sequential Substitution fallback.

**Story Dependencies:**
- **Story 4.1 (Solver Trait Abstraction)** — DONE: `Solver` trait, `SolverError`, `ConvergedState` defined
- **Story 4.2 (Newton-Raphson Implementation)** — DONE: Full Newton-Raphson with line search, timeout, divergence detection
- **Story 4.3 (Sequential Substitution)** — DONE: Picard implementation with relaxation, timeout, divergence detection
- **Story 4.4 (Intelligent Fallback Strategy)** — DONE: FallbackSolver with timeout across switches
- **Story 4.6 (Smart Initialization Heuristic)** — NEXT: Automatic initial guesses from temperatures

**FRs covered:** FR17 (configurable timeout), FR18 (best state on timeout), FR20 (convergence criterion)

### Architecture Context

**Technical Stack:**
- `thiserror` for error handling (already in solver)
- `tracing` for observability (already in solver)
- `std::time::Instant` for timeout enforcement

**Code Structure:**
- `crates/solver/src/solver.rs` — NewtonConfig, PicardConfig, FallbackSolver modifications
- `crates/solver/src/system.rs` — EXISTING: `System` with `compute_residuals()`

**Relevant Architecture Decisions:**
- **No allocation in hot path:** Pre-allocate best-state buffers before iteration loop [Source: architecture.md]
- **Error Handling:** Centralized error enum with `thiserror` [Source: architecture.md]
- **Zero-panic policy:** All operations return `Result` [Source: architecture.md]
- **HIL latency < 20ms:** Real-time constraints must be respected [Source: prd.md NFR6]

### Developer Context

**Existing Implementation (Story 4.1 + 4.2 + 4.3 + 4.4):**

```rust
// crates/solver/src/solver.rs - EXISTING

pub enum ConvergenceStatus {
    Converged,
    TimedOutWithBestState, // Already defined for this story!
}

pub struct ConvergedState {
    pub state: Vec<f64>,
    pub iterations: usize,
    pub final_residual: f64,
    pub status: ConvergenceStatus,
}

// Current timeout behavior (Story 4.2/4.3):
// Returns Err(SolverError::Timeout { timeout_ms }) on timeout
// This story changes it to return Ok(ConvergedState { status: TimedOutWithBestState })
```

**Current Timeout Implementation:**
```rust
// In NewtonConfig::solve() and PicardConfig::solve()
if let Some(timeout) = self.timeout {
    if start_time.elapsed() > timeout {
        tracing::info!(...);
        return Err(SolverError::Timeout { timeout_ms: ... });
    }
}
```

**What Needs to Change:**
1. Track best state during iteration (pre-allocated buffer)
2. On timeout, return `Ok(ConvergedState { status: TimedOutWithBestState, ... })`
3. Make this behavior configurable via `TimeoutConfig`

### Technical Requirements

**Best-State Tracking Algorithm:**

```
Input: System, timeout
Output: ConvergedState (Converged or TimedOutWithBestState)

1. Initialize:
   - best_state = pre-allocated buffer (copy of initial state)
   - best_residual = initial residual norm
   - start_time = Instant::now()

2. Each iteration:
   a. Check timeout BEFORE starting iteration
   b. Compute residuals and update state
   c. If new residual < best_residual:
      - Copy current state to best_state
      - Update best_residual = new residual
   d. Check convergence

3. On timeout:
   - If return_best_state_on_timeout:
     - Return Ok(ConvergedState {
         state: best_state,
         iterations: completed_iterations,
         final_residual: best_residual,
         status: TimedOutWithBestState,
       })
   - Else:
     - Return Err(SolverError::Timeout { timeout_ms })
```
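
The steps above can be sketched in Rust as follows. This is a minimal illustration under stated assumptions, not the solver's actual code: `solve_with_budget` and its `step` closure (which advances the state in place and returns the new residual norm) are hypothetical stand-ins for the Newton or Picard update, the local `ConvergedState`, `ConvergenceStatus`, and `SolverError` types only mirror the shapes shown earlier, and `timeout_ms` is assumed to be a `u64`.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ConvergenceStatus {
    Converged,
    TimedOutWithBestState,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ConvergedState {
    pub state: Vec<f64>,
    pub iterations: usize,
    pub final_residual: f64,
    pub status: ConvergenceStatus,
}

#[derive(Debug, PartialEq)]
pub enum SolverError {
    Timeout { timeout_ms: u64 }, // field type is an assumption
}

/// Hypothetical driver: `step` updates `state` in place and returns the new residual norm.
pub fn solve_with_budget<F>(
    initial: &[f64],
    initial_residual: f64,
    tolerance: f64,
    timeout: Duration,
    return_best_state_on_timeout: bool,
    mut step: F,
) -> Result<ConvergedState, SolverError>
where
    F: FnMut(&mut [f64]) -> f64,
{
    // Step 1: every buffer is allocated once, before the loop.
    let mut state = initial.to_vec();
    let mut best_state = initial.to_vec();
    let mut best_residual = initial_residual;
    let mut iterations = 0usize;
    let start_time = Instant::now();

    loop {
        // Step 2a: check the budget before starting an iteration.
        if start_time.elapsed() > timeout {
            // Step 3: best state or error, depending on configuration.
            return if return_best_state_on_timeout {
                Ok(ConvergedState {
                    state: best_state,
                    iterations,
                    final_residual: best_residual,
                    status: ConvergenceStatus::TimedOutWithBestState,
                })
            } else {
                Err(SolverError::Timeout { timeout_ms: timeout.as_millis() as u64 })
            };
        }

        // Step 2b: one solver update.
        let residual = step(&mut state);
        iterations += 1;

        // Step 2c: copy into the existing buffer; no allocation here.
        if residual < best_residual {
            best_state.copy_from_slice(&state);
            best_residual = residual;
        }

        // Step 2d: convergence check on the latest residual.
        if residual < tolerance {
            return Ok(ConvergedState {
                state,
                iterations,
                final_residual: residual,
                status: ConvergenceStatus::Converged,
            });
        }
    }
}
```

Because `step` receives a `&mut [f64]`, it cannot resize the state, so `copy_from_slice` always sees equal lengths and does not panic. If a step sequence yields residuals 0.5 and then 3.0 before the budget runs out, the function returns the first iterate rather than the latest one.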

**Key Design Decisions:**

| Decision | Rationale |
|----------|-----------|
| Check timeout at iteration start | No new iteration starts once the budget is spent |
| Pre-allocate best_state buffer | No heap allocation in hot path (NFR4) |
| Track best residual, not latest | Best state is more useful for HIL |
| Configurable return behavior | Some users prefer an error on timeout |
| ZOH fallback optional | HIL-specific feature, not always needed |

**TimeoutConfig Structure:**

```rust
pub struct TimeoutConfig {
    /// Return best-known state on timeout instead of error.
    /// Default: true (graceful degradation for HIL)
    pub return_best_state_on_timeout: bool,

    /// On timeout, return previous state (ZOH) instead of current best.
    /// Requires `previous_state` to be set before solving.
    /// Default: false
    pub zoh_fallback: bool,
}
```
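
One way the `Default` task could look is sketched below. `#[derive(Default)]` would set both flags to `false`, so a hand-written impl is needed to get the documented default of `true` for `return_best_state_on_timeout`. The derive list on the struct is an assumption added for the example.

```rust
/// Mirrors the struct above; the derives are an assumption for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TimeoutConfig {
    pub return_best_state_on_timeout: bool,
    pub zoh_fallback: bool,
}

impl Default for TimeoutConfig {
    /// Documented defaults: graceful degradation on, ZOH off.
    fn default() -> Self {
        Self {
            return_best_state_on_timeout: true,
            zoh_fallback: false,
        }
    }
}
```

A HIL caller that wants ZOH could then write `TimeoutConfig { zoh_fallback: true, ..TimeoutConfig::default() }` and keep the remaining defaults.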

**Integration with Existing Configs:**

```rust
pub struct NewtonConfig {
    // ... existing fields ...
    pub timeout: Option<Duration>,

    // NEW: Timeout behavior configuration
    pub timeout_config: TimeoutConfig,

    // NEW: Pre-allocated buffer for best state tracking
    // (allocated once in solve(), not stored in config)
}

pub struct PicardConfig {
    // ... existing fields ...
    pub timeout: Option<Duration>,

    // NEW: Timeout behavior configuration
    pub timeout_config: TimeoutConfig,
}
```

**ZOH (Zero-Order Hold) for HIL:**

```rust
impl NewtonConfig {
    /// Set previous state for ZOH fallback on timeout.
    /// Relies on the `previous_state: Option<Vec<f64>>` field added by the ZOH task.
    pub fn with_previous_state(mut self, state: Vec<f64>) -> Self {
        self.previous_state = Some(state);
        self
    }

    // In solve():
    // On timeout with zoh_fallback=true and previous_state available:
    // Return previous_state instead of best_state
}
```
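
The selection rule described in the comments above can be isolated in a small function. This is a sketch only: `state_on_timeout` is a hypothetical helper name, and the local `TimeoutConfig` just mirrors the struct from this story. Which `iterations` and `final_residual` values should accompany a held previous state is not specified here, so the sketch selects the state vector and nothing else.

```rust
/// Mirrors the `TimeoutConfig` from this story.
pub struct TimeoutConfig {
    pub return_best_state_on_timeout: bool,
    pub zoh_fallback: bool,
}

/// Hypothetical helper: choose the state to hand back on timeout.
/// ZOH applies only when it is enabled AND a previous state was supplied;
/// otherwise the best state found during this solve is used.
pub fn state_on_timeout<'a>(
    cfg: &TimeoutConfig,
    best_state: &'a [f64],
    previous_state: Option<&'a [f64]>,
) -> &'a [f64] {
    match (cfg.zoh_fallback, previous_state) {
        (true, Some(previous)) => previous,
        _ => best_state,
    }
}
```

Falling back to the best state when no previous state exists matches the "return previous state if available" wording of the ZOH task.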

### Architecture Compliance

- **NewType pattern:** Use `Pressure`, `Temperature` from core where applicable
- **No bare f64** in public API where physical meaning exists
- **tracing:** Use `tracing::info!` for timeout events, `tracing::debug!` for best-state updates
- **Result<T, E>:** On timeout with `return_best_state_on_timeout: true`, return `Ok(ConvergedState)`
- **approx:** Use `assert_relative_eq!` in tests for floating-point comparisons
- **Pre-allocation:** Best-state buffer allocated once before iteration loop

### Library/Framework Requirements

- **thiserror** — Error enum derive (already in solver)
- **tracing** — Structured logging (already in solver)
- **std::time::Instant** — Timeout enforcement

### File Structure Requirements

**Modified files:**
- `crates/solver/src/solver.rs` — Add `TimeoutConfig`, modify `NewtonConfig`, `PicardConfig`, `FallbackSolver`

**Tests:**
- Unit tests in `solver.rs` (timeout behavior, best-state tracking, ZOH fallback)
- Integration tests in `tests/` directory (full system solving with timeout)

### Testing Requirements

**Unit Tests:**
- TimeoutConfig defaults are sensible
- Best state is tracked correctly during iteration
- Timeout returns `ConvergedState` with `TimedOutWithBestState`
- ZOH fallback returns previous state when configured
- `return_best_state_on_timeout: false` returns error on timeout

**Integration Tests:**
- System that times out returns best state (not error)
- Best state has lower residual than initial state
- Timeout across fallback switches preserves best state
- HIL scenario: ZOH fallback returns previous state

**Performance Tests:**
- No heap allocation during iteration with best-state tracking
- Timeout check overhead is negligible (< 1 μs per check)
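
One possible way to test the no-allocation requirement is a counting global allocator, sketched below using only the standard library. `CountingAllocator` and `allocations_during` are hypothetical names, the per-thread counter assumes a platform with native thread-local storage (so that reading the counter does not itself allocate), and the project may already use a different technique.

```rust
use std::alloc::{GlobalAlloc, Layout, System as SystemAllocator};
use std::cell::Cell;

thread_local! {
    // Per-thread counter so other threads cannot disturb a measurement.
    static ALLOCATIONS: Cell<usize> = const { Cell::new(0) };
}

/// Forwards to the system allocator and counts every allocation request.
pub struct CountingAllocator;

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.with(|count| count.set(count.get() + 1));
        unsafe { SystemAllocator.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { SystemAllocator.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

/// Runs `work` and returns how many allocations the current thread made meanwhile.
pub fn allocations_during<R>(work: impl FnOnce() -> R) -> (usize, R) {
    let before = ALLOCATIONS.with(|count| count.get());
    let result = work();
    let after = ALLOCATIONS.with(|count| count.get());
    (after - before, result)
}
```

A test would wrap only the iteration loop in `allocations_during` and assert that the count is zero, keeping buffer setup outside the measured closure. Updating a pre-allocated buffer with `copy_from_slice` should register no allocations, whereas building a new `Vec` inside the closure should register at least one.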

### Previous Story Intelligence (4.4)

**FallbackSolver Implementation Complete:**
- `FallbackConfig` with `fallback_enabled`, `return_to_newton_threshold`, `max_fallback_switches`
- `FallbackSolver` wrapping `NewtonConfig` and `PicardConfig`
- Timeout applies to total solving time across switches
- Pre-allocated buffers pattern established

**Key Patterns to Follow:**
- Use `residual_norm()` helper for L2 norm calculation
- Use `tracing::debug!` for iteration logging
- Use `tracing::info!` for timeout events
- Return `ConvergedState::new()` on success

**Best-State Tracking Considerations:**
- Track best state in FallbackSolver across solver switches
- Each underlying solver (Newton/Picard) tracks its own best state
- FallbackSolver preserves best state when switching
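
A shared tracker is one way to keep the best state alive across solver switches. The sketch below is illustrative only: `BestStateTracker`, `offer`, and `DimensionMismatch` are hypothetical names, not part of the existing `FallbackSolver`. The buffer is allocated once in `new`, and a length check replaces the panic that `copy_from_slice` would raise on mismatched sizes, in keeping with the zero-panic policy.

```rust
/// Hypothetical error for a candidate whose length differs from the buffer.
#[derive(Debug, PartialEq)]
pub struct DimensionMismatch {
    pub expected: usize,
    pub actual: usize,
}

/// Hypothetical tracker owned by the fallback wrapper and fed by whichever solver is active.
pub struct BestStateTracker {
    state: Vec<f64>,
    residual: f64,
}

impl BestStateTracker {
    /// Allocates the buffer once, before any iteration runs.
    pub fn new(initial_state: &[f64], initial_residual: f64) -> Self {
        Self {
            state: initial_state.to_vec(),
            residual: initial_residual,
        }
    }

    /// Records `candidate` if its residual is strictly lower than the best so far.
    /// Returns Ok(true) when the best state was replaced. A NaN residual never wins
    /// because the comparison is false.
    pub fn offer(&mut self, candidate: &[f64], residual: f64) -> Result<bool, DimensionMismatch> {
        if candidate.len() != self.state.len() {
            return Err(DimensionMismatch {
                expected: self.state.len(),
                actual: candidate.len(),
            });
        }
        if residual < self.residual {
            self.state.copy_from_slice(candidate); // reuses the existing buffer
            self.residual = residual;
            Ok(true)
        } else {
            Ok(false)
        }
    }

    pub fn state(&self) -> &[f64] {
        &self.state
    }

    pub fn residual(&self) -> f64 {
        self.residual
    }
}
```

With this shape, a Newton phase that reached a residual of 0.3 before diverging keeps that state on record, and a later Picard phase replaces it only when it does better.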

### Git Intelligence

Recent commits show:
- `be70a7a` — feat(core): implement physical types with NewType pattern
- Epic 1-3 complete (components, fluids, topology)
- Story 4.1-4.4 complete (Solver trait, Newton, Picard, Fallback)
- Ready for Time-Budgeted Solving implementation

### Project Context Reference

- **FR17:** [Source: epics.md — Solver respects configurable time budget (timeout)]
- **FR18:** [Source: epics.md — On timeout, solver returns best known state with NonConverged status]
- **FR20:** [Source: epics.md — Convergence criterion checks Delta Pressure < 1 Pa (1e-5 bar)]
- **NFR1:** [Source: prd.md — Steady State convergence time < 1 second for standard cycle in Cold Start]
- **NFR4:** [Source: prd.md — No dynamic allocation in solver loop (pre-calculated allocation only)]
- **NFR6:** [Source: prd.md — HIL latency < 20 ms for real-time integration with PLC]
- **NFR10:** [Source: prd.md — Graceful error handling: timeout, non-convergence, saturation return explicit Result<T, Error>]
- **Solver Architecture:** [Source: architecture.md — Trait-based static polymorphism with enum dispatch]
- **Error Handling:** [Source: architecture.md — Centralized error enum with thiserror]

### Story Completion Status

- **Status:** ready-for-dev
- **Completion note:** Ultimate context engine analysis completed — comprehensive developer guide created