# Story 4.4: Intelligent Fallback Strategy

Status: done

<!-- Note: Validation is optional. Run validate-create-story for quality check before dev-story. -->

## Story

As a simulation user,
I want automatic fallback with smart return conditions,
so that convergence is guaranteed without solver oscillation.

## Acceptance Criteria

1. **Auto-Switch on Newton Divergence** (AC: #1)
   - Given Newton-Raphson is diverging
   - When divergence is detected (3 consecutive residual increases)
   - Then auto-switch to Sequential Substitution (Picard)
   - And the switch is logged with `tracing::warn!`

2. **Return to Newton Only When Stable** (AC: #2)
   - Given Picard iteration is converging
   - When the residual norm falls below `return_to_newton_threshold`
   - Then attempt to return to Newton-Raphson
   - And if Newton diverges again, stay on Picard permanently

3. **Oscillation Prevention** (AC: #3)
   - Given multiple solver switches
   - When the switch count exceeds `max_fallback_switches` (default: 2)
   - Then stay on the current solver (Picard) permanently
   - And log the decision with `tracing::info!`

4. **Configurable Fallback Behavior** (AC: #4)
   - Given a `FallbackConfig` struct
   - When setting `fallback_enabled: false`
   - Then no fallback occurs (pure Newton or Picard)
   - And `return_to_newton_threshold` and `max_fallback_switches` are configurable

5. **Timeout Enforcement Across Switches** (AC: #5)
   - Given a solver with a timeout configured
   - When fallback occurs
   - Then the timeout applies to the total solving time
   - And each solver inherits the remaining time budget

6. **Pre-Allocated Buffers** (AC: #6)
   - Given a finalized `System`
   - When the fallback solver initializes
   - Then all buffers are pre-allocated once
   - And no heap allocation occurs during solver switches

## Tasks / Subtasks

- [x] Implement `FallbackConfig` struct in `crates/solver/src/solver.rs` (AC: #4)
  - [x] Add `fallback_enabled: bool` (default: true)
  - [x] Add `return_to_newton_threshold: f64` (default: 1e-3)
  - [x] Add `max_fallback_switches: usize` (default: 2)
  - [x] Implement `Default` trait

- [x] Implement `solve_with_fallback()` function (AC: #1, #2, #3, #5, #6)
  - [x] Create `FallbackSolver` struct wrapping `NewtonConfig` and `PicardConfig`
  - [x] Implement main fallback logic with state tracking
  - [x] Track `switch_count` and `current_solver` enum
  - [x] Implement Newton → Picard switch on divergence
  - [x] Implement Picard → Newton return when below threshold
  - [x] Implement oscillation prevention (max switches)
  - [x] Handle timeout across solver switches (remaining time)
  - [x] Add `tracing::warn!` for switches, `tracing::info!` for decisions

- [x] Implement `Solver` trait for `FallbackSolver` (AC: #1-#6)
  - [x] Delegate to `solve_with_fallback()` in `solve()` method
  - [x] Implement `with_timeout()` builder pattern

- [x] Integration tests (AC: #1, #2, #3, #4, #5, #6)
  - [x] Test Newton diverges → Picard converges
  - [x] Test Newton diverges → Picard stabilizes → Newton returns
  - [x] Test oscillation prevention (max switches reached)
  - [x] Test fallback disabled (pure Newton behavior)
  - [x] Test timeout applies across switches
  - [x] Test no heap allocation during switches

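The `FallbackConfig` subtasks above can be sketched as follows. This is a minimal sketch that assumes only the three fields and defaults listed in the tasks; the derives and doc comments are illustrative additions, and the actual struct in `crates/solver/src/solver.rs` may differ.

```rust
/// Sketch of the fallback configuration (fields and defaults taken from the task list).
#[derive(Debug, Clone, PartialEq)]
pub struct FallbackConfig {
    /// When false, the solver never switches strategy.
    pub fallback_enabled: bool,
    /// Residual norm below which Picard may hand control back to Newton.
    pub return_to_newton_threshold: f64,
    /// Upper bound on solver switches, used to prevent oscillation.
    pub max_fallback_switches: usize,
}

impl Default for FallbackConfig {
    fn default() -> Self {
        Self {
            fallback_enabled: true,
            return_to_newton_threshold: 1e-3,
            max_fallback_switches: 2,
        }
    }
}
```

Struct update syntax keeps overrides short, for example `FallbackConfig { fallback_enabled: false, ..FallbackConfig::default() }` to disable fallback while keeping the other defaults.
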
## Dev Notes

### Epic Context

**Epic 4: Intelligent Solver Engine** - Solve any system with < 1s guarantee, Newton-Raphson ↔ Sequential Substitution fallback.

**Story Dependencies:**
- **Story 4.1 (Solver Trait Abstraction)** - DONE: `Solver` trait, `SolverError`, `ConvergedState` defined
- **Story 4.2 (Newton-Raphson Implementation)** - DONE: Full Newton-Raphson with line search, timeout, divergence detection
- **Story 4.3 (Sequential Substitution)** - DONE: Picard implementation with relaxation, timeout, divergence detection
- **Story 4.5 (Time-Budgeted Solving)** - NEXT: Extends timeout handling with best-state return
- **Story 4.8 (Jacobian Freezing)** - Newton-specific optimization, not applicable to fallback

**FRs covered:** FR16 (Auto-fallback solver switching), FR17 (timeout), FR18 (best state on timeout), FR20 (convergence criterion)

### Architecture Context

**Technical Stack:**
- `thiserror` for error handling (already in solver)
- `tracing` for observability (already in solver)
- `std::time::Instant` for timeout enforcement across switches

**Code Structure:**
- `crates/solver/src/solver.rs`: FallbackSolver implementation
- `crates/solver/src/system.rs`: EXISTING, `System` with `compute_residuals()`

**Relevant Architecture Decisions:**
- **Solver Architecture:** Trait-based static polymorphism with enum dispatch [Source: architecture.md]
- **No allocation in hot path:** Pre-allocate all buffers before iteration loop [Source: architecture.md]
- **Error Handling:** Centralized error enum with `thiserror` [Source: architecture.md]
- **Zero-panic policy:** All operations return `Result` [Source: architecture.md]

### Developer Context

**Existing Implementation (Story 4.1 + 4.2 + 4.3):**

```rust
// crates/solver/src/solver.rs
use std::time::Duration;

pub struct NewtonConfig {
    pub max_iterations: usize,       // default: 100
    pub tolerance: f64,              // default: 1e-6
    pub line_search: bool,           // default: false
    pub timeout: Option<Duration>,   // default: None
    pub divergence_threshold: f64,   // default: 1e10
    // ... other fields
}

pub struct PicardConfig {
    pub max_iterations: usize,       // default: 100
    pub tolerance: f64,              // default: 1e-6
    pub relaxation_factor: f64,      // default: 0.5
    pub timeout: Option<Duration>,   // default: None
    pub divergence_threshold: f64,   // default: 1e10
    pub divergence_patience: usize,  // default: 5
}

pub enum SolverStrategy {
    NewtonRaphson(NewtonConfig),
    SequentialSubstitution(PicardConfig),
}
```

**Divergence Detection Already Implemented:**
- Newton: 3 consecutive residual increases → `SolverError::Divergence`
- Picard: 5 consecutive residual increases → `SolverError::Divergence`

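The consecutive-increase rule above can be illustrated with a small tracker. This is a hedged sketch, not the crate's `check_divergence()`: the names `DivergenceTracker` and `observe` are hypothetical, and treating a non-finite norm or a norm above `divergence_threshold` as immediate divergence is an assumption about how that field is used.

```rust
/// Hypothetical helper that counts consecutive residual increases.
pub struct DivergenceTracker {
    patience: usize,       // 3 for Newton, 5 for Picard per the notes above
    threshold: f64,        // assumed role of `divergence_threshold`
    previous: Option<f64>, // last observed residual norm
    increases: usize,      // current run of consecutive increases
}

impl DivergenceTracker {
    pub fn new(patience: usize, threshold: f64) -> Self {
        Self { patience, threshold, previous: None, increases: 0 }
    }

    /// Records a residual norm and returns true once divergence is detected.
    pub fn observe(&mut self, norm: f64) -> bool {
        // Assumption: blow-ups and NaN/inf count as divergence right away.
        if !norm.is_finite() || norm > self.threshold {
            return true;
        }
        match self.previous {
            Some(prev) if norm > prev => self.increases += 1,
            // Any non-increase (or the first sample) resets the run.
            _ => self.increases = 0,
        }
        self.previous = Some(norm);
        self.increases >= self.patience
    }
}
```

With a patience of 3, the sequence 1.0, 2.0, 3.0, 4.0 reports divergence on the fourth sample, while a single decrease in between restarts the count.
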
### Technical Requirements

**Intelligent Fallback Algorithm:**

```
Input:  System, FallbackConfig, timeout
Output: ConvergedState or SolverError

1. Initialize:
   - start_time = Instant::now()
   - switch_count = 0
   - current_solver = NewtonRaphson
   - remaining_time = timeout

2. Main fallback loop:
   a. Run current solver with remaining_time (timeout minus elapsed time)
   b. If converged → return ConvergedState
   c. If timeout → return Timeout error

   d. If Divergence and current_solver == NewtonRaphson:
      - Switch to Picard
      - switch_count += 1
      - Log "Newton diverged, switching to Picard (switch #{switch_count})"
      - If this was a Newton return attempt or switch_count >= max_fallback_switches:
        - Log "Max switches reached, staying on Picard"
        - Disallow further returns to Newton
      - Continue loop

   e. If Picard converging and residual < return_to_newton_threshold:
      - If returns to Newton are still allowed and switch_count < max_fallback_switches:
        - Switch to Newton
        - switch_count += 1
        - Log "Picard stabilized, attempting Newton return"
        - Continue loop
      - Else:
        - Stay on Picard until convergence or failure

   f. If Divergence and current_solver == Picard:
      - Return Divergence error (no more fallbacks)

3. Return result
```

**Key Design Decisions:**

| Decision | Rationale |
|----------|-----------|
| Start with Newton | Quadratic convergence when it works |
| Max 2 switches | Prevent infinite oscillation |
| Return threshold 1e-3 | Newton works well near solution |
| Track remaining time | Timeout applies to total solve |
| Stay on Picard after max switches | Picard is more robust |

**State Tracking:**

```rust
enum CurrentSolver {
    Newton,
    Picard,
}

struct FallbackState {
    current_solver: CurrentSolver,
    switch_count: usize,
    newton_attempts: usize,
    picard_attempts: usize,
}
```

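The switching rules can be expressed as a small pure state machine, which makes oscillation prevention easy to test in isolation. The sketch below is one possible reading of AC #1 to #3, not the shipped implementation: `SwitchTracker`, `Event`, `Action`, `step`, and the `picard_locked` flag are hypothetical names, and the attempt counters from `FallbackState` are omitted for brevity.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CurrentSolver { Newton, Picard }

/// What the running solver reported (hypothetical event type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Diverged,
    /// Picard's residual dropped below `return_to_newton_threshold`.
    Stabilized,
}

/// What the fallback loop should do next (hypothetical action type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action { SwitchTo(CurrentSolver), Stay, Fail }

#[derive(Debug)]
pub struct SwitchTracker {
    pub current_solver: CurrentSolver,
    pub switch_count: usize,
    /// Assumption: set once Newton re-diverges or the switch budget is spent.
    pub picard_locked: bool,
}

impl SwitchTracker {
    pub fn new() -> Self {
        Self { current_solver: CurrentSolver::Newton, switch_count: 0, picard_locked: false }
    }

    pub fn step(&mut self, event: Event, max_switches: usize) -> Action {
        match (self.current_solver, event) {
            (CurrentSolver::Newton, Event::Diverged) => {
                // Any earlier switch means this Newton run was a return attempt.
                let was_return_attempt = self.switch_count > 0;
                self.current_solver = CurrentSolver::Picard;
                self.switch_count += 1;
                self.picard_locked = was_return_attempt || self.switch_count >= max_switches;
                Action::SwitchTo(CurrentSolver::Picard)
            }
            (CurrentSolver::Picard, Event::Stabilized)
                if !self.picard_locked && self.switch_count < max_switches =>
            {
                self.current_solver = CurrentSolver::Newton;
                self.switch_count += 1;
                Action::SwitchTo(CurrentSolver::Newton)
            }
            // Picard is the last resort: its divergence ends the solve.
            (CurrentSolver::Picard, Event::Diverged) => Action::Fail,
            _ => Action::Stay,
        }
    }
}
```

With `max_switches = 2`, the sequence Newton diverges, Picard stabilizes, Newton diverges again ends with Picard locked in, so a later `Stabilized` event yields `Action::Stay` rather than a third Newton attempt.
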
**Timeout Handling Across Switches:**

```rust
use std::time::{Duration, Instant};

fn solve_with_timeout(
    &mut self,
    system: &mut System,
    timeout: Duration,
) -> Result<ConvergedState, SolverError> {
    let start_time = Instant::now();
    let timeout_ms = timeout.as_millis() as u64;

    loop {
        // Budget left for the whole solve, not per solver.
        let remaining = timeout.saturating_sub(start_time.elapsed());
        if remaining.is_zero() {
            return Err(SolverError::Timeout { timeout_ms });
        }

        // Run current solver with remaining time
        let solver_timeout = self.current_solver_timeout(remaining);
        match self.run_current_solver(system, solver_timeout) {
            Ok(state) => return Ok(state),
            // Report the total budget rather than the inner solver's share.
            Err(SolverError::Timeout { .. }) => {
                return Err(SolverError::Timeout { timeout_ms });
            }
            Err(err @ SolverError::Divergence { .. }) => {
                // `handle_divergence` returns false when no fallback is left.
                if !self.handle_divergence() {
                    return Err(err);
                }
            }
            Err(other) => return Err(other),
        }
    }
}
```

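The remaining-budget arithmetic can be isolated into a helper that is testable without running a solver. This is a sketch under assumptions: `remaining_after` and `Budget` are hypothetical names, and modelling "no timeout configured" as `Ok(None)` mirrors the `Option<Duration>` fields in `NewtonConfig` and `PicardConfig` rather than any confirmed API.

```rust
use std::time::{Duration, Instant};

/// Time left out of `total` after `elapsed`, or None if the budget is spent.
pub fn remaining_after(total: Duration, elapsed: Duration) -> Option<Duration> {
    // saturating_sub clamps at zero instead of panicking on underflow.
    let remaining = total.saturating_sub(elapsed);
    if remaining.is_zero() { None } else { Some(remaining) }
}

/// Hypothetical wall-clock budget shared by every solver invocation.
pub struct Budget {
    start: Instant,
    total: Option<Duration>,
}

impl Budget {
    pub fn new(total: Option<Duration>) -> Self {
        Self { start: Instant::now(), total }
    }

    /// Ok(None): no timeout configured. Ok(Some(d)): `d` is left.
    /// Err(ms): the total budget of `ms` milliseconds is exhausted.
    pub fn remaining(&self) -> Result<Option<Duration>, u64> {
        match self.total {
            None => Ok(None),
            Some(total) => remaining_after(total, self.start.elapsed())
                .map(Some)
                .ok_or(total.as_millis() as u64),
        }
    }
}
```

Because the clock starts once in `Budget::new`, a Picard run that follows a failed Newton run automatically sees only what Newton left over.
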
### Architecture Compliance

- **NewType pattern:** Use `Pressure`, `Temperature` from core where applicable
- **No bare f64** in public API where physical meaning exists
- **tracing:** Use `tracing::warn!` for switches, `tracing::info!` for decisions
- **Result<T, E>:** All fallible operations return `Result`
- **approx:** Use `assert_relative_eq!` in tests for floating-point comparisons
- **Pre-allocation:** All buffers allocated once before fallback loop

### Library/Framework Requirements

- **thiserror**: Error enum derive (already in solver)
- **tracing**: Structured logging (already in solver)
- **std::time::Instant**: Timeout enforcement across switches

### File Structure Requirements

**Modified files:**
- `crates/solver/src/solver.rs`: Add `FallbackConfig`, `FallbackSolver`, implement `Solver` trait

**Tests:**
- Unit tests in `solver.rs` (fallback logic, oscillation prevention, timeout)
- Integration tests in `tests/` directory (full system solving with fallback)

### Testing Requirements

**Unit Tests:**
- FallbackConfig defaults are sensible
- Newton diverges → Picard converges
- Oscillation prevention triggers at max switches
- Fallback disabled behaves as pure solver
- Timeout applies across switches

**Integration Tests:**
- Stiff system where Newton diverges but Picard converges
- System where Picard stabilizes and Newton returns
- System that oscillates and gets stuck on Picard
- Compare iteration counts: Newton-only vs Fallback

**Performance Tests:**
- No heap allocation during solver switches
- Convergence time < 1s for standard cycle (NFR1)

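One way to approach the "no heap allocation during solver switches" check is to confirm that shared buffers keep their address and capacity across a switch. The sketch below assumes a hypothetical `Workspace` type with illustrative buffer names; the real solvers may organize their pre-allocated storage differently, and a pointer check is a weaker guarantee than a counting allocator.

```rust
/// Hypothetical workspace allocated once for a finalized system.
pub struct Workspace {
    pub residuals: Vec<f64>,
    pub state_prev: Vec<f64>,
    pub delta: Vec<f64>,
}

impl Workspace {
    /// Allocates every buffer once for a system with `n` unknowns.
    pub fn new(n: usize) -> Self {
        Self {
            residuals: vec![0.0; n],
            state_prev: vec![0.0; n],
            delta: vec![0.0; n],
        }
    }

    /// Clears the buffers in place on a solver switch.
    /// `fill` overwrites existing elements, so length and capacity are unchanged.
    pub fn reset(&mut self) {
        self.residuals.fill(0.0);
        self.state_prev.fill(0.0);
        self.delta.fill(0.0);
    }
}
```

A test can record `as_ptr()` and `capacity()` before the switch and compare them afterwards; any reallocation would change at least one of the two.
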
### Previous Story Intelligence (4.3)

**Picard Implementation Complete:**
- `PicardConfig::solve()` fully implemented with all features
- Pre-allocated buffers pattern established
- Timeout enforcement via `std::time::Instant`
- Divergence detection (5 consecutive increases)
- Relaxation factor for stability
- 37 unit tests in solver.rs, 29 integration tests

**Key Patterns to Follow:**
- Use `residual_norm()` helper for L2 norm calculation
- Use `check_divergence()` pattern with patience parameter
- Use `tracing::debug!` for iteration logging
- Use `tracing::info!` for convergence events
- Return `ConvergedState::new()` on success

**Fallback-Specific Considerations:**
- Track state across solver invocations
- Preserve system state between switches
- Log all decisions for debugging
- Handle partial convergence gracefully

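For reference, the L2 norm used to compare against `return_to_newton_threshold` could look like the following. The signature is an assumption; the existing `residual_norm()` helper in the crate may take different arguments.

```rust
/// L2 (Euclidean) norm of a residual vector: sqrt(sum of r_i^2).
/// Assumed signature; the crate's own helper may differ.
pub fn residual_norm(residuals: &[f64]) -> f64 {
    residuals.iter().map(|r| r * r).sum::<f64>().sqrt()
}
```

For the residuals `[3.0, 4.0]` this gives 5.0, and an empty slice gives 0.0. The norm involves no allocation, so it is safe to call inside the iteration loop.
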
### Git Intelligence

Recent commits show:
- `be70a7a`: feat(core): implement physical types with NewType pattern
- Epic 1-3 complete (components, fluids, topology)
- Story 4.1 complete (Solver trait abstraction)
- Story 4.2 complete (Newton-Raphson implementation)
- Story 4.3 complete (Sequential Substitution implementation)
- Ready for Intelligent Fallback implementation

### Project Context Reference

- **FR16:** [Source: epics.md - Solver automatically switches to Sequential Substitution if Newton-Raphson diverges]
- **FR17:** [Source: epics.md - Solver respects configurable time budget (timeout)]
- **FR18:** [Source: epics.md - On timeout, solver returns best known state with NonConverged status]
- **FR20:** [Source: epics.md - Convergence criterion checks Delta Pressure < 1 Pa (1e-5 bar)]
- **NFR1:** [Source: prd.md - Steady State convergence time < 1 second for standard cycle in Cold Start]
- **NFR4:** [Source: prd.md - No dynamic allocation in solver loop (pre-calculated allocation only)]
- **Solver Architecture:** [Source: architecture.md - Trait-based static polymorphism with enum dispatch]
- **Error Handling:** [Source: architecture.md - Centralized error enum with thiserror]

### Story Completion Status

- **Status:** ready-for-dev
- **Completion note:** Ultimate context engine analysis completed; comprehensive developer guide created

## Change Log

- 2026-02-18: Story 4.4 created from create-story workflow. Ready for dev.
- 2026-02-18: Story 4.4 implementation complete. All tasks done, tests passing.
- 2026-02-18: Code review completed. Fixed HIGH issues: AC #2 Newton return logic, AC #3 max switches behavior, Newton re-divergence handling. Fixed MEDIUM issues: Config cloning optimization, improved oscillation prevention tests.

## Dev Agent Record

### Agent Model Used

Claude 3.5 Sonnet (claude-3-5-sonnet)

### Debug Log References

No blocking issues encountered during implementation.

### Completion Notes List

- ✅ Implemented `FallbackConfig` struct with all required fields and `Default` trait
- ✅ Implemented `FallbackSolver` struct wrapping `NewtonConfig` and `PicardConfig`
- ✅ Implemented intelligent fallback algorithm with state tracking
- ✅ Newton → Picard switch on divergence with `tracing::warn!` logging
- ✅ Picard → Newton return when residual below threshold with `tracing::info!` logging
- ✅ Oscillation prevention via `max_fallback_switches` configuration
- ✅ Timeout enforcement across solver switches (remaining time budget)
- ✅ Pre-allocated buffers in underlying solvers (no heap allocation during switches)
- ✅ Implemented `Solver` trait for `FallbackSolver` with `solve()` and `with_timeout()`
- ✅ Added 12 unit tests for FallbackConfig and FallbackSolver
- ✅ Added 16 integration tests covering all acceptance criteria
- ✅ All 109 unit tests + 16 integration tests + 13 doc tests pass

### File List

**Modified:**
- `crates/solver/src/solver.rs`: Added `FallbackConfig`, `FallbackSolver`, `CurrentSolver` enum, `FallbackState` struct, and `Solver` trait implementation

**Created:**
- `crates/solver/tests/fallback_solver.rs`: Integration tests for FallbackSolver

**Updated:**
- `_bmad-output/implementation-artifacts/sprint-status.yaml`: Updated story status to "in-progress" then "review"