
# Story 3.3: Smart-Routing Fallback

Status: review


## Story

As a system,
I want to automatically fall back to a secondary provider when the primary fails,
so that users experience zero downtime during external API outages.

**Epic:** Epic 3 — The SaaS Commercial Engine (Monetization & API Cost Protection)
**FR coverage:** FR18 (admin-configurable fallback rules), NFR-R1 (graceful degradation ≤1.5s).

---

## Acceptance Criteria

1. [AC1] **Retriable failures:** When a primary provider call fails with HTTP **429** or **5xx** (or AI SDK equivalent such as `APICallError` with those status codes), the system treats the failure as retriable and attempts exactly **one** secondary route for the same feature lane (`chat` | `tags` | `embedding`).
2. [AC2] **NFR-R1 (≤1.5s):** From the moment the primary failure is classified as retriable until the secondary provider accepts the request (first successful response chunk for streaming, or resolved promise for non-streaming), elapsed wall-clock time MUST be **≤ 1500ms**. Measure with `performance.now()` in tests; log `fallbackMs` in debug mode.
3. [AC3] **Per-lane secondary config:** Secondary provider/model are resolved from admin/env keys mirroring primary lane keys:
   - `AI_PROVIDER_CHAT_FALLBACK`, `AI_MODEL_CHAT_FALLBACK`
   - `AI_PROVIDER_TAGS_FALLBACK`, `AI_MODEL_TAGS_FALLBACK`
   - `AI_PROVIDER_EMBEDDING_FALLBACK`, `AI_MODEL_EMBEDDING_FALLBACK`
   If fallback provider is unset/empty, behavior is **primary-only** (no-op fallback — same as today).
4. [AC4] **Single choke-point:** Fallback orchestration lives in **`memento-note/lib/ai/fallback.ts`** (new). It is invoked either from **`getChatProvider` / `getTagsProvider` / `getEmbeddingsProvider`** via a thin wrapper, or from one shared `withAiProviderFallback(lane, config, run)` used by all hot paths. It is **not** copy-pasted into every API route. `resolveAiRoute()` in `router.ts` stays synchronous and primary-only; fallback is a **separate** resolution using the `*_FALLBACK` keys.
5. [AC5] **Chat streaming path:** `app/api/chat/route.ts` MUST use fallback-aware execution so a failed `streamText` start on primary retries once on secondary before returning 502 to the client. Do not buffer entire streams for retry.
6. [AC6] **Non-retriable errors:** **4xx** other than 429 (401, 403, 400), validation errors, and **quota errors** (`QuotaExceededError` / HTTP 402 from entitlements) MUST **not** trigger provider fallback.
7. [AC7] **Observability:** When fallback fires, emit structured debug log `{ lane, primaryProvider, secondaryProvider, primaryStatus, fallbackMs }` — gated by `NODE_ENV !== 'production'` or `MEMENTO_AI_ROUTE_DEBUG=1` (same pattern as Story 3.2). Never log API keys.
8. [AC8] **Regression:** Existing `resolveAiRoute` unit tests and factory delegation from Story 3.2 remain green; new tests cover fallback resolution and retriable error classification.
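AC1 and AC6 together define a small classification matrix. The following is a minimal, dependency-free sketch. It assumes errors expose an HTTP status either as `statusCode` (as the AI SDK's `APICallError` does) or as `status` (as a fetch `Response` does), possibly nested under `cause`. `QuotaExceededError` here is a local stand-in for the entitlements error, and the depth limit mirrors the bounded recursion added in review patch P3.

```typescript
// Local stand-in for the entitlements error (assumed shape; the real one lives upstream).
export class QuotaExceededError extends Error {
  readonly status = 402
}

const MAX_CAUSE_DEPTH = 5

// Walk err -> err.cause looking for a numeric HTTP status. Stop past MAX_CAUSE_DEPTH
// so a circular cause chain cannot recurse forever.
export function extractProviderErrorStatus(err: unknown, depth = 0): number | undefined {
  if (depth > MAX_CAUSE_DEPTH || typeof err !== 'object' || err === null) return undefined
  const e = err as { statusCode?: unknown; status?: unknown; cause?: unknown }
  if (typeof e.statusCode === 'number') return e.statusCode // AI SDK APICallError
  if (typeof e.status === 'number') return e.status // fetch Response or wrapped errors
  return extractProviderErrorStatus(e.cause, depth + 1)
}

// AC1: 429 and 5xx are retriable. AC6: other 4xx and quota errors are not.
export function isRetriableProviderError(err: unknown): boolean {
  if (err instanceof QuotaExceededError) return false // quota exhaustion is not an outage
  const status = extractProviderErrorStatus(err)
  if (status === undefined) return false // no status: treated as non-retriable in this sketch
  return status === 429 || (status >= 500 && status <= 599)
}
```

Errors that carry no status at all (for example, raw network failures) are treated as non-retriable here. Whether those should also fall back is a policy choice the ACs leave open.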

---

## Tasks / Subtasks

- [x] Task 1: Fallback resolution API (AC: #3, #4)
  - [x] Subtask 1.1: Add `resolveAiFallbackRoute(lane, config): ResolvedAiRoute | null` in `fallback.ts` (returns `null` if no fallback provider configured)
  - [x] Subtask 1.2: Register new keys in `lib/config.ts` `ENV_FALLBACKS` for all six `*_FALLBACK` keys
  - [x] Subtask 1.3: Optional admin fields in `admin-settings-form.tsx` for fallback provider + model per lane (FR18 minimal — three dropdowns + model inputs)
- [x] Task 2: Error classification (AC: #1, #6)
  - [x] Subtask 2.1: Implement `isRetriableProviderError(err: unknown): boolean` handling AI SDK `APICallError`, `Response` status, and provider-specific wrappers
  - [x] Subtask 2.2: Unit tests for 429, 500, 503 → true; 401, 402, 400 → false
- [x] Task 3: Execution wrapper (AC: #2, #4, #7)
  - [x] Subtask 3.1: Implement `withAiProviderFallback<T>(lane, config, execute: (provider: AIProvider) => Promise<T>): Promise<T>` with 1500ms budget for fallback attempt after primary failure
  - [x] Subtask 3.2: On success via secondary, call optional debug logger with timing meta
- [x] Task 4: Integrate hot paths (AC: #5, #4)
  - [x] Subtask 4.1: **Chat:** wrap `streamText` initiation in `app/api/chat/route.ts` (primary `getChatProvider` → on retriable failure → secondary provider model)
  - [x] Subtask 4.2: **Tags/titles:** wrap calls in `app/api/ai/tags/route.ts`, `app/api/ai/title-suggestions/route.ts` (and `task-extract.tool.ts` if same pattern)
  - [x] Subtask 4.3: **Embeddings:** wrap `getEmbeddings` in `embedding.service.ts` call site
  - [x] Subtask 4.4: Defer low-traffic paths (brainstorm, agents, pptx) to follow-up — documented below
- [x] Task 5: Tests & NFR-R1 proof (AC: #2, #8)
  - [x] Subtask 5.1: `tests/unit/fallback.test.ts` — resolution, classification, mocked dual-provider success under 1.5s
  - [x] Subtask 5.2: Run `npm run test:unit -- tests/unit/fallback.test.ts tests/unit/router.test.ts`
- [x] Task 6: Explicit non-goals (AC: #6)
  - [x] Subtask 6.1: Do **not** implement BYOK “no fallback” branch logic beyond a commented seam / early return stub — Story **3.5**
  - [x] Subtask 6.2: Do **not** implement multi-hop chains (tertiary provider) or cost-sorted global `PROVIDER_FALLBACK_CHAIN` from `byok-billing-patch-v3.md` wholesale — single secondary per lane only
  - [x] Subtask 6.3: Do **not** fallback on Memento quota exhaustion — entitlements stay upstream of provider calls
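Subtasks 1.1 and 1.2 reduce to a lane-to-key lookup with two early exits. The following is a standalone sketch. It assumes the merged admin/env config is a flat string map and that the caller passes the primary provider type. `FallbackSelection` and `selectFallback` are hypothetical names, and provider validation is deliberately left out.

```typescript
export type AiFeatureLane = 'chat' | 'tags' | 'embedding'

export interface FallbackSelection {
  provider: string
  model: string | undefined // undefined: assume the factory applies its provider default
}

const KEYS: Record<AiFeatureLane, { provider: string; model: string }> = {
  chat: { provider: 'AI_PROVIDER_CHAT_FALLBACK', model: 'AI_MODEL_CHAT_FALLBACK' },
  tags: { provider: 'AI_PROVIDER_TAGS_FALLBACK', model: 'AI_MODEL_TAGS_FALLBACK' },
  embedding: { provider: 'AI_PROVIDER_EMBEDDING_FALLBACK', model: 'AI_MODEL_EMBEDDING_FALLBACK' },
}

export function selectFallback(
  lane: AiFeatureLane,
  config: Record<string, string | undefined>,
  primaryProvider: string,
): FallbackSelection | null {
  const provider = config[KEYS[lane].provider]?.trim()
  if (!provider) return null // AC3: unset or empty means primary-only
  if (provider === primaryProvider) return null // same provider: a retry would be pointless
  const model = config[KEYS[lane].model]?.trim() || undefined
  return { provider, model }
}
```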

---

## Dev Notes

### Epic context

- **3.1** — Redis entitlements before AI; `checkEntitlementOrThrow` stays **before** any provider call. Fallback does not bypass quotas.
- **3.2** — `router.ts` + factory delegation delivered; primary route only. This story adds **failure-driven** secondary execution without changing `resolveAiRoute` semantics.
- **3.4** — Host-pays billing context; fallback must not mis-attribute token usage (track against same user/host as primary attempt).
- **3.5** — BYOK: when user key is active, skip system fallback chain (document seam only in 3.3).

### Current codebase reality (READ BEFORE CODING)

| File | Current state | This story changes |
|------|---------------|-------------------|
| `lib/ai/router.ts` | Sync primary `resolveAiRoute`; comments point to 3.3 | Add **no** HTTP here; optionally export lane types for fallback |
| `lib/ai/factory.ts` | `get*Provider` → `resolveAiRoute` + `getProviderInstance` | Call sites use `withAiProviderFallback` OR factory returns wrapper |
| `app/api/chat/route.ts` | `getChatProvider` + `streamText` | Wrap stream start with fallback |
| `lib/config.ts` | No `*_FALLBACK` keys yet | Add env fallbacks |
| Admin settings | Primary provider/model only | Add fallback fields (FR18) |

**There is no `lib/ai/fallback.ts` today.** PRD FR18 and `byok-billing-patch-v3.md` describe a fuller aspirational router — implement **NFR-R1 + single secondary per lane**, not the full patch pseudocode.

### Recommended implementation shape

```typescript
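// lib/ai/fallback.ts (sketch; adapt to repo patterns)
import { resolveAiRoute, type AiFeatureLane, type ResolvedAiRoute } from './router' // assumes router.ts exports the lane types
import { getProviderInstance } from './factory'
import type { AIProvider } from './types'
// pick, buildFallbackRoute, isRetriableProviderError, extractProviderErrorStatus and
// logFallbackDebug are module-local helpers (not shown); buildFallbackRoute is hypothetical.

const FALLBACK_KEYS: Record<AiFeatureLane, [providerKey: string, modelKey: string]> = {
  chat: ['AI_PROVIDER_CHAT_FALLBACK', 'AI_MODEL_CHAT_FALLBACK'],
  tags: ['AI_PROVIDER_TAGS_FALLBACK', 'AI_MODEL_TAGS_FALLBACK'],
  embedding: ['AI_PROVIDER_EMBEDDING_FALLBACK', 'AI_MODEL_EMBEDDING_FALLBACK'],
}

const toProvider = (route: ResolvedAiRoute, config: Record<string, string>): AIProvider =>
  getProviderInstance(route.providerType, config, route.modelName, route.embeddingModelName, route.ollamaBaseUrl)

export function resolveAiFallbackRoute(lane: AiFeatureLane, config: Record<string, string>): ResolvedAiRoute | null {
  const [providerKey, modelKey] = FALLBACK_KEYS[lane]
  const provider = pick(config, providerKey)
  if (!provider) return null // AC3: unset/empty means primary-only
  // Same validation as router.ts: VALID_PROVIDERS check, anthropic guard on the
  // embedding lane, and null when the fallback equals the primary provider.
  return buildFallbackRoute(lane, provider, pick(config, modelKey), config)
}

export async function withAiProviderFallback<T>(
  lane: AiFeatureLane,
  config: Record<string, string>,
  run: (provider: AIProvider) => Promise<T>,
): Promise<T> {
  const primaryRoute = resolveAiRoute(lane, config) // sync, primary-only (Story 3.2); signature assumed
  try {
    return await run(toProvider(primaryRoute, config))
  } catch (primaryErr) {
    if (!isRetriableProviderError(primaryErr)) throw primaryErr // AC6: no fallback on 4xx/quota
    const t0 = performance.now() // NFR-R1 window opens once the failure is classified

    let fb: ResolvedAiRoute | null = null
    try {
      fb = resolveAiFallbackRoute(lane, config)
    } catch {
      // A broken fallback config must not mask the primary error (fb stays null).
    }
    if (!fb) throw primaryErr

    // Exactly one secondary attempt; its error propagates (the route maps it to 502).
    const result = await run(toProvider(fb, config))
    logFallbackDebug({
      lane,
      primaryProvider: primaryRoute.providerType, // provider types only, never keys
      secondaryProvider: fb.providerType,
      primaryStatus: extractProviderErrorStatus(primaryErr),
      fallbackMs: performance.now() - t0,
    })
    return result
  }
}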
```
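Because the sketch above leans on repo helpers, its control flow is hard to check in isolation. The variant below injects every dependency so it runs on its own. The `FallbackDeps` shape, the injectable `now` clock and the `onFallback` hook are assumptions for illustration, not the module's actual API. The variant demonstrates three properties the ACs require:

- Non-retriable errors are rethrown untouched.
- A missing or broken fallback config preserves the primary error.
- The secondary runs exactly once.

```typescript
export interface FallbackDeps<P> {
  primary: () => P
  resolveSecondary: () => P | null // null when no fallback is configured
  isRetriable: (err: unknown) => boolean
  now?: () => number // injectable clock; defaults to Date.now
  onFallback?: (meta: { fallbackMs: number }) => void // debug hook (AC7)
}

export async function runWithFallback<P, T>(deps: FallbackDeps<P>, run: (provider: P) => Promise<T>): Promise<T> {
  const now = deps.now ?? Date.now
  try {
    return await run(deps.primary())
  } catch (primaryErr) {
    if (!deps.isRetriable(primaryErr)) throw primaryErr
    const t0 = now() // NFR-R1 window opens at classification, before secondary setup
    let secondary: P | null
    try {
      secondary = deps.resolveSecondary()
    } catch {
      secondary = null // config errors must not hide the primary failure
    }
    if (secondary === null) throw primaryErr
    const result = await run(secondary) // one attempt only; its error propagates
    deps.onFallback?.({ fallbackMs: now() - t0 })
    return result
  }
}
```

In a Vitest suite, the same scenarios map onto `vi.fn()` mocks for `primary`, `resolveSecondary` and `run`.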

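**Streaming chat:** The primary `streamText` call may fail before any body bytes are sent; catch that error, then call `streamText` again with `secondary.getModel()`. Depending on the AI SDK version, `streamText` may return immediately and surface provider errors only when the stream is read. In that case the route must read the first chunk before committing the response, or the failure cannot be caught in time to fall back. Do not retry mid-stream after partial tokens have been sent.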
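One way to make a start failure catchable is to read the first chunk eagerly and replay it. The following is a generic sketch. It assumes the iterated stream (for example `textStream`, in SDK versions where read errors are thrown rather than emitted as error parts) rejects on its first read when the provider refuses the request. `peekFirst` is a hypothetical helper name.

```typescript
// Hypothetical helper: read the first chunk eagerly so a provider that rejects the
// request fails here (where fallback can still run), then replay it with the rest.
export async function peekFirst<T>(source: AsyncIterable<T>): Promise<AsyncIterable<T>> {
  const it = source[Symbol.asyncIterator]()
  const first = await it.next() // rejects if the stream fails before its first chunk
  return {
    async *[Symbol.asyncIterator]() {
      try {
        if (first.done) return
        yield first.value
        for (let next = await it.next(); !next.done; next = await it.next()) {
          yield next.value
        }
      } finally {
        await it.return?.() // propagate early cancellation (e.g. client disconnect) upstream
      }
    },
  }
}
```

Inside the fallback wrapper, the primary attempt would await `peekFirst(...)`. Once that resolves, tokens are flowing, the replayed iterable is what gets piped to the client, and no further retry happens.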

### Files — expected touch list

**NEW**

- `memento-note/lib/ai/fallback.ts`
- `memento-note/tests/unit/fallback.test.ts`

**UPDATE**

- `memento-note/lib/config.ts` — `ENV_FALLBACKS` for fallback keys
- `memento-note/app/api/chat/route.ts` — fallback-aware `streamText`
- `memento-note/app/api/ai/tags/route.ts`
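- `memento-note/app/api/ai/title-suggestions/route.ts`
- `memento-note/lib/ai/tools/task-extract.tool.ts` (if it follows the same pattern)
- `memento-note/lib/ai/services/embedding.service.ts` (embedding call site; verify how `semantic-search.service.ts` reaches it)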
- `memento-note/app/(admin)/admin/settings/admin-settings-form.tsx` — optional fallback UI (FR18)
- `memento-note/locales/en.json` + `fr.json` — admin labels only if UI added

**READ BEFORE MODIFY**

- `memento-note/lib/ai/router.ts` — primary resolution; do not break
- `memento-note/lib/ai/factory.ts` — `getProviderInstance`, `PROVIDER_DEFAULTS`
- `memento-note/lib/entitlements.ts` — ordering vs fallback
- `memento-note/tests/unit/router.test.ts` — regression baseline

### Testing standards

- Vitest; mock providers throwing controlled errors.
- Use `vi.fn()` primary fail / secondary succeed pattern.
- Timing test: secondary invoked within 1500ms of primary failure (mock instant failures).

---

## Dev Agent Guardrails

### Technical requirements

- **NFR-R1 scope:** Time budget covers **failover decision + secondary request start**, not full LLM generation latency.
- **One retry only:** At most one secondary attempt per user request per lane.
- **Preserve Story 3.2 NFR-P3:** `resolveAiRoute` / `resolveAiFallbackRoute` remain sync, no HTTP, no Redis.
- **Keys:** Never log secrets; debug logs use provider **type** and model id only.
- **i18n:** If admin UI adds labels, update `en.json` and `fr.json` — no hardcoded French/English in components.
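The logging rules (AC7 plus the **Keys** guardrail) fit in one pure function. The function decides whether to log at all and builds the payload from an explicit allow-list, so that no other field can leak. This is a sketch that assumes the environment is passed in as a plain object. `FallbackDebugMeta` and `buildFallbackDebugLog` are hypothetical names.

```typescript
export interface FallbackDebugMeta {
  lane: 'chat' | 'tags' | 'embedding'
  primaryProvider: string
  secondaryProvider: string
  primaryStatus: number | undefined
  fallbackMs: number
}

// Returns null when logging is gated off; otherwise only allow-listed fields.
export function buildFallbackDebugLog(
  meta: FallbackDebugMeta & Record<string, unknown>, // extra fields (keys, prompts) are dropped
  env: Record<string, string | undefined>,
): FallbackDebugMeta | null {
  const enabled = env['NODE_ENV'] !== 'production' || env['MEMENTO_AI_ROUTE_DEBUG'] === '1'
  if (!enabled) return null
  return {
    lane: meta.lane,
    primaryProvider: meta.primaryProvider,
    secondaryProvider: meta.secondaryProvider,
    primaryStatus: meta.primaryStatus,
    fallbackMs: Math.round(meta.fallbackMs),
  }
}
```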

### Architecture compliance

- Brownfield Next.js App Router; reuse `AIProvider` interface (`lib/ai/types.ts`).
- OpenRouter secondary models: slash IDs (`deepseek/deepseek-chat`) via existing `createOpenRouterProvider`.
- Embedding lane: reuse router rule — **reject** `anthropic` / `anthropic_custom` for embedding fallback.
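Review decision D1 adds a `VALID_PROVIDERS` check, and the embedding lane must reject Anthropic providers. The following sketch shows that validation. The `VALID_PROVIDERS` contents are a hypothetical subset assembled from provider names this story mentions, not the real list in `router.ts`.

```typescript
// Hypothetical subset; the real VALID_PROVIDERS lives in router.ts.
const VALID_PROVIDERS = new Set(['openai', 'anthropic', 'anthropic_custom', 'openrouter', 'deepseek', 'ollama'])
// Providers the router refuses on the embedding lane.
const NO_EMBEDDINGS = new Set(['anthropic', 'anthropic_custom'])

export function assertValidFallbackProvider(lane: 'chat' | 'tags' | 'embedding', provider: string): void {
  if (!VALID_PROVIDERS.has(provider)) {
    throw new Error(`Unknown fallback provider "${provider}" for lane ${lane}`)
  }
  if (lane === 'embedding' && NO_EMBEDDINGS.has(provider)) {
    throw new Error(`Provider "${provider}" cannot serve the embedding lane`)
  }
}
```

Per review patch P2, a throw from fallback resolution is caught inside the wrapper, and the primary error is what reaches the caller.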

### Library / framework requirements

- Reuse Vercel AI SDK error types (`import { APICallError } from 'ai'`) for status detection where applicable.
- No new HTTP client; no circuit-breaker library required for 3.3.

### File structure requirements

- `fallback.ts` beside `router.ts` under `lib/ai/`.
- Named exports; match `factory.ts` / `router.ts` style.

### Testing requirements

- `isRetriableProviderError` matrix test (429, 500, 401, QuotaExceededError).
- `resolveAiFallbackRoute` returns `null` when unset; valid route when configured.
- Integration-style unit test for `withAiProviderFallback` success path.

---

## Previous Story Intelligence

**Source:** `docs/3-2-custom-llm-router.md`

- Central router is **sync**; factory delegates `getChatProvider` / `getTagsProvider` / `getEmbeddingsProvider` to `resolveAiRoute`.
- Explicit non-goal in 3.2: multi-provider HTTP fallback → **this story**.
- Chat logs `formatAiRouteDebug` when `MEMENTO_AI_ROUTE_DEBUG=1` or non-production.
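- Extension seam for BYOK is commented in `router.ts`. The fallback module should accept a future skip flag when BYOK is active (3.5). That flag is `skipFallback: true` in the 3.2 notes and is stubbed in this story as `skipSystemFallback`.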
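The Story 3.5 seam can stay a single early exit in front of the fallback decision. The following sketch assumes an options object built around the `skipSystemFallback` flag named in the Completion Notes. `FallbackOptions` and `shouldAttemptSystemFallback` are hypothetical names.

```typescript
export interface FallbackOptions {
  /** Story 3.5: set when the user's own key (BYOK) is active; system fallback must not run. */
  skipSystemFallback?: boolean
}

// True only when a retriable failure may be retried on the system secondary.
export function shouldAttemptSystemFallback(
  retriable: boolean,
  hasSecondary: boolean,
  options: FallbackOptions = {},
): boolean {
  if (options.skipSystemFallback) return false // BYOK seam: stubbed until 3.5
  return retriable && hasSecondary
}
```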

**Source:** `docs/3-1-freemium-quota-tracking.md`

- Quota checks run **before** AI; 402 is not a provider outage.
- Redis fail-open on entitlement errors — do not conflate with provider fallback.
- Feature keys: `chat`, `semantic_search`, `auto_tag`, `auto_title`.

---

## Git Intelligence Summary

| Commit | Insight |
|--------|---------|
| `41596c2` | OpenRouter key fallback `OPENROUTER_API_KEY` → `CUSTOM_OPENAI_API_KEY` — secondary route must use same `getProviderInstance` paths |
| `1fcea6e` | Recent AI/embeddings work — verify semantic search embedding call site before wrapping |
| `195e845` | Security hardening elsewhere — fallback logs must not leak prompts or keys |
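Commit `41596c2` gives the OpenRouter key its own fallback. A secondary OpenRouter route inherits that behavior only if it goes through the same `getProviderInstance` path. Reading the arrow as "if the first key is unset, use the second", the precedence looks roughly like this. `resolveOpenRouterKey` is a hypothetical name, not the factory's actual code, and treating a blank key as unset is an assumption.

```typescript
// Hypothetical illustration of the key precedence from commit 41596c2.
export function resolveOpenRouterKey(config: Record<string, string | undefined>): string | undefined {
  // Prefer the dedicated key; otherwise use the custom OpenAI-compatible key.
  return config['OPENROUTER_API_KEY']?.trim() || config['CUSTOM_OPENAI_API_KEY']?.trim() || undefined
}
```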

---

## Latest Technical Information

- **Vercel AI SDK:** `APICallError` exposes `statusCode` for HTTP classification; use for 429/5xx detection.
- **OpenRouter:** OpenAI-compatible errors return standard HTTP status on upstream failures; treat like other OpenAI-compatible providers.
- **Default secondary suggestion (dev/staging only, not hardcoded in prod):** If admin leaves fallback empty, document recommended pairing in Dev Notes (e.g. primary `openai` → secondary `deepseek` or `openrouter`) but **require explicit config** for production behavior.

---

## Project Context Reference

- **Epics:** `docs/epics.md` — Story 3.3 + NFR-R1
- **PRD:** `docs/prd.md` — FR18, NFR-R1
- **Implementation readiness:** `docs/implementation-readiness-report.md` — FR18 marked missing; this story is first slice
- **Aspirational full router:** `memento-note/docs/byok-billing-patch-v3.md` §2 — `executeLLM` + `PROVIDER_FALLBACK_CHAIN`; **do not implement wholesale**
- **Prior stories:** `docs/3-1-freemium-quota-tracking.md`, `docs/3-2-custom-llm-router.md`

---

## Dev Agent Record

### Agent Model Used

Composer (Cursor)

### Debug Log References

- `npm run test:unit -- tests/unit/fallback.test.ts tests/unit/router.test.ts`: 14 passed on the initial run (29 after the code-review patches)

### Completion Notes List

- Added `lib/ai/fallback.ts` with `resolveAiFallbackRoute`, `isRetriableProviderError`, `withAiProviderFallback` (1500ms budget, single secondary retry).
- Integrated hot paths: chat (`streamText`), tags, title-suggestions, embeddings (`embedding.service.ts`), `task-extract.tool.ts`.
- Admin UI: fallback provider/model per lane (tags, embeddings, chat) + i18n FR/EN.
- `skipSystemFallback` option stubbed for Story 3.5 BYOK.
- Deferred: brainstorm, agents, pptx routes (low traffic).

### File List

- `memento-note/lib/ai/fallback.ts` (new)
- `memento-note/lib/config.ts`
- `memento-note/tests/unit/fallback.test.ts` (new)
- `memento-note/app/api/chat/route.ts`
- `memento-note/app/api/ai/tags/route.ts`
- `memento-note/app/api/ai/title-suggestions/route.ts`
- `memento-note/lib/ai/services/embedding.service.ts`
- `memento-note/lib/ai/tools/task-extract.tool.ts`
- `memento-note/app/(admin)/admin/settings/admin-settings-form.tsx`
- `memento-note/locales/en.json`
- `memento-note/locales/fr.json`

### Change Log

- 2026-05-15: Story 3.3 implemented — provider failover on 429/5xx with per-lane admin fallback config.
- 2026-05-15: Code review — 2 decisions, 5 patches applied, 6 deferred, 5 dismissed. 29 tests passing.

---

## Story Completion Status

- Story ID: 3.3
- Story Key: `3-3-smart-routing-fallback`
- File: `docs/3-3-smart-routing-fallback.md`
- Status: **review**
- Completion Note: Code review patches applied. 29 tests passing (14 fallback + 15 router).

---

### Review Findings (2026-05-15)

#### Decisions — Resolved

- [x] [D1→A] Fallback provider validation: added a `VALID_PROVIDERS` check in `resolveAiFallbackRoute` (imported from router.ts); throws on an unknown provider.
- [x] [D2→A] Same-provider skip: if the fallback equals the primary, `resolveAiFallbackRoute` returns `null` (no pointless retry).

#### Patches — Applied

- [x] [P1] CRITICAL: auth bypass in `title-suggestions/route.ts`. Added an early 401 return for a null session, plus `.catch()` on `incrementUsageAsync`.
- [x] [P2] `resolveAiFallbackRoute` throwing inside the catch block: `getSecondaryProvider` is now wrapped in a try/catch that returns `null` on a config error, so the primary error is preserved.
- [x] [P3] Bounded recursion in `extractProviderErrorStatus`: `maxDepth` 5, returns `undefined` beyond that. Added a circular-cause test.
- [x] [P4] NFR-R1 timer moved before `getSecondaryProvider` so the full failover is measured.
- [x] [P5] Added tests: 403 non-retriable, circular cause, nested cause, unknown provider, same-provider skip, config error preserves primary error. Total tests: 14 → 29.

#### Deferred

- Chat mid-stream failure: by design (AC5 retries at stream start only)
- Ollama lane URLs missing from config.ts: `cfgOnly()` is intentional
- Batch embedding is all-or-nothing: pre-existing
- `onFinish` without error handling: pre-existing
- No circuit breaker: out of scope for 3.3
- `incrementUsageAsync` on fail-open: by design

#### Dismissed

- title-suggestions and task-extract reuse the `tags` lane: by design
- No inline 3.2 regression: tests are kept separate
- `embeddingModelName` computed for non-embedding lanes: not a bug
- Mock state beyond the 2nd call: correct
- Helpers without type annotations: TypeScript infers the types