# Story 3.3: Smart-Routing Fallback

Status: review

## Story

As a system,
I want to automatically fall back to a secondary provider when the primary fails,
so that users experience zero downtime during external API outages.

**Epic:** Epic 3: The SaaS Commercial Engine (Monetization & API Cost Protection)

**FR coverage:** FR18 (admin-configurable fallback rules), NFR-R1 (graceful degradation ≤1.5s).

---

## Acceptance Criteria

1. [AC1] **Retriable failures:** When a primary provider call fails with HTTP **429** or **5xx** (or an AI SDK equivalent such as `APICallError` with those status codes), the system treats the failure as retriable and attempts exactly **one** secondary route for the same feature lane (`chat` | `tags` | `embedding`).
2. [AC2] **NFR-R1 (≤1.5s):** From the moment the primary failure is classified as retriable until the secondary provider accepts the request (first successful response chunk for streaming, or resolved promise for non-streaming), elapsed wall-clock time MUST be **≤ 1500ms**. Measure with `performance.now()` in tests; log `fallbackMs` in debug mode.
3. [AC3] **Per-lane secondary config:** Secondary provider/model are resolved from admin/env keys mirroring the primary lane keys:
   - `AI_PROVIDER_CHAT_FALLBACK`, `AI_MODEL_CHAT_FALLBACK`
   - `AI_PROVIDER_TAGS_FALLBACK`, `AI_MODEL_TAGS_FALLBACK`
   - `AI_PROVIDER_EMBEDDING_FALLBACK`, `AI_MODEL_EMBEDDING_FALLBACK`

   If the fallback provider is unset/empty, behavior is **primary-only** (no-op fallback, same as today).
4. [AC4] **Single choke-point:** Fallback orchestration lives in **`memento-note/lib/ai/fallback.ts`** (new) and is invoked either from **`getChatProvider` / `getTagsProvider` / `getEmbeddingsProvider`** via a thin wrapper, or from one shared `withAiProviderFallback(lane, config, fn)` used by all hot paths. It is **not** copy-pasted into every API route. `resolveAiRoute()` in `router.ts` stays synchronous and primary-only; fallback is a **separate** resolve using `*_FALLBACK` keys.
5. [AC5] **Chat streaming path:** `app/api/chat/route.ts` MUST use fallback-aware execution so that a failed `streamText` start on the primary retries once on the secondary before returning 502 to the client. Do not buffer entire streams for retry.
6. [AC6] **Non-retriable errors:** **4xx** other than 429 (e.g. 400, 401, 403), validation errors, and **quota errors** (`QuotaExceededError` / HTTP 402 from entitlements) MUST **not** trigger provider fallback.
7. [AC7] **Observability:** When fallback fires, emit a structured debug log `{ lane, primaryProvider, secondaryProvider, primaryStatus, fallbackMs }`, gated by `NODE_ENV !== 'production'` or `MEMENTO_AI_ROUTE_DEBUG=1` (same pattern as Story 3.2). Never log API keys.
8. [AC8] **Regression:** Existing `resolveAiRoute` unit tests and factory delegation from Story 3.2 remain green; new tests cover fallback resolution and retriable error classification.

---

## Tasks / Subtasks

- [x] Task 1: Fallback resolution API (AC: #3, #4)
  - [x] Subtask 1.1: Add `resolveAiFallbackRoute(lane, config): ResolvedAiRoute | null` in `fallback.ts` (returns `null` if no fallback provider is configured)
  - [x] Subtask 1.2: Register new keys in `lib/config.ts` `ENV_FALLBACKS` for all six `*_FALLBACK` keys
  - [x] Subtask 1.3: Optional admin fields in `admin-settings-form.tsx` for fallback provider + model per lane (FR18 minimal: three dropdowns + model inputs)
- [x] Task 2: Error classification (AC: #1, #6)
  - [x] Subtask 2.1: Implement `isRetriableProviderError(err: unknown): boolean` handling AI SDK `APICallError`, `Response` status, and provider-specific wrappers
  - [x] Subtask 2.2: Unit tests for 429, 500, 503 → true; 401, 402, 400 → false
- [x] Task 3: Execution wrapper (AC: #2, #4, #7)
  - [x] Subtask 3.1: Implement `withAiProviderFallback<T>(lane, config, execute: (provider: AIProvider) => Promise<T>): Promise<T>` with a 1500ms budget for the fallback attempt after primary failure
  - [x] Subtask 3.2: On success via secondary, call the optional debug logger with timing meta
- [x] Task 4: Integrate hot paths (AC: #4, #5)
  - [x] Subtask 4.1: **Chat:** wrap `streamText` initiation in `app/api/chat/route.ts` (primary `getChatProvider` → on retriable failure → secondary provider model)
  - [x] Subtask 4.2: **Tags/titles:** wrap calls in `app/api/ai/tags/route.ts`, `app/api/ai/title-suggestions/route.ts` (and `task-extract.tool.ts` if it follows the same pattern)
  - [x] Subtask 4.3: **Embeddings:** wrap `getEmbeddings` at the `embedding.service.ts` call site
  - [x] Subtask 4.4: Defer low-traffic paths (brainstorm, agents, pptx) to a follow-up, documented below
- [x] Task 5: Tests & NFR-R1 proof (AC: #2, #8)
  - [x] Subtask 5.1: `tests/unit/fallback.test.ts` covering resolution, classification, and mocked dual-provider success under 1.5s
  - [x] Subtask 5.2: Run `npm run test:unit -- tests/unit/fallback.test.ts tests/unit/router.test.ts`
- [x] Task 6: Explicit non-goals (AC: #6)
  - [x] Subtask 6.1: Do **not** implement BYOK “no fallback” branch logic beyond a commented seam / early-return stub (Story **3.5**)
  - [x] Subtask 6.2: Do **not** implement multi-hop chains (tertiary provider) or the cost-sorted global `PROVIDER_FALLBACK_CHAIN` from `byok-billing-patch-v3.md` wholesale; single secondary per lane only
  - [x] Subtask 6.3: Do **not** fall back on Memento quota exhaustion; entitlements stay upstream of provider calls

---

## Dev Notes

### Epic context

- **3.1**: Redis entitlements before AI; `checkEntitlementOrThrow` stays **before** any provider call. Fallback does not bypass quotas.
- **3.2**: `router.ts` + factory delegation delivered; primary route only. This story adds **failure-driven** secondary execution without changing `resolveAiRoute` semantics.
- **3.4**: Host-pays billing context; fallback must not mis-attribute token usage (track against the same user/host as the primary attempt).
- **3.5**: BYOK: when a user key is active, skip the system fallback chain (document the seam only in 3.3).
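
The ordering described in the 3.1 and 3.5 bullets can be sketched as below. This is a minimal, self-contained illustration rather than the repo's code: `checkEntitlementOrThrow` and `QuotaExceededError` are named in this story, but their signatures here are assumptions, and `handleAiRequest`, `FakeProvider`, `RequestDeps`, and the inline `isRetriable` check are hypothetical.

```typescript
// Hypothetical stand-ins; the real signatures in lib/entitlements.ts may differ.
class QuotaExceededError extends Error {
  readonly statusCode = 402
}

type FakeProvider = (prompt: string) => Promise<string>

interface RequestDeps {
  remainingQuota: number          // assumed shape of the entitlement state
  primary: FakeProvider
  secondary: FakeProvider | null  // null when no *_FALLBACK key is configured
  skipSystemFallback?: boolean    // Story 3.5 seam: true when a BYOK key is active
}

function checkEntitlementOrThrow(remainingQuota: number): void {
  if (remainingQuota <= 0) throw new QuotaExceededError('quota exhausted')
}

// Simplified classification: only 429 and 5xx count as provider outages.
function isRetriable(err: unknown): boolean {
  const status = (err as { statusCode?: unknown } | null)?.statusCode
  return typeof status === 'number' && (status === 429 || (status >= 500 && status <= 599))
}

async function handleAiRequest(prompt: string, deps: RequestDeps): Promise<string> {
  // 1. The quota gate runs first, so a 402 never reaches the provider layer.
  checkEntitlementOrThrow(deps.remainingQuota)
  try {
    return await deps.primary(prompt)
  } catch (err) {
    // 2. Fallback applies only to retriable outages, and never when BYOK is active.
    if (deps.skipSystemFallback || !deps.secondary || !isRetriable(err)) throw err
    // 3. Exactly one secondary attempt; its failure propagates to the caller.
    return await deps.secondary(prompt)
  }
}
```

Because the quota check sits outside the `try`, a `QuotaExceededError` can never be mistaken for a provider failure, which is the property Subtask 6.3 asks for.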
### Current codebase reality (READ BEFORE CODING)

| File | Current state | This story changes |
|------|---------------|-------------------|
| `lib/ai/router.ts` | Sync primary `resolveAiRoute`; comments point to 3.3 | Add **no** HTTP here; optionally export lane types for fallback |
| `lib/ai/factory.ts` | `get*Provider` → `resolveAiRoute` + `getProviderInstance` | Call sites use `withAiProviderFallback` OR factory returns wrapper |
| `app/api/chat/route.ts` | `getChatProvider` + `streamText` | Wrap stream start with fallback |
| `lib/config.ts` | No `*_FALLBACK` keys yet | Add env fallbacks |
| Admin settings | Primary provider/model only | Add fallback fields (FR18) |

**There is no `lib/ai/fallback.ts` today.** PRD FR18 and `byok-billing-patch-v3.md` describe a fuller aspirational router. Implement **NFR-R1 + single secondary per lane**, not the full patch pseudocode.

### Recommended implementation shape

```typescript
// lib/ai/fallback.ts (sketch; adapt to repo patterns)
// Import paths and the Record<string, string> config type are assumptions.
// pick, getProviderForLane, isRetriableProviderError and logFallbackDebug
// are helpers to be defined or imported per repo conventions.
import type { AIProvider } from './types'
import type { AiFeatureLane, ResolvedAiRoute } from './router'
import { getProviderInstance } from './factory'

export function resolveAiFallbackRoute(
  lane: AiFeatureLane,
  config: Record<string, string>,
): ResolvedAiRoute | null {
  const providerKey =
    lane === 'chat' ? 'AI_PROVIDER_CHAT_FALLBACK'
    : lane === 'tags' ? 'AI_PROVIDER_TAGS_FALLBACK'
    : 'AI_PROVIDER_EMBEDDING_FALLBACK'
  const modelKey =
    lane === 'chat' ? 'AI_MODEL_CHAT_FALLBACK'
    : lane === 'tags' ? 'AI_MODEL_TAGS_FALLBACK'
    : 'AI_MODEL_EMBEDDING_FALLBACK'
  const provider = pick(config, providerKey)
  if (!provider) return null // no fallback configured: primary-only
  // build and return the ResolvedAiRoute (provider + modelKey) using the same
  // validation as the router (anthropic guard on embedding lane)
}

export async function withAiProviderFallback<T>(
  lane: AiFeatureLane,
  config: Record<string, string>,
  run: (provider: AIProvider) => Promise<T>,
): Promise<T> {
  const primary = getProviderForLane(lane, config) // use existing factory helpers internally
  try {
    return await run(primary)
  } catch (err) {
    if (!isRetriableProviderError(err)) throw err
    // Start the NFR-R1 clock as soon as the failure is classified as retriable (AC2).
    const t0 = performance.now()
    const fb = resolveAiFallbackRoute(lane, config)
    if (!fb) throw err // no secondary: surface the primary error
    const secondary = getProviderInstance(
      fb.providerType, config, fb.modelName, fb.embeddingModelName, fb.ollamaBaseUrl,
    )
    // Exactly one secondary attempt; a secondary failure propagates as-is (or aggregate errors).
    const result = await run(secondary)
    logFallbackDebug({
      lane,
      fallbackMs: performance.now() - t0,
      // ...plus primaryProvider, secondaryProvider, primaryStatus (AC7)
    })
    return result
  }
}
```

**Streaming chat:** The primary `streamText` call may fail before any body bytes are sent; catch that error, then call `streamText` again with `secondary.getModel()`. Do not retry mid-stream after partial tokens have been sent.
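
The "retry only before the first chunk" rule can be illustrated without the AI SDK. The sketch below uses plain async iterables in place of `streamText`; `streamWithStartFallback`, `ChunkSource`, and the `isRetriable` callback are hypothetical names, and how the real SDK surfaces a start failure should be verified against its documentation before adopting this shape.

```typescript
// A source that produces a fresh stream of text chunks each time it is called.
type ChunkSource = () => AsyncIterable<string>

async function* streamWithStartFallback(
  primary: ChunkSource,
  secondary: ChunkSource | null,
  isRetriable: (err: unknown) => boolean,
): AsyncGenerator<string> {
  let iterator = primary()[Symbol.asyncIterator]()
  let current: IteratorResult<string>
  try {
    // Pull only the first chunk; nothing has been sent to the client yet.
    current = await iterator.next()
  } catch (err) {
    if (!secondary || !isRetriable(err)) throw err
    // Single retry: restart from scratch on the secondary source.
    iterator = secondary()[Symbol.asyncIterator]()
    current = await iterator.next()
  }
  try {
    // From here on the stream is committed: later errors propagate unchanged.
    while (!current.done) {
      yield current.value
      current = await iterator.next()
    }
  } finally {
    // Release the underlying source if the consumer stops early or an error occurs.
    await iterator.return?.()
  }
}
```

Nothing is buffered beyond the first chunk, which keeps the approach compatible with AC5's "do not buffer entire streams" constraint.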
### Files: expected touch list

**NEW**

- `memento-note/lib/ai/fallback.ts`
- `memento-note/tests/unit/fallback.test.ts`

**UPDATE**

- `memento-note/lib/config.ts`: `ENV_FALLBACKS` for fallback keys
- `memento-note/app/api/chat/route.ts`: fallback-aware `streamText`
- `memento-note/app/api/ai/tags/route.ts`
- `memento-note/app/api/ai/title-suggestions/route.ts`
- `memento-note/lib/ai/services/semantic-search.service.ts` (embedding path; verify exact call site)
- `memento-note/app/(admin)/admin/settings/admin-settings-form.tsx`: optional fallback UI (FR18)
- `memento-note/locales/en.json` + `fr.json`: admin labels, only if UI is added

**READ BEFORE MODIFY**

- `memento-note/lib/ai/router.ts`: primary resolution; do not break
- `memento-note/lib/ai/factory.ts`: `getProviderInstance`, `PROVIDER_DEFAULTS`
- `memento-note/lib/entitlements.ts`: ordering vs fallback
- `memento-note/tests/unit/router.test.ts`: regression baseline

### Testing standards

- Vitest; mock providers throwing controlled errors.
- Use the `vi.fn()` primary-fail / secondary-succeed pattern.
- Timing test: secondary invoked within 1500ms of primary failure (mock instant failures).

---

## Dev Agent Guardrails

### Technical requirements

- **NFR-R1 scope:** Time budget covers **failover decision + secondary request start**, not full LLM generation latency.
- **One retry only:** At most one secondary attempt per user request per lane.
- **Preserve Story 3.2 NFR-P3:** `resolveAiRoute` / `resolveAiFallbackRoute` remain sync, no HTTP, no Redis.
- **Keys:** Never log secrets; debug logs use provider **type** and model id only.
- **i18n:** If the admin UI adds labels, update `en.json` and `fr.json`; no hardcoded French/English in components.

### Architecture compliance

- Brownfield Next.js App Router; reuse the `AIProvider` interface (`lib/ai/types.ts`).
- OpenRouter secondary models: slash IDs (`deepseek/deepseek-chat`) via the existing `createOpenRouterProvider`.
- Embedding lane: reuse the router rule and **reject** `anthropic` / `anthropic_custom` for embedding fallback.

### Library / framework requirements

- Reuse Vercel AI SDK error types (`import { APICallError } from 'ai'`) for status detection where applicable.
- No new HTTP client; no circuit-breaker library required for 3.3.

### File structure requirements

- `fallback.ts` beside `router.ts` under `lib/ai/`.
- Named exports; match `factory.ts` / `router.ts` style.

### Testing requirements

- `isRetriableProviderError` matrix test (429, 500, 401, QuotaExceededError).
- `resolveAiFallbackRoute` returns `null` when unset; valid route when configured.
- Integration-style unit test for the `withAiProviderFallback` success path.

---

## Previous Story Intelligence

**Source:** `docs/3-2-custom-llm-router.md`

- Central router is **sync**; factory delegates `getChatProvider` / `getTagsProvider` / `getEmbeddingsProvider` to `resolveAiRoute`.
- Explicit non-goal in 3.2: multi-provider HTTP fallback → **this story**.
- Chat logs `formatAiRouteDebug` when `MEMENTO_AI_ROUTE_DEBUG=1` or non-production.
- Extension seam for BYOK is commented in `router.ts`; the fallback module should accept a future `skipFallback: true` when BYOK is active (3.5).

**Source:** `docs/3-1-freemium-quota-tracking.md`

- Quota checks run **before** AI; 402 is not a provider outage.
- Redis fails open on entitlement errors; do not conflate this with provider fallback.
- Feature keys: `chat`, `semantic_search`, `auto_tag`, `auto_title`.
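
The classification matrix from the testing requirements, together with the note that a 402 is not a provider outage, can be sketched as follows. To stay self-contained, the sketch duck-types on a numeric `statusCode` or `status` property instead of importing `APICallError` from `ai`; the real `isRetriableProviderError` may handle more wrappers, and the `matrix` data is purely illustrative.

```typescript
// Sketch under stated assumptions: errors carry the HTTP status as a numeric
// `statusCode` (AI SDK style) or `status` (fetch Response style) property.
function isRetriableProviderError(err: unknown): boolean {
  if (typeof err !== 'object' || err === null) return false
  const candidate = err as { statusCode?: unknown; status?: unknown }
  const status =
    typeof candidate.statusCode === 'number' ? candidate.statusCode
    : typeof candidate.status === 'number' ? candidate.status
    : undefined
  // No status at all (validation errors, unknown throws): never retriable.
  if (status === undefined) return false
  // AC1: only 429 and 5xx trigger the secondary route; 402 and other 4xx do not.
  return status === 429 || (status >= 500 && status <= 599)
}

// Illustrative matrix: [HTTP status, expected classification].
const matrix: Array<[number, boolean]> = [
  [429, true],
  [500, true],
  [503, true],
  [400, false],
  [401, false],
  [402, false], // quota error from entitlements, not a provider outage
  [403, false],
]

for (const [status, expected] of matrix) {
  const actual = isRetriableProviderError({ statusCode: status })
  console.log(`${status}: retriable=${actual}, matches expectation=${actual === expected}`)
}
```

In a Vitest suite the same matrix would typically feed `it.each`, keeping one assertion per status code.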
---

## Git Intelligence Summary

| Commit | Insight |
|--------|---------|
| `41596c2` | OpenRouter key fallback `OPENROUTER_API_KEY` → `CUSTOM_OPENAI_API_KEY`; the secondary route must use the same `getProviderInstance` paths |
| `1fcea6e` | Recent AI/embeddings work; verify the semantic search embedding call site before wrapping |
| `195e845` | Security hardening elsewhere; fallback logs must not leak prompts or keys |

---

## Latest Technical Information

- **Vercel AI SDK:** `APICallError` exposes `statusCode` for HTTP classification; use it for 429/5xx detection.
- **OpenRouter:** OpenAI-compatible errors return a standard HTTP status on upstream failures; treat like other OpenAI-compatible providers.
- **Default secondary suggestion (dev/staging only, not hardcoded in prod):** If the admin leaves fallback empty, document a recommended pairing in Dev Notes (e.g. primary `openai` → secondary `deepseek` or `openrouter`) but **require explicit config** for production behavior.

---

## Project Context Reference

- **Epics:** `docs/epics.md` (Story 3.3 + NFR-R1)
- **PRD:** `docs/prd.md` (FR18, NFR-R1)
- **Implementation readiness:** `docs/implementation-readiness-report.md` (FR18 marked missing; this story is the first slice)
- **Aspirational full router:** `memento-note/docs/byok-billing-patch-v3.md` §2 (`executeLLM` + `PROVIDER_FALLBACK_CHAIN`); **do not implement wholesale**
- **Prior stories:** `docs/3-1-freemium-quota-tracking.md`, `docs/3-2-custom-llm-router.md`

---

## Dev Agent Record

### Agent Model Used

Composer (Cursor)

### Debug Log References

- `npm run test:unit -- tests/unit/fallback.test.ts tests/unit/router.test.ts`: 14 passed

### Completion Notes List

- Added `lib/ai/fallback.ts` with `resolveAiFallbackRoute`, `isRetriableProviderError`, `withAiProviderFallback` (1500ms budget, single secondary retry).
- Integrated hot paths: chat (`streamText`), tags, title-suggestions, embeddings (`embedding.service.ts`), `task-extract.tool.ts`.
- Admin UI: fallback provider/model per lane (tags, embeddings, chat) + i18n FR/EN.
- `skipSystemFallback` option stubbed for Story 3.5 BYOK.
- Deferred: brainstorm, agents, pptx routes (low traffic).

### File List

- `memento-note/lib/ai/fallback.ts` (new)
- `memento-note/lib/config.ts`
- `memento-note/tests/unit/fallback.test.ts` (new)
- `memento-note/app/api/chat/route.ts`
- `memento-note/app/api/ai/tags/route.ts`
- `memento-note/app/api/ai/title-suggestions/route.ts`
- `memento-note/lib/ai/services/embedding.service.ts`
- `memento-note/lib/ai/tools/task-extract.tool.ts`
- `memento-note/app/(admin)/admin/settings/admin-settings-form.tsx`
- `memento-note/locales/en.json`
- `memento-note/locales/fr.json`

### Change Log

- 2026-05-15: Story 3.3 implemented: provider failover on 429/5xx with per-lane admin fallback config.
- 2026-05-15: Code review: 2 decisions, 5 patches applied, 6 deferred, 5 dismissed. 29 tests passing.

---

## Story Completion Status

- Story ID: 3.3
- Story Key: `3-3-smart-routing-fallback`
- File: `docs/3-3-smart-routing-fallback.md`
- Status: **review**
- Completion Note: Code review patches applied. 29 tests passing (14 fallback + 15 router).

---

### Review Findings (2026-05-15)

#### Decisions: Resolved

- [x] [D1→A] Fallback provider validation: added a `VALID_PROVIDERS` check in `resolveAiFallbackRoute` (imported from router.ts); throws on an unknown provider.
- [x] [D2→A] Same-provider skip: if the fallback equals the primary, `resolveAiFallbackRoute` returns `null` (no pointless retry).

#### Patches: Applied

- [x] [P1] CRITICAL: auth bypass in `title-suggestions/route.ts`. Added an early 401 return for a null session, plus `.catch()` on incrementUsageAsync.
- [x] [P2] `resolveAiFallbackRoute` throwing inside the catch block: `getSecondaryProvider` is now wrapped in a try/catch that returns `null` on a config error, so the primary error is preserved.
- [x] [P3] `extractProviderErrorStatus` bounded recursion: `maxDepth` 5, returns `undefined` beyond that. Circular-cause test added.
- [x] [P4] NFR-R1 timer moved before `getSecondaryProvider`, so it measures the complete failover.
- [x] [P5] Tests added: 403 non-retriable, circular cause, nested cause, unknown provider, same-provider skip, config error preserves primary error. 14→29 tests in total.

#### Deferred

- Chat mid-stream failure: by design (AC5 retries at stream start only)
- Ollama lane URLs absent from config.ts: cfgOnly() is intentional
- Batch embedding all-or-nothing: pre-existing
- onFinish without error handling: pre-existing
- No circuit breaker: out of scope for 3.3
- incrementUsageAsync on fail-open: by design

#### Dismissed

- title-suggestions/task-extract reuse the 'tags' lane: by design
- No inline 3.2 regression: tests are separate
- embeddingModelName computed for non-embedding lanes: not a bug
- Mock state beyond the 2nd call: correct
- Helpers without type annotations: TypeScript infers them
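
Patch P3's bounded traversal of the error `cause` chain can be sketched as below. The function name and the depth limit of 5 come from the finding; the properties inspected (`statusCode`, `cause`) and the iterative form are assumptions, and the actual implementation in `fallback.ts` may differ.

```typescript
const MAX_CAUSE_DEPTH = 5

// Walks err -> err.cause -> ... looking for a numeric statusCode.
// Inspects the error itself plus at most `maxDepth` nested causes, so a
// circular or very deep chain terminates and yields undefined.
function extractProviderErrorStatus(
  err: unknown,
  maxDepth: number = MAX_CAUSE_DEPTH,
): number | undefined {
  let current: unknown = err
  for (let depth = 0; depth <= maxDepth; depth++) {
    if (typeof current !== 'object' || current === null) return undefined
    const node = current as { statusCode?: unknown; cause?: unknown }
    if (typeof node.statusCode === 'number') return node.statusCode
    current = node.cause
  }
  return undefined // gave up: chain is deeper than maxDepth or circular
}

// Nested cause: the status sits two levels down.
const nested = { message: 'wrapper', cause: { cause: { statusCode: 503 } } }

// Circular cause: an error that (indirectly) names itself as its own cause.
const circular: { message: string; cause?: unknown } = { message: 'loop' }
circular.cause = circular

console.log(extractProviderErrorStatus(nested))
console.log(extractProviderErrorStatus(circular))
```

A loop with a depth counter avoids both stack growth and the need for a visited set, at the cost of ignoring statuses buried deeper than the limit.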