Story 3.3: Smart-Routing Fallback
Status: review
Story
As a system, I want to automatically fall back to a secondary provider when the primary fails, so that users experience zero downtime during external API outages.
Epic: Epic 3 — The SaaS Commercial Engine (Monetization & API Cost Protection)
FR coverage: FR18 (admin-configurable fallback rules), NFR-R1 (graceful degradation ≤1.5s).
Acceptance Criteria
- [AC1] Retriable failures: When a primary provider call fails with HTTP 429 or 5xx (or AI SDK equivalent such as `APICallError` with those status codes), the system treats the failure as retriable and attempts exactly one secondary route for the same feature lane (`chat` | `tags` | `embedding`).
- [AC2] NFR-R1 (≤1.5s): From the moment the primary failure is classified as retriable until the secondary provider accepts the request (first successful response chunk for streaming, or resolved promise for non-streaming), elapsed wall-clock time MUST be ≤ 1500ms. Measure with `performance.now()` in tests; log `fallbackMs` in debug mode.
- [AC3] Per-lane secondary config: Secondary provider/model are resolved from admin/env keys mirroring primary lane keys:
  - `AI_PROVIDER_CHAT_FALLBACK`, `AI_MODEL_CHAT_FALLBACK`
  - `AI_PROVIDER_TAGS_FALLBACK`, `AI_MODEL_TAGS_FALLBACK`
  - `AI_PROVIDER_EMBEDDING_FALLBACK`, `AI_MODEL_EMBEDDING_FALLBACK`

  If the fallback provider is unset/empty, behavior is primary-only (no-op fallback, same as today).
- [AC4] Single choke-point: Fallback orchestration lives in `memento-note/lib/ai/fallback.ts` (new) and is invoked from `getChatProvider` / `getTagsProvider` / `getEmbeddingsProvider` via a thin wrapper OR from one shared `withAiFallback(lane, config, fn)` used by all hot paths, not copy-pasted in every API route. `resolveAiRoute()` in `router.ts` stays synchronous and primary-only; fallback is a separate resolve using `*_FALLBACK` keys.
- [AC5] Chat streaming path: `app/api/chat/route.ts` MUST use fallback-aware execution so a failed `streamText` start on primary retries once on secondary before returning 502 to the client. Do not buffer entire streams for retry.
- [AC6] Non-retriable errors: 4xx other than 429 (401, 403, 400), validation errors, and quota errors (`QuotaExceededError` / HTTP 402 from entitlements) MUST not trigger provider fallback.
- [AC7] Observability: When fallback fires, emit structured debug log `{ lane, primaryProvider, secondaryProvider, primaryStatus, fallbackMs }`, gated by `NODE_ENV !== 'production'` or `MEMENTO_AI_ROUTE_DEBUG=1` (same pattern as Story 3.2). Never log API keys.
- [AC8] Regression: Existing `resolveAiRoute` unit tests and factory delegation from Story 3.2 remain green; new tests cover fallback resolution and retriable error classification.
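The retriable / non-retriable split in AC1 and AC6 can be sketched as a small classifier. This is a minimal illustration only: it duck-types on a numeric `statusCode` (the property the AI SDK's `APICallError` exposes) or `status`, and the local `QuotaExceededError` class is a stand-in for the entitlement error, not the repo's actual type.

```typescript
// Hypothetical sketch: shapes and names are assumptions, not the repo's code.

/** Stand-in for the entitlement error named in AC6. */
class QuotaExceededError extends Error {
  readonly status = 402;
}

/** 429 and any 5xx count as provider outages; everything else does not. */
function isRetriableStatus(status: number | undefined): boolean {
  if (status === undefined) return false;
  return status === 429 || (status >= 500 && status <= 599);
}

/** Reads a numeric HTTP status from an error-like value, if one is present. */
function readStatus(err: unknown): number | undefined {
  if (typeof err !== "object" || err === null) return undefined;
  const carrier = err as { statusCode?: unknown; status?: unknown };
  if (typeof carrier.statusCode === "number") return carrier.statusCode;
  if (typeof carrier.status === "number") return carrier.status;
  return undefined;
}

function isRetriableProviderError(err: unknown): boolean {
  // Quota exhaustion is an entitlement decision, never a provider outage.
  if (err instanceof QuotaExceededError) return false;
  return isRetriableStatus(readStatus(err));
}
```

Errors without any status (for example validation errors) fall through to `false`, so only an explicit 429 or 5xx can trigger the secondary route.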
Tasks / Subtasks
- Task 1: Fallback resolution API (AC: #3, #4)
  - Subtask 1.1: Add `resolveAiFallbackRoute(lane, config): ResolvedAiRoute | null` in `fallback.ts` (returns `null` if no fallback provider configured)
  - Subtask 1.2: Register new keys in `lib/config.ts` `ENV_FALLBACKS` for all six `*_FALLBACK` keys
  - Subtask 1.3: Optional admin fields in `admin-settings-form.tsx` for fallback provider + model per lane (FR18 minimal: three dropdowns + model inputs)
- Task 2: Error classification (AC: #1, #6)
  - Subtask 2.1: Implement `isRetriableProviderError(err: unknown): boolean` handling AI SDK `APICallError`, `Response` status, and provider-specific wrappers
  - Subtask 2.2: Unit tests for 429, 500, 503 → true; 401, 402, 400 → false
- Task 3: Execution wrapper (AC: #2, #4, #7)
  - Subtask 3.1: Implement `withAiProviderFallback<T>(lane, config, execute: (provider: AIProvider) => Promise<T>): Promise<T>` with 1500ms budget for fallback attempt after primary failure
  - Subtask 3.2: On success via secondary, call optional debug logger with timing meta
- Task 4: Integrate hot paths (AC: #5, #4)
  - Subtask 4.1: Chat: wrap `streamText` initiation in `app/api/chat/route.ts` (primary `getChatProvider` → on retriable failure → secondary provider model)
  - Subtask 4.2: Tags/titles: wrap calls in `app/api/ai/tags/route.ts`, `app/api/ai/title-suggestions/route.ts` (and `task-extract.tool.ts` if same pattern)
  - Subtask 4.3: Embeddings: wrap `getEmbeddings` in `embedding.service.ts` call site
  - Subtask 4.4: Defer low-traffic paths (brainstorm, agents, pptx) to follow-up (documented below)
- Task 5: Tests & NFR-R1 proof (AC: #2, #8)
  - Subtask 5.1: `tests/unit/fallback.test.ts` covering resolution, classification, mocked dual-provider success under 1.5s
  - Subtask 5.2: Run `npm run test:unit -- tests/unit/fallback.test.ts tests/unit/router.test.ts`
- Task 6: Explicit non-goals (AC: #6)
  - Subtask 6.1: Do not implement BYOK "no fallback" branch logic beyond a commented seam / early return stub (Story 3.5)
  - Subtask 6.2: Do not implement multi-hop chains (tertiary provider) or cost-sorted global `PROVIDER_FALLBACK_CHAIN` from `byok-billing-patch-v3.md` wholesale; single secondary per lane only
  - Subtask 6.3: Do not fallback on Memento quota exhaustion; entitlements stay upstream of provider calls
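One way to enforce the 1500ms budget from Subtask 3.1 is to race the secondary attempt against a timer. The helper below is a hypothetical sketch: `withBudget` and `FallbackBudgetExceededError` are illustrative names, and it assumes the promise being raced resolves when the secondary request has started (first chunk or resolved call), not when generation finishes.

```typescript
// Hypothetical helper names; not taken from the repo.
class FallbackBudgetExceededError extends Error {
  readonly budgetMs: number;
  constructor(budgetMs: number) {
    super(`Fallback did not start within ${budgetMs}ms`);
    this.name = "FallbackBudgetExceededError";
    this.budgetMs = budgetMs;
  }
}

/** Rejects with FallbackBudgetExceededError if `work` has not settled within `budgetMs`. */
function withBudget<T>(work: Promise<T>, budgetMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new FallbackBudgetExceededError(budgetMs)), budgetMs);
  });
  // Whichever settles first wins; the timer is always cleared afterwards
  // so it cannot fire late or keep the process alive.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

A call site would look like `withBudget(run(secondary), 1500)`. Note that the race only stops waiting; it does not cancel the underlying request, so a real implementation may also want to pass an abort signal to the provider call.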
Dev Notes
Epic context
- 3.1: Redis entitlements before AI; `checkEntitlementOrThrow` stays before any provider call. Fallback does not bypass quotas.
- 3.2: `router.ts` + factory delegation delivered; primary route only. This story adds failure-driven secondary execution without changing `resolveAiRoute` semantics.
- 3.4: Host-pays billing context; fallback must not mis-attribute token usage (track against same user/host as primary attempt).
- 3.5: BYOK. When a user key is active, skip the system fallback chain (document seam only in 3.3).
Current codebase reality (READ BEFORE CODING)
| File | Current state | This story changes |
|---|---|---|
| `lib/ai/router.ts` | Sync primary `resolveAiRoute`; comments point to 3.3 | Add no HTTP here; optionally export lane types for fallback |
| `lib/ai/factory.ts` | `get*Provider` → `resolveAiRoute` + `getProviderInstance` | Call sites use `withAiProviderFallback` OR factory returns wrapper |
| `app/api/chat/route.ts` | `getChatProvider` + `streamText` | Wrap stream start with fallback |
| `lib/config.ts` | No `*_FALLBACK` keys yet | Add env fallbacks |
| Admin settings | Primary provider/model only | Add fallback fields (FR18) |
There is no lib/ai/fallback.ts today. PRD FR18 and byok-billing-patch-v3.md describe a fuller aspirational router — implement NFR-R1 + single secondary per lane, not the full patch pseudocode.
Recommended implementation shape
```ts
// lib/ai/fallback.ts (sketch: adapt to repo patterns)
export function resolveAiFallbackRoute(lane: AiFeatureLane, config: Record<string, string>): ResolvedAiRoute | null {
  const providerKey = lane === 'chat' ? 'AI_PROVIDER_CHAT_FALLBACK' : lane === 'tags' ? 'AI_PROVIDER_TAGS_FALLBACK' : 'AI_PROVIDER_EMBEDDING_FALLBACK'
  const modelKey = lane === 'chat' ? 'AI_MODEL_CHAT_FALLBACK' : lane === 'tags' ? 'AI_MODEL_TAGS_FALLBACK' : 'AI_MODEL_EMBEDDING_FALLBACK'
  const provider = pick(config, providerKey)
  if (!provider) return null
  // build ResolvedAiRoute using same validation as router (anthropic guard on embedding lane)
}

export async function withAiProviderFallback<T>(
  lane: AiFeatureLane,
  config: Record<string, string>,
  run: (provider: AIProvider) => Promise<T>,
): Promise<T> {
  const primary = getProviderForLane(lane, config) // use existing factory helpers internally
  try {
    return await run(primary)
  } catch (err) {
    if (!isRetriableProviderError(err)) throw err
    const fb = resolveAiFallbackRoute(lane, config)
    if (!fb) throw err
    const t0 = performance.now()
    const secondary = getProviderInstance(fb.providerType, config, fb.modelName, fb.embeddingModelName, fb.ollamaBaseUrl)
    try {
      const result = await run(secondary)
      logFallbackDebug({ lane, fallbackMs: performance.now() - t0, ... })
      return result
    } catch (secondaryErr) {
      throw secondaryErr // or aggregate errors
    }
  }
}
```
Streaming chat: Primary streamText may fail before body bytes; catch that error, then call streamText again with secondary.getModel(). Do not retry mid-stream after partial tokens were sent.
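The "retry at start only" rule can be illustrated without the AI SDK by priming a stream: pull the first chunk before anything is sent to the client, and fall back only if that first pull fails. This is a sketch under assumptions: `primeStream` and `streamWithFallback` are hypothetical names, streams are modelled as plain `AsyncIterable`s, and how a start failure actually surfaces from `streamText` depends on the AI SDK version in use.

```typescript
/** Pulls the first chunk so a start failure surfaces before any bytes reach the client. */
async function primeStream<T>(source: AsyncIterable<T>): Promise<AsyncIterable<T>> {
  const iterator = source[Symbol.asyncIterator]();
  const first = await iterator.next(); // rejects here if the provider fails at start

  async function* replay(): AsyncGenerator<T> {
    if (first.done) return;
    yield first.value;
    // Forward the remaining chunks as they arrive; nothing beyond the first is buffered.
    while (true) {
      const next = await iterator.next();
      if (next.done) return;
      yield next.value;
    }
  }
  return replay();
}

/** Opens the primary stream; on a retriable start failure, opens the secondary exactly once. */
async function streamWithFallback<T>(
  openPrimary: () => AsyncIterable<T>,
  openSecondary: (() => AsyncIterable<T>) | null,
  isRetriable: (err: unknown) => boolean,
): Promise<AsyncIterable<T>> {
  try {
    return await primeStream(openPrimary());
  } catch (err) {
    if (openSecondary === null || !isRetriable(err)) throw err;
    // Single retry. Errors raised later while iterating propagate to the caller untouched.
    return primeStream(openSecondary());
  }
}
```

Because only the first chunk is held back, the client still receives tokens incrementally, and a failure after partial output is never retried.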
Files — expected touch list
NEW
- `memento-note/lib/ai/fallback.ts`
- `memento-note/tests/unit/fallback.test.ts`

UPDATE
- `memento-note/lib/config.ts`: `ENV_FALLBACKS` for fallback keys
- `memento-note/app/api/chat/route.ts`: fallback-aware `streamText`
- `memento-note/app/api/ai/tags/route.ts`
- `memento-note/app/api/ai/title-suggestions/route.ts`
- `memento-note/lib/ai/services/semantic-search.service.ts` (embedding path; verify exact call site)
- `memento-note/app/(admin)/admin/settings/admin-settings-form.tsx`: optional fallback UI (FR18)
- `memento-note/locales/en.json` + `fr.json`: admin labels only if UI added

READ BEFORE MODIFY
- `memento-note/lib/ai/router.ts`: primary resolution; do not break
- `memento-note/lib/ai/factory.ts`: `getProviderInstance`, `PROVIDER_DEFAULTS`
- `memento-note/lib/entitlements.ts`: ordering vs fallback
- `memento-note/tests/unit/router.test.ts`: regression baseline
Testing standards
- Vitest; mock providers throwing controlled errors.
- Use `vi.fn()` primary fail / secondary succeed pattern.
- Timing test: secondary invoked within 1500ms of primary failure (mock instant failures).
Dev Agent Guardrails
Technical requirements
- NFR-R1 scope: Time budget covers failover decision + secondary request start, not full LLM generation latency.
- One retry only: At most one secondary attempt per user request per lane.
- Preserve Story 3.2 NFR-P3: `resolveAiRoute` / `resolveAiFallbackRoute` remain sync, no HTTP, no Redis.
- Keys: Never log secrets; debug logs use provider type and model id only.
- i18n: If admin UI adds labels, update `en.json` and `fr.json`; no hardcoded French/English in components.
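The logging rules above (gate on environment, log provider type and timing only) could look roughly like this. The field names follow AC7; the function names and the choice to pass `env` in explicitly are assumptions made so the sketch stays testable and free of Node-specific globals.

```typescript
// Hypothetical sketch: field names follow AC7, function names are illustrative.
interface FallbackLogFields {
  lane: "chat" | "tags" | "embedding";
  primaryProvider: string;
  secondaryProvider: string;
  primaryStatus: number | undefined;
  fallbackMs: number;
}

/** Debug logging is on outside production, or when MEMENTO_AI_ROUTE_DEBUG=1. */
function isFallbackDebugEnabled(env: Record<string, string | undefined>): boolean {
  return env["NODE_ENV"] !== "production" || env["MEMENTO_AI_ROUTE_DEBUG"] === "1";
}

/** Copies fields through an explicit allow-list so keys or prompts cannot ride along. */
function buildFallbackLog(input: FallbackLogFields): FallbackLogFields {
  return {
    lane: input.lane,
    primaryProvider: input.primaryProvider,
    secondaryProvider: input.secondaryProvider,
    primaryStatus: input.primaryStatus,
    fallbackMs: Math.round(input.fallbackMs),
  };
}

function logFallbackDebug(input: FallbackLogFields, env: Record<string, string | undefined>): void {
  if (!isFallbackDebugEnabled(env)) return;
  console.debug("[ai-fallback]", JSON.stringify(buildFallbackLog(input)));
}
```

Building the payload field by field, rather than spreading the input object, means an accidental extra property on the caller's side never reaches the log line.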
Architecture compliance
- Brownfield Next.js App Router; reuse `AIProvider` interface (`lib/ai/types.ts`).
- OpenRouter secondary models: slash IDs (`deepseek/deepseek-chat`) via existing `createOpenRouterProvider`.
- Embedding lane: reuse the router rule and reject `anthropic` / `anthropic_custom` for embedding fallback.

Library / framework requirements
- Reuse Vercel AI SDK error types (`import { APICallError } from 'ai'`) for status detection where applicable.
- No new HTTP client; no circuit-breaker library required for 3.3.

File structure requirements
- `fallback.ts` beside `router.ts` under `lib/ai/`.
- Named exports; match `factory.ts` / `router.ts` style.

Testing requirements
- `isRetriableProviderError` matrix test (429, 500, 401, `QuotaExceededError`).
- `resolveAiFallbackRoute` returns `null` when unset; valid route when configured.
- Integration-style unit test for `withAiProviderFallback` success path.
Previous Story Intelligence
Source: docs/3-2-custom-llm-router.md
- Central router is sync; factory delegates `getChatProvider` / `getTagsProvider` / `getEmbeddingsProvider` to `resolveAiRoute`.
- Explicit non-goal in 3.2: multi-provider HTTP fallback → this story.
- Chat logs `formatAiRouteDebug` when `MEMENTO_AI_ROUTE_DEBUG=1` or non-production.
- Extension seam for BYOK commented in `router.ts`; fallback module should accept future `skipFallback: true` when BYOK active (3.5).

Source: docs/3-1-freemium-quota-tracking.md
- Quota checks run before AI; 402 is not a provider outage.
- Redis fail-open on entitlement errors; do not conflate with provider fallback.
- Feature keys: `chat`, `semantic_search`, `auto_tag`, `auto_title`.
Git Intelligence Summary
| Commit | Insight |
|---|---|
| `41596c2` | OpenRouter key fallback `OPENROUTER_API_KEY` → `CUSTOM_OPENAI_API_KEY`; secondary route must use same `getProviderInstance` paths |
| `1fcea6e` | Recent AI/embeddings work; verify semantic search embedding call site before wrapping |
| `195e845` | Security hardening elsewhere; fallback logs must not leak prompts or keys |
Latest Technical Information
- Vercel AI SDK: `APICallError` exposes `statusCode` for HTTP classification; use for 429/5xx detection.
- OpenRouter: OpenAI-compatible errors return standard HTTP status on upstream failures; treat like other OpenAI-compatible providers.
- Default secondary suggestion (dev/staging only, not hardcoded in prod): If admin leaves fallback empty, document recommended pairing in Dev Notes (e.g. primary `openai` → secondary `deepseek` or `openrouter`) but require explicit config for production behavior.
Project Context Reference
- Epics: `docs/epics.md` (Story 3.3 + NFR-R1)
- PRD: `docs/prd.md` (FR18, NFR-R1)
- Implementation readiness: `docs/implementation-readiness-report.md` (FR18 marked missing; this story is first slice)
- Aspirational full router: `memento-note/docs/byok-billing-patch-v3.md` §2 (`executeLLM` + `PROVIDER_FALLBACK_CHAIN`; do not implement wholesale)
- Prior stories: `docs/3-1-freemium-quota-tracking.md`, `docs/3-2-custom-llm-router.md`
Dev Agent Record
Agent Model Used
Composer (Cursor)
Debug Log References
npm run test:unit -- tests/unit/fallback.test.ts tests/unit/router.test.ts— 14 passed
Completion Notes List
- Added `lib/ai/fallback.ts` with `resolveAiFallbackRoute`, `isRetriableProviderError`, `withAiProviderFallback` (1500ms budget, single secondary retry).
- Integrated hot paths: chat (`streamText`), tags, title-suggestions, embeddings (`embedding.service.ts`), `task-extract.tool.ts`.
- Admin UI: fallback provider/model per lane (tags, embeddings, chat) + i18n FR/EN.
- `skipSystemFallback` option stubbed for Story 3.5 BYOK.
- Deferred: brainstorm, agents, pptx routes (low traffic).
File List
- `memento-note/lib/ai/fallback.ts` (new)
- `memento-note/lib/config.ts`
- `memento-note/tests/unit/fallback.test.ts` (new)
- `memento-note/app/api/chat/route.ts`
- `memento-note/app/api/ai/tags/route.ts`
- `memento-note/app/api/ai/title-suggestions/route.ts`
- `memento-note/lib/ai/services/embedding.service.ts`
- `memento-note/lib/ai/tools/task-extract.tool.ts`
- `memento-note/app/(admin)/admin/settings/admin-settings-form.tsx`
- `memento-note/locales/en.json`
- `memento-note/locales/fr.json`
Change Log
- 2026-05-15: Story 3.3 implemented — provider failover on 429/5xx with per-lane admin fallback config.
- 2026-05-15: Code review — 2 decisions, 5 patches applied, 6 deferred, 5 dismissed. 29 tests passing.
Story Completion Status
- Story ID: 3.3
- Story Key: `3-3-smart-routing-fallback`
- File: `docs/3-3-smart-routing-fallback.md`
- Status: review
- Completion Note: Code review patches applied. 29 tests passing (14 fallback + 15 router).
Review Findings (2026-05-15)
Decisions — Resolved
- [D1→A] Fallback provider validation: added a `VALID_PROVIDERS` check in `resolveAiFallbackRoute` (imported from `router.ts`); throws on an unknown provider.
- [D2→A] Same-provider skip: if the fallback equals the primary, `resolveAiFallbackRoute` returns `null` (no pointless retry).
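Decisions D1 and D2 can be pictured together with the unset-key rule from AC3. The sketch below is illustrative only: the function name, the `primaryProvider` parameter, the reduced `FallbackRoute` shape, and above all the contents of `VALID_PROVIDERS` are assumptions (the real list is the one exported by `router.ts`). It also assumes "same provider" compares provider type only.

```typescript
// Hypothetical sketch: names, shapes, and the provider list are assumptions.
type AiFeatureLane = "chat" | "tags" | "embedding";

interface FallbackRoute {
  providerType: string;
  modelName: string | undefined;
}

// Illustrative subset only; not the list exported by router.ts.
const VALID_PROVIDERS: ReadonlySet<string> = new Set(["openai", "deepseek", "openrouter", "anthropic"]);

const FALLBACK_KEYS: Record<AiFeatureLane, { provider: string; model: string }> = {
  chat: { provider: "AI_PROVIDER_CHAT_FALLBACK", model: "AI_MODEL_CHAT_FALLBACK" },
  tags: { provider: "AI_PROVIDER_TAGS_FALLBACK", model: "AI_MODEL_TAGS_FALLBACK" },
  embedding: { provider: "AI_PROVIDER_EMBEDDING_FALLBACK", model: "AI_MODEL_EMBEDDING_FALLBACK" },
};

function resolveFallbackRouteSketch(
  lane: AiFeatureLane,
  config: Record<string, string | undefined>,
  primaryProvider: string,
): FallbackRoute | null {
  const keys = FALLBACK_KEYS[lane];
  const provider = (config[keys.provider] ?? "").trim();
  if (provider === "") return null; // AC3: unset means primary-only
  if (!VALID_PROVIDERS.has(provider)) {
    // D1: an unknown provider is a configuration error, not a silent no-op.
    throw new Error(`Unknown fallback provider "${provider}" for lane "${lane}"`);
  }
  if (provider === primaryProvider) return null; // D2: retrying the same provider is pointless
  const model = (config[keys.model] ?? "").trim();
  return { providerType: provider, modelName: model === "" ? undefined : model };
}
```

The function stays synchronous and touches neither HTTP nor Redis, in line with the NFR-P3 constraint carried over from Story 3.2.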
Patches — Applied
- [P1] CRITICAL: Auth bypass in `title-suggestions/route.ts`: added an early 401 return for a null session and a `.catch()` on `incrementUsageAsync`.
- [P2] `resolveAiFallbackRoute` throwing inside the catch block: `getSecondaryProvider` is now wrapped in a try/catch that returns `null` on a config error, so the primary error is preserved.
- [P3] `extractProviderErrorStatus` recursion bounded: `maxDepth` 5, returns `undefined` beyond that. Circular-cause test added.
- [P4] NFR-R1 timer moved before `getSecondaryProvider` so it measures the full failover.
- [P5] Tests added: 403 non-retriable, circular cause, nested cause, unknown provider, same-provider skip, config error preserves primary error. 14→29 tests in total.
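The bounded recursion from P3 might look like the following. This is a sketch, not the patched code: the real helper's signature is not shown in this document, and reading "maxDepth 5" as "the error itself plus at most five nested `cause` links" is an assumption.

```typescript
// Hypothetical sketch of a depth-bounded status lookup.
const MAX_CAUSE_DEPTH = 5;

/**
 * Returns the first numeric statusCode/status found on the error or its `cause` chain.
 * Inspects the error itself (depth 0) and at most five nested causes, then gives up.
 */
function extractProviderErrorStatus(err: unknown, depth = 0): number | undefined {
  if (depth > MAX_CAUSE_DEPTH) return undefined; // circular or very deep chains stop here
  if (typeof err !== "object" || err === null) return undefined;
  const candidate = err as { statusCode?: unknown; status?: unknown; cause?: unknown };
  if (typeof candidate.statusCode === "number") return candidate.statusCode;
  if (typeof candidate.status === "number") return candidate.status;
  return extractProviderErrorStatus(candidate.cause, depth + 1);
}
```

With the depth bound in place, a self-referencing `cause` ends in `undefined` instead of overflowing the stack, and the error is then treated as non-retriable.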
Deferred
- Chat mid-stream failure: by design (AC5 retries at start only)
- Ollama lane URLs absent from config.ts: cfgOnly() is intentional
- Batch embedding all-or-nothing: pre-existing
- onFinish without error handling: pre-existing
- No circuit breaker: out of scope for 3.3
- incrementUsageAsync on fail-open: by design
Dismissed
- title-suggestions/task-extract reuse the 'tags' lane: by design
- No inline 3.2 regression: separate tests
- embeddingModelName computed for non-embedding lanes: not a bug
- Mock state beyond the 2nd call: correct
- Helpers without type annotations: TypeScript infers them