Antigravity bd495be965
2026-05-16 12:59:30 +00:00

Story 3.3: Smart-Routing Fallback

Status: review

Story

As a system, I want to automatically fall back to a secondary provider when the primary fails, so that users experience zero downtime during external API outages.

Epic: Epic 3 — The SaaS Commercial Engine (Monetization & API Cost Protection)
FR coverage: FR18 (admin-configurable fallback rules), NFR-R1 (graceful degradation ≤1.5s).


Acceptance Criteria

  1. [AC1] Retriable failures: When a primary provider call fails with HTTP 429 or 5xx (or AI SDK equivalent such as APICallError with those status codes), the system treats the failure as retriable and attempts exactly one secondary route for the same feature lane (chat | tags | embedding).
  2. [AC2] NFR-R1 (≤1.5s): From the moment the primary failure is classified as retriable until the secondary provider accepts the request (first successful response chunk for streaming, or resolved promise for non-streaming), elapsed wall-clock time MUST be ≤ 1500ms. Measure with performance.now() in tests; log fallbackMs in debug mode.
  3. [AC3] Per-lane secondary config: Secondary provider/model are resolved from admin/env keys mirroring primary lane keys:
    • AI_PROVIDER_CHAT_FALLBACK, AI_MODEL_CHAT_FALLBACK
    • AI_PROVIDER_TAGS_FALLBACK, AI_MODEL_TAGS_FALLBACK
    • AI_PROVIDER_EMBEDDING_FALLBACK, AI_MODEL_EMBEDDING_FALLBACK
    If the fallback provider is unset or empty, behavior is primary-only (no-op fallback, same as today).
  4. [AC4] Single choke-point: Fallback orchestration lives in memento-note/lib/ai/fallback.ts (new). It is invoked either from getChatProvider / getTagsProvider / getEmbeddingsProvider via a thin wrapper, or from one shared withAiProviderFallback(lane, config, fn) used by all hot paths. It must not be copy-pasted into every API route. resolveAiRoute() in router.ts stays synchronous and primary-only; the fallback route is resolved separately from the *_FALLBACK keys.
  5. [AC5] Chat streaming path: app/api/chat/route.ts MUST use fallback-aware execution so a failed streamText start on primary retries once on secondary before returning 502 to the client. Do not buffer entire streams for retry.
  6. [AC6] Non-retriable errors: 4xx other than 429 (401, 403, 400), validation errors, and quota errors (QuotaExceededError / HTTP 402 from entitlements) MUST NOT trigger provider fallback.
  7. [AC7] Observability: When fallback fires, emit structured debug log { lane, primaryProvider, secondaryProvider, primaryStatus, fallbackMs } — gated by NODE_ENV !== 'production' or MEMENTO_AI_ROUTE_DEBUG=1 (same pattern as Story 3.2). Never log API keys.
  8. [AC8] Regression: Existing resolveAiRoute unit tests and factory delegation from Story 3.2 remain green; new tests cover fallback resolution and retriable error classification.
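The classification rule in AC1 and AC6 can be written as a small pure function. The sketch below makes these assumptions:

- A provider error carries its HTTP status as statusCode (as the AI SDK's APICallError does) or as status, possibly nested under cause.
- The cause walk stops after five levels, matching the bound recorded in the review findings.
- QuotaExceededError is a hypothetical stand-in for the repo's entitlement error.

Production code could also use APICallError.isInstance from 'ai' before falling back to this duck-typed check.

```typescript
// Hypothetical stand-in for the repo's entitlement error class.
class QuotaExceededError extends Error {
  constructor(message = 'quota exceeded') {
    super(message)
    this.name = 'QuotaExceededError'
  }
}

// Bounded so a circular `cause` chain cannot recurse forever.
const MAX_CAUSE_DEPTH = 5

// Returns the first numeric HTTP status found on the error or its `cause` chain.
function extractProviderErrorStatus(err: unknown, depth = 0): number | undefined {
  if (depth > MAX_CAUSE_DEPTH || typeof err !== 'object' || err === null) return undefined
  const e = err as { statusCode?: unknown; status?: unknown; cause?: unknown }
  if (typeof e.statusCode === 'number') return e.statusCode // APICallError-style field
  if (typeof e.status === 'number') return e.status // Response-like wrappers
  return extractProviderErrorStatus(e.cause, depth + 1)
}

// AC1: 429 and 5xx are retriable. AC6: quota errors, every other status,
// and errors without a recognizable status are not.
function isRetriableProviderError(err: unknown): boolean {
  if (err instanceof Error && err.name === 'QuotaExceededError') return false
  const status = extractProviderErrorStatus(err)
  if (status === undefined) return false // unknown shape: do not guess
  return status === 429 || (status >= 500 && status <= 599)
}
```

Treating "no recognizable status" as non-retriable is a deliberate choice: it keeps fallback limited to the outage signals the ACs name.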

Tasks / Subtasks

  • Task 1: Fallback resolution API (AC: #3, #4)
    • Subtask 1.1: Add resolveAiFallbackRoute(lane, config): ResolvedAiRoute | null in fallback.ts (returns null if no fallback provider configured)
    • Subtask 1.2: Register new keys in lib/config.ts ENV_FALLBACKS for all six *_FALLBACK keys
    • Subtask 1.3: Optional admin fields in admin-settings-form.tsx for fallback provider + model per lane (FR18 minimal — three dropdowns + model inputs)
  • Task 2: Error classification (AC: #1, #6)
    • Subtask 2.1: Implement isRetriableProviderError(err: unknown): boolean handling AI SDK APICallError, Response status, and provider-specific wrappers
    • Subtask 2.2: Unit tests for 429, 500, 503 → true; 401, 402, 400 → false
  • Task 3: Execution wrapper (AC: #2, #4, #7)
    • Subtask 3.1: Implement withAiProviderFallback<T>(lane, config, execute: (provider: AIProvider) => Promise<T>): Promise<T> with 1500ms budget for fallback attempt after primary failure
    • Subtask 3.2: On success via secondary, call optional debug logger with timing meta
  • Task 4: Integrate hot paths (AC: #5, #4)
    • Subtask 4.1: Chat: wrap streamText initiation in app/api/chat/route.ts (primary getChatProvider → on retriable failure → secondary provider model)
    • Subtask 4.2: Tags/titles: wrap calls in app/api/ai/tags/route.ts, app/api/ai/title-suggestions/route.ts (and task-extract.tool.ts if same pattern)
    • Subtask 4.3: Embeddings: wrap getEmbeddings in embedding.service.ts call site
    • Subtask 4.4: Defer low-traffic paths (brainstorm, agents, pptx) to follow-up — documented below
  • Task 5: Tests & NFR-R1 proof (AC: #2, #8)
    • Subtask 5.1: tests/unit/fallback.test.ts — resolution, classification, mocked dual-provider success under 1.5s
    • Subtask 5.2: Run npm run test:unit -- tests/unit/fallback.test.ts tests/unit/router.test.ts
  • Task 6: Explicit non-goals (AC: #6)
    • Subtask 6.1: Do not implement BYOK “no fallback” branch logic beyond a commented seam / early return stub — Story 3.5
    • Subtask 6.2: Do not implement multi-hop chains (tertiary provider) or cost-sorted global PROVIDER_FALLBACK_CHAIN from byok-billing-patch-v3.md wholesale — single secondary per lane only
    • Subtask 6.3: Do not fallback on Memento quota exhaustion — entitlements stay upstream of provider calls
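Subtask 1.1 and the later review decisions D1/D2 can be sketched as one synchronous resolver. The following are illustrative stand-ins, not the repo's code:

- the AiFeatureLane and ResolvedAiRoute shapes;
- the VALID_PROVIDERS list, whose real version lives in router.ts;
- the primary key names (AI_PROVIDER_CHAT and so on), assumed to mirror the fallback keys without the _FALLBACK suffix.

```typescript
// Illustrative stand-ins for types exported by router.ts.
type AiFeatureLane = 'chat' | 'tags' | 'embedding'

interface ResolvedAiRoute {
  lane: AiFeatureLane
  providerType: string
  modelName?: string
}

// Hypothetical provider list; the real one is VALID_PROVIDERS in router.ts.
const VALID_PROVIDERS = new Set(['openai', 'anthropic', 'anthropic_custom', 'openrouter', 'deepseek', 'ollama'])

const LANE_SUFFIX: Record<AiFeatureLane, string> = { chat: 'CHAT', tags: 'TAGS', embedding: 'EMBEDDING' }

// Sync, no HTTP, no Redis: reads only the config map it is given.
function resolveAiFallbackRoute(
  lane: AiFeatureLane,
  config: Record<string, string | undefined>,
): ResolvedAiRoute | null {
  const suffix = LANE_SUFFIX[lane]
  const providerType = config[`AI_PROVIDER_${suffix}_FALLBACK`]?.trim()
  if (!providerType) return null // unset/empty: primary-only (AC3)
  if (!VALID_PROVIDERS.has(providerType)) {
    throw new Error(`Unknown fallback provider for ${lane}: ${providerType}`) // D1
  }
  if (lane === 'embedding' && (providerType === 'anthropic' || providerType === 'anthropic_custom')) {
    throw new Error('Anthropic providers cannot serve the embedding lane') // same rule as the router
  }
  // D2: retrying the provider that just failed is pointless (primary key name assumed).
  if (providerType === config[`AI_PROVIDER_${suffix}`]?.trim()) return null
  const modelName = config[`AI_MODEL_${suffix}_FALLBACK`]?.trim() || undefined
  return { lane, providerType, modelName }
}
```

Throwing on invalid config, rather than silently returning null, keeps misconfiguration visible. The executor can still catch the error and preserve the primary failure.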

Dev Notes

Epic context

  • 3.1 — Redis entitlements before AI; checkEntitlementOrThrow stays before any provider call. Fallback does not bypass quotas.
  • 3.2: router.ts + factory delegation delivered; primary route only. This story adds failure-driven secondary execution without changing resolveAiRoute semantics.
  • 3.4 — Host-pays billing context; fallback must not mis-attribute token usage (track against same user/host as primary attempt).
  • 3.5 — BYOK: when user key is active, skip system fallback chain (document seam only in 3.3).

Current codebase reality (READ BEFORE CODING)

| File | Current state | This story changes |
| --- | --- | --- |
| lib/ai/router.ts | Sync primary resolveAiRoute; comments point to 3.3 | Add no HTTP here; optionally export lane types for fallback |
| lib/ai/factory.ts | get*Provider → resolveAiRoute + getProviderInstance | Call sites use withAiProviderFallback OR factory returns wrapper |
| app/api/chat/route.ts | getChatProvider + streamText | Wrap stream start with fallback |
| lib/config.ts | No *_FALLBACK keys yet | Add env fallbacks |
| Admin settings | Primary provider/model only | Add fallback fields (FR18) |

There is no lib/ai/fallback.ts today. PRD FR18 and byok-billing-patch-v3.md describe a fuller aspirational router — implement NFR-R1 + single secondary per lane, not the full patch pseudocode.

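// lib/ai/fallback.ts (sketch — adapt to repo patterns)
// Assumed imports: exact export names and signatures in router.ts, factory.ts and types.ts may differ.
import { getProviderInstance } from './factory'
import { resolveAiRoute, type AiFeatureLane, type ResolvedAiRoute } from './router'
import type { AIProvider } from './types'

// isRetriableProviderError, extractProviderErrorStatus and logFallbackDebug also live in this module.

const FALLBACK_KEYS: Record<AiFeatureLane, [providerKey: string, modelKey: string]> = {
  chat: ['AI_PROVIDER_CHAT_FALLBACK', 'AI_MODEL_CHAT_FALLBACK'],
  tags: ['AI_PROVIDER_TAGS_FALLBACK', 'AI_MODEL_TAGS_FALLBACK'],
  embedding: ['AI_PROVIDER_EMBEDDING_FALLBACK', 'AI_MODEL_EMBEDDING_FALLBACK'],
}

export function resolveAiFallbackRoute(lane: AiFeatureLane, config: Record<string, string>): ResolvedAiRoute | null {
  const [providerKey, modelKey] = FALLBACK_KEYS[lane]
  const provider = config[providerKey]?.trim()
  if (!provider) return null // unset/empty: primary-only (AC3)
  // Build the route with the same validation as router.ts (anthropic guard on embedding lane).
  // buildFallbackRoute is a hypothetical helper, not an existing export.
  return buildFallbackRoute(lane, provider, config[modelKey]?.trim(), config)
}

export async function withAiProviderFallback<T>(
  lane: AiFeatureLane,
  config: Record<string, string>,
  run: (provider: AIProvider) => Promise<T>,
): Promise<T> {
  // Same primary path as get*Provider: resolveAiRoute + getProviderInstance (signature assumed).
  const primaryRoute = resolveAiRoute(lane, config)
  const primary = getProviderInstance(primaryRoute.providerType, config, primaryRoute.modelName, primaryRoute.embeddingModelName, primaryRoute.ollamaBaseUrl)
  try {
    return await run(primary)
  } catch (primaryErr) {
    if (!isRetriableProviderError(primaryErr)) throw primaryErr // AC6
    const t0 = performance.now() // AC2: clock starts once the failure is classified as retriable
    let fb: ResolvedAiRoute | null = null
    let secondary: AIProvider | null = null
    try {
      fb = resolveAiFallbackRoute(lane, config)
      if (fb) secondary = getProviderInstance(fb.providerType, config, fb.modelName, fb.embeddingModelName, fb.ollamaBaseUrl)
    } catch {
      // A broken fallback config must not mask the primary outage.
    }
    if (!fb || !secondary) throw primaryErr
    const result = await run(secondary) // exactly one secondary attempt (AC1); its error propagates
    logFallbackDebug({
      lane,
      primaryProvider: primaryRoute.providerType,
      secondaryProvider: fb.providerType,
      primaryStatus: extractProviderErrorStatus(primaryErr),
      fallbackMs: performance.now() - t0,
    })
    return result
  }
}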

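Streaming chat: Primary streamText may fail before any body bytes are sent. In recent AI SDK versions, streamText reports provider errors through the stream (and its onError callback) rather than throwing at the call site. So detect the failure by awaiting the first chunk before forwarding anything, then call streamText again with secondary.getModel(). Do not retry mid-stream after partial tokens were sent.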
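The "retry only before the first byte" rule can be sketched with plain async iterables. The helpers work as follows:

- peekFirst pulls one chunk so that start-up failures surface immediately, then replays that chunk ahead of the rest of the stream.
- startWithFallback retries once on the secondary only if the primary fails before its first chunk.

The sketch assumes each provider stream is adapted to an AsyncIterable that throws on failure. startPrimary and startSecondary are hypothetical factories, not AI SDK APIs.

```typescript
// Pulls the first chunk (surfacing start-up errors), then replays it ahead of the rest.
async function peekFirst<T>(source: AsyncIterable<T>): Promise<AsyncIterable<T>> {
  const it = source[Symbol.asyncIterator]()
  const first = await it.next() // rejects here if the provider refused the request
  return {
    async *[Symbol.asyncIterator]() {
      if (first.done) return
      yield first.value
      // Errors after this point are mid-stream failures: they propagate, no retry.
      while (true) {
        const next = await it.next()
        if (next.done) return
        yield next.value
      }
    },
  }
}

// Commits to the primary stream only once it has produced its first chunk.
async function startWithFallback<T>(
  startPrimary: () => AsyncIterable<T>,
  startSecondary: () => AsyncIterable<T>,
  isRetriable: (err: unknown) => boolean,
): Promise<AsyncIterable<T>> {
  try {
    // `await` keeps a start-up rejection inside this try block.
    return await peekFirst(startPrimary())
  } catch (err) {
    if (!isRetriable(err)) throw err
    return peekFirst(startSecondary()) // exactly one retry, before anything reached the client
  }
}
```

Nothing is buffered beyond the single peeked chunk, which keeps AC5's "do not buffer entire streams" constraint intact.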

Files — expected touch list

NEW

  • memento-note/lib/ai/fallback.ts
  • memento-note/tests/unit/fallback.test.ts

UPDATE

  • memento-note/lib/config.ts (ENV_FALLBACKS for fallback keys)
  • memento-note/app/api/chat/route.ts — fallback-aware streamText
  • memento-note/app/api/ai/tags/route.ts
  • memento-note/app/api/ai/title-suggestions/route.ts
  • memento-note/lib/ai/services/semantic-search.service.ts (embedding path — verify exact call site)
  • memento-note/app/(admin)/admin/settings/admin-settings-form.tsx — optional fallback UI (FR18)
  • memento-note/locales/en.json + fr.json — admin labels only if UI added

READ BEFORE MODIFY

  • memento-note/lib/ai/router.ts — primary resolution; do not break
  • memento-note/lib/ai/factory.ts (getProviderInstance, PROVIDER_DEFAULTS)
  • memento-note/lib/entitlements.ts — ordering vs fallback
  • memento-note/tests/unit/router.test.ts — regression baseline

Testing standards

  • Vitest; mock providers throwing controlled errors.
  • Use vi.fn() primary fail / secondary succeed pattern.
  • Timing test: secondary invoked within 1500ms of primary failure (mock instant failures).
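The primary-fail / secondary-succeed pattern and the timing check can be sketched without Vitest. A hand-rolled call counter stands in for vi.fn(). The names are hypothetical:

- fakeProvider builds a provider whose calls are counted.
- runWithFallback is a simplified withAiProviderFallback that takes providers directly instead of resolving them from config.

The skipSystemFallback option mirrors the Story 3.5 BYOK seam mentioned in the completion notes.

```typescript
// Minimal fake provider with a spy-like call counter (stand-in for vi.fn()).
interface FakeProvider {
  name: string
  calls: number
  generate: (prompt: string) => Promise<string>
}

function fakeProvider(name: string, impl: (prompt: string) => Promise<string>): FakeProvider {
  const p: FakeProvider = {
    name,
    calls: 0,
    generate: async (prompt) => {
      p.calls++
      return impl(prompt)
    },
  }
  return p
}

interface FallbackOutcome<T> {
  value: T
  usedSecondary: boolean
  fallbackMs?: number
}

// Simplified stand-in for withAiProviderFallback: providers are injected, not resolved.
async function runWithFallback<T>(
  primary: FakeProvider,
  secondary: FakeProvider | null,
  run: (provider: FakeProvider) => Promise<T>,
  isRetriable: (err: unknown) => boolean,
  opts: { skipSystemFallback?: boolean } = {}, // Story 3.5 BYOK seam
): Promise<FallbackOutcome<T>> {
  try {
    return { value: await run(primary), usedSecondary: false }
  } catch (err) {
    if (opts.skipSystemFallback || !secondary || !isRetriable(err)) throw err
    const t0 = performance.now() // clock starts once the failure is classified
    const value = await run(secondary) // single attempt; its error propagates
    return { value, usedSecondary: true, fallbackMs: performance.now() - t0 }
  }
}
```

In the real suite, the counters would be vi.fn() mocks and the checks would be expect(...) assertions. Because the mocks fail and succeed instantly, the 1500ms bound tests the orchestration overhead, not network latency.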

Dev Agent Guardrails

Technical requirements

  • NFR-R1 scope: Time budget covers failover decision + secondary request start, not full LLM generation latency.
  • One retry only: At most one secondary attempt per user request per lane.
  • Preserve Story 3.2 NFR-P3: resolveAiRoute / resolveAiFallbackRoute remain sync, no HTTP, no Redis.
  • Keys: Never log secrets; debug logs use provider type and model id only.
  • i18n: If admin UI adds labels, update en.json and fr.json — no hardcoded French/English in components.
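A debug logger that meets AC7 and the key-safety rule can build its payload from an explicit field list. Anything else on the input, such as an API key or a prompt, is then never serialized. The following are assumptions made for testability, not the repo's code:

- the environment is passed in as a parameter;
- the function returns the emitted line;
- the logFallbackDebug signature shown here;
- the console.debug sink.

```typescript
interface FallbackDebugMeta {
  lane: 'chat' | 'tags' | 'embedding'
  primaryProvider: string
  secondaryProvider: string
  primaryStatus?: number
  fallbackMs: number
}

interface DebugEnv {
  NODE_ENV?: string
  MEMENTO_AI_ROUTE_DEBUG?: string
}

// Same gate as the Story 3.2 route debug: non-production, or explicit opt-in.
function isAiRouteDebugEnabled(env: DebugEnv): boolean {
  return env.NODE_ENV !== 'production' || env.MEMENTO_AI_ROUTE_DEBUG === '1'
}

function logFallbackDebug(
  meta: FallbackDebugMeta,
  env: DebugEnv,
  sink: (line: string) => void = console.debug,
): string | null {
  if (!isAiRouteDebugEnabled(env)) return null
  // Explicit whitelist: extra properties on `meta` (keys, prompts) are never serialized.
  const line = JSON.stringify({
    lane: meta.lane,
    primaryProvider: meta.primaryProvider,
    secondaryProvider: meta.secondaryProvider,
    primaryStatus: meta.primaryStatus,
    fallbackMs: Math.round(meta.fallbackMs),
  })
  sink(line)
  return line
}
```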

Architecture compliance

  • Brownfield Next.js App Router; reuse AIProvider interface (lib/ai/types.ts).
  • OpenRouter secondary models: slash IDs (deepseek/deepseek-chat) via existing createOpenRouterProvider.
  • Embedding lane: reuse router rule — reject anthropic / anthropic_custom for embedding fallback.

Library / framework requirements

  • Reuse Vercel AI SDK error types (import { APICallError } from 'ai') for status detection where applicable.
  • No new HTTP client; no circuit-breaker library required for 3.3.

File structure requirements

  • fallback.ts beside router.ts under lib/ai/.
  • Named exports; match factory.ts / router.ts style.

Testing requirements

  • isRetriableProviderError matrix test (429, 500, 401, QuotaExceededError).
  • resolveAiFallbackRoute returns null when unset; valid route when configured.
  • Integration-style unit test for withAiProviderFallback success path.

Previous Story Intelligence

Source: docs/3-2-custom-llm-router.md

  • Central router is sync; factory delegates getChatProvider / getTagsProvider / getEmbeddingsProvider to resolveAiRoute.
  • Explicit non-goal in 3.2: multi-provider HTTP fallback → this story.
  • Chat logs formatAiRouteDebug when MEMENTO_AI_ROUTE_DEBUG=1 or non-production.
  • Extension seam for BYOK commented in router.ts — fallback module should accept future skipFallback: true when BYOK active (3.5).

Source: docs/3-1-freemium-quota-tracking.md

  • Quota checks run before AI; 402 is not a provider outage.
  • Redis fail-open on entitlement errors — do not conflate with provider fallback.
  • Feature keys: chat, semantic_search, auto_tag, auto_title.
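The ordering inherited from Story 3.1 (entitlements first, provider fallback second) can be sketched as a wrapper in which a quota failure short-circuits before any provider call. The feature-to-lane table is inferred from this document: semantic search uses embeddings, and title suggestions reuse the tags lane. runAiFeature, checkEntitlement and execute are hypothetical stand-ins for checkEntitlementOrThrow and withAiProviderFallback.

```typescript
type FeatureKey = 'chat' | 'semantic_search' | 'auto_tag' | 'auto_title'
type AiFeatureLane = 'chat' | 'tags' | 'embedding'

// Inferred mapping (assumption): auto_title shares the tags lane with auto_tag.
const FEATURE_LANE: Record<FeatureKey, AiFeatureLane> = {
  chat: 'chat',
  semantic_search: 'embedding',
  auto_tag: 'tags',
  auto_title: 'tags',
}

// Hypothetical stand-in for the entitlement error (HTTP 402, not a provider outage).
class QuotaExceededError extends Error {
  readonly status = 402
  constructor() {
    super('quota exceeded')
    this.name = 'QuotaExceededError'
  }
}

async function runAiFeature<T>(
  feature: FeatureKey,
  checkEntitlement: (feature: FeatureKey) => Promise<void>, // throws QuotaExceededError
  execute: (lane: AiFeatureLane) => Promise<T>, // fallback-aware executor
): Promise<T> {
  await checkEntitlement(feature) // a 402 stops here: no provider call, no fallback
  return execute(FEATURE_LANE[feature])
}
```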

Git Intelligence Summary

| Commit | Insight |
| --- | --- |
| 41596c2 | OpenRouter key fallback OPENROUTER_API_KEY → CUSTOM_OPENAI_API_KEY; secondary route must use same getProviderInstance paths |
| 1fcea6e | Recent AI/embeddings work; verify semantic search embedding call site before wrapping |
| 195e845 | Security hardening elsewhere; fallback logs must not leak prompts or keys |

Latest Technical Information

  • Vercel AI SDK: APICallError exposes statusCode for HTTP classification; use for 429/5xx detection.
  • OpenRouter: OpenAI-compatible errors return standard HTTP status on upstream failures; treat like other OpenAI-compatible providers.
  • Default secondary suggestion (dev/staging only, not hardcoded in prod): If admin leaves fallback empty, document recommended pairing in Dev Notes (e.g. primary openai → secondary deepseek or openrouter) but require explicit config for production behavior.

Project Context Reference

  • Epics: docs/epics.md — Story 3.3 + NFR-R1
  • PRD: docs/prd.md — FR18, NFR-R1
  • Implementation readiness: docs/implementation-readiness-report.md — FR18 marked missing; this story is first slice
  • Aspirational full router: memento-note/docs/byok-billing-patch-v3.md §2 — executeLLM + PROVIDER_FALLBACK_CHAIN; do not implement wholesale
  • Prior stories: docs/3-1-freemium-quota-tracking.md, docs/3-2-custom-llm-router.md

Dev Agent Record

Agent Model Used

Composer (Cursor)

Debug Log References

  • npm run test:unit -- tests/unit/fallback.test.ts tests/unit/router.test.ts — 14 passed

Completion Notes List

  • Added lib/ai/fallback.ts with resolveAiFallbackRoute, isRetriableProviderError, withAiProviderFallback (1500ms budget, single secondary retry).
  • Integrated hot paths: chat (streamText), tags, title-suggestions, embeddings (embedding.service.ts), task-extract.tool.ts.
  • Admin UI: fallback provider/model per lane (tags, embeddings, chat) + i18n FR/EN.
  • skipSystemFallback option stubbed for Story 3.5 BYOK.
  • Deferred: brainstorm, agents, pptx routes (low traffic).

File List

  • memento-note/lib/ai/fallback.ts (new)
  • memento-note/lib/config.ts
  • memento-note/tests/unit/fallback.test.ts (new)
  • memento-note/app/api/chat/route.ts
  • memento-note/app/api/ai/tags/route.ts
  • memento-note/app/api/ai/title-suggestions/route.ts
  • memento-note/lib/ai/services/embedding.service.ts
  • memento-note/lib/ai/tools/task-extract.tool.ts
  • memento-note/app/(admin)/admin/settings/admin-settings-form.tsx
  • memento-note/locales/en.json
  • memento-note/locales/fr.json

Change Log

  • 2026-05-15: Story 3.3 implemented — provider failover on 429/5xx with per-lane admin fallback config.
  • 2026-05-15: Code review — 2 decisions, 5 patches applied, 6 deferred, 5 dismissed. 29 tests passing.

Story Completion Status

  • Story ID: 3.3
  • Story Key: 3-3-smart-routing-fallback
  • File: docs/3-3-smart-routing-fallback.md
  • Status: review
  • Completion Note: Code review patches applied. 29 tests passing (14 fallback + 15 router).

Review Findings (2026-05-15)

Decisions — Resolved

  • [D1→A] Fallback provider validation: added a VALID_PROVIDERS check in resolveAiFallbackRoute (imported from router.ts); throws on an unknown provider.
  • [D2→A] Same-provider skip: if the fallback equals the primary, resolveAiFallbackRoute returns null (no pointless retry).

Patches — Applied

  • [P1] CRITICAL: auth bypass in title-suggestions/route.ts. Added an early 401 return for a null session, plus a .catch() on incrementUsageAsync.
  • [P2] resolveAiFallbackRoute throwing inside the catch block: getSecondaryProvider is now wrapped in a try/catch that returns null on a config error, so the primary error is preserved.
  • [P3] Bounded recursion in extractProviderErrorStatus: maxDepth 5, returns undefined beyond that. Added a circular-cause test.
  • [P4] NFR-R1 timer moved before getSecondaryProvider, so the full failover is measured.
  • [P5] Tests added: 403 non-retriable, circular cause, nested cause, unknown provider, same-provider skip, config error preserves primary error. 14→29 tests in total.

Deferred

  • Chat mid-stream failure: by design (AC5 retries at stream start only)
  • Ollama lane URLs missing from config.ts: cfgOnly() is intentional
  • Batch embedding all-or-nothing: pre-existing
  • onFinish without error handling: pre-existing
  • No circuit breaker: out of scope for 3.3
  • incrementUsageAsync on fail-open: by design

Dismissed

  • title-suggestions/task-extract reuse the 'tags' lane: by design
  • No inline 3.2 regression tests: covered by separate tests
  • embeddingModelName computed for non-embedding lanes: not a bug
  • Mock state beyond the 2nd call: correct
  • Helpers without type annotations: TypeScript infers the types