
feat: design system overhaul — sidebar, AI chats, settings, brainstorm, color cleanup

- Sidebar: dynamic brand-accent colors, brainstorm section restyled
- AI chat general: popup panel with expand/collapse, hides when contextual AI open
- AI chat contextual: tabs reordered (Actions first), X close button, height fix
- Settings: all tabs restyled, 6 new color presets (sage, terracotta, iron, etc.)
- Global color cleanup: emerald/orange hardcoded → brand-accent dynamic
- Brainstorm page: orange → brand-accent throughout
- PageEntry animation component added to key pages
- Floating AI button: bg-brand-accent instead of hardcoded black
- i18n: all 15 locales updated with new AI/billing keys
- Billing: freemium quota tracking, BYOK, stripe subscription scaffolding
- Admin: integrated into new design
- AGENTS.md + CLAUDE.md project rules added

2026-05-16 12:59:30 +00:00

17 KiB


Story 3.3: Smart-Routing Fallback

Status: review

Story

As a system, I want to automatically fall back to a secondary provider when the primary fails, so that users experience zero downtime during external API outages.

Epic: Epic 3 — The SaaS Commercial Engine (Monetization & API Cost Protection)
FR coverage: FR18 (admin-configurable fallback rules), NFR-R1 (graceful degradation ≤1.5s).

Acceptance Criteria

[AC1] Retriable failures: When a primary provider call fails with HTTP 429 or 5xx (or AI SDK equivalent such as APICallError with those status codes), the system treats the failure as retriable and attempts exactly one secondary route for the same feature lane (chat | tags | embedding).
[AC2] NFR-R1 (≤1.5s): From the moment the primary failure is classified as retriable until the secondary provider accepts the request (first successful response chunk for streaming, or resolved promise for non-streaming), elapsed wall-clock time MUST be ≤ 1500ms. Measure with performance.now() in tests; log fallbackMs in debug mode.
[AC3] Per-lane secondary config: Secondary provider/model are resolved from admin/env keys mirroring primary lane keys:
- AI_PROVIDER_CHAT_FALLBACK, AI_MODEL_CHAT_FALLBACK
- AI_PROVIDER_TAGS_FALLBACK, AI_MODEL_TAGS_FALLBACK
- AI_PROVIDER_EMBEDDING_FALLBACK, AI_MODEL_EMBEDDING_FALLBACK

If the fallback provider is unset or empty, behavior is primary-only (no-op fallback, same as today).
[AC4] Single choke-point: Fallback orchestration lives in memento-note/lib/ai/fallback.ts (new). It is invoked either from getChatProvider / getTagsProvider / getEmbeddingsProvider via a thin wrapper, or from one shared withAiProviderFallback(lane, config, fn) used by all hot paths; it is not copy-pasted into every API route. resolveAiRoute() in router.ts stays synchronous and primary-only; fallback is a separate resolution using the *_FALLBACK keys.
[AC5] Chat streaming path: app/api/chat/route.ts MUST use fallback-aware execution so a failed streamText start on primary retries once on secondary before returning 502 to the client. Do not buffer entire streams for retry.
[AC6] Non-retriable errors: 4xx responses other than 429 (e.g., 400, 401, 403), validation errors, and quota errors (QuotaExceededError / HTTP 402 from entitlements) MUST NOT trigger provider fallback.
[AC7] Observability: When fallback fires, emit structured debug log { lane, primaryProvider, secondaryProvider, primaryStatus, fallbackMs } — gated by NODE_ENV !== 'production' or MEMENTO_AI_ROUTE_DEBUG=1 (same pattern as Story 3.2). Never log API keys.
[AC8] Regression: Existing resolveAiRoute unit tests and factory delegation from Story 3.2 remain green; new tests cover fallback resolution and retriable error classification.
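The retriable/non-retriable split in AC1 and AC6 can be sketched as a small classifier. This is a minimal, self-contained sketch, assuming errors expose an HTTP status as `statusCode` (the field the AI SDK's `APICallError` carries) or `status` (fetch `Response` style), possibly nested under `cause`. `QuotaExceededError` here is a local stand-in for the entitlements error, and the depth limit of 5 mirrors review patch P3. Errors with no recoverable status (for example, raw network failures) are treated as non-retriable, which is a design choice to revisit.

```typescript
// Local stand-in for the entitlements error (the real class lives in the repo).
export class QuotaExceededError extends Error {
  readonly status = 402
  constructor(message = 'Quota exceeded') {
    super(message)
    this.name = 'QuotaExceededError'
  }
}

const MAX_CAUSE_DEPTH = 5

// Returns the first numeric HTTP status found on err, err.cause, err.cause.cause, ...
// Stops after MAX_CAUSE_DEPTH hops so a circular `cause` chain cannot recurse forever.
export function extractProviderErrorStatus(err: unknown, depth = 0): number | undefined {
  if (depth > MAX_CAUSE_DEPTH || typeof err !== 'object' || err === null) return undefined
  const e = err as { statusCode?: unknown; status?: unknown; cause?: unknown }
  if (typeof e.statusCode === 'number') return e.statusCode
  if (typeof e.status === 'number') return e.status
  return extractProviderErrorStatus(e.cause, depth + 1)
}

// AC1: 429 and 5xx are retriable.
// AC6: every other status, quota errors, and status-less errors are not.
export function isRetriableProviderError(err: unknown): boolean {
  if (err instanceof QuotaExceededError) return false
  const status = extractProviderErrorStatus(err)
  if (status === undefined) return false
  return status === 429 || (status >= 500 && status <= 599)
}
```

The structural check avoids a hard dependency on the `ai` package; in the repository, AI SDK `APICallError` instances pass through it because they carry `statusCode`.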

Tasks / Subtasks

Task 1: Fallback resolution API (AC: #3, #4)
- Subtask 1.1: Add resolveAiFallbackRoute(lane, config): ResolvedAiRoute | null in fallback.ts (returns null if no fallback provider configured)
- Subtask 1.2: Register new keys in lib/config.ts ENV_FALLBACKS for all six *_FALLBACK keys
- Subtask 1.3: Optional admin fields in admin-settings-form.tsx for fallback provider + model per lane (FR18 minimal — three dropdowns + model inputs)
Task 2: Error classification (AC: #1, #6)
- Subtask 2.1: Implement isRetriableProviderError(err: unknown): boolean handling AI SDK APICallError, Response status, and provider-specific wrappers
- Subtask 2.2: Unit tests for 429, 500, 503 → true; 401, 402, 400 → false
Task 3: Execution wrapper (AC: #2, #4, #7)
- Subtask 3.1: Implement withAiProviderFallback<T>(lane, config, execute: (provider: AIProvider) => Promise<T>): Promise<T> with 1500ms budget for fallback attempt after primary failure
- Subtask 3.2: On success via secondary, call optional debug logger with timing meta
Task 4: Integrate hot paths (AC: #5, #4)
- Subtask 4.1: Chat: wrap streamText initiation in app/api/chat/route.ts (primary getChatProvider → on retriable failure → secondary provider model)
- Subtask 4.2: Tags/titles: wrap calls in app/api/ai/tags/route.ts, app/api/ai/title-suggestions/route.ts (and task-extract.tool.ts if same pattern)
- Subtask 4.3: Embeddings: wrap getEmbeddings in embedding.service.ts call site
- Subtask 4.4: Defer low-traffic paths (brainstorm, agents, pptx) to follow-up — documented below
Task 5: Tests & NFR-R1 proof (AC: #2, #8)
- Subtask 5.1: tests/unit/fallback.test.ts — resolution, classification, mocked dual-provider success under 1.5s
- Subtask 5.2: Run npm run test:unit -- tests/unit/fallback.test.ts tests/unit/router.test.ts
Task 6: Explicit non-goals (AC: #6)
- Subtask 6.1: Do not implement BYOK “no fallback” branch logic beyond a commented seam / early return stub — Story 3.5
- Subtask 6.2: Do not implement multi-hop chains (tertiary provider) or cost-sorted global PROVIDER_FALLBACK_CHAIN from byok-billing-patch-v3.md wholesale — single secondary per lane only
- Subtask 6.3: Do not fallback on Memento quota exhaustion — entitlements stay upstream of provider calls
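Task 1, together with review decisions D1 and D2, amounts to a small synchronous resolver. The sketch below is self-contained and hypothetical: `FallbackRoute` is a simplified stand-in for `ResolvedAiRoute`, `KNOWN_PROVIDERS` is an illustrative subset rather than the real `VALID_PROVIDERS` list in `router.ts`, and the primary provider is passed in explicitly (the real function would obtain it from `resolveAiRoute`). Throwing on the anthropic embedding case is assumed to mirror the router's rejection. It performs no I/O, which keeps it within the Story 3.2 NFR-P3 constraint.

```typescript
type AiFeatureLane = 'chat' | 'tags' | 'embedding'

// Simplified stand-in for ResolvedAiRoute (hypothetical shape).
interface FallbackRoute {
  lane: AiFeatureLane
  providerType: string
  modelName?: string
}

// AC3: one provider key and one model key per lane, mirroring the primary keys.
const FALLBACK_KEYS: Record<AiFeatureLane, { provider: string; model: string }> = {
  chat: { provider: 'AI_PROVIDER_CHAT_FALLBACK', model: 'AI_MODEL_CHAT_FALLBACK' },
  tags: { provider: 'AI_PROVIDER_TAGS_FALLBACK', model: 'AI_MODEL_TAGS_FALLBACK' },
  embedding: { provider: 'AI_PROVIDER_EMBEDDING_FALLBACK', model: 'AI_MODEL_EMBEDDING_FALLBACK' },
}

// Hypothetical subset; the real list is VALID_PROVIDERS in router.ts.
const KNOWN_PROVIDERS = new Set(['openai', 'openrouter', 'deepseek', 'ollama', 'anthropic', 'anthropic_custom'])
// router.ts rejects these on the embedding lane; the fallback reuses that rule.
const NO_EMBEDDINGS = new Set(['anthropic', 'anthropic_custom'])

export function resolveAiFallbackRouteSketch(
  lane: AiFeatureLane,
  config: Record<string, string | undefined>,
  primaryProvider: string,
): FallbackRoute | null {
  const keys = FALLBACK_KEYS[lane]
  const providerType = config[keys.provider]?.trim()
  if (!providerType) return null // unset or empty: primary-only behavior (AC3)
  if (!KNOWN_PROVIDERS.has(providerType)) {
    throw new Error(`Unknown fallback provider for lane "${lane}": ${providerType}`) // D1
  }
  if (lane === 'embedding' && NO_EMBEDDINGS.has(providerType)) {
    throw new Error(`Provider ${providerType} cannot serve the embedding lane`)
  }
  if (providerType === primaryProvider) return null // D2: retrying the same provider is pointless
  const modelName = config[keys.model]?.trim() || undefined
  return { lane, providerType, modelName }
}
```

Throwing on misconfiguration while returning `null` for an unset or same-provider fallback keeps the two cases distinct: a bad config is loud, a missing fallback is the AC3 no-op.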

Dev Notes

Epic context

3.1 — Redis entitlements before AI; checkEntitlementOrThrow stays before any provider call. Fallback does not bypass quotas.
3.2 — router.ts + factory delegation delivered; primary route only. This story adds failure-driven secondary execution without changing resolveAiRoute semantics.
3.4 — Host-pays billing context; fallback must not mis-attribute token usage (track against same user/host as primary attempt).
3.5 — BYOK: when user key is active, skip system fallback chain (document seam only in 3.3).
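The ordering constraints from Stories 3.1, 3.3 and 3.4 can be sketched as a request handler with injected dependencies. Everything here is hypothetical scaffolding: `AiRequestDeps`, `runWithFallback` and `recordUsage` are illustrative names, and only `checkEntitlementOrThrow` comes from the notes above. What the sketch shows is the order: quota check first, then the provider call(s), then usage attributed to the original user.

```typescript
// Local stand-in for the entitlements error.
export class QuotaExceededError extends Error {}

interface AiRequestDeps {
  checkEntitlementOrThrow(userId: string, feature: string): Promise<void>
  runWithFallback(): Promise<{ text: string; servedBy: string }>
  recordUsage(userId: string, feature: string, servedBy: string): void
}

export async function handleAiRequest(deps: AiRequestDeps, userId: string, feature: string): Promise<string> {
  // 3.1: the quota gate runs first, so a QuotaExceededError never reaches the fallback wrapper.
  await deps.checkEntitlementOrThrow(userId, feature)
  // 3.3: primary call, then at most one secondary on a retriable failure.
  const result = await deps.runWithFallback()
  // 3.4: usage is attributed to the same user whichever provider served the request.
  deps.recordUsage(userId, feature, result.servedBy)
  return result.text
}
```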

Current codebase reality (READ BEFORE CODING)

| File | Current state | This story changes |
| --- | --- | --- |
| `lib/ai/router.ts` | Sync primary `resolveAiRoute`; comments point to 3.3 | Add no HTTP here; optionally export lane types for fallback |
| `lib/ai/factory.ts` | `get*Provider` → `resolveAiRoute` + `getProviderInstance` | Call sites use `withAiProviderFallback` OR factory returns wrapper |
| `app/api/chat/route.ts` | `getChatProvider` + `streamText` | Wrap stream start with fallback |
| `lib/config.ts` | No `*_FALLBACK` keys yet | Add env fallbacks |
| Admin settings | Primary provider/model only | Add fallback fields (FR18) |

There is no lib/ai/fallback.ts today. PRD FR18 and byok-billing-patch-v3.md describe a fuller aspirational router — implement NFR-R1 + single secondary per lane, not the full patch pseudocode.

Recommended implementation shape

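// lib/ai/fallback.ts (sketch; adapt to repo patterns)
// Assumed imports: names follow the Story 3.2 modules; verify them before use.
import type { AiFeatureLane, ResolvedAiRoute } from './router'
import { getProviderInstance } from './factory'
import type { AIProvider } from './types'
// isRetriableProviderError and logFallbackDebug live in this module (not shown).

const FALLBACK_KEYS: Record<AiFeatureLane, { provider: string; model: string }> = {
  chat: { provider: 'AI_PROVIDER_CHAT_FALLBACK', model: 'AI_MODEL_CHAT_FALLBACK' },
  tags: { provider: 'AI_PROVIDER_TAGS_FALLBACK', model: 'AI_MODEL_TAGS_FALLBACK' },
  embedding: { provider: 'AI_PROVIDER_EMBEDDING_FALLBACK', model: 'AI_MODEL_EMBEDDING_FALLBACK' },
}

export function resolveAiFallbackRoute(lane: AiFeatureLane, config: Record<string, string>): ResolvedAiRoute | null {
  const { provider: providerKey, model: modelKey } = FALLBACK_KEYS[lane]
  const provider = config[providerKey]?.trim()
  if (!provider) return null // unset or empty: primary-only behavior (AC3)
  // Build the route with the same validation as router.ts (anthropic guard on the embedding lane).
  return buildRouteLikeRouter(lane, provider, config[modelKey], config) // hypothetical helper
}

export async function withAiProviderFallback<T>(
  lane: AiFeatureLane,
  config: Record<string, string>,
  run: (provider: AIProvider) => Promise<T>,
): Promise<T> {
  const primary = getProviderForLane(lane, config) // hypothetical: wraps the existing get*Provider helpers
  try {
    return await run(primary)
  } catch (err) {
    if (!isRetriableProviderError(err)) throw err // AC6: 4xx (except 429) and quota errors never fall back
    const t0 = performance.now() // start before resolving the secondary so fallbackMs covers the whole failover
    let fb: ResolvedAiRoute | null = null
    try {
      fb = resolveAiFallbackRoute(lane, config)
    } catch {
      // A bad fallback config must not mask the primary error; fb stays null.
    }
    if (!fb) throw err
    const secondary = getProviderInstance(fb.providerType, config, fb.modelName, fb.embeddingModelName, fb.ollamaBaseUrl)
    const result = await run(secondary) // one secondary attempt only; its error propagates as-is
    logFallbackDebug({ lane, secondaryProvider: fb.providerType, fallbackMs: performance.now() - t0 }) // plus primaryProvider / primaryStatus (AC7)
    return result
  }
}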

Streaming chat: Primary streamText may fail before body bytes; catch that error, then call streamText again with secondary.getModel(). Do not retry mid-stream after partial tokens were sent.
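One way to honor AC5 without buffering is to read the first chunk of the primary stream before handing anything to the client: if that read fails with a retriable error, start the secondary once; once the first chunk is out, let errors propagate. The sketch is generic over any `AsyncIterable` and assumes a provider outage surfaces as a rejection on the first read. With the AI SDK's `streamText`, verify whether start-up errors arrive that way or through another channel (for example an `onError` callback) before adopting it. `startStreamWithFallback` is a hypothetical name.

```typescript
type Primed<T> = { it: AsyncIterator<T>; first: IteratorResult<T> }

// Opens a stream and waits for its first chunk; a provider that rejects the
// request before producing any token makes this promise reject.
async function prime<T>(start: () => AsyncIterable<T>): Promise<Primed<T>> {
  const it = start()[Symbol.asyncIterator]()
  const first = await it.next()
  return { it, first }
}

export async function startStreamWithFallback<T>(
  startPrimary: () => AsyncIterable<T>,
  startSecondary: (() => AsyncIterable<T>) | null,
  isRetriable: (err: unknown) => boolean,
): Promise<AsyncIterable<T>> {
  let primed: Primed<T>
  try {
    primed = await prime(startPrimary)
  } catch (err) {
    if (!startSecondary || !isRetriable(err)) throw err
    primed = await prime(startSecondary) // exactly one secondary attempt
  }
  const { it, first } = primed
  return {
    async *[Symbol.asyncIterator]() {
      if (first.done) return
      yield first.value
      // Past the first chunk, failures reach the client unchanged: no mid-stream retry.
      for (let next = await it.next(); !next.done; next = await it.next()) {
        yield next.value
      }
    },
  }
}
```

Only one chunk is ever held back, so the client still sees tokens as they arrive; the cost is that the response headers wait for the first token.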

Files — expected touch list

NEW

memento-note/lib/ai/fallback.ts
memento-note/tests/unit/fallback.test.ts

UPDATE

memento-note/lib/config.ts — ENV_FALLBACKS for fallback keys
memento-note/app/api/chat/route.ts — fallback-aware streamText
memento-note/app/api/ai/tags/route.ts
memento-note/app/api/ai/title-suggestions/route.ts
memento-note/lib/ai/services/semantic-search.service.ts (embedding path — verify exact call site)
memento-note/app/(admin)/admin/settings/admin-settings-form.tsx — optional fallback UI (FR18)
memento-note/locales/en.json + fr.json — admin labels only if UI added

READ BEFORE MODIFY

memento-note/lib/ai/router.ts — primary resolution; do not break
memento-note/lib/ai/factory.ts — getProviderInstance, PROVIDER_DEFAULTS
memento-note/lib/entitlements.ts — ordering vs fallback
memento-note/tests/unit/router.test.ts — regression baseline

Testing standards

Vitest; mock providers throwing controlled errors.
Use vi.fn() primary fail / secondary succeed pattern.
Timing test: secondary invoked within 1500ms of primary failure (mock instant failures).
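The testing standards above (mocked instant failure, single secondary, timing under 1500 ms) can be exercised against a dependency-injected version of the wrapper. This is a hedged sketch, not the shipped `withAiProviderFallback`: providers are generic values, the classifier and secondary resolver are passed in, `withProviderFallbackSketch` is a hypothetical name, and `skipSystemFallback` stands in for the Story 3.5 seam. It starts the clock before secondary resolution (review patch P4) and rethrows the primary error when secondary resolution fails (P2). It reports the budget rather than aborting, because `run` may include the whole LLM generation, which NFR-R1 explicitly excludes.

```typescript
export interface FallbackOptions {
  budgetMs?: number // NFR-R1 budget, default 1500
  skipSystemFallback?: boolean // Story 3.5 seam: BYOK requests never use the system fallback
  onFallback?: (meta: { fallbackMs: number; withinBudget: boolean }) => void
}

export async function withProviderFallbackSketch<P, T>(
  primary: P,
  resolveSecondary: () => P | null,
  run: (provider: P) => Promise<T>,
  isRetriable: (err: unknown) => boolean,
  opts: FallbackOptions = {},
): Promise<T> {
  try {
    return await run(primary)
  } catch (primaryErr) {
    if (opts.skipSystemFallback || !isRetriable(primaryErr)) throw primaryErr
    const t0 = performance.now() // P4: start before resolving the secondary
    let secondary: P | null = null
    try {
      secondary = resolveSecondary()
    } catch {
      secondary = null // P2: a fallback config error must not hide the primary error
    }
    if (secondary === null) throw primaryErr
    // Exactly one secondary attempt; if it fails, its error propagates.
    const result = await run(secondary)
    const fallbackMs = performance.now() - t0
    opts.onFallback?.({ fallbackMs, withinBudget: fallbackMs <= (opts.budgetMs ?? 1500) })
    return result
  }
}
```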

Dev Agent Guardrails

Technical requirements

NFR-R1 scope: Time budget covers failover decision + secondary request start, not full LLM generation latency.
One retry only: At most one secondary attempt per user request per lane.
Preserve Story 3.2 NFR-P3: resolveAiRoute / resolveAiFallbackRoute remain sync, no HTTP, no Redis.
Keys: Never log secrets; debug logs use provider type and model id only.
i18n: If admin UI adds labels, update en.json and fr.json — no hardcoded French/English in components.
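AC7's gate and the "never log secrets" rule can be enforced in one place by building the log line from an allow-list. A minimal sketch, assuming the gate mirrors the Story 3.2 condition quoted in AC7; `formatFallbackDebug` and `isAiRouteDebugEnabled` are hypothetical names (the repo's existing 3.2 helper is `formatAiRouteDebug`), and `env` is passed in rather than read from `process.env` so the functions are easy to test.

```typescript
export type AiFeatureLane = 'chat' | 'tags' | 'embedding'

export interface FallbackDebugMeta {
  lane: AiFeatureLane
  primaryProvider: string
  secondaryProvider: string
  primaryStatus?: number
  fallbackMs: number
}

// Same gate as Story 3.2: on outside production, or when explicitly enabled.
export function isAiRouteDebugEnabled(env: Record<string, string | undefined>): boolean {
  return env.NODE_ENV !== 'production' || env.MEMENTO_AI_ROUTE_DEBUG === '1'
}

// Copies an explicit allow-list of fields, so extra properties on the input
// (API keys, prompts, headers) can never reach the log line.
export function formatFallbackDebug(meta: FallbackDebugMeta): string {
  const safe = {
    lane: meta.lane,
    primaryProvider: meta.primaryProvider,
    secondaryProvider: meta.secondaryProvider,
    primaryStatus: meta.primaryStatus,
    fallbackMs: Math.round(meta.fallbackMs),
  }
  return `[ai-fallback] ${JSON.stringify(safe)}`
}

export function logFallbackDebug(
  meta: FallbackDebugMeta,
  env: Record<string, string | undefined>,
  log: (line: string) => void = console.debug,
): void {
  if (isAiRouteDebugEnabled(env)) log(formatFallbackDebug(meta))
}
```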

Architecture compliance

Brownfield Next.js App Router; reuse AIProvider interface (lib/ai/types.ts).
OpenRouter secondary models: slash IDs (deepseek/deepseek-chat) via existing createOpenRouterProvider.
Embedding lane: reuse router rule — reject anthropic / anthropic_custom for embedding fallback.

Library / framework requirements

Reuse Vercel AI SDK error types (import { APICallError } from 'ai') for status detection where applicable.
No new HTTP client; no circuit-breaker library required for 3.3.

File structure requirements

fallback.ts beside router.ts under lib/ai/.
Named exports; match factory.ts / router.ts style.

Testing requirements

isRetriableProviderError matrix test (429, 500, 401, QuotaExceededError).
resolveAiFallbackRoute returns null when unset; valid route when configured.
Integration-style unit test for withAiProviderFallback success path.

Previous Story Intelligence

Source: docs/3-2-custom-llm-router.md

Central router is sync; factory delegates getChatProvider / getTagsProvider / getEmbeddingsProvider to resolveAiRoute.
Explicit non-goal in 3.2: multi-provider HTTP fallback → this story.
Chat logs formatAiRouteDebug when MEMENTO_AI_ROUTE_DEBUG=1 or non-production.
Extension seam for BYOK commented in router.ts — fallback module should accept future skipFallback: true when BYOK active (3.5).

Source: docs/3-1-freemium-quota-tracking.md

Quota checks run before AI; 402 is not a provider outage.
Redis fail-open on entitlement errors — do not conflate with provider fallback.
Feature keys: chat, semantic_search, auto_tag, auto_title.
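Quota feature keys and routing lanes are separate vocabularies: the entitlement check uses the feature key, while fallback resolution uses the lane. A hypothetical mapping, assuming `semantic_search` runs on the embedding lane and that `auto_title` shares the tags lane (as the dismissed review finding on title-suggestions states):

```typescript
export type QuotaFeature = 'chat' | 'semantic_search' | 'auto_tag' | 'auto_title'
export type AiFeatureLane = 'chat' | 'tags' | 'embedding'

// Hypothetical mapping from quota features (Story 3.1) to routing lanes (Stories 3.2/3.3).
export const FEATURE_LANE: Readonly<Record<QuotaFeature, AiFeatureLane>> = {
  chat: 'chat',
  semantic_search: 'embedding', // assumption: semantic search spends quota on embeddings
  auto_tag: 'tags',
  auto_title: 'tags', // title suggestions reuse the tags lane by design
}
```

Typing the table as a `Record` over the feature union makes the compiler flag any new quota feature that has no lane assigned.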

Git Intelligence Summary

| Commit | Insight |
| --- | --- |
| `41596c2` | OpenRouter key fallback `OPENROUTER_API_KEY` → `CUSTOM_OPENAI_API_KEY`; the secondary route must use the same `getProviderInstance` paths |
| `1fcea6e` | Recent AI/embeddings work; verify the semantic search embedding call site before wrapping |
| `195e845` | Security hardening elsewhere; fallback logs must not leak prompts or keys |

Latest Technical Information

Vercel AI SDK: APICallError exposes statusCode for HTTP classification; use for 429/5xx detection.
OpenRouter: OpenAI-compatible errors return standard HTTP status on upstream failures; treat like other OpenAI-compatible providers.
Default secondary suggestion (dev/staging only, not hardcoded in prod): if the admin leaves the fallback empty, document a recommended pairing in Dev Notes (e.g., primary openai → secondary deepseek or openrouter), but require explicit config for production behavior.

Project Context Reference

Epics: docs/epics.md — Story 3.3 + NFR-R1
PRD: docs/prd.md — FR18, NFR-R1
Implementation readiness: docs/implementation-readiness-report.md — FR18 marked missing; this story is first slice
Aspirational full router: memento-note/docs/byok-billing-patch-v3.md §2 — executeLLM + PROVIDER_FALLBACK_CHAIN; do not implement wholesale
Prior stories: docs/3-1-freemium-quota-tracking.md, docs/3-2-custom-llm-router.md

Dev Agent Record

Agent Model Used

Composer (Cursor)

Debug Log References

npm run test:unit -- tests/unit/fallback.test.ts tests/unit/router.test.ts — 14 passed

Completion Notes List

Added lib/ai/fallback.ts with resolveAiFallbackRoute, isRetriableProviderError, withAiProviderFallback (1500ms budget, single secondary retry).
Integrated hot paths: chat (streamText), tags, title-suggestions, embeddings (embedding.service.ts), task-extract.tool.ts.
Admin UI: fallback provider/model per lane (tags, embeddings, chat) + i18n FR/EN.
skipSystemFallback option stubbed for Story 3.5 BYOK.
Deferred: brainstorm, agents, pptx routes (low traffic).

File List

memento-note/lib/ai/fallback.ts (new)
memento-note/lib/config.ts
memento-note/tests/unit/fallback.test.ts (new)
memento-note/app/api/chat/route.ts
memento-note/app/api/ai/tags/route.ts
memento-note/app/api/ai/title-suggestions/route.ts
memento-note/lib/ai/services/embedding.service.ts
memento-note/lib/ai/tools/task-extract.tool.ts
memento-note/app/(admin)/admin/settings/admin-settings-form.tsx
memento-note/locales/en.json
memento-note/locales/fr.json

Change Log

2026-05-15: Story 3.3 implemented — provider failover on 429/5xx with per-lane admin fallback config.
2026-05-15: Code review — 2 decisions, 5 patches applied, 6 deferred, 5 dismissed. 29 tests passing.

Story Completion Status

Story ID: 3.3
Story Key: 3-3-smart-routing-fallback
File: docs/3-3-smart-routing-fallback.md
Status: review
Completion Note: Code review patches applied. 29 tests passing (14 fallback + 15 router).

Review Findings (2026-05-15)

Decisions — Resolved

[D1→A] Fallback provider validation: added a VALID_PROVIDERS check in resolveAiFallbackRoute (imported from router.ts); throws on an unknown provider.
[D2→A] Same-provider skip: if the fallback equals the primary, resolveAiFallbackRoute returns null (no pointless retry).

Patches — Applied

[P1] CRITICAL: auth bypass in title-suggestions/route.ts. Added an early 401 return for a null session, plus .catch() on incrementUsageAsync.
[P2] resolveAiFallbackRoute threw inside the catch block: getSecondaryProvider is now wrapped in a try/catch that returns null on a config error, so the primary error is preserved.
[P3] Bounded recursion in extractProviderErrorStatus: maxDepth 5, returns undefined beyond that. Added a circular-cause test.
[P4] NFR-R1 timer moved before getSecondaryProvider, so the full failover is measured.
[P5] Tests added: 403 non-retriable, circular cause, nested cause, unknown provider, same-provider skip, config error preserves the primary error. 14→29 tests in total.

Deferred

Chat mid-stream failure: by design (AC5 retries at stream start only).
Per-lane Ollama URLs missing from config.ts: cfgOnly() is intentional.
Batch embedding is all-or-nothing: pre-existing.
onFinish without error handling: pre-existing.
No circuit breaker: out of scope for 3.3.
incrementUsageAsync on fail-open: by design.

Dismissed

title-suggestions/task-extract reuse the 'tags' lane: by design.
No inline 3.2 regression tests: they live in a separate test file.
embeddingModelName computed for non-embedding lanes: not a bug.
Mock state beyond the second call: correct.
Helpers without type annotations: TypeScript infers the types.
