# Story 3.3: Smart-Routing Fallback

Status: review

<!-- Ultimate context engine analysis completed - comprehensive developer guide created -->

## Story

As a system,
I want to automatically fall back to a secondary provider when the primary fails,
so that users experience zero downtime during external API outages.

**Epic:** Epic 3 - The SaaS Commercial Engine (Monetization & API Cost Protection)
**FR coverage:** FR18 (admin-configurable fallback rules), NFR-R1 (graceful degradation ≤1.5s).

---

## Acceptance Criteria

1. [AC1] **Retriable failures:** When a primary provider call fails with HTTP **429** or **5xx** (or an AI SDK equivalent such as `APICallError` with those status codes), the system treats the failure as retriable and attempts exactly **one** secondary route for the same feature lane (`chat` | `tags` | `embedding`).
2. [AC2] **NFR-R1 (≤1.5s):** From the moment the primary failure is classified as retriable until the secondary provider accepts the request (first successful response chunk for streaming, or resolved promise for non-streaming), elapsed wall-clock time MUST be **≤ 1500ms**. Measure with `performance.now()` in tests; log `fallbackMs` in debug mode.
3. [AC3] **Per-lane secondary config:** Secondary provider/model are resolved from admin/env keys mirroring the primary lane keys:
   - `AI_PROVIDER_CHAT_FALLBACK`, `AI_MODEL_CHAT_FALLBACK`
   - `AI_PROVIDER_TAGS_FALLBACK`, `AI_MODEL_TAGS_FALLBACK`
   - `AI_PROVIDER_EMBEDDING_FALLBACK`, `AI_MODEL_EMBEDDING_FALLBACK`

   If the fallback provider is unset or empty, behavior is **primary-only** (no-op fallback, same as today).
4. [AC4] **Single choke-point:** Fallback orchestration lives in **`memento-note/lib/ai/fallback.ts`** (new) and is invoked from **`getChatProvider` / `getTagsProvider` / `getEmbeddingsProvider`** via a thin wrapper OR from one shared `withAiProviderFallback(lane, config, fn)` used by all hot paths. It is **not** copy-pasted into every API route. `resolveAiRoute()` in `router.ts` stays synchronous and primary-only; fallback is a **separate** resolve using `*_FALLBACK` keys.
5. [AC5] **Chat streaming path:** `app/api/chat/route.ts` MUST use fallback-aware execution so a failed `streamText` start on primary retries once on secondary before returning 502 to the client. Do not buffer entire streams for retry.
6. [AC6] **Non-retriable errors:** **4xx** other than 429 (e.g. 400, 401, 403), validation errors, and **quota errors** (`QuotaExceededError` / HTTP 402 from entitlements) MUST **not** trigger provider fallback.
7. [AC7] **Observability:** When fallback fires, emit a structured debug log `{ lane, primaryProvider, secondaryProvider, primaryStatus, fallbackMs }`, gated by `NODE_ENV !== 'production'` or `MEMENTO_AI_ROUTE_DEBUG=1` (same pattern as Story 3.2). Never log API keys.
8. [AC8] **Regression:** Existing `resolveAiRoute` unit tests and factory delegation from Story 3.2 remain green; new tests cover fallback resolution and retriable error classification.

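The retriable versus non-retriable split in AC1 and AC6 can be sketched as a small classifier. This is a minimal sketch, not the shipped `fallback.ts`: it reads the status by duck-typing `statusCode` or `status` rather than importing `APICallError`, the `QuotaExceededError` class is a local stand-in for the entitlement error, and treating errors that carry no status as non-retriable is an assumption the story does not spell out.

```typescript
// Local stand-in (assumption) for the entitlement error from Story 3.1.
class QuotaExceededError extends Error {
  readonly status = 402;
}

// Reads an HTTP status from the two common shapes: `statusCode`
// (as exposed by the AI SDK's APICallError) or `status` (as on a Response).
function readStatus(err: unknown): number | undefined {
  if (typeof err !== "object" || err === null) return undefined;
  const candidate = err as { statusCode?: unknown; status?: unknown };
  if (typeof candidate.statusCode === "number") return candidate.statusCode;
  if (typeof candidate.status === "number") return candidate.status;
  return undefined;
}

// AC1: 429 and 5xx are retriable. AC6: other 4xx and quota errors are not.
function isRetriableProviderError(err: unknown): boolean {
  if (err instanceof QuotaExceededError) return false;
  const status = readStatus(err);
  if (status === undefined) return false; // assumption: no status means no fallback
  return status === 429 || (status >= 500 && status <= 599);
}
```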
---

## Tasks / Subtasks

- [x] Task 1: Fallback resolution API (AC: #3, #4)
  - [x] Subtask 1.1: Add `resolveAiFallbackRoute(lane, config): ResolvedAiRoute | null` in `fallback.ts` (returns `null` if no fallback provider configured)
  - [x] Subtask 1.2: Register new keys in `lib/config.ts` `ENV_FALLBACKS` for all six `*_FALLBACK` keys
  - [x] Subtask 1.3: Optional admin fields in `admin-settings-form.tsx` for fallback provider + model per lane (FR18 minimal: three dropdowns + model inputs)
- [x] Task 2: Error classification (AC: #1, #6)
  - [x] Subtask 2.1: Implement `isRetriableProviderError(err: unknown): boolean` handling AI SDK `APICallError`, `Response` status, and provider-specific wrappers
  - [x] Subtask 2.2: Unit tests for 429, 500, 503 → true; 401, 402, 400 → false
- [x] Task 3: Execution wrapper (AC: #2, #4, #7)
  - [x] Subtask 3.1: Implement `withAiProviderFallback<T>(lane, config, execute: (provider: AIProvider) => Promise<T>): Promise<T>` with 1500ms budget for fallback attempt after primary failure
  - [x] Subtask 3.2: On success via secondary, call optional debug logger with timing meta
- [x] Task 4: Integrate hot paths (AC: #5, #4)
  - [x] Subtask 4.1: **Chat:** wrap `streamText` initiation in `app/api/chat/route.ts` (primary `getChatProvider` → on retriable failure → secondary provider model)
  - [x] Subtask 4.2: **Tags/titles:** wrap calls in `app/api/ai/tags/route.ts`, `app/api/ai/title-suggestions/route.ts` (and `task-extract.tool.ts` if same pattern)
  - [x] Subtask 4.3: **Embeddings:** wrap `getEmbeddings` in `embedding.service.ts` call site
  - [x] Subtask 4.4: Defer low-traffic paths (brainstorm, agents, pptx) to a follow-up (documented below)
- [x] Task 5: Tests & NFR-R1 proof (AC: #2, #8)
  - [x] Subtask 5.1: `tests/unit/fallback.test.ts`: resolution, classification, mocked dual-provider success under 1.5s
  - [x] Subtask 5.2: Run `npm run test:unit -- tests/unit/fallback.test.ts tests/unit/router.test.ts`
- [x] Task 6: Explicit non-goals (AC: #6)
  - [x] Subtask 6.1: Do **not** implement BYOK “no fallback” branch logic beyond a commented seam / early return stub (deferred to Story **3.5**)
  - [x] Subtask 6.2: Do **not** implement multi-hop chains (tertiary provider) or cost-sorted global `PROVIDER_FALLBACK_CHAIN` from `byok-billing-patch-v3.md` wholesale; single secondary per lane only
  - [x] Subtask 6.3: Do **not** fall back on Memento quota exhaustion; entitlements stay upstream of provider calls

---

## Dev Notes

### Epic context

- **3.1**: Redis entitlements before AI; `checkEntitlementOrThrow` stays **before** any provider call. Fallback does not bypass quotas.
- **3.2**: `router.ts` + factory delegation delivered; primary route only. This story adds **failure-driven** secondary execution without changing `resolveAiRoute` semantics.
- **3.4**: Host-pays billing context; fallback must not mis-attribute token usage (track against same user/host as primary attempt).
- **3.5**: BYOK. When a user key is active, skip the system fallback chain (document the seam only in 3.3).

### Current codebase reality (READ BEFORE CODING)

| File | Current state | This story changes |
|------|---------------|-------------------|
| `lib/ai/router.ts` | Sync primary `resolveAiRoute`; comments point to 3.3 | Add **no** HTTP here; optionally export lane types for fallback |
| `lib/ai/factory.ts` | `get*Provider` → `resolveAiRoute` + `getProviderInstance` | Call sites use `withAiProviderFallback` OR factory returns wrapper |
| `app/api/chat/route.ts` | `getChatProvider` + `streamText` | Wrap stream start with fallback |
| `lib/config.ts` | No `*_FALLBACK` keys yet | Add env fallbacks |
| Admin settings | Primary provider/model only | Add fallback fields (FR18) |

**There is no `lib/ai/fallback.ts` today.** PRD FR18 and `byok-billing-patch-v3.md` describe a fuller aspirational router; implement **NFR-R1 + single secondary per lane**, not the full patch pseudocode.

### Recommended implementation shape

```typescript
// lib/ai/fallback.ts (sketch; adapt to repo patterns)

export function resolveAiFallbackRoute(lane: AiFeatureLane, config: Record<string, string>): ResolvedAiRoute | null {
  const providerKey = lane === 'chat' ? 'AI_PROVIDER_CHAT_FALLBACK' : lane === 'tags' ? 'AI_PROVIDER_TAGS_FALLBACK' : 'AI_PROVIDER_EMBEDDING_FALLBACK'
  const modelKey = lane === 'chat' ? 'AI_MODEL_CHAT_FALLBACK' : lane === 'tags' ? 'AI_MODEL_TAGS_FALLBACK' : 'AI_MODEL_EMBEDDING_FALLBACK'
  const provider = pick(config, providerKey)
  if (!provider) return null // no fallback configured: primary-only behavior
  // build and return a ResolvedAiRoute (model read via modelKey) using the same
  // validation as the router (anthropic guard on embedding lane)
}

export async function withAiProviderFallback<T>(
  lane: AiFeatureLane,
  config: Record<string, string>,
  run: (provider: AIProvider) => Promise<T>,
): Promise<T> {
  const primary = getProviderForLane(lane, config) // use existing factory helpers internally
  try {
    return await run(primary)
  } catch (err) {
    if (!isRetriableProviderError(err)) throw err
    // AC2: the NFR-R1 clock starts once the failure is classified as retriable
    const t0 = performance.now()
    const fb = resolveAiFallbackRoute(lane, config)
    if (!fb) throw err // no secondary configured: surface the primary error
    const secondary = getProviderInstance(fb.providerType, config, fb.modelName, fb.embeddingModelName, fb.ollamaBaseUrl)
    // Single retry only; a secondary failure propagates to the caller (or aggregate both errors)
    const result = await run(secondary)
    logFallbackDebug({ lane, fallbackMs: performance.now() - t0 /* plus the remaining AC7 fields */ })
    return result
  }
}
```

**Streaming chat:** The primary `streamText` call may fail before any body bytes are sent; catch that error, then call `streamText` again with `secondary.getModel()`. Do not retry mid-stream once partial tokens have been sent.

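The "retry at stream start only" rule can be illustrated with a generic async generator. This is a sketch under stated assumptions: `StartStream` and `streamWithStartFallback` are hypothetical names, and the stream is modelled as a plain `AsyncIterable<string>` rather than the AI SDK's `streamText` result, because how `streamText` surfaces start-up errors depends on the SDK version in use.

```typescript
// Hypothetical shape: a function that starts a stream of text chunks.
type StartStream = () => AsyncIterable<string>;

// Yields chunks from `primary`. If it throws before the first chunk and the
// error is retriable, switches once to `secondary`. After any chunk has been
// yielded, errors propagate unchanged (no mid-stream retry, no buffering).
async function* streamWithStartFallback(
  primary: StartStream,
  secondary: StartStream | null,
  isRetriable: (err: unknown) => boolean,
): AsyncGenerator<string> {
  let sentAny = false;
  try {
    for await (const chunk of primary()) {
      sentAny = true;
      yield chunk;
    }
  } catch (err) {
    if (sentAny || secondary === null || !isRetriable(err)) throw err;
    // Single secondary attempt; its own errors are not retried.
    yield* secondary();
  }
}
```

Because chunks are forwarded as they arrive, nothing is buffered for replay, which is consistent with the AC5 constraint that entire streams must not be held back for retry.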
### Files: expected touch list

**NEW**

- `memento-note/lib/ai/fallback.ts`
- `memento-note/tests/unit/fallback.test.ts`

**UPDATE**

- `memento-note/lib/config.ts`: `ENV_FALLBACKS` for fallback keys
- `memento-note/app/api/chat/route.ts`: fallback-aware `streamText`
- `memento-note/app/api/ai/tags/route.ts`
- `memento-note/app/api/ai/title-suggestions/route.ts`
- `memento-note/lib/ai/services/semantic-search.service.ts` (embedding path; verify exact call site)
- `memento-note/app/(admin)/admin/settings/admin-settings-form.tsx`: optional fallback UI (FR18)
- `memento-note/locales/en.json` + `fr.json`: admin labels only if UI added

**READ BEFORE MODIFY**

- `memento-note/lib/ai/router.ts`: primary resolution; do not break
- `memento-note/lib/ai/factory.ts`: `getProviderInstance`, `PROVIDER_DEFAULTS`
- `memento-note/lib/entitlements.ts`: ordering vs fallback
- `memento-note/tests/unit/router.test.ts`: regression baseline

### Testing standards

- Vitest; mock providers throwing controlled errors.
- Use `vi.fn()` primary fail / secondary succeed pattern.
- Timing test: secondary invoked within 1500ms of primary failure (mock instant failures).

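The "primary fails, secondary succeeds" timing check can be sketched without a test framework. Everything here is illustrative: `runWithFallback` is a simplified stand-in for the real wrapper, `makeRun` is a hand-rolled mock standing in for what a Vitest suite would usually express with `vi.fn()`, and the clock is injected (an assumption, the story itself measures with `performance.now()`) so elapsed time is deterministic.

```typescript
type Provider = { name: string };

interface FallbackResult<T> {
  value: T;
  usedFallback: boolean;
  fallbackMs: number; // 0 when the primary succeeded
}

// Simplified stand-in for the wrapper under test, with an injectable clock.
async function runWithFallback<T>(
  primary: Provider,
  secondary: Provider | null,
  run: (p: Provider) => Promise<T>,
  isRetriable: (err: unknown) => boolean,
  now: () => number,
): Promise<FallbackResult<T>> {
  try {
    return { value: await run(primary), usedFallback: false, fallbackMs: 0 };
  } catch (err) {
    if (secondary === null || !isRetriable(err)) throw err;
    const t0 = now(); // clock starts after the failure is classified
    const value = await run(secondary);
    return { value, usedFallback: true, fallbackMs: now() - t0 };
  }
}

// Hand-rolled mock: rejects for the primary, resolves for the secondary,
// and records the order in which providers were called.
function makeRun(calls: string[]): (p: Provider) => Promise<string> {
  return async (p: Provider): Promise<string> => {
    calls.push(p.name);
    if (p.name === "primary") throw { statusCode: 503 };
    return "ok";
  };
}
```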
---

## Dev Agent Guardrails

### Technical requirements

- **NFR-R1 scope:** Time budget covers **failover decision + secondary request start**, not full LLM generation latency.
- **One retry only:** At most one secondary attempt per user request per lane.
- **Preserve Story 3.2 NFR-P3:** `resolveAiRoute` / `resolveAiFallbackRoute` remain sync, no HTTP, no Redis.
- **Keys:** Never log secrets; debug logs use provider **type** and model id only.
- **i18n:** If admin UI adds labels, update `en.json` and `fr.json`; no hardcoded French/English in components.

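The logging rules above (gate on environment, log provider type only, never secrets) might look roughly like the following. This is a hedged sketch: `FallbackDebugMeta`, `isFallbackDebugEnabled` and `formatFallbackDebug` are illustrative names, the environment is passed in as a plain object rather than read from `process.env`, and the function returns the log line instead of writing it so the behaviour is easy to assert.

```typescript
interface FallbackDebugMeta {
  lane: "chat" | "tags" | "embedding";
  primaryProvider: string; // provider type only, e.g. "openai"
  secondaryProvider: string;
  primaryStatus: number | undefined;
  fallbackMs: number;
}

type Env = Record<string, string | undefined>;

// Same gate as described for Story 3.2: non-production, or explicit opt-in.
function isFallbackDebugEnabled(env: Env): boolean {
  return env.NODE_ENV !== "production" || env.MEMENTO_AI_ROUTE_DEBUG === "1";
}

// Returns the JSON line to log, or null when logging is gated off.
function formatFallbackDebug(meta: FallbackDebugMeta, env: Env): string | null {
  if (!isFallbackDebugEnabled(env)) return null;
  // Pick the allowed fields explicitly so extra properties never reach the log.
  const { lane, primaryProvider, secondaryProvider, primaryStatus, fallbackMs } = meta;
  return JSON.stringify({
    lane,
    primaryProvider,
    secondaryProvider,
    primaryStatus,
    fallbackMs: Math.round(fallbackMs),
  });
}
```

Destructuring the five fields explicitly, rather than serialising the whole object, is what keeps stray properties such as an API key out of the log line.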
### Architecture compliance

- Brownfield Next.js App Router; reuse `AIProvider` interface (`lib/ai/types.ts`).
- OpenRouter secondary models: slash IDs (`deepseek/deepseek-chat`) via existing `createOpenRouterProvider`.
- Embedding lane: reuse the router rule and **reject** `anthropic` / `anthropic_custom` for embedding fallback.

### Library / framework requirements

- Reuse Vercel AI SDK error types (`import { APICallError } from 'ai'`) for status detection where applicable.
- No new HTTP client; no circuit-breaker library required for 3.3.

### File structure requirements

- `fallback.ts` beside `router.ts` under `lib/ai/`.
- Named exports; match `factory.ts` / `router.ts` style.

### Testing requirements

- `isRetriableProviderError` matrix test (429, 500, 401, QuotaExceededError).
- `resolveAiFallbackRoute` returns `null` when unset; valid route when configured.
- Integration-style unit test for `withAiProviderFallback` success path.

---

## Previous Story Intelligence

**Source:** `docs/3-2-custom-llm-router.md`

- Central router is **sync**; factory delegates `getChatProvider` / `getTagsProvider` / `getEmbeddingsProvider` to `resolveAiRoute`.
- Explicit non-goal in 3.2: multi-provider HTTP fallback → **this story**.
- Chat logs `formatAiRouteDebug` when `MEMENTO_AI_ROUTE_DEBUG=1` or non-production.
- Extension seam for BYOK commented in `router.ts`; the fallback module should accept a future `skipFallback: true` when BYOK is active (3.5).

**Source:** `docs/3-1-freemium-quota-tracking.md`

- Quota checks run **before** AI; 402 is not a provider outage.
- Redis fail-open on entitlement errors; do not conflate with provider fallback.
- Feature keys: `chat`, `semantic_search`, `auto_tag`, `auto_title`.

---

## Git Intelligence Summary

| Commit | Insight |
|--------|---------|
| `41596c2` | OpenRouter key fallback `OPENROUTER_API_KEY` → `CUSTOM_OPENAI_API_KEY`; secondary route must use same `getProviderInstance` paths |
| `1fcea6e` | Recent AI/embeddings work; verify semantic search embedding call site before wrapping |
| `195e845` | Security hardening elsewhere; fallback logs must not leak prompts or keys |

---

## Latest Technical Information

- **Vercel AI SDK:** `APICallError` exposes `statusCode` for HTTP classification; use for 429/5xx detection.
- **OpenRouter:** OpenAI-compatible errors return standard HTTP status on upstream failures; treat like other OpenAI-compatible providers.
- **Default secondary suggestion (dev/staging only, not hardcoded in prod):** If the admin leaves fallback empty, document a recommended pairing in Dev Notes (e.g. primary `openai` → secondary `deepseek` or `openrouter`) but **require explicit config** for production behavior.

---

## Project Context Reference

- **Epics:** `docs/epics.md` (Story 3.3 + NFR-R1)
- **PRD:** `docs/prd.md` (FR18, NFR-R1)
- **Implementation readiness:** `docs/implementation-readiness-report.md` (FR18 marked missing; this story is the first slice)
- **Aspirational full router:** `memento-note/docs/byok-billing-patch-v3.md` §2 (`executeLLM` + `PROVIDER_FALLBACK_CHAIN`); **do not implement wholesale**
- **Prior stories:** `docs/3-1-freemium-quota-tracking.md`, `docs/3-2-custom-llm-router.md`

---

## Dev Agent Record

### Agent Model Used

Composer (Cursor)

### Debug Log References

- `npm run test:unit -- tests/unit/fallback.test.ts tests/unit/router.test.ts`: 14 passed

### Completion Notes List

- Added `lib/ai/fallback.ts` with `resolveAiFallbackRoute`, `isRetriableProviderError`, `withAiProviderFallback` (1500ms budget, single secondary retry).
- Integrated hot paths: chat (`streamText`), tags, title-suggestions, embeddings (`embedding.service.ts`), `task-extract.tool.ts`.
- Admin UI: fallback provider/model per lane (tags, embeddings, chat) + i18n FR/EN.
- `skipSystemFallback` option stubbed for Story 3.5 BYOK.
- Deferred: brainstorm, agents, pptx routes (low traffic).

### File List

- `memento-note/lib/ai/fallback.ts` (new)
- `memento-note/lib/config.ts`
- `memento-note/tests/unit/fallback.test.ts` (new)
- `memento-note/app/api/chat/route.ts`
- `memento-note/app/api/ai/tags/route.ts`
- `memento-note/app/api/ai/title-suggestions/route.ts`
- `memento-note/lib/ai/services/embedding.service.ts`
- `memento-note/lib/ai/tools/task-extract.tool.ts`
- `memento-note/app/(admin)/admin/settings/admin-settings-form.tsx`
- `memento-note/locales/en.json`
- `memento-note/locales/fr.json`

### Change Log

- 2026-05-15: Story 3.3 implemented. Provider failover on 429/5xx with per-lane admin fallback config.
- 2026-05-15: Code review. 2 decisions, 5 patches applied, 6 deferred, 5 dismissed. 29 tests passing.

---

## Story Completion Status

- Story ID: 3.3
- Story Key: `3-3-smart-routing-fallback`
- File: `docs/3-3-smart-routing-fallback.md`
- Status: **review**
- Completion Note: Code review patches applied. 29 tests passing (14 fallback + 15 router).

---

### Review Findings (2026-05-15)

#### Decisions: Resolved

- [x] [D1→A] Fallback provider validation: added a `VALID_PROVIDERS` check in `resolveAiFallbackRoute` (imported from `router.ts`); throws on an unknown provider.
- [x] [D2→A] Same-provider skip: if the fallback equals the primary, `resolveAiFallbackRoute` returns `null` (no pointless retry).

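Taken together with AC3 and the embedding-lane rule, the two decisions above suggest a resolution order along these lines. This is a minimal sketch, not the shipped function: the provider list is an assumed placeholder for the real `VALID_PROVIDERS` in `router.ts`, `FallbackRoute` is a reduced stand-in for `ResolvedAiRoute`, comparing on provider type alone for the same-provider skip is an assumption, and whether the embedding guard throws or returns `null` is not stated in the story.

```typescript
type Lane = "chat" | "tags" | "embedding";

// Reduced stand-in (assumption) for the repo's ResolvedAiRoute.
interface FallbackRoute {
  providerType: string;
  modelName: string | undefined;
}

// Assumed list for illustration only; the real one is exported by router.ts.
const VALID_PROVIDERS = ["openai", "anthropic", "anthropic_custom", "openrouter", "deepseek", "ollama"];

const LANE_KEYS: Record<Lane, { provider: string; model: string }> = {
  chat: { provider: "AI_PROVIDER_CHAT_FALLBACK", model: "AI_MODEL_CHAT_FALLBACK" },
  tags: { provider: "AI_PROVIDER_TAGS_FALLBACK", model: "AI_MODEL_TAGS_FALLBACK" },
  embedding: { provider: "AI_PROVIDER_EMBEDDING_FALLBACK", model: "AI_MODEL_EMBEDDING_FALLBACK" },
};

function resolveFallbackRoute(
  lane: Lane,
  config: Record<string, string | undefined>,
  primaryProvider: string,
): FallbackRoute | null {
  const providerType = (config[LANE_KEYS[lane].provider] ?? "").trim();
  if (providerType === "") return null; // AC3: unset means primary-only
  if (VALID_PROVIDERS.indexOf(providerType) === -1) {
    throw new Error(`Unknown fallback provider: ${providerType}`); // D1
  }
  if (lane === "embedding" && (providerType === "anthropic" || providerType === "anthropic_custom")) {
    throw new Error("Anthropic providers cannot serve the embedding lane");
  }
  if (providerType === primaryProvider) return null; // D2: same provider, skip retry
  const modelName = (config[LANE_KEYS[lane].model] ?? "").trim() || undefined;
  return { providerType, modelName };
}
```

The function stays synchronous and touches neither HTTP nor Redis, in line with the NFR-P3 guardrail.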
#### Patches: Applied

- [x] [P1] CRITICAL: auth bypass in `title-suggestions/route.ts`. Added an early 401 return for a null session, plus `.catch()` on `incrementUsageAsync`.
- [x] [P2] `resolveAiFallbackRoute` throwing inside the catch block: `getSecondaryProvider` is now wrapped in a try/catch that returns `null` on a config error, so the primary error is preserved.
- [x] [P3] `extractProviderErrorStatus` recursion bounded: `maxDepth` 5, `undefined` beyond that. Circular-cause test added.
- [x] [P4] NFR-R1 timer moved before `getSecondaryProvider` so it measures the complete failover.
- [x] [P5] Tests added: 403 non-retriable, circular cause, nested cause, unknown provider, same-provider skip, config error preserves primary error. 14→29 tests in total.

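The bounded recursion described in P3 can be sketched as follows. This is an illustration rather than the patched code: the property names checked (`statusCode`, `status`, `cause`) and the exact way depth is counted are assumptions; only the `maxDepth` of 5 and the `undefined` result beyond it come from the review notes.

```typescript
const MAX_CAUSE_DEPTH = 5;

// Walks the `cause` chain looking for an HTTP status. The depth bound makes
// circular or very deep cause chains terminate with `undefined`.
function extractProviderErrorStatus(err: unknown, depth: number = 0): number | undefined {
  if (depth >= MAX_CAUSE_DEPTH) return undefined;
  if (typeof err !== "object" || err === null) return undefined;
  const e = err as { statusCode?: unknown; status?: unknown; cause?: unknown };
  if (typeof e.statusCode === "number") return e.statusCode;
  if (typeof e.status === "number") return e.status;
  return extractProviderErrorStatus(e.cause, depth + 1);
}
```

A depth counter is used here instead of a visited set because it also caps the work done on long, non-circular chains.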
#### Deferred

- Chat mid-stream failure: by design (AC5 retries at stream start only)
- Ollama lane URLs absent from `config.ts`: `cfgOnly()` is intentional
- Batch embedding is all-or-nothing: pre-existing
- `onFinish` without error handling: pre-existing
- No circuit breaker: out of scope for 3.3
- `incrementUsageAsync` on fail-open: by design

#### Dismissed

- title-suggestions/task-extract reuse the `tags` lane: by design
- No inline 3.2 regression: tests are separate
- `embeddingModelName` computed for non-embedding lanes: not a bug
- Mock state beyond the 2nd call: correct
- Helpers without type annotations: TypeScript infers them