# Story 3.2: Custom LLM Router & OpenRouter Integration
Status: review
<!-- Ultimate context engine analysis completed - comprehensive developer guide created -->
## Story
As a system,
I want to route AI prompts across many providers with OpenRouter as the unified aggregation layer where appropriate,
so that we have flexibility in API fulfillment without vendor lock-in.
**Epic:** Epic 3 — The SaaS Commercial Engine (Monetization & API Cost Protection)
**FR coverage:** FR17 (dynamic provider switching; aggregation via OpenRouter for long-tail models).
---
## Acceptance Criteria
1. [AC1] **Central router module:** Introduce `memento-note/lib/ai/router.ts` that owns **resolution** of which backing gateway (`ProviderType` from `factory.ts`) and **model identifiers** (including OpenRouter `vendor/model` slugs) to use per **feature lane** (`chat` | `tags` | `embedding`). Resolution must be **pure synchronous configuration composition** on top of existing `getSystemConfig()` / env keys — **no outbound HTTP** inside the router's resolve step.
2. [AC2] **NFR-P3 (≤50ms routing):** From intercept (entry to router resolve) through returning a ready-to-call `AIProvider` instance, wall-clock time MUST stay **under 50ms** on warm server paths (document measurement approach: `performance.now()` around resolve only in tests or dev-only logging behind `NODE_ENV !== 'production'`).
3. [AC3] **OpenRouter path:** When admin configures `AI_PROVIDER_CHAT`, `AI_PROVIDER_TAGS`, or `AI_PROVIDER_EMBEDDING` as `openrouter`, requests MUST use `CustomOpenAIProvider` against `https://openrouter.ai/api/v1` with models expressed as OpenRouter IDs (e.g. `openai/gpt-4o-mini`, `deepseek/deepseek-chat`). Honor existing env fallback: `OPENROUTER_API_KEY` then `CUSTOM_OPENAI_API_KEY` (already in `createOpenRouterProvider`).
4. [AC4] **Single choke-point:** `getChatProvider`, `getTagsProvider`, and `getEmbeddingsProvider` in `lib/ai/factory.ts` MUST delegate provider construction to the router (or shared resolver invoked by both router and factory) so future stories (3.3 fallback, 3.5 BYOK) attach in one place. **Do not** leave three divergent code paths that each interpret env/admin config differently.
5. [AC5] **Regression safety:** All existing AI call sites that already use `getChatProvider` / `getTagsProvider` / `getEmbeddingsProvider` continue to work without signature changes (backward-compatible exports).
6. [AC6] **Observability hook:** Router exposes a minimal structured log or debug field (e.g. `{ lane, providerType, modelId, resolveMs }`) usable from one representative path (e.g. `app/api/chat/route.ts`) so operators can verify routing choices — gated so production logs are not noisy (debug flag or `NODE_ENV`).
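To make AC1 concrete, here is a minimal sketch of a pure, synchronous resolve step. It assumes one provider key and one model key per lane and deliberately omits the fallback chain that `factory.ts` applies today; the `LANE_KEYS` table, the `pick` helper, and the exact shape of `ResolvedAiRoute` are illustrative assumptions, not the shipped implementation.

```typescript
// Illustrative sketch only: the real router must mirror factory.ts precedence.
type AiFeatureLane = "chat" | "tags" | "embedding";

interface ResolvedAiRoute {
  providerType: string;
  modelName: string;
  meta: { lane: AiFeatureLane };
}

// Config keys named in this story, grouped per lane (grouping is an assumption).
const LANE_KEYS: Record<AiFeatureLane, { provider: string; model: string }> = {
  chat: { provider: "AI_PROVIDER_CHAT", model: "AI_MODEL_CHAT" },
  tags: { provider: "AI_PROVIDER_TAGS", model: "AI_MODEL_TAGS" },
  embedding: { provider: "AI_PROVIDER_EMBEDDING", model: "AI_MODEL_EMBEDDING" },
};

// Returns the trimmed value for a key, or undefined when missing or blank.
function pick(
  config: Record<string, string | null | undefined>,
  key: string
): string | undefined {
  const value = config[key];
  if (value == null) return undefined;
  const trimmed = value.trim();
  return trimmed === "" ? undefined : trimmed;
}

// Pure and synchronous: reads the config object, performs no I/O, mutates nothing.
function resolveAiRoute(
  lane: AiFeatureLane,
  config: Record<string, string | null | undefined>
): ResolvedAiRoute {
  const keys = LANE_KEYS[lane];
  const provider = pick(config, keys.provider);
  if (provider === undefined) {
    throw new Error(`No AI provider configured for lane "${lane}"`);
  }
  const modelName = pick(config, keys.model);
  if (modelName === undefined) {
    throw new Error(`No AI model configured for lane "${lane}"`);
  }
  return { providerType: provider.toLowerCase(), modelName, meta: { lane } };
}
```

Because the function only reads its arguments, it can be unit-tested with plain object literals and no Redis, database, or network access.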
---
## Tasks / Subtasks
- [x] Task 1: Define router API and types (AC: #1, #4)
- [x] Subtask 1.1: Add `AiFeatureLane` union (`'chat' | 'tags' | 'embedding'`) and `ResolvedAiRoute` type `{ providerType, modelName, embeddingModelName?, meta }` in `router.ts`
- [x] Subtask 1.2: Implement `resolveAiRoute(lane, config: Record<string, string>): ResolvedAiRoute` mapping existing env keys (`AI_PROVIDER_CHAT`, `AI_MODEL_CHAT`, `AI_PROVIDER_TAGS`, `AI_MODEL_TAGS`, `AI_PROVIDER_EMBEDDING`, `AI_MODEL_EMBEDDING`, fallbacks per current `factory.ts`)
- [x] Task 2: Wire factory to router (AC: #4, #5)
- [x] Subtask 2.1: Refactor `getChatProvider` / `getTagsProvider` / `getEmbeddingsProvider` to call `resolveAiRoute` then `getProviderInstance` (export `getProviderInstance` from `factory.ts` if needed, or move instantiation behind `instantiateFromRoute()` in router that internally imports provider constructors)
- [x] Subtask 2.2: Preserve all validation rules (e.g. embeddings cannot use `anthropic` / `anthropic_custom`)
- [x] Task 3: OpenRouter model IDs & docs (AC: #3)
- [x] Subtask 3.1: Document default OpenRouter models in dev notes and align `PROVIDER_DEFAULTS.openrouter` if epic requires explicit multi-provider coverage via OpenRouter slugs
- [x] Subtask 3.2: Ensure HTTP headers expected by OpenRouter (`Authorization`, optional `HTTP-Referer` / `X-Title` per OpenRouter docs) are set in `CustomOpenAIProvider` or router wrapper if missing — **verify against current `providers/custom-openai.ts`**
- [x] Task 4: NFR-P3 measurement (AC: #2)
- [x] Subtask 4.1: Add unit test(s) that mock config and assert resolve completes under 50ms (allow generous CI timer slack — test repeated runs median or single-run ceiling with comment that excludes cold JIT)
- [x] Subtask 4.2: Optional micro-benchmark in `tests/` documenting methodology
- [x] Task 5: Observability (AC: #6)
- [x] Subtask 5.1: Log structured routing meta once per chat request (behind flag or non-production)
- [x] Task 6: Story boundaries — **explicit non-goals** (AC: #4)
- [x] Subtask 6.1: Do **not** implement multi-provider HTTP fallback loops here — that is Story **3.3**
- [x] Subtask 6.2: Do **not** implement BYOK decryption or `UserAPIKey` — that is Story **3.5**; only leave clean extension seams (`resolveApiKey` TODO or interface stub commented)
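The measurement approach from Subtask 4.1 (median over repeated warm runs) could look roughly like the sketch below. `resolveStub` is a hypothetical stand-in for the real resolver, and the run counts are arbitrary choices rather than values taken from the project's test suite.

```typescript
// Hypothetical stand-in for the real resolver; any synchronous function works here.
function resolveStub(config: Record<string, string | undefined>): string {
  return `${config.AI_PROVIDER_CHAT}:${config.AI_MODEL_CHAT}`;
}

// Runs fn several times after a warm-up phase and returns the median duration in ms.
// The warm-up iterations keep cold JIT compilation out of the measured samples.
function medianResolveMs(fn: () => unknown, runs: number = 25, warmup: number = 5): number {
  if (runs < 1) throw new Error("runs must be at least 1");
  for (let i = 0; i < warmup; i++) fn();

  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    fn();
    samples.push(performance.now() - start);
  }

  samples.sort((a, b) => a - b);
  const mid = Math.floor(samples.length / 2);
  return samples.length % 2 === 1
    ? samples[mid]
    : (samples[mid - 1] + samples[mid]) / 2;
}
```

In a Vitest test the result would typically be asserted with something like `expect(medianResolveMs(() => resolveStub(cfg))).toBeLessThan(50)`. Using the median instead of a single run makes the check less sensitive to one-off scheduler pauses on shared CI machines.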
---
## Dev Notes
### Epic context
- Story **3.1** delivered Redis-backed entitlements (`checkEntitlementOrThrow`, `trackFeatureUsage`). Router runs **after** quota passes at API boundaries (e.g. chat route already checks entitlement before `getChatProvider`).
- Story **3.3** will add failure-driven fallback (429/500) within **1.5s** — router should return primary route only; retry wrapper belongs in 3.3 or in a thin `executeWithFallback` future module.
### Current codebase reality (must read before coding)
- **No `lib/ai/router.ts` today** — PRD and `byok-billing-patch-v3.md` describe a **target** architecture; this story delivers the **first production slice**: deterministic routing + OpenRouter normalization + timing guarantee on resolve.
- **`lib/ai/factory.ts`** already implements `createOpenRouterProvider` and `getProviderInstance` switch — reuse this; avoid parallel provider factories.
- **`lib/config.ts` / admin settings** expose `AI_PROVIDER_*` and model fields consumed by `getSystemConfig()` — router MUST honor the same precedence chain as today's `getChatProvider` / `getTagsProvider` / `getEmbeddingsProvider`.
### Files — expected touch list
**NEW**
- `memento-note/lib/ai/router.ts` — route resolution + optional `getProviderForLane()` helper
**UPDATE**
- `memento-note/lib/ai/factory.ts` — delegate to router; possibly export `getProviderInstance` for tests
- `memento-note/app/api/chat/route.ts` — optional debug log line using resolved meta (AC6)
- `memento-note/tests/unit/` — new `router.test.ts` (or adjacent file) for resolve speed + mapping correctness
**READ BEFORE MODIFY**
- `memento-note/lib/ai/providers/custom-openai.ts` — OpenRouter compatibility & headers
- `memento-note/lib/entitlements.ts` — ordering vs routing (quota stays outside router)
### Testing standards
- Use existing **Vitest** setup.
- Unit-test `resolveAiRoute` with frozen config objects — no live Redis/DB.
- Prefer deterministic assertions on `{ providerType, modelName }` outputs for each lane.
---
## Dev Agent Guardrails
### Technical requirements
- **NFR-P3 scope:** Applies to **router resolution + provider instantiation**, not LLM network latency. Do not await HTTP inside resolve.
- **Thread-safe:** Resolver must remain side-effect-free aside from optional gated logging (no global mutation of config).
- **Keys:** Never log API keys. Log only provider **type** and **model id**.
- **i18n:** No user-facing strings required for this story unless touching UI (prefer none).
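The logging rules above (never log keys, gate the output) can be illustrated with a small sketch. The function names `formatRouteDebugSketch` and `shouldLogRoute` are hypothetical, and the environment is passed in as a plain object so the example stays free of side effects; the real `formatAiRouteDebug` may differ.

```typescript
interface RouteDebugInput {
  lane: string;
  providerType: string;
  modelId: string;
  resolveMs?: number;
}

// Copies only the four allowed fields, so any extra property on the input
// (for example an API key) can never end up in the log line.
function formatRouteDebugSketch(route: RouteDebugInput): string {
  return JSON.stringify({
    lane: route.lane,
    providerType: route.providerType,
    modelId: route.modelId,
    resolveMs: route.resolveMs, // omitted by JSON.stringify when undefined
  });
}

// Logging is allowed outside production, or when the debug flag is set explicitly.
function shouldLogRoute(env: Record<string, string | undefined>): boolean {
  return env.NODE_ENV !== "production" || env.MEMENTO_AI_ROUTE_DEBUG === "1";
}
```

For an input without `resolveMs`, the formatter returns `{"lane":"chat","providerType":"openrouter","modelId":"openai/gpt-4o-mini"}`; the timing field simply disappears instead of being serialized as `null`.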
### Architecture compliance
- Align with brownfield stack: Next.js App Router, existing `AIProvider` interface (`lib/ai/types.ts`).
- OpenRouter model naming: use slash-separated IDs per [OpenRouter models API](https://openrouter.ai/docs).
- Preserve compatibility with **direct** providers (`deepseek`, `openai`, …) when admin selects them — OpenRouter is one gateway among many, not forced globally unless configured.
### Library / framework requirements
- Reuse `@ai-sdk/*` / existing provider wrappers already in repo — no new HTTP client for routing.
- Do not add heavy dependencies for routing (no new DI framework).
### File structure requirements
- Router lives under `memento-note/lib/ai/router.ts` (matches PRD / patch doc path).
- Named exports, TypeScript, match existing `factory.ts` style.
### Testing requirements
- Coverage for: OpenRouter lane resolution; embedding lane rejection of anthropic providers; fallback precedence parity with pre-refactor `factory.ts` (golden-table tests comparing old vs new outputs optional — strong signal).
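One way to express the optional golden-table comparison is a helper that runs the same cases through the old and new resolvers and reports any divergence. The sketch below is framework-agnostic; the `Resolver` signature (returning a `provider:model` string) and both helper names are assumptions made for illustration.

```typescript
type Lane = "chat" | "tags" | "embedding";
type LaneConfig = Record<string, string | undefined>;
// Assumed signature: a resolver returns "providerType:modelName" or throws.
type Resolver = (lane: Lane, config: LaneConfig) => string;

const EMBEDDING_BLOCKED_PROVIDERS: readonly string[] = ["anthropic", "anthropic_custom"];

// Guard preserved from the pre-refactor rules: these gateways cannot serve embeddings.
function assertEmbeddingAllowed(providerType: string): void {
  if (EMBEDDING_BLOCKED_PROVIDERS.indexOf(providerType) !== -1) {
    throw new Error(`Provider "${providerType}" cannot be used for embeddings`);
  }
}

// Normalizes a resolver call to its result, or to "throws" when it rejects the config.
function outcome(run: () => string): string {
  try {
    return run();
  } catch (err) {
    return "throws";
  }
}

// Returns one human-readable line per case where the two resolvers disagree.
function findParityMismatches(
  legacy: Resolver,
  next: Resolver,
  cases: Array<{ lane: Lane; config: LaneConfig }>
): string[] {
  const mismatches: string[] = [];
  for (const c of cases) {
    const before = outcome(() => legacy(c.lane, c.config));
    const after = outcome(() => next(c.lane, c.config));
    if (before !== after) {
      mismatches.push(`${c.lane}: legacy=${before} next=${after}`);
    }
  }
  return mismatches;
}
```

A parity test then reduces to asserting that `findParityMismatches(oldResolve, newResolve, cases)` returns an empty array, which keeps the failure message readable when a single lane drifts.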
---
## Previous Story Intelligence
**Source:** `docs/3-1-freemium-quota-tracking.md`
- Entitlements: `checkEntitlementOrThrow(userId, 'chat')` pattern on chat API; feature keys like `'chat'`, `'semantic_search'`, `'auto_tag'`, `'auto_title'`.
- Redis + `ioredis` singleton `lib/redis.ts`; atomic Lua `checkAndConsume` — routing must not introduce slow Redis calls into resolve path.
- Review fixes emphasized: no blocking entitlement drift on hot path, parameterized queries elsewhere — router stays CPU-only on resolve.
- Duplicate helpers consolidated into `lib/quota-utils.ts` — follow same “single source of truth” mindset for AI env resolution.
---
## Git Intelligence Summary
Recent commits on AI paths:
| Commit | Insight |
|----------|---------|
| `41596c2` | OpenRouter already falls back to `CUSTOM_OPENAI_API_KEY` when `OPENROUTER_API_KEY` missing — preserve in router delegation |
| `195e845` | Security-sensitive SQL elsewhere — router changes must not weaken logging or leak secrets |
---
## Latest Technical Information
- **OpenRouter:** OpenAI-compatible `/chat/completions` base URL `https://openrouter.ai/api/v1`; models addressed as `provider/model`. Confirm optional headers for attribution in OpenRouter current docs when implementing AC3.
- **Vercel AI SDK:** Chat flows use `streamText` with provider from `getChatProvider` — unchanged surface after refactor.
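As a rough illustration of the OpenRouter request shape described above, the sketch below builds (but does not send) a chat completion request. In the project this work belongs to `CustomOpenAIProvider`; the function names and the `attribution` parameter here are hypothetical, and the key fallback order follows the `OPENROUTER_API_KEY` then `CUSTOM_OPENAI_API_KEY` rule stated in AC3.

```typescript
const OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1";

interface OpenRouterRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Key fallback from AC3: prefer OPENROUTER_API_KEY, then CUSTOM_OPENAI_API_KEY.
function resolveOpenRouterKey(env: Record<string, string | undefined>): string {
  const key = env.OPENROUTER_API_KEY || env.CUSTOM_OPENAI_API_KEY;
  if (!key) {
    throw new Error("No OpenRouter API key configured");
  }
  return key;
}

// Builds the request description only; sending it is left to the provider wrapper.
function buildOpenRouterChatRequest(
  env: Record<string, string | undefined>,
  model: string, // OpenRouter slug, e.g. "deepseek/deepseek-chat"
  userMessage: string,
  attribution?: { referer?: string; title?: string }
): OpenRouterRequest {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${resolveOpenRouterKey(env)}`,
    "Content-Type": "application/json",
  };
  // Optional attribution headers; confirm current names in the OpenRouter docs.
  if (attribution && attribution.referer) headers["HTTP-Referer"] = attribution.referer;
  if (attribution && attribution.title) headers["X-Title"] = attribution.title;

  return {
    url: `${OPENROUTER_BASE_URL}/chat/completions`,
    headers,
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: userMessage }],
    }),
  };
}
```

Keeping request construction separate from the network call makes the header and model-slug handling testable without any outbound HTTP.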
---
## Project Context Reference
- **Epics:** `docs/epics.md` — Story 3.2 acceptance + FR17 mapping
- **PRD:** `docs/prd.md` — FR17, NFR-P3, architecture bullet mentioning `lib/ai/router.ts`
- **Architecture / SaaS:** `memento-note/docs/saas-deployment-prep.md` — tiers and AI cost context
- **BYOK / future router:** `memento-note/docs/byok-billing-patch-v3.md` §2 — aspirational full router (BYOK + fallback); **implement only the routing choke-point slice** in this story
---
## Dev Agent Record
### Agent Model Used
Composer (Cursor agent)
### Debug Log References
- Vitest: `tests/unit/router.test.ts` (resolve precedence, anthropic embedding guard, OpenRouter slugs, median timing).
### Completion Notes List
- Implemented `lib/ai/router.ts` with synchronous `resolveAiRoute`, `resolveAiRouteWithTiming`, and `formatAiRouteDebug`.
- Refactored `getTagsProvider`, `getEmbeddingsProvider`, and `getChatProvider` to delegate to `resolveAiRoute` + exported `getProviderInstance`.
- Chat API logs structured routing JSON when `NODE_ENV !== 'production'` or `MEMENTO_AI_ROUTE_DEBUG=1`.
- Confirmed `CustomOpenAIProvider` already sets `HTTP-Referer` and `X-Title` on outbound fetch (OpenRouter-compatible).
- Documented OpenRouter slugs beside `PROVIDER_DEFAULTS.openrouter`.
- Regression checks: `npm run test:unit -- tests/unit/router.test.ts tests/unit/entitlements.test.ts` (green). Full `npm run test:unit` still reports known unrelated failures (migration suites, Playwright-named files picked up by Vitest).
### File List
- `memento-note/lib/ai/router.ts` (new)
- `memento-note/lib/ai/factory.ts` (modified)
- `memento-note/app/api/chat/route.ts` (modified)
- `memento-note/tests/unit/router.test.ts` (new)
---
## Change Log
| Date | Change |
|------|--------|
| 2026-05-15 | Story 3.2 implemented: central AI router, factory delegation, chat observability, unit tests |
| 2026-05-15 | Code review: allowlist validation (D1), removed config leak in logs (D2), provider-specific model defaults (D3), null guard in pick(), .catch() on incrementUsageAsync, updated model names to May 2026 |
---
## Story Completion Status
- Story ID: 3.2
- Story Key: `3-2-custom-llm-router`
- Status: **review**
- Completion Note: Code review patches applied. 3 decisions resolved, 4 patches applied, 15 tests passing.
---
### Review Findings (2026-05-15)
#### Decision Needed — Resolved
- [x] [Review][Decision → Patch] Unsafe `as AiGatewayProvider` cast — no runtime validation [router.ts:114] — Resolved: `VALID_PROVIDERS` allowlist that throws on an unknown provider.
- [x] [Review][Decision → Patch] `console.error` dumps the entire config — risk of leaking API keys [router.ts:31,58,86,106] — Resolved: dump removed; only a clear error message remains.
- [x] [Review][Decision → Patch] Default `granite4:latest` (Ollama) sent to non-Ollama providers [router.ts] — Resolved: `PROVIDER_MODEL_DEFAULTS` map with per-provider defaults (deepseek-v4-flash, gpt-4.1-mini, gemini-2.5-flash, etc.).
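The allowlist and per-provider default decisions can be sketched as follows. The provider list here is a deliberately small, hypothetical subset, the `openrouter` default is a placeholder slug, and the names carry a `_SKETCH` suffix to avoid suggesting they match the real `VALID_PROVIDERS` or `PROVIDER_MODEL_DEFAULTS` contents.

```typescript
// Hypothetical subset of gateways; the real allowlist lives in router.ts.
const VALID_PROVIDERS_SKETCH = ["openrouter", "openai", "deepseek", "ollama"] as const;
type ProviderSketch = (typeof VALID_PROVIDERS_SKETCH)[number];

// Type guard: narrows an arbitrary string at runtime instead of using an unchecked cast.
function isKnownProvider(value: string): value is ProviderSketch {
  return (VALID_PROVIDERS_SKETCH as readonly string[]).indexOf(value) !== -1;
}

// Throws a short message that names only the offending value, never the config.
function parseProvider(raw: string): ProviderSketch {
  const normalized = raw.trim().toLowerCase();
  if (!isKnownProvider(normalized)) {
    throw new Error(`Unknown AI provider "${normalized}"`);
  }
  return normalized;
}

// Per-provider defaults so an Ollama model name is never sent to another gateway.
// The openrouter entry is a placeholder, not the project's configured default.
const MODEL_DEFAULTS_SKETCH: Record<ProviderSketch, string> = {
  openrouter: "openai/gpt-4o-mini",
  openai: "gpt-4.1-mini",
  deepseek: "deepseek-v4-flash",
  ollama: "granite4:latest",
};

// An explicit, non-blank model always wins over the provider default.
function defaultModelFor(provider: ProviderSketch, explicit?: string | null): string {
  if (explicit != null && explicit.trim() !== "") {
    return explicit.trim();
  }
  return MODEL_DEFAULTS_SKETCH[provider];
}
```

Typing the defaults map as `Record<ProviderSketch, string>` means the compiler flags any provider added to the allowlist without a matching default.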
#### Patch — Applied
- [x] [Review][Patch] `pick()` does not filter out `null` — TypeError on `.toLowerCase()` [router.ts:40] — Fixed: `v != null` instead of `v !== undefined`.
- [x] [Review][Patch] `incrementUsageAsync` fire-and-forget without `.catch()` [chat/route.ts:61] — Fixed: added `.catch()` with error logging.
- [x] [Review][Patch] Outdated PROVIDER_DEFAULTS model names [factory.ts] — Fixed: updated (deepseek-v4-flash, gpt-4.1-mini, gemini-2.5-flash, mistral-medium-3.5-latest).
- [x] [Review][Patch] Missing tests for provider validation + dynamic defaults [router.test.ts] — Fixed: +9 tests (unknown provider, typo, provider defaults ×6, null config, explicit override).
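The first two patches follow common patterns that are easy to show in isolation. The sketch below uses hypothetical helpers (`pickLower`, `fireAndForget`) rather than the project's actual `pick()` and `incrementUsageAsync` code.

```typescript
type LooseConfig = Record<string, string | null | undefined>;

// `v != null` rejects both null and undefined. A strict `v !== undefined` check
// would let null through and crash on `.toLowerCase()`.
function pickLower(config: LooseConfig, key: string): string | undefined {
  const v = config[key];
  return v != null ? v.toLowerCase() : undefined;
}

// Starts an async task without awaiting it, but always attaches a rejection
// handler so a failure is reported instead of becoming an unhandled rejection.
function fireAndForget(
  task: () => Promise<void>,
  onError: (err: unknown) => void
): void {
  void task().catch(onError);
}
```

A call site might look like `fireAndForget(() => incrementUsage(userId), (err) => console.error("usage increment failed", err))`, where `incrementUsage` is a placeholder for whatever async counter the route uses.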
#### Deferred
- [x] [Review][Defer] OLLAMA_BASE_URL routing inconsistency between router and factory — pre-existing; cfgOnly vs env fallback in createOllamaProvider
- [x] [Review][Defer] Prototype pollution on the config object — low risk; config comes from getSystemConfig()
- [x] [Review][Defer] NODE_ENV debug logging runs in every non-production environment — acceptable; no secrets in the debug output
- [x] [Review][Defer] Double resolution for debug in chat/route.ts — acceptable; synchronous and gated by env flag
#### Dismissed
- Embedding lane uses AI_MODEL_TAGS — by design (matches pre-refactor factory behavior)
- AC6 only wired into chat — spec says "one representative path"; correct
- `formatAiRouteDebug` resolveMs undefined — JSON.stringify omits undefined; correct
- Cross-lane fallback tags→chat — intentional design
- Flaky performance tests in CI — acceptable with a "warm path" comment