# Story 3.2: Custom LLM Router & OpenRouter Integration

Status: review

<!-- Ultimate context engine analysis completed - comprehensive developer guide created -->

## Story

As a system,
I want to route AI prompts across many providers with OpenRouter as the unified aggregation layer where appropriate,
so that we have flexibility in API fulfillment without vendor lock-in.

**Epic:** Epic 3 - The SaaS Commercial Engine (Monetization & API Cost Protection)
**FR coverage:** FR17 (dynamic provider switching; aggregation via OpenRouter for long-tail models).

---

## Acceptance Criteria

1. [AC1] **Central router module:** Introduce `memento-note/lib/ai/router.ts` that owns **resolution** of which backing gateway (`ProviderType` from `factory.ts`) and **model identifiers** (including OpenRouter `vendor/model` slugs) to use per **feature lane** (`chat` | `tags` | `embedding`). Resolution must be **pure synchronous configuration composition** on top of existing `getSystemConfig()` / env keys, with **no outbound HTTP** inside the router’s resolve step.
2. [AC2] **NFR-P3 (≤50ms routing):** From intercept (entry to router resolve) through returning a ready-to-call `AIProvider` instance, wall-clock time MUST stay **under 50ms** on warm server paths (document measurement approach: `performance.now()` around resolve only in tests or dev-only logging behind `NODE_ENV !== 'production'`).
3. [AC3] **OpenRouter path:** When admin configures `AI_PROVIDER_CHAT`, `AI_PROVIDER_TAGS`, or `AI_PROVIDER_EMBEDDING` as `openrouter`, requests MUST use `CustomOpenAIProvider` against `https://openrouter.ai/api/v1` with models expressed as OpenRouter IDs (e.g. `openai/gpt-4o-mini`, `deepseek/deepseek-chat`). Honor existing env fallback: `OPENROUTER_API_KEY` then `CUSTOM_OPENAI_API_KEY` (already in `createOpenRouterProvider`).
4. [AC4] **Single choke-point:** `getChatProvider`, `getTagsProvider`, and `getEmbeddingsProvider` in `lib/ai/factory.ts` MUST delegate provider construction to the router (or shared resolver invoked by both router and factory) so future stories (3.3 fallback, 3.5 BYOK) attach in one place. **Do not** leave three divergent code paths that each interpret env/admin config differently.
5. [AC5] **Regression safety:** All existing AI call sites that already use `getChatProvider` / `getTagsProvider` / `getEmbeddingsProvider` continue to work without signature changes (backward-compatible exports).
6. [AC6] **Observability hook:** Router exposes a minimal structured log or debug field (e.g. `{ lane, providerType, modelId, resolveMs }`) usable from one representative path (e.g. `app/api/chat/route.ts`) so operators can verify routing choices. The output must be gated so production logs are not noisy (debug flag or `NODE_ENV`).

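The resolution step described in AC1 and AC3 could look roughly like the sketch below. It is not the project's `router.ts`: the `LANE_KEYS` and `DEFAULT_MODELS` tables, the `ollama` default provider, and the reduced `ResolvedAiRoute` shape are assumptions for illustration, and the cross-lane fallbacks that `factory.ts` applies today are deliberately left out.

```typescript
// Hypothetical, simplified sketch of a synchronous route resolver (no I/O, no HTTP).
type AiFeatureLane = 'chat' | 'tags' | 'embedding';

interface ResolvedAiRoute {
  lane: AiFeatureLane;
  providerType: string;
  modelName: string;
}

// Config keys per lane, using the key names listed in this story.
const LANE_KEYS: Record<AiFeatureLane, { provider: string; model: string }> = {
  chat: { provider: 'AI_PROVIDER_CHAT', model: 'AI_MODEL_CHAT' },
  tags: { provider: 'AI_PROVIDER_TAGS', model: 'AI_MODEL_TAGS' },
  embedding: { provider: 'AI_PROVIDER_EMBEDDING', model: 'AI_MODEL_EMBEDDING' },
};

// Assumed defaults for illustration only; the real defaults live in factory.ts.
const DEFAULT_MODELS: Record<string, string> = {
  openrouter: 'openai/gpt-4o-mini',
  ollama: 'granite4:latest',
};

function resolveAiRoute(lane: AiFeatureLane, config: Record<string, string>): ResolvedAiRoute {
  const keys = LANE_KEYS[lane];
  // Assumption: fall back to 'ollama' when no provider is configured for the lane.
  const providerType = (config[keys.provider] ?? 'ollama').trim().toLowerCase();
  // An explicit model wins; otherwise use the provider-specific default.
  const modelName = config[keys.model] ?? DEFAULT_MODELS[providerType];
  if (!modelName) {
    throw new Error(`No model configured for lane "${lane}" and provider "${providerType}"`);
  }
  return { lane, providerType, modelName };
}

// Usage: an OpenRouter chat lane with an explicit vendor/model slug.
const chatRoute = resolveAiRoute('chat', {
  AI_PROVIDER_CHAT: 'openrouter',
  AI_MODEL_CHAT: 'deepseek/deepseek-chat',
});
console.log(chatRoute);
```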
---

## Tasks / Subtasks

- [x] Task 1: Define router API and types (AC: #1, #4)
  - [x] Subtask 1.1: Add `AiFeatureLane` union (`'chat' | 'tags' | 'embedding'`) and `ResolvedAiRoute` type `{ providerType, modelName, embeddingModelName?, meta }` in `router.ts`
  - [x] Subtask 1.2: Implement `resolveAiRoute(lane, config: Record<string, string>): ResolvedAiRoute` mapping existing env keys (`AI_PROVIDER_CHAT`, `AI_MODEL_CHAT`, `AI_PROVIDER_TAGS`, `AI_MODEL_TAGS`, `AI_PROVIDER_EMBEDDING`, `AI_MODEL_EMBEDDING`, fallbacks per current `factory.ts`)
- [x] Task 2: Wire factory to router (AC: #4, #5)
  - [x] Subtask 2.1: Refactor `getChatProvider` / `getTagsProvider` / `getEmbeddingsProvider` to call `resolveAiRoute` then `getProviderInstance` (export `getProviderInstance` from `factory.ts` if needed, or move instantiation behind `instantiateFromRoute()` in router that internally imports provider constructors)
  - [x] Subtask 2.2: Preserve all validation rules (e.g. embeddings cannot use `anthropic` / `anthropic_custom`)
- [x] Task 3: OpenRouter model IDs & docs (AC: #3)
  - [x] Subtask 3.1: Document default OpenRouter models in dev notes and align `PROVIDER_DEFAULTS.openrouter` if epic requires explicit multi-provider coverage via OpenRouter slugs
  - [x] Subtask 3.2: Ensure HTTP headers expected by OpenRouter (`Authorization`, optional `HTTP-Referer` / `X-Title` per OpenRouter docs) are set in `CustomOpenAIProvider` or router wrapper if missing. **Verify against current `providers/custom-openai.ts`**
- [x] Task 4: NFR-P3 measurement (AC: #2)
  - [x] Subtask 4.1: Add unit test(s) that mock config and assert resolve completes under 50ms (allow generous CI timer slack: assert on the median of repeated runs or on a single-run ceiling, with a comment noting that cold JIT is excluded)
  - [x] Subtask 4.2: Optional micro-benchmark in `tests/` documenting methodology
- [x] Task 5: Observability (AC: #6)
  - [x] Subtask 5.1: Log structured routing meta once per chat request (behind flag or non-production)
- [x] Task 6: Story boundaries, **explicit non-goals** (AC: #4)
  - [x] Subtask 6.1: Do **not** implement multi-provider HTTP fallback loops here; that is Story **3.3**
  - [x] Subtask 6.2: Do **not** implement BYOK decryption or `UserAPIKey`; that is Story **3.5**. Only leave clean extension seams (`resolveApiKey` TODO or interface stub commented)

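The measurement approach in Subtask 4.1 (median of repeated warm runs against a 50ms ceiling) can be sketched as follows. The helpers `median` and `timeRunsMs` and the stand-in `fakeResolve` are hypothetical names, not code from the repository; a real test would time `resolveAiRoute` inside a Vitest case instead.

```typescript
// Median of a list of samples (hypothetical helper).
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error('median() needs at least one sample');
  const sorted = samples.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const upper = sorted[mid] ?? 0;
  const lower = sorted[mid - 1] ?? upper;
  return sorted.length % 2 === 1 ? upper : (lower + upper) / 2;
}

// Times `runs` calls of `fn` and returns one duration (ms) per call.
function timeRunsMs(fn: () => void, runs: number, warmup = 10): number[] {
  // Use performance.now() when the runtime provides it, otherwise Date.now().
  const perf = (globalThis as unknown as { performance?: { now(): number } }).performance;
  const now = (): number => (perf ? perf.now() : Date.now());
  // Warm-up calls keep cold JIT compilation out of the measured samples.
  for (let i = 0; i < warmup; i++) fn();
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = now();
    fn();
    samples.push(now() - start);
  }
  return samples;
}

// Stand-in for the real resolver; it only builds a small object synchronously.
const fakeResolve = (): { providerType: string; modelName: string } => ({
  providerType: 'openrouter',
  modelName: 'openai/gpt-4o-mini',
});

const medianMs = median(timeRunsMs(() => { fakeResolve(); }, 101));
if (medianMs >= 50) {
  throw new Error(`resolve median ${medianMs}ms exceeds the 50ms NFR-P3 budget`);
}
```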
---

## Dev Notes

### Epic context

- Story **3.1** delivered Redis-backed entitlements (`checkEntitlementOrThrow`, `trackFeatureUsage`). Router runs **after** quota passes at API boundaries (e.g. chat route already checks entitlement before `getChatProvider`).
- Story **3.3** will add failure-driven fallback (429/500) within **1.5s**. The router should return the primary route only; the retry wrapper belongs in 3.3 or in a thin future `executeWithFallback` module.

### Current codebase reality (must read before coding)

- **No `lib/ai/router.ts` today.** The PRD and `byok-billing-patch-v3.md` describe a **target** architecture; this story delivers the **first production slice**: deterministic routing + OpenRouter normalization + timing guarantee on resolve.
- **`lib/ai/factory.ts`** already implements `createOpenRouterProvider` and the `getProviderInstance` switch. Reuse this; avoid parallel provider factories.
- **`lib/config.ts` / admin settings** expose `AI_PROVIDER_*` and model fields consumed by `getSystemConfig()`. The router MUST honor the same precedence chain as today’s `getChatProvider` / `getTagsProvider` / `getEmbeddingsProvider`.

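To illustrate the single choke-point idea, here is a minimal sketch of three getters that all pass through the same resolve-then-instantiate pair. Everything in it is a stand-in: the `AIProvider` shape, the placeholder bodies of `resolveAiRoute` and `getProviderInstance`, and the `config` parameter on the getters are assumptions, since the real signatures live in `factory.ts` and `lib/ai/types.ts`.

```typescript
// Stand-in types; the real AIProvider interface lives in lib/ai/types.ts.
interface AIProvider {
  readonly providerType: string;
  readonly modelName: string;
}

type AiFeatureLane = 'chat' | 'tags' | 'embedding';
type Config = Record<string, string>;

interface ResolvedAiRoute {
  providerType: string;
  modelName: string;
}

// Placeholder resolver: reads only the lane-specific keys and omits fallbacks.
function resolveAiRoute(lane: AiFeatureLane, config: Config): ResolvedAiRoute {
  const suffix = lane.toUpperCase();
  return {
    providerType: config[`AI_PROVIDER_${suffix}`] ?? '',
    modelName: config[`AI_MODEL_${suffix}`] ?? '',
  };
}

// Placeholder for the existing switch in factory.ts that builds a concrete provider.
function getProviderInstance(route: ResolvedAiRoute): AIProvider {
  return { providerType: route.providerType, modelName: route.modelName };
}

// The single choke-point: every public getter goes through the same two calls.
function getProviderForLane(lane: AiFeatureLane, config: Config): AIProvider {
  return getProviderInstance(resolveAiRoute(lane, config));
}

const getChatProvider = (config: Config): AIProvider => getProviderForLane('chat', config);
const getTagsProvider = (config: Config): AIProvider => getProviderForLane('tags', config);
const getEmbeddingsProvider = (config: Config): AIProvider => getProviderForLane('embedding', config);
```

With this shape, a later fallback or BYOK hook would only need to wrap `getProviderForLane`, rather than touching each getter.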
### Files: expected touch list

**NEW**

- `memento-note/lib/ai/router.ts`: route resolution + optional `getProviderForLane()` helper

**UPDATE**

- `memento-note/lib/ai/factory.ts`: delegate to router; possibly export `getProviderInstance` for tests
- `memento-note/app/api/chat/route.ts`: optional debug log line using resolved meta (AC6)
- `memento-note/tests/unit/`: new `router.test.ts` (or adjacent file) for resolve speed + mapping correctness

**READ BEFORE MODIFY**

- `memento-note/lib/ai/providers/custom-openai.ts`: OpenRouter compatibility & headers
- `memento-note/lib/entitlements.ts`: ordering vs routing (quota stays outside router)

### Testing standards

- Use existing **Vitest** setup.
- Unit-test `resolveAiRoute` with frozen config objects; no live Redis/DB.
- Prefer deterministic assertions on `{ providerType, modelName }` outputs for each lane.

---

## Dev Agent Guardrails

### Technical requirements

- **NFR-P3 scope:** Applies to **router resolution + provider instantiation**, not LLM network latency. Do not await HTTP inside resolve.
- **Thread-safe:** Resolver must remain side-effect-free aside from optional gated logging (no global mutation of config).
- **Keys:** Never log API keys. Log only provider **type** and **model id**.
- **i18n:** No user-facing strings required for this story unless touching UI (prefer none).

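One way to satisfy the "never log API keys" rule is to build the debug payload from an explicit field list, so that nothing else on the object can reach the log. The sketch below assumes a shape for `formatAiRouteDebug` and introduces a hypothetical `shouldLogRoute` gate that takes the environment as a parameter; the project's actual signatures may differ.

```typescript
interface AiRouteDebugMeta {
  lane: string;
  providerType: string;
  modelId: string;
  resolveMs?: number;
}

// Serializes only the four allowed fields; extra properties are never copied.
function formatAiRouteDebug(meta: AiRouteDebugMeta): string {
  const { lane, providerType, modelId, resolveMs } = meta;
  // JSON.stringify drops keys whose value is undefined, so resolveMs is optional.
  return JSON.stringify({ lane, providerType, modelId, resolveMs });
}

// Gate: log outside production, or anywhere when the debug flag is set to "1".
function shouldLogRoute(env: Record<string, string | undefined>): boolean {
  return env['NODE_ENV'] !== 'production' || env['MEMENTO_AI_ROUTE_DEBUG'] === '1';
}

// Usage: the object carries a secret, but the formatted line does not.
const routeWithSecret = {
  lane: 'chat',
  providerType: 'openrouter',
  modelId: 'openai/gpt-4o-mini',
  apiKey: 'sk-example-not-a-real-key',
};
if (shouldLogRoute({ NODE_ENV: 'development' })) {
  console.log(formatAiRouteDebug(routeWithSecret));
}
```

Passing the environment in as an argument keeps both functions pure, which makes them easy to unit-test without touching `process.env`.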
### Architecture compliance

- Align with brownfield stack: Next.js App Router, existing `AIProvider` interface (`lib/ai/types.ts`).
- OpenRouter model naming: use slash-separated IDs per [OpenRouter models API](https://openrouter.ai/docs).
- Preserve compatibility with **direct** providers (`deepseek`, `openai`, …) when admin selects them. OpenRouter is one gateway among many, not forced globally unless configured.

### Library / framework requirements

- Reuse `@ai-sdk/*` / existing provider wrappers already in repo; no new HTTP client for routing.
- Do not add heavy dependencies for routing (no new DI framework).

### File structure requirements

- Router lives under `memento-note/lib/ai/router.ts` (matches PRD / patch doc path).
- Named exports, TypeScript, match existing `factory.ts` style.

### Testing requirements

- Coverage for: OpenRouter lane resolution; embedding lane rejection of anthropic providers; fallback precedence parity with pre-refactor `factory.ts` (optional golden-table tests comparing old vs new outputs would be a strong signal).

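The embedding-lane rejection mentioned above lends itself to a small table-driven check. The guard below is a hypothetical sketch (the function name and error text are assumptions); only the rule itself, that `anthropic` and `anthropic_custom` cannot serve embeddings, comes from this story.

```typescript
// Providers that must never be used for the embedding lane.
const EMBEDDING_UNSUPPORTED: readonly string[] = ['anthropic', 'anthropic_custom'];

// Throws when the given provider cannot serve embeddings; otherwise returns silently.
function assertEmbeddingProviderAllowed(providerType: string): void {
  if (EMBEDDING_UNSUPPORTED.indexOf(providerType) !== -1) {
    throw new Error(`Provider "${providerType}" cannot be used for the embedding lane`);
  }
}

// Golden table: provider type -> whether the guard is expected to reject it.
const embeddingCases: Array<[string, boolean]> = [
  ['openrouter', false],
  ['openai', false],
  ['anthropic', true],
  ['anthropic_custom', true],
];

for (const [providerType, shouldReject] of embeddingCases) {
  let rejected = false;
  try {
    assertEmbeddingProviderAllowed(providerType);
  } catch (err) {
    rejected = true;
  }
  if (rejected !== shouldReject) {
    throw new Error(`Unexpected guard result for "${providerType}"`);
  }
}
```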
---

## Previous Story Intelligence

**Source:** `docs/3-1-freemium-quota-tracking.md`

- Entitlements: `checkEntitlementOrThrow(userId, 'chat')` pattern on chat API; feature keys like `'chat'`, `'semantic_search'`, `'auto_tag'`, `'auto_title'`.
- Redis + `ioredis` singleton `lib/redis.ts`; atomic Lua `checkAndConsume`. Routing must not introduce slow Redis calls into the resolve path.
- Review fixes emphasized: no blocking entitlement drift on hot path, parameterized queries elsewhere. The router stays CPU-only on resolve.
- Duplicate helpers consolidated into `lib/quota-utils.ts`; follow the same “single source of truth” mindset for AI env resolution.

---

## Git Intelligence Summary

Recent commits on AI paths:

| Commit | Insight |
|----------|---------|
| `41596c2` | OpenRouter already falls back to `CUSTOM_OPENAI_API_KEY` when `OPENROUTER_API_KEY` is missing; preserve this in router delegation |
| `195e845` | Security-sensitive SQL elsewhere; router changes must not weaken logging or leak secrets |

---

## Latest Technical Information

- **OpenRouter:** OpenAI-compatible `/chat/completions` endpoint under base URL `https://openrouter.ai/api/v1`; models addressed as `provider/model`. Confirm optional headers for attribution in the current OpenRouter docs when implementing AC3.
- **Vercel AI SDK:** Chat flows use `streamText` with provider from `getChatProvider`; the surface is unchanged after the refactor.

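As a rough illustration of the OpenRouter request shape and the key fallback from AC3, the sketch below builds a request description without sending it. The helper names `pickOpenRouterKey` and `buildOpenRouterChatRequest` are hypothetical; the repository's `CustomOpenAIProvider` is the component that actually performs the call, and the optional attribution headers should be checked against the current OpenRouter documentation.

```typescript
const OPENROUTER_BASE_URL = 'https://openrouter.ai/api/v1';

interface OpenRouterRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

// Key precedence from this story: OPENROUTER_API_KEY first, then CUSTOM_OPENAI_API_KEY.
function pickOpenRouterKey(env: Record<string, string | undefined>): string {
  const key = env['OPENROUTER_API_KEY'] || env['CUSTOM_OPENAI_API_KEY'];
  if (!key) throw new Error('No OpenRouter API key configured');
  return key;
}

// Builds (but does not send) an OpenAI-compatible chat completion request.
function buildOpenRouterChatRequest(
  apiKey: string,
  model: string,
  prompt: string,
  attribution?: { referer?: string; title?: string },
): OpenRouterRequest {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  };
  // Optional attribution headers; omitted when no value is supplied.
  if (attribution?.referer) headers['HTTP-Referer'] = attribution.referer;
  if (attribution?.title) headers['X-Title'] = attribution.title;
  return {
    url: `${OPENROUTER_BASE_URL}/chat/completions`,
    method: 'POST',
    headers,
    body: JSON.stringify({ model, messages: [{ role: 'user', content: prompt }] }),
  };
}

// Usage with placeholder values: the custom key is used because the primary one is absent.
const exampleKey = pickOpenRouterKey({ CUSTOM_OPENAI_API_KEY: 'placeholder-key' });
const exampleRequest = buildOpenRouterChatRequest(exampleKey, 'openai/gpt-4o-mini', 'Hello', {
  title: 'Memento Note',
});
console.log(exampleRequest.url);
```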
---

## Project Context Reference

- **Epics:** `docs/epics.md`: Story 3.2 acceptance + FR17 mapping
- **PRD:** `docs/prd.md`: FR17, NFR-P3, architecture bullet mentioning `lib/ai/router.ts`
- **Architecture / SaaS:** `memento-note/docs/saas-deployment-prep.md`: tiers and AI cost context
- **BYOK / future router:** `memento-note/docs/byok-billing-patch-v3.md` §2: aspirational full router (BYOK + fallback); **implement only the routing choke-point slice** in this story

---

## Dev Agent Record

### Agent Model Used

Composer (Cursor agent)

### Debug Log References

- Vitest: `tests/unit/router.test.ts` (resolve precedence, anthropic embedding guard, OpenRouter slugs, median timing).

### Completion Notes List

- Implemented `lib/ai/router.ts` with synchronous `resolveAiRoute`, `resolveAiRouteWithTiming`, and `formatAiRouteDebug`.
- Refactored `getTagsProvider`, `getEmbeddingsProvider`, and `getChatProvider` to delegate to `resolveAiRoute` + exported `getProviderInstance`.
- Chat API logs structured routing JSON when `NODE_ENV !== 'production'` or `MEMENTO_AI_ROUTE_DEBUG=1`.
- Confirmed `CustomOpenAIProvider` already sets `HTTP-Referer` and `X-Title` on outbound fetch (OpenRouter-compatible).
- Documented OpenRouter slugs beside `PROVIDER_DEFAULTS.openrouter`.
- Regression checks: `npm run test:unit -- tests/unit/router.test.ts tests/unit/entitlements.test.ts` (green). Full `npm run test:unit` still reports known unrelated failures (migration suites, Playwright-named files picked up by Vitest).

### File List

- `memento-note/lib/ai/router.ts` (new)
- `memento-note/lib/ai/factory.ts` (modified)
- `memento-note/app/api/chat/route.ts` (modified)
- `memento-note/tests/unit/router.test.ts` (new)

---

## Change Log

| Date | Change |
|------|--------|
| 2026-05-15 | Story 3.2 implemented: central AI router, factory delegation, chat observability, unit tests |
| 2026-05-15 | Code review: allowlist validation (D1), removed config leak in logs (D2), provider-specific model defaults (D3), null guard in `pick()`, `.catch()` on `incrementUsageAsync`, updated model names to May 2026 |

---

## Story Completion Status

- Story ID: 3.2
- Story Key: `3-2-custom-llm-router`
- Status: **review**
- Completion Note: Code review patches applied. 3 decisions resolved, 4 patches applied, 15 tests passing.

---

### Review Findings (2026-05-15)

#### Decision Needed - Resolved

- [x] [Review][Decision → Patch] Unsafe `as AiGatewayProvider` cast with no runtime validation [router.ts:114]. Resolved: `VALID_PROVIDERS` allowlist that throws on an unknown provider.
- [x] [Review][Decision → Patch] `console.error` dumps the entire config, risking API key leaks [router.ts:31,58,86,106]. Resolved: dump removed; only a clear error message remains.
- [x] [Review][Decision → Patch] Default `granite4:latest` (Ollama) was sent to non-Ollama providers [router.ts]. Resolved: `PROVIDER_MODEL_DEFAULTS` map with per-provider defaults (deepseek-v4-flash, gpt-4.1-mini, gemini-2.5-flash, etc.).

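The allowlist fix for the unsafe cast can be pictured with the sketch below. The provider list is a partial, assumed set built from names that appear in this story (the real `VALID_PROVIDERS` in `router.ts` may contain more entries), and `parseProvider` is a hypothetical helper name.

```typescript
// Partial, assumed allowlist; the real list lives in router.ts.
const VALID_PROVIDERS = [
  'openrouter',
  'deepseek',
  'openai',
  'anthropic',
  'anthropic_custom',
  'ollama',
] as const;

type AiGatewayProvider = (typeof VALID_PROVIDERS)[number];

// Type guard: narrows a plain string to the provider union at runtime.
function isValidProvider(value: string): value is AiGatewayProvider {
  return (VALID_PROVIDERS as readonly string[]).indexOf(value) !== -1;
}

// Replaces an unchecked `as AiGatewayProvider` cast with a validated parse.
function parseProvider(raw: string): AiGatewayProvider {
  const normalized = raw.trim().toLowerCase();
  if (!isValidProvider(normalized)) {
    // The message names the bad value only; it never includes the config object.
    throw new Error(`Unknown AI provider "${normalized}"`);
  }
  return normalized;
}

// Usage: casing and whitespace are normalized before the allowlist check.
const parsedProvider: AiGatewayProvider = parseProvider('  OpenRouter ');
console.log(parsedProvider);
```

A typo such as `opnrouter` then fails fast at resolve time instead of reaching the provider switch.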
#### Patch - Applied

- [x] [Review][Patch] `pick()` does not filter `null`, causing a TypeError on `.toLowerCase()` [router.ts:40]. Fixed: `v != null` instead of `v !== undefined`.
- [x] [Review][Patch] `incrementUsageAsync` is fire-and-forget without `.catch()` [chat/route.ts:61]. Fixed: added `.catch()` with error logging.
- [x] [Review][Patch] Outdated `PROVIDER_DEFAULTS` model names [factory.ts]. Fixed: updated (deepseek-v4-flash, gpt-4.1-mini, gemini-2.5-flash, mistral-medium-3.5-latest).
- [x] [Review][Patch] Missing tests for provider validation and dynamic defaults [router.test.ts]. Fixed: +9 tests (unknown provider, typo, provider defaults ×6, null config, explicit override).

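The `null` guard in the first patch relies on loose inequality: `v != null` is false for both `null` and `undefined`, whereas `v !== undefined` lets `null` through to `.toLowerCase()`. The version of `pick()` below is a guess at its shape for illustration only; the real helper's signature and trimming behavior are not shown in this story.

```typescript
// Hypothetical shape: returns the first usable candidate, lowercased.
function pick(...candidates: Array<string | null | undefined>): string | undefined {
  for (const v of candidates) {
    // `v != null` rejects null and undefined in one comparison.
    if (v != null && v.trim() !== '') {
      return v.trim().toLowerCase();
    }
  }
  return undefined;
}

// Usage: a null admin setting falls through to the next candidate instead of throwing.
const adminValue: string | null = null;
const picked = pick(adminValue, undefined, 'OpenRouter');
console.log(picked);
```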
#### Deferred

- [x] [Review][Defer] OLLAMA_BASE_URL routing inconsistency between router and factory: pre-existing, cfgOnly vs env fallback in createOllamaProvider
- [x] [Review][Defer] Prototype pollution on the config object: low risk, config comes from getSystemConfig()
- [x] [Review][Defer] NODE_ENV debug logs in all non-production environments: acceptable, no secrets in the debug output
- [x] [Review][Defer] Double resolution for debug in chat/route.ts: acceptable, synchronous and gated by env flag

#### Dismissed

- Embedding lane uses AI_MODEL_TAGS: by design (matches pre-refactor factory behavior)
- AC6 only wired into chat: spec says "one representative path", correct
- `formatAiRouteDebug` resolveMs undefined: JSON.stringify omits undefined, correct
- Cross-lane fallback tags→chat: intentional design
- Flaky performance tests in CI: acceptable with a "warm path" comment