# Story 3.2: Custom LLM Router & OpenRouter Integration

Status: review

## Story

As a system, I want to route AI prompts across many providers, with OpenRouter as the unified aggregation layer where appropriate, so that we have flexibility in API fulfillment without vendor lock-in.

**Epic:** Epic 3 — The SaaS Commercial Engine (Monetization & API Cost Protection)

**FR coverage:** FR17 (dynamic provider switching; aggregation via OpenRouter for long-tail models).

---

## Acceptance Criteria

1. [AC1] **Central router module:** Introduce `memento-note/lib/ai/router.ts` that owns **resolution** of which backing gateway (`ProviderType` from `factory.ts`) and **model identifiers** (including OpenRouter `vendor/model` slugs) to use per **feature lane** (`chat` | `tags` | `embedding`). Resolution must be **pure synchronous configuration composition** on top of existing `getSystemConfig()` / env keys — **no outbound HTTP** inside the router’s resolve step.
2. [AC2] **NFR-P3 (≤50ms routing):** From intercept (entry to router resolve) through returning a ready-to-call `AIProvider` instance, wall-clock time MUST stay **under 50ms** on warm server paths. Document the measurement approach: `performance.now()` around resolve, used only in tests or in dev-only logging behind `NODE_ENV !== 'production'`.
3. [AC3] **OpenRouter path:** When admin configures `AI_PROVIDER_CHAT`, `AI_PROVIDER_TAGS`, or `AI_PROVIDER_EMBEDDING` as `openrouter`, requests MUST use `CustomOpenAIProvider` against `https://openrouter.ai/api/v1` with models expressed as OpenRouter IDs (e.g. `openai/gpt-4o-mini`, `deepseek/deepseek-chat`). Honor the existing env fallback: `OPENROUTER_API_KEY` then `CUSTOM_OPENAI_API_KEY` (already in `createOpenRouterProvider`).
4. [AC4] **Single choke-point:** `getChatProvider`, `getTagsProvider`, and `getEmbeddingsProvider` in `lib/ai/factory.ts` MUST delegate provider construction to the router (or a shared resolver invoked by both router and factory) so future stories (3.3 fallback, 3.5 BYOK) attach in one place. **Do not** leave three divergent code paths that each interpret env/admin config differently.
5. [AC5] **Regression safety:** All existing AI call sites that already use `getChatProvider` / `getTagsProvider` / `getEmbeddingsProvider` continue to work without signature changes (backward-compatible exports).
6. [AC6] **Observability hook:** Router exposes a minimal structured log or debug field (e.g. `{ lane, providerType, modelId, resolveMs }`) usable from one representative path (e.g. `app/api/chat/route.ts`) so operators can verify routing choices — gated so production logs are not noisy (debug flag or `NODE_ENV`).

---

## Tasks / Subtasks

- [x] Task 1: Define router API and types (AC: #1, #4)
  - [x] Subtask 1.1: Add `AiFeatureLane` union (`'chat' | 'tags' | 'embedding'`) and `ResolvedAiRoute` type `{ providerType, modelName, embeddingModelName?, meta }` in `router.ts`
  - [x] Subtask 1.2: Implement `resolveAiRoute(lane, config: Record): ResolvedAiRoute` mapping existing env keys (`AI_PROVIDER_CHAT`, `AI_MODEL_CHAT`, `AI_PROVIDER_TAGS`, `AI_MODEL_TAGS`, `AI_PROVIDER_EMBEDDING`, `AI_MODEL_EMBEDDING`, fallbacks per current `factory.ts`)
- [x] Task 2: Wire factory to router (AC: #4, #5)
  - [x] Subtask 2.1: Refactor `getChatProvider` / `getTagsProvider` / `getEmbeddingsProvider` to call `resolveAiRoute` then `getProviderInstance` (export `getProviderInstance` from `factory.ts` if needed, or move instantiation behind an `instantiateFromRoute()` in the router that internally imports provider constructors)
  - [x] Subtask 2.2: Preserve all validation rules (e.g. embeddings cannot use `anthropic` / `anthropic_custom`)
- [x] Task 3: OpenRouter model IDs & docs (AC: #3)
  - [x] Subtask 3.1: Document default OpenRouter models in dev notes and align `PROVIDER_DEFAULTS.openrouter` if the epic requires explicit multi-provider coverage via OpenRouter slugs
  - [x] Subtask 3.2: Ensure HTTP headers expected by OpenRouter (`Authorization`, optional `HTTP-Referer` / `X-Title` per OpenRouter docs) are set in `CustomOpenAIProvider` or a router wrapper if missing — **verify against current `providers/custom-openai.ts`**
- [x] Task 4: NFR-P3 measurement (AC: #2)
  - [x] Subtask 4.1: Add unit test(s) that mock config and assert resolve completes under 50ms (allow generous CI timer slack: assert on the median of repeated runs, or use a single-run ceiling with a comment noting that cold JIT is excluded)
  - [x] Subtask 4.2: Optional micro-benchmark in `tests/` documenting methodology
- [x] Task 5: Observability (AC: #6)
  - [x] Subtask 5.1: Log structured routing meta once per chat request (behind a flag or in non-production only)
- [x] Task 6: Story boundaries — **explicit non-goals** (AC: #4)
  - [x] Subtask 6.1: Do **not** implement multi-provider HTTP fallback loops here — that is Story **3.3**
  - [x] Subtask 6.2: Do **not** implement BYOK decryption or `UserAPIKey` — that is Story **3.5**; only leave clean extension seams (a `resolveApiKey` TODO or a commented interface stub)

---

## Dev Notes

### Epic context

- Story **3.1** delivered Redis-backed entitlements (`checkEntitlementOrThrow`, `trackFeatureUsage`). The router runs **after** quota passes at API boundaries (e.g. the chat route already checks entitlement before `getChatProvider`).
- Story **3.3** will add failure-driven fallback (429/500) within **1.5s** — the router should return the primary route only; the retry wrapper belongs in 3.3 or in a thin future `executeWithFallback` module.
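A minimal sketch of the AC1/AC2 resolver contract from Task 1, assuming the config arrives as a flat string map like the one `getSystemConfig()` is expected to return. The env key names come from this story; the provider allowlist, the tags-to-chat fallback order, and the error messages are illustrative assumptions, and the exact precedence chain in `factory.ts` (including the embedding lane's use of `AI_MODEL_TAGS` mentioned in the review notes) is not reproduced. The resolver performs no I/O and never mutates `config`, which is what keeps the NFR-P3 budget realistic.

```typescript
// Hypothetical sketch of a synchronous lane resolver (AC1, AC2, Task 1).
// Env key names come from the story; the allowlist and fallback order are assumptions.

export type AiFeatureLane = 'chat' | 'tags' | 'embedding';

// Assumed subset of `ProviderType` from factory.ts.
const VALID_PROVIDERS = [
  'openai', 'openrouter', 'deepseek', 'ollama', 'anthropic', 'anthropic_custom',
] as const;
export type AiGatewayProvider = (typeof VALID_PROVIDERS)[number];

export type AiConfig = Record<string, string | null | undefined>;

export interface ResolvedAiRoute {
  providerType: AiGatewayProvider;
  modelName: string;
  embeddingModelName?: string;
  meta: { lane: AiFeatureLane; resolveMs?: number };
}

// Keys are tried in order; the tags -> chat fallback is an assumption.
const LANE_KEYS: Record<AiFeatureLane, { provider: string[]; model: string[] }> = {
  chat: { provider: ['AI_PROVIDER_CHAT'], model: ['AI_MODEL_CHAT'] },
  tags: { provider: ['AI_PROVIDER_TAGS', 'AI_PROVIDER_CHAT'], model: ['AI_MODEL_TAGS', 'AI_MODEL_CHAT'] },
  embedding: { provider: ['AI_PROVIDER_EMBEDDING'], model: ['AI_MODEL_EMBEDDING'] },
};

// First trimmed, non-empty value; `!= null` rejects both null and undefined.
function pick(config: AiConfig, ...keys: string[]): string | undefined {
  for (const key of keys) {
    const v = config[key];
    if (v != null && v.trim() !== '') return v.trim();
  }
  return undefined;
}

function isValidProvider(value: string): value is AiGatewayProvider {
  return (VALID_PROVIDERS as readonly string[]).indexOf(value) !== -1;
}

// Pure and synchronous: no HTTP, no Redis, no mutation of `config`.
export function resolveAiRoute(lane: AiFeatureLane, config: AiConfig): ResolvedAiRoute {
  const keys = LANE_KEYS[lane];
  const rawProvider = pick(config, ...keys.provider)?.toLowerCase();
  const modelName = pick(config, ...keys.model);

  // Allowlist check instead of an unchecked `as` cast.
  if (!rawProvider || !isValidProvider(rawProvider)) {
    throw new Error(`Unknown or missing AI provider for lane "${lane}"`);
  }
  if (lane === 'embedding' && (rawProvider === 'anthropic' || rawProvider === 'anthropic_custom')) {
    throw new Error('Embeddings cannot use an Anthropic provider');
  }
  if (!modelName) {
    throw new Error(`No model configured for lane "${lane}"`);
  }
  return {
    providerType: rawProvider,
    modelName,
    embeddingModelName: lane === 'embedding' ? modelName : undefined,
    meta: { lane },
  };
}

// Wraps resolution with performance.now() so tests can check the 50ms budget.
export function resolveAiRouteWithTiming(lane: AiFeatureLane, config: AiConfig): ResolvedAiRoute {
  const start = performance.now();
  const route = resolveAiRoute(lane, config);
  route.meta.resolveMs = performance.now() - start;
  return route;
}
```

In this shape, a lane getter in the factory would call `resolveAiRoute` and pass the result to `getProviderInstance`, giving Stories 3.3 and 3.5 a single seam to wrap.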
### Current codebase reality (must read before coding)

- **No `lib/ai/router.ts` today** — PRD and `byok-billing-patch-v3.md` describe a **target** architecture; this story delivers the **first production slice**: deterministic routing + OpenRouter normalization + timing guarantee on resolve.
- **`lib/ai/factory.ts`** already implements `createOpenRouterProvider` and the `getProviderInstance` switch — reuse this; avoid parallel provider factories.
- **`lib/config.ts` / admin settings** expose `AI_PROVIDER_*` and model fields consumed by `getSystemConfig()` — router MUST honor the same precedence chain as today’s `getChatProvider` / `getTagsProvider` / `getEmbeddingsProvider`.

### Files — expected touch list

**NEW**

- `memento-note/lib/ai/router.ts` — route resolution + optional `getProviderForLane()` helper

**UPDATE**

- `memento-note/lib/ai/factory.ts` — delegate to router; possibly export `getProviderInstance` for tests
- `memento-note/app/api/chat/route.ts` — optional debug log line using resolved meta (AC6)
- `memento-note/tests/unit/` — new `router.test.ts` (or adjacent file) for resolve speed + mapping correctness

**READ BEFORE MODIFY**

- `memento-note/lib/ai/providers/custom-openai.ts` — OpenRouter compatibility & headers
- `memento-note/lib/entitlements.ts` — ordering vs routing (quota stays outside router)

### Testing standards

- Use existing **Vitest** setup.
- Unit-test `resolveAiRoute` with frozen config objects — no live Redis/DB.
- Prefer deterministic assertions on `{ providerType, modelName }` outputs for each lane.

---

## Dev Agent Guardrails

### Technical requirements

- **NFR-P3 scope:** Applies to **router resolution + provider instantiation**, not LLM network latency. Do not await HTTP inside resolve.
- **Thread-safe:** Resolver must remain side-effect-free aside from optional gated logging (no global mutation of config).
- **Keys:** Never log API keys. Log only provider **type** and **model id**.
- **i18n:** No user-facing strings are required for this story unless it touches UI (prefer none).

### Architecture compliance

- Align with the brownfield stack: Next.js App Router, existing `AIProvider` interface (`lib/ai/types.ts`).
- OpenRouter model naming: use slash-separated IDs per [OpenRouter models API](https://openrouter.ai/docs).
- Preserve compatibility with **direct** providers (`deepseek`, `openai`, …) when admin selects them — OpenRouter is one gateway among many, not forced globally unless configured.

### Library / framework requirements

- Reuse `@ai-sdk/*` / existing provider wrappers already in the repo — no new HTTP client for routing.
- Do not add heavy dependencies for routing (no new DI framework).

### File structure requirements

- Router lives under `memento-note/lib/ai/router.ts` (matches PRD / patch doc path).
- Named exports, TypeScript, match existing `factory.ts` style.

### Testing requirements

- Coverage for: OpenRouter lane resolution; embedding lane rejection of anthropic providers; fallback precedence parity with pre-refactor `factory.ts` (golden-table tests comparing old vs. new outputs are optional but a strong signal).

---

## Previous Story Intelligence

**Source:** `docs/3-1-freemium-quota-tracking.md`

- Entitlements: `checkEntitlementOrThrow(userId, 'chat')` pattern on chat API; feature keys like `'chat'`, `'semantic_search'`, `'auto_tag'`, `'auto_title'`.
- Redis + `ioredis` singleton `lib/redis.ts`; atomic Lua `checkAndConsume` — routing must not introduce slow Redis calls into the resolve path.
- Review fixes emphasized no blocking entitlement drift on the hot path and parameterized queries elsewhere — the router stays CPU-only on resolve.
- Duplicate helpers were consolidated into `lib/quota-utils.ts` — follow the same “single source of truth” mindset for AI env resolution.
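The OpenRouter rules from AC3 and the Architecture compliance notes (slash-separated model IDs, `OPENROUTER_API_KEY` with `CUSTOM_OPENAI_API_KEY` as fallback, optional `HTTP-Referer` / `X-Title` attribution headers) can be sketched as one pure helper that, like the resolver, stays CPU-only. This is a hypothetical illustration: `buildOpenRouterConfig`, `isOpenRouterSlug`, and the slug regex are assumptions, and the shipped code reuses `createOpenRouterProvider` and `CustomOpenAIProvider`, which already set the attribution headers.

```typescript
// Hypothetical helper illustrating the AC3 OpenRouter rules; not the code in
// factory.ts or providers/custom-openai.ts.
export const OPENROUTER_BASE_URL = 'https://openrouter.ai/api/v1';

export type Env = Record<string, string | undefined>;

export interface OpenRouterRequestConfig {
  baseURL: string;
  model: string;
  headers: Record<string, string>; // holds the bearer token: never log this object
}

// Loose heuristic for `vendor/model` slugs (e.g. `deepseek/deepseek-chat`);
// an assumption, not OpenRouter's official grammar.
const SLUG_PATTERN = /^[a-z0-9][a-z0-9._-]*\/[A-Za-z0-9][A-Za-z0-9._:-]*$/;

export function isOpenRouterSlug(modelId: string): boolean {
  return SLUG_PATTERN.test(modelId);
}

// Precedence from the story: OPENROUTER_API_KEY, then CUSTOM_OPENAI_API_KEY.
// Treating empty strings as unset is an assumption.
export function resolveOpenRouterKey(env: Env): string | undefined {
  return env.OPENROUTER_API_KEY || env.CUSTOM_OPENAI_API_KEY || undefined;
}

export function buildOpenRouterConfig(
  env: Env,
  modelId: string,
  attribution: { referer?: string; title?: string } = {},
): OpenRouterRequestConfig {
  if (!isOpenRouterSlug(modelId)) {
    throw new Error(`Expected an OpenRouter vendor/model slug, got "${modelId}"`);
  }
  const apiKey = resolveOpenRouterKey(env);
  if (!apiKey) {
    // The message names the env vars, never their values.
    throw new Error('OpenRouter selected but neither OPENROUTER_API_KEY nor CUSTOM_OPENAI_API_KEY is set');
  }
  const headers: Record<string, string> = { Authorization: `Bearer ${apiKey}` };
  // Optional attribution headers described in OpenRouter's docs.
  if (attribution.referer) headers['HTTP-Referer'] = attribution.referer;
  if (attribution.title) headers['X-Title'] = attribution.title;
  return { baseURL: OPENROUTER_BASE_URL, model: modelId, headers };
}
```

Rejecting bare model names such as `gpt-4o-mini` on the OpenRouter lane surfaces a misconfiguration at resolve time instead of as a provider error mid-request.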
---

## Git Intelligence Summary

Recent commits on AI paths:

| Commit | Insight |
|----------|---------|
| `41596c2` | OpenRouter already falls back to `CUSTOM_OPENAI_API_KEY` when `OPENROUTER_API_KEY` missing — preserve in router delegation |
| `195e845` | Security-sensitive SQL elsewhere — router changes must not weaken logging or leak secrets |

---

## Latest Technical Information

- **OpenRouter:** OpenAI-compatible `/chat/completions` base URL `https://openrouter.ai/api/v1`; models addressed as `provider/model`. Confirm optional headers for attribution in OpenRouter current docs when implementing AC3.
- **Vercel AI SDK:** Chat flows use `streamText` with provider from `getChatProvider` — unchanged surface after refactor.

---

## Project Context Reference

- **Epics:** `docs/epics.md` — Story 3.2 acceptance + FR17 mapping
- **PRD:** `docs/prd.md` — FR17, NFR-P3, architecture bullet mentioning `lib/ai/router.ts`
- **Architecture / SaaS:** `memento-note/docs/saas-deployment-prep.md` — tiers and AI cost context
- **BYOK / future router:** `memento-note/docs/byok-billing-patch-v3.md` §2 — aspirational full router (BYOK + fallback); **implement only the routing choke-point slice** in this story

---

## Dev Agent Record

### Agent Model Used

Composer (Cursor agent)

### Debug Log References

- Vitest: `tests/unit/router.test.ts` (resolve precedence, anthropic embedding guard, OpenRouter slugs, median timing).

### Completion Notes List

- Implemented `lib/ai/router.ts` with synchronous `resolveAiRoute`, `resolveAiRouteWithTiming`, and `formatAiRouteDebug`.
- Refactored `getTagsProvider`, `getEmbeddingsProvider`, and `getChatProvider` to delegate to `resolveAiRoute` + exported `getProviderInstance`.
- Chat API logs structured routing JSON when `NODE_ENV !== 'production'` or `MEMENTO_AI_ROUTE_DEBUG=1`.
- Confirmed `CustomOpenAIProvider` already sets `HTTP-Referer` and `X-Title` on outbound fetch (OpenRouter-compatible).
- Documented OpenRouter slugs beside `PROVIDER_DEFAULTS.openrouter`.
- Regression checks: `npm run test:unit -- tests/unit/router.test.ts tests/unit/entitlements.test.ts` (green). The full `npm run test:unit` run still reports known unrelated failures (migration suites, Playwright-named files picked up by Vitest).

### File List

- `memento-note/lib/ai/router.ts` (new)
- `memento-note/lib/ai/factory.ts` (modified)
- `memento-note/app/api/chat/route.ts` (modified)
- `memento-note/tests/unit/router.test.ts` (new)

---

## Change Log

| Date | Change |
|------|--------|
| 2026-05-15 | Story 3.2 implemented: central AI router, factory delegation, chat observability, unit tests |
| 2026-05-15 | Code review: allowlist validation (D1), removed config leak in logs (D2), provider-specific model defaults (D3), null guard in pick(), .catch() on incrementUsageAsync, updated model names to May 2026 |

---

## Story Completion Status

- Story ID: 3.2
- Story Key: `3-2-custom-llm-router`
- Status: **review**
- Completion Note: Code review patches applied. 3 decisions resolved, 4 patches applied, 15 tests passing.

---

### Review Findings (2026-05-15)

#### Decision Needed — Resolved

- [x] [Review][Decision → Patch] Unsafe `as AiGatewayProvider` cast with no runtime validation [router.ts:114] — Resolved: a `VALID_PROVIDERS` allowlist that throws on an unknown provider.
- [x] [Review][Decision → Patch] `console.error` dumps the entire config, risking API key leaks [router.ts:31,58,86,106] — Resolved: dump removed; only a clear error message is logged.
- [x] [Review][Decision → Patch] Default `granite4:latest` (Ollama) sent to non-Ollama providers [router.ts] — Resolved: `PROVIDER_MODEL_DEFAULTS` map with per-provider defaults (deepseek-v4-flash, gpt-4.1-mini, gemini-2.5-flash, etc.).

#### Patch — Applied

- [x] [Review][Patch] `pick()` does not filter `null`, causing a TypeError on `.toLowerCase()` [router.ts:40] — Fixed: `v != null` instead of `v !== undefined`.
- [x] [Review][Patch] `incrementUsageAsync` fire-and-forget without `.catch()` [chat/route.ts:61] — Fixed: added `.catch()` with error logging.
- [x] [Review][Patch] Obsolete `PROVIDER_DEFAULTS` model names [factory.ts] — Fixed: updated (deepseek-v4-flash, gpt-4.1-mini, gemini-2.5-flash, mistral-medium-3.5-latest).
- [x] [Review][Patch] Missing tests for provider validation and dynamic defaults [router.test.ts] — Fixed: +9 tests (unknown provider, typo, provider defaults ×6, null config, explicit override).

#### Deferred

- [x] [Review][Defer] `OLLAMA_BASE_URL` routing inconsistency between router and factory — pre-existing: `cfgOnly` vs. env fallback in `createOllamaProvider`.
- [x] [Review][Defer] Prototype pollution on the config object — low risk, since config comes from `getSystemConfig()`.
- [x] [Review][Defer] Debug logging gated on `NODE_ENV` fires in every non-production environment — acceptable, since the debug output contains no secrets.
- [x] [Review][Defer] Double debug resolution in `chat/route.ts` — acceptable, since it is synchronous and gated by an env flag.

#### Dismissed

- Embedding lane uses `AI_MODEL_TAGS` — by design (matches pre-refactor factory behavior).
- AC6 is wired only into chat — the spec says "one representative path", so this is correct.
- `formatAiRouteDebug` with `resolveMs` undefined — `JSON.stringify` omits undefined fields, so this is correct.
- Cross-lane fallback tags→chat — intentional design.
- Flaky performance tests in CI — acceptable, given the "warm path" comment.
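Two of the review fixes above, per-provider model defaults (D3) and an allowlisted, gated debug line (AC6, D2), can be sketched together. This is a hypothetical sketch: the default model names are quoted from the review notes, but the provider keys they are attached to, the `RouteDebugInput` shape, and `shouldLogAiRoute` are assumptions rather than the shipped `router.ts` API.

```typescript
// Hypothetical sketch of per-provider defaults (D3) and allowlisted debug output (AC6, D2).

export type RouteDebugInput = {
  lane: 'chat' | 'tags' | 'embedding';
  providerType: string;
  modelName: string;
  resolveMs?: number;
};

// Model names quoted from the review notes; provider keys are assumptions.
const PROVIDER_MODEL_DEFAULTS: Readonly<Record<string, string>> = {
  ollama: 'granite4:latest',
  deepseek: 'deepseek-v4-flash',
  openai: 'gpt-4.1-mini',
  mistral: 'mistral-medium-3.5-latest',
};

// Keeps an Ollama tag like `granite4:latest` away from non-Ollama providers.
// The own-property check ignores inherited keys such as `toString`.
export function defaultModelFor(providerType: string): string | undefined {
  return Object.prototype.hasOwnProperty.call(PROVIDER_MODEL_DEFAULTS, providerType)
    ? PROVIDER_MODEL_DEFAULTS[providerType]
    : undefined;
}

// Logging is on outside production, or when MEMENTO_AI_ROUTE_DEBUG=1.
export function shouldLogAiRoute(env: Record<string, string | undefined>): boolean {
  return env.NODE_ENV !== 'production' || env.MEMENTO_AI_ROUTE_DEBUG === '1';
}

// Serializes only allowlisted fields, never the config object or API keys.
// JSON.stringify drops `resolveMs` when it is undefined.
export function formatAiRouteDebug(route: RouteDebugInput): string {
  return JSON.stringify({
    lane: route.lane,
    providerType: route.providerType,
    modelId: route.modelName,
    resolveMs: route.resolveMs,
  });
}
```

Building the log payload from named fields, rather than spreading the route or config object, is what prevents a future config field from leaking into logs.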