# Story 3.2: Custom LLM Router & OpenRouter Integration

Status: review

## Story

As a system,
I want to route AI prompts across many providers with OpenRouter as the unified aggregation layer where appropriate,
so that we have flexibility in API fulfillment without vendor lock-in.

**Epic:** Epic 3 — The SaaS Commercial Engine (Monetization & API Cost Protection)
**FR coverage:** FR17 (dynamic provider switching; aggregation via OpenRouter for long-tail models).

---

## Acceptance Criteria

1. [AC1] **Central router module:** Introduce `memento-note/lib/ai/router.ts` that owns **resolution** of which backing gateway (`ProviderType` from `factory.ts`) and **model identifiers** (including OpenRouter `vendor/model` slugs) to use per **feature lane** (`chat` | `tags` | `embedding`). Resolution must be **pure synchronous configuration composition** on top of existing `getSystemConfig()` / env keys — **no outbound HTTP** inside the router’s resolve step.
2. [AC2] **NFR-P3 (≤50ms routing):** Wall-clock time from entry into the router's resolve step until a ready-to-call `AIProvider` instance is returned MUST stay **under 50ms** on warm server paths. Document the measurement approach: `performance.now()` around the resolve step only, used in tests or in dev-only logging behind `NODE_ENV !== 'production'`.
3. [AC3] **OpenRouter path:** When admin configures `AI_PROVIDER_CHAT`, `AI_PROVIDER_TAGS`, or `AI_PROVIDER_EMBEDDING` as `openrouter`, requests MUST use `CustomOpenAIProvider` against `https://openrouter.ai/api/v1` with models expressed as OpenRouter IDs (e.g. `openai/gpt-4o-mini`, `deepseek/deepseek-chat`). Honor existing env fallback: `OPENROUTER_API_KEY` then `CUSTOM_OPENAI_API_KEY` (already in `createOpenRouterProvider`).
4. [AC4] **Single choke-point:** `getChatProvider`, `getTagsProvider`, and `getEmbeddingsProvider` in `lib/ai/factory.ts` MUST delegate provider construction to the router (or shared resolver invoked by both router and factory) so future stories (3.3 fallback, 3.5 BYOK) attach in one place. **Do not** leave three divergent code paths that each interpret env/admin config differently.
5. [AC5] **Regression safety:** All existing AI call sites that already use `getChatProvider` / `getTagsProvider` / `getEmbeddingsProvider` continue to work without signature changes (backward-compatible exports).
6. [AC6] **Observability hook:** Router exposes a minimal structured log or debug field (e.g. `{ lane, providerType, modelId, resolveMs }`) usable from one representative path (e.g. `app/api/chat/route.ts`) so operators can verify routing choices — gated so production logs are not noisy (debug flag or `NODE_ENV`).
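A minimal sketch of the resolve step that AC1 and AC3 describe, assuming the config arrives as a flat string map from `getSystemConfig()`. The lane-to-key table uses the env names listed in this story, but the cross-lane fallback (`AI_PROVIDER_TAGS` backing the chat lane) is an illustrative assumption; the real chain must be copied from the pre-refactor `factory.ts`. Where the real resolver would apply provider defaults, this sketch simply throws.

```typescript
// Sketch only: a pure, synchronous lane resolver (no I/O, no mutation).
// Assumption: the fallback chains in LANE_KEYS are illustrative, not copied from factory.ts.

type AiFeatureLane = "chat" | "tags" | "embedding";
type AiConfig = Readonly<Record<string, string | null | undefined>>;

interface ResolvedAiRoute {
  providerType: string;
  modelName: string;
  embeddingModelName?: string;
  meta: { lane: AiFeatureLane; baseURL?: string };
}

const OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1";

// Lane-specific keys come first; later keys are fallbacks (assumed order).
const LANE_KEYS: Record<AiFeatureLane, { provider: string[]; model: string[] }> = {
  chat: { provider: ["AI_PROVIDER_CHAT", "AI_PROVIDER_TAGS"], model: ["AI_MODEL_CHAT", "AI_MODEL_TAGS"] },
  tags: { provider: ["AI_PROVIDER_TAGS"], model: ["AI_MODEL_TAGS"] },
  // The embedding lane reads AI_MODEL_TAGS for modelName, per the review notes.
  embedding: { provider: ["AI_PROVIDER_EMBEDDING"], model: ["AI_MODEL_TAGS"] },
};

// First non-null, non-blank value among the keys, in order.
function firstSet(config: AiConfig, keys: string[]): string | undefined {
  for (const key of keys) {
    const value = config[key];
    if (value != null && value.trim() !== "") return value.trim();
  }
  return undefined;
}

function resolveAiRoute(lane: AiFeatureLane, config: AiConfig): ResolvedAiRoute {
  const keys = LANE_KEYS[lane];
  const providerType = firstSet(config, keys.provider)?.toLowerCase();
  if (!providerType) throw new Error(`No AI provider configured for lane "${lane}"`);

  // Existing factory rule: embeddings cannot use anthropic / anthropic_custom.
  if (lane === "embedding" && (providerType === "anthropic" || providerType === "anthropic_custom")) {
    throw new Error(`Provider "${providerType}" cannot serve the embedding lane`);
  }

  const modelName = firstSet(config, keys.model);
  if (!modelName) throw new Error(`No model configured for lane "${lane}"`);

  return {
    providerType,
    modelName, // OpenRouter lanes carry vendor/model slugs here, e.g. "openai/gpt-4o-mini"
    embeddingModelName: lane === "embedding" ? firstSet(config, ["AI_MODEL_EMBEDDING"]) : undefined,
    meta: { lane, baseURL: providerType === "openrouter" ? OPENROUTER_BASE_URL : undefined },
  };
}
```

Keeping the whole function free of `await` is what makes the NFR-P3 budget a CPU-only question.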

---

## Tasks / Subtasks

- [x] Task 1: Define router API and types (AC: #1, #4)
  - [x] Subtask 1.1: Add `AiFeatureLane` union (`'chat' | 'tags' | 'embedding'`) and `ResolvedAiRoute` type `{ providerType, modelName, embeddingModelName?, meta }` in `router.ts`
  - [x] Subtask 1.2: Implement `resolveAiRoute(lane, config: Record<string, string>): ResolvedAiRoute` mapping existing env keys (`AI_PROVIDER_CHAT`, `AI_MODEL_CHAT`, `AI_PROVIDER_TAGS`, `AI_MODEL_TAGS`, `AI_PROVIDER_EMBEDDING`, `AI_MODEL_EMBEDDING`, fallbacks per current `factory.ts`)
- [x] Task 2: Wire factory to router (AC: #4, #5)
  - [x] Subtask 2.1: Refactor `getChatProvider` / `getTagsProvider` / `getEmbeddingsProvider` to call `resolveAiRoute` then `getProviderInstance` (export `getProviderInstance` from `factory.ts` if needed, or move instantiation behind `instantiateFromRoute()` in router that internally imports provider constructors)
  - [x] Subtask 2.2: Preserve all validation rules (e.g. embeddings cannot use `anthropic` / `anthropic_custom`)
- [x] Task 3: OpenRouter model IDs & docs (AC: #3)
  - [x] Subtask 3.1: Document default OpenRouter models in dev notes and align `PROVIDER_DEFAULTS.openrouter` if epic requires explicit multi-provider coverage via OpenRouter slugs
  - [x] Subtask 3.2: Ensure HTTP headers expected by OpenRouter (`Authorization`, optional `HTTP-Referer` / `X-Title` per OpenRouter docs) are set in `CustomOpenAIProvider` or router wrapper if missing — **verify against current `providers/custom-openai.ts`**
- [x] Task 4: NFR-P3 measurement (AC: #2)
  - [x] Subtask 4.1: Add unit test(s) that mock config and assert resolve completes under 50ms (allow generous CI timer slack: assert on the median of repeated runs, or on a single-run ceiling, with a comment noting that cold-JIT runs are excluded)
  - [x] Subtask 4.2: Optional micro-benchmark in `tests/` documenting methodology
- [x] Task 5: Observability (AC: #6)
  - [x] Subtask 5.1: Log structured routing meta once per chat request (behind flag or non-production)
- [x] Task 6: Story boundaries — **explicit non-goals** (AC: #4)
  - [x] Subtask 6.1: Do **not** implement multi-provider HTTP fallback loops here — that is Story **3.3**
  - [x] Subtask 6.2: Do **not** implement BYOK decryption or `UserAPIKey` — that is Story **3.5**; only leave clean extension seams (`resolveApiKey` TODO or interface stub commented)
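Tasks 2 and 6 together imply one resolve-then-instantiate path that all three legacy getters share. The sketch below shows that shape under stated assumptions: `AIProvider` is a stand-in for the interface in `lib/ai/types.ts`, `resolveRoute` is a simplified stand-in for `resolveAiRoute`, the returned providers are plain stub objects rather than the repo's provider classes, and the getters take `config` explicitly, whereas the real ones may load it through `getSystemConfig()`.

```typescript
// Sketch: single choke-point for provider construction (AC4, AC5).
// Stand-ins: AIProvider, resolveRoute and the stub provider objects are placeholders.

type Lane = "chat" | "tags" | "embedding";
type Config = Readonly<Record<string, string | undefined>>;

interface AIProvider {
  name: string;
  model: string;
  baseURL?: string;
}

interface Route {
  lane: Lane;
  providerType: string;
  modelName: string;
}

const OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1";

// Simplified stand-in for resolveAiRoute: [provider key, model key] per lane.
const KEYS: Record<Lane, [string, string]> = {
  chat: ["AI_PROVIDER_CHAT", "AI_MODEL_CHAT"],
  tags: ["AI_PROVIDER_TAGS", "AI_MODEL_TAGS"],
  embedding: ["AI_PROVIDER_EMBEDDING", "AI_MODEL_TAGS"],
};

function resolveRoute(lane: Lane, config: Config): Route {
  const [providerKey, modelKey] = KEYS[lane];
  const providerType = config[providerKey]?.toLowerCase();
  const modelName = config[modelKey];
  if (!providerType || !modelName) throw new Error(`Lane "${lane}" is not configured`);
  return { lane, providerType, modelName };
}

// The only place that turns a route into a provider object.
function getProviderInstance(route: Route): AIProvider {
  switch (route.providerType) {
    case "openrouter":
      // Real code: createOpenRouterProvider -> CustomOpenAIProvider at the OpenRouter base URL.
      return { name: "openrouter", model: route.modelName, baseURL: OPENROUTER_BASE_URL };
    case "openai":
    case "deepseek":
    case "ollama":
      return { name: route.providerType, model: route.modelName };
    default:
      throw new Error(`Unsupported provider type "${route.providerType}"`);
  }
}

function getProviderForLane(lane: Lane, config: Config): AIProvider {
  // Future hooks (Story 3.3 fallback, Story 3.5 BYOK key resolution) attach here once.
  return getProviderInstance(resolveRoute(lane, config));
}

// Legacy entry points keep their names and delegate to the shared path.
const getChatProvider = (config: Config): AIProvider => getProviderForLane("chat", config);
const getTagsProvider = (config: Config): AIProvider => getProviderForLane("tags", config);
const getEmbeddingsProvider = (config: Config): AIProvider => getProviderForLane("embedding", config);
```

Because every getter funnels through `getProviderForLane`, a later wrapper (retry, key lookup) changes one function instead of three.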

---

## Dev Notes

### Epic context

- Story **3.1** delivered Redis-backed entitlements (`checkEntitlementOrThrow`, `trackFeatureUsage`). Router runs **after** quota passes at API boundaries (e.g. chat route already checks entitlement before `getChatProvider`).
- Story **3.3** will add failure-driven fallback (429/500) within **1.5s** — router should return primary route only; retry wrapper belongs in 3.3 or in a thin `executeWithFallback` future module.

### Current codebase reality (must read before coding)

- **No `lib/ai/router.ts` today** — PRD and `byok-billing-patch-v3.md` describe a **target** architecture; this story delivers the **first production slice**: deterministic routing + OpenRouter normalization + timing guarantee on resolve.
- **`lib/ai/factory.ts`** already implements `createOpenRouterProvider` and `getProviderInstance` switch — reuse this; avoid parallel provider factories.
- **`lib/config.ts` / admin settings** expose `AI_PROVIDER_*` and model fields consumed by `getSystemConfig()` — router MUST honor the same precedence chain as today’s `getChatProvider` / `getTagsProvider` / `getEmbeddingsProvider`.
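Keeping the precedence chain identical is easiest when every lookup goes through one helper. A minimal sketch, assuming admin config values can be `null` or blank (the review found a `null` case in `pick()`). Two behaviors here are assumptions to check against today's getters: that blank strings count as unset, and that admin config takes priority over env.

```typescript
// Sketch of a precedence helper: admin config keys in order, then env, then a default.
// Assumptions: admin config beats env, and blank strings count as "unset".

type Source = Readonly<Record<string, string | null | undefined>>;

function pick(source: Source, keys: readonly string[]): string | undefined {
  for (const key of keys) {
    const value = source[key];
    // `!= null` rejects both null and undefined before any string method is called.
    if (value != null && value.trim() !== "") return value.trim();
  }
  return undefined;
}

function pickWithFallback(
  config: Source,
  env: Source,
  keys: readonly string[],
  defaultValue?: string,
): string | undefined {
  return pick(config, keys) ?? pick(env, keys) ?? defaultValue;
}
```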

### Files — expected touch list

**NEW**

- `memento-note/lib/ai/router.ts` — route resolution + optional `getProviderForLane()` helper

**UPDATE**

- `memento-note/lib/ai/factory.ts` — delegate to router; possibly export `getProviderInstance` for tests
- `memento-note/app/api/chat/route.ts` — optional debug log line using resolved meta (AC6)
- `memento-note/tests/unit/` — new `router.test.ts` (or adjacent file) for resolve speed + mapping correctness

**READ BEFORE MODIFY**

- `memento-note/lib/ai/providers/custom-openai.ts` — OpenRouter compatibility & headers
- `memento-note/lib/entitlements.ts` — ordering vs routing (quota stays outside router)

### Testing standards

- Use existing **Vitest** setup.
- Unit-test `resolveAiRoute` with frozen config objects — no live Redis/DB.
- Prefer deterministic assertions on `{ providerType, modelName }` outputs for each lane.
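A frozen config doubles as a cheap purity check, because in strict-mode code any attempt by the resolver to write to it throws. The sketch below runs a small per-lane table against a stand-in resolver (`resolveLane` is hypothetical, not the real `resolveAiRoute`), resolves each lane twice to confirm determinism, and compares `{ providerType, modelName }` only. In the real Vitest suite the same table maps naturally onto `it.each`.

```typescript
// Sketch: table-driven checks against a frozen config (no Redis, no DB).
// resolveLane is a hypothetical stand-in for resolveAiRoute.

type Lane = "chat" | "tags" | "embedding";
type Config = Readonly<Record<string, string>>;
type Expected = { providerType: string; modelName: string };

function resolveLane(lane: Lane, config: Config): Expected {
  const suffix = lane.toUpperCase();
  const providerType = (config[`AI_PROVIDER_${suffix}`] ?? "").toLowerCase();
  // Embedding reuses AI_MODEL_TAGS for modelName, matching the documented behavior.
  const modelName = config[lane === "embedding" ? "AI_MODEL_TAGS" : `AI_MODEL_${suffix}`] ?? "";
  return { providerType, modelName };
}

const FROZEN_CONFIG: Config = Object.freeze({
  AI_PROVIDER_CHAT: "openrouter",
  AI_MODEL_CHAT: "deepseek/deepseek-chat",
  AI_PROVIDER_TAGS: "openai",
  AI_MODEL_TAGS: "gpt-4.1-mini",
  AI_PROVIDER_EMBEDDING: "openai",
});

const CASES: ReadonlyArray<[Lane, Expected]> = [
  ["chat", { providerType: "openrouter", modelName: "deepseek/deepseek-chat" }],
  ["tags", { providerType: "openai", modelName: "gpt-4.1-mini" }],
  ["embedding", { providerType: "openai", modelName: "gpt-4.1-mini" }],
];

// Returns a list of failures; an empty list means every lane matched on both runs.
function runLaneTable(): string[] {
  const failures: string[] = [];
  for (const [lane, expected] of CASES) {
    const runs = [resolveLane(lane, FROZEN_CONFIG), resolveLane(lane, FROZEN_CONFIG)];
    for (const actual of runs) {
      if (actual.providerType !== expected.providerType || actual.modelName !== expected.modelName) {
        failures.push(`${lane}: got ${JSON.stringify(actual)}`);
      }
    }
  }
  return failures;
}
```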

---

## Dev Agent Guardrails

### Technical requirements

- **NFR-P3 scope:** Applies to **router resolution + provider instantiation**, not LLM network latency. Do not await HTTP inside resolve.
- **Thread-safe:** Resolver must remain side-effect-free aside from optional gated logging (no global mutation of config).
- **Keys:** Never log API keys. Log only provider **type** and **model id**.
- **i18n:** No user-facing strings required for this story unless touching UI (prefer none).
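Since NFR-P3 covers only resolution and instantiation, the measurement can wrap a synchronous function with `performance.now()` and never touch the network. A minimal sketch of a warm-path median check, in the spirit of the `resolveAiRouteWithTiming` helper this story mentions; the helper names, run counts, and the stand-in workload are assumptions, not the repo's code.

```typescript
// Sketch: time a synchronous resolve step and report the warm-path median.
// Assumptions: 20 warm-up calls are enough to exclude cold JIT; 200 samples are kept.

function withTiming<T>(fn: () => T): { value: T; resolveMs: number } {
  const start = performance.now();
  const value = fn();
  return { value, resolveMs: performance.now() - start };
}

function medianResolveMs(fn: () => unknown, runs = 200, warmup = 20): number {
  for (let i = 0; i < warmup; i++) fn(); // discard cold-start calls
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) samples.push(withTiming(fn).resolveMs);
  samples.sort((a, b) => a - b);
  const mid = Math.floor(samples.length / 2);
  return samples.length % 2 === 0 ? (samples[mid - 1] + samples[mid]) / 2 : samples[mid];
}

// Stand-in workload shaped like a config lookup (no I/O).
const config: Readonly<Record<string, string>> = {
  AI_PROVIDER_CHAT: "openrouter",
  AI_MODEL_CHAT: "openai/gpt-4o-mini",
};
const resolveStub = () => ({
  providerType: config.AI_PROVIDER_CHAT.toLowerCase(),
  modelName: config.AI_MODEL_CHAT,
});

const NFR_P3_BUDGET_MS = 50;
const median = medianResolveMs(resolveStub);
console.log(`median resolve: ${median.toFixed(3)} ms (budget ${NFR_P3_BUDGET_MS} ms)`);
```

Asserting on the median rather than the maximum keeps a single GC pause on a shared CI runner from failing the build.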

### Architecture compliance

- Align with brownfield stack: Next.js App Router, existing `AIProvider` interface (`lib/ai/types.ts`).
- OpenRouter model naming: use slash-separated IDs per [OpenRouter models API](https://openrouter.ai/docs).
- Preserve compatibility with **direct** providers (`deepseek`, `openai`, …) when admin selects them — OpenRouter is one gateway among many, not forced globally unless configured.
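One cheap guard against sending a direct-provider model name (e.g. `gpt-4o-mini`) to the OpenRouter gateway is to require the `vendor/model` shape whenever the provider is `openrouter`. The helpers below are hypothetical and deliberately loose: they check shape only, not whether OpenRouter actually lists the model, and the allowed character set (including a `:variant` suffix such as `:free`) is an assumption to verify against OpenRouter's models list.

```typescript
// Hypothetical shape check for OpenRouter model IDs ("vendor/model[:variant]").
// It validates format only; it does not call OpenRouter's models endpoint.

const OPENROUTER_MODEL_ID = /^[a-z0-9][a-z0-9._-]*\/[a-z0-9][a-z0-9._:-]*$/i;

function isOpenRouterModelId(id: string): boolean {
  return OPENROUTER_MODEL_ID.test(id.trim());
}

// Direct providers keep their native names; only the openrouter gateway needs slugs.
function assertModelMatchesGateway(providerType: string, modelName: string): void {
  if (providerType === "openrouter" && !isOpenRouterModelId(modelName)) {
    throw new Error(`OpenRouter expects a vendor/model id, got "${modelName}"`);
  }
}
```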

### Library / framework requirements

- Reuse `@ai-sdk/*` / existing provider wrappers already in repo — no new HTTP client for routing.
- Do not add heavy dependencies for routing (no new DI framework).

### File structure requirements

- Router lives under `memento-note/lib/ai/router.ts` (matches PRD / patch doc path).
- Named exports, TypeScript, match existing `factory.ts` style.

### Testing requirements

- Coverage for: OpenRouter lane resolution; embedding lane rejection of anthropic providers; fallback precedence parity with pre-refactor `factory.ts`. Golden-table tests comparing old and new outputs are optional but give a strong signal.
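The optional golden-table test can be sketched as running the same config rows through the pre-refactor logic and the new resolver, then diffing the outcomes, with "both throw" counted as agreement. `legacyTagsRoute` and `routerTagsRoute` below are hypothetical stand-ins; in practice the legacy side would be a frozen copy of the old `factory.ts` resolution code.

```typescript
// Sketch: golden-table parity between an old and a new resolver.

type Config = Readonly<Record<string, string | null | undefined>>;
type Route = { providerType: string; modelName: string };
type Outcome = Route | "error";

// Hypothetical stand-in for the pre-refactor lookup.
function legacyTagsRoute(config: Config): Route {
  const providerType = config.AI_PROVIDER_TAGS;
  const modelName = config.AI_MODEL_TAGS;
  if (!providerType || !modelName) throw new Error("tags lane not configured");
  return { providerType: providerType.toLowerCase(), modelName };
}

// Hypothetical stand-in for the router-based lookup (different code, same intent).
function routerTagsRoute(config: Config): Route {
  const providerType = config["AI_PROVIDER_TAGS"];
  const modelName = config["AI_MODEL_TAGS"];
  if (providerType == null || providerType === "" || modelName == null || modelName === "") {
    throw new Error("tags lane not configured");
  }
  return { providerType: providerType.toLowerCase(), modelName };
}

// Normalize a call into a comparable value; any throw becomes "error".
function outcome(resolve: (c: Config) => Route, config: Config): Outcome {
  try {
    const { providerType, modelName } = resolve(config);
    return { providerType, modelName };
  } catch {
    return "error";
  }
}

// Returns one message per row where the two resolvers disagree.
function diffGoldenTable(rows: Config[], before: (c: Config) => Route, after: (c: Config) => Route): string[] {
  const mismatches: string[] = [];
  rows.forEach((row, i) => {
    const a = JSON.stringify(outcome(before, row));
    const b = JSON.stringify(outcome(after, row));
    if (a !== b) mismatches.push(`row ${i}: legacy=${a} router=${b}`);
  });
  return mismatches;
}

const GOLDEN_ROWS: Config[] = [
  { AI_PROVIDER_TAGS: "OpenAI", AI_MODEL_TAGS: "gpt-4.1-mini" },
  { AI_PROVIDER_TAGS: "openrouter", AI_MODEL_TAGS: "deepseek/deepseek-chat" },
  { AI_PROVIDER_TAGS: null, AI_MODEL_TAGS: "gpt-4.1-mini" },
  {},
];
```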

---

## Previous Story Intelligence

**Source:** `docs/3-1-freemium-quota-tracking.md`

- Entitlements: `checkEntitlementOrThrow(userId, 'chat')` pattern on chat API; feature keys like `'chat'`, `'semantic_search'`, `'auto_tag'`, `'auto_title'`.
- Redis + `ioredis` singleton `lib/redis.ts`; atomic Lua `checkAndConsume` — routing must not introduce slow Redis calls into resolve path.
- Review fixes emphasized: no blocking entitlement drift on hot path, parameterized queries elsewhere — router stays CPU-only on resolve.
- Duplicate helpers consolidated into `lib/quota-utils.ts` — follow same “single source of truth” mindset for AI env resolution.

---

## Git Intelligence Summary

Recent commits on AI paths:

| Commit   | Insight |
|----------|---------|
| `41596c2` | OpenRouter already falls back to `CUSTOM_OPENAI_API_KEY` when `OPENROUTER_API_KEY` missing — preserve in router delegation |
| `195e845` | Security-sensitive SQL elsewhere — router changes must not weaken logging or leak secrets |

---

## Latest Technical Information

- **OpenRouter:** OpenAI-compatible `/chat/completions` endpoint under base URL `https://openrouter.ai/api/v1`; models are addressed as `provider/model`. When implementing AC3, confirm the optional attribution headers against the current OpenRouter docs.
- **Vercel AI SDK:** Chat flows use `streamText` with provider from `getChatProvider` — unchanged surface after refactor.
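A minimal sketch of what an OpenAI-compatible chat call to OpenRouter carries, useful when verifying `CustomOpenAIProvider` against AC3. It builds the request without sending it. The `Authorization: Bearer <key>` header is required; `HTTP-Referer` and `X-Title` are OpenRouter's optional attribution headers. The key fallback mirrors the `OPENROUTER_API_KEY` then `CUSTOM_OPENAI_API_KEY` order this story describes; the function names and attribution values are hypothetical.

```typescript
// Sketch: build (but do not send) an OpenRouter chat completion request.
// buildOpenRouterRequest and resolveOpenRouterKey are hypothetical names.

interface KeyEnv {
  OPENROUTER_API_KEY?: string;
  CUSTOM_OPENAI_API_KEY?: string;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

const OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1";

// Same order this story describes: OPENROUTER_API_KEY, then CUSTOM_OPENAI_API_KEY.
function resolveOpenRouterKey(env: KeyEnv): string {
  const key = env.OPENROUTER_API_KEY || env.CUSTOM_OPENAI_API_KEY;
  if (!key) throw new Error("No OpenRouter API key configured");
  return key;
}

function buildOpenRouterRequest(
  env: KeyEnv,
  model: string, // an OpenRouter slug such as "openai/gpt-4o-mini"
  messages: ChatMessage[],
  attribution: { referer?: string; title?: string } = {},
): { url: string; headers: Record<string, string>; body: string } {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${resolveOpenRouterKey(env)}`,
    "Content-Type": "application/json",
  };
  // Optional attribution headers; omitted entirely when not configured.
  if (attribution.referer) headers["HTTP-Referer"] = attribution.referer;
  if (attribution.title) headers["X-Title"] = attribution.title;
  return {
    url: `${OPENROUTER_BASE_URL}/chat/completions`,
    headers,
    body: JSON.stringify({ model, messages }),
  };
}
```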

---

## Project Context Reference

- **Epics:** `docs/epics.md` — Story 3.2 acceptance + FR17 mapping
- **PRD:** `docs/prd.md` — FR17, NFR-P3, architecture bullet mentioning `lib/ai/router.ts`
- **Architecture / SaaS:** `memento-note/docs/saas-deployment-prep.md` — tiers and AI cost context
- **BYOK / future router:** `memento-note/docs/byok-billing-patch-v3.md` §2 — aspirational full router (BYOK + fallback); **implement only the routing choke-point slice** in this story

---

## Dev Agent Record

### Agent Model Used

Composer (Cursor agent)

### Debug Log References

- Vitest: `tests/unit/router.test.ts` (resolve precedence, anthropic embedding guard, OpenRouter slugs, median timing).

### Completion Notes List

- Implemented `lib/ai/router.ts` with synchronous `resolveAiRoute`, `resolveAiRouteWithTiming`, and `formatAiRouteDebug`.
- Refactored `getTagsProvider`, `getEmbeddingsProvider`, and `getChatProvider` to delegate to `resolveAiRoute` + exported `getProviderInstance`.
- Chat API logs structured routing JSON when `NODE_ENV !== 'production'` or `MEMENTO_AI_ROUTE_DEBUG=1`.
- Confirmed `CustomOpenAIProvider` already sets `HTTP-Referer` and `X-Title` on outbound fetch (OpenRouter-compatible).
- Documented OpenRouter slugs beside `PROVIDER_DEFAULTS.openrouter`.
- Regression checks: `npm run test:unit -- tests/unit/router.test.ts tests/unit/entitlements.test.ts` (green). Full `npm run test:unit` still reports known unrelated failures (migration suites, Playwright-named files picked up by Vitest).
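The gating in these notes (log when `NODE_ENV !== 'production'` or `MEMENTO_AI_ROUTE_DEBUG=1`) and the never-log-keys rule fit in a few small functions. This is a sketch, not the shipped `formatAiRouteDebug`: it copies an explicit allowlist of fields so that an accidentally attached key never reaches the log line, and it relies on `JSON.stringify` dropping `undefined` values. Passing the env in as a parameter is an assumption that keeps the functions testable.

```typescript
// Sketch: gated, allowlisted routing debug output (AC6).

interface RouteDebugMeta {
  lane: "chat" | "tags" | "embedding";
  providerType: string;
  modelId: string;
  resolveMs?: number;
}

type Env = Readonly<Record<string, string | undefined>>;

function shouldLogRoute(env: Env): boolean {
  return env.NODE_ENV !== "production" || env.MEMENTO_AI_ROUTE_DEBUG === "1";
}

function formatAiRouteDebug(meta: RouteDebugMeta): string {
  // Copy only allowlisted fields; never spread `meta`, which might carry secrets.
  return JSON.stringify({
    lane: meta.lane,
    providerType: meta.providerType,
    modelId: meta.modelId,
    resolveMs: meta.resolveMs === undefined ? undefined : Number(meta.resolveMs.toFixed(2)),
  });
}

function logRoute(env: Env, meta: RouteDebugMeta, log: (line: string) => void = console.debug): void {
  if (shouldLogRoute(env)) log(`[ai-route] ${formatAiRouteDebug(meta)}`);
}
```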

### File List

- `memento-note/lib/ai/router.ts` (new)
- `memento-note/lib/ai/factory.ts` (modified)
- `memento-note/app/api/chat/route.ts` (modified)
- `memento-note/tests/unit/router.test.ts` (new)

---

## Change Log

| Date | Change |
|------|--------|
| 2026-05-15 | Story 3.2 implemented: central AI router, factory delegation, chat observability, unit tests |
| 2026-05-15 | Code review: allowlist validation (D1), removed config leak in logs (D2), provider-specific model defaults (D3), null guard in pick(), .catch() on incrementUsageAsync, updated model names to May 2026 |

---

## Story Completion Status

- Story ID: 3.2
- Story Key: `3-2-custom-llm-router`
- Status: **review**
- Completion Note: Code review patches applied. 3 decisions resolved, 4 patches applied, 15 tests passing.

---

### Review Findings (2026-05-15)

#### Decision Needed — Resolved

- [x] [Review][Decision → Patch] Unsafe `as AiGatewayProvider` cast with no runtime validation [router.ts:114]. Resolved: `VALID_PROVIDERS` allowlist that throws on an unknown provider.
- [x] [Review][Decision → Patch] `console.error` dumps the whole config, risking API key leaks [router.ts:31,58,86,106]. Resolved: dump removed; only a clear error message remains.
- [x] [Review][Decision → Patch] Default `granite4:latest` (Ollama) sent to non-Ollama providers [router.ts]. Resolved: `PROVIDER_MODEL_DEFAULTS` map with per-provider defaults (deepseek-v4-flash, gpt-4.1-mini, gemini-2.5-flash, etc.).
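The first and third resolved decisions (a checked allowlist instead of a blind cast, and per-provider model defaults) combine naturally into one parse step. This sketch rests on loud assumptions: the allowlist members, including `google` and `mistral`, are guesses at `ProviderType` and must be copied from `factory.ts`; the default model names come from these review notes, except `openai/gpt-4o-mini` for `openrouter`, which reuses the AC3 example; and providers without a default make the sketch throw.

```typescript
// Sketch: runtime allowlist + per-provider model defaults.
// Assumption: the provider list and the default map below are illustrative.

const VALID_PROVIDERS = [
  "openai", "deepseek", "anthropic", "anthropic_custom",
  "openrouter", "ollama", "google", "mistral",
] as const;

type AiGatewayProvider = (typeof VALID_PROVIDERS)[number];

const PROVIDER_MODEL_DEFAULTS: Partial<Record<AiGatewayProvider, string>> = {
  deepseek: "deepseek-v4-flash",
  openai: "gpt-4.1-mini",
  google: "gemini-2.5-flash",
  mistral: "mistral-medium-3.5-latest",
  ollama: "granite4:latest",
  openrouter: "openai/gpt-4o-mini",
};

function isValidProvider(value: string): value is AiGatewayProvider {
  return (VALID_PROVIDERS as readonly string[]).includes(value);
}

// Replaces an unchecked `as AiGatewayProvider` cast with a checked parse.
function parseProvider(raw: string): AiGatewayProvider {
  const normalized = raw.trim().toLowerCase();
  if (!isValidProvider(normalized)) throw new Error(`Unknown AI provider "${raw}"`);
  return normalized;
}

// An explicit model always wins; otherwise use the provider's own default,
// so an Ollama model name is never sent to a non-Ollama provider.
function modelFor(provider: AiGatewayProvider, explicitModel?: string | null): string {
  if (explicitModel != null && explicitModel.trim() !== "") return explicitModel.trim();
  const fallback = PROVIDER_MODEL_DEFAULTS[provider];
  if (!fallback) throw new Error(`No default model for provider "${provider}"; set one explicitly`);
  return fallback;
}
```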

#### Patch — Applied

- [x] [Review][Patch] `pick()` does not filter `null`, causing a TypeError on `.toLowerCase()` [router.ts:40]. Fixed: `v != null` instead of `v !== undefined`.
- [x] [Review][Patch] `incrementUsageAsync` is fire-and-forget without `.catch()` [chat/route.ts:61]. Fixed: added `.catch()` with error logging.
- [x] [Review][Patch] Obsolete model names in PROVIDER_DEFAULTS [factory.ts]. Fixed: updated (deepseek-v4-flash, gpt-4.1-mini, gemini-2.5-flash, mistral-medium-3.5-latest).
- [x] [Review][Patch] Missing tests for provider validation and dynamic defaults [router.test.ts]. Fixed: +9 tests (unknown provider, typo, provider defaults ×6, null config, explicit override).

#### Deferred

- [x] [Review][Defer] `OLLAMA_BASE_URL` routing inconsistency between router and factory: pre-existing (`cfgOnly` vs env fallback in `createOllamaProvider`).
- [x] [Review][Defer] Prototype pollution on the config object: low risk, since the config comes from `getSystemConfig()`.
- [x] [Review][Defer] `NODE_ENV`-gated debug logging fires in every non-production environment: acceptable, since the debug output contains no secrets.
- [x] [Review][Defer] Route resolved twice in `chat/route.ts` for debugging: acceptable, since resolution is synchronous and gated by an env flag.

#### Dismissed

- Embedding lane uses `AI_MODEL_TAGS`: by design (matches pre-refactor factory behavior).
- AC6 is wired only in chat: the spec says "one representative path", so this is correct.
- `formatAiRouteDebug` with `resolveMs` undefined: `JSON.stringify` omits undefined properties, so this is correct.
- Cross-lane fallback tags→chat: intentional design.
- Performance tests flaky in CI: acceptable, given the "warm path" comment.