
Story 3.2: Custom LLM Router & OpenRouter Integration

Status: review

Story

As a system, I want to route AI prompts across many providers with OpenRouter as the unified aggregation layer where appropriate, so that we have flexibility in API fulfillment without vendor lock-in.

Epic: Epic 3 — The SaaS Commercial Engine (Monetization & API Cost Protection)
FR coverage: FR17 (dynamic provider switching; aggregation via OpenRouter for long-tail models).


Acceptance Criteria

  1. [AC1] Central router module: Introduce memento-note/lib/ai/router.ts that owns resolution of which backing gateway (ProviderType from factory.ts) and model identifiers (including OpenRouter vendor/model slugs) to use per feature lane (chat | tags | embedding). Resolution must be pure synchronous configuration composition on top of existing getSystemConfig() / env keys, with no outbound HTTP inside the router's resolve step.
  2. [AC2] NFR-P3 (≤50ms routing): From intercept (entry to router resolve) through returning a ready-to-call AIProvider instance, wall-clock time MUST stay under 50ms on warm server paths (document measurement approach: performance.now() around resolve only in tests or dev-only logging behind NODE_ENV !== 'production').
  3. [AC3] OpenRouter path: When admin configures AI_PROVIDER_CHAT, AI_PROVIDER_TAGS, or AI_PROVIDER_EMBEDDING as openrouter, requests MUST use CustomOpenAIProvider against https://openrouter.ai/api/v1 with models expressed as OpenRouter IDs (e.g. openai/gpt-4o-mini, deepseek/deepseek-chat). Honor existing env fallback: OPENROUTER_API_KEY then CUSTOM_OPENAI_API_KEY (already in createOpenRouterProvider).
  4. [AC4] Single choke-point: getChatProvider, getTagsProvider, and getEmbeddingsProvider in lib/ai/factory.ts MUST delegate provider construction to the router (or shared resolver invoked by both router and factory) so future stories (3.3 fallback, 3.5 BYOK) attach in one place. Do not leave three divergent code paths that each interpret env/admin config differently.
  5. [AC5] Regression safety: All existing AI call sites that already use getChatProvider / getTagsProvider / getEmbeddingsProvider continue to work without signature changes (backward-compatible exports).
  6. [AC6] Observability hook: Router exposes a minimal structured log or debug field (e.g. { lane, providerType, modelId, resolveMs }) usable from one representative path (e.g. app/api/chat/route.ts) so operators can verify routing choices — gated so production logs are not noisy (debug flag or NODE_ENV).
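
The resolution contract in AC1 can be sketched as follows. This is a minimal illustration, not the code in router.ts: the fallback provider and model (ollama, granite4:latest) and the simple one-key-per-lane lookup are assumptions, and the real precedence chain has to be taken from factory.ts.

```typescript
// Sketch of a pure, synchronous route resolver (AC1). No I/O happens here.
type AiFeatureLane = 'chat' | 'tags' | 'embedding';

interface ResolvedAiRoute {
  lane: AiFeatureLane;
  providerType: string; // ProviderType from factory.ts in the real code
  modelName: string;
}

// Config keys named in this story, indexed by lane.
const LANE_KEYS: Record<AiFeatureLane, { provider: string; model: string }> = {
  chat: { provider: 'AI_PROVIDER_CHAT', model: 'AI_MODEL_CHAT' },
  tags: { provider: 'AI_PROVIDER_TAGS', model: 'AI_MODEL_TAGS' },
  embedding: { provider: 'AI_PROVIDER_EMBEDDING', model: 'AI_MODEL_EMBEDDING' },
};

// Hypothetical fallbacks, used only when the config has no value for a lane.
const ASSUMED_DEFAULT_PROVIDER = 'ollama';
const ASSUMED_DEFAULT_MODEL = 'granite4:latest';

function resolveAiRoute(
  lane: AiFeatureLane,
  config: Readonly<Record<string, string | undefined>>,
): ResolvedAiRoute {
  const keys = LANE_KEYS[lane];
  const providerType = (config[keys.provider] ?? ASSUMED_DEFAULT_PROVIDER).trim().toLowerCase();
  const modelName = (config[keys.model] ?? ASSUMED_DEFAULT_MODEL).trim();
  return { lane, providerType, modelName };
}

// Usage: an OpenRouter chat lane keeps its vendor/model slug untouched.
const route = resolveAiRoute('chat', {
  AI_PROVIDER_CHAT: 'openrouter',
  AI_MODEL_CHAT: 'openai/gpt-4o-mini',
});
```

Because the function only reads from the config object passed in, it can be unit-tested with frozen objects and timed in isolation.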

Tasks / Subtasks

  • Task 1: Define router API and types (AC: #1, #4)
    • Subtask 1.1: Add AiFeatureLane union ('chat' | 'tags' | 'embedding') and ResolvedAiRoute type { providerType, modelName, embeddingModelName?, meta } in router.ts
    • Subtask 1.2: Implement resolveAiRoute(lane, config: Record<string, string>): ResolvedAiRoute mapping existing env keys (AI_PROVIDER_CHAT, AI_MODEL_CHAT, AI_PROVIDER_TAGS, AI_MODEL_TAGS, AI_PROVIDER_EMBEDDING, AI_MODEL_EMBEDDING, fallbacks per current factory.ts)
  • Task 2: Wire factory to router (AC: #4, #5)
    • Subtask 2.1: Refactor getChatProvider / getTagsProvider / getEmbeddingsProvider to call resolveAiRoute then getProviderInstance (export getProviderInstance from factory.ts if needed, or move instantiation behind instantiateFromRoute() in router that internally imports provider constructors)
    • Subtask 2.2: Preserve all validation rules (e.g. embeddings cannot use anthropic / anthropic_custom)
  • Task 3: OpenRouter model IDs & docs (AC: #3)
    • Subtask 3.1: Document default OpenRouter models in dev notes and align PROVIDER_DEFAULTS.openrouter if epic requires explicit multi-provider coverage via OpenRouter slugs
    • Subtask 3.2: Ensure HTTP headers expected by OpenRouter (Authorization, optional HTTP-Referer / X-Title per OpenRouter docs) are set in CustomOpenAIProvider or router wrapper if missing — verify against current providers/custom-openai.ts
  • Task 4: NFR-P3 measurement (AC: #2)
    • Subtask 4.1: Add unit test(s) that mock config and assert resolve completes under 50ms. Allow generous CI timer slack: assert on the median of repeated runs, or on a single-run ceiling with a comment noting that cold JIT is excluded.
    • Subtask 4.2: Optional micro-benchmark in tests/ documenting methodology
  • Task 5: Observability (AC: #6)
    • Subtask 5.1: Log structured routing meta once per chat request (behind flag or non-production)
  • Task 6: Story boundaries — explicit non-goals (AC: #4)
    • Subtask 6.1: Do not implement multi-provider HTTP fallback loops here — that is Story 3.3
    • Subtask 6.2: Do not implement BYOK decryption or UserAPIKey — that is Story 3.5; only leave clean extension seams (resolveApiKey TODO or interface stub commented)
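
Tasks 2.1 and 2.2 describe one shared path: resolve, validate, then instantiate. The sketch below illustrates that shape under stated assumptions. getProviderForLane is named in the touch list, but its signature here is hypothetical, AIProviderStub stands in for the real AIProvider interface, and the resolver and constructor are injected only to keep the example self-contained.

```typescript
// Sketch of the single choke-point: resolve first, validate, then instantiate.
type AiFeatureLane = 'chat' | 'tags' | 'embedding';

interface ResolvedAiRoute {
  providerType: string;
  modelName: string;
}

// Stand-in for the AIProvider interface in lib/ai/types.ts.
interface AIProviderStub {
  providerType: string;
  modelName: string;
}

// Providers the story says must be rejected for the embedding lane.
const NO_EMBEDDING_SUPPORT = new Set(['anthropic', 'anthropic_custom']);

function assertLaneSupported(lane: AiFeatureLane, route: ResolvedAiRoute): void {
  if (lane === 'embedding' && NO_EMBEDDING_SUPPORT.has(route.providerType)) {
    throw new Error(`Provider "${route.providerType}" cannot be used for embeddings`);
  }
}

// Hypothetical helper: every lane goes through the same three steps.
function getProviderForLane(
  lane: AiFeatureLane,
  resolve: (lane: AiFeatureLane) => ResolvedAiRoute,
  instantiate: (route: ResolvedAiRoute) => AIProviderStub,
): AIProviderStub {
  const route = resolve(lane);
  assertLaneSupported(lane, route);
  return instantiate(route);
}

// Usage with stubbed resolve and instantiate steps.
const chatProvider = getProviderForLane(
  'chat',
  () => ({ providerType: 'openrouter', modelName: 'deepseek/deepseek-chat' }),
  (route) => ({ ...route }),
);
```

Keeping validation between resolve and instantiate means later stories (fallback, BYOK) can wrap a single function instead of three getters.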

Dev Notes

Epic context

  • Story 3.1 delivered Redis-backed entitlements (checkEntitlementOrThrow, trackFeatureUsage). Router runs after quota passes at API boundaries (e.g. chat route already checks entitlement before getChatProvider).
  • Story 3.3 will add failure-driven fallback (429/500) within 1.5s — router should return primary route only; retry wrapper belongs in 3.3 or in a thin executeWithFallback future module.

Current codebase reality (must read before coding)

  • No lib/ai/router.ts today — PRD and byok-billing-patch-v3.md describe a target architecture; this story delivers the first production slice: deterministic routing + OpenRouter normalization + timing guarantee on resolve.
  • lib/ai/factory.ts already implements createOpenRouterProvider and getProviderInstance switch — reuse this; avoid parallel provider factories.
  • lib/config.ts / admin settings expose AI_PROVIDER_* and model fields consumed by getSystemConfig(). The router MUST honor the same precedence chain as today's getChatProvider / getTagsProvider / getEmbeddingsProvider.

Files — expected touch list

NEW

  • memento-note/lib/ai/router.ts — route resolution + optional getProviderForLane() helper

UPDATE

  • memento-note/lib/ai/factory.ts — delegate to router; possibly export getProviderInstance for tests
  • memento-note/app/api/chat/route.ts — optional debug log line using resolved meta (AC6)
  • memento-note/tests/unit/ — new router.test.ts (or adjacent file) for resolve speed + mapping correctness

READ BEFORE MODIFY

  • memento-note/lib/ai/providers/custom-openai.ts — OpenRouter compatibility & headers
  • memento-note/lib/entitlements.ts — ordering vs routing (quota stays outside router)

Testing standards

  • Use existing Vitest setup.
  • Unit-test resolveAiRoute with frozen config objects — no live Redis/DB.
  • Prefer deterministic assertions on { providerType, modelName } outputs for each lane.
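
One way to approach the timing assertion from AC2 is a median over repeated warm runs. The helper below is a sketch; medianDurationMs, the run counts, and the stubbed resolver are all hypothetical names and values chosen for illustration.

```typescript
// Measure the median wall-clock time of a synchronous function over many runs.
function medianDurationMs(fn: () => void, runs = 101, warmupRuns = 20): number {
  // Warm-up iterations let the JIT settle and are not recorded.
  for (let i = 0; i < warmupRuns; i++) fn();

  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    fn();
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)];
}

// Frozen config, as recommended above: the resolver cannot mutate it.
const frozenConfig = Object.freeze({
  AI_PROVIDER_CHAT: 'openrouter',
  AI_MODEL_CHAT: 'deepseek/deepseek-chat',
});

// Stand-in for resolveAiRoute('chat', frozenConfig).
const resolveStub = () => ({
  providerType: frozenConfig.AI_PROVIDER_CHAT,
  modelName: frozenConfig.AI_MODEL_CHAT,
});

const medianMs = medianDurationMs(() => {
  resolveStub();
});
```

In a Vitest test the final check would then be an assertion such as expect(medianMs).toBeLessThan(50). The median is less sensitive to a single slow CI tick than a single-run ceiling.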

Dev Agent Guardrails

Technical requirements

  • NFR-P3 scope: Applies to router resolution + provider instantiation, not LLM network latency. Do not await HTTP inside resolve.
  • Thread-safe: Resolver must remain side-effect-free aside from optional gated logging (no global mutation of config).
  • Keys: Never log API keys. Log only provider type and model id.
  • i18n: No user-facing strings required for this story unless touching UI (prefer none).
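
The logging rules above (no keys, only provider type and model id, gated output) can be sketched like this. formatAiRouteDebug and MEMENTO_AI_ROUTE_DEBUG appear in this story, but the signature shown is an assumption, and shouldLogAiRoute is a hypothetical helper.

```typescript
interface AiRouteDebugMeta {
  lane: string;
  providerType: string;
  modelId: string;
  resolveMs?: number;
}

// Build the payload from an explicit field list so that API keys or other
// config values cannot end up in the output by accident.
function formatAiRouteDebug(meta: AiRouteDebugMeta): string {
  const { lane, providerType, modelId, resolveMs } = meta;
  // JSON.stringify drops resolveMs when it is undefined.
  return JSON.stringify({ lane, providerType, modelId, resolveMs });
}

// Gate: log outside production, or when the debug flag is set explicitly.
function shouldLogAiRoute(env: Record<string, string | undefined>): boolean {
  return env.NODE_ENV !== 'production' || env.MEMENTO_AI_ROUTE_DEBUG === '1';
}

// Usage: one structured line per chat request, only when the gate allows it.
const debugLine = formatAiRouteDebug({
  lane: 'chat',
  providerType: 'openrouter',
  modelId: 'openai/gpt-4o-mini',
});
if (shouldLogAiRoute({ NODE_ENV: 'development' })) {
  console.log(debugLine);
}
```

Copying named fields, rather than spreading the whole route or config object, is what keeps secrets out of the log even if extra properties are attached later.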

Architecture compliance

  • Align with brownfield stack: Next.js App Router, existing AIProvider interface (lib/ai/types.ts).
  • OpenRouter model naming: use slash-separated IDs per OpenRouter models API.
  • Preserve compatibility with direct providers (deepseek, openai, …) when admin selects them — OpenRouter is one gateway among many, not forced globally unless configured.

Library / framework requirements

  • Reuse @ai-sdk/* / existing provider wrappers already in repo — no new HTTP client for routing.
  • Do not add heavy dependencies for routing (no new DI framework).

File structure requirements

  • Router lives under memento-note/lib/ai/router.ts (matches PRD / patch doc path).
  • Named exports, TypeScript, match existing factory.ts style.

Testing requirements

  • Coverage for: OpenRouter lane resolution; embedding lane rejection of anthropic providers; fallback precedence parity with pre-refactor factory.ts (golden-table tests comparing old vs new outputs optional — strong signal).

Previous Story Intelligence

Source: docs/3-1-freemium-quota-tracking.md

  • Entitlements: checkEntitlementOrThrow(userId, 'chat') pattern on chat API; feature keys like 'chat', 'semantic_search', 'auto_tag', 'auto_title'.
  • Redis + ioredis singleton lib/redis.ts; atomic Lua checkAndConsume — routing must not introduce slow Redis calls into resolve path.
  • Review fixes emphasized: no blocking entitlement drift on hot path, parameterized queries elsewhere — router stays CPU-only on resolve.
  • Duplicate helpers consolidated into lib/quota-utils.ts — follow same “single source of truth” mindset for AI env resolution.

Git Intelligence Summary

Recent commits on AI paths:

Commit | Insight
41596c2 | OpenRouter already falls back to CUSTOM_OPENAI_API_KEY when OPENROUTER_API_KEY is missing; preserve this in router delegation
195e845 | Security-sensitive SQL elsewhere; router changes must not weaken logging or leak secrets

Latest Technical Information

  • OpenRouter: OpenAI-compatible /chat/completions base URL https://openrouter.ai/api/v1; models addressed as provider/model. Confirm optional headers for attribution in OpenRouter current docs when implementing AC3.
  • Vercel AI SDK: Chat flows use streamText with provider from getChatProvider — unchanged surface after refactor.
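
The OpenRouter connection details above, together with the key fallback from AC3, can be sketched as a small settings builder. buildOpenRouterSettings and its attribution parameter are hypothetical; how CustomOpenAIProvider actually receives its base URL and headers should be checked in providers/custom-openai.ts.

```typescript
const OPENROUTER_BASE_URL = 'https://openrouter.ai/api/v1';

interface OpenRouterSettings {
  baseURL: string;
  headers: Record<string, string>;
}

function buildOpenRouterSettings(
  env: Record<string, string | undefined>,
  attribution?: { referer?: string; title?: string },
): OpenRouterSettings {
  // Key precedence from this story: OPENROUTER_API_KEY, then CUSTOM_OPENAI_API_KEY.
  const apiKey = env.OPENROUTER_API_KEY || env.CUSTOM_OPENAI_API_KEY;
  if (!apiKey) {
    throw new Error('No OpenRouter API key configured');
  }

  const headers: Record<string, string> = { Authorization: `Bearer ${apiKey}` };
  // Optional attribution headers; sent only when values are provided.
  if (attribution?.referer) headers['HTTP-Referer'] = attribution.referer;
  if (attribution?.title) headers['X-Title'] = attribution.title;

  return { baseURL: OPENROUTER_BASE_URL, headers };
}

// Usage: only the fallback key is set, so it is the one that gets used.
const settings = buildOpenRouterSettings(
  { CUSTOM_OPENAI_API_KEY: 'example-key' },
  { title: 'Memento' },
);
```

Using || rather than ?? for the key lookup treats an empty string the same as a missing variable, which is usually the safer reading of an unset env value.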

Project Context Reference

  • Epics: docs/epics.md — Story 3.2 acceptance + FR17 mapping
  • PRD: docs/prd.md — FR17, NFR-P3, architecture bullet mentioning lib/ai/router.ts
  • Architecture / SaaS: memento-note/docs/saas-deployment-prep.md — tiers and AI cost context
  • BYOK / future router: memento-note/docs/byok-billing-patch-v3.md §2 — aspirational full router (BYOK + fallback); implement only the routing choke-point slice in this story

Dev Agent Record

Agent Model Used

Composer (Cursor agent)

Debug Log References

  • Vitest: tests/unit/router.test.ts (resolve precedence, anthropic embedding guard, OpenRouter slugs, median timing).

Completion Notes List

  • Implemented lib/ai/router.ts with synchronous resolveAiRoute, resolveAiRouteWithTiming, and formatAiRouteDebug.
  • Refactored getTagsProvider, getEmbeddingsProvider, and getChatProvider to delegate to resolveAiRoute + exported getProviderInstance.
  • Chat API logs structured routing JSON when NODE_ENV !== 'production' or MEMENTO_AI_ROUTE_DEBUG=1.
  • Confirmed CustomOpenAIProvider already sets HTTP-Referer and X-Title on outbound fetch (OpenRouter-compatible).
  • Documented OpenRouter slugs beside PROVIDER_DEFAULTS.openrouter.
  • Regression checks: npm run test:unit -- tests/unit/router.test.ts tests/unit/entitlements.test.ts (green). Full npm run test:unit still reports known unrelated failures (migration suites, Playwright-named files picked up by Vitest).

File List

  • memento-note/lib/ai/router.ts (new)
  • memento-note/lib/ai/factory.ts (modified)
  • memento-note/app/api/chat/route.ts (modified)
  • memento-note/tests/unit/router.test.ts (new)

Change Log

Date | Change
2026-05-15 | Story 3.2 implemented: central AI router, factory delegation, chat observability, unit tests
2026-05-15 | Code review: allowlist validation (D1), removed config leak in logs (D2), provider-specific model defaults (D3), null guard in pick(), .catch() on incrementUsageAsync, updated model names to May 2026

Story Completion Status

  • Story ID: 3.2
  • Story Key: 3-2-custom-llm-router
  • Status: review
  • Completion Note: Code review patches applied. 3 decisions resolved, 4 patches applied, 15 tests passing.

Review Findings (2026-05-15)

Decision Needed — Resolved

  • [Review][Decision → Patch] Unsafe as AiGatewayProvider cast with no runtime validation [router.ts:114]. Resolved: VALID_PROVIDERS allowlist that throws on an unknown provider.
  • [Review][Decision → Patch] console.error dumps the entire config, risking API key leaks [router.ts:31,58,86,106]. Resolved: dump removed, only a clear error message remains.
  • [Review][Decision → Patch] Default granite4:latest (Ollama) sent to non-Ollama providers [router.ts]. Resolved: PROVIDER_MODEL_DEFAULTS map with per-provider defaults (deepseek-v4-flash, gpt-4.1-mini, gemini-2.5-flash, etc.).
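
The first and third decisions above can be illustrated together. VALID_PROVIDERS, AiGatewayProvider, and PROVIDER_MODEL_DEFAULTS are names from this review, while the provider identifiers, the provider-to-model pairing, the OpenRouter default, and the helper names parseProvider and defaultModelFor are assumptions made for this sketch.

```typescript
// Assumed provider identifiers; the real list lives in router.ts / factory.ts.
const VALID_PROVIDERS = ['ollama', 'openai', 'deepseek', 'gemini', 'openrouter'] as const;
type AiGatewayProvider = (typeof VALID_PROVIDERS)[number];

// Runtime validation replaces an unchecked `as AiGatewayProvider` cast.
function parseProvider(raw: string): AiGatewayProvider {
  const normalized = raw.trim().toLowerCase();
  const match = VALID_PROVIDERS.find((p) => p === normalized);
  if (match === undefined) {
    // The message names only the rejected value, never the surrounding config.
    throw new Error(`Unknown AI provider "${normalized}"`);
  }
  return match;
}

// Per-provider defaults, so an Ollama model name is never sent elsewhere.
// The pairing below is illustrative.
const PROVIDER_MODEL_DEFAULTS: Record<AiGatewayProvider, string> = {
  ollama: 'granite4:latest',
  openai: 'gpt-4.1-mini',
  deepseek: 'deepseek-v4-flash',
  gemini: 'gemini-2.5-flash',
  openrouter: 'openai/gpt-4o-mini',
};

function defaultModelFor(provider: AiGatewayProvider): string {
  return PROVIDER_MODEL_DEFAULTS[provider];
}

// Usage: input is normalized, validated, and then mapped to its own default.
const provider = parseProvider(' DeepSeek ');
const model = defaultModelFor(provider);
```

Typing the defaults map as Record<AiGatewayProvider, string> makes the compiler flag any provider added to the allowlist without a default.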

Patch — Applied

  • [Review][Patch] pick() does not filter null, causing a TypeError on .toLowerCase() [router.ts:40]. Fixed: v != null instead of v !== undefined.
  • [Review][Patch] incrementUsageAsync is fire-and-forget without .catch() [chat/route.ts:61]. Fixed: added .catch() with error logging.
  • [Review][Patch] Obsolete PROVIDER_DEFAULTS model names [factory.ts]. Fixed: updated (deepseek-v4-flash, gpt-4.1-mini, gemini-2.5-flash, mistral-medium-3.5-latest).
  • [Review][Patch] Missing tests for provider validation + dynamic defaults [router.test.ts]. Fixed: +9 tests (unknown provider, typo, provider defaults ×6, null config, explicit override).
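
The first two patches above are small enough to show in isolation. This is a sketch under assumptions: the real pick() signature and its trimming behavior are not shown in this document, and incrementUsageStub is a hypothetical stand-in for incrementUsageAsync.

```typescript
// First configured value wins; null, undefined, and blank strings are skipped.
// `v != null` (loose inequality) rejects both null and undefined, whereas
// `v !== undefined` lets null through and then fails on the string method call.
function pick(...values: Array<string | null | undefined>): string | undefined {
  for (const v of values) {
    if (v != null && v.trim() !== '') return v.trim();
  }
  return undefined;
}

// Hypothetical stand-in for incrementUsageAsync.
async function incrementUsageStub(shouldFail: boolean): Promise<void> {
  if (shouldFail) throw new Error('usage store unavailable');
}

// Fire-and-forget with an attached .catch(), so a failed usage write is
// reported to the handler instead of becoming an unhandled rejection.
function trackUsageInBackground(shouldFail: boolean, onError: (err: unknown) => void): void {
  incrementUsageStub(shouldFail).catch(onError);
}

// Usage: a null admin value falls through to the env value.
const providerName = pick(null, undefined, '  OpenRouter ')?.toLowerCase();
trackUsageInBackground(true, (err) => console.error('usage tracking failed', err));
```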

Deferred

  • [Review][Defer] OLLAMA_BASE_URL routing inconsistency between router and factory: pre-existing, cfgOnly vs env fallback in createOllamaProvider
  • [Review][Defer] Prototype pollution on the config object: low risk, config comes from getSystemConfig()
  • [Review][Defer] NODE_ENV debug logs in every non-production environment: acceptable, no secrets in the debug output
  • [Review][Defer] Double resolution for debug in chat/route.ts: acceptable, synchronous and gated by env flag

Dismissed

  • Embedding lane uses AI_MODEL_TAGS: by design (matches pre-refactor factory behavior)
  • AC6 only wired into chat: spec says "one representative path", correct
  • formatAiRouteDebug resolveMs undefined: JSON.stringify omits undefined, correct
  • Cross-lane fallback tags→chat: intentional design
  • Performance tests flaky in CI: acceptable with a "warm path" comment