Story 3.2: Custom LLM Router & OpenRouter Integration
Status: review
Story
As a system, I want to route AI prompts across many providers with OpenRouter as the unified aggregation layer where appropriate, so that we have flexibility in API fulfillment without vendor lock-in.
Epic: Epic 3 — The SaaS Commercial Engine (Monetization & API Cost Protection)
FR coverage: FR17 (dynamic provider switching; aggregation via OpenRouter for long-tail models).
Acceptance Criteria
- [AC1] Central router module: Introduce `memento-note/lib/ai/router.ts` that owns resolution of which backing gateway (`ProviderType` from `factory.ts`) and model identifiers (including OpenRouter `vendor/model` slugs) to use per feature lane (`chat` | `tags` | `embedding`). Resolution must be pure synchronous configuration composition on top of existing `getSystemConfig()` / env keys, with no outbound HTTP inside the router's resolve step.
- [AC2] NFR-P3 (≤50ms routing): From intercept (entry to router resolve) through returning a ready-to-call `AIProvider` instance, wall-clock time MUST stay under 50ms on warm server paths (document measurement approach: `performance.now()` around resolve only in tests or dev-only logging behind `NODE_ENV !== 'production'`).
- [AC3] OpenRouter path: When admin configures `AI_PROVIDER_CHAT`, `AI_PROVIDER_TAGS`, or `AI_PROVIDER_EMBEDDING` as `openrouter`, requests MUST use `CustomOpenAIProvider` against `https://openrouter.ai/api/v1` with models expressed as OpenRouter IDs (e.g. `openai/gpt-4o-mini`, `deepseek/deepseek-chat`). Honor existing env fallback: `OPENROUTER_API_KEY` then `CUSTOM_OPENAI_API_KEY` (already in `createOpenRouterProvider`).
- [AC4] Single choke-point: `getChatProvider`, `getTagsProvider`, and `getEmbeddingsProvider` in `lib/ai/factory.ts` MUST delegate provider construction to the router (or shared resolver invoked by both router and factory) so future stories (3.3 fallback, 3.5 BYOK) attach in one place. Do not leave three divergent code paths that each interpret env/admin config differently.
- [AC5] Regression safety: All existing AI call sites that already use `getChatProvider` / `getTagsProvider` / `getEmbeddingsProvider` continue to work without signature changes (backward-compatible exports).
- [AC6] Observability hook: Router exposes a minimal structured log or debug field (e.g. `{ lane, providerType, modelId, resolveMs }`) usable from one representative path (e.g. `app/api/chat/route.ts`) so operators can verify routing choices, gated so production logs are not noisy (debug flag or `NODE_ENV`).
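To make AC1 concrete, here is a minimal sketch of a synchronous, side-effect-free resolve step. The type and function names follow this story, but the lane-to-key table, the fallback provider, and the fallback model are assumptions for illustration only; the real precedence chain is whatever `factory.ts` implements today.

```typescript
// Sketch only: the real precedence chain lives in factory.ts and may differ.
type AiFeatureLane = "chat" | "tags" | "embedding";

interface ResolvedAiRoute {
  lane: AiFeatureLane;
  providerType: string;
  modelName: string;
}

// Hypothetical lane -> config key table, built from the env keys named in this story.
const LANE_KEYS: Record<AiFeatureLane, { provider: string; model: string }> = {
  chat: { provider: "AI_PROVIDER_CHAT", model: "AI_MODEL_CHAT" },
  tags: { provider: "AI_PROVIDER_TAGS", model: "AI_MODEL_TAGS" },
  embedding: { provider: "AI_PROVIDER_EMBEDDING", model: "AI_MODEL_EMBEDDING" },
};

// Assumed fallbacks for illustration; not the project's actual defaults.
const FALLBACK_PROVIDER = "openrouter";
const FALLBACK_MODEL = "openai/gpt-4o-mini";

// Pure function of (lane, config): no I/O, no awaits, no mutation of config.
function resolveAiRoute(
  lane: AiFeatureLane,
  config: Readonly<Record<string, string | undefined>>,
): ResolvedAiRoute {
  const keys = LANE_KEYS[lane];
  const providerType =
    (config[keys.provider] ?? "").trim().toLowerCase() || FALLBACK_PROVIDER;
  const modelName = (config[keys.model] ?? "").trim() || FALLBACK_MODEL;
  return { lane, providerType, modelName };
}

const route = resolveAiRoute("chat", {
  AI_PROVIDER_CHAT: "OpenRouter",
  AI_MODEL_CHAT: "deepseek/deepseek-chat",
});
console.log(route);
```

Because the function only reads its arguments, the same input always yields the same route, which is what makes the timing budget in AC2 and the table-style tests straightforward to check.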
Tasks / Subtasks
- Task 1: Define router API and types (AC: #1, #4)
  - Subtask 1.1: Add `AiFeatureLane` union (`'chat' | 'tags' | 'embedding'`) and `ResolvedAiRoute` type `{ providerType, modelName, embeddingModelName?, meta }` in `router.ts`
  - Subtask 1.2: Implement `resolveAiRoute(lane, config: Record<string, string>): ResolvedAiRoute` mapping existing env keys (`AI_PROVIDER_CHAT`, `AI_MODEL_CHAT`, `AI_PROVIDER_TAGS`, `AI_MODEL_TAGS`, `AI_PROVIDER_EMBEDDING`, `AI_MODEL_EMBEDDING`, fallbacks per current `factory.ts`)
- Task 2: Wire factory to router (AC: #4, #5)
  - Subtask 2.1: Refactor `getChatProvider` / `getTagsProvider` / `getEmbeddingsProvider` to call `resolveAiRoute` then `getProviderInstance` (export `getProviderInstance` from `factory.ts` if needed, or move instantiation behind `instantiateFromRoute()` in router that internally imports provider constructors)
  - Subtask 2.2: Preserve all validation rules (e.g. embeddings cannot use `anthropic` / `anthropic_custom`)
- Task 3: OpenRouter model IDs & docs (AC: #3)
  - Subtask 3.1: Document default OpenRouter models in dev notes and align `PROVIDER_DEFAULTS.openrouter` if epic requires explicit multi-provider coverage via OpenRouter slugs
  - Subtask 3.2: Ensure HTTP headers expected by OpenRouter (`Authorization`, optional `HTTP-Referer` / `X-Title` per OpenRouter docs) are set in `CustomOpenAIProvider` or router wrapper if missing; verify against current `providers/custom-openai.ts`
- Task 4: NFR-P3 measurement (AC: #2)
  - Subtask 4.1: Add unit test(s) that mock config and assert resolve completes under 50ms (allow generous CI timer slack: test repeated-runs median or a single-run ceiling, with a comment that it excludes cold JIT)
  - Subtask 4.2: Optional micro-benchmark in `tests/` documenting methodology
- Task 5: Observability (AC: #6)
  - Subtask 5.1: Log structured routing meta once per chat request (behind flag or non-production)
- Task 6: Story boundaries, explicit non-goals (AC: #4)
  - Subtask 6.1: Do not implement multi-provider HTTP fallback loops here; that is Story 3.3
  - Subtask 6.2: Do not implement BYOK decryption or `UserAPIKey`; that is Story 3.5. Only leave clean extension seams (`resolveApiKey` TODO or interface stub commented)
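The choke-point idea behind Task 2 can be sketched as three thin public getters that all funnel through one lane-based helper. Everything below is a stand-in under stated assumptions: `getConfigStub`, `ProviderStub`, and the getter signatures are hypothetical, and the real getters return `AIProvider` instances built by `getProviderInstance`.

```typescript
type Lane = "chat" | "tags" | "embedding";

interface Route {
  providerType: string;
  modelName: string;
}

// Hypothetical stand-in for a real AIProvider instance.
interface ProviderStub extends Route {
  lane: Lane;
}

// Hypothetical stand-in for getSystemConfig(); values are illustrative.
function getConfigStub(): Record<string, string | undefined> {
  return {
    AI_PROVIDER_CHAT: "openrouter",
    AI_MODEL_CHAT: "openai/gpt-4o-mini",
    AI_PROVIDER_EMBEDDING: "anthropic",
    AI_MODEL_EMBEDDING: "example-embedding-model",
  };
}

// Providers that the story says must never serve the embedding lane.
const EMBEDDING_BLOCKED: readonly string[] = ["anthropic", "anthropic_custom"];

// The single place where config is interpreted for every lane.
function resolveRoute(lane: Lane, config: Record<string, string | undefined>): Route {
  const suffix = lane.toUpperCase();
  const providerType = config[`AI_PROVIDER_${suffix}`] ?? "";
  const modelName = config[`AI_MODEL_${suffix}`] ?? "";
  if (providerType === "") {
    throw new Error(`No provider configured for lane "${lane}"`);
  }
  if (lane === "embedding" && EMBEDDING_BLOCKED.indexOf(providerType) !== -1) {
    throw new Error(`Provider "${providerType}" cannot be used for embeddings`);
  }
  return { providerType, modelName };
}

// Shared helper: resolve first, then build the provider from the route.
function getProviderForLane(lane: Lane, config = getConfigStub()): ProviderStub {
  const resolved = resolveRoute(lane, config);
  return { ...resolved, lane };
}

// The public getters stay as thin wrappers, so call sites do not change.
const getChatProvider = (config?: Record<string, string | undefined>) =>
  getProviderForLane("chat", config);
const getTagsProvider = (config?: Record<string, string | undefined>) =>
  getProviderForLane("tags", config);
const getEmbeddingsProvider = (config?: Record<string, string | undefined>) =>
  getProviderForLane("embedding", config);

console.log(getChatProvider());
```

With this shape, a later fallback wrapper or key resolver only needs to hook into `getProviderForLane`, rather than into three separate code paths.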
Dev Notes
Epic context
- Story 3.1 delivered Redis-backed entitlements (`checkEntitlementOrThrow`, `trackFeatureUsage`). Router runs after quota passes at API boundaries (e.g. chat route already checks entitlement before `getChatProvider`).
- Story 3.3 will add failure-driven fallback (429/500) within 1.5s. The router should return the primary route only; the retry wrapper belongs in 3.3 or in a thin future `executeWithFallback` module.
Current codebase reality (must read before coding)
- No `lib/ai/router.ts` today. PRD and `byok-billing-patch-v3.md` describe a target architecture; this story delivers the first production slice: deterministic routing + OpenRouter normalization + timing guarantee on resolve.
- `lib/ai/factory.ts` already implements `createOpenRouterProvider` and the `getProviderInstance` switch. Reuse this; avoid parallel provider factories.
- `lib/config.ts` / admin settings expose `AI_PROVIDER_*` and model fields consumed by `getSystemConfig()`. The router MUST honor the same precedence chain as today's `getChatProvider` / `getTagsProvider` / `getEmbeddingsProvider`.
Files — expected touch list
NEW
- `memento-note/lib/ai/router.ts`: route resolution + optional `getProviderForLane()` helper

UPDATE
- `memento-note/lib/ai/factory.ts`: delegate to router; possibly export `getProviderInstance` for tests
- `memento-note/app/api/chat/route.ts`: optional debug log line using resolved meta (AC6)
- `memento-note/tests/unit/`: new `router.test.ts` (or adjacent file) for resolve speed + mapping correctness

READ BEFORE MODIFY
- `memento-note/lib/ai/providers/custom-openai.ts`: OpenRouter compatibility & headers
- `memento-note/lib/entitlements.ts`: ordering vs routing (quota stays outside router)
Testing standards
- Use existing Vitest setup.
- Unit-test `resolveAiRoute` with frozen config objects; no live Redis/DB.
- Prefer deterministic assertions on `{ providerType, modelName }` outputs for each lane.
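One way to approach the timing check is to warm up the resolver, sample it repeatedly against a frozen config, and compare the median to the 50ms budget. The sketch below uses plain helpers rather than Vitest so it runs on its own; `resolveStub`, `timeSync`, and `median` are hypothetical names, and in the real suite the final comparison would be a Vitest assertion against `resolveAiRoute`.

```typescript
// Median of a list of samples; less sensitive to a single slow run than the max.
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Times a synchronous call with performance.now() and returns its result too.
function timeSync<T>(fn: () => T): { result: T; ms: number } {
  const start = performance.now();
  const result = fn();
  return { result, ms: performance.now() - start };
}

// Hypothetical stand-in for the resolver under test.
function resolveStub(config: Readonly<Record<string, string>>): {
  providerType: string;
  modelName: string;
} {
  return { providerType: config.AI_PROVIDER_CHAT, modelName: config.AI_MODEL_CHAT };
}

// Frozen config: any accidental mutation inside the resolver is ignored or throws.
const frozenConfig: Readonly<Record<string, string>> = Object.freeze({
  AI_PROVIDER_CHAT: "openrouter",
  AI_MODEL_CHAT: "openai/gpt-4o-mini",
});

// Warm-up pass so the measurement reflects the warm path, not cold JIT.
for (let i = 0; i < 100; i++) resolveStub(frozenConfig);

const samples: number[] = [];
for (let i = 0; i < 25; i++) {
  samples.push(timeSync(() => resolveStub(frozenConfig)).ms);
}

const medianMs = median(samples);
console.log(`median resolve time: ${medianMs.toFixed(4)} ms (budget: 50 ms)`);
```

Using the median over several runs is a judgment call: it tolerates an occasional scheduler hiccup on shared CI runners while still catching a resolver that has become genuinely slow.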
Dev Agent Guardrails
Technical requirements
- NFR-P3 scope: Applies to router resolution + provider instantiation, not LLM network latency. Do not await HTTP inside resolve.
- Thread-safe: Resolver must remain side-effect-free aside from optional gated logging (no global mutation of config).
- Keys: Never log API keys. Log only provider type and model id.
- i18n: No user-facing strings required for this story unless touching UI (prefer none).
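The logging rules above can be sketched as a formatter that copies only allowlisted fields, plus a small gate. The gate mirrors the `NODE_ENV` / `MEMENTO_AI_ROUTE_DEBUG` condition recorded in the completion notes; the names `formatRouteDebug` and `shouldLogRoute`, and the shape of the `env` argument, are hypothetical.

```typescript
interface RouteDebug {
  lane: string;
  providerType: string;
  modelId: string;
  resolveMs?: number;
}

// Copies only the allowlisted fields, so stray properties such as API keys
// on the input object can never end up in the log line.
function formatRouteDebug(meta: RouteDebug): string {
  const { lane, providerType, modelId, resolveMs } = meta;
  // JSON.stringify drops properties whose value is undefined.
  return JSON.stringify({ lane, providerType, modelId, resolveMs });
}

// Gate: log outside production, or when the debug flag is explicitly set.
function shouldLogRoute(env: Record<string, string | undefined>): boolean {
  return env.NODE_ENV !== "production" || env.MEMENTO_AI_ROUTE_DEBUG === "1";
}

// An object that carries a secret alongside the routing meta.
const metaWithSecret = {
  lane: "chat",
  providerType: "openrouter",
  modelId: "openai/gpt-4o-mini",
  apiKey: "sk-example-not-a-real-key",
};

const line = formatRouteDebug(metaWithSecret);
if (shouldLogRoute({ NODE_ENV: "development" })) {
  console.log(line);
  // prints {"lane":"chat","providerType":"openrouter","modelId":"openai/gpt-4o-mini"}
}
```

Destructuring into a fresh object is a deliberate choice here: it turns "never log keys" into a structural property of the formatter instead of something each caller has to remember.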
Architecture compliance
- Align with brownfield stack: Next.js App Router, existing `AIProvider` interface (`lib/ai/types.ts`).
- OpenRouter model naming: use slash-separated IDs per OpenRouter models API.
- Preserve compatibility with direct providers (`deepseek`, `openai`, …) when admin selects them. OpenRouter is one gateway among many, not forced globally unless configured.
Library / framework requirements
- Reuse `@ai-sdk/*` / existing provider wrappers already in repo; no new HTTP client for routing.
- Do not add heavy dependencies for routing (no new DI framework).
File structure requirements
- Router lives under `memento-note/lib/ai/router.ts` (matches PRD / patch doc path).
- Named exports, TypeScript, match existing `factory.ts` style.
Testing requirements
- Coverage for: OpenRouter lane resolution; embedding lane rejection of anthropic providers; fallback precedence parity with pre-refactor `factory.ts` (golden-table tests comparing old vs new outputs are optional but a strong signal).
Previous Story Intelligence
Source: docs/3-1-freemium-quota-tracking.md
- Entitlements: `checkEntitlementOrThrow(userId, 'chat')` pattern on chat API; feature keys like `'chat'`, `'semantic_search'`, `'auto_tag'`, `'auto_title'`.
- Redis + `ioredis` singleton `lib/redis.ts`; atomic Lua `checkAndConsume`. Routing must not introduce slow Redis calls into the resolve path.
- Review fixes emphasized: no blocking entitlement drift on hot path, parameterized queries elsewhere. Router stays CPU-only on resolve.
- Duplicate helpers consolidated into `lib/quota-utils.ts`; follow the same "single source of truth" mindset for AI env resolution.
Git Intelligence Summary
Recent commits on AI paths:
| Commit | Insight |
|---|---|
| `41596c2` | OpenRouter already falls back to `CUSTOM_OPENAI_API_KEY` when `OPENROUTER_API_KEY` is missing; preserve in router delegation |
| `195e845` | Security-sensitive SQL elsewhere; router changes must not weaken logging or leak secrets |
Latest Technical Information
- OpenRouter: OpenAI-compatible `/chat/completions`, base URL `https://openrouter.ai/api/v1`; models addressed as `provider/model`. Confirm optional headers for attribution in OpenRouter current docs when implementing AC3.
- Vercel AI SDK: Chat flows use `streamText` with provider from `getChatProvider`; unchanged surface after refactor.
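For orientation, the request shape implied by the OpenRouter notes above can be sketched as a pure builder that never touches the network. This is not how `CustomOpenAIProvider` is implemented; `buildOpenRouterChatRequest`, `pickOpenRouterKey`, and the `attribution` parameter are hypothetical names, and the attribution header names should be confirmed against current OpenRouter docs as the note says.

```typescript
const OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1";

interface ChatRequestSpec {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Key precedence from this story: OPENROUTER_API_KEY, then CUSTOM_OPENAI_API_KEY.
function pickOpenRouterKey(env: Record<string, string | undefined>): string {
  const key = env.OPENROUTER_API_KEY || env.CUSTOM_OPENAI_API_KEY;
  if (!key) throw new Error("No OpenRouter API key configured");
  return key;
}

// Builds the request description only; sending it is left to the caller.
function buildOpenRouterChatRequest(
  env: Record<string, string | undefined>,
  model: string, // OpenRouter ID in provider/model form
  userMessage: string,
  attribution?: { referer?: string; title?: string },
): ChatRequestSpec {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${pickOpenRouterKey(env)}`,
    "Content-Type": "application/json",
  };
  // Optional attribution headers; omitted entirely when not provided.
  if (attribution?.referer) headers["HTTP-Referer"] = attribution.referer;
  if (attribution?.title) headers["X-Title"] = attribution.title;
  return {
    url: `${OPENROUTER_BASE_URL}/chat/completions`,
    headers,
    body: JSON.stringify({ model, messages: [{ role: "user", content: userMessage }] }),
  };
}

const spec = buildOpenRouterChatRequest(
  { CUSTOM_OPENAI_API_KEY: "example-key" },
  "deepseek/deepseek-chat",
  "Hello",
  { title: "Example App" },
);
console.log(spec.url, Object.keys(spec.headers));
```

Keeping the builder free of I/O makes the key fallback and header handling easy to unit-test without mocking `fetch`.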
Project Context Reference
- Epics: `docs/epics.md` (Story 3.2 acceptance + FR17 mapping)
- PRD: `docs/prd.md` (FR17, NFR-P3, architecture bullet mentioning `lib/ai/router.ts`)
- Architecture / SaaS: `memento-note/docs/saas-deployment-prep.md` (tiers and AI cost context)
- BYOK / future router: `memento-note/docs/byok-billing-patch-v3.md` §2 (aspirational full router with BYOK + fallback; implement only the routing choke-point slice in this story)
Dev Agent Record
Agent Model Used
Composer (Cursor agent)
Debug Log References
- Vitest: `tests/unit/router.test.ts` (resolve precedence, anthropic embedding guard, OpenRouter slugs, median timing).
Completion Notes List
- Implemented `lib/ai/router.ts` with synchronous `resolveAiRoute`, `resolveAiRouteWithTiming`, and `formatAiRouteDebug`.
- Refactored `getTagsProvider`, `getEmbeddingsProvider`, and `getChatProvider` to delegate to `resolveAiRoute` + exported `getProviderInstance`.
- Chat API logs structured routing JSON when `NODE_ENV !== 'production'` or `MEMENTO_AI_ROUTE_DEBUG=1`.
- Confirmed `CustomOpenAIProvider` already sets `HTTP-Referer` and `X-Title` on outbound fetch (OpenRouter-compatible).
- Documented OpenRouter slugs beside `PROVIDER_DEFAULTS.openrouter`.
- Regression checks: `npm run test:unit -- tests/unit/router.test.ts tests/unit/entitlements.test.ts` (green). Full `npm run test:unit` still reports known unrelated failures (migration suites, Playwright-named files picked up by Vitest).
File List
- `memento-note/lib/ai/router.ts` (new)
- `memento-note/lib/ai/factory.ts` (modified)
- `memento-note/app/api/chat/route.ts` (modified)
- `memento-note/tests/unit/router.test.ts` (new)
Change Log
| Date | Change |
|---|---|
| 2026-05-15 | Story 3.2 implemented: central AI router, factory delegation, chat observability, unit tests |
| 2026-05-15 | Code review: allowlist validation (D1), removed config leak in logs (D2), provider-specific model defaults (D3), null guard in pick(), .catch() on incrementUsageAsync, updated model names to May 2026 |
Story Completion Status
- Story ID: 3.2
- Story Key: `3-2-custom-llm-router`
- Status: review
- Completion Note: Code review patches applied. 3 decisions resolved, 4 patches applied, 15 tests passing.
Review Findings (2026-05-15)
Decision Needed — Resolved
- [Review][Decision → Patch] Unsafe `as AiGatewayProvider` cast with no runtime validation [router.ts:114]. Resolved: `VALID_PROVIDERS` allowlist that throws on an unknown provider.
- [Review][Decision → Patch] `console.error` dumps the whole config, risking API key leaks [router.ts:31,58,86,106]. Resolved: dump removed, clear error message only.
- [Review][Decision → Patch] Default `granite4:latest` (Ollama) sent to non-Ollama providers [router.ts]. Resolved: `PROVIDER_MODEL_DEFAULTS` map with per-provider defaults (deepseek-v4-flash, gpt-4.1-mini, gemini-2.5-flash, etc.).
Patch — Applied
- [Review][Patch] `pick()` does not filter `null`, causing a TypeError on `.toLowerCase()` [router.ts:40]. Fixed: `v != null` instead of `v !== undefined`.
- [Review][Patch] `incrementUsageAsync` is fire-and-forget without `.catch()` [chat/route.ts:61]. Fixed: added `.catch()` with an error log.
- [Review][Patch] Outdated PROVIDER_DEFAULTS model names [factory.ts]. Fixed: updated (deepseek-v4-flash, gpt-4.1-mini, gemini-2.5-flash, mistral-medium-3.5-latest).
- [Review][Patch] Missing tests for provider validation + dynamic defaults [router.test.ts]. Fixed: +9 tests (unknown provider, typo, provider defaults ×6, null config, explicit override).
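Two of the review fixes, the provider allowlist and the null-safe `pick()`, can be illustrated together. This is a sketch under assumptions rather than the project's code: the contents of `VALID_PROVIDERS` are not listed in this story, so the entries below are placeholders, and `parseProvider` and `isKnownProvider` are hypothetical helper names.

```typescript
// Placeholder allowlist; the project's actual VALID_PROVIDERS entries are not shown here.
const VALID_PROVIDERS = ["openrouter", "openai", "deepseek", "ollama"] as const;
type KnownProvider = (typeof VALID_PROVIDERS)[number];

// Returns the first candidate that is neither null/undefined nor blank.
function pick(...candidates: Array<string | null | undefined>): string | undefined {
  for (const v of candidates) {
    // `!= null` rejects both null and undefined; `!== undefined` would let
    // null through and crash on the method call below.
    if (v != null && v.trim() !== "") return v.trim();
  }
  return undefined;
}

// Runtime type guard that replaces an unchecked `as` cast.
function isKnownProvider(value: string): value is KnownProvider {
  return (VALID_PROVIDERS as readonly string[]).indexOf(value) !== -1;
}

// Normalizes and validates a raw config value, throwing on anything unknown.
function parseProvider(raw: string | null | undefined): KnownProvider {
  const normalized = pick(raw)?.toLowerCase();
  if (normalized === undefined || !isKnownProvider(normalized)) {
    // The message names only the rejected value, never the surrounding config.
    throw new Error(`Unknown AI provider: ${String(raw)}`);
  }
  return normalized;
}

console.log(pick(null, undefined, "  OpenRouter ")); // prints OpenRouter
console.log(parseProvider("OpenRouter")); // prints openrouter
```

A typo such as `"openruter"` then fails fast at resolve time with a clear message, instead of surfacing later as an opaque provider error.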
Deferred
- [Review][Defer] OLLAMA_BASE_URL routing inconsistency between router and factory: pre-existing, cfgOnly vs env fallback in createOllamaProvider
- [Review][Defer] Prototype pollution on config object: low risk, config comes from getSystemConfig()
- [Review][Defer] NODE_ENV debug logs in every non-production environment: acceptable, no secrets in the debug output
- [Review][Defer] Double resolution for debug in chat/route.ts: acceptable, synchronous and gated by env flag
Dismissed
- Embedding lane uses AI_MODEL_TAGS: by design (matching pre-refactor factory behavior)
- AC6 only wired in chat: spec says "one representative path", correct
- `formatAiRouteDebug` resolveMs undefined: JSON.stringify omits undefined, correct
- Cross-lane fallback tags→chat: intentional design
- Performance tests flaky in CI: acceptable with "warm path" comment