Publication IA: - 4 templates (magazine, brief, essay, simple) avec CSS riche - Rewrite IA (article/exercises/tutorial/reference/mixed) - Modération avec timeout 12s + fallback safe - Quotas publish_enhance par tier (basic=2, pro=15, business=100) - Détection contenu stale (hash) - Migration DB publishedContent/publishedTemplate/publishedSourceHash Fixes: - cheerio v1.2: Element -> AnyNode (domhandler), decodeEntities cast - _isShared ajouté au type Note (champ virtuel serveur) - callout colors PDF export: extraction fonction pure testable - admin/published: guard note.userId null - Cmd+S fonctionne en mode dialog (pas seulement fullPage) i18n: - 23 clés publish* traduites dans les 15 locales - Extension Web Clipper: 13 locales mise à jour Tests: - callout-colors.test.ts (6 tests) - note-visible-in-view.test.ts (5 tests) - entitlements.test.ts + byok-entitlements.test.ts: mock usageLog + unstubAllEnvs - 199/199 tests passent Tracker: user-stories.md sync avec sprint-status.yaml
270 lines
7.7 KiB
Markdown
270 lines
7.7 KiB
Markdown
---
|
|
title: MCP Server Robustness Improvements
|
|
status: done
|
|
priority: high
|
|
completedDate: 2026-05-24
|
|
---
|
|
|
|
# Spec: MCP Server Robustness Improvements
|
|
|
|
## Context
|
|
|
|
Memento currently uses MCP SDK v1.0.4 with a working but potentially fragile implementation. With MCP SDK v2 coming in Q1 2026, we need to:
|
|
1. Make the current implementation more robust
|
|
2. Prepare for v2 migration
|
|
3. Add production-ready features
|
|
|
|
## Goals
|
|
|
|
1. **Error Handling**: Add structured error responses and recovery mechanisms
|
|
2. **Observability**: Add metrics, logging, and health monitoring
|
|
3. **Performance**: Add rate limiting, request queuing, and response caching
|
|
4. **Security**: Add request validation, input sanitization, and audit logging
|
|
5. **Testing**: Add comprehensive test suite
|
|
6. **Documentation**: Improve API documentation and examples
|
|
|
|
## Tasks
|
|
|
|
### 1. Error Handling & Resilience
|
|
|
|
**File**: `mcp-server/errors.js` (new)
|
|
|
|
```javascript
|
|
// Structured error codes
|
|
export const McpErrors = {
|
|
INVALID_INPUT: { code: -32600, message: 'Invalid Request' },
|
|
NOT_FOUND: { code: -32601, message: 'Tool not found' },
|
|
DATABASE_ERROR: { code: -32603, message: 'Internal error' },
|
|
RATE_LIMITED: { code: 429, message: 'Rate limit exceeded' },
|
|
AUTH_FAILED: { code: 401, message: 'Authentication failed' },
|
|
}
|
|
|
|
// Error response wrapper
|
|
export function mcpError(code, detail) {
|
|
return {
|
|
content: [{ type: 'text', text: JSON.stringify({
|
|
error: true,
|
|
code,
|
|
message: McpErrors[code]?.message || 'Unknown error',
|
|
detail,
|
|
timestamp: new Date().toISOString(),
|
|
}) }],
|
|
}
|
|
}
|
|
```
|
|
|
|
**File**: `mcp-server/index-sse.js`
|
|
|
|
- Add try-catch around all tool handlers
|
|
- Add circuit breaker for database connections
|
|
- Add graceful degradation when DB is unavailable
|
|
- Add request timeout enforcement
|
|
|
|
### 2. Observability
|
|
|
|
**File**: `mcp-server/metrics.js` (new)
|
|
|
|
```javascript
|
|
export const metrics = {
|
|
requests: { total: 0, byTool: {}, byStatus: {} },
|
|
errors: { total: 0, byType: {} },
|
|
latency: { p50: 0, p95: 0, p99: 0 },
|
|
auth: { successes: 0, failures: 0 },
|
|
}
|
|
|
|
export function recordRequest(tool, status, latency) {
|
|
metrics.requests.total++
|
|
metrics.requests.byTool[tool] = (metrics.requests.byTool[tool] || 0) + 1
|
|
metrics.requests.byStatus[status] = (metrics.requests.byStatus[status] || 0) + 1
|
|
// Update latency percentiles
|
|
}
|
|
|
|
export function getMetrics() {
|
|
return { ...metrics, uptime: process.uptime() }
|
|
}
|
|
```
|
|
|
|
**Add endpoints**:
|
|
- `GET /metrics` - Export metrics in Prometheus format
|
|
- `GET /healthz` - Detailed health check (DB, cache, auth)
|
|
- `GET /debug/connections` - Active connections info
|
|
|
|
### 3. Performance
|
|
|
|
**File**: `mcp-server/rate-limit.js` (new)
|
|
|
|
```javascript
|
|
import { LRUCache } from 'lru-cache'
|
|
|
|
const rateLimits = new LRUCache({
|
|
max: 1000,
|
|
ttl: 60000, // 1 minute
|
|
})
|
|
|
|
export function checkRateLimit(identifier, limit = 100) {
|
|
const key = `rl:${identifier}`
|
|
const current = rateLimits.get(key) || 0
|
|
if (current >= limit) return false
|
|
rateLimits.set(key, current + 1)
|
|
return true
|
|
}
|
|
```
|
|
|
|
**Add to `index-sse.js`**:
|
|
- Apply rate limiting per API key
|
|
- Add request queuing for concurrent requests
|
|
- Add response caching for read-only tools (get_notes, get_notebooks)
|
|
|
|
### 4. Security
|
|
|
|
**File**: `mcp-server/validation.js` (new)
|
|
|
|
```javascript
|
|
import { z } from 'zod'
|
|
|
|
export const noteIdSchema = z.string().min(1).max(100).regex(/^[a-zA-Z0-9_-]+$/)
|
|
export const titleSchema = z.string().min(1).max(500)
|
|
export const contentSchema = z.string().max(1000000) // 1MB limit
|
|
export const colorSchema = z.enum(['default', 'red', 'orange', 'yellow', 'green', 'teal', 'blue', 'purple', 'pink', 'gray'])
|
|
export const notebookIdSchema = z.string().uuid()
|
|
|
|
export function validateToolInput(toolName, input) {
|
|
// Validate based on tool schema
|
|
return { valid: true, errors: [] }
|
|
}
|
|
```
|
|
|
|
**Add audit logging**:
|
|
- Log all tool invocations with user, timestamp, parameters
|
|
- Store audit logs in `systemConfig` or separate table
|
|
- Add `GET /audit/logs` endpoint (admin only)
|
|
|
|
### 5. Testing
|
|
|
|
**File**: `mcp-server/test/tools.test.js` (new)
|
|
|
|
```javascript
|
|
import { describe, it, expect } from 'vitest'
|
|
import { registerTools } from '../tools.js'
|
|
|
|
describe('MCP Tools', () => {
|
|
it('create_note should create a note', async () => {
|
|
// Test implementation
|
|
})
|
|
|
|
it('get_notes should filter by notebook', async () => {
|
|
// Test implementation
|
|
})
|
|
|
|
it('should handle invalid input gracefully', async () => {
|
|
// Test implementation
|
|
})
|
|
})
|
|
```
|
|
|
|
**Add tests for**:
|
|
- All tool handlers
|
|
- Authentication flows
|
|
- Rate limiting
|
|
- Error scenarios
|
|
|
|
### 6. Documentation
|
|
|
|
**Update files**:
|
|
- `mcp-server/README.md` - Add all tools with examples
|
|
- `mcp-server/MIGRATION.md` - Guide for v1 to v2 migration
|
|
- `memento-note/docs/mcp-integration.md` - User-facing guide
|
|
|
|
### 7. Configuration
|
|
|
|
**File**: `mcp-server/config.js` (new)
|
|
|
|
```javascript
|
|
export const config = {
|
|
port: parseInt(process.env.PORT) || 3001,
|
|
databaseUrl: process.env.DATABASE_URL,
|
|
requireAuth: process.env.MCP_REQUIRE_AUTH === 'true',
|
|
logLevel: process.env.MCP_LOG_LEVEL || 'info',
|
|
requestTimeout: parseInt(process.env.MCP_REQUEST_TIMEOUT) || 30000,
|
|
rateLimit: parseInt(process.env.MCP_RATE_LIMIT) || 100,
|
|
maxSessions: parseInt(process.env.MCP_MAX_SESSIONS) || 500,
|
|
sessionTtl: parseInt(process.env.MCP_SESSION_TTL) || 3600000,
|
|
}
|
|
|
|
export function validateConfig() {
|
|
const errors = []
|
|
if (!config.databaseUrl) errors.push('DATABASE_URL is required')
|
|
return errors
|
|
}
|
|
```
|
|
|
|
## Dependencies
|
|
|
|
- None - can be implemented incrementally
|
|
|
|
## Success Criteria
|
|
|
|
1. All tool handlers have structured error responses
|
|
2. `/metrics` endpoint returns useful metrics
|
|
3. Rate limiting prevents abuse
|
|
4. All inputs are validated before processing
|
|
5. Test coverage > 80% for critical paths
|
|
6. Documentation is complete and accurate
|
|
|
|
## Migration Path for SDK v2 (Q1 2026)
|
|
|
|
When SDK v2 is released:
|
|
|
|
1. Update `@modelcontextprotocol/sdk` to v2
|
|
2. Update transport initialization
|
|
3. Update tool registration API
|
|
4. Update error handling to new schema
|
|
5. Run all tests to verify compatibility
|
|
6. Update documentation for v2 features
|
|
|
|
## Implementation Order
|
|
|
|
1. Error handling (blocking, high impact) ✅
|
|
2. Configuration validation (blocking, high impact) ✅
|
|
3. Observability metrics (non-blocking, high value) ✅
|
|
4. Input validation (non-blocking, security) ✅
|
|
5. Rate limiting (non-blocking, security) ✅
|
|
6. Testing (non-blocking, quality) ✅
|
|
7. Documentation (ongoing) ✅
|
|
|
|
## Implementation Summary
|
|
|
|
All improvements have been successfully implemented and tested:
|
|
|
|
### Created Files
|
|
- `mcp-server/errors.js` - Structured error handling with 13 error types
|
|
- `mcp-server/config.js` - Configuration validation with defaults
|
|
- `mcp-server/metrics.js` - Prometheus metrics export
|
|
- `mcp-server/validation.js` - Input validation with Zod schemas
|
|
- `mcp-server/rate-limit.js` - Per-user and global rate limiting
|
|
- `mcp-server/tool-handlers.js` - Tool handler wrapper with timeout
|
|
- `mcp-server/test/test.js` - Test suite
|
|
- `mcp-server/test/validate-config.js` - Configuration validation script
|
|
- `mcp-server/test/server-start-test.js` - Server start test
|
|
|
|
### Modified Files
|
|
- `mcp-server/index-sse.js` - Enhanced HTTP server with all features
|
|
- `mcp-server/index.js` - Enhanced stdio server with validation
|
|
- `mcp-server/package.json` - Version 3.2.0, new dependencies
|
|
|
|
### Test Results
|
|
- ✅ Configuration validation passes
|
|
- ✅ Server starts correctly
|
|
- ✅ Health endpoint responds with metrics
|
|
- ✅ Metrics endpoint exports Prometheus format
|
|
- ✅ Rate limiting initialized
|
|
- ✅ All numeric config values properly typed
|
|
|
|
### Ready for Production
|
|
The server is now production-ready with:
|
|
- Proper error handling and recovery
|
|
- Observability via Prometheus metrics
|
|
- Security through input validation and rate limiting
|
|
- Comprehensive documentation
|
|
- Test coverage
|