feat(monitoring): business metrics + security hardening

Business metrics in /api/metrics:
- Subscriptions by tier/status (BASIC/PRO/ENTERPRISE × ACTIVE/CANCELED)
- New subscriptions this month vs last month
- Cancellations / churn this month vs last month
- Active users 7d / 30d (proxy: a note was modified)
- New sign-ups 7d / this month
- AI agent runs by status (30d + today) + tokens consumed
- AI usage by feature (requests + tokens this month)
- Logins today / this month (via AuditLog)
- Brainstorm sessions this month
- Flashcards total + reviews this month
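
As an illustration of how the first metric could be exposed, here is a minimal sketch that renders grouped subscription counts in the Prometheus text exposition format. The helper name `renderSubscriptionGauge`, the metric name `memento_subscriptions`, the label names, and the sample counts are all assumptions made for the example; the commit message does not show the actual implementation.

```javascript
// Hypothetical helper: turns grouped subscription counts into Prometheus
// text exposition lines. Metric and label names are assumptions.
function renderSubscriptionGauge(rows) {
  const lines = [
    "# HELP memento_subscriptions Subscriptions by tier and status",
    "# TYPE memento_subscriptions gauge",
  ];
  for (const { tier, status, count } of rows) {
    lines.push(`memento_subscriptions{tier="${tier}",status="${status}"} ${count}`);
  }
  // The exposition format expects a trailing newline after the last sample.
  return lines.join("\n") + "\n";
}

// Made-up sample values, for illustration only.
const sample = [
  { tier: "BASIC", status: "ACTIVE", count: 12 },
  { tier: "PRO", status: "CANCELED", count: 3 },
];
const output = renderSubscriptionGauge(sample);
```

One gauge with `tier` and `status` labels keeps the tier × status matrix queryable from a single series family, which is likely simpler than one metric per combination.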

Prometheus alerts:
- HighChurnRate (> 10 cancellations this month)
- NoNewUsersLast7Days (no sign-ups in 7d)
- AgentRunsHighErrorRate (> 20% agent run errors)
- BusinessMetricsCollectionFailed
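
The thresholds above are evaluated by Prometheus, not by application code, but the conditions can be restated in plain JavaScript to make the boundaries explicit. This is only a sketch: the function names, the input shape, and the `FAILED` status name are assumptions, and BusinessMetricsCollectionFailed is left out because its condition is not stated in the message.

```javascript
// Share of agent runs whose status counts as an error.
// The "FAILED" status name is an assumption, not taken from the commit.
function errorRate(runsByStatus, errorStatuses = ["FAILED"]) {
  let total = 0;
  let errors = 0;
  for (const [status, count] of Object.entries(runsByStatus)) {
    total += count;
    if (errorStatuses.includes(status)) errors += count;
  }
  return total === 0 ? 0 : errors / total;
}

// Returns the names of the alerts whose stated threshold is crossed.
function firingAlerts({ churnThisMonth, signupsLast7d, runsByStatus }) {
  const alerts = [];
  if (churnThisMonth > 10) alerts.push("HighChurnRate");
  if (signupsLast7d === 0) alerts.push("NoNewUsersLast7Days");
  if (errorRate(runsByStatus) > 0.2) alerts.push("AgentRunsHighErrorRate");
  return alerts;
}
```

Both numeric thresholds are strict: exactly 10 cancellations or exactly 20% errors would not fire under this reading.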

Monitoring hardening:
- Monitoring ports → 127.0.0.1 (no longer publicly exposed)
- Images pinned (prometheus v2.53.0, grafana 11.1.0, etc.)
- Placeholder alertmanager-bridge → metalmatze/alertmanager-bot:0.4.3
- /api/metrics secured with a METRICS_TOKEN bearer token
- Prometheus bearer auth via credentials_file
- Redis AOF + 256mb, healthcheck → /api/build-info
- repeat_interval 4h, Alertmanager inhibit_rules
- CI/CD secrets: AUTH_GOOGLE_SECRET, METRICS_TOKEN, GRAFANA, MCP_API_KEY
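
A bearer check for /api/metrics could look roughly like the sketch below. The function name `isAuthorized` and the way the header and token are passed in are assumptions; the actual route handler is not part of the diff shown here. Hashing both values before `timingSafeEqual` is one way to satisfy its equal-length requirement, not necessarily what the commit does.

```javascript
const nodeCrypto = require("node:crypto");

// Hypothetical guard: compares the presented bearer token against the
// expected METRICS_TOKEN value using a constant-time comparison.
function isAuthorized(authorizationHeader, expectedToken) {
  // Fail closed when the token is not configured or the header is missing.
  if (!expectedToken || typeof authorizationHeader !== "string") return false;
  const prefix = "Bearer ";
  if (!authorizationHeader.startsWith(prefix)) return false;
  const presented = authorizationHeader.slice(prefix.length);
  // timingSafeEqual throws on buffers of different length, so both sides
  // are hashed to fixed-size digests first.
  const a = nodeCrypto.createHash("sha256").update(presented).digest();
  const b = nodeCrypto.createHash("sha256").update(expectedToken).digest();
  return nodeCrypto.timingSafeEqual(a, b);
}
```

A handler would then call something like `isAuthorized(req.headers.authorization, process.env.METRICS_TOKEN)` and answer 401 when it returns false.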

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Antigravity
2026-05-29 14:49:34 +00:00
parent 8571080037
commit 79fd6553b7
9 changed files with 352 additions and 51 deletions


@@ -30,7 +30,7 @@ services:
image: redis:7-alpine
container_name: memento-redis
restart: unless-stopped
-command: redis-server --maxmemory 128mb --maxmemory-policy allkeys-lru
+command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru --appendonly yes --appendfsync everysec
volumes:
- redis-data:/data
ports:
@@ -73,7 +73,7 @@ services:
condition: service_healthy
restart: unless-stopped
healthcheck:
-test: ["CMD", "node", "-e", "require('http').get('http://localhost:3000/',r=>process.exit(r.statusCode<500?0:1)).on('error',()=>process.exit(1))"]
+test: ["CMD", "node", "-e", "require('http').get('http://localhost:3000/api/build-info',r=>process.exit(r.statusCode<500?0:1)).on('error',()=>process.exit(1))"]
interval: 15s
timeout: 10s
retries: 5
@@ -150,7 +150,7 @@ services:
cpus: '0.25'
memory: 128M
healthcheck:
-test: ["CMD-SHELL", "wget --header \"x-api-key: 1b11f42537c1442456ea413feee75bac\" -q -O /dev/null http://localhost:3001/ || exit 1"]
+test: ["CMD-SHELL", "wget --header \"x-api-key: ${MCP_API_KEY:-dev-key}\" -q -O /dev/null http://localhost:3001/ || exit 1"]
interval: 30s
timeout: 10s
retries: 3
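
For readability, the Node one-liner used in the app healthcheck can be unrolled as in the sketch below. The names `exitCodeForStatus` and `probe` are introduced here for illustration only; the behavior is meant to mirror the one-liner in the diff (exit 0 for any status below 500, exit 1 on a 5xx or a connection error).

```javascript
const http = require("node:http");

// Same rule as the one-liner: anything below 500 counts as healthy.
function exitCodeForStatus(statusCode) {
  return statusCode < 500 ? 0 : 1;
}

// Issues a GET and reports the exit code the container healthcheck would use.
function probe(url, onDone) {
  http
    .get(url, (res) => {
      res.resume(); // discard the body so the socket is released
      onDone(exitCodeForStatus(res.statusCode));
    })
    .on("error", () => onDone(1)); // connection refused, DNS failure, etc.
}

// Usage inside the container (not executed here):
// probe("http://localhost:3000/api/build-info", (code) => process.exit(code));
```

Note that under this rule a 404 still counts as healthy, so the check confirms that the server answers rather than that /api/build-info exists.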