diff --git a/DISASTER_RECOVERY.md b/DISASTER_RECOVERY.md
index 348daee..888b6dc 100644
--- a/DISASTER_RECOVERY.md
+++ b/DISASTER_RECOVERY.md
@@ -1,102 +1,319 @@
-# Full Backup & Business Continuity Playbook (Disaster Recovery)
-> Handling hardware failures, backing up Nginx Proxy Manager (NPM), and remote transfer (without a NAS).
+# Disaster Recovery - Wordly.art
+## Complete operational guide

 ---

-## 🎯 Goal
-This document explains how to automate backups and restore the entire **Wordly.art** SaaS platform (database, the `.env` configuration file containing your secrets, and the SSL/proxy routing configuration of **Nginx Proxy Manager**) on a new server if the main server crashes.
+## Architecture

----
-
-## ⚙️ 1. Configuration variables in `.env`
-
-To enable the disaster recovery options, add these variables to your production `.env` file:
-
-```ini
-# ============== Disaster Recovery (DR) configuration ==============
-# Destination type: LOCAL, NAS, or SCP
-BACKUP_DEST_TYPE=LOCAL
-# Local path or mount point (e.g. /mnt/nas-backups/wordly)
-BACKUP_DEST_PATH=/var/backups/wordly
-
-# SSH/SCP configuration (required only if BACKUP_DEST_TYPE=SCP)
-SCP_HOST=192.168.1.200
-SCP_USER=backup_user
-SCP_KEY_PATH=/root/.ssh/id_rsa
-SCP_PORT=22
-SCP_DEST_PATH=/var/backups/wordly_saas
-
-# Nginx Proxy Manager (NPM) directories
-# Leave empty if NPM runs on another machine and is not managed here.
-NPM_DATA_DIR=/opt/npm/data
-NPM_LETSENCRYPT_DIR=/opt/npm/letsencrypt
+```
+            [ Internet ]
+                │
+                ▼ (80/443)
+   ┌────────────────────────────────────┐
+   │ Dedicated NPM: 192.168.1.184       │ ← STABLE (does not go down)
+   │ Admin interface: :81               │
+   └────────────┬───────────────────────┘
+                │ Forward Hostname → IP of the active server
+                ▼
+   ┌────────────────────────────────────┐
+   │ APP server: 192.168.1.151          │ ← MAY CRASH
+   │ Docker: postgres, redis,           │
+   │         backend:8001, frontend:3000│
+   └────────────┬───────────────────────┘
+                │ rsync over SSH every 6h (cron)
+                ▼
+   ┌────────────────────────────────────────────────┐
+   │ Synology NAS: 192.168.1.146                    │ ← SOURCE OF TRUTH
+   │ Actual path: /volume1/backups/wordly           │
+   │ Access: SSH key (wordly-backup@nas)            │
+   │ No CIFS mount, direct rsync                    │
+   └────────────┬───────────────────────────────────┘
+                │ (if .151 crashes)
+                ▼
+   ┌────────────────────────────────────┐
+   │ STANDBY server: 192.168.1.98       │ ← Docker already installed
+   │ Restores via rsync SSH from the NAS│
+   │ → NPM redirected automatically     │
+   └────────────────────────────────────┘
 ```

+**Why rsync over SSH rather than CIFS/SMB?**
+- No mount to manage and no `/etc/fstab` entry to configure
+- Keeps working after a NAS reboot (no stale mount)
+- The exact path `/volume1/backups/wordly` can be used directly
+- Traffic is encrypted by SSH, and a passwordless key allows automation
+
 ---

-## 🛠️ 2. How to configure remote backups (SCP mode)
+## RPO / RTO

-If you do not have a NAS, **SCP** mode sends the full archive every night to another machine or computer on your local network (e.g. `192.168.1.200`).
+| Scenario | Max data loss (RPO) | Time to recovery (RTO) | Procedure |
+|----------|---------------------|------------------------|-----------|
+| Container crashes | 0 | ~30s | Docker autorestart |
+| PostgreSQL process crashes | 0–5s | ~1 min | Autorestart + WAL |
+| Partial DB corruption | 0–6h | ~5 min | Restore from NAS |
+| Server .151 dead | 0–6h | **~25 min** | Restore from NAS on .98 + automatic NPM switch |
+| Human error (DROP) | 0–6h | ~5 min | Restore previous snapshot |
+
+---
+
+## What is backed up
+
+| Component | Backed up | Frequency / notes |
+|-----------|-----------|-------------------|
+| PostgreSQL `pg_dump` | ✅ | Every 6h |
+| `.env` (secrets, API keys, Stripe...) | ✅ | In every DR archive |
+| `docker-compose.yml` | ✅ | In every DR archive |
+| `docker/` folder (configs) | ✅ | In every DR archive |
+| Redis | ❌ | Cache only; sessions are lost on restore (users must log in again) |
+| NPM config | ❌ | NPM runs on .184 (stable). Only the Forward Host changes, via the API. |
+| Prometheus metrics | ❌ | Not critical, restarts from zero |
+
+---
+
+## 1. INITIAL SETUP (once, on .151)
+
+### Step 1: Create the account on the Synology NAS
+
+**Log in to the DSM interface: `http://192.168.1.146:5000`**
+
+#### 1a. Create the dedicated user
+
+```
+DSM → Control Panel → User & Group → Create
+  Username : wordly-backup
+  Password : [choose a strong password]
+  ☑ The user cannot change their password
+  → Next
+
+Permissions on shared folders:
+  backups → ☑ Read/Write
+  → Next → Done
+```
+
+#### 1b. Enable SSH on the NAS
+
+```
+DSM → Control Panel → Terminal & SNMP
+  ☑ Enable SSH service
+  Port : 22 (or another port if you changed it)
+  → Apply
+```
+
+#### 1c. Create the wordly folder on the NAS

-### Step A: Generate an SSH key on the main server
-On the application server (`192.168.1.151`), if you do not have an SSH key yet:
 ```bash
-sudo ssh-keygen -t rsa -b 4096 -N "" -f /root/.ssh/id_rsa
-```
-
-### Step B: Authorize the connection on the backup machine
-Copy the public key to your backup machine (`192.168.1.200`):
-```bash
-sudo ssh-copy-id -i /root/.ssh/id_rsa.pub backup_user@192.168.1.200
-```
-*Check*: run `sudo ssh -i /root/.ssh/id_rsa backup_user@192.168.1.200` from the main server. You must be able to log in **without typing a password**.
-
----
-
-## 📅 3. Daily automation
-
-Add the script to your crontab so that it runs automatically every night at 03:30:
-```bash
-sudo crontab -e
-```
-Add this line at the very end:
-```cron
-30 3 * * * /opt/wordly/scripts/disaster-recovery.sh --backup >> /var/log/wordly-dr-backup.log 2>&1
+# From your workstation (or any machine on the network):
+ssh admin@192.168.1.146
+mkdir -p /volume1/backups/wordly/snapshots
+mkdir -p /volume1/backups/wordly/scripts
+chown -R wordly-backup:users /volume1/backups/wordly
+exit
 ```

 ---

-## 🚨 4. Restore procedure on a new server (Failover)
+### Step 2: Configure the variables in `.env` on `.151`

-If the main server crashes completely and you need to rebuild the infrastructure on a standby server (e.g. `192.168.1.152`):
-
-### Step 4.1: Retrieve the backup archive
-Retrieve the latest `wordly_dr_TIMESTAMP.tar.gz` file from your backup storage (NAS, remote backup machine via SCP, or USB key).
-
-### Step 4.2: Install Docker on the new server
 ```bash
-curl -fsSL https://get.docker.com | sh
-sudo usermod -aG docker $USER && newgrp docker
+# ── NAS SSH ───────────────────────────────────
+NAS_HOST=192.168.1.146
+NAS_USER=wordly-backup
+NAS_PATH=/volume1/backups/wordly
+NAS_SSH_PORT=22
+NAS_SSH_KEY=/root/.ssh/wordly_nas_key
+
+# ── Telegram alerts ───────────────────────────
+TELEGRAM_BOT_TOKEN= # See "Create a Telegram bot" below
+TELEGRAM_CHAT_ID= # Your personal chat ID
+
+# ── NPM Failover API ──────────────────────────
+NPM_API_URL=http://192.168.1.184:81/api
+NPM_ADMIN_EMAIL=admin@wordly.art
+NPM_ADMIN_PASSWORD=YourNPMPassword
+NPM_PROXY_HOST_DOMAIN=wordly.art
+
+# ── Retention ────────────────────────────────
+DAILY_RETENTION=7
+WEEKLY_RETENTION=4
+MONTHLY_RETENTION=6
+DR_RETENTION_DAYS=30
 ```

-### Step 4.3: Run the automatic restore
-1. Create the destination folder and move into it:
-   ```bash
-   sudo mkdir -p /opt/wordly
-   cd /opt/wordly
-   ```
-2. Run the restore script from the archive (the script extracts the `.env`, copies the `docker-compose.yml`, restores the NPM configuration and SSL certificates, starts Docker, and reloads the database data):
-   ```bash
-   # Replace with the exact name or path of your archive
-   bash /path/to/your/archive/scripts/disaster-recovery.sh --restore /path/to/your/archive/wordly_dr_20260607_033000.tar.gz
-   ```
-3. Confirm the action by typing `RESTORE-ALL` when the script asks for it.
+---

-### Step 4.4: Redirect network traffic
-Since the server's IP address has changed (from `192.168.1.151` to `192.168.1.152`):
+### Step 3: Create a Telegram bot (5 minutes)

-#### Case A: NPM was running on the server that crashed
-The script has restored NPM on the new machine. You only need to open your internet router and change the forwarding of ports **80** and **443** (Port Forwarding) so that they point to the new IP `192.168.1.152` instead of `192.168.1.151`.
+1. Open Telegram → search for **@BotFather**
+2. Send `/newbot`
+3. Name: `Wordly Monitoring` / Username: `wordly_monitor_bot`
+4. Copy the token → `TELEGRAM_BOT_TOKEN`
+5. Send a message to your bot
+6. Open `https://api.telegram.org/bot<TELEGRAM_BOT_TOKEN>/getUpdates`
+7. Copy the `chat.id` → `TELEGRAM_CHAT_ID`

-#### Case B: NPM runs on a dedicated external machine
-Log in to the web interface of your NPM (http://IP_NPM:81), edit the Proxy Hosts for `wordly.art`, and change the **Forward Hostname/IP** field to replace `192.168.1.151` with the new IP `192.168.1.152`.
+---
+
+### Step 4: Configure passwordless SSH to the NAS
+
+```bash
+# On server .151 (as root)
+sudo bash scripts/setup-nas.sh
+```
+
+This script:
+- Generates a dedicated SSH key: `/root/.ssh/wordly_nas_key`
+- Copies it to the NAS (**password requested only once**)
+- Tests the passwordless connection
+- Creates the folder structure on the NAS
+- Configures `~/.ssh/config` with the `wordly-nas` alias
+- Copies the scripts to the NAS (available from `.98` for the restore)
+
+---
+
+### Step 5: Test the first backup
+
+```bash
+bash scripts/backup-to-nas.sh --full
+
+# Check that the archive has arrived on the NAS
+bash scripts/backup-to-nas.sh --list
+```
+
+---
+
+### Step 6: Test the automatic verification
+
+```bash
+bash scripts/verify-backups.sh
+```
+
+---
+
+### Step 7: Test the NPM failover (without changing anything)
+
+```bash
+bash scripts/npm-failover.sh --dry-run --target-ip 192.168.1.98
+```
+
+---
+
+### Step 8: Enable the cron jobs
+
+```bash
+bash scripts/install-crontab.sh
+crontab -l # Check
+```
+
+---
+
+## 2. SCHEDULED VERIFICATION (automatic, every 6h)
+
+```
+0 */6 * * *  backup-to-nas.sh  → DB snapshot + archive → NAS via rsync SSH
+30 */6 * * * verify-backups.sh → 8 checks + Telegram alert on error
+```
+
+---
+
+## 3. EMERGENCY RESTORE (when .151 is dead)
+
+> **Estimated duration: 20–25 minutes**
+
+### On the standby server `192.168.1.98`
+
+```bash
+# 1. Install the prerequisites (Docker is already installed)
+apt-get install -y rsync jq
+
+# 2. Retrieve the NAS SSH key from a secure source (not from the NAS itself, which requires this key)
+# Option A: copy the key from a safe place (password manager, etc.)
+mkdir -p /root/.ssh && chmod 700 /root/.ssh
+# paste the contents of /root/.ssh/wordly_nas_key here
+nano /root/.ssh/wordly_nas_key
+chmod 600 /root/.ssh/wordly_nas_key
+
+# 3. Test the NAS connection
+ssh -i /root/.ssh/wordly_nas_key wordly-backup@192.168.1.146 "echo OK"
+
+# 4. List the available archives
+ssh -i /root/.ssh/wordly_nas_key wordly-backup@192.168.1.146 \
+  "ls -lht /volume1/backups/wordly/snapshots/ | head -10"
+
+# 5. Download the latest archive from the NAS
+rsync -az \
+  -e "ssh -i /root/.ssh/wordly_nas_key" \
+  wordly-backup@192.168.1.146:/volume1/backups/wordly/snapshots/wordly_dr_TIMESTAMP.tar.gz \
+  /tmp/
+
+# 6. Download the restore scripts from the NAS
+mkdir -p /opt/wordly/scripts
+rsync -az \
+  -e "ssh -i /root/.ssh/wordly_nas_key" \
+  wordly-backup@192.168.1.146:/volume1/backups/wordly/scripts/ \
+  /opt/wordly/scripts/
+
+# 7. Run the full restore
+bash /opt/wordly/scripts/disaster-recovery.sh \
+  --restore /tmp/wordly_dr_TIMESTAMP.tar.gz
+```
+
+**The script automatically:**
+1. Extracts `.env`, `docker-compose.yml`, and the Docker configs
+2. Starts all Docker containers
+3. Waits until PostgreSQL is healthy
+4. Restores the SQL dump
+5. Runs a health check on `http://localhost:8001/health` (max 180s)
+6. **If OK → calls the NPM API → switches traffic to `192.168.1.98`**
+7. **Telegram alert: "✅ Wordly.art DR COMPLET"**
+
+**If the automatic NPM failover fails (last resort):**
+```
+http://192.168.1.184:81 → Proxy Hosts → wordly.art → Edit
+  Forward Hostname : 192.168.1.98
+  → Save
+# Takes effect immediately, no restart needed
+```
+
+---
+
+## 4. KEEPING THE NAS SSH KEY SAFE
+
+> [!IMPORTANT]
+> The key `/root/.ssh/wordly_nas_key` is **critical** for restoring from `.98`.
+> Keep it in at least 2 secure places:
+> - Password manager (Bitwarden, 1Password, etc.)
+> - Encrypted KeePass vault on physical media
+>
+> Without this key, you cannot access the archives on the NAS from `.98`.
+
+---
+
+## 5. SCRIPT REFERENCE
+
+| Script | Usage | Trigger |
+|--------|-------|---------|
+| `setup-nas.sh` | Configures SSH → NAS, generates the key, copies the scripts | **Once** (root required) |
+| `backup-to-nas.sh` | pg_dump + DR archive → NAS via rsync SSH | Cron every 6h |
+| `backup-to-nas.sh --list` | Lists the archives available on the NAS | Manual |
+| `verify-backups.sh` | 8 integrity checks + Telegram alert | Cron every 6h, at minute 30 |
+| `disaster-recovery.sh --backup` | DR archive → NAS | Included in backup-to-nas |
+| `disaster-recovery.sh --restore <archive>` | Full restore | **Emergency** |
+| `npm-failover.sh --target-ip <IP>` | Switches NPM to an IP | Called automatically |
+| `npm-failover.sh --dry-run --target-ip <IP>` | Test without modifying NPM | Initial test |
+| `install-crontab.sh` | Installs the cron jobs | **Once** |
+
+---
+
+## 6. LOGS
+
+```bash
+# Backup logs (on .151)
+tail -f /var/log/wordly-backup.log
+
+# Verification logs (on .151)
+tail -f /var/log/wordly-verify.log
+
+# Docker logs (on the active server)
+docker compose logs -f backend
+docker compose logs -f postgres
+```
diff --git a/scripts/backup-to-nas.sh b/scripts/backup-to-nas.sh
index 7ea06d5..6b89a41 100644
--- a/scripts/backup-to-nas.sh
+++ b/scripts/backup-to-nas.sh
@@ -1,287 +1,356 @@
 #!/bin/bash
-# ============================================
-# Wordly.art - PostgreSQL Backup to NAS
-# ============================================
-# CRON: Run daily at 03:00
-# 0 3 * * * /opt/wordly/scripts/backup-to-nas.sh >> /var/log/wordly-backup.log 2>&1
+# ==============================================================================
+# Wordly.art - PostgreSQL Backup to Synology NAS via SSH/rsync
+# ==============================================================================
+# Backs up the PostgreSQL database and the DR archive to the NAS via SSH/rsync.
+# No CIFS mount: direct rsync to /volume1/backups/wordly.
 #
-# Usage:
-#   ./backup-to-nas.sh                 # Default: daily backup
-#   ./backup-to-nas.sh --full          # Full backup with upload cleanup
-#   ./backup-to-nas.sh --restore FILE  # Restore from specific backup
-# ============================================
+# CRON (installed by install-crontab.sh):
+#   0 */6 * * * bash /opt/wordly/scripts/backup-to-nas.sh >> /var/log/wordly-backup.log 2>&1
+#
+# Usage:
+#   ./backup-to-nas.sh          # Full backup → NAS
+#   ./backup-to-nas.sh --full   # Same (explicit alias)
+#   ./backup-to-nas.sh --list   # List the archives available on the NAS
+# ==============================================================================

 set -euo pipefail

-# ===========================================
-# CONFIGURATION - MODIFY THESE VALUES
-# ===========================================
-# NAS settings (SMB/CIFS or NFS mount point)
-NAS_BACKUP_DIR="/mnt/nas-backups/wordly"
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PROJECT_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"

-# Docker container name for PostgreSQL
-POSTGRES_CONTAINER="wordly-postgres"
-POSTGRES_USER="translate"
-POSTGRES_DB="translate_db"
-POSTGRES_PASSWORD="yLLgkEvt6mvzGDdoqtQvI1vEgMmR-W75ZTPW5StaIAU"
+# ==============================================================================
+# LOAD THE .env
+# ==============================================================================
+ENV_FILE="${PROJECT_ROOT}/.env"
+if [ -f "${ENV_FILE}" ]; then
+    set -a
+    source "${ENV_FILE}"
+    set +a
+else
+    echo "ERROR: .env not found: ${ENV_FILE}" >&2
+    exit 1
+fi

-# Backup retention
-DAILY_RETENTION=7 # Keep 7 daily backups
-WEEKLY_RETENTION=4 # Keep 4 weekly backups
-MONTHLY_RETENTION=6 # Keep 6 monthly backups
+# ==============================================================================
+# CONFIGURATION (from .env)
+# ==============================================================================

-# Notification (optional - leave empty to disable)
-NOTIFICATION_WEBHOOK="" # Slack/Discord webhook URL
+# NAS SSH
+NAS_HOST="${NAS_HOST:-192.168.1.146}"
+NAS_USER="${NAS_USER:-wordly-backup}"
+NAS_PATH="${NAS_PATH:-/volume1/backups/wordly}"
+NAS_SSH_PORT="${NAS_SSH_PORT:-22}"
+NAS_SSH_KEY="${NAS_SSH_KEY:-/root/.ssh/wordly_nas_key}"

-# ===========================================
+# PostgreSQL
+POSTGRES_CONTAINER="${POSTGRES_CONTAINER:-wordly-postgres}"
+POSTGRES_USER="${POSTGRES_USER:-translate}"
+POSTGRES_DB="${POSTGRES_DB:-translate_db}"
+POSTGRES_PASSWORD="${POSTGRES_PASSWORD:?POSTGRES_PASSWORD must be set in .env}"
+
+# Retention on the NAS (daily: age in days; weekly/monthly: number of archives kept)
+DAILY_RETENTION=${DAILY_RETENTION:-7}
+WEEKLY_RETENTION=${WEEKLY_RETENTION:-4}
+MONTHLY_RETENTION=${MONTHLY_RETENTION:-6}
+
+# Telegram
+TELEGRAM_BOT_TOKEN="${TELEGRAM_BOT_TOKEN:-}"
+TELEGRAM_CHAT_ID="${TELEGRAM_CHAT_ID:-}"
+
+# ==============================================================================
 # INTERNALS
-# ===========================================
+# ==============================================================================
 TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
-DATE_ONLY=$(date +"%Y-%m-%d")
-DAY_OF_WEEK=$(date +"%u") # 1=Mon, 7=Sun
+DAY_OF_WEEK=$(date +"%u") # 1=Monday, 7=Sunday
 DAY_OF_MONTH=$(date +"%d")
-BACKUP_NAME="wordly_db_${TIMESTAMP}.sql.gz"
-BACKUP_PATH="${NAS_BACKUP_DIR}/${BACKUP_NAME}"
-LOG_PREFIX="[Wordly Backup ${TIMESTAMP}]"
+SNAPSHOT_NAME="wordly_dr_${TIMESTAMP}.tar.gz"
+LOCAL_TMP="/tmp/wordly_backup_${TIMESTAMP}"
+SSH_CMD="ssh -i ${NAS_SSH_KEY} -p ${NAS_SSH_PORT} -o BatchMode=yes -o ConnectTimeout=10"
+RSYNC_CMD="rsync -az -e 'ssh -i ${NAS_SSH_KEY} -p ${NAS_SSH_PORT} -o BatchMode=yes -o ConnectTimeout=10'"

-# Colors for terminal output
 RED='\033[0;31m'
 GREEN='\033[0;32m'
 YELLOW='\033[1;33m'
 NC='\033[0m'
+LOG_PREFIX="[Backup ${TIMESTAMP}]"

-# ===========================================
-# FUNCTIONS
-# ===========================================
+# Log to stderr: main() captures the stdout of backup_postgres and
+# create_dr_archive with $(...), so log lines must not be written to stdout.
+log() { echo "${LOG_PREFIX} $1" >&2; }
+log_success() { echo -e "${LOG_PREFIX} ${GREEN}✅ $1${NC}" >&2; }
+log_error() { echo -e "${LOG_PREFIX} ${RED}❌ ERROR: $1${NC}" >&2; }
+log_warning() { echo -e "${LOG_PREFIX} ${YELLOW}⚠️ $1${NC}" >&2; }

-log() {
-    echo "${LOG_PREFIX} $1"
-}
-
-log_success() {
-    echo -e "${LOG_PREFIX} ${GREEN}$1${NC}"
-}
-
-log_error() {
-    echo -e "${LOG_PREFIX} ${RED}ERROR: $1${NC}"
-}
-
-log_warning() {
-    echo -e "${LOG_PREFIX} ${YELLOW}WARNING: $1${NC}"
-}
-
-send_notification() {
+# ==============================================================================
+# TELEGRAM
+# ==============================================================================
+send_telegram() {
     local message="$1"
-    if [ -n "${NOTIFICATION_WEBHOOK}" ]; then
-        curl -s -X POST "${NOTIFICATION_WEBHOOK}" \
-            -H "Content-Type: application/json" \
-            -d "{\"text\": \"${message}\"}" > /dev/null 2>&1 || true
+    if [ -n "${TELEGRAM_BOT_TOKEN}" ] && [ -n "${TELEGRAM_CHAT_ID}" ]; then
+        curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
+            -d "chat_id=${TELEGRAM_CHAT_ID}" \
+            --data-urlencode "text=${message}" \
+            -d "parse_mode=Markdown" \
+            >/dev/null 2>&1 || true
     fi
 }

+# ==============================================================================
+# PREREQUISITES
+# ==============================================================================
 check_prerequisites() {
-    # Check NAS mount
-    if [ ! -d "${NAS_BACKUP_DIR}" ]; then
-        log_error "NAS backup directory not found: ${NAS_BACKUP_DIR}"
-        log "Attempting to mount NAS..."
+    log "Checking prerequisites..."

-        # Try to mount if configured via /etc/fstab
-        mount "${NAS_BACKUP_DIR}" 2>/dev/null || true
-
-        if [ ! -d "${NAS_BACKUP_DIR}" ]; then
-            log_error "Cannot mount NAS. Aborting."
-            send_notification "Wordly Backup FAILED: NAS not mounted at ${NAS_BACKUP_DIR}"
-            exit 1
-        fi
-    fi
-
-    # Check Docker
-    if ! docker ps --format '{{.Names}}' | grep -q "${POSTGRES_CONTAINER}"; then
-        log_error "PostgreSQL container '${POSTGRES_CONTAINER}' is not running."
-        send_notification "Wordly Backup FAILED: PostgreSQL container not running"
+    # SSH key
+    if [ ! -f "${NAS_SSH_KEY}" ]; then
+        log_error "SSH key not found: ${NAS_SSH_KEY}"
+        log_error "Run first: sudo bash scripts/setup-nas.sh"
         exit 1
     fi

-    log_success "Prerequisites OK"
-}
-
-create_backup() {
-    log "Starting backup of '${POSTGRES_DB}'..."
-
-    # Create backup directory structure
-    mkdir -p "${NAS_BACKUP_DIR}/daily"
-    mkdir -p "${NAS_BACKUP_DIR}/weekly"
-    mkdir -p "${NAS_BACKUP_DIR}/monthly"
-
-    # Run pg_dump inside Docker container
-    docker exec "${POSTGRES_CONTAINER}" pg_dump \
-        -U "${POSTGRES_USER}" \
-        -d "${POSTGRES_DB}" \
-        --format=custom \
-        --compress=9 \
-        --no-owner \
-        --no-acl \
-        2>/dev/null | gzip > "${NAS_BACKUP_DIR}/daily/${BACKUP_NAME}"
-
-    local backup_size=$(du -h "${NAS_BACKUP_DIR}/daily/${BACKUP_NAME}" | cut -f1)
-
-    if [ -f "${NAS_BACKUP_DIR}/daily/${BACKUP_NAME}" ]; then
-        log_success "Backup created: ${BACKUP_NAME} (${backup_size})"
-
-        # Copy to weekly/monthly if applicable
-        if [ "${DAY_OF_WEEK}" = "7" ]; then
-            cp "${NAS_BACKUP_DIR}/daily/${BACKUP_NAME}" "${NAS_BACKUP_DIR}/weekly/"
-            log "Weekly backup copied"
-        fi
-
-        if [ "${DAY_OF_MONTH}" = "01" ]; then
-            cp "${NAS_BACKUP_DIR}/daily/${BACKUP_NAME}" "${NAS_BACKUP_DIR}/monthly/"
-            log "Monthly backup copied"
-        fi
-
-        send_notification "Wordly Backup SUCCESS: ${BACKUP_NAME} (${backup_size})"
-    else
-        log_error "Backup file was not created!"
-        send_notification "Wordly Backup FAILED: pg_dump produced no output"
+    # SSH connectivity to the NAS
+    if ! ${SSH_CMD} "${NAS_USER}@${NAS_HOST}" "echo OK" >/dev/null 2>&1; then
+        log_error "Cannot connect to the NAS ${NAS_HOST} over SSH."
+        log_error "Check: ssh -i ${NAS_SSH_KEY} ${NAS_USER}@${NAS_HOST}"
+        send_telegram "🚨 *Wordly Backup FAILED*
+NAS unreachable: ${NAS_HOST}
+Date: $(date '+%Y-%m-%d %H:%M:%S')"
         exit 1
     fi
+    log_success "NAS SSH: OK"
+
+    # Docker + PostgreSQL container
+    if ! docker ps --format '{{.Names}}' 2>/dev/null | grep -q "^${POSTGRES_CONTAINER}$"; then
+        log_error "PostgreSQL container '${POSTGRES_CONTAINER}' is not running!"
+        send_telegram "🚨 *Wordly Backup FAILED*
+PostgreSQL container not found
+Date: $(date '+%Y-%m-%d %H:%M:%S')"
+        exit 1
+    fi
+    log_success "PostgreSQL container: OK"
 }

-cleanup_old_backups() {
-    log "Cleaning up old backups..."
+# ==============================================================================
+# POSTGRESQL BACKUP
+# ==============================================================================
+backup_postgres() {
+    log "PostgreSQL dump of '${POSTGRES_DB}'..."
+    mkdir -p "${LOCAL_TMP}"

-    # Daily: keep last N days
-    local daily_count=$(ls -1 "${NAS_BACKUP_DIR}/daily/" 2>/dev/null | wc -l)
-    if [ "${daily_count}" -gt "${DAILY_RETENTION}" ]; then
-        ls -1t "${NAS_BACKUP_DIR}/daily/" | tail -n +$((DAILY_RETENTION + 1)) | while read -r f; do
-            rm -f "${NAS_BACKUP_DIR}/daily/${f}"
-            log " Deleted daily: ${f}"
-        done
-    fi
+    local dump_file="${LOCAL_TMP}/db_${TIMESTAMP}.dump.gz"

-    # Weekly: keep last N weeks
-    local weekly_count=$(ls -1 "${NAS_BACKUP_DIR}/weekly/" 2>/dev/null | wc -l)
-    if [ "${weekly_count}" -gt "${WEEKLY_RETENTION}" ]; then
-        ls -1t "${NAS_BACKUP_DIR}/weekly/" | tail -n +$((WEEKLY_RETENTION + 1)) | while read -r f; do
-            rm -f "${NAS_BACKUP_DIR}/weekly/${f}"
-            log " Deleted weekly: ${f}"
-        done
-    fi
-
-    # Monthly: keep last N months
-    local monthly_count=$(ls -1 "${NAS_BACKUP_DIR}/monthly/" 2>/dev/null | wc -l)
-    if [ "${monthly_count}" -gt "${MONTHLY_RETENTION}" ]; then
-        ls -1t "${NAS_BACKUP_DIR}/monthly/" | tail -n +$((MONTHLY_RETENTION + 1)) | while read -r f; do
-            rm -f "${NAS_BACKUP_DIR}/monthly/${f}"
-            log " Deleted monthly: ${f}"
-        done
-    fi
-
-    log_success "Cleanup done"
-}
-
-verify_backup() {
-    log "Verifying backup integrity..."
-
-    if gzip -t "${NAS_BACKUP_DIR}/daily/${BACKUP_NAME}" 2>/dev/null; then
-        log_success "Backup integrity OK"
-    else
-        log_error "Backup integrity check FAILED!"
-        send_notification "Wordly Backup WARNING: Integrity check failed for ${BACKUP_NAME}"
-        # Don't delete - let admin investigate
-    fi
-}
-
-restore_backup() {
-    local backup_file="$1"
-
-    if [ -z "${backup_file}" ]; then
-        log_error "Usage: $0 --restore "
-        echo ""
-        echo "Available backups:"
-        echo "=== Daily ==="
-        ls -lht "${NAS_BACKUP_DIR}/daily/" 2>/dev/null || echo " (none)"
-        echo "=== Weekly ==="
-        ls -lht "${NAS_BACKUP_DIR}/weekly/" 2>/dev/null || echo " (none)"
-        echo "=== Monthly ==="
-        ls -lht "${NAS_BACKUP_DIR}/monthly/" 2>/dev/null || echo " (none)"
+    if ! docker exec \
+        -e PGPASSWORD="${POSTGRES_PASSWORD}" \
+        "${POSTGRES_CONTAINER}" \
+        pg_dump \
+        -U "${POSTGRES_USER}" \
+        -d "${POSTGRES_DB}" \
+        --format=custom \
+        --no-owner \
+        --no-acl \
+        2>/dev/null | gzip > "${dump_file}"; then
+        log_error "pg_dump failed!"
+        send_telegram "🚨 *Wordly Backup FAILED*
+pg_dump error on ${POSTGRES_DB}
+Date: $(date '+%Y-%m-%d %H:%M:%S')"
+        rm -rf "${LOCAL_TMP}"
         exit 1
     fi

-    # Find the file
-    local full_path=""
-    for dir in daily weekly monthly; do
-        if [ -f "${NAS_BACKUP_DIR}/${dir}/${backup_file}" ]; then
-            full_path="${NAS_BACKUP_DIR}/${dir}/${backup_file}"
-            break
-        fi
-    done
+    # Size check
+    local size_bytes
+    size_bytes=$(stat -c %s "${dump_file}" 2>/dev/null || stat -f %z "${dump_file}")
+    local min_bytes=$((1024 * 1024)) # 1MB minimum

-    if [ -z "${full_path}" ]; then
-        log_error "Backup file not found: ${backup_file}"
+    if [ "${size_bytes}" -lt "${min_bytes}" ]; then
+        log_error "Dump too small ($(numfmt --to=iec ${size_bytes})): is the database empty?"
+        send_telegram "🚨 *Wordly Backup FAILED*
+PostgreSQL dump too small: $(numfmt --to=iec ${size_bytes})
+Date: $(date '+%Y-%m-%d %H:%M:%S')"
+        rm -rf "${LOCAL_TMP}"
         exit 1
     fi

-    echo ""
-    log_warning "RESTORE MODE - This will OVERWRITE the current database!"
-    echo " File: ${full_path}"
-    echo " Database: ${POSTGRES_DB}"
-    echo ""
-    read -p "Are you sure? Type 'YES' to confirm: " confirm
-
-    if [ "${confirm}" != "YES" ]; then
-        log "Restore cancelled."
-        exit 0
-    fi
-
-    log "Restoring from ${full_path}..."
-
-    # Create a safety backup first
-    log "Creating safety backup before restore..."
-    SAFETY_NAME="wordly_db_pre_restore_${TIMESTAMP}.sql.gz"
-    docker exec "${POSTGRES_CONTAINER}" pg_dump \
-        -U "${POSTGRES_USER}" \
-        -d "${POSTGRES_DB}" \
-        --format=custom \
-        --compress=9 \
-        2>/dev/null | gzip > "${NAS_BACKUP_DIR}/daily/${SAFETY_NAME}"
-    log "Safety backup: ${SAFETY_NAME}"
-
-    # Restore
-    gunzip -c "${full_path}" | docker exec -i "${POSTGRES_CONTAINER}" \
-        pg_restore \
-        -U "${POSTGRES_USER}" \
-        -d "${POSTGRES_DB}" \
-        --clean \
-        --if-exists \
-        --no-owner \
-        --no-acl \
-        2>/dev/null || true
-
-    log_success "Restore completed!"
-    log_warning "Restart backend: docker restart wordly-backend"
+    log_success "PostgreSQL dump: $(numfmt --to=iec ${size_bytes})"
+    echo "${dump_file}"
 }

-# ===========================================
+# ==============================================================================
+# BUILD THE DR ARCHIVE (dump + .env + docker-compose + configs)
+# ==============================================================================
+create_dr_archive() {
+    local dump_file="$1"
+    log "Building the DR archive..."
+
+    # Copy the config files
+    [ -f "${PROJECT_ROOT}/.env" ] && cp "${PROJECT_ROOT}/.env" "${LOCAL_TMP}/.env.production"
+    [ -f "${PROJECT_ROOT}/docker-compose.yml" ] && cp "${PROJECT_ROOT}/docker-compose.yml" "${LOCAL_TMP}/"
+    [ -d "${PROJECT_ROOT}/docker" ] && cp -r "${PROJECT_ROOT}/docker" "${LOCAL_TMP}/"
+
+    # Compress
+    local archive_path="/tmp/${SNAPSHOT_NAME}"
+    tar -czf "${archive_path}" -C "${LOCAL_TMP}" .
+    rm -rf "${LOCAL_TMP}"
+
+    # Integrity check
+    if ! gzip -t "${archive_path}" 2>/dev/null; then
+        log_error "DR archive is corrupted!"
+        rm -f "${archive_path}"
+        exit 1
+    fi
+
+    local size
+    size=$(du -h "${archive_path}" | cut -f1)
+    log_success "DR archive created: ${SNAPSHOT_NAME} (${size})"
+    echo "${archive_path}|${size}"
+}
+
+# ==============================================================================
+# SEND TO THE NAS VIA rsync OVER SSH
+# ==============================================================================
+push_to_nas() {
+    local archive_path="$1"
+    local size="$2"
+
+    log "Transferring to the NAS via rsync SSH..."
+    log " Source : ${archive_path}"
+    log " Dest   : ${NAS_USER}@${NAS_HOST}:${NAS_PATH}/snapshots/${SNAPSHOT_NAME}"
+
+    # Snapshot folder on the NAS (daily, weekly and monthly archives all live here)
+    local nas_dest="${NAS_PATH}/snapshots"
+
+    # Main transfer
+    if ! rsync -az \
+        -e "ssh -i ${NAS_SSH_KEY} -p ${NAS_SSH_PORT} -o BatchMode=yes -o ConnectTimeout=30" \
+        "${archive_path}" \
+        "${NAS_USER}@${NAS_HOST}:${nas_dest}/${SNAPSHOT_NAME}"; then
+        log_error "rsync to the NAS failed!"
+        send_telegram "🚨 *Wordly Backup FAILED*
+rsync SSH to ${NAS_HOST} failed
+Local file kept: ${PROJECT_ROOT}/backups/emergency/${SNAPSHOT_NAME}
+Date: $(date '+%Y-%m-%d %H:%M:%S')"
+        # Keep the local file as a fallback
+        mkdir -p "${PROJECT_ROOT}/backups/emergency"
+        mv "${archive_path}" "${PROJECT_ROOT}/backups/emergency/${SNAPSHOT_NAME}"
+        log_warning "Archive kept locally: ${PROJECT_ROOT}/backups/emergency/${SNAPSHOT_NAME}"
+        exit 1
+    fi
+
+    log_success "Archive transferred to the NAS: ${nas_dest}/${SNAPSHOT_NAME}"
+
+    # Weekly copy (Sunday)
+    if [ "${DAY_OF_WEEK}" = "7" ]; then
+        ${SSH_CMD} "${NAS_USER}@${NAS_HOST}" \
+            "cp ${nas_dest}/${SNAPSHOT_NAME} ${NAS_PATH}/snapshots/weekly_${SNAPSHOT_NAME}" 2>/dev/null || true
+        log "Weekly archive copied."
+    fi
+
+    # Monthly copy (1st of the month)
+    if [ "${DAY_OF_MONTH}" = "01" ]; then
+        ${SSH_CMD} "${NAS_USER}@${NAS_HOST}" \
+            "cp ${nas_dest}/${SNAPSHOT_NAME} ${NAS_PATH}/snapshots/monthly_${SNAPSHOT_NAME}" 2>/dev/null || true
+        log "Monthly archive copied."
+    fi
+
+    # Local cleanup
+    rm -f "${archive_path}"
+}
+
+# ==============================================================================
+# ARCHIVE ROTATION ON THE NAS
+# ==============================================================================
+cleanup_nas() {
+    log "Rotating archives on the NAS (retention: ${DAILY_RETENTION} days)..."
+
+    # Delete wordly_dr_* archives older than DAILY_RETENTION days
+    ${SSH_CMD} "${NAS_USER}@${NAS_HOST}" \
+        "find ${NAS_PATH}/snapshots -name 'wordly_dr_*.tar.gz' -mtime +${DAILY_RETENTION} -delete 2>/dev/null; \
+        find ${NAS_PATH}/snapshots -name 'weekly_*.tar.gz' | sort -r | tail -n +$((WEEKLY_RETENTION + 1)) | xargs rm -f 2>/dev/null; \
+        find ${NAS_PATH}/snapshots -name 'monthly_*.tar.gz' | sort -r | tail -n +$((MONTHLY_RETENTION + 1)) | xargs rm -f 2>/dev/null; \
+        echo OK" | grep -q "OK"
+
+    log_success "Archive rotation OK"
+}
+
+# ==============================================================================
+# SYNC THE SCRIPTS TO THE NAS (for restoring from .98)
+# ==============================================================================
+sync_scripts() {
+    rsync -az \
+        -e "ssh -i ${NAS_SSH_KEY} -p ${NAS_SSH_PORT} -o BatchMode=yes" \
+        --exclude="__pycache__" \
+        --exclude="*.pyc" \
+        "${SCRIPT_DIR}/" \
+        "${NAS_USER}@${NAS_HOST}:${NAS_PATH}/scripts/" 2>/dev/null || true
+}
+
+# ==============================================================================
+# LIST THE AVAILABLE ARCHIVES
+# ==============================================================================
+list_archives() {
+    log "Archives available on the NAS:"
+    ${SSH_CMD} "${NAS_USER}@${NAS_HOST}" \
+        "ls -lht ${NAS_PATH}/snapshots/wordly_dr_*.tar.gz 2>/dev/null || echo '(no archive)'"
+}
+
+# ==============================================================================
 # MAIN
-# ===========================================
+# ==============================================================================
+main() {
+    case "${1:-}" in
+        --list)
+            ENV_FILE="${PROJECT_ROOT}/.env"
+            [ -f "${ENV_FILE}" ] && { set -a; source "${ENV_FILE}"; set +a; }
+            list_archives
+            exit 0
+            ;;
+        --full|*)
+            ;;
+    esac

-case "${1:-}" in
-    --restore)
-        restore_backup "${2:-}"
-        ;;
-    --full)
-        check_prerequisites
-        create_backup
-        verify_backup
-        cleanup_old_backups
-        log_success "Full backup cycle complete!"
-        ;;
-    *)
-        check_prerequisites
-        create_backup
-        verify_backup
-        cleanup_old_backups
-        log_success "Backup complete!"
-        ;;
-esac
+    echo ""
+    echo "================================================================="
+    echo " Wordly.art - Backup → Synology NAS 192.168.1.146"
+    echo " DB  : ${POSTGRES_DB}"
+    echo " NAS : ${NAS_USER}@${NAS_HOST}:${NAS_PATH}"
+    echo " $(date '+%Y-%m-%d %H:%M:%S')"
+    echo "================================================================="
+    echo ""
+
+    check_prerequisites
+
+    # 1. PostgreSQL dump
+    local dump_file
+    dump_file=$(backup_postgres)
+
+    # 2. Build the DR archive
+    local archive_info
+    archive_info=$(create_dr_archive "${dump_file}")
+    local archive_path="${archive_info%%|*}"
+    local archive_size="${archive_info##*|}"
+
+    # 3. Send to the NAS via rsync SSH
+    push_to_nas "${archive_path}" "${archive_size}"
+
+    # 4. Rotation
+    cleanup_nas
+
+    # 5. Sync scripts
+    sync_scripts
+
+    # 6. Telegram notification
+    send_telegram "✅ *Wordly.art Backup OK*
+Archive: \`${SNAPSHOT_NAME}\`
+Size: ${archive_size}
+NAS: \`${NAS_PATH}/snapshots/\`
+Date: $(date '+%Y-%m-%d %H:%M:%S')"
+
+    echo ""
+    log_success "================================================================="
+    log_success "Full backup finished!"
+ log_success " Archive : ${NAS_PATH}/snapshots/${SNAPSHOT_NAME}" + log_success " Lister : bash scripts/backup-to-nas.sh --list" + log_success "=================================================================" + echo "" +} + +main "$@" diff --git a/scripts/disaster-recovery.sh b/scripts/disaster-recovery.sh index 1507198..a4c3a92 100755 --- a/scripts/disaster-recovery.sh +++ b/scripts/disaster-recovery.sh @@ -1,9 +1,16 @@ #!/bin/bash # ============================================================================== -# Wordly.art - Disaster Recovery (DR) Backup & Restore Playbook (V2) +# Wordly.art - Disaster Recovery (DR) Backup & Restore Playbook (V3) # ============================================================================== -# Packages app configs (.env, docker-compose), database backups, and NPM -# configs, and exports them to LOCAL, NAS, or remote SCP storage. +# Archives app configs (.env, docker-compose), database backup, and exports +# to the NAS at 192.168.1.146. +# +# On RESTORE: deploys app on the new server and automatically updates NPM +# (192.168.1.184) to reroute traffic via API — no manual intervention needed. 
+# +# Usage: +# ./disaster-recovery.sh --backup # Create DR archive → NAS +# ./disaster-recovery.sh --restore # Restore on THIS machine # ============================================================================== set -euo pipefail @@ -31,49 +38,54 @@ if [ -f "${ENV_FILE}" ]; then set +a fi -# Config Defaults & Type Resolution -BACKUP_DEST_TYPE="${BACKUP_DEST_TYPE:-LOCAL}" # LOCAL, NAS, SCP -BACKUP_DEST_PATH="${BACKUP_DEST_PATH:-${PROJECT_ROOT}/backups}" -DR_RETENTION_DAYS=${DR_RETENTION_DAYS:-14} +# NAS SSH (même config que backup-to-nas.sh) +NAS_HOST="${NAS_HOST:-192.168.1.146}" +NAS_USER="${NAS_USER:-wordly-backup}" +NAS_PATH="${NAS_PATH:-/volume1/backups/wordly}" +NAS_SSH_PORT="${NAS_SSH_PORT:-22}" +NAS_SSH_KEY="${NAS_SSH_KEY:-/root/.ssh/wordly_nas_key}" +BACKUP_DEST_PATH="${NAS_PATH}/snapshots" +DR_RETENTION_DAYS=${DR_RETENTION_DAYS:-30} -# SCP Configuration -SCP_HOST="${SCP_HOST:-}" -SCP_USER="${SCP_USER:-}" -SCP_KEY_PATH="${SCP_KEY_PATH:-~/.ssh/id_rsa}" -SCP_PORT="${SCP_PORT:-22}" -SCP_DEST_PATH="${SCP_DEST_PATH:-/var/backups/wordly}" +# IP of THIS server (used during restore to configure NPM failover) +SERVER_IP="${SERVER_IP:-}" -# NPM Configuration directories -NPM_DATA_DIR="${NPM_DATA_DIR:-}" -NPM_LETSENCRYPT_DIR="${NPM_LETSENCRYPT_DIR:-}" +# Telegram +TELEGRAM_BOT_TOKEN="${TELEGRAM_BOT_TOKEN:-}" +TELEGRAM_CHAT_ID="${TELEGRAM_CHAT_ID:-}" # ============================================================================== -# DESTINATION PREPARATION +# SEND TELEGRAM NOTIFICATION +# ============================================================================== +send_telegram() { + local message="$1" + if [ -n "${TELEGRAM_BOT_TOKEN}" ] && [ -n "${TELEGRAM_CHAT_ID}" ]; then + curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \ + -d "chat_id=${TELEGRAM_CHAT_ID}" \ + -d "text=${message}" \ + -d "parse_mode=Markdown" \ + >/dev/null 2>&1 || true + fi +} + +# ============================================================================== 
+# DESTINATION PREPARATION (backup mode) # ============================================================================== prepare_destination() { - if [ "${BACKUP_DEST_TYPE}" = "NAS" ] || [ "${BACKUP_DEST_TYPE}" = "LOCAL" ]; then - if [ ! -d "${BACKUP_DEST_PATH}" ]; then - mkdir -p "${BACKUP_DEST_PATH}" 2>/dev/null || true - fi - - if [ ! -w "${BACKUP_DEST_PATH}" ]; then - log_warning "Backup destination path '${BACKUP_DEST_PATH}' is not writable. Falling back to local backups." - BACKUP_DEST_PATH="${PROJECT_ROOT}/backups" - BACKUP_DEST_TYPE="LOCAL" - mkdir -p "${BACKUP_DEST_PATH}/dr" - fi - DR_LOCAL_DIR="${BACKUP_DEST_PATH}/dr" - mkdir -p "${DR_LOCAL_DIR}" - elif [ "${BACKUP_DEST_TYPE}" = "SCP" ]; then - if [ -z "${SCP_HOST}" ] || [ -z "${SCP_USER}" ]; then - log_error "SCP backup selected but SCP_HOST or SCP_USER is not configured in .env." - log_warning "Falling back to LOCAL backup directory." - BACKUP_DEST_TYPE="LOCAL" - BACKUP_DEST_PATH="${PROJECT_ROOT}/backups" - DR_LOCAL_DIR="${BACKUP_DEST_PATH}/dr" - mkdir -p "${DR_LOCAL_DIR}" - fi + local ssh_cmd="ssh -i ${NAS_SSH_KEY} -p ${NAS_SSH_PORT} -o BatchMode=yes -o ConnectTimeout=10" + + log "Vérification de la connectivité SSH vers le NAS ${NAS_HOST}..." + if ! ${ssh_cmd} "${NAS_USER}@${NAS_HOST}" "echo OK" >/dev/null 2>&1; then + log_error "Impossible de joindre le NAS ${NAS_HOST} via SSH." + log_error "Lancez d'abord : sudo bash scripts/setup-nas.sh" + exit 1 fi + + # S'assurer que le dossier snapshots existe sur le NAS + ${ssh_cmd} "${NAS_USER}@${NAS_HOST}" \ + "mkdir -p ${NAS_PATH}/snapshots" 2>/dev/null || true + + log_success "NAS SSH OK — Destination : ${NAS_USER}@${NAS_HOST}:${NAS_PATH}/snapshots" } # ============================================================================== @@ -127,22 +139,11 @@ perform_backup() { mkdir -p "${packing_dir}/db_backup" cp "${latest_db_backup}" "${packing_dir}/db_backup/" - # 5. 
Pack Nginx Proxy Manager (NPM) configs if configured - local has_npm_data=false - if [ -n "${NPM_DATA_DIR}" ] && [ -d "${NPM_DATA_DIR}" ]; then - log "Packaging Nginx Proxy Manager /data directory..." - cp -r "${NPM_DATA_DIR}" "${packing_dir}/npm_data" - has_npm_data=true - fi - if [ -n "${NPM_LETSENCRYPT_DIR}" ] && [ -d "${NPM_LETSENCRYPT_DIR}" ]; then - log "Packaging Nginx Proxy Manager /etc/letsencrypt directory..." - cp -r "${NPM_LETSENCRYPT_DIR}" "${packing_dir}/npm_letsencrypt" - has_npm_data=true - fi - - if [ "${has_npm_data}" = "false" ]; then - log_warning "NPM directories (NPM_DATA_DIR / NPM_LETSENCRYPT_DIR) not configured or not found. Skipping NPM config packaging." - fi + # 5. Note: NPM config is NOT backed up here. + # NPM runs on its own dedicated server (192.168.1.184) and is stable. + # Only the forward_host IP needs to change during failover, which is + # done automatically via the NPM API by npm-failover.sh during restore. + log "NPM is on dedicated server 192.168.1.184 — no NPM config to backup." # 6. Compress DR Archive local dr_archive_name="wordly_dr_${TIMESTAMP}.tar.gz" @@ -160,44 +161,47 @@ perform_backup() { local size size=$(du -h "${local_archive_path}" | cut -f1) - # 7. Route to Destination - if [ "${BACKUP_DEST_TYPE}" = "LOCAL" ] || [ "${BACKUP_DEST_TYPE}" = "NAS" ]; then - local dest_path="${DR_LOCAL_DIR}/${dr_archive_name}" - mv "${local_archive_path}" "${dest_path}" - log_success "DR archive created successfully (${size}) at: ${dest_path}" - - # Retention - log "Applying retention policy (pruning files older than ${DR_RETENTION_DAYS} days)..." - find "${DR_LOCAL_DIR}" -name "wordly_dr_*.tar.gz" -mtime +"${DR_RETENTION_DAYS}" -exec rm -f {} \; - - elif [ "${BACKUP_DEST_TYPE}" = "SCP" ]; then - log "Transferring DR archive to remote server via SCP (${SCP_USER}@${SCP_HOST}:${SCP_PORT})..." - - # Test connection & Create remote directory if not exists - if ! 
ssh -p "${SCP_PORT}" -i "${SCP_KEY_PATH}" -o ConnectTimeout=5 -o StrictHostKeyChecking=no "${SCP_USER}@${SCP_HOST}" "mkdir -p ${SCP_DEST_PATH}" 2>/dev/null; then - log_error "SSH connection to ${SCP_USER}@${SCP_HOST} failed. Saving archive locally instead." - mkdir -p "${PROJECT_ROOT}/backups/dr" - mv "${local_archive_path}" "${PROJECT_ROOT}/backups/dr/${dr_archive_name}" - log_warning "DR backup saved locally at: ${PROJECT_ROOT}/backups/dr/${dr_archive_name}" - exit 1 - fi + # 7. Envoyer l'archive sur le NAS via rsync SSH + local ssh_cmd="ssh -i ${NAS_SSH_KEY} -p ${NAS_SSH_PORT} -o BatchMode=yes -o ConnectTimeout=30" + local dest_path="${BACKUP_DEST_PATH}/${dr_archive_name}" - # SCP copy - if scp -P "${SCP_PORT}" -i "${SCP_KEY_PATH}" -o StrictHostKeyChecking=no "${local_archive_path}" "${SCP_USER}@${SCP_HOST}:${SCP_DEST_PATH}/${dr_archive_name}"; then - log_success "DR archive transferred successfully to ${SCP_USER}@${SCP_HOST}:${SCP_DEST_PATH}/${dr_archive_name}" - rm -f "${local_archive_path}" - - # Remote retention prune - log "Applying remote retention policy on backup server..." - ssh -p "${SCP_PORT}" -i "${SCP_KEY_PATH}" -o StrictHostKeyChecking=no "${SCP_USER}@${SCP_HOST}" \ - "find ${SCP_DEST_PATH} -name 'wordly_dr_*.tar.gz' -mtime +${DR_RETENTION_DAYS} -exec rm -f {} \;" || true - else - log_error "SCP file transfer failed. Retaining local backup." - mkdir -p "${PROJECT_ROOT}/backups/dr" - mv "${local_archive_path}" "${PROJECT_ROOT}/backups/dr/${dr_archive_name}" - fi + log "Transfert de l'archive DR vers le NAS via rsync SSH..." + if ! rsync -az \ + -e "ssh -i ${NAS_SSH_KEY} -p ${NAS_SSH_PORT} -o BatchMode=yes -o ConnectTimeout=30" \ + "${local_archive_path}" \ + "${NAS_USER}@${NAS_HOST}:${BACKUP_DEST_PATH}/${dr_archive_name}"; then + log_error "rsync SSH vers le NAS a échoué !" 
+ log_warning "Archive conservée localement : ${local_archive_path}" + send_telegram "🚨 *Wordly DR Backup FAILED* +rsync NAS échoué : ${NAS_HOST} +Fichier local : ${local_archive_path} +Date: $(date '+%Y-%m-%d %H:%M:%S')" + exit 1 fi + rm -f "${local_archive_path}" + log_success "Archive DR transférée (${size}) → ${NAS_USER}@${NAS_HOST}:${dest_path}" + + # Retention policy sur le NAS + log "Rotation des archives (>${DR_RETENTION_DAYS} jours) sur le NAS..." + ${ssh_cmd} "${NAS_USER}@${NAS_HOST}" \ + "find ${BACKUP_DEST_PATH} -name 'wordly_dr_*.tar.gz' -mtime +${DR_RETENTION_DAYS} -delete 2>/dev/null; echo OK" | grep -q "OK" || true + + # Sync scripts + if command -v rsync &>/dev/null; then + rsync -az \ + -e "ssh -i ${NAS_SSH_KEY} -p ${NAS_SSH_PORT} -o BatchMode=yes" \ + --exclude="__pycache__" \ + "${SCRIPT_DIR}/" \ + "${NAS_USER}@${NAS_HOST}:${NAS_PATH}/scripts/" 2>/dev/null || true + fi + + send_telegram "✅ *Wordly.art DR Backup OK* +Archive: \`${dr_archive_name}\` +Taille: ${size} +NAS: \`${dest_path}\` +Date: $(date '+%Y-%m-%d %H:%M:%S')" + log_success "Disaster Recovery backup complete." } @@ -250,22 +254,7 @@ perform_restore() { source "${PROJECT_ROOT}/.env" set +a - # Restore NPM configs to their target directories if present in the package - if [ -d "${PROJECT_ROOT}/npm_data" ] && [ -n "${NPM_DATA_DIR}" ]; then - log "Restoring NPM /data directory..." - mkdir -p "$(dirname "${NPM_DATA_DIR}")" - rm -rf "${NPM_DATA_DIR}" - mv "${PROJECT_ROOT}/npm_data" "${NPM_DATA_DIR}" - fi - - if [ -d "${PROJECT_ROOT}/npm_letsencrypt" ] && [ -n "${NPM_LETSENCRYPT_DIR}" ]; then - log "Restoring NPM /etc/letsencrypt directory..." - mkdir -p "$(dirname "${NPM_LETSENCRYPT_DIR}")" - rm -rf "${NPM_LETSENCRYPT_DIR}" - mv "${PROJECT_ROOT}/npm_letsencrypt" "${NPM_LETSENCRYPT_DIR}" - fi - - log_success "Docker configurations, env keys, and NPM configurations restored." + log_success "Docker configurations and env keys restored." 
# Boot Docker Compose Services log "Spinning up Docker containers (database, redis, backend, frontend, NPM if configured)..." @@ -328,10 +317,63 @@ perform_restore() { log "Restarting application backend..." ${compose_cmd} restart backend + # HTTP Health check (wait up to 3 minutes) + log "Waiting for application health check (max 180s)..." + local app_url="http://localhost:8001/health" + local health_ok=false + for i in $(seq 1 36); do + local http_code + http_code=$(curl -s -o /dev/null -w "%{http_code}" --connect-timeout 3 --max-time 5 "${app_url}" 2>/dev/null || echo "000") + if [ "${http_code}" = "200" ]; then + health_ok=true + log_success "App is healthy (HTTP 200) after $((i * 5))s" + break + fi + echo " Health check attempt ${i}/36... (HTTP ${http_code})" + sleep 5 + done + + if [ "${health_ok}" = "false" ]; then + log_error "App did NOT become healthy within 180s!" + log_error "NPM failover will NOT be triggered automatically." + log_error "Investigate: docker compose logs backend" + send_telegram "🚨 *Wordly.art DR FAILED — App unhealthy* +Serveur: \`$(hostname -I | awk '{print $1}')\` +Date: $(date '+%Y-%m-%d %H:%M:%S') +Action: vérifiez les logs Docker" + exit 1 + fi + + # ============================================================================== + # NPM AUTOMATIC FAILOVER + # ============================================================================== + log "App is healthy. Triggering NPM failover..." + local this_server_ip + this_server_ip="${SERVER_IP:-$(hostname -I | awk '{print $1}')}" + + if bash "${SCRIPT_DIR}/npm-failover.sh" --target-ip "${this_server_ip}"; then + log_success "NPM now routes traffic to this server (${this_server_ip})" + send_telegram "✅ *Wordly.art DR COMPLET* +Serveur actif: \`${this_server_ip}\` +NPM redirigé automatiquement +Date: $(date '+%Y-%m-%d %H:%M:%S')" + else + log_error "NPM failover script FAILED." 
+ log_warning "Manual failover required:" + log_warning " → Go to http://192.168.1.184:81" + log_warning " → Edit proxy host for ${NPM_PROXY_HOST_DOMAIN:-wordly.art}" + log_warning " → Change Forward Hostname to: ${this_server_ip}" + send_telegram "⚠️ *Wordly.art DR — NPM manuel requis* +App OK sur: \`${this_server_ip}\` +NPM failover automatique a échoué +Action: http://192.168.1.184:81 → modifier Forward Host" + fi + log_success "==========================================================================" log_success "DISASTER RECOVERY SYSTEM RESTORE COMPLETE!" log_success "==========================================================================" - log "Your application and reverse-proxy routes are restored." + log_success " App: http://${this_server_ip}:8001/health" + log_success " NPM: http://192.168.1.184:81" echo "" } diff --git a/scripts/install-crontab.sh b/scripts/install-crontab.sh new file mode 100644 index 0000000..632f380 --- /dev/null +++ b/scripts/install-crontab.sh @@ -0,0 +1,73 @@ +#!/bin/bash +# ============================================================================== +# Wordly.art - Install Backup Crontab +# ============================================================================== +# Run ONCE to install all scheduled backup tasks. +# +# Usage: +# bash scripts/install-crontab.sh +# ============================================================================== + +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +CRONTAB_FILE="${SCRIPT_DIR}/crontab.wordly" +LOG_DIR="/var/log" + +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +NC='\033[0m' + +log() { echo "[Crontab] $1"; } +log_success() { echo -e "[Crontab] ${GREEN}✅ $1${NC}"; } +log_warning() { echo -e "[Crontab] ${YELLOW}⚠️ $1${NC}"; } + +# ============================================================================== +# 1. 
Create the crontab file
+# ==============================================================================
+cat > "${CRONTAB_FILE}" <<EOF
+0 */6 * * * bash ${SCRIPT_DIR}/backup-to-nas.sh >> ${LOG_DIR}/wordly-backup.log 2>&1
+
+# Verify backup integrity 30 minutes after each backup
+30 */6 * * * bash ${SCRIPT_DIR}/verify-backups.sh >> ${LOG_DIR}/wordly-verify.log 2>&1
+
+# Rotate logs weekly (keep last 30 days)
+0 4 * * 0 find ${LOG_DIR} -name "wordly-*.log" -mtime +30 -delete 2>/dev/null || true
+EOF
+
+log_success "Crontab file created: ${CRONTAB_FILE}"
+
+# ==============================================================================
+# 2. Install crontab for current user
+# ==============================================================================
+log "Installing crontab..."
+
+# Preserve existing crontab (if any), append new entries
+EXISTING_CRON=$(crontab -l 2>/dev/null || true)
+
+# Remove any existing wordly entries (to avoid duplicates on re-run)
+EXISTING_CRON_CLEAN=$(echo "${EXISTING_CRON}" | grep -v "wordly" | grep -v "backup-to-nas" | grep -v "verify-backups" || true)
+
+# Combine
+NEW_CRON=$(printf "%s\n%s\n" "${EXISTING_CRON_CLEAN}" "$(cat "${CRONTAB_FILE}")")
+echo "${NEW_CRON}" | crontab -
+
+log_success "Crontab installed!"
+echo "" +log "Current crontab:" +crontab -l | grep -E "wordly|backup|verify" | sed 's/^/ /' +echo "" +log_success "Scheduled jobs:" +log_success " Every 6h (00:00/06:00/12:00/18:00) → backup-to-nas.sh" +log_success " Every 6h+30min → verify-backups.sh" +log_success " Every Sunday at 04:00 → log rotation" +echo "" +log_warning "Logs will be written to:" +log_warning " ${LOG_DIR}/wordly-backup.log" +log_warning " ${LOG_DIR}/wordly-verify.log" diff --git a/scripts/npm-failover.sh b/scripts/npm-failover.sh new file mode 100644 index 0000000..500f29d --- /dev/null +++ b/scripts/npm-failover.sh @@ -0,0 +1,325 @@ +#!/bin/bash +# ============================================================================== +# Wordly.art - NPM Failover via API +# ============================================================================== +# Automatically updates Nginx Proxy Manager's forward host via its REST API. +# Called by disaster-recovery.sh after a successful health check on the new server. +# +# Usage: +# ./npm-failover.sh --target-ip 192.168.1.98 # Switch to new server +# ./npm-failover.sh --target-ip 192.168.1.151 # Rollback to original server +# ./npm-failover.sh --dry-run --target-ip 192.168.1.98 # Test without modifying NPM +# ============================================================================== + +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +PROJECT_ROOT="$(cd "${SCRIPT_DIR}/.." 
&& pwd)"
+TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
+
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+BLUE='\033[0;34m'
+NC='\033[0m'
+
+log() { echo -e "[NPM-Failover ${TIMESTAMP}] $1" >&2; }  # stderr keeps $(...) captures clean
+log_success() { echo -e "[NPM-Failover ${TIMESTAMP}] ${GREEN}$1${NC}" >&2; }
+log_warning() { echo -e "[NPM-Failover ${TIMESTAMP}] ${YELLOW}WARNING: $1${NC}" >&2; }
+log_error() { echo -e "[NPM-Failover ${TIMESTAMP}] ${RED}ERROR: $1${NC}" >&2; }
+log_info() { echo -e "[NPM-Failover ${TIMESTAMP}] ${BLUE}$1${NC}" >&2; }
+
+# ==============================================================================
+# 1. LOAD CONFIGURATION FROM .env
+# ==============================================================================
+ENV_FILE="${PROJECT_ROOT}/.env"
+if [ -f "${ENV_FILE}" ]; then
+    set -a
+    source "${ENV_FILE}"
+    set +a
+fi
+
+NPM_API_URL="${NPM_API_URL:-}"
+NPM_ADMIN_EMAIL="${NPM_ADMIN_EMAIL:-}"
+NPM_ADMIN_PASSWORD="${NPM_ADMIN_PASSWORD:-}"
+NPM_PROXY_HOST_DOMAIN="${NPM_PROXY_HOST_DOMAIN:-wordly.art}"
+TELEGRAM_BOT_TOKEN="${TELEGRAM_BOT_TOKEN:-}"
+TELEGRAM_CHAT_ID="${TELEGRAM_CHAT_ID:-}"
+
+# ==============================================================================
+# 2. ARGUMENT PARSING
+# ==============================================================================
+TARGET_IP=""
+DRY_RUN=false
+
+while [[ $# -gt 0 ]]; do
+    case "$1" in
+        --target-ip)
+            TARGET_IP="$2"
+            shift 2
+            ;;
+        --dry-run)
+            DRY_RUN=true
+            shift
+            ;;
+        *)
+            log_error "Unknown argument: $1"
+            echo "Usage: $0 --target-ip <IP> [--dry-run]"
+            exit 1
+            ;;
+    esac
+done
+
+# ==============================================================================
+# 3. VALIDATION
+# ==============================================================================
+validate_config() {
+    local errors=0
+
+    if [ -z "${TARGET_IP}" ]; then
+        log_error "--target-ip is required."
+ errors=$((errors + 1)) + fi + if [ -z "${NPM_API_URL}" ]; then + log_error "NPM_API_URL is not set in .env (example: http://192.168.1.184:81/api)" + errors=$((errors + 1)) + fi + if [ -z "${NPM_ADMIN_EMAIL}" ]; then + log_error "NPM_ADMIN_EMAIL is not set in .env" + errors=$((errors + 1)) + fi + if [ -z "${NPM_ADMIN_PASSWORD}" ]; then + log_error "NPM_ADMIN_PASSWORD is not set in .env" + errors=$((errors + 1)) + fi + + if ! command -v curl &>/dev/null; then + log_error "curl is not installed. Required for NPM API calls." + errors=$((errors + 1)) + fi + if ! command -v jq &>/dev/null; then + log_error "jq is not installed. Required for JSON parsing. Install: apt-get install jq" + errors=$((errors + 1)) + fi + + if [ "${errors}" -gt 0 ]; then + exit 1 + fi +} + +# ============================================================================== +# 4. TELEGRAM NOTIFICATION +# ============================================================================== +send_telegram() { + local message="$1" + if [ -n "${TELEGRAM_BOT_TOKEN}" ] && [ -n "${TELEGRAM_CHAT_ID}" ]; then + curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \ + -d "chat_id=${TELEGRAM_CHAT_ID}" \ + -d "text=${message}" \ + -d "parse_mode=Markdown" \ + >/dev/null 2>&1 || true + fi +} + +# ============================================================================== +# 5. NPM API AUTHENTICATION +# ============================================================================== +npm_authenticate() { + log "Authenticating with NPM API at ${NPM_API_URL}..." 
+ + local response + response=$(curl -s -w "\n%{http_code}" \ + -X POST "${NPM_API_URL}/tokens" \ + -H "Content-Type: application/json" \ + -d "{\"identity\": \"${NPM_ADMIN_EMAIL}\", \"secret\": \"${NPM_ADMIN_PASSWORD}\"}" \ + --connect-timeout 10 \ + --max-time 15) + + local http_code + http_code=$(echo "${response}" | tail -n1) + local body + body=$(echo "${response}" | head -n-1) + + if [ "${http_code}" != "200" ]; then + log_error "NPM authentication failed (HTTP ${http_code}). Check NPM_ADMIN_EMAIL and NPM_ADMIN_PASSWORD." + log_error "Response: ${body}" + return 1 + fi + + local token + token=$(echo "${body}" | jq -r '.token // empty') + if [ -z "${token}" ]; then + log_error "Could not extract token from NPM response." + log_error "Response: ${body}" + return 1 + fi + + log_success "NPM authentication successful." + echo "${token}" +} + +# ============================================================================== +# 6. FIND PROXY HOST BY DOMAIN +# ============================================================================== +npm_find_proxy_host() { + local token="$1" + log "Looking up proxy host for domain: ${NPM_PROXY_HOST_DOMAIN}..." + + local response + response=$(curl -s -w "\n%{http_code}" \ + -X GET "${NPM_API_URL}/nginx/proxy-hosts?expand=domain_names" \ + -H "Authorization: Bearer ${token}" \ + --connect-timeout 10 \ + --max-time 15) + + local http_code + http_code=$(echo "${response}" | tail -n1) + local body + body=$(echo "${response}" | head -n-1) + + if [ "${http_code}" != "200" ]; then + log_error "Failed to retrieve proxy hosts (HTTP ${http_code})" + return 1 + fi + + # Find the proxy host ID matching our domain + local host_id + host_id=$(echo "${body}" | jq -r \ + --arg domain "${NPM_PROXY_HOST_DOMAIN}" \ + '.[] | select(.domain_names[] == $domain) | .id' | head -n1) + + if [ -z "${host_id}" ]; then + log_error "No proxy host found for domain '${NPM_PROXY_HOST_DOMAIN}' in NPM." 
+ log_error "Available domains:" + echo "${body}" | jq -r '.[].domain_names[]' | sed 's/^/ - /' >&2 + return 1 + fi + + log_success "Found proxy host ID: ${host_id} for ${NPM_PROXY_HOST_DOMAIN}" + + # Also retrieve current forward_host for logging + local current_host + current_host=$(echo "${body}" | jq -r \ + --arg domain "${NPM_PROXY_HOST_DOMAIN}" \ + '.[] | select(.domain_names[] == $domain) | .forward_host' | head -n1) + log_info "Current forward host: ${current_host}" + + echo "${host_id}|${current_host}" +} + +# ============================================================================== +# 7. UPDATE PROXY HOST FORWARD IP +# ============================================================================== +npm_update_proxy_host() { + local token="$1" + local host_id="$2" + local new_ip="$3" + + log "Updating proxy host ${host_id} → forward to ${new_ip}..." + + # First, get the full current configuration to preserve all existing settings + local current_config + current_config=$(curl -s \ + -X GET "${NPM_API_URL}/nginx/proxy-hosts/${host_id}" \ + -H "Authorization: Bearer ${token}" \ + --connect-timeout 10 \ + --max-time 15) + + # Build the update payload preserving existing config, only changing forward_host + local update_payload + update_payload=$(echo "${current_config}" | jq \ + --arg new_ip "${new_ip}" \ + '. + {"forward_host": $new_ip}') + + if [ "${DRY_RUN}" = "true" ]; then + log_warning "[DRY RUN] Would send PUT to ${NPM_API_URL}/nginx/proxy-hosts/${host_id}" + log_warning "[DRY RUN] Payload: ${update_payload}" + log_success "[DRY RUN] NPM failover simulation complete — no changes made." 
+ return 0 + fi + + local response + response=$(curl -s -w "\n%{http_code}" \ + -X PUT "${NPM_API_URL}/nginx/proxy-hosts/${host_id}" \ + -H "Authorization: Bearer ${token}" \ + -H "Content-Type: application/json" \ + -d "${update_payload}" \ + --connect-timeout 10 \ + --max-time 15) + + local http_code + http_code=$(echo "${response}" | tail -n1) + local body + body=$(echo "${response}" | head -n-1) + + if [ "${http_code}" != "200" ]; then + log_error "Failed to update proxy host (HTTP ${http_code})" + log_error "Response: ${body}" + return 1 + fi + + # Verify the change was applied + local confirmed_host + confirmed_host=$(echo "${body}" | jq -r '.forward_host // empty') + if [ "${confirmed_host}" != "${new_ip}" ]; then + log_error "NPM accepted the request but the forward_host is '${confirmed_host}', expected '${new_ip}'." + return 1 + fi + + log_success "NPM proxy host updated successfully: ${NPM_PROXY_HOST_DOMAIN} → ${new_ip}" +} + +# ============================================================================== +# 8. MAIN +# ============================================================================== +main() { + echo "" + echo "=========================================================" + echo " Wordly.art — NPM Failover" + echo " Target IP : ${TARGET_IP:-NOT SET}" + echo " NPM API : ${NPM_API_URL:-NOT SET}" + echo " Domain : ${NPM_PROXY_HOST_DOMAIN}" + echo " Dry Run : ${DRY_RUN}" + echo "=========================================================" + echo "" + + validate_config + + # Step 1: Authenticate + local token + token=$(npm_authenticate) + + # Step 2: Find proxy host ID and current IP + local host_info + host_info=$(npm_find_proxy_host "${token}") + local host_id="${host_info%%|*}" + local current_ip="${host_info##*|}" + + if [ "${current_ip}" = "${TARGET_IP}" ]; then + log_warning "NPM already points to ${TARGET_IP}. No change needed." 
+        exit 0
+    fi
+
+    # Step 3: Update forward host
+    npm_update_proxy_host "${token}" "${host_id}" "${TARGET_IP}"
+
+    # Step 4: Notify
+    if [ "${DRY_RUN}" = "false" ]; then
+        local msg="🔀 *Wordly.art NPM Failover*
+Domain: \`${NPM_PROXY_HOST_DOMAIN}\`
+Previous server: \`${current_ip}\`
+New server: \`${TARGET_IP}\`
+Time: $(date '+%Y-%m-%d %H:%M:%S')"
+        send_telegram "${msg}"
+        log_success "Telegram notification sent."
+    fi
+
+    echo ""
+    log_success "========================================================="
+    log_success "NPM Failover COMPLETE"
+    log_success " ${NPM_PROXY_HOST_DOMAIN} now routes to → ${TARGET_IP}"
+    log_success "========================================================="
+    echo ""
+}
+
+main "$@"
diff --git a/scripts/setup-nas.sh b/scripts/setup-nas.sh
new file mode 100644
index 0000000..2012ad8
--- /dev/null
+++ b/scripts/setup-nas.sh
@@ -0,0 +1,264 @@
+#!/bin/bash
+# ==============================================================================
+# Wordly.art - NAS Setup via SSH/rsync
+# ==============================================================================
+# Configures passwordless SSH access to the Synology NAS.
+# Replaces the CIFS/SMB approach with rsync over SSH:
+#   - No mount to manage, no fstab
+#   - The exact path /volume1/backups/wordly can be used directly
+#   - Encrypted SSH, robust to NAS reboots
+#
+# Usage:
+#   sudo bash scripts/setup-nas.sh
+#
+# Prerequisites on the Synology NAS (see DISASTER_RECOVERY.md, section 1):
+#   1. Account 'wordly-backup' created in DSM → User & Group
+#   2. Read/write access to the 'backups' folder
+#   3. SSH enabled in DSM → Terminal & SNMP
+#   4. Folder /volume1/backups/wordly created and chown'ed to wordly-backup
+# ==============================================================================
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PROJECT_ROOT="$(cd "${SCRIPT_DIR}/.."
&& pwd)"
+
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+BLUE='\033[0;34m'
+NC='\033[0m'
+
+log() { echo -e "[NAS-Setup] $1"; }
+log_success() { echo -e "[NAS-Setup] ${GREEN}✅ $1${NC}"; }
+log_warning() { echo -e "[NAS-Setup] ${YELLOW}⚠️ $1${NC}"; }
+log_error() { echo -e "[NAS-Setup] ${RED}❌ $1${NC}"; }
+log_info() { echo -e "[NAS-Setup] ${BLUE}ℹ️ $1${NC}"; }
+
+# ==============================================================================
+# 1. ROOT CHECK
+# ==============================================================================
+if [ "$EUID" -ne 0 ]; then
+    log_error "This script must be run as root: sudo bash $0"
+    exit 1
+fi
+
+# ==============================================================================
+# 2. LOAD THE .env
+# ==============================================================================
+ENV_FILE="${PROJECT_ROOT}/.env"
+if [ -f "${ENV_FILE}" ]; then
+    set -a
+    source "${ENV_FILE}"
+    set +a
+fi
+
+NAS_HOST="${NAS_HOST:-192.168.1.146}"
+NAS_USER="${NAS_USER:-wordly-backup}"
+NAS_PATH="${NAS_PATH:-/volume1/backups/wordly}"
+NAS_SSH_PORT="${NAS_SSH_PORT:-22}"
+NAS_SSH_KEY="${NAS_SSH_KEY:-/root/.ssh/wordly_nas_key}"
+
+# ==============================================================================
+# 3. CHECK THAT SSH IS AVAILABLE ON THE NAS
+# ==============================================================================
+check_nas_reachable() {
+    log "Checking SSH connectivity to ${NAS_HOST}:${NAS_SSH_PORT}..."
+    if ! nc -z -w5 "${NAS_HOST}" "${NAS_SSH_PORT}" 2>/dev/null; then
+        log_error "Cannot reach ${NAS_HOST} on port ${NAS_SSH_PORT}."
+        log_error "Check that SSH is enabled in DSM → Control Panel → Terminal & SNMP."
+        exit 1
+    fi
+    log_success "NAS ${NAS_HOST}:${NAS_SSH_PORT} is reachable"
+}
+
+# ==============================================================================
+# 4.
GENERATE THE DEDICATED SSH KEY (if it does not exist)
+# ==============================================================================
+generate_ssh_key() {
+    if [ -f "${NAS_SSH_KEY}" ]; then
+        log_info "SSH key already exists: ${NAS_SSH_KEY}"
+        log_info "To regenerate it: rm ${NAS_SSH_KEY} ${NAS_SSH_KEY}.pub"
+    else
+        log "Generating the SSH key dedicated to NAS backups..."
+        ssh-keygen -t ed25519 \
+            -C "wordly-backup@$(hostname)-$(date +%Y%m%d)" \
+            -f "${NAS_SSH_KEY}" \
+            -N ""
+        chmod 600 "${NAS_SSH_KEY}"
+        log_success "SSH key generated: ${NAS_SSH_KEY}"
+    fi
+
+    log_info "Public key to authorize on the NAS:"
+    cat "${NAS_SSH_KEY}.pub"
+}
+
+# ==============================================================================
+# 5. COPY THE PUBLIC KEY TO THE NAS
+# ==============================================================================
+install_ssh_key() {
+    log "Copying the public key to the NAS ${NAS_USER}@${NAS_HOST}..."
+    log_warning "You will be asked for the password of the '${NAS_USER}' account on the NAS ONLY ONCE."
+    echo ""
+
+    if ssh-copy-id \
+        -i "${NAS_SSH_KEY}.pub" \
+        -p "${NAS_SSH_PORT}" \
+        -o StrictHostKeyChecking=accept-new \
+        "${NAS_USER}@${NAS_HOST}"; then
+        log_success "Public key installed on the NAS; no password required from now on"
+    else
+        log_error "ssh-copy-id failed. Did you create the '${NAS_USER}' account on the Synology?"
+        log_error "Also check that SSH is enabled in DSM."
+        exit 1
+    fi
+}
+
+# ==============================================================================
+# 6. TEST THE PASSWORDLESS CONNECTION
+# ==============================================================================
+test_ssh_connection() {
+    log "Testing the passwordless SSH connection..."
+    local result
+    if result=$(ssh \
+        -i "${NAS_SSH_KEY}" \
+        -p "${NAS_SSH_PORT}" \
+        -o StrictHostKeyChecking=accept-new \
+        -o ConnectTimeout=10 \
+        -o BatchMode=yes \
+        "${NAS_USER}@${NAS_HOST}" \
+        "echo OK" 2>/dev/null); then
+        if [ "${result}" = "OK" ]; then
+            log_success "Passwordless SSH connection: OK"
+        else
+            log_error "Connection established but unexpected response: ${result}"
+            exit 1
+        fi
+    else
+        log_error "Passwordless SSH connection FAILED."
+        log_error "Check that the key was copied correctly with ssh-copy-id."
+        exit 1
+    fi
+}
+
+# ==============================================================================
+# 7. CREATE THE DIRECTORY STRUCTURE ON THE NAS
+# ==============================================================================
+create_nas_directories() {
+    log "Creating the directory structure on the NAS: ${NAS_PATH}..."
+    # Remote paths are single-quoted so a NAS_PATH containing spaces survives the remote shell
+    ssh \
+        -i "${NAS_SSH_KEY}" \
+        -p "${NAS_SSH_PORT}" \
+        -o BatchMode=yes \
+        "${NAS_USER}@${NAS_HOST}" \
+        "mkdir -p '${NAS_PATH}/snapshots' '${NAS_PATH}/scripts' && echo OK"
+    log_success "Directory structure created on the NAS:"
+    log_success "  ${NAS_PATH}/snapshots/ → DR archives"
+    log_success "  ${NAS_PATH}/scripts/ → restore scripts"
+}
+
+# ==============================================================================
+# 8. TEST WRITE ACCESS ON THE NAS
+# ==============================================================================
+test_nas_write() {
+    log "Testing write access on the NAS..."
+    local test_file="${NAS_PATH}/.write_test_$$"
+    if ssh \
+        -i "${NAS_SSH_KEY}" \
+        -p "${NAS_SSH_PORT}" \
+        -o BatchMode=yes \
+        "${NAS_USER}@${NAS_HOST}" \
+        "touch '${test_file}' && rm -f '${test_file}' && echo OK" | grep -q "OK"; then
+        log_success "Write access on the NAS: OK"
+    else
+        log_error "The NAS is reachable but NOT writable!"
+        log_error "Check the permissions of '${NAS_PATH}' for user '${NAS_USER}'."
+        exit 1
+    fi
+}
+
+# ==============================================================================
+# 9. CREATE THE SSH CONFIG FILE (~/.ssh/config)
+# ==============================================================================
+configure_ssh_config() {
+    local ssh_config="/root/.ssh/config"
+    local host_entry="
+# Wordly.art — NAS Synology Backup
+Host wordly-nas
+    HostName ${NAS_HOST}
+    User ${NAS_USER}
+    Port ${NAS_SSH_PORT}
+    IdentityFile ${NAS_SSH_KEY}
+    StrictHostKeyChecking accept-new
+    ConnectTimeout 10
+    ServerAliveInterval 30
+    BatchMode yes"
+
+    if grep -q "wordly-nas" "${ssh_config}" 2>/dev/null; then
+        log_warning "Entry 'wordly-nas' already present in ${ssh_config}, skipped."
+    else
+        mkdir -p /root/.ssh
+        chmod 700 /root/.ssh
+        echo "${host_entry}" >> "${ssh_config}"
+        chmod 600 "${ssh_config}"
+        log_success "SSH config added: ${ssh_config}"
+        log_info "You can now use: ssh wordly-nas"
+    fi
+}
+
+# ==============================================================================
+# 10. COPY THE SCRIPTS TO THE NAS (available from any server)
+# ==============================================================================
+sync_scripts_to_nas() {
+    log "Syncing the scripts to the NAS (for restoring from .98)..."
+    if rsync -az \
+        -e "ssh -i ${NAS_SSH_KEY} -p ${NAS_SSH_PORT} -o BatchMode=yes" \
+        --exclude="__pycache__" \
+        --exclude="*.pyc" \
+        "${SCRIPT_DIR}/" \
+        "${NAS_USER}@${NAS_HOST}:${NAS_PATH}/scripts/"; then
+        log_success "Scripts synced to the NAS: ${NAS_PATH}/scripts/"
+    else
+        log_warning "rsync failed: check that rsync is available on the Synology NAS."
+        log_warning "Try: DSM → Control Panel → File Services → rsync → enable the rsync service"
+    fi
+}
+
+# ==============================================================================
+# 11. MAIN
+# ==============================================================================
+main() {
+    echo ""
+    echo "================================================================="
+    echo "  Wordly.art — Setup NAS via SSH/rsync"
+    echo "  NAS  : ${NAS_USER}@${NAS_HOST}:${NAS_SSH_PORT}"
+    echo "  Path : ${NAS_PATH}"
+    echo "  Key  : ${NAS_SSH_KEY}"
+    echo "================================================================="
+    echo ""
+
+    check_nas_reachable
+    generate_ssh_key
+    install_ssh_key
+    test_ssh_connection
+    create_nas_directories
+    test_nas_write
+    configure_ssh_config
+    sync_scripts_to_nas
+
+    echo ""
+    log_success "================================================================="
+    log_success "NAS setup COMPLETE"
+    log_success ""
+    log_success "  ✅ SSH key: ${NAS_SSH_KEY}"
+    log_success "  ✅ Passwordless connection: wordly-nas"
+    log_success "  ✅ Directories created on the NAS"
+    log_success "  ✅ Scripts available for DR from any server"
+    log_success ""
+    log_success "  Quick test: ssh wordly-nas 'ls ${NAS_PATH}/'"
+    log_success "  Next step: bash scripts/backup-to-nas.sh --full"
+    log_success "================================================================="
+    echo ""
+}
+
+main "$@"
diff --git a/scripts/verify-backups.sh b/scripts/verify-backups.sh
new file mode 100644
index 0000000..d09fd18
--- /dev/null
+++ b/scripts/verify-backups.sh
@@ -0,0 +1,370 @@
+#!/bin/bash
+# ==============================================================================
+# Wordly.art - Backup Verification & Telegram Alerts
+# ==============================================================================
+# Runs after every backup to validate integrity and alert on failure.
+# CRON: 30 */6 * * * (30 minutes after each backup)
+#
+# Checks:
+#   - Recent snapshot exists (< 8h)
+#   - Snapshot size > 1MB (not empty)
+#   - Snapshot gzip integrity
+#   - PostgreSQL is responding
+#   - DB contains data (at least one table in the public schema)
+#   - NAS is mounted and writable
+#   - Disk usage < 85%
+#   - App HTTP health check
+# ==============================================================================
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PROJECT_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
+TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
+
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+NC='\033[0m'
+
+log() { echo "[Verify ${TIMESTAMP}] $1"; }
+log_success() { echo -e "[Verify ${TIMESTAMP}] ${GREEN}✅ $1${NC}"; }
+log_warning() { echo -e "[Verify ${TIMESTAMP}] ${YELLOW}⚠️ WARNING: $1${NC}"; }
+log_error() { echo -e "[Verify ${TIMESTAMP}] ${RED}❌ ERROR: $1${NC}"; }
+
+# ==============================================================================
+# 1. LOAD CONFIGURATION
+# ==============================================================================
+ENV_FILE="${PROJECT_ROOT}/.env"
+if [ -f "${ENV_FILE}" ]; then
+    set -a
+    source "${ENV_FILE}"
+    set +a
+fi
+
+# Directories
+NAS_MOUNT="${NAS_MOUNT:-/mnt/nas-wordly}"
+LOCAL_BACKUP_DIR="${BACKUP_DIR:-/opt/wordly/backups}"
+
+# PostgreSQL
+POSTGRES_CONTAINER="${POSTGRES_CONTAINER:-wordly-postgres}"
+POSTGRES_USER="${POSTGRES_USER:-translate}"
+POSTGRES_DB="${POSTGRES_DB:-translate_db}"
+POSTGRES_PASSWORD="${POSTGRES_PASSWORD:-}"
+
+# App health check
+APP_HEALTH_URL="${APP_HEALTH_URL:-http://localhost:8001/health}"
+
+# Thresholds
+MAX_SNAPSHOT_AGE_HOURS=8
+MIN_SNAPSHOT_SIZE_MB=1
+MAX_DISK_USAGE_PERCENT=85
+
+# Telegram
+TELEGRAM_BOT_TOKEN="${TELEGRAM_BOT_TOKEN:-}"
+TELEGRAM_CHAT_ID="${TELEGRAM_CHAT_ID:-}"
+
+# Track failures
+FAILURES=0
+WARNINGS=0
+
+# ==============================================================================
+# 2. TELEGRAM
+# ==============================================================================
+send_telegram() {
+    local message="$1"
+    if [ -n "${TELEGRAM_BOT_TOKEN}" ] && [ -n "${TELEGRAM_CHAT_ID}" ]; then
+        # --data-urlencode keeps newlines and special characters in the message intact
+        curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
+            -d "chat_id=${TELEGRAM_CHAT_ID}" \
+            --data-urlencode "text=${message}" \
+            -d "parse_mode=Markdown" \
+            >/dev/null 2>&1 || true
+    else
+        log_warning "Telegram not configured (TELEGRAM_BOT_TOKEN or TELEGRAM_CHAT_ID missing)"
+    fi
+}
+
+# ==============================================================================
+# 3. CHECK FUNCTIONS
+# ==============================================================================
+
+check_recent_snapshot() {
+    log "Check 1/8: Recent snapshot exists (< ${MAX_SNAPSHOT_AGE_HOURS}h)..."
+
+    # Look in the local backup directory first, then on the NAS if it is mounted;
+    # the first directory that contains a snapshot wins
+    local search_dirs=("${LOCAL_BACKUP_DIR}/daily")
+    if mountpoint -q "${NAS_MOUNT}" 2>/dev/null; then
+        search_dirs+=("${NAS_MOUNT}/snapshots")
+    fi
+
+    local newest_snapshot=""
+    for dir in "${search_dirs[@]}"; do
+        if [ -d "${dir}" ]; then
+            local candidate
+            candidate=$(ls -t "${dir}"/*.gz 2>/dev/null | head -n1 || true)
+            if [ -n "${candidate}" ]; then
+                newest_snapshot="${candidate}"
+                break
+            fi
+        fi
+    done
+
+    if [ -z "${newest_snapshot}" ]; then
+        log_error "No snapshot found in backup directories!"
+        FAILURES=$((FAILURES + 1))
+        return
+    fi
+
+    # Check age
+    local snapshot_time
+    snapshot_time=$(stat -c %Y "${newest_snapshot}" 2>/dev/null || stat -f %m "${newest_snapshot}" 2>/dev/null)
+    local now
+    now=$(date +%s)
+    local age_hours=$(( (now - snapshot_time) / 3600 ))
+
+    if [ "${age_hours}" -ge "${MAX_SNAPSHOT_AGE_HOURS}" ]; then
+        log_error "Newest snapshot is ${age_hours}h old (max: ${MAX_SNAPSHOT_AGE_HOURS}h): $(basename "${newest_snapshot}")"
+        FAILURES=$((FAILURES + 1))
+    else
+        log_success "Snapshot found: $(basename "${newest_snapshot}") (${age_hours}h old)"
+    fi
+
+    echo "${newest_snapshot}"
+}
+
+check_snapshot_size() {
+    local snapshot_path="$1"
+    log "Check 2/8: Snapshot size > ${MIN_SNAPSHOT_SIZE_MB}MB..."
+
+    if [ -z "${snapshot_path}" ] || [ ! -f "${snapshot_path}" ]; then
+        log_warning "No snapshot to size-check."
+        return
+    fi
+
+    local size_bytes
+    size_bytes=$(stat -c %s "${snapshot_path}" 2>/dev/null || stat -f %z "${snapshot_path}" 2>/dev/null)
+    local min_bytes=$((MIN_SNAPSHOT_SIZE_MB * 1024 * 1024))
+
+    if [ "${size_bytes}" -lt "${min_bytes}" ]; then
+        log_error "Snapshot size is $(numfmt --to=iec ${size_bytes}) which is below minimum ${MIN_SNAPSHOT_SIZE_MB}MB — likely empty dump!"
+        FAILURES=$((FAILURES + 1))
+    else
+        log_success "Snapshot size: $(numfmt --to=iec ${size_bytes})"
+    fi
+}
+
+check_snapshot_integrity() {
+    local snapshot_path="$1"
+    log "Check 3/8: Snapshot gzip integrity..."
+
+    if [ -z "${snapshot_path}" ] || [ ! -f "${snapshot_path}" ]; then
+        log_warning "No snapshot to integrity-check."
+        return
+    fi
+
+    if gzip -t "${snapshot_path}" 2>/dev/null; then
+        log_success "Snapshot gzip integrity OK"
+    else
+        log_error "Snapshot is CORRUPTED: $(basename "${snapshot_path}")"
+        FAILURES=$((FAILURES + 1))
+    fi
+}
+
+check_postgres_running() {
+    log "Check 4/8: PostgreSQL container is running and healthy..."
+
+    if ! command -v docker &>/dev/null; then
+        log_warning "Docker not found — skipping PostgreSQL check."
+        return
+    fi
+
+    if ! docker ps --format '{{.Names}}' 2>/dev/null | grep -q "^${POSTGRES_CONTAINER}$"; then
+        log_error "PostgreSQL container '${POSTGRES_CONTAINER}' is NOT running!"
+        FAILURES=$((FAILURES + 1))
+        return
+    fi
+
+    local health
+    health=$(docker inspect --format='{{.State.Health.Status}}' "${POSTGRES_CONTAINER}" 2>/dev/null || echo "unknown")
+    if [ "${health}" = "healthy" ]; then
+        log_success "PostgreSQL container is healthy"
+    elif [ "${health}" = "unknown" ]; then
+        log_warning "PostgreSQL health status unknown (no healthcheck configured?)"
+        WARNINGS=$((WARNINGS + 1))
+    else
+        log_error "PostgreSQL container health status: ${health}"
+        FAILURES=$((FAILURES + 1))
+    fi
+}
+
+check_db_has_data() {
+    log "Check 5/8: Database contains data (COUNT > 0)..."
+
+    if ! command -v docker &>/dev/null; then
+        log_warning "Docker not found — skipping DB data check."
+        return
+    fi
+
+    if ! docker ps --format '{{.Names}}' 2>/dev/null | grep -q "^${POSTGRES_CONTAINER}$"; then
+        log_warning "PostgreSQL container not running — skipping data check."
+        return
+    fi
+
+    # Count the tables in the public schema (this checks that a schema exists, not row counts);
+    # any psql error is treated as a count of 0
+    local count
+    count=$(docker exec -e PGPASSWORD="${POSTGRES_PASSWORD}" "${POSTGRES_CONTAINER}" \
+        psql -U "${POSTGRES_USER}" -d "${POSTGRES_DB}" -t -A \
+        -c "SELECT COUNT(*) FROM information_schema.tables WHERE table_schema = 'public';" \
+        2>/dev/null || echo "0")
+
+    count=$(echo "${count}" | tr -d '[:space:]')
+
+    if [ "${count}" = "0" ] || [ -z "${count}" ]; then
+        log_error "Database appears to be empty (no public tables found)!"
+        FAILURES=$((FAILURES + 1))
+    else
+        log_success "Database has ${count} tables in public schema"
+    fi
+}
+
+check_nas_mounted() {
+    log "Check 6/8: NAS is mounted and writable at ${NAS_MOUNT}..."
+
+    if ! mountpoint -q "${NAS_MOUNT}" 2>/dev/null; then
+        log_error "NAS is NOT mounted at ${NAS_MOUNT}!"
+        log "Attempting emergency remount..."
+        mount "${NAS_MOUNT}" 2>/dev/null || true
+
+        if ! mountpoint -q "${NAS_MOUNT}" 2>/dev/null; then
+            log_error "Emergency remount FAILED. NAS is unavailable."
+            FAILURES=$((FAILURES + 1))
+            return
+        fi
+        log_warning "NAS remounted successfully (was temporarily unmounted)."
+        WARNINGS=$((WARNINGS + 1))
+    fi
+
+    # Test write access
+    local test_file="${NAS_MOUNT}/.write_test_${TIMESTAMP}"
+    if touch "${test_file}" 2>/dev/null && rm -f "${test_file}" 2>/dev/null; then
+        log_success "NAS is mounted and writable"
+    else
+        log_error "NAS is mounted but NOT writable!"
+        FAILURES=$((FAILURES + 1))
+    fi
+}
+
+check_disk_space() {
+    log "Check 7/8: Disk usage < ${MAX_DISK_USAGE_PERCENT}%..."
+
+    # Check NAS disk if mounted
+    if mountpoint -q "${NAS_MOUNT}" 2>/dev/null; then
+        local nas_usage
+        nas_usage=$(df "${NAS_MOUNT}" | awk 'NR==2 {gsub(/%/,""); print $5}')
+        if [ "${nas_usage}" -ge "${MAX_DISK_USAGE_PERCENT}" ]; then
+            log_error "NAS disk usage is ${nas_usage}% (threshold: ${MAX_DISK_USAGE_PERCENT}%)"
+            FAILURES=$((FAILURES + 1))
+        else
+            log_success "NAS disk usage: ${nas_usage}%"
+        fi
+    fi
+
+    # Check local disk (counted as a warning, so it is logged as one)
+    local local_usage
+    local_usage=$(df /opt 2>/dev/null | awk 'NR==2 {gsub(/%/,""); print $5}' || df / | awk 'NR==2 {gsub(/%/,""); print $5}')
+    if [ "${local_usage}" -ge "${MAX_DISK_USAGE_PERCENT}" ]; then
+        log_warning "Local disk usage is ${local_usage}% (threshold: ${MAX_DISK_USAGE_PERCENT}%)"
+        WARNINGS=$((WARNINGS + 1))
+    else
+        log_success "Local disk usage: ${local_usage}%"
+    fi
+}
+
+check_app_health() {
+    log "Check 8/8: App HTTP health check at ${APP_HEALTH_URL}..."
+
+    if ! command -v curl &>/dev/null; then
+        log_warning "curl not found — skipping HTTP health check."
+        return
+    fi
+
+    # On a connection failure curl already prints 000 through -w, so the fallback
+    # must not append a second "000" (that would yield "000000")
+    local http_code
+    http_code=$(curl -s -o /dev/null -w "%{http_code}" \
+        --connect-timeout 5 \
+        --max-time 10 \
+        "${APP_HEALTH_URL}" 2>/dev/null || true)
+    http_code="${http_code:-000}"
+
+    if [ "${http_code}" = "200" ]; then
+        log_success "App health check passed (HTTP ${http_code})"
+    elif [ "${http_code}" = "000" ]; then
+        log_error "App is unreachable (no HTTP response)"
+        FAILURES=$((FAILURES + 1))
+    else
+        log_error "App health check returned HTTP ${http_code}"
+        FAILURES=$((FAILURES + 1))
+    fi
+}
+
+# ==============================================================================
+# 4. MAIN
+# ==============================================================================
+main() {
+    echo ""
+    echo "========================================================="
+    echo "  Wordly.art — Backup Verification"
+    echo "  $(date '+%Y-%m-%d %H:%M:%S')"
+    echo "========================================================="
+    echo ""
+
+    # Run all checks.
+    # check_recent_snapshot runs in the current shell with stdout sent to a temp file:
+    # inside $(...) its FAILURES updates would be lost and its log lines would be
+    # captured together with the snapshot path.
+    local newest_snapshot snapshot_out
+    snapshot_out=$(mktemp)
+    check_recent_snapshot > "${snapshot_out}"
+    newest_snapshot=$(tail -n1 "${snapshot_out}")
+    if [ -f "${newest_snapshot}" ]; then
+        sed '$d' "${snapshot_out}"    # replay the log lines, minus the trailing path
+    else
+        cat "${snapshot_out}"         # no path was printed, only log lines
+        newest_snapshot=""
+    fi
+    rm -f "${snapshot_out}"
+
+    check_snapshot_size "${newest_snapshot}"
+    check_snapshot_integrity "${newest_snapshot}"
+    check_postgres_running
+    check_db_has_data
+    check_nas_mounted
+    check_disk_space
+    check_app_health
+
+    echo ""
+    echo "========================================================="
+    echo "  Results: ${FAILURES} failure(s), ${WARNINGS} warning(s)"
+    echo "========================================================="
+    echo ""
+
+    # Send Telegram report
+    if [ "${FAILURES}" -gt 0 ]; then
+        local msg="🚨 *Wordly.art — Backup Verification FAILED*
+Date: $(date '+%Y-%m-%d %H:%M:%S')
+Failures: ${FAILURES}
+Warnings: ${WARNINGS}
+
+Check logs on 192.168.1.151:
+\`cat /var/log/wordly-verify.log\`"
+        send_telegram "${msg}"
+        log_error "Verification FAILED with ${FAILURES} error(s). Telegram alert sent."
+        exit 1
+    elif [ "${WARNINGS}" -gt 0 ]; then
+        local msg="⚠️ *Wordly.art — Backup Verification passed with warnings*
+Date: $(date '+%Y-%m-%d %H:%M:%S')
+Failures: 0
+Warnings: ${WARNINGS}"
+        send_telegram "${msg}"
+        log_warning "Verification passed with ${WARNINGS} warning(s)."
+    else
+        # Only send success alert once per day (at 06:30)
+        local hour
+        hour=$(date +%H)
+        if [ "${hour}" = "06" ]; then
+            local msg="✅ *Wordly.art — Daily backup check OK*
+Date: $(date '+%Y-%m-%d %H:%M:%S')
+All 8 checks passed."
+            send_telegram "${msg}"
+        fi
+        log_success "All checks passed."
+    fi
+}
+
+main "$@"
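
Review note on `main` in `verify-backups.sh`: it obtains the snapshot path from `check_recent_snapshot`, a function that also writes log lines to stdout and updates the global `FAILURES` counter. The sketch below is only an illustration of how Bash treats such a function when it is called through command substitution versus with its stdout redirected to a file. `COUNTER`, `bump_and_print` and `demo_out` are hypothetical names invented for this example and are not part of the scripts in this diff.

```shell
#!/bin/bash
# Illustrative sketch only: COUNTER, bump_and_print and demo_out are hypothetical names.
COUNTER=0

bump_and_print() {
    COUNTER=$((COUNTER + 1))   # side effect on a global variable
    echo "value"               # result returned through stdout
}

# Command substitution runs the function in a subshell,
# so the increment never reaches the parent shell.
captured=$(bump_and_print)
after_subshell=${COUNTER}

# Redirecting stdout to a file keeps the call in the current shell,
# so the increment is preserved and the output can be read back afterwards.
demo_out=$(mktemp)
bump_and_print > "${demo_out}"
from_file=$(cat "${demo_out}")
rm -f "${demo_out}"
after_redirect=${COUNTER}

echo "captured=${captured} after_subshell=${after_subshell}"
echo "from_file=${from_file} after_redirect=${after_redirect}"
```

The same reasoning would apply to any check function that both prints a result and counts failures; an alternative design, under the same assumptions, is to have the function store its result in a global variable and keep stdout for logging only.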