From 80b49ee354d13a2053f858116000f87d9266062a Mon Sep 17 00:00:00 2001
From: sepehr
Date: Sun, 7 Jun 2026 09:50:51 +0200
Subject: [PATCH] Robustness: Implement multi-destination backups (LOCAL, NAS, SCP) and backup/restore of NPM configurations

---
 DISASTER_RECOVERY.md         | 158 +++++++++++-----------
 scripts/disaster-recovery.sh | 245 ++++++++++++++++++++++-------------
 2 files changed, 230 insertions(+), 173 deletions(-)

diff --git a/DISASTER_RECOVERY.md b/DISASTER_RECOVERY.md
index 33e5e97..348daee 100644
--- a/DISASTER_RECOVERY.md
+++ b/DISASTER_RECOVERY.md
@@ -1,106 +1,102 @@
-# Business Recovery Guide (Disaster Recovery Playbook)
-> Step-by-step procedure for a total crash of the main server (`192.168.1.151`)
+# Full Backup & Business Recovery Playbook (Disaster Recovery)
+> Handling hardware failures, backing up Nginx Proxy Manager (NPM), and remote transfer (without a NAS).

 ---

 ## 🎯 Objective

-This document describes how to restore the entire **Wordly.art** SaaS platform (live database, secret `.env` configuration, and Docker services) on a **new** standby server if the main server fails completely.
+This document explains how to automate backups and restore the entire **Wordly.art** SaaS platform (database, the `.env` configuration file containing your secrets, and the SSL/proxy routing configuration of **Nginx Proxy Manager**) on a new server if the main server crashes.

 ---

-## 📁 How the DR (Disaster Recovery) backup works
+## ⚙️ 1. Configuration variables in `.env`

-The disaster recovery script [disaster-recovery.sh](file:///d:/dev1405/office_translator/scripts/disaster-recovery.sh) generates a compressed archive containing everything needed to restore the system:
-1. **The live database** (PostgreSQL or SQLite).
-2. **The production configuration file** `.env` (containing your Stripe, OpenAI, DeepL, etc. API keys).
-3. **The `docker-compose.yml` file** and its variants.
-4. **The `docker/` directory** containing all Prometheus, Grafana, Nginx, etc. configurations.
+To enable the disaster recovery options, add these variables to your production `.env` file:

-All these archives are stored on your **NAS**, safe from hardware failures of the local server: `/mnt/nas-backups/wordly/dr/`.
+```ini
+# ============== Disaster Recovery (DR) configuration ==============
+# Destination type: LOCAL, NAS, or SCP
+BACKUP_DEST_TYPE=LOCAL
+# Local path or mount point (e.g. /mnt/nas-backups/wordly)
+BACKUP_DEST_PATH=/var/backups/wordly
+
+# SSH/SCP configuration (required only if BACKUP_DEST_TYPE=SCP)
+SCP_HOST=192.168.1.200
+SCP_USER=backup_user
+SCP_KEY_PATH=/root/.ssh/id_rsa
+SCP_PORT=22
+SCP_DEST_PATH=/var/backups/wordly_saas
+
+# Nginx Proxy Manager (NPM) directories
+# Leave empty if NPM runs on another machine and is not managed here.
+NPM_DATA_DIR=/opt/npm/data
+NPM_LETSENCRYPT_DIR=/opt/npm/letsencrypt
+```

 ---

-## 🛠️ Step 1: Automating the full backup (to do today)
+## 🛠️ 2. How to configure remote backups (SCP mode)

-To run the Disaster Recovery backup automatically every night at 03:30:
+If you do not have a NAS, **SCP** mode sends the full archive every night to another machine on your local network (e.g. `192.168.1.200`).

-1. Connect via SSH to your main server (`192.168.1.151`).
-2. Open the task scheduler (cron):
-   ```bash
-   crontab -e
-   ```
-3. Add the following line at the very end of the file:
-   ```cron
-   30 3 * * * /opt/wordly/scripts/disaster-recovery.sh --backup >> /var/log/wordly-dr-backup.log 2>&1
-   ```
-4. Save and quit. From now on, a full restore archive will be created and sent to your NAS every night, with automatic **14-day** retention.
+### Step A: Generate an SSH key on the main server
+On the application server (`192.168.1.151`), if you do not have an SSH key yet:
+```bash
+sudo ssh-keygen -t rsa -b 4096 -N "" -f /root/.ssh/id_rsa
+```
+
+### Step B: Authorize the connection on the backup machine
+Copy the public key to your backup machine (`192.168.1.200`):
+```bash
+sudo ssh-copy-id -i /root/.ssh/id_rsa.pub backup_user@192.168.1.200
+```
+*Verification*: run `sudo ssh -i /root/.ssh/id_rsa backup_user@192.168.1.200` from the main server. You should be logged in **without entering a password**.

 ---

-## 🚨 Step 2: Restore procedure (if the server crashes)
+## 📅 3. Daily automation

-If server `192.168.1.151` is unavailable and you need to bring the SaaS back up on a new machine (e.g. **`192.168.1.152`**), follow these steps:
+Add the script to your crontab so that it runs automatically every night at 03:30:
+```bash
+sudo crontab -e
+```
+Add this line at the very end:
+```cron
+30 3 * * * /opt/wordly/scripts/disaster-recovery.sh --backup >> /var/log/wordly-dr-backup.log 2>&1
+```

-### 2.1 Preparing the new machine
-1. Install Docker and Docker Compose on the new server:
+---
+
+## 🚨 4. Restore procedure on a new server (failover)
+
+If the main server crashes completely and you need to rebuild the infrastructure on a standby server (e.g. `192.168.1.152`):
+
+### Step 4.1: Retrieve the backup archive
+Retrieve the latest `wordly_dr_TIMESTAMP.tar.gz` file from your backup storage (NAS, remote backup machine via SCP, or USB drive).
+
+### Step 4.2: Install Docker on the new server
+```bash
+curl -fsSL https://get.docker.com | sh
+sudo usermod -aG docker $USER && newgrp docker
+```
+
+### Step 4.3: Run the automatic restore
+1. Create the destination directory and change into it:
    ```bash
-   curl -fsSL https://get.docker.com | sh
-   sudo usermod -aG docker $USER && newgrp docker
-   ```
-2. Create the project directory and clone the repository (or copy the files from the NAS):
-   ```bash
-   git clone -b production-deployment https://gitea.parsanet.org/sepehr/office_translator.git /opt/wordly
+   sudo mkdir -p /opt/wordly
    cd /opt/wordly
    ```
+2. Run the restore script from the archive (the script extracts the `.env`, copies the `docker-compose.yml`, restores the NPM configuration and SSL certificates, starts Docker, and reloads the database data):
+   ```bash
+   # Replace with the exact name or path of your archive
+   bash /chemin/vers/votre/archive/scripts/disaster-recovery.sh --restore /chemin/vers/votre/archive/wordly_dr_20260607_033000.tar.gz
+   ```
+3. Confirm the action by typing `RESTORE-ALL` when the script prompts you.

-### 2.2 Mount the NAS on the new server
-So that the new server can access the backups stored on your NAS:
-1. Install the mount utility:
-   ```bash
-   sudo apt install cifs-utils -y
-   sudo mkdir -p /mnt/nas-backups/wordly
-   ```
-2. Create the credentials file:
-   ```bash
-   sudo tee /etc/nas-credentials </dev/null || true
-fi
+# SCP Configuration
+SCP_HOST="${SCP_HOST:-}"
+SCP_USER="${SCP_USER:-}"
+SCP_KEY_PATH="${SCP_KEY_PATH:-~/.ssh/id_rsa}"
+SCP_PORT="${SCP_PORT:-22}"
+SCP_DEST_PATH="${SCP_DEST_PATH:-/var/backups/wordly}"

-if [ ! -w "${NAS_BACKUP_DIR}" ]; then
-    log_warning "NAS backup directory '${NAS_BACKUP_DIR}' is not writable or mounted. Falling back to local backups folder."
-    NAS_BACKUP_DIR="${PROJECT_ROOT}/backups"
-fi
+# NPM Configuration directories
+NPM_DATA_DIR="${NPM_DATA_DIR:-}"
+NPM_LETSENCRYPT_DIR="${NPM_LETSENCRYPT_DIR:-}"

-DR_BACKUP_DIR="${NAS_BACKUP_DIR}/dr"
-DR_RETENTION_DAYS=14
+# ==============================================================================
+# DESTINATION PREPARATION
+# ==============================================================================
+prepare_destination() {
+    if [ "${BACKUP_DEST_TYPE}" = "NAS" ] || [ "${BACKUP_DEST_TYPE}" = "LOCAL" ]; then
+        if [ ! -d "${BACKUP_DEST_PATH}" ]; then
+            mkdir -p "${BACKUP_DEST_PATH}" 2>/dev/null || true
+        fi
+
+        if [ ! -w "${BACKUP_DEST_PATH}" ]; then
+            log_warning "Backup destination path '${BACKUP_DEST_PATH}' is not writable. Falling back to local backups."
+            BACKUP_DEST_PATH="${PROJECT_ROOT}/backups"
+            BACKUP_DEST_TYPE="LOCAL"
+            mkdir -p "${BACKUP_DEST_PATH}/dr"
+        fi
+        DR_LOCAL_DIR="${BACKUP_DEST_PATH}/dr"
+        mkdir -p "${DR_LOCAL_DIR}"
+    elif [ "${BACKUP_DEST_TYPE}" = "SCP" ]; then
+        if [ -z "${SCP_HOST}" ] || [ -z "${SCP_USER}" ]; then
+            log_error "SCP backup selected but SCP_HOST or SCP_USER is not configured in .env."
+            log_warning "Falling back to LOCAL backup directory."
+            BACKUP_DEST_TYPE="LOCAL"
+            BACKUP_DEST_PATH="${PROJECT_ROOT}/backups"
+            DR_LOCAL_DIR="${BACKUP_DEST_PATH}/dr"
+            mkdir -p "${DR_LOCAL_DIR}"
+        fi
+    fi
+}

 # ==============================================================================
 # BACKUP ACTION
 # ==============================================================================
 perform_backup() {
-    log "Initiating complete Disaster Recovery backup..."
+    prepare_destination
+    log "Starting Disaster Recovery backup (Destination Mode: ${BACKUP_DEST_TYPE})..."

-    # 1. Trigger DB Backup first
-    log "Triggering database backup..."
+    # 1. Trigger DB Backup
+    log "Triggering database dump..."
     if ! bash "${SCRIPT_DIR}/backup-database.sh" --full; then
-        log_error "Database backup failed. Aborting DR package."
+        log_error "Database backup failed. Aborting DR packaging."
         exit 1
     fi

-    # 2. Locate the latest DB backup
-    # Check default backup directory (resolved from .env or script fallback)
-    local local_backup_dir
-    local_backup_dir="${BACKUP_DIR:-${PROJECT_ROOT}/backups}"
+    # 2. Locate DB Backup file
+    local local_backup_dir="${BACKUP_DIR:-${PROJECT_ROOT}/backups}"
     local latest_db_backup
-    latest_db_backup=$(ls -t "${local_backup_dir}/daily/"*.gz 2>/dev/null | head -n 1 || true)

     if [ -z "${latest_db_backup}" ]; then
-        log_error "Could not locate the generated database backup file in ${local_backup_dir}/daily/."
+        log_error "Could not find database backup file."
         exit 1
     fi
-    log "Latest database backup located: $(basename "${latest_db_backup}")"
+    log "Database backup file loaded: $(basename "${latest_db_backup}")"

-    # 3. Create temp packing folder
+    # 3. Create temp packaging folder
     local packing_dir="${PROJECT_ROOT}/temp_dr_pack_${TIMESTAMP}"
     mkdir -p "${packing_dir}"
-
-    # 4. Copy configurations
-    log "Packaging configuration files..."
+
+    # 4. Pack Configurations
+    log "Packing application configuration (.env & docker-compose)..."
     if [ -f "${PROJECT_ROOT}/.env" ]; then
         cp "${PROJECT_ROOT}/.env" "${packing_dir}/.env.production"
-    else
-        log_warning "No .env file found at project root. Continuing without it."
     fi
-
-    # Copy docker-compose files
+
     for f in docker-compose.yml docker-compose.local.yml docker-compose.monitoring.yml docker-compose.dev.yml; do
         if [ -f "${PROJECT_ROOT}/${f}" ]; then
             cp "${PROJECT_ROOT}/${f}" "${packing_dir}/"
         fi
     done

-    # Copy docker directory (Prometheus, Grafana, Nginx configs, Dockerfiles)
     if [ -d "${PROJECT_ROOT}/docker" ]; then
         cp -r "${PROJECT_ROOT}/docker" "${packing_dir}/"
     fi
-
-    # Copy scripts directory (so restore scripts are present in the package)
     if [ -d "${PROJECT_ROOT}/scripts" ]; then
         cp -r "${PROJECT_ROOT}/scripts" "${packing_dir}/"
     fi

-    # Copy the DB backup archive
     mkdir -p "${packing_dir}/db_backup"
     cp "${latest_db_backup}" "${packing_dir}/db_backup/"

-    # 5. Compress Everything
-    mkdir -p "${DR_BACKUP_DIR}"
+    # 5. Pack Nginx Proxy Manager (NPM) configs if configured
+    local has_npm_data=false
+    if [ -n "${NPM_DATA_DIR}" ] && [ -d "${NPM_DATA_DIR}" ]; then
+        log "Packaging Nginx Proxy Manager /data directory..."
+        cp -r "${NPM_DATA_DIR}" "${packing_dir}/npm_data"
+        has_npm_data=true
+    fi
+    if [ -n "${NPM_LETSENCRYPT_DIR}" ] && [ -d "${NPM_LETSENCRYPT_DIR}" ]; then
+        log "Packaging Nginx Proxy Manager /etc/letsencrypt directory..."
+        cp -r "${NPM_LETSENCRYPT_DIR}" "${packing_dir}/npm_letsencrypt"
+        has_npm_data=true
+    fi
+
+    if [ "${has_npm_data}" = "false" ]; then
+        log_warning "NPM directories (NPM_DATA_DIR / NPM_LETSENCRYPT_DIR) not configured or not found. Skipping NPM config packaging."
+    fi
+
+    # 6. Compress DR Archive
     local dr_archive_name="wordly_dr_${TIMESTAMP}.tar.gz"
-    local dr_archive_path="${DR_BACKUP_DIR}/${dr_archive_name}"
+    local local_archive_path="${PROJECT_ROOT}/${dr_archive_name}"

-    log "Compressing configurations and database into DR package..."
-    tar -czf "${dr_archive_path}" -C "${packing_dir}" .
-
-    # Clean up temp packaging folder
+    log "Compressing configurations, database, and NPM data into DR archive..."
+    tar -czf "${local_archive_path}" -C "${packing_dir}" .
     rm -rf "${packing_dir}"

-    if [ -f "${dr_archive_path}" ] && [ -s "${dr_archive_path}" ]; then
-        local size
-        size=$(du -h "${dr_archive_path}" | cut -f1)
-        log_success "Disaster Recovery backup package created: ${dr_archive_name} (${size})"
-        log_success "Stored securely at: ${dr_archive_path}"
-
-        # 6. Apply retention cleanups
-        log "Cleaning up old DR packages (retention: ${DR_RETENTION_DAYS} days)..."
-        find "${DR_BACKUP_DIR}" -name "wordly_dr_*.tar.gz" -mtime +"${DR_RETENTION_DAYS}" -exec rm -f {} \;
-        log_success "Disaster Recovery backup complete."
-    else
-        log_error "DR Archive compression failed."
+    if [ ! -f "${local_archive_path}" ] || [ ! -s "${local_archive_path}" ]; then
+        log_error "Failed to compress archive."
         exit 1
     fi
+
+    local size
+    size=$(du -h "${local_archive_path}" | cut -f1)
+
+    # 7. Route to Destination
+    if [ "${BACKUP_DEST_TYPE}" = "LOCAL" ] || [ "${BACKUP_DEST_TYPE}" = "NAS" ]; then
+        local dest_path="${DR_LOCAL_DIR}/${dr_archive_name}"
+        mv "${local_archive_path}" "${dest_path}"
+        log_success "DR archive created successfully (${size}) at: ${dest_path}"
+
+        # Retention
+        log "Applying retention policy (pruning files older than ${DR_RETENTION_DAYS} days)..."
+        find "${DR_LOCAL_DIR}" -name "wordly_dr_*.tar.gz" -mtime +"${DR_RETENTION_DAYS}" -exec rm -f {} \;
+
+    elif [ "${BACKUP_DEST_TYPE}" = "SCP" ]; then
+        log "Transferring DR archive to remote server via SCP (${SCP_USER}@${SCP_HOST}:${SCP_PORT})..."
+
+        # Test connection & Create remote directory if not exists
+        if ! ssh -p "${SCP_PORT}" -i "${SCP_KEY_PATH}" -o ConnectTimeout=5 -o StrictHostKeyChecking=no "${SCP_USER}@${SCP_HOST}" "mkdir -p ${SCP_DEST_PATH}" 2>/dev/null; then
+            log_error "SSH connection to ${SCP_USER}@${SCP_HOST} failed. Saving archive locally instead."
+            mkdir -p "${PROJECT_ROOT}/backups/dr"
+            mv "${local_archive_path}" "${PROJECT_ROOT}/backups/dr/${dr_archive_name}"
+            log_warning "DR backup saved locally at: ${PROJECT_ROOT}/backups/dr/${dr_archive_name}"
+            exit 1
+        fi
+
+        # SCP copy
+        if scp -P "${SCP_PORT}" -i "${SCP_KEY_PATH}" -o StrictHostKeyChecking=no "${local_archive_path}" "${SCP_USER}@${SCP_HOST}:${SCP_DEST_PATH}/${dr_archive_name}"; then
+            log_success "DR archive transferred successfully to ${SCP_USER}@${SCP_HOST}:${SCP_DEST_PATH}/${dr_archive_name}"
+            rm -f "${local_archive_path}"
+
+            # Remote retention prune
+            log "Applying remote retention policy on backup server..."
+            ssh -p "${SCP_PORT}" -i "${SCP_KEY_PATH}" -o StrictHostKeyChecking=no "${SCP_USER}@${SCP_HOST}" \
+                "find ${SCP_DEST_PATH} -name 'wordly_dr_*.tar.gz' -mtime +${DR_RETENTION_DAYS} -exec rm -f {} \;" || true
+        else
+            log_error "SCP file transfer failed. Retaining local backup."
+            mkdir -p "${PROJECT_ROOT}/backups/dr"
+            mv "${local_archive_path}" "${PROJECT_ROOT}/backups/dr/${dr_archive_name}"
+        fi
+    fi
+
+    log_success "Disaster Recovery backup complete."
 }

 # ==============================================================================
@@ -147,20 +210,17 @@ perform_restore() {
     if [ -z "${dr_package}" ]; then
         log_error "No DR package archive specified."
         echo "Usage: $0 --restore "
-        echo "Available archives in ${DR_BACKUP_DIR}:"
-        ls -lh "${DR_BACKUP_DIR}"/wordly_dr_*.tar.gz 2>/dev/null || echo " (none)"
         exit 1
     fi

     if [ ! -f "${dr_package}" ]; then
-        log_error "Archive file not found: ${dr_package}"
+        log_error "DR Archive file not found: ${dr_package}"
         exit 1
     fi

     echo ""
-    log_warning "RESTORE DISASTER RECOVERY PACKAGE - THIS WILL OVERWRITE ENVIRONMENT CONFIGURATIONS AND DATABASES!"
+    log_warning "RESTORE DISASTER RECOVERY PACKAGE - THIS WILL OVERWRITE ENVIRONMENT CONFIGURATIONS, DATABASES, AND NPM FILES!"
     echo " Archive: ${dr_package}"
-    echo " Target : Current Server Host (Workspace)"
     echo ""
     read -p "Type 'RESTORE-ALL' to confirm complete system restore: " confirm_val
     if [ "${confirm_val}" != "RESTORE-ALL" ]; then
@@ -169,17 +229,17 @@ perform_restore() {
     fi

     log "Extracting DR archive contents..."
-
-    # Create safety backup of existing .env before overwrite
+
+    # Safety backup of existing .env
     if [ -f "${PROJECT_ROOT}/.env" ]; then
         cp "${PROJECT_ROOT}/.env" "${PROJECT_ROOT}/.env.bak_before_dr_restore_${TIMESTAMP}"
         log "Created backup of existing .env: .env.bak_before_dr_restore_${TIMESTAMP}"
     fi

-    # Extract configs directly into project root
+    # Extract all
     tar -xzf "${dr_package}" -C "${PROJECT_ROOT}"

-    # Restore .env from packaged .env.production
+    # Restore .env
     if [ -f "${PROJECT_ROOT}/.env.production" ]; then
         mv "${PROJECT_ROOT}/.env.production" "${PROJECT_ROOT}/.env"
         log "Restored .env configuration"
@@ -190,22 +250,30 @@ perform_restore() {
     source "${PROJECT_ROOT}/.env"
     set +a

-    log_success "Docker and configurations extracted successfully."
-
-    # Boot Docker Compose Services
-    log "Spinning up Docker containers (database, redis, backend, frontend)..."
-    if ! command -v docker-compose &>/dev/null && ! docker compose version &>/dev/null; then
-        log_error "docker-compose is not installed. Please install Docker first."
-        exit 1
+    # Restore NPM configs to their target directories if present in the package
+    if [ -d "${PROJECT_ROOT}/npm_data" ] && [ -n "${NPM_DATA_DIR}" ]; then
+        log "Restoring NPM /data directory..."
+        mkdir -p "$(dirname "${NPM_DATA_DIR}")"
+        rm -rf "${NPM_DATA_DIR}"
+        mv "${PROJECT_ROOT}/npm_data" "${NPM_DATA_DIR}"
     fi

-    # Try running docker compose
+    if [ -d "${PROJECT_ROOT}/npm_letsencrypt" ] && [ -n "${NPM_LETSENCRYPT_DIR}" ]; then
+        log "Restoring NPM /etc/letsencrypt directory..."
+        mkdir -p "$(dirname "${NPM_LETSENCRYPT_DIR}")"
+        rm -rf "${NPM_LETSENCRYPT_DIR}"
+        mv "${PROJECT_ROOT}/npm_letsencrypt" "${NPM_LETSENCRYPT_DIR}"
+    fi
+
+    log_success "Docker configurations, env keys, and NPM configurations restored."
+
+    # Boot Docker Compose Services
+    log "Spinning up Docker containers (database, redis, backend, frontend, NPM if configured)..."
     local compose_cmd="docker compose"
     if ! docker compose version &>/dev/null; then
         compose_cmd="docker-compose"
     fi

-    # Start services in detached mode
     ${compose_cmd} up -d

     # Locate the embedded database backup
@@ -242,7 +310,6 @@ perform_restore() {
     # Restore the database using the database backup script
     log "Triggering database restore..."

-    # Make sure backups/daily folder exists and copy the db backup there for backup-database.sh to see it
     local local_backup_dir="${BACKUP_DIR:-${PROJECT_ROOT}/backups}"
     mkdir -p "${local_backup_dir}/daily"
     cp "${db_backup_archive}" "${local_backup_dir}/daily/"
@@ -251,13 +318,10 @@ perform_restore() {
     db_archive_filename=$(basename "${db_backup_archive}")

     # Run DB restore
-    # Sourcing backup-database.sh with the file name
-    # We pass the confirmation non-interactively using YES or mock prompt if needed,
-    # but backup-database.sh reads YES/RESTORE. Let's make it easy:
-    log "Restoring DB contents... (You will need to type 'RESTORE' if prompted)"
+    log "Restoring DB contents..."
     bash "${SCRIPT_DIR}/backup-database.sh" --restore "${db_archive_filename}"

-    # Clean up extracted folders
+    # Clean up extracted temporary folder
     rm -rf "${PROJECT_ROOT}/db_backup"

     # Restart app to clear connection caches
@@ -267,10 +331,7 @@ perform_restore() {
     log_success "=========================================================================="
     log_success "DISASTER RECOVERY SYSTEM RESTORE COMPLETE!"
     log_success "=========================================================================="
-    log "Your application has been restored and started."
-    log "Next Steps:"
-    log "1. Verify the service is online: curl http://localhost:8000/health"
-    log "2. Update your Nginx Proxy Manager (NPM) domains to point to this server's IP."
+    log "Your application and reverse-proxy routes are restored."
     echo ""
 }

@@ -286,9 +347,9 @@ main() {
         perform_restore "${2:-}"
         ;;
     *)
-        echo "Wordly Disaster Recovery Utility"
+        echo "Wordly Disaster Recovery Utility (V2)"
         echo "Usage:"
-        echo " $0 --backup # Package and copy configs + database to NAS"
+        echo " $0 --backup # Package configs, db dump, NPM configurations, and export"
         echo " $0 --restore # Extract and restore full stack on new machine"
         exit 1
         ;;
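
Review sketch (not part of this patch): `--restore` extracts the archive straight into `PROJECT_ROOT` and replaces `.env` and the NPM directories, so a pre-flight check of the archive may be worth considering. The function below is a minimal sketch that assumes the layout `perform_backup()` produces (a `db_backup/` directory and `.env.production` at the archive root). `verify_dr_archive` is a hypothetical helper name, not something the script defines.

```bash
#!/usr/bin/env bash
# Hypothetical pre-restore check; NOT part of this patch.
# Assumes the archive layout produced by perform_backup():
#   ./db_backup/<dump>  and  ./.env.production  at the archive root.

verify_dr_archive() {
    local archive="$1"
    local listing

    # Reject a missing or zero-byte file before calling tar.
    if [ ! -s "${archive}" ]; then
        echo "verify: missing or empty archive: ${archive}" >&2
        return 1
    fi

    # `tar -tzf` lists entries without extracting and fails on corrupt input.
    if ! listing=$(tar -tzf "${archive}" 2>/dev/null); then
        echo "verify: not a readable .tar.gz: ${archive}" >&2
        return 1
    fi

    # Entries appear as ./db_backup/... because the archive is built with `-C dir .`.
    if ! printf '%s\n' "${listing}" | grep -Eq '^(\./)?db_backup/.+'; then
        echo "verify: no database dump under db_backup/" >&2
        return 1
    fi

    # A missing .env.production is only a warning: perform_backup() skips it
    # when the project has no .env.
    if ! printf '%s\n' "${listing}" | grep -Eq '^(\./)?\.env\.production$'; then
        echo "verify: warning: archive has no .env.production" >&2
    fi
    return 0
}
```

One possible call site would be the top of `perform_restore()`, before the `RESTORE-ALL` prompt, for example `verify_dr_archive "${dr_package}" || exit 1`, so that a truncated SCP transfer is caught before anything is overwritten.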