Documentation: Add French Disaster Recovery Playbook for server failovers

2026-06-07 09:39:26 +02:00
parent e497f2d218
commit 670d3f4376
2 changed files with 393 additions and 0 deletions

DISASTER_RECOVERY.md Normal file

@@ -0,0 +1,106 @@
# Disaster Recovery Playbook
> Step-by-step procedure for a total crash of the primary server (`192.168.1.151`)
---
## 🎯 Objective
This document describes how to restore the entire **Wordly.art** SaaS platform (active database, secret `.env` configuration, and Docker services) on a **new** standby server if the primary server fails completely.
---
## 📁 How the DR (Disaster Recovery) backup works
The disaster recovery script [disaster-recovery.sh](scripts/disaster-recovery.sh) generates a compressed archive containing everything needed to restore the system:
1. **The active database** (PostgreSQL or SQLite).
2. **The production configuration file** `.env` (containing your Stripe, OpenAI, DeepL, etc. API keys).
3. **The `docker-compose.yml` file** and its variants.
4. **The `docker/` folder** containing all Prometheus, Grafana, Nginx, etc. configurations.
5. **The `scripts/` folder**, so the restore scripts travel with the archive.
All archives are stored on your **NAS**, out of reach of hardware failures on the local server: `/mnt/nas-backups/wordly/dr/`.
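
To confirm that an archive really contains the pieces listed above, it can be inspected without extracting it. The sketch below is only an illustration: the function name `check_dr_archive` is hypothetical, and the entry names assume the layout that `scripts/disaster-recovery.sh` produces (`.env` stored as `.env.production`, the database dump under `db_backup/`).

```bash
#!/usr/bin/env bash
# Sketch: report which expected entries are missing from a DR archive.
# check_dr_archive is a hypothetical helper, not part of the repository.
check_dr_archive() {
    local archive="$1" listing entry missing=0
    # tar -tzf lists the entries of a gzip-compressed archive without extracting
    listing=$(tar -tzf "${archive}") || return 2
    for entry in ./.env.production ./docker-compose.yml ./db_backup/; do
        # -x: whole-line match, -F: fixed string (no regex interpretation)
        if ! grep -qxF -- "${entry}" <<<"${listing}"; then
            echo "missing: ${entry}"
            missing=1
        fi
    done
    return "${missing}"
}
```

For example, `check_dr_archive /mnt/nas-backups/wordly/dr/wordly_dr_20260607_033000.tar.gz` prints nothing and returns 0 when all three entries are present.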
---
## 🛠️ Step 1: Automate the full backup (do this today)
To run the Disaster Recovery backup automatically every night at 03:30:
1. Connect over SSH to your primary server (`192.168.1.151`).
2. Open the task scheduler (cron):
```bash
crontab -e
```
3. Add the following line at the very end of the file:
```cron
30 3 * * * /opt/wordly/scripts/disaster-recovery.sh --backup >> /var/log/wordly-dr-backup.log 2>&1
```
4. Save and quit. From now on, a complete restore archive is created and written to your NAS every night, and archives older than **14 days** are deleted automatically.
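
A cron job can stop running without anyone noticing, so it may be worth checking from time to time that a recent archive exists. The following is a minimal sketch under stated assumptions: `dr_backup_is_fresh` is a hypothetical helper, and the 26-hour default (1560 minutes) is an arbitrary margin around the nightly schedule.

```bash
#!/usr/bin/env bash
# Sketch: succeed only if a DR archive was modified recently.
# dr_backup_is_fresh is a hypothetical helper, not part of the repository.
dr_backup_is_fresh() {
    local dir="$1" max_minutes="${2:-1560}" recent
    # -mmin -N matches files modified less than N minutes ago
    recent=$(find "${dir}" -maxdepth 1 -type f -name 'wordly_dr_*.tar.gz' \
        -mmin "-${max_minutes}" | head -n 1)
    [ -n "${recent}" ]
}
```

Usage: `dr_backup_is_fresh /mnt/nas-backups/wordly/dr || echo "No recent DR archive found"`.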
---
## 🚨 Step 2: Restore procedure (when the server has crashed)
If server `192.168.1.151` is unavailable and you need to bring the SaaS back up on a new machine (e.g. **`192.168.1.152`**), follow these steps:
### 2.1 Prepare the new machine
1. Install Docker and Docker Compose on the new server:
```bash
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER && newgrp docker
```
2. Create the project folder by cloning the repository (or copy the files from the NAS):
```bash
git clone -b production-deployment https://gitea.parsanet.org/sepehr/office_translator.git /opt/wordly
cd /opt/wordly
```
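
Before going further, it can help to confirm that the tools used in the rest of this procedure are installed. The sketch below is illustrative: `missing_commands` is a hypothetical helper that prints every command it cannot find on the `PATH`.

```bash
#!/usr/bin/env bash
# Sketch: print the name of each command that is not available.
# missing_commands is a hypothetical helper, not part of the repository.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        command -v "${cmd}" >/dev/null 2>&1 || echo "${cmd}"
    done
}
```

Usage: `missing_commands docker git curl tar` prints nothing when everything is installed. The Compose plugin can be checked separately with `docker compose version`, which is the same test the restore script performs.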
### 2.2 Mount the NAS on the new server
So that the new server can reach the backups stored on your NAS:
1. Install the mount utility and create the mount point:
```bash
sudo apt install cifs-utils -y
sudo mkdir -p /mnt/nas-backups/wordly
```
2. Create the credentials file:
```bash
sudo tee /etc/nas-credentials >/dev/null <<'EOF'
username=wordly-backup
password=MOT_DE_PASSE_NAS
domain=WORKGROUP
EOF
sudo chmod 600 /etc/nas-credentials
```
3. Mount the NAS temporarily for the restore (or add it to `/etc/fstab` to make the mount persistent):
```bash
sudo mount -t cifs -o credentials=/etc/nas-credentials,uid=$(id -u),gid=$(id -g),vers=3.0 //IP_DU_NAS/wordly-backups /mnt/nas-backups/wordly
```
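
It is worth verifying that the share is actually mounted before relying on the directory. The backup script creates its target directory with `mkdir -p`, so a missing mount would not by itself stop a backup from being written to the local disk. A minimal sketch follows, assuming a Linux host where `/proc/self/mounts` is available; `is_mounted` is a hypothetical helper, and its optional second argument exists only so the function can be exercised against a sample file.

```bash
#!/usr/bin/env bash
# Sketch: succeed if TARGET appears as a mount point in the mounts table.
# is_mounted is a hypothetical helper, not part of the repository.
is_mounted() {
    local target="$1" mounts_file="${2:-/proc/self/mounts}"
    # Field 2 of each line in the mounts table is the mount point
    awk -v t="${target}" '$2 == t { found = 1 } END { exit !found }' "${mounts_file}"
}
```

Usage: `is_mounted /mnt/nas-backups/wordly || echo "NAS is not mounted"`.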
### 2.3 Run the restore
1. List the DR archives available on the NAS and pick the most recent one:
```bash
ls -lh /mnt/nas-backups/wordly/dr/
```
2. From `/opt/wordly`, run the full restore (the script extracts the `.env`, copies `docker-compose.yml`, starts the Docker services, waits for the database to be ready, and loads your data into it):
```bash
# Replace with the name of the archive you want to restore
bash scripts/disaster-recovery.sh --restore /mnt/nas-backups/wordly/dr/wordly_dr_20260607_033000.tar.gz
```
3. The script asks you to confirm by typing `RESTORE-ALL`. Type it and press Enter. If the database restore step prompts again, type `RESTORE`.
4. When it finishes, check that the application is running:
```bash
docker compose ps
curl http://localhost:8000/health
```
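
Since archive names embed a `YYYYMMDD_HHMMSS` timestamp, the most recent archive is also the last one in alphabetical order. The sketch below relies on that naming convention; `latest_dr_archive` is a hypothetical helper and is not provided by the repository.

```bash
#!/usr/bin/env bash
# Sketch: print the path of the newest DR archive in a directory.
# latest_dr_archive is a hypothetical helper, not part of the repository.
latest_dr_archive() {
    local dir="$1" f latest=""
    # Glob results are sorted, so the last match carries the latest timestamp
    for f in "${dir}"/wordly_dr_*.tar.gz; do
        [ -e "${f}" ] || continue   # the glob matched nothing
        latest="${f}"
    done
    # Return non-zero and print nothing when no archive exists
    [ -n "${latest}" ] && printf '%s\n' "${latest}"
}
```

Usage: `archive=$(latest_dr_archive /mnt/nas-backups/wordly/dr) && gzip -t "${archive}"` selects the newest archive and checks that the gzip stream is intact before it is passed to `--restore`.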
### 2.4 Update Nginx Proxy Manager (NPM)
Your Nginx Proxy Manager (NPM) instance forwards requests for `wordly.art` to the server's internal IP.
1. Log in to your NPM admin interface (**http://IP_NPM:81**).
2. Go to **Proxy Hosts**.
3. Repeat steps 4 to 6 for each of the affected domain entries:
- `wordly.art`
- `www.wordly.art`
- `monitoring.wordly.art`
4. Click **Edit** (the three-dot menu on the right).
5. In the **Forward Hostname / IP** field, replace the old IP address (`192.168.1.151`) with the IP of your standby server (**`192.168.1.152`**).
6. Click **Save**.
**Done!** Your users can reach `https://wordly.art` again. Because the public URL is unchanged, Stripe keeps sending its webhooks to the same address, which NPM now forwards to the new machine; no further configuration is required.

scripts/disaster-recovery.sh Executable file

@@ -0,0 +1,287 @@
#!/bin/bash
# ==============================================================================
# Wordly.art - Disaster Recovery (DR) Backup & Restore Playbook
# ==============================================================================
# Packages configuration files (.env, docker-compose) and database backups
# into a single archive, and automates restorations on new machines.
#
# Usage:
# ./disaster-recovery.sh --backup # Package and backup configs + DB to NAS
# ./disaster-recovery.sh --restore FILE.tar.gz # Extract configs, boot Docker, restore DB
# ==============================================================================
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
log() { echo -e "[DR ${TIMESTAMP}] $1"; }
log_success() { echo -e "[DR ${TIMESTAMP}] ${GREEN}$1${NC}"; }
log_warning() { echo -e "[DR ${TIMESTAMP}] ${YELLOW}WARNING: $1${NC}"; }
log_error() { echo -e "[DR ${TIMESTAMP}] ${RED}ERROR: $1${NC}"; }
# Sourcing .env for path defaults
ENV_FILE="${PROJECT_ROOT}/.env"
if [ -f "${ENV_FILE}" ]; then
set -a
source "${ENV_FILE}"
set +a
fi
# Configuration defaults
NAS_BACKUP_DIR="${NAS_BACKUP_DIR:-/mnt/nas-backups/wordly}"
DR_BACKUP_DIR="${NAS_BACKUP_DIR}/dr"
DR_RETENTION_DAYS=14
# ==============================================================================
# BACKUP ACTION
# ==============================================================================
perform_backup() {
log "Initiating complete Disaster Recovery backup..."
# 1. Trigger DB Backup first
log "Triggering database backup..."
if ! bash "${SCRIPT_DIR}/backup-database.sh" --full; then
log_error "Database backup failed. Aborting DR package."
exit 1
fi
# 2. Locate the latest DB backup
# Check default backup directory (resolved from .env or script fallback)
local local_backup_dir
local_backup_dir="${BACKUP_DIR:-${PROJECT_ROOT}/backups}"
local latest_db_backup
latest_db_backup=$(ls -t "${local_backup_dir}/daily/"*.gz 2>/dev/null | head -n 1 || true)
if [ -z "${latest_db_backup}" ]; then
log_error "Could not locate the generated database backup file in ${local_backup_dir}/daily/."
exit 1
fi
log "Latest database backup located: $(basename "${latest_db_backup}")"
# 3. Create temp packing folder
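local packing_dir="${PROJECT_ROOT}/temp_dr_pack_${TIMESTAMP}"
mkdir -p "${packing_dir}"
# Remove the staging copy (it holds .env secrets) even if a later step fails
trap "rm -rf '${packing_dir}'" EXIT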
# 4. Copy configurations
log "Packaging configuration files..."
if [ -f "${PROJECT_ROOT}/.env" ]; then
cp "${PROJECT_ROOT}/.env" "${packing_dir}/.env.production"
else
log_warning "No .env file found at project root. Continuing without it."
fi
# Copy docker-compose files
for f in docker-compose.yml docker-compose.local.yml docker-compose.monitoring.yml docker-compose.dev.yml; do
if [ -f "${PROJECT_ROOT}/${f}" ]; then
cp "${PROJECT_ROOT}/${f}" "${packing_dir}/"
fi
done
# Copy docker directory (Prometheus, Grafana, Nginx configs, Dockerfiles)
if [ -d "${PROJECT_ROOT}/docker" ]; then
cp -r "${PROJECT_ROOT}/docker" "${packing_dir}/"
fi
# Copy scripts directory (so restore scripts are present in the package)
if [ -d "${PROJECT_ROOT}/scripts" ]; then
cp -r "${PROJECT_ROOT}/scripts" "${packing_dir}/"
fi
# Copy the DB backup archive
mkdir -p "${packing_dir}/db_backup"
cp "${latest_db_backup}" "${packing_dir}/db_backup/"
# 5. Compress Everything
mkdir -p "${DR_BACKUP_DIR}"
local dr_archive_name="wordly_dr_${TIMESTAMP}.tar.gz"
local dr_archive_path="${DR_BACKUP_DIR}/${dr_archive_name}"
log "Compressing configurations and database into DR package..."
tar -czf "${dr_archive_path}" -C "${packing_dir}" .
# Clean up temp packaging folder
rm -rf "${packing_dir}"
if [ -f "${dr_archive_path}" ] && [ -s "${dr_archive_path}" ]; then
local size
size=$(du -h "${dr_archive_path}" | cut -f1)
log_success "Disaster Recovery backup package created: ${dr_archive_name} (${size})"
log_success "Stored securely at: ${dr_archive_path}"
# 6. Apply retention cleanups
log "Cleaning up old DR packages (retention: ${DR_RETENTION_DAYS} days)..."
find "${DR_BACKUP_DIR}" -name "wordly_dr_*.tar.gz" -mtime +"${DR_RETENTION_DAYS}" -exec rm -f {} \;
log_success "Disaster Recovery backup complete."
else
log_error "DR Archive compression failed."
exit 1
fi
}
# ==============================================================================
# RESTORE ACTION
# ==============================================================================
perform_restore() {
local dr_package="$1"
if [ -z "${dr_package}" ]; then
log_error "No DR package archive specified."
echo "Usage: $0 --restore <path_to_archive.tar.gz>"
echo "Available archives in ${DR_BACKUP_DIR}:"
ls -lh "${DR_BACKUP_DIR}"/wordly_dr_*.tar.gz 2>/dev/null || echo " (none)"
exit 1
fi
if [ ! -f "${dr_package}" ]; then
log_error "Archive file not found: ${dr_package}"
exit 1
fi
echo ""
log_warning "RESTORE DISASTER RECOVERY PACKAGE - THIS WILL OVERWRITE ENVIRONMENT CONFIGURATIONS AND DATABASES!"
echo " Archive: ${dr_package}"
echo " Target : Current Server Host (Workspace)"
echo ""
read -r -p "Type 'RESTORE-ALL' to confirm complete system restore: " confirm_val
if [ "${confirm_val}" != "RESTORE-ALL" ]; then
log "System restore cancelled."
exit 0
fi
log "Extracting DR archive contents..."
# Create safety backup of existing .env before overwrite
if [ -f "${PROJECT_ROOT}/.env" ]; then
cp "${PROJECT_ROOT}/.env" "${PROJECT_ROOT}/.env.bak_before_dr_restore_${TIMESTAMP}"
log "Created backup of existing .env: .env.bak_before_dr_restore_${TIMESTAMP}"
fi
# Extract configs directly into project root
tar -xzf "${dr_package}" -C "${PROJECT_ROOT}"
# Restore .env from packaged .env.production
if [ -f "${PROJECT_ROOT}/.env.production" ]; then
mv "${PROJECT_ROOT}/.env.production" "${PROJECT_ROOT}/.env"
log "Restored .env configuration"
fi
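# Reload variables from restored .env (the package may have been built without one)
if [ -f "${PROJECT_ROOT}/.env" ]; then
set -a
source "${PROJECT_ROOT}/.env"
set +a
else
log_warning "No .env found after extraction. Continuing with the current environment."
fi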
log_success "Docker and configurations extracted successfully."
# Boot Docker Compose Services
log "Spinning up Docker containers (database, redis, backend, frontend)..."
if ! command -v docker-compose &>/dev/null && ! docker compose version &>/dev/null; then
log_error "docker-compose is not installed. Please install Docker first."
exit 1
fi
# Try running docker compose
local compose_cmd="docker compose"
if ! docker compose version &>/dev/null; then
compose_cmd="docker-compose"
fi
# Start services in detached mode from the project root, where the compose files live
cd "${PROJECT_ROOT}"
${compose_cmd} up -d
# Locate the embedded database backup
local db_backup_archive
db_backup_archive=$(ls "${PROJECT_ROOT}/db_backup/"*.gz 2>/dev/null | head -n 1 || true)
if [ -z "${db_backup_archive}" ]; then
log_error "Database backup archive not found inside the DR package extraction."
exit 1
fi
log "Database backup located: $(basename "${db_backup_archive}")"
# Wait for database container to be healthy (PostgreSQL)
local db_type="sqlite"
if [[ "${DATABASE_URL:-}" =~ ^postgres ]]; then
db_type="postgres"
fi
if [ "${db_type}" = "postgres" ]; then
local postgres_container="${POSTGRES_CONTAINER:-wordly-postgres}"
log "Waiting for PostgreSQL container (${postgres_container}) to be healthy..."
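local db_ready=false
for i in $(seq 1 30); do
# -x matches the whole line, so "unhealthy" is not mistaken for "healthy"
if docker inspect --format='{{.State.Health.Status}}' "${postgres_container}" 2>/dev/null | grep -qx "healthy"; then
log_success "Database container is healthy."
db_ready=true
break
fi
echo " Waiting for database... ($i/30)"
sleep 2
done
if [ "${db_ready}" != true ]; then
log_error "PostgreSQL container did not become healthy after 30 attempts."
exit 1
fi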
else
sleep 2
fi
# Restore the database using the database backup script
log "Triggering database restore..."
# Make sure backups/daily folder exists and copy the db backup there for backup-database.sh to see it
local local_backup_dir="${BACKUP_DIR:-${PROJECT_ROOT}/backups}"
mkdir -p "${local_backup_dir}/daily"
cp "${db_backup_archive}" "${local_backup_dir}/daily/"
local db_archive_filename
db_archive_filename=$(basename "${db_backup_archive}")
# Run DB restore
# Run backup-database.sh with the archive file name.
# backup-database.sh may ask for its own confirmation (YES/RESTORE), so this step stays interactive.
log "Restoring DB contents... (You will need to type 'RESTORE' if prompted)"
bash "${SCRIPT_DIR}/backup-database.sh" --restore "${db_archive_filename}"
# Clean up extracted folders
rm -rf "${PROJECT_ROOT}/db_backup"
# Restart app to clear connection caches
log "Restarting application backend..."
${compose_cmd} restart backend
log_success "=========================================================================="
log_success "DISASTER RECOVERY SYSTEM RESTORE COMPLETE!"
log_success "=========================================================================="
log "Your application has been restored and started."
log "Next Steps:"
log "1. Verify the service is online: curl http://localhost:8000/health"
log "2. Update your Nginx Proxy Manager (NPM) domains to point to this server's IP."
echo ""
}
# ==============================================================================
# MAIN ENTRY
# ==============================================================================
main() {
case "${1:-}" in
--backup)
perform_backup
;;
--restore)
perform_restore "${2:-}"
;;
*)
echo "Wordly Disaster Recovery Utility"
echo "Usage:"
echo " $0 --backup # Package and copy configs + database to NAS"
echo " $0 --restore <archive.tar.gz> # Extract and restore full stack on new machine"
exit 1
;;
esac
}
main "$@"