feat: add NAS backup, verification, and DR scripts
2026-06-07 11:12:01 +02:00
parent fb6740f333
commit 3f980ad537
7 changed files with 1789 additions and 429 deletions


@@ -1,102 +1,319 @@
# Full Backup & Disaster Recovery Playbook
> Handling hardware failures, backing up Nginx Proxy Manager (NPM), and transferring to a remote machine (no NAS).
# Disaster Recovery — Wordly.art
## Complete operational guide
---
## 🎯 Goal
This document explains how to automate backups and restore the entire **Wordly.art** SaaS platform (the database, the `.env` configuration file holding your secrets, and the SSL/proxy routing configuration of **Nginx Proxy Manager**) on a new server if the primary server crashes.
## Architecture
---
## ⚙️ 1. Configuration variables in `.env`
To enable the disaster recovery options, add these variables to your production `.env` file:
```ini
# ============== Disaster Recovery (DR) configuration ==============
# Destination type: LOCAL, NAS, or SCP
BACKUP_DEST_TYPE=LOCAL
# Local path or mount point (e.g. /mnt/nas-backups/wordly)
BACKUP_DEST_PATH=/var/backups/wordly
# SSH/SCP settings (required only when BACKUP_DEST_TYPE=SCP)
SCP_HOST=192.168.1.200
SCP_USER=backup_user
SCP_KEY_PATH=/root/.ssh/id_rsa
SCP_PORT=22
SCP_DEST_PATH=/var/backups/wordly_saas
# Nginx Proxy Manager (NPM) directories
# Leave empty if NPM runs on another machine and is not managed here.
NPM_DATA_DIR=/opt/npm/data
NPM_LETSENCRYPT_DIR=/opt/npm/letsencrypt
```
        [ Internet ]
             ▼ (80/443)
┌────────────────────────────────────┐
│  Dedicated NPM: 192.168.1.184      │ ← STABLE (does not go down)
│  Admin UI: :81                     │
└────────────┬───────────────────────┘
             │ Forward Hostname → IP of the active server
┌────────────────────────────────────┐
│  APP server: 192.168.1.151         │ ← MAY CRASH
│  Docker: postgres, redis,          │
│         backend:8001, frontend:3000│
└────────────┬───────────────────────┘
             │ rsync over SSH every 6h (cron)
┌────────────────────────────────────────────────┐
│  Synology NAS: 192.168.1.146                   │ ← SOURCE OF TRUTH
│  Actual path: /volume1/backups/wordly          │
│  Access: SSH key (wordly-backup@nas)           │
│  No CIFS mount, direct rsync                   │
└────────────┬───────────────────────────────────┘
             │ (if .151 crashes)
┌────────────────────────────────────┐
│  STANDBY server: 192.168.1.98      │ ← Docker already installed
│  Restores via rsync SSH from NAS   │
│  → NPM rerouted automatically      │
└────────────────────────────────────┘
```
**Why rsync over SSH rather than CIFS/SMB?**
- No mount to manage and no `/etc/fstab` entry to configure
- Keeps working when the NAS reboots (no stale mount)
- The exact path `/volume1/backups/wordly` can be used directly
- SSH encrypts the transfer, and a passphrase-less key allows automation
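
As a rough illustration of the points above, the sketch below only prints the rsync-over-SSH command that would push one archive to the NAS; it does not run it. The function name `nas_push_cmd` is hypothetical, and the defaults are taken from the architecture diagram.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (do not execute) the rsync-over-SSH command
# for one archive. Defaults mirror the architecture diagram and can be
# overridden through the NAS_* environment variables.
nas_push_cmd() {
    local archive="$1"
    local key="${NAS_SSH_KEY:-/root/.ssh/wordly_nas_key}"
    local user="${NAS_USER:-wordly-backup}"
    local host="${NAS_HOST:-192.168.1.146}"
    local path="${NAS_PATH:-/volume1/backups/wordly}"
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # which is what an unattended cron job needs.
    printf 'rsync -az -e "ssh -i %s -o BatchMode=yes" %s %s@%s:%s/snapshots/\n' \
        "$key" "$archive" "$user" "$host" "$path"
}
```

Printing the command first is a cheap way to review the destination path before letting cron run it for real.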
---
## 🛠️ 2. Configuring remote backups (SCP mode)
## RPO / RTO
If you do not have a NAS, **SCP** mode sends the full archive every night to another machine or computer on your local network (e.g. `192.168.1.200`).
| Scenario | Max data loss | Time to recover | Procedure |
|----------|---------------|-----------------|-----------|
| Container crash | 0 | ~30 s | Docker auto-restart |
| PostgreSQL process crash | 0–5 s | ~1 min | Auto-restart + WAL |
| Partial DB corruption | 0–6 h | ~5 min | Restore from NAS |
| Server .151 dead | 0–6 h | **~25 min** | Restore from NAS onto .98 + automatic NPM switch |
| Human error (DROP) | 0–6 h | ~5 min | Restore previous snapshot |
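
The 6-hour upper bound in the table follows from the backup interval: in the worst case the server dies just before the next scheduled run. A minimal sketch of that check, assuming the caller supplies the timestamp of the newest archive (the function name `rpo_exceeded` is hypothetical):

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when the newest backup is older
# than the allowed RPO window.
#   $1 = epoch seconds of the newest backup
#   $2 = current epoch seconds
#   $3 = RPO in hours (defaults to 6, the backup interval)
rpo_exceeded() {
    local backup_epoch="$1" now_epoch="$2" rpo_hours="${3:-6}"
    local age=$(( now_epoch - backup_epoch ))
    [ "$age" -gt $(( rpo_hours * 3600 )) ]
}

# Usage: rpo_exceeded "$last_backup_epoch" "$(date +%s)" && echo "RPO exceeded"
```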
---
## What is backed up
| Component | Backed up | Frequency / note |
|-----------|-----------|------------------|
| PostgreSQL `pg_dump` | ✅ | Every 6h |
| `.env` (secrets, API keys, Stripe...) | ✅ | In every DR archive |
| `docker-compose.yml` | ✅ | In every DR archive |
| `docker/` directory (configs) | ✅ | In every DR archive |
| Redis | ❌ | Cache only; sessions are lost on restore (users log in again) |
| NPM config | ❌ | NPM runs on .184 (stable). Only the Forward Host changes, via the API. |
| Prometheus metrics | ❌ | Not critical, starts again from zero |
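
To make the "DR archive" rows concrete, here is a minimal sketch of packing a staging directory into a `.tar.gz` and checking it before upload. It follows the same `tar -czf` and `gzip -t` steps as `backup-to-nas.sh`; the function name and the extra listing check are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pack a staging directory into a gzip-compressed tar
# archive, then verify the result before it is sent anywhere.
#   $1 = staging directory (dump, .env, docker-compose.yml, docker/)
#   $2 = archive path to create (must live outside the staging directory)
build_dr_archive() {
    local src_dir="$1" archive="$2"
    tar -czf "$archive" -C "$src_dir" . || return 1
    # gzip -t checks the compressed stream; tar -t checks the tar structure.
    gzip -t "$archive" || return 1
    tar -tzf "$archive" >/dev/null
}
```

Running both checks catches a truncated write as well as a structurally broken archive.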
---
## 1. INITIAL SETUP (one time only, on .151)
### Step 1: Create the account on the Synology NAS
**Log in to the DSM interface: `http://192.168.1.146:5000`**
#### 1a. Create the dedicated user
```
DSM → Control Panel → User & Group → Create
Username: wordly-backup
Password: [choose a strong password]
☑ User cannot change their password
→ Next
Shared folder permissions:
backups → ☑ Read/Write
→ Next → Done
```
#### 1b. Enable SSH on the NAS
```
DSM → Control Panel → Terminal & SNMP
☑ Enable SSH service
Port: 22 (or another port if you changed it)
→ Apply
```
#### 1c. Create the wordly folder on the NAS
### Step A: Generate an SSH key on the primary server
On the application server (`192.168.1.151`), if you do not have an SSH key yet:
```bash
sudo ssh-keygen -t rsa -b 4096 -N "" -f /root/.ssh/id_rsa
```
### Step B: Authorize the connection on the backup machine
Copy the public key to your backup machine (`192.168.1.200`):
```bash
sudo ssh-copy-id -i /root/.ssh/id_rsa.pub backup_user@192.168.1.200
```
*Check*: run `sudo ssh -i /root/.ssh/id_rsa backup_user@192.168.1.200` from the primary server. You should be logged in **without typing a password**.
---
## 📅 3. Daily automation
Add the script to your crontab so it runs automatically every night at 03:30:
```bash
sudo crontab -e
```
Add this line at the very end:
```cron
30 3 * * * /opt/wordly/scripts/disaster-recovery.sh --backup >> /var/log/wordly-dr-backup.log 2>&1
# From your workstation (or any machine on the network):
ssh admin@192.168.1.146
mkdir -p /volume1/backups/wordly/snapshots
mkdir -p /volume1/backups/wordly/scripts
chown -R wordly-backup:users /volume1/backups/wordly
exit
```
---
## 🚨 4. Restore procedure on a new server (failover)
### Step 2: Set the variables in `.env` on `.151`
If the primary server crashes completely and you need to rebuild the infrastructure on a standby server (e.g. `192.168.1.152`):
### Step 4.1: Retrieve the backup archive
Retrieve the latest `wordly_dr_TIMESTAMP.tar.gz` file from your backup storage (NAS, remote backup machine via SCP, or USB drive).
### Step 4.2: Install Docker on the new server
```bash
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER && newgrp docker
# ── NAS SSH ───────────────────────────────────
NAS_HOST=192.168.1.146
NAS_USER=wordly-backup
NAS_PATH=/volume1/backups/wordly
NAS_SSH_PORT=22
NAS_SSH_KEY=/root/.ssh/wordly_nas_key
# ── Telegram alerts ───────────────────────────
TELEGRAM_BOT_TOKEN= # See "Create a Telegram bot" below
TELEGRAM_CHAT_ID= # Your personal chat ID
# ── NPM Failover API ──────────────────────────
NPM_API_URL=http://192.168.1.184:81/api
NPM_ADMIN_EMAIL=admin@wordly.art
NPM_ADMIN_PASSWORD=YourNPMPassword
NPM_PROXY_HOST_DOMAIN=wordly.art
# ── Retention ────────────────────────────────
DAILY_RETENTION=7
WEEKLY_RETENTION=4
MONTHLY_RETENTION=6
DR_RETENTION_DAYS=30
```
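
The retention counts can be read as "keep the newest N archives of each kind". A minimal sketch of that selection is shown below; it relies on the `YYYYMMDD_HHMMSS` timestamp in the file name sorting lexicographically, the same `sort -r | tail -n +N` idea that `cleanup_nas` applies to weekly and monthly archives. The function name is hypothetical, and it only prints candidates; it deletes nothing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read archive names on stdin (one per line) and print
# the ones that fall outside the retention window.
#   $1 = number of archives to keep (e.g. WEEKLY_RETENTION)
archives_to_prune() {
    local keep="$1"
    # Newest first thanks to the timestamp in the name, then skip the
    # first $keep lines and print the rest.
    sort -r | tail -n +"$(( keep + 1 ))"
}

# Usage: ls weekly_*.tar.gz | archives_to_prune "${WEEKLY_RETENTION:-4}"
```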
### Step 4.3: Run the automatic restore
1. Create the destination directory and change into it:
```bash
sudo mkdir -p /opt/wordly
cd /opt/wordly
```
2. Run the restore script against the archive (the script extracts `.env`, copies `docker-compose.yml`, restores the NPM configuration and SSL certificates, starts Docker, and reloads the database data):
```bash
# Replace with the exact name or path of your archive
bash /path/to/your/archive/scripts/disaster-recovery.sh --restore /path/to/your/archive/wordly_dr_20260607_033000.tar.gz
```
3. Confirm by typing `RESTORE-ALL` when the script prompts you.
---
### Step 4.4: Redirect network traffic
Because the server's IP address has changed (from `192.168.1.151` to `192.168.1.152`):
### Step 3: Create a Telegram bot (5 minutes)
#### Case A: NPM was running on the server that crashed
The script has restored NPM on the new machine. You only need to open your internet router's admin page and change the port forwarding for ports **80** and **443** so they point to the new IP `192.168.1.152` instead of `192.168.1.151`.
1. Open Telegram → search for **@BotFather**
2. Send `/newbot`
3. Name: `Wordly Monitoring` / Username: `wordly_monitor_bot`
4. Copy the token → `TELEGRAM_BOT_TOKEN`
5. Send a message to your bot
6. Open `https://api.telegram.org/bot<TOKEN>/getUpdates`
7. Copy the `chat.id` → `TELEGRAM_CHAT_ID`
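
Steps 6 and 7 can also be done from a terminal. The sketch below assumes `jq` is installed and that the first update in the `getUpdates` response is the message you just sent to the bot. The helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read a getUpdates JSON response on stdin and print
# the chat id of the first update. Prints "null" if there is no update yet.
telegram_chat_id() {
    jq -r '.result[0].message.chat.id'
}

# Usage (performs a network call; TOKEN is the bot token from step 4):
#   curl -s "https://api.telegram.org/bot${TOKEN}/getUpdates" | telegram_chat_id
```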
#### Case B: NPM runs on a dedicated external machine
Log in to your NPM web interface (http://IP_NPM:81), edit the Proxy Host for `wordly.art`, and change the **Forward Hostname/IP** field from `192.168.1.151` to the new IP `192.168.1.152`.
---
### Step 4: Set up passwordless SSH to the NAS
```bash
# On server .151 (as root)
sudo bash scripts/setup-nas.sh
```
This script:
- Generates a dedicated SSH key: `/root/.ssh/wordly_nas_key`
- Copies it to the NAS (**the password is requested only once**)
- Tests the passwordless connection
- Creates the directory structure on the NAS
- Configures `~/.ssh/config` with the `wordly-nas` alias
- Copies the scripts to the NAS (available from `.98` for the restore)
---
### Step 5: Test the first backup
```bash
bash scripts/backup-to-nas.sh --full
# Check that the archive has arrived on the NAS
bash scripts/backup-to-nas.sh --list
```
---
### Step 6: Test the automatic verification
```bash
bash scripts/verify-backups.sh
```
---
### Step 7: Test the NPM failover (without changing anything)
```bash
bash scripts/npm-failover.sh --dry-run --target-ip 192.168.1.98
```
---
### Step 8: Enable the cron jobs
```bash
bash scripts/install-crontab.sh
crontab -l # Verify
```
---
## 2. DAILY VERIFICATION (automatic)
```
0 */6 * * * backup-to-nas.sh → DB snapshot + archive → NAS via rsync over SSH
30 */6 * * * verify-backups.sh → 8 checks + Telegram alert on error
```
---
## 3. EMERGENCY RESTORE (when .151 is dead)
> **Estimated duration: 20–25 minutes**
### On the standby server `192.168.1.98`
```bash
# 1. Install the prerequisites (Docker is already installed)
apt-get install -y rsync jq
# 2. Restore the NAS SSH key from a secure source outside the NAS
# Option A: copy the key from a safe place (password manager, etc.)
mkdir -p /root/.ssh && chmod 700 /root/.ssh
# paste the contents of /root/.ssh/wordly_nas_key here
nano /root/.ssh/wordly_nas_key
chmod 600 /root/.ssh/wordly_nas_key
# 3. Test the NAS connection
ssh -i /root/.ssh/wordly_nas_key wordly-backup@192.168.1.146 "echo OK"
# 4. List the available archives
ssh -i /root/.ssh/wordly_nas_key wordly-backup@192.168.1.146 \
    "ls -lht /volume1/backups/wordly/snapshots/ | head -10"
# 5. Download the latest archive from the NAS
rsync -az \
    -e "ssh -i /root/.ssh/wordly_nas_key" \
    wordly-backup@192.168.1.146:/volume1/backups/wordly/snapshots/wordly_dr_TIMESTAMP.tar.gz \
    /tmp/
# 6. Download the restore scripts from the NAS
#    (rsync only creates the last path component, so create the parent first)
mkdir -p /opt/wordly/scripts
rsync -az \
    -e "ssh -i /root/.ssh/wordly_nas_key" \
    wordly-backup@192.168.1.146:/volume1/backups/wordly/scripts/ \
    /opt/wordly/scripts/
# 7. Run the full restore
bash /opt/wordly/scripts/disaster-recovery.sh \
    --restore /tmp/wordly_dr_TIMESTAMP.tar.gz
```
**The script automatically:**
1. Extracts `.env`, `docker-compose.yml`, and the Docker configs
2. Starts all Docker containers
3. Waits until PostgreSQL is healthy
4. Restores the SQL dump
5. Runs a health check on `http://localhost:8001/health` (max 180s)
6. **If OK → calls the NPM API → switches traffic to `192.168.1.98`**
7. **Sends a Telegram alert: "✅ Wordly.art DR COMPLET"**
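
Step 5 is a bounded polling loop. The sketch below shows one way such a wait could look; it is not taken from `disaster-recovery.sh`, and the function name and the 5-second interval in the usage line are assumptions. The probe command is passed as arguments so the loop can be exercised without a running backend.

```bash
#!/usr/bin/env bash
# Hypothetical helper: retry a probe command until it succeeds or the
# timeout is reached.
#   $1 = timeout in seconds, $2 = interval in seconds (must be > 0)
#   remaining arguments = probe command to run
wait_for_health() {
    local timeout="$1" interval="$2"
    shift 2
    local waited=0
    until "$@"; do
        waited=$(( waited + interval ))
        if [ "$waited" -ge "$timeout" ]; then
            return 1    # still failing once the time budget is spent
        fi
        sleep "$interval"
    done
    return 0
}

# Usage, with the endpoint and limit from step 5:
#   wait_for_health 180 5 curl -fsS -o /dev/null http://localhost:8001/health
```

Switching NPM only after this returns success keeps traffic away from a backend that has not finished starting.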
**If the automatic NPM failover fails (last resort):**
```
http://192.168.1.184:81 → Proxy Hosts → wordly.art → Edit
Forward Hostname : 192.168.1.98
→ Save
# Takes effect immediately, no restart needed
```
---
## 4. KEEPING THE NAS SSH KEY SAFE
> [!IMPORTANT]
> The key `/root/.ssh/wordly_nas_key` is **critical** for restoring from `.98`.
> Keep it in at least 2 secure places:
> - A password manager (Bitwarden, 1Password, etc.)
> - An encrypted KeePass vault on a physical medium
>
> Without this key, you cannot reach the archives on the NAS from `.98`.
---
## 5. SCRIPT REFERENCE
| Script | Purpose | Trigger |
|--------|---------|---------|
| `setup-nas.sh` | Configures SSH → NAS, generates the key, copies the scripts | **Once** (root required) |
| `backup-to-nas.sh` | pg_dump + DR archive → NAS via rsync over SSH | Cron, every 6h |
| `backup-to-nas.sh --list` | Lists the archives available on the NAS | Manual |
| `verify-backups.sh` | 8 integrity checks + Telegram | Cron, every 6h at minute 30 |
| `disaster-recovery.sh --backup` | DR archive → NAS | Included in backup-to-nas |
| `disaster-recovery.sh --restore <archive>` | Full restore | **Emergency** |
| `npm-failover.sh --target-ip <IP>` | Switches NPM to an IP | Called automatically |
| `npm-failover.sh --dry-run --target-ip <IP>` | Test without modifying NPM | Initial test |
| `install-crontab.sh` | Installs the cron jobs | **Once** |
---
## 6. LOGS
```bash
# Backup logs (on .151)
tail -f /var/log/wordly-backup.log
# Verification logs (on .151)
tail -f /var/log/wordly-verify.log
# Docker logs (on the active server)
docker compose logs -f backend
docker compose logs -f postgres
```


@@ -1,287 +1,356 @@
#!/bin/bash
# ============================================
# Wordly.art - PostgreSQL Backup to NAS
# ============================================
# CRON: Run daily at 03:00
# 0 3 * * * /opt/wordly/scripts/backup-to-nas.sh >> /var/log/wordly-backup.log 2>&1
# ==============================================================================
# Wordly.art - PostgreSQL backup to Synology NAS over SSH/rsync
# ==============================================================================
# Backs up the PostgreSQL database and the DR archive to the NAS over SSH/rsync.
# No CIFS mount: rsync over SSH writes directly to /volume1/backups/wordly.
#
# Usage:
# ./backup-to-nas.sh # Default: daily backup
# ./backup-to-nas.sh --full # Full backup with upload cleanup
# ./backup-to-nas.sh --restore FILE # Restore from specific backup
# ============================================
# CRON (installed by install-crontab.sh):
# 0 */6 * * * bash /opt/wordly/scripts/backup-to-nas.sh >> /var/log/wordly-backup.log 2>&1
#
# Usage:
# ./backup-to-nas.sh # Full backup → NAS
# ./backup-to-nas.sh --full # Same (explicit alias)
# ./backup-to-nas.sh --list # List the archives available on the NAS
# ==============================================================================
set -euo pipefail
# ===========================================
# CONFIGURATION - MODIFY THESE VALUES
# ===========================================
# NAS settings (SMB/CIFS or NFS mount point)
NAS_BACKUP_DIR="/mnt/nas-backups/wordly"
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
# Docker container name for PostgreSQL
POSTGRES_CONTAINER="wordly-postgres"
POSTGRES_USER="translate"
POSTGRES_DB="translate_db"
POSTGRES_PASSWORD="yLLgkEvt6mvzGDdoqtQvI1vEgMmR-W75ZTPW5StaIAU"
# ==============================================================================
# CHARGER LE .env
# ==============================================================================
ENV_FILE="${PROJECT_ROOT}/.env"
if [ -f "${ENV_FILE}" ]; then
set -a
source "${ENV_FILE}"
set +a
else
echo "ERROR: .env introuvable : ${ENV_FILE}" >&2
exit 1
fi
# Backup retention
DAILY_RETENTION=7 # Keep 7 daily backups
WEEKLY_RETENTION=4 # Keep 4 weekly backups
MONTHLY_RETENTION=6 # Keep 6 monthly backups
# ==============================================================================
# CONFIGURATION (depuis .env)
# ==============================================================================
# Notification (optional - leave empty to disable)
NOTIFICATION_WEBHOOK="" # Slack/Discord webhook URL
# NAS SSH
NAS_HOST="${NAS_HOST:-192.168.1.146}"
NAS_USER="${NAS_USER:-wordly-backup}"
NAS_PATH="${NAS_PATH:-/volume1/backups/wordly}"
NAS_SSH_PORT="${NAS_SSH_PORT:-22}"
NAS_SSH_KEY="${NAS_SSH_KEY:-/root/.ssh/wordly_nas_key}"
# ===========================================
# PostgreSQL
POSTGRES_CONTAINER="${POSTGRES_CONTAINER:-wordly-postgres}"
POSTGRES_USER="${POSTGRES_USER:-translate}"
POSTGRES_DB="${POSTGRES_DB:-translate_db}"
POSTGRES_PASSWORD="${POSTGRES_PASSWORD:?POSTGRES_PASSWORD doit être défini dans .env}"
# Rétention sur le NAS (nombre d'archives à garder)
DAILY_RETENTION=${DAILY_RETENTION:-7}
WEEKLY_RETENTION=${WEEKLY_RETENTION:-4}
MONTHLY_RETENTION=${MONTHLY_RETENTION:-6}
# Telegram
TELEGRAM_BOT_TOKEN="${TELEGRAM_BOT_TOKEN:-}"
TELEGRAM_CHAT_ID="${TELEGRAM_CHAT_ID:-}"
# ==============================================================================
# INTERNALS
# ===========================================
# ==============================================================================
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
DATE_ONLY=$(date +"%Y-%m-%d")
DAY_OF_WEEK=$(date +"%u") # 1=Mon, 7=Sun
DAY_OF_WEEK=$(date +"%u") # 1=Lun, 7=Dim
DAY_OF_MONTH=$(date +"%d")
BACKUP_NAME="wordly_db_${TIMESTAMP}.sql.gz"
BACKUP_PATH="${NAS_BACKUP_DIR}/${BACKUP_NAME}"
LOG_PREFIX="[Wordly Backup ${TIMESTAMP}]"
SNAPSHOT_NAME="wordly_dr_${TIMESTAMP}.tar.gz"
LOCAL_TMP="/tmp/wordly_backup_${TIMESTAMP}"
SSH_CMD="ssh -i ${NAS_SSH_KEY} -p ${NAS_SSH_PORT} -o BatchMode=yes -o ConnectTimeout=10"
RSYNC_CMD="rsync -az -e 'ssh -i ${NAS_SSH_KEY} -p ${NAS_SSH_PORT} -o BatchMode=yes -o ConnectTimeout=10'"
# Colors for terminal output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
LOG_PREFIX="[Backup ${TIMESTAMP}]"
# ===========================================
# FUNCTIONS
# ===========================================
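# Log to stderr: backup_postgres and create_dr_archive return their result on
# stdout (captured with $(...)), so log lines must not be mixed into that output.
# The cron entry redirects 2>&1, so these lines still reach the log file.
log() { echo "${LOG_PREFIX} $1" >&2; }
log_success() { echo -e "${LOG_PREFIX} ${GREEN}$1${NC}" >&2; }
log_error() { echo -e "${LOG_PREFIX} ${RED}❌ ERROR: $1${NC}" >&2; }
log_warning() { echo -e "${LOG_PREFIX} ${YELLOW}⚠️ $1${NC}" >&2; }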
log() {
echo "${LOG_PREFIX} $1"
}
log_success() {
echo -e "${LOG_PREFIX} ${GREEN}$1${NC}"
}
log_error() {
echo -e "${LOG_PREFIX} ${RED}ERROR: $1${NC}"
}
log_warning() {
echo -e "${LOG_PREFIX} ${YELLOW}WARNING: $1${NC}"
}
send_notification() {
# ==============================================================================
# TELEGRAM
# ==============================================================================
send_telegram() {
local message="$1"
if [ -n "${NOTIFICATION_WEBHOOK}" ]; then
curl -s -X POST "${NOTIFICATION_WEBHOOK}" \
-H "Content-Type: application/json" \
-d "{\"text\": \"${message}\"}" > /dev/null 2>&1 || true
if [ -n "${TELEGRAM_BOT_TOKEN}" ] && [ -n "${TELEGRAM_CHAT_ID}" ]; then
curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
-d "chat_id=${TELEGRAM_CHAT_ID}" \
--data-urlencode "text=${message}" \
-d "parse_mode=Markdown" \
>/dev/null 2>&1 || true
fi
}
# ==============================================================================
# PRÉREQUIS
# ==============================================================================
check_prerequisites() {
# Check NAS mount
if [ ! -d "${NAS_BACKUP_DIR}" ]; then
log_error "NAS backup directory not found: ${NAS_BACKUP_DIR}"
log "Attempting to mount NAS..."
log "Vérification des prérequis..."
# Try to mount if configured via /etc/fstab
mount "${NAS_BACKUP_DIR}" 2>/dev/null || true
if [ ! -d "${NAS_BACKUP_DIR}" ]; then
log_error "Cannot mount NAS. Aborting."
send_notification "Wordly Backup FAILED: NAS not mounted at ${NAS_BACKUP_DIR}"
exit 1
fi
fi
# Check Docker
if ! docker ps --format '{{.Names}}' | grep -q "${POSTGRES_CONTAINER}"; then
log_error "PostgreSQL container '${POSTGRES_CONTAINER}' is not running."
send_notification "Wordly Backup FAILED: PostgreSQL container not running"
# Clé SSH
if [ ! -f "${NAS_SSH_KEY}" ]; then
log_error "Clé SSH introuvable : ${NAS_SSH_KEY}"
log_error "Lancez d'abord : sudo bash scripts/setup-nas.sh"
exit 1
fi
log_success "Prerequisites OK"
}
create_backup() {
log "Starting backup of '${POSTGRES_DB}'..."
# Create backup directory structure
mkdir -p "${NAS_BACKUP_DIR}/daily"
mkdir -p "${NAS_BACKUP_DIR}/weekly"
mkdir -p "${NAS_BACKUP_DIR}/monthly"
# Run pg_dump inside Docker container
docker exec "${POSTGRES_CONTAINER}" pg_dump \
-U "${POSTGRES_USER}" \
-d "${POSTGRES_DB}" \
--format=custom \
--compress=9 \
--no-owner \
--no-acl \
2>/dev/null | gzip > "${NAS_BACKUP_DIR}/daily/${BACKUP_NAME}"
local backup_size=$(du -h "${NAS_BACKUP_DIR}/daily/${BACKUP_NAME}" | cut -f1)
if [ -f "${NAS_BACKUP_DIR}/daily/${BACKUP_NAME}" ]; then
log_success "Backup created: ${BACKUP_NAME} (${backup_size})"
# Copy to weekly/monthly if applicable
if [ "${DAY_OF_WEEK}" = "7" ]; then
cp "${NAS_BACKUP_DIR}/daily/${BACKUP_NAME}" "${NAS_BACKUP_DIR}/weekly/"
log "Weekly backup copied"
fi
if [ "${DAY_OF_MONTH}" = "01" ]; then
cp "${NAS_BACKUP_DIR}/daily/${BACKUP_NAME}" "${NAS_BACKUP_DIR}/monthly/"
log "Monthly backup copied"
fi
send_notification "Wordly Backup SUCCESS: ${BACKUP_NAME} (${backup_size})"
else
log_error "Backup file was not created!"
send_notification "Wordly Backup FAILED: pg_dump produced no output"
# Connectivité SSH vers le NAS
if ! ${SSH_CMD} "${NAS_USER}@${NAS_HOST}" "echo OK" >/dev/null 2>&1; then
log_error "Impossible de se connecter au NAS ${NAS_HOST} via SSH."
log_error "Vérifiez : ssh -i ${NAS_SSH_KEY} ${NAS_USER}@${NAS_HOST}"
send_telegram "🚨 *Wordly Backup ÉCHOUÉ*
NAS inaccessible : ${NAS_HOST}
Date : $(date '+%Y-%m-%d %H:%M:%S')"
exit 1
fi
log_success "NAS SSH : OK"
# Docker + container PostgreSQL
if ! docker ps --format '{{.Names}}' 2>/dev/null | grep -q "^${POSTGRES_CONTAINER}$"; then
log_error "Container PostgreSQL '${POSTGRES_CONTAINER}' n'est pas en cours d'exécution !"
send_telegram "🚨 *Wordly Backup ÉCHOUÉ*
PostgreSQL container non trouvé
Date : $(date '+%Y-%m-%d %H:%M:%S')"
exit 1
fi
log_success "PostgreSQL container : OK"
}
cleanup_old_backups() {
log "Cleaning up old backups..."
# ==============================================================================
# BACKUP POSTGRESQL
# ==============================================================================
backup_postgres() {
log "Dump PostgreSQL de '${POSTGRES_DB}'..."
mkdir -p "${LOCAL_TMP}"
# Daily: keep last N days
local daily_count=$(ls -1 "${NAS_BACKUP_DIR}/daily/" 2>/dev/null | wc -l)
if [ "${daily_count}" -gt "${DAILY_RETENTION}" ]; then
ls -1t "${NAS_BACKUP_DIR}/daily/" | tail -n +$((DAILY_RETENTION + 1)) | while read -r f; do
rm -f "${NAS_BACKUP_DIR}/daily/${f}"
log " Deleted daily: ${f}"
done
fi
local dump_file="${LOCAL_TMP}/db_${TIMESTAMP}.dump.gz"
# Weekly: keep last N weeks
local weekly_count=$(ls -1 "${NAS_BACKUP_DIR}/weekly/" 2>/dev/null | wc -l)
if [ "${weekly_count}" -gt "${WEEKLY_RETENTION}" ]; then
ls -1t "${NAS_BACKUP_DIR}/weekly/" | tail -n +$((WEEKLY_RETENTION + 1)) | while read -r f; do
rm -f "${NAS_BACKUP_DIR}/weekly/${f}"
log " Deleted weekly: ${f}"
done
fi
# Monthly: keep last N months
local monthly_count=$(ls -1 "${NAS_BACKUP_DIR}/monthly/" 2>/dev/null | wc -l)
if [ "${monthly_count}" -gt "${MONTHLY_RETENTION}" ]; then
ls -1t "${NAS_BACKUP_DIR}/monthly/" | tail -n +$((MONTHLY_RETENTION + 1)) | while read -r f; do
rm -f "${NAS_BACKUP_DIR}/monthly/${f}"
log " Deleted monthly: ${f}"
done
fi
log_success "Cleanup done"
}
verify_backup() {
log "Verifying backup integrity..."
if gzip -t "${NAS_BACKUP_DIR}/daily/${BACKUP_NAME}" 2>/dev/null; then
log_success "Backup integrity OK"
else
log_error "Backup integrity check FAILED!"
send_notification "Wordly Backup WARNING: Integrity check failed for ${BACKUP_NAME}"
# Don't delete - let admin investigate
fi
}
restore_backup() {
local backup_file="$1"
if [ -z "${backup_file}" ]; then
log_error "Usage: $0 --restore <backup_file>"
echo ""
echo "Available backups:"
echo "=== Daily ==="
ls -lht "${NAS_BACKUP_DIR}/daily/" 2>/dev/null || echo " (none)"
echo "=== Weekly ==="
ls -lht "${NAS_BACKUP_DIR}/weekly/" 2>/dev/null || echo " (none)"
echo "=== Monthly ==="
ls -lht "${NAS_BACKUP_DIR}/monthly/" 2>/dev/null || echo " (none)"
if ! docker exec \
-e PGPASSWORD="${POSTGRES_PASSWORD}" \
"${POSTGRES_CONTAINER}" \
pg_dump \
-U "${POSTGRES_USER}" \
-d "${POSTGRES_DB}" \
--format=custom \
--no-owner \
--no-acl \
2>/dev/null | gzip > "${dump_file}"; then
log_error "pg_dump a échoué !"
send_telegram "🚨 *Wordly Backup ÉCHOUÉ*
pg_dump error sur ${POSTGRES_DB}
Date : $(date '+%Y-%m-%d %H:%M:%S')"
rm -rf "${LOCAL_TMP}"
exit 1
fi
# Find the file
local full_path=""
for dir in daily weekly monthly; do
if [ -f "${NAS_BACKUP_DIR}/${dir}/${backup_file}" ]; then
full_path="${NAS_BACKUP_DIR}/${dir}/${backup_file}"
break
fi
done
# Vérification taille
local size_bytes
size_bytes=$(stat -c %s "${dump_file}" 2>/dev/null || stat -f %z "${dump_file}")
local min_bytes=$((1024 * 1024)) # 1MB minimum
if [ -z "${full_path}" ]; then
log_error "Backup file not found: ${backup_file}"
if [ "${size_bytes}" -lt "${min_bytes}" ]; then
log_error "Dump trop petit ($(numfmt --to=iec ${size_bytes})) — base de données vide ?"
send_telegram "🚨 *Wordly Backup ÉCHOUÉ*
Dump PostgreSQL trop petit : $(numfmt --to=iec ${size_bytes})
Date : $(date '+%Y-%m-%d %H:%M:%S')"
rm -rf "${LOCAL_TMP}"
exit 1
fi
echo ""
log_warning "RESTORE MODE - This will OVERWRITE the current database!"
echo " File: ${full_path}"
echo " Database: ${POSTGRES_DB}"
echo ""
read -p "Are you sure? Type 'YES' to confirm: " confirm
if [ "${confirm}" != "YES" ]; then
log "Restore cancelled."
exit 0
fi
log "Restoring from ${full_path}..."
# Create a safety backup first
log "Creating safety backup before restore..."
SAFETY_NAME="wordly_db_pre_restore_${TIMESTAMP}.sql.gz"
docker exec "${POSTGRES_CONTAINER}" pg_dump \
-U "${POSTGRES_USER}" \
-d "${POSTGRES_DB}" \
--format=custom \
--compress=9 \
2>/dev/null | gzip > "${NAS_BACKUP_DIR}/daily/${SAFETY_NAME}"
log "Safety backup: ${SAFETY_NAME}"
# Restore
gunzip -c "${full_path}" | docker exec -i "${POSTGRES_CONTAINER}" \
pg_restore \
-U "${POSTGRES_USER}" \
-d "${POSTGRES_DB}" \
--clean \
--if-exists \
--no-owner \
--no-acl \
2>/dev/null || true
log_success "Restore completed!"
log_warning "Restart backend: docker restart wordly-backend"
log_success "Dump PostgreSQL : $(numfmt --to=iec ${size_bytes})"
echo "${dump_file}"
}
# ===========================================
# ==============================================================================
# CRÉER L'ARCHIVE DR (dump + .env + docker-compose + configs)
# ==============================================================================
create_dr_archive() {
local dump_file="$1"
log "Construction de l'archive DR..."
# Copier les fichiers de config
[ -f "${PROJECT_ROOT}/.env" ] && cp "${PROJECT_ROOT}/.env" "${LOCAL_TMP}/.env.production"
[ -f "${PROJECT_ROOT}/docker-compose.yml" ] && cp "${PROJECT_ROOT}/docker-compose.yml" "${LOCAL_TMP}/"
[ -d "${PROJECT_ROOT}/docker" ] && cp -r "${PROJECT_ROOT}/docker" "${LOCAL_TMP}/"
# Compresser
local archive_path="/tmp/${SNAPSHOT_NAME}"
tar -czf "${archive_path}" -C "${LOCAL_TMP}" .
rm -rf "${LOCAL_TMP}"
# Vérification intégrité
if ! gzip -t "${archive_path}" 2>/dev/null; then
log_error "Archive DR corrompue !"
rm -f "${archive_path}"
exit 1
fi
local size
size=$(du -h "${archive_path}" | cut -f1)
log_success "Archive DR créée : ${SNAPSHOT_NAME} (${size})"
echo "${archive_path}|${size}"
}
# ==============================================================================
# ENVOYER SUR LE NAS VIA SCP/rsync SSH
# ==============================================================================
push_to_nas() {
local archive_path="$1"
local size="$2"
log "Transfert vers le NAS via rsync SSH..."
log " Source : ${archive_path}"
log " Dest : ${NAS_USER}@${NAS_HOST}:${NAS_PATH}/snapshots/${SNAPSHOT_NAME}"
# Dossier quotidien/hebdo/mensuel sur le NAS
local nas_dest="${NAS_PATH}/snapshots"
# Transfer principal
if ! rsync -az \
-e "ssh -i ${NAS_SSH_KEY} -p ${NAS_SSH_PORT} -o BatchMode=yes -o ConnectTimeout=30" \
"${archive_path}" \
"${NAS_USER}@${NAS_HOST}:${nas_dest}/${SNAPSHOT_NAME}"; then
log_error "rsync vers le NAS a échoué !"
send_telegram "🚨 *Wordly Backup ÉCHOUÉ*
rsync SSH vers ${NAS_HOST} a échoué
Fichier local conservé : ${archive_path}
Date : $(date '+%Y-%m-%d %H:%M:%S')"
# Garder le fichier local comme fallback
mkdir -p "${PROJECT_ROOT}/backups/emergency"
mv "${archive_path}" "${PROJECT_ROOT}/backups/emergency/${SNAPSHOT_NAME}"
log_warning "Archive conservée localement : ${PROJECT_ROOT}/backups/emergency/${SNAPSHOT_NAME}"
exit 1
fi
log_success "Archive transférée sur le NAS : ${nas_dest}/${SNAPSHOT_NAME}"
# Copie hebdomadaire (dimanche)
if [ "${DAY_OF_WEEK}" = "7" ]; then
${SSH_CMD} "${NAS_USER}@${NAS_HOST}" \
"cp ${nas_dest}/${SNAPSHOT_NAME} ${NAS_PATH}/snapshots/weekly_${SNAPSHOT_NAME}" 2>/dev/null || true
log "Archive hebdomadaire copiée."
fi
# Copie mensuelle (1er du mois)
if [ "${DAY_OF_MONTH}" = "01" ]; then
${SSH_CMD} "${NAS_USER}@${NAS_HOST}" \
"cp ${nas_dest}/${SNAPSHOT_NAME} ${NAS_PATH}/snapshots/monthly_${SNAPSHOT_NAME}" 2>/dev/null || true
log "Archive mensuelle copiée."
fi
# Nettoyage local
rm -f "${archive_path}"
}
# ==============================================================================
# ROTATION DES ARCHIVES SUR LE NAS
# ==============================================================================
cleanup_nas() {
log "Rotation des archives sur le NAS (conservation : ${DAILY_RETENTION} jours)..."
# Supprimer les archives wordly_dr_* plus vieilles que DAILY_RETENTION
${SSH_CMD} "${NAS_USER}@${NAS_HOST}" \
"find ${NAS_PATH}/snapshots -name 'wordly_dr_*.tar.gz' -mtime +${DAILY_RETENTION} -delete 2>/dev/null; \
find ${NAS_PATH}/snapshots -name 'weekly_*.tar.gz' | sort -r | tail -n +$((WEEKLY_RETENTION + 1)) | xargs rm -f 2>/dev/null; \
find ${NAS_PATH}/snapshots -name 'monthly_*.tar.gz' | sort -r | tail -n +$((MONTHLY_RETENTION + 1)) | xargs rm -f 2>/dev/null; \
echo OK" | grep -q "OK"
log_success "Rotation des archives OK"
}
# ==============================================================================
# SYNCHRONISER LES SCRIPTS SUR LE NAS (pour restauration depuis .98)
# ==============================================================================
sync_scripts() {
rsync -az \
-e "ssh -i ${NAS_SSH_KEY} -p ${NAS_SSH_PORT} -o BatchMode=yes" \
--exclude="__pycache__" \
--exclude="*.pyc" \
"${SCRIPT_DIR}/" \
"${NAS_USER}@${NAS_HOST}:${NAS_PATH}/scripts/" 2>/dev/null || true
}
# ==============================================================================
# LISTER LES ARCHIVES DISPONIBLES
# ==============================================================================
list_archives() {
log "Archives disponibles sur le NAS :"
${SSH_CMD} "${NAS_USER}@${NAS_HOST}" \
"ls -lht ${NAS_PATH}/snapshots/wordly_dr_*.tar.gz 2>/dev/null || echo '(aucune archive)'"
}
# ==============================================================================
# MAIN
# ===========================================
# ==============================================================================
main() {
case "${1:-}" in
--list)
ENV_FILE="${PROJECT_ROOT}/.env"
[ -f "${ENV_FILE}" ] && { set -a; source "${ENV_FILE}"; set +a; }
list_archives
exit 0
;;
--full|*)
;;
esac
case "${1:-}" in
--restore)
restore_backup "${2:-}"
;;
--full)
check_prerequisites
create_backup
verify_backup
cleanup_old_backups
log_success "Full backup cycle complete!"
;;
*)
check_prerequisites
create_backup
verify_backup
cleanup_old_backups
log_success "Backup complete!"
;;
esac
echo ""
echo "================================================================="
echo " Wordly.art — Backup → NAS Synology 192.168.1.146"
echo " DB : ${POSTGRES_DB}"
echo " NAS : ${NAS_USER}@${NAS_HOST}:${NAS_PATH}"
echo " $(date '+%Y-%m-%d %H:%M:%S')"
echo "================================================================="
echo ""
check_prerequisites
# 1. Dump PostgreSQL
local dump_file
dump_file=$(backup_postgres)
# 2. Créer l'archive DR
local archive_info
archive_info=$(create_dr_archive "${dump_file}")
local archive_path="${archive_info%%|*}"
local archive_size="${archive_info##*|}"
# 3. Envoyer sur le NAS via rsync SSH
push_to_nas "${archive_path}" "${archive_size}"
# 4. Rotation
cleanup_nas
# 5. Sync scripts
sync_scripts
# 6. Notification Telegram
send_telegram "✅ *Wordly.art Backup OK*
Archive : \`${SNAPSHOT_NAME}\`
Taille : ${archive_size}
NAS : \`${NAS_PATH}/snapshots/\`
Date : $(date '+%Y-%m-%d %H:%M:%S')"
echo ""
log_success "================================================================="
log_success "Backup complet terminé !"
log_success " Archive : ${NAS_PATH}/snapshots/${SNAPSHOT_NAME}"
log_success " Lister : bash scripts/backup-to-nas.sh --list"
log_success "================================================================="
echo ""
}
main "$@"
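
The weekly and monthly rotation in the cleanup step keeps the newest N archives by sorting names in reverse order and skipping the first N lines with `tail -n +$((N + 1))`. A minimal sketch of that idea, assuming archive names embed a sortable timestamp (the `weekly_YYYYMMDD.tar.gz` pattern and the `prunable_archives` helper are illustrative, not taken from the script):

```shell
#!/bin/bash
# Sketch: read archive names on stdin, print the ones that fall outside
# the "keep the newest N" window (candidates for deletion).
# prunable_archives is a hypothetical helper name.
prunable_archives() {
  local keep="$1"
  # Newest first, then skip the first $keep lines; what remains is prunable.
  sort -r | tail -n +"$((keep + 1))"
}

printf '%s\n' weekly_20260517.tar.gz weekly_20260531.tar.gz weekly_20260524.tar.gz \
  | prunable_archives 2
```

With a retention of 2, only the oldest of the three names is printed. When fewer archives exist than the retention count, nothing is printed and nothing would be removed.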

View File

@@ -1,9 +1,16 @@
#!/bin/bash
# ==============================================================================
# Wordly.art - Disaster Recovery (DR) Backup & Restore Playbook (V2)
# Wordly.art - Disaster Recovery (DR) Backup & Restore Playbook (V3)
# ==============================================================================
# Packages app configs (.env, docker-compose), database backups, and NPM
# configs, and exports them to LOCAL, NAS, or remote SCP storage.
# Archives app configs (.env, docker-compose), database backup, and exports
# to the NAS at 192.168.1.146.
#
# On RESTORE: deploys app on the new server and automatically updates NPM
# (192.168.1.184) to reroute traffic via API — no manual intervention needed.
#
# Usage:
# ./disaster-recovery.sh --backup # Create DR archive → NAS
# ./disaster-recovery.sh --restore <archive> # Restore on THIS machine
# ==============================================================================
set -euo pipefail
@@ -31,49 +38,54 @@ if [ -f "${ENV_FILE}" ]; then
set +a
fi
# Config Defaults & Type Resolution
BACKUP_DEST_TYPE="${BACKUP_DEST_TYPE:-LOCAL}" # LOCAL, NAS, SCP
BACKUP_DEST_PATH="${BACKUP_DEST_PATH:-${PROJECT_ROOT}/backups}"
DR_RETENTION_DAYS=${DR_RETENTION_DAYS:-14}
# NAS SSH (même config que backup-to-nas.sh)
NAS_HOST="${NAS_HOST:-192.168.1.146}"
NAS_USER="${NAS_USER:-wordly-backup}"
NAS_PATH="${NAS_PATH:-/volume1/backups/wordly}"
NAS_SSH_PORT="${NAS_SSH_PORT:-22}"
NAS_SSH_KEY="${NAS_SSH_KEY:-/root/.ssh/wordly_nas_key}"
BACKUP_DEST_PATH="${NAS_PATH}/snapshots"
DR_RETENTION_DAYS=${DR_RETENTION_DAYS:-30}
# SCP Configuration
SCP_HOST="${SCP_HOST:-}"
SCP_USER="${SCP_USER:-}"
SCP_KEY_PATH="${SCP_KEY_PATH:-~/.ssh/id_rsa}"
SCP_PORT="${SCP_PORT:-22}"
SCP_DEST_PATH="${SCP_DEST_PATH:-/var/backups/wordly}"
# IP of THIS server (used during restore to configure NPM failover)
SERVER_IP="${SERVER_IP:-}"
# NPM Configuration directories
NPM_DATA_DIR="${NPM_DATA_DIR:-}"
NPM_LETSENCRYPT_DIR="${NPM_LETSENCRYPT_DIR:-}"
# Telegram
TELEGRAM_BOT_TOKEN="${TELEGRAM_BOT_TOKEN:-}"
TELEGRAM_CHAT_ID="${TELEGRAM_CHAT_ID:-}"
# ==============================================================================
# DESTINATION PREPARATION
# SEND TELEGRAM NOTIFICATION
# ==============================================================================
send_telegram() {
local message="$1"
if [ -n "${TELEGRAM_BOT_TOKEN}" ] && [ -n "${TELEGRAM_CHAT_ID}" ]; then
curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
-d "chat_id=${TELEGRAM_CHAT_ID}" \
-d "text=${message}" \
-d "parse_mode=Markdown" \
>/dev/null 2>&1 || true
fi
}
# ==============================================================================
# DESTINATION PREPARATION (backup mode)
# ==============================================================================
prepare_destination() {
if [ "${BACKUP_DEST_TYPE}" = "NAS" ] || [ "${BACKUP_DEST_TYPE}" = "LOCAL" ]; then
if [ ! -d "${BACKUP_DEST_PATH}" ]; then
mkdir -p "${BACKUP_DEST_PATH}" 2>/dev/null || true
fi
if [ ! -w "${BACKUP_DEST_PATH}" ]; then
log_warning "Backup destination path '${BACKUP_DEST_PATH}' is not writable. Falling back to local backups."
BACKUP_DEST_PATH="${PROJECT_ROOT}/backups"
BACKUP_DEST_TYPE="LOCAL"
mkdir -p "${BACKUP_DEST_PATH}/dr"
fi
DR_LOCAL_DIR="${BACKUP_DEST_PATH}/dr"
mkdir -p "${DR_LOCAL_DIR}"
elif [ "${BACKUP_DEST_TYPE}" = "SCP" ]; then
if [ -z "${SCP_HOST}" ] || [ -z "${SCP_USER}" ]; then
log_error "SCP backup selected but SCP_HOST or SCP_USER is not configured in .env."
log_warning "Falling back to LOCAL backup directory."
BACKUP_DEST_TYPE="LOCAL"
BACKUP_DEST_PATH="${PROJECT_ROOT}/backups"
DR_LOCAL_DIR="${BACKUP_DEST_PATH}/dr"
mkdir -p "${DR_LOCAL_DIR}"
fi
local ssh_cmd="ssh -i ${NAS_SSH_KEY} -p ${NAS_SSH_PORT} -o BatchMode=yes -o ConnectTimeout=10"
log "Vérification de la connectivité SSH vers le NAS ${NAS_HOST}..."
if ! ${ssh_cmd} "${NAS_USER}@${NAS_HOST}" "echo OK" >/dev/null 2>&1; then
log_error "Impossible de joindre le NAS ${NAS_HOST} via SSH."
log_error "Lancez d'abord : sudo bash scripts/setup-nas.sh"
exit 1
fi
# S'assurer que le dossier snapshots existe sur le NAS
${ssh_cmd} "${NAS_USER}@${NAS_HOST}" \
"mkdir -p ${NAS_PATH}/snapshots" 2>/dev/null || true
log_success "NAS SSH OK — Destination : ${NAS_USER}@${NAS_HOST}:${NAS_PATH}/snapshots"
}
# ==============================================================================
@@ -127,22 +139,11 @@ perform_backup() {
mkdir -p "${packing_dir}/db_backup"
cp "${latest_db_backup}" "${packing_dir}/db_backup/"
# 5. Pack Nginx Proxy Manager (NPM) configs if configured
local has_npm_data=false
if [ -n "${NPM_DATA_DIR}" ] && [ -d "${NPM_DATA_DIR}" ]; then
log "Packaging Nginx Proxy Manager /data directory..."
cp -r "${NPM_DATA_DIR}" "${packing_dir}/npm_data"
has_npm_data=true
fi
if [ -n "${NPM_LETSENCRYPT_DIR}" ] && [ -d "${NPM_LETSENCRYPT_DIR}" ]; then
log "Packaging Nginx Proxy Manager /etc/letsencrypt directory..."
cp -r "${NPM_LETSENCRYPT_DIR}" "${packing_dir}/npm_letsencrypt"
has_npm_data=true
fi
if [ "${has_npm_data}" = "false" ]; then
log_warning "NPM directories (NPM_DATA_DIR / NPM_LETSENCRYPT_DIR) not configured or not found. Skipping NPM config packaging."
fi
# 5. Note: NPM config is NOT backed up here.
# NPM runs on its own dedicated server (192.168.1.184) and is stable.
# Only the forward_host IP needs to change during failover, which is
# done automatically via the NPM API by npm-failover.sh during restore.
log "NPM is on dedicated server 192.168.1.184 — no NPM config to backup."
# 6. Compress DR Archive
local dr_archive_name="wordly_dr_${TIMESTAMP}.tar.gz"
@@ -160,44 +161,47 @@ perform_backup() {
local size
size=$(du -h "${local_archive_path}" | cut -f1)
# 7. Route to Destination
if [ "${BACKUP_DEST_TYPE}" = "LOCAL" ] || [ "${BACKUP_DEST_TYPE}" = "NAS" ]; then
local dest_path="${DR_LOCAL_DIR}/${dr_archive_name}"
mv "${local_archive_path}" "${dest_path}"
log_success "DR archive created successfully (${size}) at: ${dest_path}"
# Retention
log "Applying retention policy (pruning files older than ${DR_RETENTION_DAYS} days)..."
find "${DR_LOCAL_DIR}" -name "wordly_dr_*.tar.gz" -mtime +"${DR_RETENTION_DAYS}" -exec rm -f {} \;
elif [ "${BACKUP_DEST_TYPE}" = "SCP" ]; then
log "Transferring DR archive to remote server via SCP (${SCP_USER}@${SCP_HOST}:${SCP_PORT})..."
# Test connection & Create remote directory if not exists
if ! ssh -p "${SCP_PORT}" -i "${SCP_KEY_PATH}" -o ConnectTimeout=5 -o StrictHostKeyChecking=no "${SCP_USER}@${SCP_HOST}" "mkdir -p ${SCP_DEST_PATH}" 2>/dev/null; then
log_error "SSH connection to ${SCP_USER}@${SCP_HOST} failed. Saving archive locally instead."
mkdir -p "${PROJECT_ROOT}/backups/dr"
mv "${local_archive_path}" "${PROJECT_ROOT}/backups/dr/${dr_archive_name}"
log_warning "DR backup saved locally at: ${PROJECT_ROOT}/backups/dr/${dr_archive_name}"
exit 1
fi
# 7. Envoyer l'archive sur le NAS via rsync SSH
local ssh_cmd="ssh -i ${NAS_SSH_KEY} -p ${NAS_SSH_PORT} -o BatchMode=yes -o ConnectTimeout=30"
local dest_path="${BACKUP_DEST_PATH}/${dr_archive_name}"
# SCP copy
if scp -P "${SCP_PORT}" -i "${SCP_KEY_PATH}" -o StrictHostKeyChecking=no "${local_archive_path}" "${SCP_USER}@${SCP_HOST}:${SCP_DEST_PATH}/${dr_archive_name}"; then
log_success "DR archive transferred successfully to ${SCP_USER}@${SCP_HOST}:${SCP_DEST_PATH}/${dr_archive_name}"
rm -f "${local_archive_path}"
# Remote retention prune
log "Applying remote retention policy on backup server..."
ssh -p "${SCP_PORT}" -i "${SCP_KEY_PATH}" -o StrictHostKeyChecking=no "${SCP_USER}@${SCP_HOST}" \
"find ${SCP_DEST_PATH} -name 'wordly_dr_*.tar.gz' -mtime +${DR_RETENTION_DAYS} -exec rm -f {} \;" || true
else
log_error "SCP file transfer failed. Retaining local backup."
mkdir -p "${PROJECT_ROOT}/backups/dr"
mv "${local_archive_path}" "${PROJECT_ROOT}/backups/dr/${dr_archive_name}"
fi
log "Transfert de l'archive DR vers le NAS via rsync SSH..."
if ! rsync -az \
-e "ssh -i ${NAS_SSH_KEY} -p ${NAS_SSH_PORT} -o BatchMode=yes -o ConnectTimeout=30" \
"${local_archive_path}" \
"${NAS_USER}@${NAS_HOST}:${BACKUP_DEST_PATH}/${dr_archive_name}"; then
log_error "rsync SSH vers le NAS a échoué !"
log_warning "Archive conservée localement : ${local_archive_path}"
send_telegram "🚨 *Wordly DR Backup FAILED*
rsync NAS échoué : ${NAS_HOST}
Fichier local : ${local_archive_path}
Date: $(date '+%Y-%m-%d %H:%M:%S')"
exit 1
fi
rm -f "${local_archive_path}"
log_success "Archive DR transférée (${size}) → ${NAS_USER}@${NAS_HOST}:${dest_path}"
# Retention policy sur le NAS
log "Rotation des archives (>${DR_RETENTION_DAYS} jours) sur le NAS..."
${ssh_cmd} "${NAS_USER}@${NAS_HOST}" \
"find ${BACKUP_DEST_PATH} -name 'wordly_dr_*.tar.gz' -mtime +${DR_RETENTION_DAYS} -delete 2>/dev/null; echo OK" | grep -q "OK" || true
# Sync scripts
if command -v rsync &>/dev/null; then
rsync -az \
-e "ssh -i ${NAS_SSH_KEY} -p ${NAS_SSH_PORT} -o BatchMode=yes" \
--exclude="__pycache__" \
"${SCRIPT_DIR}/" \
"${NAS_USER}@${NAS_HOST}:${NAS_PATH}/scripts/" 2>/dev/null || true
fi
send_telegram "✅ *Wordly.art DR Backup OK*
Archive: \`${dr_archive_name}\`
Taille: ${size}
NAS: \`${dest_path}\`
Date: $(date '+%Y-%m-%d %H:%M:%S')"
log_success "Disaster Recovery backup complete."
}
@@ -250,22 +254,7 @@ perform_restore() {
source "${PROJECT_ROOT}/.env"
set +a
# Restore NPM configs to their target directories if present in the package
if [ -d "${PROJECT_ROOT}/npm_data" ] && [ -n "${NPM_DATA_DIR}" ]; then
log "Restoring NPM /data directory..."
mkdir -p "$(dirname "${NPM_DATA_DIR}")"
rm -rf "${NPM_DATA_DIR}"
mv "${PROJECT_ROOT}/npm_data" "${NPM_DATA_DIR}"
fi
if [ -d "${PROJECT_ROOT}/npm_letsencrypt" ] && [ -n "${NPM_LETSENCRYPT_DIR}" ]; then
log "Restoring NPM /etc/letsencrypt directory..."
mkdir -p "$(dirname "${NPM_LETSENCRYPT_DIR}")"
rm -rf "${NPM_LETSENCRYPT_DIR}"
mv "${PROJECT_ROOT}/npm_letsencrypt" "${NPM_LETSENCRYPT_DIR}"
fi
log_success "Docker configurations, env keys, and NPM configurations restored."
log_success "Docker configurations and env keys restored."
# Boot Docker Compose Services
log "Spinning up Docker containers (database, redis, backend, frontend, NPM if configured)..."
@@ -328,10 +317,63 @@ perform_restore() {
log "Restarting application backend..."
${compose_cmd} restart backend
# HTTP Health check (wait up to 3 minutes)
log "Waiting for application health check (max 180s)..."
local app_url="http://localhost:8001/health"
local health_ok=false
for i in $(seq 1 36); do
local http_code
# curl's -w already prints 000 when the connection fails, so only swallow the exit code
http_code=$(curl -s -o /dev/null -w "%{http_code}" --connect-timeout 3 --max-time 5 "${app_url}" 2>/dev/null || true)
if [ "${http_code}" = "200" ]; then
health_ok=true
log_success "App is healthy (HTTP 200) after $((i * 5))s"
break
fi
echo " Health check attempt ${i}/36... (HTTP ${http_code})"
sleep 5
done
if [ "${health_ok}" = "false" ]; then
log_error "App did NOT become healthy within 180s!"
log_error "NPM failover will NOT be triggered automatically."
log_error "Investigate: docker compose logs backend"
send_telegram "🚨 *Wordly.art DR FAILED — App unhealthy*
Serveur: \`$(hostname -I | awk '{print $1}')\`
Date: $(date '+%Y-%m-%d %H:%M:%S')
Action: vérifiez les logs Docker"
exit 1
fi
# ==============================================================================
# NPM AUTOMATIC FAILOVER
# ==============================================================================
log "App is healthy. Triggering NPM failover..."
local this_server_ip
this_server_ip="${SERVER_IP:-$(hostname -I | awk '{print $1}')}"
if bash "${SCRIPT_DIR}/npm-failover.sh" --target-ip "${this_server_ip}"; then
log_success "NPM now routes traffic to this server (${this_server_ip})"
send_telegram "✅ *Wordly.art DR COMPLET*
Serveur actif: \`${this_server_ip}\`
NPM redirigé automatiquement
Date: $(date '+%Y-%m-%d %H:%M:%S')"
else
log_error "NPM failover script FAILED."
log_warning "Manual failover required:"
log_warning " → Go to http://192.168.1.184:81"
log_warning " → Edit proxy host for ${NPM_PROXY_HOST_DOMAIN:-wordly.art}"
log_warning " → Change Forward Hostname to: ${this_server_ip}"
send_telegram "⚠️ *Wordly.art DR — NPM manuel requis*
App OK sur: \`${this_server_ip}\`
NPM failover automatique a échoué
Action: http://192.168.1.184:81 → modifier Forward Host"
fi
log_success "=========================================================================="
log_success "DISASTER RECOVERY SYSTEM RESTORE COMPLETE!"
log_success "=========================================================================="
log "Your application and reverse-proxy routes are restored."
log_success " App: http://${this_server_ip}:8001/health"
log_success " NPM: http://192.168.1.184:81"
echo ""
}
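
The restore waits for the application by polling `/health` every 5 seconds, up to 36 times, and only triggers the NPM failover after an HTTP 200. The same pattern can be sketched as a small reusable function. The names `wait_until_healthy` and `probe_http` are hypothetical and not part of the committed script, and the probe is passed in as a command so the loop can be exercised without a running backend:

```shell
#!/bin/bash
# Sketch: poll a probe command until it prints 200 or the attempts run out.
# wait_until_healthy and probe_http are hypothetical names.
wait_until_healthy() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1 code
  while [ "${i}" -le "${attempts}" ]; do
    # The probe prints an HTTP status; a failing probe simply yields no 200.
    code="$("$@" 2>/dev/null || true)"
    if [ "${code}" = "200" ]; then
      echo "healthy after attempt ${i}"
      return 0
    fi
    sleep "${delay}"
    i=$((i + 1))
  done
  echo "unhealthy after ${attempts} attempts" >&2
  return 1
}

# Probe using the same curl flags as the restore script.
probe_http() {
  curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3 --max-time 5 "$1"
}

# Usage (36 attempts x 5 s = 180 s):
#   wait_until_healthy 36 5 probe_http "http://localhost:8001/health"
```

Returning a non-zero status instead of calling `exit` inside the helper leaves the decision (notify, abort, skip failover) to the caller.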

View File

@@ -0,0 +1,73 @@
#!/bin/bash
# ==============================================================================
# Wordly.art - Install Backup Crontab
# ==============================================================================
# Run ONCE to install all scheduled backup tasks.
#
# Usage:
# bash scripts/install-crontab.sh
# ==============================================================================
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
CRONTAB_FILE="${SCRIPT_DIR}/crontab.wordly"
LOG_DIR="/var/log"
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
log() { echo "[Crontab] $1"; }
log_success() { echo -e "[Crontab] ${GREEN}$1${NC}"; }
log_warning() { echo -e "[Crontab] ${YELLOW}⚠️ $1${NC}"; }
# ==============================================================================
# 1. Create the crontab file
# ==============================================================================
cat > "${CRONTAB_FILE}" <<EOF
# ==============================================================================
# Wordly.art — Backup & Disaster Recovery Crontab
# Installed by: bash scripts/install-crontab.sh
# ==============================================================================
# Backup to NAS every 6 hours (00:00, 06:00, 12:00, 18:00)
0 */6 * * * bash ${SCRIPT_DIR}/backup-to-nas.sh >> ${LOG_DIR}/wordly-backup.log 2>&1
# Verify backup integrity 30 minutes after each backup
30 */6 * * * bash ${SCRIPT_DIR}/verify-backups.sh >> ${LOG_DIR}/wordly-verify.log 2>&1
# Rotate logs weekly (keep last 30 days)
0 4 * * 0 find ${LOG_DIR} -name "wordly-*.log" -mtime +30 -delete 2>/dev/null || true
EOF
log_success "Crontab file created: ${CRONTAB_FILE}"
# ==============================================================================
# 2. Install crontab for current user
# ==============================================================================
log "Installing crontab..."
# Preserve existing crontab (if any), append new entries
EXISTING_CRON=$(crontab -l 2>/dev/null || true)
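# Remove entries from a previous run (to avoid duplicates on re-run):
# first every line identical to one in the generated file (this also catches
# its comment and separator lines), then any remaining lines that mention the scripts.
EXISTING_CRON_CLEAN=$(echo "${EXISTING_CRON}" | grep -vxFf "${CRONTAB_FILE}" | grep -v "wordly" | grep -v "backup-to-nas" | grep -v "verify-backups" || true)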
# Combine
NEW_CRON=$(printf "%s\n%s\n" "${EXISTING_CRON_CLEAN}" "$(cat "${CRONTAB_FILE}")")
echo "${NEW_CRON}" | crontab -
log_success "Crontab installed!"
echo ""
log "Current crontab:"
crontab -l | grep -E "wordly|backup|verify" | sed 's/^/ /'
echo ""
log_success "Scheduled jobs:"
log_success " Every 6h (00:00/06:00/12:00/18:00) → backup-to-nas.sh"
log_success " Every 6h+30min → verify-backups.sh"
log_success " Every Sunday at 04:00 → log rotation"
echo ""
log_warning "Logs will be written to:"
log_warning " ${LOG_DIR}/wordly-backup.log"
log_warning " ${LOG_DIR}/wordly-verify.log"
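
Re-running the installer should leave exactly one copy of the managed entries in the crontab. One way to get that property independently of the wording of individual lines is to wrap the managed entries in marker comments and delete everything between the markers before appending the fresh block. The sketch below is an assumption-laden illustration: the marker strings and the `merge_managed_block` helper are hypothetical and do not exist in `install-crontab.sh`.

```shell
#!/bin/bash
# Sketch: idempotent merge of a managed block into existing crontab text.
# Marker strings and function name are hypothetical.
merge_managed_block() {
  # $1 = existing crontab text, $2 = managed block including both marker lines
  local begin='# >>> demo BEGIN' end='# <<< demo END'
  if [ -n "$1" ]; then
    # Drop any previously installed block, keep everything else untouched.
    printf '%s\n' "$1" | sed "/^${begin}/,/^${end}/d"
  fi
  printf '%s\n' "$2"
}

# Usage with the real crontab (not executed here):
#   merge_managed_block "$(crontab -l 2>/dev/null || true)" "$(cat crontab.wordly)" | crontab -
```

Applying the function twice with the same block yields the same text as applying it once, which is the behaviour a re-runnable installer needs.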

325
scripts/npm-failover.sh Normal file
View File

@@ -0,0 +1,325 @@
#!/bin/bash
# ==============================================================================
# Wordly.art - NPM Failover via API
# ==============================================================================
# Automatically updates Nginx Proxy Manager's forward host via its REST API.
# Called by disaster-recovery.sh after a successful health check on the new server.
#
# Usage:
# ./npm-failover.sh --target-ip 192.168.1.98 # Switch to new server
# ./npm-failover.sh --target-ip 192.168.1.151 # Rollback to original server
# ./npm-failover.sh --dry-run --target-ip 192.168.1.98 # Test without modifying NPM
# ==============================================================================
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'
# Log helpers write to stderr so that functions whose stdout is captured with
# $(...) (npm_authenticate, npm_find_proxy_host) return only their result.
log() { echo -e "[NPM-Failover ${TIMESTAMP}] $1" >&2; }
log_success() { echo -e "[NPM-Failover ${TIMESTAMP}] ${GREEN}$1${NC}" >&2; }
log_warning() { echo -e "[NPM-Failover ${TIMESTAMP}] ${YELLOW}WARNING: $1${NC}" >&2; }
log_error() { echo -e "[NPM-Failover ${TIMESTAMP}] ${RED}ERROR: $1${NC}" >&2; }
log_info() { echo -e "[NPM-Failover ${TIMESTAMP}] ${BLUE}$1${NC}" >&2; }
# ==============================================================================
# 1. LOAD CONFIGURATION FROM .env
# ==============================================================================
ENV_FILE="${PROJECT_ROOT}/.env"
if [ -f "${ENV_FILE}" ]; then
set -a
source "${ENV_FILE}"
set +a
fi
NPM_API_URL="${NPM_API_URL:-}"
NPM_ADMIN_EMAIL="${NPM_ADMIN_EMAIL:-}"
NPM_ADMIN_PASSWORD="${NPM_ADMIN_PASSWORD:-}"
NPM_PROXY_HOST_DOMAIN="${NPM_PROXY_HOST_DOMAIN:-wordly.art}"
TELEGRAM_BOT_TOKEN="${TELEGRAM_BOT_TOKEN:-}"
TELEGRAM_CHAT_ID="${TELEGRAM_CHAT_ID:-}"
# ==============================================================================
# 2. ARGUMENT PARSING
# ==============================================================================
TARGET_IP=""
DRY_RUN=false
while [[ $# -gt 0 ]]; do
case "$1" in
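--target-ip)
# Under `set -u`, a missing value would abort with an unbound-variable error
if [ $# -lt 2 ]; then
log_error "--target-ip requires a value."
exit 1
fi
TARGET_IP="$2"
shift 2
;;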
--dry-run)
DRY_RUN=true
shift
;;
*)
log_error "Unknown argument: $1"
echo "Usage: $0 --target-ip <IP> [--dry-run]"
exit 1
;;
esac
done
# ==============================================================================
# 3. VALIDATION
# ==============================================================================
validate_config() {
local errors=0
if [ -z "${TARGET_IP}" ]; then
log_error "--target-ip is required."
errors=$((errors + 1))
fi
if [ -z "${NPM_API_URL}" ]; then
log_error "NPM_API_URL is not set in .env (example: http://192.168.1.184:81/api)"
errors=$((errors + 1))
fi
if [ -z "${NPM_ADMIN_EMAIL}" ]; then
log_error "NPM_ADMIN_EMAIL is not set in .env"
errors=$((errors + 1))
fi
if [ -z "${NPM_ADMIN_PASSWORD}" ]; then
log_error "NPM_ADMIN_PASSWORD is not set in .env"
errors=$((errors + 1))
fi
if ! command -v curl &>/dev/null; then
log_error "curl is not installed. Required for NPM API calls."
errors=$((errors + 1))
fi
if ! command -v jq &>/dev/null; then
log_error "jq is not installed. Required for JSON parsing. Install: apt-get install jq"
errors=$((errors + 1))
fi
if [ "${errors}" -gt 0 ]; then
exit 1
fi
}
# ==============================================================================
# 4. TELEGRAM NOTIFICATION
# ==============================================================================
send_telegram() {
local message="$1"
if [ -n "${TELEGRAM_BOT_TOKEN}" ] && [ -n "${TELEGRAM_CHAT_ID}" ]; then
curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
-d "chat_id=${TELEGRAM_CHAT_ID}" \
-d "text=${message}" \
-d "parse_mode=Markdown" \
>/dev/null 2>&1 || true
fi
}
# ==============================================================================
# 5. NPM API AUTHENTICATION
# ==============================================================================
npm_authenticate() {
log "Authenticating with NPM API at ${NPM_API_URL}..."
local response
response=$(curl -s -w "\n%{http_code}" \
-X POST "${NPM_API_URL}/tokens" \
-H "Content-Type: application/json" \
-d "$(jq -n --arg identity "${NPM_ADMIN_EMAIL}" --arg secret "${NPM_ADMIN_PASSWORD}" '{identity: $identity, secret: $secret}')" \
--connect-timeout 10 \
--max-time 15)
local http_code
http_code=$(echo "${response}" | tail -n1)
local body
body=$(echo "${response}" | head -n-1)
if [ "${http_code}" != "200" ]; then
log_error "NPM authentication failed (HTTP ${http_code}). Check NPM_ADMIN_EMAIL and NPM_ADMIN_PASSWORD."
log_error "Response: ${body}"
return 1
fi
local token
token=$(echo "${body}" | jq -r '.token // empty')
if [ -z "${token}" ]; then
log_error "Could not extract token from NPM response."
log_error "Response: ${body}"
return 1
fi
log_success "NPM authentication successful."
echo "${token}"
}
# ==============================================================================
# 6. FIND PROXY HOST BY DOMAIN
# ==============================================================================
npm_find_proxy_host() {
local token="$1"
log "Looking up proxy host for domain: ${NPM_PROXY_HOST_DOMAIN}..."
local response
response=$(curl -s -w "\n%{http_code}" \
-X GET "${NPM_API_URL}/nginx/proxy-hosts?expand=domain_names" \
-H "Authorization: Bearer ${token}" \
--connect-timeout 10 \
--max-time 15)
local http_code
http_code=$(echo "${response}" | tail -n1)
local body
body=$(echo "${response}" | head -n-1)
if [ "${http_code}" != "200" ]; then
log_error "Failed to retrieve proxy hosts (HTTP ${http_code})"
return 1
fi
# Find the proxy host ID matching our domain
local host_id
host_id=$(echo "${body}" | jq -r \
--arg domain "${NPM_PROXY_HOST_DOMAIN}" \
'.[] | select(.domain_names[] == $domain) | .id' | head -n1)
if [ -z "${host_id}" ]; then
log_error "No proxy host found for domain '${NPM_PROXY_HOST_DOMAIN}' in NPM."
log_error "Available domains:"
echo "${body}" | jq -r '.[].domain_names[]' | sed 's/^/ - /' >&2
return 1
fi
log_success "Found proxy host ID: ${host_id} for ${NPM_PROXY_HOST_DOMAIN}"
# Also retrieve current forward_host for logging
local current_host
current_host=$(echo "${body}" | jq -r \
--arg domain "${NPM_PROXY_HOST_DOMAIN}" \
'.[] | select(.domain_names[] == $domain) | .forward_host' | head -n1)
log_info "Current forward host: ${current_host}"
echo "${host_id}|${current_host}"
}
# ==============================================================================
# 7. UPDATE PROXY HOST FORWARD IP
# ==============================================================================
npm_update_proxy_host() {
local token="$1"
local host_id="$2"
local new_ip="$3"
log "Updating proxy host ${host_id} → forward to ${new_ip}..."
# First, get the full current configuration to preserve all existing settings
local current_config
current_config=$(curl -s \
-X GET "${NPM_API_URL}/nginx/proxy-hosts/${host_id}" \
-H "Authorization: Bearer ${token}" \
--connect-timeout 10 \
--max-time 15)
# Build the update payload preserving existing config, only changing forward_host
local update_payload
update_payload=$(echo "${current_config}" | jq \
--arg new_ip "${new_ip}" \
'. + {"forward_host": $new_ip}')
if [ "${DRY_RUN}" = "true" ]; then
log_warning "[DRY RUN] Would send PUT to ${NPM_API_URL}/nginx/proxy-hosts/${host_id}"
log_warning "[DRY RUN] Payload: ${update_payload}"
log_success "[DRY RUN] NPM failover simulation complete — no changes made."
return 0
fi
local response
response=$(curl -s -w "\n%{http_code}" \
-X PUT "${NPM_API_URL}/nginx/proxy-hosts/${host_id}" \
-H "Authorization: Bearer ${token}" \
-H "Content-Type: application/json" \
-d "${update_payload}" \
--connect-timeout 10 \
--max-time 15)
local http_code
http_code=$(echo "${response}" | tail -n1)
local body
body=$(echo "${response}" | head -n-1)
if [ "${http_code}" != "200" ]; then
log_error "Failed to update proxy host (HTTP ${http_code})"
log_error "Response: ${body}"
return 1
fi
# Verify the change was applied
local confirmed_host
confirmed_host=$(echo "${body}" | jq -r '.forward_host // empty')
if [ "${confirmed_host}" != "${new_ip}" ]; then
log_error "NPM accepted the request but the forward_host is '${confirmed_host}', expected '${new_ip}'."
return 1
fi
log_success "NPM proxy host updated successfully: ${NPM_PROXY_HOST_DOMAIN} → ${new_ip}"
}
# ==============================================================================
# 8. MAIN
# ==============================================================================
main() {
echo ""
echo "========================================================="
echo " Wordly.art — NPM Failover"
echo " Target IP : ${TARGET_IP:-NOT SET}"
echo " NPM API : ${NPM_API_URL:-NOT SET}"
echo " Domain : ${NPM_PROXY_HOST_DOMAIN}"
echo " Dry Run : ${DRY_RUN}"
echo "========================================================="
echo ""
validate_config
# Step 1: Authenticate
local token
token=$(npm_authenticate)
# Step 2: Find proxy host ID and current IP
local host_info
host_info=$(npm_find_proxy_host "${token}")
local host_id="${host_info%%|*}"
local current_ip="${host_info##*|}"
if [ "${current_ip}" = "${TARGET_IP}" ]; then
log_warning "NPM already points to ${TARGET_IP}. No change needed."
exit 0
fi
# Step 3: Update forward host
npm_update_proxy_host "${token}" "${host_id}" "${TARGET_IP}"
# Step 4: Notify
if [ "${DRY_RUN}" = "false" ]; then
local msg="🔀 *Wordly.art NPM Failover*
Domaine : \`${NPM_PROXY_HOST_DOMAIN}\`
Ancien serveur : \`${current_ip}\`
Nouveau serveur : \`${TARGET_IP}\`
Heure : $(date '+%Y-%m-%d %H:%M:%S')"
send_telegram "${msg}"
log_success "Telegram notification sent."
fi
echo ""
log_success "========================================================="
log_success "NPM Failover COMPLETE"
log_success " ${NPM_PROXY_HOST_DOMAIN} now routes to → ${TARGET_IP}"
log_success "========================================================="
echo ""
}
main "$@"
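
Two shell idioms carry most of the plumbing in this script: helper functions return their result on stdout to a `$(...)` capture, and `npm_find_proxy_host` packs two values into one `id|host` string that `main` splits with parameter expansion. This only works if diagnostics never reach stdout. A self-contained sketch, with hypothetical names (`demo_log`, `demo_find_host`) and no network calls:

```shell
#!/bin/bash
# Sketch with hypothetical names; mirrors the capture-and-split pattern only.

# Diagnostics go to stderr so they never end up in a captured result.
demo_log() { echo "[demo] $1" >&2; }

# Returns "<id>|<forward_host>" on stdout, the same shape npm_find_proxy_host uses.
demo_find_host() {
  demo_log "looking up proxy host..."
  echo "42|192.168.1.151"
}

host_info=$(demo_find_host)
host_id="${host_info%%|*}"      # remove the longest suffix starting at the first "|"
current_ip="${host_info##*|}"   # remove the longest prefix ending at the last "|"
echo "id=${host_id} ip=${current_ip}"
```

If `demo_log` wrote to stdout instead, `host_info` would contain the log line as well and both expansions would return garbage, which is why log helpers used inside captured functions should target stderr.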

264
scripts/setup-nas.sh Normal file
View File

@@ -0,0 +1,264 @@
#!/bin/bash
# ==============================================================================
# Wordly.art - NAS Setup via SSH/rsync
# ==============================================================================
# Sets up passwordless SSH access to the Synology NAS.
# Replaces the CIFS/SMB approach with rsync over SSH:
# - No mount to manage, no fstab entry
# - The exact path /volume1/backups/wordly can be used directly
# - Encrypted SSH transport, robust to NAS reboots
#
# Usage:
# sudo bash scripts/setup-nas.sh
#
# Prerequisites on the Synology NAS (see DISASTER_RECOVERY.md, section 1):
# 1. Account 'wordly-backup' created in DSM → User & Group
# 2. R/W access to the 'backups' shared folder
# 3. SSH enabled in DSM → Terminal & SNMP
# 4. Directory /volume1/backups/wordly created and chowned to wordly-backup
# ==============================================================================
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'
log() { echo -e "[NAS-Setup] $1"; }
log_success() { echo -e "[NAS-Setup] ${GREEN}$1${NC}"; }
log_warning() { echo -e "[NAS-Setup] ${YELLOW}⚠️ $1${NC}"; }
log_error() { echo -e "[NAS-Setup] ${RED}$1${NC}"; }
log_info() { echo -e "[NAS-Setup] ${BLUE} $1${NC}"; }
# ==============================================================================
# 1. ROOT CHECK
# ==============================================================================
if [ "$EUID" -ne 0 ]; then
log_error "Ce script doit être exécuté en root : sudo bash $0"
exit 1
fi
# ==============================================================================
# 2. CHARGER LE .env
# ==============================================================================
ENV_FILE="${PROJECT_ROOT}/.env"
if [ -f "${ENV_FILE}" ]; then
set -a
source "${ENV_FILE}"
set +a
fi
NAS_HOST="${NAS_HOST:-192.168.1.146}"
NAS_USER="${NAS_USER:-wordly-backup}"
NAS_PATH="${NAS_PATH:-/volume1/backups/wordly}"
NAS_SSH_PORT="${NAS_SSH_PORT:-22}"
NAS_SSH_KEY="${NAS_SSH_KEY:-/root/.ssh/wordly_nas_key}"
# ==============================================================================
# 3. VÉRIFIER QUE SSH EST DISPO SUR LE NAS
# ==============================================================================
check_nas_reachable() {
log "Vérification de la connectivité SSH vers ${NAS_HOST}:${NAS_SSH_PORT}..."
if ! nc -z -w5 "${NAS_HOST}" "${NAS_SSH_PORT}" 2>/dev/null; then
log_error "Impossible de joindre ${NAS_HOST} sur le port ${NAS_SSH_PORT}."
log_error "Vérifiez que SSH est activé dans DSM → Panneau de configuration → Terminal et SNMP."
exit 1
fi
log_success "NAS ${NAS_HOST}:${NAS_SSH_PORT} est joignable"
}
# ==============================================================================
# 4. GÉNÉRER LA CLÉ SSH DÉDIÉE (si elle n'existe pas)
# ==============================================================================
generate_ssh_key() {
if [ -f "${NAS_SSH_KEY}" ]; then
log_info "SSH key already exists: ${NAS_SSH_KEY}"
log_info "To regenerate it: rm ${NAS_SSH_KEY} ${NAS_SSH_KEY}.pub"
else
log "Generating the dedicated SSH key for NAS backups..."
# Make sure the key's parent directory exists before calling ssh-keygen
mkdir -p -m 700 "$(dirname "${NAS_SSH_KEY}")"
ssh-keygen -t ed25519 \
-C "wordly-backup@$(hostname)-$(date +%Y%m%d)" \
-f "${NAS_SSH_KEY}" \
-N ""
chmod 600 "${NAS_SSH_KEY}"
log_success "SSH key generated: ${NAS_SSH_KEY}"
fi
log_info "Public key to authorize on the NAS:"
cat "${NAS_SSH_KEY}.pub"
}
# ==============================================================================
# 5. COPIER LA CLÉ PUBLIQUE SUR LE NAS
# ==============================================================================
install_ssh_key() {
log "Copie de la clé publique sur le NAS ${NAS_USER}@${NAS_HOST}..."
log_warning "Le mot de passe du compte '${NAS_USER}' sur le NAS vous sera demandé UNE SEULE FOIS."
echo ""
if ssh-copy-id \
-i "${NAS_SSH_KEY}.pub" \
-p "${NAS_SSH_PORT}" \
-o StrictHostKeyChecking=accept-new \
"${NAS_USER}@${NAS_HOST}"; then
log_success "Clé publique installée sur le NAS — plus de mot de passe requis"
else
log_error "Échec de ssh-copy-id. Avez-vous bien créé le compte '${NAS_USER}' sur le Synology ?"
log_error "Vérifiez aussi que SSH est activé dans DSM."
exit 1
fi
}
# ==============================================================================
# 6. TESTER LA CONNEXION SANS MOT DE PASSE
# ==============================================================================
test_ssh_connection() {
log "Test de la connexion SSH sans mot de passe..."
local result
if result=$(ssh \
-i "${NAS_SSH_KEY}" \
-p "${NAS_SSH_PORT}" \
-o StrictHostKeyChecking=accept-new \
-o ConnectTimeout=10 \
-o BatchMode=yes \
"${NAS_USER}@${NAS_HOST}" \
"echo OK" 2>/dev/null); then
if [ "${result}" = "OK" ]; then
log_success "Connexion SSH sans mot de passe : OK"
else
log_error "Connexion établie mais réponse inattendue : ${result}"
exit 1
fi
else
log_error "Connexion SSH sans mot de passe ÉCHOUÉE."
log_error "Vérifiez que la clé a bien été copiée avec ssh-copy-id."
exit 1
fi
}
# ==============================================================================
# 7. CREATE THE DIRECTORY STRUCTURE ON THE NAS
# ==============================================================================
create_nas_directories() {
log "Creating the directory structure on the NAS: ${NAS_PATH}..."
# Single-quote the remote paths so a NAS_PATH containing spaces is not word-split by the remote shell
ssh \
-i "${NAS_SSH_KEY}" \
-p "${NAS_SSH_PORT}" \
-o BatchMode=yes \
"${NAS_USER}@${NAS_HOST}" \
"mkdir -p '${NAS_PATH}/snapshots' '${NAS_PATH}/scripts' && echo OK"
log_success "Structure created on the NAS:"
log_success " ${NAS_PATH}/snapshots/ → DR archives"
log_success " ${NAS_PATH}/scripts/ → restore scripts"
}
# ==============================================================================
# 8. TEST WRITE ACCESS ON THE NAS
# ==============================================================================
test_nas_write() {
log "Testing write access on the NAS..."
local test_file="${NAS_PATH}/.write_test_$$"
# Single-quote the remote path so spaces in NAS_PATH do not split it into several arguments
if ssh \
-i "${NAS_SSH_KEY}" \
-p "${NAS_SSH_PORT}" \
-o BatchMode=yes \
"${NAS_USER}@${NAS_HOST}" \
"touch '${test_file}' && rm -f '${test_file}' && echo OK" | grep -q "OK"; then
log_success "Write access on the NAS: OK"
else
log_error "The NAS is reachable but NOT writable!"
log_error "Check the permissions of the '${NAS_PATH}' directory for user '${NAS_USER}'."
exit 1
fi
}
# ==============================================================================
# 9. CREATE THE SSH CONFIG ENTRY (~/.ssh/config)
# ==============================================================================
configure_ssh_config() {
local ssh_config="/root/.ssh/config"
local host_entry="
# Wordly.art: Synology NAS backup
Host wordly-nas
HostName ${NAS_HOST}
User ${NAS_USER}
Port ${NAS_SSH_PORT}
IdentityFile ${NAS_SSH_KEY}
StrictHostKeyChecking accept-new
ConnectTimeout 10
ServerAliveInterval 30
BatchMode yes"
if grep -q "wordly-nas" "${ssh_config}" 2>/dev/null; then
log_warning "Entry 'wordly-nas' already present in ${ssh_config}, skipped."
else
mkdir -p /root/.ssh
chmod 700 /root/.ssh
echo "${host_entry}" >> "${ssh_config}"
chmod 600 "${ssh_config}"
log_success "SSH config added: ${ssh_config}"
log_info "You can now use: ssh wordly-nas"
fi
}
# ==============================================================================
# 10. COPY THE SCRIPTS TO THE NAS (available from any server)
# ==============================================================================
sync_scripts_to_nas() {
log "Syncing scripts to the NAS (for restoration from .98)..."
if rsync -az \
-e "ssh -i ${NAS_SSH_KEY} -p ${NAS_SSH_PORT} -o BatchMode=yes" \
--exclude="__pycache__" \
--exclude="*.pyc" \
"${SCRIPT_DIR}/" \
"${NAS_USER}@${NAS_HOST}:${NAS_PATH}/scripts/"; then
log_success "Scripts synced to the NAS: ${NAS_PATH}/scripts/"
else
log_warning "rsync failed: check that rsync is available on the Synology NAS."
log_warning "Try: DSM Control Panel → File Services → rsync → enable the rsync service"
fi
}
# ==============================================================================
# 11. MAIN
# ==============================================================================
main() {
echo ""
echo "================================================================="
echo " Wordly.art: NAS setup via SSH/rsync"
echo " NAS : ${NAS_USER}@${NAS_HOST}:${NAS_SSH_PORT}"
echo " Path : ${NAS_PATH}"
echo " Key : ${NAS_SSH_KEY}"
echo "================================================================="
echo ""
check_nas_reachable
generate_ssh_key
install_ssh_key
test_ssh_connection
create_nas_directories
test_nas_write
configure_ssh_config
sync_scripts_to_nas
echo ""
log_success "================================================================="
log_success "NAS setup COMPLETE"
log_success ""
log_success " ✅ SSH key: ${NAS_SSH_KEY}"
log_success " ✅ Passwordless connection: wordly-nas"
log_success " ✅ Directories created on the NAS"
log_success " ✅ Scripts available for DR from any server"
log_success ""
log_success " Quick test: ssh wordly-nas 'ls ${NAS_PATH}/'"
log_success " Next step: bash scripts/backup-to-nas.sh --full"
log_success "================================================================="
echo ""
}
main "$@"
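
Editor's sketch, not part of the commit: the setup script sends `${NAS_PATH}` to the remote shell inside a command string, so a path containing spaces or quotes could be split or misread on the NAS. One possible way to harden this, assuming the remote login shell is POSIX-compatible, is a small quoting helper. The name `sq` and the example path are hypothetical.

```shell
# Hypothetical helper: wrap one argument in single quotes for a POSIX shell.
# Each embedded ' becomes '\'' so the remote shell still sees a single word.
sq() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Hypothetical NAS path containing a space, for illustration only.
nas_path="/volume1/backups/wordly art"

# Build the remote command with every path quoted individually.
remote_cmd="mkdir -p $(sq "${nas_path}/snapshots") $(sq "${nas_path}/scripts")"
echo "${remote_cmd}"
```

The resulting string could then be passed as the last argument of the existing `ssh ... "${NAS_USER}@${NAS_HOST}"` calls. Trailing newlines in an argument are not preserved by this sketch.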

scripts/verify-backups.sh Normal file

@@ -0,0 +1,370 @@
#!/bin/bash
# ==============================================================================
# Wordly.art - Backup Verification & Telegram Alerts
# ==============================================================================
# Runs after every backup to validate integrity and alert on failure.
# CRON: 30 */6 * * * (30 minutes after each backup)
#
# Checks:
# - Recent snapshot exists (< 8h)
# - Snapshot size > 1MB (not empty)
# - Snapshot gzip integrity
# - PostgreSQL is responding
# - DB has at least one table in the public schema
# - NAS is mounted and writable
# - Disk usage < 85%
# - App HTTP health check
# ==============================================================================
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
log() { echo "[Verify ${TIMESTAMP}] $1"; }
log_success() { echo -e "[Verify ${TIMESTAMP}] ${GREEN}$1${NC}"; }
log_warning() { echo -e "[Verify ${TIMESTAMP}] ${YELLOW}⚠️ WARNING: $1${NC}"; }
log_error() { echo -e "[Verify ${TIMESTAMP}] ${RED}❌ ERROR: $1${NC}"; }
# ==============================================================================
# 1. LOAD CONFIGURATION
# ==============================================================================
ENV_FILE="${PROJECT_ROOT}/.env"
if [ -f "${ENV_FILE}" ]; then
set -a
source "${ENV_FILE}"
set +a
fi
# Directories
NAS_MOUNT="${NAS_MOUNT:-/mnt/nas-wordly}"
LOCAL_BACKUP_DIR="${BACKUP_DIR:-/opt/wordly/backups}"
# PostgreSQL
POSTGRES_CONTAINER="${POSTGRES_CONTAINER:-wordly-postgres}"
POSTGRES_USER="${POSTGRES_USER:-translate}"
POSTGRES_DB="${POSTGRES_DB:-translate_db}"
POSTGRES_PASSWORD="${POSTGRES_PASSWORD:-}"
# App health check
APP_HEALTH_URL="${APP_HEALTH_URL:-http://localhost:8001/health}"
# Thresholds
MAX_SNAPSHOT_AGE_HOURS=8
MIN_SNAPSHOT_SIZE_MB=1
MAX_DISK_USAGE_PERCENT=85
# Telegram
TELEGRAM_BOT_TOKEN="${TELEGRAM_BOT_TOKEN:-}"
TELEGRAM_CHAT_ID="${TELEGRAM_CHAT_ID:-}"
# Track failures
FAILURES=0
WARNINGS=0
# ==============================================================================
# 2. TELEGRAM
# ==============================================================================
send_telegram() {
local message="$1"
if [ -n "${TELEGRAM_BOT_TOKEN}" ] && [ -n "${TELEGRAM_CHAT_ID}" ]; then
curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
-d "chat_id=${TELEGRAM_CHAT_ID}" \
--data-urlencode "text=${message}" \
-d "parse_mode=Markdown" \
>/dev/null 2>&1 || true
else
log_warning "Telegram not configured (TELEGRAM_BOT_TOKEN or TELEGRAM_CHAT_ID missing)"
fi
}
# ==============================================================================
# 3. CHECK FUNCTIONS
# ==============================================================================
check_recent_snapshot() {
log "Check 1/8: Recent snapshot exists (< ${MAX_SNAPSHOT_AGE_HOURS}h)..."
# Look in both NAS and local backup directories
local search_dirs=("${LOCAL_BACKUP_DIR}/daily")
if mountpoint -q "${NAS_MOUNT}" 2>/dev/null; then
search_dirs+=("${NAS_MOUNT}/snapshots")
fi
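# Keep the most recent snapshot across all search directories, not just the first directory that has one
local newest_snapshot=""
local dir candidate
for dir in "${search_dirs[@]}"; do
if [ -d "${dir}" ]; then
candidate=$(ls -t "${dir}"/*.gz 2>/dev/null | head -n1 || true)
if [ -n "${candidate}" ] && { [ -z "${newest_snapshot}" ] || [ "${candidate}" -nt "${newest_snapshot}" ]; }; then
newest_snapshot="${candidate}"
fi
fi
done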
if [ -z "${newest_snapshot}" ]; then
log_error "No snapshot found in backup directories!"
FAILURES=$((FAILURES + 1))
return
fi
# Check age
local snapshot_time
snapshot_time=$(stat -c %Y "${newest_snapshot}" 2>/dev/null || stat -f %m "${newest_snapshot}" 2>/dev/null)
local now
now=$(date +%s)
local age_hours=$(( (now - snapshot_time) / 3600 ))
if [ "${age_hours}" -ge "${MAX_SNAPSHOT_AGE_HOURS}" ]; then
log_error "Newest snapshot is ${age_hours}h old (max: ${MAX_SNAPSHOT_AGE_HOURS}h): $(basename "${newest_snapshot}")"
FAILURES=$((FAILURES + 1))
else
log_success "Snapshot found: $(basename "${newest_snapshot}") (${age_hours}h old)"
fi
echo "${newest_snapshot}"
}
check_snapshot_size() {
local snapshot_path="$1"
log "Check 2/8: Snapshot size > ${MIN_SNAPSHOT_SIZE_MB}MB..."
if [ -z "${snapshot_path}" ] || [ ! -f "${snapshot_path}" ]; then
log_warning "No snapshot to size-check."
return
fi
local size_bytes
size_bytes=$(stat -c %s "${snapshot_path}" 2>/dev/null || stat -f %z "${snapshot_path}" 2>/dev/null)
local min_bytes=$((MIN_SNAPSHOT_SIZE_MB * 1024 * 1024))
if [ "${size_bytes}" -lt "${min_bytes}" ]; then
log_error "Snapshot size is $(numfmt --to=iec ${size_bytes}) which is below minimum ${MIN_SNAPSHOT_SIZE_MB}MB — likely empty dump!"
FAILURES=$((FAILURES + 1))
else
log_success "Snapshot size: $(numfmt --to=iec ${size_bytes})"
fi
}
check_snapshot_integrity() {
local snapshot_path="$1"
log "Check 3/8: Snapshot gzip integrity..."
if [ -z "${snapshot_path}" ] || [ ! -f "${snapshot_path}" ]; then
log_warning "No snapshot to integrity-check."
return
fi
if gzip -t "${snapshot_path}" 2>/dev/null; then
log_success "Snapshot gzip integrity OK"
else
log_error "Snapshot is CORRUPTED: $(basename "${snapshot_path}")"
FAILURES=$((FAILURES + 1))
fi
}
check_postgres_running() {
log "Check 4/8: PostgreSQL container is running and healthy..."
if ! command -v docker &>/dev/null; then
log_warning "Docker not found — skipping PostgreSQL check."
return
fi
if ! docker ps --format '{{.Names}}' 2>/dev/null | grep -q "^${POSTGRES_CONTAINER}$"; then
log_error "PostgreSQL container '${POSTGRES_CONTAINER}' is NOT running!"
FAILURES=$((FAILURES + 1))
return
fi
local health
health=$(docker inspect --format='{{.State.Health.Status}}' "${POSTGRES_CONTAINER}" 2>/dev/null || echo "unknown")
if [ "${health}" = "healthy" ]; then
log_success "PostgreSQL container is healthy"
elif [ "${health}" = "unknown" ]; then
log_warning "PostgreSQL health status unknown (no healthcheck configured?)"
WARNINGS=$((WARNINGS + 1))
else
log_error "PostgreSQL container health status: ${health}"
FAILURES=$((FAILURES + 1))
fi
}
check_db_has_data() {
log "Check 5/8: Database has tables in the public schema (count > 0)..."
if ! command -v docker &>/dev/null; then
log_warning "Docker not found — skipping DB data check."
return
fi
if ! docker ps --format '{{.Names}}' 2>/dev/null | grep -q "^${POSTGRES_CONTAINER}$"; then
log_warning "PostgreSQL container not running — skipping data check."
return
fi
# Count tables in the public schema as a cheap "not empty" proxy (rows are not counted; a failed query is treated as 0)
local count
count=$(docker exec -e PGPASSWORD="${POSTGRES_PASSWORD}" "${POSTGRES_CONTAINER}" \
psql -U "${POSTGRES_USER}" -d "${POSTGRES_DB}" -t -A \
-c "SELECT COUNT(*) FROM information_schema.tables WHERE table_schema = 'public';" \
2>/dev/null || echo "0")
count=$(echo "${count}" | tr -d '[:space:]')
if [ "${count}" = "0" ] || [ -z "${count}" ]; then
log_error "Database appears to be empty (no public tables found)!"
FAILURES=$((FAILURES + 1))
else
log_success "Database has ${count} tables in public schema"
fi
}
check_nas_mounted() {
log "Check 6/8: NAS is mounted and writable at ${NAS_MOUNT}..."
if ! mountpoint -q "${NAS_MOUNT}" 2>/dev/null; then
log_error "NAS is NOT mounted at ${NAS_MOUNT}!"
log "Attempting emergency remount..."
mount "${NAS_MOUNT}" 2>/dev/null || true
if ! mountpoint -q "${NAS_MOUNT}" 2>/dev/null; then
log_error "Emergency remount FAILED. NAS is unavailable."
FAILURES=$((FAILURES + 1))
return
fi
log_warning "NAS remounted successfully (was temporarily unmounted)."
WARNINGS=$((WARNINGS + 1))
fi
# Test write access
local test_file="${NAS_MOUNT}/.write_test_${TIMESTAMP}"
if touch "${test_file}" 2>/dev/null && rm -f "${test_file}" 2>/dev/null; then
log_success "NAS is mounted and writable"
else
log_error "NAS is mounted but NOT writable!"
FAILURES=$((FAILURES + 1))
fi
}
check_disk_space() {
log "Check 7/8: Disk usage < ${MAX_DISK_USAGE_PERCENT}%..."
# Check NAS disk if mounted
if mountpoint -q "${NAS_MOUNT}" 2>/dev/null; then
local nas_usage
nas_usage=$(df -P "${NAS_MOUNT}" | awk 'NR==2 {gsub(/%/,""); print $5}')
if [ "${nas_usage}" -ge "${MAX_DISK_USAGE_PERCENT}" ]; then
log_error "NAS disk usage is ${nas_usage}% (threshold: ${MAX_DISK_USAGE_PERCENT}%)"
FAILURES=$((FAILURES + 1))
else
log_success "NAS disk usage: ${nas_usage}%"
fi
fi
# Check local disk
local local_usage
local_usage=$(df -P /opt 2>/dev/null | awk 'NR==2 {gsub(/%/,""); print $5}' || df -P / | awk 'NR==2 {gsub(/%/,""); print $5}')
if [ "${local_usage}" -ge "${MAX_DISK_USAGE_PERCENT}" ]; then
log_warning "Local disk usage is ${local_usage}% (threshold: ${MAX_DISK_USAGE_PERCENT}%)"
WARNINGS=$((WARNINGS + 1))
else
log_success "Local disk usage: ${local_usage}%"
fi
}
check_app_health() {
log "Check 8/8: App HTTP health check at ${APP_HEALTH_URL}..."
if ! command -v curl &>/dev/null; then
log_warning "curl not found — skipping HTTP health check."
return
fi
local http_code
# curl already prints 000 when it gets no response, so do not append a second fallback code
http_code=$(curl -s -o /dev/null -w "%{http_code}" \
--connect-timeout 5 \
--max-time 10 \
"${APP_HEALTH_URL}" 2>/dev/null || true)
http_code="${http_code:-000}"
if [ "${http_code}" = "200" ]; then
log_success "App health check passed (HTTP ${http_code})"
elif [ "${http_code}" = "000" ]; then
log_error "App is unreachable (connection failed or timed out)"
FAILURES=$((FAILURES + 1))
else
log_error "App health check returned HTTP ${http_code}"
FAILURES=$((FAILURES + 1))
fi
}
# ==============================================================================
# 4. MAIN
# ==============================================================================
main() {
echo ""
echo "========================================================="
echo " Wordly.art — Backup Verification"
echo " $(date '+%Y-%m-%d %H:%M:%S')"
echo "========================================================="
echo ""
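# Run all checks.
# check_recent_snapshot prints its log lines and then the snapshot path on stdout.
# Inside $(...) it would run in a subshell (FAILURES updates lost) and the log
# lines would be captured with the path, so its output goes to a temp file instead.
local newest_snapshot snapshot_out
snapshot_out=$(mktemp)
check_recent_snapshot > "${snapshot_out}"
newest_snapshot=$(tail -n 1 "${snapshot_out}")
[ -f "${newest_snapshot}" ] || newest_snapshot=""
# Replay the log lines (everything except the path line)
grep -vxF -- "${newest_snapshot}" "${snapshot_out}" || true
rm -f "${snapshot_out}"
check_snapshot_size "${newest_snapshot}"
check_snapshot_integrity "${newest_snapshot}"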
check_postgres_running
check_db_has_data
check_nas_mounted
check_disk_space
check_app_health
echo ""
echo "========================================================="
echo " Results: ${FAILURES} failure(s), ${WARNINGS} warning(s)"
echo "========================================================="
echo ""
# Send Telegram report
if [ "${FAILURES}" -gt 0 ]; then
local msg="🚨 *Wordly.art — Backup Verification FAILED*
Date: $(date '+%Y-%m-%d %H:%M:%S')
Failures: ${FAILURES}
Warnings: ${WARNINGS}
Check logs on 192.168.1.151:
\`cat /var/log/wordly-verify.log\`"
send_telegram "${msg}"
log_error "Verification FAILED with ${FAILURES} error(s). Telegram alert sent."
exit 1
elif [ "${WARNINGS}" -gt 0 ]; then
local msg="⚠️ *Wordly.art — Backup Verification passed with warnings*
Date: $(date '+%Y-%m-%d %H:%M:%S')
Failures: 0
Warnings: ${WARNINGS}"
send_telegram "${msg}"
log_warning "Verification passed with ${WARNINGS} warning(s)."
else
# Only send success alert once per day (at 06:30)
local hour
hour=$(date +%H)
if [ "${hour}" = "06" ]; then
local msg="✅ *Wordly.art — Daily backup check OK*
Date: $(date '+%Y-%m-%d %H:%M:%S')
All 8 checks passed."
send_telegram "${msg}"
fi
log_success "All checks passed."
fi
}
main "$@"
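
Editor's sketch, not part of the commit: a check function that both prints a result and updates a global counter, as `check_recent_snapshot` does with `FAILURES`, behaves differently depending on how it is called. The demo below, with a hypothetical stand-in `check_demo` and a made-up path, shows that a counter update made inside `$(...)` is lost because command substitution runs in a subshell. Redirecting the function's output to a file keeps the call in the current shell.

```shell
# Hypothetical stand-in for a check that counts a failure and prints a path.
FAILURES=0

check_demo() {
    FAILURES=$((FAILURES + 1))
    echo "/tmp/example-snapshot.sql.gz"
}

# 1) Command substitution: the function runs in a subshell,
#    so the increment never reaches the parent shell.
captured=$(check_demo)
after_substitution=${FAILURES}

# 2) Redirect to a temp file: the function runs in the current shell,
#    so the increment is kept and the output can be read back afterwards.
tmp=$(mktemp)
check_demo > "${tmp}"
from_file=$(cat "${tmp}")
rm -f "${tmp}"
after_redirect=${FAILURES}

echo "after \$(...):   FAILURES=${after_substitution}"
echo "after redirect: FAILURES=${after_redirect}"
```

An alternative design would have the function store its result in a global variable and print only log lines. Which of the two fits better depends on how the other checks consume the snapshot path.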