docs: update README.md to act as central documentation portal with French links
All checks were successful
Deploy to Production / Build and Deploy (push) Successful in 2m46s
463
README.md
@@ -1,389 +1,84 @@
# 📄 Document Translation API
# 📄 Wordly.art — Document Translation Portal

A powerful SaaS-ready Python API for translating complex structured documents (Excel, Word, PowerPoint) while **strictly preserving** the original formatting, layout, and embedded media.
Wordly.art is a complete, production-ready (SaaS-ready) application for translating complex office documents (Excel, Word, PowerPoint) while **strictly preserving** the original layout, styling, formulas, and embedded media.

## ✨ Features

### 🔄 Multiple Translation Providers
| Provider | Type | Description |
|----------|------|-------------|
| **Google Translate** | Cloud | Free, fast, reliable |
| **Ollama** | Local LLM | Privacy-focused, customizable with system prompts |
| **WebLLM** | Browser | Runs entirely in browser using WebGPU |
| **DeepL** | Cloud | High-quality translations (API key required) |
| **LibreTranslate** | Self-hosted | Open-source alternative |
| **OpenAI** | Cloud | GPT-4o/4o-mini with vision support |

### 📊 Excel Translation (.xlsx)
- ✅ Translates all cell content and sheet names
- ✅ Preserves cell merging, formulas, and styles
- ✅ Maintains font styles, colors, and borders
- ✅ Image text extraction with vision models
- ✅ Adds translated image text as comments

### 📝 Word Translation (.docx)
- ✅ Translates body text, headers, footers, and tables
- ✅ Preserves heading styles and paragraph formatting
- ✅ Maintains lists, images, charts, and SmartArt
- ✅ Image text extraction and translation

### 📽️ PowerPoint Translation (.pptx)
- ✅ Translates slide titles, body text, and speaker notes
- ✅ Preserves slide layouts, transitions, and animations
- ✅ Image text extraction with text boxes added below images
- ✅ Keeps layering order and positions

### 🧠 LLM Features (Ollama/WebLLM/OpenAI)
- ✅ **Custom System Prompts**: Provide context for better translations
- ✅ **Technical Glossary**: Define term mappings (e.g., `batterie=coil`)
- ✅ **Presets**: HVAC, IT, Legal, Medical terminology
- ✅ **Vision Models**: Translate text within images (gemma3, qwen3-vl, gpt-4o)

### 🏢 SaaS-Ready Features
- 🚦 **Rate Limiting**: Per-client IP with token bucket and sliding window algorithms
- 🔒 **Security Headers**: CSP, XSS protection, HSTS support
- 🧹 **Auto Cleanup**: Automatic file cleanup with TTL tracking
- 📊 **Monitoring**: Health checks, metrics, and system status
- 🔐 **Admin Dashboard**: Secure admin panel with authentication
- 📝 **Request Logging**: Structured logging with unique request IDs

## 🚀 Quick Start

### Installation

```powershell
# Clone the repository
git clone https://gitea.parsanet.org/sepehr/office_translator.git
cd office_translator

# Create virtual environment
python -m venv venv
.\venv\Scripts\Activate.ps1

# Install dependencies
pip install -r requirements.txt

# Run the API
python main.py
```

The API starts on `http://localhost:8000`

### Frontend Setup

```powershell
cd frontend
npm install
npm run dev
```

Frontend runs on `http://localhost:3000`

## 📚 API Documentation

- **Swagger UI**: http://localhost:8000/docs
- **ReDoc**: http://localhost:8000/redoc

## 🔧 API Endpoints

### Translation

#### POST /translate
Translate a document with full customization.

```bash
curl -X POST "http://localhost:8000/translate" \
  -F "file=@document.xlsx" \
  -F "target_language=en" \
  -F "provider=ollama" \
  -F "ollama_model=gemma3:12b" \
  -F "translate_images=true" \
  -F "system_prompt=You are translating HVAC documents."
```
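
The same request can be issued from Python. The sketch below mirrors the form fields of the curl example; the helper names are hypothetical, it relies on the third-party `requests` package, and it assumes the response body is the translated document itself, which the text above does not state.

```python
from pathlib import Path
from typing import Optional


def build_translate_form(
    target_language: str,
    provider: str = "google",
    *,
    ollama_model: Optional[str] = None,
    translate_images: bool = False,
    system_prompt: Optional[str] = None,
) -> dict:
    """Build the multipart form fields used by the curl example."""
    fields = {
        "target_language": target_language,
        "provider": provider,
        "translate_images": "true" if translate_images else "false",
    }
    if ollama_model:
        fields["ollama_model"] = ollama_model
    if system_prompt:
        fields["system_prompt"] = system_prompt
    return fields


def translate_document(
    source: Path,
    output: Path,
    base_url: str = "http://localhost:8000",
    **options,
) -> None:
    """Upload `source` to POST /translate and save the response to `output`."""
    import requests  # third-party: pip install requests

    with source.open("rb") as handle:
        response = requests.post(
            f"{base_url}/translate",
            data=build_translate_form(**options),
            files={"file": handle},
            timeout=600,
        )
    response.raise_for_status()
    # Assumption: the endpoint returns the translated file as the response body.
    output.write_bytes(response.content)
```

For example, `translate_document(Path("document.xlsx"), Path("document_en.xlsx"), target_language="en", provider="ollama", ollama_model="gemma3:12b")` would reproduce the core of the curl call.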

### Monitoring

#### GET /health
Comprehensive health check with system status.

```json
{
  "status": "healthy",
  "translation_service": "google",
  "memory": {"system_percent": 34.1, "system_available_gb": 61.7},
  "disk": {"total_files": 0, "total_size_mb": 0},
  "cleanup_service": {"is_running": true}
}
```
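
A monitoring script could interpret this payload roughly as shown below. The function name and the checks it applies are assumptions for illustration; only the field names come from the sample response, and the 80% default echoes `MAX_MEMORY_PERCENT` from the configuration example.

```python
import json


def health_problems(payload: dict, max_memory_percent: float = 80.0) -> list:
    """Return human-readable problems found in a /health payload."""
    problems = []
    if payload.get("status") != "healthy":
        problems.append(f"status is {payload.get('status')!r}")
    memory = payload.get("memory", {})
    if memory.get("system_percent", 0.0) > max_memory_percent:
        problems.append("memory usage above threshold")
    if not payload.get("cleanup_service", {}).get("is_running", False):
        problems.append("cleanup service is not running")
    return problems


sample = json.loads(
    '{"status": "healthy", "translation_service": "google",'
    ' "memory": {"system_percent": 34.1, "system_available_gb": 61.7},'
    ' "disk": {"total_files": 0, "total_size_mb": 0},'
    ' "cleanup_service": {"is_running": true}}'
)
print(health_problems(sample))  # an empty list means no problems were detected
```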

#### GET /metrics
System metrics and statistics.

#### GET /rate-limit/status
Current rate limit status for the requesting client.

### Admin Endpoints (Authentication Required)

#### POST /admin/login
Login to admin dashboard.

```bash
curl -X POST "http://localhost:8000/admin/login" \
  -F "username=admin" \
  -F "password=your_password"
```

Response:
```json
{
  "status": "success",
  "token": "your_bearer_token",
  "expires_in": 86400
}
```

#### GET /admin/dashboard
Get comprehensive dashboard data (requires Bearer token).

```bash
curl "http://localhost:8000/admin/dashboard" \
  -H "Authorization: Bearer your_token"
```
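
The login response and the dashboard call fit together as in the sketch below. Both helpers are hypothetical, and treating `expires_in` as a lifetime in seconds is an assumption (86400 would then be 24 hours).

```python
import time
from typing import Optional


def auth_header(login_response: dict) -> dict:
    """Turn a successful /admin/login response into an Authorization header."""
    if login_response.get("status") != "success":
        raise ValueError("login failed")
    return {"Authorization": f"Bearer {login_response['token']}"}


def token_expiry(login_response: dict, issued_at: Optional[float] = None) -> float:
    """Unix timestamp after which the token should be refreshed.

    Assumption: `expires_in` is expressed in seconds.
    """
    issued_at = time.time() if issued_at is None else issued_at
    return issued_at + login_response["expires_in"]
```

The returned header is what the `-H "Authorization: Bearer your_token"` flag sends in the dashboard request.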

#### POST /admin/cleanup/trigger
Manually trigger file cleanup.

#### GET /admin/files/tracked
List currently tracked files.

## 🌐 Supported Languages

| Code | Language | Code | Language |
|------|----------|------|----------|
| en | English | fr | French |
| fa | Persian/Farsi | es | Spanish |
| de | German | it | Italian |
| pt | Portuguese | ru | Russian |
| zh | Chinese | ja | Japanese |
| ko | Korean | ar | Arabic |

## ⚙️ Configuration

### Environment Variables (.env)

1. **Copy** `.env.example` to `.env`: `cp .env.example .env`
2. **Fill** required variables (see comments in `.env.example`: Required vs Optional).
3. In **production** (`ENV=production`), missing required vars (e.g. `JWT_SECRET_KEY`, `ADMIN_USERNAME`, `ADMIN_PASSWORD` or `ADMIN_PASSWORD_HASH`, `ADMIN_TOKEN_SECRET`, `DATABASE_URL`, and `REDIS_URL` if rate limiting is on) cause the app to **fail at startup** with a clear message listing them (Story 6.6, NFR10).
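
The startup check described in step 3 might look like the following sketch. The function name is hypothetical, the variable list only covers the examples named above (the real list lives in `.env.example`), and treating rate limiting as enabled when `RATE_LIMIT_ENABLED` is unset is an assumption.

```python
import os
from typing import Mapping

# Illustrative subset taken from the examples in step 3, not the full list.
REQUIRED_IN_PRODUCTION = (
    "JWT_SECRET_KEY",
    "ADMIN_USERNAME",
    "ADMIN_TOKEN_SECRET",
    "DATABASE_URL",
)


def missing_required_vars(env: Mapping[str, str]) -> list:
    """List the required variables that are unset or empty in production."""
    if env.get("ENV") != "production":
        return []
    missing = [name for name in REQUIRED_IN_PRODUCTION if not env.get(name)]
    # Either the plain password or its hash satisfies the requirement.
    if not (env.get("ADMIN_PASSWORD") or env.get("ADMIN_PASSWORD_HASH")):
        missing.append("ADMIN_PASSWORD or ADMIN_PASSWORD_HASH")
    # Assumption: rate limiting counts as enabled when the flag is unset.
    rate_limited = env.get("RATE_LIMIT_ENABLED", "true").lower() == "true"
    if rate_limited and not env.get("REDIS_URL"):
        missing.append("REDIS_URL")
    return missing


if __name__ == "__main__":
    problems = missing_required_vars(os.environ)
    if problems:
        raise SystemExit("Missing required environment variables: " + ", ".join(problems))
```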

```env
# ============== Translation Services ==============
TRANSLATION_SERVICE=google
DEEPL_API_KEY=your_deepl_api_key_here

# Ollama Configuration
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3
OLLAMA_VISION_MODEL=llava

# ============== File Limits ==============
MAX_FILE_SIZE_MB=50

# ============== Rate Limiting (SaaS) ==============
RATE_LIMIT_ENABLED=true
RATE_LIMIT_PER_MINUTE=30
RATE_LIMIT_PER_HOUR=200
TRANSLATIONS_PER_MINUTE=10
TRANSLATIONS_PER_HOUR=50
MAX_CONCURRENT_TRANSLATIONS=5

# ============== Cleanup Service ==============
CLEANUP_ENABLED=true
CLEANUP_INTERVAL_MINUTES=15
FILE_TTL_MINUTES=60
INPUT_FILE_TTL_MINUTES=30
OUTPUT_FILE_TTL_MINUTES=120

# ============== Security ==============
# When behind Nginx (production), HSTS is set by the proxy; ENABLE_HSTS applies when running the app without a reverse proxy (e.g. dev).
ENABLE_HSTS=false
# Use "*" only for local development; set explicit origins in production (see .env.example).
CORS_ORIGINS=*

# ============== Admin Authentication ==============
ADMIN_USERNAME=admin
ADMIN_PASSWORD=changeme123  # Change in production!
# Or use a SHA256 hash:
# ADMIN_PASSWORD_HASH=your_sha256_hash

# ============== Monitoring ==============
LOG_LEVEL=INFO
ENABLE_REQUEST_LOGGING=true
MAX_MEMORY_PERCENT=80
```
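
To make the cleanup TTL settings concrete, here is a minimal sketch of an expiry check. It is not the project's cleanup service: the function names are hypothetical, and using the file modification time as the reference point is an assumption (the service may track upload times instead).

```python
import time
from pathlib import Path
from typing import Optional


def is_expired(modified_at: float, ttl_minutes: float, now: float) -> bool:
    """True once more than `ttl_minutes` have passed since `modified_at`."""
    return now - modified_at > ttl_minutes * 60


def expired_files(directory: Path, ttl_minutes: float, now: Optional[float] = None) -> list:
    """Files in `directory` whose age exceeds the TTL, sorted by path.

    Assumption: age is measured from the file's modification time.
    """
    now = time.time() if now is None else now
    return sorted(
        path
        for path in directory.iterdir()
        if path.is_file() and is_expired(path.stat().st_mtime, ttl_minutes, now)
    )
```

With the values above, a periodic job running every `CLEANUP_INTERVAL_MINUTES` could call `expired_files(Path("uploads"), 30)` and `expired_files(Path("outputs"), 120)` and delete the results.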

### Ollama Setup

```bash
# Install Ollama (Windows)
winget install Ollama.Ollama

# Pull a model
ollama pull llama3.2

# For vision/image translation
ollama pull gemma3:12b
# or
ollama pull qwen3-vl:8b
```

## 🎯 Using System Prompts & Glossary

### Example: HVAC Translation

**System Prompt:**
```
You are translating HVAC technical documents.
Use precise technical terminology.
Keep unit measurements (kW, m³/h, Pa) unchanged.
```

**Glossary:**
```
batterie=coil
groupe froid=chiller
CTA=AHU (Air Handling Unit)
échangeur=heat exchanger
vanne 3 voies=3-way valve
```
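
The `source=target` glossary format above is simple to parse. The helper below is a hypothetical sketch of how such text could be turned into a lookup table; how the application itself parses or applies the glossary is not described here.

```python
def parse_glossary(text: str) -> dict:
    """Parse `source=target` lines into a dictionary.

    Blank lines and lines without `=` are skipped. Only the first `=`
    separates the two sides, so the target may itself contain `=`.
    """
    glossary = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or "=" not in line:
            continue
        source, target = line.split("=", 1)
        glossary[source.strip()] = target.strip()
    return glossary


example = """
batterie=coil
groupe froid=chiller
CTA=AHU (Air Handling Unit)
"""
terms = parse_glossary(example)
```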

### Presets Available
- 🔧 **HVAC**: Heating, Ventilation, Air Conditioning
- 💻 **IT**: Software and technology
- ⚖️ **Legal**: Legal documents
- 🏥 **Medical**: Healthcare terminology

## Admin Dashboard

Access the admin dashboard at `/admin` in the frontend. Features:

- **System Status**: Health, uptime, and issues
- **Memory & Disk Monitoring**: Real-time usage stats
- **Translation Statistics**: Total translations, success rate
- **Rate Limit Management**: View active clients and limits
- **Cleanup Service**: Monitor and trigger manual cleanup

### Default Credentials
- **Username**: admin
- **Password**: changeme123

⚠️ **Change the default password in production!**
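
When replacing the default password, a value for `ADMIN_PASSWORD_HASH` could be generated as sketched below. This assumes the application expects the hex-encoded SHA-256 digest of the UTF-8 password with no salt; the configuration comment only says "SHA256 hash", so check the backend before relying on this format. Both function names are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(password: str) -> str:
    """Hex-encoded SHA-256 digest of the UTF-8 encoded password."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def password_matches(candidate: str, stored_hash: str) -> bool:
    """Compare in constant time to avoid leaking how many characters match."""
    return hmac.compare_digest(sha256_hex(candidate), stored_hash.lower())


if __name__ == "__main__":
    import getpass

    print(sha256_hex(getpass.getpass("New admin password: ")))
```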

## 🏗️ Project Structure

```
Translate/
├── main.py                      # FastAPI application with SaaS features
├── config.py                    # Configuration with SaaS settings
├── requirements.txt             # Dependencies
├── mcp_server.py                # MCP server implementation
├── middleware/                  # SaaS middleware
│   ├── __init__.py
│   ├── rate_limiting.py         # Rate limiting with token bucket
│   ├── validation.py            # Input validation
│   ├── security.py              # Security headers & logging
│   └── cleanup.py               # Auto cleanup service
├── services/
│   └── translation_service.py   # Translation providers
├── translators/
│   ├── excel_translator.py      # Excel with image support
│   ├── word_translator.py       # Word with image support
│   └── pptx_translator.py       # PowerPoint with image support
├── frontend/                    # Next.js frontend
│   ├── src/
│   │   ├── app/
│   │   │   ├── page.tsx         # Main translation page
│   │   │   ├── admin/           # Admin dashboard
│   │   │   └── settings/        # Settings pages
│   │   └── components/
│   └── package.json
├── static/
│   └── webllm.html              # WebLLM standalone interface
├── uploads/                     # Temporary uploads (auto-cleaned)
└── outputs/                     # Translated files (auto-cleaned)
```

## 🛠️ Tech Stack

### Backend
- **FastAPI**: Modern async web framework
- **openpyxl**: Excel manipulation
- **python-docx**: Word documents
- **python-pptx**: PowerPoint presentations
- **deep-translator**: Google/DeepL/Libre translation
- **psutil**: System monitoring
- **python-magic**: File type validation

### Frontend
- **Next.js 15**: React framework
- **Tailwind CSS**: Styling
- **Lucide Icons**: Icon library
- **WebLLM**: Browser-based LLM

## 🔌 MCP Integration

This API can be used as an MCP (Model Context Protocol) server for AI assistants.

### VS Code Configuration

Add to your VS Code `settings.json` or `.vscode/mcp.json`:

```json
{
  "servers": {
    "document-translator": {
      "type": "stdio",
      "command": "python",
      "args": ["mcp_server.py"],
      "cwd": "D:/Translate",
      "env": {
        "PYTHONPATH": "D:/Translate"
      }
    }
  }
}
```

## 🚀 Production Deployment

### Security Checklist
- [ ] Change `ADMIN_PASSWORD` or set `ADMIN_PASSWORD_HASH`
- [ ] Set `CORS_ORIGINS` to your frontend domain
- [ ] Enable `ENABLE_HSTS=true` if using HTTPS (when not behind Nginx; behind Nginx, HSTS is set by the proxy)
- [ ] Configure rate limits appropriately
- [ ] Set up log rotation for `logs/` directory
- [ ] Use a reverse proxy (nginx/traefik) for HTTPS

### Docker Deployment

In production, use the Docker Compose stack with Nginx as a reverse proxy (ports 80/443):

```bash
# With SSL certificates in docker/nginx/ssl/ (see DEPLOYMENT_GUIDE.md)
docker compose up -d
```

- **Nginx**: TLS termination, HTTP→HTTPS redirect, HSTS, routing `/api/*` → backend and `/*` → frontend (Story 6.5).
- The backend and frontend are not exposed on the host; all traffic goes through the proxy.
- Details: [DEPLOYMENT_GUIDE.md](./DEPLOYMENT_GUIDE.md) (SSL/TLS, `/health` health check, environment variables).

## 📝 License

MIT License

## 🤝 Contributing

Contributions welcome! Please submit a Pull Request.
This file serves as the **central portal** for all of the application's technical documentation and its operations, deployment, and recovery guides.

---

**Built with ❤️ using Python, FastAPI, Next.js, and Ollama**
## 🗺️ Documentation Map

Use the links below to jump directly to the specialized guides:

### 🚀 Getting Started & Usage
* 📥 **[Quick start guide (QUICKSTART.md)](./QUICKSTART.md)**: Local installation, Ollama configuration, and running the project in development.
* 📖 **[API usage guide (GUIDE_UTILISATION.md)](./GUIDE_UTILISATION.md)**: Example translation requests, glossary management, and technical presets (HVAC, IT, Legal, etc.).
* 💡 **[Quick reference sheet (QUICK_REFERENCE.md)](./QUICK_REFERENCE.md)**: Useful commands, network ports, default credentials, and quick tips.

### 🏗️ Architecture & Design
* 📐 **[Software architecture (ARCHITECTURE.md)](./ARCHITECTURE.md)**: Technology choices, data flows, file lifecycle management, security (HSTS, CSP), and monitoring.

### 🌐 Production Deployment
* 🏢 **[General deployment guide (DEPLOYMENT_GUIDE.md)](./DEPLOYMENT_GUIDE.md)**: Standard guide for production deployment under Docker with an Nginx reverse proxy and SSL.
* 🏠 **[Homelab deployment guide (DEPLOYMENT_HOMELAB.md)](./DEPLOYMENT_HOMELAB.md)**: Step-by-step guide tailored to the homelab network (NPM, Stripe, IONOS DNS, Gitea CI/CD, NAS backup).
* ☁️ **[Deploying on IONOS (DEPLOY_IONOS.md)](./DEPLOY_IONOS.md)**: Specific instructions for deploying the infrastructure on an IONOS virtual server.

### 🛡️ Backup, Resilience & Disaster Recovery
* 💾 **[Disaster recovery plan (DISASTER_RECOVERY.md)](./DISASTER_RECOVERY.md)**: Complete guide to the automatic backup system to the NAS over SSH/rsync, monitoring, and failover automation.
* 🔄 **[Simplified restore procedure (PROCEDURE_RESTAURATION.md)](./PROCEDURE_RESTAURATION.md)**: Step-by-step emergency guide for restoring data on the active server or switching over to the standby server (`.98`) in 20 minutes.

---

## ✨ Key Features

### 🔄 Multiple Translation Providers
The application supports multiple translation engines, switchable on the fly:
* **Google Translate** (free, fast, default)
* **Ollama** (local LLM for full privacy, e.g. `gemma3`, `llama3.2`)
* **DeepL API** (high quality, for enterprise use)
* **OpenAI** (GPT-4o models, vision support)
* **DeepSeek, OpenRouter, LibreTranslate, WebLLM**

### 📁 Smart Per-File Translation
* **Excel (.xlsx)**: Preserves merged cells, formulas, fonts, and border styles, and also translates text contained in images (via vision models).
* **Word (.docx)**: Preserves headers, footers, tables, bulleted lists, and paragraph formatting.
* **PowerPoint (.pptx)**: Preserves slide layouts, animations, and transitions.

### 🏢 Security & Operations (SaaS-Ready)
* 🚦 **Rate limiting**: Per client IP, using a token bucket algorithm stored in Redis.
* 🧹 **Auto cleanup**: Temporary files are deleted automatically once their time-to-live (TTL) expires.
* 📊 **Full monitoring**: Detailed `/health` route plus Prometheus + Grafana integration for tracking hardware and software performance.

---

## 🛠️ Tech Stack

### Backend
* **FastAPI** (Python 3.11+): Fast, documented asynchronous API (Swagger available at `http://localhost:8000/docs`).
* **openpyxl**, **python-docx**, **python-pptx**: Document manipulation engines with no dependency on Microsoft Office.
* **Docker / Docker Compose**: Full isolation of the application, the PostgreSQL database, and the Redis cache.

### Frontend
* **Next.js 15** (React) & **Tailwind CSS**: Modern, ergonomic, responsive user interface.
* **Lucide Icons**: Vector icon library.

---

## 🚀 Quick Launch (Dev Mode)

For a full production or homelab deployment, refer to the deployment guides listed in the [Documentation Map](#-documentation-map).

```bash
# 1. Clone the project
git clone https://gitea.parsanet.org/sepehr/office_translator.git /opt/wordly
cd /opt/wordly

# 2. Configure the environment
cp .env.example .env
# Edit the variables in .env as needed

# 3. Launch with Docker Compose
docker compose up -d --build
```
* **API (Backend)**: `http://localhost:8000` (Swagger documentation at `/docs`)
* **Web interface (Frontend)**: `http://localhost:3000`