diff --git a/README.md b/README.md
index 1dd8fc5..ceb5a60 100644
--- a/README.md
+++ b/README.md
@@ -1,389 +1,84 @@
-# 📄 Document Translation API
+# 📄 Wordly.art: Document Translation Portal
 
-A powerful SaaS-ready Python API for translating complex structured documents (Excel, Word, PowerPoint) while **strictly preserving** the original formatting, layout, and embedded media.
+Wordly.art is a complete, production-ready (SaaS-ready) application for translating complex office documents (Excel, Word, PowerPoint) while **strictly preserving** the original layout, styling, formulas, and embedded media.
 
-## ✨ Features
-
-### 🔄 Multiple Translation Providers
-| Provider | Type | Description |
-|----------|------|-------------|
-| **Google Translate** | Cloud | Free, fast, reliable |
-| **Ollama** | Local LLM | Privacy-focused, customizable with system prompts |
-| **WebLLM** | Browser | Runs entirely in browser using WebGPU |
-| **DeepL** | Cloud | High-quality translations (API key required) |
-| **LibreTranslate** | Self-hosted | Open-source alternative |
-| **OpenAI** | Cloud | GPT-4o/4o-mini with vision support |
-
-### 📊 Excel Translation (.xlsx)
-- ✅ Translates all cell content and sheet names
-- ✅ Preserves cell merging, formulas, and styles
-- ✅ Maintains font styles, colors, and borders
-- ✅ Image text extraction with vision models
-- ✅ Adds translated image text as comments
-
-### 📝 Word Translation (.docx)
-- ✅ Translates body text, headers, footers, and tables
-- ✅ Preserves heading styles and paragraph formatting
-- ✅ Maintains lists, images, charts, and SmartArt
-- ✅ Image text extraction and translation
-
-### 📽️ PowerPoint Translation (.pptx)
-- ✅ Translates slide titles, body text, and speaker notes
-- ✅ Preserves slide layouts, transitions, and animations
-- ✅ Image text extraction with text boxes added below images
-- ✅ Keeps layering order and positions
-
-### 🧠 LLM Features (Ollama/WebLLM/OpenAI)
-- ✅ **Custom System Prompts**: Provide context for better translations
-- ✅ **Technical Glossary**: Define term mappings (e.g., `batterie=coil`)
-- ✅ **Presets**: HVAC, IT, Legal, Medical terminology
-- ✅ **Vision Models**: Translate text within images (gemma3, qwen3-vl, gpt-4o)
-
-### 🏢 SaaS-Ready Features
-- 🚦 **Rate Limiting**: Per-client IP with token bucket and sliding window algorithms
-- 🔒 **Security Headers**: CSP, XSS protection, HSTS support
-- 🧹 **Auto Cleanup**: Automatic file cleanup with TTL tracking
-- 📊 **Monitoring**: Health checks, metrics, and system status
-- 🔐 **Admin Dashboard**: Secure admin panel with authentication
-- 📝 **Request Logging**: Structured logging with unique request IDs
-
-## 🚀 Quick Start
-
-### Installation
-
-```powershell
-# Clone the repository
-git clone https://gitea.parsanet.org/sepehr/office_translator.git
-cd office_translator
-
-# Create virtual environment
-python -m venv venv
-.\venv\Scripts\Activate.ps1
-
-# Install dependencies
-pip install -r requirements.txt
-
-# Run the API
-python main.py
-```
-
-The API starts on `http://localhost:8000`
-
-### Frontend Setup
-
-```powershell
-cd frontend
-npm install
-npm run dev
-```
-
-Frontend runs on `http://localhost:3000`
-
-## 📚 API Documentation
-
-- **Swagger UI**: http://localhost:8000/docs
-- **ReDoc**: http://localhost:8000/redoc
-
-## 🔧 API Endpoints
-
-### Translation
-
-#### POST /translate
-Translate a document with full customization.
-
-```bash
-curl -X POST "http://localhost:8000/translate" \
-  -F "file=@document.xlsx" \
-  -F "target_language=en" \
-  -F "provider=ollama" \
-  -F "ollama_model=gemma3:12b" \
-  -F "translate_images=true" \
-  -F "system_prompt=You are translating HVAC documents."
-```
-
-### Monitoring
-
-#### GET /health
-Comprehensive health check with system status.
-
-```json
-{
-  "status": "healthy",
-  "translation_service": "google",
-  "memory": {"system_percent": 34.1, "system_available_gb": 61.7},
-  "disk": {"total_files": 0, "total_size_mb": 0},
-  "cleanup_service": {"is_running": true}
-}
-```
-
-#### GET /metrics
-System metrics and statistics.
-
-#### GET /rate-limit/status
-Current rate limit status for the requesting client.
-
-### Admin Endpoints (Authentication Required)
-
-#### POST /admin/login
-Login to admin dashboard.
-
-```bash
-curl -X POST "http://localhost:8000/admin/login" \
-  -F "username=admin" \
-  -F "password=your_password"
-```
-
-Response:
-```json
-{
-  "status": "success",
-  "token": "your_bearer_token",
-  "expires_in": 86400
-}
-```
-
-#### GET /admin/dashboard
-Get comprehensive dashboard data (requires Bearer token).
-
-```bash
-curl "http://localhost:8000/admin/dashboard" \
-  -H "Authorization: Bearer your_token"
-```
-
-#### POST /admin/cleanup/trigger
-Manually trigger file cleanup.
-
-#### GET /admin/files/tracked
-List currently tracked files.
-
-## 🌐 Supported Languages
-
-| Code | Language | Code | Language |
-|------|----------|------|----------|
-| en | English | fr | French |
-| fa | Persian/Farsi | es | Spanish |
-| de | German | it | Italian |
-| pt | Portuguese | ru | Russian |
-| zh | Chinese | ja | Japanese |
-| ko | Korean | ar | Arabic |
-
-## ⚙️ Configuration
-
-### Environment Variables (.env)
-
-1. **Copy** `.env.example` to `.env`: `cp .env.example .env`
-2. **Fill** required variables (see comments in `.env.example`: Required vs Optional).
-3. In **production** (`ENV=production`), missing required vars (e.g. `JWT_SECRET_KEY`, `ADMIN_USERNAME`, `ADMIN_PASSWORD` or `ADMIN_PASSWORD_HASH`, `ADMIN_TOKEN_SECRET`, `DATABASE_URL`, and `REDIS_URL` if rate limiting is on) cause the app to **fail at startup** with a clear message listing them (Story 6.6, NFR10).
-
-```env
-# ============== Translation Services ==============
-TRANSLATION_SERVICE=google
-DEEPL_API_KEY=your_deepl_api_key_here
-
-# Ollama Configuration
-OLLAMA_BASE_URL=http://localhost:11434
-OLLAMA_MODEL=llama3
-OLLAMA_VISION_MODEL=llava
-
-# ============== File Limits ==============
-MAX_FILE_SIZE_MB=50
-
-# ============== Rate Limiting (SaaS) ==============
-RATE_LIMIT_ENABLED=true
-RATE_LIMIT_PER_MINUTE=30
-RATE_LIMIT_PER_HOUR=200
-TRANSLATIONS_PER_MINUTE=10
-TRANSLATIONS_PER_HOUR=50
-MAX_CONCURRENT_TRANSLATIONS=5
-
-# ============== Cleanup Service ==============
-CLEANUP_ENABLED=true
-CLEANUP_INTERVAL_MINUTES=15
-FILE_TTL_MINUTES=60
-INPUT_FILE_TTL_MINUTES=30
-OUTPUT_FILE_TTL_MINUTES=120
-
-# ============== Security ==============
-# When behind Nginx (production), HSTS is set by the proxy; ENABLE_HSTS applies when running the app without a reverse proxy (e.g. dev).
-ENABLE_HSTS=false
-# Use "*" only for local development; set explicit origins in production (see .env.example).
-CORS_ORIGINS=*
-
-# ============== Admin Authentication ==============
-ADMIN_USERNAME=admin
-ADMIN_PASSWORD=changeme123 # Change in production!
-# Or use a SHA256 hash:
-# ADMIN_PASSWORD_HASH=your_sha256_hash
-
-# ============== Monitoring ==============
-LOG_LEVEL=INFO
-ENABLE_REQUEST_LOGGING=true
-MAX_MEMORY_PERCENT=80
-```
-
-### Ollama Setup
-
-```bash
-# Install Ollama (Windows)
-winget install Ollama.Ollama
-
-# Pull a model
-ollama pull llama3.2
-
-# For vision/image translation
-ollama pull gemma3:12b
-# or
-ollama pull qwen3-vl:8b
-```
-
-## 🎯 Using System Prompts & Glossary
-
-### Example: HVAC Translation
-
-**System Prompt:**
-```
-You are translating HVAC technical documents.
-Use precise technical terminology.
-Keep unit measurements (kW, m³/h, Pa) unchanged.
-```
-
-**Glossary:**
-```
-batterie=coil
-groupe froid=chiller
-CTA=AHU (Air Handling Unit)
-échangeur=heat exchanger
-vanne 3 voies=3-way valve
-```
-
-### Presets Available
-- 🔧 **HVAC**: Heating, Ventilation, Air Conditioning
-- 💻 **IT**: Software and technology
-- ⚖️ **Legal**: Legal documents
-- 🏥 **Medical**: Healthcare terminology
-
-## Admin Dashboard
-
-Access the admin dashboard at `/admin` in the frontend. Features:
-
-- **System Status**: Health, uptime, and issues
-- **Memory & Disk Monitoring**: Real-time usage stats
-- **Translation Statistics**: Total translations, success rate
-- **Rate Limit Management**: View active clients and limits
-- **Cleanup Service**: Monitor and trigger manual cleanup
-
-### Default Credentials
-- **Username**: admin
-- **Password**: changeme123
-
-⚠️ **Change the default password in production!**
-
-## 🏗️ Project Structure
-
-```
-Translate/
-├── main.py                    # FastAPI application with SaaS features
-├── config.py                  # Configuration with SaaS settings
-├── requirements.txt           # Dependencies
-├── mcp_server.py              # MCP server implementation
-├── middleware/                # SaaS middleware
-│   ├── __init__.py
-│   ├── rate_limiting.py       # Rate limiting with token bucket
-│   ├── validation.py          # Input validation
-│   ├── security.py            # Security headers & logging
-│   └── cleanup.py             # Auto cleanup service
-├── services/
-│   └── translation_service.py # Translation providers
-├── translators/
-│   ├── excel_translator.py    # Excel with image support
-│   ├── word_translator.py     # Word with image support
-│   └── pptx_translator.py     # PowerPoint with image support
-├── frontend/                  # Next.js frontend
-│   ├── src/
-│   │   ├── app/
-│   │   │   ├── page.tsx       # Main translation page
-│   │   │   ├── admin/         # Admin dashboard
-│   │   │   └── settings/      # Settings pages
-│   │   └── components/
-│   └── package.json
-├── static/
-│   └── webllm.html            # WebLLM standalone interface
-├── uploads/                   # Temporary uploads (auto-cleaned)
-└── outputs/                   # Translated files (auto-cleaned)
-```
-
-## 🛠️ Tech Stack
-
-### Backend
-- **FastAPI**: Modern async web framework
-- **openpyxl**: Excel manipulation
-- **python-docx**: Word documents
-- **python-pptx**: PowerPoint presentations
-- **deep-translator**: Google/DeepL/Libre translation
-- **psutil**: System monitoring
-- **python-magic**: File type validation
-
-### Frontend
-- **Next.js 15**: React framework
-- **Tailwind CSS**: Styling
-- **Lucide Icons**: Icon library
-- **WebLLM**: Browser-based LLM
-
-## 🔌 MCP Integration
-
-This API can be used as an MCP (Model Context Protocol) server for AI assistants.
-
-### VS Code Configuration
-
-Add to your VS Code `settings.json` or `.vscode/mcp.json`:
-
-```json
-{
-  "servers": {
-    "document-translator": {
-      "type": "stdio",
-      "command": "python",
-      "args": ["mcp_server.py"],
-      "cwd": "D:/Translate",
-      "env": {
-        "PYTHONPATH": "D:/Translate"
-      }
-    }
-  }
-}
-```
-
-## 🚀 Production Deployment
-
-### Security Checklist
-- [ ] Change `ADMIN_PASSWORD` or set `ADMIN_PASSWORD_HASH`
-- [ ] Set `CORS_ORIGINS` to your frontend domain
-- [ ] Enable `ENABLE_HSTS=true` if using HTTPS (when not behind Nginx; behind Nginx, HSTS is set by the proxy)
-- [ ] Configure rate limits appropriately
-- [ ] Set up log rotation for `logs/` directory
-- [ ] Use a reverse proxy (nginx/traefik) for HTTPS
-
-### Docker Deployment
-
-In production, use the Docker Compose stack with Nginx as a reverse proxy (ports 80/443):
-
-```bash
-# With SSL certificates in docker/nginx/ssl/ (see DEPLOYMENT_GUIDE.md)
-docker compose up -d
-```
-
-- **Nginx**: TLS termination, HTTP→HTTPS, HSTS, routing `/api/*` → backend, `/*` → frontend (Story 6.5).
-- The backend and frontend are not exposed on the host; all traffic goes through the proxy.
-- Details: [DEPLOYMENT_GUIDE.md](./DEPLOYMENT_GUIDE.md) (SSL/TLS, `/health` health check, environment variables).
-
-## 📝 License
-
-MIT License
-
-## 🤝 Contributing
-
-Contributions welcome! Please submit a Pull Request.
+This file serves as the **central portal** to all of the application's technical documentation and its operations, deployment, and recovery guides.
 
 ---
 
-**Built with ❤️ using Python, FastAPI, Next.js, and Ollama**
+## 🗺️ Documentation Map
+
+Use the links below to go directly to the specialized guides:
+
+### 🚀 Getting Started & Usage
+* 📥 **[Quick Start Guide (QUICKSTART.md)](./QUICKSTART.md)**: Local installation, Ollama configuration, and running the project in development.
+* 📖 **[API Usage Guide (GUIDE_UTILISATION.md)](./GUIDE_UTILISATION.md)**: Example translation requests, glossary management, and technical presets (HVAC, IT, Legal, etc.).
+* 💡 **[Quick Reference Sheet (QUICK_REFERENCE.md)](./QUICK_REFERENCE.md)**: Useful commands, network ports, default credentials, and quick tips.
+
+### 🏗️ Architecture & Design
+* 📐 **[Software Architecture (ARCHITECTURE.md)](./ARCHITECTURE.md)**: Technology choices, data flows, file lifecycle management, security (HSTS, CSP), and monitoring.
+
+### 🌐 Production Deployment
+* 🏢 **[General Deployment Guide (DEPLOYMENT_GUIDE.md)](./DEPLOYMENT_GUIDE.md)**: Standard guide for production deployment with Docker behind an Nginx reverse proxy with SSL.
+* 🏠 **[Homelab Deployment Guide (DEPLOYMENT_HOMELAB.md)](./DEPLOYMENT_HOMELAB.md)**: Step-by-step guide configured specifically for the homelab network (NPM, Stripe, IONOS DNS, Gitea CI/CD, NAS backup).
+* ☁️ **[Deploying on IONOS (DEPLOY_IONOS.md)](./DEPLOY_IONOS.md)**: Specific instructions for deploying the infrastructure on an IONOS virtual server.
+
+### 🛡️ Backup, Resilience & Disaster Recovery
+* 💾 **[Disaster Recovery Plan (DISASTER_RECOVERY.md)](./DISASTER_RECOVERY.md)**: Complete guide to the automatic backup system to the NAS over SSH/rsync, monitoring, and failover automation.
+* 🔄 **[Simplified Restore Procedure (PROCEDURE_RESTAURATION.md)](./PROCEDURE_RESTAURATION.md)**: Step-by-step emergency guide for restoring data on the active server or switching over to the standby server (`.98`) in 20 minutes.
+
+---
+
+## ✨ Key Features
+
+### 🔄 Multiple Translation Providers
+The application supports the following translation engines, switchable on the fly:
+* **Google Translate** (free, fast, the default)
+* **Ollama** (local LLM for full privacy, e.g. `gemma3`, `llama3.2`)
+* **DeepL API** (high quality, for enterprise use)
+* **OpenAI** (GPT-4o models, vision support)
+* **DeepSeek, OpenRouter, LibreTranslate, WebLLM**
+
+### 📁 Format-Aware Translation by File Type
+* **Excel (.xlsx)**: Preserves merged cells, formulas, fonts, and border styles, and also translates text contained in images (via vision models).
+* **Word (.docx)**: Preserves headers, footers, tables, bulleted lists, and paragraph formatting.
+* **PowerPoint (.pptx)**: Preserves slide layouts, animations, and transitions.
+
+### 🏢 Security & Operations (SaaS-Ready)
+* 🚦 **Rate Limiting**: Per client IP, using a Token Bucket algorithm backed by Redis.
+* 🧹 **Auto Cleanup**: Temporary files are deleted automatically once their time-to-live (TTL) expires.
+* 📊 **Full Monitoring**: Detailed `/health` route and Prometheus + Grafana integration for tracking hardware and software performance.
+
+---
+
+## 🛠️ Tech Stack
+
+### Backend
+* **FastAPI** (Python 3.11+): Fast, documented asynchronous API (Swagger available at `http://localhost:8000/docs`).
+* **openpyxl**, **python-docx**, **python-pptx**: Document manipulation engines with no dependency on Microsoft Office.
+* **Docker / Docker Compose**: Full isolation of the application, the PostgreSQL database, and the Redis cache.
+
+### Frontend
+* **Next.js 15** (React) & **Tailwind CSS**: Modern, user-friendly, responsive user interface.
+* **Lucide Icons**: Vector icon library.
+
+---
+
+## 🚀 Quick Launch (Dev Mode)
+
+For a full production or homelab deployment, see the deployment guides listed in the documentation map near the top of this README.
+
+```bash
+# 1. Clone the project
+git clone https://gitea.parsanet.org/sepehr/office_translator.git /opt/wordly
+cd /opt/wordly
+
+# 2. Configure the environment
+cp .env.example .env
+# Edit the variables in .env to suit your needs
+
+# 3. Start with Docker Compose
+docker compose up -d --build
+```
+* **API (Backend)**: `http://localhost:8000` (Swagger documentation at `/docs`)
+* **Web Interface (Frontend)**: `http://localhost:3000`
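
Reviewer note, outside the patch: the new README describes rate limiting as a per-client-IP Token Bucket stored in Redis. As a rough illustration of that algorithm only, here is a minimal in-memory sketch. It is not the project's `rate_limiting.py`: the class names `TokenBucket` and `PerClientLimiter`, the parameters, and the use of a plain dict instead of Redis are all assumptions made for the example.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Hypothetical bucket: holds up to `capacity` tokens, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, now: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity  # a new client starts with a full bucket
        self.updated = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerClientLimiter:
    """Hypothetical limiter keeping one bucket per client IP in a dict (the README mentions Redis)."""

    def __init__(self, capacity: float, rate: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_ip: str, now: Optional[float] = None) -> bool:
        if now is None:
            now = time.monotonic()
        bucket = self._buckets.get(client_ip)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.rate, now)
            self._buckets[client_ip] = bucket
        return bucket.allow(now)


if __name__ == "__main__":
    limiter = PerClientLimiter(capacity=2, rate=1.0)
    # Two requests fit the burst capacity; the third is rejected until tokens refill.
    print([limiter.allow("203.0.113.7", now=0.0) for _ in range(3)])
```

Passing `now` explicitly keeps the sketch deterministic for testing. A shared store such as Redis would additionally need the refill-and-consume step to be atomic across workers, which this single-process version does not address.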