📄 Document Translation API
A powerful SaaS-ready Python API for translating complex structured documents (Excel, Word, PowerPoint) while strictly preserving the original formatting, layout, and embedded media.
✨ Features
🔄 Multiple Translation Providers
| Provider | Type | Description |
|---|---|---|
| Google Translate | Cloud | Free, fast, reliable |
| Ollama | Local LLM | Privacy-focused, customizable with system prompts |
| WebLLM | Browser | Runs entirely in browser using WebGPU |
| DeepL | Cloud | High-quality translations (API key required) |
| LibreTranslate | Self-hosted | Open-source alternative |
| OpenAI | Cloud | GPT-4o/4o-mini with vision support |
📊 Excel Translation (.xlsx)
- ✅ Translates all cell content and sheet names
- ✅ Preserves cell merging, formulas, and styles
- ✅ Maintains font styles, colors, and borders
- ✅ Image text extraction with vision models
- ✅ Adds translated image text as comments
📝 Word Translation (.docx)
- ✅ Translates body text, headers, footers, and tables
- ✅ Preserves heading styles and paragraph formatting
- ✅ Maintains lists, images, charts, and SmartArt
- ✅ Image text extraction and translation
📽️ PowerPoint Translation (.pptx)
- ✅ Translates slide titles, body text, and speaker notes
- ✅ Preserves slide layouts, transitions, and animations
- ✅ Image text extraction with text boxes added below images
- ✅ Keeps layering order and positions
🧠 LLM Features (Ollama/WebLLM/OpenAI)
- ✅ Custom System Prompts: Provide context for better translations
- ✅ Technical Glossary: Define term mappings (e.g., batterie=coil)
- ✅ Presets: HVAC, IT, Legal, Medical terminology
- ✅ Vision Models: Translate text within images (gemma3, qwen3-vl, gpt-4o)
🏢 SaaS-Ready Features
- 🚦 Rate Limiting: Per-client-IP limits using token bucket and sliding window algorithms
- 🔒 Security Headers: CSP, XSS protection, HSTS support
- 🧹 Auto Cleanup: Automatic file cleanup with TTL tracking
- 📊 Monitoring: Health checks, metrics, and system status
- 🔐 Admin Dashboard: Secure admin panel with authentication
- 📝 Request Logging: Structured logging with unique request IDs
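The token bucket mentioned under Rate Limiting can be sketched as follows. This is a minimal illustration, not the code in middleware/rate_limiting.py: the class names TokenBucket and PerIpLimiter are hypothetical, and the clock is injectable so the behaviour can be checked without sleeping.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Refills `rate` tokens per second up to `capacity`; each request costs one token."""

    def __init__(self, capacity: int, rate: float, clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)  # start full so a new client can burst
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerIpLimiter:
    """Keeps one bucket per client IP (hypothetical helper, not the project's class)."""

    def __init__(self, per_minute: int, clock: Callable[[], float] = time.monotonic):
        self.per_minute = per_minute
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, ip: str) -> bool:
        bucket = self.buckets.get(ip)
        if bucket is None:
            # e.g. RATE_LIMIT_PER_MINUTE=30 -> burst of 30, refill 0.5 token/s
            bucket = TokenBucket(self.per_minute, self.per_minute / 60.0, self.clock)
            self.buckets[ip] = bucket
        return bucket.allow()
```

A bucket allows short bursts up to its capacity while still enforcing the long-run average rate, which suits interactive clients that upload several files at once.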
🚀 Quick Start
Installation
# Clone the repository
git clone https://gitea.parsanet.org/sepehr/office_translator.git
cd office_translator
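# Create and activate a virtual environment (Windows PowerShell shown)
python -m venv venv
.\venv\Scripts\Activate.ps1
# On Linux/macOS, activate with: source venv/bin/activate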
# Install dependencies
pip install -r requirements.txt
# Run the API
python main.py
The API starts on http://localhost:8000
Frontend Setup
cd frontend
npm install
npm run dev
Frontend runs on http://localhost:3000
📚 API Documentation
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
🔧 API Endpoints
Translation
POST /translate
Translate a document with full customization.
curl -X POST "http://localhost:8000/translate" \
-F "file=@document.xlsx" \
-F "target_language=en" \
-F "provider=ollama" \
-F "ollama_model=gemma3:12b" \
-F "translate_images=true" \
-F "system_prompt=You are translating HVAC documents."
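The same request can be sent from Python with only the standard library. This is a sketch: build_translate_request and translate_file are hypothetical helpers, and translate_file assumes /translate returns the translated document as the response body (check the Swagger UI for the actual response shape).

```python
import os
import urllib.request
import uuid
from typing import Dict


def build_translate_request(base_url: str, filename: str, content: bytes,
                            fields: Dict[str, str]) -> urllib.request.Request:
    """Build a multipart/form-data POST to /translate (no network I/O here)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        # One form field per part, mirroring curl's -F "name=value".
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # The uploaded document, mirroring curl's -F "file=@document.xlsx".
    parts.append(
        (f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         'Content-Type: application/octet-stream\r\n\r\n').encode() + content + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        f"{base_url}/translate",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def translate_file(path: str, out_path: str, **fields: str) -> None:
    """Upload `path` and save the response body (assumed to be the translated file)."""
    with open(path, "rb") as src:
        req = build_translate_request("http://localhost:8000", os.path.basename(path), src.read(), fields)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as dst:
        dst.write(resp.read())
```

For example, translate_file("document.xlsx", "document_en.xlsx", target_language="en", provider="ollama", ollama_model="gemma3:12b").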
Monitoring
GET /health
Comprehensive health check with system status.
{
"status": "healthy",
"translation_service": "google",
"memory": {"system_percent": 34.1, "system_available_gb": 61.7},
"disk": {"total_files": 0, "total_size_mb": 0},
"cleanup_service": {"is_running": true}
}
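An uptime script or load-balancer probe could interpret this payload as below. The field names come from the sample response; the pass/fail rules (status must be "healthy", memory below MAX_MEMORY_PERCENT, cleanup running) are an assumption about what "healthy" should mean for a deployment, not the server's own logic.

```python
import json
import urllib.request


def fetch_health(base_url: str = "http://localhost:8000") -> dict:
    """GET /health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.load(resp)


def is_healthy(payload: dict, max_memory_percent: float = 80.0) -> bool:
    """Return True when the /health payload reports a healthy, non-saturated service."""
    if payload.get("status") != "healthy":
        return False
    # MAX_MEMORY_PERCENT defaults to 80 in the sample .env.
    if payload.get("memory", {}).get("system_percent", 0.0) >= max_memory_percent:
        return False
    # Assumes the cleanup service should be running (CLEANUP_ENABLED=true).
    return bool(payload.get("cleanup_service", {}).get("is_running", False))
```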
GET /metrics
System metrics and statistics.
GET /rate-limit/status
Current rate limit status for the requesting client.
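The sliding-window side of the rate limiter, and the kind of numbers a status endpoint can report, can be sketched with a per-client log of timestamps. SlidingWindowLimiter is a hypothetical name, and the shape of the dict returned by status() is an assumption, not the documented /rate-limit/status response.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Sliding-window log: at most `limit` requests per key in any `window` seconds."""

    def __init__(self, limit: int, window: float, clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.hits: Dict[str, Deque[float]] = {}

    def _prune(self, key: str, now: float) -> Deque[float]:
        q = self.hits.setdefault(key, deque())
        # Drop timestamps that have slid out of the window.
        while q and q[0] <= now - self.window:
            q.popleft()
        return q

    def allow(self, key: str) -> bool:
        now = self.clock()
        q = self._prune(key, now)
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True

    def status(self, key: str) -> dict:
        """Remaining quota for this key (response shape is hypothetical)."""
        q = self._prune(key, self.clock())
        return {"limit": self.limit, "remaining": self.limit - len(q), "window_seconds": self.window}
```

To mirror TRANSLATIONS_PER_MINUTE=10 and TRANSLATIONS_PER_HOUR=50, one would keep two limiters (window 60 and 3600) and accept a translation only if both allow it; checking the hourly one first avoids recording a minute-level hit for a request the hourly limit then rejects.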
Admin Endpoints (Authentication Required)
POST /admin/login
Login to admin dashboard.
curl -X POST "http://localhost:8000/admin/login" \
-F "username=admin" \
-F "password=your_password"
Response:
{
"status": "success",
"token": "your_bearer_token",
"expires_in": 86400
}
GET /admin/dashboard
Get comprehensive dashboard data (requires Bearer token).
curl "http://localhost:8000/admin/dashboard" \
-H "Authorization: Bearer your_token"
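The login-then-dashboard flow can be scripted with the standard library. This sketch assumes /admin/login reads ordinary form fields (the curl example uses -F; FastAPI Form parameters accept both multipart and URL-encoded bodies) and that the token is returned under "token" as in the sample response. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"


def login_request(username: str, password: str) -> urllib.request.Request:
    """POST /admin/login with username/password form fields."""
    body = urllib.parse.urlencode({"username": username, "password": password}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/admin/login", data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"}, method="POST")


def dashboard_request(token: str) -> urllib.request.Request:
    """GET /admin/dashboard authenticated with the bearer token from the login response."""
    return urllib.request.Request(f"{BASE_URL}/admin/dashboard",
                                  headers={"Authorization": f"Bearer {token}"})


def get_dashboard(username: str, password: str) -> dict:
    with urllib.request.urlopen(login_request(username, password)) as resp:
        token = json.load(resp)["token"]  # sample response: expires_in=86400 s (24 h)
    with urllib.request.urlopen(dashboard_request(token)) as resp:
        return json.load(resp)
```

A long-running script should log in again once expires_in has elapsed rather than reuse the token indefinitely.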
POST /admin/cleanup/trigger
Manually trigger file cleanup.
GET /admin/files/tracked
List currently tracked files.
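The TTL-based cleanup behind these endpoints comes down to comparing each tracked file's age with the TTL for its kind. The sketch below uses the TTL values from the sample .env; the TrackedFile record and the "input"/"output" kinds are hypothetical, and treating FILE_TTL_MINUTES as the fallback for other kinds is an assumption.

```python
import time
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Values from the sample .env (minutes).
FILE_TTL_MINUTES = 60
INPUT_FILE_TTL_MINUTES = 30
OUTPUT_FILE_TTL_MINUTES = 120


@dataclass
class TrackedFile:
    path: str
    kind: str          # "input" (uploads/) or "output" (outputs/); hypothetical field
    created_at: float  # Unix timestamp when the file was registered


def expired_files(files: Iterable[TrackedFile], now: Optional[float] = None) -> List[str]:
    """Return the paths whose age exceeds the TTL for their kind."""
    now = time.time() if now is None else now
    ttl = {"input": INPUT_FILE_TTL_MINUTES * 60, "output": OUTPUT_FILE_TTL_MINUTES * 60}
    return [f.path for f in files
            if now - f.created_at > ttl.get(f.kind, FILE_TTL_MINUTES * 60)]
```

A background task running every CLEANUP_INTERVAL_MINUTES would then delete these paths and stop tracking them; a manual trigger simply runs the same pass immediately.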
🌐 Supported Languages
| Code | Language | Code | Language |
|---|---|---|---|
| en | English | fr | French |
| fa | Persian/Farsi | es | Spanish |
| de | German | it | Italian |
| pt | Portuguese | ru | Russian |
| zh | Chinese | ja | Japanese |
| ko | Korean | ar | Arabic |
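A client or form validator can mirror this table to reject unsupported target_language values before uploading. The dictionary below is copied from the table; validate_target_language is a hypothetical helper, and the server may accept additional codes depending on the provider.

```python
SUPPORTED_LANGUAGES = {
    "en": "English", "fr": "French", "fa": "Persian/Farsi", "es": "Spanish",
    "de": "German", "it": "Italian", "pt": "Portuguese", "ru": "Russian",
    "zh": "Chinese", "ja": "Japanese", "ko": "Korean", "ar": "Arabic",
}


def validate_target_language(code: str) -> str:
    """Normalize a language code and fail fast if it is not in the table."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"Unsupported target_language {code!r}; expected one of {sorted(SUPPORTED_LANGUAGES)}"
        )
    return normalized
```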
⚙️ Configuration
Environment Variables (.env)
- Copy .env.example to .env: cp .env.example .env
- Fill in the required variables (see the Required vs Optional comments in .env.example).
- In production (ENV=production), missing required variables (e.g. JWT_SECRET_KEY, ADMIN_USERNAME, ADMIN_PASSWORD or ADMIN_PASSWORD_HASH, ADMIN_TOKEN_SECRET, DATABASE_URL, and REDIS_URL if rate limiting is enabled) cause the app to fail at startup with a clear message listing them (Story 6.6, NFR10).
# ============== Translation Services ==============
TRANSLATION_SERVICE=google
DEEPL_API_KEY=your_deepl_api_key_here
# Ollama Configuration
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3
OLLAMA_VISION_MODEL=llava
# ============== File Limits ==============
MAX_FILE_SIZE_MB=50
# ============== Rate Limiting (SaaS) ==============
RATE_LIMIT_ENABLED=true
RATE_LIMIT_PER_MINUTE=30
RATE_LIMIT_PER_HOUR=200
TRANSLATIONS_PER_MINUTE=10
TRANSLATIONS_PER_HOUR=50
MAX_CONCURRENT_TRANSLATIONS=5
# ============== Cleanup Service ==============
CLEANUP_ENABLED=true
CLEANUP_INTERVAL_MINUTES=15
FILE_TTL_MINUTES=60
INPUT_FILE_TTL_MINUTES=30
OUTPUT_FILE_TTL_MINUTES=120
# ============== Security ==============
# When behind Nginx (production), HSTS is set by the proxy; ENABLE_HSTS applies when running the app without a reverse proxy (e.g. dev).
ENABLE_HSTS=false
# Use "*" only for local development; set explicit origins in production (see .env.example).
CORS_ORIGINS=*
# ============== Admin Authentication ==============
ADMIN_USERNAME=admin
ADMIN_PASSWORD=changeme123 # Change in production!
# Or use a SHA256 hash:
# ADMIN_PASSWORD_HASH=your_sha256_hash
# ============== Monitoring ==============
LOG_LEVEL=INFO
ENABLE_REQUEST_LOGGING=true
MAX_MEMORY_PERCENT=80
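The production startup check described at the start of this section can be sketched as a pure function over the environment. The variable names come from the list above; the function names are hypothetical, and defaulting RATE_LIMIT_ENABLED to "true" when unset is an assumption.

```python
import os
from typing import List, Mapping

# Always required in production, per the list above.
ALWAYS_REQUIRED = ["JWT_SECRET_KEY", "ADMIN_USERNAME", "ADMIN_TOKEN_SECRET", "DATABASE_URL"]


def missing_required_vars(env: Mapping[str, str]) -> List[str]:
    missing = [name for name in ALWAYS_REQUIRED if not env.get(name)]
    # Either the plain password or its hash satisfies the admin credential requirement.
    if not (env.get("ADMIN_PASSWORD") or env.get("ADMIN_PASSWORD_HASH")):
        missing.append("ADMIN_PASSWORD or ADMIN_PASSWORD_HASH")
    # REDIS_URL is only needed when rate limiting is on.
    if env.get("RATE_LIMIT_ENABLED", "true").lower() == "true" and not env.get("REDIS_URL"):
        missing.append("REDIS_URL")
    return missing


def check_env_or_exit(env: Mapping[str, str] = os.environ) -> None:
    """Abort startup in production if anything required is missing."""
    if env.get("ENV") != "production":
        return  # development: tolerate missing values
    missing = missing_required_vars(env)
    if missing:
        raise SystemExit("Missing required environment variables: " + ", ".join(missing))
```

Collecting every missing name before exiting gives the operator one complete message instead of a fix-and-restart loop.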
Ollama Setup
# Install Ollama (Windows)
winget install Ollama.Ollama
# Pull a model
ollama pull llama3.2
# For vision/image translation
ollama pull gemma3:12b
# or
ollama pull qwen3-vl:8b
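Once a model is pulled, Ollama serves it over its REST API on OLLAMA_BASE_URL. The sketch below calls Ollama's /api/generate endpoint directly (fields model, prompt, system, images, stream) and is independent of the project's translation_service.py; the helper names and the prompt wording are assumptions.

```python
import base64
import json
import urllib.request
from typing import List, Optional

OLLAMA_BASE_URL = "http://localhost:11434"


def build_generate_request(model: str, prompt: str, system: str = "",
                           images: Optional[List[bytes]] = None) -> urllib.request.Request:
    payload = {"model": model, "prompt": prompt, "stream": False}
    if system:
        payload["system"] = system  # overrides the model's default system prompt
    if images:
        # Vision models (e.g. gemma3:12b, qwen3-vl:8b) take base64-encoded images.
        payload["images"] = [base64.b64encode(img).decode("ascii") for img in images]
    return urllib.request.Request(
        f"{OLLAMA_BASE_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate_with_ollama(text: str, target: str, model: str = "llama3.2", system: str = "") -> str:
    """Ask the model for a translation; with stream=False the text is in "response"."""
    prompt = f"Translate into {target}. Reply with the translation only:\n{text}"
    with urllib.request.urlopen(build_generate_request(model, prompt, system)) as resp:
        return json.load(resp)["response"]
```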
🎯 Using System Prompts & Glossary
Example: HVAC Translation
System Prompt:
You are translating HVAC technical documents.
Use precise technical terminology.
Keep unit measurements (kW, m³/h, Pa) unchanged.
Glossary:
batterie=coil
groupe froid=chiller
CTA=AHU (Air Handling Unit)
échangeur=heat exchanger
vanne 3 voies=3-way valve
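One way to feed the glossary to an LLM is to parse the source=target lines and append them to the system prompt as mandatory term mappings. This is a sketch under that assumption; parse_glossary and build_system_prompt are hypothetical helpers, and the exact wording the project uses may differ.

```python
from typing import Dict


def parse_glossary(text: str) -> Dict[str, str]:
    """Parse 'source=target' lines; blank lines and lines without '=' are skipped."""
    glossary = {}
    for line in text.splitlines():
        source, sep, target = line.partition("=")  # split on the first '=' only
        if sep and source.strip() and target.strip():
            glossary[source.strip()] = target.strip()
    return glossary


def build_system_prompt(base_prompt: str, glossary: Dict[str, str]) -> str:
    """Append the glossary to the system prompt as required term mappings."""
    if not glossary:
        return base_prompt
    terms = "\n".join(f"- {src} -> {dst}" for src, dst in glossary.items())
    return f"{base_prompt}\n\nAlways translate these terms exactly as listed:\n{terms}"
```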
Presets Available
- 🔧 HVAC: Heating, Ventilation, Air Conditioning
- 💻 IT: Software and technology
- ⚖️ Legal: Legal documents
- 🏥 Medical: Healthcare terminology
Admin Dashboard
Access the admin dashboard at /admin in the frontend. Features:
- System Status: Health, uptime, and issues
- Memory & Disk Monitoring: Real-time usage stats
- Translation Statistics: Total translations, success rate
- Rate Limit Management: View active clients and limits
- Cleanup Service: Monitor and trigger manual cleanup
Default Credentials
- Username: admin
- Password: changeme123
⚠️ Change the default password in production!
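Instead of a plain-text ADMIN_PASSWORD, the configuration accepts ADMIN_PASSWORD_HASH. The sketch assumes the expected value is the hex-encoded SHA-256 digest of the UTF-8 password (check config.py before relying on this); the helper names are hypothetical. Because an unsalted SHA-256 digest is fast to brute-force, a long random password matters more here than usual.

```python
import hashlib
import hmac


def sha256_hex(password: str) -> str:
    """Hex-encoded SHA-256 of the UTF-8 password (assumed ADMIN_PASSWORD_HASH format)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def verify_admin_password(candidate: str, stored_hash: str) -> bool:
    # compare_digest takes time independent of where the strings first differ.
    return hmac.compare_digest(sha256_hex(candidate), stored_hash.strip().lower())
```

To produce the value for .env: python -c "import hashlib, getpass; print(hashlib.sha256(getpass.getpass().encode()).hexdigest())"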
🏗️ Project Structure
Translate/
├── main.py # FastAPI application with SaaS features
├── config.py # Configuration with SaaS settings
├── requirements.txt # Dependencies
├── mcp_server.py # MCP server implementation
├── middleware/ # SaaS middleware
│ ├── __init__.py
│ ├── rate_limiting.py # Rate limiting with token bucket
│ ├── validation.py # Input validation
│ ├── security.py # Security headers & logging
│ └── cleanup.py # Auto cleanup service
├── services/
│ └── translation_service.py # Translation providers
├── translators/
│ ├── excel_translator.py # Excel with image support
│ ├── word_translator.py # Word with image support
│ └── pptx_translator.py # PowerPoint with image support
├── frontend/ # Next.js frontend
│ ├── src/
│ │ ├── app/
│ │ │ ├── page.tsx # Main translation page
│ │ │ ├── admin/ # Admin dashboard
│ │ │ └── settings/ # Settings pages
│ │ └── components/
│ └── package.json
├── static/
│ └── webllm.html # WebLLM standalone interface
├── uploads/ # Temporary uploads (auto-cleaned)
└── outputs/ # Translated files (auto-cleaned)
🛠️ Tech Stack
Backend
- FastAPI: Modern async web framework
- openpyxl: Excel manipulation
- python-docx: Word documents
- python-pptx: PowerPoint presentations
- deep-translator: Google/DeepL/Libre translation
- psutil: System monitoring
- python-magic: File type validation
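python-magic identifies uploads by their byte signature. A standard-library sketch of the same idea: .xlsx, .docx and .pptx are ZIP-based Office Open XML packages, so a valid upload starts with the ZIP signature and contains [Content_Types].xml plus the conventional main part for its type. validate_upload is hypothetical and is not the project's middleware/validation.py.

```python
import io
import os
import zipfile

MAX_FILE_SIZE_MB = 50  # from the sample .env

# Conventional main part of each OOXML package type.
MAIN_PART = {
    ".xlsx": "xl/workbook.xml",
    ".docx": "word/document.xml",
    ".pptx": "ppt/presentation.xml",
}


def validate_upload(filename: str, data: bytes) -> str:
    """Return the extension if the upload looks like a supported Office file, else raise ValueError."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in MAIN_PART:
        raise ValueError(f"Unsupported file type: {filename!r}")
    if len(data) > MAX_FILE_SIZE_MB * 1024 * 1024:
        raise ValueError(f"File exceeds {MAX_FILE_SIZE_MB} MB")
    # ZIP archives begin with the local file header signature PK\x03\x04.
    if not data.startswith(b"PK\x03\x04"):
        raise ValueError("Not a ZIP-based Office document")
    try:
        names = set(zipfile.ZipFile(io.BytesIO(data)).namelist())
    except zipfile.BadZipFile as exc:
        raise ValueError("Corrupt archive") from exc
    # A renamed .docx uploaded as .xlsx fails here, even though the signature matches.
    if "[Content_Types].xml" not in names or MAIN_PART[ext] not in names:
        raise ValueError(f"Archive does not contain a valid {ext} package")
    return ext
```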
Frontend
- Next.js 15: React framework
- Tailwind CSS: Styling
- Lucide Icons: Icon library
- WebLLM: Browser-based LLM
🔌 MCP Integration
This API can be used as an MCP (Model Context Protocol) server for AI assistants.
VS Code Configuration
Add to your VS Code settings.json or .vscode/mcp.json:
{
"servers": {
"document-translator": {
"type": "stdio",
"command": "python",
"args": ["mcp_server.py"],
"cwd": "D:/Translate",
"env": {
"PYTHONPATH": "D:/Translate"
}
}
}
}
🚀 Production Deployment
Security Checklist
- Change ADMIN_PASSWORD or set ADMIN_PASSWORD_HASH
- Set CORS_ORIGINS to your frontend domain
- Set ENABLE_HSTS=true if serving HTTPS without Nginx (behind Nginx, HSTS is set by the proxy)
- Configure rate limits appropriately
- Set up log rotation for the logs/ directory
- Use a reverse proxy (nginx/traefik) for HTTPS
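The header-related checklist items can be sketched as two small functions. The specific header values (CSP policy, HSTS max-age) are illustrative assumptions rather than the values in middleware/security.py, and parse_cors_origins assumes CORS_ORIGINS holds a comma-separated list of origins.

```python
from typing import Dict, List


def build_security_headers(enable_hsts: bool) -> Dict[str, str]:
    headers = {
        "Content-Security-Policy": "default-src 'self'",  # restricts script/style sources (XSS mitigation)
        "X-Content-Type-Options": "nosniff",
        "X-Frame-Options": "DENY",
        "Referrer-Policy": "strict-origin-when-cross-origin",
    }
    if enable_hsts:
        # Only meaningful over HTTPS; behind Nginx the proxy sets this header instead.
        headers["Strict-Transport-Security"] = "max-age=31536000; includeSubDomains"
    return headers


def parse_cors_origins(value: str, env: str) -> List[str]:
    """Split a comma-separated CORS_ORIGINS value; refuse the wildcard in production."""
    origins = [o.strip() for o in value.split(",") if o.strip()]
    if env == "production" and "*" in origins:
        raise ValueError("CORS_ORIGINS='*' is not allowed in production")
    return origins
```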
Docker Deployment
In production, use the Docker Compose stack with Nginx as a reverse proxy (ports 80/443):
# With SSL certificates in docker/nginx/ssl/ (see DEPLOYMENT_GUIDE.md)
docker compose up -d
- Nginx: TLS termination, HTTP→HTTPS redirection, HSTS, and routing of /api/* → backend and /* → frontend (Story 6.5).
- The backend and frontend are not exposed on the host; all traffic goes through the proxy.
- Details: DEPLOYMENT_GUIDE.md (SSL/TLS, /health checks, environment variables).
📝 License
MIT License
🤝 Contributing
Contributions welcome! Please submit a Pull Request.
Built with ❤️ using Python, FastAPI, Next.js, and Ollama