📄 Document Translation API
A powerful SaaS-ready Python API for translating complex structured documents (Excel, Word, PowerPoint) while strictly preserving the original formatting, layout, and embedded media.
✨ Features
🔄 Multiple Translation Providers
| Provider | Type | Description |
|---|---|---|
| Google Translate | Cloud | Free, fast, reliable |
| Ollama | Local LLM | Privacy-focused, customizable with system prompts |
| WebLLM | Browser | Runs entirely in browser using WebGPU |
| DeepL | Cloud | High-quality translations (API key required) |
| LibreTranslate | Self-hosted | Open-source alternative |
| OpenAI | Cloud | GPT-4o/4o-mini with vision support |
📊 Excel Translation (.xlsx)
- ✅ Translates all cell content and sheet names
- ✅ Preserves cell merging, formulas, and styles
- ✅ Maintains font styles, colors, and borders
- ✅ Image text extraction with vision models
- ✅ Adds translated image text as comments
📝 Word Translation (.docx)
- ✅ Translates body text, headers, footers, and tables
- ✅ Preserves heading styles and paragraph formatting
- ✅ Maintains lists, images, charts, and SmartArt
- ✅ Image text extraction and translation
📽️ PowerPoint Translation (.pptx)
- ✅ Translates slide titles, body text, and speaker notes
- ✅ Preserves slide layouts, transitions, and animations
- ✅ Image text extraction with text boxes added below images
- ✅ Keeps layering order and positions
🧠 LLM Features (Ollama/WebLLM/OpenAI)
- ✅ Custom System Prompts: Provide context for better translations
- ✅ Technical Glossary: Define term mappings (e.g., batterie=coil)
- ✅ Presets: HVAC, IT, Legal, Medical terminology
- ✅ Vision Models: Translate text within images (gemma3, qwen3-vl, gpt-4o)
🏢 SaaS-Ready Features
- 🚦 Rate Limiting: Per-client IP with token bucket and sliding window algorithms
- 🔒 Security Headers: CSP, XSS protection, HSTS support
- 🧹 Auto Cleanup: Automatic file cleanup with TTL tracking
- 📊 Monitoring: Health checks, metrics, and system status
- 🔐 Admin Dashboard: Secure admin panel with authentication
- 📝 Request Logging: Structured logging with unique request IDs
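As an illustration of the token bucket approach named under Rate Limiting, here is a minimal sketch. It is not the code in middleware/rate_limiting.py: the TokenBucket class, the allow_request helper, and the injectable clock are hypothetical names chosen for this example, and the default of 30 requests per minute simply mirrors RATE_LIMIT_PER_MINUTE.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `refill_rate` tokens per second."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.tokens = capacity          # start with a full bucket
        self.last_refill = clock()

    def allow(self, cost: float = 1.0) -> bool:
        # Add the tokens earned since the last call, capped at capacity.
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        # Spend tokens if enough are available; otherwise reject the request.
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Hypothetical per-client registry: one bucket per client IP.
_buckets: Dict[str, TokenBucket] = {}


def allow_request(client_ip: str, per_minute: int = 30) -> bool:
    if client_ip not in _buckets:
        _buckets[client_ip] = TokenBucket(capacity=per_minute, refill_rate=per_minute / 60)
    return _buckets[client_ip].allow()
```

Passing the clock as a parameter keeps the limiter deterministic in tests. A production middleware would also need to evict idle buckets so the registry does not grow without bound.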
🚀 Quick Start
Installation
# Clone the repository
git clone https://gitea.parsanet.org/sepehr/office_translator.git
cd office_translator
# Create virtual environment
python -m venv venv
.\venv\Scripts\Activate.ps1   # Windows PowerShell; on Linux/macOS use: source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Run the API
python main.py
The API starts on http://localhost:8000
Frontend Setup
cd frontend
npm install
npm run dev
Frontend runs on http://localhost:3000
📚 API Documentation
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
🔧 API Endpoints
Translation
POST /translate
Translate a document with full customization.
curl -X POST "http://localhost:8000/translate" \
-F "file=@document.xlsx" \
-F "target_language=en" \
-F "provider=ollama" \
-F "ollama_model=gemma3:12b" \
-F "translate_images=true" \
-F "system_prompt=You are translating HVAC documents."
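The same request can be sent from Python. The sketch below assumes the third-party requests package is installed and that the endpoint responds with the translated document as the response body; this README does not document the response format, so treat that part as an assumption. The helper names build_translate_form and translate_document are made up for this example.

```python
from pathlib import Path
from typing import Dict


def build_translate_form(target_language: str, provider: str = "google",
                         **options: str) -> Dict[str, str]:
    """Assemble the non-file form fields for POST /translate."""
    form = {"target_language": target_language, "provider": provider}
    form.update(options)  # e.g. ollama_model, translate_images, system_prompt
    return form


def translate_document(path: str, out_path: str, form: Dict[str, str],
                       base_url: str = "http://localhost:8000") -> None:
    import requests  # third-party: pip install requests

    with open(path, "rb") as fh:
        response = requests.post(
            f"{base_url}/translate",
            data=form,
            files={"file": (Path(path).name, fh)},
            timeout=600,  # LLM-backed translation can be slow
        )
    response.raise_for_status()
    # Assumption: the response body is the translated file itself.
    Path(out_path).write_bytes(response.content)


form = build_translate_form(
    "en",
    provider="ollama",
    ollama_model="gemma3:12b",
    translate_images="true",
    system_prompt="You are translating HVAC documents.",
)
# translate_document("document.xlsx", "document_en.xlsx", form)
```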
Monitoring
GET /health
Comprehensive health check with system status.
{
"status": "healthy",
"translation_service": "google",
"memory": {"system_percent": 34.1, "system_available_gb": 61.7},
"disk": {"total_files": 0, "total_size_mb": 0},
"cleanup_service": {"is_running": true}
}
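A monitoring script could interpret this payload roughly as follows. This is a client-side sketch only: the health_problems function is hypothetical, the field names come from the sample response above, and the 80 percent default is borrowed from MAX_MEMORY_PERCENT without any claim that the server applies it the same way.

```python
import json
from typing import List


def health_problems(payload: dict, max_memory_percent: float = 80.0) -> List[str]:
    """Return a list of human-readable problems found in a /health response."""
    problems = []
    if payload.get("status") != "healthy":
        problems.append(f"status is {payload.get('status')!r}")
    if payload.get("memory", {}).get("system_percent", 0.0) > max_memory_percent:
        problems.append("memory usage above threshold")
    if not payload.get("cleanup_service", {}).get("is_running", False):
        problems.append("cleanup service is not running")
    return problems


sample = json.loads("""
{
  "status": "healthy",
  "translation_service": "google",
  "memory": {"system_percent": 34.1, "system_available_gb": 61.7},
  "disk": {"total_files": 0, "total_size_mb": 0},
  "cleanup_service": {"is_running": true}
}
""")
print(health_problems(sample))  # prints []
```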
GET /metrics
System metrics and statistics.
GET /rate-limit/status
Current rate limit status for the requesting client.
Admin Endpoints (Authentication Required)
POST /admin/login
Login to admin dashboard.
curl -X POST "http://localhost:8000/admin/login" \
-F "username=admin" \
-F "password=your_password"
Response:
{
"status": "success",
"token": "your_bearer_token",
"expires_in": 86400
}
GET /admin/dashboard
Get comprehensive dashboard data (requires Bearer token).
curl "http://localhost:8000/admin/dashboard" \
-H "Authorization: Bearer your_token"
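The login and dashboard calls can be chained from Python using only the standard library. The sketch below assumes the login endpoint accepts URL-encoded form fields as well as the multipart form that curl -F sends, which is not confirmed here. The function names are illustrative.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"


def login_request(username: str, password: str) -> urllib.request.Request:
    # Assumption: the server accepts application/x-www-form-urlencoded bodies.
    body = urllib.parse.urlencode({"username": username, "password": password}).encode()
    return urllib.request.Request(f"{BASE_URL}/admin/login", data=body, method="POST")


def dashboard_request(token: str) -> urllib.request.Request:
    return urllib.request.Request(
        f"{BASE_URL}/admin/dashboard",
        headers={"Authorization": f"Bearer {token}"},
    )


def fetch_json(request: urllib.request.Request) -> dict:
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def fetch_dashboard(username: str, password: str) -> dict:
    # The "token" field is taken from the login response shown above.
    token = fetch_json(login_request(username, password))["token"]
    return fetch_json(dashboard_request(token))
```

Since the login response reports expires_in as 86400 seconds, a long-running client would need to log in again about once a day.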
POST /admin/cleanup/trigger
Manually trigger file cleanup.
GET /admin/files/tracked
List currently tracked files.
🌐 Supported Languages
| Code | Language | Code | Language |
|---|---|---|---|
| en | English | fr | French |
| fa | Persian/Farsi | es | Spanish |
| de | German | it | Italian |
| pt | Portuguese | ru | Russian |
| zh | Chinese | ja | Japanese |
| ko | Korean | ar | Arabic |
⚙️ Configuration
Environment Variables (.env)
# ============== Translation Services ==============
TRANSLATION_SERVICE=google
DEEPL_API_KEY=your_deepl_api_key_here
# Ollama Configuration
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3
OLLAMA_VISION_MODEL=llava
# ============== File Limits ==============
MAX_FILE_SIZE_MB=50
# ============== Rate Limiting (SaaS) ==============
RATE_LIMIT_ENABLED=true
RATE_LIMIT_PER_MINUTE=30
RATE_LIMIT_PER_HOUR=200
TRANSLATIONS_PER_MINUTE=10
TRANSLATIONS_PER_HOUR=50
MAX_CONCURRENT_TRANSLATIONS=5
# ============== Cleanup Service ==============
CLEANUP_ENABLED=true
CLEANUP_INTERVAL_MINUTES=15
FILE_TTL_MINUTES=60
INPUT_FILE_TTL_MINUTES=30
OUTPUT_FILE_TTL_MINUTES=120
# ============== Security ==============
ENABLE_HSTS=false
CORS_ORIGINS=*
# ============== Admin Authentication ==============
ADMIN_USERNAME=admin
ADMIN_PASSWORD=changeme123 # Change in production!
# Or use a SHA256 hash:
# ADMIN_PASSWORD_HASH=your_sha256_hash
# ============== Monitoring ==============
LOG_LEVEL=INFO
ENABLE_REQUEST_LOGGING=true
MAX_MEMORY_PERCENT=80
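To make the cleanup settings concrete, the following sketch shows one way a TTL sweep could work: files whose modification time is older than the TTL are deleted. It is an assumption-laden illustration, not the logic in middleware/cleanup.py, which may track files differently. The expired_files and cleanup functions are hypothetical.

```python
import time
from pathlib import Path
from typing import List, Optional


def expired_files(directory: Path, ttl_minutes: float,
                  now: Optional[float] = None) -> List[Path]:
    """List regular files in `directory` last modified more than `ttl_minutes` ago."""
    now = time.time() if now is None else now
    cutoff = now - ttl_minutes * 60
    return sorted(p for p in directory.iterdir()
                  if p.is_file() and p.stat().st_mtime < cutoff)


def cleanup(directory: Path, ttl_minutes: float) -> int:
    """Delete expired files and return how many were removed."""
    removed = 0
    for path in expired_files(directory, ttl_minutes):
        path.unlink(missing_ok=True)  # tolerate files removed concurrently
        removed += 1
    return removed


# Example, mirroring INPUT_FILE_TTL_MINUTES and OUTPUT_FILE_TTL_MINUTES:
# cleanup(Path("uploads"), ttl_minutes=30)
# cleanup(Path("outputs"), ttl_minutes=120)
```

With CLEANUP_INTERVAL_MINUTES=15, a sweep like this would run four times an hour, so a file could outlive its TTL by up to 15 minutes.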
Ollama Setup
# Install Ollama (Windows)
winget install Ollama.Ollama
# Pull a model
ollama pull llama3.2
# For vision/image translation
ollama pull gemma3:12b
# or
ollama pull qwen3-vl:8b
🎯 Using System Prompts & Glossary
Example: HVAC Translation
System Prompt:
You are translating HVAC technical documents.
Use precise technical terminology.
Keep unit measurements (kW, m³/h, Pa) unchanged.
Glossary:
batterie=coil
groupe froid=chiller
CTA=AHU (Air Handling Unit)
échangeur=heat exchanger
vanne 3 voies=3-way valve
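One plausible way to combine a glossary in this term=translation format with a system prompt is sketched below. How the API merges the two internally is not described here, so parse_glossary, build_system_prompt, and the wording appended to the prompt are all assumptions made for illustration.

```python
from typing import Dict


def parse_glossary(text: str) -> Dict[str, str]:
    """Parse one 'source=target' mapping per line, skipping blank or malformed lines."""
    glossary = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        source, target = line.split("=", 1)  # split on the first '=' only
        glossary[source.strip()] = target.strip()
    return glossary


def build_system_prompt(base_prompt: str, glossary: Dict[str, str]) -> str:
    """Append the glossary to the prompt as an explicit list of required term translations."""
    if not glossary:
        return base_prompt
    terms = "\n".join(f"- {src} -> {dst}" for src, dst in glossary.items())
    return f"{base_prompt}\n\nAlways use these term translations:\n{terms}"


glossary = parse_glossary("batterie=coil\ngroupe froid=chiller\nCTA=AHU (Air Handling Unit)")
prompt = build_system_prompt("You are translating HVAC technical documents.", glossary)
```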
Presets Available
- 🔧 HVAC: Heating, Ventilation, Air Conditioning
- 💻 IT: Software and technology
- ⚖️ Legal: Legal documents
- 🏥 Medical: Healthcare terminology
🔐 Admin Dashboard
Access the admin dashboard at /admin in the frontend. Features:
- System Status: Health, uptime, and issues
- Memory & Disk Monitoring: Real-time usage stats
- Translation Statistics: Total translations, success rate
- Rate Limit Management: View active clients and limits
- Cleanup Service: Monitor and trigger manual cleanup
Default Credentials
- Username: admin
- Password: changeme123
⚠️ Change the default password in production!
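If you prefer ADMIN_PASSWORD_HASH over a plaintext password, a hash can be generated with the standard library. The sketch assumes the server expects the lowercase hex SHA256 digest of the UTF-8 password with no salt; check config.py before relying on that. The function names are illustrative.

```python
import hashlib
import hmac


def sha256_hex(password: str) -> str:
    """Return the hex SHA256 digest of the UTF-8 encoded password."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def password_matches(candidate: str, expected_hash: str) -> bool:
    """Compare in constant time to avoid leaking information through timing."""
    return hmac.compare_digest(sha256_hex(candidate), expected_hash.lower())


if __name__ == "__main__":
    import getpass
    # Prompt without echoing, then print the value to place in .env.
    print("ADMIN_PASSWORD_HASH=" + sha256_hex(getpass.getpass("New admin password: ")))
```

An unsalted SHA256 hash only keeps the password out of the .env file in plaintext; it offers little protection for a weak password, so choose a long random one.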
🏗️ Project Structure
Translate/
├── main.py # FastAPI application with SaaS features
├── config.py # Configuration with SaaS settings
├── requirements.txt # Dependencies
├── mcp_server.py # MCP server implementation
├── middleware/ # SaaS middleware
│ ├── __init__.py
│ ├── rate_limiting.py # Rate limiting with token bucket
│ ├── validation.py # Input validation
│ ├── security.py # Security headers & logging
│ └── cleanup.py # Auto cleanup service
├── services/
│ └── translation_service.py # Translation providers
├── translators/
│ ├── excel_translator.py # Excel with image support
│ ├── word_translator.py # Word with image support
│ └── pptx_translator.py # PowerPoint with image support
├── frontend/ # Next.js frontend
│ ├── src/
│ │ ├── app/
│ │ │ ├── page.tsx # Main translation page
│ │ │ ├── admin/ # Admin dashboard
│ │ │ └── settings/ # Settings pages
│ │ └── components/
│ └── package.json
├── static/
│ └── webllm.html # WebLLM standalone interface
├── uploads/ # Temporary uploads (auto-cleaned)
└── outputs/ # Translated files (auto-cleaned)
🛠️ Tech Stack
Backend
- FastAPI: Modern async web framework
- openpyxl: Excel manipulation
- python-docx: Word documents
- python-pptx: PowerPoint presentations
- deep-translator: Google/DeepL/Libre translation
- psutil: System monitoring
- python-magic: File type validation
Frontend
- Next.js 15: React framework
- Tailwind CSS: Styling
- Lucide Icons: Icon library
- WebLLM: Browser-based LLM
🔌 MCP Integration
This API can be used as an MCP (Model Context Protocol) server for AI assistants.
VS Code Configuration
Add to your VS Code settings.json or .vscode/mcp.json:
{
"servers": {
"document-translator": {
"type": "stdio",
"command": "python",
"args": ["mcp_server.py"],
"cwd": "D:/Translate",
"env": {
"PYTHONPATH": "D:/Translate"
}
}
}
}
🚀 Production Deployment
Security Checklist
- Change ADMIN_PASSWORD or set ADMIN_PASSWORD_HASH
- Set CORS_ORIGINS to your frontend domain
- Enable ENABLE_HSTS=true if using HTTPS
- Configure rate limits appropriately
- Set up log rotation for logs/ directory
- Use a reverse proxy (nginx/traefik) for HTTPS
Docker Deployment (Coming Soon)
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
📝 License
MIT License
🤝 Contributing
Contributions welcome! Please submit a Pull Request.
Built with ❤️ using Python, FastAPI, Next.js, and Ollama