Document Translation API
A powerful Python API for translating complex structured documents (Excel, Word, PowerPoint) while strictly preserving the original formatting, layout, and embedded media.
🎯 Features
Excel Translation (.xlsx)
- ✅ Translates all cell content and sheet names
- ✅ Preserves cell merging
- ✅ Maintains font styles (size, bold, italic, color)
- ✅ Keeps background colors and borders
- ✅ Translates text within formulas while preserving formula structure
- ✅ Retains embedded images in original positions
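One way to translate text inside formulas without touching their structure is to rewrite only the double-quoted string literals and leave cell references, function names, and operators alone. The sketch below assumes a `translate` callable that maps a string to its translation. `translate_formula` is a hypothetical helper, not necessarily how `excel_translator.py` works.

```python
import re
from typing import Callable

# Excel string literals are wrapped in double quotes; a literal quote inside is written as "".
_STRING_LITERAL = re.compile(r'"((?:[^"]|"")*)"')


def translate_formula(formula: str, translate: Callable[[str], str]) -> str:
    """Translate only the string literals in an Excel formula (hypothetical helper)."""
    if not formula.startswith("="):
        return translate(formula)  # Plain text cell, not a formula.

    def _replace(match: re.Match) -> str:
        text = match.group(1).replace('""', '"')  # Unescape doubled quotes.
        if not text.strip():
            return match.group(0)  # Leave empty or whitespace-only literals alone.
        translated = translate(text)
        return '"' + translated.replace('"', '""') + '"'  # Re-escape for Excel.

    return _STRING_LITERAL.sub(_replace, formula)


# Usage with a stand-in translator (a dict lookup instead of a real service):
fake = {"Yes": "Sí", "No": "No"}.get
print(translate_formula('=IF(A1>10,"Yes","No")', lambda s: fake(s, s)))
# =IF(A1>10,"Sí","No")
```

References such as `'Sheet 1'!A1` use single quotes, so the pattern does not touch them.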
Word Translation (.docx)
- ✅ Translates body text, headers, footers, and tables
- ✅ Preserves heading styles and paragraph formatting
- ✅ Maintains lists (numbered/bulleted)
- ✅ Keeps embedded images, charts, and SmartArt in place
- ✅ Preserves table structures and cell formatting
PowerPoint Translation (.pptx)
- ✅ Translates slide titles, body text, and speaker notes
- ✅ Preserves slide layouts and transitions
- ✅ Maintains animations
- ✅ Keeps images, videos, and shapes in exact positions
- ✅ Preserves layering order
🚀 Quick Start
Installation
- Clone the repository:
```bash
git clone <repository-url>
cd Translate
```
- Create and activate a virtual environment:
```bash
python -m venv venv
# Windows (PowerShell)
.\venv\Scripts\Activate.ps1
# macOS/Linux
source venv/bin/activate
```
- Install dependencies:
```bash
pip install -r requirements.txt
```
- Configure the environment:
```bash
cp .env.example .env
# Edit .env with your preferred settings
```
- Run the API:
```bash
python main.py
```
The API starts on http://localhost:8000.
📚 API Documentation
Once the server is running, visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
🔧 API Endpoints
POST /translate
Translate a single document
Request:
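```bash
curl -X POST "http://localhost:8000/translate" \
  -F "file=@document.xlsx" \
  -F "target_language=es" \
  -F "source_language=auto" \
  -o translated_document.xlsx
```
Response: the translated document as a file download (`-o` saves it to disk instead of printing binary data to the terminal).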
POST /translate-batch
Translate multiple documents at once
Request:
```bash
curl -X POST "http://localhost:8000/translate-batch" \
  -F "files=@document1.docx" \
  -F "files=@document2.pptx" \
  -F "target_language=fr"
```
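For Python clients without `requests`, the batch call can be reproduced with the standard library by repeating the `files` form field once per document, exactly as the curl command does. This is a sketch: `encode_multipart` and `translate_batch` are hypothetical helpers, and because the batch response format is not documented here, `translate_batch` simply returns the raw body and its content type.

```python
import urllib.request
import uuid
from pathlib import Path
from typing import Dict, List, Tuple


def encode_multipart(fields: Dict[str, str],
                     files: List[Tuple[str, str, bytes]]) -> Tuple[bytes, str]:
    """Build a multipart/form-data body. `files` holds (field_name, filename, content)."""
    boundary = uuid.uuid4().hex
    parts: List[bytes] = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    for field, filename, content in files:
        header = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(header.encode() + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # Closing boundary.
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def translate_batch(paths: List[str], target_language: str,
                    base_url: str = "http://localhost:8000") -> Tuple[bytes, str]:
    """POST several documents to /translate-batch, repeating the 'files' field."""
    files = [("files", Path(p).name, Path(p).read_bytes()) for p in paths]
    body, content_type = encode_multipart({"target_language": target_language}, files)
    request = urllib.request.Request(
        f"{base_url}/translate-batch", data=body,
        headers={"Content-Type": content_type}, method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read(), response.headers.get("Content-Type", "")
```

Usage: `body, content_type = translate_batch(["document1.docx", "document2.pptx"], "fr")`; inspect `content_type` to decide how to save `body`.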
GET /languages
Get list of supported language codes
GET /health
Health check endpoint
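Because the server takes a moment to start, scripts can poll `GET /health` before uploading files. A minimal standard-library sketch, assuming the endpoint answers HTTP 200 when the service is ready (its response body is not documented here). `wait_until_healthy` is a hypothetical helper; the `fetch_status` parameter exists only so it can be exercised without a running server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def _http_status(url: str) -> int:
    """Return the HTTP status code of a GET request, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status
    except urllib.error.HTTPError as exc:  # Server answered with an error status.
        return exc.code
    except OSError:  # Connection refused, timeout, DNS failure, ...
        return 0


def wait_until_healthy(base_url: str = "http://localhost:8000", timeout: float = 30.0,
                       interval: float = 0.5,
                       fetch_status: Callable[[str], int] = _http_status) -> bool:
    """Poll GET /health until it returns 200 or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if fetch_status(f"{base_url}/health") == 200:
            return True
        time.sleep(interval)
    return False
```

Usage: `if not wait_until_healthy(): raise SystemExit("API did not start")`.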
💻 Usage Examples
Python Example
```python
import requests

# Translate a document
with open('document.xlsx', 'rb') as f:
    files = {'file': f}
    data = {
        'target_language': 'es',
        'source_language': 'auto'
    }
    response = requests.post('http://localhost:8000/translate', files=files, data=data)

# Fail on HTTP errors instead of saving an error body as a document
response.raise_for_status()

# Save translated file
with open('translated_document.xlsx', 'wb') as output:
    output.write(response.content)
```
JavaScript/TypeScript Example
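```javascript
// Assumes an <input type="file" id="fileInput"> element on the page and that
// this code runs inside an async function or an ES module (top-level await)
const fileInput = document.getElementById('fileInput');

const formData = new FormData();
formData.append('file', fileInput.files[0]);
formData.append('target_language', 'fr');
formData.append('source_language', 'auto');

const response = await fetch('http://localhost:8000/translate', {
  method: 'POST',
  body: formData
});
if (!response.ok) {
  throw new Error(`Translation failed: ${response.status}`);
}

// Trigger a browser download, keeping the original file extension
const blob = await response.blob();
const url = window.URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = `translated_${fileInput.files[0].name}`;
a.click();
```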
PowerShell Example
```powershell
# Requires PowerShell 6.1 or later: the -Form parameter is not available in Windows PowerShell 5.1
$file = Get-Item "document.pptx"
$uri = "http://localhost:8000/translate"
$form = @{
    file = $file
    target_language = "de"
    source_language = "auto"
}
Invoke-RestMethod -Uri $uri -Method Post -Form $form -OutFile "translated_document.pptx"
```
🌐 Supported Languages
The API supports 25+ languages including:
- Spanish (es), French (fr), German (de)
- Italian (it), Portuguese (pt), Russian (ru)
- Chinese (zh), Japanese (ja), Korean (ko)
- Arabic (ar), Hindi (hi), Dutch (nl)
- And many more...
Full list available at: GET /languages
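A client can reject an unsupported `target_language` before uploading a large file. The sketch below hard-codes only the codes listed above as a fallback; in practice the set should be filled from `GET /languages`, whose response format is not documented here. `KNOWN_CODES` and `validate_language` are illustrative names.

```python
from typing import AbstractSet

# Only the codes listed in this README; the authoritative list comes from GET /languages.
KNOWN_CODES = frozenset({"es", "fr", "de", "it", "pt", "ru", "zh", "ja", "ko", "ar", "hi", "nl"})


def validate_language(code: str, supported: AbstractSet[str] = KNOWN_CODES,
                      allow_auto: bool = False) -> str:
    """Normalize a language code and raise ValueError if it is not supported."""
    normalized = code.strip().lower()
    if allow_auto and normalized == "auto":
        return normalized  # "auto" only makes sense for source_language.
    if normalized not in supported:
        raise ValueError(f"Unsupported language code: {code!r}")
    return normalized
```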
⚙️ Configuration
Edit the .env file to configure:
```bash
# Translation Service (google, deepl, libre)
TRANSLATION_SERVICE=google
# DeepL API Key (if using DeepL)
DEEPL_API_KEY=your_api_key_here
# File Upload Limits
MAX_FILE_SIZE_MB=50
# Directory Configuration
UPLOAD_DIR=./uploads
OUTPUT_DIR=./outputs
```
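These settings are plain environment variables, so a configuration module can read them with fallbacks and basic validation. The following is a hedged sketch of what `config.py` might contain, not its actual code: the `Settings` dataclass and `load_settings` function are hypothetical, the fallback values are simply the ones shown above, and it assumes the `.env` values have already been loaded into the process environment.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional

SUPPORTED_SERVICES = {"google", "deepl", "libre"}


@dataclass(frozen=True)
class Settings:
    translation_service: str
    deepl_api_key: Optional[str]
    max_file_size_bytes: int
    upload_dir: Path
    output_dir: Path


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented .env keys, falling back to the example values."""
    service = env.get("TRANSLATION_SERVICE", "google").lower()
    if service not in SUPPORTED_SERVICES:
        raise ValueError(f"TRANSLATION_SERVICE must be one of {sorted(SUPPORTED_SERVICES)}")
    api_key = env.get("DEEPL_API_KEY") or None
    if service == "deepl" and not api_key:
        raise ValueError("DEEPL_API_KEY is required when TRANSLATION_SERVICE=deepl")
    return Settings(
        translation_service=service,
        deepl_api_key=api_key,
        max_file_size_bytes=int(env.get("MAX_FILE_SIZE_MB", "50")) * 1024 * 1024,
        upload_dir=Path(env.get("UPLOAD_DIR", "./uploads")),
        output_dir=Path(env.get("OUTPUT_DIR", "./outputs")),
    )
```

Passing the mapping explicitly (for example `load_settings({"TRANSLATION_SERVICE": "libre"})`) keeps the loader easy to test.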
🔌 Model Context Protocol (MCP) Integration
This API is designed to be easily wrapped as an MCP server for future integration with AI assistants and tools.
MCP Server Structure (Future Implementation)
```json
{
  "mcpServers": {
    "document-translator": {
      "command": "python",
      "args": ["-m", "mcp_server"],
      "env": {
        "API_URL": "http://localhost:8000"
      }
    }
  }
}
```
Example MCP Tools
The MCP wrapper will expose these tools:
- translate_document - Translate a single document
- translate_batch - Translate multiple documents
- get_supported_languages - List supported languages
- check_translation_status - Check status of translation
🏗️ Project Structure
```
Translate/
├── main.py                      # FastAPI application
├── config.py                    # Configuration management
├── requirements.txt             # Dependencies
├── .env.example                 # Environment template
├── services/
│   ├── __init__.py
│   └── translation_service.py   # Translation abstraction layer
├── translators/
│   ├── __init__.py
│   ├── excel_translator.py      # Excel translation logic
│   ├── word_translator.py       # Word translation logic
│   └── pptx_translator.py       # PowerPoint translation logic
├── utils/
│   ├── __init__.py
│   ├── file_handler.py          # File operations
│   └── exceptions.py            # Custom exceptions
├── uploads/                     # Temporary upload storage
└── outputs/                     # Translated files
```
🧪 Testing
Manual Testing
- Start the API server
- Navigate to http://localhost:8000/docs
- Use the interactive Swagger UI to test endpoints
Test Files
Prepare test files with:
- Complex formatting (multiple fonts, colors, styles)
- Embedded images and media
- Tables and merged cells
- Formulas (for Excel)
- Multiple sections/slides
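Because .xlsx, .docx, and .pptx files are all ZIP packages (Office Open XML), a quick automated check that embedded media survived translation is to compare the package part names of the input and output files. A standard-library sketch; `package_parts` and `missing_parts` are hypothetical helpers, and they only confirm that parts exist, not that formatting is identical.

```python
import zipfile
from pathlib import Path
from typing import Set, Union

PathLike = Union[str, Path]


def package_parts(path: PathLike) -> Set[str]:
    """Return the part names inside an Office Open XML package (a ZIP file)."""
    with zipfile.ZipFile(path) as package:
        return set(package.namelist())


def missing_parts(original: PathLike, translated: PathLike, prefix: str = "") -> Set[str]:
    """Parts present in the original but absent from the translated file.

    Use prefix="xl/media/", "word/media/" or "ppt/media/" to check embedded media only.
    """
    before = {name for name in package_parts(original) if name.startswith(prefix)}
    return before - package_parts(translated)
```

Usage: `assert not missing_parts("document.xlsx", "translated_document.xlsx", prefix="xl/media/")`.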
🛠️ Technical Details
Libraries Used
- FastAPI: Modern web framework for building APIs
- openpyxl: Excel file manipulation with formatting preservation
- python-docx: Word document handling
- python-pptx: PowerPoint presentation processing
- deep-translator: Library offering one interface to multiple translation providers (Google, DeepL, LibreTranslate, and others)
- Uvicorn: ASGI server for running FastAPI
Design Principles
- Modular Architecture: Each file type has its own translator module
- Provider Abstraction: Easy to swap translation services (Google, DeepL, LibreTranslate)
- Format Preservation: All translators maintain original document structure
- Error Handling: Comprehensive error handling and logging
- Scalability: Ready for MCP integration and microservices architecture
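The modular-architecture and provider-abstraction principles can be pictured as two small interfaces: a document translator chosen by file extension, and a text provider it delegates to. This is an illustrative sketch only; `TextProvider`, `DocumentTranslator`, `select_translator`, and `UppercaseProvider` are hypothetical names, not the classes in `services/` or `translators/`.

```python
from pathlib import Path
from typing import Dict, Protocol


class TextProvider(Protocol):
    """Anything that can translate a string (Google, DeepL, LibreTranslate, ...)."""

    def translate(self, text: str, source: str, target: str) -> str: ...


class DocumentTranslator(Protocol):
    """One implementation per file type (Excel, Word, PowerPoint)."""

    def translate_file(self, src: Path, dst: Path, provider: TextProvider,
                       source: str, target: str) -> None: ...


def select_translator(path: Path,
                      registry: Dict[str, DocumentTranslator]) -> DocumentTranslator:
    """Pick the translator registered for the file's extension (case-insensitive)."""
    suffix = path.suffix.lower()
    try:
        return registry[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None


class UppercaseProvider:
    """Stand-in provider for tests; a real provider would call a translation service."""

    def translate(self, text: str, source: str, target: str) -> str:
        return text.upper()
```

Swapping services then means passing a different `TextProvider`; the per-format translators do not change.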
🔐 Security Considerations
For production deployment:
- Configure CORS properly in main.py
- Add authentication for API endpoints
- Implement rate limiting to prevent abuse
- Use HTTPS for secure file transmission
- Sanitize file uploads to prevent malicious files
- Set appropriate file size limits
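As a starting point for sanitizing uploads and enforcing size limits, the server can check the filename, extension, size, and that the bytes really form an Office Open XML package (a ZIP file containing `[Content_Types].xml`) before handing them to a translator. A hedged standard-library sketch: `validate_upload` and its default limits are assumptions, meant to be wired to `MAX_FILE_SIZE_MB` and to whatever upload handling `main.py` uses. The compression-ratio check is a simple guard against ZIP bombs, not a complete defence.

```python
import io
import zipfile
from pathlib import PurePath, PureWindowsPath

ALLOWED_EXTENSIONS = {".xlsx", ".docx", ".pptx"}


def validate_upload(filename: str, data: bytes, max_size_mb: int = 50,
                    max_compression_ratio: int = 100) -> str:
    """Return a safe base filename or raise ValueError (hypothetical helper)."""
    # PureWindowsPath splits on both "/" and "\", stripping any directory components.
    safe_name = PureWindowsPath(filename).name
    suffix = PurePath(safe_name).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    if len(data) > max_size_mb * 1024 * 1024:
        raise ValueError(f"File exceeds the {max_size_mb} MB limit")
    if not zipfile.is_zipfile(io.BytesIO(data)):
        raise ValueError("Not a valid Office Open XML (ZIP) package")
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        if "[Content_Types].xml" not in package.namelist():
            raise ValueError("Missing [Content_Types].xml; not an OOXML document")
        uncompressed = sum(info.file_size for info in package.infolist())
        if uncompressed > max(len(data), 1) * max_compression_ratio:
            raise ValueError("Suspicious compression ratio (possible ZIP bomb)")
    return safe_name
```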
📝 License
MIT License - Feel free to use this project for your needs.
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
📧 Support
For issues and questions, please open an issue on the repository.
Built with ❤️ using Python and FastAPI