# Document Translation API

A Python API for translating structured documents (Excel, Word, PowerPoint) while **strictly preserving** the original formatting, layout, and embedded media.

## 🎯 Features

### Excel Translation (.xlsx)

- ✅ Translates all cell content and sheet names
- ✅ Preserves cell merging
- ✅ Maintains font styles (size, bold, italic, color)
- ✅ Keeps background colors and borders
- ✅ Translates text within formulas while preserving formula structure
- ✅ Retains embedded images in their original positions

### Word Translation (.docx)

- ✅ Translates body text, headers, footers, and tables
- ✅ Preserves heading styles and paragraph formatting
- ✅ Maintains lists (numbered/bulleted)
- ✅ Keeps embedded images, charts, and SmartArt in place
- ✅ Preserves table structures and cell formatting

### PowerPoint Translation (.pptx)

- ✅ Translates slide titles, body text, and speaker notes
- ✅ Preserves slide layouts and transitions
- ✅ Maintains animations
- ✅ Keeps images, videos, and shapes in their exact positions
- ✅ Preserves layering order

## 🚀 Quick Start

### Installation

1. **Clone the repository:**

   ```powershell
   git clone <repository-url>
   cd Translate
   ```

2. **Create a virtual environment:**

   ```powershell
   python -m venv venv
   .\venv\Scripts\Activate.ps1
   ```

3. **Install dependencies:**

   ```powershell
   pip install -r requirements.txt
   ```

4. **Configure environment:**

   ```powershell
   cp .env.example .env
   # Edit .env with your preferred settings
   ```

5. **Run the API:**

   ```powershell
   python main.py
   ```

The API will start on `http://localhost:8000`.

## 📚 API Documentation

Once the server is running, visit:

- **Swagger UI**: http://localhost:8000/docs
- **ReDoc**: http://localhost:8000/redoc

## 🔧 API Endpoints

### POST /translate

Translate a single document.

**Request:**

```bash
curl -X POST "http://localhost:8000/translate" \
  -F "file=@document.xlsx" \
  -F "target_language=es" \
  -F "source_language=auto"
```

**Response:** The translated document file.

### POST /translate-batch

Translate multiple documents in one request.

**Request:**

```bash
curl -X POST "http://localhost:8000/translate-batch" \
  -F "files=@document1.docx" \
  -F "files=@document2.pptx" \
  -F "target_language=fr"
```

### GET /languages

Get the list of supported language codes.

### GET /health

Health check endpoint.

## 💻 Usage Examples

### Python Example

```python
import requests

# Translate a document
with open('document.xlsx', 'rb') as f:
    files = {'file': f}
    data = {
        'target_language': 'es',
        'source_language': 'auto'
    }
    response = requests.post('http://localhost:8000/translate', files=files, data=data)

# Fail loudly on an error response instead of saving it as a document
response.raise_for_status()

# Save translated file
with open('translated_document.xlsx', 'wb') as output:
    output.write(response.content)
```

### JavaScript/TypeScript Example

```javascript
// fileInput is assumed to be an <input type="file"> element on the page
const formData = new FormData();
formData.append('file', fileInput.files[0]);
formData.append('target_language', 'fr');
formData.append('source_language', 'auto');

const response = await fetch('http://localhost:8000/translate', {
  method: 'POST',
  body: formData
});

// Stop here on an error response instead of downloading it as a document
if (!response.ok) {
  throw new Error(`Translation failed with status ${response.status}`);
}

// Trigger a browser download of the translated file
const blob = await response.blob();
const url = window.URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = 'translated_document.docx';
a.click();
window.URL.revokeObjectURL(url);
```

### PowerShell Example

```powershell
# The -Form parameter requires PowerShell 6.1 or later
$file = Get-Item "document.pptx"
$uri = "http://localhost:8000/translate"
$form = @{
    file            = $file
    target_language = "de"
    source_language = "auto"
}
Invoke-RestMethod -Uri $uri -Method Post -Form $form -OutFile "translated_document.pptx"
```

## 🌍 Supported Languages

The API supports 25+ languages, including:

- Spanish (es), French (fr), German (de)
- Italian (it), Portuguese (pt), Russian (ru)
- Chinese (zh), Japanese (ja), Korean (ko)
- Arabic (ar), Hindi (hi), Dutch (nl)
- And many more...

The full list is available at `GET /languages`.

## ⚙️ Configuration

Edit the `.env` file to configure:

```env
# Translation Service (google, deepl, libre)
TRANSLATION_SERVICE=google

# DeepL API Key (if using DeepL)
DEEPL_API_KEY=your_api_key_here

# File Upload Limits
MAX_FILE_SIZE_MB=50

# Directory Configuration
UPLOAD_DIR=./uploads
OUTPUT_DIR=./outputs
```

## 🔌 Model Context Protocol (MCP) Integration

This API is designed to be easily wrapped as an MCP server for future integration with AI assistants and tools.

### MCP Server Structure (Future Implementation)

```json
{
  "mcpServers": {
    "document-translator": {
      "command": "python",
      "args": ["-m", "mcp_server"],
      "env": {
        "API_URL": "http://localhost:8000"
      }
    }
  }
}
```

### Example MCP Tools

The MCP wrapper will expose these tools:

1. **translate_document** - Translate a single document
2. **translate_batch** - Translate multiple documents
3. **get_supported_languages** - List supported languages
4. **check_translation_status** - Check the status of a translation

## 🏗️ Project Structure

```
Translate/
├── main.py                     # FastAPI application
├── config.py                   # Configuration management
├── requirements.txt            # Dependencies
├── .env.example                # Environment template
├── services/
│   ├── __init__.py
│   └── translation_service.py  # Translation abstraction layer
├── translators/
│   ├── __init__.py
│   ├── excel_translator.py     # Excel translation logic
│   ├── word_translator.py      # Word translation logic
│   └── pptx_translator.py      # PowerPoint translation logic
├── utils/
│   ├── __init__.py
│   ├── file_handler.py         # File operations
│   └── exceptions.py           # Custom exceptions
├── uploads/                    # Temporary upload storage
└── outputs/                    # Translated files
```

## 🧪 Testing

### Manual Testing

1. Start the API server
2. Navigate to http://localhost:8000/docs
3. Use the interactive Swagger UI to test endpoints

### Test Files

Prepare test files with:

- Complex formatting (multiple fonts, colors, styles)
- Embedded images and media
- Tables and merged cells
- Formulas (for Excel)
- Multiple sections/slides

## 🛠️ Technical Details

### Libraries Used

- **FastAPI**: Modern web framework for building APIs
- **openpyxl**: Excel file manipulation with formatting preservation
- **python-docx**: Word document handling
- **python-pptx**: PowerPoint presentation processing
- **deep-translator**: Multi-provider translation service
- **Uvicorn**: ASGI server for running FastAPI

### Design Principles

1. **Modular Architecture**: Each file type has its own translator module
2. **Provider Abstraction**: Translation services (Google, DeepL, LibreTranslate) are easy to swap
3. **Format Preservation**: All translators maintain the original document structure
4. **Error Handling**: Comprehensive error handling and logging
5. **Scalability**: Ready for MCP integration and a microservices architecture

## 🔐 Security Considerations

For production deployment:

1. **Configure CORS** properly in `main.py`
2. **Add authentication** for API endpoints
3. **Implement rate limiting** to prevent abuse
4. **Use HTTPS** for secure file transmission
5. **Sanitize file uploads** to prevent malicious files
6. **Set appropriate file size limits**

## 📝 License

MIT License - Feel free to use this project for your needs.

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## 📧 Support

For issues and questions, please open an issue on the repository.

---

**Built with ❤️ using Python and FastAPI**
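
### Upload Validation Sketch

The Security Considerations above recommend sanitizing file uploads and setting file size limits. The following is a minimal sketch of what such a check could look like, not the project's actual implementation. The names `UploadRejected`, `sanitize_filename`, and `validate_upload` are hypothetical, and the 50 MB default is an assumption that mirrors `MAX_FILE_SIZE_MB=50` from the sample `.env`. It relies on the fact that `.xlsx`, `.docx`, and `.pptx` files are ZIP archives, which begin with the bytes `PK\x03\x04`.

```python
import re
from pathlib import PurePosixPath

# Document types this README lists as supported.
ALLOWED_EXTENSIONS = {".xlsx", ".docx", ".pptx"}

# Assumption: mirrors MAX_FILE_SIZE_MB=50 from the sample .env.
DEFAULT_MAX_FILE_SIZE_MB = 50

# Office Open XML files are ZIP archives; ZIP local file headers start with this.
ZIP_MAGIC = b"PK\x03\x04"


class UploadRejected(ValueError):
    """Hypothetical exception raised when an upload fails validation."""


def sanitize_filename(raw_name: str) -> str:
    """Strip directory components and unusual characters from a client filename."""
    # Treat both separator styles as path separators, then keep the last part.
    name = PurePosixPath(raw_name.replace("\\", "/")).name
    # Replace anything outside a conservative character set.
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    # Avoid hidden files and leftover ".." segments.
    name = name.lstrip(".")
    if not name:
        raise UploadRejected("empty or unsafe filename")
    return name


def validate_upload(
    raw_name: str,
    size_bytes: int,
    head: bytes,
    max_mb: int = DEFAULT_MAX_FILE_SIZE_MB,
) -> str:
    """Return a safe filename, or raise UploadRejected.

    `head` is the first few bytes of the uploaded file.
    """
    name = sanitize_filename(raw_name)
    if PurePosixPath(name).suffix.lower() not in ALLOWED_EXTENSIONS:
        raise UploadRejected(f"unsupported file type: {name}")
    if size_bytes <= 0 or size_bytes > max_mb * 1024 * 1024:
        raise UploadRejected(f"file size out of range: {size_bytes} bytes")
    if not head.startswith(ZIP_MAGIC):
        raise UploadRejected("content does not look like an Office document")
    return name


if __name__ == "__main__":
    print(validate_upload("Q3 report.xlsx", 2048, ZIP_MAGIC))  # prints Q3_report.xlsx
```

A check like this would typically run before the file is written to `UPLOAD_DIR`. The magic-byte test only confirms that the upload is a ZIP container. It does not prove that the archive is a well-formed or harmless document.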