# Document Translation API

A powerful Python API for translating complex structured documents (Excel, Word, PowerPoint) while **strictly preserving** the original formatting, layout, and embedded media.

## 🎯 Features

### Excel Translation (.xlsx)
- ✅ Translates all cell content and sheet names
- ✅ Preserves cell merging
- ✅ Maintains font styles (size, bold, italic, color)
- ✅ Keeps background colors and borders
- ✅ Translates text within formulas while preserving formula structure
- ✅ Retains embedded images in original positions
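
Translating text inside a formula without disturbing its structure could look roughly like the sketch below. This is an illustration only, not the project's actual implementation: `translate_formula` and the `translate` callback are hypothetical names, and the sketch assumes that translatable text appears only inside double-quoted string literals.

```python
import re
from typing import Callable

# A double-quoted Excel string literal; a doubled quote ("") inside it is an escaped quote.
_STRING_LITERAL = re.compile(r'"((?:[^"]|"")*)"')


def translate_formula(formula: str, translate: Callable[[str], str]) -> str:
    """Translate only the string literals of a formula.

    Function names, cell references, and operators are left untouched.
    `translate` is a hypothetical callback that maps source text to target text.
    """
    if not formula.startswith("="):
        # Plain cell text rather than a formula: translate it as a whole.
        return translate(formula)

    def _replace(match: re.Match) -> str:
        text = match.group(1).replace('""', '"')  # unescape doubled quotes
        if not text.strip():
            return match.group(0)  # leave empty or whitespace-only literals alone
        translated = translate(text).replace('"', '""')  # re-escape quotes
        return f'"{translated}"'

    return _STRING_LITERAL.sub(_replace, formula)
```

For example, with a callback that upper-cases its input, `=IF(A1>0,"Yes","No")` becomes `=IF(A1>0,"YES","NO")`, while `=SUM(A1:A3)` is returned unchanged. Quoted sheet names that themselves contain double quotes are not handled by this sketch.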

### Word Translation (.docx)
- ✅ Translates body text, headers, footers, and tables
- ✅ Preserves heading styles and paragraph formatting
- ✅ Maintains lists (numbered/bulleted)
- ✅ Keeps embedded images, charts, and SmartArt in place
- ✅ Preserves table structures and cell formatting

### PowerPoint Translation (.pptx)
- ✅ Translates slide titles, body text, and speaker notes
- ✅ Preserves slide layouts and transitions
- ✅ Maintains animations
- ✅ Keeps images, videos, and shapes in exact positions
- ✅ Preserves layering order

## 🚀 Quick Start

### Installation

1. **Clone the repository:**
   ```powershell
   git clone <repository-url>
   cd Translate
   ```

2. **Create a virtual environment:**
   ```powershell
   python -m venv venv
   .\venv\Scripts\Activate.ps1
   ```

3. **Install dependencies:**
   ```powershell
   pip install -r requirements.txt
   ```

4. **Configure environment:**
   ```powershell
   cp .env.example .env
   # Edit .env with your preferred settings
   ```

5. **Run the API:**
   ```powershell
   python main.py
   ```

The API will start on `http://localhost:8000`

## 📚 API Documentation

Once the server is running, visit:
- **Swagger UI**: http://localhost:8000/docs
- **ReDoc**: http://localhost:8000/redoc

## 🔧 API Endpoints

### POST /translate
Translate a single document.

**Request:**
```bash
curl -X POST "http://localhost:8000/translate" \
  -F "file=@document.xlsx" \
  -F "target_language=es" \
  -F "source_language=auto"
```

**Response:**
Returns the translated document file.

### POST /translate-batch
Translate multiple documents at once.

**Request:**
```bash
curl -X POST "http://localhost:8000/translate-batch" \
  -F "files=@document1.docx" \
  -F "files=@document2.pptx" \
  -F "target_language=fr"
```
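
The same batch request can be sketched in Python. The helper names `build_batch_fields` and `translate_batch` are hypothetical, and because the response format of this endpoint is not described here, the sketch simply returns the raw response body. It assumes the `requests` package is installed.

```python
from contextlib import ExitStack
from pathlib import Path


def build_batch_fields(paths, stack):
    """Open each file and return one ("files", ...) multipart field per file.

    Repeating the "files" field name mirrors the repeated -F "files=@..."
    arguments of the curl call. Files are closed when `stack` exits.
    """
    fields = []
    for path in map(Path, paths):
        handle = stack.enter_context(path.open("rb"))
        fields.append(("files", (path.name, handle)))
    return fields


def translate_batch(paths, target_language, base_url="http://localhost:8000"):
    """POST several documents to /translate-batch and return the raw body."""
    import requests  # third-party; imported here so the helper above has no dependency

    with ExitStack() as stack:
        response = requests.post(
            f"{base_url}/translate-batch",
            files=build_batch_fields(paths, stack),
            data={"target_language": target_language},
            timeout=300,  # assumption: large batches may take a while
        )
    response.raise_for_status()
    return response.content


# Usage (requires a running server):
# body = translate_batch(["document1.docx", "document2.pptx"], "fr")
```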

### GET /languages
Get the list of supported language codes.

### GET /health
Health check endpoint.
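
A client or deployment script might use the health endpoint to wait for the server before sending documents. The following is a minimal sketch using only the standard library; `wait_until_healthy` is a hypothetical helper, and it assumes that `/health` answers with HTTP 200 once the service is ready.

```python
import time
from urllib.error import URLError
from urllib.request import urlopen


def wait_until_healthy(base_url="http://localhost:8000", timeout=30.0, interval=0.5):
    """Poll GET /health until it returns HTTP 200 or `timeout` seconds pass.

    Returns True if the server became healthy in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urlopen(f"{base_url}/health", timeout=interval + 1) as response:
                if response.status == 200:
                    return True
        except (URLError, OSError):
            pass  # server not reachable yet, or it answered with an error status
        time.sleep(interval)
    return False


# Usage (requires a running server):
# if not wait_until_healthy():
#     raise SystemExit("Translation API did not become healthy in time")
```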

## 💻 Usage Examples

### Python Example

```python
import requests

# Translate a document
with open('document.xlsx', 'rb') as f:
    files = {'file': f}
    data = {
        'target_language': 'es',
        'source_language': 'auto'
    }
    response = requests.post('http://localhost:8000/translate', files=files, data=data)

# Stop here if the API reported an error instead of returning a file
response.raise_for_status()

# Save translated file
with open('translated_document.xlsx', 'wb') as output:
    output.write(response.content)
```

### JavaScript/TypeScript Example

```javascript
// Run inside an async function or an ES module (top-level await)
const formData = new FormData();
formData.append('file', fileInput.files[0]);
formData.append('target_language', 'fr');
formData.append('source_language', 'auto');

const response = await fetch('http://localhost:8000/translate', {
  method: 'POST',
  body: formData
});
if (!response.ok) {
  throw new Error(`Translation failed with HTTP ${response.status}`);
}

// Trigger a browser download of the translated file
const blob = await response.blob();
const url = window.URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = 'translated_document.docx';
a.click();
```

### PowerShell Example

```powershell
# The -Form parameter requires PowerShell 6.1 or later
$file = Get-Item "document.pptx"
$uri = "http://localhost:8000/translate"

$form = @{
    file = $file
    target_language = "de"
    source_language = "auto"
}

Invoke-RestMethod -Uri $uri -Method Post -Form $form -OutFile "translated_document.pptx"
```

## 🌐 Supported Languages

The API supports 25+ languages including:
- Spanish (es), French (fr), German (de)
- Italian (it), Portuguese (pt), Russian (ru)
- Chinese (zh), Japanese (ja), Korean (ko)
- Arabic (ar), Hindi (hi), Dutch (nl)
- And many more...

Full list available at: `GET /languages`

## ⚙️ Configuration

Edit the `.env` file to configure the service:

```env
# Translation Service (google, deepl, libre)
TRANSLATION_SERVICE=google

# DeepL API Key (if using DeepL)
DEEPL_API_KEY=your_api_key_here

# File Upload Limits
MAX_FILE_SIZE_MB=50

# Directory Configuration
UPLOAD_DIR=./uploads
OUTPUT_DIR=./outputs
```
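
As an illustration of how these variables could be consumed, here is a minimal sketch. It is not the project's actual `config.py`: the `Settings` class and `load_settings` function are hypothetical, the defaults are copied from the sample above, and the sketch assumes the `.env` values have already been loaded into the process environment.

```python
import os
from dataclasses import dataclass
from pathlib import Path

SUPPORTED_SERVICES = {"google", "deepl", "libre"}


@dataclass(frozen=True)
class Settings:
    translation_service: str
    deepl_api_key: str
    max_file_size_bytes: int
    upload_dir: Path
    output_dir: Path


def load_settings(env=os.environ) -> Settings:
    """Build Settings from a mapping of environment variables."""
    service = env.get("TRANSLATION_SERVICE", "google").lower()
    if service not in SUPPORTED_SERVICES:
        raise ValueError(f"Unsupported TRANSLATION_SERVICE: {service!r}")

    api_key = env.get("DEEPL_API_KEY", "")
    if service == "deepl" and not api_key:
        raise ValueError("DEEPL_API_KEY is required when TRANSLATION_SERVICE=deepl")

    return Settings(
        translation_service=service,
        deepl_api_key=api_key,
        # Convert megabytes to bytes once, so upload checks can compare directly.
        max_file_size_bytes=int(env.get("MAX_FILE_SIZE_MB", "50")) * 1024 * 1024,
        upload_dir=Path(env.get("UPLOAD_DIR", "./uploads")),
        output_dir=Path(env.get("OUTPUT_DIR", "./outputs")),
    )
```

Failing at startup on an unknown service name or a missing DeepL key surfaces configuration mistakes before the first translation request arrives.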

## 🔌 Model Context Protocol (MCP) Integration

This API is designed to be easily wrapped as an MCP server for future integration with AI assistants and tools.

### MCP Server Structure (Future Implementation)

```json
{
  "mcpServers": {
    "document-translator": {
      "command": "python",
      "args": ["-m", "mcp_server"],
      "env": {
        "API_URL": "http://localhost:8000"
      }
    }
  }
}
```

### Example MCP Tools

The MCP wrapper will expose these tools:

1. **translate_document** - Translate a single document
2. **translate_batch** - Translate multiple documents
3. **get_supported_languages** - List supported languages
4. **check_translation_status** - Check the status of a translation

## 🏗️ Project Structure

```
Translate/
├── main.py                     # FastAPI application
├── config.py                   # Configuration management
├── requirements.txt            # Dependencies
├── .env.example                # Environment template
├── services/
│   ├── __init__.py
│   └── translation_service.py  # Translation abstraction layer
├── translators/
│   ├── __init__.py
│   ├── excel_translator.py     # Excel translation logic
│   ├── word_translator.py      # Word translation logic
│   └── pptx_translator.py      # PowerPoint translation logic
├── utils/
│   ├── __init__.py
│   ├── file_handler.py         # File operations
│   └── exceptions.py           # Custom exceptions
├── uploads/                    # Temporary upload storage
└── outputs/                    # Translated files
```

## 🧪 Testing

### Manual Testing

1. Start the API server
2. Navigate to http://localhost:8000/docs
3. Use the interactive Swagger UI to test endpoints

### Test Files

Prepare test files with:
- Complex formatting (multiple fonts, colors, styles)
- Embedded images and media
- Tables and merged cells
- Formulas (for Excel)
- Multiple sections/slides

## 🛠️ Technical Details

### Libraries Used

- **FastAPI**: Modern web framework for building APIs
- **openpyxl**: Excel file manipulation with formatting preservation
- **python-docx**: Word document handling
- **python-pptx**: PowerPoint presentation processing
- **deep-translator**: Multi-provider translation service
- **Uvicorn**: ASGI server for running FastAPI

### Design Principles

1. **Modular Architecture**: Each file type has its own translator module
2. **Provider Abstraction**: Easy to swap translation services (Google, DeepL, LibreTranslate)
3. **Format Preservation**: All translators maintain original document structure
4. **Error Handling**: Comprehensive error handling and logging
5. **Scalability**: Ready for MCP integration and microservices architecture
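
To make the provider-abstraction principle concrete, here is one way such a layer could be shaped. This is a sketch under stated assumptions and does not reflect the contents of `services/translation_service.py`: `TranslationProvider`, `register_provider`, `get_provider`, and `EchoProvider` are all hypothetical names.

```python
from typing import Callable, Dict, Protocol


class TranslationProvider(Protocol):
    """Interface that every translation backend is expected to satisfy."""

    def translate(self, text: str, source: str, target: str) -> str: ...


# Maps a service name (for example the TRANSLATION_SERVICE value) to a factory.
_PROVIDERS: Dict[str, Callable[[], TranslationProvider]] = {}


def register_provider(name: str):
    """Class decorator that registers a provider under `name`."""
    def decorator(factory):
        _PROVIDERS[name] = factory
        return factory
    return decorator


@register_provider("echo")
class EchoProvider:
    """Stand-in backend for tests: tags the text instead of translating it."""

    def translate(self, text: str, source: str, target: str) -> str:
        return f"[{target}] {text}"


def get_provider(name: str) -> TranslationProvider:
    """Instantiate the provider registered under `name`."""
    if name not in _PROVIDERS:
        raise ValueError(f"Unknown translation service: {name!r}")
    return _PROVIDERS[name]()
```

With this shape, the document translators depend only on `translate(text, source, target)`, so switching backends means registering another class and changing the configured service name.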

## 🔐 Security Considerations

For production deployment:

1. **Configure CORS** properly in `main.py`
2. **Add authentication** for API endpoints
3. **Implement rate limiting** to prevent abuse
4. **Use HTTPS** for secure file transmission
5. **Sanitize file uploads** to prevent malicious files
6. **Set appropriate file size limits**
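
Items 5 and 6 could be approached along the lines of the sketch below. It is illustrative only: `validate_upload` is a hypothetical helper, the 50 MB limit mirrors the sample `MAX_FILE_SIZE_MB` value, and the checks shown (file name, extension, size, container structure) are a starting point rather than complete protection against malicious files.

```python
import zipfile
from pathlib import Path

ALLOWED_EXTENSIONS = {".xlsx", ".docx", ".pptx"}
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024  # assumption: mirrors MAX_FILE_SIZE_MB=50


def validate_upload(filename: str, path: Path) -> str:
    """Check an uploaded file saved at `path` and return a safe file name.

    Raises ValueError if the upload should be rejected.
    """
    # Drop any directory components a client may have put in the file name.
    safe_name = Path(filename.replace("\\", "/")).name
    if Path(safe_name).suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {safe_name!r}")

    if path.stat().st_size > MAX_FILE_SIZE_BYTES:
        raise ValueError("File exceeds the configured size limit")

    # .xlsx, .docx, and .pptx files are ZIP packages that contain a
    # [Content_Types].xml part; anything else is not a valid Office document.
    if not zipfile.is_zipfile(path):
        raise ValueError("File is not a valid Office Open XML package")
    with zipfile.ZipFile(path) as archive:
        if "[Content_Types].xml" not in archive.namelist():
            raise ValueError("File is not a valid Office Open XML package")

    return safe_name
```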

## 📝 License

MIT License - Feel free to use this project for your needs.

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## 📧 Support

For issues and questions, please open an issue on the repository.

---

**Built with ❤️ using Python and FastAPI**