Document Translation API

A Python API for translating structured Office documents (Excel, Word, PowerPoint) while preserving the original formatting, layout, and embedded media.

🎯 Features

Excel Translation (.xlsx)

  • Translates all cell content and sheet names
  • Preserves cell merging
  • Maintains font styles (size, bold, italic, color)
  • Keeps background colors and borders
  • Translates text within formulas while preserving formula structure
  • Retains embedded images in original positions
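
The formula bullet above can be illustrated with a minimal sketch. It assumes that "text within formulas" means double-quoted string literals, and that everything else (function names, cell references, operators) must stay byte-for-byte identical. The function name translate_formula_literals and the translate callback are hypothetical and not taken from excel_translator.py.

```python
import re
from typing import Callable

# Matches one double-quoted Excel string literal. A doubled quote ("")
# inside a literal is Excel's escape for a single quote character.
_STRING_LITERAL = re.compile(r'"((?:[^"]|"")*)"')


def translate_formula_literals(formula: str, translate: Callable[[str], str]) -> str:
    """Translate only the quoted text in a formula; leave the structure untouched."""
    if not formula.startswith("="):
        return translate(formula)  # plain cell text, not a formula

    def _replace(match: re.Match) -> str:
        text = match.group(1).replace('""', '"')          # unescape
        if not text.strip():
            return match.group(0)                          # nothing to translate
        translated = translate(text).replace('"', '""')   # re-escape
        return f'"{translated}"'

    return _STRING_LITERAL.sub(_replace, formula)


# str.upper stands in for a real translation call.
print(translate_formula_literals('=IF(A1>0,"Yes","No")', str.upper))
# → =IF(A1>0,"YES","NO")
```

Passing the translator as a callback keeps the sketch independent of any particular translation provider.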

Word Translation (.docx)

  • Translates body text, headers, footers, and tables
  • Preserves heading styles and paragraph formatting
  • Maintains lists (numbered/bulleted)
  • Keeps embedded images, charts, and SmartArt in place
  • Preserves table structures and cell formatting

PowerPoint Translation (.pptx)

  • Translates slide titles, body text, and speaker notes
  • Preserves slide layouts and transitions
  • Maintains animations
  • Keeps images, videos, and shapes in exact positions
  • Preserves layering order
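
In Word and PowerPoint files, character formatting is attached to runs (spans of text that share one set of properties), so one way to keep formatting intact is to replace only the text of each run. The sketch below uses a simplified stand-in Run class rather than the python-docx or python-pptx objects, and translate_runs is a hypothetical helper, not the project's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Run:
    """Simplified stand-in for a run: a text span with one set of formatting."""
    text: str
    bold: bool = False
    italic: bool = False


def translate_runs(runs: List[Run], translate: Callable[[str], str]) -> None:
    """Replace each run's text in place; formatting attributes are never touched."""
    for run in runs:
        core = run.text.strip()
        if not core:
            continue  # skip empty or whitespace-only runs
        # Keep surrounding whitespace so adjacent runs do not get glued together.
        lead = run.text[: len(run.text) - len(run.text.lstrip())]
        trail = run.text[len(run.text.rstrip()):]
        run.text = lead + translate(core) + trail


runs = [Run("Hello ", bold=True), Run("world")]
translate_runs(runs, str.upper)  # str.upper stands in for a real translator
print(runs)
```

Translating run by run keeps formatting exact but can split a sentence across several translation calls, which may reduce translation quality. Translating whole paragraphs and redistributing the result is the usual alternative, at the cost of approximate formatting boundaries.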

🚀 Quick Start

Installation

  1. Clone the repository:
git clone <repository-url>
cd Translate
  2. Create and activate a virtual environment (the activation command below is for Windows PowerShell; on macOS/Linux use source venv/bin/activate):
python -m venv venv
.\venv\Scripts\Activate.ps1
  3. Install dependencies:
pip install -r requirements.txt
  4. Configure the environment:
cp .env.example .env
# Edit .env with your preferred settings
  5. Run the API:
python main.py

The API will start on http://localhost:8000

📚 API Documentation

Once the server is running, open the interactive Swagger UI at http://localhost:8000/docs.

🔧 API Endpoints

POST /translate

Translate a single document

Request:

curl -X POST "http://localhost:8000/translate" \
  -F "file=@document.xlsx" \
  -F "target_language=es" \
  -F "source_language=auto"

Response: Returns the translated document file

POST /translate-batch

Translate multiple documents at once

Request:

curl -X POST "http://localhost:8000/translate-batch" \
  -F "files=@document1.docx" \
  -F "files=@document2.pptx" \
  -F "target_language=fr"
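
The same batch request can be sketched in Python. The repeated "files" field and the target_language field come from the curl call above; the helper names build_batch_files and translate_batch are hypothetical, and because the response format of this endpoint is not documented here, the sketch returns the raw response without interpreting it.

```python
import mimetypes
from pathlib import Path
from typing import List, Tuple

FileField = Tuple[str, Tuple[str, bytes, str]]


def build_batch_files(paths: List[str]) -> List[FileField]:
    """Build one repeated "files" multipart field per document."""
    fields = []
    for path in map(Path, paths):
        content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        fields.append(("files", (path.name, path.read_bytes(), content_type)))
    return fields


def translate_batch(paths: List[str], target_language: str,
                    base_url: str = "http://localhost:8000"):
    """POST the documents to /translate-batch and return the raw response."""
    import requests  # third-party; imported here so the helper above stays stdlib-only

    response = requests.post(
        f"{base_url}/translate-batch",
        files=build_batch_files(paths),
        data={"target_language": target_language},
        timeout=300,  # assumed generous limit; large documents can take a while
    )
    response.raise_for_status()
    return response
```

A call such as translate_batch(["document1.docx", "document2.pptx"], "fr") mirrors the curl request.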

GET /languages

Get list of supported language codes

GET /health

Health check endpoint
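
A client script can poll the health endpoint before sending documents. This is a minimal sketch that assumes GET /health answers with HTTP 200 when the service is ready; the response body is not documented here and is ignored. The function names are hypothetical.

```python
import time
import urllib.request
from typing import Callable


def is_healthy(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, refused connections, timeouts
        return False


def wait_until_ready(check: Callable[[], bool] = is_healthy,
                     attempts: int = 10, delay: float = 0.5) -> bool:
    """Poll `check` until it succeeds or the attempts run out."""
    for _ in range(attempts):
        if check():
            return True
        time.sleep(delay)
    return False
```

Accepting the check as a parameter makes the polling loop easy to test without a running server.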

💻 Usage Examples

Python Example

import requests

# Translate a document
with open('document.xlsx', 'rb') as f:
    files = {'file': f}
    data = {
        'target_language': 'es',
        'source_language': 'auto'
    }
    response = requests.post('http://localhost:8000/translate', files=files, data=data)

# Fail loudly on an HTTP error instead of saving the error body as a file
response.raise_for_status()

# Save translated file
with open('translated_document.xlsx', 'wb') as output:
    output.write(response.content)

JavaScript/TypeScript Example

const formData = new FormData();
formData.append('file', fileInput.files[0]);
formData.append('target_language', 'fr');
formData.append('source_language', 'auto');

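const response = await fetch('http://localhost:8000/translate', {
    method: 'POST',
    body: formData
});

// Stop on an HTTP error rather than downloading the error body
if (!response.ok) {
    throw new Error(`Translation failed with status ${response.status}`);
}

const blob = await response.blob();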
const url = window.URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = 'translated_document.docx';
a.click();

PowerShell Example

$file = Get-Item "document.pptx"
$uri = "http://localhost:8000/translate"

$form = @{
    file = $file
    target_language = "de"
    source_language = "auto"
}

# The -Form parameter requires PowerShell 6.1 or later
Invoke-RestMethod -Uri $uri -Method Post -Form $form -OutFile "translated_document.pptx"

🌐 Supported Languages

The API supports 25+ languages including:

  • Spanish (es), French (fr), German (de)
  • Italian (it), Portuguese (pt), Russian (ru)
  • Chinese (zh), Japanese (ja), Korean (ko)
  • Arabic (ar), Hindi (hi), Dutch (nl)
  • And many more...

Full list available at: GET /languages

⚙️ Configuration

Edit the .env file to configure the service:

# Translation Service (google, deepl, libre)
TRANSLATION_SERVICE=google

# DeepL API Key (if using DeepL)
DEEPL_API_KEY=your_api_key_here

# File Upload Limits
MAX_FILE_SIZE_MB=50

# Directory Configuration
UPLOAD_DIR=./uploads
OUTPUT_DIR=./outputs
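
As an illustration of how these variables might be consumed, here is a minimal sketch of a settings loader. It is not the contents of config.py: the Settings class, load_settings, and the fallback defaults (copied from the sample values above) are assumptions, and the sketch expects the .env file to have been loaded into the process environment already.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional

SUPPORTED_SERVICES = ("google", "deepl", "libre")


@dataclass(frozen=True)
class Settings:
    translation_service: str
    deepl_api_key: Optional[str]
    max_file_size_bytes: int
    upload_dir: Path
    output_dir: Path


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read and validate the variables listed in the .env template."""
    service = env.get("TRANSLATION_SERVICE", "google").strip().lower()
    if service not in SUPPORTED_SERVICES:
        raise ValueError(f"TRANSLATION_SERVICE must be one of {SUPPORTED_SERVICES}")

    api_key = env.get("DEEPL_API_KEY") or None
    if service == "deepl" and not api_key:
        raise ValueError("DEEPL_API_KEY is required when TRANSLATION_SERVICE=deepl")

    return Settings(
        translation_service=service,
        deepl_api_key=api_key,
        # Convert megabytes to bytes once so upload checks can compare directly.
        max_file_size_bytes=int(env.get("MAX_FILE_SIZE_MB", "50")) * 1024 * 1024,
        upload_dir=Path(env.get("UPLOAD_DIR", "./uploads")),
        output_dir=Path(env.get("OUTPUT_DIR", "./outputs")),
    )
```

Validating at startup surfaces a missing DeepL key immediately instead of on the first translation request.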

🔌 Model Context Protocol (MCP) Integration

This API is designed to be easily wrapped as an MCP server for future integration with AI assistants and tools.

MCP Server Structure (Future Implementation)

{
  "mcpServers": {
    "document-translator": {
      "command": "python",
      "args": ["-m", "mcp_server"],
      "env": {
        "API_URL": "http://localhost:8000"
      }
    }
  }
}

Example MCP Tools

The MCP wrapper will expose these tools:

  1. translate_document - Translate a single document
  2. translate_batch - Translate multiple documents
  3. get_supported_languages - List supported languages
  4. check_translation_status - Check status of translation
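
Since the MCP wrapper is not implemented yet, the following is only a sketch of how the planned tools could map onto the existing HTTP endpoints. The routing table and resolve_tool are hypothetical. Note that no status endpoint is documented in this README, so check_translation_status is left unmapped.

```python
import os
from typing import Dict, NamedTuple, Optional, Tuple


class Route(NamedTuple):
    method: str
    path: str


# Planned tool name -> existing API endpoint (None where no endpoint exists yet).
TOOL_ROUTES: Dict[str, Optional[Route]] = {
    "translate_document": Route("POST", "/translate"),
    "translate_batch": Route("POST", "/translate-batch"),
    "get_supported_languages": Route("GET", "/languages"),
    "check_translation_status": None,  # no matching endpoint documented
}


def resolve_tool(name: str, api_url: Optional[str] = None) -> Tuple[str, str]:
    """Return (HTTP method, full URL) for a tool, using API_URL as in the config above."""
    if name not in TOOL_ROUTES:
        raise KeyError(f"unknown tool: {name}")
    route = TOOL_ROUTES[name]
    if route is None:
        raise NotImplementedError(f"{name} has no backing endpoint yet")
    base = (api_url or os.environ.get("API_URL", "http://localhost:8000")).rstrip("/")
    return route.method, base + route.path


print(resolve_tool("translate_document", "http://localhost:8000"))
# → ('POST', 'http://localhost:8000/translate')
```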

🏗️ Project Structure

Translate/
├── main.py                 # FastAPI application
├── config.py              # Configuration management
├── requirements.txt       # Dependencies
├── .env.example          # Environment template
├── services/
│   ├── __init__.py
│   └── translation_service.py    # Translation abstraction layer
├── translators/
│   ├── __init__.py
│   ├── excel_translator.py       # Excel translation logic
│   ├── word_translator.py        # Word translation logic
│   └── pptx_translator.py        # PowerPoint translation logic
├── utils/
│   ├── __init__.py
│   ├── file_handler.py           # File operations
│   └── exceptions.py             # Custom exceptions
├── uploads/              # Temporary upload storage
└── outputs/              # Translated files

🧪 Testing

Manual Testing

  1. Start the API server
  2. Navigate to http://localhost:8000/docs
  3. Use the interactive Swagger UI to test endpoints

Test Files

Prepare test files with:

  • Complex formatting (multiple fonts, colors, styles)
  • Embedded images and media
  • Tables and merged cells
  • Formulas (for Excel)
  • Multiple sections/slides

🛠️ Technical Details

Libraries Used

  • FastAPI: Modern web framework for building APIs
  • openpyxl: Excel file manipulation with formatting preservation
  • python-docx: Word document handling
  • python-pptx: PowerPoint presentation processing
  • deep-translator: Multi-provider translation service
  • Uvicorn: ASGI server for running FastAPI

Design Principles

  1. Modular Architecture: Each file type has its own translator module
  2. Provider Abstraction: Easy to swap translation services (Google, DeepL, LibreTranslate)
  3. Format Preservation: All translators maintain original document structure
  4. Error Handling: Comprehensive error handling and logging
  5. Scalability: Ready for MCP integration and microservices architecture
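
The first two principles can be sketched as follows. The module paths come from the project structure above; the class and function names (TranslationProvider, EchoProvider, get_provider, translator_module_for) are hypothetical and do not reflect the actual code in services/translation_service.py.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, Type


class TranslationProvider(ABC):
    """Common interface so callers never depend on a specific service."""

    @abstractmethod
    def translate(self, text: str, target: str, source: str = "auto") -> str:
        ...


class EchoProvider(TranslationProvider):
    """Test double that tags text instead of calling a real service."""

    def translate(self, text: str, target: str, source: str = "auto") -> str:
        return f"[{target}] {text}"


# Real providers (Google, DeepL, LibreTranslate) would be registered here.
PROVIDERS: Dict[str, Type[TranslationProvider]] = {"echo": EchoProvider}

# One translator module per file type.
TRANSLATOR_MODULES = {
    ".xlsx": "translators.excel_translator",
    ".docx": "translators.word_translator",
    ".pptx": "translators.pptx_translator",
}


def get_provider(name: str) -> TranslationProvider:
    """Instantiate the provider registered under `name`."""
    try:
        return PROVIDERS[name.lower()]()
    except KeyError:
        raise ValueError(f"unknown translation service: {name}") from None


def translator_module_for(filename: str) -> str:
    """Pick the translator module from the file extension."""
    ext = Path(filename).suffix.lower()
    if ext not in TRANSLATOR_MODULES:
        raise ValueError(f"unsupported file type: {ext or filename}")
    return TRANSLATOR_MODULES[ext]
```

With this shape, swapping services is a one-line configuration change, and supporting a new document type means adding one module and one dictionary entry.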

🔐 Security Considerations

For production deployment:

  1. Configure CORS properly in main.py
  2. Add authentication for API endpoints
  3. Implement rate limiting to prevent abuse
  4. Use HTTPS for secure file transmission
  5. Sanitize file uploads to prevent malicious files
  6. Set appropriate file size limits
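
Items 5 and 6 can be partly covered by a few cheap checks before a file reaches a translator. The sketch below is one possible approach, not the project's file_handler.py: it checks the extension, the size limit (assumed to mirror MAX_FILE_SIZE_MB=50), and that the payload is a ZIP package containing the [Content_Types].xml part that every Office Open XML file has. The name validate_upload is hypothetical.

```python
import io
import zipfile
from pathlib import Path

ALLOWED_EXTENSIONS = {".xlsx", ".docx", ".pptx"}
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024  # assumed to mirror MAX_FILE_SIZE_MB=50


def validate_upload(filename: str, content: bytes) -> None:
    """Raise ValueError if the upload fails basic sanity checks."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or filename}")

    if len(content) > MAX_FILE_SIZE_BYTES:
        raise ValueError("file exceeds the configured size limit")

    # .xlsx, .docx and .pptx files are ZIP packages; reject anything else early.
    try:
        with zipfile.ZipFile(io.BytesIO(content)) as archive:
            names = archive.namelist()
    except zipfile.BadZipFile as exc:
        raise ValueError("file is not a valid Office Open XML package") from exc

    if "[Content_Types].xml" not in names:
        raise ValueError("ZIP archive is missing [Content_Types].xml")
```

These checks only reject obviously wrong uploads. They do not detect macros, oversized decompressed content, or other hostile payloads, which need dedicated scanning.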

📝 License

MIT License - Feel free to use this project for your needs.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📧 Support

For issues and questions, please open an issue on the repository.


Built with ❤️ using Python and FastAPI