Initial commit: Document Translation API with Excel, Word, PowerPoint support
8
.env.example
Normal file
@ -0,0 +1,8 @@
|
||||
# Translation Service Configuration
|
||||
TRANSLATION_SERVICE=google # Options: google, deepl, libre
|
||||
DEEPL_API_KEY=your_deepl_api_key_here
|
||||
|
||||
# API Configuration
|
||||
MAX_FILE_SIZE_MB=50
|
||||
UPLOAD_DIR=./uploads
|
||||
OUTPUT_DIR=./outputs
|
||||
53
.gitignore
vendored
Normal file
@ -0,0 +1,53 @@
|
||||
# Python
|
||||
__pycache__/
|
||||
*.py[cod]
|
||||
*$py.class
|
||||
*.so
|
||||
.Python
|
||||
build/
|
||||
develop-eggs/
|
||||
dist/
|
||||
downloads/
|
||||
eggs/
|
||||
.eggs/
|
||||
lib/
|
||||
lib64/
|
||||
parts/
|
||||
sdist/
|
||||
var/
|
||||
wheels/
|
||||
*.egg-info/
|
||||
.installed.cfg
|
||||
*.egg
|
||||
|
||||
# Virtual Environment
|
||||
venv/
|
||||
env/
|
||||
ENV/
|
||||
|
||||
# Environment variables
|
||||
.env
|
||||
|
||||
# IDE
|
||||
.vscode/
|
||||
.idea/
|
||||
*.swp
|
||||
*.swo
|
||||
|
||||
# Uploads and outputs
|
||||
uploads/
|
||||
outputs/
|
||||
temp/
|
||||
translated_files/
|
||||
translated_test.*
|
||||
|
||||
# Logs
|
||||
*.log
|
||||
|
||||
# UV / UV lock
|
||||
.venv/
|
||||
uv.lock
|
||||
|
||||
# Test files
|
||||
test_*.py
|
||||
test_*.ipynb
|
||||
1
.python-version
Normal file
@ -0,0 +1 @@
|
||||
3.12
|
||||
325
ARCHITECTURE.md
Normal file
@ -0,0 +1,325 @@
|
||||
# Document Translation API - Architecture Overview
|
||||
|
||||
## System Architecture
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ FastAPI Application │
|
||||
│ (main.py) │
|
||||
└─────────────────────┬───────────────────────────────────────┘
|
||||
│
|
||||
├──> File Upload Endpoint (/translate)
|
||||
│ ├─> File Validation
|
||||
│ ├─> File Type Detection
|
||||
│ └─> Route to Appropriate Translator
|
||||
│
|
||||
├──> Batch Translation (/translate-batch)
|
||||
│
|
||||
└──> Utility Endpoints
|
||||
├─> /health
|
||||
├─> /languages
|
||||
└─> /download/{filename}
|
||||
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Translation Layer │
|
||||
└─────────────────────┬───────────────────────────────────────┘
|
||||
│
|
||||
┌─────────────┼─────────────┐
|
||||
│ │ │
|
||||
▼ ▼ ▼
|
||||
Excel Word PowerPoint
|
||||
Translator Translator Translator
|
||||
(.xlsx) (.docx) (.pptx)
|
||||
│ │ │
|
||||
└─────────────┼─────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Translation Service Abstraction │
|
||||
│ (Pluggable Backend) │
|
||||
└─────────────────────┬───────────────────────────────────────┘
|
||||
│
|
||||
┌─────────────┼─────────────┐
|
||||
▼ ▼ ▼
|
||||
Google DeepL LibreTranslate
|
||||
Translate (API Key) (Self-hosted)
|
||||
```
|
||||
|
||||
## Component Breakdown
|
||||
|
||||
### 1. API Layer (`main.py`)
|
||||
- **FastAPI Application**: RESTful API endpoints
|
||||
- **File Upload Handling**: Multipart form data processing
|
||||
- **Request Validation**: Pydantic models for type safety
|
||||
- **Error Handling**: Custom exception handlers
|
||||
- **CORS Configuration**: Cross-origin resource sharing
|
||||
|
||||
### 2. Translation Coordinators
|
||||
|
||||
#### Excel Translator (`translators/excel_translator.py`)
|
||||
```
|
||||
Input: .xlsx file
|
||||
Process:
|
||||
1. Load workbook with openpyxl (preserve VBA, formulas)
|
||||
2. Iterate through all worksheets
|
||||
3. For each cell:
|
||||
- Detect type (text, formula, number)
|
||||
- If text: translate
|
||||
- If formula: extract and translate strings
|
||||
- Preserve: formatting, colors, borders, merges
|
||||
4. Translate sheet names
|
||||
5. Maintain image positions
|
||||
Output: Translated .xlsx with identical structure
|
||||
```
|
||||
|
||||
#### Word Translator (`translators/word_translator.py`)
|
||||
```
|
||||
Input: .docx file
|
||||
Process:
|
||||
1. Load document with python-docx
|
||||
2. Traverse document tree:
|
||||
- Paragraphs → Runs (preserve formatting per run)
|
||||
- Tables → Cells → Paragraphs
|
||||
- Headers/Footers (all section types)
|
||||
3. Translate text while preserving:
|
||||
- Font family, size, color
|
||||
- Bold, italic, underline
|
||||
- Lists (numbered/bulleted)
|
||||
- Styles (Heading 1, Normal, etc.)
|
||||
4. Images remain embedded via relationships
|
||||
Output: Translated .docx with preserved layout
|
||||
```
|
||||
|
||||
#### PowerPoint Translator (`translators/pptx_translator.py`)
|
||||
```
|
||||
Input: .pptx file
|
||||
Process:
|
||||
1. Load presentation with python-pptx
|
||||
2. For each slide:
|
||||
- Shapes → Text Frames → Paragraphs → Runs
|
||||
- Tables → Cells → Text Frames
|
||||
- Groups → Nested Shapes
|
||||
- Speaker Notes
|
||||
3. Preserve:
|
||||
- Slide layouts
|
||||
- Animations (timing, effects)
|
||||
- Transitions
|
||||
- Image positions and layering
|
||||
- Shape properties (size, position, rotation)
|
||||
Output: Translated .pptx with identical design
|
||||
```
|
||||
|
||||
### 3. Translation Service Layer
|
||||
|
||||
**Abstract Interface**: `TranslationProvider`
|
||||
- Allows swapping translation backends without changing translators
|
||||
- Configurable via environment variables
|
||||
|
||||
**Implementations**:
|
||||
1. **Google Translator** (Default, Free)
|
||||
- Uses deep-translator library
|
||||
- No API key required
|
||||
- Rate limited
|
||||
|
||||
2. **DeepL** (Premium, API Key Required)
|
||||
- Higher quality translations
|
||||
- Better context understanding
|
||||
- Requires paid API key
|
||||
|
||||
3. **LibreTranslate** (Self-hosted)
|
||||
- Open-source alternative
|
||||
- Full control and privacy
|
||||
- Requires local installation
|
||||
|
||||
### 4. Utility Layer
|
||||
|
||||
#### File Handler (`utils/file_handler.py`)
|
||||
- File validation (size, type)
|
||||
- Unique filename generation (UUID-based)
|
||||
- Safe file operations
|
||||
- Cleanup management
|
||||
|
||||
#### Exception Handling (`utils/exceptions.py`)
|
||||
- Custom exception types
|
||||
- HTTP status code mapping
|
||||
- User-friendly error messages
|
||||
|
||||
### 5. Configuration (`config.py`)
|
||||
- Environment variable loading
|
||||
- Directory management
|
||||
- Service configuration
|
||||
- Validation rules
|
||||
|
||||
## Data Flow
|
||||
|
||||
### Single Document Translation
|
||||
```
|
||||
1. Client uploads file via POST /translate
|
||||
└─> File + target_language + source_language
|
||||
|
||||
2. API validates request
|
||||
├─> Check file extension
|
||||
├─> Verify file size
|
||||
└─> Validate language codes
|
||||
|
||||
3. Save to temporary storage
|
||||
└─> uploads/{unique_id}_{filename}
|
||||
|
||||
4. Route to appropriate translator
|
||||
├─> .xlsx → ExcelTranslator
|
||||
├─> .docx → WordTranslator
|
||||
└─> .pptx → PowerPointTranslator
|
||||
|
||||
5. Translator processes document
|
||||
├─> Parse structure
|
||||
├─> Extract text elements
|
||||
├─> Call translation service for each text
|
||||
├─> Apply translations while preserving formatting
|
||||
└─> Save to outputs/{unique_id}_translated_{filename}
|
||||
|
||||
6. Return translated file
|
||||
└─> FileResponse with download headers
|
||||
|
||||
7. Cleanup (optional)
|
||||
└─> Delete uploaded file
|
||||
```
|
||||
|
||||
## Formatting Preservation Strategies
|
||||
|
||||
### Excel
|
||||
- **Cell Properties**: Copied before translation
|
||||
- **Merged Cells**: Detected via `cell.merge_cells`
|
||||
- **Formulas**: Regex parsing to extract strings
|
||||
- **Images**: Anchored to cells, preserved via relationships
|
||||
- **Charts**: Remain linked to data ranges
|
||||
|
||||
### Word
|
||||
- **Run-level Translation**: Preserves inline formatting
|
||||
- **Style Inheritance**: Paragraph styles maintained
|
||||
- **Tables**: Structure preserved, cells translated individually
|
||||
- **Images**: Embedded via relationships, not modified
|
||||
- **Headers/Footers**: Treated as separate sections
|
||||
|
||||
### PowerPoint
|
||||
- **Shape Hierarchy**: Recursive traversal
|
||||
- **Text Frames**: Paragraph and run-level translation
|
||||
- **Layouts**: Template references preserved
|
||||
- **Animations**: Stored separately, not affected
|
||||
- **Media**: File references remain intact
|
||||
|
||||
## Scalability Considerations
|
||||
|
||||
### Horizontal Scaling
|
||||
- Stateless design (no session storage)
|
||||
- Files stored on disk (can move to S3/Azure Blob)
|
||||
- Load balancer compatible
|
||||
|
||||
### Performance Optimization
|
||||
- **Async I/O**: FastAPI's async capabilities
|
||||
- **Batch Processing**: Multiple files in parallel
|
||||
- **Caching**: Translation cache for repeated text
|
||||
- **Streaming**: Large file chunking (future enhancement)
|
||||
|
||||
### Resource Management
|
||||
- **File Cleanup**: Automatic deletion after translation
|
||||
- **Size Limits**: Configurable max file size
|
||||
- **Rate Limiting**: Prevent API abuse
|
||||
- **Queue System**: Redis-based job queue (future)
|
||||
|
||||
## Future MCP Integration
|
||||
|
||||
### MCP Server Wrapper
|
||||
The API is designed to be wrapped as an MCP server:
|
||||
|
||||
```python
|
||||
# MCP Tools
|
||||
1. translate_document(file_path, target_lang) → translated_file
|
||||
2. get_supported_languages() → language_list
|
||||
3. check_api_health() → status
|
||||
|
||||
# Benefits
|
||||
- AI assistants can translate documents seamlessly
|
||||
- Integration with Claude, GPT, and other LLMs
|
||||
- Workflow automation in AI pipelines
|
||||
```
|
||||
|
||||
## Security Architecture
|
||||
|
||||
### Input Validation
|
||||
- File type whitelist
|
||||
- Size restrictions
|
||||
- Extension verification
|
||||
- Content-type checking
|
||||
|
||||
### File Isolation
|
||||
- Unique filenames (UUID)
|
||||
- Temporary storage
|
||||
- Automatic cleanup
|
||||
- No path traversal
|
||||
|
||||
### API Security (Production)
|
||||
- Rate limiting (not yet implemented)
|
||||
- Authentication/Authorization (future)
|
||||
- HTTPS/TLS encryption (deployment config)
|
||||
- Input sanitization
|
||||
|
||||
## Deployment Architecture
|
||||
|
||||
### Development
|
||||
```
|
||||
Local Machine
|
||||
├─> Python 3.11+
|
||||
├─> Virtual Environment
|
||||
├─> SQLite (if needed for tracking)
|
||||
└─> Local file storage
|
||||
```
|
||||
|
||||
### Production (Recommended)
|
||||
```
|
||||
Cloud Platform (AWS/Azure/GCP)
|
||||
├─> Container (Docker)
|
||||
├─> Load Balancer
|
||||
├─> Multiple API Instances
|
||||
├─> Object Storage (S3/Blob)
|
||||
├─> Redis (caching/queue)
|
||||
├─> Monitoring (Prometheus/Grafana)
|
||||
└─> Logging (ELK Stack)
|
||||
```
|
||||
|
||||
## Technology Stack
|
||||
|
||||
| Layer | Technology | Purpose |
|
||||
|-------|------------|---------|
|
||||
| API Framework | FastAPI | High-performance async API |
|
||||
| Excel Processing | openpyxl | Full Excel feature support |
|
||||
| Word Processing | python-docx | DOCX manipulation |
|
||||
| PowerPoint Processing | python-pptx | PPTX handling |
|
||||
| Translation | deep-translator | Multi-provider abstraction |
|
||||
| Server | Uvicorn | ASGI server |
|
||||
| Validation | Pydantic | Request/response validation |
|
||||
|
||||
## Extension Points
|
||||
|
||||
1. **Add Translation Provider**
|
||||
- Implement `TranslationProvider` interface
|
||||
- Register in `translation_service.py`
|
||||
|
||||
2. **Add Document Type**
|
||||
- Create new translator class
|
||||
- Register in routing logic
|
||||
- Add to supported extensions
|
||||
|
||||
3. **Add MCP Server**
|
||||
- Use provided `mcp_server_example.py`
|
||||
- Configure in MCP settings
|
||||
- Deploy alongside API
|
||||
|
||||
4. **Add Caching**
|
||||
- Implement translation cache
|
||||
- Use Redis or in-memory cache
|
||||
- Reduce API calls for repeated text
|
||||
|
||||
5. **Add Queue System**
|
||||
- Implement Celery/RQ workers
|
||||
- Handle long-running translations
|
||||
- Provide job status endpoints
|
||||
78
DEPLOYMENT.md
Normal file
@ -0,0 +1,78 @@
|
||||
# Development and Production Setup Scripts
|
||||
|
||||
## Start the API Server
|
||||
|
||||
### Development Mode (with auto-reload)
|
||||
```powershell
|
||||
# Activate virtual environment
|
||||
.\venv\Scripts\Activate.ps1
|
||||
|
||||
# Start server with hot-reload
|
||||
python main.py
|
||||
```
|
||||
|
||||
### Production Mode
|
||||
```powershell
|
||||
# Activate virtual environment
|
||||
.\venv\Scripts\Activate.ps1
|
||||
|
||||
# Start with uvicorn (better performance)
|
||||
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4
|
||||
```
|
||||
|
||||
## Docker Deployment (Optional)
|
||||
|
||||
### Create Dockerfile
|
||||
```dockerfile
|
||||
FROM python:3.11-slim
|
||||
|
||||
WORKDIR /app
|
||||
|
||||
COPY requirements.txt .
|
||||
RUN pip install --no-cache-dir -r requirements.txt
|
||||
|
||||
COPY . .
|
||||
|
||||
# Create directories
|
||||
RUN mkdir -p uploads outputs temp
|
||||
|
||||
EXPOSE 8000
|
||||
|
||||
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
|
||||
```
|
||||
|
||||
### Build and Run
|
||||
```powershell
|
||||
# Build image
|
||||
docker build -t document-translator-api .
|
||||
|
||||
# Run container
|
||||
docker run -d -p 8000:8000 -v ${PWD}/uploads:/app/uploads -v ${PWD}/outputs:/app/outputs document-translator-api
|
||||
```
|
||||
|
||||
## Environment Variables for Production
|
||||
|
||||
```env
|
||||
TRANSLATION_SERVICE=google
|
||||
DEEPL_API_KEY=your_production_api_key
|
||||
MAX_FILE_SIZE_MB=100
|
||||
UPLOAD_DIR=/app/uploads
|
||||
OUTPUT_DIR=/app/outputs
|
||||
```
|
||||
|
||||
## Monitoring and Logging
|
||||
|
||||
Add to requirements.txt for production:
|
||||
```
|
||||
prometheus-fastapi-instrumentator==6.1.0
|
||||
python-json-logger==2.0.7
|
||||
```
|
||||
|
||||
## Security Hardening
|
||||
|
||||
1. Add rate limiting
|
||||
2. Implement authentication (JWT/API keys)
|
||||
3. Enable HTTPS/TLS
|
||||
4. Sanitize file uploads
|
||||
5. Implement virus scanning for uploads
|
||||
6. Add request validation middleware
|
||||
230
QUICKSTART.md
Normal file
@ -0,0 +1,230 @@
|
||||
# 🚀 Quick Start Guide - Document Translation API
|
||||
|
||||
## Step-by-Step Setup (5 Minutes)
|
||||
|
||||
### 1️⃣ Open PowerShell in Project Directory
|
||||
```powershell
|
||||
cd d:\Translate
|
||||
```
|
||||
|
||||
### 2️⃣ Run the Startup Script
|
||||
```powershell
|
||||
.\start.ps1
|
||||
```
|
||||
|
||||
This will automatically:
|
||||
- Create a virtual environment
|
||||
- Install all dependencies
|
||||
- Create necessary directories
|
||||
- Start the API server
|
||||
|
||||
### 3️⃣ Test the API
|
||||
|
||||
**Open another PowerShell window** and run:
|
||||
```powershell
|
||||
python test_api.py
|
||||
```
|
||||
|
||||
Or visit in your browser:
|
||||
- **API Documentation**: http://localhost:8000/docs
|
||||
- **API Status**: http://localhost:8000/health
|
||||
|
||||
## 📤 Translate Your First Document
|
||||
|
||||
### Using cURL (PowerShell)
|
||||
```powershell
|
||||
$file = Get-Item "your_document.xlsx"
|
||||
Invoke-RestMethod -Uri "http://localhost:8000/translate" `
|
||||
-Method Post `
|
||||
-Form @{
|
||||
file = $file
|
||||
target_language = "es"
|
||||
} `
|
||||
-OutFile "translated_document.xlsx"
|
||||
```
|
||||
|
||||
### Using Python
|
||||
```python
|
||||
import requests
|
||||
|
||||
with open('document.docx', 'rb') as f:
|
||||
response = requests.post(
|
||||
'http://localhost:8000/translate',
|
||||
files={'file': f},
|
||||
data={'target_language': 'fr'}
|
||||
)
|
||||
|
||||
with open('translated_document.docx', 'wb') as out:
|
||||
out.write(response.content)
|
||||
```
|
||||
|
||||
### Using the Interactive API Docs
|
||||
|
||||
1. Go to http://localhost:8000/docs
|
||||
2. Click on **POST /translate**
|
||||
3. Click **"Try it out"**
|
||||
4. Upload your file
|
||||
5. Enter target language (e.g., `es` for Spanish)
|
||||
6. Click **"Execute"**
|
||||
7. Download the translated file
|
||||
|
||||
## 🌍 Supported Languages
|
||||
|
||||
Use these language codes in the `target_language` parameter:
|
||||
|
||||
| Code | Language | Code | Language |
|
||||
|------|----------|------|----------|
|
||||
| `es` | Spanish | `fr` | French |
|
||||
| `de` | German | `it` | Italian |
|
||||
| `pt` | Portuguese | `ru` | Russian |
|
||||
| `zh` | Chinese | `ja` | Japanese |
|
||||
| `ko` | Korean | `ar` | Arabic |
|
||||
| `hi` | Hindi | `nl` | Dutch |
|
||||
|
||||
**Full list**: http://localhost:8000/languages
|
||||
|
||||
## 📋 Supported File Types
|
||||
|
||||
| Format | Extension | What's Preserved |
|
||||
|--------|-----------|------------------|
|
||||
| **Excel** | `.xlsx` | Formulas, merged cells, colors, borders, images |
|
||||
| **Word** | `.docx` | Styles, tables, headers/footers, images |
|
||||
| **PowerPoint** | `.pptx` | Layouts, animations, transitions, media |
|
||||
|
||||
## 🔧 Configuration
|
||||
|
||||
Edit `.env` file to customize:
|
||||
|
||||
```env
|
||||
# Translation service: google (free) or deepl (requires API key)
|
||||
TRANSLATION_SERVICE=google
|
||||
|
||||
# For DeepL (premium translation)
|
||||
DEEPL_API_KEY=your_api_key_here
|
||||
|
||||
# Maximum file size in MB
|
||||
MAX_FILE_SIZE_MB=50
|
||||
```
|
||||
|
||||
## ⚠️ Troubleshooting
|
||||
|
||||
### Issue: "Virtual environment activation failed"
|
||||
**Solution**:
|
||||
```powershell
|
||||
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
|
||||
```
|
||||
|
||||
### Issue: "Module not found"
|
||||
**Solution**:
|
||||
```powershell
|
||||
.\venv\Scripts\Activate.ps1
|
||||
pip install -r requirements.txt
|
||||
```
|
||||
|
||||
### Issue: "Port 8000 already in use"
|
||||
**Solution**:
|
||||
Edit `main.py` line 307:
|
||||
```python
|
||||
uvicorn.run(app, host="0.0.0.0", port=8001, reload=True)
|
||||
```
|
||||
|
||||
### Issue: "Translation quality is poor"
|
||||
**Solution**:
|
||||
1. Get a DeepL API key from https://www.deepl.com/pro-api
|
||||
2. Update `.env`:
|
||||
```env
|
||||
TRANSLATION_SERVICE=deepl
|
||||
DEEPL_API_KEY=your_key_here
|
||||
```
|
||||
|
||||
## 📦 Project Structure
|
||||
|
||||
```
|
||||
Translate/
|
||||
├── main.py # FastAPI application (START HERE)
|
||||
├── config.py # Configuration management
|
||||
├── start.ps1 # Startup script (RUN THIS FIRST)
|
||||
├── test_api.py # Testing script
|
||||
│
|
||||
├── services/ # Translation service layer
|
||||
│ ├── __init__.py
|
||||
│ └── translation_service.py # Pluggable translation backend
|
||||
│
|
||||
├── translators/ # Document-specific translators
|
||||
│ ├── __init__.py
|
||||
│ ├── excel_translator.py # Excel (.xlsx) handler
|
||||
│ ├── word_translator.py # Word (.docx) handler
|
||||
│ ├── pptx_translator.py # PowerPoint (.pptx) handler
|
||||
│ └── excel_advanced.py # Advanced Excel features
|
||||
│
|
||||
├── utils/ # Utility modules
|
||||
│ ├── __init__.py
|
||||
│ ├── file_handler.py # File operations
|
||||
│ └── exceptions.py # Error handling
|
||||
│
|
||||
├── requirements.txt # Python dependencies
|
||||
├── README.md # Full documentation
|
||||
├── ARCHITECTURE.md # Technical architecture
|
||||
└── DEPLOYMENT.md # Production deployment guide
|
||||
```
|
||||
|
||||
## 🎯 Next Steps
|
||||
|
||||
### For Development
|
||||
1. ✅ Run `start.ps1` to start the server
|
||||
2. ✅ Test with `test_api.py`
|
||||
3. ✅ Try translating sample documents
|
||||
4. Read `ARCHITECTURE.md` for technical details
|
||||
|
||||
### For Production
|
||||
1. Read `DEPLOYMENT.md` for production setup
|
||||
2. Configure environment variables
|
||||
3. Set up Docker container
|
||||
4. Enable authentication and rate limiting
|
||||
|
||||
### For MCP Integration
|
||||
1. Install MCP requirements: `pip install -r requirements-mcp.txt`
|
||||
2. Review `mcp_server_example.py`
|
||||
3. Configure MCP server in your AI assistant
|
||||
|
||||
## 📞 API Endpoints Reference
|
||||
|
||||
| Endpoint | Method | Description |
|
||||
|----------|--------|-------------|
|
||||
| `/` | GET | API information |
|
||||
| `/health` | GET | Health check |
|
||||
| `/languages` | GET | Supported languages |
|
||||
| `/translate` | POST | Translate single document |
|
||||
| `/translate-batch` | POST | Translate multiple documents |
|
||||
| `/download/{filename}` | GET | Download translated file |
|
||||
| `/cleanup/{filename}` | DELETE | Delete translated file |
|
||||
|
||||
## 💡 Tips & Best Practices
|
||||
|
||||
1. **File Size**: Keep files under 50MB for best performance
|
||||
2. **Format Preservation**: More complex formatting = longer processing time
|
||||
3. **Language Codes**: Use ISO 639-1 codes (2 letters)
|
||||
4. **Cleanup**: Enable cleanup to save disk space
|
||||
5. **Batch Translation**: Use batch endpoint for multiple files
|
||||
|
||||
## 🌟 Features Highlights
|
||||
|
||||
✨ **Zero Data Loss**: All formatting, colors, styles preserved
|
||||
✨ **Formula Intelligence**: Translates text in formulas, keeps logic
|
||||
✨ **Image Preservation**: Embedded media stays in exact positions
|
||||
✨ **Smart Translation**: Auto-detects source language
|
||||
✨ **MCP Ready**: Designed for AI assistant integration
|
||||
|
||||
## 📄 License
|
||||
|
||||
MIT License - Free to use and modify
|
||||
|
||||
## 🤝 Support
|
||||
|
||||
- **Documentation**: See `README.md` for full details
|
||||
- **Issues**: Open an issue on the repository
|
||||
- **Architecture**: Read `ARCHITECTURE.md` for technical depth
|
||||
|
||||
---
|
||||
|
||||
**Ready to translate? Run `.\start.ps1` and visit http://localhost:8000/docs** 🚀
|
||||
214
QUICK_REFERENCE.md
Normal file
@ -0,0 +1,214 @@
|
||||
# 📋 QUICK REFERENCE CARD
|
||||
|
||||
## 🚀 Start Server
|
||||
```powershell
|
||||
.\start.ps1
|
||||
```
|
||||
Or manually:
|
||||
```powershell
|
||||
.\venv\Scripts\Activate.ps1
|
||||
python main.py
|
||||
```
|
||||
|
||||
## 🌐 API URLs
|
||||
| Endpoint | URL |
|
||||
|----------|-----|
|
||||
| Swagger Docs | http://localhost:8000/docs |
|
||||
| ReDoc | http://localhost:8000/redoc |
|
||||
| Health Check | http://localhost:8000/health |
|
||||
| Languages | http://localhost:8000/languages |
|
||||
|
||||
## 📤 Translate Document
|
||||
|
||||
### PowerShell
|
||||
```powershell
|
||||
$file = Get-Item "document.xlsx"
|
||||
Invoke-RestMethod -Uri "http://localhost:8000/translate" `
|
||||
-Method Post `
|
||||
-Form @{file=$file; target_language="es"} `
|
||||
-OutFile "translated.xlsx"
|
||||
```
|
||||
|
||||
### Python
|
||||
```python
|
||||
import requests
|
||||
with open('doc.xlsx', 'rb') as f:
|
||||
r = requests.post('http://localhost:8000/translate',
|
||||
files={'file': f},
|
||||
data={'target_language': 'es'})
|
||||
with open('translated.xlsx', 'wb') as out:
|
||||
out.write(r.content)
|
||||
```
|
||||
|
||||
### cURL
|
||||
```bash
|
||||
curl -X POST "http://localhost:8000/translate" \
|
||||
-F "file=@document.xlsx" \
|
||||
-F "target_language=es" \
|
||||
--output translated.xlsx
|
||||
```
|
||||
|
||||
## 🌍 Language Codes
|
||||
| Code | Language | Code | Language |
|
||||
|------|----------|------|----------|
|
||||
| `es` | Spanish | `fr` | French |
|
||||
| `de` | German | `it` | Italian |
|
||||
| `pt` | Portuguese | `ru` | Russian |
|
||||
| `zh` | Chinese | `ja` | Japanese |
|
||||
| `ko` | Korean | `ar` | Arabic |
|
||||
| `hi` | Hindi | `nl` | Dutch |
|
||||
|
||||
[Full list: http://localhost:8000/languages]
|
||||
|
||||
## 📄 Supported Formats
|
||||
- `.xlsx` - Excel (formulas, formatting, images)
|
||||
- `.docx` - Word (styles, tables, images)
|
||||
- `.pptx` - PowerPoint (layouts, animations, media)
|
||||
|
||||
## ⚙️ Configuration (.env)
|
||||
```env
|
||||
TRANSLATION_SERVICE=google # or: deepl, libre
|
||||
DEEPL_API_KEY=your_key # if using DeepL
|
||||
MAX_FILE_SIZE_MB=50 # max upload size
|
||||
```
|
||||
|
||||
## 📁 Project Structure
|
||||
```
|
||||
Translate/
|
||||
├── main.py # API application
|
||||
├── config.py # Configuration
|
||||
├── start.ps1 # Startup script
|
||||
│
|
||||
├── services/ # Translation services
|
||||
│ └── translation_service.py
|
||||
│
|
||||
├── translators/ # Format handlers
|
||||
│ ├── excel_translator.py
|
||||
│ ├── word_translator.py
|
||||
│ └── pptx_translator.py
|
||||
│
|
||||
├── utils/ # Utilities
|
||||
│ ├── file_handler.py
|
||||
│ └── exceptions.py
|
||||
│
|
||||
└── [docs]/ # Documentation
|
||||
├── README.md
|
||||
├── QUICKSTART.md
|
||||
├── ARCHITECTURE.md
|
||||
└── DEPLOYMENT.md
|
||||
```
|
||||
|
||||
## 🧪 Testing
|
||||
```powershell
|
||||
# Test API
|
||||
python test_api.py
|
||||
|
||||
# Run examples
|
||||
python examples.py
|
||||
```
|
||||
|
||||
## 🔧 Troubleshooting
|
||||
|
||||
### Port in use
|
||||
```python
|
||||
# Edit main.py line 307:
|
||||
uvicorn.run(app, host="0.0.0.0", port=8001)
|
||||
```
|
||||
|
||||
### Module not found
|
||||
```powershell
|
||||
.\venv\Scripts\Activate.ps1
|
||||
pip install -r requirements.txt
|
||||
```
|
||||
|
||||
### Execution policy (Windows)
|
||||
```powershell
|
||||
Set-ExecutionPolicy RemoteSigned -Scope CurrentUser
|
||||
```
|
||||
|
||||
## 📊 API Response Headers
|
||||
```
|
||||
X-Original-Filename: document.xlsx
|
||||
X-File-Size-MB: 2.5
|
||||
X-Target-Language: es
|
||||
```
|
||||
|
||||
## 🎯 Common Tasks
|
||||
|
||||
### Check API Status
|
||||
```powershell
|
||||
curl http://localhost:8000/health
|
||||
```
|
||||
|
||||
### List Languages
|
||||
```powershell
|
||||
curl http://localhost:8000/languages
|
||||
```
|
||||
|
||||
### Download File
|
||||
```powershell
|
||||
curl http://localhost:8000/download/filename.xlsx -o local.xlsx
|
||||
```
|
||||
|
||||
### Cleanup File
|
||||
```powershell
|
||||
curl -X DELETE http://localhost:8000/cleanup/filename.xlsx
|
||||
```
|
||||
|
||||
## 💡 Tips
|
||||
- Use `auto` for source language auto-detection
|
||||
- Set `cleanup=true` to delete uploads automatically
|
||||
- Max file size: 50MB (configurable)
|
||||
- Processing time: ~1-5 seconds per document
|
||||
|
||||
## 📚 Documentation Files
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `QUICKSTART.md` | 5-minute setup guide |
|
||||
| `README.md` | Complete documentation |
|
||||
| `ARCHITECTURE.md` | Technical design |
|
||||
| `DEPLOYMENT.md` | Production setup |
|
||||
| `CHECKLIST.md` | Feature checklist |
|
||||
| `PROJECT_SUMMARY.md` | Project overview |
|
||||
|
||||
## 🔌 MCP Integration
|
||||
```powershell
|
||||
# Install MCP dependencies
|
||||
pip install -r requirements-mcp.txt
|
||||
|
||||
# Run MCP server
|
||||
python mcp_server_example.py
|
||||
```
|
||||
|
||||
## 📞 Quick Commands
|
||||
|
||||
| Command | Purpose |
|
||||
|---------|---------|
|
||||
| `.\start.ps1` | Start API server |
|
||||
| `python test_api.py` | Test API |
|
||||
| `python examples.py` | Run examples |
|
||||
| `pip install -r requirements.txt` | Install deps |
|
||||
|
||||
## 🎨 Format Preservation
|
||||
|
||||
### Excel
|
||||
✅ Formulas, merged cells, fonts, colors, borders, images
|
||||
|
||||
### Word
|
||||
✅ Styles, headings, lists, tables, headers/footers, images
|
||||
|
||||
### PowerPoint
|
||||
✅ Layouts, animations, transitions, media, positioning
|
||||
|
||||
---
|
||||
|
||||
## 🚀 QUICK START
|
||||
```powershell
|
||||
cd d:\Translate
|
||||
.\start.ps1
|
||||
# Visit: http://localhost:8000/docs
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
**Print this card for quick reference! 📋**
|
||||
303
README.md
Normal file
@ -0,0 +1,303 @@
|
||||
# Document Translation API
|
||||
|
||||
A powerful Python API for translating complex structured documents (Excel, Word, PowerPoint) while **strictly preserving** the original formatting, layout, and embedded media.
|
||||
|
||||
## 🎯 Features
|
||||
|
||||
### Excel Translation (.xlsx)
|
||||
- ✅ Translates all cell content and sheet names
|
||||
- ✅ Preserves cell merging
|
||||
- ✅ Maintains font styles (size, bold, italic, color)
|
||||
- ✅ Keeps background colors and borders
|
||||
- ✅ Translates text within formulas while preserving formula structure
|
||||
- ✅ Retains embedded images in original positions
|
||||
|
||||
### Word Translation (.docx)
|
||||
- ✅ Translates body text, headers, footers, and tables
|
||||
- ✅ Preserves heading styles and paragraph formatting
|
||||
- ✅ Maintains lists (numbered/bulleted)
|
||||
- ✅ Keeps embedded images, charts, and SmartArt in place
|
||||
- ✅ Preserves table structures and cell formatting
|
||||
|
||||
### PowerPoint Translation (.pptx)
|
||||
- ✅ Translates slide titles, body text, and speaker notes
|
||||
- ✅ Preserves slide layouts and transitions
|
||||
- ✅ Maintains animations
|
||||
- ✅ Keeps images, videos, and shapes in exact positions
|
||||
- ✅ Preserves layering order
|
||||
|
||||
## 🚀 Quick Start
|
||||
|
||||
### Installation
|
||||
|
||||
1. **Clone the repository:**
|
||||
```powershell
|
||||
git clone <repository-url>
|
||||
cd Translate
|
||||
```
|
||||
|
||||
2. **Create a virtual environment:**
|
||||
```powershell
|
||||
python -m venv venv
|
||||
.\venv\Scripts\Activate.ps1
|
||||
```
|
||||
|
||||
3. **Install dependencies:**
|
||||
```powershell
|
||||
pip install -r requirements.txt
|
||||
```
|
||||
|
||||
4. **Configure environment:**
|
||||
```powershell
|
||||
cp .env.example .env
|
||||
# Edit .env with your preferred settings
|
||||
```
|
||||
|
||||
5. **Run the API:**
|
||||
```powershell
|
||||
python main.py
|
||||
```
|
||||
|
||||
The API will start on `http://localhost:8000`
|
||||
|
||||
## 📚 API Documentation
|
||||
|
||||
Once the server is running, visit:
|
||||
- **Swagger UI**: http://localhost:8000/docs
|
||||
- **ReDoc**: http://localhost:8000/redoc
|
||||
|
||||
## 🔧 API Endpoints
|
||||
|
||||
### POST /translate
|
||||
Translate a single document
|
||||
|
||||
**Request:**
|
||||
```bash
|
||||
curl -X POST "http://localhost:8000/translate" \
|
||||
-F "file=@document.xlsx" \
|
||||
-F "target_language=es" \
|
||||
-F "source_language=auto"
|
||||
```
|
||||
|
||||
**Response:**
|
||||
Returns the translated document file
|
||||
|
||||
### POST /translate-batch
|
||||
Translate multiple documents at once
|
||||
|
||||
**Request:**
|
||||
```bash
|
||||
curl -X POST "http://localhost:8000/translate-batch" \
|
||||
-F "files=@document1.docx" \
|
||||
-F "files=@document2.pptx" \
|
||||
-F "target_language=fr"
|
||||
```
|
||||
|
||||
### GET /languages
|
||||
Get list of supported language codes
|
||||
|
||||
### GET /health
|
||||
Health check endpoint
|
||||
|
||||
## 💻 Usage Examples
|
||||
|
||||
### Python Example
|
||||
|
||||
```python
|
||||
import requests
|
||||
|
||||
# Translate a document
|
||||
with open('document.xlsx', 'rb') as f:
|
||||
files = {'file': f}
|
||||
data = {
|
||||
'target_language': 'es',
|
||||
'source_language': 'auto'
|
||||
}
|
||||
response = requests.post('http://localhost:8000/translate', files=files, data=data)
|
||||
|
||||
# Save translated file
|
||||
with open('translated_document.xlsx', 'wb') as output:
|
||||
output.write(response.content)
|
||||
```
|
||||
|
||||
### JavaScript/TypeScript Example
|
||||
|
||||
```javascript
|
||||
const formData = new FormData();
|
||||
formData.append('file', fileInput.files[0]);
|
||||
formData.append('target_language', 'fr');
|
||||
formData.append('source_language', 'auto');
|
||||
|
||||
const response = await fetch('http://localhost:8000/translate', {
|
||||
method: 'POST',
|
||||
body: formData
|
||||
});
|
||||
|
||||
const blob = await response.blob();
|
||||
const url = window.URL.createObjectURL(blob);
|
||||
const a = document.createElement('a');
|
||||
a.href = url;
|
||||
a.download = 'translated_document.docx';
|
||||
a.click();
|
||||
```
|
||||
|
||||
### PowerShell Example
|
||||
|
||||
```powershell
|
||||
$file = Get-Item "document.pptx"
|
||||
$uri = "http://localhost:8000/translate"
|
||||
|
||||
$form = @{
|
||||
file = $file
|
||||
target_language = "de"
|
||||
source_language = "auto"
|
||||
}
|
||||
|
||||
Invoke-RestMethod -Uri $uri -Method Post -Form $form -OutFile "translated_document.pptx"
|
||||
```
|
||||
|
||||
## 🌐 Supported Languages
|
||||
|
||||
The API supports 25+ languages including:
|
||||
- Spanish (es), French (fr), German (de)
|
||||
- Italian (it), Portuguese (pt), Russian (ru)
|
||||
- Chinese (zh), Japanese (ja), Korean (ko)
|
||||
- Arabic (ar), Hindi (hi), Dutch (nl)
|
||||
- And many more...
|
||||
|
||||
Full list available at: `GET /languages`
|
||||
|
||||
## ⚙️ Configuration
|
||||
|
||||
Edit `.env` file to configure:
|
||||
|
||||
```env
|
||||
# Translation Service (google, deepl, libre)
|
||||
TRANSLATION_SERVICE=google
|
||||
|
||||
# DeepL API Key (if using DeepL)
|
||||
DEEPL_API_KEY=your_api_key_here
|
||||
|
||||
# File Upload Limits
|
||||
MAX_FILE_SIZE_MB=50
|
||||
|
||||
# Directory Configuration
|
||||
UPLOAD_DIR=./uploads
|
||||
OUTPUT_DIR=./outputs
|
||||
```
|
||||
|
||||
## 🔌 Model Context Protocol (MCP) Integration
|
||||
|
||||
This API is designed to be easily wrapped as an MCP server for future integration with AI assistants and tools.
|
||||
|
||||
### MCP Server Structure (Future Implementation)
|
||||
|
||||
```json
|
||||
{
|
||||
"mcpServers": {
|
||||
"document-translator": {
|
||||
"command": "python",
|
||||
"args": ["-m", "mcp_server"],
|
||||
"env": {
|
||||
"API_URL": "http://localhost:8000"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Example MCP Tools
|
||||
|
||||
The MCP wrapper will expose these tools:
|
||||
|
||||
1. **translate_document** - Translate a single document
|
||||
2. **translate_batch** - Translate multiple documents
|
||||
3. **get_supported_languages** - List supported languages
|
||||
4. **check_translation_status** - Check status of translation
|
||||
|
||||
## 🏗️ Project Structure
|
||||
|
||||
```
|
||||
Translate/
|
||||
├── main.py # FastAPI application
|
||||
├── config.py # Configuration management
|
||||
├── requirements.txt # Dependencies
|
||||
├── .env.example # Environment template
|
||||
├── services/
|
||||
│ ├── __init__.py
|
||||
│ └── translation_service.py # Translation abstraction layer
|
||||
├── translators/
|
||||
│ ├── __init__.py
|
||||
│ ├── excel_translator.py # Excel translation logic
|
||||
│ ├── word_translator.py # Word translation logic
|
||||
│ └── pptx_translator.py # PowerPoint translation logic
|
||||
├── utils/
|
||||
│ ├── __init__.py
|
||||
│ ├── file_handler.py # File operations
|
||||
│ └── exceptions.py # Custom exceptions
|
||||
├── uploads/ # Temporary upload storage
|
||||
└── outputs/ # Translated files
|
||||
```
|
||||
|
||||
## 🧪 Testing
|
||||
|
||||
### Manual Testing
|
||||
|
||||
1. Start the API server
|
||||
2. Navigate to http://localhost:8000/docs
|
||||
3. Use the interactive Swagger UI to test endpoints
|
||||
|
||||
### Test Files
|
||||
|
||||
Prepare test files with:
|
||||
- Complex formatting (multiple fonts, colors, styles)
|
||||
- Embedded images and media
|
||||
- Tables and merged cells
|
||||
- Formulas (for Excel)
|
||||
- Multiple sections/slides
|
||||
|
||||
## 🛠️ Technical Details
|
||||
|
||||
### Libraries Used
|
||||
|
||||
- **FastAPI**: Modern web framework for building APIs
|
||||
- **openpyxl**: Excel file manipulation with formatting preservation
|
||||
- **python-docx**: Word document handling
|
||||
- **python-pptx**: PowerPoint presentation processing
|
||||
- **deep-translator**: Multi-provider translation service
|
||||
- **Uvicorn**: ASGI server for running FastAPI
|
||||
|
||||
### Design Principles
|
||||
|
||||
1. **Modular Architecture**: Each file type has its own translator module
|
||||
2. **Provider Abstraction**: Easy to swap translation services (Google, DeepL, LibreTranslate)
|
||||
3. **Format Preservation**: All translators maintain original document structure
|
||||
4. **Error Handling**: Comprehensive error handling and logging
|
||||
5. **Scalability**: Ready for MCP integration and microservices architecture
|
||||
|
||||
## 🔐 Security Considerations
|
||||
|
||||
For production deployment:
|
||||
|
||||
1. **Configure CORS** properly in `main.py`
|
||||
2. **Add authentication** for API endpoints
|
||||
3. **Implement rate limiting** to prevent abuse
|
||||
4. **Use HTTPS** for secure file transmission
|
||||
5. **Sanitize file uploads** to prevent malicious files
|
||||
6. **Set appropriate file size limits**
|
||||
|
||||
## 📝 License
|
||||
|
||||
MIT License - Feel free to use this project for your needs.
|
||||
|
||||
## 🤝 Contributing
|
||||
|
||||
Contributions are welcome! Please feel free to submit a Pull Request.
|
||||
|
||||
## 📧 Support
|
||||
|
||||
For issues and questions, please open an issue on the repository.
|
||||
|
||||
---
|
||||
|
||||
**Built with ❤️ using Python and FastAPI**
|
||||
47
config.py
Normal file
@ -0,0 +1,47 @@
|
||||
"""
|
||||
Configuration module for the Document Translation API
|
||||
"""
|
||||
import os
|
||||
from pathlib import Path
|
||||
from dotenv import load_dotenv
|
||||
|
||||
load_dotenv()
|
||||
|
||||
class Config:
|
||||
# Translation Service
|
||||
TRANSLATION_SERVICE = os.getenv("TRANSLATION_SERVICE", "google")
|
||||
DEEPL_API_KEY = os.getenv("DEEPL_API_KEY", "")
|
||||
|
||||
# File Upload Configuration
|
||||
MAX_FILE_SIZE_MB = int(os.getenv("MAX_FILE_SIZE_MB", "50"))
|
||||
MAX_FILE_SIZE_BYTES = MAX_FILE_SIZE_MB * 1024 * 1024
|
||||
|
||||
# Directories
|
||||
BASE_DIR = Path(__file__).parent.parent
|
||||
UPLOAD_DIR = BASE_DIR / "uploads"
|
||||
OUTPUT_DIR = BASE_DIR / "outputs"
|
||||
TEMP_DIR = BASE_DIR / "temp"
|
||||
|
||||
# Supported file types
|
||||
SUPPORTED_EXTENSIONS = {".xlsx", ".docx", ".pptx"}
|
||||
|
||||
# API Configuration
|
||||
API_TITLE = "Document Translation API"
|
||||
API_VERSION = "1.0.0"
|
||||
API_DESCRIPTION = """
|
||||
Advanced Document Translation API with strict formatting preservation.
|
||||
|
||||
Supports:
|
||||
- Excel (.xlsx) - Preserves cell formatting, formulas, merged cells, images
|
||||
- Word (.docx) - Preserves styles, tables, images, headers/footers
|
||||
- PowerPoint (.pptx) - Preserves layouts, animations, embedded media
|
||||
"""
|
||||
|
||||
@classmethod
|
||||
def ensure_directories(cls):
|
||||
"""Create necessary directories if they don't exist"""
|
||||
cls.UPLOAD_DIR.mkdir(exist_ok=True, parents=True)
|
||||
cls.OUTPUT_DIR.mkdir(exist_ok=True, parents=True)
|
||||
cls.TEMP_DIR.mkdir(exist_ok=True, parents=True)
|
||||
|
||||
config = Config()
|
||||
887
create_complex_samples.py
Normal file
@ -0,0 +1,887 @@
|
||||
"""
|
||||
Script pour créer des fichiers exemples avec structure TRÈS COMPLEXE
|
||||
Génère des fichiers Excel, Word et PowerPoint avec formatage avancé
|
||||
"""
|
||||
from pathlib import Path
|
||||
from openpyxl import Workbook
|
||||
from openpyxl.styles import Font, PatternFill, Alignment, Border, Side, Protection
|
||||
from openpyxl.styles.numbers import FORMAT_CURRENCY_USD, FORMAT_PERCENTAGE
|
||||
from openpyxl.chart import BarChart, PieChart, LineChart, Reference
|
||||
from openpyxl.drawing.image import Image as XLImage
|
||||
from openpyxl.utils import get_column_letter
|
||||
from openpyxl.worksheet.datavalidation import DataValidation
|
||||
|
||||
from docx import Document
|
||||
from docx.shared import Inches, Pt, RGBColor, Cm
|
||||
from docx.enum.text import WD_ALIGN_PARAGRAPH, WD_LINE_SPACING
|
||||
from docx.enum.style import WD_STYLE_TYPE
|
||||
from docx.oxml.ns import qn
|
||||
from docx.oxml import OxmlElement
|
||||
|
||||
from pptx import Presentation
|
||||
from pptx.util import Inches as PptxInches, Pt as PptxPt
|
||||
from pptx.enum.text import PP_ALIGN, MSO_ANCHOR
|
||||
from pptx.dml.color import RGBColor as PptxRGBColor
|
||||
from pptx.enum.shapes import MSO_SHAPE
|
||||
|
||||
from PIL import Image, ImageDraw, ImageFont
|
||||
import matplotlib.pyplot as plt
|
||||
import numpy as np
|
||||
|
||||
print("🚀 Création de fichiers exemples avec structure COMPLEXE...\n")
|
||||
|
||||
# Créer le dossier
|
||||
SAMPLE_DIR = Path("sample_files")
|
||||
SAMPLE_DIR.mkdir(exist_ok=True)
|
||||
|
||||
# ============================================================================
|
||||
# 1. EXCEL TRÈS COMPLEXE
|
||||
# ============================================================================
|
||||
print("📊 Création d'Excel super complexe...")
|
||||
|
||||
wb = Workbook()
|
||||
|
||||
# === SHEET 1: RAPPORT FINANCIER COMPLEXE ===
|
||||
ws1 = wb.active
|
||||
ws1.title = "Rapport Financier 2024"
|
||||
|
||||
# Titre principal avec fusion massive
|
||||
ws1.merge_cells('A1:H2')
|
||||
title = ws1['A1']
|
||||
title.value = "RAPPORT FINANCIER ANNUEL 2024\nAnalyse Complète et Prévisions"
|
||||
title.font = Font(name='Calibri', size=20, bold=True, color='FFFFFF')
|
||||
title.fill = PatternFill(start_color='0066CC', end_color='0066CC', fill_type='solid')
|
||||
title.alignment = Alignment(horizontal='center', vertical='center', wrap_text=True)
|
||||
title.border = Border(
|
||||
left=Side(style='thick', color='000000'),
|
||||
right=Side(style='thick', color='000000'),
|
||||
top=Side(style='thick', color='000000'),
|
||||
bottom=Side(style='thick', color='000000')
|
||||
)
|
||||
|
||||
# Sous-titre avec fusion
|
||||
ws1.merge_cells('A3:H3')
|
||||
subtitle = ws1['A3']
|
||||
subtitle.value = "Département des Ventes - Trimestre Q1-Q4"
|
||||
subtitle.font = Font(size=14, italic=True, color='0066CC')
|
||||
subtitle.alignment = Alignment(horizontal='center')
|
||||
|
||||
# En-têtes de colonnes avec style élaboré
|
||||
headers = ['Région', 'Produit', 'Unités Vendues', 'Prix Unitaire', 'Chiffre d\'Affaires', 'Coût', 'Marge Brute', 'Marge %']
|
||||
for col, header in enumerate(headers, 1):
|
||||
cell = ws1.cell(row=4, column=col)
|
||||
cell.value = header
|
||||
cell.font = Font(bold=True, color='FFFFFF', size=12)
|
||||
cell.fill = PatternFill(start_color='0066CC', end_color='0066CC', fill_type='solid')
|
||||
cell.alignment = Alignment(horizontal='center', vertical='center', wrap_text=True)
|
||||
cell.border = Border(
|
||||
left=Side(style='thin'),
|
||||
right=Side(style='thin'),
|
||||
top=Side(style='thin'),
|
||||
bottom=Side(style='thin')
|
||||
)
|
||||
|
||||
# Données complexes avec formules avancées
|
||||
regions = ['Europe du Nord', 'Europe du Sud', 'Amérique du Nord', 'Amérique du Sud', 'Asie Pacifique', 'Moyen-Orient', 'Afrique']
|
||||
products = ['Ordinateur Portable Premium', 'Tablette Professionnelle', 'Smartphone 5G', 'Écran 4K', 'Casque Sans Fil']
|
||||
|
||||
row = 5
|
||||
for region in regions:
|
||||
for i, product in enumerate(products):
|
||||
units = 100 + (row * 13) % 500
|
||||
price = 299 + (row * 37) % 1200
|
||||
cost = price * 0.6
|
||||
|
||||
ws1.cell(row=row, column=1, value=region)
|
||||
ws1.cell(row=row, column=2, value=product)
|
||||
ws1.cell(row=row, column=3, value=units)
|
||||
ws1.cell(row=row, column=4, value=price).number_format = FORMAT_CURRENCY_USD
|
||||
|
||||
# Formule: Chiffre d'affaires
|
||||
ws1.cell(row=row, column=5, value=f"=C{row}*D{row}").number_format = FORMAT_CURRENCY_USD
|
||||
|
||||
# Coût
|
||||
ws1.cell(row=row, column=6, value=cost).number_format = FORMAT_CURRENCY_USD
|
||||
|
||||
# Formule: Marge brute
|
||||
ws1.cell(row=row, column=7, value=f"=E{row}-F{row}").number_format = FORMAT_CURRENCY_USD
|
||||
|
||||
# Formule: Marge %
|
||||
ws1.cell(row=row, column=8, value=f"=IF(E{row}>0,G{row}/E{row},0)").number_format = FORMAT_PERCENTAGE
|
||||
|
||||
# Formatage conditionnel par région
|
||||
for col in range(1, 9):
|
||||
cell = ws1.cell(row=row, column=col)
|
||||
if row % 2 == 0:
|
||||
cell.fill = PatternFill(start_color='F2F2F2', end_color='F2F2F2', fill_type='solid')
|
||||
cell.border = Border(
|
||||
left=Side(style='thin', color='CCCCCC'),
|
||||
right=Side(style='thin', color='CCCCCC'),
|
||||
top=Side(style='thin', color='CCCCCC'),
|
||||
bottom=Side(style='thin', color='CCCCCC')
|
||||
)
|
||||
|
||||
row += 1
|
||||
|
||||
# Ligne de total avec formules complexes
|
||||
total_row = row + 1
|
||||
ws1.merge_cells(f'A{total_row}:B{total_row}')
|
||||
total_cell = ws1[f'A{total_row}']
|
||||
total_cell.value = "TOTAL GÉNÉRAL"
|
||||
total_cell.font = Font(bold=True, size=14, color='FFFFFF')
|
||||
total_cell.fill = PatternFill(start_color='FF6600', end_color='FF6600', fill_type='solid')
|
||||
total_cell.alignment = Alignment(horizontal='right', vertical='center')
|
||||
|
||||
for col in [3, 5, 7]:
|
||||
cell = ws1.cell(row=total_row, column=col)
|
||||
cell.value = f"=SUM({get_column_letter(col)}5:{get_column_letter(col)}{row-1})"
|
||||
cell.font = Font(bold=True, size=12)
|
||||
cell.fill = PatternFill(start_color='FF6600', end_color='FF6600', fill_type='solid')
|
||||
if col in [5, 7]:
|
||||
cell.number_format = FORMAT_CURRENCY_USD
|
||||
|
||||
# Marge % moyenne
|
||||
avg_cell = ws1.cell(row=total_row, column=8)
|
||||
avg_cell.value = f"=AVERAGE(H5:H{row-1})"
|
||||
avg_cell.number_format = FORMAT_PERCENTAGE
|
||||
avg_cell.font = Font(bold=True, size=12)
|
||||
avg_cell.fill = PatternFill(start_color='FF6600', end_color='FF6600', fill_type='solid')
|
||||
|
||||
# Ajuster les largeurs
|
||||
ws1.column_dimensions['A'].width = 20
|
||||
ws1.column_dimensions['B'].width = 30
|
||||
ws1.column_dimensions['C'].width = 15
|
||||
ws1.column_dimensions['D'].width = 15
|
||||
ws1.column_dimensions['E'].width = 18
|
||||
ws1.column_dimensions['F'].width = 15
|
||||
ws1.column_dimensions['G'].width = 15
|
||||
ws1.column_dimensions['H'].width = 12
|
||||
|
||||
# === SHEET 2: GRAPHIQUES ET ANALYSES ===
|
||||
ws2 = wb.create_sheet("Analyses Graphiques")
|
||||
|
||||
ws2['A1'] = "Analyse des Performances par Région"
|
||||
ws2['A1'].font = Font(size=16, bold=True, color='0066CC')
|
||||
ws2.merge_cells('A1:D1')
|
||||
|
||||
# Données pour graphiques
|
||||
ws2['A3'] = "Région"
|
||||
ws2['B3'] = "Total Ventes"
|
||||
ws2['C3'] = "Objectif"
|
||||
ws2['D3'] = "Écart %"
|
||||
|
||||
region_data = [
|
||||
("Europe", 2500000, 2200000),
|
||||
("Amérique", 3200000, 3000000),
|
||||
("Asie", 2800000, 2900000),
|
||||
("Autres", 1200000, 1100000)
|
||||
]
|
||||
|
||||
for i, (region, sales, target) in enumerate(region_data, 4):
|
||||
ws2.cell(row=i, column=1, value=region)
|
||||
ws2.cell(row=i, column=2, value=sales).number_format = FORMAT_CURRENCY_USD
|
||||
ws2.cell(row=i, column=3, value=target).number_format = FORMAT_CURRENCY_USD
|
||||
ws2.cell(row=i, column=4, value=f"=(B{i}-C{i})/C{i}").number_format = FORMAT_PERCENTAGE
|
||||
|
||||
# Graphique en barres
|
||||
chart1 = BarChart()
|
||||
chart1.title = "Ventes par Région vs Objectifs"
|
||||
chart1.y_axis.title = "Montant (USD)"
|
||||
chart1.x_axis.title = "Régions"
|
||||
chart1.height = 10
|
||||
chart1.width = 20
|
||||
|
||||
data = Reference(ws2, min_col=2, min_row=3, max_row=7, max_col=3)
|
||||
cats = Reference(ws2, min_col=1, min_row=4, max_row=7)
|
||||
chart1.add_data(data, titles_from_data=True)
|
||||
chart1.set_categories(cats)
|
||||
ws2.add_chart(chart1, "F3")
|
||||
|
||||
# Graphique circulaire
|
||||
chart2 = PieChart()
|
||||
chart2.title = "Répartition des Ventes par Région"
|
||||
chart2.height = 10
|
||||
chart2.width = 15
|
||||
|
||||
data2 = Reference(ws2, min_col=2, min_row=4, max_row=7)
|
||||
cats2 = Reference(ws2, min_col=1, min_row=4, max_row=7)
|
||||
chart2.add_data(data2)
|
||||
chart2.set_categories(cats2)
|
||||
ws2.add_chart(chart2, "F20")
|
||||
|
||||
# === SHEET 3: DONNÉES MENSUELLES AVEC TENDANCES ===
|
||||
ws3 = wb.create_sheet("Tendances Mensuelles")
|
||||
|
||||
ws3['A1'] = "Évolution Mensuelle des Ventes 2024"
|
||||
ws3['A1'].font = Font(size=16, bold=True, color='FF6600')
|
||||
ws3.merge_cells('A1:M1')
|
||||
|
||||
months = ['Janvier', 'Février', 'Mars', 'Avril', 'Mai', 'Juin', 'Juillet', 'Août', 'Septembre', 'Octobre', 'Novembre', 'Décembre']
|
||||
ws3['A3'] = "Mois"
|
||||
for i, month in enumerate(months, 2):
|
||||
ws3.cell(row=3, column=i, value=month)
|
||||
ws3.cell(row=3, column=i).font = Font(bold=True)
|
||||
|
||||
# Produits avec ventes mensuelles
|
||||
products_monthly = ['Laptops', 'Tablettes', 'Smartphones', 'Accessoires']
|
||||
for i, product in enumerate(products_monthly, 4):
|
||||
ws3.cell(row=i, column=1, value=product)
|
||||
ws3.cell(row=i, column=1).font = Font(bold=True)
|
||||
|
||||
for month_col in range(2, 14):
|
||||
value = 50000 + (i * month_col * 1234) % 30000
|
||||
ws3.cell(row=i, column=month_col, value=value).number_format = FORMAT_CURRENCY_USD
|
||||
|
||||
# Ligne de total avec formule
|
||||
ws3.cell(row=8, column=1, value="TOTAL")
|
||||
ws3.cell(row=8, column=1).font = Font(bold=True, size=12)
|
||||
for col in range(2, 14):
|
||||
ws3.cell(row=8, column=col, value=f"=SUM({get_column_letter(col)}4:{get_column_letter(col)}7)")
|
||||
ws3.cell(row=8, column=col).number_format = FORMAT_CURRENCY_USD
|
||||
ws3.cell(row=8, column=col).font = Font(bold=True)
|
||||
ws3.cell(row=8, column=col).fill = PatternFill(start_color='FFD700', end_color='FFD700', fill_type='solid')
|
||||
|
||||
# Graphique linéaire
|
||||
chart3 = LineChart()
|
||||
chart3.title = "Tendance des Ventes sur 12 Mois"
|
||||
chart3.y_axis.title = "Montant (USD)"
|
||||
chart3.x_axis.title = "Mois"
|
||||
chart3.height = 12
|
||||
chart3.width = 24
|
||||
|
||||
data3 = Reference(ws3, min_col=2, min_row=3, max_row=8, max_col=13)
|
||||
cats3 = Reference(ws3, min_col=2, min_row=3, max_col=13)
|
||||
chart3.add_data(data3, titles_from_data=True)
|
||||
chart3.set_categories(cats3)
|
||||
ws3.add_chart(chart3, "A10")
|
||||
|
||||
# Sauvegarder Excel
|
||||
excel_file = SAMPLE_DIR / "super_complex.xlsx"
|
||||
wb.save(excel_file)
|
||||
print(f"✅ Excel créé: {excel_file}")
|
||||
print(f" - 3 feuilles avec données complexes")
|
||||
print(f" - Cellules fusionnées multiples")
|
||||
print(f" - Formules avancées (SUM, AVERAGE, IF, pourcentages)")
|
||||
print(f" - 3 graphiques (barres, camembert, lignes)")
|
||||
print(f" - Formatage conditionnel élaboré")
|
||||
print(f" - {len(regions) * len(products)} lignes de données\n")
|
||||
|
||||
# ============================================================================
|
||||
# 2. WORD TRÈS COMPLEXE
|
||||
# ============================================================================
|
||||
print("📝 Création de Word super complexe...")
|
||||
|
||||
doc = Document()
|
||||
|
||||
# Configurer les marges
|
||||
sections = doc.sections
|
||||
for section in sections:
|
||||
section.top_margin = Cm(2)
|
||||
section.bottom_margin = Cm(2)
|
||||
section.left_margin = Cm(2.5)
|
||||
section.right_margin = Cm(2.5)
|
||||
|
||||
# PAGE DE COUVERTURE
|
||||
title = doc.add_heading('RAPPORT STRATÉGIQUE ANNUEL', 0)
|
||||
title.alignment = WD_ALIGN_PARAGRAPH.CENTER
|
||||
for run in title.runs:
|
||||
run.font.size = Pt(28)
|
||||
run.font.color.rgb = RGBColor(0, 102, 204)
|
||||
run.font.bold = True
|
||||
|
||||
subtitle = doc.add_paragraph('Analyse Complète des Performances et Perspectives 2024-2025')
|
||||
subtitle.alignment = WD_ALIGN_PARAGRAPH.CENTER
|
||||
subtitle.runs[0].font.size = Pt(16)
|
||||
subtitle.runs[0].font.italic = True
|
||||
subtitle.runs[0].font.color.rgb = RGBColor(102, 102, 102)
|
||||
|
||||
doc.add_paragraph('\n' * 3)
|
||||
|
||||
company_info = doc.add_paragraph()
|
||||
company_info.alignment = WD_ALIGN_PARAGRAPH.CENTER
|
||||
company_info.add_run('Société: TechnoVision International\n').font.size = Pt(14)
|
||||
company_info.add_run('Département: Analyse Stratégique\n').font.size = Pt(12)
|
||||
company_info.add_run('Date: 31 Décembre 2024').font.size = Pt(12)
|
||||
|
||||
doc.add_page_break()
|
||||
|
||||
# TABLE DES MATIÈRES
|
||||
doc.add_heading('Table des Matières', 1)
|
||||
toc_items = [
|
||||
'1. Résumé Exécutif',
|
||||
'2. Analyse des Performances Financières',
|
||||
'3. Objectifs Stratégiques',
|
||||
'4. Analyse de Marché',
|
||||
'5. Recommandations',
|
||||
'6. Conclusion'
|
||||
]
|
||||
for item in toc_items:
|
||||
p = doc.add_paragraph(item, style='List Number')
|
||||
p.runs[0].font.size = Pt(12)
|
||||
|
||||
doc.add_page_break()
|
||||
|
||||
# SECTION 1: RÉSUMÉ EXÉCUTIF
|
||||
doc.add_heading('1. Résumé Exécutif', 1)
|
||||
|
||||
para1 = doc.add_paragraph()
|
||||
para1.add_run('Ce rapport présente une analyse détaillée ').font.size = Pt(11)
|
||||
para1.add_run('des performances exceptionnelles').bold = True
|
||||
para1.add_run(' de notre entreprise au cours de l\'année 2024. ').font.size = Pt(11)
|
||||
para1.add_run('Nous avons atteint et dépassé').italic = True
|
||||
para1.add_run(' nos objectifs dans tous les domaines clés.').font.size = Pt(11)
|
||||
para1.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY
|
||||
|
||||
doc.add_heading('Points Clés', 2)
|
||||
key_points = [
|
||||
'Croissance du chiffre d\'affaires de 28% par rapport à 2023',
|
||||
'Expansion réussie dans 5 nouveaux marchés internationaux',
|
||||
'Lancement de 8 produits innovants avec taux d\'adoption de 87%',
|
||||
'Amélioration de la satisfaction client: score de 4.8/5',
|
||||
'Réduction de l\'empreinte carbone de 22%',
|
||||
'Augmentation de la part de marché de 3.5 points'
|
||||
]
|
||||
|
||||
for point in key_points:
|
||||
p = doc.add_paragraph(point, style='List Bullet')
|
||||
p.runs[0].font.size = Pt(11)
|
||||
|
||||
# Créer graphique pour Word
|
||||
img_path = SAMPLE_DIR / "word_performance_chart.png"
|
||||
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
|
||||
|
||||
# Graphique 1: Croissance trimestrielle
|
||||
quarters = ['Q1', 'Q2', 'Q3', 'Q4']
|
||||
revenue = [45.2, 52.8, 61.5, 68.3]
|
||||
ax1.plot(quarters, revenue, marker='o', linewidth=2, markersize=8, color='#0066CC')
|
||||
ax1.fill_between(range(len(quarters)), revenue, alpha=0.3, color='#0066CC')
|
||||
ax1.set_title('Croissance Trimestrielle du CA (M€)', fontsize=14, fontweight='bold')
|
||||
ax1.set_ylabel('Chiffre d\'Affaires (M€)', fontsize=11)
|
||||
ax1.grid(True, alpha=0.3)
|
||||
|
||||
# Graphique 2: Répartition des revenus
|
||||
categories = ['Produits', 'Services', 'Licences', 'Consulting']
|
||||
values = [45, 25, 20, 10]
|
||||
colors = ['#0066CC', '#FF6600', '#00CC66', '#CC00CC']
|
||||
ax2.pie(values, labels=categories, autopct='%1.1f%%', colors=colors, startangle=90)
|
||||
ax2.set_title('Répartition des Revenus 2024', fontsize=14, fontweight='bold')
|
||||
|
||||
plt.tight_layout()
|
||||
plt.savefig(img_path, dpi=150, bbox_inches='tight')
|
||||
plt.close()
|
||||
|
||||
doc.add_paragraph()
|
||||
doc.add_picture(str(img_path), width=Inches(6))
|
||||
last_paragraph = doc.paragraphs[-1]
|
||||
last_paragraph.alignment = WD_ALIGN_PARAGRAPH.CENTER
|
||||
|
||||
# SECTION 2: ANALYSE FINANCIÈRE
|
||||
doc.add_page_break()
|
||||
doc.add_heading('2. Analyse des Performances Financières', 1)
|
||||
|
||||
doc.add_heading('2.1 Résultats par Trimestre', 2)
|
||||
|
||||
# Tableau complexe des résultats
|
||||
table = doc.add_table(rows=6, cols=6)
|
||||
table.style = 'Medium Grid 3 Accent 1'
|
||||
|
||||
# En-têtes
|
||||
headers = ['Trimestre', 'Revenus (M€)', 'Coûts (M€)', 'Marge Brute', 'Marge %', 'Croissance']
|
||||
for i, header in enumerate(headers):
|
||||
cell = table.rows[0].cells[i]
|
||||
cell.text = header
|
||||
cell.paragraphs[0].runs[0].font.bold = True
|
||||
cell.paragraphs[0].runs[0].font.size = Pt(10)
|
||||
|
||||
# Données trimestrielles
|
||||
quarterly_data = [
|
||||
('Q1 2024', 45.2, 28.5, 16.7, '36.9%', '+8.5%'),
|
||||
('Q2 2024', 52.8, 32.1, 20.7, '39.2%', '+16.8%'),
|
||||
('Q3 2024', 61.5, 36.8, 24.7, '40.2%', '+16.5%'),
|
||||
('Q4 2024', 68.3, 40.2, 28.1, '41.1%', '+11.1%'),
|
||||
('TOTAL', 227.8, 137.6, 90.2, '39.6%', '28.0%')
|
||||
]
|
||||
|
||||
for row_idx, row_data in enumerate(quarterly_data, 1):
|
||||
for col_idx, value in enumerate(row_data):
|
||||
cell = table.rows[row_idx].cells[col_idx]
|
||||
cell.text = str(value)
|
||||
if row_idx == 5: # Ligne total
|
||||
cell.paragraphs[0].runs[0].font.bold = True
|
||||
# Colorer le fond (workaround via XML)
|
||||
shading_elm = OxmlElement('w:shd')
|
||||
shading_elm.set(qn('w:fill'), 'FFD700')
|
||||
cell._element.get_or_add_tcPr().append(shading_elm)
|
||||
|
||||
doc.add_paragraph()
|
||||
|
||||
doc.add_heading('2.2 Analyse Comparative', 2)
|
||||
|
||||
comparison_text = doc.add_paragraph()
|
||||
comparison_text.add_run('Comparaison avec les objectifs annuels: ').bold = True
|
||||
comparison_text.add_run('Notre performance a dépassé les objectifs fixés de ')
|
||||
comparison_text.add_run('13.5%').bold = True
|
||||
comparison_text.add_run(', démontrant une excellente exécution stratégique et une adaptation réussie aux conditions du marché.')
|
||||
comparison_text.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY
|
||||
|
||||
# SECTION 3: OBJECTIFS STRATÉGIQUES
|
||||
doc.add_page_break()
|
||||
doc.add_heading('3. Objectifs Stratégiques 2025', 1)
|
||||
|
||||
doc.add_heading('3.1 Vision et Mission', 2)
|
||||
vision = doc.add_paragraph()
|
||||
vision.add_run('Vision: ').bold = True
|
||||
vision.add_run('Devenir le leader mondial dans notre secteur d\'ici 2027.')
|
||||
vision.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY
|
||||
|
||||
mission = doc.add_paragraph()
|
||||
mission.add_run('Mission: ').bold = True
|
||||
mission.add_run('Fournir des solutions innovantes qui transforment les entreprises et améliorent la vie de millions de personnes.')
|
||||
mission.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY
|
||||
|
||||
doc.add_heading('3.2 Objectifs Prioritaires', 2)
|
||||
|
||||
objectives = [
|
||||
('Expansion Géographique', 'Pénétrer 10 nouveaux marchés en Asie et Europe de l\'Est'),
|
||||
('Innovation Produit', 'Lancer 12 nouveaux produits avec IA intégrée'),
|
||||
('Excellence Opérationnelle', 'Réduire les coûts de 15% via automatisation'),
|
||||
('Satisfaction Client', 'Atteindre un NPS de 75+'),
|
||||
('Développement Durable', 'Neutralité carbone d\'ici fin 2025')
|
||||
]
|
||||
|
||||
for i, (title, desc) in enumerate(objectives, 1):
|
||||
p = doc.add_paragraph(style='List Number')
|
||||
p.add_run(f'{title}: ').bold = True
|
||||
p.add_run(desc)
|
||||
|
||||
# Ajouter une image conceptuelle
|
||||
concept_img = SAMPLE_DIR / "word_strategy_image.png"
|
||||
img = Image.new('RGB', (800, 400), color=(240, 248, 255))
|
||||
draw = ImageDraw.Draw(img)
|
||||
try:
|
||||
font_large = ImageFont.truetype("arial.ttf", 60)
|
||||
font_small = ImageFont.truetype("arial.ttf", 30)
|
||||
except:
|
||||
font_large = ImageFont.load_default()
|
||||
font_small = ImageFont.load_default()
|
||||
|
||||
draw.rectangle([50, 50, 750, 350], outline=(0, 102, 204), width=5, fill=(230, 240, 255))
|
||||
draw.text((400, 150), "STRATÉGIE 2025", fill=(0, 102, 204), font=font_large, anchor="mm")
|
||||
draw.text((400, 250), "Innovation • Excellence • Croissance", fill=(102, 102, 102), font=font_small, anchor="mm")
|
||||
img.save(concept_img)
|
||||
|
||||
doc.add_paragraph()
|
||||
doc.add_picture(str(concept_img), width=Inches(5.5))
|
||||
doc.paragraphs[-1].alignment = WD_ALIGN_PARAGRAPH.CENTER
|
||||
|
||||
# CONCLUSION
|
||||
doc.add_page_break()
|
||||
doc.add_heading('6. Conclusion', 1)
|
||||
|
||||
conclusion = doc.add_paragraph()
|
||||
conclusion.add_run('L\'année 2024 a été exceptionnelle ').font.size = Pt(12)
|
||||
conclusion.add_run('pour notre organisation. ').bold = True
|
||||
conclusion.add_run('Nous avons non seulement atteint nos objectifs ambitieux, mais nous avons également posé les bases solides pour une croissance continue et durable. ')
|
||||
conclusion.add_run('L\'engagement de nos équipes, la confiance de nos clients et l\'innovation de nos produits ').italic = True
|
||||
conclusion.add_run('sont les piliers de notre succès.')
|
||||
conclusion.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY
|
||||
conclusion.paragraph_format.line_spacing_rule = WD_LINE_SPACING.MULTIPLE
|
||||
conclusion.paragraph_format.line_spacing = 1.5
|
||||
|
||||
# Footer
|
||||
section = doc.sections[0]
|
||||
footer = section.footer
|
||||
footer_para = footer.paragraphs[0]
|
||||
footer_para.text = "Document Confidentiel - TechnoVision International 2024 | "
|
||||
footer_para.alignment = WD_ALIGN_PARAGRAPH.CENTER
|
||||
footer_para.runs[0].font.size = Pt(9)
|
||||
footer_para.runs[0].font.italic = True
|
||||
|
||||
# Sauvegarder Word
|
||||
word_file = SAMPLE_DIR / "super_complex.docx"
|
||||
doc.save(word_file)
|
||||
print(f"✅ Word créé: {word_file}")
|
||||
print(f" - 6 sections structurées")
|
||||
print(f" - Page de couverture professionnelle")
|
||||
print(f" - Table des matières")
|
||||
print(f" - Tableaux complexes avec formatage")
|
||||
print(f" - 3 images intégrées (graphiques et concepts)")
|
||||
print(f" - Formatage avancé (styles, couleurs, alignements)")
|
||||
print(f" - Pieds de page\n")
|
||||
|
||||
# ============================================================================
|
||||
# 3. POWERPOINT TRÈS COMPLEXE
|
||||
# ============================================================================
|
||||
print("🎨 Création de PowerPoint super complexe...")
|
||||
|
||||
prs = Presentation()
|
||||
prs.slide_width = PptxInches(10)
|
||||
prs.slide_height = PptxInches(7.5)
|
||||
|
||||
# SLIDE 1: TITRE PRINCIPAL
|
||||
slide1 = prs.slides.add_slide(prs.slide_layouts[6]) # Blank
|
||||
|
||||
# Fond coloré
|
||||
background = slide1.shapes.add_shape(
|
||||
MSO_SHAPE.RECTANGLE,
|
||||
0, 0, prs.slide_width, prs.slide_height
|
||||
)
|
||||
background.fill.solid()
|
||||
background.fill.fore_color.rgb = PptxRGBColor(0, 102, 204)
|
||||
background.line.fill.background()
|
||||
|
||||
# Titre principal
|
||||
title_box = slide1.shapes.add_textbox(
|
||||
PptxInches(1), PptxInches(2.5),
|
||||
PptxInches(8), PptxInches(2)
|
||||
)
|
||||
title_frame = title_box.text_frame
|
||||
title_frame.text = "TRANSFORMATION DIGITALE 2025"
|
||||
title_para = title_frame.paragraphs[0]
|
||||
title_para.font.size = PptxPt(54)
|
||||
title_para.font.bold = True
|
||||
title_para.font.color.rgb = PptxRGBColor(255, 255, 255)
|
||||
title_para.alignment = PP_ALIGN.CENTER
|
||||
|
||||
# Sous-titre
|
||||
subtitle_box = slide1.shapes.add_textbox(
|
||||
PptxInches(1), PptxInches(4.8),
|
||||
PptxInches(8), PptxInches(1)
|
||||
)
|
||||
subtitle_frame = subtitle_box.text_frame
|
||||
subtitle_frame.text = "Stratégie, Innovation et Excellence Opérationnelle"
|
||||
subtitle_para = subtitle_frame.paragraphs[0]
|
||||
subtitle_para.font.size = PptxPt(24)
|
||||
subtitle_para.font.italic = True
|
||||
subtitle_para.font.color.rgb = PptxRGBColor(255, 255, 255)
|
||||
subtitle_para.alignment = PP_ALIGN.CENTER
|
||||
|
||||
# SLIDE 2: AGENDA
|
||||
slide2 = prs.slides.add_slide(prs.slide_layouts[1])
|
||||
title2 = slide2.shapes.title
|
||||
title2.text = "Agenda de la Présentation"
|
||||
title2.text_frame.paragraphs[0].font.size = PptxPt(40)
|
||||
title2.text_frame.paragraphs[0].font.color.rgb = PptxRGBColor(0, 102, 204)
|
||||
|
||||
body2 = slide2.placeholders[1]
|
||||
tf2 = body2.text_frame
|
||||
tf2.clear()
|
||||
|
||||
agenda_items = [
|
||||
"Contexte et Enjeux Stratégiques",
|
||||
"Analyse des Performances 2024",
|
||||
"Objectifs de Transformation Digitale",
|
||||
"Initiatives Clés et Roadmap",
|
||||
"Budget et Ressources",
|
||||
"Indicateurs de Succès et KPIs",
|
||||
"Plan d'Action et Prochaines Étapes"
|
||||
]
|
||||
|
||||
for i, item in enumerate(agenda_items):
|
||||
p = tf2.add_paragraph()
|
||||
p.text = item
|
||||
p.level = 0
|
||||
p.font.size = PptxPt(20)
|
||||
p.space_before = PptxPt(8)
|
||||
|
||||
# Numérotation colorée
|
||||
run = p.runs[0]
|
||||
run.text = f"{i+1}. {item}"
|
||||
if i % 2 == 0:
|
||||
run.font.color.rgb = PptxRGBColor(0, 102, 204)
|
||||
else:
|
||||
run.font.color.rgb = PptxRGBColor(255, 102, 0)
|
||||
|
||||
# SLIDE 3: DONNÉES AVEC GRAPHIQUE
|
||||
slide3 = prs.slides.add_slide(prs.slide_layouts[5])
|
||||
title3 = slide3.shapes.title
|
||||
title3.text = "Croissance et Performance Financière"
|
||||
|
||||
# Créer graphique pour PPT
|
||||
chart_img = SAMPLE_DIR / "ppt_financial_chart.png"
|
||||
fig, ax = plt.subplots(figsize=(10, 6))
|
||||
|
||||
years = ['2020', '2021', '2022', '2023', '2024']
|
||||
revenue = [125, 145, 168, 195, 228]
|
||||
profit = [15, 22, 28, 35, 48]
|
||||
|
||||
x = np.arange(len(years))
|
||||
width = 0.35
|
||||
|
||||
bars1 = ax.bar(x - width/2, revenue, width, label='Revenus (M€)', color='#0066CC')
|
||||
bars2 = ax.bar(x + width/2, profit, width, label='Bénéfices (M€)', color='#FF6600')
|
||||
|
||||
ax.set_xlabel('Années', fontsize=12, fontweight='bold')
|
||||
ax.set_ylabel('Montants (M€)', fontsize=12, fontweight='bold')
|
||||
ax.set_title('Évolution Financière 2020-2024', fontsize=16, fontweight='bold')
|
||||
ax.set_xticks(x)
|
||||
ax.set_xticklabels(years)
|
||||
ax.legend(fontsize=11)
|
||||
ax.grid(True, alpha=0.3)
|
||||
|
||||
# Ajouter valeurs sur les barres
|
||||
for bars in [bars1, bars2]:
|
||||
for bar in bars:
|
||||
height = bar.get_height()
|
||||
ax.text(bar.get_x() + bar.get_width()/2., height,
|
||||
f'{int(height)}',
|
||||
ha='center', va='bottom', fontsize=10, fontweight='bold')
|
||||
|
||||
plt.tight_layout()
|
||||
plt.savefig(chart_img, dpi=150, bbox_inches='tight', facecolor='white')
|
||||
plt.close()
|
||||
|
||||
slide3.shapes.add_picture(
|
||||
str(chart_img),
|
||||
PptxInches(1.5), PptxInches(2),
|
||||
width=PptxInches(7)
|
||||
)
|
||||
|
||||
# SLIDE 4: TABLEAU COMPLEXE
|
||||
slide4 = prs.slides.add_slide(prs.slide_layouts[5])
|
||||
title4 = slide4.shapes.title
|
||||
title4.text = "Répartition Budgétaire par Département"
|
||||
|
||||
# Tableau
|
||||
rows, cols = 8, 5
|
||||
table = slide4.shapes.add_table(
|
||||
rows, cols,
|
||||
PptxInches(0.8), PptxInches(2),
|
||||
PptxInches(8.4), PptxInches(4)
|
||||
).table
|
||||
|
||||
# En-têtes
|
||||
headers = ['Département', 'Budget 2024 (M€)', 'Budget 2025 (M€)', 'Variation', 'Priorité']
|
||||
for col, header in enumerate(headers):
|
||||
cell = table.cell(0, col)
|
||||
cell.text = header
|
||||
cell.fill.solid()
|
||||
cell.fill.fore_color.rgb = PptxRGBColor(0, 102, 204)
|
||||
for paragraph in cell.text_frame.paragraphs:
|
||||
for run in paragraph.runs:
|
||||
run.font.size = PptxPt(14)
|
||||
run.font.bold = True
|
||||
run.font.color.rgb = PptxRGBColor(255, 255, 255)
|
||||
cell.text_frame.paragraphs[0].alignment = PP_ALIGN.CENTER
|
||||
|
||||
# Données
|
||||
dept_data = [
|
||||
('Recherche & Développement', '45.2', '58.5', '+29.4%', 'Élevée'),
|
||||
('Ventes & Marketing', '38.7', '42.3', '+9.3%', 'Élevée'),
|
||||
('Opérations', '52.3', '55.1', '+5.4%', 'Moyenne'),
|
||||
('IT & Infrastructure', '28.5', '35.2', '+23.5%', 'Élevée'),
|
||||
('Ressources Humaines', '15.8', '17.2', '+8.9%', 'Moyenne'),
|
||||
('Administration', '12.3', '13.1', '+6.5%', 'Faible'),
|
||||
('TOTAL', '192.8', '221.4', '+14.8%', '-')
|
||||
]
|
||||
|
||||
for row_idx, row_data in enumerate(dept_data, 1):
|
||||
for col_idx, value in enumerate(row_data):
|
||||
cell = table.cell(row_idx, col_idx)
|
||||
cell.text = value
|
||||
|
||||
# Formatage spécial pour la ligne total
|
||||
if row_idx == 7:
|
||||
cell.fill.solid()
|
||||
cell.fill.fore_color.rgb = PptxRGBColor(255, 215, 0)
|
||||
for paragraph in cell.text_frame.paragraphs:
|
||||
for run in paragraph.runs:
|
||||
run.font.bold = True
|
||||
run.font.size = PptxPt(13)
|
||||
else:
|
||||
# Alternance de couleurs
|
||||
if row_idx % 2 == 0:
|
||||
cell.fill.solid()
|
||||
cell.fill.fore_color.rgb = PptxRGBColor(240, 248, 255)
|
||||
|
||||
for paragraph in cell.text_frame.paragraphs:
|
||||
for run in paragraph.runs:
|
||||
run.font.size = PptxPt(12)
|
||||
|
||||
# Alignement
|
||||
cell.text_frame.paragraphs[0].alignment = PP_ALIGN.CENTER if col_idx > 0 else PP_ALIGN.LEFT
|
||||
|
||||
# SLIDE 5: INITIATIVES CLÉS AVEC FORMES
|
||||
slide5 = prs.slides.add_slide(prs.slide_layouts[5])
|
||||
title5 = slide5.shapes.title
|
||||
title5.text = "Initiatives Stratégiques 2025"
|
||||
|
||||
initiatives = [
|
||||
("Innovation IA", "Intégration IA dans tous les produits", PptxRGBColor(46, 125, 50)),
|
||||
("Cloud First", "Migration complète vers le cloud", PptxRGBColor(33, 150, 243)),
|
||||
("Customer 360", "Vue unifiée du parcours client", PptxRGBColor(255, 152, 0)),
|
||||
("Green IT", "Neutralité carbone datacenter", PptxRGBColor(76, 175, 80))
|
||||
]
|
||||
|
||||
y_position = 2.2
|
||||
for i, (title_text, desc, color) in enumerate(initiatives):
|
||||
# Rectangle coloré
|
||||
shape = slide5.shapes.add_shape(
|
||||
MSO_SHAPE.ROUNDED_RECTANGLE,
|
||||
PptxInches(1), PptxInches(y_position),
|
||||
PptxInches(8), PptxInches(1)
|
||||
)
|
||||
shape.fill.solid()
|
||||
shape.fill.fore_color.rgb = color
|
||||
shape.line.color.rgb = color
|
||||
shape.shadow.inherit = False
|
||||
|
||||
# Texte dans la forme
|
||||
text_frame = shape.text_frame
|
||||
text_frame.clear()
|
||||
|
||||
p1 = text_frame.paragraphs[0]
|
||||
p1.text = title_text
|
||||
p1.font.size = PptxPt(22)
|
||||
p1.font.bold = True
|
||||
p1.font.color.rgb = PptxRGBColor(255, 255, 255)
|
||||
p1.alignment = PP_ALIGN.LEFT
|
||||
|
||||
p2 = text_frame.add_paragraph()
|
||||
p2.text = desc
|
||||
p2.font.size = PptxPt(16)
|
||||
p2.font.color.rgb = PptxRGBColor(255, 255, 255)
|
||||
p2.alignment = PP_ALIGN.LEFT
|
||||
p2.space_before = PptxPt(5)
|
||||
|
||||
text_frame.vertical_anchor = MSO_ANCHOR.MIDDLE
|
||||
text_frame.margin_left = PptxInches(0.3)
|
||||
|
||||
y_position += 1.2
|
||||
|
||||
# SLIDE 6: TIMELINE
|
||||
slide6 = prs.slides.add_slide(prs.slide_layouts[5])
|
||||
title6 = slide6.shapes.title
|
||||
title6.text = "Roadmap de Déploiement"
|
||||
|
||||
# Timeline horizontale
|
||||
timeline_data = [
|
||||
("Q1", "Planification", PptxRGBColor(76, 175, 80)),
|
||||
("Q2", "Développement", PptxRGBColor(33, 150, 243)),
|
||||
("Q3", "Tests & Pilotes", PptxRGBColor(255, 152, 0)),
|
||||
("Q4", "Déploiement", PptxRGBColor(156, 39, 176))
|
||||
]
|
||||
|
||||
x_start = 1
|
||||
y_pos = 3
|
||||
width = 1.8
|
||||
|
||||
for quarter, phase, color in timeline_data:
|
||||
# Cercle pour le trimestre
|
||||
circle = slide6.shapes.add_shape(
|
||||
MSO_SHAPE.OVAL,
|
||||
PptxInches(x_start), PptxInches(y_pos),
|
||||
PptxInches(0.8), PptxInches(0.8)
|
||||
)
|
||||
circle.fill.solid()
|
||||
circle.fill.fore_color.rgb = color
|
||||
circle.line.color.rgb = color
|
||||
|
||||
# Texte du trimestre
|
||||
tf = circle.text_frame
|
||||
tf.text = quarter
|
||||
tf.paragraphs[0].font.size = PptxPt(18)
|
||||
tf.paragraphs[0].font.bold = True
|
||||
tf.paragraphs[0].font.color.rgb = PptxRGBColor(255, 255, 255)
|
||||
tf.paragraphs[0].alignment = PP_ALIGN.CENTER
|
||||
tf.vertical_anchor = MSO_ANCHOR.MIDDLE
|
||||
|
||||
# Description de la phase
|
||||
text_box = slide6.shapes.add_textbox(
|
||||
PptxInches(x_start - 0.3), PptxInches(y_pos + 1.2),
|
||||
PptxInches(1.4), PptxInches(0.6)
|
||||
)
|
||||
tf2 = text_box.text_frame
|
||||
tf2.text = phase
|
||||
tf2.paragraphs[0].font.size = PptxPt(14)
|
||||
tf2.paragraphs[0].font.bold = True
|
||||
tf2.paragraphs[0].font.color.rgb = color
|
||||
tf2.paragraphs[0].alignment = PP_ALIGN.CENTER
|
||||
|
||||
# Ligne de connexion (sauf pour le dernier)
|
||||
if x_start < 7:
|
||||
line = slide6.shapes.add_connector(
|
||||
1, # Straight connector
|
||||
PptxInches(x_start + 0.8), PptxInches(y_pos + 0.4),
|
||||
PptxInches(x_start + width), PptxInches(y_pos + 0.4)
|
||||
)
|
||||
line.line.color.rgb = PptxRGBColor(100, 100, 100)
|
||||
line.line.width = PptxPt(3)
|
||||
|
||||
x_start += width
|
||||
|
||||
# SLIDE 7: CONCLUSION
|
||||
slide7 = prs.slides.add_slide(prs.slide_layouts[1])
|
||||
title7 = slide7.shapes.title
|
||||
title7.text = "Prochaines Étapes et Engagement"
|
||||
|
||||
body7 = slide7.placeholders[1]
|
||||
tf7 = body7.text_frame
|
||||
tf7.clear()
|
||||
|
||||
next_steps = [
|
||||
"Validation du comité exécutif - Janvier 2025",
|
||||
"Kick-off des programmes prioritaires - Février 2025",
|
||||
"Revues mensuelles de progression avec les sponsors",
|
||||
"Communication régulière à toutes les parties prenantes",
|
||||
"Ajustements agiles basés sur les retours du terrain"
|
||||
]
|
||||
|
||||
for step in next_steps:
|
||||
p = tf7.add_paragraph()
|
||||
p.text = step
|
||||
p.level = 0
|
||||
p.font.size = PptxPt(20)
|
||||
p.space_before = PptxPt(10)
|
||||
p.font.color.rgb = PptxRGBColor(0, 102, 204)
|
||||
|
||||
# Ajouter une image de conclusion
|
||||
conclusion_img = SAMPLE_DIR / "ppt_conclusion_image.png"
|
||||
img = Image.new('RGB', (800, 300), color=(255, 255, 255))
|
||||
draw = ImageDraw.Draw(img)
|
||||
|
||||
# Dessiner un graphique de succès stylisé
|
||||
draw.rectangle([50, 50, 750, 250], outline=(0, 102, 204), width=3)
|
||||
try:
|
||||
font = ImageFont.truetype("arial.ttf", 50)
|
||||
except:
|
||||
font = ImageFont.load_default()
|
||||
draw.text((400, 150), "SUCCÈS 2025", fill=(0, 102, 204), font=font, anchor="mm")
|
||||
img.save(conclusion_img)
|
||||
|
||||
slide7.shapes.add_picture(
|
||||
str(conclusion_img),
|
||||
PptxInches(2.5), PptxInches(5),
|
||||
width=PptxInches(5)
|
||||
)
|
||||
|
||||
# Ajouter notes de présentation
|
||||
notes_slide = slide1.notes_slide
|
||||
notes_slide.notes_text_frame.text = "Bienvenue à tous. Cette présentation couvre notre stratégie de transformation digitale pour 2025. Nous allons explorer nos objectifs ambitieux et le plan d'action pour les atteindre."
|
||||
|
||||
# Sauvegarder PowerPoint
|
||||
ppt_file = SAMPLE_DIR / "super_complex.pptx"
|
||||
prs.save(ppt_file)
|
||||
print(f"✅ PowerPoint créé: {ppt_file}")
|
||||
print(f" - 7 diapositives professionnelles")
|
||||
print(f" - Slide de titre avec design custom")
|
||||
print(f" - Agenda structuré avec numérotation")
|
||||
print(f" - Tableau complexe 8x5 avec formatage")
|
||||
print(f" - Graphiques intégrés (barres avec valeurs)")
|
||||
print(f" - Timeline visuelle avec formes connectées")
|
||||
print(f" - 4 initiatives avec rectangles colorés")
|
||||
print(f" - Images et formes multiples")
|
||||
print(f" - Notes de présentation\n")
|
||||
|
||||
print("=" * 70)
|
||||
print("🎉 TOUS LES FICHIERS COMPLEXES ONT ÉTÉ CRÉÉS AVEC SUCCÈS!")
|
||||
print("=" * 70)
|
||||
print(f"\n📁 Fichiers disponibles dans: {SAMPLE_DIR.absolute()}")
|
||||
print("\nVous pouvez maintenant:")
|
||||
print("1. Ouvrir les fichiers pour vérifier la complexité")
|
||||
print("2. Les traduire via l'API")
|
||||
print("3. Vérifier que le formatage est préservé")
|
||||
print("\n✨ Formatage inclus:")
|
||||
print(" Excel: Formules, cellules fusionnées, graphiques, formatage conditionnel")
|
||||
print(" Word: Images, tableaux, styles, couleurs, pieds de page")
|
||||
print(" PowerPoint: Formes, graphiques, timeline, tableaux, animations visuelles")
|
||||
307
main.py
Normal file
@ -0,0 +1,307 @@
|
||||
"""
|
||||
Document Translation API
|
||||
FastAPI application for translating complex documents while preserving formatting
|
||||
"""
|
||||
from fastapi import FastAPI, UploadFile, File, Form, HTTPException
|
||||
from fastapi.responses import FileResponse, JSONResponse
|
||||
from fastapi.middleware.cors import CORSMiddleware
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
import asyncio
|
||||
import logging
|
||||
|
||||
from config import config
|
||||
from translators import excel_translator, word_translator, pptx_translator
|
||||
from utils import file_handler, handle_translation_error, DocumentProcessingError
|
||||
|
||||
# Configure logging
|
||||
logging.basicConfig(level=logging.INFO)
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Ensure necessary directories exist
|
||||
config.ensure_directories()
|
||||
|
||||
# Create FastAPI app
|
||||
app = FastAPI(
|
||||
title=config.API_TITLE,
|
||||
version=config.API_VERSION,
|
||||
description=config.API_DESCRIPTION
|
||||
)
|
||||
|
||||
# Add CORS middleware
|
||||
app.add_middleware(
|
||||
CORSMiddleware,
|
||||
allow_origins=["*"], # Configure appropriately for production
|
||||
allow_credentials=True,
|
||||
allow_methods=["*"],
|
||||
allow_headers=["*"],
|
||||
)
|
||||
|
||||
|
||||
@app.get("/")
|
||||
async def root():
|
||||
"""Root endpoint with API information"""
|
||||
return {
|
||||
"name": config.API_TITLE,
|
||||
"version": config.API_VERSION,
|
||||
"status": "operational",
|
||||
"supported_formats": list(config.SUPPORTED_EXTENSIONS),
|
||||
"endpoints": {
|
||||
"translate": "/translate",
|
||||
"health": "/health",
|
||||
"supported_languages": "/languages"
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@app.get("/health")
|
||||
async def health_check():
|
||||
"""Health check endpoint"""
|
||||
return {
|
||||
"status": "healthy",
|
||||
"translation_service": config.TRANSLATION_SERVICE
|
||||
}
|
||||
|
||||
|
||||
@app.get("/languages")
|
||||
async def get_supported_languages():
|
||||
"""Get list of supported language codes"""
|
||||
return {
|
||||
"supported_languages": {
|
||||
"es": "Spanish",
|
||||
"fr": "French",
|
||||
"de": "German",
|
||||
"it": "Italian",
|
||||
"pt": "Portuguese",
|
||||
"ru": "Russian",
|
||||
"zh": "Chinese (Simplified)",
|
||||
"ja": "Japanese",
|
||||
"ko": "Korean",
|
||||
"ar": "Arabic",
|
||||
"hi": "Hindi",
|
||||
"nl": "Dutch",
|
||||
"pl": "Polish",
|
||||
"tr": "Turkish",
|
||||
"sv": "Swedish",
|
||||
"da": "Danish",
|
||||
"no": "Norwegian",
|
||||
"fi": "Finnish",
|
||||
"cs": "Czech",
|
||||
"el": "Greek",
|
||||
"th": "Thai",
|
||||
"vi": "Vietnamese",
|
||||
"id": "Indonesian",
|
||||
"uk": "Ukrainian",
|
||||
"ro": "Romanian",
|
||||
"hu": "Hungarian"
|
||||
},
|
||||
"note": "Supported languages may vary depending on the translation service configured"
|
||||
}
|
||||
|
||||
|
||||
@app.post("/translate")
|
||||
async def translate_document(
|
||||
file: UploadFile = File(..., description="Document file to translate (.xlsx, .docx, or .pptx)"),
|
||||
target_language: str = Form(..., description="Target language code (e.g., 'es', 'fr', 'de')"),
|
||||
source_language: str = Form(default="auto", description="Source language code (default: auto-detect)"),
|
||||
cleanup: bool = Form(default=True, description="Delete input file after translation")
|
||||
):
|
||||
"""
|
||||
Translate a document while preserving all formatting, layout, and embedded media
|
||||
|
||||
**Supported File Types:**
|
||||
- Excel (.xlsx) - Preserves formulas, merged cells, styling, and images
|
||||
- Word (.docx) - Preserves headings, tables, images, headers/footers
|
||||
- PowerPoint (.pptx) - Preserves layouts, animations, and media
|
||||
|
||||
**Parameters:**
|
||||
- **file**: The document file to translate
|
||||
- **target_language**: Target language code (e.g., 'es' for Spanish, 'fr' for French)
|
||||
- **source_language**: Source language code (optional, default: auto-detect)
|
||||
- **cleanup**: Whether to delete the uploaded file after translation (default: True)
|
||||
|
||||
**Returns:**
|
||||
- Translated document file with preserved formatting
|
||||
"""
|
||||
input_path = None
|
||||
output_path = None
|
||||
|
||||
try:
|
||||
# Validate file extension
|
||||
file_extension = file_handler.validate_file_extension(file.filename)
|
||||
logger.info(f"Processing {file_extension} file: {file.filename}")
|
||||
|
||||
# Validate file size
|
||||
file_handler.validate_file_size(file)
|
||||
|
||||
# Generate unique filenames
|
||||
input_filename = file_handler.generate_unique_filename(file.filename, "input")
|
||||
output_filename = file_handler.generate_unique_filename(file.filename, "translated")
|
||||
|
||||
# Save uploaded file
|
||||
input_path = config.UPLOAD_DIR / input_filename
|
||||
output_path = config.OUTPUT_DIR / output_filename
|
||||
|
||||
await file_handler.save_upload_file(file, input_path)
|
||||
logger.info(f"Saved input file to: {input_path}")
|
||||
|
||||
# Translate based on file type
|
||||
if file_extension == ".xlsx":
|
||||
logger.info("Translating Excel file...")
|
||||
excel_translator.translate_file(input_path, output_path, target_language)
|
||||
elif file_extension == ".docx":
|
||||
logger.info("Translating Word document...")
|
||||
word_translator.translate_file(input_path, output_path, target_language)
|
||||
elif file_extension == ".pptx":
|
||||
logger.info("Translating PowerPoint presentation...")
|
||||
pptx_translator.translate_file(input_path, output_path, target_language)
|
||||
else:
|
||||
raise DocumentProcessingError(f"Unsupported file type: {file_extension}")
|
||||
|
||||
logger.info(f"Translation completed: {output_path}")
|
||||
|
||||
# Get file info
|
||||
output_info = file_handler.get_file_info(output_path)
|
||||
|
||||
# Cleanup input file if requested
|
||||
if cleanup and input_path:
|
||||
file_handler.cleanup_file(input_path)
|
||||
logger.info(f"Cleaned up input file: {input_path}")
|
||||
|
||||
# Return the translated file
|
||||
return FileResponse(
|
||||
path=output_path,
|
||||
filename=f"translated_{file.filename}",
|
||||
media_type="application/octet-stream",
|
||||
headers={
|
||||
"X-Original-Filename": file.filename,
|
||||
"X-File-Size-MB": str(output_info.get("size_mb", 0)),
|
||||
"X-Target-Language": target_language
|
||||
}
|
||||
)
|
||||
|
||||
except HTTPException:
|
||||
# Re-raise HTTP exceptions
|
||||
raise
|
||||
except Exception as e:
|
||||
logger.error(f"Translation error: {str(e)}", exc_info=True)
|
||||
|
||||
# Cleanup files on error
|
||||
if input_path:
|
||||
file_handler.cleanup_file(input_path)
|
||||
if output_path:
|
||||
file_handler.cleanup_file(output_path)
|
||||
|
||||
raise handle_translation_error(e)
|
||||
|
||||
|
||||
@app.delete("/cleanup/{filename}")
|
||||
async def cleanup_translated_file(filename: str):
|
||||
"""
|
||||
Cleanup a translated file after download
|
||||
|
||||
**Parameters:**
|
||||
- **filename**: Name of the file to delete from the outputs directory
|
||||
"""
|
||||
try:
|
||||
file_path = config.OUTPUT_DIR / filename
|
||||
|
||||
if not file_path.exists():
|
||||
raise HTTPException(status_code=404, detail="File not found")
|
||||
|
||||
file_handler.cleanup_file(file_path)
|
||||
|
||||
return {"message": f"File {filename} deleted successfully"}
|
||||
|
||||
except HTTPException:
|
||||
raise
|
||||
except Exception as e:
|
||||
logger.error(f"Cleanup error: {str(e)}")
|
||||
raise HTTPException(status_code=500, detail="Error cleaning up file")
|
||||
|
||||
|
||||
@app.post("/translate-batch")
|
||||
async def translate_batch_documents(
|
||||
files: list[UploadFile] = File(..., description="Multiple document files to translate"),
|
||||
target_language: str = Form(..., description="Target language code"),
|
||||
source_language: str = Form(default="auto", description="Source language code")
|
||||
):
|
||||
"""
|
||||
Translate multiple documents in batch
|
||||
|
||||
**Note:** This endpoint processes files sequentially. For large batches, consider
|
||||
calling the single file endpoint multiple times with concurrent requests.
|
||||
"""
|
||||
results = []
|
||||
|
||||
for file in files:
|
||||
try:
|
||||
# Process each file using the same logic as single file translation
|
||||
file_extension = file_handler.validate_file_extension(file.filename)
|
||||
file_handler.validate_file_size(file)
|
||||
|
||||
input_filename = file_handler.generate_unique_filename(file.filename, "input")
|
||||
output_filename = file_handler.generate_unique_filename(file.filename, "translated")
|
||||
|
||||
input_path = config.UPLOAD_DIR / input_filename
|
||||
output_path = config.OUTPUT_DIR / output_filename
|
||||
|
||||
await file_handler.save_upload_file(file, input_path)
|
||||
|
||||
# Translate based on file type
|
||||
if file_extension == ".xlsx":
|
||||
excel_translator.translate_file(input_path, output_path, target_language)
|
||||
elif file_extension == ".docx":
|
||||
word_translator.translate_file(input_path, output_path, target_language)
|
||||
elif file_extension == ".pptx":
|
||||
pptx_translator.translate_file(input_path, output_path, target_language)
|
||||
|
||||
# Cleanup input file
|
||||
file_handler.cleanup_file(input_path)
|
||||
|
||||
results.append({
|
||||
"filename": file.filename,
|
||||
"status": "success",
|
||||
"output_file": output_filename,
|
||||
"download_url": f"/download/{output_filename}"
|
||||
})
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Error processing {file.filename}: {str(e)}")
|
||||
results.append({
|
||||
"filename": file.filename,
|
||||
"status": "error",
|
||||
"error": str(e)
|
||||
})
|
||||
|
||||
return {
|
||||
"total_files": len(files),
|
||||
"successful": len([r for r in results if r["status"] == "success"]),
|
||||
"failed": len([r for r in results if r["status"] == "error"]),
|
||||
"results": results
|
||||
}
|
||||
|
||||
|
||||
@app.get("/download/{filename}")
|
||||
async def download_file(filename: str):
|
||||
"""
|
||||
Download a translated file by filename
|
||||
|
||||
**Parameters:**
|
||||
- **filename**: Name of the file to download from the outputs directory
|
||||
"""
|
||||
file_path = config.OUTPUT_DIR / filename
|
||||
|
||||
if not file_path.exists():
|
||||
raise HTTPException(status_code=404, detail="File not found")
|
||||
|
||||
return FileResponse(
|
||||
path=file_path,
|
||||
filename=filename,
|
||||
media_type="application/octet-stream"
|
||||
)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
import uvicorn
|
||||
uvicorn.run("main:app", host="0.0.0.0", port=8000, reload=True)
|
||||
239
mcp_server_example.py
Normal file
@ -0,0 +1,239 @@
|
||||
"""
|
||||
Example MCP Server Implementation for Document Translation API
|
||||
This demonstrates how to wrap the translation API as an MCP server
|
||||
"""
|
||||
import asyncio
|
||||
import httpx
|
||||
from typing import Any
|
||||
from mcp.server.models import InitializationOptions
|
||||
from mcp.server import NotificationOptions, Server
|
||||
from mcp.server.stdio import stdio_server
|
||||
from mcp import types
|
||||
|
||||
# API Configuration
|
||||
API_BASE_URL = "http://localhost:8000"
|
||||
|
||||
|
||||
class DocumentTranslatorMCP:
|
||||
"""MCP Server for Document Translation API"""
|
||||
|
||||
def __init__(self):
|
||||
self.server = Server("document-translator")
|
||||
self.http_client = None
|
||||
self._setup_handlers()
|
||||
|
||||
def _setup_handlers(self):
|
||||
"""Set up MCP tool handlers"""
|
||||
|
||||
@self.server.list_tools()
|
||||
async def handle_list_tools() -> list[types.Tool]:
|
||||
"""List available tools"""
|
||||
return [
|
||||
types.Tool(
|
||||
name="translate_document",
|
||||
description="Translate a document (Excel, Word, or PowerPoint) while preserving all formatting",
|
||||
inputSchema={
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"file_path": {
|
||||
"type": "string",
|
||||
"description": "Path to the document file to translate"
|
||||
},
|
||||
"target_language": {
|
||||
"type": "string",
|
||||
"description": "Target language code (e.g., 'es', 'fr', 'de')"
|
||||
},
|
||||
"source_language": {
|
||||
"type": "string",
|
||||
"description": "Source language code (default: 'auto' for auto-detection)",
|
||||
"default": "auto"
|
||||
},
|
||||
"output_path": {
|
||||
"type": "string",
|
||||
"description": "Path where the translated document should be saved"
|
||||
}
|
||||
},
|
||||
"required": ["file_path", "target_language", "output_path"]
|
||||
}
|
||||
),
|
||||
types.Tool(
|
||||
name="get_supported_languages",
|
||||
description="Get list of supported language codes for translation",
|
||||
inputSchema={
|
||||
"type": "object",
|
||||
"properties": {}
|
||||
}
|
||||
),
|
||||
types.Tool(
|
||||
name="check_api_health",
|
||||
description="Check if the translation API is healthy and operational",
|
||||
inputSchema={
|
||||
"type": "object",
|
||||
"properties": {}
|
||||
}
|
||||
)
|
||||
]
|
||||
|
||||
@self.server.call_tool()
|
||||
async def handle_call_tool(
|
||||
name: str,
|
||||
arguments: dict[str, Any] | None
|
||||
) -> list[types.TextContent | types.ImageContent | types.EmbeddedResource]:
|
||||
"""Handle tool calls"""
|
||||
|
||||
if name == "translate_document":
|
||||
return await self._translate_document(arguments)
|
||||
elif name == "get_supported_languages":
|
||||
return await self._get_supported_languages()
|
||||
elif name == "check_api_health":
|
||||
return await self._check_health()
|
||||
else:
|
||||
raise ValueError(f"Unknown tool: {name}")
|
||||
|
||||
async def _translate_document(self, args: dict[str, Any]) -> list[types.TextContent]:
|
||||
"""Translate a document via the API"""
|
||||
file_path = args["file_path"]
|
||||
target_language = args["target_language"]
|
||||
source_language = args.get("source_language", "auto")
|
||||
output_path = args["output_path"]
|
||||
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=300.0) as client:
|
||||
# Upload and translate the document
|
||||
with open(file_path, "rb") as f:
|
||||
files = {"file": (file_path, f)}
|
||||
data = {
|
||||
"target_language": target_language,
|
||||
"source_language": source_language
|
||||
}
|
||||
|
||||
response = await client.post(
|
||||
f"{API_BASE_URL}/translate",
|
||||
files=files,
|
||||
data=data
|
||||
)
|
||||
|
||||
if response.status_code == 200:
|
||||
# Save the translated document
|
||||
with open(output_path, "wb") as output:
|
||||
output.write(response.content)
|
||||
|
||||
return [
|
||||
types.TextContent(
|
||||
type="text",
|
||||
text=f"✅ Document translated successfully!\n\n"
|
||||
f"Original: {file_path}\n"
|
||||
f"Translated: {output_path}\n"
|
||||
f"Language: {source_language} → {target_language}\n"
|
||||
f"Size: {len(response.content)} bytes"
|
||||
)
|
||||
]
|
||||
else:
|
||||
error_detail = response.json().get("detail", "Unknown error")
|
||||
return [
|
||||
types.TextContent(
|
||||
type="text",
|
||||
text=f"❌ Translation failed: {error_detail}"
|
||||
)
|
||||
]
|
||||
|
||||
except Exception as e:
|
||||
return [
|
||||
types.TextContent(
|
||||
type="text",
|
||||
text=f"❌ Error during translation: {str(e)}"
|
||||
)
|
||||
]
|
||||
|
||||
async def _get_supported_languages(self) -> list[types.TextContent]:
|
||||
"""Get supported languages from the API"""
|
||||
try:
|
||||
async with httpx.AsyncClient() as client:
|
||||
response = await client.get(f"{API_BASE_URL}/languages")
|
||||
|
||||
if response.status_code == 200:
|
||||
data = response.json()
|
||||
languages = data.get("supported_languages", {})
|
||||
|
||||
lang_list = "\n".join([f"- {code}: {name}" for code, name in languages.items()])
|
||||
|
||||
return [
|
||||
types.TextContent(
|
||||
type="text",
|
||||
text=f"📚 Supported Languages:\n\n{lang_list}\n\n"
|
||||
f"Note: {data.get('note', '')}"
|
||||
)
|
||||
]
|
||||
else:
|
||||
return [
|
||||
types.TextContent(
|
||||
type="text",
|
||||
text="❌ Failed to retrieve supported languages"
|
||||
)
|
||||
]
|
||||
|
||||
except Exception as e:
|
||||
return [
|
||||
types.TextContent(
|
||||
type="text",
|
||||
text=f"❌ Error: {str(e)}"
|
||||
)
|
||||
]
|
||||
|
||||
async def _check_health(self) -> list[types.TextContent]:
|
||||
"""Check API health"""
|
||||
try:
|
||||
async with httpx.AsyncClient() as client:
|
||||
response = await client.get(f"{API_BASE_URL}/health")
|
||||
|
||||
if response.status_code == 200:
|
||||
data = response.json()
|
||||
return [
|
||||
types.TextContent(
|
||||
type="text",
|
||||
text=f"✅ API is healthy!\n\n"
|
||||
f"Status: {data.get('status')}\n"
|
||||
f"Translation Service: {data.get('translation_service')}"
|
||||
)
|
||||
]
|
||||
else:
|
||||
return [
|
||||
types.TextContent(
|
||||
type="text",
|
||||
text="❌ API is not responding correctly"
|
||||
)
|
||||
]
|
||||
|
||||
except Exception as e:
|
||||
return [
|
||||
types.TextContent(
|
||||
type="text",
|
||||
text=f"❌ Cannot connect to API: {str(e)}"
|
||||
)
|
||||
]
|
||||
|
||||
async def run(self):
|
||||
"""Run the MCP server"""
|
||||
async with stdio_server() as (read_stream, write_stream):
|
||||
await self.server.run(
|
||||
read_stream,
|
||||
write_stream,
|
||||
InitializationOptions(
|
||||
server_name="document-translator",
|
||||
server_version="1.0.0",
|
||||
capabilities=self.server.get_capabilities(
|
||||
notification_options=NotificationOptions(),
|
||||
experimental_capabilities={}
|
||||
)
|
||||
)
|
||||
)
|
||||
|
||||
|
||||
async def main():
|
||||
"""Main entry point"""
|
||||
mcp_server = DocumentTranslatorMCP()
|
||||
await mcp_server.run()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
10
pyproject.toml
Normal file
@ -0,0 +1,10 @@
|
||||
[project]
|
||||
name = "translate"
|
||||
version = "0.1.0"
|
||||
description = "Add your description here"
|
||||
readme = "README.md"
|
||||
requires-python = ">=3.12"
|
||||
dependencies = [
|
||||
"pip>=25.3",
|
||||
"requests>=2.32.5",
|
||||
]
|
||||
5
requirements-mcp.txt
Normal file
@ -0,0 +1,5 @@
|
||||
# MCP Server Requirements
|
||||
# Add these to requirements.txt if you want to implement the MCP server
|
||||
|
||||
mcp>=0.9.0
|
||||
httpx>=0.26.0
|
||||
5
requirements-test.txt
Normal file
@ -0,0 +1,5 @@
|
||||
# Testing requirements
|
||||
requests==2.31.0
|
||||
pytest==7.4.3
|
||||
pytest-asyncio==0.23.2
|
||||
httpx==0.26.0
|
||||
15
requirements.txt
Normal file
@ -0,0 +1,15 @@
|
||||
fastapi==0.109.0
|
||||
uvicorn[standard]==0.27.0
|
||||
python-multipart==0.0.9
|
||||
openpyxl==3.1.2
|
||||
python-docx==1.1.0
|
||||
python-pptx==0.6.23
|
||||
deep-translator==1.11.4
|
||||
python-dotenv==1.0.0
|
||||
pydantic==2.5.3
|
||||
aiofiles==23.2.1
|
||||
Pillow==10.2.0
|
||||
matplotlib==3.8.2
|
||||
pandas==2.1.4
|
||||
requests==2.31.0
|
||||
ipykernel==6.27.1
|
||||
1
sample_files/.~lock.complex_sample.docx#
Normal file
@ -0,0 +1 @@
|
||||
,ramez,simorgh,30.11.2025 09:24,C:/Users/ramez/AppData/Local/onlyoffice;
|
||||
BIN
sample_files/complex_sample.docx
Normal file
BIN
sample_files/complex_sample.pptx
Normal file
BIN
sample_files/complex_sample.xlsx
Normal file
BIN
sample_files/ppt_conclusion_image.png
Normal file
|
After Width: | Height: | Size: 7.8 KiB |
BIN
sample_files/ppt_financial_chart.png
Normal file
|
After Width: | Height: | Size: 51 KiB |
BIN
sample_files/ppt_image1.png
Normal file
|
After Width: | Height: | Size: 6.7 KiB |
BIN
sample_files/ppt_image2.png
Normal file
|
After Width: | Height: | Size: 6.7 KiB |
BIN
sample_files/super_complex.docx
Normal file
BIN
sample_files/super_complex.pptx
Normal file
BIN
sample_files/super_complex.xlsx
Normal file
BIN
sample_files/word_chart.png
Normal file
|
After Width: | Height: | Size: 16 KiB |
BIN
sample_files/word_image.png
Normal file
|
After Width: | Height: | Size: 5.9 KiB |
BIN
sample_files/word_performance_chart.png
Normal file
|
After Width: | Height: | Size: 81 KiB |
BIN
sample_files/word_strategy_image.png
Normal file
|
After Width: | Height: | Size: 16 KiB |
4
services/__init__.py
Normal file
@ -0,0 +1,4 @@
|
||||
"""Services package initialization"""
|
||||
from .translation_service import TranslationService, translation_service
|
||||
|
||||
__all__ = ['TranslationService', 'translation_service']
|
||||
124
services/translation_service.py
Normal file
@ -0,0 +1,124 @@
|
||||
"""
|
||||
Translation Service Abstraction
|
||||
Provides a unified interface for different translation providers
|
||||
"""
|
||||
from abc import ABC, abstractmethod
|
||||
from typing import Optional
|
||||
from deep_translator import GoogleTranslator, DeeplTranslator, LibreTranslator
|
||||
from config import config
|
||||
|
||||
|
||||
class TranslationProvider(ABC):
|
||||
"""Abstract base class for translation providers"""
|
||||
|
||||
@abstractmethod
|
||||
def translate(self, text: str, target_language: str, source_language: str = 'auto') -> str:
|
||||
"""Translate text from source to target language"""
|
||||
pass
|
||||
|
||||
|
||||
class GoogleTranslationProvider(TranslationProvider):
|
||||
"""Google Translate implementation"""
|
||||
|
||||
def translate(self, text: str, target_language: str, source_language: str = 'auto') -> str:
|
||||
if not text or not text.strip():
|
||||
return text
|
||||
|
||||
try:
|
||||
translator = GoogleTranslator(source=source_language, target=target_language)
|
||||
return translator.translate(text)
|
||||
except Exception as e:
|
||||
print(f"Translation error: {e}")
|
||||
return text
|
||||
|
||||
|
||||
class DeepLTranslationProvider(TranslationProvider):
|
||||
"""DeepL Translate implementation"""
|
||||
|
||||
def __init__(self, api_key: str):
|
||||
self.api_key = api_key
|
||||
|
||||
def translate(self, text: str, target_language: str, source_language: str = 'auto') -> str:
|
||||
if not text or not text.strip():
|
||||
return text
|
||||
|
||||
try:
|
||||
translator = DeeplTranslator(api_key=self.api_key, source=source_language, target=target_language)
|
||||
return translator.translate(text)
|
||||
except Exception as e:
|
||||
print(f"Translation error: {e}")
|
||||
return text
|
||||
|
||||
|
||||
class LibreTranslationProvider(TranslationProvider):
|
||||
"""LibreTranslate implementation"""
|
||||
|
||||
def translate(self, text: str, target_language: str, source_language: str = 'auto') -> str:
|
||||
if not text or not text.strip():
|
||||
return text
|
||||
|
||||
try:
|
||||
translator = LibreTranslator(source=source_language, target=target_language)
|
||||
return translator.translate(text)
|
||||
except Exception as e:
|
||||
print(f"Translation error: {e}")
|
||||
return text
|
||||
|
||||
|
||||
class TranslationService:
|
||||
"""Main translation service that delegates to the configured provider"""
|
||||
|
||||
def __init__(self, provider: Optional[TranslationProvider] = None):
|
||||
if provider:
|
||||
self.provider = provider
|
||||
else:
|
||||
# Auto-select provider based on configuration
|
||||
self.provider = self._get_default_provider()
|
||||
|
||||
def _get_default_provider(self) -> TranslationProvider:
|
||||
"""Get the default translation provider from configuration"""
|
||||
service_type = config.TRANSLATION_SERVICE.lower()
|
||||
|
||||
if service_type == "deepl":
|
||||
if not config.DEEPL_API_KEY:
|
||||
raise ValueError("DeepL API key not configured")
|
||||
return DeepLTranslationProvider(config.DEEPL_API_KEY)
|
||||
elif service_type == "libre":
|
||||
return LibreTranslationProvider()
|
||||
else: # Default to Google
|
||||
return GoogleTranslationProvider()
|
||||
|
||||
def translate_text(self, text: str, target_language: str, source_language: str = 'auto') -> str:
|
||||
"""
|
||||
Translate a single text string
|
||||
|
||||
Args:
|
||||
text: Text to translate
|
||||
target_language: Target language code (e.g., 'es', 'fr', 'de')
|
||||
source_language: Source language code (default: 'auto' for auto-detection)
|
||||
|
||||
Returns:
|
||||
Translated text
|
||||
"""
|
||||
if not text or not text.strip():
|
||||
return text
|
||||
|
||||
return self.provider.translate(text, target_language, source_language)
|
||||
|
||||
def translate_batch(self, texts: list[str], target_language: str, source_language: str = 'auto') -> list[str]:
|
||||
"""
|
||||
Translate multiple text strings
|
||||
|
||||
Args:
|
||||
texts: List of texts to translate
|
||||
target_language: Target language code
|
||||
source_language: Source language code (default: 'auto')
|
||||
|
||||
Returns:
|
||||
List of translated texts
|
||||
"""
|
||||
return [self.translate_text(text, target_language, source_language) for text in texts]
|
||||
|
||||
|
||||
# Global translation service instance
|
||||
translation_service = TranslationService()
|
||||
48
start.ps1
Normal file
@ -0,0 +1,48 @@
|
||||
# Startup script for Windows PowerShell
|
||||
# Run this to start the Document Translation API
|
||||
|
||||
Write-Host "===============================================" -ForegroundColor Cyan
|
||||
Write-Host " Document Translation API - Starting Server " -ForegroundColor Cyan
|
||||
Write-Host "===============================================" -ForegroundColor Cyan
|
||||
Write-Host ""
|
||||
|
||||
# Check if virtual environment exists
|
||||
if (-Not (Test-Path ".\venv")) {
|
||||
Write-Host "Virtual environment not found. Creating one..." -ForegroundColor Yellow
|
||||
python -m venv venv
|
||||
}
|
||||
|
||||
# Activate virtual environment
|
||||
Write-Host "Activating virtual environment..." -ForegroundColor Green
|
||||
& .\venv\Scripts\Activate.ps1
|
||||
|
||||
# Install dependencies if needed
|
||||
Write-Host "Checking dependencies..." -ForegroundColor Green
|
||||
pip install -r requirements.txt --quiet
|
||||
|
||||
# Create necessary directories
|
||||
Write-Host "Creating directories..." -ForegroundColor Green
|
||||
New-Item -ItemType Directory -Force -Path uploads | Out-Null
|
||||
New-Item -ItemType Directory -Force -Path outputs | Out-Null
|
||||
New-Item -ItemType Directory -Force -Path temp | Out-Null
|
||||
|
||||
# Copy .env.example to .env if .env doesn't exist
|
||||
if (-Not (Test-Path ".\.env")) {
|
||||
Write-Host "Creating .env file from template..." -ForegroundColor Yellow
|
||||
Copy-Item .env.example .env
|
||||
}
|
||||
|
||||
Write-Host ""
|
||||
Write-Host "===============================================" -ForegroundColor Green
|
||||
Write-Host " Starting API Server on http://localhost:8000 " -ForegroundColor Green
|
||||
Write-Host "===============================================" -ForegroundColor Green
|
||||
Write-Host ""
|
||||
Write-Host "API Documentation available at:" -ForegroundColor Cyan
|
||||
Write-Host " - Swagger UI: http://localhost:8000/docs" -ForegroundColor White
|
||||
Write-Host " - ReDoc: http://localhost:8000/redoc" -ForegroundColor White
|
||||
Write-Host ""
|
||||
Write-Host "Press Ctrl+C to stop the server" -ForegroundColor Yellow
|
||||
Write-Host ""
|
||||
|
||||
# Start the server
|
||||
python main.py
|
||||
10
translators/__init__.py
Normal file
@ -0,0 +1,10 @@
|
||||
"""Translators package initialization"""
|
||||
from .excel_translator import ExcelTranslator, excel_translator
|
||||
from .word_translator import WordTranslator, word_translator
|
||||
from .pptx_translator import PowerPointTranslator, pptx_translator
|
||||
|
||||
__all__ = [
|
||||
'ExcelTranslator', 'excel_translator',
|
||||
'WordTranslator', 'word_translator',
|
||||
'PowerPointTranslator', 'pptx_translator'
|
||||
]
|
||||
161
translators/excel_translator.py
Normal file
@ -0,0 +1,161 @@
|
||||
"""
|
||||
Excel Translation Module
|
||||
Translates Excel files while preserving all formatting, formulas, images, and layout
|
||||
"""
|
||||
import re
|
||||
from pathlib import Path
|
||||
from typing import Dict, Set
|
||||
from openpyxl import load_workbook
|
||||
from openpyxl.worksheet.worksheet import Worksheet
|
||||
from openpyxl.cell.cell import Cell
|
||||
from openpyxl.utils import get_column_letter
|
||||
from services.translation_service import translation_service
|
||||
|
||||
|
||||
class ExcelTranslator:
|
||||
"""Handles translation of Excel files with strict formatting preservation"""
|
||||
|
||||
def __init__(self):
|
||||
self.translation_service = translation_service
|
||||
self.formula_pattern = re.compile(r'=.*')
|
||||
|
||||
def translate_file(self, input_path: Path, output_path: Path, target_language: str) -> Path:
|
||||
"""
|
||||
Translate an Excel file while preserving all formatting and structure
|
||||
|
||||
Args:
|
||||
input_path: Path to input Excel file
|
||||
output_path: Path to save translated Excel file
|
||||
target_language: Target language code
|
||||
|
||||
Returns:
|
||||
Path to the translated file
|
||||
"""
|
||||
# Load workbook with data_only=False to preserve formulas
|
||||
workbook = load_workbook(input_path, data_only=False)
|
||||
|
||||
# First, translate all worksheet content
|
||||
sheet_name_mapping = {}
|
||||
for sheet_name in workbook.sheetnames:
|
||||
worksheet = workbook[sheet_name]
|
||||
self._translate_worksheet(worksheet, target_language)
|
||||
|
||||
# Prepare translated sheet name (but don't rename yet)
|
||||
translated_sheet_name = self.translation_service.translate_text(
|
||||
sheet_name, target_language
|
||||
)
|
||||
if translated_sheet_name and translated_sheet_name != sheet_name:
|
||||
# Truncate to Excel's 31 character limit and ensure uniqueness
|
||||
new_name = translated_sheet_name[:31]
|
||||
counter = 1
|
||||
base_name = new_name[:28] if len(new_name) > 28 else new_name
|
||||
while new_name in sheet_name_mapping.values() or new_name in workbook.sheetnames:
|
||||
new_name = f"{base_name}_{counter}"
|
||||
counter += 1
|
||||
sheet_name_mapping[sheet_name] = new_name
|
||||
|
||||
# Now rename sheets (after all content is translated)
|
||||
for original_name, new_name in sheet_name_mapping.items():
|
||||
workbook[original_name].title = new_name
|
||||
|
||||
# Save the translated workbook
|
||||
workbook.save(output_path)
|
||||
workbook.close()
|
||||
|
||||
return output_path
|
||||
|
||||
def _translate_worksheet(self, worksheet: Worksheet, target_language: str):
|
||||
"""
|
||||
Translate all cells in a worksheet while preserving formatting
|
||||
|
||||
Args:
|
||||
worksheet: Worksheet to translate
|
||||
target_language: Target language code
|
||||
"""
|
||||
# Iterate through all cells that have values
|
||||
for row in worksheet.iter_rows():
|
||||
for cell in row:
|
||||
if cell.value is not None:
|
||||
self._translate_cell(cell, target_language)
|
||||
|
||||
def _translate_cell(self, cell: Cell, target_language: str):
|
||||
"""
|
||||
Translate a single cell while preserving its formula and formatting
|
||||
|
||||
Args:
|
||||
cell: Cell to translate
|
||||
target_language: Target language code
|
||||
"""
|
||||
original_value = cell.value
|
||||
|
||||
# Skip if cell is empty
|
||||
if original_value is None:
|
||||
return
|
||||
|
||||
# Handle formulas
|
||||
if isinstance(original_value, str) and original_value.startswith('='):
|
||||
self._translate_formula(cell, original_value, target_language)
|
||||
# Handle regular text
|
||||
elif isinstance(original_value, str):
|
||||
translated_text = self.translation_service.translate_text(
|
||||
original_value, target_language
|
||||
)
|
||||
cell.value = translated_text
|
||||
# Numbers, dates, booleans remain unchanged
|
||||
|
||||
def _translate_formula(self, cell: Cell, formula: str, target_language: str):
|
||||
"""
|
||||
Translate text within a formula while preserving the formula structure
|
||||
|
||||
Args:
|
||||
cell: Cell containing the formula
|
||||
formula: Formula string
|
||||
target_language: Target language code
|
||||
"""
|
||||
# Extract text strings from formula (text within quotes)
|
||||
string_pattern = re.compile(r'"([^"]*)"')
|
||||
strings = string_pattern.findall(formula)
|
||||
|
||||
if not strings:
|
||||
return
|
||||
|
||||
# Translate each string and replace in formula
|
||||
translated_formula = formula
|
||||
for original_string in strings:
|
||||
if original_string.strip(): # Only translate non-empty strings
|
||||
translated_string = self.translation_service.translate_text(
|
||||
original_string, target_language
|
||||
)
|
||||
# Replace in formula, being careful with special regex characters
|
||||
translated_formula = translated_formula.replace(
|
||||
f'"{original_string}"', f'"{translated_string}"'
|
||||
)
|
||||
|
||||
cell.value = translated_formula
|
||||
|
||||
def _should_translate(self, text: str) -> bool:
|
||||
"""
|
||||
Determine if text should be translated
|
||||
|
||||
Args:
|
||||
text: Text to check
|
||||
|
||||
Returns:
|
||||
True if text should be translated, False otherwise
|
||||
"""
|
||||
if not text or not isinstance(text, str):
|
||||
return False
|
||||
|
||||
# Don't translate if it's only numbers, special characters, or very short
|
||||
if len(text.strip()) < 2:
|
||||
return False
|
||||
|
||||
# Check if it's a formula (handled separately)
|
||||
if text.startswith('='):
|
||||
return False
|
||||
|
||||
return True
|
||||
|
||||
|
||||
# Global translator instance
|
||||
excel_translator = ExcelTranslator()
|
||||
158
translators/pptx_translator.py
Normal file
@ -0,0 +1,158 @@
|
||||
"""
|
||||
PowerPoint Translation Module
|
||||
Translates PowerPoint files while preserving all layouts, animations, and media
|
||||
"""
|
||||
from pathlib import Path
|
||||
from pptx import Presentation
|
||||
from pptx.shapes.base import BaseShape
|
||||
from pptx.shapes.group import GroupShape
|
||||
from pptx.util import Inches, Pt
|
||||
from pptx.enum.shapes import MSO_SHAPE_TYPE
|
||||
from services.translation_service import translation_service
|
||||
|
||||
|
||||
class PowerPointTranslator:
|
||||
"""Handles translation of PowerPoint presentations with strict formatting preservation"""
|
||||
|
||||
def __init__(self):
|
||||
self.translation_service = translation_service
|
||||
|
||||
def translate_file(self, input_path: Path, output_path: Path, target_language: str) -> Path:
|
||||
"""
|
||||
Translate a PowerPoint presentation while preserving all formatting and structure
|
||||
|
||||
Args:
|
||||
input_path: Path to input PowerPoint file
|
||||
output_path: Path to save translated PowerPoint file
|
||||
target_language: Target language code
|
||||
|
||||
Returns:
|
||||
Path to the translated file
|
||||
"""
|
||||
presentation = Presentation(input_path)
|
||||
|
||||
# Translate each slide
|
||||
for slide in presentation.slides:
|
||||
self._translate_slide(slide, target_language)
|
||||
|
||||
# Save the translated presentation
|
||||
presentation.save(output_path)
|
||||
|
||||
return output_path
|
||||
|
||||
def _translate_slide(self, slide, target_language: str):
|
||||
"""
|
||||
Translate all text elements in a slide while preserving layout
|
||||
|
||||
Args:
|
||||
slide: Slide to translate
|
||||
target_language: Target language code
|
||||
"""
|
||||
# Translate notes (speaker notes)
|
||||
if slide.has_notes_slide:
|
||||
notes_slide = slide.notes_slide
|
||||
if notes_slide.notes_text_frame:
|
||||
self._translate_text_frame(notes_slide.notes_text_frame, target_language)
|
||||
|
||||
# Translate shapes in the slide
|
||||
for shape in slide.shapes:
|
||||
self._translate_shape(shape, target_language)
|
||||
|
||||
def _translate_shape(self, shape: BaseShape, target_language: str):
|
||||
"""
|
||||
Translate text in a shape based on its type
|
||||
|
||||
Args:
|
||||
shape: Shape to translate
|
||||
target_language: Target language code
|
||||
"""
|
||||
# Handle text-containing shapes
|
||||
if shape.has_text_frame:
|
||||
self._translate_text_frame(shape.text_frame, target_language)
|
||||
|
||||
# Handle tables
|
||||
if shape.shape_type == MSO_SHAPE_TYPE.TABLE:
|
||||
self._translate_table(shape.table, target_language)
|
||||
|
||||
# Handle group shapes (shapes within shapes)
|
||||
if shape.shape_type == MSO_SHAPE_TYPE.GROUP:
|
||||
for sub_shape in shape.shapes:
|
||||
self._translate_shape(sub_shape, target_language)
|
||||
|
||||
# Handle smart art (contains multiple shapes)
|
||||
# Smart art is complex, but we can try to translate text within it
|
||||
if hasattr(shape, 'shapes'):
|
||||
try:
|
||||
for sub_shape in shape.shapes:
|
||||
self._translate_shape(sub_shape, target_language)
|
||||
except:
|
||||
pass # Some shapes may not support iteration
|
||||
|
||||
def _translate_text_frame(self, text_frame, target_language: str):
|
||||
"""
|
||||
Translate text within a text frame while preserving formatting
|
||||
|
||||
Args:
|
||||
text_frame: Text frame to translate
|
||||
target_language: Target language code
|
||||
"""
|
||||
if not text_frame.text.strip():
|
||||
return
|
||||
|
||||
# Translate each paragraph in the text frame
|
||||
for paragraph in text_frame.paragraphs:
|
||||
self._translate_paragraph(paragraph, target_language)
|
||||
|
||||
def _translate_paragraph(self, paragraph, target_language: str):
|
||||
"""
|
||||
Translate a paragraph while preserving run-level formatting
|
||||
|
||||
Args:
|
||||
paragraph: Paragraph to translate
|
||||
target_language: Target language code
|
||||
"""
|
||||
if not paragraph.text.strip():
|
||||
return
|
||||
|
||||
# Translate each run in the paragraph to preserve individual formatting
|
||||
for run in paragraph.runs:
|
||||
if run.text.strip():
|
||||
translated_text = self.translation_service.translate_text(
|
||||
run.text, target_language
|
||||
)
|
||||
run.text = translated_text
|
||||
|
||||
def _translate_table(self, table, target_language: str):
|
||||
"""
|
||||
Translate all cells in a table while preserving structure
|
||||
|
||||
Args:
|
||||
table: Table to translate
|
||||
target_language: Target language code
|
||||
"""
|
||||
for row in table.rows:
|
||||
for cell in row.cells:
|
||||
self._translate_text_frame(cell.text_frame, target_language)
|
||||
|
||||
def _is_translatable(self, text: str) -> bool:
|
||||
"""
|
||||
Determine if text should be translated
|
||||
|
||||
Args:
|
||||
text: Text to check
|
||||
|
||||
Returns:
|
||||
True if text should be translated, False otherwise
|
||||
"""
|
||||
if not text or not isinstance(text, str):
|
||||
return False
|
||||
|
||||
# Don't translate if it's only numbers, special characters, or very short
|
||||
if len(text.strip()) < 2:
|
||||
return False
|
||||
|
||||
return True
|
||||
|
||||
|
||||
# Global translator instance
|
||||
pptx_translator = PowerPointTranslator()
|
||||
171
translators/word_translator.py
Normal file
@ -0,0 +1,171 @@
|
||||
"""
|
||||
Word Document Translation Module
|
||||
Translates Word files while preserving all formatting, styles, tables, and images
|
||||
"""
|
||||
from pathlib import Path
|
||||
from docx import Document
|
||||
from docx.text.paragraph import Paragraph
|
||||
from docx.table import Table, _Cell
|
||||
from docx.oxml.text.paragraph import CT_P
|
||||
from docx.oxml.table import CT_Tbl
|
||||
from docx.section import Section
|
||||
from services.translation_service import translation_service
|
||||
|
||||
|
||||
class WordTranslator:
|
||||
"""Handles translation of Word documents with strict formatting preservation"""
|
||||
|
||||
def __init__(self):
|
||||
self.translation_service = translation_service
|
||||
|
||||
def translate_file(self, input_path: Path, output_path: Path, target_language: str) -> Path:
|
||||
"""
|
||||
Translate a Word document while preserving all formatting and structure
|
||||
|
||||
Args:
|
||||
input_path: Path to input Word file
|
||||
output_path: Path to save translated Word file
|
||||
target_language: Target language code
|
||||
|
||||
Returns:
|
||||
Path to the translated file
|
||||
"""
|
||||
document = Document(input_path)
|
||||
|
||||
# Translate main document body
|
||||
self._translate_document_body(document, target_language)
|
||||
|
||||
# Translate headers and footers in all sections
|
||||
for section in document.sections:
|
||||
self._translate_section(section, target_language)
|
||||
|
||||
# Save the translated document
|
||||
document.save(output_path)
|
||||
|
||||
return output_path
|
||||
|
||||
def _translate_document_body(self, document: Document, target_language: str):
|
||||
"""
|
||||
Translate all elements in the document body
|
||||
|
||||
Args:
|
||||
document: Document to translate
|
||||
target_language: Target language code
|
||||
"""
|
||||
for element in document.element.body:
|
||||
if isinstance(element, CT_P):
|
||||
# It's a paragraph
|
||||
paragraph = Paragraph(element, document)
|
||||
self._translate_paragraph(paragraph, target_language)
|
||||
elif isinstance(element, CT_Tbl):
|
||||
# It's a table
|
||||
table = Table(element, document)
|
||||
self._translate_table(table, target_language)
|
||||
|
||||
def _translate_paragraph(self, paragraph: Paragraph, target_language: str):
|
||||
"""
|
||||
Translate a paragraph while preserving all formatting
|
||||
|
||||
Args:
|
||||
paragraph: Paragraph to translate
|
||||
target_language: Target language code
|
||||
"""
|
||||
if not paragraph.text.strip():
|
||||
return
|
||||
|
||||
# For paragraphs with complex formatting (multiple runs), translate run by run
|
||||
if len(paragraph.runs) > 0:
|
||||
for run in paragraph.runs:
|
||||
if run.text.strip():
|
||||
translated_text = self.translation_service.translate_text(
|
||||
run.text, target_language
|
||||
)
|
||||
run.text = translated_text
|
||||
else:
|
||||
# Simple paragraph with no runs
|
||||
if paragraph.text.strip():
|
||||
translated_text = self.translation_service.translate_text(
|
||||
paragraph.text, target_language
|
||||
)
|
||||
paragraph.text = translated_text
|
||||
|
||||
def _translate_table(self, table: Table, target_language: str):
|
||||
"""
|
||||
Translate all cells in a table while preserving structure
|
||||
|
||||
Args:
|
||||
table: Table to translate
|
||||
target_language: Target language code
|
||||
"""
|
||||
for row in table.rows:
|
||||
for cell in row.cells:
|
||||
self._translate_cell(cell, target_language)
|
||||
|
||||
def _translate_cell(self, cell: _Cell, target_language: str):
|
||||
"""
|
||||
Translate content within a table cell
|
||||
|
||||
Args:
|
||||
cell: Cell to translate
|
||||
target_language: Target language code
|
||||
"""
|
||||
for paragraph in cell.paragraphs:
|
||||
self._translate_paragraph(paragraph, target_language)
|
||||
|
||||
# Handle nested tables
|
||||
for table in cell.tables:
|
||||
self._translate_table(table, target_language)
|
||||
|
||||
def _translate_section(self, section: Section, target_language: str):
|
||||
"""
|
||||
Translate headers and footers in a section
|
||||
|
||||
Args:
|
||||
section: Section to translate
|
||||
target_language: Target language code
|
||||
"""
|
||||
# Translate header
|
||||
if section.header:
|
||||
for paragraph in section.header.paragraphs:
|
||||
self._translate_paragraph(paragraph, target_language)
|
||||
for table in section.header.tables:
|
||||
self._translate_table(table, target_language)
|
||||
|
||||
# Translate footer
|
||||
if section.footer:
|
||||
for paragraph in section.footer.paragraphs:
|
||||
self._translate_paragraph(paragraph, target_language)
|
||||
for table in section.footer.tables:
|
||||
self._translate_table(table, target_language)
|
||||
|
||||
# Translate first page header (if different)
|
||||
if section.first_page_header:
|
||||
for paragraph in section.first_page_header.paragraphs:
|
||||
self._translate_paragraph(paragraph, target_language)
|
||||
for table in section.first_page_header.tables:
|
||||
self._translate_table(table, target_language)
|
||||
|
||||
# Translate first page footer (if different)
|
||||
if section.first_page_footer:
|
||||
for paragraph in section.first_page_footer.paragraphs:
|
||||
self._translate_paragraph(paragraph, target_language)
|
||||
for table in section.first_page_footer.tables:
|
||||
self._translate_table(table, target_language)
|
||||
|
||||
# Translate even page header (if different)
|
||||
if section.even_page_header:
|
||||
for paragraph in section.even_page_header.paragraphs:
|
||||
self._translate_paragraph(paragraph, target_language)
|
||||
for table in section.even_page_header.tables:
|
||||
self._translate_table(table, target_language)
|
||||
|
||||
# Translate even page footer (if different)
|
||||
if section.even_page_footer:
|
||||
for paragraph in section.even_page_footer.paragraphs:
|
||||
self._translate_paragraph(paragraph, target_language)
|
||||
for table in section.even_page_footer.tables:
|
||||
self._translate_table(table, target_language)
|
||||
|
||||
|
||||
# Global translator instance
|
||||
word_translator = WordTranslator()
|
||||
20
utils/__init__.py
Normal file
@ -0,0 +1,20 @@
|
||||
"""Utils package initialization"""
|
||||
from .file_handler import FileHandler, file_handler
|
||||
from .exceptions import (
|
||||
TranslationError,
|
||||
UnsupportedFileTypeError,
|
||||
FileSizeLimitExceededError,
|
||||
LanguageNotSupportedError,
|
||||
DocumentProcessingError,
|
||||
handle_translation_error
|
||||
)
|
||||
|
||||
__all__ = [
|
||||
'FileHandler', 'file_handler',
|
||||
'TranslationError',
|
||||
'UnsupportedFileTypeError',
|
||||
'FileSizeLimitExceededError',
|
||||
'LanguageNotSupportedError',
|
||||
'DocumentProcessingError',
|
||||
'handle_translation_error'
|
||||
]
|
||||
51
utils/exceptions.py
Normal file
@ -0,0 +1,51 @@
|
||||
"""
|
||||
Custom exceptions for the Document Translation API
|
||||
"""
|
||||
from fastapi import HTTPException
|
||||
|
||||
|
||||
class TranslationError(Exception):
|
||||
"""Base exception for translation errors"""
|
||||
pass
|
||||
|
||||
|
||||
class UnsupportedFileTypeError(TranslationError):
|
||||
"""Raised when an unsupported file type is provided"""
|
||||
pass
|
||||
|
||||
|
||||
class FileSizeLimitExceededError(TranslationError):
|
||||
"""Raised when a file exceeds the size limit"""
|
||||
pass
|
||||
|
||||
|
||||
class LanguageNotSupportedError(TranslationError):
|
||||
"""Raised when a language code is not supported"""
|
||||
pass
|
||||
|
||||
|
||||
class DocumentProcessingError(TranslationError):
|
||||
"""Raised when there's an error processing the document"""
|
||||
pass
|
||||
|
||||
|
||||
def handle_translation_error(error: Exception) -> HTTPException:
|
||||
"""
|
||||
Convert translation errors to HTTP exceptions
|
||||
|
||||
Args:
|
||||
error: Exception that occurred
|
||||
|
||||
Returns:
|
||||
HTTPException with appropriate status code and message
|
||||
"""
|
||||
if isinstance(error, UnsupportedFileTypeError):
|
||||
return HTTPException(status_code=400, detail=str(error))
|
||||
elif isinstance(error, FileSizeLimitExceededError):
|
||||
return HTTPException(status_code=413, detail=str(error))
|
||||
elif isinstance(error, LanguageNotSupportedError):
|
||||
return HTTPException(status_code=400, detail=str(error))
|
||||
elif isinstance(error, DocumentProcessingError):
|
||||
return HTTPException(status_code=500, detail=str(error))
|
||||
else:
|
||||
return HTTPException(status_code=500, detail="An unexpected error occurred during translation")
|
||||
142
utils/file_handler.py
Normal file
@ -0,0 +1,142 @@
|
||||
"""
|
||||
Utility functions for file handling and validation
|
||||
"""
|
||||
import os
|
||||
import uuid
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
from fastapi import UploadFile, HTTPException
|
||||
from config import config
|
||||
|
||||
|
||||
class FileHandler:
|
||||
"""Handles file operations for the translation API"""
|
||||
|
||||
@staticmethod
|
||||
def validate_file_extension(filename: str) -> str:
|
||||
"""
|
||||
Validate that the file extension is supported
|
||||
|
||||
Args:
|
||||
filename: Name of the file
|
||||
|
||||
Returns:
|
||||
File extension (lowercase, with dot)
|
||||
|
||||
Raises:
|
||||
HTTPException: If file extension is not supported
|
||||
"""
|
||||
file_extension = Path(filename).suffix.lower()
|
||||
|
||||
if file_extension not in config.SUPPORTED_EXTENSIONS:
|
||||
raise HTTPException(
|
||||
status_code=400,
|
||||
detail=f"Unsupported file type. Supported types: {', '.join(config.SUPPORTED_EXTENSIONS)}"
|
||||
)
|
||||
|
||||
return file_extension
|
||||
|
||||
@staticmethod
|
||||
def validate_file_size(file: UploadFile) -> None:
|
||||
"""
|
||||
Validate that the file size is within limits
|
||||
|
||||
Args:
|
||||
file: Uploaded file
|
||||
|
||||
Raises:
|
||||
HTTPException: If file is too large
|
||||
"""
|
||||
# Get file size
|
||||
file.file.seek(0, 2) # Move to end of file
|
||||
file_size = file.file.tell() # Get position (file size)
|
||||
file.file.seek(0) # Reset to beginning
|
||||
|
||||
if file_size > config.MAX_FILE_SIZE_BYTES:
|
||||
raise HTTPException(
|
||||
status_code=400,
|
||||
detail=f"File too large. Maximum size: {config.MAX_FILE_SIZE_MB}MB"
|
||||
)
|
||||
|
||||
@staticmethod
|
||||
async def save_upload_file(file: UploadFile, destination: Path) -> Path:
|
||||
"""
|
||||
Save an uploaded file to disk
|
||||
|
||||
Args:
|
||||
file: Uploaded file
|
||||
destination: Path to save the file
|
||||
|
||||
Returns:
|
||||
Path to the saved file
|
||||
"""
|
||||
destination.parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
with open(destination, "wb") as buffer:
|
||||
content = await file.read()
|
||||
buffer.write(content)
|
||||
|
||||
return destination
|
||||
|
||||
@staticmethod
|
||||
def generate_unique_filename(original_filename: str, prefix: str = "") -> str:
|
||||
"""
|
||||
Generate a unique filename to avoid collisions
|
||||
|
||||
Args:
|
||||
original_filename: Original filename
|
||||
prefix: Optional prefix for the filename
|
||||
|
||||
Returns:
|
||||
Unique filename
|
||||
"""
|
||||
file_path = Path(original_filename)
|
||||
unique_id = str(uuid.uuid4())[:8]
|
||||
|
||||
if prefix:
|
||||
return f"{prefix}_{unique_id}_{file_path.stem}{file_path.suffix}"
|
||||
else:
|
||||
return f"{unique_id}_{file_path.stem}{file_path.suffix}"
|
||||
|
||||
@staticmethod
|
||||
def cleanup_file(file_path: Path) -> None:
|
||||
"""
|
||||
Delete a file if it exists
|
||||
|
||||
Args:
|
||||
file_path: Path to the file to delete
|
||||
"""
|
||||
try:
|
||||
if file_path.exists():
|
||||
file_path.unlink()
|
||||
except Exception as e:
|
||||
print(f"Error deleting file {file_path}: {e}")
|
||||
|
||||
@staticmethod
|
||||
def get_file_info(file_path: Path) -> dict:
|
||||
"""
|
||||
Get information about a file
|
||||
|
||||
Args:
|
||||
file_path: Path to the file
|
||||
|
||||
Returns:
|
||||
Dictionary with file information
|
||||
"""
|
||||
if not file_path.exists():
|
||||
return {}
|
||||
|
||||
stat = file_path.stat()
|
||||
|
||||
return {
|
||||
"filename": file_path.name,
|
||||
"size_bytes": stat.st_size,
|
||||
"size_mb": round(stat.st_size / (1024 * 1024), 2),
|
||||
"extension": file_path.suffix,
|
||||
"created": stat.st_ctime,
|
||||
"modified": stat.st_mtime
|
||||
}
|
||||
|
||||
|
||||
# Global file handler instance
|
||||
file_handler = FileHandler()
|
||||