office_translator/_bmad-output/implementation-artifacts/2-10-endpoint-post-api-v1-translate-core.md
Sepehr Ramezani 26bd096a06 feat: production deployment - full update with providers, admin, glossaries, pricing, tests
Major changes across backend, frontend, infrastructure:
- Provider system with model selection (Google, DeepL, OpenAI, Ollama, Google Cloud)
- Admin panel: user management, pricing, settings
- Glossary system with CSV import/export
- Subscription and tier quota management
- Security hardening (rate limiting, API key auth, path traversal fixes)
- Docker compose for dev, prod, and IONOS deployment
- Alembic migrations for new tables
- Frontend: dashboard, pricing page, landing page, i18n (en/fr)
- Test suite and verification scripts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-25 15:01:47 +02:00


Story 2.10: Endpoint POST /api/v1/translate (Core)

Status: done

Story

As a user, I want to submit a document for translation via the API, so that I can get my document translated.

Acceptance Criteria

  1. AC1: Authentication - Endpoint requires valid JWT token (web user) or X-API-Key header (automation user)
  2. AC2: File Upload - POST to /api/v1/translate accepts multipart/form-data with file, source_lang, target_lang
  3. AC3: File Validation - System validates format (xlsx/docx/pptx only), max size 50MB, magic bytes check
  4. AC4: Success Response - Valid requests return HTTP 202 with {data: {id, status: "processing"}, meta: {rate_limit_remaining}}
  5. AC5: Invalid Format - Unsupported formats return 400 with error "INVALID_FORMAT" and accepted formats list
  6. AC6: Quota Exceeded - Requests from users who have exceeded their tier limit return 429 with error "QUOTA_EXCEEDED" and a Retry-After header
  7. AC7: File Too Large - Files larger than 50MB return 413 with error "FILE_TOO_LARGE"
  8. AC8: Async Processing - Translation is processed asynchronously (endpoint returns immediately after validation)
  9. AC9: URL Ingestion - Pro users can provide file_url parameter instead of file upload (FR62-FR64)
  10. AC10: Optional Parameters - Support mode (classic/llm), provider, webhook_url, glossary_id, custom_prompt

Tasks / Subtasks

  • Task 1: Create Request/Response Schemas (AC: 2, 4, 5, 6, 7)

    • 1.1 Create TranslateRequest schema with file upload or file_url
    • 1.2 Create TranslateResponse schema with {data: {id, status}, meta: {rate_limit_remaining}}
    • 1.3 Create error response schemas for each error code
  • Task 2: Implement File Validation (AC: 3, 5, 7)

    • 2.1 Check file extension (only .xlsx, .docx, .pptx)
    • 2.2 Check magic bytes (PK header for Office files)
    • 2.3 Check file size (max 50MB)
    • 2.4 Return structured errors for each validation failure
  • Task 3: Implement Authentication Middleware (AC: 1)

    • 3.1 Support JWT Bearer token from Authorization header
    • 3.2 Support X-API-Key header for automation users
    • 3.3 Extract user context and tier information
  • Task 4: Implement Rate Limiting Check (AC: 6)

    • 4.1 Check user tier (free: 5/day, pro: unlimited)
    • 4.2 Check daily_translation_count against limit
    • 4.3 Return 429 with Retry-After header if exceeded
    • 4.4 Include rate_limit_remaining in meta response
  • Task 5: Implement Translation Job Creation (AC: 4, 8)

    • 5.1 Generate unique translation ID (UUID)
    • 5.2 Store file in temporary location with TTL metadata
    • 5.3 Create translation job record in database/Redis
    • 5.4 Queue job for async processing (or process inline for MVP)
    • 5.5 Return 202 with job ID and status
  • Task 6: Implement URL Ingestion (AC: 9)

    • 6.1 Accept file_url parameter as alternative to file upload
    • 6.2 Download file from URL with timeout (10s)
    • 6.3 Validate downloaded file format and size
    • 6.4 Return error "URL_DOWNLOAD_FAILED" or "URL_UNREACHABLE" on failure
  • Task 7: Implement Optional Parameters (AC: 10)

    • 7.1 Accept mode parameter (classic/llm, default: classic)
    • 7.2 Accept provider parameter (optional override)
    • 7.3 Accept webhook_url parameter (optional)
    • 7.4 Accept glossary_id parameter (Pro only)
    • 7.5 Accept custom_prompt parameter (Pro only)
  • Task 8: Create Router Endpoint (AC: All)

    • 8.1 Create POST /api/v1/translate in routes/translate_routes.py
    • 8.2 Wire all validation, auth, and processing components
    • 8.3 Add OpenAPI documentation with all parameters
    • 8.4 Add unit tests for all scenarios
  • Task 9: Integration Tests (AC: All)

    • 9.1 Test successful translation submission
    • 9.2 Test authentication (JWT and API Key)
    • 9.3 Test file validation errors
    • 9.4 Test rate limiting
    • 9.5 Test URL ingestion

Dev Notes

Previous Story Intelligence (Stories 2.1-2.9)

Critical patterns from Processor stories to reuse:

  1. File validation pattern (from pptx_translator.py):
MAX_FILE_SIZE_MB = 50
OFFICE_MAGIC_BYTES = b"PK"  # All Office files are ZIP archives
ACCEPTED_EXTENSIONS = {".xlsx", ".docx", ".pptx"}

def _validate_file(file_path: Path) -> None:
    if file_path.suffix.lower() not in ACCEPTED_EXTENSIONS:
        raise ValidationError(code="INVALID_FORMAT", ...)
    
    with open(file_path, "rb") as f:
        header = f.read(4)
    if header[:2] != OFFICE_MAGIC_BYTES:
        raise ValidationError(code="INVALID_FORMAT", ...)
    
    file_size_mb = file_path.stat().st_size / (1024 * 1024)
    if file_size_mb > MAX_FILE_SIZE_MB:
        raise ValidationError(code="FILE_TOO_LARGE", ...)
  2. Provider integration (from all processors):
def translate_file(self, provider: TranslationProvider, request: TranslationRequest) -> TranslationResponse:
    # Provider handles batch translation with fallback
    response = provider.translate_batch(texts, target_lang, source_lang)
    if response.error:
        raise TranslationError(code=response.error_code, message=response.error)
  3. Error class pattern:
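class TranslateEndpointError(Exception):
    # Error codes returned in the "error" field of the response body
    INVALID_FORMAT = "INVALID_FORMAT"
    FILE_TOO_LARGE = "FILE_TOO_LARGE"
    QUOTA_EXCEEDED = "QUOTA_EXCEEDED"
    URL_DOWNLOAD_FAILED = "URL_DOWNLOAD_FAILED"
    URL_UNREACHABLE = "URL_UNREACHABLE"
    UNAUTHORIZED = "UNAUTHORIZED"

    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code        # one of the constants above
        self.message = message  # human-readable description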
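
The file validation pattern above works on a path on disk. For an upload that is still in memory, the same three checks could be sketched as follows. This is a minimal illustration, not the project's FileValidator: `validate_upload` and `UploadValidationError` are hypothetical names, and the final check for `[Content_Types].xml` (a part present in Office Open XML packages) is an extra step beyond what the story requires, added here because a bare "PK" header matches any ZIP archive.

```python
import io
import zipfile
from pathlib import PurePath

MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024
ACCEPTED_EXTENSIONS = {".xlsx", ".docx", ".pptx"}


class UploadValidationError(Exception):
    """Hypothetical error type carrying one of the story's error codes."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code
        self.message = message


def validate_upload(filename: str, content: bytes) -> None:
    """Raise UploadValidationError if the upload fails any check."""
    if PurePath(filename).suffix.lower() not in ACCEPTED_EXTENSIONS:
        raise UploadValidationError("INVALID_FORMAT", "Unsupported file extension")
    if len(content) > MAX_FILE_SIZE_BYTES:
        raise UploadValidationError("FILE_TOO_LARGE", "File exceeds 50 MB")
    if content[:2] != b"PK":
        raise UploadValidationError("INVALID_FORMAT", "Missing ZIP (PK) header")
    # Extra check (assumption, not in the story): the archive must be a
    # readable ZIP that contains the OOXML content-types part.
    try:
        with zipfile.ZipFile(io.BytesIO(content)) as archive:
            names = archive.namelist()
    except zipfile.BadZipFile:
        raise UploadValidationError("INVALID_FORMAT", "Corrupt ZIP archive") from None
    if "[Content_Types].xml" not in names:
        raise UploadValidationError("INVALID_FORMAT", "Not an Office Open XML package")
```

The size check runs before the archive is opened so that an oversized upload is rejected without parsing it.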

Existing Code Structure

Check existing files:

  • app/main.py - FastAPI application entry
  • app/modules/translation/ - Translation module (may need creation)
  • app/core/security.py - Auth utilities
  • app/middleware/rate_limit.py - Rate limiting logic
  • translators/ - Existing processors (Excel, Word, PowerPoint)

Architecture Compliance

Per _bmad-output/planning-artifacts/architecture.md:

Success Response Format:

{
  "data": {
    "id": "tr_abc123",
    "status": "processing",
    "file_name": "report.xlsx",
    "source_lang": "en",
    "target_lang": "fr"
  },
  "meta": {
    "rate_limit_remaining": 49,
    "estimated_time_seconds": 12
  }
}
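
The success envelope above could be modeled roughly as follows. The completion notes say the real schemas are Pydantic models; this sketch uses standard-library dataclasses only so that it stays dependency-free. The class names come from the completion notes, while `build_accepted_body` and the choice of `None` for "unlimited" are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class TranslateResponseData:
    id: str
    status: str
    file_name: str
    source_lang: str
    target_lang: str


@dataclass
class TranslateResponseMeta:
    # None stands for "unlimited" here (assumption for Pro users)
    rate_limit_remaining: Optional[int]
    estimated_time_seconds: Optional[int] = None


@dataclass
class TranslateResponse:
    data: TranslateResponseData
    meta: TranslateResponseMeta


def build_accepted_body(
    job_id: str,
    file_name: str,
    source_lang: str,
    target_lang: str,
    rate_limit_remaining: Optional[int],
) -> Dict[str, Any]:
    """Return the JSON-serializable body sent with HTTP 202."""
    response = TranslateResponse(
        data=TranslateResponseData(
            id=job_id,
            status="processing",
            file_name=file_name,
            source_lang=source_lang,
            target_lang=target_lang,
        ),
        meta=TranslateResponseMeta(rate_limit_remaining=rate_limit_remaining),
    )
    return asdict(response)
```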

Error Response Format:

{
  "error": "QUOTA_EXCEEDED",
  "message": "Limite quotidienne atteinte (5/5 fichiers)",
  "details": {
    "current_usage": 5,
    "limit": 5,
    "tier": "free",
    "reset_at": "2024-01-16T00:00:00Z"
  }
}
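
One way to keep the error format above consistent is to route every failure through a single helper that pairs an error code with its HTTP status. The sketch below is illustrative: `ERROR_STATUS` and `build_error` are hypothetical names, and only the statuses for INVALID_FORMAT, FILE_TOO_LARGE, QUOTA_EXCEEDED, and PRO_FEATURE_REQUIRED are stated in this story; the others are assumptions.

```python
from typing import Any, Dict, Optional, Tuple

ERROR_STATUS: Dict[str, int] = {
    "INVALID_FORMAT": 400,        # AC5
    "FILE_TOO_LARGE": 413,        # AC7
    "QUOTA_EXCEEDED": 429,        # AC6
    "PRO_FEATURE_REQUIRED": 403,  # completion notes, Task 7
    "UNAUTHORIZED": 401,          # assumption
    "MISSING_FILE": 400,          # assumption
    "URL_DOWNLOAD_FAILED": 400,   # assumption
    "URL_UNREACHABLE": 400,       # assumption
}


def build_error(
    code: str,
    message: str,
    details: Optional[Dict[str, Any]] = None,
) -> Tuple[int, Dict[str, Any]]:
    """Return (http_status, body) in the structured error format."""
    # Unknown codes fall back to 400 so the endpoint never answers 500 (NFR12).
    status = ERROR_STATUS.get(code, 400)
    body: Dict[str, Any] = {"error": code, "message": message}
    if details:
        body["details"] = details
    return status, body
```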

Naming Conventions:

  • File: router.py (snake_case)
  • Class: TranslateRequest, TranslateResponse (PascalCase)
  • Variables: user_id, file_path (snake_case)
  • JSON fields: snake_case

API Endpoint Specification

POST /api/v1/translate
Authorization: Bearer <jwt_token>
# OR
X-API-Key: sk_live_xxx

Content-Type: multipart/form-data

file: <binary>                    # Required (unless file_url provided)
file_url: https://example.com/doc.xlsx  # Alternative to file (Pro feature)
source_lang: en                   # Required
target_lang: fr                   # Required
mode: classic                     # Optional: "classic" | "llm" (default: classic)
provider: google                  # Optional: "google" | "deepl" | "ollama" | "openai"
webhook_url: https://...          # Optional
glossary_id: uuid                 # Optional (Pro only)
custom_prompt: string             # Optional (Pro only)
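
The tier-dependent rules in this specification (a file or a file_url must be present; file_url, glossary_id, and custom_prompt are Pro only) can be checked before any file handling starts. A minimal sketch, assuming a hypothetical helper `check_request_params` and assuming that MISSING_FILE maps to HTTP 400 (the story does not state its status):

```python
from typing import Optional, Tuple


def check_request_params(
    tier: str,
    has_file: bool,
    file_url: Optional[str] = None,
    glossary_id: Optional[str] = None,
    custom_prompt: Optional[str] = None,
) -> Optional[Tuple[int, str]]:
    """Return (http_status, error_code) for the first problem, or None if OK."""
    if not has_file and not file_url:
        return 400, "MISSING_FILE"  # status 400 is an assumption
    pro_only = {
        "file_url": file_url,
        "glossary_id": glossary_id,
        "custom_prompt": custom_prompt,
    }
    # Any Pro-only field supplied by a non-Pro user is rejected with 403.
    if tier != "pro" and any(pro_only.values()):
        return 403, "PRO_FEATURE_REQUIRED"
    return None
```

The sketch does not decide what happens when both file and file_url are sent, since the specification leaves that case open.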

Rate Limiting Logic

# From architecture.md
FREE_TIER_LIMIT = 5  # files per day
PRO_TIER_LIMIT = None  # unlimited

# Check in middleware or endpoint
if user.tier == "free" and user.daily_translation_count >= FREE_TIER_LIMIT:
    raise HTTPException(
        status_code=429,
        detail={
            "error": "QUOTA_EXCEEDED",
            "message": "Limite quotidienne atteinte.",
            "details": {
                "current_usage": user.daily_translation_count,
                "limit": FREE_TIER_LIMIT,
                "reset_at": next_midnight_utc
            }
        },
        headers={"Retry-After": str(seconds_until_midnight)}
    )
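
The snippet above uses `next_midnight_utc` and `seconds_until_midnight` without showing where they come from. One possible way to derive both, assuming the daily quota resets at 00:00 UTC (as the `reset_at` example in the error format suggests), is sketched below; `next_reset` is a hypothetical helper name.

```python
import math
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def next_reset(now: Optional[datetime] = None) -> Tuple[str, int]:
    """Return (reset_at in ISO 8601 with a Z suffix, seconds until that reset).

    `now` must be timezone-aware; it defaults to the current UTC time.
    """
    current = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    next_midnight_utc = (current + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    # Round up so a client that waits Retry-After seconds never retries early.
    seconds_until_midnight = max(
        1, math.ceil((next_midnight_utc - current).total_seconds())
    )
    return next_midnight_utc.strftime("%Y-%m-%dT%H:%M:%SZ"), seconds_until_midnight
```

For example, a request rejected at 23:00 UTC on 2024-01-15 would get `reset_at` "2024-01-16T00:00:00Z" and a Retry-After of 3600.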

URL Ingestion Details

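from pathlib import Path
from typing import Tuple

import httpx

async def download_from_url(url: str, timeout: int = 10) -> Tuple[Path, str]:
    """Download file from URL and return (temp_path, filename)."""
    try:
        async with httpx.AsyncClient(timeout=timeout) as client:
            response = await client.get(url, follow_redirects=True)
    except httpx.RequestError as exc:
        # Timeouts, DNS failures, and connection errors
        # (mapping these to URL_DOWNLOAD_FAILED is an assumption)
        raise TranslateEndpointError(
            code="URL_DOWNLOAD_FAILED",
            message=f"Download failed: {exc}",
        ) from exc

    if response.status_code != 200:
        raise TranslateEndpointError(
            code="URL_UNREACHABLE",
            message=f"URL inaccessible (HTTP {response.status_code})",
        )

    # Extract filename from URL or Content-Disposition
    # (extract_filename and save_to_temp are helpers defined elsewhere)
    filename = extract_filename(url, response.headers)

    # Save to temp file
    temp_path = save_to_temp(response.content, filename)

    return temp_path, filename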
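
The download function above calls `extract_filename` without defining it. A possible standard-library implementation is sketched below; the signature is taken from the call site, while the `default` parameter and the decision to keep only the last path component (so a header cannot inject directories such as `../`) are assumptions.

```python
from email.message import Message
from pathlib import PurePosixPath
from typing import Mapping
from urllib.parse import unquote, urlparse


def extract_filename(url: str, headers: Mapping[str, str], default: str = "download") -> str:
    """Pick a filename from Content-Disposition, falling back to the URL path."""
    disposition = headers.get("content-disposition")
    if disposition:
        # Let the stdlib header parser handle quoting and parameter parsing.
        msg = Message()
        msg["Content-Disposition"] = disposition
        name = msg.get_filename()
        if name:
            # Keep only the final component to drop any directory parts.
            cleaned = PurePosixPath(name.replace("\\", "/")).name
            if cleaned:
                return cleaned
    name = PurePosixPath(unquote(urlparse(url).path)).name
    return name or default
```

The returned name should still go through the extension and magic-bytes checks, since neither the URL nor the header is trustworthy.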

File Structure

Files to Create/Modify:

  • app/modules/translation/router.py - Main endpoint (create)
  • app/modules/translation/schemas.py - Request/Response schemas (create)
  • app/modules/translation/service.py - Business logic (create/update)
  • app/middleware/rate_limit.py - Rate limiting (update if needed)
  • tests/test_translation_endpoint.py - Integration tests (create)

Git Intelligence - Recent Patterns

From recent commits:

  • Translation cache implemented (5000 entry LRU cache)
  • OpenRouter provider with DeepSeek support added
  • Parallel processing optimizations in translation service
  • Redis sessions for production

Testing Strategy

# Unit tests
pytest tests/test_translate_endpoint.py -v

# With coverage
pytest tests/test_translate_endpoint.py --cov=app/modules/translation -v

# Integration tests
pytest tests/integration/ -v

Dependencies on Previous Stories

Story     Dependency
2.1-2.6   TranslationProvider abstraction and fallback chain
2.7       ExcelProcessor for .xlsx files
2.8       WordProcessor for .docx files
2.9       PowerPointProcessor for .pptx files
1.6       Rate limiting middleware
1.8       Usage tracking for billing

Anti-Patterns to Avoid

  1. Don't process synchronously - Return 202 immediately, process in background
  2. Don't skip validation - Always check magic bytes, not just extension
  3. Don't log file content - Only log metadata (NFR11, NFR16)
  4. Don't return HTTP 500 - All errors should be 4xx with structured response
  5. Don't forget tier checks - Pro features (glossary, custom_prompt, URL ingestion) require tier check

References

  • [Source: _bmad-output/planning-artifacts/epics.md#Story 2.10]
  • [Source: _bmad-output/planning-artifacts/architecture.md#API Response Formats]
  • [Source: _bmad-output/planning-artifacts/prd.md#FR50-FR54 File Management]
  • [Source: _bmad-output/planning-artifacts/prd.md#FR62-FR64 URL Ingestion]
  • [Source: _bmad-output/planning-artifacts/prd.md#NFR12 Zero HTTP 500]
  • [Source: _bmad-output/implementation-artifacts/2-9-processor-powerpoint-pptx.md - Previous story patterns]
  • [Source: translators/excel_translator.py - File validation pattern]
  • [Source: translators/word_translator.py - Error handling pattern]
  • [Source: services/providers/base.py - TranslationProvider interface]
  • [Source: https://fastapi.tiangolo.com/tutorial/request-files/ - File upload docs]

Dev Agent Record

Agent Model Used

glm-5

Debug Log References

None

Completion Notes List

  1. Task 1 Complete: Created TranslateEndpointError exception class with error codes (INVALID_FORMAT, FILE_TOO_LARGE, QUOTA_EXCEEDED, URL_DOWNLOAD_FAILED, URL_UNREACHABLE, UNAUTHORIZED, MISSING_FILE, PRO_FEATURE_REQUIRED). Created Pydantic response schemas (TranslateResponseData, TranslateResponseMeta, TranslateResponse, ErrorResponse).

  2. Task 2 Complete: File validation uses existing FileValidator from middleware/validation.py with magic bytes (PK header), extension check (.xlsx, .docx, .pptx), and 50MB size limit. Added specific FILE_TOO_LARGE detection.

  3. Task 3 Complete: Authentication supports both JWT Bearer token (via Authorization header) and X-API-Key header. Uses get_authenticated_user dependency that tries API key first, then JWT.

  4. Task 4 Complete: Rate limiting uses existing TierQuotaService from middleware/tier_quota.py. Free tier: 5/day, Pro: unlimited. Returns 429 with Retry-After header on quota exceeded.

  5. Task 5 Complete: Translation jobs are stored in-memory _translation_jobs dict with job ID, status, file info, timestamps. Jobs are processed asynchronously via asyncio.create_task(). Returns 202 with job ID and status "processing".

  6. Task 6 Complete: URL ingestion implemented via download_from_url() using httpx with 10s timeout. Validates downloaded file format and size. Returns appropriate error codes (URL_UNREACHABLE, URL_DOWNLOAD_FAILED, FILE_TOO_LARGE). Restricted to Pro users only.

  7. Task 7 Complete: All optional parameters supported: mode (classic/llm), provider, webhook_url, glossary_id (Pro only), custom_prompt (Pro only). Pro features return 403 with PRO_FEATURE_REQUIRED error for free tier users.

  8. Task 8 Complete: Created routes/translate_routes.py with router_v1 mounted at /api/v1. Includes POST /translate, GET /translations/{job_id}, and GET /translate/health endpoints. Full OpenAPI documentation with all parameters.

  9. Task 9 Complete: Created 27 comprehensive tests in tests/test_translate_endpoint.py covering all acceptance criteria: file upload, validation, authentication, quota, file size, async processing, URL ingestion, optional parameters.

  10. Code Review Fixes Applied: Fixed 5 HIGH and 6 MEDIUM issues:

    • HTTP 500 → 400 (NFR12 compliance)
    • glossary_id now properly passed to translation job
    • Added source_lang validation
    • Added webhook_url format validation
    • Added tests for provider parameter, source_lang validation, webhook validation, and API key auth
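
Completion note 3 says the authentication dependency tries the API key first and then the JWT. The header-parsing half of that order could look roughly like the sketch below. `extract_credentials` is a hypothetical helper; verifying the key or token is out of scope here and is left to the real get_authenticated_user dependency.

```python
from typing import Mapping, Optional, Tuple


def extract_credentials(headers: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Return ("api_key", key) or ("jwt", token), or None when neither is sent."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}

    api_key = normalized.get("x-api-key", "").strip()
    if api_key:
        return "api_key", api_key

    scheme, _, token = normalized.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        return "jwt", token.strip()

    return None
```

A `None` result would correspond to the UNAUTHORIZED error code.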

File List

Created files:

  • routes/translate_routes.py - POST /api/v1/translate endpoint, job status endpoint, error handling, URL ingestion, async processing
  • middleware/tier_quota.py - TierQuotaService for daily quota management (Free: 5/day, Pro: unlimited)
  • alembic/versions/002_add_tier_daily_count.py - DB migration for tier tracking
  • tests/test_translate_endpoint.py - 34+ unit tests covering all ACs

Modified files:

  • main.py - Import and include translate_v1_router
  • middleware/validation.py - FileValidator, LanguageValidator, ProviderValidator classes

Change Log

  • 2026-02-21: Code review fixes - HTTP 500→400, glossary_id propagation, source_lang validation, webhook_url validation, additional tests
  • 2026-02-21: Implemented Story 2.10 - POST /api/v1/translate endpoint with async processing, file validation, authentication, rate limiting, URL ingestion (Pro), and comprehensive tests