office_translator/_bmad-output/implementation-artifacts/2-5-provider-openai-llm-cloud.md
Sepehr Ramezani 26bd096a06 feat: production deployment - full update with providers, admin, glossaries, pricing, tests
Major changes across backend, frontend, infrastructure:
- Provider system with model selection (Google, DeepL, OpenAI, Ollama, Google Cloud)
- Admin panel: user management, pricing, settings
- Glossary system with CSV import/export
- Subscription and tier quota management
- Security hardening (rate limiting, API key auth, path traversal fixes)
- Docker compose for dev, prod, and IONOS deployment
- Alembic migrations for new tables
- Frontend: dashboard, pricing page, landing page, i18n (en/fr)
- Test suite and verification scripts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-25 15:01:47 +02:00

Story 2.5: Provider OpenAI (LLM Cloud)

Status: done

Story

As a system, I want to integrate OpenAI API as an LLM provider, so that Pro users can translate documents with GPT models.

Acceptance Criteria

  1. AC1: API Integration - Given OPENAI_API_KEY is configured in environment, when OpenAIProvider.translate_text() is called, then text is translated using GPT-4 or specified model
  2. AC2: Custom System Prompt - Custom system prompt can be injected via request metadata to guide translation context
  3. AC3: Rate Limiting - API rate limits return error "OPENAI_RATE_LIMITED" with retry suggestion (HTTP 429)
  4. AC4: Invalid Key Handling - Invalid API key returns error "OPENAI_INVALID_KEY" with HTTP 401
  5. AC5: Graceful Error Handling - All errors return structured JSON (never HTTP 500) with French messages
  6. AC6: Health Check - Provider is_available() returns True when API key is valid and service is reachable
  7. AC7: Registry Integration - Provider is registered in ProviderRegistry and appears in fallback chain
  8. AC8: Unit Tests - Tests verify all error scenarios, rate limiting handling, and mock OpenAI API responses

Tasks / Subtasks

  • Task 1: Create OpenAI Provider Implementation (AC: 1, 2)

    • 1.1 Create services/providers/openai_provider.py
    • 1.2 Implement OpenAITranslationProvider class extending TranslationProvider
    • 1.3 Implement translate_text() using OpenAI Chat Completions API
    • 1.4 Support custom system prompt injection via request metadata
    • 1.5 Configure default translation system prompt with temperature 0.3
  • Task 2: Implement Error Handling (AC: 3, 4, 5)

    • 2.1 Define error codes: OPENAI_RATE_LIMITED, OPENAI_INVALID_KEY, OPENAI_QUOTA_EXCEEDED, OPENAI_TIMEOUT, OPENAI_SERVICE_ERROR, OPENAI_CONTEXT_TOO_LONG
    • 2.2 Implement OpenAIProviderError exception class (follow Ollama pattern)
    • 2.3 Map OpenAI API errors to structured error responses with French messages
    • 2.4 Add retry logic with exponential backoff for rate limits and timeouts
    • 2.5 Add timeout configuration (default 60s for OpenAI - faster than Ollama)
    • 2.6 Handle specific OpenAI errors: rate_limit_exceeded, insufficient_quota, invalid_api_key
  • Task 3: Implement Health Check (AC: 6)

    • 3.1 Implement is_available() to validate API key and service reachability
    • 3.2 Add health_check() with caching (TTL 60s) matching existing provider pattern
    • 3.3 Make lightweight API call to verify credentials (e.g., list models or simple completion)
    • 3.4 Return ProviderHealthStatus with availability, latency, and model info
  • Task 4: Registry Integration (AC: 7)

    • 4.1 Add register_openai_provider() function
    • 4.2 Add get_openai_provider() singleton function
    • 4.3 Update services/providers/__init__.py to auto-register OpenAI when OPENAI_ENABLED=true
    • 4.4 Verify provider appears in fallback chain when configured
  • Task 5: Configuration Updates (AC: 1, 2)

    • 5.1 Verify OPENAI_API_KEY, OPENAI_MODEL, OPENAI_ENABLED in config.py (already present)
    • 5.2 Add OpenAI-specific configuration options to config.py:
      • OPENAI_TIMEOUT=60 (faster than Ollama's 120s)
      • OPENAI_MAX_RETRIES=3
      • OPENAI_RETRY_DELAY=1.0
      • OPENAI_BASE_URL (optional, for custom endpoints like Azure OpenAI)
    • 5.3 Update .env.example with OpenAI-specific config
  • Task 6: Create Unit Tests (AC: 8)

    • 6.1 Create tests/test_providers/test_openai_provider.py
    • 6.2 Test successful translation with mocked OpenAI API
    • 6.3 Test all error scenarios (rate limited, invalid key, quota exceeded, timeout)
    • 6.4 Test custom system prompt injection
    • 6.5 Test retry logic for rate limits
    • 6.6 Test health check functionality
    • 6.7 Test registry integration
  • Task 7: Update Documentation (AC: 1-8)

    • 7.1 Update services/providers/README.md with OpenAI section
    • 7.2 Document OpenAI setup requirements (API key from platform.openai.com)
    • 7.3 Document supported models and pricing considerations
    • 7.4 Document rate limiting behavior and retry strategy

Dev Notes

OpenAI API Specifics

OpenAI Chat Completions API:

| Endpoint | Method | Purpose |
|---|---|---|
| /v1/chat/completions | POST | Generate translation |
| /v1/models | GET | List available models (for health check) |

API Request Format:

OPENAI_API_URL = "https://api.openai.com/v1/chat/completions"

headers = {
    "Authorization": f"Bearer {OPENAI_API_KEY}",
    "Content-Type": "application/json"
}

payload = {
    "model": "gpt-4o-mini",  # or gpt-4, gpt-3.5-turbo
    "messages": [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": text_to_translate}
    ],
    "temperature": 0.3,  # Lower for consistent translation
    "max_tokens": 4096   # Adjust based on expected output
}

API Response Format:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4o-mini",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Bonjour, comment allez-vous?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 50,
    "completion_tokens": 10,
    "total_tokens": 60
  }
}
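
The response shape above can be unpacked defensively, since the code review notes mention malformed responses (empty choices, missing content). The following is a minimal sketch under assumptions: `MalformedResponseError` and `extract_translation` are hypothetical names for illustration, not the provider's actual API.

```python
from typing import Any, Dict


class MalformedResponseError(ValueError):
    """Hypothetical error for responses that lack a usable translation."""


def extract_translation(payload: Dict[str, Any]) -> str:
    """Return choices[0].message.content, or raise if the shape is unexpected."""
    choices = payload.get("choices")
    if not isinstance(choices, list) or not choices:
        raise MalformedResponseError("response contains no choices")
    first = choices[0]
    message = first.get("message") if isinstance(first, dict) else None
    content = message.get("content") if isinstance(message, dict) else None
    if not isinstance(content, str) or not content.strip():
        raise MalformedResponseError("first choice has no text content")
    return content


# Trimmed copy of the sample response shown above
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Bonjour, comment allez-vous?"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 50, "completion_tokens": 10, "total_tokens": 60},
}
print(extract_translation(sample))
```

In a real provider the raised error would presumably be converted to OPENAI_SERVICE_ERROR so that the caller still receives structured JSON.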

OpenAI Error Codes:

| OpenAI Error | HTTP | Mapped Code | French Message |
|---|---|---|---|
| rate_limit_exceeded | 429 | OPENAI_RATE_LIMITED | "Limite de requêtes OpenAI atteinte. Réessayez dans {retry_after}s." |
| insufficient_quota | 429 | OPENAI_QUOTA_EXCEEDED | "Quota OpenAI épuisé. Vérifiez votre facturation." |
| invalid_api_key | 401 | OPENAI_INVALID_KEY | "Clé API OpenAI invalide. Vérifiez votre configuration." |
| context_length_exceeded | 400 | OPENAI_CONTEXT_TOO_LONG | "Texte trop long (max {max_tokens} tokens)." |
| server_error | 500 | OPENAI_SERVICE_ERROR | "Service OpenAI temporairement indisponible." |
| Timeout | - | OPENAI_TIMEOUT | "Délai d'attente OpenAI dépassé." |

Model Comparison:

| Model | Cost | Speed | Quality | Best For |
|---|---|---|---|---|
| gpt-4o-mini | $0.15/M tokens | Fast | Good | Default choice, cost-effective |
| gpt-4o | $2.50/M tokens | Medium | Excellent | High-quality requirements |
| gpt-4 | $30/M tokens | Slower | Excellent | Critical translations |
| gpt-3.5-turbo | $0.50/M tokens | Fastest | Good | Speed priority |

Default: gpt-4o-mini (best value for translation)

Default System Prompt for Translation

from typing import Optional

DEFAULT_TRANSLATION_PROMPT = """You are a professional translator. Translate the following text from {source_lang} to {target_lang}.

Rules:
- Translate ONLY the text, do not add explanations or notes
- Preserve the original formatting, line breaks, and structure
- Maintain the original tone and style
- For technical terms, use the standard translation in the target language
- If the text contains proper nouns or brand names, keep them unchanged unless there's a well-known translation"""

def _build_system_prompt(
    source_lang: str, 
    target_lang: str, 
    custom_prompt: Optional[str] = None
) -> str:
    if custom_prompt:
        return custom_prompt
    return DEFAULT_TRANSLATION_PROMPT.format(
        source_lang=source_lang, 
        target_lang=target_lang
    )

Architecture Compliance

Per _bmad-output/planning-artifacts/architecture.md:

Error Format:

{
  "error": "OPENAI_RATE_LIMITED",
  "message": "Limite de requêtes OpenAI atteinte. Réessayez dans 20s.",
  "details": {
    "provider": "openai",
    "retry_after_seconds": 20,
    "model": "gpt-4o-mini"
  }
}

Never return HTTP 500 - All errors must be 4xx or 502 (upstream error).

Naming Conventions:

  • File: openai_provider.py (snake_case)
  • Class: OpenAITranslationProvider (PascalCase)
  • Error codes: OPENAI_* (UPPER_SNAKE_CASE)
  • JSON fields: snake_case

Previous Story Intelligence (Story 2.4 - Ollama)

What Worked Well:

  • httpx library for HTTP requests (supports async and sync)
  • Error codes with to_dict() method for consistent formatting
  • Retry logic with exponential backoff for transient errors
  • Health check with 60s TTL caching
  • Thread-safe singleton pattern for provider instance
  • Structlog-compatible logging with keyword args
  • Language name mapping for better LLM understanding
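
The 60s TTL health-check caching listed above could look roughly like this. It is a sketch under assumptions: `CachedHealthCheck`, the `probe` callable and the injectable `clock` are hypothetical names, and locking is left out for brevity even though the story calls for thread safety.

```python
import time
from typing import Callable, Dict, Optional


class CachedHealthCheck:
    """Caches the result of a probe for ttl_seconds (hypothetical helper)."""

    def __init__(
        self,
        probe: Callable[[], Dict[str, object]],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._probe = probe
        self._ttl = ttl_seconds
        self._clock = clock
        self._cached: Optional[Dict[str, object]] = None
        self._expires_at = 0.0

    def check(self) -> Dict[str, object]:
        now = self._clock()
        # Re-run the probe only when nothing is cached or the TTL has elapsed
        if self._cached is None or now >= self._expires_at:
            self._cached = self._probe()
            self._expires_at = now + self._ttl
        return self._cached


calls = []


def fake_probe() -> Dict[str, object]:
    calls.append(1)
    return {"available": True, "model": "gpt-4o-mini"}


fake_now = [0.0]
health = CachedHealthCheck(fake_probe, ttl_seconds=60.0, clock=lambda: fake_now[0])
health.check()
health.check()        # served from cache, probe not called again
fake_now[0] = 61.0
health.check()        # TTL elapsed, probe runs a second time
print(len(calls))     # 2
```

Injecting the clock keeps the TTL logic testable without sleeping; a production version would likely call the /v1/models endpoint inside the probe.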

Patterns to Reuse:

import time
from typing import Any, Dict, Optional

# Error codes pattern
OPENAI_RATE_LIMITED = "OPENAI_RATE_LIMITED"
OPENAI_INVALID_KEY = "OPENAI_INVALID_KEY"
OPENAI_QUOTA_EXCEEDED = "OPENAI_QUOTA_EXCEEDED"
OPENAI_TIMEOUT = "OPENAI_TIMEOUT"
OPENAI_SERVICE_ERROR = "OPENAI_SERVICE_ERROR"
OPENAI_CONTEXT_TOO_LONG = "OPENAI_CONTEXT_TOO_LONG"

_RETRYABLE_ERRORS = {OPENAI_RATE_LIMITED, OPENAI_TIMEOUT, OPENAI_SERVICE_ERROR}

# Exception class pattern
class OpenAIProviderError(Exception):
    def __init__(self, code: str, message: str, details: Optional[Dict[str, Any]] = None):
        self.code = code
        self.message = message
        self.details = details or {}
        super().__init__(message)

    def to_dict(self) -> Dict[str, Any]:
        result = {"error": self.code, "message": self.message}
        if self.details:
            result["details"] = self.details
        return result

# Retry logic pattern
def _translate_with_retry(self, text: str, system_prompt: str) -> str:
    last_error = None
    for attempt in range(self.max_retries + 1):
        try:
            return self._make_api_request(text, system_prompt)
        except OpenAIProviderError as e:
            last_error = e
            if e.code not in _RETRYABLE_ERRORS or attempt == self.max_retries:
                raise
            delay = self.retry_delay * (2 ** attempt)
            time.sleep(delay)
    raise last_error
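
To make the backoff arithmetic concrete, the sketch below lists the delays the loop above would sleep for. `backoff_delays` is a hypothetical helper; the optional `retry_after` floor is an assumption and is not part of the pattern shown above.

```python
from typing import List, Optional


def backoff_delays(
    retry_delay: float, max_retries: int, retry_after: Optional[float] = None
) -> List[float]:
    """Delay before each retry: retry_delay * 2**attempt, never below retry_after."""
    delays = []
    for attempt in range(max_retries):
        delay = retry_delay * (2 ** attempt)
        if retry_after is not None:
            delay = max(delay, retry_after)
        delays.append(delay)
    return delays


print(backoff_delays(1.0, 3))        # [1.0, 2.0, 4.0]
print(backoff_delays(1.0, 3, 3.0))   # [3.0, 3.0, 4.0]
```

With the defaults OPENAI_RETRY_DELAY=1.0 and OPENAI_MAX_RETRIES=3, a request that keeps failing makes four attempts and waits 7 seconds in total between them, on top of the per-request timeout.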

Key Differences from Ollama:

  • Requires API key authentication (Bearer token)
  • Uses OpenAI's specific error codes and headers
  • Rate limiting is stricter (pay-per-use)
  • Faster response times (60s timeout vs 120s)
  • No model "pulling" concept - models are always available
  • Quota management is critical (billing impact)

File Structure

Files to Create:

  • services/providers/openai_provider.py - Main OpenAI provider implementation
  • tests/test_providers/test_openai_provider.py - Unit tests

Files to Modify:

  • services/providers/__init__.py - Add OpenAI auto-registration
  • services/providers/config.py - Add OPENAI_TIMEOUT, OPENAI_MAX_RETRIES, OPENAI_RETRY_DELAY, OPENAI_BASE_URL
  • .env.example - Add OpenAI-specific configuration options
  • services/providers/README.md - Add OpenAI documentation

Error Codes to Implement

| Code | HTTP | Scenario | Message Template |
|---|---|---|---|
| OPENAI_RATE_LIMITED | 429 | Rate limit hit | "Limite de requêtes atteinte. Réessayez dans {retry_after}s." |
| OPENAI_INVALID_KEY | 401 | Invalid API key | "Clé API invalide. Vérifiez OPENAI_API_KEY." |
| OPENAI_QUOTA_EXCEEDED | 429 | Billing quota exceeded | "Quota épuisé. Vérifiez votre facturation OpenAI." |
| OPENAI_TIMEOUT | 502 | Request timeout | "Délai dépassé. Le service est lent." |
| OPENAI_SERVICE_ERROR | 502 | OpenAI server error | "Service temporairement indisponible." |
| OPENAI_CONTEXT_TOO_LONG | 413 | Context exceeds model limit | "Texte trop long (max {max_tokens} tokens)." |
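
One way to express the table above in code is a lookup from the upstream error to the mapped code and the status returned to our own clients. This is a sketch only: `map_openai_error` and the fallback rules for unknown errors are assumptions, chosen so that an upstream 5xx never surfaces as HTTP 500.

```python
from typing import Dict, Optional, Tuple

# OpenAI error code -> (mapped code, HTTP status returned to our client)
_BY_OPENAI_CODE: Dict[str, Tuple[str, int]] = {
    "rate_limit_exceeded": ("OPENAI_RATE_LIMITED", 429),
    "insufficient_quota": ("OPENAI_QUOTA_EXCEEDED", 429),
    "invalid_api_key": ("OPENAI_INVALID_KEY", 401),
    "context_length_exceeded": ("OPENAI_CONTEXT_TOO_LONG", 413),
}


def map_openai_error(upstream_status: int, openai_code: Optional[str]) -> Tuple[str, int]:
    """Pick the mapped code, preferring the explicit OpenAI error code."""
    if openai_code in _BY_OPENAI_CODE:
        return _BY_OPENAI_CODE[openai_code]
    # Fall back to the upstream HTTP status when the code is missing or unknown
    if upstream_status == 401:
        return ("OPENAI_INVALID_KEY", 401)
    if upstream_status == 429:
        return ("OPENAI_RATE_LIMITED", 429)
    # Anything else, including upstream 5xx, is reported as 502 and never 500
    return ("OPENAI_SERVICE_ERROR", 502)


print(map_openai_error(429, "insufficient_quota"))
print(map_openai_error(500, "server_error"))
```

Checking the error code before the status matters for 429, because both rate limiting and quota exhaustion share that status but only the former is worth retrying.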

Configuration

Environment Variables (.env.example):

# OpenAI Provider (Cloud LLM)
OPENAI_ENABLED=true
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxxxxxxxxxxxxx
OPENAI_MODEL=gpt-4o-mini
OPENAI_TIMEOUT=60
OPENAI_MAX_RETRIES=3
OPENAI_RETRY_DELAY=1.0
# OPENAI_BASE_URL=https://api.openai.com/v1  # Optional: for Azure OpenAI or proxies

Provider Config (services/providers/config.py): Add to existing OpenAI section:

OPENAI_TIMEOUT: int = int(os.getenv("OPENAI_TIMEOUT", "60"))
OPENAI_MAX_RETRIES: int = int(os.getenv("OPENAI_MAX_RETRIES", "3"))
OPENAI_RETRY_DELAY: float = float(os.getenv("OPENAI_RETRY_DELAY", "1.0"))
OPENAI_BASE_URL: str = os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1")
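
The same settings can also be read into a single validated object, which makes them easier to pass to the provider and to override in tests. A minimal sketch, assuming hypothetical names `OpenAISettings` and `load_openai_settings` that do not exist in config.py:

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class OpenAISettings:
    timeout: int
    max_retries: int
    retry_delay: float
    base_url: str


def load_openai_settings(env: Mapping[str, str] = os.environ) -> OpenAISettings:
    """Read the OPENAI_* options with the defaults documented above."""
    settings = OpenAISettings(
        timeout=int(env.get("OPENAI_TIMEOUT", "60")),
        max_retries=int(env.get("OPENAI_MAX_RETRIES", "3")),
        retry_delay=float(env.get("OPENAI_RETRY_DELAY", "1.0")),
        # Strip a trailing slash so request paths can be appended safely
        base_url=env.get("OPENAI_BASE_URL", "https://api.openai.com/v1").rstrip("/"),
    )
    if settings.timeout <= 0 or settings.max_retries < 0 or settings.retry_delay < 0:
        raise ValueError("timeout must be positive; retries and delay must not be negative")
    return settings


print(load_openai_settings({}))
print(load_openai_settings({"OPENAI_TIMEOUT": "30", "OPENAI_BASE_URL": "https://proxy.example/v1/"}))
```

Passing a plain dict instead of `os.environ` keeps tests independent of the process environment.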

Testing Strategy

Unit Tests (Mocked):

  • Mock httpx or requests responses
  • Test successful translation
  • Test all error scenarios (rate limit, invalid key, quota exceeded, timeout)
  • Test custom system prompt injection
  • Test health check logic
  • Test retry logic for rate limits
  • Test registry integration
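
A mocked test for custom system prompt injection might be shaped like the sketch below. It uses only `unittest.mock`; the `translate` function is a hypothetical stand-in for the provider's request method, not the code in openai_provider.py.

```python
from unittest.mock import Mock


def translate(client, base_url: str, model: str, text: str, system_prompt: str) -> str:
    """Minimal stand-in for the provider's request method (hypothetical)."""
    response = client.post(
        f"{base_url}/chat/completions",
        json={
            "model": model,
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": text},
            ],
            "temperature": 0.3,
        },
    )
    return response.json()["choices"][0]["message"]["content"]


def make_mock_client(content: str) -> Mock:
    """Build a client whose post() returns a canned chat completion."""
    response = Mock()
    response.status_code = 200
    response.json.return_value = {"choices": [{"message": {"content": content}}]}
    client = Mock()
    client.post.return_value = response
    return client


def test_custom_prompt_is_sent() -> None:
    client = make_mock_client("Bonjour")
    result = translate(
        client, "https://api.openai.com/v1", "gpt-4o-mini", "Hello", "Use formal French."
    )
    assert result == "Bonjour"
    # Inspect the payload that would have been sent to the API
    _, kwargs = client.post.call_args
    assert kwargs["json"]["messages"][0] == {"role": "system", "content": "Use formal French."}


test_custom_prompt_is_sent()
```

Asserting on `call_args` checks what was sent as well as what was returned, which is the part of AC2 that a response-only assertion would miss.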

Test Commands:

# Unit tests only
pytest tests/test_providers/test_openai_provider.py -v

# All provider tests
pytest tests/test_providers/ -v

# With coverage
pytest tests/test_providers/ --cov=services/providers -v

Logging Pattern

try:
    import structlog
    logger = structlog.get_logger(__name__)
    _HAS_STRUCTLOG = True
except ImportError:
    import logging
    logger = logging.getLogger(__name__)
    _HAS_STRUCTLOG = False

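def _log_info(event: str, **kwargs):
    """Log info with structlog or standard logging compatibility."""
    if _HAS_STRUCTLOG:
        logger.info(event, **kwargs)
    else:
        msg = f"{event} " + " ".join(f"{k}={v}" for k, v in kwargs.items())
        logger.info(msg)

def _log_error(event: str, **kwargs):
    """Log an error with the same structlog / standard logging compatibility."""
    if _HAS_STRUCTLOG:
        logger.error(event, **kwargs)
    else:
        msg = f"{event} " + " ".join(f"{k}={v}" for k, v in kwargs.items())
        logger.error(msg)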

# Good - metadata only (NO document content)
_log_info(
    "openai_translation_success",
    chars=len(text),
    source_lang=source_language,
    target_lang=target_language,
    model=self._model,
    latency_ms=round(latency * 1000, 2),
    tokens_used=response.usage.total_tokens,
)

_log_error(
    "openai_translation_failed",
    error_code=error.code,
    text_length=len(text),
    source_lang=source_language,
    target_lang=target_language,
    model=self._model,
)

Dependencies

Internal:

  • services/providers/base.py - TranslationProvider abstract class
  • services/providers/registry.py - ProviderRegistry
  • services/providers/config.py - Configuration
  • services/providers/schemas.py - TranslationRequest/Response models

External:

  • httpx - HTTP client (preferred for async/sync support)
  • structlog or standard logging - Structured logging

HTTP Client Pattern

Use httpx for OpenAI API calls:

import httpx

class OpenAITranslationProvider(TranslationProvider):
    def __init__(self, api_key: str, model: str = "gpt-4o-mini", timeout: int = 60, base_url: str = "https://api.openai.com/v1"):
        self._api_key = api_key
        self._model = model
        self._base_url = base_url.rstrip("/")
        self._timeout = timeout
        self._client = httpx.Client(
            timeout=timeout,
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json"
            }
        )
    
    def _make_api_request(self, text: str, system_prompt: str) -> str:
        response = self._client.post(
            # base_url already ends in /v1, so append only the resource path
            f"{self._base_url}/chat/completions",
            json={
                "model": self._model,
                "messages": [
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": text}
                ],
                "temperature": 0.3,
                "max_tokens": 4096
            }
        )
        # ... error handling based on status code
        return response.json()["choices"][0]["message"]["content"]

Security Considerations

API Key Management:

  • API key stored in environment variable (never in code)
  • Key validated at initialization
  • Never log the API key (only last 4 characters if needed for debugging)

Data Privacy:

  • Never log document content (NFR11)
  • Only log metadata: text length, languages, model, timestamps
  • OpenAI may retain data per their privacy policy (different from Ollama's local processing)
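
The rule about showing at most the last 4 characters of the key can be captured in a small helper. This is an illustrative sketch; `mask_api_key` is a hypothetical name, and hiding short keys entirely is an added assumption.

```python
def mask_api_key(api_key: str, visible: int = 4) -> str:
    """Return a log-safe form of the key that exposes only its last characters."""
    # Very short values are hidden completely so the suffix cannot reveal most of the key
    if len(api_key) <= visible * 2:
        return "***"
    return "***" + api_key[-visible:]


print(mask_api_key("sk-proj-abcdefgh1234"))  # ***1234
print(mask_api_key("short"))                 # ***
```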

Pro Feature Integration

Per PRD FR26: "Pro users can access LLM translation modes"

This provider will be used when:

  • User tier is "pro"
  • User selects "LLM" mode
  • User selects "OpenAI" as LLM provider

The tier check happens in the translation service/router, not in the provider itself.
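
As an illustration of where that check lives, the router-side gate could be as small as the function below. The name `can_use_openai` and the string values for tier, mode and provider are assumptions based on the three conditions listed above.

```python
def can_use_openai(user_tier: str, mode: str, llm_provider: str) -> bool:
    """Router-level gate: all three conditions must hold before the provider is called."""
    return (
        user_tier.lower() == "pro"
        and mode.lower() == "llm"
        and llm_provider.lower() == "openai"
    )


print(can_use_openai("pro", "LLM", "OpenAI"))   # True
print(can_use_openai("free", "LLM", "OpenAI"))  # False
```

Keeping this outside the provider means the provider can be unit tested without any user or subscription context.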

Rate Limiting Handling

OpenAI returns rate limit info in response headers:

  • x-ratelimit-limit-requests
  • x-ratelimit-remaining-requests
  • x-ratelimit-reset-requests

Extract retry_after from the error response when it is present; otherwise fall back to exponential backoff.
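
A sketch of deriving a wait time from x-ratelimit-reset-requests follows. It assumes the header carries a compact duration such as "1s" or "6m0s"; the helper names are hypothetical, and any value that does not parse falls back to the caller's backoff delay.

```python
import re
from typing import Mapping, Optional

_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|s|m|h)")


def parse_reset_duration(value: str) -> Optional[float]:
    """Convert a duration like '6m0s' to seconds, or None if it does not parse."""
    value = value.strip()
    parts = _PART.findall(value)
    # Reject strings with leftover characters the pattern did not consume
    if not parts or "".join(number + unit for number, unit in parts) != value:
        return None
    return sum(float(number) * _UNIT_SECONDS[unit] for number, unit in parts)


def retry_after_seconds(headers: Mapping[str, str], fallback: float) -> float:
    """Use the reset header when it parses, otherwise the fallback delay."""
    raw = headers.get("x-ratelimit-reset-requests")
    parsed = parse_reset_duration(raw) if raw else None
    return parsed if parsed is not None else fallback


print(retry_after_seconds({"x-ratelimit-reset-requests": "6m0s"}, fallback=2.0))  # 360.0
print(retry_after_seconds({}, fallback=2.0))                                      # 2.0
```

The lookup here is case-sensitive because it takes a plain mapping; httpx response headers are case-insensitive, so passing `response.headers` directly would also work.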

References

  • [Source: _bmad-output/planning-artifacts/architecture.md#Error Handling]
  • [Source: _bmad-output/planning-artifacts/architecture.md#API Response Formats]
  • [Source: _bmad-output/planning-artifacts/epics.md#Story 2.5]
  • [Source: _bmad-output/planning-artifacts/prd.md#FR7 LLM providers (Ollama, OpenAI)]
  • [Source: _bmad-output/planning-artifacts/prd.md#NFR12 Zero HTTP 500 errors]
  • [Source: _bmad-output/implementation-artifacts/2-4-provider-ollama-llm-local.md]
  • [Source: services/providers/ollama_provider.py - Implementation pattern]
  • [Source: https://platform.openai.com/docs/api-reference/chat - OpenAI API docs]
  • [Source: https://platform.openai.com/docs/guides/error-codes - OpenAI Error Codes]

Dev Agent Record

Agent Model Used

Claude (GLM-5) via opencode

Debug Log References

  • Fixed test mocking issues for registry integration tests
  • Resolved ProvidersConfig import path in tests

Completion Notes List

  • Implemented OpenAITranslationProvider class with full OpenAI Chat Completions API integration
  • All 6 error codes implemented with French messages: OPENAI_RATE_LIMITED, OPENAI_INVALID_KEY, OPENAI_QUOTA_EXCEEDED, OPENAI_TIMEOUT, OPENAI_SERVICE_ERROR, OPENAI_CONTEXT_TOO_LONG
  • Retry logic with exponential backoff for transient errors (rate limits, timeouts, service errors)
  • Health check with 60s TTL caching and model availability verification
  • Registry integration with auto-registration when OPENAI_ENABLED=true
  • Custom system prompt injection via request.metadata["custom_prompt"]
  • Language name mapping for better LLM understanding (same as Ollama)
  • 44 unit tests created and all passing
  • Configuration updated in config.py with OPENAI_TIMEOUT, OPENAI_MAX_RETRIES, OPENAI_RETRY_DELAY, OPENAI_BASE_URL, OPENAI_HEALTH_CHECK_TIMEOUT
  • Auto-registration added to services/providers/__init__.py
  • All acceptance criteria (AC1-AC8) satisfied

Code Review Fixes (2026-02-21)

  • [HIGH] Added model info to health_check() return (model, model_available fields per Task 3.4)
  • [MEDIUM] Added configurable health_check_timeout parameter (default 5s, via OPENAI_HEALTH_CHECK_TIMEOUT)
  • [MEDIUM] Added reset_openai_provider() function to reset singleton when config changes
  • [MEDIUM] Added API key validation (empty key raises ValueError)
  • [MEDIUM] Added 11 new tests covering: empty API key, text too long preemptive check, malformed API responses (empty choices, missing content), health check model info, reset function
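
The singleton reset and empty-key validation described in these fixes can be sketched as follows. `FakeProvider`, `get_provider` and `reset_provider` are hypothetical stand-ins for OpenAITranslationProvider, get_openai_provider() and reset_openai_provider(); the real signatures may differ.

```python
import threading
from typing import Optional


class FakeProvider:
    """Stand-in for OpenAITranslationProvider (hypothetical, for illustration)."""

    def __init__(self, api_key: str):
        if not api_key or not api_key.strip():
            raise ValueError("OPENAI_API_KEY must not be empty")
        self.api_key = api_key


_lock = threading.Lock()
_instance: Optional[FakeProvider] = None


def get_provider(api_key: str) -> FakeProvider:
    """Return the shared instance, creating it on first use."""
    global _instance
    if _instance is None:
        with _lock:
            # Re-check inside the lock so two threads cannot both construct it
            if _instance is None:
                _instance = FakeProvider(api_key)
    return _instance


def reset_provider() -> None:
    """Drop the shared instance so the next call picks up new configuration."""
    global _instance
    with _lock:
        _instance = None


first = get_provider("sk-a")
same = get_provider("sk-b")    # existing instance is reused, the new key is ignored
reset_provider()
fresh = get_provider("sk-b")   # rebuilt after the reset
print(first is same, fresh.api_key)  # True sk-b
```

The second call shows why a reset function is needed: without it, a changed API key would never reach the cached instance.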

File List

Files Created:

  • services/providers/openai_provider.py - Main OpenAI provider implementation (660 lines)
  • tests/test_providers/test_openai_provider.py - 44 unit tests covering all functionality

Files Modified:

  • services/providers/__init__.py - Added OpenAI auto-registration
  • services/providers/config.py - Added OPENAI_TIMEOUT, OPENAI_MAX_RETRIES, OPENAI_RETRY_DELAY, OPENAI_BASE_URL, OPENAI_HEALTH_CHECK_TIMEOUT
  • services/providers/README.md - OpenAI section (Task 7)
  • .env.example - Added OPENAI_HEALTH_CHECK_TIMEOUT and OpenAI config options

Change Log

  • 2026-02-21: [AI Code Review 2-5/2-6] Fixes: defensive JSON for 429/400, tokens_used in success log, ProviderSettings.openai base_url in config, File List README
  • 2026-02-21: Code review fixes applied - Added model info to health_check, configurable health check timeout, reset function for singleton, API key validation, 11 new tests
  • 2026-02-21: Story 2.5 implementation complete - OpenAI provider with cloud LLM translation, custom prompts, comprehensive error handling with French messages, retry logic, health checks, and 44 passing tests