
Sepehr Ramezani 26bd096a06 feat: production deployment - full update with providers, admin, glossaries, pricing, tests

Major changes across backend, frontend, infrastructure:
- Provider system with model selection (Google, DeepL, OpenAI, Ollama, Google Cloud)
- Admin panel: user management, pricing, settings
- Glossary system with CSV import/export
- Subscription and tier quota management
- Security hardening (rate limiting, API key auth, path traversal fixes)
- Docker compose for dev, prod, and IONOS deployment
- Alembic migrations for new tables
- Frontend: dashboard, pricing page, landing page, i18n (en/fr)
- Test suite and verification scripts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-04-25 15:01:47 +02:00


Story 2.5: Provider OpenAI (LLM Cloud)

Status: done

Story

As a system, I want to integrate OpenAI API as an LLM provider, so that Pro users can translate documents with GPT models.

Acceptance Criteria

AC1: API Integration - Given OPENAI_API_KEY is configured in the environment, when OpenAITranslationProvider.translate_text() is called, then the text is translated using the configured GPT model (default gpt-4o-mini)
AC2: Custom System Prompt - Custom system prompt can be injected via request metadata to guide translation context
AC3: Rate Limiting - API rate limits return error "OPENAI_RATE_LIMITED" with a retry suggestion (HTTP 429)
AC4: Invalid Key Handling - Invalid API key returns error "OPENAI_INVALID_KEY" with HTTP 401
AC5: Graceful Error Handling - All errors return structured JSON (never HTTP 500) with French messages
AC6: Health Check - Provider is_available() returns True when API key is valid and service is reachable
AC7: Registry Integration - Provider is registered in ProviderRegistry and appears in fallback chain
AC8: Unit Tests - Tests verify all error scenarios, rate limiting handling, and mock OpenAI API responses

Tasks / Subtasks

Task 1: Create OpenAI Provider Implementation (AC: 1, 2)
- 1.1 Create services/providers/openai_provider.py
- 1.2 Implement OpenAITranslationProvider class extending TranslationProvider
- 1.3 Implement translate_text() using OpenAI Chat Completions API
- 1.4 Support custom system prompt injection via request metadata
- 1.5 Configure default translation system prompt with temperature 0.3
Task 2: Implement Error Handling (AC: 3, 4, 5)
- 2.1 Define error codes: OPENAI_RATE_LIMITED, OPENAI_INVALID_KEY, OPENAI_QUOTA_EXCEEDED, OPENAI_TIMEOUT, OPENAI_SERVICE_ERROR, OPENAI_CONTEXT_TOO_LONG
- 2.2 Implement OpenAIProviderError exception class (follow Ollama pattern)
- 2.3 Map OpenAI API errors to structured error responses with French messages
- 2.4 Add retry logic with exponential backoff for rate limits and timeouts
- 2.5 Add timeout configuration (default 60s for OpenAI, shorter than Ollama's 120s)
- 2.6 Handle specific OpenAI errors: rate_limit_exceeded, insufficient_quota, invalid_api_key
Task 3: Implement Health Check (AC: 6)
- 3.1 Implement is_available() to validate API key and service reachability
- 3.2 Add health_check() with caching (TTL 60s) matching existing provider pattern
- 3.3 Make lightweight API call to verify credentials (e.g., list models or simple completion)
- 3.4 Return ProviderHealthStatus with availability, latency, and model info
Task 4: Registry Integration (AC: 7)
- 4.1 Add register_openai_provider() function
- 4.2 Add get_openai_provider() singleton function
- 4.3 Update services/providers/__init__.py to auto-register OpenAI when OPENAI_ENABLED=true
- 4.4 Verify provider appears in fallback chain when configured
Task 5: Configuration Updates (AC: 1, 2)
- 5.1 Verify OPENAI_API_KEY, OPENAI_MODEL, OPENAI_ENABLED in config.py (already present)
- 5.2 Add OpenAI-specific configuration options to config.py:
  - OPENAI_TIMEOUT=60 (shorter than Ollama's 120s)
  - OPENAI_MAX_RETRIES=3
  - OPENAI_RETRY_DELAY=1.0
  - OPENAI_BASE_URL (optional, for custom endpoints like Azure OpenAI)
- 5.3 Update .env.example with OpenAI-specific config
Task 6: Create Unit Tests (AC: 8)
- 6.1 Create tests/test_providers/test_openai_provider.py
- 6.2 Test successful translation with mocked OpenAI API
- 6.3 Test all error scenarios (rate limited, invalid key, quota exceeded, timeout)
- 6.4 Test custom system prompt injection
- 6.5 Test retry logic for rate limits
- 6.6 Test health check functionality
- 6.7 Test registry integration
Task 7: Update Documentation (AC: 1-8)
- 7.1 Update services/providers/README.md with OpenAI section
- 7.2 Document OpenAI setup requirements (API key from platform.openai.com)
- 7.3 Document supported models and pricing considerations
- 7.4 Document rate limiting behavior and retry strategy

Dev Notes

OpenAI API Specifics

OpenAI Chat Completions API:

Endpoint	Method	Purpose
`/v1/chat/completions`	POST	Generate translation
`/v1/models`	GET	List available models (for health check)

API Request Format:

OPENAI_API_URL = "https://api.openai.com/v1/chat/completions"

headers = {
    "Authorization": f"Bearer {OPENAI_API_KEY}",
    "Content-Type": "application/json"
}

payload = {
    "model": "gpt-4o-mini",  # or gpt-4, gpt-3.5-turbo
    "messages": [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": text_to_translate}
    ],
    "temperature": 0.3,  # Lower for consistent translation
    "max_tokens": 4096   # Adjust based on expected output
}

API Response Format:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4o-mini",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Bonjour, comment allez-vous?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 50,
    "completion_tokens": 10,
    "total_tokens": 60
  }
}
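The response shape above can be read defensively, since the completion notes mention tests for empty choices and missing content. The following is a minimal sketch, not the shipped implementation; `MalformedResponseError` and `extract_translation` are hypothetical names introduced for illustration.

```python
from typing import Any, Dict


class MalformedResponseError(ValueError):
    """Hypothetical error for responses that lack a usable translation."""


def extract_translation(body: Dict[str, Any]) -> str:
    """Return choices[0].message.content, or raise if the shape is unexpected."""
    choices = body.get("choices")
    if not isinstance(choices, list) or not choices:
        raise MalformedResponseError("response has no choices")
    first = choices[0]
    message = first.get("message") if isinstance(first, dict) else None
    content = message.get("content") if isinstance(message, dict) else None
    if not isinstance(content, str) or not content.strip():
        raise MalformedResponseError("response has no message content")
    return content


# Trimmed copy of the sample response shown above.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Bonjour, comment allez-vous?"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 50, "completion_tokens": 10, "total_tokens": 60},
}
translated = extract_translation(sample)
```

In a provider, a failure of this kind would presumably be converted into one of the structured error codes rather than surfacing as a raw exception.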

OpenAI Error Codes:

OpenAI Error	Upstream HTTP	Mapped Code	French Message
`rate_limit_exceeded`	429	`OPENAI_RATE_LIMITED`	"Limite de requêtes OpenAI atteinte. Réessayez dans {retry_after}s."
`insufficient_quota`	429	`OPENAI_QUOTA_EXCEEDED`	"Quota OpenAI épuisé. Vérifiez votre facturation."
`invalid_api_key`	401	`OPENAI_INVALID_KEY`	"Clé API OpenAI invalide. Vérifiez votre configuration."
`context_length_exceeded`	400	`OPENAI_CONTEXT_TOO_LONG`	"Texte trop long (max {max_tokens} tokens)."
`server_error`	500	`OPENAI_SERVICE_ERROR`	"Service OpenAI temporairement indisponible."
Timeout	-	`OPENAI_TIMEOUT`	"Délai d'attente OpenAI dépassé."
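Because `rate_limit_exceeded` and `insufficient_quota` both arrive as HTTP 429, the mapping has to look at the error code in the response body and not only at the status. A minimal sketch of that mapping follows; `map_openai_error` is a hypothetical helper, and the fallback to `OPENAI_SERVICE_ERROR` for unknown errors is an assumption.

```python
from typing import Optional


def map_openai_error(status: int, error_code: Optional[str]) -> str:
    """Map an upstream HTTP status and OpenAI error code to an internal code."""
    # Check the quota case first: it shares HTTP 429 with plain rate limiting.
    if error_code == "insufficient_quota":
        return "OPENAI_QUOTA_EXCEEDED"
    if error_code == "rate_limit_exceeded" or status == 429:
        return "OPENAI_RATE_LIMITED"
    if error_code == "invalid_api_key" or status == 401:
        return "OPENAI_INVALID_KEY"
    if error_code == "context_length_exceeded":
        return "OPENAI_CONTEXT_TOO_LONG"
    # Assumption: anything unrecognised is treated as an upstream service error.
    return "OPENAI_SERVICE_ERROR"


quota_code = map_openai_error(429, "insufficient_quota")
rate_code = map_openai_error(429, None)
```

Timeouts never produce a response body, so they would be mapped to `OPENAI_TIMEOUT` where the HTTP client raises, not in this function.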

Recommended Models for Translation

Model	Cost (input)	Speed	Quality	Best For
`gpt-4o-mini`	$0.15/M tokens	Fast	Good	Default choice, cost-effective
`gpt-4o`	$2.50/M tokens	Medium	Excellent	High-quality requirements
`gpt-4`	$30/M tokens	Slower	Excellent	Critical translations
`gpt-3.5-turbo`	$0.50/M tokens	Fastest	Good	Speed priority

Default: gpt-4o-mini (best value for translation)

Default System Prompt for Translation

from typing import Optional

DEFAULT_TRANSLATION_PROMPT = """You are a professional translator. Translate the following text from {source_lang} to {target_lang}.

Rules:
- Translate ONLY the text, do not add explanations or notes
- Preserve the original formatting, line breaks, and structure
- Maintain the original tone and style
- For technical terms, use the standard translation in the target language
- If the text contains proper nouns or brand names, keep them unchanged unless there's a well-known translation"""

def _build_system_prompt(
    source_lang: str, 
    target_lang: str, 
    custom_prompt: Optional[str] = None
) -> str:
    if custom_prompt:
        return custom_prompt
    return DEFAULT_TRANSLATION_PROMPT.format(
        source_lang=source_lang, 
        target_lang=target_lang
    )
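The custom prompt reaches the provider through request metadata (the completion notes name the key `custom_prompt`). The sketch below shows one way the lookup could be made tolerant of missing or blank values; `resolve_system_prompt` is a hypothetical helper and `SHORT_DEFAULT_PROMPT` is a shortened stand-in for the default prompt above.

```python
from typing import Any, Dict, Optional

# Shortened stand-in for DEFAULT_TRANSLATION_PROMPT, for illustration only.
SHORT_DEFAULT_PROMPT = "Translate the following text from {source_lang} to {target_lang}."


def resolve_system_prompt(
    source_lang: str,
    target_lang: str,
    metadata: Optional[Dict[str, Any]] = None,
) -> str:
    """Prefer a non-blank metadata['custom_prompt'], else the default prompt."""
    custom = (metadata or {}).get("custom_prompt")
    if isinstance(custom, str) and custom.strip():
        # The custom prompt is used verbatim; it is not passed through format().
        return custom.strip()
    return SHORT_DEFAULT_PROMPT.format(source_lang=source_lang, target_lang=target_lang)


default_prompt = resolve_system_prompt("English", "French")
legal_prompt = resolve_system_prompt(
    "English", "French", {"custom_prompt": "Translate as a legal contract."}
)
```

Returning the custom prompt without calling `format()` means braces in user-supplied text cannot raise a `KeyError`.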

Architecture Compliance

Per _bmad-output/planning-artifacts/architecture.md:

Error Format:

{
  "error": "OPENAI_RATE_LIMITED",
  "message": "Limite de requêtes OpenAI atteinte. Réessayez dans 20s.",
  "details": {
    "provider": "openai",
    "retry_after_seconds": 20,
    "model": "gpt-4o-mini"
  }
}

Never return HTTP 500 - All errors must be 4xx or 502 (upstream error).

Naming Conventions:

File: openai_provider.py (snake_case)
Class: OpenAITranslationProvider (PascalCase)
Error codes: OPENAI_* (UPPER_SNAKE_CASE)
JSON fields: snake_case

Previous Story Intelligence (Story 2.4 - Ollama)

What Worked Well:

httpx library for HTTP requests (supports async and sync)
Error codes with to_dict() method for consistent formatting
Retry logic with exponential backoff for transient errors
Health check with 60s TTL caching
Thread-safe singleton pattern for provider instance
Structlog-compatible logging with keyword args
Language name mapping for better LLM understanding
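The 60s TTL caching from the list above can be sketched as follows. This is an illustration under assumptions, not the Ollama provider's code: `CachedHealthCheck`, the `probe` callable, and the injectable `clock` are hypothetical, and the sketch leaves out locking.

```python
import time
from typing import Callable, List, Optional, Tuple


class CachedHealthCheck:
    """Cache the result of an availability probe for ttl_seconds."""

    def __init__(
        self,
        probe: Callable[[], bool],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._probe = probe
        self._ttl = ttl_seconds
        self._clock = clock
        self._cached: Optional[Tuple[float, bool]] = None  # (checked_at, result)

    def is_available(self) -> bool:
        now = self._clock()
        if self._cached is not None and now - self._cached[0] < self._ttl:
            return self._cached[1]  # still fresh: skip the network call
        result = self._probe()
        self._cached = (now, result)
        return result


# Usage with a fake clock so the TTL can be exercised without waiting.
probe_calls: List[int] = []
fake_now = [0.0]


def fake_probe() -> bool:
    probe_calls.append(1)
    return True


check = CachedHealthCheck(fake_probe, clock=lambda: fake_now[0])
check.is_available()
check.is_available()   # served from cache
fake_now[0] = 61.0
check.is_available()   # TTL expired, probe runs again
```

Injecting the clock keeps tests deterministic; a real provider would also need a lock around the cache if the instance is shared across threads.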

Patterns to Reuse:

import time
from typing import Any, Dict, Optional

# Error codes pattern
OPENAI_RATE_LIMITED = "OPENAI_RATE_LIMITED"
OPENAI_INVALID_KEY = "OPENAI_INVALID_KEY"
OPENAI_QUOTA_EXCEEDED = "OPENAI_QUOTA_EXCEEDED"
OPENAI_TIMEOUT = "OPENAI_TIMEOUT"
OPENAI_SERVICE_ERROR = "OPENAI_SERVICE_ERROR"
OPENAI_CONTEXT_TOO_LONG = "OPENAI_CONTEXT_TOO_LONG"

_RETRYABLE_ERRORS = {OPENAI_RATE_LIMITED, OPENAI_TIMEOUT, OPENAI_SERVICE_ERROR}

# Exception class pattern
class OpenAIProviderError(Exception):
    def __init__(self, code: str, message: str, details: Optional[Dict[str, Any]] = None):
        self.code = code
        self.message = message
        self.details = details or {}
        super().__init__(message)

    def to_dict(self) -> Dict[str, Any]:
        result = {"error": self.code, "message": self.message}
        if self.details:
            result["details"] = self.details
        return result

# Retry logic pattern
def _translate_with_retry(self, text: str, system_prompt: str) -> str:
    last_error = None
    for attempt in range(self.max_retries + 1):
        try:
            return self._make_api_request(text, system_prompt)
        except OpenAIProviderError as e:
            last_error = e
            if e.code not in _RETRYABLE_ERRORS or attempt == self.max_retries:
                raise
            delay = self.retry_delay * (2 ** attempt)
            time.sleep(delay)
    raise last_error
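To make the backoff schedule of the retry loop concrete, the sketch below computes the sleeps it would perform. `backoff_delays` is a hypothetical helper that mirrors the `retry_delay * (2 ** attempt)` expression; it is not part of the provider.

```python
from typing import List


def backoff_delays(retry_delay: float, max_retries: int) -> List[float]:
    """Delays slept between attempts: one per retry, doubling each time."""
    # The loop makes max_retries + 1 attempts and sleeps only between them.
    return [retry_delay * (2 ** attempt) for attempt in range(max_retries)]


delays = backoff_delays(retry_delay=1.0, max_retries=3)  # [1.0, 2.0, 4.0]
total_sleep = sum(delays)  # 7.0 seconds of waiting across four attempts
```

With the defaults from this story (three retries, 1.0s base delay, 60s timeout), a request that times out on every attempt could therefore occupy a worker for roughly 4 × 60s plus 7s of sleep.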

Key Differences from Ollama:

Requires API key authentication (Bearer token)
Uses OpenAI's specific error codes and headers
Rate limits are stricter, and usage is billed per token
Faster response times (60s timeout vs 120s)
No model "pulling" concept - models are always available
Quota management is critical (billing impact)

File Structure

Files to Create:

services/providers/openai_provider.py - Main OpenAI provider implementation
tests/test_providers/test_openai_provider.py - Unit tests

Files to Modify:

services/providers/__init__.py - Add OpenAI auto-registration
services/providers/config.py - Add OPENAI_TIMEOUT, OPENAI_MAX_RETRIES, OPENAI_RETRY_DELAY, OPENAI_BASE_URL
.env.example - Add OpenAI-specific configuration options
services/providers/README.md - Add OpenAI documentation

Error Codes to Implement

Code	Returned HTTP	Scenario	Message Template
`OPENAI_RATE_LIMITED`	429	Rate limit hit	"Limite de requêtes atteinte. Réessayez dans {retry_after}s."
`OPENAI_INVALID_KEY`	401	Invalid API key	"Clé API invalide. Vérifiez OPENAI_API_KEY."
`OPENAI_QUOTA_EXCEEDED`	429	Billing quota exceeded	"Quota épuisé. Vérifiez votre facturation OpenAI."
`OPENAI_TIMEOUT`	502	Request timeout	"Délai dépassé. Le service est lent."
`OPENAI_SERVICE_ERROR`	502	OpenAI server error	"Service temporairement indisponible."
`OPENAI_CONTEXT_TOO_LONG`	413	Context exceeds model limit	"Texte trop long (max {max_tokens} tokens)."
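The table above can be held as data so that the HTTP status and the French message are always chosen together. This is a minimal sketch; `ERROR_TABLE` and `build_error_response` are hypothetical names, and placing the template parameters under `details` is an assumption.

```python
from typing import Any, Dict, Tuple

# code -> (HTTP status returned to the client, French message template)
ERROR_TABLE: Dict[str, Tuple[int, str]] = {
    "OPENAI_RATE_LIMITED": (429, "Limite de requêtes atteinte. Réessayez dans {retry_after}s."),
    "OPENAI_INVALID_KEY": (401, "Clé API invalide. Vérifiez OPENAI_API_KEY."),
    "OPENAI_QUOTA_EXCEEDED": (429, "Quota épuisé. Vérifiez votre facturation OpenAI."),
    "OPENAI_TIMEOUT": (502, "Délai dépassé. Le service est lent."),
    "OPENAI_SERVICE_ERROR": (502, "Service temporairement indisponible."),
    "OPENAI_CONTEXT_TOO_LONG": (413, "Texte trop long (max {max_tokens} tokens)."),
}


def build_error_response(code: str, **params: Any) -> Tuple[int, Dict[str, Any]]:
    """Return (http_status, json_body) for an internal error code."""
    if code not in ERROR_TABLE:
        # Assumption: unknown codes degrade to a generic upstream error, never 500.
        code = "OPENAI_SERVICE_ERROR"
    status, template = ERROR_TABLE[code]
    body = {
        "error": code,
        "message": template.format(**params),
        "details": {"provider": "openai", **params},
    }
    return status, body


status, body = build_error_response("OPENAI_RATE_LIMITED", retry_after=20)
```

Every status in the table is 4xx or 502, which keeps the "never return HTTP 500" rule enforceable in one place.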

Configuration

Environment Variables (.env.example):

# OpenAI Provider (Cloud LLM)
OPENAI_ENABLED=true
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxxxxxxxxxxxxx
OPENAI_MODEL=gpt-4o-mini
OPENAI_TIMEOUT=60
OPENAI_MAX_RETRIES=3
OPENAI_RETRY_DELAY=1.0
# OPENAI_BASE_URL=https://api.openai.com/v1  # Optional: for Azure OpenAI or proxies

Provider Config (services/providers/config.py): Add to existing OpenAI section:

OPENAI_TIMEOUT: int = int(os.getenv("OPENAI_TIMEOUT", "60"))
OPENAI_MAX_RETRIES: int = int(os.getenv("OPENAI_MAX_RETRIES", "3"))
OPENAI_RETRY_DELAY: float = float(os.getenv("OPENAI_RETRY_DELAY", "1.0"))
OPENAI_BASE_URL: str = os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1")

Testing Strategy

Unit Tests (Mocked):

Mock httpx or requests responses
Test successful translation
Test all error scenarios (rate limit, invalid key, quota exceeded, timeout)
Test custom system prompt injection
Test health check logic
Test retry logic for rate limits
Test registry integration
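A mocked retry test might look like the sketch below. It uses only `unittest.mock` and stand-ins, so `ProviderError` and `translate_with_retry` are simplified, hypothetical versions of the real provider pieces, not the code under test.

```python
import time
from unittest import mock

RETRYABLE = {"OPENAI_RATE_LIMITED", "OPENAI_TIMEOUT", "OPENAI_SERVICE_ERROR"}


class ProviderError(Exception):
    """Simplified stand-in for OpenAIProviderError."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


def translate_with_retry(request_fn, max_retries: int = 3, retry_delay: float = 1.0):
    """Stand-in retry loop with the same backoff rule as the provider."""
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except ProviderError as exc:
            if exc.code not in RETRYABLE or attempt == max_retries:
                raise
            time.sleep(retry_delay * (2 ** attempt))


def test_rate_limit_is_retried_then_succeeds():
    request_fn = mock.Mock(side_effect=[
        ProviderError("OPENAI_RATE_LIMITED"),
        ProviderError("OPENAI_RATE_LIMITED"),
        "Bonjour",
    ])
    with mock.patch("time.sleep") as fake_sleep:  # avoid real waiting
        result = translate_with_retry(request_fn)
    assert result == "Bonjour"
    assert request_fn.call_count == 3
    assert [call.args[0] for call in fake_sleep.call_args_list] == [1.0, 2.0]


def test_invalid_key_is_not_retried():
    request_fn = mock.Mock(side_effect=ProviderError("OPENAI_INVALID_KEY"))
    with mock.patch("time.sleep") as fake_sleep:
        try:
            translate_with_retry(request_fn)
            raised = False
        except ProviderError:
            raised = True
    assert raised
    assert request_fn.call_count == 1
    fake_sleep.assert_not_called()
```

Patching `time.sleep` keeps the suite fast and lets the test assert the exact backoff sequence.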

Test Commands:

# Unit tests only
pytest tests/test_providers/test_openai_provider.py -v

# All provider tests
pytest tests/test_providers/ -v

# With coverage
pytest tests/test_providers/ --cov=services/providers -v

Logging Pattern

try:
    import structlog
    logger = structlog.get_logger(__name__)
    _HAS_STRUCTLOG = True
except ImportError:
    import logging
    logger = logging.getLogger(__name__)
    _HAS_STRUCTLOG = False

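def _log(level: str, event: str, **kwargs):
    """Log with structlog or standard logging compatibility."""
    log = getattr(logger, level)
    if _HAS_STRUCTLOG:
        log(event, **kwargs)
    else:
        log(f"{event} " + " ".join(f"{k}={v}" for k, v in kwargs.items()))

def _log_info(event: str, **kwargs):
    _log("info", event, **kwargs)

def _log_error(event: str, **kwargs):
    _log("error", event, **kwargs)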

# Good - metadata only (NO document content)
_log_info(
    "openai_translation_success",
    chars=len(text),
    source_lang=source_language,
    target_lang=target_language,
    model=self._model,
    latency_ms=round(latency * 1000, 2),
    tokens_used=response.usage.total_tokens,
)

_log_error(
    "openai_translation_failed",
    error_code=error.code,
    text_length=len(text),
    source_lang=source_language,
    target_lang=target_language,
    model=self._model,
)

Dependencies

Internal:

services/providers/base.py - TranslationProvider abstract class
services/providers/registry.py - ProviderRegistry
services/providers/config.py - Configuration
services/providers/schemas.py - TranslationRequest/Response models

External:

httpx - HTTP client (preferred for async/sync support)
structlog or standard logging - Structured logging

HTTP Client Pattern

Use httpx for OpenAI API calls:

import httpx

class OpenAITranslationProvider(TranslationProvider):
    def __init__(self, api_key: str, model: str = "gpt-4o-mini", timeout: int = 60, base_url: str = "https://api.openai.com/v1"):
        self._api_key = api_key
        self._model = model
        self._base_url = base_url.rstrip("/")
        self._timeout = timeout
        self._client = httpx.Client(
            timeout=timeout,
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json"
            }
        )
    
    def _make_api_request(self, text: str, system_prompt: str) -> str:
        # base_url already ends in /v1, so append only the resource path
        response = self._client.post(
            f"{self._base_url}/chat/completions",
            json={
                "model": self._model,
                "messages": [
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": text}
                ],
                "temperature": 0.3,
                "max_tokens": 4096
            }
        )
        # ... error handling based on status code
        return response.json()["choices"][0]["message"]["content"]

Security Considerations

API Key Management:

API key stored in environment variable (never in code)
Key validated at initialization
Never log the API key (only last 4 characters if needed for debugging)
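The "last 4 characters" rule can be enforced with a small helper so that no call site formats the key by hand. This is a sketch; `mask_api_key` is a hypothetical name, and the fixed `***` prefix is a choice made here to avoid revealing the key length.

```python
def mask_api_key(api_key: str, visible: int = 4) -> str:
    """Return a log-safe form of the key showing at most the last `visible` chars."""
    if len(api_key) <= visible:
        # Too short to reveal anything safely.
        return "***"
    return "***" + api_key[-visible:]


masked = mask_api_key("sk-proj-abcd1234")  # "***1234"
```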

Data Privacy:

Never log document content (NFR11)
Only log metadata: text length, languages, model, timestamps
OpenAI may retain data per their privacy policy (different from Ollama's local processing)

Pro Feature Integration

Per PRD FR26: "Pro users can access LLM translation modes"

This provider will be used when:

User tier is "pro"
User selects "LLM" mode
User selects "OpenAI" as LLM provider

The tier check happens in the translation service/router, not in the provider itself.
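The three conditions above amount to a single predicate evaluated before the provider is invoked. The sketch below is illustrative only; `can_use_openai` and its lowercase string values are assumptions, not the router's actual interface.

```python
def can_use_openai(user_tier: str, mode: str, llm_provider: str) -> bool:
    """True only when all three selection conditions hold (case-insensitive)."""
    return (
        user_tier.lower() == "pro"
        and mode.lower() == "llm"
        and llm_provider.lower() == "openai"
    )


allowed = can_use_openai("pro", "LLM", "OpenAI")
denied = can_use_openai("free", "LLM", "OpenAI")
```

Keeping this check in the service or router layer means the provider stays unaware of tiers and can be tested without user fixtures.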

Rate Limiting Handling

OpenAI returns rate limit info in response headers:

x-ratelimit-limit-requests
x-ratelimit-remaining-requests
x-ratelimit-reset-requests

Extract retry_after from error response or use exponential backoff.
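One way to derive the wait time is sketched below. It assumes the reset header carries a duration string such as `1s` or `6m0s`, that a numeric `retry-after` header may also be present, and that header names are already lowercased; `parse_reset_duration` and `retry_after_seconds` are hypothetical helpers.

```python
import re
from typing import Mapping, Optional

# "ms" must be tried before "m" so that "20ms" is not read as minutes.
_DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def parse_reset_duration(value: str) -> Optional[float]:
    """Parse strings like '1s', '6m0s' or '20ms' into seconds, else None."""
    value = value.strip()
    parts = _DURATION_PART.findall(value)
    # Reject input with leftover characters the pattern did not consume.
    if not parts or "".join(number + unit for number, unit in parts) != value:
        return None
    return sum(float(number) * _UNIT_SECONDS[unit] for number, unit in parts)


def retry_after_seconds(
    headers: Mapping[str, str], attempt: int, base_delay: float = 1.0
) -> float:
    """Prefer server hints; fall back to exponential backoff."""
    retry_after = headers.get("retry-after")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # e.g. an HTTP-date; ignore and try the next hint
    reset = headers.get("x-ratelimit-reset-requests")
    if reset:
        parsed = parse_reset_duration(reset)
        if parsed is not None:
            return parsed
    return base_delay * (2 ** attempt)


wait_from_header = retry_after_seconds({"x-ratelimit-reset-requests": "6m0s"}, attempt=0)
wait_fallback = retry_after_seconds({}, attempt=2)
```

The fallback uses the same doubling rule as the retry loop, so behaviour stays predictable when the upstream sends no hints.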

References

[Source: _bmad-output/planning-artifacts/architecture.md#Error Handling]
[Source: _bmad-output/planning-artifacts/architecture.md#API Response Formats]
[Source: _bmad-output/planning-artifacts/epics.md#Story 2.5]
[Source: _bmad-output/planning-artifacts/prd.md#FR7 LLM providers (Ollama, OpenAI)]
[Source: _bmad-output/planning-artifacts/prd.md#NFR12 Zero HTTP 500 errors]
[Source: _bmad-output/implementation-artifacts/2-4-provider-ollama-llm-local.md]
[Source: services/providers/ollama_provider.py - Implementation pattern]
[Source: https://platform.openai.com/docs/api-reference/chat - OpenAI API docs]
[Source: https://platform.openai.com/docs/guides/error-codes - OpenAI Error Codes]

Dev Agent Record

Agent Model Used

Claude (GLM-5) via opencode

Debug Log References

Fixed test mocking issues for registry integration tests
Resolved ProvidersConfig import path in tests

Completion Notes List

✅ Implemented OpenAITranslationProvider class with full OpenAI Chat Completions API integration
✅ All 6 error codes implemented with French messages: OPENAI_RATE_LIMITED, OPENAI_INVALID_KEY, OPENAI_QUOTA_EXCEEDED, OPENAI_TIMEOUT, OPENAI_SERVICE_ERROR, OPENAI_CONTEXT_TOO_LONG
✅ Retry logic with exponential backoff for transient errors (rate limits, timeouts, service errors)
✅ Health check with 60s TTL caching and model availability verification
✅ Registry integration with auto-registration when OPENAI_ENABLED=true
✅ Custom system prompt injection via request.metadata["custom_prompt"]
✅ Language name mapping for better LLM understanding (same as Ollama)
✅ 44 unit tests created and all passing
✅ Configuration updated in config.py with OPENAI_TIMEOUT, OPENAI_MAX_RETRIES, OPENAI_RETRY_DELAY, OPENAI_BASE_URL, OPENAI_HEALTH_CHECK_TIMEOUT
✅ Auto-registration added to services/providers/__init__.py
✅ All acceptance criteria (AC1-AC8) satisfied

Code Review Fixes (2026-02-21)

✅ [HIGH] Added model info to health_check() return (model, model_available fields per Task 3.4)
✅ [MEDIUM] Added configurable health_check_timeout parameter (default 5s, via OPENAI_HEALTH_CHECK_TIMEOUT)
✅ [MEDIUM] Added reset_openai_provider() function to reset singleton when config changes
✅ [MEDIUM] Added API key validation (empty key raises ValueError)
✅ [MEDIUM] Added 11 new tests covering: empty API key, text too long preemptive check, malformed API responses (empty choices, missing content), health check model info, reset function

File List

Files Created:

services/providers/openai_provider.py - Main OpenAI provider implementation (660 lines)
tests/test_providers/test_openai_provider.py - 44 unit tests covering all functionality

Files Modified:

services/providers/__init__.py - Added OpenAI auto-registration
services/providers/config.py - Added OPENAI_TIMEOUT, OPENAI_MAX_RETRIES, OPENAI_RETRY_DELAY, OPENAI_BASE_URL, OPENAI_HEALTH_CHECK_TIMEOUT
services/providers/README.md - OpenAI section (Task 7)
.env.example - Added OPENAI_HEALTH_CHECK_TIMEOUT and OpenAI config options

Change Log

2026-02-21: [AI Code Review 2-5/2-6] Fixes: defensive JSON for 429/400, tokens_used in success log, ProviderSettings.openai base_url in config, File List README
2026-02-21: Code review fixes applied - Added model info to health_check, configurable health check timeout, reset function for singleton, API key validation, 11 new tests
2026-02-21: Story 2.5 implementation complete - OpenAI provider with cloud LLM translation, custom prompts, comprehensive error handling with French messages, retry logic, health checks, and 44 passing tests
