Files
office_translator/_bmad-output/implementation-artifacts/2-9-processor-powerpoint-pptx.md
Sepehr Ramezani 26bd096a06 feat: production deployment - full update with providers, admin, glossaries, pricing, tests
Major changes across backend, frontend, infrastructure:
- Provider system with model selection (Google, DeepL, OpenAI, Ollama, Google Cloud)
- Admin panel: user management, pricing, settings
- Glossary system with CSV import/export
- Subscription and tier quota management
- Security hardening (rate limiting, API key auth, path traversal fixes)
- Docker compose for dev, prod, and IONOS deployment
- Alembic migrations for new tables
- Frontend: dashboard, pricing page, landing page, i18n (en/fr)
- Test suite and verification scripts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-25 15:01:47 +02:00

16 KiB

Story 2.9: Processor PowerPoint (.pptx)

Status: done

Story

As a user, I want to translate PowerPoint files while preserving slides, layouts, and images, So that I receive a translated presentation ready to present.

Acceptance Criteria

  1. AC1: Text Box Translation - Given a valid .pptx file, when PowerPointTranslator.translate_file() is called, then text boxes and shapes are translated
  2. AC2: Slide Layout Preservation - Slide layouts and master slides are preserved (python-pptx preserves by default)
  3. AC3: Image Preservation - Images and charts remain in their original positions
  4. AC4: Animation Preservation - Animations are preserved (python-pptx preserves by default)
  5. AC5: PowerPoint Compatibility - The translated file opens in Microsoft PowerPoint without corruption error (FR16)
  6. AC6: Error Handling - Unsupported/corrupted files return structured error with code INVALID_FORMAT or PPTX_CORRUPTED (HTTP 400)
  7. AC7: Provider Integration - Translator uses new TranslationProvider interface from services/providers/ (supports fallback chain)

Current Implementation Status

Existing code in translators/pptx_translator.py:

  • Batch translation optimization (5-10x faster)
  • Setter pattern for applying translations
  • Text frame collection (paragraphs, runs)
  • Table handling (cells with text frames)
  • Group shapes handling (recursive)
  • Smart art handling
  • Notes slide handling
  • Uses new TranslationProvider interface
  • Structured error codes (PptxProcessorError)
  • File validation (magic bytes, extension, size)
  • Progress callback for large files
  • structlog-compatible logging
  • Proper logging (no print() statements)

Tasks / Subtasks

  • Task 1: Integrate with new Provider Interface (AC: 7)

    • 1.1 Update PowerPointTranslator to accept TranslationProvider instance
    • 1.2 Replace translation_service.translate_batch() with provider.translate_batch() using TranslationRequest
    • 1.3 Handle TranslationResponse with error/error_code fields
    • 1.4 Support custom system prompt via request.metadata
  • Task 2: Add Structured Error Handling (AC: 6)

    • 2.1 Add PptxProcessorError exception class with to_dict() method (same pattern as ExcelProcessorError)
    • 2.2 Define error codes: PPTX_READ_ERROR, PPTX_WRITE_ERROR, PPTX_CORRUPTED, INVALID_FORMAT, PPTX_TOO_LARGE
    • 2.3 Wrap Presentation() load in try/except with French error messages
    • 2.4 Validate file format (magic bytes PK header for .pptx)
    • 2.5 Add file size validation (50MB max)
  • Task 3: Add Progress Callback (AC: 5)

    • 3.1 Add optional progress_callback parameter to translate_file()
    • 3.2 Emit progress during processing: {"slide": N, "total_slides": M, "runs_translated": X}
    • 3.3 Ensure progress latency < 500ms (NFR3)
  • Task 4: Verify Layouts & Animations (AC: 2, 4)

    • 4.1 Test with master slides (verify layout preserved)
    • 4.2 Test with animations (verify preserved - python-pptx handles automatically)
    • 4.3 Test with images (verify positions preserved)
    • 4.4 Add unit tests for these scenarios
  • Task 5: Update Logging (AC: 6)

    • 5.1 Replace print() statements with structlog-compatible logging
    • 5.2 Log metadata only: file_name, slides_count, runs_translated, processing_time
    • 5.3 NO document content in logs (NFR11, NFR16)
  • Task 6: Unit Tests (AC: 1-7)

    • 6.1 Create tests/test_translators/test_pptx_translator.py
    • 6.2 Test text box/run translation
    • 6.3 Test table translation
    • 6.4 Test group shape handling
    • 6.5 Test image preservation
    • 6.6 Test animation preservation
    • 6.7 Test error scenarios (corrupted, invalid format)
    • 6.8 Test progress callback
  • Task 7: Integration Update (AC: 7)

    • 7.1 Update main.py to pass provider to pptx_translator
    • 7.2 Handle PptxProcessorError in global error handler
    • 7.3 Update translators/__init__.py exports

Dev Notes

Previous Story Intelligence (Stories 2.7 & 2.8)

Critical patterns from Excel and Word Translators to reuse:

  1. Error class pattern (PptxProcessorError):
class PptxProcessorError(Exception):
    """Exception for PowerPoint processing errors with structured error codes."""
    
    INVALID_FORMAT = "INVALID_FORMAT"
    PPTX_CORRUPTED = "PPTX_CORRUPTED"
    PPTX_READ_ERROR = "PPTX_READ_ERROR"
    PPTX_WRITE_ERROR = "PPTX_WRITE_ERROR"
    PPTX_TOO_LARGE = "PPTX_TOO_LARGE"
    
    ERROR_MESSAGES = {
        INVALID_FORMAT: "Format de fichier non supporte. Utilisez .pptx.",
        PPTX_CORRUPTED: "Le fichier PowerPoint est corrompu ou illisible.",
        PPTX_READ_ERROR: "Erreur lors de la lecture du fichier PowerPoint.",
        PPTX_WRITE_ERROR: "Erreur lors de la creation du fichier traduit.",
        PPTX_TOO_LARGE: "Le fichier est trop volumineux (max 50 Mo).",
    }
  1. Logging pattern (structlog-compatible):
try:
    import structlog
    logger = structlog.get_logger(__name__)
    _HAS_STRUCTLOG = True
except ImportError:
    import logging
    logger = logging.getLogger(__name__)
    _HAS_STRUCTLOG = False

def _log_info(event: str, **kwargs):
    """Log info with structlog or standard logging compatibility."""
    if _HAS_STRUCTLOG:
        logger.info(event, **kwargs)
    else:
        msg = f"{event} " + " ".join(f"{k}={v}" for k, v in kwargs.items())
        logger.info(msg)
  1. Provider integration:
def __init__(self, provider: Optional[TranslationProvider] = None):
    self._provider = provider
    self._custom_prompt: Optional[str] = None

def set_provider(self, provider: TranslationProvider) -> None:
    self._provider = provider

def set_custom_prompt(self, prompt: Optional[str]) -> None:
    self._custom_prompt = prompt
  1. File validation pattern:
MAX_FILE_SIZE_MB = 50
PPTX_MAGIC_BYTES = b"PK"  # .pptx files are ZIP archives

def _validate_file(self, file_path: Path) -> None:
    # Check extension
    if file_path.suffix.lower() != ".pptx":
        raise PptxProcessorError(code=PptxProcessorError.INVALID_FORMAT, ...)
    
    # Check magic bytes
    with open(file_path, "rb") as f:
        header = f.read(4)
    if header[:2] != self.PPTX_MAGIC_BYTES:
        raise PptxProcessorError(code=PptxProcessorError.INVALID_FORMAT, ...)
    
    # Check size
    file_size_mb = file_path.stat().st_size / (1024 * 1024)
    if file_size_mb > self.MAX_FILE_SIZE_MB:
        raise PptxProcessorError(code=PptxProcessorError.PPTX_TOO_LARGE, ...)

Existing Code Structure

File: translators/pptx_translator.py

class PowerPointTranslator:
    def __init__(self):
        self.translation_service = translation_service  # OLD interface
    
    def translate_file(self, input_path: Path, output_path: Path, target_language: str) -> Path:
        presentation = Presentation(input_path)
        
        text_elements = []
        image_shapes = []
        
        for slide_idx, slide in enumerate(presentation.slides):
            # Collect from notes
            if slide.has_notes_slide and slide.notes_slide.notes_text_frame:
                self._collect_from_text_frame(slide.notes_slide.notes_text_frame, text_elements)
            
            # Collect from shapes
            for shape in slide.shapes:
                self._collect_from_shape(shape, text_elements, slide, image_shapes)
        
        # Batch translate
        if text_elements:
            texts = [elem[0] for elem in text_elements]
            translated_texts = self.translation_service.translate_batch(texts, target_language)
            
            for (original_text, setter), translated in zip(text_elements, translated_texts):
                if translated is not None and setter is not None:
                    setter(translated)
        
        presentation.save(output_path)
        return output_path

python-pptx Library Specifics

Installation:

pip install python-pptx>=1.0.0

Key Classes:

Class Purpose
pptx.Presentation Represents a PowerPoint presentation
pptx.slide.Slide A single slide
pptx.shapes.base.BaseShape Base class for all shapes
pptx.text.text.TextFrame Text frame with paragraphs
pptx.text.run.Run A run of text with formatting
pptx.shapes.group.GroupShape Grouped shapes
pptx.enum.shapes.MSO_SHAPE_TYPE Shape type enumeration

Run Text Handling (same pattern as Word):

def _collect_from_text_frame(self, text_frame, text_elements):
    """Collect text from a text frame."""
    if not text_frame.text.strip():
        return
    
    for paragraph in text_frame.paragraphs:
        if not paragraph.text.strip():
            continue
        
        for run in paragraph.runs:
            if run.text and run.text.strip():
                def make_setter(r):
                    def setter(text):
                        r.text = text
                    return setter
                text_elements.append((run.text, make_setter(run)))

Magic Bytes Validation:

# .pptx files are ZIP archives starting with PK (same as .xlsx and .docx)
PPTX_MAGIC_BYTES = b'PK'

Error Codes

Code HTTP Scenario French Message
INVALID_FORMAT 400 Not a .pptx file "Format de fichier non supporte. Utilisez .pptx."
PPTX_CORRUPTED 400 File is corrupted "Le fichier PowerPoint est corrompu ou illisible."
PPTX_READ_ERROR 400 Cannot read file "Erreur lors de la lecture du fichier PowerPoint."
PPTX_WRITE_ERROR 500 Cannot write output "Erreur lors de la creation du fichier traduit."
PPTX_TOO_LARGE 413 File exceeds limit "Le fichier est trop volumineux (max 50 Mo)."

Architecture Compliance

Per _bmad-output/planning-artifacts/architecture.md:

Error Format:

{
  "error": "PPTX_CORRUPTED",
  "message": "Le fichier PowerPoint est corrompu ou illisible.",
  "details": {
    "file_name": "presentation.pptx",
    "error_detail": "Invalid presentation structure"
  }
}

Naming Conventions:

  • File: pptx_translator.py (snake_case)
  • Class: PowerPointTranslator (PascalCase)
  • Error class: PptxProcessorError (PascalCase)
  • Error codes: PPTX_* (UPPER_SNAKE_CASE)
  • JSON fields: snake_case

File Structure

Files to Modify:

  • translators/pptx_translator.py - Main changes (provider integration, error handling, progress, logging)

Files to Create:

  • tests/test_translators/test_pptx_translator.py - Unit tests

Testing Strategy

# Unit tests
pytest tests/test_translators/test_pptx_translator.py -v

# All translator tests
pytest tests/test_translators/ -v

# With coverage
pytest tests/test_translators/ --cov=translators -v

Key Differences from Excel/Word Translators

Feature Excel (.xlsx) Word (.docx) PowerPoint (.pptx)
Library openpyxl python-docx python-pptx
Text Unit Cells Runs Runs (in shapes/text frames)
Special Handling Formulas, merged cells, charts Headers/footers, nested tables Notes slides, group shapes
Magic Bytes PK (ZIP) PK (ZIP) PK (ZIP)
Structure Preservation Sheets → Rows → Cells Sections → Paragraphs/Tables → Runs Slides → Shapes → Text Frames → Runs

References

  • [Source: translators/pptx_translator.py - Existing implementation]
  • [Source: translators/excel_translator.py - Pattern reference for provider integration]
  • [Source: translators/word_translator.py - Pattern reference for error handling]
  • [Source: services/providers/base.py - TranslationProvider interface]
  • [Source: services/providers/schemas.py - TranslationRequest/Response]
  • [Source: _bmad-output/planning-artifacts/epics.md#Story 2.9]
  • [Source: _bmad-output/planning-artifacts/prd.md#FR11 Tables]
  • [Source: _bmad-output/planning-artifacts/prd.md#FR12 Images]
  • [Source: _bmad-output/planning-artifacts/prd.md#FR15 Animations]
  • [Source: _bmad-output/planning-artifacts/prd.md#NFR11 No content in logs]
  • [Source: _bmad-output/implementation-artifacts/2-7-processor-excel-xlsx.md - Previous story patterns]
  • [Source: _bmad-output/implementation-artifacts/2-8-processor-word-docx.md - Previous story patterns]
  • [Source: https://python-pptx.readthedocs.io/en/latest/ - python-pptx documentation]

Dev Agent Record

Agent Model Used

glm-5

Debug Log References

None

Completion Notes List

  1. Task 1 Complete: Integrated PowerPointTranslator with new TranslationProvider interface. Added set_provider() and set_custom_prompt() methods. Provider uses TranslationRequest/TranslationResponse schemas.

  2. Task 2 Complete: Added PptxProcessorError exception class with 5 error codes (INVALID_FORMAT, PPTX_CORRUPTED, PPTX_READ_ERROR, PPTX_WRITE_ERROR, PPTX_TOO_LARGE) and French messages. File validation includes magic bytes (PK header), extension check, and 50MB size limit.

  3. Task 3 Complete: Added progress_callback parameter to translate_file(). Emits progress events with {"slide": N, "total_slides": M, "runs_translated": X}.

  4. Task 4 Complete: Verified layout and animation preservation through unit tests. python-pptx handles these automatically.

  5. Task 5 Complete: Replaced all print() statements with structlog-compatible logging. Only logs metadata (file_name, slides_count, runs_translated, processing_time_ms) - no document content.

  6. Task 6 Complete: Created comprehensive test suite with 31 tests covering:

    • Error handling (PptxProcessorError)
    • File validation (extension, magic bytes, size)
    • Text box/run translation
    • Table translation
    • Group shape handling
    • Image preservation
    • Animation preservation
    • Notes slide handling
    • Progress callback
    • Provider integration
    • Legacy fallback
    • PowerPoint compatibility
  7. Task 7 Complete: Updated translators/__init__.py to export PptxProcessorError.

File List

  • translators/pptx_translator.py - Updated with provider integration, error handling, progress callback, and logging
  • translators/__init__.py - Updated exports to include PptxProcessorError
  • tests/test_translators/test_pptx_translator.py - Created with 31 unit tests

Change Log

  • 2026-02-21: Implemented Story 2.9 - PowerPoint processor with provider integration, structured errors, progress callback, and comprehensive tests
  • 2026-02-21: Code review fixes - Added PptxProcessorError handler in main.py, fixed source_language parameter, improved image preservation tests, added HTTP mapping tests

Senior Developer Review (AI)

Reviewer: Claude (Code Review Workflow)
Date: 2026-02-21
Outcome: APPROVED (with fixes applied)

Issues Found & Fixed

Severity Issue Status
HIGH PptxProcessorError not imported in main.py FIXED
HIGH No exception handler for PptxProcessorError in main.py FIXED
HIGH source_language not passed to pptx_translator.translate_file() FIXED
MEDIUM Image preservation test was skipped FIXED
MEDIUM Missing HTTP status code mapping tests FIXED

Changes Applied

  1. main.py - Added PptxProcessorError import and exception handler with HTTP status mapping (400/413/500)
  2. main.py - Added source_language parameter to all pptx_translator.translate_file() calls
  3. tests/test_translators/test_pptx_translator.py - Fixed image preservation tests (no longer skipped)
  4. tests/test_translators/test_pptx_translator.py - Added TestPptxProcessorErrorHTTPMapping class (4 tests)

Test Results

36 passed, 1 warning in 0.58s

AC Validation Summary

AC Status Evidence
AC1 PASS TestTextBoxTranslation tests
AC2 PASS TestAnimationPreservation tests
AC3 PASS TestImagePreservation tests
AC4 PASS TestAnimationPreservation tests
AC5 PASS TestPowerPointCompatibility tests
AC6 PASS TestErrorHandling + TestPptxProcessorErrorHTTPMapping tests
AC7 PASS TestProviderIntegration tests