Analysis/_bmad-output/implementation-artifacts/1-2-ingestion-de-fichiers-excel-csv-backend.md
2026-01-11 22:56:02 +01:00

Story 1.2: Ingestion de Fichiers Excel/CSV (Backend)

Status: review

Story

As Julien (Analyst), I want to upload an Excel or CSV file, so that the system can read my production data.

Acceptance Criteria

  1. Upload Endpoint: A POST endpoint /api/v1/upload accepts .xlsx, .xls, and .csv files.
  2. File Validation: Backend validates MIME type and file extension. Returns clear error for unsupported formats.
  3. Data Parsing: Uses Pandas to read the file into a DataFrame. Handles multiple sheets (takes the first by default).
  4. Type Inference: Backend automatically detects column types (int, float, string, date).
  5. Arrow Serialization: Converts the DataFrame to an Apache Arrow Table and streams it using IPC format.
  6. Persistence (Ephemeral): Temporarily saves the file metadata and a pointer to the dataset in memory (stateless session simulation).
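
The validation rule in criterion 2 could look roughly like the sketch below. It is illustrative only: the function name `validate_upload`, the `UnsupportedFileError` class, and the exact MIME allow-list are assumptions, not taken from the implementation. Some clients report CSV files as `application/vnd.ms-excel`, so the sketch accepts that type for `.csv` as well.

```python
from pathlib import Path
from typing import Optional

# Hypothetical allow-list: extension -> MIME types accepted for it.
ALLOWED_TYPES = {
    ".csv": {"text/csv", "application/vnd.ms-excel"},
    ".xls": {"application/vnd.ms-excel"},
    ".xlsx": {
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
    },
}


class UnsupportedFileError(ValueError):
    """Raised when the extension or MIME type is not accepted."""


def validate_upload(filename: str, content_type: Optional[str]) -> str:
    """Return the normalized extension, or raise with a clear message."""
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_TYPES:
        raise UnsupportedFileError(
            f"Unsupported extension {extension!r}; expected .xlsx, .xls or .csv"
        )
    if content_type not in ALLOWED_TYPES[extension]:
        raise UnsupportedFileError(
            f"MIME type {content_type!r} does not match extension {extension!r}"
        )
    return extension
```

In a FastAPI route, the raised error would typically be translated into an HTTP 4xx response carrying the message.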

Tasks / Subtasks

  • API Route Implementation (AC: 1, 2)
    • Create /backend/app/api/v1/upload.py.
    • Implement file upload using FastAPI's UploadFile.
    • Add validation logic for extensions and MIME types.
  • Data Processing Logic (AC: 3, 4)
    • Implement backend/app/core/engine/ingest.py helper.
    • Use pandas to read Excel/CSV.
    • Basic data cleaning (strip whitespace from headers).
  • High-Performance Bridge (AC: 5)
    • Implement Arrow conversion using pyarrow.
    • Set up StreamingResponse with application/vnd.apache.arrow.stream.
  • Session & Metadata (AC: 6)
    • Return column metadata (name, inferred type) in the response headers or as a separate JSON part.

Dev Notes

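  • Performance: Arrow is mandatory for datasets of 50k rows. The response is sent as a binary Arrow IPC stream, which the client can read without row-by-row parsing (zero-copy on the reading side).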
  • Libraries: Using pandas, openpyxl, and pyarrow.
  • Type Safety: Column metadata is stringified in the X-Column-Metadata header.
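
As an illustration of the header approach, the sketch below serializes column metadata to a header-safe string and reads it back. It assumes the metadata is stringified as JSON, which this story does not state explicitly; the function names are hypothetical. HTTP header values are safest as ASCII, so `ensure_ascii=True` escapes accented column names.

```python
import json
from typing import Dict, List

HEADER_NAME = "X-Column-Metadata"


def encode_column_metadata(columns: List[Dict[str, str]]) -> str:
    """Serialize column metadata to a compact, ASCII-only JSON string."""
    return json.dumps(columns, ensure_ascii=True, separators=(",", ":"))


def decode_column_metadata(header_value: str) -> List[Dict[str, str]]:
    """Parse the header value back into a list of column descriptors."""
    return json.loads(header_value)
```

Because proxies and servers often cap header size, very wide tables may be better served by the separate JSON part mentioned in the tasks.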

Project Structure Notes

  • Created backend/app/core/engine/ingest.py for pure data logic.
  • Created backend/app/api/v1/upload.py for the FastAPI route.
  • Updated backend/main.py to include the router.

References

  • [Source: architecture.md#API & Communication Patterns]
  • [Source: project-context.md#Data & State Architecture]

Dev Agent Record

Agent Model Used

{{agent_model_name_version}}

Completion Notes List

  • Implemented /api/v1/upload endpoint.
  • Added validation for .xlsx, .xls, and .csv.
  • Implemented automated type inference (numeric, categorical, date).
  • Successfully converted Pandas DataFrames to Apache Arrow IPC streams.
  • Verified with 3 automated tests (Health, CSV Upload, Error Handling).
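
The numeric / categorical / date classification noted above can be approximated from pandas dtypes, as in the following sketch. The mapping rules, including treating booleans and text as categorical, are assumptions and not necessarily what ingest.py does; `classify_column` and `infer_column_types` are hypothetical names.

```python
import pandas as pd
from pandas.api import types as ptypes


def classify_column(series: pd.Series) -> str:
    """Map a pandas dtype to one of: numeric, categorical, date."""
    # Check bool first: pandas also reports bool as a numeric dtype.
    if ptypes.is_bool_dtype(series):
        return "categorical"
    if ptypes.is_numeric_dtype(series):
        return "numeric"
    if ptypes.is_datetime64_any_dtype(series):
        return "date"
    # Strings and anything else fall back to categorical.
    return "categorical"


def infer_column_types(df: pd.DataFrame) -> list:
    """Return one {'name', 'type'} descriptor per column, in order."""
    return [
        {"name": str(name), "type": classify_column(df[name])}
        for name in df.columns
    ]
```

Note that `pd.read_csv` leaves date-like text as strings unless date parsing is requested, so date columns in CSV input would need an explicit conversion before this classification sees them as dates.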

File List

  • /backend/app/api/v1/upload.py
  • /backend/app/core/engine/ingest.py
  • /backend/main.py
  • /backend/tests/test_upload.py