Analysis/_bmad-output/implementation-artifacts/1-2-ingestion-de-fichiers-excel-csv-backend.md
2026-01-11 22:56:02 +01:00

# Story 1.2: Excel/CSV File Ingestion (Backend)
Status: review
## Story
As Julien (Analyst),
I want to upload an Excel or CSV file,
so that the system can read my production data.
## Acceptance Criteria
1. **Upload Endpoint:** A POST endpoint `/api/v1/upload` accepts `.xlsx`, `.xls`, and `.csv` files.
2. **File Validation:** The backend validates both the MIME type and the file extension, and returns a clear error for unsupported formats.
3. **Data Parsing:** Uses Pandas to read the file into a DataFrame. Handles multiple sheets (takes the first by default).
4. **Type Inference:** Backend automatically detects column types (int, float, string, date).
5. **Arrow Serialization:** Converts the DataFrame to an Apache Arrow Table and streams it using IPC format.
6. **Persistence (Ephemeral):** Temporarily saves the file metadata and a pointer to the dataset in memory (stateless session simulation).
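
The validation described in criterion 2 could look roughly like the sketch below. It uses only the standard library; the names `ALLOWED_TYPES`, `UnsupportedFileError`, and `validate_upload` are hypothetical, and the MIME types accepted per extension are an assumption (the story does not list them).

```python
from pathlib import Path

# Assumed mapping of extension -> acceptable MIME types (not specified in the story).
# Some browsers report CSV files as application/vnd.ms-excel, so it is allowed for .csv.
ALLOWED_TYPES = {
    ".csv": {"text/csv", "application/csv", "application/vnd.ms-excel"},
    ".xls": {"application/vnd.ms-excel"},
    ".xlsx": {"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"},
}


class UnsupportedFileError(ValueError):
    """Raised when the extension or MIME type is not accepted (hypothetical name)."""


def validate_upload(filename: str, content_type: str) -> str:
    """Return the normalized extension, or raise UnsupportedFileError."""
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_TYPES:
        supported = ", ".join(sorted(ALLOWED_TYPES))
        raise UnsupportedFileError(
            f"Unsupported file extension {extension!r}; expected one of: {supported}"
        )
    # Drop parameters such as "; charset=utf-8" before comparing.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime not in ALLOWED_TYPES[extension]:
        raise UnsupportedFileError(
            f"MIME type {mime!r} does not match extension {extension!r}"
        )
    return extension
```

In a FastAPI route, the raised error would typically be translated into an HTTP 4xx response; which status code the actual implementation uses is not stated here.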
## Tasks / Subtasks
- [x] **API Route Implementation** (AC: 1, 2)
  - [x] Create `/backend/app/api/v1/upload.py`.
  - [x] Implement file upload using FastAPI's `UploadFile`.
  - [x] Add validation logic for extensions and MIME types.
- [x] **Data Processing Logic** (AC: 3, 4)
  - [x] Implement `backend/app/core/engine/ingest.py` helper.
  - [x] Use `pandas` to read Excel/CSV.
  - [x] Basic data cleaning (strip whitespace from headers).
- [x] **High-Performance Bridge** (AC: 5)
  - [x] Implement Arrow conversion using `pyarrow`.
  - [x] Set up `StreamingResponse` with `application/vnd.apache.arrow.stream`.
- [x] **Session & Metadata** (AC: 6)
  - [x] Return column metadata (name, inferred type) in the response headers or as a separate JSON part.
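
As an illustration of the data processing tasks above, here is a minimal sketch of an ingest helper. It is not the contents of `ingest.py`: the function names `read_tabular` and `infer_column_types` are hypothetical, and the rule that a text column counts as a date only when every non-null value parses is an assumption.

```python
import io

import pandas as pd


def read_tabular(raw: bytes, extension: str) -> pd.DataFrame:
    """Read CSV or Excel bytes into a DataFrame with cleaned headers."""
    buffer = io.BytesIO(raw)
    if extension == ".csv":
        df = pd.read_csv(buffer)
    elif extension in {".xlsx", ".xls"}:
        # sheet_name=0 selects the first sheet.
        # Reading .xlsx requires openpyxl; reading .xls requires xlrd.
        df = pd.read_excel(buffer, sheet_name=0)
    else:
        raise ValueError(f"Unsupported extension: {extension}")
    # Basic cleaning: strip whitespace from header names.
    df.columns = [str(name).strip() for name in df.columns]
    return df


def infer_column_types(df: pd.DataFrame) -> dict:
    """Map each column name to one of: int, float, date, string."""
    types = {}
    for name in df.columns:
        col = df[name]
        if pd.api.types.is_integer_dtype(col):
            types[name] = "int"
        elif pd.api.types.is_float_dtype(col):
            types[name] = "float"
        elif pd.api.types.is_datetime64_any_dtype(col):
            types[name] = "date"
        elif pd.api.types.is_string_dtype(col) or pd.api.types.is_object_dtype(col):
            # Assumed rule: treat text as a date only if every non-null value parses.
            parsed = pd.to_datetime(col, errors="coerce")
            non_null = int(col.notna().sum())
            is_date = non_null > 0 and int(parsed.notna().sum()) == non_null
            types[name] = "date" if is_date else "string"
        else:
            types[name] = "string"
    return types
```

A short usage example with an in-memory CSV:

```python
raw = b"  id , price ,day,label\n1,2.5,2024-01-05,widget\n2,3.0,2024-01-06,gadget\n"
df = read_tabular(raw, ".csv")
column_types = infer_column_types(df)
```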
## Dev Notes
- **Performance:** Arrow is mandatory for datasets of around 50k rows. The response is sent as a binary Arrow IPC stream rather than JSON, which enables zero-copy binary transfer.
- **Libraries:** Using `pandas`, `openpyxl`, and `pyarrow`.
- **Type Safety:** Column metadata is stringified in the `X-Column-Metadata` header.
### Project Structure Notes
- Created `backend/app/core/engine/ingest.py` for pure data logic.
- Created `backend/app/api/v1/upload.py` for the FastAPI route.
- Updated `backend/main.py` to include the router.
### References
- [Source: architecture.md#API & Communication Patterns]
- [Source: project-context.md#Data & State Architecture]
## Dev Agent Record
### Agent Model Used
{{agent_model_name_version}}
### Completion Notes List
- Implemented `/api/v1/upload` endpoint.
- Added validation for `.xlsx`, `.xls`, and `.csv`.
- Implemented automated type inference (numeric, categorical, date).
- Successfully converted Pandas DataFrames to Apache Arrow IPC streams.
- Verified with 3 automated tests (Health, CSV Upload, Error Handling).
### File List
- /backend/app/api/v1/upload.py
- /backend/app/core/engine/ingest.py
- /backend/main.py
- /backend/tests/test_upload.py