# Story 1.2: Excel/CSV File Ingestion (Backend)

Status: review

## Story

As Julien (Analyst),
I want to upload an Excel or CSV file,
so that the system can read my production data.

## Acceptance Criteria

1. **Upload Endpoint:** A POST endpoint `/api/v1/upload` accepts `.xlsx`, `.xls`, and `.csv` files.
2. **File Validation:** The backend validates the MIME type and file extension, and returns a clear error for unsupported formats.
3. **Data Parsing:** Uses Pandas to read the file into a DataFrame. For workbooks with multiple sheets, the first sheet is read by default.
4. **Type Inference:** The backend automatically detects column types (int, float, string, date).
5. **Arrow Serialization:** Converts the DataFrame to an Apache Arrow Table and streams it using the IPC format.
6. **Persistence (Ephemeral):** Temporarily saves the file metadata and a pointer to the dataset in memory (stateless session simulation).

## Tasks / Subtasks

- [x] **API Route Implementation** (AC: 1, 2)
  - [x] Create `/backend/app/api/v1/upload.py`.
  - [x] Implement file upload using `fastapi.UploadFile`.
  - [x] Add validation logic for extensions and MIME types.
- [x] **Data Processing Logic** (AC: 3, 4)
  - [x] Implement `backend/app/core/engine/ingest.py` helper.
  - [x] Use `pandas` to read Excel/CSV.
  - [x] Basic data cleaning (strip whitespace from headers).
- [x] **High-Performance Bridge** (AC: 5)
  - [x] Implement Arrow conversion using `pyarrow`.
  - [x] Set up `StreamingResponse` with `application/vnd.apache.arrow.stream`.
- [x] **Session & Metadata** (AC: 6)
  - [x] Return column metadata (name, inferred type) in the response headers or as a separate JSON part.

## Dev Notes

- **Performance:** For 50k rows, Arrow is mandatory. Zero-copy binary transfer implemented.
- **Libraries:** Using `pandas`, `openpyxl`, and `pyarrow`.
- **Type Safety:** Column metadata is stringified in the `X-Column-Metadata` header.

### Project Structure Notes

- Created `backend/app/core/engine/ingest.py` for pure data logic.
- Created `backend/app/api/v1/upload.py` for the FastAPI route.
- Updated `backend/main.py` to include the router.

### References

- [Source: architecture.md#API & Communication Patterns]
- [Source: project-context.md#Data & State Architecture]

## Dev Agent Record

### Agent Model Used

{{agent_model_name_version}}

### Completion Notes List

- Implemented `/api/v1/upload` endpoint.
- Added validation for `.xlsx`, `.xls`, and `.csv`.
- Implemented automated type inference (numeric, categorical, date).
- Successfully converted Pandas DataFrames to Apache Arrow IPC streams.
- Verified with 3 automated tests (Health, CSV Upload, Error Handling).

### File List

- /backend/app/api/v1/upload.py
- /backend/app/core/engine/ingest.py
- /backend/main.py
- /backend/tests/test_upload.py