70 lines
2.7 KiB
Markdown
70 lines
2.7 KiB
Markdown
# Story 1.2: Ingestion de Fichiers Excel/CSV (Backend)
|
|
|
|
Status: review
|
|
|
|
## Story
|
|
|
|
As a Julien (Analyst),
|
|
I want to upload an Excel or CSV file,
|
|
so that the system can read my production data.
|
|
|
|
## Acceptance Criteria
|
|
|
|
1. **Upload Endpoint:** A POST endpoint `/api/v1/upload` accepts `.xlsx`, `.xls`, and `.csv` files.
|
|
2. **File Validation:** Backend validates MIME type and file extension. Returns clear error for unsupported formats.
|
|
3. **Data Parsing:** Uses Pandas to read the file into a DataFrame. Handles multiple sheets (takes the first by default).
|
|
4. **Type Inference:** Backend automatically detects column types (int, float, string, date).
|
|
5. **Arrow Serialization:** Converts the DataFrame to an Apache Arrow Table and streams it using IPC format.
|
|
6. **Persistence (Ephemeral):** Temporarily saves the file metadata and a pointer to the dataset in memory (stateless session simulation).
|
|
|
|
## Tasks / Subtasks
|
|
|
|
- [x] **API Route Implementation** (AC: 1, 2)
|
|
- [x] Create `/backend/app/api/v1/upload.py`.
|
|
- [x] Implement file upload using `FastAPI.UploadFile`.
|
|
- [x] Add validation logic for extensions and MIME types.
|
|
- [x] **Data Processing Logic** (AC: 3, 4)
|
|
- [x] Implement `backend/app/core/engine/ingest.py` helper.
|
|
- [x] Use `pandas` to read Excel/CSV.
|
|
- [x] Basic data cleaning (strip whitespace from headers).
|
|
- [x] **High-Performance Bridge** (AC: 5)
|
|
- [x] Implement Arrow conversion using `pyarrow`.
|
|
- [x] Set up `StreamingResponse` with `application/vnd.apache.arrow.stream`.
|
|
- [x] **Session & Metadata** (AC: 6)
|
|
- [x] Return column metadata (name, inferred type) in the response headers or as a separate JSON part.
|
|
|
|
## Dev Notes
|
|
|
|
- **Performance:** For 50k rows, Arrow is mandatory. Zero-copy binary transfer implemented.
|
|
- **Libraries:** Using `pandas`, `openpyxl`, and `pyarrow`.
|
|
- **Type Safety:** Column metadata is stringified in the `X-Column-Metadata` header.
|
|
|
|
### Project Structure Notes
|
|
|
|
- Created `backend/app/core/engine/ingest.py` for pure data logic.
|
|
- Created `backend/app/api/v1/upload.py` for the FastAPI route.
|
|
- Updated `backend/main.py` to include the router.
|
|
|
|
### References
|
|
|
|
- [Source: architecture.md#API & Communication Patterns]
|
|
- [Source: project-context.md#Data & State Architecture]
|
|
|
|
## Dev Agent Record
|
|
|
|
### Agent Model Used
|
|
|
|
{{agent_model_name_version}}
|
|
|
|
### Completion Notes List
|
|
- Implemented `/api/v1/upload` endpoint.
|
|
- Added validation for `.xlsx`, `.xls`, and `.csv`.
|
|
- Implemented automated type inference (numeric, categorical, date).
|
|
- Successfully converted Pandas DataFrames to Apache Arrow IPC streams.
|
|
- Verified with 3 automated tests (Health, CSV Upload, Error Handling).
|
|
|
|
### File List
|
|
- /backend/app/api/v1/upload.py
|
|
- /backend/app/core/engine/ingest.py
|
|
- /backend/main.py
|
|
- /backend/tests/test_upload.py |