Story 1.2: Excel/CSV File Ingestion (Backend)
Status: review
Story
As Julien (Analyst), I want to upload an Excel or CSV file, so that the system can read my production data.
Acceptance Criteria
- Upload Endpoint: A POST endpoint /api/v1/upload accepts .xlsx, .xls, and .csv files.
- File Validation: Backend validates MIME type and file extension. Returns a clear error for unsupported formats.
- Data Parsing: Uses Pandas to read the file into a DataFrame. Handles multiple sheets (takes the first by default).
- Type Inference: Backend automatically detects column types (int, float, string, date).
- Arrow Serialization: Converts the DataFrame to an Apache Arrow Table and streams it using IPC format.
- Persistence (Ephemeral): Temporarily saves the file metadata and a pointer to the dataset in memory (stateless session simulation).
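The file validation criterion above can be sketched as a small pure-Python helper. This is a minimal illustration, not the project's actual code: the names ALLOWED_TYPES, UnsupportedFileError, and validate_upload are hypothetical, and the MIME allow-list is an assumption about what clients typically send (some browsers report CSV files as application/vnd.ms-excel).

```python
from pathlib import Path
from typing import Optional

# Assumed allow-list (hypothetical): extension -> MIME types accepted for it.
ALLOWED_TYPES = {
    ".xlsx": {"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"},
    ".xls": {"application/vnd.ms-excel"},
    ".csv": {"text/csv", "application/vnd.ms-excel"},
}


class UnsupportedFileError(ValueError):
    """Raised when an upload fails the extension or MIME check."""


def validate_upload(filename: str, content_type: Optional[str]) -> str:
    """Return the normalized extension, or raise UnsupportedFileError."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_TYPES:
        supported = ", ".join(sorted(ALLOWED_TYPES))
        raise UnsupportedFileError(
            f"Unsupported file extension {ext!r}; expected one of: {supported}"
        )
    # Drop parameters such as "; charset=utf-8" before comparing.
    mime = (content_type or "").split(";")[0].strip().lower()
    if mime not in ALLOWED_TYPES[ext]:
        raise UnsupportedFileError(
            f"Unexpected MIME type {mime!r} for a {ext} file"
        )
    return ext
```

In a FastAPI route, the exception would typically be caught and mapped to a 4xx response so the client sees the message; the exact status code is left to the implementation.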
Tasks / Subtasks
- API Route Implementation (AC: 1, 2)
  - Create /backend/app/api/v1/upload.py.
  - Implement file upload using FastAPI's UploadFile.
  - Add validation logic for extensions and MIME types.
- Data Processing Logic (AC: 3, 4)
  - Implement backend/app/core/engine/ingest.py helper.
  - Use pandas to read Excel/CSV.
  - Basic data cleaning (strip whitespace from headers).
- High-Performance Bridge (AC: 5)
  - Implement Arrow conversion using pyarrow.
  - Set up StreamingResponse with application/vnd.apache.arrow.stream.
- Session & Metadata (AC: 6)
  - Return column metadata (name, inferred type) in the response headers or as a separate JSON part.
Dev Notes
- Performance: For 50k rows, Arrow is mandatory. Zero-copy binary transfer implemented.
- Libraries: Using pandas, openpyxl, and pyarrow.
- Type Safety: Column metadata is stringified in the X-Column-Metadata header.
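One way the column metadata could be stringified for the X-Column-Metadata header is sketched below. The helper names and the JSON layout (a list of name/type objects) are assumptions for illustration; the story does not specify the exact format. Columns that are not integer, float, or datetime fall back to "string" in this sketch.

```python
import json

import pandas as pd
from pandas.api import types as ptypes


def column_type(series: pd.Series) -> str:
    """Map a pandas dtype to one of: int, float, date, string."""
    if ptypes.is_integer_dtype(series):
        return "int"
    if ptypes.is_float_dtype(series):
        return "float"
    if ptypes.is_datetime64_any_dtype(series):
        return "date"
    return "string"


def column_metadata_header(df: pd.DataFrame) -> str:
    """Build a compact JSON string suitable for an HTTP header value."""
    metadata = [
        {"name": str(name), "type": column_type(df[name])}
        for name in df.columns
    ]
    # json.dumps escapes non-ASCII characters by default, which keeps
    # the value safe to place in a header.
    return json.dumps(metadata, separators=(",", ":"))
```

A client would read the header and call json.loads on it to recover the list. Very wide tables may exceed typical header size limits, which is one reason the task list also mentions a separate JSON part as an alternative.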
Project Structure Notes
- Created backend/app/core/engine/ingest.py for pure data logic.
- Created backend/app/api/v1/upload.py for the FastAPI route.
- Updated backend/main.py to include the router.
References
- [Source: architecture.md#API & Communication Patterns]
- [Source: project-context.md#Data & State Architecture]
Dev Agent Record
Agent Model Used
{{agent_model_name_version}}
Completion Notes List
- Implemented /api/v1/upload endpoint.
- Added validation for .xlsx, .xls, and .csv.
- Implemented automated type inference (numeric, categorical, date).
- Successfully converted Pandas DataFrames to Apache Arrow IPC streams.
- Verified with 3 automated tests (Health, CSV Upload, Error Handling).
File List
- /backend/app/api/v1/upload.py
- /backend/app/core/engine/ingest.py
- /backend/main.py
- /backend/tests/test_upload.py