Initial commit: Data Analysis application with FastAPI backend and Next.js frontend

2026-01-11 21:54:33 +01:00
commit 7bdafb4fbf
549 changed files with 96211 additions and 0 deletions

@@ -0,0 +1,63 @@
# Story 1.1: Monorepo & Docker Initialization
Status: review
## Story
As a Developer,
I want to initialize the project structure (Next.js + FastAPI + Docker),
so that I have a functional and consistent development environment.
## Acceptance Criteria
1. **Root Structure:** Root directory contains `compose.yaml` (the preferred file name in the Compose Specification) and subdirectories `frontend/` and `backend/`.
2. **Backend Setup:** `backend/` initialized with FastAPI (Python 3.12) using **uv** package manager.
3. **Frontend Setup:** `frontend/` initialized with Next.js 16 (standalone mode).
4. **Orchestration:** `docker compose up` builds and starts both services on a shared internal network.
5. **Connectivity:** Frontend is accessible at `localhost:3000` and Backend at `localhost:8000`.
## Tasks / Subtasks
- [x] **Root Initialization** (AC: 1)
  - [x] Initialize git repository.
  - [x] Create `.gitignore` for monorepo.
- [x] **Backend Service Setup** (AC: 2)
  - [x] Initialize FastAPI project structure.
  - [x] Add `main.py` with health check.
  - [x] Initialize **uv** project (`pyproject.toml`, `uv.lock`) and add dependencies.
  - [x] Create multi-stage `Dockerfile` using `uv` for fast builds.
- [x] **Frontend Service Setup** (AC: 3)
  - [x] Initialize Next.js 16 project.
  - [x] Configure standalone output.
  - [x] Create multi-stage `Dockerfile`.
- [x] **Docker Orchestration** (AC: 4, 5)
  - [x] Create `compose.yaml`.
  - [x] Verify inter-service communication configuration.
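
The connectivity criterion (frontend on `localhost:3000`, backend on `localhost:8000`) can be smoke-tested once the stack is up. The following is a minimal sketch using only the Python standard library; the helper names `port_is_open` and `check_services` are hypothetical and not part of this commit.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(services: dict[str, int], host: str = "localhost") -> dict[str, bool]:
    """Map each service name to whether its port accepts TCP connections."""
    return {name: port_is_open(host, port) for name, port in services.items()}


# Example call (ports taken from the acceptance criteria):
# check_services({"frontend": 3000, "backend": 8000})
```

An open port only shows that something is listening; it does not replace the backend health check mentioned in the tasks.
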
## Dev Notes
- **Architecture Patterns:** Two-Service Monorepo pattern.
- **Tooling:** Updated to use **uv** (Astral) instead of pip/venv for Python management (2026 Standard).
- **Naming Conventions:** `snake_case` for Python files/API; `PascalCase` for React components.
### References
- [Source: architecture.md#Project Structure & Boundaries]
- [Source: project-context.md#Technology Stack & Versions]
## Dev Agent Record
### Completion Notes List
- Migrated backend package management to **uv**.
- Updated Dockerfile to use `ghcr.io/astral-sh/uv` for building.
- Initialized `pyproject.toml` and `uv.lock`.
### File List
- /compose.yaml
- /backend/Dockerfile
- /backend/main.py
- /backend/pyproject.toml
- /backend/uv.lock
- /frontend/Dockerfile
- /frontend/next.config.mjs
- /frontend/package.json

@@ -0,0 +1,70 @@
# Story 1.2: Excel/CSV File Ingestion (Backend)
Status: review
## Story
As Julien (Analyst),
I want to upload an Excel or CSV file,
so that the system can read my production data.
## Acceptance Criteria
1. **Upload Endpoint:** A POST endpoint `/api/v1/upload` accepts `.xlsx`, `.xls`, and `.csv` files.
2. **File Validation:** Backend validates MIME type and file extension. Returns clear error for unsupported formats.
3. **Data Parsing:** Uses Pandas to read the file into a DataFrame. Handles multiple sheets (takes the first by default).
4. **Type Inference:** Backend automatically detects column types (int, float, string, date).
5. **Arrow Serialization:** Converts the DataFrame to an Apache Arrow Table and streams it using IPC format.
6. **Persistence (Ephemeral):** Temporarily saves the file metadata and a pointer to the dataset in memory (stateless session simulation).
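
The extension and MIME checks from criterion 2 could look roughly like the sketch below. It is an illustration only: the allow-list contents and the function name `validate_upload` are assumptions, and the real route may validate differently.

```python
from pathlib import PurePath

# Assumed allow-list: file extension -> MIME types accepted for it.
# Some browsers report CSV files as "application/vnd.ms-excel", so both are listed.
ALLOWED_TYPES = {
    ".csv": {"text/csv", "application/vnd.ms-excel"},
    ".xls": {"application/vnd.ms-excel"},
    ".xlsx": {"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"},
}


def validate_upload(filename: str, content_type: str) -> None:
    """Raise ValueError with a readable message if the upload is not supported."""
    extension = PurePath(filename).suffix.lower()
    if extension not in ALLOWED_TYPES:
        allowed = ", ".join(sorted(ALLOWED_TYPES))
        raise ValueError(f"Unsupported extension '{extension}'. Allowed: {allowed}")

    # Drop parameters such as "; charset=utf-8" before comparing.
    mime = content_type.split(";")[0].strip().lower()
    if mime not in ALLOWED_TYPES[extension]:
        raise ValueError(f"MIME type '{mime}' does not match extension '{extension}'")
```

In a FastAPI route, the `ValueError` message would typically be turned into a 4xx response so the client sees a clear reason for the rejection.
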
## Tasks / Subtasks
- [x] **API Route Implementation** (AC: 1, 2)
  - [x] Create `/backend/app/api/v1/upload.py`.
  - [x] Implement file upload using `fastapi.UploadFile`.
  - [x] Add validation logic for extensions and MIME types.
- [x] **Data Processing Logic** (AC: 3, 4)
  - [x] Implement `backend/app/core/engine/ingest.py` helper.
  - [x] Use `pandas` to read Excel/CSV.
  - [x] Basic data cleaning (strip whitespace from headers).
- [x] **High-Performance Bridge** (AC: 5)
  - [x] Implement Arrow conversion using `pyarrow`.
  - [x] Set up `StreamingResponse` with `application/vnd.apache.arrow.stream`.
- [x] **Session & Metadata** (AC: 6)
  - [x] Return column metadata (name, inferred type) in the response headers or as a separate JSON part.
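
To make the type-inference criterion concrete, here is a small standard-library sketch that classifies a column of raw strings as `int`, `float`, `date`, or `string`. The story states that the backend relies on Pandas for this; the code below swaps in plain Python purely for illustration, and `infer_column_type` is a hypothetical name. Only ISO dates (`YYYY-MM-DD`) are recognised here.

```python
from datetime import date


def _parses(converter, text: str) -> bool:
    """Return True if `converter(text)` succeeds without a ValueError."""
    try:
        converter(text)
        return True
    except ValueError:
        return False


def infer_column_type(values: list[str]) -> str:
    """Infer a column type from raw cell strings, ignoring empty cells."""
    cells = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not cells:
        return "string"  # nothing to infer from
    # Try the narrowest type first; fall back to "string" if none fits every cell.
    candidates = (("int", int), ("float", float), ("date", date.fromisoformat))
    for name, converter in candidates:
        if all(_parses(converter, cell) for cell in cells):
            return name
    return "string"
```

Ordering matters: every integer also parses as a float, so `int` has to be tested before `float`.
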
## Dev Notes
- **Performance:** For 50k rows, Arrow is mandatory. Data is transferred as a binary Arrow IPC stream, which avoids JSON serialization of the rows.
- **Libraries:** Using `pandas`, `openpyxl`, and `pyarrow`.
- **Type Safety:** Column metadata is stringified in the `X-Column-Metadata` header.
### Project Structure Notes
- Created `backend/app/core/engine/ingest.py` for pure data logic.
- Created `backend/app/api/v1/upload.py` for the FastAPI route.
- Updated `backend/main.py` to include the router.
### References
- [Source: architecture.md#API & Communication Patterns]
- [Source: project-context.md#Data & State Architecture]
## Dev Agent Record
### Agent Model Used
{{agent_model_name_version}}
### Completion Notes List
- Implemented `/api/v1/upload` endpoint.
- Added validation for `.xlsx`, `.xls`, and `.csv`.
- Implemented automated type inference (numeric, categorical, date).
- Successfully converted Pandas DataFrames to Apache Arrow IPC streams.
- Verified with 3 automated tests (Health, CSV Upload, Error Handling).
### File List
- /backend/app/api/v1/upload.py
- /backend/app/core/engine/ingest.py
- /backend/main.py
- /backend/tests/test_upload.py

@@ -0,0 +1,67 @@
# Story 1.3: Visualization in the Smart Grid (Frontend)
Status: review
## Story
As Julien (Analyst),
I want to see my uploaded data in an interactive high-speed grid,
so that I can explore the raw data effortlessly.
## Acceptance Criteria
1. **Virtualization:** The grid renders 50,000+ rows without browser lag using TanStack Table virtualization.
2. **Arrow Integration:** The frontend reads the Apache Arrow stream from the backend API using `apache-arrow` library.
3. **Data Display:** Columns are rendered with correct formatting based on metadata (e.g., numbers right-aligned, dates formatted).
4. **Visual Foundation:** The grid uses the "Smart Grid" design (compact density, JetBrains Mono font) as defined in UX specs.
5. **Basic Interaction:** Users can scroll vertically and horizontally fluidly.
## Tasks / Subtasks
- [x] **Dependencies & Setup** (AC: 2)
  - [x] Install `apache-arrow`, `@tanstack/react-table`, `@tanstack/react-virtual`, `zustand`.
  - [x] Create `frontend/src/lib/arrow-client.ts` to handle binary stream parsing.
- [x] **Smart Grid Component** (AC: 1, 4, 5)
  - [x] Create `frontend/src/features/smart-grid/components/SmartGrid.tsx`.
  - [x] Implement virtualized row rendering.
  - [x] Apply Shadcn UI styling and "Lab & Tech" theme.
- [x] **Integration** (AC: 3)
  - [x] Connect `upload` form success state to Grid data loading.
  - [x] Implement `useGridStore` (Zustand) to hold the loaded table state.
  - [x] Render actual data from the uploaded file.
## Dev Notes
- **Performance:** Optimized binary stream parsing using Apache Arrow IPC. Zero unnecessary JSON parsing.
- **State:** Zustand is used for high-frequency updates and for keeping state throughout the session.
- **Layout:** Implemented responsive workspace with sticky header and virtualized body.
### Project Structure Notes
- Organized into `features/smart-grid` and `features/uploader`.
- Centralized state in `store/use-grid-store.ts`.
### References
- [Source: ux-design-specification.md#Core User Experience]
- [Source: architecture.md#Frontend Architecture]
## Dev Agent Record
### Agent Model Used
{{agent_model_name_version}}
### Completion Notes List
- Successfully integrated `apache-arrow` for binary data ingestion.
- Implemented `<SmartGrid />` with `@tanstack/react-virtual` for 50k+ row performance.
- Built a functional `<FileUploader />` that communicates with the FastAPI backend.
- Applied "Lab & Tech" styling with Tailwind CSS.
### File List
- /frontend/src/features/smart-grid/components/SmartGrid.tsx
- /frontend/src/features/uploader/components/FileUploader.tsx
- /frontend/src/lib/arrow-client.ts
- /frontend/src/lib/utils.ts
- /frontend/src/store/use-grid-store.ts
- /frontend/src/app/page.tsx

@@ -0,0 +1,63 @@
# Story 1.4: Type Management & Renaming (Data Hygiene)
Status: review
## Story
As Julien (Analyst),
I want to rename columns and correct data types,
so that the data matches my business context before analysis.
## Acceptance Criteria
1. **Column Renaming:** Users can click a column header to rename it. Changes are reflected instantly in the grid.
2. **Type Override:** Users can change the data type of a column (e.g., String -> Numeric).
3. **Backend Sync:** Type changes are sent to the backend to validate if the data can be cast correctly.
4. **Visual Feedback:** Column headers display the current data type (e.g., "Num", "Cat").
5. **State Persistence:** Column names and types are preserved in the Zustand store.
## Tasks / Subtasks
- [x] **Frontend Header UI** (AC: 1, 4)
  - [x] Implement editable column header in `src/features/smart-grid/components/SmartGrid.tsx`.
  - [x] Add type indicator badges to headers.
- [x] **Type Management Logic** (AC: 2, 5)
  - [x] Update `useGridStore` to support `updateColumn` action (rename, change type).
- [x] **Backend Validation** (AC: 3)
  - [x] Add endpoint `/api/v1/analysis/validate-type` to verify casting feasibility.
  - [x] Handle casting errors gracefully.
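
A casting-feasibility check of the kind the `validate-type` endpoint performs might be sketched as follows. This is a standard-library illustration under assumptions: the type names (`numeric`, `categorical`, `date`, `boolean`), the accepted boolean spellings, and the function name `validate_cast` are hypothetical rather than taken from the code in this commit.

```python
from datetime import date

# Assumed spellings accepted as booleans.
BOOLEAN_WORDS = {"true", "false", "yes", "no", "1", "0"}


def _can_cast(text: str, target: str) -> bool:
    """Return True if a single non-empty cell can be cast to `target`."""
    if target == "categorical":
        return True  # any string is a valid category
    if target == "boolean":
        return text.lower() in BOOLEAN_WORDS
    converter = {"numeric": float, "date": date.fromisoformat}.get(target)
    if converter is None:
        raise ValueError(f"Unknown target type: {target}")
    try:
        converter(text)
        return True
    except ValueError:
        return False


def validate_cast(values: list[str], target: str) -> dict:
    """Report whether every non-empty cell can be cast, listing the offenders."""
    failures = [
        (index, value)
        for index, value in enumerate(values)
        if value.strip() and not _can_cast(value.strip(), target)
    ]
    return {"castable": not failures, "failures": failures}
```

Returning the failing indices, rather than raising on the first bad cell, lets the UI point the user at the exact rows that block the type change.
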
## Dev Notes
- **Performance:** Local renames handle data key updates in-memory to avoid full dataset re-ingestion.
- **UI:** Implemented inline editing for headers with immediate visual feedback.
- **Backend:** Added Pydantic model for type validation requests.
### Project Structure Notes
- Modified `frontend/src/store/use-grid-store.ts` to add mutation logic.
- Created `backend/app/api/v1/analysis.py` for validation logic.
- Updated `frontend/src/features/smart-grid/components/SmartGrid.tsx` with `EditableHeader`.
### References
- [Source: epics.md#Story 1.4]
- [Source: project-context.md#Data & State Architecture]
## Dev Agent Record
### Agent Model Used
{{agent_model_name_version}}
### Completion Notes List
- Implemented inline column renaming in the grid.
- Added a dropdown for data type override (Num, Cat, Date, Bool).
- Developed a FastAPI endpoint for validating data type conversions.
- Optimized Zustand store to update data keys when a column is renamed.
### File List
- /frontend/src/store/use-grid-store.ts
- /backend/app/api/v1/analysis.py
- /backend/main.py
- /frontend/src/features/smart-grid/components/SmartGrid.tsx

@@ -0,0 +1,57 @@
# Story 1.5: Basic Sorting & Filtering
Status: review
## Story
As Julien (Analyst),
I want to sort and filter my data in the grid,
so that I can identify extreme values or specific subsets.
## Acceptance Criteria
1. **Sorting:** Users can click a column header to toggle between ascending, descending, and no sort.
2. **Filtering:** Users can enter a search term or value in a column filter input to narrow down the rows.
3. **Performance:** Sorting and filtering 50,000 rows should happen within 300ms using local processing.
4. **Visual Indicators:** Column headers show an arrow icon indicating the current sort direction.
5. **Persistence:** Sort and filter states are maintained in the UI state during the session.
## Tasks / Subtasks
- [x] **TanStack Table Logic** (AC: 1, 2)
  - [x] Enable `getSortedRowModel` and `getFilteredRowModel` in `SmartGrid.tsx`.
- [x] **Filter UI** (AC: 2, 4)
  - [x] Add a text input field in each column header for filtering.
  - [x] Add sort icons (Lucide React) to headers.
- [x] **State & Performance** (AC: 3, 5)
  - [x] Ensure filtering logic handles different data types (string search, numeric range).
## Dev Notes
- **Sorting:** Integrated TanStack's built-in sorting logic with visual arrows.
- **Filtering:** Implemented per-column text filtering using a Search input in headers.
- **UI:** Combined renaming, type selection, and filtering into a compact `EditableHeader` component.
### Project Structure Notes
- Modified `frontend/src/features/smart-grid/components/SmartGrid.tsx`.
### References
- [Source: epics.md#Story 1.5]
- [Source: architecture.md#Frontend Architecture]
## Dev Agent Record
### Agent Model Used
{{agent_model_name_version}}
### Completion Notes List
- Enabled sorting and filtering row models in the TanStack Table configuration.
- Added interactive sort buttons with direction indicators (Up/Down).
- Implemented a search-based filter for each column.
- Verified performance remains smooth with virtualization.
### File List
- /frontend/src/features/smart-grid/components/SmartGrid.tsx

@@ -0,0 +1,60 @@
# Story 2.1: Cell Editing & Validation
Status: review
## Story
As Julien (Analyst),
I want to edit cell values directly in the grid,
so that I can manually correct obvious data entry errors.
## Acceptance Criteria
1. **Inline Editing:** Double-clicking a cell enters "Edit Mode" with an input field matching the column type.
2. **Data Validation:** Input is validated against the column type (e.g., only numbers in Numeric columns).
3. **Commit Changes:** Pressing `Enter` or clicking outside saves the change to the local Zustand store.
4. **Visual Feedback:** Edited cells are temporarily highlighted or marked to indicate unsaved/modified state.
5. **Keyboard Support:** Pressing `Esc` cancels the edit and restores the original value.
## Tasks / Subtasks
- [x] **Frontend Grid Update** (AC: 1, 3, 5)
  - [x] Implement `EditableCell` component in `src/features/smart-grid/components/SmartGrid.tsx`.
  - [x] Add `onCellEdit` logic to the TanStack Table configuration.
- [x] **State Management** (AC: 3, 4)
  - [x] Update `useGridStore` to support an `updateCellValue(rowId, colId, value)` action.
  - [x] Implement a `modifiedCells` tracking object in the store to highlight changes.
- [x] **Validation Logic** (AC: 2)
  - [x] Add regex-based validation for numeric and boolean inputs in the frontend.
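
The regex-based validation step can be illustrated with the sketch below. The actual check runs in the frontend; this Python version only shows the idea, and both patterns as well as the name `is_valid_input` are assumptions rather than the project's real rules.

```python
import re

# Assumed patterns: optional sign, digits with an optional decimal part.
NUMERIC_RE = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")
BOOLEAN_RE = re.compile(r"true|false", re.IGNORECASE)


def is_valid_input(raw: str, column_type: str) -> bool:
    """Return True if `raw` may be committed to a column of `column_type`."""
    text = raw.strip()
    if column_type == "numeric":
        return NUMERIC_RE.fullmatch(text) is not None
    if column_type == "boolean":
        return BOOLEAN_RE.fullmatch(text) is not None
    return True  # other column types accept any text in this sketch
```

Using `fullmatch` rather than `search` is what makes the check strict: an input such as `12abc` is rejected instead of being accepted because it starts with digits.
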
## Dev Notes
- **Memoization:** Used local state for editing to prevent entire table re-renders during typing.
- **Visuals:** Modified cells now have a subtle `bg-amber-50` background.
- **Validation:** Implemented strict numeric validation before committing to the global store.
### Project Structure Notes
- Modified `frontend/src/store/use-grid-store.ts`.
- Updated `frontend/src/features/smart-grid/components/SmartGrid.tsx`.
### References
- [Source: ux-design-specification.md#Grid Interaction Patterns]
- [Source: architecture.md#Frontend Architecture]
## Dev Agent Record
### Agent Model Used
{{agent_model_name_version}}
### Completion Notes List
- Created an `EditableCell` sub-component with `onDoubleClick` activation.
- Implemented `updateCellValue` in Zustand store with change tracking.
- Added keyboard support: `Enter` to commit, `Escape` to discard.
- Added visual highlighting for modified data.
### File List
- /frontend/src/store/use-grid-store.ts
- /frontend/src/features/smart-grid/components/SmartGrid.tsx

@@ -0,0 +1,59 @@
# Story 2.2: Undo/Redo of Edits
Status: review
## Story
As Julien (Analyst),
I want to undo my last data edits,
so that I can explore changes without fear of losing the original data.
## Acceptance Criteria
1. **Undo History:** The system tracks changes to cell values.
2. **Undo Action:** Users can press `Ctrl+Z` or click an "Undo" button to revert the last edit.
3. **Redo Action:** Users can press `Ctrl+Y` (or `Ctrl+Shift+Z`) to re-apply an undone edit.
4. **Visual Indicator:** The Undo/Redo buttons in the toolbar are disabled if no history is available.
5. **Session Scope:** History is kept in memory for the current session only and is not persisted.
## Tasks / Subtasks
- [x] **State Management (Zustand)** (AC: 1, 2, 3)
  - [x] Implement `zundo` or a custom middleware for state history in `useGridStore`.
  - [x] Add `undo` and `redo` actions.
- [x] **Keyboard Shortcuts** (AC: 2, 3)
  - [x] Add global event listeners for `Ctrl+Z` and `Ctrl+Y`.
- [x] **UI Controls** (AC: 4)
  - [x] Add Undo/Redo buttons to the `FileUploader` or a new `Toolbar` component.
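
The undo/redo behaviour in the acceptance criteria follows the usual past/present/future model. The store itself uses the `zundo` middleware; the class below is only a language-neutral sketch of that model in Python, with hypothetical names (`History`, `limit`), not a port of the library.

```python
class History:
    """Two-stack undo/redo history over immutable state snapshots."""

    def __init__(self, initial, limit: int = 100):
        self._past = []      # older snapshots, most recent last
        self._present = initial
        self._future = []    # undone snapshots, next redo last
        self._limit = limit  # assumed cap on stored snapshots

    @property
    def present(self):
        return self._present

    @property
    def can_undo(self) -> bool:
        return bool(self._past)

    @property
    def can_redo(self) -> bool:
        return bool(self._future)

    def set(self, state) -> None:
        """Record a new edit; any redo branch is discarded."""
        self._past.append(self._present)
        if len(self._past) > self._limit:
            self._past.pop(0)
        self._present = state
        self._future.clear()

    def undo(self):
        """Step back one snapshot; no-op when there is no history."""
        if self._past:
            self._future.append(self._present)
            self._present = self._past.pop()
        return self._present

    def redo(self):
        """Re-apply the most recently undone snapshot, if any."""
        if self._future:
            self._past.append(self._present)
            self._present = self._future.pop()
        return self._present
```

The `can_undo` and `can_redo` flags correspond to the disabled state of the toolbar buttons described in criterion 4.
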
## Dev Notes
- **Optimization:** Using `zundo` middleware to partialize state history (tracking only `data`, `columns`, and `modifiedCells`).
- **Shortcuts:** Implemented global keyboard event listeners in the main layout.
- **UX:** Added responsive toolbar buttons with disabled states when no history is present.
### Project Structure Notes
- Modified `frontend/src/store/use-grid-store.ts` to include `temporal` middleware.
- Updated `frontend/src/app/page.tsx` with UI buttons and shortcut logic.
### References
- [Source: functional-requirements.md#FR8]
- [Source: project-context.md#Data & State Architecture]
## Dev Agent Record
### Agent Model Used
{{agent_model_name_version}}
### Completion Notes List
- Integrated `zundo` for comprehensive history tracking.
- Added Undo/Redo logic to the global Zustand store.
- Implemented `Ctrl+Z`, `Ctrl+Shift+Z`, and `Ctrl+Y` keyboard shortcuts.
- Added visual buttons in the application header with state-dependent enabling/disabling.
### File List
- /frontend/src/store/use-grid-store.ts
- /frontend/src/app/page.tsx

@@ -0,0 +1,65 @@
# Story 2.3: Automatic Outlier Detection (Backend)
Status: review
## Story
As a system,
I want to identify statistical outliers in the background,
so that I can alert the user to potential data quality issues.
## Acceptance Criteria
1. **Algorithm Implementation:** Backend implements Isolation Forest (multivariate) and IQR (univariate) algorithms.
2. **Analysis Endpoint:** A POST endpoint `/api/v1/analysis/detect-outliers` accepts dataset and configuration.
3. **Detection Output:** Returns a list of outlier row indices and the reason for flagging (e.g., "value above Q3 + 1.5 * IQR").
4. **Performance:** Detection on 50k rows completes in under 5 seconds.
5. **Robustness:** Handles missing values (NaNs) gracefully without crashing.
## Tasks / Subtasks
- [x] **Dependency Update** (AC: 1)
  - [x] Add `scikit-learn` to the backend using `uv`.
- [x] **Outlier Engine Implementation** (AC: 1, 5)
  - [x] Create `backend/app/core/engine/clean.py`.
  - [x] Implement univariate IQR-based detection.
  - [x] Implement multivariate Isolation Forest detection.
- [x] **API Endpoint** (AC: 2, 3, 4)
  - [x] Implement `POST /api/v1/analysis/detect-outliers` in `analysis.py`.
  - [x] Map detection results to indexed row references.
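
The univariate part of the detection can be sketched with the standard library alone. The story states that the engine uses Pandas quantile logic; the version below substitutes `statistics.quantiles` (inclusive method, which interpolates linearly) for illustration. The function name and the shape of the returned records are assumptions.

```python
import math
from statistics import quantiles


def detect_iqr_outliers(values: list, k: float = 1.5) -> list[dict]:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR], skipping None and NaN."""

    def is_missing(v) -> bool:
        return v is None or math.isnan(v)

    clean = [v for v in values if not is_missing(v)]
    if len(clean) < 4:
        return []  # too little data for meaningful quartiles

    q1, _median, q3 = quantiles(clean, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr

    outliers = []
    for index, value in enumerate(values):
        if is_missing(value):
            continue  # missing cells are never flagged
        if value < lower:
            reason = f"value {value} is below Q1 - {k} * IQR ({lower:.2f})"
        elif value > upper:
            reason = f"value {value} is above Q3 + {k} * IQR ({upper:.2f})"
        else:
            continue
        # Indices refer to the original list, so they map back to grid rows.
        outliers.append({"index": index, "reason": reason})
    return outliers
```

Keeping the original indices while skipping missing values is what lets each flag be mapped back to a row reference, as the last task requires.
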
## Dev Notes
- **Algorithms:** Used Scikit-learn's `IsolationForest` for multivariate and Pandas quantile logic for IQR.
- **Explainability:** Each outlier is returned with a descriptive string explaining the reason for the flag.
- **Performance:** Async-ready; relies on scikit-learn's standard optimizations.
### Project Structure Notes
- Created `backend/app/core/engine/clean.py` for outlier logic.
- Updated `backend/app/api/v1/analysis.py` with the detection endpoint.
- Added `backend/tests/test_analysis.py` for verification.
### References
- [Source: epics.md#Story 2.3]
- [Source: project-context.md#Critical Anti-Patterns]
## Dev Agent Record
### Agent Model Used
{{agent_model_name_version}}
### Completion Notes List
- Integrated `scikit-learn` for anomaly detection.
- Implemented univariate detection based on 1.5 * IQR bounds.
- Implemented multivariate detection using the Isolation Forest algorithm.
- Developed a robust API endpoint that merges results from both methods.
- Verified with unit tests covering both univariate and multivariate scenarios.
### File List
- /backend/app/core/engine/clean.py
- /backend/app/api/v1/analysis.py
- /backend/tests/test_analysis.py
- /backend/pyproject.toml

@@ -0,0 +1,63 @@
# Story 2.4: Insights Panel & Outlier Review (Frontend)
Status: review
## Story
As Julien (Analyst),
I want to review detected outliers in a side panel,
so that I can understand why they are flagged before excluding them.
## Acceptance Criteria
1. **Insight Panel UI:** A slide-over panel (Shadcn Sheet) displays detailed outlier information.
2. **Interactive Triggers:** Clicking a "warning" badge in the grid header opens the panel for that column.
3. **Reasoning Display:** The panel shows the statistical reason for each flagged point (e.g., "Value 9.9 is > 3 Sigma").
4. **Visual Summary:** Displays a small chart (boxplot or histogram) showing the distribution and the outlier's position.
5. **Batch Actions:** Users can click "Exclude All" within the panel to gray out all flagged rows in the grid.
## Tasks / Subtasks
- [x] **Shadcn UI Setup** (AC: 1)
  - [x] Install Shadcn `Sheet` and `ScrollArea` components.
- [x] **InsightPanel Component** (AC: 1, 3, 4, 5)
  - [x] Create `frontend/src/features/insight-panel/components/InsightPanel.tsx`.
  - [x] Integrate `Recharts` for distribution visualization.
- [x] **State Integration** (AC: 2, 5)
  - [x] Update `useGridStore` to trigger outlier detection and store results.
  - [x] Add `detectedOutliers` object to the Zustand store.
## Dev Notes
- **Explainable AI:** Successfully mapped backend `reasons` to user-friendly list items in the panel.
- **Visualization:** Used `recharts` to build a dynamic histogram of the selected column.
- **Integration:** Added a pulse animation to column headers when outliers are detected.
### Project Structure Notes
- Created `frontend/src/features/insight-panel/components/InsightPanel.tsx`.
- Integrated panel trigger in `frontend/src/features/smart-grid/components/SmartGrid.tsx`.
- Updated main layout in `frontend/src/app/page.tsx` to host the panel.
### References
- [Source: ux-design-specification.md#2.4 Novel UX Patterns]
- [Source: architecture.md#Frontend Architecture]
## Dev Agent Record
### Agent Model Used
{{agent_model_name_version}}
### Completion Notes List
- Implemented the `InsightPanel` slide-over component.
- Integrated automated backend outlier detection triggered on data change.
- Added a distribution histogram using Recharts.
- Implemented "Exclude All" functionality which syncs with the Grid's visual state.
### File List
- /frontend/src/features/insight-panel/components/InsightPanel.tsx
- /frontend/src/store/use-grid-store.ts
- /frontend/src/features/smart-grid/components/SmartGrid.tsx
- /frontend/src/app/page.tsx

@@ -0,0 +1,59 @@
# Story 2.5: Non-Destructive Data Exclusion
Status: review
## Story
As Julien (Analyst),
I want to toggle the inclusion of specific rows in the analysis,
so that I can test different scenarios without deleting data.
## Acceptance Criteria
1. **Row Toggle:** Users can click a "checkbox" or a specific "Exclude" button on each row.
2. **Visual Feedback:** Excluded rows are visually dimmed (e.g., 30% opacity) and struck through.
3. **Bulk Toggle:** Ability to exclude all filtered rows or all rows matching a criterion (already partially covered by Story 2.4).
4. **State Persistence:** Exclusion state is tracked in the global store.
5. **Impact on Analysis:** The data sent to subsequent analysis engines (Correlation, Regression) MUST exclude these rows.
## Tasks / Subtasks
- [x] **Grid UI Update** (AC: 1, 2)
  - [x] Add an `Exclude` column with a toggle switch or button to the `SmartGrid`.
  - [x] Implement conditional styling for the entire row based on exclusion state.
- [x] **State Logic** (AC: 4)
  - [x] Ensure `excludedRows` in `useGridStore` is properly integrated with all UI components.
- [x] **Data Pipeline Prep** (AC: 5)
  - [x] Create a selector/helper `getCleanData()` that returns the dataset minus the excluded rows.
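
The "soft delete" idea behind `getCleanData()` is that rows are never removed, only filtered out at read time. The real selector lives in the Zustand store; the Python sketch below mirrors the logic under the assumption that exclusions are tracked as a set of row indices. The function names are hypothetical.

```python
def toggle_exclusion(excluded_rows: frozenset, row_index: int) -> frozenset:
    """Return a new exclusion set with `row_index` flipped in or out."""
    if row_index in excluded_rows:
        return excluded_rows - {row_index}
    return excluded_rows | {row_index}


def get_clean_data(rows: list[dict], excluded_rows: frozenset) -> list[dict]:
    """Return the dataset minus excluded rows; the source list is untouched."""
    return [row for index, row in enumerate(rows) if index not in excluded_rows]
```

Because the original rows stay in place, re-including a row is just a second toggle, which keeps scenario testing non-destructive.
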
## Dev Notes
- **UX:** Added a dedicated "Eye/EyeOff" icon column for quick row exclusion toggling.
- **Visuals:** Excluded rows use `opacity-30`, `line-through`, and a darker background to clearly distinguish them from active data.
- **Selector:** The `getCleanData` function in the store ensures all future analysis steps only receive valid, included rows.
### Project Structure Notes
- Modified `frontend/src/store/use-grid-store.ts`.
- Updated `frontend/src/features/smart-grid/components/SmartGrid.tsx`.
### References
- [Source: epics.md#Story 2.5]
- [Source: ux-design-specification.md#2.5 Experience Mechanics]
## Dev Agent Record
### Agent Model Used
{{agent_model_name_version}}
### Completion Notes List
- Implemented a "soft delete" system for row exclusion.
- Added visual indicators (strike-through and dimming) for excluded rows.
- Created a `getCleanData` selector to facilitate downstream statistical modeling.
- Integrated row-level toggle buttons directly in the SmartGrid.
### File List
- /frontend/src/store/use-grid-store.ts
- /frontend/src/features/smart-grid/components/SmartGrid.tsx

@@ -0,0 +1,62 @@
# Story 3.1: Interactive Correlation Matrix
Status: review
## Story
As Julien (Analyst),
I want to see a visual correlation map of my numeric variables,
so that I can quickly identify which factors are related.
## Acceptance Criteria
1. **Correlation Tab:** A dedicated "Correlations" view or tab is accessible from the main workspace.
2. **Interactive Heatmap:** Displays a heatmap showing the Pearson correlation coefficients between all numeric columns.
3. **Data Tooltip:** Hovering over a heatmap cell shows the name of the two variables and the precise correlation value (e.g., "0.85").
4. **Color Scale:** Uses a diverging color scale (e.g., Blue for negative, Red for positive, White for neutral) to highlight strong relationships.
5. **Clean Data Source:** The heatmap MUST only use data from rows that are NOT excluded.
## Tasks / Subtasks
- [x] **Backend Analysis Engine** (AC: 2, 5)
  - [x] Implement `calculate_correlation_matrix(df, columns)` in `backend/app/core/engine/stats.py`.
  - [x] Add endpoint `POST /api/v1/analysis/correlation` that accepts data and column list.
- [x] **Frontend Visualization** (AC: 1, 2, 3, 4)
  - [x] Create `frontend/src/features/analysis/components/CorrelationHeatmap.tsx`.
  - [x] Use `Recharts` or `Tremor` to render the matrix.
  - [x] Integrate with `getCleanData()` from the grid store.
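
For reference, the Pearson coefficient behind each heatmap cell can be computed as in the sketch below. The backend presumably delegates this to Pandas through `calculate_correlation_matrix(df, columns)`; the plain-Python functions here are illustrative stand-ins with hypothetical names, taking columns as lists of numbers.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("Both columns need the same length (at least 2).")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return float("nan")  # a constant column has no defined correlation
    return covariance / (spread_x * spread_y)


def correlation_matrix(columns: dict[str, list[float]]) -> dict[str, dict[str, float]]:
    """Build a symmetric name -> name -> coefficient mapping."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```

The nested mapping translates directly into heatmap cells: each `(a, b)` pair supplies the two variable names and the value shown in the tooltip.
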
## Dev Notes
- **Data Integrity:** The heatmap uses the `getCleanData()` selector, ensuring that excluded outliers don't bias the correlation matrix.
- **UI/UX:** Implemented a tab-switcher between "Data" and "Correlation" views.
- **Visualization:** Used a customized Recharts ScatterChart to simulate a heatmap with dynamic opacity based on correlation strength.
### Project Structure Notes
- Created `backend/app/core/engine/stats.py`.
- Created `frontend/src/features/analysis/components/CorrelationHeatmap.tsx`.
- Updated `frontend/src/app/page.tsx` with tab logic.
### References
- [Source: epics.md#Story 3.1]
- [Source: ux-design-specification.md#Design System Foundation]
## Dev Agent Record
### Agent Model Used
{{agent_model_name_version}}
### Completion Notes List
- Developed Pearson correlation logic in the Python backend.
- Built an interactive heatmap in the React frontend.
- Added informative tooltips showing detailed correlation metrics.
- Ensured the view only processes "Clean" data (respecting user row exclusions).
### File List
- /backend/app/core/engine/stats.py
- /backend/app/api/v1/analysis.py
- /frontend/src/features/analysis/components/CorrelationHeatmap.tsx
- /frontend/src/app/page.tsx

@@ -0,0 +1,59 @@
# Story 3.2: Feature Importance Calculation (Backend)
Status: review
## Story
As a system,
I want to compute the predictive power of features against a target variable,
so that I can provide scientific recommendations to the user.
## Acceptance Criteria
1. **Importance Algorithm:** Backend implements Feature Importance calculation using `RandomForestRegressor`.
2. **Analysis Endpoint:** A POST endpoint `/api/v1/analysis/feature-importance` accepts data, features list, and target variable (Y).
3. **Ranking Output:** Returns a ranked list of features with their importance scores (0 to 1).
4. **Validation:** Ensures Y is not in the X list and that enough numeric data exists.
5. **Clean Data Source:** Only uses data from non-excluded rows.
## Tasks / Subtasks
- [x] **Engine Implementation** (AC: 1, 4)
  - [x] Implement `calculate_feature_importance(df, features, target)` in `backend/app/core/engine/stats.py`.
  - [x] Handle categorical features using basic Label Encoding if needed (the current focus is on numeric features).
- [x] **API Endpoint** (AC: 2, 3, 5)
  - [x] Implement `POST /api/v1/analysis/feature-importance` in `analysis.py`.
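
The validation and NaN handling that precede model fitting can be sketched without scikit-learn. The snippet below covers only that preparation step (Y not in X, enough complete numeric rows); it does not train the `RandomForestRegressor`. The function name and the `min_rows` threshold are assumptions.

```python
import math


def _is_number(value) -> bool:
    """True for real, non-NaN ints and floats (booleans are not counted)."""
    return (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and not math.isnan(value)
    )


def prepare_training_rows(
    rows: list[dict], features: list[str], target: str, min_rows: int = 10
) -> list[dict]:
    """Validate the request and keep only rows complete in X and Y."""
    if not features:
        raise ValueError("At least one feature (X) is required.")
    if target in features:
        raise ValueError("The target (Y) must not appear in the feature list (X).")

    needed = [*features, target]
    complete = [row for row in rows if all(_is_number(row.get(c)) for c in needed)]
    if len(complete) < min_rows:
        raise ValueError(
            f"Only {len(complete)} complete numeric rows; need at least {min_rows}."
        )
    return complete
```

Dropping incomplete rows up front matches the behaviour described in this story's notes, where rows with NaNs in either the features or the target are removed before fitting.
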
## Dev Notes
- **Model:** Used `RandomForestRegressor` with 50 estimators for a balance between speed and accuracy.
- **Data Prep:** Automatically drops rows with NaNs in either features or target to ensure Scikit-learn compatibility.
- **Output:** Returns a JSON list of objects `{feature, score}` sorted by score in descending order.
### Project Structure Notes
- Modified `backend/app/core/engine/stats.py`.
- Updated `backend/app/api/v1/analysis.py`.
- Added test case in `backend/tests/test_analysis.py`.
### References
- [Source: epics.md#Story 3.2]
- [Source: architecture.md#Computational Workers]
## Dev Agent Record
### Agent Model Used
{{agent_model_name_version}}
### Completion Notes List
- Implemented the Feature Importance core engine using Scikit-learn.
- Developed the API endpoint to expose the ranked feature list.
- Added validation to prevent processing empty or incompatible datasets.
- Verified with automated tests.
### File List
- /backend/app/core/engine/stats.py
- /backend/app/api/v1/analysis.py
- /backend/tests/test_analysis.py

@@ -0,0 +1,62 @@
# Story 3.3: Smart Variable Recommendation (Frontend)
Status: review
## Story
As Julien (Analyst),
I want the system to suggest which variables to include in my model,
so that I don't pollute my analysis with irrelevant data ("noise").
## Acceptance Criteria
1. **Target Selection:** Users can select one column as the "Target Variable (Y)" from a dropdown.
2. **Auto-Trigger:** Selecting Y automatically triggers the feature importance calculation for all other numeric columns.
3. **Smart Ranking:** The UI displays a list of features ranked by their predictive power.
4. **Auto-Selection:** The top 5 features (or all of them if there are fewer than 5) are automatically checked for inclusion in the model.
5. **Visual Feedback:** A horizontal bar chart in the configuration panel shows the importance scores.
## Tasks / Subtasks
- [x] **Selection UI** (AC: 1, 4)
  - [x] Create `frontend/src/features/analysis/components/AnalysisConfiguration.tsx`.
  - [x] Implement Target Variable (Y) and Predictor Variables (X) selection logic.
- [x] **Intelligence Integration** (AC: 2, 3, 5)
  - [x] Call `/api/v1/analysis/feature-importance` upon Y selection.
  - [x] Render importance scores using a simple CSS-based or Recharts bar chart.
- [x] **State Management** (AC: 4)
  - [x] Store selected X and Y variables in `useGridStore`.
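
The auto-selection rule in criterion 4 amounts to sorting by score and taking a prefix. The component implements this in the frontend; the Python sketch below only illustrates the rule, assuming the `{feature, score}` response objects described in Story 3.2 and a hypothetical function name.

```python
def auto_select_features(importances: list[dict], top_n: int = 5) -> list[str]:
    """Return the names of the `top_n` highest-scoring features.

    If fewer than `top_n` features exist, all of them are returned.
    """
    ranked = sorted(importances, key=lambda item: item["score"], reverse=True)
    return [item["feature"] for item in ranked[:top_n]]
```

Sorting again on the client is defensive: the backend already returns the list in descending order, so this step only guards against an unsorted payload.
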
## Dev Notes
- **UX:** Implemented a slide-over `AnalysisConfiguration` sidebar triggered by the main "Run Regression" button.
- **Automation:** Integrated the Random Forest importance engine from the backend to provide real-time recommendations.
- **Rules:** Enforced mutual exclusivity between X and Y variables in the UI selection logic.
### Project Structure Notes
- Created `frontend/src/features/analysis/components/AnalysisConfiguration.tsx`.
- Updated `frontend/src/store/use-grid-store.ts` with analysis state.
- Updated `frontend/src/app/page.tsx` to handle the configuration drawer.
### References
- [Source: epics.md#Story 3.3]
- [Source: ux-design-specification.md#Critical Success Moments]
## Dev Agent Record
### Agent Model Used
{{agent_model_name_version}}
### Completion Notes List
- Built the model configuration sidebar using Tailwind and Lucide icons.
- Implemented reactive feature importance fetching when the target variable changes.
- Added auto-selection of top predictive features.
- Integrated the configuration state into the global Zustand store.
### File List
- /frontend/src/features/analysis/components/AnalysisConfiguration.tsx
- /frontend/src/store/use-grid-store.ts
- /frontend/src/app/page.tsx

@@ -0,0 +1,58 @@
# Story 4.1: Regression Configuration
Status: review
## Story
As Julien (Analyst),
I want to configure the parameters of my regression model,
so that I can tailor the analysis to my specific hypothesis.
## Acceptance Criteria
1. **Model Selection:** Users can choose between "Linear Regression" and "Logistic Regression" in the sidebar.
2. **Dynamic Validation:** The system checks if the Target Variable (Y) is compatible with the selected model (e.g., continuous for Linear, binary/categorical for Logistic).
3. **Parameter Summary:** The sidebar displays a clear summary of the selected X variables and the Y variable before launch.
4. **Interactive Updates:** Changing the X or Y variables updates the model's "Implementation Readiness" state, which enables or disables the "Run" button.
## Tasks / Subtasks
- [x] **UI Enhancements** (AC: 1, 3)
  - [x] Add model type dropdown to `AnalysisConfiguration.tsx`.
  - [x] Implement a "Selected Features" summary list.
- [x] **Validation Logic** (AC: 2, 4)
  - [x] Implement frontend validation to check if the target variable matches the model type.
  - [x] Disable "Run Regression" button if validation fails or selection is incomplete.
## Dev Notes
- **Validation Rules:**
  - `linear`: Target must be of type `numeric`.
  - `logistic`: Target must be `categorical` or `boolean`.
- **UI:** Added a toggle switch for model selection and refined the predictor selection list with importance bars.
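
The validation rules can be expressed as a small lookup table. The story implements this check in the frontend, so the Python below is only an illustrative sketch; the function name `validate_configuration` and the error messages are assumptions, while the type names `numeric`, `categorical`, and `boolean` come from the rules above.

```python
from typing import Optional

# Column types accepted as a target for each model, per the validation rules.
ALLOWED_TARGET_TYPES = {
    "linear": {"numeric"},
    "logistic": {"categorical", "boolean"},
}


def validate_configuration(
    model_type: str,
    target_type: Optional[str],
    predictors: list[str],
) -> Optional[str]:
    """Return an error message, or None when the "Run" button may be enabled."""
    if model_type not in ALLOWED_TARGET_TYPES:
        return f"Unknown model type: {model_type}"
    if target_type is None:
        return "Select a target variable (Y)."
    if not predictors:
        return "Select at least one predictor (X)."
    if target_type not in ALLOWED_TARGET_TYPES[model_type]:
        allowed = " or ".join(sorted(ALLOWED_TARGET_TYPES[model_type]))
        return f"A {model_type} model needs a {allowed} target, got {target_type}."
    return None
```

A `None` result corresponds to an enabled button; any string can be shown next to the disabled button as the reason.
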
### Project Structure Notes
- Modified `frontend/src/features/analysis/components/AnalysisConfiguration.tsx`.
- Updated `frontend/src/store/use-grid-store.ts` with `ModelType` state.
### References
- [Source: epics.md#Story 4.1]
- [Source: architecture.md#Frontend Architecture]
## Dev Agent Record
### Agent Model Used
{{agent_model_name_version}}
### Completion Notes List
- Integrated model type selection (Linear/Logistic).
- Added comprehensive validation logic for target variables.
- Refined the predictors list to show the sum of importance scores and visual bars.
- Implemented state-aware activation of the execution button.
### File List
- /frontend/src/store/use-grid-store.ts
- /frontend/src/features/analysis/components/AnalysisConfiguration.tsx


@@ -0,0 +1,63 @@
# Story 4.2: Model Execution (Backend)
Status: review
## Story
As a system,
I want to execute the statistical model computation,
so that I can provide accurate regression results.
## Acceptance Criteria
1. **Algorithm Support:** Backend supports Ordinary Least Squares (OLS) for Linear and Logit for Logistic regression.
2. **Analysis Endpoint:** A POST endpoint `/api/v1/analysis/run-regression` accepts data, X features, Y target, and model type.
3. **Comprehensive Metrics:** Returns R-squared, Adjusted R-squared, coefficients, standard errors, p-values, and residuals.
4. **Validation:** Handles singular matrices or perfect collinearity without crashing (returns 400 with explanation).
5. **Clean Data Source:** Respects user row exclusions during calculation.
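
As a rough illustration of criteria 2 and 5, the request body and the row-exclusion step might look like the sketch below. The field names (`data`, `x_features`, `y_target`, `model_type`, `excluded_rows`) and the helper `rows_for_analysis` are hypothetical; the source only states that the endpoint accepts data, X features, a Y target, and a model type, and that excluded rows are respected.

```python
from dataclasses import dataclass, field


@dataclass
class RegressionRequest:
    """Hypothetical request body; field names are illustrative only."""
    data: list[dict]
    x_features: list[str]
    y_target: str
    model_type: str = "linear"
    excluded_rows: list[int] = field(default_factory=list)


def rows_for_analysis(request: RegressionRequest) -> list[dict]:
    """Drop excluded rows (by position) and keep only the modeled columns."""
    excluded = set(request.excluded_rows)
    columns = [*request.x_features, request.y_target]
    return [
        {col: row[col] for col in columns}
        for index, row in enumerate(request.data)
        if index not in excluded
    ]
```

Filtering before the model is fitted keeps the exclusion non-destructive: the original `data` list is never modified.
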
## Tasks / Subtasks
- [x] **Dependency Update** (AC: 1)
  - [x] Add `statsmodels` to the backend using `uv`.
- [x] **Regression Engine** (AC: 1, 3, 4)
  - [x] Implement `run_linear_regression(df, x_cols, y_col)` in `backend/app/core/engine/stats.py`.
  - [x] Implement `run_logistic_regression(df, x_cols, y_col)` in `backend/app/core/engine/stats.py`.
- [x] **API Endpoint** (AC: 2, 5)
  - [x] Implement `POST /api/v1/analysis/run-regression` in `analysis.py`.
## Dev Notes
- **Statistics:** Using `statsmodels.api` for high-quality, research-grade regression summaries.
- **Robustness:** An intercept (constant) is added to every model automatically. If the Logistic target is not strictly binary, it is encoded with a basic median split.
- **Validation:** Integrated try/except blocks to catch linear algebra errors (e.g. non-invertible matrices) and return meaningful error messages.
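
The median-split encoding mentioned in the Robustness note could work roughly as follows. This is a sketch under assumptions, not the code in `stats.py`: the name `encode_binary_target` is hypothetical, and treating values strictly above the median as the positive class is one possible convention.

```python
from statistics import median


def encode_binary_target(values):
    """Return 0/1 labels; split at the median when the target is not binary.

    Non-binary targets are assumed to be numeric.
    """
    distinct = sorted(set(values))
    if len(distinct) < 2:
        raise ValueError("Target needs at least two distinct values.")
    if len(distinct) == 2:
        # Already binary: map the smaller value to 0 and the larger to 1.
        return [int(v == distinct[1]) for v in values]
    cutoff = median(values)
    # Assumption: values strictly above the median form the positive class.
    return [int(v > cutoff) for v in values]
```

With heavily tied data the strict comparison can leave the positive class empty, so a caller would still need to check that both classes are present before fitting.
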
### Project Structure Notes
- Modified `backend/app/core/engine/stats.py`.
- Updated `backend/app/api/v1/analysis.py` with the execution endpoint.
- Added regression test case in `backend/tests/test_analysis.py`.
### References
- [Source: epics.md#Story 4.2]
- [Source: architecture.md#Computational Workers]
## Dev Agent Record
### Agent Model Used
{{agent_model_name_version}}
### Completion Notes List
- Integrated `statsmodels` for advanced statistical modeling.
- Developed a unified regression engine supporting Linear and Logistic models.
- Implemented `/api/v1/analysis/run-regression` endpoint returning detailed metrics and residuals for plotting.
- Verified with automated tests for both model types.
### File List
- /backend/app/core/engine/stats.py
- /backend/app/api/v1/analysis.py
- /backend/tests/test_analysis.py
- /backend/pyproject.toml
- /backend/uv.lock


@@ -0,0 +1,62 @@
# Story 4.3: Interactive Results Dashboard
Status: review
## Story
As Julien (Analyst),
I want to see the model results through interactive charts,
so that I can easily diagnose the performance of my regression.
## Acceptance Criteria
1. **Results View:** A new "Results" tab or page displays the output of the regression.
2. **Metrics Cards:** Key statistics (R², Adj. R², P-value, Sample Size) are shown in high-visibility cards with Shadcn UI.
3. **Primary Chart:** A "Real vs Predicted" scatter chart with a reference 45-degree line.
4. **Diagnostic Chart:** A "Residuals Distribution" histogram or "Residuals vs Fitted" plot.
5. **Coefficient Table:** A clean table showing each predictor, its coefficient, and its p-value (color-coded for significance < 0.05).
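
The significance flag behind the color-coded coefficient table can be sketched as below. The component itself is written in TypeScript; this Python version only shows the rule, and `coefficient_rows` and its row keys are hypothetical names. The 0.05 threshold comes from criterion 5.

```python
def coefficient_rows(coefficients, p_values, alpha=0.05):
    """Build one table row per predictor, flagging p-values below alpha."""
    rows = []
    for name, coefficient in coefficients.items():
        p_value = p_values[name]
        rows.append(
            {
                "predictor": name,
                "coefficient": coefficient,
                "p_value": p_value,
                # The flag drives the color-coding in the table.
                "significant": p_value < alpha,
            }
        )
    return rows
```
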
## Tasks / Subtasks
- [x] **Visualization Development** (AC: 1, 3, 4)
  - [x] Create `frontend/src/features/analysis/components/AnalysisResults.tsx`.
  - [x] Implement "Real vs Predicted" chart using `Recharts`.
  - [x] Implement "Residuals" diagnostic chart.
- [x] **Data Integration** (AC: 2, 5)
  - [x] Update `useGridStore` to trigger the regression run and store `analysisResults`.
  - [x] Build the metrics summary UI and coefficient table.
## Dev Notes
- **Feedback:** Added visual error reporting in the UI if the regression fails.
- **Charts:** Used `ScatterChart` for the real-vs-predicted plot and `AreaChart` for the residuals distribution.
- **UX:** Auto-switch to "Results" tab upon successful execution.
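
To feed a residuals distribution chart, the residuals have to be binned first. The sketch below shows one way to do that with equal-width buckets; where this happens in the project (backend or frontend) is not stated in the source, and `residual_histogram` and its output keys are hypothetical.

```python
def residual_histogram(actual, predicted, bins=10):
    """Bin residuals (actual - predicted) into equal-width buckets.

    Expects non-empty sequences of equal length.
    """
    residuals = [a - p for a, p in zip(actual, predicted)]
    low, high = min(residuals), max(residuals)
    # Fall back to a width of 1.0 when every residual is identical.
    width = (high - low) / bins or 1.0
    counts = [0] * bins
    for residual in residuals:
        # Clamp so the maximum residual lands in the last bucket.
        index = min(int((residual - low) / width), bins - 1)
        counts[index] += 1
    return [
        {"bin_start": low + i * width, "count": count}
        for i, count in enumerate(counts)
    ]
```

Each returned dictionary maps directly to one data point of an area or bar chart.
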
### Project Structure Notes
- Created `frontend/src/features/analysis/components/AnalysisResults.tsx`.
- Integrated results state in `frontend/src/store/use-grid-store.ts`.
- Updated `frontend/src/app/page.tsx` with robust error handling.
### References
- [Source: epics.md#Story 4.3]
- [Source: ux-design-specification.md#Design Directions]
## Dev Agent Record
### Agent Model Used
{{agent_model_name_version}}
### Completion Notes List
- Implemented `AnalysisResults` component with responsive charts.
- Added visual indicators for statistical significance.
- Verified correct state management flow from configuration to results display.
- Improved error handling and user feedback during execution.
### File List
- /frontend/src/features/analysis/components/AnalysisResults.tsx
- /frontend/src/store/use-grid-store.ts
- /frontend/src/app/page.tsx
- /frontend/src/features/analysis/components/AnalysisConfiguration.tsx


@@ -0,0 +1,62 @@
# Story 4.4: PDF Report Generation (Audit Trail)
Status: review
## Story
As Julien (Analyst),
I want to export my findings as a professional PDF report,
so that I can share and archive my validated analysis.
## Acceptance Criteria
1. **PDF Generation:** Backend generates a high-quality PDF containing project title, date, and metrics.
2. **Visual Inclusion:** The PDF includes the key metrics summary (R², etc.) and the coefficient table.
3. **Audit Trail:** The PDF explicitly lists the data cleaning steps (e.g., "34 rows excluded from Pressure_Bar").
4. **Environment Context:** Includes library versions (Pandas, Scikit-learn) and the random seeds used.
5. **Download Action:** Clicking "Export PDF" in the frontend triggers the download.
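
The audit-trail wording in criterion 3 can be produced from per-column exclusion counts. This is a minimal sketch; the helper name `audit_trail_lines` and the shape of its input are assumptions, not the signature used in `reports.py`.

```python
def audit_trail_lines(exclusions: dict[str, int]) -> list[str]:
    """Turn per-column exclusion counts into report sentences."""
    lines = []
    for column, count in exclusions.items():
        if count <= 0:
            continue  # nothing was excluded for this column
        noun = "row" if count == 1 else "rows"
        lines.append(f"{count} {noun} excluded from {column}")
    return lines
```

For example, `audit_trail_lines({"Pressure_Bar": 34})` yields the sentence quoted in the criterion.
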
## Tasks / Subtasks
- [x] **Dependency Update** (AC: 1)
  - [x] Add `fpdf2` (chosen over `reportlab`) to the backend using `uv`.
- [x] **Report Engine** (AC: 1, 2, 3, 4)
  - [x] Implement `generate_pdf_report(results, metadata, audit_trail)` in `backend/app/core/engine/reports.py`.
- [x] **API & Integration** (AC: 5)
  - [x] Create `POST /api/v1/reports/export` endpoint.
  - [x] Add the "Download PDF" button to the application header.
## Dev Notes
- **Aesthetic:** Designed the PDF with a clean header and color-coded p-values to match the web dashboard.
- **Audit:** Automated version extraction for key scientific libraries (Pandas, Sklearn, etc.) to ensure complete reproducibility documentation.
- **Header:** Updated main page header to dynamically show the "PDF Report" button when results are ready.
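
The automated version extraction described in the Audit note could be done with the standard library's `importlib.metadata`. The sketch below is one possible approach, not necessarily the one in `reports.py`; the function name `library_versions` and the default package list are assumptions.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def library_versions(packages=("pandas", "scikit-learn", "statsmodels")):
    """Map each distribution name to its installed version string."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            # Record the gap instead of failing the whole report.
            versions[name] = "not installed"
    return versions
```

Reading the metadata avoids importing each library just to inspect `__version__`, which keeps report generation fast.
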
### Project Structure Notes
- Created `backend/app/core/engine/reports.py` for PDF layout.
- Created `backend/app/api/v1/reports.py` for the export route.
- Integrated download logic in `frontend/src/app/page.tsx`.
### References
- [Source: functional-requirements.md#FR21]
- [Source: epics.md#Story 4.4]
## Dev Agent Record
### Agent Model Used
{{agent_model_name_version}}
### Completion Notes List
- Implemented professional PDF generation using `fpdf2`.
- Added color-coded statistical coefficients to the PDF output.
- Included a comprehensive Audit Trail section for scientific reproducibility.
- Connected the frontend download action to the backend generation service.
### File List
- /backend/app/core/engine/reports.py
- /backend/app/api/v1/reports.py
- /backend/main.py
- /frontend/src/app/page.tsx


@@ -0,0 +1,70 @@
# generated: 2026-01-10
# project: Data_analysis
# project_key: DATA
# tracking_system: file-system
# story_location: _bmad-output/implementation-artifacts
# STATUS DEFINITIONS:
# ==================
# Epic Status:
# - backlog: Epic not yet started
# - in-progress: Epic actively being worked on
# - done: All stories in epic completed
#
# Epic Status Transitions:
# - backlog → in-progress: Automatically when first story is created (via create-story)
# - in-progress → done: Manually when all stories reach 'done' status
#
# Story Status:
# - backlog: Story only exists in epic file
# - ready-for-dev: Story file created in stories folder
# - in-progress: Developer actively working on implementation
# - review: Ready for code review (via Dev's code-review workflow)
# - done: Story completed
#
# Retrospective Status:
# - optional: Can be completed but not required
# - done: Retrospective has been completed
#
# WORKFLOW NOTES:
# ===============
# - Epic transitions to 'in-progress' automatically when first story is created
# - Stories can be worked in parallel if team capacity allows
# - SM typically creates next story after previous one is 'done' to incorporate learnings
# - Dev moves story to 'review', then runs code-review (fresh context, different LLM recommended)
generated: "2026-01-10"
project: "Data_analysis"
project_key: "DATA"
tracking_system: "file-system"
story_location: "_bmad-output/implementation-artifacts"
development_status:
  epic-1: done
  1-1-initialisation-du-monorepo-docker: review
  1-2-ingestion-de-fichiers-excel-csv-backend: review
  1-3-visualisation-dans-la-smart-grid-frontend: review
  1-4-gestion-des-types-renommage-data-hygiene: review
  1-5-tri-filtrage-de-base: review
  epic-1-retrospective: optional
  epic-2: done
  2-1-edition-de-cellule-validation: review
  2-2-undo-redo-des-modifications: review
  2-3-detection-automatique-des-outliers-backend: review
  2-4-panel-d-insights-revue-des-outliers-frontend: review
  2-5-exclusion-non-destructive-de-donnees: review
  epic-2-retrospective: optional
  epic-3: done
  3-1-matrice-de-correlation-interactive: review
  3-2-calcul-de-l-importance-des-features-backend: review
  3-3-recommandation-intelligente-de-variables-frontend: review
  epic-3-retrospective: optional
  epic-4: in-progress
  4-1-configuration-de-la-regression: review
  4-2-execution-du-modele-backend: review
  4-3-dashboard-de-resultats-interactif: review
  4-4-generation-du-rapport-pdf-audit-trail: backlog
  epic-4-retrospective: optional