Analysis/_bmad-output/implementation-artifacts/3-2-calcul-de-l-importance-des-features-backend.md
2026-01-11 22:56:02 +01:00


# Story 3.2: Calcul de l'Importance des Features (Backend)
Status: review
## Story
As a system,
I want to compute the predictive power of features against a target variable,
so that I can provide scientific recommendations to the user.
## Acceptance Criteria
1. **Importance Algorithm:** The backend computes feature importance using `RandomForestRegressor`.
2. **Analysis Endpoint:** A POST endpoint `/api/v1/analysis/feature-importance` accepts the data, the list of feature columns (X), and the target variable (Y).
3. **Ranked Output:** Returns a list of features ranked by importance score (0 to 1).
4. **Validation:** Rejects requests in which Y also appears in the feature list (X), or in which too few rows of numeric data remain.
5. **Clean Data Source:** Only uses data from non-excluded rows.
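The validation rule in AC 4 can be sketched as a small pure-Python check. This is a minimal illustration, not the code in `stats.py`: the function name `validate_importance_request` and the default threshold of 10 usable rows are hypothetical, because the story does not say how much numeric data counts as "enough". Excluded rows (AC 5) are assumed to have been filtered out before `usable_row_count` is computed.

```python
from typing import Sequence

# Hypothetical threshold: the story only says "enough numeric data".
MIN_ROWS = 10


def validate_importance_request(
    features: Sequence[str],
    target: str,
    usable_row_count: int,
    min_rows: int = MIN_ROWS,
) -> list[str]:
    """Return every validation error found (an empty list means the request is valid).

    Hypothetical helper; `usable_row_count` is assumed to count non-excluded
    rows with numeric values in all selected columns.
    """
    errors: list[str] = []
    if not features:
        errors.append("At least one feature is required.")
    # AC 4: the target variable (Y) must not also be used as a feature (X).
    if target in features:
        errors.append(f"Target '{target}' must not appear in the feature list.")
    # AC 4: refuse to fit a model on too little data.
    if usable_row_count < min_rows:
        errors.append(
            f"Only {usable_row_count} usable rows; at least {min_rows} are required."
        )
    return errors
```

Collecting all errors instead of raising on the first one lets the endpoint report every problem with a request in a single response.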
## Tasks / Subtasks
- [x] **Engine Implementation** (AC: 1, 4)
  - [x] Implement `calculate_feature_importance(df, features, target)` in `backend/app/core/engine/stats.py`.
  - [x] Handle categorical features with basic label encoding where needed (the current focus is numeric features).
- [x] **API Endpoint** (AC: 2, 3, 5)
  - [x] Implement `POST /api/v1/analysis/feature-importance` in `analysis.py`.
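The label-encoding subtask can be sketched with pandas categorical codes. Assumptions: the helper name `label_encode_categoricals` is hypothetical, and mapping missing values back to NaN (so that the later NaN-dropping step removes those rows) is one possible design choice, not necessarily what the backend does.

```python
import pandas as pd


def label_encode_categoricals(df: pd.DataFrame, features: list[str]) -> pd.DataFrame:
    """Return a copy of `df` whose non-numeric feature columns are replaced by integer codes.

    Hypothetical helper illustrating basic label encoding; numeric columns are left untouched.
    """
    encoded = df.copy()
    for col in features:
        if not pd.api.types.is_numeric_dtype(encoded[col]):
            codes = encoded[col].astype("category").cat.codes
            # cat.codes marks missing values with -1; turn them back into NaN
            # so rows with a missing category are dropped like any other NaN.
            encoded[col] = codes.where(codes >= 0)
    return encoded
```

Label encoding imposes an arbitrary order on categories, which tree models tolerate reasonably well; that is one reason it is acceptable as a basic fallback here.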
## Dev Notes
- **Model:** Uses `RandomForestRegressor` with 50 estimators to balance speed and accuracy.
- **Data Prep:** Automatically drops rows containing NaN in any selected feature or in the target, to ensure scikit-learn compatibility.
- **Output:** Returns a JSON list of objects `{feature, score}` sorted by score in descending order.
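Taken together, these notes suggest a shape for `calculate_feature_importance(df, features, target)`. The sketch below is a minimal illustration under stated assumptions: only the function name, the 50 estimators, the NaN dropping, and the sorted `{feature, score}` output come from this story. Reading scores from `feature_importances_` (impurity-based importance), the error messages, and the extra `random_state` parameter are assumptions.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def calculate_feature_importance(
    df: pd.DataFrame, features: list[str], target: str, random_state: int = 0
) -> list[dict]:
    """Rank `features` by random-forest importance for predicting `target`.

    Sketch only; `random_state` is an assumed extra parameter for reproducibility.
    """
    if target in features:
        raise ValueError("The target variable must not be one of the features.")
    # Keep only the relevant columns and drop any row with a NaN in them,
    # since older scikit-learn forests reject missing values.
    data = df[features + [target]].dropna()
    if data.empty:
        raise ValueError("No complete rows remain after dropping NaNs.")
    model = RandomForestRegressor(n_estimators=50, random_state=random_state)
    model.fit(data[features], data[target])
    # feature_importances_ is impurity-based and normalised to sum to 1
    # (it is all zeros if no tree was able to split).
    ranked = sorted(
        zip(features, model.feature_importances_),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [{"feature": name, "score": float(score)} for name, score in ranked]
```

Fixing `random_state` keeps rankings stable across repeated requests on the same data. Impurity-based importance can favour features with many distinct values; scikit-learn's `permutation_importance` is a common alternative if that bias matters.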
### Project Structure Notes
- Modified `backend/app/core/engine/stats.py`.
- Updated `backend/app/api/v1/analysis.py`.
- Added test case in `backend/tests/test_analysis.py`.
### References
- [Source: epics.md#Story 3.2]
- [Source: architecture.md#Computational Workers]
## Dev Agent Record
### Agent Model Used
{{agent_model_name_version}}
### Completion Notes List
- Implemented the feature importance core engine using scikit-learn.
- Developed the API endpoint to expose the ranked feature list.
- Added validation to prevent processing empty or incompatible datasets.
- Verified with automated tests.
### File List
- /backend/app/core/engine/stats.py
- /backend/app/api/v1/analysis.py
- /backend/tests/test_analysis.py