# Story 3.2: Feature Importance Calculation (Backend)

Status: review

## Story

As a system,
I want to compute the predictive power of features against a target variable,
so that I can provide scientific recommendations to the user.

## Acceptance Criteria

1. **Importance Algorithm:** Backend implements Feature Importance calculation using `RandomForestRegressor`.
2. **Analysis Endpoint:** A POST endpoint `/api/v1/analysis/feature-importance` accepts the data, a list of features (X), and the target variable (Y).
3. **Detection Output:** Returns a ranked list of features with their importance scores (0 to 1).
4. **Validation:** Ensures that Y is not in the X list and that enough numeric data exists.
5. **Clean Data Source:** Only uses data from non-excluded rows.
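
The validation and clean-data criteria above could be sketched roughly as follows. This is a minimal, standard-library illustration, not the project's actual code: `validate_request`, `is_numeric`, the `MIN_ROWS` threshold, and the representation of excluded rows as a set of row indices are all assumptions made for the example, since the story only asks for "enough numeric data" and "non-excluded rows".

```python
import math
from typing import Any

# Assumed threshold: the story does not say how many rows count as "enough".
MIN_ROWS = 10


def is_numeric(value: Any) -> bool:
    """True for int/float values that are not NaN (bools are rejected)."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return not math.isnan(value)


def validate_request(
    rows: list[dict],
    features: list[str],
    target: str,
    excluded: frozenset[int] = frozenset(),
) -> list[dict]:
    """Return the usable rows, or raise ValueError describing the problem."""
    if not features:
        raise ValueError("features list is empty")
    if target in features:
        # AC 4: Y must not be part of X.
        raise ValueError("target must not appear in the features list")

    columns = [*features, target]
    usable = [
        row
        for index, row in enumerate(rows)
        if index not in excluded  # AC 5: skip excluded rows (hypothetical index set)
        and all(is_numeric(row.get(column)) for column in columns)
    ]
    if len(usable) < MIN_ROWS:
        raise ValueError(f"only {len(usable)} usable rows, need at least {MIN_ROWS}")
    return usable
```

Raising `ValueError` keeps the check independent of any web framework; an endpoint could translate it into a 4xx response.
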

## Tasks / Subtasks

- [x] **Engine Implementation** (AC: 1, 4)
  - [x] Implement `calculate_feature_importance(df, features, target)` in `backend/app/core/engine/stats.py`.
  - [x] Handle categorical features using basic Label Encoding if needed (the current focus is on numeric features).
- [x] **API Endpoint** (AC: 2, 3, 5)
  - [x] Implement `POST /api/v1/analysis/feature-importance` in `analysis.py`.
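
For the categorical-features subtask, basic label encoding might look like the sketch below. The helper name `label_encode` is hypothetical and the story does not show how (or whether) encoding is implemented. Codes are assigned in sorted category order, which mirrors how scikit-learn's `LabelEncoder` orders its classes.

```python
def label_encode(values: list[str]) -> tuple[list[int], dict[str, int]]:
    """Map each distinct category to an integer code, in sorted order."""
    mapping = {category: code for code, category in enumerate(sorted(set(values)))}
    return [mapping[value] for value in values], mapping


codes, mapping = label_encode(["red", "blue", "red", "green"])
print(codes)    # [2, 0, 2, 1]
print(mapping)  # {'blue': 0, 'green': 1, 'red': 2}
```

The integer codes impose an arbitrary order on the categories. Tree-based models such as random forests usually tolerate this, but the resulting importance scores should be read with that caveat in mind.
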

## Dev Notes

- **Model:** Used `RandomForestRegressor` with 50 estimators for a balance between speed and accuracy.
- **Data Prep:** Automatically drops rows with NaNs in either features or target to ensure Scikit-learn compatibility.
- **Output:** Returns a JSON list of objects `{feature, score}` sorted by score in descending order.
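
Taken together, the notes above suggest an engine function along the following lines. This is a sketch under stated assumptions rather than the code in `stats.py`: the function name and signature come from the task list, while `random_state=0` (added for reproducibility) and the sample data are assumptions of this example.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def calculate_feature_importance(
    df: pd.DataFrame, features: list[str], target: str
) -> list[dict]:
    """Rank features by impurity-based importance from a random forest."""
    # Drop rows with NaN in any selected feature or in the target.
    clean = df[[*features, target]].dropna()

    # 50 trees, as stated in the Dev Notes; random_state is an assumption.
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(clean[features], clean[target])

    # Pair each feature with its score and sort in descending order.
    ranked = sorted(
        zip(features, model.feature_importances_),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [{"feature": name, "score": float(score)} for name, score in ranked]


# Illustrative data: y depends only on x1, and one row has a missing target.
sample = pd.DataFrame(
    {
        "x1": [float(i) for i in range(40)],
        "x2": [float(i % 3) for i in range(40)],
        "y": [float("nan") if i == 5 else 2.0 * i for i in range(40)],
    }
)
result = calculate_feature_importance(sample, ["x1", "x2"], "y")
```

scikit-learn normalizes `feature_importances_` so that the scores sum to 1, which is consistent with the 0 to 1 range in the acceptance criteria.
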

### Project Structure Notes

- Modified `backend/app/core/engine/stats.py`.
- Updated `backend/app/api/v1/analysis.py`.
- Added test case in `backend/tests/test_analysis.py`.

### References

- [Source: epics.md#Story 3.2]
- [Source: architecture.md#Computational Workers]

## Dev Agent Record

### Agent Model Used

{{agent_model_name_version}}

### Completion Notes List

- Implemented the Feature Importance core engine using Scikit-learn.
- Developed the API endpoint to expose the ranked feature list.
- Added validation to prevent processing empty or incompatible datasets.
- Verified with automated tests.

### File List

- /backend/app/core/engine/stats.py
- /backend/app/api/v1/analysis.py
- /backend/tests/test_analysis.py