Initial commit
123
_bmad-output/planning-artifacts/architecture.md
Normal file
@@ -0,0 +1,123 @@
---
stepsCompleted: [1, 2, 3, 4, 5]
inputDocuments: ['_bmad-output/planning-artifacts/prd.md', '_bmad-output/planning-artifacts/ux-design-specification.md']
workflowType: 'architecture'
project_name: 'Data_analysis'
user_name: 'Sepehr'
date: '2026-01-10'
---

# Architecture Decision Document

_This document is built collaboratively through step-by-step discovery. Sections are appended as we work through each architectural decision together._

## Project Context Analysis

### Requirements Overview

**Functional Requirements:**
The system requires a robust data processing pipeline that ingests diverse file formats (Excel/CSV), performs automated statistical analysis (outlier detection, recursive feature elimination (RFE)), and renders interactive visualizations. The frontend must support a high-performance, editable grid ("Smart Grid") that mimics spreadsheet behavior.

**Non-Functional Requirements:**
* **Performance:** Sub-second response times for grid interactions on datasets up to 50k rows.
* **Stateless Architecture:** Phase 1 requires no persistent user data storage; sessions are ephemeral.
* **Scientific Rigor:** Reproducibility of results is paramount, requiring strict versioning of libraries and random seeds.
* **Security:** Secure file handling and transport (TLS 1.3) are mandatory.

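The reproducibility requirement above can be illustrated with a minimal sketch. It assumes that every analysis run records its seed and library versions in a small manifest, and that random draws use a private, seeded generator. The names `make_run_manifest` and `sample_rows` are hypothetical and do not come from the PRD; in the real pipeline the same seed would be passed on to the analysis libraries (for example through scikit-learn's `random_state` parameter).

```python
import platform
import random
from importlib import metadata


def make_run_manifest(seed: int, packages: list[str]) -> dict:
    """Record what is needed to replay a run: seed, interpreter, library versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "seed": seed,
        "python": platform.python_version(),
        "packages": versions,
    }


def sample_rows(n_rows: int, k: int, seed: int) -> list[int]:
    """Draw k distinct row indices with a private, seeded generator (no global state)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_rows), k))
```

Calling `sample_rows` twice with the same arguments returns the same indices, and the manifest can travel in the response metadata so that a result can be traced back to the exact environment that produced it.
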
**Scale & Complexity:**
* **Primary Domain:** Scientific Web Application (Full-stack).
* **Complexity Level:** Medium. The complexity lies in the bridge between the interactive frontend and the computational backend, ensuring synchronization and performance.
* **Estimated Architectural Components:** ~5 Core Components (Frontend Shell, Data Grid, Visualization Engine, API Gateway, Computational Worker).

### Technical Constraints & Dependencies

* **Backend:** Python is mandatory for the scientific stack (Pandas, Scikit-learn, Statsmodels).
* **Frontend:** Next.js 16 with React Server Components (for the shell) and Client Components (for the grid).
* **UI Library:** Shadcn UI + TanStack Table (headless) + Recharts.
* **Deployment:** Must support containerized deployment (Docker) for reproducibility.

### Cross-Cutting Concerns Identified

* **Data Serialization:** Efficient transfer of large datasets (JSON/Arrow) between Python backend and React frontend.
* **State Management:** Synchronizing the client-side grid state with the server-side analysis context.
* **Error Handling:** Unifying error reporting from the Python backend to the React UI (e.g., "Singular Matrix" error).

## Starter Template Evaluation

### Primary Technology Domain
Scientific Data Application (Full-stack Next.js + FastAPI) optimized for self-hosting.

### Selected Starter: Custom FastAPI-Next.js-Docker Boilerplate
**Rationale for Selection:**
This starter was chosen explicitly to support a "Two-Service" deployment model on Homelab infrastructure, which ensures process isolation between the analytical Python engine and the React UI.

**Architectural Decisions Provided by Starter:**
* **Language & Runtime:** Python 3.12 (Backend managed by **uv**) and Node.js 20 (Frontend).
* **Styling Solution:** Tailwind CSS with Shadcn UI.
* **Testing:** Pytest (Backend) and Vitest (Frontend).
* **Code Organization:** Clean Monorepo with separated service directories.

**Deployment Strategy (Homelab):**
* **Frontend Service:** Next.js in Standalone mode (Docker).
* **Backend Service:** FastAPI with Uvicorn (Docker).
* **Communication:** Internal Docker network for API requests to minimize latency.

## Core Architectural Decisions

### Decision Priority Analysis
**Critical Decisions (Block Implementation):**
* **Data Serialization Protocol:** Apache Arrow (IPC Stream) is mandatory for performance.
* **State Management Strategy:** Hybrid (TanStack Query for Async + Zustand for UI State).
* **Container Strategy:** Docker Compose with isolated networks for Homelab deployment.

### Data Architecture
* **Format:** Apache Arrow (IPC Stream) for grid data; JSON for control plane.
* **Validation:** Pydantic (v2) for all JSON payloads.
* **Persistence:** None (Stateless) for Phase 1. Python's `tempfile` module provides transient storage during analysis.

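The stateless persistence rule above can be sketched with the standard library alone. The example assumes that an uploaded file arrives as raw bytes and only needs to exist on disk while the analysis runs. The function name `analyze_upload` and the row-counting step are placeholders for illustration, not the project's actual analysis code.

```python
import tempfile
from pathlib import Path


def analyze_upload(payload: bytes, suffix: str = ".csv") -> int:
    """Spool an upload into a temporary directory, process it, and leave nothing behind."""
    with tempfile.TemporaryDirectory(prefix="analysis_") as workdir:
        upload_path = Path(workdir) / f"upload{suffix}"
        upload_path.write_bytes(payload)
        # Placeholder for the real analysis step: count data rows, excluding the header.
        with upload_path.open("r", encoding="utf-8") as handle:
            row_count = sum(1 for _ in handle) - 1
    # The directory and everything in it are removed when the `with` block exits.
    return max(row_count, 0)
```

Because cleanup is tied to the context manager, the temporary files are removed even if the analysis step raises, which keeps the service free of leftover user data between requests.
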
### API & Communication Patterns
* **Protocol:** REST API (FastAPI) with `StreamingResponse` for data export.
* **Serialization:** `pyarrow.ipc.new_stream` on the backend -> `tableFromIPC` (from the `apache-arrow` JavaScript package) on the frontend.
* **CORS:** Strictly configured to allow only the Homelab domain (e.g., `data.home`).

### Frontend Architecture
* **State Manager:**
    * **Zustand (v5):** For high-frequency grid state (selection, edits).
    * **TanStack Query (v5):** For analytical job status and data fetching.
* **Component Architecture:** "Smart Grid" pattern where the Grid component subscribes directly to the Zustand store to avoid re-rendering the entire page.

### Infrastructure & Deployment
* **Containerization:** Multi-stage Docker builds to keep images light (distroless/python and node-alpine).
* **Orchestration:** Docker Compose file defining `frontend`, `backend`, and a shared `network`.

## Implementation Patterns & Consistency Rules

### Pattern Categories Defined
**Critical Conflict Points Identified:** 5 major areas where AI agents must align to prevent implementation divergence.

### Naming Patterns
* **Backend (Python):** Strict `snake_case` for modules, functions, and variables (PEP 8).
* **Frontend (TSX):** `PascalCase` for Components (`SmartGrid.tsx`), `camelCase` for hooks and utilities.
* **API / JSON:** `snake_case` for all keys to maintain 1:1 mapping with Pandas DataFrame columns and Pydantic models.

### Structure Patterns
* **Project Organization:** Co-located logic. Features are grouped in folders: `/features/data-grid`, `/features/analysis-engine`.
* **Test Location:** Centralized `/tests` directory at the service root (e.g., `backend/tests/`, `frontend/tests/`) to simplify Docker test runs.

### Format Patterns
* **API Response Wrapper:**
    * Success: `{ "status": "success", "data": ..., "metadata": {...} }`.
    * Error: `{ "status": "error", "message": "User-friendly message", "code": "TECHNICAL_ERROR_CODE" }`.
* **Date Format:** ISO 8601 strings (`YYYY-MM-DDTHH:mm:ssZ`) in UTC.

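The response wrapper and date format above can be sketched as plain helper functions. The envelope keys follow the shapes listed in this section. Everything else is an assumption made for illustration: the helper names, the `generated_at` metadata key, the `ERROR_CATALOG` mapping, and the `SINGULAR_MATRIX` and `INTERNAL_ERROR` codes. In the real service these shapes would more likely be Pydantic models.

```python
from datetime import datetime, timezone
from typing import Any


def utc_timestamp() -> str:
    """Current time as an ISO 8601 string in UTC, e.g. 2026-01-10T09:30:00Z."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def success_response(data: Any, **metadata: Any) -> dict[str, Any]:
    """Wrap a result in the success envelope; extra keyword arguments become metadata."""
    return {
        "status": "success",
        "data": data,
        "metadata": {"generated_at": utc_timestamp(), **metadata},
    }


def error_response(message: str, code: str) -> dict[str, Any]:
    """Wrap a failure in the error envelope."""
    return {"status": "error", "message": message, "code": code}


# Hypothetical catalog: fragment of a backend exception text -> (user message, code).
ERROR_CATALOG = {
    "singular matrix": (
        "The analysis failed because some columns are linearly dependent.",
        "SINGULAR_MATRIX",
    ),
}


def error_from_exception(exc: Exception) -> dict[str, Any]:
    """Translate a backend exception into the unified error envelope."""
    text = str(exc).lower()
    for fragment, (message, code) in ERROR_CATALOG.items():
        if fragment in text:
            return error_response(message, code)
    return error_response("The analysis could not be completed.", "INTERNAL_ERROR")
```

Keeping the translation in one place means the React UI only ever has to handle the two envelope shapes, whatever exception the scientific stack raises.
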
### Process Patterns
* **Loading States:** Standardized `isLoading` and `isProcessing` flags in Zustand/TanStack Query.
* **Validation:**
    * Backend: Pydantic v2.
    * Frontend: Zod (synchronized with Pydantic via an OpenAPI generator).

### Enforcement Guidelines
**All AI Agents MUST:**
1. Check for existing Pydantic models before creating new ones.
2. Use the `logger` utility instead of `print()` or `console.log`.
3. Add JSDoc/Docstrings to every exported function.

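The document does not define the `logger` utility referenced in rule 2. On the backend it could look roughly like the sketch below, which wraps the standard `logging` module. The function name `get_logger`, the log format, and the logger name `backend.analysis` are assumptions for illustration only.

```python
import logging
import sys


def get_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a named logger that writes one formatted line per event to stdout."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # attach the handler only once per logger name
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
        logger.propagate = False  # avoid duplicate lines via the root logger
    logger.setLevel(level)
    return logger


logger = get_logger("backend.analysis")
logger.info("Outlier detection finished: %d rows flagged", 12)
```

Writing to stdout suits the containerized deployment described earlier, since Docker collects each service's standard output and no log files need to be managed inside the image.
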