sepehr 6426ddd0ab initial commit

2026-01-11 22:56:02 +01:00


stepsCompleted:
inputDocuments: _bmad-output/planning-artifacts/prd.md, _bmad-output/planning-artifacts/ux-design-specification.md
workflowType: architecture
project_name: Data_analysis
user_name: Sepehr
date: 2026-01-10

Architecture Decision Document

This document is built collaboratively through step-by-step discovery. Sections are appended as each architectural decision is worked through.

Project Context Analysis

Requirements Overview

Functional Requirements: The system requires a robust data processing pipeline capable of ingesting tabular files (Excel and CSV), performing automated statistical analysis such as outlier detection and Recursive Feature Elimination (RFE), and rendering interactive visualizations. The frontend must support a high-performance, editable grid (the "Smart Grid") that mimics spreadsheet behavior.

Non-Functional Requirements:

Performance: Sub-second response times for grid interactions on datasets up to 50k rows.
Stateless Architecture: Phase 1 requires no persistent user data storage; sessions are ephemeral.
Scientific Rigor: Reproducibility of results is paramount, requiring strict versioning of libraries and random seeds.
Security: Secure file handling and transport (TLS 1.3) are mandatory.
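To make the reproducibility requirement concrete, the sketch below does two things for each analysis run. It pins a random seed, and it records that seed alongside the interpreter and library versions so a result can be regenerated later. It uses only the standard library. The names `run_provenance`, `sample_rows`, and `TRACKED_PACKAGES` are hypothetical. In the real pipeline, the same seed would also be passed to NumPy and scikit-learn, for example through their `random_state` parameters.

```python
import platform
import random
from importlib.metadata import PackageNotFoundError, version

# Hypothetical list of libraries whose versions affect numerical results.
TRACKED_PACKAGES = ("pandas", "scikit-learn", "statsmodels")


def run_provenance(seed: int) -> dict:
    """Collect the information needed to re-run an analysis identically."""
    versions = {}
    for name in TRACKED_PACKAGES:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "seed": seed,
        "python_version": platform.python_version(),
        "package_versions": versions,
    }


def sample_rows(row_ids: list[int], k: int, seed: int) -> list[int]:
    """Draw a reproducible sample: the same seed always yields the same rows."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    return rng.sample(row_ids, k)
```

Using a private `random.Random` instance rather than seeding the global generator keeps concurrent analyses from influencing each other's random streams.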

Scale & Complexity:

Primary Domain: Scientific Web Application (Full-stack).
Complexity Level: Medium. The main complexity lies in bridging the interactive frontend and the computational backend while keeping them synchronized and responsive.
Estimated Architectural Components: ~5 Core Components (Frontend Shell, Data Grid, Visualization Engine, API Gateway, Computational Worker).

Technical Constraints & Dependencies

Backend: Python is mandatory for the scientific stack (Pandas, Scikit-learn, Statsmodels).
Frontend: Next.js 16 with React Server Components (for shell) and Client Components (for grid).
UI Library: Shadcn UI + TanStack Table (headless) + Recharts.
Deployment: Must support containerized deployment (Docker) for reproducibility.

Cross-Cutting Concerns Identified

Data Serialization: Efficient transfer of large datasets (JSON/Arrow) between Python backend and React frontend.
State Management: Synchronizing the client-side grid state with the server-side analysis context.
Error Handling: Unifying error reporting from the Python backend to the React UI (e.g., "Singular Matrix" error).
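One way to unify error reporting is to add a single translation layer. It turns known Python exceptions into the error envelope defined under Format Patterns, so the React UI only ever sees a message and a stable code. The sketch assumes NumPy is available. `ERROR_CODES`, the code strings, `to_error_payload`, and `invert` are hypothetical names, not part of any library.

```python
import numpy as np

# Hypothetical mapping from exception types to stable, machine-readable codes.
# Order matters: more specific exceptions must be listed before their base classes.
ERROR_CODES: list[tuple[type[BaseException], str, str]] = [
    (np.linalg.LinAlgError, "SINGULAR_MATRIX",
     "The selected columns are linearly dependent; remove redundant variables."),
    (ValueError, "INVALID_INPUT", "The data could not be processed as provided."),
]


def to_error_payload(exc: BaseException) -> dict:
    """Translate a backend exception into the shared error envelope."""
    for exc_type, code, message in ERROR_CODES:
        if isinstance(exc, exc_type):
            return {"status": "error", "message": message, "code": code}
    return {"status": "error", "message": "An unexpected error occurred.", "code": "INTERNAL_ERROR"}


def invert(matrix: list[list[float]]) -> dict:
    """Example analysis step that reports failures as payloads instead of raising."""
    try:
        inverse = np.linalg.inv(np.array(matrix, dtype=float))
        return {"status": "success", "data": inverse.tolist(), "metadata": {}}
    except Exception as exc:  # API boundary: every failure becomes a structured response
        return to_error_payload(exc)
```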

Starter Template Evaluation

Primary Technology Domain

Scientific Data Application (Full-stack Next.js + FastAPI) optimized for self-hosting.

Selected Starter: Custom FastAPI-Next.js-Docker Boilerplate

Rationale for Selection: Explicitly chosen to support a "Two-Service" deployment model on a Homelab infrastructure. This ensures process isolation between the analytical Python engine and the React UI.

Architectural Decisions Provided by Starter:

Language & Runtime: Python 3.12 for the backend (dependencies managed by uv) and Node.js 20 for the frontend.
Styling Solution: Tailwind CSS with Shadcn UI.
Testing: Pytest (Backend) and Vitest (Frontend).
Code Organization: Clean Monorepo with separated service directories.

Deployment Strategy (Homelab):

Frontend Service: Next.js in Standalone mode (Docker).
Backend Service: FastAPI with Uvicorn (Docker).
Communication: Internal Docker network for API requests to minimize latency.

Core Architectural Decisions

Decision Priority Analysis

Critical Decisions (Block Implementation):

Data Serialization Protocol: Apache Arrow (IPC Stream) is mandatory for performance.
State Management Strategy: Hybrid (TanStack Query for Async + Zustand for UI State).
Container Strategy: Docker Compose with isolated networks for Homelab deployment.

Data Architecture

Format: Apache Arrow (IPC Stream) for grid data; JSON for control plane.
Validation: Pydantic (v2) for all JSON payloads.
Persistence: None (Stateless) for Phase 1. tempfile module in Python for transient storage during analysis.
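Here is a minimal sketch of the stateless pattern. An uploaded file is written to a temporary directory, parsed, and deleted automatically when the `with` block exits, so nothing outlives the request. The helper name `summarize_upload` is hypothetical. The standard-library `csv` module stands in for `pandas.read_csv` to keep the example dependency-free.

```python
import csv
import tempfile
from pathlib import Path


def summarize_upload(filename: str, payload: bytes) -> dict:
    """Persist an upload only for the duration of the analysis, then discard it."""
    with tempfile.TemporaryDirectory(prefix="analysis-") as workdir:
        # Keep only the base name so a crafted filename cannot escape workdir.
        path = Path(workdir) / Path(filename).name
        path.write_bytes(payload)
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            rows = list(reader)
            columns = list(reader.fieldnames or [])
        # temp_path is returned only to demonstrate that cleanup happened.
        result = {"row_count": len(rows), "columns": columns, "temp_path": str(path)}
    # The directory and the file inside it have been deleted at this point.
    return result
```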

API & Communication Patterns

Protocol: REST API (FastAPI) with StreamingResponse for data export.
Serialization: pyarrow.ipc.new_stream on backend -> tableFromIPC on frontend.
CORS: Strictly configured to allow only the Homelab domain (e.g., data.home).

Frontend Architecture

State Manager:
- Zustand (v5): For high-frequency grid state (selection, edits).
- TanStack Query (v5): For analytical job status and data fetching.
Component Architecture: "Smart Grid" pattern where the Grid component subscribes directly to the Zustand store to avoid re-rendering the entire page.

Infrastructure & Deployment

Containerization: Multi-stage Docker builds to keep images light (distroless/python and node-alpine).
Orchestration: Docker Compose file defining frontend, backend, and a shared network.

Implementation Patterns & Consistency Rules

Pattern Categories Defined

Critical Conflict Points Identified: 5 major areas where AI agents must align to prevent implementation divergence.

Naming Patterns

Backend (Python): Strict snake_case for modules, functions, and variables (PEP 8).
Frontend (TSX): PascalCase for Components (SmartGrid.tsx), camelCase for hooks and utilities.
API / JSON: snake_case for all keys to maintain 1:1 mapping with Pandas DataFrame columns and Pydantic models.
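Because API keys mirror DataFrame columns one-to-one, headers from uploaded spreadsheets need to be normalized to snake_case at ingestion. The regex helper below is a hypothetical sketch of that step, not an existing project utility. It handles surrounding whitespace, punctuation, and lowercase-to-uppercase camelCase boundaries, and it keeps accented letters.

```python
import re

_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")  # e.g. between "e" and "I" in "sampleId"
_SEPARATORS = re.compile(r"[\W_]+")  # runs of non-word characters or underscores


def to_snake_case(header: str) -> str:
    """Convert a raw spreadsheet header into a snake_case key."""
    spaced = _CAMEL_BOUNDARY.sub("_", header.strip())  # "sampleId" -> "sample_Id"
    collapsed = _SEPARATORS.sub("_", spaced)           # "Sales Amount (€)" -> "Sales_Amount_"
    return collapsed.strip("_").lower()
```

Two distinct headers can still normalize to the same key (for example "Sales Amount" and "sales_amount"), so ingestion would also need a collision check.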

Structure Patterns

Project Organization: Co-located logic. Features are grouped in folders: /features/data-grid, /features/analysis-engine.
Test Location: Centralized /tests directory at the service root (e.g., backend/tests/, frontend/tests/) to simplify Docker test runs.

Format Patterns

API Response Wrapper:
- Success: { "status": "success", "data": ..., "metadata": {...} }.
- Error: { "status": "error", "message": "User-friendly message", "code": "TECHNICAL_ERROR_CODE" }.
Date Format: ISO 8601 strings (YYYY-MM-DDTHH:mm:ssZ) in UTC.

Process Patterns

Loading States: Standardized isLoading and isProcessing flags in Zustand/TanStack Query.
Validation:
- Backend: Pydantic v2.
- Frontend: Zod (synchronized with Pydantic via OpenAPI generator).
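On the backend side, the Pydantic request models act as the single source of truth. FastAPI publishes their JSON Schema in its OpenAPI document, and the frontend generator turns that schema into Zod schemas. The sketch below shows a hypothetical request model for an RFE job and the constraints Pydantic v2 enforces and exports. The model name, field names, and defaults are assumptions for illustration.

```python
from pydantic import BaseModel, Field, ValidationError


class RfeRequest(BaseModel):
    """Hypothetical payload for a Recursive Feature Elimination job."""

    target_column: str = Field(min_length=1)
    n_features_to_select: int = Field(default=5, ge=1)
    random_state: int = 42  # seed forwarded to the estimator for reproducibility


# JSON Schema that ends up in the OpenAPI document consumed by the Zod generator.
schema = RfeRequest.model_json_schema()

try:
    RfeRequest(target_column="yield", n_features_to_select=0)
except ValidationError as exc:
    print(exc.errors()[0]["loc"])  # ('n_features_to_select',)
```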

Enforcement Guidelines

All AI Agents MUST:

Check for existing Pydantic models before creating new ones.
Use the logger utility instead of print() or console.log().
Add JSDoc comments (TypeScript) or docstrings (Python) to every exported function.
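Here is a minimal sketch of what the shared logger utility could look like on the Python side, built on the standard `logging` module. The `get_logger` name, the format string, and the logger names are assumptions, since the document does not specify them. The docstring also illustrates the documentation rule.

```python
import logging
import sys

_FORMAT = "%(asctime)s %(levelname)s %(name)s: %(message)s"


def get_logger(name: str) -> logging.Logger:
    """Return a named logger that writes to stderr (captured by `docker logs`).

    Calling this repeatedly with the same name never adds duplicate handlers.
    """
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(logging.Formatter(_FORMAT))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False  # avoid duplicate lines via the root logger
    return logger


log = get_logger("analysis_engine.ingest")
log.info("Loaded %d rows from %s", 3, "upload.csv")
```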
