2026-01-11 22:56:02 +01:00

stepsCompleted: 1, 2, 3, 4, 5
inputDocuments: _bmad-output/planning-artifacts/prd.md, _bmad-output/planning-artifacts/ux-design-specification.md
workflowType: architecture
project_name: Data_analysis
user_name: Sepehr
date: 2026-01-10

Architecture Decision Document

This document is built collaboratively through step-by-step discovery. Sections are appended as each architectural decision is worked through.

Project Context Analysis

Requirements Overview

Functional Requirements: The system requires a robust data processing pipeline that ingests Excel and CSV files, performs automated statistical analysis (Outlier Detection, Recursive Feature Elimination (RFE)), and renders interactive visualizations. The frontend must support a high-performance, editable grid ("Smart Grid") that mimics spreadsheet behavior.

Non-Functional Requirements:

  • Performance: Sub-second response times for grid interactions on datasets up to 50k rows.
  • Stateless Architecture: Phase 1 requires no persistent user data storage; sessions are ephemeral.
  • Scientific Rigor: Reproducibility of results is paramount, requiring strict versioning of libraries and random seeds.
  • Security: Secure file handling and transport (TLS 1.3) are mandatory.
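One way to approach the Scientific Rigor requirement is to record the seed and library versions alongside every analysis run. The following is a minimal sketch using only the standard library; `build_run_manifest` and the manifest layout are hypothetical names chosen for illustration, and NumPy or scikit-learn would need their own seeding (for example via a `random_state` argument) in addition to what is shown.

```python
import hashlib
import json
import random
import sys
from importlib import metadata


def build_run_manifest(seed: int, packages: list[str]) -> dict:
    """Seed the stdlib RNG and record what is needed to repeat a run."""
    random.seed(seed)  # assumption: scientific libraries are seeded separately
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    manifest = {
        "seed": seed,
        "python": sys.version.split()[0],
        "packages": versions,
    }
    # A stable fingerprint makes it easy to compare two runs at a glance.
    payload = json.dumps(manifest, sort_keys=True).encode("utf-8")
    manifest["fingerprint"] = hashlib.sha256(payload).hexdigest()
    return manifest


tracked = ["pandas", "scikit-learn", "statsmodels"]
manifest = build_run_manifest(42, tracked)
first_draw = random.random()
build_run_manifest(42, tracked)
second_draw = random.random()  # identical to first_draw because the seed matches
```

Storing the manifest in the `metadata` field of a response would let a user reproduce a result later, even though Phase 1 keeps no server-side state.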

Scale & Complexity:

  • Primary Domain: Scientific Web Application (Full-stack).
  • Complexity Level: Medium. The complexity lies in the bridge between the interactive frontend and the computational backend, ensuring synchronization and performance.
  • Estimated Architectural Components: ~5 Core Components (Frontend Shell, Data Grid, Visualization Engine, API Gateway, Computational Worker).

Technical Constraints & Dependencies

  • Backend: Python is mandatory for the scientific stack (Pandas, Scikit-learn, Statsmodels).
  • Frontend: Next.js 16 with React Server Components (for shell) and Client Components (for grid).
  • UI Library: Shadcn UI + TanStack Table (headless) + Recharts.
  • Deployment: Must support containerized deployment (Docker) for reproducibility.

Cross-Cutting Concerns Identified

  • Data Serialization: Efficient transfer of large datasets (JSON/Arrow) between Python backend and React frontend.
  • State Management: Synchronizing the client-side grid state with the server-side analysis context.
  • Error Handling: Unifying error reporting from the Python backend to the React UI (e.g., "Singular Matrix" error).

Starter Template Evaluation

Primary Technology Domain

Scientific Data Application (Full-stack Next.js + FastAPI) optimized for self-hosting.

Selected Starter: Custom FastAPI-Next.js-Docker Boilerplate

Rationale for Selection: This starter was chosen explicitly to support a "Two-Service" deployment model on Homelab infrastructure. Running the analytical Python engine and the React UI as separate services keeps their processes isolated from each other.

Architectural Decisions Provided by Starter:

  • Language & Runtime: Python 3.12 (Backend managed by uv) and Node.js 20 (Frontend).
  • Styling Solution: Tailwind CSS with Shadcn UI.
  • Testing: Pytest (Backend) and Vitest (Frontend).
  • Code Organization: Clean Monorepo with separated service directories.

Deployment Strategy (Homelab):

  • Frontend Service: Next.js in Standalone mode (Docker).
  • Backend Service: FastAPI with Uvicorn (Docker).
  • Communication: Internal Docker network for API requests to minimize latency.

Core Architectural Decisions

Decision Priority Analysis

Critical Decisions (Block Implementation):

  • Data Serialization Protocol: Apache Arrow (IPC Stream) is mandatory for performance.
  • State Management Strategy: Hybrid (TanStack Query for Async + Zustand for UI State).
  • Container Strategy: Docker Compose with isolated networks for Homelab deployment.

Data Architecture

  • Format: Apache Arrow (IPC Stream) for grid data; JSON for control plane.
  • Validation: Pydantic (v2) for all JSON payloads.
  • Persistence: None (Stateless) for Phase 1. Python's tempfile module provides transient storage during analysis.
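The transient-storage idea can be sketched with `tempfile.TemporaryDirectory`, which removes its contents as soon as the analysis finishes. The function name `analyze_upload` and the row-counting step are placeholders for illustration only; the real analysis step is not specified here.

```python
import tempfile
from pathlib import Path


def analyze_upload(raw_bytes: bytes, suffix: str = ".csv") -> dict:
    """Write an upload to a transient directory, inspect it, then discard it."""
    with tempfile.TemporaryDirectory(prefix="analysis_") as workdir:
        path = Path(workdir) / f"upload{suffix}"
        path.write_bytes(raw_bytes)
        # Placeholder for the real analysis: count data rows below the header.
        lines = path.read_text(encoding="utf-8").splitlines()
        result = {"row_count": max(len(lines) - 1, 0), "workdir": workdir}
    # Leaving the `with` block deletes the directory and everything in it.
    return result


summary = analyze_upload(b"id,value\n1,3.5\n2,4.1\n")
```

Because nothing outlives the `with` block, a crashed or completed request leaves no user data behind, which matches the ephemeral-session requirement.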

API & Communication Patterns

  • Protocol: REST API (FastAPI) with StreamingResponse for data export.
  • Serialization: pyarrow.ipc.new_stream writes the stream on the backend; tableFromIPC (from the apache-arrow JavaScript package) reads it on the frontend.
  • CORS: Strictly configured to allow only the Homelab domain (e.g., data.home).

Frontend Architecture

  • State Manager:
    • Zustand (v5): For high-frequency grid state (selection, edits).
    • TanStack Query (v5): For analytical job status and data fetching.
  • Component Architecture: "Smart Grid" pattern where the Grid component subscribes directly to the Zustand store to avoid re-rendering the entire page.

Infrastructure & Deployment

  • Containerization: Multi-stage Docker builds to keep images light (distroless/python and node-alpine).
  • Orchestration: Docker Compose file defining frontend, backend, and a shared network.

Implementation Patterns & Consistency Rules

Pattern Categories Defined

Critical Conflict Points Identified: 5 major areas where AI agents must align to prevent implementation divergence.

Naming Patterns

  • Backend (Python): Strict snake_case for modules, functions, and variables (PEP 8).
  • Frontend (TSX): PascalCase for Components (SmartGrid.tsx), camelCase for hooks and utilities.
  • API / JSON: snake_case for all keys to maintain 1:1 mapping with Pandas DataFrame columns and Pydantic models.
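The snake_case rule for JSON keys is easy to check mechanically. The sketch below is one possible test helper, not part of any existing utility; the function name `non_snake_case_keys` and the sample payload are assumptions made for illustration.

```python
import re

# Lowercase words separated by single underscores, starting with a letter.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def non_snake_case_keys(payload, prefix=""):
    """Return paths of keys that violate snake_case, searching nested data."""
    offenders = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            if not SNAKE_CASE.match(str(key)):
                offenders.append(path)
            offenders.extend(non_snake_case_keys(value, path))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            offenders.extend(non_snake_case_keys(item, f"{prefix}[{index}]"))
    return offenders


sample = {"row_count": 2, "columnStats": [{"mean_value": 1.0, "StdDev": 0.2}]}
bad = non_snake_case_keys(sample)
```

A check like this could run in the backend test suite against example responses, so that a camelCase key is caught before it reaches the frontend.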

Structure Patterns

  • Project Organization: Co-located logic. Features are grouped in folders: /features/data-grid, /features/analysis-engine.
  • Test Location: Centralized /tests directory at the service root (e.g., backend/tests/, frontend/tests/) to simplify Docker test runs.

Format Patterns

  • API Response Wrapper:
    • Success: { "status": "success", "data": ..., "metadata": {...} }.
    • Error: { "status": "error", "message": "User-friendly message", "code": "TECHNICAL_ERROR_CODE" }.
  • Date Format: ISO 8601 strings (YYYY-MM-DDTHH:mm:ssZ) in UTC.
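The response wrapper and date format above can be sketched with a few small helpers. Everything named here is hypothetical: `SingularMatrixError` is a stand-in for a numerical failure raised by the scientific stack, and the `SINGULAR_MATRIX` code and its message are examples rather than an agreed catalog.

```python
from datetime import datetime, timedelta, timezone


def utc_timestamp(moment: datetime) -> str:
    """Format an aware datetime as YYYY-MM-DDTHH:mm:ssZ in UTC."""
    if moment.tzinfo is None:
        raise ValueError("naive datetimes are ambiguous; attach a timezone first")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def success(data, **metadata) -> dict:
    """Build the success envelope."""
    return {"status": "success", "data": data, "metadata": metadata}


def error(message: str, code: str) -> dict:
    """Build the error envelope."""
    return {"status": "error", "message": message, "code": code}


class SingularMatrixError(Exception):
    """Hypothetical stand-in for a numerical failure in the analysis step."""


# Assumed catalog: exception type -> (user-friendly message, technical code).
ERROR_CATALOG = {
    SingularMatrixError: (
        "Some selected columns are linearly dependent. Remove one and retry.",
        "SINGULAR_MATRIX",
    ),
}


def envelope_for(exc: Exception) -> dict:
    """Translate an exception into the error envelope, with a generic fallback."""
    message, code = ERROR_CATALOG.get(
        type(exc), ("An unexpected error occurred.", "INTERNAL_ERROR")
    )
    return error(message, code)


local_time = datetime(2026, 1, 10, 23, 30, 0, tzinfo=timezone(timedelta(hours=1)))
ok = success({"row_count": 2}, generated_at=utc_timestamp(local_time))
failed = envelope_for(SingularMatrixError("matrix is singular"))
```

Keeping the catalog in one place means the React UI only ever sees the user-friendly message and a stable code, never a raw Python traceback.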

Process Patterns

  • Loading States: Standardized isLoading and isProcessing flags in Zustand/TanStack Query.
  • Validation:
    • Backend: Pydantic v2.
    • Frontend: Zod (synchronized with Pydantic via OpenAPI generator).

Enforcement Guidelines

All AI Agents MUST:

  1. Check for existing Pydantic models before creating new ones.
  2. Use the logger utility instead of print() or console.log.
  3. Add JSDoc/Docstrings to every exported function.
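On the backend, the logger utility referred to in rule 2 could be as small as the sketch below, built on the standard `logging` module. The name `get_logger`, the format string, and the logger name `backend.analysis` are assumptions for illustration; the project's actual utility may differ.

```python
import io
import logging


def get_logger(name: str, stream=None, level: int = logging.INFO) -> logging.Logger:
    """Return a named logger with a single handler and a consistent format."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # configure once, even if called repeatedly
        handler = logging.StreamHandler(stream)  # stream=None means stderr
        handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(level)
        logger.propagate = False  # avoid duplicate lines via the root logger
    return logger


buffer = io.StringIO()  # captured here only so the output can be inspected
log = get_logger("backend.analysis", stream=buffer)
log.info("outlier detection finished: %d rows flagged", 12)
log.debug("this line is filtered out at INFO level")
captured = buffer.getvalue()
```

Unlike `print()`, this keeps severity levels and module names attached to every line, which helps when reading container logs from both services side by side.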