# Data Analysis Platform
A modern full-stack data analysis platform built with Python FastAPI backend and Next.js frontend, designed for efficient data processing, visualization, and statistical analysis.
## Overview
This platform supports data analysis workflows by combining Python data science libraries with a responsive web interface. It uses Apache Arrow for high-performance data transfer between the backend and the frontend.
### Key Features
- **Backend:** FastAPI with async support, Pydantic v2 validation, and comprehensive data science stack
- **Frontend:** Next.js 16 with TypeScript, Tailwind CSS, and Shadcn UI components
- **Data Processing:** Pandas, Scikit-learn, and Statsmodels integration
- **Performance:** Apache Arrow for columnar binary data transfer between services, avoiding JSON serialization overhead on large datasets
- **Architecture:** Feature-based frontend organization, RESTful API design
## Technology Stack
### Backend
- **Python 3.12+** with FastAPI framework
- **Pydantic v2** for schema validation
- **Data Science:** Pandas 2.3.3+, Scikit-learn 1.8.0+, Statsmodels 0.14.6+
- **Serialization:** Apache Arrow 22.0+ for efficient binary data transfer
- **Package Management:** UV for fast dependency resolution
### Frontend
- **Next.js 16** (Standalone mode) with React 19
- **TypeScript** for type safety
- **Styling:** Tailwind CSS 4+ and Shadcn UI components
- **Data Display:** TanStack Table, Apache Arrow 21+, Recharts
- **State Management:** Zustand v5 for local state, TanStack Query v5 for server state
- **Virtualization:** TanStack Virtual for handling large datasets
### DevOps
- **Docker** multi-stage builds with distroless/alpine images
- **Docker Compose** for local development orchestration
## Project Structure
```
Data_analysis/
├── backend/ # FastAPI backend service
│ ├── app/ # Application modules
│ ├── tests/ # Backend test suite
│ ├── main.py # Application entry point
│ ├── pyproject.toml # Python dependencies (UV)
│ └── Dockerfile # Backend container image
├── frontend/ # Next.js frontend application
│ ├── src/
│ │ └── features/ # Feature-based organization
│ ├── tests/ # Frontend test suite
│ ├── package.json # Node.js dependencies
│ └── Dockerfile # Frontend container image
├── compose.yaml # Docker Compose configuration
├── _bmad-output/ # Planning and implementation artifacts
└── README.md # This file
```
## Prerequisites
Before running the applications locally, ensure you have the following installed:
- **Python 3.12+** - [Download Python](https://www.python.org/downloads/)
- **Node.js 20+** - [Download Node.js](https://nodejs.org/)
- **UV (Python package manager)** - Install via: `pip install uv`
- **npm** (comes with Node.js)
## Local Development Setup
### Backend Setup
1. Navigate to the backend directory:
```bash
cd backend
```
2. Create and activate a virtual environment (optional but recommended):
```bash
python3.12 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
```
3. Install dependencies using UV:
```bash
uv sync
```
4. Start the FastAPI development server:
```bash
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```
The backend API will be available at `http://localhost:8000`.
5. Access the interactive API documentation at:
- Swagger UI: `http://localhost:8000/docs`
- ReDoc: `http://localhost:8000/redoc`
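To confirm the backend started correctly, the documentation endpoints listed above can be probed from a script. The following is a minimal sketch using only the Python standard library. It assumes the default host and port from the `uvicorn` command above and FastAPI's default documentation paths (`/docs`, `/redoc`, `/openapi.json`). The helper names `doc_urls` and `is_reachable` are illustrative and not part of this project.

```python
from urllib.error import URLError
from urllib.request import urlopen

# Assumption: host and port match the uvicorn command shown above.
BASE_URL = "http://localhost:8000"
# Assumption: the app keeps FastAPI's default documentation paths.
DOC_PATHS = ("/docs", "/redoc", "/openapi.json")


def doc_urls(base_url: str = BASE_URL) -> list[str]:
    """Build the full documentation URLs for a given backend base URL."""
    return [base_url.rstrip("/") + path for path in DOC_PATHS]


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    for url in doc_urls():
        print(url, "OK" if is_reachable(url) else "unreachable")
```

If every URL reports `unreachable`, the server is most likely not running or is bound to a different port.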
### Frontend Setup
1. Open a new terminal and navigate to the frontend directory:
```bash
cd frontend
```
2. Install dependencies:
```bash
npm install
```
3. Create environment configuration file:
```bash
cp .env.local.example .env.local
```
Edit `.env.local` if needed to configure API endpoints or other settings.
4. Start the Next.js development server:
```bash
npm run dev
```
The frontend application will be available at `http://localhost:3000`.
### Running Both Services
To run both services simultaneously:
1. **Terminal 1 - Backend:**
```bash
cd backend
source .venv/bin/activate # If using virtual environment
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```
2. **Terminal 2 - Frontend:**
```bash
cd frontend
npm run dev
```
## Development Workflow
### Code Style and Standards
- **Backend:** Follow PEP 8 guidelines with snake_case naming
- **Frontend:** Use TypeScript strict mode, follow ESLint configuration
- **API Convention:** Use snake_case for JSON keys to maintain consistency with Pandas DataFrames
- **Documentation:** Include docstrings (Python) and JSDoc comments (TypeScript) for all exported functions
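As an illustration of the snake_case API convention above, the sketch below normalizes the keys of a nested payload before it is returned as JSON. It is a hypothetical helper, not code from this repository. The names `to_snake_case` and `snake_case_keys` are assumptions, and the regular expression only handles simple camelCase boundaries (acronyms such as `HTTPStatus` are lowercased without separators).

```python
import re
from typing import Any

# Matches the position between a lowercase letter or digit and an uppercase letter.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def to_snake_case(name: str) -> str:
    """Convert a camelCase identifier to snake_case."""
    return _CAMEL_BOUNDARY.sub("_", name).lower()


def snake_case_keys(payload: Any) -> Any:
    """Recursively rename dictionary keys to snake_case; values are left unchanged."""
    if isinstance(payload, dict):
        return {to_snake_case(str(key)): snake_case_keys(value) for key, value in payload.items()}
    if isinstance(payload, list):
        return [snake_case_keys(item) for item in payload]
    return payload


if __name__ == "__main__":
    print(snake_case_keys({"rowCount": 2, "columnStats": [{"meanValue": 1.5}]}))
```

Keeping the keys identical to DataFrame column names means a response can be built from `DataFrame.to_dict()` output without a renaming step.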
### Testing
- **Backend tests:** Run `pytest` from the `backend/` directory
- **Frontend tests:** Run `npm test` from the `frontend/` directory (when configured)
### Key Anti-Patterns to Avoid
- Do NOT use standard JSON for transferring datasets larger than 5,000 rows (use Apache Arrow)
- Do NOT use deep React Context for high-frequency state updates (use Zustand)
- Do NOT implement opaque algorithms: whenever a step excludes rows or values, log what was excluded and how much
- Do NOT perform heavy blocking computations on the main FastAPI process (use background tasks)
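The backend-side guidelines above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the project's implementation: the function names, the logger name, and the sample data are hypothetical. The sketch uses `asyncio.to_thread` as a stand-in for FastAPI background tasks or a task queue. The 5,000-row threshold comes from the list above, and `application/vnd.apache.arrow.stream` is the registered media type for Arrow IPC streams.

```python
import asyncio
import logging

logger = logging.getLogger("analysis")  # hypothetical logger name

ARROW_ROW_THRESHOLD = 5_000  # from the guideline above
ARROW_STREAM = "application/vnd.apache.arrow.stream"
JSON = "application/json"


def choose_media_type(row_count: int) -> str:
    """Pick the response media type based on dataset size."""
    return ARROW_STREAM if row_count > ARROW_ROW_THRESHOLD else JSON


def drop_incomplete_rows(rows: list[dict]) -> list[dict]:
    """Drop rows that contain None and log how many were excluded."""
    kept = [row for row in rows if None not in row.values()]
    excluded = len(rows) - len(kept)
    if excluded:
        logger.warning("Excluded %d of %d rows with missing values", excluded, len(rows))
    return kept


def column_mean(rows: list[dict], column: str) -> float:
    """Stand-in for a heavy, blocking computation (assumes at least one row)."""
    values = [row[column] for row in rows]
    return sum(values) / len(values)


async def analyze(rows: list[dict], column: str) -> dict:
    """Run the blocking work in a worker thread so the event loop stays free."""
    clean = drop_incomplete_rows(rows)
    mean = await asyncio.to_thread(column_mean, clean, column)
    return {
        "mean": mean,
        "rows_used": len(clean),
        "media_type": choose_media_type(len(clean)),
    }


if __name__ == "__main__":
    sample = [{"value": 1.0}, {"value": None}, {"value": 3.0}]
    print(asyncio.run(analyze(sample, "value")))
```

A worker thread keeps the event loop responsive, but pure-Python CPU-bound code still competes for the GIL. For long-running jobs, a process pool or a dedicated task queue may be a better fit.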
## Docker Deployment (Coming Soon)
The project includes Docker configuration for containerized deployment. Instructions for running with Docker Compose will be added in the next update.
## Documentation
- **Project Context:** See `_bmad-output/project-context.md` for detailed implementation rules
- **Architecture:** Technical architecture documentation is available in `_bmad-output/planning-artifacts/`
- **API Reference:** Access interactive API documentation at `/docs` when backend is running
## Contributing
This project keeps its planning and implementation artifacts in `_bmad-output/`. When contributing:
1. Read the project context file for implementation guidelines
2. Follow the established code patterns and conventions
3. Ensure all tests pass before submitting changes
4. Update documentation as needed
## License
[Specify your license here]
## Support
For questions or issues related to this project, please refer to the project documentation or contact the development team.
---
**Last Updated:** 2026-01-11