---
stepsCompleted: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
inputDocuments: ['_bmad-output/planning-artifacts/prd.md']
---
# UX Design Specification Data_analysis
**Author:** Sepehr
**Date:** 2026-01-10
---
## Executive Summary
### Project Vision
Create a modern, web-based, "No-Code" alternative to Minitab. The goal is to empower domain experts (engineers, analysts) to perform rigorous statistical regressions via a hybrid interface combining the simplicity of Excel with the computational power of Python.
### Target Users
* **Julien (Analyst/Engineer):** Domain expert user, seeks efficiency and rigor without coding. Primarily uses a desktop computer.
* **Marc (Decision Maker):** Result consumer, needs clear, mobile-friendly reports to validate production decisions.
### Key Design Challenges
* **Grid Performance:** Maintain fluid interactivity with large data volumes (virtualization).
* **Statistical Vulgarization:** Make variable selection and outlier detection concepts intuitive through visual design.
* **Guided Workflow:** Design a conversion funnel (from raw file to final report) that reduces cognitive load.
### Design Opportunities
* **Familiar Interface:** Leverage Microsoft Excel design patterns to reduce initial friction.
* **"Mobile-First" Reports:** Create a competitive advantage with report exports and views optimized for tablets.
## Core User Experience
### Defining Experience
The core of Data_analysis is the **"Smart Grid"**. Unlike a static HTML table, this grid feels alive. It's the command center where data ingestion, cleaning, and exploration happen seamlessly. Users don't "run scripts"; they interact with their data directly, with the system acting as an intelligent co-pilot suggesting corrections and insights.
### Platform Strategy
* **Desktop (Primary):** Optimized for mouse/keyboard inputs. High density of information. Supports "Power User" shortcuts (Ctrl+Z, Arrows).
* **Tablet (Secondary):** Optimized for touch. "Read-only" mode for reports and dashboards. Lower density, larger touch targets.
### Effortless Interactions
* **Zero-Config Import:** Drag-and-drop Excel ingestion with auto-detection of headers, types, and delimiters. No wizard fatigue.
* **One-Click Hygiene:** Automated detection of data anomalies (NaNs, wrong types) with single-click remediation actions ("Fix all", "Drop rows").
### Critical Success Moments
* **The "Clarity" Moment:** When the "Smart Feature Selection" reduces a chaotic 50-column dataset to the 3-4 variables that actually matter, visualized clearly.
* **The "Confidence" Moment:** When the system confirms "No outliers detected" or "Model assumptions met" via clear green indicators before generating the report.
### Experience Principles
1. **Direct Manipulation:** Don't hide data behind menus. Let users click, edit, and filter right where the data lives.
2. **Proactive Intelligence:** Don't wait for the user to find errors. Highlight them immediately and offer solutions.
3. **Visual First:** Show the data distribution (mini-histograms) in the headers. Show the outliers on a plot, not just a list of row numbers.
## Desired Emotional Response
### Primary Emotional Goals
The primary emotional goal of Data_analysis is to move the user from **Anxiety to Confidence**. Statistics can be intimidating; our interface must act as a reassuring expert co-pilot.
### Emotional Journey Mapping
* **Discovery:** **Curiosity & Hope.** "Can this really replace my manual Excel cleaning?"
* **Data Ingestion:** **Relief.** "It parsed my file instantly without errors."
* **Data Cleaning:** **Surprise & Empowerment.** "I didn't know I had outliers, now I see them clearly."
* **Analysis/Reporting:** **Confidence & Pride.** "This report looks professional and I understand every part of it."
### Micro-Emotions
* **Trust vs. Skepticism:** Built through "Explainable AI" tooltips.
* **Calm vs. Frustration:** Achieved through smooth animations and non-blocking background tasks.
* **Mastery vs. Confusion:** Delivered by guiding the user through a linear logical workflow.
### Design Implications
* **Confidence** → Use a sober, professional color palette (Blues/Grays). Provide clear "Validation" checkmarks when data is clean.
* **Relief** → Automate tedious tasks like type-casting and missing value detection. Use "Undo" to remove the fear of making mistakes.
* **Empowerment** → Use natural language labels instead of cryptic statistical abbreviations (e.g., "Predictive Power" instead of "Coefficient of Determination").
### Emotional Design Principles
1. **Safety Net:** Users should never feel like they can "break" the data. Every action is reversible.
2. **No Dead Ends:** If an error occurs (e.g., singular matrix), explain *why* in plain French and how to fix it.
3. **Visual Rewards:** Use subtle success animations when a model is successfully trained.
## UX Pattern Analysis & Inspiration
### Inspiring Products Analysis
* **Microsoft Excel:** The standard for grid interaction. Users expect double-click editing, arrow-key navigation, and "fill-down" patterns.
* **Airtable:** Revolutionized the data grid with modern UI patterns. We adopt their clean column headers, visual data types (badges, progress bars), and intuitive filtering.
* **Linear / Vercel:** The benchmark for high-performance developer tools. We draw inspiration from their minimalist aesthetic, exceptional Dark Mode, and keyboard-first navigation.
### Transferable UX Patterns
* **Navigation:** **Sidebar-less / Hub & Spoke.** Focus on the data grid as the central workspace with floating or collapsible side panels for analysis tools.
* **Interaction:** **"Sheet-to-Report" Pipeline.** A clear horizontal or vertical progression from raw data to a finalized interactive report.
* **Visual:** **Statistical Overlays.** Using "Sparklines" (mini-histograms) in column headers to show data distribution at a glance.
### Anti-Patterns to Avoid
* **The Modal Maze:** Opening a new pop-up window for every statistical setting. We prefer slide-over panels or inline settings to keep the context visible.
* **Opaque Processing:** Showing a generic spinner during long calculations. We will use a "Step-by-Step" status bar (e.g., "1. Parsing -> 2. Detecting Outliers -> 3. Selecting Features").
### Design Inspiration Strategy
* **Adopt:** The "TanStack Table" logic for grid virtualization (Excel speed) combined with Shadcn UI components (Vercel aesthetic).
* **Adapt:** Excel's right-click menu to include specific statistical actions like "Exclude from analysis" or "Set as Target (Y)".
* **Avoid:** Complex "Dashboard Builders." Users want a generated report, not a canvas they have to design themselves.
## Design System Foundation
### 1.1 Design System Choice
The project will use **Shadcn UI** as the primary UI library, built on top of **Tailwind CSS** and **Radix UI**. The core data interaction will be powered by **TanStack Table (headless)** to create a custom, high-performance "Smart Grid."
### Rationale for Selection
* **Performance:** TanStack Table allows for massive data virtualization (50k+ rows) without the overhead of heavy UI frameworks.
* **Aesthetic Consistency:** Shadcn provides the "Vercel-like" minimalist and professional aesthetic defined in our inspiration phase.
* **Accessibility:** Leveraging Radix UI primitives ensures that complex components (popovers, dropdowns, dialogs) are fully WCAG compliant.
* **Developer Experience:** Direct ownership of component code allows for deep customization of statistical-specific UI elements.
### Implementation Approach
* **Shell:** Standard Shadcn layout components (Sidebar, TopNav).
* **Data Grid:** A custom-built component using TanStack Table's hook logic, styled with Shadcn Table primitives.
* **Charts:** Integration of **Recharts** or **Tremor** (which matches Shadcn's style) for statistical visualizations.
### Customization Strategy
* **Tokens:** Neutral gray base with "Scientific Blue" as the primary action color.
* **Typography:** Sans-serif (Geist or Inter) for the UI; Monospace (JetBrains Mono) for data cells and statistical metrics.
* **Density:** "High-Density" mode by default for the grid (small cell padding) to maximize data visibility.
## 2. Core User Experience
### 2.1 Defining Experience
The defining interaction of Data_analysis is the **"Guided Data Hygiene Loop"**. It transforms the tedious task of cleaning data into a rapid, rewarding conversation with the system. Users don't "edit cells"; they respond to intelligent insights that actively improve their model's quality in real-time.
### 2.2 User Mental Model
* **Current Model:** "I have to manually hunt for errors row by row in Excel, then delete them and hope I didn't break anything."
* **Target Model:** "The system is my Quality Assistant. It points out the issues, I make the executive decision, and I instantly see the result."
### 2.3 Success Criteria
* **Speed:** Reviewing and fixing 50 outliers should take less than 30 seconds.
* **Safety:** Users must feel that "excluding" data is non-destructive (reversible).
* **Reward:** Every fix must trigger a positive visual feedback (e.g., model accuracy score pulsing green).
### 2.4 Novel UX Patterns
* **"Contextual Insight Panel":** Instead of modal popups, a slide-over panel allows users to see the specific rows in question (highlighted in the grid) while reviewing the statistical explanation (boxplot/histogram) side-by-side.
* **"Live Impact Preview":** Before confirming an exclusion, hover over the button to see a "Ghost Curve" showing how the regression line *will* change.
### 2.5 Experience Mechanics
1. **Initiation:** System highlights "dirty" columns with a subtle warning badge in the header.
2. **Interaction:** User clicks the header badge. The Insight Panel slides in.
3. **Feedback:** The panel shows "34 values are > 3 Sigma". The grid highlights these 34 rows.
4. **Action:** User clicks "Exclude All". Rows fade to gray. The Regression R² badge updates from 0.65 to 0.82 with a celebration animation.
5. **Completion:** The column header badge turns to a green checkmark.
## Visual Design Foundation
### Color System
* **Neutral:** Slate (50-900) - Technical, cold background for heavy data.
* **Primary:** Indigo (600) - For primary actions ("Run Regression").
* **Semantic Data Colors:**
* **Rose (500):** Outliers/Errors (Soft alert).
* **Emerald (500):** Valid Data/Success (Reassurance).
* **Amber (500):** Warnings/Missing Values.
* **Modes:** Fully supported Dark Mode using Slate-900 backgrounds and Indigo-400 primary accents.
### Typography System
* **Interface:** `Inter` (or Geist Sans) - Clean, legible at small sizes.
* **Data:** `JetBrains Mono` - Mandatory for the grid to ensure tabular alignment of decimals.
### Spacing & Layout Foundation
* **Grid Density:** Ultra-compact (4px y-padding) to maximize data visibility.
* **Panel Density:** Comfortable (16px padding) for reading insights.
* **Layout:** Full-width liquid layout. No wasted margins.
### Accessibility Considerations
* **Contrast:** Ensure data text (Slate-700) on row backgrounds meets AA standards.
* **Focus States:** High-visibility focus rings (Indigo-500 ring) for keyboard navigation in the grid.
## Design Direction Decision
### Design Directions Explored
Multiple design approaches were evaluated to balance density, readability, and modern aesthetics:
* **"Corporate Legacy":** Mimicking Minitab/Excel directly (too cluttered).
* **"Creative Canvas":** Like Notion/Miro (too open-ended).
* **"Lab & Tech":** A hybrid of Vercel's minimalism and Excel's density.
### Chosen Direction
**"Lab & Tech" with Shadcn UI & TanStack Table**
* **Visual Style:** Minimalist, data-first, with a strong Dark Mode.
* **Components:** Shadcn UI for the shell, TanStack Table for the grid.
* **Palette:** Slate + Indigo + Rose/Emerald semantic indicators.
### Design Rationale
* **User Fit:** Matches Julien's need for a professional, distraction-free environment.
* **Modernity:** Positions the tool as a "Next-Gen" product compared to legacy competitors.
* **Scalability:** The component library allows for easy addition of complex statistical widgets later.
### Implementation Approach
* **CSS Framework:** Tailwind CSS.
* **Component Library:** Shadcn UI (Radix based).
* **Icons:** Lucide React.
* **Charts:** Recharts.
## User Journey Flows
### Journey 1: Julien - The Guided Hygiene Loop
This flow details how Julien interacts with the system to clean his data. The focus is on the "Ping-Pong" interaction between the Grid and the Insight Panel.
```mermaid
graph TD
A[Start: File Uploaded] --> B{System Checks}
B -->|Clean| C[Grid View: Standard]
B -->|Issues Found| D[Grid View: Warning Badge on Header]
D --> E(User Clicks Badge)
E --> F[Action: Open Insight Panel]
subgraph Insight Panel Interaction
F --> G[Display: Issue Description + Chart]
G --> H[Display: Proposed Fix]
H --> I{User Decision}
I -->|Ignore| J[Close Panel & Remove Badge]
I -->|Apply Fix| K[Action: Update Grid Data]
end
K --> L[Feedback: Toast 'Fix Applied']
L --> M[Update Model Score R²]
M --> N[End: Ready for Regression]
```
### Journey 2: Marc - Mobile Decision Making
Optimized for touch and "Read-Only" consumption. No dense grids, just insights.
```mermaid
graph TD
A[Start: Click Link in Email] --> B[View: Mobile Dashboard]
B --> C[Display: Key Metrics Cards]
B --> D[Display: Regression Chart]
D --> E(User Taps Data Point)
E --> F[Action: Show Tooltip Details]
subgraph Decision
F --> G{Is Data Valid?}
G -->|No| H[Action: Add Comment 'Check this']
G -->|Yes| I[Action: Click 'Approve Analysis']
end
H --> J[Notify Julien]
I --> K[Generate PDF & Archive]
```
### Journey 3: Error Handling - The "Graceful Fail"
Ensuring the system handles bad inputs without crashing the Python backend.
```mermaid
graph TD
A[Start: Upload 50MB .xlsb] --> B{Validation Service}
B -->|Success| C[Proceed to Parsing]
B -->|Fail: Macros Detected| D[State: Upload Error]
D --> E[Display: Error Modal]
E --> F[Content: 'Security Risk Detected']
E --> G[Action: 'Sanitize & Retry' Button]
G --> H{Sanitization}
H -->|Success| C
H -->|Fail| I[Display: 'Please upload .xlsx or .csv']
```
### Flow Optimization Principles
1. **Non-Blocking Errors:** Warnings (like outliers) should never block the user from navigating. They are "suggestions", not "gates".
2. **Context Preservation:** When opening the Insight Panel, the relevant grid columns must scroll into view automatically.
3. **Optimistic UI:** When Julien clicks "Apply Fix", the UI updates instantly (Gray out rows) even while the backend saves the state.
## Component Strategy
### Design System Components (Shadcn UI)
We will rely on the standard library for:
* **Layout:** `Sheet` (for Insight Panel), `ScrollArea`, `Resizable`.
* **Forms:** `Button`, `Input`, `Select`, `Switch`.
* **Feedback:** `Toast`, `Progress`, `Skeleton` (for loading states).
### Custom Components Specification
#### 1. ``
The central nervous system of the app.
* **Purpose:** Virtualized rendering of massive datasets with Excel-like interactions.
* **Core Props:**
* `data: any[]` - The raw dataset.
* `columns: ColumnDef[]` - Definitions including types and formatters.
* `onCellEdit: (rowId, colId, value) => void` - Handler for data mutation.
* `highlightedRows: string[]` - IDs of rows to highlight (e.g., outliers).
* **Key States:** `Loading`, `Empty`, `Filtering`, `Editing`.
#### 2. ``
The container for Explainable AI interactions.
* **Purpose:** Contextual sidebar for statistical insights and data cleaning.
* **Core Props:**
* `isOpen: boolean` - Visibility state.
* `insight: InsightObject` - Contains `{ type: 'outlier' | 'correlation', description: string, chartData: any }`.
* `onApplyFix: () => Promise` - Async handler for the fix action.
* **Anatomy:** Header (Title + Close), Body (Text + Recharts Graph), Footer (Action Buttons).
#### 3. ``
A rich header component for the grid.
* **Purpose:** Show name, type, and distribution summary.
* **Core Props:**
* `label: string`.
* `type: 'numeric' | 'categorical' | 'date'`.
* `distribution: number[]` - Data for the sparkline mini-chart.
* `hasWarning: boolean` - Triggers the red badge.
### Implementation Roadmap
1. **Phase 1 (Grid Core):** Implement `SmartGrid` with read-only virtualization (TanStack Table).
2. **Phase 2 (Interaction):** Add `ColumnHeader` visualization and `onCellEdit` logic.
3. **Phase 3 (Intelligence):** Build the `InsightPanel` and connect it to the outlier detection logic.
## UX Consistency Patterns
### Button Hierarchy
* **Primary (Indigo):** Reserved for "Positive Progression" actions (Run Regression, Save, Export). Only one per view.
* **Secondary (White/Outline):** For "Alternative" actions (Cancel, Clear Filter, Close Panel).
* **Destructive (Rose):** For "Irreversible" actions (Exclude Data, Delete Project). Always requires a confirmation step if significant.
* **Ghost (Transparent):** For tertiary actions inside toolbars (e.g., "Sort Ascending" icon button) to reduce visual noise.
### Feedback Patterns
* **Toasts (Ephemeral):** Used for success confirmations ("Data saved", "Model updated"). Position: Bottom-Right. Duration: 3s.
* **Inline Validation:** Used for data entry errors within the grid (e.g., entering text in a numeric column). Immediate red border + tooltip.
* **Global Status:** A persistent "Status Bar" at the top showing the system state (Ready / Processing... / Done).
### Grid Interaction Patterns (Excel Compatibility)
* **Navigation:** Arrow keys move focus between cells. Tab moves right. Enter moves down.
* **Selection:** Click to select cell. Shift+Click to select range. Click row header to select row.
* **Editing:** Double-click or `Enter` starts editing. `Esc` cancels. `Enter` saves.
* **Context Menu:** Right-click triggers action menu specific to the selected object (Cell vs Row vs Column).
### Empty States
* **No Data:** Don't show an empty grid. Show a "Drop Zone" with a clear CTA ("Upload Excel File") and sample datasets for exploration.
* **No Selection:** When the Insight Panel is open but nothing is selected, show a helper illustration ("Select a column to see stats").
## Responsive Design & Accessibility
### Responsive Strategy
* **Desktop Only:** The application is strictly optimized for high-resolution desktop displays (1366px width minimum). No responsive breakpoints for mobile or tablet will be implemented.
* **Layout Focus:** Use a fixed Sidebar + Liquid Grid layout. The grid will expand to fill all available horizontal space.
### Breakpoint Strategy
* **Default:** 1440px+ (Optimized).
* **Minimum:** 1280px (Functional). Below this, a horizontal scrollbar will appear for the entire app shell to preserve data integrity.
### Accessibility Strategy
* **Compliance:** WCAG 2.1 Level AA.
* **Keyboard First:** Full focus on making the Data Grid and Insight Panel navigable without a mouse.
* **Screen Reader support:** Required for statistical summaries and report highlights.
### Testing Strategy
* **Browsers:** Chrome, Edge, and Firefox (latest 2 versions).
* **Devices:** Standard laptops (13" to 16") and external monitors (24"+).
### Implementation Guidelines
* **Container Query:** Use `@container` for complex widgets (like the Insight Panel) to adapt their layout based on the sidebar's width rather than the screen width.
* **Focus Management:** Ensure the focus ring is never hidden and follows a logical order (Sidebar -> Grid -> Insight Panel).