--- stepsCompleted: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13] inputDocuments: ['_bmad-output/planning-artifacts/prd.md'] --- # UX Design Specification Data_analysis **Author:** Sepehr **Date:** 2026-01-10 --- ## Executive Summary ### Project Vision Create a modern, web-based, "No-Code" alternative to Minitab. The goal is to empower domain experts (engineers, analysts) to perform rigorous statistical regressions via a hybrid interface combining the simplicity of Excel with the computational power of Python. ### Target Users * **Julien (Analyst/Engineer):** Domain expert user, seeks efficiency and rigor without coding. Primarily uses a desktop computer. * **Marc (Decision Maker):** Result consumer, needs clear, mobile-friendly reports to validate production decisions. ### Key Design Challenges * **Grid Performance:** Maintain fluid interactivity with large data volumes (virtualization). * **Statistical Vulgarization:** Make variable selection and outlier detection concepts intuitive through visual design. * **Guided Workflow:** Design a conversion funnel (from raw file to final report) that reduces cognitive load. ### Design Opportunities * **Familiar Interface:** Leverage Microsoft Excel design patterns to reduce initial friction. * **"Mobile-First" Reports:** Create a competitive advantage with report exports and views optimized for tablets. ## Core User Experience ### Defining Experience The core of Data_analysis is the **"Smart Grid"**. Unlike a static HTML table, this grid feels alive. It's the command center where data ingestion, cleaning, and exploration happen seamlessly. Users don't "run scripts"; they interact with their data directly, with the system acting as an intelligent co-pilot suggesting corrections and insights. ### Platform Strategy * **Desktop (Primary):** Optimized for mouse/keyboard inputs. High density of information. Supports "Power User" shortcuts (Ctrl+Z, Arrows). * **Tablet (Secondary):** Optimized for touch. "Read-only" mode for reports and dashboards. Lower density, larger touch targets. ### Effortless Interactions * **Zero-Config Import:** Drag-and-drop Excel ingestion with auto-detection of headers, types, and delimiters. No wizard fatigue. * **One-Click Hygiene:** Automated detection of data anomalies (NaNs, wrong types) with single-click remediation actions ("Fix all", "Drop rows"). ### Critical Success Moments * **The "Clarity" Moment:** When the "Smart Feature Selection" reduces a chaotic 50-column dataset to the 3-4 variables that actually matter, visualized clearly. * **The "Confidence" Moment:** When the system confirms "No outliers detected" or "Model assumptions met" via clear green indicators before generating the report. ### Experience Principles 1. **Direct Manipulation:** Don't hide data behind menus. Let users click, edit, and filter right where the data lives. 2. **Proactive Intelligence:** Don't wait for the user to find errors. Highlight them immediately and offer solutions. 3. **Visual First:** Show the data distribution (mini-histograms) in the headers. Show the outliers on a plot, not just a list of row numbers. ## Desired Emotional Response ### Primary Emotional Goals The primary emotional goal of Data_analysis is to move the user from **Anxiety to Confidence**. Statistics can be intimidating; our interface must act as a reassuring expert co-pilot. ### Emotional Journey Mapping * **Discovery:** **Curiosity & Hope.** "Can this really replace my manual Excel cleaning?" * **Data Ingestion:** **Relief.** "It parsed my file instantly without errors." * **Data Cleaning:** **Surprise & Empowerment.** "I didn't know I had outliers, now I see them clearly." * **Analysis/Reporting:** **Confidence & Pride.** "This report looks professional and I understand every part of it." ### Micro-Emotions * **Trust vs. Skepticism:** Built through "Explainable AI" tooltips. * **Calm vs. Frustration:** Achieved through smooth animations and non-blocking background tasks. * **Mastery vs. Confusion:** Delivered by guiding the user through a linear logical workflow. ### Design Implications * **Confidence** → Use a sober, professional color palette (Blues/Grays). Provide clear "Validation" checkmarks when data is clean. * **Relief** → Automate tedious tasks like type-casting and missing value detection. Use "Undo" to remove the fear of making mistakes. * **Empowerment** → Use natural language labels instead of cryptic statistical abbreviations (e.g., "Predictive Power" instead of "Coefficient of Determination"). ### Emotional Design Principles 1. **Safety Net:** Users should never feel like they can "break" the data. Every action is reversible. 2. **No Dead Ends:** If an error occurs (e.g., singular matrix), explain *why* in plain French and how to fix it. 3. **Visual Rewards:** Use subtle success animations when a model is successfully trained. ## UX Pattern Analysis & Inspiration ### Inspiring Products Analysis * **Microsoft Excel:** The standard for grid interaction. Users expect double-click editing, arrow-key navigation, and "fill-down" patterns. * **Airtable:** Revolutionized the data grid with modern UI patterns. We adopt their clean column headers, visual data types (badges, progress bars), and intuitive filtering. * **Linear / Vercel:** The benchmark for high-performance developer tools. We draw inspiration from their minimalist aesthetic, exceptional Dark Mode, and keyboard-first navigation. ### Transferable UX Patterns * **Navigation:** **Sidebar-less / Hub & Spoke.** Focus on the data grid as the central workspace with floating or collapsible side panels for analysis tools. * **Interaction:** **"Sheet-to-Report" Pipeline.** A clear horizontal or vertical progression from raw data to a finalized interactive report. * **Visual:** **Statistical Overlays.** Using "Sparklines" (mini-histograms) in column headers to show data distribution at a glance. ### Anti-Patterns to Avoid * **The Modal Maze:** Opening a new pop-up window for every statistical setting. We prefer slide-over panels or inline settings to keep the context visible. * **Opaque Processing:** Showing a generic spinner during long calculations. We will use a "Step-by-Step" status bar (e.g., "1. Parsing -> 2. Detecting Outliers -> 3. Selecting Features"). ### Design Inspiration Strategy * **Adopt:** The "TanStack Table" logic for grid virtualization (Excel speed) combined with Shadcn UI components (Vercel aesthetic). * **Adapt:** Excel's right-click menu to include specific statistical actions like "Exclude from analysis" or "Set as Target (Y)". * **Avoid:** Complex "Dashboard Builders." Users want a generated report, not a canvas they have to design themselves. ## Design System Foundation ### 1.1 Design System Choice The project will use **Shadcn UI** as the primary UI library, built on top of **Tailwind CSS** and **Radix UI**. The core data interaction will be powered by **TanStack Table (headless)** to create a custom, high-performance "Smart Grid." ### Rationale for Selection * **Performance:** TanStack Table allows for massive data virtualization (50k+ rows) without the overhead of heavy UI frameworks. * **Aesthetic Consistency:** Shadcn provides the "Vercel-like" minimalist and professional aesthetic defined in our inspiration phase. * **Accessibility:** Leveraging Radix UI primitives ensures that complex components (popovers, dropdowns, dialogs) are fully WCAG compliant. * **Developer Experience:** Direct ownership of component code allows for deep customization of statistical-specific UI elements. ### Implementation Approach * **Shell:** Standard Shadcn layout components (Sidebar, TopNav). * **Data Grid:** A custom-built component using TanStack Table's hook logic, styled with Shadcn Table primitives. * **Charts:** Integration of **Recharts** or **Tremor** (which matches Shadcn's style) for statistical visualizations. ### Customization Strategy * **Tokens:** Neutral gray base with "Scientific Blue" as the primary action color. * **Typography:** Sans-serif (Geist or Inter) for the UI; Monospace (JetBrains Mono) for data cells and statistical metrics. * **Density:** "High-Density" mode by default for the grid (small cell padding) to maximize data visibility. ## 2. Core User Experience ### 2.1 Defining Experience The defining interaction of Data_analysis is the **"Guided Data Hygiene Loop"**. It transforms the tedious task of cleaning data into a rapid, rewarding conversation with the system. Users don't "edit cells"; they respond to intelligent insights that actively improve their model's quality in real-time. ### 2.2 User Mental Model * **Current Model:** "I have to manually hunt for errors row by row in Excel, then delete them and hope I didn't break anything." * **Target Model:** "The system is my Quality Assistant. It points out the issues, I make the executive decision, and I instantly see the result." ### 2.3 Success Criteria * **Speed:** Reviewing and fixing 50 outliers should take less than 30 seconds. * **Safety:** Users must feel that "excluding" data is non-destructive (reversible). * **Reward:** Every fix must trigger a positive visual feedback (e.g., model accuracy score pulsing green). ### 2.4 Novel UX Patterns * **"Contextual Insight Panel":** Instead of modal popups, a slide-over panel allows users to see the specific rows in question (highlighted in the grid) while reviewing the statistical explanation (boxplot/histogram) side-by-side. * **"Live Impact Preview":** Before confirming an exclusion, hover over the button to see a "Ghost Curve" showing how the regression line *will* change. ### 2.5 Experience Mechanics 1. **Initiation:** System highlights "dirty" columns with a subtle warning badge in the header. 2. **Interaction:** User clicks the header badge. The Insight Panel slides in. 3. **Feedback:** The panel shows "34 values are > 3 Sigma". The grid highlights these 34 rows. 4. **Action:** User clicks "Exclude All". Rows fade to gray. The Regression R² badge updates from 0.65 to 0.82 with a celebration animation. 5. **Completion:** The column header badge turns to a green checkmark. ## Visual Design Foundation ### Color System * **Neutral:** Slate (50-900) - Technical, cold background for heavy data. * **Primary:** Indigo (600) - For primary actions ("Run Regression"). * **Semantic Data Colors:** * **Rose (500):** Outliers/Errors (Soft alert). * **Emerald (500):** Valid Data/Success (Reassurance). * **Amber (500):** Warnings/Missing Values. * **Modes:** Fully supported Dark Mode using Slate-900 backgrounds and Indigo-400 primary accents. ### Typography System * **Interface:** `Inter` (or Geist Sans) - Clean, legible at small sizes. * **Data:** `JetBrains Mono` - Mandatory for the grid to ensure tabular alignment of decimals. ### Spacing & Layout Foundation * **Grid Density:** Ultra-compact (4px y-padding) to maximize data visibility. * **Panel Density:** Comfortable (16px padding) for reading insights. * **Layout:** Full-width liquid layout. No wasted margins. ### Accessibility Considerations * **Contrast:** Ensure data text (Slate-700) on row backgrounds meets AA standards. * **Focus States:** High-visibility focus rings (Indigo-500 ring) for keyboard navigation in the grid. ## Design Direction Decision ### Design Directions Explored Multiple design approaches were evaluated to balance density, readability, and modern aesthetics: * **"Corporate Legacy":** Mimicking Minitab/Excel directly (too cluttered). * **"Creative Canvas":** Like Notion/Miro (too open-ended). * **"Lab & Tech":** A hybrid of Vercel's minimalism and Excel's density. ### Chosen Direction **"Lab & Tech" with Shadcn UI & TanStack Table** * **Visual Style:** Minimalist, data-first, with a strong Dark Mode. * **Components:** Shadcn UI for the shell, TanStack Table for the grid. * **Palette:** Slate + Indigo + Rose/Emerald semantic indicators. ### Design Rationale * **User Fit:** Matches Julien's need for a professional, distraction-free environment. * **Modernity:** Positions the tool as a "Next-Gen" product compared to legacy competitors. * **Scalability:** The component library allows for easy addition of complex statistical widgets later. ### Implementation Approach * **CSS Framework:** Tailwind CSS. * **Component Library:** Shadcn UI (Radix based). * **Icons:** Lucide React. * **Charts:** Recharts. ## User Journey Flows ### Journey 1: Julien - The Guided Hygiene Loop This flow details how Julien interacts with the system to clean his data. The focus is on the "Ping-Pong" interaction between the Grid and the Insight Panel. ```mermaid graph TD A[Start: File Uploaded] --> B{System Checks} B -->|Clean| C[Grid View: Standard] B -->|Issues Found| D[Grid View: Warning Badge on Header] D --> E(User Clicks Badge) E --> F[Action: Open Insight Panel] subgraph Insight Panel Interaction F --> G[Display: Issue Description + Chart] G --> H[Display: Proposed Fix] H --> I{User Decision} I -->|Ignore| J[Close Panel & Remove Badge] I -->|Apply Fix| K[Action: Update Grid Data] end K --> L[Feedback: Toast 'Fix Applied'] L --> M[Update Model Score R²] M --> N[End: Ready for Regression] ``` ### Journey 2: Marc - Mobile Decision Making Optimized for touch and "Read-Only" consumption. No dense grids, just insights. ```mermaid graph TD A[Start: Click Link in Email] --> B[View: Mobile Dashboard] B --> C[Display: Key Metrics Cards] B --> D[Display: Regression Chart] D --> E(User Taps Data Point) E --> F[Action: Show Tooltip Details] subgraph Decision F --> G{Is Data Valid?} G -->|No| H[Action: Add Comment 'Check this'] G -->|Yes| I[Action: Click 'Approve Analysis'] end H --> J[Notify Julien] I --> K[Generate PDF & Archive] ``` ### Journey 3: Error Handling - The "Graceful Fail" Ensuring the system handles bad inputs without crashing the Python backend. ```mermaid graph TD A[Start: Upload 50MB .xlsb] --> B{Validation Service} B -->|Success| C[Proceed to Parsing] B -->|Fail: Macros Detected| D[State: Upload Error] D --> E[Display: Error Modal] E --> F[Content: 'Security Risk Detected'] E --> G[Action: 'Sanitize & Retry' Button] G --> H{Sanitization} H -->|Success| C H -->|Fail| I[Display: 'Please upload .xlsx or .csv'] ``` ### Flow Optimization Principles 1. **Non-Blocking Errors:** Warnings (like outliers) should never block the user from navigating. They are "suggestions", not "gates". 2. **Context Preservation:** When opening the Insight Panel, the relevant grid columns must scroll into view automatically. 3. **Optimistic UI:** When Julien clicks "Apply Fix", the UI updates instantly (Gray out rows) even while the backend saves the state. ## Component Strategy ### Design System Components (Shadcn UI) We will rely on the standard library for: * **Layout:** `Sheet` (for Insight Panel), `ScrollArea`, `Resizable`. * **Forms:** `Button`, `Input`, `Select`, `Switch`. * **Feedback:** `Toast`, `Progress`, `Skeleton` (for loading states). ### Custom Components Specification #### 1. `` The central nervous system of the app. * **Purpose:** Virtualized rendering of massive datasets with Excel-like interactions. * **Core Props:** * `data: any[]` - The raw dataset. * `columns: ColumnDef[]` - Definitions including types and formatters. * `onCellEdit: (rowId, colId, value) => void` - Handler for data mutation. * `highlightedRows: string[]` - IDs of rows to highlight (e.g., outliers). * **Key States:** `Loading`, `Empty`, `Filtering`, `Editing`. #### 2. `` The container for Explainable AI interactions. * **Purpose:** Contextual sidebar for statistical insights and data cleaning. * **Core Props:** * `isOpen: boolean` - Visibility state. * `insight: InsightObject` - Contains `{ type: 'outlier' | 'correlation', description: string, chartData: any }`. * `onApplyFix: () => Promise` - Async handler for the fix action. * **Anatomy:** Header (Title + Close), Body (Text + Recharts Graph), Footer (Action Buttons). #### 3. `` A rich header component for the grid. * **Purpose:** Show name, type, and distribution summary. * **Core Props:** * `label: string`. * `type: 'numeric' | 'categorical' | 'date'`. * `distribution: number[]` - Data for the sparkline mini-chart. * `hasWarning: boolean` - Triggers the red badge. ### Implementation Roadmap 1. **Phase 1 (Grid Core):** Implement `SmartGrid` with read-only virtualization (TanStack Table). 2. **Phase 2 (Interaction):** Add `ColumnHeader` visualization and `onCellEdit` logic. 3. **Phase 3 (Intelligence):** Build the `InsightPanel` and connect it to the outlier detection logic. ## UX Consistency Patterns ### Button Hierarchy * **Primary (Indigo):** Reserved for "Positive Progression" actions (Run Regression, Save, Export). Only one per view. * **Secondary (White/Outline):** For "Alternative" actions (Cancel, Clear Filter, Close Panel). * **Destructive (Rose):** For "Irreversible" actions (Exclude Data, Delete Project). Always requires a confirmation step if significant. * **Ghost (Transparent):** For tertiary actions inside toolbars (e.g., "Sort Ascending" icon button) to reduce visual noise. ### Feedback Patterns * **Toasts (Ephemeral):** Used for success confirmations ("Data saved", "Model updated"). Position: Bottom-Right. Duration: 3s. * **Inline Validation:** Used for data entry errors within the grid (e.g., entering text in a numeric column). Immediate red border + tooltip. * **Global Status:** A persistent "Status Bar" at the top showing the system state (Ready / Processing... / Done). ### Grid Interaction Patterns (Excel Compatibility) * **Navigation:** Arrow keys move focus between cells. Tab moves right. Enter moves down. * **Selection:** Click to select cell. Shift+Click to select range. Click row header to select row. * **Editing:** Double-click or `Enter` starts editing. `Esc` cancels. `Enter` saves. * **Context Menu:** Right-click triggers action menu specific to the selected object (Cell vs Row vs Column). ### Empty States * **No Data:** Don't show an empty grid. Show a "Drop Zone" with a clear CTA ("Upload Excel File") and sample datasets for exploration. * **No Selection:** When the Insight Panel is open but nothing is selected, show a helper illustration ("Select a column to see stats"). ## Responsive Design & Accessibility ### Responsive Strategy * **Desktop Only:** The application is strictly optimized for high-resolution desktop displays (1366px width minimum). No responsive breakpoints for mobile or tablet will be implemented. * **Layout Focus:** Use a fixed Sidebar + Liquid Grid layout. The grid will expand to fill all available horizontal space. ### Breakpoint Strategy * **Default:** 1440px+ (Optimized). * **Minimum:** 1280px (Functional). Below this, a horizontal scrollbar will appear for the entire app shell to preserve data integrity. ### Accessibility Strategy * **Compliance:** WCAG 2.1 Level AA. * **Keyboard First:** Full focus on making the Data Grid and Insight Panel navigable without a mouse. * **Screen Reader support:** Required for statistical summaries and report highlights. ### Testing Strategy * **Browsers:** Chrome, Edge, and Firefox (latest 2 versions). * **Devices:** Standard laptops (13" to 16") and external monitors (24"+). ### Implementation Guidelines * **Container Query:** Use `@container` for complex widgets (like the Insight Panel) to adapt their layout based on the sidebar's width rather than the screen width. * **Focus Management:** Ensure the focus ring is never hidden and follows a logical order (Sidebar -> Grid -> Insight Panel).