UX Design Specification Data_analysis
Author: Sepehr
Date: 2026-01-10
Executive Summary
Project Vision
Create a modern, web-based, "No-Code" alternative to Minitab. The goal is to empower domain experts (engineers, analysts) to perform rigorous statistical regressions via a hybrid interface combining the simplicity of Excel with the computational power of Python.
Target Users
- Julien (Analyst/Engineer): Domain expert user, seeks efficiency and rigor without coding. Primarily uses a desktop computer.
- Marc (Decision Maker): Result consumer, needs clear, mobile-friendly reports to validate production decisions.
Key Design Challenges
- Grid Performance: Maintain fluid interactivity with large data volumes (virtualization).
- Statistical Popularization: Make concepts such as variable selection and outlier detection intuitive through visual design.
- Guided Workflow: Design a conversion funnel (from raw file to final report) that reduces cognitive load.
Design Opportunities
- Familiar Interface: Leverage Microsoft Excel design patterns to reduce initial friction.
- "Mobile-First" Reports: Create a competitive advantage with report exports and views optimized for tablets.
Core User Experience
Defining Experience
The core of Data_analysis is the "Smart Grid". Unlike a static HTML table, this grid feels alive. It's the command center where data ingestion, cleaning, and exploration happen seamlessly. Users don't "run scripts"; they interact with their data directly, with the system acting as an intelligent co-pilot suggesting corrections and insights.
Platform Strategy
- Desktop (Primary): Optimized for mouse/keyboard inputs. High density of information. Supports "Power User" shortcuts (Ctrl+Z, arrow keys).
- Tablet (Secondary): Optimized for touch. "Read-only" mode for reports and dashboards. Lower density, larger touch targets.
Effortless Interactions
- Zero-Config Import: Drag-and-drop Excel ingestion with auto-detection of headers, types, and delimiters. No wizard fatigue.
- One-Click Hygiene: Automated detection of data anomalies (NaNs, wrong types) with single-click remediation actions ("Fix all", "Drop rows").
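Here is a minimal sketch of the auto-detection behind Zero-Config Import and One-Click Hygiene. It assumes cells arrive as raw strings. The function `profileColumn`, the missing-value tokens and the 90% threshold are illustrative assumptions, not a defined part of the product.

```typescript
// Hypothetical sketch: infer a column's type from raw string cells and count
// the anomalies that One-Click Hygiene would surface.
type ColumnType = "numeric" | "categorical" | "date";

interface ColumnProfile {
  type: ColumnType;
  missing: number;    // empty or NaN-like cells
  mismatched: number; // present cells that do not parse as the inferred type
}

// Tokens treated as missing values (an assumption; adjust per data source).
const MISSING_TOKENS = new Set(["", "na", "n/a", "nan", "null"]);

function isMissing(cell: string): boolean {
  return MISSING_TOKENS.has(cell.trim().toLowerCase());
}

function isNumeric(cell: string): boolean {
  // Accepts a decimal comma ("3,14") as well as a decimal point.
  // Thousands separators ("1,234.5") are not handled in this sketch.
  const normalized = cell.trim().replace(",", ".");
  return normalized !== "" && Number.isFinite(Number(normalized));
}

function isIsoDate(cell: string): boolean {
  // Only ISO dates (YYYY-MM-DD) are recognized here.
  const text = cell.trim();
  return /^\d{4}-\d{2}-\d{2}$/.test(text) && !Number.isNaN(Date.parse(text));
}

function profileColumn(cells: string[], threshold = 0.9): ColumnProfile {
  const present = cells.filter((c) => !isMissing(c));
  const missing = cells.length - present.length;
  if (present.length === 0) return { type: "categorical", missing, mismatched: 0 };

  // A type is inferred when at least `threshold` of the present cells match it.
  const numeric = present.filter(isNumeric).length;
  if (numeric / present.length >= threshold) {
    return { type: "numeric", missing, mismatched: present.length - numeric };
  }
  const dates = present.filter(isIsoDate).length;
  if (dates / present.length >= threshold) {
    return { type: "date", missing, mismatched: present.length - dates };
  }
  return { type: "categorical", missing, mismatched: 0 };
}
```

The `missing` and `mismatched` counts are what a "Fix all" or "Drop rows" action would operate on.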
Critical Success Moments
- The "Clarity" Moment: When the "Smart Feature Selection" reduces a chaotic 50-column dataset to the 3-4 variables that actually matter, visualized clearly.
- The "Confidence" Moment: When the system confirms "No outliers detected" or "Model assumptions met" via clear green indicators before generating the report.
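The specification does not define how "Smart Feature Selection" ranks variables. One simple illustration is a filter approach, a swapped-in technique rather than the product's defined method. It ranks each candidate column by the absolute Pearson correlation with the target and keeps the top k. The names `pearson`, `topFeatures` and the default `k = 4` are hypothetical.

```typescript
// Pearson correlation between two equal-length numeric columns.
function pearson(x: number[], y: number[]): number {
  const n = x.length;
  const meanX = x.reduce((a, b) => a + b, 0) / n;
  const meanY = y.reduce((a, b) => a + b, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - meanX;
    const dy = y[i] - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  // A constant column carries no linear signal; score it 0.
  return sxx === 0 || syy === 0 ? 0 : sxy / Math.sqrt(sxx * syy);
}

// Hypothetical filter-style selection: rank columns by |r| with the target
// and keep the k best.
function topFeatures(columns: Record<string, number[]>, target: number[], k = 4): string[] {
  return Object.entries(columns)
    .map(([name, values]) => ({ name, score: Math.abs(pearson(values, target)) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((f) => f.name);
}
```

A univariate filter like this ignores interactions and redundancy between predictors. For example, two near-duplicate columns can both rank highly. A production version might therefore prefer stepwise selection or a regularized model.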
Experience Principles
- Direct Manipulation: Don't hide data behind menus. Let users click, edit, and filter right where the data lives.
- Proactive Intelligence: Don't wait for the user to find errors. Highlight them immediately and offer solutions.
- Visual First: Show the data distribution (mini-histograms) in the headers. Show the outliers on a plot, not just a list of row numbers.
Desired Emotional Response
Primary Emotional Goals
The primary emotional goal of Data_analysis is to move the user from Anxiety to Confidence. Statistics can be intimidating; our interface must act as a reassuring expert co-pilot.
Emotional Journey Mapping
- Discovery: Curiosity & Hope. "Can this really replace my manual Excel cleaning?"
- Data Ingestion: Relief. "It parsed my file instantly without errors."
- Data Cleaning: Surprise & Empowerment. "I didn't know I had outliers, now I see them clearly."
- Analysis/Reporting: Confidence & Pride. "This report looks professional and I understand every part of it."
Micro-Emotions
- Trust vs. Skepticism: Built through "Explainable AI" tooltips.
- Calm vs. Frustration: Achieved through smooth animations and non-blocking background tasks.
- Mastery vs. Confusion: Delivered by guiding the user through a linear, logical workflow.
Design Implications
- Confidence → Use a sober, professional color palette (Blues/Grays). Provide clear "Validation" checkmarks when data is clean.
- Relief → Automate tedious tasks like type-casting and missing value detection. Use "Undo" to remove the fear of making mistakes.
- Empowerment → Use natural language labels instead of cryptic statistical jargon (e.g., "Predictive Power" instead of "R²" or "Coefficient of Determination").
Emotional Design Principles
- Safety Net: Users should never feel like they can "break" the data. Every action is reversible.
- No Dead Ends: If an error occurs (e.g., a singular matrix, typically caused by perfectly collinear variables), explain in plain French why it happened and how to fix it.
- Visual Rewards: Use subtle success animations when a model is successfully trained.
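The Safety Net principle can be sketched as a plain undo/redo history. This assumes each action produces a new immutable state. Storing only the set of excluded row IDs, rather than copies of the dataset, keeps snapshots cheap. `History` is a hypothetical helper name.

```typescript
// Hypothetical undo/redo history over immutable states.
class History<T> {
  private past: T[] = [];
  private future: T[] = [];
  private present: T;

  constructor(initial: T) {
    this.present = initial;
  }

  get current(): T {
    return this.present;
  }

  // Record a new state; a fresh action discards any redo branch.
  apply(next: T): void {
    this.past.push(this.present);
    this.present = next;
    this.future = [];
  }

  // Returns false when there is nothing to undo.
  undo(): boolean {
    if (this.past.length === 0) return false;
    this.future.push(this.present);
    this.present = this.past.pop() as T;
    return true;
  }

  // Returns false when there is nothing to redo.
  redo(): boolean {
    if (this.future.length === 0) return false;
    this.past.push(this.present);
    this.present = this.future.pop() as T;
    return true;
  }
}

// Usage: the state is the set of excluded row IDs, so every exclusion is reversible.
const edits = new History<ReadonlySet<string>>(new Set<string>());
edits.apply(new Set(["row-17"])); // exclude a row
edits.undo();                     // back to no exclusions
```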
UX Pattern Analysis & Inspiration
Inspiring Products Analysis
- Microsoft Excel: The standard for grid interaction. Users expect double-click editing, arrow-key navigation, and "fill-down" patterns.
- Airtable: Revolutionized the data grid with modern UI patterns. We adopt their clean column headers, visual data types (badges, progress bars), and intuitive filtering.
- Linear / Vercel: The benchmark for high-performance developer tools. We draw inspiration from their minimalist aesthetic, exceptional Dark Mode, and keyboard-first navigation.
Transferable UX Patterns
- Navigation: Sidebar-less / Hub & Spoke. Focus on the data grid as the central workspace with floating or collapsible side panels for analysis tools.
- Interaction: "Sheet-to-Report" Pipeline. A clear horizontal or vertical progression from raw data to a finalized interactive report.
- Visual: Statistical Overlays. Using "Sparklines" (mini-histograms) in column headers to show data distribution at a glance.
Anti-Patterns to Avoid
- The Modal Maze: Opening a new pop-up window for every statistical setting. We prefer slide-over panels or inline settings to keep the context visible.
- Opaque Processing: Showing a generic spinner during long calculations. We will use a "Step-by-Step" status bar (e.g., "1. Parsing -> 2. Detecting Outliers -> 3. Selecting Features").
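The "Step-by-Step" status bar can be sketched as a small step runner. It assumes each backend phase can be awaited as a separate async step. `runPipeline`, `PipelineStep`, `formatStatus` and the status wording are hypothetical. The point is that the UI receives a named status per step instead of a single spinner.

```typescript
// Hypothetical step runner for the "Step-by-Step" status bar.
type StepStatus = "pending" | "running" | "done" | "failed";

interface PipelineStep {
  label: string;
  run: () => Promise<void>;
}

// Runs steps in order, reporting a status snapshot after each change.
// Stops at the first failing step and resolves to false.
async function runPipeline(
  steps: PipelineStep[],
  onProgress: (statuses: StepStatus[]) => void,
): Promise<boolean> {
  const statuses = steps.map((): StepStatus => "pending");
  for (let i = 0; i < steps.length; i++) {
    statuses[i] = "running";
    onProgress([...statuses]);
    try {
      await steps[i].run();
      statuses[i] = "done";
    } catch {
      statuses[i] = "failed";
      onProgress([...statuses]);
      return false;
    }
  }
  onProgress([...statuses]);
  return true;
}

// Renders e.g. "1. Parsing (done) -> 2. Detecting Outliers (running) -> 3. Selecting Features".
function formatStatus(steps: PipelineStep[], statuses: StepStatus[]): string {
  const suffix: Record<StepStatus, string> = {
    pending: "",
    running: " (running)",
    done: " (done)",
    failed: " (failed)",
  };
  return steps.map((s, i) => `${i + 1}. ${s.label}${suffix[statuses[i]]}`).join(" -> ");
}
```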
Design Inspiration Strategy
- Adopt: TanStack Table's headless logic, paired with TanStack Virtual for row virtualization (Excel speed), combined with Shadcn UI components (Vercel aesthetic).
- Adapt: Excel's right-click menu to include specific statistical actions like "Exclude from analysis" or "Set as Target (Y)".
- Avoid: Complex "Dashboard Builders." Users want a generated report, not a canvas they have to design themselves.
Design System Foundation
1.1 Design System Choice
The project will use Shadcn UI as the primary UI library, built on top of Tailwind CSS and Radix UI. The core data interaction will be powered by TanStack Table (headless) to create a custom, high-performance "Smart Grid."
Rationale for Selection
- Performance: TanStack Table, combined with TanStack Virtual for row virtualization, can render large datasets (50k+ rows) smoothly without the overhead of heavy UI frameworks.
- Aesthetic Consistency: Shadcn provides the "Vercel-like" minimalist and professional aesthetic defined in our inspiration phase.
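- Accessibility: Radix UI primitives implement the WAI-ARIA design patterns (keyboard support, focus management, ARIA attributes) for complex components (popovers, dropdowns, dialogs), which gives a strong foundation for WCAG compliance.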
- Developer Experience: Direct ownership of component code allows for deep customization of statistical-specific UI elements.
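Virtualization means rendering only the rows inside the viewport plus a small overscan margin. A spacer keeps the scrollbar proportional to the full dataset. The arithmetic below assumes a fixed row height and only illustrates the idea. In practice TanStack Virtual's `useVirtualizer` handles this, including variable row heights, and its API differs from this hypothetical `visibleWindow` helper.

```typescript
interface RowWindow {
  start: number;       // first row index to render (inclusive)
  end: number;         // last row index to render (exclusive)
  offsetTop: number;   // px offset of the first rendered row
  totalHeight: number; // px height of the full virtual scroll area
}

// Hypothetical helper: rows intersecting the viewport, assuming a fixed row
// height, plus `overscan` extra rows above and below to avoid blank flashes.
function visibleWindow(
  rowCount: number,
  rowHeight: number,
  scrollTop: number,
  viewportHeight: number,
  overscan = 5,
): RowWindow {
  const first = Math.min(rowCount, Math.floor(scrollTop / rowHeight));
  const visibleCount = Math.ceil(viewportHeight / rowHeight);
  const start = Math.max(0, first - overscan);
  const end = Math.min(rowCount, first + visibleCount + overscan);
  return { start, end, offsetTop: start * rowHeight, totalHeight: rowCount * rowHeight };
}
```

Take 50,000 rows of an assumed 24 px each and a 720 px viewport. Mid-scroll, this renders 40 rows (30 visible plus 5 overscan on each side) instead of 50,000.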
Implementation Approach
- Shell: Standard Shadcn layout components (Sidebar, TopNav).
- Data Grid: A custom-built component using TanStack Table's hook logic, styled with Shadcn Table primitives.
- Charts: Integration of Recharts or Tremor (which matches Shadcn's style) for statistical visualizations.
Customization Strategy
- Tokens: Neutral gray base with "Scientific Blue" as the primary action color.
- Typography: Sans-serif (Geist or Inter) for the UI; Monospace (JetBrains Mono) for data cells and statistical metrics.
- Density: "High-Density" mode by default for the grid (small cell padding) to maximize data visibility.
2. Core User Experience
2.1 Defining Experience
The defining interaction of Data_analysis is the "Guided Data Hygiene Loop". It transforms the tedious task of cleaning data into a rapid, rewarding conversation with the system. Users don't "edit cells"; they respond to intelligent insights that actively improve their model's quality in real-time.
2.2 User Mental Model
- Current Model: "I have to manually hunt for errors row by row in Excel, then delete them and hope I didn't break anything."
- Target Model: "The system is my Quality Assistant. It points out the issues, I make the executive decision, and I instantly see the result."
2.3 Success Criteria
- Speed: Reviewing and fixing 50 outliers should take less than 30 seconds.
- Safety: Users must feel that "excluding" data is non-destructive (reversible).
- Reward: Every fix must trigger a positive visual feedback (e.g., model accuracy score pulsing green).
2.4 Novel UX Patterns
- "Contextual Insight Panel": Instead of modal popups, a slide-over panel allows users to see the specific rows in question (highlighted in the grid) while reviewing the statistical explanation (boxplot/histogram) side-by-side.
- "Live Impact Preview": Before the user confirms an exclusion, hovering over the button shows a "Ghost Curve" of how the regression line would change.
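The "Ghost Curve" needs two fits: one on the current rows and one on the rows that would remain after the exclusion. The sketch below assumes a single predictor and ordinary least squares. `fitLine` and `previewExclusion` are hypothetical names, and a multi-variable model would need the full regression instead.

```typescript
interface LineFit {
  slope: number;
  intercept: number;
  r2: number; // coefficient of determination of the fit
}

// Ordinary least squares for y = intercept + slope * x.
function fitLine(xs: number[], ys: number[]): LineFit {
  const n = xs.length;
  if (n < 2 || ys.length !== n) throw new Error("need at least two (x, y) pairs");
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let sxx = 0;
  let sxy = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    sxx += dx * dx;
    sxy += dx * dy;
    syy += dy * dy;
  }
  if (sxx === 0) throw new Error("x is constant: the fit is singular");
  const slope = sxy / sxx;
  const intercept = meanY - slope * meanX;
  // With one predictor and an intercept, R² equals the squared Pearson r.
  // It is undefined (NaN) when y is constant.
  const r2 = syy === 0 ? NaN : (sxy * sxy) / (sxx * syy);
  return { slope, intercept, r2 };
}

// Current fit versus the "Ghost Curve" fit without the rows about to be excluded.
function previewExclusion(
  xs: number[],
  ys: number[],
  excluded: ReadonlySet<number>,
): { current: LineFit; ghost: LineFit } {
  const kept = xs.map((_, i) => i).filter((i) => !excluded.has(i));
  return {
    current: fitLine(xs, ys),
    ghost: fitLine(kept.map((i) => xs[i]), kept.map((i) => ys[i])),
  };
}
```

To draw each line, the chart can evaluate `intercept + slope * x` at the two ends of the x-axis.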
2.5 Experience Mechanics
- Initiation: System highlights "dirty" columns with a subtle warning badge in the header.
- Interaction: User clicks the header badge. The Insight Panel slides in.
- Feedback: The panel shows "34 values are > 3 Sigma" (more than three standard deviations from the column mean). The grid highlights these 34 rows.
- Action: User clicks "Exclude All". Rows fade to gray. The Regression R² badge updates from 0.65 to 0.82 with a celebration animation.
- Completion: The column header badge turns into a green checkmark.
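The "> 3 Sigma" check in these mechanics can be sketched as a z-score filter. The sketch assumes the sample standard deviation and skips missing values. `sigmaOutliers` is a hypothetical name.

```typescript
// Flags the indices of values more than k sample standard deviations from the
// mean. Non-finite entries (missing values) are ignored.
function sigmaOutliers(values: number[], k = 3): number[] {
  const finite = values.filter((v) => Number.isFinite(v));
  if (finite.length < 2) return [];
  const mean = finite.reduce((a, b) => a + b, 0) / finite.length;
  const variance =
    finite.reduce((acc, v) => acc + (v - mean) ** 2, 0) / (finite.length - 1);
  const sd = Math.sqrt(variance);
  if (sd === 0) return []; // constant column: nothing stands out
  const flagged: number[] = [];
  values.forEach((v, i) => {
    if (Number.isFinite(v) && Math.abs(v - mean) > k * sd) flagged.push(i);
  });
  return flagged;
}
```

Two caveats apply. First, the mean and standard deviation include the outliers themselves, so several extreme values can inflate the spread and hide each other (masking). Median/MAD-based scores are a common, more robust alternative. Second, with the sample standard deviation no value can lie more than (n - 1)/sqrt(n) deviations from the mean. A column with 10 or fewer values can therefore never trigger a 3σ flag.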
Visual Design Foundation
Color System
- Neutral: Slate (50-900) - Technical, cold background for heavy data.
- Primary: Indigo (600) - For primary actions ("Run Regression").
- Semantic Data Colors:
- Rose (500): Outliers/Errors (Soft alert).
- Emerald (500): Valid Data/Success (Reassurance).
- Amber (500): Warnings/Missing Values.
- Modes: Fully supported Dark Mode using Slate-900 backgrounds and Indigo-400 primary accents.
Typography System
- Interface: `Inter` (or `Geist Sans`) - Clean, legible at small sizes.
- Data: `JetBrains Mono` - Mandatory for the grid to ensure tabular alignment of decimals.
Spacing & Layout Foundation
- Grid Density: Ultra-compact (4px y-padding) to maximize data visibility.
- Panel Density: Comfortable (16px padding) for reading insights.
- Layout: Full-width liquid layout. No wasted margins.
Accessibility Considerations
- Contrast: Ensure data text (Slate-700) on row backgrounds meets the WCAG AA minimum contrast ratio of 4.5:1 for normal-size text.
- Focus States: High-visibility focus rings (Indigo-500 ring) for keyboard navigation in the grid.
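Contrast can be checked automatically with the WCAG 2.x relative-luminance formula. The hex values below are assumed to be Tailwind CSS v3's Slate palette (Slate-700 `#334155`, Slate-50 `#f8fafc`). Tailwind v4 defines its default palette in OKLCH, so those values would need converting first.

```typescript
// WCAG 2.x relative luminance of a #rrggbb color.
function relativeLuminance(hex: string): number {
  const match = /^#?([0-9a-f]{6})$/i.exec(hex);
  if (!match) throw new Error(`expected #rrggbb, got ${hex}`);
  const digits = match[1];
  const [r, g, b] = [0, 2, 4].map((i) => {
    const c = parseInt(digits.slice(i, i + 2), 16) / 255;
    // Undo the sRGB transfer curve to get linear light.
    return c <= 0.03928 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4;
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Ratio between the lighter and darker color, from 1:1 up to 21:1.
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

const AA_NORMAL_TEXT = 4.5; // WCAG 2.1 AA minimum for normal-size text
const slate700OnSlate50 = contrastRatio("#334155", "#f8fafc");
console.log(slate700OnSlate50 >= AA_NORMAL_TEXT ? "passes AA" : "fails AA"); // prints "passes AA"
```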
Design Direction Decision
Design Directions Explored
Multiple design approaches were evaluated to balance density, readability, and modern aesthetics:
- "Corporate Legacy": Mimicking Minitab/Excel directly (too cluttered).
- "Creative Canvas": Like Notion/Miro (too open-ended).
- "Lab & Tech": A hybrid of Vercel's minimalism and Excel's density.
Chosen Direction
"Lab & Tech" with Shadcn UI & TanStack Table
- Visual Style: Minimalist, data-first, with a strong Dark Mode.
- Components: Shadcn UI for the shell, TanStack Table for the grid.
- Palette: Slate + Indigo + Rose/Emerald semantic indicators.
Design Rationale
- User Fit: Matches Julien's need for a professional, distraction-free environment.
- Modernity: Positions the tool as a "Next-Gen" product compared to legacy competitors.
- Scalability: The component library allows for easy addition of complex statistical widgets later.
Implementation Approach
- CSS Framework: Tailwind CSS.
- Component Library: Shadcn UI (Radix based).
- Icons: Lucide React.
- Charts: Recharts.
User Journey Flows
Journey 1: Julien - The Guided Hygiene Loop
This flow details how Julien interacts with the system to clean his data. The focus is on the "Ping-Pong" interaction between the Grid and the Insight Panel.
```mermaid
graph TD
    A[Start: File Uploaded] --> B{System Checks}
    B -->|Clean| C[Grid View: Standard]
    B -->|Issues Found| D[Grid View: Warning Badge on Header]
    D --> E(User Clicks Badge)
    E --> F[Action: Open Insight Panel]
    subgraph Insight Panel Interaction
        F --> G[Display: Issue Description + Chart]
        G --> H[Display: Proposed Fix]
        H --> I{User Decision}
        I -->|Ignore| J[Close Panel & Remove Badge]
        I -->|Apply Fix| K[Action: Update Grid Data]
    end
    K --> L[Feedback: Toast 'Fix Applied']
    L --> M[Update Model Score R²]
    M --> N[End: Ready for Regression]
```
Journey 2: Marc - Mobile Decision Making
Optimized for touch and "Read-Only" consumption. No dense grids, just insights.
```mermaid
graph TD
    A[Start: Click Link in Email] --> B[View: Mobile Dashboard]
    B --> C[Display: Key Metrics Cards]
    B --> D[Display: Regression Chart]
    D --> E(User Taps Data Point)
    E --> F[Action: Show Tooltip Details]
    subgraph Decision
        F --> G{Is Data Valid?}
        G -->|No| H[Action: Add Comment 'Check this']
        G -->|Yes| I[Action: Click 'Approve Analysis']
    end
    H --> J[Notify Julien]
    I --> K[Generate PDF & Archive]
```
Journey 3: Error Handling - The "Graceful Fail"
Ensuring the system handles bad inputs without crashing the Python backend.
```mermaid
graph TD
    A[Start: Upload 50MB .xlsb] --> B{Validation Service}
    B -->|Success| C[Proceed to Parsing]
    B -->|Fail: Macros Detected| D[State: Upload Error]
    D --> E[Display: Error Modal]
    E --> F[Content: 'Security Risk Detected']
    E --> G[Action: 'Sanitize & Retry' Button]
    G --> H{Sanitization}
    H -->|Success| C
    H -->|Fail| I[Display: 'Please upload .xlsx or .csv']
```
Flow Optimization Principles
- Non-Blocking Errors: Warnings (like outliers) should never block the user from navigating. They are "suggestions", not "gates".
- Context Preservation: When opening the Insight Panel, the relevant grid columns must scroll into view automatically.
- Optimistic UI: When Julien clicks "Apply Fix", the UI updates instantly (Gray out rows) even while the backend saves the state.
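Optimistic UI can be sketched as "apply locally, then persist, roll back on failure". `saveExclusions` stands in for whichever backend call persists the state. Both it and `applyFixOptimistically` are hypothetical.

```typescript
// Hypothetical optimistic update: gray rows out immediately, persist in the
// background, and roll back if the save fails.
async function applyFixOptimistically(
  current: ReadonlySet<string>,
  rowsToExclude: string[],
  render: (excluded: ReadonlySet<string>) => void,
  saveExclusions: (excluded: ReadonlySet<string>) => Promise<void>,
): Promise<ReadonlySet<string>> {
  const next = new Set(current);
  for (const id of rowsToExclude) next.add(id);
  render(next); // the UI updates before the backend answers
  try {
    await saveExclusions(next);
    return next;
  } catch {
    render(current); // restore the previous state; an error toast would go here
    return current;
  }
}
```

A rollback should also surface a message, so the user knows the fix did not persist.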
Component Strategy
Design System Components (Shadcn UI)
We will rely on the standard library for:
- Layout: `Sheet` (for Insight Panel), `ScrollArea`, `Resizable`.
- Forms: `Button`, `Input`, `Select`, `Switch`.
- Feedback: `Toast`, `Progress`, `Skeleton` (for loading states).
Custom Components Specification
1. <SmartGrid />
The central nervous system of the app.
- Purpose: Virtualized rendering of massive datasets with Excel-like interactions.
- Core Props:
  - `data: any[]` - The raw dataset.
  - `columns: ColumnDef[]` - Definitions including types and formatters.
  - `onCellEdit: (rowId, colId, value) => void` - Handler for data mutation.
  - `highlightedRows: string[]` - IDs of rows to highlight (e.g., outliers).
- Key States: `Loading`, `Empty`, `Filtering`, `Editing`.
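These props could be tightened in two ways, sketched here with hypothetical stand-in types. The real grid would use TanStack Table's `ColumnDef` rather than the simplified `GridColumn` shown. First, replace `any[]` with a typed row. Second, turn `highlightedRows` into a `Set` once per render, so each visible row checks membership in constant time instead of scanning the array.

```typescript
// Simplified stand-ins for illustration; not TanStack Table's actual types.
type CellValue = string | number | null;
type GridRow = { id: string } & Record<string, CellValue>;

interface GridColumn {
  id: string;
  type: "numeric" | "categorical" | "date";
  format?: (value: CellValue) => string;
}

interface SmartGridProps {
  data: GridRow[];
  columns: GridColumn[];
  onCellEdit: (rowId: string, colId: string, value: CellValue) => void;
  highlightedRows: string[];
}

// Build the highlight lookup once; each rendered row then calls isHighlighted(row.id).
function makeHighlightCheck(props: Pick<SmartGridProps, "highlightedRows">): (rowId: string) => boolean {
  const ids = new Set(props.highlightedRows);
  return (rowId) => ids.has(rowId);
}
```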
2. <InsightPanel />
The container for Explainable AI interactions.
- Purpose: Contextual sidebar for statistical insights and data cleaning.
- Core Props:
  - `isOpen: boolean` - Visibility state.
  - `insight: InsightObject` - Contains `{ type: 'outlier' | 'correlation', description: string, chartData: any }`.
  - `onApplyFix: () => Promise<void>` - Async handler for the fix action.
- Anatomy: Header (Title + Close), Body (Text + Recharts Graph), Footer (Action Buttons).
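The `chartData: any` field could be typed per insight kind with a discriminated union, so the compiler checks that the panel handles each kind. The field shapes below, such as `flaggedIndices` and `r`, and the `panelTitle` helper are hypothetical.

```typescript
// Hypothetical refinement of InsightObject: chartData is typed per `type`.
type InsightObject =
  | {
      type: "outlier";
      description: string;
      chartData: { values: number[]; flaggedIndices: number[] };
    }
  | {
      type: "correlation";
      description: string;
      chartData: { x: number[]; y: number[]; r: number };
    };

// Header title for the panel; the switch is checked for exhaustiveness.
function panelTitle(insight: InsightObject): string {
  switch (insight.type) {
    case "outlier":
      return `Potential outliers: ${insight.chartData.flaggedIndices.length}`;
    case "correlation":
      return `Correlation: r = ${insight.chartData.r.toFixed(2)}`;
    default: {
      // Adding a new insight type without handling it here is a compile error.
      const unhandled: never = insight;
      return unhandled;
    }
  }
}
```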
3. <ColumnHeader />
A rich header component for the grid.
- Purpose: Show name, type, and distribution summary.
- Core Props:
  - `label: string`.
  - `type: 'numeric' | 'categorical' | 'date'`.
  - `distribution: number[]` - Data for the sparkline mini-chart.
  - `hasWarning: boolean` - Triggers the red badge.
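The `distribution` prop can be produced by counting values into equal-width bins. This sketch assumes 12 bins by default and skips missing values. `histogram` is a hypothetical helper. It tracks the minimum and maximum in a loop rather than calling `Math.min(...values)`, because spreading very large arrays into a function call can throw a `RangeError`.

```typescript
// Count finite values into `bins` equal-width buckets for the header sparkline.
function histogram(values: number[], bins = 12): number[] {
  const counts = new Array<number>(bins).fill(0);
  let min = Infinity;
  let max = -Infinity;
  let finiteCount = 0;
  for (const v of values) {
    if (!Number.isFinite(v)) continue; // skip NaN / missing values
    finiteCount += 1;
    if (v < min) min = v;
    if (v > max) max = v;
  }
  if (finiteCount === 0) return counts; // no usable values
  if (min === max) {
    counts[0] = finiteCount; // constant column: a single bar
    return counts;
  }
  const width = (max - min) / bins;
  for (const v of values) {
    if (!Number.isFinite(v)) continue;
    // Clamp so the maximum lands in the last bin rather than one past it.
    const index = Math.min(bins - 1, Math.floor((v - min) / width));
    counts[index] += 1;
  }
  return counts;
}
```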
Implementation Roadmap
- Phase 1 (Grid Core): Implement `SmartGrid` with read-only virtualization (TanStack Table with TanStack Virtual).
- Phase 2 (Interaction): Add `ColumnHeader` visualization and `onCellEdit` logic.
- Phase 3 (Intelligence): Build the `InsightPanel` and connect it to the outlier detection logic.
UX Consistency Patterns
Button Hierarchy
- Primary (Indigo): Reserved for "Positive Progression" actions (Run Regression, Save, Export). Only one per view.
- Secondary (White/Outline): For "Alternative" actions (Cancel, Clear Filter, Close Panel).
- Destructive (Rose): For "Irreversible" actions (Exclude Data, Delete Project). Always requires a confirmation step if significant.
- Ghost (Transparent): For tertiary actions inside toolbars (e.g., "Sort Ascending" icon button) to reduce visual noise.
Feedback Patterns
- Toasts (Ephemeral): Used for success confirmations ("Data saved", "Model updated"). Position: Bottom-Right. Duration: 3s.
- Inline Validation: Used for data entry errors within the grid (e.g., entering text in a numeric column). Immediate red border + tooltip.
- Global Status: A persistent "Status Bar" at the top showing the system state (Ready / Processing... / Done).
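Inline validation can run the edited text through a parser keyed on the column type. The grid then shows the red border and tooltip whenever the parser returns an error. The `parseCellEdit` helper, its ISO `YYYY-MM-DD` date format and its messages are assumptions for illustration.

```typescript
type ColumnType = "numeric" | "categorical" | "date";
type EditResult = { ok: true; value: number | string } | { ok: false; error: string };

// Hypothetical parser for text typed into a cell, keyed on the column type.
function parseCellEdit(type: ColumnType, raw: string): EditResult {
  const text = raw.trim();
  if (type === "numeric") {
    // Accept a decimal comma as well as a decimal point.
    const n = Number(text.replace(",", "."));
    return text !== "" && Number.isFinite(n)
      ? { ok: true, value: n }
      : { ok: false, error: `"${raw}" is not a number` };
  }
  if (type === "date") {
    return /^\d{4}-\d{2}-\d{2}$/.test(text) && !Number.isNaN(Date.parse(text))
      ? { ok: true, value: text }
      : { ok: false, error: `"${raw}" is not a date (expected YYYY-MM-DD)` };
  }
  return { ok: true, value: text }; // categorical: any text is valid
}
```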
Grid Interaction Patterns (Excel Compatibility)
- Navigation: Arrow keys move focus between cells. Tab moves right. Enter moves down.
- Selection: Click to select cell. Shift+Click to select range. Click row header to select row.
- Editing: Double-click or `Enter` starts editing. `Esc` cancels. `Enter` saves.
- Context Menu: Right-click triggers an action menu specific to the selected object (Cell vs Row vs Column).
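The navigation rules above reduce to a small function from the focused cell and a key to the next focused cell, applied when the grid is not in editing mode. The Shift+Tab and Shift+Enter reversals mirror Excel and are an assumption beyond the listed rules. `nextFocus` is a hypothetical name.

```typescript
interface CellPos {
  row: number;
  col: number;
}

// Hypothetical focus movement in navigation (non-editing) mode, clamped to the grid.
function nextFocus(pos: CellPos, key: string, shift: boolean, rowCount: number, colCount: number): CellPos {
  let { row, col } = pos;
  switch (key) {
    case "ArrowUp": row -= 1; break;
    case "ArrowDown": row += 1; break;
    case "ArrowLeft": col -= 1; break;
    case "ArrowRight": col += 1; break;
    case "Tab": col += shift ? -1 : 1; break;   // Shift+Tab moves left (assumed, as in Excel)
    case "Enter": row += shift ? -1 : 1; break; // Shift+Enter moves up (assumed, as in Excel)
    default: return pos; // other keys do not move focus
  }
  return {
    row: Math.min(rowCount - 1, Math.max(0, row)),
    col: Math.min(colCount - 1, Math.max(0, col)),
  };
}
```

If the grid captures Tab, it should still offer a documented way to move focus out of the grid. Otherwise it creates a keyboard trap (WCAG 2.1.2).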
Empty States
- No Data: Don't show an empty grid. Show a "Drop Zone" with a clear CTA ("Upload Excel File") and sample datasets for exploration.
- No Selection: When the Insight Panel is open but nothing is selected, show a helper illustration ("Select a column to see stats").
Responsive Design & Accessibility
Responsive Strategy
- Desktop Only: The application is strictly optimized for high-resolution desktop displays (1366px width minimum). No responsive breakpoints for mobile or tablet will be implemented.
- Layout Focus: Use a fixed Sidebar + Liquid Grid layout. The grid will expand to fill all available horizontal space.
Breakpoint Strategy
- Default: 1440px+ (Optimized).
- Minimum: 1280px (Functional). Below this, a horizontal scrollbar will appear for the entire app shell to preserve data integrity.
Accessibility Strategy
- Compliance: WCAG 2.1 Level AA.
- Keyboard First: Full focus on making the Data Grid and Insight Panel navigable without a mouse.
- Screen Reader support: Required for statistical summaries and report highlights.
Testing Strategy
- Browsers: Chrome, Edge, and Firefox (latest 2 versions).
- Devices: Standard laptops (13" to 16") and external monitors (24"+).
Implementation Guidelines
- Container Query: Use `@container` for complex widgets (like the Insight Panel) to adapt their layout based on the sidebar's width rather than the screen width.
- Focus Management: Ensure the focus ring is never hidden and follows a logical order (Sidebar -> Grid -> Insight Panel).