Add automatic note clustering using density-based algorithm (DBSCAN variant) and bridge notes detection for connecting different thematic clusters.
Features:
- NoteCluster, ClusterMember, BridgeNote, BridgeSuggestion models
- Clustering service with pgvector cosine similarity
- Bridge notes detection (notes connecting >=2 clusters)
- AI-powered suggestions for missing cluster connections
- /insights page with React Flow visualization
- Cron endpoint for automatic recalculation
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| title | type | created | status | baseline_commit | context |
|---|---|---|---|---|---|
| Cluster Detection & Bridge Notes Discovery | feature | 2026-05-23 | done | 2aed148dc2 | |
Intent
Problem: Users have hundreds of notes but no automatic way to discover thematic clusters or connections between unrelated topics. Notes remain siloed in notebooks, and users manually create links between related concepts across different notebooks.
Approach: Implement automatic clustering with a density-based algorithm (a DBSCAN variant that approximates HDBSCAN) on note embeddings, detect "bridge notes" that connect multiple clusters, and provide AI-powered suggestions for missing connections. Visualize the clusters as a graph, alongside a dashboard showing insights and connection opportunities.
Boundaries & Constraints
Always:
- Use existing pgvector embeddings from NoteEmbedding table
- Cluster per-user only (never cross-user data)
- All SQL uses parameterized queries (no interpolation)
- HDBSCAN: Node.js compatible implementation or custom density-based clustering
- Bridge score = len(clusters_touched) / max_clusters (0-1 range)
- Threshold for cluster membership: >= 0.3 similarity
- Threshold for bridge detection: >= 0.5 cosine similarity to >= 2 clusters
- Incremental recalculation only when > 5% embeddings changed or > 10 notes modified
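The parameterized-query and per-user rules above could look roughly like the sketch below for a pgvector neighbor lookup. It only builds a `{ text, values }` pair in the shape node-postgres accepts. The column names (`"userId"`, `"noteId"`, `embedding`) and the function name are assumptions for illustration, not the actual schema. `<=>` is pgvector's cosine distance operator, so similarity is `1 - distance`.

```typescript
// Sketch only: column names and this helper are hypothetical.
type SqlQuery = { text: string; values: unknown[] };

function buildNeighborQuery(
  userId: string,
  queryEmbedding: number[],
  maxCosineDistance: number,
  limit: number
): SqlQuery {
  // Every user-supplied value travels as a bound parameter ($1..$4),
  // never interpolated into the SQL text.
  const text = `
    SELECT "noteId", 1 - (embedding <=> $2::vector) AS similarity
    FROM "NoteEmbedding"
    WHERE "userId" = $1
      AND (embedding <=> $2::vector) <= $3
    ORDER BY embedding <=> $2::vector
    LIMIT $4`;
  // pgvector accepts a vector literal such as '[0.1,0.2]' cast to ::vector.
  const vectorLiteral = `[${queryEmbedding.join(",")}]`;
  return { text, values: [userId, vectorLiteral, maxCosineDistance, limit] };
}
```

Scoping by `"userId"` in the `WHERE` clause keeps the lookup per-user, in line with the "never cross-user data" constraint.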
Ask First:
- External Python microservice vs Node.js native clustering
- Cluster auto-naming: exact LLM prompt wording
- Dashboard UI placement: separate page vs modal vs sidebar panel
- Visualization library: D3.js, Cytoscape.js, or React Flow
Never:
- HDBSCAN pip install (Python) — we are a Node.js stack
- Real-time clustering on every note save
- Cross-user cluster analysis
- Modifying existing NoteLink or MemoryEchoInsight behavior
- Manual cluster assignment by users (out of scope for v1)
I/O & Edge-Case Matrix
| Scenario | Input / State | Expected Output / Behavior | Error Handling |
|---|---|---|---|
| CLUSTER_NEW_USER | User with < 10 notes, no existing clusters | Empty cluster list, message "Need more notes to cluster" | Return gracefully with status: insufficient_data |
| CLUSTER_EXISTING | User with 100+ notes, embeddings exist | N clusters (3-50), each with auto-generated name | Log errors, continue with partial results |
| BRIDGE_DETECTION | Note similar to cluster A (0.7) and cluster B (0.6), 4 clusters in total | Marked as bridge, bridge_score = 2 / 4 = 0.5, clusters_connected = [A, B] | Skip if similarity < 0.5 |
| NO_BRIDGES | User with isolated clusters (no cross-cluster similarities) | Empty bridge list, suggestions for missing connections | Return empty array |
| SUGGEST_CONNECTIONS | Cluster A (Recipes) and Cluster B (Health) unconnected | 3 AI-suggested bridge note ideas with titles/descriptions | Fallback to generic "Connect [A] and [B]" if LLM fails |
| INCREMENTAL_UPDATE | 3 notes modified out of 200 | Skip recalculation (< 5% threshold) | Return cached results |
| MAJOR_UPDATE | 15 notes modified out of 200 | Trigger full recalculation | Queue background job, return immediately |
| EMBEDDING_MISSING | Note exists but no embedding in NoteEmbedding | Exclude from clustering | Log warning, continue with available notes |
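The CLUSTER_NEW_USER and EMBEDDING_MISSING rows can be sketched as a single input-preparation step. This is a minimal illustration, not the service code: the type and function names are hypothetical, and it assumes the 10-note minimum is applied to notes that actually have an embedding.

```typescript
// Sketch only: names are hypothetical; the threshold of 10 comes from the matrix above.
const MIN_NOTES_FOR_CLUSTERING = 10;

type NoteRow = { id: string; embedding: number[] | null };
type UsableNote = { id: string; embedding: number[] };
type ClusteringInput =
  | { status: "insufficient_data"; message: string }
  | { status: "ready"; notes: UsableNote[]; skippedNoteIds: string[] };

function prepareClusteringInput(rows: NoteRow[]): ClusteringInput {
  const notes: UsableNote[] = [];
  const skippedNoteIds: string[] = [];
  for (const row of rows) {
    if (row.embedding === null) {
      skippedNoteIds.push(row.id); // EMBEDDING_MISSING: exclude, do not fail
      continue;
    }
    notes.push({ id: row.id, embedding: row.embedding });
  }
  if (skippedNoteIds.length > 0) {
    console.warn(`Skipping ${skippedNoteIds.length} notes without embeddings`);
  }
  if (notes.length < MIN_NOTES_FOR_CLUSTERING) {
    // CLUSTER_NEW_USER: return gracefully instead of throwing
    return { status: "insufficient_data", message: "Need more notes to cluster" };
  }
  return { status: "ready", notes, skippedNoteIds };
}
```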
Code Map
- memento-note/prisma/schema.prisma -- Add NoteCluster and BridgeNote models
- memento-note/lib/ai/services/clustering.service.ts -- NEW: HDBSCAN-style clustering algorithm
- memento-note/lib/ai/services/bridge-notes.service.ts -- NEW: Bridge detection and suggestions
- memento-note/app/api/clusters/route.ts -- NEW: GET /api/clusters, POST /api/clusters/recalculate
- memento-note/app/api/bridge-notes/route.ts -- NEW: GET /api/bridge-notes, GET /api/bridge-notes/suggestions
- memento-note/app/api/cron/clusters/route.ts -- NEW: Cron endpoint for daily recalculation
- memento-note/components/cluster-visualization.tsx -- NEW: Graph visualization with cluster coloring
- memento-note/components/bridge-notes-dashboard.tsx -- NEW: Dashboard showing insights and suggestions
- memento-note/app/(main)/insights/page.tsx -- NEW: Insights page housing the dashboard
Tasks & Acceptance
Execution:
- memento-note/prisma/schema.prisma -- Add NoteCluster and BridgeNote models with proper indexes -- Store cluster mappings and bridge metadata
- memento-note/lib/ai/services/clustering.service.ts -- Implement density-based clustering (HDBSCAN-style in Node.js) -- Core algorithm for automatic note grouping
- memento-note/lib/ai/services/bridge-notes.service.ts -- Implement bridge detection and AI suggestion generation -- Find cross-cluster connections
- memento-note/app/api/clusters/route.ts -- Create cluster listing and recalculation endpoints -- REST API for cluster operations
- memento-note/app/api/bridge-notes/route.ts -- Create bridge notes and suggestions endpoints -- REST API for bridge operations
- memento-note/app/api/cron/clusters/route.ts -- Create cron endpoint for scheduled recalculation -- Automated background processing
- memento-note/components/cluster-visualization.tsx -- Build interactive graph with cluster coloring -- Visual representation of note clusters
- memento-note/components/bridge-notes-dashboard.tsx -- Build insights dashboard with stats and suggestions -- User-facing cluster and bridge information
- memento-note/app/(main)/insights/page.tsx -- Create main insights page -- Container for visualization and dashboard
- memento-note/lib/ai/services/clustering.service.ts -- Add cluster auto-naming via LLM -- Human-readable cluster names
- memento-note/lib/ai/services/bridge-notes.service.ts -- Add incremental recalculation logic -- Efficient updates without full recompute
Acceptance Criteria:
- Given a user with 50+ notes with embeddings, when the clustering API is called, then notes are grouped into 3-20 clusters based on semantic similarity
- Given a cluster has been created, when the cluster is viewed, then it has an auto-generated name (2-4 words) describing the common theme
- Given a note is similar to >= 2 clusters with cosine >= 0.5, when bridge detection runs, then the note is marked as a bridge note with correct bridge_score
- Given isolated clusters exist (no bridges), when suggestions are requested, then 3 AI-generated bridge note ideas are returned per cluster pair
- Given the insights page is loaded, when displayed, then clusters are color-coded and bridge notes have golden borders
- Given < 10 notes exist for a user, when clustering is requested, then API returns insufficient_data status gracefully
- Given > 5% of notes have been modified, when the cron runs, then full recalculation is triggered
- Given <= 5% of notes have been modified, when the cron runs, then cached results are returned
Design Notes
Clustering Algorithm Choice
Since the reference HDBSCAN implementation is a Python library and we're a Node.js stack, we implement a simplified density-based clustering:
- Pairwise similarity matrix: Use pgvector cosine similarity for all note pairs
- DBSCAN variant:
- For each note, find neighbors within epsilon = 0.3 cosine distance
- Form clusters from dense regions (min_cluster_size = 3)
- Mark outliers as noise (cluster_id = -1)
This approximates HDBSCAN's core benefits: no preset cluster count, outlier detection, handles varying cluster sizes.
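The DBSCAN variant described above can be sketched in memory as follows. This is an illustrative version rather than the code in clustering.service.ts: it computes cosine distance in JavaScript instead of through pgvector, and it treats min_cluster_size as DBSCAN's usual "minimum neighbors, including the point itself" parameter, which is an assumption. All type and function names are hypothetical.

```typescript
// Sketch only: in-memory DBSCAN over note embeddings; names are hypothetical.
type NoteVector = { noteId: string; embedding: number[] };

const NOISE = -1;

function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 1; // treat zero vectors as unrelated
  return 1 - dot / Math.sqrt(normA * normB);
}

function dbscan(notes: NoteVector[], epsilon = 0.3, minPoints = 3): Map<string, number> {
  // neighbors[i] lists every index within epsilon of note i (including i itself).
  const neighbors: number[][] = notes.map((a) => {
    const close: number[] = [];
    notes.forEach((b, j) => {
      if (cosineDistance(a.embedding, b.embedding) <= epsilon) close.push(j);
    });
    return close;
  });

  const labels = new Map<number, number>(); // note index -> cluster id or NOISE
  let clusterId = 0;
  neighbors.forEach((own, i) => {
    if (labels.has(i)) return;
    if (own.length < minPoints) {
      labels.set(i, NOISE); // may later be claimed as a border point
      return;
    }
    labels.set(i, clusterId);
    const queue = [...own];
    while (queue.length > 0) {
      const j = queue.pop() as number;
      const current = labels.get(j);
      if (current === NOISE) labels.set(j, clusterId); // border point joins the cluster
      if (current !== undefined) continue; // already processed
      labels.set(j, clusterId);
      const reach = neighbors[j] ?? [];
      if (reach.length >= minPoints) queue.push(...reach); // expand only from core points
    }
    clusterId++;
  });

  return new Map<string, number>(
    notes.map((note, i): [string, number] => [note.noteId, labels.get(i) ?? NOISE])
  );
}
```

The pairwise neighbor search is quadratic in the number of notes, which is one reason to run it from a background job rather than on every note save.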
Bridge Score Formula
bridge_score = len(clusters_touched) / max_possible_clusters
- A note touching 2 of 5 clusters = 0.4
- A note touching 5 of 5 clusters = 1.0 (super-connector)
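As a worked example of the formula, the sketch below combines the bridge threshold from Boundaries & Constraints (cosine similarity >= 0.5 to at least 2 clusters) with the score calculation. The function name and the input shape (a map from cluster id to the note's similarity with that cluster) are assumptions for illustration.

```typescript
// Sketch only: input shape and names are hypothetical.
const BRIDGE_SIMILARITY_THRESHOLD = 0.5;
const MIN_CLUSTERS_FOR_BRIDGE = 2;

type BridgeResult = { isBridge: boolean; bridgeScore: number; clustersConnected: string[] };

function scoreBridge(similarityByCluster: Record<string, number>, totalClusters: number): BridgeResult {
  // Clusters the note "touches": similarity at or above the bridge threshold.
  const clustersConnected = Object.entries(similarityByCluster)
    .filter(([, similarity]) => similarity >= BRIDGE_SIMILARITY_THRESHOLD)
    .map(([clusterId]) => clusterId);
  const isBridge = clustersConnected.length >= MIN_CLUSTERS_FOR_BRIDGE;
  // bridge_score = len(clusters_touched) / max_possible_clusters, 0 for non-bridges.
  const bridgeScore = isBridge && totalClusters > 0 ? clustersConnected.length / totalClusters : 0;
  return { isBridge, bridgeScore, clustersConnected };
}

// A note touching 2 of 5 clusters scores 0.4.
const example = scoreBridge({ A: 0.7, B: 0.6, C: 0.2, D: 0.1, E: 0.3 }, 5);
console.log(example.bridgeScore, example.clustersConnected);
```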
Cluster Naming
Use the 5 most central notes (highest mean similarity to other cluster members) as context for LLM:
"Here are 5 notes from a cluster. Summarize the common theme in 2-4 words:
1. [title] - [content snippet]
2. ..."
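One way to pick the most central notes and assemble the prompt above is sketched here. It assumes embeddings for the cluster members are already in memory, and every name is hypothetical. The LLM call itself is left out.

```typescript
// Sketch only: names are hypothetical; the prompt wording follows the template above.
type ClusterNote = { title: string; snippet: string; embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Rank notes by mean similarity to the other cluster members, highest first.
function mostCentralNotes(notes: ClusterNote[], limit = 5): ClusterNote[] {
  const scored = notes.map((note, i) => {
    let total = 0;
    notes.forEach((other, j) => {
      if (i !== j) total += cosineSimilarity(note.embedding, other.embedding);
    });
    const mean = notes.length > 1 ? total / (notes.length - 1) : 0;
    return { note, mean };
  });
  return scored
    .sort((x, y) => y.mean - x.mean)
    .slice(0, limit)
    .map((entry) => entry.note);
}

function buildNamingPrompt(notes: ClusterNote[]): string {
  const central = mostCentralNotes(notes);
  const lines = central.map((note, i) => `${i + 1}. ${note.title} - ${note.snippet}`);
  const header = `Here are ${central.length} notes from a cluster. Summarize the common theme in 2-4 words:`;
  return [header, ...lines].join("\n");
}
```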
UI Visualization
- Clusters: Each cluster gets a unique hue (HSL color wheel)
- Bridge notes: Golden border + 1.5x size
- Isolated clusters: Grayed out with "Isolated" badge
- Click interaction: Click cluster to zoom and show note list
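The hue assignment and bridge sizing could be computed along these lines. This is a sketch: the saturation and lightness defaults and the function names are assumptions, and only the evenly spaced hues and the 1.5x factor come from the notes above.

```typescript
// Sketch only: saturation/lightness defaults and names are hypothetical.
function clusterColors(clusterIds: string[], saturation = 65, lightness = 55): Map<string, string> {
  const colors = new Map<string, string>();
  clusterIds.forEach((id, index) => {
    // Spread hues evenly around the 360-degree HSL color wheel.
    const hue = Math.round((360 * index) / clusterIds.length);
    colors.set(id, `hsl(${hue}, ${saturation}%, ${lightness}%)`);
  });
  return colors;
}

// Bridge notes render at 1.5x the base node size.
function nodeSize(baseSize: number, isBridge: boolean): number {
  return isBridge ? baseSize * 1.5 : baseSize;
}
```

Evenly spaced hues become hard to tell apart as the cluster count grows, so a larger deployment might also vary lightness.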
Cron Strategy
- Daily: At 2 AM server time, recalculate for all active users
- Incremental check: Track last_modified timestamp on embeddings; if the share of embeddings changed since the last run is <= 5%, skip recalculation and serve cached results
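The recalculation decision can be expressed as a small predicate. The sketch below uses the two thresholds stated under Boundaries & Constraints (more than 5% changed, or more than 10 notes modified). The function name is hypothetical, and how the modified count is derived from last_modified is left out.

```typescript
// Sketch only: the function name is hypothetical; thresholds come from the constraints.
const CHANGED_RATIO_THRESHOLD = 0.05;
const CHANGED_COUNT_THRESHOLD = 10;

function shouldRecalculate(modifiedNotes: number, totalNotes: number): boolean {
  if (totalNotes <= 0) return false; // nothing to cluster
  const changedRatio = modifiedNotes / totalNotes;
  return changedRatio > CHANGED_RATIO_THRESHOLD || modifiedNotes > CHANGED_COUNT_THRESHOLD;
}

// INCREMENTAL_UPDATE: 3 of 200 modified (1.5%), so cached results are reused.
console.log(shouldRecalculate(3, 200));
// MAJOR_UPDATE: 15 of 200 modified (7.5%), so a full recalculation is triggered.
console.log(shouldRecalculate(15, 200));
```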
Verification
Commands:
- npx prisma migrate dev --name add_clustering_support -- Creates NoteCluster and BridgeNote tables
- npx prisma generate -- Regenerates Prisma client with new models
- npm run build -- Ensures TypeScript compilation succeeds
Manual checks:
- Visit /insights page — should show cluster visualization or "Need more notes" message
- Create 30+ test notes with varied topics — check that clusters form and have sensible names
- Modify a note, then recheck its cluster assignment: clusters should update within 24h or after a manual recalculate
- Check that bridge notes are visually distinct (golden border) in visualization
- POST /api/clusters/recalculate — should return 202 and trigger background job
- GET /api/bridge-notes/suggestions — should return AI-generated connection ideas
Suggested Review Order
Database schema
- New models for clustering and bridge tracking with proper indexes
memento-note/prisma/schema.prisma:767
Core clustering algorithm
- DBSCAN-style density-based clustering using pgvector cosine similarity
memento-note/lib/ai/services/clustering.service.ts:82
- Cluster membership scoring and centrality detection
memento-note/lib/ai/services/clustering.service.ts:239
Bridge notes detection
- Cross-cluster similarity analysis for bridge note identification
memento-note/lib/ai/services/bridge-notes.service.ts:52
- AI-powered suggestions for connecting isolated clusters
memento-note/lib/ai/services/bridge-notes.service.ts:173
API endpoints
- Cluster listing and recalculation with caching
memento-note/app/api/clusters/route.ts:14
- Bridge notes and suggestions endpoints
memento-note/app/api/bridge-notes/route.ts:14
- Scheduled cron job for automatic recalculation
memento-note/app/api/cron/clusters/route.ts:27
User interface
- Interactive cluster visualization with React Flow
memento-note/components/cluster-visualization.tsx:40
- Bridge notes dashboard with tabbed interface
memento-note/components/bridge-notes-dashboard.tsx:38
- Main insights page tying it all together
memento-note/app/(main)/insights/page.tsx:17