Momento/_bmad-output/implementation-artifacts/spec-cluster-detection-bridge-notes.md
Antigravity 077e665dfc feat(cluster): implement cluster detection and bridge notes discovery
Add automatic note clustering using density-based algorithm (DBSCAN variant)
and bridge notes detection for connecting different thematic clusters.

Features:
- NoteCluster, ClusterMember, BridgeNote, BridgeSuggestion models
- Clustering service with pgvector cosine similarity
- Bridge notes detection (notes connecting >=2 clusters)
- AI-powered suggestions for missing cluster connections
- /insights page with React Flow visualization
- Cron endpoint for automatic recalculation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 20:26:25 +00:00

---
title: 'Cluster Detection & Bridge Notes Discovery'
type: 'feature'
created: '2026-05-23'
status: 'done'
baseline_commit: '2aed148dc2a6de1914facae9fef65935b3d77ef5'
context: ['memento-note/prisma/schema.prisma', 'memento-note/lib/ai/services/semantic-search.service.ts']
---
<frozen-after-approval reason="human-owned intent — do not modify unless human renegotiates">
## Intent
**Problem:** Users have hundreds of notes but no automatic way to discover thematic clusters or connections between unrelated topics. Notes remain siloed in notebooks, and users manually create links between related concepts across different notebooks.
**Approach:** Implement automatic clustering using density-based algorithm (HDBSCAN equivalent) on note embeddings, detect "bridge notes" that connect multiple clusters, and provide AI-powered suggestions for missing connections. Visualize clusters graphically with a dashboard showing insights and connection opportunities.
## Boundaries & Constraints
**Always:**
- Use existing pgvector embeddings from NoteEmbedding table
- Cluster per-user only (never cross-user data)
- All SQL uses parameterized queries (no interpolation)
- HDBSCAN: Node.js compatible implementation or custom density-based clustering
- Bridge score = len(clusters_touched) / max_clusters (0-1 range)
- Threshold for cluster membership: >= 0.3 similarity
- Threshold for bridge detection: >= 0.5 cosine similarity to >= 2 clusters
- Incremental recalculation only when > 5% embeddings changed or > 10 notes modified
**Ask First:**
- External Python microservice vs Node.js native clustering
- Cluster auto-naming: exact LLM prompt wording
- Dashboard UI placement: separate page vs modal vs sidebar panel
- Visualization library: D3.js, Cytoscape.js, or React Flow
**Never:**
- HDBSCAN pip install (Python) — we are a Node.js stack
- Real-time clustering on every note save
- Cross-user cluster analysis
- Modifying existing NoteLink or MemoryEchoInsight behavior
- Manual cluster assignment by users (out of scope for v1)
## I/O & Edge-Case Matrix
| Scenario | Input / State | Expected Output / Behavior | Error Handling |
|----------|--------------|---------------------------|----------------|
| CLUSTER_NEW_USER | User with < 10 notes, no existing clusters | Empty cluster list, message "Need more notes to cluster" | Return gracefully with status: insufficient_data |
| CLUSTER_EXISTING | User with 100+ notes, embeddings exist | N clusters (3-50), each with auto-generated name | Log errors, continue with partial results |
| BRIDGE_DETECTION | Note similar to cluster A (0.7) and cluster B (0.6) | Marked as bridge, bridge_score = 0.5, clusters_connected = [A, B] | Skip if similarity < 0.5 |
| NO_BRIDGES | User with isolated clusters (no cross-cluster similarities) | Empty bridge list, suggestions for missing connections | Return empty array |
| SUGGEST_CONNECTIONS | Cluster A (Recipes) and Cluster B (Health) unconnected | 3 AI-suggested bridge note ideas with titles/descriptions | Fallback to generic "Connect [A] and [B]" if LLM fails |
| INCREMENTAL_UPDATE | 3 notes modified out of 200 | Skip recalculation (< 5% threshold) | Return cached results |
| MAJOR_UPDATE | 15 notes modified out of 200 | Trigger full recalculation | Queue background job, return immediately |
| EMBEDDING_MISSING | Note exists but no embedding in NoteEmbedding | Exclude from clustering | Log warning, continue with available notes |
</frozen-after-approval>
## Code Map
- `memento-note/prisma/schema.prisma` -- Add NoteCluster and BridgeNote models
- `memento-note/lib/ai/services/clustering.service.ts` -- NEW: HDBSCAN-style clustering algorithm
- `memento-note/lib/ai/services/bridge-notes.service.ts` -- NEW: Bridge detection and suggestions
- `memento-note/app/api/clusters/route.ts` -- NEW: GET /api/clusters, POST /api/clusters/recalculate
- `memento-note/app/api/bridge-notes/route.ts` -- NEW: GET /api/bridge-notes, GET /api/bridge-notes/suggestions
- `memento-note/app/api/cron/clusters/route.ts` -- NEW: Cron endpoint for daily recalculation
- `memento-note/components/cluster-visualization.tsx` -- NEW: Graph visualization with cluster coloring
- `memento-note/components/bridge-notes-dashboard.tsx` -- NEW: Dashboard showing insights and suggestions
- `memento-note/app/(main)/insights/page.tsx` -- NEW: Insights page housing the dashboard
## Tasks & Acceptance
**Execution:**
- [x] `memento-note/prisma/schema.prisma` -- Add NoteCluster and BridgeNote models with proper indexes -- Store cluster mappings and bridge metadata
- [x] `memento-note/lib/ai/services/clustering.service.ts` -- Implement density-based clustering (HDBSCAN-style in Node.js) -- Core algorithm for automatic note grouping
- [x] `memento-note/lib/ai/services/bridge-notes.service.ts` -- Implement bridge detection and AI suggestion generation -- Find cross-cluster connections
- [x] `memento-note/app/api/clusters/route.ts` -- Create cluster listing and recalculation endpoints -- REST API for cluster operations
- [x] `memento-note/app/api/bridge-notes/route.ts` -- Create bridge notes and suggestions endpoints -- REST API for bridge operations
- [x] `memento-note/app/api/cron/clusters/route.ts` -- Create cron endpoint for scheduled recalculation -- Automated background processing
- [x] `memento-note/components/cluster-visualization.tsx` -- Build interactive graph with cluster coloring -- Visual representation of note clusters
- [x] `memento-note/components/bridge-notes-dashboard.tsx` -- Build insights dashboard with stats and suggestions -- User-facing cluster and bridge information
- [x] `memento-note/app/(main)/insights/page.tsx` -- Create main insights page -- Container for visualization and dashboard
- [x] `memento-note/lib/ai/services/clustering.service.ts` -- Add cluster auto-naming via LLM -- Human-readable cluster names
- [x] `memento-note/lib/ai/services/bridge-notes.service.ts` -- Add incremental recalculation logic -- Efficient updates without full recompute
**Acceptance Criteria:**
- Given a user with 50+ notes with embeddings, when the clustering API is called, then notes are grouped into 3-20 clusters based on semantic similarity
- Given a cluster has been created, when the cluster is viewed, then it has an auto-generated name (2-4 words) describing the common theme
- Given a note is similar to >= 2 clusters with cosine >= 0.5, when bridge detection runs, then the note is marked as a bridge note with correct bridge_score
- Given isolated clusters exist (no bridges), when suggestions are requested, then 3 AI-generated bridge note ideas are returned per cluster pair
- Given the insights page is loaded, when displayed, then clusters are color-coded and bridge notes have golden borders
- Given < 10 notes exist for a user, when clustering is requested, then API returns insufficient_data status gracefully
- Given > 5% of embeddings have changed or > 10 notes have been modified, when the cron runs, then full recalculation is triggered
- Given <= 5% of embeddings have changed and <= 10 notes have been modified, when the cron runs, then cached results are returned
## Design Notes
### Clustering Algorithm Choice
Since the most widely used HDBSCAN implementations are Python libraries and we're a Node.js stack, we implement a simplified density-based clustering:
1. **Pairwise similarity matrix**: Use pgvector cosine similarity for all note pairs
2. **DBSCAN variant**:
   - For each note, find neighbors within epsilon = 0.3 cosine distance (cosine distance = 1 - cosine similarity)
   - Form clusters from dense regions (min_cluster_size = 3)
   - Mark outliers as noise (cluster_id = -1)
This approximates HDBSCAN's core benefits: no preset cluster count, outlier detection, and support for clusters of varying size.
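The steps above can be sketched as a plain in-memory DBSCAN. This is an illustration, not the code in `clustering.service.ts`: the names `NoteVector`, `cosineDistance`, and `dbscan` are hypothetical, distances are computed in TypeScript rather than by pgvector, and `min_cluster_size` is assumed to play the role of DBSCAN's minimum neighborhood size (counting the note itself).

```typescript
// Hypothetical shape: one embedding per note, already loaded for a single user.
type NoteVector = { noteId: string; embedding: number[] };

const NOISE = -1;

// Cosine distance = 1 - cosine similarity.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 1; // assumption: zero vectors match nothing
  return 1 - dot / Math.sqrt(normA * normB);
}

function dbscan(
  notes: NoteVector[],
  epsilon = 0.3,
  minClusterSize = 3
): Map<string, number> {
  const n = notes.length;

  // Neighborhood of each note: every note within epsilon, including itself.
  const neighbors: number[][] = [];
  for (let i = 0; i < n; i++) {
    const row: number[] = [];
    for (let j = 0; j < n; j++) {
      if (cosineDistance(notes[i].embedding, notes[j].embedding) <= epsilon) {
        row.push(j);
      }
    }
    neighbors.push(row);
  }

  const labels = new Array<number | undefined>(n).fill(undefined);
  let clusterId = 0;

  for (let i = 0; i < n; i++) {
    if (labels[i] !== undefined) continue;
    if (neighbors[i].length < minClusterSize) {
      labels[i] = NOISE; // may later be claimed as a border point
      continue;
    }
    // i is a core point: start a cluster and grow it through dense neighbors.
    labels[i] = clusterId;
    const queue = [...neighbors[i]];
    while (queue.length > 0) {
      const j = queue.shift()!;
      if (labels[j] === NOISE) labels[j] = clusterId; // border point joins the cluster
      if (labels[j] !== undefined) continue;
      labels[j] = clusterId;
      if (neighbors[j].length >= minClusterSize) queue.push(...neighbors[j]);
    }
    clusterId++;
  }

  return new Map(notes.map((note, i): [string, number] => [note.noteId, labels[i] ?? NOISE]));
}
```

The nested loop is quadratic in the number of notes, which is one reason the spec keeps clustering out of the note-save path and runs it as a background job.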
### Bridge Score Formula
```
bridge_score = len(clusters_touched) / max_possible_clusters
```
- A note touching 2 of 5 clusters = 0.4
- A note touching 5 of 5 clusters = 1.0 (super-connector)
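A minimal sketch of the formula combined with the bridge thresholds from the Boundaries section (cosine similarity >= 0.5 to >= 2 clusters). The `ClusterSimilarity` shape and the `scoreBridge` name are hypothetical, and how a note's similarity to a cluster is computed (centroid, mean over members, or otherwise) is left to the caller.

```typescript
// Hypothetical input: one entry per cluster, holding the note's similarity to it.
type ClusterSimilarity = { clusterId: string; similarity: number };

type BridgeResult = { clustersConnected: string[]; bridgeScore: number };

const BRIDGE_SIMILARITY_THRESHOLD = 0.5; // from Boundaries & Constraints
const MIN_CLUSTERS_FOR_BRIDGE = 2;       // from Boundaries & Constraints

// Returns null when the note does not qualify as a bridge.
function scoreBridge(
  similarities: ClusterSimilarity[],
  totalClusters: number
): BridgeResult | null {
  const touched = similarities
    .filter((s) => s.similarity >= BRIDGE_SIMILARITY_THRESHOLD)
    .map((s) => s.clusterId);

  if (totalClusters === 0 || touched.length < MIN_CLUSTERS_FOR_BRIDGE) {
    return null;
  }
  return {
    clustersConnected: touched,
    bridgeScore: touched.length / totalClusters,
  };
}
```

Because the denominator is the user's total cluster count, the same note's score changes whenever a recalculation adds or removes clusters.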
### Cluster Naming
Use the 5 most central notes (highest mean similarity to other cluster members) as context for the LLM:
```
"Here are 5 notes from a cluster. Summarize the common theme in 2-4 words:
1. [title] - [content snippet]
2. ..."
```
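One way to select the central notes and assemble the prompt is sketched below. The helper names, the `ClusterNote` shape, and the 200-character snippet length are assumptions for illustration; the prompt text follows the template above, and the actual LLM call is omitted.

```typescript
// Hypothetical shape for a note inside a cluster.
type ClusterNote = { title: string; content: string };

// similarity[i][j] is the cosine similarity between members i and j of one cluster.
// Returns the indices of the k members with the highest mean similarity to the others.
function mostCentralIndices(similarity: number[][], k = 5): number[] {
  const ranked = similarity.map((row, i) => {
    const others = row.filter((_, j) => j !== i); // ignore self-similarity
    const mean =
      others.length === 0 ? 0 : others.reduce((sum, s) => sum + s, 0) / others.length;
    return { index: i, mean };
  });
  return ranked
    .sort((a, b) => b.mean - a.mean)
    .slice(0, k)
    .map((entry) => entry.index);
}

// Builds the naming prompt from the selected notes.
function buildNamingPrompt(notes: ClusterNote[], snippetLength = 200): string {
  const lines = notes.map(
    (note, i) => `${i + 1}. ${note.title} - ${note.content.slice(0, snippetLength)}`
  );
  return [
    `Here are ${notes.length} notes from a cluster. Summarize the common theme in 2-4 words:`,
    ...lines,
  ].join("\n");
}
```

Truncating each note to a snippet keeps the prompt size bounded regardless of how long individual notes are.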
### UI Visualization
- **Clusters**: Each cluster gets a unique hue (HSL color wheel)
- **Bridge notes**: Golden border + 1.5x size
- **Isolated clusters**: Grayed out with "Isolated" badge
- **Click interaction**: Click cluster to zoom and show note list
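The coloring and sizing rules could be expressed as two small helpers. This is a sketch: the saturation and lightness values, the base node size, the gray used for isolated clusters, and the exact border strings are all assumptions, and the functions return plain style objects rather than anything specific to the visualization library.

```typescript
type NodeKind = "regular" | "bridge";

const BASE_NODE_SIZE = 40;              // assumed base size in pixels
const BRIDGE_BORDER = "2px solid gold"; // assumed rendering of the "golden border"

// Spreads clusters evenly around the HSL color wheel; isolated clusters are gray.
function clusterColor(clusterIndex: number, clusterCount: number, isolated = false): string {
  if (isolated) return "hsl(0, 0%, 70%)";
  const hue = clusterCount > 0 ? Math.round((360 * clusterIndex) / clusterCount) % 360 : 0;
  return `hsl(${hue}, 70%, 50%)`;
}

// Bridge notes get the golden border and 1.5x the base size.
function nodeStyle(kind: NodeKind, color: string) {
  const size = kind === "bridge" ? BASE_NODE_SIZE * 1.5 : BASE_NODE_SIZE;
  return {
    background: color,
    width: size,
    height: size,
    border: kind === "bridge" ? BRIDGE_BORDER : "1px solid transparent",
  };
}
```

Evenly spaced hues become hard to tell apart as the cluster count grows, so a large cluster count may call for labels in addition to color.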
### Cron Strategy
- **Daily**: At 2 AM server time, recalculate for all active users
- **Incremental check**: Track last_modified timestamp on embeddings; if no more than 5% of a user's embeddings changed since the last run, skip recalculation for that user
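The skip decision can be sketched as a pure function. The `ChangeStats` shape and the function name are hypothetical; the thresholds come from the Boundaries section, which triggers recalculation when more than 5% of embeddings changed or more than 10 notes were modified.

```typescript
// Hypothetical per-user counters gathered since the last clustering run.
type ChangeStats = {
  totalEmbeddings: number;
  changedEmbeddings: number;
  modifiedNotes: number;
};

const CHANGED_RATIO_THRESHOLD = 0.05; // > 5% of embeddings changed
const MODIFIED_NOTES_THRESHOLD = 10;  // > 10 notes modified

function shouldRecalculate(stats: ChangeStats): boolean {
  if (stats.totalEmbeddings === 0) return false; // nothing to cluster
  const changedRatio = stats.changedEmbeddings / stats.totalEmbeddings;
  return (
    changedRatio > CHANGED_RATIO_THRESHOLD ||
    stats.modifiedNotes > MODIFIED_NOTES_THRESHOLD
  );
}
```

With the I/O matrix examples, 3 modified notes out of 200 returns `false` (cached results are served), while 15 out of 200 returns `true`.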
## Verification
**Commands:**
- `npx prisma migrate dev --name add_clustering_support` -- Creates NoteCluster and BridgeNote tables
- `npx prisma generate` -- Regenerates Prisma client with new models
- `npm run build` -- Ensures TypeScript compilation succeeds
**Manual checks:**
- Visit /insights page — should show cluster visualization or "Need more notes" message
- Create 30+ test notes with varied topics — check that clusters form and have sensible names
- Modify a note, then check that clusters update within 24h (next cron run) or after a manual recalculation
- Check that bridge notes are visually distinct (golden border) in visualization
- POST /api/clusters/recalculate — should return 202 and trigger background job
- GET /api/bridge-notes/suggestions — should return AI-generated connection ideas
## Suggested Review Order
**Database schema**
- New models for clustering and bridge tracking with proper indexes
[`memento-note/prisma/schema.prisma:767`](../../memento-note/prisma/schema.prisma#L767)
**Core clustering algorithm**
- DBSCAN-style density-based clustering using pgvector cosine similarity
[`memento-note/lib/ai/services/clustering.service.ts:82`](../../memento-note/lib/ai/services/clustering.service.ts#L82)
- Cluster membership scoring and centrality detection
[`memento-note/lib/ai/services/clustering.service.ts:239`](../../memento-note/lib/ai/services/clustering.service.ts#L239)
**Bridge notes detection**
- Cross-cluster similarity analysis for bridge note identification
[`memento-note/lib/ai/services/bridge-notes.service.ts:52`](../../memento-note/lib/ai/services/bridge-notes.service.ts#L52)
- AI-powered suggestions for connecting isolated clusters
[`memento-note/lib/ai/services/bridge-notes.service.ts:173`](../../memento-note/lib/ai/services/bridge-notes.service.ts#L173)
**API endpoints**
- Cluster listing and recalculation with caching
[`memento-note/app/api/clusters/route.ts:14`](../../memento-note/app/api/clusters/route.ts#L14)
- Bridge notes and suggestions endpoints
[`memento-note/app/api/bridge-notes/route.ts:14`](../../memento-note/app/api/bridge-notes/route.ts#L14)
- Scheduled cron job for automatic recalculation
[`memento-note/app/api/cron/clusters/route.ts:27`](../../memento-note/app/api/cron/clusters/route.ts#L27)
**User interface**
- Interactive cluster visualization with React Flow
[`memento-note/components/cluster-visualization.tsx:40`](../../memento-note/components/cluster-visualization.tsx#L40)
- Bridge notes dashboard with tabbed interface
[`memento-note/components/bridge-notes-dashboard.tsx:38`](../../memento-note/components/bridge-notes-dashboard.tsx#L38)
- Main insights page tying it all together
[`memento-note/app/(main)/insights/page.tsx:17`](../../memento-note/app/(main)/insights/page.tsx#L17)