Replace JSON-string embeddings with native pgvector(1536) storage and add PostgreSQL full-text search (tsvector/GIN) with Reciprocal Rank Fusion for hybrid keyword + semantic ranking.

Changes:
- NoteEmbedding.embedding: String → vector(1536) via pgvector
- NoteEmbedding: added updatedAt for reindex tracking
- Note: added tsv (tsvector) with auto-update trigger for FTS
- semantic-search.service: hybrid FTS + vector search with RRF fusion
- embedding.service: toVectorString() for pgvector SQL literals
- Removed JS-side cosine similarity loops (now DB-side via <=>)
- Added HNSW index on NoteEmbedding.embedding (cosine distance)
- Added GIN index on Note.tsv for FTS queries

Schema migration in: prisma/migrations/20260512120000_pgvector_and_fts_search/

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Semantic Search Migration to pgvector + Full-Text Search
Overview
This change moves the semantic search infrastructure from JSON-string embeddings to native pgvector storage and adds PostgreSQL full-text search (FTS) with Reciprocal Rank Fusion (RRF) for hybrid ranking.
What Changed
Schema Changes
NoteEmbedding table:
- Changed embedding String → embedding Unsupported("vector(1536)"), stored as a native pgvector column
- Added updatedAt column for tracking reindex freshness
Note table:
- Added tsv Unsupported("tsvector"), auto-updated via trigger for FTS
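Prisma Client does not expose Unsupported columns through its generated model API, so the embedding column has to be read and written with raw SQL. The commit names a toVectorString() helper for this purpose, but its body is not shown in this document. The following is a minimal sketch of what such a helper might look like; the validation rules and the EMBEDDING_DIM constant are assumptions.

```typescript
// Hypothetical sketch: the real embedding.service implementation is not shown here.
const EMBEDDING_DIM = 1536; // matches vector(1536) in the schema

/** Formats a numeric embedding as a pgvector text literal, e.g. "[0.1,0.2,0.3]". */
function toVectorString(embedding: readonly number[], dim: number = EMBEDDING_DIM): string {
  if (embedding.length !== dim) {
    throw new Error(`expected ${dim} dimensions, got ${embedding.length}`);
  }
  for (const value of embedding) {
    // NaN and Infinity cannot be stored in a pgvector column.
    if (!Number.isFinite(value)) {
      throw new Error("embedding contains a non-finite value");
    }
  }
  return `[${embedding.join(",")}]`;
}
```

The resulting string would typically be passed as a bound parameter and cast in SQL (for example `$1::vector`) rather than concatenated into the query text.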
Search Architecture
| Before | After |
|---|---|
| JS-side cosine similarity loops | DB-side <=> (cosine distance) via pgvector |
| Embeddings stored as JSON strings | Native vector(1536) pgvector type |
| Pure vector-only search | Hybrid FTS + vector with RRF fusion |
| No full-text capability | tsvector + GIN index for keyword matching |
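To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked id lists. It is not the code from semantic-search.service: the function name, the result shape, and the constant k = 60 (a commonly used default) are assumptions.

```typescript
// Sketch of Reciprocal Rank Fusion; k = 60 is a common default, not confirmed by this document.
interface FusedHit {
  id: string;
  score: number;
}

/**
 * Fuses ranked id lists (best first). Each list contributes 1 / (k + rank)
 * for every id it contains, with rank starting at 1. Ids are assumed to be
 * unique within a list.
 */
function reciprocalRankFusion(rankings: readonly (readonly string[])[], k = 60): FusedHit[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const contribution = 1 / (k + index + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Example: "b" ranks high in both lists, so it is fused to first place.
const ftsRanking = ["a", "b", "c"];    // ids ordered by FTS rank, best first
const vectorRanking = ["b", "d", "a"]; // ids ordered by cosine distance (<=>), ascending
const fused = reciprocalRankFusion([ftsRanking, vectorRanking]);
// fused order: b, a, d, c
```

Because RRF uses only rank positions, the FTS rank and the cosine distance never need to be normalized onto a common scale.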
New Indexes
- NoteEmbedding_embedding_hnsw_idx: HNSW index on embedding column (cosine distance)
- Note_tsv_gin_idx: GIN index on tsv column for FTS
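pgvector can use the HNSW index when a query orders by the matching distance operator (here <=>, for vector_cosine_ops) in ascending order and applies a LIMIT. The sketch below builds such a query as parameterized SQL. The noteId column name, the function name, and the default limit are assumptions, not taken from the service code.

```typescript
// Sketch only: "noteId" and the default limit are assumed for illustration.
interface VectorQuery {
  text: string;
  values: Array<string | number>;
}

/** Builds a nearest-neighbour query; vectorLiteral is a pgvector literal such as "[0.1,0.2]". */
function buildVectorQuery(vectorLiteral: string, limit = 20): VectorQuery {
  const text = [
    'SELECT e."noteId", e."embedding" <=> $1::vector AS distance',
    'FROM "NoteEmbedding" e',
    // Ordering by the distance operator with a LIMIT lets the planner use the HNSW index.
    'ORDER BY e."embedding" <=> $1::vector',
    "LIMIT $2",
  ].join("\n");
  return { text, values: [vectorLiteral, limit] };
}
```

The <=> operator returns cosine distance, so cosine similarity, if needed, is 1 minus the returned value.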
Deployment Steps
1. Enable pgvector Extension
pgvector must be enabled before the schema migration runs:
CREATE EXTENSION IF NOT EXISTS vector;
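If deploying via the migration file, this runs automatically as Phase 1. In either case the pgvector extension files must already be installed on the PostgreSQL server, since CREATE EXTENSION only registers an installed extension in the current database.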
2. Run Database Migration
The migration file (prisma/migrations/20260512120000_pgvector_and_fts_search/migration.sql) applies in three phases:
Phase 1: Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
Phase 2: Convert NoteEmbedding to native vector
ALTER TABLE "NoteEmbedding" ADD COLUMN "vec" vector(1536);
UPDATE "NoteEmbedding" SET "vec" = ("embedding"::jsonb)::text::vector(1536)
WHERE "embedding" IS NOT NULL;
ALTER TABLE "NoteEmbedding" DROP COLUMN "embedding";
ALTER TABLE "NoteEmbedding" RENAME COLUMN "vec" TO "embedding";
ALTER TABLE "NoteEmbedding" ADD COLUMN "updatedAt" TIMESTAMP NOT NULL DEFAULT now();
CREATE INDEX "NoteEmbedding_embedding_hnsw_idx" ON "NoteEmbedding"
USING hnsw ("embedding" vector_cosine_ops) WITH (m = 16, ef_construction = 64);
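The cast to vector(1536) in the UPDATE raises an error for any row whose stored JSON does not hold exactly 1536 numbers. A pre-flight check over the existing rows can surface such rows before the migration runs. The helper below is a hypothetical sketch, not part of the migration; its name and result shape are assumptions.

```typescript
// Hypothetical pre-flight helper; not part of the migration itself.
type CheckResult = { ok: true } | { ok: false; reason: string };

/** Checks that a legacy JSON-string embedding would survive the cast to vector(1536). */
function checkLegacyEmbedding(json: string, dim = 1536): CheckResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(json);
  } catch {
    return { ok: false, reason: "not valid JSON" };
  }
  if (!Array.isArray(parsed)) {
    return { ok: false, reason: "not a JSON array" };
  }
  if (parsed.length !== dim) {
    return { ok: false, reason: `expected ${dim} dimensions, got ${parsed.length}` };
  }
  if (!parsed.every((v) => typeof v === "number" && Number.isFinite(v))) {
    return { ok: false, reason: "contains a non-numeric or non-finite entry" };
  }
  return { ok: true };
}
```

Rows that fail the check could be deleted or re-embedded before migrating, depending on how much stale data is acceptable.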
Phase 3: Add FTS tsvector to Note
ALTER TABLE "Note" ADD COLUMN "tsv" tsvector;
UPDATE "Note" SET "tsv" =
setweight(to_tsvector('simple', COALESCE("title", '')), 'A') ||
setweight(to_tsvector('simple', COALESCE("content", '')), 'B');
CREATE INDEX "Note_tsv_gin_idx" ON "Note" USING gin ("tsv");
CREATE OR REPLACE FUNCTION "note_tsv_trigger"() RETURNS trigger AS $$
BEGIN
NEW."tsv" :=
setweight(to_tsvector('simple', COALESCE(NEW."title", '')), 'A') ||
setweight(to_tsvector('simple', COALESCE(NEW."content", '')), 'B');
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER "note_tsv_update"
BEFORE INSERT OR UPDATE OF "title", "content" ON "Note"
FOR EACH ROW EXECUTE FUNCTION "note_tsv_trigger"();
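Because tsv is built with the 'simple' text search configuration, keyword queries should parse the user's input with the same configuration so that terms are normalized the same way on both sides. The sketch below builds a parameterized FTS query using PostgreSQL's websearch_to_tsquery and ts_rank. The id column name, the function name, and the default limit are assumptions, not taken from the service code.

```typescript
// Sketch only: the "id" column and the default limit are assumed for illustration.
interface FtsQuery {
  text: string;
  values: Array<string | number>;
}

/** Builds a parameterized keyword query against Note.tsv, best matches first. */
function buildFtsQuery(userQuery: string, limit = 20): FtsQuery {
  const text = [
    'SELECT n."id", ts_rank(n."tsv", q) AS rank',
    // Same 'simple' configuration as the trigger that populates tsv.
    `FROM "Note" n, websearch_to_tsquery('simple', $1) q`,
    'WHERE n."tsv" @@ q', // the @@ match is what the GIN index accelerates
    "ORDER BY rank DESC",
    "LIMIT $2",
  ].join("\n");
  return { text, values: [userQuery.trim(), limit] };
}
```

The ids returned by this query, in order, form the keyword ranking that is later fused with the vector ranking.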
3. Regenerate Embeddings for Existing Notes
The migration converts existing NoteEmbedding rows from the old JSON strings to native vector format automatically, via the UPDATE statement in Phase 2, so rows that already had an embedding need no manual regeneration.
To reindex all notes programmatically:
POST /api/notes/reindex
4. Verify Deployment
Validate embeddings:
POST /api/admin/embeddings/validate
Test semantic search via the note search tool or:
POST /api/notes/search?q=<query>
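The two checks above can be scripted as a small smoke test. The sketch below only inspects HTTP status codes, since the response bodies are not documented here; the function name, the injected fetch-like parameter, and the absence of authentication are all assumptions.

```typescript
// Sketch only: response bodies are not documented here, so only the HTTP status is checked.
type FetchLike = (
  url: string,
  init: { method: string },
) => Promise<{ ok: boolean; status: number }>;

/** Calls the validation and search endpoints and returns a list of failure messages. */
async function verifyDeployment(
  baseUrl: string,
  query: string,
  fetchImpl: FetchLike,
): Promise<string[]> {
  const endpoints = [
    "/api/admin/embeddings/validate",
    `/api/notes/search?q=${encodeURIComponent(query)}`,
  ];
  const failures: string[] = [];
  for (const path of endpoints) {
    const res = await fetchImpl(`${baseUrl}${path}`, { method: "POST" });
    if (!res.ok) {
      failures.push(`${path} returned HTTP ${res.status}`);
    }
  }
  return failures;
}
```

In a runtime with a global fetch, this could be invoked as `verifyDeployment("http://localhost:3000", "test", fetch)`, where the base URL is an assumption based on the memento-note port; an empty result means both endpoints responded successfully.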
Docker Deployment
The docker-compose.yml runs PostgreSQL 16-alpine with the following configuration:
- PostgreSQL port: 5433 (host) → 5432 (container)
- Database: memento (user: memento, password: memento by default)
- Health check: pg_isready
Services that depend on the database (memento-note, mcp-server) wait for PostgreSQL to be healthy before starting.
Rollback
To roll back to the pre-migration state:
- Drop the HNSW index: DROP INDEX "NoteEmbedding_embedding_hnsw_idx";
- Drop the GIN index: DROP INDEX "Note_tsv_gin_idx";
- Drop the trigger: DROP TRIGGER "note_tsv_update" ON "Note";
- Drop the function: DROP FUNCTION "note_tsv_trigger"();
- Revert the schema via Prisma migrate reset (requires restoring the old NoteEmbedding.embedding column type). Note that prisma migrate reset drops and recreates the database, so back up any data that must be kept.
Affected Services
| Service | Container | Port |
|---|---|---|
| PostgreSQL | memento-postgres | 5433 |
| memento-note (Next.js) | memento-web | 3000 |
| mcp-server | memento-mcp | 3001 |
| Ollama (optional) | memento-ollama | 11434 |