Momento/MIGRATION.md
Antigravity 03e6a62b80
feat: migrate semantic search to pgvector + full-text search
Replace JSON-string embeddings with native pgvector(1536) storage and
add PostgreSQL full-text search (tsvector/GIN) with Reciprocal Rank Fusion
for hybrid keyword + semantic ranking.

Changes:
- NoteEmbedding.embedding: String → vector(1536) via pgvector
- NoteEmbedding: added updatedAt for reindex tracking
- Note: added tsv (tsvector) with auto-update trigger for FTS
- semantic-search.service: hybrid FTS + vector search with RRF fusion
- embedding.service: toVectorString() for pgvector SQL literals
- Removed JS-side cosine similarity loops (now DB-side via <=>)
- Added HNSW index on NoteEmbedding.embedding (cosine distance)
- Added GIN index on Note.tsv for FTS queries

Schema migration in: prisma/migrations/20260512120000_pgvector_and_fts_search/

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-12 07:03:56 +00:00


Semantic Search Migration to pgvector + Full-Text Search

Overview

This migration moves the semantic search infrastructure from JSON-string embeddings to native pgvector storage and adds PostgreSQL full-text search (FTS), combined with vector search through Reciprocal Rank Fusion (RRF) for hybrid ranking.

What Changed

Schema Changes

NoteEmbedding table:

  • Changed embedding String → embedding Unsupported("vector(1536)"), stored as a native pgvector column
  • Added updatedAt column for tracking reindex freshness
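
Because Prisma treats the column as Unsupported, values generally have to be passed to raw SQL as a pgvector text literal. The commit notes mention a toVectorString() helper in embedding.service for this purpose; its actual implementation is not shown here, so the following is only a minimal sketch of what such a helper might look like. The dimension check and the error messages are assumptions.

```typescript
// Hypothetical sketch; the real toVectorString() in embedding.service may differ.
const EMBEDDING_DIMENSIONS = 1536; // matches vector(1536) in the schema

function toVectorString(
  embedding: readonly number[],
  dimensions: number = EMBEDDING_DIMENSIONS,
): string {
  if (embedding.length !== dimensions) {
    throw new Error(`expected ${dimensions} dimensions, got ${embedding.length}`);
  }
  for (const value of embedding) {
    // pgvector rejects NaN and Infinity, so fail early in application code.
    if (!Number.isFinite(value)) {
      throw new Error(`embedding contains a non-finite value: ${value}`);
    }
  }
  // pgvector's text input format is a bracketed, comma-separated list.
  return `[${embedding.join(",")}]`;
}
```

The resulting string can then be bound as a query parameter and cast on the database side, for example with $1::vector.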

Note table:

  • Added tsv Unsupported("tsvector") — auto-updated via trigger for FTS

Search Architecture

| Before | After |
| --- | --- |
| JS-side cosine similarity loops | DB-side <=> (cosine distance) via pgvector |
| Embeddings stored as JSON strings | Native vector(1536) pgvector type |
| Pure vector-only search | Hybrid FTS + vector with RRF fusion |
| No full-text capability | tsvector + GIN index for keyword matching |
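
To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists of note IDs. It is not taken from semantic-search.service: the function name, the result shape, and the constant k = 60 (a commonly used default) are assumptions for illustration.

```typescript
// Sketch of Reciprocal Rank Fusion (RRF).
// Each document scores sum(1 / (k + rank)) over every list it appears in,
// where rank starts at 1. Assumes an ID appears at most once per list.
function reciprocalRankFusion(
  rankings: readonly string[][],
  k: number = 60, // assumed default; the service may use a different value
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Example: "b" ranks high in both lists, so it ends up first.
const ftsRanking = ["a", "b", "c"];
const vectorRanking = ["b", "d", "a"];
const fused = reciprocalRankFusion([ftsRanking, vectorRanking]);
```

RRF only uses rank positions, so the FTS rank values and the cosine distances never need to be normalized onto a common scale.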

New Indexes

  • NoteEmbedding_embedding_hnsw_idx — HNSW index on embedding column (cosine distance)
  • Note_tsv_gin_idx — GIN index on tsv column for FTS
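
As a rough illustration of how the two indexes are meant to be exercised, the sketch below holds one parameterized query per index as TypeScript constants. The column names "noteId" and "id" and the constant names are hypothetical; the real queries in semantic-search.service may look different.

```typescript
// Hypothetical query text; "noteId" and "id" are assumed column names.

// Nearest neighbours by cosine distance. Ordering by the <=> operator with a
// LIMIT is the query shape that lets the planner use the HNSW index.
const VECTOR_CANDIDATES_SQL = `
  SELECT "noteId", "embedding" <=> $1::vector AS distance
  FROM "NoteEmbedding"
  ORDER BY "embedding" <=> $1::vector
  LIMIT $2
`;

// Keyword matches. The 'simple' configuration matches the one used by the
// trigger that maintains "tsv"; the @@ match can use the GIN index.
const FTS_CANDIDATES_SQL = `
  SELECT "id", ts_rank("tsv", query) AS rank
  FROM "Note", plainto_tsquery('simple', $1) AS query
  WHERE "tsv" @@ query
  ORDER BY rank DESC
  LIMIT $2
`;

// <=> returns cosine distance, so similarity is 1 - distance.
function similarityFromDistance(distance: number): number {
  return 1 - distance;
}
```

For the vector query, $1 would be a pgvector literal such as "[0.1,0.2,...]" and $2 the candidate count. For the FTS query, $1 is the raw user query text.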

Deployment Steps

1. Enable pgvector Extension

pgvector must be enabled before the schema migration runs:

CREATE EXTENSION IF NOT EXISTS vector;

If deploying via the migration file, this runs automatically as Phase 1.

2. Run Database Migration

The migration file (prisma/migrations/20260512120000_pgvector_and_fts_search/migration.sql) applies in three phases:

Phase 1: Enable pgvector extension

CREATE EXTENSION IF NOT EXISTS vector;

Phase 2: Convert NoteEmbedding to native vector

ALTER TABLE "NoteEmbedding" ADD COLUMN "vec" vector(1536);
UPDATE "NoteEmbedding" SET "vec" = ("embedding"::jsonb)::text::vector(1536)
  WHERE "embedding" IS NOT NULL;
ALTER TABLE "NoteEmbedding" DROP COLUMN "embedding";
ALTER TABLE "NoteEmbedding" RENAME COLUMN "vec" TO "embedding";
ALTER TABLE "NoteEmbedding" ADD COLUMN "updatedAt" TIMESTAMP NOT NULL DEFAULT now();
CREATE INDEX "NoteEmbedding_embedding_hnsw_idx" ON "NoteEmbedding"
  USING hnsw ("embedding" vector_cosine_ops) WITH (m = 16, ef_construction = 64);

Phase 3: Add FTS tsvector to Note

ALTER TABLE "Note" ADD COLUMN "tsv" tsvector;
UPDATE "Note" SET "tsv" =
  setweight(to_tsvector('simple', COALESCE("title", '')), 'A') ||
  setweight(to_tsvector('simple', COALESCE("content", '')), 'B');
CREATE INDEX "Note_tsv_gin_idx" ON "Note" USING gin ("tsv");
CREATE OR REPLACE FUNCTION "note_tsv_trigger"() RETURNS trigger AS $$
BEGIN
  NEW."tsv" :=
    setweight(to_tsvector('simple', COALESCE(NEW."title", '')), 'A') ||
    setweight(to_tsvector('simple', COALESCE(NEW."content", '')), 'B');
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER "note_tsv_update"
  BEFORE INSERT OR UPDATE OF "title", "content" ON "Note"
  FOR EACH ROW EXECUTE FUNCTION "note_tsv_trigger"();

3. Regenerate Embeddings for Existing Notes

Existing NoteEmbedding rows do not need to be regenerated by hand: the UPDATE statement in Phase 2 converts each stored JSON string to the native vector format as part of the migration. A full reindex is only needed if embeddings should be recomputed from the note content.
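
The cast in the Phase 2 UPDATE can only succeed for rows whose stored string is a JSON array of exactly 1536 numbers. As an optional pre-flight check, a sketch like the one below could be run over the old string values before migrating. The function name and result shape are hypothetical and not part of the codebase.

```typescript
// Hypothetical pre-migration check for legacy JSON-string embeddings.
type LegacyCheck =
  | { ok: true; dimensions: number }
  | { ok: false; reason: string };

function checkLegacyEmbedding(raw: string, dimensions: number = 1536): LegacyCheck {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    return { ok: false, reason: "not valid JSON" };
  }
  if (!Array.isArray(parsed)) {
    return { ok: false, reason: "not a JSON array" };
  }
  if (parsed.length !== dimensions) {
    return { ok: false, reason: `expected ${dimensions} values, got ${parsed.length}` };
  }
  // Every element must be a finite number for the vector cast to accept it.
  if (!parsed.every((v) => typeof v === "number" && Number.isFinite(v))) {
    return { ok: false, reason: "contains a non-numeric or non-finite value" };
  }
  return { ok: true, dimensions: parsed.length };
}
```

Rows that fail the check could be deleted or re-embedded before the migration runs, so that a single malformed row does not abort the conversion.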

To reindex all notes programmatically:

POST /api/notes/reindex

4. Verify Deployment

Validate embeddings:

POST /api/admin/embeddings/validate

Test semantic search via the note search tool or:

POST /api/notes/search?q=<query>
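
The two verification calls can be scripted. The sketch below only builds the request descriptors; the base URL (for example http://localhost:3000, the memento-web port listed under Affected Services) is an assumption, as are the type and function names. Any authentication the admin endpoint may require is not covered.

```typescript
// Hypothetical helper that builds the verification requests from this section.
interface VerificationRequest {
  name: string;
  method: "POST";
  url: string;
}

function buildVerificationRequests(baseUrl: string, query: string): VerificationRequest[] {
  // Strip trailing slashes so paths are not joined with "//".
  const root = baseUrl.replace(/\/+$/, "");
  return [
    {
      name: "validate embeddings",
      method: "POST",
      url: `${root}/api/admin/embeddings/validate`,
    },
    {
      name: "search",
      method: "POST",
      // The query text must be URL-encoded before it goes into ?q=.
      url: `${root}/api/notes/search?q=${encodeURIComponent(query)}`,
    },
  ];
}
```

Each descriptor can then be sent with fetch(request.url, { method: request.method }) and checked for a successful status code.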

Docker Deployment

The docker-compose.yml runs PostgreSQL 16-alpine with the following configuration:

  • PostgreSQL port: 5433 (host) → 5432 (container)
  • Database: memento (user: memento, password: memento by default)
  • Health check: pg_isready

Services that depend on the database (memento-note, mcp-server) wait for PostgreSQL to be healthy before starting.

Rollback

To roll back to the pre-migration state:

  1. Drop the HNSW index: DROP INDEX "NoteEmbedding_embedding_hnsw_idx";
  2. Drop the GIN index: DROP INDEX "Note_tsv_gin_idx";
  3. Drop the trigger: DROP TRIGGER "note_tsv_update" ON "Note";
  4. Drop the function: DROP FUNCTION "note_tsv_trigger"();
  5. Revert the schema with prisma migrate reset (requires restoring the old NoteEmbedding.embedding column type). This command drops and recreates the database, so take a backup first.

Affected Services

| Service | Container | Port |
| --- | --- | --- |
| PostgreSQL | memento-postgres | 5433 |
| memento-note (Next.js) | memento-web | 3000 |
| mcp-server | memento-mcp | 3001 |
| Ollama (optional) | memento-ollama | 11434 |