# Semantic Search Migration to pgvector + Full-Text Search

## Overview

This migration moves the semantic search infrastructure from JSON-string embeddings to **native pgvector** storage and adds **PostgreSQL full-text search (FTS)**, combined with vector search through Reciprocal Rank Fusion (RRF) for hybrid ranking.

## What Changed

### Schema Changes

**`NoteEmbedding` table:**

- Changed `embedding String` → `embedding Unsupported("vector(1536)")`: embeddings are stored in a native pgvector column
- Added `updatedAt` column for tracking reindex freshness

**`Note` table:**

- Added `tsv Unsupported("tsvector")`: kept up to date by a trigger and used for FTS

### Search Architecture

| Before | After |
|--------|-------|
| JS-side cosine similarity loops | DB-side `<=>` (cosine distance) via pgvector |
| Embeddings stored as JSON strings | Native `vector(1536)` pgvector type |
| Vector-only search | Hybrid FTS + vector search, fused with RRF |
| No full-text capability | `tsvector` + GIN index for keyword matching |

### New Indexes

- `NoteEmbedding_embedding_hnsw_idx`: HNSW index on the `embedding` column (cosine distance)
- `Note_tsv_gin_idx`: GIN index on the `tsv` column for FTS

## Deployment Steps

### 1. Enable pgvector Extension

pgvector must be enabled before the schema migration runs:

```sql
CREATE EXTENSION IF NOT EXISTS vector;
```

If you deploy via the migration file, this runs automatically as Phase 1.

### 2. Run Database Migration

The migration file (`prisma/migrations/20260512120000_pgvector_and_fts_search/migration.sql`) applies in three phases:

**Phase 1:** Enable pgvector extension

```sql
CREATE EXTENSION IF NOT EXISTS vector;
```

**Phase 2:** Convert NoteEmbedding to native vector

```sql
-- Add a temporary vector column and fill it from the old JSON strings
ALTER TABLE "NoteEmbedding" ADD COLUMN "vec" vector(1536);

UPDATE "NoteEmbedding"
SET "vec" = ("embedding"::jsonb)::text::vector(1536)
WHERE "embedding" IS NOT NULL;

-- Swap the new column in under the original name
ALTER TABLE "NoteEmbedding" DROP COLUMN "embedding";
ALTER TABLE "NoteEmbedding" RENAME COLUMN "vec" TO "embedding";

ALTER TABLE "NoteEmbedding" ADD COLUMN "updatedAt" TIMESTAMP NOT NULL DEFAULT now();

-- HNSW index for cosine-distance queries
CREATE INDEX "NoteEmbedding_embedding_hnsw_idx"
ON "NoteEmbedding" USING hnsw ("embedding" vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
```

**Phase 3:** Add FTS tsvector to Note

```sql
ALTER TABLE "Note" ADD COLUMN "tsv" tsvector;

-- Backfill existing rows: title gets weight A, content gets weight B
UPDATE "Note"
SET "tsv" =
  setweight(to_tsvector('simple', COALESCE("title", '')), 'A') ||
  setweight(to_tsvector('simple', COALESCE("content", '')), 'B');

CREATE INDEX "Note_tsv_gin_idx" ON "Note" USING gin ("tsv");

-- Keep "tsv" in sync whenever title or content changes
CREATE OR REPLACE FUNCTION "note_tsv_trigger"() RETURNS trigger AS $$
BEGIN
  NEW."tsv" :=
    setweight(to_tsvector('simple', COALESCE(NEW."title", '')), 'A') ||
    setweight(to_tsvector('simple', COALESCE(NEW."content", '')), 'B');
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER "note_tsv_update"
BEFORE INSERT OR UPDATE OF "title", "content" ON "Note"
FOR EACH ROW EXECUTE FUNCTION "note_tsv_trigger"();
```

### 3. Regenerate Embeddings for Existing Notes

Existing `NoteEmbedding` rows must be converted from the old JSON strings to the native vector format. The migration handles this conversion automatically via the Phase 2 `UPDATE` statement, so no manual step is needed for rows that already have an embedding.

To reindex all notes programmatically:

```
POST /api/notes/reindex
```

### 4. Verify Deployment

**Validate embeddings:**

```
POST /api/admin/embeddings/validate
```

**Test semantic search** via the note search tool or:

```
POST /api/notes/search?q=
```

## Docker Deployment

The `docker-compose.yml` runs PostgreSQL 16-alpine with the following configuration:

- **PostgreSQL port:** 5433 (host) → 5432 (container)
- **Database:** `memento` (user: `memento`, password: `memento` by default)
- **Health check:** `pg_isready`

Services that depend on the database (`memento-note`, `mcp-server`) wait for PostgreSQL to be healthy before starting.

## Rollback

To roll back to the pre-migration state:

1. Drop the HNSW index: `DROP INDEX "NoteEmbedding_embedding_hnsw_idx";`
2. Drop the GIN index: `DROP INDEX "Note_tsv_gin_idx";`
3. Drop the trigger: `DROP TRIGGER "note_tsv_update" ON "Note";`
4. Drop the function: `DROP FUNCTION "note_tsv_trigger"();`
5. Revert the schema via `prisma migrate reset` (requires restoring the old `NoteEmbedding.embedding` column type). Note that `prisma migrate reset` drops all data in the database, so back up first.

## Affected Services

| Service | Container | Port |
|---------|-----------|------|
| PostgreSQL | `memento-postgres` | 5433 |
| memento-note (Next.js) | `memento-web` | 3000 |
| mcp-server | `memento-mcp` | 3001 |
| Ollama (optional) | `memento-ollama` | 11434 |
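
## Example: RRF Fusion Sketch

The overview mentions that FTS and vector results are combined with Reciprocal Rank Fusion. This guide does not show how the project implements that step, so the following is only a minimal sketch of the general technique. It assumes each search returns an ordered list of note IDs, best match first. The function name `rrfFuse`, the sample IDs, and the constant `k = 60` (a commonly used default) are illustrative assumptions, not names or values taken from the codebase.

```typescript
// Reciprocal Rank Fusion: merge several ranked ID lists into one ranking.
// Each list contributes 1 / (k + rank) for every ID it contains, with
// 1-based ranks. `k = 60` is an assumed default, not a value from the source.
function rrfFuse(
  rankings: string[][],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical inputs: IDs ordered by FTS rank and by cosine distance.
const ftsIds = ["n1", "n2", "n3"];
const vectorIds = ["n3", "n1", "n4"];
const fusedExample = rrfFuse([ftsIds, vectorIds]);
// Resulting order: n1, n3, n2, n4
```

Notes that appear in both lists (`n1`, `n3`) rise above notes found by only one search, which is the main reason to fuse by rank: FTS scores and cosine distances are on different scales and cannot be added directly.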