# Technical Specification: Document Parsing & Q&A (PDF Analysis)

## A. Prisma Schema Updates

### A1. `NoteAttachment` Model

Stores the files attached to a note (PDFs, images, documents).

```prisma
model NoteAttachment {
  id        String   @id @default(cuid())
  noteId    String
  fileName  String
  fileType  String   // "application/pdf", "image/png", etc.
  fileSize  Int      // in bytes
  filePath  String   // local path: data/uploads/attachments/{noteId}/{uuid}.pdf
  mimeType  String   // duplicates fileType, kept for fast queries
  status    String   @default("pending") // pending → processing → ready | failed
  pageCount Int?     // number of pages (PDF only)
  error     String?  // error message when status is "failed"
  createdAt DateTime @default(now())
  updatedAt DateTime @updatedAt

  note   Note            @relation(fields: [noteId], references: [id], onDelete: Cascade)
  chunks DocumentChunk[]

  @@index([noteId])
  @@index([status])
}
```
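
The `status` column is a plain string, so nothing in the schema prevents an invalid transition. A minimal sketch of a guard follows; the `AttachmentStatus` type, the `ALLOWED_TRANSITIONS` table and `canTransition` are hypothetical names, and treating `ready` and `failed` as terminal states is an assumption, not something the schema states.

```typescript
// Hypothetical guard for the status lifecycle: pending → processing → ready | failed.
type AttachmentStatus = 'pending' | 'processing' | 'ready' | 'failed'

// Assumption: "ready" and "failed" are terminal (no automatic retry).
const ALLOWED_TRANSITIONS: Record<AttachmentStatus, AttachmentStatus[]> = {
  pending: ['processing'],
  processing: ['ready', 'failed'],
  ready: [],
  failed: [],
}

// Returns true when moving from `from` to `to` follows the lifecycle above.
function canTransition(from: AttachmentStatus, to: AttachmentStatus): boolean {
  return ALLOWED_TRANSITIONS[from].includes(to)
}
```

A service could call `canTransition(attachment.status, 'processing')` before each update and refuse to proceed when it returns `false`.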

### A2. `DocumentChunk` Model

Vectorized fragments of a document. Each chunk belongs to an attachment and, transitively, to a note.

```prisma
model DocumentChunk {
  id           String   @id @default(cuid())
  attachmentId String
  content      String   // chunk text (about 800 characters, roughly 200 tokens; see B3)
  chunkIndex   Int      // ordinal position in the document (0, 1, 2…)
  pageNumber   Int?     // source page (for citations)
  startChar    Int?     // start character offset in the extracted text
  endChar      Int?     // end character offset
  metadata     String?  // JSON: { heading, section, tableCaption… }
  embedding    Unsupported("vector(1536)")?
  createdAt    DateTime @default(now())

  attachment NoteAttachment @relation(fields: [attachmentId], references: [id], onDelete: Cascade)

  @@index([attachmentId])
  @@index([attachmentId, chunkIndex])
}
```
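
Because `embedding` is an `Unsupported("vector(1536)")` column, it has to be written through raw SQL as a pgvector text literal such as `[0.1,0.2,0.3]`. The sketch below shows what a serializer along the lines of `embeddingService.toVectorString` might do; `toVectorLiteral` and `EMBEDDING_DIMENSIONS` are hypothetical names, and the real helper may behave differently.

```typescript
// Hypothetical serializer: turns an embedding into a pgvector text literal.
const EMBEDDING_DIMENSIONS = 1536 // matches vector(1536) in the schema

function toVectorLiteral(embedding: number[], dimensions: number = EMBEDDING_DIMENSIONS): string {
  // A dimension mismatch would be rejected by PostgreSQL anyway; failing early gives a clearer error.
  if (embedding.length !== dimensions) {
    throw new Error(`Expected ${dimensions} dimensions, got ${embedding.length}`)
  }
  // NaN and Infinity cannot be stored in a vector column.
  if (!embedding.every((value) => Number.isFinite(value))) {
    throw new Error('Embedding contains a non-finite value')
  }
  return `[${embedding.join(',')}]`
}
```

The resulting string is meant to be passed as a bound parameter and cast with `::vector` in the query.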

### A3. Addition to `Note`

```prisma
model Note {
  // … existing fields …

  attachments NoteAttachment[]
}
```

### A4. Raw SQL Migration: HNSW Index for `DocumentChunk`

```sql
-- Add to the Prisma migration (migration.sql)
CREATE INDEX IF NOT EXISTS "DocumentChunk_embedding_hnsw_idx"
ON "DocumentChunk" USING hnsw ("embedding" vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
```
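
The index uses `vector_cosine_ops`, so it serves queries ordered by pgvector's `<=>` operator, which returns the cosine distance (1 minus the cosine similarity). As an illustration of the quantity being ranked, here is a small reference implementation; `cosineDistance` and `distanceToScore` are hypothetical helpers, useful for checking a handful of rows by hand rather than for production search.

```typescript
// Cosine distance as computed by pgvector's <=> operator: 1 - cos(a, b).
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error('Vectors must be non-empty and of equal length')
  }
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB)
  if (denominator === 0) throw new Error('Cosine distance is undefined for a zero vector')
  return 1 - dot / denominator
}

// Converts a distance in [0, 2] into a score in [0, 1], clamping negative values to 0.
function distanceToScore(distance: number): number {
  return Math.max(0, 1 - distance)
}
```

Identical directions give a distance of 0, orthogonal vectors 1, and opposite vectors 2, which is why a score derived as `1 - distance` needs clamping at 0.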

---

## B. Ingestion Pipeline (Chunking & Embeddings)

### B1. Pipeline Architecture

```
PDF upload → NoteAttachment (status: pending)
    ↓
pdf-parse extraction (raw text + page metadata)
    ↓
Structural chunking (800 chars, 200 overlap, page boundaries respected)
    ↓
DocumentChunk.create (content, chunkIndex, pageNumber, metadata)
    ↓
Batch embeddings (Promise.all per batch of 20)
    ↓
SQL UPDATE of the embedding on each chunk
    ↓
NoteAttachment.update (status: ready)
```

### B2. Extraction Service: `document-extraction.service.ts`

```typescript
// lib/ai/services/document-extraction.service.ts

import { readFile } from 'node:fs/promises'
import pdf from 'pdf-parse'

interface ExtractedPage {
  pageNumber: number
  text: string
}

interface ExtractedDocument {
  pages: ExtractedPage[]
  totalPages: number
  metadata: { title?: string; author?: string }
}

export class DocumentExtractionService {
  async extractPdf(filePath: string): Promise<ExtractedDocument> {
    const dataBuffer = await readFile(filePath)
    const pages: ExtractedPage[] = []

    // pdf-parse only returns the concatenated text, so a custom `pagerender`
    // callback collects the text of each page while the document is parsed.
    // Pages are rendered one after another, so the array length gives the page number.
    const renderPage = (pageData: any) =>
      pageData.getTextContent().then((textContent: any) => {
        let lastY: number | undefined
        let text = ''
        for (const item of textContent.items) {
          // Items sharing the same Y coordinate belong to the same line
          const y = item.transform[5]
          text += lastY === undefined || lastY === y ? item.str : '\n' + item.str
          lastY = y
        }
        pages.push({ pageNumber: pages.length + 1, text })
        return text
      })

    // A single parse is enough: it fills `pages` and returns the document info
    const data = await pdf(dataBuffer, {
      max: 0, // all pages
      pagerender: renderPage,
    })

    return {
      pages,
      totalPages: data.numpages,
      metadata: {
        title: data.info?.Title,
        author: data.info?.Author,
      },
    }
  }
}

export const documentExtractionService = new DocumentExtractionService()
```

### B3. Chunking Strategy: `document-chunking.service.ts`

**Principles:**

1. **Target size**: 800 characters (~200 tokens), with a 200-character overlap.
2. **Page boundaries**: a chunk NEVER spans two pages. If a cut would cross a page break, the chunk ends at the page break instead.
3. **Section boundaries**: headings (lines in UPPERCASE or prefixed with `#`, `##`) start a new section, and chunk cuts are placed at section boundaries whenever possible.
4. **Contextual overlap**: the last 200 characters of chunk N are repeated at the start of chunk N+1 (within the same page).
5. **Tables**: kept whole in a single chunk if under 1500 chars; otherwise split by row, with the header row repeated.
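
To make the size and overlap figures concrete, the sketch below computes plain sliding windows over a text of a given length, deliberately ignoring page, section and table boundaries. `slidingWindows` and `TextWindow` are hypothetical names used only for this illustration. With a size of 800 and an overlap of 200, each window starts 600 characters after the previous one.

```typescript
interface TextWindow {
  start: number // inclusive character offset
  end: number   // exclusive character offset
}

// Fixed-size windows with overlap; the last window is shortened to fit the text.
function slidingWindows(textLength: number, size = 800, overlap = 200): TextWindow[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error('Expected size > 0 and 0 <= overlap < size')
  }
  const stride = size - overlap
  const windows: TextWindow[] = []
  for (let start = 0; start < textLength; start += stride) {
    const end = Math.min(start + size, textLength)
    windows.push({ start, end })
    if (end === textLength) break // the text is fully covered
  }
  return windows
}
```

A 2000-character page yields three windows (0 to 800, 600 to 1400, 1200 to 2000); the structural rules above then move the cut points to section boundaries.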

```typescript
// lib/ai/services/document-chunking.service.ts

interface ChunkInput {
  text: string
  pageNumber: number
}

interface DocumentChunkData {
  content: string
  chunkIndex: number
  pageNumber: number
  startChar: number // offsets are relative to the extracted text of the page
  endChar: number
  metadata?: string
}

interface Section {
  start: number // inclusive offset in the page text
  end: number   // exclusive offset in the page text
}

export class DocumentChunkingService {
  private readonly CHUNK_SIZE = 800
  private readonly OVERLAP = 200
  private readonly MAX_CHUNK_SIZE = 1500

  chunk(pages: ChunkInput[]): DocumentChunkData[] {
    const chunks: DocumentChunkData[] = []
    let globalIndex = 0

    for (const page of pages) {
      const text = page.text
      if (!text.trim()) continue

      // Emits text[start, end) as a chunk, skipping whitespace-only ranges
      const push = (start: number, end: number) => {
        const content = text.slice(start, end).trim()
        if (!content) return
        chunks.push({
          content,
          chunkIndex: globalIndex++,
          pageNumber: page.pageNumber,
          startChar: start,
          endChar: end,
        })
      }

      // The window is reset on every page, so a chunk (and its overlap) never spans two pages
      let chunkStart = 0
      let chunkEnd = 0

      // Split into sections (at headings) and pack sections into chunks
      for (const section of this.splitSections(text)) {
        if (section.end - chunkStart > this.CHUNK_SIZE && chunkEnd > chunkStart) {
          // Flush the current window as a chunk
          push(chunkStart, chunkEnd)
          // Overlap: the next chunk starts OVERLAP characters before the end of this one
          chunkStart = Math.max(chunkStart, chunkEnd - this.OVERLAP)
        }
        chunkEnd = section.end

        // A section with no usable boundary is hard-split so that no chunk exceeds MAX_CHUNK_SIZE
        while (chunkEnd - chunkStart > this.MAX_CHUNK_SIZE) {
          push(chunkStart, chunkStart + this.CHUNK_SIZE)
          chunkStart += this.CHUNK_SIZE - this.OVERLAP
        }
      }

      // Flush the remainder
      if (chunkEnd > chunkStart) push(chunkStart, chunkEnd)
    }

    return chunks
  }

  // Returns contiguous [start, end) ranges; a heading line opens a new range
  private splitSections(text: string): Section[] {
    const sections: Section[] = []
    let sectionStart = 0
    let offset = 0 // offset of the current line in `text`

    for (const line of text.split('\n')) {
      const isHeading = /^(#{1,6}\s|[A-Z][A-Z\s]{5,}$)/.test(line.trim())
      if (isHeading && text.slice(sectionStart, offset).trim()) {
        sections.push({ start: sectionStart, end: offset })
        sectionStart = offset
      }
      offset += line.length + 1 // +1 for the '\n' removed by split
    }
    if (text.slice(sectionStart).trim()) {
      sections.push({ start: sectionStart, end: text.length })
    }
    return sections
  }
}

export const documentChunkingService = new DocumentChunkingService()
```
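
The heading test is the part of the chunker most sensitive to real-world text, so it is worth exercising on its own. The sketch below isolates the same regular expression in a hypothetical `isHeading` helper (the service keeps it inline).

```typescript
// Same pattern as in splitSections: a Markdown-style heading, or a line made only
// of ASCII uppercase letters and whitespace that is at least 6 characters long.
const HEADING_PATTERN = /^(#{1,6}\s|[A-Z][A-Z\s]{5,}$)/

function isHeading(line: string): boolean {
  return HEADING_PATTERN.test(line.trim())
}

const samples = ['## Results', 'INTRODUCTION', 'TOTAL', 'SECTION 2', 'ÉTUDE DE CAS', 'Plain sentence.']
for (const sample of samples) {
  console.log(`${JSON.stringify(sample)} -> ${isHeading(sample)}`)
}
```

Only the first two samples are detected. `TOTAL` is too short, `SECTION 2` contains a digit, and `ÉTUDE DE CAS` starts with an accented capital that `[A-Z]` does not cover, which matters for French documents.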

### B4. Orchestrating Ingestion Service: `document-ingestion.service.ts`

```typescript
// lib/ai/services/document-ingestion.service.ts

import { prisma } from '@/lib/prisma' // assumed path to the Prisma client
import { embeddingService } from './embedding.service' // assumed path
import { documentExtractionService } from './document-extraction.service'
import { documentChunkingService } from './document-chunking.service'

export class DocumentIngestionService {
  async ingest(attachmentId: string): Promise<void> {
    const attachment = await prisma.noteAttachment.findUnique({
      where: { id: attachmentId },
    })
    if (!attachment) throw new Error('Attachment not found')

    await prisma.noteAttachment.update({
      where: { id: attachmentId },
      data: { status: 'processing' },
    })

    try {
      // 1. Extraction
      const extracted = await documentExtractionService.extractPdf(attachment.filePath)

      await prisma.noteAttachment.update({
        where: { id: attachmentId },
        data: { pageCount: extracted.totalPages },
      })

      // 2. Chunking
      const chunkInputs = extracted.pages.map(p => ({
        text: p.text,
        pageNumber: p.pageNumber,
      }))
      const chunks = documentChunkingService.chunk(chunkInputs)

      // 3. Create the chunks in the database (without embeddings)
      const created = await Promise.all(
        chunks.map(c =>
          prisma.documentChunk.create({
            data: {
              attachmentId,
              content: c.content,
              chunkIndex: c.chunkIndex,
              pageNumber: c.pageNumber,
              startChar: c.startChar,
              endChar: c.endChar,
              metadata: c.metadata,
            },
          })
        )
      )

      // 4. Batch embeddings (batches of 20)
      const BATCH_SIZE = 20
      for (let i = 0; i < created.length; i += BATCH_SIZE) {
        const batch = created.slice(i, i + BATCH_SIZE)
        const texts = batch.map(c => c.content)
        const embeddings = await embeddingService.generateBatchEmbeddings(texts)

        // Values are bound as parameters; only the static SQL text is "unsafe"
        await Promise.all(
          batch.map((chunk, idx) =>
            prisma.$executeRawUnsafe(
              `UPDATE "DocumentChunk" SET embedding = $1::vector WHERE id = $2`,
              embeddingService.toVectorString(embeddings[idx].embedding),
              chunk.id
            )
          )
        )
      }

      // 5. Mark as ready
      await prisma.noteAttachment.update({
        where: { id: attachmentId },
        data: { status: 'ready' },
      })
    } catch (error: unknown) {
      // Record the failure on the attachment, then let the caller see the error
      const message = error instanceof Error ? error.message : String(error)
      await prisma.noteAttachment.update({
        where: { id: attachmentId },
        data: { status: 'failed', error: message },
      })
      throw error
    }
  }
}

export const documentIngestionService = new DocumentIngestionService()
```
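
Step 4 processes chunks in batches of 20 so that a long document does not trigger one oversized embedding request. The batching logic can be factored out as below; `toBatches` and `processInBatches` are hypothetical helpers, and the `worker` callback stands in for whatever the embedding service actually exposes.

```typescript
// Splits a list into consecutive batches of at most `batchSize` items.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new Error('batchSize must be a positive integer')
  }
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize))
  }
  return batches
}

// Runs the batches one after another and concatenates their results in order.
async function processInBatches<T, R>(
  items: T[],
  batchSize: number,
  worker: (batch: T[]) => Promise<R[]>
): Promise<R[]> {
  const results: R[] = []
  for (const batch of toBatches(items, batchSize)) {
    results.push(...(await worker(batch)))
  }
  return results
}
```

Running batches sequentially keeps at most one embedding request in flight, which is the simplest way to stay under provider rate limits; parallelism then only applies to the database updates inside a batch.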
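
### B5. Upload API Route

```typescript
// app/api/notes/[noteId]/attachments/route.ts

import { randomUUID } from 'node:crypto'
import { mkdir, writeFile } from 'node:fs/promises'
import path from 'node:path'
import { NextResponse } from 'next/server'
import { auth } from '@/auth' // assumed path
import { prisma } from '@/lib/prisma' // assumed path
import { documentIngestionService } from '@/lib/ai/services/document-ingestion.service' // assumes the '@/' alias

const MAX_FILE_SIZE = 20 * 1024 * 1024 // 20 MB

// Small helper for error responses (the response shape is an assumption)
const fail = (message: string, status: number) =>
  NextResponse.json({ success: false, error: message }, { status })

export async function POST(
  req: Request,
  { params }: { params: Promise<{ noteId: string }> }
) {
  const session = await auth()
  if (!session?.user?.id) return fail('Unauthorized', 401)

  const { noteId } = await params

  // Only the owner of the note may attach files to it
  const note = await prisma.note.findFirst({
    where: { id: noteId, userId: session.user.id },
  })
  if (!note) return fail('Note not found', 404)

  const formData = await req.formData()
  const file = formData.get('file')

  // Validation
  if (!(file instanceof File)) return fail('Missing file', 400)
  if (file.size > MAX_FILE_SIZE) return fail('File too large (max 20MB)', 413)
  if (file.type !== 'application/pdf') return fail('Only PDF supported', 415)

  // Save the file
  const dir = path.join('data', 'uploads', 'attachments', noteId)
  await mkdir(dir, { recursive: true })
  const filePath = path.join(dir, `${randomUUID()}.pdf`)
  await writeFile(filePath, Buffer.from(await file.arrayBuffer()))

  // Create the attachment
  const attachment = await prisma.noteAttachment.create({
    data: {
      noteId,
      fileName: file.name,
      fileType: file.type,
      fileSize: file.size,
      filePath,
      mimeType: file.type,
      status: 'pending',
    },
  })

  // Start ingestion in the background. The service records failures on the
  // attachment, so the rejection is only logged here to avoid an unhandled rejection.
  setImmediate(() => {
    documentIngestionService
      .ingest(attachment.id)
      .catch(err => console.error('Document ingestion failed', err))
  })

  return NextResponse.json({ success: true, data: attachment })
}
```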
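
The route trusts the MIME type reported by the client, which any caller can set freely. One possible hardening, sketched below, also checks that the content starts with the `%PDF-` signature. `validatePdfUpload` and `ValidationResult` are hypothetical names, and the signature check is an addition that the route above does not perform.

```typescript
const MAX_UPLOAD_SIZE = 20 * 1024 * 1024 // 20 MB, same limit as the route
const PDF_SIGNATURE = [0x25, 0x50, 0x44, 0x46, 0x2d] // the ASCII bytes of "%PDF-"

type ValidationResult = { ok: true } | { ok: false; reason: string }

// Validates size, declared MIME type and leading bytes of an uploaded file.
function validatePdfUpload(mimeType: string, bytes: Uint8Array): ValidationResult {
  if (bytes.length > MAX_UPLOAD_SIZE) {
    return { ok: false, reason: 'File too large (max 20MB)' }
  }
  if (mimeType !== 'application/pdf') {
    return { ok: false, reason: 'Only PDF supported' }
  }
  // Strict check: the signature must be at offset 0
  const hasSignature = PDF_SIGNATURE.every((byte, i) => bytes[i] === byte)
  if (!hasSignature) {
    return { ok: false, reason: 'File content is not a PDF' }
  }
  return { ok: true }
}
```

In the route, this could run on `new Uint8Array(await file.arrayBuffer())` before the file is written to disk. Some PDF readers tolerate leading bytes before the signature, so a strict offset-0 check may reject a few files that would otherwise open.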

---

## C. New Agent Tool Interface: `document_search`

### C1. Registration in the Registry

```typescript
// lib/ai/tools/document-search.tool.ts

import { tool } from 'ai' // assumed: the `tool` helper comes from the Vercel AI SDK
import { z } from 'zod'
// Project-local imports (prisma, embeddingService, toolRegistry, assertSafeId) are omitted here.

toolRegistry.register({
  name: 'document_search',
  description: 'Search within PDF documents attached to notes. Returns relevant passages with page numbers and source document info.',
  isInternal: true,
  buildTool: (ctx) =>
    tool({
      description: `Search within PDF documents attached to the user's notes.
Returns matching passages with page numbers, chunk content, and the source note/document info.
Use this when the user asks about specific documents, PDFs, or attached files.
Can search across all documents or within a specific note's attachments.`,
      inputSchema: z.object({
        query: z.string().describe('The search query to find relevant passages in documents'),
        noteId: z.string().optional().describe('Optional: restrict search to attachments of a specific note'),
        limit: z.number().optional().describe('Max results to return (default 5)').default(5),
      }),
      execute: async ({ query, noteId, limit = 5 }) => {
        try {
          const queryEmbedding = await embeddingService.generateEmbedding(query)
          const vectorStr = embeddingService.toVectorString(queryEmbedding.embedding)

          // $1 = query vector, $2 = limit; further parameters are appended below.
          // Each value is pushed BEFORE its placeholder is built, so `$n` always
          // points at the value that was just added.
          const params: unknown[] = [vectorStr, limit]
          let noteFilter = ''

          if (noteId) {
            assertSafeId(noteId, 'noteId')
            params.push(noteId)
            noteFilter = `AND na."noteId" = $${params.length}`
          } else if (ctx.notebookId) {
            assertSafeId(ctx.notebookId, 'notebookId')
            params.push(ctx.notebookId)
            noteFilter = `AND n."notebookId" = $${params.length}`
          }

          const userId = ctx.userId
          assertSafeId(userId, 'userId')
          params.push(userId)
          const userParam = `$${params.length}`

          // `<=>` is applied to the vector column directly (no ::text cast),
          // which also lets PostgreSQL use the HNSW index
          const results = await prisma.$queryRawUnsafe(
            `SELECT
              dc.id AS "chunkId",
              dc.content,
              dc."pageNumber",
              dc."chunkIndex",
              dc.metadata,
              na.id AS "attachmentId",
              na."fileName",
              na."pageCount",
              na."noteId",
              n.title AS "noteTitle",
              dc.embedding <=> $1::vector AS distance
            FROM "DocumentChunk" dc
            JOIN "NoteAttachment" na ON na.id = dc."attachmentId"
            JOIN "Note" n ON n.id = na."noteId"
            WHERE dc.embedding IS NOT NULL
              AND na.status = 'ready'
              AND n."trashedAt" IS NULL
              AND n."userId" = ${userParam}
              ${noteFilter}
            ORDER BY dc.embedding <=> $1::vector
            LIMIT $2`,
            ...params
          ) as any[]

          if (!results.length) return { results: [], message: 'No matching documents found' }

          // Keep only reasonably close matches (cosine distance below 0.5)
          const threshold = 0.5
          return results
            .filter(r => r.distance < threshold)
            .map(r => ({
              content: r.content.substring(0, 600),
              pageNumber: r.pageNumber,
              chunkIndex: r.chunkIndex,
              fileName: r.fileName,
              noteId: r.noteId,
              noteTitle: r.noteTitle || 'Untitled',
              score: Math.max(0, 1 - r.distance),
            }))
        } catch (e: unknown) {
          const message = e instanceof Error ? e.message : String(e)
          return { error: `Document search failed: ${message}` }
        }
      },
    }),
})
```
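
Building `$n` placeholders by hand from `params.length` is easy to get wrong by one, because the index depends on whether the value has already been pushed. A small collector that returns the placeholder at the moment the value is added removes that ambiguity. `SqlParams` below is a hypothetical helper, not part of Prisma.

```typescript
// Collects positional parameters and hands back the matching "$n" placeholder.
class SqlParams {
  private readonly values: unknown[] = []

  // Registers a value and returns its placeholder ("$1" for the first value, and so on).
  add(value: unknown): string {
    this.values.push(value)
    return `$${this.values.length}`
  }

  // Returns a copy of the values, in placeholder order.
  toArray(): unknown[] {
    return [...this.values]
  }
}

const params = new SqlParams()
const vectorParam = params.add('[0.1,0.2,0.3]')
const limitParam = params.add(5)
const noteParam = params.add('note_123')

const sql = `SELECT id FROM "DocumentChunk" dc
  JOIN "NoteAttachment" na ON na.id = dc."attachmentId"
  WHERE na."noteId" = ${noteParam}
  ORDER BY dc.embedding <=> ${vectorParam}::vector
  LIMIT ${limitParam}`
```

The query text and `params.toArray()` can then be passed together to a raw-query call, with the guarantee that each placeholder matches the position of its value.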

### C2. Auto-Registration

Added to `lib/ai/tools/index.ts`; the import path must match the file name `document-search.tool.ts`:

```typescript
import './document-search.tool'
```

### C3. Enabling the Tool in Chat

Update to `buildToolsForChat` in `registry.ts`:

```typescript
buildToolsForChat(ctx: ToolContext): Tool[] {
  const tools: Tool[] = []
  tools.push(this.build('note_search', ctx))
  tools.push(this.build('note_read', ctx))
  tools.push(this.build('document_search', ctx)) // <-- NEW
  if (ctx.webSearch) {
    tools.push(this.build('web_search', ctx))
    tools.push(this.build('web_scrape', ctx))
  }
  return tools
}
```

---

## D. RAG Query Logic

### D1. Extended Hybrid Search: `semantic-search.service.ts`

Adds a `searchWithDocuments` method that combines notes AND document chunks:

```typescript
async searchWithDocuments(
  userId: string,
  query: string,
  options?: SearchOptions & { noteId?: string; includeDocuments?: boolean }
): Promise<(SearchResult & { source?: 'note' | 'document'; pageNumber?: number; fileName?: string })[]> {
  const includeDocuments = options?.includeDocuments !== false

  // Phase 1: existing note search (FTS + pgvector + RRF)
  const noteResults = await this.searchAsUser(userId, query, options)

  // Phase 2: document search (pgvector only)
  let documentResults: any[] = []
  if (includeDocuments) {
    const queryEmbedding = await embeddingService.generateEmbedding(query)
    const vectorStr = embeddingService.toVectorString(queryEmbedding.embedding)

    // $1 = query vector, $2 = candidate limit, $3 = user id
    const params: any[] = [vectorStr, 50, userId]
    let noteFilter = ''
    if (options?.noteId) {
      assertSafeId(options.noteId, 'noteId')
      noteFilter = `AND na."noteId" = $${params.length + 1}`
      params.push(options.noteId)
    }
    if (options?.notebookId) {
      assertSafeId(options.notebookId, 'notebookId')
      noteFilter += ` AND n."notebookId" = $${params.length + 1}`
      params.push(options.notebookId)
    }

    // `<=>` is applied to the vector column directly (no ::text cast)
    documentResults = await prisma.$queryRawUnsafe(
      `SELECT
        dc.content,
        dc."pageNumber",
        na."fileName",
        na."noteId",
        n.title AS "noteTitle",
        1 - (dc.embedding <=> $1::vector) AS score
      FROM "DocumentChunk" dc
      JOIN "NoteAttachment" na ON na.id = dc."attachmentId"
      JOIN "Note" n ON n.id = na."noteId"
      WHERE dc.embedding IS NOT NULL
        AND na.status = 'ready'
        AND n."trashedAt" IS NULL
        AND n."userId" = $3
        ${noteFilter}
      ORDER BY dc.embedding <=> $1::vector
      LIMIT $2`,
      ...params
    ) as any[]
  }

  // Phase 3: RRF fusion between notes and documents
  const K = 60
  const fused = new Map<string, any>()

  for (let i = 0; i < noteResults.length; i++) {
    const r = noteResults[i]
    fused.set(r.noteId, {
      ...r,
      source: 'note',
      rrfScore: 1 / (K + i + 1),
    })
  }

  for (let i = 0; i < documentResults.length; i++) {
    const r = documentResults[i]
    // Document keys are distinct from note keys, so a chunk never overwrites its parent note
    const key = `doc_${r.noteId}_${r.pageNumber}_${i}`
    fused.set(key, {
      noteId: r.noteId,
      title: `${r.noteTitle || 'Untitled'} → ${r.fileName} (p.${r.pageNumber})`,
      content: r.content.substring(0, 500),
      score: r.score,
      matchType: 'related',
      source: 'document',
      pageNumber: r.pageNumber,
      fileName: r.fileName,
      rrfScore: 1 / (K + i + 1),
    })
  }

  return Array.from(fused.values())
    .sort((a, b) => b.rrfScore - a.rrfScore)
    .slice(0, options?.limit || 20)
}
```
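
Phase 3 relies on Reciprocal Rank Fusion: each item receives `1 / (K + rank)` from every ranked list it appears in, with ranks starting at 1. The generic sketch below sums the contributions of items that appear in several lists. In `searchWithDocuments`, note and document keys never collide, so each entry receives exactly one contribution. `reciprocalRankFusion` and the `id` field are hypothetical names chosen for this illustration.

```typescript
interface Ranked {
  id: string
}

// Fuses several ranked lists; items present in more than one list accumulate their scores.
function reciprocalRankFusion<T extends Ranked>(
  lists: T[][],
  k = 60
): (T & { rrfScore: number })[] {
  const fused = new Map<string, T & { rrfScore: number }>()
  for (const list of lists) {
    list.forEach((item, index) => {
      const contribution = 1 / (k + index + 1) // index 0 is rank 1
      const existing = fused.get(item.id)
      if (existing) {
        existing.rrfScore += contribution
      } else {
        fused.set(item.id, { ...item, rrfScore: contribution })
      }
    })
  }
  return [...fused.values()].sort((a, b) => b.rrfScore - a.rrfScore)
}

const keywordHits = [{ id: 'a' }, { id: 'b' }]
const vectorHits = [{ id: 'b' }, { id: 'c' }]
const fusedHits = reciprocalRankFusion([keywordHits, vectorHits])
console.log(fusedHits.map(hit => hit.id))
```

Here `b` ranks first because it collects `1/62 + 1/61`, ahead of `a` with `1/61` and `c` with `1/62`.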

### D2. Prioritization Logic in the RAG Chat

Update to `app/api/chat/route.ts`:

```typescript
// In the chat handler, before injecting the context:

let contextNotes = ''

// Formats one search result as a labelled context block
const formatResult = (r: any) =>
  r.source === 'document'
    ? `[DOCUMENT: ${r.fileName} p.${r.pageNumber}]\n${r.content}`
    : `[NOTE: ${r.title}]\n${r.content}`

// Does the user mention a document/PDF?
const documentMention = userMessage.match(
  /\b(pdf|document|fichier|pi[eè]ce jointe|attachment|file)\b/i
)
// Does the user refer to the note or document currently open? (French phrasings)
const specificNote = userMessage.match(
  /(?:dans|sur|de|du|la|le) (?:cette note|ce document|cette page)/i
)

// Targeted mode needs the id of the current note, so that is what is checked here
if (specificNote && currentNoteId) {
  // TARGETED MODE: restrict the document search to this note's attachments
  const docResults = await semanticSearchService.searchWithDocuments(
    userId, userMessage, { noteId: currentNoteId, includeDocuments: true, limit: 5 }
  )
  contextNotes = docResults.map(formatResult).join('\n\n---\n\n')
} else {
  // GLOBAL MODE: broad search over notes, plus documents when one is mentioned
  const results = await semanticSearchService.searchWithDocuments(
    userId, userMessage, { notebookId, includeDocuments: !!documentMention, limit: 10 }
  )
  contextNotes = results.map(formatResult).join('\n\n---\n\n')
}
```
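
The mode selection depends only on the message text and on whether a current note is known, so it can be isolated in a pure function and unit-tested without a database. The sketch below reuses the two regular expressions from the handler; `detectSearchMode` and the `SearchMode` type are hypothetical names.

```typescript
type SearchMode =
  | { mode: 'targeted'; noteId: string; includeDocuments: true; limit: 5 }
  | { mode: 'global'; includeDocuments: boolean; limit: 10 }

const DOCUMENT_MENTION = /\b(pdf|document|fichier|pi[eè]ce jointe|attachment|file)\b/i
const SPECIFIC_NOTE = /(?:dans|sur|de|du|la|le) (?:cette note|ce document|cette page)/i

// Picks targeted mode only when the message points at the current note AND its id is known.
function detectSearchMode(message: string, currentNoteId?: string): SearchMode {
  if (SPECIFIC_NOTE.test(message) && currentNoteId) {
    return { mode: 'targeted', noteId: currentNoteId, includeDocuments: true, limit: 5 }
  }
  return { mode: 'global', includeDocuments: DOCUMENT_MENTION.test(message), limit: 10 }
}
```

Since `SPECIFIC_NOTE` only covers French phrasings, an English message such as "in this document" falls back to global mode; it still searches documents there because `DOCUMENT_MENTION` matches the word "document".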

### D3. Updated System Prompt

```typescript
const systemPrompt = `You are Memento's Note AI, the intelligent note-taking assistant.

AVAILABLE CONTEXTS:
- [NOTE: title] → content of one of the user's notes
- [DOCUMENT: file.pdf p.X] → passage extracted from a PDF attached to a note

RULES FOR DOCUMENTS:
- Always cite the file name and the page number when you refer to a document
- If the user asks about "this document" or "the PDF", base your answer only on the [DOCUMENT] passages
- If the passages are insufficient, say so clearly rather than guessing
- For tables and numerical data, reproduce them faithfully

...`
```
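
The prompt asks the model to cite the file name and page number, which only works if every injected passage carries them. Since `pageNumber` is nullable in the schema, a formatter may want to omit the page rather than print `p.null`. The sketch below is one way to do that; `formatContext` and `ContextResult` are hypothetical names, and dropping the page suffix is a design choice rather than documented behavior.

```typescript
interface ContextResult {
  source: 'note' | 'document'
  title: string
  content: string
  fileName?: string
  pageNumber?: number | null
}

// Renders results with the [NOTE: …] / [DOCUMENT: …] labels the system prompt describes.
function formatContext(results: ContextResult[]): string {
  return results
    .map(r => {
      if (r.source === 'document') {
        // Omit the page suffix when the page is unknown
        const page = r.pageNumber != null ? ` p.${r.pageNumber}` : ''
        return `[DOCUMENT: ${r.fileName ?? 'unknown file'}${page}]\n${r.content}`
      }
      return `[NOTE: ${r.title}]\n${r.content}`
    })
    .join('\n\n---\n\n')
}
```

The `---` separator matches the one used when the chat handler joins context blocks.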

### D4. SQL: Debugging / Test Query

```sql
-- Test: nearest-neighbor search over document chunks.
-- Replace the placeholder literal with a real 1536-dimension query vector.
-- To restrict the search to one document, add a filter on na.id.
SELECT
  dc.content,
  dc."pageNumber",
  dc."chunkIndex",
  na."fileName",
  n.title AS note_title,
  dc.embedding <=> '[0.01, 0.02, ...]'::vector AS distance
FROM "DocumentChunk" dc
JOIN "NoteAttachment" na ON na.id = dc."attachmentId"
JOIN "Note" n ON n.id = na."noteId"
WHERE na.status = 'ready'
  AND n."trashedAt" IS NULL
  AND dc.embedding IS NOT NULL
ORDER BY dc.embedding <=> '[0.01, 0.02, ...]'::vector
LIMIT 10;
```

---

## Summary of Files to Create/Modify

| Action | File |
|---|---|
| **CREATE** | `prisma/migrations/XXX_add_note_attachment_document_chunk/migration.sql` |
| **MODIFY** | `prisma/schema.prisma`: add NoteAttachment, DocumentChunk, relation on Note |
| **CREATE** | `lib/ai/services/document-extraction.service.ts` |
| **CREATE** | `lib/ai/services/document-chunking.service.ts` |
| **CREATE** | `lib/ai/services/document-ingestion.service.ts` |
| **CREATE** | `lib/ai/tools/document-search.tool.ts` |
| **MODIFY** | `lib/ai/tools/index.ts`: add the document-search import |
| **MODIFY** | `lib/ai/tools/registry.ts`: add document_search to buildToolsForChat |
| **CREATE** | `app/api/notes/[noteId]/attachments/route.ts`: upload |
| **CREATE** | `app/api/notes/[noteId]/attachments/[attachmentId]/route.ts`: GET status, DELETE |
| **MODIFY** | `lib/ai/services/semantic-search.service.ts`: add searchWithDocuments |
| **MODIFY** | `app/api/chat/route.ts`: document context in the RAG |