RAG Chatbot

This repository contains a Retrieval-Augmented Generation (RAG) chatbot that processes data and answers questions based on the provided context.

Requirements

Python Version

⚠️ Important: This project requires a Python version lower than 3.12. Python 3.11 is known to work.
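
To catch an unsupported interpreter early, a small version guard can be useful. The following is a minimal sketch, not part of this repository: the function names `is_supported_python` and `require_supported_python` are hypothetical, and the only bound encoded is the one stated above (below 3.12).

```python
import sys

# Upper bound taken from the requirement above: Python must be below 3.12.
MAX_EXCLUSIVE = (3, 12)


def is_supported_python(version_info=sys.version_info):
    """Return True if the major.minor version is a Python 3 release below 3.12."""
    major_minor = tuple(version_info[:2])
    return major_minor[0] == 3 and major_minor < MAX_EXCLUSIVE


def require_supported_python(version_info=sys.version_info):
    """Raise RuntimeError with a readable message on an unsupported interpreter."""
    if not is_supported_python(version_info):
        found = ".".join(str(part) for part in version_info[:2])
        raise RuntimeError(
            f"Python {found} is not supported; use a version below 3.12 (3.11 works)."
        )
```

Calling `require_supported_python()` at the top of an entry script would stop with a clear message instead of failing later during dependency import.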

Installation

  1. Clone this repository:
git clone <repository-url>
cd <repository-name>
  2. Install the required dependencies:
pip install -r requirement.txt

Usage

Command Line Interface

Run the chatbot in terminal mode:

python cli.py

Web Interface

Launch the Gradio web interface:

python gradio_chatbot.py

RAG Implementation

To use the RAG functionality from your own Python script, import the RagChatbot class:

from rag_chatbot import RagChatbot

chatbot = RagChatbot()
response = chatbot.query("your question here")
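
When several questions need to be answered in one run, the single call above can be wrapped in a small helper. This is a sketch under assumptions: `ask_all` and `SupportsQuery` are hypothetical names, and it assumes only that the chatbot object exposes `query(question)` as in the snippet above and that the response can be treated as a string.

```python
from typing import Iterable, List, Protocol, Tuple


class SupportsQuery(Protocol):
    """Anything with a query(question) method, e.g. a RagChatbot instance."""

    def query(self, question: str) -> str: ...


def ask_all(chatbot: SupportsQuery, questions: Iterable[str]) -> List[Tuple[str, str]]:
    """Send each non-empty question to the chatbot and collect (question, response) pairs."""
    results = []
    for question in questions:
        question = question.strip()
        if not question:
            continue  # skip blank lines, e.g. when reading questions from a file
        results.append((question, chatbot.query(question)))
    return results


# Usage (assumes rag_chatbot is importable as shown above):
# from rag_chatbot import RagChatbot
# for question, response in ask_all(RagChatbot(), ["first question", "second question"]):
#     print(question, "->", response)
```

Typing the parameter as a protocol rather than the concrete class keeps the helper easy to try with a stand-in object before the full pipeline is configured.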

PDF Processing

The repository includes two methods for processing PDF documents as knowledge sources:

PDF Processing Class

The configurable PdfProcessor class extracts text, images, and tables from PDF documents and stores them in a Qdrant vector database.

Key features:

  • Support for both Ollama and OpenAI models
  • Configurable embedding, text summarization, and image analysis models
  • Automatic text chunking based on document structure
  • Image and table extraction with descriptions
  • Customizable Qdrant collection configuration

Example usage:

from pdf_processor import PdfProcessor

# Basic usage with default settings
processor = PdfProcessor()
result = processor.process_pdf("path/to/document.pdf")

# Custom configuration
config = {
    "embedding_provider": "openai",
    "image_provider": "openai",
    "collection_name": "my_documents",
    "openai_api_key": "your-api-key",
    "summary_language": "French"
}
processor = PdfProcessor(config)
processor.process_pdf("path/to/document.pdf")
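
The custom configuration above places the API key directly in the script. One way to avoid that is to read the key from an environment variable, sketched below. The helper `build_pdf_config` is hypothetical and not part of this repository, and the variable name `OPENAI_API_KEY` is an assumption; the dictionary keys are the ones shown in the example above.

```python
import os


def build_pdf_config(collection_name, summary_language="French", env=None):
    """Build a PdfProcessor-style config dict, reading the API key from the environment.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")  # assumed variable name
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {
        "embedding_provider": "openai",
        "image_provider": "openai",
        "collection_name": collection_name,
        "openai_api_key": api_key,
        "summary_language": summary_language,
    }


# Usage (assumes pdf_processor is importable as shown above):
# from pdf_processor import PdfProcessor
# processor = PdfProcessor(build_pdf_config("my_documents"))
# processor.process_pdf("path/to/document.pdf")
```

Keeping the key out of source files reduces the chance of committing it to the repository by accident.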

Jupyter Notebook

For interactive PDF processing, you can also use the Jupyter notebook final_pdf.ipynb.

Project Structure