RAG Chatbot
This repository contains a Retrieval-Augmented Generation (RAG) chatbot implementation that processes data and answers questions based on the provided context.
Requirements
Python Version
⚠️ Important: This project requires a Python version lower than 3.12. Python 3.11 is known to work.
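A version guard along the following lines could be called at the top of an entry point such as cli.py so that an unsupported interpreter fails early with a readable message. This is only a minimal sketch: the function names `is_supported_python` and `require_supported_python` are assumptions for illustration and are not part of the repository.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True when the interpreter is older than Python 3.12."""
    # Compare only (major, minor); the patch release does not matter here.
    return tuple(version_info[:2]) < (3, 12)


def require_supported_python(version_info=sys.version_info):
    """Raise RuntimeError with a readable message on Python 3.12 or newer."""
    if not is_supported_python(version_info):
        found = ".".join(str(part) for part in version_info[:3])
        raise RuntimeError(
            f"Python {found} is not supported; use a version lower than 3.12 "
            "(3.11 is known to work)."
        )
```

Calling `require_supported_python()` with no arguments checks the running interpreter; passing a tuple such as `(3, 12, 0)` makes the check easy to exercise in a test.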
Installation
- Clone this repository:
git clone <repository-url>
cd <repository-name>
- Install the required dependencies:
pip install -r requirement.txt
Usage
Command Line Interface
Run the chatbot in terminal mode:
python cli.py
Web Interface
Launch the Gradio web interface:
python gradio_chatbot.py
RAG Implementation
To use the RAG functionality in your own Python script:
from rag_chatbot import RagChatbot
chatbot = RagChatbot()
response = chatbot.query("your question here")
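When several questions need answering in one run, a small helper can loop over them. The sketch below assumes only what the snippet above shows, namely that the chatbot object exposes a `query` method taking a string; the return type is not documented here, so it is treated as opaque. `ask_all` and `EchoChatbot` are hypothetical names used for illustration, and `EchoChatbot` merely stands in for `RagChatbot` so the example runs without the repository installed.

```python
from typing import Iterable, Protocol


class SupportsQuery(Protocol):
    """Anything with a query(question) method, e.g. a RagChatbot instance."""

    def query(self, question: str) -> object: ...


def ask_all(chatbot: SupportsQuery, questions: Iterable[str]) -> dict:
    """Send each non-empty question to the chatbot and collect the answers."""
    answers = {}
    for question in questions:
        question = question.strip()
        if not question:
            # Skip blank entries instead of sending empty prompts.
            continue
        answers[question] = chatbot.query(question)
    return answers


class EchoChatbot:
    """Stand-in used only for this example; not part of the repository."""

    def query(self, question: str) -> str:
        return f"echo: {question}"


results = ask_all(EchoChatbot(), ["What is RAG?", "  ", "Which database is used? "])
```

Replacing `EchoChatbot()` with `RagChatbot()` should work as long as the real class keeps the `query` method shown above.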
PDF Processing
The repository includes two methods for processing PDF documents as knowledge sources:
PDF Processing Class
The configurable PdfProcessor class extracts text, images, and tables from PDF documents and stores them in a Qdrant vector database.
Key features:
- Support for both Ollama and OpenAI models
- Configurable embedding, text summarization, and image analysis models
- Automatic text chunking based on document structure
- Image and table extraction with descriptions
- Customizable Qdrant collection configuration
Example usage:
from pdf_processor import PdfProcessor
# Basic usage with default settings
processor = PdfProcessor()
result = processor.process_pdf("path/to/document.pdf")
# Custom configuration
config = {
    "embedding_provider": "openai",
    "image_provider": "openai",
    "collection_name": "my_documents",
    "openai_api_key": "your-api-key",
    "summary_language": "French"
}
processor = PdfProcessor(config)
processor.process_pdf("path/to/document.pdf")
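To avoid committing an API key to source control, the configuration dictionary can be assembled from an environment variable. The sketch below reuses the keys from the example above; the helper name `build_processor_config` and the choice of the `OPENAI_API_KEY` variable are assumptions, and whether PdfProcessor reads that variable on its own is not stated in this README.

```python
import os


def build_processor_config(
    env=os.environ,
    collection_name="my_documents",
    summary_language="French",
):
    """Build a config dict with the same keys as the example above.

    The API key is read from the environment instead of being hardcoded.
    """
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        # Fail early rather than passing an empty key downstream.
        raise ValueError("OPENAI_API_KEY is not set")
    return {
        "embedding_provider": "openai",
        "image_provider": "openai",
        "collection_name": collection_name,
        "openai_api_key": api_key,
        "summary_language": summary_language,
    }
```

The returned dictionary can then be passed to the constructor in the same way as the hand-written `config` above, for example `PdfProcessor(build_processor_config())`.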
Jupyter Notebook
For interactive PDF processing, you can also use the Jupyter notebook final_pdf.ipynb.
Project Structure
- cli.py: Command-line interface implementation
- gradio_chatbot.py: Gradio web interface
- rag_chatbot.py: Core RAG implementation
- pdf_processor.py: PDF processing and vectorization
- final_pdf.ipynb: Jupyter notebook for PDF processing