# RAG Chatbot

This repository contains a Retrieval-Augmented Generation (RAG) chatbot implementation that can process data and answer questions based on the provided context.

## Requirements

### Python Version

⚠️ **Important**: This project requires a Python version lower than 3.12. Python 3.11 works correctly.

## Installation

1. Clone this repository:

```bash
git clone <repository-url>
cd <repository-directory>
```

2. Install the required dependencies:

```bash
pip install -r requirement.txt
```

## Usage

### Command Line Interface

Run the chatbot in terminal mode:

```bash
python cli.py
```

### Web Interface

Launch the Gradio web interface:

```bash
python gradio_chatbot.py
```

### RAG Implementation

To use the RAG functionality in your own Python script, import it directly:

```python
from rag_chatbot import RagChatbot

chatbot = RagChatbot()
response = chatbot.query("your question here")
```

## PDF Processing

The repository includes two methods for processing PDF documents as knowledge sources:

### PDF Processing Class

A highly configurable `PdfProcessor` class is available for extracting text, images, and tables from PDF documents and storing them in a Qdrant vector database.
Key features:

- Support for both Ollama and OpenAI models
- Configurable embedding, text summarization, and image analysis models
- Automatic text chunking based on document structure
- Image and table extraction with descriptions
- Customizable Qdrant collection configuration

Example usage:

```python
from pdf_processor import PdfProcessor

# Basic usage with default settings
processor = PdfProcessor()
result = processor.process_pdf("path/to/document.pdf")

# Custom configuration
config = {
    "embedding_provider": "openai",
    "image_provider": "openai",
    "collection_name": "my_documents",
    "openai_api_key": "your-api-key",
    "summary_language": "French"
}
processor = PdfProcessor(config)
processor.process_pdf("path/to/document.pdf")
```

### Jupyter Notebook

For interactive PDF processing, you can also use the Jupyter notebook [`final_pdf.ipynb`](final_pdf.ipynb).

## Project Structure

- [`cli.py`](cli.py): Command-line interface implementation
- [`gradio_chatbot.py`](gradio_chatbot.py): Gradio web interface
- [`rag_chatbot.py`](rag_chatbot.py): Core RAG implementation
- [`pdf_processor.py`](pdf_processor.py): PDF processing and vectorization
- [`final_pdf.ipynb`](final_pdf.ipynb): Jupyter notebook for PDF processing
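
The Python version requirement stated under Requirements can also be checked programmatically before launching either interface. The following is a minimal sketch and not part of this repository: the helper names `is_supported_python` and `ensure_supported_python` are hypothetical, and the code only encodes the "lower than 3.12" rule given in this README.

```python
import sys


def is_supported_python(version_info=None):
    """Return True if the given (or running) Python version is lower than 3.12.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); the patch release does not matter here.
    return tuple(version_info[:2]) < (3, 12)


def ensure_supported_python():
    """Raise RuntimeError with a readable message on an unsupported interpreter."""
    if not is_supported_python():
        found = ".".join(str(part) for part in sys.version_info[:3])
        raise RuntimeError(
            f"This project requires a Python version lower than 3.12 (found {found})."
        )
```

One possible use, assuming the entry points do not already perform such a check, is to call `ensure_supported_python()` at the top of `cli.py` or `gradio_chatbot.py` so that an unsupported interpreter fails early with a clear message.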