# RAG Chatbot

This repository contains a Retrieval-Augmented Generation (RAG) chatbot that processes your data and answers questions based on the context it retrieves.

## Requirements

### Python Version
⚠️ **Important**: This project requires a Python version earlier than 3.12. Python 3.11 is known to work.
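
A launcher or setup script could check the interpreter before doing anything else. The sketch below is one way to express the constraint above. The helper name `is_supported_python` is hypothetical and not part of this repository, and it encodes only the stated upper bound.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True when the interpreter is older than Python 3.12."""
    # Only the (major, minor) pair matters for the stated constraint.
    return tuple(version_info[:2]) < (3, 12)


# Evaluate the check for the interpreter running this snippet.
supported = is_supported_python()
```

A script could exit early with a clear message when `supported` is `False`, instead of failing later during dependency installation.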

## Installation

1. Clone this repository:
```bash
git clone <repository-url>
cd <repository-name>
```

2. Install the required dependencies:
```bash
pip install -r requirement.txt
```

## Usage

### Command Line Interface
Run the chatbot in terminal mode:
```bash
python cli.py
```

### Web Interface
Launch the Gradio web interface:
```bash
python gradio_chatbot.py
```

### RAG Implementation
To use the RAG functionality in your own Python script, import the `RagChatbot` class:
```python
from rag_chatbot import RagChatbot

# Create the chatbot, then ask it a question
chatbot = RagChatbot()
response = chatbot.query("your question here")
print(response)
```
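
For readers new to the pattern, the general retrieve-then-generate flow can be sketched in a few lines. This is a toy illustration and not the code in `rag_chatbot.py`: it swaps real embeddings for word-count vectors and stops at building the prompt instead of calling a language model. All names here (`embed`, `cosine`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question, chunks, k=2):
    """Assemble the prompt that would be sent to the language model."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


chunks = [
    "Qdrant is a vector database.",
    "Gradio builds web interfaces for Python functions.",
    "Python 3.11 is supported by this project.",
]
prompt = build_prompt("Which vector database is used?", chunks, k=1)
```

In a real pipeline the word-count vectors would be replaced by model embeddings stored in a vector database, and `prompt` would be passed to the language model to produce the answer.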

## PDF Processing
The repository includes two ways to process PDF documents as knowledge sources: a configurable Python class and a Jupyter notebook.

### PDF Processing Class
The configurable `PdfProcessor` class extracts text, images, and tables from PDF documents and stores them in a Qdrant vector database.

Key features:
- Support for both Ollama and OpenAI models
- Configurable embedding, text summarization, and image analysis models
- Automatic text chunking based on document structure
- Image and table extraction with descriptions
- Customizable Qdrant collection configuration
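
As an illustration of what structure-based chunking can mean, the sketch below splits already-extracted text at heading lines so that each chunk carries one section. It shows the general idea only and is not `PdfProcessor`'s actual logic. The function `chunk_by_headings` and the Markdown-style `#` heading convention are hypothetical.

```python
def chunk_by_headings(text):
    """Split text into (heading, body) chunks at lines that start with '#'."""
    chunks = []
    heading, body = "", []
    for line in text.splitlines():
        if line.startswith("#"):
            # A new heading closes the previous chunk, if it has any content.
            if heading or any(part.strip() for part in body):
                chunks.append((heading, "\n".join(body).strip()))
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    # Flush the final chunk.
    if heading or any(part.strip() for part in body):
        chunks.append((heading, "\n".join(body).strip()))
    return chunks


sample = "# Intro\nRAG combines retrieval and generation.\n\n# Setup\nInstall the dependencies."
sections = chunk_by_headings(sample)
```

Keeping the heading alongside each body lets the heading be embedded or stored as metadata with the chunk, which tends to help retrieval.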

Example usage of `PdfProcessor`:
```python
from pdf_processor import PdfProcessor

# Basic usage with default settings
processor = PdfProcessor()
result = processor.process_pdf("path/to/document.pdf")

# Custom configuration
config = {
    "embedding_provider": "openai",
    "image_provider": "openai",
    "collection_name": "my_documents",
    "openai_api_key": "your-api-key",
    "summary_language": "French"
}
processor = PdfProcessor(config)
processor.process_pdf("path/to/document.pdf")
```
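
The custom configuration above sets only some keys, which suggests that the remaining settings fall back to defaults. One common way to get that behaviour is to overlay user settings on a defaults dictionary. The sketch below assumes this approach; the default values and the `merge_config` helper are hypothetical and not taken from `pdf_processor.py`.

```python
DEFAULT_CONFIG = {
    # Hypothetical defaults, for illustration only.
    "embedding_provider": "ollama",
    "image_provider": "ollama",
    "collection_name": "documents",
    "summary_language": "English",
}


def merge_config(user_config=None):
    """Return defaults overlaid with user settings, leaving both inputs unchanged."""
    merged = dict(DEFAULT_CONFIG)
    merged.update(user_config or {})
    return merged


config = merge_config({"embedding_provider": "openai", "summary_language": "French"})
```

Here `config` keeps the default `image_provider` and `collection_name` while taking the two overridden values from the caller.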

### Jupyter Notebook
For interactive PDF processing, you can also use the Jupyter notebook [`final_pdf.ipynb`](final_pdf.ipynb).

## Project Structure
- [`cli.py`](cli.py): Command-line interface implementation
- [`gradio_chatbot.py`](gradio_chatbot.py): Gradio web interface
- [`rag_chatbot.py`](rag_chatbot.py): Core RAG implementation
- [`pdf_processor.py`](pdf_processor.py): PDF processing and vectorization
- [`final_pdf.ipynb`](final_pdf.ipynb): Jupyter notebook for PDF processing