# RAG Chatbot
This repository contains a Retrieval-Augmented Generation (RAG) chatbot implementation that processes source documents and answers questions based on the provided context.
## Requirements
### Python Version
⚠️ **Important**: This project requires a Python version below 3.12. Python 3.11 is known to work.
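To catch an unsupported interpreter early, a small check can be run before installing anything. This is a minimal sketch, not part of the repository: the helper name `is_supported` is hypothetical, and only the upper bound (below 3.12) comes from the requirement above.

```python
import sys

# Upper bound taken from the requirement above. No lower bound is stated
# in this README, so none is enforced here.
MAX_EXCLUSIVE = (3, 12)


def is_supported(version_info=sys.version_info) -> bool:
    """Return True when the (major, minor) version is below 3.12."""
    return tuple(version_info[:2]) < MAX_EXCLUSIVE


if not is_supported():
    print(
        f"Warning: Python {sys.version_info.major}.{sys.version_info.minor} "
        "detected; this project expects a version below 3.12."
    )
```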
## Installation
1. Clone this repository:
```bash
git clone <repository-url>
cd <repository-name>
```
2. Install the required dependencies:
```bash
pip install -r requirement.txt
```
## Usage
### Command Line Interface
Run the chatbot in terminal mode:
```bash
python cli.py
```
### Web Interface
Launch the Gradio web interface:
```bash
python gradio_chatbot.py
```
### RAG Implementation
To use the RAG functionality from your own Python script:
```python
from rag_chatbot import RagChatbot

# Create the chatbot with its default settings and ask a question
chatbot = RagChatbot()
response = chatbot.query("your question here")
```
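When several questions need answering in one run, the single `query` call shown above can be wrapped in a loop. The helper below is a sketch under assumptions: `ask_all` and `SupportsQuery` are hypothetical names, and the only thing it relies on is that the chatbot object exposes `query(question)` as in the example above. The return type of `query` is not documented here, so answers are stored as returned.

```python
from typing import Any, Dict, Iterable, Protocol


class SupportsQuery(Protocol):
    """Anything exposing query(question), such as the RagChatbot shown above."""

    def query(self, question: str) -> Any: ...


def ask_all(chatbot: SupportsQuery, questions: Iterable[str]) -> Dict[str, Any]:
    """Send each non-empty question to the chatbot and collect the answers."""
    answers: Dict[str, Any] = {}
    for question in questions:
        question = question.strip()
        if not question:
            continue  # skip blank entries instead of querying with them
        answers[question] = chatbot.query(question)
    return answers


# Hypothetical usage with the real class:
#   from rag_chatbot import RagChatbot
#   answers = ask_all(RagChatbot(), ["first question", "second question"])
```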
## PDF Processing
The repository includes two methods for processing PDF documents as knowledge sources:
### PDF Processing Class
The configurable `PdfProcessor` class extracts text, images, and tables from PDF documents and stores them in a Qdrant vector database.
Key features:
- Support for both Ollama and OpenAI models
- Configurable embedding, text summarization, and image analysis models
- Automatic text chunking based on document structure
- Image and table extraction with descriptions
- Customizable Qdrant collection configuration
Example usage:
```python
from pdf_processor import PdfProcessor

# Basic usage with default settings
processor = PdfProcessor()
result = processor.process_pdf("path/to/document.pdf")

# Custom configuration
config = {
    "embedding_provider": "openai",
    "image_provider": "openai",
    "collection_name": "my_documents",
    "openai_api_key": "your-api-key",
    "summary_language": "French",
}
processor = PdfProcessor(config)
processor.process_pdf("path/to/document.pdf")
```
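To ingest a whole folder rather than a single file, the `process_pdf` call from the example above can be applied to each PDF in turn. This is a sketch under assumptions: `process_directory` is a hypothetical helper, not part of `pdf_processor.py`, and it assumes only that the processor exposes `process_pdf(path)` as shown above. Failures are collected rather than raised so that one bad file does not stop the batch.

```python
from pathlib import Path


def process_directory(processor, directory):
    """Run processor.process_pdf on every *.pdf file directly inside directory.

    Returns two dicts keyed by file name: successful results and exceptions.
    """
    results = {}
    failures = {}
    for pdf_path in sorted(Path(directory).glob("*.pdf")):
        try:
            results[pdf_path.name] = processor.process_pdf(str(pdf_path))
        except Exception as exc:  # keep going if a single document fails
            failures[pdf_path.name] = exc
    return results, failures


# Hypothetical usage with the real class:
#   from pdf_processor import PdfProcessor
#   results, failures = process_directory(PdfProcessor(), "path/to/pdfs")
```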
### Jupyter Notebook
For interactive PDF processing, you can also use the Jupyter notebook [`final_pdf.ipynb`](final_pdf.ipynb).
## Project Structure
- [`cli.py`](cli.py): Command-line interface implementation
- [`gradio_chatbot.py`](gradio_chatbot.py): Gradio web interface
- [`rag_chatbot.py`](rag_chatbot.py): Core RAG implementation
- [`pdf_processor.py`](pdf_processor.py): PDF processing and vectorization
- [`final_pdf.ipynb`](final_pdf.ipynb): Jupyter notebook for PDF processing