# RAG Chatbot

This repository contains a Retrieval-Augmented Generation (RAG) chatbot that processes data and answers questions based on the provided context.

## Requirements

### Python Version
⚠️ **Important**: This project requires a Python version lower than 3.12. Python 3.11 works correctly.

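To fail fast on an unsupported interpreter, a script can check the version before importing anything else. The following is a minimal sketch: the helper names `is_supported_python` and `require_supported_python` are hypothetical and not part of this repository, and no lower bound is enforced because only the upper bound is stated above.

```python
import sys

# Upper bound stated in this README: versions below 3.12 are supported.
MAX_EXCLUSIVE = (3, 12)


def is_supported_python(version_info=None):
    """Return True if the given (or running) interpreter is older than 3.12."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); the patch level does not matter here.
    return tuple(version_info[:2]) < MAX_EXCLUSIVE


def require_supported_python(version_info=None):
    """Raise RuntimeError when the interpreter is 3.12 or newer."""
    if not is_supported_python(version_info):
        raise RuntimeError(
            "This project requires a Python version lower than 3.12 "
            "(Python 3.11 works correctly)."
        )
```

Calling `require_supported_python()` at the top of an entry point would stop the program with a clear message instead of failing later during dependency import.
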
## Installation

1. Clone this repository:
```bash
git clone <repository-url>
cd <repository-name>
```

2. Install the required dependencies:
```bash
pip install -r requirement.txt
```

## Usage

### Command Line Interface
Run the chatbot in terminal mode:
```bash
python cli.py
```

### Web Interface
Launch the Gradio web interface:
```bash
python gradio_chatbot.py
```

### RAG Implementation
To use the RAG functionality from your own Python script, import `RagChatbot`:
```python
from rag_chatbot import RagChatbot

chatbot = RagChatbot()
response = chatbot.query("your question here")
```

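When several questions need answering in one run, the single `query` call above can be wrapped in a small loop. The sketch below is illustrative only: `answer_questions` and `SupportsQuery` are hypothetical names, and it assumes nothing about `RagChatbot` beyond a `query` method that takes one string, as in the snippet above.

```python
from typing import Any, Iterable, List, Protocol, Tuple


class SupportsQuery(Protocol):
    """Any object exposing query(question), such as RagChatbot above."""

    def query(self, question: str) -> Any: ...


def answer_questions(
    chatbot: SupportsQuery, questions: Iterable[str]
) -> List[Tuple[str, Any]]:
    """Send each non-empty question to the chatbot and collect the pairs."""
    results = []
    for question in questions:
        question = question.strip()
        if not question:
            continue  # skip blank entries instead of querying with nothing
        results.append((question, chatbot.query(question)))
    return results
```

With the real class this would be called as `answer_questions(RagChatbot(), ["first question", "second question"])`.
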
## PDF Processing
The repository includes two methods for processing PDF documents as knowledge sources:

### PDF Processing Class
A configurable `PdfProcessor` class extracts text, images, and tables from PDF documents and stores them in a Qdrant vector database.

Key features:
- Support for both Ollama and OpenAI models
- Configurable embedding, text summarization, and image analysis models
- Automatic text chunking based on document structure
- Image and table extraction with descriptions
- Customizable Qdrant collection configuration

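To give a feel for what chunking based on document structure can mean, here is a generic sketch that splits text at headings. It is not `PdfProcessor`'s actual algorithm, which this README does not describe: the function name `chunk_by_headings` is hypothetical, and it assumes the extracted text already carries Markdown-style `#` headings.

```python
import re
from typing import Dict, List

# Assumption: headings look like "# Title" through "###### Title".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_headings(text: str) -> List[Dict[str, str]]:
    """Split text into one chunk per heading, keeping the heading as title."""
    chunks: List[Dict[str, str]] = []
    title = ""  # text before the first heading gets an empty title
    body: List[str] = []

    def flush() -> None:
        content = "\n".join(body).strip()
        if content:  # drop sections that contain no text
            chunks.append({"title": title, "text": content})

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section
            title = match.group(2).strip()
            body = []
        else:
            body.append(line)
    flush()  # close the final section
    return chunks
```

Each resulting chunk keeps its section title, which could be stored alongside the embedding as metadata.
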
Example usage:
```python
from pdf_processor import PdfProcessor

# Basic usage with default settings
processor = PdfProcessor()
result = processor.process_pdf("path/to/document.pdf")

# Custom configuration
config = {
    "embedding_provider": "openai",
    "image_provider": "openai",
    "collection_name": "my_documents",
    "openai_api_key": "your-api-key",
    "summary_language": "French"
}
processor = PdfProcessor(config)
processor.process_pdf("path/to/document.pdf")
```

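Rather than hardcoding the API key in the configuration dictionary, it can be read from the environment. The sketch below reuses the configuration keys from the example above; the helper name `build_pdf_config` is hypothetical, and the use of the `OPENAI_API_KEY` environment variable is an assumption (it is the conventional name, but this project may expect something else).

```python
import os
from typing import Dict, Mapping, Optional


def build_pdf_config(
    collection_name: str,
    summary_language: str,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Build a PdfProcessor-style config dict with the key taken from env."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")  # assumed variable name
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {
        "embedding_provider": "openai",
        "image_provider": "openai",
        "collection_name": collection_name,
        "openai_api_key": api_key,
        "summary_language": summary_language,
    }
```

The resulting dictionary could then be passed as `PdfProcessor(build_pdf_config("my_documents", "French"))`, keeping the secret out of source control.
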
### Jupyter Notebook
For interactive PDF processing, you can also use the Jupyter notebook [`final_pdf.ipynb`](final_pdf.ipynb).

## Project Structure
- [`cli.py`](cli.py): Command-line interface implementation
- [`gradio_chatbot.py`](gradio_chatbot.py): Gradio web interface
- [`rag_chatbot.py`](rag_chatbot.py): Core RAG implementation
- [`pdf_processor.py`](pdf_processor.py): PDF processing and vectorization
- [`final_pdf.ipynb`](final_pdf.ipynb): Jupyter notebook for PDF processing