RAG Modeling Project
Overview
RAG Modeling is a Retrieval-Augmented Generation (RAG) system built around document processing. It extracts text, images, and tables from PDF documents so that RAG applications are built on high-quality source data.
Features
- Advanced PDF Processing:
  - Multiple extraction methods for maximum text coverage
  - Image extraction with OCR capabilities
  - Table detection and extraction
  - Structured document parsing
- Text Processing Pipeline:
  - Intelligent text chunking for optimal context management
  - Support for multiple languages
  - Metadata preservation
- Modular Architecture:
  - Component-based design for easy extension
  - Configurable processing parameters
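To make the chunking and metadata-preservation features more concrete, here is a minimal sketch of overlapping, fixed-size chunking. It is not taken from the project's code: the `Chunk` class, the `chunk_text` function, and the `chunk_size` and `overlap` parameters are hypothetical names chosen for illustration, and the project may well use a different strategy (for example, sentence- or token-based splitting).

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A piece of text plus the metadata needed to trace it back to its source."""
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_text(text, source, chunk_size=500, overlap=50):
    """Split text into overlapping character windows (hypothetical helper).

    Each chunk records its source name and character offsets so that
    retrieved passages can be mapped back to the original document.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    chunks = []
    step = chunk_size - overlap  # how far the window advances each time
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        chunks.append(
            Chunk(piece, {"source": source, "start": start, "end": start + len(piece)})
        )
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Calling `chunk_text(page_text, source="report.pdf")` would return a list of `Chunk` objects whose neighbours share `overlap` characters. The overlap reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary, at the cost of some duplicated text in the index.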
Installation
# Clone the repository
git clone https://gitea.parsanet.org/sepehr/rag.git
cd rag
# Create a virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
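# Install additional dependencies for parsing, OCR, and table extraction
# Note: pytesseract is a wrapper and also needs the Tesseract OCR engine installed on the system
pip install unstructured pytesseract camelot-py opencv-python pandas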
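After installation, a quick check that the packages can be found may save some debugging later. The script below is a small sketch and not part of the repository: `check_dependencies` and `REQUIRED_MODULES` are hypothetical names. It only tests whether each module is importable, not whether external tools such as the Tesseract binary are present.

```python
import importlib.util

# Maps each pip package name to the module name it is imported under.
REQUIRED_MODULES = {
    "unstructured": "unstructured",
    "pytesseract": "pytesseract",
    "camelot-py": "camelot",
    "opencv-python": "cv2",
    "pandas": "pandas",
}


def check_dependencies(modules=REQUIRED_MODULES):
    """Return {package_name: True/False} depending on whether the module can be located."""
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in modules.items()
    }


if __name__ == "__main__":
    for package, found in check_dependencies().items():
        print(f"{package}: {'ok' if found else 'MISSING'}")
```

Saved under any file name and run inside the activated virtual environment, it prints one line per package so that a missing dependency is visible before opening the notebooks.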