# RAG Modeling Project

## Overview

RAG Modeling is an advanced Retrieval-Augmented Generation system with comprehensive document processing capabilities. The system focuses on extracting high-quality data from PDF documents including text, images, and tables to build robust RAG applications.

## Features

- **Advanced PDF Processing**:
  - Multiple extraction methods for maximum text coverage
  - Image extraction with OCR capabilities
  - Table detection and extraction
  - Structured document parsing
- **Text Processing Pipeline**:
  - Intelligent text chunking for optimal context management
  - Support for multiple languages
  - Metadata preservation
- **Modular Architecture**:
  - Component-based design for easy extension
  - Configurable processing parameters

## Installation

```bash
# Clone the repository
git clone https://gitea.parsanet.org/sepehr/rag.git
cd rag

# Create a virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install additional dependencies
pip install unstructured pytesseract camelot-py opencv-python pandas