rag/README.md
2025-03-01 08:15:30 +01:00

# RAG Modeling Project
## Overview
RAG Modeling is a Retrieval-Augmented Generation (RAG) system built around document processing. It extracts text, images, and tables from PDF documents so that RAG applications can be built on the extracted data.
## Features
- **Advanced PDF Processing**:
  - Multiple extraction methods for maximum text coverage
  - Image extraction with OCR capabilities
  - Table detection and extraction
  - Structured document parsing
- **Text Processing Pipeline**:
  - Intelligent text chunking for optimal context management
  - Support for multiple languages
  - Metadata preservation
- **Modular Architecture**:
  - Component-based design for easy extension
  - Configurable processing parameters
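
To make the chunking and metadata-preservation features concrete, here is a minimal sketch of overlapping, fixed-size chunking in plain Python. It is an illustration only, not this project's implementation: the names `Chunk` and `chunk_text`, the `chunk_size` and `overlap` parameters, and the character-based windowing are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical container: a piece of text plus the metadata it inherits."""
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_text(text, chunk_size=200, overlap=50, metadata=None):
    """Split text into overlapping character windows.

    Each chunk gets a copy of the document-level metadata plus its own
    start offset, so a retrieved chunk can be traced back to its source.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        chunks.append(Chunk(piece, {**(metadata or {}), "start": start}))
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


if __name__ == "__main__":
    # "report.pdf" is a placeholder file name used only for this example.
    for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=1,
                            metadata={"source": "report.pdf"}):
        print(chunk.metadata["start"], chunk.text)
```

The overlap means neighbouring chunks share some text, which reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary. A real pipeline would more likely split on sentence or token boundaries instead of raw characters.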
## Installation
```bash
# Clone the repository
git clone https://gitea.parsanet.org/sepehr/rag.git
cd rag
# Create a virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
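# Install additional dependencies for document parsing, OCR, table extraction, and image handling
# Note: pytesseract is only a wrapper; the Tesseract OCR engine must be installed separately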
pip install unstructured pytesseract camelot-py opencv-python pandas