This project is an advanced Retrieval-Augmented Generation (RAG) system designed as an interactive learning portal for political analytics. It leverages the Socratic method, using a Q&A interface to help users explore complex topics at the intersection of data science and political science.
The system is built with two distinct architectures: a foundational RAG pipeline and an enhanced version that incorporates structured, step-by-step reasoning for more complex analytical queries.
- Simple RAG: A robust baseline system for answering factual questions by retrieving relevant text chunks from a knowledge base.
- Enhanced Reasoning RAG: Utilizes the llm-reasoners framework to introduce a multi-step reasoning process for handling complex, analytical questions.
- A user-friendly web interface built with Streamlit that allows users to ask questions, view answers, and explore source documents.
- Includes features like modular viewing and progress tracking to guide the user's educational journey.
- Builds a knowledge base from a combination of dynamic web pages (17 URLs) and static academic PDFs (7 documents).
- Integrated with the RAGAs framework to quantitatively evaluate the performance of both the simple and enhanced RAG systems.
The project follows a modular RAG pipeline that progresses from data ingestion to user interaction:
- Data Ingestion: Content from specified URLs and local PDFs is loaded into the system.
- Text Processing & Vectorization: Documents are chunked and then converted into numerical embeddings.
- Vector Store Creation: The embeddings are stored in a FAISS vector database, creating a searchable knowledge index for efficient retrieval.
- Core RAG Logic:
  - When a user asks a question, the system performs a semantic search on the vector store to retrieve relevant document chunks.
  - For complex questions, the llm-reasoners framework adds a structured reasoning layer via prompt engineering before the final answer is generated by an OpenAI model (a sketch of this flow follows the list below).
- User Interface: A Streamlit application provides the front-end for users to interact with the RAG system, displaying the answer, sources, and learning progress.
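For orientation, here is a minimal sketch of the ingestion-to-retrieval flow using classic LangChain APIs. The URLs, file paths, chunk sizes, embedding model, and retriever settings are illustrative assumptions rather than the project's actual configuration, and the exact import paths depend on the pinned LangChain version (newer releases move these classes into `langchain_community` / `langchain_openai`).

```python
# Minimal sketch of the ingestion -> retrieval flow (classic LangChain API).
# URLs, paths, chunk sizes, and model names are illustrative, not the project's actual config.
from langchain.document_loaders import WebBaseLoader, PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

# 1. Data ingestion: load web pages and local PDFs.
docs = WebBaseLoader(["https://example.com/political-analytics-article"]).load()
docs += PyPDFLoader("src/datapdfs/example_paper.pdf").load()

# 2. Text processing: split documents into overlapping chunks.
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# 3. Vectorization + vector store: embed chunks and index them with FAISS.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = FAISS.from_documents(chunks, embeddings)
vector_store.save_local("faiss_index")

# 4. Core RAG logic: retrieve relevant chunks and generate an answer.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-4o-mini", temperature=0),
    retriever=vector_store.as_retriever(search_kwargs={"k": 4}),
    return_source_documents=True,
)
result = qa({"query": "How is polling data used to forecast elections?"})
print(result["result"])
```

The structured reasoning layer in the actual project is wired through llm-reasoners; the underlying idea can be approximated with plain prompt engineering, as in this illustrative template (not the llm-reasoners API itself):

```python
# Hypothetical prompt template approximating the structured-reasoning layer;
# the real project routes this step through the llm-reasoners framework.
REASONING_TEMPLATE = """You are a tutor for political analytics.
Use only the retrieved context to answer.

Context:
{context}

Question: {question}

Step 1 - Decompose the question into sub-questions.
Step 2 - Answer each sub-question from the context, citing the relevant chunk.
Step 3 - Synthesize a final, concise answer."""
```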
The system was evaluated using the RAGAs framework to compare the baseline RAG against the version enhanced with llm-reasoners.
The results showed a general decline in performance after the enhancements, likely due to "over-reasoning": imposing an explicit reasoning scaffold on an already capable model like gpt-4o-mini interferes with its natural generation abilities.
| Metric | Simple RAG | Enhanced RAG (llm-reasoners) | Change |
|---|---|---|---|
| Faithfulness | 0.821 | 0.536 | -0.285 |
| Answer Relevancy | 0.957 | 0.830 | -0.127 |
| Context Precision | 1.000 | 0.806 | -0.194 |
| Context Recall | 0.819 | 0.769 | -0.050 |
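For reference, a minimal sketch of how such scores can be produced with RAGAs, following the 0.1-style `ragas` interface; the sample record is invented for illustration, and the default judge models require an OPENAI_API_KEY to be set:

```python
# Minimal RAGAs scoring sketch (ragas 0.1-style API); the sample record is illustrative only.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision, context_recall

eval_data = {
    "question": ["How do pollsters weight survey samples?"],
    "answer": ["Pollsters apply post-stratification weights so the sample matches census demographics."],
    "contexts": [["Weighting adjusts each respondent's influence so the sample mirrors the population."]],
    "ground_truth": ["Samples are weighted to match known population demographics."],
}

results = evaluate(
    Dataset.from_dict(eval_data),
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(results)  # dict-like scores, e.g. faithfulness, answer_relevancy, ...
```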
- Python 3.8+
- Git
Clone the repository:
```bash
git clone <your-repo-url>
cd <your-repo-folder>
```
Create a virtual environment and activate it:
```bash
python -m venv venv
# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate
```
Install the required dependencies:
```bash
pip install -r requirements.txt
```
Key packages include: streamlit, langchain, ragas, faiss-cpu, sentence-transformers, openai, beautifulsoup4, pypdf, python-dotenv.
Set up environment variables:
Create a file named .env in the root directory of the project and add your OpenAI API key:
```
OPENAI_API_KEY="sk-..."
```
Build the Knowledge Base
Run the ingestion script to process URLs and PDFs, create embeddings, and save the FAISS index.
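Based on the file descriptions in the project structure below, rag_run.py is the ingestion entry point, so the command is presumably:

```bash
python rag_run.py
```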
Launch the Web Application
```bash
streamlit run streamlit_base.py
```
Run Evaluations
To reproduce evaluation results, run the scripts generating RAGAs scores for both pipelines.
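The exact invocations are not documented here, but the script names in the project structure suggest a sequence like the following (data creation first, then scoring):

```bash
# Simple RAG pipeline
python rag_base_ragas_data_creation.py
python rag_base_ragas_results.py

# Enhanced (llm-reasoners) pipeline
python enhanced_ragas_data_creation.py
python enhanced_ragas_results.py
```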
```
.
├── src/
│   ├── datapdfs/                        # Local PDF documents
│   └── test_questions_lp.txt            # Test questions for evaluation
├── config.py                            # Central configuration
├── rag_base.py                          # Simple RAG logic
├── rag_run.py                           # Ingestion and vector store creation
├── enhanced_rag_base.py                 # Reasoning-enhanced RAG logic
├── enhanced_rag_run.py                  # Example script for enhanced RAG
├── streamlit_base.py                    # Streamlit front-end
├── rag_base_ragas_data_creation.py      # Evaluation data prep (simple RAG)
├── rag_base_ragas_results.py            # Run RAGAs on simple RAG
├── enhanced_ragas_data_creation.py      # Evaluation data prep (enhanced RAG)
└── enhanced_ragas_results.py            # Run RAGAs on enhanced RAG
```
- Core Framework: LangChain
- Reasoning Framework: llm-reasoners
- Web UI: Streamlit
- Vector Database: FAISS
- Embeddings: Hugging Face Sentence-Transformers
- LLMs: OpenAI
- Evaluation: RAGAs