likhith-bavisetti · 2026
RAG · LLM · Python · FastAPI · pgvector · LangChain · Docker

MediRag

Advanced RAG System for Healthcare Cost Estimation

Backend & AI Systems Engineer

Overview

Healthcare pricing in the U.S. is opaque, inconsistent, and difficult to estimate without domain knowledge. Even publicly available datasets are not easily accessible to everyday users due to their structure and complexity.

MediRag addresses this by transforming natural language queries into grounded, data-backed cost estimates using an advanced Retrieval-Augmented Generation pipeline.

Core question: How might we convert complex healthcare datasets into an intuitive system that allows users to ask simple questions and receive reliable cost estimates?


Dataset

Source: CMS Medicare Physician & Other Practitioners by Geography and Service

Rows: ~250,000
Granularity: State-level
Coding: HCPCS
Program: Medicare Part B

Key Fields

Tradeoffs


Problem

A typical user cannot query "How much does a dermatology procedure cost in California?" without understanding HCPCS codes, navigating tabular data, or writing structured queries. Data exists — but access is effectively blocked.


Initial Approach — ChromaDB + Basic RAG

The first iteration used ChromaDB for vector storage and a basic semantic retrieval pipeline.

What Worked

Limitations

Key Insight: Pure semantic search retrieves "similar text" but ignores hard constraints like geography. Healthcare queries require semantic understanding for what and structured filtering for where.


Final Architecture

User Query
  ↓
FastAPI
  ↓
Query Parsing (GPT-4o-mini)
  ↓
Structured Filters (state, condition)
  ↓
Embedding (bge-large-en-v1.5)
  ↓
PostgreSQL + pgvector (IVFFlat)
  ↓
Top-K Retrieval
  ↓
Context Assembly
  ↓
Response Generation (GPT-5)

1. NLP Guardrails

Natural language input is parsed by GPT-4o-mini to extract the state and medical context before retrieval. This prevents invalid queries, aligns user intent with the database schema, and reduces hallucination risk. Tradeoff: additional latency from the extra model call.
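
The guardrail step can be sketched as below, assuming the official `openai` Python SDK; the prompt text, JSON field names, and the `KNOWN_STATES` subset are illustrative, not the project's actual values.

```python
import json

# Illustrative extraction prompt; the real system prompt may differ.
PARSE_PROMPT = (
    'Extract the U.S. state and the medical condition or procedure from the '
    'user question. Respond with JSON: {"state": ..., "condition": ...}. '
    "Use null when a field is absent."
)

# Subset of valid states for illustration; a real deployment lists all 50.
KNOWN_STATES = {"California", "Florida", "Texas", "New York"}

def validate_filters(raw: dict) -> dict:
    """Reject hallucinated or malformed filters before they reach the database."""
    state = (raw.get("state") or "").strip().title()
    condition = (raw.get("condition") or "").strip().lower()
    return {
        "state": state if state in KNOWN_STATES else None,
        "condition": condition or None,
    }

def parse_query(client, user_query: str) -> dict:
    """Call GPT-4o-mini for structured extraction, then validate the output."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": PARSE_PROMPT},
            {"role": "user", "content": user_query},
        ],
    )
    return validate_filters(json.loads(resp.choices[0].message.content))
```

The validation step after the model call is what makes this a guardrail rather than plain extraction: an out-of-vocabulary state is dropped instead of being passed downstream as a filter.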

2. PostgreSQL + pgvector

Migrated from ChromaDB to PostgreSQL with pgvector — a unified system for both structured data and vector embeddings, eliminating a separate vector database and adding production reliability.
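
A minimal schema sketch; table and column names are assumptions, while the 1024-dimension vector column matches the output size of bge-large-en-v1.5.

```sql
-- Illustrative pgvector schema; names are assumptions, not the project's DDL.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE medicare_services (
    id          BIGSERIAL PRIMARY KEY,
    state       TEXT NOT NULL,
    hcpcs_code  TEXT NOT NULL,
    description TEXT NOT NULL,
    avg_payment NUMERIC(10, 2),
    embedding   VECTOR(1024)  -- bge-large-en-v1.5 emits 1024-dim embeddings
);
```

Keeping structured columns and embeddings in one table is what enables the hybrid filtering described below the architecture diagram.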

3. IVFFlat Indexing

Approximate nearest-neighbor search for fast retrieval at scale. Requires tuning list count and probe count. Slight recall tradeoff is acceptable given the performance gain on 250k rows.
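
A sketch of the index setup; the list and probe values here are tuning assumptions (pgvector's guidance suggests lists ≈ rows / 1000 for datasets under one million rows, i.e. ~250 for this one).

```sql
-- Build the approximate index over cosine distance.
CREATE INDEX ON medicare_services
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 250);

-- Per-session knob: more probes raises recall at the cost of latency.
SET ivfflat.probes = 10;
```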

4. Hybrid Retrieval Strategy

Combines exact structured filtering (a SQL WHERE clause on state) with semantic vector similarity over service descriptions: geographic constraints are enforced as hard filters, while the medical "what" is matched by embedding similarity.

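The hybrid query can be sketched as below, assuming the `psycopg` driver and a hypothetical `medicare_services` table: the state is an exact SQL filter (the "where"), while cosine distance over embeddings ranks the "what".

```python
# Hybrid retrieval sketch; table and column names are illustrative assumptions.
HYBRID_SQL = """
    SELECT hcpcs_code, description, avg_payment
    FROM medicare_services
    WHERE state = %s                 -- hard structured filter: the "where"
    ORDER BY embedding <=> %s        -- cosine-distance ranking: the "what"
    LIMIT %s
"""

def to_pgvector_literal(vec: list[float]) -> str:
    """Format a Python list as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(f"{x:g}" for x in vec) + "]"

def retrieve(conn, state: str, query_embedding: list[float], k: int = 5):
    """Run the hybrid query: exact geography plus approximate semantics."""
    with conn.cursor() as cur:
        cur.execute(HYBRID_SQL, (state, to_pgvector_literal(query_embedding), k))
        return cur.fetchall()
```

Because the state predicate runs before ranking, a dermatology query for California can never return a better-matching row from Texas.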
5. LangChain Orchestration

Manages prompt templates, retrieval pipelines, and multi-step execution. Reduces boilerplate and standardizes RAG workflows across iterations.

6. Response Generation

GPT-5 synthesizes the retrieved context into human-readable cost estimates with data-grounded reasoning.
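
The context-assembly step that feeds generation can be sketched as a pure formatting function; the row field names are assumptions matching nothing more than the dataset's described contents.

```python
def assemble_context(rows: list[dict]) -> str:
    """Flatten retrieved rows into a plain-text block for the generation prompt."""
    return "\n".join(
        f"{r['hcpcs_code']} | {r['description']} | "
        f"avg Medicare payment ${r['avg_payment']:.2f} | {r['state']}"
        for r in rows
    )
```

Keeping this step deterministic means every figure the model sees is traceable back to a retrieved row, which is what makes the final answer auditable.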


API

Built with FastAPI. Core endpoint:

POST /query

// Input
{
  "query": "cost of knee surgery in Florida"
}

// Output
{
  "state":          "Florida",
  "procedure":      "Knee-related service",
  "estimated_cost": "...",
  "explanation":    "..."
}

Design Tradeoffs

Decision               | Benefit                                 | Tradeoff
pgvector over ChromaDB | Production-ready, unified system        | More setup and tuning
IVFFlat indexing       | Fast retrieval at scale                 | Approximate results
Hybrid RAG             | High precision on structured queries    | More complex pipeline
Multi-model setup      | Better parsing + reasoning              | Increased latency & cost
LLM-based parsing      | Flexible natural language understanding | Cost + external dependency

Impact

Future Work