likhith-bavisetti · 2026
RAG · LLM · Python · FastAPI · pgvector · LangChain · Docker

MediRag

Advanced RAG System for Healthcare Cost Estimation

Backend & AI Systems Engineer

Overview

Healthcare pricing in the U.S. is opaque, inconsistent, and difficult to estimate without domain knowledge. Even publicly available datasets are not easily accessible to everyday users due to their structure and complexity.

MediRag addresses this by transforming natural language queries into grounded, data-backed cost estimates using an advanced Retrieval-Augmented Generation pipeline.

Core question: How might we convert complex healthcare datasets into an intuitive system that allows users to ask simple questions and receive reliable cost estimates?


Dataset

Source: CMS Medicare Physician & Other Practitioners by Geography and Service

Rows: ~250,000
Granularity: State-level
Coding: HCPCS
Program: Medicare Part B

Key Fields

Tradeoffs


Problem

A typical user cannot query "How much does a dermatology procedure cost in California?" without understanding HCPCS codes, navigating tabular data, or writing structured queries. Data exists — but access is effectively blocked.


Initial Approach — ChromaDB + Basic RAG

The first iteration used ChromaDB for vector storage and a basic semantic retrieval pipeline.

What Worked

Limitations

Key Insight: Pure semantic search retrieves "similar text" but ignores hard constraints like geography. Healthcare queries require semantic understanding for what and structured filtering for where.


Final Architecture

User Query
  ↓
FastAPI
  ↓
Query Parsing (GPT-4o-mini)
  ↓
Structured Filters (state, condition)
  ↓
Embedding (bge-large-en-v1.5)
  ↓
PostgreSQL + pgvector (IVFFlat)
  ↓
Top-K Retrieval
  ↓
Context Assembly
  ↓
Response Generation (GPT-5)

1. NLP Guardrails

Natural language input is parsed by GPT-4o-mini to extract the state and medical context before retrieval. This prevents invalid queries, aligns user intent with the database schema, and reduces hallucination risk. Tradeoff: additional latency from the extra model call.
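
The guardrail step can be sketched as below, assuming the official `openai` Python SDK; the prompt text, JSON field names, and the `KNOWN_STATES` subset are illustrative, not the project's actual values.

```python
import json

# Illustrative extraction prompt; the real system prompt may differ.
PARSE_PROMPT = (
    'Extract the U.S. state and the medical condition or procedure from the '
    'user question. Respond with JSON: {"state": ..., "condition": ...}. '
    "Use null when a field is absent."
)

# Subset of valid states for illustration; a real deployment lists all 50.
KNOWN_STATES = {"California", "Florida", "Texas", "New York"}

def validate_filters(raw: dict) -> dict:
    """Reject hallucinated or malformed filters before they reach the database."""
    state = (raw.get("state") or "").strip().title()
    condition = (raw.get("condition") or "").strip().lower()
    return {
        "state": state if state in KNOWN_STATES else None,
        "condition": condition or None,
    }

def parse_query(client, user_query: str) -> dict:
    """Call GPT-4o-mini for structured extraction, then validate the output."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": PARSE_PROMPT},
            {"role": "user", "content": user_query},
        ],
    )
    return validate_filters(json.loads(resp.choices[0].message.content))
```

The validation step after the model call is what makes this a guardrail rather than plain extraction: an out-of-vocabulary state is dropped instead of being passed downstream as a filter.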

2. PostgreSQL + pgvector

Migrated from ChromaDB to PostgreSQL with pgvector — a unified system for both structured data and vector embeddings, eliminating a separate vector database and adding production reliability.
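
A minimal schema sketch; table and column names are assumptions, while the 1024-dimension vector column matches the output size of bge-large-en-v1.5.

```sql
-- Illustrative pgvector schema; names are assumptions, not the project's DDL.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE medicare_services (
    id          BIGSERIAL PRIMARY KEY,
    state       TEXT NOT NULL,
    hcpcs_code  TEXT NOT NULL,
    description TEXT NOT NULL,
    avg_payment NUMERIC(10, 2),
    embedding   VECTOR(1024)  -- bge-large-en-v1.5 emits 1024-dim embeddings
);
```

Keeping structured columns and embeddings in one table is what enables the hybrid filtering described below the architecture diagram.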

3. IVFFlat Indexing

Approximate nearest-neighbor search for fast retrieval at scale. Requires tuning list count and probe count. Slight recall tradeoff is acceptable given the performance gain on 250k rows.
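
A sketch of the index setup; the list and probe values here are tuning assumptions (pgvector's guidance suggests lists ≈ rows / 1000 for datasets under one million rows, i.e. ~250 for this one).

```sql
-- Build the approximate index over cosine distance.
CREATE INDEX ON medicare_services
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 250);

-- Per-session knob: more probes raises recall at the cost of latency.
SET ivfflat.probes = 10;
```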

4. Hybrid Retrieval Strategy

Combines exact structured filtering (a SQL WHERE clause on state) with semantic vector similarity over service descriptions: geographic constraints are enforced as hard filters, while the medical "what" is matched by embedding similarity.

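The hybrid query can be sketched as below, assuming the `psycopg` driver and a hypothetical `medicare_services` table: the state is an exact SQL filter (the "where"), while cosine distance over embeddings ranks the "what".

```python
# Hybrid retrieval sketch; table and column names are illustrative assumptions.
HYBRID_SQL = """
    SELECT hcpcs_code, description, avg_payment
    FROM medicare_services
    WHERE state = %s                 -- hard structured filter: the "where"
    ORDER BY embedding <=> %s        -- cosine-distance ranking: the "what"
    LIMIT %s
"""

def to_pgvector_literal(vec: list[float]) -> str:
    """Format a Python list as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(f"{x:g}" for x in vec) + "]"

def retrieve(conn, state: str, query_embedding: list[float], k: int = 5):
    """Run the hybrid query: exact geography plus approximate semantics."""
    with conn.cursor() as cur:
        cur.execute(HYBRID_SQL, (state, to_pgvector_literal(query_embedding), k))
        return cur.fetchall()
```

Because the state predicate runs before ranking, a dermatology query for California can never return a better-matching row from Texas.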
5. LangChain Orchestration

Manages prompt templates, retrieval pipelines, and multi-step execution. Reduces boilerplate and standardizes RAG workflows across iterations.

6. Response Generation

GPT-5 synthesizes the retrieved context into human-readable cost estimates with data-grounded reasoning.
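
The context-assembly step that feeds generation can be sketched as a pure formatting function; the row field names are assumptions matching nothing more than the dataset's described contents.

```python
def assemble_context(rows: list[dict]) -> str:
    """Flatten retrieved rows into a plain-text block for the generation prompt."""
    return "\n".join(
        f"{r['hcpcs_code']} | {r['description']} | "
        f"avg Medicare payment ${r['avg_payment']:.2f} | {r['state']}"
        for r in rows
    )
```

Keeping this step deterministic means every figure the model sees is traceable back to a retrieved row, which is what makes the final answer auditable.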


API

Built with FastAPI. Core endpoint:

POST /query

// Input
{
  "query": "cost of knee surgery in Florida"
}

// Output
{
  "state":          "Florida",
  "procedure":      "Knee-related service",
  "estimated_cost": "...",
  "explanation":    "..."
}

Design Tradeoffs

Decision               | Benefit                                 | Tradeoff
pgvector over ChromaDB | Production-ready, unified system        | More setup and tuning
IVFFlat indexing       | Fast retrieval at scale                 | Approximate results
Hybrid RAG             | High precision on structured queries    | More complex pipeline
Multi-model setup      | Better parsing + reasoning              | Increased latency & cost
LLM-based parsing      | Flexible natural language understanding | Cost + external dependency

Impact

Future Work