Advanced RAG System for Healthcare Cost Estimation
Healthcare pricing in the U.S. is opaque, inconsistent, and difficult to estimate without domain knowledge. Even publicly available datasets are not easily accessible to everyday users due to their structure and complexity.
MediRag addresses this by turning natural-language queries into grounded, data-backed cost estimates through an advanced Retrieval-Augmented Generation (RAG) pipeline.
Core question: How might we convert complex healthcare datasets into an intuitive system that allows users to ask simple questions and receive reliable cost estimates?
Data source: CMS Medicare Physician & Other Practitioners by Geography and Service dataset.
A typical user cannot answer "How much does a dermatology procedure cost in California?" without understanding HCPCS (Healthcare Common Procedure Coding System) codes, navigating tabular data, or writing structured queries. The data exists, but access is effectively blocked.
The first iteration used ChromaDB for vector storage and a basic semantic retrieval pipeline.
Key Insight: Pure semantic search retrieves "similar text" but ignores hard constraints such as geography. Healthcare queries need semantic understanding to determine what is being asked (the procedure) and structured filtering to enforce where (the state).
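To make the insight concrete, here is a toy sketch comparing pure semantic ranking with a hybrid "filter, then rank" approach. The rows, the hand-made 2-d vectors, and the function names are all hypothetical; real embeddings come from a model and have hundreds of dimensions. In this toy the state has no influence on the vectors at all, which exaggerates the effect, but it shows the core problem: a similarity score alone never enforces a geographic constraint.

```python
import math

# Toy rows with hand-made 2-d "embeddings" (axis 0 ~ dermatology, axis 1 ~ orthopedics).
# These values are illustrative only; real embeddings come from an embedding model.
ROWS = [
    {"state": "California", "desc": "Skin lesion removal", "vec": [0.90, 0.10]},
    {"state": "Texas", "desc": "Skin biopsy", "vec": [0.95, 0.05]},
    {"state": "California", "desc": "Knee arthroscopy", "vec": [0.10, 0.99]},
    {"state": "New York", "desc": "Skin lesion destruction", "vec": [0.92, 0.08]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vec, k):
    """Pure semantic retrieval: rank every row by similarity, ignoring geography."""
    return sorted(ROWS, key=lambda r: cosine(query_vec, r["vec"]), reverse=True)[:k]


def hybrid_search(query_vec, state, k):
    """Hybrid retrieval: enforce the state as a hard filter, then rank by similarity."""
    candidates = [r for r in ROWS if r["state"] == state]
    return sorted(candidates, key=lambda r: cosine(query_vec, r["vec"]), reverse=True)[:k]


# Stand-in embedding for "How much does a dermatology procedure cost in California?"
query = [0.93, 0.07]
print([r["state"] for r in semantic_search(query, k=2)])  # ['New York', 'Texas']
print([r["desc"] for r in hybrid_search(query, "California", k=1)])  # ['Skin lesion removal']
```

The semantic-only top results are close textual matches from the wrong states; the hybrid version can only ever return California rows.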
Before retrieval, GPT-4o-mini parses the natural-language input to extract the state and the medical context. This step rejects invalid queries, aligns user intent with the database schema, and reduces hallucination risk. Tradeoff: the extra model call adds latency.
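A parser model's output still needs checking before it reaches retrieval. The sketch below assumes, hypothetically, that the parsing prompt asks GPT-4o-mini to reply with a JSON object shaped like `{"state": ..., "medical_context": ...}`; the model call itself is not shown. `ParsedQuery`, `InvalidQuery`, `validate_parse`, and the abbreviated state table are illustrative names, not the project's actual code.

```python
import json
from dataclasses import dataclass

# Abbreviated lookup for illustration; a real table would cover every state
# and territory that appears in the dataset.
STATE_NAMES = {
    "ca": "California", "california": "California",
    "fl": "Florida", "florida": "Florida",
    "ny": "New York", "new york": "New York",
    "tx": "Texas", "texas": "Texas",
}


@dataclass(frozen=True)
class ParsedQuery:
    state: str
    medical_context: str


class InvalidQuery(ValueError):
    """Raised when the parser output cannot be mapped onto the schema."""


def validate_parse(raw: str) -> ParsedQuery:
    """Validate the parser model's JSON reply (hypothetical schema) before retrieval."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvalidQuery(f"parser returned non-JSON output: {exc}") from exc
    if not isinstance(data, dict):
        raise InvalidQuery("parser output must be a JSON object")

    # Normalize abbreviations and casing to the canonical state name used in the data.
    state = STATE_NAMES.get(str(data.get("state", "")).strip().lower())
    if state is None:
        raise InvalidQuery(f"unrecognized or missing state: {data.get('state')!r}")

    context = str(data.get("medical_context", "")).strip()
    if not context:
        raise InvalidQuery("no medical context found in the query")
    return ParsedQuery(state=state, medical_context=context)


print(validate_parse('{"state": "FL", "medical_context": "knee surgery"}'))
# ParsedQuery(state='Florida', medical_context='knee surgery')
```

Rejecting unknown states here, rather than letting them flow into SQL, is what keeps an unanswerable query from producing a confident but ungrounded answer.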
The project then migrated from ChromaDB to PostgreSQL with the pgvector extension. A single database now stores both the structured data and the vector embeddings, which removes the need for a separate vector store and adds PostgreSQL's production reliability.
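With structured columns and embeddings in one table, a hybrid lookup becomes a single SQL statement: `WHERE` enforces the state, and `ORDER BY` ranks by pgvector's `<=>` (cosine distance) operator. This is a sketch only; the `procedures` table, its columns, the 1536-dimension embedding size, and the helper names are assumptions. The parameters are ordered for a driver such as psycopg, e.g. `cursor.execute(HYBRID_SQL, params)`.

```python
# Hypothetical table and column names; EMBEDDING_DIM must match the embedding
# model actually used (1536 is assumed here for illustration).
EMBEDDING_DIM = 1536

SCHEMA_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS procedures (
    id          bigserial PRIMARY KEY,
    state       text NOT NULL,
    hcpcs_code  text NOT NULL,
    description text NOT NULL,
    avg_allowed numeric,
    embedding   vector({EMBEDDING_DIM})
);
"""

# One query does both jobs: WHERE applies the hard geographic constraint,
# ORDER BY ranks the remaining rows by cosine distance (smaller = more similar).
HYBRID_SQL = """
SELECT hcpcs_code, description, avg_allowed
FROM procedures
WHERE state = %s
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def to_vector_literal(vec):
    """Format a Python list as pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def hybrid_query_params(state, query_embedding, k=5):
    """Parameters for HYBRID_SQL in placeholder order (psycopg-style %s)."""
    return (state, to_vector_literal(query_embedding), k)


print(hybrid_query_params("Florida", [0.1, 0.25, -0.5], k=3))
# ('Florida', '[0.1,0.25,-0.5]', 3)
```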
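IVFFlat indexing provides approximate nearest-neighbor search for fast retrieval at scale. It requires tuning the list count (the number of clusters built at index time) and the probe count (the number of clusters searched per query). The slight loss in recall is acceptable given the performance gain on 250k rows.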
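As a starting point for the list and probe counts, the pgvector README suggests `lists = rows / 1000` for up to 1M rows (`sqrt(rows)` beyond that) and `probes = sqrt(lists)`. The sketch below applies that rule of thumb to the 250k-row dataset; the table and column names are hypothetical, and the values are a baseline to tune against measured recall, not the project's actual settings.

```python
import math


def ivfflat_settings(row_count):
    """Starting-point IVFFlat parameters per pgvector's README guidance."""
    if row_count <= 1_000_000:
        lists = max(1, row_count // 1000)
    else:
        lists = round(math.sqrt(row_count))
    probes = max(1, round(math.sqrt(lists)))
    return lists, probes


def ivfflat_sql(row_count, table="procedures", column="embedding"):
    """Index DDL plus the per-session probe setting (table/column names are hypothetical)."""
    lists, probes = ivfflat_settings(row_count)
    # vector_cosine_ops matches the <=> cosine-distance operator used at query time.
    create = (
        f"CREATE INDEX ON {table} USING ivfflat ({column} vector_cosine_ops) "
        f"WITH (lists = {lists});"
    )
    return create, f"SET ivfflat.probes = {probes};"


for stmt in ivfflat_sql(250_000):
    print(stmt)
# CREATE INDEX ON procedures USING ivfflat (embedding vector_cosine_ops) WITH (lists = 250);
# SET ivfflat.probes = 16;
```

Because IVFFlat derives its clusters from the rows already in the table, the index is best created after the data is loaded. Raising `ivfflat.probes` improves recall at the cost of speed, which is the tradeoff described above.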
An orchestration layer manages prompt templates, retrieval pipelines, and multi-step execution. It reduces boilerplate and standardizes the RAG workflow across iterations.
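The orchestration idea reduces to a reusable prompt template plus an ordered list of steps that share state. This deliberately minimal, framework-free sketch is not the project's orchestration library; `run_pipeline`, the step names, and the stubbed outputs are all hypothetical stand-ins for the parser call, the pgvector query, and prompt assembly.

```python
from string import Template
from typing import Callable

# A reusable prompt template; $-placeholders are filled at run time.
ANSWER_PROMPT = Template(
    "Using only the data below, estimate the cost of $procedure in $state.\n\n$context"
)

Step = Callable[[dict], dict]


def run_pipeline(steps: list[Step], ctx: dict) -> dict:
    """Run each step in order; every step reads the shared context and returns updates."""
    for step in steps:
        ctx = {**ctx, **step(ctx)}
    return ctx


def parse(ctx):
    # Hypothetical stub standing in for the parser-model call.
    return {"state": "Florida", "procedure": "knee surgery"}


def retrieve(ctx):
    # Hypothetical stub standing in for the pgvector hybrid query.
    return {"context": f"(rows for {ctx['procedure']} in {ctx['state']})"}


def build_prompt(ctx):
    prompt = ANSWER_PROMPT.substitute(
        procedure=ctx["procedure"], state=ctx["state"], context=ctx["context"]
    )
    return {"prompt": prompt}


result = run_pipeline([parse, retrieve, build_prompt], {"query": "cost of knee surgery in Florida"})
print(result["prompt"])
```

Keeping each step as a small function over a shared context is what lets an iteration swap one stage (for example, ChromaDB retrieval for pgvector) without touching the others.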
GPT-5 combines the retrieved context into human-readable cost estimates, with reasoning grounded in the retrieved data.
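One common way to keep the generation step data-grounded is to give the model numbered rows and require it to cite them. In this sketch, the `RetrievedRow` fields and the prompt wording are assumptions, and the sample row uses placeholder values rather than real CMS figures. The resulting string would then be sent to the answering model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedRow:
    # Hypothetical fields; map them to whichever CMS columns are actually stored.
    hcpcs_code: str
    description: str
    state: str
    avg_allowed: float  # average Medicare allowed amount, USD


def build_grounded_prompt(question: str, rows: list[RetrievedRow]) -> str:
    """Number each retrieved row so the model can cite [1], [2], ... for every figure."""
    if not rows:
        # With nothing retrieved there is nothing to ground on; refuse instead of guessing.
        raise ValueError("no rows retrieved")
    context = "\n".join(
        f"[{i}] HCPCS {r.hcpcs_code} ({r.description}), {r.state}: "
        f"avg allowed ${r.avg_allowed:,.2f}"
        for i, r in enumerate(rows, start=1)
    )
    return (
        "Answer using only the rows below. Cite the row number for every figure you use, "
        "and say so if the rows do not cover the question.\n\n"
        f"Rows:\n{context}\n\nQuestion: {question}"
    )


# Placeholder values for illustration, not real CMS figures.
rows = [RetrievedRow("XXXXX", "example knee-related service", "Florida", 1234.5)]
print(build_grounded_prompt("cost of knee surgery in Florida", rows))
```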
The API is built with FastAPI. Core endpoint:
POST /query
Input: { "query": "cost of knee surgery in Florida" }
Output: { "state": "Florida", "procedure": "Knee-related service", "estimated_cost": "...", "explanation": "..." }
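The request/response contract of `POST /query` can be sketched as two typed records. In a FastAPI app these would normally be pydantic models on an `@app.post("/query")` route. Plain dataclasses are used here so the sketch runs without dependencies. `handle_query` and the `answer` callable are hypothetical stand-ins for the route handler and the parse, retrieve, and generate pipeline.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass
class QueryRequest:
    query: str


@dataclass
class QueryResponse:
    state: str
    procedure: str
    estimated_cost: str
    explanation: str


def handle_query(body: str, answer: Callable[[QueryRequest], QueryResponse]) -> str:
    """Decode a POST /query body, run the pipeline, and encode the JSON reply."""
    payload = json.loads(body)
    request = QueryRequest(query=str(payload.get("query", "")).strip())
    if not request.query:
        raise ValueError("'query' must be a non-empty string")
    return json.dumps(asdict(answer(request)))


def stub_pipeline(req: QueryRequest) -> QueryResponse:
    # Echoes the example response shape; "..." fields stay placeholders.
    return QueryResponse("Florida", "Knee-related service", "...", "...")


print(handle_query('{"query": "cost of knee surgery in Florida"}', stub_pipeline))
# {"state": "Florida", "procedure": "Knee-related service", "estimated_cost": "...", "explanation": "..."}
```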
| Decision | Benefit | Tradeoff |
|---|---|---|
| pgvector over ChromaDB | Production-ready, unified system | More setup and tuning |
| IVFFlat indexing | Fast retrieval at scale | Approximate results |
| Hybrid RAG | High precision on structured queries | More complex pipeline |
| Multi-model setup | Better parsing + reasoning | Increased latency & cost |
| LLM-based parsing | Flexible natural language understanding | Cost + external dependency |