Case study · Healthcare
A 240-clinic operator was drowning in unstructured clinical knowledge — guidelines, formularies, internal protocols, and peer-reviewed literature spread across systems clinicians couldn't search effectively. We built a HIPAA-aligned RAG system that surfaces grounded answers at the point of care, with citation tracking on every response and the audit trail compliance teams actually rely on.
The challenge
The operator's clinical knowledge lived in fifteen different systems: an EHR knowledge base, a formulary tool, a guidelines repository, internal SharePoint, regulatory filings, peer-reviewed PDFs, and several legacy clinical-content platforms. Clinicians couldn't search across them, so they didn't — they made decisions from memory or asked colleagues, with predictable variation.
Three constraints made this a hard project: (1) HIPAA scope covered the entire data substrate, with ePHI flowing through retrieval; (2) clinical safety required that "I don't know" be an acceptable answer, because confident-but-wrong was dangerous; (3) audit defensibility meant every interaction had to be logged with input, output, retrieved citations, model version, and user identity, and retained per the operator's record schedule.
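The audit requirement reduces to a fixed record shape per interaction. A minimal sketch of what that record might look like, assuming JSON-lines append-only storage (field names here are illustrative, not the production schema):

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class AuditRecord:
    """One logged interaction: everything needed to reconstruct it later."""
    user_id: str          # authenticated clinician identity
    query: str            # input exactly as received
    response: str         # output exactly as returned
    citations: list       # source-document IDs retrieved for this answer
    model_version: str    # exact model identifier in use at the time
    logged_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def to_log_line(record: AuditRecord) -> str:
    """Serialize one record as a single JSON line for append-only storage."""
    return json.dumps(asdict(record), sort_keys=True)
```

Pinning the model version per record matters for defensibility: an auditor can ask what model produced a given answer months later, after the endpoint has been upgraded.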
Our approach
We started with a privacy threat model — mapping every PHI element through the system, defining minimum-necessary at the prompt boundary, and producing a data-flow diagram that became the spine of the architecture. Model selection came after architecture, not before.
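One way to enforce minimum-necessary at the prompt boundary is a redaction pass before any text leaves the covered path. A sketch under that assumption; the patterns are illustrative placeholders, since a real filter would be driven by the threat model's PHI inventory, not a hand-written list:

```python
import re

# Illustrative identifier-shaped patterns only; not a complete PHI inventory.
PHI_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),         # SSN-shaped digits
    (re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.I), "[MRN]"),   # medical record number
    (re.compile(r"\b\d{2}/\d{2}/\d{4}\b"), "[DATE]"),        # dates of birth etc.
]

def redact(text: str) -> str:
    """Replace identifier-shaped spans before text crosses the prompt boundary."""
    for pattern, token in PHI_PATTERNS:
        text = pattern.sub(token, text)
    return text
```

The redaction step lives at the boundary the data-flow diagram defines, which is why the threat model had to come before model selection.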
We deployed on AWS using HIPAA-eligible services exclusively, all under BAA: Bedrock for the LLM endpoints (Claude under Anthropic's BAA-covered offering), pgvector on Aurora for the vector store, and S3 with customer-managed keys for the source documents. ePHI never crossed into a non-covered path; the architecture enforced that by construction.
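With pgvector, nearest-neighbor retrieval is plain SQL using the `<=>` cosine-distance operator. A sketch of how such a query might be built; the table and column names, and the `tenant_id` scoping, are assumptions for illustration, not the deployed schema:

```python
def build_similarity_query(table: str, k: int) -> str:
    """Parameterized top-k search using pgvector's cosine-distance
    operator (<=>); the query embedding is bound at execution time."""
    return (
        f"SELECT doc_id, chunk_text, embedding <=> %(query_vec)s AS distance "
        f"FROM {table} "
        f"WHERE tenant_id = %(tenant_id)s "  # row-level scoping per operator
        f"ORDER BY distance "
        f"LIMIT {int(k)}"
    )
```

Keeping retrieval inside Aurora means the vector store inherits the same BAA coverage, encryption, and access controls as the rest of the database tier.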
Board-certified clinicians curated ground-truth datasets across the operator's primary specialties. We scored faithfulness, refusal correctness, and equity-aware subgroup performance, and every prompt change ran against the eval set in CI before it could ship to production.
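A CI gate like that reduces to aggregate scores against thresholds. A minimal sketch of the shape of such a gate; the threshold values and result format are illustrative, not the operator's actual gates:

```python
def passes_ci_gate(results, min_faithfulness=0.95, min_refusal_accuracy=0.98):
    """Block a prompt change unless aggregate eval scores clear the bar.

    `results` is a list of per-case dicts:
      {"faithful": bool, "should_refuse": bool, "did_refuse": bool}
    """
    if not results:
        return False  # an empty eval run never counts as passing
    faithfulness = sum(r["faithful"] for r in results) / len(results)
    refusal_cases = [r for r in results if r["should_refuse"]]
    refusal_accuracy = (
        sum(r["did_refuse"] for r in refusal_cases) / len(refusal_cases)
        if refusal_cases else 1.0
    )
    return (faithfulness >= min_faithfulness
            and refusal_accuracy >= min_refusal_accuracy)
```

Scoring refusal correctness separately is what makes "I don't know" a first-class outcome: a prompt change that raises fluency but erodes refusals fails the gate.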
The solution
The deployed system uses hybrid retrieval (dense embeddings + sparse keyword + faceted filters) with a re-ranking step against a clinically curated relevance model. Every generated answer cites the source documents it drew from; ungrounded outputs are blocked at generation time. The clinician-facing UI surfaces citations inline with explicit confidence indicators.
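The case study doesn't name how the dense and sparse result lists are combined before re-ranking; reciprocal rank fusion is one common way to do it, sketched here as an illustration rather than the deployed method:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one combined ranking.

    Each document scores sum(1 / (k + rank)) across the lists that
    retrieved it; k=60 is the conventional damping constant.
    """
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Rank-based fusion avoids calibrating dense similarity scores against sparse keyword scores, which live on incompatible scales; the re-ranker then operates on the fused candidate list.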
Results
Eleven months after kickoff, the system handled 18,000 clinician queries per week across 240 clinics with grounded, cited responses. Independent finance review attributed $4.2M in annualized labor savings to reduced time-to-answer; clinical leadership reported a 31% reduction in variation on the protocol-driven decisions the system supports.
In their words
“Most clinical AI demos can't survive a real clinician asking a real question. This one does — and the audit trail makes it defensible at the level our compliance team can stand behind.”
Talk to us
A senior engineer plus the relevant department lead joins the first call. No discovery gauntlet, no junior reps.