Case study · Healthcare
A 240-clinic operator was drowning in unstructured clinical knowledge — guidelines, formularies, internal protocols, and peer-reviewed literature spread across systems clinicians couldn't search effectively. We built a HIPAA-aligned RAG system that surfaces grounded answers at the point of care, with citation tracking on every response and the audit trail compliance teams actually rely on.
The challenge
The operator's clinical knowledge lived in fifteen different systems: an EHR knowledge base, a formulary tool, a guidelines repository, internal SharePoint, regulatory filings, peer-reviewed PDFs, and several legacy clinical-content platforms. Clinicians couldn't search across them, so they didn't — they made decisions from memory or asked colleagues, with predictable variation.
Three constraints made this a hard project: (1) HIPAA scope across the entire data substrate, with ePHI flowing through retrieval; (2) clinical safety meant 'I don't know' had to be an acceptable answer — confident-but-wrong was dangerous; (3) audit-defensibility required every interaction logged with input, output, retrieved citations, model version, and user identity, retained per the operator's record schedule.
Our approach
We started with a privacy threat model — mapping every PHI element through the system, defining minimum-necessary at the prompt boundary, and producing a data-flow diagram that became the spine of the architecture. Model selection came after architecture, not before.
The system was deployed on AWS, using HIPAA-eligible services exclusively, under a BAA: Bedrock for the LLM endpoints (Claude under Anthropic's BAA-covered offering), pgvector on Aurora for the vector store, and S3 with customer-managed keys for the source documents. ePHI never crossed into a non-covered path; we built the architecture to enforce that.
Ground-truth datasets were curated by board-certified clinicians across the operator's primary specialties. Evaluation covered faithfulness scoring, refusal correctness, and equity-aware subgroup evaluation. Every prompt change ran against the eval set in CI before it could ship to production.
The solution
The deployed system uses hybrid retrieval (dense embeddings + sparse keyword + faceted filters) with a re-ranking step against a clinically-curated relevance model. Every generated answer cites the source documents it drew from; ungrounded outputs are blocked at generation time. The clinician-facing UI surfaces citations inline with explicit confidence indicators.
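The gate described above, sketched as an added example (not the deployed system's code): a draft answer passes only if every sentence cites a document retrieved for that query, and each interaction produces an audit record with the fields listed under the challenge. The `[doc:ID]` marker, the function names and the sample data are assumptions.

```python
import json
import re
from datetime import datetime, timezone

CITE = re.compile(r"\[doc:([\w-]+)\]")  # assumed inline citation marker
REFUSAL = "I don't know. No retrieved source supports an answer."

def gate_answer(draft, retrieved_ids):
    """Pass the draft only if every sentence cites a document that was
    actually retrieved for this query; otherwise return the refusal."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", draft.strip()) if s]
    if not sentences:
        return REFUSAL, [], True
    cited = []
    for sentence in sentences:
        ids = CITE.findall(sentence)
        # Block an uncited claim or a citation to something never retrieved.
        if not ids or any(i not in retrieved_ids for i in ids):
            return REFUSAL, [], True
        cited += [i for i in ids if i not in cited]
    return draft, cited, False

def handle(query, draft, retrieved_ids, user_id, model_version, log):
    """Gate the draft and append one JSON audit record per interaction."""
    output, citations, blocked = gate_answer(draft, set(retrieved_ids))
    log.append(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user_id,
        "model_version": model_version,
        "input": query,
        "retrieved": sorted(retrieved_ids),
        "citations": citations,
        "output": output,
        "blocked": blocked,  # blocked drafts are logged too
    }))
    return output

if __name__ == "__main__":
    # Constructed sample data: no real documents, users or model versions.
    log = []
    docs = ["formulary-12", "guideline-7"]
    ok = "Start at 5 mg daily [doc:formulary-12]. Recheck in 2 weeks [doc:guideline-7]."
    bad = "Start at 5 mg daily [doc:formulary-12]. Doubling the dose is safe."
    print(handle("starting dose?", ok, docs, "u-001", "model-v1", log))
    print(handle("starting dose?", bad, docs, "u-001", "model-v1", log))
    print(log[-1])
```

This gate checks that citations exist and point at retrieved documents; whether the cited passage actually supports the sentence is a separate question, which is what the faithfulness scoring in the eval set measures.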
Results
Eleven months after kickoff, the system handled 18,000 clinician queries per week across 240 clinics with grounded, cited responses. Independent finance review attributed $4.2M in annualized labor savings to reduced time-to-answer; clinical leadership reported a 31% reduction in variation on the protocol-driven decisions the system supports.
In their words
“Most clinical AI demos can't survive a real clinician asking a real question. This one does — and the audit trail makes it defensible at the level our compliance team can stand behind.”