Academic Research · Advanced

Clinical Trial Emulator

Emulate published clinical trials against real-world EHR data using a structured pipeline — from protocol parsing through cohort assembly, effect estimation, and discrepancy diagnosis.

15 minutes
By community
#clinical-trials#ehr#healthcare#OMOP#real-world-evidence#epidemiology#causal-inference#research#data-analysis
CLAUDE.md Template

Download this file and place it in your project folder to get started.

# Clinical Trial Emulator

## Role
You help me emulate published randomized clinical trials against real-world electronic health record (EHR) data. You follow a structured pipeline from protocol interpretation through effect estimation, running each phase methodically and documenting decisions at every step.

## Directory Structure
- `protocols/` — Source trial protocols (PDFs, text summaries, extracted criteria)
- `concept-sets/` — OMOP concept set definitions (JSON or CSV mappings)
- `cohorts/` — Assembled cohort files with inclusion/exclusion logs
- `analysis/` — Scripts, model outputs, propensity scores, survival curves
- `literature/` — Comparison priors and literature synthesis
- `reports/` — Final emulation reports with diagnostics
- `logs/` — Decision logs and quality flags for each run

## Emulation Pipeline

### Phase 1: Protocol Parsing
Read the trial protocol and extract:
- **Eligibility criteria** — Inclusion/exclusion mapped to computable phenotypes
- **Treatment definitions** — Drug, dose, timing, comparator
- **Primary endpoint** — Outcome definition, follow-up window, censoring rules
- **Key covariates** — Demographics, comorbidities, concomitant meds
Save structured output to `protocols/{trial-name}-parsed.md`

### Phase 2: Concept Set Construction
Map clinical concepts to standardized vocabulary codes:
- Drugs → RxNorm / ATC codes
- Diagnoses → SNOMED / ICD-10 codes
- Procedures → CPT / SNOMED procedure codes
- Lab values → LOINC codes
Save concept sets to `concept-sets/{trial-name}/` as JSON or CSV with:
| Concept Name | Source Term | Standard Code | Vocabulary | Include Descendants |
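A minimal sketch of writing a concept-set file in this format with the Python stdlib. The rows are illustrative placeholders only — real standard codes must come from an OMOP vocabulary lookup (e.g. the OHDSI ATHENA browser), not from this sketch:

```python
import csv
import io

FIELDNAMES = ["Concept Name", "Source Term", "Standard Code",
              "Vocabulary", "Include Descendants"]

# Illustrative rows; "TODO" marks codes this sketch does not supply.
CONCEPTS = [
    {"Concept Name": "apixaban", "Source Term": "apixaban",
     "Standard Code": "TODO", "Vocabulary": "RxNorm",
     "Include Descendants": "yes"},
    {"Concept Name": "atrial fibrillation", "Source Term": "atrial fibrillation",
     "Standard Code": "TODO", "Vocabulary": "SNOMED",
     "Include Descendants": "yes"},
]

def write_concept_set(fileobj, rows):
    """Write concept rows as CSV with the column layout above."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)

buf = io.StringIO()  # in practice: open("concept-sets/{trial}/drugs.csv", "w")
write_concept_set(buf, CONCEPTS)
csv_text = buf.getvalue()
```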

### Phase 3: Cohort Building & Covariate Extraction
Assemble treatment and control cohorts from the EHR:
- Apply eligibility criteria using concept sets from Phase 2
- Define index date (treatment initiation)
- Extract baseline covariates (demographics, comorbidities, labs, medications)
- Log exclusion counts at each step (CONSORT-style flow)
Save to `cohorts/{trial-name}-cohort.csv` and `cohorts/{trial-name}-flow.md`
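The CONSORT-style flow above can be sketched as an ordered list of criteria applied with running counts. The patient records and criteria here are hypothetical stand-ins for real EHR query results:

```python
# Apply each eligibility criterion in order, recording how many
# patients survive each step (the basis of the exclusion-flow log).

def apply_criteria(patients, criteria):
    """criteria: ordered list of (label, predicate) pairs."""
    flow = [("initial", len(patients))]
    for label, keep in criteria:
        patients = [p for p in patients if keep(p)]
        flow.append((label, len(patients)))
    return patients, flow

# Toy records -- real ones come from EHR queries, never invented.
patients = [{"age": 70, "af": True},
            {"age": 40, "af": True},
            {"age": 75, "af": False}]
criteria = [
    ("age >= 18", lambda p: p["age"] >= 18),
    ("atrial fibrillation dx", lambda p: p["af"]),
]
cohort, flow = apply_criteria(patients, criteria)
```

Each `flow` entry becomes one row of the `{trial-name}-flow.md` log.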

### Phase 4: Confounder Adjustment & Effect Estimation
- Fit propensity score model (logistic regression or gradient boosting)
- Check covariate balance (standardized mean differences < 0.1)
- Apply adjustment method (IPTW, matching, or stratification)
- Fit survival model (Cox PH or Kaplan-Meier)
- Estimate treatment effect on the log-hazard ratio scale
- Report: HR, 95% CI, p-value, and number at risk
Save to `analysis/{trial-name}-results.md`
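A toy sketch of the IPTW weighting and balance check in this phase, in plain Python. Fitting the propensity model itself (scikit-learn) and the survival model (lifelines) are omitted here, and all numbers are made up to exercise the functions:

```python
import math

def iptw_weights(treated, ps):
    """ATE weights: 1/ps for treated patients, 1/(1 - ps) for controls."""
    return [1.0 / p if t else 1.0 / (1.0 - p) for t, p in zip(treated, ps)]

def weighted_smd(x, treated, weights):
    """Weighted standardized mean difference for one covariate."""
    def group(sel):
        total = sum(w for w, s in zip(weights, sel) if s)
        mean = sum(w * v for w, v, s in zip(weights, x, sel) if s) / total
        var = sum(w * (v - mean) ** 2
                  for w, v, s in zip(weights, x, sel) if s) / total
        return mean, var
    m1, v1 = group(treated)
    m0, v0 = group([not t for t in treated])
    return abs(m1 - m0) / math.sqrt((v1 + v0) / 2.0)

# Toy inputs: propensity scores and a single covariate.
treated = [True, True, False, False]
ps = [0.5, 0.5, 0.5, 0.5]
age = [1.0, 2.0, 3.0, 4.0]
weights = iptw_weights(treated, ps)
smd = weighted_smd(age, treated, weights)  # compare against the 0.1 threshold
```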

### Phase 5: Literature Synthesis
- Search for published meta-analyses or RWE studies on the same comparison
- Quantify typical EHR-vs-RCT discrepancy for this drug pair
- Build a comparison prior: expected effect size and variance
Save to `literature/{trial-name}-prior.md`

### Phase 6: Discrepancy Diagnosis & Refinement
Compare emulated result to published trial result:
- If the HR direction matches and the 95% CIs overlap → flag as concordant
- If discrepant → diagnose potential causes:
  - Immortal time bias
  - Incomplete outcome capture
  - Covariate imbalance post-adjustment
  - Differential loss to follow-up
  - Eligibility criteria too loose/strict
- Recommend and apply refinements, then re-run from Phase 3
Save diagnostics to `reports/{trial-name}-diagnostics.md`
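The concordance flag can be sketched as a simple check on effect direction and CI overlap. The published values below match the ARISTOTLE example used elsewhere in this playbook; the emulated values are hypothetical:

```python
def concordant(hr_emulated, ci_emulated, hr_published, ci_published):
    """Concordant = same HR direction (relative to 1.0) and overlapping CIs."""
    same_direction = (hr_emulated < 1.0) == (hr_published < 1.0)
    lo_e, hi_e = ci_emulated
    lo_p, hi_p = ci_published
    cis_overlap = lo_e <= hi_p and lo_p <= hi_e
    return same_direction and cis_overlap

# Hypothetical emulated estimate vs. the published ARISTOTLE result.
flag = concordant(0.85, (0.70, 1.02), 0.79, (0.66, 0.95))
```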

## Multi-Run Strategy
Run each emulation three times independently from the same protocol to quantify analytic variability:
- Each run makes its own covariate selection and modeling decisions
- Compare results across runs to separate signal from analytic noise
- Report the range and median of estimates
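Aggregating the runs reduces to a small summary over the per-run hazard ratios; the values below are invented for illustration:

```python
import statistics

def summarize_runs(hrs):
    """Range and median of hazard ratios from independent emulation runs."""
    return {"median": statistics.median(hrs),
            "min": min(hrs),
            "max": max(hrs),
            "range": max(hrs) - min(hrs)}

summary = summarize_runs([0.81, 0.77, 0.88])  # hypothetical per-run HRs
```

A wide range relative to the published CI suggests the estimate is driven by analytic choices rather than the data.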

## Quality Checklist
| Check | Threshold | Phase |
|-------|-----------|-------|
| Cohort size vs. original trial | Within 2x | 3 |
| Covariate balance (max SMD) | < 0.1 | 4 |
| Positivity (min PS range) | 0.05–0.95 | 4 |
| Proportional hazards test | p > 0.05 | 4 |
| Event rate | > 5% in each arm | 4 |
| Effect direction match | Same as RCT | 6 |

## Rules
1. Document every analytic decision in `logs/{trial-name}-decisions.md`
2. Never fabricate data — all results must come from actual EHR queries
3. Flag assumptions explicitly (e.g., "assuming prescription fill = drug initiation")
4. When eligibility criteria can't be mapped to EHR fields, document what was approximated
5. Always report both adjusted and unadjusted estimates
6. Cite the original trial when comparing results

## Commands
- "/parse [protocol file]" — Run Phase 1: extract structured trial criteria
- "/concepts [trial]" — Run Phase 2: build OMOP concept sets
- "/cohort [trial]" — Run Phase 3: assemble cohorts and extract covariates
- "/estimate [trial]" — Run Phase 4: fit models and estimate treatment effect
- "/literature [trial]" — Run Phase 5: synthesize comparison priors
- "/diagnose [trial]" — Run Phase 6: compare results and diagnose discrepancies
- "/run-all [trial]" — Execute full pipeline (Phases 1–6)
- "/multi-run [trial]" — Run full pipeline 3 times and compare results
- "/status" — Show progress across all phases and trials
- "/quality [trial]" — Run quality checklist against current results
README.md

What This Does

Turns the labor-intensive process of target trial emulation into a repeatable, multi-phase pipeline. You feed in a published trial protocol and point Claude at your EHR database, and it walks through protocol parsing, concept mapping, cohort assembly, propensity-score adjustment, survival modeling, and result comparison — flagging quality issues and refining along the way.

Inspired by work at Mount Sinai using autonomous agents to scale trial emulation across multiple anticoagulation studies against OMOP-mapped EHR data.

Prerequisites

  • Claude Code installed
  • Access to an EHR database (ideally OMOP CDM-mapped)
  • Published trial protocol(s) to emulate (PDF or structured text)
  • Python environment with pandas, lifelines, scikit-learn, and a database connector (e.g., sqlalchemy, pyodbc)
  • Basic familiarity with observational study design

The CLAUDE.md Template

(See the full template at the top of this page.)

Step-by-Step Setup

Step 1: Create the project structure

mkdir -p ~/trial-emulation/{protocols,concept-sets,cohorts,analysis,literature,reports,logs}
cd ~/trial-emulation

Step 2: Add your trial protocol

Place the published trial protocol PDF or extracted text into protocols/. For example, a summary of the ARISTOTLE trial comparing apixaban to warfarin for atrial fibrillation.

Step 3: Configure database access

Create a config.env file (add to .gitignore) with your EHR database connection:

DB_HOST=your-ehr-host
DB_NAME=your-omop-database
DB_SCHEMA=cdm
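A minimal sketch of parsing these KEY=VALUE lines with the Python stdlib. In practice you would read `config.env` from disk (e.g. `Path("config.env").read_text()`) and hand the values to your database connector; the sample text here just mirrors the file above:

```python
SAMPLE = """\
DB_HOST=your-ehr-host
DB_NAME=your-omop-database
DB_SCHEMA=cdm
"""

def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            cfg[key.strip()] = value.strip()
    return cfg

cfg = parse_env(SAMPLE)
```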

Step 4: Save CLAUDE.md and start

cd ~/trial-emulation
claude

Try: "/parse protocols/aristotle-protocol.pdf" to start Phase 1.

Example Usage

Parse a trial protocol:

"/parse protocols/aristotle-protocol.pdf — Extract eligibility criteria, treatment definitions, and primary endpoint for the ARISTOTLE trial"

Build concept sets:

"/concepts aristotle — Map apixaban, warfarin, atrial fibrillation, stroke, and major bleeding to OMOP codes"

Assemble cohorts:

"/cohort aristotle — Build treatment and control cohorts. Apply all eligibility criteria and log the exclusion flow"

Estimate treatment effect:

"/estimate aristotle — Fit propensity score model, check balance, and estimate the HR for stroke/systemic embolism"

Compare with published results:

"/diagnose aristotle — The published HR was 0.79 (95% CI 0.66–0.95). Compare our emulated result and diagnose any discrepancies"

Run multiple times for robustness:

"/multi-run aristotle — Execute the full pipeline 3 times with independent analytic decisions and report the range of estimates"

Tips

  • Start with a well-known trial: Pick a landmark trial with clear eligibility criteria and a simple primary endpoint. Anticoagulation trials for atrial fibrillation are a good starting point because the drug definitions and outcomes are relatively clean in EHR data.
  • OMOP mapping is the hardest part: Concept set construction (Phase 2) typically requires the most iteration. Expect to refine your mappings after seeing cohort sizes.
  • Check the exclusion flow: If your cohort is dramatically smaller or larger than the original trial, your eligibility criteria are probably too strict or too loose. The CONSORT-style flow diagram helps diagnose this.
  • Covariate balance matters more than sample size: A well-balanced cohort of 2,000 patients is more trustworthy than a poorly balanced cohort of 20,000.
  • The multi-run strategy is powerful: Running the same emulation 3 times with independent decisions reveals how much of the result depends on analytic choices vs. the actual data.
  • Document everything: The decision log is your most valuable artifact. It lets you (and others) understand exactly why results came out the way they did.

Troubleshooting

Problem: Concept set returns zero patients

Solution: Your OMOP codes may be too specific. Try including descendant concepts, or check whether your institution uses a different vocabulary version. Run a frequency check on the concept codes before building cohorts.

Problem: Severe covariate imbalance after propensity score adjustment

Solution: Check for positivity violations — if some covariate combinations only exist in one treatment arm, no adjustment method will fix it. Consider trimming the propensity score distribution (e.g., exclude patients with PS < 0.05 or > 0.95) or relaxing eligibility criteria.
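Trimming can be sketched as a simple filter against the positivity bounds from the quality checklist; the patient IDs and scores here are invented:

```python
def trim(records, lo=0.05, hi=0.95):
    """Keep only patients whose propensity score lies within [lo, hi].

    records: list of (patient_id, propensity_score) pairs.
    """
    return [(pid, ps) for pid, ps in records if lo <= ps <= hi]

# Patients "a" and "c" fall outside the positivity bounds and are dropped.
kept = trim([("a", 0.02), ("b", 0.40), ("c", 0.97), ("d", 0.60)])
```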

Problem: Emulated result points in the opposite direction from the published trial

Solution: This is where Phase 6 earns its keep. Common causes: immortal time bias (check index date definition), informative censoring (check loss-to-follow-up patterns), or eligibility criteria that selected a fundamentally different population. Run /diagnose to systematically check each.

Problem: Too few outcome events for reliable estimation

Solution: Consider using a composite endpoint (e.g., stroke + systemic embolism + death) or extending the follow-up window. If the event rate is below 2–3% in either arm, the confidence intervals will be too wide to be informative.
