Why 'Zero-Shot' Clinical Predictions Are Risky

Date: April 19, 2026

Source: How to interpret 'zero-shot' results from generative EHR models — Nature Medicine


Key Takeaways


Proposed Evaluation Framework

Performance by Frequency

Evaluate how well the model performs on rare versus common medical events.
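
One way to make this concrete is to report a metric separately for rare and common events. The sketch below is illustrative only: the function name, the (event_code, y_true, y_pred) record layout, and the rare_threshold cutoff are assumptions, not details from the source.

```python
from collections import defaultdict


def recall_by_frequency(records, train_counts, rare_threshold=100):
    """Return recall per frequency bucket ("rare" or "common").

    records: iterable of (event_code, y_true, y_pred), labels are 0 or 1.
    train_counts: dict mapping event_code -> occurrences in training data.
    rare_threshold: hypothetical cutoff; codes seen fewer times count as rare.
    """
    hits = defaultdict(int)
    positives = defaultdict(int)
    for code, y_true, y_pred in records:
        bucket = "rare" if train_counts.get(code, 0) < rare_threshold else "common"
        if y_true == 1:
            positives[bucket] += 1
            hits[bucket] += int(y_pred == 1)
    # Buckets with no positive cases are omitted rather than reported as 0.
    return {bucket: hits[bucket] / positives[bucket] for bucket in positives}
```

A large gap between the two buckets suggests that a single aggregate score is hiding weak performance on rare events.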

Calibration

Check that predicted probabilities match observed outcome frequencies: among patients assigned a 30% risk, roughly 30% should actually experience the event.
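
A common summary of calibration is the expected calibration error (ECE), which bins predictions by probability and compares each bin's mean prediction with its observed event rate. The source does not name a specific metric; the sketch below assumes ECE with equal-width bins, and the function name and n_bins default are illustrative.

```python
def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted mean gap between predicted risk and observed frequency.

    probs: predicted probabilities in [0, 1].
    outcomes: matching 0/1 labels (1 = event occurred).
    """
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # Clamp so that p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_pred - observed)
    return ece
```

A value near 0 means stated risks track observed frequencies; larger values indicate over- or under-confidence.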

Timeline Completion

Measure how often the model fails to generate a complete patient timeline.
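
This can be tracked as a simple failure rate over generated timelines. What counts as "complete" depends on the model; the sketch below assumes, purely for illustration, that each timeline is a list of tokens and that a hypothetical end_token marks a finished one.

```python
def incomplete_timeline_rate(timelines, end_token="<END>"):
    """Fraction of generated timelines that never emit the end token.

    timelines: list of token lists, one per generated patient timeline.
    end_token: hypothetical marker for a completed timeline (an assumption).
    """
    if not timelines:
        raise ValueError("at least one timeline is required")
    failures = sum(1 for tokens in timelines if end_token not in tokens)
    return failures / len(timelines)
```

Reporting this rate alongside accuracy matters because predictions derived only from the timelines that happened to finish may not represent all patients.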

Shortcut Audits

Assess whether the model relies on non-clinical shortcuts (e.g., administrative codes) instead of true medical signals.
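
One possible audit is an ablation: remove the suspected shortcut features and measure how much the predicted risk moves. The sketch below is an assumption about how such an audit might look, not the source's method; predict and is_admin_code are hypothetical callables supplied by the user.

```python
def shortcut_sensitivity(predict, patients, is_admin_code):
    """Mean absolute change in predicted risk when admin codes are removed.

    predict: callable taking a list of codes and returning a risk in [0, 1].
    patients: list of code lists, one per patient.
    is_admin_code: callable returning True for non-clinical codes.
    """
    if not patients:
        raise ValueError("at least one patient is required")
    deltas = []
    for codes in patients:
        full_risk = predict(codes)
        # Re-score the same patient with administrative codes stripped out.
        clinical_only = [c for c in codes if not is_admin_code(c)]
        deltas.append(abs(full_risk - predict(clinical_only)))
    return sum(deltas) / len(deltas)
```

If removing administrative codes shifts predictions substantially while the clinical picture is unchanged, the model is likely leaning on the shortcut.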

Out-of-Distribution Validation

Test model performance on fundamentally different patient populations without retraining.
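
In practice this means scoring an external cohort with the frozen model and comparing the result with internal performance. The sketch below assumes AUROC as the metric (the source does not specify one) and uses illustrative function names; scores are taken as already produced by the unchanged model.

```python
def auroc(scores, labels):
    """Probability that a random positive outranks a random negative."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    # Ties between a positive and a negative count as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def external_validation_gap(internal, external):
    """Return (internal AUROC, external AUROC, drop) for a frozen model.

    internal, external: (scores, labels) tuples from each cohort.
    """
    internal_auc = auroc(*internal)
    external_auc = auroc(*external)
    return internal_auc, external_auc, internal_auc - external_auc
```

A large drop on the external cohort indicates that the zero-shot performance does not transfer to the new population.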