
Re-identification Risk vs Predictive Utility on US Hospital Discharge Data

dlankeaux12
June 26, 2026

About

HIPAA's Safe Harbor de-identification standard removes 18 explicit identifiers, but a large literature — from Sweeney (2000) through to current re-identification audits — shows that the residual quasi-identifiers still in the record (demographics, admission codes, payer, specialty) are often enough to re-identify individual patients when joined against external data. Modern AI pipelines compound the problem: every training run, vendor handoff, and research extract is a new exposure surface. The operational question is no longer “is this data anonymous?” — it is “which mitigation gives us defensible privacy without breaking the model that has to run on it?” This project answers that question empirically on a real, large-scale hospital dataset.
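The linkage risk described above can be made concrete with a uniqueness check. The sketch below is a minimal illustration, assuming hypothetical field names (`age`, `sex`, `admission_type`, `payer`, `specialty`) and four made-up toy records that are not drawn from the project's dataset. It groups records by their quasi-identifier values, reports the smallest group size (the "k" of k-anonymity) and the share of records that are unique, and then applies one example mitigation: replacing exact age with a 10-year band. k-anonymity is used here as one common risk metric, not as the project's own method.

```python
from collections import Counter
from typing import Any, Dict, List, Sequence

Record = Dict[str, Any]

# Hypothetical quasi-identifier columns; the real dataset's field names may differ.
QUASI_IDENTIFIERS = ("age", "sex", "admission_type", "payer", "specialty")


def qi_key(record: Record, qis: Sequence[str]) -> tuple:
    """The tuple of quasi-identifier values an attacker could join on."""
    return tuple(record[q] for q in qis)


def equivalence_class_sizes(records: List[Record], qis: Sequence[str]) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(qi_key(r, qis) for r in records)


def k_anonymity(records: List[Record], qis: Sequence[str]) -> int:
    """Smallest equivalence-class size: the data is k-anonymous for this k."""
    return min(equivalence_class_sizes(records, qis).values())


def unique_fraction(records: List[Record], qis: Sequence[str]) -> float:
    """Share of records whose quasi-identifier combination appears exactly once."""
    sizes = equivalence_class_sizes(records, qis)
    return sum(1 for r in records if sizes[qi_key(r, qis)] == 1) / len(records)


def generalize_age(record: Record, width: int = 10) -> Record:
    """Example mitigation (hypothetical): replace exact age with a width-year band."""
    out = dict(record)  # copy so the original record is not mutated
    lo = (record["age"] // width) * width
    out["age"] = f"{lo}-{lo + width - 1}"
    return out


# Toy records for illustration only.
records: List[Record] = [
    {"age": 34, "sex": "F", "admission_type": "Emergency", "payer": "Medicaid", "specialty": "Cardiology"},
    {"age": 37, "sex": "F", "admission_type": "Emergency", "payer": "Medicaid", "specialty": "Cardiology"},
    {"age": 52, "sex": "M", "admission_type": "Elective", "payer": "Medicare", "specialty": "Orthopedics"},
    {"age": 58, "sex": "M", "admission_type": "Elective", "payer": "Medicare", "specialty": "Orthopedics"},
]

print(k_anonymity(records, QUASI_IDENTIFIERS), unique_fraction(records, QUASI_IDENTIFIERS))  # 1 1.0

generalized = [generalize_age(r) for r in records]
print(k_anonymity(generalized, QUASI_IDENTIFIERS), unique_fraction(generalized, QUASI_IDENTIFIERS))  # 2 0.0
```

With exact ages every toy record is unique (k = 1), so each could in principle be matched against an external source holding the same fields. After banding, every record shares its quasi-identifiers with at least one other (k = 2). The cost is that any model using age now sees a coarser feature, which is the privacy-versus-utility trade-off the project sets out to measure.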
