Getting Health Data De-Identification Right

We have written extensively about the importance of leveraging health data to improve health care quality, to eliminate racial and ethnic health disparities, and to reduce the unsustainable costs of our health care system. Concerns about the sensitivity of health data require us to leverage this data responsibly, in a way that respects the interests of individuals in health privacy and engenders public trust.

De-identification of health data can be an important tool for protecting privacy while still preserving the utility of health data for analytic purposes. But the ability of de-identification to meet both of these goals depends in large part on the deployment of effective health data de-identification methodologies. There are distressingly few resources (and only a handful of experts) available to health data researchers and health data stewards to help them understand – and effectively implement – health data de-identification methodologies.

One of the most renowned health data de-identification experts has stepped in to fill that void. Dr. Khaled El Emam, of the University of Ottawa, has written the “Guide to the De-Identification of Personal Health Information.” The Guide is a soup-to-nuts primer on the topic of health data de-identification. It begins with an explanation of what de-identification is and why de-identification helps us capture valuable insights from health data while still providing some protections for individual privacy, and moves to comprehensive, practical guidance on how to measure re-identification risk and actually deploy effective de-identification methodologies.

As noted clearly in the Guide, the goal of health data de-identification is not “zero risk,” which would be impossible to achieve while still preserving utility in the data. U.S. health privacy rules do not mandate that all re-identification risk be eliminated. HIPAA recognizes two methodologies for achieving “de-identification,” which is defined as “no reasonable basis to believe that the information can be used to re-identify the individual.” The first is the safe harbor or “cook book” method, which requires the removal of 18 categories of common identifiers. By design, it is easy to use (probably easy enough even for a lawyer to deploy!); however, CDT has expressed concerns that this methodology may not sufficiently protect against re-identification in all circumstances. In this Guide, Dr. El Emam more specifically outlines some of the risks of the safe harbor method.

The Guide urges the use of the second HIPAA de-identification methodology, the statistician or “expert” method. This method requires that someone with knowledge of and experience in rendering information not individually identifiable determine that the risk is “very small” that the information, either alone or in combination with other information reasonably available to the recipient of the data, could be used to identify an individual in the dataset. This methodology expressly requires the actual risks of re-identification to be factored into the determination that a dataset is de-identified, rather than presuming that the risk is low regardless of the context. CDT believes this methodology is more protective of health data, and in recent de-identification guidance, the Office for Civil Rights, which has oversight over HIPAA, encourages entities to rely more on the statistical method. This methodology also retains greater utility in the data, because the statistical methods applied to the data take into account the purpose for which the data is intended to be used.
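
To make the idea of measuring re-identification risk a little more concrete, here is a minimal sketch of one common way to approximate it: group records by their quasi-identifiers (fields that are not direct identifiers but could be combined to single someone out) and treat each record's risk as one divided by the number of records sharing the same combination. This is an illustration only, not the procedure set out in the Guide and not a substitute for an expert determination. The field names, the sample records, and the 0.2 threshold are all hypothetical; HIPAA does not assign a number to “very small.”

```python
from collections import Counter

def reidentification_risk(records, quasi_identifiers):
    """Return (max_risk, avg_risk) for a list of dict records.

    Each record's risk is approximated as 1 / (number of records that
    share the same quasi-identifier values). This is a simplified,
    illustrative metric, not a complete expert analysis.
    """
    if not records:
        raise ValueError("records must not be empty")
    # Build the quasi-identifier key for every record.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    # Count how many records fall into each group of identical keys.
    class_sizes = Counter(keys)
    per_record = [1.0 / class_sizes[k] for k in keys]
    return max(per_record), sum(per_record) / len(per_record)

# Hypothetical, already-generalized sample data (assumed field names).
records = [
    {"age_band": "30-39", "zip3": "206", "sex": "F"},
    {"age_band": "30-39", "zip3": "206", "sex": "F"},
    {"age_band": "30-39", "zip3": "206", "sex": "F"},
    {"age_band": "40-49", "zip3": "206", "sex": "M"},
    {"age_band": "40-49", "zip3": "206", "sex": "M"},
]

# Hypothetical threshold chosen for illustration only.
THRESHOLD = 0.2

max_risk, avg_risk = reidentification_risk(records, ["age_band", "zip3", "sex"])
needs_more_generalization = max_risk > THRESHOLD
```

In this toy dataset the smallest group has two records, so the highest per-record risk exceeds the illustrative threshold and the data would need further generalization or suppression. A real assessment would also weigh the context described above, such as what other information is reasonably available to the recipient.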

The lack of clear guidance on effective statistical methodologies makes entities reluctant to rely on this method for de-identification. This guide fills that gap by providing practical advice on how to successfully deploy such methodologies. Dr. El Emam and his colleagues have been statistically de-identifying data in Canada and the U.S. since 2005, and the methods described in the guide have been proven to work in practice and yield data still useful to health data analysts.

CDT has recommended the establishment of “Centers of Excellence” in health data de-identification to help ensure consistent implementation of effective de-identification methodologies. Dr. El Emam notes that the material in his guide could provide practical information to such organizations. We appreciate his endorsement of the Centers of Excellence idea – but we believe this guide could help increase the frequency with which effective statistical methodologies are used even if such Centers are never established or recognized. And that is a vitally important contribution to all of our efforts to achieve better – and more affordable – health.