OPM Revises Its Health Information Database

June 17, 2011 / Harley Geiger

Yesterday, the Office of Personnel Management (OPM) issued updated notices for its health claims database. The new notices shed more light on the privacy protections OPM intends to use and announce significant revisions to the way OPM will collect and share health information. They come after CDT and other groups sent letters and comments to OPM urging it to apply greater privacy protections, be more transparent, and consider other database models. OPM’s new notices take up some of these recommendations, but there are still unresolved issues with the identifiability of the health information and the database architecture.

Background on the Health Claims Data Warehouse

In October 2010, OPM announced its plans to create a database – called the Health Claims Data Warehouse – containing copies of detailed electronic health records of millions of Americans. According to the October 2010 notice, the Warehouse would include enrollees’ Social Security Numbers (SSNs), information on spouses, children, and employment, as well as health care coverage, procedures, diagnoses, and payments. OPM would collect this data by setting up data feeds with plans participating in three major insurance programs: the Federal Employee Health Benefit Program, the National Pre-Existing Condition Insurance Program, and the Multi-State Option Plan.

The October 2010 notice gave OPM the discretion to share this information with a broad range of entities. OPM stated it could share the health information with law enforcement agencies, researchers inside and outside the federal government, members of Congress, federal agencies, and courts, as well as at the request of the individual. The notice also stated that OPM could share the information for “other purposes.” OPM’s October 2010 notice stated that access to the database would be restricted to employees with the right clearance and “a need to know to perform their official duties.” The notice included a general statement that the data would be de-identified “in many instances.” OPM’s announcement gave almost no other information on how it would protect individuals’ privacy.

CDT issued a letter to OPM following the announcement of the Warehouse. CDT’s letter requested specific details regarding what privacy and security protections OPM would apply to safeguard the data. CDT’s letter urged OPM to limit the parties with which the agency would share health data, and to consider giving enrollees a choice as to whether OPM would collect their data. Finally, CDT’s letter urged OPM to consider a decentralized database model that would leave health information with the health plans that presently hold it, rather than OPM’s proposal to copy the health information into a centralized database. CDT also met with OPM in December 2010 to discuss these issues.

Positive Steps in OPM’s New Notices

OPM’s June 2011 notices demonstrate that the agency was responsive to CDT’s recommendations. The new notices drop two of the three health insurance programs from the data collection – the Warehouse will now only collect information from the Federal Employee Health Benefit Program. The new notices provide much more detail on applicable safeguards, including a commitment to full compliance with HIPAA and FISMA protections. OPM also pledged to use only de-identified information for analysis purposes and to release only de-identified information to parties external to OPM. The June 2011 notices describe with specificity the data to be collected and the purposes for which the data will be used, whereas the October 2010 notice had left these items open-ended.

The new OPM notices sharply curtail the level of sharing of health data that had been described in the October 2010 notice. The notices describe a division between the analysis component and the fraud prevention component of the Warehouse. The analysis component will share de-identified health information only with researchers. OPM’s Office of Inspector General retains the discretion to share health information for law enforcement purposes, for litigation, and with contractors and grantees working with the federal government (such as fraud investigators). Although this encompasses a sizable number of entities with which OPM may share health data, it is a big improvement over the scope of the Warehouse program as described in October 2010.

Improvements Still Needed for Health Claims Databases

OPM deserves praise for making a greater effort to be transparent and to protect the privacy of its data subjects. Two core issues remain, however: the identifiability of the health information and the database architecture.

First, it remains unclear why OPM must collect fully identifiable health information. OPM’s fraud detection goals could be achieved using a limited data set (under HIPAA, a limited data set has several direct identifiers stripped from health information – though not de-identified, limited data sets are more protective of privacy than fully identifiable data). Similarly, it is unclear why OPM couldn’t use a one-way hash function to scramble the identifiers in the health information (a hash function is essentially a cryptographic technique that turns an input into a scrambled value that cannot practically be reversed, while identical inputs still produce identical outputs, so records belonging to the same person can be matched). Both limited data sets and a hash function should enable OPM to create statistically viable longitudinal records while preserving the relative anonymity of individual enrollees. If OPM needs the identity of a record subject (for example, if the Inspector General discovers fraud), OPM could approach the health plans to obtain the identity of the enrollees whose health information is at issue.
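To make the hashing idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of anything OPM or the health plans actually do. The key, the field names, and the sample claims are all made up. One detail goes beyond the text above: because there are only about a billion possible SSNs, a plain hash could be reversed by hashing every possible number, so the sketch uses a keyed hash (HMAC-SHA256) with a secret that, in this hypothetical, stays with the health plan.

```python
import hashlib
import hmac

# Hypothetical secret key. In this sketch it is held by the health plan,
# not by the agency that analyzes the data.
SECRET_KEY = b"example-key-held-by-the-health-plan"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed one-way hash (HMAC-SHA256) of an identifier."""
    # Normalize formatting so "123-45-6789" and "123456789" match.
    normalized = identifier.replace("-", "").strip()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def scrub(claim: dict) -> dict:
    """Replace the SSN in a claim record with its pseudonym."""
    scrubbed_claim = {k: v for k, v in claim.items() if k != "ssn"}
    scrubbed_claim["pseudonym"] = pseudonymize(claim["ssn"])
    return scrubbed_claim


# Made-up claims: the first two belong to the same (fictional) enrollee.
claims = [
    {"ssn": "123-45-6789", "year": 2009, "diagnosis": "E11"},
    {"ssn": "123456789", "year": 2010, "diagnosis": "E11"},
    {"ssn": "987-65-4321", "year": 2010, "diagnosis": "J45"},
]

scrubbed = [scrub(c) for c in claims]

# Records can still be linked over time by pseudonym, without the SSN.
same_person = scrubbed[0]["pseudonym"] == scrubbed[1]["pseudonym"]
print("First two claims linked:", same_person)
```

In this arrangement the analyst sees only pseudonyms, and re-identifying a record (say, after fraud is discovered) would require going back to whoever holds the key and the original data.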

Second, OPM is sticking with its centralized database model. In CDT’s letter to OPM, we urged the agency to explore a decentralized query-based system instead. A decentralized model would leave enrollee health information with the current record holders – namely the health plans – rather than compile new copies of the health information into one big system. Precedent exists for decentralized query systems in the federal government, such as the Food and Drug Administration’s Sentinel Initiative. Decentralized models leverage existing systems, minimize data transfer and reduce the risk of a severe data breach. Moreover, leaving health claims information with the health plans is more in line with the public’s expectations of privacy.
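The difference between the two architectures can be sketched in a few lines of Python. This is a toy illustration only: the `HealthPlan` class, the `count_claims` method, and the sample data are hypothetical, and it does not describe how the Sentinel Initiative or any real system is implemented. The point is simply that the question travels to the data, and only an aggregate answer travels back.

```python
class HealthPlan:
    """Hypothetical health plan that keeps its own claims records."""

    def __init__(self, name, claims):
        self.name = name
        self._claims = claims  # record-level data never leaves the plan

    def count_claims(self, diagnosis_code):
        """Answer a query locally and return only an aggregate count."""
        return sum(1 for c in self._claims if c["diagnosis"] == diagnosis_code)


def distributed_count(plans, diagnosis_code):
    """Send the same query to every plan and combine the aggregate answers."""
    per_plan = {plan.name: plan.count_claims(diagnosis_code) for plan in plans}
    return sum(per_plan.values()), per_plan


# Made-up data for two fictional plans.
plans = [
    HealthPlan("Plan A", [{"diagnosis": "E11"}, {"diagnosis": "J45"}]),
    HealthPlan("Plan B", [{"diagnosis": "E11"}, {"diagnosis": "E11"}]),
]

total, per_plan = distributed_count(plans, "E11")
print(total, per_plan)
```

The coordinating agency in this sketch never holds a copy of any individual claim, so a breach of its systems would expose counts rather than health records.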

These two fundamental issues appear in several contexts. There is a general trend among businesses and government agencies to develop a new database for every analytic need. There is some evidence that this is happening with health claims databases run by the states. Although CDT supports the cost-cutting and fraud detection goals of health claims databases, individual privacy and data security are ill served when repositories and copies of identifiable personal information are created unnecessarily. To the extent possible, government agencies and businesses should seek to meet their objectives through methods that leave data in existing systems and maintain the relative anonymity of data subjects. Unfortunately, OPM does not appear to be leading this charge – but the explosion in health claims analysis has just begun.