Health Big Data in the Clinical Context

This paper is the second in a series of three, each of which explores health big data in a different context. The first paper covers health big data in the government context, and the third covers health big data in the commercial context.

The health care industry, like other sectors, faces exciting new opportunities as a result of the cluster of developments occurring under the umbrella of the term “big data.” Building on a tradition of research, health care providers, insurers, pharmaceutical companies, academics, and many non-traditional entrants are already applying advanced analytics to large and disparate data sets, in order to gain valuable insights on treatment, safety, public health, and efficiency. As they do so, they encounter privacy questions.

In some ways, the privacy issues surrounding research and other health uses of big data are not new. The limits of notice and consent have long been recognized. Issues of security plague even “small data.” But health big data is more than just a new term. The health data environment today is vastly different from what it was 10 years ago and will likely change more rapidly in the near future.

Most definitions of “big data” are based on the observation that the volume, velocity, and variety of data are rapidly increasing. For our purposes, the big data phenomenon encompasses not only the proliferation of “always on” sensing devices that collect ever larger volumes of data, but also the rapid improvements in processing capabilities that make it possible to easily share and aggregate data from disparate sources and, most importantly, to analyze it and draw knowledge from it. Until recently, the health care sector had lagged in its use of information technology. However, that is rapidly changing, due to a variety of factors including the shift to electronic health records (driven in part by federal incentives) and the emergence of new ways to collect health data. Hopes are high for big data as an important component of the “learning health care system,” which aims to leverage health information to improve and disseminate knowledge about effective prevention and treatment strategies, in order to enhance the quality and efficiency of health care.

To explore the privacy implications of health big data, and to develop concrete proposals for how to resolve privacy issues and at the same time reap the benefits of big data techniques, CDT is undertaking a series of consultations with stakeholders and experts. We are examining three scenarios: (1) clinical and administrative data generated by health care providers and payers; (2) health data contributed by consumers using the Internet and other consumer-facing technologies; and (3) health data collected by federal, state, and local governments.

In this paper, we focus on the first of these scenarios: clinical and administrative data generated by health care providers and payers in the course of providing treatment to patients, managing health care institutions, or processing payments. (This excludes data collected in controlled clinical trials, which we do not specifically address.) We look both at big data uses by providers and payers and at their disclosures of data, when permitted, to third parties for research and other analytic purposes.

The Privacy Rule promulgated under the Health Insurance Portability and Accountability Act (HIPAA) covers most health care providers and payers. Some uses of clinical data for research are also covered under the “Common Rule,” which protects human subjects in federally funded research. There has long been concern about the limitations of both sets of rules, and about inconsistencies between them. Clearly, those two legal regimes should be more consistent; efforts to harmonize them have been launched but so far have not progressed. The impetus of the big data revolution may spur harmonization and reform.

We acknowledge those long-running concerns, but in this paper we do not attempt to address them comprehensively or conclusively. Instead, we look to the framework provided by the Fair Information Practice Principles (FIPPs) and explore how it could be applied in an age of big data to clinical and administrative data. The FIPPs informed, albeit imperfectly, the HIPAA Privacy Rule, just as they have influenced to varying degrees most modern data privacy regimes. While some have questioned the continued validity of the FIPPs in the current era of mass data collection and analysis, we consider here how the flexibility and rigor of the FIPPs provide an organizing framework for responsible data governance, promoting innovation, efficiency, and knowledge production while also protecting patient privacy. Rather than proposing an entirely new framework for big data, which at best could be years in the making, we believe that the best approach for data in the traditional health care system is to start with the FIPPs-based rules under HIPAA and the Common Rule, and to interpret them for big data uses. This effort could have the further benefit of laying the groundwork for a consistent set of principles covering both the traditional health sector and emerging consumer applications.