White House Big Data Review Begins with Focus on Technical Issues
In January, President Obama delivered a speech regarding the Snowden disclosures of excessive government surveillance; that speech described a review of big data practices in both the private and public sectors, to be headed by presidential counselor John Podesta. The first public event in that review, co-sponsored by MIT in Cambridge, took place last week with a particular focus on technical issues raised by big data practices. While we welcome the review, we hope that the final report will adopt the Fair Information Practice Principles (FIPPs) from the White House’s 2012 report on consumer privacy – which remain relevant even in a big data environment – and recommend limitations on pervasive collection and retention of consumer data.
By focusing on technical topics – including the repercussions of storing large data sets for future big data purposes, security risks from data breach, and the limitations of analytics – the workshop shed light on some of the under-discussed issues concerning big data. For example, Cynthia Dwork of Microsoft Research gave a provocative talk on differential privacy and attempted to quantify the risks of reidentification of anonymized data sets. One panel focused on privacy-enhancing technologies (such as encryption) that can be deployed to secure data more effectively. The technical methods for conducting big data analytics were also a popular topic, as many panelists described the ways in which data collected from online course services – an important new area that implicates privacy interests – can be analyzed by educational institutions.
In his introductory remarks, Podesta referred to the 2012 White House report on consumer privacy with approval, and we hope that the review of big data practices will reemphasize the most important provisions of that report – including its reliance on the Fair Information Practice Principles and contextual privacy protection. However, the use of the FIPPs as an organizing principle should be paired with limitations on collection of consumer data in order to promote individual privacy.
Multiple panelists argued that limitations on the collection of data were an ineffective way to protect privacy, and that providers and advocates should instead focus on limitations on the use of data that’s collected. For multiple reasons – most importantly, because privacy interests are implicated at the time of collection and deserve protection at that stage – we at CDT feel that collection limitations are important and shouldn’t be discarded. Focusing merely on use limitations ignores a host of issues that can arise when massive datasets are created from persistent collection. Data breach, inadvertent collection of customer data, and malicious internal use are all privacy violations that can occur even with strong use limitations – demonstrating the need for collection limitations. Just because an increasing amount of data is collectible doesn’t mean that it necessarily should be collected – and companies should make deliberate choices about what to collect and what not to, in order to help preserve the individual private spaces central to our lives.
The focus on use limitations was in keeping with a larger tendency to discuss the promise of big data – especially for educational uses, which MIT academics and affiliates have been testing. As we’ve written, specific contexts like education or the home are especially deserving of strong privacy protections. We hope that future big data uses of educational data retain the innovative spirit voiced by many of the panelists, but also include robust privacy and security protections.
On Monday, March 17, CDT President Nuala O’Connor will participate in the second workshop in the big data review, which will focus on social and ethical issues. We hope that the final report from the big data review process will support the policy proposals that CDT has advocated, and we will be filing comments to ensure that individual privacy protections aren’t left behind in the age of big data. Big data may be the tool du jour, but consumer privacy and security should receive long-lasting protections.