Questions Submitted by Members of the Senate Committee on Commerce, Science, and Transportation on Enlisting Big Data in the Fight Against Coronavirus
Answers by Michelle Richardson, Director, Privacy and Data Project at the Center for Democracy and Technology
April 15, 2020
1. As the Chairman points out, app-based programs are proliferating and will likely draw on increasingly large or diverse datasets. Regarding privacy, apps that do not transfer personal information are best in class. Those that need personal information should be subject to strict purpose limitations so the data cannot be used for non-coronavirus applications. As for effectiveness, no reliable data is available at this time. Even though location and proximity tracing apps have been deployed in other countries, their impact has not been disentangled from that of contemporaneous efforts like widespread testing, compulsory quarantines, public information on the movement of infected individuals, and other responses.
2. We do not believe that privacy and effectiveness are inversely proportional. Given the extraordinary resources that U.S. companies are investing in the coronavirus response, it is not a tradeoff we need to accept. In fact, excess data collection can often hide useful ‘signals’ behind a lot of data ‘noise’. Data collectors should have a clear idea of what data they want and why. This will encourage minimal data collection and strong data limitations, and it will result in the best health outcomes.
3. A comprehensive privacy law would likely have had several effects. First, it would have encouraged companies to conduct research in privacy-protective ways. For example, the Chairman’s draft bill includes protections for public interest research that is necessary, proportionate, and limited in purpose. It also excludes aggregate and de-identified data from its scope altogether. To maximize data use while receiving liability protection, companies would be more likely to commit to these methods. Second, under these protections people would likely feel more comfortable sharing their personal information. Knowing that there are clearer and more meaningful rules – including a way to enforce them – would encourage people to take part in voluntary data sharing that may currently feel too risky.
4. We recommend that location tracking use aggregated and anonymized data whenever possible. Less stringent de-identification tactics – such as creating a pseudonymous identifier – are not sufficient for such a sensitive data set. Because it is so easy to re-identify individual location data, its collection should be strongly disfavored. We are still working to understand how to effectively use anonymized, de-identified, and aggregate location data, but one area of benefit is allowing public health officials to identify and compare, in aggregate, the effectiveness of social distancing measures.
5. Data collected or shared during this health emergency should only be used to inform the response to the COVID-19 pandemic. The data should not be repurposed or retained for any other reason. Once the immediate public health crisis has passed, data collected by companies and the government should be used only by researchers, and solely to learn from this episode and plan for future occurrences. Otherwise, the data should be destroyed. This is crucial for maintaining public trust and hence public health. Without these controls, the public is less likely to share data and more likely to actively subvert data collection methods.
For the rest of our answers to Congress’ questions, as well as our other teams’ important work monitoring and guiding the coronavirus response, look to the resources box on this page.