Want to Improve Data Quality, Reduce Liability, and Gain Consumer Trust? Try Deleting.

February 24, 2017 / Michelle De Mooy

#DeleteUber recently became a trending topic on Twitter when over 200,000 users requested that the company delete their accounts. The requests came after suggestions that Uber broke a New York Taxi Workers Alliance strike objecting to the White House’s executive order on immigration. The company faced even more backlash once users began to question whether Uber was actually deleting their accounts.

In fact, Uber found itself overwhelmed with deletion requests and had no automated system in place to easily delete user accounts. New York Times writer Mike Isaac noted that the company’s process for deleting accounts was “completely manual,” meaning an Uber employee had to go into the company’s database system to delete each account.

Uber’s situation illustrates the disconnect between what data deletion means to companies and how users understand the concept. To users, deletion is an act of finality that ends their relationship with a company and destroys their information. To most companies, a deletion command is more likely to send a copy of the user’s information to cloud storage for potential retrieval.

In CDT’s newest white paper, “Should it stay or should it go? The legal, policy, and technical landscape around data deletion,” we explore this disconnect and the reasons why commercial data stores have grown. We make the case that it is neither in a company’s nor a customer’s best interest to hold onto large amounts of data.

Retaining data indefinitely has become the default for many companies, and the reasons for this are clear: much of U.S. law and regulation incentivizes data retention, and the tantalizing promise that all data has some intrinsic value has encouraged the hoarding of data, too. Purging data that is intermingled and stored in different places also involves enormous technical challenges, and these challenges create business costs.

But these challenges are not insurmountable. Deletion can be a means to improving data quality, reducing liability and risk, and gaining consumer trust.

While data storage costs have gone down, the overall cost of storing large amounts of data, once legal risk is factored in, has soared. Even with storage costs dwindling, companies still spend an estimated $5 million per petabyte to retain old information. Liability and legal risk have also skyrocketed. Data breaches cost companies an average of $4 million per breach, according to one estimate, while electronic discovery costs have grown to $18,000 per gigabyte of data. Less data makes a company a smaller target for criminals and makes it harder for lawyers, consultants, and auditors to charge companies to wade through a bunch of irrelevant data.

Laws in the U.S. need to be more specific about their data retention and deletion rules. In months of research, we couldn’t find a single statute or regulation that provided one clear and actionable set of policy and technical guidelines for companies on the responsible disposal of data (one excellent resource, NIST’s Media Sanitization Guidance, is helpful to newbies but less so for larger companies). Many laws like the Fair and Accurate Credit Transactions Act point to the need for companies to create policies around data disposal based on the sensitivity of the data, but do not provide specifics on how to determine the best methods or approaches for doing it. And lest companies forget, the Federal Trade Commission has argued that not implementing reasonable security measures, including data disposal, is an unfair business practice.

Not all data is created equal. A survey by the Compliance, Governance and Oversight Council found that corporate information generally fits into four categories: one percent of data is held for litigation purposes, five percent for regulatory compliance, twenty-five percent of data has business value, and a whopping sixty-nine percent of a company’s data holdings have little to no business value. That’s a lot of useless data generating costs, not revenue.

Lengthy data retention is also contrary to people’s expectations and can reduce their trust in a company. No consumer expects their information to be retained forever, and as new Internet of Things devices begin to collect more granular and personal information, maintaining customer trust will be at the heart of sustaining value and loyalty. Consumers repeatedly say they are less likely to patronize a company that loses their information or suffers a data breach. Also, government use of personal information for surveillance purposes is a threat that encourages deletion of datasets: it’s hard to hand over data that you don’t have.

CDT’s paper suggests that companies begin by auditing their data holdings in order to index them by value, sensitivity, and other categories. Once established, these categories will enable the creation of a data lifecycle that sets responsible parameters around retention and deletion of data. We also make detailed recommendations on the policies and technical practices that companies can employ to advance data management, such as using deletion-by-encryption, reducing internal and external access to data, setting data destruction requirements for third-party vendors, and enabling data minimization.
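
To make the audit-then-lifecycle idea concrete, here is a minimal sketch in Python of how indexed categories could drive a deletion schedule. It is an illustration only, not the paper's method: the category names loosely follow the four survey categories mentioned earlier, and the retention periods, the `Record` type, and the function names are all hypothetical assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods, in days, for each audit category.
# None means "keep until the hold is released"; the numbers are
# placeholders for illustration, not legal or policy guidance.
RETENTION_DAYS = {
    "litigation_hold": None,
    "regulatory": 7 * 365,
    "business_value": 2 * 365,
    "low_value": 90,
}


@dataclass
class Record:
    record_id: str
    category: str       # assigned during the data audit
    collected_on: date  # when the data was collected


def is_due_for_deletion(record: Record, today: date) -> bool:
    """Return True once a record has outlived its category's retention period."""
    limit = RETENTION_DAYS[record.category]
    if limit is None:
        return False  # held indefinitely, e.g. for pending litigation
    return today - record.collected_on > timedelta(days=limit)


def deletion_queue(records: list[Record], today: date) -> list[str]:
    """List the IDs of records whose retention period has expired."""
    return [r.record_id for r in records if is_due_for_deletion(r, today)]
```

A scheduled job could call `deletion_queue` against the indexed holdings and hand the resulting IDs to whatever deletion mechanism the company uses. The point of the sketch is that deletion becomes a routine default once every record carries a category and a collection date.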

As the novelty of big data wears off, companies are faced with enormous data holdings that present huge risks and high costs for them and their customers. Deletion used to be a dirty word for data-centric companies, but it’s time to reconsider its value. Not only do huge data stores generate costs and liability, they damage customer trust and loyalty, and make it much harder to find the data diamonds among the slurry.
