Data Retention More Complicated, Expensive Than Previously Understood

UPDATE: On February 1, 2012, we issued a revised and updated version of our memo analyzing the costs of data retention. The link to the right takes you to the revised memo.

For several years, the Justice Department and some Members of Congress have been pushing for federal legislation that would require ISPs to retain extensive records of the IP addresses they assign to users.  The idea was to be able to link Internet communications back to individual users months or years after those communications were sent. CDT and others have warned about both the privacy risks and the costs of collecting and retaining so much data.

Recently, CDT has concluded that changes in the Internet addressing practices of ISPs and mobile carriers make data retention far more complicated and much more expensive than previously understood, while at the same time reducing the reliability of IP addresses in identifying individual users.  We’ve issued a memo detailing this new perspective on the data retention mandate.

The problem is this: The IP addresses associated with communications traversing the Internet no longer uniquely identify end-user devices.  Because of the shortage of IP addresses, Internet access providers are beginning to use a technique called Network Address Translation (NAT) to share a single public address among multiple users.

Three related problems arise from IP address sharing:

  1. The public-facing IP address associated with a particular communication is no longer unique; instead, it may be shared among dozens, hundreds, or even thousands of users.  To match public-facing addresses with individual customers, carriers use a second data element, called a port number.  Immediately, this doubles the amount of data required to link addresses with user devices.
  2. Moreover, particularly in the mobile context, the addressing information associated with a particular device can change as frequently as once every minute and possibly even more frequently.  With this wrinkle, the volume of data associated with addressing becomes truly enormous, as does the task of retaining and retrieving it, with real cost implications.  Imagine re-issuing a copy of a small town’s White Pages as often as once a minute but still having to maintain all of the old copies.  For some entities the costs become prohibitive; for others, they clearly will detract from growth and may even impede other forms of cooperation with law enforcement.
  3. Destination servers probably do not store the port numbers required to make the match, and, even if they did, their time stamps may not be synchronized with those of the ISPs precisely enough to yield reliable matches.

On top of this, coffee shops, trains, buses, planes, hotels, and public venues that offer Internet access also use NAT address sharing, making it even less likely that the data obtained at the end point of a communication can be used to identify individual end-user devices.
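
The matching problem described above can be made concrete with a small sketch. It is illustrative only and does not reflect any carrier's actual logging system: the record layout, the function name candidates, the subscriber names, and the addresses (taken from a range reserved for documentation) are all assumptions. It shows that a shared address alone points to several customers, that adding the port number narrows the match, and that even a few seconds of clock disagreement between the destination server and the carrier can make the match ambiguous again.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NatRecord:
    """Hypothetical carrier log entry: who held a public IP/port pair, and when."""
    public_ip: str
    public_port: int
    start: float  # seconds on the carrier's clock (inclusive)
    end: float    # seconds on the carrier's clock (exclusive)
    subscriber: str


def candidates(log, public_ip, observed_at, public_port=None, skew=0.0):
    """Return the subscribers who could have sent a communication.

    public_port=None models a destination server that did not log the port.
    skew is the assumed clock disagreement, in seconds, between the
    destination server and the carrier; each window is widened by it.
    """
    return sorted({
        r.subscriber
        for r in log
        if r.public_ip == public_ip
        and (public_port is None or r.public_port == public_port)
        and r.start - skew <= observed_at < r.end + skew
    })


# Made-up log: one shared address, mappings that change every minute.
log = [
    NatRecord("203.0.113.7", 40001, 0.0, 60.0, "alice"),
    NatRecord("203.0.113.7", 40001, 60.0, 120.0, "bob"),
    NatRecord("203.0.113.7", 40002, 0.0, 60.0, "carol"),
]

# Address only: two customers shared it at that moment.
print(candidates(log, "203.0.113.7", 30.0))
# Address plus port: a single match.
print(candidates(log, "203.0.113.7", 30.0, public_port=40001))
# Address plus port, but clocks may differ by 5 seconds: ambiguous again.
print(candidates(log, "203.0.113.7", 58.0, public_port=40001, skew=5.0))
```

In this toy log, the three lookups return two names, one name, and two names respectively. A real deployment would involve vastly more records, which is the source of the retention and retrieval costs discussed above.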

IPv6, a new addressing system developed for the Internet, may or may not relieve these problems, but its full deployment is still many years away.

In sum, it appears that what looked fairly simple and effective a couple of years ago is now immensely more complicated.  Based on these developments, it seems that data retention is an idea whose time has passed.
