Court Orders Google to Turn Over YouTube User Data

The federal court hearing Viacom’s billion-dollar copyright lawsuit against YouTube issued an order earlier this week in the discovery phase of the case. The court denied Viacom’s audacious request to require YouTube to turn over the source code that powers YouTube’s (and Google’s) search engine. But the court granted the request to compel YouTube to turn over the “logging database” that records all video viewing history information for the site — a compilation of which users watched which videos and when.

This raises privacy concerns. The logging database does not identify users by name, but it does contain users’ IP addresses and unique login IDs. A login ID is whatever the user chose, which could be anything from a nonsensical set of characters or a random word to the user’s actual name. I’d guess that in a substantial number of cases, the login ID will contain name or email information. In those cases, the login ID, perhaps aided by IP address, could be sufficient to identify the actual, real-world identity of the user. So the logging database will include identifying information for such individuals, linked to their full YouTube video viewing history.

Information about what videos individuals choose to watch can be very sensitive from a privacy perspective. And disclosing such information can chill expressive activity — after all, people may be reluctant to access certain videos if they know that everything they’ve chosen to watch could later be disclosed to third parties as part of legal disputes to which the affected individuals aren’t even party.

Moreover, as EFF’s Kurt Opsahl has already pointed out, there’s a law specifically governing disclosure of individual video viewing logs. 18 USC 2710 prohibits a video tape provider or anyone else engaged in delivery of audio visual materials “similar” to video tapes from disclosing personally identifiable data about individual renting/viewing histories. It seems to me there is a very strong argument that YouTube delivers viewing material similar to video tapes; if online videos aren’t similar to video tapes, what would be? There’s also a strong argument that records for many users with revealing login IDs would qualify as personally identifiable. The court ruling this week mentioned the video tape statute briefly, but didn’t come close to offering a satisfactory analysis of why it shouldn’t apply. And if the statute does apply, it addresses the discovery question explicitly: disclosure of video usage information can be required by a court order in a civil proceeding if there is a compelling need that can’t be accommodated some other way. In addition, the consumer must be given reasonable notice and an opportunity to contest the disclosure. The court order this week does not feature such safeguards.

It seems likely, however, that Viacom doesn’t actually need to link specific videos to specific users. Viacom’s goal, presumably — and more on that in a moment — is to try to show that providing access to infringing video is a crucial and perhaps dominant aspect of YouTube’s business. For that purpose, aggregate and/or anonymous data (i.e., not linked or linkable to actual login IDs or IP addresses) should be perfectly sufficient. Viacom shouldn’t actually need specific login IDs or IP addresses at all. Significantly, Google says it is asking Viacom to let it anonymize the data before turning it over. Hopefully Viacom will agree that anonymous data is sufficient.
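To make the aggregate option concrete, here is a minimal sketch of how viewing logs could be reduced to per-video counts before being handed over. The row layout (login ID, IP address, video ID, timestamp), the sample data, and the function names are all assumptions for illustration; nothing here reflects YouTube's actual logging schema or what Google has proposed.

```python
from collections import Counter

# Hypothetical log rows: (login_id, ip_address, video_id, timestamp).
# This layout is an assumption for illustration, not YouTube's real schema.
SAMPLE_LOG = [
    ("jane.doe", "203.0.113.7", "vidA", "2008-07-01T10:00:00"),
    ("jane.doe", "203.0.113.7", "vidB", "2008-07-01T10:05:00"),
    ("xk42", "198.51.100.9", "vidA", "2008-07-01T11:00:00"),
]


def aggregate_views(rows):
    """Count views per video, discarding login IDs, IPs, and timestamps."""
    return Counter(video_id for _login, _ip, video_id, _ts in rows)


def infringing_share(view_counts, infringing_ids):
    """Fraction of all views that went to videos in infringing_ids."""
    total = sum(view_counts.values())
    if total == 0:
        return 0.0
    infringing = sum(n for vid, n in view_counts.items() if vid in infringing_ids)
    return infringing / total


counts = aggregate_views(SAMPLE_LOG)
share = infringing_share(counts, {"vidA"})  # "vidA" is a made-up example ID
```

The output carries only video IDs and view counts, which is the kind of figure needed to argue about what share of usage is infringing, with no per-user viewing history attached.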

For those looking ahead to the merits of the case, Viacom’s request for the logging database raises the more speculative question of how Viacom may want to use the data it will get. Under the Sony Betamax case, the maker of a tool that can be used to infringe should not be secondarily liable for infringements committed by users so long as the tool has “substantial non-infringing uses.” The Supreme Court in Grokster declined to specify a quantitative threshold for “substantial non-infringing use.” But it seems clear, in YouTube’s case, that there are many people using the service for non-infringing purposes. I think there is no way the data will show that (say) 90% of YouTube’s use is infringement. So what does Viacom hope to show? I don’t know exactly what the data will reveal, but suppose that it allows Viacom to argue that 60% of the use of YouTube is infringement. I don’t see how a court could say that the remaining 40% of the huge amount of usage YouTube gets is not “substantial.” If that is the type of argument Viacom hopes to make, it will amount to an attempt to eviscerate Sony’s “substantial non-infringing use” defense.