New CDT Research Report Highlights Limits of Automated Content Analysis

A new study by the Center for Democracy & Technology (CDT) finds that even state-of-the-art machine learning techniques for analyzing user-generated multimedia content have key limitations that create human rights risks when deployed at scale.

The report, Do You See What I See? The Capabilities and Limitations of Automated Multimedia Content Analysis, explores a variety of machine learning techniques for understanding images, video, and audio media and explains what automated tools can—and cannot—tell us about digital content.

Dhanaraj Thakur, CDT Research Director and co-author of the study, says:

“Companies and governments alike are turning to automated content analysis tools to navigate the explosion of user-generated content online. This report aims to explain the machine learning techniques used in content analysis as well as their limitations.

We look at matching models, which can help detect multimedia content that has already been identified as problematic. We also delve into a variety of computer vision and audition techniques to help explain what might be going on when a service like YouTube, Twitch, or Clubhouse is filtering content on the fly.

The report also discusses five key limitations of automated multimedia content analysis that policymakers and tech companies must understand as they consider the role that automated content analysis plays in our information ecosystem. What works well in the lab may be easy to circumvent in the real world, and there is a lot of hype around the alleged ‘accuracy’ of tools, without enough effort to explain the predictions machine-learning algorithms are making.”
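The “matching models” Thakur describes typically work by computing a compact perceptual hash of content that has already been flagged, then comparing new uploads against that hash. As a minimal illustration (not drawn from the report), the Python sketch below implements a simple average hash using the Pillow imaging library; the file paths and the distance threshold are hypothetical placeholders.

# Illustrative sketch of a perceptual-hash matching model, assuming Pillow.
# Similar images produce hashes that differ in only a few bits, so a small
# Hamming distance suggests a near-duplicate of previously flagged content.
from PIL import Image

def average_hash(path, size=8):
    # Downscale to a size x size grayscale thumbnail, then set each bit
    # according to whether that pixel is brighter than the mean pixel value.
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming_distance(h1, h2):
    # Count the bits on which the two hashes disagree.
    return bin(h1 ^ h2).count("1")

known_bad = average_hash("flagged_image.jpg")  # hypothetical path
upload = average_hash("new_upload.jpg")        # hypothetical path
if hamming_distance(known_bad, upload) <= 5:   # threshold is a tunable guess
    print("Possible match with previously flagged content")

Production systems rely on more robust hashing schemes, such as Microsoft’s PhotoDNA, but the underlying matching logic is conceptually similar: known content is fingerprinted once, and new content is screened against the fingerprint database.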

Emma Llansó, Director of CDT’s Free Expression Project and co-author of the study, says:

“Automated content analysis is everywhere, shaping what information we see and who gets an opportunity to speak. Government officials are also increasingly incorporating it into everything from immigration decisions to assessments of students’ behavior and academic performance.

In policy debates, it’s too easy to treat machine learning or artificial intelligence like a magic wand that will solve intractable problems. There’s a lot to be excited about in machine-learning research, and a number of beneficial uses of these technologies. But automated content analysis does not create solutions where humans disagree, and unthinking adoption of these tools poses severe human rights risks.”

Read the executive summary, full report, and analysis.