The Internet has enabled an enormous amount of speech to be streamed, uploaded, shared, and stored across a global network. By various estimates, more than 4 billion people are connected to the Internet, uploading over 3.2 billion images a day. On WhatsApp alone, more than 41 million messages are shared per minute, while on YouTube, users are creating more than 500 hours of video a minute.
The rate of creation of text, images, video, and audio content far outpaces any human’s ability to keep up with it all. But within this vast array of content, there are insights to be gleaned, information to learn, stories to engage with — and, sometimes, abusive and illegal activity to thwart. To navigate these vast troves of data, online service providers, state actors, and individual users have employed different forms of automation, from the simplest keyword searches and filters to today’s most sophisticated machine learning techniques.
But state-of-the-art machine learning techniques for analyzing user-generated content are still far from operating as omniscient artificial intelligence that can reliably identify who is posting hate speech, who is sharing inappropriate nudity, and who is engaged in terrorist activity. In our new report, Do You See What I See? The Capabilities and Limitations of Automated Multimedia Content Analysis, CDT explores a variety of machine learning techniques for understanding images, video, and audio media and explains what automated tools can — and cannot — tell us about digital content.
Building on CDT’s 2017 report, Mixed Messages?, the new report identifies five key limitations of automated multimedia content analysis that policymakers and tech companies must understand as they consider the role that automated content analysis plays in our information ecosystem. (Read more about these limitations in the Executive Summary of the report.) But where exactly do these limitations come into play? Below, we discuss three key policy debates that feature the role of automated content analysis.
Upload Filters and Other Uses in Content Moderation
Most obviously, the potential of automated content analysis is a central part of the debates around content moderation. Major online services employ automated content analysis to block previously identified child sexual abuse material (CSAM) and terrorist propaganda, detect potentially copyright-infringing files, identify nudity in content, and prompt users to rethink posts that may be hateful or harassing. Governments around the world are pushing these same providers to do more to “proactively detect” or “prevent” illegal and abusive content, which translates into a global push for content filtering—based on an assumption that automated analysis of this content is generally useful and reliable.
Whether and how these tools actually work is a pivotal question in discussions about online service providers’ liability for illegal content posted by their users. It also implicates providers’ obligations to moderate content in a way that respects principles of due process and their users’ human rights.
CDT has fought back against filtering mandates in their many guises, whether as an outright obligation, a quasi-voluntary “best practice”, or an inevitable outcome of short time frames for responding to removal orders from state actors. Beyond legislative proposals, many global policy fora are grappling with the potential use of, and human rights threats from, automated content analysis tools, including the UN Special Rapporteur on Freedom of Expression, the OSCE Representative on Freedom of the Media, the Council of Europe, the Freedom Online Coalition, and the Christchurch Call to Action. And human rights advocates around the world regularly call for more transparency and accountability from online services in their use of these tools.
Do You See What I See? provides an in-depth look at how different technologies used to automate content moderation actually work and what their limitations are. Readers will learn more about the hash-matching techniques that underlie tools such as PhotoDNA and the GIFCT’s shared hash database, the types of computer vision techniques that are used to evaluate video content on Twitch, and the technical challenges of doing real-time analysis of streaming audio on apps like Clubhouse.
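To make the idea of hash-matching concrete, here is a minimal sketch of a perceptual "average hash" in Python. It is not PhotoDNA or the GIFCT's system; it is a textbook toy technique, and the function names, the 4x4 grayscale grids, and the `max_distance` threshold are all assumptions chosen for illustration. It shows how a lightly altered copy of a known image can still match, while a simple transformation such as mirroring can evade this particular scheme.

```python
from typing import Iterable, List

# Hypothetical stand-in for a decoded image: rows of grayscale pixels (0-255).
Image = List[List[int]]


def average_hash(pixels: Image) -> int:
    """One bit per pixel: 1 if the pixel is brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def matches_known(pixels: Image, known_hashes: Iterable[int],
                  max_distance: int = 2) -> bool:
    """True if the image's hash is within max_distance bits of a known hash.

    max_distance is an illustrative threshold, not a value from any real tool.
    """
    h = average_hash(pixels)
    return any(hamming_distance(h, k) <= max_distance for k in known_hashes)


# A previously identified image (toy data).
original: Image = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [200, 210, 10, 20],
    [205, 215, 15, 25],
]
# The same image, uniformly brightened: every pixel shifts with the mean.
brightened: Image = [[p + 10 for p in row] for row in original]
# The same image, mirrored left to right.
mirrored: Image = [list(reversed(row)) for row in original]

known = {average_hash(original)}
print(matches_known(brightened, known))  # brightened copy still matches
print(matches_known(mirrored, known))    # mirrored copy evades this toy hash
```

The trade-off sits in the threshold: a larger `max_distance` catches more altered copies but also raises the chance of matching unrelated images.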
Ranking and Recommendation Algorithms
It’s also important to understand the role of automated content analysis in the operation of ranking and recommendation algorithms. These algorithms have an enormous influence on our information environment: search engines answer people’s queries with apparently relevant information, social media feeds display the posts they think users are most likely to engage with, and recommendation systems prompt people to keep exploring related content. These algorithms raise major public policy concerns, including the risks of spreading mis- and disinformation, amplifying hateful views, and exacerbating the likelihood of offline violence by exposing users to increasingly radical perspectives. In response, the Council of Europe is developing the legal framework for an international treaty that would govern (among other things) the use of ranking and recommendation algorithms, the European Union’s Digital Services Act will address recommender systems, and legislators in the U.S. have introduced a bill focused on the role of amplification algorithms in civil rights violations and international terrorism.
Throughout most of these policy debates, however, we confront the black-box problem: it is unclear how most of these ranking and recommendation algorithms work, or what kinds of input they take into account. They typically rely on analysis of metadata about content and user behavior; from its earliest days, Google’s PageRank search algorithm, for example, assessed metadata such as the number of pages on the web that linked to a given URL, as a way of evaluating the page’s relevance. But it seems intuitive to many users that, for example, YouTube’s recommendation algorithm bases its suggestions not only on the behavior of users moving from video to video on its site, but also on some analysis of the content itself, in order to determine which videos are similar to one another.
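As a rough illustration of ranking on link metadata alone, the sketch below implements the simplified, textbook form of PageRank by repeated iteration. It is not Google's production system; the tiny link graph, the function names, the damping factor of 0.85, and the iteration count are assumptions made for the example. Note that the code never looks at what any page says, only at which pages link to which.

```python
from typing import Dict, List

# Hypothetical link graph: each page maps to the pages it links to.
LinkGraph = Dict[str, List[str]]


def all_pages(links: LinkGraph) -> List[str]:
    """Every page that appears as a link source or a link target."""
    return sorted(set(links) | {t for targets in links.values() for t in targets})


def inbound_link_counts(links: LinkGraph) -> Dict[str, int]:
    """The simplest link metadata: how many links point at each page."""
    counts = {page: 0 for page in all_pages(links)}
    for targets in links.values():
        for target in targets:
            counts[target] += 1
    return counts


def pagerank(links: LinkGraph, damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Simplified PageRank: each page splits its score among its outlinks."""
    pages = all_pages(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Score held by pages with no outbound links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n
                    for p in pages}
        for source, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


links: LinkGraph = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
counts = inbound_link_counts(links)
ranks = pagerank(links)
print(counts)
print(max(ranks, key=ranks.get))  # the page with the highest score
```

In this toy graph, page "a" receives only one inbound link yet ends up with a high score because that link comes from the highly ranked page "c", which is why raw link counts and PageRank scores can differ.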
Do You See What I See? provides policymakers, advocates, and others interested in transparency and accountability of ranking and recommendation algorithms with a better understanding of what types of content analysis techniques might inform these algorithms. Readers will be able to ask better questions about how these systems work, identify the potential points of failure, and advocate for safeguards that respond to some of the known limitations of these analysis techniques.
Social Media Monitoring
Online service providers aren’t the only ones using automated content analysis tools. Immigration officials, law enforcement, and other state actors sometimes engage in what is known as “social media monitoring,” or surveillance of individuals’ social media content. Such programs often involve collecting massive amounts of people’s public social media content and using various automated analysis tools to examine it. These officials attempt to use automated analysis tools to identify individuals who pose a security threat, exhibit evidence of criminal activity, or engage in other behavior officials deem “risky”.
In the U.S., the Department of Homeland Security (DHS) began screening social media accounts of some visitors as early as 2015. In subsequent years, DHS and the State Department began collecting social media identifiers from Visa Waiver Program applicants and on immigrant and non-immigrant visa applications from many countries, so that immigration officials could review their social media content. CDT and many others warned that DHS’s program would produce a glut of data that would likely lead immigration officials to draw inaccurate inferences and conclusions about visa applicants. Moreover, such surveillance programs will have a chilling effect on people’s social media activity and freedom of expression and association.
The Biden administration has taken some important steps towards reining in the indiscriminate collection of social media information in immigration contexts, and is undertaking a review of the use of social media identifiers in visa screening and vetting. However, there is a renewed push across DHS, the FBI, and other federal agencies (including the U.S. Postal Service), especially in the wake of the January 6th attack on the U.S. Capitol, to increase general surveillance of social media content. While DHS has said that it is currently using human analysts, not automated systems, to parse this data, this remains an area of opaque government policy with insufficient safeguards.
Determinations about immigration status and scrutiny from law enforcement can be a matter of life and death for many individuals. What’s more, social media information does not present the whole picture of an individual’s life or activities. In Do You See What I See?, we explain how various technical limitations of different automated content analysis techniques can yield disparate consequences across racial, ethnic, linguistic, and cultural lines. There is little useful intelligence for state actors to gain from social media monitoring, and so much for individuals to lose.
Beyond these issues, many other policy debates involve automated content analysis in some way: facial recognition; assessment of students’ educational progress; determinations about hiring, access to benefits or creditworthiness; online harassment; and more. As content analysis techniques get integrated into a variety of systems throughout our societies, their role may not be as obvious, and some of the techniques we feature in our report are bound to see improvements as research into computer vision, perceptual hashing, and other machine learning techniques continues.
No matter the advances in the technology, however, it remains essential for policymakers, companies, and the general public to understand that these technologies operate differently in a variety of real-world contexts, with consequences that are often disproportionately borne by those in our society who are already vulnerable or disadvantaged. We must continue to grapple with the fact that humans’ perspectives on the meaning of online content will differ, and that when we ask each other, “Do You See What I See?”, the answer is often “No.” No amount of sophistication in machine learning technology can resolve policy questions on which humans disagree. Automated content analysis can be a useful tool, but it will never be a complete answer, and we must be watchful for its risks to human rights.
Read the full “Do You See What I See?” report here.
Take a look at the infographic detailing the “Five Limitations of Automated Multimedia Content Analysis.”