Governments and companies are turning to automated tools to make sense of what people post on social media, for tasks ranging from hate speech detection to law enforcement investigations. Policymakers routinely call for social media companies to identify and take down hate speech, terrorist propaganda, harassment, “fake news” or disinformation, and other forms of problematic speech. Other policy proposals have focused on mining social media to inform law enforcement and immigration decisions. But these proposals wrongly assume that automated technology can accomplish on a large scale the kind of nuanced analysis that humans can accomplish on a small scale.
Today’s tools for automating social media content analysis have limited ability to parse the nuanced meaning of human communication, or to detect the intent or motivation of the speaker. Policymakers must understand these limitations before endorsing or adopting automated content analysis tools. Without proper safeguards, these tools can facilitate overbroad censorship and biased enforcement of laws and of platforms’ terms of service.
This paper explains the capabilities and limitations of tools for analyzing the text of social media posts and other online content. It is intended to help policymakers understand and evaluate available tools and the potential consequences of using them to carry out government policies. This paper focuses specifically on the use of natural language processing (NLP) tools for analyzing the text of social media posts. We explain five limitations of these tools that caution against relying on them to make critical determinations, such as who gets to speak or who is admitted into the country. This paper concludes with recommendations for policymakers and developers, including a set of questions to guide policymakers’ evaluation of available tools.
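To make the limitation concrete, consider a deliberately simple sketch of the kind of shallow, keyword-based filtering that automated moderation can reduce to. The blocklist and example posts below are hypothetical, invented only for illustration; they are not drawn from any real system discussed in this paper.

```python
# Hypothetical sketch: a keyword-based filter that flags posts without
# regard to context, speaker, or intent.

BLOCKLIST = {"vermin", "subhuman"}  # hypothetical blocklist for illustration

def flag(post: str) -> bool:
    """Flag a post if any blocklisted word appears, ignoring all context."""
    words = {w.strip(".,!?\"'").lower() for w in post.split()}
    return bool(words & BLOCKLIST)

# A post reporting abuse by quoting it gets flagged (a false positive) ...
print(flag('He called us "vermin". Report this account.'))   # True
# ... while hostile speech using no blocklisted word slips through
# (a false negative).
print(flag("Those people do not deserve to live among us."))  # False
```

The filter cannot distinguish a slur from a quotation of that slur in counter-speech, and it misses abuse phrased in novel or coded language. This is the gap between keyword matching and the nuanced human judgment the paper describes.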