Automated “Extreme Vetting” Won’t Work and Will Be Discriminatory
Written by Natasha Duarte
Today, CDT joined 55 civil society groups, as well as leading computer and data science experts, to oppose the Department of Homeland Security’s (DHS) automated extreme vetting initiative. Immigration & Customs Enforcement (ICE) plans to use automated technology and social media data to decide who gets deported or denied entry to the United States. This initiative is not only discriminatory but also technically infeasible.
ICE is seeking a contractor to automate parts of the administration’s “extreme vetting” process, which will involve analyzing people’s social media posts and other online speech, including academic websites, blogs, and news websites. The stated goal of this vetting is to “evaluate an applicant’s probability of becoming a positively contributing member of society as well as their ability to contribute to the national interests” and to “assess whether an applicant intends to commit criminal or terrorist acts after entering the United States” (language from the January 27th executive order known as the original “Muslim ban”). ICE intends to award the contract for this technology by September 2018.
Existing technology is not capable of making these determinations. Indeed, the concept of “becoming a positively contributing member of society” is amorphous and inherently vulnerable to biased interpretation and decision-making. Using automated analyses of social media posts and other online content to make immigration and deportation decisions would be ineffective and discriminatory, and it would chill free speech.
Even state-of-the-art tools for performing automated analysis of text cannot make nuanced determinations about its meaning or the intent of the speaker. Machine-learning models must be trained to identify certain types of content by learning from examples selected and labelled by humans. That means the humans training the model have to know what they’re looking for, and be able to define it. But there is no definition (in law or in publicly available records) of what makes someone likely to become “a positively contributing member of society” or to “contribute to the national interests.” Even humans would be hard-pressed to make these determinations, and automated technology is far behind humans when it comes to understanding the meaning of language.
Instead, automated tools are likely to rely on proxies, such as whether a post is negative toward the United States. Even for this type of analysis, existing methods are inaccurate. When it comes to determining whether a social media post is positive, negative, or neutral, even the highest performing tools only reach about 70% to 80% accuracy (measured against human analyses). The government should not use predictive tools that are wrong 20% to 30% of the time to make decisions restricting people’s liberty or speech.
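To make the scale of that error rate concrete, here is a minimal sketch of the arithmetic. The volume of 1,000 posts is a hypothetical figure chosen only for illustration, and the function name `expected_errors` is ours; only the 70% and 80% accuracy figures come from the text above.

```python
def expected_errors(num_posts: int, accuracy_pct: int) -> int:
    """Expected number of misclassified posts at a given accuracy (in percent)."""
    if not 0 <= accuracy_pct <= 100:
        raise ValueError("accuracy_pct must be between 0 and 100")
    # Error rate is the complement of accuracy; integer math avoids float rounding.
    return num_posts * (100 - accuracy_pct) // 100


# Hypothetical volume, assumed for illustration only (not a figure from DHS or ICE).
NUM_POSTS = 1_000

for accuracy_pct in (70, 80):
    wrong = expected_errors(NUM_POSTS, accuracy_pct)
    print(f"{accuracy_pct}% accuracy: about {wrong} of {NUM_POSTS} posts misclassified")
```

Under these assumptions, between 200 and 300 of every 1,000 posts analyzed would be labeled incorrectly, and each of those errors could feed into a decision about a real person.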
Automation will likely amplify the discriminatory impacts of DHS’s extreme vetting plan. Machine-learning models reflect the biases in their training data, and research has shown that popular tools for processing text and images can amplify gender and racial bias. For example, one study found that popular language processing tools had difficulty recognizing that tweets using African American Vernacular English (AAVE) were, in fact, English. One tool classified AAVE examples as Danish with 99.9% confidence.
Many available tools for processing text are only trained to recognize English, and few of them are trained to process languages that are not well represented on the internet (about 80% of online content is available in one of only 10 languages: English, Chinese, Spanish, Japanese, Arabic, Portuguese, German, French, Russian, and Korean). For other languages, automated analysis will likely have disproportionately low accuracy. This is a major weakness for immigration and customs uses, where inability to accurately process different languages could jeopardize civil and human rights.
A recent episode in the West Bank shows the peril of relying on machine-learning models to make law enforcement and immigration decisions. Israeli police held and questioned a Palestinian man on the basis of an incorrect machine translation of his Facebook post. The post, which in fact said “good morning” in Arabic, was translated as “attack them” in Hebrew. Law enforcement officials arrested the man without ever seeking confirmation of the translation from a fluent Arabic speaker.
As a minimum requirement, when the government relies on technology to make critical decisions affecting liberty interests, that technology should work well. However, the system ICE intends to build will likely evade any effective validation methods, since its stated goal is to predict highly subjective and undefined concepts – such as whether someone will positively contribute to society – that do not lend themselves to objective tests. DHS won’t be able to prove whether its predictive models work, leaving the public and Congress without effective means of holding the agency accountable. Indeed, the Office of the Inspector General has critiqued DHS’s existing pilot programs for using social media information in screening immigration applications, finding that DHS failed to design its pilot programs in a way that would enable the agency to measure whether the programs were working.
DHS intends to award its automated extreme vetting contract within the next year, so now is the time for industry to stand up for equality and technical integrity. Companies and researchers cannot allow their technology to be misused in ways that abuse civil and human rights and undermine government accountability.