Free Expression, Government Surveillance

Automated Tools for Social Media Monitoring Irrevocably Chill Millions of Noncitizens’ Expression

Last week, USCIS stated its plans to routinely screen applicants’ social media activity for alleged antisemitism when making immigration decisions in millions of cases, and announced that it is scouring the social media accounts of foreign students for speech that it deems potential grounds to revoke their legal status. Simultaneously, the Department of State has started using AI to enforce its “Catch and Revoke” policy and weed out “pro-Hamas” views among visa-holders, including students who have protested against Israel’s war in Gaza.

This isn’t USCIS’s first time conducting some form of social media monitoring; in fact, its first foray into social media data collection was in 2014. But it is the first time the government has used a previously obscure provision of immigration law to target a large group of noncitizens for removal based on their political opinions and activism that the Secretary of State has determined could have “potentially serious adverse foreign policy consequences.” The current Administration’s broad definitions of speech that could lead to visa revocation or application denial, and the questionable constitutionality of making immigration decisions based on viewpoint, raise concerns that will only be exacerbated by the use of flawed, error-prone social media monitoring technologies.

The American immigration system already subjects applicants to disproportionate invasions of privacy and surveillance, some applicants more than others. In the current Administration, immigration enforcement has been particularly aggressive and gone beyond the bounds of previous enforcement efforts, with agents bringing deportation proceedings against applicants on valid visas on the basis of their legally-protected speech, including authorship of op-eds, participation in protests, and, according to a real albeit now-deleted social media post by the Immigration and Customs Enforcement agency, their ideas. Noncitizens have long been aware of the government’s surveillance of their speech and their social media activity, which has deterred them from accessing essential services and speaking freely on a wide range of topics, including their experience with immigration authorities, labor conditions in their workplace, or even domestic violence.

What is happening now, however, is an unprecedented and calculated effort by the U.S. government to conduct surveillance of public speech and use the results to target for removal those who disagree with government policy. At the time of writing, over 1,000 student visas have been revoked according to the State Department, some of which have been for participation in First Amendment-protected activities. For example, one post-doctoral student at Georgetown reportedly had his visa revoked for posting in support of Palestine on social media, posts that were characterized as “spreading Hamas propaganda” by a DHS spokesperson. In a high-profile case from earlier this year, the former President of Costa Rica received an email from the U.S. government revoking his visa to the United States a few weeks after he criticized the government on social media, saying, “It has never been easy for a small country to disagree with the U.S. government, and even less so, when its president behaves like a Roman emperor, telling the rest of the world what to do.” All signs indicate that disagreement with this Administration’s viewpoints could lead to negative consequences for noncitizens seeking to enter or remain in this country in any capacity.

This expansion of ideological targeting is cast against the backdrop of an immigration system that faces, at times, a Sisyphean backlog of applications and insufficient oversight of enforcement decisions, which are only growing in this political climate. Mistakes are routinely made, and they have devastating consequences. To the extent oversight bodies did exist, such as the Department of Homeland Security’s Office for Civil Rights and Civil Liberties, they have been shuttered or undermined, which will make it all the more difficult to identify and fix errors and failures to provide due process.

Applicants have little recourse to seek remedy or appeal mistakes when they are made, instead having to choose among cautious over-compliance in the form of silence, potential retaliation, or self-deportation to avoid it all. Increased social media surveillance of noncitizens against this backdrop will compound existing inequities within the system, and will almost certainly further chill noncitizens’ ability to speak and participate freely in society for fear of running afoul of the Administration.

And that’s all before accounting for the problems with the tools that the government will use to conduct this monitoring. The automated tools used for this type of social media surveillance are likely to be based on keyword filters and machine learning models, including large language models like those that underlie chatbots such as ChatGPT. These tools are subject to various flaws and limitations that will exacerbate the deprivation of individuals’ fundamental rights to free expression and due process. This litany of problems with automated social media analysis is so pronounced that DHS opted against using such a system during the first Trump administration. DHS’s concerns about erroneous enforcement and deportations may have disappeared, but the risks from this technology have not.

First, models may be trained with a particular bias. Social media monitoring systems are generally trained on selected keywords and data easily found on the web, such as data scraped from Reddit, Wikipedia, and other largely open-access sources, which over-index on the views and perspectives of a few. Keywords may be added to the training corpus to fit the domain of use, such as offering examples of what constitutes “antisemitism” or threats to national security. Should the training data over-represent a particular set of views or designations of “foreign terrorists,” the model may flag speech by some individuals more than others. The Administration’s over-capacious definition of the term “antisemitic” may be weaponized during the training of these social media monitoring models, subjecting to greater scrutiny anyone who has engaged in speech with which the Administration disagrees on topics such as Israel-Palestine or campus protests related to military actions against Gaza, even where the speech is protected by the First Amendment.

Second, and relatedly, these prescriptive tools struggle to parse context. While keyword filters and machine learning models may be able to identify words or phrases they’ve been tasked to detect, they are unable to parse the context in which the term is used, including such essential human expressions as humor, sarcasm, irony, and reclaimed language. We’ve written previously about how the use of automated content analysis tools by Facebook to enforce its Dangerous Organizations & Individuals policy erroneously flagged and took down all posts containing the word “shaheed” (which means martyr in Arabic), even when an individual was named Shaheed or in contexts where individuals were not using the term in a way that glorified or approved of violence. Noncitizen journalists who cover protests or federal policy and post their articles on social media may also be flagged and surveilled simply for doing their job. People named Isis have long been caught up in the fray and flagged by these automated technologies. Posts by individuals citing the “soup nazi” episode of Seinfeld may also be swept up in this analysis. Models’ inability to parse context will also limit their ability to conduct predictive analysis. Vendors procured by USCIS to conduct social media monitoring assert that they use AI to scan for “risky keywords” and identify persons of interest, but promises of predictive analysis likely rest on untested and discriminatory assumptions and burden the fundamental rights of all individuals swept up by these social media monitoring tools.
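To make the context problem concrete, here is a minimal sketch of a naive keyword filter. It is purely illustrative and is not a description of any tool actually used by USCIS or its vendors; the watchlist, the `flag_post` function, and the sample posts are all hypothetical, built from the examples mentioned above.

```python
import re

# Hypothetical watchlist, assembled from the examples in the text above.
WATCHLIST = ["shaheed", "isis", "nazi"]


def flag_post(text, watchlist=WATCHLIST):
    """Return the watchlist terms that appear as whole words in the text.

    Matching is case-insensitive and looks only at individual words,
    so the filter has no notion of who is speaking or why.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [term for term in watchlist if term in words]


# Hypothetical posts: the first three are benign, yet each contains a listed term.
posts = [
    "Happy birthday to my brother Shaheed!",
    "My friend Isis just passed her bar exam.",
    "Rewatching the soup nazi episode of Seinfeld tonight.",
    "Covering today's campus protest for the student paper.",
]

for post in posts:
    print(flag_post(post), "<-", post)
```

In this sketch, a birthday greeting, a congratulatory note, and a sitcom reference are all flagged, because the filter sees only the matching word and nothing of the surrounding meaning. Adding more keywords widens the net without making it any better at telling a name or a joke from a threat.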

Finally, the systems will be especially error-prone in multilingual settings. New multilingual language models purport to work better in more languages, yet are still trained primarily on English-language data, some machine-translated non-English data, and other available documents, often religious or governmental, all of which are imperfect proxies for how individuals speak their languages online. Multilingual training data for models is likely to underinclude terms frequently used by native speakers, including spoken regional dialects, slang, code-mixed terms, and “algospeak.” As a result, most models are unable to parse the more informal ways people have of speaking online, leading to erroneous outcomes when models analyze non-English language speech.

There have already been countless instances where digital translation technologies have been used by U.S. immigration enforcement agencies in problematic ways, which have prevented individuals from accessing a fair process and even safety. For example, an automated translation tool resulted in an individual erroneously being denied asylum because it misunderstood that she was seeking safety from parental abuse, literally translating that her perpetrator “el jefe” was her boss rather than her father. An individual from Brazil was detained for six months because of an incomplete asylum application, because the translation tool ICE used translated “Belo Horizonte” literally to “beautiful horizon” instead of identifying it as a city in which the applicant had lived. Another automated system used to conduct content analysis mistranslated “good morning” in Arabic to “attack them.” Widespread use of these error-prone systems to detect disfavored ideas will only exacerbate the discriminatory treatment of those who speak English as a second language.

Ultimately, the adoption of automated technologies to scan social media data will punish people for engaging in legal speech and result in more errors in an already flawed system. It will also chill the speech of millions of people in this country and abroad, impoverishing the global conversations that happen online. An applicant seeking to adjust their status or become a U.S. citizen, or even a U.S. citizen seeking to communicate with a noncitizen, will reasonably think twice before speaking freely or engaging in constitutionally-protected activities like protesting, simply because of the specter of social media surveillance. They already are.