As language models become embedded in more aspects of our social and technical systems, their limitations and biases will have larger ramifications for society at large.
One such limitation is how well language models work in languages other than English. A recent CDT report, Lost in Translation: Large Language Models in Non-English Languages, describes in detail the limits of large language models' performance in languages other than English, not just in generating content but in analyzing it as well.
To help address this problem, we submitted comments to the National Science Foundation's new Directorate for Technology, Innovation, and Partnerships (TIP), recommending how it could invest in use-inspired research to build training and test datasets in non-English languages. In particular, we urge TIP to invest in languages with limited data available, to make the development of language models more equitable across languages.
In the comments, we explain why language models work better in English and a handful of other "high-resource" languages than in other languages, what effects that gap has, why other actors are unlikely to close the gap on their own, and how TIP can help.
Read the full comments here.