Moderating Maghrebi Arabic Content on Social Media
Introduction
People express their deepest desires, emotions, and imaginative ideas through language. However, Global North languages, particularly English, dominate knowledge-sharing and technology. This has led to the marginalization of, and inadequate support for, Global South languages in digital spaces, particularly in the realm of content moderation. Content moderation involves governance mechanisms that determine participation in a certain community and control what is seen and what is valued online (Grimmelmann, 2015). By removing or reducing harmful content, content moderation helps ensure positive user experiences, safeguards social media platforms’ brand reputation, enables compliance with legal requirements, and increases advertising revenues (Roberts, 2018). Content moderation initially relied on human moderators, but the vast scale of content posted online has made it necessary to integrate automated moderation systems. These systems, however, often struggle with Global South languages because their training data is inadequate and unrepresentative and because they lack an understanding of the cultural nuances that inform the meaning of language.
In this report, part of a CDT series investigating content moderation biases and disparities in the Global South (Elswah, 2024), we examine the challenges and implications of moderating content in the Maghrebi Arabic dialects of North Africa, which remain under-examined in the content moderation literature. Using a mixed-method approach combining interviews, focus groups, and online surveys, we found that:
- Most US-based social media companies utilize a global content moderation strategy that applies the same policies worldwide, whereas TikTok adopts a localized approach with many of its policies tailored to each region, particularly regarding cultural matters.
- Maghrebi Arabic users employ tactics such as “algospeak” to evade moderation algorithms, because they believe they are being censored for political reasons. Many of them compensate for ineffective reporting mechanisms by using mass reporting to remove harassing content.
- The lack of diversity in natural language processing teams that develop automated content moderation systems at social media companies, combined with insufficient training datasets for Maghrebi Arabic dialects and the recruitment of non-native annotators, negatively impacts the accuracy of the automated content moderation process.
- Content moderators, who work under harsh conditions, are assigned content from any country in the Arab world and are expected to make decisions despite the challenging cultural and linguistic nuances that vary across the region. This can lead to errors in content moderation.