CDT and many other civil society groups that work to protect human rights online have long advocated against proposals to undermine encryption. We believe the availability of secure encrypted communication services is central to privacy, free expression, and the security of today’s online commerce. While the encryption debate has largely centered on questions about whether law enforcement should be able to access encrypted content, a new front in the encryption wars has recently opened: how providers of end-to-end encrypted communications services can and should moderate unwanted content, including unlawful content such as Child Sexual Abuse Material (CSAM).
Unfortunately, Apple’s announcement of changes to its Messages service makes what we have feared a reality: it creates a backdoor that breaks end-to-end encryption (E2EE) in order to address the sharing of potentially unwanted content with children. Although Apple’s goal of protecting children online is laudable, it has taken the wrong approach, falsely claiming that its Messages app will remain end-to-end encrypted after the proposed changes and incorrectly suggesting that these changes are privacy-protective ways to address abusive content on E2EE services. Apple’s announcement has put two questions in the spotlight: what exactly is an E2EE service, and how should content be moderated on such services?
In a new CDT report, Outside Looking In: Approaches to Content Moderation in End-to-End Encrypted Systems, we address both of these questions. First, we clarify what exactly E2EE means and what we should expect when a provider claims its service is E2EE. More specifically, a system, service, or app is end-to-end encrypted only if the keys used to encrypt and decrypt data are known only to the sender and the authorized recipients of that data. Moreover, in the case of an encrypted message exchange, only authenticated participants in the conversation can access the message content, and no third party has access to it. The importance of encryption to the privacy and security of our daily lives means that being precise about what E2EE means is critical, given attempts by governments and companies alike to introduce rules or features that undermine encryption.
Second, we look at the most viable current approaches to content moderation on E2EE services. To do that, we start by explaining the main phases of online content moderation: defining what user-generated content is permissible on a service, detecting content that may violate a host’s policies, evaluating whether that content is permissible or illegal, enforcing certain actions against the content, providing a process for appeals of such enforcement actions, and educating users about the service’s policies on content moderation. Most of the existing research and thinking about content moderation has focused on plaintext (non-encrypted) settings. To expand the conversation to E2EE settings, we examined several technical proposals that purport to introduce some form of content moderation (specifically, detection of unwanted content) while still providing an E2EE service.
Here is what we found:
- What works → methods that enable detection of unwanted or abusive content and still preserve E2EE.
- User-reporting: This refers to tools (e.g., reporting buttons or complaint forms) that allow users to alert moderators, other intermediaries, or other users to unwanted content. One technique, message franking, creates a means for the service provider to verify a reported message. Significantly, no third party obtains access to the message content without the knowledge and approval of at least one of the participants.
- Metadata analysis: For E2EE communications, metadata, or “data about data,” can include unencrypted profile or group chat information, the frequency of sending messages, or reports from other users about problematic content. As long as the metadata analysis occurs exclusively on a user’s device and does not result in storage, use, or sending of decrypted messages, the user’s privacy is preserved and the guarantees of end-to-end encryption are not violated.
It’s also important to note that both user-reporting and metadata analysis are effective tools for detecting significant amounts and many different types of problematic content on E2EE services, including abusive and harassing messages, spam, mis- and disinformation, and CSAM.
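The message-franking idea mentioned above can be sketched with a simple HMAC commitment. This is a minimal illustration under assumed names, not any provider’s actual scheme; deployed designs (such as Facebook’s) also include a server countersignature over the commitment at delivery time.

```python
import hashlib
import hmac
import os

def frank_message(plaintext: bytes) -> tuple[bytes, bytes]:
    """Sender side: commit to the plaintext with a fresh one-time key.

    The franking key travels to the recipient inside the E2EE payload;
    the commitment travels alongside the ciphertext, so the provider can
    bind it to the message without ever seeing the plaintext.
    """
    franking_key = os.urandom(32)  # random per-message key
    commitment = hmac.new(franking_key, plaintext, hashlib.sha256).digest()
    return franking_key, commitment

def verify_report(plaintext: bytes, franking_key: bytes,
                  commitment: bytes) -> bool:
    """Provider side: a recipient who reports a message reveals the
    plaintext and franking key; the provider recomputes the HMAC and
    checks it against the commitment it saw at delivery time."""
    expected = hmac.new(franking_key, plaintext, hashlib.sha256).digest()
    return hmac.compare_digest(expected, commitment)
```

Until one participant chooses to report, the provider holds only an opaque commitment, which is why this approach is consistent with the E2EE guarantees described above.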
- What does not work → methods that claim to maintain E2EE but in reality do not.
- Traceability: This refers to methods that aim to identify all users (or just the originating user) who have shared content on an E2EE service that has been flagged as problematic. In effect, traceability can give a third party access to information about the original sender of a message and/or all prior recipients in an expanded forwarding chain without their consent, and it is therefore inconsistent with the privacy guarantees of E2EE.
- Perceptual hashing (server-side or client-side scanning): This approach matches hashes, or digital fingerprints, of user-generated content against hashes of content that the host has previously determined is unwanted or abusive. The matching can be done either by the service provider when it receives content, which is called server-side scanning, or by the app on the user’s device, which is called client-side scanning. Server-side scanning provides the service provider with access to information about the communication, a violation of the privacy guarantee of E2EE. Client-side scanning may not violate the privacy guarantees we expect from an encrypted conversation if the results of the hash comparison are only provided to the user; however, if the results of the hash comparison are shared with the service provider or any other third party, then those privacy guarantees have been violated. That said, even proposals for client-side scanning that involve no third-party access introduce significant security vulnerabilities into the system.
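To make the mechanics concrete, here is a toy sketch of perceptual hash matching using a simple “average hash” over grayscale pixel values. Real systems use far more robust hashes (e.g., PhotoDNA, whose details are not public), and the threshold and function names here are illustrative assumptions. Note that the matching code is identical for server-side and client-side scanning; what changes the privacy analysis is where it runs and who learns the result.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Tiny perceptual hash: one bit per pixel, set when the pixel is
    brighter than the image's mean brightness. Small changes to the
    image flip few bits, so similar images produce similar hashes."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def matches_blocklist(pixels: list[list[int]],
                      blocklist_hashes: list[int],
                      threshold: int = 3) -> bool:
    """Scan one image against a list of known-bad hashes. Whether this
    runs on the provider's server or the user's device, and who sees
    the boolean result, is what determines the privacy impact."""
    h = average_hash(pixels)
    return any(hamming_distance(h, bad) <= threshold
               for bad in blocklist_hashes)
```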
- What could work but needs more research:
- Machine-learning (ML) classifiers that run on the user’s device and are trained to detect unwanted content on the user’s behalf, such as enabling users to avoid receiving sexting images. If this process occurs exclusively on a user’s device, at the user’s instruction, and no information about the message is disclosed to a third party, then the guarantees of end-to-end encryption may not be violated. Apple, by contrast, did not take this approach in Messages, announcing instead that “parents” on family accounts will receive notice when a child under age 13 sends or receives an image classified as sexually explicit. As a result, Apple’s approach undermines the security of E2EE and opens the door to further expansion and repurposing of the new features. More research is needed to develop ML classifiers that can be effective in detecting specific types of content while also protecting the privacy guarantee of E2EE.
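The privacy-preserving variant described above, where detection happens locally at the user’s request and the verdict never leaves the device, might look like the following sketch. The classifier is a stand-in for any locally run ML model, and the threshold and return shape are assumptions for illustration.

```python
def handle_incoming_image(image_bytes: bytes, classify,
                          user_opted_in: bool, threshold: float = 0.8) -> dict:
    """On-device filtering consistent with E2EE guarantees.

    `classify` is a locally-run model returning an estimated probability
    that the image is unwanted. Its verdict is used only to blur the
    image for this user; neither the score nor the decision is reported
    to the provider or any other third party.
    """
    if not user_opted_in:
        # The filter runs only at the user's instruction.
        return {"deliver": True, "blurred": False}
    score = classify(image_bytes)
    # The user can still choose to view a blurred image; the decision,
    # like the score, stays on the device.
    return {"deliver": True, "blurred": score >= threshold}
```

The contrast with Apple’s design is the destination of the verdict: here it informs only the recipient’s own screen, whereas notifying another account holder discloses information about the encrypted conversation to a third party.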
There is, of course, room for improvement when it comes to understanding content moderation on E2EE services. We call for more research into approaches that emphasize user agency. These include techniques based on the use of metadata and methods that act within the confines of a messaging app on a user’s device to empower the user to flag, hide, or otherwise report unwanted content to the service provider.