Part 2: New Intermediary Rules in India Imperil Free Expression, Privacy and Security

This post is the second of a two-part series on the new intermediary rules in India which came into effect on May 25, 2021. It focuses on their traceability requirements and their adverse effect on end-to-end encryption and users’ rights. The first post introduced the rules and focused on how aspects of the content moderation obligations for intermediaries and the oversight mechanism for digital media, social media, and OTT platforms undermine freedom of expression.

***

The recently enacted Information Technology (Intermediary Guidelines and Digital Media Ethics Code) Rules, 2021 threaten to undermine the right to privacy and free expression in India. In addition to providing the government with the authority and means to control online speech, the Rules also require that large social media intermediaries be able to identify and disclose the “first originator” of any information they carry.

This so-called traceability requirement is incompatible with end-to-end encryption (E2EE). E2EE platforms, including some of the most widely used messaging platforms such as WhatsApp and Signal, enable users to communicate and express themselves freely with the assurance that their communications are safe from unauthorized access and can only be viewed by the sender and the intended recipients.

At the core of E2EE platforms is the prioritization of privacy, free expression, and data security. Traceability strikes at this very core.

The Traceability Requirement

The Rules impose a traceability requirement on “significant social media intermediaries” (SSMIs) – i.e., intermediaries with more than 5 million registered users, and any other intermediary the government designates as presenting a material risk of harm to state security or the sovereignty or integrity of India. Each SSMI must be able to identify and disclose the “first originator” of any information carried on its platform in response to a judicial order or a government order under Section 69 of the parent legislation, the Information Technology Act, so long as that order is for the purpose of addressing an offense related to state sovereignty, security, public order, or certain sexually-related materials. For purposes of the Rules, the first originator of a message that originates outside of India is deemed to be the first person who originates the message within India.

The Indian government has proposed two models for implementing the traceability mandate that it claims will not undermine E2EE: (1) tagging the originator’s identity information in an encrypted form with each message, as suggested in a proposal from Dr. Kamakoti of IIT Madras, and (2) maintaining a library of alpha-numeric hashes for every message, against which the hash of the message that is the subject of the government order would be compared to enable traceability.

Dr. Kamakoti’s proposal recommends two levels of encryption. Each message would be encrypted as is currently the practice. Additionally, the originator’s information would be encrypted and tagged with the message as it is forwarded. The intermediary would hold in escrow the key to decrypt the originator’s information and would use the key to reveal the originator’s information for a particular message in response to an authorized order.  

Dr. Kamakoti further suggests that users must mark a message as “forwardable” or “non-forwardable” as a means of indicating consent to assuming responsibility as an originator. If a user originates a message and marks it as “forwardable,” their information gets linked with that message. But if a sender marks a message as “non-forwardable” and the recipient nevertheless forwards it, the recipient becomes the originator and their information is linked with the message.
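The forwarding rule described above can be sketched in a few lines of Python. This is only an illustration of the logic as summarized in this post, not Dr. Kamakoti's actual design: the names `Message`, `send_new`, and `forward` are hypothetical, and the `originator` field is a plain-text stand-in for what the proposal describes as an encrypted tag that only the escrowed key could open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    content: str
    sender: str
    forwardable: bool  # the sender's choice when sending this copy
    originator: str    # stand-in for the encrypted originator tag


def send_new(sender: str, content: str, forwardable: bool) -> Message:
    """A freshly composed message: the sender is its originator."""
    return Message(content, sender, forwardable, originator=sender)


def forward(msg: Message, forwarder: str, forwardable: bool = True) -> Message:
    """Forward a message, applying the originator rule described above."""
    if msg.forwardable:
        originator = msg.originator  # the tag travels with the message
    else:
        originator = forwarder       # the forwarder assumes responsibility
    return Message(msg.content, forwarder, forwardable, originator)


# Zoe allows forwarding, so she stays the originator down the chain.
public = send_new("zoe", "hey there!", forwardable=True)
print(forward(forward(public, "jake"), "katie").originator)  # zoe

# Zoe disallows forwarding; Jake forwards anyway and becomes the originator.
private = send_new("zoe", "hey there!", forwardable=False)
print(forward(private, "jake").originator)  # jake
```

Note that the tag only follows a message through `forward`. A user who retypes the same text and calls `send_new` starts a fresh chain with themselves as originator.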

The second proposal, endorsed by Rakesh Maheshwari, senior director and group coordinator of cyberlaw and eSecurity at MeitY, would require that intermediaries assign an alpha-numeric hash to each message on their platform. Hashing is a function that maps data of any size, such as a message or a file, to a short fixed-length value. For instance, the hash value of a message that reads “hey there” might be “7cd35ejn.” The intermediary would have to maintain a library of all such hashes to assist a government agency with tracing the originator of a message that was the subject of an authorized order.
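As a rough illustration of the hash-library idea, the Python sketch below uses SHA-256 from the standard library. The proposal as reported does not name a hash function or say who would compute the hash, so the choice of SHA-256, the `HashLibrary` class, and its `record` and `trace` methods are all assumptions made for the example.

```python
import hashlib
from typing import Dict, Optional


def message_hash(content: str) -> str:
    """Hex SHA-256 digest of the message text (same length for any input)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class HashLibrary:
    """Hypothetical intermediary-side store mapping a hash to its first sender."""

    def __init__(self) -> None:
        self._first_sender: Dict[str, str] = {}

    def record(self, sender: str, content: str) -> None:
        # setdefault keeps the earliest sender recorded for a given hash.
        self._first_sender.setdefault(message_hash(content), sender)

    def trace(self, content: str) -> Optional[str]:
        """Return the first recorded sender of this exact content, if any."""
        return self._first_sender.get(message_hash(content))


library = HashLibrary()
library.record("zoe", "hey there")
library.record("jake", "hey there")  # a later copy of the same text
print(len(message_hash("hey there")))  # 64
print(library.trace("hey there"))      # zoe
```

The lookup only succeeds when the traced content is byte-for-byte identical to what was recorded, and when the hash depends on the content alone.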

Practical Impediments to Accurate Traceability

The premise of a traceability mandate is that forwarding a message is the only way in which the same content circulates on a platform. However, that is not true. For instance, if a user downloads a viral message or image and, instead of forwarding it, copies and pastes the message to send it to several others (or sends the image from their gallery), this starts a new messaging chain altogether, of which they become the originator. Alternatively, instead of using the forward icon, a WhatsApp user could share a screenshot of an image or use the share icon to send the image with an added caption. Thus, the same message is not necessarily sent in a linear fashion that can be traced back to a single originator.

As a result, the very concept of the “first originator” of a message is inherently ambiguous. For instance, suppose User A sends an image to User B. A few days later, User C obtains the same image from a Twitter handle and shares the link with Users D, E and F. The second chain then goes viral. It is unclear who would be treated as the first originator in such cases – User A or User C? There can be thousands of such chains of simultaneous communications.

The bottom line is that in practice it would be onerous, if not impossible, to discern the “first originator” of a specific message, especially without accessing the content of end-to-end encrypted messages to determine which chains are carrying the same content.  

The Indian government’s proposal to store hashes for every message would make the job of tracing the originator without accessing message content even harder. As an initial matter, the proposal rests on the faulty assumption that the hash value of a message remains the same if the content of the message remains the same. In fact, when E2EE platforms like Signal and WhatsApp generate a hash value, the unique identity of the sender and the recipient is also taken into account. Therefore, if Zoe sends “hey there!” to Jake, and Jake forwards that message to Katie, each of those exchanges will carry a different hash value. When the message from Jake to Katie is taken to the intermediary for comparison against its repository of hashes, it will not reveal Zoe’s message to Jake at all. That is because the protocol underlying services such as WhatsApp and Signal uses “forward secrecy,” a privacy-enhancing feature that essentially changes the key between two users for every message. Thus, to comply with the government’s demands, WhatsApp and Signal would have to effectively give up this feature and undermine the privacy and security of their services.
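The Zoe, Jake, and Katie example can be made concrete with a small Python sketch. It is not how WhatsApp or Signal actually derive any value; the function `bound_hash` is a hypothetical stand-in that simply mixes the sender and recipient identities into a SHA-256 digest to show why identical content stops matching once identities are part of the input.

```python
import hashlib


def bound_hash(sender: str, recipient: str, content: str) -> str:
    """Hash that mixes in both parties' identities, not just the content."""
    h = hashlib.sha256()
    for part in (sender, recipient, content):
        data = part.encode("utf-8")
        h.update(len(data).to_bytes(4, "big"))  # length prefix keeps fields distinct
        h.update(data)
    return h.hexdigest()


zoe_to_jake = bound_hash("zoe", "jake", "hey there!")
jake_to_katie = bound_hash("jake", "katie", "hey there!")

# Same text, different parties: the two values do not match, so looking up
# the Jake-to-Katie value would not lead back to Zoe's original message.
print(zoe_to_jake == jake_to_katie)  # False
```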

The government’s proposal would also be likely to prove ineffective because the hash value changes with even the most insignificant change in the content of a message. For example, the hash of “hey there!” and “Hey There!!” would not be the same. Further, thanks to “forward secrecy,” if the identical message is sent by Jake to Katie twice, each of those messages will have a different hash value. Therefore, there is no practically feasible method of tracing any message back to its originator using alpha-numeric hashes.
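Both effects in this paragraph can be demonstrated with Python's standard library. The sketch below is heavily simplified and is not the protocol used by Signal or WhatsApp: `ratchet` is a hypothetical one-way key chain built from HMAC-SHA256, and `keyed_tag` stands in for any per-message value derived under a one-time key.

```python
import hashlib
import hmac
from typing import Tuple


def sha256_hex(text: str) -> str:
    """Plain content hash, with no keys or identities involved."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def ratchet(chain_key: bytes) -> Tuple[bytes, bytes]:
    """Derive (message_key, next_chain_key) from the current chain key."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key


def keyed_tag(message_key: bytes, text: str) -> str:
    """Keyed digest of a message under a one-time message key."""
    return hmac.new(message_key, text.encode("utf-8"), hashlib.sha256).hexdigest()


# 1. A trivial edit to the text produces an unrelated hash.
print(sha256_hex("hey there!") == sha256_hex("Hey There!!"))  # False

# 2. The same text sent twice is processed under two different keys.
chain_key = hashlib.sha256(b"demo shared secret").digest()  # placeholder secret
key1, chain_key = ratchet(chain_key)
key2, chain_key = ratchet(chain_key)
print(keyed_tag(key1, "hey there!") == keyed_tag(key2, "hey there!"))  # False
```

Because each step of the chain is one-way, an old message key cannot be recomputed from a later chain key, which is the property that a stable, per-content hash would have to work around.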

Finally, WhatsApp has billions of messages circulating on its platform every day. It would be practically impossible for it to store a hash associated with every single message. Such expansive storage, for an indefinite period, would fly in the face of data minimization principles that have evolved to ensure data privacy and security. Other likely models for the implementation of traceability entail the use of digital signatures, as the government had earlier demanded, and the use of metadata. Both of these methods are highly vulnerable to impersonation and misuse by bad actors. This would result in an amplified risk of false attributions.

A Chink in the Armor of End-to-End Encryption

As experts have opined, the traceability requirement will erode the privacy expectations users have when using E2EE services. Permanently linking a user’s identity with a message jeopardizes anonymity and privacy, and causes a chilling effect on the right to freedom of expression. E2EE by design means that no one except the sender and the recipient can know which message was sent by whom to whom. With traceability, on the other hand, a sender’s message can be forwarded to hundreds of other users and the 1000th recipient – unknown to the original sender – could disclose the message with the originator’s identity tied to it. Under the proposal recommended by Dr. Kamakoti where the intermediary would keep the decryption key in escrow, the intermediary would be able to unmask the user’s identity and tie it to a particular message. That would undermine a defining element of E2EE, under which the intermediary has no means of tracking senders of messages.

And, because there would be no way to know for which message the government might demand to know the originator, a platform would have to link the identity of the originator for every message sent in India.

Although traceability concerns the identity of the original message sender, it may also lead to decryption of the content. To be sure, the traceability provision states that in complying with a traceability order, no SSMI “shall be required to disclose the contents of any electronic message.” But this provides scant comfort to those concerned about E2EE. The proviso states that the government may not require the intermediary to disclose the content of any message, not that it will not effectively require the intermediary itself to access the content. Indeed, in many cases, the government may already have the content of a message and therefore not be seeking disclosure of that content, but rather the identity of the originator. The intermediary may have no way of linking the known content to the originator without itself accessing the content of messages to identify the relevant string of messages for which it must disclose the originator. In other words, to comply with the traceability mandate and shield themselves from liability, intermediaries may be compelled, for lack of effective alternatives, to build the capability of accessing content that would otherwise have been protected by E2EE. To truly protect E2EE, the proviso would have to exempt platforms from complying with the traceability mandate whenever the only technically feasible method of compliance is to access message content.

Legally Questionable

No matter how traceability is implemented, it will either severely undermine encryption or break it, resulting in a loss of anonymity, privacy and free speech. The Indian Supreme Court laid down a necessity and proportionality test when it recognized that “the right to privacy is protected as an intrinsic part of the right to life and personal liberty under Article 21 and as a part of the freedoms guaranteed by Part III (fundamental rights) of the Constitution.” The four essential limbs of the test are: “(i) the action must be sanctioned by law; (ii) the proposed action must be necessary in a democratic society for a legitimate aim; (iii) the extent of such interference must be proportionate to the need for such interference; and (iv) there must be procedural guarantees against abuse of such interference.”

The traceability requirement may well fail every part of this test: (i) nothing in the parent statute, the Information Technology Act, contains any traceability or comparable requirement and thus it does not appear to be sanctioned by law; (ii) the requirement is not necessary for a legitimate aim since, as discussed above, it will not actually achieve its professed aim; (iii) the interference is disproportionate because it imperils the privacy and free expression rights of millions of users, and the country’s information security; and (iv) there are no material safeguards against interference for users of E2EE platforms, as evidenced by the fact that the government can order the disclosure of identity information without any form of prior judicial review.

Conclusion

Traceability is antithetical to privacy, security and free expression. Its implementation will necessarily vitiate the security of E2EE platforms and compel them to fundamentally alter their architecture. E2EE is a vital tool that enables users to communicate safely. It creates a secure space within which users can share intimate thoughts, medical and financial information, unpopular ideas, and dissenting views, all without fear of unauthorized access. Traceability would undermine that in exchange for little or no public benefit.