You’ll Probably Be Protected: Explaining Differential Privacy Guarantees
By Rachel Cummings, Associate Professor of Industrial Engineering and Operations Research, Columbia University, and Priyanka Nanayakkara, Postdoctoral Fellow, Harvard Center for Research on Computation and Society
Disclaimer: The views expressed by CDT’s Non-Resident Fellows are their own and do not necessarily reflect the policy, position, or views of CDT.
Data collection is ubiquitous. Data are useful for a variety of purposes, from supporting research to helping allocate political representation. It benefits society to enable data use for such purposes, but it’s also important to protect people’s privacy in the process. Organizations across industry and government are increasingly turning to differential privacy (DP), an approach to privacy-preserving data analysis that limits how much information about an individual is learned from an analysis. Chances are DP has been used to provide a privacy guarantee for an analysis of your data: Companies like Google, Apple, Meta, Microsoft, and Uber, as well as government agencies like the U.S. Census Bureau, have all used it in the past several years.
Not all differential privacy systems are created equal, though. The strength of the privacy protections offered by DP depends on a “privacy loss budget” parameter called epsilon. Epsilon measures how much information about individuals is “leaked” through the use of their data. It can be set anywhere from zero to infinity, with smaller values corresponding to stronger privacy protections. Privacy protections can vary wildly according to how epsilon is set: bigger epsilons can leak much more information about individuals. For example, when epsilon is 0.1, an observer or attacker is at most about 1.1 times more likely to learn something about you than if they had never seen your data. When epsilon is 10, that factor grows to roughly 22,000.

Despite epsilon’s importance as an indicator of privacy risk, it is seldom communicated to the people whose personal data are used by technology companies and other large organizations. This is partly because epsilon is difficult to reason about, even for experts. It is a unitless, contextless parameter, which makes it hard to map onto real-world outcomes. It also specifies probabilistic guarantees, meaning people must reason under uncertainty to fully grasp its implications. Yet not explaining epsilon to people who are deciding whether to share their data under DP leaves them ill-informed about the protections they are being offered.
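These multipliers are not arbitrary: with privacy parameter epsilon, DP guarantees that any conclusion an observer might draw from the released results becomes at most e^epsilon times more likely when your data are included, so epsilon = 0.1 gives a factor of about 1.1 and epsilon = 10 a factor of roughly 22,000. The short Python snippet below (purely illustrative, not part of our study) makes the scaling concrete:

```python
# Illustrative only: under epsilon-DP, including your data can make any
# conclusion an observer draws at most e^epsilon times more likely.
import math

for epsilon in [0.1, 0.5, 1.0, 5.0, 10.0]:
    print(f"epsilon = {epsilon:>4}: at most {math.exp(epsilon):,.1f}x more likely")
```

Of course, an exponential multiplier is still an abstraction, which is exactly why clearer explanations are needed.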
To help remedy this information asymmetry, in work published in 2023 with Gabriel Kaptchuk, Elissa M. Redmiles, and Mary Anne Smart, we set out to design explanation methods for epsilon that empower people to make informed data-sharing decisions. Specifically, we wanted our methods to increase people’s:
- Objective Risk Comprehension: Understanding of numeric risks associated with sharing data
- Subjective Privacy Understanding: Self-rated feelings of understanding the privacy protections offered
- Self-Efficacy: Self-rated feelings of having enough information and confidence to make data-sharing decisions
Explanation Methods for Epsilon
We developed and evaluated three portable explanation methods for epsilon: an odds-based text method, an odds-based visualization method, and an example-based method. Each method provides information about what a data subject can expect to happen if they share or do not share data.
As a concrete scenario, imagine that your company sent a survey to all employees, asking whether you feel adequately supported by your manager. Further imagine that you want to respond NO, but are worried that your manager would retaliate against you if they found out. Your company wants to protect your privacy in this process, and will only send your manager a differentially private version of the number of NO responses. How should your company communicate the privacy guarantees you are receiving in this process, and the epsilon value being used?
The “odds-based” methods present the probability that your manager will believe you responded NO on the survey, both when you share your data and when you don’t. We present probabilities as frequencies (e.g., 10 out of 100) rather than percentages (e.g., 10%) because prior research has found that frequency-framed probabilities help people make more accurate probabilistic judgments. The first odds-based method uses text, as follows:
Odds-Based Text Method
If you do not share data, 39 out of 100 potential DP outputs will lead your manager to believe you responded NO.
If you share data, 61 out of 100 potential DP outputs will lead your manager to believe you responded NO.
The second odds-based method adds icon arrays, a frequency-framed visualization technique for depicting probabilities, to the text (see Figure 1):
Odds-Based Visualization Method

Figure 1. Odds-Based Visualization Method. This explanation assumes ε = 0.5. Source: Rachel Cummings and Priyanka Nanayakkara.
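Where do figures like “39 out of 100” and “61 out of 100” come from? One way such numbers can arise (a sketch under stated assumptions, not necessarily the exact calculation behind our study materials) is to assume the count of NO responses is released with the standard Laplace mechanism, which adds noise with scale 1/epsilon to a counting query, and that your manager guesses you responded NO whenever the noisy count lands at least half a response above what it would be without your NO:

```python
# Sketch only: how odds like "39 out of 100" and "61 out of 100" can arise.
# Assumptions (ours, for illustration): the number of NO responses is released
# via the Laplace mechanism with sensitivity 1 (noise scale 1/epsilon), and the
# manager guesses you responded NO whenever the noisy count is at least half a
# response higher than the count would be without your NO.
import math

def prob_manager_believes_no(epsilon: float, you_shared_no: bool) -> float:
    scale = 1.0 / epsilon          # Laplace noise scale for a counting query
    # Gap from the released count's center to the manager's guessing threshold:
    # +0.5 if you did not contribute a NO, -0.5 if you did.
    gap = -0.5 if you_shared_no else 0.5
    # P(Laplace(0, scale) >= gap), from the Laplace distribution's CDF
    if gap >= 0:
        return 0.5 * math.exp(-gap / scale)
    return 1.0 - 0.5 * math.exp(gap / scale)

epsilon = 0.5
print(round(100 * prob_manager_believes_no(epsilon, you_shared_no=False)))  # 39
print(round(100 * prob_manager_believes_no(epsilon, you_shared_no=True)))   # 61
```

Under these assumptions and ε = 0.5, the two probabilities work out to about 39 and 61 chances in 100, matching the statements above; larger epsilons push the two probabilities further apart.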
Alternatively, usability research suggests that concrete examples may help people understand security and privacy concepts. Hence, our third “example-based” method shows potential outputs from the DP algorithm (i.e., results of the analysis under DP) with data sharing and without (see Figure 2):
Example-Based Method

Figure 2. Example-Based Method. Because DP adds some random noise to the output, the number of NOs reported may not be a whole number, or could even be negative. This explanation assumes ε = 0.5. Source: Rachel Cummings and Priyanka Nanayakkara.
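For readers curious where such example outputs might come from, here is a minimal sketch (our illustration, not the study’s materials) that assumes the NO count is released with the Laplace mechanism at ε = 0.5 and a hypothetical true count of 20 NO responses among your coworkers:

```python
# Sketch only: generate a few potential DP outputs of the NO count, with and
# without your response. Assumes the Laplace mechanism with sensitivity 1 at
# epsilon = 0.5, and a hypothetical true count of 20 NOs among your coworkers.
import numpy as np

rng = np.random.default_rng(seed=7)   # seeded so the illustration is reproducible
epsilon = 0.5
scale = 1.0 / epsilon                 # Laplace noise scale for a counting query

count_without_you = 20                  # hypothetical
count_with_you = count_without_you + 1  # your NO adds one

for label, true_count in [("do not share", count_without_you),
                          ("share", count_with_you)]:
    noisy = true_count + rng.laplace(loc=0.0, scale=scale, size=3)
    # The noise is continuous, so reported values can be fractional or negative.
    print(f"If you {label}: example DP outputs = {np.round(noisy, 1)}")
```

Each run produces a different handful of plausible outputs; the point, as in Figure 2, is that the “share” and “do not share” worlds produce overlapping ranges of outputs, which is what limits what your manager can infer.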
How Well Do Our Methods Work?
We evaluated our methods in a vignette survey study with 963 people. We presented them with the workplace scenario described above, along with our explanation methods under varying epsilon values to capture a wide range of protections. Participants then decided whether or not to share data and answered additional questions measuring their objective risk comprehension, subjective privacy understanding, and self-efficacy.
We compared these responses against two baseline explanations: In the No-Privacy Control, people received no privacy protections, and their manager would certainly see their response if they chose to share data. In the No-Epsilon Control, people were presented with an explanation of DP from prior work that provided a high-level description of DP without explaining epsilon.
We found that the odds-based visualization method improved participants’ objective risk comprehension over the No-Privacy Control, while the example-based method decreased comprehension. In other words, participants tended to answer more questions correctly with the odds-based visualization method and fewer questions correctly with the example-based method. Furthermore, both odds-based methods improved feelings of having enough information compared to the No-Epsilon Control, which did not include epsilon information, suggesting people may feel empowered by having explicit information about epsilon.
Interestingly, we found that participants were more likely to share data when given one of our methods over the No-Epsilon Control. For example, participants were nearly twice as likely to share data when given the odds-based visualization method than when given the No-Epsilon Control. Finally, as expected, participants appeared to be sensitive to changes in epsilon: as epsilon increased (privacy protections weakened), participants were less likely to share data.
Overall, the odds-based methods were most effective: both improved participants’ objective risk comprehension (measured by two yes/no questions about risks with and without data sharing), subjective privacy understanding, and feelings of having enough information, relative to the example-based method.
The Future of Explaining Differential Privacy
Our work suggests that odds-based methods are promising for explaining epsilon to data subjects. While probabilistic information is often sidestepped in public-facing explanations of epsilon, we hope that in the near future, organizations deploying DP will increase transparency by using methods like ours to give people concrete, accessible information about epsilon. Beyond epsilon, several other factors could help people assess protections, such as which “model” of DP is used, a question that recent work (including ours) has begun to tackle.
More broadly, we hope that providing clearer explanations of DP will enable broader, responsible adoption of this privacy technique. As we highlight, understanding the epsilon parameter and the privacy guarantees it provides is a critical part of transparency around DP. Our goal is that, with clear explanations of privacy protections, people become more empowered to weigh in on decisions about their data. This will also facilitate increased trust in this technology, allow more organizations to begin using it in practice, and unlock its potential for societally valuable data analysis.