Consider a dataset $D$ containing an entry of our data and a neighboring dataset $D'$ without that entry. If we run the same algorithm $A$ on both datasets, we get the outputs $A(D)$ and $A(D')$. To ensure privacy, we want to make $A(D)$ and $A(D')$ “indistinguishable.” That is, an observer who knows everything except which output corresponds to which dataset can’t tell which output is the result of which dataset.

The only way to ensure this is through randomization and probabilistic indistinguishability. One example is 📄 Randomized Response; following this idea, we “randomize” a 🎲 Probability Distribution to be similar to the original.
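As a concrete illustration, here is a minimal sketch of the classic two-coin variant of randomized response (the function names and the $3/4$ truth-telling probability are choices for this example; this variant satisfies $\varepsilon = \ln 3$-DP, since the probability ratio between the two reports is $0.75/0.25 = 3$):

```python
import random

def randomized_response(true_answer: bool) -> bool:
    """Two-coin randomized response.

    Flip a fair coin: heads -> report the truth; tails -> flip a second
    fair coin and report that instead. Each report is thus truthful with
    probability 3/4, giving every respondent plausible deniability.
    """
    if random.random() < 0.5:
        return true_answer           # first coin heads: tell the truth
    return random.random() < 0.5     # first coin tails: report a fresh coin

def estimate_true_rate(reports: list[bool]) -> float:
    """Invert the noise in aggregate.

    If p is the observed rate of True reports, then p = 1/4 + t/2 for
    true rate t, so t is approximately 2p - 1/2.
    """
    p = sum(reports) / len(reports)
    return 2 * p - 0.5
```

Individually each answer is noisy, but over many respondents the aggregate statistic can still be recovered accurately.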

Formally, an algorithm $A$ is $\varepsilon$-differentially private if for any neighboring datasets $D$ and $D'$, the distributions of the outputs $A(D)$ and $A(D')$ are $e^\varepsilon$-close,

$$\Pr[A(D) \in S] \le e^{\varepsilon} \Pr[A(D') \in S]$$

for any subset $S$ of outcomes. With $\varepsilon = 0$, we have perfect privacy, and the higher the $\varepsilon$, the less privacy we have.

Intuitively, we can treat $S$ as the set of “harmful” outcomes. The definition then guarantees that using our data ($D$) or not using it ($D'$) changes the risk of a harmful outcome by at most a factor of $e^\varepsilon$. Note that this is different from guaranteeing that nothing bad will happen.

Examples of differentially-private algorithms include the 📌 Laplace Mechanism and ⚡️ Exponential Mechanism.
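For instance, a minimal sketch of the Laplace mechanism for a query of known sensitivity (the function name is a choice for this example; the sensitivity-$1$ counting query in the usage below is an assumption):

```python
import random

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release true_value plus Laplace noise with scale sensitivity/epsilon.

    For a counting query (sensitivity 1), adding or removing one person
    changes the true answer by at most 1, and this noise scale makes the
    two output distributions e^epsilon-close.
    """
    scale = sensitivity / epsilon
    # The difference of two independent exponential samples with the same
    # rate is Laplace-distributed with that scale.
    noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return true_value + noise
```

A smaller $\varepsilon$ means a larger noise scale: more privacy, less accuracy.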

Post-Processing Immunity

The post-processing immunity of differential privacy states that any algorithm processing the output of a differentially-private algorithm cannot “undo” the differential privacy. That is, for an $\varepsilon$-DP algorithm $A$ and post-processing algorithm $f$, the composition $f \circ A$ is also $\varepsilon$-DP.
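A one-line justification, assuming a deterministic $f$ (the randomized case follows by averaging over $f$’s randomness): the event $f(A(D)) \in S$ is exactly the event $A(D) \in f^{-1}(S)$, so the differential privacy bound carries over unchanged.

```latex
\Pr[f(A(D)) \in S]
  = \Pr[A(D) \in f^{-1}(S)]
  \le e^{\varepsilon} \Pr[A(D') \in f^{-1}(S)]
  = e^{\varepsilon} \Pr[f(A(D')) \in S].
```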

If $f$ is an algorithm that tries to recover the input to $A$ (that is, $f$ outputs $D$ or $D'$), then we can show that its chance of a correct answer is not much better than random guessing. First, let’s approximate $e^\varepsilon \approx 1 + \varepsilon$. Then, the probability of a correct guess for $D$ is

$$\Pr[f(A(D)) = D] \le (1 + \varepsilon) \Pr[f(A(D')) = D] = (1 + \varepsilon) \bigl( 1 - \Pr[f(A(D')) = D'] \bigr).$$

Then, adding up the probabilities of correct guesses for both $D$ and $D'$ gives us

$$\Pr[f(A(D)) = D] + \Pr[f(A(D')) = D'] \le 1 + \varepsilon - \varepsilon \Pr[f(A(D')) = D'] \le 1 + \varepsilon.$$

The right-hand side is close to $1$ for small $\varepsilon$. However, a perfect predictor would have each of these correct probabilities be $1$, meaning its sum would be $2$. The right-hand side’s upper bound of $1 + \varepsilon$ thus shows that the best $f$ can do is close to random guessing, which achieves a sum of exactly $1$.
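We can sanity-check this bound numerically. The sketch below assumes a concrete setting not in the note: a count released via the Laplace mechanism that equals $0$ under $D'$ and $1$ under $D$, attacked by the optimal threshold-at-$0.5$ adversary. It estimates the sum of the two correct-guess probabilities by Monte Carlo.

```python
import random

def laplace(scale: float) -> float:
    # Difference of two independent exponentials is Laplace-distributed.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def attack_success(epsilon: float, trials: int = 200_000) -> float:
    """Sum of the adversary's correct-guess probabilities.

    The noised count is 1 + noise under D and 0 + noise under D'; the
    best adversary guesses D exactly when the output is at least 0.5.
    """
    scale = 1 / epsilon  # counting query: sensitivity 1
    correct_d = sum(1 + laplace(scale) >= 0.5 for _ in range(trials)) / trials
    correct_dp = sum(0 + laplace(scale) < 0.5 for _ in range(trials)) / trials
    return correct_d + correct_dp
```

For $\varepsilon = 0.5$ the empirical sum stays well below the $1 + \varepsilon = 1.5$ bound, and far from the perfect predictor’s $2$.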

Compositional Weaknesses

Though differential privacy protects against post-processing, the paired output of two parallel $\varepsilon_1$-DP and $\varepsilon_2$-DP algorithms is only $(\varepsilon_1 + \varepsilon_2)$-DP. Generally, if there are $k$ datasets, each processed with an $\varepsilon$-DP algorithm, then their composition is $k\varepsilon$-DP.
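In practice this additive behavior is managed by tracking a total privacy budget. A minimal sketch under basic sequential composition (the `PrivacyAccountant` class is illustrative, not a standard API):

```python
class PrivacyAccountant:
    """Track cumulative epsilon spent under basic composition."""

    def __init__(self, budget: float):
        self.budget = budget  # total epsilon we are willing to spend
        self.spent = 0.0

    def spend(self, epsilon: float) -> None:
        """Record one epsilon-DP release; refuse if it would exceed the budget."""
        if self.spent + epsilon > self.budget:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
```

Each differentially-private release calls `spend` first; once the budget is exhausted, no further queries on the data are answered.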