Algorithmic privacy deals with controlling inferences and exfiltration from data sources. Unlike cryptography, which involves controlling access to data, privacy allows access but prevents "leaks."
Privacy has three general tiers:
- Anonymization and aggregation techniques try to hide the individual's data, but this is often not private enough.
- Differential Privacy guarantees that the presence or absence of our data in a computation has little effect on the "harm" done; to do this, it introduces randomization into its computations, which provides strong indistinguishability guarantees (see the sketch after the note below).
- "No Harm Whatsoever" is a guarantee that any computation on the data doesn't increase the chance of harm, but this promise is too strong to be practical: harm may still come from computations done without our data anyway.
Info
The problem with "No Harm Whatsoever" is that it prevents the computation from being done even when our involvement doesn't affect the outcome (much). For example, if our case weren't included in a smoking study, the study would still find the link with lung cancer. "No Harm Whatsoever" deems this a violation of privacy, whereas differential privacy allows it.
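As a minimal sketch of that randomization, here is the classic Laplace mechanism applied to a counting query; the epsilon value, the data, and the helper name `laplace_count` are illustrative assumptions, not from the text above:

```python
import numpy as np

def laplace_count(data, epsilon, rng):
    """Release a count with Laplace noise scaled to sensitivity 1 over epsilon."""
    return sum(data) + rng.laplace(loc=0.0, scale=1.0 / epsilon)

rng = np.random.default_rng(0)
epsilon = 1.0  # privacy budget; value chosen only for illustration

# Neighboring datasets: identical except for one person's record (ours).
with_me = [1, 0, 1, 1, 0, 1]
without_me = [1, 0, 1, 1, 0]

# The two releases' output distributions differ by at most a factor of
# e^epsilon, so an observer can barely tell whether our record was used.
print(laplace_count(with_me, epsilon, rng))
print(laplace_count(without_me, epsilon, rng))
```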
Anonymization
Anonymization transforms a dataset to remove or degrade identifying information; common techniques (sketched in code after this list) include:
- Redaction: removing entire fields and columns.
- Coarsening: reducing the resolution of a field by grouping values into buckets.
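A toy sketch of both techniques on a hypothetical record set; all field names and values are made up for illustration:

```python
# Toy redaction and coarsening; records and fields are hypothetical.
records = [
    {"name": "Alice", "zip": "94110", "age": 34, "diagnosis": "flu"},
    {"name": "Bob", "zip": "94117", "age": 41, "diagnosis": "asthma"},
]

def anonymize(record):
    out = dict(record)
    del out["name"]                       # redaction: drop the identifying column
    out["zip"] = out["zip"][:3] + "**"    # coarsening: keep only a ZIP prefix
    out["age"] = (out["age"] // 10) * 10  # coarsening: bucket ages by decade
    return out

print([anonymize(r) for r in records])
```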
However, anonymization is vulnerable to comparison attacks that associate entries across multiple datasets, which can allow re-identification. One slight improvement is the aggregation approach discussed next.
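Before that, a toy sketch of such a comparison attack: a hypothetical released table (coarsened as above) is joined against a public roll on the quasi-identifiers that survive anonymization. All names and fields are assumptions for illustration:

```python
# The coarsened quasi-identifiers (ZIP prefix, age decade) survive in the
# released table and also appear in a public roll, enabling a join.
released = [
    {"zip": "941**", "age": 30, "diagnosis": "flu"},
    {"zip": "941**", "age": 40, "diagnosis": "asthma"},
]
public_roll = [
    {"name": "Alice", "zip": "94110", "age": 34},
    {"name": "Bob", "zip": "94117", "age": 41},
]

for pub in public_roll:
    for row in released:
        # Match on the same coarsening the anonymizer applied.
        if row["zip"] == pub["zip"][:3] + "**" and row["age"] == (pub["age"] // 10) * 10:
            print(pub["name"], "re-identified with diagnosis:", row["diagnosis"])
```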
Aggregation
Aggregation is the idea that summary queries over a database won't leak individual entries. However, it can be shown that it's possible to partially or fully reconstruct a dataset from query results.
Specifically, queries give us a system of equations over the dataset entries, which can be solved via non-convex optimization methods. More details can be found in Dick et al.
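A toy sketch of the noiseless case, where exact counts make the system linear and ordinary least squares already suffices; real releases are noisy or rounded, which is what forces the non-convex methods in attacks like Dick et al.'s. The sizes and seed here are arbitrary assumptions:

```python
import numpy as np

# Each counting query is a linear equation over the secret 0/1 entries,
# so enough queries pin the dataset down exactly.
rng = np.random.default_rng(1)
n_entries, n_queries = 8, 20

secret = rng.integers(0, 2, size=n_entries)                # hidden dataset bits
queries = rng.integers(0, 2, size=(n_queries, n_entries))  # random subset queries
answers = queries @ secret                                 # published exact counts

# Solve the linear system; with full column rank, rounding recovers `secret`.
estimate, *_ = np.linalg.lstsq(queries, answers, rcond=None)
print(np.round(estimate).astype(int))
print(secret)
```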