Bias bounties are a crowd-sourcing approach to mitigating biases in a model. Since it's difficult to anticipate every way a model might harm a group, we can instead reward people for finding such problems.
Let
- $g$ be a group classifier that defines the harmed group, and
- $h$ be a model that has lower error on that group than the current model $f$:

$$\Pr[h(x) \neq y \mid g(x) = 1] < \Pr[f(x) \neq y \mid g(x) = 1].$$

We can then build a better model:

$$f'(x) = \begin{cases} h(x) & \text{if } g(x) = 1, \\ f(x) & \text{otherwise.} \end{cases}$$
Intuitively, we simply use the better model $h$ on the group it improves, and fall back to $f$ everywhere else.
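As a concrete sketch of this construction (the `patch_model` name and the NumPy batch interface are assumptions, not part of any particular bounty framework):

```python
import numpy as np

def patch_model(f, g, h):
    """Build f'(x) = h(x) if g(x) == 1 else f(x).

    f, h map a batch of inputs X to predicted labels; g maps X to 0/1
    group membership. (The batch/NumPy interface is an assumption.)
    """
    def f_prime(X):
        # Use h's prediction inside the group, f's prediction outside it.
        return np.where(np.asarray(g(X)) == 1, h(X), f(X))
    return f_prime
```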
We can show that this improves our model overall. Let

$$\Pr[f(x) \neq y] = \Pr[f(x) \neq y \mid g(x) = 0]\,\Pr[g(x) = 0] + \Pr[f(x) \neq y \mid g(x) = 1]\,\Pr[g(x) = 1],$$

and let

$$\Pr[f'(x) \neq y] = \Pr[f(x) \neq y \mid g(x) = 0]\,\Pr[g(x) = 0] + \Pr[h(x) \neq y \mid g(x) = 1]\,\Pr[g(x) = 1].$$

The second term is now a smaller error, since $h$ beats $f$ on the group, and by combining our equations we get

$$\Pr[f'(x) \neq y] < \Pr[f(x) \neq y].$$
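A hypothetical synthetic example (data and models invented purely for illustration) makes the decomposition concrete: here $f$ is perfect off-group, $h$ is perfect on-group, and patching removes all of $f$'s on-group mistakes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: the label rule differs inside the group.
X = rng.normal(size=(10_000, 2))
g = lambda X: (X[:, 0] > 1.0).astype(int)            # group: ~16% of points
y = np.where(g(X) == 1, X[:, 1] > 0.0, X[:, 1] > 0.5).astype(int)

f = lambda X: (X[:, 1] > 0.5).astype(int)            # current model: perfect off-group
h = lambda X: (X[:, 1] > 0.0).astype(int)            # proposed model: perfect on-group
f_prime = lambda X: np.where(g(X) == 1, h(X), f(X))  # the patched model f'

err = lambda m: np.mean(m(X) != y)
print(f"err(f) = {err(f):.4f}, err(f') = {err(f_prime):.4f}")  # error strictly drops
```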
In practice, we can require that the improvement be measured on held-out data and that the weighted gain $\Pr[g(x) = 1]\,\big(\Pr[f(x) \neq y \mid g(x) = 1] - \Pr[h(x) \neq y \mid g(x) = 1]\big)$, which equals the total drop in overall error, exceed some minimum threshold before a bounty is paid out.
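One way such an acceptance check might look (the held-out interface and the default threshold value are illustrative assumptions):

```python
import numpy as np

def accept_bounty(f, g, h, X_val, y_val, threshold=0.01):
    """Accept (g, h) iff the implied drop in *overall* error, estimated
    on held-out data, exceeds `threshold` (an illustrative value)."""
    in_group = np.asarray(g(X_val)) == 1
    if not in_group.any():
        return False
    weight = in_group.mean()                                # Pr[g(x) = 1]
    err_f = np.mean(f(X_val)[in_group] != y_val[in_group])  # f's error on the group
    err_h = np.mean(h(X_val)[in_group] != y_val[in_group])  # h's error on the group
    return weight * (err_f - err_h) > threshold             # weighted improvement
```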
We can now repeat and recurse this method, adding multiple $(g, h)$ pairs: each accepted pair patches the current model, so the predictor grows into a decision list of group-conditional models, and overall error strictly decreases with every accepted bounty.
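Putting the pieces together, the repeated procedure might look like the following sketch, reusing `patch_model` and `accept_bounty` from the snippets above:

```python
def run_bounty_rounds(f0, submissions, X_val, y_val, threshold=0.01):
    """Fold accepted (g, h) submissions into the model, one at a time.

    Each accepted pair wraps the current model via patch_model, so the
    final predictor is a decision list: the newest accepted group is
    checked first, falling through earlier patches to the original f0.
    """
    f = f0
    for g, h in submissions:
        if accept_bounty(f, g, h, X_val, y_val, threshold):
            f = patch_model(f, g, h)
    return f
```

Because each wrap only replaces predictions on the newly claimed group, the decision-list order matters: a later group that overlaps an earlier one takes priority on the overlap.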