Bias bounties are a public-sourcing approach to mitigating biases in a model. Since it's difficult to anticipate adversarial dynamics in advance, we can instead reward people for finding such problems.

Let $f$ be our current model. For a bounty to be valid, we require two classifiers:

  1. A group classifier $g$ that defines the harmed group, with $g(x) = 1$ indicating membership.
  2. A model $h$ that has lower error on that group: $\Pr[h(x) \ne y \mid g(x) = 1] < \Pr[f(x) \ne y \mid g(x) = 1]$.

We can then build a better model:

$$f'(x) = \begin{cases} h(x) & \text{if } g(x) = 1, \\ f(x) & \text{otherwise.} \end{cases}$$

Intuitively, we simply use the better model $h$ for the group defined by $g$ in the bounty, and keep $f$ everywhere else.
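Below is a minimal sketch of this construction in Python; the function names and the assumption that $f$, $g$, and $h$ are vectorized over NumPy arrays are illustrative, not part of the original text.

```python
import numpy as np

def patch_model(f, g, h):
    """Build the patched model f': defer to h on the group defined by g.

    f, h: predictors mapping an array of inputs X to an array of labels.
    g: group classifier mapping X to {0, 1} membership indicators.
    """
    def f_prime(X):
        in_group = g(X).astype(bool)
        # Use h's prediction inside the group, f's prediction elsewhere.
        return np.where(in_group, h(X), f(X))
    return f_prime
```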

We can show that this improves our model overall. Let

$$\mathrm{err}(f) = \Pr[f(x) \ne y],$$

and let $\mu = \Pr[g(x) = 1]$. We have

$$\mathrm{err}(f') = (1 - \mu)\,\Pr[f(x) \ne y \mid g(x) = 0] + \mu\,\Pr[h(x) \ne y \mid g(x) = 1].$$

The second term is smaller than the corresponding term for $f$, since $h$ has lower error on the group, and by combining our equations, we get

$$\mathrm{err}(f') < (1 - \mu)\,\Pr[f(x) \ne y \mid g(x) = 0] + \mu\,\Pr[f(x) \ne y \mid g(x) = 1] = \mathrm{err}(f).$$
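As a purely illustrative example: if the group has mass $\mu = 0.2$ and the bounty model $h$ lowers the group error from $0.30$ to $0.10$, the overall error drops by $0.2 \times 0.20 = 0.04$.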
In practice, we can require $\mathrm{err}(f) - \mathrm{err}(f') \geq \epsilon$ for some threshold $\epsilon > 0$ to prevent accepting extremely minuscule improvements.
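A sketch of such a bounty check, assuming held-out data $(X, y)$ as NumPy arrays and vectorized $f$, $g$, $h$; the names and default threshold here are illustrative assumptions.

```python
import numpy as np

def is_valid_bounty(f, g, h, X, y, epsilon=0.01):
    """Accept (g, h) only if patching f with it lowers overall error by >= epsilon."""
    in_group = g(X).astype(bool)
    if not in_group.any():
        return False  # empty group: nothing to improve
    err_f = np.mean(f(X) != y)
    # Error of the patched model f': h inside the group, f elsewhere.
    err_f_prime = np.mean(np.where(in_group, h(X), f(X)) != y)
    return err_f - err_f_prime >= epsilon
```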

We can now repeat and recurse this method, adding multiple pairs $(g_i, h_i)$ to the front of our model. If adding a new pair $(g, h)$ makes a previous improvement worse, we can simply re-add the earlier pair $(g_i, h_i)$ to the front of the model stack. Since each accepted bounty lowers the overall error by at least our threshold $\epsilon$, we will only accept at most $1/\epsilon$ models before reaching the best-possible result.
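A minimal sketch of how this repeated procedure might be maintained in Python; the class, its method names, and the default threshold are illustrative assumptions rather than a prescribed implementation.

```python
import numpy as np

class BountyModel:
    """A base predictor plus a stack of accepted (g_i, h_i) patches.

    The most recently accepted patch sits at the front of the stack and takes
    precedence, matching the "add to the front of the model" rule above.
    """

    def __init__(self, base_model):
        self.base_model = base_model
        self.patches = []  # front of the list = most recent patch

    def predict(self, X):
        preds = self.base_model(X)
        # Apply older patches first so patches nearer the front win.
        for g, h in reversed(self.patches):
            preds = np.where(g(X).astype(bool), h(X), preds)
        return preds

    def error(self, X, y):
        return np.mean(self.predict(X) != y)

    def submit_bounty(self, g, h, X, y, epsilon=0.01):
        """Accept (g, h) only if it lowers overall error by at least epsilon.

        If a later bounty degrades an earlier group's accuracy, that earlier
        (g_i, h_i) pair can simply be resubmitted to re-add it at the front.
        """
        err_before = self.error(X, y)
        self.patches.insert(0, (g, h))
        if err_before - self.error(X, y) >= epsilon:
            return True  # accepted: the patch stays at the front
        self.patches.pop(0)  # rejected: improvement below the threshold
        return False
```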