The majority of work in algorithmic fairness poses “fair learning” as a constrained optimization problem over a fixed hypothesis class. While this has been a fruitful perspective from the point of view of algorithm design (Hardt et al., 2016; Agarwal et al., 2018; Kearns et al., 2018; Zafar et al., 2017), it has the disadvantage of centering tradeoffs as a key object of study. This is because tradeoffs are inevitable in constrained optimization: adding or tightening a constraint in an optimization problem can never improve the objective value, and typically harms it. The upshot is that making fairness constraints more rigorous (either by tightening them quantitatively, or by adding additional “protected groups”) results in higher error, both overall and within each of the existing protected groups. This obstructs the deployment of fair learning methods, because it pits stakeholders against one another. It also requires that important decisions (who are the “protected groups”?) be made ahead of time, before model training. But this can be difficult, because the groups on which model performance turns out to be worst might not be apparent before model training (e.g. because they arise from unanticipated intersections of attributes that turn out to be important), and are often found as a result of algorithmic audits (Buolamwini and Gebru, 2018).
In this paper we develop a very general framework that circumvents both of these problems. First, we do not need to identify all protected groups ahead of time. If at any time we discover a group on which the error of our model is sub-optimal (worse than the Bayes optimal error obtainable on that group), then there is a simple update that can take as input a certificate of the sub-optimality of our model on that group, and efficiently output a new model with improved error on that group. Second, we no longer have to worry about tradeoffs, because these updates can only be error improving — both overall, and within each group previously identified. That is, we can promise that whenever a group on which our model is poorly performing is identified, we can patch the existing model in a way that causes overall error to decrease, while being monotonically error improving for the entire sequence of groups that were previously identified. We place no restrictions at all on the structure of the identified groups, which can be overlapping and arbitrarily complex. Because each such update implies that the overall model error will decrease, a simple potential argument implies that there cannot be very many significant updates of this form. This means that we quickly and provably get convergence to a state in which either
Our model is close to Bayes optimal, and hence nearly unimprovable on any group of non-trivial size, or
If we are far from Bayes optimal, we can’t find evidence of this.
Since in general it is impossible to certify Bayes optimality with a polynomial number of samples and in polynomial time, we argue that in settings in which we would be happy with Bayes optimality from a fairness perspective, we should therefore be happy with a failure to falsify Bayes optimality, whenever we have made a concerted effort to do so.
Informally, a certificate falsifying Bayes optimality of some model $f$ consists of two things: 1) the specification of a particular group of examples $g$, and 2) a model $h$ that outperforms $f$ on examples from $g$. Such certificates exist if and only if $f$ is not Bayes optimal. Our algorithmic framework suggests two very different kinds of instantiations, depending on who (or what) we task with finding violations of Bayes optimality (i.e. groups on which error is needlessly high):
Large Scale Auditing and “Bias Bug Bounties”
After developing a model (using any set of best practices for fairness), we could expose our model to the public for auditing. We could then award monetary “bounties” to any individual or organization who could identify 1) a group $g$, and 2) a model $h$ such that $h$ improves on our model for the individuals in $g$. We could then use the certificate $(g,h)$ to automatically improve our model. An attractive feature of this use-case is that if the bounties are paid in proportion to the severity of the problem that is uncovered (assuming severity is proportional to the size of $g$ and to the size of the improvement that $h$ has on performance on $g$), then the convergence analysis of our update algorithm implies a worst-case bounded lifetime payout for the bounty program (because large bounties are paid for discoveries of large problems in our model, which lead to correspondingly large improvements). Recall that the convergence of our method is to a model that is either Bayes optimal, or at least indistinguishable from Bayes optimal by the parties who have made a concerted effort to distinguish it. In the case of a bias bug bounty program, all interested parties (potentially incentivized by large monetary payouts) can attempt to falsify the model’s purported Bayes optimality, and hence if at convergence, none of these parties can succeed, we have a strong guarantee that our model is (indistinguishable from) Bayes optimal. In Appendix A we lay out a number of desiderata for a bias bounty program, how our framework addresses them, the difficulties that justify imposing the burden on bounty hunters that they submit models (rather than just examples), and how we might mitigate those burdens.
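The bounded-payout claim can be made concrete with a short calculation. The following is only a sketch under assumptions that the text does not fix: $f_0, f_1, \ldots, f_T$ denote the successive deployed models, $L(f_t)$ the overall loss of $f_t$, $\ell_0 = L(f_0)$, the $t$-th accepted submission certifies a group of mass $\mu_t$ with improvement $\Delta_t$, and the bounty paid for it is $c \cdot \mu_t \Delta_t$ for a hypothetical constant $c > 0$.

```latex
% Assumed: each accepted update lowers overall loss by at least \mu_t \Delta_t,
% and loss is non-negative. The constant c and the payout rule are illustrative.
\begin{align*}
\text{total payout}
  &= \sum_{t=1}^{T} c \, \mu_t \Delta_t \\
  &\le c \sum_{t=1}^{T} \bigl( L(f_{t-1}) - L(f_t) \bigr) \\
  &= c \, \bigl( L(f_0) - L(f_T) \bigr) \\
  &\le c \, \ell_0 .
\end{align*}
```

Under these assumptions the sum telescopes, so the lifetime payout is bounded by a quantity that depends only on the initial loss and the payout rate, not on the number of submissions.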
A Framework for Model Training
We can also use our “update” paradigm directly as an iterative algorithmic framework for model training: at every round, we must solve an optimization problem corresponding to finding a pair $(g,h)$ such that $h$ improves on the current model on the distribution restricted to $g$. We give two methods for solving this problem: the first by reducing to cost-sensitive classification problems over a particular hypothesis class, and the other using an EM style approach, which also makes use of (standard, unweighted) empirical risk minimization methods defined over a class of group indicator functions $\mathcal{G}$ and a hypothesis class $\mathcal{H}$. Assuming these optimization problems can be solved at each round over classes $\mathcal{G}$ and $\mathcal{H}$ respectively, this process quickly converges to a notion of “$(\mathcal{G},\mathcal{H})$-Bayes optimality” that corresponds to Bayes optimality when $\mathcal{G}$ and $\mathcal{H}$ are taken to be the set of all groups and models respectively.
1.1 Limitations and Open Questions
The primary limitation of our proposed framework is that it can only identify and correct sub-optimal performance on groups as measured on the data distribution for which we have training data. It does not solve either of the following related problems:
Our model performs well on every group only because in gathering data, we have failed to sample representative members of that group, or
The model that we have cannot be improved on some group only because we have failed to gather the features that are most predictive on that group, and the performance would be improvable if only we had gathered better features.
That is, our framework can be used to find and correct biases as measured on the data distribution from which we have data, but cannot be used to find and correct biases that come from having gathered the wrong dataset. In both cases, one primary obstacle to extending our framework is the need to be able to efficiently validate proposed fixes. For example, because we restrict attention to a single data distribution, given a proposed pair $(g,h)$, we can check on a holdout set whether in fact $h$ has improved performance compared to our model, on examples from group $g$. This is important for distinguishing genuine distributional improvements from improvements on subsets of examples that amount to cherrypicking, which is especially important in the public “bias bug bounty” application. How can we approach this problem when proposed improvements include evaluations on new datasets, for which we by definition do not have held out data? Compelling solutions to this problem seem to us to be of high interest. We remark that a bias bounty program held under our proposed framework would at least serve to highlight where new data collection efforts are needed, by disambiguating failures of training from failures of the data to properly represent a population: if a group continues to have persistently high error even in the presence of a large population of auditors in our framework, this is evidence that in order to obtain improved error on that group, we need to focus on better representing them within our data.
1.2 Related Work
There are several strands of the algorithmic fairness literature that are closely related to our work. Most popular notions of algorithmic fairness (e.g. those that propose to equalize notions of error across protected groups as in e.g. (Hardt et al., 2016; Agarwal et al., 2018; Kearns et al., 2018; Zafar et al., 2017), or those that aim to “treat similar individuals similarly” as in e.g. (Dwork et al., 2012; Ilvento, 2020; Yona and Rothblum, 2018; Jung et al., 2019)) involve tradeoffs, in that asking for “fairness” involves settling for reduced accuracy. Several papers (Blum and Stangl, 2019; Dutta et al., 2020) show that fairness constraints of these types need not involve tradeoffs (or can even be accuracy improving) on test data if the training data has been corrupted by some bias model and is not representative of the test data. In cases like this, fairness constraints can act as corrections to undo the errors that have been introduced in the data. These kinds of results leverage differences between the training and evaluation data, and unlike our work, do not avoid tradeoffs between fairness and accuracy in settings in which the training data is representative of the true distribution.
A notable exception to the rule that fairness and accuracy must involve tradeoffs, from which we take inspiration, is the literature on multicalibration initiated by Hébert-Johnson et al. (Hébert-Johnson et al., 2018; Kim et al., 2019; Jung et al., 2021; Gupta et al., 2021; Dwork et al., 2021) that asks that a model’s predictions be calibrated not just overall, but also when restricted to a large number of protected subgroups. Hébert-Johnson et al. (Hébert-Johnson et al., 2018) and Kim, Ghorbani, and Zou (Kim et al., 2019) show that an arbitrary model can be postprocessed to satisfy multicalibration (and the related notion of “multi-accuracy”) without sacrificing (much) in terms of model accuracy. Our aim is to achieve something similar, but for predictive error, rather than model calibration.
The notion of fairness that we ask for in this paper was studied in an online setting (in which the data, rather than the protected groups arrive online) by Blum and Lykouris (Blum and Lykouris, 2020) and generalized by Rothblum and Yona (Rothblum and Yona, 2021) as “multigroup agnostic learnability.” Noarov, Pai, and Roth (Noarov et al., 2021) show how to obtain it in an online setting as part of the same unified framework of algorithms that can obtain multicalibrated predictions. The algorithmic results in these papers lead to complex models — in contrast, our algorithm produces “simple” predictors in the form of a decision list. In contrast to these prior works, we do not view the set of groups that we wish to offer guarantees to as fixed up front, but instead as something that can be discovered online, after models are deployed. Our focus is on fast algorithms to update existing models when new groups on which our model is performing poorly are discovered.
Concurrently and independently of our paper, Tosh and Hsu (Tosh and Hsu, 2021) study algorithms and sample complexity for multi-group agnostic learnability and give an algorithm (“Prepend”) that is equivalent to our Algorithm 1 (“ListUpdate”). Their focus is on sample complexity of batch optimization, however — in contrast to our focus on the discovery of groups on which our model is performing poorly online (e.g. as part of a “bias bounty program”). They also are not concerned with the details of the optimization that needs to be solved to produce an update — we give practical algorithms based on reductions to cost sensitive classification and empirical evaluation. Tosh and Hsu (Tosh and Hsu, 2021) also contains additional results, including algorithms producing more complex hypotheses but with improved sample complexity (again in the setting in which the groups are fixed up front).
The multigroup notion of fairness we employ (Blum and Lykouris, 2020; Rothblum and Yona, 2021) aims to perform optimally on each subgroup $g$, rather than equalizing the performance across subgroups. This is similar in motivation to minimax fairness, studied in (Martinez et al., 2020; Diana et al., 2021b, a), which aims to minimize the error on the maximum error sub-group. Both of these notions of fairness have the merit that they produce models that Pareto-dominate equal error solutions, in the sense that in a minimax (or multigroup) optimal solution, every group has weakly lower error than they would have in any equal error solution. However, optimizing for minimax error over a fixed hypothesis class still results in tradeoffs in the sense that it results in higher error than the error optimal model in the same class, and that adding more groups to the guarantee makes the tradeoff more severe. Our approach avoids tradeoffs by optimizing over a class that is dynamically expanded as the set of groups to be protected expands.
The idea of a “bias bug bounty” program dates back at least to a 2018 editorial of Amit Elazari Bar On (On, 2018), and Twitter ran a version of a bounty program in 2021 to find bias issues in its image cropping algorithm (Chowdhury and Williams, 2021). These programs are generally envisioned to be quite different from what we propose here. On the one hand, we are proposing to automatically audit models and award bounties for the discovery of a narrow form of technical bias (sub-optimal error on well defined subgroups), whereas the bounty program run by Twitter was open ended, with human judges and written submissions. On the other hand, the method we propose could underlie a long-running program that could automatically correct the bias issues discovered in production systems at scale, whereas Twitter’s bounty program was a labor intensive event that ran over the course of a week.
We consider a supervised learning problem defined over labelled examples $(x,y) \in \mathcal{X} \times \mathcal{Y}$. This can represent (for example) a binary classification problem if $\mathcal{Y} = \{0,1\}$, a discrete multiclass classification problem if $\mathcal{Y} = \{1,\ldots,k\}$, or a regression problem if $\mathcal{Y} = \mathbb{R}$. We write $\mathcal{D}$ to denote a joint probability distribution over labelled examples: $\mathcal{D} \in \Delta(\mathcal{X} \times \mathcal{Y})$. We will write $D \sim \mathcal{D}^n$ to denote a dataset consisting of $n$ labelled examples sampled i.i.d. from $\mathcal{D}$. Our goal is to learn some model represented as a function $f : \mathcal{X} \rightarrow \mathcal{Y}$ which aims to predict the label of an example from its features, and we will evaluate the performance of our model with a loss function $\ell : \mathcal{Y} \times \mathcal{Y} \rightarrow [0,1]$, where $\ell(\hat{y}, y)$ represents the “cost” of mistakenly labelling an example that has true label $y$ with the prediction $\hat{y}$. We will be interested in the performance of models not just overall on the underlying distribution, but also on particular subgroups of interest. A subgroup corresponds to an arbitrary subset of the feature space $\mathcal{X}$, which we will model using an indicator function:
Definition 1 (Subgroups).
A subgroup of the feature space $\mathcal{X}$ will be represented as an indicator function $g : \mathcal{X} \rightarrow \{0,1\}$. We say that $x \in \mathcal{X}$ is in group $g$ if $g(x) = 1$ and $x$ is not in group $g$ otherwise. Given a group $g$, we write $\mu_g(\mathcal{D})$ to denote its measure under the probability distribution $\mathcal{D}$:
$$\mu_g(\mathcal{D}) = \Pr_{(x,y) \sim \mathcal{D}}[g(x) = 1].$$
We write $\mu_g(D)$ to denote the corresponding empirical measure under $D$, which results from viewing $D$ as the uniform distribution over its elements.
We can now define the loss of a model both overall and on different subgroups:
Definition 2 (Model Loss).
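Given a model $f : \mathcal{X} \rightarrow \mathcal{Y}$, we write $L(\mathcal{D}, f)$ to denote the average loss of $f$ on distribution $\mathcal{D}$:
$$L(\mathcal{D}, f) = \mathbb{E}_{(x,y) \sim \mathcal{D}}[\ell(f(x), y)].$$
We write $L(\mathcal{D}, f, g)$ to denote the loss of $f$ on $\mathcal{D}$ conditional on membership in $g$:
$$L(\mathcal{D}, f, g) = \mathbb{E}_{(x,y) \sim \mathcal{D}}[\ell(f(x), y) \mid g(x) = 1].$$
Given a dataset $D$, we write $L(D, f)$ and $L(D, f, g)$ to denote the corresponding empirical losses on $D$, which result from viewing $D$ as the uniform distribution over its elements.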
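The empirical quantities in Definitions 1 and 2 are plain averages over the dataset. The following minimal sketch illustrates them on a toy example; the names `group_mass`, `loss`, and `group_loss`, the 0/1 loss, and the four-point dataset are illustrative assumptions rather than anything fixed by the text.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[float, int]      # (feature x, label y) on a toy 1-D feature space
Model = Callable[[float], int]   # a model maps features to labels
Group = Callable[[float], int]   # a group is an indicator function into {0, 1}


def zero_one_loss(y_hat: int, y: int) -> float:
    """A loss bounded in [0, 1]: 1 for a wrong prediction, 0 for a correct one."""
    return 0.0 if y_hat == y else 1.0


def group_mass(data: Sequence[Example], g: Group) -> float:
    """Empirical measure of a group: the fraction of examples with g(x) = 1."""
    return sum(g(x) for x, _ in data) / len(data)


def loss(data: Sequence[Example], f: Model) -> float:
    """Empirical average loss of f, viewing the data as a uniform distribution."""
    return sum(zero_one_loss(f(x), y) for x, y in data) / len(data)


def group_loss(data: Sequence[Example], f: Model, g: Group) -> float:
    """Empirical loss of f conditional on membership in g."""
    members = [(x, y) for x, y in data if g(x) == 1]
    if not members:
        raise ValueError("group has no members in this dataset")
    return loss(members, f)


# Toy data: the model thresholds at 0.5 and the group is the same half-line.
data = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 0)]
f = lambda x: int(x > 0.5)
g = lambda x: int(x > 0.5)
```

On this toy data the model is perfect outside the group and wrong on one of the two group members, so the conditional loss on the group is twice the overall loss.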
The best we can ever hope to do in any prediction problem (fixing the loss function and the distribution) is to make predictions that are as accurate as those of a Bayes optimal model:
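Definition 3.
A Bayes optimal model $f^* : \mathcal{X} \rightarrow \mathcal{Y}$ with respect to a loss function $\ell$ and a distribution $\mathcal{D}$ satisfies:
$$f^*(x) \in \arg\min_{\hat{y} \in \mathcal{Y}} \mathbb{E}_{(x',y') \sim \mathcal{D}}[\ell(\hat{y}, y') \mid x' = x],$$
where $f^*(x)$ can be defined arbitrarily for any $x$ that is not in the support of $\mathcal{D}$.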
The Bayes optimal model is pointwise optimal, and hence has the lowest loss of any possible model, not just overall, but simultaneously on every subgroup. In fact, it is easy to see that this is a characterization of Bayes optimality.
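Observation 4.
Fixing a loss function $\ell$ and a distribution $\mathcal{D}$, $f^*$ is a Bayes optimal model if and only if for every group $g$ and every alternative model $h$:
$$L(\mathcal{D}, f^*, g) \le L(\mathcal{D}, h, g).$$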
The above characterization states that a model $f^*$ is Bayes optimal if and only if it induces loss that is at least as low as that of any possible model $h$, when restricted to any possible group $g$. It will also be useful to refer to approximate notions of Bayes optimality, in which the exactness is relaxed, as well as possibly the class of comparison models $\mathcal{H}$, and the class of groups $\mathcal{G}$. We call this $(\epsilon, \mathcal{C})$-Bayes optimality to highlight the connection to (exact) Bayes optimality, but it is identical to what Rothblum and Yona (Rothblum and Yona, 2021) call a “multigroup agnostic PAC solution” with respect to $\mathcal{G}$ and $\mathcal{H}$. Related notions were also studied in (Blum and Lykouris, 2020; Noarov et al., 2021).
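Definition 5.
A model $f$ is $(\epsilon, \mathcal{C})$-Bayes optimal with respect to a collection of (group, model) pairs $\mathcal{C}$ if for each $(g,h) \in \mathcal{C}$, the performance of $f$ on $g$ is within $\epsilon$ of the performance of $h$ on $g$. In other words, for every $(g,h) \in \mathcal{C}$:
$$L(\mathcal{D}, f, g) \le L(\mathcal{D}, h, g) + \frac{\epsilon}{\mu_g(\mathcal{D})}.$$
When $\mathcal{C}$ is a product set $\mathcal{C} = \mathcal{G} \times \mathcal{H}$, then we call $f$ “$(\epsilon, \mathcal{G}, \mathcal{H})$-Bayes optimal” and the condition is equivalent to requiring that for every $g \in \mathcal{G}$, $f$ has performance on $g$ that is within $\epsilon$ of the best model $h \in \mathcal{H}$ on $g$. When $\mathcal{G}$ and $\mathcal{H}$ represent the set of all groups and models respectively, we call $f$ $\epsilon$-Bayes optimal.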
We have chosen to define approximate Bayes optimality by letting the approximation term scale proportionately to the inverse probability of the group $g$, similar to how notions of multigroup fairness are defined in (Kearns et al., 2018; Jung et al., 2021; Gupta et al., 2021). An alternative (slightly weaker) option would be to require error that is uniformly bounded by $\epsilon$ for all groups, but to only make promises for groups that have probability larger than some threshold, as is done in (Hébert-Johnson et al., 2018). Some relaxation of this sort is necessary to provide guarantees on an unknown distribution based only on a finite sample from it, since we will necessarily have less statistical certainty about smaller subsets of our data.
Note that $(\epsilon, \mathcal{G}, \mathcal{H})$-Bayes optimality is identical to Bayes optimality when $\epsilon = 0$ and when $\mathcal{G}$ and $\mathcal{H}$ represent the classes of all possible groups and models respectively, and that it becomes an increasingly stronger condition as $\mathcal{G}$ and $\mathcal{H}$ grow in expressivity.
3 Certificates of Sub-Optimality and Update Algorithms
Suppose we have an existing model $f$, and we find that it is performing sub-optimally on some group $g$. By Observation 4, it must be that $f$ is not Bayes optimal, and this will be witnessed by some model $h$ such that:
$$L(\mathcal{D}, f, g) > L(\mathcal{D}, h, g).$$
We call such a pair $(g,h)$ a certificate of sub-optimality. Note that by Observation 4, such a certificate will exist if and only if $f$ is not Bayes optimal. We can define a quantitative version of these certificates:
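Definition 7.
A group indicator function $g : \mathcal{X} \rightarrow \{0,1\}$ together with a model $h : \mathcal{X} \rightarrow \mathcal{Y}$ form a $(\mu, \Delta)$-certificate of sub-optimality for a model $f$ under distribution $\mathcal{D}$ if:
Group $g$ has probability mass at least $\mu$ under $\mathcal{D}$: $\mu_g(\mathcal{D}) \ge \mu$, and
$h$ improves on the performance of $f$ on group $g$ by at least $\Delta$: $L(\mathcal{D}, f, g) \ge L(\mathcal{D}, h, g) + \Delta$.
We say that $(g,h)$ form a certificate of sub-optimality for $f$ if they form a $(\mu, \Delta)$-certificate of sub-optimality for $f$ for any constants $\mu, \Delta > 0$.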
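On a finite dataset, the two parameters of a certificate can be estimated directly from their definitions. The sketch below is illustrative only: the function names `certificate_parameters` and `is_certificate`, the 0/1 loss, and the toy data are assumptions, and it computes empirical values on a sample rather than the distributional quantities the definition refers to.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[int, int]  # (feature x, label y); integer features for a toy example


def zero_one_loss(y_hat: int, y: int) -> float:
    return 0.0 if y_hat == y else 1.0


def certificate_parameters(
    data: Sequence[Example],
    f: Callable[[int], int],
    g: Callable[[int], int],
    h: Callable[[int], int],
) -> Tuple[float, float]:
    """Return the empirical (mass of g, improvement of h over f on g)."""
    members = [(x, y) for x, y in data if g(x) == 1]
    mu = len(members) / len(data)
    if not members:
        return 0.0, 0.0  # an empty group certifies nothing
    loss_f = sum(zero_one_loss(f(x), y) for x, y in members) / len(members)
    loss_h = sum(zero_one_loss(h(x), y) for x, y in members) / len(members)
    return mu, loss_f - loss_h


def is_certificate(data, f, g, h, mu_min: float, delta_min: float) -> bool:
    """Check both conditions of the definition against the empirical estimates."""
    mu, delta = certificate_parameters(data, f, g, h)
    return mu >= mu_min and delta >= delta_min


# Toy example: f always predicts 0, but the middle group is labelled 1.
data = [(0, 0), (1, 1), (2, 1), (3, 0)]
f = lambda x: 0
g = lambda x: int(1 <= x <= 2)
h = lambda x: 1
params = certificate_parameters(data, f, g, h)
```

Here the group covers half of the data and the proposed model removes all of the error on it, so the pair is an empirical certificate for any thresholds up to those values.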
The core of our algorithmic updating framework will rely on a close quantitative connection between certificates of sub-optimality and approximate Bayes optimality. The following theorem can be viewed as a quantitative version of Observation 4.
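Theorem 8.
Fix any $\epsilon > 0$, and any collection of (group, model) pairs $\mathcal{C}$. There exists a $(\mu, \Delta)$-certificate of sub-optimality $(g,h) \in \mathcal{C}$ for $f$ if and only if $f$ is not $(\epsilon, \mathcal{C})$-Bayes optimal for $\epsilon < \mu\Delta$.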
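Proof. We need to prove two directions. First, we will assume that $f$ is $(\epsilon, \mathcal{C})$-Bayes optimal, and show that in this case there do not exist any pairs $(g,h) \in \mathcal{C}$ such that $(g,h)$ form a $(\mu, \Delta)$-certificate of sub-optimality with $\mu\Delta > \epsilon$. Fix a pair $(g,h) \in \mathcal{C}$. Without loss of generality, we can take $\mu = \mu_g(\mathcal{D})$ (and if $\mu = 0$ we are done, so we can also assume that $\mu > 0$). Since $f$ is $(\epsilon, \mathcal{C})$-Bayes optimal, by definition we have that:
$$L(\mathcal{D}, f, g) - L(\mathcal{D}, h, g) \le \frac{\epsilon}{\mu}.$$
Any $\Delta$ for which $(g,h)$ is a $(\mu, \Delta)$-certificate satisfies $\Delta \le L(\mathcal{D}, f, g) - L(\mathcal{D}, h, g)$. Solving, we get that $\mu\Delta \le \epsilon$ as desired.
Next, we prove the other direction: we assume that there exists a pair $(g,h) \in \mathcal{C}$ that form a $(\mu, \Delta)$-certificate of sub-optimality, and show that $f$ is not $(\epsilon, \mathcal{C})$-Bayes optimal for any $\epsilon < \mu\Delta$. Without loss of generality we can take $\mu = \mu_g(\mathcal{D})$ and conclude:
$$L(\mathcal{D}, f, g) \ge L(\mathcal{D}, h, g) + \Delta = L(\mathcal{D}, h, g) + \frac{\mu\Delta}{\mu_g(\mathcal{D})},$$
which falsifies $(\epsilon, \mathcal{C})$-Bayes optimality for any $\epsilon < \mu\Delta$ as desired. ∎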
Theorem 8 tells us that if we are looking for evidence that a model $f$ fails to be Bayes optimal (or more generally, fails to be $(\epsilon, \mathcal{C})$-Bayes optimal), then without loss of generality, we can restrict our attention to certificates of sub-optimality with large parameters $\mu\Delta$; these exist if and only if $f$ is significantly far from Bayes optimal. But it does not tell us what to do if we find such a certificate. Can we use a certificate of sub-optimality $(g,h)$ for $f$ to easily update $f$ to a new model that both corrects the suboptimality witnessed by $(g,h)$ and makes measurable progress towards Bayes optimality? It turns out that the answer is yes, and we can do this with an exceedingly simple update algorithm, which we analyze next. The update algorithm (Algorithm 1) takes as input a model $f$ together with a certificate of sub-optimality for $f$, $(g,h)$, and outputs an improved model based on the following intuitive update rule: if an example $x$ is in group $g$ (i.e. if $g(x) = 1$), then we will classify $x$ using $h$; otherwise we will classify $x$ using $f$.
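The update rule described above amounts to wrapping the old model in a single conditional. A minimal sketch follows; the function name `list_update`, the toy models, and the four-point dataset are illustrative assumptions and are not taken from Algorithm 1 itself.

```python
from typing import Callable, List, Tuple

Model = Callable[[int], int]
Group = Callable[[int], int]


def list_update(f: Model, g: Group, h: Model) -> Model:
    """Return a model that uses h on members of g and falls back to f elsewhere."""
    def updated(x: int) -> int:
        return h(x) if g(x) == 1 else f(x)
    return updated


def error_rate(data: List[Tuple[int, int]], model: Model) -> float:
    """Fraction of examples the model labels incorrectly (0/1 loss)."""
    return sum(model(x) != y for x, y in data) / len(data)


# Toy example: the initial model always predicts 0 and is wrong on x >= 2.
data = [(0, 0), (1, 0), (2, 1), (3, 1)]
f0 = lambda x: 0
g1 = lambda x: int(x >= 2)   # the group on which f0 is sub-optimal
h1 = lambda x: 1             # a model that does better on that group
f1 = list_update(f0, g1, h1)
```

In this example the patched model agrees with `h1` inside the group and with `f0` outside it, so its overall error can only go down relative to `f0`.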
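Theorem 9.
Algorithm 1 (ListUpdate) has the following properties. If $(g_{t+1}, h_{t+1})$ form a $(\mu, \Delta)$-certificate of sub-optimality for $f_t$, and $f_{t+1} = \text{ListUpdate}(f_t, (g_{t+1}, h_{t+1}))$ then:
The new model matches the performance of $h_{t+1}$ on group $g_{t+1}$: $L(\mathcal{D}, f_{t+1}, g_{t+1}) = L(\mathcal{D}, h_{t+1}, g_{t+1})$, and
The overall performance of the model is improved by at least $\mu\Delta$: $L(\mathcal{D}, f_{t+1}) \le L(\mathcal{D}, f_t) - \mu\Delta$.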
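Proof. It is immediate from the definition of $f_{t+1}$ that $L(\mathcal{D}, f_{t+1}, g_{t+1}) = L(\mathcal{D}, h_{t+1}, g_{t+1})$, since for any $x$ such that $g_{t+1}(x) = 1$, $f_{t+1}(x) = h_{t+1}(x)$. It remains to verify the 2nd condition. Because we also have that for every $x$ such that $g_{t+1}(x) = 0$, $f_{t+1}(x) = f_t(x)$, we can calculate:
$$
\begin{aligned}
L(\mathcal{D}, f_{t+1}) &= \Pr[g_{t+1}(x) = 0] \cdot \mathbb{E}[\ell(f_{t+1}(x), y) \mid g_{t+1}(x) = 0] + \Pr[g_{t+1}(x) = 1] \cdot \mathbb{E}[\ell(f_{t+1}(x), y) \mid g_{t+1}(x) = 1] \\
&= \Pr[g_{t+1}(x) = 0] \cdot \mathbb{E}[\ell(f_t(x), y) \mid g_{t+1}(x) = 0] + \mu_{g_{t+1}}(\mathcal{D}) \cdot L(\mathcal{D}, h_{t+1}, g_{t+1}) \\
&\le \Pr[g_{t+1}(x) = 0] \cdot \mathbb{E}[\ell(f_t(x), y) \mid g_{t+1}(x) = 0] + \mu_{g_{t+1}}(\mathcal{D}) \cdot \bigl( L(\mathcal{D}, f_t, g_{t+1}) - \Delta \bigr) \\
&= L(\mathcal{D}, f_t) - \mu_{g_{t+1}}(\mathcal{D}) \cdot \Delta \\
&\le L(\mathcal{D}, f_t) - \mu\Delta.
\end{aligned}
$$
∎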
We can use Algorithm 1 as an iterative update algorithm: if we have a model $f_0$, and then discover a certificate of sub-optimality $(g_1, h_1)$ for $f_0$, we can update our model to a new model $f_1 = \text{ListUpdate}(f_0, (g_1, h_1))$. If we then find a new certificate of sub-optimality $(g_2, h_2)$ for $f_1$, we can once again use Algorithm 1 to update our model to a new model, $f_2 = \text{ListUpdate}(f_1, (g_2, h_2))$, and so on. The result is that at time $t$, we have a model $f_t$ in the form of a decision list in which the internal nodes branch on the group indicator functions $g_i$ and the leaves invoke the models $h_i$ or the initial model $f_0$. See Figure 1. Note that to evaluate such a decision list on a point $x$, although we might need to evaluate $g_i(x)$ for every group indicator used to define the list, we only need to evaluate a single model $h_i(x)$. Moreover, as we will show next in Theorem 10, when the decision list is constructed iteratively in this manner, it cannot grow very long. Thus, evaluation can be fast even if the models used to construct it are complex.
The fact that each update makes progress towards Bayes Optimality (in fact, optimal progress, given Theorem 8) means that this updating process cannot go on for too long:
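Theorem 10.
Fix any $\epsilon > 0$. For any initial model $f_0$ with loss $\ell_0 = L(\mathcal{D}, f_0)$ and any sequence of models $f_1, \ldots, f_T$, such that $f_i = \text{ListUpdate}(f_{i-1}, (g_i, h_i))$ and each pair $(g_i, h_i)$ forms a $(\mu, \Delta)$-certificate of suboptimality for $f_{i-1}$ for some $\mu, \Delta$ such that $\mu\Delta \ge \epsilon$, the length of the update sequence must be at most $T \le \ell_0 / \epsilon$.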
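Proof. By assumption $L(\mathcal{D}, f_0) = \ell_0$. Because each $(g_i, h_i)$ is a $(\mu, \Delta)$-certificate of suboptimality of $f_{i-1}$ with $\mu\Delta \ge \epsilon$, we know from Theorem 9 that for each $i$, $L(\mathcal{D}, f_i) \le L(\mathcal{D}, f_{i-1}) - \epsilon$. Hence $L(\mathcal{D}, f_T) \le \ell_0 - T\epsilon$. But loss is non-negative: $L(\mathcal{D}, f_T) \ge 0$. Thus it must be that $T \le \ell_0 / \epsilon$ as desired. ∎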
What can we do with such an update algorithm? Given a model $f$, we can search for certificates of sub-optimality, and if we find them, we can make quantitative progress towards improving our model. We can then repeat the process. The guarantee of Theorem 10 is that this process of searching and updating cannot go on for very many rounds before we arrive at a model that our search process is unable to demonstrate is not Bayes optimal. How interesting this is depends on what our search process is.
Suppose, for example, that we have an optimization algorithm that for some class of (group, model) pairs $\mathcal{C}$ can find a certificate of sub-optimality $(g,h) \in \mathcal{C}$ whenever one exists. Paired with our update algorithm, we obtain an algorithm which quickly converges to an $(\epsilon, \mathcal{C})$-Bayes optimal model. We give such an algorithm in Section 4.2.
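The search-and-update loop can be sketched for the simplest possible case, in which the class of (group, model) pairs is a small finite list that can be scanned exhaustively. This is an illustration under stated assumptions, not the optimization procedure of Section 4.2: the names `DecisionList` and `train`, the 0/1 loss, the use of empirical rather than distributional quantities, and the toy candidates are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[int], int]
Group = Callable[[int], int]


class DecisionList:
    """A base model plus (group, model) rules; the newest rule is checked first."""

    def __init__(self, base: Model) -> None:
        self.base = base
        self.rules: List[Tuple[Group, Model]] = []

    def prepend(self, g: Group, h: Model) -> None:
        self.rules.insert(0, (g, h))

    def __call__(self, x: int) -> int:
        for g, h in self.rules:
            if g(x) == 1:
                return h(x)      # only one model is ever evaluated per point
        return self.base(x)


def train(data: Sequence[Tuple[int, int]], base: Model,
          candidates: Sequence[Tuple[Group, Model]], eps: float) -> DecisionList:
    """Apply updates until no candidate has empirical mass * improvement >= eps."""
    if eps <= 0:
        raise ValueError("eps must be positive for the loop to terminate")
    model = DecisionList(base)
    while True:
        best, best_gain = None, eps
        for g, h in candidates:
            # Empirical (mass of g) * (improvement of h over the model on g).
            gain = sum(g(x) * (int(model(x) != y) - int(h(x) != y))
                       for x, y in data) / len(data)
            if gain >= best_gain:
                best, best_gain = (g, h), gain
        if best is None:
            return model
        model.prepend(*best)


data = [(0, 0), (1, 0), (2, 1), (3, 1)]
always_one = lambda x: 1
candidates = [(lambda x: 1, always_one), (lambda x: int(x >= 2), always_one)]
model = train(data, lambda x: 0, candidates, eps=0.1)
```

Each accepted update lowers the empirical error by at least `eps`, so the loop in this sketch can run for at most (initial error) / `eps` rounds, mirroring the potential argument of Theorem 10.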
Suppose alternately that we open the search for certificates of sub-optimality to a large and motivated population: for example, to machine learning engineers, regulators and the general public, incentivized by possibly large monetary rewards. In this case, the guarantee of Theorem 10 is that the process of iteratively opening our models up to scrutiny and updating whenever certificates of suboptimality are found cannot go on for too many rounds: at convergence, it must be either that our deployed model is $\epsilon$-Bayes optimal, or that if not, at least nobody can find any evidence to contradict this hypothesis. Since in general it is not possible to falsify Bayes optimality given only a polynomial amount of data and computation, this is in a strong sense the best we can hope for. We give a procedure for checking large numbers of arbitrarily complex submitted proposals for certificates of sub-optimality (e.g. that arrive as part of a bias bounty program) in Section 4.1. There are two remaining obstacles, which we address in the next sections:
Our analysis so far is predicated on our update algorithm being given certificates of sub-optimality $(g,h)$. But $\mu$ and $\Delta$ are defined with respect to the distribution $\mathcal{D}$, and we will not have direct access to $\mathcal{D}$; we will only have samples drawn from $\mathcal{D}$. So how can we find certificates of sub-optimality and check their parameters? In an algorithmic application in which we search for certificates within a restricted class $\mathcal{C}$, we can appeal to uniform convergence bounds, but the bias bounty application poses additional challenges. In this case, the certificates are not constrained to come from any fixed class, and so we cannot appeal to uniform convergence results. If we are opening up the search for certificates of sub-optimality to the public (with large monetary incentives), we also need to be prepared to handle a very large number of submissions. In Section 4.1 we show how to use techniques from adaptive data analysis to re-use a small holdout set to check a very large number of submissions, while maintaining strong worst case guarantees (Dwork et al., 2015c, b, a; Blum and Hardt, 2015; Bassily et al., 2021; Jung et al., 2020).
Theorem 9 gives us a guarantee that whenever we are given a certificate of sub-optimality $(g_{t+1}, h_{t+1})$, our new model $f_{t+1}$ makes improvements both with respect to its error on $g_{t+1}$, and with respect to overall error. But it does not promise that the update does not increase error for some other previously identified group $g_i$ for $i \le t$. This would be particularly undesirable in a “bias bounty” application, and would represent the kind of tradeoff that our framework aims to circumvent. However, we show in Section 4.1.1 that (up to small additive terms that come from statistical estimation error), our updates can be made to be groupwise monotonically error improving, in the sense that the update at time $t$ does not increase the error for any group identified at any time $i \le t$.
4 Obtaining Certificates of Suboptimality
In this section we show how to find and verify proposed certificates of sub-optimality given only a finite sample of data $D \sim \mathcal{D}^n$. We consider two important cases:
In Section 4.1, we consider the “bias bounty” application in which the discovery of certificates of sub-optimality is crowd-sourced (aided perhaps with API access to the model and a training dataset). In this case, we face two main difficulties:
The submitted certificates might be arbitrary (and in particular, not guaranteed to come from a class of bounded complexity or expressivity), and
We expect to receive a very large number of submitted certificates, all of which need to be checked.
The first of these difficulties means that we cannot appeal to uniform convergence arguments to obtain rigorous bounds on the sub-optimality parameters $\mu$ and $\Delta$. The second of these difficulties means that we cannot naively rely on estimates from a (single, re-used) holdout set to obtain rigorous bounds on $\mu$ and $\Delta$.
In Section 4.2 we consider the algorithmic application in which the discovery of certificates is treated as an optimization problem over $\mathcal{C} = \mathcal{G} \times \mathcal{H}$, for particular classes $\mathcal{G}$ and $\mathcal{H}$. In this case we give two algorithms for finding $(\epsilon, \mathcal{C})$-Bayes optimal models via efficient reductions to cost sensitive classification problems over an appropriately defined class, solved over a polynomially sized dataset sampled from the underlying distribution.
4.1 Unrestricted Certificates and Bias Bounties
In this section we develop a procedure to re-use a holdout set to check the validity of a very large number of proposed certificates of sub-optimality with rigorous guarantees. Here we make no assumptions at all about the structure or complexity of either the groups or models , or the process by which they are generated. This allows us the flexibility to model e.g. a public bias bounty, in which a large collection of human competitors use arbitrary methods to find and propose certificates of sub-optimality, potentially adaptively as a function of all of the information that they have available. We use simple description length techniques developed in the adaptive data analysis literature (Dwork et al., 2015b; Blum and Hardt, 2015). Somewhat more sophisticated techniques which employ noise addition (Dwork et al., 2015c, a; Bassily et al., 2021; Jung et al., 2020) could also be directly used here to improve the sample complexity bound in Theorems 11 and 12 by a factor, but we elide this for clarity of exposition. First we give a simple algorithm (Algorithm 2) that takes as input a stream of arbitrary adaptively chosen triples , and checks if each form a certificate of sub-optimality for . We then use this as a sub-routine in Algorithm 3 which maintains a sequence of models produced by ListUpdate (Algorithm 1) and takes as input a sequence of proposed certificates which claim to be certificates of sub-optimality for the current model : it updates the current model whenever such a proposed certificate is verified.
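The checking procedure described above can be sketched as a loop over submitted triples that reports only accept or reject decisions. This is a hedged illustration rather than Algorithm 2 itself: the function name `certificate_checker`, the 0/1 loss, the single acceptance `threshold`, and the stopping parameters `max_submissions` and `max_accepts` are assumptions, and the constants used in the formal guarantees are not reproduced.

```python
from typing import Callable, Iterable, Iterator, Sequence, Tuple

Model = Callable[[int], int]
Group = Callable[[int], int]
Triple = Tuple[Model, Group, Model]  # (current model f, group g, proposed model h)


def certificate_checker(
    holdout: Sequence[Tuple[int, int]],
    submissions: Iterable[Triple],
    threshold: float,
    max_submissions: int,
    max_accepts: int,
) -> Iterator[bool]:
    """Yield accept (True) or reject (False) for each submitted triple.

    The test statistic is the holdout average of g(x) * (loss of f - loss of h),
    which equals the empirical group mass times the empirical improvement.
    Checking halts after max_submissions triples or max_accepts acceptances.
    """
    accepts = 0
    for i, (f, g, h) in enumerate(submissions):
        if i >= max_submissions or accepts >= max_accepts:
            return
        gain = sum(g(x) * (int(f(x) != y) - int(h(x) != y))
                   for x, y in holdout) / len(holdout)
        accepted = gain >= threshold
        accepts += int(accepted)
        yield accepted


holdout = [(0, 0), (1, 0), (2, 1), (3, 1)]
f = lambda x: 0
g = lambda x: int(x >= 2)
h_bad = lambda x: 0     # no better than f on the group
h_good = lambda x: 1    # fixes every error of f on the group
stream = [(f, g, h_bad), (f, g, h_good), (f, g, h_good)]
transcript = list(certificate_checker(holdout, stream, threshold=0.25,
                                      max_submissions=10, max_accepts=1))
```

Because the checker is a generator, a submitter's next triple may depend on earlier decisions, and the cap on acceptances keeps the number of distinct decision sequences small, which is the property the description-length argument relies on.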
Let be any distribution over labelled examples, and let be a holdout dataset consisting of i.i.d. samples from . Suppose:
Let be the output stream generated by CertificateChecker (Algorithm 2). Then for any possibly adaptive process generating a stream of up to submissions as a function of the output stream , with probability over the randomness of :
For every round such that (the submission is rejected), we have that is not a certificate of sub-optimality for for any with . And:
For every round such that (the submission is accepted), we have that is a -certificate of sub-optimality for for .
The high level idea of the proof is as follows: For any fixed (i.e. non-adaptively chosen) sequence of submissions, a Chernoff bound and a union bound are enough to argue that the estimate of the product of the parameters and on the holdout set is with high probability close to their expected value on the underlying distribution. We then observe that submissions depend on the holdout set only through the transcript , and so are able to union bound over all possible transcripts . Since contains only instances in which the submission is accepted, the number of such transcripts grows only polynomially in rather than exponentially, and so we can union bound over all transcripts with only a logarithmic dependence in .
We first consider any fixed triple of functions . Observe that we can write:
Since each is drawn independently from , each term in the sum is an independent random variable taking value in the range. Thus is the average of independent bounded random variables, and we can apply a Chernoff bound to conclude that for any value of :
Solving for we have that with probability , we have if:
This analysis was for a fixed triple of functions , but these triples can be chosen arbitrarily as a function of the transcript . We therefore need to count how many transcripts might arise. By construction, has length at most and has at most indices such that . Thus the number of transcripts that can arise is at most: , and each transcript results in some sequence of triples . Thus for any mechanism for generating triples from transcript prefixes, there are at most triples that can ever arise. We can complete the proof by union bounding over this set. Taking and plugging into our Chernoff bound above, we obtain that with probability over the choice of , for any method of generating a sequence of triples from transcripts , we have that: so long as:
Finally, note that whenever this event obtains, the conclusions of the theorem hold, because we have that exactly when . In this case, as desired. Similarly, whenever , we have that as desired. ∎
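As an illustration of the kind of check analyzed above, the following is a minimal Python sketch of a holdout-based certificate checker. It is not the paper's Algorithm 2: the 0/1 loss, the integer toy features, and the names `weighted_improvement`, `certificate_checker`, `threshold`, `max_accepts`, and `max_submissions` are all assumptions made for the example. The statistic it computes is the empirical product discussed in the proof, written as an average of per-example terms.

```python
from typing import Callable, Iterable, Iterator, Sequence, Tuple

Example = Tuple[int, int]        # (feature, label); ints keep the toy data simple
Model = Callable[[int], int]     # feature -> predicted label
Group = Callable[[int], bool]    # feature -> is the point in the group?


def weighted_improvement(f: Model, g: Group, h: Model,
                         holdout: Sequence[Example]) -> float:
    """Average over the holdout set of g(x) * (loss of f - loss of h), 0/1 loss.

    Each term lies in {-1, 0, 1}, so the result is an average of bounded terms.
    """
    total = 0
    for x, y in holdout:
        if g(x):
            total += int(f(x) != y) - int(h(x) != y)
    return total / len(holdout)


def certificate_checker(submissions: Iterable[Tuple[Model, Group, Model]],
                        holdout: Sequence[Example],
                        threshold: float,
                        max_accepts: int,
                        max_submissions: int) -> Iterator[bool]:
    """Yield True (accept) or False (reject) for each submitted triple (f, g, h).

    Assumed halting rule: stop once max_accepts triples have been accepted
    or max_submissions triples have been processed.
    """
    accepted = 0
    for round_index, (f, g, h) in enumerate(submissions):
        if round_index >= max_submissions or accepted >= max_accepts:
            return
        verdict = weighted_improvement(f, g, h, holdout) >= threshold
        accepted += int(verdict)
        yield verdict


# Toy usage: the current model always predicts 0; the group is x >= 2.
holdout = [(0, 0), (1, 1), (2, 1), (3, 1)]
always_zero = lambda x: 0
always_one = lambda x: 1
upper_half = lambda x: x >= 2
verdicts = list(certificate_checker(
    [(always_zero, upper_half, always_one),    # genuine improvement on the group
     (always_zero, upper_half, always_zero)],  # no improvement at all
    holdout, threshold=0.25, max_accepts=5, max_submissions=10))
```

Because the checker is a generator, a submitter only ever observes the accept/reject verdicts, which is the sense in which submissions depend on the holdout set only through the transcript.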
FalsifyAndUpdate (Algorithm 3) is an algorithm that:
Persistently maintains a public current model, and elicits a stream of submissions attempting to falsify the hypothesis that the current model is approximately Bayes optimal,
With high probability does not reject any submissions that falsify the assertion that is -Bayes optimal,
With high probability does not accept any submissions that do not falsify the assertion that is -Bayes optimal,
Whenever it accepts a submission , it updates the current model and outputs a new model such that and such that no longer falsifies the sub-optimality of , and
With high probability does not halt until receiving submissions.
Fix any . Let be any distribution over labelled examples, and let be a holdout dataset consisting of i.i.d. samples from . Suppose:
Then for any (possibly adaptive) process generating a sequence of at most submissions , with probability at least , we have that FalsifyAndUpdate satisfies:
If is rejected, then is not a -certificate of sub-optimality for , where is the current model at the time of submission , for any such that .
If is accepted, then is a -certificate of sub-optimality for , where is the current model at the time of submission , for some such that . Moreover, the new model output satisfies and .
FalsifyAndUpdate does not halt before receiving all submissions.
At a high level, this proof reduces to the guarantees of CertificateChecker and ListUpdate. Note that the models produced in the run of this algorithm depend on the holdout set only through the transcript produced by CertificateChecker, i.e. given the stream of submissions and the output of CertificateChecker, one can reproduce the decision lists output by FalsifyAndUpdate. Thus we inherit the sample complexity bounds proven for CertificateChecker.
This theorem follows straightforwardly from the properties of Algorithm 1 and Algorithm 2. From Theorem 11, we have that with probability , every submission accepted by CertificateChecker (and hence by FalsifyAndUpdate) is a -certificate of sub-optimality for with and every submission rejected is not a -certificate of sub-optimality for any with .
Whenever this event obtains, every call that FalsifyAndUpdate makes to is such that is a -certificate of sub-optimality for for . Therefore by Theorem 9, we have that and . Finally, by Theorem 10, if each invocation of the iteration is such that is a -certificate of sub-optimality for with , then there can be at most such invocations. Since FalsifyAndUpdate makes one such invocation for every submission that is accepted, this means there can be at most submissions accepted in total. But CertificateChecker has only two halting conditions: it halts when either more than submissions are accepted, or when submissions have been made in total. Because with probability at least no more than submissions are accepted, it must be that with probability , FalsifyAndUpdate does not halt until all submissions have been received. ∎
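The reduction used in this proof can be made concrete with a short Python sketch of a falsify-and-update loop. It is a simplification under stated assumptions rather than the paper's Algorithm 3: it uses 0/1 loss, checks each pair directly on the holdout set, and takes a fixed list of submissions rather than an adaptive stream. The names `list_update`, `falsify_and_update`, `threshold`, and `max_accepts` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[int], int]
Group = Callable[[int], bool]


def list_update(f: Model, g: Group, h: Model) -> Model:
    """Return a model that defers to h on group g and to f everywhere else."""
    return lambda x: h(x) if g(x) else f(x)


def falsify_and_update(f0: Model,
                       submissions: Sequence[Tuple[Group, Model]],
                       holdout: Sequence[Tuple[int, int]],
                       threshold: float,
                       max_accepts: int) -> List[Model]:
    """Return the sequence of models produced, starting with f0."""
    def gap(f: Model, g: Group, h: Model) -> float:
        # Empirical average of g(x) * (0/1 loss of f - 0/1 loss of h).
        return sum(g(x) * ((f(x) != y) - (h(x) != y))
                   for x, y in holdout) / len(holdout)

    model, history = f0, [f0]
    for g, h in submissions:
        if len(history) - 1 >= max_accepts:
            break  # assumed halting rule: cap on accepted submissions
        if gap(model, g, h) >= threshold:
            model = list_update(model, g, h)
            history.append(model)
    return history


def overall_error(f: Model, holdout: Sequence[Tuple[int, int]]) -> float:
    return sum(f(x) != y for x, y in holdout) / len(holdout)


holdout = [(0, 0), (1, 1), (2, 1), (3, 1)]
history = falsify_and_update(
    f0=lambda x: 0,
    submissions=[(lambda x: x >= 2, lambda x: 1),   # accepted
                 (lambda x: x == 0, lambda x: 1),   # rejected: h is worse on g
                 (lambda x: x >= 1, lambda x: 1)],  # accepted
    holdout=holdout, threshold=0.25, max_accepts=10)
errors = [overall_error(f, holdout) for f in history]
```

In this sketch an accepted update changes predictions only on the submitted group, so the overall holdout error falls by exactly the measured gap. That is why the number of accepted updates is bounded once the threshold is positive.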
Note that FalsifyAndUpdate has sample complexity scaling only logarithmically with the total number of submissions that it can receive, and no dependence on the complexity of the submissions. This means that a relatively small holdout dataset is sufficient to run an extremely long-running bias bounty program (i.e. one handling a number of submissions that is exponentially large in the size of the holdout set) that automatically updates the current model whenever submissions are accepted and bounties are awarded.
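To give a rough numerical feel for this logarithmic scaling, the sketch below counts accept/reject transcripts with a bounded number of acceptances and plugs the count into a textbook Hoeffding bound for averages of terms in [-1, 1]. The constants are those of the generic Hoeffding inequality and are not the constants of Theorems 11 and 12. The function names, the choice to union bound over `total_submissions * num_transcripts(...)` events, and the parameter values are assumptions made for illustration.

```python
import math


def num_transcripts(total_submissions: int, max_accepts: int) -> int:
    """Number of accept/reject strings of the given length with at most
    max_accepts acceptances: the sum over k of C(total_submissions, k)."""
    return sum(math.comb(total_submissions, k) for k in range(max_accepts + 1))


def hoeffding_holdout_size(num_events: int, tolerance: float, delta: float) -> int:
    """Smallest n with 2 * num_events * exp(-n * tolerance**2 / 2) <= delta.

    This is Hoeffding's inequality for an average of n independent terms in
    [-1, 1], combined with a union bound over num_events fixed triples.
    """
    return math.ceil(2 * math.log(2 * num_events / delta) / tolerance ** 2)


def holdout_size(total_submissions: int, max_accepts: int,
                 tolerance: float, delta: float) -> int:
    # Assumed accounting: every transcript yields at most total_submissions triples.
    events = total_submissions * num_transcripts(total_submissions, max_accepts)
    return hoeffding_holdout_size(events, tolerance, delta)


small = holdout_size(total_submissions=10**3, max_accepts=5, tolerance=0.05, delta=0.05)
large = holdout_size(total_submissions=10**6, max_accepts=5, tolerance=0.05, delta=0.05)
```

Under these assumptions, raising the number of submissions from a thousand to a million (a factor of 1000) increases the required holdout size by only a small constant factor, since the count of transcripts is polynomial in the number of submissions and enters the bound through a logarithm.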
4.1.1 Guaranteeing Groupwise Monotone Improvements
FalsifyAndUpdate (Algorithm 3) has the property that whenever it accepts a submission falsifying that its current model is -Bayes optimal, it produces a new model whose overall loss has decreased by at least . It also promises that has strictly lower loss than on group . However, because the groups can have arbitrary intersections, this does not imply that has error that is lower than that of on groups that were previously identified. Specifically, let denote the set of at most groups that make up the internal nodes of decision list (i.e. the set of groups corresponding to submissions that were previously accepted and incorporated into model ). It might be that for some , . This kind of non-monotonic behavior is extremely undesirable in the context of a bias bug bounty program, because it means that previous instances of sub-optimal behavior on a group that were explicitly identified and corrected for can be undone by future updates. Note that simply repeating the update when this occurs does not solve the problem: this would return the performance of the model on to what it was at the time that it was originally introduced, but since the model’s performance on might have improved in the meantime, it would not guarantee groupwise error monotonicity.
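A small constructed example makes the non-monotonicity concrete. The data, groups, and models below are invented for illustration (six points that all have label 1, and two groups that overlap at the single point x = 2), and `update` is a hypothetical stand-in for a decision-list update under 0/1 loss.

```python
from typing import Callable

Model = Callable[[int], int]
Group = Callable[[int], bool]

holdout = [(x, 1) for x in range(6)]  # toy data: every label is 1


def update(f: Model, g: Group, h: Model) -> Model:
    """Defer to h on group g, and to f elsewhere."""
    return lambda x: h(x) if g(x) else f(x)


def group_error(f: Model, g: Group) -> float:
    members = [(x, y) for x, y in holdout if g(x)]
    return sum(f(x) != y for x, y in members) / len(members)


everyone: Group = lambda x: True
g1: Group = lambda x: x <= 2          # first identified group: {0, 1, 2}
g2: Group = lambda x: x >= 2          # second group: {2, 3, 4, 5}, overlaps g1 at x = 2
h1: Model = lambda x: 1               # perfect on g1
h2: Model = lambda x: int(x >= 3)     # right on {3, 4, 5}, wrong on x = 2

f0: Model = lambda x: 0
f1 = update(f0, g1, h1)               # fixes g1 completely
f2 = update(f1, g2, h2)               # improves g2 and overall, but breaks x = 2

report = {
    "overall": (group_error(f1, everyone), group_error(f2, everyone)),
    "g1": (group_error(f1, g1), group_error(f2, g1)),
    "g2": (group_error(f1, g2), group_error(f2, g2)),
}
```

Here the second update lowers both the overall error and the error on the second group, yet the error on the first group rises from zero to one third, because the new model now defers to `h2` on the shared point.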
There is a simple fix, however: whenever a new proposed certificate of sub-optimality for a model is accepted and a new model is generated, add the proposed certificates to the front of the stream of submissions, for each pair of and . Updates resulting from these submissions (which we call repairs) might themselves generate new non-monotonicities, but repeating this process recursively is sufficient to guarantee approximate groupwise monotonicity. Because we know from Theorem 10 that the total number of updates cannot exceed , this process never adds more than submissions to the existing stream, and thus affects the sample complexity bound only by lower-order terms. This is because for each of the at most updates, there are at most proposed certificates that can be generated in this way. Moreover, if any of these constructed submissions trigger a model update, these updates too count towards the limit of updates that can ever occur, and so do not increase the maximum length of the decision list that is ultimately output. The procedure, which we call MonotoneFalsifyAndUpdate, is described as Algorithm 4. Here we state its guarantee:
Fix any . Let be any distribution over labelled examples, and let be a holdout dataset consisting of i.i.d. samples from . Suppose:
Then for any (possibly adaptive) process generating a sequence of at most submissions , with probability at least , we have that MonotoneFalsifyAndUpdate satisfies all of the properties proven in Theorem 12 for FalsifyAndUpdate, and additionally satisfies the following error monotonicity property. Consider any model that is output, and any group . Then:
The proofs that MonotoneFalsifyAndUpdate satisfies the first two conclusions of Theorem 12:
If is rejected, then is not a -certificate of sub-optimality for , where is the current model at the time of submission , for any such that .
If is accepted, then is a -certificate of sub-optimality for , where is the current model at the time of submission , for some such that . Moreover, the new model output satisfies .
are identical to those given for FalsifyAndUpdate, and we do not repeat them here. We must show that with probability , CertificateChecker (and hence MonotoneFalsifyAndUpdate) does not halt before processing all submissions. Note that MonotoneFalsifyAndUpdate initializes an instance of CertificateChecker that will not halt before receiving many submissions. Thus it remains to verify that our algorithm does not produce more than many submissions to CertificateChecker in its monotonicity update process. But this will be the case, because by Theorem 10, , and so after each call to ListUpdate, we generate at most many submissions to CertificateChecker. Since there can be at most such calls to ListUpdate, the claim follows.
To see that the monotonicity property holds, assume for the sake of contradiction that it does not, i.e. that there is a model , a group , and a model with such that:
In this case, the pair would form a -certificate of sub-optimality for with . But if , then this certificate must have been rejected, which we have already established is an event that occurs with probability at most . ∎
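The repair mechanism behind this guarantee can be sketched in Python as follows. This is an illustrative simplification, not the paper's Algorithm 4: it assumes 0/1 loss, checks every pair directly on the holdout set, requires a strictly positive `threshold` so that the loop terminates, and re-submits every (previously accepted group, earlier model) pair after each update. All names are hypothetical.

```python
from collections import deque
from typing import Callable, List, Sequence, Tuple

Model = Callable[[int], int]
Group = Callable[[int], bool]


def monotone_falsify_and_update(
        f0: Model,
        submissions: Sequence[Tuple[Group, Model]],
        holdout: Sequence[Tuple[int, int]],
        threshold: float) -> Tuple[Model, List[Model], List[Group]]:
    """Run updates with repairs; return (final model, all models, accepted groups)."""
    def gap(f: Model, g: Group, h: Model) -> float:
        return sum(g(x) * ((f(x) != y) - (h(x) != y))
                   for x, y in holdout) / len(holdout)

    def update(f: Model, g: Group, h: Model) -> Model:
        return lambda x: h(x) if g(x) else f(x)

    model = f0
    models: List[Model] = [f0]
    groups: List[Group] = []
    queue = deque(submissions)
    while queue:
        g, h = queue.popleft()
        if gap(model, g, h) < threshold:
            continue                      # rejected: current model is kept
        model = update(model, g, h)       # accepted: patch the model on g
        if g not in groups:
            groups.append(g)
        # Repairs: ask whether any earlier model beats the new one on any
        # accepted group, and put those questions at the front of the stream.
        repairs = [(gj, fl) for gj in groups for fl in models]
        queue.extendleft(reversed(repairs))
        models.append(model)
    return model, models, groups


def group_error(f: Model, g: Group, holdout: Sequence[Tuple[int, int]]) -> float:
    members = [(x, y) for x, y in holdout if g(x)]
    return sum(f(x) != y for x, y in members) / len(members)


# Toy data: six points, all labelled 1; the two groups overlap at x = 2.
holdout = [(x, 1) for x in range(6)]
g1: Group = lambda x: x <= 2
g2: Group = lambda x: x >= 2
final, models, groups = monotone_falsify_and_update(
    f0=lambda x: 0,
    submissions=[(g1, lambda x: 1), (g2, lambda x: int(x >= 3))],
    holdout=holdout, threshold=0.1)
g1_errors = [group_error(f, g1, holdout) for f in models]
```

On this toy input the second submitted update raises the error on the first group, the repair pairing that group with the earlier model is then accepted, and the final model is no worse on the first group than any earlier model. Each accepted update, repairs included, lowers overall holdout error by at least `threshold`, which is what bounds the number of iterations in this sketch.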