Fair Allocation through Selective Information Acquisition

by William Cai, et al.

Public and private institutions must often allocate scarce resources under uncertainty. Banks, for example, extend credit to loan applicants based in part on their estimated likelihood of repaying a loan. But when the quality of information differs across candidates (e.g., if some applicants lack traditional credit histories), common lending strategies can lead to disparities across groups. Here we consider a setting in which decision makers – before allocating resources – can choose to spend some of their limited budget further screening select individuals. We present a computationally efficient algorithm for deciding whom to screen that maximizes a standard measure of social welfare. Intuitively, decision makers should screen candidates on the margin, for whom the additional information could plausibly alter the allocation. We formalize this idea by showing the problem can be reduced to solving a series of linear programs. Both on synthetic and real-world datasets, this strategy improves utility, illustrating the value of targeted information acquisition in such decisions. Further, when there is social value for distributing resources to groups for whom we have a priori poor information – like those without credit scores – our approach can substantially improve the allocation of limited assets.



1 Introduction

Approximately one in seven U.S. households have unmet demand for small-dollar loans, and are often unable to secure credit from traditional financial institutions as they have little or no formal credit history [fdic2018]. However, in the majority of these households, individuals receive regular income and typically pay their bills on time, which suggests many in fact would have low risk of default [fdic2018]. One barrier to providing loans to this low-risk yet underserved subpopulation is that it can be more expensive and time-consuming to screen individuals with non-traditional financial histories, limiting the inclusiveness of the banking system.

Motivated by this problem, we formalize and analyze a general setting in which one must allocate limited funds both to screen applicants and to distribute resources. The task for a budget-constrained decision maker is thus to first select a set of candidates to screen and then, given the results of that screening process, determine to whom to allocate the remaining resources. We assume there is a fixed cost for screening each applicant, and that decision makers have prior knowledge of the distribution of information they would receive if they choose to screen an applicant. In practice, such prior knowledge could be obtained by screening a small random sample of applicants to learn the resulting information distributions.

We derive an efficient algorithm for computing an optimal, utility-maximizing strategy for the general screening and allocation problem. To do so, we first show that once a set of candidates has been selected to screen, it is optimal to allocate the remaining resources according to a threshold rule, with assets distributed to those candidates having post-screening expected utility above a fixed threshold. Further, for any fixed threshold policy, we show that one can find the optimal set of candidates to screen (while satisfying the budget constraint) via a linear program. Intuitively, one should screen candidates near the margin, for whom the screening process could reveal information that could push a candidate across the threshold. But to do this rigorously, one also needs to account for the precise structure of the prior information. Finally, we sweep over the possible thresholds, solving the corresponding linear program at each point. In this manner, we obtain both a rule to screen candidates and a specific threshold policy for distributing funds to candidates with sufficiently high post-screening value.

We further consider an extension of the above problem in which policymakers have explicit value for diversity. For example, instead of simply finding a max-utility policy, one might maximize utility subject to the constraint that a particular group—for example, those who traditionally have had limited access to credit markets—achieve at least a fixed minimum utility. In the United States, those with unmet demand for credit are disproportionately black and Hispanic [fdic2018], heightening the value of diversity considerations in allocation decisions. We show that this (and related) extensions can be incorporated into our general algorithmic approach in a straightforward manner.

To demonstrate the potential value of augmenting allocation decisions with a screening phase, we apply our methods to both synthetic datasets and one with real measures of creditworthiness. In particular, we examine the potential benefits of screening as a function of the cost and value of information. Especially when we impose a diversity constraint, we find that screening strategies can significantly outperform a naive strategy that simply attempts to satisfy the constraint without screening any applicants.

For concreteness, we frame our discussion in terms of lending decisions, though our approach applies to many allocation settings. For example, it is often challenging to accurately assess household wealth—particularly in countries where informal and irregular work is more common—and, in turn, to appropriately target the distribution of government subsidies [noriega2019active]. The simple strategy of distributing funds to those families clearly in need can systematically overlook populations with harder-to-verify financial status. As with lending, one can judiciously allocate some of the budget to more extensively screen certain applicants, ensuring funds are ultimately distributed to those who can benefit the most.

2 Related Work

Several papers address the problem of active feature acquisition [melville2004active, melville2005expected, saar2009active], where one can selectively purchase missing data to improve the overall out-of-sample performance of a statistical model. We consider the related problem of acquiring features to identify specific, high-value individuals. While there is some shared intuition between the two settings— that one should seek information on individuals most likely to alter downstream decisions—the technical approach we take is different, in large part because our end goal is optimal allocation rather than statistical learning.

In a related, recent stream of research, bakker2019fairness and noriega2019active likewise consider a feature acquisition problem, but with fairness constraints. In their setting, the decision maker must acquire additional features for each individual to ensure classification decisions have similar error rates across groups (including parity in false negative and false positive rates), a common measure of fairness in the machine learning community [hardt2016, kleinberg2016inherent, chouldechova2016fair]. Our approach to the problem differs in three important respects. First, we adopt the perspective of constrained utility maximization. Past work has shown that directly equalizing error rates can lead to outcomes that, counterintuitively, may harm the very groups they were designed to protect [corbett2018measure, corbett2017, liu2018delayed]. We avoid such deleterious outcomes by instead framing the problem explicitly in terms of group-specific utilities: fairness is encoded into our requirement that the decision maker must allocate some minimum amount of utility to each group. Second, we focus on one-shot screening decisions, in which decision makers simply choose whether or not to acquire information on each individual, rather than sequentially deciding how much information to acquire based on the results of each past acquisition decision. Our one-shot formulation maps to the binary decision structure (i.e., to screen or not to screen) common in many institutions and leads to different optimization challenges. Third, we directly model the tradeoff between screening and allocation decisions by tying both to a common budget constraint (i.e., more screening means fewer funds are available to ultimately distribute to individuals).

Finally, our work touches on research from the fair division and allocation literature [brams1996fair, moulin2004fair, thomson2011fair], which considers how to share resources while satisfying fairness properties defined between individuals. The former devises mechanisms wherein strategic agents self-divide the resources fairly, and the latter studies the existence of allocations that jointly satisfy various fairness notions. In contrast to our work, that line of research is particularly concerned with individual incentives, strategic action, and equilibrium effects.

3 A Model of Screening and Allocation

We model screening and allocation decisions as a sequential process in which a budget-constrained lender first selects a (possibly random) subset of candidates to further screen from a pool of applicants, and then, based on the information revealed in that screening phase, selects a second (possibly random) subset of candidates to receive a loan.

We assume the value of lending to an applicant $i$ is given by the random variable $U_i$. These utilities are intended to capture the full social value of providing loans, and we imagine the lender aims to optimize social welfare, as in the case of a government agency. In general, the lender has only partial information about $U_i$. More specifically, if the lender chooses not to screen an applicant, we assume the lender knows only the applicant's conditional expectation $\mu_i = \mathbb{E}[U_i \mid X_i]$ given their pre-screening covariates $X_i$, such as credit score for those applicants who have traditional credit histories. On the other hand, if the lender opts to screen an applicant, they learn $\mathbb{E}[U_i \mid X_i, Y_i]$, where $Y_i$ denotes the additional information one gains through screening. For example, $Y_i$ might encode applicant $i$'s history of paying their electricity or phone bills, information that is often feasible to acquire with some extra effort and which is a good indicator of creditworthiness [fdic2018].

When deciding whom to screen, we assume the lender knows the distribution of $V_i \mid X_i$, where $V_i = \mathbb{E}[U_i \mid X_i, Y_i]$. That is, the lender knows how their estimate of utility could change if they decide to screen each applicant, where these distributions may depend on the available pre-screening covariates. A lender may, for example, thus choose only to screen applicants whose estimate is likely to substantially change given additional information. We further assume the lender must pay a fixed cost $c_s$ for screening an applicant and a cost $c_\ell$ for underwriting a loan. For simplicity we assume these costs do not vary across applicants, though it is straightforward to extend to the more general case; see Appendix A.

Based on knowledge of the above information and cost structure, the lender selects a randomized strategy to screen applicants. That is, the lender chooses a vector $s \in [0, 1]^n$, where $n$ is the number of applicants, meaning that each applicant $i$ is selected to be screened independently with probability $s_i$. Let $S_i$ indicate whether applicant $i$ is ultimately screened under this policy; therefore, $S_i$ is a Bernoulli random variable with probability of success $s_i$.

Given this randomized screening policy, we can now write the information the lender has at the end of the screening phase as follows:

$$Z_i = S_i V_i + (1 - S_i)\, \mu_i. \tag{1}$$

In other words, $Z_i$ is the lender's post-screening estimated utility of giving a loan to applicant $i$. In particular, if $S_i = 1$ (i.e., the applicant is screened), the lender's estimate changes from $\mu_i = \mathbb{E}[U_i \mid X_i]$, the estimate based only on applicant $i$'s pre-screening covariates $X_i$, to $V_i = \mathbb{E}[U_i \mid X_i, Y_i]$, which incorporates the post-screening information $Y_i$.

Finally, the lender chooses an allocation policy $a$, where $a_i(Z) \in [0, 1]$ specifies the probability a loan is (independently) offered to applicant $i$. Importantly, $a$ is a function of the lender's post-screening utility estimates $Z = (Z_1, \dots, Z_n)$. For example, the lender might give loans to the individuals with the highest post-screening utility estimates, up to the budget constraint.

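Combining all of the above, the lender's optimization problem is to choose screening and allocation policies $(s, a)$ that maximize expected welfare,

$$\max_{s,\, a} \; \mathbb{E}\left[\sum_{i=1}^{n} a_i(Z)\, U_i\right], \tag{2}$$

subject to being budget-balanced in expectation,

$$\mathbb{E}\left[\sum_{i=1}^{n} \bigl(c_s S_i + c_\ell\, a_i(Z)\bigr)\right] \le b, \tag{3}$$

where $b$ is a fixed, non-negative constant, $c_s$ and $c_\ell$ are the costs of screening an applicant and of underwriting a loan, and $S_i$ indicates whether applicant $i$ is screened.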

In some settings, decision makers may value diversity in their allocations. For example, they may wish to ensure a certain minimum number of loans are provided to groups that historically have been excluded from credit markets. One can encode this policy preference directly into the utilities, in which case the resulting optimization problem would incorporate one’s value for diversity. That approach, however, requires decision makers to agree upon these utilities to interpret the results, which can be challenging.

Here we take a complementary approach that explicitly allows value for diversity to differ across decision makers. Suppose $G_1, \dots, G_K$ partitions the applicant pool into $K$ groups; for example, if $K = 2$, we might partition candidates into those who traditionally have had access to credit markets and those who have not. Then we require the selected policy to allocate at least $u_k$ utility to group $G_k$, where these utilities do not themselves include any value for diversity:

$$\mathbb{E}\left[\sum_{i \in G_k} a_i(Z)\, U_i\right] \ge u_k \quad \text{for } k = 1, \dots, K. \tag{4}$$

In practice, as we discuss below, one would solve this optimization problem for a range of $u_k$, which traces out the Pareto frontier of possible policies across different group constraints, corresponding to different values for diversity. The diversity condition above is expressed in terms of utility, but we might, alternatively, simply lower bound the number of loans given to members of each group $G_k$. This alternative constraint can be handled in a straightforward manner by our algorithm detailed below.

A stylized example

We illustrate the above ideas in the context of a simple, stylized example. Suppose a lender must decide how best to allocate loans among an applicant pool of 13 people, with an overall budget of $b$. Providing a loan costs $c_\ell$, and additional screening of an applicant costs $c_s$.

Further suppose that five of the applicants are able to provide rich credit histories, and the expected utility of giving each of them a loan is $\$750$. For this group, additional screening would not provide any more information. Imagine that the other eight applicants do not have formal credit histories, and the utility of giving them a loan is accordingly lower due to the risk of default, with expected utility $0.5 \times \$1{,}000 + 0.5 \times \$0 = \$500$. However, the lender knows that these applicants come in two types that could be disambiguated through additional screening. More specifically, the lender knows that after screening, individuals in this group can be divided into those with expected utility $\$1{,}000$ (with 50% chance) and those with expected utility $\$0$ (with 50% chance). The high-utility group could, for example, correspond to those who demonstrate a history of consistently paying their bills on time.

The naive strategy that does not screen any individuals would allocate five loans to hit the budget constraint, since the budget covers five loans ($b / c_\ell = 5$). All five loans would go to applicants with rich credit histories ($\$750$) over those without ($\$500$). In this case, the total expected utility of the no-screening allocation is $5 \times \$750 = \$3{,}750$.

But in this scenario, one can improve overall utility by screening all eight applicants without formal credit histories and then granting loans to the high-utility applicants that are identified. Under that strategy, we expect four of the eight screened applicants to be identified as high utility ($8 \times 0.5 = 4$), and so the expected utility of the allocation is $4 \times \$1{,}000 = \$4{,}000$, greater than the expected utility of $\$3{,}750$ under the no-screening strategy. Finally, the expected cost of the strategy is $8 c_s$ for screening plus $4 c_\ell$ for distributing the loans, which together satisfy the budget constraint $8 c_s + 4 c_\ell \le b$. In this example, one can thus improve overall utility, and even allocate more loans to the group that a priori appears worse, by incorporating additional screening into the decision-making process.
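The arithmetic of the stylized example can be checked with a short script. This is only a sketch: the dollar figures for the budget, loan cost, and screening cost below are hypothetical placeholders chosen so that the budget covers exactly five loans; only the utilities ($750, $1,000, $0, and the 50% split) come from the example itself.

```python
# Hypothetical costs (assumptions, not taken from the source).
BUDGET = 5_000       # assumed overall budget
LOAN_COST = 1_000    # assumed cost of providing one loan
SCREEN_COST = 125    # assumed cost of screening one applicant

# Quantities stated in the example.
N_RICH, U_RICH = 5, 750            # applicants with rich credit histories
N_THIN = 8                         # applicants without formal credit histories
P_HIGH, U_HIGH, U_LOW = 0.5, 1_000, 0

# Pre-screening expected utility of an applicant without a credit history.
u_thin_prior = P_HIGH * U_HIGH + (1 - P_HIGH) * U_LOW

# No-screening strategy: fund as many loans as the budget allows,
# starting with the applicants who have the highest expected utility.
n_loans = BUDGET // LOAN_COST
no_screen_utility = (min(n_loans, N_RICH) * U_RICH
                     + max(n_loans - N_RICH, 0) * u_thin_prior)

# Screening strategy: screen all thin-file applicants, lend to the high type.
expected_high = N_THIN * P_HIGH
screen_utility = expected_high * U_HIGH
screen_cost = N_THIN * SCREEN_COST + expected_high * LOAN_COST

print(u_thin_prior, no_screen_utility, screen_utility, screen_cost)
```

Under these assumed costs the screening strategy spends the whole budget in expectation; with a cheaper screening step it would leave some budget unspent.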

4 Finding Optimal Policies

We now derive an efficient algorithm to find optimal screening and allocation policies $(s, a)$, subject to the budget and diversity constraints. We start by showing that over the full space of policies, it is optimal to allocate resources according to a threshold policy, with loans disbursed to individuals having post-screening expected utility above a group-specific threshold $t_k$ for $k = 1, \dots, K$. Then, for each threshold policy, we show the optimal screening policy can be obtained by solving a linear program (LP), a type of optimization problem with linear objective function and linear constraints, for which there exist fast solution methods [bertsimas1997introduction]. As a result, we can find an optimal combined screening and allocation policy by sweeping over threshold policies and solving the corresponding LP for each such policy.

We begin by formally defining threshold policies.

Definition 1 (Threshold Policy).

A threshold policy is an allocation policy $a$ for some fixed $t_k \in \mathbb{R}$ and $p_k \in [0, 1]$, $k = 1, \dots, K$, such that

$$a_i(z) = \begin{cases} 1 & \text{if } z_i > t_{g(i)}, \\ p_{g(i)} & \text{if } z_i = t_{g(i)}, \\ 0 & \text{if } z_i < t_{g(i)}, \end{cases}$$

where $g(i)$ denotes the group membership of individual $i$.

Threshold policies deterministically allocate resources to those with post-screening expected utilities above a fixed, group-specific threshold $t_k$. Randomization (i.e., allocating resources with probability $p_k$) at the threshold may be necessary to exactly satisfy the budget constraint, which is important because for an individual not screened, the distribution of $Z_i$ is concentrated at a single point.
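A threshold policy of this kind is simple to state in code. The sketch below is illustrative only: the function name `threshold_allocation` and the use of dictionaries keyed by group label are assumptions made here, not part of the paper.

```python
def threshold_allocation(z, group, t, p):
    """Return each applicant's loan probability under a threshold policy.

    z[i]     -- post-screening expected utility of applicant i
    group[i] -- group label of applicant i
    t, p     -- dicts mapping a group label to its threshold and to its
                randomization probability at the threshold
    """
    probs = []
    for z_i, g_i in zip(z, group):
        if z_i > t[g_i]:
            probs.append(1.0)       # strictly above the threshold: always lend
        elif z_i == t[g_i]:
            probs.append(p[g_i])    # exactly at the threshold: randomize
        else:
            probs.append(0.0)       # below the threshold: never lend
    return probs


# Two groups (0 and 1) with different thresholds.
example = threshold_allocation(
    z=[900, 500, 500, 300],
    group=[0, 1, 1, 1],
    t={0: 750, 1: 500},
    p={0: 0.0, 1: 0.25},
)
print(example)
```

The two applicants sitting exactly at their group's threshold each receive a loan with probability 0.25, which is how a policy can spend a budget that does not divide evenly into whole loans.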

Theorem 2 below formally states that it is sufficient to restrict to the set of threshold policies when searching for a globally optimal screening and allocation policy.

Theorem 2.

Suppose the constrained optimization problem defined by Eqs. (2), (3), and (4) has a solution $(s^*, a^*)$. Then there is a threshold policy $a^\dagger$ such that $(s^*, a^\dagger)$ is also a solution.

To see this, suppose that $s^*$ deterministically selects a subset of applicants to screen. Then, for each group $G_k$, it is clear one should allocate loans to the approximately $n_k$ individuals in each group with the highest post-screening estimated utility, where $n_k$ is the number of loans granted to each group under $a^*$. Such a rank-based allocation can equivalently be written as a threshold rule with the same expected utility. The more general case, in which screening decisions are randomized, introduces some technical complications, but the spirit of the argument is similar. Full details can be found in Appendix A.

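Now, given a threshold policy $a$ with thresholds $t_k$ and boundary randomization probabilities $p_k$, we turn to finding an optimal companion screening policy. We show that such an optimal screening policy can be found by solving an LP. In particular, the LP has decision variables $s_1, \dots, s_n$, with $s_i$ specifying the probability that applicant $i$ is screened; the objective equals the utility of the combined screening and allocation policy; and the constraints encode our budget and diversity conditions.

To construct the LP, we first define the following quantities that depend on both the threshold rule and the lender's prior knowledge on the value of screening. Writing $\pi_i(z) = \mathbb{1}\{z > t_{g(i)}\} + p_{g(i)}\, \mathbb{1}\{z = t_{g(i)}\}$ for the probability that applicant $i$ receives a loan when their estimated utility is $z$, let

$$\alpha_i = \mathbb{E}\left[V_i\, \pi_i(V_i) \mid X_i\right], \qquad \beta_i = \mu_i\, \pi_i(\mu_i), \qquad \gamma_i = \mathbb{E}\left[\pi_i(V_i) \mid X_i\right], \qquad \delta_i = \pi_i(\mu_i),$$

where $g(i)$ denotes applicant $i$'s group membership. Recall that $\mu_i$ denotes pre-screening expected utility and $V_i$ denotes post-screening expected utility if applicant $i$ is screened. Both $\mu_i$ and the distribution of $V_i$ are known in advance.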

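For fixed screening probabilities $s$, the expected utility of the corresponding screening and allocation policy can now be expressed as a linear function of $s$:

$$\sum_{i=1}^{n} \bigl[ s_i\, \alpha_i + (1 - s_i)\, \beta_i \bigr], \tag{5}$$

where $\alpha_i$ is the expected utility obtained from applicant $i$ under the threshold rule if they are screened and $\beta_i$ is the expected utility if they are not. The first summand reflects the expected utility associated with applicant $i$ if they were screened, and the second summand reflects the expected utility if they were not screened.

Our goal is to maximize the expression in Eq. (5) subject to the budget and diversity conditions, which we now show can also be expressed as linear constraints on $s$. In terms of the constants defined above, with $\gamma_i$ and $\delta_i$ the probabilities that applicant $i$ receives a loan if screened and if not screened, the budget constraint can be written as

$$\sum_{i=1}^{n} \bigl[ s_i\, (c_s + c_\ell\, \gamma_i) + (1 - s_i)\, c_\ell\, \delta_i \bigr] \le b. \tag{6}$$

Likewise, the diversity constraints in Eq. (4) become

$$\sum_{i \in G_k} \bigl[ s_i\, \alpha_i + (1 - s_i)\, \beta_i \bigr] \ge u_k \tag{7}$$

for $k = 1, \dots, K$.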

Together, the objective given by Eq. (5), with constraints defined by Eqs. (6) and (7), defines a linear program with decision variables $s_1, \dots, s_n$ (each constrained to lie in $[0, 1]$), the solution of which, if one exists, gives a screening policy that is optimal for the threshold policy $a$. A jointly optimal screening and allocation policy can accordingly be found through a grid search over all threshold policies in a (discretized) space $\mathcal{T}$. For each choice of policy $a \in \mathcal{T}$, defined by threshold and randomization parameters $t_k$ and $p_k$, we find the corresponding optimal screening policy by solving the LP described above. Then, among the resulting screening-allocation pairs, the one maximizing utility is guaranteed to be globally optimal, up to the resolution of the discretization.
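The sweep-and-solve procedure can be sketched for the simplest case: a single group, a budget constraint only, and post-screening utilities with finite support. Several things here are assumptions rather than the authors' implementation: the function names, the representation of each prior as a list of `(value, probability)` pairs, the use of SciPy's `linprog` with the HiGHS backend in place of the SCS solver used in the experiments, and the dollar costs in the example, which are hypothetical.

```python
import itertools
from scipy.optimize import linprog


def loan_prob(z, t, p):
    """Threshold rule: lend if z > t, lend with probability p if z == t."""
    if z > t:
        return 1.0
    return p if z == t else 0.0


def solve_screening_lp(mu, post, t, p, c_screen, c_loan, budget):
    """Best screening probabilities for one fixed threshold policy (t, p).

    mu[i] is the pre-screening expected utility of applicant i; post[i] is a
    list of (value, probability) pairs for the post-screening expected utility.
    """
    alpha = [sum(q * v * loan_prob(v, t, p) for v, q in d) for d in post]
    gamma = [sum(q * loan_prob(v, t, p) for v, q in d) for d in post]
    beta = [m * loan_prob(m, t, p) for m in mu]
    delta = [loan_prob(m, t, p) for m in mu]

    # linprog minimizes, so negate the utility gain from screening applicant i.
    objective = [b_i - a_i for a_i, b_i in zip(alpha, beta)]
    # Extra expected cost of screening applicant i versus not screening them.
    cost_row = [c_screen + c_loan * (g - d) for g, d in zip(gamma, delta)]
    rhs = budget - c_loan * sum(delta)

    res = linprog(objective, A_ub=[cost_row], b_ub=[rhs],
                  bounds=[(0.0, 1.0)] * len(mu), method="highs")
    if not res.success:  # e.g. the threshold policy alone exceeds the budget
        return None
    return sum(beta) - res.fun, list(res.x)


def best_policy(mu, post, thresholds, probs, c_screen, c_loan, budget):
    """Grid search over threshold policies, solving one LP per policy."""
    best = None
    for t, p in itertools.product(thresholds, probs):
        out = solve_screening_lp(mu, post, t, p, c_screen, c_loan, budget)
        if out is not None and (best is None or out[0] > best[0]):
            best = (out[0], t, p, out[1])
    return best


# Stylized example: 5 applicants with known utility 750, and 8 applicants
# whose utility is 1000 or 0 with equal probability. Costs are hypothetical.
mu = [750.0] * 5 + [500.0] * 8
post = [[(750.0, 1.0)]] * 5 + [[(1000.0, 0.5), (0.0, 0.5)]] * 8
best = best_policy(mu, post, thresholds=[0.0, 500.0, 750.0, 1000.0],
                   probs=[0.0, 0.5, 1.0], c_screen=125.0, c_loan=1000.0,
                   budget=5000.0)
print(best[0])
```

Adding the diversity constraints would mean appending one more row per group to `A_ub` (with the sign flipped, since `linprog` expects upper bounds) and sweeping over one threshold per group.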

5 Experiments

We now investigate the value of our sequential screening and allocation approach through two simulation exercises, one based on synthetic data and another based on real loan data. First, with the synthetic data, we examine the optimal policies as we vary both the cost of screening and the value of the resulting information. Then, with the real-world data, we illustrate how a decision maker could, in practice, operationalize our screening and allocation approach to pursue equity when distributing limited resources.

In each experiment, we allocate loans to members of two groups. One of the groups—which we call the “targeted” group—can be screened for more information at some cost, and we imagine there is social value to distributing more resources to this group. For example, the targeted group may be comprised of those who traditionally have not had ready access to the banking system and accordingly have limited formal credit history. For simplicity, we further assume that those in the non-targeted group cannot be screened, perhaps because they already have complete credit histories.

We define the utility of lending to applicant $i$ to be

$$U_i = u_{+} R_i + u_{-} (1 - R_i), \tag{8}$$

where $R_i \in \{0, 1\}$ indicates whether they would pay back the loan, and $u_{+}$ and $u_{-}$ are fixed, known constants. In both of our experiments, we set $u_{+}$ and $u_{-}$ to fixed values, where $u_{+}$ is the social utility when an applicant receives and pays back a loan and $u_{-}$ is the utility when an applicant defaults on a loan (which can reflect, for example, that defaulting could trigger further financial distress). As above, we imagine the lender is a government institution or other agent attempting to maximize total social utility. We further assume that the pre-screening covariate $X_i$ specifies the lender's pre-screening estimate of applicant $i$'s likelihood to repay a loan; in other words, $X_i = \Pr(R_i = 1 \mid X_i)$. The covariate can thus be translated into (pre-screening) expected utility by Eq. (8):

$$\mathbb{E}[U_i \mid X_i] = u_{+} X_i + u_{-} (1 - X_i).$$

Figure 1: For an applicant with pre-screening likelihood of being creditworthy $X_i$, the distribution of post-screening creditworthiness, $\Pr(R_i = 1 \mid X_i, Y_i)$. Under a regime with high value of information (red line), the post-screening distribution has higher variance than in a setting with low value of information (blue line).

Synthetic data

We illustrate the value of screening in four regimes of low vs. high cost of information paired with low vs. high value of information. To do so, we created four synthetic datasets, each comprised of individuals.

For all four datasets, we first evenly split the population into targeted and non-targeted groups. For each applicant $i$ in the targeted group, we generated their pre-screening probability of repayment $X_i$ (or, equivalently, their observed pre-screening covariates) by independently drawing from a beta distribution with mean 0.5 and count parameter 50. (In terms of the $\alpha$ and $\beta$ shape parameters often used to parameterize beta distributions, the mean is $\alpha / (\alpha + \beta)$ and the count parameter is $\alpha + \beta$. A higher count corresponds to a lower variance.) Similarly, for each applicant in the non-targeted group, we generated their pre-screening probability of repayment by independently drawing from a beta distribution with mean 0.70 and count parameter 50. The higher mean repayment probability for members of the non-targeted group corresponds to them being, on average, more creditworthy.

Now, for each applicant in the targeted group, the lender may elect to screen them. As a result of screening, the lender receives an improved estimate $Y_i$ of the applicant's repayment probability, so that:

$$\Pr(R_i = 1 \mid X_i, Y_i) = Y_i.$$

We assume $Y_i$ is drawn from a beta distribution with mean $X_i$ (i.e., the lender's pre-screening estimate of the applicant's repayment probability). In the high-information scenario, we set the count parameter for this beta distribution to be 5; and in the low-information scenario, we set it equal to 25.
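The mean-and-count parameterization converts to the usual shape parameters as $\alpha = \text{mean} \times \text{count}$ and $\beta = (1 - \text{mean}) \times \text{count}$, which gives variance $\text{mean}(1 - \text{mean}) / (\text{count} + 1)$. The sketch below uses only the Python standard library; the helper names and the illustrative prior of 0.5 are assumptions made here, not values taken from the experiments.

```python
import random
import statistics


def beta_shapes(mean, count):
    """Convert (mean, count) to the (alpha, beta) shape parameters."""
    return mean * count, (1.0 - mean) * count


def beta_variance(mean, count):
    """Variance of a beta distribution with the given mean and count."""
    return mean * (1.0 - mean) / (count + 1.0)


def draw_post_screening(prior, count, rng):
    """Draw a post-screening repayment probability centered on the prior."""
    a, b = beta_shapes(prior, count)
    return rng.betavariate(a, b)


rng = random.Random(0)
PRIOR = 0.5  # illustrative pre-screening estimate (an assumption)

# Count 5 is the high-information scenario, count 25 the low-information one.
high_info = [draw_post_screening(PRIOR, 5, rng) for _ in range(20_000)]
low_info = [draw_post_screening(PRIOR, 25, rng) for _ in range(20_000)]

print(beta_variance(PRIOR, 5), statistics.pvariance(high_info))
print(beta_variance(PRIOR, 25), statistics.pvariance(low_info))
```

A smaller count spreads the post-screening estimate further from the prior, which is why screening is more likely to move an applicant across an allocation threshold in the high-information scenario.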

Figure 1 shows these two post-screening information distributions for an applicant with . As illustrated in the plot, the high-information distribution (red line) has higher variance than the low-information distribution (blue line), and so screening is more likely to reveal very high risk and very low risk applicants in the high-information setting. Finally, we set the cost of screening to be in the low-cost scenario and in the high-cost scenario, the cost of a loan to be , and the total budget to be .

Figure 2: Comparison of our optimal screening strategy (blue line) with one without screening (black line) in four regimes with different costs and values of information.

Given these four datasets, we computed the optimal screening and allocation strategies using the algorithm described above. (Because only one group can be screened in our experiments, we use a faster variant of our optimization algorithm, described in Appendix B.) For any fixed set of parameters, this approach returns the optimal screening and allocation policy within a few seconds with the open-source LP solver SCS
[o2016conic]. In our original problem formulation, the diversity constraint in Eq. (4) specified only that we lower bound the utility of each group. To better understand the impact of diversity on utility, we modify this constraint to be a strict equality for the utility of the targeted group and set the lower bound on utility to be for the non-targeted group. Thus, across a range of exactly satisfied utilities for the targeted group, we find the strategy that maximizes overall utility.

Figure 2 shows the results of this analysis, with the blue lines tracing out the Pareto frontiers for each of the four scenarios we consider. For comparison, the black lines show the corresponding result under a strategy that does not screen any applicants. Specifically, for any fixed utility constraint on the targeted group, the optimal no-screening policy first allocates loans to the $m$ individuals in the targeted group most likely to repay based on their pre-screening estimates $X_i$, where $m$ is chosen to satisfy the utility constraint; and then any remaining budget is used to allocate loans to those in the non-targeted group most likely to repay.

When either the value of information is high (left column) or the cost of screening is low (bottom row), we find that screening can be a valuable tool to improve utility. Notably, screening plays a more important role in these examples as we demand greater utility be allocated to the targeted group, since the no-screening strategy ends up disbursing loans to relatively high-risk applicants in the targeted group even though more creditworthy applicants in that group could be identified for little marginal cost. As one might expect, the gap between the screening and no-screening strategies is particularly large when information is both valuable and cheap. Indeed, in the high-value, low-cost setting (lower-left panel), one can achieve substantial diversity with little drop in overall utility.

Empirical credit data

We next apply our approach to the German Credit Dataset [hofmann1994statlog], which includes a variety of individual-level socioeconomic and financial characteristics (e.g., age, employment status, and credit history) on a sample of 1,000 people, of whom 700 are deemed creditworthy. We define the targeted group to be those who currently do not own their residence, a subpopulation that comprises 28% of the dataset. In this case, 60% of individuals in the targeted group are creditworthy compared to 74% in the non-targeted group. For members of the targeted group, we assume the lender, prior to screening, only knows the targeted group's overall base rate of creditworthiness. If, however, the lender chooses to screen an applicant in the targeted group, they learn $\Pr(R_i = 1 \mid X_i, Y_i)$, the applicant's likelihood of being creditworthy ($R_i = 1$) conditional on all the available features in the dataset. For members of the non-targeted group, we assume this full estimate of creditworthiness is available prior to screening, and that there is no opportunity to obtain additional information.

More specifically, at the start of this exercise, we train a logistic regression model on the full dataset predicting creditworthiness as a function of the available covariates. Then, for members of the targeted group, $\Pr(R_i = 1 \mid X_i, Y_i)$ is the model-estimated probability of creditworthiness for applicant $i$; and for the members of the non-targeted group, that same model estimate is available prior to screening. Finally, we translate estimates of creditworthiness to estimates of utility via Eq. (8), in line with our simulations above.

Figure 3 shows the result of applying our screening and allocation algorithm to this dataset, where we assume the cost of screening is (equal to our high cost regime in the synthetic datasets), the cost of a loan is , and the total budget is . Like before, we compare the Pareto frontier of our approach (blue line) to that of a naive policy in which the lender does not screen applicants (black line). As with the synthetic datasets above, we find that the optimal policies with screening substantially outperform those without screening, particularly when we enforce a diversity constraint. For example, when we require of utility to come from allocating loans to the targeted group, the maximum total utility under the no-screening policy is , compared to under the screening policy, an increase of 17%.

Figure 3: For the German Credit Dataset, comparison of our optimal screening strategy (blue line) with one without screening (black line).

6 Discussion

Many creditworthy individuals often have difficulty gaining access to traditional credit markets due to lack of formal financial histories, an issue that can exacerbate existing socioeconomic disparities. To address this gap, we developed a simple and efficient algorithm for a budget-constrained decision maker to screen applicants and then allocate a limited resource, an approach that we find offers substantial benefits on both real and synthetic lending datasets. This joint screening-plus-allocation approach is especially useful in settings where a targeted subset of the population—those most in need of an intervention—are also those for whom the least information is available a priori, a common situation in many social welfare programs.

Past research has shown that a dearth of high-quality data for various subgroups of the population can lead to poor models in a variety of domains [gebru2018datasheets, olteanu2019social], including text analysis [bolukbasi2016man, caliskan2017semantics, garg2018word], facial recognition [buolamwini2018gender], and automated hiring [dastin2018]. Looking forward, our combined data acquisition and decision-making approach provides one framework to address this challenge by jointly modeling the cost of data collection and the value for subsequent improvements in downstream decisions.


Appendix A Proofs

We prove Theorem 2 in a more general setting in which we allow the cost of screening and cost of allocation to vary for each individual. In the special case of no screening (but with applicant-specific costs of allocation), we note that our optimization problem is equivalent to the fractional knapsack problem. For fractional knapsack, the greedy strategy—in which one packs items in descending order of their value per weight—yields an optimal solution [dantzig1957discrete]. In our case, we show that the same approach can be used to find an optimal allocation of loans once the set of applicants to screen has been appropriately selected.
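As a point of reference for the greedy strategy mentioned above, the following is a minimal sketch of fractional knapsack in Python. The function name `fractional_knapsack` and the `(value, weight)` input format are illustrative assumptions, not notation from the paper; in the lending analogy, an item's value would play the role of a loan's expected utility and its weight the cost of the loan.

```python
from fractions import Fraction


def fractional_knapsack(items, budget):
    """Greedy fractional knapsack (illustrative sketch).

    items:  list of (value, weight) pairs with weight > 0.
    budget: total weight capacity.
    Returns (shares, total_value), where shares[i] in [0, 1] is the
    fraction of item i that is packed.
    """
    def ratio(i):
        value, weight = items[i]
        return Fraction(value) / Fraction(weight)

    # Visit items in descending order of value per unit weight.
    order = sorted(range(len(items)), key=ratio, reverse=True)

    shares = [Fraction(0)] * len(items)
    remaining = Fraction(budget)
    total_value = Fraction(0)
    for i in order:
        value, weight = Fraction(items[i][0]), Fraction(items[i][1])
        if remaining <= 0 or value <= 0:
            break  # out of budget, or nothing left worth packing
        take = min(Fraction(1), remaining / weight)  # whole item if it fits
        shares[i] = take
        total_value += take * value
        remaining -= take * weight
    return shares, total_value


# Three items with value-per-weight ratios 6, 5, and 4, and a budget of 50.
shares, total = fractional_knapsack([(60, 10), (100, 20), (120, 30)], 50)
print(shares, total)
```

Only the last item packed can be taken fractionally, which mirrors the single point of randomization at the boundary of a threshold policy.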

To start, we generalize our definition of a threshold policy to account for the applicant-specific allocation costs.

Definition 3 (Cost-Aware Threshold Policy).

A cost-aware threshold policy is an allocation policy for some fixed and , , such that

where denotes the group membership of individual and denotes the cost of allocating resources to individual .
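To make the definition concrete, here is a small Python sketch under an explicit assumption: the policy compares each individual's utility-to-cost ratio with a group-specific threshold, allocates with certainty above it, never allocates below it, and randomizes with a group-specific probability exactly at the boundary (by analogy with the value-per-weight ordering of fractional knapsack). The names `allocation_probability`, `expected_totals`, `thresholds`, and `boundary_probs` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def allocation_probability(utility, cost, group, thresholds, boundary_probs):
    """Probability of funding one individual under the assumed policy form."""
    ratio = Fraction(utility) / Fraction(cost)  # utility per unit of cost
    threshold = thresholds[group]
    if ratio > threshold:
        return Fraction(1)            # above the group's bar: always fund
    if ratio == threshold:
        return boundary_probs[group]  # exactly at the bar: randomize
    return Fraction(0)                # below the bar: never fund


def expected_totals(people, thresholds, boundary_probs):
    """Expected total cost and utility over (utility, cost, group) triples."""
    total_cost = Fraction(0)
    total_utility = Fraction(0)
    for utility, cost, group in people:
        p = allocation_probability(utility, cost, group,
                                   thresholds, boundary_probs)
        total_cost += p * cost
        total_utility += p * utility
    return total_cost, total_utility


# Toy population: (expected utility, allocation cost, group label).
people = [(6, 2, "a"), (4, 2, "a"), (3, 1, "b"), (1, 1, "b")]
thresholds = {"a": 2, "b": 2}
boundary_probs = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
print(expected_totals(people, thresholds, boundary_probs))
```

In this toy population only the second individual sits exactly at the threshold, so that is the only place where the boundary probability affects the totals.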

We first analyze the setting of a single group, in which case our threshold policy will have a single threshold, . Lemma 4 shows that in this case cost-aware threshold policies are non-dominated: there cannot be an allocation policy with both higher expected utility and lower cost than a single-threshold policy. Lemma 5 is an existence theorem for cost-aware threshold policies, which shows that if a particular expected cost or utility can be achieved by an allocation policy, it can be achieved by a threshold policy.

Lemma 4.

Let be a cost-aware threshold policy with a single threshold , and suppose has an expected cost —i.e., —and expected utility —i.e., . Let be any other allocation policy.

Case 1:

If the expected cost of is , then the expected utility of is less than or equal to .

Case 2:

If the expected utility of is , then the expected cost of is greater than or equal to .


Let , and set be the negative part of , i.e., . Similarly, let be the positive part.

The variables represent circumstances in which “saves money” in expectation over , and represents how this portion of the budget is reallocated.

Consider the following events:

Note that only on some subset of . Likewise, only on a subset of . Informally, can only save in expected cost over when , and can only reallocate savings to circumstances in which .

Because and , it consequently follows that




In particular,

Here the inequality follows from Eqs. (9) and (10).

Now, since any allocation policy is by definition conditionally independent of given , it follows that and . In consequence,


In Case 1, the expected costs of and are equal. It follows that ; consequently, by Eq. (11), , and so the expected utility of is greater than or equal to that of .

In Case 2, the expected utilities of and are equal, so that . Then, again by Eq. (11), since , . Therefore the expected cost of is less than or equal to that of . ∎

Lemma 5.

Let be an achievable expected utility—that is, there is an allocation policy such that —and let be an achievable expected cost. Then,

  • There exists a cost-aware threshold policy of expected utility ; and,

  • There exists a (possibly different) cost-aware threshold policy of expected cost .


We first address the case of a given positive expected utility. Consider the function given by


We note three facts about . First, is monotonically non-increasing for . Second, is the maximum expected utility achievable by any allocation policy. Third, as .

Suppose . Then there exists some maximal such that . Let and . Define . Then the threshold policy defined by will have expected utility . The result follows as all achievable positive expected utilities lie in the half-open interval .

The proof in the case of expected cost is similar. ∎
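The construction behind the expected-cost case can be illustrated on a finite population. The sketch below is an assumption-laden toy rather than the paper's procedure: it works with total (not averaged) cost, assumes the policy thresholds on the utility-to-cost ratio, and uses the hypothetical name `threshold_for_cost`. It lowers the threshold one ratio level at a time until the target cost is reached, then picks the boundary probability that spends exactly the remainder.

```python
from fractions import Fraction
from itertools import groupby


def threshold_for_cost(people, target_cost):
    """Find (threshold, boundary_prob) whose expected total cost is target_cost.

    people: list of (utility, cost) pairs with cost > 0.
    Individuals with ratio above the threshold are funded with certainty;
    those exactly at the threshold are funded with probability boundary_prob.
    """
    def ratio(person):
        utility, cost = person
        return Fraction(utility) / Fraction(cost)

    target = Fraction(target_cost)
    ranked = sorted(people, key=ratio, reverse=True)
    spent = Fraction(0)
    # groupby collects individuals tied at the same ratio level.
    for level, tied in groupby(ranked, key=ratio):
        level_cost = sum(Fraction(cost) for _, cost in tied)
        if spent + level_cost >= target:
            # Fund only part of this level, in expectation.
            return level, (target - spent) / level_cost
        spent += level_cost
    raise ValueError("target cost exceeds the cost of funding everyone")


# Ratios are 3, 3, 2, and 1; funding both ratio-3 individuals costs 3.
print(threshold_for_cost([(6, 2), (3, 1), (4, 2), (1, 1)], 4))
```

Here the returned threshold is 2 with boundary probability 1/2: the two individuals with ratio 3 are funded at cost 3, and the individual with ratio 2 is funded half the time, contributing the remaining expected cost of 1.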

We are now equipped to analyze settings in which there are multiple groups. We prove the following generalization of Theorem 2.

Theorem 6.

Suppose the constrained optimization problem defined by Eqs. (2), (3), and (4) has a solution . Then there is a (not necessarily single-threshold) cost-aware threshold policy such that is also a solution.

Proof of Theorem 6.

Consider a solution screening and allocation policy pair satisfying the allocation diversity constraints.

Note that each of the collections defines an allocation policy on the subpopulation . Applying Lemma 5 to each of them yields a threshold policy —defined by the pair —of the same expected cost as . By Lemma 4, achieves the same or higher expected utility on as .

Let be the threshold policy defined by and . The expected cost of is the same as . Moreover, satisfies the group utility constraints, since the utility it achieves on each group is greater than or equal to . Lastly, the expected utility achieved by is greater than or equal to that of .

Since was assumed optimal, it must be that the expected utilities are equal. Therefore the pair is also a solution to the constrained optimization problem defined by Eqs. (2), (3), and (4). ∎

Theorem 2 follows as an immediate corollary.

In some cases, it is useful to impose the allocation diversity constraints as equalities rather than inequalities. For instance, in some situations the decision maker may wish to obtain the Pareto frontier. Lemma 7 ensures that, even when specific utilities must be achieved on subgroups, it still suffices to restrict attention to threshold policies when searching for a globally optimal screening and allocation policy.

Lemma 7.

The conclusion of Theorem 2 still holds even if the inequalities in Eq. (4) are replaced with equalities, i.e., if the optimal policy pair satisfies


The proof is identical to that of Theorem 2, except that Lemma 5 is used to construct a threshold policy achieving exactly utility on any constrained group . Case 2 of Lemma 4 then shows that this utility is achieved at possibly lower expected cost than . It follows immediately that the pair is a solution to the optimization problem.

Footnote 4: In this case, it is actually possible that will result in expected cost savings over , even though is optimal. This can occur if there is no remaining positive utility to “spend” the savings on, since resources are already allocated at total expected cost less than in all cases where is positive.

Appendix B Optimized Procedure

In our experiments we consider cases in which screening provides no additional information about one group of individuals (i.e., is concentrated at a single point for ). In this case, it is possible to significantly reduce the number of LPs the decision maker must solve to find an optimal policy or calculate the Pareto frontier. For instance, a lender may already possess all available information relevant to the creditworthiness of those with traditional credit histories.

In this case, the cost of allocating resources to group does not depend on the screening policy . Therefore, rather than sweeping over all possible pairs of thresholds and boundary randomization probabilities , one can encode the allocation policy to directly as a collection of allocation probabilities .

For notational simplicity, suppose without loss of generality that , that and , and that is a point mass for all .

Then, let be a length vector of probabilities indicating the probability of screening the members of . Let be a length vector of probabilities indicating the probability of allocating resources to members of . We keep the rest of the notation the same as in Section 4.

Fix threshold and boundary randomization probability for . Then, the expected utility of any pair of screening policies (for ) and allocation policies (for ) can be expressed as a linear function of and :


Likewise, our budget constraint can be expressed as follows:


The allocation diversity constraint will take one of the following two forms. If the constrained group is ,


Otherwise, if the constrained group is , the constraint will be


Lastly, we ensure that and are probability vectors:


Together Eqs. (14), (15), (16), and (17)—given the initial data of the threshold policy on defined by the pair —define a linear program in the decision variables and that can be solved for an optimal screening policy (on ) and allocation policy (on ). By the results of Appendix A, can be assumed to have the form of a threshold policy. Sweeping over all such pairs in a (discretized) space , the resulting triple that maximizes utility will represent a globally optimal screening and allocation policy.