1 Introduction
The main aim of this paper is, after introducing a distinction between justice and group fairness through the language of probability, to show that theories of justice do not provide a sufficient normative grounding for reasonable accounts of group fairness. Justice and fairness may be used as rough synonyms when it is not assumed that they stand for different concepts.
(Note 3: For example, the same luck-egalitarian view is described as "comparative fairness" in [temkin_equality_2017] and as "egalitarian justice" in [cohen_currency_1989].) Remarkably, one important theory, John Rawls’s, allegedly derives justice from a combination of formal fairness and prudential premises. Our starting point is different: group fairness and justice are compatible only when justice is perfectly realized; otherwise, one is obtained at the expense of the other. And yet, if one tries to define fairness with the same normative elements that are used to define justice, it turns out that group-fairness can be achieved in an absolute sense only by procedures involving a nondeterministic element. This result could be regarded as an insight about the impossibility of group fairness; but it could also be regarded as a reason to define group-fairness on the basis of theories other than theories of justice.
When we talk about fairness here, we mean a property of procedures allocating outcomes, not Platonic properties of outcome distributions defined independently of the procedural elements that may bring those distributions about. We focus on imperfectly just procedures, those that do not guarantee a perfectly just distribution. The definition of group fairness from which we start, as a property of procedures, is an intuitive one: a procedure is group fair only if it does not favor in a morally arbitrary way, intentionally or unintentionally, any individual who belongs to one group over an individual who belongs to a different group. We will subsequently provide a rigorous analysis of this notion, following a notation first defended in [loi_philosophical_2019]. We then provide a formal argument to show that, for an imperfectly just procedure to be group fair, it must necessarily be a nondeterministic one.
As procedures involving a nondeterministic element are typically perceived as problematic from the point of view of justice, this result is both unexpected and nontrivial. We examine the relation between outcome justice and procedural justice based on the distinction in [rawls_theory_1971] between pure, perfect, and imperfect procedural justice. We exclude pure procedural justice, where just outcomes are not defined prior to procedures. Then, we explain the relation between (perfect and imperfect) procedural justice and group fairness in the language of probability (and set) theory. In this way, we can show mathematically that, among deterministic procedures, only a perfect procedure can be group-fair with respect to all groups. We conclude that, among imperfect procedures, only nondeterministic ones can be group fair with respect to all (logically possible) groups. The statement we prove is the following: unless the procedure is perfect, one can always identify at least two morally arbitrary groups relative to which the procedure is not fair. Given our definitions of justice and group fairness, the only way to avoid this result is to make the procedure nondeterministic.
This result is relevant for the FAccT community because many algorithms qualify as procedures according to our definition. It also applies to many procedures that are not currently discussed in the context of data science. This requires a new, more abstract language, which is an essential element of this contribution (and of similar ones; see [loi_philosophical_2019]).
Furthermore, in stark contrast to papers discussing statistical fairness definitions for machine learning, our goal is not to deliver a measurable definition of group fairness. Our argument illustrates the relevance of postulating an objective moral ground truth for the sake of the normative analysis. Our basic philosophical postulate is that someone may be objectively deserving even if no feasible procedure for determining who deserves what in real life exists. For example, someone could be objectively innocent of the crime charged to her, and deserve acquittal, on the basis of evidence that a procedure, even the best human procedure, may ignore, which leads to an (unjust) conviction. Perfect justice is consistent as a Platonic idea, even when we know no procedure that may achieve it. The concept of the moral ground truth differs from the concept of the ground truth as used in the machine learning literature, where it typically refers to features that are actually observed (e.g., arrest by the police) and whose normative relevance is in many cases worth doubting. Our paper needs a different conception of the ground truth because its point is, primarily, philosophical and conceptual.
The structure of the paper is the following. Section 2 discusses related work. Section 3 provides definitions of procedure, merit, morally arbitrary group, justice, fairness, perfect and imperfect (as attributes of procedures). Section 4 offers a schematic and graphical representation of all possible procedures for distributing advantages or disadvantages in a context in which some inequalities are morally justified; we provide this representation only for the binary case for simplicity’s sake; the representation relies on probability. Section 5 briefly presents the main argument of our analysis. In Section 6 we examine the argument in depth by considering all possible procedures. Section 7 is devoted to the final discussion, and Section 8 takes stock of our work.
2 Related work
The distinction between a perfect and imperfect procedure derives from John Rawls [rawls_theory_1971], who defines perfect procedural justice as a form of procedural justice guaranteeing the justice of outcomes, where outcomes are characterized as just prior to the procedure. Imperfect procedural justice results when just outcomes cannot be guaranteed every time.
In the contemporary debate in analytical philosophy (e.g., [arneson_equality_1989, miller_principles_1999, sen_equality_1982]) the goal is often to characterize outcomes that are just in this sense. For example, luck egalitarians [arneson_equality_1989] characterize just inequalities as inequalities that reflect unequal responsibility, while "desertarians" characterize them as inequalities that reflect unequal contributions [brouwer_why_2018]. One may also argue that different needs justify inequality [herlitz_measuring_2016]. Pluralists [miller_principles_1999, walzer_spheres_1983] maintain that inequalities are justified by different properties in different contexts.
Philosophers interested in procedural justice mostly discuss pure procedural justice, where a prior, substantive definition of just outcomes is not available, or subject to (sometimes, reasonable) disagreement [ceva_interactive_2016, daniels_accountability_2000, wong_democratizing_2019]. Here what makes outcomes just is the procedure leading to them. We ignore pure procedural justice in what follows.
Our definition of "group fairness" resembles the definition of equalized odds in [hardt_equality_2016] (and "separation" in [barocas_fairness_nodate]). Equalized odds requires that, when an algorithm is tested with historical data, individuals with the same value for the ground truth (e.g., those who have repaid their loan) have the same chances of receiving a favorable or unfavorable classification (e.g., of being classified as future defaulting debtors), independently of their group. This is not necessarily "fair", according to our use of the term, because equalized odds is defined relative to the actually observed ground truth, which may differ from the moral ground truth, that is, from what morally justifies the unequal treatment [heidari_moral_2019].
The definition of group fairness we analyze is not entirely new. The normative definition we use here has been explicitly defended [loi_philosophical_2019, loi_fair_2021] or, more often, presupposed. For example, in [di_bello_profile_2020], one premise of the argument is that equal protection implies "a right of innocent defendants not to be exposed to higher ex ante risks of mistaken conviction compared to other innocent defendants facing similar charges" (p. 147). In the vocabulary we employ here, this amounts to assuming that all (objectively) innocent defendants equally deserve to be acquitted, independently of their statistically relevant characteristics. Our interpretation is supported by the fact that the objection against statistical profiling is that "admitting incriminating profile evidence would create an inequality in the distribution of the risks of mistaken conviction [between innocent individuals matching and not matching statistical profiles], and that admitting exculpatory profile evidence would do so as well" (p. 167).
The equality of opportunity approach, also discussed in relation to machine learning [heidari_moral_2019], rests on the luck-egalitarian assumption that a person’s effort (that for which the individual is responsible) justifies inequality; by contrast, his or her circumstances (e.g., "the socioeconomic status he/she is born into") do not. We use "merit" here in a way compatible with the (extensional) identity between merit and "effort", as used in an equality of opportunity theory of this type [heidari_moral_2019]; but our concept is different (intensionally). It denotes different properties if other theories are valid, or in different distributive contexts. (Note 4: We follow [loi_philosophical_2019, loi_fair_2021], who use "desert*" to designate any inequality-justifier, which corresponds to different properties according to responsibility-based views (e.g., [heidari_moral_2019]), meritocratic views [brouwer_why_2018, miller_principles_1999], and need-based or pluralistic ones [miller_principles_1999].)
3 Main concepts
3.1 Procedure and utility
We define a procedure as a rule leading to the allocation of benefits, burdens or harms of various kinds. Benefits and harms are (or rather become, by virtue of the procedure) features of the individuals obtaining them. We indicate these with the letter .
More precisely, by a procedure we mean a sequence of actions such that, when criterion is satisfied, a given outcome for the individual, , is produced. For the sake of simplicity we prove our argument for procedures leading to binary decisions, e.g., an individual is either convicted or acquitted in a trial, hired or rejected by an employer, given or refused bail, etc. Some procedures are extremely complicated. They involve a large number of criteria that jointly determine a decision. But we can suppose for the sake of simplicity that is an uber-criterion which corresponds to a whole manifold of other criteria, or combinations of criteria, being satisfied, to such an extent that their satisfaction is sufficient and necessary for the decision concerning the individual. The nature of may be very complicated; in some cases, combines several criteria of different weights, none of which is individually necessary or sufficient for a decision.
We assume, for deterministic procedures, a set of "determinant facts" that summarizes, for that procedure, the satisfaction of all the relevant criteria and every other fact about the circumstances affecting the procedure that explains why a given outcome will necessarily be achieved. Take the criminal trial, for instance. In a criminal trial there are many sources of evidence that have to be presented to a court and evaluated in order for the judge or the jury to reach a decision. Many witnesses have to be consulted, different types of evidence and epistemic criteria have to be balanced, etc. Moreover, most procedures are probably sensitive to inputs that they were not intended to be responsive to. These may include the way the defendant expresses his or her emotions when speaking, the temperature of the room on the trial day, whether a judge is hungry, the color of the skin of the defendant and the witnesses, etc. We shall refer to these as "procedure-influencing circumstances". We simplify this picture by supposing the decision to be determined by a set of determining causes, the existence of which we represent by the value of the binary random variable .
Procedural changes may imply different outcomes for different types of defendants, though the effects may not be directly measurable. If a new procedure 2 is adopted, innocent defendants may have a higher chance of being acquitted of a given charge than under the old procedure 1. Unfortunately, it may also be the case that guilty defendants are less likely to be convicted as a result of procedure 2. Moreover, different procedures typically imply that a different set of determinant facts is necessary and sufficient to acquit a defendant. In other words, the facts that count as necessary and sufficient conditions for an acquittal under procedure 1 may not count as such under procedure 2. Finally, members of one demographic group (e.g., young people) may have higher chances of being acquitted under procedure 3 compared to procedure 2, even though innocent defendants are equally likely to be acquitted in both.
We can also describe the two possible outcomes for individuals subjected to a trial as unequal utility outcomes for them, e.g., a better outcome, , if the individual is acquitted, and a worse outcome, , if he is convicted.
Thus, we can summarize all the salient events taking place in a criminal trial procedure by the following definitions:

the determinant facts for acquittal are in place; (Note 5: One may think about as lack of (perceived) sufficient evidence for the crime.)

the determinant facts for acquittal are not in place;

the defendant is released of all charges and acquitted;

the defendant is declared guilty and convicted.
The necessary and sufficient condition of our criterion with respect to the benefit can be summarized by the following expression:
(3.1) 
which (in the binary case) amounts to
In the following we shall introduce a probabilistic notation, which should help to formalize our argument mathematically. In particular, we use the classical notation for the probability and for the conditional probability. In these terms, a procedure can be seen as a method of associating the outcome with a pair of probabilities:
It is trivial to observe that in the binary case we have and the second equality follows directly from the first one. These definitions build a close link between the procedure and its probabilistic description. By using relation (3.1), we can also find the probability relations for the (binary) criterion :
Moreover, we notice that the following conditional probabilities hold:
Let us reflect now on the ontology of the determinant facts . If a procedure is deterministic, there will be a set of inputs (in advance of the procedure being actually carried out) that always lead to the same output, e.g., a conviction. At least in the case of purely mechanical (unambiguously defined) deterministic procedures with two mutually exclusive outcomes (for example, acquittal and conviction), can be regarded as the universal unambiguous criterion (in what follows, criterion) that necessarily leads to the procedure output. (Note 6: In logical terms, we are considering a certain true/false decision problem, which is decidable. A good example could be the formalization of a given procedure with a Turing-decidable algorithm.) (This is intended to make our argument especially clear, even if it is, morally speaking, an unnecessarily restrictive assumption.) For example, consider a criminal law procedure which takes, as inputs, a description of the case by the defendant’s lawyer and another one by the prosecutor, counts the total number of characters in both texts, n, and convicts the defendant if and only if n is prime. It is intuitively clear that the procedure reaches a determinate verdict with a specific result for every combination of inputs and that this result will be reached in a purely mechanical manner. In this case, we can ascribe to the random variable the value if n is prime, otherwise. We can then consider the group of all individuals for which . We shall say that these are the individuals for which the criterion for acquittal is satisfied. Notice that, for very large reports, it may not be immediately clear whether before the procedure (the computations needed to determine if n is prime) is actually carried out. Yet, the determinant facts exist independently of carrying out the procedure, from the very moment the two reports are completed.
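The mechanical procedure just described can be sketched in a few lines of code. This is only an illustration under our own assumptions: the function names `is_prime` and `verdict` and the string labels for the two outcomes are hypothetical, and trial division merely stands in for whatever primality test one might prefer.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for an illustration, slow for very large n."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3 are prime
    if n % 2 == 0:
        return False
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def verdict(defence_report: str, prosecution_report: str) -> str:
    """Convict if and only if the total character count n is prime.

    The outcome labels are hypothetical names chosen for this sketch.
    """
    n = len(defence_report) + len(prosecution_report)
    return "convicted" if is_prime(n) else "acquitted"
```

Calling `verdict` twice on the same pair of reports necessarily returns the same result, which is the sense of "deterministic" at issue here. The determinant fact (whether n is prime) is fixed as soon as both reports exist, even before anyone computes it.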
The concept of determinant criterion can be generalized to fully deterministic procedures that are not unambiguous like the mechanical procedure imagined here. In the real world, the "official" inputs, those that are intended to determine a trial (e.g., the lawyers’ arguments and witnesses’ depositions), do not cause the outcomes with full certainty, but only affect their probabilities. The full set of causes producing an acquittal outcome includes the (typically unintended) procedure-affecting circumstances, such as the defendant’s tone of voice or the witnesses’ skin color, mentioned before. Yet there may be a level of description at which even real-world procedures are deterministic. The meaning of "deterministic" for procedures can be described in modal terms: given a sufficiently exhaustive description of the initial state of the procedure (including the procedure-affecting circumstances), necessarily the same outcome is produced, at least in suitably defined normal operating conditions. For example, if a judge must interpret the laws rather than apply them mechanically, the same judge would not change the interpretation given the same (suitably described) initial state of the procedure. (Note 7: To change interpretation without any change in the circumstances of the case would amount to introducing a degree of randomness in its normal operations.) The set of determinant causes is, unlike the unambiguous criterion of a mechanical procedure, typically too complex for any human to describe. (Note 8: A careful reader may notice that the set of inputs should be fixed. However, in the real world laws are subject to different interpretations, which could make the set of inputs not definable in advance. The formal proof relies on a mathematical construction that needs some strong hypotheses on , as well as a procedure which acts as an algorithm; see also Note 6.)
3.2 Justice, deserved and undeserved inequalities
Following [loi_philosophical_2019], we assume the existence of a theory of justice for the allocation of (dis)advantages, denoted with values of the random variable , in the population. A theory of outcome justice is a theory of what, if anything, justifies inequalities in the distribution of . We refer to such "inequality justifier" as "desert" or "merit" without any intended implication that we are endorsing a substantive meritocratic view. We abstain from taking any position on the substantive philosophical problem of what, if anything, justifies inequality. Notice that this definition accounts for justice in terms of the distribution of a type of outcome (denoted with ), given other features of individuals (e.g., their contributions or needs, i.e., ); this completely ignores the procedures that have been used to allocate in practice.
We use for a random variable, the values of which indicate features, among those of an individual, making that individual deserving of a certain treatment, e.g., a person’s need for welfare assistance, her excellence for a prize, her responsibility for a crime for the punishment of that crime, etc.
We assume that, given a context (e.g., the distribution of convictions, or healthcare resources), such values of can be defined. One could argue that only one feature (e.g., responsibility) justifies inequality in all contexts, but our proof does not need this assumption. Our analysis is conceptually independent of any specific justifier, as long as it can be described in this abstract vocabulary. We suppose that even complex accounts of merit can be described as distinct values of a single variable (which may be thought of as a vector of several features). We simply assume that, no matter how complex the inequality-justifying property, individuals will differ in relation to it in an objective way.
For the sake of simplifying our proof, we consider here a case in which is binary, e.g., there are only two classes of people who differ in relation to merit, that is to say, people either deserve a positive outcome () or not (). The argument generalizes unproblematically to a different range of values for .
In our example, we assume that defendants who are guilty of a crime deserve their conviction and defendants who are not guilty deserve clearing of all charges.
We define inequalities in the distribution of among individuals who are equal in as undeserved inequalities, or equivalently, unjust (but not necessarily unfair) inequalities. For example, consider two people charged with a crime, only one of whom is guilty, and suppose that the guilty one is convicted and the other is not. The inequality between these two people is deserved, because the two people differ in their desert, . Conversely, an inequality between two defendants who are both innocent (or both guilty) is not deserved. If a procedure were perfect, as we shall see next, all equally deserving individuals would receive the same outcomes. For imperfect procedures, it may be probable that equally deserving individuals are allocated equal (dis)advantages, but it is never certain. This uncertainty can be expressed as a probability. Thus, for this class of individuals , and .
We shall introduce the following notations for the development of our mathematical argument:

the person is (really) innocent;

the person is (really) guilty.
We can now introduce the conditional effect of the in the description of the procedure by considering the following four probabilities:
As explained above, the binary case allows us to limit our analysis to the first two equations without loss of information.
Notice that we distinguish between , which makes an individual deserving of , and , which determines that the individual will be assigned by the procedure. In the criminal justice case, for example, includes the evidence submitted to the court, while is the fact that the defendant is actually guilty, including facts for which no court evidence can be produced. When a procedure is imperfect, and do not correspond. For example, in the case of some innocent defendant (), her features and circumstances are sufficient to cause the appearance of guilt (), leading to a conviction (). A case such as this would be formalized as , and , and .
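The relations described in Sections 3.1 and 3.2 can be collected in one place. The letters used below are our own labelling, adopted only for this sketch and not necessarily the paper's: U is the outcome (u+ acquittal, u- conviction), C the determinant facts (c+ in place, c- not in place), and M merit (m+ innocent, m- guilty).

```latex
% Assumed notation, introduced for this sketch only:
% U outcome, C determinant facts (criterion), M merit.
\begin{align*}
  & U = u^{+} \iff C = c^{+}, \qquad U = u^{-} \iff C = c^{-}
    && \text{(our reading of relation (3.1))}\\
  & P(u^{+}) + P(u^{-}) = 1, \qquad P(c^{+}) = P(u^{+})
    && \text{(binary case)}\\
  & P(u^{+}\mid m^{+}) + P(u^{-}\mid m^{+}) = 1
    && \text{(innocent defendants)}\\
  & P(u^{+}\mid m^{-}) + P(u^{-}\mid m^{-}) = 1
    && \text{(guilty defendants)}
\end{align*}
```

In this labelling, the wrongly convicted innocent defendant of the example is the case M = m+, C = c-, U = u-: merit and criterion come apart, and the outcome follows the criterion.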
3.3 Morally arbitrary groups
Given a background theory of justice which we use to determine the nature of , we define morally arbitrary traits as those features of the individual which do not justify inequalities. Thus, every feature that is not a possible value of is a morally arbitrary feature. Individuals who are equal in (deserve the same ) and differ in other respects differ in a morally arbitrary way, relative to the good , whose distribution is in question. We appeal to the concept of morally arbitrary features in order to assess whether a procedure, described probabilistically, exhibits any degree of group-unfairness. We illustrate this with the following example.
Example 3.1.
Let us consider a population of individuals. Moreover, let us consider the classical binary attributes male/female (). We notice that one can also describe this kind of situation in terms of "sex" , with ; for simplicity, in the following lines we shall use . We assume that a person’s sex is not, in itself, a good ground to acquit or convict individuals. Hence, sex is a morally arbitrary feature. Let us now suppose a hypothetical division of the population as follows.
Attribute  Individuals 

Male ()  
Female () 
Let us suppose that a dataset provides the following table of guilt and its complement.
Guilty ()  

Attribute  Individuals 
Male ()  
Female () 
Not Guilty ()  

Attribute  Individuals 
Male ()  
Female () 
Now let us start from a given procedure such that
In this toy model, the number of convicted (i.e., ) is
GUILTY CONVICTED  
NOT GUILTY CONVICTED 
Now, let us suppose that, as a requirement of group-fairness, we should guarantee that innocent individuals have the same chances of acquittal and guilty individuals the same chances of conviction, in either case independently of their sex. (See Section 4.3 for the formal definition of group-fairness.) Let us suppose that, after optimizing the procedure for accuracy constrained by group fairness, we obtain the following results:
leading to the following two tables:
We notice two things. First of all, given a single morally arbitrary distinction (male and female), it is possible to achieve group-fairness without undermining justice completely. That is, innocent individuals (in either group) are still more likely to be acquitted than guilty ones. Second, we notice that, among women who are convicted, the share of guilty ones is little more than one half, indicating a great degree of injustice. This injustice is also significantly greater in the case of women than in the case of men, where out of convictions, only are mistaken. This is remarkable given that we have characterized group-fairness and justice on the basis of the same theory of . Namely, all and only innocent defendants should ideally be acquitted. The example shows that, even starting from a single, monist conception of justice, complex trade-offs may emerge between the degree to which justice is achieved and the group fairness of the imperfect procedures designed to achieve it.
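The arithmetic behind this kind of trade-off can be sketched as follows. The counts and conviction rates below are hypothetical, chosen by us for illustration only; they are not the figures of Example 3.1. The sketch applies one pair of conviction rates to both sexes, as group-fairness requires, and computes the expected share of convicted individuals who are in fact guilty in each group.

```python
from fractions import Fraction

# Hypothetical counts, chosen for illustration only;
# they are NOT the figures of Example 3.1.
population = {
    "male": {"guilty": 400, "innocent": 100},
    "female": {"guilty": 60, "innocent": 440},
}

# One pair of conviction rates shared by both sexes
# (the group-fairness requirement). Also hypothetical.
P_CONVICT = {"guilty": Fraction(8, 10), "innocent": Fraction(1, 10)}


def guilty_share_among_convicted(counts):
    """Expected fraction of convicted individuals who are in fact guilty."""
    convicted_guilty = counts["guilty"] * P_CONVICT["guilty"]
    convicted_innocent = counts["innocent"] * P_CONVICT["innocent"]
    return convicted_guilty / (convicted_guilty + convicted_innocent)


shares = {sex: guilty_share_among_convicted(c) for sex, c in population.items()}
for sex, share in shares.items():
    print(f"{sex}: {float(share):.2f} of the convicted are guilty")
```

With these made-up counts the share is about 0.97 for men and about 0.52 for women. Both groups face identical conviction rates given merit, yet the proportion of mistaken convictions differs sharply because the two groups differ in how many of their members are guilty.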
4 A probabilistic representation of group fairness and justice
4.1 The ROC space
To provide an intuitive and rigorous characterization of some moral features of procedures, let us represent all the logically possible procedures that could be used to allocate utility to individuals with attributes of merit . Since we are interested in the distribution of and merit (as both are essential to the above characterization of justice and fairness), we shall use a comprehensive graphical representation of all procedures in terms of how they affect the probability that gets assigned to individuals characterized (at this stage) uniquely by their attribute , namely an adaptation of the well-known ROC space. (Note 9: The receiver operating characteristics (ROC) graph is a graphical method for depicting the diagnostic reliability of a binary classifier system. For a more detailed analysis of this technique, see [Fawcett2006].)
Figure 1 represents, in a very intuitive way, the whole spectrum of possible procedures, as defined above. Any point in the diagram provides a complete description of a procedure by assigning a value (between 0 and 1) to and . That value represents the probability for an individual to receive a certain treatment given the merit feature that individual has.
Remark.
We notice that the probability , which defines the probability to be declared guilty and convicted if innocent, actually represents the false positive rate, and the probability , which defines the probability to be declared guilty and convicted if (really) guilty, actually represents the true positive rate. Sometimes the graph is represented with the two axes swapped.
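As a rough illustration of how a procedure is located in the ROC space, the two rates can be estimated from a list of decided cases. The function name `roc_point` and the record format are our own assumptions, and observed frequencies stand in here for the probabilities discussed in the text.

```python
def roc_point(records):
    """Return (false_positive_rate, true_positive_rate).

    `records` is a list of (is_guilty, is_convicted) pairs of booleans;
    this record format is a hypothetical choice made for the sketch.
    """
    innocent = sum(1 for guilty, _ in records if not guilty)
    guilty_total = sum(1 for guilty, _ in records if guilty)
    if innocent == 0 or guilty_total == 0:
        raise ValueError("both merit classes must be represented")
    convicted_innocent = sum(
        1 for guilty, convicted in records if not guilty and convicted
    )
    convicted_guilty = sum(
        1 for guilty, convicted in records if guilty and convicted
    )
    return convicted_innocent / innocent, convicted_guilty / guilty_total
```

For instance, if three of four guilty defendants and one of four innocent defendants are convicted, the procedure sits at a false positive rate of 0.25 and a true positive rate of 0.75.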
Remark.
We highlight that the use of the ROC space is quite general. Given (3.1) utility could be replaced by the criterion . That means that the diagram can also be used to represent the probability for an individual to fulfill the criterion deciding the treatment, given that the treatment is deserved (or not). Moreover, as the merit could be replaced by some arbitrary group , the ROC space can be used to represent the probability of a given treatment, given the group.
4.2 The perfectly just procedure
Intuitively, a procedure is perfect if, every time it is implemented, the distribution of the outcomes is perfectly justified by the theory of justice that is relevant to that domain. As proposed by [loi_philosophical_2019], a perfect procedure can be characterized as one that distributes equally between individuals who deserve the same treatment, and unequally between individuals who deserve a different treatment. It follows as a matter of definition that a perfect procedure is one that contains no inequality in among people equal in their merit, namely .
In probabilistic terms, given a binary merit attribute () and a binary disadvantage attribute (), a perfect procedure is a procedure that guarantees that
For example, those who deserve a punishment always get punished; those who deserve no punishment are never punished. It is important to highlight that, for a binary utility, the diagrams (constructed for , i.e., the distribution of convictions) already represent the same procedures described in relation to (i.e., to the distribution of acquittals). In fact, the following relations hold:
In our ROC space, this is point . The symmetric point that we omit from our argument gives the opposite, "perfectly unjust", procedure, in which all and only the innocent people are convicted.
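The two extreme procedures can be written out explicitly. The notation is again our own assumption, introduced only for this sketch: u- denotes conviction, u+ acquittal, m- guilt, and m+ innocence.

```latex
% Assumed notation (ours, for illustration): u^- conviction, u^+ acquittal,
% m^- guilty, m^+ innocent.
\begin{align*}
  \text{perfectly just:} \quad & P(u^{-}\mid m^{-}) = 1, \quad P(u^{-}\mid m^{+}) = 0,\\
  \text{equivalently:} \quad & P(u^{+}\mid m^{+}) = 1, \quad P(u^{+}\mid m^{-}) = 0,\\
  \text{perfectly unjust:} \quad & P(u^{-}\mid m^{-}) = 0, \quad P(u^{-}\mid m^{+}) = 1.
\end{align*}
```

The second line follows from the first because, in the binary case, the probabilities of conviction and acquittal given the same merit value sum to one.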
4.3 Groupfair procedures
We define a procedure to be group-fair (for the sake of brevity, "fair" in what follows) when individuals who deserve the same treatment are not more likely to receive a favorable treatment because of the group to which they belong. This is equivalent to requiring that morally arbitrary (i.e., non ) features be statistically irrelevant to the distribution of the benefit or harm assigned by the procedure, . For example, if being a woman is a morally arbitrary feature relative to criminal punishment, a procedure is fair only if all guilty defendants have the same chances of being convicted independently of their gender, and the same holds for all the innocent ones. This formalizes mathematically the intuitive idea that a fair procedure should not favor, intentionally or unintentionally, any individual on morally arbitrary grounds, i.e., his or her membership in a morally arbitrary group. Notice that fairness does not require that the probability of conviction/acquittal be the same for individuals who differ in , i.e., it is not unfair for innocent and guilty defendants to have different chances of conviction or acquittal.
Mathematically speaking, fairness relative to two groups and obtains if and only if:
A procedure is absolutely group-fair, that is to say, fair with respect to all arbitrary groups, if and only if the probability of receiving disadvantage, , for those who are equally deserving, , is statistically independent of membership in any possible morally arbitrary group, (with an integer number).
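Fairness relative to two given groups can be checked on a set of decided cases along the following lines. This is a minimal sketch under our own assumptions: the function names, the record format, and the use of observed frequencies in place of probabilities are all hypothetical choices, and the check covers only the two groups passed in, not all logically possible groups.

```python
from collections import defaultdict


def conviction_rates(records):
    """Map (merit, group) -> share convicted.

    `records` holds (merit, group, is_convicted) triples; the format is
    an assumption made for this sketch.
    """
    totals = defaultdict(int)
    convicted = defaultdict(int)
    for merit, group, is_convicted in records:
        totals[(merit, group)] += 1
        convicted[(merit, group)] += int(is_convicted)
    return {key: convicted[key] / totals[key] for key in totals}


def is_group_fair(records, group_a, group_b, tolerance=0.0):
    """True if, within every merit class present in both groups, the two
    groups are convicted at the same rate (up to `tolerance`)."""
    rates = conviction_rates(records)
    merits = {merit for merit, _ in rates}
    for merit in merits:
        key_a, key_b = (merit, group_a), (merit, group_b)
        if key_a in rates and key_b in rates:
            if abs(rates[key_a] - rates[key_b]) > tolerance:
                return False
    return True
```

Note that the comparison is made only between people with the same merit value. Innocent and guilty defendants may be convicted at very different rates without the check failing, in line with the definition above.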
Remark.
We underline here that a group could also be defined by a single individual, which constitutes a singleton (namely a set with exactly one element).
4.4 Kinds of non-perfectly-just procedures
A procedure can fail to be perfectly just in two main different ways.
First, procedures can be unreasonably unjust. We define a procedure to be unreasonably unjust if it is inclined to deliver the opposite of what the individual deserves. Unjust procedures occupy the left portion of the ROC space. They are more likely to convict an innocent than a guilty defendant and more likely to acquit a guilty than an innocent defendant.
The perfectly unreasonably unjust procedure is, as already anticipated, the point , symmetric to in the ROC space. This is the point at which all and only the innocent people are convicted. If one were able to implement a perfectly just procedure, the perfectly unreasonably unjust procedure could be achieved by inverting the values of its attributes.
Second, some procedures are not unreasonably unjust, but imperfectly just. These are the procedures on the right side of the ROC space, with the exclusion of the point . They are not inclined to give individuals the opposite of what they deserve, but they make errors.
The rate of errors can vary. Imperfectly just procedures to the right of the line in the ROC space are more likely to convict a guilty defendant than an innocent defendant and in this sense can be considered approximations of justice, rather than approximations of injustice, overall.
There are three special kinds of imperfectly just procedures.
Merit-agnostic procedures
Some imperfectly just procedures are perfectly merit-agnostic, that is to say, entirely insensitive to merit (). In formal terms: the distribution of the outcome is independent of the feature that justifies inequality, which we call "merit" (). That is
and [Footnote 10: In binary cases, the second condition follows necessarily from the first.]
where .
In the example of the criminal procedure, a procedure that is perfectly merit-agnostic is equally likely to convict or acquit an individual who is innocent or guilty. This can be achieved, for example, by tossing a coin in order to decide whether the defendant should be convicted or acquitted. Coin tosses and similar "random" devices easily guarantee equal chances of acquittal for all innocent defendants, irrespective of their arbitrary traits. But they also guarantee the same chances of acquittal to innocent and guilty defendants alike, that is, they are fully agnostic with respect to . In the ROC space, these procedures are represented by the line .
Merit-agnostic procedures are not unreasonably unjust, and they can also be group-fair, since individual outcome prospects are identical across groups for people with the same merit features. Consider assigning convictions by tossing an unbiased coin. This procedure ignores merit entirely, which in this case is the defendant’s innocence or guilt. But it is also group-fair, as innocent individuals in every possible morally arbitrary group have exactly a 50 percent chance, on average, of being acquitted. Random procedures are the prime example of procedures that could be merit-agnostic, under a given description. [Footnote 11: We use "random" here to mean a procedure that is not "deterministic" in the sense explained in Section 3. One may argue that deterministic procedures could be merit-agnostic in special cases. We will not discuss this case in what follows because it is irrelevant to our argument. For, if a merit-agnostic procedure has to be fair to all groups, including singletons, it cannot be deterministic, as we will argue next.]
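The coin-toss example can be simulated in a few lines. This is only a sketch: it assumes a pseudo-random generator as a stand-in for a genuinely random device, and the group labels "A" and "B", the sample size, and the function names are hypothetical.

```python
import random

def coin_toss_procedure(rng):
    """Acquit with probability 1/2, ignoring everything about the defendant."""
    return rng.random() < 0.5

def acquittal_rates(defendants, rng):
    """Empirical acquittal rate for each (merit, group) cell."""
    totals = {}
    for merit, group in defendants:
        acquitted = coin_toss_procedure(rng)
        hits, n = totals.get((merit, group), (0, 0))
        totals[(merit, group)] = (hits + int(acquitted), n + 1)
    return {key: hits / n for key, (hits, n) in totals.items()}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
defendants = [
    (merit, group)
    for merit in ("innocent", "guilty")
    for group in ("A", "B")
    for _ in range(20000)
]
rates = acquittal_rates(defendants, rng)
for key in sorted(rates):
    print(key, round(rates[key], 3))
```

Every cell comes out close to one half: the procedure is group-fair, since groups do not differ for a given merit value, and merit-agnostic, since innocent and guilty defendants do not differ either.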
Degenerate procedures
A special case of a perfectly merit-agnostic procedure is the procedure that assigns the same outcome to every individual. The all-in-jail and free-for-all points, and , in the ROC space satisfy this definition. These are procedures only in a degenerate sense, since they do not contemplate any option.
Semi-perfect procedures
Finally, there could be procedures that are perfectly just for some values of , but not for others. In our binary example, this would be a procedure guaranteeing that no innocent person be convicted, but that wrongly acquits some guilty defendants. This would be perfect for innocents, but not for the guilty. Conversely, a procedure could be perfect for the guilty but not for innocents. That is, it could guarantee that all guilty defendants are convicted, but also convict some innocents.
5 Thesis and short argument
Our thesis is that if a procedure is deterministic, it is either perfect or there is at least one morally arbitrary group such that the probabilities of obtaining the deserved treatment differ on average between the in-group and the out-group (i.e., group fairness is violated).
The argument can be summarized as follows. Suppose we are dealing with a deterministic procedure, for which we can always conceive the set of determinant facts, , causing the acquittal of the defendant. These facts will either obtain or not. If the facts obtain for all and only the innocent defendants, the procedure is perfect. Otherwise, the determinant facts will fail to obtain for some innocent individuals, or they will also obtain for some guilty ones. In this case, the procedure is imperfect, and the two groups and are morally arbitrary. Innocent individuals have different chances of benefiting from the procedure, depending on which of the two groups, or , they belong to. Therefore, group fairness is not satisfied. This argument does not consider random procedures, those for which a set of determinant facts, , cannot be defined. We therefore conclude that, among imperfect procedures, only random procedures can be absolutely group-fair. If acquittal is decided by the throw of a coin, for instance, there is no set of facts (about the defendant and the circumstances of the case) prior to the throw itself which determines whether the defendant will be acquitted or convicted. No can be defined, which blocks our argument. [Footnote 12: If a procedure includes a randomizing mechanism, such as heads or tails, and all the causes that make a coin land heads or tails, in a deterministic world, are included in the description of the procedure's initial state, so that a certain outcome follows necessarily in standard operating conditions, then that procedure counts as deterministic too.]
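The short argument can be made concrete with a toy example. The sketch below assumes a hypothetical deterministic criterion ("acquit if and only if an alibi is on record") and a five-person population; both are illustrative assumptions, not part of the formal argument.

```python
def acquit(defendant):
    """Hypothetical deterministic criterion: acquit iff an alibi is on record."""
    return defendant["alibi"]

# Illustrative population; the criterion tracks innocence only imperfectly.
population = [
    {"name": "a", "innocent": True, "alibi": True},
    {"name": "b", "innocent": True, "alibi": True},
    {"name": "c", "innocent": True, "alibi": False},   # innocent but convicted
    {"name": "d", "innocent": False, "alibi": True},   # guilty but acquitted
    {"name": "e", "innocent": False, "alibi": False},
]

def innocent_acquittal_rate(group):
    """Share of the innocent members of a group who are acquitted."""
    innocents = [d for d in group if d["innocent"]]
    return sum(acquit(d) for d in innocents) / len(innocents)

# The procedure is perfect only if it acquits all and only the innocent.
is_perfect = all(acquit(d) == d["innocent"] for d in population)

# The two groups induced by the criterion itself.
criterion_met = [d for d in population if acquit(d)]
criterion_unmet = [d for d in population if not acquit(d)]

rate_met = innocent_acquittal_rate(criterion_met)
rate_unmet = innocent_acquittal_rate(criterion_unmet)
print(is_perfect, rate_met, rate_unmet)  # False 1.0 0.0
```

Innocent defendants in the first group are certain to be acquitted, while those in the second are certain to be convicted, which is the violation of group fairness that the argument describes.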
It may be objected that membership in is not morally arbitrary. Even if does not track innocence perfectly, it is not entirely unrelated to it either. If is a reasonably good criterion for acquittals, as it must be if we are in (vii), then tracks innocence to a certain degree. That is, defendants for whom are more likely to be innocent than defendants for whom . The rejoinder is to observe that, if someone were to use sex as the determinant criterion for a decision, even in a world in which it were reasonably well correlated with the just outcome, this would still strike most people as unfair. The philosophical challenge is therefore to differentiate the obviously unfair criterion of sex (even when it can be correlated with the just outcome) from any other type of .
In Section 7 we will explain why this result is important from a moral and political point of view and should affect all subsequent theorizing about fairness in relation to imperfect procedures.
6 Detailed argument
We shall now provide the detailed argument. Here it is useful to keep in mind the ROC space, which makes our line of reasoning easier to follow. Let be a criterion and let us suppose the following logical equivalence between utility and criterion, such that is a necessary and sufficient condition for . Thus, we immediately have
(6.2)  
(6.3) 
Now let us consider the following generic condition for a (binary) procedure:
Here and are the probabilities that a defendant is acquitted if she is, respectively, innocent or guilty of the crime charged to her. The conditions and can be deduced directly from our binary hypothesis (i.e., one can deduce the third and the fourth equation directly from the first and the second). We can divide the triangle of Figure 1 into its fundamental components, distinguishing seven cases (three vertices, three edges, and the interior surface). These seven possibilities represent all the logical possibilities (we highlight, once again, that the triangle of Figure 1 represents the perversion of justice, with all its possible degrees of imperfection, and can be easily recovered by symmetry, but we are not considering it in our argument). We remind the reader that is the probability of conviction for the guilty defendant and for the innocent.
Vertices
(i) Perfectly just procedure: and .
(ii) Everyone convicted: and .
(iii) Everyone acquitted: and .
Edges
(iv) Perfect for guilty defendants: and .
(v) Perfect for innocent defendants: and .
(vi) Perfectly merit-agnostic: with and . We are in the segment (the dotted line) of the diagram (extremes excluded).
Interior surface
(vii) Generic imperfect procedures: (and , ). We are in the interior of the triangle (i.e., edges excluded).
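The seven cases can be sketched as a small classifier. The parametrization is an assumption made for illustration: `p_guilty` and `p_innocent` denote the probabilities of conviction for a guilty and for an innocent defendant, and the numerals (i) to (vii) follow the order in which the cases are referred to in the argument (three vertices, three edges, interior).

```python
def classify(p_guilty, p_innocent):
    """Classify a binary procedure by its two conviction probabilities.

    p_guilty   -- assumed name: probability of convicting a guilty defendant
    p_innocent -- assumed name: probability of convicting an innocent defendant
    """
    if not (0 <= p_guilty <= 1 and 0 <= p_innocent <= 1):
        raise ValueError("probabilities must lie in [0, 1]")
    if p_guilty < p_innocent:
        # More likely to convict the innocent than the guilty.
        return "outside the triangle: inclined towards the opposite of desert"
    if p_guilty == p_innocent:
        if p_guilty == 1:
            return "(ii) everyone convicted"
        if p_guilty == 0:
            return "(iii) everyone acquitted"
        return "(vi) perfectly merit-agnostic"
    # From here on, p_guilty > p_innocent.
    if p_guilty == 1 and p_innocent == 0:
        return "(i) perfectly just"
    if p_guilty == 1:
        return "(iv) perfect for guilty defendants"
    if p_innocent == 0:
        return "(v) perfect for innocent defendants"
    return "(vii) generic imperfect procedure"

for point in [(1, 0), (1, 1), (0, 0), (1, 0.2), (0.7, 0), (0.5, 0.5), (0.8, 0.1), (0.1, 0.8)]:
    print(point, "->", classify(*point))
```

Exact equality tests are used because the cases are defined by boundary conditions; with estimated probabilities one would need a tolerance instead.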
Perfect procedures
Let us first of all suppose that a procedure is perfect. Then, by definition all people obtain what they deserve. That means that all and only the guilty people () are convicted (). Since the procedure reaches the outcome (conviction) when and only when there is sufficient evidence of the crime (), a perfect procedure is also one in which the criterion for acquittal is never satisfied in the case of every actual crime () and it is always satisfied when the defendant was actually not responsible for it (). Or in other words, the criterion for acquittal perfectly tracks the facts of the crimes and the merit of the defendant.
This procedure is perfectly fair because it ensures that people with the same value of have the same outcome, hence their probabilities of obtaining those outcomes are the same. Perfectly just procedures are fair, but unfeasible.
Merit-agnostic procedures
In cases (vi), (ii) and (iii) the procedure is, by definition, insensitive to differences in merit . As explained above, it is important to highlight that group fairness obtains absolutely only if it obtains for all logically possible morally arbitrary groups. This includes those groups containing a single individual. It is trivial to notice that the person belonging to the singleton can be either innocent () or guilty (), at least in our binary model. Indeed it follows immediately by definition that these procedures can be described with , and . This entails independence from any a priori criterion. [Footnote 13: We exclude the a posteriori criterion of the lottery procedure , e.g., the winners of a lottery, where is not the initial state of the procedure, but coincides with its outcome.] For simplicity, suppose that the population consists of several groups, each of which contains one and only one individual (e.g., there is only one person at a given intersection of gender, race, genetic, and all other attributes), each of whom is either innocent or guilty. A deterministic procedure can only be merit-agnostic if it acquits the single individual in a certain proportion (say, ) of groups of innocent defendants and if it acquits the individual in the same proportion of groups of guilty ones. Clearly, this violates group fairness because there are morally arbitrary groups in which innocents are certainly acquitted () and others in which they are certainly convicted (). The example generalizes.
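The singleton example can be sketched as follows, under the assumption of eight hypothetical individuals whose verdicts are fixed in advance, as they would be by a deterministic procedure. The names and verdicts are illustrative only.

```python
# Each person is a singleton group; verdicts are fixed in advance (deterministic).
# (name, merit, acquitted): all values are illustrative assumptions.
verdicts = [
    ("p1", "innocent", True),
    ("p2", "innocent", True),
    ("p3", "innocent", False),
    ("p4", "innocent", False),
    ("p5", "guilty", True),
    ("p6", "guilty", True),
    ("p7", "guilty", False),
    ("p8", "guilty", False),
]

def acquittal_share(merit):
    """Share of individuals with the given merit value who are acquitted."""
    outcomes = [acq for _name, m, acq in verdicts if m == merit]
    return sum(outcomes) / len(outcomes)

def singleton_probabilities(merit):
    """Acquittal probability of each singleton group: 0.0 or 1.0, never between."""
    return {name: float(acq) for name, m, acq in verdicts if m == merit}

# Merit-agnostic in the aggregate: same acquittal share for innocent and guilty.
aggregate_agnostic = acquittal_share("innocent") == acquittal_share("guilty")

# Group-fair across singletons only if all innocents have the same probability.
innocent_probs = singleton_probabilities("innocent")
singleton_fair = len(set(innocent_probs.values())) == 1

print(aggregate_agnostic, innocent_probs, singleton_fair)
```

The procedure is merit-agnostic in the aggregate, yet the innocent singletons split into those certainly acquitted and those certainly convicted, so group fairness with respect to singletons fails.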
Let us now consider the remaining cases, namely (iv), (v) and (vii).
Imperfectly just procedures
Let us begin with (vii). Mathematically speaking, the argument for (vii) can be formalized, without loss of generality, as follows. Let us consider , namely
However, by using condition (6.3), we have immediately
(6.4) 
and
(6.5) 
By these two equations the role of is fulfilled: the original procedure collapses into the two possible results provided by (6.4) and (6.5). Indeed, the criterion , by definition, represents a necessary and sufficient condition for the outcome . This implies directly that acquittal and conviction are entirely independent of and fully determined by . It follows that individuals with the same values of have different chances depending on the value of . Thus, the groups identified by the criterion (we recall that and are complementary in the binary case) actually identify an unfair procedure.
Semi-perfect procedures
Let us now consider the special cases (iv) and (v). The procedures in (v) are fair to the innocents: since all innocents are certain to be acquitted, it trivially follows that innocent defendants of different groups have the same chances. But these procedures are not fair to the guilty, since guilty individuals have different chances of conviction depending on their membership in . Mutatis mutandis, it should be clear that the procedures in (iv) are fair to the guilty, but not to the innocents. [Footnote 14: This is not to say that the procedure is insensitive to merit, for, by assumption, the imperfect procedures in question are such that they give guilty defendants higher chances of conviction than innocent ones, that is .] We underline that a similar argument can be provided for by using condition (6.2).
7 Discussion
Ideally, group fairness should constrain the way an imperfect, deterministic procedure achieves (a given degree of) justice. This seems possible when the number of groups regarded as morally arbitrary is limited, as in Example 3.1. However, if one requires the fulfillment of group fairness with respect to all morally arbitrary groups, this can only be achieved by avoiding determinism. In a nondeterministic procedure, a given initial state of the procedure may correspond to some outcomes in one case, and to other outcomes in another, without any change in its inputs [Footnote 15: Or conditions, or circumstances, that can be modelled as the initial state of the procedure.], in normal operating conditions. This is problematic, since this random element can be seen as the opposite of what justice should try to achieve where there is a clear moral ground (e.g., innocence) to justify an outcome.
One could take this as a reason to reject group fairness altogether. If so, when assessing procedures that are imperfect, one simply ought to minimize injustice, disregarding group-distributive effects altogether. A less radical option is to amend our account of group fairness. This requires a different theory of what makes certain groups "morally arbitrary". "Different" here means that the concept of the morally arbitrary is not reducible to "something other than what, morally speaking, justifies inequality". This is the direction in which we want to push our argument, philosophically. We believe our result is best interpreted as an argument supporting the quest for such a philosophical view.
We are aware that we make certain assumptions about the nature of procedures that are quite restrictive. We hope this is seen as a useful attempt to describe the nature of at least an important kind of procedure.
8 Conclusions
In this paper we have provided a mathematical proof that a decidable procedure cannot fulfill group fairness in relation to all possible relevant groups. Arguably this result generalizes to all procedures that are deterministic under certain descriptions.
Group justice, as defined, bears significant analogies to certain measures of fairness in the machine learning literature, in particular equality in the false positive/negative rates of different groups. If our argument is correct, group fairness can be defined in a similar way in relation to justice. This shows that the relation between group fairness, understood as a quality of imperfect procedures, and justice, is normally problematic.
Hopefully our paper will motivate other scholars to study questions of group fairness in imperfect procedures, regarded as a general type of procedure, conceptually broader than statistical models.