## 1 Introduction

Peer review plays a prominent role in nearly every aspect of academia. In addition to screening scientific contributions, peer review is also a major component in grant applications, book reviews, and academic promotions. Peer review serves a number of functions: selecting the best manuscripts, assessing originality, providing feedback, pointing out unacknowledged prior work, deciding the significance of new work, as well as flagging fraudulent or plagiarised work [mulligan2013peer]. The results of peer review impact hiring, promotion, and tenure decisions indirectly through gatekeeping of prestige. Given the broad application of peer review and its significant gatekeeping role, it is imperative that this process be as objective as possible.

One important parameter is whether reviewers possess the proper expertise for their assigned papers.
Selecting reviewers for submitted papers is therefore a crucial first step of any reviewing process. In large conferences such as NeurIPS/ICML/AAAI/IJCAI, reviewer assignment is largely automated through systems such as the Toronto Paper Matching System (TPMS) [charlin2013toronto], Microsoft CMT (https://cmt3.research.microsoft.com/), or OpenReview (https://github.com/openreview/openreview-matcher).
Inappropriately assigned reviewers may lead to failures: misinformed decisions, reviewer disinterest, and a general mistrust of the peer-review process.

Accuracy and fairness are two important criteria in reviewer assignment [shah2019principled], and fair division in general [brandt2016handbookfairdiv]. Overall assignment accuracy enables global objectives, such as maintaining quality standards for conferences, grants, and journals. However, it is imperative that we do not sacrifice review quality on some papers to obtain higher overall matching scores. Papers which receive poorly matched reviewers may be unfairly rejected or receive unhelpful feedback, causing the authors real harm. We thus desire algorithms which are globally accurate and fair.

To accomplish these goals, we consider the fair reviewer assignment problem as an instance of a *fair allocation* problem.
The theory of fair allocation offers a principled way of discussing fairness and optimality in assignment problems.

Our principal fairness criterion is *envy*: one paper envies another paper if it prefers the other’s assigned reviewers over its own. It is generally not possible to obtain envy-free allocations for indivisible items [brandt2016handbookfairdiv], so we focus on the criterion of envy-free up to one item (EF1) [budish2011ef1, lipton2004approximately].
We require that for all pairs of papers $i$ and $j$, either paper $i$ does not envy paper $j$, or paper $j$ has a reviewer such that paper $i$ would not envy $j$ if that reviewer were dropped.
In standard fair allocation settings, the well-known *round-robin* (RR) mechanism produces EF1 allocations by setting an order of agents, and letting them select one item at a time. However, under the constraint that no paper can be reviewed by the same reviewer twice, round-robin is no longer guaranteed to be EF1. To mitigate this problem, we present a variation on classic RR, which we term *Reviewer Round Robin (RRR)*.

While RR mechanisms are known to satisfy fairness constraints, their efficiency guarantees are highly dependent on the order in which players pick items. For example, consider a stylized setting where there are two papers ($p_1$ and $p_2$) and two reviewers ($r_1$ and $r_2$): paper $p_1$ views both reviewers as equally qualified, and assigns them an affinity score of 5 each; paper $p_2$ is highly affiliated with $r_1$, but has zero affiliation with $r_2$, assigning a score of 10 to $r_1$ and 0 to $r_2$. A round-robin mechanism that lets $p_1$ pick first runs the risk of having $p_1$ pick $r_1$, leaving $p_2$ with $r_2$. Letting $p_2$ pick first would have resulted in a much better outcome, without compromising on fairness. Prior works show that it is generally difficult to identify optimal picking sequences [bouveret2011general, kalinowski2013social, aziz2016welfare, aziz2015possible], which raises the question: can we identify *approximately optimal* player orders? This is where our work comes in.
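The order dependence in this example can be checked directly. The following is a minimal sketch of classic round-robin with a single pick per paper; the identifiers `p1`, `p2`, `r1`, `r2`, the helpers `round_robin` and `welfare`, and the tie-breaking rule (the first-listed reviewer wins ties, which realizes the unlucky case) are illustrative assumptions rather than notation from this paper.

```python
def round_robin(order, values, reviewers, picks_per_paper=1):
    """Classic round-robin: papers take turns picking their best remaining reviewer.

    Ties go to the reviewer listed first (an assumption that realizes the
    unlucky outcome described in the text).
    """
    remaining = list(reviewers)
    bundles = {p: [] for p in order}
    for _ in range(picks_per_paper):
        for p in order:
            if not remaining:
                break
            # max() returns the first maximal element, so ties favor earlier reviewers
            best = max(remaining, key=lambda r: values[p][r])
            remaining.remove(best)
            bundles[p].append(best)
    return bundles


def welfare(bundles, values):
    """Total affinity over all assigned paper-reviewer pairs."""
    return sum(values[p][r] for p, bundle in bundles.items() for r in bundle)


values = {
    "p1": {"r1": 5, "r2": 5},   # indifferent between the two reviewers
    "p2": {"r1": 10, "r2": 0},  # strongly prefers r1
}
reviewers = ["r1", "r2"]

p1_first = round_robin(["p1", "p2"], values, reviewers)
p2_first = round_robin(["p2", "p1"], values, reviewers)
print(welfare(p1_first, values))  # 5
print(welfare(p2_first, values))  # 15
```

Under these assumptions the two orders differ by a factor of three in total welfare, even though both outcomes come from the same mechanism.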

### 1.1 Our Contributions

We run a combinatorial search for orders of papers that yield high efficiency allocations under the RRR mechanism. To do so, we examine the problem of finding an optimal paper order via the lens of *submodular optimization*.
Specifically, we optimize a function on partial player sequences, which varies according to the welfare of the RRR allocation respecting that partial sequence. Though this function is not submodular in general, we can capture its distance from submodularity via a variable $\gamma$. Our main theoretical result (Theorem 3.2), which is of independent interest, shows that a simple greedy approach maximizes this function up to a factor of $1+\gamma^2$.

We implement our greedy RRR algorithm, and test it on three real-world conference datasets. We compare our approach to four other state-of-the-art paper assignment frameworks. Not only is RRR the only provably EF1 approach, it is considerably faster than the only other method with comparable fairness metrics. Finally, RRR maintains high utility guarantees, consistently outperforming its computationally feasible competitors.

### 1.2 Related work

The reviewer assignment problem has been extensively studied (see [wang2008survey] for a survey). Most works model the problem as a mixed-integer linear program; the Toronto Paper Matching System (TPMS) is perhaps the most notable early work presenting this formulation [charlin2011framework, charlin2013toronto]. Most works assume a list of submitted papers and available reviewers, along with a matrix of *affinities* for each reviewer-paper pair [kobren2019paper, charlin2013toronto]. It is assumed that these affinities accurately model the expected review quality provided by each reviewer for each paper. In general, existing work has sought to maximize the sum of the affinities of all reviewer-paper pairs (the *utilitarian social welfare*) subject to load constraints on reviewers and coverage requirements for papers [charlin2013toronto, conry2009recommender, kobren2019paper, stelmakh2019peerreview4all].

A number of prior works consider fairness objectives in peer review, though none of them consider envy-freeness up to one item. hartvigsen1999conference ensure that at least one qualified reviewer is assigned to each paper. kobren2019paper present two algorithms, FairFlow and FairIR, that maximize the sum of paper scores subject to the constraint that every paper must receive a paper score above a specified threshold. odell2005paper maximize the minimum paper score, noting that this objective may be at odds with maximizing the sum of paper scores. The PeerReview4All algorithm from stelmakh2019peerreview4all extends the max-min objective by maximizing the minimum paper score, then maximizing the next smallest paper score, etc. There are important connections to social choice objectives; maximizing the minimum paper score is equivalent to maximizing egalitarian welfare, while PeerReview4All maximizes the leximin criterion. A number of works study fair assignment of papers to reviewers, allowing reviewers to express preferences over papers by bidding [garg2010assigning, lian2018conference, aziz2019constrained]. This setting aims to be fair to the reviewers, whereas our work aims to be fair to the papers. (We argue that it is more appropriate to treat reviewers, rather than papers, as goods: paper reviewing is generally viewed as a chore, not a benefit (the current submission being an obvious exception), whereas papers do benefit from appropriate reviews.) Other works target reviewer assignments with properties besides fairness or efficiency; long2013good avoid conflicts of interest, while kou2015weighted and ahmed2017diverse focus on assigning sets of reviewers with diverse interests and full coverage of the paper’s topics.

Existing work on fair allocation of indivisible items is also relevant. bei2019price and barman2020settling study the price of fairness, the ratio between the maximum utilitarian welfare and the maximum utilitarian welfare for any fair allocation, for various fairness definitions. caragiannis2012efficiency find the price of envy-freeness to be for agents, when an envy-free allocation exists. bei2019price show that the worst-case price of round-robin is for agents and goods, while barman2020settling conclusively determine the price of EF1 is [barman2020settling]. barman2019fair show that maximizing the utilitarian welfare subject to EF1 is not polynomial-time approximable. Maximizing the utilitarian welfare under round-robin is also NP-complete. aziz2015possible present a problem top- PossibleSet, which given a class of allocation mechanisms (such as round-robin) determines if an agent can receive their top- goods. They show this problem is NP-complete for round-robin allocations. Following techniques from aziz2016welfare, a simple reduction from top- PossibleSet to the problem of determining if a given welfare is achievable through round-robin proves that maximizing USW over round-robin orders is NP-complete. These results suggest that approximately maximizing the welfare over round-robin allocations is a reasonable objective, especially for small .

Two works in fair allocation are particularly relevant for fair reviewer assignment, but they both have important limitations in our setting. aziz2019constrained present an algorithm which takes a constraint as input, and outputs a -satisfying EF1 allocation if possible. can include a minimum threshold on utilitarian welfare, and it can also incorporate arbitrary constraints on allocations. Unfortunately, the runtime of CRR in our setting is , where is the number of papers, is the number of reviewers per paper, and is the complexity of checking (which is quite demanding). biswas2018fair present a modification of the round-robin mechanism that assigns a complete EF1 allocation when items are partitioned into categories and agents can receive a limited number of items from each category. Their framework almost applies to reviewer assignment, but they do not include a way to limit the total number of items received by each agent. In addition, they offer no efficiency guarantees beyond completeness.

## 2 Preliminaries

We represent reviewer assignment as a problem of allocating indivisible goods, with papers as agents and reviewers as goods. To simplify notation, given a set $S$ and an element $x$, we often write $S + x$ instead of $S \cup \{x\}$. We are given a set of papers $N = \{1, \dots, n\}$, and a set of reviewers $R = \{r_1, \dots, r_m\}$. Each paper $i \in N$ has a valuation over reviewers $v_i: R \to \mathbb{R}_{\geq 0}$, which defines how much value each reviewer provides to the paper. The value $v_i(r)$ typically models alignment between reviewer expertise and paper topics, and can incorporate other relevant notions like reviewer bids and conflicts of interest; several works study how these values are generated [charlin2013toronto, kobren2019paper], a question that is orthogonal to our work.

Papers generally receive more than one reviewer, so we define valuation functions over sets of reviewers. We assume *additive* valuation functions, where for all papers $i \in N$ and subsets $S \subseteq R$, $v_i(S) = \sum_{r \in S} v_i(r)$. An *assignment* or *allocation* of reviewers to papers is an ordered tuple $A = (A_1, \dots, A_n)$ where each $A_i \subseteq R$ is a set of distinct reviewers assigned to paper $i$. We can refer to $A_i$ as paper $i$’s *bundle*.

Each reviewer $r$ has an upper bound $u_r$ on the number of papers they can review. No reviewer can be assigned to the same paper twice. To ensure a roughly even distribution of reviewers, we will require that each paper receives at most $k$ reviewers. When all papers are assigned $k$ distinct reviewers, we call that allocation *complete* (and *incomplete* otherwise). We try to achieve complete allocations whenever possible.

We now discuss our notion of fairness. An allocation $A$ is considered *envy-free* if for all pairs of papers $i$ and $j$, $v_i(A_i) \geq v_i(A_j)$. This criterion is not achievable in general (consider the example of two papers and one reviewer whose capacity is $1$), so we relax the criterion. An allocation $A$ is envy-free up to one item (EF1) if for all pairs of papers $i$ and $j$, there is a reviewer $r \in A_j$ such that $v_i(A_i) \geq v_i(A_j \setminus \{r\})$.
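To make the two fairness notions concrete, here is a minimal sketch of an envy test and an EF1 test for additive valuations. The function names, the dictionary-based data layout, and the convention that a pair with no envy (including one facing an empty bundle) trivially satisfies EF1 are assumptions made for illustration.

```python
def bundle_value(valuation, bundle):
    """Additive valuation: a bundle is worth the sum of its reviewers' values."""
    return sum(valuation[r] for r in bundle)


def envies(i, j, values, alloc):
    """True if paper i strictly prefers paper j's bundle to its own."""
    return bundle_value(values[i], alloc[j]) > bundle_value(values[i], alloc[i])


def ef1_holds(i, j, values, alloc):
    """True if i does not envy j after dropping i's favorite reviewer from j's bundle."""
    own = bundle_value(values[i], alloc[i])
    other = bundle_value(values[i], alloc[j])
    if other <= own or not alloc[j]:
        return True  # no envy at all (assumed to satisfy EF1 trivially)
    return other - max(values[i][r] for r in alloc[j]) <= own


def is_ef1(values, alloc):
    """Check EF1 over all ordered pairs of distinct papers."""
    papers = list(alloc)
    return all(ef1_holds(i, j, values, alloc)
               for i in papers for j in papers if i != j)


# Two papers, one reviewer with capacity 1: envy is unavoidable, EF1 still holds.
values = {"p1": {"r": 1.0}, "p2": {"r": 1.0}}
alloc = {"p1": ["r"], "p2": []}
print(envies("p2", "p1", values, alloc))  # True
print(is_ef1(values, alloc))              # True
```

The example is the two-paper, one-reviewer instance mentioned above: whichever paper misses out on the reviewer envies the other, but the envy disappears once that single reviewer is dropped.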

The *utilitarian social welfare* (“utilitarian welfare” or “USW”) of an allocation $A$ is the sum of the papers’ valuations under that allocation:

$$\mathrm{USW}(A) = \sum_{i \in N} v_i(A_i).$$

USW is a natural objective in the context of reviewer assignment, and has been used in many prior works on this topic [charlin2013toronto, conry2009recommender, kobren2019paper, stelmakh2019peerreview4all].

We also use the *Nash social welfare* or “NSW” as a second measure of efficiency in our experimental evaluation:

$$\mathrm{NSW}(A) = \Big(\prod_{i \in N} v_i(A_i)\Big)^{1/n}.$$

Nash welfare is another common efficiency measure, and allocations with high NSW provide a balance of efficiency and fairness [caragiannis2019unreasonable].
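Both welfare measures are straightforward to compute from an allocation. The sketch below assumes the same dictionary-based layout as an illustration (the names `usw`, `nsw`, and `paper_scores` are hypothetical), treats NSW as the geometric mean of paper scores, and returns 0 as soon as any paper has a score of 0.

```python
import math


def paper_scores(values, alloc):
    """Score of each paper: the additive value of its assigned bundle."""
    return [sum(values[i][r] for r in alloc[i]) for i in alloc]


def usw(values, alloc, normalize=False):
    """Utilitarian welfare; with normalize=True this is the mean paper score."""
    total = sum(paper_scores(values, alloc))
    return total / len(alloc) if normalize else total


def nsw(values, alloc):
    """Nash welfare as the geometric mean of paper scores (0 if any score is 0)."""
    scores = paper_scores(values, alloc)
    if min(scores) <= 0:
        return 0.0
    # Averaging logarithms avoids overflow when multiplying many scores
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


values = {"a": {"r1": 1, "r2": 0}, "b": {"r1": 0, "r2": 4}}
alloc = {"a": ["r1"], "b": ["r2"]}
print(usw(values, alloc))                  # 5
print(usw(values, alloc, normalize=True))  # 2.5
print(round(nsw(values, alloc), 6))        # 2.0
```

With scores of 1 and 4, the arithmetic mean is 2.5 while the geometric mean is 2, which illustrates how NSW penalizes unequal scores more heavily than USW.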

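For round-robin, we define an *order* on papers as a tuple $O = (S, o)$, where $S \subseteq N$ is the set of papers in the order and $o: S \to \{1, \dots, |S|\}$ is a permutation on $S$ mapping papers to positions. We slightly abuse notation and say that a paper $i \in O$ if $i \in S$.
For any $i, j \in O$, we say that $i \prec_O j$ if and only if $o(i) < o(j)$.
We sometimes write $i \prec j$ when $O$ is clear from context. We can think of an order as an ordered list $(i_1, \dots, i_{|S|})$ such that $o(i_t) = t$ for all positions $t$. We use the notation $O + i$ to indicate the order that appends $i$ to the end of $O$. Formally, $O + i = (S + i, o')$, where $o'(j) = o(j)$ for $j \in S$, and $o'(i) = |S| + 1$.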

## 3 Fair and Efficient Reviewer Assignment

### 3.1 Reviewer Round-Robin

We first describe how to obtain EF1 reviewer assignments, before turning our attention to efficiency. To ensure EF1, our allocations will draw on the simple and well-known round-robin mechanism for assigning goods to agents. Given an ordered list of agents, round-robin proceeds in rounds. Each round, we iterate over the agents in the assigned order, allowing each agent to pick the highest valued remaining good. The process terminates when all goods have been chosen. The resulting allocation is EF1 for additive valuations by a simple argument [caragiannis2019unreasonable]. For any agent $i$, we divide the item selections into rounds specific to that agent. Round $0$ includes the first item chosen by all agents $1$ through $i-1$. Each subsequent round consists of agent $i$ selecting an item first, followed by all other agents and ending with agent $i-1$ or the last agent which receives the final good in the final round. For all rounds after $0$, agent $i$ prefers the item it selected over any other agent’s item selected in that round (in the last round, some agents may get nothing). Thus agent $i$ prefers its own bundle to the bundle of any agent $j > i$, and it prefers its own bundle to that of any agent $j < i$ if we ignore $j$’s good from round $0$.

In our setting, we have two additional constraints that are not present in the setting examined by caragiannis2019unreasonable. Papers must select at most $k$ reviewers, and the reviewers must be distinct. A trivial modification of round-robin allows us to satisfy the first constraint: proceed for exactly $k$ rounds, then stop. We might naively update round-robin to satisfy the distinctness constraint as well, by allowing each paper to take the best reviewer they do not already have. However, the argument from caragiannis2019unreasonable fails. To see why, suppose a paper $i$ selects a reviewer $r$ in one round. In the next round, $i$ may still prefer $r$ over any other reviewer, but cannot select it. The paper may be forced to select a much worse reviewer, allowing another paper to take that desired “second copy” of $r$. A more detailed counterexample is shown in Table 1.
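The failure mode can be reproduced on a small instance. The sketch below implements the naive variant just described (each paper takes its best reviewer that still has capacity and that it does not already hold) and then tests EF1. The instance, the identifiers, and the helper names are constructed here for illustration; this is not the counterexample of Table 1, and it is not Algorithm 1.

```python
def naive_round_robin(order, values, capacity, k):
    """Round-robin for k rounds with distinctness, but no EF1 safeguard."""
    load = {r: 0 for r in capacity}
    alloc = {p: [] for p in order}
    for _ in range(k):
        for p in order:
            options = [r for r in capacity
                       if load[r] < capacity[r] and r not in alloc[p]]
            if options:
                best = max(options, key=lambda r: values[p][r])
                alloc[p].append(best)
                load[best] += 1
    return alloc


def violates_ef1(i, j, values, alloc):
    """True if i envies j even after dropping i's favorite reviewer from j's bundle."""
    own = sum(values[i][r] for r in alloc[i])
    other = sum(values[i][r] for r in alloc[j])
    best_drop = max((values[i][r] for r in alloc[j]), default=0)
    return other - best_drop > own


# Reviewer r can serve two papers; every other reviewer serves one.
capacity = {"s": 1, "y": 1, "r": 2, "t1": 1, "t2": 1}
values = {
    "j": {"s": 10, "y": 9, "r": 8, "t1": 0, "t2": 0},
    "i": {"s": 10, "y": 8, "r": 9, "t1": 0, "t2": 0},
}
alloc = naive_round_robin(["j", "i"], values, capacity, k=3)
print(alloc)  # {'j': ['s', 'y', 'r'], 'i': ['r', 't1', 't2']}
print(violates_ef1("i", "j", values, alloc))  # True
```

Paper `i` takes `r` in the first round, cannot take it again in the second, and falls back to a reviewer worth 0; paper `j` then collects the second copy of `r` in the third round. Paper `i` values its own bundle at 9 but values `j`'s bundle at 17 even after dropping its favorite reviewer from it, so EF1 fails.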

We present a modification of the round-robin mechanism that produces reviewer assignments which satisfy all constraints and are EF1. Algorithm 1 forbids any selection that violates a crucial invariant for proving EF1. This invariant derives from the proof of EF1 in the additive case. Any time a paper would select a reviewer such that EF1 would be violated, we forbid the selection and require the paper to select a different reviewer. EF1 violations can only arise when another paper preferred to take that reviewer but could not, either because it had already taken it, or because it would have caused an EF1 violation for that paper as well. Papers always attempt to select reviewers in preference order. Thus when a paper $i$ attempts to select a reviewer $r$, we only need to check for EF1 violations against the other papers that have attempted to select $r$ in the past. Theorem 3.1 proves that Algorithm 1 produces EF1 allocations which satisfy all reviewer assignment constraints.

###### Theorem 3.1

Algorithm 1 terminates with an EF1 allocation where papers receive all distinct reviewers, no reviewer $r$ is assigned to more than $u_r$ papers (its load bound), and all papers have no more than $k$ reviewers (the bundle size limit).

###### Proof

The algorithm assigns at most one reviewer to each paper in each round for rounds, so the constraint that all papers receive at most reviewers is satisfied. In addition, the algorithm always checks that and the number of papers which already have is no more than before assigning to . Thus no paper receives duplicate reviewers and reviewer upper bounds are satisfied.

We now prove that the returned allocation is EF1. Consider some arbitrary paper ; we show that does not envy any other paper by more than 1 reviewer. As in the original round-robin argument, we divide the assignments of reviewers to papers into rounds , where ( only when the algorithm terminates early). Round contains the assignments made during iteration 1 of Algorithm 1 to papers . Rounds through begin with the assignment of a new reviewer to and end with the assignment of a new reviewer to , while round begins with assignment to and ends with assignment to some paper after .

Consider the bundle assigned to some paper after the end of some round . Recall that is the set containing the first reviewer assigned to in Algorithm 1. We will define modified bundles for all , and prove by induction that . For all , let if , and let if .

For the base case, we see that after round , for all and , so .

Now suppose that after round , we have for all . Suppose there is some such that after round , . selects a reviewer in all rounds except , and because valuations are non-negative must select a reviewer in round to obtain . By the inductive hypothesis and the fact that valuations are additive, must prefer the reviewer chose in () to the reviewer chose in (). Because chose first in , this means that attempted to take either in or earlier and must have checked for envy against .

If , then for to successfully take we would require . Similarly, if , then for to successfully take we require .

In both relative orders of and , we contradict our earlier assumption that . This implies that for all and , in particular . Since is either or with , we have that the final allocation is EF1.

∎

It is fairly straightforward to show that RRR always returns a complete, EF1 allocation when the number of reviewers is large.

###### Proposition 1

Given a reviewer assignment problem with reviewers, papers, and paper bundle size limits, where , Algorithm 1 always returns a complete and EF1 allocation.

###### Proof

Algorithm 1 only refuses to assign a reviewer to a paper when is assigned to too many papers, has already been assigned to , or some other paper that has attempted to take “objects” to the assignment. Thus if we have assigned distinct reviewers under Algorithm 1, it must be the case that there is a reviewer that has not been considered by any paper. Because there are at least distinct reviewers, we can see that during any round of the algorithm, there will be such an unconsidered reviewer. Thus in any round, a paper can always select some reviewer that has never been considered by any paper, and the selection will not be refused. This proves that the allocation returned by Algorithm 1 is complete, and we have EF1 from Theorem 3.1. ∎

We hypothesize in Conjecture 1 that it is not always possible to assign all papers $k$ reviewers and still achieve EF1; the existence of an instance that does not admit a complete EF1 assignment is left as an open problem.

###### Conjecture 1

There exists a reviewer assignment problem with papers $N$, reviewers $R$, valuation functions $v_i$, reviewer load bounds $u_r$, and paper bundle size limit $k$ such that no allocation is complete and EF1.

### 3.2 Optimizing Orders for RRR

We have shown how to provably obtain an EF1 allocation of reviewers to papers, but we have no welfare guarantees for this mechanism.
In this section, we present a simple greedy approach to maximize the USW of the RRR mechanism by optimizing over the *ordering* of the papers. We define a function , which represents the USW from running Algorithm 1 on agents in the order with reviewers , reviewer limits , valuation functions , and paper bundle size limit . When it is clear from context, we will drop most of the arguments, writing to indicate that we run Algorithm 1 with the order and all other parameters defined by the current problem instance. Our algorithm will maintain an order , always adding the paper which maximizes .

The algorithm is presented in Algorithm 2. It returns an order on agents, which can be directly input to Algorithm 1 to obtain an EF1 allocation of reviewers. This algorithm is very simple and flexible. It admits trivial parallelization, as the function can be independently computed for each paper. One can also reduce runtime by subsampling the remaining papers at each step. Subsampling weakens the approximation guarantee in theory; while we do not attempt to analyze the approximation ratio of the subsampling approach in this work, we run our largest experiments with this variant, and still maintain good allocations. Let us now establish the welfare guarantees of Algorithm 2.
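The greedy construction can be sketched independently of the underlying mechanism by treating the welfare of a partial order as a black-box oracle. In the sketch below, `greedy_order`, `welfare_of_order`, `sample_size`, and `seed` are hypothetical names, and the demo oracle is plain single-pick round-robin on a two-paper toy instance (affinities 5/5 and 10/0), standing in for Algorithm 1, which is not reproduced here.

```python
import random


def greedy_order(papers, welfare_of_order, sample_size=None, seed=0):
    """Repeatedly append the paper that maximizes the welfare of the extended order.

    If sample_size is given, only a random subset of the remaining papers is
    evaluated at each step (the subsampling variant mentioned in the text).
    """
    rng = random.Random(seed)
    order, remaining = [], list(papers)
    while remaining:
        if sample_size is not None and sample_size < len(remaining):
            candidates = rng.sample(remaining, sample_size)
        else:
            candidates = remaining
        # Each candidate is scored independently, so this step parallelizes trivially
        best = max(candidates, key=lambda p: welfare_of_order(order + [p]))
        order.append(best)
        remaining.remove(best)
    return order


VALUES = {"p1": {"r1": 5, "r2": 5}, "p2": {"r1": 10, "r2": 0}}


def round_robin_welfare(order):
    """Stand-in oracle (NOT Algorithm 1): one pick per paper, ties to the first reviewer."""
    remaining = ["r1", "r2"]
    total = 0
    for p in order:
        best = max(remaining, key=lambda r: VALUES[p][r])
        remaining.remove(best)
        total += VALUES[p][best]
    return total


print(greedy_order(["p1", "p2"], round_robin_welfare))  # ['p2', 'p1']
```

On this toy instance the greedy rule places `p2` first, because the partial order containing only `p2` already achieves welfare 10, compared with 5 for `p1` alone.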

We first review some important concepts and define terms which will be used in the proof.
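A *matroid* [oxley2011matroid] is a pair $(E, \mathcal{I})$ with ground set $E$ and independent sets $\mathcal{I} \subseteq 2^E$, which must satisfy $\emptyset \in \mathcal{I}$.
Independent sets must satisfy the inclusion property: $\forall A \subseteq B \subseteq E$, $B \in \mathcal{I} \implies A \in \mathcal{I}$, and the exchange property: $\forall A, B \in \mathcal{I}$ with $|A| < |B|$, $\exists e \in B \setminus A$ such that $A + e \in \mathcal{I}$.
A *partition matroid* is defined using categories $C_1, \dots, C_\ell$ partitioning $E$ and capacities $d_1, \dots, d_\ell$;
the independent sets are $\mathcal{I} = \{S \subseteq E : |S \cap C_t| \leq d_t \ \forall t \in \{1, \dots, \ell\}\}$.
Given two matroids over the same ground set $(E, \mathcal{I}_1)$ and $(E, \mathcal{I}_2)$, the intersection of the two matroids is the pair $(E, \mathcal{I}_1 \cap \mathcal{I}_2)$. Note that the intersection of two matroids may not be a matroid [oxley2011matroid].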

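We also use the notion of a submodular set function; submodular functions formalize the notion of diminishing marginal gains. For a set function $f: 2^E \to \mathbb{R}$, a set $S \subseteq E$, and an element $e \in E$, we can write the marginal gain of adding $e$ to $S$ under $f$ as $\rho_e^f(S) = f(S + e) - f(S)$ or simply $\rho_e(S)$ if $f$ is understood from context.
Given a set $E$, a function $f: 2^E \to \mathbb{R}$ is *submodular* if for all $S \subseteq T \subseteq E$ and $e \in E \setminus T$, $\rho_e(S) \geq \rho_e(T)$.
A set function $f$ is *monotone* if for all $S \subseteq T \subseteq E$, $f(S) \leq f(T)$. We define the notion of $\gamma$-weak submodularity for monotone, non-negative functions. Given a monotone, non-negative function $f$, we say that $f$ is $\gamma$-weakly submodular if for all $S \subseteq T \subseteq E$ and $e \in E \setminus T$, $\gamma \rho_e(S) \geq \rho_e(T)$.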
Note that when $\gamma = 1$ we recover submodularity and we always have $\gamma \geq 1$.

We show that Algorithm 2 is equivalent to greedily maximizing a $\gamma$-weakly submodular function over the intersection of two partition matroids; its approximation guarantee worsens for larger values of $\gamma$.

Consider tuples of the form where is a paper and represents a position in an order. We define a mapping between sets of tuples to orders. Consider the set . We define two partition matroids and , where , and . Intuitively, forbids duplicating papers, while forbids duplicating positions. Any set in the intersection of these two matroids can be converted into a paper order by sorting on the position elements and outputting the paper elements in that order. Formally, given any set , we can construct an order by taking . For all , let and set . An example of this process is given in Table 2. We can extend this mapping to all subsets of by sorting on the position elements as a primary key and paper elements as a secondary key, then deleting all but the first tuple for each paper.
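The mapping from sets of tuples to orders described above can be sketched as follows. The function names and the `(paper, position)` tuple layout are assumptions made for illustration; the second helper checks membership in both partition matroids, that is, that no paper and no position is used twice.

```python
def tuples_to_order(tuples):
    """Map a set of (paper, position) tuples to a paper order.

    Tuples are sorted by position (primary key) and paper (secondary key);
    only the first tuple of each paper is kept.
    """
    order, seen = [], set()
    for paper, _position in sorted(tuples, key=lambda t: (t[1], t[0])):
        if paper not in seen:
            seen.add(paper)
            order.append(paper)
    return order


def in_both_matroids(tuples):
    """True if no paper and no position appears in more than one tuple."""
    papers = [paper for paper, _ in tuples]
    positions = [position for _, position in tuples]
    return (len(set(papers)) == len(papers)
            and len(set(positions)) == len(positions))


independent = {("a", 3), ("b", 1), ("c", 2)}
print(in_both_matroids(independent))   # True
print(tuples_to_order(independent))    # ['b', 'c', 'a']

duplicated = {("a", 1), ("a", 3), ("b", 2)}
print(in_both_matroids(duplicated))    # False
print(tuples_to_order(duplicated))     # ['a', 'b']
```

In the second example, paper `a` appears at positions 1 and 3; the mapping keeps the earlier position and ignores the later one, so the extended mapping is well defined on every subset of the ground set.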

With these constructions defined, we now see that maximizing the USW for any round-robin allocation is equivalent to an optimization problem over the intersection of the matroids defined above. We will show that Algorithm 2 greedily maximizes a monotonically increasing version of our function over our two partition matroids. Next, we show that when our function is $\gamma$-weakly submodular, we can provide a $\gamma$-dependent approximation ratio.

To make monotonically increasing, we will multiply by a factor of , where is defined as the smallest positive number such that is monotonically increasing. We first prove that Algorithm 2 greedily maximizes . More formally, we prove Lemma 1 that the algorithm selects the element that maximizes at each iteration.

###### Lemma 1

Let for some such that is monotonically non-decreasing. Suppose that Algorithm 2 selects agent at each round , and denote the resulting set of tuples as . Then for all , maximizes over .

###### Proof

We first show that any greedy maximizer for is also a greedy maximizer of . Suppose that . Because , we have

as desired.

We also must prove that we can always simply append to the end of the current ordering (rather than perhaps selecting an arbitrary tuple ). Formally, we want to show that at any point in the algorithm, there is a tuple that maximizes . This is shown via a strong induction argument. For the base case, if some tuple maximizes , then it is easy to see that . Inductively, assume that we have a set and some tuple maximizes such that . Necessarily, , since all other positions have been filled in . Therefore, for any available position , we have that and thus is the same for all allowed . So without loss of generality, we can assume that we can always select the best agent in the next available position (as we do in Algorithm 2). ∎

The greedy algorithm for maximizing terminates when , so we must also ensure Algorithm 2 terminates with an order on all papers. Although Algorithm 2 only considers , which may not be monotonically increasing, by construction it continues until a full order over all papers is reached. Therefore, Algorithm 2 is exactly equivalent to greedily maximizing .

We are now ready to prove Theorem 3.2. Our proof is inspired by the proof in [fisher1978analysis] that a similar greedy algorithm gives a $\frac{1}{P+1}$-approximation for maximizing a monotone submodular function over the intersection of $P$ matroids. However, the introduction of $\gamma$-weak submodularity changes the nature of the proof. Directly applying the techniques in [fisher1978analysis] would result in a trivial bound. To obtain our guarantee, we must also rely on the fact that at each step $t$, we select exactly one tuple of the form $(i, t)$, placing some paper $i$ in position $t$. The optimal solution also selects exactly one such tuple, thus the overlap between the tuples considered in round $t$ by GRRR and the optimal solution is at most 1.

###### Theorem 3.2

Suppose that is the monotonically increasing, -weakly submodular function . The set returned by Algorithm 2 satisfies , where is the optimal paper order for RRR.

###### Proof

Let represent the subset of after the -th step of Algorithm 2, where we add the element to . Let denote the pair in which places paper in position . Denote . Consider the elements of , ordered so that . Let denote (with ). We bound :

(1) |

where the inequality in (1) holds by monotonicity of . By -weak submodularity of , we have that

(2) |

Equality 1 and inequality 2 imply that is upper bounded by

(3) |

with inequality (3) holding by monotonicity of . By -weak submodularity of , we know that for all , . Applying this to (3), we get

(4) |

Next, we claim that for all ,

(5) |

At step , the greedy algorithm chose to add to , with maximizing . If is not present in for any , then the greedy algorithm would have considered adding and determined that was better. Suppose that for some . The greedy algorithm proceeds by filling positions from left to right, so . By the definition of our mapping from sets to orders, will take position and ignore . Thus . In either case, inequality (5) holds. Combining (4) with (5) yields

completing the proof. ∎

When $\gamma = 1$ (that is, when the objective is submodular), Theorem 3.2 yields a $\frac{1}{2}$-approximation guarantee, which beats the $\frac{1}{3}$-approximation guarantee provided by fisher1978analysis. The greedy algorithm is a tight $\frac{1}{2}$-approximation for submodular maximization in the *unconstrained* regime [buchbinder2012tight], which our result matches even though we operate in a constrained (albeit less general) space.

## 4 Experimental Results

We run experiments on the three conference datasets used in [kobren2019paper]: Medical Imaging and Deep Learning (MIDL), Conference on Computer Vision and Pattern Recognition (CVPR), and the 2018 iteration of CVPR (CVPR’18). A summary of the data statistics appears in Table 3. We note that for CVPR’18, while affinities are between 0 and 11, most are between 0 and 1. In addition, reviewer load bounds vary by reviewer but range between 2 and 9. Because of the size of the CVPR’18 dataset, we subsample 100 papers at each iteration of Algorithm 2 rather than testing every available paper, and report means and standard deviations over 5 runs. Implementations of GRRR, our greedy RRR algorithm, and of all baselines are available on GitHub (https://github.com/justinpayan/ReviewerAssignmentCode).

Table 3: Summary of dataset statistics.

Name | $n$ (papers) | $m$ (reviewers) | $k$ | val. range | rev. load
---|---|---|---|---|---
MIDL | 118 | 177 | 3 | | 4
CVPR | 2623 | 1373 | 3 | | 6
CVPR’18 | 5062 | 2840 | 3 | [0, 11] | 2 to 9

Table 4: USW, NSW, minimum paper score, and number of EF1 violations for each algorithm on all three datasets.

Dataset | Alg. | USW | NSW | Min Score | EF1 Viol.
---|---|---|---|---|---
MIDL | FairFlow | 1.67 | 1.62 | 0.94 | 0
MIDL | FairIR | 1.71 | 1.65 | 0.93 | 0
MIDL | TPMS | 1.71 | 1.65 | 0.90 | 0
MIDL | PR4A | 1.68 | 1.64 | 0.92 | 0
MIDL | GRRR | 1.68 | 1.63 | 0.83 | 0
CVPR | FairFlow | 1.67 | 1.56 | 0.77 | 8813
CVPR | FairIR | 2.05 | 1.84 | 0.27 | 35262
CVPR | TPMS | 2.08 | 0.00 (1.99) | 0.00 | 473545
CVPR | PR4A | 1.96 | 1.89 | 0.77 | 82
CVPR | GRRR | 1.82 | 0.00 (1.72) | 0.00 | 0
CVPR’18 | FairFlow | 17.94 | 17.38 | 11.26 | 37
CVPR’18 | FairIR | 22.18 | 21.57 | 7.19 | 18
CVPR’18 | TPMS | 22.23 | 21.11 | 1.37 | 134
CVPR’18 | PR4A | 21.48 | 21.12 | 12.68 | 2
CVPR’18 | GRRR | | | | 0

We compare against the FairIR and FairFlow algorithms [kobren2019paper], the Toronto Paper Matching System [charlin2013toronto], and PeerReview4All [stelmakh2019peerreview4all]. Following kobren2019paper and stelmakh2019peerreview4all, we only run one iteration of PeerReview4All (PR4A) on CVPR and CVPR’18. On those two conferences, PR4A maximizes the minimum paper score, but stops before maximizing the next smallest score. We run FairIR and FairFlow using the default configuration of the algorithms in the code from [kobren2019paper]. We also implemented the Constrained Round Robin (CRR) algorithm [aziz2019constrained], implementing the test for constraint satisfiability as a greedy heuristic followed by a Gurobi implementation of the fully specified linear program. We use as the criterion for this experiment. CRR is approximately 40 times slower than GRRR on MIDL, taking 400 seconds instead of 10. GRRR admits parallelism, and takes about 18 hours to run on CVPR. Extrapolating these results, we can expect CRR to require a month of computation time or longer on CVPR (it did not terminate in our experiments). Given its infeasible runtime, we did not continue to compare against CRR as a baseline. All experiments were run on a Xeon E5-2680v4 processor with 128 GB memory.

We report the USW, NSW, minimum paper score, and number of EF1 violations for each algorithm. When the lowest-scoring paper receives a bundle worth 0, the NSW is equal to 0. When this occurs, we denote the NSW as 0, but display the NSW on all positively scoring papers in parentheses. We report the USW normalized by the number of papers (equivalent to the mean paper score), and for NSW we report the geometric mean (as in the definition given in Section 2). Thus USW and NSW are directly comparable, representing the arithmetic and geometric mean paper scores respectively. For an allocation $A$, the number of EF1 violations is the number of pairs of papers failing EF1. Note that there are total potential violations.

The results on all three datasets are summarized in Table 4. First, we note that no algorithm has EF1 violations on MIDL, so EF1 is very easily achieved on this conference. The CVPR conferences have a more interesting profile of EF1 violations, but no algorithm besides ours achieves EF1 on all three conferences. Although GRRR obtains a minimum paper score of 0 on CVPR (and also a NSW of 0 as a result), only three papers receive a score of 0. This small problem may be rectified by increasing the assignment limits of a few reviewers or allowing a small number of EF1 violations for these three papers.

| Dataset | Alg. | Lowest 10% | Lowest 25% | Gini | Envy |
|---|---|---|---|---|---|
| MIDL | FairFlow | | | .143 | .944 |
| MIDL | FairIR | | | .147 | .501 |
| MIDL | PR4A | | | .127 | .448 |
| MIDL | GRRR | | | .145 | .834 |
| CVPR | FairFlow | | | .207 | 4.606e4 |
| CVPR | FairIR | | | .231 | 7.100e4 |
| CVPR | PR4A | | | .145 | 0.928e4 |
| CVPR | GRRR | | | .183 | 2.240e4 |
| CVPR’18 | FairFlow | | | .142 | 7771 |
| CVPR’18 | FairIR | | | .119 | 5547 |
| CVPR’18 | PR4A | | | .103 | 1367 |
| CVPR’18 | GRRR | | | .168 | 2.884e4 |

Table 5: Statistics estimating the amount of inequality for GRRR and all three baseline fair reviewer assignment algorithms. After running each algorithm, we compute the bottom 10% of papers by score and the bottom 25% by score, then report the mean and standard deviation for both low-percentile blocks. We also report the Gini coefficient of all paper scores and the sum of the envy across all paper pairs, where lower is better for both.

GRRR is the only algorithm to achieve the EF1 goal, but it is not clear whether its allocations are more globally fair than those of the other algorithms. To measure inequality more directly, we compute the mean and standard deviation of paper scores within the lowest 10% and lowest 25% of papers by score. We consider an allocation to be fairer if it gives higher scores to these disadvantaged papers. We also calculate the Gini coefficient for each allocation, a standard measure of inequality. A higher Gini coefficient indicates more inequality, so lower is better for this metric. Finally, we report the total envy summed over all pairs of papers. The results are summarized in Table 5. We report statistics for only one run of our subsampled GRRR on CVPR’18. Interestingly, we find that PR4A performs best across all datasets and fairness metrics. However, GRRR consistently outperforms FairFlow and is significantly better than FairIR on CVPR (which appears to be the most challenging dataset, based on the number of EF1 violations in Table 4).
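The inequality statistics above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code behind Table 5: the Gini coefficient uses the standard rank-based formula, the envy of paper i toward paper j is assumed to be max(0, v_i(A_j) − v_i(A_i)) with additive valuations, and the low-percentile block is assumed to contain the ⌈fraction · n⌉ lowest-scoring papers with a population standard deviation. All function names and the toy inputs are hypothetical.

```python
import math


def gini(scores):
    """Gini coefficient of non-negative scores (0 means perfect equality)."""
    x = sorted(scores)
    n, total = len(x), sum(x)
    if total == 0:
        return 0.0
    weighted = sum((rank + 1) * v for rank, v in enumerate(x))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def total_envy(affinity, bundles):
    """Sum over ordered pairs (i, j) of how much paper i prefers j's bundle.

    affinity[p][r] -- hypothetical affinity of paper p for reviewer r
    bundles[p]     -- list of reviewer indices assigned to paper p
    """
    n = len(bundles)

    def value(i, j):
        return sum(affinity[i][r] for r in bundles[j])

    return sum(
        max(0.0, value(i, j) - value(i, i))
        for i in range(n)
        for j in range(n)
        if i != j
    )


def low_percentile_stats(scores, fraction):
    """Mean and population standard deviation of the lowest-scoring papers."""
    x = sorted(scores)
    # Small tolerance guards against floating-point error in fraction * n.
    k = max(1, math.ceil(fraction * len(x) - 1e-9))
    block = x[:k]
    mean = sum(block) / k
    std = math.sqrt(sum((v - mean) ** 2 for v in block) / k)
    return mean, std


# Toy inputs (made up for illustration).
equal_gini = gini([1, 1, 1, 1])
skewed_gini = gini([0, 0, 0, 4])
envy = total_envy([[0.0, 0.9, 0.8], [0.1, 0.5, 0.5]], [[0], [1, 2]])
low_mean, low_std = low_percentile_stats([1, 2, 3, 4, 5, 6, 7, 8], 0.25)
```

Because envy is summed rather than averaged, its magnitude grows with the number of paper pairs, which is one reason the values for the CVPR datasets are orders of magnitude larger than those for MIDL.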

To give an indication of the practical guarantees of Theorem 3.2, we estimate and on all three datasets. For any order and any paper , we must have that . When , any positive will satisfy this inequality. We estimate by sampling orders and papers , and we take our estimate to be slightly greater than the maximum found for any and paper . For MIDL, we found that no sampled and violate monotonicity, so we set to be 0.01. Using our estimated values, we then estimate . Here, we sample and so that , and . We then note that we need for all samples. Similarly to our estimate, we compute for all samples and then estimate to be slightly greater than the maximum value. We found in all experiments that our chosen parameter led to all positive marginal gains during the estimation, improving our confidence in the estimates. The results are displayed in Table 6. The approximation guarantee is meaningful (about ) for MIDL, but not for the other two datasets.

## 5 Conclusion

The reviewer assignment problem imposes interesting constraints that complicate the standard fair allocation setting. In this work, we demonstrate that a greedy approach finds EF1 allocations with high USW in the reviewer assignment setting. We hope that our approach of optimizing over *orders* for round-robin allocations inspires more study of optimal round-robin allocations. We leave Conjecture 1 open, and providing a counterexample would further justify RRR. Finally, there are many applications of different fairness, efficiency, robustness, or non-manipulability constraints from the fair allocation literature to problems in peer review, which could bring some much-needed rigor to this fundamental process.

## Acknowledgments

We would like to thank Vignesh Viswanathan for helpful discussions. This work was performed in part using high performance computing equipment obtained under a grant from the Collaborative R&D Fund managed by the Massachusetts Technology Collaborative.