# Random Utility Theory for Social Choice

Random utility theory models an agent's preferences on alternatives by drawing a real-valued score on each alternative (typically independently) from a parameterized distribution, and then ranking the alternatives according to scores. A special case that has received significant attention is the Plackett-Luce model, for which fast inference methods for maximum likelihood estimators are available. This paper develops conditions on general random utility models that enable fast inference within a Bayesian framework through MC-EM, providing concave loglikelihood functions and bounded sets of global maxima solutions. Results on both real-world and simulated data provide support for the scalability of the approach and capability for model selection among general random utility models including Plackett-Luce.

## 1 Introduction

Problems of learning with rank-based error metrics [16] and the adoption of learning for the purpose of rank aggregation in social choice [8, 7, 30, 29, 25, 23] have been gaining prominence in recent years. In part, this is due to the explosion of socio-economic platforms where the opinions of users need to be aggregated, e.g., judges in crowd-sourcing contests, or rankings of movies and user-generated content.

In the problem of social choice, users submit ordinal preferences consisting of partial or total ranks on the alternatives and a single rank order must be selected to be representative of the reports. Since Condorcet [6], one approach to this problem is to formulate social choice as the problem of estimating a true underlying world state (e.g., a true quality ranking of alternatives), where the individual reports are viewed as noisy data in regard to the true state. In this way, social choice can be framed as a problem of inference.

In particular, Condorcet assumed the existence of a true ranking over alternatives, with a voter’s preference between any pair of alternatives generated to agree with the true ranking with probability $p > 1/2$ and to disagree otherwise. Condorcet proposed to choose as the outcome of social choice the ranking that maximizes the likelihood of observing the voters’ preferences. Later, Kemeny’s rule was shown to provide the maximum likelihood estimator (MLE) for this model [32].
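As a rough illustration of the Kemeny rule as an MLE, the following sketch enumerates all rankings and returns the one with the fewest pairwise disagreements with the votes. Under Condorcet's model with $p > 1/2$ the likelihood of a candidate ranking decreases with the number of disagreements, so minimizing disagreements maximizes the likelihood. The function name `kemeny_ranking` and the toy profile are hypothetical, and brute-force enumeration is only feasible for a handful of alternatives.

```python
from itertools import combinations, permutations


def kemeny_ranking(profile):
    """Brute-force Kemeny rule: the ranking with the fewest pairwise disagreements."""
    items = sorted(profile[0])

    def disagreements(candidate):
        pos = {c: r for r, c in enumerate(candidate)}
        total = 0
        for vote in profile:
            vote_pos = {c: r for r, c in enumerate(vote)}
            for a, b in combinations(items, 2):
                # Count the pair if candidate and vote order it differently.
                if (pos[a] < pos[b]) != (vote_pos[a] < vote_pos[b]):
                    total += 1
        return total

    return min(permutations(items), key=disagreements)


# Hypothetical profile: two voters agree, one swaps the top two alternatives.
votes = [("a", "b", "c"), ("a", "b", "c"), ("b", "a", "c")]
winner = kemeny_ranking(votes)
```

The enumeration visits all $m!$ rankings, which is consistent with the hardness result cited in the text: no efficient exact method is expected in general.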

But Condorcet’s probabilistic model assumes identical and independent distributions on pairwise comparisons. This ignores the strength of agents’ preferences (the same probability is adopted for all pairwise comparisons), and allows for cyclic preferences. In addition, computing the winner through the Kemeny rule is $\mathrm{P}^{\mathrm{NP}}_{||}$-complete [13].

To overcome the first criticism, a more recent literature adopts the random utility model (RUM) from economics [26]. Consider a set $C = \{c_1, \ldots, c_m\}$ of $m$ alternatives. In RUM, there is a ground truth utility (or score) associated with each alternative. These are real-valued parameters, denoted by $\vec\theta = (\theta_1, \ldots, \theta_m)$. Given this, an agent independently samples a random utility ($X_j$) for each alternative $c_j$ with conditional distribution $\mu_j(\cdot \mid \theta_j)$. Usually $\theta_j$ is the mean of $\mu_j(\cdot \mid \theta_j)$; the distribution $\mu_j(\cdot \mid \theta_j)$ might also be parameterized by other parameters, for example variance.

Let $\pi$ denote a permutation of $\{1, \ldots, m\}$, which naturally corresponds to a linear order: $[c_{\pi(1)} \succ c_{\pi(2)} \succ \cdots \succ c_{\pi(m)}]$. Slightly abusing notation, we also use $\pi$ to denote this linear order. Random utility generates a distribution on preference orders, as

$$\Pr(\pi \mid \vec\theta) = \Pr(X_{\pi(1)} > X_{\pi(2)} > \ldots > X_{\pi(m)}) \tag{1}$$

The generative process is illustrated in Figure 1.

Adopting RUMs rules out cyclic preferences, because each agent’s outcome corresponds to an order on real numbers. It also captures the strength of preference, and thus overcomes the second criticism, by assigning a different parameter ($\theta_j$) to each alternative.
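The generative process can be sketched in a few lines: each agent perturbs every ground-truth score with independent noise and reports the alternatives sorted by the resulting utilities. The sketch below assumes normal noise with unit standard deviation; the names `sample_ranking` and `normal_noise`, the scores in `theta`, and the profile size are all hypothetical choices for illustration.

```python
import random


def normal_noise(rng, sigma=1.0):
    """Zero-mean normal noise; any other zero-mean distribution could be swapped in."""
    return rng.gauss(0.0, sigma)


def sample_ranking(theta, noise, rng):
    """Draw one ranking: perturb each score, then sort alternatives by utility."""
    utilities = [t + noise(rng) for t in theta]
    # Alternatives are 0-based indices, listed from highest to lowest utility.
    return sorted(range(len(theta)), key=lambda j: -utilities[j])


rng = random.Random(0)
theta = [2.0, 1.0, 0.0]  # hypothetical ground-truth scores
profile = [sample_ranking(theta, normal_noise, rng) for _ in range(1000)]

# How often each alternative is ranked first across the sampled profile.
top_counts = [sum(1 for r in profile if r[0] == j) for j in range(len(theta))]
```

Because every ranking is obtained by sorting real numbers, no sampled preference can be cyclic, and alternatives with larger scores are ranked first more often.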

A popular RUM is Plackett-Luce (P-L) [18, 21], where the random utility terms are generated according to Gumbel distributions with fixed shape parameter [2, 31]. For P-L, the likelihood function has a simple analytical solution, making MLE inference tractable. P-L has been extensively applied in econometrics [19, 1], and more recently in machine learning and information retrieval (see [16] for an overview). Efficient methods of EM inference [14, 5], and more recently expectation propagation [12], have been developed for P-L and its variants.

In application to social choice, the P-L model has been used to analyze political elections [10]. The EM algorithm has also been used to learn the Mallows model, which is closely related to Condorcet’s probabilistic model [17].

Although P-L overcomes the two difficulties of the Condorcet-Kemeny approach, it is still quite restricted: it assumes that the random utility terms are distributed as Gumbel, with each alternative characterized by one parameter, which is the mean of its corresponding distribution. In fact, little is known about inference in RUMs beyond P-L. Specifically, we are not aware of either an analytical solution or an efficient algorithm for MLE inference for one of the most natural models proposed by Thurstone [26], where each $X_j$ is normally distributed.

### 1.1 Our Contributions

In this paper we focus on RUMs in which the random utilities are independently generated with respect to distributions in the exponential family (EF) [20]. This extends the P-L model, since the Gumbel distribution with fixed shape parameter belongs to the EF. Our main theoretical contributions are Theorem 1 and Theorem 2, which give conditions under which the log-likelihood function is concave and the set of global maxima solutions is bounded for the location family, that is, for RUMs where the shape of each distribution is fixed and the only latent variables are the locations, i.e., the means of the $\mu_j$’s. These results hold for existing special cases, such as the P-L model, and for many other RUMs, for example the ones where each $\mu_j$ is chosen from Normal, Gumbel, Laplace and Cauchy.

We also propose a novel application of MC-EM. We treat the random utilities ($\vec X$) as latent variables, and adopt the Expectation Maximization (EM) method to estimate the parameters $\vec\theta$. The E-step for this problem is not analytically tractable, and for this we adopt a Monte Carlo approximation. We establish through experiments that the Monte Carlo error in the E-step is controllable and does not affect inference, as long as numerical parameterizations are chosen carefully. In addition, for the E-step we suggest a parallelization over the agents and alternatives and a Rao-Blackwellized method, which further increases the scalability of our method.

We generally assume that the data provides total orders on alternatives from voters, but comment on how to extend the method and theory to the case where the input preferences are partial orders.

We evaluate our approach on synthetic data as well as two real-world datasets, a public election dataset and one involving rank preferences on sushi. The experimental results suggest that the approach is scalable despite providing significantly improved modeling flexibility over existing approaches.

For the two real-world datasets we have studied, we compare RUMs with normal distributions and P-L in terms of four criteria: log-likelihood, predictive log-likelihood, Akaike information criterion (AIC), and Bayesian information criterion (BIC). We observe that when the amount of data is not too small, RUMs with normal distributions fit better than P-L. Specifically, for the log-likelihood, predictive log-likelihood, and AIC criteria, RUMs with normal distributions outperform P-L with 95% confidence in both datasets.

## 2 RUMs and Exponential Families

In social choice, each agent $i \in \{1, \ldots, n\}$ has a strict preference order on alternatives. This provides the data for an inferential approach to social choice. In particular, let $L(C)$ denote the set of all linear orders on $C$. Then, a preference-profile, $D$, is a set of $n$ preference orders, one from each agent, so that $D \in L(C)^n$.

A voting rule $r$ is a mapping that assigns to each preference-profile a set of winning rankings, $r : L(C)^n \to (2^{L(C)} \setminus \emptyset)$. In particular, in the case of ties the set of winning rankings may include more than a singleton ranking. In the maximum likelihood (MLE) approach to social choice, the preference profile is viewed as data, $D = \{\pi^1, \ldots, \pi^n\}$.

Given this, the probability (likelihood) of the data given ground truth $\vec\theta$ (and for a particular $\vec\mu$) is $\Pr(D \mid \vec\theta) = \prod_{i=1}^{n} \Pr(\pi^i \mid \vec\theta)$, where

$$\Pr(\pi \mid \vec\theta) = \int_{x_{\pi(m)}=-\infty}^{\infty} \int_{x_{\pi(m-1)}=x_{\pi(m)}}^{\infty} \cdots \int_{x_{\pi(1)}=x_{\pi(2)}}^{\infty} \mu_{\pi(m)}(x_{\pi(m)}) \cdots \mu_{\pi(1)}(x_{\pi(1)}) \, dx_{\pi(1)} \, dx_{\pi(2)} \cdots dx_{\pi(m)} \tag{2}$$

The MLE approach to social choice selects as the winning ranking the one that corresponds to the $\vec\theta$ that maximizes $\Pr(D \mid \vec\theta)$. If multiple parameter vectors maximize the likelihood, then the MLE approach returns a set of rankings, one ranking corresponding to each parameterization.

In this paper, we focus on probabilistic models where each $\mu_j$ belongs to the exponential family (EF). The density function for each $\mu$ in EF has the following format:

$$\Pr(X = x) = \mu(x) = e^{\eta(\theta) T(x) - A(\theta) + B(x)} \tag{3}$$

where $\eta(\cdot)$ and $A(\cdot)$ are functions of $\theta$, $B(\cdot)$ is a function of $x$, and $T(x)$ denotes the sufficient statistics for $x$, which could be multidimensional.

###### Example (Plackett-Luce as an RUM [2])

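In the RUM, let the $\mu_j$’s be Gumbel distributions. That is, for alternative $j \in \{1, \ldots, m\}$ we have $\mu_j(x_j \mid \theta_j) = e^{-(x_j - \theta_j)} e^{-e^{-(x_j - \theta_j)}}$. Then, we have: $\Pr(\pi \mid \vec\lambda) = \Pr(x_{\pi(1)} > x_{\pi(2)} > \ldots > x_{\pi(m)}) = \prod_{j=1}^{m} \frac{\lambda_{\pi(j)}}{\sum_{j'=j}^{m} \lambda_{\pi(j')}}$, where $\eta(\theta_j) = \lambda_j = e^{\theta_j}$, $T(x_j) = -e^{-x_j}$, $B(x_j) = -x_j$ and $A(\theta_j) = -\theta_j$. This gives us the Plackett-Luce model.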
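The link between Gumbel noise and the Plackett-Luce product formula can be checked numerically. The sketch below evaluates the closed-form P-L probability with $\lambda_j = e^{\theta_j}$ and compares it with the frequency of the same ranking when utilities are drawn as $\theta_j$ plus standard Gumbel noise. The function names, the scores, and the number of trials are hypothetical choices for illustration.

```python
import math
import random
from itertools import permutations


def plackett_luce_prob(ranking, theta):
    """Closed-form Pr(ranking | theta) under P-L with lambda_j = exp(theta_j)."""
    lam = [math.exp(theta[j]) for j in ranking]
    prob = 1.0
    for pos in range(len(lam)):
        # The alternative at this position wins among those not yet ranked.
        prob *= lam[pos] / sum(lam[pos:])
    return prob


def sample_gumbel_ranking(theta, rng):
    """Rank alternatives by theta_j plus standard Gumbel noise."""
    utils = []
    for t in theta:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep logs finite
        utils.append(t - math.log(-math.log(u)))        # inverse-CDF Gumbel draw
    return tuple(sorted(range(len(theta)), key=lambda j: -utils[j]))


def empirical_frequency(ranking, theta, trials, rng):
    """Fraction of sampled Gumbel rankings equal to the given ranking."""
    hits = sum(1 for _ in range(trials) if sample_gumbel_ranking(theta, rng) == ranking)
    return hits / trials


theta = [1.0, 0.0, -1.0]  # hypothetical scores
exact = plackett_luce_prob((0, 1, 2), theta)
approx = empirical_frequency((0, 1, 2), theta, 20000, random.Random(0))
total = sum(plackett_luce_prob(p, theta) for p in permutations(range(3)))
```

With enough trials, `approx` should be close to `exact`, and the closed-form probabilities over all permutations sum to one.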

## 3 Global Optimality and Log-Concavity

In this section, we provide a condition on distributions that guarantees that the likelihood function (2) is log-concave in parameters $\vec\theta$. We also provide a condition under which the set of MLE solutions is bounded when any one latent parameter is fixed. Together, this guarantees the convergence of our MC-EM approach to a global mode with an accurate enough E-step. We focus on the location family, which is a subset of RUMs where the shapes of all $\mu_j$’s are fixed, and the only parameters are the means of the distributions. For the location family, we can write $X_j = \theta_j + \zeta_j$, where $X_j \sim \mu_j(\cdot \mid \theta_j)$ and $\zeta_j$ is a random variable whose mean is $0$ and models an agent’s subjective noise. The random variables $\zeta_j$ do not need to be identically distributed for all alternatives $j$; e.g., they can be normal with different fixed variances.

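We focus on computing solutions ($\vec\theta$) to maximize the log-likelihood function,

$$l(\vec\theta; D) = \sum_{i=1}^{n} \log \Pr(\pi^i \mid \vec\theta) \tag{4}$$

###### Theorem 1

For the location family, if for every $j \le m$ the probability density function for $\zeta_j$ is log-concave, then $l(\vec\theta; D)$ is concave.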

Proof sketch: The theorem is proved by applying the following lemma, which is Theorem 9 in [22].

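###### Lemma 1

Suppose $g_1(\vec\theta, \vec\zeta), \ldots, g_R(\vec\theta, \vec\zeta)$ are concave functions in $\mathbb{R}^{2m}$, where $\vec\theta$ is the vector of $m$ parameters and $\vec\zeta$ is a vector of $m$ real numbers that are generated according to a distribution whose pdf is logarithmic concave in $\mathbb{R}^m$. Then the following function is log-concave in $\mathbb{R}^m$.

$$L_i(\vec\theta, G) = \Pr\big(g_1(\vec\theta, \vec\zeta) \ge 0, \ldots, g_R(\vec\theta, \vec\zeta) \ge 0\big), \quad \vec\theta \in \mathbb{R}^m \tag{5}$$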

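To apply Lemma 1, we define a set $G^i$ of functions $g^i_r$ that is equivalent to an order $\pi^i$ in the sense of the inequalities implied by the RUM, so that the joint probability in (5) for $G^i$ is the same as the probability of $\pi^i$ in the RUM with parameters $\vec\theta$. Suppose $g^i_r(\vec\theta, \vec\zeta) = \theta_{\pi^i(r)} + \zeta_{\pi^i(r)} - \theta_{\pi^i(r+1)} - \zeta_{\pi^i(r+1)}$ for $r = 1, \ldots, m-1$.

Then, considering that the length of order $\pi^i$ is $R + 1$, we have:

$$L_i(\vec\theta, \pi^i) = L_i(\vec\theta, G^i) = \Pr\big(g^i_1(\vec\theta, \vec\zeta) \ge 0, \ldots, g^i_R(\vec\theta, \vec\zeta) \ge 0\big), \quad \vec\theta \in \mathbb{R}^m \tag{6}$$

This is because $g^i_r(\vec\theta, \vec\zeta) \ge 0$ is equivalent to the statement that in $\pi^i$ alternative $\pi^i(r)$ is preferred to alternative $\pi^i(r+1)$ in the RUM sense.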

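To see how this extends to the case where preferences are specified as partial orders, we consider in particular an interpretation where an agent’s report for the ranking of $m_i$ alternatives implies that all other alternatives are worse for the agent, in some undefined order. Given this, define $g^i_r(\vec\theta, \vec\zeta) = \theta_{\pi^i(r)} + \zeta_{\pi^i(r)} - \theta_{\pi^i(r+1)} - \zeta_{\pi^i(r+1)}$ for $r = 1, \ldots, m_i - 1$ and $g^i_r(\vec\theta, \vec\zeta) = \theta_{\pi^i(m_i)} + \zeta_{\pi^i(m_i)} - \theta_{\pi^i(r+1)} - \zeta_{\pi^i(r+1)}$ for $r = m_i, \ldots, m-1$. Considering that the $g^i_r(\cdot)$ are linear (hence, concave) and using log-concavity of the distributions of the $\zeta_j$’s, we can apply Lemma 1 and prove log-concavity of the likelihood function.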

It is not hard to verify that pdfs for normal and Gumbel are log-concave under reasonable conditions for their parameters, made explicit in the following corollary.

###### Corollary 1

For the location family where each $\zeta_j$ is a normal distribution with mean zero and with fixed variance, or a Gumbel distribution with mean zero and fixed shape parameter, $l(\vec\theta; D)$ is concave. Specifically, the log-likelihood function for P-L is concave.

The concavity of the log-likelihood of P-L has been proved [9] using a different technique. Using Fact 3.5 in [24], the set of global maxima solutions to the likelihood function, denoted by $S_D$, is convex since the likelihood function is log-concave. However, we also need $S_D$ to be bounded, and would further like it to provide one unique order as the estimate of the ground truth.
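Concavity of the P-L log-likelihood can be spot-checked numerically with a midpoint test: for a concave function $l$, the value at the midpoint of any two parameter vectors is at least the average of the values at the endpoints. The sketch below is a sanity check on random pairs, not a proof; the function names and the toy profile are hypothetical.

```python
import math
import random


def pl_log_likelihood(theta, profile):
    """Log-likelihood (4) for Plackett-Luce, with lambda_j = exp(theta_j)."""
    total = 0.0
    for ranking in profile:
        for pos in range(len(ranking) - 1):
            remaining = ranking[pos:]
            log_norm = math.log(sum(math.exp(theta[j]) for j in remaining))
            total += theta[ranking[pos]] - log_norm
    return total


def midpoint_gap(a, b, profile):
    """l((a+b)/2) - (l(a) + l(b))/2, which is non-negative for a concave l."""
    mid = [(x + y) / 2.0 for x, y in zip(a, b)]
    return pl_log_likelihood(mid, profile) - 0.5 * (
        pl_log_likelihood(a, profile) + pl_log_likelihood(b, profile)
    )


rng = random.Random(1)
profile = [(0, 1, 2), (1, 0, 2), (0, 2, 1), (2, 0, 1)]  # hypothetical data
gaps = []
for _ in range(200):
    a = [rng.uniform(-3, 3) for _ in range(3)]
    b = [rng.uniform(-3, 3) for _ in range(3)]
    gaps.append(midpoint_gap(a, b, profile))
smallest_gap = min(gaps)
```

A negative `smallest_gap` (beyond floating-point noise) would contradict concavity; none is expected here because each term is a linear function minus a log-sum-exp.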

For P-L, Ford, Jr. [9] proposed the following necessary and sufficient condition for the set of global maxima solutions to be bounded (more precisely, unique) when $\sum_{j=1}^{m} e^{\theta_j} = 1$.

###### Condition 1

Given the data $D$, in every partition of the alternatives $C$ into two nonempty subsets $C_1 \cup C_2 = C$, there exist $c_1 \in C_1$ and $c_2 \in C_2$ such that there is at least one ranking in $D$ where $c_1 \succ c_2$.

We next show that Condition 1 is also a necessary and sufficient condition for the set of global maxima solutions $S_D$ to be bounded in location families, when we set one of the values $\theta_j$ to be $0$ (w.l.o.g., let $\theta_1 = 0$). If we do not bound any parameter, then $S_D$ is unbounded, because for any $\vec\theta$, any $D$, and any number $s \in \mathbb{R}$, $l(\vec\theta; D) = l(\vec\theta + s; D)$.

###### Theorem 2

Suppose we fix $\theta_1 = 0$. Then, the set $S_D$ of global maxima solutions to $l(\vec\theta; D)$ is bounded if and only if the data $D$ satisfies Condition 1.

Proof sketch: If Condition 1 does not hold, then $S_D$ is unbounded because the parameters for all alternatives in $C_1$ can be increased simultaneously to improve the log-likelihood. For sufficiency, we first present the following lemma.

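###### Lemma 2

If alternative $j$ is preferred to alternative $j'$ in at least one ranking, then the difference of their mean parameters is bounded from above (there exists $Q$ such that $\theta_{j'} - \theta_j < Q$) for all the $\vec\theta$ that maximize the likelihood function.

###### Proof

Suppose that $c_j \succ c_{j'}$ in rank $i$. Then for any $\vec\theta$:

$$\begin{aligned} L_i(\vec\theta, \pi^i) = L_i(\vec\theta, G^i) &= \Pr\big(g^i_1(\vec\theta, \vec\zeta) \ge 0, \ldots, g^i_R(\vec\theta, \vec\zeta) \ge 0\big) \\ &\le \Pr\big(g^i_r(\vec\theta, \vec\zeta) \ge 0, g^i_{r+1}(\vec\theta, \vec\zeta) \ge 0, \ldots, g^i_{r'-1}(\vec\theta, \vec\zeta) \ge 0\big) \\ &\le \Pr(\zeta_j - \zeta_{j'} \ge \theta_{j'} - \theta_j), \end{aligned} \tag{7}$$

where $\pi^i(r) = j$ and $\pi^i(r') = j'$. The last inequality holds because summing the inequalities $g^i_r \ge 0, \ldots, g^i_{r'-1} \ge 0$ telescopes to $\theta_j + \zeta_j - \theta_{j'} - \zeta_{j'} \ge 0$.

Let $K = l(\vec 0; D)$. Since each term of the log-likelihood is at most $0$, it follows that for any $i \le n$ and any $\vec\theta \in S_D$, $\log L_i(\vec\theta, \pi^i) \ge l(\vec\theta; D) \ge K$.

Hence, $\Pr(\zeta_j - \zeta_{j'} \ge \theta_{j'} - \theta_j) \ge e^{K}$.

Therefore, there exists $Q$ such that $\theta_{j'} - \theta_j < Q$, where $Q$ depends on the fixed $K$ and on the distributions of $\zeta_j$ and $\zeta_{j'}$.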

Now consider a directed graph $G_D$, where the nodes are the alternatives, and there is an edge from $c_j$ to $c_{j'}$ if $c_j \succ c_{j'}$ in at least one ranking. By Condition 1, for any pair $j \ne j'$, there is a path from $c_j$ to $c_{j'}$ (and conversely, a path from $c_{j'}$ to $c_j$). To see this, consider building a path between $j$ and $j'$ by starting from a partition with $C_1 = \{j\}$ and following an edge from $j$ to $j_1$ in the graph, where $j_1$ is an alternative in $C_2$ for which there must be such an edge, by Condition 1. Consider the partition with $C_1 = \{j, j_1\}$, and repeat until an edge can be followed to vertex $j' \in C_2$. It follows from Lemma 2 that for any $\vec\theta \in S_D$ we have $|\theta_j - \theta_{j'}| < Qm$, with $Q$ the largest of the per-edge bounds, using the telescopic sum of bounded values of the difference of mean parameters along the edges of the path, since the length of the path is no more than $m$ (and tracing the path from $j$ to $j'$ and from $j'$ to $j$), meaning that $S_D$ is bounded.
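The graph argument above suggests a simple way to test the boundedness condition on a dataset: build the directed preference graph and check that it is strongly connected, i.e., that every alternative can reach and be reached from every other one. The sketch below assumes each ranking is a total order given as a tuple of 0-based alternative indices; the function names are hypothetical.

```python
def preference_graph(profile, m):
    """Edge j -> k whenever some ranking in the profile places j above k."""
    edges = {j: set() for j in range(m)}
    for ranking in profile:
        for pos, j in enumerate(ranking):
            edges[j].update(ranking[pos + 1:])
    return edges


def reachable(edges, start):
    """Set of nodes reachable from start by depth-first search."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in edges[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def satisfies_condition(profile, m):
    """True iff the preference graph is strongly connected."""
    forward = preference_graph(profile, m)
    backward = {j: {k for k in range(m) if j in forward[k]} for j in range(m)}
    everyone = set(range(m))
    # Strongly connected iff node 0 reaches all nodes and all nodes reach node 0.
    return reachable(forward, 0) == everyone and reachable(backward, 0) == everyone


unanimous = [(0, 1, 2), (0, 1, 2)]  # alternative 2 never beats anyone
mixed = [(0, 1, 2), (2, 1, 0)]      # every alternative beats every other once
```

For the unanimous profile the check fails, which matches the intuition that the likelihood keeps improving as the score of the top alternative grows without bound.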

Now that we have log-concavity and boundedness, we need to state conditions under which the bounded convex space of estimated parameters corresponds to a unique order. The next theorem provides a sufficient condition for all global maxima to correspond to the same order on alternatives. Suppose that we order the alternatives based on the estimated $\theta_j$’s (meaning that $c_j$ is ranked higher than $c_{j'}$ iff $\theta_j > \theta_{j'}$).

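###### Theorem 3

The order over parameters is strict and is the same across all $\vec\theta \in S_D$ if, for all $\vec\theta \in S_D$ and all alternatives $j \ne j'$, $\theta_j \ne \theta_{j'}$.

###### Proof

Suppose for the sake of contradiction that there exist two maxima, $\vec\theta, \vec\theta^* \in S_D$, and a pair of alternatives $j \ne j'$ such that $\theta_j > \theta_{j'}$ and $\theta^*_{j'} > \theta^*_j$. Then, there exists an $\alpha \in (0, 1)$ such that the $j$th and $j'$th components of $\alpha\vec\theta + (1 - \alpha)\vec\theta^*$ are equal. Since $S_D$ is convex, this vector is also in $S_D$, which contradicts the assumption.

Hence, if there is never a tie in the scores in any $\vec\theta \in S_D$, then any vector in $S_D$ will reveal the unique order.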

## 4 Monte Carlo EM for Parameter Estimation

In this section, we propose an MC-EM algorithm for MLE inference for RUMs where every $\mu_j$ belongs to the EF. (Our algorithm can be naturally extended to compute a maximum a posteriori probability (MAP) estimate, when we have a prior over the parameters $\vec\theta$. Still, it seems hard to motivate the imposition of a prior on parameters in many social choice domains.)

The EM algorithm determines the MLE parameters iteratively, and proceeds as follows. In each iteration $t + 1$, given parameters $\vec\theta^t$ from the previous iteration, the algorithm is composed of an E-step and an M-step. For the E-step, for any given $\vec\theta$, we compute the conditional expectation of the complete-data log-likelihood (latent variables $\vec x$ and data $D$), where the latent variables $\vec x$ are distributed according to data $D$ and parameters $\vec\theta^t$ from the last iteration.

For the M-step, we optimize $\vec\theta$ to maximize the expected log-likelihood computed in the E-step, and use it as the input $\vec\theta^{t+1}$ for the next iteration:

$$\text{E-step:} \quad Q(\vec\theta, \vec\theta^t) = E_{\vec X}\Big\{ \log \prod_{i=1}^{n} \Pr(\vec x^i, \pi^i \mid \vec\theta) \;\Big|\; D, \vec\theta^t \Big\}$$

$$\text{M-step:} \quad \vec\theta^{t+1} \in \arg\max_{\vec\theta} Q(\vec\theta, \vec\theta^t)$$

### 4.1 Monte Carlo E-step by Gibbs sampler

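The E-step can be simplified using (3) as follows:

$$\begin{aligned} E_{\vec X}\Big\{ \log \prod_{i=1}^{n} \Pr(\vec x^i, \pi^i \mid \vec\theta) \;\Big|\; D, \vec\theta^t \Big\} &= E_{\vec X}\Big\{ \log \prod_{i=1}^{n} \Pr(\vec x^i \mid \vec\theta) \Pr(\pi^i \mid \vec x^i) \;\Big|\; D, \vec\theta^t \Big\} \\ &= \sum_{i=1}^{n} \sum_{j=1}^{m} E_{X^i_j}\big\{ \log \mu_j(x^i_j \mid \theta_j) \mid \pi^i, \vec\theta^t \big\} \\ &= \sum_{i=1}^{n} \sum_{j=1}^{m} \Big( \eta(\theta_j) E_{X^i_j}\big\{ T(x^i_j) \mid \pi^i, \vec\theta^t \big\} - A(\theta_j) \Big) + W, \end{aligned}$$

where $W$ collects the terms $E_{X^i_j}\{B(x^i_j) \mid \pi^i, \vec\theta^t\}$, which depend only on $\vec\theta^t$ and $D$ (not on $\vec\theta$), which means that $W$ can be treated as a constant in the M-step. Hence, in the E-step we only need to compute $S^{i,t+1}_j = E_{X^i_j}\{T(x^i_j) \mid \pi^i, \vec\theta^t\}$, where $T(x^i_j)$ is the sufficient statistic for the parameter $\theta_j$ in the model. We are not aware of an analytical solution for $E_{X^i_j}\{T(x^i_j) \mid \pi^i, \vec\theta^t\}$. However, we can use a Monte Carlo approximation, which involves sampling $\vec x^i$ from the distribution $\Pr(\vec x^i \mid \pi^i, \vec\theta^t)$ using a Gibbs sampler, and then approximates $S^{i,t+1}_j$ by $\frac{1}{N} \sum_{k=1}^{N} T(x^{i,k}_j)$, where $N$ is the number of samples in the Gibbs sampler.

In each step of our Gibbs sampler for voter $i$, we randomly choose a position $j$ in $\pi^i$ and sample $x^i_{\pi^i(j)}$ according to a TruncatedEF distribution $\Pr(\cdot \mid x_{\pi^i(-j)}, \vec\theta^t, \pi^i)$, where $x_{\pi^i(-j)} = (x_{\pi^i(1)}, \ldots, x_{\pi^i(j-1)}, x_{\pi^i(j+1)}, \ldots, x_{\pi^i(m)})$. The TruncatedEF is obtained by truncating the tails of $\mu_{\pi^i(j)}(\cdot \mid \theta^t_{\pi^i(j)})$ at $x_{\pi^i(j-1)}$ and $x_{\pi^i(j+1)}$, respectively. For example, a truncated normal distribution is illustrated in Figure 2.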
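A minimal sketch of this Gibbs step for the normal case is given below, assuming a common fixed standard deviation `sigma` and the sufficient statistic $T(x) = x$. It draws from the truncated normal by inverse-CDF sampling with the standard-library `statistics.NormalDist`; the function names, the burn-in length, and the absence of thinning and Rao-Blackwellization are simplifying assumptions, not the paper's implementation.

```python
import random
from statistics import NormalDist


def truncated_normal(mean, sigma, low, high, rng):
    """Inverse-CDF draw from N(mean, sigma^2) restricted to [low, high]."""
    dist = NormalDist(mean, sigma)
    p_low, p_high = dist.cdf(low), dist.cdf(high)
    u = p_low + (p_high - p_low) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf's argument inside (0, 1)
    # Clamp to guard against rounding pushing the draw outside the interval.
    return min(max(dist.inv_cdf(u), low), high)


def gibbs_e_step(ranking, theta, sigma, n_samples, burn_in, rng):
    """Estimate S_j = E[X_j | ranking, theta] for a normal RUM by Gibbs sampling."""
    m = len(ranking)
    x = [0.0] * m
    for pos, j in enumerate(ranking):
        x[j] = float(m - pos)  # initial utilities consistent with the ranking
    sums = [0.0] * m
    for step in range(burn_in + n_samples):
        pos = rng.randrange(m)  # random position in the ranking
        j = ranking[pos]
        # Neighbours in the ranking bound the utility of alternative j.
        high = x[ranking[pos - 1]] if pos > 0 else float("inf")
        low = x[ranking[pos + 1]] if pos < m - 1 else float("-inf")
        x[j] = truncated_normal(theta[j], sigma, low, high, rng)
        if step >= burn_in:
            for k in range(m):
                sums[k] += x[k]
    return [s / n_samples for s in sums]


rng = random.Random(0)
S = gibbs_e_step((2, 0, 1), [0.0, 0.0, 0.0], 1.0, 20000, 1000, rng)
```

With all scores equal, the estimates should approach the expected order statistics of three standard normals (about 0.85, 0 and -0.85 for the first, second and third ranked alternatives), which gives a quick sanity check on the sampler.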

### 4.2 M-step

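In the E-step we have (approximately) computed $S^{i,t+1}_j$. In the M-step we compute $\vec\theta^{t+1}$ to maximize $\sum_{i=1}^{n} \sum_{j=1}^{m} \big( \eta(\theta_j) S^{i,t+1}_j - A(\theta_j) \big) + W$. Equivalently, we compute $\theta^{t+1}_j$ for each $j \le m$ separately to maximize $\sum_{i=1}^{n} \big\{ \eta(\theta_j) S^{i,t+1}_j - A(\theta_j) \big\} = \eta(\theta_j) \sum_{i=1}^{n} S^{i,t+1}_j - n A(\theta_j)$.

For the case of the normal distribution with fixed variance $\sigma^2$, where $\eta(\theta_j) = \theta_j / \sigma^2$ and $A(\theta_j) = \theta_j^2 / (2\sigma^2)$, we have $\theta^{t+1}_j = \frac{1}{n} \sum_{i=1}^{n} S^{i,t+1}_j$. The algorithm is illustrated in Figure 3.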
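The closed-form M-step for the normal case can be derived in a few lines. The derivation below assumes a fixed variance $\sigma^2$, the sufficient statistic $T(x) = x$, and the exponential-family terms $\eta(\theta_j) = \theta_j / \sigma^2$ and $A(\theta_j) = \theta_j^2 / (2\sigma^2)$; the symbol $Q_j$ for the per-alternative objective is introduced here only for illustration, and $S^{i,t+1}_j$ denotes the E-step estimate of the expected utility of alternative $j$ for agent $i$.

```latex
\begin{aligned}
Q_j(\theta_j) &= \eta(\theta_j) \sum_{i=1}^{n} S^{i,t+1}_j - n A(\theta_j)
  = \frac{\theta_j}{\sigma^2} \sum_{i=1}^{n} S^{i,t+1}_j - \frac{n\,\theta_j^2}{2\sigma^2}, \\
\frac{dQ_j}{d\theta_j} &= \frac{1}{\sigma^2} \sum_{i=1}^{n} S^{i,t+1}_j - \frac{n\,\theta_j}{\sigma^2} = 0
  \;\Longrightarrow\; \theta_j^{t+1} = \frac{1}{n} \sum_{i=1}^{n} S^{i,t+1}_j, \\
\frac{d^2 Q_j}{d\theta_j^2} &= -\frac{n}{\sigma^2} < 0 .
\end{aligned}
```

The negative second derivative confirms that the stationary point is the unique maximizer, so in this case the M-step reduces to averaging the E-step estimates over agents.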

### 4.3 Convergence

In the last section we showed that if the RUM satisfies the premise of Theorem 1 and the data satisfies Condition 1 (as required by Theorem 2), then the log-likelihood function is concave, and the set of global maxima solutions is bounded. This guarantees the convergence of MC-EM for an exact E-step.

In general, MC-EM methods do not have the uniform convergence property of EM methods. In order to control the error of approximation in the MC-E step we can increase the number of samples with the iterations [28]. However, in our application, we are not concerned with the exact estimation of $\vec\theta$, as we are only interested in the order of its components relative to each other. Therefore, as long as the approximation error remains relatively small, such that the differences between the $\theta_j$’s are much larger than the error, we are safe to stop.

A known problem with Gibbs sampling is that it can introduce correlation among samples. To address this, we sub-sample the samples to reduce the correlation, and call the ratio of sub-sampling the thinning factor ($F$). A suitable thinning ratio can be set using empirical results from the sampler.

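With an approach similar to [3], we can derive a relationship between the variance of the error in $\theta^{t+1}_j$ and the Monte Carlo error in the E-step approximation:

$$\mathrm{Var}(\theta^{t+1}_j) = \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{Var}(S^{i,t+1}_j) = \frac{1}{MNn^2} \sum_{i=1}^{n} \mathrm{Var}(x^i_j) \le \frac{FV}{MNn}, \tag{8}$$

where $N$ is the number of samples in the Gibbs sampler, $M$ is the number of samples for Rao-Blackwellization, $n$ is the number of agents, $F$ is the thinning factor, $V$ is an upper bound on $\mathrm{Var}(x^i_j)$, and samples are assumed to be independent. For fixed $F$, $V$, and $n$, we can make $\mathrm{Var}(\theta^{t+1}_j)$ arbitrarily small by increasing $M$ and $N$.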

## 5 Experimental Results

We evaluate the proposed MC-EM algorithm on synthetic data as well as two real world data sets, namely an election data set and a dataset representing preference orders on sushi. For simulated data we use the Kendall correlation [11] between two rank orders (typically between the true order and the method’s result) as a measure of performance.
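For reference, a standard Kendall rank correlation between two total orders can be computed by counting concordant and discordant pairs, as in the sketch below. This assumes both orders rank the same items without ties; it is the textbook statistic, and may differ in detail from the variant for vague preferences discussed in [11]. The function name is hypothetical.

```python
from itertools import combinations


def kendall_tau(order_a, order_b):
    """Kendall rank correlation between two total orders over the same items."""
    pos_a = {item: r for r, item in enumerate(order_a)}
    pos_b = {item: r for r, item in enumerate(order_b)}
    concordant = discordant = 0
    for x, y in combinations(order_a, 2):
        # A pair is concordant when both orders rank x and y the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


same = kendall_tau([0, 1, 2, 3], [0, 1, 2, 3])
reverse = kendall_tau([0, 1, 2, 3], [3, 2, 1, 0])
one_swap = kendall_tau([0, 1, 2], [1, 0, 2])
```

The value ranges from -1 (reversed orders) to 1 (identical orders), so a score near 1 between the true order and the estimated order indicates good recovery.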

### 5.1 Experiments for Synthetic Data

We first generate data from Normal models for the random utility terms, with means $\theta_j$ and equal variance for all terms, for several choices of the variance. We evaluate the performance of the method as the number of agents $n$ varies. The results show that a limited number of iterations in the EM algorithm (at most 3), and samples ($M = 5$, $N = 800$) are sufficient for inferring the order in most cases. The performance in terms of Kendall correlation for recovering ground truth improves for larger numbers of agents, which corresponds to more data. See Figure 4, which shows the asymptotic behavior of the maximum likelihood estimator in recovering the true parameters. The left and middle panels of Figure 4 show that the larger the dataset, the better the performance of the method.

Moreover, for large variances in data generation, due to increasing noise in the data, performance improves more slowly than it does for smaller variances. Notice that the scales on the y-axis are different in the left and middle panels.

### 5.2 Experiments for Model Robustness

We apply our method to a public election dataset collected by Nicolaus Tideman [27], where the voters provided partial orders on candidates. A partial order includes comparisons among a subset of alternatives, and the non-mentioned alternatives in the partial order are considered to be ranked lower than the lowest ranked alternative among the mentioned alternatives.

The dataset contains $n = 280$ votes over $m$ alternatives. For the purpose of our experiments, we adopt the order on alternatives obtained by applying our method on the entire dataset as an assumed ground truth, since no ground truth is given as part of the data. After finding the ground truth by using all 280 votes (and adopting a normal model), we compare the performance of our approach as we vary the amount of data available. We evaluate the performance for sub-samples of varying size, randomly chosen from the full dataset. For each sub-sample size, the experiment is repeated over multiple random draws and we report the average performance and the variance. See the right panel in Figure 4. This experiment shows the robustness of the method, in the sense that the result of inference on a subset of the dataset is consistent with the result on the full dataset. For example, the ranking obtained by using half of the data can still achieve a fair estimate of the results with full data, with an average Kendall correlation of greater than 0.4.

### 5.3 Experiments for Model Fitness

In addition to a public election dataset, we have tested our algorithm on a sushi dataset, where 5000 users give rankings over 10 different kinds of sushi [15]. For each experiment we randomly choose $n$ rankings and apply our MC-EM for RUMs with normal distributions where variances are also parameters.

In the former experiments, both the synthetic data generation and the model for election data, the variances were held fixed, and hence we had the theoretical guarantees for convergence to globally optimal solutions by Theorem 1 and Theorem 2. When we let the variances be part of the parameterization we lose the theoretical guarantees. However, the EM algorithm can still be applied, and since the variances are now parameters (rather than being fixed), the model fits better in terms of log-likelihood.

For this reason, we adopt RUMs with normal distributions in which the variance is a parameter that is fit by EM along with the mean. We call this model a normal model. We compute the difference between the normal model and P-L in terms of four criteria: log-likelihood (LL), predictive log-likelihood (predictive LL), AIC, and BIC. For (predictive) log-likelihood, a positive value means that the normal model fits better than P-L, whereas for AIC and BIC, a negative number means that the normal model fits better than P-L. Predictive likelihood is different from likelihood in the sense that we compute the likelihood of the estimated parameters for a part of the data that is not used for parameter estimation. (The use of predictive likelihood allows us to evaluate the performance of the estimated parameters on the rest of the data, and is similar in this sense to cross validation for supervised learning.)

In particular, we compute predictive likelihood for a randomly chosen subset of votes. The results and standard deviations are summarized in Table 1.
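The sign conventions for the four criteria can be made concrete with the standard definitions $\mathrm{AIC} = 2k - 2\,\mathrm{LL}$ and $\mathrm{BIC} = k \ln n - 2\,\mathrm{LL}$, where $k$ is the number of free parameters and $n$ the number of rankings. The sketch below computes normal-minus-P-L differences; the log-likelihood values and parameter counts in the example are made up purely for illustration and are not results from the paper.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion: lower is better."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion: lower is better."""
    return k * math.log(n) - 2 * log_lik


def compare(ll_normal, k_normal, ll_pl, k_pl, n):
    """Differences (normal model minus P-L) for LL, AIC and BIC."""
    return {
        "LL": ll_normal - ll_pl,
        "AIC": aic(ll_normal, k_normal) - aic(ll_pl, k_pl),
        "BIC": bic(ll_normal, k_normal, n) - bic(ll_pl, k_pl, n),
    }


# Hypothetical numbers: the normal model has more parameters but a higher LL.
diff = compare(ll_normal=-100.0, k_normal=19, ll_pl=-120.0, k_pl=9, n=50)
```

A positive `LL` difference together with negative `AIC` and `BIC` differences would indicate that the extra parameters of the normal model are justified by the improvement in fit; BIC penalizes the extra parameters more heavily as $n$ grows.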

When $n$ is small, the variance is high and we are unable to obtain statistically significant results in comparing fitness. When $n$ is not too small, RUMs with normal distributions fit better than P-L. Specifically, for log-likelihood, predictive log-likelihood, and AIC, RUMs with normal distributions outperform P-L with 95% confidence in both datasets.

### 5.4 Implementation and Run Time

The running time for our MC-EM algorithm scales linearly with the number of agents on real-world data (Election Data), with a slope of 13.3 seconds per agent on an Intel PC. This is for 100 iterations of the EM algorithm, with the number of Gibbs samples increasing with the iteration count.

## Acknowledgments

This work is supported in part by NSF Grant No. CCF-0915016. Lirong Xia is supported by NSF under Grant #1136996 to the Computing Research Association for the CIFellows Project. We thank Craig Boutilier, Jonathan Huang, Tyler Lu, Nicolaus Tideman, Paolo Viappiani, and anonymous NIPS-12 reviewers for helpful comments and suggestions, or help on the datasets.

## References

• [1] Steven Berry, James Levinsohn, and Ariel Pakes. Automobile prices in market equilibrium. Econometrica, 63(4):841–890, 1995.
• [2] Henry David Block and Jacob Marschak. Random orderings and stochastic theories of responses. In Contributions to Probability and Statistics, pages 97–132, 1960.
• [3] James G. Booth and James P. Hobert. Maximizing Generalized Linear Mixed Model Likelihoods with an Automated Monte Carlo EM Algorithm.
• [4] Steve Brooks, Andrew Gelman, Galin Jones, and Xiao-Li Meng, editors. Handbook of Markov Chain Monte Carlo. Chapman and Hall/CRC, 2011.
• [5] Francois Caron and Arnaud Doucet. Efficient Bayesian Inference for Generalized Bradley-Terry Models. Journal of Computational and Graphical Statistics, 21(1):174–196, 2012.
• [6] Marquis de Condorcet. Essai sur l’application de l’analyse à la probabilité des décisions rendues à la pluralité des voix. Paris: L’Imprimerie Royale, 1785.
• [7] Vincent Conitzer, Matthew Rognlie, and Lirong Xia. Preference functions that score rankings and maximum likelihood estimation. In Proc. IJCAI, pages 109–115, 2009.
• [8] Vincent Conitzer and Tuomas Sandholm. Common voting rules as maximum likelihood estimators. In Proc. UAI, pages 145–152, 2005.
• [9] Lester R. Ford, Jr. Solution of a ranking problem from binary comparisons. The American Mathematical Monthly, 64(8):28–33, 1957.
• [10] Isobel Claire Gormley and Thomas Brendan Murphy. A grade of membership model for rank data. Bayesian Analysis, 4(2):265–296, 2009.
• [11] Przemyslaw Grzegorzewski. Kendall’s correlation coefficient for vague preferences. Soft Computing, 13(11):1055–1061, 2009.
• [12] John Guiver and Edward Snelson. Bayesian inference for Plackett-Luce ranking models. In Proc. ICML, pages 377–384, 2009.
• [13] Edith Hemaspaandra, Holger Spakowski, and Jörg Vogel. The complexity of Kemeny elections. Theoretical Computer Science, 349(3):382–391, December 2005.
• [14] David R. Hunter. MM algorithms for generalized Bradley-Terry models. In The Annals of Statistics, volume 32, pages 384–406, 2004.
• [15] Toshihiro Kamishima. Nantonac collaborative filtering: Recommendation based on order responses. In Proc. KDD, pages 583–588, 2003.
• [16] Tie-Yan Liu. Learning to Rank for Information Retrieval. Springer, 2011.
• [17] Tyler Lu and Craig Boutilier. Learning mallows models with pairwise preferences. In Proc. ICML, pages 145–152, 2011.
• [18] R. Duncan Luce. Individual Choice Behavior: A Theoretical Analysis. Wiley, 1959.
• [19] Daniel McFadden. Conditional logit analysis of qualitative choice behavior. In Frontiers of Econometrics, pages 105–142, New York, NY, 1974. Academic Press.
• [20] Carl N. Morris. Natural Exponential Families with Quadratic Variance Functions. Annals of Statistics, 10(1):65–80, 1982.
• [21] R. L. Plackett. The analysis of permutations. JRSS. Series C, 24(2):193–202, 1975.
• [22] András Prékopa. Logarithmic concave measures and related topics. In Stochastic Programming, pages 63–82. Academic Press, 1980.
• [23] Ariel D. Procaccia, Sashank J. Reddi, and Nisarg Shah. A maximum likelihood approach for selecting sets of alternatives. In Proc. UAI, 2012.
• [24] Frank Proschan and Yung L. Tong. Chapter 29. Log-concavity property of probability measures. FSU technical report Number M-805, pages 57–68, 1989.
• [25] Magnus Roos, Jörg Rothe, and Björn Scheuermann. How to calibrate the scores of biased reviewers by quadratic programming. In Proc. AAAI, pages 255–260, 2011.
• [26] Louis Leon Thurstone. A law of comparative judgement. Psychological Review, 34(4):273–286, 1927.
• [27] Nicolaus Tideman. Collective Decisions and Voting: The Potential for Public Choice. Ashgate Publishing, 2006.
• [28] Greg C. G. Wei and Martin A. Tanner. A Monte Carlo Implementation of the EM Algorithm and the Poor Man’s Data Augmentation Algorithms. JASA, 85(411):699–704, 1990.
• [29] Lirong Xia and Vincent Conitzer. A maximum likelihood approach towards aggregating partial orders. In Proc. IJCAI, pages 446–451, 2011.
• [30] Lirong Xia, Vincent Conitzer, and Jérôme Lang. Aggregating preferences in multi-issue domains by using maximum likelihood estimators. In Proc. AAMAS, pages 399–406, 2010.
• [31] John I. Yellott, Jr. The relationship between Luce’s Choice Axiom, Thurstone’s Theory of Comparative Judgment, and the double exponential distribution. J. of Mathematical Psychology, 15(2):109–144, 1977.
• [32] H. Peyton Young. Optimal voting rules. Journal of Economic Perspectives, 9(1):51–64, 1995.