has long been used for evaluating paired comparisons, such as games between pairs of teams in which one team or the other wins each game. The model assigns a strength parameter to each team, and the odds ratio associated with the probability of a team winning a game is equal to the ratio of the strengths. These strength parameters can be estimated based on the full set of game results and used to rank teams or make future predictions. The model has been extended by Davidson to contests in which a tie or drawn contest is a possible outcome. For many years, such ties were a common occurrence in the sport of ice hockey, but recently tie-breaking methods such as an overtime period played under different rules and/or a shootout in which the teams alternate penalty shots are used to determine a winner. Results in overtime or shootouts can be evaluated differently from wins in regulation play. For instance, since 2006 competitions organized by International Ice Hockey Federation (IIHF) have awarded three points in the standings to a team winning in regulation, two points for a win in overtime or a shootout, one point for a loss in overtime or a shootout, and no points for a loss in regulation, and many leagues have followed suit. Compared to the prior system which awarded two points for a win, one for a tie, and none for a loss, which effectively treated a tie as half a win and half a loss, the four-outcome system treats an overtime/shootout win as of a win and of a loss.111We do not consider non-zero-sum point systems such as that used in association football (soccer) which awards three points for a win and one for a draw, so that drawn matches are only worth two points total rather than three. Likewise, the National Hockey League awards all wins two points and overtime/shootout losses one point; this 2-2-1-0 system awards three total points for games which go into overtime, but only two for games decided in regulation. One possible approach to either of these situations (games with three or four outcomes) is to use standard Bradley-Terry and assign fractional wins as appropriate to the point system (see, e.g., ). However, this is unsatisfying, as it provides no way to assign a probability for a future game to end in a tie or overtime or shootout result. In this paper, we instead consider a generalization of the tie model of  which associates one strength parameter to each team, along with a single parameter describing the tendency for games to go into overtime.
The rest of this paper is organized as follows: In Section 2, we describe the three models (standard Bradley-Terry, Bradley-Terry-Davidson including ties, and a new model with four possible game outcomes including overtime/shootout wins and losses), and exhibit a generalization of the relevant formulas which describes all three cases. In Section 3
we describe methods for inferring the relevant parameters of these models given a set of game results: maximum likelihood estimation, and Bayesian inference using either a Gaussian approximation or Hamiltonian Monte Carlo. In Section4 we demonstrate these methods using a recent set of game results: the 2020-2021 Eastern College Athletic Conference (ECAC) season. This season used the standard IIHF system with 3-2-1-0 points assigned for regulation wins, overtime/shootout wins, overtime/shootout losses, and regulation losses, respectively. For the purposes of illustration, we evaluate the ECAC results with the four-outcome model as well as with the other two models, treating in one case all wins the same, and in the other all overtime/shootout results as ties.
so that the probability of a set of game outcomes in which team plays team times and wins of those games is222The first form explicitly includes each pair of teams only once, while the second corrects for the double-counting, taking advantage of the fact that . If the order of the games between pairs of teams is ignored, the sampling distribution for the is instead .
where is the number of teams, and .
Davidson  proposed an extension for competitions which include the probabilities of ties, in which the probabilities of the three possible outcomes of a game are
where is an additional parameter which describes how likely ties are to occur. (The probability of a tie in a game between evenly matched teams is .) Evidently, , and . The probability of a given set of game outcomes in which the games between teams and result in wins, ties and losses for team (where and ) is
We propose an extension appropriate for a system in which a win in
overtime or a shootout is treated as of a win and of a loss,
Writing the four possible game outcomes as RW for regulation win, OW
for overtime/shootout win, OL for overtime/shootout loss,
and RL for regulation loss, the modelled probability
of each outcome would be333The exponents are chosen to
correspond to the share of the points ( and ,
respectively) awarded for an overtime/shootout win or loss. This
has the desirable feature that the maximum likelihood equation
(3.7) becomes (after multiplying by )
The probability for a set of game outcomes will then be
If we write and , we can describe all three models as special cases of a general model in which the probability of a game between teams and ending in outcome is
is a vector equivalent of the logistic function known as the softmax function. The probability for a set of game outcomes is
For the standard Bradley-Terry model, , , and .
For the Bradley-Terry-Davidson model with ties, , , , , and .
For the model introduced in this paper, , , , , , and .
All of these models satisfy , and have “opposite” outcomes and such that , , and . They also satisfy and . We confine ourselves below to cases where these properties hold.
3 Inference of Parameters
3.1 Maximum Likelihood
Maximum likelihood estimates (MLEs) of Bradley-Terry strength parameters [18, 9, 7] provide a straightforward way of associating a “rating” to each team based on their game results, and have been proposed as a replacement for less reliable ways of evaluating a team’s game results in light of the difficulty of their schedule.
We can consider the probability as a likelihood function of the parameters and , with log-likelihood
We can use the identity
to show that
which means that
Using these, we can write the maximum likelihood equations as
can be interpreted in the models considered as the number of games which are tied or go to overtime, respectively, and
can be seen as the total number of “points” for team . The maximum likelihood equation set each of these quantities equal to their expectation values.
We can solve the maximum likelihood equations by a generalization of the iterative method in . writing them
(where we have used the fact that the only non-zero term in the numerator has ) and
As in the standard Bradley-Terry model, the overall multiplicative scale of is undefined (because can be written so that the team strengths appear only in the combination ), so it is necessary to rescale the team strengths at each iteration to preserve a property such as . Beyond that, there are conditions for the maximum likelihood estimates to be finite and well-defined, which are explored in e.g., [2, 14, 6].
3.2 Bayesian Approach
It is useful to move beyond maximum likelihood estimates, both to quantify uncertainty in the model parameters, and to make predictions about the outcome of future games. (For instance,  proposed simulating future games with probabilites drawn from a posterior distribution capturing the uncertainty in the strength parameters, rather than fixed probabilties generated from the MLEs of those parameters.)
A convenient framework for parameter estimates including uncertainties is Bayesian inference, which defines the posterior probability density for the parametersand , or equivalently and , given a set of game results and prior assumptions , as
A variety of choices can be made for the multivariate prior distribution on  in the Bradley-Terry model, and likewise for the tie/overtime parameter . For simplicity, we work in this paper with the improper Haldane prior444So named because the marginal prior distribution for probabilities such as will follow the Haldane prior [10, 11], which is the limit of a distribution as .
which means that the posterior distribution is proportional to the likelihood:
With this choice of prior, the posterior probability density will be independent of the combination , but otherwise will be normalizable under the same circumstances that lead to well-defined maximum likelihood estimates for the parameters.
3.2.1 Gaussian Approximation
One convenient approach is to Taylor expand the log-posterior
about the maximum a posteriori
solution (which in this case is the maximum likelihood solution
).555 Note that this method does
not assign special significance to the MAP estimates, but uses them
as the starting point for a convenient approximation to the
posterior probability distribution.
Note that this method does not assign special significance to the MAP estimates, but uses them as the starting point for a convenient approximation to the posterior probability distribution.Truncating the expansion at second order gives a Gaussian approximation
where is the Hessian matrix
with elements666Note the similarity to the Fisher information matrix , which differs from the Hessian in that that depends on the observed data, while is a function defined on parameter space.
To compute the elements of the Hessian matrix, we return to the first derivative (3.4) and differentiate them to get
is the probability of a tie or overtime game, depending on the model, and we have used the fact that since . Similarly, using the properties , , , and , we find
and, finally, differentiating (3.5) gives us
so that the Hessian matrix has components
Note that in the case of the Bradley-Terry model, where the only outcomes are win and loss, the condition simplifies the Hessian to (since the parameter is not actually part of the likelihood), and
which is the form seen in, e.g., .
The Hessian matrix in (3.22) is singular, since and , which ultimately arise from the fact that the probabilities , and thus the likelihood, are unchanged by adding the same constant to all the
. This can be handled computationally by constructing a variance-covariance matrixwhich is the Moore-Penrose pseudo-inverse777
For a real symmetric matrix with a complete eigenvalue decomposition, this operation replaces each non-zero eigenvalue with its reciprocal while leaving zero eigenvalues unchanged.of the Hessian matrix, and approximating the posterior as a multivariate Gaussian with a mean of and a variance-covariance matrix . This has the effect of enforcing the constraint
on the combination of the parameters which has no influence on the model.
This Gaussian approximation can be used to produce analytic estimates of quantities of interest, or used for Monte Carlo sampling, as illustrated in Section 4. It can also be used as a starting point for importance sampling of the sort discussed in . For the present work, we consider a different Monte Carlo method for sampling from the exact posterior.
3.2.2 Hamiltonian Monte Carlo
Markov-chain Monte Carlo methods provide a convenient way to draw samples from a posterior distribution. We demonstrate in this paper how to draw posterior samples for the Bradley-Terry extensions considered, using Hamiltonian Monte Carlo as implemented in the Stan library. There are a few technical considerations. Because the posterior on and
is improper, trying to draw from it directly will lead to chains which never converge. Any probabilities constructed from the samples will be well-behaved, since only the meaningless degree of freedomis unconstrained, but these apparent errors make it more difficult to detect other potential problems. It is thus useful instead to consider only variables (and ) which contribute to the probability model via (see (2.7))
Of course, the full set of values are not independent. Instead, they are determined by the parameters for . Given the we can construct .
In Appendix A we show the code of the Stan model used to perform Hamiltonian Monte Carlo simulations of all three models.
|St. Lawrence (SL)||2(1)||0(1)||1(0)||—||3||2||2||7|
4 Demonstration Using Game Results
We now illustrate the application of the models described in this paper using game results from a competition which used the 3-2-1-0 point system: the 2020-2021 Eastern College Athletic Conference (ECAC) season. While the league ordinarily plays a balanced round-robin schedule in which each team plays each other team the same number of times, the season in question ended up being unbalanced due to cancellations of games arising from the COVID-19 pandemic. In Table 1 we show the results for the ECAC season, in the form of and for each team against each opponent, along with the total number of results of each type for each team, .
4.1 ECAC: Standard Bradley-Terry Model
As a first demonstration, we consider the standard Bradley-Terry model applied to the ECAC results with regulation and overtime/shootout wins being counted as simply “wins” and regulation and overtime/shootout losses being counted as “losses”. I.e., we define and . The resulting maximum-likelihood solutions and associated probabilities are shown in Table 2, along with the uncertainties and correlations encoded in the variance-covariance matrix of the Gaussian approximation to the posterior distribution.
Since the log-strengths have an arbitrary additive scale, a more meaningful understanding of the posterior distributions is obtained by considering the marginal distribution of the difference of a pair of team strengths . In Figure 1, we illustrate the maximum likelihood estimate and posterior distribution of this quantity for two of the six pairs of teams: Quinnipiac-Colgate and Quinnipiac-Clarkson. We show the posterior in Gaussian approximation (for which the marginal posterior on
is also a Gaussian), in a Monte Carlo drawn from the approximate multivariate Gaussian distribution, and in posterior samples from the exact posterior generated using Hamiltonian Monte Carlo with the Stan library. We can transform the posterior on a difference in log-strength into a posterior on the corresponding probability ; this is shown in Figure 2 for the two sets of posterior samples. In all cases, the exact marginal posterior, as estimated by the Hamiltonian Monte Carlo is only slightly different from the Gaussian approximation. This is similar to results found using importance sampling in .
4.2 ECAC: Bradley-Terry-Davidson Model with Ties
Moving on to the Bradley-Terry-Davidson model with ties, we now consider inference of the log-strength parameters along with the log-tie parameter . We illustrate the methods by reanalyzing the 2020-2021 ECAC results, with all overtime games treated as ties, so that now , , and . The maximum likelihood solutions and are shown in Table 3, along with the associated probabilities for a win and for a tie in contests between pairs of teams. In Table 4, we show the maxumum-likelihood estimates along with the uncertainties in and correlations among the log-strengths and the log-tie parameter , which are encoded in the variance-covariance matrix of the Gaussian approximation to the posterior distribution.
As with the standard Bradley-Terry model, we can show the marginal posterior distributions on the differences between pairs of log-strength parameters, and we do this in Figure 3 for the same pairs of teams as before. Once again, samples drawn from the multivariate Gaussian approximation capture the shape of that distribution well, and samples drawn from the exact posterior using Hamiltonian Monte Carlo are slightly different but similar.
We cannot convert directly into a probability, however, since probabilities depend on the log-tie parameter as well. In Figure 4 we plot the marginal posterior on . The parameter can be transformed into a probability where () for a game between evenly-matched teams to be tied, and we plot the posterior for this as well. Finally, in Figure 5 we illustrate the joint marginal posterior in and for our selected pairs of teams.
To illustrate the posterior on the probabilities for a pair of teams, we note that the constraint means that the space is actually two dimensional. The natural visualization for the behavior of three quantities which sum to one is a ternary plot, and we contour plot density estimates of the posterior and its Gaussian approximation in Figure 6, along with the maximum likelihood estimates .
4.3 ECAC: Bradley-Terry-like Model with Overtime/Shootout Results
Having developed the mechanisms to characterize the posterior distribution for the Bradley-Terry-Davidson model with three outcomes (win, tie, and loss), we apply similar analogues for the model with four outcomes: regulation win (RW), overtime/shootout win (OW), overtime/shootout loss (OL), and regulation loss (RL), now applied to the full 2020-2021 ECAC results shown in Table 1. As before, there is a log-strength parameter for each team, and is now the log of a parameter associated with overtime results. We show the maximum likelihood estimates in Table 5 along with the probabilities for a regulation win for an overtime/shootout win in contests between pairs of teams. In Table 6 we show the parameters of the Gaussian approximation to the posterior.