Bradley-Terry Modeling with Multiple Game Outcomes with Applications to College Hockey

The Bradley-Terry model has previously been used in both Bayesian and frequentist interpretations to evaluate the strengths of sports teams based on win-loss game results. It has also been extended to handle additional possible results such as ties. We implement a generalization which includes multiple possible outcomes such as wins or losses in regulation, overtime, or shootouts. A natural application is to ice hockey competitions such as international matches, European professional leagues, and NCAA hockey, all of which use a zero-sum point system which values overtime and shootout wins as 1/3 of a win, and overtime and shootout losses as 1/3 of a win. We incorporate this into the probability model, and evaluate the posterior distributions for the associated strength parameters using techniques such as Gaussian expansion about maximum a posteriori estimates, and Hamiltonian Monte Carlo.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

01/13/2020

Prediction and Evaluation in College Hockey using the Bradley-Terry-Zermelo Model

We describe the application of the Bradley-Terry model to NCAA Division ...
12/14/2017

Prior Distributions for the Bradley-Terry Model of Paired Comparisons

The Bradley-Terry model assigns probabilities for the outcome of paired ...
10/12/2018

Estimating Robot Strengths with Application to Selection of Alliance Members in FIRST Robotics Competitions

Since the inception of the FIRST Robotics Competition and its special pl...
07/24/2018

Global consensus Monte Carlo

For Bayesian inference with large data sets, it is often convenient or n...
11/30/2017

Thermostat-assisted Continuous-tempered Hamiltonian Monte Carlo for Multimodal Posterior Sampling

In this paper, we propose a new sampling method named as the thermostat-...
12/21/2021

Retrodictive Modelling of Modern Rugby Union: Extension of Bradley-Terry to Multiple Outcomes

Frequently in sporting competitions it is desirable to compare teams bas...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

The Bradley-Terry model [3, 18]

has long been used for evaluating paired comparisons, such as games between pairs of teams in which one team or the other wins each game. The model assigns a strength parameter to each team, and the odds ratio associated with the probability of a team winning a game is equal to the ratio of the strengths. These strength parameters can be estimated based on the full set of game results and used to rank teams or make future predictions. The model has been extended by Davidson

[7] to contests in which a tie or drawn contest is a possible outcome. For many years, such ties were a common occurrence in the sport of ice hockey, but recently tie-breaking methods such as an overtime period played under different rules and/or a shootout in which the teams alternate penalty shots are used to determine a winner. Results in overtime or shootouts can be evaluated differently from wins in regulation play. For instance, since 2006[8] competitions organized by International Ice Hockey Federation (IIHF) have awarded three points in the standings to a team winning in regulation, two points for a win in overtime or a shootout, one point for a loss in overtime or a shootout, and no points for a loss in regulation, and many leagues have followed suit. Compared to the prior system which awarded two points for a win, one for a tie, and none for a loss, which effectively treated a tie as half a win and half a loss, the four-outcome system treats an overtime/shootout win as of a win and of a loss.111We do not consider non-zero-sum point systems such as that used in association football (soccer) which awards three points for a win and one for a draw, so that drawn matches are only worth two points total rather than three. Likewise, the National Hockey League awards all wins two points and overtime/shootout losses one point; this 2-2-1-0 system awards three total points for games which go into overtime, but only two for games decided in regulation. One possible approach to either of these situations (games with three or four outcomes) is to use standard Bradley-Terry and assign fractional wins as appropriate to the point system (see, e.g., [17]). However, this is unsatisfying, as it provides no way to assign a probability for a future game to end in a tie or overtime or shootout result. In this paper, we instead consider a generalization of the tie model of [7] which associates one strength parameter to each team, along with a single parameter describing the tendency for games to go into overtime.

The rest of this paper is organized as follows: In Section 2, we describe the three models (standard Bradley-Terry, Bradley-Terry-Davidson including ties, and a new model with four possible game outcomes including overtime/shootout wins and losses), and exhibit a generalization of the relevant formulas which describes all three cases. In Section 3

we describe methods for inferring the relevant parameters of these models given a set of game results: maximum likelihood estimation, and Bayesian inference using either a Gaussian approximation or Hamiltonian Monte Carlo. In Section 

4 we demonstrate these methods using a recent set of game results: the 2020-2021 Eastern College Athletic Conference (ECAC) season. This season used the standard IIHF system with 3-2-1-0 points assigned for regulation wins, overtime/shootout wins, overtime/shootout losses, and regulation losses, respectively. For the purposes of illustration, we evaluate the ECAC results with the four-outcome model as well as with the other two models, treating in one case all wins the same, and in the other all overtime/shootout results as ties.

2 Models

In the standard Bradley-Terry model [3, 18] each team has a strength , and the modelled probability that team will win a game with team is

(2.1)

so that the probability of a set of game outcomes in which team plays team times and wins of those games is222The first form explicitly includes each pair of teams only once, while the second corrects for the double-counting, taking advantage of the fact that . If the order of the games between pairs of teams is ignored, the sampling distribution for the is instead .

(2.2)

where is the number of teams, and .

Davidson [7] proposed an extension for competitions which include the probabilities of ties, in which the probabilities of the three possible outcomes of a game are

(2.3a)
(2.3b)
(2.3c)

where is an additional parameter which describes how likely ties are to occur. (The probability of a tie in a game between evenly matched teams is .) Evidently, , and . The probability of a given set of game outcomes in which the games between teams and result in wins, ties and losses for team (where and ) is

(2.4)

We propose an extension appropriate for a system in which a win in overtime or a shootout is treated as of a win and of a loss, Writing the four possible game outcomes as RW for regulation win, OW for overtime/shootout win, OL for overtime/shootout loss, and RL for regulation loss, the modelled probability of each outcome would be333The exponents are chosen to correspond to the share of the points ( and , respectively) awarded for an overtime/shootout win or loss. This has the desirable feature that the maximum likelihood equation (3.7) becomes (after multiplying by )

i.e., that the expected number of points for each team equals the actual number. See also the discussion in Section 5 about possible alternative models, including extended models in which the exponents are not fixed, but inferred from the data.

(2.5a)
(2.5b)
(2.5c)
(2.5d)

The probability for a set of game outcomes will then be

(2.6)

If we write and , we can describe all three models as special cases of a general model in which the probability of a game between teams and ending in outcome is

(2.7)

where

(2.8)

is a vector equivalent of the logistic function known as the softmax function.

[4] The probability for a set of game outcomes is

(2.9)

Specifically,

  • For the standard Bradley-Terry model, , , and .

  • For the Bradley-Terry-Davidson model with ties, , , , , and .

  • For the model introduced in this paper, , , , , , and .

All of these models satisfy , and have “opposite” outcomes and such that , , and . They also satisfy and . We confine ourselves below to cases where these properties hold.

3 Inference of Parameters

3.1 Maximum Likelihood

Maximum likelihood estimates (MLEs) of Bradley-Terry strength parameters [18, 9, 7] provide a straightforward way of associating a “rating” to each team based on their game results, and have been proposed as a replacement for less reliable ways of evaluating a team’s game results in light of the difficulty of their schedule.[5]

We can consider the probability as a likelihood function of the parameters and , with log-likelihood

(3.1)

We can use the identity

(3.2)

to show that

(3.3a)
and
(3.3b)

which means that

(3.4)

and

(3.5)

Using these, we can write the maximum likelihood equations as

(3.6)

and

(3.7)

where

(3.8)

can be interpreted in the models considered as the number of games which are tied or go to overtime, respectively, and

(3.9)

can be seen as the total number of “points” for team . The maximum likelihood equation set each of these quantities equal to their expectation values.

We can solve the maximum likelihood equations by a generalization of the iterative method in [9]. writing them

(3.10)

(where we have used the fact that the only non-zero term in the numerator has ) and

(3.11)

As in the standard Bradley-Terry model, the overall multiplicative scale of is undefined (because can be written so that the team strengths appear only in the combination ), so it is necessary to rescale the team strengths at each iteration to preserve a property such as . Beyond that, there are conditions for the maximum likelihood estimates to be finite and well-defined, which are explored in e.g., [2, 14, 6].

3.2 Bayesian Approach

It is useful to move beyond maximum likelihood estimates, both to quantify uncertainty in the model parameters, and to make predictions about the outcome of future games. (For instance, [17] proposed simulating future games with probabilites drawn from a posterior distribution capturing the uncertainty in the strength parameters, rather than fixed probabilties generated from the MLEs of those parameters.)

A convenient framework for parameter estimates including uncertainties is Bayesian inference, which defines the posterior probability density for the parameters

and , or equivalently and , given a set of game results and prior assumptions , as

(3.12)

A variety of choices can be made for the multivariate prior distribution on [16] in the Bradley-Terry model, and likewise for the tie/overtime parameter . For simplicity, we work in this paper with the improper Haldane prior444So named because the marginal prior distribution for probabilities such as will follow the Haldane prior [10, 11], which is the limit of a distribution as .

(3.13)

which means that the posterior distribution is proportional to the likelihood:

(3.14)

With this choice of prior, the posterior probability density will be independent of the combination , but otherwise will be normalizable under the same circumstances that lead to well-defined maximum likelihood estimates for the parameters.

3.2.1 Gaussian Approximation

One convenient approach is to Taylor expand the log-posterior about the maximum a posteriori solution (which in this case is the maximum likelihood solution ).555

Note that this method does not assign special significance to the MAP estimates, but uses them as the starting point for a convenient approximation to the posterior probability distribution.

Truncating the expansion at second order gives a Gaussian approximation

(3.15)

where is the Hessian matrix

(3.16)

with elements666Note the similarity to the Fisher information matrix , which differs from the Hessian in that that depends on the observed data, while is a function defined on parameter space.

(3.17a)
(3.17b)
(3.17c)

To compute the elements of the Hessian matrix, we return to the first derivative (3.4) and differentiate them to get

(3.18)

where

(3.19)

is the probability of a tie or overtime game, depending on the model, and we have used the fact that since . Similarly, using the properties , , , and , we find

(3.20)

and, finally, differentiating (3.5) gives us

(3.21)

so that the Hessian matrix has components

(3.22a)
(3.22b)
(3.22c)

Note that in the case of the Bradley-Terry model, where the only outcomes are win and loss, the condition simplifies the Hessian to (since the parameter is not actually part of the likelihood), and

(3.23)

which is the form seen in, e.g., [17].

The Hessian matrix in (3.22) is singular, since and , which ultimately arise from the fact that the probabilities , and thus the likelihood, are unchanged by adding the same constant to all the

. This can be handled computationally by constructing a variance-covariance matrix

which is the Moore-Penrose pseudo-inverse[12]777

For a real symmetric matrix with a complete eigenvalue decomposition, this operation replaces each non-zero eigenvalue with its reciprocal while leaving zero eigenvalues unchanged.

of the Hessian matrix, and approximating the posterior as a multivariate Gaussian with a mean of and a variance-covariance matrix . This has the effect of enforcing the constraint

(3.24)

on the combination of the parameters which has no influence on the model.

This Gaussian approximation can be used to produce analytic estimates of quantities of interest, or used for Monte Carlo sampling, as illustrated in Section 4. It can also be used as a starting point for importance sampling of the sort discussed in [17]. For the present work, we consider a different Monte Carlo method for sampling from the exact posterior.

3.2.2 Hamiltonian Monte Carlo

Markov-chain Monte Carlo methods provide a convenient way to draw samples from a posterior distribution. We demonstrate in this paper how to draw posterior samples for the Bradley-Terry extensions considered, using Hamiltonian Monte Carlo as implemented in the Stan library.[15] There are a few technical considerations. Because the posterior on and

is improper, trying to draw from it directly will lead to chains which never converge. Any probabilities constructed from the samples will be well-behaved, since only the meaningless degree of freedom

is unconstrained, but these apparent errors make it more difficult to detect other potential problems. It is thus useful instead to consider only variables (and ) which contribute to the probability model via (see (2.7))

(3.25)

Of course, the full set of values are not independent. Instead, they are determined by the parameters for . Given the we can construct .

In Appendix A we show the code of the Stan model used to perform Hamiltonian Monte Carlo simulations of all three models.

()
Team Cg Ck Qn SL RW OW OL RL
Colgate (Cg) 1(1) 1(0) 2(1) 4 2 3 9
Clarkson (Ck) 3(1) 1(2) 1(0) 5 3 4 2
Quinnipiac (Qn) 4(1) 1(2) 4(1) 9 4 2 3
St. Lawrence (SL) 2(1) 0(1) 1(0) 3 2 2 7
Table 1: Results of the 2020-2021 Eastern College Athletic Conference (ECAC) season, showing the number of regulation wins and overtime/shootout wins for each team against each opponent. From these we can derive the total number of results of each type (RW, OW, OL and RL) for each team, which are used, for example, to generate the standings in the 3-2-1-0 point system

4 Demonstration Using Game Results

We now illustrate the application of the models described in this paper using game results from a competition which used the 3-2-1-0 point system: the 2020-2021 Eastern College Athletic Conference (ECAC) season. While the league ordinarily plays a balanced round-robin schedule in which each team plays each other team the same number of times, the season in question ended up being unbalanced due to cancellations of games arising from the COVID-19 pandemic. In Table 1 we show the results for the ECAC season, in the form of and for each team against each opponent, along with the total number of results of each type for each team, .

4.1 ECAC: Standard Bradley-Terry Model

As a first demonstration, we consider the standard Bradley-Terry model applied to the ECAC results with regulation and overtime/shootout wins being counted as simply “wins” and regulation and overtime/shootout losses being counted as “losses”. I.e., we define and . The resulting maximum-likelihood solutions and associated probabilities are shown in Table 2, along with the uncertainties and correlations encoded in the variance-covariance matrix of the Gaussian approximation to the posterior distribution.

Team Cg Ck Qn SL Cg Ck Qn SL
Cg
Ck
Qn
SL
Table 2: The maximum likelihood estimates and parameters of the the Gaussian approximation to the posterior distribution for the Bradley-Terry model applied to the 2020-2021 ECAC results, with regulation and overtime/shootout results counted the same. The maximum likelihood estimate for each team’s log-strength has an associated one-sigma uncertainty . The variance-covariance matrix can be converted to a correlation matrix . Note that the information included in is also influenced by the constraint , so for example the anti-correlation of the different log-strengths is somewhat artificial. We also show the maximum-likelihood estimates for the head-to-head win probabilities between pairs of teams
Figure 1: Posterior probability density for difference in log-strengths between selected pairs of teams (left: Quinnipiac and Colgate; right: Quinnipiac and Clarkson), based on 2020-2021 ECAC game results in the standard Bradley-Terry model with regulation and overtime/shootout wins treated the same. The dotted red vertical line shows the maximum likelihod estimate . Since the Haldane prior used is uniform in the , this is also the maximum a posteriori (MAP) value. The curves show the approximate Gaussian posterior from expanding about the MAP value (solid blue line), along with density estimates from a set of Monte Carlo samples drawn from that distribution (dashed brown line), and a set of samples drawn from the exact distribution using Hamiltonian Monte Carlo (dot-dash black line). Differences between the Gaussian approximation and the samples from the exact posterior are small, but can be noticeable, especially if the maximum likelihood estimate is far from zero. For reference, note that the “Gaussian approx” and “Gaussian MC” curves should only differ due to Monte Carlo errors in the construction of the latter
Figure 2: Posterior probability density for the win probability predicted by the Bradley-Terry model for selected pairs of teams, as in Figure 1. Note that, due to the transformation of the probability density, the maximum of the probability density in this parameter is not the maximum likelihood value as it was in Figure 1

Since the log-strengths have an arbitrary additive scale, a more meaningful understanding of the posterior distributions is obtained by considering the marginal distribution of the difference of a pair of team strengths . In Figure 1, we illustrate the maximum likelihood estimate and posterior distribution of this quantity for two of the six pairs of teams: Quinnipiac-Colgate and Quinnipiac-Clarkson. We show the posterior in Gaussian approximation (for which the marginal posterior on

is also a Gaussian), in a Monte Carlo drawn from the approximate multivariate Gaussian distribution, and in posterior samples from the exact posterior generated using Hamiltonian Monte Carlo with the Stan library.

[15] We can transform the posterior on a difference in log-strength into a posterior on the corresponding probability ; this is shown in Figure 2 for the two sets of posterior samples. In all cases, the exact marginal posterior, as estimated by the Hamiltonian Monte Carlo is only slightly different from the Gaussian approximation. This is similar to results found using importance sampling in [17].

()
Team Cg Ck Qn SL
Cg () () ()
Ck () () ()
Qn () () ()
SL () () ()
Table 3: The maximum likelihood estimates for the Bradley-Terry-Davidson model applied to the 2020-2021 ECAC results, with all overtime games counted as ties. The maximum likelihood estimates and of the log-strengths and log tie parameter are used to compute the estimated probability for a win and for a tie between each pair of teams. Note that the estimated probability of a game between evenly-matched teams to end in a tie is , and it is lower the more different the two teams’ strengths are
Team Cg Ck Qn SL
Cg
Ck
Qn
SL
Table 4: The parameters of the the Gaussian approximation to the posterior distribution for the Bradley-Terry-Davidson model applied to the 2020-2021 ECAC results, with all overtime games counted as ties. In addition to the log-strength parameters considered for the Bradley-Terry model in Table 2, there are uncertainties and correlations associated with the log-tie parameter

4.2 ECAC: Bradley-Terry-Davidson Model with Ties

Moving on to the Bradley-Terry-Davidson model with ties, we now consider inference of the log-strength parameters along with the log-tie parameter . We illustrate the methods by reanalyzing the 2020-2021 ECAC results, with all overtime games treated as ties, so that now , , and . The maximum likelihood solutions and are shown in Table 3, along with the associated probabilities for a win and for a tie in contests between pairs of teams. In Table 4, we show the maxumum-likelihood estimates along with the uncertainties in and correlations among the log-strengths and the log-tie parameter , which are encoded in the variance-covariance matrix of the Gaussian approximation to the posterior distribution.

As with the standard Bradley-Terry model, we can show the marginal posterior distributions on the differences between pairs of log-strength parameters, and we do this in Figure 3 for the same pairs of teams as before. Once again, samples drawn from the multivariate Gaussian approximation capture the shape of that distribution well, and samples drawn from the exact posterior using Hamiltonian Monte Carlo are slightly different but similar.

We cannot convert directly into a probability, however, since probabilities depend on the log-tie parameter as well. In Figure 4 we plot the marginal posterior on . The parameter can be transformed into a probability where () for a game between evenly-matched teams to be tied, and we plot the posterior for this as well. Finally, in Figure 5 we illustrate the joint marginal posterior in and for our selected pairs of teams.

Figure 3: Posterior probability density for difference in log-strengths between selected pairs of teams (left: Quinnipiac and Colgate; right: Quinnipiac and Clarkson), based on the Bradley-Terry-Davidson model applied to the 2020-2021 ECAC results, with all overtime games counted as ties. Curves are as defined in Figure 1
Figure 4: Posterior probability density for the log-tie parameter (left) and the associated probability (right) of a tie game between evenly matched teams, in the Bradley-Terry-Davidson model applied to the 2020-2021 ECAC results, with all overtime games counted as ties. As in Figure 1 and Figure 3, the dashed vertical line is the maximum-likelihood estimate, the solid blue line is a Gaussian approximation to the posterior but expanding about the MAP point , and the dashed brown and dot-dash black lines are densty estimates, respectively constructed from a Monte Carlo sample from the approximate Gaussian distribution and from the exact distribution using Hamiltonian Monte Carlo. Note that while the MLE is the maximum of the marginal posterior on , the transformation of the posterior probability density means is not the maximum of the posterior on
Figure 5: Contours of the joint posterior probability density of the log-strength differences shown in Figure 3 and the log-tie parameter shown in the left panel of Figure 4. The red circle is the MLE . The solid blue curve are contours of the Gaussian approximation, and the dashed brown curves are density contours of a Monte Carlo sample drawn from that approximate distribution. The dot-dashed black curves are density contours of a sample from the exact distribution drawn using Hamiltonian Monte Carlo. As with the Bradley-Terry model, the exact and approximate posteriors are comparable, but differences are detectable beyond the level of the Monte Carlo uncertainties illustrated by the difference between the “Gaussian approx” and “Gaussian MC” contours

To illustrate the posterior on the probabilities for a pair of teams, we note that the constraint means that the space is actually two dimensional. The natural visualization for the behavior of three quantities which sum to one is a ternary plot, and we contour plot density estimates of the posterior and its Gaussian approximation in Figure 6, along with the maximum likelihood estimates .

Figure 6: Ternary plots illustrating the joint posterior on , , and , based on the Bradley-Terry-Davidson model applied to the 2020-2021 ECAC results, with all overtime games counted as ties. The horizontal gridlines correspond to lines of constant , with labelled as “Tie”; the diagonal gridlines correspond to lines of constant or , with labelled with the abbreviation for team (“Qn” for Quinnipiac in both cases) and labelled with the abbreviation for team (“Cg” for Colgate and “Ck” for Clarkson). The red triangle is the maximum likelihood point . Note that for a given set of game results, the maximum likelihood point for all pairs of teams will lie along a one-dimensional curve in the Ternary plot. since, for a fixed , the maximum-likelihood probabilities are functions of the single value . The three sets of contours are as defined in Figure 5. Note that the MLE is no longer the maximum of the posterior probability density after tranforming parameters from to , , and

4.3 ECAC: Bradley-Terry-like Model with Overtime/Shootout Results

Having developed the mechanisms to characterize the posterior distribution for the Bradley-Terry-Davidson model with three outcomes (win, tie, and loss), we apply similar analogues for the model with four outcomes: regulation win (RW), overtime/shootout win (OW), overtime/shootout loss (OL), and regulation loss (RL), now applied to the full 2020-2021 ECAC results shown in Table 1. As before, there is a log-strength parameter for each team, and is now the log of a parameter associated with overtime results. We show the maximum likelihood estimates in Table 5 along with the probabilities for a regulation win for an overtime/shootout win in contests between pairs of teams. In Table 6 we show the parameters of the Gaussian approximation to the posterior.

()
Team