Log In Sign Up

Combining historical data and bookmakers'odds in modelling football scores

Modelling football outcomes has gained increasing attention, in large part due to the potential for making substantial profits. Despite the strong connection existing between football models and the bookmakers' betting odds, no authors have used the latter for improving the fit and the predictive accuracy of these models. We have developed a hierarchical Bayesian Poisson model in which the scoring rates of the teams are convex combinations of parameters estimated from historical data and the additional source of the betting odds. We apply our analysis to a nine-year dataset of the most popular European leagues in order to predict match outcomes for their tenth seasons. In this paper, we provide numerical and graphical checks for our model.


page 16

page 17


Using Twitter to predict football outcomes

Twitter has been proven to be a notable source for predictive modelling ...

A Probabilistic Approach to Identifying Run Scoring Advantage in the Order of Playing Cricket

In the game of cricket, the result of coin toss is assumed to be one of ...

Poisson Modeling and Predicting English Premier League Goal Scoring

The English Premier League is well-known for being not only one of the m...

Modeling outcomes of soccer matches

We compare various extensions of the Bradley-Terry model and a hierarchi...

Betting the system: Using lineups to predict football scores

This paper aims to reduce randomness in football by analysing the role o...

Retrodictive Modelling of Modern Rugby Union: Extension of Bradley-Terry to Multiple Outcomes

Frequently in sporting competitions it is desirable to compare teams bas...

A framework for predicting, interpreting, and improving Learning Outcomes

It has long been recognized that academic success is a result of both co...

1 Introduction

In recent years, the challenge of modelling football outcomes has gained attention, in large part due to the potential for making substantial profits in betting markets. According to the current literature, this task may be achieved by adopting two different modelling strategies: the direct models, for the number of goals scored by two competing teams; and the indirect models, for estimating the probablility of the categorical outcome of a win, a draw, or a loss, which will hereafter be referred to as a three-way process.

The basic assumption of the direct models is that the number of goals scored by the two teams follow two Poisson distributions. Their dependence structure and the specification of their parameters are the other most relevant assumptions, according to the literature. The scores’ dependence issue is, in fact, the subject of much debate, and the discussion cannot yet be concluded. As one of the first contributors to the modelling of football scores,

Maher (1982) used two conditionally independent Poisson distributions, one for the goals scored by the home team, and another for the away team. Dixon and Coles (1997) expanded upon Maher’s work and extended his model, introducing a parametric dependence between the scores. This also represents the justification for the bivariate Poisson model, introduced in Karlis and Ntzoufras (2003) in a frequentist perspective, and in Ntzoufras (2011) under a Bayesian perspective. On the other hand, Baio and Blangiardo (2010) assume the conditional independence within hierarchical Bayesian models, on the grounds that the correlation of the goals is already taken into account by the hierarchical structure. Similarly, Groll and Abedieh (2013) and Groll et al. (2015) show that, up to a certain amount, the scores’ dependence on two competing teams may be explained by the inclusion of some specific teams’ covariates in the linear predictors. However, Dixon and Robinson (1998) note that modelling the dependence along a single match is possible: in such a case, a temporal structure in the 90 minutes is required.

The second common assumption is the inclusion in the models of some teams’ effects to describe the attack and the defence strengths of the competing teams. Generally, they are used for modelling the scoring rate of a given team, and in much of the aforementioned literature they do not vary over time. Of course, this is a major limitation of these models. Dixon and Coles (1997) tried to overcome this problem by downweighting the likelihood exponentially over time in order to reduce the impact of matches far from the current time of evaluation. However, over the last 10 years the advent of some dynamic models allowed these teams’ effects to vary over the season, and to have a temporal structure. The independent Poisson model proposed by Maher (1982) has been extended to a Bayesian dynamic independent model, where the evolution structure is based on continuous time (Rue and Salvesen, 2000), or is specified for discrete times, such as a random walk for both the attack and defence parameters (Owen, 2011). Instead the non-dynamic bivariate Poisson model is extended in Koopman and Lit (2015) and Koopman et al. (2017)

, and is expressed as a state space model where the teams’ effects vary in function of a state vector.

For our purposes, the scores’ dependence assumption may be relaxed, and in this paper we adopt a conditional independence assumption. From a purely conceptual point of view, we have several reasons for adopting two independent Poisson: (i) as discussed by Baio and Blangiardo (2010), assuming two conditionally independent Poisson hierarchical Bayesian models implicitely allows for correlation, since the observable variables are mixed at an upper level; (ii) as noted by McHale and Scarf (2011), there is empirical evidence that goals of two teams in seasonal leagues display only slightly positive correlation, or no correlation at all, whereas goals are negatively correlated for national teams; (iii) bivariate Poisson models (Karlis and Ntzoufras, 2003)

, which represent the most typical choice for modelling correlation, only allow for non-negative correlation. Moreover, the independence assumption allows for a simpler formulation for the likelihood function and simplifies the inclusion of the bookmakers’ odds in our model. Concerning the dynamic assumption of the teams’- specific effects, we use an autoregressive model by centring the effect of seasonal time

at the lagged effect in , plus a fixed effect.

Whatever the choices for the two assumptions discussed above, the model proposed in this context was built with both a descriptive and a predictive goal, and its parameters’ estimates/model probabilities were often used for building efficient betting strategies

(Dixon and Coles, 1997; Londono and Hassan, 2015). In fact, the well known expression ‘beating the bookmakers’ is often considered a mantra for whoever tries to predict football—or more generally, sports—results. As mentioned by Dixon and Coles (1997), to win money from the bookmakers requires a determination of probabilities, which is sufficiently more accurate than those obtained from the odds. On the other hand, it is empirically known that betting odds are the most accurate source of information for forecasting sports performances (Štrumbelj, 2014). However, at least two issues deserve a deep analysis: how to determine probability forecasts from the raw betting odds, and how to use this source of information within a forecasting model (e.g., to predict the number of goals). Concerning the first point, it is well known that the betting odds do not correspond to probabilities; in fact, to make a profit, bookmakers set unfair odds, and they have a ‘take’ of 5-10%. In order to derive a set of coherent probabilities from these odds, many researchers have used the basic normalization procedure, by normalising the inverse odds up to their sum. Alternatively, Forrest et al. (2005) and Forrest and Simmons (2002) propose a regression model-based approach, modelling the betting probabilities through an historical set of betting odds and match outcomes. But, Štrumbelj (2014) shows that Shin’s procedure (Shin, 1991, 1993) gives the best results overall, being preferable both to the basic normalisation and regression approaches. Concerning the second issue, a small amount of literature focused on using the existing betting odds as part of a statistical model for improving the predictive accuracy and the model fit. Londono and Hassan (2015)

used the betting odds for eliciting the hyperparameters of a Dirichlet distribution, and then updated them based on observations of the categorical three-way process. No researcher has tried to implement a similar strategy within the framework of the direct models.

In this paper we try to fill the gap, creating a bridge between the betting odds and betting probabilities on one hand and the statistical modelling of the scores. Once we transform the inverse betting odds into probabilities, we will develop a procedure to (i) infer from these the implicit scoring intensities, according to the bookmakers, and (ii) use these implicit intensities directly in the conditionally independent Poisson model for the scores, within a Bayesian perspective. We are interested in both the estimation of the models parameters, and in the prediction of a new set of matches. Intuitively, the latter task is much more difficult than the former, since football is intrinsically noisy and hardly predictable. However, we believe that combining the betting odds with an historical set of data on match results may give predictions that are more accurate than those obtained from a single source of information.

In Section 2 we introduce two methods, proposed by the current literature, for transforming the three-way betting odds favoured by bookmakers into probabilities. In Section 3, we introduce the full model, along with the implicit scoring rates. The results and predictive accuracy of the model on the top four European leagues—Bundesliga, Premier League, La Liga and Serie A—are presented in Section 4

, and are summarised through posterior probabilities and graphical checks. Some profitable betting strategies are briefly presented in Section 

5. Section 6 concludes our analysis.

2 Transforming the betting odds into probabilities

The connection between betting odds and probabilities has been broadly investigated over the last decades. Before proceeding, we will introduce the formal definition of odd and the related notation we are going to use throughout the rest of the paper. The odds of any given event are usually specified as the amount of money we would win if we bet one unit on that event. Thus, the odd 2.5 corresponds to 2.5 euro (or pounds) we would win betting 1 euro. The inverse odd—usually denoted as 1:2.5—corresponds to the unfair probability associated to that event. In fact, as is widely known, the betting odds do not correspond to probabilities: the sum of the inverse odds for a single match needs to be greater than one (Dixon and Coles, 1997) in order to guarantee the bookmakers’ profit. Here, , , and denote the vector of the inverse betting odds, the vector of the estimated betting probabilities, and the set of the three-way possible results for the -th game, respectively.

There is empirical evidence that the betting odds are the most accurate available source of probability forecasts for sports (Štrumbelj, 2014); in other words, forecasts based on odds-probabilities have been shown to be better, or at least as good as, statistical models, which use sport-specific predictors and/or expert tipsters.

However, some issues remain open. Among these is a strong debate over which method to use for inferring a set of probabilities from the raw betting odds. We can transform them into probabilities by using the two procedures proposed in the literature: the basic normalisation—dividing the inverse odds by the booksum, i.e. the sum of the inverse betting odds, as broadly explained in Štrumbelj (2014)—and Shin’s procedure described in Shin (1991, 1993). Štrumbelj (2014), Cain et al. (2002, 2003), and Smith et al. (2009) show that Shin’s probabilities improve over the basic normalisation: in Štrumbelj (2014) this result has been achieved by the application of the Ranked Probability Score (RPS) (Epstein, 1969), which may be defined as a discrepancy measure between the probability of a three-way process outcome and the actual outcome.

In this paper we will not focus focus on comparing these two procedures; rather, we are interested in using the probabilities derived from each for statistical and prediction purposes, as will become clearer in later sections.


Basic normalisation


where is the so called booksum (Štrumbelj, 2014). The method has gained a great popularity due to its simplicity.


Shin’s procedure

In the model proposed by Shin (1993), the bookmakers specify their odds in order to maximise their expected profit in a market with uninformed bettors and insider traders. The latter are those particular actors who, due to superior information, are assumed to already know the outcome of a given event—e.g. football match, horse race, etc.—before the event takes place. Their contribution in the global betting volume is quantified by the percentage . Jullien et al. (1994) used Shin’s model to explicitly work out the expression for the betting probabilities:


so that . The current literature refers to these as Shin’s probabilities. The formula above is a function depending on the insider trading rate , which Jullien et al. (1994) suggested should be estimated by nonlinear least squares as:

The value here obtained may be defined as the minimum rate of insider traders that yields probabilities corresponding to the vector of inverse betting odds .

Both of these methods yield probabilities, with the difference that Shin’s procedure entails a function of the insider traders’ rate which needs to be minimised for every match. Figure 4 displays the three-way betting probabilities obtained through the two procedures described above for La Liga (Spanish championship), from the season 2007-2008 to the season 2016-2017. As may be noted, the Draw probabilities obtained with the basic normalisation tend to be higher than those obtained with Shin’s procedure. Conversely, as a home win and an away win tend to become more likely, Shin’s procedure tends to favour them.

(a) Home win
(b) Draw
(c) Away win
Figure 4: Comparison between Shin probabilities (-axis) and basic normalised probabilities (-axis) for the Spanish La Liga championship (seasons from 2007/2008 to 2016/2017), according to seven different bookmakers.

As is intuitive, a higher probability of a home win should somehow be associated with a greater number of goals scored by the home team, and the same for an away team.

3 Model

3.1 Model for the scores

Here, denotes the vector of observed scores, where and are the number of goals scored by the home team and by the away team respectively in the -th match of the dataset. According to the motivations provided by Baio and Blangiardo (2010), in this paper we adopt a conditional independence assumption between the scores. This choice allows for a simpler formulation for the likelihood function and, later on, for the direct inclusion of the bookmakers’ odds into the model through the Skellam distribution (Karlis and Ntzoufras, 2009). The model for the scores is then specified as:


where is modelled as conditionally independent Poisson and the joint parameter represents the scoring intensities in the -th game, for the home team and for the away team respectively. In what follows, we will refer to (3) as the basic model, which is estimated using the past scores. The main novelty of this paper consists of enriching this specification by including the extra information which stems from the bookmakers’ betting odds. Thus, for each pair of match and bookmaker the betting probabilities , derived with one of the methods in Section 2, may be used to find out the values , which solve the following nonlinear system of equations:


The existence of these values is guaranteed by the fact that, under (3), , where denotes the Poisson-Difference distribution, also known as Skellam distribution, with parameters and mean . In such a way, we obtain for each pair the implicit scoring rates , somehow inferring the scoring intensities implicit in the three-way bookmakers’ odds. Now, we consider our augmented dataset by including as auxiliary data the observed . For every , our new data vector is represented by:

Now, from Equation (3) we move to the following specification:


where are bookmakers’ parameters introduced for modelling the additional data , as explained in the next section. Parameters are assigned a non-informative prior distribution, with hyper-parameters and , e.g. .

3.2 Model for the rates

Equation (5) introduced a convex combination for the Poisson parameters, accounting for both the scoring rates and the bookmakers’ parameters . Denoting with the number of teams, the common specification for the scoring intensities is a log-linear model in which for each :


with the nested index denoting the team in the -th game. The parameter represents the well-known football advantage of playing at home, and is assumed to be constant for all the teams over time, as in the current literature. The attack and defence strengths of the competing teams are modelled by the parameters and respectively. Baio and Blangiardo (2010) and Dixon and Coles (1997) assume that these team-specific effects do not vary over the time, and this represents a major limitation in their models. In fact, Dixon and Robinson (1998) show that the attack and defence effects are not static and and may even vary during a single match; thus, a static assumption is often not reliable for making predictions and represents a crude approximation of the reality. Rue and Salvesen (2000) propose a generalised linear Bayesian model in which the team-effects at match time

are drawn from a Normal distribution centred at the team-effects at match time

, and with a variance term depending on the time difference. We make a seasonal assumption considering the effects for the season

following a Normal distribution centred at the previous seasonal effect plus a fixed component. For each :


while, for the first season, we assume:


As outlined in the literature, we need to impose a ‘zero-sum’ identifiability constraint within each season to these random effects:

whereas and the hyperparameters of our model are assigned weakly informative priors:


denotes the half-Cauchy distribution, centred in 0 and with scale 2.5.

111On the choice of the half-Cauchy distribution for scale parameters, see Gelman et al. (2006). The team-specific effects modelled through Equations (7) and (8) are estimated from the past scores in the dataset. As expressed in (5), we add a level to the hierarchy, by including the implicit scoring rates as a separate data model. Given, then, a further level which consists of bookmakers, it is natural to consider as the model parameters for the observed . More precisely, these parameters represent the means of two truncated Normal distributions for the further implicit scoring rates model:


where is the common notation for the density of a truncated Normal with parameters and defined in the interval . are in turn assigned two truncated Normal distributions:


with hyperparameters .

4 Applications and results: top four European leagues

4.1 Data

We collected the exact scores for the top four European professional leagues—Italian Serie A, English Premier League, German Bundesliga, and Spanish La Liga—from season 2007/2008 to 2016/2017. Moreover, we also collected all the three-way odds for the following bookmakers: Bet365, Bet&Win, Interwetten, Ladbrokes, Sportingbet, VC Bet, William Hill. All these data have been downloaded from the public available page

. We are interested in both (a) posterior predictive checks in terms of replicated data under our models, and (b) out-of-sample predictions for a new dataset. According to point (b), which appears to be more appealing for fans, bettors and statisticians, let

denote the training set, and the test set. Our training set contains the results of nine seasons for each professional league, and our test set contains the results of the tenth season.

The model coding has been implemented in WinBUGS (Spiegelhalter et al., 2003) and in Stan (Stan Development Team, 2016). We ran our MCMC simulation for iterations, with a burn-in period of , and we monitored the convergence using the usual MCMC diagnostic (Gelman et al., 2014).

4.2 Parameter estimates

As broadly explained in Section 3, the model in (5) combines historical information about the scores and betting information about the odds. We acknowledge that the scoring rate is a convex combination that borrows strengths from both the sources of information. Figure 5 displays the posterior estimates for the attack and the defence parameters associated with the teams belonging to the English Premier League during the test set season 2016-2017. The larger is the team- attack parameter, and the greater is the attacking quality for that team; conversely, the lower is the team- defence parameter, and the better is the defence power for that team. As a general comment, after reminding the reader that these quantities are estimated using only the historical results, the pattern seems to reflect the actual strength of the teams across the seasons. For example Chelsea and Manchester City register the highest effects for the attack and the lowest for the defence across the nine seasons considered: consequently, the out-of-sample estimates for the tenth season mirror previous performance. Conversely, weaker teams are associated with an inverse pattern: see for instance Hull City, Middlesbrough, and Sunderland, all relegated at the end of the season. It is worth noting that some wide posterior bars are associated to those teams with fewer seasonal observations: in fact, for simplicity, we do not account for a relegation system, and some teams have been observed less during the seasons considered.

Figure 5: Posterior 50% confidence bars for the attack (red) and the defence (blue) effects along the 10 seasons for the teams belonging to the Premier League 2016/2017. Wider posterior bars are associated with teams reporting fewer observations.

Figure 8 displays the ordered 50% confidence bars for the marginal posteriors of the probabilities parameter , which appear in (5), computed for the German Bundesliga. Despite the high variability, these plots suggest that the amount of information that stems from the bookmakers is comparable with that arising from historical information. Then, the convex combination in (5) seems to be an adequate option for our purposes.

Figure 8: Ordered posterior 50% confidence bars for parameters for German Bundesliga (from 2007-2008 to 2015-2016), 2754 matches.

4.3 Model fit

As broadly explained in Gelman et al. (2014), once we obtain some estimates from a Bayesian model we should assess the fit of this model to the data at hand and the plausibility of such model, given the purposes for which it was built. The principal tool designed for achieving this task is posterior predictive checking. This post-model procedure consists of verifying whether some additional replicated data under our model are similar to the observed data. Thus, we draw simulated values from the joint predictive distribution of replicated data:

It is worth noting that the symbol used here is different from the symbol used in the next section. The former is just a replication of , the latter is any future observable value.

Then, we define a test statistic

for assessing the discrepancy between the model and the data. A lack of fit of the model with respect to the posterior predictive distribution may be measured by tail-area posterior probabilities, or Bayesian -values


As a practical utility, we usually do not compute the integral in (4.3), but compute the posterior predictive distribution through simulation. If we denote with the -th MCMC draw from the posterior distribution of , we just draw from the predictive distribution . Hence, an estimate for the Bayesian -value is given by the proportion of the simulations for which the quantity exceeds the observed quantity . From an interpretative point of view, an extreme -value—too close to 0 or 1—suggests a lack of fit of the model compared to the observed data.

Rather than comparing the posterior distribution of some statistics with their observed values (Gelman et al., 2014), we propose a slightly different approach, allowing for a broader comparison of the replicated data under the model. Figure 13 displays the replicated distributions (grey areas) and the observed goals’ difference (red horizontal line) from the top four European leagues. From this plots the fit of the model seems good: in other words, the replicated data under the model are plausible and close to the data at hand. As it may be noted, the variability of the replicated goals’ difference amounting to -1, 0, 1 is greater than the variability for a goals’ difference of -3 or 3. Moreover, the observed goals’ differences always fall within the replicated distributions. In correspondence of a draw—goal difference of 0—the observed goals’ differences register an high posterior probability if compared with the corresponding replicated distribution.

(a) Bundesliga
(b) La Liga
(c) Premier League
(d) Serie A
Figure 13: PP check for the goals’ difference against the replicated goals’ difference for the top four European leagues . For each league, the graphical posterior predictive checks show that the model fits the data well.

4.4 Prediction and posterior probabilities

The main appeal of a statistical model relies on its predictive accuracy. As usual in a Bayesian framework, the prediction for a new dataset may be performed directly via the posterior predictive distribution for our unknown set of observable values. Following the same notation of Gelman et al. (2014), let us denote with a generic unknown observable. Its distribution is then conditional on the observed ,

where the conditional independence of and given is assumed. Figure 14 displays the posterior predictive distributions for Real Madrid-Barcelona, Spanish La Liga 2016/2017, and for Sampdoria-Juventus, Italian Serie A. The red square indicates the observed result, (2,3) for the first match and (0,1) for the second match respectively. Darker regions are associated with higher posterior probabilities. According to the model, the most likely result for the first game is (2,1), with an associated posterior probability slightly greater than 0.08, whereas the most likely result coincide with the actual result (0,1) for the second game.

These plots are not actually suggesting the most likely result: would it be smart to bet on an event with an associated probability about 0.09? Maybe, not. Rather, these plots provide a picture that acknowledges the large uncertainty of the prediction. We are not really interested in a model that often indicates a rare result that has been observed as the most likely outcome; we suspect, in fact, that a model which would favour the outcome (2,3) as most (or quite) likely, is probably not a good model. Rather, being aware of the unpredictable nature of football, we would like to grasp the posterior uncertainty of a match outcome in such a way that the actual result is not extreme in the predictive distribution.


Figure 14: Posterior predictive distribution of the possible results for the match Real Madrid-Barcelona, Spanish La Liga 2016/2017, and Sampdoria-Juventus, Italian Serie A 2016-2017. Both the plots report the posterior uncertainty related to the exact predicted outcome. Darker regions are associated with higher posterior probabilities and red square corresponds with the observed result.

Table 1 and Table 2 report the estimated posterior probabilities for each team being the first, the second, and the third; the first relegated, the second relegated, and the third relegated for each of the top four leagues, together with the observed rank and the achieved points, respectively. At the beginning of the 2016-2017 season, Bayern Munich had an estimated probability 0.8168 of winning the German league, which it actually did; in Italy, Juventus had an high probability of being the first (0.592) as well. Conversely, Chelsea had a low associated probability to win the English Premier League at the beginning of the season, and this is mainly due to the bad results obtained by Chelsea in the previous season. Of course, the model does not account for the players’/managers’ transfer market occurring in the summer period. In July 2016, Chelsea hired Antonio Conte, one of the best European managers, who won the English Premier League on his first attempt. For the relegated teams, it is worth noting that Pescara has high estimated probability to be the worst team of the Italian league (0.46). Globally, the model appears able to identify the teams with an associated high relegation’s posterior probability.

Team P(1st) P(2nd) P(3rd) Actual rank Points
Bayern Munich 0.8168 0.1508 0.0248 1 82
RB Leipzig 0.008 0.0284 0.0608 2 67
Dortmund 0.1332 0.4712 0.1856 3 64
Chelsea 0.1396 0.1592 0.1584 1 93
Tottenham 0.1096 0.132 0.1424 2 86
Man City 0.3904 0.2004 0.1388 3 78
Real Madrid 0.3868 0.4844 0.1076 1 93
Barcelona 0.5652 0.3536 0.0728 2 90
Ath Madrid 0.046 0.1348 0.5556 3 78
Juventus 0.592 0.2335 0.107 1 91
Roma 0.1535 0.263 0.2595 2 87
Napoli 0.206 0.2965 0.213 3 86
Table 1: Estimated posterior probabilities for each team being the first, the second, and the third in the Bundesliga, Premier League, La Liga and Serie A 2016-2017, together with the observed rank and the number of points achieved.
Team P(1st rel) P(2nd rel) P(3d rel) Actual rank Points
Wolfsburg 0.0212 0.0236 0.0064 16 37
Ingolstadt 0.0952 0.0904 0.0912 17 32
Darmstadt 0.1192 0.1552 0.2528 18 25
Hull 0.1384 0.1512 0.1428 18 34
Middlesbrough 0.118 0.1448 0.1812 19 28
Sunderland 0.1272 0.1228 0.1144 20 24
Sp Gijon 0.1132 0.1112 0.1016 18 31
Osasuna 0.1464 0.174 0.228 19 22
Granada 0.138 0.1748 0.2476 20 20
Empoli 0.0795 0.066 0.0415 18 32
Palermo 0.132 0.1765 0.1205 19 26
Pescara 0.1215 0.178 0.46 20 18
Table 2: Estimated posterior probabilities for each team being the first, the second, and the third relegated team in the Bundesliga, Premier League, La Liga, and Serie A 2016-2017, together with the observed rank and the number of points achieved.
(a) Bundesliga
(b) Premier League
(c) La Liga
(d) Serie A
Figure 19: Posterior 50% confidence bars (grey ribbons) for the achieved final points of the top-four European leagues 2016-2017. Black dots are the observed points. Black lines are the posterior medians. At a first glance, the pattern of the predicted ranks appears to match the pattern of the observed ones, and the model calibration appears satisfying.

Figure 19

provides posterior 50% confidence bars (grey ribbons) for the predicted achieved points for each team in top four European leagues 2016-2017 at the end of their respective seasons, together with the observed final ranks. At a first glance, the four predicted posterior ranks appear to detect a pattern similar to the observed ones, with only a few exceptions. As may be noticed for Bundesliga (Panel (a)), Bayern Munich’s prediction mirrors its actual strength in the 2016-2017 season, whereas RB Leipzig was definitely underestimated by the model. Still, the model cannot handle the budget’s information, and RB Leipzig was one of the richest teams in the Bundesliga in 2016-2017. In the English Premier League (Panel (b)), Chelsea was definitely underestimated by the model, whereas Manchester City actually gained the predicted number of points (78). The predicted pattern for the Spanish La Liga (Panel (c)) is extremely close to the one we observed, apart from the winner (our model favoured Barcelona, second in the observed rank). The worst teams (Sporting Gijon, Osasuna and Granada) are correctly predicted to be relegated. Also, for the Italian Serie A, the predicted ranks globally match the observed ranks. The outlier is represented by Atalanta, a team that performed incredibly well and qualified for the Europa League at the end of the last season. As a general comment, we may conclude that these plots show a good model calibration, since more or less half of the observed points fall in the posterior 50% confidence bars.

5 A preliminary betting strategy

In this section we provide a real betting experiment, assessing the performance of our model compared to the existing betting odds. In a betting strategy, two main questions arise: it is worth betting on a given single match? If so, how much is worth betting? In Section 2, we described two different procedures for inferring a vector of betting probabilities from the inverse odds vector . The common expression ‘beating the bookmakers’ may be interpreted in two distinct ways: from a probabilistic point of view, and from a profitable point of view. According to the first definition, which is more appealing for statisticians, a bookmaker is beaten whenever our matches’ probabilities are more favorable than their probabilities. As before, denotes the betting probability provided by the -th bookmaker for the -th game, with . Additionally, let and

denote the random variables representing the number of goals scored by two teams in the

-th match. From our model in (5), we can compute the following three-way model’s posterior probabilities: for each , using the results of the Skellam distribution outlined in Section 3. In fact, , where and are the convex combinations of the posterior estimates obtained through the MCMC sampling. Thus, the global average probability of a correct prediction for our model may be defined as:


where denotes the Kronecker’s delta, with if the observed result at the -th match is . This quantity serves as a global measure of performance for comparing the predictive accuracy between the posterior match probabilities provided by the model and those obtained from the bookmakers’ odds.

Model Shin Basic
Bundesliga 0.4010 0.4100 0.4072
Premier League 0.4349 0.4516 0.4480
La Liga 0.4553 0.4584 0.4549
Serie A 0.4430 0.4554 0.4507
Table 3: Average correct probabilities of three-way bets, obtained through our model, Shin probabilities and basic probabilities (here we take the average of the seven bookmakers considered). Greater values indicate better predictive accuracy.

As reported in Table 3, our model is very close to the bookmakers’ probabilities (Shin’s method and basic procedure). At a first glance, one may be tempted to say that, according to this measure, our model does not improve the bookmakers’ probabilities. However, this index is only an average measure of the predictive power, which does not take into account the possible profits for the single matches.

(a) Bundesliga
(b) Liga
(c) Premier League
(d) Serie A
Figure 24: Expected profits (%/100) standard errors for the seven bookmakers considered, for each of the top four European leagues.

According to the second definition, ‘beating the bookmaker’ means earning money by betting according to our model’s probabilities. One could bet one unit on the three-way match outcome with the highest expected return (Strategy A) or place different amounts, basing each bet on the match’s profit variability, as suggested in Rue and Salvesen (2000) (Strategy B). The expected profits (percentages divided by 100) are reported in Figure 24, along with their standard errors. Although not explicitly shown here, gambling with the betting odds probabilities, we would always incur a sure loss. Conversely, betting with our posterior model probabilities yields high positive returns for each league and each bookmaker.

6 Discussion and further work

We have proposed a new hierarchical Bayesian Poisson model in which the rates are convex combinations of parameters accounting for two different sources of data: the bookmakers’ betting odds and the historical match results. We transformed the inverse betting odds into probabilities and we worked out the bookmakers’ scoring rates through the Skellam distribution. A wide graphical and numerical analysis for the top four European leagues has shown a good predictive accuracy for our model, and surprising results in terms of expected profits. These results confirm on one hand that the information contained in the betting odds is relevant in terms of football prediction; on the other hand that, combining this information with historical data allows for a natural extension of the existing models for football scores.


  • Baio and Blangiardo (2010) Baio, G. and M. Blangiardo (2010). Bayesian hierarchical model for the prediction of football results. Journal of Applied Statistics 37(2), 253–264.
  • Cain et al. (2002) Cain, M., D. Law, and D. Peel (2002). Is one price enough to value a state-contingent asset correctly? Evidence from a gambling market. Applied Financial Economics 12(1), 33–38.
  • Cain et al. (2003) Cain, M., D. Law, and D. Peel (2003). The favourite-longshot bias, bookmaker margins and insider trading in a variety of betting markets. Bulletin of Economic Research 55(3), 263–273.
  • Dixon and Robinson (1998) Dixon, M. and M. Robinson (1998). A birth process model for association football matches. Journal of the Royal Statistical Society: Series D (The Statistician) 47(3), 523–538.
  • Dixon and Coles (1997) Dixon, M. J. and S. G. Coles (1997). Modelling association football scores and inefficiencies in the football betting market. Journal of the Royal Statistical Society: Series C (Applied Statistics) 46(2), 265–280.
  • Epstein (1969) Epstein, E. S. (1969). A scoring system for probability forecasts of ranked categories. Journal of Applied Meteorology 8(6), 985–987.
  • Forrest et al. (2005) Forrest, D., J. Goddard, and R. Simmons (2005). Odds-setters as forecasters: The case of english football. International journal of forecasting 21(3), 551–564.
  • Forrest and Simmons (2002) Forrest, D. and R. Simmons (2002). Outcome uncertainty and attendance demand in sport: the case of english soccer. Journal of the Royal Statistical Society: Series D (The Statistician) 51(2), 229–241.
  • Gelman et al. (2006) Gelman, A. et al. (2006). Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian analysis 1(3), 515–534.
  • Gelman et al. (2014) Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin (2014). Bayesian data analysis, Volume 2. Chapman & Hall/CRC Boca Raton, FL, USA.
  • Groll and Abedieh (2013) Groll, A. and J. Abedieh (2013).

    Spain retains its title and sets a new record–generalized linear mixed models on European football championships.

    Journal of Quantitative Analysis in Sports 9(1), 51–66.
  • Groll et al. (2015) Groll, A., G. Schauberger, and G. Tutz (2015). Prediction of major international soccer tournaments based on team-specific regularized poisson regression: An application to the fifa world cup 2014. Journal of Quantitative Analysis in Sports 11(2), 97–115.
  • Jullien et al. (1994) Jullien, B., B. Salanié, et al. (1994). Measuring the incidence of insider trading: a comment on Shin. Economic Journal 104(427), 1418–19.
  • Karlis and Ntzoufras (2003) Karlis, D. and I. Ntzoufras (2003). Analysis of sports data by using bivariate Poisson models. Journal of the Royal Statistical Society: Series D (The Statistician) 52(3), 381–393.
  • Karlis and Ntzoufras (2009) Karlis, D. and I. Ntzoufras (2009). Bayesian modelling of football outcomes: using the Skellam’s distribution for the goal difference. IMA Journal of Management Mathematics 20(2), 133–145.
  • Koopman and Lit (2015) Koopman, S. J. and R. Lit (2015). A dynamic bivariate poisson model for analysing and forecasting match results in the english premier league. Journal of the Royal Statistical Society: Series A (Statistics in Society) 178(1), 167–186.
  • Koopman et al. (2017) Koopman, S. J. S., R. Lit, et al. (2017). Forecasting football match results in national league competitions using score-driven time series models. Technical report, Tinbergen Institute.
  • Londono and Hassan (2015) Londono, M. G. and A. R. Hassan (2015). Sports betting odds: a source for empirical Bayes. Technical report, EAFIT University, Medellın, Colombia.
  • Maher (1982) Maher, M. J. (1982). Modelling association football scores. Statistica Neerlandica 36(3), 109–118.
  • McHale and Scarf (2011) McHale, I. and P. Scarf (2011). Modelling the dependence of goals scored by opposing teams in international soccer matches. Statistical Modelling 11(3), 219–236.
  • Ntzoufras (2011) Ntzoufras, I. (2011). Bayesian modeling using WinBUGS, Volume 698. John Wiley & Sons.
  • Owen (2011) Owen, A. (2011). Dynamic bayesian forecasting models of football match outcomes with estimation of the evolution variance parameter. IMA Journal of Management Mathematics 22(2), 99–113.
  • Rue and Salvesen (2000) Rue, H. and O. Salvesen (2000). Prediction and retrospective analysis of soccer matches in a league. Journal of the Royal Statistical Society: Series D (The Statistician) 49(3), 399–418.
  • Shin (1991) Shin, H. S. (1991). Optimal betting odds against insider traders. The Economic Journal 101(408), 1179–1185.
  • Shin (1993) Shin, H. S. (1993). Measuring the incidence of insider trading in a market for state-contingent claims. The Economic Journal 103(420), 1141–1153.
  • Smith et al. (2009) Smith, M. A., D. Paton, and L. V. Williams (2009). Do bookmakers possess superior skills to bettors in predicting outcomes? Journal of Economic Behavior & Organization 71(2), 539–549.
  • Spiegelhalter et al. (2003) Spiegelhalter, D., A. Thomas, N. Best, and D. Lunn (2003). WinBUGS user manual.
  • Stan Development Team (2016) Stan Development Team (2016). RStan: the R interface to Stan, version 2.9.0.
  • Štrumbelj (2014) Štrumbelj, E. (2014). On determining probability forecasts from betting odds. International journal of forecasting 30(4), 934–943.