Evaluating one-shot tournament predictions

We introduce the Tournament Rank Probability Score (TRPS) as a measure to evaluate and compare pre-tournament predictions, that is, predictions of the full tournament results that must be available before the tournament begins. The TRPS handles partial rankings of teams, gives credit to predictions that are only slightly wrong, and can be modified with weights to stress the importance of particular features of the tournament prediction. The TRPS is therefore more flexible than the commonly preferred log loss score for such tasks. In addition, we show how predictions from historic tournaments can be optimally combined into ensemble predictions in order to maximize the TRPS for a new tournament.
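
The idea of giving credit to predictions that are only slightly wrong can be illustrated with a small sketch. The text above does not state the TRPS formula, so the code below is not the authors' definition: it assumes the classic ranked probability score (squared differences between cumulative predicted and cumulative observed rank distributions), computed per team and averaged over teams. The function name `rank_probability_score`, the `weights` parameter, and the three rank categories are hypothetical choices made for illustration. In this sketch 0 is a perfect score and larger values are worse, which is a convention of the sketch rather than something the text above specifies.

```python
from itertools import accumulate


def rank_probability_score(pred, outcome, weights=None):
    """Illustrative ranked-probability-style score, averaged over teams.

    pred[t][r]    : predicted probability that team t ends in rank category r
    outcome[t][r] : 1 if team t actually ended in category r, else 0
    weights[r]    : optional weight on cumulative category r (hypothetical
                    parameter, included only to illustrate the weighting idea)

    Rank categories may group several places (for example "eliminated in
    the group stage"), which is one way to represent a partial ranking.
    """
    n_ranks = len(pred[0])
    if weights is None:
        weights = [1.0] * (n_ranks - 1)
    total = 0.0
    for p_row, o_row in zip(pred, outcome):
        p_cum = list(accumulate(p_row))
        o_cum = list(accumulate(o_row))
        # The last cumulative term is always 1 - 1 = 0, so it is skipped.
        team_score = sum(
            w * (pc - oc) ** 2
            for w, pc, oc in zip(weights, p_cum[:-1], o_cum[:-1])
        )
        total += team_score / (n_ranks - 1)
    return total / len(pred)


# One team, three ordered categories: winner, runner-up, everything else.
actual = [[1, 0, 0]]          # the team won the tournament
near_miss = [[0, 1, 0]]       # predicted runner-up with certainty
far_miss = [[0, 0, 1]]        # predicted "everything else" with certainty

print(rank_probability_score(near_miss, actual))  # 0.5
print(rank_probability_score(far_miss, actual))   # 1.0
```

Under these assumptions the near miss scores better than the far miss. A log loss would treat the two predictions identically, because both assign probability zero to the observed outcome.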




1 Introduction

2 Evaluating tournament predictions

3 Simulations

4 Improving predictions using ensemble predictions

5 Application: Evaluating predictions for the 2018 FIFA World Cup

6 Discussion

