Discovering indicators of dark horse of soccer games by deep learning from sequential trading data

by   Liyao Lu, et al.

It is no surprise that machine learning models provide decent prediction accuracy for soccer game outcomes based on various objective metrics. However, the performance is much weaker when it comes to predicting difficult and valuable matches. A deep learning model is designed and trained on real sequential trading data from a real prediction market, with the assumption that such trading data contain critical latent information that determines the game outcomes. A new loss function is proposed which biases the selection toward matches with high investment return to train our model. A full investigation of 4669 top soccer league matches showed that our model traded off prediction accuracy for high value return due to a certain ability to detect dark horses. A further attempt is made to depict some indicators discovered by our model for describing key features of big dark horses and regular hot horses.




1 Introduction

Sports analytics has been well studied and applied in various kinds of sports since the 1970s [1]. The analytic technologies are evolving from statistical to computational approaches [2]. Applying machine learning to soccer analytics has attracted growing attention from both the sports industry and computing academia.

Traditionally, people assess soccer players’ and teams’ attributes via quantified ratings by experts, big sports media, and professional league web sites based mainly on statistical methods. For example, Sky Sports, the English Premier League broadcast mogul, developed the Power Ranking system to calculate Premier League players’ overall performance based on 32 statistical features [3]. Such objective metrics [4] are usually main data sources for people making decisions associated with soccer games. This classical scenario is vividly exhibited in the most famous management simulation computer game FootBallManager [5].

Predicting the outcome of a soccer match is usually a main task for both academia and industry [6]. For matches covering the 2014 World Cup finals, the 2012 UEFA European Championships and the 2015 Copa America, regression models based on Elo [7], the FIFA Women's World Ranking Methodology [8] and similar ratings reached prediction accuracies ranging from 50% to 55% [9]. Besides academia, tech giants such as Microsoft Bing also participated in predicting outcomes of the English Premier League every season [10]. Bing correctly predicted 125 matches out of 232 in 2017, corresponding to an accuracy of 53.88%. It is marketed as a satisfying performance in terms of game outcome prediction accuracy.

Such predictions comfortably outperformed the random guess accuracy of 1/3 for three-way soccer games. However, accuracy alone can be a misleading measure of prediction quality and performance, because it implicitly assumes that the difficulty and value of each correct prediction are equal, which is untrue. It is much easier to correctly predict a match's outcome when one team is much stronger than its opponent, such as the La Liga match Barcelona vs Celta Vigo, than when two teams are close, such as the El Clásico match Real Madrid vs Barcelona. Correctly predicting a Barcelona victory in the first match earns a smaller reward from the prediction market than correctly predicting the El Clásico match. Basically, the rate of return based on the prediction results can be thought of as a quantitative indicator of the difficulty and value of each correct prediction. Optimizing reward is much more difficult than optimizing prediction accuracy. For example, if we place $1 in Betfair's prediction market [11] on each of the 232 Premier League matches in 2017 guided by Microsoft's prediction, the total return is $228.23, which is only 98.37% of our total stake.
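The accuracy and return figures quoted for the Bing predictions can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the variable names are illustrative.

```python
# Figures quoted in the text for the 2017 Premier League predictions.
matches = 232            # number of matches predicted, $1 staked on each
correct = 125            # number of correct predictions
total_return = 228.23    # dollars returned over all 232 bets

accuracy = correct / matches
return_ratio = total_return / matches   # total stake equals the number of matches

print(f"accuracy     = {accuracy:.4f}")
print(f"return/stake = {return_ratio:.4f}")
print(f"net result   = {total_return - matches:+.2f} dollars")
```

An accuracy well above the 1/3 random guess level therefore still leaves the bettor with slightly less money than was staked.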

The reason for previous unprofitable predictions against the money line is probably not algorithmic, but a lack of insider information that indicates players' actual status for the coming match [12]. Insider information such as morale, locker room stories, and unannounced injuries affects game outcomes significantly but is unknown to the public [13]. In contrast, it is not hard for bookmakers and big market makers to obtain up-to-date insider information since they sponsor many leagues and teams. Based on their predictions with up-to-date insider information, they maximize profits and minimize potential loss by actions including bidding, trading, and modifying odds in prediction markets.


It is believed that the market is not manipulated by individual participants, and that bids information embeds latent factors that determine the outcomes of some specific matches [15, 16]. So in this paper, we used the sequential trading data, instead of performance metrics from the teams' side, as our analytic target data [17, 18]. In this way, we could probably offset our disadvantage of lacking insider information by digging out the hidden message behind public bids data in the prediction market. We developed a deep learning model to maximize the valuable predictions. To do that, we defined a new objective function, or loss function from the training algorithm's point of view, which biases the selection toward matches with more return. Thus our model had a certain ability to detect dark horses.

Generally speaking, a dark horse refers to an event with a small probability of occurring. In real life, correctly detecting a dark horse usually has significant effects. In our context, dark horses refer to those matches whose outcomes are less likely to happen and whose returns are thus higher than those of other matches. Our loss function and learning model tried to capture those dark horses. We fully investigated the average learning ability of our model on a real sequential trading data set containing 4669 soccer matches of top soccer leagues. Results showed that our model underperformed in terms of outcome accuracy but outperformed in terms of valuable return, which reflects our model's ability to detect dark horses.

In a pilot study, a similar learning approach was applied to sequential bids data from Bookmaker [19]. We believe that trading data should contain more information than bids data. Bids data are static expressions of bettors' intent. Although over time they form a dynamic sequence, bids data lack important deal volume information. For example, a high bid without any deals might reveal different information from a bid with high deal volumes. This suggests that even with the same learning model, learning from the trading data flow might be easier than learning from the bids data flow, since trading data contain deal information which bids data do not. Thus we expect that more useful information can be exploited by learning from sequential trading data.

Overall, our research makes three contributions. First, we targeted sequential trading data, which are believed to embed rich patterns and latent information. Second, we aimed at valuable predictions instead of regular accuracy and, as a natural consequence, achieved some ability to identify dark horses. Last but not least, we tried to depict some key indicators discovered by our model for describing key features of big dark horses and hot horses.

The rest of the paper is organized as follows: after the problem formulation and data set description are given in Section 2, a deep learning model and a new loss function are presented in Section 3. Results and evaluations are fully described in Section 4. A further discussion ends the paper.

2 Problem formulation and data set

Let $o \in \{\text{home win}, \text{draw}, \text{guest win}\}$ be the final outcome of a soccer match in terms of home team against guest team. For each soccer match, there are three odds $\mathrm{odds}_{o}^{t}$ corresponding to the three possible outcomes respectively at a given time $t$. If participants bet \$1 on $o$, then they will get back $\mathrm{odds}_{o}^{t}$ if $o$ eventually occurs as the match's outcome. Otherwise, they will lose the bet. Given a bids data sequence $X = (x_1, \dots, x_T)$ for each match, where $x_t$ contains various bids data at time $t$ and $T$ is the final second before the match, our goal is NOT to predict the outcome of the match. Instead, our final goal is to maximize the expectation of gains for betting on a group of matches. That is, given the bids data sequences of a group of $N$ matches, for each match $i$, at the time $T$ with the three odds $\mathrm{odds}_{i,o}^{T}$ of the match outcomes, we make a \$1 bet on one prediction $\hat{o}_i$. Our final goal is to

$$\max \sum_{i=1}^{N} \mathrm{odds}_{i,\hat{o}_i}^{T} \cdot I(\hat{o}_i = o_i) \qquad (1)$$

where $I$ is the indicator function and $o_i$ is the real final outcome of match $i$.
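The objective in equation (1) can be illustrated with a small sketch: a correct $1 bet pays the odds of the chosen outcome, and an incorrect one pays nothing. The function name `total_gain`, the outcome indexing (0 = home win, 1 = draw, 2 = guest win) and the odds values are hypothetical choices made for this example, not data from the paper.

```python
def total_gain(odds, predictions, outcomes):
    """Sum of returns from $1 bets; odds are paid only for correct predictions."""
    gain = 0.0
    for match_odds, predicted, actual in zip(odds, predictions, outcomes):
        if predicted == actual:          # indicator term of equation (1)
            gain += match_odds[predicted]
    return gain

# Hypothetical pre-game odds for three matches: (home win, draw, guest win).
odds = [(1.5, 4.0, 7.0), (2.1, 3.2, 3.6), (1.2, 6.5, 15.0)]
outcomes = [0, 0, 1]      # the third match ends in an unlikely draw

favourites = [0, 0, 0]    # two correct predictions out of three
upsets = [1, 1, 1]        # one correct prediction out of three

print(total_gain(odds, favourites, outcomes))   # gain from the two favourites
print(total_gain(odds, upsets, outcomes))       # gain from the single upset
```

In this toy case the second strategy is less accurate yet returns more, which is the trade-off the objective is designed to reward.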

It is important to point out that our final goal described in equation (1) is partially different from the conventional goal of a machine learning task. Usually, a machine learning task aims to maximize the outcome prediction accuracy of the matches. Although the gain in equation (1) is conceptually in direct proportion to the prediction accuracy, our model prefers correctly betting on one match with higher odds to correctly betting on two matches with lower odds, when the higher odds exceed the sum of the two lower odds. In this paper, we refer to a match whose eventual outcome agrees with the biggest odds as a big dark horse. A match whose eventual outcome agrees with the smallest odds is called a hot horse, and a middle horse is a match whose eventual outcome agrees with the odds that stand between the biggest and the smallest. We generally use dark horse to refer to the union of big dark horses and middle horses.

By modeling the primary goal as equation (1), we intentionally try to maximize the gain, and make the prediction accuracy a secondary goal. That is to say, prediction accuracy becomes a means instead of an ultimate goal in this paper. This is why we call our solution of maximizing gain an end-to-end solution.

We bought real bids data of Betfair [11] from its licensed data agent company Fracsoft [20]. Betfair is a prediction market platform for client-to-client trading, similar to a stock exchange platform. Every participant can bid buy/sell prices and volumes on that platform, and of course trade any available bids. Our data set contains bids data sequences of the English Premier League from 2007 to 2014, the Spanish La Liga from 2008, 2010 to 2014, and the French Ligue 1 from 2011 to 2014. These leagues are all top soccer leagues of their own countries and of the world. The data of some years are missing due to the data provider's business restrictions.

We collected some trading data for each time interval from the raw data. There are mainly three feature vectors within each element of the sequence, describing the trading information between two consecutive time points. These vectors describe the trading data at that time for the three match outcomes: home win, draw, and guest win respectively. Each feature vector summarizes the trading information of two basic groups, (Back, Buy) and (Lay, Sell), that occurred within a certain time interval in a prediction market. People who short a certain outcome can submit Back bids with some volumes at certain odds, so that someone else who longs that outcome can Buy. In contrast, people who long a certain outcome can submit Lay bids and someone else who shorts that outcome can Sell. Both Back and Lay bids can be cancelled before they are matched by buyers or sellers. We use 4 features to summarize all Buy actions that occurred in a certain time interval according to Table 1.

Feature        Description
BuyActionCnt   The number of times of Buy
BuyVolAvg      The mean volume of Buy
BuyVolStd      The standard deviation of Buy volume
BuyOddsAvg     The average odds of Buy
Table 1: Features related to Buy with description
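As a minimal sketch of how the four Buy features of Table 1 could be computed for one time interval, the function below summarizes a list of deals. The input format (a list of `(volume, odds)` pairs), the function name `buy_features`, the key `BuyVolStd` for the standard deviation row (named by analogy with Table 2), the use of the population standard deviation, and the zero values for an empty interval are all assumptions of this example; the paper does not specify them.

```python
from statistics import mean, pstdev

def buy_features(trades):
    """Summarise the Buy deals of one time interval.

    `trades` is a hypothetical list of (volume, odds) pairs, one per deal.
    """
    if not trades:
        # Assumption: an interval without deals is encoded as all zeros.
        return {"BuyActionCnt": 0, "BuyVolAvg": 0.0,
                "BuyVolStd": 0.0, "BuyOddsAvg": 0.0}
    volumes = [volume for volume, _ in trades]
    odds = [price for _, price in trades]
    return {
        "BuyActionCnt": len(trades),
        "BuyVolAvg": mean(volumes),
        "BuyVolStd": pstdev(volumes),   # population std, chosen for this sketch
        "BuyOddsAvg": mean(odds),
    }

features = buy_features([(10.0, 2.0), (30.0, 2.2)])
print(features)
```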

In addition, 6 features are used to summarize all Back actions that occurred in a certain time interval according to Table 2.

Feature Description
BackBidsSubmitted The number of Back bids
BackSubmittedVolAvg The mean volume of Back bids
BackSubmittedVolStd The std volume of Back bids
BackBidsCancelled The number of cancelled Back bids
BackCancelledVolAvg The mean vol. of cancelled Back bids
BackCancelledVolStd The std vol. of cancelled Back bids
Table 2: Features related to Back with description

So there are 10 features in the group (Back, Buy). Similarly, there are another 10 features in the counterpart group (Lay, Sell). In summary, there are 20 features in the feature vector of a match outcome, and the total dimension of the input at each time point is 3 × 20 = 60.

It is noted that deal volume information and bids cancelling information are all included in the feature vector. These are the features totally different from bids features.

Since the data sequence varied in length and trading frequency, we need to preprocess the raw data. Firstly, for a match we truncated trading data sequence to keep all valid data of 2 hours before the opening whistle. Secondly, we dropped matches which had too few trading data. Lastly, we sampled the sequential trading data with the sampling strategy described in Table 3.

sample period   time interval    sample points   position
1st             10 seconds       90              before the match begins
2nd             20 seconds       90              before the 1st period
3rd             30 seconds       59              before the 2nd period
4th             till available   1               before the 3rd period
Table 3: Sample strategy to generate sequential trading data

Consequently for all matches, the length of a sequence of trading data is regulated to 240.
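The sampling strategy of Table 3 can be sketched as a list of interval boundaries measured in seconds before kickoff. Reading "till available" as "back to the start of the 2-hour window" is an assumption of this example, as is the function name `sampling_boundaries`.

```python
def sampling_boundaries(window_seconds=2 * 60 * 60):
    """Return (start, end) offsets in seconds before kickoff, nearest interval first."""
    schedule = [(10, 90), (20, 90), (30, 59)]   # (interval length, sample points)
    boundaries = []
    end = 0
    for length, count in schedule:
        for _ in range(count):
            boundaries.append((end + length, end))
            end += length
    # 4th period: a single point covering whatever remains of the window (assumed).
    boundaries.append((window_seconds, end))
    return boundaries

boundaries = sampling_boundaries()
print(len(boundaries))    # sequence length after sampling
print(boundaries[0])      # interval closest to kickoff
print(boundaries[-1])     # interval covering the earliest data
```

The three fixed-rate periods cover 900 + 1800 + 1770 = 4470 seconds, and together with the single remaining point they give the sequence length of 240.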

After filtering out some matches with error data, we have a clean data set of size 4669 matches for the rest of this paper.

3 Designing and training the deep model

The learning model is designed as the following expression:

$$\mathrm{MLP}(\mathrm{CONCAT}(\mathrm{RNN}(\mathrm{CNN}(X)),\, S)) \qquad (2)$$

where $X$ is the sequential trade data flow of a match, $S$ is the non-sequential features of the match, CNN stands for a block based on a convolutional neural network, RNN for a block based on a recurrent neural network [22], CONCAT for concatenation and MLP for a block based on a multilayer perceptron [23]. Basically, the raw data flow is first fed to CNN for mining new features based on neighbors. So we call such features local features. Then the sequentially mined local features are forwarded to RNN for accumulatively extracting features as the representative features of the whole sequence. We call the features from RNN global features. Finally the global features and the other non-sequential features $S$ are combined and input to MLP for constructing a classifier.

The non-sequential features are different from the local and global features in that they include all state features for the match. The order of these features does not matter for the learning task. Examples of such features are league type (to which country the league belongs) and match type (strong team against strong team, strong against weak team, and so on).

CNN consists of several 1D convolutional layers. The first layer, defined in equation (3), applies a regular convolution operand whose subscript stands for the number of operands and whose superscript stands for the window size. The purpose of the first layer is to reorganize the raw features with a non linear activation function.
The second layer of CNN is defined in equation (4). The special convolution operand used here is totally different from the regular convolution operand in that it does not share weights along the time spots of the sequence as a regular one does. This means that for different time spots the extracting rules are allowed to be different. It is very likely that the bids data close to the match beginning time embed features differently from those far from the match beginning time. So this layer is expected to extract more useful features. The second operand in equation (4) denotes a max pooling operation with pooling size 2, which shortens the sequence to half its length.
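The difference between a regular convolution and the unshared-weight variant described for the second layer can be sketched on a single-channel toy sequence. This is only an illustration of the idea: the real layers operate on multi-channel features, and the function names, kernel values and window size below are hypothetical.

```python
def conv1d_shared(seq, kernel):
    """Regular 1D convolution: the same kernel slides over every time spot."""
    k = len(kernel)
    return [sum(w * x for w, x in zip(kernel, seq[t:t + k]))
            for t in range(len(seq) - k + 1)]

def conv1d_local(seq, kernels):
    """Unshared variant: time spot t is processed by its own kernel kernels[t]."""
    k = len(kernels[0])
    assert len(kernels) == len(seq) - k + 1, "one kernel per output position"
    return [sum(w * x for w, x in zip(kernels[t], seq[t:t + k]))
            for t in range(len(kernels))]

def max_pool(seq, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(seq[i:i + size]) for i in range(0, len(seq) - size + 1, size)]

seq = [1.0, 2.0, 4.0, 3.0, 5.0]
shared = conv1d_shared(seq, [1.0, -1.0])
local = conv1d_local(seq, [[1.0, -1.0], [0.5, 0.5], [0.0, 1.0], [2.0, 0.0]])
pooled = max_pool(local)   # pooling size 2 halves the sequence length
print(shared, local, pooled)
```

With unshared kernels, the rule applied near kickoff can differ from the rule applied far from it, at the cost of many more parameters than a shared kernel.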

The third layer of CNN is similar to the first layer, where reLu, the rectified linear unit function [25], is used as the activation output of the third layer.

The three layers of CNN explore completely different dimensions for extracting local features. The first is focused on the internal. The second aims at different time spot. The last is trying to capture features along the time axis.

RNN block is defined as $\mathrm{RNN}(\cdot) = \mathrm{GRU}_{9}(\cdot)$, where GRU is a gated recurrent network [26], and the subscript 9 denotes the output dimension. GRU is a simplified variant of LSTM [22].

CONCAT() just concatenates the two input vectors. The non-sequential features obviously need manual annotation of the raw data, and therefore embed human subjective intelligence. In this study, since we are focusing on learning from the objective raw data, we let the non-sequential features be null.

MLP block consists of three fully connected regular neural network layers, as defined by equation (5), where each layer consists of fully connected neurons whose number is represented as the subscript.

In order to train the learning model in equation (2), we define our own loss function in equation (6). Its ingredients are the mini batch size, the entropy of the predicted probability, the parameters of our learning model, the norm 1 and norm 2 of those parameters, and the corresponding weights of the two norms.

Please note the subtle difference between the regular categorical cross entropy loss and our loss defined in equation (6). The regular cross entropy loss function gives each label the same importance, while our loss gives dark horses more weight. This is the reason why our model might have more chance to catch dark horses than a model using the regular cross entropy loss function.
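To make the contrast with regular cross entropy concrete, the sketch below shows one plausible way of giving dark horses more weight: scaling each sample's cross entropy by the odds of its true outcome. This weighting is an assumption made purely for illustration and is not claimed to be the exact form of equation (6); the norm penalties are omitted, and all function names are hypothetical.

```python
import math

def cross_entropy(probs, true_idx):
    """Categorical cross entropy of one sample with a one-hot label."""
    return -math.log(probs[true_idx])

def plain_loss(batch_probs, batch_true):
    """Regular loss: every sample counts equally."""
    losses = [cross_entropy(p, t) for p, t in zip(batch_probs, batch_true)]
    return sum(losses) / len(losses)

def odds_weighted_loss(batch_probs, batch_true, batch_odds):
    """ASSUMED weighting: each sample is scaled by the odds of its true outcome."""
    losses = [odds[t] * cross_entropy(p, t)
              for p, t, odds in zip(batch_probs, batch_true, batch_odds)]
    return sum(losses) / len(losses)

# Two samples predicted equally badly: a hot horse and a dark horse.
probs = [0.3, 0.4, 0.3]
hot = odds_weighted_loss([probs], [0], [(1.2, 6.0, 15.0)])
dark = odds_weighted_loss([probs], [2], [(1.2, 6.0, 15.0)])
print(plain_loss([probs], [0]), plain_loss([probs], [2]))   # identical
print(hot, dark)                                            # dark horse costs more
```

Under such a weighting, a mistake on a high-odds outcome is penalized more heavily than the same mistake on a favourite, which pushes the optimizer toward matches with more return.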

By now, any backpropagation based training algorithm [27] can be used to minimize our loss function for learning model (2). For convergence checking, we monitor the decrease of the validation loss. If the validation loss does not decrease for 8 consecutive epochs, we consider the training process converged and stop the training. We then use the model parameters at the minimum validation loss as our final trained model for the evaluations.
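The convergence rule described above can be sketched as a small function over a recorded sequence of validation losses. In real training the losses would be produced epoch by epoch; the list used here, the zero-based epoch numbering and the function name `early_stopping` are assumptions of this example.

```python
def early_stopping(val_losses, patience=8):
    """Return (stop_epoch, best_epoch) for per-epoch validation losses.

    Training stops once `patience` consecutive epochs have passed without a new
    minimum; the parameters saved at `best_epoch` are the ones to keep.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical curve: improves for three epochs, then plateaus.
losses = [1.0, 0.8, 0.7] + [0.75] * 10
stop_epoch, best_epoch = early_stopping(losses)
print(stop_epoch, best_epoch)
```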

4 Results

For a not-so-large data set, tuning the best performance model on a fixed test set is usually possible. However, this does not ensure good generalization of the trained model to data outside the test set. In this study, we focused on evaluating the average learning ability of our model by the following design. We ran multiple independent trials on the whole data set. For each trial, we randomly selected 10% of the data set as the test set and the rest as the training set. This gives a test set of 467 matches in this section. Within the training set, a random 10% was selected as the validation set. Based on all these trials, we analyzed the average performance to evaluate our model. This evaluation strategy is a variant of cross validation, but with finer granularity.
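One trial of the evaluation design described above can be sketched as a random split of match indices. The rounding rule, the seed handling and the function name `split_trial` are assumptions of this example; only the 10% proportions and the data set size come from the text.

```python
import random

def split_trial(n_matches, seed, test_frac=0.1, val_frac=0.1):
    """Randomly split match indices into train, validation and test sets."""
    rng = random.Random(seed)
    indices = list(range(n_matches))
    rng.shuffle(indices)
    n_test = round(n_matches * test_frac)
    test_idx, rest = indices[:n_test], indices[n_test:]
    n_val = round(len(rest) * val_frac)      # validation is taken from the training part
    val_idx, train_idx = rest[:n_val], rest[n_val:]
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = split_trial(4669, seed=0)
print(len(train_idx), len(val_idx), len(test_idx))
```

Repeating the call with different seeds yields the independent trials over which average performance is computed.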

We used Keras as our front-end programming framework, and Tensorflow [29] as the underlying deep learning engine. For the other training parameter settings, we fixed the mini batch size in equation (6) and set the regularization weight to 1e-3. We used the Adam optimization algorithm [30] to minimize equation (6) with learning rate=1e-4 and decay=1e-5.

We ran our program on a server equipped with two Nvidia Tesla K20C GPU cards. The server has two 12-core E5-2620 CPUs with 64GB memory. One training epoch took 11 seconds under the above settings.

4.1 Baselines setting and bet policies

For results evaluation in terms of valuable predictions, we set up five baselines. The first is the gain based on the random guess strategy, which is the gain of betting on a random selection among home win, draw, or guest win. The second is the gain based on the min-Odds guess strategy, which is the gain of betting on the outcome with minimum odds. According to [31], the probability of a certain outcome is roughly the reciprocal of its odds. Thus, min-Odds guess is considered a naive strategy that picks the outcome seemingly most likely to happen according to the static pre-game odds. Similarly, the third and fourth baselines are the gains of the mid-Odds guess and max-Odds guess strategies, which bet on the outcome with the middle and the maximum odds respectively. It seems that random guess and min-Odds guess are rational choices if no other information is available for making a decision. The last is the best gain, which is the gain of betting correctly on the outcome of every match. The best gain is of course the ceiling line, which is never touched by any prediction.

For each test set we used three policies based on the prediction probabilities to evaluate the gains of our trained models. One-bet policy (1-bet for short) means that we bet $1 on one of the three outcomes, home win, draw or guest win, by selecting the max probability of the prediction. Split-bet policy (s-bet for short) means that we split $1 over three bets according to the three prediction probabilities of the three outcomes. In fact, s-bet can be thought of as a hedge policy: whatever outcome occurs, it never returns nothing and never returns the highest possible amount. It is easy to see that our loss function defined in equation (6) favors 1-bet. The s-bet results are included here just for evaluation. The third bet policy is called dark horse policy (d-bet for short), in which we ignore those matches where our model's prediction agreed with the min-Odds guess, and then apply 1-bet to the rest of the matches.
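The three bet policies can be sketched for a single match as follows. The odds and probabilities are the Man. Utd vs Fulham values reported in Table 4; the index mapping (0 = home win, 1 = draw, 2 = guest win), the function names, and the convention of returning `None` for a match skipped by d-bet are assumptions of this example.

```python
def one_bet(probs, odds, outcome):
    """1-bet: stake $1 on the outcome with the highest predicted probability."""
    pick = max(range(3), key=lambda k: probs[k])
    return odds[pick] if pick == outcome else 0.0

def split_bet(probs, odds, outcome):
    """s-bet: split $1 over the three outcomes in proportion to the probabilities."""
    return probs[outcome] * odds[outcome]

def dark_horse_bet(probs, odds, outcome):
    """d-bet: skip the match (None) when the model agrees with the min-odds pick."""
    pick = max(range(3), key=lambda k: probs[k])
    favourite = min(range(3), key=lambda k: odds[k])
    if pick == favourite:
        return None
    return odds[pick] if pick == outcome else 0.0

odds = (1.17, 9.8, 22.0)
probs = (0.4075, 0.5777, 0.0147)
outcome = 1   # the match ended in a 2-2 draw

print(one_bet(probs, odds, outcome))
print(split_bet(probs, odds, outcome))
print(dark_horse_bet(probs, odds, outcome))
```

For this match 1-bet and d-bet both return the full draw odds, while s-bet returns only the share of the stake that was placed on the draw.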

4.2 Computational results

We performed the following evaluations based on 87 random trials, and ensured that every match had a chance to be in the test set at least once.

It is easy to accept that the ideal expectation of the random guess gain is close to 1. The interesting thing is that the min-Odds, mid-Odds and max-Odds gains are all close to 1, and the best gain is close to 3 [32]. Check Figure 1 for the empirical results.

Since the reciprocal of the odds can be interpreted as the probability of an outcome [31], for the ideal fair condition the expected gain of a one-dollar bet is

$$\frac{1}{\mathrm{odds}_{o}} \cdot \mathrm{odds}_{o} = 1$$

where $o$ is the outcome choice of the baseline. The accuracy of the min-Odds guess depends on the ratio of dark horses over all matches. It is natural that the probability of dark horses is less than that of hot horses. So the accuracy of the min-Odds guess is always a little above 50%.
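The claim that every fixed-choice baseline has an expected gain near 1 while the best gain is near 3 follows directly from treating the reciprocal of the odds as the outcome probability. The sketch below checks this for one set of hypothetical probabilities under ideally fair odds, with no bookmaker margin; the function names are illustrative.

```python
def fair_odds(probs):
    """Odds under the ideal fair condition: the reciprocal of each probability."""
    return [1.0 / p for p in probs]

def expected_gain_fixed(probs, odds, pick):
    """Expected return of always staking $1 on outcome `pick`."""
    return probs[pick] * odds[pick]

def expected_gain_best(probs, odds):
    """Expected return of a bettor who always picks the true outcome."""
    return sum(p * o for p, o in zip(probs, odds))

probs = [0.6, 0.25, 0.15]    # hypothetical true probabilities of the three outcomes
odds = fair_odds(probs)

fixed = [expected_gain_fixed(probs, odds, k) for k in range(3)]
best = expected_gain_best(probs, odds)
print(fixed, best)
```

Whichever outcome a fixed strategy picks, probability and odds cancel, so min-Odds, mid-Odds and max-Odds guesses differ in accuracy but not in expected gain.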

Figure 1 showed the overall performance comparison between our model and baselines.

Figure 1: Overall performance comparing with the baselines

The first three boxes in Figure 1 showed our model's better performance over the baselines. The average gains of d-bet and 1-bet were close to 1.07. We were very pleased that the average gains of our model were superior to the first quartile of two of the baseline gains. 1-bet delivered the most solid performance among all the gains, whereas d-bet took risks for extreme returns. The following investigation was therefore based on 1-bet performance.

Interestingly, such gains were obtained with prediction accuracies below 50%. Figure 2 showed the relationship between gain and prediction accuracies.

Figure 2: Sorted 1-bet gains vs prediction accuracies

The average total prediction accuracy was about 39%, and the average dark horse accuracy was about 34%. The slope of dark horse accuracy was slightly steeper than that of total prediction accuracy, which suggested that dark horse accuracy contributed more to gain than the total prediction accuracy.

Figure 3: Sorted 1-bet gains vs horse distribution

Considering the fact that the random guess gain is almost equal to the min-Odds gain despite the prediction accuracy varying from 33% to 53%, shown in Figure 3, it is understandable that our model had a chance to get better gains with an accuracy below 50%, because we captured more dark horses and dropped more hot horses than these baselines did.

The importance of detecting dark horse correctly was illustrated in Figure 4, where we adopted the more aggressive bet policy, d-bet.

Figure 4: Sorted d-bet gains vs prediction accuracy

Figure 4 showed that d-bet gains were better than 1-bet gains even with an average prediction accuracy as low as 26%. As expected, d-bet was less stable than 1-bet, as can be seen from Box 1 and 2 in Figure 1.

We further wanted to know why our model had different accuracies of detecting dark horses. Figure 3 showed how 1-bet gains related to the horse distribution.

First, Figure 3 showed that the distributions of the three types of horses in each test set were almost uniform across trials. This assured us of a uniform partition between the training set and the test set. Second, the three distributions were also the prediction accuracies of three baselines: the hot horse distribution for the accuracy of the min-Odds guess, the big dark horse distribution for the max-Odds guess, and the middle horse distribution for the mid-Odds guess. Of course the prediction accuracy of the random guess was 33%. We want to emphasize that despite the large difference in these accuracies, their average gains were not very different. This also showed that predicting a hot horse correctly was easier but less valuable than predicting a dark horse correctly. Third, the linear fit of the hot horse distribution in Figure 3 showed a slight decrease, which reflected an overall increase of the dark horse distribution. This explained the increasing trend of dark horse accuracy in Figure 2. It also suggested that the ability of our model to detect dark horses was not random but stable.

Finally, we wanted to investigate whether the higher 1-bet gains benefited from longer training. Figure 5 showed convergent epochs of all the experiment trials.

Figure 5: Sorted 1-bet gains vs convergent epochs

The linear fit of epochs in Figure 5 was independent of the increase of 1-bet gains. This suggested that our training was basically stable, as was the model's learning ability.

4.3 Discovering indicators

In this section, we tried to exploit possible indicators of dark horses discovered by our learning model. The best indicator must be simple and suited for indicating as many dark horses as possible. Unfortunately, the accuracy of our model for detecting dark horses was about 34%. It does not seem practical to discover ideal consistent indicators for all dark horses. We turned to finding possible indicators that differentiate the darkest from the hottest horses. The following analysis was based on a typical trial.

We started from the features learned by the CNN and RNN blocks, which automatically extracted 9 features for the MLP to construct a classifier according to equation (5). First, we analysed the 9 distributions of these features' values. We failed to find any distribution of a specific feature that separated dark horses from hot horses. However, when we depicted the pattern of these 9 features for the top 3 darkest and hottest horses in Figure 6, a differentiating pattern was revealed.

Figure 6: Feature patterns learned from RNN block for top 3 hottest and darkest horses.

The pattern jointly determined by RNN features 0, 3, 5, 6 and 7 was clearly different between dark and hot horses in Figure 6. The darkest horse was a Premier League match played on February 09, 2014. The home team was Manchester United and the guest was Fulham. The final outcome of this match was 2-2, a surprising draw. The match proceeded as follows. The first goal was scored in the 19th minute of the first half by Fulham's Steve Sidwell. The second and third goals came in the 78th and 80th minutes, in the second half, from Manchester United's Robin Van Persie and Michael Carrick. In the last minute of the match, Darren Bent of Fulham scored the equaliser.

Our model's prediction and the pre-game odds are summarized in Table 4.

                  home win   draw     guest win   gain
odds              1.17       9.8      22.0        null
min-Odds guess    0.8547     0.1020   0.0455      0
1-bet             0.4075     0.5777   0.0147      9.8

The last two rows give the prediction probability on each outcome.

Table 4: Comparison between min-Odds guess and 1-bet predictions on game Man. Utd vs Fulham

Our model gave a probability of 0.5777 to the draw outcome, the highest among the three predictions but only a little higher than that of the home win. We tried to find how our model made this decision.

We began with visualizing the input feature of the match. We have 20 features at each timestamp for each of the three possible outcomes, which were previously illustrated in Table 1 and Table 2. When BuyVolAvg increases, it indicates that the market feels the outcome more likely to occur eventually. In contrast with Buy group related features, the trends of the Sell group related curves have the exact opposite meaning. When BackSubmitted related curves rise, it indicates that the market feels the outcome less likely to occur, and BackCancelled related curves indicate the opposite market morale. In addition, the trends of Lay group related curves have the exact opposite meaning with their Back counterparts. Here we visualized 6 typical features for each outcome in Figure 7.

Figure 7: Some typical normalized input features. The top subfigure is for outcome, the middle for and the bottom for .

For clarity, we show only the last 400 seconds of the curves. The mapping from feature names to the z-axis of Figure 7 is described in Table 5.

Feature name          z-axis   favorability of market
BuyVolAvg             0        more
SellVolAvg            1        less
BackSubmittedVolAvg   2        less
SellSubmittedVolAvg   3        more
BackCancelledVolAvg   4        more
SellCancelledVolAvg   5        less
Table 5: Feature names related to z-axis in Figure 7

The last column of Table 5 indicates the favorability of the market if the corresponding feature value is going up. Figure 7 clearly demonstrated that the hottest horse had a totally different input pattern from the darkest horse. It is understandable that the curves of the hottest horse usually do not change as much as those of the darkest horse. Besides the frequent changes, there are always inconsistent trends for dark horses. That is why a dark horse is more difficult to identify than a hot horse. For example, let us check the feature curves in Figure 7. The curves of Feature 0 and 1 for one outcome showed that the market disfavored it, which was consistent with the market's favor of the other two outcomes. But for those two outcomes, Feature 0 and 1 of one were increasing simultaneously, which was a conflict, while the two curves of the other were consistent and showed that the market began to favor this outcome just before the match opening. Moreover, the trends of Feature 0 did not agree across outcomes.

Now let us check what our model learned from such complicated curves.

Figure 8: CNN block features learned by our model. The top subfigure is from the 1st CNN layer, The middle from the 2nd CNN layer, and the bottom from the 3rd CNN layer.

Figure 8 showed features learned from CNN block by our model. For the sake of simplicity, we only showed the last 40 points in Figure 8, and only Feature 0 and 8 in the top subfigure.

in equation (3

) was designed to encode 60-dimensional input space into 9-dimensional space. And such encoding was done by a non linear transformation in order to map the sparse input space into dense feature space. It made sense that each feature in

in Figure 8 was trying hard to re-represent the input feature.

in equation (4) was designed to capture the local features related to time spot. We did not share the parameters of the convolution operand along with the time axis. Instead, we let each time spot have an independent convolution operand. This gave each time spot a chance to embed different local patterns. showing in Figure 8 clearly illustrated that even the hottest horse had rich local feature patterns. Comparing with the relatively stable curves in Figure 7, features of the hottest horse were not so invariant as of the darkest horse.

The third CNN layer was designed to capture the global curve trend as the representative feature of the whole sequence. Consistent with common intuition, two patterns were revealed in the bottom subfigure of Figure 8. First, the farther a point was from the start of the match, the more stable the feature values were. Second, the feature curves of the dark horse exhibited more fluctuation.

The extracted sequential features were then fed to the RNN block to mine the stable features shown in Figure 6. Finally, based on the feature pattern, the MLP block gave a probability of 0.5777 to outcome for the darkest horse in Table 4, a not-so-low probability of 0.4075 to outcome , and a negligible probability to .

We further show the feature patterns found by the RNN block for the top 50 hottest and darkest horses in Figure 9.
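As a quick arithmetic check on the reported probabilities, the sketch below assumes that the MLP block ends in a softmax, so that the three outcome probabilities sum to one. That assumption is not stated in this section, and the variable names are placeholders.

```python
# Reported probabilities for the darkest horse (values from the text).
p_top = 0.5777
p_second = 0.4075

# Assumption: the three outcome probabilities sum to 1 (softmax output).
p_third = 1.0 - p_top - p_second

print(round(p_third, 4))  # 0.0148
```

Under that assumption the third outcome receives about 1.5% of the probability mass, while the gap between the top two outcomes is about 17 percentage points.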

Figure 9: Feature patterns learned from RNN block for top 50 hottest and darkest horses

It seemed that the patterns determined by Feature 3, 5, 6 and 7 differed between hot and dark horses. Considering that these patterns would be further classified by the MLP block, it was acceptable that no clearly distinguishable boundary could be found between them.
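One simple way to make such a visual comparison more systematic is to rank features by the gap between the two group means. The sketch below is only an illustration on made-up toy vectors with three features; it is not the authors' analysis, and `feature_gaps` is a hypothetical helper.

```python
from statistics import mean


def feature_gaps(group_a, group_b):
    """Per-feature absolute difference between the two group means."""
    n_features = len(group_a[0])
    gaps = []
    for j in range(n_features):
        mean_a = mean(row[j] for row in group_a)
        mean_b = mean(row[j] for row in group_b)
        gaps.append(abs(mean_a - mean_b))
    return gaps


# Toy RNN feature vectors (invented values, one row per match).
hot = [[0.1, 0.8, 0.5], [0.3, 0.6, 0.5]]
dark = [[0.2, 0.1, 0.5], [0.2, 0.3, 0.5]]

gaps = feature_gaps(hot, dark)
# Feature indices ordered from most to least separating.
ranked = sorted(range(len(gaps)), key=lambda j: gaps[j], reverse=True)
print(ranked[0], round(gaps[ranked[0]], 3))  # 1 0.5
```

A large mean gap only suggests that a feature separates the two groups on average. As the paragraph notes, the final decision is left to the MLP block, so overlapping patterns are still expected.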

5 Discussion

This paper built a learning model for detecting dark horse patterns from sequential trading data. We obtained good gains despite a prediction accuracy of less than 50%. A further interesting insight was the possible patterns that differentiate dark horses from hot horses. We developed an analytic framework to identify indicators of dark horses in soccer games.

The relationship between input and output appeared to be conceptually correct in this study. However, because of the nonlinear transformations in the three layers of our model, the quantitative relationship is hard to identify clearly and simply. In fact, deep learning models have always been criticized for their “black box” magic. Since applying convolutional neural networks to computer vision tasks has successfully yielded interpretable visual features, we hope that similar success might be possible in the domain of identifying dark horses. How to map the mined features to semantic meanings that humans can understand will always be a challenge in the machine learning domain.

Since the data flow of a prediction market is much like that of stock markets, we hope our approach might be broadly applied to similar research.

References

  • [1] R. Stefani. Football and basketball predictions using least squares. IEEE Transactions on Systems, Man and Cybernetics, 7(2):117–121, 1977.
  • [2] Jan Van Haaren, Albrecht Zimmermann, Joris Renkens, Guy Van Den Broeck, Tim Op De Beéck, Wannes Meert, and Jesse Davis. Machine learning and data mining for sports analytics. In ACM SIGKDD 2017 Workshop, Skopje, Macedonia, September 2017. ACM.
  • [3] Sky Sports. How are the sky sports power rankings calculated, January 2017.
  • [4] M. Hughes and I. Franks. The Essentials of Performance analysis: An introduction. Routledge, 2007.
  • [5] Sega. Web Service, 2017.
  • [6] Jan Van Haaren and Guy Van den Broeck. Relational learning for football-related predictions. In Latest Advances in Inductive Logic Programming, Windsor Great Park, United Kingdom, 31 July - 3 August 2011. Twenty-First International Conference on Inductive Logic Programming.

  • [7] Arpad Elo. The Rating of Chess Players, Past and Present. Ishi Press, May 2008.
  • [8] Jan Lasek, Zoltán Szlávik, Marek Gagolewski, and Sandjai Bhulai. How to improve a team’s position in the FIFA ranking? A simulation study. Journal of Applied Statistics, 43(7):1349–1368, 2016.
  • [9] Jan Lasek. Euro 2016 predictions using team rating systems. In Machine Learning and Data Mining for Sports Analytics, ECML/PKDD 2016 workshop. Riva del Garda, Italy, 2016.
  • [10] Bing. Web Service, 2017.
  • [11] Betfair. Web Service, 2017.
  • [12] Steven D. Levitt. Why are gambling markets organised so differently from financial markets? The Economic Journal, 114(495):223–246, 2004.
  • [13] Karen Croxson and J. James Reade. Information and efficiency: Goal arrival in soccer betting. The Economic Journal, 124(575):62–91, March 2014.
  • [14] Michael A. Smith, David Paton, and Leighton Vaughan Williams. Do bookmakers possess superior skills to bettors in predicting outcomes? Journal of Economic Behavior & Organization, 71(2):539–549, 2009.
  • [15] M.J. Dixon and S.G. Coles. Modelling association football scores and inefficiencies in the football betting market. Journal of the Royal Statistical Society: Series C (Applied Statistics), 46(2):265–280, 1997.
  • [16] Egon Franck, Erwin Verbeek, and Stephan Nüesch. Sentimental preferences and the organizational regime of betting markets. Southern Economic Journal, 78(2):502–518, 2011.
  • [17] Christoph Leitner, Achim Zeileis, and Kurt Hornik. Bookmaker consensus and agreement for the UEFA Champions League 2008/2009. IMA Journal of Management Mathematics, 22(2):183–194, 2011.
  • [18] Erik Strumbelj. On determining probability forecasts from betting odds. International Journal of Forecasting, 30(4):934–943, 2014.
  • [19] Qiang Lyu and Liyao Lu. Detecting dark horse of soccer games by deep learning bids data flow. Unpublished paper, Jan. 2018.
  • [20] Fracsoft. Web Service, 2015.
  • [21] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, Nov 1998.
  • [22] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
  • [23] Frank Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386–408, 1958.
  • [24] Min Lin, Qiang Chen, and Shuicheng Yan. Network in network. CoRR, abs/1312.4400, 2013.
  • [25] Vinod Nair and Geoffrey E. Hinton. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning, ICML’10, pages 807–814, USA, 2010. Omnipress.
  • [26] KyungHyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, and Yoshua Bengio. On the properties of neural machine translation: Encoder-decoder approaches. CoRR, abs/1409.1259, 2014.
  • [27] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Nature, 323:533, 1986.
  • [28] François Chollet. Web Service, 2017.
  • [29] Martin Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. TensorFlow: a system for large-scale machine learning. In USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 265–283, 2016.
  • [30] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.
  • [31] J. Martin Bland and Douglas G. Altman. The odds ratio. BMJ, 320(7247):1468, 2000.
  • [32] Dominic Cortis. Expected values and variances in bookmaker payouts: A theoretical approach towards setting limits on odds. The Journal of Prediction Markets, 9(1):1–14, 2015.