The aim of a recommender system is to suggest to users items that might interest them. Recommendation systems are commonly found in e-commerce [20, 18] (where users purchase goods like books, clothes or games online) and are usually implemented through collaborative filtering methods. These work by comparing similar items or similar users based on user ratings. If two users like the same items they are likely similar, and if two items are liked by the same users, those items are probably similar as well. However, as this method does not take the item contents into account, new items that have not yet been rated cannot be recommended. Content-based recommenders can overcome some of these issues by looking at the item in question and finding similarities between items based on their inherent properties. A hybrid approach can also be taken, combining e.g. collaborative information, content features and demographics. A more detailed study of the current limitations and possible extensions of recommendation systems can be found in .
The integration of recommendation systems into video games is a relatively new area of research. Previous work has mostly focused on game recommendation engines, which present players with suggestions on alternative titles based on the games they have already played [2, 22]. But it is also possible to use recommendation systems to increase player engagement in a game. In modern free-to-play games, users can buy a wide range of virtual items with real money (in-app purchases, IAPs). However, they can sometimes be overwhelmed by the number of items offered and the diversity of playstyles, and this can lead to an increase in the churn rate, as players start to find the contents too difficult and are unable to progress within the game. Item recommendation systems can help prevent this problem by offering players a more direct route to the items that could be appealing or useful for them, thereby improving their purchasing and general in-game experience. This may ultimately result in increased revenue by increasing player retention, IAPs and the conversion rate from free to paying users.
To achieve these goals, it is essential to recommend each player the right item—one that fits both their current state and their playing behavior—at the right time. And this is possible because (in contrast to other applications where very limited information is available) every action performed by a player within the game gets recorded. This offers a unique opportunity not only to obtain accurate predictions on the player’s in-game behaviour (for example on when and at what level they will leave the game, see  and ) but also to offer them personalized recommendations of items that are likely relevant to them.
There are previous papers related to item recommendation systems. One such work introduces a recommendation system for the massively multiplayer online first-person shooter game Destiny, where players get suggestions on those items that best fit their play style and might improve their performance. The authors apply similarity measures to global descriptors like total kill count or kill/death ratio. Clusters for the player “base” and “cooldown” stats were derived through k-means clustering, whereas archetypal analysis [7, 21] (which clusters by extreme values rather than centroids) was used to find distinct playstyles. Similar analyses were done for the massively multiplayer online role-playing game Tera and the multiplayer strategy game Battlefield 2: Bad Company 2, as well as for the game Tomb Raider: Underworld. In all these cases, players were clustered by their playing behaviour; although no recommendation system was built, behavioral profiling via clustering may be very useful in offering recommendations based on similarity between users.
However, applying unsupervised clustering methods remains a challenge. In particular, a significant amount of game-specific knowledge is required to find adequate features that can separate players into the right number of clusters.
While there are several approaches to the problem of developing recommendation systems, here we will explore a different avenue: our aim is to provide a method that predicts the next items a player will purchase, and use this information to recommend them other items. This approach differs from traditional methods as we explicitly use a predictive model.
Such a model allows us to predict, both for new and existing users, the items they are likely to find most appealing based on their playing behaviour. Additionally, it must be robust enough for operational implementation, so that it can recommend game products automatically across a variety of game genres, i.e. across different game data distributions.
II-A Extremely Randomized Trees
Extremely randomized trees (ERTs) extend the randomization of the original random forest [13, 6] algorithms by choosing the splitting points randomly instead of computing the ones that are most correlated with the output (which makes random forests prone to bias). ERTs are computationally efficient, and their additional randomization reduces the variance of the model and helps prevent overfitting. However, when the randomization is increased above the optimal level, the bias can become larger: this is the price paid for the decrease in the variance.
Breiman's implementation of random forests builds an ensemble of decision trees, each of which is fit on a bootstrap sample of the data and considers only a random subset of features when splitting. This randomization in the feature selection, combined with the bagging of multiple decision trees, reduces the correlation between trees and increases the overall accuracy of the ensemble.
One of the main advantages of ensemble models is that they are trivially parallelizable, either using multicore processors (as each tree could potentially be trained on a single core) or across multiple machines. This makes them more practical in operational settings, where training and inference have to be completed in a relatively short time, and thus better suited for developing a commercial recommendation system.
II-B Deep Neural Networks
Deep neural networks (DNNs) are artificial neural networks with multiple hidden layers. By using nonlinear activation functions (the functions that transform the output at each layer before passing it to the next), DNNs are able to learn highly nonlinear dynamics. Multiple passes over the training data, i.e. epochs, are run to optimize the DNN during the learning process. Rectified linear units (ReLU) are among the most commonly used activation functions nowadays. DNNs that combine ReLU with dropout (a strategy that consists of randomly dropping some of the units at each layer) have been shown to provide state-of-the-art accuracy in domains such as image classification or speech recognition. Recurrent neural networks (RNNs), and in particular long short-term memory (LSTM) networks [14], have achieved similarly high accuracies in sequence prediction and language modeling.
III Item Recommendation Model
While RNNs and LSTM networks are able to learn temporal dependencies and eliminate the need for manual feature engineering, they also slow down the training significantly, as they have to learn the relevant features of the time series that lead to an increase in prediction accuracy. On the other hand, by manually calculating general statistics of the time-series data together with other descriptors one can efficiently create a single vector describing the player’s behavior and use it in nontemporal models like DNNs or ensemble-based methods such as ERTs.
These are the main challenges related to our approach:
- The model should be able to train and provide inference in production environments, scaling to millions of users.
- It should be trainable on mini-batches so that it fits in memory (ensemble models usually work on the full dataset).
- The time-series data needs to be converted into a single feature vector that accurately represents the player’s behavioural patterns (as commented above, tree ensembles and DNNs use static feature vectors, not time series).
- As players make multiple purchases over their lifetime in the game, we must extract their next purchase from multiple time points. Thus the training dataset may become huge if e.g. players remain in the game for several years.
The following sections elaborate on the dataset used and on the way the model was constructed to solve these challenges.
The data used in our analysis comes from the Japanese card-game Age Of Ishtaria, developed by Silicon Studio, and contains daily time-series data for each paying user within the period from 2014-09-24 to 2017-05-08 (totaling 33,488 players). It contains information on the number of purchases per item and total sales per item for each user. Players can purchase in-game currency with real money and use it to buy different card-packs (known as gacha) containing a random set of cards that can be employed in the game. The data contains 8 different types of items and also has information on e.g. the player’s daily level progression, playtime and lifetime.
III-B Feature calculation
To convert our time-series into a single static vector we calculate general statistics over the full time-series data for each of the temporal features (e.g. daily playtime or sales). The process is as follows: First we compute the derivative of the time series in order to get its variations (for instance, if we are tracking total level, the derivative gives us the number of level-ups per day). Then we calculate the mean/variance/skew/kurtosis/maximum over the time series for each of the temporal features. Additionally, to capture behavioral changes of the player between the beginning and end of their lifetime, we also compute the distance for all temporal features over the first and last days in which they logged in. Finally, all these features get concatenated into our final feature vector. By using such a method, the feature calculation can be generalized to any type of temporal data.
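The feature calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the statistics are computed on the differenced series, the "distance" between the beginning and end of the lifetime is taken to be the difference between the mean of the last and first `edge_days` days, and the feature names (`total_level`, `total_playtime`) and the helper names are hypothetical.

```python
import statistics

def diff(series):
    """Day-over-day differences (a discrete derivative) of a daily series."""
    return [b - a for a, b in zip(series, series[1:])]

def central_moment(values, mean, order):
    """Population central moment of the given order."""
    return sum((v - mean) ** order for v in values) / len(values)

def summarize(values):
    """Mean, variance, skewness, excess kurtosis and maximum of a series."""
    mean = statistics.fmean(values)
    var = central_moment(values, mean, 2)
    if var == 0:
        skew, kurt = 0.0, 0.0  # moments are undefined for a constant series
    else:
        skew = central_moment(values, mean, 3) / var ** 1.5
        kurt = central_moment(values, mean, 4) / var ** 2 - 3.0
    return [mean, var, skew, kurt, max(values)]

def feature_vector(player_series, edge_days=3):
    """Concatenate the summary statistics of every temporal feature.

    player_series maps a (hypothetical) feature name to its daily values.
    """
    features = []
    for name in sorted(player_series):  # fixed order keeps vectors comparable
        daily = diff(player_series[name])
        features.extend(summarize(daily))
        # Assumed "distance": mean of the last days minus mean of the first days.
        first = statistics.fmean(daily[:edge_days])
        last = statistics.fmean(daily[-edge_days:])
        features.append(last - first)
    return features

player = {
    "total_level": [1, 2, 4, 4, 5, 9, 10],
    "total_playtime": [10, 30, 35, 60, 60, 90, 130],
}
vector = feature_vector(player)
```

Because every temporal feature is reduced to the same six numbers, the resulting vector has a fixed length regardless of how long the player has been active, which is what allows it to be fed to nontemporal models.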
III-C Sampling to handle multi-label outputs
Players usually make multiple purchases, which means we can have multiple prediction targets (multiple labels) per user. One way of dealing with this is to take a subsample of each player’s time series up to some cutoff time and then find their next purchase after that cutoff. This results in a single label we can train on, and allows us to take multiple subsamples to enlarge our training set. Since players could be playing for several years and have hundreds or even thousands of days of playing activity, by using subsampling we can generate different training samples for each player, increasing our effective training dataset and reducing overfitting.
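The subsampling idea can be illustrated with a short sketch. It assumes purchases are stored as `(day, item)` pairs sorted by day and that cutoffs are drawn uniformly at random; both the data layout and the function names are hypothetical, and the source does not specify how cutoffs are chosen.

```python
import random

def next_purchase_label(purchases, cutoff):
    """Item bought on the first purchase after `cutoff`, or None if there is none.

    purchases: list of (day, item) pairs sorted by day.
    """
    for day, item in purchases:
        if day > cutoff:
            return item
    return None

def sample_training_pairs(purchases, n_samples, rng):
    """Draw random cutoffs; each one yields a (cutoff, label) training pair.

    The history up to `cutoff` would be turned into a feature vector, and the
    label is the next purchase after it.
    """
    last_purchase_day = purchases[-1][0]
    pairs = []
    for _ in range(n_samples):
        # Cutoffs stop before the last purchase so that a label always exists.
        cutoff = rng.randrange(1, last_purchase_day)
        pairs.append((cutoff, next_purchase_label(purchases, cutoff)))
    return pairs

purchases = [(3, "gacha_a"), (10, "gacha_b"), (25, "gacha_a")]
pairs = sample_training_pairs(purchases, n_samples=50, rng=random.Random(0))
```

Each player with a long history can thus contribute many distinct training samples, one per cutoff.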
III-D Scalability using minibatches
Additionally, the model should be able to scale to millions of players; however, if we generate very large feature vectors (with thousands of features) and sample multiple labels per user, we could end up with datasets with over a billion samples (a thousand samples per user). An efficient way of coping with such huge data sets is to train an ensemble model on subsamples of the total set. Hence, we can train a small subset of trees (20) on a small sample of a few thousand users and generate the labels directly during training, so that we do not need to store all samples. The final ensemble is formed by combining many such subsets of trees, where each tree was trained on different features, different samples, and different target labels, producing an extremely robust model.
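The following toy sketch shows the general shape of this scheme: a few trees are fit on each mini-batch, the batch is then discarded, and the final ensemble is the union of all trees. To stay self-contained it uses depth-one trees with a randomly chosen feature and threshold as a stand-in for real extremely randomized trees; the data generator, item names and function names are hypothetical, and a production system would rely on a full tree implementation.

```python
import random
from collections import Counter

class RandomStump:
    """Depth-1 tree whose split feature and threshold are chosen at random."""

    def __init__(self, rows, labels, rng):
        self.feature = rng.randrange(len(rows[0]))
        column = [row[self.feature] for row in rows]
        self.threshold = rng.uniform(min(column), max(column))
        # Class counts in the two leaves (True: value <= threshold).
        self.counts = {True: Counter(), False: Counter()}
        for row, label in zip(rows, labels):
            self.counts[row[self.feature] <= self.threshold][label] += 1

    def predict_proba(self, row):
        leaf = self.counts[row[self.feature] <= self.threshold]
        total = sum(leaf.values())
        return {label: n / total for label, n in leaf.items()} if total else {}

def train_in_minibatches(make_batch, n_iterations, trees_per_batch, rng):
    """Grow the ensemble by a few trees per mini-batch; batches are not stored."""
    ensemble = []
    for _ in range(n_iterations):
        rows, labels = make_batch()  # features and labels generated on the fly
        ensemble.extend(RandomStump(rows, labels, rng) for _ in range(trees_per_batch))
    return ensemble

def predict_proba(ensemble, row, items):
    """Average the per-tree class probabilities over the whole ensemble."""
    totals = dict.fromkeys(items, 0.0)
    for tree in ensemble:
        for label, p in tree.predict_proba(row).items():
            totals[label] += p
    norm = sum(totals.values()) or 1.0
    return {item: totals[item] / norm for item in items}

rng = random.Random(0)
ITEMS = ["gacha_a", "gacha_b"]

def make_batch(size=200):
    """Synthetic players: those with high playtime buy gacha_a."""
    rows, labels = [], []
    for _ in range(size):
        playtime, level_ups = rng.random(), rng.random()
        rows.append([playtime, level_ups])
        labels.append("gacha_a" if playtime > 0.5 else "gacha_b")
    return rows, labels

ensemble = train_in_minibatches(make_batch, n_iterations=30, trees_per_batch=20, rng=rng)
probs = predict_proba(ensemble, [0.9, 0.5], ITEMS)
```

Only one mini-batch is held in memory at a time, so memory use is bounded by the batch size rather than by the total number of generated samples.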
III-E Model Specification
For each player and item, we generate the probability that they will buy that item on their next purchase day. As the model is trained over all players, once players are in a similar state the model can learn to predict and recommend the right item at the right time for each individual player.
We take the full time-series patterns for each user to convert them into a single vector that represents their playing behavior. This conversion is done for all users in a single mini-batch. Multiple mini-batches are generated per epoch (one epoch goes over the entire dataset), and the model is trained on each of these batches.
The ERT model was trained on subsets of 20 trees for 30 iterations, resulting in a total ensemble size of 600 trees. Each iteration was performed on a subset of 10k users, which means that a full epoch was completed after roughly 3 iterations (as the total set has 33,488 players); training therefore covered approximately 10 epochs.
For the DNN model, we used two hidden layers of 2048 units and set a dropout probability of 0.5. Additionally, as there were many correlating features, dropout was also applied to the input layer. By randomly dropping some inputs, we reduce overfitting on single features, thereby increasing the robustness of the model. (Recall this was achieved by random subsampling of features in the ERT model.) The network was trained for 30 iterations as well, but each iteration was repeated 5 times, resulting in a total of 50 epochs. Both DNN and ERT are trained on the same data.
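As an illustration of dropout on the input layer, the sketch below zeroes each input feature at random during training. It uses the common "inverted dropout" variant, which rescales the surviving inputs so that their expected value is unchanged; the source does not state which variant or which input dropout rate was used, so the rate of 0.2 and the function name are assumptions.

```python
import random

def input_dropout(features, drop_prob, rng, training=True):
    """Inverted dropout on an input vector.

    Each feature is zeroed with probability drop_prob; survivors are divided
    by the keep probability. At inference time the input passes through as is.
    """
    if not training or drop_prob == 0.0:
        return list(features)
    keep_prob = 1.0 - drop_prob
    return [x / keep_prob if rng.random() < keep_prob else 0.0 for x in features]

rng = random.Random(42)
features = [1.0, 2.0, 3.0, 4.0]
noisy = input_dropout(features, drop_prob=0.2, rng=rng)                  # training
clean = input_dropout(features, drop_prob=0.2, rng=rng, training=False)  # inference
```

Since a different random subset of inputs is hidden on every pass, the network cannot rely on any single one of a group of correlated features.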
IV Model Evaluation
In order to evaluate the effectiveness of the proposed model, we study the prediction accuracy within an upcoming time window. Predictions are made at a given time point and evaluated over a subsequent window whose length is measured in days. The training was performed using data up to 2017-03-19, and predictions were verified in the window from 2017-03-20 to 2017-05-08.
Several measures are calculated:
1) isOnNextPurchaseDate: Checks whether the predicted item was actually acquired by the player throughout their next purchase day (our training objective).
2) isNextPurchase: Checks whether the item that was predicted to be purchased by a certain player was actually acquired by the player on their very next purchase.
3) isWithinWindow: Checks whether the predicted item was actually acquired by the player at some point within the time window considered (between the prediction time and the end of the evaluation window).
For all three measures, the accuracy for the top (predictedMax), top 2 (withinTop2) and top 3 (withinTop3) predicted items is calculated, i.e. we check whether the player actually purchased the item that had the highest probability, any of the two items with the two highest probabilities or any of the three items with the three highest probabilities, as per the prediction.
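A minimal sketch of how the top-1, top-2 and top-3 accuracies can be computed is given below. The data layout (one dictionary of predicted probabilities and one set of purchased items per player) and the function names are assumptions made for illustration; the set of purchased items would be built differently for each of the three measures.

```python
def top_k_items(probabilities, k):
    """Items with the k highest predicted purchase probabilities."""
    return sorted(probabilities, key=probabilities.get, reverse=True)[:k]

def top_k_accuracy(predictions, purchased, k):
    """Fraction of players who bought at least one of their top-k predicted items.

    predictions: player -> {item: probability}
    purchased:   player -> set of items actually bought in the period considered
    """
    hits = sum(
        1
        for player, probs in predictions.items()
        if set(top_k_items(probs, k)) & purchased[player]
    )
    return hits / len(predictions)

predictions = {
    "p1": {"a": 0.6, "b": 0.3, "c": 0.1},
    "p2": {"a": 0.2, "b": 0.5, "c": 0.3},
    "p3": {"a": 0.1, "b": 0.2, "c": 0.7},
}
purchased = {"p1": {"a"}, "p2": {"c"}, "p3": {"a"}}

predicted_max = top_k_accuracy(predictions, purchased, k=1)  # 1 of 3 players
within_top2 = top_k_accuracy(predictions, purchased, k=2)    # 2 of 3 players
within_top3 = top_k_accuracy(predictions, purchased, k=3)    # 3 of 3 players
```

By construction the accuracy can only grow or stay equal as k increases, since the top-k set always contains the top-(k-1) set.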
Figure 1 shows the predictions for a subset of users. The DNN (left panel) and ERT (right panel) results exhibit similar patterns (with only slight variations). We see that different users have different purchase probabilities for each item, which shows that the models are capable of providing personalized predictions for each player based on their playing behaviour.
The accuracy results for both models can be found in Table I. When considering the top 2 and top 3 predictions, both models present similar accuracies, but the ERT is slightly better at identifying the item with the highest probability of being acquired on the next purchase, for all three measures.
Table I: Accuracy of prediction in the DNN and ERT models.
An item recommendation system for games is essential to provide players with individual rewards or incentives to increase engagement, to maximize in-app purchases and to increase cross-selling and up-selling. We have presented two models to predict which items players will be most inclined to buy in their next purchases. The results show that the predictive performance of the DNN and ERT is similar. However, the ERT model yields slightly better results (as shown in Table I) and also scales up more easily in a production environment.
While predictions were made only for a small set of items, the model can easily be extended to run on hundreds of items, and can be used both for items purchased with real money and for in-game virtual purchases. Future work in this direction will include an evaluation of the recommendation system in terms of total game sales for live video games.
We thank Javier Grande for his careful review of the manuscript and Ana Fernández for her support.
- [1] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.
- [2] S. M. Anwar, T. Shahzad, Z. Sattar, R. Khan, and M. Majid. A game recommender system using collaborative filtering (GAMBIT). In 2017 14th International Bhurban Conference on Applied Sciences and Technology (IBCAST), pages 328–332. IEEE, 2017.
- [3] C. Bauckhage and R. Sifa. k-maxoids clustering. In Proceedings of the LWA 2015 Workshops: KDML, FGWM, IR, and FGDB, pages 133–144, 2015.
- [4] P. Bertens, A. Guitart, and Á. Periáñez. Games and big data: A scalable multi-dimensional churn prediction model. In 2017 IEEE Conference on Computational Intelligence and Games (CIG), pages 33–36. IEEE, 2017.
- [5] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, pages 43–52, San Francisco, 1998. Morgan Kaufmann.
- [6] L. Breiman. Random forests. Machine learning, 45(1):5–32, 2001.
- [7] A. Cutler and L. Breiman. Archetypal analysis. Technometrics, 36(4):338–347, 1994.
- [8] A. Drachen, A. Canossa, and G. N. Yannakakis. Player modeling using self-organization in Tomb Raider: Underworld. In 2009 IEEE Symposium on Computational Intelligence and Games (CIG), pages 1–8. IEEE, 2009.
- [9] A. Drachen, R. Sifa, C. Bauckhage, and C. Thurau. Guns, swords and data: Clustering of player behavior in computer games in the wild. In 2012 IEEE Conference on Computational Intelligence and Games (CIG), pages 163–170. IEEE, 2012.
- [10] P. Geurts, D. Ernst, and L. Wehenkel. Extremely randomized trees. Machine learning, 63(1):3–42, 2006.
- [11] C. A. Gomez-Uribe and N. Hunt. The Netflix recommender system: Algorithms, business value, and innovation. ACM Transactions on Management Information Systems, 6(4):13, 2016.
- [12] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82–97, 2012.
- [13] T. K. Ho. The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8):832–844, 1998.
- [14] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
- [15] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25 (NIPS 2012), pages 1097–1105, 2012.
- [16] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521(7553):436, 2015.
- [17] V. Lehdonvirta. Virtual item sales as a revenue model: identifying attributes that drive purchase decisions. Electronic Commerce Research, 9(1–2):97–113, 2009.
- [18] G. Linden, B. Smith, and J. York. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76–80, 2003.
- [19] Á. Periáñez, A. Saas, A. Guitart, and C. Magne. Churn prediction in mobile social games: towards a complete assessment using survival ensembles. In 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pages 564–573. IEEE, 2016.
- [20] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Analysis of recommendation algorithms for e-commerce. In Proceedings of the 2nd ACM Conference on Electronic Commerce, pages 158–167. ACM, 2000.
- [21] R. Sifa, C. Bauckhage, and A. Drachen. Archetypal game recommender systems. In Proceedings of the 16th LWA Workshops: KDML, IR and FGWM, pages 45–56, 2014.
- [22] R. Sifa, A. Drachen, and C. Bauckhage. Large-scale cross-game player behavior analysis on Steam. In Proceedings of the Eleventh AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE-15), pages 198–204, 2015.
- [23] R. Sifa, E. Pawlakos, K. Zhai, S. Haran, R. Jha, D. Klabjan, and A. Drachen. Controlling the crucible: A novel PvP recommender systems framework for Destiny. In Proceedings of the Australasian Computer Science Week Multiconference, ACSW 2018, 2018.
- [24] A. Van den Oord, S. Dieleman, and B. Schrauwen. Deep content-based music recommendation. In Advances in Neural Information Processing Systems 26 (NIPS 2013), pages 2643–2651, 2013.