I Introduction
In the last few years, we have witnessed a genuine paradigm change in the development of video games [1, 2]. Nowadays, petabytes of player data are available, as every action, inapp purchase, guild conversation or ingame social interaction performed by the players is recorded. This provides data scientists and researchers with plenty of possibilities to construct sophisticated and reliable models to understand and predict player behavior and game dynamics. Game data are timedependent observations, as players are constantly interacting with the game. Therefore, it is paramount to understand and model player actions taking into account this temporal dimension.
Ingame events are drivers of player engagement. They influence player behavior due to their limited duration, strongly contributing to the ingame monetization (for instance, an event that offers a unique reward could serve to trigger inapp purchases). Those events break the monotony of the game, and thus it is essential to have a variety of types, such as battles, rewards, polls, etc. Anticipating the most adequate sequence of events and the right time to present them is a determinant factor to improve monetization and player engagement.
Forecasting time series data is a challenge common to multiple domains, e.g. weather or stock market prediction. Time series research has received significant contributions to improve the accuracy and time horizon of the forecasts, and an assortment of statistical learning and machine learning techniques have been developed or adapted to perform robust time series forecasting
[3, 4].Time series analysis focuses on studying the structure of the relationships between timedependent features, aiming to find mathematical expressions to represent these relationships. Based on them, several outcomes can be forecast [5].
Stochastic simulation [6] optimization consists on finding a local optimum for a response function whose values are not known analytically but can be inferred. On the other hand, the analysis of whatif scenarios (or just simulation optimization) is the process of finding the best inputs among all possibilities that maximize an outcome [7].
The aim of this work is twofold: on the one hand, to accurately forecast time series of ingame sales and playtime; on the other, to simulate events in order to find the best combination of ingame events and the optimal time to publish them. To achieve these goals, we performed an experimental analysis utilizing several techniques such as ARIMA (dynamic regression), gradient boosting, generalized additive models and deep neural networks
[8, 9, 10, 11].Pioneering studies on game data science in the field of video games, such as [12, 13, 14, 15], concentrated in churn prediction. Other related articles that analyze temporal data in the game domain [16, 17, 18, 19, 20] focused on unsupervised clustering, not in supervised time series forecast. To the best of our knowledge this is the first work in which forecasts of time series of game data and simulations of ingame events are performed.
Ii Model Description
Iia Autoregressive Integrated Moving Average (ARIMA)
ARIMA was firstly introduced by Box and Jenkins [21] in 1976, in a book that had a tremendous impact on the forecasting community. Since then, this method has been applied in a large variety of fields, and it remains a robust model, used in multiple operating forecast systems [22]. ARIMA characterizes time series focusing on three aspects: 1) The autoregressive (AR) terms model the past information of the time series, 2) The integrated (I) terms model the differencing needed for the time series to become stationary, e.g. the trend of the time series, 3) The moving average (MA) terms control the past noise of the original time series.
Specifically, the AR part represents the time series as a function of past observations,
(1) 
with the AR coefficients and the number of past observations needed to perform a forecast at the current time. The MA component, rather than focusing on past observations, uses a moving average of past error values in a regression model,
(2) 
where is the number of moving average lags, are the MA coefficients and the errors. Finally, a parameter is used to model the order of differencing, i.e. the I term of ARIMA:
(3) 
Taking is normally enough in most cases [5]. If , the model is reduced to an ARMA model.
The ARIMA model can only analyze stationary time series; however most of the stochastic processes exhibit nonstationary behavior. Only through the differencing operation, or by applying a previous additional transformation to the time series, e.g. a log or Box–Cox [23] transformation, we can convert the time series into stationary objects.
We can write the ARIMA model as
(4) 
where is the lag operator, i.e. . Therefore, we can express the ARIMA model in terms of the , and parameters as
(5) 
Box and Jenkins also generalized the model to recognize the seasonality pattern of the time series by using [21]. The parameters , , represent the number of seasonal autoregressive, differencing and moving average terms. The order of the seasonality (e.g. weekly, monthly, etc.) is set by . In order to determine the ARIMA parameters, the autocorrelation function (ACF) and partial autocorrelation function (PACF) are analyzed [21]
. Once the parameters are fixed, the model is fitted by maximum likelihood and selected (among the different estimated models) based on the Akaike (AIC)
[24] and Schwarz Bayesian (BIC) [25] information criteria.Dynamic Regression
To include the serial dependency of external covariates, a multivariate extension of Equation 5 is proposed, the socalled dynamic regression (DR). The relationship between the output, its lags and the variables is linear. DR takes the form [26]
(6) 
The first term represents the AR and I parts, the last corresponds to the MA component and the middle term is where the external variables are included. The are the corresponding coefficients of the covariates.
IiB Gradient Boosting Models
Gradient boosting machines (GBMs) [27] are ensemblebased machine learning methods capable of solving classification and regression problems[9]
. A GBM consists of an ensemble of weak learners (i.e. learners that perform only slightly better than random classifiers), commonly decision trees. Each weak learner is sequentially added to the ensemble, thus continuously improving its accuracy
[28]. The approach taken by GBMs stems from the idea that boosting (using multiple weak models to create a strong model) can be seen as an optimization algorithm on some suitable loss function
[29]. If this loss function is differentiable, the gradient can be calculated and gradient descent can be used for optimization [9]. This makes it possible to recursively add weak learners that follow the gradient, minimizing the loss function and reducing the error at each step. The construction of an ensemble to fit a desired function is illustrated below (a more indepth analysis can be found in [30]). For each weak learner in the ensemble, we sequentially do the following: First, we compute the negative gradient(7) 
where is the loss function and the current weak model.
Then, we fit a regression model predicting from and apply gradient descent, with step size given by
(8) 
Finally, we update the estimate of through
(9) 
IiC Generalized Additive and Generalized Additive Mixed Models
The generalized additive models (GAMs) derived by [33] are a combination of generalized linear (GLMs) and additive (AMs) models. In this way, GAMs exhibit the properties of both, namely the flexibility to adapt to any distribution of GLMs and the nonparametric nature of AMs.
The structure of a GAM is
(10) 
where is the number of predictors, is the expected value, is a link function, and are smooth functions.
Even though the distribution of the response variable is the same as for GLMs, there is a generalization that allows GAMs to accommodate different kinds of responses, for example binary or continuous. GAMs assume that the means of the predictors
are related to an additive response through a nonlinear link function(such as the identity or logarithm function). The model distribution can be selected from e.g. a Gaussian or Poisson distribution
[10].As additive models, in contrast to parametric regression analysis (which assumes a linear relation between responses and predictors), GAMs serve to explore nonparametric relationships, as they make no assumptions about those relations. Instead of using the sum of the individual effects of the predictors as observations, GAMs employ the sum of their smooth functions, which are called
splines [34] and include a parameter that controls the smoothness of the curve to prevent overfitting. With the link and smooth functions mentioned above, the GAM approach has the flexibility to interpret and regularize both linear and nonlinear effects [35].However, GAMs do not assume correlations between observations (such as the time series temporal correlation in this study). As a consequence, when these are present, another approach would be better suited to perform the forecast. Generalized additive mixed models (GAMMs) [36]
, an additive extension of generalized linear mixed models (GLMMs)
[37], constitute such an approach. GAMMs incorporate random effects and covariate effects (which model the correlations between observations by taking the order of observations into account) into the previous GAM formulation. When estimating the th observation, the structure of a GAMM is(11) 
Smoothness selection can be automatized [38] and the estimation of the GAMM is conducted via a penalized likelihood approach.
IiD Deep Belief Networks
Deep neural networks (DNNs)[39] have been used with great success in recent years, achieving cuttingedge results in a wide range of fields [40]. This method has outperformed alternative models on almost all tasks from image classification[41] to speech recognition[42] and natural language modeling[40].
Deep belief networks (DBNs)[43]
are an extension of DNNs where the units in the hidden layers are binary and stochastic. They are capable of learning a joint probability distribution between the input and the hidden layers and can be used as a generative model to draw samples from the learned distribution. A DBN uses a stack of restricted Boltzman machines (RBMs)
[44], where each RBM is trained on the result of the previous layer as follows:(12) 
with being the weight matrix between the layers and and being the logistic sigmoid.
For supervised learning, a DBN can be used as a way to pretrain a DNN
[45]. Each layer of the RBM is first trained separately, and then the weights in all layers are finetuned through standard backpropagation. A more comprehensive overview of DBNs and DNNs can be found in [43] and [39].There have been previous works on applying DNNs to time series forecasting problems (see e.g. [46] and [11]) and comparisons between ARIMA and artificial neural networks (ANNs) have also been performed [47]. However, when dealing with small datasets, standard DNNs can quickly overfit due to having too many parameters. Careful regularization is required to ensure that the model can be generalized to unseen data. Recent techniques like dropout (which randomly masks some of the hidden units) [48] or stochastic DBNs can significantly help with this problem, as they constrain the number of parameters that can be modified. Additionally, using regularization on the weights[49] and ensuring proper selection of the number of hidden units can make DBNs an effective model even when faced with smaller datasets.
Iii Forecast Performance Metrics
In order to validate the accuracy of forecasting models, several performance measures have been proposed in the literature [50]. The use of various metrics to analyze time series is a common practice. The ones selected in this study possess different properties, which is important to correctly assess the forecasting capabilities of the models from different perspectives (e.g. in terms of the magnitude or direction of the error). Each of the metrics summarized here is a function of the actual time series and forecast results. In all the equations below, refers to the number of observations, denotes the forecast value and represents the actual value.
Iiia Root Mean Squared Logarithmic Error (RMSLE)
The RMSLE can be defined as
(13) 
Because the RMSLE uses a logarithmic scale, it is less sensitive to outliers than the standard root mean square error (RMSE). Additionally, the RMSE has the same tendency to underestimate and overestimate values, whereas the RMSLE penalizes more the underestimated predictions.
IiiB Mean Absolute Scaled Error (MASE)
The MASE is a scale independent metric, i.e. it can be used to compare the relative accuracy of several forecasting models applied to different datasets. To calculate this metric, errors are scaled with the insample mean absolute error (MAE). The exporession for the MASE is
(14) 
IiiC Mean Absolute Percentage Error (MAPE)
The MAPE is estimated by
(15) 
Since it is a percentage error measure of the average absolute error, the MAPE is also scaleindependent. However, if the time series have zero values, the MAPE yields undefined results because of a division by zero [50]. The MAPE is also biased towards underestimated values and does not penalize large errors.
Models  

GBM  ARIMA  DBN  
Dataset  max_depth  eta  p  d  q  P  D  Q  m  h  n  plr  tlr  k  b  
Sales  Age of Ishtaria  100  0.20  2  1  1  1  1  1  7  2  50  0.0001  0.01  5  50 
Sales  Grand Sphere  1  0.76  2  1  1  1  0  1  7  2  300  0.001  0.1  2  10 
Playtime  Age of Ishtaria  1  0.66  2  1  2  1  1  1  7  2  50  0.0001  0.01  2  50 
Playtime  Grand Sphere  1000  0.23  1  1  1  1  1  1  7  2  300  0.0001  0.1  2  50 
Iv Dataset
Iva Data Source
The study presented in this article focuses on the analysis of two different daily time series, those of playtime and sales. The data was collected from two Japanese game titles developed by Silicon Studio: Age of Ishtaria (hereafter, AoI) and Grand Sphere (hereafter, GS). Playtime corresponds to the total amount of time spent in the game by all users, while sales represents the total amount of inapp purchases. For AoI, daily information was extracted from October 2014 until February 2017 ( years) and for GS, from July 2015 until March 2016 ( months). Additionally, data from all the game events, marketing and promotion campaigns within the collection period were also gathered to be used as external variables for the model. Figures 2 and 3 show the two daily time series for AoI and GS, respectively, which present clear differences concerning trends and seasonal behavior.
IvB External features
Proper identification and subsequent removal of outliers caused by unexplained phenomena can significantly improve the modeling of the time series [51]
. To that end, anomaly detection using a deep autoencoder
[52] was performed. This technique is capable of finding subtler anomalous behaviors than traditional methods, and shows that the outliers coincide with the external events of the game that take place on the same particular day. These events are derived from ingame information and included in the model as external variables. They can be:
Game Events: Events such as raid battles, boss battles, etc. Each type of event is input separately.

Gacha: A monetization technique used in many successful Japanese freetoplay games. It describes an event where players can spend money to randomly pull an item from a large pool (inspired by capsule toy vending machines).

Promotions: Release of new cards and discounts to engage current users.

Marketing Campaigns: Acquisition campaigns launched to obtain new users.
In the case of gacha and promotions, the corresponding event scale was also taken into account. This scale is used to quantify the outlier effect. It represents the influence or importance of the event in question and is related to the amount of money invested in it. Specifically, the event is assigned a value 1, 2, 3 or 4 to denote low, medium, high or superhigh influence.
Other nongamerelated features included in the model are:
Moreover, the day of the week and month of the year were added as extra input for the GAMM and GBM, to make it possible for the model to learn the seasonality effects. For the ARIMA model these data do not need to be included as they are already inherently considered by the seasonal parameters.
V Method
Va Data preparation
To make the original time series stationary, a Box–Cox transformation [55] is applied to the sales data, and a logarithm transformation to the playtime data, for the DR, GBM and DBN models. The GAMM is the only technique that does not require a prior transformation of the data.
The categorical values for the external features (game events, gacha
, promotions, marketing campaigns, national holidays, etc.) are included in the model as step functions (dummy variables). For each day in the training and forecasting data, the covariates are encoded with a vector that is either 0 or 1 depending on the absence or presence of the event on that date. However, for events with an
event scale, the vector value matches the corresponding scale value instead.Since promotions and marketing campaigns can have some delayed effects on the series, input is added to the days after the campaign release by means of a decay function. For marketing campaigns, oneweek effects are considered, and their values are assumed to decrease linearly with time; on the other hand, for promotions, the decrease is dependent on the scale of the campaign.
VB Model specification
VB1 Dr
VB2 Gbm
The implementation used for the GBM model is XGBoost
[56], an efficient and scalable tree boosting model. Table I contains the optimal parameters found for XGBoost using crossvalidation and grid search. The parameter refers to the maximum depth of a tree, and is the step size used to prevent overfitting. In the case of GBMs, tuning the model is computationally expensive and timeconsuming as there are many parameters. However, the parameter search can be automatized and directly reused for equivalent time series data from other game titles, which makes it more flexible.VB3 Gamm
As we have continuous data, we consider the identity
link function with a Gaussian distribution. For the GAMM, weekly and monthly seasonalities are introduced as cyclic Psplines
[57]. Gacha is added by applying a Pspline with 4knots corresponding to the four values of the event scale. For the temperature variable, we employ a cyclic cubic regression spline [58] that estimates a periodic smooth regression function for seasonal patterns. For the other variables (holidays, events, day of the week and the month), the default spline corresponding to lowrank isotropic smoothers is used [59].VB4 Dbn
The parameters obtained by grid search are shown in Table I (: number of hidden layers, : number of nodes per layer, : pretrain learning rate, : finetuning learning rate, : number of steps to perform Gibb sampling in the RBM,
: minibatch size). Before training the model, training data are shuffled and 80% of them are randomly assigned to the training set and the other 20% to the validation set. The model is first trained with the training set, and then validated with the validation set, for every epoch. To avoid overfitting, early stopping
[60] is applied and the finetuning iteration is terminated when the loss of the validation set stops decreasing for 20 consecutive epochs.Dataset  Model  RMSLE  MASE  MAPE 

DR  0.106  0.669  8.9  
Sales  GBM  0.118  0.727  10.2 
Age of Ishtaria  GAMM  0.106  0.672  9.1 
DBN  0.122  0.770  11.1  
DR  0.119  0.741  9.8  
Sales  GBM  0.164  0.814  14.8 
Grand Sphere  GAMM  0.142  0.874  11.9 
DBN  0.128  0.762  11.4  
DR  0.243  0.921  18.7  
Playtime  GBM  0.237  1.057  19.5 
Age of Ishtaria  GAMM  0.246  0.931  22.7 
DBN  0.357  1.057  31.5  
DR  0.362  1.034  35.6  
Playtime  GBM  0.559  1.000  73.5 
Grand Sphere  GAMM  0.495  1.011  56.2 
DBN  0.265  0.991  24.6 
VC Model Validation
To perform the predictions, a period of thirty days is chosen in order to obtain monthly forecasts of sales and playtime. For the test evaluation, we use a rolling forecasting technique [61], taking steps of 7 days for each new forecast and with a minimum of 6 months of training data. For AoI the forecasts were performed weekly from Nov 2, 2015 until Jan 10, 2017. In the case of GS, they were carried out from Oct 5, 2015 until Feb 1, 2016. The average errors are then computed over the rolling prediction results, which serves to compare the RMSLE, MASE and MAPE values.
Vi Results
Figure 4 shows an example of the predictions within a given period for each of the models. It illustrates that, while the performance of both DR and the GAMM is relatively similar, the GBM model has much more difficulty capturing the valleys of the series, while the DBN overestimates the peaks. The prediction errors for sales and playtime displayed in Table II also reflect this, showing that both of these models perform worse than the GAMM and DR in the case of AoI.
DR does provide better results for playtime forecasting than the GAM; still, to achieve a proper parameter selection, we need to check the autocorrelation and other measures to have more control over the fitting. This evaluation turns DR into a rigid model, which cannot adapt easily to time series data from other games, while the GAMM is more flexible and easier to tune. Once the smooth functions in the GAMM are selected, they automatically adjust to fit the distribution of the external variables. This way, the model can also learn to fit time series data from other games of the same nature.
Table II also shows the forecasting error results for GS. We can see that the pattern is similar to that for AoI. The performance of the GAMM and DR is approximately the same, and both methods outperform GBM on the sales forecasting. In this case, the DBN does perform significantly better than the GBM model, and also slightly better than the GAMM. Overall, however, taking into account all error measures for both games and dimensions (i.e. playtime and sales), the GAMM yields the most consistent results.
Via Forecasting horizon
The forecasting horizon was evaluated for all the models, since the performance can significantly decrease as the number of days forecast increases [62]. Figure 5 depicts the resulting RMSLE, MASE and MAPE as a function of time, illustrating thus the predictability of the different models. The GAMM performs better than all the other models, staying much flatter for all error measures, which indicates that the forecast accuracy does not decrease much even when forecasting two or three months into the future. The GBMmodel also shows a steadier behavior, with a reasonably stable forecasting accuracy. However, this method has much higher initial and overall errors than the GAMM. The DBN has more difficulty keeping the predictions stable, showing divergent behavior as the number of days forecast increases. Finally, the DR results diverge rapidly as the forecast period becomes larger. For a short prediction range of just a few days, this technique performs better than the GAMM, but when the forecast horizon increases, the errors also become significantly larger.
ViB Minimal training set
We performed an error analysis of the model performance as a function of the training set size in order to evaluate the minimal training time required to obtain robust predictions.
Figure 5 shows a significant drop in the prediction errors after 12 months of training time for the GAMM, DBN and GBM. For the GAMM, the error consistently decreases with training time, while the DR performance is more unstable. The latter behavior can also be seen for the DBN and could be explained by the variable nature of the time series, which causes instability during training. In general, a 12month training set should be sufficient to obtain the most accurate forecasts, but even after just 6 months of training data the errors are already relatively low, especially for DR and the GAMM.
Vii Event simulation
Simulation optimization (i.e. the analysis of whatif scenarios) is used to find the optimum input value that maximizes the response [7, 6]. Using the time series forecasting models proposed in this work, a simulation was carried out to analyze the effect of future events on the total playtime and sales. The order of the upcoming events can be changed to evaluate how a different event planning impacts the forecasts.
Viia Simulation Results and Analysis
Figure 7 shows an example of a simulation, with different event sequences being input into the models. In the case of DR, the sales for Sequences 1 and 2 were 37% higher and 25% lower, resepctively, than for the predefined original event sequence (the sequence that had been planned). Thus, using a different sequence of events the total sales could have been increased by 37%. For the GAMM, Sequence 1 results in an amount of sales lower than that for DR (by 32%). Although all models are suitable to perform simulations, their forecasts present different levels of accuracy. As shown in the results section, the GAMM model provides the most accurate estimate of the predicted sales and therefore it can also be expected to produce the most precise simulation results.
In DR models, the response has a linear relationship with the features, which has the drawback of not capturing the nonlinear behavior of real phenomena. However, this also makes simulations easier, as the interpretation of the parameters is straightforward compared to other strongly nonlinear models, like DBNs.
ViiB Simulation Engine Tool
A common business problem faced by many game studios is how to plan future acquisition and ingame events so as to maximize sales and playtime [63]. While it is possible to manually investigate the success of past events, there are too many potentially correlated variables to be considered. The proposed forecasting models can do this automatically, and learn all the potential impacts of external variables on the future sales and playtime, providing a better estimate of the effect of future event sequences.
In order to provide a solution to the eventplanning problem from a business perspective, a webbased system for time series simulation was developed. With this system, users can easily plan ingame events by means of an event planner user interface. After inputting the planned events, the daily sales and playtime for the next 30 days will be simulated and shown within a few seconds.
The system structure is shown in Figure 6. A database stores the forecasting models, the model parameters, and the external variables mentioned in Section IVB (such as temperature and holidays). The parameters are tuned once for each game, and then models are trained once per month with the parameters. When the server receives a simulation request, it connects the database, the file system and the modeling script, so that the modeling script can obtain the data required for the simulation. After the simulation is finished, the server reads the output file and sends the results to the frontend for display.
Viii Summary and Conclusion
Overall, we found that the GAMM and ARIMA models are the most accurate for daily forecasting of sales and playtime. This result held for both of the evaluated games, Age of Ishtaria and Grand Sphere. However the GAMM has the advantage of requiring less manual tuning, which makes it more practical in a production environment. The gradient boosting model is less suitable for forecasting with these particular time series, as it showed difficulty capturing the peaks and valleys of the data. Similarly, the DBN overestimated the peaks, which could be explained by the fact that the model has many parameters, while the dataset used was relatively small (less than 1000 observations). The GAMM and ARIMA models, on the other hand, have less parameters, avoiding altogether the overfitting issue.
Alternatively, when dealing with much larger datasets that present longrange dependencies, a long shortterm memory (LSTM)
[64] model could be used to properly capture and learn these dependencies and provide accurate future predictions. However, for small datasets with very shortrange dependencies, as is the case for daily sales and playtime data, the input can still be fit by a standard DNN using a sliding window over the past days. This approach, though, still suffers from the overfitting problem due to the small number of data.Nevertheless the DBN or DNN models still show room for improvement in time series forecasting, even when dealing with small datasets. ARIMA is a common forecasting model applied in a wide range of fields[65] and has the advantage of inherently containing parameters for time series forecasting. Potentially, we could incorporate such parameters into DNNs, and inspiration from the ARIMA model could be drawn to construct a similar model suitable for nonlinear forecasts. However this approach is beyond the scope of this paper, as the aim here was to have an interpretable, wellestablished model capable of performing accurate predictions and simulations that can be applied to time series forecasting in games.
The GAMM allows for a generalizable model that can correctly capture the time series dynamics. It can be used not only to forecast both future playtime and sales, but also to simulate future game and marketing events. Instead of randomly deciding the event planning, we can employ a modelbased approach that uses past information to automatically learn the interactions that are relevant for predicting event success (e.g. the weather, the day of the week and the national holidays). We provided a solution that can be used operationally in a business setting to get realtime simulation results. Allowing game studios to accurately simulate future events can help them to optimize their planning of acquisition campaigns and ingame events, ultimately leading to an increase in the amount of user playing time and to an overall rise of the ingame monetization.
Ix Software
Acknowledgements
We thank Sovannrith Lay for helping to gather the data and Javier Grande for his careful review of the manuscript.
References
 [1] M. S. ElNasr, A. Drachen, and A. Canossa, “Game analytics,” New York, Sprint, 2013.
 [2] G. N. Yannakakis and J. Togelius, Artificial Intelligence and Games. Springer, 2017, http://gameaibook.org.
 [3] J. G. De Gooijer and R. J. Hyndman, “25 years of time series forecasting,” International journal of forecasting, vol. 22, no. 3, pp. 443–473, 2006.
 [4] P. J. Brockwell and R. A. Davis, Introduction to time series and forecasting. Springer, 2016.
 [5] R. Adhikari and R. Agrawal, “An introductory study on time series modeling and forecasting,” arXiv preprint arXiv:1302.6613, 2013.
 [6] S. Asmussen and P. W. Glynn, Stochastic simulation: algorithms and analysis. Springer Science & Business Media, 2007, vol. 57.
 [7] Y. Carson and A. Maria, “Simulation optimization: methods and applications,” in Proceedings of the 29th conference on Winter simulation. IEEE Computer Society, 1997, pp. 118–126.
 [8] G. E. Box and G. M. Jenkins, Time series analysis: forecasting and control, revised ed. HoldenDay, 1976.
 [9] J. H. Friedman, “Greedy function approximation: a gradient boosting machine,” Annals of statistics, pp. 1189–1232, 2001.
 [10] T. J. Hastie and R. J. Tibshirani, Generalized additive models. CRC press, 1990, vol. 43.
 [11] E. Busseti, I. Osband, and S. Wong, “Deep learning for time series modeling,” Technical report, Stanford University, 2012.
 [12] C. Bauckhage, K. Kersting, R. Sifa, C. Thurau, A. Drachen, and A. Canossa, “How players lose interest in playing a game: An empirical study based on distributions of total playing times,” in Computational Intelligence and Games (CIG), 2012 IEEE conference on. IEEE, 2012, pp. 139–146.
 [13] F. Hadiji, R. Sifa, A. Drachen, C. Thurau, K. Kersting, and C. Bauckhage, “Predicting player churn in the wild,” in Computational intelligence and games (CIG), 2014 IEEE conference on. IEEE, 2014, pp. 1–8.
 [14] Á. Periáñez, A. Saas, A. Guitart, and C. Magne, “Churn Prediction in Mobile Social Games: Towards a Complete Assessment Using Survival Ensembles,” in Data Science and Advanced Analytics (DSAA), 2016 IEEE International Conference on. IEEE, 2016, pp. 564–573.
 [15] P. Bertens, A. Guitart, and Á. Periáñez, “Games and Big Data: A Scalable MultiDimensional Churn Prediction Model,” Submitted to IEEE CIG, 2017.
 [16] C. Bauckhage, A. Drachen, and R. Sifa, “Clustering game behavior data,” IEEE Transactions on Computational Intelligence and AI in Games, vol. 7, no. 3, pp. 266–278, 2015.
 [17] A. Drachen, R. Sifa, C. Bauckhage, and C. Thurau, “Guns, swords and data: Clustering of player behavior in computer games in the wild,” in Computational Intelligence and Games (CIG), 2012 IEEE Conference on. IEEE, 2012, pp. 163–170.
 [18] A. Drachen, C. Thurau, R. Sifa, and C. Bauckhage, “A comparison of methods for player clustering via behavioral telemetry,” arXiv preprint arXiv:1407.3950, 2014.
 [19] R. Sifa, C. Bauckhage, and A. Drachen, “The playtime principle: Largescale crossgames interest modeling,” in Computational Intelligence and Games (CIG), 2014 IEEE Conference on. IEEE, 2014, pp. 1–8.
 [20] A. Saas, A. Guitart, and Á. Periáñez, “Discovering playing patterns: Time series clustering of freetoplay game data,” in Computational Intelligence and Games (CIG), 2016 IEEE Conference on. IEEE, 2016, pp. 1–8.
 [21] G. E. Box and G. M. Jenkins, Time series analysis: forecasting and control, revised ed. HoldenDay, 1976.
 [22] K. D. Lawrence and M. D. Geurts, Advances in business and management forecasting. Emerald Group Publishing, 2006, vol. 4.
 [23] G. E. Box and D. R. Cox, “An analysis of transformations,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 211–252, 1964.
 [24] H. Akaike, “A new look at the statistical model identification,” IEEE transactions on automatic control, vol. 19, no. 6, pp. 716–723, 1974.
 [25] G. Schwarz, “Estimating the dimension of a model,” The annals of statistics, vol. 6, no. 2, pp. 461–464, 1978.

[26]
J. G. Cragg, “Estimation and testing in timeseries regression models with heteroscedastic disturbances,”
Journal of Econometrics, vol. 20, no. 1, pp. 135–157, 1982.  [27] T. G. Dietterich, “Ensemble methods in machine learning,” in International workshop on multiple classifier systems. Springer, 2000, pp. 1–15.
 [28] L. Mason, J. Baxter, P. L. Bartlett, and M. R. Frean, “Boosting Algorithms as Gradient Descent.” in NIPS, 1999, pp. 512–518.
 [29] L. Breiman, “Arcing the edge,” Technical Report 486, Statistics Department, University of California at Berkeley, Tech. Rep., 1997.
 [30] G. Ridgeway, “Generalized boosted models: A guide to the gbm package,” Update, vol. 1, no. 1, p. 2007, 2007.
 [31] A. Natekin and A. Knoll, “Gradient boosting machines, a tutorial,” Frontiers in neurorobotics, vol. 7, p. 21, 2013.
 [32] T. Zhang and B. Yu, “Boosting with early stopping: Convergence and consistency,” The Annals of Statistics, vol. 33, no. 4, pp. 1538–1579, 2005.
 [33] T. Hastie and R. Tibshirani, “Generalized additive models: some applications,” Journal of the American Statistical Association, vol. 82, no. 398, pp. 371–386, 1987.
 [34] J. Maindonald, “Smoothing terms in GAM models,” 2010.
 [35] K. Larsen, “GAM: The predictive modeling silver bullet,” Multithreaded. Stitch Fix, vol. 30, 2015.
 [36] C. Chen, “Generalized additive mixed models,” Communications in StatisticsTheory and Methods, vol. 29, no. 56, pp. 1257–1271, 2000.
 [37] N. E. Breslow and D. G. Clayton, “Approximate inference in generalized linear mixed models,” Journal of the American statistical Association, vol. 88, no. 421, pp. 9–25, 1993.
 [38] S. N. Wood, “Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 73, no. 1, pp. 3–36, 2011.
 [39] Y. Bengio, “Learning deep architectures for AI,” Foundations and trends® in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
 [40] L. Deng and D. Yu, “Deep learning: methods and applications,” Foundations and Trends® in Signal Processing, vol. 7, no. 3–4, pp. 197–387, 2014.

[41]
A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in
Advances in neural information processing systems, 2012, pp. 1097–1105. 
[42]
A. Graves, A.r. Mohamed, and G. Hinton, “Speech recognition with deep recurrent neural networks,” in
Acoustics, speech and signal processing (icassp), 2013 ieee international conference on. IEEE, 2013, pp. 6645–6649.  [43] G. E. Hinton, S. Osindero, and Y.W. Teh, “A fast learning algorithm for deep belief nets,” Neural computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[44]
D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, “A learning algorithm for Boltzmann machines,”
Cognitive science, vol. 9, no. 1, pp. 147–169, 1985. 
[45]
H. Larochelle and Y. Bengio, “Classification using discriminative restricted Boltzmann machines,” in
Proceedings of the 25th international conference on Machine learning. ACM, 2008, pp. 536–543.  [46] M. Längkvist, L. Karlsson, and A. Loutfi, “A review of unsupervised feature learning and deep learning for timeseries modeling,” Pattern Recognition Letters, vol. 42, pp. 11–24, 2014.
 [47] G. Zhang, B. E. Patuwo, and M. Y. Hu, “Forecasting with artificial neural networks: The state of the art,” International journal of forecasting, vol. 14, no. 1, pp. 35–62, 1998.
 [48] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.

[49]
A. Y. Ng, “Feature selection, l 1 vs. l 2 regularization, and rotational invariance,” in
Proceedings of the twentyfirst international conference on Machine learning. ACM, 2004, p. 78.  [50] R. J. Hyndman and A. B. Koehler, “Another look at measures of forecast accuracy,” International journal of forecasting, vol. 22, no. 4, 2005.
 [51] A. J. Fox, “Outliers in time series,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 350–363, 1972.
 [52] M. Sakurada and T. Yairi, “Anomaly detection using autoencoders with nonlinear dimensionality reduction,” in Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis. ACM, 2014, p. 4.
 [53] TimeAndDate.com. (2010) Japan National Holidays history. [Online]. Available: https://www.timeanddate.com/holidays/japan/
 [54] (1995) Tokyo Daily Temperature history. [Online]. Available: https://www.wunderground.com/
 [55] G. E. Box and D. R. Cox, “An analysis of transformations,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 211–252, 1964.
 [56] T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016, pp. 785–794.
 [57] P. H. Eilers and B. D. Marx, “Flexible smoothing with Bsplines and penalties,” Statistical science, pp. 89–102, 1996.
 [58] S. N. Wood, Generalized additive models: an introduction with R. CRC press, 2017.
 [59] ——, “Thin plate regression splines,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 65, no. 1, pp. 95–114, 2003.
 [60] L. Prechelt, “Early stoppingbut when?” in Neural Networks: Tricks of the trade. Springer, 1998, pp. 55–69.
 [61] M. Gilliland, U. Sglavo, and L. Tashman, Business Forecasting: Practical Problems and Solutions. John Wiley & Sons, 2016.
 [62] S. Makridakis and M. Hibon, “The M3Competition: results, conclusions and implications,” International journal of forecasting, vol. 16, no. 4, pp. 451–476, 2000.
 [63] J. Julkunen, “Feature Spotlight: InGame Events and Market Trends,” 2016. [Online]. Available: http://www.gamerefinery.com/ingameeventsmarkettrends/
 [64] S. Hochreiter and J. Schmidhuber, “Long shortterm memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
 [65] L. Dwyer, A. Gill, and N. Seetaram, Handbook of research methods in tourism: Quantitative and qualitative approaches. Edward Elgar Publishing, 2012.
 [66] R. Hyndman and Y. Khandakar, “Automatic time series forecasting: the forecast package for R,” 2008.
 [67] S. Wood, “mgcv: Mixed GAM computation vehicle with GCV/AIC/REML smoothness estimation,” 2012.
 [68] Theano Development Team, “Theano: A Python framework for fast computation of mathematical expressions,” arXiv eprints, vol. abs/1605.02688, May 2016. [Online]. Available: http://arxiv.org/abs/1605.02688