Forecasting Player Behavioral Data and Simulating in-Game Events

10/05/2017 ∙ by Anna Guitart et al. ∙ Silicon Studio Corp.

Understanding player behavior is fundamental in game data science. Video games evolve as players interact with the game, so being able to foresee player experience would help to ensure successful game development. In particular, game developers need to evaluate beforehand the impact of in-game events. Simulation optimization of these events is crucial to increase player engagement and maximize monetization. We present an experimental analysis of several methods to forecast game-related variables, with two main aims: to obtain accurate predictions of in-app purchases and playtime in an operational production environment, and to perform simulations of in-game events in order to maximize sales and playtime. Our ultimate purpose is to take a step towards the data-driven development of games. The results suggest that, even though the performance of traditional approaches such as ARIMA is still better, the outcomes of state-of-the-art techniques like deep learning are promising. Deep learning emerges as a well-suited general model that could be used to forecast a variety of time series with different dynamic behaviors.

I Introduction

In the last few years, we have witnessed a genuine paradigm change in the development of video games [1, 2]. Nowadays, petabytes of player data are available, as every action, in-app purchase, guild conversation or in-game social interaction performed by the players is recorded. This provides data scientists and researchers with plenty of possibilities to construct sophisticated and reliable models to understand and predict player behavior and game dynamics. Game data are time-dependent observations, as players are constantly interacting with the game. Therefore, it is paramount to understand and model player actions taking into account this temporal dimension.

In-game events are drivers of player engagement. They influence player behavior due to their limited duration, strongly contributing to in-game monetization (for instance, an event that offers a unique reward can serve to trigger in-app purchases). Such events break the monotony of the game, and thus it is essential to have a variety of types, such as battles, rewards and polls. Anticipating the most suitable sequence of events and the right time to present them is a determinant factor in improving monetization and player engagement.

Fig. 1: Screenshot of an in-game gacha event in Grand Sphere, developed by Silicon Studio. The left panel shows an item that players can purchase, which on opening reveals an in-game card (shown in the right panel), in this case having 3 stars. The card obtained is random, and cards with more stars are more valuable and also rarer. Different in-game events can modify the probability of getting cards with more stars or the types of cards that can be obtained.

Forecasting time series data is a challenge common to multiple domains, e.g. weather or stock market prediction. Time series research has received significant contributions to improve the accuracy and time horizon of the forecasts, and an assortment of statistical learning and machine learning techniques have been developed or adapted to perform robust time series forecasting [3, 4].

Time series analysis focuses on studying the structure of the relationships between time-dependent features, aiming to find mathematical expressions to represent these relationships. Based on them, several outcomes can be forecast [5].

Stochastic simulation optimization [6] consists of finding a local optimum for a response function whose values are not known analytically but can be inferred. On the other hand, the analysis of what-if scenarios (or just simulation optimization) is the process of finding, among all possibilities, the inputs that maximize an outcome [7].

The aim of this work is twofold: on the one hand, to accurately forecast time series of in-game sales and playtime; on the other, to simulate events in order to find the best combination of in-game events and the optimal time to publish them. To achieve these goals, we performed an experimental analysis utilizing several techniques, such as ARIMA (dynamic regression), gradient boosting, generalized additive models and deep neural networks [8, 9, 10, 11].

Pioneering studies on game data science in the field of video games, such as [12, 13, 14, 15], concentrated on churn prediction. Other related articles that analyze temporal data in the game domain [16, 17, 18, 19, 20] focused on unsupervised clustering, not on supervised time series forecasting. To the best of our knowledge, this is the first work in which forecasts of time series of game data and simulations of in-game events are performed.

II Model Description

II-A Autoregressive Integrated Moving Average (ARIMA)

ARIMA was first introduced by Box and Jenkins [21] in 1976, in a book that had a tremendous impact on the forecasting community. Since then, this method has been applied in a large variety of fields, and it remains a robust model, used in multiple operational forecast systems [22]. ARIMA characterizes time series focusing on three aspects: 1) the autoregressive (AR) terms model the past information of the time series; 2) the integrated (I) terms model the differencing needed for the time series to become stationary, e.g. the trend of the time series; 3) the moving average (MA) terms control the past noise of the original time series.

Specifically, the AR part represents the time series as a function of past observations,

$$X_t = c + \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t, \qquad (1)$$

with $\phi_i$ the AR coefficients and $p$ the number of past observations needed to perform a forecast at the current time. The MA component, rather than focusing on past observations, uses a moving average of past error values in a regression model,

$$X_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}, \qquad (2)$$

where $q$ is the number of moving average lags, $\theta_j$ are the MA coefficients and $\varepsilon_t$ the errors. Finally, a parameter $d$ is used to model the order of differencing, i.e. the I term of ARIMA:

$$X'_t = X_t - X_{t-1}, \qquad (3)$$

applied $d$ times. Taking $d \leq 2$ is normally enough in most cases [5]. If $d = 0$, the model is reduced to an ARMA model.

The ARIMA model can only analyze stationary time series; however, most stochastic processes exhibit non-stationary behavior. Only through the differencing operation, or by previously applying an additional transformation to the time series, e.g. a log or Box–Cox [23] transformation, can we convert the time series into stationary objects.

We can write the ARIMA model as

$$\phi(B)\,(1 - B)^d X_t = \theta(B)\,\varepsilon_t, \qquad (4)$$

where $B$ is the lag operator, i.e. $B X_t = X_{t-1}$. Therefore, we can express the ARIMA model in terms of the $p$, $d$ and $q$ parameters as

$$\left(1 - \sum_{i=1}^{p} \phi_i B^i\right)(1 - B)^d X_t = \left(1 + \sum_{j=1}^{q} \theta_j B^j\right)\varepsilon_t. \qquad (5)$$

Box and Jenkins also generalized the model to recognize the seasonality pattern of the time series by using ARIMA$(p,d,q)(P,D,Q)_m$ [21]. The parameters $P$, $D$, $Q$ represent the number of seasonal autoregressive, differencing and moving average terms, and the order of the seasonality (e.g. weekly, monthly, etc.) is set by $m$. In order to determine the ARIMA parameters, the autocorrelation function (ACF) and partial autocorrelation function (PACF) are analyzed [21]. Once the parameters are fixed, the model is fitted by maximum likelihood and selected (among the different estimated models) based on the Akaike (AIC) [24] and Schwarz Bayesian (BIC) [25] information criteria.
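To make the fitting procedure concrete, the sketch below shows how such a model could be fitted with the R forecast package (the package listed in Section IX). The input series and the chosen orders are placeholders taken from Table I; this is an illustrative sketch under those assumptions, not the exact production code.

```r
# Illustrative sketch, assuming 'daily_sales' is a numeric vector of daily sales.
library(forecast)

sales <- ts(daily_sales, frequency = 7)   # daily series with weekly seasonality

# ACF/PACF of the differenced series guide the choice of p and q
acf(diff(sales)); pacf(diff(sales))

# Seasonal ARIMA(2,1,1)(1,1,1)[7] fitted by maximum likelihood
# (the orders tuned for the AoI sales series in Table I)
fit <- Arima(sales, order = c(2, 1, 1),
             seasonal = list(order = c(1, 1, 1), period = 7))

# Alternatively, let the AIC drive the selection among candidate models
fit_auto <- auto.arima(sales, seasonal = TRUE, ic = "aic")

forecast(fit, h = 30)   # 30-day-ahead forecast
```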

Dynamic Regression

To include the serial dependency of external covariates, a multivariate extension of Equation 5 is proposed, the so-called dynamic regression (DR). The relationship between the output, its lags and the covariates is linear. DR takes the form [26]

$$\left(1 - \sum_{i=1}^{p} \phi_i B^i\right)(1 - B)^d X_t = \sum_{k=1}^{K} \beta_k Z_{k,t} + \left(1 + \sum_{j=1}^{q} \theta_j B^j\right)\varepsilon_t. \qquad (6)$$

The first term represents the AR and I parts, the last corresponds to the MA component, and the middle term is where the $K$ external variables $Z_{k,t}$ are included. The $\beta_k$ are the corresponding coefficients of the covariates.
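In practice, a model of the form of Equation 6 can be fitted with the same forecast package by passing the covariates through the xreg argument; a hedged sketch, with hypothetical covariate vectors, follows.

```r
# Sketch of a dynamic regression fit; 'gacha_scale', 'raid_event' and
# 'promo_decay' are hypothetical covariate vectors aligned with 'sales'.
library(forecast)

xreg_train <- cbind(gacha = gacha_scale,   # event-scale values (0 when absent)
                    raid  = raid_event,    # 0/1 event dummies
                    promo = promo_decay)   # decayed promotion effect

fit_dr <- Arima(sales, order = c(2, 1, 1),
                seasonal = list(order = c(1, 1, 1), period = 7),
                xreg = xreg_train)

# Forecasting also requires the covariate values of the future period
forecast(fit_dr, h = 30, xreg = xreg_future)
```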

Fig. 2: Time series of the total daily sales (above) and total daily playtime (below) from Age of Ishtaria over the full data collection period (October 2014 to February 2017). In this and other figures below, the quantitative sales and playtime values are not shown on the vertical axes for privacy reasons.

II-B Gradient Boosting Models

Gradient boosting machines (GBMs) [27] are ensemble-based machine learning methods capable of solving classification and regression problems [9]. A GBM consists of an ensemble of weak learners (i.e. learners that perform only slightly better than random classifiers), commonly decision trees. Each weak learner is sequentially added to the ensemble, thus continuously improving its accuracy [28]. The approach taken by GBMs stems from the idea that boosting (using multiple weak models to create a strong model) can be seen as an optimization algorithm on some suitable loss function [29]. If this loss function is differentiable, the gradient can be calculated and gradient descent can be used for optimization [9]. This makes it possible to recursively add weak learners that follow the gradient, minimizing the loss function and reducing the error at each step. The construction of an ensemble to fit a desired function is illustrated below (a more in-depth analysis can be found in [30]). For each weak learner in the ensemble, we sequentially do the following: First, we compute the negative gradient

$$g_t(x) = -\left[\frac{\partial L\big(y, f(x)\big)}{\partial f(x)}\right]_{f(x) = f_{t-1}(x)}, \qquad (7)$$

where $L$ is the loss function and $f_{t-1}$ the current weak model.

Then, we fit a regression model $h_t$ predicting the gradient values $g_t(x_i)$ from the inputs $x_i$ and apply gradient descent, with step size $\rho_t$ given by

$$\rho_t = \arg\min_{\rho} \sum_{i=1}^{N} L\big(y_i, f_{t-1}(x_i) + \rho\, h_t(x_i)\big). \qquad (8)$$

Finally, we update the estimate of $f$ through

$$f_t(x) = f_{t-1}(x) + \rho_t\, h_t(x). \qquad (9)$$

GBMs can effectively capture complex non-linear dependencies [31] and several strategies to avoid overfitting can be applied, e.g. shrinkage [9] or early stopping [32].
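A minimal regression sketch with the xgboost R package (the implementation used in Section V-B2) illustrates the role of the shrinkage rate and of early stopping; the data objects and parameter values are placeholders.

```r
# Hedged sketch: X_train/X_val are feature matrices, y_train/y_val targets.
library(xgboost)

dtrain <- xgb.DMatrix(as.matrix(X_train), label = y_train)
dval   <- xgb.DMatrix(as.matrix(X_val),   label = y_val)

params <- list(objective = "reg:squarederror",  # squared-error regression loss
               max_depth = 1,                   # shallow trees act as weak learners
               eta       = 0.66)                # shrinkage (step size)

fit_gbm <- xgb.train(params, dtrain, nrounds = 1000,
                     watchlist = list(val = dval),
                     early_stopping_rounds = 20)  # stop when validation error stalls

pred <- predict(fit_gbm, as.matrix(X_val))
```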

II-C Generalized Additive and Generalized Additive Mixed Models

The generalized additive models (GAMs) derived by [33] are a combination of generalized linear models (GLMs) and additive models (AMs). In this way, GAMs exhibit the properties of both, namely the flexibility of GLMs to adapt to any distribution and the non-parametric nature of AMs.

The structure of a GAM is

$$g(\mu) = \beta_0 + \sum_{j=1}^{k} f_j(x_j), \qquad (10)$$

where $k$ is the number of predictors, $\mu = E(Y)$ is the expected value of the response, $g$ is a link function, and the $f_j$ are smooth functions.

Even though the distribution of the response variable is the same as for GLMs, there is a generalization that allows GAMs to accommodate different kinds of responses, for example binary or continuous. GAMs assume that the means of the predictors are related to an additive response through a nonlinear link function (such as the identity or logarithm function). The model distribution can be selected from e.g. a Gaussian or Poisson distribution [10].

As additive models, in contrast to parametric regression analysis (which assumes a linear relation between responses and predictors), GAMs serve to explore non-parametric relationships, as they make no assumptions about those relations. Instead of using the sum of the individual effects of the predictors, GAMs employ the sum of their smooth functions, which are called splines [34] and include a parameter that controls the smoothness of the curve to prevent overfitting. With the link and smooth functions mentioned above, the GAM approach has the flexibility to interpret and regularize both linear and nonlinear effects [35].

However, GAMs do not assume correlations between observations (such as the temporal correlation of the time series in this study). As a consequence, when these are present, another approach would be better suited to perform the forecast. Generalized additive mixed models (GAMMs) [36], an additive extension of generalized linear mixed models (GLMMs) [37], constitute such an approach. GAMMs incorporate random effects and covariate effects (which model the correlations between observations by taking the order of observations into account) into the previous GAM formulation. When estimating the $i$-th observation, the structure of a GAMM is

$$g(\mu_i) = \beta_0 + \sum_{j=1}^{k} f_j(x_{ji}) + Z_i\, b, \qquad (11)$$

where $Z_i$ is a row of the random-effects design matrix and $b$ is the vector of random effects. Smoothness selection can be automatized [38] and the estimation of the GAMM is conducted via a penalized likelihood approach.
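A model of this kind can be expressed with the mgcv package of Section IX; the sketch below is an approximation under assumed column names, using the spline choices described later in Section V-B3 (cyclic splines for seasonality, a P-spline for the gacha scale) and an AR(1) structure as one possible way to model serial correlation.

```r
# Hedged sketch; 'train_df' and its columns are placeholders.
library(mgcv)

fit_gamm <- gamm(
  sales ~ s(day_of_week, bs = "cc", k = 7)   # cyclic spline: weekly seasonality
        + s(month,       bs = "cc", k = 12)  # cyclic spline: yearly seasonality
        + s(gacha_scale, bs = "ps", k = 4)   # P-spline over the event-scale values
        + s(temperature, bs = "cc")          # cyclic cubic regression spline
        + holiday + raid_event,              # binary dummies enter linearly here
  family      = gaussian(link = "identity"),
  correlation = corAR1(),                    # serial correlation between observations
  data        = train_df)

predict(fit_gamm$gam, newdata = future_df)
```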

II-D Deep Belief Networks

Deep neural networks (DNNs) [39] have been used with great success in recent years, achieving cutting-edge results in a wide range of fields [40]. This method has outperformed alternative models on almost all tasks, from image classification [41] to speech recognition [42] and natural language modeling [40].

Deep belief networks (DBNs) [43] are an extension of DNNs where the units in the hidden layers are binary and stochastic. They are capable of learning a joint probability distribution between the input and the hidden layers and can be used as a generative model to draw samples from the learned distribution. A DBN uses a stack of restricted Boltzmann machines (RBMs) [44], where each RBM is trained on the result of the previous layer as follows:

$$p\big(h^{(l+1)}_j = 1 \,\big|\, h^{(l)}\big) = \sigma\Big(b_j + \sum_i W_{ij}\, h^{(l)}_i\Big), \qquad (12)$$

with $W$ being the weight matrix between layers $l$ and $l+1$ and $\sigma$ being the logistic sigmoid.

For supervised learning, a DBN can be used as a way to pre-train a DNN [45]. Each layer of the RBM is first trained separately, and then the weights in all layers are fine-tuned through standard back-propagation. A more comprehensive overview of DBNs and DNNs can be found in [43] and [39].
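As an illustration of the layer-wise pre-training step (a didactic sketch, not the paper's Theano implementation), the following base-R function performs one contrastive-divergence update (CD-1, i.e. a single Gibbs step) for one RBM layer.

```r
# Illustrative CD-1 update for one RBM layer (not the paper's Theano code).
sigmoid <- function(x) 1 / (1 + exp(-x))

rbm_cd1 <- function(v0, W, b_h, b_v, lr = 0.001) {
  n <- nrow(v0)
  # Up-pass: hidden activation probabilities given the data (Eq. 12)
  ph0 <- sigmoid(sweep(v0 %*% W, 2, b_h, "+"))
  h0  <- (ph0 > matrix(runif(length(ph0)), nrow = n)) * 1  # stochastic binary units

  # One Gibbs step (k = 1): reconstruct the visible layer, then the hidden one
  pv1 <- sigmoid(sweep(h0 %*% t(W), 2, b_v, "+"))
  ph1 <- sigmoid(sweep(pv1 %*% W, 2, b_h, "+"))

  # Contrastive-divergence updates of weights and biases
  W   <- W + lr * (t(v0) %*% ph0 - t(pv1) %*% ph1) / n
  b_h <- b_h + lr * colMeans(ph0 - ph1)
  b_v <- b_v + lr * colMeans(v0 - pv1)
  list(W = W, b_h = b_h, b_v = b_v)
}
```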

There have been previous works on applying DNNs to time series forecasting problems (see e.g. [46] and [11]), and comparisons between ARIMA and artificial neural networks (ANNs) have also been performed [47]. However, when dealing with small datasets, standard DNNs can quickly overfit due to having too many parameters. Careful regularization is required to ensure that the model generalizes to unseen data. Recent techniques like drop-out (which randomly masks some of the hidden units) [48] or stochastic DBNs can significantly help with this problem, as they constrain the number of parameters that can be modified. Additionally, using regularization on the weights [49] and ensuring a proper selection of the number of hidden units can make DBNs an effective model even when faced with smaller datasets.

Fig. 3: Grand Sphere time series of total daily sales (above) and total daily playtime (below) over the full data collection period (July 2015 to March 2016). Quantitative values are not shown on the vertical axes for privacy reasons.

III Forecast Performance Metrics

In order to validate the accuracy of forecasting models, several performance measures have been proposed in the literature [50]. The use of various metrics to analyze time series is a common practice. The ones selected in this study possess different properties, which is important to correctly assess the forecasting capabilities of the models from different perspectives (e.g. in terms of the magnitude or direction of the error). Each of the metrics summarized here is a function of the actual time series and the forecast results. In all the equations below, $n$ refers to the number of observations, $\hat{y}_t$ denotes the forecast value and $y_t$ represents the actual value.

III-A Root Mean Squared Logarithmic Error (RMSLE)

The RMSLE can be defined as

$$\mathrm{RMSLE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \big(\log(\hat{y}_t + 1) - \log(y_t + 1)\big)^2}. \qquad (13)$$

Because the RMSLE uses a logarithmic scale, it is less sensitive to outliers than the standard root mean square error (RMSE). Additionally, whereas the RMSE penalizes underestimated and overestimated values equally, the RMSLE penalizes underestimated predictions more heavily.

III-B Mean Absolute Scaled Error (MASE)

The MASE is a scale-independent metric, i.e. it can be used to compare the relative accuracy of several forecasting models applied to different datasets. To calculate this metric, errors are scaled with the in-sample mean absolute error (MAE) of the naive one-step forecast. The expression for the MASE is

$$\mathrm{MASE} = \frac{1}{n} \sum_{t=1}^{n} \frac{|y_t - \hat{y}_t|}{\frac{1}{n-1} \sum_{i=2}^{n} |y_i - y_{i-1}|}. \qquad (14)$$

III-C Mean Absolute Percentage Error (MAPE)

The MAPE is estimated by

$$\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left|\frac{y_t - \hat{y}_t}{y_t}\right|. \qquad (15)$$

Since it is a percentage measure of the average absolute error, the MAPE is also scale-independent. However, if the time series contains zero values, the MAPE yields undefined results because of a division by zero [50]. The MAPE is also biased towards underestimated values and does not penalize large errors.
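Equations (13)-(15) translate directly into code; a small sketch, where y is the vector of actual values and yhat the forecast, and where the MASE scaling uses the supplied series as its in-sample reference:

```r
rmsle <- function(y, yhat) sqrt(mean((log1p(yhat) - log1p(y))^2))

mase <- function(y, yhat) {
  scale <- mean(abs(diff(y)))   # MAE of the naive one-step forecast, Eq. (14)
  mean(abs(y - yhat)) / scale
}

mape <- function(y, yhat) 100 * mean(abs((y - yhat) / y))  # undefined if any y == 0
```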

| Dataset | max_depth | eta | p | d | q | P | D | Q | m | h | n | plr | tlr | k | b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sales, Age of Ishtaria | 100 | 0.20 | 2 | 1 | 1 | 1 | 1 | 1 | 7 | 2 | 50 | 0.0001 | 0.01 | 5 | 50 |
| Sales, Grand Sphere | 1 | 0.76 | 2 | 1 | 1 | 1 | 0 | 1 | 7 | 2 | 300 | 0.001 | 0.1 | 2 | 10 |
| Playtime, Age of Ishtaria | 1 | 0.66 | 2 | 1 | 2 | 1 | 1 | 1 | 7 | 2 | 50 | 0.0001 | 0.01 | 2 | 50 |
| Playtime, Grand Sphere | 1000 | 0.23 | 1 | 1 | 1 | 1 | 1 | 1 | 7 | 2 | 300 | 0.0001 | 0.1 | 2 | 50 |

TABLE I: Tuned parameters for each model (GBM: max_depth, eta; ARIMA: p, d, q, P, D, Q, m; DBN: h, n, plr, tlr, k, b)

IV Dataset

IV-A Data Source

The study presented in this article focuses on the analysis of two different daily time series, those of playtime and sales. The data was collected from two Japanese game titles developed by Silicon Studio: Age of Ishtaria (hereafter, AoI) and Grand Sphere (hereafter, GS). Playtime corresponds to the total amount of time spent in the game by all users, while sales represents the total amount of in-app purchases. For AoI, daily information was extracted from October 2014 until February 2017 (over two years), and for GS, from July 2015 until March 2016 (about eight months). Additionally, data from all the game events, marketing and promotion campaigns within the collection period were also gathered, to be used as external variables for the models. Figures 2 and 3 show the two daily time series for AoI and GS, respectively, which present clear differences concerning trends and seasonal behavior.

IV-B External features

Proper identification and subsequent removal of outliers caused by unexplained phenomena can significantly improve the modeling of the time series [51]. To that end, anomaly detection using a deep autoencoder [52] was performed. This technique is capable of finding subtler anomalous behaviors than traditional methods, and shows that the outliers coincide with the external events of the game that take place on the same particular day. These events are derived from in-game information and included in the model as external variables. They can be:

  • Game Events: Events such as raid battles, boss battles, etc. Each type of event is input separately.

  • Gacha: A monetization technique used in many successful Japanese free-to-play games. It describes an event where players can spend money to randomly pull an item from a large pool (inspired by capsule toy vending machines).

  • Promotions: Release of new cards and discounts to engage current users.

  • Marketing Campaigns: Acquisition campaigns launched to obtain new users.

In the case of gacha and promotions, the corresponding event scale was also taken into account. This scale is used to quantify the outlier effect. It represents the influence or importance of the event in question and is related to the amount of money invested in it. Specifically, the event is assigned a value of 1, 2, 3 or 4 to denote low, medium, high or super-high influence.

Other non-game-related features included in the model are:

  • National Holidays: National holidays in Japan [53].

  • Temperature: Average daily temperature in Tokyo [54].

Moreover, the day of the week and the month of the year were added as extra inputs for the GAMM and GBM, to make it possible for these models to learn the seasonality effects. For the ARIMA model, these data do not need to be included, as they are already inherently considered through the seasonal parameters.

V Method

V-A Data preparation

To make the original time series stationary, a Box–Cox transformation [55] is applied to the sales data, and a logarithm transformation to the playtime data, for the DR, GBM and DBN models. The GAMM is the only technique that does not require a prior transformation of the data.

The categorical values for the external features (game events, gacha, promotions, marketing campaigns, national holidays, etc.) are included in the model as step functions (dummy variables). For each day in the training and forecasting data, the covariates are encoded with a vector that is either 0 or 1 depending on the absence or presence of the event on that date. However, for events with an event scale, the vector value matches the corresponding scale value instead.

Since promotions and marketing campaigns can have some delayed effects on the series, input is added to the days after the campaign release by means of a decay function. For marketing campaigns, one-week effects are considered, and their values are assumed to decrease linearly with time; on the other hand, for promotions, the decrease is dependent on the scale of the campaign.
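The transformations and encodings above might look as follows; the column names, the decay shape and the one-week horizon are assumptions for illustration.

```r
# Hedged sketch of the feature preparation of Section V-A.
library(forecast)   # provides BoxCox() and BoxCox.lambda()

lambda      <- BoxCox.lambda(train_df$sales)
sales_bc    <- BoxCox(train_df$sales, lambda)   # for DR, GBM and DBN
playtime_lg <- log(train_df$playtime)

raid_dummy  <- as.integer(train_df$raid_event)  # 0/1 step function
gacha_input <- train_df$gacha_scale             # 1-4 scale value, 0 when absent

# Linearly decaying one-week effect after each marketing-campaign launch
decay <- function(launch_days, n, horizon = 7) {
  x <- numeric(n)
  for (d in launch_days)
    for (j in 0:(horizon - 1))
      if (d + j <= n) x[d + j] <- max(x[d + j], 1 - j / horizon)
  x
}
marketing_input <- decay(which(train_df$campaign == 1), nrow(train_df))
```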

V-B Model specification

V-B1 DR

ARIMA parameters are calculated through the ACF and PACF functions. The parameter tuning of the dynamic regression model is performed using cross-validation, based on the MAPE metric presented in Section III. Table I shows the parameters used to fit the models.

V-B2 GBM

The implementation used for the GBM model is XGBoost [56], an efficient and scalable tree boosting library. Table I contains the optimal parameters found for XGBoost using cross-validation and grid search. The max_depth parameter refers to the maximum depth of a tree, and eta is the step size used to prevent overfitting. In the case of GBMs, tuning the model is computationally expensive and time-consuming, as there are many parameters. However, the parameter search can be automatized and directly re-used for equivalent time series data from other game titles, which makes the approach more flexible.
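The grid search could be automated along the following lines, using xgboost's built-in cross-validation; the candidate values simply mirror the magnitudes in Table I, and plain k-fold splitting is shown for brevity (the paper does not detail the exact CV scheme).

```r
# Hedged sketch of hyperparameter tuning for the GBM.
grid <- expand.grid(max_depth = c(1, 10, 100, 1000),
                    eta       = seq(0.1, 0.9, by = 0.1))

cv_error <- apply(grid, 1, function(g) {
  cv <- xgb.cv(list(objective = "reg:squarederror",
                    max_depth = g[["max_depth"]], eta = g[["eta"]]),
               dtrain, nrounds = 200, nfold = 5,
               early_stopping_rounds = 20, verbose = 0)
  min(cv$evaluation_log$test_rmse_mean)
})

grid[which.min(cv_error), ]   # best (max_depth, eta) pair
```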

V-B3 GAMM

As we have continuous data, we consider the identity link function with a Gaussian distribution. For the GAMM, weekly and monthly seasonalities are introduced as cyclic P-splines [57]. Gacha is added by applying a P-spline with 4 knots corresponding to the four values of the event scale. For the temperature variable, we employ a cyclic cubic regression spline [58] that estimates a periodic smooth regression function for seasonal patterns. For the other variables (holidays, events, day of the week and month), the default spline corresponding to low-rank isotropic smoothers is used [59].

V-B4 DBN

The parameters obtained by grid search are shown in Table I (h: number of hidden layers, n: number of nodes per layer, plr: pre-training learning rate, tlr: fine-tuning learning rate, k: number of Gibbs sampling steps in the RBM, b: mini-batch size). Before training the model, the training data are shuffled and 80% of them are randomly assigned to the training set and the other 20% to the validation set. The model is trained with the training set and then validated with the validation set at every epoch. To avoid overfitting, early stopping [60] is applied: the fine-tuning iteration is terminated when the loss on the validation set has stopped decreasing for 20 consecutive epochs.

Fig. 4: Actual and forecast sales time series for AoI, for the period from Jan 10, 2017 until Feb 8, 2017. The different panels correspond to the forecast with DR (top left), GAMM (top right), gradient boosting (bottom left) and DBN (bottom right).
| Dataset | Model | RMSLE | MASE | MAPE (%) |
|---|---|---|---|---|
| Sales, Age of Ishtaria | DR | 0.106 | 0.669 | 8.9 |
| | GBM | 0.118 | 0.727 | 10.2 |
| | GAMM | 0.106 | 0.672 | 9.1 |
| | DBN | 0.122 | 0.770 | 11.1 |
| Sales, Grand Sphere | DR | 0.119 | 0.741 | 9.8 |
| | GBM | 0.164 | 0.814 | 14.8 |
| | GAMM | 0.142 | 0.874 | 11.9 |
| | DBN | 0.128 | 0.762 | 11.4 |
| Playtime, Age of Ishtaria | DR | 0.243 | 0.921 | 18.7 |
| | GBM | 0.237 | 1.057 | 19.5 |
| | GAMM | 0.246 | 0.931 | 22.7 |
| | DBN | 0.357 | 1.057 | 31.5 |
| Playtime, Grand Sphere | DR | 0.362 | 1.034 | 35.6 |
| | GBM | 0.559 | 1.000 | 73.5 |
| | GAMM | 0.495 | 1.011 | 56.2 |
| | DBN | 0.265 | 0.991 | 24.6 |

TABLE II: Error Results for Time Series Forecasting

V-C Model Validation

To perform the predictions, a period of thirty days is chosen in order to obtain monthly forecasts of sales and playtime. For the test evaluation, we use a rolling forecasting technique [61], taking steps of 7 days for each new forecast and requiring a minimum of 6 months of training data. For AoI, the forecasts were performed weekly from Nov 2, 2015 until Jan 10, 2017. In the case of GS, they were carried out from Oct 5, 2015 until Feb 1, 2016. The average errors are then computed over the rolling prediction results, which serves to compare the RMSLE, MASE and MAPE values.
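The rolling-origin scheme can be sketched as follows, reusing the metric helpers from Section III; the DR refit is shown as an example, and 180 days stands in for the six-month minimum.

```r
# Hedged sketch of the rolling forecast evaluation.
horizon <- 30; step <- 7; min_train <- 180
origins <- seq(min_train, length(sales) - horizon, by = step)

errors <- sapply(origins, function(o) {
  train <- window(sales, end = time(sales)[o])
  test  <- as.numeric(sales[(o + 1):(o + horizon)])
  fit   <- Arima(train, order = c(2, 1, 1),
                 seasonal = list(order = c(1, 1, 1), period = 7))
  pred  <- as.numeric(forecast(fit, h = horizon)$mean)
  c(rmsle = rmsle(test, pred), mase = mase(test, pred), mape = mape(test, pred))
})

rowMeans(errors)   # average errors over all rolling forecasts
```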

VI Results

Fig. 5: MASE and MAPE as a function of the forecasting horizon (left panels) and of the minimal training period (right panels) for DR, GBM, GAMM and DBN. Results from AoI.

Figure 4 shows an example of the predictions within a given period for each of the models. It illustrates that, while the performance of both DR and the GAMM is relatively similar, the GBM model has much more difficulty capturing the valleys of the series and the DBN overestimates the peaks. The prediction errors for sales and playtime displayed in Table II also reflect this, showing that both of these models perform worse than the GAMM and DR in the case of AoI.

DR does provide better results for playtime forecasting than the GAMM; still, to achieve a proper parameter selection, we need to check the autocorrelation and other measures to have more control over the fitting. This evaluation turns DR into a rigid model, which cannot adapt easily to time series data from other games, while the GAMM is more flexible and easier to tune. Once the smooth functions in the GAMM are selected, they automatically adjust to fit the distribution of the external variables. This way, the model can also learn to fit time series data from other games of the same nature.

Table II also shows the forecasting error results for GS. We can see that the pattern is similar to that for AoI. The performance of the GAMM and DR is approximately the same, and both methods outperform GBM on the sales forecasting. In this case, the DBN does perform significantly better than the GBM model, and also slightly better than the GAMM. Overall, however, taking into account all error measures for both games and dimensions (i.e. playtime and sales), the GAMM yields the most consistent results.

VI-A Forecasting horizon

The forecasting horizon was evaluated for all the models, since the performance can significantly decrease as the number of days forecast increases [62]. Figure 5 depicts the resulting RMSLE, MASE and MAPE as a function of time, thus illustrating the predictability of the different models. The GAMM performs better than all the other models, staying much flatter for all error measures, which indicates that the forecast accuracy does not decrease much even when forecasting two or three months into the future. The GBM model also shows a steadier behavior, with a reasonably stable forecasting accuracy. However, this method has much higher initial and overall errors than the GAMM. The DBN has more difficulty keeping the predictions stable, showing divergent behavior as the number of days forecast increases. Finally, the DR results diverge rapidly as the forecast period becomes larger. For a short prediction range of just a few days, this technique performs better than the GAMM, but when the forecast horizon increases, the errors also become significantly larger.

VI-B Minimal training set

We performed an error analysis of the model performance as a function of the training set size in order to evaluate the minimal training time required to obtain robust predictions.

Figure 5 shows a significant drop in the prediction errors after 12 months of training time for the GAMM, DBN and GBM. For the GAMM, the error consistently decreases with training time, while the DR performance is more unstable. The latter behavior can also be seen for the DBN and could be explained by the variable nature of the time series, which causes instability during training. In general, a 12-month training set should be sufficient to obtain the most accurate forecasts, but even after just 6 months of training data the errors are already relatively low, especially for DR and the GAMM.

Fig. 6: Structure of the time series simulation system. The arrows show how the web page, server, database, file system and modeling script interact with each other. From the user interface, one can specify future game or marketing events for desired dates. Then, the server will take the previously trained model, perform daily predictions of the sales and playtime for the input events and return the simulation results to the user.

VII Event simulation

Simulation optimization (i.e. the analysis of what-if scenarios) is used to find the optimum input value that maximizes the response [7, 6]. Using the time series forecasting models proposed in this work, a simulation was carried out to analyze the effect of future events on the total playtime and sales. The order of the upcoming events can be changed to evaluate how a different event planning impacts the forecasts.
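Operationally, a what-if run amounts to feeding the fitted model alternative future covariate matrices, one per candidate event sequence; a sketch with the DR model from the earlier sketches (the scenario matrices are placeholders):

```r
# Hedged sketch of event-sequence simulation with a fitted DR model.
scenarios <- list(planned = xreg_planned,   # the predefined event calendar
                  seq1    = xreg_seq1,      # alternative orderings of the
                  seq2    = xreg_seq2)      # same upcoming events

totals <- sapply(scenarios, function(x)
  sum(as.numeric(forecast(fit_dr, h = 30, xreg = x)$mean)))

# Relative sales impact of each sequence versus the planned one (in %)
round(100 * (totals / totals["planned"] - 1), 1)
```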

VII-A Simulation Results and Analysis

Figure 7 shows an example of a simulation, with different event sequences being input into the models. In the case of DR, the sales for Sequences 1 and 2 were 37% higher and 25% lower, respectively, than for the predefined original event sequence (the sequence that had been planned). Thus, using a different sequence of events, the total sales could have been increased by 37%. For the GAMM, Sequence 1 results in an amount of sales lower than that for DR (by 32%). Although all models are suitable to perform simulations, their forecasts present different levels of accuracy. As shown in the results section, the GAMM provides the most accurate estimate of the predicted sales, and therefore it can also be expected to produce the most precise simulation results.

In DR models, the response has a linear relationship with the features, which has the drawback of not capturing the non-linear behavior of real phenomena. However, this also makes simulations easier, as the interpretation of the parameters is straightforward compared to other strongly non-linear models, like DBNs.

VII-B Simulation Engine Tool

A common business problem faced by many game studios is how to plan future acquisition and in-game events so as to maximize sales and playtime [63]. While it is possible to manually investigate the success of past events, there are too many potentially correlated variables to be considered. The proposed forecasting models can do this automatically, and learn all the potential impacts of external variables on the future sales and playtime, providing a better estimate of the effect of future event sequences.

In order to provide a solution to the event-planning problem from a business perspective, a web-based system for time series simulation was developed. With this system, users can easily plan in-game events by means of an event planner user interface. After inputting the planned events, the daily sales and playtime for the next 30 days will be simulated and shown within a few seconds.

The system structure is shown in Figure 6. A database stores the forecasting models, the model parameters, and the external variables mentioned in Section IV-B (such as temperature and holidays). The parameters are tuned once for each game, and then models are trained once per month with the parameters. When the server receives a simulation request, it connects the database, the file system and the modeling script, so that the modeling script can obtain the data required for the simulation. After the simulation is finished, the server reads the output file and sends the results to the front-end for display.

Fig. 7: Simulation results for the sales time series of AoI, forecasting 30 days from Jan 10, 2017. The forecast resulting from the predefined sequence of events (consisting of the planned events for that period of time) is compared with two other event sequences, named Sequence 1 and Sequence 2. The different panels correspond to the simulation with DR (top left), GAMM (top right), GBM (bottom left) and DBN (bottom right).

VIII Summary and Conclusion

Overall, we found that the GAMM and ARIMA models are the most accurate for daily forecasting of sales and playtime. This result held for both of the evaluated games, Age of Ishtaria and Grand Sphere. However, the GAMM has the advantage of requiring less manual tuning, which makes it more practical in a production environment. The gradient boosting model is less suitable for forecasting with these particular time series, as it showed difficulty capturing the peaks and valleys of the data. Similarly, the DBN overestimated the peaks, which could be explained by the fact that the model has many parameters, while the dataset used was relatively small (fewer than 1000 observations). The GAMM and ARIMA models, on the other hand, have fewer parameters, avoiding the overfitting issue altogether.

Alternatively, when dealing with much larger datasets that present long-range dependencies, a long short-term memory (LSTM) [64] model could be used to properly capture and learn these dependencies and provide accurate future predictions. However, for small datasets with very short-range dependencies, as is the case for daily sales and playtime data, the input can still be fit by a standard DNN using a sliding window over the past days. This approach, though, still suffers from the overfitting problem due to the small amount of data.

Nevertheless, the DBN and DNN models still show room for improvement in time series forecasting, even when dealing with small datasets. ARIMA is a common forecasting model applied in a wide range of fields [65] and has the advantage of inherently containing parameters for time series forecasting. Potentially, such parameters could be incorporated into DNNs, and inspiration could be drawn from the ARIMA model to construct a similar model suitable for non-linear forecasts. However, this approach is beyond the scope of this paper, as the aim here was to have an interpretable, well-established model capable of performing accurate predictions and simulations that can be applied to time series forecasting in games.

The GAMM allows for a generalizable model that can correctly capture the time series dynamics. It can be used not only to forecast both future playtime and sales, but also to simulate future game and marketing events. Instead of randomly deciding the event planning, we can employ a model-based approach that uses past information to automatically learn the interactions that are relevant for predicting event success (e.g. the weather, the day of the week and the national holidays). We provided a solution that can be used operationally in a business setting to get real-time simulation results. Allowing game studios to accurately simulate future events can help them to optimize their planning of acquisition campaigns and in-game events, ultimately leading to an increase in the amount of user playing time and to an overall rise of the in-game monetization.

IX Software

All the analyses except for the DBN were performed with R version 3.3.2 for Linux, using the following packages from CRAN: forecast 1.0 [66], mgcv 1.0 [67] and xgboost 2.38 [56]. The DBN was implemented with Python 2.7.12, using Theano [68].

Acknowledgements

We thank Sovannrith Lay for helping to gather the data and Javier Grande for his careful review of the manuscript.

References

  • [1] M. S. El-Nasr, A. Drachen, and A. Canossa, “Game analytics,” New York, Sprint, 2013.
  • [2] G. N. Yannakakis and J. Togelius, Artificial Intelligence and Games.   Springer, 2017, http://gameaibook.org.
  • [3] J. G. De Gooijer and R. J. Hyndman, “25 years of time series forecasting,” International journal of forecasting, vol. 22, no. 3, pp. 443–473, 2006.
  • [4] P. J. Brockwell and R. A. Davis, Introduction to time series and forecasting.   Springer, 2016.
  • [5] R. Adhikari and R. Agrawal, “An introductory study on time series modeling and forecasting,” arXiv preprint arXiv:1302.6613, 2013.
  • [6] S. Asmussen and P. W. Glynn, Stochastic simulation: algorithms and analysis.   Springer Science & Business Media, 2007, vol. 57.
  • [7] Y. Carson and A. Maria, “Simulation optimization: methods and applications,” in Proceedings of the 29th conference on Winter simulation.   IEEE Computer Society, 1997, pp. 118–126.
  • [8] G. E. Box and G. M. Jenkins, Time series analysis: forecasting and control, revised ed.   Holden-Day, 1976.
  • [9] J. H. Friedman, “Greedy function approximation: a gradient boosting machine,” Annals of statistics, pp. 1189–1232, 2001.
  • [10] T. J. Hastie and R. J. Tibshirani, Generalized additive models.   CRC press, 1990, vol. 43.
  • [11] E. Busseti, I. Osband, and S. Wong, “Deep learning for time series modeling,” Technical report, Stanford University, 2012.
  • [12] C. Bauckhage, K. Kersting, R. Sifa, C. Thurau, A. Drachen, and A. Canossa, “How players lose interest in playing a game: An empirical study based on distributions of total playing times,” in Computational Intelligence and Games (CIG), 2012 IEEE conference on.   IEEE, 2012, pp. 139–146.
  • [13] F. Hadiji, R. Sifa, A. Drachen, C. Thurau, K. Kersting, and C. Bauckhage, “Predicting player churn in the wild,” in Computational intelligence and games (CIG), 2014 IEEE conference on.   IEEE, 2014, pp. 1–8.
  • [14] Á. Periáñez, A. Saas, A. Guitart, and C. Magne, “Churn Prediction in Mobile Social Games: Towards a Complete Assessment Using Survival Ensembles,” in Data Science and Advanced Analytics (DSAA), 2016 IEEE International Conference on.   IEEE, 2016, pp. 564–573.
  • [15] P. Bertens, A. Guitart, and Á. Periáñez, “Games and Big Data: A Scalable Multi-Dimensional Churn Prediction Model,” Submitted to IEEE CIG, 2017.
  • [16] C. Bauckhage, A. Drachen, and R. Sifa, “Clustering game behavior data,” IEEE Transactions on Computational Intelligence and AI in Games, vol. 7, no. 3, pp. 266–278, 2015.
  • [17] A. Drachen, R. Sifa, C. Bauckhage, and C. Thurau, “Guns, swords and data: Clustering of player behavior in computer games in the wild,” in Computational Intelligence and Games (CIG), 2012 IEEE Conference on.   IEEE, 2012, pp. 163–170.
  • [18] A. Drachen, C. Thurau, R. Sifa, and C. Bauckhage, “A comparison of methods for player clustering via behavioral telemetry,” arXiv preprint arXiv:1407.3950, 2014.
  • [19] R. Sifa, C. Bauckhage, and A. Drachen, “The playtime principle: Large-scale cross-games interest modeling,” in Computational Intelligence and Games (CIG), 2014 IEEE Conference on.   IEEE, 2014, pp. 1–8.
  • [20] A. Saas, A. Guitart, and Á. Periáñez, “Discovering playing patterns: Time series clustering of free-to-play game data,” in Computational Intelligence and Games (CIG), 2016 IEEE Conference on.   IEEE, 2016, pp. 1–8.
  • [21] G. E. Box and G. M. Jenkins, Time series analysis: forecasting and control, revised ed.   Holden-Day, 1976.
  • [22] K. D. Lawrence and M. D. Geurts, Advances in business and management forecasting.   Emerald Group Publishing, 2006, vol. 4.
  • [23] G. E. Box and D. R. Cox, “An analysis of transformations,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 211–252, 1964.
  • [24] H. Akaike, “A new look at the statistical model identification,” IEEE transactions on automatic control, vol. 19, no. 6, pp. 716–723, 1974.
  • [25] G. Schwarz, “Estimating the dimension of a model,” The annals of statistics, vol. 6, no. 2, pp. 461–464, 1978.
  • [26] J. G. Cragg, “Estimation and testing in time-series regression models with heteroscedastic disturbances,” Journal of Econometrics, vol. 20, no. 1, pp. 135–157, 1982.
  • [27] T. G. Dietterich, “Ensemble methods in machine learning,” in International workshop on multiple classifier systems.   Springer, 2000, pp. 1–15.
  • [28] L. Mason, J. Baxter, P. L. Bartlett, and M. R. Frean, “Boosting Algorithms as Gradient Descent.” in NIPS, 1999, pp. 512–518.
  • [29] L. Breiman, “Arcing the edge,” Technical Report 486, Statistics Department, University of California at Berkeley, Tech. Rep., 1997.
  • [30] G. Ridgeway, “Generalized boosted models: A guide to the gbm package,” Update, vol. 1, no. 1, p. 2007, 2007.
  • [31] A. Natekin and A. Knoll, “Gradient boosting machines, a tutorial,” Frontiers in neurorobotics, vol. 7, p. 21, 2013.
  • [32] T. Zhang and B. Yu, “Boosting with early stopping: Convergence and consistency,” The Annals of Statistics, vol. 33, no. 4, pp. 1538–1579, 2005.
  • [33] T. Hastie and R. Tibshirani, “Generalized additive models: some applications,” Journal of the American Statistical Association, vol. 82, no. 398, pp. 371–386, 1987.
  • [34] J. Maindonald, “Smoothing terms in GAM models,” 2010.
  • [35] K. Larsen, “GAM: The predictive modeling silver bullet,” Multithreaded. Stitch Fix, vol. 30, 2015.
  • [36] C. Chen, “Generalized additive mixed models,” Communications in Statistics-Theory and Methods, vol. 29, no. 5-6, pp. 1257–1271, 2000.
  • [37] N. E. Breslow and D. G. Clayton, “Approximate inference in generalized linear mixed models,” Journal of the American statistical Association, vol. 88, no. 421, pp. 9–25, 1993.
  • [38] S. N. Wood, “Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 73, no. 1, pp. 3–36, 2011.
  • [39] Y. Bengio, “Learning deep architectures for AI,” Foundations and trends® in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
  • [40] L. Deng and D. Yu, “Deep learning: methods and applications,” Foundations and Trends® in Signal Processing, vol. 7, no. 3–4, pp. 197–387, 2014.
  • [41] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
  • [42] A. Graves, A.-r. Mohamed, and G. Hinton, “Speech recognition with deep recurrent neural networks,” in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on.   IEEE, 2013, pp. 6645–6649.
  • [43] G. E. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning algorithm for deep belief nets,” Neural computation, vol. 18, no. 7, pp. 1527–1554, 2006.
  • [44] D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, “A learning algorithm for Boltzmann machines,” Cognitive Science, vol. 9, no. 1, pp. 147–169, 1985.
  • [45] H. Larochelle and Y. Bengio, “Classification using discriminative restricted Boltzmann machines,” in Proceedings of the 25th International Conference on Machine Learning.   ACM, 2008, pp. 536–543.
  • [46] M. Längkvist, L. Karlsson, and A. Loutfi, “A review of unsupervised feature learning and deep learning for time-series modeling,” Pattern Recognition Letters, vol. 42, pp. 11–24, 2014.
  • [47] G. Zhang, B. E. Patuwo, and M. Y. Hu, “Forecasting with artificial neural networks: The state of the art,” International journal of forecasting, vol. 14, no. 1, pp. 35–62, 1998.
  • [48] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
  • [49] A. Y. Ng, “Feature selection, L1 vs. L2 regularization, and rotational invariance,” in Proceedings of the Twenty-First International Conference on Machine Learning.   ACM, 2004, p. 78.
  • [50] R. J. Hyndman and A. B. Koehler, “Another look at measures of forecast accuracy,” International journal of forecasting, vol. 22, no. 4, 2005.
  • [51] A. J. Fox, “Outliers in time series,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 350–363, 1972.
  • [52] M. Sakurada and T. Yairi, “Anomaly detection using autoencoders with nonlinear dimensionality reduction,” in Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis.   ACM, 2014, p. 4.
  • [53] TimeAndDate.com. (2010) Japan National Holidays history. [Online]. Available: https://www.timeanddate.com/holidays/japan/
  • [54] (1995) Tokyo Daily Temperature history. [Online]. Available: https://www.wunderground.com/
  • [55] G. E. Box and D. R. Cox, “An analysis of transformations,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 211–252, 1964.
  • [56] T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.   ACM, 2016, pp. 785–794.
  • [57] P. H. Eilers and B. D. Marx, “Flexible smoothing with B-splines and penalties,” Statistical science, pp. 89–102, 1996.
  • [58] S. N. Wood, Generalized additive models: an introduction with R.   CRC press, 2017.
  • [59] ——, “Thin plate regression splines,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 65, no. 1, pp. 95–114, 2003.
  • [60] L. Prechelt, “Early stopping-but when?” in Neural Networks: Tricks of the trade.   Springer, 1998, pp. 55–69.
  • [61] M. Gilliland, U. Sglavo, and L. Tashman, Business Forecasting: Practical Problems and Solutions.   John Wiley & Sons, 2016.
  • [62] S. Makridakis and M. Hibon, “The M3-Competition: results, conclusions and implications,” International journal of forecasting, vol. 16, no. 4, pp. 451–476, 2000.
  • [63] J. Julkunen, “Feature Spotlight: In-Game Events and Market Trends,” 2016. [Online]. Available: http://www.gamerefinery.com/in-game-events-market-trends/
  • [64] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
  • [65] L. Dwyer, A. Gill, and N. Seetaram, Handbook of research methods in tourism: Quantitative and qualitative approaches.   Edward Elgar Publishing, 2012.
  • [66] R. Hyndman and Y. Khandakar, “Automatic time series forecasting: the forecast package for R,” 2008.
  • [67] S. Wood, “mgcv: Mixed GAM computation vehicle with GCV/AIC/REML smoothness estimation,” 2012.
  • [68] Theano Development Team, “Theano: A Python framework for fast computation of mathematical expressions,” arXiv e-prints, vol. abs/1605.02688, May 2016. [Online]. Available: http://arxiv.org/abs/1605.02688