From Non-Paying to Premium: Predicting User Conversion in Video Games with Ensemble Learning

06/25/2019 ∙ by Anna Guitart, et al. ∙ Yokozuna Data

Retaining premium players is key to the success of free-to-play games, but most of them do not start purchasing right after joining the game. By exploiting the exceptionally rich datasets recorded by modern video games--which provide information on the individual behavior of each and every player--survival analysis techniques can be used to predict what players are more likely to become paying (or even premium) users and when, both in terms of time and game level, the conversion will take place. Here we show that a traditional semi-parametric model (Cox regression), a random survival forest (RSF) technique and a method based on conditional inference survival ensembles all yield very promising results. However, the last approach has the advantage of being able to correct the inherent bias in RSF models by dividing the procedure into two steps: first selecting the best predictor to perform the splitting and then the best split point for that covariate. The proposed conditional inference survival ensembles method could be readily used in operational environments for early identification of premium players and the parts of the game that may prompt them to become paying users. Such knowledge would allow developers to induce their conversion and, more generally, to better understand the needs of their players and provide them with a personalized experience, thereby increasing their engagement and paving the way to higher monetization.


1. Introduction

The recent paradigm change in video games (games are now always online or have an online playing option) has driven a change in game monetization. A new business model has emerged: free-to-play or freemium games that can be acquired and played for free and only charge users for additional in-game content. Today a vast majority of mobile games follow this pricing strategy (Annie, 2013; Fields, 2014), and even traditional PC and platform games are relying more and more on extra content to be purchased online as a source of revenue.

Identifying and retaining high-value players is crucial for successful monetization, especially in the case of freemium games (Periáñez et al., 2016). Previous research along these lines focused on predicting lifetime value (the amount a player will spend on purchases before leaving the game) (Sifa et al., 2018; Chen et al., 2018) and churn—by trying to foresee what players are going to leave the game (Jun Ding and Chen, 1996; Kawale et al., 2009; Hadiji et al., 2014; Rothenbuehler et al., 2015; Runge et al., 2014) and when they are going to do it (Periáñez et al., 2016; Bertens et al., 2017; Kim et al., 2018; Chen et al., 2019). The main idea behind these works is that pinpointing premium players who are likely to churn would allow developers to take steps to increase their lifetime in the game, since retention strategies are usually cheaper than acquisition campaigns (Fields, 2014).

In this paper we entertain a similar idea: that the ability to predict what players have the potential to become paying users (PUs) and when (or at what game level) they are more likely to start purchasing would allow developers to take steps to induce their conversion. And this ability could lead to a significant increase in monetization, since getting users to purchase remains challenging even for big games: up to 70% of players quit the game without having spent any money (Geron, 2013). For example, a game may be very engaging (very high retention rates) but present poor user conversion rates.

Once the game already has a base of users actively engaged in purchasing and/or a continuous flow of conversions from non-PUs to PUs, player retention strategies come into play.

Another related issue is spotting the existing PUs who have the potential to become whales (top spenders). These are the most valuable players, typically providing up to 50% of the total revenue of the game despite accounting for less than 1% of the total number of players (Johnson, 2014), and thus their early identification is of the utmost importance.

To tackle this conversion prediction problem, we will apply survival analysis, a set of statistical methods used to estimate the time it takes for a certain event of interest (in our case, becoming a PU) to happen. We will explore three different approaches (the traditional Cox regression model, a random survival forest (RSF) technique and a method based on conditional inference survival ensembles) and provide predictions in terms of the number of days, in-game levels and cumulative playtime before a certain user becomes a PU. It is worth noting that, contrary to churn prediction in casual games (where the churn definition is not straightforward (Hadiji et al., 2014; Periáñez et al., 2016)), in this case the event of interest is clearly defined: it occurs the moment the player makes a purchase.

The prediction of conversion times has been thoroughly investigated in other fields, such as e-commerce (Cui et al., 2018) or medicine and healthcare (Wu et al., 2018), with some works also making use of survival analysis techniques. For instance, in (Ji et al., 2017) a conversion prediction model, together with a recommendation system, is proposed in connection to e-commerce websites, while the authors of (Wang et al., 2013) modeled career switches using the proportional hazards model.

In the context of video games, previous research about conversion treats it as a binary classification problem (Sifa et al., 2015), where players are divided into potential and non-potential PUs through traditional machine learning techniques, such as support vector machines, decision trees and random forests.

1.1. Our Contribution

Previous studies have already shown the application of survival analysis to video games for predicting churn (Periáñez et al., 2016; Bertens et al., 2017) but, to the best of our knowledge, this is the first paper using a survival approach to predict conversion times in the context of video games.

2. Survival Analysis Models

Survival analysis (Clark TG, 2003) was introduced to address time-to-event regression problems characterized by having incomplete or partially labeled data. This set of methods focuses on estimating the remaining lifetime of an individual until a specific event happens, given a set of predictor (explanatory) variables. Traditionally, the event of interest used to be death or organ failure, as these techniques were first applied in the biological and medical fields (Hougaard, 1999). In this work, the event of interest is becoming a PU. The time to the event of interest cannot be determined until it happens and hence not all individuals can be labeled, a situation known as censoring. A special type of time-to-event models considers the existence of competing risks (Prentice et al., 1978), events which impede the observation or affect the probability of occurrence of the event of interest.

The outcome of survival models is the survival probability curve for each individual, which indicates the probability that the event has not happened yet (i.e. that the user is still alive) at a certain time point.

However, for a more intuitive understanding, in this study we will depict the cumulative incidence function, which gives the probability that the event of interest—becoming a PU—does happen.
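As a rough illustration of how such curves can be obtained from censored data, the following Python sketch computes a Kaplan–Meier survival curve and the corresponding cumulative incidence (one minus survival) for a toy sample. The function name `kaplan_meier` and the data are hypothetical, and this is not the authors' implementation (the analyses in this paper were carried out in R).

```python
from typing import List, Tuple


def kaplan_meier(durations: List[float], converted: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) pairs at each distinct event time.

    durations[i] is the observed time for player i; converted[i] is True if the
    player became a PU at that time and False if the observation is censored.
    """
    event_times = sorted({t for t, e in zip(durations, converted) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Players still under observation (no event, not censored) just before t.
        at_risk = sum(1 for d in durations if d >= t)
        # Conversions observed exactly at t.
        events = sum(1 for d, e in zip(durations, converted) if e and d == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


# Toy data (hypothetical): days observed and whether a first purchase occurred.
days = [2, 3, 3, 5, 8, 8, 10, 12]
became_pu = [True, False, True, True, False, True, False, False]

km_curve = kaplan_meier(days, became_pu)
# Cumulative incidence: probability that the conversion has already happened.
cumulative_incidence = [(t, 1.0 - s) for t, s in km_curve]
```

Censored players contribute to the at-risk counts up to their last observed day but never count as events, which is how partially labeled data enter the estimate.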

The predicted time-to-event is derived from the survival curves: it is identified with the median survival time, the time for which survival probability gets down to 50%. The survival function $S(t)$ is related to the hazard function $h(t)$, defined as the ratio of the probability density function $f(t)$ to the survival function:

$$h(t) = \frac{f(t)}{S(t)} \tag{1}$$
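The link between the hazard and survival functions can be made explicit with a short derivation. It assumes a continuous survival function with $S(0) = 1$, uses $f(t) = -dS(t)/dt$, and writes $H(t)$ for the cumulative hazard (a symbol introduced here for convenience):

```latex
h(t) = \frac{f(t)}{S(t)} = -\frac{d}{dt}\log S(t)
\quad\Longrightarrow\quad
S(t) = \exp\!\left(-\int_0^t h(u)\,du\right) = \exp\bigl(-H(t)\bigr).
```

Under these assumptions, the median survival time $t_{1/2}$ used as the predicted time-to-event satisfies $S(t_{1/2}) = 0.5$, or equivalently $H(t_{1/2}) = \ln 2$.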

In this paper we focus on comparing the performance of a semi-parametric model (the Cox proportional hazards model) to that of more recent survival ensemble techniques, such as the conditional inference survival ensembles and random survival forest methods. For the latter, we also tested the inclusion of competing risks. These models are presented in the following sections.

2.1. Cox Regression

The Cox proportional hazards or Cox regression model (Cox, 1972; Cox and Oakes, 1984; David, 1972) is a survival model that assumes a multiplicative relation between covariates and hazard:

$$h(t \mid x_i) = h_0(t) \exp\left( x_i^{\top} \beta \right) \tag{2}$$

Here $h_0(t)$ is the baseline hazard function, $\beta$ is an unknown vector of regression coefficients (parameters) and $x_i$ are the covariates for each individual $i$, with $i = 1, \ldots, M$.

Cox regression is a very popular method and is frequently used in survival analysis due to its flexibility as a semiparametric model. The hazard function is estimated in a distribution-free manner from the data, and there exists a linear-exponential parametric relationship between the predictors and the outcome.
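The multiplicative structure can be illustrated with a small Python sketch. It assumes the coefficients and baseline values are already known (in practice they are estimated from data); the function names, coefficients and covariates below are hypothetical.

```python
import math
from typing import Sequence


def cox_hazard(baseline_hazard: float, covariates: Sequence[float], beta: Sequence[float]) -> float:
    """Hazard h(t | x) = h0(t) * exp(x . beta) for a given baseline value h0(t)."""
    linear_predictor = sum(x * b for x, b in zip(covariates, beta))
    return baseline_hazard * math.exp(linear_predictor)


def cox_survival(baseline_survival: float, covariates: Sequence[float], beta: Sequence[float]) -> float:
    """Survival S(t | x) = S0(t) ** exp(x . beta), implied by the hazard above."""
    linear_predictor = sum(x * b for x, b in zip(covariates, beta))
    return baseline_survival ** math.exp(linear_predictor)


# Hypothetical coefficients and two hypothetical players.
beta = [0.5, -0.2]
player_a = [2.0, 1.0]
player_b = [0.0, 1.0]

# The hazard ratio between two players does not depend on the baseline hazard,
# i.e. it is the same at every time point (the proportional hazards assumption).
ratio_early = cox_hazard(0.01, player_a, beta) / cox_hazard(0.01, player_b, beta)
ratio_late = cox_hazard(0.20, player_a, beta) / cox_hazard(0.20, player_b, beta)
```

Both ratios equal $\exp(0.5 \cdot 2) = e$ here, since the two players differ only in the first covariate.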

2.2. Conditional Inference Survival Ensembles

The conditional inference survival ensembles (also known as conditional inference forest) model is a fully non-parametric tree-based method used in survival analysis. It is based on the Breiman random forest (Breiman, 2001), but uses conditional inference trees (instead of the usual decision trees) as base learners (Hothorn et al., 2006). The splitting at each node is performed in two steps: (1) the optimal split variable is selected based on its correlation with the output, and (2) the best split point for that covariate—the one that maximizes the survival difference among daughter nodes—is determined using two-sample linear statistics.
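The two-step splitting idea can be sketched in Python as follows. This is a deliberately simplified stand-in, not the conditional inference procedure itself: it assumes an uncensored numeric response, uses the absolute Pearson correlation for step (1) instead of a permutation-test p-value, and uses a between-group sum of squares as the two-sample statistic in step (2). All names and data are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient (0.0 if either variable is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def two_step_split(features: Dict[str, List[float]], response: List[float]) -> Tuple[str, float]:
    # Step 1: select the covariate most strongly associated with the response.
    best_var = max(features, key=lambda name: abs(pearson(features[name], response)))

    # Step 2: for that covariate only, pick the cut that maximizes a
    # two-sample statistic (here: between-group sum of squares).
    xs = features[best_var]
    values = sorted(set(xs))
    n = len(xs)
    best_cut, best_stat = values[0], -1.0
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2
        left = [y for x, y in zip(xs, response) if x <= cut]
        right = [y for x, y in zip(xs, response) if x > cut]
        diff = sum(left) / len(left) - sum(right) / len(right)
        stat = len(left) * len(right) / n * diff ** 2
        if stat > best_stat:
            best_cut, best_stat = cut, stat
    return best_var, best_cut


# Toy data (hypothetical): the response jumps when "level" reaches 10.
features = {
    "level": [float(i) for i in range(20)],
    "noise": [float((i * 7) % 20) for i in range(20)],
}
response = [0.0] * 10 + [10.0] * 10
chosen_variable, split_point = two_step_split(features, response)
```

Because the variable is chosen before any split point is examined, a covariate does not gain an advantage merely by offering more candidate cuts.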

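Conditional inference forests use a weighted Kaplan–Meier estimate (Hothorn et al., 2004; Mogensen et al., 2012) to construct the survival function (Mogensen et al., 2012; Periáñez et al., 2016):

$$\hat{S}_{\mathrm{cond}}(t \mid x_i) = \prod_{r \le t} \left( 1 - \frac{\sum_{n=1}^{N} T_n(dr, x_i)}{\sum_{n=1}^{N} Q_n(r, x_i)} \right) \tag{3}$$

where $n = 1, \ldots, N$, with $N$ the number of trees within the ensembles, and $x_i$ are the covariates for the $i$th subject, with $i = 1, \ldots, M$. In the node where $x_i$ is located, $T_n$ represents the uncensored events until time $t$, and $Q_n$ stands for the number of individuals at risk at $t$.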
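A minimal Python sketch of this pooled (weighted) Kaplan–Meier aggregation is given below. It assumes that, for one subject, each tree already provides the event and at-risk counts of the terminal node where the subject falls, on a common grid of time points; the function name and the counts are hypothetical.

```python
from typing import List


def pooled_km_survival(events: List[List[int]], at_risk: List[List[int]]) -> List[float]:
    """Survival curve from counts pooled over trees before taking the product.

    events[n][k] and at_risk[n][k] are the number of uncensored events and the
    number of individuals at risk at the k-th time point, in the terminal node
    of tree n where the subject of interest is located.
    """
    survival = 1.0
    curve = []
    for k in range(len(events[0])):
        pooled_events = sum(tree[k] for tree in events)
        pooled_at_risk = sum(tree[k] for tree in at_risk)
        if pooled_at_risk > 0:
            survival *= 1.0 - pooled_events / pooled_at_risk
        curve.append(survival)
    return curve


# Toy counts (hypothetical) for two trees and three time points.
node_events = [[1, 0, 1], [1, 1, 0]]
node_at_risk = [[4, 3, 2], [4, 2, 2]]
pooled_curve = pooled_km_survival(node_events, node_at_risk)
```

Pooling the counts first means that trees whose terminal node contains more individuals at risk carry more weight in the estimate.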

2.3. Random Survival Forest

The random forest algorithm was first described in (Breiman, 2001). It consists of an ensemble of decision trees trained using bootstrap samples from the total set, with selection of the splitting variable at each node being random. The split point is taken as the one that maximizes a predefined splitting criterion (often, the Gini impurity measure (Breiman et al., 1984)). The selection of the split variable and split point is performed at the same step, which gives rise to a relatively biased model that favors variables with many possible split points. The survival extension of this method is called random survival forest (Ishwaran et al., 2008).

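The ensemble is constructed using tree-based Nelson–Aalen estimators (Ishwaran et al., 2008):

$$\hat{H}_n(t \mid x_i) = \int_0^t \frac{T_n(dr, x_i)}{Q_n(r, x_i)} \tag{4}$$

and the ensemble survival function is

$$\hat{S}_{\mathrm{rsf}}(t \mid x_i) = \exp\left( -\frac{1}{N} \sum_{n=1}^{N} \hat{H}_n(t \mid x_i) \right) \tag{5}$$

where the variables have the same meaning as in (3): $N$ is the number of trees and, in the node of tree $n$ where the covariates $x_i$ of the $i$th subject are located, $T_n$ counts the uncensored events until time $t$ and $Q_n$ is the number of individuals at risk at $t$.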
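The aggregation used by random survival forests can be sketched in Python as follows: a Nelson–Aalen cumulative hazard is computed separately for each tree and the hazards are averaged before exponentiating. The sketch assumes per-tree terminal-node counts on a common time grid are already available; the function name and the counts are hypothetical.

```python
import math
from typing import List


def rsf_survival(events: List[List[int]], at_risk: List[List[int]]) -> List[float]:
    """Ensemble survival curve from per-tree Nelson-Aalen cumulative hazards.

    events[n][k] and at_risk[n][k] are the number of uncensored events and the
    number of individuals at risk at the k-th time point, in the terminal node
    of tree n where the subject of interest is located.
    """
    n_trees = len(events)
    cumulative_hazard = [0.0] * n_trees  # one running Nelson-Aalen sum per tree
    curve = []
    for k in range(len(events[0])):
        for n in range(n_trees):
            if at_risk[n][k] > 0:
                cumulative_hazard[n] += events[n][k] / at_risk[n][k]
        # Average the tree-level hazards, then map to a survival probability.
        ensemble_hazard = sum(cumulative_hazard) / n_trees
        curve.append(math.exp(-ensemble_hazard))
    return curve


# Toy counts (hypothetical) for two trees and three time points.
node_events = [[1, 0, 1], [1, 1, 0]]
node_at_risk = [[4, 3, 2], [4, 2, 2]]
rsf_curve = rsf_survival(node_events, node_at_risk)
```

Here every tree contributes equally to the average, regardless of how many individuals its terminal node contains.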

This model, like the previously described ensemble model, is fully non-parametric, which offers an advantage over approaches that assume a particular functional form, such as Cox regression.

2.4. Random Survival Forest with Competing Risks

This is an extension of the random survival forest method explained in the previous section in which competing risks are considered (Ishwaran et al., 2014). Throughout this work, we assume the main reason that prevents the event of interest from happening (i.e. that prevents players from becoming PUs) is a lack of interest in purchasing. However, now we will also take into account the fact that players may not become PUs because they churn (leave the game) first. Thus, we have two events of interest that conflict with each other: becoming a PU and churning, see Figure 1. We will only consider player information until one of these two events occurs, as once a user has churned she obviously cannot become a PU anymore.

Figure 1. Example of right-censored data (for 10 users over 30 days of lifetime) considering churn as a competing risk. Players may become PUs (circles) or churn (triangles) at some point. If neither of these two events occurs within the observation period, then the data is censored (crosses).

Including competing risks affects the splitting rules used to grow the survival trees, and the values computed in each terminal node of the ensemble become event-specific (Ishwaran et al., 2014).

For random forests with competing risks, a competing risk tree is grown for each bootstrap sample and the node is split using the best covariate—the one that maximizes the competing risk splitting rule.

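The cumulative event-specific hazard function for each event $j$ considering a Nelson–Aalen estimator is given by

$$\hat{H}_j(t) = \sum_{k=1}^{m(t)} \frac{d_j(t_k)}{Q(t_k)} \tag{6}$$

where $m(t) = \max\{k : t_k \le t\}$ and $d_j(t_k) = \sum_{i=1}^{M} I(\tau_i = t_k, \delta_i = j)$ is the number of type-$j$ events at time $t_k$ for all individuals $i = 1, \ldots, M$, with $\delta_i$ being the corresponding event indicator. (The total number of events occurring at time $t_k$ is denoted as $d(t_k)$.) Here $t_1 < t_2 < \cdots$ are the distinct event times, $\tau_i$ is the observed time of individual $i$ and $Q(t_k)$ is the number of individuals still at risk at $t_k$.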
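An event-specific Nelson–Aalen estimate of this kind can be sketched in Python for a single group of players (for instance, one terminal node). The event codes, function name and toy data are hypothetical.

```python
from typing import List

CENSORED, PURCHASE, CHURN = 0, 1, 2  # hypothetical event codes


def event_specific_cumulative_hazard(
    times: List[float], events: List[int], event_type: int, t: float
) -> float:
    """Nelson-Aalen cumulative hazard at time t for one event type.

    times[i] is the observed time of player i and events[i] its event code.
    """
    # Distinct times, up to t, at which any (non-censored) event occurred.
    event_times = sorted({u for u, e in zip(times, events) if e != CENSORED and u <= t})
    total = 0.0
    for t_k in event_times:
        # Players who have experienced no event and are still observed at t_k.
        at_risk = sum(1 for u in times if u >= t_k)
        # Events of the requested type happening exactly at t_k.
        d_j = sum(1 for u, e in zip(times, events) if u == t_k and e == event_type)
        total += d_j / at_risk
    return total


# Toy data (hypothetical): five players observed for up to four days.
obs_times = [1, 2, 2, 3, 4]
obs_events = [PURCHASE, CHURN, PURCHASE, CENSORED, PURCHASE]

hazard_purchase = event_specific_cumulative_hazard(obs_times, obs_events, PURCHASE, t=4)
hazard_churn = event_specific_cumulative_hazard(obs_times, obs_events, CHURN, t=4)
```

Note that a player who churns leaves the risk set for purchases as well, which is how the competing event affects the conversion hazard.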

3. Datasets

The work presented in this article focuses on the analysis of two datasets from two different game titles: Age of Ishtaria (hereafter, AoI) and Grand Sphere (hereafter, GS). Both titles are role-playing card battle games developed by Silicon Studio and very popular in Japan; they are very similar, although the first one has a larger number of active players. The data comprise daily records of each player's activity (playtime, actions, sessions, etc.) and were collected between January 2015 and February 2017 for AoI and between June 2017 and May 2018 for GS. During these periods, neither of the games experienced major changes that might have influenced the data, see (Kim et al., 2018; Chen et al., 2019).

Only a small percentage of users will eventually become PUs, a pattern that can be observed in Figures 2 and 3. These figures show the inverse of the Kaplan–Meier estimates for the probability of surviving as a non-paying user, i.e., they show the probability of becoming a PU in terms of the number of days, level achieved and accumulated playtime, both for the total population of players (Figure 2) and considering only PUs (Figure 3). Looking at the probability in terms of the number of days (Figure 2, left), we see that only around 25% or less of all players end up becoming PUs. In the plots for the number of game levels (center) and cumulative playtime (right) to become a PU, final percentages are higher, as the few players who reach higher levels or longer playtimes are mostly PUs. This does not happen for the probability in terms of the number of days though: even if players stay in the game for a very long time, only a few of them will become premium users.

Figure 2. Cumulative incidence functions, showing the probability of becoming a PU as a function of the number of days since registration (left), game level (center) and cumulative playtime (right) for all players in the games AoI (top) and GS (bottom). The shaded area represents the 95% confidence interval.

Figure 3. Cumulative incidence functions, showing the probability of becoming a PU as a function of the number of days since registration (left), game level (center) and cumulative playtime (right), for PUs only, in the games AoI (top) and GS (bottom). The shaded area represents the 95% confidence interval.

We considered only players who logged in at least 2 days in the game, thus discarding new players. In freemium games, every day there are typically many new registered users, most of whom will not connect a second day—they are one-time comers. However, in operational settings, complete data from the first connection day is not available until the day has ended. Therefore, predicting the behavior of newcomers requires a different approach that is beyond the scope of this paper. By removing these new players, class imbalance is also reduced, as the vast majority of them will never become PUs. For non-newcomers, the percentage of PUs in our datasets was 5.32% for AoI and 5.30% for GS.

Our sample comprised 30,000 users for AoI and 10,000 users for GS.

To perform the data splitting into train and test sets, we took random samples, ensuring that the proportion of PUs was similar in both sets; 30% of players were assigned to the training set and the remaining 70% constituted the test sample.

One of the aims of this exercise was to test if our models could provide accurate prediction results in an operational environment—where datasets can be huge—when trained with just a small subset of the total data. This is why we used a training set much smaller than the test set.
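A stratified 30/70 split of this kind could be sketched in Python as follows. The function name, the seed and the toy labels are hypothetical, and the exact sampling procedure used in the paper is not specified beyond the description above.

```python
import random
from typing import List, Tuple


def stratified_split(
    is_pu: List[bool], train_fraction: float = 0.3, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split player indices so that PUs and non-PUs keep similar proportions."""
    rng = random.Random(seed)
    train, test = [], []
    for label in (True, False):
        # Sample within each class separately to preserve the PU proportion.
        indices = [i for i, flag in enumerate(is_pu) if flag == label]
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Toy labels (hypothetical): 1,000 players, 5% of whom are PUs.
labels = [i % 20 == 0 for i in range(1000)]
train_idx, test_idx = stratified_split(labels)

train_pu_share = sum(labels[i] for i in train_idx) / len(train_idx)
test_pu_share = sum(labels[i] for i in test_idx) / len(test_idx)
```

With these toy labels both sets end up with a 5% share of PUs, while the training set holds only 30% of the players.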

3.1. Response Variables

The implemented models were trained to predict the number of days to become a PU, the level at which each player will become a PU and the number of hours she will play until then. Similarly to (Bertens et al., 2017), we used the following response variables:

  • Lifetime: Number of days since the user’s registration date.

  • Level: Latest game level reached by the player.

  • Playtime: Number of hours played by the user.

In all cases, the censored variable was whether the player became a PU or not. When including competing risks, there is an additional event to consider: whether the user churned before becoming a PU. For conversions, the event definition is straightforward: the event takes place as soon as the player makes her first purchase. In the case of churn, the definition is not as clear, and the event is usually assumed to happen after a certain inactivity period that may vary from game to game. This has been already discussed in depth in (Periáñez et al., 2016; Bertens et al., 2017; Chen et al., 2018).
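The event definitions above can be turned into labels with a few lines of Python. The sketch below is only one possible reading: the 10-day inactivity threshold is a hypothetical placeholder (the text notes that it varies from game to game), days are counted from registration, and a purchase made after a long inactivity gap is still treated as a conversion.

```python
from typing import List, Optional, Tuple


def label_player(
    active_days: List[int],
    first_purchase_day: Optional[int],
    last_observed_day: int,
    inactivity_threshold: int = 10,  # hypothetical value, game dependent
) -> Tuple[int, str]:
    """Return (time, status) with status in {'pu', 'churn', 'censored'}.

    active_days are the days (since registration) on which the player logged in.
    """
    if first_purchase_day is not None:
        # Conversion: the event happens at the first purchase.
        return first_purchase_day, "pu"
    last_active = max(active_days)
    if last_observed_day - last_active >= inactivity_threshold:
        # Competing event: the player is assumed to have left the game.
        return last_active, "churn"
    # Neither event observed within the data window: right-censored.
    return last_observed_day, "censored"


converted = label_player([0, 1, 2], first_purchase_day=2, last_observed_day=30)
churned = label_player([0, 1, 5], first_purchase_day=None, last_observed_day=30)
still_playing = label_player([0, 10, 28], first_purchase_day=None, last_observed_day=30)
```

When competing risks are not modeled, the `churn` and `censored` cases would both be treated as censored observations.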

3.2. Feature Selection

We considered features not related to the peculiarities of the games and that can be measured in practically any title, as having game-independent features makes it easier to apply our research to real business environments. They were mainly based on playtime and actions/sessions, and several statistical operations (averaging playtime, etc.) were performed to obtain the final static features. We also explored features related to user level, as most games have some measure of in-game progression (e.g. game or player level). For each outcome—number of days, level, cumulative playtime—we selected the features that best modeled every output through a feature engineering process.

4. Modeling

4.1. Model Specification

For the ensemble methods (the conditional inference survival ensembles model and the random survival forest model, either with or without competing risks) we selected 900 trees to be used as base learners.

As validation metrics, we used the root mean square logarithmic error (RMSLE) between the observed and predicted values, the false positive rate (percentage of players in the validation sample who were predicted to become PUs but churned before doing so) and the false negative rate (players who became PUs despite not being predicted to do so). Scatter plots of predicted vs. observed variables are also examined.
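These metrics can be sketched in Python as follows. Two assumptions are made here that the text does not spell out: the RMSLE uses the common $\log(1 + x)$ form, and both error rates are expressed as percentages of the whole validation sample. Function names and data are hypothetical.

```python
import math
from typing import List, Tuple


def rmsle(observed: List[float], predicted: List[float]) -> float:
    """Root mean square logarithmic error, using log(1 + x) (an assumption)."""
    squared = [(math.log1p(p) - math.log1p(o)) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def error_rates(predicted_pu: List[bool], became_pu: List[bool]) -> Tuple[float, float]:
    """Return (false positive %, false negative %) over the validation sample."""
    n = len(became_pu)
    false_pos = sum(1 for p, a in zip(predicted_pu, became_pu) if p and not a)
    false_neg = sum(1 for p, a in zip(predicted_pu, became_pu) if a and not p)
    return 100.0 * false_pos / n, 100.0 * false_neg / n


# Toy values (hypothetical): observed vs. predicted days until conversion.
observed_days = [3.0, 10.0, 30.0]
predicted_days = [4.0, 8.0, 45.0]
error = rmsle(observed_days, predicted_days)

fp_rate, fn_rate = error_rates(
    predicted_pu=[True, True, False, False],
    became_pu=[True, False, True, False],
)
```

Because it works on logarithms, the RMSLE penalizes relative rather than absolute deviations, so an error of a few days matters more for early conversions than for late ones.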

4.2. Results

The results for all different models and variables (lifetime, level and playtime) are summarized in Table 1. Scatter plots comparing observed and predicted values for players that did become PUs are shown in Figure 4, whereas Figure 5 displays the corresponding log-log scatter plots. The latter are probably more illustrative, as using logarithms allows a close-up look at small values of the observed and predicted quantities while preventing a visual overpenalization by errors at large values.

Considering the identification of potential PUs (regardless of when the conversion occurs), all models give accurate results, as inferred from the low rates of false negatives and false positives in Table 1. All methods also provide reasonable predictions of when the conversion will take place in terms of the three variables, thus confirming the suitability of survival analysis to explore this problem. Overall results for the semi-parametric Cox regression model show relatively larger errors, across most variables and games, as compared to the ensemble approaches.

The three ensemble methods yield comparable results in general. It is worth noting that the model including competing risks does not outperform the others. This probably indicates that churn is not a competing risk in nature, i.e. non-PUs with a high risk of churning very rarely become PUs and, conversely, players with a high probability of becoming PUs are normally not considering quitting the game. Taking churn into account does slightly reduce the rate of false positives, as would be expected, but produces a larger increase in the rate of false negatives (except for playtime in AoI). In regard to when conversions will occur (for those players that do go on to become PUs), including competing risks results in less accurate predictions except for lifetime in GS.

The RSF model yields slightly better lifetime and level predictions than conditional inference survival ensembles in both games, but performs significantly worse for playtime. In particular, conversions that occur after a very long playtime are only predicted by the conditional inference survival ensembles model, as can be seen in the scatter plots shown in Figures 4 and 5. This is of the utmost importance for the problem under consideration, as one of the obvious applications of this analysis would be to individually target potential PUs in order to accelerate their conversion. Even when the conversion happens after a short playtime, both the random survival forest and Cox regression models exhibit very obvious biases, yielding prediction values that are systematically lower than the actual outcomes.

For level predictions, however, the RSF model produces better results across all scales in both games. The scatter plots in Figures 4 and 5 also reveal the inability of all models to predict conversions in the first levels of the game (where player progression is typically very quick). This has however hardly any practical relevance: in these first stages of the game, conversions are almost immediate in terms of lifetime and playtime, so early detection of the potential of these players adds very little value. Similarly, although RSFs also provide overall better predictions for lifetime, this is due mainly to their better performance in cases when conversion takes place early on, which thus have limited impact for practical purposes. Note also that (although this effect is smaller in the case of the RSF method) all models are biased in that they tend to predict higher levels of conversion than actually observed. This is also the case for playtime predictions using conditional inference ensembles.

Scatter plots for GS are similar to those shown for AoI—as suggested by the results of Table 1—and thus they are not included.

Figure 4. Scatter plots of observed vs. predicted “times” for the occurrence of the event becoming a PU for AoI players. We consider three different time measures—lifetime (left), game level (center) and playtime (right)—and three different models—conditional inference survival ensembles (top), random survival forest (middle) and Cox regression (bottom). Predictions correspond to the median survival values.
Figure 5. Log-log scatter plots of observed vs. predicted “times” for the occurrence of the event becoming a PU for AoI players. We consider three different time measures—lifetime (left), game level (center) and playtime (right)—and three different models—conditional inference survival ensembles (top), random survival forest (middle) and Cox regression (bottom). Predictions correspond to the median survival values. The logarithm transformation provides a close-up look at the spread of the data points (cf. Figure 4).

Age of Ishtaria (AoI)

| Model | RMSLE: Lifetime | RMSLE: Level | RMSLE: Playtime | False Negatives: Lifetime | False Negatives: Level | False Negatives: Playtime | False Positives: Lifetime | False Positives: Level | False Positives: Playtime |
|---|---|---|---|---|---|---|---|---|---|
| Conditional inference survival ensembles | 0.54 | 0.69 | 0.47 | 0.27% | 0.84% | 0.60% | 3.68% | 4.02% | 4.02% |
| Random survival forest | 0.45 | 0.50 | 0.71 | 0.18% | 1.08% | 1.01% | 3.70% | 3.32% | 3.42% |
| Random survival forest (competing risks) | 0.50 | 0.63 | 0.85 | 0.61% | 3.21% | 0.58% | 3.41% | 1.17% | 3.27% |
| Cox regression | 1.08 | 1.00 | 0.79 | 12.22% | 1.69% | 2.34% | 3.75% | 4.19% | 2.30% |

Grand Sphere (GS)

| Model | RMSLE: Lifetime | RMSLE: Level | RMSLE: Playtime | False Negatives: Lifetime | False Negatives: Level | False Negatives: Playtime | False Positives: Lifetime | False Positives: Level | False Positives: Playtime |
|---|---|---|---|---|---|---|---|---|---|
| Conditional inference survival ensembles | 1.00 | 0.77 | 0.48 | 1.74% | 0.58% | 1.31% | 1.54% | 3.09% | 2.97% |
| Random survival forest | 0.58 | 0.63 | 0.79 | 1.78% | 1.07% | 1.17% | 1.62% | 2.47% | 2.38% |
| Random survival forest (competing risks) | 0.34 | 0.92 | 0.83 | 2.42% | 2.89% | 3.39% | 1.07% | 0.59% | 2.30% |
| Cox regression | 2.71 | 1.23 | 0.85 | 3.07% | 3.66% | 3.46% | 1.70% | 3.36% | 3.11% |

Table 1. Validation results for all models and variables considered. (RMSLE: Root mean square logarithmic error.)

5. Summary and Conclusion

Our results show that survival analysis is a suitable framework to study user conversion in video games. We implemented several survival analysis methods, including three ensemble-based approaches, to determine the time, number of levels and accumulated playtime that non-paying players need to become PUs in two different free-to-play games. Historical data is included in the models at the individual level, as the aim of this work is to provide prediction results for each user.

All models are very good at detecting potential PUs and provide fairly accurate time-to-event predictions in terms of days after first login, game level and playtime. Ensemble models outperform the classical semi-parametric Cox regression model across most validation metrics, variables and games. They are also particularly well suited for operational settings, as they can be easily parallelized and thus admit a scalable implementation.

Among the different ensemble approaches considered, the RSF method yields slightly better predictions in terms of lifetime and level, but critically fails at predicting playtime for those players who only start purchasing after having played for a very long time. Including churn as a competing risk does not have any clear positive impact. Moreover, RSFs are notorious for their proneness to introducing biases, as they favour variables with many splitting points. These results point to conditional inference survival ensembles as the most viable model in controlled production settings.

This work represents a step toward the personalization of the game experience in that it serves to target players individually, not only based on their current or past actions but also on their expected future behavior. Game developers and planners could use these methods to automatically determine who is likely to become a premium player and when she is likely to start behaving as such. This information can then be used to tailor the game experience of players with several goals in mind. Actions can be taken on players that have potential to become PUs to ensure they remain long enough in the game for the conversion to take place. Actions can also be taken to motivate each user at the precise moment or adequate stage of the game instead of targeting them too early on, when, for example, notifications or discounts are more likely to bother and disengage the players than to produce the conversion. These predictions also bring attention to those players who are not expected to become PUs in the near future, so as to try to accelerate their conversion if and when possible.

Future extensions of this work include applying the same approach to identify potential top spenders among the already existing PUs, and to detect conversions between different types of purchasing behavior, which should enable further personalization and increased monetization. For example, while for frequent spenders with low average outlay the goal would be to increase the latter, for players that seldom make purchases, efforts directed toward raising their purchasing frequency will probably be more effective.

6. Software

All analyses were performed using R version 3.4.4 for Linux and the following packages from the Comprehensive R Archive Network (CRAN): party (version 1.3-0) (Hothorn et al., 2010; Hothorn et al., 2015), survival (version 2.42-6) (Therneau and Lumley, 2015), survminer (version 0.4.3) (Kassambara et al., 2017b, a), ROCR (version 1.0-7) (Sing et al., 2005a, b), randomForestSRC (version 2.8.0) (Ishwaran et al., 2019) and peperr (version 1.1-7) (Porzelius et al., 2019).

Acknowledgements.
We thank Javier Grande for his careful review of the manuscript.

References

  • Annie (2013) App Annie. 2013. App Annie and IDC Mobile App Advertising and Monetization Trends. http://go.appannie.com/mobile-app-advertising-and-monetization-trends-2013-2018/. (2013).
  • Bertens et al. (2017) Paul Bertens, Anna Guitart, and África Periáñez. 2017. Games and Big Data: A Scalable Multi-Dimensional Churn Prediction Model. In 2017 IEEE Conference on Computational Intelligence and Games (CIG). IEEE, 33–36. https://doi.org/10.1109/CIG.2017.8080412
  • Breiman (2001) Leo Breiman. 2001. Random forests. Machine learning 45, 1 (2001), 5–32.
  • Breiman et al. (1984) Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone. 1984. Classification and regression trees. Wadsworth Int. Group 37, 15 (1984), 237–251.
  • Chen et al. (2019) Pei Pei Chen, Anna Guitart, and África Periáñez. 2019. The Winning Solution to the IEEE CIG 2017 Game Data Mining Competition. Machine Learning and Knowledge Extraction 1, 1 (2019), 252–264. https://doi.org/10.3390/make1010016
  • Chen et al. (2018) Pei Pei Chen, Anna Guitart, África Periáñez, and Ana Fernández del Río. 2018. Customer Lifetime Value in Video Games Using Deep Learning and Parametric Models. IEEE International Conference on Big Data (2018), 2134–2140.
  • Clark TG (2003) T. G. Clark, M. J. Bradburn, S. B. Love, and D. G. Altman. 2003. Survival Analysis Part I: Basic concepts and first analyses. British Journal of Cancer 89, 2 (2003), 232–238.
  • Cox (1972) D. R. Cox. 1972. Regression Models and Life-Tables. Journal of the Royal Statistical Society. Series B (Methodological) 34, 2 (1972), 187–220. http://links.jstor.org/sici?sici=0035-9246%281972%2934%3A2%3C187%3ARMAL%3E2.0.CO%3B2-6
  • Cox and Oakes (1984) David Roxbee Cox and David Oakes. 1984. Analysis of survival data. Vol. 21. CRC Press.
  • Cui et al. (2018) Yanwei Cui, Rogatien Tobossi, and Olivia Vigouroux. 2018. Modelling customer online behaviours with neural networks: applications to conversion prediction and advertising retargeting. arXiv preprint arXiv:1804.07669 (2018).
  • David (1972) Cox R David. 1972. Regression models and life tables (with discussion). Journal of the Royal Statistical Society 34 (1972), 187–220.
  • Fields (2014) Tim Fields. 2014. Mobile and Social Game Design: Monetization Methods and Mechanics (2 ed.). CRC Press. 2–64 pages.
  • Geron (2013) Tomio Geron. 2013. How King.com Zoomed Up The Social Gaming Charts. (2013). https://www.forbes.com/sites/tomiogeron/2013/03/26/how-king-com-zoomed-up-the-social-gaming-charts/#724b2fe9421e
  • Hadiji et al. (2014) Fabian Hadiji, Rafet Sifa, Anders Drachen, Christian Thurau, Kristian Kersting, and Christian Bauckhage. 2014. Predicting player churn in the wild. In Computational Intelligence and Games (CIG), 2014 IEEE Conference on. IEEE, 1–8.
  • Hothorn et al. (2010) Torsten Hothorn, Kurt Hornik, Carolin Strobl, and Achim Zeileis. 2010. Party: A laboratory for recursive partytioning. (2010).
  • Hothorn et al. (2015) Torsten Hothorn, Kurt Hornik, Carolin Strobl, Achim Zeileis, and Maintainer Torsten Hothorn. 2015. Package ’party’. Package Reference Manual for Party Version 0.9-998 16 (2015), 37.
  • Hothorn et al. (2006) Torsten Hothorn, Kurt Hornik, and Achim Zeileis. 2006. Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics 15, 3 (2006), 651–674.
  • Hothorn et al. (2004) Torsten Hothorn, Berthold Lausen, Axel Benner, and Martin Radespiel-Tröger. 2004. Bagging survival trees. Statistics in medicine 23, 1 (2004), 77–91.
  • Hougaard (1999) Philip Hougaard. 1999. Fundamentals of survival data. Biometrics 55, 1 (1999), 13–22.
  • Ishwaran et al. (2014) Hemant Ishwaran, Thomas A Gerds, Udaya B Kogalur, Richard D Moore, Stephen J Gange, and Bryan M Lau. 2014. Random survival forests for competing risks. Biostatistics 15, 4 (2014), 757–773.
  • Ishwaran et al. (2008) Hemant Ishwaran, Udaya B Kogalur, Eugene H Blackstone, Michael S Lauer, et al. 2008. Random survival forests. The annals of applied statistics 2, 3 (2008), 841–860.
  • Ishwaran et al. (2019) Hemant Ishwaran and Udaya B Kogalur. 2019. Package ‘randomForestSRC’. (2019).
  • Ji et al. (2017) Wendi Ji, Xiaoling Wang, and Feida Zhu. 2017. Time-aware conversion prediction. Frontiers of Computer Science 11, 4 (01 Aug 2017), 702–716. https://doi.org/10.1007/s11704-016-5546-y
  • Johnson (2014) Eric Johnson. 2014. A Long Tail of Whales: Half of Mobile Games Money Comes From 0.15 Percent of Players. http://recode.net/2014/02/26/a-long-tail-of-whales-half-of-mobile-games-money-comes-from-0-15-percent-of-players. (2014).
  • Jun Ding and Chen (1996) Daqi Gao Jun Ding and Xiaohong Chen. 1996. Alone in the Game: Dynamic Spread of Churn Behavior in a Large Social Network a Longitudinal Study in MMORPG. falta 24, 2 (1996), 123–140.
  • Kassambara et al. (2017b) A Kassambara, M Kosinski, P Biecek, et al. 2017b. survminer: Drawing Survival Curves using ‘ggplot2’. R package version 0.3 1 (2017).
  • Kassambara et al. (2017a) Alboukadel Kassambara, Marcin Kosinski, Przemyslaw Biecek, and S Fabian. 2017a. Package ‘survminer’. (2017).
  • Kawale et al. (2009) Jaya Kawale, Aditya Pal, and Jaideep Srivastava. 2009. Churn prediction in MMORPGs: A social influence based approach. In Computational Science and Engineering, 2009. CSE’09. International Conference on, Vol. 4. IEEE, 423–428.
  • Kim et al. (2018) Kyung-Joong Kim, DuMim Yoon, JiHoon Jeon, Seong-il Yang, Sang-Kwang Lee, EunJo Lee, Yoonjae Jang, Dae-Wook Kim, Pei Pei Chen, Anna Guitart, Paul Bertens, África Periáñez, Fabian Hadiji, Marc Müller, Youngjun Joo, Jiyeon Lee, and Inchon Hwang. 2018. Game Data Mining Competition on Churn Prediction and Survival Analysis using Commercial Game Log Data. IEEE Transactions on Games (2018), 1–1. https://doi.org/10.1109/TG.2018.2888863
  • Mogensen et al. (2012) Ulla B Mogensen, Hemant Ishwaran, and Thomas A Gerds. 2012. Evaluating random forests for survival analysis using prediction error curves. Journal of statistical software 50, 11 (2012), 1.
  • Periáñez et al. (2016) África Periáñez, Alain Saas, Anna Guitart, and Colin Magne. 2016. Churn Prediction in Mobile Social Games: Towards a Complete Assessment Using Survival Ensembles. In 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 564–573. https://doi.org/10.1109/DSAA.2016.84
  • Porzelius et al. (2019) Christine Porzelius and Harald Binder. 2019. Package ‘peperr’. (2019).
  • Prentice et al. (1978) R. L. Prentice, J. D. Kalbfleisch, and A. V. Peterson. 1978. The analysis of failure times in the presence of competing risks. Biometrics 34 (1978), 541–544. https://doi.org/10.2307/2530374
  • Rothenbuehler et al. (2015) Pierangelo Rothenbuehler, Julian Runge, Florent Garcin, and Boi Faltings. 2015. Hidden markov models for churn prediction. In SAI Intelligent Systems Conference (IntelliSys), 2015. IEEE, 723–730.
  • Runge et al. (2014) Julian Runge, Peng Gao, Florent Garcin, and Boi Faltings. 2014. Churn Prediction for High-value Players in Casual Social Games. In Computational Intelligence and Games (CIG), 2014 IEEE Conference on. IEEE, 1–8.
  • Sifa et al. (2015) Rafet Sifa, Fabian Hadiji, Julian Runge, Anders Drachen, Kristian Kersting, and Christian Bauckhage. 2015. Predicting purchase decisions in mobile free-to-play games. In Eleventh Artificial Intelligence and Interactive Digital Entertainment Conference.
  • Sifa et al. (2018) Rafet Sifa, Julian Runge, Christian Bauckhage, and Daniel Klapper. 2018. Customer Lifetime Value Prediction in Non-Contractual Freemium Settings: Chasing High-Value Users Using Deep Neural Networks and SMOTE. In HICSS.
  • Sing et al. (2005a) Tobias Sing, Oliver Sander, Niko Beerenwinkel, and Thomas Lengauer. 2005a. ROCR: visualizing classifier performance in R. Bioinformatics 21, 20 (2005), 3940–3941.
  • Sing et al. (2005b) Tobias Sing, Oliver Sander, Niko Beerenwinkel, and Thomas Lengauer. 2005b. ROCR: visualizing classifier performance in R. Bioinformatics 21, 20 (2005), 3940–3941.
  • Therneau and Lumley (2015) Terry M Therneau and Thomas Lumley. 2015. Package ’survival’. (2015).
  • Wang et al. (2013) Jian Wang, Yi Zhang, Christian Posse, and Anmol Bhasin. 2013. Is it time for a career switch? In Proceedings of the 22nd international conference on World Wide Web. ACM, 1377–1388.
  • Wu et al. (2018) Congling Wu, Shengwen Guo, Yanjia Hong, Benheng Xiao, Yupeng Wu, Qin Zhang, Alzheimer’s Disease Neuroimaging Initiative, et al. 2018. Discrimination and conversion prediction of mild cognitive impairment using convolutional neural networks. Quantitative Imaging in Medicine and Surgery 8, 10 (2018), 992.