1 Introduction
Over the last few decades, the telecommunication industry (TCI) has witnessed enormous growth and development in terms of technology, level of competition, number of operators, and new products and services. However, because of intense competition, saturated markets, a dynamic environment, and attractive and lucrative offers, the TCI faces a serious customer churn problem, which is considered formidable in this context Óskarsdóttir et al. (2017). In a competitive market, where customers have numerous choices of service providers, they can easily switch services and even service providers. Such customers are referred to as churned customers Óskarsdóttir et al. (2017) with respect to the original service provider.
The three main generic strategies to generate more revenue in an industry are (i) to increase the retention period of existing customers, (ii) to acquire new customers, and (iii) to upsell to existing customers Wei and Chiu (2002). In fact, customer retention is believed to be the most profitable strategy, as customer turnover severely hits the company's income and inflates its marketing expenses Amin et al. (2016).
Churn is an inevitable result of a customer's long-term dissatisfaction with the company's services. Complete withdrawal from a service (provider) on the part of a customer does not happen in a day; rather, the customer's dissatisfaction, grown over time and exacerbated by the lack of attention from the service provider, culminates in such a drastic step. To prevent this, the service provider must address the limitations (as perceived by customers) in its services to retain aggrieved customers. It is therefore highly beneficial for a service provider to be able to identify a customer as a potential churn customer. In this context, non-churn customers are those who are reluctant to move from one service provider to another, in contrast to churn customers.
If a telephone company (TELCO) can predict that a customer is likely to churn, it can offer targeted incentives to that customer to reduce dissatisfaction, increase engagement and thus potentially retain him/her. This has a clear positive impact on revenue. Additionally, customer churn adversely affects the company's reputation and branding. As such, churn prediction is a very important task, particularly in the telecom sector. To this end, TELCOs generally maintain detailed records of their customers to understand their standing and to anticipate how long they will continue using the services. Since the expense of acquiring new customers is relatively high Lu and Ph (2002); Hadden et al. (2008), TELCOs nowadays principally focus on retaining their long-term customers rather than acquiring new ones. This makes churn prediction essential in the telecom sector Keramati et al. (2014); Xie et al. (2009). With the above backdrop, in this paper, we revisit the customer churn prediction (CCP) problem as a binary classification problem in which all customers are partitioned into two classes, namely, Churn and Non-churn.
1.1 Brief Literature review
The problem of CCP has been tackled using various approaches, including machine learning models, data mining methods, and hybrid techniques. Several machine learning (ML) and data mining approaches (e.g., rough set theory Amin et al. (2016, 2015), Naïve Bayes and Bayesian networks Kirui et al. (2013); Hung et al. (2006); De Caigny et al. (2018), RotBoost Idris and Khan (2012), Support Vector Machine (SVM) Renjith (2017), genetic algorithm based neural networks Pendharkar (2009), the AdaBoost ensemble learning technique Idris et al. (2017), etc.) have been proposed for churn prediction in the TCI using customer relationship management (CRM) data. Notably, CRM data is widely used in prediction and classification problems Huang et al. (2010). A detailed literature review considering all these works is beyond the scope of this paper; however, we briefly review some of the most relevant papers below.
Brandusoiu et al. Brandusoiu et al. (2016) presented a data mining based approach for prepaid customer churn prediction. To reduce the data dimension, the authors applied Principal Component Analysis (PCA). Three machine learning classifiers were then used, namely, Neural Networks (NN), Support Vector Machine (SVM), and Bayesian Networks (BN), to predict churn customers. He et al. He et al. (2009) proposed a model based on Neural Networks (NN) to tackle the CCP problem in a large Chinese TELCO that had about a million customers. Idris et al. Idris et al. (2012) proposed a technique combining genetic programming with AdaBoost to model the churn problem in the TCI. Huang et al. Huang et al. (2015) studied the CCP problem on a big data platform. The aim of the study was to show that big data significantly improves the performance of churn prediction using the Random Forest classifier.
Makhtar et al. Makhtar et al. (2017) proposed a rough set theory based model for churn prediction in TELCO. Amin et al. Amin et al. (2016), on the other hand, focused on tackling the data imbalance issue in the context of CCP in TELCO and compared six unique oversampling strategies. Burez et al. Burez and Van den Poel (2009) also studied the issue of unbalanced datasets in churn prediction models and conducted a comparative study of different methods for tackling the data imbalance issue. Hybrid strategies have also been used for processing massive amounts of customer information, together with regression techniques that provide effective churn prediction results S. A. Qureshi et al. (2013). Finally, Etaiwi et al. Etaiwi et al. (2017) showed that their Naïve Bayes model was able to beat SVM in terms of precision, recall, and F-measure.
To the best of our knowledge, an important limitation in this context is that most of the methods in the literature have been evaluated on a single dataset. Also, the impact of data transformation (DT) methods on CCP models has not been investigated deeply. There are various DT methods, such as Log, Rank, Z-score, Discretization, Min-max, Box-Cox, Arcsine and so on. Among these, researchers have broadly used the Log, Z-score, and Rank DT methods in different domains (e.g., software metrics normality and maintainability Zhang et al. (2013); Zhang et al. (2017), defect prediction Fukushima et al. (2014), dimensionality reduction Fukushima et al. (2014), etc.). To the best of our knowledge, there is only one work in the literature where DT methods have been applied in the context of CCP in TELCO Amin et al. (2018), and it leveraged only two DT methods (Log and Rank) and a single classifier (Naïve Bayes). Therefore, there is large room for improvement in this context, which we address in this work.
1.2 Our Contributions
This paper makes the following key contributions:

We develop customer churn prediction models that leverage various data transformation (DT) methods and various optimized machine learning algorithms. In particular, we combine six different DT methods with eight different optimized classifiers to develop a number of models to handle the CCP problem. The DT methods we utilize are: Log, Rank, Box-Cox, Z-score, Discretization and Weight-of-evidence (WOE). The classification algorithms we use include K-Nearest Neighbor (KNN), Naïve Bayes (NB), Logistic Regression (LR), Random Forest (RF), Decision Tree (DTree), Gradient Boosting (GB), Feed-Forward Neural Networks (FNN) and Recurrent Neural Networks (RNN).

We have conducted extensive experiments on three different publicly available datasets and evaluated our models using various information retrieval metrics, such as AUC, Precision, Recall and F-measure. Our models achieve promising results and conclusively show that the DT methods have a positive impact on CCP models.

We also conduct statistical tests to check whether our findings are statistically significant or not. Our results clearly indicate that the impact of DT methods on the classifiers is not only positive but also statistically significant.
2 Materials and Methods
2.1 Datasets
We use three publicly available benchmark datasets (referred to as Datasets 1, 2 and 3 henceforth) that are broadly used for the CCP problem in the telecommunication area. Table 1 describes these three datasets.
Description                        Dataset 1    Dataset 2    Dataset 3
No. of samples                     100000       5000         3333
No. of attributes                  101          20           21
No. of class labels                2            2            2
Percentage of churn samples        50.43        85.86        85.5
Percentage of non-churn samples    49.56        14.14        14.5
Source of the dataset              URL          URL          URL
URL: https://www.kaggle.com/abhinav89/telecomcustomer/data (Last Access: September 29, 2019).
URL: https://data.world/earino/churn (Last Access: February 10, 2020).
URL: https://www.kaggle.com/becksddf/churnintelecomsdataset/data (Last Access: February 10, 2020).
2.1.1 Data preprocessing
We apply the following essential data preprocessing steps:

We ignore the sample IDs and/or descriptive texts which are used only for informational purposes.

Redundant attributes are removed.

Missing numerical values are replaced with zero (0) and missing categorical values are treated as a separate category.

We normalize categorical values (such as 'yes'/'no', 'true'/'false') into 0s and 1s, where each value represents the corresponding category Amin et al. (2015). A label encoder is used to normalize the categorical attributes.
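The preprocessing steps above can be sketched as follows; this is a minimal illustration, and the column names and the `preprocess` helper are hypothetical, not taken from our released code:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def preprocess(df, id_cols=("customer_id",)):
    """Apply the four preprocessing steps: drop IDs, drop duplicate
    attributes, impute missing values, and label-encode categoricals."""
    df = df.drop(columns=[c for c in id_cols if c in df.columns])  # 1. drop ID/descriptive columns
    df = df.loc[:, ~df.T.duplicated()]                             # 2. drop redundant (duplicate) attributes
    for col in df.columns:
        if df[col].dtype == object:
            df[col] = df[col].fillna("missing")                    # 3b. missing categorical -> own category
            df[col] = LabelEncoder().fit_transform(df[col])        # 4. label-encode categorical values
        else:
            df[col] = df[col].fillna(0)                            # 3a. missing numerical -> 0
    return df
```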
2.2 Data Transformation (DT) Methods
Data transformation refers to the application of a deterministic mathematical function to each point in a dataset. Table 2 provides a description of the DT methods leveraged in our research.
Table 2: Description of the DT methods leveraged in our research.

DT Method: Log
Description: Each variable x is replaced with log(x), where the base of the log is left up to the analyst Zhang et al. (2017); Menzies et al. (2007); Feng et al. (2014). In this study, since a feature value may be zero, a constant 1 is added, i.e., we use ln(x + 1).
Equation: x' = ln(x + 1)

DT Method: Rank
Description: It is a statistically computed rank value Zhang et al. (2017); Bishara and Hittner (2015). In this research, we followed the study Zhang et al. (2017) to transform the initial values of every feature in the original dataset into ten (10) ranks, using each 10th percentile of the given feature's values.
Equation: x' = r, where r in {1, ..., 10} and x lies in the r-th decile of the feature's values

DT Method: Box-Cox
Description: It is a lambda-based power transformation method Zhang et al. (2017); Feng et al. (2014). This transformation converts non-normally distributed feature values toward a normal distribution.
Equation: x' = (x^lambda - 1)/lambda if lambda != 0; x' = ln(x) if lambda = 0

DT Method: Z-score
Description: It indicates the distance of a data point from the mean in units of standard deviation Cheadle et al. (2003).
Equation: z = (x - mu)/sigma

DT Method: Discretization
Description: It is a binning technique Fayyad and Irani (1992). For continuous variables, four widely used discretization techniques are K-means, equal width, equal frequency, and decision tree based discretization. We used equal width discretization, which is a very simple method. For any given continuous variable x, given x_min the minimum of the selected feature, x_max the maximum, and k the number of bins, the bin width is computed as follows.
Equation: w = (x_max - x_min)/k

DT Method: Weight-of-evidence (WOE)
Description: It is a binning and logarithm based transformation Siddiqi (2005). In most cases, WOE mitigates skewness in the data distribution. WOE is the natural logarithm (ln) of the distribution of the good events (1) divided by the distribution of the bad events (0).
Equation: WOE = ln( (% of events) / (% of non-events) )
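The DT methods described above can be sketched with NumPy/SciPy as follows; this is a minimal illustration on toy values, and the decile-rank mapping and the WOE smoothing constant are simplifying assumptions rather than our exact implementation:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 2.0, 5.0, 40.0])   # a skewed toy feature
y = np.array([1, 0, 1, 1, 0])              # binary churn labels (needed only for WOE)

log_t = np.log(x + 1)                                                # Log: ln(x + 1) guards against zeros
rank_t = np.ceil(stats.rankdata(x, method="average") / len(x) * 10)  # Rank: map into ten decile ranks
box_t, lam = stats.boxcox(x)                                         # Box-Cox: power transform toward normality
z_t = (x - x.mean()) / x.std()                                       # Z-score: distance from mean in std units
edges = np.linspace(x.min(), x.max(), 5)[1:-1]                       # 3 interior edges -> 4 equal-width bins
disc_t = np.digitize(x, edges)                                       # Discretization: equal-width binning

def woe(bin_ids, labels, eps=0.5):
    """Weight of evidence per bin: ln(share of events / share of non-events),
    with a small smoothing constant eps to avoid division by zero."""
    out = {}
    for b in np.unique(bin_ids):
        events = ((bin_ids == b) & (labels == 1)).sum() + eps
        non_events = ((bin_ids == b) & (labels == 0)).sum() + eps
        out[b] = np.log((events / (labels == 1).sum()) /
                        (non_events / (labels == 0).sum()))
    return out
```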
2.3 Evaluation Measures
The confusion matrix is generally used to assess the overall performance of a predictive model. For the CCP problem, the individual components of the confusion matrix are defined as follows: (i) True Positives (TP): correctly predicted churn customers; (ii) True Negatives (TN): correctly predicted non-churn customers; (iii) False Positives (FP): non-churn customers mispredicted as churn customers; and (iv) False Negatives (FN): churn customers mispredicted as non-churn customers. We use the following popular evaluation measures for comparing the performance of the models.
Precision: Mathematically, precision can be expressed as:
Precision = TP / (TP + FP)    (7)
The probability of detection (POD) / Recall:
POD or recall is a valid choice of evaluation metric when we want to capture as many true churn customers as possible. Mathematically, POD can be expressed as:
POD = TP / (TP + FN)    (8)
The probability of false alarm (POF): The value of POF should be as small as possible (in the ideal case, POF is 0). Mathematically, POF can be defined as:
POF = FP / (FP + TN)    (9)
We use POF for measuring incorrect churn predictions.
The area under the curve (AUC): Both POF and POD are used to measure the AUC Zhang et al. (2017); Amin et al. (2019). A higher AUC value indicates better model performance. Mathematically, AUC can be expressed as:
AUC = (1 + POD - POF) / 2    (10)
F-measure:
The F-measure is the harmonic mean of precision and recall. It is needed when we seek a balance between precision and recall. A perfect model has an F-measure of 1. The mathematical formula of the F-measure is given below:
F-measure = (2 x Precision x POD) / (Precision + POD)    (11)
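These measures can be computed directly from the confusion matrix counts; the sketch below follows the POD/POF based definitions above (the `ccp_metrics` helper is illustrative):

```python
def ccp_metrics(tp, tn, fp, fn):
    """Compute the evaluation measures defined above from confusion matrix counts."""
    precision = tp / (tp + fp)
    pod = tp / (tp + fn)                   # probability of detection (recall)
    pof = fp / (fp + tn)                   # probability of false alarm
    auc = (1 + pod - pof) / 2              # AUC from a single (POD, POF) operating point
    f_measure = 2 * precision * pod / (precision + pod)
    return {"precision": precision, "recall": pod,
            "pof": pof, "auc": auc, "f": f_measure}
```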
Table 3: Baseline classifiers used in this study.

Key    Classifier                     Model type                      Description
KNN    K-Nearest Neighbor             Instance-based (lazy) learning  KNN assumes that similar instances exist in close proximity.
NB     Naïve Bayes                    Probabilistic (Gaussian)        NB is a family of probabilistic algorithms that compute conditional probabilities based on Bayes' theorem.
LR     Logistic Regression            Statistical model               LR estimates the parameters of a logistic model (a form of binary regression).
RF     Random Forest                  Tree ensemble                   RF is an ensemble tree-based learning algorithm.
DTree  Decision Tree                  Tree                            DTree builds classification or regression models in the form of a tree structure.
GB     Gradient Boosting              Tree ensemble                   GB is an ensemble tree-based boosting method.
FNN    Feed-Forward Neural Networks   Deep learning                   FNN is a deep learning classifier in which the input travels in one direction.
RNN    Recurrent Neural Networks      Deep learning                   RNN is a deep learning classifier in which the output from the previous step is fed as input to the current step.

2.4 Optimized CCP models
The baseline classifiers used in our research are presented in Table 3. To examine the effect of the DT methods, we apply them on the original datasets and subsequently, on the transformed data, we train our CCP models with multiple machine learning classifiers (KNN, NB, LR, RF, DTree, GB, FNN and RNN) listed in Table 3.
2.4.1 Validation method and steps
In all our experiments, the classifiers of the CCP models were trained and tested using 10-fold cross-validation on the three different datasets described in Table 1. First, a RAW data based CCP model was constructed without applying any of the DT methods to any features of the original datasets. In this case, we did not apply any feature selection step either; however, we did use the best hyperparameters for the classifiers.
Subsequently, we applied a DT method to each attribute of the dataset and retrained our models on the transformed dataset. We experimented with each of the DT methods mentioned in Table 2. For each DT based model, we also used a feature selection and optimization procedure, which is described in the following section.
2.4.2 Feature Selection and Optimization
We have a set of hyperparameters, and we aim to find the combination of their values that optimizes the objective function. For tuning the hyperparameters, we applied grid search Syarif et al. (2016). Figure 1 illustrates the overall flowchart of our proposed optimized CCP model. First, we applied the necessary preprocessing steps to the datasets. Then, the DT methods (Log, Rank, Box-Cox, Z-score, Discretization, and WOE) were applied. Next, we used the univariate feature selection technique to select the highest scored features from each dataset (we selected the top 80 features for dataset 1 and the top 15 features for both dataset 2 and dataset 3). We applied grid search to find the best hyperparameters for each individual classifier. Finally, 10-fold cross-validation was employed to train and validate the models.
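The selection-plus-tuning procedure above can be sketched with scikit-learn as follows; this is a minimal illustration on synthetic data, and the classifier choice, parameter grid, and k value are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

# Synthetic stand-in for a churn dataset (25 features, binary labels).
X, y = make_classification(n_samples=300, n_features=25, random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=15)),   # univariate selection of top-scored features
    ("clf", KNeighborsClassifier()),            # one of the eight classifiers, as an example
])

# Grid search over hyperparameters, validated with 10-fold cross-validation.
grid = GridSearchCV(pipe, {"clf__n_neighbors": [3, 5, 7]},
                    cv=StratifiedKFold(10), scoring="f1")
grid.fit(X, y)
print(grid.best_params_)
```

Running the selection step inside the pipeline ensures that feature scores are recomputed on each training fold, avoiding leakage into the validation folds.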
3 Stability measurement tests
We used the Friedman non-parametric statistical test (FMT) Demšar (2006) to examine the reliability of the findings and whether the improvements achieved by the DT based classification models are statistically significant. The Friedman test is a non-parametric statistical test for analyzing and finding differences in treatments across multiple attempts Demšar (2006). It does not assume any particular distribution of the data. The Friedman test ranks all the methods, ranking them independently for each classifier and dataset; a lower rank indicates a better performer. We performed the Friedman test on the F-measure results. Here, the null hypothesis H0 states: "there is no difference among the performances of the CCP models". In our experiments, the test was carried out at the significance level alpha = 0.05. Subsequently, the post hoc Holm test was conducted to perform paired comparisons with respect to the best performing DT model. In particular, when the null hypothesis is rejected, we use the post hoc Holm test to compare the performance of the models pairwise. We performed Holm's post hoc comparisons for alpha = 0.05 and alpha = 0.10.
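This testing procedure can be sketched with SciPy; the scores below are illustrative, not our experimental results, and the `holm` helper is a minimal step-down implementation of the correction:

```python
import numpy as np
from scipy import stats

# F-measures of 3 hypothetical DT methods (columns) over 6
# classifier/dataset combinations (rows); numbers are made up.
scores = np.array([
    [0.70, 0.78, 0.72],
    [0.65, 0.74, 0.66],
    [0.80, 0.85, 0.79],
    [0.55, 0.63, 0.58],
    [0.72, 0.79, 0.71],
    [0.60, 0.70, 0.64],
])

# Friedman test: one sample array per method.
stat, p = stats.friedmanchisquare(*scores.T)

def holm(pvalues, alpha=0.05):
    """Holm step-down correction: compare sorted p-values against
    alpha/(m), alpha/(m-1), ... and stop at the first failure."""
    pvalues = np.asarray(pvalues)
    order = np.argsort(pvalues)
    reject = np.zeros(len(pvalues), dtype=bool)
    for rank, i in enumerate(order):
        if pvalues[i] <= alpha / (len(pvalues) - rank):
            reject[i] = True
        else:
            break
    return reject
```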
4 DT methods and Data Distribution
Data transformation attempts to change the data from one representation to another to enhance its quality, with the goal of enabling analysis of certain information for specific purposes. To find out the impact of the DT methods on the datasets, data skewness and data normality measurement tests were performed on the three different datasets, and the results are visualized through Q-Q (quantile-quantile) plots Amin et al. (2019); Zhang et al. (2017).
4.0.1 Coding and Experimental Environment
All experiments were conducted on a machine running 64-bit Windows 10 with an Intel Core i7 3.6 GHz processor, 24 GB RAM, and a 500 GB HD. All code was implemented in Python 3.7, using Jupyter Notebook. All data and code are available at the following link: https://github.com/joysana1/Churnprediction.
5 Results
The impact of the DT methods on all 8 classifiers (through rigorous experimentation on the 3 benchmark datasets) is illustrated in Figures 2 through 9. Each of these figures presents the performance comparison (in terms of AUC, precision, recall, and F-measure) between the RAW data based CCP model and the DT based CCP models for all three datasets (please check Table 4 for a map for understanding the figures). Tables 7, 8 and 9 in the supplementary file report the values of all the measures for all the datasets.
Table 4: Map for reading the result figures. In each figure, subfigures (a), (b) and (c) correspond to Datasets 1, 2 and 3, respectively.

Figure  Classifier
2       KNN
3       NB
4       RF
5       LR
6       FNN
7       RNN
8       DTree
9       GB
5.1 Results on Dataset 1
The performance of the baseline classifiers (referred to as RAW in the figures) on Dataset 1 is quite poor across all the metrics: the best performer in terms of F-measure is NB, with a value of only 0.636. Interestingly, not all DT methods performed better than RAW. However, the performance of WOE is consistently better than RAW across all classifiers. In a few cases, of course, some other DT method is able to outperform WOE: for example, across all combinations on Dataset 1, the best individual performance is achieved by FNN with Z-score, with a staggering F-measure of 0.917. In terms of AUC as well, the most consistent performer is WOE, with the best value achieved for FNN (0.802).
5.2 Results on Dataset 2
Interestingly, the performance of some baseline classifiers on Dataset 2 is quite impressive, particularly in terms of AUC. For example, both DTree and GB (RAW versions) achieved an AUC of more than 0.82; the F-measure was also acceptable, particularly for GB (0.78).
Among the DT methods, again, WOE performs most consistently (in terms of F-measure), albeit with the glitch that for DTree and GB it performs slightly worse than RAW. In fact, surprisingly enough, for GB the best performer is RAW; for DTree, however, Z-score is the winner, very closely followed by Box-Cox.
5.3 Results on Dataset 3
On Dataset 3 as well, the performance of DTree and GB on RAW data is quite impressive: for DTree the AUC and F-measure values are 0.84 and 0.727, respectively, and for GB they are even better at 0.86 and 0.809, respectively. Again, WOE is the most consistent performer except in the case of DTree and GB, where it is beaten by RAW. The overall winner is GB with the Log transformation, which registers an AUC of 0.864 and an F-measure of 0.818.
6 Statistical test results
Table 5: Friedman test ranking of the DT methods.

Algorithm        Rank (#Position)
WOE              2.4167 (#1)
Z-score          3.5417 (#2)
RAW              3.7917 (#3)
Discretization   4.0833 (#4)
Box-Cox          4.1667 (#5)
Rank             4.9375 (#6)
Log              5.0625 (#7)
Table 5 summarizes the ranking of the DT methods according to the Friedman test. The Friedman statistic, distributed according to the chi-square distribution with (k - 1) degrees of freedom, is 24.700893, where k = 7 is the number of methods. The p-value computed by the Friedman test is 0.00039. From the chi-square distribution table, the critical value is 12.59 (at a 95% confidence level). Our Friedman test statistic (24.700893) is greater than the critical value (12.59), so the decision is to reject the null hypothesis H0. Subsequently, the post hoc Holm test revealed significant differences among the DT methods. Figure 10 illustrates the results of Holm's test as a heat map, where the p-value is considered as the evidence of significance. Figure 10 shows that the performance of WOE is significantly different from that of the other DT methods, except for Z-score. Table 6 reflects the post hoc comparisons for alpha = 0.05 and alpha = 0.10. When the p-value of a comparison is smaller than the significance rate (5% or 10%), Holm's procedure rejects the null hypothesis. Evidently, the WOE based models are found to be significantly better than the other models.

Method  p-value  Hypothesis (alpha = 0.05)  Hypothesis (alpha = 0.10)

1  WOE vs. LOG  0.000022  Rejected  Rejected 
2  WOE vs. RANK  0.000053  Rejected  Rejected 
3  WOE vs. BOXCOX  0.005012  Rejected  Rejected 
4  WOE vs. Discretization  0.007526  Rejected  Rejected
5  WOE vs. RAW  0.027461  Rejected  Rejected 
6  WOE vs. ZSCORE  0.071229  Not Rejected  Rejected 
7 Impact of the DT methods on Data Distribution
The Q-Q plots are shown in Figures 11, 12 and 13 for Dataset 1, Dataset 2 and Dataset 3, respectively. Since the WOE and Z-score DT methods perform better than RAW (no DT) according to the Friedman ranking (Table 5), we generated Q-Q plots only for the RAW, WOE, and Z-score methods. In each Q-Q plot, the first three features of the respective dataset are shown. From the Q-Q plots, it is observed that after transformation by the WOE method, we achieve less skewness (i.e., the data becomes more normally distributed). Normally distributed data is beneficial for the classifiers Amin et al. (2019); Coussement et al. (2017). Similar behavior is also observed for Z-score.
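The skewness reduction visible in such Q-Q plots can also be checked numerically; the sketch below uses SciPy on a synthetic right-skewed feature, with a log transform as a stand-in for the normalizing transforms studied here (WOE requires class labels, so it is omitted):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
feature = rng.lognormal(size=500)   # a right-skewed toy feature, standing in for a raw attribute

# probplot computes the Q-Q points plus a least-squares fit against the
# normal distribution; an r close to 1 means the sample is near-normal.
for name, values in [("RAW", feature), ("Log-transformed", np.log(feature + 1))]:
    (osm, osr), (slope, intercept, r) = stats.probplot(values, dist="norm")
    print(f"{name}: skewness = {stats.skew(values):+.2f}, Q-Q fit r = {r:.3f}")
```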
8 Discussion
From the comparative analysis and statistical tests, it is evident that DT methods have a great impact on improving CCP performance in TELCO. A few prior works (e.g., Zhang et al. (2013), Zhang et al. (2017), and Amin et al. (2018)) also studied the effect of DT methods, but on a limited scale and without considering the optimization issues. We, on the other hand, conducted a comprehensive study considering six DT methods and eight machine learning classifiers on three different benchmark datasets. The performance of the DT based classifiers has been investigated in terms of AUC, precision, recall, and F-measure.
The data transformation techniques have shown great promise in improving the quality of the data distribution in general. Specifically, in our experiments, the WOE method improved data normality, which in turn had a clear positive impact on customer churn prediction performance (Figures 11 - 13).
The comparative analyses involving the RAW based and DT based CCP models clearly suggest the potential of DT methods in improving CCP performance (Figures 2 through 9). In particular, our experimental results strongly suggest that the WOE method contributes substantially to improving performance, with the exception of the DTree and GB classifiers on datasets 2 and 3. While the performance of WOE in these cases was satisfactory, it failed to outperform the RAW based models. We hypothesize that this is due to the binning step within the WOE method: those two datasets are unbalanced, and the DTree and GB classifiers might interpret the binned values as encoding an ordering that does not actually exist.
From Table 5, we notice that WOE is the best ranked method, with a rank value of 2.4167. The post hoc comparison heat map (Figure 10) and Table 6 reflect how WOE outperforms the other methods. As the Friedman test rejects the null hypothesis and the post hoc Holm analysis supports the WOE method's supremacy, it is clear that DT methods significantly improve customer churn prediction performance in the telecommunication industry. Therefore, to construct a successful CCP model, we recommend selecting one of the best classifiers (LR or FNN) together with the WOE data transformation method.
9 Conclusion
Predicting customer churn is one of the most important factors in business planning for TELCOs. To improve churn prediction performance, we investigated six different data transformation methods, namely, Log, Rank, Box-Cox, Z-score, Discretization, and Weight-of-evidence. We used eight different machine learning classifiers: K-Nearest Neighbor (KNN), Naïve Bayes (NB), Logistic Regression (LR), Random Forest (RF), Decision Tree (DTree), Gradient Boosting (GB), Feed-Forward Neural Networks (FNN), and Recurrent Neural Networks (RNN). For each classifier, we applied a univariate feature selection method to select the top ranked features and used grid search for hyperparameter tuning. We evaluated our methods in terms of AUC, precision, recall, and F-measure. The experimental outcomes indicate that, in most cases, the data transformation methods enhance the data quality and improve the prediction performance. To support our experimental results, we performed the Friedman non-parametric statistical test followed by post hoc Holm analysis, which confirmed that the Weight-of-evidence and Z-score based CCP models perform better than the RAW based CCP model. To test the robustness of our DT-augmented CCP models, we performed our experiments on both a balanced dataset (dataset 1) and unbalanced datasets (datasets 2 and 3). CCP remains a hard and swiftly evolving problem for competitive businesses in general and for telecommunication companies in particular. One future direction is to extend this study with other types of data transformation approaches and classifiers. Our proposed models can also be tested on other telecom datasets to examine the generalization of our results at a larger scale.
Last but not least, our approach can be extended to customer churn datasets from other business sectors to study the generalization of our claims across business domains.
References

Just-in-time customer churn prediction: with and without data transformation. In 2018 IEEE Congress on Evolutionary Computation (CEC), pp. 1–6.
Customer churn prediction in telecommunication sector using rough set approach. Neurocomputing.
Comparing oversampling techniques to handle the class imbalance problem: a customer churn prediction case study. IEEE Access, pp. 7940–7957.
Cross-company customer churn prediction in telecommunication: a comparison of data transformation methods. International Journal of Information Management 46, pp. 304–319.
Churn prediction in telecommunication industry using rough set approach. Vol. 572, pp. 83–95.
Reducing bias and error in the correlation coefficient due to nonnormality. Educational and Psychological Measurement 75(5), pp. 785–804.
Methods for churn prediction in the prepaid mobile telecommunications industry. pp. 97–100.
Handling class imbalance in customer churn prediction. Expert Systems with Applications 36(3, Part 1), pp. 4626–4636.
Analysis of microarray data using Z score transformation. The Journal of Molecular Diagnostics 5(2), pp. 73–81.
A comparative analysis of data preparation algorithms for customer churn prediction: a case study in the telecommunication industry. Decision Support Systems 95, pp. 27–36.
A new hybrid classification algorithm for customer churn prediction based on logistic regression and decision trees. European Journal of Operational Research 269.
Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research 7, pp. 1–30.
Evaluation of classification algorithms for banking customer's behavior under Apache Spark data processing system. Procedia Computer Science 113, pp. 559–564.
On the handling of continuous-valued attributes in decision tree generation. Machine Learning 8(1), pp. 87–102.
Log-transformation and its implications for data analysis. Shanghai Archives of Psychiatry 26(2), pp. 105–109.
An empirical study of just-in-time defect prediction using cross-project models. Empirical Software Engineering 21, pp. 172–181.
Churn prediction: does technology matter? World Academy of Science, Engineering and Technology (16), pp. 973–979.
A study on prediction of customer churn in fixed communication network based on data mining. pp. 92–94.
A new feature set with new window techniques for customer churn prediction in landline telecommunications. Expert Systems with Applications 37(5), pp. 3657–3665.
Telco churn prediction with big data. pp. 607–618.
Applying data mining to telecom churn management. Expert Systems with Applications 31, pp. 515–524.
Intelligent churn prediction for telecom using GP-AdaBoost learning and PSO undersampling. Cluster Computing 22, pp. 7241–7255.
Genetic programming and adaboosting based churn prediction for telecom. pp. 1328–1332.
Customer churn prediction for telecommunication: employing various feature selection techniques and tree based ensemble classifiers. pp. 23–27.
Improved churn prediction in telecommunication industry using data mining techniques. Applied Soft Computing 24, pp. 994–1012.
Predicting customer churn in mobile telephony industry using probabilistic classifiers in data mining. IJCSI International Journal of Computer Science Issues 10, pp. 165–172.
Predicting customer churn in the telecommunications industry: an application of survival analysis modeling using SAS. pp. 114–27.
Churn classification model for local telecommunication company based on rough set theory. Journal of Fundamental and Applied Sciences 9(6), pp. 854–868.
Problems with precision: a response to "Comments on 'Data mining static code attributes to learn defect predictors'". IEEE Transactions on Software Engineering 33(9), pp. 637–640.
Social network analytics for churn prediction in telco: model building, evaluation and network architecture. Expert Systems with Applications 85.
Genetic algorithm based neural network approaches for predicting churn in cellular wireless network services. Expert Systems with Applications 36(3, Part 2), pp. 6714–6720.
B2C e-commerce customer churn management: churn detection using support vector machine and personalized retention using hybrid recommendations. International Journal on Future Revolution in Computer Science and Communication Engineering (IJFRCSCE) 3, pp. 34–39.
Telecommunication subscribers' churn prediction model using machine learning. pp. 131–136.
Credit risk scorecards: developing and implementing intelligent credit scoring.
SVM parameter optimization using grid search and genetic algorithm to improve classification performance. TELKOMNIKA (Telecommunication Computing Electronics and Control) 14(4), pp. 1502.
Turning telecommunications call details to churn prediction: a data mining approach. Expert Systems with Applications 23, pp. 103–112.
Customer churn prediction using improved balanced random forests. Expert Systems with Applications 36(3, Part 1), pp. 5445–5449.
Data transformation in cross-project defect prediction. Empirical Software Engineering 22(6), pp. 1–33.
How does context affect the distribution of software maintainability metrics? pp. 350–359.