Abstract
Credit ratings are one of the primary indicators of the riskiness of corporations and of their reliability in meeting their financial obligations. Rating agencies tend to take extended periods of time to provide new ratings and update older ones. Therefore, credit scoring assessments using artificial intelligence have gained a lot of interest in recent years. Successful machine learning methods can provide rapid analysis of credit scores while updating older ones on a daily time scale. Related studies have shown that neural networks and support vector machines outperform other techniques by providing better prediction accuracy. The purpose of this paper is twofold. First, we provide a survey and a comparative analysis of results from the literature applying machine learning techniques to predict credit ratings. Second, we ourselves apply four machine learning techniques deemed useful in previous studies (Bagged Decision Trees, Random Forest, Support Vector Machine and Multilayer Perceptron) to the same datasets. We evaluate the results using 10-fold cross-validation. The results of the experiment for the datasets chosen show superior performance for decision-tree-based models. In addition to the conventional accuracy measure of classifiers, we introduce a measure of accuracy based on notches, called "Notch Distance", to analyze the performance of the above classifiers in the specific context of credit rating. This measure tells us how far the predictions are from the true ratings. We further compare the performance of three major rating agencies, Standard & Poor’s, Moody’s and Fitch, and show that the difference between their ratings is comparable to the difference between the decision tree predictions and the actual ratings on the test dataset.
1 Introduction
Machine learning models have been widely applied to a range of problems in finance due to their capability of detecting embedded trends in financial data. For example, asset prices follow a random, nonlinear dynamic pattern due to many factors such as political events, economic conditions and traders’ behavior (Huang et al., 2005). Despite this nonlinear nature of price movements, machine learning techniques have been reported to provide accurate estimates of asset prices (Kim, 2003; Huang et al., 2005; Kanchymalay et al., 2017; Ganguli and Dunnmon, 2017; Culkin and Das, 2017; Feng et al., 2018; Gu et al., 2019; Sambasivan and Das, 2017). In addition to asset pricing, machine learning techniques have gained a lot of interest in the credit rating prediction problem. Credit ratings have been used as informative indicators of the level of riskiness of companies and bonds (Huang et al., 2004; Lee, 2007). They reflect the likelihood that a counterparty will default on its financial obligations (Martens et al., 2010). Rating agencies such as Standard & Poor’s and Moody’s analyze various aspects of companies’ financial status to come up with these credit ratings (Huang et al., 2004). Such an assessment is an expensive and complicated process, often taking months, since it involves many experts analyzing different variables that reflect the underlying corporation’s reliability (Hajek and Michalak, 2013). One solution to reduce this financial and time cost is to build a predictive quantitative model based on the historical financial information of a company.
In this study we provide a comparative analysis of the literature on the corporate credit rating prediction problem. We focus on modern statistical and Artificial Intelligence techniques. We review and summarize the performance of the techniques that were demonstrated to be the most accurate in the literature. We implement these techniques and compare their performance when analyzing three different sectors of the US economy. To our knowledge, this is one of the first papers that implements corporate rating models on the same data and thus compares the algorithms’ performance on an equal footing.
Corporate credit rating is a different problem from typical classification tasks such as image classification. Thus, we introduce a new evaluation measure, the Notch Distance, in Section 4.2. This measure allows us to better quantify and distinguish between the performance of various machine learning methods.
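As a rough illustration of the idea of measuring how far a prediction is from the true rating, the sketch below counts the number of notches between two ratings on an ordered scale. The scale, the function names and the example ratings are assumptions made for this illustration only; the formal definition of the Notch Distance is the one given in Section 4.2.

```python
# Hypothetical S&P-style scale, ordered from best to worst. The scale and the
# exact definition used in the paper (Section 4.2) may differ.
RATING_SCALE = [
    "AAA", "AA+", "AA", "AA-", "A+", "A", "A-",
    "BBB+", "BBB", "BBB-", "BB+", "BB", "BB-",
    "B+", "B", "B-", "CCC+", "CCC", "CCC-", "CC", "C", "D",
]
NOTCH_INDEX = {rating: i for i, rating in enumerate(RATING_SCALE)}


def notch_distance(true_rating, predicted_rating):
    """Number of notches separating a predicted rating from the true one."""
    return abs(NOTCH_INDEX[true_rating] - NOTCH_INDEX[predicted_rating])


def notch_distance_counts(true_ratings, predicted_ratings):
    """Map each notch distance to how many predictions fall at that distance."""
    counts = {}
    for t, p in zip(true_ratings, predicted_ratings):
        d = notch_distance(t, p)
        counts[d] = counts.get(d, 0) + 1
    return counts


# Made-up ratings, for illustration only.
true_ratings = ["AA", "A-", "BBB", "BB+"]
predicted = ["AA", "A", "BB+", "BB+"]
print(notch_distance_counts(true_ratings, predicted))
```

Unlike plain accuracy, a count of this kind separates a prediction that misses by one notch from one that misses by several.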
This paper is organized as follows. In Section 2, we provide a review of prior literature on the credit rating problem. Section 3 explains each machine learning method studied in order to predict corporate credit ratings. Section 4 explains the nature of the input and output data sets in this study as well as the evaluation methodology used. Experimental results and analysis are provided in Section 5. Final conclusions are drawn in Section 6, together with future applications of the methodology.
2 Prior Methods on the Credit Rating Problem using Machine Learning Techniques
Bond Rating
The majority of the literature on the credit rating problem focuses on predicting a bond’s probability of default and thus its rating (Saha and Waheed, 2017). Garavaglia (Garavaglia, 1991) simulated S&P corporate bond ratings using the unidirectional version of the counterpropagation neural network, a model introduced by Robert Hecht-Nielsen. Chaveesuk et al. (Chaveesuk et al., 1999) studied three supervised neural network concepts: backpropagation, radial basis function and learning vector quantization. However, they concluded that neither a neural network nor a regression model performs well on bond rating assignments. Shin and Han (Shin and Han, 2001) combined an inductive learning approach based on decision trees with a case-indexing process in order to build a knowledge-based system that can classify Korean corporate bond ratings. Kim and Han (Kim and Han, 2001) clustered financial data among bond rating groups. They used a competitive artificial neural network to generate the centroid values of the clusters. They performed a case-indexing study based on this cluster information and combined it with a classification technique in order to present a hybrid model for forecasting corporate bond ratings. They applied their method to 167 financial variables, including five categorical variables and 162 financial ratios. After performing statistical tests such as ANOVA, factor analysis and a stepwise method, they selected 13 financial ratios as the final input data. Abdou (Abdou, 2009) studied feed-forward neural networks to evaluate consumer loans within Egyptian private banks. Lessmann et al. (2015) provided a deep analysis of machine learning techniques used in retail credit scoring. They studied classification algorithms used to predict the probability of default for retail loans. The authors used multiple datasets and evaluation criteria and concluded that their hybrid method is superior to other methods. Gangolf et al. (2016) provided a comparison of statistical and artificial intelligence techniques used for forecasting the probability of default for bonds. Regarding the prediction of the probability of default, Angelini et al. (Angelini et al., 2008) studied two different neural networks to evaluate the credit risk of Italian small businesses. They showed that neural network architectures can be useful for estimating the probability of default of borrowers. Tsai and Wu (Tsai and Wu, 2008) studied the application of multilayer perceptron networks to bankruptcy prediction and credit risk evaluation. Their analysis indicated that a single neural network provides accurate results compared to multilayer classifiers. Addo et al. (Addo et al., 2018) studied seven binary classifiers (elastic net, random forest, gradient boosting model and four different neural network architectures) in predicting loan default probability. They implemented these models on 117,019 data points from 2016 and 2017 provided by European banks. They split the data into a training set, a validation set and a test set. They compared the performance of these models on a selected subset of important features based on the variable importance of each model. They concluded three important points. First, decision-tree-based models outperform multilayer artificial neural networks. Second, standard performance criteria such as AIC or R² are not sufficient to compare different models. Third, the subset of important features selected on the basis of the variable importance of the models does not necessarily provide stable results.
Corporate Credit Rating
In this work we are concerned with corporate credit rating. Although the two problems are related, bond ratings are not necessarily the same as the issuing company’s corporate rating, due to the priority of claim of various bonds. There are many more bonds than there are corporations, and thus data availability for corporate ratings is much more limited. When rating publicly traded corporations, the input is generally based on the 10-Q and 10-K statements. Bonds, however, are traded, and thus bond values can be marked to market far more frequently than quarterly. Furthermore, while the basic bond structure is similar for all issuing companies regardless of sector, the quarterly statement issued by a bank is fundamentally different from that issued by a company from, say, the Consumer Staples sector. Thus papers dedicated to corporate credit rating generally construct models for companies from a specific sector. The lack of large data sets is therefore typical of this domain, and this makes the credit rating problem challenging.
Methodologically, statistical models and artificial intelligence approaches are used in the field of credit rating prediction. In this section, we provide a review of prior and relevant literature of models used. We start from standard models and later present more recent and complex models. We summarize these studies in Tables 1, 2 and 3. We are basing our subsequent analysis on the accuracy numbers in these tables to determine which models will be implemented on the financial statements data.
Statistical Methods
Standard statistical techniques such as logistic regression analysis (LRA) and discriminant analysis (DA) have been used to construct credit scoring models (Horrigan, 1966; West, 1970; Kaplan and Urwitz, 1979; Kim et al., 1993; Kamstra et al., 2001; Oelerich and Poddig, 2006; Abdou, 2009; Akkoç, 2012; Abdou et al., 2016). Modified versions of these techniques have been developed to improve credit scoring models. Hwang et al. (2009, 2010) modified the usual ordered probit model by replacing the linear regression function with a semiparametric function to widen the choice of regression functions. They applied this prediction method to four market-driven variables, nineteen accounting variables and industry indicator variables. They concluded that an ordered semiparametric probit model (OSPM) provides better predictions than an ordered linear probit model (OLPM). However, the assumptions behind most of these statistical approaches, such as linearity, normality and distribution dependency, limit their predictive capability compared to Artificial Intelligence techniques (Huang et al., 2004; Pai et al., 2015).
Nonetheless, recent papers still use traditional statistical methods and new developments of them. Gogas et al. (2014) used an ordered probit regression model to forecast long-term bank credit ratings. The benchmark was the Fitch credit rating in 2012. Irmatova (2016) introduced a new rating model based on principal component analysis and the k-means clustering algorithm to predict countries’ ratings. Karminsky and Khromova (2016) implemented an ordered probit regression model to predict bank ratings using the Bankscope database. Hajek et al. (2016) developed a methodology based on latent semantic analysis to extract topical content from company-related documents. The authors used a naive Bayesian network to forecast corporate credit ratings. Recently, Petropoulos et al. (2016) developed a methodology based on Student’s-t Hidden Markov Models (SHMMs) to investigate the temporal dimension and heavy-tailed distribution of firm-related data.
We next present three prevalent AI techniques used for corporate credit rating: Artificial Neural Networks (ANN), Support Vector Machines (SVM) and Classification and Regression Trees (CART).
Artificial Neural Networks (ANN)
Artificial Neural Networks have drawn attention from many researchers for their robustness, flexibility and higher accuracy (Dutta and Shekhar, 1988; Surkan and Singleton, 1990; Kim et al., 1993; Maher and Sen, 1997; Kwon et al., 1997; Brennan and Brabazon, 2004; Abdou, 2009; Abdou et al., 2016). West (West, 2000) implemented five neural network architectures: multilayer perceptron, mixture-of-experts, radial basis function, learning vector quantization and fuzzy adaptive resonance. He compared the results of these neural networks with linear discriminant analysis, logistic regression, k-nearest neighbor, kernel density estimation and decision trees. He studied German and Australian credit scoring data and concluded that the multilayer perceptron is not the most accurate neural network architecture and that logistic regression outperforms all neural network architectures. Kim (Kim, 2005) applied adaptive learning networks (ALN), a nonparametric model, to both financial and non-financial variables to predict S&P credit ratings. Yu et al. (Yu et al., 2008) proposed a six-stage neural network ensemble learning model to assess credit risk on Japanese consumer credit card application approvals and UK corporations. Khashman (Khashman, 2010) investigated three neural networks based on the back-propagation learning algorithm on the German Credit Approval dataset. The architectures of these neural networks differ in parameters such as the number of hidden units, learning coefficients, momentum rate and random initial weight range. Pacelli and Azzollini (Pacelli and Azzollini, 2011) provided an overview of the different types of neural networks used in the credit rating literature.
Support Vector Machines
Among all artificial intelligence techniques, the Support Vector Machine (SVM) has shown a powerful classification ability (Cortes and Vapnik, 1995; Kim and Sohn, 2010; Vapnik, 2013; Xiao et al., 2016). However, the standard SVM addresses two-class classification problems. For multiclass classification problems, SVM can be represented as several two-class tasks, such as one-against-one (OAOSVM) (Schölkopf et al., 1999), one-against-all (OAASVM) (Bottou et al., 1994) and directed acyclic graph (DAGSVM) (Platt et al., 2000; Wang et al., 2009). Cao et al. (Cao et al., 2006) studied all three approaches on US companies from the manufacturing sector. Hajek and Olej (Hájek and Olej, 2011) studied Support Vector Machines with supervised learning as well as a kernel-based approach with semi-supervised learning in order to predict corporate and municipal credit ratings. Kim and Ahn (Kim and Ahn, 2012) studied a new type of classifier based on the Support Vector Machine, called OMSVM, which applies ordinal pairwise partitioning (OPP) to extend the binary SVM.
Decision Trees
According to the literature, some researchers focus on the higher level of interpretability and the feature extraction flexibility of decision-tree-based approaches in multiclass classification problems (Lee et al., 2006; Kao et al., 2012; Abdou et al., 2016). The fundamentals of these techniques were first introduced by Breiman et al. (1984). Paleologo et al. (Paleologo et al., 2010) proposed a missing data imputation method and an ensemble classification technique, subagging, particularly for unbalanced credit rating data sets. The proposed subagging technique is based on different models such as decision trees and support vector machines. Khandani et al. (Khandani et al., 2010) applied the general classification and regression tree technique (CART) to a combination of traditional credit factors and consumer banking transactions to predict consumer credit risk. Veronezi (Veronezi, 2016) applied Random Forest (RF) and Multilayer Perceptron (MLP) to predict corporate credit ratings using companies’ financial data. First, he used RF to perform feature selection, and the result was then fed to two different RF models. One model aims to predict the changes and the other predicts the direction. The results of these two models were added to the input dataset. On this new dataset, two models were trained: an MLP model and an RF model. He concluded that RF outperforms MLP by providing more accurate and stable results in a shorter period of time.
Hybrid Models
In addition to these frequently used techniques, some researchers study other approaches to build a credit scoring model. Peng et al. (Peng et al., 2011) introduced three multiple criteria decision making (MCDM) methods to evaluate classification algorithms for financial risk prediction. Chen (Chen, 2012) investigated the rough set theory (RST) approach to classify Asian banks’ ratings. Some researchers go one step further and integrate multiple techniques to achieve a higher accuracy. Yeh et al. (Yeh et al., 2012) combined random forest feature selection with different approaches such as rough set theory (RST) and SVM. They tested these techniques on 17 financial variables in addition to market-based information obtained with Moody’s KMV tool for each corporation. Wu and Hsu (Wu and Hsu, 2012) introduced an enhanced decision support model which combines a relevance vector machine and a decision tree. Hajek (Hájek, 2012) used a fuzzy rule-based system adapted by a feed-forward neural network to classify credit ratings of US municipalities and companies. Pai et al. (Pai et al., 2015) used a Decision Tree Support Vector Machine (DTSVM) integrated with multiple feature selection strategies and rough set theory (RST) to predict credit ratings. Descriptions of other hybrid techniques are available in (Chandra et al., 2009; Akkoç, 2012; Chen and Cheng, 2013).
Evaluation and Comparison
The comparison of the discussed techniques, with different parameters and architectures and on several datasets, has been widely studied in the credit scoring field. Huang et al. (Huang et al., 2004) studied support vector machines (SVM) and backpropagation neural networks (BNN) to obtain more accurate credit rating predictions for Taiwanese financial institutions and United States commercial banks. Lee (Lee, 2007) used 5-fold cross-validation to find the optimal parameters of SVM, multiple discriminant analysis (MDA), case-based reasoning (CBR) and three-layer fully connected backpropagation neural networks (BPNs) on Korean datasets. Ye et al. (Ye et al., 2008) compared the performance of four different approaches, Linear Regression, Bagged Decision Tree with Laplace smoothing, SVM and Proximal SVM (PSVM), over four industries: manufacturing, wholesale, retail and services. Ryser and Denzler (Ryser and Denzler, 2009) used a k-fold cross-validation procedure to compare the performance of different credit rating models. They concluded that although nonparametric approaches such as random forest, neural networks and generalized additive models still outperform the parametric models, they may overfit during the training process. Hajek and Michalak (Hajek and Michalak, 2013) studied the performance of supervised machine learning techniques such as Multilayer Perceptron (MLP), Support Vector Machines (SVM), Radial Basis Function Neural Network (RBF), Naive Bayes (NB), Random Forest (RF), Linear Discriminant Classifier (LDC) and Nearest Mean Classifier (NMC) on both US and European non-financial companies. They focused on 81 financial and non-financial variables of 852 US companies and 43 financial and non-financial variables of 244 European companies. They reduced the dimension of the data by applying feature selection techniques such as wrapper algorithms. They concluded that US credit ratings are more affected by the size of companies and market value ratios, while European ratings depend more on profitability and leverage ratios. Zhong et al. (Zhong et al., 2014) provided a comparison study of the reliability, overfitting and error distribution of different learning algorithms such as SVM, backpropagation neural network, Extreme Learning Machine (ELM) and Incremental Extreme Learning Machine (IELM). They considered 6 financial variables on both S&P and Moody’s rating categories to classify US corporate credit ratings.
Bahrammirzaee (Bahrammirzaee, 2010) provided a comparative analysis of the application of three categories of techniques to credit evaluation: artificial neural networks, expert systems and hybrid intelligent systems. The analysis categorized articles in which artificial neural networks outperform other methods as well as articles in which neural networks perform the same as, or even worse than, traditional methods such as decision trees and linear regression. Dima and Vasilache (Dima and Vasilache, 2016) compared the performance of neural networks with a Bayesian approach on a large sample of companies applying for credit to the same bank in Romania. Khemakhem and Boujelbene (Khemakhem and Boujelbene, 2015) implemented neural networks and linear discriminant analysis to forecast Tunisian corporate credit ratings. They concluded that neural networks outperform discriminant analysis. Hajek and Olej (Hájek and Olej, 2014) compared the performance of artificial immune classification algorithms (AIRS1, AIRS2, AIRS2p and CSCA) and several well-known machine learning techniques (MLP, decision tree, SVM and RBF) on investment and non-investment grade firms. Both groups of input variables were collected for 520 U.S. companies in the year 2010, 195 classified as investment grade and 325 as non-investment grade. They used accuracy and misclassification cost as performance measures of the studied classifiers. They also provided the confusion matrix for each classifier (TP, TN, FP, FN). First, they concluded that AIRS2 outperforms AIRS1, AIRS2p and CSCA on the test set based on the studied measures. They also showed that MLP and decision tree are the best models, providing the highest accuracy and the lowest misclassification cost, respectively.
Wallis et al. (Wallis et al., 2019) provided a comparative analysis of several of the most popular machine learning techniques (multinomial logistic regression, linear discriminant analysis, regularized discriminant analysis, artificial neural network, support vector machine, Gaussian process classifier, random forest, gradient boosting machine) to predict Moody’s long-term credit rating. They studied these models on 308 of the S&P 500 companies from January 2016 to November 2017. They concluded that the top three performing models were random forest, artificial neural network and support vector machine, with random forest achieving the highest accuracy.
Moscatelli et al. (Moscatelli et al., 2019) applied random forest, gradient boosted trees, logistic regression, penalized logistic regression and linear discriminant analysis to predict the default probability of Italian non-financial firms from 2011 to 2017. They studied economic and financial ratios as potential indicators of corporate defaults. They also added a set of credit behavioral indicators on the firm-bank relationship to the studied financial ratios. They concluded that random forest outperforms all other models in predicting default/non-default credit ratings.
A summary of the discussed approaches, including the number of financial variables, the input dataset, the provider of the output data set and the corresponding accuracy, is provided in the tables below. Table 1 refers to ANN-based approaches, Table 2 addresses SVM-based models and Table 3 presents decision-tree-based methods.
Table 1: ANN-based approaches.

| Reference | Year | Num. of Input Var. | Input Data | Credit Rating Provider | Accuracy |
|---|---|---|---|---|---|
| (Garavaglia, 1991) | 1991 | 87 | 797 Companies | S&P | 85% |
| (Moody and Utans, 1994) | 1994 | 10 | 196 Industrial Firms | S&P | 36% |
| (Huang et al., 2004) | 2004 | 5 | 36 Commercial Banks | S&P | 80.00% |
| (Huang et al., 2004) | 2004 | 6 & 16 | Taiwan Dataset | Taiwan Ratings Corporation | 75.68% |
| (Kim, 2005) | 2005 | | 1080 US companies | S&P | 84% |
| (Brabazon and O’Neill, 2006) | 2006 | 8 | 791 Non-financial US Firms | S&P | 85% |
| (Cao et al., 2006) | 2006 | 17 | US Manufacturing Companies | S&P | 80.28% |
| (Lee, 2007) | 2007 | 10 | 3017 Korean Companies | Korea Information Service, Inc | 59.93% |
| (Yu et al., 2008) | 2008 | 13 | Jap. Consumer Credit Card | Application Approvals | 88.08% |
| (Yu et al., 2008) | 2008 | 12 | 60 UK Corporations | UK Ratings | 85.87% |
| (Khashman, 2010) | 2010 | 24 | German Dataset | German Ratings | 85.9% |
| (Kim and Sohn, 2010) | 2010 | 10 | Korean SMEs | Korean Default Rates | 64.23% |
| (Hájek, 2012) | 2012 | 52 | 852 US Mining Companies | S&P | 80% |
| (Hájek, 2012) | 2012 | 14 | 154 US Municipalities | Moody’s | 71.62% |
| (West, 2000) | 2000 | 24 | German and Australian Data | German and Australian Rating | error: 0.2243 |
| (Khemakhem and Boujelbene, 2015) | 2015 | 15 | 86 Tunisian companies | Tunisian credit rating | 82.55% |
| (Zhao et al., 2015) | 2015 | 20 | German Credit Dataset | German Credit Ratings | 87% |
| (Hájek and Olej, 2014) | 2014 | 20 | Investment and non-investment grade | S&P | 88.44% |
| (Addo et al., 2018) | 2018 | 181 and 10 | 117019 companies asking for a loan | European Bank | RMSE: 0.044 |
| (Daniel et al., 2019) | 2019 | 17 | Random US companies | S&P and Moody’s | 83% |
| (Wallis et al., 2019) | 2019 | 27 | 308 companies from S&P 500 | Moody’s | 63.6% |

Table 2: SVM-based models.

| Reference | Year | Num. of Input Var. | Input Data | Credit Rating Provider | Accuracy |
|---|---|---|---|---|---|
| (Huang et al., 2004) | 2004 | 14 | 36 Commercial Banks | S&P | 80.00% |
| (Huang et al., 2004) | 2004 | 6 | Taiwan Dataset | Taiwan Ratings Corporation | 79.73% |
| (Cao et al., 2006) | 2006 | 17 | US Manufacturing Companies | S&P | 84.61% |
| (Lee, 2007) | 2007 | 10 | 3017 Korean Companies | Korea Information Service, Inc | 67.22% |
| (Ye et al., 2008) | 2008 | 33 | Four US industries | Moody’s | 64% |
| (Yu et al., 2008) | 2008 | 13 | Jap. Consumer Credit Card | Application Approvals | 79.91% |
| (Yu et al., 2008) | 2008 | 12 | 60 UK Corporations | UK Ratings | 77.84% |
| (Kim and Sohn, 2010) | 2010 | 10 | Korean SMEs | Korean Default Rates | 66.16% |
| (Hájek and Olej, 2011) | 2011 | 5 | 852 US firms | S&P | 85.96% |
| (Hájek and Olej, 2011) | 2011 | 12 | 452 Czech municipals | local experts | 89.76% |
| (Kim and Ahn, 2012) | 2012 | 14 | Korea Dataset | Korea Ratings Corporation | 67.98% |
| (Yeh et al., 2012) | 2012 | 18 | 2470 Taiwanese companies | Taiwanese Rating | 74.4% |
| (Hájek and Olej, 2014) | 2014 | 20 | Investment and non-investment grade | S&P | 87.28% |
| (Pai et al., 2015) | 2015 | 18 | Taiwan Dataset | Taiwan Ratings Corporation | 86% |
| (Wallis et al., 2019) | 2019 | 27 | 308 companies from S&P 500 | Moody’s | 60.1% |

Table 3: Decision-tree-based methods.

| Reference | Year | Num. of Input Var. | Input Data | Credit Rating Provider | Accuracy |
|---|---|---|---|---|---|
| (Shin and Han, 2001) | 2001 | 12 | Korean Companies | Korean Ratings | 69.3% |
| (Koh et al., 2004) | 2004 | | | | 74.2% |
| (Ye et al., 2008) | 2008 | 33 | Manufacturing/Wholesale/Retail/Services | Moody’s | 47.1% |
| (Hájek and Olej, 2014) | 2014 | 20 | Investment and non-investment firms | S&P | 86.13% |
| (Veronezi, 2016) | 2016 | 33 | Healthcare, Financial, Technology Sectors in USA | S&P and Moody’s | |
| (Addo et al., 2018) | 2018 | 181 and 10 | 117019 companies asking for a loan from a bank | European Bank | RMSE: 0.044 |
| (Wallis et al., 2019) | 2019 | 27 | 308 companies from S&P 500 | Moody’s | 64.6% |
| (Moscatelli et al., 2019) | 2019 | 26 | Italian non-financial firms | Italian Credit Register | |
3 Analytical Methods
Supervised learning methods are based on learning from observations. Let $\mathcal{X}$ and $\mathcal{Y}$ be the input and output space, respectively. The output space $\mathcal{Y}$ identifies the classification type. When $\mathcal{Y} = \{0, 1\}$, the problem is a binary classification. If $\mathcal{Y} = \{1, \dots, K\}$ with $K > 2$, the problem is a multiple class classification. $\mathcal{Y} = \mathbb{R}$ leads to a regression type problem. During the learning process, the classifier assumes there exists a hidden function $f: \mathcal{X} \to \mathcal{Y}$ underlying a training set $\{(x_i, y_i)\}_{i=1}^{N}$. The classifier attempts to find a prediction $\hat{f}$ for this function such that the error is minimized.
There are multiple ways to determine the exact nature of the function $f$. In computer science and statistics these are generically named machine learning algorithms. We focus on a few specific machine learning algorithms, described in this section. The previous section and the summary Tables 1, 2 and 3 identify these methods as the most popular for the credit rating problem. This is the reason we chose them for implementation.
3.1 Bagged Decision Trees (BDT)
Let $\mathcal{L}$ be the learning/training set, consisting of $N$ instances. The bagging, or bootstrap aggregation, algorithm creates $B$ bootstrap samples, where each sample consists of $n$ instances chosen randomly with replacement from $\mathcal{L}$. Each of these bootstrap samples trains the same type of classifier, which in this case is a decision tree. In order to predict an outcome for an instance of the testing set, the new case is run down each of the $B$ decision trees. The prediction is obtained by majority vote among the classifiers.
This ensemble structure combines the predictions of multiple classifiers to provide better performance than a single learner. The combination of learners in the bagging structure reduces the risk of classification error (variance) for unstable, high-variance classifiers such as decision trees.
In Bagged Decision Trees (BDT), overfitting of an individual tree is less of a concern. For this reason, the individual decision trees are grown deep and are not pruned. BDT has only two hyperparameters, the number of trees and the number of samples. However, the trees may have many similarities, which leads to a high correlation in their predictions. The structure of Random Forest addresses this problem. A more detailed description of these methods can be found in (Breiman et al., 1984).
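The bagging procedure described in this subsection can be sketched in a few lines. This is a minimal illustration and not the implementation used in the experiments: the tiny unpruned tree learner, the Gini splitting criterion, the function names and the toy data are all assumptions made for the example.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score, n = None, gini(y), len(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def grow_tree(X, y):
    """Grow an unpruned tree: keep splitting while a split reduces impurity."""
    split = best_split(X, y)
    if split is None:
        return Counter(y).most_common(1)[0][0]  # leaf: majority label
    f, t = split
    left = [i for i in range(len(y)) if X[i][f] <= t]
    right = [i for i in range(len(y)) if X[i][f] > t]
    return (f, t,
            grow_tree([X[i] for i in left], [y[i] for i in left]),
            grow_tree([X[i] for i in right], [y[i] for i in right]))


def predict_tree(tree, x):
    while isinstance(tree, tuple):  # internal nodes are tuples, leaves are labels
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


def bagged_trees(X, y, n_trees, sample_size, rng):
    """Train one tree per bootstrap sample drawn with replacement."""
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(sample_size)]
        trees.append(grow_tree([X[i] for i in idx], [y[i] for i in idx]))
    return trees


def predict_bagged(trees, x):
    """Majority vote over the individual tree predictions."""
    return Counter(predict_tree(t, x) for t in trees).most_common(1)[0][0]


# Toy data (made up): two financial ratios per company and a rating class.
X = [[1.0, 5.0], [2.0, 4.0], [3.0, 7.0], [6.0, 1.0], [7.0, 2.0], [8.0, 3.0]]
y = ["A", "A", "A", "BBB", "BBB", "BBB"]
trees = bagged_trees(X, y, n_trees=25, sample_size=len(y), rng=random.Random(0))
print(predict_bagged(trees, [0.5, 5.0]), predict_bagged(trees, [9.0, 1.0]))
```

The two arguments `n_trees` and `sample_size` correspond to the two hyperparameters mentioned in the text, the number of trees and the number of samples.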
3.2 Random Forest (RF)
Random Forest (RF) is another algorithm based on an ensemble of classification trees, developed by Breiman (Breiman et al., 1984; Breiman, 2001). The only difference between RF and BDT is that RF takes one extra step. In addition to taking a random subset of the data, it also randomly chooses a subset of the input variables at each node and calculates the best split at that node only within that subset. This structure provides uncorrelated or weakly correlated predictions. Also, there is no pruning step, which means all the trees of the forest are grown deep. RF has only two hyperparameters, the number of variables in the random subset at each node and the number of trees in the forest. Moreover, RF ranks variables by their importance for the classification accuracy, while taking the interaction between variables into account.
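The extra step that distinguishes RF from BDT, restricting the split search at a node to a random subset of the input variables, might look as follows. This is only a sketch: the Gini criterion, the function names and the toy data are assumptions made for the illustration.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_on_random_subset(X, y, m, rng):
    """Draw m input variables at random and search for the best split among them.

    Returns the drawn variable indices and the best (feature, threshold, score),
    where score is the weighted Gini impurity of the two child nodes.
    """
    candidates = rng.sample(range(len(X[0])), m)
    best, n = None, len(y)
    for f in candidates:
        # Dropping the largest value keeps both child nodes non-empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return candidates, best


# Toy data (made up): only the variable with index 2 separates the two classes.
X = [[1, 0, 0.1, 5], [1, 1, 0.2, 3], [0, 0, 0.3, 5],
     [0, 1, 0.8, 3], [1, 0, 0.9, 5], [0, 1, 0.7, 3]]
y = ["A", "A", "A", "B", "B", "B"]

subset, best = split_on_random_subset(X, y, m=2, rng=random.Random(1))
print("variables considered at this node:", subset, "best split:", best)
```

With `m` equal to the total number of variables the search reduces to the ordinary split search of a bagged tree; smaller values of `m` force different trees to split on different variables, which is what weakens the correlation between their predictions.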
3.3 ANN: Multi Layer Perceptron (MLP)
A perceptron is a linear classifier that separates two classes with a straight line. Let X be the input vector and y be a perceptron that produces a single output:
where w is a vector of input weights, b is a bias and
denotes a nonlinear activation function.
A multilayer perceptron is a deep, artificial neural network composed of more than one perceptron. In this NN, the input layer receives the input vector, the output layer makes a decision and provide a prediction, and in between these two, there exists a number of hidden layers. The input signal moves forward from the input layer to the output layer, through the hidden layers. Let represents the number of input units, represents the number of hidden units and represents the number of output units. Each hidden layer unit is defined as:
where is an activation function on hidden layers, and . The common choice for is a logistic function . this hoice is important and depends on the nature of the data.
Similarly, the output units are denoted as:
where is another activation function and . Again, the common choices are logistic functions.
Considering the layers:
The number of parameters required to build this neural network is :
So depending on the weights and the choice of activation functions, different values of outputs are obtainable.
When the decision of the output layer is compared against the real outputs, the error is calculated. The parameters of the model, the weights and the bias terms, are adjusted during training in order to minimize the error:
$$w \leftarrow w + \eta\, \delta\, x,$$
where $\delta$ represents the error on the output unit, $x$ is the input that caused the error and $\eta$ is the learning rate, which defines how much to change the weight to correct the error.
The partial derivatives of the error with respect to the weights and biases are back-propagated through the MLP. Therefore the change of weights in the hidden layer follows the previous formula, except for the error term, which is computed as:
$$\delta_j = \varphi'_j \sum_{k} w_{kj}\, \delta_k,$$
where $\varphi'_j$ is the transfer derivative evaluated at hidden unit $j$, $\delta_k$ is the error on output unit $k$ and $w_{kj}$ is the weight connecting hidden unit $j$ to output unit $k$. The network keeps doing these forward and backward passes until the error can go no lower. This state is called convergence.
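The forward and backward passes can be sketched for a small network with one hidden layer. The sketch assumes logistic units (so the transfer derivative of an output $o$ is $o(1-o)$), a squared-error criterion, and hypothetical starting weights; it illustrates the update rule and is not the training code used for the experiments.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_h, b_h, w_o, b_o):
    h = [logistic(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w_h, b_h)]
    y = [logistic(sum(w * hj for w, hj in zip(row, h)) + b) for row, b in zip(w_o, b_o)]
    return h, y

def train_step(x, target, w_h, b_h, w_o, b_o, lr=0.5):
    """One forward and one backward pass; weights are updated in place."""
    h, y = forward(x, w_h, b_h, w_o, b_o)
    # Output error: (target - output) times the transfer derivative y(1 - y).
    d_o = [(t - yk) * yk * (1.0 - yk) for t, yk in zip(target, y)]
    # Hidden error: output errors propagated back through w_o, times h(1 - h).
    d_h = [sum(w_o[k][j] * d_o[k] for k in range(len(d_o))) * h[j] * (1.0 - h[j])
           for j in range(len(h))]
    # Update rule: weight <- weight + learning rate * error * input.
    for k, row in enumerate(w_o):
        for j in range(len(row)):
            row[j] += lr * d_o[k] * h[j]
        b_o[k] += lr * d_o[k]
    for j, row in enumerate(w_h):
        for i in range(len(row)):
            row[i] += lr * d_h[j] * x[i]
        b_h[j] += lr * d_h[j]
    # Squared error measured before this update.
    return sum((t - yk) ** 2 for t, yk in zip(target, y))

# Hypothetical 2-2-1 network trained repeatedly on a single observation.
w_h, b_h = [[0.1, -0.2], [0.3, 0.4]], [0.0, 0.0]
w_o, b_o = [[0.2, -0.1]], [0.0]
errors = [train_step([1.0, 0.5], [1.0], w_h, b_h, w_o, b_o) for _ in range(50)]
print(errors[0], errors[-1])
```

In practice the passes are repeated over the whole training set and stopped when the error no longer decreases.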
3.4 Support Vector Machines (SVM)
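SVM is an algorithm that implements nonlinear boundaries between classes by transforming the input data into a high dimensional space. This mapping into a new space is the task of kernel functions, which make the input data set linearly separable. In the new space, SVM constructs a maximal margin hyperplane which provides maximum separation between the output classes. The training observations that are closest to the maximal margin hyperplane are called support vectors. It was first introduced by Vapnik in the form of a quadratic optimization problem with bound constraints and one linear equality constraint. The standard SVM classifier is only applicable to binary classification problems. Given training observations $(x_i, y_i)$, $i = 1, \dots, N$, with labels $y_i \in \{-1, +1\}$, it requires
$$w^{\top}\phi(x_i) + b \ge 1 \ \text{ if } y_i = +1, \qquad w^{\top}\phi(x_i) + b \le -1 \ \text{ if } y_i = -1,$$
such that $w$ and $b$ represent the weight vector and bias respectively. $\phi$ represents a mapping function and $w^{\top}\phi(x) + b = 0$ the optimal separating hyperplane.
The formula above is equivalent to
$$y_i\left(w^{\top}\phi(x_i) + b\right) \ge 1$$
for $i = 1, \dots, N$.
This formula constructs two hyperplanes on opposite sides of the optimal hyperplane, with a total margin size of $2/\lVert w \rVert$.
The classifier then takes a decision based on the decision function
$$f(x) = \operatorname{sign}\left(w^{\top}\phi(x) + b\right).$$
In order to solve linearly nonseparable problems, it is common to use slack variables $\xi_i$, which allow misclassification errors. The optimization problem containing the weighted misclassification error is formed as:
$$\min_{w,\, b,\, \xi}\ \frac{1}{2}\, w^{\top} w + C \sum_{i=1}^{N} \xi_i$$
subject to $y_i\left(w^{\top}\phi(x_i) + b\right) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for $i = 1, \dots, N$,
where $C$ is a tuning hyperparameter that weights the importance of misclassification errors. This optimization problem is solvable using the Lagrangian, where a Lagrange multiplier $\alpha_i$ exists for each training observation. The nonzero $\alpha_i$'s represent the support vectors in the training set. Therefore, the formula above leads to a dual problem:
$$\min_{\alpha}\ \frac{1}{2}\, \alpha^{\top} Q\, \alpha - e^{\top} \alpha$$
subject to $y^{\top}\alpha = 0$ and $0 \le \alpha_i \le C$ for $i = 1, \dots, N$,
where $e$ is the vector of ones, $Q$ is a positive semidefinite matrix with entries $Q_{ij} = y_i y_j K(x_i, x_j)$, and $K(x_i, x_j) = \phi(x_i)^{\top}\phi(x_j)$ is a kernel function. The choice of kernel function depends on the type of problem at hand. The linear kernel function is $K(x_i, x_j) = x_i^{\top} x_j$, the radial basis kernel function is $K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right)$, the polynomial kernel function with degree $d$ is $K(x_i, x_j) = \left(\gamma\, x_i^{\top} x_j + r\right)^d$ and the sigmoid kernel is $K(x_i, x_j) = \tanh\left(\gamma\, x_i^{\top} x_j + r\right)$.
The final SVM classifier has the form:
$$f(x) = \operatorname{sign}\Big(\sum_{i=1}^{N} y_i\, \alpha_i\, K(x_i, x) + b\Big).$$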
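The four kernels and the kernel form of the classifier can be sketched as follows. The parametrisation (`gamma`, `r`, `degree`) follows a common convention and is an assumption here, as are the support vectors and multipliers in the example, which are made up rather than obtained by solving the dual problem.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(u, v):
    return dot(u, v)

def rbf_kernel(u, v, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def polynomial_kernel(u, v, gamma=1.0, r=0.0, degree=2):
    return (gamma * dot(u, v) + r) ** degree

def sigmoid_kernel(u, v, gamma=1.0, r=0.0):
    return math.tanh(gamma * dot(u, v) + r)

def svm_decision(x, support_vectors, labels, alphas, b, kernel):
    """Sign of sum_i y_i * alpha_i * K(x_i, x) + b over the support vectors."""
    score = sum(y * a * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if score >= 0 else -1

# Hypothetical support vectors and multipliers, one per class.
svs, ys, alphas, bias = [[1.0, 1.0], [-1.0, -1.0]], [1, -1], [0.5, 0.5], 0.0
print(svm_decision([2.0, 0.5], svs, ys, alphas, bias, linear_kernel))    # 1
print(svm_decision([-1.0, -3.0], svs, ys, alphas, bias, linear_kernel))  # -1
```

Only observations with nonzero multipliers enter the sum, which is why the classifier depends on the support vectors alone.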
3.5 Multi Class Support Vector Machines
The conventional SVM explained above is a binary classifier. For multiclass classification problems, we need to modify its structure. There are different approaches that extend the binary SVM structure to solve multiclass classification problems, two of which are explained and used in this study.
3.5.1 One versus All Approach
Consider an $M$-class classification problem. The problem can be decomposed into $M$ binary subproblems. The one-against-all approach constructs $M$ binary SVM classifiers, each of which separates one class from the other $M-1$ classes. Let $N$ be the number of training samples $(x_i, y_i)$, $i = 1, \dots, N$. The $m$th SVM classifier is trained with all the training examples of the $m$th class with positive labels, and all the others with negative labels (Liu and Zheng, 2005; Kim and Ahn, 2012).
3.5.2 One versus One Approach
This approach constructs binary SVM classifiers for all pairs of classes; in total, there are $M(M-1)/2$ pairs. For each given pair, the binary SVM classifier follows the optimization problem above to maximize the margin between the two classes (Kim and Ahn, 2012).
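The two decompositions can be made concrete with a short sketch that builds the binary subproblems without training any classifier. With $M$ classes, one-versus-all yields $M$ relabelled problems and one-versus-one yields $M(M-1)/2$ pairs. The helper names and the toy labels are assumptions made for illustration.

```python
from itertools import combinations

def one_vs_all_labels(y, positive_class):
    """Relabel y as +1 for positive_class and -1 for every other class."""
    return [1 if label == positive_class else -1 for label in y]

def one_vs_one_pairs(classes):
    """All unordered pairs of classes, one binary classifier per pair."""
    return list(combinations(classes, 2))

classes = ["AAA", "AA", "A", "BBB"]   # toy subset of the rating scale
y = ["A", "BBB", "AAA", "A", "AA"]    # toy training labels

ova_problems = {c: one_vs_all_labels(y, c) for c in classes}
ovo_problems = one_vs_one_pairs(classes)
print(len(ova_problems), len(ovo_problems))  # 4 6
```

For a scale with 20 rating classes the same functions give 20 one-versus-all problems and 190 one-versus-one pairs.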
4 Research Methodology
In order to have a fair assessment of the methodologies developed to predict credit rating, we need an equal evaluation footing for the methods. First, we need to use the same dataset with the same structure of training, validation and testing. Second, we need a fair measure for performance evaluation. We introduce a so-called Notch measure in Section 4.2 that we believe is appropriate for assessing the performance of credit rating algorithms. In our applications we focus on the methods described in Section 3. These are the methods used by the majority of the credit rating literature (see Tables 1, 2, and 3).
4.1 Data sets
Previous studies show that machine learning techniques are able to predict corporate credit ratings using historical financial variables (Huang et al., 2004; Hájek, 2012; Pai et al., 2015). In this study, we consider stocks of the financial sector, the energy sector and the healthcare sector. The input data set covers the corporate historical financial variables from 1990 to 2018 for the financial sector and from 2009 to 2018 for the energy and healthcare sectors. These variables are taken from both Bloomberg and Compustat and have been merged together to reduce the number of missing values. Among all available variables, we selected one set of financial variables for the financial sector and another set of variables for both the energy and healthcare sectors in order to predict the credit ratings. The primary factors in the selection of these variables are the availability of data and the influence of a variable on the credit rating based on previous studies (Huang et al., 2004). The output data, which are the corresponding credit ratings, are taken from S&P and contain rating classes from AAA to CC. The variables are listed in Tables 4 and 5. Table 6 and Figure 1 present the distribution of each sector with respect to rating classes.
Variables  

Asset/Equity  
Total Common Equity  
Total Asset  
Total Invested Capital  
Total Debt/Total Equity  
Total Debt/Total Asset  
Total Liabilities  
Short and Long term Debt  
Long Term Borrow  
Return on Asset  
Debt/Market Cap  
Operating Margin  
ISOPERINC  
Net Income  
Profit Margin  
EPSFORRATIOS 
Ratio  

Debt/EBITDA  
FFO/Total Debt  
EBITDA/Interest  
FFO/Interest  
CFO/Debt  
FFO/Net Profit  
NWC/Revenue  
Current Asset/Current Liabilities  
(FFO+Cash)/Current Liabilities  
EBITDA/Revenues  
Cash/Total Debt  
Total Debt/Tangible Net worth  
Total Debt/Revenue  
Debt/Capital  
Cash/Asset  
Total Fixed Capital/Total Fixed Assets  
Equity/Asset  
NWC/Total Assets  
Retained Earnings/Total Assets  
EBITDA/Total Assets 
Table 6: Distribution of each sector with respect to rating classes.
Rating Classes  Financial Sector  Energy Sector  Healthcare Sector
AAA  1  1  6
AA+  3  1  1
AA  12  1  5
AA-  24  4  5
A+  30  4  10
A  34  7  16
A-  36  11  18
BBB+  33  15  19
BBB  18  18  20
BBB-  12  16  17
BB+  6  10  14
BB  4  9  13
BB-  2  5  6
B+  1  3  3
B  1  1  2
B-  1  
CCC+  1  
CCC  1  
CCC-  
CC  1 
4.2 Evaluation Methodology
The typical way to evaluate a machine learning model is the standard accuracy, defined as the proportion of observations correctly classified in the test data. For example, the column “Accuracy” in Tables 1, 2, and 3 presents a summary of these percentages in prior work. A quick inspection of these numbers appears to say that accurate prediction based on publicly available financial variables using machine learning techniques is achievable. However, predicting credit rating is a problem that goes beyond most other classification problems.
First, the credit rating data is temporal. That is, a prediction has to be made at a point in time, and data from future quarters may not be used in the prediction. Most techniques divide the data into 80% training and 20% test data sampled at random, which invariably means that future observations are used in the model development. We will investigate the models' performance when the test data is made of future observations.
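The difference between a random split and a split that respects time can be sketched as follows. The record layout (a `period` field holding a year and quarter) and the 20% hold-out share are assumptions for illustration; the point is only that every test observation comes after every training observation.

```python
def chronological_split(records, test_fraction=0.2):
    """Sort by period and hold out the most recent observations as the test set."""
    ordered = sorted(records, key=lambda r: r["period"])
    cut = int(round(len(ordered) * (1.0 - test_fraction)))
    return ordered[:cut], ordered[cut:]

# Hypothetical quarterly records for one company, given in reverse order.
periods = [(year, quarter) for year in (2016, 2017, 2018) for quarter in (1, 2, 3, 4)][:10]
records = [{"period": p, "rating": "A"} for p in reversed(periods)]

train, test = chronological_split(records)
print(train[-1]["period"], test[0]["period"])  # (2017, 4) (2018, 1)
```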
The second issue is that corporate credit ratings do not change very often. Due to this, in about 90% of observations the credit rating does not change with respect to the previous quarter, and in only about 10% of observations the credit rating changes. Therefore, a simple model that just predicts the same credit rating as the previous quarter will have about 90% accuracy. This beats the majority of the results in Tables 1, 2, and 3. Thus, we believe it is crucial to investigate the performance of the classifier on those observations where the rating changes from the previous quarter.
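The naive model mentioned above is easy to make explicit. The sketch below scores a rule that repeats the previous quarter's rating; the rating history is made up so that exactly one change occurs in ten quarter-to-quarter transitions.

```python
def persistence_accuracy(ratings):
    """Accuracy of predicting each quarter's rating with the previous quarter's."""
    pairs = list(zip(ratings[:-1], ratings[1:]))
    hits = sum(1 for previous, current in pairs if previous == current)
    return hits / len(pairs)

# Hypothetical history: 11 quarters, a single downgrade from A to A-.
history = ["A", "A", "A", "A", "A-", "A-", "A-", "A-", "A-", "A-", "A-"]
print(persistence_accuracy(history))  # 0.9
```

Any classifier evaluated on such data should be compared against this baseline rather than against chance.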
The third issue is that the proportion of observations correctly classified tells us nothing about how far the inaccurate predictions were from the true rating. We quantify the distance between predictions and real ratings using what is known in the literature as the Notch distance. To create a numerical value of the distance we sort the ratings from best (AAA) to worst (CC) and assign consecutive integer values to the ratings in this order.
Definition 1 (Notch Distance).
To formulate this distance, we let $R$ denote the true rating of an observation. Let $\hat{R}$ denote the prediction given by a particular model. Then define a random variable, the Notch Distance, as
$$N = \hat{R} - R.$$
To assess the distribution of this variable for a particular model that makes predictions $\hat{R}_1, \dots, \hat{R}_n$, we calculate the frequency of the notch value $k$:
$$p_k = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{\hat{R}_i - R_i = k\},$$
where $n$ is the total number of observations in the test set, $k$ ranges over the integer notch values, and $\mathbf{1}\{\cdot\}$ is the indicator function taking value $1$ every time the condition is satisfied and $0$ otherwise.
In the definition above $p_0$ is the proportion of observations predicted correctly. To create a measure of accuracy we can use the random variable created. Specifically, we can calculate the expected value:
$$E[N] = \sum_{k} k\, p_k,$$
and we may regard it as a Dissimilarity Coefficient. The numerical value is the expected notch distance corresponding to a particular method. In a similar way we may define an Absolute Dissimilarity Coefficient
$$E[\,|N|\,] = \sum_{k} |k|\, p_k.$$
We can of course calculate the standard deviation of the variable
$$\sigma_N = \sqrt{\sum_{k} \left(k - E[N]\right)^2 p_k},$$
and that would be a measure of how far the typical prediction is from the actual rating.
Theorem 1 (Jensen’s Inequality).
Let $X$ be a random variable with finite expectation, and let $\varphi$ be a convex function whose domain includes the range of $X$. Then,
$$\varphi\left(E[X]\right) \le E\left[\varphi(X)\right].$$
If the function $\varphi$ is strictly convex, then the inequality holds with equality if and only if $X$ is degenerate.
As a consequence of Jensen’s inequality, let and . Then, and hence Jensen’s inequality reads
.
Both these measures will be influenced by the percentage of correct predictions (the proportion of observations with notch distance zero). To have a more accurate measure in terms of average notch distance when the prediction is wrong we can calculate a conditional expectation:
$$E\left[\,|N| \mid N \neq 0\,\right],$$
where $N$ denotes the notch distance.
This conditional expectation eliminates the correct predictions, and it should be a better measure of how many notches we expect the algorithm to be off when the prediction fails.
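The quantities introduced in this section can be computed with a few lines of code. The sketch below is illustrative only: the integer coding of the ratings (AAA as 0 up to CC as 19), the sign convention (predicted minus true) and the toy predictions are assumptions, and the conditional measure is taken as the mean absolute notch distance over the wrong predictions.

```python
import math
from collections import Counter

RATINGS = ["AAA", "AA+", "AA", "AA-", "A+", "A", "A-", "BBB+", "BBB", "BBB-",
           "BB+", "BB", "BB-", "B+", "B", "B-", "CCC+", "CCC", "CCC-", "CC"]
RANK = {rating: i for i, rating in enumerate(RATINGS)}  # assumed coding

def notch_distances(true, predicted):
    """Signed notch distance (predicted minus true) for every observation."""
    return [RANK[p] - RANK[t] for t, p in zip(true, predicted)]

def notch_summary(true, predicted):
    n = notch_distances(true, predicted)
    total = len(n)
    mean = sum(n) / total
    wrong = [v for v in n if v != 0]
    return {
        "frequency": {k: c / total for k, c in sorted(Counter(n).items())},
        "mean": mean,                                    # dissimilarity coefficient
        "mean_abs": sum(abs(v) for v in n) / total,      # absolute dissimilarity
        "std": math.sqrt(sum((v - mean) ** 2 for v in n) / total),
        "mean_abs_when_wrong": (sum(abs(v) for v in wrong) / len(wrong)) if wrong else 0.0,
    }

# Toy example: two of five predictions are one notch off, in opposite directions.
true = ["A", "A", "BBB", "BB+", "AA-"]
pred = ["A", "A-", "BBB", "BBB-", "AA-"]
summary = notch_summary(true, pred)
print(summary)
```

In this toy example the signed errors cancel, so the expected value is zero while the absolute measure is not, which is why both are reported.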
5 Experiment Results and Analysis
In this section we study the performance of four machine learning techniques for predicting corporate credit ratings: Bagged Decision Tree (BDT), Random Forest (RF), Multilayer Perceptron (MLP) and Support Vector Machine (SVM). Figures 2, 3 and 4 illustrate some examples of the type of results obtained. The figures depict the training set (blue circle), test set (red circle) and predictions (green x) for various stocks for three methods: Bagged Decision Tree, Random Forest, and MLP.
5.1 Analysis of Prediction Accuracy for different Machine Learning models
To address overfitting and obtain more accurate results, we use a so-called 10-fold cross validation procedure. Specifically, each sector's data is split into three subsets, a training set, a validation set and a test set, each holding a fixed share of the input data points. Table 7 presents the prediction accuracy of each of these techniques based on the 10-fold cross validation procedure. The SVM technique is a binary classification technique. However, in our case we have several categories that need to be classified, not just two. Thus we use two different SVM-based methods to classify the rating categories (Kim and Ahn, 2012).
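The mechanics of 10-fold cross validation can be sketched without any library. The sketch shows only the plain k-fold partition into training and held-out indices; the additional validation set described above and the exact shares used in the experiments are not reproduced, and the function name and seed are assumptions.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, held_out) index lists; each fold is held out exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, held_out

splits = list(k_fold_indices(25, k=10))
print(len(splits), [len(held_out) for _, held_out in splits])
```

The reported accuracy is then the average of the accuracies obtained on the ten held-out folds.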
Table 7: Prediction accuracy of each technique based on the 10-fold cross validation procedure.
Method  Financial Sector  Energy Sector  Healthcare Sector
Bagged Decision Tree  84.21%  82.11%  83.90%
Random Forest  82.83%  84.45%  82.97%
Multi Layer Perceptron  73.95%  78.19%  76.63%
One Vs. One SVM  42.12%  75.31%  71.29%
One Vs. All SVM  40.14%  59.17%  61.89%
Table 7 contains the type of results that are reported by the majority of articles applying machine learning algorithms to the credit rating problem. According to the results we obtain, Bagged Decision Tree and Random Forest clearly outperform the MLP and SVM-based approaches. This type of performance was observed for all datasets and periods considered. Based on Tables 1 and 2, most of the related studies on S&P credit scores report that neural networks and support vector machines obtain more than 80% prediction accuracy on US firms. Only one study, Moody and Utans (1994), obtained 36% accuracy, for a neural network on industrial firms.
For a financial analysis it is important to have a measure of how far the prediction is from the truth when the algorithm fails. To this end, recall that we introduced the Notch Distance in Definition 1 in Section 4.2.
A summary of the distribution of the notch distance is presented in Table 8. As expected, the values for a notch distance of zero are exactly the prediction accuracy percentages in Table 7.
Table 8: Distribution of the notch distance for each sector and method.
Sector  Method  Zero  One Absolute  Greater than One Absolute
Financial  Bagged Decision Tree  84.21%  12.01%  3.78%
Financial  Random Forest  82.83%  12.08%  5.09%
Financial  Multi Layer Perceptron  73.95%  15.92%  10.13%
Financial  One Vs. One SVM  42.12%  32.13%  25.75%
Financial  One Vs. All SVM  40.14%  32.28%  27.58%
Energy  Bagged Decision Tree  82.11%  13.98%  3.91%
Energy  Random Forest  84.45%  11.85%  3.70%
Energy  Multi Layer Perceptron  78.19%  15.48%  6.33%
Energy  One Vs. One SVM  75.31%  16.98%  7.71%
Energy  One Vs. All SVM  59.17%  20.64%  20.19%
Healthcare  Bagged Decision Tree  83.90%  13.51%  2.59%
Healthcare  Random Forest  82.97%  11.74%  5.29%
Healthcare  Multi Layer Perceptron  76.63%  16.08%  7.29%
Healthcare  One Vs. One SVM  71.29%  16.03%  12.68%
Healthcare  One Vs. All SVM  61.89%  24.50%  13.61%
Table 9: Expected value and standard deviation of the notch distance for every model.
Sector  Methods  
Financial  Bagged Decision Tree  0.0501  0.7614  0.2719
Financial  Random Forest  0.0068  0.7028  0.2731
Financial  Multi Layer Perceptron  0.1381  1.0152  0.5114
Financial  One Vs. One SVM  0.2661  1.6917  1.4584
Financial  One Vs. All SVM  0.6263  1.4562  1.4699
Energy  Bagged Decision Tree  0.0980  0.5817  0.2851
Energy  Random Forest  0.0594  0.5892  0.3212
Energy  Multi Layer Perceptron  0.0800  0.8421  0.3858
Energy  One Vs. One SVM  0.0685  0.7224  0.3481
Energy  One Vs. All SVM  0.0066  0.9117  0.6491
Healthcare  Bagged Decision Tree  0.1975  0.3512  0.2959
Healthcare  Random Forest  0.2154  0.6317  0.3223
Healthcare  Multi Layer Perceptron  0.0531  0.6854  0.4486
Healthcare  One Vs. One SVM  0.1191  0.7515  0.6531
Healthcare  One Vs. All SVM  0.2461  0.9683  0.8372
Table 9 provides the expected value and standard deviation of the notch distance for every model. The results are grouped by the three sectors we are analyzing: the financial sector, the energy sector and the healthcare sector. First we look at the expected value as an indication of the symmetry of the predictions. The standard deviation and the expected absolute value are measures of how far the prediction is from the actual rating.
The results in the table generally indicate that the Bagged Decision Tree and Random Forest methods perform better than the ANN and SVM. The largest proportion of predictions is accurate, so the notch distance is zero. We next calculate the conditional expectation of the notch distance when the prediction is wrong. Table 10 shows these conditional expected values and the conditional standard deviation.
Table 10: Conditional expected values and conditional standard deviation of the notch distance when the prediction is wrong.
Sector  Methods  
Financial  Bagged Decision Tree  0.0018  0.8619  1.2795
Financial  Random Forest  0.0489  0.9037  1.1451
Financial  Multi Layer Perceptron  0.2391  1.3294  1.3912
Financial  One Vs. One SVM  0.0911  1.8011  2.0594
Financial  One Vs. All SVM  0.7216  1.4991  1.9171
Energy  Bagged Decision Tree  0.0770  0.3735  0.8264
Energy  Random Forest  0.0885  0.3010  0.7587
Energy  Multi Layer Perceptron  0.1317  0.5615  0.9991
Energy  One Vs. One SVM  0.2415  0.3512  0.9387
Energy  One Vs. All SVM  0.5131  0.4411  1.2853
Healthcare  Bagged Decision Tree  0.1973  0.2218  0.5313
Healthcare  Random Forest  0.4516  0.3054  0.6214
Healthcare  Multi Layer Perceptron  0.2712  0.3614  0.7778
Healthcare  One Vs. One SVM  0.0710  0.5784  1.0496
Healthcare  One Vs. All SVM  0.4052  0.6413  1.2384
Looking at the standard deviation and the expected absolute value of the unconditional notch distance in Table 9, it would appear that the performance for all sectors is roughly similar. However, looking at the conditional numbers in Table 10, we see that the healthcare and perhaps the energy sectors are predicted better than the financial sector. The numerical values in Table 10 also provide information about how far the prediction is from the truth when it is wrong: for example, the entries for the Bagged Decision Tree give the average number of notches by which its wrong predictions miss the true value.
5.2 Comparing the performance of the various Machine Learning Credit Ratings with the differences between different credit rating agencies.
In the previous section we compared the accuracy of the predictions of the machine learning models with the corporate ratings provided by Standard and Poor’s, Moody’s and Fitch. These three credit agencies do not always agree on the ratings of companies. Therefore, the variation of their ratings, as measured by the Notch distance, is useful for assessing what an acceptable standard deviation for a specific credit rating is.
In Table 11 we present the same statistics as in the previous table, using quarterly ratings from each of the rating agencies. We can see that, as measured by the notch distance, the rating agencies agree with each other less than most of the machine learning methods considered in this paper agree with the true ratings. This should give some confidence in the machine learning methods in this paper.
Table 11: Notch distance statistics between the ratings of the three rating agencies.
S&P and Moody’s  0.33  1.23  0.76  0.54  1.53  1.43
S&P and Fitch  0.05  0.95  0.64  0.11  1.32  1.23
Moody’s and Fitch  0.30  1.02  0.75  0.51  1.28  1.27
5.3 Capturing rating changes
Generally, credit ratings do not change very much from quarter to quarter. For example, if a dataset contains 90% of quarters in which ratings did not change from the previous quarter, a very naive model that predicts the same rating as in the previous quarter would have 90% accuracy. Thus, the most important issue when predicting credit ratings from available information is predicting a rating change when the actual rating changes. Table 12 shows the percentage of credit rating changes that have been captured by each model. These numbers are generally lower than the overall accuracy of the models, but they are larger than we expected. Moreover, the performance of the Bagged Decision Tree and Random Forest is not that far from the MLP and SVM in capturing rating changes. These ideas will be investigated in future work, as predicting changes is the most important part of credit analysis.
Table 12: Percentage of credit rating changes captured by each model.
Method  Financial Sector  Energy Sector  Healthcare Sector
Bagged Decision Tree  67.64%  88.88%  77.27%
Random Forest  71.56%  77.77%  83.33%
Multi Layer Perceptron  53.92%  77.77%  66.66%
One Vs. One SVM  29.41%  77.77%  58.33%
One Vs. All SVM  25.49%  33.33%  41.66%
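One way to compute such a percentage is sketched below. The definition used here is an assumption: a change counts as captured when the actual rating differs from the previous quarter's rating and the model predicts the new rating exactly. The ratings in the example are made up.

```python
def captured_change_rate(previous, actual, predicted):
    """Share of rating changes (actual != previous) that the model predicted exactly."""
    changes = [(a, p) for prev, a, p in zip(previous, actual, predicted) if a != prev]
    if not changes:
        return None  # no rating change in the sample
    return sum(1 for a, p in changes if a == p) / len(changes)

# Hypothetical quarter: two of four ratings change, the model catches one of them.
previous = ["A", "A", "BBB", "BB"]
actual = ["A", "A-", "BBB+", "BB"]
predicted = ["A", "A-", "BBB", "BB"]
print(captured_change_rate(previous, actual, predicted))  # 0.5
```

On the same toy data the ordinary accuracy is 75%, which shows how the two measures can diverge.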
6 Conclusion and Future Directions
In this work we first provide a survey of the current literature using machine learning techniques to predict corporate credit ratings. The survey pointed toward support vector machines (SVM) and artificial neural networks (ANN) as the methods that produce the most successful results. We implemented these methods and two other methods based on decision trees, which in addition have the ability to select the best features for the classification problem. We applied all methods to the same datasets, spanning 2009 to 2018 for companies from two sectors (energy and healthcare) and 1990 to 2018 for companies from the financial sector. All our results indicate that decision tree methods in fact outperform SVM and MLP. To check the results we introduced the Notch Distance concept. This measure helps us quantify how far the predictions are from the true values when the algorithm fails to produce the correct prediction. Using this new measure the results were unchanged. Since this is a new measure, we need a benchmark for the results we obtained. To this end we calculated the Notch Distance using ratings provided by S&P, Moody’s and Fitch on the same companies. We found that the best algorithms produce a Notch Distance from the true ratings which is comparable with the distance between ratings produced by different rating agencies on the same company.
References
 Abdou (2009) Abdou, H. A. (2009). An evaluation of alternative scoring models in private banking. The Journal of Risk Finance, 10(1):38–53.
 Abdou et al. (2016) Abdou, H. A., Tsafack, M. D. D., Ntim, C. G., and Baker, R. D. (2016). Predicting creditworthiness in retail banking with limited scoring data. KnowledgeBased Systems, 103:89–103.

 Addo et al. (2018) Addo, P. M., Guegan, D., and Hassani, B. (2018). Credit risk analysis using machine and deep learning models. Risks, 6(2):38.
 Akkoç (2012) Akkoç, S. (2012). An empirical comparison of conventional techniques, neural networks and the three stage hybrid adaptive neuro fuzzy inference system (anfis) model for credit scoring analysis: The case of turkish credit card data. European Journal of Operational Research, 222(1):168–178.
 Angelini et al. (2008) Angelini, E., di Tollo, G., and Roli, A. (2008). A neural network approach for credit risk evaluation. The quarterly review of economics and finance, 48(4):733–755.
 Bahrammirzaee (2010) Bahrammirzaee, A. (2010). A comparative survey of artificial intelligence applications in finance: artificial neural networks, expert system and hybrid intelligent systems. Neural Computing and Applications, 19(8):1165–1195.
 Bottou et al. (1994) Bottou, L., Cortes, C., Denker, J. S., Drucker, H., Guyon, I., Jackel, L. D., LeCun, Y., Muller, U. A., Sackinger, E., Simard, P., et al. (1994). Comparison of classifier methods: a case study in handwritten digit recognition. In Pattern Recognition, 1994. Vol. 2Conference B: Computer Vision & Image Processing., Proceedings of the 12th IAPR International. Conference on, volume 2, pages 77–82. IEEE.
 Brabazon and O’Neill (2006) Brabazon, A. and O’Neill, M. (2006). Credit classification using grammatical evolution. Informatica, 30(3).
 Breiman (2001) Breiman, L. (2001). Random forests. Machine learning, 45(1):5–32.
 Breiman et al. (1984) Breiman, L., Friedman, J., Olshen, R., and Stone, C. (1984). Classification and regression trees. Monterey, California: Wadsworth.
 Brennan and Brabazon (2004) Brennan, D. and Brabazon, A. (2004). Corporate bond rating using neural networks. In ICAI.
 Cao et al. (2006) Cao, L., Guan, L. K., and Jingqing, Z. (2006). Bond rating using support vector machine. Intelligent Data Analysis, 10(3):285–296.
 Chandra et al. (2009) Chandra, D. K., Ravi, V., and Bose, I. (2009). Failure prediction of dotcom companies using hybrid intelligent techniques. Expert Systems with Applications, 36(3):4830–4837.
 Chaveesuk et al. (1999) Chaveesuk, R., SrivareeRatana, C., and Smith, A. E. (1999). Alternative neural network approaches to corporate bond rating. Journal of Engineering Valuation and Cost Analysis, 2(2):117–131.
 Chen (2012) Chen, Y.S. (2012). Classifying credit ratings for asian banks using integrating feature selection and the cpdabased rough sets approach. KnowledgeBased Systems, 26:259–270.
 Chen and Cheng (2013) Chen, Y.S. and Cheng, C.H. (2013). Hybrid models based on rough set classifiers for setting credit rating decision rules in the global banking industry. KnowledgeBased Systems, 39:224–239.
 Cortes and Vapnik (1995) Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine learning, 20(3):273–297.
 Culkin and Das (2017) Culkin, R. and Das, S. R. (2017). Machine learning in finance: The case of deep learning for option pricing. Journal of Investment Management, 15(4).
 Daniel et al. (2019) Daniel, C., Hančlová, J., el Woujoud Bousselmi, H., et al. (2019). Corporate rating forecasting using artificial intelligence statistical techniques. Investment Management & Financial Innovations, 16(2):295.
 Dima and Vasilache (2016) Dima, A. M. and Vasilache, S. (2016). Companies default prediction using neural networks. Romanian Journal of Economic Forecasting, 19(3):127.
 Dutta and Shekhar (1988) Dutta, S. and Shekhar, S. (1988). Bond rating: a nonconservative application of neural networks. In Proceedings of the IEEE international conference on neural networks, volume 2, pages 443–450. Los Alamitos: IEEE Press.
 Feng et al. (2018) Feng, G., He, J., and Polson, N. G. (2018). Deep learning for predicting asset returns. arXiv preprint arXiv:1804.09314.
 Gangolf et al. (2016) Gangolf, C., Dochow, R., Schmidt, G., and Tamisier, T. (2016). Automated credit rating prediction in a competitive framework. RAIROOperations Research, 50(45):749–765.
 Ganguli and Dunnmon (2017) Ganguli, S. and Dunnmon, J. (2017). Machine learning for better models for predicting bond prices. arXiv preprint arXiv:1705.01142.
 Garavaglia (1991) Garavaglia, S. (1991). An application of a counterpropagation neural network: simulating the standard and poor’s corporate bond rating system. In Artificial Intelligence Applications on Wall Street, 1991. Proceedings., First International Conference on, pages 278–287. IEEE.
 Gogas et al. (2014) Gogas, P., Papadimitriou, T., and Agrapetidou, A. (2014). Forecasting bank credit ratings. The Journal of Risk Finance, 15(2):195–209.
 Gu et al. (2019) Gu, S., Kelly, B. T., and Xiu, D. (2019). Empirical asset pricing via machine learning. 31st Australasian Finance and Banking Conference 2018.
 Hájek (2012) Hájek, P. (2012). Credit rating analysis using adaptive fuzzy rulebased systems: an industryspecific approach. Central European Journal of Operations Research, 20(3):421–434.
 Hajek and Michalak (2013) Hajek, P. and Michalak, K. (2013). Feature selection in corporate credit rating prediction. KnowledgeBased Systems, 51:72–84.
 Hájek and Olej (2011) Hájek, P. and Olej, V. (2011). Credit rating modelling by kernelbased approaches with supervised and semisupervised learning. Neural Computing and Applications, 20(6):761–773.
 Hájek and Olej (2014) Hájek, P. and Olej, V. (2014). Predicting firms’ credit ratings using ensembles of artificial immune systems and machine learning–an oversampling approach. In IFIP International Conference on Artificial Intelligence Applications and Innovations, pages 29–38. Springer.
 Hajek et al. (2016) Hajek, P., Olej, V., and Prochazka, O. (2016). Predicting corporate credit ratings using content analysis of annual reports–a naïve bayesian network approach. In FinanceCom 2016, pages 47–61. Springer.
 Horrigan (1966) Horrigan, J. O. (1966). The determination of longterm credit standing with financial ratios. Journal of Accounting Research, pages 44–62.
 Huang et al. (2005) Huang, W., Nakamori, Y., and Wang, S.Y. (2005). Forecasting stock market movement direction with support vector machine. Computers & Operations Research, 32(10):2513–2522.
 Huang et al. (2004) Huang, Z., Chen, H., Hsu, C.J., Chen, W.H., and Wu, S. (2004). Credit rating analysis with support vector machines and neural networks: a market comparative study. Decision support systems, 37(4):543–558.
 Hwang et al. (2009) Hwang, R.C., Cheng, K., and Lee, C.F. (2009). On multipleclass prediction of issuer credit ratings. Applied Stochastic Models in Business and Industry, 25(5):535–550.
 Hwang et al. (2010) Hwang, R.C., Chung, H., and Chu, C. (2010). Predicting issuer credit ratings using a semiparametric method. Journal of Empirical Finance, 17(1):120–137.
 Irmatova (2016) Irmatova, E. (2016). Relarm: A rating model based on relative pca attributes and kmeans clustering. arXiv preprint arXiv:1608.06416.

 Kamstra et al. (2001) Kamstra, M., Kennedy, P., and Suan, T.K. (2001). Combining bond rating forecasts using logit. Financial Review, 36(2):75–96.
 Kanchymalay et al. (2017) Kanchymalay, K., Salim, N., Sukprasert, A., Krishnan, R., and Hashim, U. R. (2017). Multivariate time series forecasting of crude palm oil price using machine learning techniques. In IOP Conference Series: Materials Science and Engineering, volume 226, page 012117. IOP Publishing.
 Kao et al. (2012) Kao, L.J., Chiu, C.C., and Chiu, F.Y. (2012). A bayesian latent variable model with classification and regression tree approach for behavior and credit scoring. KnowledgeBased Systems, 36:245–252.
 Kaplan and Urwitz (1979) Kaplan, R. S. and Urwitz, G. (1979). Statistical models of bond ratings: A methodological inquiry. Journal of business, pages 231–261.
 Karminsky and Khromova (2016) Karminsky, A. M. and Khromova, E. (2016). Extended modeling of banks’ credit ratings. Procedia Computer Science, 91:201–210.
 Khandani et al. (2010) Khandani, A. E., Kim, A. J., and Lo, A. W. (2010). Consumer creditrisk models via machinelearning algorithms. Journal of Banking & Finance, 34(11):2767–2787.
 Khashman (2010) Khashman, A. (2010). Neural networks for credit risk evaluation: Investigation of different neural models and learning schemes. Expert Systems with Applications, 37(9):6233–6239.
 Khemakhem and Boujelbene (2015) Khemakhem, S. and Boujelbene, Y. (2015). Credit risk prediction: A comparative study between discriminant analysis and the neural network approach. Accounting and Management Information Systems, 14(1):60.
 Kim and Sohn (2010) Kim, H. S. and Sohn, S. Y. (2010). Support vector machines for default prediction of smes based on technology credit. European Journal of Operational Research, 201(3):838–846.
 Kim et al. (1993) Kim, J. W., Weistroffer, H. R., and Redmond, R. T. (1993). Expert systems for bond rating: a comparative analysis of statistical, rulebased and neural network systems. Expert systems, 10(3):167–172.
 Kim (2003) Kim, K.j. (2003). Financial time series forecasting using support vector machines. Neurocomputing, 55(12):307–319.
 Kim and Ahn (2012) Kim, K.j. and Ahn, H. (2012). A corporate credit rating model using multiclass support vector machines with an ordinal pairwise partitioning approach. Computers & Operations Research, 39(8):1800–1811.
 Kim (2005) Kim, K. S. (2005). Predicting bond ratings using publicly available information. Expert Systems with Applications, 29(1):75–81.

 Kim and Han (2001) Kim, K.S. and Han, I. (2001). The clusterindexing method for casebased reasoning using selforganizing maps and learning vector quantization for bond rating cases. Expert systems with applications, 21(3):147–156.
 Koh et al. (2004) Koh, H. C., Tan, W. C., and Peng, G. C. (2004). Credit scoring using data mining techniques. Singapore Management Review, 26(2):25.
 Kwon et al. (1997) Kwon, Y. S., Han, I., and Lee, K. C. (1997). Ordinal pairwise partitioning (opp) approach to neural networks training in bond rating. International journal of intelligent systems in accounting finance and management, 6:23–40.

 Lee et al. (2006) Lee, T.S., Chiu, C.C., Chou, Y.C., and Lu, C.J. (2006). Mining the customer credit using classification and regression tree and multivariate adaptive regression splines. Computational Statistics & Data Analysis, 50(4):1113–1130.
 Lee (2007) Lee, Y.C. (2007). Application of support vector machines to corporate credit rating prediction. Expert Systems with Applications, 33(1):67–74.
 Lessmann et al. (2015) Lessmann, S., Baesens, B., Seow, H.V., and Thomas, L. C. (2015). Benchmarking stateoftheart classification algorithms for credit scoring: An update of research. European Journal of Operational Research, 247(1):124–136.
 Liu and Zheng (2005) Liu, Y. and Zheng, Y. F. (2005). Oneagainstall multiclass svm classification using reliability measures. Neural Networks, 2:849–854.
 Maher and Sen (1997) Maher, J. J. and Sen, T. K. (1997). Predicting bond ratings using neural networks: a comparison with logistic regression. Intelligent Systems in Accounting, Finance & Management, 6(1):59–72.
 Martens et al. (2010) Martens, D., Van Gestel, T., De Backer, M., Haesen, R., Vanthienen, J., and Baesens, B. (2010). Credit rating prediction using ant colony optimization. Journal of the Operational Research Society, 61(4):561–573.
 Moody and Utans (1994) Moody, J. and Utans, J. (1994). Architecture selection strategies for neural networks: Application to corporate bond rating prediction. In Neural networks in the capital markets, pages 277–300. Citeseer.
 Moscatelli et al. (2019) Moscatelli, M., Narizzano, S., Parlapiano, F., Viggiano, G., et al. (2019). Corporate default forecasting with machine learning. Technical report, Bank of Italy, Economic Research and International Relations Area.
 Oelerich and Poddig (2006) Oelerich, A. and Poddig, T. (2006). Evaluation of rating systems. Expert Systems with Applications, 30(3):437–447.
 Pacelli and Azzollini (2011) Pacelli, V. and Azzollini, M. (2011). An artificial neural network approach for credit risk management. Journal of Intelligent Learning Systems and Applications, 3(02):103.
 Pai et al. (2015) Pai, P.F., Tan, Y.S., and Hsu, M.F. (2015). Credit rating analysis by the decision-tree support vector machine with ensemble strategies. International Journal of Fuzzy Systems, 17(4):521–530.
 Paleologo et al. (2010) Paleologo, G., Elisseeff, A., and Antonini, G. (2010). Subagging for credit scoring models. European journal of operational research, 201(2):490–499.
 Peng et al. (2011) Peng, Y., Wang, G., Kou, G., and Shi, Y. (2011). An empirical study of classification algorithm evaluation for financial risk prediction. Applied Soft Computing, 11(2):2906–2915.
 Petropoulos et al. (2016) Petropoulos, A., Chatzis, S. P., and Xanthopoulos, S. (2016). A novel corporate credit rating system based on student’s-t hidden markov models. Expert Systems with Applications, 53:87–105.
 Platt et al. (2000) Platt, J. C., Cristianini, N., and Shawe-Taylor, J. (2000). Large margin dags for multiclass classification. In Advances in neural information processing systems, pages 547–553.
 Ryser and Denzler (2009) Ryser, M. and Denzler, S. (2009). Selecting credit rating models: a cross-validation-based comparison of discriminatory power. Financial Markets and Portfolio Management, 23(2):187–203.
 Saha and Waheed (2017) Saha, S. and Waheed, S. (2017). Credit risk of bank customers can be predicted from customer’s attribute using neural network. International Journal of Computer Applications, 975:8887.

 Sambasivan and Das (2017) Sambasivan, R. and Das, S. (2017). A statistical machine learning approach to yield curve forecasting. In Computational Intelligence in Data Science (ICCIDS), 2017 International Conference on, pages 1–6. IEEE.
 Schölkopf et al. (1999) Schölkopf, B., Burges, C. J., and Smola, A. J. (1999). Advances in kernel methods: support vector learning. MIT press.
 Shin and Han (2001) Shin, K.-s. and Han, I. (2001). A case-based approach using inductive indexing for corporate bond rating. Decision Support Systems, 32(1):41–52.
 Surkan and Singleton (1990) Surkan, A. J. and Singleton, J. C. (1990). Neural networks for bond rating improved by multiple hidden layers. In Neural Networks, 1990., 1990 IJCNN International Joint Conference on, pages 157–162. IEEE.
 Tsai and Wu (2008) Tsai, C.F. and Wu, J.W. (2008). Using neural network ensembles for bankruptcy prediction and credit scoring. Expert systems with applications, 34(4):2639–2649.

 Vapnik (2013) Vapnik, V. (2013). The nature of statistical learning theory. Springer science & business media.
 Veronezi (2016) Veronezi, P. H. (2016). Corporate credit rating prediction using machine learning techniques. Master’s thesis, Stevens Institute of Technology.
 Wallis et al. (2019) Wallis, M., Kumar, K., and Gepp, A. (2019). Credit rating forecasting using machine learning techniques. In Managerial Perspectives on Intelligent Big Data Analytics, pages 180–198. IGI Global.
 Wang et al. (2009) Wang, A., Yuan, W., Liu, J., Yu, Z., and Li, H. (2009). A novel pattern recognition algorithm: Combining art network with svm to reconstruct a multiclass classifier. Computers & Mathematics with Applications, 57(11–12):1908–1914.
 West (2000) West, D. (2000). Neural network credit scoring models. Computers & Operations Research, 27(11–12):1131–1152.
 West (1970) West, R. R. (1970). An alternative approach to predicting corporate bond ratings. Journal of Accounting Research, pages 118–125.
 Wu and Hsu (2012) Wu, T.C. and Hsu, M.F. (2012). Credit risk assessment and decision making by a fusion approach. Knowledge-Based Systems, 35:102–110.
 Xiao et al. (2016) Xiao, H., Xiao, Z., and Wang, Y. (2016). Ensemble classification based on supervised clustering for credit scoring. Applied Soft Computing, 43:73–86.
 Ye et al. (2008) Ye, Y., Liu, S., and Li, J. (2008). A multiclass machine learning approach to credit rating prediction. In Information Processing (ISIP), 2008 International Symposiums on, pages 57–61. IEEE.
 Yeh et al. (2012) Yeh, C.C., Lin, F., and Hsu, C.Y. (2012). A hybrid kmv model, random forests and rough set theory approach for credit rating. Knowledge-Based Systems, 33:166–172.
 Yu et al. (2008) Yu, L., Wang, S., and Lai, K. K. (2008). Credit risk assessment with a multistage neural network ensemble learning approach. Expert systems with applications, 34(2):1434–1444.
 Zhao et al. (2015) Zhao, Z., Xu, S., Kang, B. H., Kabir, M. M. J., Liu, Y., and Wasinger, R. (2015). Investigation and improvement of multilayer perceptron neural networks for credit scoring. Expert Systems with Applications, 42(7):3508–3516.
 Zhong et al. (2014) Zhong, H., Miao, C., Shen, Z., and Feng, Y. (2014). Comparing the learning effectiveness of bp, elm, ielm, and svm for corporate credit ratings. Neurocomputing, 128:285–295.