A Sparsity Algorithm with Applications to Corporate Credit Rating

by Dan Wang, et al.

In Artificial Intelligence, interpreting the results of a Machine Learning technique, often termed a black box, is a difficult task. A counterfactual explanation of a particular "black box" attempts to find the smallest change to the input values that modifies the prediction to a particular output, other than the original one. In this work we formulate the problem of finding a counterfactual explanation as an optimization problem. We propose a new "sparsity algorithm" which solves the optimization problem, while also maximizing the sparsity of the counterfactual explanation. We apply the sparsity algorithm to provide a simple suggestion to publicly traded companies in order to improve their credit ratings. We validate the sparsity algorithm with a synthetically generated dataset and we further apply it to quarterly financial statements from companies in the financial, healthcare and IT sectors of the US market. We provide evidence that the counterfactual explanation can capture the nature of the real statement features that changed between the current quarter and the following quarter when ratings improved. The empirical results show that the higher the rating of a company, the greater the "effort" required to further improve its credit rating.





1 Introduction

Corporate credit rating is an assessment of the credit risk level of a company. Generally, it is issued by a credit rating agency such as Standard and Poor’s, Moody’s, or Fitch (S&P Global, 2018; Standard and Poor’s Corporation, 1981). The rating expresses the agency’s opinion about a company’s ability to meet its financial obligations in full and on time. This rating serves as an aid to financial investors in assessing various investment opportunities. Since the credit rating is supposed to be a uniform measure across companies, it enables investors to compare risk levels of companies which issue the financial instruments present in their portfolios.

Thus, credit rating is a very important measure for companies. Credit rating expedites the process of purchasing and issuing bonds by providing a uniform and efficient measure of credit risk (Akdemir and Karslı, 2012). Thus, instead of borrowing from banks, public companies are more likely to raise money from capital markets by issuing bonds and notes. A good credit rating is beneficial to companies. Bond yields are negatively related to credit rating (Luo and Chen, 2019). That is, a higher credit rating can help public companies raise funds at a lower repayment cost. Further, a good credit rating means that the company is less likely to default on its obligations, thus attracting risk-averse investors such as pension funds and mutual funds (Dittrich, 2007).

Recent literature implements machine learning and deep learning techniques to assess public corporations’ credit ratings. In the US markets, corporate credit rating has been evaluated using Support Vector Machines (SVMs), tree-based models, and network learning methods such as Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) (Ye et al., 2008; Wallis et al., 2019; Hájek and Olej, 2011, 2014; Golbayani et al., 2020b; Wang et al., 2020). Deep learning techniques are also popular for assessing credit risk in the European and Asian markets (Khashman, 2010; Kim and Sohn, 2010; West, 2000; Khemakhem and Boujelbene, 2015; Zhao et al., 2015; Addo et al., 2018).

Even though machine learning and deep learning methods have achieved considerable accuracy for various types of classification problems (LeCun et al., 2015), and in particular for credit risk assessment (Golbayani et al., 2020a), the constructed neural networks continue to be treated as a black-box method. This black box maps the input features into a classification output without low-level explanation (Chakraborty et al., 2017; Carvalho et al., 2019). However, it is very important for a model to provide visibility and interpretability, that is, to show which specific features are important and how these features impact the output. In finance, interpretability of the model is critical, as it is required by law. Financial regulation provides investors the right to receive an explanation of the algorithms used by investment firms (Protection Regulation, 2018; General Data Protection Regulation, 2016; Goodman and Flaxman, 2017).

Interpretable machine learning is a fast growing field that addresses this issue. It is defined as the use of machine learning or deep learning models to extract relevant knowledge about domain relations contained in the data (Murdoch et al., 2019). The interpretability of a machine learning method may be divided into: (1) model explanation, and (2) post-hoc explanation. Model explanation means that the model is inherently interpretable and can generate explanations when trained (Yang et al., 2016). Post-hoc explanation refers to the capability of the model to generate an explanation based on existing decisions (Mordvintsev et al., 2015; Plumb et al., 2018).

In mathematics, once a functional relationship $f$ between an input and an output is constructed (i.e., a deterministic function), one is able to determine the preimage of a set. That is, calculate $f^{-1}(B)$, defined as the set of inputs $x$ that take values in $B$, where $B$ is a set in the co-domain of $f$. Generally speaking, a machine learning technique creates a relationship $y = f(x)$, where $f$ is provided by the ML technique used, and typically $y$ is categorical. Thus, given another input $x'$, a constructed ML technique is able to calculate the output of $x'$ by simply calculating $f(x')$. However, providing the preimage of a specific category is hard. Often, the domain set is ill defined, and furthermore there is randomness in the ML technique. For instance, if a particular $x$ has a 50/50 probability of being in category $A$ respectively $B$, then should the preimage of $A$ contain $x$ or rather should $x$ be in the preimage of $B$?

In order to answer such questions, the counterfactual explanation was introduced for ML techniques (Wachter et al., 2017). Specifically, given a ML technique $f$ that associates an input $x$ with an output $y$, if we want the value of $y$ to be $y'$, how should the particular input $x$ be modified so that the value of $f$ at the modified $x$ is $y'$? The counterfactual explanation technique was applied to image recognition, healthcare and language models (Goyal et al., 2019; Huang et al., 2019; Prosperi et al., 2020). The closest application of counterfactual explanation to finance we could find is to credit card applications with a binary black box classifier (Grath et al., 2018). Specifically, in the paper cited, the authors use the counterfactual explanation to provide advice about how to change applications for credit cards in order to have a successful outcome. The authors solve a typical optimization problem using a Median Absolute Deviation (MAD) norm. In our work we modify the optimization problem by focusing on the sparsity of the counterfactual solution.

Figure 1: Example of counterfactual explanation

We next discuss the challenges faced when calculating a counterfactual explanation. In a classification problem a high dimensional input $x$ is assigned through $f$ to a $y$ in a countable (often finite) set of outputs. Therefore, the ML “function” $f$ is not injective. This means that for each output there is a range of inputs which are mapped into it. Now look at Fig. 1, which describes our credit rating problem. Obviously, the solution is not unique. That is, there are multiple $x'$’s which will map into the new $y'$. To make matters even more complex the function $f$ is in fact “probabilistic”. For example, the same input $x$ is associated to output $y_1$ with probability $p_1$, and with output $y_2$ with probability $p_2$. The ML technique decides on a single output, but in fact the magnitude of the probabilities needs to be taken into consideration as well.

When studying corporate credit rating, there are two considerations worth mentioning. First, a financial statement contains a large number of features. For instance, the Compustat® dataset (Compustat, 2019), contains financial accounting variables collected from the original quarterly financial statements. It is impractical for a company to focus on changing all features. Second, some of the features may not be possible to change, for example: Comprehensive Income - Noncontrolling Interest, Equity in Earnings (I/S) - Unconsolidated Subsidiaries.

The purpose of this paper is to set up a proper optimization problem to address these issues. The central idea is to minimize the number of features modified in going from $x$ to $x'$, where $x'$ is the counterfactual explanation. In finance, particularly when applied to corporate credit rating, this allows a company’s decision makers to focus their attention when trying to improve the company’s credit rating. In section 2 we describe the optimization problem and propose an algorithm we call “the sparsity algorithm” to solve it. Section 3 presents experiment results obtained from both simulated data and real rating data.

2 Methodology

In this section, we describe the optimization problem and the algorithm introduced to solve this problem.

2.1 Statement of problem

As described in the introduction, the goal of this work is to discover the smallest subset of input data that can realistically be changed, so that the output of the model is reclassified for this changed input. To achieve this goal we propose solving a minimization problem.

Specifically, given a trained deep learning model $f$ which relates an input variable $x$ with a specific classification $y = f(x)$, the problem is to find $\delta$ such that the response of $f$ for $x + \delta$ is a different class than the response for $x$. However, only certain components of $x$ can realistically be modified. Further, the problem attempts to find the smallest modification of $x$ which will accomplish the respective reclassification. Thus, we minimize the L0 “norm” of $\delta$, and we impose a mask $m$. The problem to solve is expressed mathematically as:

$$\min_{\delta} \|\delta\|_0 \quad \text{subject to} \quad f(x + \delta \circ m) = y'. \tag{1}$$

Here $\|\cdot\|_0$ is a mapping which counts the nonzero elements of a vector. This operator, often described as the L0 norm, is not actually a norm. However, it is used extensively in the Machine Learning area (Shukla and Fricklas, 2018). $\circ$ is the Hadamard product that calculates the product element-wise of two matrices of the same dimensions. $m$ is a predefined vector with values 0 and 1, which masks the input components of $x$ which are not modifiable. $y'$ is the desired output.

With $\delta^*$ denoting the solution of the problem (1), the counterfactual explanation is $x' = x + \delta^* \circ m$ (Wachter et al., 2017). In a credit rating problem, $y'$ is typically taken as a one grade upgrade from the original credit rating $y$, but in principle it may be any target rating.
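As a minimal illustration of this notation (a sketch, not the authors' code), the masked change and the resulting counterfactual can be computed with NumPy. The input, the change and the mask below are hypothetical values chosen only for the example.

```python
import numpy as np

# Hypothetical 5-feature input, proposed change, and 0/1 mask.
x = np.array([0.60, -0.47, 0.08, -0.06, 0.06])
delta = np.array([-0.77, -0.55, 0.02, -0.01, -0.01])
m = np.array([1, 0, 1, 0, 0])        # 1 = modifiable, 0 = not modifiable

change = delta * m                   # Hadamard product: masked entries become 0
x_cf = x + change                    # counterfactual input
n_changed = int(np.count_nonzero(change))   # the L0 "norm" of the applied change
```

Only the two unmasked coordinates move, so `n_changed` is 2 even though `delta` itself has five nonzero entries.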

Remark 1.

In this paper $f$ is modeled using a Multi-Layer Perceptron (MLP). MLP is the most prevalent network architecture for credit rating problems (Ahn and Kim, 2011; Huang et al., 2004; Kumar and Haynes, 2003; Kumar and Bhattacharya, 2006). In credit rating applications the output layer of $f$ contains the distinct classes of the corporate credit rating. The error in prediction is obtained by applying a categorical cross-entropy loss function on the output layer. A GridSearch has been applied to find the optimal values of the MLP hyperparameters for our specific datasets.

2.2 The Algorithm

The sparsity minimization problem is a well studied problem (Yuan and Ghanem, 2016; Cai et al., 2013; Zhang et al., 2020). However, the problem as written in (1) is still very challenging when the function $f$ is complicated, as is the case of a deep neural network. The difficulty comes from L0 not being a norm, as well as from the function $f$ in the problem having a very complex form. In previous work (Grath et al., 2018), the authors use Median Absolute Deviation (MAD) in order to impose sparsity. In our case their approach is not feasible for two reasons. First, MAD imposes sparsity by minimizing the size of the change of certain features (the features that are far from the median). In practice, this results in changing ALL components with some components having relatively small changes. For our finance applications, sparsity means that most components of the change have to be exactly 0. Second, the MAD weights used in the optimization are determined automatically from the dataset. In our application, some features cannot be modified. Thus, we need to define the problem in a way that will allow the algorithm to only modify pre-specified features.

In our approach to solve (1), we replace L0 with the L2 norm. There are two reasons for this. First, the norm has been previously used as a regularizer, to increase sparsity (Bruckstein et al., 2009; Selesnick, 2017). Second, since it is a proper norm, we can rewrite the problem (1) as the following unconstrained optimization problem.

$$\min_{\delta} \; \lambda \left( f(x + \delta \circ m) - y' \right)^2 + \|\delta\|_2 \tag{2}$$

Note that problem (2) treats the output of $f$ as a single number. However, as mentioned in the introduction, most Machine Learning methods take the decision based on a set of probabilities associated to each of the discrete outputs. To handle this issue, $f(x + \delta \circ m)$ is replaced with the set of probabilities denoted $p(x + \delta \circ m)$ (the output distribution). The output $y'$ is replaced with the ideal probability set $p'$ (Janocha and Czarnecki, 2017). We thus replace the first part of the loss function in equation (2) with the cross-entropy (Kline and Berardi, 2005) in equation (3). In this way, we can inform the managers how their entire credit rating probability distribution will be modified following the algorithm’s recommendation.

$$\min_{\delta} \; \lambda \, H\!\left(p', \, p(x + \delta \circ m)\right) + \|\delta\|_2, \qquad H(p', p) = -\sum_{c} p'_c \log p_c \tag{3}$$

Since the problem is now unconstrained we can use the gradient descent method to solve it. Gradient descent is a good way to solve such optimization problems when the objective function is convex and differentiable (Cauchy and others, 1847; Curry, 1944).
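The gradient descent step can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the objective is taken to be `lam` times the cross-entropy toward the target class plus the L2 norm of the masked change, and a toy linear-softmax classifier (the hypothetical `W` and `b`) stands in for the trained MLP so that the gradient can be written by hand.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

def solve_counterfactual(x, m, W, b, target, lam=1.0, lr=0.05, steps=500, eps=1e-8):
    """Gradient descent on lam * CE(target, p(x + delta*m)) + ||delta*m||_2."""
    delta = np.zeros_like(x)
    onehot = np.zeros(len(b))
    onehot[target] = 1.0                 # ideal probability set for the target class
    for _ in range(steps):
        d = delta * m
        p = softmax(W @ (x + d) + b)
        grad_ce = W.T @ (p - onehot)           # cross-entropy gradient w.r.t. the input
        grad_norm = d / np.sqrt(d @ d + eps)   # smoothed gradient of the L2 norm
        delta -= lr * m * (lam * grad_ce + grad_norm)   # masked entries never move
    return delta * m

# Toy 2-class model on 3 features (all values are illustrative).
W = np.array([[2.0, 0.0, 0.5],
              [-2.0, 0.0, -0.5]])
b = np.zeros(2)
x = np.array([-1.0, 0.3, 0.2])           # initially assigned to class 1
m = np.array([1.0, 1.0, 0.0])            # third feature is not modifiable
delta = solve_counterfactual(x, m, W, b, target=0, lam=5.0)
new_class = int(np.argmax(W @ (x + delta) + b))
```

In this toy setting the second feature carries no weight in `W`, so only the first coordinate moves; with a real network the gradient would typically be obtained by automatic differentiation and most unmasked coordinates would change.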

However, the solution of the unconstrained problem (3) is not necessarily sparse and also cannot guarantee that $f(x + \delta \circ m) = y'$. This means that the counterfactual solution may not always produce a better credit rating. To solve these issues, we propose a new algorithm as follows.

1: Define the masking variable $m$ by an accounting expert or the client company.
2: Solve equation (3) using gradient descent to get $\delta$. Due to the masking variable $m$, $\delta$ only has nonzero coordinates which correspond exactly to the values of 1 in $m$.
3: $\|\delta\|_0$ is generally too large and we want to focus on a small number of changes. Let $K$ denote the practical number of changes that may be implemented. Let $r = |\delta / x|$ (element-wise) denote the absolute magnitude of change relative to the original vector $x$.
4: We set $r_i$ to a ceiling value when the component $x_i = 0$.
5: We construct $K$ vectors $\delta^1, \ldots, \delta^K$ in the following way. For each $k$ in $\{1, \ldots, K\}$, $\delta^k$ holds the values in $\delta$ which correspond to the $k$ largest values in $r$. All other components in $\delta^k$ are set to 0.
6: Set t = 1, with $T$ the maximum number of iterations
7: while $f(x + \delta^1) \neq y'$ & $f(x + \delta^2) \neq y'$ & … & $f(x + \delta^K) \neq y'$ & $t \le T$ do
8:      increase $\lambda$ by a predefined proportion
9:      Do steps 2 to 5
10:     t = t + 1
11: return $\delta^k$, where $f(x + \delta^k) = y'$
Algorithm 1 Sparsity algorithm
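The steps above can be sketched in plain Python. This is a hedged sketch rather than the authors' code: `solve` is a hypothetical callable returning the masked gradient descent solution for a given `lam`, `predict` stands in for the trained classifier, and the `ceiling`, `growth` and `T` defaults are assumed values. The demonstration reuses point 1 of Table 1 with a simple quadrant rule in place of the MLP.

```python
def relative_change(delta, x, ceiling=10.0):
    """Steps 3-4: |delta_i / x_i|; `ceiling` (an assumed value) is used when x_i == 0."""
    r = []
    for d, xi in zip(delta, x):
        if d == 0:
            r.append(0.0)        # unchanged (e.g. masked) feature
        elif xi == 0:
            r.append(ceiling)    # avoid an infinite ratio
        else:
            r.append(abs(d / xi))
    return r

def candidates(delta, r, K):
    """Step 5: the k-th candidate keeps the k entries of delta with the largest ratios."""
    order = sorted(range(len(delta)), key=lambda i: r[i], reverse=True)
    return [[d if i in order[:k] else 0.0 for i, d in enumerate(delta)]
            for k in range(1, K + 1)]

def sparsity_algorithm(x, solve, predict, target, K, lam=1.0, growth=1.5, T=20):
    """Steps 6-11: re-solve with a larger lam until some candidate reaches `target`."""
    for _ in range(T):
        delta = solve(lam)                       # step 2 (masked GD solution)
        for dk in candidates(delta, relative_change(delta, x), K):
            if predict([xi + di for xi, di in zip(x, dk)]) == target:
                return dk                        # sparsest successful candidate
        lam *= growth
    return None                                  # no simple change found

def quadrant_rating(p):
    """Stand-in classifier: ratings 1..4 by quadrant of the first two coordinates."""
    if p[1] >= 0:
        return 1 if p[0] >= 0 else 2
    return 3 if p[0] < 0 else 4

x = [0.6019, -0.4742, 0.0827, -0.0595, 0.0588]         # point 1 of Table 1 (rating 4)
gd_delta = [-0.767, -0.5539, 0.0179, -0.0095, -0.012]  # its GD solution from Table 1
best = sparsity_algorithm(x, lambda lam: gd_delta, quadrant_rating, target=3, K=3)
```

Candidates are checked from sparsest to least sparse, so the first success is also the one with the fewest nonzero coordinates. Here the first candidate, which changes only the first coordinate, already reaches rating 3.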

Compared with the solution obtained directly from equation (3), the sparsity algorithm produces a vector $\delta$ with a large number of zero components (a sparse vector). The algorithm accomplishes this task by following three main steps. First, the algorithm calculates the change ratio for each element in the output vector $\delta$ relative to the original vector $x$. Second, it constructs $K$ candidate vectors, going less sparse from vector $\delta^1$ to vector $\delta^K$. Third, the algorithm repeatedly solves the problem by putting more and more importance on the boundary condition (increasing $\lambda$). We end the process if there is at least one candidate solution which qualifies the input for the better rating $y'$. If there is no solution to the sparsity algorithm, we interpret it as meaning that the rating may not be changed in a simple way for the given input vector $x$.

In the last loop there are two issues when returning the final value $\delta^k$.

Remark 2.

If the $k$ is not unique and there are multiple solutions, for example $\delta^{k_1}$ and $\delta^{k_2}$, we can choose the final output based on whether $\delta^{k_1}$ or $\delta^{k_2}$ achieves a smaller value in equation (3), or we can choose the variant with the smallest number of nonzero coordinates.

Remark 3.

If we reach step $T$ and there is no solution then the rating of the company cannot be improved based on the existing credit rating model.

Remark 4 (Step 4 in the algorithm).

When a component of $x$ equals 0 the relative change in step 3 for that component will be infinite. This forces the sparsity algorithm to favor choosing the features with values equal to 0. Step 4 in the sparsity algorithm is introduced to set a ceiling for the ratio in order to resolve this issue. However, we will discuss another possible solution in Section 3.2.1 when we apply the algorithm to financial data.

2.3 What is the practical importance of the sparsity algorithm?

The idea of this work is simple. Given a learned algorithm $f$, which associates a categorical $y$ (rating) to an input $x$, can we find a change in $x$ so that the new rating associated to the changed $x'$ (counterfactual) is now $y'$? In this context, we call the distance ($\|x' - x\|$) between the original input and the counterfactual input the effort. In the context of credit rating this calculates how much actual effort has to be put into changing the rating. Having a sparser solution $\delta$ may translate into a smaller effort to change $x$ while making sure that the output class of $x'$ has been improved by at least one notch.
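Both ways of measuring effort that appear later in the paper, the count of changed features (L0) and the Euclidean size of the change (L2), are easy to compute. The sketch below uses the two changes reported for point 2 in Table 1; the function names are illustrative.

```python
import math

def effort_l0(delta):
    """Number of features that change."""
    return sum(1 for d in delta if d != 0)

def effort_l2(delta):
    """Euclidean size of the change."""
    return math.sqrt(sum(d * d for d in delta))

gd_delta = [-0.3963, 1.0276, -0.0133, 0.0149, 0.0074]   # GD solution, point 2 of Table 1
sparse_delta = [0.0, 1.0276, 0.0, 0.0, 0.0]             # sparsity algorithm, point 2

efforts = {
    "GD": (effort_l0(gd_delta), effort_l2(gd_delta)),
    "sparsity": (effort_l0(sparse_delta), effort_l2(sparse_delta)),
}
```

For this point the sparse change touches one feature instead of five and is also shorter in the L2 sense.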

3 Empirical Results

We will be using two sets of data to demonstrate the validity of the proposed algorithm. First, we shall use synthetically generated data to illustrate the performance of the algorithm on a simple to understand case. Second, we use quarterly fundamental data obtained from the Compustat Database (Compustat, 2019). The fundamental data contains 332 accounting variables including balance sheet data, income statement data, etc. We use Standard and Poor’s credit ratings as the target rating $y$. The second case study is the real financial application we wish to analyze.

In this work we are interested in answering three different questions.

  1. Are the results of the sparsity algorithm intuitively correct when using the synthetically generated data?

  2. Is it possible to improve credit rating with less effort than actually happened in reality?

  3. Does the effort to improve rating depend on the rating? Specifically, do we need to exert more effort when changing rating from non-investment grade to investment grade than to change rating within the investment grade?

3.1 Case study: synthetically generated data

Since $f$ is the solution to a machine learning algorithm, in principle we could attempt to prove mathematically that the sparsity algorithm can solve the equation (1). The Lagrange multiplier version of the problem in equation (3) is well posed and the gradient descent will provide the optimal solution. The sparsity algorithm imposes constraints on the solution and fundamentally checks how close the solution is to the original problem (1). Thus the mathematical proof idea is to show that the sparsity algorithm produces an improvement at every step and that in the limit we obtain the solution of (1).

However, such a proof would be dry and would only bring joy to the mathematically inclined. We chose to follow a different approach. In this section we design an intuitive case study by synthetically generating data in such a way that it would have an easy to understand solution. We compare the solutions obtained using the classical gradient descent and the solutions obtained using the sparsity algorithm. We perform matched pairs one-sided t tests on the L0 and L2 norms to compare these solutions.

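We create a 5-dimensional dataset with $N$ points $x = (x_1, x_2, x_3, x_4, x_5)$, where all of the features are normally distributed random variables. We let $x_1$ and $x_2$ denote the important variables and we let $x_3$, $x_4$, and $x_5$ be noise variables. Specifically, the $x_1$ and $x_2$ variables are each a mixture of normals with means $\mu_1$ and $\mu_2$ and variance $\sigma^2$. Their pdf is:

$$p(x_j) = w \, \phi(x_j; \mu_1, \sigma^2) + (1 - w) \, \phi(x_j; \mu_2, \sigma^2), \quad j = 1, 2,$$

where $\phi(\cdot\,; \mu, \sigma^2)$ is the normal density and $w$ is the mixture weight.

The $x_3$, $x_4$, and $x_5$ are iid normally distributed with mean $\mu_0$ and variance $\sigma_0^2$.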
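A dataset of this shape can be generated as in the sketch below. Every numeric choice here is an assumption made for illustration, not a value taken from the paper: the sample size, equal mixture weights, centers at -1 and +1, the standard deviations, and zero-mean noise. The quadrant labelling follows the counterclockwise convention described for Figure 2.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000                            # assumed sample size
MU, SD, NOISE_SD = 1.0, 0.3, 0.1    # assumed centers (+/-MU) and standard deviations

def two_mode_normal(size):
    """Equal-weight mixture of N(-MU, SD^2) and N(+MU, SD^2)."""
    centers = rng.choice([-MU, MU], size=size)
    return centers + rng.normal(0.0, SD, size=size)

X = np.column_stack([
    two_mode_normal(N),                       # x1, important
    two_mode_normal(N),                       # x2, important
    rng.normal(0.0, NOISE_SD, size=(N, 3)),   # x3..x5, noise
])

# Ratings 1..4 counterclockwise, starting from the first quadrant of (x1, x2).
rating = np.where(X[:, 1] >= 0,
                  np.where(X[:, 0] >= 0, 1, 2),
                  np.where(X[:, 0] < 0, 3, 4))
```

With these settings the four clusters are well separated in the first two coordinates, while the last three coordinates carry no class information.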

Figure 2 shows the projection of the synthetically generated points on the first two coordinates. We can clearly see the centers of the classes. In this synthetically generated data, we arbitrarily define blue points as rating 1, orange points as rating 2, green points as rating 3, and red points as rating 4 (counterclockwise starting from the first quadrant). We make the convention that rating 1 is the best, decreasing with 4 being the worst. In this experiment, we aim to improve the rating of the points using the smallest effort.

Figure 2: Data visualization for $x_1$ and $x_2$

The point of this synthetically generated case study is to showcase the results of the algorithms in a context where we can plot and actually see the results.

3.1.1 Results obtained when using the synthetically generated data

As mentioned, we want to determine which coordinates need to be changed to “improve the rating”. In this simple exercise, for a point $x$ this translates into determining the “best” $\delta$ that will improve the class number. To illustrate the performance of the algorithm we pick 3 points (one from each class 4, 3, 2 respectively) which showcase the largest difference between the two algorithms used. Table 1 presents the coordinates of the 3 points chosen and the arrows in Figure 3 show the counterfactual point in the improved rating class. The purple arrow is the $\delta$ from the gradient descent, while the yellow arrow depicts the sparsity algorithm result. Table 1 gives the numerical values of the $\delta$ and shows that the ratings are improved successfully.

The gradient descent solution “improves” the class by changing all coordinates. The largest changes are in the first two coordinates, as they should be, while the remaining three coordinates are just noise. Compared to the solution from the gradient descent, the sparsity algorithm solution removes the “noise” from features. It picks the relevant coordinate to be changed every time.

However, we also showcase an exception (point 3). For this point the rating indeed improves from 2 to 1. However, the algorithm picks the feature $x_3$ to change in addition to $x_1$. This is due to the fact that the relative change ratio is larger for $x_3$ than for $x_1$ just by chance. The sparsity algorithm checks the result for $\delta^1$, which does not change the rating, then at the next iteration it settles on the $\delta^2$ solution.

This point 3 is one of few exceptions we observed in our results. It actually illustrates an issue we will observe in the next case study dealing with real data.
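The ratios behind this exception can be checked directly from the point 3 values reported in Table 1, taking the relative change of each coordinate as the absolute value of the GD change divided by the original value, as described in step 3 of the algorithm:

```latex
r_1 = \left|\frac{1.5275}{-1.3814}\right| \approx 1.106, \qquad
r_2 = \left|\frac{0.4331}{0.5363}\right| \approx 0.808, \qquad
r_3 = \left|\frac{0.0054}{0.0031}\right| \approx 1.742 .
```

Since $r_3 > r_1 > r_2$, the noise coordinate $x_3$ is ranked first even though its absolute change is tiny, simply because its original value is close to zero.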

Figure 3: Algorithm applied on synthetically generated data
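| Point | Vector | $x_1$ | $x_2$ | $x_3$ | $x_4$ | $x_5$ | Rating |
|---|---|---|---|---|---|---|---|
| 1 | original vector | 0.6019 | -0.4742 | 0.0827 | -0.0595 | 0.0588 | 4 |
| 1 | GD solution ($\delta$) | -0.767 | -0.5539 | 0.0179 | -0.0095 | -0.012 | 3 |
| 1 | Optimized Algo ($\delta$) | -0.767 | 0 | 0 | 0 | 0 | 3 |
| 2 | original vector | -0.5488 | -1.0176 | 0.1723 | 0.2329 | 0.4329 | 3 |
| 2 | GD solution ($\delta$) | -0.3963 | 1.0276 | -0.0133 | 0.0149 | 0.0074 | 2 |
| 2 | Optimized Algo ($\delta$) | 0 | 1.0276 | 0 | 0 | 0 | 2 |
| 3 | original vector | -1.3814 | 0.5363 | 0.0031 | -0.2783 | -0.074 | 2 |
| 3 | GD solution ($\delta$) | 1.5275 | 0.4331 | 0.0054 | -0.0134 | -0.0071 | 1 |
| 3 | Optimized Algo ($\delta$) | 1.5275 | 0 | 0.0054 | 0 | 0 | 1 |

Table 1: Sample result for simulated data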

This study primarily focuses on the L0 norm of $\delta$ as a measurement of the effort defined in section 2.3. Recall that the objective of our problem in equation (1) is to increase the sparsity of the solution. However, the L2 norm may be viewed as another measurement of effort as it calculates the total ‘distance’ between the original point and the target.

To formally compare the solution from the gradient descent with the solution from the sparsity algorithm we perform matched pairs one-sided t tests as follows:

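L0 testing
$H_0$: the L0 norm of $\delta$ from the sparsity algorithm is equal to that from the gradient descent algorithm
$H_a$: the L0 norm of $\delta$ from the sparsity algorithm is less than that from the gradient descent algorithm
L2 testing
$H_0$: the L2 norm of $\delta$ from the gradient descent algorithm is equal to that from the sparsity algorithm
$H_a$: the L2 norm of $\delta$ from the gradient descent algorithm is greater than that from the sparsity algorithm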

We use all the points in the dataset to perform these tests. We treat each change separately: from 4 to 3, 3 to 2, and 2 to 1, respectively. The average L0 and L2 for each group and the results for the matched pairs t-tests are presented in Table 2.

From these results it is evident that the solution from the sparsity algorithm is significantly smaller than the solution from the gradient descent, i.e., requires less “effort”.

| Change | L2 from sparsity | L2 from GD | L2_diff | L0 from sparsity | L0 from GD | L0_diff |
|---|---|---|---|---|---|---|
| 2 to 1 | 1.15182 | 1.15409 | 0.00227 (0.00024) | 1.15255 | 5.00000 | 3.84745 (0.00693) |
| 3 to 2 | 1.07638 | 1.07827 | 0.00189 (0.00022) | 1.16672 | 5.00000 | 3.83328 (0.00728) |
| 4 to 3 | 1.19610 | 1.20001 | 0.00390 (0.00040) | 1.18393 | 5.00000 | 3.81607 (0.00758) |

Table 2: Results of testing whether there is a difference in the procedures observed in L0 and the L2 norm
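The statistic behind a matched pairs t test is the mean of the per-point differences divided by its standard error. The sketch below computes it with the Python standard library on hypothetical L0 counts (not the paper's data); the one-sided p-value would then be read from a t distribution with n - 1 degrees of freedom.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(a, b):
    """t statistic and degrees of freedom for the paired differences a_i - b_i."""
    diffs = [ai - bi for ai, bi in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1

# Hypothetical L0 counts for five points under each algorithm.
l0_sparsity = [1, 1, 2, 1, 1]
l0_gd = [5, 5, 5, 5, 5]
t_stat, dof = paired_t_statistic(l0_sparsity, l0_gd)
```

A strongly negative statistic supports the alternative that the sparsity algorithm changes fewer features than gradient descent on the same points.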

3.2 Case study: Quarterly financial statement data

A description of the Financial statement data used

In this section we apply the sparsity algorithm to data obtained from financial statements. Given a particular financial statement, there may be many ways in which to improve the financial stability of a company and thus increase its credit rating. In this work, we are trying to provide a data driven answer which is based purely on the machine learning technique used. To this end, we have to assume that the machine learning model $f$ used to determine the original rating is very accurate. Recall that the counterfactual problem we are focusing on is defined for a given $f$.

We apply the methodology to companies chosen from 3 sectors of the US economy: Financial, Healthcare, and Information Technology (IT). We first clean the data by removing features which are not reported for each of the specific sectors. The data is thus reduced to around 300 variables for each sector (294, 296, 296 respectively). Next, we define $m$ in equation (3) by analyzing all remaining features for each sector and determining whether each feature can be feasibly changed. More precisely, certain accounting variables may not be changed because of contractual obligations, unpredictable events, tax-related matters, city governance, etc. Table 3 groups all the reasons we found as to why accounting variables may not be changed in practice. The table also lists one accounting variable as an example for each of the reasons. A complete list of accounting variables that we found hard or impossible to change is presented in Table 11 of Appendix 5. The remaining number of variables that are not masked by $m$ is 87, 87, and 86 respectively for the Healthcare, IT and Finance sectors.

| Reasons | Example |
|---|---|
| Scheduled items | Pension Plan |
| Assets are discontinued operations | Extraordinary Items and Discontinued Operations |
| Intangible asset | Goodwill |
| Special items | Costs of Failed Acquisitions |
| Regulated items | Tier 1 Capital Ratio |
| Agreements with shareholders, employees | Deferred Compensation |
| Computational items | Depreciation & Amortization |
| Special events | Loss from Flood/Fire |
| Loss/gain from subsidiary | Equity in Earnings (I/S) - Unconsolidated Subsidiaries |
| Non-operating items | Gain/Loss on Sale of Property |

Table 3: The list of reasons why the respective variable may not be feasibly changed

3.2.1 Question 2: Comparing the results of the algorithms with quarters when companies changed ratings.

It is simple to visualize the results of the algorithms for the synthetically generated data. For real data, the clusters are hard to visualize, but the algorithms work in a similar way. In most cases we are able to determine a $\delta$ which improves rating using either the gradient descent (GD) or the sparsity algorithm. However, how relevant is this $\delta$? Suppose we find a company willing to listen to our advice and for the next quarter the company places resources towards changing the variables indicated by $\delta$. If the targets are reached would the company improve its rating during the next quarter?

This is of course a hard question to answer. In an attempt to answer it, we focus on those quarters and companies whose ratings actually improved during the next quarter. We apply the GD and sparsity algorithms to the statements from the quarters before the rating change. To assess the effectiveness of the proposed changes we calculate the actual Real change between the two consecutive quarters when the ratings improved. Table 4 presents these values in columns 1 and 2. For example, the L0 number for the Healthcare sector Real change is calculated by looking at how many features changed between the two consecutive quarters when the rating of the company went up. We display the average number of features changed for all companies in the healthcare sector which went up in ratings. We compare this real change “effort” with the changes proposed by the two algorithms in columns 3 and 4 of the table. The numbers in these columns are calculated using only the data from the quarters before the rating changed. Mathematically, they are calculated as the respective norm of $\delta$. Since both the Gradient Descent (GD) and Sparsity algorithms only change a selected number of features (the unmasked features), for a proper comparison in Table 4 we calculate the Real change only for the features that can be changed (column two). Similar numbers are calculated for all sectors.

| Sector | Norm | Real change | Real change for relevant variables | Change for GD | Change for Sparsity | Match Rate |
|---|---|---|---|---|---|---|
| Healthcare | L0 | 113.84 | 59.02 | 87.00 | 53.82 | 85.43% |
| Healthcare | L2 | 4744.44 | 4263.46 | 6021.42 | 4615.10 | |
| IT | L0 | 119.13 | 61.95 | 87.00 | 60.12 | 87.41% |
| IT | L2 | 12550.07 | 11774.29 | 2727.35 | 2057.80 | |
| Financial | L0 | 101.10 | 48.39 | 86.00 | 57.24 | 76.58% |
| Financial | L2 | 65607.00 | 46018.43 | 11474.30 | 7591.71 | |

Table 4: Results comparing with the real rating change when ignoring 0’s in the feasible data

The last column in Table 4 is labeled Match Rate. For each company that changed rating we look at the features suggested to be changed by the sparsity algorithm. We calculate what percentage of them were actually changed in the real statements between the two quarters when ratings improved. A high match rate indicates that the features selected by the sparsity algorithm are similar to the changed features in the real statements. It is worth mentioning (again) that the sparsity algorithm comes up with these features based solely on the data from the quarter BEFORE the ratings changed.
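The match rate and the restricted Real change count can be sketched as below. The two statements, the mask and the suggested feature indices are hypothetical values used only to show the computation.

```python
def changed_features(before, after):
    """Indices of features whose reported value differs between two quarters."""
    return {i for i, (b, a) in enumerate(zip(before, after)) if a != b}

def match_rate(suggested, before, after):
    """Share of suggested feature indices that actually changed."""
    suggested = set(suggested)
    if not suggested:
        return 0.0
    return len(suggested & changed_features(before, after)) / len(suggested)

before = [10.0, 5.0, 0.0, 7.0, 3.0]   # hypothetical statement, quarter before the upgrade
after = [12.0, 5.0, 1.0, 8.0, 2.0]    # hypothetical statement, following quarter
mask = [1, 1, 1, 0, 1]                # 1 = feature can feasibly be changed
suggested = [0, 1, 4]                 # indices with nonzero entries in the sparse solution

real_l0 = len(changed_features(before, after))
real_l0_relevant = sum(1 for i in changed_features(before, after) if mask[i])
rate = match_rate(suggested, before, after)
```

In this toy case four features change in reality, three of them unmasked, and two of the three suggested features are among them, giving a match rate of 2/3.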

However, to obtain the numbers in Table 4 we implement the sparsity algorithm with a different Step 4 from the one mentioned in Remark 4. Specifically, step 4 is: “we set $r_i = 0$ when $x_i = 0$”. With this change the sparsity algorithm essentially ignores those feasible features in the original statement whose value is 0. We do this inspired by the synthetic data in the previous section. Recall point 3, which by chance had the 3rd coordinate with a large relative change. A similar phenomenon happens in the real statement data when there are 0’s present in the unmasked set of features. The algorithm focuses on them as the relative change from 0 is technically infinite. In fact, in the real statements some of those features do change, and that is probably why we aren’t able to capture 100% of the changes.

Looking at L0 (the number of changed features) the results are consistent. In the two quarters data the average number of features changed is between about 101 and 119, while the number of relevant features changed is about half the total number. The gradient descent changes all features and in fact only the L2 distance is relevant for it. However, the sparsity algorithm produces a number of features to be changed which is similar to the real number. Although it is nice to see that we recover most of the features that actually changed, this is not our goal, as we want to identify the smallest number of changes possible with a minimum effort. By neglecting the features with a 0 value, the algorithm is probably ignoring features that might be very important to improve credit rating. This is why we are replacing step 4 in the sparsity algorithm with “We set $r_i$ to a ceiling value when the component $x_i = 0$”, as it was in fact written in the actual algorithm. We rerun this algorithm and present the results in Table 5.

Sector | Norm | Real change | Real change for relevant variables | Change for GD | Change for Sparsity | Match Rate
Healthcare | L0 | 113.84 | 59.02 | 87.00 | 22.93 | 61.01%
Healthcare | L2 | 4744.44 | 4263.46 | 6021.42 | 5239.12 |
IT | L0 | 119.13 | 61.95 | 87.00 | 24.73 | 52.59%
IT | L2 | 12550.07 | 11774.29 | 2727.35 | 1925.80 |
Financial | L0 | 101.10 | 48.39 | 86.00 | 33.46 | 37.35%
Financial | L2 | 65607.00 | 46018.43 | 11474.30 | 7962.78 |
Table 5: Results comparing with the real rating change
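The L0 and L2 rows of Table 5 measure, respectively, how many features change and how large the overall change is. A minimal sketch of both quantities is given below; the vectors and the tolerance `tol` are hypothetical, and any scaling of the features applied in the paper is not reproduced here.

```python
import math


def l0_change(original, counterfactual, tol=1e-9):
    """Number of features whose value differs between the two vectors."""
    return sum(1 for a, b in zip(original, counterfactual) if abs(a - b) > tol)


def l2_change(original, counterfactual):
    """Euclidean size of the change (the "effort")."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(original, counterfactual)))


x = [10.0, 50.0, 120.0, 8.0]     # hypothetical original statement
x_cf = [13.0, 50.0, 124.0, 8.0]  # hypothetical counterfactual statement

print(l0_change(x, x_cf))  # 2
print(l2_change(x, x_cf))  # 5.0
```

A sparse counterfactual keeps the first quantity small, and a low-effort one keeps the second small. Table 5 reports the sector averages of both.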

Including the 0-valued features in the set of possible changes helps the sparsity algorithm reduce its L0 norm. However, the match rate of the sparsity algorithm drops to around . Since the purpose of the algorithm is to identify relevant features for improving ratings, the sparsity of the solution is important. In terms of the L2 norm, we note that the magnitude of change (“effort”) is reduced dramatically in the Financial and IT sectors but is in fact increased on average in the Healthcare sector. We believe this is normal: focusing on improving the sparsity of the solution reduces the feasible domain. Thus, for the solution to qualify for an improved rating, more effort has to be exerted on the remaining feasible features. This may cause an increase in the L2 norm of the solution.

Comparatively, we observe that the companies in the financial sector need to exert more effort to improve their credit rating than companies in the IT and healthcare sectors.

Why two sparsity algorithms?

Generally, published articles do not detail all attempts and only showcase the best, which is typically the last algorithm. In this article, we chose to present the variant of the algorithm that we initially employed as well as the final version. We decided to do this because we are dealing with real data between two quarters, and our algorithm depends on how well the MLP is performing. Thus it is important to validate the features we obtain by applying the algorithm to the previous quarter against the features that actually changed in the next quarter. This match is an argument that our algorithm catches the relevant changes and that the MLP is producing accurate results.

This is also important for the final algorithm results in Table 5. Taken by itself the sparsity algorithm results are academic. However, when corroborated with the results in Table 4 which show it is possible to match the real changes, we think the results point to a valid way to potentially produce an improved rating during the next quarter.

3.2.2 Comparing the effort needed to improve ratings from different levels

In this section, we apply the sparsity algorithm to all observations in the dataset. We calculate the “effort” needed for a company during a particular quarter to improve its rating during the next quarter. We aggregate the results by rating and sector. We want to investigate how the effort changes depending on the rating the company holds at the time of the respective quarter. For example, is it harder to improve the rating from the highest non-investment grade (BB+) to an investment grade (BBB-) than it is from other ratings?

S&P rating | Rating description | Grade
AAA | Extremely strong capacity to meet its financial commitments | Investment grade
AA+ | Very strong capacity to meet its financial commitments | Investment grade
A+ | Strong capacity to meet its financial commitments | Investment grade
BBB+ | Adequate capacity to meet its financial commitments | Investment grade
BB+ | Has inadequate capacity to meet its financial commitments | Non investment grade
B+ | Has the capacity to meet its financial commitments | Non investment grade
CCC+ | Substantial risks | Non investment grade
CCC | Extremely speculative | Non investment grade
CCC- | Default imminent with little prospect for recovery | Non investment grade
D | In default | Non investment grade
Table 6: S&P rating description

For reference in Table 6 we present the Standard & Poor’s classification and ratings interpretation. The higher the rating, the lower the interest rate the company has to pay. Furthermore, having an investment grade rating means that the pool of investors is enlarged considerably as government regulations prevent pension funds and mutual funds from purchasing non-investment grade bonds. In fact, if any of their holdings drops below BBB-, the pension funds are required to sell, often at a loss.

ori | curr | IT L2 | IT L0 | IT Count | Healthcare L2 | Healthcare L0 | Healthcare Count | Financial L2 | Financial L0 | Financial Count
AA+ | AAA | 108400.3 | 27.3 | 15 | | | | 11183.0 | 30.1 | 13
AA | AA+ | 94786.6 | 52.0 | 47 | 174127.2 | 62.7 | 128 | 41469.8 | 34.7 | 85
AA- | AA | 59565.2 | 38.5 | 59 | 4879.0 | 37.2 | 126 | 37547.4 | 43.2 | 382
A+ | AA- | 23463.1 | 41.8 | 297 | 4854.6 | 28.1 | 268 | 6051.8 | 35.8 | 641
A | A+ | 3482.2 | 31.1 | 194 | 2668.8 | 21.2 | 351 | 15756.5 | 31.1 | 630
A- | A | 1372.4 | 28.1 | 217 | 2892.9 | 29.6 | 192 | 6755.0 | 29.1 | 644
BBB+ | A- | 2025.7 | 32.4 | 234 | 1096.2 | 25.8 | 308 | 4483.3 | 31.0 | 421
BBB | BBB+ | 1250.3 | 28.1 | 342 | 1598.6 | 24.5 | 343 | 2492.2 | 31.8 | 353
BBB- | BBB | 386.7 | 22.1 | 195 | 1169.3 | 21.9 | 244 | 2886.5 | 27.8 | 212
BB+ | BBB- | 1474.1 | 32.9 | 180 | 602.3 | 23.8 | 125 | 1637.3 | 36.5 | 80
BB | BB+ | 864.0 | 25.9 | 74 | 567.5 | 23.6 | 182 | 6191.6 | 47.0 | 19
BB- | BB | 1121.8 | 29.0 | 146 | 262.6 | 16.4 | 90 | 4243.2 | 47.7 | 11
B+ | BB- | 411.0 | 28.9 | 112 | 1344.1 | 20.5 | 45 | 1943.0 | 43.1 | 20
B | B+ | 402.6 | 16.6 | 39 | 517.9 | 46.7 | 15 | 4176.9 | 63.5 | 2
B- | B | 950.7 | 33.9 | 24 | | | | 1817.3 | 34.5 | 15
CCC+ | B- | 1033.3 | 36.0 | 12 | | | | 1033.4 | 41.1 | 9
Average | | 8735.4 | 31.3 | | 11228.0 | 27.1 | | 11288.6 | 33.2 |
Table 7: The average “effort” required to improve ratings for each rating level

Table 7 presents the L0 and L2 averages for the sparsity algorithm as well as how many observations were in each category (“Count” column). We observe no clear pattern to indicate that the results fit with the ratings in Table 6. We generally note that the “effort” needed increases as ratings increase. This suggests that it may be easier to improve a rating from, say, BB+ to BBB- than from A+ to AA-.
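The aggregation behind a table of this kind can be sketched as a group-by over per-observation results. The record layout (`sector`, `rating`, `l2`, `l0`) and the numbers below are hypothetical and serve only to show the averaging step.

```python
from collections import defaultdict


def average_effort(records):
    """Average L2 and L0 per (sector, rating), plus the observation count."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for rec in records:
        acc = sums[(rec["sector"], rec["rating"])]
        acc[0] += rec["l2"]
        acc[1] += rec["l0"]
        acc[2] += 1
    return {
        key: {"L2": l2 / n, "L0": l0 / n, "Count": n}
        for key, (l2, l0, n) in sums.items()
    }


# Hypothetical per-observation output of the sparsity algorithm.
records = [
    {"sector": "IT", "rating": "BBB", "l2": 300.0, "l0": 20},
    {"sector": "IT", "rating": "BBB", "l2": 500.0, "l0": 24},
    {"sector": "Financial", "rating": "BBB", "l2": 2000.0, "l0": 30},
]

table = average_effort(records)
print(table[("IT", "BBB")])  # {'L2': 400.0, 'L0': 22.0, 'Count': 2}
```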

However, we need to point out an issue that arises when we aggregate all these companies. A particular credit rating says that a company is within a range of risk levels; it does not provide a specific value of risk for that company. Thus, the actual risk value may differ between companies with the same credit rating. For example, a company XXX may be at one extreme of the range for the AA rating, while company YYY may be at the other extreme and still be rated AA. It is obviously more difficult for one of these companies to improve its rating than it is for the other. The results in Table 7 include all the companies in a particular rating range and report an aggregate average effort. It may be impossible or very hard for a company that has just improved its rating to move to an even better rating the following quarter. In the sparsity algorithm, λ in equation (3) controls this “difficulty”. A higher λ indicates that it is harder for this particular company to improve its rating level. Thus, the “effort” needed may be larger.
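One way to read the role of λ is as a value tried over an increasing grid until the rating improves, recording the first value that succeeds. The sketch below assumes this reading. Equation (3) is not reproduced, `toy_solve` and `toy_predict` are hypothetical stand-ins for the optimization step and the classifier, and only the grid values are taken from the row labels of Tables 8 to 10.

```python
# Grid of lambda values as listed in the row labels of Tables 8-10.
LAMBDA_GRID = [0.1, 5, 10, 50, 100, 200, 500, 1000, 10000, 100000]


def smallest_improving_lambda(solve, predict, x, target, grid=LAMBDA_GRID):
    """Return (lambda, counterfactual) for the first grid value reaching target."""
    for lam in grid:
        candidate = solve(x, target, lam)
        if predict(candidate) == target:
            return lam, candidate
    return None, None


# Toy stand-ins (hypothetical): a one-dimensional "statement" and a
# threshold classifier that assigns the better rating at 10.0 or above.
def toy_predict(value):
    return 1 if value >= 10.0 else 0


def toy_solve(value, target, lam):
    # Assumption: a larger lambda pushes harder toward the target class.
    return value + 0.02 * lam


near, _ = smallest_improving_lambda(toy_solve, toy_predict, 9.5, 1)
far, _ = smallest_improving_lambda(toy_solve, toy_predict, 4.0, 1)
print(near, far)  # 50 500
```

In this toy setting the company close to the threshold improves at a small λ, while the one far from it needs a much larger value, which mirrors the within-rating differences discussed above.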

Lambda AAA AA+ AA AA- A+ A A- BBB+ BBB BBB- BB+ BB BB- B+ B B-
0.1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 8 0 0 0 0 0
10 0 0 0 0 2 6 2 0 0 0 7 4 0 18 0 0
50 0 0 1 2 49 159 135 122 105 69 42 72 85 21 22 9
100 0 0 0 22 55 47 46 114 81 63 10 51 19 0 1 1
200 0 0 11 65 46 5 29 80 7 29 6 18 8 0 1 2
500 0 0 5 95 17 0 17 24 1 11 1 1 0 0 0 0
1000 0 5 6 69 22 0 5 2 0 7 0 0 0 0 0 0
10000 15 41 36 43 3 0 0 0 0 1 0 0 0 0 0 0
100000 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0
  • Higher lambda means more effort needs to be exerted to improve rating.

Table 8: The number of companies improving rating as λ changes in the IT sector.
Lambda AAA AA+ AA AA- A+ A A- BBB+ BBB BBB- BB+ BB BB- B+ B B-
10 0 0 0 0 1 0 4 0 0 0 0 0 0 0 0 0
50 0 1 2 54 82 85 94 63 95 6 5 0 2 0 3 2
100 0 7 10 106 123 82 48 82 79 5 1 0 4 0 0 3
200 1 1 52 118 140 127 82 89 13 17 1 0 9 1 8 4
500 9 6 108 183 135 125 99 76 25 43 9 8 4 1 4 0
1000 3 3 61 84 37 104 42 27 0 9 3 3 1 0 0 0
10000 0 40 122 87 72 90 42 15 0 0 0 0 0 0 0 0
100000 0 27 27 9 40 31 10 1 0 0 0 0 0 0 0 0
  • Higher lambda means more effort needs to be exerted to improve rating.

Table 9: The number of companies improving rating as λ changes in the Financial sector.
Lambda AA+ AA AA- A+ A A- BBB+ BBB BBB- BB+ BB BB- B+
5 0 0 0 0 0 0 0 0 0 1 1 0 0
10 0 1 0 0 0 0 0 0 0 0 0 0 0
50 0 7 36 18 11 63 45 79 23 34 23 20 11
100 0 6 50 78 32 107 53 74 54 57 46 16 3
200 0 26 73 63 62 66 83 43 36 68 20 9 1
500 0 49 62 100 45 55 110 47 12 22 0 0 0
1000 1 35 29 44 34 14 34 1 0 0 0 0 0
10000 63 2 18 47 8 3 14 0 0 0 0 0 0
100000 64 0 0 1 0 0 4 0 0 0 0 0 0
  • Higher lambda means more effort needs to be exerted to improve rating.

Table 10: The number of companies improving rating as λ changes in the Healthcare sector.

Tables 8, 9, and 10 present in each row the number of companies that successfully improved their current rating for a particular λ value. The tables are split by sector. We can see that as the λ values increase, more companies improve their rating. Recall our assertion that each rating covers a range of risk scores. We interpret the tables as follows: companies that are closer to the threshold need a smaller λ, improve more easily, and thus appear in an upper row of the tables.

Looking at all three tables, we see the numbers shifting to the left as λ increases. This is consistent with our previous observations in Table 7. Indeed, the results seem to indicate that a lower λ (and thus a lower effort) is needed for the majority of the lower-rated companies. In contrast, a larger λ value is needed for the majority of the highly rated companies to improve their score. Thus, as the rating of a company gets better, a much larger effort is needed to further improve its rating.
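The shift can be checked directly on the reported counts. The sketch below copies four columns of Table 10 (Healthcare) and finds, for each, the λ row with the most companies. The choice of the modal row as a summary statistic is ours and is only one of several reasonable summaries.

```python
# Row labels (lambda) and four rating columns copied from Table 10.
LAMBDAS = [5, 10, 50, 100, 200, 500, 1000, 10000, 100000]
TABLE_10 = {
    "AA+": [0, 0, 0, 0, 0, 0, 1, 63, 64],
    "A+": [0, 0, 18, 78, 63, 100, 44, 47, 1],
    "BBB": [0, 0, 79, 74, 43, 47, 1, 0, 0],
    "B+": [0, 0, 11, 3, 1, 0, 0, 0, 0],
}


def modal_lambda(counts, lambdas=LAMBDAS):
    """Lambda value of the row holding the largest number of companies."""
    best = max(range(len(counts)), key=lambda i: counts[i])
    return lambdas[best]


# For each rating column: (modal lambda, total number of companies).
summary = {rating: (modal_lambda(c), sum(c)) for rating, c in TABLE_10.items()}
for rating, (lam, total) in summary.items():
    print(rating, lam, total)
# AA+ 100000 128
# A+ 500 351
# BBB 50 244
# B+ 50 15
```

The modal λ rises from 50 for the BBB and B+ columns to 500 for A+ and 100000 for AA+, and the column totals agree with the Healthcare counts in Table 7.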

4 Conclusion

In this work we propose a sparsity algorithm that finds a counterfactual explanation for the credit rating problem. The sparsity algorithm is designed to discover the smallest number of changes to the variables of a particular financial statement that have a large probability of improving the prediction to a predefined credit rating.

We apply the sparsity algorithm to a synthetically generated dataset as well as to quarterly financial statement data. Our toy case study, using synthetically generated data, shows that the sparsity algorithm can successfully move points to the target class with less “effort” than the solution obtained using a gradient descent method. The results obtained using quarterly financial statements confirm that the sparsity algorithm may be employed to significantly reduce the “effort” needed to improve a corporate credit rating. More importantly, when analyzing quarterly statements before an actual rating increase, we show that the sparsity algorithm captures the majority of the features that will in fact have changed in the next quarter's statement. This result gives us confidence to propose the final algorithm, which results in an even more focused recommendation to the corporation’s managers.

Finally, we find that the “effort” required to improve the credit rating is positively related to the credit rating level. Specifically, improving credit rating for A rated corporations is much harder than improving credit rating for B level companies.


The authors would like to acknowledge the UBS research grant awarded to the Hanlon Laboratories which provided partial support for this research. We want to acknowledge Bingyang Wen who provided helpful discussions about the algorithm. We also acknowledge Professor Zachary Feinstein who suggested the use of the norm in the proposed optimization problem.


  • P. M. Addo, D. Guegan, and B. Hassani (2018) Credit risk analysis using machine and deep learning models. Risks 6 (2), pp. 38. Cited by: §1.
  • H. Ahn and K. Kim (2011) Corporate credit rating using multiclass classification models with order information. World Academy of Science, Engineering and Technology, International Journal of Social, Behavioral, Educational, Economic, Business and Industrial Engineering 5 (12), pp. 1783–1788. Cited by: Remark 1.
  • A. Akdemir and D. Karslı (2012) An assessment of strategic importance of credit rating agencies for companies and organizations. Procedia-Social and Behavioral Sciences 58, pp. 1628–1639. Cited by: §1.
  • A. M. Bruckstein, D. L. Donoho, and M. Elad (2009) From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM review 51 (1), pp. 34–81. Cited by: §2.2.
  • X. Cai, F. Nie, and H. Huang (2013) Exact top-k feature selection via l2, 0-norm constraint. In Twenty-third international joint conference on artificial intelligence, Cited by: §2.2.
  • D. V. Carvalho, E. M. Pereira, and J. S. Cardoso (2019) Machine learning interpretability: a survey on methods and metrics. Electronics 8 (8), pp. 832. Cited by: §1.
  • A. Cauchy et al. (1847) Méthode générale pour la résolution des systemes d’équations simultanées. Comp. Rend. Sci. Paris 25 (1847), pp. 536–538. Cited by: §2.2.
  • S. Chakraborty, R. Tomsett, R. Raghavendra, D. Harborne, M. Alzantot, F. Cerutti, M. Srivastava, A. Preece, S. Julier, R. M. Rao, et al. (2017) Interpretability of deep learning models: a survey of results. In 2017 IEEE smartworld, ubiquitous intelligence & computing, advanced & trusted computed, scalable computing & communications, cloud & big data computing, Internet of people and smart city innovation (smartworld/SCALCOM/UIC/ATC/CBDcom/IOP/SCI), pp. 1–6. Cited by: §1.
  • S&P Compustat (2019) Compustat online manual. Standard & Poor’s. Cited by: §1, §3.
  • H. B. Curry (1944) The method of steepest descent for non-linear minimization problems. Quarterly of Applied Mathematics 2 (3), pp. 258–261. Cited by: §2.2.
  • F. Dittrich (2007) The credit rating industry: competition and regulation. Ph.D. Thesis, Universität zu Köln. Cited by: §1.
  • General Data Protection Regulation (2016) Regulation EU 2016/679 of the European Parliament and of the Council of 27 April 2016. Official Journal of the European Union. Available at: http://ec.europa.eu/justice/data-protection/reform/files/regulation_oj_en.pdf (accessed 20 September 2017). Cited by: §1.
  • P. Golbayani, I. Florescu, and R. Chatterjee (2020a) A comparative study of forecasting corporate credit ratings using neural networks, support vector machines, and decision trees. The North American Journal of Economics and Finance 54, pp. 101251. Cited by: §1.
  • P. Golbayani, D. Wang, and I. Florescu (2020b) Application of deep neural networks to assess corporate credit rating. arXiv preprint arXiv:2003.02334. Cited by: §1.
  • B. Goodman and S. Flaxman (2017) European union regulations on algorithmic decision-making and a “right to explanation”. AI magazine 38 (3), pp. 50–57. Cited by: §1.
  • Y. Goyal, Z. Wu, J. Ernst, D. Batra, D. Parikh, and S. Lee (2019) Counterfactual visual explanations. In International Conference on Machine Learning, pp. 2376–2384. Cited by: §1.
  • R. M. Grath, L. Costabello, C. L. Van, P. Sweeney, F. Kamiab, Z. Shen, and F. Lecue (2018) Interpretable credit application predictions with counterfactual explanations. arXiv preprint arXiv:1811.05245. Cited by: §1, §2.2.
  • P. Hájek and V. Olej (2011) Credit rating modelling by kernel-based approaches with supervised and semi-supervised learning. Neural Computing and Applications 20 (6), pp. 761–773. Cited by: §1.
  • P. Hájek and V. Olej (2014) Predicting firms’ credit ratings using ensembles of artificial immune systems and machine learning–an over-sampling approach. In IFIP International Conference on Artificial Intelligence Applications and Innovations, pp. 29–38. Cited by: §1.
  • P. Huang, H. Zhang, R. Jiang, R. Stanforth, J. Welbl, J. Rae, V. Maini, D. Yogatama, and P. Kohli (2019) Reducing sentiment bias in language models via counterfactual evaluation. arXiv preprint arXiv:1911.03064. Cited by: §1.
  • Z. Huang, H. Chen, C. Hsu, W. Chen, and S. Wu (2004) Credit rating analysis with support vector machines and neural networks: a market comparative study. Decision support systems 37 (4), pp. 543–558. Cited by: Remark 1.
  • K. Janocha and W. M. Czarnecki (2017) On loss functions for deep neural networks in classification. arXiv preprint arXiv:1702.05659. Cited by: §2.2.
  • A. Khashman (2010) Neural networks for credit risk evaluation: investigation of different neural models and learning schemes. Expert Systems with Applications 37 (9), pp. 6233–6239. Cited by: §1.
  • S. Khemakhem and Y. Boujelbene (2015) Credit risk prediction: a comparative study between discriminant analysis and the neural network approach. Accounting and Management Information Systems 14 (1), pp. 60. Cited by: §1.
  • H. S. Kim and S. Y. Sohn (2010) Support vector machines for default prediction of smes based on technology credit. European Journal of Operational Research 201 (3), pp. 838–846. Cited by: §1.
  • D. M. Kline and V. L. Berardi (2005) Revisiting squared-error and cross-entropy functions for training neural network classifiers. Neural Computing & Applications 14 (4), pp. 310–318. Cited by: §2.2.
  • K. Kumar and S. Bhattacharya (2006) Artificial neural network vs linear discriminant analysis in credit ratings forecast: a comparative study of prediction performances. Review of Accounting and Finance 5 (3), pp. 216–227. Cited by: Remark 1.
  • K. Kumar and J. D. Haynes (2003) FORECASTING credit ratings using an ann and statistical techniques.. International journal of business studies 11 (1). Cited by: Remark 1.
  • Y. LeCun, Y. Bengio, and G. Hinton (2015) Deep learning. Nature 521 (7553), pp. 436–444. Cited by: §1.
  • H. Luo and L. Chen (2019) Bond yield and credit rating: evidence of chinese local government financing vehicles. Review of Quantitative Finance and Accounting 52 (3), pp. 737–758. Cited by: §1.
  • A. Mordvintsev, C. Olah, and M. Tyka (2015) Inceptionism: going deeper into neural networks. Cited by: §1.
  • W. J. Murdoch, C. Singh, K. Kumbier, R. Abbasi-Asl, and B. Yu (2019) Interpretable machine learning: definitions, methods, and applications. arXiv preprint arXiv:1901.04592. Cited by: §1.
  • G. Plumb, D. Molitor, and A. Talwalkar (2018) Model agnostic supervised local explanations. arXiv preprint arXiv:1807.02910. Cited by: §1.
  • M. Prosperi, Y. Guo, M. Sperrin, J. S. Koopman, J. S. Min, X. He, S. Rich, M. Wang, I. E. Buchan, and J. Bian (2020) Causal inference and counterfactual prediction in machine learning for actionable healthcare. Nature Machine Intelligence 2 (7), pp. 369–375. Cited by: §1.
  • Protection Regulation (2018) General data protection regulation. Intouch. Cited by: §1.
  • S&P Global (2018) Guide to credit rating essentials: what are credit ratings and how do they work. Standard & Poor’s Financial Services New York. Cited by: §1.
  • I. Selesnick (2017) Sparse regularization via convex analysis. IEEE Transactions on Signal Processing 65 (17), pp. 4481–4494. Cited by: §2.2.
  • N. Shukla and K. Fricklas (2018) Machine learning with tensorflow. Manning Greenwich. Cited by: §2.1.
  • Standard and Poor’s Corporation (1981) Standard & poor’s guide to credit rating essentials. Standard & Poor’s. Cited by: §1.
  • S. Wachter, B. Mittelstadt, and C. Russell (2017) Counterfactual explanations without opening the black box: automated decisions and the gdpr. Harv. JL & Tech. 31, pp. 841. Cited by: §1, §2.1.
  • M. Wallis, K. Kumar, and A. Gepp (2019) Credit rating forecasting using machine learning techniques. In Managerial Perspectives on Intelligent Big Data Analytics, pp. 180–198. Cited by: §1.
  • D. Wang, T. Wang, and I. Florescu (2020) Is image encoding beneficial for deep learning in finance?. IEEE Internet of Things Journal. Cited by: §1.
  • D. West (2000) Neural network credit scoring models. Computers & Operations Research 27 (11-12), pp. 1131–1152. Cited by: §1.
  • Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, and E. Hovy (2016) Hierarchical attention networks for document classification. In Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies, pp. 1480–1489. Cited by: §1.
  • Y. Ye, S. Liu, and J. Li (2008) A multiclass machine learning approach to credit rating prediction. In 2008 International Symposiums on Information Processing, pp. 57–61. Cited by: §1.
  • G. Yuan and B. Ghanem (2016) Sparsity constrained minimization via mathematical programming with equilibrium constraints. arXiv preprint arXiv:1608.04430. Cited by: §2.2.
  • X. Zhang, M. Fan, D. Wang, P. Zhou, and D. Tao (2020) Top-k feature selection framework using robust 0-1 integer programming. IEEE Transactions on Neural Networks and Learning Systems. Cited by: §2.2.
  • Z. Zhao, S. Xu, B. H. Kang, M. M. J. Kabir, Y. Liu, and R. Wasinger (2015) Investigation and improvement of multi-layer perceptron neural networks for credit scoring. Expert Systems with Applications 42 (7), pp. 3508–3516. Cited by: §1.

5 Appendix

Reasons | Features

Adjustment, scheduled items | Accounting Changes - Cumulative Effect, Accumulated Other Comprehensive Income (Loss), Assets Netting & Other Adjustments, Accum Other Comp Inc - Other Adjustments, Accum Other Comp Inc - Min Pension Liab Adj, Comp Inc - Beginning Net Income, Comp Inc - Currency Trans Adj, Comp Inc - Other Adj, Comp Inc - Minimum Pension Adj, Dilution Adjustment, Accum Other Comp Inc - Marketable Security Adjustments, Provision for Loan/Asset Losses, Pension Core Adjustment - 12mm, Core Pension Adjustment Diluted EPS Effect 12MM, Core Pension Adjustment Diluted EPS Effect, Core Pension Adjustment Basic EPS Effect 12MM, Core Pension Adjustment Basic EPS Effect, Core Pension Interest Adjustment After-tax Preliminary, Core Pension Interest Adjustment After-tax, Core Pension Interest Adjustment Diluted EPS Effect Preliminary, Core Pension Interest Adjustment Diluted EPS Effect, Core Pension Interest Adjustment Basic EPS Effect Preliminary, Core Pension Interest Adjustment Basic EPS Effect, Core Pension Interest Adjustment Pretax Preliminary, Core Pension Interest Adjustment Pretax, Core Pension Adjustment 12MM Diluted EPS Effect Preliminary, Core Pension Adjustment Diluted EPS Effect Preliminary, Core Pension Adjustment 12MM Basic EPS Effect Preliminary, Core Pension Adjustment Basic EPS Effect Preliminary, Core Pension Adjustment Preliminary, Core Pension Adjustment, Core Pension w/o Interest Adjustment After-tax Preliminary, Core Pension w/o Interest Adjustment After-tax, Core Pension w/o Interest Adjustment Diluted EPS Effect Preliminary, Core Pension w/o Interest Adjustment Diluted EPS Effect, Core Pension w/o Interest Adjustment Basic EPS Effect Preliminary, Core Pension w/o Interest Adjustment Basic EPS Effect, Core Pension w/o Interest Adjustment Pretax Preliminary, Core Pension w/o Interest Adjustment Pretax, Core Post Retirement Adjustment, Core Post Retirement Adjustment Diluted EPS Effect 12MM, Core Post Retirement Adjustment Diluted EPS Effect, Core Post Retirement Adjustment 12MM, Core Post Retirement Adjustment Basic EPS Effect 12MM, Core Post Retirement Adjustment Basic EPS Effect, Core Post Retirement Adjustment 12MM Diluted EPS Effect Preliminary, Core Post Retirement Adjustment Diluted EPS Effect Preliminary, Core Post Retirement Adjustment 12MM Basic EPS Effect Preliminary, Core Post Retirement Adjustment Basic EPS Effect Preliminary, Core Post Retirement Adjustment Preliminary, Receivables - Estimated Doubtful, Accum Other Comp Inc - Cumulative Translation Adjustments, Reserve for Loan/Asset Losses

Special items, unusual or non-recurring items | Acquisition/Merger After-Tax, Acquisition/Merger Diluted EPS Effect, Acquisition/Merger Basic EPS Effect, Acquisition/Merger Pretax, Extinguishment of Debt After-tax, Extinguishment of Debt Diluted EPS Effect, Extinguishment of Debt Basic EPS Effect, Extinguishment of Debt Pretax, Impairments of Goodwill AfterTax - 12mm, Impairment of Goodwill After-tax, Impairments Diluted EPS - 12mm, Impairment of Goodwill Diluted EPS Effect, Impairment of Goodwill Basic EPS Effect 12MM, Impairment of Goodwill Basic EPS Effect, Impairment of Goodwill Pretax, Gain/Loss After-Tax, Gain/Loss on Sale (Core Earnings Adjusted) After-tax 12MM, Gain/Loss on Sale (Core Earnings Adjusted) After-tax, Gain/Loss on Sale (Core Earnings Adjusted) Diluted EPS Effect 12MM, Gain/Loss on Sale (Core Earnings Adjusted) Diluted EPS, Gain/Loss on Sale (Core Earnings Adjusted) Basic EPS Effect 12MM, Gain/Loss on Sale (Core Earnings Adjusted) Basic EPS Effect, Gain/Loss on Sale (Core Earnings Adjusted) Pretax, Gain/Loss Diluted EPS Effect, Gain/Loss Basic EPS Effect, Gain/Loss Pretax, Gain/Loss on Ineffective Hedges, Inventory - Other, Nonperforming Assets - Total, Nonrecurring Income Taxes Diluted EPS Effect, Nonrecurring Income Taxes Basic EPS Effect, Nonrecurring Income Taxes - After-tax, Order backlog, Restructuring Cost After-tax, Restructuring Cost Diluted EPS Effect, Restructuring Cost Basic EPS Effect, Restructuring Cost Pretax, Other Special Items Diluted EPS Effect, Other Special Items Basic EPS Effect, Other Special Items After-tax, Other Special Items Pretax, Special Items, Writedowns After-tax, Writedowns Diluted EPS Effect, Writedowns Basic EPS Effect, Writedowns Pretax

Assets are discontinued operations | Other Long-term Assets, Discontinued Operations, Extraordinary Items and Discontinued Operations

Assets are in the Market | Accum Other Comp Inc - Derivatives Unrealized Gain/Loss, Accum Other Comp Inc - Unreal G/L Ret Int in Sec Assets, Assets Level2 (Observable), Comp Inc - Derivative Gains/Losses, Comp Inc - Securities Gains/Losses, Common Shares Used to Calculate Earnings Per Share - 12 Months Moving, Com Shares for Diluted EPS, Common Shares Issued, Common/Ordinary Stock (Capital), Dividends - Preferred/Preference, Earnings Per Share (Diluted) - Including Extraordinary Items, Earnings Per Share (Diluted) - Excluding Extraordinary items, Earnings Per Share (Basic) - Including Extraordinary Items, Earnings Per Share (Basic) - Excluding Extraordinary Items, Earnings Per Share (Basic) - Excluding Extraordinary Items - 12 Months Moving, Foreign Exchange Income (Loss), Goodwill (net), Options - Fair Value of Options Granted, Life of Options - Assumption (# yrs), Risk Free Rate - Assumption (%), Volatility - Assumption (%), Repurchase Price - Average per share Quarter, Preferred/Preference Stock - Nonredeemable, Preferred/Preference Stock (Capital) - Total, Preferred/Preference Stock - Redeemable, Implied Option Expense - 12mm, Implied Option EPS Diluted 12MM, Implied Option 12MM EPS Diluted Preliminary, Implied Option EPS Diluted, Implied Option EPS Diluted Preliminary, Implied Option EPS Basic 12MM, Implied Option 12MM EPS Basic Preliminary, Implied Option EPS Basic, Implied Option EPS Basic Preliminary, Implied Option Expense, Implied Option Expense Preliminary

Accord, regulated items | Risk-Adjusted Capital Ratio - Tier 1, Risk-Adjusted Capital Ratio - Tier 2, Risk-Adjusted Capital Ratio - Combined

Agreements with shareholders, employees | Total Shares Repurchased - Quarter, Common Shares Outstanding, Common Shares Used to Calculate Earnings Per Share - Basic, Carrying Value, Common Stock Equivalents - Dollar Savings, Deferred Compensation, Dividends - Preferred/Preference, Common ESOP Obligation - Total, Preferred ESOP Obligation - Non-Redeemable, Preferred ESOP Obligation - Redeemable, Preferred ESOP Obligation - Total, Dividend Rate - Assumption (%), Nonred Pfd Shares Outs (000) - Quarterly, Redeem Pfd Shares Outs (000), Other Stockholders- Equity Adjustments, Stock Compensation Expense, Treasury Stock - Number of Common Shares, Treasury Stock - Total (All Capital)

Computational Items | Accumulated Depreciation of RE Property, Depreciation, Depletion and Amortization (Accumulated), Depreciation and Amortization - Total, Depr/Amort of Property, Amortization of Goodwill, Receivables - Current Other incl Tax Refunds, Total Fair Value Changes including Earnings, Total Fair Value Liabilities, Deferred Tax Asset - Long Term, Current Deferred Tax Asset, Current Deferred Tax Liability, Deferred Taxes - Balance Sheet, Income Taxes - Deferred, Deferred Taxes and Investment Tax Credit, Income Taxes Payable, Income Taxes - Total, Excise Taxes

Special events | Reversal - Restructruring/Acquisition Aftertax 12MM, Reversal - Restructruring/Acquisition Aftertax, Reversal - Restructuring/Acq Diluted EPS Effect 12MM, Reversal - Restructuring/Acq Diluted EPS Effect, Reversal - Restructuring/Acq Basic EPS Effect 12MM, Reversal - Restructuring/Acq Basic EPS Effect, Reversal - Restructruring/Acquisition Pretax, Settlement (Litigation/Insurance) AfterTax - 12mm, Settlement (Litigation/Insurance) After-tax, Settlement (Litigation/Insurance) Diluted EPS Effect 12MM, Settlement (Litigation/Insurance) Diluted EPS Effect, Settlement (Litigation/Insurance) Basic EPS Effect 12MM, Settlement (Litigation/Insurance) Basic EPS Effect, Settlement (Litigation/Insurance) Pretax, Extraordinary Items

Noncontrolling loss/gain from subsidiary | Comprehensive Income - Noncontrolling Interest, Equity in Earnings (I/S) - Unconsolidated Subsidiaries, Investment and Advances - Equity, Investment and Advances - Other, Noncontrolling Interests - Nonredeemable - Balance Sheet, Noncontrolling Interest - Redeemable - Balance Sheet, Noncontrolling Interests - Total - Balance Sheet, Noncontrolling Interest - Income Account

Non-operating items | Non-Operating Income (Expense) - Total, Gain/Loss on Sale of Property

Table 11: A complete list of accounting variables that may not be changed