XtracTree for Regulator Validation of Bagging Methods Used in Retail Banking

by Jeremy Charlier, et al.

Bootstrap aggregation, known as bagging, is one of the most popular ensemble methods in machine learning (ML). An ensemble method is a supervised ML method that combines multiple hypotheses into a single hypothesis used for prediction. A bagging algorithm combines multiple classifiers, each trained on a different sub-sample of the same data set, into one large classifier. Large retail banks now use ML algorithms, including decision trees and random forests, to optimize their retail banking activities. However, AI researchers at banks face strong challenges from their own model validation departments as well as from national financial regulators: each proposed ML model has to be validated, and clear rules have to be established for every algorithm-based decision. In this context, we propose XtracTree, an algorithm that effectively converts an ML bagging classifier, such as a decision tree or a random forest, into simple "if-then" rules that satisfy the requirements of model validation. Our algorithm can also highlight the decision path for an individual sample or a group of samples, addressing regulators' concerns about "black-box" ML models. We use a public loan data set from Kaggle to illustrate the usefulness of our approach. Our experiments indicate that XtracTree gives a better understanding of the model, which makes validation easier for both national financial regulators and the internal model validation department.
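The general idea of turning tree-based classifiers into "if-then" rules and tracing a sample's decision path can be sketched in a few lines of Python. This is a minimal illustration, not the XtracTree implementation: the tree representation (`Leaf`, `Split`), the function names, the feature names (`income`, `debt_ratio`), the thresholds, and the three hand-built trees are all hypothetical and are not taken from the Kaggle loan data set.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # predicted class at this leaf


@dataclass
class Split:
    feature: str       # hypothetical feature name
    threshold: float   # go left when sample[feature] <= threshold
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def extract_rules(node, conditions=()):
    """Return one 'if ... then ...' rule per leaf of the tree."""
    if isinstance(node, Leaf):
        premise = " and ".join(conditions) if conditions else "True"
        return [f"if {premise} then {node.label}"]
    left_cond = f"{node.feature} <= {node.threshold}"
    right_cond = f"{node.feature} > {node.threshold}"
    return (extract_rules(node.left, conditions + (left_cond,))
            + extract_rules(node.right, conditions + (right_cond,)))


def decision_path(node, sample):
    """Return the conditions a sample satisfies on its way to a leaf, and the leaf label."""
    path = []
    while isinstance(node, Split):
        if sample[node.feature] <= node.threshold:
            path.append(f"{node.feature} <= {node.threshold}")
            node = node.left
        else:
            path.append(f"{node.feature} > {node.threshold}")
            node = node.right
    return path, node.label


def forest_vote(trees, sample):
    """Majority vote over the trees, as in a bagged ensemble."""
    votes = Counter(decision_path(tree, sample)[1] for tree in trees)
    return votes.most_common(1)[0][0]


# Three hand-built toy trees standing in for a small bagged ensemble (assumed values).
tree_a = Split("income", 30000,
               Leaf("reject"),
               Split("debt_ratio", 0.4, Leaf("approve"), Leaf("reject")))
tree_b = Split("debt_ratio", 0.5, Leaf("approve"), Leaf("reject"))
tree_c = Split("income", 25000, Leaf("reject"), Leaf("approve"))
forest = [tree_a, tree_b, tree_c]

sample = {"income": 42000, "debt_ratio": 0.35}

for rule in extract_rules(tree_a):
    print(rule)

path, label = decision_path(tree_a, sample)
print("path:", " and ".join(path), "->", label)
print("ensemble vote:", forest_vote(forest, sample))
```

Each leaf yields exactly one rule, so the rule list is an exhaustive and mutually exclusive description of a single tree. For an ensemble, the same extraction can be repeated per tree, and the per-sample paths show which conditions drove each vote.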

