Analyzing Credit Risk Model Problems through NLP-Based Clustering and Machine Learning: Insights from Validation Reports

06/02/2023
by Szymon Lis, et al.

This paper explores the use of clustering methods and machine learning algorithms, including Natural Language Processing (NLP), to identify and classify problems in credit risk models from the textual information contained in validation reports. The analysis uses a unique dataset of 657 findings raised by validation teams in a large international banking group between January 2019 and December 2022. Validators, drawing on their expert knowledge, classified each finding into one of nine validation dimensions and assigned it a severity level. The authors generate embeddings for the findings' titles and observations using four different pre-trained models: "module_url" from TensorFlow Hub and three models from the SentenceTransformer library, namely "all-mpnet-base-v2", "all-MiniLM-L6-v2", and "paraphrase-mpnet-base-v2". The paper compares various clustering methods for grouping findings with similar characteristics, which enables the identification of common problems within each validation dimension and severity level. The results show that clustering is an effective approach for identifying and classifying credit risk model problems, with accuracy higher than 60%. The authors also employ machine learning algorithms, including logistic regression and XGBoost, to predict the validation dimension and its severity, with XGBoost achieving an accuracy of 80%. Furthermore, the study identifies the top 10 words that predict a validation dimension and severity. Overall, the paper demonstrates the usefulness of clustering and machine learning for analyzing textual information in validation reports, and provides insights into the types of problems encountered in the development and validation of credit risk models.
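The abstract reports a clustering accuracy but does not say how it is computed. One common convention, assumed here purely for illustration, is to give every cluster the majority expert label among its members and count the share of findings whose own label matches. The sketch below implements that convention in plain Python; the function name `cluster_accuracy`, the toy cluster assignments, and the dimension names are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def cluster_accuracy(cluster_ids, labels):
    """Share of items whose label equals the majority label of their cluster.

    Assumption: this majority-vote mapping is one possible way to score a
    clustering against expert labels; the paper may use a different one.
    """
    if len(cluster_ids) != len(labels):
        raise ValueError("cluster_ids and labels must have the same length")
    if not labels:
        raise ValueError("at least one item is required")

    # Count how often each expert label occurs inside each cluster.
    label_counts = defaultdict(Counter)
    for cluster_id, label in zip(cluster_ids, labels):
        label_counts[cluster_id][label] += 1

    # An item counts as correct when it carries its cluster's majority label.
    correct = sum(counts.most_common(1)[0][1] for counts in label_counts.values())
    return correct / len(labels)


# Hypothetical toy data: eight findings, three clusters, three dimensions.
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
dimensions = [
    "data", "data", "calibration",
    "calibration", "calibration",
    "documentation", "data", "documentation",
]
score = cluster_accuracy(clusters, dimensions)
print(f"majority-vote cluster accuracy: {score:.2f}")
```

In practice the cluster assignments would come from running a clustering method on the sentence embeddings, and the labels would be the validators' dimension or severity assignments. Note that this score can only rise as the number of clusters grows, so it is best compared across methods at a fixed cluster count.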


