Accurate, Data-Efficient Learning from Noisy, Choice-Based Labels for Inherent Risk Scoring

11/27/2018
by W. Ronny Huang, et al.

Inherent risk scoring is an important function in anti-money laundering, used to determine the riskiness of an individual during onboarding, before any fraudulent transactions occur. It is, however, fraught with two challenges: (1) experts hold inconsistent notions of what constitutes high or low risk, and (2) labeled data is scarce. This paper explores a new paradigm of data labeling and data collection to tackle these issues. The data labeling is choice-based: the expert does not provide an absolute risk score but merely chooses the most or least risky example out of a small choice set, which reduces inconsistency because experts make only relative judgments of risk. The data collection is synthetic: examples are crafted using optimal experimental design methods, obviating the need for real data, which is often difficult to obtain due to regulatory concerns. We present the methodology of an end-to-end inherent risk scoring algorithm that we built for a large financial institution. The system was trained on a small set of synthetic data (188 examples, 24 features) whose labels were obtained via the choice-based paradigm using an efficient number of expert labelers. The system achieves 89% accuracy on a test set of 52 examples, with an area under the ROC curve of 93%.
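
To make the choice-based labeling idea concrete, the following is a minimal sketch of how relative "most risky" picks could be turned into a scoring model. It is an illustration under stated assumptions, not the method described in the paper: it assumes a conditional (multinomial) logit choice model with a linear risk utility, simulates the expert with Gumbel noise, uses random rather than optimally designed choice sets, and models only "most risky" picks. The function names `simulate_choices`, `fit_choice_model`, and `risk_score`, along with all parameter values, are hypothetical.

```python
import numpy as np


def simulate_choices(w_true, n_sets=200, set_size=4, seed=0):
    """Simulate a noisy expert who picks the riskiest item in each choice set.

    Assumption: the expert's latent risk for an item x is w_true . x plus
    Gumbel noise, so the picks follow a multinomial logit model.
    """
    rng = np.random.default_rng(seed)
    d = w_true.shape[0]
    X = rng.normal(size=(n_sets, set_size, d))   # random synthetic examples
    utilities = X @ w_true                       # shape (n_sets, set_size)
    noisy = utilities + rng.gumbel(size=utilities.shape)
    picks = noisy.argmax(axis=1)                 # index of the chosen item
    return X, picks


def fit_choice_model(X, picks, lr=0.5, n_iter=500, l2=1e-3):
    """Fit linear weights by gradient ascent on the logit log-likelihood."""
    n_sets, set_size, d = X.shape
    w = np.zeros(d)
    chosen = X[np.arange(n_sets), picks]         # features of the chosen items
    for _ in range(n_iter):
        u = X @ w
        u = u - u.max(axis=1, keepdims=True)     # stabilize the softmax
        p = np.exp(u)
        p = p / p.sum(axis=1, keepdims=True)     # choice probabilities
        expected = (p[:, :, None] * X).sum(axis=1)
        # Gradient: chosen features minus expected features, with L2 penalty.
        grad = (chosen - expected).mean(axis=0) - l2 * w
        w = w + lr * grad
    return w


def risk_score(w, x):
    """Map the learned linear utility to a score in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(x @ w)))


if __name__ == "__main__":
    w_true = np.array([2.0, -1.0, 0.5, 0.0, 1.0])
    X, picks = simulate_choices(w_true, n_sets=500)
    w_hat = fit_choice_model(X, picks)
    print("correlation with true weights:", np.corrcoef(w_true, w_hat)[0, 1])
```

The point of the sketch is that the labeler never supplies an absolute score: only the index of the chosen item enters the likelihood, yet the fitted weights still induce a risk ranking over new examples.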


research
06/13/2011

Eliciting Forecasts from Self-interested Experts: Scoring Rules for Decision Makers

Scoring rules for eliciting expert predictions of random variables are u...
research
07/09/2016

Classifier Risk Estimation under Limited Labeling Resources

In this paper we propose strategies for estimating performance of a clas...
research
03/09/2022

All You Need is LUV: Unsupervised Collection of Labeled Images using Invisible UV Fluorescent Indicators

Large-scale semantic image annotation is a significant challenge for lea...
research
11/05/2020

Measuring Data Collection Quality for Community Healthcare

Machine learning has tremendous potential to provide targeted interventi...
research
12/09/2022

Multidimensional Service Quality Scoring System

This supplementary paper aims to introduce the Multidimensional Service ...
research
03/08/2021

The Weakly-Labeled Rand Index

Synthetic Aperture Sonar (SAS) surveys produce imagery with large region...
research
04/27/2022

An Iterative Labeling Method for Annotating Fisheries Imagery

In this paper, we present a methodology for fisheries-related data that ...
