The fraud loss for selecting the model complexity in fraud detection
In fraud detection applications, the investigator can typically inspect only a limited number k of cases. The most efficient way to allocate these resources is then to select the k cases with the highest probability of being fraudulent. The prediction model used for this purpose must normally be regularized to avoid overfitting and, consequently, poor predictive performance. A new loss function, denoted the fraud loss, is proposed for selecting the model complexity via a tuning parameter. A simulation study is performed to find the optimal settings for validation. Further, the proposed procedure is compared to the most relevant competing procedure, which is based on the area under the receiver operating characteristic curve (AUC), in a set of simulations as well as on a VAT fraud dataset. In most cases, choosing the model complexity according to the fraud loss gave performance, measured by the fraud loss, that was better than or comparable to that obtained by choosing it according to the AUC.
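The idea of ranking cases and keeping only the k most suspicious ones can be illustrated with a minimal sketch. The abstract does not state the formula of the fraud loss, so the code below uses a plain count of fraudulent cases among the top k as a stand-in criterion; it is not the paper's fraud loss. The function names (`top_k_hits`, `select_by_top_k`), the tuning-parameter values, and the validation scores are all hypothetical and chosen only for illustration.

```python
import numpy as np

def top_k_hits(scores, labels, k):
    """Count the fraudulent cases (label 1) among the k highest-scoring cases."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Indices of the k largest scores; a stable sort breaks ties by original order.
    top = np.argsort(-scores, kind="stable")[:k]
    return int(labels[top].sum())

def select_by_top_k(candidate_scores, labels, k):
    """Pick the tuning-parameter value whose validation scores put the most frauds in the top k.

    candidate_scores maps each (hypothetical) tuning-parameter value to the
    predicted fraud probabilities on a common validation set.
    """
    hits = {lam: top_k_hits(s, labels, k) for lam, s in candidate_scores.items()}
    best = max(hits, key=hits.get)
    return best, hits

# Toy validation set: 1 = fraudulent, 0 = legitimate (made-up values).
labels = np.array([1, 0, 0, 1, 0, 1, 0, 0])

# Made-up validation scores for two values of a regularization parameter.
candidate_scores = {
    0.1: np.array([0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.4, 0.5]),
    1.0: np.array([0.9, 0.2, 0.1, 0.8, 0.3, 0.7, 0.4, 0.5]),
}

best_lambda, hits = select_by_top_k(candidate_scores, labels, k=3)
print(best_lambda, hits)
```

Unlike the AUC, which evaluates the ranking of all cases, a criterion of this kind looks only at the k cases that would actually be investigated, which is the motivation the abstract gives for a dedicated loss.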