Learning Optimized Risk Scores on Large-Scale Datasets

10/01/2016
by   Berk Ustun, et al.
0

Risk scores are simple classification models that let users quickly assess risk by adding, subtracting, and multiplying a few small numbers. These models are widely used for high-stakes applications in healthcare and criminology, but are difficult to create because they need to be risk-calibrated, sparse, use small integer coefficients, and obey operational constraints. In this paper, we present a new approach to learn risk scores that are fully optimized for feature selection, integer coefficients, and operational constraints. We formulate the risk score problem as a mixed integer nonlinear program, and present a new cutting plane algorithm to efficiently recover its optimal solution while avoiding the stalling behavior of existing cutting plane algorithms in non-convex settings. We pair our algorithm with specialized techniques to generate feasible solutions, narrow the optimality gap, and reduce data-related computation. The resulting approach can learn optimized risk scores in a way that scales linearly in the number of samples, provides a proof of optimality, and accommodates complex operational constraints. We illustrate the benefits of this approach through extensive numerical experiments.

READ FULL TEXT
research
02/15/2015

Supersparse Linear Integer Models for Optimized Medical Scoring Systems

Scoring systems are linear classification models that only require users...
research
10/12/2022

FasterRisk: Fast and Accurate Interpretable Risk Scores

Over the last century, risk scores have been the most popular form of pr...
research
03/03/2021

Stochastic Cutting Planes for Data-Driven Optimization

We introduce a stochastic version of the cutting-plane method for a larg...
research
12/02/2021

Learning Optimal Predictive Checklists

Checklists are simple decision aids that are often used to promote safet...
research
03/30/2020

Optimal Behavior Planning for Autonomous Driving: A Generic Mixed-Integer Formulation

Mixed-Integer Quadratic Programming (MIQP) has been identified as a suit...
research
12/27/2019

A Deterministic Model for the Transshipment Problem of a Fast Fashion Retailer under Capacity Constraints

In this paper we present a novel transshipment problem for a large appar...
research
02/22/2021

Slowly Varying Regression under Sparsity

We consider the problem of parameter estimation in slowly varying regres...

Please sign up or login with your details

Forgot password? Click here to reset