Debiasing In-Sample Policy Performance for Small-Data, Large-Scale Optimization

07/26/2021
by Vishal Gupta, et al.

Motivated by the poor performance of cross-validation in settings where data are scarce, we propose a novel estimator of the out-of-sample performance of a policy in data-driven optimization. Our approach exploits the optimization problem's sensitivity analysis to estimate the gradient of the optimal objective value with respect to the amount of noise in the data, and it uses the estimated gradient to debias the policy's in-sample performance. Unlike cross-validation techniques, our approach avoids sacrificing data for a test set and utilizes all data when training; hence, it is well-suited to settings where data are scarce. We prove bounds on the bias and variance of our estimator for optimization problems with uncertain linear objectives but known, potentially non-convex, feasible regions. For more specialized optimization problems where the feasible region is "weakly-coupled" in a certain sense, we prove stronger results. Specifically, we provide explicit high-probability bounds on the error of our estimator that hold uniformly over a policy class and depend on the problem's dimension and the policy class's complexity. Our bounds show that, under mild conditions, the error of our estimator vanishes as the dimension of the optimization problem grows, even if the amount of available data remains small and constant. Said differently, we prove that our estimator performs well in the small-data, large-scale regime. Finally, we numerically compare our proposed method to state-of-the-art approaches through a case study on dispatching emergency medical response services using real data. Our method provides more accurate estimates of out-of-sample performance and learns better-performing policies.
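
To make the abstract's description concrete, below is a minimal sketch of the general debiasing mechanism it alludes to, not the authors' actual estimator: for a minimization problem with an uncertain linear objective, the in-sample value of the plug-in policy is optimistically biased, and under Gaussian noise a Stein-type identity ties that optimism to the sensitivity of the policy to perturbations of the data. The oracle `solve`, the parameters `n_probe` and `delta`, and the assumption of a known, homoscedastic noise level `sigma` are illustrative choices, not taken from the paper.

```python
import numpy as np


def debiased_performance(c_hat, sigma, solve, n_probe=200, delta=0.5, seed=None):
    """Illustrative Stein-type debiasing of an in-sample policy value.

    Setting (assumed, not from the paper): minimize c @ x over a known
    feasible region; c_hat = c + sigma * N(0, I) is the only data, and
    solve(c) is a black-box oracle returning an optimal decision for cost
    vector c. The plug-in in-sample value c_hat @ solve(c_hat) underestimates
    the true cost c @ solve(c_hat); its bias equals
    sigma**2 * E[div_c solve(c_hat)], which we approximate with randomized
    finite-difference probes.
    """
    rng = np.random.default_rng(seed)
    x_hat = solve(c_hat)
    in_sample = float(c_hat @ x_hat)

    # Hutchinson-style divergence estimate:
    #   E_z[ z @ (solve(c_hat + delta*z) - solve(c_hat)) ] / delta  ~  tr(d solve / d c).
    # delta trades smoothing bias against variance; piecewise-constant policies
    # (e.g., LP vertices) need delta large enough that some probes flip the decision.
    probes = [
        z @ (solve(c_hat + delta * z) - x_hat) / delta
        for z in rng.standard_normal((n_probe, c_hat.size))
    ]
    optimism = sigma ** 2 * float(np.mean(probes))  # typically <= 0 for minimization

    return in_sample - optimism  # corrected estimate of out-of-sample cost


if __name__ == "__main__":
    # Toy check: choose the cheapest of d items (feasible decisions = basis vectors).
    d, sigma = 100, 1.0
    rng = np.random.default_rng(0)
    c_true = rng.uniform(1.0, 2.0, size=d)
    c_hat = c_true + sigma * rng.standard_normal(d)

    def cheapest_item(c):
        return np.eye(c.size)[np.argmin(c)]

    print("debiased estimate:", debiased_performance(c_hat, sigma, cheapest_item))
    print("naive in-sample:  ", c_hat @ cheapest_item(c_hat))
    print("true out-of-sample:", c_true @ cheapest_item(c_hat))
```

The sketch only illustrates the high-level mechanism; proving bias and variance bounds for an estimator of this kind, uniformly over a policy class and in the small-data, large-scale regime, is the paper's contribution.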

