Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning

06/03/2021
by Aurélien Bibaut, et al.

Empirical risk minimization (ERM) is the workhorse of machine learning, whether for classification and regression or for off-policy policy learning, but its model-agnostic guarantees can fail when we use adaptively collected data, such as the result of running a contextual bandit algorithm. We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class and provide first-of-their-kind generalization guarantees and fast convergence rates. Our results are based on a new maximal inequality that carefully leverages the importance sampling structure to obtain rates with the right dependence on the exploration rate in the data. For regression, we provide fast rates that leverage the strong convexity of squared-error loss. For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero, as is the case for bandit-collected data. An empirical investigation validates our theory.
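The abstract describes a generic importance sampling weighted ERM algorithm. Below is a minimal sketch of the policy-learning case, built on assumptions that are not in the source. The simulator, the names `simulate_bandit`, `ipw_risk` and `ipw_erm`, the class of threshold policies, and the `t ** -0.5` exploration schedule are all hypothetical. The inverse-propensity weight 1 / g_t(a_t | x_t) is a standard choice for bandit data, but it is not necessarily the exact weighting analyzed in the paper. The point of the sketch is that the logger adapts over time, so the samples are not i.i.d. Reweighting each logged loss by the propensity that was recorded when the action was chosen still makes every term conditionally unbiased for the candidate policy's risk.

```python
import random


def simulate_bandit(T, seed=0):
    """Hypothetical two-armed contextual bandit with context x ~ Uniform[0, 1).

    The logger runs epsilon-greedy on two context bins (x <= 0.5, x > 0.5)
    with exploration rate eps_t = t**-0.5, which decays to zero.
    Returns tuples (x, action, propensity of the logged action, loss).
    """
    rng = random.Random(seed)
    sums = [[0.0, 0.0], [0.0, 0.0]]  # sums[bin][arm] of observed rewards
    counts = [[0, 0], [0, 0]]        # counts[bin][arm] of pulls
    data = []
    for t in range(1, T + 1):
        x = rng.random()
        b = int(x > 0.5)
        means = [sums[b][a] / counts[b][a] if counts[b][a] else 0.0 for a in (0, 1)]
        greedy = 0 if means[0] >= means[1] else 1
        eps = t ** -0.5
        probs = [eps / 2, eps / 2]
        probs[greedy] += 1 - eps      # greedy arm gets 1 - eps/2 in total
        a = 0 if rng.random() < probs[0] else 1
        # Arm 1 is better for x > 0.5, arm 0 otherwise (success prob 0.8 vs 0.2).
        good = (a == 1) == (x > 0.5)
        reward = 1.0 if rng.random() < (0.8 if good else 0.2) else 0.0
        sums[b][a] += reward
        counts[b][a] += 1
        data.append((x, a, probs[a], 1.0 - reward))
    return data


def ipw_risk(policy, data):
    """Importance-sampling (inverse-propensity) weighted empirical risk of `policy`.

    A logged loss counts only when the policy would have taken the logged
    action, and it is up-weighted by 1 / g_t(a_t | x_t), the propensity
    recorded at collection time. Conditionally on the past, each term has
    mean equal to the policy's expected loss, even though the logger adapts.
    """
    total = 0.0
    for x, a, p, loss in data:
        if policy(x) == a:
            total += loss / p
    return total / len(data)


def ipw_erm(data, thresholds):
    """ERM over a hypothetical class of threshold policies pi_c(x) = 1{x > c}."""
    return min(thresholds, key=lambda c: ipw_risk(lambda x: int(x > c), data))
```

Usage: `ipw_erm(simulate_bandit(20000), [i / 20 for i in range(21)])` should typically return a threshold near 0.5, which is the optimal cutoff in this simulated environment. As exploration decays, the weights 1 / p on rarely chosen actions become large. Those rarely chosen actions are exactly where a candidate policy disagrees with the logger, and they drive the variance of the estimated risk. This is why guarantees of this kind have to account for the exploration rate in the data.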

research
06/09/2019

Balanced Off-Policy Evaluation in General Action Spaces

In many practical applications of contextual bandits, online learning is...
research
02/12/2020

Weighted Empirical Risk Minimization: Sample Selection Bias Correction based on Importance Sampling

We consider statistical learning problems, when the distribution P' of t...
research
07/18/2023

Adaptively Optimised Adaptive Importance Samplers

We introduce a new class of adaptive importance samplers leveraging adap...
research
02/09/2023

An information-theoretic learning model based on importance sampling

A crucial assumption underlying the most current theory of machine learn...
research
11/22/2022

Contextual Bandits in a Survey Experiment on Charitable Giving: Within-Experiment Outcomes versus Policy Learning

We design and implement an adaptive experiment (a “contextual bandit”) t...
research
11/02/2014

Fast Randomized Kernel Methods With Statistical Guarantees

One approach to improving the running time of kernel-based machine learn...
