Boosted Off-Policy Learning

08/01/2022
by Ben London, et al.

We investigate boosted ensemble models for off-policy learning from logged bandit feedback. Toward this goal, we propose a new boosting algorithm that directly optimizes an estimate of the policy's expected reward. We analyze this algorithm and prove that the empirical risk decreases (possibly exponentially fast) with each round of boosting, provided a "weak" learning condition is satisfied. We further show how the base learning problem reduces to standard supervised learning. Experiments indicate that our algorithm can outperform deep off-policy learning, as well as methods that simply regress on the observed rewards, demonstrating the benefits of both boosting and choosing the right learning objective.
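The abstract does not spell out the algorithm, so the following is only a rough sketch of the general idea as I read it: functional gradient boosting on an inverse-propensity-scored (IPS) estimate of a softmax policy's expected reward, with the base learner reduced to least-squares regression. The synthetic data, the softmax parameterization, the step size eta, and the decision-tree base learner are all assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch: functional gradient boosting on an IPS objective.
# Not the paper's algorithm; all modeling choices here are assumptions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n, d, K = 2000, 5, 3  # logged examples, context dimension, number of actions

# Synthetic logged bandit feedback: contexts X, logged actions a drawn by a
# uniform logging policy with propensities p, and observed rewards r.
X = rng.normal(size=(n, d))
a = rng.integers(K, size=n)
p = np.full(n, 1.0 / K)
true_w = rng.normal(size=(d, K))
r = (X @ true_w)[np.arange(n), a] + 0.1 * rng.normal(size=n)

def action_features(X, k):
    """Concatenate the context with a one-hot indicator for action k."""
    onehot = np.zeros((len(X), K))
    onehot[:, k] = 1.0
    return np.hstack([X, onehot])

def policy(ensemble, X):
    """Softmax policy over the ensemble scores F(x, a)."""
    S = np.zeros((len(X), K))
    for eta, h in ensemble:
        for k in range(K):
            S[:, k] += eta * h.predict(action_features(X, k))
    pi = np.exp(S - S.max(axis=1, keepdims=True))
    return pi / pi.sum(axis=1, keepdims=True)

def ips_value(ensemble):
    """Inverse-propensity-scored estimate of the policy's expected reward."""
    pi = policy(ensemble, X)
    return np.mean(r * pi[np.arange(n), a] / p)

ensemble, eta = [], 0.5
for t in range(50):
    pi = policy(ensemble, X)
    # Functional gradient of the IPS objective w.r.t. the score F(x_i, k):
    # (r_i / p_i) * pi(a_i | x_i) * (1[k = a_i] - pi(k | x_i)).
    w = (r / p) * pi[np.arange(n), a]
    G = -w[:, None] * pi
    G[np.arange(n), a] += w
    # Base learner: a shallow regression tree fit to the gradient targets,
    # i.e., the reduction to supervised (least-squares) learning.
    feats = np.vstack([action_features(X, k) for k in range(K)])
    h = DecisionTreeRegressor(max_depth=3).fit(feats, G.T.ravel())
    ensemble.append((eta, h))

print(f"IPS estimate after boosting: {ips_value(ensemble):.3f}")
```

Each round fits the base regressor to the functional gradient of the IPS objective, so a better fit pushes the estimated reward higher; this loosely mirrors how the abstract's "weak" learning condition on the base learner drives the empirical risk down each round.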


Related research

02/18/2021
Boosting for Online Convex Optimization
We consider the decision-making framework of online convex optimization ...

10/11/2018
Online Multiclass Boosting with Bandit Feedback
We present online boosting algorithms for multiclass classification with...

05/06/2021
Machine Collaboration
We propose a new ensemble framework for supervised learning, named machi...

01/04/2014
From Kernel Machines to Ensemble Learning
Ensemble methods such as boosting combine multiple learners to obtain be...

07/18/2022
Online Learning with Off-Policy Feedback
We study the problem of online learning in adversarial bandit problems u...

01/26/2021
Tree boosting for learning probability measures
Learning probability measures based on an i.i.d. sample is a fundamenta...

01/31/2023
Multicalibration as Boosting for Regression
We study the connection between multicalibration and boosting for square...
