Residual Overfit Method of Exploration

10/06/2021
by James McInerney, et al.

Exploration is a crucial aspect of bandit and reinforcement learning algorithms. The uncertainty quantification necessary for exploration often comes from either closed-form expressions based on simple models or resampling and posterior approximations that are computationally intensive. We propose instead an approximate exploration methodology based on fitting only two point estimates, one tuned and one overfit. The approach, which we term the residual overfit method of exploration (ROME), drives exploration towards actions where the overfit model exhibits the most overfitting compared to the tuned model. The intuition is that overfitting occurs the most at actions and contexts with insufficient data to form accurate predictions of the reward. We justify this intuition formally from both a frequentist and a Bayesian information theoretic perspective. The result is a method that generalizes to a wide variety of models and avoids the computational overhead of resampling or posterior approximations. We compare ROME against a set of established contextual bandit methods on three datasets and find it to be one of the best performing.
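The core mechanism described above, fitting one tuned and one overfit model and exploring where they disagree most, can be illustrated with a minimal sketch. This is not the paper's implementation: the polynomial regressors, the 1-D action space, and the additive disagreement bonus are all illustrative assumptions standing in for whatever model class and scoring rule a real system would use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D action space; rewards have been observed for only a few actions.
actions = np.linspace(0.0, 1.0, 50)
observed = rng.choice(50, size=8, replace=False)
rewards = np.sin(2 * np.pi * actions[observed]) + 0.1 * rng.standard_normal(8)

def fit_poly(x, y, degree, ridge):
    # Ridge-regularized least-squares polynomial fit; the ridge strength
    # is what makes one model "tuned" and the other "overfit".
    X = np.vander(x, degree + 1)
    A = X.T @ X + ridge * np.eye(degree + 1)
    return np.linalg.solve(A, X.T @ y)

def predict(coef, x):
    return np.vander(x, len(coef)) @ coef

x_obs = actions[observed]
tuned = fit_poly(x_obs, rewards, degree=3, ridge=1e-1)    # regularized model
overfit = fit_poly(x_obs, rewards, degree=7, ridge=1e-8)  # near-interpolating model

# ROME-style score (illustrative): exploit the tuned prediction, and add an
# exploration bonus where the overfit model departs from it most, since the
# overfit model overfits hardest where data is scarce.
f_tuned = predict(tuned, actions)
f_over = predict(overfit, actions)
score = f_tuned + np.abs(f_over - f_tuned)
next_action = actions[np.argmax(score)]
```

In a contextual bandit loop, the chosen action's reward would be observed, both models refit, and the disagreement bonus would shrink in well-sampled regions, steering exploration toward under-observed actions.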

Related research

02/26/2018 · Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling
Recent advances in deep reinforcement learning have made significant str...

01/23/2019 · Meta-Learning for Contextual Bandit Exploration
We describe MELEE, a meta-learning algorithm for learning a good explora...

02/12/2018 · Practical Evaluation and Optimization of Contextual Bandit Algorithms
We study and empirically optimize contextual bandit learning, exploratio...

02/13/2018 · Efficient Exploration through Bayesian Deep Q-Networks
We propose Bayesian Deep Q-Network (BDQN), a practical Thompson sampling...

02/24/2023 · Model-Based Uncertainty in Value Functions
We consider the problem of quantifying uncertainty over expected cumulat...

10/12/2019 · Efficient Inference and Exploration for Reinforcement Learning
Despite an ever growing literature on reinforcement learning algorithms ...

12/15/2022 · Ungeneralizable Contextual Logistic Bandit in Credit Scoring
The application of reinforcement learning in credit scoring has created ...
