Regularization Guarantees Generalization in Bayesian Reinforcement Learning through Algorithmic Stability

09/24/2021
by Aviv Tamar, et al.

In the Bayesian reinforcement learning (RL) setting, a prior distribution over the unknown problem parameters (the rewards and transitions) is assumed, and a policy that optimizes the (posterior) expected return is sought. A common approximation, which has recently been popularized as meta-RL, is to train the agent on a sample of N problem instances from the prior, with the hope that for large enough N, good generalization behavior to an unseen test instance will be obtained. In this work, we study generalization in Bayesian RL under the probably approximately correct (PAC) framework, using the method of algorithmic stability. Our main contribution is showing that by adding regularization, the optimal policy becomes stable in an appropriate sense. Most stability results in the literature build on strong convexity of the regularized loss, an approach that is not suitable for RL because the objective in Markov decision processes (MDPs) is not convex in the policy. Instead, building on recent results of fast convergence rates for mirror descent in regularized MDPs, we show that regularized MDPs satisfy a certain quadratic growth criterion, which is sufficient to establish stability. This result, which may be of independent interest, allows us to study the effect of regularization on generalization in the Bayesian RL setting.
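
The link between regularization and stability can be illustrated in the simplest possible setting. The sketch below is not the paper's construction. It assumes a one-step problem (a bandit with A actions and no transitions) in which each of the N sampled instances is just a reward vector in [0, 1]^A, and it assumes entropy regularization with weight `lam` as the regularizer. The helper names (`regularized_policy`, `greedy_policy`, `replace_one_sensitivity`) are hypothetical. The sketch measures how far the learned policy moves when one training instance is swapped out, which is the kind of perturbation that algorithmic stability controls.

```python
import numpy as np
from functools import partial


def regularized_policy(rewards, lam):
    """Maximize mean_i <pi, r_i> + lam * H(pi) over the probability simplex.

    rewards: array of shape (N, A), one reward vector per sampled instance.
    For entropy regularization (an assumed choice), the maximizer has the
    closed form softmax(mean_reward / lam).
    """
    z = rewards.mean(axis=0) / lam
    z = z - z.max()  # shift by the max for numerical stability; softmax is unchanged
    p = np.exp(z)
    return p / p.sum()


def greedy_policy(rewards):
    """Unregularized maximizer (the lam -> 0 limit): argmax of the mean reward."""
    p = np.zeros(rewards.shape[1])
    p[np.argmax(rewards.mean(axis=0))] = 1.0
    return p


def replace_one_sensitivity(policy_fn, rewards, new_instance, index=0):
    """L1 distance between the policy trained on S and on S with one instance swapped."""
    perturbed = rewards.copy()
    perturbed[index] = new_instance
    return np.abs(policy_fn(rewards) - policy_fn(perturbed)).sum()


# Hypothetical training set: N reward vectors in [0, 1]^A, standing in for samples from the prior.
N, A = 10, 2
rewards = np.full((N, A), 0.5)       # nine neutral instances ...
rewards[0] = [1.0, 0.0]              # ... plus one that favors action 0
new_instance = np.array([0.0, 1.0])  # replacement that favors action 1

print("greedy:", replace_one_sensitivity(greedy_policy, rewards, new_instance))
for lam in [0.01, 0.1, 1.0]:
    sens = replace_one_sensitivity(partial(regularized_policy, lam=lam), rewards, new_instance)
    print(f"lam={lam}: sensitivity={sens:.4f}, bound 2/(lam*N)={2 / (lam * N):.4f}")
```

In this toy case, replacing one reward vector shifts each coordinate of the mean reward by at most 1/N, so the softmax policy moves by at most 2/(λN) in L1 distance. The greedy policy, by contrast, can jump from one deterministic action to the other (an L1 distance of 2) whenever two actions are nearly tied, no matter how large N is. The paper's contribution is establishing an analogous stability property for full regularized MDPs, where no closed form like this softmax is available.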
