POMRL: No-Regret Learning-to-Plan with Increasing Horizons

12/30/2022
by Khimya Khetarpal, et al.

We study the problem of planning under model uncertainty in an online meta-reinforcement learning (RL) setting, where an agent is presented with a sequence of related tasks and has limited interactions per task. The agent can use its experience within each task and across tasks to estimate both the transition model and the distribution over tasks. We propose an algorithm that meta-learns the underlying structure across tasks, uses it to plan in each task, and upper-bounds the regret of the planning loss. Our bound suggests that the average regret over tasks decreases as the number of tasks increases and as the tasks become more similar. In the classical single-task setting, it is known that the planning horizon should depend on the estimated model's accuracy, that is, on the number of samples within the task. We generalize this finding to meta-RL and study how the planning horizon should depend on the number of tasks. Based on our theoretical findings, we derive heuristics for selecting slowly increasing discount factors, and we validate their significance empirically.
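The loop the abstract describes (keep an across-task estimate, combine it with the few within-task samples to build a model, then plan on that model with a discount factor that grows as more tasks are seen) can be illustrated with a toy sketch. This is not the POMRL algorithm from the paper. The shrinkage estimator, the running-mean meta-update, the `prior_weight` value, and the `discount_schedule` formula are all hypothetical choices made for illustration; the paper derives its own horizon heuristics from its regret bound.

```python
import math
from typing import List

# Shapes: P[s][a][s2] is a transition probability, R[s][a] a reward.
Model = List[List[List[float]]]


def discount_schedule(num_tasks: int, gamma_max: float = 0.99) -> float:
    """Hypothetical slowly increasing discount: 0.5 at task 0, rising toward gamma_max.

    Illustrative assumption only, not the schedule derived in the paper.
    """
    return min(gamma_max, 1.0 - 1.0 / (2.0 + math.sqrt(num_tasks)))


def estimate_model(counts: List[List[List[int]]], prior: Model,
                   prior_weight: float) -> Model:
    """Shrink within-task transition counts toward an across-task prior (assumed estimator).

    P_hat(s2|s,a) = (n(s,a,s2) + w * prior(s2|s,a)) / (n(s,a) + w).
    Requires prior_weight > 0 so that unvisited (s, a) pairs fall back to the prior.
    """
    model = []
    for s, per_action in enumerate(counts):
        rows = []
        for a, row in enumerate(per_action):
            n = sum(row)
            rows.append([(c + prior_weight * p) / (n + prior_weight)
                         for c, p in zip(row, prior[s][a])])
        model.append(rows)
    return model


def update_prior(prior: Model, task_model: Model, num_tasks_seen: int) -> Model:
    """Incremental mean of per-task model estimates (assumed meta-update).

    With num_tasks_seen == 0 the initial prior is replaced by the first task's estimate.
    """
    k = num_tasks_seen + 1
    return [[[p + (q - p) / k for p, q in zip(prow, qrow)]
             for prow, qrow in zip(ps, qs)]
            for ps, qs in zip(prior, task_model)]


def value_iteration(P: Model, R: List[List[float]], gamma: float,
                    iters: int = 500) -> List[float]:
    """Standard value iteration on the estimated model."""
    num_states = len(P)
    V = [0.0] * num_states
    for _ in range(iters):
        V = [max(R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
                 for a in range(len(P[s])))
             for s in range(num_states)]
    return V


if __name__ == "__main__":
    # Two states, two actions, uniform initial prior.
    prior = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(2)]
    R = [[1.0, 0.0], [0.0, 1.0]]
    # Hypothetical per-task transition counts (only a few samples per task).
    tasks = [
        [[[3, 1], [0, 2]], [[1, 1], [0, 4]]],
        [[[4, 0], [1, 2]], [[2, 0], [1, 3]]],
    ]
    for t, counts in enumerate(tasks):
        gamma = discount_schedule(t)                 # longer horizon as tasks accumulate
        P_hat = estimate_model(counts, prior, prior_weight=2.0)
        V = value_iteration(P_hat, R, gamma)
        print(f"task {t}: gamma={gamma:.3f}, V={[round(v, 3) for v in V]}")
        prior = update_prior(prior, P_hat, t)
```

The design choice being illustrated is that the discount factor is tied to how much experience has accumulated: early on, when the model estimate rests on few samples and a weak prior, a short horizon limits how far model errors compound during planning, and the horizon is lengthened only as more tasks sharpen the across-task estimate.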

Related research
06/01/2022

Provably Efficient Lifelong Reinforcement Learning with Linear Function Approximation

We study lifelong reinforcement learning (RL) in a regret minimization s...
07/06/2021

Meta-Reinforcement Learning for Heuristic Planning

In Meta-Reinforcement Learning (meta-RL) an agent is trained on a set of...
08/18/2022

Meta-Learning Online Control for Linear Dynamical Systems

In this paper, we consider the problem of finding a meta-learning online...
05/24/2019

Reinforcement Leaning in Feature Space: Matrix Bandit, Kernels, and Regret Bound

Exploration in reinforcement learning (RL) suffers from the curse of dim...
04/03/2017

Multi-Advisor Reinforcement Learning

We consider tackling a single-agent RL problem by distributing it to n l...
06/24/2022

Joint Representation Training in Sequential Tasks with Shared Structure

Classical theory in reinforcement learning (RL) predominantly focuses on...
05/10/2023

An Option-Dependent Analysis of Regret Minimization Algorithms in Finite-Horizon Semi-Markov Decision Processes

A large variety of real-world Reinforcement Learning (RL) tasks is chara...