Adaptive Multi-Goal Exploration

11/23/2021
by Jean Tarbouriech, et al.

We introduce a generic strategy for provably efficient multi-goal exploration. It relies on AdaGoal, a novel goal selection scheme based on a simple constrained optimization problem: AdaGoal adaptively targets goal states that are neither too difficult nor too easy to reach according to the agent's current knowledge. We show how AdaGoal can be used to learn an $\epsilon$-optimal goal-conditioned policy for all goal states that are reachable within $L$ steps in expectation from a reference state $s_0$ in a reward-free Markov decision process. In the tabular case with $S$ states and $A$ actions, our algorithm requires $\tilde{O}(L^3 S A \epsilon^{-2})$ exploration steps, which is nearly minimax optimal. We also readily instantiate AdaGoal in linear mixture Markov decision processes, which yields the first goal-oriented PAC guarantee with linear function approximation. Beyond its strong theoretical guarantees, AdaGoal is anchored in the high-level algorithmic structure of existing methods for goal-conditioned deep reinforcement learning.
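The abstract describes AdaGoal's goal selection only at a high level. The Python sketch below illustrates one way such a constrained selection could look. It assumes that the constraint is an estimated expected distance of at most L from s_0 (goals beyond it count as "too difficult") and that the objective is a per-goal uncertainty score (low-uncertainty goals count as "too easy"). The inputs `uncertainty` and `est_distance`, the function `select_goal`, and the `epsilon` stopping rule are all hypothetical stand-ins, not the paper's actual estimators or algorithm.

```python
from typing import Dict, Hashable, Optional


def select_goal(
    uncertainty: Dict[Hashable, float],
    est_distance: Dict[Hashable, float],
    horizon_L: float,
    epsilon: float = 0.0,
) -> Optional[Hashable]:
    """Hypothetical AdaGoal-style selection: argmax uncertainty s.t. distance <= L."""
    # Constraint (assumed form): keep only candidate goals whose estimated
    # expected distance from s_0 is at most L ("not too difficult").
    feasible = [
        g for g in uncertainty
        if est_distance.get(g, float("inf")) <= horizon_L
    ]
    if not feasible:
        return None
    # Objective (assumed form): among feasible goals, target the one the agent
    # is least certain about; well-understood goals are "too easy" to target.
    best = max(feasible, key=lambda g: uncertainty[g])
    # Hypothetical stopping rule: if even the most uncertain feasible goal is
    # within epsilon, signal that no further goal needs to be targeted.
    if uncertainty[best] <= epsilon:
        return None
    return best


if __name__ == "__main__":
    uncertainty = {"a": 0.1, "b": 0.9, "c": 0.5}
    est_distance = {"a": 2.0, "b": 15.0, "c": 4.0}
    print(select_goal(uncertainty, est_distance, horizon_L=5.0))  # c
```

In the usage example, goal "b" is the most uncertain but is excluded because its estimated distance exceeds L, and goal "a" is feasible but already well understood, so the sketch targets "c".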

Related research

Markov Abstractions for PAC Reinforcement Learning in Non-Markov Decision Processes (04/29/2022)
Our work aims at developing reinforcement learning algorithms that do no...

On Bellman's principle of optimality and Reinforcement learning for safety-constrained Markov decision process (02/25/2023)
We study optimality for the safety-constrained Markov decision process w...

Reward is not Necessary: How to Create a Compositional Self-Preserving Agent for Life-Long Learning (11/20/2022)
We introduce a physiological model-based agent as proof-of-principle tha...

Improved Sample Complexity for Incremental Autonomous Exploration in MDPs (12/29/2020)
We investigate the exploration of an unknown environment when no reward ...

Goal-oriented inference of environment from redundant observations (05/08/2023)
The agent learns to organize decision behavior to achieve a behavioral g...

InfoBot: Transfer and Exploration via the Information Bottleneck (01/30/2019)
A central challenge in reinforcement learning is discovering effective p...

Exploration-Exploitation Trade-off in Reinforcement Learning on Online Markov Decision Processes with Global Concave Rewards (05/15/2019)
We consider an agent who is involved in a Markov decision process and re...