Improved Sample Complexity for Incremental Autonomous Exploration in MDPs

12/29/2020
by Jean Tarbouriech, et al.

We investigate the exploration of an unknown environment when no reward function is provided. Building on the incremental exploration setting introduced by Lim and Auer [1], we define the objective of learning the set of ϵ-optimal goal-conditioned policies attaining all states that are incrementally reachable within L steps (in expectation) from a reference state s_0. In this paper, we introduce a novel model-based approach that interleaves discovering new states from s_0 and improving the accuracy of a model estimate that is used to compute goal-conditioned policies to reach newly discovered states. The resulting algorithm, DisCo, achieves a sample complexity scaling as Õ(L^5 S_{L+ϵ} Γ_{L+ϵ} A ϵ^{-2}), where A is the number of actions, S_{L+ϵ} is the number of states that are incrementally reachable from s_0 in L+ϵ steps, and Γ_{L+ϵ} is the branching factor of the dynamics over such states. This improves over the algorithm proposed in [1] in both ϵ and L at the cost of an extra Γ_{L+ϵ} factor, which is small in most environments of interest. Furthermore, DisCo is the first algorithm that can return an ϵ/c_min-optimal policy for any cost-sensitive shortest-path problem defined on the L-reachable states with minimum cost c_min. Finally, we report preliminary empirical results confirming our theoretical findings.
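
To get a feel for how the stated bound behaves, the following minimal sketch evaluates only its leading term, L^5 · S · Γ · A / ϵ^2, where S and Γ stand for the two quantities indexed by L+ϵ in the abstract. It ignores the constants and logarithmic factors hidden by the Õ notation, so the absolute numbers are not sample counts; only the ratios are meaningful. The function name `disco_bound_scaling` and all numeric values are hypothetical and chosen purely for illustration.

```python
def disco_bound_scaling(L, S, Gamma, A, eps):
    """Leading term of the bound quoted in the abstract: L^5 * S * Gamma * A / eps^2.

    Assumption: constants and log factors hidden by the O-tilde are dropped,
    so the returned value is only useful for comparing scalings.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    return (L ** 5) * S * Gamma * A / (eps ** 2)


# Hypothetical problem sizes, for illustration only.
base = disco_bound_scaling(L=10, S=50, Gamma=4, A=5, eps=0.5)
half_eps = disco_bound_scaling(L=10, S=50, Gamma=4, A=5, eps=0.25)
double_L = disco_bound_scaling(L=20, S=50, Gamma=4, A=5, eps=0.5)

print(half_eps / base)  # halving eps multiplies the leading term by 4
print(double_L / base)  # doubling L multiplies the leading term by 32
```

Under these assumptions, the accuracy ϵ enters quadratically and the horizon L enters to the fifth power, which is the dependence the abstract reports.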

05/22/2022

Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal Stochastic Shortest Path

We revisit the incremental autonomous exploration problem proposed by Li...
10/10/2022

Reaching Goals is Hard: Settling the Sample Complexity of the Stochastic Shortest Path

We study the sample complexity of learning an ϵ-optimal policy in the St...
11/23/2021

Adaptive Multi-Goal Exploration

We introduce a generic strategy for provably efficient multi-goal explor...
08/17/2022

Nearly Optimal Latent State Decoding in Block MDPs

We investigate the problems of model estimation and reward-free learning...
07/10/2019

An Intrinsically-Motivated Approach for Learning Highly Exploring and Fast Mixing Policies

What is a good exploration strategy for an agent that interacts with an ...
01/25/2019

Provably efficient RL with Rich Observations via Latent State Decoding

We study the exploration problem in episodic MDPs with rich observations...
06/27/2019

ExTra: Transfer-guided Exploration

In this work we present a novel approach for transfer-guided exploration...