Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal Stochastic Shortest Path

05/22/2022
by Haoyuan Cai, et al.

We revisit the incremental autonomous exploration problem proposed by Lim and Auer (2012). In this setting, the agent aims to learn a set of near-optimal goal-conditioned policies to reach the L-controllable states: states that are incrementally reachable from an initial state s_0 within L steps in expectation. We introduce a new algorithm, Value-Aware Autonomous Exploration, with stronger sample complexity bounds than existing ones. We also prove the first lower bound for the autonomous exploration problem. In particular, the lower bound implies that our algorithm is nearly minimax-optimal when the number of L-controllable states grows polynomially with respect to L. Key to our algorithm design is a connection between autonomous exploration and multi-goal stochastic shortest path, a new problem that naturally generalizes the classical stochastic shortest path problem. This new problem and its connection to autonomous exploration may be of independent interest.
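To make the phrase "reachable from s_0 within L steps in expectation" concrete, the following is a minimal sketch, not the algorithm from the paper. It assumes a small MDP whose transition model is fully known (the paper studies the unknown-model setting), and it ignores the incremental restriction on policies, so it only illustrates the expected-hitting-time criterion. The toy chain MDP and the names `build_chain_mdp`, `min_expected_hitting_time`, and `states_within_L` are hypothetical.

```python
import numpy as np


def build_chain_mdp(n_states=4, p_forward=0.5):
    """Toy MDP (an assumption for illustration): a chain 0 -> 1 -> ... -> n-1.

    Action 0 ("forward") moves to the next state with probability p_forward
    and otherwise stays put; the last state loops on itself.
    Action 1 ("reset") returns to state 0 with probability 1.
    Returns P with shape (S, A, S), where P[s, a, s2] = Pr(s2 | s, a).
    """
    P = np.zeros((n_states, 2, n_states))
    for s in range(n_states):
        if s + 1 < n_states:
            P[s, 0, s + 1] = p_forward
            P[s, 0, s] = 1.0 - p_forward
        else:
            P[s, 0, s] = 1.0
        P[s, 1, 0] = 1.0
    return P


def min_expected_hitting_time(P, goal, n_iters=2000):
    """Value iteration for a unit-cost shortest-path problem with one goal.

    V[s] approximates the minimum expected number of steps needed to reach
    `goal` from s; the goal itself has value 0.
    """
    n_states = P.shape[0]
    V = np.zeros(n_states)
    for _ in range(n_iters):
        Q = 1.0 + P @ V          # shape (S, A): one step plus expected cost-to-go
        V = Q.min(axis=1)        # act greedily with respect to the current estimate
        V[goal] = 0.0            # reaching the goal ends the episode
    return V


def states_within_L(P, s0, L):
    """States whose minimum expected hitting time from s0 is at most L.

    Simplification: policies are unrestricted here, unlike the incremental
    definition used in the autonomous exploration setting.
    """
    n_states = P.shape[0]
    return [g for g in range(n_states)
            if min_expected_hitting_time(P, g)[s0] <= L]
```

In this toy chain each forward move takes two steps in expectation, so a state k steps down the chain has expected hitting time 2k from state 0; a budget L therefore separates nearby states from distant ones.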


research
12/29/2020

Improved Sample Complexity for Incremental Autonomous Exploration in MDPs

We investigate the exploration of an unknown environment when no reward ...
research
02/07/2023

Layered State Discovery for Incremental Autonomous Exploration

We study the autonomous exploration (AX) problem proposed by Lim Aue...
research
04/22/2021

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

We study the problem of learning in the stochastic shortest path (SSP) s...
research
02/23/2020

Near-optimal Regret Bounds for Stochastic Shortest Path

Stochastic shortest path (SSP) is a well-known problem in planning and c...
research
10/22/2018

A minimax near-optimal algorithm for adaptive rejection sampling

Rejection Sampling is a fundamental Monte-Carlo method. It is used to sa...
research
02/01/2023

Agility and Target Distribution in the Dynamic Stochastic Traveling Salesman Problem

An important variant of the classic Traveling Salesman Problem (TSP) is ...
research
10/24/2018

Learning to Route Efficiently with End-to-End Feedback: The Value of Networked Structure

We introduce efficient algorithms which achieve nearly optimal regrets f...
