Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL

04/12/2023
by   Zakaria Mhammedi, et al.
1

We study the design of sample-efficient algorithms for reinforcement learning in the presence of rich, high-dimensional observations, formalized via the Block MDP problem. Existing algorithms suffer from either 1) computational intractability, 2) strong statistical assumptions that are not necessarily satisfied in practice, or 3) suboptimal sample complexity. We address these issues by providing the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level, with minimal statistical assumptions. Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics, a learning objective in which the aim is to predict the learner's own action from the current observation and observations in the (potentially distant) future. MusIK is simple and flexible, and can efficiently take advantage of general-purpose function approximation. Our analysis leverages several new techniques tailored to non-optimistic exploration algorithms, which we anticipate will find broader use.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/17/2021

Provable RL with Exogenous Distractors via Multistep Inverse Dynamics

Many real-world applications of reinforcement learning (RL) require the ...
research
07/08/2023

Efficient Model-Free Exploration in Low-Rank MDPs

A major challenge in reinforcement learning is to develop practical, sam...
research
06/21/2023

Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP

In this paper, we study representation learning in partially observable ...
research
06/22/2021

Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations

There have been many recent advances on provably efficient Reinforcement...
research
03/01/2018

On Polynomial Time PAC Reinforcement Learning with Rich Observations

We study the computational tractability of provably sample-efficient (PA...
research
03/16/2020

Discrete-valued Preference Estimation with Graph Side Information

Incorporating graph side information into recommender systems has been w...
research
06/05/2023

Improved Active Multi-Task Representation Learning via Lasso

To leverage the copious amount of data from source tasks and overcome th...

Please sign up or login with your details

Forgot password? Click here to reset