A Free Lunch from the Noise: Provable and Practical Exploration for Representation Learning

11/22/2021
by   Tongzheng Ren, et al.

Representation learning lies at the heart of the empirical success of deep learning in dealing with the curse of dimensionality. However, the power of representation learning has not yet been fully exploited in reinforcement learning (RL), due to (i) the trade-off between expressiveness and tractability, and (ii) the coupling between exploration and representation learning. In this paper, we first show that, under a certain noise assumption in the stochastic control model, the linear spectral feature of the corresponding Markov transition operator can be obtained in closed form for free. Based on this observation, we propose Spectral Dynamics Embedding (SPEDE), which breaks the trade-off and completes optimistic exploration for representation learning by exploiting the structure of the noise. We provide a rigorous theoretical analysis of SPEDE and demonstrate its superior practical performance over existing state-of-the-art empirical algorithms on several benchmarks.
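
To make the "closed-form linear feature" idea concrete, here is a minimal sketch under assumptions the abstract does not state. It assumes the noise is additive and Gaussian, s' = f(s, a) + eps with eps ~ N(0, sigma^2 I), so that the transition density is a Gaussian kernel between f(s, a) and s'. It then uses random Fourier features, a standard kernel approximation that may differ from what SPEDE actually does, to write that density as an inner product of two feature vectors. The names `make_rff`, `toy_dynamics`, and `linear_density` are hypothetical and chosen for illustration only.

```python
import math
import random


def make_rff(dim, num_features, sigma, seed=0):
    """Random Fourier features for the kernel exp(-||x - y||^2 / (2 sigma^2))."""
    rng = random.Random(seed)
    # Frequencies are drawn from N(0, I / sigma^2), phases from U[0, 2 pi].
    ws = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
          for _ in range(num_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def phi(x):
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(ws, bs)]

    return phi


def toy_dynamics(s, a):
    # Hypothetical mean dynamics f(s, a); any deterministic map would do here.
    return [0.9 * s[0] + 0.1 * a[0], 0.9 * s[1] - 0.1 * a[0]]


def gaussian_density(s_next, mean, sigma):
    """Exact density of s' under the assumed model s' = mean + N(0, sigma^2 I)."""
    d = len(mean)
    sq = sum((x - m) ** 2 for x, m in zip(s_next, mean))
    return math.exp(-sq / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2) ** (d / 2)


def linear_density(phi, mean, s_next, sigma):
    """Approximate the same density as <phi(f(s, a)), mu(s')>,
    where mu(s') is phi(s') scaled by the Gaussian normalizing constant."""
    d = len(mean)
    normalizer = 1.0 / (2.0 * math.pi * sigma ** 2) ** (d / 2)
    inner = sum(p * q for p, q in zip(phi(mean), phi(s_next)))
    return normalizer * inner


if __name__ == "__main__":
    sigma = 1.0
    phi = make_rff(dim=2, num_features=4000, sigma=sigma)
    mean = toy_dynamics([0.5, -0.2], [1.0])
    s_next = [0.6, -0.1]
    print("exact :", gaussian_density(s_next, mean, sigma))
    print("linear:", linear_density(phi, mean, s_next, sigma))
```

In this sketch the feature map depends only on the assumed noise model, not on collected data, which is one way to read "for free"; only the mean dynamics f would remain to be learned. The two printed values should agree up to the Monte Carlo error of the random features, which shrinks as `num_features` grows.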

research
08/19/2022

Spectral Decomposition Representation for Reinforcement Learning

Representation learning often plays a critical role in reinforcement lea...
research
06/22/2021

Provably Efficient Representation Learning in Low-rank Markov Decision Processes

The success of deep reinforcement learning (DRL) is due to the power of ...
research
02/14/2021

Model-free Representation Learning and Exploration in Low-rank MDPs

The low rank MDP has emerged as an important model for studying represen...
research
07/14/2022

Making Linear MDPs Practical via Contrastive Representation Learning

It is common to address the curse of dimensionality in Markov decision p...
research
09/28/2021

Exploratory State Representation Learning

Not having access to compact and meaningful representations is known to ...
research
03/31/2023

Accelerating exploration and representation learning with offline pre-training

Sequential decision-making agents struggle with long horizon tasks, sinc...
research
07/17/2019

Learnability for the Information Bottleneck

The Information Bottleneck (IB) method (tishby2000information) provides ...
