Provably Efficient Algorithm for Nonstationary Low-Rank MDPs

08/10/2023
by Yuan Cheng, et al.

Reinforcement learning (RL) in a changing environment models many real-world applications via nonstationary Markov Decision Processes (MDPs), and has therefore gained considerable interest. However, theoretical studies of nonstationary MDPs in the literature have mainly focused on tabular and linear (mixture) MDPs, which do not capture the unknown representations at the heart of deep RL. In this paper, we make the first effort to investigate nonstationary RL under episodic low-rank MDPs, where both transition kernels and rewards may vary over time, and where the low-rank model contains an unknown representation in addition to the linear state-embedding function. We first propose a parameter-dependent policy optimization algorithm called PORTAL, and then improve PORTAL to a parameter-free version, Ada-PORTAL, which tunes its hyperparameters adaptively without any prior knowledge of the nonstationarity. For both algorithms, we provide upper bounds on the average dynamic suboptimality gap, which show that as long as the nonstationarity is not significantly large, PORTAL and Ada-PORTAL are sample-efficient and can achieve an arbitrarily small average dynamic suboptimality gap with polynomial sample complexity.
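To make the low-rank MDP model concrete: the transition kernel factors as P(s' | s, a) = <phi(s, a), mu(s')>, where the representation phi and the state embedding mu are both unknown to the learner. The following is a minimal numpy sketch of such a factored kernel; all names, sizes, and the Dirichlet-based construction are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, d = 6, 3, 2  # number of states, actions, and feature dimension (illustrative)

# Unknown representation phi(s, a): here each row is a distribution over d latent factors
phi = rng.dirichlet(np.ones(d), size=S * A)   # shape (S*A, d)

# Unknown state embedding mu: each latent factor induces a distribution over next states
mu = rng.dirichlet(np.ones(S), size=d)        # shape (d, S)

# Low-rank transition kernel: P(s' | s, a) = <phi(s, a), mu(., s')>
P = phi @ mu                                  # shape (S*A, S)

# Sanity checks: rows are valid probability distributions, and rank is at most d
assert np.allclose(P.sum(axis=1), 1.0)
assert np.linalg.matrix_rank(P) <= d
```

In the nonstationary setting studied here, phi and mu (and the rewards) may differ from episode to episode, and the guarantees are stated in terms of how much this variation accumulates over time.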


Related research

- 10/09/2021: Representation Learning for Online and Offline RL in Low-rank MDPs
- 03/20/2023: Improved Sample Complexity for Reward-free Reinforcement Learning under Low-rank MDPs
- 06/18/2020: FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs
- 07/08/2023: Efficient Model-Free Exploration in Low-Rank MDPs
- 07/14/2022: Making Linear MDPs Practical via Contrastive Representation Learning
- 02/04/2023: Reinforcement Learning in Low-Rank MDPs with Density Features
- 06/06/2021: Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning
