MDPs with low-rank transitions – that is, the transition matrix can be f...
We study reward-free reinforcement learning (RL) under general non-linea...
We consider a challenging theoretical problem in offline reinforcement l...
Deployment efficiency is an important criterion for many real-world appl...
The low-rank MDP has emerged as an important model for studying represen...
This paper studies regret minimization with randomized value functions i...
We consider reinforcement learning (RL) in episodic Markov decision proc...
Langevin diffusion is a powerful method for nonconvex optimization, whic...
Value-function approximation methods that operate in batch mode have fou...
We propose a new localized inference algorithm for answering marginaliza...