Learning to Control in Metric Space with Optimal Regret

05/05/2019
by Lin F. Yang, et al.

We study online reinforcement learning for finite-horizon deterministic control systems with arbitrary state and action spaces. Suppose that the transition dynamics and reward function are unknown, but the state and action spaces are endowed with a metric that characterizes the proximity between different states and actions. We provide a surprisingly simple upper-confidence reinforcement learning algorithm that uses a function approximation oracle to estimate optimistic Q functions from experience. We show that the regret of the algorithm after K episodes is O(HL(KH)^{(d-1)/d}), where L is a smoothness parameter and d is the doubling dimension of the state-action space with respect to the given metric. We also establish a near-matching regret lower bound. The proposed method can be adapted to more structured transition systems, including the finite-state case and the case where value functions are linear combinations of features, where the method also achieves the optimal regret.
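As a rough illustration of the optimism idea, the sketch below builds an upper-confidence Q estimate in a metric space by taking the tightest L-Lipschitz upper envelope of previously observed optimistic targets, then acts greedily on it. This is a minimal sketch of our own, not the paper's function approximation oracle. It assumes rewards lie in [0, 1] (so no value exceeds H), that the optimal Q function is L-Lipschitz in the given metric, and that each stored target already upper-bounds the true Q value at its point. The names `optimistic_q`, `greedy_action`, and `l1_distance` are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Pair = Tuple[float, float]  # (state, action), here 1-D for simplicity
Distance = Callable[[Pair, Pair], float]


def l1_distance(x: Pair, y: Pair) -> float:
    """Hypothetical metric on state-action pairs: |s - s'| + |a - a'|."""
    return abs(x[0] - y[0]) + abs(x[1] - y[1])


def optimistic_q(query: Pair, samples: Sequence[Tuple[Pair, float]],
                 dist: Distance, lipschitz: float, horizon: int) -> float:
    """Smallest L-Lipschitz upper envelope of the samples, capped at H.

    If every target upper-bounds an L-Lipschitz Q*, then each term
    target + L * dist(query, point) also upper-bounds Q*(query),
    so their minimum is still optimistic.
    """
    best = float(horizon)  # with per-step rewards in [0, 1], Q <= H
    for point, target in samples:
        best = min(best, target + lipschitz * dist(query, point))
    return best


def greedy_action(state: float, actions: Sequence[float],
                  samples: Sequence[Tuple[Pair, float]],
                  dist: Distance, lipschitz: float, horizon: int) -> float:
    """Pick the action with the largest optimistic Q value at this state."""
    return max(actions, key=lambda a: optimistic_q(
        (state, a), samples, dist, lipschitz, horizon))
```

For example, with samples `[((0.0, 0.0), 1.0), ((1.0, 1.0), 3.0)]`, L = 2, and H = 5, the estimate at `(0.0, 0.5)` is 2.0, while a far-away pair such as `(10.0, 10.0)` falls back to the cap of 5.0. Because poorly covered regions keep large optimistic values, the greedy rule is drawn toward them, which is the exploration mechanism an upper-confidence method relies on.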
