Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound

05/24/2019
by   Lin F. Yang, et al.

Exploration in reinforcement learning (RL) suffers from the curse of dimensionality when the state-action space is large. A common practice is to parameterize the high-dimensional value and policy functions using given features. However, existing methods either have no theoretical guarantee or suffer a regret that is exponential in the planning horizon $H$. In this paper, we propose an online RL algorithm, MatrixRL, that leverages ideas from linear bandits to learn a low-dimensional representation of the probability transition model while carefully balancing the exploitation-exploration tradeoff. We show that MatrixRL achieves a regret bound of $O(H^2 d \log T \sqrt{T})$, where $d$ is the number of features. MatrixRL has an equivalent kernelized version, which can work with an arbitrary kernel Hilbert space without using explicit features. In this case, the kernelized MatrixRL satisfies a regret bound of $O(H^2 \widetilde{d} \log T \sqrt{T})$, where $\widetilde{d}$ is the effective dimension of the kernel space. To the best of our knowledge, for RL using features or kernels, our results are the first regret bounds that are near-optimal in time $T$ and dimension $d$ (or $\widetilde{d}$) and polynomial in the planning horizon $H$.
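
As a rough illustration of what a bound of this shape implies, the sketch below evaluates H^2 · d · log T · sqrt(T) with every hidden constant set to 1. The function name `regret_bound` and the values H = 10 and d = 20 are hypothetical choices for illustration and do not come from the paper. The point is only that the bound grows sublinearly in T, so the bound divided by T shrinks as T grows.

```python
import math


def regret_bound(H, d, T):
    """Evaluate H^2 * d * log(T) * sqrt(T).

    Constants hidden by the O(.) notation are taken to be 1, which is an
    assumption made purely for illustration.
    """
    return H ** 2 * d * math.log(T) * math.sqrt(T)


if __name__ == "__main__":
    # Hypothetical horizon and feature count, chosen only for the demo.
    H, d = 10, 20
    for T in (10 ** 3, 10 ** 5, 10 ** 7):
        bound = regret_bound(H, d, T)
        # The per-step value bound / T decreases as T grows.
        print(f"T={T:>10d}  bound={bound:.3e}  bound/T={bound / T:.3e}")
```

Doubling H multiplies the value by four while doubling d only doubles it, which matches the polynomial dependence on the horizon and the linear dependence on the feature dimension described in the abstract.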

research
11/16/2020

No-Regret Reinforcement Learning with Value Function Approximation: a Kernel Embedding Approach

We consider the regret minimisation problem in reinforcement learning (R...
research
04/21/2022

Provably Efficient Kernelized Q-Learning

We propose and analyze a kernelized version of Q-learning. Although a ke...
research
04/12/2020

Regret Bounds for Kernel-Based Reinforcement Learning

We consider the exploration-exploitation dilemma in finite-horizon reinf...
research
11/18/2019

Learning with Good Feature Representations in Bandits and in RL with a Generative Model

The construction in the recent paper by Du et al. [2019] implies that se...
research
05/15/2023

Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs

Recent studies have shown that episodic reinforcement learning (RL) is n...
research
09/04/2020

Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection

We consider the stochastic contextual bandit problem under the high dime...
research
12/30/2022

POMRL: No-Regret Learning-to-Plan with Increasing Horizons

We study the problem of planning under model uncertainty in an online me...
