Near-optimal Representation Learning for Linear Bandits and Linear RL

02/08/2021
by   Jiachen Hu, et al.
0

This paper studies representation learning for multi-task linear bandits and multi-task episodic RL with linear value function approximation. We first consider the setting where we play M linear bandits with dimension d concurrently, and these bandits share a common k-dimensional linear representation so that k≪ d and k ≪ M. We propose a sample-efficient algorithm, MTLR-OFUL, which leverages the shared representation to achieve Õ(M√(dkT) + d√(kMT) ) regret, with T being the number of total steps. Our regret significantly improves upon the baseline Õ(Md√(T)) achieved by solving each task independently. We further develop a lower bound that shows our regret is near-optimal when d > M. Furthermore, we extend the algorithm and analysis to multi-task episodic RL with linear value function approximation under low inherent Bellman error <cit.>. To the best of our knowledge, this is the first theoretical result that characterizes the benefits of multi-task representation learning for exploration in RL with function approximation.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/29/2022

Nearly Minimax Algorithms for Linear Bandits with Shared Representation

We give novel algorithms for multi-task and lifelong linear bandits with...
research
06/24/2022

Joint Representation Training in Sequential Tasks with Shared Structure

Classical theory in reinforcement learning (RL) predominantly focuses on...
research
02/09/2023

Multi-task Representation Learning for Pure Exploration in Linear Bandits

Despite the recent success of representation learning in sequential deci...
research
02/26/2023

No-Regret Linear Bandits beyond Realizability

We study linear bandits when the underlying reward function is not linea...
research
05/31/2022

Provable General Function Class Representation Learning in Multitask Bandits and MDPs

While multitask representation learning has become a popular approach in...
research
10/24/2022

Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees

We study the problem of representation learning in stochastic contextual...
research
11/18/2019

Learning with Good Feature Representations in Bandits and in RL with a Generative Model

The construction in the recent paper by Du et al. [2019] implies that se...

Please sign up or login with your details

Forgot password? Click here to reset