Adversarial Online Multi-Task Reinforcement Learning

01/11/2023
by Quan Nguyen, et al.

We consider the adversarial online multi-task reinforcement learning setting, where in each of K episodes the learner is given an unknown task taken from a finite set ℳ of M unknown finite-horizon MDP models. The learner's objective is to minimize its regret with respect to the optimal policy for each task. We assume the MDPs in ℳ are well-separated under a notion of λ-separability, and show that this notion generalizes many task-separability notions from previous works. We prove a minimax lower bound of Ω(K√(DSAH)) on the regret of any learning algorithm and an instance-specific lower bound of Ω(K/λ^2) on the sample complexity of a class of uniformly-good cluster-then-learn algorithms. To prove the instance-specific lower bound, we use a novel construction called the 2-JAO MDP. The lower bounds are complemented by a polynomial-time algorithm that obtains an Õ(K/λ^2) sample complexity guarantee for the clustering phase and an Õ(√(MK)) regret guarantee for the learning phase, indicating that the dependency on K and 1/λ^2 is tight.
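To make the regret objective concrete, here is a minimal Python sketch of the interaction protocol, not the paper's algorithm. It rests on several assumptions that the abstract does not state: the MDPs are tabular with random transitions and rewards in [0, 1], every episode starts in state 0, and the task sequence is drawn at random as a stand-in for an adversary. All function names (`random_mdp`, `optimal_policy`, `policy_value`, `total_regret`) are hypothetical.

```python
import random

def random_mdp(S, A, H, rng):
    """Hypothetical tabular finite-horizon MDP.
    P[h][s][a] is a distribution over next states, R[h][s][a] is a reward in [0, 1]."""
    P, R = [], []
    for _ in range(H):
        P_h, R_h = [], []
        for _ in range(S):
            rows = []
            for _ in range(A):
                w = [rng.random() for _ in range(S)]
                total = sum(w)
                rows.append([x / total for x in w])  # normalize to a distribution
            P_h.append(rows)
            R_h.append([rng.random() for _ in range(A)])
        P.append(P_h)
        R.append(R_h)
    return P, R

def optimal_policy(mdp, S, A, H):
    """Backward induction. Returns (policy[h][s], optimal value of state 0 at step 0)."""
    P, R = mdp
    V = [0.0] * S
    policy = [[0] * S for _ in range(H)]
    for h in reversed(range(H)):
        new_V = [0.0] * S
        for s in range(S):
            q = [R[h][s][a] + sum(P[h][s][a][t] * V[t] for t in range(S))
                 for a in range(A)]
            best = max(range(A), key=lambda a: q[a])
            policy[h][s] = best
            new_V[s] = q[best]
        V = new_V
    return policy, V[0]

def policy_value(mdp, policy, S, H):
    """Expected return of a deterministic policy[h][s], starting from state 0."""
    P, R = mdp
    V = [0.0] * S
    for h in reversed(range(H)):
        V = [R[h][s][policy[h][s]]
             + sum(P[h][s][policy[h][s]][t] * V[t] for t in range(S))
             for s in range(S)]
    return V[0]

def total_regret(models, task_sequence, learner_policies, S, A, H):
    """Sum over episodes of (optimal value of that episode's task) minus (learner's value)."""
    regret = 0.0
    for m, pi in zip(task_sequence, learner_policies):
        _, v_star = optimal_policy(models[m], S, A, H)
        regret += v_star - policy_value(models[m], pi, S, H)
    return regret

# Usage: M models, K episodes, tasks drawn at random (assumed stand-in for an adversary).
rng = random.Random(0)
S, A, H, M, K = 3, 2, 4, 2, 6
models = [random_mdp(S, A, H, rng) for _ in range(M)]
tasks = [rng.randrange(M) for _ in range(K)]
oracle = [optimal_policy(models[m], S, A, H)[0] for m in tasks]   # knows each task
fixed = [[[0] * S for _ in range(H)] for _ in tasks]              # always plays action 0
oracle_regret = total_regret(models, tasks, oracle, S, A, H)
fixed_regret = total_regret(models, tasks, fixed, S, A, H)
```

An oracle that knows each episode's task incurs zero regret, while a task-agnostic policy generally does not. A real learner sits between the two because it must infer the task from the data it collects.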


