Fundamental Limits of Reinforcement Learning in Environment with Endogeneous and Exogeneous Uncertainty

06/15/2021
by Rongpeng Li, et al.

Online reinforcement learning (RL) has been widely applied in information processing scenarios, which usually exhibit considerable uncertainty due to the intrinsic randomness of channels and service demands. In this paper, we consider undiscounted RL in general Markov decision processes (MDPs) with both endogenous and exogenous uncertainty, where the rewards and the state transition probabilities are unknown to the RL agent and evolve over time, provided that their respective variations do not exceed a certain dynamic budget (i.e., an upper bound). We first develop a variation-aware Bernstein-based upper confidence reinforcement learning algorithm (VB-UCRL), which we allow to restart according to a schedule that depends on the variations. We overcome the challenges posed by the exogenous uncertainty and establish a regret bound that saves at most √S or S^{1/6}T^{1/12} compared with the latest results in the literature, where S denotes the size of the MDP's state space and T denotes the iteration index of the learning steps.
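To make the two ingredients named in the abstract more concrete (a Bernstein-type confidence bound and a restart schedule), here is a minimal Python sketch. It is not the authors' VB-UCRL. The constants follow one common form of the empirical Bernstein inequality for a variable bounded in [0, 1]. The additive widening by `variation_budget` and the fixed-window schedule in `restart_times` are illustrative assumptions, and all function and parameter names are hypothetical.

```python
import math
from typing import List


def bernstein_radius(p_hat: float, n: int, delta: float) -> float:
    """Empirical-Bernstein-style confidence radius for a Bernoulli mean.

    p_hat is the empirical estimate from n samples and delta is the
    failure probability. The constants (2 and 3, with log(3/delta)) are
    one common form of the bound, not taken from the paper.
    """
    n = max(n, 1)  # avoid division by zero for unvisited state-action pairs
    log_term = math.log(3.0 / delta)
    variance = p_hat * (1.0 - p_hat)  # empirical variance of a Bernoulli
    return math.sqrt(2.0 * variance * log_term / n) + 3.0 * log_term / n


def widened_radius(p_hat: float, n: int, delta: float,
                   variation_budget: float) -> float:
    """Assumption: account for drift by adding the variation budget."""
    return bernstein_radius(p_hat, n, delta) + variation_budget


def restart_times(horizon: int, window: int) -> List[int]:
    """Assumption: restart every `window` steps; in practice the window
    would be chosen as a function of the variation budget."""
    return list(range(0, horizon, window))


if __name__ == "__main__":
    print(bernstein_radius(0.5, 100, 0.05))
    print(widened_radius(0.5, 100, 0.05, variation_budget=0.1))
    print(restart_times(10, 4))
```

The variance term is what distinguishes a Bernstein-type radius from a Hoeffding-type one: when the empirical variance is small, the radius shrinks at a rate closer to 1/n than to 1/√n.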


