Efficient RL with Impaired Observability: Learning to Act with Delayed and Missing State Observations

06/02/2023
∙
by   Minshuo Chen, et al.
∙
5
∙

In real-world reinforcement learning (RL) systems, various forms of impaired observability can complicate matters. These situations arise when an agent is unable to observe the most recent state of the system due to latency or lossy channels, yet the agent must still make real-time decisions. This paper introduces a theoretical investigation into efficient RL in control systems where agents must act with delayed and missing state observations. We establish near-optimal regret bounds, of the form 𝒊Ėƒ(√( poly(H) SAK)), for RL in both the delayed and missing observation settings. Despite impaired observability posing significant challenges to the policy class and planning, our results demonstrate that learning remains efficient, with the regret bound optimally depending on the state-action size of the original system. Additionally, we provide a characterization of the performance of the optimal policy under impaired observability, comparing it to the optimal value obtained with full observability.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
∙ 05/23/2022

Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation

We study human-in-the-loop reinforcement learning (RL) with trajectory p...
research
∙ 05/31/2022

Timing is Everything: Learning to Act Selectively with Costly Actions and Budgetary Constraints

Many real-world settings involve costs for performing actions; transacti...
research
∙ 05/27/2019

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies

State-of-the-art efficient model-based Reinforcement Learning (RL) algor...
research
∙ 04/02/2020

Value Driven Representation for Human-in-the-Loop Reinforcement Learning

Interactive adaptive systems powered by Reinforcement Learning (RL) have...
research
∙ 03/16/2022

Lazy-MDPs: Towards Interpretable Reinforcement Learning by Learning When to Act

Traditionally, Reinforcement Learning (RL) aims at deciding how to act o...
research
∙ 10/19/2022

On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness

Generalization in Reinforcement Learning (RL) aims to learn an agent dur...
research
∙ 01/28/2020

Real-time calibration of coherent-state receivers: learning by trial and error

The optimal discrimination of coherent states of light with current tech...

Please sign up or login with your details

Forgot password? Click here to reset