Bayesian Risk-Averse Q-Learning with Streaming Observations

05/18/2023
by Yuhao Wang, et al.

We consider a robust reinforcement learning problem in which an agent learns from a simulated training environment. To account for model misspecification between this training environment and the real environment, which arises from a lack of data, we adopt an infinite-horizon Bayesian risk MDP (BRMDP) formulation, which uses a Bayesian posterior to estimate the transition model and imposes a risk functional to account for model uncertainty. Observations from the real environment, which is outside the agent's control, arrive periodically and are used by the agent to update the Bayesian posterior and reduce model uncertainty. We theoretically demonstrate that BRMDP balances the trade-off between robustness and conservativeness, and we further develop a multi-stage Bayesian risk-averse Q-learning algorithm to solve BRMDP with streaming observations from the real environment. The proposed algorithm learns a risk-averse yet optimal policy that depends on the availability of real-world observations. We provide a theoretical guarantee of strong convergence for the proposed algorithm.
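
To make the idea of a posterior-based risk functional concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a Dirichlet posterior over next-state probabilities for a single state-action pair and uses a lower-tail CVaR as the risk functional; the abstract specifies neither choice, and all function names and parameters here are hypothetical.

```python
import numpy as np


def update_dirichlet(alpha, next_states):
    """Add observed next-state counts to the Dirichlet parameters (assumed prior)."""
    counts = np.bincount(next_states, minlength=alpha.size)
    return alpha + counts


def cvar_lower_tail(values, level):
    """Mean of the smallest `level` fraction of values (at least one value)."""
    sorted_vals = np.sort(values)
    k = max(1, int(np.ceil(level * sorted_vals.size)))
    return sorted_vals[:k].mean()


def risk_averse_backup(alpha, reward, next_value, gamma, level, n_samples, rng):
    """One risk-averse Bellman backup for a single state-action pair.

    Samples transition vectors from the posterior, computes the expected
    next-state value under each sample, and applies CVaR across samples.
    """
    models = rng.dirichlet(alpha, size=n_samples)  # shape (n_samples, n_states)
    expected = models @ next_value                 # shape (n_samples,)
    return reward + gamma * cvar_lower_tail(expected, level)


# Illustrative usage with made-up numbers.
rng = np.random.default_rng(0)
alpha = update_dirichlet(np.ones(3), np.array([0, 0, 2]))  # streaming observations
q = risk_averse_backup(alpha, reward=1.0, next_value=np.array([0.0, 5.0, 10.0]),
                       gamma=0.9, level=0.2, n_samples=1000, rng=rng)
```

In this sketch, setting `level=1.0` reduces the backup to the posterior-mean (risk-neutral) value, while smaller levels weight the least favorable sampled models more heavily. As more real-environment observations are added to `alpha`, the sampled models concentrate, which is one way to picture the robustness versus conservativeness trade-off the abstract mentions.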

research
01/30/2023

Risk-Averse Model Uncertainty for Distributionally Robust Safe Reinforcement Learning

Many real-world domains require safe decision making in the presence of ...
research
08/16/2023

Eliciting Risk Aversion with Inverse Reinforcement Learning via Interactive Questioning

This paper proposes a novel framework for identifying an agent's risk av...
research
04/22/2023

Reinforcement Learning with an Abrupt Model Change

The problem of reinforcement learning is considered where the environmen...
research
09/09/2022

RASR: Risk-Averse Soft-Robust MDPs with EVaR and Entropic Risk

Prior work on safe Reinforcement Learning (RL) has studied risk-aversion...
research
06/01/2020

Robust Reinforcement Learning with Wasserstein Constraint

Robust Reinforcement Learning aims to find the optimal policy with some ...
research
10/27/2022

Provable Sim-to-real Transfer in Continuous Domain with Partial Observations

Sim-to-real transfer trains RL agents in the simulated environments and ...
research
01/11/2021

Reinforcement Learning under Model Risk for Biomanufacturing Fermentation Control

In the biopharmaceutical manufacturing, fermentation process plays a cri...
