Towards Theoretical Understanding of Data-Driven Policy Refinement

05/11/2023
by Ali Baheri, et al.

This paper presents an approach for data-driven policy refinement in reinforcement learning, specifically designed for safety-critical applications. Our methodology leverages the strengths of data-driven optimization and reinforcement learning to enhance policy safety and optimality through iterative refinement. Our principal contribution lies in the mathematical formulation of this data-driven policy refinement concept. This framework systematically improves reinforcement learning policies by learning from counterexamples identified during data-driven verification. Furthermore, we present a series of theorems elucidating key theoretical properties of our approach, including convergence, robustness bounds, generalization error, and resilience to model mismatch. These results not only validate the effectiveness of our methodology but also contribute to a deeper understanding of its behavior in different environments and scenarios.

research
04/30/2023

Joint Learning of Policy with Unknown Temporal Constraints for Safe Reinforcement Learning

In many real-world applications, safety constraints for reinforcement le...
research
03/06/2020

Lane-Merging Using Policy-based Reinforcement Learning and Post-Optimization

Many current behavior generation methods struggle to handle real-world t...
research
12/27/2022

Data-driven control of COVID-19 in buildings: a reinforcement-learning approach

In addition to its public health crisis, COVID-19 pandemic has led to th...
research
02/13/2023

Online Safety Property Collection and Refinement for Safe Deep Reinforcement Learning in Mapless Navigation

Safety is essential for deploying Deep Reinforcement Learning (DRL) algo...
research
04/08/2022

Data-Driven Evaluation of Training Action Space for Reinforcement Learning

Training action space selection for reinforcement learning (RL) is confl...
research
02/23/2021

Mixed Policy Gradient

Reinforcement learning (RL) has great potential in sequential decision-m...
research
07/01/2022

Action-modulated midbrain dopamine activity arises from distributed control policies

Animal behavior is driven by multiple brain regions working in parallel ...
