Safe Reinforcement Learning as Wasserstein Variational Inference: Formal Methods for Interpretability

07/13/2023
by Yanran Wang, et al.

Reinforcement Learning or optimal control can provide effective reasoning for sequential decision-making problems with variable dynamics. In practical implementations, however, such reasoning poses a persistent challenge: interpreting the reward function and the corresponding optimal policy. Formalizing sequential decision-making problems as inference therefore has considerable value. In principle, probabilistic inference offers diverse and powerful mathematical tools for inferring the stochastic dynamics, and it also suggests a probabilistic interpretation of reward design and policy convergence. In this study, we propose a novel Adaptive Wasserstein Variational Optimization (AWaVO) to tackle these challenges in sequential decision-making. Our approach uses formal methods to provide interpretations of reward design, transparency of training convergence, and a probabilistic interpretation of sequential decisions. To demonstrate practicality, we show convergent training with guaranteed global convergence rates not only in simulation but also in real robot tasks, and we empirically verify a reasonable tradeoff between high performance and conservative interpretability.
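The abstract does not spell out how AWaVO works. The sketch below only illustrates two generic ingredients it refers to, and it is not the authors' method. The first is the standard "control as inference" reading of a reward: the likelihood of an optimality event is taken to be proportional to exp(r / temperature), so with a uniform action prior the single-step posterior over actions is a softmax of the rewards. The second is the Wasserstein-1 distance between two one-dimensional empirical distributions. The function names `boltzmann_policy` and `wasserstein_1d` and the `temperature` parameter are hypothetical and chosen for illustration.

```python
import math


def boltzmann_policy(rewards, temperature=1.0):
    """Single-step control-as-inference posterior over discrete actions.

    Illustrative assumption (not AWaVO): p(O=1 | s, a) is proportional to
    exp(r(s, a) / temperature) and the action prior is uniform, so the
    posterior over actions is a softmax of the scaled rewards.
    """
    scaled = [r / temperature for r in rewards]
    m = max(scaled)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    With uniform weights and equal sample counts, the optimal transport plan
    matches sorted samples, so W1 is the mean absolute difference of the
    sorted values.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


if __name__ == "__main__":
    # Higher reward -> higher posterior probability of choosing that action.
    print(boltzmann_policy([1.0, 2.0, 3.0]))
    # Shifting every sample by 1.0 gives a W1 distance of 1.0.
    print(wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))
```

Lowering `temperature` makes the posterior concentrate on the highest-reward action. Raising it spreads probability more evenly, which is one way the inference view gives the reward scale a probabilistic interpretation.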
