Continuous-Time Fitted Value Iteration for Robust Policies

10/05/2021
by Michael Lutter et al.

Solving the Hamilton-Jacobi-Bellman equation is important in many domains, including control, robotics, and economics. For continuous control in particular, solving this differential equation and its extension, the Hamilton-Jacobi-Isaacs equation, is important because it yields the optimal policy that achieves the maximum reward on a given task. In the case of the Hamilton-Jacobi-Isaacs equation, which includes an adversary that controls the environment and minimizes the reward, the obtained policy is also robust to perturbations of the dynamics. In this paper, we propose continuous fitted value iteration (cFVI) and robust fitted value iteration (rFVI). These algorithms leverage the non-linear control-affine dynamics and the separable state and action reward of many continuous control problems to derive the optimal policy and the optimal adversary in closed form. This analytic expression simplifies the differential equations and enables us to solve for the optimal value function using value iteration with continuous states and actions, as well as in the adversarial case. Notably, the resulting algorithms do not require discretization of states or actions. We apply the resulting algorithms to the Furuta pendulum and the cartpole and show that both obtain the optimal policy. The Sim2Real robustness experiments on the physical systems show that the policies successfully achieve the task in the real world. When the pendulum masses are changed, we observe that robust fitted value iteration is more robust than deep reinforcement learning algorithms and the non-robust variant of our algorithm. Videos of the experiments are shown at https://sites.google.com/view/rfvi.
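As a brief illustration of the closed-form solution the abstract refers to, consider control-affine dynamics with a reward that is separable in state and action; the symbols a, B, q_c, and g_c below are illustrative notation for this sketch rather than taken from the paper, and \rho denotes a continuous-time discount rate:

\[
\dot{x} = a(x) + B(x)\,u, \qquad r(x,u) = q_c(x) - g_c(u),
\]

with g_c strictly convex in the action u. Substituting into the continuous-time Hamilton-Jacobi-Bellman equation,

\[
\rho\, V^*(x) = \max_u \Big[ r(x,u) + \nabla_x V^*(x)^\top \big(a(x) + B(x)\,u\big) \Big],
\]

the maximization over u can be solved analytically,

\[
u^*(x) = \nabla \tilde{g}_c\big(B(x)^\top \nabla_x V^*(x)\big),
\]

where \tilde{g}_c is the convex conjugate of g_c. Because the optimal action is available in closed form, value-iteration targets can be computed without discretizing the action space; in the adversarial (Hamilton-Jacobi-Isaacs) setting, the minimizing perturbation admits an analogous closed-form expression.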

