TOPS: Transition-based VOlatility-controlled Policy Search and its Global Convergence

01/24/2022
by   Liangliang Xu, et al.
0

Risk-averse problems receive far less attention than risk-neutral control problems in reinforcement learning, and existing risk-averse approaches are challenging to deploy to real-world applications. One primary reason is that such risk-averse algorithms often learn from consecutive trajectories with a certain length, which significantly increases the potential danger of causing dangerous failures in practice. This paper proposes Transition-based VOlatility-controlled Policy Search (TOPS), a novel algorithm that solves risk-averse problems by learning from (possibly non-consecutive) transitions instead of only consecutive trajectories. By using an actor-critic scheme with an overparameterized two-layer neural network, our algorithm finds a globally optimal policy at a sublinear rate with proximal policy optimization and natural policy gradient, with effectiveness comparable to the state-of-the-art convergence rate of risk-neutral policy-search methods. The algorithm is evaluated on challenging Mujoco robot simulation tasks under the mean-variance evaluation metric. Both theoretical analysis and experimental results demonstrate a state-of-the-art level of risk-averse policy search methods.

READ FULL TEXT
research
06/25/2019

Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy

Proximal policy optimization and trust region policy optimization (PPO a...
research
12/28/2020

Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy

While deep reinforcement learning has achieved tremendous successes in v...
research
06/20/2020

Entropic Risk Constrained Soft-Robust Policy Optimization

Having a perfect model to compute the optimal policy is often infeasible...
research
05/06/2022

Variance Reduction based Partial Trajectory Reuse to Accelerate Policy Gradient Optimization

We extend the idea underlying the success of green simulation assisted p...
research
07/15/2022

Deep Hedging: Continuous Reinforcement Learning for Hedging of General Portfolios across Multiple Risk Aversions

We present a method for finding optimal hedging policies for arbitrary i...
research
06/15/2022

Mean-Semivariance Policy Optimization via Risk-Averse Reinforcement Learning

Keeping risk under control is often more crucial than maximizing expecte...
research
06/30/2023

Risk-sensitive Actor-free Policy via Convex Optimization

Traditional reinforcement learning methods optimize agents without consi...

Please sign up or login with your details

Forgot password? Click here to reset