Understanding the Effect of Stochasticity in Policy Optimization

10/29/2021
by Jincheng Mei, et al.

We study the effect of stochasticity in on-policy policy optimization, and make the following four contributions. First, we show that the preferability of optimization methods depends critically on whether stochastic or exact gradients are used. In particular, unlike the true gradient setting, geometric information cannot be easily exploited in the stochastic case to accelerate policy optimization without detrimental consequences or impractical assumptions. Second, to explain these findings we introduce the concept of committal rate for stochastic policy optimization, and show that it can serve as a criterion for determining almost sure convergence to global optimality. Third, we show that in the absence of external oracle information that would allow an algorithm to distinguish optimal from sub-optimal actions given only on-policy samples, there is an inherent trade-off between exploiting geometry to accelerate convergence and achieving optimality almost surely. That is, an uninformed algorithm either converges to a globally optimal policy with probability 1 but at a rate no better than O(1/t), or it achieves a faster-than-O(1/t) convergence rate but then must fail to converge to the globally optimal policy with some positive probability. Finally, we use the committal rate theory to explain why practical policy optimization methods are sensitive to random initialization, then develop an ensemble method that can be guaranteed to achieve a near-optimal solution with high probability.
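To make the committal behavior and the geometry-versus-optimality trade-off concrete, the following minimal sketch (not taken from the paper) compares vanilla stochastic softmax policy gradient against a more aggressive, natural-gradient-style update on a small bandit. The reward values, step sizes, and the exact form of npg_like_update are illustrative assumptions rather than the authors' algorithms; the point is only that the aggressive update tends to commit to whichever arm it samples early, and so fails with some positive probability, while the conservative update moves toward the best arm reliably but slowly.

import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def run(update, theta0, r, eta, steps, rng):
    # On-policy stochastic optimization of a multi-armed bandit:
    # sample an action from the current policy, observe its reward, update.
    theta = theta0.copy()
    for _ in range(steps):
        pi = softmax(theta)
        a = rng.choice(len(r), p=pi)
        theta = update(theta, pi, a, r[a], eta)
    return softmax(theta)

def pg_update(theta, pi, a, reward, eta):
    # Vanilla softmax policy gradient (REINFORCE estimator):
    # conservative steps that keep sampling all arms for a long time.
    onehot = np.zeros_like(theta)
    onehot[a] = 1.0
    return theta + eta * reward * (onehot - pi)

def npg_like_update(theta, pi, a, reward, eta):
    # Geometry-exploiting, natural-gradient-style step with an
    # importance-weighted reward estimate (illustrative assumption).
    # Fast, but "committal": early samples can lock in a sub-optimal arm.
    theta = theta.copy()
    theta[a] += eta * reward / pi[a]
    return theta

if __name__ == "__main__":
    r = np.array([1.0, 0.8, 0.6])      # hypothetical rewards; arm 0 is optimal
    theta0 = np.zeros(3)
    rng = np.random.default_rng(0)
    trials, steps = 200, 5000

    pg_ok = sum(run(pg_update, theta0, r, 0.05, steps, rng).argmax() == 0
                for _ in range(trials))
    npg_ok = sum(run(npg_like_update, theta0, r, 0.5, steps, rng).argmax() == 0
                 for _ in range(trials))
    print(f"vanilla PG found the optimal arm in {pg_ok}/{trials} runs")
    print(f"NPG-like update found it in {npg_ok}/{trials} runs")

    # Ensemble idea (sketch): run several independent aggressive learners and
    # keep the one whose learned policy looks best under a value estimate
    # (here the true expected reward stands in for an on-policy estimate).
    candidates = [run(npg_like_update, theta0, r, 0.5, steps, rng) for _ in range(10)]
    best = max(candidates, key=lambda pi: float(r @ pi))
    print(f"ensemble of 10 NPG-like runs selects arm {best.argmax()}")

The last block sketches the ensemble idea in this hypothetical setting: several independent aggressive learners are trained and the one whose policy scores best under a value estimate is kept, which yields a near-optimal policy with high probability even though each individual run may commit to a sub-optimal arm.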


Related research

03/17/2021 · Near Optimal Policy Optimization via REPS
Since its introduction a decade ago, relative entropy policy search (REP...

01/16/2023 · The Role of Baselines in Policy Gradient Optimization
We study the effect of baselines in on-policy stochastic policy gradient...

05/13/2020 · On the Global Convergence Rates of Softmax Policy Gradient Methods
We make three contributions toward better understanding policy gradient...

06/25/2019 · Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy
Proximal policy optimization and trust region policy optimization (PPO a...

03/15/2023 · Policy Gradient Converges to the Globally Optimal Policy for Nearly Linear-Quadratic Regulators
Nonlinear control systems with partial information to the decision maker...

08/17/2023 · Controlling Federated Learning for Covertness
A learner aims to minimize a function f by repeatedly querying a distrib...

06/02/2021 · On the Convergence Rate of Off-Policy Policy Optimization Methods with Density-Ratio Correction
In this paper, we study the convergence properties of off-policy policy...
