A2C is a special case of PPO

05/18/2022
by   Shengyi Huang, et al.

Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO) are popular deep reinforcement learning algorithms that have been used for game AI in recent years. A common understanding is that A2C and PPO are separate algorithms because PPO's clipped objective appears significantly different from A2C's objective. In this paper, however, we show that A2C is a special case of PPO. We present theoretical justifications and pseudocode analysis to demonstrate why. To validate our claim, we conduct an empirical experiment using , showing that A2C and PPO produce the exact same models when other settings are controlled.
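One way to build intuition for the claim is to compare the two policy gradients numerically. The sketch below is not the paper's experiment. It assumes a hypothetical state-independent softmax policy, made-up actions and advantages, and a single gradient step per batch, so the probability ratio in PPO's clipped objective equals 1 at the point where the gradient is taken. Under those assumptions the clipping is inactive and the PPO policy gradient should match the A2C policy gradient. All function and variable names are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def a2c_loss(logits, actions, advantages):
    """A2C policy loss: negative mean of log pi(a) * advantage."""
    probs = softmax(logits)
    terms = [math.log(probs[a]) * adv for a, adv in zip(actions, advantages)]
    return -sum(terms) / len(terms)


def ppo_clip_loss(logits, old_logits, actions, advantages, clip_eps=0.2):
    """PPO clipped surrogate loss with ratio pi_new(a) / pi_old(a)."""
    probs = softmax(logits)
    old_probs = softmax(old_logits)
    total = 0.0
    for a, adv in zip(actions, advantages):
        ratio = probs[a] / old_probs[a]
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return -total / len(actions)


def numerical_grad(loss_fn, logits, h=1e-5):
    """Central finite-difference gradient of loss_fn at logits."""
    grad = []
    for i in range(len(logits)):
        up = list(logits)
        down = list(logits)
        up[i] += h
        down[i] -= h
        grad.append((loss_fn(up) - loss_fn(down)) / (2.0 * h))
    return grad


# Hypothetical toy data (assumed for illustration, not from the paper).
old_logits = [0.3, -1.2, 0.8]
actions = [0, 2, 1, 2, 0]
advantages = [1.5, -0.7, 0.2, 2.0, -1.1]

# Both gradients are evaluated at the old policy, i.e. the first
# (and, by assumption, only) update on this batch.
grad_a2c = numerical_grad(
    lambda th: a2c_loss(th, actions, advantages), old_logits
)
grad_ppo = numerical_grad(
    lambda th: ppo_clip_loss(th, old_logits, actions, advantages), old_logits
)

max_gap = max(abs(g1 - g2) for g1, g2 in zip(grad_a2c, grad_ppo))
print("A2C gradient:", grad_a2c)
print("PPO gradient:", grad_ppo)
print("largest difference:", max_gap)
```

The two gradients agree only because the ratio starts at 1. Once PPO takes further steps on the same batch, the ratio drifts away from 1, clipping can become active, and the gradients can differ.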

