Controlling an Inverted Pendulum with Policy Gradient Methods: A Tutorial

05/17/2021
by Swagat Kumar, et al.

This paper details the implementation of two important policy gradient methods, Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO), for solving the inverted pendulum problem. The problem is solved using an actor-critic model, in which an actor network learns the policy function and a critic network evaluates the actor network by learning to estimate the Q function. Apart from briefly explaining the mathematics behind the two algorithms, the paper provides the details of a Python implementation, which helps demystify the underlying complexity of the algorithms. In the process, readers are introduced to the OpenAI Gym, TensorFlow 2.x, and Keras utilities used to implement these concepts.
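The division of labour between actor and critic described above can be illustrated with a minimal, dependency-free sketch. It is not the paper's implementation, which uses TensorFlow 2.x and Keras neural networks; here both functions are simple linear stand-ins. The state dimension, the torque bound, and all names (`actor`, `critic`, `init_weights`) are assumptions chosen for illustration, loosely modelled on the Gym pendulum environment.

```python
import math
import random

# Assumed for illustration: a 3-dimensional pendulum observation
# (cos(theta), sin(theta), angular velocity) and a torque limit of 2.0.
STATE_DIM = 3
ACTION_BOUND = 2.0


def init_weights(n, rng):
    """Return n small random weights (hypothetical helper)."""
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]


def actor(state, weights):
    """Deterministic policy: map a state to a bounded action.

    tanh squashes the linear output into [-1, 1], which is then
    scaled to the assumed torque range.
    """
    z = sum(w * s for w, s in zip(weights, state))
    return ACTION_BOUND * math.tanh(z)


def critic(state, action, weights):
    """Q-value estimate: score a (state, action) pair with a scalar.

    A linear function of the state, the action, and a bias term
    stands in for the critic network.
    """
    features = list(state) + [action, 1.0]
    return sum(w * f for w, f in zip(weights, features))


rng = random.Random(0)
actor_w = init_weights(STATE_DIM, rng)
critic_w = init_weights(STATE_DIM + 2, rng)  # state + action + bias

state = [1.0, 0.0, 0.5]                    # pendulum upright, slowly rotating
action = actor(state, actor_w)             # the actor proposes a torque
q_value = critic(state, action, critic_w)  # the critic evaluates that choice
print("action:", action, "Q estimate:", q_value)
```

In a full actor-critic method, the critic's estimate would be trained against observed rewards, and the actor's weights would be adjusted in the direction that increases the critic's score; this sketch shows only the forward pass of each component.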


