Training Reinforcement Neurocontrollers Using the Polytope Algorithm

12/03/1998
by A. Likas, et al.

A new training algorithm is presented for delayed reinforcement learning problems that does not assume the existence of a critic model. It employs the polytope optimization algorithm to adjust the weights of the action network so that a simple, direct measure of training performance is maximized. Experimental results from applying the method to the pole balancing problem indicate improved training performance compared with critic-based and genetic reinforcement approaches.
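The idea can be illustrated with a minimal sketch: the weights of a small action network are treated as the variables of a direct-search polytope (Nelder-Mead) optimization, and the objective is a direct performance measure on the pole balancing task, here the number of time steps the pole stays within its bounds. The cart-pole constants, the one-hidden-layer bang-bang controller, and the failure bounds below are illustrative assumptions rather than the paper's exact configuration, and SciPy's Nelder-Mead routine stands in for the polytope algorithm used by the authors.

    # Sketch: tune a tiny action network with a polytope (Nelder-Mead) search
    # so that the time the pole stays balanced is maximized. All constants and
    # the network architecture are illustrative assumptions.
    import numpy as np
    from scipy.optimize import minimize

    GRAVITY, MASS_CART, MASS_POLE, POLE_HALF_LEN, DT = 9.8, 1.0, 0.1, 0.5, 0.02

    def step(state, force):
        """One Euler step of the standard cart-pole equations of motion."""
        x, x_dot, theta, theta_dot = state
        total_mass = MASS_CART + MASS_POLE
        pole_mass_len = MASS_POLE * POLE_HALF_LEN
        cos_t, sin_t = np.cos(theta), np.sin(theta)
        temp = (force + pole_mass_len * theta_dot**2 * sin_t) / total_mass
        theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
            POLE_HALF_LEN * (4.0 / 3.0 - MASS_POLE * cos_t**2 / total_mass))
        x_acc = temp - pole_mass_len * theta_acc * cos_t / total_mass
        return np.array([x + DT * x_dot, x_dot + DT * x_acc,
                         theta + DT * theta_dot, theta_dot + DT * theta_acc])

    def action(weights, state):
        """Action network: one tanh hidden layer, bang-bang output force."""
        w1 = weights[:20].reshape(4, 5)
        b1 = weights[20:25]
        w2 = weights[25:30]
        hidden = np.tanh(state @ w1 + b1)
        return 10.0 if hidden @ w2 > 0 else -10.0

    def negative_balance_time(weights, max_steps=1000):
        """Direct performance measure: steps survived, negated for minimization."""
        state = np.array([0.0, 0.0, 0.05, 0.0])             # slightly tilted pole
        for t in range(max_steps):
            state = step(state, action(weights, state))
            if abs(state[0]) > 2.4 or abs(state[2]) > 0.21:  # failure bounds
                return -float(t)
        return -float(max_steps)

    # Polytope (Nelder-Mead) search over the 30 network weights.
    rng = np.random.default_rng(0)
    result = minimize(negative_balance_time, rng.normal(scale=0.1, size=30),
                      method="Nelder-Mead",
                      options={"maxiter": 2000, "xatol": 1e-4, "fatol": 1e-4})
    print("balanced for", int(-result.fun), "steps")

Because the objective is evaluated only through complete trials, no gradient information or critic model is required; the polytope search proceeds purely by comparing the performance measure at the vertices of the simplex.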


