Reinforcement Learning Architectures: SAC, TAC, and ESAC

04/05/2020
by Ala'eddin Masadeh, et al.

The trend in machine learning is toward intelligent agents capable of analyzing available information and utilizing it efficiently. This work presents several reinforcement learning (RL) architectures, one of which is designed for intelligent agents: the selector-actor-critic (SAC), tuner-actor-critic (TAC), and estimator-selector-actor-critic (ESAC). These architectures are improved models of the well-known actor-critic (AC) architecture, in which an actor optimizes the policy in use while a critic estimates a value function and evaluates the policy optimized by the actor.

SAC equips the agent with an actor, a critic, and a selector. The selector determines the most promising action at the current state based on the critic's latest estimate. TAC consists of a tuner, a model-learner, an actor, and a critic. After receiving the approximate value of the current state-action pair from the critic and the learned model from the model-learner, the tuner uses the Bellman equation to tune that value.

ESAC implements intelligent agents based on two ideas: lookahead and intuition. Lookahead appears in estimating the values of the available actions at the next state, while intuition appears in maximizing the probability of selecting the most promising action. The newly added elements are an underlying-model learner, an estimator, and a selector. The model learner approximates the underlying model; the estimator uses the approximated value function, the learned model, and the Bellman equation to estimate the values of all actions at the next state; and the selector determines the most promising action at the next state, which the actor then uses to optimize its policy. Finally, the results show that ESAC outperforms the other architectures.
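The estimator-and-selector step described above can be sketched in tabular form. The following is a minimal illustrative sketch, not the authors' implementation: it assumes a small MDP where a learned transition model `P_hat`, a learned reward model `R_hat`, and the critic's value estimates `V` are available, and applies a one-step Bellman lookahead to score every action before the selector picks the most promising one. All names and numbers are hypothetical.

```python
import numpy as np

# Illustrative sketch of the "estimator + selector" idea (assumed, not the
# paper's code). The estimator applies the Bellman equation to the learned
# model and the critic's value estimates; the selector picks the argmax.

def estimate_action_values(P_hat, R_hat, V, state, gamma=0.99):
    """Estimator: one-step Bellman lookahead,
    Q(s, a) = R_hat(s, a) + gamma * sum_s' P_hat(s' | s, a) * V(s')."""
    n_actions = P_hat.shape[1]
    q = np.empty(n_actions)
    for a in range(n_actions):
        q[a] = R_hat[state, a] + gamma * P_hat[state, a] @ V
    return q

def select_action(q_values):
    """Selector: the most promising action under the current estimates."""
    return int(np.argmax(q_values))

# Toy 3-state, 2-action learned model (made-up values, for illustration only).
# P_hat[s, a, s'] = estimated transition probability, rows sum to 1.
P_hat = np.array([[[0.9, 0.1, 0.0], [0.1, 0.8, 0.1]],
                  [[0.0, 0.9, 0.1], [0.2, 0.2, 0.6]],
                  [[0.1, 0.0, 0.9], [0.3, 0.3, 0.4]]])
R_hat = np.array([[0.0, 1.0],   # estimated reward for each (state, action)
                  [0.5, 0.2],
                  [1.0, 0.0]])
V = np.array([0.0, 0.5, 1.0])   # critic's current state-value estimates

q = estimate_action_values(P_hat, R_hat, V, state=0)
best = select_action(q)
```

In the full architecture this selected action would then be fed back to the actor to increase the probability of choosing it, which is where the "intuition" idea enters; the sketch above covers only the lookahead half.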


