An Algorithmic Theory of Metacognition in Minds and Machines

11/05/2021
by Rylan Schaeffer, et al.

Humans sometimes choose actions that they themselves can identify as suboptimal, or wrong, even in the absence of additional information. How is this possible? We present an algorithmic theory of metacognition based on a well-understood trade-off in reinforcement learning (RL) between value-based RL and policy-based RL. For the cognitive (neuro)science community, our theory answers the outstanding question of why information can be used for error detection but not for action selection. For the machine learning community, our theory creates a novel interaction between the Actor and Critic in Actor-Critic agents and notes a novel connection between RL and Bayesian Optimization. We call our proposed agent the Metacognitive Actor Critic (MAC). We conclude by showing how to create metacognition in machines: we implement a deep MAC and show that it can detect (some of) its own suboptimal actions without external information or delay.
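The abstract does not spell out how the Actor and Critic interact, so the following is only a toy sketch of the general idea, not the paper's MAC. It assumes a single state, a softmax Actor over hand-picked preferences, and a Critic that holds one value estimate per action; the names `softmax`, `flag_suboptimal`, `actor_prefs`, `critic_q`, and `margin` are all hypothetical. The Actor picks an action, and the Critic's own estimates are then used to flag that action as suboptimal, with no new information from the environment.

```python
import math


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def flag_suboptimal(q_values, action, margin=0.0):
    """Return True if the critic rates some other action above
    `action` by more than `margin` (an assumed detection rule)."""
    return max(q_values) - q_values[action] > margin


# Hypothetical numbers for one state with three actions.
actor_prefs = [2.0, 0.5, 0.1]  # the actor favours action 0
critic_q = [0.2, 0.9, 0.4]     # the critic rates action 1 highest

policy = softmax(actor_prefs)
# Greedy choice from the actor's policy, kept deterministic for clarity.
chosen = max(range(len(policy)), key=policy.__getitem__)
flagged = flag_suboptimal(critic_q, chosen)

print("chosen action:", chosen)
print("flagged as suboptimal by the critic:", flagged)
```

In this sketch the disagreement between the two components is what makes error detection possible: the value estimates can mark the chosen action as wrong even though they were not what selected it.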

