Policy Supervectors: General Characterization of Agents by their Behaviour

12/02/2020
by Anssi Kanervisto, et al.

By studying the underlying policies of decision-making agents, we can learn about their shortcomings and potentially improve them. Traditionally, this has been done by examining the agent's implementation, observing its behaviour during execution, measuring its performance with a reward or fitness function, or visualizing the density of the states the agent visits. However, these methods either fail to describe the policy's behaviour in complex, high-dimensional environments or do not scale to thousands of policies, which is required when studying training algorithms. We propose policy supervectors for characterizing agents by the distribution of states they visit, adopting successful techniques from the area of speech technology. Policy supervectors can characterize policies regardless of their design philosophy (e.g. rule-based vs. neural networks) and scale to thousands of policies on a single workstation. We demonstrate the method's applicability by studying the evolution of policies during reinforcement learning, evolutionary training and imitation learning, providing insight into, for example, how the search space of evolutionary algorithms is reflected in the agents' behaviour, not just in their parameters.
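
The abstract does not spell out which speech-technology technique is adopted. A minimal sketch follows, assuming the classic speaker-verification recipe: fit a Gaussian mixture "universal background model" (UBM) on states pooled from all policies, MAP-adapt its means to each policy's visited states, and concatenate the adapted means into one vector. The function names, the diagonal covariances, the relevance factor of 16 and the synthetic 2-D "states" are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def responsibilities(X, weights, means, variances):
    """Posterior probability of each diagonal-Gaussian component for each state."""
    diff = X[:, None, :] - means[None, :, :]                      # (n, K, d)
    log_prob = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_w = np.log(weights) + log_prob                            # (n, K)
    log_w -= log_w.max(axis=1, keepdims=True)                     # numerical stability
    resp = np.exp(log_w)
    return resp / resp.sum(axis=1, keepdims=True)

def fit_ubm(X, n_components, n_iter=50, seed=0):
    """Fit a diagonal-covariance GMM with plain EM; this plays the role of the UBM."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, n_components, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + 1e-6, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        resp = responsibilities(X, weights, means, variances)     # E-step
        nk = resp.sum(axis=0) + 1e-10                             # M-step
        weights = nk / nk.sum()
        means = resp.T @ X / nk[:, None]
        variances = np.maximum(resp.T @ (X ** 2) / nk[:, None] - means ** 2, 1e-6)
    return weights, means, variances

def policy_supervector(states, ubm, relevance=16.0):
    """MAP-adapt the UBM means to one policy's states and flatten them."""
    weights, means, variances = ubm
    resp = responsibilities(states, weights, means, variances)
    nk = resp.sum(axis=0)                                         # soft counts per component
    data_means = resp.T @ states / np.maximum(nk, 1e-10)[:, None]
    alpha = (nk / (nk + relevance))[:, None]                      # trust in this policy's data
    adapted = alpha * data_means + (1.0 - alpha) * means
    return adapted.ravel()

# Hypothetical data: two "policies" visiting different regions of a 2-D state space.
rng = np.random.default_rng(42)
states_a = rng.normal(0.0, 1.0, size=(1000, 2))     # policy A, first batch of rollouts
states_a2 = rng.normal(0.0, 1.0, size=(1000, 2))    # policy A, second batch
states_b = rng.normal(2.0, 1.0, size=(1000, 2))     # policy B

ubm = fit_ubm(np.vstack([states_a, states_b]), n_components=2)
sv_a, sv_a2, sv_b = (policy_supervector(s, ubm) for s in (states_a, states_a2, states_b))

dist_same = np.linalg.norm(sv_a - sv_a2)   # same policy, different rollouts
dist_diff = np.linalg.norm(sv_a - sv_b)    # different policies
print(dist_same, dist_diff)
```

Because every policy is reduced to a fixed-length vector against the same background model, policies of any kind (rule-based or neural) can be compared with ordinary vector distances, and the per-policy cost is one pass over that policy's states.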


research
12/13/2019

Recruitment-imitation Mechanism for Evolutionary Reinforcement Learning

Reinforcement learning, evolutionary algorithms and imitation learning a...
research
03/31/2020

Mimicking Evolution with Reinforcement Learning

Evolution gave rise to human and animal intelligence here on Earth. We a...
research
01/21/2022

Reinforcement Learning Your Way: Agent Characterization through Policy Regularization

The increased complexity of state-of-the-art reinforcement learning (RL)...
research
09/02/2022

Semi-Centralised Multi-Agent Reinforcement Learning with Policy-Embedded Training

Centralised training (CT) is the basis for many popular multi-agent rein...
research
11/28/2019

Policies for constraining the behaviour of coalitions of agents in the context of algebraic information theory

This article takes an oblique sidestep from two previous papers, wherein...
research
02/02/2023

Diversity Through Exclusion (DTE): Niche Identification for Reinforcement Learning through Value-Decomposition

Many environments contain numerous available niches of variable value, e...
