Co-Training an Observer and an Evading Target

10/20/2022
by André Brandenburger, et al.

Reinforcement learning (RL) is already widely used in fields such as robotics, but only sparsely in sensor management. In this paper, we apply the popular Proximal Policy Optimization (PPO) approach to a multi-agent UAV tracking scenario. While recorded data from real scenarios can accurately reflect the real world, the required amount of data is not always available. Simulation data, in contrast, is typically cheap to generate, but the simulated target behavior is often naive and only vaguely represents the real world. We use multi-agent RL to jointly generate protagonistic and antagonistic policies, which overcomes the data generation problem because the policies are generated on the fly and adapt continuously. In this way, we clearly outperform baseline methods and robustly generate competitive policies. In addition, we investigate explainable artificial intelligence (XAI) by interpreting feature saliency and generating an easy-to-read decision tree as a simplified policy.
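
For readers unfamiliar with PPO, the following is a minimal sketch of its standard clipped surrogate objective, which would be maximized separately for each policy when two agents are trained against each other. It is not taken from the paper: the function name `ppo_clipped_objective`, the clipping value of 0.2, and the plain-list inputs are illustrative assumptions, and the abstract does not state the rewards, network architecture, or hyperparameters actually used.

```python
import math

def ppo_clipped_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch of samples.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same samples.
    clip_eps: clipping range; 0.2 is a common default, assumed here.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic choice between the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In a co-training setup of the kind the abstract describes, one would compute such an objective once from the observer's samples and once from the target's samples, so that each policy keeps improving against the other's current behavior.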


research
05/25/2022

MAVIPER: Learning Decision Tree Policies for Interpretable Multi-Agent Reinforcement Learning

Many recent breakthroughs in multi-agent reinforcement learning (MARL) r...
research
10/03/2019

Reducing Overestimation Bias in Multi-Agent Domains Using Double Centralized Critics

Many real world tasks require multiple agents to work together. Multi-ag...
research
07/20/2020

Interpretable Control by Reinforcement Learning

In this paper, three recently introduced reinforcement learning (RL) met...
research
11/10/2021

PowerGridworld: A Framework for Multi-Agent Reinforcement Learning in Power Systems

We present the PowerGridworld software package to provide users with a l...
research
12/15/2022

Emergent Behaviors in Multi-Agent Target Acquisition

Only limited studies and superficial evaluations are available on agents...
research
08/30/2021

Trustworthy AI for Process Automation on a Chylla-Haase Polymerization Reactor

In this paper, genetic programming reinforcement learning (GPRL) is util...
research
01/04/2022

Learning Complex Spatial Behaviours in ABM: An Experimental Observational Study

Capturing and simulating intelligent adaptive behaviours within spatiall...
