Using Reinforcement Learning to Herd a Robotic Swarm to a Target Distribution

06/29/2020
by Zahi M. Kakish, et al.

In this paper, we present a reinforcement learning approach to designing a control policy for a "leader" agent that herds a swarm of "follower" agents, via repulsive interactions, as quickly as possible to a target probability distribution over a strongly connected graph. The leader control policy is a function of the swarm distribution, which evolves over time according to a mean-field model in the form of an ordinary difference equation. The dependence of the policy on agent populations at each graph vertex, rather than on individual agent activity, simplifies the observations required by the leader and enables the control strategy to scale with the number of agents. Two Temporal-Difference learning algorithms, SARSA and Q-Learning, are used to generate the leader control policy based on the follower agent distribution and the leader's location on the graph. A simulation environment corresponding to a grid graph with 4 vertices was used to train and validate the control policies for follower agent populations ranging from 10 to 100. Finally, the control policies trained on 100 simulated agents were used to successfully redistribute a physical swarm of 10 small robots to a target distribution among 4 spatial regions.
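
The two Temporal-Difference algorithms named in the abstract differ only in how they form the bootstrap target. The sketch below shows the standard tabular update rules for both, with a state built from the leader's vertex and a binned follower distribution. It is an illustration under stated assumptions, not the authors' implementation: the function names, the binning scheme in `encode_state`, and the values of `alpha`, `gamma`, and the reward are all hypothetical, since the abstract does not specify them.

```python
from collections import defaultdict


def encode_state(leader_vertex, follower_counts, n_bins):
    """Hypothetical state encoding: leader location plus binned population fractions.

    Assumes at least one follower, so the total is positive.
    """
    total = sum(follower_counts)
    # Integer binning of each vertex's population fraction into n_bins levels.
    bins = tuple(min(n_bins - 1, (n_bins * c) // total) for c in follower_counts)
    return (leader_vertex, bins)


def q_learning_update(q, s, a, r, s_next, actions, alpha, gamma):
    """Off-policy TD update: bootstrap from the greedy action in s_next."""
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])


def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy TD update: bootstrap from the action actually taken in s_next."""
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])


# Usage on a 4-vertex graph with 10 followers (all numbers are illustrative).
actions = ["left", "right", "up", "down", "stay"]  # assumed leader action set
q = defaultdict(float)  # Q-table with zero initial values
s = encode_state(0, (5, 5, 0, 0), n_bins=10)
s_next = encode_state(1, (4, 6, 0, 0), n_bins=10)
q_learning_update(q, s, "right", r=1.0, s_next=s_next,
                  actions=actions, alpha=0.5, gamma=0.9)
```

Because the state depends only on population fractions per vertex, the size of the Q-table in this sketch is independent of the number of followers, which reflects the scaling argument made in the abstract.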

research
09/15/2022

Scalable Task-Driven Robotic Swarm Control via Collision Avoidance and Learning Mean-Field Control

In recent years, reinforcement learning and its multi-agent analogue hav...
research
04/16/2014

Partially Observed, Multi-objective Markov Games

The intent of this research is to generate a set of non-dominated polici...
research
07/12/2023

DSSE: a drone swarm search environment

The Drone Swarm Search project is an environment, based on PettingZoo, t...
research
08/27/2021

Optimized leaders strategies for crowd evacuation in unknown environments with multiple exits

In this chapter, we discuss the mathematical modeling of egressing pedes...
research
10/05/2021

A study of first-passage time minimization via Q-learning in heated gridworlds

Optimization of first-passage times is required in applications ranging ...
research
07/26/2023

MorphoLander: Reinforcement Learning Based Landing of a Group of Drones on the Adaptive Morphogenetic UAV

This paper focuses on a novel robotic system MorphoLander representing h...
research
03/21/2017

Controllability to Equilibria of the 1-D Fokker-Planck Equation with Zero-Flux Boundary Condition

We consider the problem of controlling the spatiotemporal probability di...
