Gesture-Based Human-Swarm Interaction for Formation Control Using Interpreters

We propose a novel Human-Swarm Interaction (HSI) framework which enables the user to control a swarm's shape and formation. The user commands the swarm using only arm gestures and motions, which are recorded by an off-the-shelf wearable armband. We propose a novel interpreter system, which acts as an intermediary between the user and the swarm to simplify the user's role in the interaction. The interpreter takes in a high-level input drawn using gestures by the user and translates it into low-level swarm control commands. This interpreter employs machine learning, Kalman filtering, and optimal control techniques to translate the user input into swarm control parameters. A notion of Human-Interpretable Dynamics is introduced, which is used by the interpreter for planning as well as to provide feedback to the user. The dynamics of the swarm are controlled using a novel decentralized formation controller based on distributed linear iterations and dynamic average consensus. The framework is demonstrated theoretically as well as experimentally in a 2D environment, with a human controlling a swarm of simulated robots in real time.




1 Introduction

Motivation. Due to recent advances in technology, the field of swarm robotics has become pervasive in the research community while slowly permeating into industry. Although the coordination of multiple robots for tasks such as foraging, coverage, and flocking (Olfati-Saber et al. (2006); Jadbabaie et al. (2003); Bullo et al. (2009)) has received much attention, human interaction with robotic swarms is less understood (Kolling et al. (2016)). Thus, according to the latest Robotics Roadmap (Christensen, H. I., et al., "A roadmap for US robotics: from internet to robotics," 2016), a top priority in swarm robotics is the development of unifying HSI frameworks, the elucidation of a rich set of HSI examples, and their comparison. In particular, there is a need to develop novel intuitive interfaces for humans to communicate their intentions to swarms and to make it easier for humans to interpret swarms. At the same time, a swarm may require high-dimensional and complex control inputs which cannot be intuitively given by a human. Motivated by this, we propose to build a novel supervisory interpreter (Figure 1) to bridge the human and the swarm, which is essential to ensure the effectiveness of an HSI system. We consider the particular problem of formation control, where the human can intuitively draw shapes in the air with their arm, which is translated into an effective distributed controller.

[Figure 1: Human User → Wearable Device → Intention Decoder → Decentralized Swarm Controller → Robot Swarm]

Figure 1: Workflow of the proposed Human-Swarm Interface with a wearable. The user communicates their intent through the Myo armband, which produces observations as described in Section 3. The decoder estimates the user intent from these observations. The planner uses the estimated intent to optimally plan a set of intermediate goals denoting the interpreter's commands. The decentralized controller present in each agent then tries to reach these goals by computing the agent velocities.

Related Work. According to recent surveys on HSI (Kolling et al. (2016)) and human multi-agent systems (Franchi (2017)), humans take a supervisory (Savla and Frazzoli (2012)), direct (Setter et al. (2015)), shared (Franchi et al. (2012)), or environmental (Wang and Schwager (2016)) control role in an HSI framework. Our architecture, however, allows humans to provide high-level supervisory inputs that are simultaneously direct and detailed, thus allowing a high degree of control with less human effort for large swarms. Most HSI framework designs have been human-centric and focused on direct control of swarms, either through teleoperation or proximal interaction; see e.g. Jawad et al. (2014); Setter et al. (2015). Due to complicated swarm dynamics, the human is quickly overwhelmed and does not make the best decisions, as in our previous work Suresh (2016); Suresh and Schwager (2016). Our planner addresses this by generating an intuitive, human-approved, swarm-friendly plan for the swarm to follow. More recently, gesture-based techniques combined with speech, vision, and motion have been used to interact with small teams of robots in Alonso-Mora et al. (2015) and Gromov et al. (2016). These works rely on proximal multi-modal interaction schemes which require a complex hardware setup to interpret the human gestures, which is not practical for large-scale swarms. We rely on a single wearable device without any other external electronics, which makes the implementation more practical. With respect to formation control for large-scale swarms, Rubenstein et al. (2014) and Alonso-Mora et al. (2012) have only used predefined shapes and images as inputs for the swarm, which facilitates only supervisory control in an HSI system. In our approach, by contrast, the swarm is capable of understanding intuitive human intention with the aid of the interpreter.

Statement of Contributions. We propose a novel HSI framework that considers both a human agent and a dynamic swarm, with an interpreter acting as a bridge between the two. By means of it, the user can communicate their intentions intuitively and naturally, without an in-depth understanding of the swarm dynamics. At the same time, the swarm receives control subgoals in its own domain and need not spend resources to decode the user's intention. The paper presents contributions in three aspects. First, on the human-interpreter interaction side, we formulate a novel intention decoder using Kalman filtering and HMMs for simultaneous dynamic and static gesture decoding, utilizing the IMU and EMG sensors, respectively. This method increases intuitiveness: preliminary tests have suggested that the human quickly learns to adapt to this interface, with results comparable to standard interfaces such as a computer mouse. Second, we further exploit the interpreter element to devise control subgoals that are efficient for the swarm and that require global information not easily accessible to the swarm. In this way, the interpreter solves a planning problem with the goal of controlling the swarm efficiently while following an intuitive behavior. Third, we present a novel discrete second-order distributed formation controller for the swarm that combines the Jacobi Overrelaxation algorithm and dynamic consensus to guarantee the convergence of a (second-order integrator) swarm to a desired shape, scaling, rotation, and displacement. Our controller relies only on the position information of each agent and communication with its neighbors using variable communication radii, which provides a practical setting. Finally, we highlight a contribution on the integration of diverse tools from control theory, network science, machine learning, signal processing, optimization, and robotics that serve to articulate our HSI framework.

Paper Organization. Section 2 presents preliminary concepts required to build our framework, which is described together with the problem statement in Section 3. We then describe our approach to solving each aspect of the problem statement in Section 4. Next, we state and discuss the results of our proposed approach in Section 5. We finally present conclusions in Section 6.

2 Preliminary Concepts

This section introduces the basic notation and concepts used to construct our HSI framework.

2.1 Basic Notations

We let $\mathbb{R}$ denote the space of real numbers, and $\mathbb{Z}_{>0}$ the space of positive integers. Also, $\mathbb{R}^n$ and $\mathbb{R}^{n \times m}$ denote the $n$-dimensional real vector space and the space of $n \times m$ real matrices, respectively. We use $\mathcal{P}$ to denote the set of polygonal shapes. In what follows, $\mathbf{1}$ is the column vector of ones, $I$ is the identity matrix, and $\mathbf{0}$ denotes a matrix of zeros, with dimensions clear from context. Further, $\|\cdot\|$ denotes the Euclidean norm. Given a matrix $A$, its eigenvalues are denoted by $\lambda_1(A), \lambda_2(A), \dots$, enumerated by their increasing real parts. The $i$th row of a matrix $A$ is denoted by $A_i$.

2.2 Graph Theory Notions

Here, we introduce some basic graph theory notation which will be used in the sequel. Readers can refer to Bullo et al. (2009); Godsil and Royle (2001) for more details on graph theory and its applications to robotics.

Consider a swarm of $N$ agents in $\mathbb{R}^2$. Let $p_i(t)$ and $v_i(t)$ denote the position and velocity, respectively, of agent $i$ at time $t$. We denote by $p(t) = (p_1(t)^\top, \dots, p_N(t)^\top)^\top$ the position of the whole swarm.

We model the communication among agents by means of an undirected $r$-disk communication graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V} = \{1, \dots, N\}$ denotes the set of agents (vertices of the graph), and $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ denotes the set of edges. In particular, $(i, j) \in \mathcal{E}$ if and only if $\|p_i - p_j\| \leq r$. The entries of the associated adjacency matrix $A$ become $a_{ij} = 1$ if $(i, j) \in \mathcal{E}$ with $i \neq j$, and $a_{ij} = 0$ otherwise.

The neighbor set of agent $i$ is given by $\mathcal{N}_i = \{ j \in \mathcal{V} : (i, j) \in \mathcal{E} \}$. Associated with $\mathcal{G}$, we consider a weight-balanced weighting $W$, the Metropolis weight matrix corresponding to the communication graph $\mathcal{G}$; see Xiao and Boyd (2004). With $d_i = |\mathcal{N}_i|$ being the degree of agent $i$, $W$ is given by:

$$w_{ij} = \begin{cases} \dfrac{1}{1 + \max\{d_i, d_j\}}, & (i, j) \in \mathcal{E},\ i \neq j, \\[4pt] 1 - \sum_{k \in \mathcal{N}_i} w_{ik}, & i = j, \\[4pt] 0, & \text{otherwise.} \end{cases} \qquad (1)$$

Since we consider an undirected graph, the matrix $W$ is symmetric and doubly stochastic. From equation (1), the weighting is balanced, as $W \mathbf{1} = W^\top \mathbf{1} = \mathbf{1}$. We denote by $D$ the diagonal degree matrix of $\mathcal{G}$, with the degree $d_i$ of node $i$ being the $i$th diagonal entry of $D$. The Laplacian matrix of the graph is given by $L = D - A$, and the normalized Laplacian matrix by $\mathcal{L} = D^{-1/2} L D^{-1/2}$. Similarly, the weighted Laplacian matrix is given by $L_W = I - W$. The connectivity properties of a graph are captured by the second-smallest eigenvalue $\lambda_2(L)$ of the Laplacian matrix. We can also express connectivity in terms of $\lambda_2(\mathcal{L})$ and $\lambda_2(L_W)$: the graph is connected if and only if $\lambda_2 > 0$, and connectivity increases as $\lambda_2$ increases.
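As a concrete sketch of these graph quantities, the following code (with illustrative agent positions and radius, not values from the paper) builds an $r$-disk graph, forms the Metropolis weight matrix of Xiao and Boyd (2004), and checks connectivity via the second-smallest Laplacian eigenvalue:

```python
import numpy as np

def disk_graph(points, radius):
    """Adjacency matrix of the r-disk graph over the given positions."""
    d = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    return ((d <= radius) & (d > 0)).astype(float)

def metropolis_weights(A):
    """Metropolis weight matrix: w_ij = 1/(1 + max(d_i, d_j)) on edges,
    with the diagonal chosen so each row sums to one."""
    n = len(A)
    deg = A.sum(axis=1)
    W = np.zeros_like(A)
    for i in range(n):
        for j in range(n):
            if A[i, j]:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W

# Illustrative 4-agent configuration and communication radius
points = np.array([[0, 0], [1, 0], [2, 0], [1, 1]], float)
A = disk_graph(points, radius=1.5)
W = metropolis_weights(A)
Lap = np.diag(A.sum(axis=1)) - A
lambda2 = np.sort(np.linalg.eigvalsh(Lap))[1]  # > 0 iff graph connected
```

Both the row sums and column sums of $W$ equal one, confirming the doubly stochastic property used above.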

3 Proposed Framework and Problem Formulation

Here, we first introduce the various timescales involved in the interactions, and propose a new HSI framework, while providing a description of its components. Later, we identify the various problems to be solved to implement this framework.

Timescales Involved. We assume that the interactions between the human, the interpreter, and the swarm, as well as the dynamic update of the swarm, occur at timescales that go from coarser to finer resolution. In this way, the human and the interpreter interact at discrete times that are multiples of a period $\Delta_H$, the interpreter and the swarm interact at multiples of $\Delta_I$, while the swarm dynamic updates occur at multiples of $\Delta_S$, with $\Delta_H \geq \Delta_I \geq \Delta_S$. In what follows, we identify the times $k\Delta_H$ (resp. $k\Delta_I$ and $k\Delta_S$) with their integer indices $k$, and we distinguish these integers as belonging to the human (resp. interpreter and swarm) timescales. We use the swarm time variable for the wearable device, as it operates at a fast rate similar to the swarm.

Proposed Framework. The user specifies their intentions, which are translated by the interpreter and in turn communicated to the swarm. The human uses a wearable device called the MYO armband, which observes the human-intended swarm command. By means of it, the user specifies a desired formation shape, centroid, orientation, and scaling for the swarm. These parameters make up the desired human intention, which the interpreter decodes. The MYO armband receives the human intention as Electromyography (EMG) signals and Inertial Measurement Unit (IMU) signals.

The interpreter first uses a decoder (Section 4.1) to translate the human input into the decoded intention. It then translates the decoded intention into desired relative agent positions which best depict the swarm shape. The swarm also has an operation mode corresponding to different communication ranges for each agent of the swarm. We introduce the notion of a swarm operating cost, a trade-off between network connectivity and network maintenance costs. We also introduce the notion of Human-Interpretable Dynamics (HID), representing swarm dynamics that are easily understandable by the human. Both concepts are elucidated in Section 4.5.2.

Now, given a desired formation and the current state, the interpreter determines a set of switching intermediate goals over a finite time horizon. These intermediate goals follow the HID and are optimal with respect to the swarm operating costs; they represent waypoints and intermediate shapes which are communicated to the swarm. These parameters constitute the high-level commands that the swarm receives and executes via a distributed algorithm. That is, our swarm employs a decentralized control scheme, detailed in Section 4.3, to reach each intermediate goal. Figure 1 illustrates the workflow of our proposed framework. From here, we need to solve the following problems to complete our framework:

Problem 1

(Human Intention Decoder). Given the EMG and IMU observations from the Myo armband, design a decoder to obtain the desired human intention.

Problem 2

(Behavior Specifier). Given the desired human intention, design an algorithm to produce the goal behavior that can be understood by the swarm.

Problem 3

(Planning Algorithm). Given the goal behavior, generate the set of optimal intermediate behavior subgoals over the planning time horizon, which follow human-interpretable dynamics and minimize swarm operating costs.

Problem 4

(Distributed Swarm Controller). Given an intermediate command, design a distributed algorithm to drive the swarm to the corresponding intermediate shape with its scaling, rotation, and centroid, using the commanded operation mode, from some initial position.

Problem 5

(User Interface Design and Feedback). Develop a graphical user interface (GUI) for the human to communicate their intention to the interpreter and receive feedback about the decoded intention and the state of the swarm.

We propose solutions to the above problems in Section 4.

4 Technical Approach

The following subsections describe the proposed solutions to the problems of Section 3.

4.1 Problem 1: Intention Decoding

The user conveys their intention through gestures and arm movements, which are recorded by the Myo armband as EMG signals. The armband carries an array of spatial EMG sensors which generate EMG signals at every time step. Its 9-DoF IMU provides 3D acceleration, 3D angular velocity, and 3D angular orientation values; we only consider the planar angular velocity and orientation signals, so only the relevant IMU signals are used. The first task of the intention decoder is to decipher discrete gestures from the EMG signals. It then deciphers the state of the arm, consisting of planar arm position and planar arm velocity, from the IMU signals. The gestures and arm state are translated into mouse movements and mouse clicks, which provide feedback of the decoded gesture to the user. This pipeline is described in Figure 2. We use a custom Hidden Markov Model (HMM) based approach to decode the gestures. In this work, we introduce the use of five gestures and map them to mouse functions as shown in Figure 3. We implement a Kalman filter based movement decoder which uses the gyroscope and magnetometer signals from the IMU of the Myo armband and maps them to the arm state. The next paragraphs give insight into our proposed intention decoder; the complete details of the pipeline are omitted due to space constraints.

Figure 2: The user intention decoder system. i) The user conveys their intention through arm movement and gestures. ii) The Myo armband captures the gestures as EMG signals, which are read by the gesture decoder. iii) Arm movements are captured as IMU signals and sent to a Kalman filter. iv) The HMM-based decoder provides gestures which are mapped to mouse clicks and scrolls. v) The updated state of the Kalman filter is used to assign the mouse position. vi) Shape and centroid are specified using the GUI (Figure 4) via iv) and v).
(a) Fist
(b) Spread
(c) Wave Up
(d) Wave down
(e) Normal
(f) Left click
(g) Right click
(h) Scroll up
(i) Scroll down
(j) Normal
Figure 3: (a)-(e) show the various gestures used and (f)-(j) indicate the corresponding mouse functionalities.

4.1.1 Gesture decoding using HMM

We use an HMM (see Rabiner (1989)), a common probabilistic machine learning technique, to decode gestures from the EMG signals. Our HMM implementation uses discrete states, corresponding to the gestures, and continuous observations derived from the EMG signals, which are modeled as a multivariate Gaussian distribution. The Myo armband produces a multi-channel spatial EMG signal. We use the mean and standard deviation of the signals over a sliding window with a fixed frame shift as input observations; the final feature observed by the HMM is the concatenation of these channel-wise means and standard deviations. We collect training data for 1 minute, during which the user performs all gestures in a fixed order, each for an equal interval, without stopping; this yields an equal amount of data for each gesture across the one-minute horizon. Next, we employ the Baum-Welch algorithm to train the HMM model parameters; details of the implementation can be found in our previous work Suresh (2016). With the trained model in hand, we use the standard forward algorithm to perform live decoding of the gestures, similar to our previous work Suresh (2016). We will now look into decoding the arm movements to complete the intention decoder.
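The feature extraction and live decoding described above can be sketched as follows; the channel count, window length, and frame shift are illustrative assumptions, and the forward recursion is the standard log-space form rather than the authors' exact implementation:

```python
import numpy as np

# Assumed parameters (not specified in the text): 8 EMG channels,
# window of WIN samples, frame shift of HOP samples.
N_CHANNELS, WIN, HOP = 8, 40, 10

def emg_features(emg):
    """emg: (T, N_CHANNELS) raw signal -> (n_frames, 2*N_CHANNELS) features:
    per-channel mean and standard deviation over each window."""
    frames = []
    for start in range(0, len(emg) - WIN + 1, HOP):
        w = emg[start:start + WIN]
        frames.append(np.concatenate([w.mean(axis=0), w.std(axis=0)]))
    return np.array(frames)

def forward_decode(obs_loglik, log_pi, log_A):
    """Standard HMM forward recursion in log space.
    obs_loglik: (T, S) log-likelihood of each frame under each state's
    Gaussian; returns the most likely state for the final frame."""
    log_alpha = log_pi + obs_loglik[0]
    for t in range(1, len(obs_loglik)):
        # log-sum-exp over previous states, for each current state
        m = log_alpha[:, None] + log_A
        log_alpha = np.logaddexp.reduce(m, axis=0) + obs_loglik[t]
    return int(np.argmax(log_alpha))
```

In the full pipeline, the per-state Gaussian parameters and the transition matrix would come from the Baum-Welch training step described above.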

4.1.2 Arm movement decoding using a Kalman filter

We use a standard discrete-time Kalman filter (Thrun et al. (2005)) to decode the arm state from the IMU signals. We consider only planar motions of the arm, as we use a planar environment for the GUI and the formation controller. The arm state is transformed into a mouse position and velocity by an appropriate scaling and sent to the GUI (Section 4.2). We use a discrete, linear time-invariant model, based on Newton's second law, to describe the dynamics of the mouse state $x = (p^\top, v^\top)^\top$:

$$x(t+1) = \begin{bmatrix} I_2 & \Delta t\, I_2 \\ \mathbf{0} & I_2 \end{bmatrix} x(t) + \begin{bmatrix} \tfrac{\Delta t^2}{2} I_2 \\ \Delta t\, I_2 \end{bmatrix} u(t) + w(t), \qquad (2)$$

where $u(t)$ is the input acceleration given by the planar angular orientation of the arm (which is under the user's control), $\Delta t$ is the update time constant, and $w(t)$ is the Gaussian process noise. In this way, the acceleration of the mouse pointer is controlled by changing the arm orientation, which is a more stable signal than the one provided by the accelerometer. The measurement model, which uses the gyroscope and magnetometer signals to observe the states, takes the linear form

$$z(t) = C(L)\, x(t) + v(t), \qquad (3)$$

where $L$ is the distance from the MYO armband to the tip of the user's finger, which can be measured or fixed approximately, and $v(t)$ is the Gaussian measurement noise present in the gyroscope and magnetometer signals. Equations (2) and (3) are in the standard form to apply the Kalman filter to estimate the mouse state, which is then used by the GUI program to control the mouse movement on the computer. This enables the armband to essentially replace the computer mouse as a complete Human-Computer Interaction (HCI) device, which can be used for other purposes as well, giving the user the opportunity to interact with the computer using both the mouse and the armband. Section 5.2 shows the results of our proposed intention decoder. The decoded intentions are sent to the GUI, which is illustrated in Section 4.2.
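A minimal sketch of the filter is below. The measurement model here is a simplification that assumes a noisy velocity observation derived from the gyroscope rate and arm length; the update period, arm length, and noise covariances are placeholder values, not the paper's:

```python
import numpy as np

# Placeholder constants (assumptions, not the paper's values)
dt, L = 0.02, 0.35  # update period [s], armband-to-fingertip distance [m]

# Double-integrator mouse-state model: x = [px, py, vx, vy]
F = np.block([[np.eye(2), dt * np.eye(2)],
              [np.zeros((2, 2)), np.eye(2)]])
B = np.vstack([0.5 * dt**2 * np.eye(2), dt * np.eye(2)])
H = np.hstack([np.zeros((2, 2)), np.eye(2)])   # observe velocity only
Q = 1e-4 * np.eye(4)                           # process noise (assumed)
R = 1e-2 * np.eye(2)                           # measurement noise (assumed)

def kf_step(x, P, u, z):
    """One predict/update cycle: u is the acceleration input from arm
    orientation, z the velocity measurement (gyro rate times L)."""
    # Predict with the double-integrator model
    x = F @ x + B @ u
    P = F @ P @ F.T + Q
    # Update with the velocity measurement
    y = z - H @ x
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ y
    P = (np.eye(4) - K @ H) @ P
    return x, P
```

Feeding the filtered position estimate through a scaling map, as described above, yields the on-screen mouse coordinates.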

4.2 Problem 5: User Interface Design

We developed a GUI in MATLAB which takes input from the human through the computer mouse and performs the desired behavior with simulated robots. The user interacts with the GUI using arm movements and gestures, which are mapped to mouse movements and clicks according to Section 4.1 and Figure 2. Figure 4 illustrates a snapshot of the GUI during the planning phase, which has 5 different boxes whose selection is triggered by hovering over the desired area with the mouse pointer. The current shape of the swarm is illustrated in the top left corner of the screen. The user specifies the desired shape on the top left side of the screen by choosing the vertices of the polygon using arm movements and the fist gesture or left click. Next, the user chooses the rotation on the top right side of the screen using the mouse scroll or the "wave up" and "wave down" gestures to increase or decrease the angle, respectively. In the top right corner, the scaling is chosen, again with the "wave up" and "wave down" gestures, in the same manner as the desired angle. The larger area in the bottom half of the screen represents the environment where the planning and execution of formation control take place. The user selects the desired centroid by making a "fist" or clicking the left mouse button. In this manner, the human communicates their desired intention, which is sent to the interpreter described in Section 4.5.

Figure 4: UI used to interact with the interpreter.

4.3 Problem 4: Swarm Controller

Our swarm controller is designed to achieve the interpreter's intention at each commanded time. Given the second-order integrator dynamics of the agents and the need to control the swarm centroid, our controller extends Cortés (2009) (designed for first-order agents) with the dynamic consensus feedback interconnection of Zhu and Martínez (2010).

With $p_i$ and $v_i$ being the position and velocity of agent $i$, our second-order distributed swarm controller takes the form:


where the control gains and the rotation matrix corresponding to the commanded orientation appear, and each agent maintains its own estimate of the swarm center. Note that the weights in the controller are the Metropolis weights defined in Section 2.2. This algorithm, which applies to second-order systems, cancels out the drift observed in Cortés (2009) with the help of dynamic consensus, and drives the swarm to the desired centroid. The FODAC algorithm of Zhu and Martínez (2010), used in equation (4b), distributively estimates the mean of a time-varying reference signal, which gives each agent an estimate of the swarm's centroid.

Using (4), the swarm achieves the desired interpreter's intention. After some calculations, with the combined state of the swarm collected into a single vector, the state-space form of our swarm controller is represented as:


Here, a dummy state is introduced to obtain a linear system in standard form. It is interesting to note that the swarm controller (5) consists of an autonomous component and a controlled component housing the desired interpreter's intention. Thus the command can be communicated once at the beginning of each iteration, and the agents only need to adjust their positions and communicate locally with their neighbors to achieve the intermediate goal. The desired intention defines a corresponding desired equilibrium in this state space. We now theoretically analyze the performance of the proposed swarm controller in the next section.
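To illustrate the interplay between the formation term and the dynamic-consensus centroid estimate, here is a first-order analogue of the controller (the paper's controller is second-order); the formation offsets, graph, and gains are assumptions for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4
r = np.array([[1, 1], [1, -1], [-1, -1], [-1, 1]], float)  # zero-mean offsets
c_star = np.array([5.0, 3.0])                              # commanded centroid
k1, k2 = 0.3, 0.2                                          # assumed gains

# Metropolis weights on a ring graph over the 4 agents
A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], float)
deg = A.sum(axis=1)
W = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        if A[i, j]:
            W[i, j] = 1.0 / (1 + max(deg[i], deg[j]))
    W[i, i] = 1.0 - W[i].sum()

p = rng.normal(size=(N, 2))   # initial positions
c_hat = p.copy()              # each agent's centroid estimate (consensus state)

for _ in range(400):
    e = p - r                                      # formation-error coordinates
    # Jacobi-style agreement on e, plus centroid correction via c_hat
    u = k1 * (W @ e - e) + k2 * (c_star - c_hat)
    p_new = p + u
    # FODAC-style dynamic average consensus: track the swarm centroid
    c_hat = W @ c_hat + (p_new - p)
    p = p_new
```

After the loop, each agent's position converges to the commanded centroid plus its formation offset, even though no agent ever measures the true centroid directly.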

4.4 Swarm Controller Analysis

In this section we analyze our proposed controller (5) to determine stability and convergence. We consider the case when the interpreter's command remains constant between switching times, which makes our system time-invariant in that interval. In this work, we make use of the following assumptions on the communication graph:

Assumption 1 (Connectivity)

The communication graph has at least one globally reachable vertex at every time.

Assumption 2

(Constant graphs). The communication graph remains constant between switching times.

System (5) represents multiple copies of the same dynamics, one per spatial dimension. To simplify notation, we analyze only one of the dimensions. After fixing a dimension and omitting its index for simplicity, our swarm controller (5) can be reduced by combining its component dynamics to obtain:


System (6) is an interconnected system whose stability depends on the chosen gains. We use the discrete analogue of composite Lyapunov functions (Khalil (2002)) to design gains that guarantee the stability of the interconnected system. We can now state the following theorem.

Theorem 1

(Stability of Swarm Controller). Under Assumption 1 (connectivity) and Assumption 2 (constant interconnection graph), with suitably chosen control gains, the swarm globally uniformly asymptotically stabilizes to the desired state under the swarm controller dynamics (5) from any initial condition.

The proof of Theorem 1 is presented in the Appendix. Next, we use the results of Theorem 1 to gain intuition about the role of graph connectivity in the convergence of our swarm controller (5).

Corollary 1

The convergence rate of (5) is directly proportional to the algebraic connectivity (the second-smallest Laplacian eigenvalues) of the communication graph.

The proof of Corollary 1 can be found in the Appendix. Using these results, we design a planning algorithm which optimally determines the intermediate subgoals, as described in Section 4.5.2.

4.5 The interpreter

In this section we describe the role of the interpreter in the framework. For ease of illustration, we consider the formulation in 2D space. The interpreter consists of two main parts, the behavior specifier and the planner, which are illustrated in the following paragraphs.

4.5.1 Problem 2: Behavior Specifier

The behavior specifier converts the desired human intention into parameters that can be comprehended by the swarm. The human user specifies the desired shape, which takes the form of an arbitrary polygon, along with the desired centroid, scaling, and rotation. The interpreter then decides the formation, denoted by the relative positions of the agents, which best illustrates the shape given by the human. For simplicity, we use a uniform distribution over the interior of the shape to obtain these relative positions, as illustrated in Figure 5(b). The human specifies the polygon by providing its vertices sequentially using the GUI from Section 4.2, as shown on the left side of Figure 5(b). The corresponding formation density, the number of agents divided by the area of the polygon, is then calculated; we assume the polygon is large enough to fit all the robots. Note that, since the shape is bounded, there exists a large enough bounding box containing it over which a square mesh grid of points can be defined. Using this density, the robots are distributed uniformly over the bounding box of the polygon by creating a mesh grid. Finally, we discard the generated points not in the polygon and arrive at the desired formation of points shown in the right half of Figure 5(b). The shape parameters and the resulting formation are passed on to the planner, which is described next.
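The meshgrid-and-discard procedure can be sketched as follows, with a ray-casting point-in-polygon test standing in for whatever geometric test the implementation uses; the grid-densification loop is an assumption to guarantee enough interior points:

```python
import numpy as np

def point_in_polygon(q, verts):
    """Ray-casting test for point q against polygon verts (CCW or CW)."""
    x, y = q
    inside = False
    n = len(verts)
    for i in range(n):
        (x1, y1), (x2, y2) = verts[i], verts[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def formation_points(verts, n_agents):
    """Uniform grid over the polygon's bounding box, keeping only the
    points inside the polygon; densify until n_agents points fit."""
    verts = np.asarray(verts, float)
    xmin, ymin = verts.min(axis=0)
    xmax, ymax = verts.max(axis=0)
    m = int(np.ceil(np.sqrt(n_agents)))
    while True:
        xs = np.linspace(xmin, xmax, m)
        ys = np.linspace(ymin, ymax, m)
        grid = [(x, y) for x in xs for y in ys
                if point_in_polygon((x, y), verts)]
        if len(grid) >= n_agents:
            return np.array(grid[:n_agents])
        m += 1
```

Subtracting the centroid of the returned points would yield the zero-mean relative positions used by the controller.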

4.5.2 Problem 3: Planner

The planner receives the decoded human intention in the form of the desired formation (or, equivalently, the shape), scaling, rotation, and centroid. The planner then constructs a set of intermediate waypoints, whose number equals the number of intermediate steps in the plan to reach the final goal.

To do this, we employ a finite-horizon Discrete Switched Linear Quadratic Regulator (DSLQR) formulation. A particular DSLQR problem, with a dynamical variable evolving over a finite time horizon, can be formulated as the minimization of a running cost (7a) subject to switched linear dynamics (7b). Here, the running cost consists of a switching LQ cost function, with weight matrices parameterized by a discrete mode. The cost function is designed to enhance swarm performance, while the linear dynamic constraint is used to enforce an easy-to-interpret behavior by a human, which defines the Human-Interpretable Dynamics (HID).

Details and methodology of DSLQR systems can be found in Zhang et al. (2009). We show next how we apply this approach in our particular setup and describe the matrices that we choose for our framework.

(i) Human-Interpretable Dynamics: We introduce the notion of Human-Interpretable Dynamics (HID) to denote a dynamical system that can be easily understood by a human. Since the interpreter needs to provide feedback to the user, the planner needs to provide an abstraction of the complicated swarm dynamics in a low-dimensional space. These dynamics need to be slower than the swarm dynamics to enhance human interpretability, and are hence implemented at the coarser timescale described in Section 3.

Here, we propose a simple linear dynamical system to model these dynamics, taking into account the desired human intention. We suppose that fully actuated linear dynamical systems are more easily understandable by humans than nonlinear system models. We let $z$ denote the state of the HID system, collecting the intention parameters. Then, the HID takes the form:

$$z(\kappa + 1) = A z(\kappa) + B u(\kappa), \qquad (8)$$

with state matrix $A$, input matrix $B$, and control input $u(\kappa)$. In this paper, we choose $A$ and $B$ to be identity matrices. This appears to be the most intuitive dynamics, as the control input then acts directly on the system state. In future work, we will study alternative choices for these dynamics.

We use a finite-horizon discrete LQR control technique to drive the HID toward the desired intention starting from some initial configuration. By considering the change of variables $e(\kappa) = z(\kappa) - z_{\text{goal}}$, we define a first term contributing to the problem cost functional as follows:

$$J_{\text{HID}} = \sum_{\kappa} e(\kappa)^\top Q\, e(\kappa) + u(\kappa)^\top R\, u(\kappa), \qquad (9)$$

where the matrices $Q$ and $R$ are positive definite and $u(\kappa)$ is the step change applied at time $\kappa$. The input is chosen so that this cost is minimized, which is solved using the standard LQR approach; the results are shown in Figure 5(a) for a finite-horizon problem. Intuitively, one can choose these matrices so that the input penalty dominates the state penalty, providing more human-"interpretable" dynamics. This condition implies that the priority is to reach the desired behavior through small changes at the intermediate steps, which makes the transition look more natural and "interpretable" to the human eye, as seen in Figure 5(a). Figure 5(a) shows the stages of transformation of a five-sided polygon into a rotated and translated four-sided polygon; the figure depicts a seemingly natural transition which can be easily interpreted by the user, justifying the HID formulation. The case of a mismatch in the number of vertices between the initial and desired shapes is handled by adding vertices appropriately on the perimeter of the shape with fewer vertices.
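A minimal finite-horizon LQR sketch for the HID with $A = B = I$ is shown below; the horizon and the weights (chosen so the input penalty dominates, per the discussion above) are illustrative:

```python
import numpy as np

def lqr_gains(n, T, Q, R):
    """Backward Riccati recursion for A = B = I_n over horizon T;
    returns the feedback gains ordered t = 0 .. T-1."""
    A = B = np.eye(n)
    P = Q.copy()          # terminal cost (assumed equal to Q)
    Ks = []
    for _ in range(T):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        Ks.append(K)
    return Ks[::-1]

def hid_trajectory(z0, z_goal, T=10, q=1.0, r=10.0):
    """Intermediate HID states driving z0 toward z_goal; r >> q makes
    each step small, favoring human-interpretable transitions."""
    n = len(z0)
    Ks = lqr_gains(n, T, q * np.eye(n), r * np.eye(n))
    zs = [np.asarray(z0, float)]
    for K in Ks:
        e = zs[-1] - z_goal
        zs.append(zs[-1] - K @ e)   # u_t = -K_t (z_t - z_goal)
    return np.array(zs)
```

With the input penalty dominating, each step moves the shape state only a fraction of the remaining error, producing the gradual, human-readable transitions of Figure 5(a).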

(a) HID illustration
(b) Formation Specifier
Figure 5: (a) HID illustration for a shape changing from a rotated cone to a standing rectangle. (b) Left: the user specifies the desired shape by providing vertices (triangles). Right: the interpreter determines the relative positions of the agents (blue dots) to represent the shape drawn by the user.

(ii) Swarm performance costs. We have discussed how to generate intermediate shapes taking the HID into account. Now we consider the swarm performance and communication costs used to choose the operating mode. The operating modes correspond to a subset of $r$-disk graphs defined over the swarm when distributed over a shape. Since agent formations are chosen in a consistent manner, as described in e.g. Figure 5(b), the number of possible graphs over the agents is greatly reduced and remains constant for scaled shapes. From now on, we consider this set of modes to be obtained by choosing appropriate communication radii.

Operating costs involved: To increase the speed of convergence and to facilitate quicker interpretation by a human, we need to maximize the notion of connectivity involving the second-smallest eigenvalue of the respective Laplacian matrix. This can be computed from the determinant of the Laplacian matrix after deflating its zero eigenvalue, e.g. by adding a rank-one term in the direction of the all-ones vector. Since the determinant of a matrix is the product of its eigenvalues, the connectivity so determined increases if and only if this determinant increases. The connectivity cost of being in a given formation and operation mode at a given time is then given by:


To ensure that remains well scaled and positive, we introduce the positive constants and , respectively. A corresponding to a higher communication radius implies that more energy is spent communicating and maintaining communication links. This is encoded as a communication cost of being in formation and operating mode at time . It is given by


where is the communication range at time and is a positive constant used for scaling.
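The two operating costs can be sketched numerically. The disk-graph construction and the role of the second smallest Laplacian eigenvalue (algebraic connectivity) follow the text; the specific functional forms and constants in `mode_cost` below are illustrative assumptions, not the paper's exact expressions.

```python
import numpy as np

def disk_graph_laplacian(positions, r):
    """Graph Laplacian of the r-disk graph over the given agent positions."""
    n = len(positions)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(positions[i] - positions[j]) <= r:
                A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def algebraic_connectivity(L):
    """Second smallest Laplacian eigenvalue (the Fiedler value)."""
    return np.sort(np.linalg.eigvalsh(L))[1]

def mode_cost(positions, r, c1=1.0, c2=0.1, c3=0.05):
    """Illustrative per-step cost of an operating mode: low algebraic
    connectivity is penalized, and a larger radius pays a communication cost."""
    lam2 = algebraic_connectivity(disk_graph_laplacian(positions, r))
    return c1 / (lam2 + c2) + c3 * r ** 2

# Agents on a unit circle; compare three candidate radii (operating modes).
theta = np.linspace(0, 2 * np.pi, 8, endpoint=False)
pos = np.stack([np.cos(theta), np.sin(theta)], axis=1)
for r in (0.8, 1.5, 2.1):
    print(r, mode_cost(pos, r))
```

For this formation, the smallest radius yields a sparse cycle graph with low connectivity, while the largest radius yields the complete graph with maximal connectivity at a higher communication price, which is exactly the trade-off the planner resolves.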

Adding these costs together defines the total cost used by the planner as:


where , , and . Observe that a solution to the above problem requires the evaluation of all possible graph combinations for the different chosen controls . By choosing the graphs based on the communication radii, and by considering a class of formations, we significantly reduce the number of possible graphs to evaluate. In addition, we employ the DSLQR formulation of Zhang et al. (2009) to obtain the optimal set of and that minimizes . The optimization proceeds sequentially: first we optimize over the sequence of and ; then, given this, we optimize over using the DSLQR approach of Zhang et al. (2009). This is further illustrated and discussed in Section 5.3.
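The mode-sequence part of this search can be illustrated with a small backward dynamic program. This is a simplified sketch only: the actual DSLQR recursion of Zhang et al. (2009) jointly optimizes feedback gains and switching, whereas here we assume the per-step mode costs are already given and add a hypothetical fixed penalty for switching modes.

```python
def best_mode_sequence(stage_costs, switch_cost):
    """Backward dynamic program over operating modes.

    stage_costs[t][m] is the cost of using mode m at step t (e.g. the
    connectivity-plus-communication cost of the t-th intermediate formation);
    switch_cost is the extra cost of changing mode between consecutive steps.
    Returns (minimum total cost, optimal mode sequence of length T).
    """
    T, M = len(stage_costs), len(stage_costs[0])
    V = [list(row) for row in stage_costs]  # value function, filled backward
    nxt = [[0] * M for _ in range(T)]       # best successor mode
    for t in range(T - 2, -1, -1):
        for m in range(M):
            best = min(range(M),
                       key=lambda n: V[t + 1][n] + (switch_cost if n != m else 0.0))
            V[t][m] = (stage_costs[t][m] + V[t + 1][best]
                       + (switch_cost if best != m else 0.0))
            nxt[t][m] = best
    m0 = min(range(M), key=lambda m: V[0][m])
    seq = [m0]
    for t in range(T - 1):
        seq.append(nxt[t][seq[-1]])
    return V[0][m0], seq

cost, seq = best_mode_sequence([[1, 3], [3, 1]], switch_cost=0.5)
# one switch is worth it here: cost 2.5, sequence [0, 1]
```

With a large switching penalty the program keeps a single mode; with a small one it switches as soon as the stage costs favor another radius, mirroring the switching behavior discussed in Section 5.3.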

5 Implementation Results

5.1 System Setup

The user can use either the MYO armband or the mouse to interact with a GUI that controls the formation of a simulated swarm in a two-dimensional environment. The swarm controller developed in Section 4.3 essentially generates waypoints for the swarm to follow; we assume holonomic dynamics for the individual agents and assume they reach their respective waypoints. We do not focus on collision avoidance, which will be addressed in future work. We use the ROS Kinetic framework with Python to interface with the MYO armband and control the mouse pointer. We use MATLAB to create the GUI shown in Figure 4, which uses the mouse or the MYO armband as an input device. For the formation controller we set the control gain and the proportional constant .
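The assumed holonomic, waypoint-reaching agent model can be sketched as a saturated proportional velocity controller. The gain, speed limit, and time step below are illustrative values, not the constants used in the paper.

```python
import numpy as np

def step_towards(p, waypoint, kp=0.5, vmax=0.2, dt=1.0):
    """Holonomic agent: proportional velocity toward a waypoint, saturated
    at vmax, integrated over one time step dt."""
    v = kp * (waypoint - p)
    speed = np.linalg.norm(v)
    if speed > vmax:
        v *= vmax / speed
    return p + dt * v

p = np.array([0.0, 0.0])
goal = np.array([1.0, 1.0])
for _ in range(100):
    p = step_towards(p, goal)
# the agent converges to its waypoint, as assumed in the setup
```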

5.2 Intention Decoding

We performed tests to gauge the accuracy and speed of the proposed HMM and Kalman filter models. For the HMM model, our previous tests had shown accuracy levels of over 90% on average (Suresh, 2016) for similar gestures and a similar framework. Preliminary tests showed comparable results, so in the interest of space we omit this accuracy test for the HMM model. To assess the effectiveness of the arm movement decoder, we compare the results of operating a mouse with and without the MYO armband. Figure 6 presents the aggregate results over 5 trials. The user was tasked with continuously tracing a pentagon, representing the human intention, for one minute. Figure 6 shows that the results are similar in both cases. Table 1 reports the error involved in each of the trials. The errors are about the same with both interfaces; however, the speed of using the mouse is higher than that of the wearable. This is partly because users have been accustomed to the mouse for years and need time to adapt to the new interface. Nevertheless, the performance with the wearable in the final trial matches many of the mouse trials, which shows that the user can adapt quickly to the new interface.

(a) Mouse movement with wearable
(b) Mouse movement without wearable
Figure 6: Aggregate results of tracing a pentagon (red). (a) The user specifies the shape using the MYO armband (blue). (b) The user specifies the shape using the mouse (green).
                   Mouse                          Wearable
Sl. no   Loops  Avg Error  Total Error   Loops  Avg Error  Total Error
1          7     0.026       122.57        5     0.038       179.26
2          8     0.028       129.40        5     0.037       174.74
3          9     0.031       147.02        7     0.048       222.07
4          8     0.031       148.92        7     0.050       235.27
5          9     0.035       161.50        5     0.029       132.72

Table 1: Error comparison between the mouse and the wearable.
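The per-trial average errors in Table 1 can be summarized directly. The snippet below only aggregates the tabulated values; no additional data is assumed.

```python
# Avg Error per trial from Table 1 (mouse vs. wearable).
mouse = [0.026, 0.028, 0.031, 0.031, 0.035]
wearable = [0.038, 0.037, 0.048, 0.050, 0.029]

mouse_mean = sum(mouse) / len(mouse)          # about 0.0302
wearable_mean = sum(wearable) / len(wearable)  # about 0.0404
print(round(mouse_mean, 4), round(wearable_mean, 4))
```

The mean average error is about 0.0302 for the mouse and 0.0404 for the wearable, and trial 5 is the only trial where the wearable's average error (0.029) is below the mouse's (0.035), consistent with the adaptation argument above.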

5.3 DSLQR Formulation

We now validate the proposed framework by running simulations of a swarm of 50 agents reaching the desired human intention. Below, we illustrate a particular execution of our framework.

Figures 7(a)-(d) show the desired human intention communicated by the human. Using , , , , , the planner was implemented for a horizon problem with subsystems. The communication ranges are , corresponding to the three operating modes. Figure 7(e) illustrates the intermediate shapes produced by the horizon planner, starting from the current intention (the triangle on the left) and ending at the desired intention (the larger rotated quadrilateral) on the right. The intermediate shapes look natural and the progression is gradual and intuitive, which supports the notion of HID. Figure 7(f) shows the evolution of the cost (12) and the switching strategy over a backward horizon. Switching occurs in a timely manner to maintain minimum cost according to (12): first from mode to mode , and later to the operating mode . This is consistent with the intuition of using larger communication radii for sparser swarms. As the scaling increases with every timestep, the agents are pushed further apart and the cost of using a smaller communication range rapidly increases, whereas the cost of using the largest range remains almost constant throughout because the connectivity and communication costs mostly stay the same. Figure 7(e) also shows the execution of the swarm controller during the horizon; each red dot represents an individual agent of the swarm. We evaluate the performance of the swarm controller (5) by measuring the error with respect to the intermediate formations and centroid at each time step . The formation error and the centroid error are measured as and , respectively, in reaching the intermediate goal. The evolution of these errors (y-axis) with respect to time (x-axis) is illustrated in Figures 7(g) and 7(h). The swarm successfully reaches every intermediate goal and finally attains the desired human intention.

(a) Current shape
(b) Desired Shape
(c) Desired rotation:
(d) Desired scaling:
(e) Planning and Execution
(f) Switching cost throughout execution
(g) Formation error
(h) Centroid error
Figure 7: Results of executing a particular desired behavior communicated by the human.
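The two error metrics tracked in Figures 7(g) and 7(h) can be given a concrete form. Since the paper's exact expressions are not reproduced here, the functions below are one plausible instantiation: the formation error compares the centroid-removed configuration against the centroid-removed target shape, and the centroid error compares the swarm centroid against the desired centroid.

```python
import numpy as np

def formation_error(P, P_des):
    """Distance between the centered formation and the centered target shape
    (the centroid is removed so only the shape mismatch is measured).
    P and P_des are (n, 2) arrays of agent and target positions."""
    return np.linalg.norm((P - P.mean(axis=0)) - (P_des - P_des.mean(axis=0)))

def centroid_error(P, c_des):
    """Distance between the swarm centroid and the desired centroid c_des."""
    return np.linalg.norm(P.mean(axis=0) - c_des)
```

For example, a formation that matches the target shape exactly but is translated has zero formation error and a centroid error equal to the translation distance, so the two metrics decouple shape tracking from centroid tracking as in Figures 7(g) and 7(h).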

6 Conclusions and Future Work

In this work we have proposed and implemented a novel HSI framework for formation control, in which the user draws a desired shape using intuitive gestures and the swarm depicts the drawn shape. We have combined diverse tools from control theory, network science, machine learning, signal processing, optimization and robotics to create this multidisciplinary framework. First, we demonstrated the effectiveness and intuitiveness of human interaction using this framework, whose accuracy and speed are comparable to those of standard interaction devices. Next, we proposed and utilized a notion of human interpretable dynamics, combined with switching systems, to plan natural intermediate shapes for the swarm that can be easily understood by both the human and the swarm. We also developed, analyzed and illustrated a novel decentralized formation controller capable of reaching any shape and centroid in the space. Lastly, we integrated the framework by developing a GUI environment that interacts with the user by means of gestures, with the rest of the framework encapsulated in the GUI using MATLAB simulations.

Future work will involve validating the proposed framework with robustness to noise and uncertainties. We also wish to learn the Human Interpretable dynamics from existing human behavior models and data.

We thank Mac Schwager for useful discussions regarding the HMM formulation used in this work. We also thank Chidi Ewenike, Ramon Duran and Tomaz Torres for their help in developing the Myo armband setup used in this work.


Preliminaries for the proof of Theorem 1

Let us first define the following quantities : , , ; and .

With these definitions, System (6) can be represented as:


where describes the system dynamics, the interconnection, and the drift of the system. Now resembles the shape-stabilizing JOR algorithm in Cortés (2009) with an additional centroid drift . From Cortés (2009) we know that this system converges to the desired shape up to some centroid translation. Henceforth, we ignore the drift while analyzing the overall system stability. As we show next, stability is established by first analyzing the convergence rates of each of the subsystems defined by , and by identifying suitable conditions on the interconnections , for . To this end, we use the Lyapunov function , defined over for .

Lemma 1

The subsystem is globally uniformly asymptotically stable at .

Considering , we have that the eigenvalues satisfy and that is a simple eigenvalue with right eigenvector , which shows that is globally stable. We can perform a similarity transformation on to get , where is the symmetric normalized Laplacian of the graph. The eigenvalues of are the same as those of , and its eigenvectors are those of scaled by a factor of . We perform a Hotelling deflation (Saad, 2003) on using the largest eigenvalue to get . In this way, we have deactivated the largest eigenvalue of , and we now have , where is the second smallest eigenvalue of the normalized Laplacian . We proceed by analyzing the stability properties of , which is equivalent to analyzing the stability of , since the eigenvalues and their related properties are the same.
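The deflation step can be checked numerically on a small example. The path graph, the step size eps, and the specific iteration matrix below are illustrative assumptions (the paper's exact JOR iteration matrix is not reproduced here); the point demonstrated is that Hotelling deflation removes the simple eigenvalue 1 and leaves a spectral radius governed by the second smallest eigenvalue of the normalized Laplacian.

```python
import numpy as np

# Path graph on 4 nodes: adjacency and symmetric normalized Laplacian.
A = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
d = A.sum(axis=1)
Lsym = np.eye(4) - A / np.sqrt(np.outer(d, d))

eps = 0.5                                    # illustrative step size
M = np.eye(4) - eps * Lsym                   # iteration matrix; eigenvalue 1 is simple
v = np.sqrt(d) / np.linalg.norm(np.sqrt(d))  # its unit eigenvector (D^{1/2} 1 direction)
M_def = M - np.outer(v, v)                   # Hotelling deflation of eigenvalue 1

lam2 = np.sort(np.linalg.eigvalsh(Lsym))[1]       # second smallest eigenvalue of Lsym
rho = np.max(np.abs(np.linalg.eigvalsh(M_def)))   # spectral radius after deflation
# rho equals 1 - eps * lam2: the residual convergence rate is set by lam2.
```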

With and we have

The above observation follows from the fact that is symmetric and , hence the eigenvalues satisfy , which makes negative definite. From Lyapunov theory, is globally uniformly asymptotically stable about the origin. From the theory of symmetric quadratic forms we also have the following inequality


which gives us a convergence rate for the dynamics. We now analyze the second subsystem. The matrix has 1 as a simple eigenvalue with eigenvector . The matrix is Schur stable and , where is the second smallest eigenvalue associated with the weighted graph . Hence we analyze the convergence of the system , which gives us the convergence rate for system .

Lemma 2

The system is globally uniformly asymptotically stable to the origin, and the convergence rate of system is proportional to .

With and we have

This follows from the fact that the eigenvalues satisfy , which makes negative definite. Hence, according to Lyapunov theory, is globally uniformly asymptotically stable to the origin. In addition,


which finally gives us the convergence rate for the dynamics.

The analysis of the third subsystem, , is trivial. Now, let us define the following constants: