1 Introduction
The search and coverage operations of autonomous robots have widespread applications such as floor cleaning, lawn mowing, oil spill cleaning, crop cutting and seeding, mine countermeasures, ocean floor inspection. These operations require Coverage Path Planning (CPP) [1][2][3][4][5], where a coverage path is needed for the robot to completely cover the search area while avoiding obstacles and having minimum overlapping trajectory to minimize the total coverage time.
Thus far, several CPP algorithms have been reported in literature [6][7] and a brief review is provided in Section 2.
1a Motivation
Although many CPP methods are available when using a single robot, only a limited body of work has focused on Multirobot Coverage Path Planning (MCPP). A popular control architecture in the existing MCPP methods is to split the overall workload into multiple tasks, and then use some singlerobot CPP method for coverage in each task [8][9].
However, since the robots typically operate in uncertain environments, they are prone to different failures such as sensor or actuator malfunctions, mechanical defects, loss of power [10]. The consequences of these failures include coverage gaps, loss of critical data, performance degradation (e.g., missed detections of targets), prolonged operation time, and in extreme cases overall mission failure. For example, coverage gaps in mine countermeasure operations can leave undetected underwater mines which are serious threats to traversing vessels. It is therefore critical that the robot team is resilient to failures, in the sense that it can sustain the overall team operation and protect the mission goals (e.g., complete coverage) even in presence of a few robot failures [11]. The role of resilience is to assure systemlevel survivability and fast recovery to normalcy from unanticipated emergency situations (e.g., robot failures). In the context of the MCPP problem, a resilient robot team is expected to autonomously reorganize the active robots in an optimal manner to complete the unfinished tasks of failed robots.
Secondly, it is also important that the robot team operates efficiently. Typically, due to incorrect, incomplete or lack of a priori knowledge of the environment, the initial task allocation may be suboptimal. As a result, some robots may finish their tasks earlier and become idle, which is a waste of their resources. Thus, it is critical that the robot team autonomously reallocates these idling robots in an optimal manner to assist other robots to reduce the total coverage time.
Fig. 1 illustrates the above concepts of resilience and efficiency. Fig. 1a shows an example of resilience where the neighbors of a failed robot proactively negotiate to decide whether any of them should leave its current task to fill the coverage gap. Fig. 1b shows an example of efficiency where a group of robots that have finished (or are close to finish) their current tasks negotiate to optimally reallocate to new tasks or to help other robots in their existing tasks.
1B Challenges
The challenges associated with the problem of resilient and efficient MCPP are presented below.

Scalability: The MCPP algorithm should be scalable to accommodate a growing number of tasks and/or robots, thus making a distributed control structure appropriate.

Optimization factors: The optimization for task reallocation must consider the following factors:

Task worths, which can be quantified by the expected number of undiscovered targets (e.g., crops to cut or mines to discover) in the tasks.

Probabilities of success of the available robots in finishing the contested tasks, which depend on various factors including their current energy levels, the costs of traveling to the contested tasks, and the costs of finishing those tasks.


Dynamically changing conditions: The conditions of robots as well as tasks change dynamically during coverage. The task worths decrease as targets are discovered. On the other hand, the robots drain their batteries during exploration, hence decreasing their probabilities of success. Therefore, the optimization process must accommodate these dynamic factors.

Computation time: First of all, the optimization must be eventdriven, i.e., triggered only in case of failures and/or idling. Secondly, once the optimization is triggered, the task reallocation decision must be made in a timely manner to avoid prolonged coverage time, thus motivating a local distributed eventfocused optimization over only a subset of available robots and tasks.

Connection between local and global objectives: Although the local optimization decision can be suboptimal for the whole team, it is important that it is still aligned with the global objectives of the team. In other words, the local optimization must not only benefit the involved robots but also the whole team. The objectives include early detection of remaining targets, reduction in the total coverage time, and complete coverage.

Complete coverage: The MCPP algorithm must guarantee complete coverage of the a priori unknown environment.
1C Our Contributions
To the best of our knowledge, the concept of resilient coverage has not been adequately addressed in the existing MCPP methods. Thus, we present a novel online MCPP algorithm for resilient and efficient coverage in a priori unknown environments, which addresses the challenges discussed above. The algorithm is called Cooperative Autonomy for Resilience and Efficiency (CARE). For coverage control in each task, CARE utilizes our previously published algorithm [1], which is a singlerobot online CPP algorithm. More details of the algorithm are provided in Section 2.
The CARE algorithm operates in a distributed yet cooperative fashion. Each robot is controlled using a Discrete Event Supervisor (DES), which triggers games between a set of feasible players in the event of a robot failure or idling, to make collaborative decisions for task reallocations. The gametheoretic structure is modeled using Potential Games [12], where the utility of each player is connected with a shared objective function for all players. In case of no failures, CARE reallocates idling robots to support other robots in their tasks; hence reduces coverage time and improves team efficiency. In case of robot failures, CARE guarantees complete coverage via filling coverage gaps by optimal reallocation of other robots; hence providing resilience, albeit with a possibly small degradation in coverage time.
The CARE algorithm has been validated on a highfidelity robot simulator (i.e., Player/Stage) in various complex obstaclerich scenarios. The results demonstrate that the team achieves complete coverage under failures and enables faster target discovery as compared to three alternative methods.
The rest of the paper is organized as follows. Section 2 presents a brief review of the existing MCPP algorithms. Section 3 formulates the MCPP problem and Section 4 presents the details of the CARE algorithm. The results are discussed in Section 5 and the paper is concluded in Section 6 with recommendations for future work.
2 Related Work
Existing CPP methods can be categorized as offline or online (i.e., sensorbased). Offline approaches assume the environment to be a priori known, while online approaches generate coverage paths in situ based on sensor information. Independently, CPP approaches are also described as randomized or systematic. Random strategies follow simple behaviorbased rules, requiring in some cases no localization system or costly computational resources. However, they tend to generate strongly overlapped trajectories, thus unsuitable for timecritical operations. In contrast, systematic approaches are typically based on cellular decomposition [13][14] of the search area into subregions, and then adopt certain predefined path pattern (e.g., back and forth) for coverage in each subregion; or by partitioning the search area into grid cells and then constructing potential field [15] or spanning trees [16][17] to generate coverage paths. In our previous work [1], we presented the algorithm for singlerobot CPP in unknown environments. The
algorithm uses an Exploratory Turing Machine (ETM) that consists of a 2D multilevel tape formed by Multiscale Adaptive Potential Surfaces (MAPS). The ETM stores and updates the explored, unexplored, and obstacle regions as timevarying potentials on MAPS. By default, the ETM adopts the lowest level of MAPS to generate the next waypoint; while it switches to higher levels as needed to evacuate from a local extremum. It is shown that
is computationally efficient for realtime applications, guarantees complete coverage in unknown environments, and does not require cellular decomposition of search area using critical points on obstacles. In CARE, is used as the baseline coverage method by each robot to search within its task.CARE  Firstresponder [9]  Brick & Mortar [18]  ORMSTC [19]  
Path Pattern  Back and forth  Back and forth  No obvious pattern observed  Spiral  

















In terms of MCPP, Batalin and Sukhatme [20] proposed two local approaches for MCPP in unknown environments, based on mutually dispersive interaction between robots. Latimer et al. [21] presented a boustrophedon cellular decompositionbased approach using a team of circular robots. The robots operate together, but can split up into smaller teams when cells are created or completed. Rekleitis et al. [22] presented a distributed auctionbased cooperative coverage algorithm, where the whole space is partitioned into tasks of fixed height and width, and robots utilized the Morse decomposition based singlerobot CPP algorithm to search within each task. Sheng et al. [23] proposed a multirobot area exploration method with limited communication range, where the waypoint of each robot is computed using a distributed bidding mechanism based on frontier cells. The bids rely on the information gain, communication limitation, and traveling costs to frontier cells. Rutishauser et al. [24] presented a distributed coverage method using miniature robots that are subject to sensor and actuator noise. Xu and Stentz [25] presented the Rural Postman Problem (RPP) algorithm to achieve environmental coverage with incomplete prior information using robots, that seeks to equalize the lengths of paths. Bhattacharya et al. [26] generalized the control law towards minimizing the coverage functional to nonEuclidean spaces, and presented a discrete implementation using graph searchbased algorithms for MCPP. Karapetyan et al. [27]
presented two approximation heuristics for MCPP in known environments, where the search area is divided into equal regions and exact cellular decomposition based coverage was used to search each region. Later, these methods were improved to consider vehicle kinematic constraints
[28]. Yang et al. [29]proposed an online neural network based MCPP approach. In their method, the discovered environment was represented according to the dynamic activity landscape of the neural network, which is used to compute robot waypoints; and robots treat each other as moving obstacles during operation.
However, the abovementioned algorithms have not addressed the problem of resilience in MCPP. In this regard, Agmon et al. [19] presented a family of Multirobot Spanning Tree Coverage (MSTC) algorithms, where the Online Robust MSTC (ORMSTC) algorithm enables each robot to incrementally construct a local spanning tree to cover a portion of the whole space. If some robot fails, its local tree is released and taken over by its neighbors, but the already explored region of the failed robot must be scanned again. Also, the tree grows on the scale of cells, while if any cell within such larger cell is occupied by obstacles, the whole larger cell would not be covered, thus leading to incomplete coverage. Zheng et al. [30] presented a polynomialtime MultiRobot Forest Coverage (MFC) algorithm that computes tree covers for each robot with trees of balanced weights, and they showed the superiority of MFC in time to MSTC via simulations; however, their algorithm does not consider failures. Song et. al. [9] presented the FirstResponder (FR) cooperative coverage strategy, where early completed robots are reassigned to available new tasks that can maximize their own utility. However, this algorithm is not proactive, i.e., the coverage gaps caused by robot failures will not be filled until some other robots complete their tasks. Ferranti et. al. [18] presented the Brick and Mortar (B&M) algorithm, where the waypoint of each robot is computed locally based on the states of cells in the neighborhood. The idea behind B&M is to gradually thicken the blocks of inaccessible cells (i.e., visited or wall cells), while maintaining the connectivity of accessible cells (i.e., explored or unexplored cells). An unexplored cell can be marked as explored or visited, where the latter is allowed if it does not block the path between any two accessible cells in the neighborhood. The waypoint gives priority to the unexplored cell in the neighborhood, which has the most inaccessible cells around it. When some robot fails, the remaining robots continue regularly and the coverage gap becomes an extra workload; however, their method may produce redundant coverage due to the looping problem.
Research Gap: Although resilience concepts have been discussed in robot design [31], robot damage detection and recovery [32], flocking of robot teams [33] and networked control systems security under attacks [34], there is a scarcity of efforts that deal with the resilient coverage using multiple robots. Some of abovementioned papers considered robot failures during coverage, however, their remedy was to simply release the coverage gaps to the remaining team, without optimization over the criticality (i.e., available worth) of such coverage gaps and the reliability of remaining robots. Thus, they are not proactive in filling the coverage gaps immediately if they satisfy optimization criteria, they wait until some other robots finish their tasks. In this regard, this paper presents a gametheoretic method for resilient and efficient coverage that incorporates these optimization factors while making eventdriven proactive task reallocations. Table I presents a comparison of the key features of the CARE algorithm with the other relevant online MCPP algorithms.
3 Problem Description
This section presents the description of the robots, the MCPP problem and the performance metrics.
3a Description of the Robots
Let be the team of robots, which are unmanned autonomous vehicles, as shown in Fig. 2. It is assumed that each robot is equipped with:

a localization device (e.g., GPS) or a SLAM [35] system for operations in GPSdenied environments;

a range detector (e.g., a laser) to detect obstacles within a radius ;

a task specific sensor for performing the desired task; and

a wireless communication device for (periodic or eventdriven) information exchange between all pairs of robots. The communication is assumed to be perfect.
The robots continuously deplete the energies from their batteries; thus, their reliability is assessed based on the remaining energy as presented below.
Battery Reliability: Each robot , is assumed to carry a battery whose reliability [36], denoted as , can be computed as , where is the probability of battery being drained up to time . Typically, the stateofcharge of a battery can be model using the realistic Kinetic Battery Model (KiBaM), which takes into account many important nonlinear properties of batteries such as the ratecapacity effect and the recovery effect [37]. It is shown in [38] that with KiBaM,
follows a Sshaped curve when operating under different stochastic workload models (e.g., the on/off model and the burst model). The Sshaped curve can be approximated using a sigmoid function
[36]. As such, the reliability of a robot is given as:(1) 
where and indicate the curvature of the growth part and the inflection point, respectively. Their exact values depend on the choice of batteries. More details on the selection of these parameters are presented in Section 5.
3B The MCPP Problem
The search area is assumed to be a planar field whose borderline is defined either by a hard barrier (e.g., walls or obstacles) or by a soft boundary (e.g., subarea of a large field). A finite but unknown number of obstacles with arbitrary shapes are assumed to populate this area, but their exact locations and shapes are a priori unknown.
For the purpose of coverage path planning, a tiling is constructed to cover , i.e. , as shown in Fig. 2. Each is called an cell, which is a squareshaped cell of side length . The tiling is formed as minimal such that all cells are disjoint from each other, i.e., , , where denotes the interior; and the removal of any single cell from will destroy the covering.
The tiling is partitioned into three sets: i) obstacle (), ii) forbidden (), and iii) allowed (). While the cells in are occupied by obstacles, the cells in create a buffer around the obstacles to prevent collisions due to inertia or large turning radius of the robots. Due to lack of a priori knowledge of the environment, the obstacle cells and forbidden cells are discovered online using sensor measurements. The remaining cells are allowed, which form the free space that is desired to be covered.
For distribution of multiple robots, an initial task allocation is required. Thus, the tiling is grouped into disjoint regions , s.t. . Each region is regarded as one task and is referred as task . Fig. 2 shows an example of the area with tasks. Each robot can work on one task at a time, but one task can be assigned to multiple robots. Note that may not be equal to .
Remark 3.1.
The problem of optimal space partitioning into disjoint tasks and optimal initial robot allocations may require consideration of several factors (e.g., obstacle distribution, robot capabilities, and terrain types (or bathymetry)) and is beyond the scope of this paper. Here, we assume that no a priori knowledge of the environment is available, thus the tasks are made of equal sizes. However, as more information is obtained during exploration, eventdriven task reallocations are performed for performance improvement.
Definition 3.1 (Complete Coverage).
Let be the cell that is visited and explored by the robot at time . Then the robot team is said to achieve complete coverage, if , s.t. the sequences , jointly cover the free space , i.e.,
(2) 
In other words, the coverage is said to be complete if every cell in is explored by at least one robot.
Next, it is assumed that each task contains randomly distributed targets, and their exact numbers and locations are unknown (details in Section 4B). However, it is assumed that the expected number of targets in each task is known, which in practice could be obtained by various means such as field surveys, aerial views or prior knowledge from other sources.
Remark 3.2.
If the total number and spatial distribution of targets is known a priori, then complete coverage may not be necessary and an optimal traversing strategy could be constructed to find all the targets. However, in this paper, we assume that the planner neither knows the exact number of these targets, nor their exact locations, thus complete coverage becomes mandatory to guarantee finding all the targets.
Due to nonuniform spatial distribution of targets and obstacles within each task, the targets are discovered at unequal rates by all robots. Thus, at any point of time all tasks could contain significantly different numbers of undiscovered targets. It is therefore critical that the regions with the maximum number of targets are scanned earlier and are given priority. Early detection of targets helps when the mission is terminated prematurely due to emergencies, failures or other reasons and also provides mental comfort to the operator. For example, once the highly utilized areas of a building floor are cleaned then other areas could be cleaned gradually at ease.
Furthermore, the robots may suffer from unexpected failures during the coverage operation due to several reasons (e.g., sensor (or actuator) malfunctions, and mechanical defects) which lead to coverage gaps. Thus, it is important to fill these coverage gaps by task reallocations of healthy robots. It is also important that the criticality of the task of the failed robot, as measured by the expected number of remaining targets, is evaluated for taskreallocations in comparison with the existing tasks of healthy robots.
3C Performance Metrics
The quality of multirobot coverage can be evaluated based on the following performance metrics:

Coverage ratio (): The ratio of the explored free space to the total free space, i.e.,
(3) Note that if the coverage gaps caused by robot failures are left unattended.

Coverage time (): The total operation time of the team. This is measured by the last robot that finishes its task.

Remaining reliability (): The remaining reliability of all live robots at the end of the operation.

Number of Targets Found (): The total number of targets discovered by the whole team.

Time of Target Discovery (): The time for the whole team to discover a certain percentage of all targets. Note that the time of discovering all targets is less than or equal to the coverage time. Only in the limiting case, when the last target is discovered in the last visited cell by the robot that stops last, the coverage time will be equal to the for all targets.
The objective of MCPP is to achieve (even under a few robot failures), while minimizing , minimizing , and maximizing .
4 CARE Algorithm
The CARE algorithm addresses the abovementioned MCPP problem via facilitating distributed eventdriven task reallocations. In CARE, a set of local robots jointly replan their task assignments in two situations: (1) when a robot has finished its current task, or (2) when a robot has failed and is detected as nonresponsive. The replanning algorithm employs a gametheoretic formulation, which computes the task worths and the success probabilities for each participating robottask pair as optimization factors for optimal task reallocations. The task worths are measured by their expected number of undiscovered targets, while the probabilities of success of robottask pairs are computed based on the robots’ battery reliabilities, travel times, and predicted times to finish the contested tasks.
CARE utilizes a distributed yet cooperative control architecture, where each robot is controlled by a Discrete Event Supervisor (DES) that is modeled as a finite state automaton [39].
4a Discrete Event Supervisor
The DES as shown in Fig. 3 is defined below.
Definition 4.1 (Des).
The DES, denoted as , is a deterministic finite state automaton represented by a tuple as follows
where:

is the set of states, where ‘Start’, ‘Working’, ‘Noidling Game’, ‘Resilience Game’, ‘Idle’, ‘Failed’ and ‘Stop’.

is the finite set of events.

is the partial state transition function. It is defined from one state to another if and only if there exists an arrow connecting them carrying an event.

is the initial state.

is the set of marked states, which means a robot can either stop after finishing all the tasks or it may fail unexpectedly.
While the states , , and are selfexplanatory, the operations in states , and are described as follows. In state , the supervisor of robot adopts the algorithm [1] for online coverage within its own task. Since no a priori information is available, all cells are initialized as unexplored. As the robot explores its task, it updates these cells as explored, obstacles and forbidden as suitable to track the progress of exploration [40]. This information is then periodically shared and synchronized with other robots such that each robot maintains a symbolic map of the entire region.
In states and , triggers the Optimizer to play resilience games and noidling games, respectively. The objective of resilience games is to optimally reorganize the neighbors of the failed robot to immediately fill the coverage gap, if it contains higher worth; while for noidling games, the objective is to optimally reallocate the idling robot and its nearfinishing neighbors to help other robots to reduce coverage time and collect more worth early. Details of Optimizer functionality are explained later in Section 4B.
Events and State Transitions: The events in enable state transitions in , which are explained below. First, we define:

to be the allocation function that indicates the current task allocations of robots;

to be the remaining time required to complete a given task by its assigned robots;

to be the number of unexplored cells in a given task.
Now, consider a robot that is currently working in task . Event is generated when is turned on, and moves to the state to start searching in task using the algorithm.
Event is produced if any of its neighbor robot fails. This transitions to the state , that in turn invokes the Optimizer to play the resilience game to generate a task reallocation decision for . Failure of a robot is detected using a standard mechanism based on heartbeat signals [41]. Each robot periodically broadcasts heartbeat signals, and also listening from others. Then a neighbor robot is detected as failed if its message is not received by constantly for a certain period of time . To ensure robustness to false alarms, its failure is further confirmed if the majority of neighbors detect its failure. Event occurs as soon as task is completed, i.e., the number of unexplored cells in task , denoted as , becomes . Event moves to the state , where the Optimizer is called to play the noidling game for finding a new task for .
Event appears if the Optimizer assigns a new (or current) task to robot , which drives back to the state to search in the assigned task; otherwise if no task is assigned, event is generated that moves to the state and the robot becomes idle. Event is produced if some neighbor robot just completed its task and triggered the noidling game, while is close to finish task , i.e., , hence ready to reallocate after finishing the current task. Specifically, , where is the speed of tasking a cell by the assigned robots. Then again, comes to the state and the Optimizer is invoked to compute for a new task.
Event occurs when the entire area is covered, by satisfying Eq. (2). This happens when no unexplored cells are left in the whole region, i.e., . This moves to the terminal state and the coverage is complete. At last, event is generated if itself is diagnosed as failed by its own diagnosis, and moves to the state . An advanced failure diagnostic tool is beyond the scope of this paper.
4B Distributed Optimizer
The Optimizer is invoked by the supervisor to compute reallocation decisions under two conditions: (i) reaches state upon detection of a neighbor failure (i.e., event ); or (ii) reaches state upon completion of its own task (i.e., event ), or completion of a neighbor’s task (i.e., event ).
Specifically, the Optimizer is built based on the concept of Potential Games [12]
, which have the following advantages: (i) at least one Nash Equilibrium is guaranteed to exist, (ii) several learning algorithms are available (e.g., the MaxLogit algorithm
[42][43]) that can converge fast to the optimal equilibrium, and (iii) the utility of each player is perfectly aligned with a globally shared potential function, thus as each player seeks to increase its own utility, the potential function is simultaneously improved and maximized upon reaching the optimal equilibrium.Before presenting the details of the Optimizer, we list the various useful parameters in Table II. Some mathematical preliminaries are presented below.
Preliminaries: A game in strategic form [44] consists of:

A finite set of players , which includes all available robots that could be reallocated.

A nonempty set of actions associated to each player . In this paper, each action corresponds to the index of an available task, and the action set is assumed identical for all players, i.e., , .

The utility function associated with each player , defined as , where denotes the set of joint actions for all players.
The utility function computes the payoff that can receive by taking an action , given that the rest of the players jointly select , where . A joint action of all players is often written as .
Parameter  Description  

Total number of robots  
Total number of tasks  
Curvature of the growth part in battery model  
Inflection point in battery model  
Robot traveling speed  
Robot tasking speed  
Expected number of targets in task  




Neighborhood size in noidling games  
Neighborhood size in resilient games 
Definition 4.2 (Nash Equlibrium).
A joint action is called a pure Nash Equilibrium if
(4) 
Definition 4.3 (Potential Games).
A game in strategic form with action sets together with utility functions is a potential game if and only if, a potential function exists, s.t.
(5) 
and .
A potential game requires perfect alignment between the utility of an individual player and the globally shared potential function for all players, in the sense that the utility change by unilaterally deviating a player’s action is equal to the amount of change in the potential function. In other words, the potential function can track the changes in payoffs as some player unilaterally deviates from its current action. Therefore, if is designed as the global objective, then as players negotiate towards maximizing their individual utilities, the global objective is simultaneously optimized.
Now, let us present the resilience games and noidling games modeled as potential games.
Specifics of Resilience Games and Noidling Games: Due to different objectives and triggering conditions, the player set and action set are fundamentally different for resilience games and noidling games. Let denote the set of nearest neighbors of robot .
Noidling Game: A noidling game is triggered when some robot completes its current task and becomes idle. Then, it calls its nearest neighbors that are close to finish their tasks to participate in the game. Thus, a noidling game comprises of:

.

, which contains incomplete tasks that have sufficient work left to be finished by their currently assigned robots. If some players still have some work left in their current tasks, they are assigned such that they finish their current tasks before being reallocated to new tasks.
Resilience Game: A resilience game is triggered when some robot fails. Then, the nearest neighbors of are involved in the game to reoptimize their current task allocations. Thus, a resilience game comprises of:

.

, which contains the current tasks of all players and the failed robot. The condition ensures that those tasks close to be finished will be completed by their currently assigned robot and hence not needed to be part of the game.
Remark 4.1.
If there exist other active robots working in the same task of the failed robot, then they will take over this task and no resilience game is triggered.
Remark 4.2.
When a game is initiated, the information is exchanged and synchronized between all players, including their locations, discovered environment maps, success probabilities and estimated task worths.
Although the game specifics are different for the resilience and noidling games, they follow the same design of the potential function and utility function as follows.
Design of Potential Function for Task Reallocations: As explained earlier in Section 1B, the players must analyze the following optimization factors during task reallocation:

Task worths, which can be quantified by the expected number of undiscovered targets in the tasks.

Probability of success of each player to finish a certain task, which depends on its current battery reliability, the cost of traveling to the new task, and the cost of finishing the new task.
Thus, the potential function for all players in the game is defined to be the total expected worth [45] obtained by choosing a joint action , as follows.
(6) 
where denotes the subset of players that choose the same task in the joint action ; is the current available worth of task ; and is the success probability of player to finish task . The term is the joint success probability for all players to finish task together.
As exploration continues, the conditions of robots and tasks change dynamically. Thus, the success probability and the task worth in Eq. (6) must be updated before a game is played.
Computation of Success Probability: The success probability is evaluated online using Eq. (1) as follows.
(7) 
where is the reliability of player at time , which is estimated as
(8) 
where is the total tasking time of since the beginning until the game was initiated, is the traveling time to task , and is the estimated time to complete task . Specifically, , where measures the distance between player ’s current location and the centroid of task , and is the robot’s traveling speed; and the time , where is the speed of tasking a cell by the assigned robots.
In addition, if a robot is selected as a player to find a new task but it still has a small portion left in its current task, then it would like to first finish this task before being reallocated to a new one. Hence, an extra term is included in Eq. (8) if the estimated time to complete the unfinished part of its current task satisfies .
Computation of Task Worths: The worth in Eq. (6) indicates the expected number of undiscovered targets in task that are available to the players. Let
be a random variable that denotes the total number of targets in task
, which is assumed to follow the Poisson distribution with parameter
. Its probability mass function is given as:(9) 
If targets have been already discovered in task , then the estimated remaining number of targets, , is computed as:
By definition, Poisson distribution has mean , i.e., . Also, one has . Thus, is computed as:
(10) 
Next, we decide the portion of available to the players, i.e., . Since task may contain some robots currently working there but are not participating in the task reallocation, i.e., they are not players, then if a player selects task , it must work together with these existing robots. In turn, the maximum payoff a player could expect from task becomes less due to sharing with the existing robots. Let denote the subset of robots that are not players. Similarly, let be the set of nonplayer robots that are currently working in task , which have a joint success probability for task , i.e., . Then is computed as:
(11) 
Utility Function of Each Player: In order to form a potential game, the utility function, together with the potential function defined in Eq. (6), must satisfy Eq. (5). Since the utility of a player also depends on the actions taken by the rest of the players, thus a rule is needed to distribute the total produced payoff among contributing players. In this regard, this paper adopts the concept of Marginal Contribution due to its low computation burden thus feasible for online decisionmaking [45].
Definition 4.4 (Marginal Contribution).
The marginal contribution of player in a joint action is
(12) 
where represents player ’s null action, indicating no task is assigned to it.
Note that for any task not selected by player , i.e., , one has . Thus, the produced potentials in these tasks are canceled in the above equation. It can then be further simplified as below, where is the worth of task .
(13) 
Proposition 4.1.
Proof.
Given a joint action , the difference in potential when player deviates its action from to is:
Thus game satisfies Eq. (5) and it is a potential game. ∎
In this paper, the optimal equilibrium is acquired using the MaxLogit algorithm [42]. Before any game starts, each player computes its success probability using Eq. (7), and updates the estimated task worth using Eq. (10). Then, necessary information are communicated and synchronized as mentioned in Remark 4.2.
Algorithm 1 presents details to acquire in a distributed manner using MaxLogit. In particular, the initial joint action (line 1) is initialized as follows: for resilience games, is set as the current task of player , while for noidling games, is randomly picked from ; then, is determined via synchronization with all other players.
Once is obtained using Algorithm 1, the new task for player is set as its action in the equilibrium .
Postgame Coordination: If multiple robots (including both existing robots and incoming players) are assigned to the same task , it becomes imperative to utilize some strategy to ensure their safety and efficiency when searching together. Let be the maximum number of robots allowed to work in the same task at the same time. In this regard, task is evenly partitioned into subregions, where each subregion is only allowed one robot at a time.
Consider some nonplayer robot that is currently working in task . It continues as usual but its task is restricted to the subregion determined by its current location. This produces incomplete subregions that are instantly available to the incoming players. Now consider a player that is also assigned to task . It selects the subregion by following the steps below. First, it computes its rank in based on its success probability. If it ranks in the top and all other players ranked above it have selected their new subregion, then it selects the new available subregion for itself that minimizes its traveling distance. However, if it ranks after , it stays temporarily idle but can later be reactivated to replace any robot in task should it fail. Once finds a new subregion, its centroid is set as the movement goal. As described previously, resumes to search its new subregion using the algorithm upon its arrival, and its supervisor transitions to the state accordingly.
4C Computational Complexity of the Optimizer
As described above, once the Optimizer triggers a game involving the player set and the action set , the joint action is first initialized locally and then synchronized with other players. This process takes complexity.
Thereafter, the game follows Algorithm 1 in a distributed manner, which operates in a loop for a userdefined computation cycles. At each cycle, one player is randomly selected and is allowed to probabilistically alter its action, which takes to find out if is selected. If not, its action is repeated, which takes complexity; otherwise, first randomly chooses an alternative action with equal probability, which is . Then the associated utility is computed using Eq. (4B), which takes complexity. Thereafter, uses to update its action in a probabilistic manner, which has . At the end of each cycle, the updated action is transmitted to other players, which requires complexity.
Therefore, in the worst case where is selected in every cycle, the total complexity becomes . In comparison, for a centralized optimization algorithm, it must search over possible joint actions, which grows significantly faster as and increase.
4D Connection between Local Games and Team Potential
As discussed earlier, in both resilience games and noidling games, the potential function is optimized for the set of players, which form a subset of the robot team. Now, we show that the increase in will directly improve the performance of the whole team.
To illustrate this, let denote the total team potential that defines the total expected worth achievable by the team, where is the joint action of the team including players and nonplayers . Note for the nonplayers, the action simply represent their current tasks. Since the players and nonplayers are mixed and distributed over different tasks, the total team potential is defined as:
(14) 
where is the set of all robots that are assigned to task in the joint action , and the term within the parentheses on the right hand side computes the joint success probability to complete task by all of its assigned robots.
As the players reach the optimal equilibrium, the joint action becomes and the team potential becomes .
Theorem 4.1.
The optimal equilibrium increases the total team potential , i.e., .
Proof.
First, let us show that the team potential is separable by the worth created by the players (i.e., ) and the rest of the robots (i.e., ). Then we will investigate the change of due to task reallocation. Now can be decomposed as follows.
(15) 
where and are used to denote the joint success probability of the players and the rest of the robots that are assigned to task in the joint action , respectively.
Then we can further break down as follows.
(16) 
where the second term in the last step is the worth generated by the players (if any) that would like to finish the small unfinished part in their current tasks before being reallocated to new tasks, while the third term indicates the worth generated by the nonplayer robots . The values of both these terms do not change by games. Since , , we have . ∎
Game Performance Metrics: The quality of the task reallocation decision (i.e., for the players and for the team) can be evaluated by the worth gain. Note that in any task reallocation, there is a tradeoff between whether the robot should continue with its current task or reallocate to a new task. Thus, where a higher gain implies early detection of targets. Specifically, at playerlevel, the Gain of Players () is defined as
(17) 
Similarly, at teamlevel, the Gain of Team () is
(18) 
Note that since and , both and are nonnegative, which implies that the outcome of a game results in the gain of worth not only for the players but also for the whole team. Both and will be quantitatively examined in Section 5.
4E Complete Coverage under Failures
The success of finding all hidden targets relies on the complete coverage of the whole area . Due to the completeness of the underlying singlerobot coverage algorithm [1], each task can be fully covered by the assigned robot in finite time if it stays alive. Now, let us examine coverage under failures.
Theorem 4.2.
The CARE algorithm guarantees complete coverage in finite time as long as one robot is alive.
Proof.
Consider a robot that is alive during the whole operation, whose supervisor starts with the state upon robot being turned on. We show below that must reach the terminal state in finite time, which happens if and only if , i.e., complete coverage.
First, as shown in Fig. 3, any cycle between states in involves either state or . Also, a robot can reach the states or due to completion of some task or failure of some robot, respectively. Now, since there are only a finite number of robots (i.e., ) and a finite number of tasks (i.e., ), each robot can visit these states only a finite number of times. Thus, cannot have any live lock. Moreover, in states or , it takes a finite amount of time to reach an equilibrium solution using the MaxLogit algorithm. Thus, will always switch to either state or after games. In state , the underlying algorithm is used to explore in the current assigned task of robot . As shown in [1], constantly reduces until task is completed in finite time, so can only stay in state for finite time.
Further, in state , can either be invoked to play new games and hence move to states or , or it can move to the state upon complete coverage, i.e., . Since the former case can only happen for a finite number of times, will come back to the state when no incomplete task is available to anymore. The same logic applies to all other active robots. Thus, all active robots including will reach state in finite time, which implies that no incomplete tasks exist, i.e., . Then, they all transition to the terminal state and the complete coverage is achieved. ∎
5 Results and Discussion
The CARE algorithm was validated on the highfidelity robotic simulation platform called Player/Stage [46] using a computer with GHZ CPU and GB RAM. The Player provides a software base whose libraries contain models of different types of robots, sensors and actuators. On the other hand, Stage is a highly configurable robot simulator.
In this section, we present the performance of the CARE algorithm in three complex obstaclerich scenarios. The search area of size , was partitioned into a tiling consisting of cells of size . The was partitioned into tasks , each of size . Each task was initially assigned with one robot, and a maximum number of robots were allowed to search together in one task. Each task contained an unknown number of targets distributed randomly according to the Poisson distribution with mean .
A team of Pioneer 2AT robots was simulated, where each robot has dimensions of , and was equipped with beam laser scanners with a detection range of . The kinematic constraints of the robot, such as the top speed of and the minimum turn radius of , were included in the simulation model. The tasking speed was set as cells/s. The parameters and in the battery reliability model are chosen such that each robot can finish one and two tasks with more than and remaining reliability, respectively. Specifically, based on the size of each task ( cells) and the robot tasking speed, it takes to finish an obstaclefree task. Then, using Eq. (1):
which lead to and
s. Then, considering stochastic uncertainties in the initial battery charging conditions, these parameters are generated on different robots using Gaussian distributions, s.t.,
and, where the standard deviation is chosen as
of the corresponding mean value.Initially, due to lack of a priori knowledge of the environment, all cells are initialized as unexplored, and as the robot explores the environment, the obstacle and forbidden cells are discovered and updated accordingly. The game parameters are chosen as: and , and in MaxLogit, the number of computation cycles is set as and learning parameter is . The other parameters and are chosen as follows. We set such that it corresponds to less than of the time to finish one task, i.e., . Thus, robots which have only of the task left will participate as players for noidling games. Similarly, we set such that it corresponds to over of the time to finish one task, i.e., . Thus, tasks which have still more than unexplored area become contested tasks. Hence, further rounding up we used and .
5a Scenario 1: No Failures but Some Robots Idle
Fig. (a)a presents the cooperative coverage of a complex islands scenario. A total number of targets were distributed randomly in the field. No failure appeared throughout the whole search, while two noidling games were triggered to reallocate early completed robots to reduce the coverage time. Each subfigure in Fig. (a)a, i.e., Fig. (a)a(1)(a)a(8), is comprised of a top figure showing the trajectories of robots by different colors, and a bottom figure showing the corresponding overall symbolic map of the entire search area , which is periodically synchronized and merged by all live robots. The different colors in the symbolic map represent the following regions: i) light green for obstacles, ii) medium green for unexplored, iii) dark green for explored with no obstacles, and iv) yellow for the forbidden region around the obstacles.
Fig. (a)a shows that the robots started exploration and used their onboard sensing systems to explore the a priori unknown environment. Fig. (a)a(2) shows that the robots continue searching within their assigned tasks. Fig. (a)a shows the instance when robot finished task and triggered a noidling game . The player set was formed as , where
was near finishing its task. At that moment, tasks
, , and still had a lot of area unexplored and required significant time to finish by their currently assigned robots, thus they formed the action set
Comments
There are no comments yet.