Autonomous robot navigation is a fundamental problem that has received considerable research attraction . The basic motion planning problem consists of generating robot trajectories that reach a desired goal region starting from an initial configuration while avoiding obstacles. More recently, a new class of planning approaches have been developed that can handle a richer class of tasks than the classical point-to-point navigation, and can capture temporal and Boolean specifications. Such tasks can be, e.g., sequencing or coverage , data gathering , intermittent communication , or persistent surveillance , and can be modeled using formal languages, such as Linear Temporal Logic (LTL) [6, 7]. Several control synthesis methods have been proposed to design correct-by-construction controllers that satisfy temporal logic specifications assuming robots with known dynamics that operate in known environments [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]. As a result, these methods cannot be applied to scenarios where the environment is initially unknown and, therefore, online re-planning may be required as environmental maps are constructed; resulting in limited applicability. To mitigate this issue, learning-based approaches have also been proposed that consider robots with unknown dynamics operating in unknown environments. These approaches learn policies that directly map on-board sensor readings to control commands in a trial-and-error fashion; see e.g., [20, 21, 22, 23, 24]. However, learning approaches, in addition to being data inefficient, are specialized to the environment they were trained on and do not generalize well to previously unseen environments.
In this paper, we address a motion planning problem for a team of mobile sensing robots with known dynamics that operate in environments with metric and semantic uncertainty. Specifically, the uncertain environment is modeled by static landmarks with uncertain classes (semantic uncertainty) located at uncertain positions (metric uncertainty) giving rise to an uncertain semantic map. The semantic map is determined by Gaussian and arbitrary discrete distributions over the positions and labels of the landmarks, respectively. Such maps can be initially user-specified or can be obtained and updated by recently proposed semantic Simultaneous Localization And Mapping (SLAM) algorithms [25, 26]. We consider robots equipped with noisy sensors that are tasked with accomplishing collaborative tasks captured by a global temporal logic formula in the presence of metric and semantic uncertainty. To account for sensing and environmental uncertainty, we extend Linear Temporal Logic (LTL) by including sensor-based predicates. This allows us to incorporate uncertainty and probabilistic satisfaction requirements directly into the mission specification. First, we formulate the planning problem as an optimal control problem that designs open-loop sensor-based control policies that satisfy the assigned specification. To solve this problem, we propose a new sampling-based approach that explores the robot motion space, the metric uncertainty space, and an automaton space corresponding to the assigned mission. The open loop control policies can be updated online at time instants determined by an automaton to adapt to the semantic map that is continuously learned using existing semantic SLAM algorithms [25, 26]. To ensure that the proposed sampling-based approach can efficiently explore this large joint space, we build upon our previous works [27, 28] to design sampling strategies that bias explorations towards regions that are expected to be informative and contribute to satisfaction of the assigned specification. We show that the proposed sampling-based algorithm is probabilistically complete and asymptotically optimal under Gaussian and linearity assumptions in the sensor models. Extensions of the proposed algorithm to account for mobile landmarks and nonlinear sensor models are discussed as well. We provide extensive experiments that corroborate the theoretical analysis and show that the proposed algorithm can address large-scale planning tasks under environmental and sensing uncertainty.
I-a Related Research
I-A1 Sampling-based Temporal Logic Planning
Sampling-based approaches for temporal logic planning have also been proposed in [29, 17, 18, 30] but they consider known environments. In particular, in , a sampling-based algorithm is proposed which builds incrementally a Kripke structure until it is expressive enough to generate a path that satisfies a task specification expressed in deterministic -calculus. In fact,  builds upon the RRG algorithm  to construct an expressive enough abstraction of the environment. To enhance scalability of  a sampling-based temporal logic path planning algorithm is proposed in , that also builds upon the RRG algorithm, but constructs incrementally sparse graphs to build discrete abstractions of the workspace and the robot dynamics. These abstractions are then composed with an automaton, capturing the task specification, which yields a product automaton. Then, correct-by-construction paths are synthesized by applying graph search methods on the product automaton. However, similar to , as the number of samples increases, the sparsity of the constructed graph is lost and, as a result, this method does not scale well to large planning problems either. Highly scalable sampling-based approaches for temporal logic planning problems are also presented in [18, 30] that can handle hundreds of robots while requiring discrete abstractions of the robot dynamics . Note that unlike these works, as discussed earlier, our sampling-based approach explores a map uncertainty space, as well, to account for environmental and sensing uncertainty.
I-A2 Temporal Logic Planning under Map Uncertainty
Temporal logic planning in the presence of map uncertainty in terms of incomplete (or dynamic) environment models is considered in [33, 34, 35, 36, 37, 38, 39]. Common in all these works is that they consider static environments with geometric uncertainty, i.e., uncertainty in the positions of obstacles, using unknown but continuously learned environmental models in the form of transition systems  or occupancy grid maps . To the contrary, in this work we consider metric and semantic uncertainty in the environment. Also, we show that the proposed method can also handle dynamic environments as well. Particularly, [33, 34] model the environment as transition systems which are partially known. Then discrete controllers are designed by applying graph search methods on a product automaton. As the environment, i.e., the transition system, is updated, the product automaton is locally updated as well, and new paths are re-designed by applying graph search approaches on the revised automaton. A conceptually similar approach is proposed in [35, 36] as well. The works in [37, 38] propose methods to locally patch paths as transition systems, modeling the environment, change so that GR(1) (General Reactivity of Rank 1) specifications are satisfied. Common in all these works is that, unlike our approach, they rely on discrete abstractions of the robot dynamics [32, 41]. Thus, correctness of the generated motion plans is guaranteed with respect to the discrete abstractions resulting in a gap between the generated discrete motion plans and their physical low-level implementation . An abstraction-free reactive method for temporal logic planning over continuously learned occupancy grid maps is proposed in . Note that  can handle sensing and environmental uncertainty, as shown experimentally, but completeness guarantees are provided assuming perfect sensors, which is not the case in this paper.
The most relevant works are presented by the authors in [43, 27] that also address temporal logic motion planning problems in static and uncertain environments modeled as uncertain semantic maps. In particular,  considers sequencing tasks, captured by co-safe LTL formulas, under the assumption of static landmarks with a priori known labels. This work is extended in  by considering task specifications captured by arbitrary co-safe temporal logic formulas with uncertain landmark labels. Common in both works is that control policies are designed based on a given semantic map - which is never updated - without considering the sensing capabilities of the robots. In contrast, in this work, the proposed algorithm designs sensor-based control policies that actively decrease the uncertainty in the environment to satisfy uncertainty and probabilistic satisfaction requirements that are embedded into the mission specification. In other words, the proposed method may find feasible paths even if such paths do not exist for the initial uncertain semantic map. Also, unlike [43, 27], we show that the proposed method can handle dynamic environments as well, i.e., cases where the landmarks move as per known but noisy dynamics. Finally, we provide formal completeness and optimality that do not exist in [43, 27].
The contribution of this paper can be summarized as follows. First, we formulate the first task planning problem over unknown maps that considers both semantic and metric uncertainty. In fact, we propose the first formal bridge between temporal logic planning and semantic SLAM methods. Second, we propose the first abstraction-free sampling-based approach that can synthesize sensor-based control policies for robot teams operating in uncertain semantic maps. Third, we show that the proposed sampling-based approach is probabilistically complete and asymptotically optimal. Fourth, we design a sampling strategy that biases exploration towards regions that are expected to contribute to mission accomplishment allowing us to scale to large robot teams that operate in large environments. Fifth, we provide extensive experiments that illustrate the efficiency of the proposed algorithm and corroborate its theoretical guarantees.
Ii Problem Definition
Ii-a Multi-Robot System
Consider mobile robots governed by the following dynamics: , for all , where stands for the state (e.g., position and orientation) of robot in an environment at discrete time , stands for a control input in a finite space of admissible controls . Hereafter, we compactly model a group of robots each equipped with a finite library of motion primitives as
where , , and .
Ii-B Uncertain Semantic Maps
The robots operate in an environment with known geometric structure (e.g., walls) but with uncertain metric and semantic structure. Specifically, the uncertain environment is modeled by unknown, static, and labeled landmarks giving rise to an uncertain semantic map . Each landmark is defined by its position and its class , where
is a finite set of semantic classes. The robots do not know the true landmark positions and classes but instead they have access to a probability distribution over the space of all possible maps. Such a distribution can be produced by recently proposed semantic SLAM algorithms[25, 26]
and typically consists of a Gaussian distribution over the landmark positions, modeling metric uncertainty in the environment, and an arbitrary discrete distribution over the landmark classes modeling semantic uncertainty. Specifically, we assume thatis determined by parameters , for all landmarks , such that , where and denote the mean and covariance matrix of the position of landmark , and , where is an arbitrary discrete distribution over the finite set of possible classes. Hereafter, we compactly denote by the parameters that determine all landmarks in the semantic map .
Example II.1 (Semantic Map)
An example of an uncertain semantic map is illustrated in Figure 2 that consists of landmarks with set of classes . Uncertain semantic maps provide a novel means to define mission objectives in terms of meaningful semantic objects in uncertain environments. For instance, in Figure 2, consider a task that requires the robots to collect all available supplies and deliver in a specific order to injured people. A formal method to specify robot missions in the presence of metric and semantic uncertainty is discussed in Section II-E.
Ii-C Sensor Model
The robots are equipped with sensors (e.g., LiDAR or cameras) to collect measurements associated with the landmark positions as per the following linear observation model: , where is the measurement signal at discrete time taken by robot that e.g., consists of range (or bearing) measurements. Also, is sensor-state-dependent Gaussian noise with covariance . Hereafter, we compactly denote the observation models of all robots as
where collects measurements taken by all robots associated with any landmark.111Note that the observation model (2 ) is assumed to be linear with respect to the landmark positions as this allows us to employ separation principles developed in estimation theory; see Section
) is assumed to be linear with respect to the landmark positions as this allows us to employ separation principles developed in estimation theory; see SectionII-D. Nonlinear models are considered in Section III-E.
Moreover, we assume that the robots are equipped with sensors (e.g., cameras) to collect measurements associated with the landmark classes . Specifically, such semantic measurements are generated by an object recognition algorithm and they consist of label measurements along with a confidence score associated with this label measurement . Since such algorithms are scale- and orientation-invariant, the class measurement is independent of the robot/sensor position (assuming that the landmark is within the sensing range of the robot). Thus, the semantic measurement is generated by the following observation model: (see also ) where and represent the class measurements and the corresponding confidence scores respectively. Hereafter, we compactly denote the object recognition model of all robots as
Ii-D Separation Principle: Offline Update of Semantic Maps
The uncertain semantic map can be updated by fusing measurements that the robots collect until a time instant as they apply a sequence control inputs, denoted by , as per the sensor models (2)-(3). The collected measurements can be fused by using recently proposed semantic SLAM algorithms [25, 26]. Note that updating the map requires knowledge of the measurements that the robots may collect in the field if the apply the control actions which, however, is impossible to obtain offline, i.e., during the control synthesis process.
To mitigate this challenge, we rely on separation principles that have recently been employed in estimation theory [45, 46]. Specifically, given a linear observation model with respect to the landmark positions defined in (2) and given a sequence of control inputs
, the a-posteriori covariance matrix can be computed offline using the Kalman filter Riccati equation, i.e., without requiring knowledge of any measurements[45, 46]. Hereafter, with slight abuse of notation, we denote the Kalman filter Riccati map by . Note that the expected positions of landmarks cannot be updated offline, i.e., without sensor measurements generated by (2). Similarly, the semantic uncertainty, captured by the discrete distribution , cannot be updated offline as it requires measurements (i.e., images) that will be processed by the object recognition algorithm (3).
Finally, throughout the paper we make the following assumption that is required for applying a Kalman filter.
The observation model (2) and the covariance matrix of the measurement noise are known for all time instants .
Ii-E Temporal Logic Tasks over Uncertain Semantic Maps
The goal of the robots is to accomplish a complex collaborative task captured by a global Linear Temporal Logic (LTL) specification . To account for semantic and metric uncertainty in the environment, we extend LTL by including probabilistic atomic predicates. Particularly, given a map distribution , we define atomic predicates that reason about the semantic and/or metric uncertainty. To define such predicates, we first denote by the map distribution at time obtained after the robots apply a sequence of control actions ; the offline construction of was discussed in Section II-D. The syntax of LTL over uncertain semantic maps is defined as follows.
Definition II.3 (Syntax)
The syntax of Linear Temporal Logic over a sequence of multi-robot states and an uncertain semantic map is defined as
where is a predicate over the multi-robot state , and is a sensor-based predicate over the multi-robot state and the semantic map , where .
An example of an atomic predicate defined over the robot states is
where is true only if robot is within a region . Moreover, given a map , we consider atomic predicates that reason about the metric uncertainty of the workspace defined as follows:
The predicate (5a) is true at time
if the probability of robotbeing within distance less than from landmark is greater than , after applying control actions , for some user-specified parameters . Similarly, (5b) is true if the determinant of the a-posteriori covariance matrix associated with landmark is less than a user-specified threshold after applying control actions . The latter predicate allows us to reason about the uncertainty associated with target ; the smaller the is, the smaller the metric uncertainty about landmark is.
In a similar way, we can define sensor-based predicates that reason about both the metric and the semantic uncertainty. For instance, consider the following predicate:
In words, (6) is true at time if (i) the probability of robot being within distance less than from landmark is greater than and (ii) the probability that landmark has class , captured by , is greater than .
The semantics of the proposed specification language over dynamic maps is defined as follows.
Definition II.4 (Semantics)
Let be an infinite sequence of multi-robot states and uncertain semantic maps. The semantics of temporal logic over semantic maps is defined recursively as:
Hereafter we assume that the assigned task is expressed as a co-safe formula defined over a set of atomic propositions , which is satisfied by a finite-length robot trajectory. In order to interpret a temporal logic formula over the trajectories of the robot system, we use a labeling function that determines which atomic propositions are true given the current robot system state and the current map distribution obtained after applying a sequence of control actions .
Example II.5 (Mission Specification)
Consider a single-robot search-and-rescue task with two landmarks where and captured by the following temporal logic formula:
This specification requires robot to (i) eventually approach the landmark with class Hostage in order e.g., to provide supplies (see (8a)) and at the same time be confident enough about the position of this landmark (see (8b)) and (ii) to always maintain a safe distance from the hostile landmark that is patrolling the environment (see (8c)).
Ii-F Safe Planning over Uncertain Semantic Maps
Given a task specification , the observation model (2), and the robot dynamics (1), our goal is to select a stopping horizon and a sequence of control inputs , for all , that satisfy while minimizing a user-specified motion cost function. This gives rise to the following optimal control problem:
where the first constraint in (9b) requires the mission specification to be satisfied, if the robots move according to their known dynamics captured by (9c). Also, the constraints in (9d)-(9e) require the metric uncertainty to be updated as per the Kalman filter Ricatti map discussed in II-D while the semantic uncertainty is not updated during the control synthesis process captured by (9f); instead, as it will be discussed in Section III, it is updated online as semantic measurements are collected as per (2). Then, the problem addressed in this paper can be summarized as follows:
Note that the object recognition method modeled in (3) is not required to solve (9). In fact, solving (9) will yield open-loop/offline robot paths. However, as the robots follow these paths, they collect measurements as per (2)-(3). Based on the collected measurements, the uncertain semantic map will be updated using existing semantic SLAM methods [25, 26]. Then, the offline synthesized paths may need to be revised to adapt to the continuously learned map. The proposed method to solve (9) along with a method to revise paths online are presented in Section III.
Iii Planning In Probabilistic Semantic Maps
In this section, first we show that (9) can be converted into a reachability problem defined over a hybrid and uncertain space; see Section III-A. Due to the hybrid nature of the state space that needs to be explored, (9) cannot be solved directly by existing reachability-based planning methods such as RRT* . To address this challenge, in Section III-B, we propose a new sampling-based algorithm that can efficiently explore hybrid and uncertain state-spaces.
Iii-a Reachability in Uncertain Spaces
Definition III.1 (Dfa)
A Deterministic Finite state Automaton (DFA) over is defined as a tuple , where is the set of states, is the initial state, is a deterministic transition relation, and is the accepting/final state.
For a robot trajectory and a sequence of corresponding maps , we get the labeled sequence . This labeled sequence satisfies the specification , if starting from the initial state , each symbol/element in yields a DFA transition so that eventually -after DFA transitions- the final state is reached . As a result, we can equivalently re-write (9) as follows:
where , and is determined by , , and . Note that the constraint in (10b) captures the automaton dynamics, i.e., the next DFA state that will be reached from the current DFA state under the observation/symbol . In words, (10), is a reachability problem defined over a joint space consisting of the automaton state-space (see (10b)), the robot motion space (see (10c)), and the metric and semantic uncertainty space (see (10d)-(10f)) while the terminal constraint requires to reach the automaton final state (see (10g)).
Iii-B A New Sampling-based Approach
Note that the reachability problem in (10) is computationally challenging to solve as it requires to explore a large hybrid state space. To address this challenge, we propose a new sampling-based approach to solve it. As discussed before, existing sampling-based approaches, e.g., RRT* , cannot directly solve (10) due to the hybrid state-space that needs to be explored. Particularly, we propose a sampling-based algorithm that incrementally builds trees that explore simultaneously the robot motion space, the space of covariance matrices (metric uncertainty), and the automaton space that corresponds to the assigned specification. The proposed algorithm is summarized in Algorithm 1.
In what follows we provide some intuition for the steps of Algorithm 1. First, we denote the constructed tree by , where is the set of nodes and denotes the set of edges. The set of nodes contains states of the form , where and .222Throughout the paper, when it is clear from the context, we drop the dependence of on . The function assigns the cost of reaching node from the root of the tree. The root of the tree, denoted by , is constructed so that it matches the initial robot state , the initial semantic map , and the initial DFA state, i.e., . By convention the cost of the root is , while the cost of a node , given its parent node , is computed as
The tree is initialized so that , , and [line 1, Alg. 1]. Also, the tree is built incrementally by adding new states to and corresponding edges to , at every iteration of Algorithm 1, based on a sampling [lines 1-1, Alg. 1] and extending-the-tree operation [lines 1-1, Alg. 1]. After taking samples, where is user-specified, Algorithm 1 terminates and returns a feasible solution to (10) (if it has been found), i.e., a terminal horizon and a sequence of control inputs .
To extract such a solution, we need first to define the set that collects all states of the tree that satisfy , which captures the terminal constraint (10g) [lines 1-1, Alg. 1]. Then, among all nodes , we select the node , with the smallest cost , denoted by [line 1, Alg. 1]. Then, the terminal horizon is , and the control inputs are recovered by computing the path in that connects to the root , i.e., [line 1, Alg. 1]. Note that satisfaction of the constraints (10) is guaranteed by construction of ; see Section III-C. In what follows, we describe the core operations of Algorithm 1, ‘sample’ and ‘extend’ that are used to construct the tree .
Iii-C Incremental Construction of Trees
At every iteration of Algorithm 1, a new state is sampled. The construction of the state relies on three steps. Specifically, first we sample a state ; see Section III-C1. Second, given and its parent node, we compute the corresponding new semantic map using e.g., a Kalman filter. Third, we append to and a DFA state giving rise to which is then added to the tree structure, if possible; see Section III-C2.
Iii-C1 Sampling Strategy
To construct the state , we first divide the set of nodes into a finite number of sets, denoted by , based on the robot state and the DFA state that comprise the states . Specifically, collects all states that share the same DFA state and the same robot state (or in practice, robot states that very close to each other). By construction of , we get that , where is the number of subsets at iteration . Also, notice that is finite for all iterations , due to the finite number of available control inputs and the finite DFA state-space. At iteration of Algorithm 1, it holds that , [line 1, Alg. 1].
Second, given the sets , we first sample from a given discrete distribution an index that points to the set [line 1, Alg. 1]. The density function defines the probability of selecting the set at iteration of Algorithm 1 given the set .
Iii-C2 Extending the tree
To build incrementally a tree that explores both the robot motion space, the space of metric uncertainty, and the DFA state space, we need to append to the corresponding semantic map determined by the parameters and DFA state . Particularly, is constructed so that
as required in (10d), and computing as per the Kalman filter Ricatti map, i.e.,
as required in (10e), and
as required by (10b). In words, is the automaton state that that can be reached from the parent automaton state given the observation . If such a DFA state does not exist, then this means the observation results in violating the LTL formula and this new sample is rejected. Otherwise, the state is constructed [line 1, Alg. 1] which is then added to the tree.
Given a state , we update the set of nodes and edges of the tree as and , respectively [lines 1-1, Alg. 1]. The cost of the new node is computed as in (11), i.e., [line 1, Alg. 1]. Finally, the sets are updated, so that if there already exists a subset associated with both the DFA state and the robot state , then . Otherwise, a new set is created, i.e., and [line 1, Alg. 1]. Recall that this process is repeated for all states [line 1, Alg. 1].
Remark III.2 (Finite Decomposition of )
In Section III-C1, we decomposed the set of nodes into finite sets, where remains finite as , by construction. The finite decomposition of the set of nodes is required to ensure probabilistic completeness and optimality of the proposed sampling-based algorithm; see Section IV. Therefore, any alternative decomposition of that always results in a finite number of subsets can be employed without affecting correctness of Algorithm 1.
Iii-D Online Execution and Re-planning
Algorithm 1 generates an open-loop sequence , where , so that the resulting robot trajectory and sequence of maps satisfy . Recall that is the offline estimated map at time . As the robots follow their respective paths in synthesized by Algorithm 1 synchronously, they take measurements (see (2)) which are used to update both the expected landmark positions and their covariances using e.g., any existing SLAM algorithm giving rise to a map denoted by . Note that this map may be different from the corresponding offline-generated map due to the fact that landmark states are updated (which is not the case during the construction of ). Re-planning is required only if the observations that the robots generate at runtime do not allow them to follow the sequence of DFA states that appear in . Formally, re-planning at time is required only if the observation does not enable a transition from to , where and are determined by computed by Algorithm 1. In this case Algorithm 1 is initialized with the current robot state , the current DFA state , and the current map and generates a new path.
In this section, we discuss how assumptions made in Section II can be relaxed.
Iii-E1 Mobile Targets
First, we assumed that the environment is occupied by static landmarks. Mobile landmarks can also be considered that are governed by the following known but noisy dynamics:
where and are the control input and the process noise for target at discrete time . We assume that the process noise
is uncertain and follows a normally distributed as, where is the covariance matrix at time . To account for dynamic targets, the optimal control problem in (9) can be reformulated by replacing the constraint in (9d) with
which is, in fact, the prediction step of a Kalman filter. Assuming that the control inputs and the covariance matrices are known to the robots for all , Algorithm 1 can be used to solve the new formulation of (9). The only difference is that (17) will be used to construct the landmark states , instead of (12), which are then used to define the map .
Iii-E2 Nonlinear Observation Models
Finally, in Section II, we assumed that the sensor model (2) is linear with respect to the landmark positions allowing us to compute offline the a-posteriori covariance matrix without the need of measurements. Nevertheless, this assumption may not hold in practice. In this case, during the execution of Algorithm 1, we compute the a-posteriori covariance matrices using the linearized observation model about the estimated target positions [line 1, Alg. 1]. Such case studies are presented in Section VI.
Iv Completeness & Optimality
In this section, we examine, the correctness and optimality of Algorithm 1. First, in Theorem IV.3, we show that Algorithm 1 is probabilistically complete. In other words, given an initial semantic map and a sensor model as per (2), Algorithm 1 will eventually find a sequence of multi-robot states and a corresponding sequence of semantic maps , where the metric uncertainty is updated as discussed in Section II-D, that solve (9). Recall that the synthesized paths may need to be revised online, by re-running Algorithm 1, as the semantic map learned online may differ from the offline constructed map. Theorem IV.3 implies that if re-planning is required at any time, a feasible path will eventually be found if it exists. In Theorem IV.4, we also show that Algorithm 1 is asymptotically optimal, i.e., Algorithm 1 will eventually find the optimal solution to (9). Note that completeness and optimality guarantees hold in the presence of mobile landmarks, as well, under the assumptions discussed in Section III-C2.
To show completeness and optimality, we need to make the following two assumptions about the sampling strategy. First, any density function and can be used as long as they satisfy Assumptions IV.1 and IV.2, respectively.
Assumption IV.1 (Probability density function )
(i) The probability density function
(i) The probability density functionsatisfies , and for all , for some that remains constant across all iterations . (ii) Independent samples can be drawn from .
Assumption IV.2 (Probability density function )
(i) The distribution satisfies , for all , for some that remains constant across all iterations . (ii) Independent samples can be drawn from the probability density function .
Theorem IV.3 (Probabilistic Completeness)
Theorem IV.4 (Asymptotic Optimality)
Assume that there exists a solution to Problem 1, i.e., an optimal solution to (9). Then, Algorithm 1 is asymptotically optimal, i.e., the optimal path , will be found with probability , as . In other words, the path generated by Algorithm 1 satisfies where is the objective function of (9) and is the optimal cost.333Note that the horizon and returned by Algorithm 1 depend on . For simplicity of notation, we drop this dependence.
Remark IV.5 (Density functions and )
Remark IV.6 (Sampling Strategy)
An example of a distribution that satisfies Assumption IV.1 is the discrete uniform distribution
is the discrete uniform distribution, for all . Observe that the uniform function trivially satisfies Assumption IV.1(ii). Also, observe that Assumption IV.1(i) is also satisfied, since there exists an that satisfies Assumption IV.1(i), which is , where is a set that collects all robot configurations that can be reached by the initial state . Note that is a finite set, since the set of admissible control inputs is finite, by assumption. Similarly, uniform density functions satisfy Assumption IV.2. Note that any functions and can be employed as long as they satisfy Assumptions IV.1 and IV.2. Nevertheless, the selection of and affects the performance of Algorithm 1; see the numerical experiments in Section VI. In Section V, we design (nonuniform) density functions and that bias the exploration towards locations that are expected to contribute to the satisfaction of the assigned mission specification.
V Designing Biased Sampling Functions
As discussed in Section IV, any density functions and that satisfy Assumptions IV.1 and IV.2, respectively, can be employed in Algorithm 1 without sacrificing its completeness and optimality guarantees. In what follows, we build upon our previous works [19, 27, 28], to design density functions that allow us to address large-scale planning tasks that involve large teams of robots and workspaces.
V-a Pruning the DFA
We first prune the DFA by removing transitions that are unlikely to be enabled. Particularly, these are DFA transitions that are enabled when a robot has to satisfy atomic propositions that require it to be present in more than one location simultaneously. Hereafter, these DFA transitions are called infeasible. For instance, a DFA transition that is enabled if
is true, is classified as infeasible if the regionsand are disjoint, and feasible otherwise. Hereafter, for simplicity, we assume that all regions are disjoint. Note that detecting transitions that are enabled based on such deterministic predicates can be determined by solving Boolean formulas that are true if at least one robot is required to be in more than region simultaneously; see  for more details.
However, determining infeasible transitions becomes more challenging for transitions that depend on probabilistic predicates defined in (5a). The reason is that search over the space of distributions is required to determine if there exist robot positions that satisfy a Boolean formula defined over such predicates; recall that the map distribution is updated during the execution of Algorithm 1. For instance, consider a transition from to that is enabled when the following Boolean formula is true: which requires robot to be close to landmarks and , simultaneously, with certain probabilistic requirements determined by and . Note that existence of a robot position that satisfies depends on the distribution of the position of landmarks and - which may change - and the parameters . Hereafter, for simplicity, we classify such DFA transitions that require a robot to be close to more than one lamndmark at the same time as infeasible.444Note that this is a conservative approach as transitions that are in fact feasible may be classified as infeasible transitions. Nevertheless, this does not affect the correctness of the proposed algorithm, since the biased sampling functions that will be constructed next still satisfy Assumptions (IV.1) and (IV.2). If in the pruned DFA, there is no path from the initial to the final state then this means that either the specification is infeasible or the task is feasible but the pruning was very conservative. In this case, a warning message can be returned to the user, and either uniform distributions or the proposed biased distributions can be employed but using the original (and not the pruned) DFA.
Second, along the lines of  we define a distance function between any two DFA states, which captures the minimum number of transitions in the pruned DFA to reach a state from a state . This function is defined as follows
where denotes the shortest path (in terms of hops) in the pruned DFA from to and stands for its cost (number of hops).
V-B Density function
First, we define the set that collects all tree nodes with the smallest distance , denoted by , among all nodes in the tree, i.e., . The set initially collects only the root and is updated (along with ) as new states are added to the tree. Given the set , we define the set that collects the indices of the subsets that satisfy . Given the set , the probability density function is defined so that it is biased to select more often subsets