I Introduction
Task and motion planning for complex manipulation tasks, such as rearrangement planning of multiple objects, has recently received increasing attention [13, 45, 28]. However, existing algorithms are typically combinatorially hard, do not scale well, and focus mostly on known environments [38, 18]. As a result, such methods cannot be applied to scenarios where the environment is initially unknown or needs to be reconfigured to accomplish the assigned mission, so that online replanning may be required; this limits their applicability.
Instead, we propose an architecture for addressing complex mobile manipulation task planning problems which can handle unanticipated conditions in unknown environments. Fig. 1 illustrates the problem domain and scope of our algorithm, and Fig. 2 illustrates the structure of the architecture and the organization of the paper. Our extensive simulation study suggests that this formal interface between a symbolic logic task planner and a physically grounded dynamical controller, each endowed with its own formal properties, can achieve computationally efficient rearrangement of complicated workspaces, motivating work now in progress to give conditions under which their interaction can guarantee success.
I-A Mobile Manipulation of Movable Objects
Planning the rearrangement of movable objects has long been known to be algorithmically hard (PSPACE-hardness was established in [19]), and most approaches have focused on simplified instances of the problem. For example, past work on reactive rearrangement using vector field planners such as navigation functions [36] assumes either that each object is actuated [47, 24] or that there are no other obstacles in the environment [20, 23, 2]. When considering more complicated workspaces, most approaches focus either on sampling-based methods that empirically work well [40], motivated by the typically high-dimensional configuration spaces arising from combined task and motion planning [13, 28], or on learning a symbolic language on the fly [26]. Such methods can achieve asymptotic optimality [46] by leveraging tools for efficient search on large graphs [48], but come with no guarantee of task completion under partial prior knowledge, and their search time grows exponentially with the number of movable pieces [44]. Sampling-based approaches have also been applied to the problem of navigation among movable obstacles (NAMO) [39], where the robot needs to grasp and move obstacles in order to connect disconnected components of the configuration space, with recent extensions focusing on heuristics for manipulation planning in unknown environments [17, 30]. Unlike such methods, which require constant deliberative replanning in the presence of unanticipated conditions, this work examines the use of a reactive vector field controller with simultaneous guarantees of convergence and obstacle avoidance in partially known environments [42], endowed with a narrow symbolic interface to an abstract reactive temporal logic planner whose freedom from any consideration of geometric details affords a decisive computational advantage in supervising the task.

I-B Reactive Temporal Logic Planning
Reactive temporal logic planning algorithms that can account for environmental uncertainty, in terms of incomplete environment models, have been developed in [16, 15, 33, 29, 31, 32, 27, 1]. In particular, [16, 15] model the environment as partially known transition systems. Discrete controllers are then designed by applying graph search methods on a product automaton. As the environment, i.e., the transition system, is updated, the product automaton is locally updated as well, and new paths are redesigned by applying graph search on the revised automaton. A conceptually similar approach is proposed in [33, 29]. The works in [31, 32] propose methods to locally patch paths as the transition systems (modeling the environment) change, so that GR(1) (General Reactivity of Rank 1) specifications [34] are satisfied. Reactive planning algorithms for LTL specifications are also proposed in [27, 1]; specifically, in [27, 1] the robot reacts to the environment while the task specification captures this reactivity. Correctness of these algorithms is guaranteed if the robot operates in an environment that satisfies the assumptions explicitly modeled in the task specification. Common to all these works is that, unlike our approach, they rely on discrete abstractions of the robot dynamics [5, 35], while active interaction with the environment to satisfy the logic specification is neglected.
I-C Contributions and Organization of the Paper
This paper introduces the first planning and control architecture to provide a formal interface between an abstract temporal logic engine and a physically grounded mobile manipulation vector field planner for the rearrangement of movable objects in partially known workspaces cluttered with unknown obstacles. We provide conditions under which the symbolic controller is complete (Proposition 1), while exploiting prior results [42] that guarantee safe physical achievement of its subtasks when they are feasible, and introduce a new heuristic vector field controller for greedy physical rearrangement of the workspace when they are not. We provide a variety of simulation examples that illustrate the efficacy of the proposed algorithm for accomplishing complex manipulation tasks in unknown environments.
The paper is organized as follows. After formulating the problem in Section II, Section III presents a discrete controller which, given an LTL specification, generates high-level manipulation primitives on the fly; these are translated to point navigation commands through an interface layer outlined in Section IV. Using this interface, Section V continues with the reactive implementation of our symbolic actions and the employed algorithm for connecting disconnected freespace components blocked by movable objects. Section VI discusses our numerical results and, finally, Section VII concludes with ideas for future work.
II Problem Description
II-A Model of the Robot and the Environment
We consider a first-order, nonholonomically-constrained, disk-shaped robot with known center position, radius, and orientation; its rigid placement comprises this position and orientation, and its input consists of a fore-aft and an angular velocity command. The robot uses a gripper to move disk-shaped movable objects of known location and radii in a closed, compact, polygonal, typically nonconvex workspace. The robot’s gripper can be either engaged or disengaged. Moreover, we adopt the perceptual model of our recent physical implementations [42, 43], whereby a sensor of finite range recognizes and instantaneously localizes any fixed “familiar” or “unfamiliar” obstacles; see also Fig. 1.
The workspace is cluttered by a finite collection of disjoint obstacles of unknown number and placement. This set might also include nonconvex “intrusions” of the boundary of the physical workspace into the convex hull of its closure, which we define as the enclosing workspace. As in [3, 43, 42], we define the freespace as the set of collision-free placements of the robot’s circumscribed disk, and the enclosing freespace as the corresponding set with respect to the enclosing workspace.
Although none of the obstacle positions are a priori known, a subset of these obstacles is assumed to be “familiar” in the sense of having a recognizable polygonal geometry that the robot can instantly identify and localize (as we have implemented in [42, 43]). The remaining obstacles are assumed to be strongly convex according to [3, Assumption 2] (and as implemented in [42, 43]), but are otherwise completely unknown.
To simplify the notation, we dilate each obstacle and movable object by the robot radius (or by the radius of the circumscribed robot-object disk when the robot carries an object), and assume that the robot operates in the resulting freespace. For our formal results, we assume that each obstacle is always well-separated from all other obstacles and movable objects, as outlined in [42, Assumption 1]; in practice, the surrounding environment often violates our separation assumptions without precluding successful task completion.
II-B Specifying Complex Manipulation Tasks
The robot needs to accomplish a mobile manipulation task by visiting known regions of interest in the workspace and applying one of the following three manipulation actions, each possibly associated with a specific movable object:

Move, instructing the robot to move to a given region of interest; this action does not logically entail interaction with any specific movable object (although, as will be detailed in Section V, the hybrid reactive controller may actually need to move objects out of the way, rearranging the topology of the workspace in a manner hidden from the logical task controller).

GraspObject, instructing the robot to grasp a given movable object; no region of interest is associated with this action.

ReleaseObject, instructing the robot to push the (assumed already grasped) object toward its designated goal position in a given region of interest.
For instance, consider a rearrangement planning scenario where the locations of three objects of interest need to be rearranged, as in Fig. 1. We capture such complex manipulation tasks via Linear Temporal Logic (LTL) specifications. Specifically, we use atomic predicates that are true when the robot applies the corresponding manipulation action and false until the robot achieves that action. Note that these atomic predicates allow us to specify temporal logic specifications defined over manipulation primitives and, unlike related works [18, 37], are entirely agnostic to the geometry of the environment. We define LTL formulas by collecting such predicates in a set of atomic propositions. For example, the rearrangement planning scenario with three movable objects initially located in three distinct regions, as shown in Fig. 1, can be described as a sequencing task [12] by the following LTL formula:
(1) 
where ◊ and ∧ refer to the ‘eventually’ and ‘AND’ operators. In particular, this task requires the robot to perform the following steps in this order: (i) grasp the first object and release it in its designated location (first line in (1)); (ii) then grasp the second object and release it in its designated location (second line in (1)); (iii) grasp the third object and release it in its designated location (third line in (1)). LTL formulas are satisfied over an infinite sequence of states [4]. Unlike related works where a state is defined to be the robot position, e.g., [27], here a state is defined by the manipulation action that the robot applies. In other words, an LTL formula defined over manipulation-based predicates is satisfied by an infinite sequence of actions [4]. Given such a sequence, the syntax and semantics of LTL can be found in [4]; hereafter, we exclude the ‘next’ operator from the syntax, since it is not meaningful for practical robotics applications [25], as well as the negation operator (since the negation operator is excluded, safety requirements, such as obstacle avoidance, cannot be captured by the LTL formula; nevertheless, the proposed method can still handle safety constraints by construction of the continuous-time reactive vector field controller in Section V).
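As an illustration, over any finite execution such a sequencing task reduces to checking that the required actions appear as an ordered subsequence of the executed action trace. A minimal sketch, where the action strings are our own illustrative placeholders rather than the paper's notation:

```python
def satisfies_sequencing(trace, required):
    """True iff the actions in `required` appear in `trace` in that order
    (the finite-trace reading of F(a1 & F(a2 & F a3)))."""
    it = iter(trace)  # shared iterator: each search resumes where the last stopped
    return all(any(a == r for a in it) for r in required)

# Hypothetical trace for the three-object rearrangement of Fig. 1.
trace = ["Grasp(o1)", "Release(o1,l2)", "Grasp(o2)", "Release(o2,l3)"]
print(satisfies_sequencing(trace, ["Grasp(o1)", "Release(o1,l2)", "Grasp(o2)"]))
```

The shared-iterator idiom enforces the ordering: a later requirement can only be matched by actions occurring after the previous match.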
II-C Problem Statement
Given a task specification captured by an LTL formula, our goal is to (i) generate appropriate actions online, as the robot discovers the environment via sensor feedback, using the (discrete) symbolic controller, (ii) translate them to point navigation tasks, (iii) execute these navigation tasks and apply the desired manipulation actions with a (continuous-time) vector field controller, while avoiding unknown and familiar obstacles, (iv) detect online any freespace disconnections that prohibit successful action completion, and (v) either locally amend the provided plan by disassembling blocking movable objects, or report failure to the symbolic controller and request an alternative action.
III Symbolic Controller
In this Section, we design a discrete controller that generates manipulation commands online in the form of the actions defined in Section II (e.g., ‘release the movable object at a region of interest’). A summary of the overall architecture is given in Fig. 2, and we now proceed to outline the manner in which the symbolic controller depicted there extends prior work [21] to account for the manipulation-based atomic predicates defined in Section II, adapted to the single-agent setting. The detailed construction can be found in Appendix A.
III-A Construction of the Symbolic Controller
First, we translate the specification, constructed using a set of atomic predicates, into a Nondeterministic Büchi Automaton (NBA) with a state space and transitions among states that can be used to measure how much progress the robot has made toward accomplishing the assigned mission; see Appendix A. Particularly, we define a distance metric over this NBA state space to compute how far the robot is from accomplishing its assigned task or, more formally, from satisfying the accepting condition of the NBA, by following a similar analysis as in [22, 21]. This metric is used to generate manipulation commands online as the robot navigates the unknown environment. Namely, as the robot navigates the workspace, the main idea is that, given its current NBA state, it should reach a next NBA state so that the distance to a state that accomplishes the assigned task decreases over time. Once the target NBA state is selected, a symbolic action is generated in the form of a manipulation action presented in Section II (e.g., ‘release the movable object in a region of interest’) whose execution drives the NBA to the desired state. This symbolic action acts as an input to the continuous-time controller; see Section V. Observe that the symbolic controller is agnostic to the geometry of the environment; the continuous-time controller is responsible for accomplishing the generated symbolic commands.
When the continuous-time controller accomplishes the assigned subtask, a new target automaton state is selected and a new manipulation command is generated. In case the continuous-time controller fails to accomplish the assigned subtask (e.g., because a target region of interest is surrounded by fixed obstacles), the symbolic controller checks if there exists an alternative command that ensures reachability of the target automaton state (e.g., consider a case where a given target NBA state can be reached if the robot goes to either of two regions). If there are no alternative commands to reach the desired automaton state, then a new target automaton state is selected that also decreases the distance to satisfying the accepting NBA condition. If there are no other automaton states that can decrease this distance, a message is returned stating that the robot cannot accomplish the assigned mission. A detailed description of the construction of this distance metric is provided in Appendix A.
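The selection-with-fallback logic above can be sketched as a graph search: compute hop distances to the accepting states by a backward BFS, then pick a successor of the current NBA state that strictly decreases that distance, returning nothing when no such successor exists (the reported-failure case). The state names and edge-list encoding are hypothetical simplifications of the automaton construction detailed in Appendix A:

```python
from collections import deque

def hop_distance(edges, accepting):
    """Backward BFS: hop distance from every node to the nearest accepting node."""
    rev = {}
    for u, v in edges:
        rev.setdefault(v, []).append(u)
    dist = {q: 0 for q in accepting}
    queue = deque(accepting)
    while queue:
        v = queue.popleft()
        for u in rev.get(v, []):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

def next_state(current, edges, dist):
    """Pick a successor that strictly decreases the distance-to-accepting;
    return None when no such successor exists (mission reported infeasible)."""
    inf = float("inf")
    viable = [v for (u, v) in edges if u == current
              and dist.get(v, inf) < dist.get(current, inf)]
    return min(viable, key=dist.get) if viable else None
```

For instance, with edges `[("q0","q1"), ("q0","q2"), ("q1","qf"), ("q2","q1")]` and accepting state `qf`, the controller at `q0` prefers `q1`, the successor one hop closer to acceptance.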
III-B Completeness of the Symbolic Controller
Here, we provide conditions under which the proposed discrete controller is complete. The proof of the following proposition can be found in Appendix A.
Proposition 1 (Completeness)
Assume that there exists at least one infinite sequence of manipulation actions that satisfies the assigned LTL formula. If the environmental structure and the continuous-time controller always ensure that at least one of the candidate next NBA states can be reached, then the proposed discrete algorithm is complete, i.e., a feasible solution will be found. (Given the current NBA state, the symbolic controller selects as the next NBA state a state that is reachable from the current one and closer to the final states as per the proposed distance metric; see also Appendix A. All NBA states that satisfy this condition are called candidate next NBA states. Also, reaching an NBA state means that at least one of the manipulation actions required to enable the transition from the current state to the next NBA state is feasible.)
IV Interface Layer Between the Symbolic and the Reactive Controller
We assume that the robot is nominally in an LTL mode, where it sequentially executes the commands provided by the symbolic controller described in Section III. We use an interface layer between the symbolic controller and the reactive motion planner, as shown in Fig. 2, to translate each action to an appropriate gripper command (disengaged for Move and GraspObject, engaged for ReleaseObject) and a navigation command toward a target position. If the provided action is Move or ReleaseObject, we pick as target the centroid of the associated region of interest. If the action is GraspObject, we pick as target a collision-free location on the boundary of the associated object, contained in the freespace.
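A minimal sketch of this translation step, with a tuple-based action encoding and helper names that are our own illustrative choices rather than the paper's interface:

```python
def translate_action(action, region_centroids, boundary_point):
    """Map a symbolic action tuple to (gripper_engaged, nav_target).
    `region_centroids` maps region names to points; `boundary_point` is an
    assumed helper returning a collision-free point on an object's boundary."""
    kind = action[0]
    if kind == "Move":                      # ("Move", region)
        return False, region_centroids[action[1]]
    if kind == "GraspObject":               # ("GraspObject", obj)
        return False, boundary_point(action[1])
    if kind == "ReleaseObject":             # ("ReleaseObject", obj, region)
        return True, region_centroids[action[2]]
    raise ValueError(f"unknown action: {kind}")
```

The gripper flag mirrors the grouping in the text: the robot navigates unencumbered for Move and GraspObject, and keeps the gripper engaged while pushing a grasped object for ReleaseObject.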
Consider again the example shown in Fig. 1. The first step of the assembly requires the robot to move an object to a region which, however, is occupied by another object. In this case, instead of reporting that the assigned LTL formula cannot be satisfied, we allow the robot to temporarily pause the command execution from the symbolic controller, switch to a Fix mode, and push the occupying object away before resuming the execution of the action instructed by the symbolic controller. For plan fixing purposes, we introduce a fourth action, DisassembleObject, invisible to the symbolic controller, instructing the robot to push an object (after it has been grasped using GraspObject) toward a position on the boundary of the freespace until specific separation conditions are satisfied. Hence, an additional responsibility of the interface layer (when in Fix mode) is to pick the next object to be grasped and disassembled from a stack of blocking movable objects, as well as the target of each DisassembleObject action, until the stack becomes empty (the exclusion of the negation operator from the LTL syntax, as assumed in Section II, guarantees that each DisassembleObject action will not interfere with the satisfaction of the LTL formula; e.g., the robot will not disassemble an object that should not be grasped or moved); see Section V-B.
Finally, the interface layer (a) requests a new action from the symbolic controller if the robot successfully converges to the target to complete the current action execution, or (b) reports that the currently executed action (associated with a movable object and a region of interest) is infeasible and requests an alternative action, if the topology checking module outlined in Section V-A determines that the goal is surrounded by fixed obstacles.
V Symbolic Action Implementation
In this Section, we describe the online implementation of our symbolic actions, assuming that the robot has already picked a target using the interface layer from Section IV. As reported above, in the LTL mode the robot executes commands from the symbolic controller, using one of the actions Move, GraspObject and ReleaseObject. The robot exits the LTL mode and enters the Fix mode when one or more movable objects block the target destination; in this mode, it attempts to rearrange blocking movable objects using a sequence of the actions GraspObject and DisassembleObject, before returning to the LTL mode.
The “backbone” of the symbolic action implementation is the reactive, vector field motion planner from prior work [42], allowing either a fully actuated or a differential-drive robot to provably converge to a designated fixed target while avoiding all obstacles in the environment. When the robot is gripping an object, we use the method from [41] for generating virtual commands for the center of the circumscribed disk enclosing the robot and the object. Namely, we assume that the robot-object pair is a fully actuated particle with single-integrator dynamics, design our control policy using the same vector field controller, and translate the resulting velocity command to fore-aft and angular velocity commands for our differential-drive robot through the inverse of the Jacobian of the gripping contact.
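The velocity translation can be illustrated with the standard offset-point (unicycle) Jacobian, which stands in here for the paper's gripping-contact Jacobian: for a point at distance d ahead of the wheel axle, J(θ) = [[cos θ, −d sin θ], [sin θ, d cos θ]] is invertible whenever d ≠ 0, so a desired particle velocity u maps to (v, ω) = J⁻¹ u:

```python
import math

def diff_drive_from_particle(u, theta, d):
    """Map a desired planar velocity u = (ux, uy) of a point at offset d
    ahead of the axle into (v, omega) for a differential-drive robot.
    Uses J^{-1} = [[cos t, sin t], [-sin t / d, cos t / d]]; the offset-point
    model is an assumption standing in for the gripping-contact Jacobian."""
    c, s = math.cos(theta), math.sin(theta)
    ux, uy = u
    v = c * ux + s * uy            # fore-aft velocity (first row of J^{-1})
    omega = (-s * ux + c * uy) / d  # angular velocity (second row of J^{-1})
    return v, omega
```

For example, at θ = 0 with d = 0.5, a lateral command u = (0, 1) yields v = 0 and ω = 2: the robot turns in place to realize the sideways motion of the offset point.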
This reactive controller assumes that a path to the goal always exists (i.e., the robot’s freespace is path-connected) and does not consider cases where the target is blocked either by a fixed obstacle or a movable object (the possibility of an entirely unknown blocking convex obstacle is precluded by our separation assumptions in Section II). Hence, here we extend the algorithm’s capabilities by providing a topology checking algorithm (Section V-A) that detects blocking movable objects or fixed obstacles, as outlined in Fig. 2. Based on these capabilities, Section V-B describes our symbolic action implementations. Appendix B includes a brief overview of the reactive, vector field motion planner, and Appendix C includes an algorithmic outline of our topology checking algorithm.
V-A Topology Checking Algorithm
The topology checking algorithm is used to detect freespace disconnections, update the robot’s enclosing freespace, and modify its action by switching to the Fix mode, if necessary. In summary, the algorithm’s input is the initially assumed polygonal enclosing freespace for either the robot or the robot-object pair, along with all known dilated movable objects and all localized familiar fixed obstacles. The algorithm’s output is the detected enclosing freespace, used for the diffeomorphism construction in the reactive controller [42], along with a stack of blocking movable objects and a Boolean indication of whether the current symbolic action is feasible. Based on this output, the robot switches to the Fix mode when the stack becomes nonempty, and resumes execution from the symbolic controller once all movable objects in the stack are disassembled. A detailed description is given in Appendix C.
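The essence of the disconnection test can be conveyed with a toy connectivity check on a coarse occupancy grid, standing in for the paper's polygonal free-space computation (the grid encoding and 4-connectivity are our own simplifications):

```python
from collections import deque

def free_space_connected(grid, start, goal):
    """BFS on an occupancy grid (0 = free, 1 = occupied): is `goal`
    reachable from `start` through free cells? A disconnection (False)
    is what would trigger the Fix mode in the full architecture."""
    rows, cols = len(grid), len(grid[0])
    queue, seen = deque([start]), {start}
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False
```

A wall of occupied cells separating start from goal yields False; freeing a single cell of the wall restores connectivity.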
V-B Action Implementation
We are now ready to describe the symbolic action implementations. The Move action simply uses the reactive controller to navigate to the selected target, as described in Appendix B. Similarly, the GraspObject action uses the reactive controller to navigate to a collision-free location on the boundary of the object, and then aligns the robot so that its gripper faces the object, in order to get around Brockett’s condition [8]. ReleaseObject uses the reactive controller to design inputs for the robot-object center and translates them to differential-drive commands through the center’s Jacobian, in order to converge to the goal.
Finally, the DisassembleObject action is identical to ReleaseObject, with two important differences. First, we heuristically select as target the middle point of the edge of the polygonal freespace that maximizes the distance to all other movable objects (except the grasped one) and all regions of interest. Second, in order to accelerate performance and shorten the resulting trajectories, we stop the action’s execution if the robot-object pair does not intersect any region of interest and its distance from all other objects in the workspace exceeds a prescribed separation threshold, as this would imply that dropping the object in its current location would not block a next step of the disassembly process. Even though we do not yet report formal results pertaining to the task sequence in the Fix mode, the DisassembleObject action maintains formal guarantees of obstacle avoidance and convergence to a feasible target, using our reactive, vector field controller.
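The first heuristic (picking the drop-off target) can be sketched as follows, with Euclidean point-to-point distances standing in for the paper's set-valued clearances, and the edge/point encodings being our own illustrative choices:

```python
import math

def disassembly_target(freespace_edges, keep_away):
    """Return the freespace-edge midpoint that maximizes the minimum
    distance to all points in `keep_away` (other movable objects and
    regions of interest), mimicking the drop-off heuristic."""
    def midpoint(edge):
        (x1, y1), (x2, y2) = edge
        return ((x1 + x2) / 2, (y1 + y2) / 2)

    def clearance(p):
        return min(math.dist(p, q) for q in keep_away)

    return max((midpoint(e) for e in freespace_edges), key=clearance)
```

For a unit-square freespace with a single object near the left wall, the heuristic picks the midpoint of the right edge, the candidate farthest from everything to be avoided.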
VI Illustrative Simulations
In this Section, we implement simulated examples of different tasks in various environments using our architecture, shown in Fig. 2. All simulations were run in MATLAB using ode45, leveraging and enhancing existing presentation infrastructure (see https://github.com/KodlabPenn/semnav_matlab). The discrete controller and the interface layer are implemented in MATLAB, whereas the reactive controller is implemented in Python and communicates with MATLAB using the standard MATLAB-Python interface. For our numerical results, we assume perfect robot state estimation and localization of obstacles using the onboard sensor, which can instantly identify and localize either the entirety of familiar obstacles or fragments of unknown obstacles within its range (the reader is referred to [42, 43] for examples of physical realization of such sensory assumptions, using a combination of an onboard camera for obstacle recognition and pose estimation, and a LIDAR sensor for extracting distance to nearby obstacles). The reader is referred to the accompanying video submission for visual context and additional simulations.

VI-A Demonstration of local LTL plan fixing
Fig. 3 includes a demonstration of a simple task, encoded in an LTL formula requiring the robot to eventually execute the action Move to navigate to region 1, demonstrating how the Fix mode for local rearrangement of blocking movable objects works.
VI-B Executing more complex LTL tasks
Fig. 4 includes successive snapshots of a more complicated LTL task, captured by a formula which instructs the robot to first navigate to region 1, then navigate to region 2, and finally grasp object 1 and move it to region 3, in an environment cluttered with both familiar nonconvex and completely unknown convex obstacles. Before navigating to region 1, the robot correctly identifies that the movable object disconnects its freespace and proceeds to disassemble it. After visiting region 2, it then revisits the movable object, grasps it and moves it to the designated location to complete the required task. The reader is referred to the video submission for visual context regarding the evolution of all planning spaces [42] (semantic, mapped and model space) during the execution of this task, as well as several other simulations with more movable objects, including (among others) a task where the robot needs to patrol between some predefined regions of interest in an environment cluttered with obstacles by visiting each one of them infinitely often.
VI-C Execution of rearrangement tasks
Finally, a promising application of our reactive architecture concerns rearrangement planning with multiple movable pieces. Traditionally, such tasks are executed using sampling-based planners, whose offline search times can blow up exponentially with the number of movable pieces in the environment (see, e.g., [44, Table I]). Instead, as shown in Fig. 5, the persistent nature of our reactive architecture succeeds in achieving the given task online in an environment with multiple obstacles, even though our approach might require more steps and longer trajectories in the overall assembly process than optimal algorithms [46]. Moreover, the LTL formulas encoding such tasks are quite simple to write (see (1) for the example in Fig. 5), instructing the robot to grasp and release each object in sequence; the reactive controller is capable of handling obstacles and blocking objects during execution time. Our video submission includes a rearrangement example with 4 movable objects, requiring more steps in the assembly process.
VII Conclusion
In this paper, we propose a novel hybrid control architecture for achieving complex tasks with mobile manipulators in the presence of unanticipated obstacles and conditions. Future work will focus on providing endtoend (instead of componentwise) correctness guarantees, extensions to multiple robots for collaborative manipulation tasks, as well as physical experimental validation.
Appendix A Detailed Description of the Symbolic Controller
This Appendix provides a detailed description of the distance metric used in Section III, upon which a discrete controller is designed that generates manipulation commands online. To this end, Appendix A-A first translates the LTL formula into a Nondeterministic Büchi Automaton (NBA) and provides a formal definition of its accepting condition. Then, Appendix A-B provides a detailed description of the construction of the distance metric over the automaton state space. Appendix A-C describes our method for generating symbolic actions online, and Appendix A-D includes the proof of our completeness result. The proposed method is also illustrated in Figs. 6-7.
A-A From LTL formulas to Automata
As reported in Section III, we first translate the specification, constructed using a set of atomic predicates, into a Nondeterministic Büchi Automaton (NBA), defined as follows; see also Fig. 6.
Definition 1 (NBA)
A Nondeterministic Büchi Automaton (NBA) over an alphabet $\Sigma$ is defined as a tuple $(\mathcal{Q}, \mathcal{Q}_0, \delta, \mathcal{Q}_F)$, where (i) $\mathcal{Q}$ is the set of states; (ii) $\mathcal{Q}_0 \subseteq \mathcal{Q}$ is a set of initial states; (iii) $\delta \subseteq \mathcal{Q} \times \Sigma \times \mathcal{Q}$ is a nondeterministic transition relation; and (iv) $\mathcal{Q}_F \subseteq \mathcal{Q}$ is a set of accepting/final states.
To interpret a temporal logic formula over the trajectories of the robot system, we use a labeling function that determines which atomic propositions are true given the current robot action; note that, by definition, these actions also encapsulate the position of the robot in the environment. An infinite sequence of actions satisfies the formula if the corresponding infinite word of labels yields an accepting NBA run, defined as follows [4]. First, a run of the NBA over an infinite word $\sigma = \sigma_1 \sigma_2 \dots$ is a sequence of states $q_0 q_1 q_2 \dots$, where $q_0 \in \mathcal{Q}_0$ and $(q_k, \sigma_{k+1}, q_{k+1}) \in \delta$ for all $k \geq 0$. A run is called accepting if at least one final state appears infinitely often in it. In words, an infinite-length discrete plan satisfies an LTL formula if it can generate at least one accepting NBA run.
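The nondeterministic run construction can be sketched by propagating the set of states reachable under a finite word; the dictionary encoding of the transition relation and the state names are illustrative simplifications:

```python
def run_exists(delta, init, word, targets):
    """Check whether some run of the NBA over the finite `word` ends in one
    of `targets`; `delta` maps (state, symbol) to an iterable of successors.
    Subset propagation handles the nondeterminism of the transition relation."""
    states = set(init)
    for sym in word:
        states = {s for q in states for s in delta.get((q, sym), ())}
    return bool(states & set(targets))
```

This finite-run check is the building block used repeatedly in the distance-metric construction below (e.g., checking that a feasible word drives the automaton from the initial state to a state with an active self-loop).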
A-B Distance Metric Over the NBA
In this Section, given a graph constructed from the NBA, we define a function to compute how far an NBA state is from the set of final states. Following a similar analysis as in [22, 21], we first prune the NBA by removing infeasible transitions that can never be enabled, as they would require the robot to be in more than one region and/or take more than one action simultaneously. Specifically, a symbol is feasible if and only if it does not satisfy the following Boolean formula, which captures such physically impossible configurations:
(2) 
In words, this formula is satisfied when the robot is either simultaneously present in more than one region or takes more than one action at the same time. Specifically, its first line captures the robot being present in two distinct locations simultaneously, while its second line captures the robot taking two distinct actions at the same region simultaneously.
Next, we define the sets that collect all feasible symbols that enable a transition from one NBA state to another, not necessarily different, NBA state. This definition relies on the fact that a transition from one state to another is enabled if a Boolean formula, associated with that transition and defined over the set of atomic predicates, is satisfied; in other words, the second state can be reached from the first under a given symbol if that symbol satisfies the corresponding Boolean formula. An NBA transition is infeasible if there are no feasible symbols that satisfy its Boolean formula. All infeasible NBA transitions are removed, yielding a pruned NBA. All feasible symbols that satisfy the Boolean formula of a transition are collected in a set associated with that transition.
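The pruning step can be sketched as follows, with symbols encoded as sets of predicate strings and a simplified feasibility test (at most one predicate true at a time) standing in for the Boolean formula in (2):

```python
def feasible(symbol):
    """A symbol is a set of atomic predicates assumed true simultaneously;
    in this sketch it is feasible iff it asserts at most one predicate
    (one action at one region), mirroring the physical-exclusivity rule."""
    return len(symbol) <= 1

def prune_nba(transitions):
    """Drop NBA transitions with no feasible enabling symbol and discard
    infeasible symbols from the rest; `transitions` maps (q, q') to the
    set of symbols satisfying that transition's Boolean formula."""
    pruned = {}
    for edge, symbols in transitions.items():
        keep = {s for s in symbols if feasible(s)}
        if keep:
            pruned[edge] = keep
    return pruned
```

A transition enabled only by a symbol asserting two predicates at once (e.g., two simultaneous Move actions) disappears from the pruned automaton.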
To take into account the initial robot state in the construction of the distance metric, we introduce in the pruned automaton an auxiliary state, together with transitions from it to all initial states, which can always be enabled based on the atomic predicate that is initially satisfied; note that if no predicates are satisfied initially, the enabling symbol is the empty symbol [4]. Hereafter, the auxiliary state is considered to be the initial state of the resulting NBA; see also Fig. 7.
Next, we collect all NBA states that can be reached from the initial state in a possibly multi-hop fashion, using a finite sequence of feasible symbols, so that once these states are reached, the robot can always remain in them as long as needed using the same symbol that allowed it to reach them. Formally, we define a set that collects all NBA states (i) that have a feasible self-loop, and (ii) for which there exists a finite and feasible word, i.e., a finite sequence of feasible symbols, incurring a finite NBA run (i.e., a finite sequence of NBA states) that starts at the initial state, ends in the state in question, and activates its self-loop. In math, this set is defined as:
(3)  
By definition, this set contains the auxiliary initial state.
Among all possible pairs of states in this set, we examine which transitions, possibly multi-hop, can be enabled using feasible symbols so that, once the destination state is reached, the robot can remain in it forever using the same symbol that allowed it to reach it. Formally, consider any two such states (i) that are connected through a (possibly multi-hop) path in the NBA, and (ii) for which there exists a symbol so that, if it is repeated a finite number of times starting from the first state, the following finite run can be generated:
(4) 
where , for some finite . In (4), the run is defined so that (i) , for all ; (ii) is not valid for all , i.e., the robot cannot remain in any of the intermediate states (if any) that connect to either because a feasible selfloop does not exist or because cannot activate this selfloop; and (iii) i.e., there is a feasible loop associated with that is activated by . Due to (iii), the robot can remain in as long as is generated. The fact that the finite repetition of a single symbol needs to generate the run (4) precludes multihop transitions from to that require the robot to jump from one region of interest to another one instantaneously as such transitions are not meaningful as discussed in Section II; see also Fig. 7. Hereafter, we denote the  potentially multihop  transition incurred due to the run (4) by .
Then, we construct the directed graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ is the set of nodes and $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is the set of edges. The set of nodes is defined so that $\mathcal{V} = \mathcal{D}$, and the set of edges is defined so that $(q_B, q_B') \in \mathcal{E}$ if there exists a feasible symbol that incurs the run defined in (4); see also Fig. 7.
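The edge test behind this construction — does some single symbol, repeated, induce the run in (4)? — can be sketched as follows. The dict-based transition map and helper names are illustrative assumptions, not the paper's code:

```python
from collections import deque
from itertools import product

def has_edge(q, q_prime, symbols, delta):
    """Check whether (q, q_prime) is a graph edge, i.e., whether repeating
    some single symbol sigma induces the run of Eq. (4).
    `delta` maps (state, symbol) -> set of successor states (assumed layout)."""
    for sigma in symbols:
        # Condition (iii): sigma must activate the self-loop of q_prime.
        if q_prime not in delta.get((q_prime, sigma), ()):
            continue
        # BFS over sigma-successors; condition (ii) forbids intermediate
        # states whose self-loop is activated by sigma.
        frontier, seen = deque([q]), {q}
        while frontier:
            p = frontier.popleft()
            for r in delta.get((p, sigma), ()):
                if r == q_prime:
                    return True
                if r not in seen and r not in delta.get((r, sigma), ()):
                    seen.add(r)
                    frontier.append(r)
    return False

def build_graph(D, symbols, delta):
    """Collect the edge set over the node set D (self-edges omitted)."""
    return {(q, qp) for q, qp in product(D, D)
            if q != qp and has_edge(q, qp, symbols, delta)}
```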
Given the graph $\mathcal{G}$, we define the following distance metric.
Definition 2 (Distance Metric)
Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ be the directed graph that corresponds to the NBA. Then, we define the distance function $d : \mathcal{V} \times \mathcal{V} \rightarrow \mathbb{N}$ as follows
(5)  $d(q_B, q_B') = \begin{cases} |SP_{q_B,q_B'}|, & \text{if } SP_{q_B,q_B'} \text{ exists}, \\ \infty, & \text{otherwise}, \end{cases}$
where $SP_{q_B,q_B'}$ denotes the shortest path (in terms of hops) in $\mathcal{G}$ from $q_B$ to $q_B'$ and $|SP_{q_B,q_B'}|$ stands for its cost (number of hops).
In words, $d(q_B, q_B')$ returns the minimum number of edges in the graph $\mathcal{G}$ that are required to reach the state $q_B'$ starting from the state $q_B$. This metric can be computed using available shortest-path algorithms, such as the Dijkstra method, whose worst-case complexity is $O(|\mathcal{E}| + |\mathcal{V}| \log |\mathcal{V}|)$ with a Fibonacci-heap implementation.
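Since every edge of the graph has unit cost, the distance in (5) can equivalently be computed by breadth-first search, which matches Dijkstra's output on unit-weight graphs. A minimal sketch, with the graph given as explicit node and edge sets (an assumed representation):

```python
from collections import deque

def hop_distance(nodes, edges, source):
    """Shortest-path length (in hops) from `source` to every node of the
    directed graph (nodes, edges); unreachable nodes get float('inf')."""
    dist = {q: float('inf') for q in nodes}
    dist[source] = 0
    frontier = deque([source])
    while frontier:
        q = frontier.popleft()
        for (u, v) in edges:
            if u == q and dist[v] == float('inf'):
                dist[v] = dist[q] + 1  # first visit is the shortest in hops
                frontier.append(v)
    return dist
```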
Next, we define the final/accepting edges in $\mathcal{G}$ as follows.
Definition 3 (Final/Accepting Edges)
An edge $(q_B, q_B') \in \mathcal{E}$ is called final or accepting if the corresponding multi-hop NBA transition $q_B \rightarrow_{\mathcal{G}} q_B'$ includes at least one final state $q_B^F \in \mathcal{Q}_B^F$.
Based on the definition of accepting edges, we define the set $\mathcal{V}_F \subseteq \mathcal{V}$ that collects all states from which an accepting edge originates, i.e.,
(6)  $\mathcal{V}_F = \{ q_B \in \mathcal{V} \mid \exists\, (q_B, q_B') \in \mathcal{E} \text{ that is accepting} \}.$
By definition of the accepting condition of the NBA, we have that if at least one of the accepting edges is traversed infinitely often, then the corresponding LTL formula is satisfied.
A-C Online symbolic controller
In this section, we present how manipulation commands are generated online. The proposed controller requires as input the graph $\mathcal{G}$ defined in Appendix A-B. The main idea is that, as the robot navigates the unknown environment, it selects NBA states that it should visit next so that the distance to the final states, as per (7), decreases over time.
Let $q_B(t)$ be the NBA state that the robot has reached after navigating the unknown environment for $t$ time units. At $t = 0$, $q_B(0)$ is selected to be the initial NBA state $q_B^{\text{aux}}$. Given the current NBA state $q_B(t)$, the robot selects a new NBA state, denoted by $q_B^{\text{next}}$, that it should reach next to make progress towards accomplishing its task. This state is selected among the neighbors of $q_B(t)$ in the graph $\mathcal{G}$ based on the following two cases. If $q_B(t) \notin \mathcal{V}_F$, where $\mathcal{V}_F$ is defined in (6), then among all neighboring nodes we select one that satisfies
(8)  $d_F(q_B^{\text{next}}) = d_F(q_B(t)) - 1,$
i.e., a state that is one hop closer to the set $\mathcal{V}_F$ than $q_B(t)$ is, where $d_F$ is defined in (7). Under this policy of selecting $q_B^{\text{next}}$, we have that eventually $q_B(t) \in \mathcal{V}_F$; controlling the robot to ensure this property is discussed in Section V. If $q_B(t) \in \mathcal{V}_F$, then the state $q_B^{\text{next}}$ is selected so that $(q_B(t), q_B^{\text{next}})$ is an accepting edge as per Definition 3. This way, we ensure that accepting edges are traversed infinitely often and, therefore, the assigned LTL task is satisfied.
Given the selected state $q_B^{\text{next}}$, a feasible symbol $\sigma^*$ is selected that can enable the transition from $q_B(t)$ to $q_B^{\text{next}}$, i.e., that can incur the run (4). By definition of the run in (4), it suffices to select a symbol that satisfies the following Boolean formula:
(9)  $b = \Big( \bigwedge_{k=0}^{K-1} b^{q_B^k, q_B^{k+1}} \Big) \wedge b^{q_B^{\text{next}}, q_B^{\text{next}}},$
where $q_B^0 = q_B(t)$ and $q_B^K = q_B^{\text{next}}$. In words, the Boolean formula in (9) is the conjunction of all Boolean formulas that need to be satisfied simultaneously to reach $q_B^{\text{next}}$ through a multi-hop path. Once such a symbol is generated, a point-to-point navigation and manipulation command is accordingly generated. For instance, if this symbol requires a certain action to be applied to a specific movable object at a known region of interest, then the robot has to go to that region of interest and apply the action to the movable object. The online implementation of such an action is discussed in Section V.
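One decision step of this online policy can be sketched as follows. Here `d_F` plays the role of the distance-to-accepting-set function referenced as (7) (a state already in the accepting set has distance zero), and the set-of-pairs inputs are illustrative assumptions:

```python
def select_next_state(q_curr, edges, accepting_edges, d_F):
    """One step of the online symbolic controller: given the current NBA
    state, return the neighbor in the graph to pursue next.
    `edges` and `accepting_edges` are sets of (u, v) pairs; `d_F` maps each
    node to its hop distance to the accepting set (assumed layouts)."""
    neighbors = [v for (u, v) in edges if u == q_curr]
    if d_F[q_curr] > 0:
        # Case 1: current state not in the accepting set -- move one hop
        # closer to it, mirroring Eq. (8).
        return min(neighbors, key=lambda v: d_F[v])
    # Case 2: current state originates an accepting edge (Definition 3) --
    # traverse one, so that accepting edges recur infinitely often.
    for (u, v) in accepting_edges:
        if u == q_curr:
            return v
    raise ValueError("no accepting edge originates at the current state")
```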
A-D Completeness of the symbolic controller
In what follows, we provide the proof of Proposition 1.
Proof of Proposition 1.
To show this result, it suffices to show that eventually the accepting condition of the NBA is satisfied, i.e., that the robot will visit at least one of the final NBA states infinitely often. Equivalently, as discussed in Appendix A-B, it suffices to show that accepting edges $(q_B, q_B') \in \mathcal{E}$, where $q_B \in \mathcal{V}_F$, are traversed infinitely often.
First, consider an infinite sequence of time instants $T = t_0, t_1, \dots, t_k, \dots$, where $t_{k+1} > t_k$, so that an edge in $\mathcal{G}$, defined in Appendix A-B, is traversed at every time instant $t_k$. Let $e(t_k)$ denote the edge that is traversed at time $t_k$. Thus, $T$ yields the following sequence of edges $e(t_0), e(t_1), \dots$, where $e(t_0) = (q_B^{\text{aux}}, q_B(t_1))$, $e(t_k) = (q_B(t_k), q_B(t_{k+1}))$, and the state $q_B(t_{k+1})$ is defined based on the following two cases. If $q_B(t_k) \notin \mathcal{V}_F$, then the state $q_B(t_{k+1})$ is closer to $\mathcal{V}_F$ than $q_B(t_k)$ is, i.e., $d_F(q_B(t_{k+1})) = d_F(q_B(t_k)) - 1$, where $d_F$ is defined in (7). If $q_B(t_k) \in \mathcal{V}_F$, then $q_B(t_{k+1})$ is selected so that an accepting edge originating from $q_B(t_k)$ is traversed. By definition of $T$, the 'distance' to $\mathcal{V}_F$ decreases as $t_k$ increases, i.e., given any time instant $t_k$, there exists a time instant $t_{k'} \geq t_k$ so that $q_B(t_{k'}) \in \mathcal{V}_F$, and then at the next time instant an accepting edge is traversed. This means that $T$ includes an infinite number of accepting edges. This sequence exists since, by assumption, there exists an infinite sequence of manipulation actions that satisfies the assigned LTL formula. Particularly, recall that by construction of the graph $\mathcal{G}$, the set of edges in this graph captures all NBA transitions besides (i) those that require the robot to be in more than one region simultaneously, and (ii) multi-hop NBA transitions that require the robot to jump instantaneously from one region of interest to another, which are not meaningful in practice. As a result, if there does not exist at least one such sequence $T$, i.e., at least one infinite path in $\mathcal{G}$ that starts from the initial state and traverses at least one accepting edge infinitely often, then there is no plan that satisfies the LTL formula (unless conditions (i)-(ii) mentioned before are violated).
Assume that the discrete controller selects NBA states as discussed in Appendix A-C. To show that the discrete controller is complete, it suffices to show that it can generate an infinite sequence of edges $T$ as defined before. Note that the discrete controller selects the next NBA states that the robot should reach in the same way as discussed before. Also, by assumption, the environmental structure and the continuous-time controller ensure that at least one of the candidate next NBA states (i.e., the ones that can decrease the distance to $\mathcal{V}_F$) can be reached. Based on these two observations, we conclude that such a sequence $T$ will be generated, completing the proof. ∎
Note that the graph $\mathcal{G}$ is agnostic to the structure of the environment, meaning that an edge in $\mathcal{G}$ may not be traversable in practice. For instance, consider an edge in this graph that is enabled only if the robot applies a certain action to a movable object located in a region blocked by fixed obstacles; in this case, the continuous-time controller will not be able to execute this action due to the environmental structure. Satisfaction of the second assumption in Proposition 1 implies that if such scenarios never happen (e.g., all regions and objects that the robot needs to interact with are accessible and the continuous-time controller allows the robot to reach them), then the proposed hybrid control method will satisfy the assigned LTL task if this formula is feasible. However, if the second assumption does not hold, there may be an alternative sequence of automaton states that satisfies the LTL formula, which the proposed algorithm failed to find due to the a priori unknown structure of the environment.
Appendix B Reactive Controller Overview
This Appendix provides a brief description of the reactive, vector field controller from [42] used in this work. As shown in Fig. 3, the robot navigates the physical space and discovers obstacles (e.g., using the semantic mapping engine in [7]), which are dilated by the robot radius and stored in the semantic space. Potentially overlapping obstacles in the semantic space are subsequently consolidated in real time to form the mapped space. A change of coordinates from this space is then employed to construct a geometrically simplified (but topologically equivalent) model space, by merging familiar obstacles overlapping with the boundary of the enclosing freespace into this boundary, deforming other familiar obstacles to disks, and leaving unknown obstacles intact. It is shown in [42] that the constructed change of coordinates between the mapped and the model space, for a given index set of instantiated familiar obstacles, is a diffeomorphism away from sharp corners. Using this diffeomorphism, we construct a hybrid vector field controller (with modes indexed by the set of instantiated familiar obstacles, i.e., depending on external perceptual updates) that guarantees simultaneous obstacle avoidance and target convergence, while respecting input command limits, in unexplored semantic environments (see [42, Eq. (1)]). The reactive controller is further extended to accommodate differential-drive robot dynamics, while maintaining the same formal guarantees.
Appendix C Description of the Topology Checking Algorithm
This Appendix includes an algorithmic outline of the topology checking algorithm from Section V-A, shown in Algorithm 1. The algorithm works as follows. Starting with the initially assumed polygonal enclosing freespace for either the robot or the robot-object pair, we subtract the union of all known dilated movable objects and fixed obstacles (corresponding to the index set of localized familiar obstacles), using standard logic operations with polygons (see, e.g., [11, 9, 10]). This operation results in a list of freespace components. From this list, we identify the freespace component that contains the robot position (or the robot-object pair center) and redefine the enclosing freespace as its convex hull.
If the goal is contained in this freespace component, the reactive controller proceeds as usual, using it for the diffeomorphism construction (see Appendix B) and treating all other freespace components as obstacles. Otherwise, we need to check whether movable objects or fixed obstacles cause a freespace disconnection that does not allow for successful action completion. Namely, we need to check whether both the robot position (or the robot-object pair center) and the target are included in the same connected component of the union of all freespace components with all dilated movable objects. This would imply that a subset of movable objects blocks the target configuration. In that case, the robot switches to the Fix mode to rearrange these objects; otherwise, the interface layer reports to the symbolic controller that the current action is infeasible.
In the former case, we proceed one step further to identify the blocking movable objects in order to reconfigure them on the fly. First, we isolate the connected components of the union of all movable objects into a list; we refer to the elements of that list as movable object clusters. Assuming that each movable object cluster is connected to at most two freespace components, we build a connectivity tree rooted at the robot's (or the robot-object pair's) freespace component, by checking whether the closures of two individual regions overlap; the tree's vertices are geometric regions (freespace components and movable object clusters) and its edges denote adjacency. We then backtrack from the vertex of the tree that contains the goal until we reach the root, saving the encountered movable object clusters along the way. Any movable object intersecting any of these clusters is pushed to a stack of blocking movable objects.
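The tree construction and backtracking step above can be sketched with an abstract region-adjacency map standing in for the polygon computations (the region identifiers and dict representation are assumptions for illustration):

```python
from collections import deque

def blocking_clusters(adjacency, root, goal_region, cluster_ids):
    """Identify the movable-object clusters separating the robot's freespace
    component (`root`) from the region containing the goal.
    `adjacency` maps each region id to its adjacent regions (i.e., regions
    whose closures overlap); `cluster_ids` marks movable-object clusters."""
    # BFS from the root builds the connectivity tree via parent pointers.
    parent = {root: None}
    frontier = deque([root])
    while frontier:
        r = frontier.popleft()
        for nbr in adjacency.get(r, ()):
            if nbr not in parent:
                parent[nbr] = r
                frontier.append(nbr)
    if goal_region not in parent:
        return None  # goal unreachable even through movable objects
    # Backtrack from the goal's region to the root, saving clusters met.
    blocking, r = [], goal_region
    while r is not None:
        if r in cluster_ids:
            blocking.append(r)
        r = parent[r]
    return blocking
```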
References
 [1] (2018) Reactive mission and motion planning with deadlock resolution avoiding dynamic obstacles. Autonomous Robots 42 (4), pp. 801–824. Cited by: §IB.

 [2] (2016) Coordinated robot navigation via hierarchical clustering. IEEE Transactions on Robotics 32 (2), pp. 352–371. Cited by: §IA.
 [3] (2018) Sensor-Based Reactive Navigation in Unknown Convex Sphere Worlds. The International Journal of Robotics Research 38 (12), pp. 196–223. Cited by: §IIA, §IIA.
 [4] (2008) Principles of Model Checking. MIT Press, Cambridge, MA. Cited by: Fig. 7, §AA, §AB, §IIB.
 [5] (2005) Discrete abstractions for robot motion planning and control in polygonal environments. IEEE Transactions on Robotics 21 (5), pp. 864–874. Cited by: §IB.
 [6] (2018) A hybrid barrier certificate approach to satisfy linear temporal logic specifications. In American Control Conference, pp. 634–639. Cited by: §AB.
 [7] (2017) Probabilistic data association for semantic SLAM. In IEEE Int. Conf. Robotics and Automation, pp. 1722–1729. Cited by: Appendix B.
 [8] (1983) Asymptotic stability and feedback stabilization. In Differential Geometric Control Theory, pp. 181–191. Cited by: §VB.
 [9] (1993) A Small Set of Formal Topological Relationships Suitable for End-User Interaction. Advances in Spatial Databases, pp. 277–295. Cited by: Appendix C.
 [10] (1973) Algorithms for the Reduction of the Number of Points Required to Represent a Digitized Line or its Caricature. Cartographica: The International Journal for Geographic Information and Geovisualization 10, pp. 112–122. Cited by: Appendix C.
 [11] (1993) Slicing an ear using prune-and-search. Pattern Recognition Letters 14 (9), pp. 719–722. Cited by: Appendix C.
 [12] (2005) Hybrid controllers for path planning: a temporal logic approach. In 44th IEEE Conference on Decision and Control, European Control Conference (CDC-ECC), Seville, Spain, pp. 4885–4890. Cited by: §IIB.
 [13] (2018) Sampling-based methods for factored task and motion planning. The International Journal of Robotics Research 37 (13-14), pp. 1796–1825. Cited by: §IA, §I.
 [14] (2001) Fast LTL to Büchi automata translation. In International Conference on Computer Aided Verification, pp. 53–65. Cited by: Fig. 6.
 [15] (2015) Multi-agent plan reconfiguration under local LTL specifications. The International Journal of Robotics Research 34 (2), pp. 218–235. Cited by: §IB.
 [16] (2013) Revising motion planning under linear temporal logic specifications in partially known workspaces. In IEEE International Conference on Robotics and Automation, pp. 5025–5032. Cited by: §IB.
 [17] (2010) Navigation Among Movable Obstacles in unknown environments. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1433–1438. Cited by: §IA.
 [18] (2015) Towards manipulation planning with temporal logic specifications. In IEEE International Conference on Robotics and Automation, pp. 346–352. Cited by: §I, §IIB.
 [19] (1984) On the Complexity of Motion Planning for Multiple Independent Objects; PSPACE-Hardness of the “Warehouseman’s Problem”. The International Journal of Robotics Research 3 (4), pp. 76–88. Cited by: §IA.
 [20] (2001) Assembly as a noncooperative game of its pieces: analysis of 1D sphere assemblies. Robotica 19 (01), pp. 93–108. Cited by: §IA.
 [21] (2020) Reactive temporal logic planning for multiple robots in unknown environments. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 11479–11485. Cited by: §AB, §IIIA, §III.
 [22] (2020) STyLuS*: a temporal logic optimal control synthesis algorithm for large-scale multi-robot systems. The International Journal of Robotics Research 39 (7), pp. 812–836. Cited by: §AB, §IIIA.
 [23] (2004) Feedbackbased eventdriven parts moving. IEEE Transactions on Robotics 20 (6), pp. 1012–1018. Cited by: §IA.
 [24] (2014) Coordinated Navigation of Multiple Independent Disk-Shaped Robots. IEEE Transactions on Robotics 30 (6), pp. 1289–1304. Cited by: §IA.
 [25] (2008) A Fully Automated Framework for Control of Linear Systems from Temporal Logic Specifications. IEEE Transactions on Automatic Control 53 (1), pp. 287–297. Cited by: §IIB.

 [26] (2018) From Skills to Symbols: Learning Symbolic Representations for Abstract High-Level Planning. Journal of Artificial Intelligence Research 61, pp. 215–289. Cited by: §IA.
 [27] (2009) Temporal-logic-based reactive mission and motion planning. IEEE Transactions on Robotics 25 (6), pp. 1370–1381. Cited by: §IB, §IIB.
 [28] (2015) Dealing with difficult instances of object rearrangement. In Robotics: Science and Systems. Cited by: §IA, §I.
 [29] (2016) Iterative temporal planning in uncertain environments with partial satisfaction guarantees. IEEE Transactions on Robotics 32 (3), pp. 583–599. Cited by: §IB.
 [30] (2013) Planning with movable obstacles in continuous environments with uncertain dynamics. In IEEE International Conference on Robotics and Automation, pp. 3832–3838. Cited by: §IA.
 [31] (2012) Backtracking temporal logic synthesis for uncertain environments. In IEEE International Conference on Robotics and Automation, pp. 5163–5170. Cited by: §IB.
 [32] (2013) Patching task-level robot controllers based on a local µ-calculus formula. In IEEE International Conference on Robotics and Automation, pp. 4588–4595. Cited by: §IB.
 [33] (2013) Iterative temporal motion planning for hybrid systems in partially unknown environments. In The 16th International Conference on Hybrid Systems: Computation and Control, pp. 353–362. Cited by: §IB.
 [34] (2006) Synthesis of reactive(1) designs. In International Workshop on Verification, Model Checking, and Abstract Interpretation, pp. 364–380. Cited by: §IB.
 [35] (2008) Approximately bisimilar symbolic models for nonlinear control systems. Automatica 44 (10), pp. 2508–2516. Cited by: §IB.
 [36] (1992) Exact Robot Navigation Using Artificial Potential Functions. IEEE Transactions on Robotics and Automation 8 (5), pp. 501–518. Cited by: §IA.
 [37] (2018) SMC: Satisfiability Modulo Convex Programming. Proceedings of the IEEE 106 (9), pp. 1655–1679. Cited by: §IIB.
 [38] (2014) Combined task and motion planning through an extensible plannerindependent interface layer. In IEEE International Conference on Robotics and Automation, pp. 639–646. Cited by: §I.
 [39] (2008) Planning Among Movable Obstacles with Artificial Constraints. The International Journal of Robotics Research 27 (11-12), pp. 1295–1307. Cited by: §IA.
 [40] (2010) Path planning among movable obstacles: a probabilistically complete approach. In The 8th International Workshop on the Algorithmic Foundations of Robotics. Cited by: §IA.
 [41] (2018) Sensor-Based Reactive Symbolic Planning in Partially Known Environments. In IEEE International Conference on Robotics and Automation, pp. 5683–5690. Cited by: §V.
 [42] (2020) Reactive Semantic Planning in Unexplored Semantic Environments Using Deep Perceptual Feedback. IEEE Robotics and Automation Letters 5 (3), pp. 4455–4462. Cited by: Appendix B, Fig. 2, §IA, §IC, §IIA, §IIA, §IIA, §IIA, §VA, §V, §VIB, footnote 7.
 [43] (2020) Reactive Navigation in Partially Familiar Planar Environments Using Semantic Perceptual Feedback. Under review, arXiv: 2002.08946. Cited by: §IIA, §IIA, §IIA, footnote 7.
 [44] (2017) Anchoring abstractions for near-optimal task and motion planning. In RSS Workshop on Integrated Task and Motion Planning. Cited by: §IA, §VIC.
 [45] (2016) Asymptotically optimal planning under piecewise-analytic constraints. In The 12th International Workshop on the Algorithmic Foundations of Robotics. Cited by: §I.
 [46] (2018) Admissible abstractions for near-optimal task and motion planning. In International Joint Conference on Artificial Intelligence. Cited by: §IA, §VIC.
 [47] (1992) Toward the Automatic Control of Robot Assembly Tasks via Potential Functions: The Case of 2D Sphere Assemblies. In IEEE International Conference on Robotics and Automation, pp. 2186–2192. Cited by: §IA.
 [48] (2010) Combined task and motion planning for mobile manipulation. In International Conference on Automated Planning and Scheduling. Cited by: §IA.