I Introduction
This paper presents a modular, hierarchical framework for motion planning and control of robotic systems. While motion planning has received a great deal of attention by many researchers, because the problem is highly complex especially when there are several robotic agents working together in a cluttered environment, significant challenges remain. Hierarchy, in which the control design has several layers, is an architectural strategy to overcome this complexity. Almost all hierarchical frameworks for motion planning aim to balance flexibility in the control specification at the high level, guarantees on correctness and safety at the low level, and computational feasibility overall.
Historically motion planning was focused on high level planning algorithms, while suppressing details on the dynamic capabilities of the robots at the low level [19]. Taking full account of low level dynamics in combination with solving the high level planning problem can lead to a computationally intractable problem. Despite the wealth of available research [9, 28, 3, 17], computationally efficient solutions to the motion planning problem with tight integration of high and low levels are highly sought after.
We propose a modular hierarchical framework so that one can independently plug and play both low level controllers and high level planning algorithms in order to realize a balance between flexibility at the high level, safety at the low level, and computational feasibility. To make a customizable approach feasible, we introduce three assumptions. First, the output space of the underlying dynamical system has translational symmetry, namely position invariance, a property satisfied by many robotic models [12]. Second, the output space is gridded uniformly into rectangular boxes. Finally, the control capabilities are discretized into a finite set of motion primitives, where the low level describes the implementation of the motion primitives while the high level selects the motion primitives. Together, these assumptions imply that motion primitives can be designed over a single box, so that they can then be reapplied to any other box.
Now we give an overview of the features and techniques we employ, and we highlight other frameworks that share those features. We provide general formulation of motion primitives for nonlinear systems so that they can be applied to multirobot systems. We focus on reachavoid specifications in a priori known environments, in which the system must reach a desired configuration in a safe manner [3, 9, 15, 20]. Reachavoid offers a fairly rich behavior set so that, for instance, a fragment of linear temporal logic (LTL) can be encoded as a sequence of reachavoid problems [33], as we also show in our applications.
As we have mentioned, we abstract the output space into rectangular regions [28] rather than more general polytopic regions [9, 11, 17, 20] in order to exploit symmetry. Motion primitives have been employed in various ways [28, 15] and we encode feasible sequences of motion primitives by a maneuver automaton [12]. In contrast to the motion primitive methods above, our implementation of the lowlevel control design of motion primitives is based on reach control theory[27, 4], which provides a highly flexible and intuitive set of design tools that have two notable advantages over tracking: first, it is not necessary to find feasible openloop trajectories to track; second, safety constraints on the system states during the execution and concatenation of motion primitives can be guaranteed by design. Finally, planning at the high level is based on standard shortest path algorithms [19, 5] applied to the graph arising from the synchronous product of the discrete part of the maneuver automaton and the graph arising from the output space partition. The highlevel plan generates a control policy, which selects the motion primitives over the gridded output space. The modularity of our approach enables one to employ other closedloop methods such as potential methods [9]
or vectorfield shaping
[20] for lowlevel control design, and standard or customized graph search algorithms to generate a highlevel plan.There are three main contributions of this work. First, we provide the complete theoretical details on the requirements for the lowlevel control design and highlevel plan, and show that these two levels operate consistently to solve the reachavoid problem. Second, we formulate the parallel composition of maneuver automata in order to obtain correctbydesign motion primitives for a system composed of individual subsystems, such as in the case of multiple vehicles. Finally, the modularity and effectiveness of our framework is experimentally validated on a group of quadrocopters in several illustrative scenarios. In particular, we feature a novel and versatile design of motion primitives based on double integrators and we show how the customizability of the high level plan generation can be used to easily tradeoff solution quality with computational efficiency. This paper is an extension of our previous work [31], which now supplies all the theoretical details along with proofs on correctness, the parallel composition construction, additional approaches to generate control policies, and more elaborate experimental results.
The paper is organized as follows. In the next section we highlight our contributions relative to the literature. In Section III we present a formal problem statement. The modular framework is introduced in Section IV. We define the output transition system, the maneuver automaton, the product automaton, and the high level plan, each of which contribute to realizing a solution of the motion planning problem. In Section V, we prove that our overall methodology solves the motion planning problem. In Section VI we give the procedure for composing motion primitives. In Section VII we present specific motion primitives for a double integrator system. In Section VIII we consider several methods to generate high level plans, which are experimentally demonstrated on quadrocopters. We conclude the paper in Section IX.
Notation. Let denote the integers and denote the real numbers. Let denote the cardinality of a set. If is a set, we denote its power set as . If and are sets, let denote the usual set difference. If there are sets , let denote the usual cartesian product. Given a function , the image of under and the preimage of under are defined the usual way, and are denoted as and , respectively. Let denote the convex hull of the vectors . Given two vectors , we denote the componentwise multiplication (or Hadamard product) as . Let denote the set of globally Lipschitz vector fields on .
Ii Related Literature
The literature on motion planning is vast and encompasses many research communities. As such, we have categorized some common approaches and discussed how they relate to our method.
Iia Graph Search and Trajectory Planning
Motion planning has often been addressed as a discrete planning problem, for which many standard graph search algorithms exist [19]. Recent work on the multiagent reachavoid problem has developed novel algorithms in the context of applications such as manufacturing and warehouse automation, aiming to balance computational efficiency with solution quality. For example, a centralized approach is given in [35]
, discretizing the workspace into a lattice and using integer linear programming to minimize the total time for robots to traverse in high densities. In
[10], a samplingbased roadmap is constructed in the joint robot space using individual robot roadmaps, which is shown to be asymptotically optimal. Prioritized planning enables to safely coordinate many vehicles and is considered in a centralized and decentralized fashion in [6]. Subdimensional expansion computes mainly decentrally, but coordinates in the joint search space when agents are neighboring [32]. While such approaches typically provide various theoretical guarantees on the proposed algorithms, dynamical models and application on real robotic systems is often not considered.The modularity of our framework is complementary, as it potentially enables existing multiagent literature on gridded workspaces to be used directly or adapted for the generation of a highlevel plan when used in conjunction with our proposed formulation of motion primitives. However, the consideration of continuous time dynamics may complicate the application of discrete planning methods in two ways. First, we must contend with constraints on successive motion primitives so that the continuous time behavior is acceptable  for example, avoiding abrupt changes in velocity. Second, we must contend with nondeterministic transitions to neighboring boxes, because motion primitives may allow more than one next box to be reached [18]  for example, modeling the joint asynchronous motion capabilities of a multirobot system.
Trajectory tracking methods have also been applied to the formation change problem on real vehicles with complex dynamics. A sequential convex programming approach is given in [2], which computes discretized, noncolliding positional trajectories for a modest number of quadrocopters. More recently, an impressive number of quadrocopters were coordinated in [25], by first computing a sequence of gridbased waypoints and then refining it into smoother piecewise polynomials. However, since these openloop trajectories are computed offline, deviations from the computed trajectories could result in crashes. On the other hand, our approach is more robust as it is completely untimed, carefully monitoring the progress of vehicles over the grid in a reactive way based on the measured box transitions.
IiB Formal Methods
A growing body of research has explored the use of formal methods in motion planning. This paper has been particularly inspired by [17], which provides a general framework for solving control problems for affine systems with LTL specifications. Their approach involves constructing a transition system over a polyhedral partition of the state space that arises from linear inequality constraints that constitute the atomic propositions of the LTL specification. Transitions between states of the transition system can occur if there exists an affine or piecewise affine feedback steering all continuous time trajectories from one polyhedral region to a contiguous one. Similar works to [17] include [9, 14, 20], which consider the simpler reachavoid problem. Single and multirobot applications followed shortly after in [11] and [3] respectively.
The appeal of these approaches is derived from their generality and faithful account of the low level system capabilities. On the downside, these methods generally do not scale well to larger state space dimensions, and so they would have limited applicability to large multirobot systems. Our approach specializes these ideas by exploiting symmetry in the system dynamics and grid partition in order to strike a better balance between generality and computational efficiency. In particular, our feedback controllers are given as motion primitives, which can be designed independently of the obstacle and goal locations.
More recent works have also built on these formal method approaches, investigating more complex and realistic multirobot problems. For example, service requests by multiple car robots in a citylike environment with communication constraints was considered in [7]. A cooperative task for ground vehicles was addressed in a distributed manner, enabling knowledge sharing amongst neighbors and reconfiguration of the motion plan in real time [13]. Tasks such as picking up objects are considered in conjunction with motion requirements in [26]. Since these works consider only fairly simple vehicle dynamics, they place greater emphasis on the synthesis of discrete plans satisfying the task specification. On the other hand, this paper considers the simpler reachavoid problem in order to develop a formulation of motion primitives for nonlinear systems with symmetries.
IiC Motion Primitives
The usage of motion primitives has become popular recently in robotics, as they serve to simplify the motion planning problem by using predefined executable motion segments. Many variations exist, which have designed motion primitives using timed reference trajectories to control a formation of quadrocopters [28], paths on a state space lattice for a mobile robots [24, 8], and funnels in the state space centered about a reference trajectory for a car [15] and a small airplane [22].
We have been inspired by ideas in [12], from which we borrowed the term “maneuver automaton”. They define a motion primitive either as an equivalence class of trajectories or a timed maneuver between two classes, whereas we define a motion primitive as a feedback controller over a polyhedral region in the state space. In our formulation, concatenations between motion primitives are possible only across contiguous boxes in the output space, which provides a strict safety guarantee during concatenation. Moreover, this enables our approach to simplify obstacle avoidance to a discrete planning problem over safe boxes as in [8], bypassing the need to concatenate motion primitive trajectories using numerical optimization techniques as in [12].
Our presentation of the maneuver automaton gives explicit constraints on the design of motion primitives so that they can used reliably for high level planning. We have also introduced the notion of parallel composition of maneuver automata to build motion primitives for multirobot systems. While our construction resembles existing methods of parallel composition [34, 30], we additionally prove that our construction preserves desired properties that enable consistency between the low and high levels. To the authors’ best knowledge, this paper is the first rigorous treatment of feedbackbased motion primitives defined on a uniformly gridded output space.
Iii Problem Statement
Consider the general nonlinear control system
(1) 
where is the state, is the input, and is the output. Let and denote the state and output trajectories of (1) starting at initial condition and under some openloop or feedback control.
Let be a feasible set in the output space and let be a goal set. In multivehicle motion planning contexts, represents the feasible joint output configurations of the system, which can arise from specifications involving obstacle avoidance, collision avoidance, communication constraints, and others. We consider the following problem.
Problem III.1 (ReachAvoid).
We are given the system (1), a nonempty feasible set and a nonempty goal set . Find a feedback control and a set of initial conditions such that for each we have

Avoid: for all ,

Reach: there exists such that for all , .
We make an assumption regarding the outputs of the system (1) in order to exploit symmetry; see [12] for an exposition on nonlinear control systems with symmetries.
Assumption III.1.
First, we assume that there is an injective map associating each output to a distinct state, so that . We define the (injective) insertion map as , which satisfies and for all . Second, we assume that the system has a translational invariance with respect to its outputs. That is, for all , and , we have .
The assumption that the outputs of the system are a subset of the states is used in our framework to be able to design feedback controllers in the full state space that achieve desirable behavior in the output space. The second statement says that the vector field is invariant to the value of the output. In the literature this condition is called a symmetry of the system or translational invariance. This assumption is satisfied for many robotic systems, for example, when the outputs are positions. Also, we will see in Section VII that it significantly simplifies our control design.
Iv Modular Framework
In this section we present our methodology to solve the motion planning problem in the form of an architecture that breaks down Problem III.1. This architecture consists of five main modules, as depicted in Figure 2.

The Output Transition System (OTS) is a directed graph whose nodes (called locations) represent dimensional boxes on a gridded output space and whose edges describe which boxes in the output space are contiguous.

The Maneuver Automaton (MA) is a hybrid system whose modes correspond to socalled motion primitives. Each motion primitive is associated with a closedloop vector field by applying a feedback law to (1). The edges of the MA represent feasible successive motion primitives. Each motion primitive generates some desired behavior of the output trajectories of the closedloop system over a box in the output space. Because of the uniform gridding of the output space into boxes and because of the symmetry in the outputs described in Assumption III.1, motion primitives can be designed over only one canonical box .

The Product Automaton (PA) is a graph which is the synchronous product of the OTS and the discrete part of the MA. It represents the combined constraints on feasible motions in the output space and feasible successive motion primitives.

The Hybrid Control Strategy is a combination of low level controllers obtained from the design of motion primitives, and a high level plan on the product automaton.
Next we describe in greater detail the OTS, MA, and PA.
Iva Output Transition System
The OTS provides an abstract description of the workspace or output space associated with the system (1). It serves to capture the feasible motions of output trajectories of the system (1) in a gridded output space, as in Figure 3. Specifically, we partition the output space into dimensional boxes with lengths , where is the length of th edge. We use a finite number of boxes to underapproximate the feasible set . Enumerating the boxes as , the th box can be expressed in the form , where , are constants. We require that . Among these boxes, we assume there is a nonempty set of indices , so that we may similarly underapproximate the goal region as . We define a canonical dimensional box with edge lengths given by . Each box , is a translation of by an amount along the th axis.
Definition IV.1.
Given the lengths and a nonempty goal index set , an output transition system (OTS) is a tuple with the following components:
 State Space

is a finite set of nodes called locations. Each location is associated with a safe box in the output space and hence we write .
 Labels

is a finite set of labels. A label is used to identify the offset between neighbouring boxes.
 Edges

is a set of directed edges where if , , and . Thus, for each , the neighbouring box is either one box to the left (), the same box (), or one box to the right (). In this manner records the offset between contiguous boxes.
 Final Condition

denotes the set of locations associated with goal boxes.
Remark IV.1.
We observe that the OTS is deterministic. That is, for a given and , there is at most one such that . This follows immediately from the fact that records the offset between the neighbouring boxes.
Figure 3 shows a sample OTS for a simple scenario. The OTS locations are associated with 15 feasible boxes, including a goal box for the reachavoid task. The OTS edges are shown as bidirectional arrows; for example, interpreting and on the grid, then .
IvB Maneuver Automaton
The maneuver automaton (MA) is a hybrid system consisting of a finite automaton and continuous time dynamics in each discrete state. The discrete states of the finite automaton correspond to motion primitives, while transitions between discrete states correspond to the allowable transitions between motion primitives. The continuous time dynamics are given by closedloop vector fields (1) with a control law designed based on reach control theory (any other feedback control design method can be used).
Before presenting the MA, we first explain how this module is used in the overall framework. To solve Problem III.1, we assign motion primitives to the boxes of the partitioned output space such that obstacle regions are avoided and the goal region is eventually reached. The discrete part of the MA encodes the constraints on successive motion primitives. Such constraints might arise from a nonchattering requirement, continuity requirement, or requirement on correct switching between regions of the state space. A dynamic programming algorithm for assignment of motion primitives on boxes is addressed in Section IVD; the salient point about this algorithm at this stage is that it only uses the discrete part of the MA.
In contrast, the continuous time part of the MA is used both for simulation of the closedloop dynamics to verify that the motion primitives are well designed, as well as for the implementation of the low level feedback in realtime. The motion primitives are defined only on the canonical box to simplify the design. This simplification is possible because of the translational symmetry provided by Assumption III.1 and the fact that each box is a translation of . In simulation, a given motion primitive can cause output trajectories to reach certain faces of . If a face is reached, the output trajectory is interpreted as being reset to the opposite face and another motion primitive is selected to be implemented over (of course, the real experimental output trajectories do not undergo resets but move continuously from box to box in the output space). The selection of the next motion primitive is constrained by a combination of the previous motion primitive and the face of that is reached. The discrete transitions in the MA encode these constraints.
Definition IV.2.
Consider the system (1) satisfying Assumption III.1 and the box with lengths . The maneuver automaton (MA) is a tuple , where
 State Space

is the hybrid state space, where is a finite set of nodes, each corresponding to a motion primitive.
 Labels

, the same labels used in the OTS.
 Edges

is a finite set of edges.
 Vector Fields

is a function assigning a globally Lipschitz closedloop vector field to each motion primitive . That is, for each , we have where is a feedback controller associated with .
 Invariants

assigns a bounded invariant set to each . We impose that . The set defines the region on which the vector field is defined. Note that there is no requirement that the invariant is a closed set.
 Enabling Conditions

assigns to each edge , a nonempty enabling or guard condition . We require that . We make an additional requirement that lies on a certain face of determined by the label . Defining the face associated with as
we require that also .
 Reset Conditions

assigns to each edge a reset map . We require that , where is the Hadamard product. This definition says that the th output component is reset to the right face of , , if , reset to the left face if , and unchanged otherwise. Overall, resets of states are determined by the event and only affect the output coordinates in order to maintain output trajectories inside the canonical box .
 Initial Conditions

is the set of initial conditions given by .
Example IV.1.
Suppose the system is a double integrator and the first state is the translationally invariant output . The box is simply a segment. Let , where Hold () stabilizes , Forward () increases , and Backward () decreases . Referring to Figure 4, if is the current motion primitive and reaches the right face of , then the event occurs and we may select or as the next motion primitive. To correctly implement the discrete evolution of the MA in the continuous state space, an invariant and feedback control must be associated with each motion primitive, while an enabling and reset condition must be associated with each edge; see Figure 5. Formal details are given in Section VII.
We now formulate assumptions on the motion primitives so that correct continuous time behavior is ensured at the low level for consistency with the high level. For each , define the set of possible events as
(2) 
Assumption IV.1.

For all , .

For all such that and , .

For all such that and , if , then .

For all such that and , .

For all , .

For all , if then for all and , .

For all , if , then for all there exist (a unique) and (a unique) such that for all and for all , and .
Condition (i) disallows tautological chattering behavior that arises by erroneously interpreting continuous evolution of trajectories in the interior of as “discrete transitions” of the MA (see Section V for definitions). Condition (ii) imposes that guard sets are independent of the next motion primitive. Since guard sets arise as the set of exit points of closedloop trajectories from under a given motion primitive, it is reasonable that these exit points should depend only on the current motion primitive , and not on the choice of next motion primitive. Condition (iii) imposes that all guard sets corresponding to different labels are nonoverlapping. This ensures that when the continuous trajectory reaches a guard , then it is unambiguous which edge of the MA is taken next; namely . Conditions (v), (vi), and (vii) are placed to guarantee that the MA is nonblocking. These conditions are based on known results in the literature [21]; see Lemma V.1. In order for condition (vii) to make sense, there must exist a unique label and a unique time for an MA trajectory to reach a guard set. First, we have uniqueness of solutions since the vector fields are globally Lipschitz. Second, the unique MA trajectory can only reach one guard set by condition (iii); this in turn means there is a unique . Obviously there exists a unique time to reach the guard set. Conditions (vi) and (vii) work together to state that either all trajectories do not leave, or all trajectories do eventually leave. Referring to Figure 5, all closedloop state trajectories within the invariant of reach the guard set shown in green on the right. For either choice of next feasible motion primitive, or , trajectories enter the next invariant on the left due to the reset. Finally, condition (iv) eliminates potential chattering Zeno behavior, see Remark V.1.
Remark IV.2.
We make several further observations about the MA.
(i) The MA is nondeterministic in the sense that given and , there may be multiple such that . The discrete part of the MA is nondeterministic in a second sense: for each , the cardinality of the set may be greater than one. The latter situation corresponds to the fact that for different initial conditions of the continuous part, the associated output trajectories can reach different guard sets. In essence, which guard is enabled is interpreted, at the high level, as an uncontrollable event [34]. Remark IV.3 further illustrates these two types of nondeterminism in the case of the PA.
(ii) The set of events in the MA correspond to the same events in the OTS. This correspondence is used in the product automaton PA, described in the next section, to synchronize transitions in the MA with transitions in the OTS. The interpretation is that when a continuous trajectory of the MA (over the box ) undergoes a reset with the label , the associated continuous trajectory of (1) in the box enters a neighboring box with the offset . Obviously, this interpretation assumes that the vector of box lengths is the same in both OTS and MA.
IvC Product Automaton
In this section we introduce the product automaton (PA). It is constructed as the synchronous product of the OTS and the discrete part of the MA, namely . The purpose of the PA is to merge the constraints on successive motion primitives with the constraints on transitions in the OTS in order to enforce feasible and safe motions. As such, it captures the overall feasible motions of the system – any high level plan must adhere to these feasible motions.
Definition IV.3.
We are given an OTS and an MA satisfying Assumption IV.1. We define the product automaton (PA) to be the tuple , where
 State Space

is a finite set of PA states. A PA state satisfies the following: if there exists and such that , then there exists such that . That is, if all faces that can be reached by motion primitive lead to a neighboring box of the box associated with location of the OTS.
 Labels

is the same set of labels used by the OTS and the MA.
 Edges

is a set of directed edges defined according to the following rule. Let , , and . If and , then .
 Final Condition

is the set of final PA states.
Remark IV.3.
Formally an automaton is said to be nondeterministic if there exists a state with more than one outgoing edge with the same label. The PA is nondeterministic. First, consider a PA state . Because the MA allows for more than one feasible next motion primitive such that , the PA will also have multiple next PA states such that . Second, there can be multiple possible labels such that for some . Thus, the PA inherits the two types of nondeterminism of the MA that we discussed in Remark IV.2. For example, consider the PA fragment in Figure 6. For the first type of nondeterminism, observe that there are two PA edges and with the same label. For the second type, observe that there are two possible events from , each with its own set of PA edges. Note also some additional structure: since the OTS is deterministic, the box state is in both and , corresponding to the OTS edge .
IvD HighLevel Plan
In this section we formulate the notion of a control policy on the PA, which gives a rule for selecting subsequent PA states by choosing the next motion primitive. Informally, the objective of the high level plan is to produce a control policy and find a set of initial PA states such that a goal PA state is eventually reached. To this end, in this section we also develop a Dynamic Programming Principle (DPP) suitable for use on the PA. Because of the two types of nondeterminism of the PA, existing algorithms cannot be applied directly [5, 33]. By adapting the algorithm in [5], we obtain two formulations of the DPP, one of which is more computationally efficient as it exploits certain structure in the PA; further details are provided in Remark IV.5.
First some notation will be useful. Recall from (2), given , is the set of all labels on outgoing edges starting at . Similarly, is the set of all labels on outgoing edges starting at . That is,
Now we formalize the semantics of the PA. A state of the PA is a pair where is a location in the OTS and is a motion primitive. A run of is a finite or infinite sequence of states , with and for each , there exists such that . We define the length of a run to be ; for infinite runs is defined to be . We consider a subset of runs starting at that satisfy one further property. If the run is infinite, then if . Instead if the run is finite, then if and additionally, . It is the latter requirement – that the last PA state of a finite run may not have outgoing edges in the PA – which is of interest. The interpretation is that we regard the event labels between PA states as uncontrollable, so if any event is possible, then it must occur eventually. Thus without loss of generality, each run is the prefix of a run . Further elaboration is given in Remark IV.4 (ii).
Given and , the set of admissible motion primitives is
More generally, given and , the set of admissible motion primitives at is
Next we introduce the notion of a control policy. Given and , an admissible control assignment at is a vector
where , or equivalently . Notice that is a vector whose dimension varies as a function of the cardinality of the set . An admissible control policy is a map that assigns an admissible control assignment at each . Thus, for each and , . The set of all admissible control policies is denoted by .
Consider an admissible control policy and a state . We denote the set of runs in induced by as . Formally, if , and for all and , . Similarly, we denote the subset of runs in that eventually reach a state in as . Formally, if there exists an integer such that . For , we define
Next we define an instantaneous cost , which satisfies for all , and a terminal cost . Now consider the run with , , and . We define a costtogo by
Remark IV.4.
There are several notable features of our formula for the costtogo.
(i) For a given , there may be multiple runs due to the (second, nonstandard type of) nondeterminism of the PA. As such, we assume the worst case and take the maximum over in the costtogo. Moreover, we require for a finite costtogo so that is welldefined and all runs starting at eventually reach .
(ii) We have assumed that finite runs must terminate on PA states that have no outgoing edges. Suppose we included in finite prefixes of (finite or infinite) runs. These necessarily would be finite runs with final PA states that have outgoing edges. Then if we take a finite or infinite run that eventually reaches a goal PA state, certain finite prefixes of that run may not yet have reached a goal PA state, and we would get and an infinite costtogo. This anomaly arises from creating an artificial situation in which not all runs starting at an initial PA state reach a goal PA state because we included (unsuccessful) finite prefixes of successful runs.
(iii) The costtogo function also accounts for infinite runs by using the variable to record the first time a goal PA state is reached and by taking the cost only over the associated prefix of the infinite run. Although our primary focus is on reachavoid specifications, in which finite runs terminate on goal PA states with no outgoing edges, infinite runs allow us to extend our framework to a fragment of LTL where, for example, a goal PA state is reached always eventually; see Remark V.3 for further details.
Example IV.2.
Consider the PA shown at the top of Figure 7 corresponding to a single output system with the three motion primitives from Example IV.1 over three boxes . Suppose that for all and that for all .
First consider the feasible control policy with the control assignments: , , , and . The bottom left of Figure 7 shows how the control policy trims away possible edges in the PA. Now suppose that . Choosing the initial condition and under the assumption that we do not include finite runs that terminate at PA states with outgoing edges, we can see that consists of only the single infinite run . Even though this run is infinite, , , and . Similarly, we compute , , , and . In contrast, the feasible control policy shown on the bottom right of Figure 7 only contains finite runs.
Next we define the value function to be
The value function satisfies a dynamic programming principle (DPP) that takes into account the nondeterminacy of ; see [5] where a slightly different result is proved. The proof is found in the appendix.
Theorem IV.1.
Consider and suppose . Then satisfies
(3)  
(4) 
where , , , and .
Notice that for all such that , (since there can be no paths to the goal). Also, for all , .
Remark IV.5.
In (3) of Theorem IV.1, it is shown that can be computed using the local information of instead of using all of . In (4), the result is taken one step further by showing that can be calculated using only for each . The benefit of (4) becomes clear when we compare the cardinality of the sets over which the minimizations occur. Given , let . In (3) the minimization is over , and therefore the cardinality of the minimization set is . In (4) the minimization is over for each , and therefore the cardinality of the set is . While both (3) and (4) can be used to compute , in general (4) will be more computationally efficient.
Corollary IV.1.
Consider the control policy such that for all , and
where , and . Then is an optimal control policy such that for all , .
Figure 8 shows a possible control policy for the scenario in Figure 3. Since there are two outputs, we use the motion primitives from Example IV.1 in each output; formal details are given in Section VI. The control policy was handcomputed. Notice that different routes may be taken from the same product state depending on the face reached, but ultimately the control policy ensures that all paths lead to the goal.
V Main Results
In this section we present our main results on a solution to Problem III.1. Our final result combines the notion of a control policy at the high level with feedback controllers executing correct continuous time behavior at the low level. First, in accordance with the reachavoid objective (see Remark IV.4 (iii)), we assume the existence of motion primitives that can stabilize trajectories within a given box, that is, there exists such that . We restrict the final PA states to be goal OTS states equipped with such motion primitives
(5) 
Now suppose we have an admissible control policy derived using Theorem IV.1 or otherwise with as above. We present a complete solution to Problem III.1 including an initial condition set , a feedback control , and conditions on the motion primitives so that the reachavoid specifications of Problem III.1 are met.
First we specify the initial condition set . The set of feasible initial PA states is
(6) 
That is, a feasible initial PA state satisfies that every run (induced by the control policy) starting at the PA state eventually reaches a goal PA state.
Now consider a state . It can be used as an initial state of the system if there is some for which the state is both in the box and in the invariant of . Recall that for all and ,
Comments
There are no comments yet.