One of the major challenges in autonomous robotic navigation is coping with uncertainties arising from limited a-priori knowledge of the environment. Acquiring necessary information and achieving the overall goal are complementary subtasks that require adapting the motion of a robot during mission execution, typically accompanied by minimizing a performance criterion. In this work we address an Optimal Control Problem (OCP) for a robot with fourth-order dynamics that has to find, collect and move a finite number of objects to a designated spot in minimum time. The objects with a-priori known masses are located in a bounded two-dimensional space, where the robot is capable of localizing itself using a state-of-the-art simultaneous localization and mapping (SLAM) system.
The challenging aspects of the problem at hand are (at least) threefold. One of them arises due to the discontinuity of the value function denoting the overall completion time, which makes it hard to obtain an explicit controller even for deterministic linear systems [2, 3]. Fortunately, a wide range of approximate solutions has been proposed, including approaches based on numerical continuation, value set approximation and multi-parametric programming. Another challenge follows from the requirement to collect a finite number of objects and drop them at a particular spot, both leading to autonomous switchings of the robot’s continuous dynamics. While deterministic versions of this problem can be handled efficiently, e.g., by two-stage optimization [7, 8] or relaxation, the complexity of most approaches for stochastic setups scales poorly with the problem size. Since the robot has to reach the corresponding locations of the objects or the depot with minimal overall cost, the overall problem also contains an instance of the well-known NP-hard Traveling Salesperson Problem (TSP).
Further, optimal exploration of a limited space is an inherently difficult problem by itself. Minimizing the expected time for detecting a target located on the real line with a known probability distribution, by a searcher that can change its direction of motion instantaneously, has a bounded maximal velocity and starts at the origin, is a classical problem. Different versions of this problem have received considerable attention from several research communities, e.g., as a “pursuit-evasion game” in game theory [13, 14], as a “cow-path problem” in computer science, or as a “coverage problem” in control [16, 17], but its solution for a general probability distribution or a general geometry of the region is, to a large extent, still an open question. Effective approaches for the related persistent monitoring problem based on estimation [19] or parametric optimization have also been proposed. OCPs with uncertainties have also been addressed by certainty equivalent event-triggered, minimax and sampling-based optimization schemes. While methods for Partially Observable Markov Decision Processes (POMDPs) can also be applied, e.g., [24, 25], they typically become computationally infeasible for larger problem instances.
Due to the aforementioned aspects, the problem at hand has exponential complexity in the number of objects, for any chosen time and space discretization. In this context, employing a discrete abstraction of the underlying continuous dynamics is often only possible by introducing a hierarchical decomposition, or additional assumptions that simplify the implementation of automatically synthesized hybrid controllers. Alternatively, one may resort to receding horizon approaches that have been shown to outperform other optimization methods under the presence of uncertainty, e.g., for the elevator dispatching problem, multi-agent reward collection problems or planning with temporal logic constraints.
For a scenario where the number of objects is finite but unknown, a combined optimal exploration and control scheme for a robot that has to find, collect and move objects in a two-dimensional position space was proposed in previous work. The approach was based on a policy enforcing a pick-up upon an object’s detection, followed by a certainty equivalent discrete optimization on a finite abstraction of the robot’s motion in the environment. This heuristic restriction was subsequently omitted, and optimal exploration and control solutions for the worst case and a probabilistic case, assuming a uniform distribution of the objects on a line interval, were derived. Since a direct generalization of this result to higher dimensions was not possible, this paper proposes and compares two approximate receding horizon approaches. The first is based on discretizing time and space and solving a non-convex OCP over a finite horizon by a Mixed Integer Programming (MIP) implementation. In the second approach, the motion of the robot is parameterized by a finite number of parameters. This enables the use of Infinitesimal Perturbation Analysis (IPA) to solve the worst-case and probabilistic OCPs by a bi-level iterative optimization scheme, which is re-solved only when new information becomes available. Preliminary versions of these approaches along a fixed exploratory trajectory have been presented previously. Here we extend the methods such that the shape of the exploratory trajectory can be adjusted online, which is particularly useful in the presence of a-priori unknown obstacles.
The remainder of the paper is organized as follows: in Sec. II, we present the problem formulation. Sec. III starts with a brief discussion on the performance index and introduces a lower bound for the cost-to-go, followed by the proposed time-driven (Sec. IV) and event-driven approaches (Sec. V). The four methods are then compared in a numerical example (Sec. VI), followed by the conclusions in Sec. VII.
For a set , and denote its cardinality and the set of all of its subsets (power set), respectively. For , respectively, , and denote the absolute value and the Euclidean norm. is an identity matrix with dimension . represents an matrix with zero entries. For a vector of zeros or ones with length , we write or , respectively. denote the sets of reals, non-negative reals and positive reals, respectively. We use the derivatives , and the gradient .
II Problem formulation
Consider a finite set of objects , where every , is uniquely characterized by its position , , and mass . A robot has to find, collect and move all objects back to a designated spot (depot), located at , in minimum time. The robot is equipped with an omni-directional sensor footprint of size around its current position , hence covering the area
The overall system is modeled by a hybrid automaton , i.e., a 9-tuple . The discrete state at time is , where is the set of objects being carried by the robot, the set of objects that have been dropped at the depot prior to or at time , and is the set of objects that have been detected so far. Clearly, with . The current mass of the robot is , where is the nominal mass of the robot. The overall continuous state consists of the robot state , where is the current velocity of the robot, and the region that has not been explored at time . The robot state evolves according to a finite collection of vector fields , i.e.,
driven by the piecewise continuous control signal , where is the free final time for the overall assignment. As is finite, the set of discrete state transitions (or events) is also finite. Let be partitioned into , where for ,
is the set of detection events,
is the set of pick-up events, and
corresponds to the set of drop-off events. With the introduced sensor paradigm (1), detection events occur when the distance between the current robot position and the position of an object that has not been detected so far becomes . Pick-up events occur when the robot reaches the position of an object that has not been collected so far. Drop-off events occur when the robot reaches the depot and carries objects. In addition, for both pick-up and drop-off events, zero velocity is required. The corresponding conditions on and for the occurrence of detection, pick-up and drop-off events are captured by the invariant , i.e.,
and the guard map , i.e. with ,
For example, upon a detection of a new object as per (1), when the robot is in the discrete state , the first case of the Inv requires that a transition must occur, and the first case of allows a transition to a discrete state , where the discovered object is included in the detected objects set . The reset map is trivial since no jumps of the continuous variables occur upon a discrete state switching. Note that the above conditions do not depend on , and hence, Inv, and map into instead of . As both the robot and the objects are represented by points in , we assume that no collisions can occur. A practical setup that satisfies this assumption is, e.g., a quadrotor that has to explore a two-dimensional space on the ground from above. Finally, as the robot is assumed to start at the depot with zero velocity, and no objects have been detected, picked up or dropped off before that, the initial state set is .
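The event logic described above can be illustrated by a minimal sketch. It assumes point objects, a circular sensor footprint of radius `radius`, and a numerical tolerance `tol` for "reaching" a position with zero velocity. The names `Mode` and `update_mode` are hypothetical and not part of the original formulation, and the sketch applies all enabled events at once instead of modeling individual guarded transitions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    """Discrete state: ids of carried, dropped and detected objects."""
    carried: frozenset = frozenset()
    dropped: frozenset = frozenset()
    detected: frozenset = frozenset()


def update_mode(mode, pos, vel, objects, depot, radius, tol=1e-9):
    """Apply all detection, pick-up and drop-off events enabled at (pos, vel)."""
    at_rest = math.hypot(vel[0], vel[1]) <= tol
    # Detection: an undetected object lies inside the sensor footprint.
    detected = mode.detected | {
        i for i, p in objects.items()
        if i not in mode.detected and math.dist(pos, p) <= radius
    }
    carried, dropped = mode.carried, mode.dropped
    if at_rest:
        # Pick-up: the robot rests on a detected object not collected so far.
        carried = carried | {
            i for i in detected - carried - dropped
            if math.dist(pos, objects[i]) <= tol
        }
        # Drop-off: the robot rests at the depot while carrying objects.
        if carried and math.dist(pos, depot) <= tol:
            dropped, carried = dropped | carried, frozenset()
    return Mode(frozenset(carried), frozenset(dropped), frozenset(detected))
```

Note that the continuous state is left untouched by `update_mode`, mirroring the trivial reset map.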
Obstacles in can be easily included in the proposed approaches. However, to keep notation as simple as possible, we omit their presence in the main analysis and briefly outline the solution that was used to handle the obstacle in the numerical example (Sec. VI) in a follow-up remark.
Solving the addressed problem involves detection events, pick-up events and up to drop-off events, as it can be advantageous to collect several objects on the way and drop them off simultaneously at the depot. Hence, for the total number of events, holds. The time of the occurrence of event , is denoted by , is the initial time, the final time, and . The time intervals , form the time axis from the initial to the final time with . The input is an ordered set of functions , where are absolutely continuous functions for . Thus, if is an execution of the hybrid automaton for an input signal , i.e. , is a discrete state trajectory with . is the continuous state trajectory with , where are absolutely continuous functions, and non-increasing functions, i.e., for . The cost of an execution is the total task time
Let denote the set of states that can be reached upon completing the task. One way to account for the uncertainty in the addressed OCP is to minimize, at time
, the largest cost that may occur for a possible configuration of all objects that have not been discovered so far. Alternatively, the positions of the objects that have not been detected so far can be assumed to be independent identically distributed random variables with probability density functions
, where measures the size of . This leads to the following worst-case (A) and probabilistic (B) OCPs.
At state , find the input signal for , such that for
Note that Problem A is always deterministic, while Problem B is probabilistic until the last detection of an object.
The outline of the solution reads as follows. First, we provide a discussion on the time-optimal value function and derive a lower bound for the cost-to-go. Then, we propose two approximation-based approaches for Problems A and B – one that requires time discretization and re-computation at every time step, and one based on motion parameterization that allows for an event-driven implementation, i.e., the corresponding OCPs are re-solved only upon the occurrence of a detection event.
III Preliminary analysis
Let be a time instant at which the robot has reached a pick-up or drop-off location with zero velocity. The overall cost-to-go at state is
i.e., the sum of the time until the next pick-up or drop-off at time , and the remaining time until the final state is reached.
Assuming that all objects have been detected prior to , the second term on the right-hand side of (5) is the cost of the optimal sequence of pick-ups and drop-offs, necessary for completing the overall task. Let the set of all corresponding discrete state strings from the state to the final discrete state be denoted by
Minimizing the cost of a particular sequence can be decoupled in terms of the input at every pick-up and drop-off time instant , i.e.,
with and . Assuming the absence of obstacles in , the time-optimal motion of the robot with dynamics (2) from the hybrid state to with is on straight lines. Thus, using an affine transformation, (2) can be reduced to a double integrator in one dimensional space. The OCP for the reduced model corresponds to the classical linear time-OCP  solved by a piecewise constant control that takes values in the set and yields the optimal cost . The controller can be transformed back to (2) by using the inverse affine transformation (details can be found in ). Since a transition from a hybrid state, where the robot with dynamics (2) has zero velocity, to another hybrid state, where the robot has zero velocity, can be tightly lower bounded by the cost for the time-optimal point-to-point motion of a double integrator with zero initial and final velocity, for the optimal cost of a string we obtain
To illustrate this expression, consider a scenario with two remaining objects, both to be picked up and dropped. The corresponding discrete dynamics of are captured by the quadruple (Fig. 1), where and are the corresponding sets of , and and are the initial and final discrete state (specified by Init and Fin), respectively. If both objects have been detected and the robot is at rest, the right hand side of (9) denotes the actual cost-to-go for completing the task. In addition, the right hand side of (9) can be used as a lower bound for the cost-to-go at time , where the robot is at rest but not both objects have been discovered, i.e.,
which represents the cost-to-go without taking into account exploration.
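As an illustration of how such a bound can be evaluated, the following sketch enumerates all pick-up and drop-off strings for a small number of detected objects and sums rest-to-rest travel times. It rests on assumptions not confirmed by the text above: the input is taken to be a force bounded by `force_max`, so that the acceleration bound is `force_max / mass`, and the rest-to-rest time of a double integrator over a distance d is then 2·sqrt(d·mass/force_max). The function names are hypothetical.

```python
import math
from functools import lru_cache


def rest_to_rest_time(p, q, mass, force_max):
    """Minimum time of a double integrator between two rest points (assumed model)."""
    return 2.0 * math.sqrt(math.dist(p, q) * mass / force_max)


def cost_to_go_lower_bound(start, depot, objects, robot_mass, force_max):
    """Minimal total time over all pick-up/drop-off strings.

    objects maps an id to a pair (position, mass); all objects are assumed
    to be detected and the robot is assumed to be at rest at `start`.
    """
    depot = tuple(depot)

    @lru_cache(maxsize=None)
    def best(pos, carried, remaining):
        if not carried and not remaining:
            return 0.0
        mass = robot_mass + sum(objects[i][1] for i in carried)
        options = []
        if carried:  # drop everything currently carried at the depot
            options.append(rest_to_rest_time(pos, depot, mass, force_max)
                           + best(depot, (), remaining))
        for i in remaining:  # pick up one more object
            nxt = tuple(objects[i][0])
            rest = tuple(j for j in remaining if j != i)
            options.append(rest_to_rest_time(pos, nxt, mass, force_max)
                           + best(nxt, tuple(sorted(carried + (i,))), rest))
        return min(options)

    return best(tuple(start), (), tuple(sorted(objects)))
```

The enumeration grows exponentially with the number of objects, so this brute-force form is only meant for small instances such as the two-object example.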
To obtain a finite conservative approximation for , introduce a finite cover of by cells defined by a set of grid points , equally spaced by , such that (see Fig. 2 for an example). Let denote the set of grid points, whose associated cells have not been completely covered by the robot’s sensing range (1) until time , i.e. . Thus, is over-approximated by . With that, we can turn to approximate solutions of Problems A and B.
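A minimal sketch of such a grid cover is given below, assuming a rectangular region, square cells centered at the grid points, and a circular sensor footprint. A cell is marked as explored only if a single footprint contains it entirely, which is a simplification: it ignores cells covered by the union of several footprints and therefore keeps the unexplored set over-approximated. The function names are hypothetical.

```python
import numpy as np


def cell_centers(x_min, x_max, y_min, y_max, spacing):
    """Grid points (cell centers) of a square-cell cover of a rectangle."""
    nx = int(round((x_max - x_min) / spacing))
    ny = int(round((y_max - y_min) / spacing))
    xs = x_min + spacing * (np.arange(nx) + 0.5)
    ys = y_min + spacing * (np.arange(ny) + 0.5)
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    return np.column_stack([gx.ravel(), gy.ravel()])


def mark_explored(centers, unexplored, robot_pos, radius, spacing):
    """Clear the unexplored flag of cells lying entirely inside the footprint."""
    # Distance from the robot to the farthest corner of each square cell.
    offset = np.abs(centers - np.asarray(robot_pos, dtype=float)) + spacing / 2.0
    farthest = np.hypot(offset[:, 0], offset[:, 1])
    return unexplored & (farthest > radius)
```

Calling `mark_explored` at every sampled robot position yields a boolean mask over the grid points whose cells still count as unexplored.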
IV Time-driven optimization
In this section, we present an approximation of Problem 1 based on equidistant time discretization.
IV-A Worst-case solution
where is defined as
which follows from relaxing the assumption at time in the sense that the robot has zero velocity at pick-up or drop-off locations, but not all objects have necessarily been detected before . Since is finite, it is possible to reformulate (11) by introducing a dummy variable and additional nonlinear constraints for each string in leading to
As the robot has zero velocity at , the initial (approximately) optimal control can be obtained by solving (11) followed by re-translation to (2), as described in the previous section. Once the robot starts moving, optimizing the first term of the third line of (10) at the optimum is difficult in continuous time. Therefore, consider a finite equidistant sampling of a time horizon beginning at with sampling time , which we assume to include the yet unknown time , i.e. , , . Then, at every time instant, given the solution of (11), we solve
where is the constraint set resulting from the corresponding discrete-time version of the hybrid automaton . The OCP can be approximately implemented as a MILP. For further implementation details, we refer the reader to Appendix A and  for a closely related OCP.
IV-B Probabilistic solution
where denotes the area of . The approximation of the first term follows from the fact that is certainly greater than or equal to the shortest time needed for the robot to move from its current position to a currently unexplored point in , while the approximation of the second term is obtained by applying Jensen’s inequality. To compute the control that minimizes the first term in (13), we formulate a MILP analogously to (12). The second term in (13) is obtained through numerical integration of for over , followed by choosing the sequence that yields the minimal cost. This allows for a receding horizon scheme that minimizes the cost-to-go at each time instant until all objects are dropped off.
V Event-driven optimization
The approaches presented in the previous section require solving computationally expensive MIPs at each time instant online. Since the locations of the objects are the only source of uncertainty in the considered problem, the ultimate goal is a tractable and scalable, albeit suboptimal alternative that avoids time discretization and requires re-computation only upon a detection. The approach proposed in the following is based on restricting the motion of the robot to a pre-specified family of curves, whose shape is determined by a finite parameter vector, such that the cost-to-go can be evaluated efficiently. This allows for an event-driven scheme based on an iterative gradient-based optimization over the parameters of the curve only upon detection.
Let the robot’s position be described by the parametric equation
where denotes the position of the robot along the curve , is a parameter vector that controls the shape of , and is twice continuously differentiable with respect to and . Let be the monotonically non-decreasing curve length function of over . With denoting the arc-length of , let denote the normalized arc-length variable, such that at the initial position, and at the final position. The parametric functions employed in this work are Fourier series (see Appendix B) that exhibit rich expressiveness in terms of motion behaviors and allow for an efficient solution of the optimization problem. Other types of parametric functions or more complex robot dynamics may also be used, as outlined in Remark 2.
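To make the parameterization concrete, the sketch below evaluates a planar curve given by a truncated Fourier series and computes its normalized arc-length numerically. The specific form of the series (one coefficient row per coordinate, ordered as constant, cosine and sine terms over the curve parameter in [0, 1]) is an assumption made for illustration, since Appendix B is not reproduced here. The function names are hypothetical.

```python
import numpy as np


def fourier_curve(s, theta):
    """Evaluate a planar Fourier-series curve at parameter values s in [0, 1].

    theta has shape (2, 2K+1): one row per coordinate, with columns
    [a_0, a_1..a_K (cosine terms), b_1..b_K (sine terms)] (assumed layout).
    """
    s = np.atleast_1d(np.asarray(s, dtype=float))
    theta = np.asarray(theta, dtype=float)
    K = (theta.shape[1] - 1) // 2
    k = np.arange(1, K + 1)
    phase = 2.0 * np.pi * np.outer(s, k)
    basis = np.concatenate(
        [np.ones((s.size, 1)), np.cos(phase), np.sin(phase)], axis=1)
    return basis @ theta.T  # shape (len(s), 2)


def normalized_arclength(theta, n=2001):
    """Return (s, normalized arc-length at s, total length) on a uniform grid."""
    s = np.linspace(0.0, 1.0, n)
    pts = fourier_curve(s, theta)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg)])
    return s, cum / cum[-1], cum[-1]
```

For example, `theta = [[0, 1, 0], [0, 0, 1]]` describes the unit circle, a closed curve that starts and ends at the same point, as required for a tour beginning and ending at the depot.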
Upon detection, optimization will be performed by a bi-level optimization algorithm, based on iteratively solving the following two OCPs:
1) Find the parameter that determines the optimal shape of solving Problem 1;
2) Control the motion of the robot along by the optimal that respects the restrictions imposed by .
The outline of the algorithm reads as follows: starting with an initial parameter guess for , we solve the low-level OCP 2). Then, is updated by solving 1) using the solution of 2). The high-level OCP is solved by an augmented Lagrangian method that allows for replacing the constrained optimization problem by a series of unconstrained optimization problems. Employing Infinitesimal Perturbation Analysis, we obtain the derivative of the augmented cost and solve the unconstrained OCPs by gradient-based methods. Steps 1) and 2) are solved iteratively until reaching a (local) minimum of the OCP, which is attained upon satisfying an iteration threshold condition. We start with solving the second step.
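The structure of the high-level iteration can be sketched on a toy problem. The code below is a generic augmented Lagrangian method with a single equality constraint and plain gradient descent as the inner solver. It is not the scheme of this paper: the gradient is supplied in closed form instead of being obtained by Infinitesimal Perturbation Analysis, and the cost, the constraint and all parameter values are illustrative assumptions.

```python
import numpy as np


def augmented_lagrangian(grad_f, h, jac_h, theta0, mu=10.0, step=0.04,
                         outer_iters=20, inner_iters=200):
    """Minimize f(theta) subject to h(theta) = 0 (scalar constraint).

    Each outer iteration minimizes f + lam*h + (mu/2)*h^2 by gradient
    descent and then updates the multiplier estimate lam.
    """
    theta = np.asarray(theta0, dtype=float)
    lam = 0.0
    for _ in range(outer_iters):
        for _ in range(inner_iters):
            # Gradient of the augmented cost with respect to theta.
            g = grad_f(theta) + (lam + mu * h(theta)) * jac_h(theta)
            theta = theta - step * g
        lam += mu * h(theta)  # first-order multiplier update
    return theta, lam


# Toy instance: minimize ||theta||^2 subject to theta_1 + theta_2 = 1.
theta_opt, lam_opt = augmented_lagrangian(
    grad_f=lambda t: 2.0 * t,
    h=lambda t: t.sum() - 1.0,
    jac_h=lambda t: np.ones_like(t),
    theta0=[0.0, 0.0],
)
```

In the bi-level setting, evaluating `grad_f` would involve solving the low-level OCP for the current parameter, which is why the number of gradient evaluations dominates the computational cost.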
V-A Optimal motion along the curve
and the robot’s dynamics (2) are restated as
With the employed arc-length parameterization, the robot traverses the curve at constant speed, i.e., , where is the arc-length of . Substituting in polar coordinates in (15) and using , (2) is equivalently restated by (14) and the state with dynamics
To simplify the analysis in the following, the necessary optimality conditions for will be derived for
which represents a reasonable approximation of (16) along general Fourier series curves. Note that, for lines, implies and , and (17) describes the dynamics of the robot exactly. Since the sensor footprint (1) is typically much smaller than , for evaluating the cost-to-go we assume that prior to their discovery all objects are located on (14), i.e., , and neglect the sensing range of the robot. A preliminary version of this analysis was presented previously. In what follows, we further assume that the high-level OCP (presented in the following section) provides an optimal parameter , such that the robot moving along (14) with plans to cover the remaining space as long as there are objects to be detected, and passes through object locations that have been discovered previously but have not been picked up yet. We start the analysis assuming that there is only one object, i.e. , with mass located at .
V-A1 Optimal control for one object
The robot with dynamics (17) starts at , and . Clearly, the optimal control solving Problem 1 is divided into three parts, i.e. , denoting the control until detection, the control until pick-up and the control until drop-off. After the object is detected at time , when the robot moves with velocity , it can be reached at time with by employing a time-optimal bang-bang controller with a switching at time , i.e.,
Solving (17) with and , and applying the boundary conditions for the object’s pick-up , yields the optimal cost
Since the robot stops at , steering it back to the depot by is again given by bang-bang control . Since its corresponding cost is independent of , , it can be neglected for finding .
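The bang-bang maneuver with a single switching can be illustrated for a plain double integrator along the curve. The sketch assumes a symmetric acceleration bound `a_max` (in the setting above this bound would depend on the current mass), measures the signed distance `x0` to the target along the curve, and uses the standard switching-curve solution of the linear time-optimal control problem. It is not a transcription of (18), and the function name is hypothetical.

```python
import math


def min_time_to_rest(x0, v0, a_max):
    """Time-optimal transfer of a double integrator from (x0, v0) to (0, 0).

    Returns (total_time, switching_time, first_control); the control is
    first_control until switching_time and -first_control afterwards.
    """
    # Sign of the switching function decides which bound is applied first.
    s = x0 + v0 * abs(v0) / (2.0 * a_max)
    sign = 1.0 if s >= 0.0 else -1.0
    x, v = sign * x0, sign * v0  # mirror so that braking comes first
    # Speed at the switching instant (intersection with the switching curve).
    v_s = math.sqrt(max(0.0, a_max * x + 0.5 * v * v))
    t_switch = (v + v_s) / a_max
    return t_switch + v_s / a_max, t_switch, -sign * a_max
```

If the sensing range is neglected, the object is detected when the robot passes over it, i.e. `x0 = 0` with `v0 > 0`. The robot then overshoots, reverses and returns, which takes `(1 + sqrt(2)) * v0 / a_max` under the stated assumptions. For `v0 = 0` the expression reduces to the rest-to-rest time `2 * sqrt(|x0| / a_max)`.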
In the worst case, the object is located the furthest away from the initial point, i.e., at . Thus, the time-optimal control satisfies the condition
In the probabilistic case, the object’s location is uniformly distributed over . To compute we need to consider the time from detection to pick-up (18), yielding . To obtain a standard representation for the cost, introduce an additional state for the unknown detection time , leading to an extended system state with dynamics (17) and . Substituting the relation , the expected time for picking up the object is
where is free and the boundary constraints
must be satisfied. Thus, the probabilistic OCP has been transformed into a free final time nonlinear OCP. The corresponding control Hamiltonian is
with absolutely continuous costate dynamics
Applying Pontryagin’s Minimum Principle, there exists an optimal state , a control , and a nontrivial costate trajectory, such that ,
leading to the following theorems.
The optimal control is for almost all .
The optimal control for is
The proofs can be found in Appendix C.
V-A2 Control for multiple objects
For multi-object setups, we propose an approach to obtain the control analogously to the single-object case. While the worst-case optimal control is built from a finite sequence of bang-bang control segments, a computationally tractable scheme for the probabilistic control is derived as follows. Recall the robot moving in the position space shown in Fig. 2 (b) and consider a scenario with two objects. As in the single-object case, the robot starts moving in the discrete state . The possible discrete event strings until the robot comes to a halt for the first time consist of detecting an object followed by its immediate pick-up, i.e. , or detecting both objects and stopping at one of the two objects’ positions, i.e. , (see Fig. 1). For both cases, we employ the control (23) for the time interval up to the detection of the first object, although for more than one object this policy may not be optimal. Now assume that object has just been detected. Fig. 3 and 3 show two possible curves that provide complete exploration of , allow for picking up and end at the depot. The trajectories resulting from the probabilistic control are shown in Fig. 3 and 3, respectively. The particular control can be derived using the corresponding boundary and continuity conditions, as shown in Appendix D. The analysis for these two cases can be easily generalized to obtain the probabilistic control for an arbitrary finite number of unexplored segments in the interval .
Moving on to the problem with more than two objects, we obtain the controls along the curve by following a similar line of argumentation – both the worst-case and the probabilistic controls consist of a finite sequence of appropriate bang-bang control segments. For that, consider the set of discrete state strings from a state , where all objects have been detected, to the final discrete state , as defined in (6). Then, let denote the projection of all onto . For a given and with , the curve (14) yields a discrete state string that is traversed in time
Since minimizing (24) over the free parameters of the proposed policy, i.e., the switching times, can be decoupled at pick-up and drop-off instants, we solve Two-Point Boundary Value Problems (TPBVPs)