I Introduction
One of the major challenges in autonomous robotic navigation is coping with uncertainties arising from limited a priori knowledge of the environment. Acquiring necessary information and achieving the overall goal are complementary subtasks that require adapting the motion of a robot during mission execution, typically accompanied by minimizing a performance criterion. In this work we address an Optimal Control Problem (OCP) for a robot with fourth-order dynamics that has to find, collect and move a finite number of objects to a designated spot in minimum time. The objects with a priori known masses are located in a bounded two-dimensional space, where the robot is capable of localizing itself using a state-of-the-art simultaneous localization and mapping (SLAM) system [1]. The challenging aspects of the problem at hand are (at least) threefold. One of them arises due to the discontinuity of the value function denoting the overall completion time, which makes it hard to obtain an explicit controller even for deterministic linear systems [2, 3]. Fortunately, a wide range of approximate solutions has been proposed, including approaches based on numerical continuation [4], value set approximation [5], multiparametric programming [6], etc. Another challenge follows from the requirement to collect a finite number of objects and drop them at a particular spot, both leading to autonomous switchings of the robot’s continuous dynamics. While deterministic versions of this problem can be handled efficiently, e.g., by two-stage optimization [7, 8] or relaxation [9], the complexity of most approaches for stochastic setups scales poorly with the problem size [10]. Since the robot has to reach the corresponding locations of the objects or the depot with minimal overall cost, the overall problem also contains an instance of the well-known NP-hard Traveling Salesperson Problem (TSP) [11]. Further, optimal exploration of a limited space is an inherently difficult problem by itself. Minimizing the expected time for detecting a target located on a real line with a known probability distribution by a searcher that can change its motion direction instantaneously, has a bounded maximal velocity and starts at the origin, was originally addressed in
[12]. Different versions of this problem have received considerable attention from several research communities, e.g., as a “pursuit-evasion game” in game theory
[13, 14], as a “cow-path problem” in computer science [15] or as a “coverage problem” in control [16, 17], but its solution for a general probability distribution or a general geometry of the region is, to a large extent, still an open question. Effective approaches for the related persistent monitoring problem based on estimation
[18, 19] or parametric optimization [20] have also been proposed. OCPs with uncertainties have also been addressed by certainty-equivalent event-triggered [21], minimax [22] and sampling-based [23] optimization schemes. While methods for Partially Observable Markov Decision Processes (POMDPs) can also be applied, e.g.,
[24, 25], they typically become computationally infeasible for larger problem instances. Due to the aforementioned aspects, the problem at hand has exponential complexity in the number of objects for any chosen time and space discretization. In this context, employing a discrete abstraction of the underlying continuous dynamics is often only possible by introducing a hierarchical decomposition [26], or additional assumptions that simplify the implementation of automatically synthesized hybrid controllers [27]. Alternatively, one may resort to receding horizon approaches that have been shown to outperform other optimization methods under the presence of uncertainty, e.g., for the elevator dispatching problem [28], multi-agent reward collection problems [29] or planning with temporal logic constraints [30]. For a scenario where the number of objects is finite but unknown, a combined optimal exploration and control scheme for a robot that has to find, collect and move objects in a two-dimensional position space was proposed in [31]
. The approach was based on a policy enforcing a pickup upon an object’s detection, followed by a certainty-equivalent discrete optimization on a finite abstraction of the robot’s motion in the environment. This heuristic restriction was omitted in
[32], where optimal exploration and control solutions for the worst case and for a probabilistic case assuming a uniform distribution of the objects on a line interval were derived. Since a direct generalization of this result to higher dimensions was not possible, this paper proposes and compares two approximate receding horizon approaches. The first is based on discretizing time and space and solving a nonconvex OCP over a finite horizon by a Mixed Integer Programming (MIP) implementation. In the second approach, the motion of the robot is parameterized by a finite number of parameters. This enables the use of Infinitesimal Perturbation Analysis (IPA) [33] to solve the worst-case and probabilistic OCPs by a bilevel iterative optimization scheme, re-solved only whenever new information becomes available. Preliminary versions of these approaches along a fixed exploratory trajectory have been presented in [34]. Here we extend the methods such that the shape of the exploratory trajectory can be adjusted online, which is particularly useful under the presence of a priori unknown obstacles. The remainder of the paper is organized as follows: in Sec. II, we present the problem formulation. Sec. III starts with a brief discussion on the performance index and introduces a lower bound for the cost-to-go, followed by the proposed time-driven (Sec. IV) and event-driven approaches (Sec. V). The four methods are then compared in a numerical example (Sec. VI), followed by the conclusions in Sec. VII.
Notation.
For a set , and denote its cardinality and the set of all of its subsets (power set), respectively. For , respectively, , and denote the absolute value and the Euclidean norm.
is an identity matrix with dimension
. represents a matrix with zero entries. For a vector of zeros or ones with length
, we write or , respectively. denote the sets of reals, nonnegative reals and positive reals, respectively. We use the derivatives , and the gradient .
II Problem formulation
Consider a finite set of objects , where every , is uniquely characterized by its position , , and mass . A robot has to find, collect and move all objects back to a designated spot (depot), located at , in minimum time. The robot is equipped with an omnidirectional sensor footprint of size around its current position , hence covering the area
(1) 
The overall system is modeled by a hybrid automaton [35], i.e., a 9-tuple . The discrete state at time is , where is the set of objects being carried by the robot, the set of objects that have been dropped at the depot prior to or at time , and is the set of objects that have been detected so far. Clearly, with . The current mass of the robot is , where is the nominal mass of the robot. The overall continuous state consists of the robot state , where is the current velocity of the robot, and the region that has not been explored at time . The robot state evolves according to a finite collection of vector fields , i.e.
(2) 
driven by the piecewise continuous control signal , where is the free final time for the overall assignment. As is finite, the set of discrete state transitions (or events) is also finite. Let be partitioned into , where for ,
is the set of detection events,
is the set of pickup events, and
corresponds to the set of dropoff events. With the introduced sensor paradigm (1), detection events occur when the distance between the current robot position and the position of an object that has not been detected so far becomes . Pickup events occur when the robot reaches the position of an object that has not been collected so far. Dropoff events occur when the robot reaches the depot and carries objects. In addition, for both pickup and dropoff events, zero velocity is required. The corresponding conditions on and for the occurrence of detection, pickup and dropoff events are captured by the invariant , i.e.,
and the guard map , i.e. with ,
For example, upon detection of a new object as per (1), when the robot is in the discrete state , the first case of Inv requires that a transition must occur, and the first case of allows a transition to a discrete state , where the discovered object is included in the detected objects set . The reset map is trivial since no jumps of the continuous variables occur upon a discrete state switching. Note that the above conditions do not depend on , and hence, Inv, and map into instead of . As both the robot and the objects are represented by points in , we assume that no collisions can occur. A practical setup that satisfies this assumption is, e.g., a quadrotor that has to explore a two-dimensional space on the ground from above. Finally, as the robot is assumed to start at the depot with zero velocity, and no objects have been detected, picked up or dropped off before that, the initial state set is .
Remark 1.
Obstacles in can be easily included in the proposed approaches. However, to keep notation as simple as possible, we omit their presence in the main analysis and briefly outline the solution that was used to handle the obstacle in the numerical example (Sec. VI) in a follow-up remark.
Solving the addressed problem involves detection events, pickup events and up to dropoff events, as it can be advantageous to collect several objects on the way and drop them off simultaneously at the depot. Hence, for the total number of events, holds. The time of the occurrence of event , is denoted by , is the initial time, the final time, and . The time intervals , form the time axis from the initial to the final time with . The input is an ordered set of functions , where are absolutely continuous functions for . Thus, if is an execution of the hybrid automaton for an input signal , i.e. , is a discrete state trajectory with . is the continuous state trajectory with , where are absolutely continuous functions, and nonincreasing functions, i.e., for . The cost of an execution is the total task time
(3) 
Let denote the set of states that can be reached upon completing the task. One way to account for the uncertainty in the addressed OCP is to minimize, at time
, the largest cost that may occur for a possible configuration of all objects that have not been discovered so far. Alternatively, the positions of the objects that have not been detected so far can be assumed to be independent identically distributed random variables with probability density functions
(4) 
, where measures the size of . This leads to the following worst-case (A) and probabilistic (B) OCPs.
Problem 1.
At state , find the input signal for , such that for
Note that Problem A is always deterministic, while Problem B is probabilistic until the last detection of an object.
The outline of the solution reads as follows. First, we provide a discussion on the time-optimal value function and derive a lower bound for the cost-to-go. Then, we propose two approximation-based approaches for Problems A and B – one that requires time discretization and recomputation at every time step, and one based on motion parameterization that allows for an event-driven implementation, i.e., the corresponding OCPs are re-solved only upon the occurrence of a detection event.
III Preliminary analysis
Let be a time instant at which the robot has reached a pickup or dropoff location with zero velocity. The overall cost-to-go at state is
(5) 
i.e., the sum of the time until the next pickup or dropoff at time , and the remaining time until the final state is reached.
Assuming that all objects have been detected prior to , the second term on the right-hand side of (5) is the cost of the optimal sequence of pickups and dropoffs, necessary for completing the overall task. Let the set of all corresponding discrete state strings from the state to the final discrete state be denoted by
(6) 
Minimizing the cost of a particular sequence can be decoupled in terms of the input at every pickup and dropoff time instant , i.e.,
(7) 
with and . Assuming the absence of obstacles in , the time-optimal motion of the robot with dynamics (2) from the hybrid state to with is along straight lines. Thus, using an affine transformation, (2) can be reduced to a double integrator in one-dimensional space. The OCP for the reduced model corresponds to the classical linear minimum-time OCP [2], solved by a piecewise constant control that takes values in the set and yields the optimal cost . The controller can be transformed back to (2) by using the inverse affine transformation (details can be found in [8]). Since the cost of a transition from a hybrid state, where the robot with dynamics (2) has zero velocity, to another hybrid state, where the robot has zero velocity, can be tightly lower bounded by the cost for the time-optimal point-to-point motion of a double integrator with zero initial and final velocity, for the optimal cost of a string we obtain
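Since several of the bounds below build on it, the rest-to-rest minimum time of a one-dimensional double integrator can be sketched directly. This is the classical closed-form result; the function names are illustrative and the mass-dependent acceleration limit of the robot is abstracted into a single bound `u_max`:

```python
import math

def min_time_rest_to_rest(d: float, u_max: float) -> float:
    """Minimum time for a double integrator (pos'' = u, |u| <= u_max)
    to travel distance d >= 0 starting and ending at rest: accelerate
    at +u_max for half the distance, then brake at -u_max."""
    return 2.0 * math.sqrt(d / u_max)

def switching_time(d: float, u_max: float) -> float:
    """The single bang-bang switching occurs at the temporal midpoint."""
    return math.sqrt(d / u_max)
```

For d = 1 and u_max = 1 this gives a total time of 2, with the switch from full acceleration to full braking at t = 1.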
(8) 
To illustrate this expression, consider a scenario with two remaining objects, both to be picked up and dropped off. The corresponding discrete dynamics of are captured by the quadruple (Fig. 1), where and are the corresponding sets of , and and are the initial and final discrete state (specified by Init and Fin), respectively. If both objects have been detected and the robot is at rest, the right-hand side of (9) denotes the actual cost-to-go for completing the task. In addition, the right-hand side of (9) can be used as a lower bound for the cost-to-go at time , where the robot is at rest but not both objects have been discovered, i.e.,
(9) 
which represents the cost-to-go without taking exploration into account.
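Under the simplifying assumption that the remaining objects are picked up in some order and dropped off once at the depot (the problem formulation also allows intermediate dropoffs, which would require a richer enumeration), this exploration-free bound can be evaluated by enumerating the candidate strings. A hedged sketch with illustrative names:

```python
import itertools
import math

def leg_time(p, q, u_max):
    # Rest-to-rest double-integrator time between two points
    # (the point-to-point lower bound discussed above).
    return 2.0 * math.sqrt(math.dist(p, q) / u_max)

def cost_to_go_bound(robot_pos, object_positions, depot, u_max):
    """Exploration-free lower bound: minimum over pickup orderings of
    the summed rest-to-rest leg times, ending with one dropoff at the
    depot."""
    best = float("inf")
    for order in itertools.permutations(object_positions):
        pos, total = robot_pos, 0.0
        for obj in order:
            total += leg_time(pos, obj, u_max)
            pos = obj
        total += leg_time(pos, depot, u_max)
        best = min(best, total)
    return best
```

The factorial enumeration mirrors the TSP flavor noted in the introduction and is only practical for the small object counts considered.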
To obtain a finite conservative approximation for , introduce a finite cover of by cells defined by a set of grid points , equally spaced by , such that (see Fig. 2 for an example). Let denote the set of grid points whose associated cells have not been completely covered by the robot’s sensing range (1) until time , i.e. . Thus, is over-approximated by . With that, we can turn to approximate solutions of Problems A and B.
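The grid bookkeeping can be sketched as follows. The covering test used here (a cell counts as covered once the sensor disk contains it entirely) is one conservative choice consistent with the over-approximation above, not necessarily the paper's exact condition:

```python
import math

def unexplored_grid_points(grid_points, visited_positions, r, delta):
    """Grid points whose square cells (side delta, centered at the
    point) are not yet fully covered. A cell lies fully inside the
    sensing disk of radius r iff some visited position is within
    r - delta/sqrt(2) of the cell center (the corners of the cell
    are delta/sqrt(2) away from its center)."""
    reach = r - delta / math.sqrt(2.0)
    return [g for g in grid_points
            if not any(math.dist(g, p) <= reach for p in visited_positions)]
```
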
IV Time-driven optimization
In this section, we present an approximation of Problem 1 based on equidistant time discretization.
IV-A Worst-case solution
By applying the min-max inequality and (9), since , , for the (certainty-equivalent) worst-case evaluation of (5), we obtain
(10) 
where is defined as
(11) 
which follows from relaxing the assumption at time in a sense that the robot has zero velocity at pickup or dropoff locations, but not all objects have necessarily been detected before . Since is finite, it is possible to reformulate (11
) by introducing a dummy variable
and additional nonlinear constraints for each string in , leading to an equivalent reformulation. As the robot has zero velocity at , the initial (approximately) optimal control can be obtained by solving (11), followed by translating it back to (2), as described in the previous section. Once the robot starts moving, optimizing the first term of the third line of (10) at the optimum is difficult in continuous time. Therefore, consider a finite equidistant sampling of a time horizon beginning at with sampling time , which we assume to include the yet unknown time , i.e. , , . Then, at every time instant, given the solution of (11), we solve
(12) 
where is the constraint set resulting from the corresponding discrete-time version of the hybrid automaton . The OCP can be approximately implemented as a MILP. For further implementation details, we refer the reader to Appendix A and to [31] for a closely related OCP.
IV-B Probabilistic solution
With (4), (5) and (9), prior to the discovery of all objects, the optimal cost in the probabilistic case is given by the minimum expected time (omitting function arguments)
(13) 
where denotes the area of . The approximation of the first term follows from the fact that is certainly greater than or equal to the shortest time needed for the robot to move from its current position to a currently unexplored point in , while the approximation of the second term is obtained by applying Jensen’s inequality. To compute the control that minimizes the first term in (13), we formulate a MILP analogously to (12). The second term in (13) is obtained through numerical integration of for over , followed by choosing the sequence that yields the minimal cost. This allows for a receding horizon scheme that minimizes the cost-to-go at each time instant until all objects are dropped off.
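For the one-dimensional case, the interplay between the exact expected travel time and the Jensen-based approximation can be illustrated numerically. The uniform density, the rest-to-rest time formula and the interval below are illustrative assumptions, not the paper's exact integrand:

```python
import math

def travel_time(dist, u_max):
    # Rest-to-rest double-integrator time over a given distance.
    return 2.0 * math.sqrt(dist / u_max)

def expected_travel_time(x0, a, b, u_max, n=100_000):
    """E[travel_time(|x - x0|)] for x uniform on [a, b], computed by
    the midpoint rule (the numerical integration mentioned above)."""
    h = (b - a) / n
    total = sum(travel_time(abs(a + (k + 0.5) * h - x0), u_max)
                for k in range(n))
    return total * h / (b - a)

def jensen_approximation(x0, a, b, u_max):
    """travel_time(E[|x - x0|]): an upper bound on the expectation,
    since the travel time is concave in the distance (Jensen's
    inequality). Assumes a <= x0 <= b."""
    mean_dist = ((x0 - a) ** 2 + (b - x0) ** 2) / (2.0 * (b - a))
    return travel_time(mean_dist, u_max)
```
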
V Event-driven optimization
The approaches presented in the previous section require solving computationally expensive MIPs online at each time instant. Since the locations of the objects are the only source of uncertainty in the considered problem, the ultimate goal is a tractable and scalable, albeit suboptimal, alternative that avoids time discretization and requires recomputation only upon a detection. The approach proposed in the following is based on restricting the motion of the robot to a prespecified family of curves, whose shape is determined by a finite parameter vector, such that the cost-to-go can be evaluated efficiently. This allows for an event-driven scheme based on an iterative gradient-based optimization over the parameters of the curve only upon detection.
Let the robot’s position be described by the parametric equation
(14) 
where denotes the position of the robot along the curve , is a parameter vector that controls the shape of , and is twice continuously differentiable with respect to and . Let be the monotonically nondecreasing curve length function of over . With denoting the arc length of , let denote the normalized arc-length variable, such that at the initial position and at the final position. The parametric functions employed in this work are Fourier series (see Appendix B) that exhibit rich expressiveness in terms of motion behaviors and allow for an efficient solution of the optimization problem. Other types of parametric functions or more complex robot dynamics may also be used, as outlined in Remark 2.
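As a concrete illustration of such a parameterization, a planar curve with truncated Fourier coordinates and its numerically computed arc length might look as follows. The coefficient layout is an assumption for illustration only; the paper's exact basis is given in its Appendix B:

```python
import math

def fourier_curve(s, theta):
    """Point on a planar curve whose coordinates are truncated Fourier
    series in the parameter s in [0, 1]; theta = (ax, bx, ay, by) are
    lists of cosine/sine coefficients per coordinate (assumed layout)."""
    ax, bx, ay, by = theta
    x = sum(a * math.cos(2 * math.pi * (k + 1) * s) +
            b * math.sin(2 * math.pi * (k + 1) * s)
            for k, (a, b) in enumerate(zip(ax, bx)))
    y = sum(a * math.cos(2 * math.pi * (k + 1) * s) +
            b * math.sin(2 * math.pi * (k + 1) * s)
            for k, (a, b) in enumerate(zip(ay, by)))
    return x, y

def arc_length(theta, n=2000):
    """Numerical arc length of the curve by summing chord lengths."""
    pts = [fourier_curve(k / n, theta) for k in range(n + 1)]
    return sum(math.dist(pts[k], pts[k + 1]) for k in range(n))
```

With theta = ([1], [0], [0], [1]) the curve is the unit circle, whose arc length is recovered up to discretization error.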
Upon detection, optimization will be performed by a bilevel optimization algorithm, based on iteratively solving the following two OCPs:

Find the parameter that determines the optimal shape of solving Problem 1;

Control the motion of the robot along by the optimal that respects the restrictions imposed by .
The outline of the algorithm reads as follows: starting with an initial parameter guess for , we solve the low-level OCP 2). Then, is updated by solving 1) using the solution of 2). The high-level OCP is solved by an augmented Lagrangian method that allows for replacing the constrained optimization problem by a series of unconstrained optimization problems. Employing Infinitesimal Perturbation Analysis [33], we obtain the derivative of the augmented cost and solve the unconstrained OCPs by gradient-based methods. Steps 1) and 2) are iterated until reaching a (local) minimum of the OCP, which is attained upon satisfying an iteration threshold condition. We start with solving the second step.
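The iteration of steps 1) and 2) can be sketched generically. Here `low_level_cost` stands in for the inner traversal OCP and `grad` for the IPA derivative of the augmented cost, both supplied by the surrounding method; a plain finite-difference gradient would also fit this interface. This is a sketch of the bilevel loop, not the paper's full augmented Lagrangian scheme:

```python
def bilevel_optimize(theta0, low_level_cost, grad,
                     step=0.1, iters=50, tol=1e-6):
    """High level: gradient descent on the curve-shape parameter theta.
    Low level: low_level_cost(theta) is assumed to solve the motion OCP
    along the curve and return its cost. Stops on an iteration
    threshold condition (cost decrease below tol)."""
    theta = list(theta0)
    cost = low_level_cost(theta)
    for _ in range(iters):
        g = grad(theta)
        new_theta = [t - step * gi for t, gi in zip(theta, g)]
        new_cost = low_level_cost(new_theta)
        if abs(cost - new_cost) < tol:  # iteration threshold condition
            break
        theta, cost = new_theta, new_cost
    return theta, cost
```
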
V-A Optimal motion along the curve
Let the first and second derivatives of (14) w.r.t. be and , respectively. Further, let and denote the time derivatives. For the velocity and the acceleration along (14), we respectively obtain
and the robot’s dynamics (2) are restated as
(15) 
With the employed arc-length parameterization, the robot traverses the curve at constant speed, i.e., , where is the arc length of . Substituting in polar coordinates in (15) and using , (2) is equivalently restated by (14) and the state with dynamics
(16) 
To simplify the analysis in the following, the necessary optimality conditions for will be derived for
(17) 
which represents a reasonable approximation of (16) along general Fourier series curves. Note that, for lines, implies and , and (17) describes the dynamics of the robot exactly. Since the sensor footprint (1) is typically much smaller than , for evaluating the cost-to-go we assume that prior to their discovery all objects are located on (14), i.e., , and neglect the sensing range of the robot. A preliminary version of this analysis was presented in [32]. In what follows, we further assume that the high-level OCP (presented in the following section) provides an optimal parameter , such that the robot moving along (14) with plans to cover the remaining space as long as there are objects to be detected, and passes through object locations that have been discovered previously but have not been picked up yet. We start the analysis assuming that there is only one object, i.e. , with mass located at .
V-A1 Optimal control for one object
The robot with dynamics (17) starts at , and . Clearly, the optimal control solving Problem 1 is divided into three parts, i.e. , denoting the control until detection, the control until pickup and the control until dropoff. After the object is detected at time , when the robot moves with velocity , it can be reached at time with by employing a time-optimal bang-bang controller with a switching at time [2], i.e.,
Solving (17) with and , and applying the boundary conditions for the object’s pickup , yields the optimal cost
(18) 
Since the robot stops at , steering it back to the depot by is again given by bang-bang control [2]. Since its corresponding cost is independent of , , it can be neglected for finding .
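The detection-to-pickup phase described above (robot already moving with velocity v when the object appears ahead) admits the same closed-form treatment. The sketch below ignores the mass change at pickup that the cost (18) accounts for, and assumes enough distance remains to brake:

```python
import math

def time_to_stop_at(d: float, v: float, u_max: float) -> float:
    """Minimum time for a double integrator at distance d from the
    target, moving toward it with velocity v >= 0, to stop exactly at
    the target with |u| <= u_max. Requires d >= v**2 / (2*u_max), so
    the optimal control is accelerate-then-brake with one switching."""
    assert d >= v * v / (2.0 * u_max)
    v_peak = math.sqrt(u_max * d + v * v / 2.0)  # speed at the switch
    t_switch = (v_peak - v) / u_max              # acceleration phase
    t_brake = v_peak / u_max                     # braking phase
    return t_switch + t_brake
```

For v = 0 this reduces to the rest-to-rest time 2*sqrt(d/u_max); a nonzero initial velocity toward the target shortens the pickup time.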
In the worst case, the object is located farthest away from the initial point, i.e., at . Thus, the time-optimal control satisfies the condition
(19) 
In the probabilistic case, the object’s location is uniformly distributed over . To compute we need to consider the time from detection to pickup (18), yielding . To obtain a standard representation for the cost, introduce an additional state for the unknown detection time , leading to an extended system state with dynamics (17) and . Substituting the relation , the expected time for picking up the object is
where is free and the boundary constraints
must be satisfied. Thus, the probabilistic OCP has been transformed into a free final time nonlinear OCP. The corresponding control Hamiltonian is
(20) 
with absolutely continuous costate dynamics
(21) 
Applying Pontryagin’s Minimum Principle, there exists an optimal state , a control , and a nontrivial costate trajectory, such that ,
(22) 
leading to the following theorems.
Theorem 1.
The optimal control is for almost all .
Theorem 2.
The optimal control for is
(23) 
The proofs can be found in Appendix C.
V-A2 Control for multiple objects
For multi-object setups, we propose an approach to obtain the control analogously to the single-object case. While the worst-case optimal control is built from a finite sequence of bang-bang control segments, a computationally tractable scheme for the probabilistic control is derived as follows. Recall the robot moving in the position space shown in Fig. 2 (b) and consider a scenario with two objects. As in the single-object case, the robot starts moving in the discrete state . The possible discrete event strings until the robot comes to a halt for the first time consist of detecting an object followed by its immediate pickup, i.e. , or detecting both objects and stopping at one of the two objects’ positions, i.e. , (see Fig. 1). For both cases, we employ the control (23) for the time interval up to the detection of the first object, although for more than one object this policy may not be optimal. Now assume that object has just been detected. Fig. 3 shows two possible curves that provide complete exploration of , allow for picking up and end at the depot, together with the trajectories resulting from the probabilistic control. The particular control can be derived using the corresponding boundary and continuity conditions, as shown in Appendix D. The analysis for these two cases can be easily generalized to obtain the probabilistic control for an arbitrary finite number of unexplored segments in the interval .
Moving on to the problem with more than two objects, we obtain the controls along the curve by following a similar line of argumentation – both the worst-case and the probabilistic controls consist of a finite sequence of appropriate bang-bang control segments. For that, consider the set of discrete state strings from a state , where all objects have been detected, to the final discrete state , as defined in (6). Then, let denote the projection of all onto . For a given and with , the curve (14) yields a discrete state string that is traversed in time
(24) 
Since minimizing (24) over the free parameters of the proposed policy, i.e. the switching times, can be decoupled at pickup and dropoff instants, we solve Two-Point Boundary Value Problems (TPBVPs)