Optimal control for a robotic exploration, pick-up and delivery problem

This paper addresses an optimal control problem for a robot that has to find and collect a finite number of objects and move them to a depot in minimum time. The robot has fourth-order dynamics that change instantaneously at any pick-up or drop-off of an object. The objects are modeled by point masses with a-priori unknown locations in a bounded two-dimensional space that may contain unknown obstacles. For this hybrid system, an Optimal Control Problem (OCP) is approximately solved by a receding horizon scheme, where the derived lower bound for the cost-to-go is evaluated for the worst and for a probabilistic case, assuming a uniform distribution of the objects. First, a time-driven approximate solution based on time and position space discretization and mixed integer programming is presented. Due to the high computational cost of this solution, an alternative event-driven approximate approach based on a suitable motion parameterization and gradient-based optimization is proposed. The solutions are compared in a numerical example, suggesting that the latter approach offers a significant computational advantage while yielding similar qualitative results compared to the former. The methods are particularly relevant for various robotic applications like automated cleaning, search and rescue, harvesting or manufacturing.


I Introduction

One of the major challenges in autonomous robotic navigation is coping with uncertainties arising from limited a-priori knowledge of the environment. Acquiring necessary information and achieving the overall goal are complementary subtasks that require adapting the motion of a robot during mission execution, typically accompanied by minimizing a performance criterion. In this work we address an Optimal Control Problem (OCP) for a robot with fourth-order dynamics that has to find, collect and move a finite number of objects to a designated spot in minimum time. The objects with a-priori known masses are located in a bounded two-dimensional space, where the robot is capable of localizing itself using a state-of-the-art simultaneous localization and mapping (SLAM) system [1]. The challenging aspects of the problem at hand are (at least) threefold. One of them arises due to the discontinuity of the value function denoting the overall completion time, which makes it hard to obtain an explicit controller even for deterministic linear systems [2, 3]. Fortunately, a wide range of approximate solutions has been proposed, including approaches based on numerical continuation [4], value set approximation [5] and multi-parametric programming [6]. Another challenge follows from the requirement to collect a finite number of objects and drop them at a particular spot, both leading to autonomous switchings of the robot’s continuous dynamics. While deterministic versions of this problem can be handled efficiently, e.g., by two-stage optimization [7, 8] or relaxation [9], the complexity of most approaches for stochastic setups scales poorly with the problem size [10]. Since the robot has to reach the corresponding locations of the objects or the depot with minimal overall cost, the overall problem also contains an instance of the well-known NP-hard Traveling Salesperson Problem (TSP) [11]. Further, optimal exploration of a limited space is an inherently difficult problem by itself. Minimizing the expected time for detecting a target located on a real line with a known probability distribution by a searcher that can change its motion direction instantaneously, has a bounded maximal velocity and starts at the origin, was originally addressed in [12]. Different versions of this problem have received considerable attention from several research communities, e.g., as a “pursuit-evasion game” in game theory [13, 14], as a “cow-path problem” in computer science [15] or as a “coverage problem” in control [16, 17], but its solution for a general probability distribution or a general geometry of the region is, to a large extent, still an open question. Effective approaches for the related persistent monitoring problem based on estimation [18], linear programming [19] or parametric optimization [20] have also been proposed. OCPs with uncertainties have also been addressed by certainty equivalent event-triggered [21], minimax [22] and sampling-based [23] optimization schemes. While methods for Partially Observable Markov Decision Processes (POMDPs) can also be applied, e.g., [24, 25], they typically become computationally infeasible for larger problem instances. Due to the aforementioned aspects, the problem at hand has exponential complexity in the number of objects and for any chosen time and space discretization. In this context, employing a discrete abstraction of the underlying continuous dynamics is often only possible by introducing a hierarchical decomposition [26], or additional assumptions that simplify the implementation of automatically synthesized hybrid controllers [27]. Alternatively, one may resort to receding horizon approaches that have been shown to outperform other optimization methods under the presence of uncertainty, e.g., for the elevator dispatching problem [28], multi-agent reward collection problems [29] or planning with temporal logic constraints [30].

For a scenario where the number of objects is finite but unknown, a combined optimal exploration and control scheme for a robot that has to find, collect and move objects in a two-dimensional position space was proposed in [31]. The approach was based on a policy enforcing a pick-up upon an object’s detection, followed by a certainty equivalent discrete optimization on a finite abstraction of the robot’s motion in the environment. This heuristic restriction was omitted in [32], where optimal exploration and control solutions for the worst and a probabilistic case assuming a uniform distribution of the objects on a line interval were derived. Since a direct generalization of this result for higher dimensions was not possible, this paper proposes and compares two approximate receding horizon approaches. The first is based on discretizing time and space and solving a non-convex OCP over a finite horizon by a Mixed Integer Programming (MIP) implementation. In the second approach, the motion of the robot is parameterized by a finite number of parameters. This enables the use of Infinitesimal Perturbation Analysis (IPA) [33] to solve the worst and probabilistic case OCPs by a bi-level iterative optimization scheme, solved only whenever new information becomes available. Preliminary versions of these approaches along a fixed exploratory trajectory have been presented in [34]. Here we extend the methods such that the shape of the exploratory trajectory can be adjusted online, which is particularly useful under the presence of a-priori unknown obstacles.

The remainder of the paper is organized as follows. In Sec. II, we present the problem formulation. Sec. III starts with a brief discussion on the performance index and introduces a lower bound for the cost-to-go, followed by the proposed time-driven (Sec. IV) and event-driven (Sec. V) approaches. The four methods are then compared in a numerical example (Sec. VI), followed by the conclusions in Sec. VII.


For a set $S$, $|S|$ and $2^{S}$ denote its cardinality and the set of all of its subsets (power set), respectively. For $x \in \mathbb{R}$, respectively, $x \in \mathbb{R}^{n}$, $|x|$ and $\|x\|$ denote the absolute value and the Euclidean norm. $I_{n}$ is an identity matrix with dimension $n$. $0_{n \times m}$ represents an $n \times m$ matrix with zero entries. For a vector of zeros or ones with length $n$, we write $\mathbf{0}_{n}$ or $\mathbf{1}_{n}$, respectively. $\mathbb{R}$, $\mathbb{R}_{\geq 0}$ and $\mathbb{R}_{>0}$ denote the sets of reals, non-negative reals and positive reals, respectively. We use the derivatives , and the gradient .

II Problem formulation

Consider a finite set of objects , where every , is uniquely characterized by its position , , and mass . A robot has to find, collect and move all objects back to a designated spot (depot), located at , in minimum time. The robot is equipped with an omni-directional sensor footprint of size around its current position , hence covering the area


The overall system is modeled by a hybrid automaton [35], i.e., a 9-tuple . The discrete state at time is , where is the set of objects being carried by the robot, the set of objects that has been dropped at the depot prior to or at time , and is the set of objects that have been detected so far. Clearly, with . The current mass of the robot is , where is the nominal mass of the robot. The overall continuous state consists of the robot state , where is the current velocity of the robot, and the region that has not been explored at time . The robot state evolves according to a finite collection of vector fields , i.e.


driven by the piecewise continuous control signal , where is the free final time for the overall assignment. As is finite, the set of discrete state transitions (or events) is also finite. Let be partitioned into , where for ,

is the set of detection events,

is the set of pick-up events, and

corresponds to the set of drop-off events. With the introduced sensor paradigm (1), detection events occur when the distance between the current robot position and the position of an object that has not been detected so far becomes . Pick-up events occur when the robot reaches the position of an object that has not been collected so far. Drop-off events occur when the robot reaches the depot and carries objects. In addition, for both pick-up and drop-off events, zero velocity is required. The corresponding conditions on and for the occurrence of detection, pick-up and drop-off events are captured by the invariant , i.e.,

and the guard map , i.e. with ,

For example, upon a detection of a new object as per (1), when the robot is in the discrete state , the first case of the Inv requires that a transition must occur, and the first case of allows a transition to a discrete state , where the discovered object is included in the detected objects set . The reset map is trivial since no jumps of the continuous variables occur upon a discrete state switching. Note that the above conditions do not depend on , and hence, Inv, and map into instead of . As both the robot and the objects are represented by points in , we assume that no collisions can occur. A practical setup that satisfies this assumption is, e.g., a quadrotor that has to explore a two-dimensional space on the ground from above. Finally, as the robot is assumed to start at the depot with zero velocity, and no objects have been detected, picked up or dropped off before that, the initial state set is .
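The discrete part of the automaton described above can be illustrated with a small sketch. The Python fragment below is not the authors' implementation. It assumes a disc-shaped sensor footprint with radius `radius`, treats "reaching" a position as being within a numerical tolerance `tol`, and forces a pick-up or drop-off whenever the robot is at rest at the corresponding location (the guard map only permits these transitions). The names `DiscreteState` and `apply_events` are hypothetical.

```python
from dataclasses import dataclass
from math import dist, hypot


@dataclass(frozen=True)
class DiscreteState:
    """Discrete state: objects carried, dropped at the depot, detected so far."""
    carried: frozenset = frozenset()
    dropped: frozenset = frozenset()
    detected: frozenset = frozenset()


def apply_events(q, robot_pos, robot_vel, objects, depot, radius, tol=1e-9):
    """Return the discrete state after all events enabled at the current
    continuous state have fired. `objects` maps object id -> (x, y)."""
    # Detection: an undetected object enters the (assumed circular) footprint.
    newly = frozenset(i for i, p in objects.items()
                      if i not in q.detected and dist(robot_pos, p) <= radius)
    detected = q.detected | newly
    carried, dropped = q.carried, q.dropped

    # Pick-up and drop-off both require zero velocity.
    if hypot(*robot_vel) <= tol:
        picked = frozenset(i for i in detected - carried - dropped
                           if dist(robot_pos, objects[i]) <= tol)
        carried = carried | picked
        if carried and dist(robot_pos, depot) <= tol:
            dropped, carried = dropped | carried, frozenset()
    return DiscreteState(carried, dropped, detected)
```

The continuous state is left untouched by `apply_events`, mirroring the trivial reset map.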

Remark 1.

Obstacles in can be easily included in the proposed approaches. However, to keep notation as simple as possible, we omit their presence in the main analysis and briefly outline the solution that was used to handle the obstacle in the numerical example (Sec. VI) in a follow-up remark.

Solving the addressed problem involves detection events, pick-up events and up to drop-off events, as it can be advantageous to collect several objects on the way and drop them off simultaneously at the depot. Hence, for the total number of events, holds. The time of the occurrence of event , is denoted by , is the initial time, the final time, and . The time intervals , form the time axis from the initial to the final time with . The input is an ordered set of functions , where are absolutely continuous functions for . Thus, if is an execution of the hybrid automaton for an input signal , i.e. , is a discrete state trajectory with . is the continuous state trajectory with , where are absolutely continuous functions, and non-increasing functions, i.e., for . The cost of an execution is the total task time


Let denote the set of states that can be reached upon completing the task. One way to account for the uncertainty in the addressed OCP is to minimize, at time , the largest cost that may occur for a possible configuration of all objects that have not been discovered so far. Alternatively, the positions of the objects that have not been detected so far can be assumed to be independent identically distributed random variables with probability density functions


, where measures the size of . This leads to the following worst-case (A) and probabilistic (B) OCPs.

Problem 1.

At state , find the input signal for , such that for

Note that Problem A is always deterministic, while Problem B is probabilistic until the last detection of an object.

The outline of the solution reads as follows. First, we provide a discussion on the time-optimal value function and derive a lower bound for the cost-to-go. Then, we propose two approximation-based approaches for Problems A and B – one that requires time discretization and re-computation at every time step, and one based on motion parameterization that allows for an event-driven implementation, i.e., the corresponding OCPs are re-solved only upon the occurrence of a detection event.

III Preliminary analysis

Let be a time instant at which the robot has reached a pick-up or drop-off location with zero velocity. The overall cost-to-go at state is


i.e., the sum of the time until the next pick-up or drop-off at time , and the remaining time until the final state is reached.

Fig. 1: Discrete dynamics of for with , , , . Exploration takes place at gray states. The robot has zero velocity at states denoted by a square.

Assuming that all objects have been detected prior to , the second term on the right-hand side of (5) is the cost of the optimal sequence of pick-ups and drop-offs necessary for completing the overall task. Let the set of all corresponding discrete state strings from the state to the final discrete state be denoted by


Minimizing the cost of a particular sequence can be decoupled in terms of the input at every pick-up and drop-off time instant , i.e.,


with and . Assuming the absence of obstacles in , the time-optimal motion of the robot with dynamics (2) from the hybrid state to with is on straight lines. Thus, using an affine transformation, (2) can be reduced to a double integrator in one dimensional space. The OCP for the reduced model corresponds to the classical linear time-OCP [2] solved by a piecewise constant control that takes values in the set and yields the optimal cost . The controller can be transformed back to (2) by using the inverse affine transformation (details can be found in [8]). Since a transition from a hybrid state, where the robot with dynamics (2) has zero velocity, to another hybrid state, where the robot has zero velocity, can be tightly lower bounded by the cost for the time-optimal point-to-point motion of a double integrator with zero initial and final velocity, for the optimal cost of a string we obtain


To illustrate this expression, consider a scenario with two remaining objects, both to be picked up and dropped. The corresponding discrete dynamics of are captured by the quadruple (Fig. 1), where and are the corresponding sets of , and and are the initial and final discrete state (specified by Init and Fin), respectively. If both objects have been detected and the robot is at rest, the right hand side of (9) denotes the actual cost-to-go for completing the task. In addition, the right hand side of (9) can be used as a lower bound for the cost-to-go at time , where the robot is at rest but not both objects have been discovered, i.e.,


which represents the cost-to-go without taking into account exploration.
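The lower bound can be made concrete with a short sketch. The code below is an illustration under stated assumptions, not the paper's formulation: it assumes the reduced one-dimensional model is m·a = u with |u| ≤ `u_max`, so that a rest-to-rest move over a distance d takes 2·sqrt(m·d/u_max), and it enumerates all pick-up and drop-off strings by exhaustive recursion. The function names and the default `u_max=1.0` are hypothetical.

```python
from functools import lru_cache
from math import dist, sqrt


def rest_to_rest_time(distance, mass, u_max=1.0):
    """Minimum time of a double integrator (assumed model: mass * acc = u,
    |u| <= u_max) moving `distance` with zero initial and final velocity."""
    return 2.0 * sqrt(mass * distance / u_max)


def min_string_cost(start, depot, objects, robot_mass, u_max=1.0):
    """Cheapest sequence of pick-ups and drop-offs when all objects are known.
    `objects` is a sequence of ((x, y), mass); the robot starts at rest."""
    objects = tuple((tuple(p), m) for p, m in objects)
    depot = tuple(depot)
    all_ids = frozenset(range(len(objects)))

    @lru_cache(maxsize=None)
    def cost_to_go(pos, carried, dropped):
        if dropped == all_ids:
            return 0.0
        # The current mass includes every object being carried.
        mass = robot_mass + sum(objects[i][1] for i in carried)
        best = float("inf")
        for i in all_ids - carried - dropped:  # pick up another object
            target = objects[i][0]
            step = rest_to_rest_time(dist(pos, target), mass, u_max)
            best = min(best, step + cost_to_go(target, carried | {i}, dropped))
        if carried:  # return to the depot and drop everything carried
            step = rest_to_rest_time(dist(pos, depot), mass, u_max)
            best = min(best, step + cost_to_go(depot, frozenset(),
                                               dropped | carried))
        return best

    return cost_to_go(tuple(start), frozenset(), frozenset())
```

The exhaustive recursion grows exponentially with the number of objects, which is consistent with the embedded TSP instance mentioned in the introduction; it is meant only for small examples.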

To obtain a finite conservative approximation for , introduce a finite cover of by cells defined by a set of grid points , equally spaced by , such that (see Fig. 2 for an example). Let denote the set of grid points, whose associated cells have not been completely covered by the robot’s sensing range (1) until time , i.e. . Thus, is over-approximated by . With that, we can turn to approximate solutions of Problems A and B.

Fig. 2: A robot with sensing radius over the coarsest allowed grid (a). A snapshot of the robot that has moved from to with (b). The area covered along the path is under-approximated over the grid . The over-approximation (in gray) of is described by a finite number of rectangular regions (c) used for the time-driven optimization.
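A minimal sketch of this grid bookkeeping is given below. It assumes square cells of side `spacing` centered at the grid points and a disc-shaped footprint of radius `radius`; a cell is removed from the unexplored set only when all four of its corners lie inside the disc, which, by convexity of the disc, implies that the whole cell has been covered. The function name and the array layout are assumptions made for illustration.

```python
import numpy as np


def update_unexplored(grid_points, unexplored, robot_pos, radius, spacing):
    """Clear the 'unexplored' flag of every cell fully inside the footprint.

    grid_points: (N, 2) array of cell centers
    unexplored:  (N,) boolean array, True while a cell is not fully covered
    """
    half = spacing / 2.0
    offsets = np.array([[-half, -half], [-half, half],
                        [half, -half], [half, half]])
    corners = grid_points[:, None, :] + offsets[None, :, :]   # (N, 4, 2)
    corner_dist = np.linalg.norm(corners - robot_pos, axis=2)  # (N, 4)
    fully_covered = np.all(corner_dist <= radius, axis=1)
    # Cells that are only partially seen stay unexplored (over-approximation).
    return unexplored & ~fully_covered
```

Calling the function at every sampled robot position yields a non-increasing set of unexplored cells.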

IV Time-driven optimization

In this section, we present an approximation of Problem 1 based on equidistant time discretization.

IV-A Worst-case solution

By applying the min-max inequality and (9), since , , for the (certainty equivalent) worst-case evaluation of (5), we obtain


where is defined as


which follows from relaxing the assumption at time in a sense that the robot has zero velocity at pick-up or drop-off locations, but not all objects have necessarily been detected before . Since is finite, it is possible to reformulate (11) by introducing a dummy variable and additional nonlinear constraints for each string in leading to

As the robot has zero velocity at , the initial (approximately) optimal control can be obtained by solving (11) followed by re-translation to (2), as described in the previous section. Once the robot starts moving, optimizing the first term of the third line of (10) at the optimum is difficult in continuous time. Therefore, consider a finite equidistant sampling of a time horizon beginning at with sampling time , which we assume to include the yet unknown time , i.e. , , . Then, at every time instant, given the solution of (11), we solve


where is the constraint set resulting from the corresponding discrete-time version of the hybrid automaton . The OCP can be approximately implemented as a MILP. For further implementation details, we refer the reader to Appendix A and [31] for a closely related OCP.

IV-B Probabilistic solution

With (4), (5) and (9), prior to the discovery of all objects, the optimal cost in the probabilistic case is given by the minimum expected time (omitting function arguments)


where denotes the area of . The approximation of the first term follows from the fact that is certainly greater than or equal to the shortest time needed for the robot to move from its current position to a currently unexplored point in , while the approximation of the second term is obtained by applying Jensen’s inequality. To compute the control that minimizes the first term in (13), we formulate a MILP analogously to (12). The second term in (13) is obtained through numerical integration of for over , followed by choosing the sequence that yields the minimal cost. This allows for a receding horizon scheme that minimizes the cost-to-go at each time instant until all objects are dropped off.
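The difference between the worst-case and the probabilistic evaluation can be sketched for a single undiscovered object. The snippet below is a simplified illustration, not the MILP of this section: it assumes the robot is at rest, uses the rest-to-rest time 2·sqrt(m·d/u_max) of an assumed double integrator m·a = u with |u| ≤ `u_max`, and replaces the integral over the unexplored region by an average over the centers of the unexplored grid cells. All names are hypothetical.

```python
import numpy as np


def single_object_costs(robot_pos, depot, candidates, robot_mass,
                        object_mass, u_max=1.0):
    """Worst-case and expected pick-up-and-delivery time for one object whose
    position is uniformly distributed over `candidates` (an (N, 2) array of
    unexplored cell centers)."""
    d_pick = np.linalg.norm(candidates - robot_pos, axis=1)
    d_drop = np.linalg.norm(candidates - depot, axis=1)
    # Empty robot on the way out, loaded robot on the way back.
    t = (2.0 * np.sqrt(robot_mass * d_pick / u_max)
         + 2.0 * np.sqrt((robot_mass + object_mass) * d_drop / u_max))
    return t.max(), t.mean()
```

The maximum corresponds to the worst-case evaluation and the mean to a grid-based numerical integration under the uniform density.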

V Event-driven optimization

The approaches presented in the previous section require solving computationally expensive MIPs at each time instant online. Since the locations of the objects are the only source of uncertainty in the considered problem, the ultimate goal is a tractable and scalable, albeit suboptimal alternative that avoids time discretization and requires re-computation only upon a detection. The approach proposed in the following is based on restricting the motion of the robot to a pre-specified family of curves, whose shape is determined by a finite parameter vector, such that the cost-to-go can be evaluated efficiently. This allows for an event-driven scheme based on an iterative gradient-based optimization over the parameters of the curve only upon detection.

Let the robot’s position be described by the parametric equation


where denotes the position of the robot along the curve , is a parameter vector that controls the shape of , and is twice continuously differentiable with respect to and . Let be the monotonically non-decreasing curve length function of over . With denoting the arc-length of , let denote the normed arc-length variable, such that at the initial position, and at the final position. The parametric functions employed in this work are Fourier series (see Appendix B) that exhibit rich expressiveness in terms of motion behaviors and allow for an efficient solution of the optimization problem. Other types of parametric functions or more complex robot dynamics may also be used, as outlined in Remark 2.
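As an illustration of such a parameterization, the sketch below evaluates a planar Fourier-series curve and reparameterizes it by normalized arc length through a numerically tabulated length function. The specific series form (constant term `a0` plus cosine coefficients `A` and sine coefficients `B`, with curve parameter in [0, 1]) is an assumption made for this example and need not coincide with the form used in Appendix B.

```python
import numpy as np


def fourier_curve(s, a0, A, B):
    """Points of a planar curve p(s) = a0 + sum_k A[k] cos(2 pi k s)
    + B[k] sin(2 pi k s), for s in [0, 1]. A and B have shape (K, 2)."""
    s = np.atleast_1d(s)
    k = np.arange(1, A.shape[0] + 1)
    phase = 2.0 * np.pi * np.outer(s, k)                  # (S, K)
    return a0 + np.cos(phase) @ A + np.sin(phase) @ B     # (S, 2)


def arc_length_table(a0, A, B, samples=2001):
    """Tabulate the cumulative curve length over a uniform grid in s."""
    s = np.linspace(0.0, 1.0, samples)
    pts = fourier_curve(s, a0, A, B)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    return s, np.concatenate(([0.0], np.cumsum(seg)))


def position_at(sigma, a0, A, B):
    """Position at normalized arc length sigma in [0, 1] (0 = start, 1 = end)."""
    s, cum = arc_length_table(a0, A, B)
    s_of_sigma = np.interp(sigma * cum[-1], cum, s)  # invert the length map
    return fourier_curve(s_of_sigma, a0, A, B)
```

With `a0 = np.zeros(2)`, `A = np.array([[1.0, 0.0]])` and `B = np.array([[0.0, 1.0]])` the curve is the unit circle, which provides a simple sanity check of the tabulated length.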

Upon detection, optimization will be performed by a bi-level optimization algorithm, based on iteratively solving the following two OCPs:

  1. Find the parameter that determines the optimal shape of solving Problem 1;

  2. Control the motion of the robot along by the optimal that respects the restrictions imposed by .

The outline of the algorithm reads as follows: starting with an initial parameter guess for , we solve the low-level OCP 2). Then, is updated by solving 1) using the solution of 2). The high-level OCP is solved by an augmented Lagrangian method that allows for replacing the constrained optimization problem by a series of unconstrained optimization problems. Employing Infinitesimal Perturbation Analysis [33], we obtain the derivative of the augmented cost and solve the unconstrained OCPs by gradient-based methods. The steps 1) and 2) are solved iteratively until reaching a (local) minimum of the OCP, which is attained upon satisfying an iteration threshold condition. We start with solving the second step.
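The augmented Lagrangian step can be sketched on a toy problem. The code below is a generic illustration of the technique for a single equality constraint, using plain gradient descent for the unconstrained subproblems. It does not use the paper's cost or IPA-based gradients; the analytic gradients, step sizes and iteration counts are assumptions chosen for the toy example.

```python
import numpy as np


def augmented_lagrangian(grad_f, g, grad_g, theta0, rho=10.0,
                         outer=20, inner=200, step=0.02):
    """Minimize f(theta) subject to g(theta) = 0 by minimizing
    L(theta) = f + lam * g + (rho / 2) * g**2 for a sequence of multipliers."""
    theta = np.asarray(theta0, dtype=float)
    lam = 0.0
    for _ in range(outer):
        for _ in range(inner):  # unconstrained subproblem, gradient descent
            grad = grad_f(theta) + (lam + rho * g(theta)) * grad_g(theta)
            theta = theta - step * grad
        lam += rho * g(theta)   # first-order multiplier update
    return theta, lam


# Toy problem: minimize x^2 + y^2 subject to x + y = 1.
theta_opt, lam_opt = augmented_lagrangian(
    grad_f=lambda th: 2.0 * th,
    g=lambda th: th[0] + th[1] - 1.0,
    grad_g=lambda th: np.array([1.0, 1.0]),
    theta0=[2.0, -1.0],
)
```

In the bi-level scheme, each gradient evaluation would additionally require solving the low-level OCP for the current parameter vector.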

V-A Optimal motion along the curve

Let the first and second derivatives of (14) w.r.t. be and , respectively. Further, let and denote the time derivatives. For the velocity and the acceleration along (14), we respectively obtain

and the robot’s dynamics (2) are restated as


With the employed arc-length parameterization, the robot traverses the curve at constant speed, i.e., , where is the arc-length of . Substituting in polar coordinates in (15) and using , (2) is equivalently restated by (14) and the state with dynamics


To simplify the analysis in the following, the necessary optimality conditions for will be derived for


which represents a reasonable approximation of (16) along general Fourier series curves. Note that, for lines, implies and , and (17) describes the dynamics of the robot exactly. Since the sensor footprint (1) is typically much smaller than , for evaluating the cost-to-go we assume that prior to their discovery all objects are located on (14), i.e., , and neglect the sensing range of the robot. A preliminary version of this analysis was presented in [32]. In what follows, we further assume that the high-level OCP (presented in the following section) provides an optimal parameter , such that the robot moving along (14) with plans to cover the remaining space as long as there are objects to be detected, and passes through object locations that have been discovered previously but have not been picked-up yet. We start the analysis assuming that there is only one object, i.e. , with mass located at .

V-A1 Optimal control for one object

The robot with dynamics (17) starts at , and . Clearly, the optimal control solving Problem 1 is divided into three parts, i.e. , denoting the control until detection, the control until pick-up and the control until drop-off. After the object is detected at time , when the robot moves with velocity , it can be reached at time with by employing a time-optimal bang-bang controller with a switching at time [2], i.e.,

Solving (17) with and , and applying the boundary conditions for the object’s pick-up , yields the optimal cost


Since the robot stops at , steering it back to the depot by is again given by bang-bang control [2]. Since its corresponding cost is independent of , , it can be neglected for finding .
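The bang-bang structure can be sketched for a one-dimensional double integrator. The function below is an illustration under assumptions that are not taken from the paper: the acceleration is bounded by `a_max` in magnitude, the robot moves toward the target with speed `v0` ≥ 0, and the target lies `distance` ≥ 0 ahead. It returns the minimum time to arrive at rest together with the single switching time.

```python
from math import sqrt


def time_to_rest_at(distance, v0, a_max):
    """Minimum time and switching time for |acc| <= a_max, initial speed
    v0 >= 0, to cover `distance` >= 0 and stop there."""
    braking = v0 * v0 / (2.0 * a_max)  # distance needed to stop from v0
    if distance >= braking:
        # Accelerate up to v_peak, then decelerate to rest at the target.
        v_peak = sqrt(a_max * distance + 0.5 * v0 * v0)
        t_switch = (v_peak - v0) / a_max
        return t_switch + v_peak / a_max, t_switch
    # The robot cannot stop in time: decelerate through zero velocity,
    # overshoot the target, then switch to come back and stop on it.
    overshoot = braking - distance
    v_back = sqrt(a_max * overshoot)   # peak speed of the return motion
    t_switch = (v0 + v_back) / a_max
    return t_switch + v_back / a_max, t_switch
```

In this assumed model, an object detected exactly at the robot's position (`distance = 0`) is reached at rest after (1 + sqrt(2))·v0/a_max, which shows how the speed at detection enters the cost from detection to pick-up.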

In the worst case, the object is located the furthest away from the initial point, i.e., at . Thus, the time-optimal control satisfies the condition


In the probabilistic case, the object’s location is uniformly distributed over . To compute we need to consider the time from detection to pick-up (18), yielding . To obtain a standard representation for the cost, introduce an additional state for the unknown detection time , leading to an extended system state with dynamics (17) and . Substituting the relation , the expected time for picking up the object is

where is free and the boundary constraints

must be satisfied. Thus, the probabilistic OCP has been transformed into a free final time nonlinear OCP. The corresponding control Hamiltonian is


with absolutely continuous costate dynamics


Applying Pontryagin’s Minimum Principle, there exists an optimal state , a control , and a nontrivial costate trajectory, such that ,


leading to the following theorems.

Theorem 1.

The optimal control is for almost all .

Theorem 2.

The optimal control for is


The proofs can be found in Appendix C.

V-A2 Control for multiple objects

Fig. 3: Scenario upon detecting object , denoted by , and object , denoted by , with yet unknown position. Two planned position trajectories for the robot that has previously moved from to are shown with dashed lines in (a) and (b). The corresponding generalized trajectory is shown in (c) and (d), respectively.

For multi-object setups, we propose an approach to obtain the control analogously to the single-object case. While the worst-case optimal control is built from a finite sequence of bang-bang control segments, a computationally tractable scheme for the probabilistic control is derived as follows. Recall the robot moving in the position space shown in Fig. 2 (b) and consider a scenario with two objects. As in the single-object case, the robot starts moving in the discrete state . The possible discrete event strings until the robot comes to a halt for the first time consist of detecting an object followed by its immediate pick-up, i.e. , or detecting both objects and stopping at one of the two objects’ positions, i.e. , (see Fig. 1). For both cases, we employ the control (23) for the time interval up to the detection of the first object, although for more than one object this policy may not be optimal. Now assume that object has just been detected. Fig. 3 (a) and 3 (b) show two possible curves that provide complete exploration of , allow for picking up and end at the depot. The trajectories resulting from the probabilistic control are shown in Fig. 3 (c) and 3 (d), respectively. The particular control can be derived using the corresponding boundary and continuity conditions, as shown in Appendix D. The analysis for these two cases can be easily generalized to obtain the probabilistic control for an arbitrary finite number of unexplored segments in the interval .

Moving on to the problem with more than two objects, we obtain the controls along the curve by following a similar line of argumentation – both the worst-case and the probabilistic controls consist of a finite sequence of appropriate bang-bang control segments. For that, consider the set of discrete state strings from a state , where all objects have been detected, to the final discrete state , as defined in (6). Then, let denote the projection of all onto . For a given and with , the curve (14) yields a discrete state string that is traversed in time


Since minimizing (24) over the free parameters of the proposed policy, i.e. the switching times, can be decoupled at pick-up and drop-off instants, we solve Two-Point Boundary Value Problems (TPBVP’s)