Feedback Synthesis For Underactuated Systems Using Sequential Second-Order Needle Variations

04/24/2018 ∙ by Giorgos Mamakoukas, et al. ∙ Northwestern University 0

This paper derives nonlinear feedback control synthesis for general control affine systems using second-order actions---the second-order needle variations of optimal control---as the basis for choosing each control response to the current state. A second result of the paper is that the method provably exploits the nonlinear controllability of a system by virtue of an explicit dependence of the second-order needle variation on the Lie bracket between vector fields. As a result, each control decision necessarily decreases the objective when the system is nonlinearly controllable using first-order Lie brackets. Simulation results using a differential drive cart, an underactuated kinematic vehicle in three dimensions, and an underactuated dynamic model of an underwater vehicle demonstrate that the method finds control solutions when the first-order analysis is singular. Lastly, the underactuated dynamic underwater vehicle model demonstrates convergence even in the presence of a velocity field.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 11

page 12

page 13

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

With many important applications in aerial or underwater missions, systems are underactuated either by design—in order to reduce actuator weight, expenses, or energy consumption—or as a result of technical failures. In both cases, it is important to develop control policies that can exploit the nonlinearities of the dynamics, are general enough for this broad class of systems, and easily computable. Various approaches to nonlinear control range from steering methods using sinusoid controls (Murray & Sastry, 1993), sequential actions of Lie bracket sequences (Murray et al., 1994) and backstepping (Kokotovic, 1992; Seto & Baillieul, 1994) to perturbation methods (Junkins & Thompson, 1986), sliding mode control (SMC) (Perruquetti & Barbot, 2002; Utkin, 1992; Xu & Özgüner, 2008), intelligent control (Brown & Passino, 1997; Harris et al., 1993) or hybrid control (Fierro et al., 1999) and nonlinear model predictive control (NMPC) methods (Allgöwer et al., 2004). These schemes have been successful on well-studied examples including, but not limited to, the rolling disk, the kinematic car, wheeling mobile robots, the Snakeboard, surface vessels, quadrotors, and cranes (Bullo et al., 2000; Lin et al., 2014; Escareño et al., 2013; Reyhanoglu et al., 1996; Fang et al., 2003; Toussaint et al., 2000; Bouadi et al., 2007b, a; Chen et al., 2013; Nakazono et al., 2008; Shammas & de Oliveira, 2012; Morbidi & Prattichizzo, 2007; Roy & Asada, 2007; Becker & Bretl, 2010; Kolmanovsky & McClamroch, 1995; Boskovic et al., 1999).

The aforementioned methods have limitations. In the case of perturbations, the applied controls assume a future of control decisions that do not take the disturbance history into account; backstepping is generally ineffective in the presence of control limits and NMPC methods are typically computationally expensive. SMC methods suffer from chattering, which results in high energy consumption and instability risks by virtue of exciting unmodeled high-frequency dynamics (Khalil, 2002), intelligent control methods are subject to data uncertainties (El-Nagar et al., 2014), while other methods are often case-specific and will not hold for the level of generality encountered in robotics. We address these limitations by using needle variations to compute real-time feedback laws for general nonlinear systems affine in control, discussed next.

1.1 Needle Variations Advantages to Optimal Control

In this paper, we investigate using needle variation methods to find optimal controls for nonlinear controllable systems. Needle variations consider the sensitivity of the cost function to infinitesimal application of controls and synthesize actions that reduce the objective (Aseev & Veliov, 2014; Shaikh & Caines, 2007). Such control synthesis methods have the advantage of efficiency in terms of computational effort, making them appropriate for online feedback—similar to other model predictive control methods, such as iLQG (Todorov & Li, 2005), but with the advantage, as shown here, of having provable formal properties over the entire state space. For time evolving objectives, as in the case of trajectory tracking tasks, controls calculated from other methods (such as sinusoids or Lie brackets for nonholonomic integrators) may be rendered ineffective as the target continuously moves to different states. In such cases, needle variation controls have the advantage of computing actions that directly reduce the cost, without depending on future control decisions. However, needle variation methods, to the best of our knowledge, have not yet considered higher than first-order sensitivities of the cost function.

We demonstrate analytically in Section III that, by considering second-order needle variations, we obtain variations that explicitly depend on the Lie brackets between vector fields and, as a consequence, the higher-order nonlinearities in the system. Later, in Section III, we show that, for classically studied systems, such as the differential drive cart, this amounts to being able to guarantee that the control approach is globally certain to provide descent at every state, despite the conditions of Brockett’s theorem (Brockett, 1983) on nonexistence of smooth feedback laws for such systems. We extend this result by proving that second-order needle variations controls necessarily decrease the objective for the entire class of systems that are controllable with first-order Lie brackets. As a consequence, provided that the objective is convex with respect to the state (in the unconstrained sense), second-order needle variation controls provably guarantee that the agent reaches the target in a collision-free manner in the presence of obstacles without relying on predefined trajectories.

1.2 Paper Contribution and Structure

This paper derives the second-order sensitivity of the cost function with respect to infinitesimal duration of inserted control, which we will refer to interchangeably as the second-order mode insertion gradient or mode insertion Hessian (MIH). We relate the MIH expression to controllability analysis by revealing its underlying Lie bracket structure and present a method of using second-order needle variation actions to expand the set of states for which individual actions that guarantee descent of an objective function can be computed. Finally, we compute an analytical solution of controls that uses the first two orders of needle variations.

This paper expands the work presented in Mamakoukas et al. (2017) by including the derivations of the MIH, the proofs that guarantee descent, and extensive simulation results that include comparisons to alternative feedback algorithms. Further, we extend the results to account for obstacles and prove the algorithm finds collision-free solutions for the controllable systems considered, including simulations of obstacle-avoidance for static and moving obstacles.

The content is structured as follows. In Section II, we provide relevant research background in the field of motion planning for controllable systems. In Section III, we prove that second-order needle variations guarantee control solutions for systems that are nonlinearly controllable using first-order Lie brackets. We use this result to provably generate collision-free trajectories that safely reach the target among obstacles provided convex objectives in the unconstrained sense. In Section IV, we present an analytical control synthesis method that uses second-order needle actions. In Section V, we implement the proposed synthesis method and present simulation results on a controllable, underactuated model of a 2D differential drive vehicle, a 3D controllable, underactuated kinematic rigid body and a 3D underactuated dynamic model of an underwater vehicle.

2 Existing methods for controllable systems

In this section, we present a review of some popular methods that are available for underactuated, controllable systems, followed by a discussion of techniques for collision-avoidance. An introduction to these methods, as well as additional algorithms for controllable systems, can be found in La Valle (2011).

2.1 Optimization Algorithms for Nonholonomic Controllable Systems

Nonholonomic systems are underactuated agents subject to nonintegrable differential constraints. Examples include wheeled agents that are not allowed to skid (e.g., unicycle, differential drive, tricycle). Nonholonomic systems are of interest to the control community because one needs to obtain solutions for motion planning tasks (Kolmanovsky & McClamroch, 1995).

The concept of controllability is indispensable in the study of nonholonomic systems. Controllability analytically answers the existence of control solutions that move a certain agent between arbitrary states in finite time, and, in doing so, it reveals all possible effects of combined control inputs of underactuated systems that are subject to velocity, but not displacement, constraints.

A popular approach in controlling nonholonomic systems is piecewise constant motion planning (Sussmann, 1991; Lafferriere & Sussmann, 1993). Lafferriere and Sussmann (Lafferriere & Sussmann, 1991, 1993) extend the original dynamics with fictitious action variables in the direction of the nested Lie brackets to determine a control for the extended system. They first compute the time the system must flow along each vector field, in a sequential manner, to accomplish a given motion of the extended system. Then, using the Campbell-Baker-Hausdorff-Dynkin (CBHD) formula (Strichartz, 1987; Rossmann, 2002; La Valle, 2011), they recover the solution in terms of the original inputs of the system.

On the other hand, piecewise constant motion planning is model-specific, since the process changes for different number of inputs. In addition, solutions involve a sequence of individual actions that generate the Lie bracket motion and the actuation sequence grows increasingly larger for higher order brackets. Compensating for the third-order error in the CBHD formula involves two second-order Lie brackets and twenty successive individual inputs, each of infinitesimal duration (McMickell & Goodwine, 2007). The sequence is described in detail by Lafferriere & Sussmann (1993). In practice, such actuation becomes challenging as the number of switches grows. The theoretically infinitesimal duration of each input may be hard to reproduce in hardware, while, in the face of uncertainty and time-evolving trajectories, actuation consisting of a large sequence of controls (e.g., of twenty actions) is likely to change once feedback is received.

Another popular approach is steering using sinusoids (Brockett, 1982; Murray & Sastry, 1993; Murray et al., 1994; Sastry, 2013; Teel et al., 1995; Laumond et al., 1998). This method applies sinusoidal control inputs of integrally related frequencies. States are sequentially brought into the desired configuration in stages, while the rest of the states remain invariant over a single cycle. This approach has been validated in generating motion of an underactuated robot fish (Morgansen et al., 2001).

Steering using sinusoids suffers from the complicated sequence of actions that grows as a function of the inputs involved. Moreover, besides also being model-specific, the method addresses each state separately, meaning each state gets controlled by its own periodic motion, requiring periods for an -dimensional system, leading to slow convergence. Further, solutions focus on the final states (at the end of each cycle) and not their time evolution, hence they may temporarily increase the running cost (consider the car example of Fig. 7 in Murray & Sastry (1993)). As with the method of piecewise constant motion planning, when tracking a moving target, these factors also compromise the performance of this approach.

Other trajectory generation techniques for controllable systems involve differential flatness (Lamiraux & Laumond, 2000; Rathinam & Murray, 1998; Ross & Fahroo, 2004; Fliess et al., 1995; Rouchon et al., 1993) and kinematic reduction (Bullo & Lynch, 2001; Lynch et al., 2000; Murphey & Burdick, 2006). Control based on differential flatness uses outputs and their derivatives to determine control laws. However, as discussed in Choudhury & Lynch (2004), there is not an automatic procedure to discover whether flat outputs exist. Further, differential flatness does not apply to all controllable systems and motion planning is further complicated when control limits or obstacles are present (Bullo & Lynch, 2001).

2.2 Motion Planning for Controllable Systems in the Presence of Obstacles

Controllability in its classical sense concerns itself with the existence of an action trajectory that can move the agent to a desired state, subject to the differential constraints posed by the dynamics, in the absence of obstacles. Controllability is an inherent property of the dynamics and reveals all allowable motion, disregarding the presence of physical constraints in the environment. This is true for the methods discussed in Section 2.1.

Feasible path planning amidst obstacles is often treated separately from the optimal control problem. Most commonly, feasible trajectories are generated with efficient path planners, such as rapidly-exploring random tree (RRT) and probabilistic road map (PRM) methods (LaValle & Kuffner Jr, 2001; Hsu et al., 2002). The distinction between path planning and optimal control can be seen in work by Choudhury & Lynch (2004); Lynch et al. (2000) that uses such motion planners to generate trajectories among obstacles and then uses them as a reference to compute the optimal control. In this setting, nonholonomic motion consists of two stages, the path planning and the feedback synthesis that tracks the feasible trajectory.

Another solution to obstacle avoidance in motion planning is the use of barrier certificates (Prajna et al., 2007). Barrier certificates provably enforce collision-free behavior by minimally perturbing, in a least-squares sense, the control response in order to satisfy safety constraints. Feedback synthesis proceeds without accounting for obstacles and solutions are modified, only when necessary, via a quadratic program (QP) subject to constraints that ensure collision avoidance (Borrmann et al., 2015; Xu et al., 2015; Wang et al., 2017; Ames et al., 2014; Wu & Sreenath, 2016).

Additional solutions to obstacle avoidance include compensating functions that eliminate local minima in the objective caused by the obstacles (Deng et al., 2008), as well as designing navigation functions using inverse Lyapunov expressions (Tanner et al., 2001). The former method computes the local minima in the objective and constructs a plane surface function to remove them and make the objective convex. This process can be cumbersome, as one would have to locate all local minima in the objective induced by the obstacles and then calculate the compensating function. On the other hand, navigation functions, described in Rimon & Koditschek (1992); Tanner et al. (2001), are globally convergent potential functions and are system-specific.

Several of these collision-avoidance algorithms are not system-specific and could be implemented with our controller, later outlined in Section IV. In simulation results, presented in Section V, we show collision-avoidance using only penalty functions in the objective, demonstrating that the proposed controller succeeds in tasks (collision avoidance) that traditionally require sophisticated treatment.

3 Needle Variation Controls based on Non-Linear Controllability

In this section, we relate the controllability of systems to first- and second-order needle variation actions. After presenting the MIH expression, we relate the MIH to the Lie bracket terms between vector fields. Using this connection, we tie the descent property of needle variation actions to the controllability of a system and prove that second-order needle variation controls can produce control solutions for a wider set of the configuration state space than first-order needle variation methods. As a result, we are able to constructively compute, via an analytic solution, control formulas that are guaranteed to provide descent, provided that the system is controllable with first-order Lie brackets. Generalization to higher-order Lie brackets appears to have the same structure, but that analysis is postponed to future work.

3.1 Second-Order Mode Insertion Gradient

Needle variation methods in optimal control have served as the basic tool in proving the Pontryagin’s Maximum Principle (Pontryagin et al., 1962; Dmitruk & Osmolovskii, 2014; Garavello & Piccoli, 2005). Using piecewise dynamics, they introduce infinitesimal perturbations in control that change the default trajectory and objective (see Fig. 1). Such dynamics are typically used in optimal control of hybrid systems to optimize the schedule of a-priori known modes (Egerstedt et al., 2006; Caldwell & Murphey, 2016).

Here, instead, we consider dynamics of a single switch to obtain a new control mode at every time step that will optimally perturb the trajectory(Ansari & Murphey, 2016). The feedback algorithm presented in Ansari & Murphey (2016), however, only considers the first-order sensitivity of the cost function to a needle action and, as a result, often fails to provide solutions for controllable underactuated systems. By augmenting the algorithm with higher order information (via the MIH), we are able to provide solutions in cases when the first-order needle variation algorithm in Ansari & Murphey (2016) is singular.

Consider a system with state and control with control-affine dynamics of the form

(1)

where is the drift vector field. Consider a time period and control modes described by

(2)

where and are the dynamics associated with default and inserted control and , respectively. Parameters and are the duration of the inserted dynamics and the switching time between the two modes.

Note that the default control is the input for the nominal trajectory— could itself be the result of a different controller—which is then improved by the insertion of a new control vector creating a switched mode . In addition, while the default control of the switched mode sequence in (2) may be time-dependent, the dynamics have control that has a fixed value over .

Figure 1: A fixed-value perturbation in the nominal control, introduced at time and with duration , and the associated variation in the state. In the limit , the control perturbation becomes a needle variation.

Given a cost function of the form

(3)

where is the running cost and the terminal cost, the mode insertion gradient (MIG), derived in Egerstedt et al. (2006), is

(4)

where is the first-order adjoint state, which is calculated from the default trajectory and given by

(5)

We use the subscript to indicate that a certain variable is considered after evaluating the limit . For brevity, the dependencies of variables are dropped. While the objective for needle variation controls has typically not included a control term, doing so is straightforward and yields similar performance. Work in Ansari & Murphey (2016) has considered objectives with control terms, and one can recompute the mode insertion gradient and mode insertion Hessian assuming the objective depends on without impacting any of the rest of the approach.

The derivation of the mode insertion Hessian is similar to Caldwell & Murphey (2011) and is presented in the Appendix. For dynamics that do not depend on the control duration, the mode insertion Hessian (MIH)11endnote: 1In this work, we consider the second-order sensitivity with respect to an action centered at one single application time . It is also possible to consider the second-order sensitivity with respect to two application times and in the same iteration. Assuming that the entire control curve is a descent direction over the time horizon for second-order needle variation solutions, as we have proved is the case for first-order needle variation methods in recently submitted work (Mamakoukas et al., 2018), multiple second-order needle actions at different application times would still decrease the objective. On the other hand, searching for two application times would slow down the algorithm and was not preferred in this work. is given by

(6)

where is the second-order adjoint state, which is calculated from the default trajectory and is given by

(7)

The superscript in the dynamics refers to the element of the vector.

3.2 Dependence of Second Order Needle Variations on Lie Bracket Structure

The Lie bracket of two vectors , and is

which generates a control vector that points in the direction of the net infinitesimal change in state created by infinitesimal noncommutative flow , where is the flow along a vector field for time (Murray et al., 1994; Jakubczyk, 2001). Lie brackets are most commonly used for their connection to controllability (Rashevsky, 1938; Chow, 1940/1941), but here they will show up in the expression describing the second-order needle variation.

We relate second-order needle variation actions to Lie brackets in order to connect the existence of descent-providing controls to the nonlinear controllability of a system. Let denote the column control vectors that make up in (1) and be the individual control inputs. Then, we can express dynamics as

and, for default control , we can re-write the MIH as

Splitting the sum expression into diagonal () and off-diagonal () elements, and by adding and subtracting , we can write

Then, we can express the MIH as

The expression contains Lie bracket terms of the control vectors that appear in the system dynamics, indicating that second-order needle variations incorporate higher-order nonlinearities. By associating the MIH to Lie brackets, we next prove that second-order needle variation actions can guarantee decrease of the objective for systems that are controllable with first-order Lie brackets.

3.3 Existence of Control Solutions with First- and Second-Order Mode Insertion Gradients

In this section, we prove that the first two orders of the mode insertion gradient can be used to guarantee controls that reduce objectives of the form (3) for systems that are controllable with first-order Lie brackets. The analysis is applicable to optimization problems that satisfy the following assumptions.

Assumption 1.

The vector elements of dynamics and are real, bounded, in , and in and .

Assumption 2.

The incremental cost is real, bounded, and in . The terminal cost is real and twice differentiable with respect to .

Assumption 3.

Default and inserted controls and are real, bounded, and in .

Under Assumptions 1-3, the MIG and MIH expressions exist and are unique. Then, as we show next, there are control actions that can improve any objective as long as there exists for which .

Definition 1.

A trajectory described by a pair is the global minimizer of the objective function for which .

Given Definition 1, a trajectory described by a pair is the global minimizer of the cost function in the unconstrained sense (not subject to the dynamics of the system) and satisfies throughout the time horizon considered.

Assumption 4.

The pair describes the only trajectory for which the unconstrained derivative of the objective is equal to zero (i.e., ).

Assumption 4 is necessary to prove that the first-order adjoint is non-zero, which is a requirement for the controllability results shown in this work. It assumes that the objective function in the unconstrained sense does not have a maximizer or saddle point and has only one minimizer described by that indicates the target trajectory or location. It is an assumption that, among other choices, can be easily satisfied with a quadratic cost function that even includes penalty functions associated with physical obstacles.

Proposition 1.

Consider a pair that describes the state and default control of (2). If , then the first-order adjoint is a non-zero vector.

Proof.

Using (3), and by Assumption 4,

Therefore, if , then such that . ∎

Proposition 2.

Consider dynamics given by (2) and a trajectory described by state and control . Then, there are always control solutions such that for some

Proof.

Using dynamics of the form in (1), the expression of the mode insertion gradient can be written as

Given controls and that generate a positive mode insertion gradient, there always exist control such that the mode insertion gradient is negative, i.e. . The mode insertion gradient is zero for all if the costate vector is orthogonal to each control vector or if the costate vector is zero everywhere.22endnote: 2If the control vectors span the state space , the costate vector cannot be orthogonal to each of them. Therefore, for first-order controllable (fully actuated) systems, there always exist controls for which the cost can be reduced to first order.

Proposition 3.

Consider dynamics given by (2) and a pair of state and control for which and . Then, the first-order adjoint is orthogonal (under the Euclidean inner product) to all control vectors .

Proof.

We rewrite (4) as

where and . The linear combination of the elements of is zero for any , which means must be the zero vector. By Proposition 1, for a trajectory described by a pair of state and control and, as a result, . ∎

Proposition 4.

Consider dynamics given by (2) and a pair of state and control for which and . Further assume that the control vectors and the Lie Bracket terms and —where —span the state space . Then, there exist and such that either or .

Proof.

Let be a set of vectors that span the state space (). Then, any vector in can be written as a linear combination of the vectors in . The first-order adjoint is an -dimensional vector, which is non-zero for a trajectory described by a pair of state and control by Proposition 1. Therefore, it can be expressed as

(8)

where . Left-multiplying (8) by yields

Given that , and by Proposition 3, is orthogonal to all control vectors (which also implies that the control vectors do not span ), the above equation simplifies to

Because , there exists and a Lie bracket term , or that is not orthogonal to the costate . That is,

First-order needle variation methods are singular when the mode insertion gradient is zero. When that is true, the next result—that is the main piece required for the main theoretical result of this section in Theorem 1—demonstrates that the second-order mode insertion gradient is guaranteed to be negative for systems that are controllable with first-order Lie Brackets, which in turn implies that a control solution can be found with second-order needle variation methods.

Proposition 5.

Consider dynamics given by (2) and a trajectory described by state and control for which for all and . If the control vectors and the Lie brackets and span the state space (), then there always exist control solutions such that .

Proof.

See Appendix. ∎

Theorem 1.

Consider dynamics given by (2) and a trajectory described by state and control . If the control vectors and the Lie brackets and span the state space , then there always exists a control vector and a duration such that the cost function (3) can be reduced.

Proof.

The local change of the cost function (3) due to inserted control of duration can be approximated with a Taylor series expansion

By Propositions 2 and 5, either 1) or 2) and . Therefore, there always exist controls that reduce the cost function (3) to first or second order. ∎

4 Control Synthesis

In this section, we present an analytical solution of first- and second-order needle variation controls that reduce the cost function (3) to second order. We then describe the algorithmic steps of the feedback scheme used in the simulation results in Section V.

4.1 Analytical Solution for Second Order Actions

For underactuated systems, there are states at which is orthogonal to the control vectors (see Proposition 3). At these states, control calculations based only on first-order sensitivities fail, while controls based on second-order information still have the potential to decrease the objective provided that the control vectors and their Lie brackets span the state space (see Theorem 1). We use this property to compute an analytical synthesis method that expands the set of states for which individual actions that guarantee descent of an objective function can be computed.

Consider the Taylor series expansion of the cost around control duration . Given the expressions of the first- and second-order mode insertion gradients, we can write the cost function (3) as a Taylor series expansion around the infinitesimal duration of inserted control :

(9)

The first- and second-order mode insertion gradients used in the expression are functions of the inserted control in (2). Equation (9) is quadratic in and, for a fixed , has a unique solution which is used to update the control actions. Controls that minimize the Taylor expansion of the cost will have the form

(10)

where the MIH has both linear and quadratic terms in . We compute the minimizer of (10) to be

(11)

where and are respectively the first- and second-order derivatives of with respect to the control (see Appendix). These quantities are given by

The parameter , a positive definite matrix, denotes a metric on control effort.

The existence of control solutions in (11) depends on the inversion of the Hessian . To practically ensure H is positive definite, we implement a spectral decomposition on the Hessian , where matrices and

contain the eigenvectors and eigenvalues of

, respectively. We replace all elements of the diagonal matrix that are smaller than with to obtain and replace with in (11). We prefer the spectral decomposition approach to the Levenberg-Marquardt method (), because the latter affects all eigenvalues of the Hessian and further distorts the second-order information. At saddle points, we set the control equal to the eigenvector of that corresponds to the most negative eigenvalue in order to descend along the direction of most negative curvature (Murray, 2010; Schnabel & Eskow, 1990; Boyd & Vandenberghe, 2004; Nocedal & Wright, 2006).

Synthesis based on (11) provides controls at time that guarantee to reduce the cost function (3) for systems that are controllable using first-order Lie brackets. Control solutions are computed by forward simulating the state over a time horizon and backward simulating the first- and second-order costates and . As we see next, this leads to a very natural, and easily implementable, algorithm for applying cost-based feedback while avoiding iterative trajectory optimization.

4.2 Algorithmic Description of Control Synthesis Method

  1. Simulate states and costates with default dynamics over a time horizon

  2. Compute optimal needle variation controls

  3. Saturate controls

  4. Use a line search to find control duration that ensures reduction of the cost function (3)33endnote: 3The application time of the inserted action is typically chosen to correspond to the most negative first-order mode insertion gradient.

Algorithm 1
Figure 2: The steps of the controller outlined by Algorithm 1. Using the default control, the states and co-states are forward-simulated for the time horizon . The optimal control response is computed from (11), and saturated appropriately. At the end, the algorithm determines the finite duration of the inserted single action, evaluated at an application time , with a line search.

The second-order controls in (11) are implemented in a series of steps shown in Algorithm 1 and visualized in Fig. 2. We compare first- and second-order needle variation actions by implementing different controls in Step 2 of Algorithm 1. For the first-order case, we implement controls that are the solution to a minimization problem of the first-order sensitivity of the cost function (3) and the control effort

(12)

where and expresses the desired value of the mode insertion gradient term (see, for example, Mamakoukas et al. (2016)). Typically, , where is the cost function (3) computed using default dynamics . For second-order needle variation actions, we compute controls using (11). As Fig. 2 indicates, the applied actuation is the saturated value of the control response of either (11) or (4.2), evaluated at the application time .

While we do not show it in this paper, it is shown in Ansari & Murphey (2016) that the first-order needle variation control solutions (4.2) remain a descent direction after saturation. This result is extended in Mamakoukas et al. (2018) to show that the entire control signal over the time horizon, and not a needle action, remains a descent direction when saturated by an arbitrary amount. While we have not yet formally proved a similar property for the second-order needle variation controls (11), one can test and identify if the saturated controls would decrease the cost function before applying any actuation. In addition, the results of this work rely on the sign and not the magnitude of the control solutions, suggesting that the saturated second-order solutions in (11) also provide a descent direction.

Further, needle variation actuation as shown in Fig. 2 may be practically infeasible or at least problematic for motors due to the abrupt changes in the control. There are two remedies to this issue. First, introducing additional filter states associated with the control can constraint the changes in actuation (Fan & Murphey, 2016). Second, one can show that the entire curve of the first-order needle variation solution is a descent direction (Mamakoukas et al., 2018). Assuming the same is true for the second-order solutions as well, one could either apply part of the continuous control solution around the time of application or filter the discontinuous actuation in hardware and still provide descent with more motor-friendly actuation.

4.3 Convergence in the presence of obstacles

We use Theorem 1 to show that the proposed needle-variation controller will always converge to the global minimizer for convex functions. This statement is true independent of the presence of obstacles.

Theorem 2.

Consider dynamics given by (2), a trajectory described by state and control . Let describe the trajectory generated after iterations of Algorithm 1. Further let , where denotes the collision-free part of the state-space. Consider an objective (3) that satisfies Assumption 2 and whose running cost term penalizes collisions, such that if where . If the objective is convex with respect to the state in the unconstrained sense and the control vectors and the Lie brackets and span the state space , then the sequence of solutions generated by Algorithm 1 converges to . Further, .

Proof.

Algorithm 1 constructs control responses out of the first- and second-order mode insertion gradients. By extension of Theorem 1, it can guarantee to reduce the objective with each iteration (for some control of duration ). Therefore,

(13)

where is the (only) minimizer of the convex objective. It follows that,

(14)

Further, assume that there exists and such that . Then , which contradicts (13). Using proof by contradiction, we conclude that

(15)

Assuming that a collision-free path exists between the agent and the target, it is straightforward that the minimizer trajectory is the target’s location. Therefore, from (14) and (15), Algorithm 1 generates a sequence that converges to the target safely.44endnote: 4Although control responses are constructed from a second-order Taylor series approximation of the objective, the iterated sequence is guaranteed to decrease the real cost function at each iteration. If the real cost function is convex with respect to the state (in the unconstrained sense), the iterated sequence will converge towards the only minimizer. Using a sufficient descent condition, the algorithm is guaranteed to converge to a point where the unconstrained derivative of the objective is zero (i.e., ), which, from Assumption 4, only happens at the global minimizer.

With regards to Theorem 2, we should alert the reader that Algorithm 1 may not guarantee collision-avoidance if the default trajectory is not collision-free, that is if there exists such that . Further, the result of Theorem 2 can generalize to non-convex functions that have only one minimum.

4.4 Comparison to Alternative Optimization Approaches

Algorithm 1 differs from controllers that compute control sequences over the entire time horizon in order to locally minimize the cost function. Rather, the proposed scheme utilizes the time-evolving sensitivity of the objective to an infinitesimal switch from to and searches a one-dimensional space for a finite duration of a single action that will optimally improve the cost. It does so using a closed-form expression and, as a result, avoids the expensive iterative computational search in high-dimensional spaces, while it may still get closer to the optimizer with one iterate.

Specifically, in terms of computational effort, Algorithm 1 computes two (state (2) and first-order adjoint (5) variables) and one (second-order adjoint (7)) differential equations and searches. All simulations presented in this paper are able to run in real time, including the final 13-dimensional system. However, real-time execution is not guaranteed for higher dimensional systems. Nevertheless, the presented algorithm runs faster than the iLQG method for the simulations considered here.

(a)
(b)
(c)
(d)
(e)
(f)
(g)
(h)
Figure 3: Differential drive using first-, second-order needle variation actions, iLQG, and DDP, from left to right. Snapshots of the system are shown at , and  sec. The target state is  mm mm.
(a)
(b)
Figure 4: Fig. 3(a) plots the running state cost; Fig. 3(b) plots the integrated (cumulative) cost, including the control cost. DDP and iLQG obtain the same cumulative cost, with slightly different trajectories (see Fig. 3). Second-order needle variation actions demonstrate improved convergence to the target over DDP and iLQG, despite optimizing over one single action at each iteration.

Further, compared to traditional optimization algorithms such as iLQG, needle variation solutions exist globally, demonstrate a larger region of attraction and have a less complicated representation on Lie Groups (Fan & Murphey, 2016). These traits naturally transfer to second-order needle controls (11) that also contain the first-order information present in (4.2). In addition, as this paper demonstrates, the suggested second-order needle variation controller has formal guarantees of descent for systems that are controllable with first-order Lie brackets, which—to the best of our knowledge—is not provided by any alternative method.

Given these benefits, the authors propose second-order needle variation actions as a complement to existing approaches for time-sensitive robotic applications that may be subject to large initial error, Euler angle singularities, or fast-evolving (and uncertain) objectives. Next, we implement Algorithm 1 using first or second-order needle variation controls (shown in (4.2) and (11), respectively) to compare them in terms of convergence success on various underactuated systems.

5 Simulation Results

The proposed synthesis method based on (11) is implemented on three underactuated examples—the differential drive cart, a 3D kinematic rigid body, and a dynamic model of an underwater vehicle. The kinematic systems of a 2D differential drive and a 3D rigid body are controllable using first-order Lie brackets of the vector fields and help demonstrate Theorem 1. The underactuated dynamic model of a 3D rigid body serves to compare controls in (11) and (4.2), as well as make comparisons to other control techniques, in a more sophisticated environment. In all simulation results, we start with default control and an objective function of the form

where is the desired state-trajectory, and , are metrics on state error.

5.1 2D Kinematic Differential Drive

We use the differential drive system to demonstrate that first-order controls shown in (4.2) that are based only on the first-order sensitivity of the cost function (3) can be insufficient for controllable systems, contrary to controls shown in (11) that guarantee decrease of the objective for systems that are controllable using first-order Lie brackets (see Theorem 1).

The system states are its coordinates and orientation, given by , with kinematic () dynamics

where  cm,  cm denote the wheel radius and the distance between them, and , are the right and left wheel control angular velocities, respectively (these parameter values match the specifications of the iRobot Roomba). The control vectors , and their Lie bracket term span the state space (). Therefore, from Theorem 1, there always exist controls that reduce the cost to first or second order.

Fig. 3 and 4 demonstrate how first-, second-order needle variations, iLQG, and DDP (Todorov & Li, 2005; Tassa et al., 2014) perform on reaching a nearby target. We implement the iLQG and DDP algorithms to generate offline trajectory optimization solutions using the publicly available software.55endnote: 5Available at http://www.mathworks.com/matlabcentral/
fileexchange/52069-ilqg-ddp-trajectory-optimization.
Actions based on first-order needle variations (4.2) do not generate solutions that turn the vehicle, but rather drive it straight until the orthogonal displacement between the system and the target location is minimized. Actions based on second-order needle variations (11), on the other hand, converge successfully. The solutions differ from the trajectories computed by iLQG and DDP, despite using the same simulation parameters.

We note the fact that, besides the computational benefits, single-action approaches appear to be rich in information and perform comparably to offline schemes that attempt to minimize the objective by computing different control responses over the entire horizon. Given that the solutions of iLQG and DDP are very similar, and the fact that DDP is slower than iLQG due to expanding the dynamics to second order, we use only the iLQG algorithm as a means of comparison for the rest of the simulations presented in this work. The results in Fig. 3 based on second-order needle variations are generated in real time in Matlab and approximately forty times faster than the iLQG implementation.

Figure 5: Convergence success rates of first- (4.2) and second-order (11) needle variation controls for the kinematic differential drive model. Simulation runs: 1000.

Fig. 5 shows a Monte Carlo simulation that compares convergence success using first- and second-order needle variations controls shown in (4.2) and (11), respectively, and iLQG. We sampled over initial coordinates

 mm using a uniform distribution and keeping only samples for which the initial distance from the origin exceeded

; for all samples. Successful samples are defined by being within from the origin with an angle within 60 seconds using feedback sampling rate of 4 Hz. Results are generated using , ,  s, for (4.2), for (11), , and saturation limits on the angular velocities of each wheel 150/36 mm/s for each control approach.66endnote: 6The metric on control effort is necessarily smaller for (11), due to parameter . The parameter is chosen carefully to ensure that control solutions from (11) and (4.2) are comparable in magnitude. As shown in Fig. 5, the system always converges to the target using second-order needle variation actions, matching the theory.

(a)
(b)
Figure 6: Differential drive using second-order needle variation actions in the presence of obstacles. Fig. 5(b) shows the deviation from the nominal trajectory that is the solution to the no-obstacle task. The system performs two maneuvers to avoid each obstacle. These are evident in the angle deviation (compare to Fig. 2(c)).

5.1.1 Convergence with Obstacles

(a)
(b)
(c)
(d)
Figure 7: Trajectories of the differential drive in the presence of obstacles. Fig. 6(d) compares the solution to the trajectories generated when considering only a) obstacle 1, and b) obstacles 1 and 2, both of which collide with the obstacles. Simulations run in real time in Matlab.

Next, we illustrate the performance of the algorithm in the presence of obstacles. In all simulations, obstacles are considered in the objective in the form of a penalty function. In Fig. 6, we test the system in the same task as Fig. 3 in the presence of two obstacles, indicated with red spheres. In comparison with Fig. 2(f), it is worth noting the two angle peaks, corresponding to each obstacle. After passing the obstacles, the system recovers the same angle profile.

Fig. 7 shows more complicated maneuvers using controls from (11). The controller, without relying on a motion planner, is able to generate collision-free trajectories and safely converge to the target in all cases. These simulations also demonstrate another aspect of Algorithm 1. The differential drive always drives up to an obstacle and then narrowly maneuvers around, instead of preparing a turn earlier on. This behavior is to be expected of needle variation actions that instantly reduce the cost.

We next use the more complicated scenario of Fig. 6(d) to evaluate the second-order expansion of the objective, shown in (9), across the state-space (see Fig. 8). The first- and second-order mode insertion gradients are computed based on the second-order needle variation controls from (11). States are sampled in the space for and in increments of 5 mm, with everywhere. These results correspond to . The horizontal discontinuity that appears around  mm is believed to be due to the effect of the penalty functions. As Fig. 8 indicates, the change in cost is always negative, verifying Theorem 1, even in the presence of obstacles.

(a)
(b)
Figure 8: Cost reduction , modeled after (9), for sampled and in the presence of obstacles, given second-order needle variation controls. The first- and second-order mode insertion gradients are evaluated with the controls from (11). Figures 7(a) and 7(b) are identical, but shown over a different range to illustrate that even when looking at small variations of the first-order mode insertion gradient, the second-order method is reliably negative. The bright vertical line in Fig. 7(b) is vertically aligned with the target located at [ mm,  mm], where first-order solutions are singular. No data are sampled inside the white circles, as these indicate the infeasible occupied region.

We further use a Monte Carlo simulation of 500 trials on the initial conditions to test convergence success (Fig. 9). We sample initial conditions from a uniform distribution in [-200 mm, 1000 mm] [-400 mm, 800 mm], where in all cases. All trials converged within 25 seconds.

(a)
(b)
Figure 9: Performance of second-order needle variation actions in the presence of static obstacles. The controller is able to converge to the target for all 500 trials and avoid collisions. Fig. 8(a)

is an interpolated heat map that indicates the time to convergence as a function of initial position; Fig.

8(b) shows the trajectories followed by the center of mass of the agent. The gray area indicates the collision space, taking into account the width of the differential drive. (the simulation runs in real time in Matlab). For visualization, watch Extension 1.

Last, we test the differential drive in the presence of moving obstacles (Fig. 10). The controller is again able to avoid collision and converge to the target, without relying on additional motion planning techniques. The feedback rate used is 20 Hz and the trajectory of the obstacles is known to the agent throughout the time horizon. In these simulations,  s.

Figure 10: Performance of second-order needle variation actions in the presence of three moving obstacles. The left figure shows a snapshot of the simulation; the right figure plots the distance of the agent from each object and the target, where the gray area indicates the threshold minimum distance to avoid collision with the obstacles. The controller converges to the target in a collision-free manner (the simulation runs in real time in Matlab). For visualization, watch Extension 2.

5.2 3D Kinematic Rigid Body

The underactuated kinematic rigid body is a three dimensional example of a system that is controllable with first-order Lie brackets. To avoid singularities in the state space, the orientation of the system is expressed in quaternions (Titterton & Weston, 2004; Kuipers, 1999). The states are , where are the world-frame coordinates and are unit quaternions. Dynamics are given by

(16)
(17)

where and are the body frame linear and angular velocities, respectively (da Cunha, 2015). The rotation matrix for quaternions is

The system is kinematic: and , where and describe respectively the surge, sway, and heave input forces, and the roll, pitch, and yaw input torques. We render the rigid body underactuated by removing the sway and yaw control authorities ().

The four control vectors span a four-dimensional space. First-order Lie bracket terms add two more dimensions to span the state space () (the fact that there are seven states in the model of the system is an artifact of the quaternion representation; it does not affect controllability).

The vectors , and span associated with the world frame coordinates , and . Similarly, vectors , and span associated with the orientation. Thereby, control vectors and their first-order Lie brackets span the state space and, from Theorem 1, optimal actions shown in (11) will always reduce the cost function (3).

To verify this prediction, we present the convergence success of the system on 3D motion. Using Monte Carlo sampling with uniform distribution, initial locations are randomly generated such that  cm keeping only samples for which the initial distance from the origin exceeded 6 cm. We regard as a convergence success each trial in which the rigid body is within 6 cm to the origin by the end of 60 seconds at any orientation. Results are generated at a sampling rate of 20 Hz using