 # Detailed Proofs of Alternating Minimization Based Trajectory Generation for Quadrotor Aggressive Flight

This technical report provides detailed theoretical analysis of the algorithm used in Alternating Minimization Based Trajectory Generation for Quadrotor Aggressive Flight. An assumption is provided to ensure that settings for the objective function are meaningful. What's more, we explore the structure of the optimization problem and analyze the global/local convergence rate of the employed algorithm.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Preliminaries

Piece-wise polynomial representation of trajectory is adopted. Any segment of the trajectory can be denoted as an -order polynomial

 p(t)=cTβ(t),t∈[0,T] (1)

where is the coefficient matrix, is the duration and

 β(t)=(1,t,t2,⋯,tN)T (2)

is a basis function. It is worth noting that

should be an odd number hereafter, which makes the mapping bijective between the coefficient matrix and its boundary condition.

Consider derivatives of up to order

 d(t)=(p(t),˙p(t),⋯,p(N−12)(t))T, (3)

we have

 d(t)=B(t)c (4)

where

 B(t)=(β(t),˙β(t),⋯,β(N−12)(t))T. (5)

We denote and by and , respectively. The boundary condition is described by tuple . The following mapping holds:

 (dTstart,dTend)T=A(T)c (6)

where

 A(T)=(BT(0),BT(T))T (7)

is the mapping matrix. Since is an odd number, it is easy to know that is a non-singular square matrix. In other words, the mapping in 6 is bijective. Therefore, any segment of a trajectory can be equivalently expressed by tuple or tuple .

Consequently, we consider an -segment trajectory parametrized by time allocation as well as boundary conditions of all segments. The trajectory is defined by

 P(t):=dTmA(Tm)−Tβ(t−m−1∑i=1Ti) (8)

where lies in the -th segment and is a boundary condition of the -th segment. Normally, some entries in are fixed while the others are to be optimized. We split into two parts, the fixed part which is viewed as constant, and the free part which is to be optimized. Then, the whole trajectory can be fully determined by

 P=Φ(DP,T). (9)

## 2 Optimization Objective

The following time regularized quadratic objective function is used:

 J(P)=∫∑Mm=1Tm0⎛⎝ρ+Dmax∑i=Dminwi∥∥P(i)(t)∥∥2⎞⎠dt (10)

where and are the lowest and the highest order of derivative to be penalized respectively, is the weight of the -order derivative and is the weight of time regularization. When , some derivatives on the boundary of each segment may not exist, hence we sum up objectives on all segments instead, which have the form

 Jm(dm,Tm):=ρTm+Tr{dTmA(Tm)−TQ(Tm)A(Tm)−1dm} (11)

for the -th segment, where is a symmetric matrix  consisting of high powers of , and is trace operation. The overall objective is formulated as

 J(DP,T)=ρ∥T∥1+∗Tr{(DFDP)TCTH(T)C(DFDP)} (12)
 H(T)=M⨁m=1A(Tm)−TQ(Tm)A(Tm)−1 (13)

where is the direct sum of its diagonal blocks, and is a permutation matrix.

In Eq. 12, and are all parameters that directly determine the structure of . It is important to know that not all settings for are legal. Instead of restricting those parameters, we make the following assumption on the objective function such that the setting is meaningful.

###### Assumption 1.

For any finite , the corresponding -sublevel set of

 L−α(J):={(DP,T) ∣∣ J(DP,T)≤α} (14)

is bounded and satisfies

 L−α(J)⊂{(DP,T) ∣∣ T∈RM+}. (15)

Intuitively, Assumption 1 forbids the objective from taking meaningful value when decision variables are extremely large or any duration is extremely small. For example, consecutive repeating waypoints with identical boundary conditions fixed in are illegal, because the optimal duration on corresponding segment becomes which violates condition (15). In other words, the segment should not exist if the objective is to be minimized. Another example is that is also illegal. Non-positive weight on total duration means that the objective can be sufficiently low when duration on each segment is large enough. In such a case, the boundness condition is violated.

## 3 Unconstrained Optimization Algorithm

To optimize Eq. 12, Algorithm. 1 is proposed. Initially, is solved for any provided . After that, the minimization of the objective function is done through a two-phase process, in which only one of and is optimized while the other is fixed.

In the first phase, the sub-problem

 D∗P(T)=argminDPJ(DP,T) (16)

is solved for each . We employ the unconstrained QP formulation by Richter et al. , which we briefly introduce here. The matrix is partitioned as

 R(T)=(RFF(T)RFP(T)RPF(T)RPP(T)). (17)

then the solution is obtained analytically through

 D∗P(T)=−RPP(T)−1RFP(T)DF. (18)

In the second phase, the sub-problem

 (19)

is solved for each . In this phase, the scale of sub-problem can be reduced into each segment. Due to our representation of trajectory, once is fixed, the boundary conditions isolate each entry in from the others. Therefore, can be optimized individually to get all entries of . As for the -th segment, its cost in (11) is indeed a rational function of . We show the structure of and omit the trivial deduction for brevity:

 Jm(T)=ρT+1Tpnpd∑i=0αiTi (20)

where and are orders of numerator and denominator respectively. The coefficient is determined by . It is clear that is smooth on . Due to the positiveness of , we have as or . Therefore, the minimizer exists for

 T∗m(DP)=argminT∈(0,+∞)Jm(T). (21)

To find all candidates, we compute the derivative of (20):

 dJm(T)dT=ρ+1T1+pnpd∑i=0(i−pn)αiTi. (22)

The minimum exists in the solution set of , which can be calculated through any modern univariate polynomial real-roots solver . The second phase is completed by updating every entry in .

## 4 Convergence Analysis

We first explore some basic properties of , which help a lot in convergence analysis of Algorithm 1. We have already shown that are rational function of each entry in . As for the part, it is indeed partially convex, which is given by the following lemma.

###### Lemma 1.

is convex in for any , provided that Assumption 1 holds.

###### Proof.

Assumption 1 implies that , holds for all and at least one is nonzero. Otherwise, the boundness on or positiveness on its time allocation is violated. Thus, for any , the objective function is always positive, which can be seen from (10). The non-negativity of implies the positive semidefiniteness of the symmetric matrix . Since is the principal submatrix of , it is also positive semidefinite. We compute the Hessian matrix of with respect to :

 ∇2DPJ(DP,T)=23⨁k=1RPP(T) (23)

which means is positive semidefinite. Therefore, is convex in . ∎

###### Lemma 2.

For any convex function , if the following inequality holds for any

 ∥∇f(X)−∇f(Y)∥F≤L∥X−Y∥F, (24)

in which is a constant and is Frobenius norm, then

 f(X)−f(Y)≥Tr{∇f(Y)T(X−Y)}+12L∥∇f(X)−∇f(Y)∥2F (25)
###### Proof.

See Theorem 2.1.5 in . ∎

###### Lemma 3.

Provided that Assumption 1 is satisfied, then the following inequality holds for any and any :

 (26)

where

 D∗P=argminDPJ(DP,T), (27)

and

is the largest singular value of

.

###### Proof.

The gradient of with respect to can be calculated as

 (28)

The difference in gradient at and is

 (29)

Assumption 1 ensures that is nonzero matrix, which means it has largest singular value for any . According to the basic property of spectral norm, we have

 ∥∥RTPP(T)(DP−D∗P)∥∥F∥DP−D∗P∥F≤max∥X∥F=1∥∥RTPP(T)X∥∥F=σP(T). (30)

Combining (29) and (30), we get

 ∥∥∇DPJ(DP,T)−∇DPJ(D∗P,T)∥∥F≤2σP(T)∥DP−D∗P∥F. (31)

According to Lemma 1 and Lemma 2, if we substitute by , together with the fact that (27) implies , the result follows. ∎

###### Theorem 1.

Consider the process in Algorithm 1 started with any . Provided that Assumption 1 is satisfied, then the inequality always holds for -th iteration:

 min0≤k≤K∥∇J(DkP,Tk)∥2F≤McJ(D0P,T0)−JcK

where and are both constant.

###### Proof.

It is clear that the objective function is non-increasing in any iteration, i.e., for any , we have

 J(DkP,Tk)≥J(Dk+1P,Tk)≥J(Dk+1P,Tk+1). (32)

Moreover, the objective function is non-negative, which means for for any . Therefore,

 limk→+∞J(DkP,Tk)=Jc. (33)

Since , the following condition holds by Lemma 3:

 ∥∥∇DPJ(DkP,Tk)∥∥2F4σPP(Tk)≤J(DkP,Tk)−J(Dk+1P,Tk)≤J(DkP,Tk)−J(Dk+1P,Tk+1). (34)

Notice that in each iteration, then

 ∥∥∇J(DkP,Tk)∥∥F=∥∥∇DPJ(DkP,Tk)∥∥F. (35)

Therefore,

 (36)

We simply let , then for all . According to Assumption 1, is bounded and satisfies condition (15). Then there exists positive constant and such that always holds for . Consequently, is also upper bounded by a positive constant . We have

 ∥∥∇J(DkP,Tk)∥∥2FMc≤∥∥∇J(DkP,Tk)∥∥2F4σP(Tk)≤J(DkP,Tk)−J(Dk+1P,Tk+1). (37)

We sum it up for all iterations:

 1McK∑k=0∥∥∇J(DkP,Tk)∥∥2F≤J(D0P,T0)−J(DK+1P,TK+1)≤J(D0P,T0)−Jc (38)

Since the right hand side is bounded, we have

 limK→+∞∇J(DKP,TK)=0. (39)

Taking the minimum of left hand side equals

 min0≤k≤K∥∇J(DkP,Tk)∥2FMc/K≤J(D0P,T0)−Jc. (40)

Rearranging gives the result. ∎

Theorem 1 shows that, under no assumption on convexity, Algorithm 1 shares the same global convergence rate as that of gradient descent with the best step-size chosen in each iteration . However, the best step-size is practically unavailable. As a contrast, Algorithm 1 does not involve any step-size choosing in each iteration. Sub-problems (Eq. 16 and Eq. 19) both can be solved exactly and efficiently due to their algebraic convenience. Therefore, Algorithm 1 is faster than gradient-based methods in practice.

Although only convergence to stationary point is guaranteed, strict saddle points are theoretically and numerically unstable  for Algorithm 1, which is indeed a first-order method. Moreover, when the stationary point is a strict local minimum, we show that the convergence rate is faster than the general case in Theorem 1.

###### Lemma 4.

Consider a positive sequence satisfying

 αk−αk+1≥γα2k, k=0,1,⋯, (41)

where is a constant. Then for all ,

 αk≤1kγ+α−10. (42)
###### Proof.

Apparently,

 1αk+1−1αk≥1αk−γα2k−1αk=γ1−γαk≥γ, (43)

hence

 1αk−1α0≥kγ. (44)

Rearranging gives the result. ∎

###### Theorem 2.

Provided that Assumption 1 is satisfied, let denote any strict local minimum of to which Algorithm 1 converges, then there exist and , such that

 J(DKP,TK)−J∗≤1γ(K−Kc)+(J(DKcP,TKc)−J∗)−1 (45)

for all , where .

###### Proof.

Define the neighborhood as

 B(r)={(DP,T) ∣∣ ∥DP−ˆDP∥2F+∥T−ˆT∥2≤r2} (46)

A strict local minimum satisfies , then there exists such that is locally convex in the domain . Moreover, there exists a positive integer such that holds for all , so we only consider hereafter. Due to the local convexity, we have

 J(DkP,Tk)−J∗≤Tr{∇DPJ(DkP,Tk)T(DkP−ˆDP)}+∇TJ(DkP,Tk)T(Tk−ˆT). (47)

By applying Cauchy-Schwartz inequality on the right hand side, we have

 (J(DkP,Tk)−J∗)2≤∥∥∇J(DkP,Tk)∥∥2F(∥∥DkP−ˆDP∥∥2F+∥∥Tk−ˆT∥∥2). (48)

Notice that the distance between and the local minimum is upper-bounded by , thus

 (J(DkP,Tk)−J∗)2R2c≤∥∥∇J(DkP,Tk)∥∥2F. (49)

According to the inequality (37) deduced in the proof of Theorem 1, we have

 J(DkP,Tk)−J(Dk+1P,Tk+1)≥∥∥∇J(DkP,Tk)∥∥2FMc, (50)

where is the upper bound of . Combining these two conditions, we get

 J(DkP,Tk)−J(Dk+1P,Tk+1)≥(J(DkP,Tk)−J∗)2McR2c. (51)

We apply Lemma 4 by defining

 αk:=J(Dk+KcP,Tk+Kc)−J∗,   γ:=1/McR2c, (52)

then the result follows. ∎

When Algorithm 1 converges to a strict local minimum, the above theorem shows that the local convergence rate is . Note that it is possible to accelerate our method to attain the optimal rate of first-order methods or use high-order methods to achieve a faster rate.

## References

•  Adam Bry, Charles Richter, Abraham Bachrach, and Nicholas Roy. Aggressive flight of fixed-wing and quadrotor aircraft in dense indoor environments. The International Journal of Robotics Research, 34:1002 – 969, 2015.
•  Jason D. Lee, Ioannis Panageas, Georgios Piliouras, Max Simchowitz, Michael I. Jordan, and Benjamin Recht. First-order methods almost always avoid strict saddle points. Mathematical Programming, 176:311–337, 2019.
•  Yurii Nesterov. Lectures on Convex Optimization. 2018.
•  Michael Sagraloff and Kurt Mehlhorn. Computing real roots of real polynomials. J. Symb. Comput., 73:46–86, 2013.