# On the Convexity of Discrete Time Covariance Steering in Stochastic Linear Systems with Wasserstein Terminal Cost

In this work, we analyze the properties of the solution to the covariance steering problem for discrete time Gaussian linear systems with a squared Wasserstein distance terminal cost. In our previous work, we have shown that by utilizing the state feedback control policy parametrization, this stochastic optimal control problem can be associated with a difference of convex functions program. Here, we revisit the same covariance control problem but this time we focus on the analysis of the problem. Specifically, we establish the existence of solutions to the optimization problem and derive the first and second order conditions for optimality. We provide analytic expressions for the gradient and the Hessian of the performance index by utilizing specialized tools from matrix calculus. Subsequently, we prove that the optimization problem always admits a global minimizer, and finally, we provide a sufficient condition for the performance index to be a strictly convex function (under the latter condition, the problem admits a unique global minimizer). In particular, we show that when the terminal state covariance is upper bounded, with respect to the Löwner partial order, by the covariance matrix of the desired terminal normal distribution, then our problem admits a unique global minimizing state feedback gain. The results of this paper set the stage for the development of specialized control design tools that exploit the structure of the solution to the covariance steering problem with a squared Wasserstein distance terminal cost.

## Authors

• 1 publication
• 6 publications
• 8 publications
• ### Shrinking the Sample Covariance Matrix using Convex Penalties on the Matrix-Log Transformation

For q-dimensional data, penalized versions of the sample covariance matr...
03/19/2019 ∙ by David E. Tyler, et al. ∙ 0

• ### Index Reduction for Second Order Singular Systems of Difference Equations

This paper is devoted to the analysis of linear second order discrete-ti...
05/12/2020 ∙ by Vu Hoang Linh, et al. ∙ 0

• ### Policy Optimization for Markovian Jump Linear Quadratic Control: Gradient-Based Methods and Global Convergence

Recently, policy optimization for control purposes has received renewed ...
11/24/2020 ∙ by Joao Paulo Jansch-Porto, et al. ∙ 0

• ### Robust Controller Design for Stochastic Nonlinear Systems via Convex Optimization

This paper presents ConVex optimization-based Stochastic steady-state Tr...
07/10/2020 ∙ by Hiroyasu Tsukamoto, et al. ∙ 0

• ### Localizability with Range-Difference Measurements

The physical position is crucial in location-aware services or protocols...
02/22/2021 ∙ by Wu Junfeng, et al. ∙ 0

• ### Assessing transfer functions in control systems

When dealing with control systems, it is useful and even necessary to as...
05/27/2018 ∙ by Nadezhda Gribkova, et al. ∙ 0

• ### Characterizing the Exact Behaviors of Temporal Difference Learning Algorithms Using Markov Jump Linear System Theory

In this paper, we provide a unified analysis of temporal difference lear...
06/16/2019 ∙ by Bin Hu, et al. ∙ 0

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## I Introduction

In this work, we study the existence and uniqueness of solutions to the covariance steering problem for discrete time Gaussian linear systems with a squared Wasserstein distance terminal cost. This instance of stochastic optimal control problem seeks for a feedback control policy that will steer the probability distribution of the state of the uncertain system, close to a goal multivariate normal distribution over a finite time horizon, where the closeness of the two distributions is measured in terms of the squared Wasserstein distance between them. In our previous work

[1], we have shown that the latter problem can be reduced into a difference of convex functions program (DCP) provided that the control policy conforms to the so-called state feedback control parametrization according to which the control input can be expressed as an affine function of the current state and all past states visited by the system. Whereas the focus in [1] was on the control design problem, in this work we focus on the analysis of the problem and in particular, addressing questions about the existence and uniqueness of solutions and the convexity (or lack thereof) of the performance index.

Literature review: Early works on covariance control problems can be attributed to Skelton and his co-authors who mainly examined infinite-horizon problems in a series of papers (refer to, for instance, [2, 3, 4]). Recently, finite-horizon covariance control problems for Gaussian linear systems have received significant attention; the reader may refer to [5, 6, 7] for the continuous-time case and [8, 9, 10, 11, 12, 13, 14] for the discrete-time case. The covariance steering problem for continuous-time Gaussian linear systems with a Wasserstein distance terminal cost was first studied in [15] whereas the same problem but for the discrete-time case was studied in [1]. Both of these references present numerical algorithms (shooting method in [15] and convex-concave procedure in [1]) for control design but do not address theoretical questions regarding the existence and uniqueness of solution, or investigate convexity properties of the performance index.

Main contributions: Next we summarize the main contributions of this paper. First, we establish the existence of at least one global minimizer to the optimization problem. Subsequently, we derive first and second order conditions of optimality, and provide analytic expressions for the gradient and the Hessian of the performance index by utilizing specialized tools from matrix calculus (these analytic expressions may also facilitate the implementation of numerical optimization algorithms, and thus improve in practice the speed of convergence). Finally, we present a sufficient condition for the performance index to be a strictly convex function under which the optimization problem admits a unique solution. In particular, we show that when the terminal state covariance is bounded from above, with respect to the Löwner partial order over the cone of positive semidefinite matrices, by the covariance matrix of the goal normal distribution, then the Hessian of the performance index becomes a strictly positive definite matrix, which in turn implies that the performance index is a strictly convex function.

Outline of the paper: In Section II, we review a few important results from matrix calculus that we use throughout the paper. In Section III, we formulate the covariance steering problem with a Wasserstein distance terminal cost, and briefly outline its reduction into a DCP. Sections IV and V present the first and second order optimality conditions for the latter optimization problem along with a sufficient condition for the convexity of the performance index. Finally, Section VI concludes the paper with a summary of remarks and directions for future research.

## Ii Preliminaries

Here we collect some notations and background material that will come in handy throughout this paper.

#### Set and inequality notations

We denote the set of nonnegative integers as , and for any positive integer , let . We use the inequalities and in the sense of Löwner partial order.

#### Kronecker product, Kronecker sum, and the vec operator

The basic properties of Kronecker product will be useful in the sequel, including

 (M1⊗M2)(M3⊗M4)=(M1M3⊗M2M4), (1)

and that matrix transpose and inverse are distributive w.r.t. the Kronecker product. The vectorization operator

and the Kronecker product are related through

 vec(M1M2M3)=(MT3⊗M1)vec(M2). (2)

Furthermore,

 trace(M⊤1M2)=vec(M1)Tvec(M2). (3)

We will also need the Kronecker sum

 M1⊕M2:=M1⊗I+I⊗M2,

where

is an identity matrix of commensurate dimension. For matrices

of appropriate size and non-singular, we have

 (L⊗L)(M⊕M)(L−1⊗L−1)=LML−1⊕LML−1 (4)

which is easy to verify using the definition of Kronecker sum and (1), and will be useful later.

#### Commutation matrix

The commutation matrix is the unique symmetric permutation matrix such that

 vec(M)=K0vec(MT),

see e.g., [16]. Being orthogonal, satisfies

 K−10=KT0=K0.

Therefore, is idempotent of order two. Two useful properties of are

 K0vec(I)=vec(I),K0(M1⊗M2)=(M2⊗M1)K0.

Notice that

being symmetric orthogonal, its eigenvalues are

. Consequently, the matrix , which is also symmetric idempotent, has eigenvalues and .

Another observation that will be useful is that commutes with “self Kronecker product or sum”, i.e., for any square matrix , we have

 (I+K0)(M⊗M) =(M⊗M)(I+K0), (5a) (I+K0)(M⊕M) =(M⊕M)(I+K0), (5b)

which follows from the property of mentioned before. We also have

 (I+K0)(M⊕M)−1=(M⊕M)−1(I+K0). (6)

To see (6), notice that equals

 =(M⊕M)−1K−10=(M⊕M)−1K0.

#### Matrix differential and Jacobian

The matrix differential and the vectorization are linear operators that commute with each other. We will frequently use the Jacobian identification rule [17, Ch. 9, Sec. 5], which for a given matrix function , is

 dvec(F(X))=DF(X)dvecX, (7)

where is the Jacobian of evaluated at . In case is independent of , the Jacobain

is a zero matrix. Some Jacobians of our interest are collected in the Appendix.

#### Matrix geometric mean

Given two symmetric positive definite matrices and

, their geometric mean (see e.g.,

[18]) is the symmetric positive definite matrix

 A#B:=A12(A−12BA−12)12A12. (8)

It satisfies intuitive properties such as , , .

#### Function composition and normal distribution

We use the symbol to denote function composition. The symbol denotes that the random vector has normal distribution with mean vector and covariance matrix .

## Iii Problem Set up

We consider a discrete-time stochastic linear system

 xk+1=Akxk+Bkuk+Gkwk,k∈N0, (9)

where , , and denote the state, control input, and disturbance vectors at time , respectively. It is assumed that the initial state is a normal vector and in particular, , where and , and in addition, the disturbance process is a sequence of independent and identically distributed random vectors for all and . We suppose that and are mutually independent for all , from which it follows that for all , where denotes the expectation operator. We assume that the matrices are full rank for all .

For , let

 x :=[xT0,xT1,…,xTN]T∈R(N+1)nx, u :=[uT0,uT1,…,uTN−1]T∈RNnu, w :=[wT0,wT1,…,wTN−1]T∈RNnw.

Then, we can write

 x=Γx0+Huu+Hww, (10)

where the block (column) vector

 Γ:=[ITnx ΦT(1,0) ΦT(2,0) … ΦT(N,0)]T, (11)

and for all with , the matrices , and (note that ).

Furthermore,

 Hu :=⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣00…0B00…0Φ(2,1)B0B1…0⋮⋮⋮⋮Φ(N,1)B0Φ(N,2)B1…BN−1⎤⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦, (12)

and is defined likewise by replacing the matrices in (12) with the matrices .

The problem of interest is to perform minimum energy feedback synthesis for (9) over a time horizon of length , such that the distribution of the terminal state goes close to desired distribution where , are given. The mismatch between the desired distribution and the distribution of the actual terminal state is penalized as a terminal cost quantified using the squared 2-Wasserstein distance between those two distributions. We refer the readers to [1, Sec. II] for the details on problem formulation.

To recover the statistics of the terminal state from the concatenated state , the following relation will be useful:

 xN=Fx,F:=[0,…,0,Inx]. (13)

It was shown in [1] that the problem of discrete time covariance steering with Wasserstein terminal cost subject to (9) (or equivalently (10)), can be reduced to a difference of convex functions program, provided the control policy is parameterized as

 uk=uff,k+k∑t=0K(k,t)(xt−¯xt) (14)

where , and the parameters of the control policy are , for all . The concatenated control input can be written as

 u:=uff+K(x−¯x) (15)

where , and

 ~K:=⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣K(0,0)0…0K(1,0)K(1,1)…0⋮⋮⋱⋮K(N−1,0)K(N−1,1)…K(N−1,N−1)⎤⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦. (16)

The controller synthesis thus amounts to computing the optimal feedforward control and feedback gain pair .

In [1], the authors proposed a bijective mapping and back, given by

 Θ:=K(I−HuK)−1, (17a) K:=(I+ΘHu)−1Θ. (17b) We have (I−HuK)−1 =I+HuK(I−HuK)−1 =(I+HuΘ). (17c)

With the new feedback gain parameterization , it was deduced in [1] that the optimal pair minimizes the objective , given by

 J(uff,Θ)=Jcost-to-go(uff,Θ)+λW22(uff,Θ) (18)

where is given, and

 Jcost-to-go(uff,Θ)=trace(Θ~SΘT)+∥uff∥22, (19)

and

 W22(uff,Θ)=∥μd−(Γμ0+Huuff)∥22 +trace(F(I+HuΘ)~S(I+HuΘ)TFT+Sd) −2trace((√SdF(I+HuΘ)~S(I+HuΘ)TFT√Sd)12), (20)

where

 ~S:=ΓS0ΓT+HwWHwT (21)

and the block diagonal matrix .

###### Proposition 1.

Consider as in (21). Then .

###### Proof.

From (21), it is clear that . Suppose if possible that is singular. Then there exists vector such that , which in turn, is possible iff and , since , .

Now let where the sub-vector for all . From , we get since the matrices are full rank per our assumption. In , substituting , yields . Thus, which contradicts our hypothesis. Therefore, the positive semidefinite matrix is nonsingular, i.e., . ∎

###### Remark 1.

An important consideration is that in order to ensure the causality of the control policy, the matrix should be constrained to be block lower triangular of the form

 Θ:=⎡⎢ ⎢ ⎢ ⎢ ⎢⎣θ0,00…00θ1,0θ1,1…00⋮⋮⋮⋮⋮θN−1,0θN−1,1…θN−1,N−10⎤⎥ ⎥ ⎥ ⎥ ⎥⎦ (22)

where for all index pairs .

The block lower triangular condition on in Remark 1 can be equivalently expressed as

 θi,j=0∀(i,j)∈N0[N−1]×N0[N]such thatj>i. (23)

We transcribe this constraint in terms of the decision variable as

 Eu,iΘETx,j=0∀(i,j)∈N0[N−1]×N0[N]such thatj>i, (24)

where and are defined as block vectors whose th and th blocks are equal to the identity matrices of suitable dimensions; all the other blocks are equal to the zero matrix. For example,

 Eu,0=[Inu,0,…,0],Ex,1=[0,Inx,0,…,0],

where denotes an identity matrix of size .

It is clear that (19) is a convex quadratic function in its arguments with Lipschitz continuous gradient. The squared Wasserstein distance (III) is a difference of convex functions in , and it can be shown that it is also Lipschitz continuous gradient. Thus, the objective in (18) is a difference of convex functions in the decision variables, and as such, it is unclear when it might in fact be convex. In [1], we used convex-concave procedure [19] to numerically compute the optimal solution. In our numerical experiments, we observed multiple local minima which motivates investigating the conditions of optimality for (18). This is what we pursue in Sections IV and V. Before doing so, we show that the objective in (18) is not convex in general but there exists a global minimizer.

###### Proposition 2.

The problem of minimizing the objective in (18) subject to the constraints (24), admits a global minimizing pair .

###### Proof.

The objective in (18) is continuous and coercive (i.e., ) in its arguments.

That is continuous in is immediate. To establish coercivity, following [1, see equation (26)], we write

 J(uff,Θ)=J1(uff)+J2(Θ)+J3(Θ)−J4(Θ), (25)

where

 J1(uff):=∥uff∥22+λ∥F(Γμ0+Huuff)−μd∥22, (26a) J2(Θ):=trace(Θ~SΘT), (26b) J3(Θ):=λtrace(F(I+HuΘ)~S(I+HuΘ)TFT +Sd), (26c) FTS12d)12⎞⎠. (26d)

Since in (26a) is strictly convex quadratic in , it is clear that as .

We note that equals due to invariance of the trace operator under cyclic permutation. Using (2) and (3), we then write

 J2(Θ)=→(Θ)T(~S⊗I)→(Θ). (27)

Since (by Proposition 1), we have . Thus, is a strictly convex quadratic function and as .

Finally, since comes from the expression of the squared Wasserstein distance which is lower bounded by zero, the function for all . Thus, , i.e., the function in (25) is coercive.

Moreover, the constraint set

is closed. Thus, minimizing the objective in (18) subject to the constraints (24), amounts to minimizing a continuous coercive function over a closed set. Hence, there exists global minimizing pair for this problem. ∎

Notice that Proposition 2 only guarantees the existence of global minimizer; it does not guarantee uniqueness. The following example shows that in general, is nonconvex, and there might be multiple local minima which makes it challenging to find the global minimizer.

###### Example 1.

(Nonconvexity of ) Consider the system matrices

 Ak=[1.00.10.01.0],Bk=[0.00.1],Gk=[1.00.00.01.0]∀k∈N0,

with time horizon . The initial and desired mean vectors are , , respectively. The initial covariance is . With this data, for two different desired distributions and with

 Sd1=[4.0−2.0−2.02.0],Sd2=[0.20.00.00.1],

we numerically computed (using convex-concave procedure, see [1, Sec. IV]) the minimizers () and (). Since the desired mean vector is the same in both cases, .

For , define an affine function and let . The function is convex iff its restriction to a line, i.e., is convex. Fig. 1 shows that the function has multiple local minima, thus the function is nonconvex.

## Iv First Order Conditions for Optimality

Recall the function in (25) and (26). We define the index set

 I:={(i,j)∈N0[N−1]×N0[N]∣j>i}.

Now consider the Lagrangian

 L(uff,Θ,N)=J(uff,Θ)+∑(i,j)∈I⟨Ψi,j,Eu,iΘETx,j⟩ (28)

where is the Lagrange multiplier matrix associated with the th linear equality constraint (24) for all , and denotes the Frobenius inner product. Let us denote the optimal pair as . The first order necessary conditions for optimality are

 ∂L∂uff∣∣∣(u⋆ff,Θ⋆)=0,∂L∂Θ∣∣∣(u⋆ff,Θ⋆)=0,Eu,iΘ⋆ETx,j=0.

We next compute the gradients of w.r.t. the vector variable and the matrix variable , respectively, and use them to determine the pair .

### Iv-a The optimal feedforward control

From (25), (26a) and (28), we obtain

 ∂L∂uff=∂J1∂uff=2uff+2λHuTFT(F(Γμ0+Huuff)−μd),

and therefore, the condition yields a linear matrix-vector equation

 (I+λHuTFTFHu)u⋆ff=λHuTFT(μd−FΓμ0). (29)

The matrix in (29) is positive definite (thus non-singular), and the optimal feedforward control is

 u⋆ff=(I+λHuTFTFHu)−1λHuTFT(μd−FΓμ0) =(I−λHuTFT(I+λFHuHuTFT)FHu)λHuTFT (μd−FΓμ0). (30)

Notice that is unique.

### Iv-B The optimal feedback gain

From (25), (26) and (28), we have

 ∂L∂Θ =∂J2∂Θ+∂J3∂Θ−∂J4∂Θ+∑(i,j)∈I∂∂Θtrace(ETx,jΨTi,jEu,iΘ) =∂J2∂Θ+∂J3∂Θ−∂J4∂Θ+∑(i,j)∈IETu,iΨi,jEx,j. (31)

Notice that

 ∂J2∂Θ=∂∂Θtrace(Θ~SΘT)=∂∂Θtrace(ΘTΘ~S)=2Θ~S, (32)

which follows from the invariance of trace under cyclic permutation, and the from the fact that the directional derivative (in the matricial direction )

 limh→01h[trace((Θ+hZ)T(Θ+hZ)~S)−trace(ΘTΘ~S)] =⟨2Θ~S,Z⟩.

Furthermore, let , , , and notice that

 ∂J3∂Θ=λ(2HuTFTF~S+Y3). (33)

From the chain rule of Jacobians, we have

 dtrace(J31∘J32(Θ)) = (vec(I))TDJ31(J32(Θ))DJ32(Θ)dvec(Θ) = (vec(Y3))Tdvec(Θ), (34)

wherein using Lemma 1 and 2 from Appendix, we get

 DJ32(Θ) =~S12⊗FHu, (35a) DJ31(Θ) =(I+K0)(Θ⊗I). (35b)

Combining (34) and (35), we obtain

 vec(Y3)=(DJ32(Θ))T(DJ31(J32(Θ)))Tvec(I) =(~S12⊗HuTFT)(~S12ΘTHuTFT⊗I)(I+K0)vec(I) =2(~SΘTHuTFT⊗HuTFT)vec(I) (36a) =2vec(HuTFTFHuΘ~S), (36b)

wherein (36a) used , and (36b) follows from (2).

From (36b), we identify , which together with (33), yields

 ∂J3∂Θ=2λHuTFTF(I+HuΘ)~S. (37)

Next, let

 J41(Θ) :=(√SdΘ√Sd)12, J42(Θ) :=ΘΘT,J43(Θ):=F(I+HuΘ)~S12, Y4 :=∂∂Θtrace(J41∘J42∘J43(Θ)),

and notice that . Therefore, writing

 dtrace(J41∘J42∘J43(Θ)) = (vec(I))TDJ41(J42(J43(Θ)))DJ42(J43(Θ)) DJ43(Θ)dvec(Θ) = (vec(Y4))Tdvec(Θ),

we obtain

 vec(Y4)=(DJ43(Θ))T(DJ42(J43(Θ)))T (DJ41(J42(J43(Θ))))Tvec(I). (38)

To proceed further, we utilize the following results:

 DJ43(Θ) =D(F~S12+FHuΘ~S12)=~S12⊗FHu, (39a) DJ42(Θ) =(I+K0)(Θ⊗I), (39b) DJ41(Θ) =⎛⎝(S12dΘS12d)12⊕(S12dΘS12d)12⎞⎠−1(S12d⊗S12d). (39c)

The result (39a) follows from Lemma 1 while (39b) follows from Lemma 2. The expression (39c) follows from [15, equation (30)].

Now let

 Ω:=F(I+HuΘ), (40)

which is a linear function of . Substituting (39) in (38), and then using (1), (5a), (6), and recalling , we obtain

 ⎛⎝(S12dΩ~SΩTS12d)12⊕(S12dΩ~SΩTS12d)1