# Proximal algorithms for large-scale statistical modeling and optimal sensor/actuator selection

Several problems in modeling and control of stochastically-driven dynamical systems can be cast as regularized semi-definite programs. We examine two such representative problems and show that they can be formulated in a similar manner. The first, in statistical modeling, seeks to reconcile observed statistics by suitably and minimally perturbing prior dynamics. The second seeks to optimally select sensors and actuators for control purposes. To address modeling and control of large-scale systems, we develop a unified algorithmic framework using proximal methods. Our customized algorithms exploit problem structure and allow handling statistical modeling, as well as sensor and actuator selection, for substantially larger scales than what is amenable to current general-purpose solvers.


## I Introduction

Convex optimization has had tremendous impact on many disciplines, including system identification and control design [1, 2, 3, 4, 5, 6, 7]. The forefront of research points to broadening the range of applications as well as sharpening the effectiveness of algorithms in terms of speed and scalability. The present paper focuses on two representative control problems, statistical control-oriented modeling and sensor/actuator selection, that are cast as convex programs. A range of modern applications require addressing these over increasingly large parameter spaces, placing them outside the reach of standard solvers. A contribution of the paper is to formulate such problems as regularized semi-definite programs (SDPs) and to develop customized optimization algorithms that scale favorably with size.

Modeling is often seen as an inverse problem where a search in parameter space aims to find a parsimonious representation of data. For example, in the control-oriented modeling of fluid flows, it is of interest to improve upon dynamical equations arising from first-principle physics (e.g., linearized Navier-Stokes equations), in order to accurately replicate observed statistical features that are estimated from data. To this end, a perturbation of the prior model can be seen as a feedback gain that results in dynamical coupling between a suitable subset of parameters [8, 9]. On the flip side, active control of large-scale and distributed systems requires judicious placement of sensors and actuators which again can be viewed as the selection of a suitable feedback or Kalman gain. In either modeling or control, the selection of such gain matrices must be guided by optimality criteria as well as simplicity (low rank or sparse architecture). We cast both types of problems as optimization problems that utilize suitable convex surrogates to handle complexity. The use of such surrogates is necessitated by the fact that searching over all possible architectures is combinatorially prohibitive.

Applications that motivate our study require scalable algorithms that can handle large-scale problems. While the optimization problems that we formulate are SDP representable, e.g., for actuator selection, worst-case complexity of generic solvers scales as the sixth power of the sum of the state dimension and the number of actuators. Thus, solvers that do not exploit the problem structure cannot cope with the demands of such large-scale applications. This necessitates the development of customized algorithms that are pursued herein.

Our presentation is organized as follows. In Section II, we describe the modeling and control problems that we consider, provide an overview of literature and the state-of-the-art, and highlight the technical contribution of the paper. In Section III, we formulate the minimum energy covariance completion (control-oriented modeling) and sensor/actuator selection (control) problems as nonsmooth SDPs. In Section IV, we present a customized Method of Multipliers (MM) algorithm for covariance completion. An essential ingredient of MM is the proximal gradient method which we also use for sensor/actuator selection. In Section V, we offer two motivating examples for actuator selection and covariance completion and discuss computational experiments. We conclude with a brief summary of the results and future directions in Section VI.

## II Motivating applications and contribution

We consider dynamical systems with additive stochastic disturbances. In the first instance, we are concerned with a modeling problem where the statistics are not consistent with a prior model that is available to us. In that case, we seek to modify our model in a parsimonious manner (a sparse and structured perturbation of the state matrix) so as to account for the partially observed statistics. In the second, we are concerned with the control of such stochastic dynamics via a collection of judiciously placed sensors and actuators. Once again, the architecture of the (now) control problem calls for the selection of sparse matrix gains that effect control and estimation. These problems are explained next.

### II-A Statistical modeling and covariance completion

It is well-established that the linearized Navier-Stokes equations driven by stochastic excitation can account for qualitative [10, 11, 12, 13] and quantitative [9] features of shear fluid flows. The goal has been to provide insights into the underlying physics as well as to guide control design. A significant advance was achieved recently by recognizing the value of nontrivial stochastic excitation; in [9], it was shown that colored-in-time noise can account for features of the flow field that white noise (used in earlier literature) cannot [14]. Furthermore, it has been pointed out that the effect of colored-in-time excitation is equivalent to a structural perturbation of the system dynamics subject to white-in-time excitation [8, 9]. Such structural perturbations may reveal important dynamical couplings between states and, thereby, enhance understanding of basic physics [9, Section 6.1].

Based on this last point, an optimal state-feedback synthesis problem was formulated in [15] to identify such dynamical couplings that bring consistency with the observed statistics. The principle of parsimony dictates a penalty on the complexity of such model perturbations. To this end, a convex optimization problem can be formulated that is reminiscent of optimal sensor and actuator selection [16, 17]. This type of problem involves a composite cost function of the form

 f(K,X)+γg(K), (1a)

subject to stability of the system in Fig. 1. Here, X denotes a state covariance matrix and K is a state-feedback matrix. The state covariance may be partially known, in which case the constraint

 Xij=Gij for (i,j)∈I (1b)

is added, where the entries of G represent partially available second-order statistics, i.e., known entries of X for indices in I. The function f(K,X) penalizes variance and control energy while g(K) is a sparsity-promoting regularizer which penalizes the number of nonzero rows in K; sparsity in the rows of K amounts to a low-rank perturbation of the system dynamics.

The resulting minimum-control-energy covariance completion problem can be cast as an SDP which, for small-size problems, is readily solvable using standard software. We note that similar problems have been considered from complementary viewpoints, aiming at stochastic control [18, 19, 20, 21] and output covariance estimation [22, 23, 24].

### II-B Sensor and actuator selection

The selection of sensors and actuators affects the performance of any closed-loop control system. For large-scale systems with complex dynamical interactions, sensor/actuator selection and placement is a nontrivial task, potentially having combinatorial complexity. To this end, previous work has mostly focused on employing heuristics. The two most popular trends involve greedy algorithms and convex relaxations.

The benefit of a particular selection of sensors/actuators is commonly quantified by properties of the resulting controllability/observability gramians. This correspondence between a choice of sensors/actuators and performance metric has been approached using concepts of submodularity/supermodularity in an attempt to provide guarantees of near-optimality [25, 26]. Unfortunately, metrics on the quality of Kalman filtering estimation – a key component of most designs – appear to lack supermodularity [27]. To make things worse, even selection of open-loop actuators based on average control energy is not supermodular. Such problems may not be effectively addressed by greedy selection methods [28]. In spite of reported initial successes [29], this realization may eventually hamper the use of greedy approaches for sensor/actuator selection.

In addition to greedy algorithms, convex or non-convex methods have also been proposed. In [30], a convex formulation is provided for sensor selection in problems with linear measurements. In [31], sensor selection for correlated measurement noise was approached by maximizing the trace of the Fisher information matrix subject to energy constraints. The placement of power measurement units in power networks was formulated as a variation of optimal experiment design in [32]. Actuator selection via genetic algorithms was explored in [33]. A non-convex formulation of the joint sensor and actuator placement problem was provided in [34, 35] and it was recently applied to the linearized Ginzburg-Landau equation [36]. Herein, we consider the problem of the optimal selection of a subset of available sensors or actuators which, similar to the modeling problem described earlier, involves minimization of a nonsmooth composite function of the form (1a).

The sparsity-promoting framework introduced in [37, 38, 39] can be effectively used to obtain block-sparse structured feedback/observer gains as well as select actuators/sensors. Indeed, algorithms developed in [39] have been used for sensor selection in target tracking [40] and periodic sensor scheduling in networks of dynamical systems [41]. However, these algorithms have been developed for general problems and do not exploit a certain hidden convexity in sensor/actuator selection. Indeed, for the design of row-sparse feedback gains, the authors of [16] introduced a convex SDP reformulation of the problem formulated in [39]. Inspired by [39], the authors of [42] also provided an SDP formulation of sensor/actuator placement problems for discrete-time LTI systems. Their approach utilizes standard solvers for SDPs with re-weighted norm regularizers. In this paper, we borrow group-sparsity regularizers from statistics [43] and develop efficient customized algorithms for the resulting SDPs.

### II-C Main contribution

We build on the structural similarity between statistical modeling and sensor/actuator selection to develop a unified algorithmic framework for handling large-scale applications. Due to the non-differentiability of the sparsity-promoting term in the associated cost, standard descent methods are not applicable. We address this issue by utilizing proximal algorithms. For sensor/actuator selection, the alternating direction method of multipliers (ADMM) has been previously used to solve similar problems [17]. In contrast to interior point methods for solving the sensor selection SDP, which scale with the sum of problem dimension and number of available sensors, the ADMM algorithm of [17] only scales with the problem dimension. This is desired when a large pool of sensors/actuators is considered. However, the proposed ADMM algorithm involves subproblems that are difficult to solve and is thus not well-suited for large-scale systems. Herein, we further exploit the problem structure, implicitly handle the stability constraint on state covariances and controller gains by expressing one in terms of the other, and utilize a customized proximal gradient algorithm that scales with the third power of the state-space dimension. Issues like step-size selection, stopping conditions, and initialization are also considered. Finally, numerical experiments are presented to demonstrate the effectiveness of our approach relative to existing methods.

## III Problem formulation

Consider a linear time-invariant (LTI) system with state-space representation

 ˙x=Ax+B1d+B2u
 y=Cx (2)

where x is the state vector, y is the output, u is the control input, d is a stationary zero-mean stochastic process with covariance Ω, B1 and B2 are input matrices, C is the output matrix, and the pair (A,B2) is controllable.

For the applications that we consider, the matrix A in (2) is known and the input matrices B1 and B2 are predetermined full column-rank matrices. The goal is to design an optimal feedback control law, u = −Kx, such that the closed-loop system shown in Fig. 1,

 ˙x=(A−B2K)x+B1d (3)

can account for a set of partially available state correlations and/or is stabilized in an optimal manner. More specifically, it is desired to limit the control energy and the steady-state variance of LTI system (2),

 limt→∞E(x∗(t)Qx(t)+u∗(t)Ru(t)), (4)

where E is the expectation operator, Q ⪰ 0 is the state weight, and R ≻ 0 specifies the penalty on the control input. On the other hand, parsimony dictates a penalty on the size and directionality of state-feedback couplings. For this, optimal control laws that use a limited number of input channels can be obtained by minimizing composite cost functions of the form

 f(K,X)+γg(K), (5)

where

 f(K,X):=trace(QX+K∗RKX),

and g(K) is a regularizing term that promotes row-sparsity of the feedback gain matrix K, subject to the structural constraint

 (A−B2K)X+X(A−B2K)∗+B1ΩB∗1 = 0.

This constraint represents the steady-state Lyapunov equation for the feedback loop and it relates the feedback gain K to the steady-state covariance matrix X of the state vector

 X:=limt→∞E(x(t)x∗(t)).
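To make this constraint concrete, the following sketch (toy random matrices of our own choosing, not data from the paper) computes the steady-state covariance X for a given stabilizing gain K by solving the closed-loop Lyapunov equation with SciPy:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(0)
n, m = 5, 2
A = rng.standard_normal((n, n))
A -= (np.max(np.linalg.eigvals(A).real) + 1.0) * np.eye(n)  # shift A to be Hurwitz
B1 = np.eye(n)
B2 = rng.standard_normal((n, m))
K = 0.01 * rng.standard_normal((m, n))   # small gain keeps A - B2 K stable
Omega = np.eye(n)                        # white-noise covariance

Acl = A - B2 @ K
# steady-state Lyapunov equation: Acl X + X Acl^* + B1 Omega B1^* = 0
X = solve_continuous_lyapunov(Acl, -B1 @ Omega @ B1.T)
residual = Acl @ X + X @ Acl.T + B1 @ Omega @ B1.T
```

Since `solve_continuous_lyapunov(Acl, Q)` solves Acl X + X Acl∗ = Q, the residual of the steady-state equation vanishes up to numerical precision and X is positive definite whenever the closed loop is stable.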

### III-A Minimum-input-energy covariance completion

In many applications, due to experimental or numerical limitations, only partial correlations between a limited number of state components are available. Moreover, it is often the case that the origin and directionality of disturbances that generate the statistics are unknown. It is thus desired to design an optimal state-feedback control law, u = −Kx, such that the closed-loop system (3) accounts for the partially available statistics.

For system (2) the steady-state covariance matrix X of the state vector satisfies the following rank condition [44]:

 rank[ AX+XA∗  B2 ; B∗2  0 ] = rank[ 0  B2 ; B∗2  0 ].

This implies that any positive-definite matrix X is admissible as a covariance of LTI system (2) if the input matrix B2 is full row rank [44], which thereby eliminates the role of the dynamics inherent in A. It is thus important to limit the rank of the input matrix and to restrict the number of degrees of freedom that are directly influenced by the state feedback in (3). One way to accomplish this is to minimize the number of input channels or columns of the input matrix B2 that perturb the dynamical generator in (3); see [8, 9] for details. This can be equivalently accomplished by minimizing the number of non-zero rows in the matrix K; if the ith row of K is identically equal to zero, the ith column of B2 is not used. In addition, the feedback gain is chosen to minimize both the control energy and fluctuation energy in statistical steady-state, and at the same time, account for partially known second-order statistics. Following [15], we propose the minimum-control-energy covariance completion problem in its general form as

 minimizeK,X  trace(QX+K∗RKX)+γ∑iwi∥e∗iK∥2
 subject to  (A−B2K)X+X(A−B2K)∗+B1ΩB∗1=0
             (CXC∗)∘E=G∘E
             X≻0 (6)

where we have employed tools from the paradigm of group-sparsity [43] to augment the performance index with a term that promotes row-sparsity of the feedback gain matrix K. Here, γ > 0 specifies the importance of sparsity, wi are nonzero weights, ei is the ith unit vector, Ω is the covariance of the white noise process d, and matrices A, B1, B2, C, Q, R, and G are problem data. On the other hand, the hermitian matrix X and the feedback gain matrix K are optimization variables. Entries of matrix G represent partially available second-order statistics of the output y, the symbol ∘ denotes elementwise matrix multiplication, and E is the structural identity matrix,

 Eij = { 1, if Gij is available; 0, if Gij is unavailable.

By identifying a subset of critical input channels in B2, problem (6) allows us to uncover the precise dynamical feedback interactions that are required to reconcile the available covariance data with the given linear dynamics.

### III-B Actuator selection

As is well known, the unique optimal control law that minimizes the steady-state variance (4) of system (2) is a static state feedback u = −Kx. The optimal gain K and the corresponding state covariance X can be obtained by solving

 minimizeK,X  trace(QX+K∗RKX)
 subject to  (A−B2K)X+X(A−B2K)∗+V=0
             X≻0. (7)

Here, V := B1ΩB∗1; matrices A, B2, Q, R, and V are problem data, while K and X are optimization variables. The solution can also be obtained by solving an algebraic Riccati equation arising from the KKT conditions of problem (7). We note that, in general, K has no particular structure (i.e., it has no zero entries) and therefore all "input channels" (entries of u) are active and non-zero.
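For small problems, this Riccati-based solution can be sketched as follows (toy data of our own choosing; SciPy's `solve_continuous_are` is one standard way to solve the algebraic Riccati equation):

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

rng = np.random.default_rng(1)
n, m = 4, 2
A = rng.standard_normal((n, n))
B2 = rng.standard_normal((n, m))
Q, R, V = np.eye(n), np.eye(m), np.eye(n)   # toy problem data (V = B1 Omega B1^*)

# Riccati-based solution of the unregularized problem (7)
P = solve_continuous_are(A, B2, Q, R)       # A^* P + P A - P B2 R^{-1} B2^* P + Q = 0
K = np.linalg.solve(R, B2.T @ P)            # optimal state-feedback gain
Acl = A - B2 @ K                            # closed-loop generator (Hurwitz)
X = solve_continuous_lyapunov(Acl, -V)      # closed-loop state covariance
J = np.trace(Q @ X + K.T @ R @ K @ X)       # objective value of (7)
```

As the text notes, the gain computed this way is generically dense; the regularized problem below is what introduces row-sparsity.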

Similar to the covariance completion problem (6), a control law that uses only a subset of available actuators can be sought by promoting row-sparsity of K via a suitable regularizer, as in the following problem

 minimizeK,X  trace(QX+K∗RKX)+γ∑iwi∥e∗iK∥2
 subject to  (A−B2K)X+X(A−B2K)∗+V=0
             X≻0. (8)
###### Remark 1

Optimization problem (8) for actuator selection is a special case of optimization problem (6) when the set of known second-order statistics is empty, i.e., the matrices E and G in (6) are zero. In Section IV, we show that problem (8) arises as a subproblem in the customized method of multipliers algorithm that we propose for solving problem (6).

### III-C Sensor selection

The sensor selection problem can be viewed as the dual of the actuator selection problem and can thus be approached in a similar manner. Consider LTI system (2) where the output is corrupted by additive white measurement noise η,

 y=Cx+η.

If (As, C) is an observable pair, the observer

 ˙^x=As^x+L(y−^y)=As^x+LC(x−^x)+Lη

provides an estimate x̂ of the state x. Here, y represents measurement data, L the observer gain, and ŷ = Cx̂ the predicted output. When As − LC is Hurwitz, the estimation error x − x̂ is a zero-mean process. The Kalman gain minimizes the variance amplification from process and measurement noise to the estimation error. Similar to the linear quadratic regulator, the Kalman gain matrix is obtained by solving a Riccati equation and, in general, has no particular structure and uses all available measurements.

Designing a Kalman filter which uses a subset of the available sensors is equivalent to designing a column-sparse Kalman gain matrix L. Similar to the actuator selection problem, we promote column-sparsity via regularization. This amounts to the optimization problem

 (9)

where γ, wi, and ei are as described in problem (8), Vs is the covariance of the process disturbance, and Ω is the covariance of the white stochastic measurement noise η. Here, As, C, Vs, and Ω are problem data and L and X are the optimization variables.

By setting the problem data in (8) to

 A=A∗s,B2=C∗,Q=Vs,V=I,R=Ω,

the solution to sensor selection problem (9) can be obtained from the solution to actuator selection problem (8) as L = K∗ (with the same X). Because of this duality, in the remainder of the paper we focus on solving problem (9) as a special case of (8).
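The data mapping above can be checked numerically. In the following sketch (random toy matrices of our own choosing), the Kalman gain computed from the filter Riccati equation coincides with the transpose of the LQR gain obtained from the dual actuator-selection data:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

rng = np.random.default_rng(2)
n, p = 4, 2
As = rng.standard_normal((n, n))
C = rng.standard_normal((p, n))
Vs = np.eye(n)       # process noise covariance
Omega = np.eye(p)    # measurement noise covariance

# Kalman gain from the filter Riccati equation:
#   As P + P As^* - P C^* Omega^{-1} C P + Vs = 0,  L = P C^* Omega^{-1}
P = solve_continuous_are(As.T, C.T, Vs, Omega)
L = P @ C.T @ np.linalg.inv(Omega)

def lqr_gain(A, B, Q, R):
    """Optimal LQR gain K = R^{-1} B^* P for the dual actuator problem."""
    return np.linalg.solve(R, B.T @ solve_continuous_are(A, B, Q, R))

# dual data: A = As^*, B2 = C^*, Q = Vs, R = Omega; then L = K^*
K = lqr_gain(As.T, C.T, Vs, Omega)
```

Both computations reduce to the same Riccati equation, which is precisely the duality exploited in the text.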

### III-D Change of variables and SDP representation

The hermitian matrix X is positive definite and therefore invertible. Thus, the standard change of variables Y := KX and the equivalence between the row-sparsity of K and the row-sparsity of Y [16] can be utilized to bring problems (6) and (8) to the following SDP-representable form,

 minimizeX,Y  trace(QX+Y∗RYX−1)+γn∑i=1wi∥e∗iY∥2
 subject to  AX+XA∗−B2Y−Y∗B∗2+V=0
             (1−δ)[(CXC∗)∘E−G∘E]=0
             X≻0 (10)

where

• δ = 0 for covariance completion; and

• δ = 1 for actuator selection.

From the solution of these problems, the optimal feedback gain matrix can be recovered as K = YX−1. The convexity of problem (10) follows from the convexity of its objective function and the linearity of its constraint set [4].

Problem (10) can be solved efficiently using general-purpose solvers for small-size problems. To address larger problems, we next exploit the problem structure to develop optimization algorithms based on the proximal gradient algorithm and the method of multipliers.

## IV Customized algorithms

In this section, we describe the steps through which we solve optimization problem (10), identify the essential input channels in B2, and subsequently refine the solutions based on the identified sparsity structure. For notational compactness, we write the linear constraints in (10) as

 A1(X)−B(Y)+V = 0
 (1−δ)[A2(X)−G] = 0

where the linear operators A1, A2, and B are given by

 A1(X) := AX+XA∗
 A2(X) := (CXC∗)∘E
 B(Y) := B2Y+Y∗B∗2.
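A minimal sketch of these operators (toy dimensions and random data of our own choosing; `B_op` stands in for the operator B):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, p = 5, 2, 3
A = rng.standard_normal((n, n))
B2 = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
E = (rng.random((p, p)) < 0.5).astype(float)
E = np.triu(E) + np.triu(E, 1).T          # symmetric 0/1 mask of known entries

def A1(X):
    """A1(X) = A X + X A^*"""
    return A @ X + X @ A.conj().T

def A2(X):
    """A2(X) = (C X C^*) o E  (elementwise/Hadamard product)"""
    return (C @ X @ C.conj().T) * E

def B_op(Y):
    """B(Y) = B2 Y + Y^* B2^*"""
    return B2 @ Y + Y.conj().T @ B2.conj().T
```

All three maps are linear and produce hermitian outputs for hermitian inputs, which is what makes the constraint set of (10) linear.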

### IV-A Elimination of variable X

For any Y, there is a unique X that solves the equation

 A1(X)−B(Y)+V=0

if and only if the matrices A and −A∗ do not have any common eigenvalues [45]. When this condition holds, we can express the variable X as a linear function of Y,

 X(Y)=A−11(B(Y)−V), (11)

and restate optimization problem (10) as

 minimizeY  f(Y)+γg(Y)
 subject to  (1−δ)[A2(X(Y))−G]=0
             X(Y)≻0. (12)

The smooth part of the objective function in (12) is given by

 f(Y):=trace(QX(Y)+Y∗RYX−1(Y))

and the regularizing term is

 g(Y):=n∑i=1wi∥e∗iY∥2.

Since problem (12) is equivalent to (10) constrained to the linear subspace defined by (11), it remains convex.

###### Remark 2

When the matrix A is Hurwitz, expression (11) can be cast in terms of the well-known integral representation,

 X(Y) = ∫0∞ eAt (V−B2Y−Y∗B∗2) eA∗t dt.

Even for unstable open-loop systems, the operator A1 is invertible if the matrices A and −A∗ do not have any common eigenvalues. In our customized algorithms, we numerically evaluate the action of A−11 on the current iterate by solving the corresponding Lyapunov equation.
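A sketch of this evaluation (toy random data; `X_of_Y` is our illustrative name): the action of A−11 is computed with a single Lyapunov/Sylvester solve rather than by forming the inverse operator explicitly.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(4)
n, m = 5, 2
A = rng.standard_normal((n, n))    # need not be Hurwitz; only no common eigs of A and -A^*
B2 = rng.standard_normal((n, m))
V = np.eye(n)

def X_of_Y(Y):
    """Evaluate X(Y) = A1^{-1}(B(Y) - V) by solving A X + X A^* = B2 Y + Y^* B2^* - V."""
    return solve_continuous_lyapunov(A, B2 @ Y + Y.T @ B2.T - V)

Y = rng.standard_normal((m, n))
X = X_of_Y(Y)
```

The solve costs O(n³), which is the key to the per-iteration complexity discussed in Section IV-D.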

###### Remark 3

Even in cases where the matrix X cannot be expressed via (11), since (A,B2) is a controllable pair we can center the design variable around a stabilizing controller K0, i.e., by letting K = K0 + K1, where K0 is held fixed and K1 is the design variable. Based on this, the change of variables introduced in Section III-D yields Y1 := K1X and X(Y1) = ^A−11(B(Y1)−V) with

 ^A1(X):=(A−B2K0)X+X(A−B2K0)∗. (13)

We note that the resulting optimization problem involves a nonsmooth term which is not separable in the design variable Y1,

 minimizeY1  ^f(Y1)+γg(Y1+K0X(Y1))
 subject to  (1−δ)[A2(X(Y1))−G]=0
             X(Y1)≻0

where

 ^f(Y1) := trace(QX(Y1)) + trace((Y1+K0X(Y1))∗R(Y1+K0X(Y1))X−1(Y1)).

Although convex, the composite term g(Y1+K0X(Y1)) does not have an easily computable proximal operator, making it difficult to apply algorithms that are based on proximal methods.

In such cases one may assume that a given subset of input channels will always be chosen and that the columns of B2 corresponding to these channels form a stabilizable pair with A. It would thus be desired to select additional input channels from the complement of this subset via the following optimization problem

 minimizeY1  ^f(Y1)+γ^g(Y1)
 subject to  (1−δ)[A2(X(Y1))−G]=0
             X(Y1)≻0

where the operator ^A1 in (13) is now defined by the fixed matrices A, B2, and K0, with K0 being a feedback gain matrix whose row-sparsity structure corresponds to the preselected channels. The regularization term ^g is used to impose row-sparsity on the remaining input channels and has an easily computable proximal operator, thus facilitating the use of proximal methods. It is noteworthy that this approach may also be employed to obtain an operator ^A1 which is better conditioned than A1.

The alternative approach would be to avoid this problem altogether by not expressing X as a function of Y and directly dualizing the Lyapunov constraint on X and Y via augmented-Lagrangian-based methods, e.g., ADMM [17]. However, as we show in Section V, such approaches do not lead to algorithms that are computationally efficient for all problems.

### IV-B Proximal gradient method for actuator selection

The proximal gradient (PG) method generalizes gradient descent to composite minimization problems in which the objective function is the sum of a differentiable and non-differentiable component [46]. It is most effective when the proximal operator [47] associated with the nondifferentiable component is easy to evaluate; many common regularization functions, such as the ℓ1 penalty, nuclear norm, and hinge loss, satisfy this condition.

The PG method for solving (12) with δ = 1 is given by

 Yk+1 := proxβkg(Yk−αk∇f(Yk)), (14)

where k is the iteration counter, Yk is the kth iterate, αk is the step-size, βk := γαk, and proxβg is the proximal operator associated with the function g,

 proxβg(V) := argminY  g(Y) + (1/(2β))∥Y−V∥2F. (15)

Here, ∥·∥F is the Frobenius norm and, for the row-sparsity regularizer g, the proximal operator is determined by the soft-thresholding operator which acts on the rows of the matrix V,

 e∗i proxβg(V) = max{ 1 − βwi/∥e∗iV∥2, 0 } e∗iV.

Proximal update (14) results from a local quadratic approximation of f at iteration k, i.e.,

 Yk+1 := argminY  f(Yk) + ⟨∇f(Yk),Y−Yk⟩ + (1/(2αk))∥Y−Yk∥2F + γg(Y), (16)

followed by a completion of squares that brings the problem into the form of (15). Here, ⟨·,·⟩ denotes the standard matricial inner product and the expression for the gradient of f is provided in Appendix -A. We note that PG is a special case of proximal Newton-type methods [48, 49], in which the Hessian of the smooth function is approximated by the scaled identity (1/αk)I.
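For the row-sparsity regularizer used here, the proximal operator reduces to row-wise soft-thresholding; a minimal sketch (`prox_group` is our illustrative name, implementing the standard group-shrinkage formula):

```python
import numpy as np

def prox_group(V, beta, w):
    """Row-wise soft-thresholding: proximal operator of g(Y) = sum_i w_i ||e_i^* Y||_2.
       Rows with norm below beta*w_i are set to zero; others are shrunk toward zero."""
    Y = np.zeros_like(V)
    for i, row in enumerate(V):
        nrm = np.linalg.norm(row)
        if nrm > beta * w[i]:
            Y[i] = (1.0 - beta * w[i] / nrm) * row
    return Y

rng = np.random.default_rng(5)
V = rng.standard_normal((6, 4))
Y = prox_group(V, beta=1.0, w=np.ones(6))
```

Each evaluation costs only one pass over the rows, which is what makes PG attractive for this regularizer.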

#### IV-B1 Choice of step-size in (14)

At each iteration of the PG method, we determine the step-size αk via an adaptive Barzilai-Borwein (BB) initial step-size selection [50], i.e.,

 αk = { αm, if αm/αs > 1/2; αs − αm/2, otherwise (17)

followed by backtracking to ensure sufficient descent of the objective function and positive definiteness of X(Yk+1); similar strategies were used in [46, Theorem 3.1]. Here, the so-called "steepest descent" step-size αs and the "minimum residual" step-size αm are given by [51],

 αs = ⟨Yk−Yk−1, Yk−Yk−1⟩ / ⟨Yk−Yk−1, ∇f(Yk)−∇f(Yk−1)⟩
 αm = ⟨Yk−Yk−1, ∇f(Yk)−∇f(Yk−1)⟩ / ⟨∇f(Yk)−∇f(Yk−1), ∇f(Yk)−∇f(Yk−1)⟩.

If αs < 0 or αm < 0, the step-size from the previous iteration is used; see [50, Section 4.1] for additional details.
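The safeguarded BB selection (17) can be sketched as follows (`bb_step` is our illustrative helper; the fallback for negative curvature follows the rule above):

```python
import numpy as np

def bb_step(dY, dG, alpha_prev):
    """Adaptive BB step-size (17): alpha_s = <dY,dY>/<dY,dG>, alpha_m = <dY,dG>/<dG,dG>;
       the previous step-size is reused when either estimate would be negative."""
    yy = np.sum(dY * dY)
    yg = np.sum(dY * dG)   # dY = Y^k - Y^{k-1}, dG = grad f(Y^k) - grad f(Y^{k-1})
    gg = np.sum(dG * dG)
    if yg <= 0.0 or gg == 0.0:
        return alpha_prev  # alpha_s or alpha_m would be negative/undefined
    a_s, a_m = yy / yg, yg / gg
    return a_m if a_m / a_s > 0.5 else a_s - a_m / 2.0
```

For a quadratic with curvature c, both estimates equal 1/c and the rule returns the exact inverse-curvature step.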

#### IV-B2 Stopping criterion

We employ a combined stopping condition that terminates the algorithm when either the relative residual

 rk+1r = ∥rk+1∥ / ( max{ ∥∇f(Yk+1)∥, ∥(^Yk+1−Yk+1)/αk∥ } + ϵr ),

or the normalized residual

 rk+1n = ∥rk+1∥ / ( ∥r1∥ + ϵn ),

are smaller than a desired tolerance. Here, ϵr and ϵn are small positive constants, and the residual rk+1 is defined as

 rk+1 := ∇f(Yk+1) + (^Yk+1−Yk+1)/αk,

and ^Yk+1 := Yk − αk∇f(Yk). While achieving a small rk+1r guarantees a certain degree of accuracy, its denominator nearly vanishes when both ∇f(Yk+1) and (^Yk+1−Yk+1)/αk become small. In such cases, rk+1n provides an appropriate stopping criterion; see [50, Section 4.6] for additional details.

#### IV-B3 Convergence of PG

The proximal gradient algorithm converges with rate O(1/k) for a fixed step-size αk = 1/L [46], where L is the Lipschitz constant of ∇f. Adaptive step-size selection methods [46, 50] can be used to further enhance performance.

### IV-C Method of multipliers for covariance completion

We handle the additional constraint in the covariance completion problem by employing the Method of Multipliers (MM). MM is the dual ascent algorithm applied to the augmented Lagrangian and it is widely used for solving constrained nonlinear programming problems [52, 53, 54].

The MM algorithm for constrained optimization problem (12) with δ = 0 is given by

 Yk+1 := argminY Lρk(Y;Λk) (18a)
 Λk+1 := Λk + ρk(A2(X(Yk+1))−G), (18b)

where Lρ is the associated augmented Lagrangian,

 Lρ(Y;Λ) = f(Y) + γg(Y) + ⟨Λ, A2(X(Y))−G⟩ + (ρ/2)∥A2(X(Y))−G∥2F,

Λ is the Lagrange multiplier and ρ is a positive scalar. The algorithm terminates when the primal and dual residuals are small enough. The primal residual is given as

 Δp = ∥A2(X(Yk+1))−G∥F, (19a)

and the dual residual corresponds to the stopping criterion on subproblem (18a),

 Δd = min{rr, rn}, (19b)

where the relative and normalized residuals, rr and rn, are described in Section IV-B.
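The outer MM loop (18a)-(18b) can be illustrated on a toy equality-constrained problem (entirely our own construction: a smooth quadratic objective with a scalar linear constraint, and plain gradient descent standing in for the PG inner solver):

```python
import numpy as np

# Toy instance of MM (18): minimize ||y||^2 subject to a^T y = b.
a = np.array([1.0, 2.0, -1.0])
b = 3.0

def constraint(y):
    return a @ y - b

def solve_inner(lam, rho, y0, iters=500, step=0.01):
    """Inexact minimizer of L_rho(y; lam) = ||y||^2 + lam c(y) + (rho/2) c(y)^2,
       computed by plain gradient descent (stand-in for the PG inner solver (20))."""
    y = y0.copy()
    for _ in range(iters):
        y -= step * (2.0 * y + (lam + rho * constraint(y)) * a)
    return y

y, lam, rho = np.zeros(3), 0.0, 1.0
for _ in range(30):
    y = solve_inner(lam, rho, y)        # (18a): minimize the augmented Lagrangian
    lam += rho * constraint(y)          # (18b): multiplier update
    rho = min(10.0, 2.0 * rho)          # simple adaptive penalty update
```

For this toy instance the analytic minimizer is y = a/2 with multiplier λ = −1, which the iteration approaches as ρ grows.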

#### IV-C1 Solution to the Y-minimization problem (18a)

For fixed Λk and ρk, minimizing the augmented Lagrangian with respect to Y amounts to finding the minimizer of Lρk(Y;Λk) subject to X(Y) ≻ 0. Since g is nonsmooth, we cannot use standard gradient descent methods to find the update Yk+1. However, similar to Section IV-B, a PG method can be used to solve this subproblem iteratively

 Yj+1=proxβjg(Yj−αj∇F(Yj)), (20)

where j is the inner PG iteration counter, αj is the step-size, βj := γαj, and F denotes the smooth part of the augmented Lagrangian Lρk,

 F(Y) := f(Y) + ⟨Λk, A2(X(Y))−G⟩ + (ρk/2)∥A2(X(Y))−G∥2F. (21)

The expression for the gradient of F is provided in Appendix -B. Similar to Section IV-B, we combine BB step-size initialization with backtracking to ensure sufficient descent of F and positive definiteness of X(Y).

#### IV-C2 Lagrange multiplier update and choice of step-size in (18b)

Customized MM for covariance completion is summarized as Algorithm 2. We follow the procedure outlined in [54, Algorithm 17.4] for the adaptive update of ρk. This procedure allows for inexact solutions of subproblem (18a) and a more refined update of the Lagrange multiplier through the adjustment of convergence tolerances on the primal and dual residuals.

### IV-D Computational complexity

Computation of the gradient in both algorithms involves the computation of X from Y based on (11), a matrix inversion, and the solution of a Lyapunov equation, each of which takes O(n3) operations, as well as matrix-matrix multiplications of comparable cost. The proximal operator for the function g amounts to computing the ℓ2 norm of all rows of a matrix with n columns, which takes O(mn) operations. These steps are embedded within an iterative backtracking procedure for selecting the step-size αk. If the step-size selection takes q inner iterations and it takes p iterations for the gradient-based method to converge, the total computational cost of our customized algorithms is O(pqn3). In contrast, the worst-case complexity of standard SDP solvers is O((n+m)6).

### IV-E Comparison with other methods

One way of dealing with the lack of differentiability of the objective function in (12) is to split the smooth and nonsmooth parts over separate variables and to add an additional equality constraint to couple these variables. This allows for the minimization of the augmented Lagrangian via the Alternating Direction Method of Multipliers (ADMM) [55].

In contrast to splitting methods, the algorithms considered in this paper use the PG method to solve the nonsmooth problem in terms of the primal variable Y, thereby avoiding the necessity to update additional auxiliary variables and their corresponding Lagrange multipliers. Moreover, it is important to note that the performance of augmented-Lagrangian-based methods is strongly influenced by the choice of the penalty parameter ρ. In contrast to ADMM, there are principled adaptive rules for updating this parameter in MM. Typically, in ADMM, either a constant step-size is used or the step-size is adjusted to keep the norms of primal and dual residuals within a constant factor of one another [55]. Our computational experiments demonstrate that the customized proximal algorithms considered in this paper significantly outperform ADMM.

###### Remark 4

In [17], a customized ADMM algorithm was proposed for solving the optimal actuator and sensor selection problem. In this approach, the structural Lyapunov constraint on X and K is dualized via the augmented Lagrangian. While this avoids the issue underscored in Remark 3, as we show in Section V, it performs poorly in practice because of the higher per-iteration computational complexity of the ADMM algorithm developed in [17].

### IV-F Iterative reweighting and polishing

In optimization problem (10), the weighted norm is used to promote row sparsity of the matrix $Y$. This choice is inspired by the exact correspondence between the weighted norm with weights $w_i = 1/\|e_i^* Y\|_2$ for $e_i^* Y \neq 0$ and the cardinality function, i.e., the number of nonzero rows of $Y$. Since these weights depend on the unknown solution and cannot be implemented, an iterative reweighting scheme was proposed instead in [56]. We follow a similar approach and update the weights using

 w_i^{j+1} = \frac{1}{\|e_i^* Y^j\|_2 + \epsilon},    (22)

where $Y^j$ denotes the solution to problem (10) in the $j$th reweighting step. The small positive parameter $\epsilon$ ensures that the weights are well-defined.
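The reweighting update (22) can be sketched in a few lines (a numpy sketch with our own naming):

```python
import numpy as np

def update_weights(Y, eps=1e-4):
    # Reweighting update (22): w_i = 1 / (||e_i^* Y||_2 + eps),
    # computed from the row norms of the current solution Y.
    return 1.0 / (np.linalg.norm(Y, axis=1) + eps)
```

Rows of $Y$ that are already (nearly) zero receive large weights, which pushes them further toward zero in the next reweighting step.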

After we obtain the solution to problem (10), we conduct a polishing step to refine the solution based on the identified sparsity structure. For this, we consider the system

 \dot{x} = (A - B_{2,sp} K)\, x + B_1 d,

where the matrix $B_{2,sp}$ is obtained by eliminating the columns of $B_2$ that correspond to the identified row sparsity structure of $Y$, i.e., retaining only the nonzero input channels. For this system, we solve optimization problem (10) without the sparsity-promoting regularizer. This step allows us to identify the optimal matrix $Y$ and subsequently the optimal feedback gain for a system with a smaller number of input control channels. As we demonstrate in our computational experiments, polishing not only reduces the energy of the control input but can also improve the quality of completion of the covariance matrix.
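The restriction used in the polishing step can be sketched as follows (a numpy sketch with our own naming; the numerical tolerance for declaring a row of $Y$ zero is an assumption):

```python
import numpy as np

def restrict_inputs(B2, Y, tol=1e-8):
    # Identify the (numerically) nonzero rows of Y, i.e., the retained input
    # channels, and drop the corresponding columns of B2 to form B_{2,sp}.
    keep = np.where(np.linalg.norm(Y, axis=1) > tol)[0]
    return B2[:, keep], keep
```

Problem (10) is then re-solved for the reduced system with `B2[:, keep]` in place of $B_2$.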

## V Computational experiments

We provide two examples that demonstrate the utility of the proposed optimization framework for optimal sensor/actuator selection and covariance completion, as well as the computational efficiency of our algorithms.

### V-A Actuator selection

The Swift-Hohenberg equation is a partial differential equation that has been widely used as a model for studying pattern formation in hydrodynamics and nonlinear optics [57]. Herein, we consider the linearized Swift-Hohenberg equation around its time-independent, spatially periodic solution [58]

 \partial_t \psi(t,\xi) = -(\partial_\xi^2 + 1)^2 \psi(t,\xi) - c\,\psi(t,\xi) + f(\xi)\,\psi(t,\xi) + u(t,\xi) + d(t,\xi),

with periodic boundary conditions on the spatial domain. Here, $c$ is a constant and $f$ is a spatially periodic function determined by the solution around which the dynamics are linearized. Finite-dimensional approximation and diagonalization via the discrete Fourier transform yield the following state-space representation

 \dot{\psi} = A\,\psi + u + d.    (23)
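For intuition, the spectral construction of the dynamical generator can be sketched in the special case of a constant coefficient $f$, where the discrete Fourier transform diagonalizes the operator exactly (the spatially periodic $f$ of the text couples Fourier modes and gives a dense $A$; the function name and interface are ours):

```python
import numpy as np

def sh_generator(N, L, c, f0):
    """Diagonal generator of the linearized Swift-Hohenberg equation on a
    periodic domain of length L, assuming a *constant* coefficient f0.
    Fourier mode k has eigenvalue -(1 - k^2)^2 - c + f0, since the symbol
    of d^2/dxi^2 is -k^2."""
    k = 2.0 * np.pi * np.fft.fftfreq(N, d=L / N)
    return np.diag(-(1.0 - k**2) ** 2 - c + f0)
```

The sign of the eigenvalues immediately shows which Fourier modes are stable; instability requires $f$ to overcome $(1-k^2)^2 + c$, which is smallest near $k = \pm 1$.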

For the parameter values considered here, the linearized dynamical generator has two unstable modes. We solve the actuator selection problem (problem (10)) for 32, 64, 128, and 256 discretization points and for various values of the regularization parameter $\gamma$. Table I compares the proposed PG algorithm against CVX [59] and the ADMM algorithm of [17]. Both PG and ADMM were initialized with the solution to the algebraic Riccati equation which specifies the optimal centralized controller; this choice guarantees feasibility of the initial point. All algorithms were implemented in Matlab and executed on an Intel Core i5 processor. The algorithms terminate when an iterate achieves a prescribed distance from optimality; this tolerance guarantees that the value of the objective function is correspondingly close to its optimal value. For the largest problem dimensions, CVX failed to converge. In this case, iterations are run until the relative or normalized residuals defined in Section IV-B2 become sufficiently small.

For the two largest problem sizes, ADMM did not converge to the desired accuracy in a reasonable time. Typically, the ADMM algorithm of [17] computes low-accuracy solutions quickly, but obtaining higher accuracy requires precise solutions to its subproblems. The iterative reweighting scheme of Section IV-F can be used to improve the sparsity patterns identified by such low-accuracy solutions. Nonetheless, Fig. 2 shows that even for larger tolerances, PG is faster than ADMM.

As $\gamma$ increases in problem (10), more and more actuators are dropped and the performance degrades monotonically. Fig. 3 shows the number of retained actuators as a function of $\gamma$, as well as the percentage of performance degradation as a function of the number of retained actuators. Figure 3 also illustrates that, for various numbers of retained actuators, the solution to convex optimization problem (10) consistently yields performance degradation that is no larger than that of a greedy algorithm which drops actuators based on their contribution to the performance index. For example, for the same number of retained actuators, the greedy algorithm can lead to noticeably larger performance degradation than our approach. This greedy heuristic is summarized in Algorithm 3, where $\mathcal{S}$ is the set of actuators and $J(\mathcal{S})$ denotes the performance index resulting from the actuators within the set $\mathcal{S}$. When the individual subproblems for choosing fixed numbers of actuators can be solved rapidly, greedy algorithms provide a viable alternative, and there has been recent effort to prove the optimality of such algorithms for certain classes of problems [29]. However, in our example, the greedy algorithm does not always provide the optimal set of actuators with respect to the performance index. Relative to the convex formulation, similar greedy techniques yield suboptimal sensor selection for a flexible aircraft wing [7, Section 5.2].
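The greedy heuristic summarized in Algorithm 3 can be sketched as follows (a minimal illustration under an assumed interface: `J` maps a collection of retained actuator indices to the resulting performance index; all names are ours):

```python
def greedy_drop(actuators, J, n_keep):
    # Starting from the full actuator set, repeatedly drop the actuator whose
    # removal increases the performance index J the least, until n_keep remain.
    S = list(actuators)
    while len(S) > n_keep:
        worst = min(S, key=lambda a: J([x for x in S if x != a]))
        S.remove(worst)
    return S
```

Each outer step requires one evaluation of $J$ per remaining actuator, so the heuristic is only attractive when these subproblems are cheap; and, as noted above, it carries no optimality guarantee for this problem class.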

In the absence of the sparsity-promoting regularizer, problem (10) yields the optimal centralized controller, which can be obtained from the solution to the algebraic Riccati equation. Fig. 4 shows this centralized feedback gain together with the Euclidean norms of its rows. For a larger value of $\gamma$ in (10), only a subset of the possible actuators is retained; the corresponding optimal feedback gain matrix and row norms are also shown in Fig. 4. From these plots we see that a truncation of the centralized feedback gain matrix based on its row norms yields a different subset of actuators than the solution to optimization problem (10).

### V-B Covariance completion

We provide an example that demonstrates the utility of our modeling and optimization framework for completing partially available second-order statistics of a three-dimensional channel flow. In an incompressible channel flow, the dynamics of infinitesimal fluctuations around the parabolic mean velocity profile are governed by the linearized Navier-Stokes equations. The streamwise, normal, and spanwise coordinates are represented by $x_1$, $x_2$, and $x_3$, respectively; see Fig. 5 for the geometry. Finite-dimensional approximation yields the following state-space representation

 \dot{\boldsymbol{\psi}}(k,t) = A(k)\,\boldsymbol{\psi}(k,t) + \xi(k,t), \quad v(k,t) = C(k)\,\boldsymbol{\psi}(k,t).    (24a)

Here, \boldsymbol{\psi} = [\,v_2^T \ \eta^T\,]^T \in \mathbb{C}^{2N} is the state of the linearized model, v_2 and \eta = \partial_{x_3} v_1 - \partial_{x_1} v_3 are the normal velocity and vorticity, the output v = [\,v_1^T \ v_2^T \ v_3^T\,]^T \in \mathbb{C}^{3N} denotes the fluctuating velocity vector, \xi is a stochastic forcing disturbance, k = [\,k_1 \ k_3\,]^T denotes the vector of horizontal wavenumbers, and the input matrix is the identity I_{2N \times 2N}. The dynamical matrix A \in \mathbb{C}^{2N \times 2N} and output matrix C \in \mathbb{C}^{3N \times 2N} are described in [12].

We assume that the stochastic disturbance is generated by a low-pass filter with state-space representation

 \dot{\xi}(k,t) = -\xi(k,t) + w(t),    (24b)

where $w$ denotes a zero-mean, unit-variance white process. The steady-state covariance $\Sigma$ of system (24) can be obtained as the solution to the Lyapunov equation

 \tilde{A}\,\Sigma + \Sigma\,\tilde{A}^* + \tilde{B}\,\tilde{B}^* = 0,

with

 \tilde{A} = \begin{bmatrix} A & I \\ O & -I \end{bmatrix}, \quad \tilde{B} = \begin{bmatrix} 0 \\ I \end{bmatrix}, \quad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}^* & \Sigma_{22} \end{bmatrix}.

The matrix $\Sigma_{11}$ denotes the steady-state covariance of system (24a),

 \Sigma_{11} := \lim_{t \to \infty} \mathbf{E}\big( \boldsymbol{\psi}(t)\, \boldsymbol{\psi}^*(t) \big),

which, at any wavenumber pair $k$, is related to the steady-state covariance matrix $\Phi$ of the output $v$ via

 \Phi(k) = C(k)\, \Sigma_{11}(k)\, C^*(k).
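The construction above can be checked numerically with a small self-contained sketch (a naive Kronecker-based Lyapunov solve is used for brevity rather than a production Bartels-Stewart routine; all function names are ours):

```python
import numpy as np

def lyap(A, Q):
    """Solve A X + X A^* + Q = 0 by Kronecker vectorization. This naive
    sketch forms an n^2-by-n^2 linear system; large problems would use a
    Bartels-Stewart solver instead."""
    n = A.shape[0]
    I = np.eye(n)
    L = np.kron(I, A) + np.kron(A.conj(), I)
    x = np.linalg.solve(L, -Q.flatten(order='F'))
    return x.reshape((n, n), order='F')

def steadystate_cov(A, C):
    """Steady-state covariance of the filter-augmented system (24):
    Atil = [[A, I], [0, -I]], Btil = [[0], [I]].
    Returns Sigma_11 and the output covariance Phi = C Sigma_11 C^*."""
    n = A.shape[0]
    Z, I = np.zeros((n, n)), np.eye(n)
    Atil = np.block([[A, I], [Z, -I]])
    Btil = np.vstack([Z, I])
    Sigma = lyap(Atil, Btil @ Btil.conj().T)
    S11 = Sigma[:n, :n]
    return S11, C @ S11 @ C.conj().T
```

For a scalar system with $A = -1$ and $C = 1$, solving the augmented Lyapunov equation by hand gives $\Sigma_{11} = 1/4$, which the sketch reproduces.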

In this example, we set the covariance of the white noise disturbance to the identity and assume that the one-point velocity correlations, i.e., the diagonal entries of the subcovariance matrices in Fig. 6, are available. In order to account for these available statistics, we solve optimization problem (10) subject to the requirement that the state covariance agrees with the available statistics.

Computational experiments are conducted for a fixed Reynolds number and wavenumber pair, for various numbers of collocation points in the normal direction (which set the state dimension), and for various values of the regularization parameter $\gamma$. As stated in Algorithm 2, we initialize our algorithms using the optimal centralized controller; this choice guarantees feasibility of the initial point. When CVX can compute the optimal solution of problem (10), iterations of each method are run until the solutions come within a prescribed distance of the CVX solution. For larger problems, iterations are run until the primal and dual residuals satisfy certain tolerances. Table II compares the various methods based on run times (sec). For the largest problem instances, CVX failed to converge and ADMM did not converge in a reasonable time. Clearly, MM outperforms ADMM. This can also be deduced from Fig. 7, which shows convergence curves for MM and ADMM. For this example, Fig. 8 shows the convergence of MM based on the normalized primal and dual residuals defined in (19).

We now focus on collocation points and solve problem (10) for various values of . Since the input matrix is assumed to be the identity, the numb