Reducing the LQG Cost with Minimal Communication

09/25/2021 ∙ by Oron Sabag, et al. ∙ California Institute of Technology

We study the linear quadratic Gaussian (LQG) control problem, in which the controller's observation of the system state is such that a desired cost is unattainable. To achieve the desired LQG cost, we introduce a communication link from the observer (encoder) to the controller. We investigate the optimal trade-off between the improved LQG cost and the consumed communication (information) resources, measured with the conditional directed information, across all encoding-decoding policies. The main result is a semidefinite programming formulation for that optimization problem in the finite-horizon scenario, which applies to time-varying linear dynamical systems. This result extends a seminal work by Tanaka et al., where the only information the controller knows about the system state arrives via a communication channel, to the scenario where the controller also has access to a noisy observation of the system state. As part of our derivation to show the optimality of an encoder that transmits a memoryless Gaussian measurement of the state, we show that the presence of the controller's observations at the encoder cannot reduce the minimal directed information. For time-invariant systems, where the optimal policy may be time-varying, we show in the infinite-horizon scenario that the optimal policy is time-invariant and can be computed explicitly from a solution of a finite-dimensional semidefinite program. The results are demonstrated via examples that show that even low-quality measurements can have a significant impact on the required communication resources.

I Introduction

Networked control systems share an inherent tension between the control performance and the resources allocated to communication between different nodes of the system. Despite the great advances on important questions in this theme, such as data rate theorems for stabilizability of dynamical systems [2, 3, 4, 5, 6, 7, 8, 9], fundamental questions remain open, such as the trade-off between communication resources and the control cost [10, 11, 12, 13, 14, 15, 16]. In this paper, we investigate this question on a simple topology consisting of the classical Linear Quadratic Gaussian (LQG) setting with a single communication link.

[Figure 1: block diagram showing the dynamical system, the measurement, the observer (encoder), the communication link, and the controller (decoder).]

Fig. 1: The LQG setting with a noisy observation. The control performance (the quadratic cost) is improved using a communication link (the dashed line) from the observer to the controller.

The networked control setting investigated in this paper (Fig. 1) aims to reduce the achievable control cost at the expense of communication resources. The communication link introduced between an encoder and a decoder (co-located with the controller) serves as an information pipeline to the controller, which also has access to the LQG measurements. Based on its (full) observation of the state, the encoder transmits extra information to the controller, resulting in a reduction in the LQG cost. One can also view this setting as the standard rate-constrained LQG setting [17], but with side information (the measurement) available to the controller [18, 15, 19, 20]. The objective of this paper is to characterize the minimal communication resources subject to a strict constraint on the control performance measured by a quadratic cost.

The communication (information) resources are measured with the conditional directed information. The directed information is suitable for scenarios where the operations of the involved units are sequential, e.g., channels with feedback in communication [21, 22, 23] and the causal rate distortion function in the context of control problems [11, 14]. Here as well, the mappings of both the encoder and the controller are sequential, and the directed information serves as a lower bound to the operational variable-length (prefix) coding problem [24, 11] (see also Section VI). The control performance is measured by a quadratic cost function of the state and control signals. The optimization problem is formulated for two scenarios corresponding to the finite-horizon and infinite-horizon regimes.

For the finite-horizon problem, time-varying linear dynamical systems are investigated and the minimal conditional directed information is formulated as a convex optimization problem. The optimization problem has a semidefinite programming (SDP) form (more precisely, max log-det form) and can be implemented using standard solvers even for large horizons. We also show that the solution to the optimization problem can be realized by three design steps: computing the controller gains, solving the convex optimization problem, and running a standard Kalman filter. For the infinite-horizon problem where the dynamical system matrices are time-invariant, we show that the optimization problem can also be formulated as an SDP with the optimization variables being two positive semidefinite matrices of finite dimensions. Most importantly, we show that the optimal encoding policy is a simple, time-invariant Gaussian measurement of the state that can be computed from the convex optimization.

Our results generalize the work by Tanaka et al. [17], which introduced the SDP approach for solving control-communication problems [25]. Specifically, we investigate the full LQG setting, while [17] assumed that the LQG measurement is absent ( in Fig. 1). Thus, the control performance in our setting relies on the fusion of both the communication link information and the LQG Gaussian measurement.

Two key changes in the SDP formulation are the objective function, which includes a new term due to the study of conditional directed information rather than the directed information in [17], and a new linear matrix inequality (LMI) constraint, which represents the error covariance reduction due to the LQG measurement. To find the optimal policy structure, we study a relaxed optimization problem where the LQG measurements are available to the encoder as well. We then show that even in this relaxed scenario, the optimal encoder signaling is a memoryless Gaussian measurement of the state. Thus, the knowledge of the LQG measurements at the encoder cannot reduce the minimal communication resources. This extends the observation made in [18] for the scalar setting to the vector setting.

The problem of control under communication constraints with side information has recently attracted much interest [18, 19, 15, 26, 20]. In [18], a scalar version of the problem in Fig. 1 was solved. In [19], a slightly less general problem than that in Fig. 1 was considered. They conjectured that a linear, memoryless policy is optimal and provided a semidefinite programming solution. The conjecture and the SDP formulation are subsumed in the conference version of the current paper [1], published prior to [19]. Additionally, [19] shows that the conditional directed information is within a constant gap from the operational problem of variable-length coding with side information available to the controller and the encoder. This is obtained by constructing a practical coding scheme and analyzing its performance. In [15], the rate-distortion counterpart of the control problem studied here is considered. It is shown that if the optimal policy is assumed to be linear and the LQG cost admits an upper bound at all times, a simple optimization problem can be realized for the corresponding rate-distortion problem. The result presented below in Theorem 1 confirms the optimality of the policy conjectured in [19] and of the linear policy assumed in [15]. It should be remarked that the objective considered in [19, 15] and that in the current paper is the conditional directed information, which is a lower bound to the operational problem in the case of a fixed rate or in the case of a variable rate and prefix-free codebooks. In [26], it is shown that the directed information is a tighter lower bound, but it is also illustrated that a Gaussian policy does not attain its minimum, and therefore it is not clear whether a computable form of the directed information can be obtained. Finally, [20] studied coding schemes for the scalar LQG setting with a Gaussian communication channel based on the joint source channel schemes in [27, 28].

The remainder of this paper is organized as follows. Section II introduces the notation, setting and the problem definition. Section III presents our main results and Section V provides their proofs. Section IV presents numerical examples.

II The setting and problem definition

A linear dynamical system is described by

(1)

where are mutually independent. The initial state is distributed according to and is independent of . A noisy measurement of the state is available to the controller,

(2)

with . For a fixed time-horizon , the LQG quadratic cost is defined as

(3)

with and , and superscripts denote vectors starting at time , e.g., .
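For orientation, a standard time-varying linear Gaussian model with a quadratic cost can be written as in the sketch below. The symbol names ($A_t$, $B_t$, $C_t$, $W_t$, $V_t$, $Q_t$, $R_t$) and the indexing of the cost are assumptions chosen to match common LQG conventions; they may differ from the notation used in (1)-(3).

```latex
\begin{align*}
x_{t+1} &= A_t x_t + B_t u_t + w_t, & w_t &\sim \mathcal{N}(0, W_t), \\
y_t &= C_t x_t + v_t, & v_t &\sim \mathcal{N}(0, V_t), \\
J &= \sum_{t=1}^{T} \mathbb{E}\!\left[ x_{t+1}^{\top} Q_t\, x_{t+1} + u_t^{\top} R_t\, u_t \right], & Q_t &\succeq 0,\quad R_t \succ 0 .
\end{align*}
```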

The objective is to design a system such that the LQG cost does not exceed a cost target denoted by . Naturally, if the measurements are sufficient to attain , the classical solution to the LQG problem is satisfactory, and there is no need to expand. In the other extreme, the LQG cost cannot be reduced below the LQG cost attained by a full observer, i.e., . Our interest lies in the scenario where is below the optimal LQG cost attainable with the partial observer (2) but above the optimal LQG cost attainable with the full observer. In this case, the introduction of a communication/information link (see the dashed line in Fig. 1) between a full observer (encoder) and a controller (a decoder) will help to attain the desired LQG cost .

The encoder is characterized by the set of stochastic mappings that can be compactly represented by the causal conditioning

(4)

Similarly, the decoder (controller) is a causally conditioned probability distribution

(5)

By construction, the encoder-decoder pair satisfies at all times

(6)

The overall joint distribution can be summarized using the one-step update

(7)

The communication resources are measured by the directed information from the encoder to the controller causally conditioned on the partial observations at the controller [29, 23]:

(8)

where is the mutual information between and conditioned on .

The objective of this paper is to solve the optimization problem:

(9)

where the minimum is over policies of the form (II).

When the measurement is absent, the optimization problem in (II) simplifies to the directed information   that was investigated in [10, 17]. To see that the conditional directed information measures the information encapsulated at the encoding policy, assume that the -th element in the conditional directed information satisfies:

(10)

Then, the right hand side extracts the state uncertainty at the controller with and without the encoding variable , i.e., . Specifically, the difference reflects the fact that is costly while is a natural occurrence of the dynamical system without any cost. These arguments are formalized in Theorem 1 and Lemma 1. We will also show a relation between the optimal conditional directed information and the Kalman filtering theory with two independent measurements.

III Results

This section presents our results. First, we provide a simple structure for the optimal policy in Theorem 1. Then, we present preliminaries on Kalman filtering theory to express the directed information in its terms. We then provide a semidefinite programming formulation of the optimization problem and present the optimal system design. Finally, Section III-E includes the formulation and the solution for the infinite-horizon problem.

III-A Optimal policy structure

The first result is the optimal structure of the observer (encoder) and controller (decoder) policies:

Theorem 1 (Optimal policy structure).

An optimal policy for the optimization problem in (II) is given by

(11)

where is independent from and is a constant given by the LQR controller (see (III-D1), below).

Moreover, the knowledge of the measurements at the encoder does not reduce the optimal value of the directed information control problem in (II).

The theorem significantly simplifies the minimization domain from the general policy in (II) to the set . The encoding rule reveals that reduces the communication resources by introducing an additive noise to the state observation. We emphasize that our problem formulation does not impose any structural constraints on the encoding policy, such as being linear, memoryless, or Gaussian. The control signal is the standard LQG certainty equivalence controller. Thus, similar to the scalar case in [18], the separation between the control gain and the estimation is preserved in our setting. The proof of Theorem 1 appears in Section V.

Theorem 1 extends [17, Th. ] and recovers it when , the observation, is absent. The extension of [17] to our setting is not trivial (see, e.g., [19, 15] for progress on that problem), and involves the study of a relaxed optimization problem where, at time , the vector is also available to the encoder. For this relaxed optimization problem, we show that the optimal policy is of the form (11). In other words, even if the side information is available at the encoder, it cannot reduce the conditional directed information. This is consistent with the observation made in [18] in the context of the scalar system.

III-B Kalman filter with two (independent) measurements

As is evident from the optimal structure in Theorem 1, the encoding function is a noisy measurement of the system state, and its additive noise is independent of the other measurement . Thus, the optimal system has a structure of an LQG setting with two independent observations. However, for the purpose of optimizing the communication resources, has a cost, while is a natural occurrence of the system. In this section, we provide short preliminaries on Kalman filtering and present the conditional directed information in Kalman filtering terms.

Following a standard convention, we denote the error covariance matrices with respect to both measurements and as

(12)

Since the communication resources should be measured with respect to the observation only, we define the intermediate error covariance matrix corresponding to the prediction error after observing only:

(13)

 

(14)

The following lemma formalizes several relations between the error covariances.

Lemma 1 (Error covariance matrices).

Let be the covariance matrix of . Then, for a fixed policy , the error covariance matrices can be updated as

(15a)
(15b)
(15c)

where , and .

The identities are standard in Kalman filtering theory, and their proofs are omitted. It now follows that the directed information can be expressed as

(16)

Note that the matrix is the multiplicative term of the error reduction when computing from . Therefore, the conditional directed information measures the reduction in error covariance with respect to only, as desired.
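The idea that information is measured by a reduction in error covariance can be illustrated with a generic Gaussian computation. The sketch below is not the paper's expression (16); it assumes a state with prior covariance `P_prior` and a hypothetical linear measurement `z = D x + n` with noise covariance `V`, and evaluates the information gained as half the log-determinant ratio of the prior and posterior covariances. All names are illustrative.

```python
import numpy as np

def measurement_information(P_prior, D, V):
    """Information (in nats) about a Gaussian state with covariance P_prior
    gained from a measurement z = D x + n, with n ~ N(0, V)."""
    # Posterior covariance in information form:
    # P_post^{-1} = P_prior^{-1} + D^T V^{-1} D
    info_matrix = np.linalg.inv(P_prior) + D.T @ np.linalg.inv(V) @ D
    P_post = np.linalg.inv(info_matrix)
    _, logdet_prior = np.linalg.slogdet(P_prior)
    _, logdet_post = np.linalg.slogdet(P_post)
    return 0.5 * (logdet_prior - logdet_post), P_post

P = np.array([[2.0, 0.5], [0.5, 1.0]])
D = np.eye(2)
V = 4.0 * np.eye(2)
nats, P_post = measurement_information(P, D, V)
```

A noisier measurement (larger `V`) shrinks the covariance less and therefore costs less information, which is the trade-off the encoder exploits.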

III-C SDP formulation

Despite the elegant representation of the objective function in (III-B), it is not clear whether (II) can be formulated as a convex optimization since its inverse includes a product of two optimization variables . Our next result shows a convex optimization formulation for (II).

Theorem 2 (SDP formulation).

For a fixed , the optimization problem (II) can be formulated as the convex optimization

(17)

where the constant matrices and can be computed from (III-D1) below, and the constant is given by

(18)

The optimization problem in Theorem 2 is a convex optimization problem with respect to the decision variables , and can be solved using standard solvers, e.g., [30, 31, 32] (some solvers require writing the determinant of in a symmetric form using Sylvester’s determinant theorem). It will be shown in the proof of Theorem 2 in Section V below that the auxiliary decision variable evaluated at the optimal point is equal to . However, it is necessary to introduce this variable in order to convert the objective to a standard convex form. Then, the equality constraint resulting from the change of variable can be (optimally) relaxed to an inequality that is equivalent to the LMI above. The optimization problem extends [17, Th. ] to the case where the LQG measurement is available to the controller, and recovers it by choosing . In this case, the constraints on simplify to and .
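The symmetrization via Sylvester's determinant theorem, det(I + AB) = det(I + BA), can be checked numerically. The sketch below uses hypothetical matrices `P`, `C`, and `V_inv` (not the paper's decision variables) to show how a non-symmetric argument of the determinant is replaced by a symmetric one of possibly smaller dimension.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 2
M = rng.standard_normal((n, n))
P = M @ M.T + np.eye(n)              # a positive definite "covariance"
C = rng.standard_normal((m, n))      # a wide "measurement" matrix
V_inv = np.diag([2.0, 0.5])          # an inverse noise covariance

# Non-symmetric n-by-n argument of the determinant.
det_nonsym = np.linalg.det(np.eye(n) + P @ C.T @ V_inv @ C)

# Symmetric m-by-m argument: with V_inv = L L^T, Sylvester's theorem gives
# det(I_n + (P C^T L)(L^T C)) = det(I_m + (L^T C)(P C^T L)).
L = np.linalg.cholesky(V_inv)
sym_arg = np.eye(m) + L.T @ C @ P @ C.T @ L
det_sym = np.linalg.det(sym_arg)
```

The two determinants agree up to floating-point error, while only `sym_arg` is symmetric, which is the form that log-det solvers typically expect.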

III-D System design

In this section, we construct a three-step realizable policy using the results from the previous section.

III-D1 The controller gain

The controller gains are independent of the measurements and the variables from the optimization problem. The gains can be computed from a backward Riccati recursion, with the initial condition , as

(19)
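A backward Riccati recursion of this kind can be sketched as follows. The indexing convention, the terminal condition `Q_final`, and the function name `lqr_gains` are assumptions based on the textbook finite-horizon LQR problem, so the recursion may differ in details from (19).

```python
import numpy as np

def lqr_gains(A, B, Q, R, Q_final):
    """Backward Riccati recursion for a time-varying system.

    A, B, Q, R are lists of length T. Returns the gains K[t], to be used
    as u_t = K[t] @ x_hat_t, and the cost-to-go matrix at the first step.
    """
    T = len(A)
    S = Q_final                      # terminal condition
    K = [None] * T
    for t in reversed(range(T)):
        G = B[t].T @ S @ B[t] + R[t]
        K[t] = -np.linalg.solve(G, B[t].T @ S @ A[t])
        # S_t = Q_t + A^T S A - A^T S B (B^T S B + R)^{-1} B^T S A
        S = Q[t] + A[t].T @ S @ A[t] + A[t].T @ S @ B[t] @ K[t]
    return K, S

# Time-invariant example with one unstable mode, horizon T = 100.
T = 100
A0 = np.array([[1.2, 0.3], [0.0, 0.9]])
B0 = np.array([[0.0], [1.0]])
K, S_first = lqr_gains([A0] * T, [B0] * T, [np.eye(2)] * T,
                       [np.eye(1)] * T, np.eye(2))
closed_loop_radius = max(abs(np.linalg.eigvals(A0 + B0 @ K[0])))
```

The gains depend only on the system and cost matrices, consistent with the statement that they are independent of the measurements and of the optimization variables.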

III-D2 Covariance matrices

Given the sequence , the optimal can be determined from the convex optimization problem in Theorem 2, and one can compute

An application of the SVD then determines the parameters of the optimal policy in Theorem 1.
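One way to carry out such a factorization is sketched below, assuming the quantity to be factored is a symmetric positive semidefinite matrix `S` (a hypothetical name) that should be written as `D.T @ inv(V) @ D` for a measurement matrix `D` and a diagonal noise covariance `V`. For a symmetric positive semidefinite matrix the SVD coincides with the eigendecomposition, which is used here for numerical robustness.

```python
import numpy as np

def factor_measurement(S, tol=1e-9):
    """Factor a symmetric PSD matrix S as D.T @ inv(V) @ D with V diagonal.

    Directions with (numerically) zero eigenvalue carry no information
    and are dropped, so D has as many rows as the rank of S.
    """
    S = 0.5 * (S + S.T)                   # remove round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(S)
    keep = eigvals > tol
    D = eigvecs[:, keep].T                # rows are the measured directions
    V = np.diag(1.0 / eigvals[keep])      # noise variance per direction
    return D, V

S_full = np.array([[2.0, 1.0], [1.0, 2.0]])
D_full, V_full = factor_measurement(S_full)

S_rank1 = np.array([[1.0, 1.0], [1.0, 1.0]])
D_rank1, V_rank1 = factor_measurement(S_rank1)   # a single scalar measurement
```

Dropping the zero directions means the encoder transmits a measurement whose dimension equals the rank of `S`, which can be smaller than the state dimension.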

III-D3 Kalman filter

The Kalman gain is defined as

(20)

where .

The Kalman update is done in two steps:

(21)

where the control signal is .
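A two-step update of this type (a measurement update followed by a time update) can be sketched as below. The sketch assumes two independent linear measurements of the same state, stacked into one vector; the names `C`, `V` (controller's measurement), `D`, `N` (encoder's measurement), and `K_ctrl` (control gain) are illustrative and not taken from (20)-(21).

```python
import numpy as np

def kalman_step(x_pred, P_pred, y, z, C, V, D, N, A, B, W, K_ctrl):
    """One filter step with two independent measurements of the state.

    y = C x + v, v ~ N(0, V)   (controller's own measurement)
    z = D x + n, n ~ N(0, N)   (encoder's transmitted measurement)
    """
    # Step 1: measurement update with the stacked measurement [y; z].
    H = np.vstack([C, D])
    R_meas = np.block([
        [V, np.zeros((V.shape[0], N.shape[0]))],
        [np.zeros((N.shape[0], V.shape[0])), N],
    ])
    innovation = np.concatenate([y, z]) - H @ x_pred
    gain = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R_meas)
    x_filt = x_pred + gain @ innovation
    P_filt = P_pred - gain @ H @ P_pred

    # Certainty-equivalence control from the filtered estimate.
    u = K_ctrl @ x_filt

    # Step 2: time update through the dynamics.
    x_next = A @ x_filt + B @ u
    P_next = A @ P_filt @ A.T + W
    return u, x_next, P_next, P_filt

A = np.array([[1.1, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0, 1.0]])
V = np.array([[2.0]])
N = np.array([[0.5]])
W = 0.1 * np.eye(2)
K_ctrl = np.array([[-0.3, -0.5]])
P_pred = np.eye(2)
u, x_next, P_next, P_filt = kalman_step(
    np.zeros(2), P_pred, np.array([1.0]), np.array([-0.5]),
    C, V, D, N, A, B, W, K_ctrl)
```

Because the two noises are independent, the filtered covariance satisfies the additive information identity `inv(P_filt) = inv(P_pred) + C.T inv(V) C + D.T inv(N) D`, so each measurement contributes its own term.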

III-E The infinite-horizon setting

In this section, we formulate and solve the optimization problem (II) in the infinite-horizon regime. In this scenario, we consider time-invariant systems, i.e., , , , , and time-invariant cost matrices , . The optimization problem is defined as:

(22)

where the infimum is taken with respect to the sequence of stochastic policies given in (II).

The solution structure is similar to the finite-horizon solution in Theorem 2. In particular, we construct a controller based on a solution to a convex optimization problem. We begin with the controller description.

III-E1 Controller gain

Assume that is stabilizable and is observable on the unit circle. Then, we define to be the unique stabilizing solution for the Riccati equation

(23)

By having the stabilizing solution, we can present the SDP-based system design in the infinite-horizon regime.

Theorem 3.

If the pair is stabilizable and the pair is observable on the unit circle, the infinite-horizon optimization problem (III-E) can be formulated as the convex optimization

s.t.
(24)

where is given in (23), and .

Moreover, let be the optimal solution in (3) and compute

(25)

and its SVD decomposition as . Then, optimal time-invariant encoder and decoder are given by

(26)

where , , and is computed recursively using the Kalman filter in (III-D3).

Theorem 3 shows that the optimization problem in the infinite-horizon regime is computationally simpler than in the finite-horizon regime solved in Theorem 2. The proof of Theorem 3 uses Theorem 1 for the structure of the optimal policy. It is interesting to note that we also show that a time-invariant law is optimal, whereas in Theorem 2 the optimal policy is time-varying. The main idea behind this property is the convexity of the objective. In particular, one can use Jensen’s inequality to show that the objective evaluated at the convex combination of the decision variables is no larger than the average of the objectives over all times. This fact can be exploited in the infinite-horizon regime to show that the convex combination of the decision variables satisfies the stationary constraints presented in Theorem 3. The proof of Theorem 3 is given in Section V-C.
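The Jensen step can be illustrated on the function -log det, a standard convex function of a positive definite matrix. The sketch below is only an illustration of the inequality on randomly generated matrices; it does not reproduce the objective of Theorem 3, and the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_pd(n):
    """A random positive definite matrix."""
    M = rng.standard_normal((n, n))
    return M @ M.T + np.eye(n)

def neg_logdet(P):
    """The convex function P -> -log det(P) on positive definite matrices."""
    return -np.linalg.slogdet(P)[1]

# A "time-varying" sequence of decision variables and its time average.
Ps = [random_pd(3) for _ in range(20)]
P_avg = sum(Ps) / len(Ps)

value_at_average = neg_logdet(P_avg)
average_of_values = np.mean([neg_logdet(P) for P in Ps])
```

Jensen's inequality guarantees `value_at_average <= average_of_values`, so replacing a time-varying sequence by its average cannot increase a convex objective of this type.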

IV Examples

IV-A Side information reduces the minimal directed information

Fig. 2: The trade-offs between the conditional directed information and the LQG cost when the SNR of the side information varies.

In this section, we study a numerical example to show the benefits of side information and discuss the trade-offs between communication resources and control performance. We set the matrices to be the same as those in [17, Sec. V]

(27)

and the cost matrices are set to be identity matrices.

We start by studying an LQG system in which the side information to the decoder is given by and with , so that . For each , and , we solve (3) for each LQG cost constraint in the range and plot the optimal value of (3) as a function of in Fig. 2. The case without side information studied in [17] can be equivalently viewed as the case with .

In Fig. 2, we can see that for any fixed , the minimal conditional directed information decreases as (the signal-to-noise ratio of the side information) increases. The red vertical line corresponds to the minimal cost that can be attained with a clean observation available at the controller. The intersection with the LQG constraint axis corresponds to the LQG cost that is achieved without communication, that is, using the side information only. It is also interesting to note that, at a fixed information level, the gain due to the presence of increases for an increasing control cost.

In all curves with side information, the minimal directed information converges to zero as the LQG cost increases to infinity. However, in the case without side information, the curve converges to some constant known as the minimal rate needed to stabilize the system [33]. This rate can be computed as , where denotes the th eigenvalue of its argument. The fact that the curves converge to zero follows from the detectability of the pair (indeed, is full-rank so that the pair is observable). We proceed to study a scenario in which the side information implies that the pair is not detectable.
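The minimal stabilizing rate is commonly expressed, in data-rate theorems, as the sum of the logarithms of the magnitudes of the unstable eigenvalues of the system matrix. The sketch below computes this quantity under that assumption; the function name, the example matrix, and the choice of logarithm base are illustrative.

```python
import numpy as np

def min_stabilizing_rate(A, base=2.0):
    """Sum of log|lambda_i(A)| over eigenvalues with |lambda_i| >= 1.

    With base=2 the result is in bits per time step; use base=np.e for nats.
    """
    magnitudes = np.abs(np.linalg.eigvals(A))
    unstable = magnitudes[magnitudes >= 1.0]
    return float(np.sum(np.log(unstable)) / np.log(base))

# Eigenvalues 2 and 4 are unstable, 0.5 is stable: log2(2) + log2(4).
A_example = np.diag([2.0, 0.5, 4.0])
rate_bits = min_stabilizing_rate(A_example)   # approximately 3 bits

# A stable system needs no communication for stabilization.
rate_stable = min_stabilizing_rate(np.diag([0.5, 0.9]))
```

Only unstable modes contribute, which matches the behavior described for side information: an unstable mode that the controller's measurement cannot observe must still be paid for through the communication link.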

Fig. 3: The trade-offs between the conditional directed information and the LQG cost when the SNR of the side information varies.

Here, we fix the side information variance to be the identity matrix (i.e., ), but change the observability matrix according to two scenarios. In the first, the matrix has dimensions for , and is given by . Clearly, if , there is no side information, and if it is the fully observable matrix studied in Fig. 2 with . In the other case, we carefully choose to be orthogonal to one of the unstable eigenvectors of , i.e., the eigenvector whose corresponding eigenvalue is . One choice of such a matrix is 

In Fig. 3, the minimal directed information is plotted as a function of the LQG cost . As expected, it can be observed that the communication resources are decreasing as the side information dimension is increasing. For all observability matrices with , the curves tend to zero as the cost grows to . On the other hand, the curves that correspond to from [17], and the observability matrix tend to a constant when the cost is large. This constant can be calculated as the minimal rate needed to stabilize the system. In the blue curve, it is and for it is where is the only unstable eigenvalue that cannot be observed via .

IV-B Scalar systems

For scalar systems, without the LQG measurement , the solution to  (3) [7, 17, 10] is

(28)

where is the unique solution to the Riccati equation and can be solved in closed-form as

(29)
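A closed form for a scalar Riccati equation follows from the quadratic formula. The sketch below assumes the standard discrete-time form s = q + a²s − a²b²s²/(b²s + r), with hypothetical scalar names `a`, `b`, `q`, `r`; the parametrization may differ from the one in (29). Multiplying through by (b²s + r) gives b²s² + (r − a²r − b²q)s − qr = 0, whose unique positive root is returned.

```python
import math

def scalar_riccati(a, b, q, r):
    """Positive root of b^2 s^2 + (r - a^2 r - b^2 q) s - q r = 0.

    Assumes b != 0, q > 0, and r > 0, so the product of the two roots is
    negative and exactly one root is positive.
    """
    c1 = r - a * a * r - b * b * q
    disc = c1 * c1 + 4.0 * b * b * q * r
    return (-c1 + math.sqrt(disc)) / (2.0 * b * b)

def riccati_residual(s, a, b, q, r):
    """Residual of the fixed-point form s = q + a^2 s r / (b^2 s + r)."""
    return q + a * a * s * r / (b * b * s + r) - s

s_unit = scalar_riccati(1.0, 1.0, 1.0, 1.0)   # equals (1 + sqrt(5)) / 2
```

The residual check confirms that the returned value is a fixed point of the Riccati map for the chosen parameters.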

In the following result, we provide a closed-form for the scalar problem. The proof is in Section V-D below.

Corollary 1.

When are scalars, and , the optimal value of the optimization (3) is

(30)

when ; and is 0 when , where is the unique positive solution to the quadratic equation

(31)

and is given in (29) and .

By comparing (28) and (1), the information gain due to the presence of the LQG measurement is the non-negative expression

(32)

Note that the gain is an increasing function of . Also, the gain is upper bounded by (28) which is achieved with equality when since satisfies (see Eq. (31)).

Remark 1.

In [18], the rate distortion problem which corresponds to the control problem studied in this paper has been solved for the scalar case. To reveal [18, Th. ] from Corollary 1, let in order to write (1) as

(33)

V Proofs

In this section, we prove our results. We start with Theorem 1 on the optimal policy structure.

V-A Proof of Theorem 1 (Optimal policy structure)

The proof follows from the following claims that will be shown consecutively thereafter.

  1. Instead of minimizing over stochastic kernels in (5), it is sufficient to minimize over that is a deterministic function of .

  2. The minimization domain is relaxed by allowing encoders of the form instead of (in (4)). That is, the new encoder has additional access to the observation .

  3. It is sufficient to minimize the relaxed optimization problem over , i.e., to let the encoder depend on rather than the tuple .

  4. It is sufficient to minimize the relaxed optimization problem over Gaussian encoder outputs, i.e.,

    (34)

    where .

  5. It is sufficient to minimize the relaxed optimization problem over

    (35)
  6. The optimal control is , where is the control gain.

By claim , the minimizer of the relaxed optimization problem is in the original minimization domain (II). Thus, both optimization problems have a common minimizer, and is a composition of a Kalman filter and certainty equivalence controller.

Claim 1: From the functional representation lemma [34], one can write for some deterministic function and random variable that is independent of . Let , and note that . Moreover, the joint distribution of and is unaffected by absorbing the controller’s randomness into the encoder (stochastic) mapping, so the LQG cost remains the same. This procedure can be inductively repeated to de-randomize at all times.

Claim 2: Trivial, since the minimization domain is enlarged.

Claim 3: Consider a simple lower bound on the objective function,

(36)

For a fixed sequence of deterministic mappings characterizing , the lower bound (V-A) and the LQG cost are fully determined by .

We will now show by induction that is determined by . For , this claim is trivial. For the inductive step, assume that is determined by . Now, consider

and note that can be written as

which is fixed by the sequence due to the measurement characteristics (2), the fact that is a deterministic function of and the induction hypothesis.

Claim 4: First, the differential entropy from (V-A) is rewritten as,

(37)

We now lower bound the mutual information using (V-A),