# Model Reduction of Linear Dynamical Systems via Balancing for Bayesian Inference

We consider the Bayesian approach to the linear Gaussian inference problem of inferring the initial condition of a linear dynamical system from noisy output measurements taken after the initial time. In practical applications, the large dimension of the dynamical system state poses a computational obstacle to computing the exact posterior distribution. Model reduction offers a variety of computational tools that seek to reduce this computational burden. In particular, balanced truncation is a system-theoretic approach to model reduction which obtains an efficient reduced-dimension dynamical system by projecting the system operators onto state directions that trade off reachability against observability, as expressed through the associated Gramians. We introduce Gramian definitions relevant to the inference setting and propose a balanced truncation approach based on these inference Gramians that yields a reduced dynamical system that can be used to cheaply approximate the posterior mean and covariance. Our definitions exploit natural connections between (i) the reachability Gramian and the prior covariance and (ii) the observability Gramian and the Fisher information. The resulting reduced model then inherits stability properties and error bounds from system-theoretic considerations, and in some settings yields an optimal posterior covariance approximation. Numerical demonstrations on two benchmark problems in model reduction show that our method can yield near-optimal posterior covariance approximations with order-of-magnitude state dimension reduction.


## 1 Introduction

The task of updating computational models to take into account empirical observations — a task that plays a central role in data assimilation — occurs in a diverse range of disciplines, including geophysics, oceanography, and atmospheric sciences kalnay_atmospheric_2002; bennett_inverse_2005; palacios2006non; evensen_data_2009. Specific applications include sea ice modeling rounce_quantifying_2020; yang2020tipping, geothermal reservoir model calibration cui_bayesian_2011; cui2019posteriori, and nitrogen concentration of water bodies de2002bayesian. In the Bayesian inferential approach to data assimilation, model parameters p are related to observed data m through a measurement model that combines a forward map G with additive stochastic noise ϵ:

 m=G(p)+ϵ. (1)

Before observations are obtained, uncertainty about the model parameters is encoded in a prior probability distribution for p. The probability distribution of the measurements conditioned on the parameters, m | p, is called the likelihood. After observations are obtained, the Bayesian approach computes an updated posterior distribution for p which reflects the uncertainty in the parameters conditioned on the measured data and is proportional to the product of the prior and the likelihood of the data. Once the posterior distribution has been characterized, subsequent steps in an application may require computing and utilizing posterior statistics. In the linear Gaussian case, this requires computing only a posterior mean and covariance, but for general non-Gaussian posteriors, these statistics are approximated using sampling procedures soize2020sampling (e.g., rejection sampling vonNeumann1951; deligiannidis2020ensemble; devroye1986general or Markov chain Monte Carlo methods miroshnikov2015parallel; apte2007sampling; hensman2015mcmc). Modern overviews of inference techniques and algorithms can be found in excellent survey articles stuart_inverse_2010; dashti_bayesian_2017 and textbooks tarantola_inverse_2005; calvetti_introduction_2007; law_data_2015.

In practical settings, several factors may pose computational challenges to the use of Bayesian inference. First, the parameter dimension is often large; e.g., when p represents a high-dimensional spatial discretization of a continuous field. Second, it is common to be interested in parameter values p at an initial time, with observations m taken after the initial time. In these settings, the forward model may involve simulation of a high-dimensional dynamical system, making the characterization of the posterior distribution computationally expensive.

To reduce this computational burden, one class of methods exploits the fact that in practice the data are only informative in a low-dimensional subspace of the parameter space. We highlight the approach of spantini_optimal_2015 for the linear Gaussian setting, which defines the posterior covariance as a negative semi-definite update to the prior covariance. The work spantini_optimal_2015 shows that the optimal low-rank approximation to this update lies in a subspace spanned by directions identified from dominant eigenvalue-eigenvector pairs of a matrix pencil that encodes a trade-off between the prior uncertainty and the Fisher information of the measurement model. This low-rank approximation performs well when the data dimension is small relative to the parameter dimension, as is common in data assimilation problems for weather and climate kalnay_atmospheric_2002; bennett_inverse_2005. More broadly, this approach is beneficial in cases where the eigenvalues of the matrix pencil exhibit rapid decay, which can occur even when the data dimension is large: some examples are discussed in the introduction of spantini_optimal_2015. Extensions of the approach in spantini_optimal_2015 have been used to develop low-rank approximation strategies for inverse problems with nonlinear forward models cui2014likelihood, goal-oriented notions of optimality spantini2017goal, and low-rank tensor-based approaches to reducing computational complexity in time as well as state benner2018compression.

A complementary class of methods employs model reduction, which seeks to reduce the expense of evaluating the forward model G. In projection-based model reduction, a high-dimensional linear dynamical system is approximated by projecting the system operators onto a low-dimensional subspace, yielding an inexpensive low-dimensional dynamical system gugercin2004survey; antoulas2005approximation; benner_survey_2015; antoulas_interpolatory_2020. One popular strategy known as balanced truncation moore1981principal; mullis1976synthesis defines the projection subspace as the span of the dominant eigenvectors of a matrix pencil that encodes a trade-off between states that the system can easily reach and states that contribute more to the system output. States are retained if they are both highly observable and easily reachable; otherwise, they are discarded. The resulting reduced-order dynamics evolve in a way that approximates the input-output map of the original high-dimensional dynamical system with high fidelity, frequently with far smaller system order gugercin2004survey; lieberman2013goal; benner2017balanced; petreczky2012theoretical.

This work considers the Bayesian inference setting where the parameter to be inferred is the initial state (at time 0) of a linear dynamical system, which is presumed to have a Gaussian prior distribution. A linear observer views the evolving state at selected times in the presence of Gaussian measurement noise. The primary goal of this work is then to propose a new method of model reduction for linear dynamical systems in this setting that exploits natural connections between (i) the optimal posterior approximation strategy of spantini_optimal_2015 and (ii) the system-theoretic model reduction method of balanced truncation. Our contributions are:

1. We define a notion of prior compatibility with system dynamics that allows the prior covariance matrix to be interpreted as a reachability Gramian in the inference context.

2. We propose two strategies for choosing a compatible prior: (i) by subjecting the system to a stochastic driving force at negative times to “spin up” the system state to a stationary distribution, and (ii) by modifying an incompatible prior to make it compatible. The spin-up strategy is related to ideas from the work redmann_balanced_2018 which considers stochastic dynamical systems.

3. We define a noisy observability Gramian that modifies the usual system-theoretic observability Gramian definition to account for additive noise in measurements.

4. We argue that the noisy observability Gramian is a natural continuous-time analogue of the Fisher information matrix, and we introduce two observation processes under which the Fisher information matrix recovers (in expectation/in the continuum limit) the noisy observability Gramian up to a scaling constant.

5. We propose a model reduction approach for our inference setting that applies the method of balanced truncation using a compatible prior covariance and the noisy observability Gramian, which inherits system-theoretic stability and error guarantees.

6. We show that the proposed approach can recover the optimal posterior covariance approximation of spantini_optimal_2015 in certain cases.

7. We demonstrate numerically that the proposed approach yields near-optimal approximations when the noisy observability Gramian and Fisher information matrix define similar approximation subspaces, which occurs in our numerical experiments when measurements are closely spaced over a long observation interval. Even when measurements are limited and the approximation subspaces are not particularly close, the proposed approach yields approximations that achieve order-of-magnitude system state dimension reduction with low, albeit sub-optimal, errors.

We note that spantini_optimal_2015 shows that the optimal low-rank posterior update coincides with an oblique projection of the high-dimensional forward map G. Our contributions extend this result in two ways: first, we show that the low-rank update is in fact immediately associated with a low-dimensional reduced model for the linear dynamical system. This enables efficient evolution of the reduced dynamical system for low-rank inference in settings where the map G is only available implicitly through a high-dimensional linear dynamical systems model. Second, we show that with certain observation models, this reduced model has bounded error and inherits stability. We also note that an earlier work lieberman2013goal developed a goal-oriented low-rank approach to classical and statistical inverse problems that was also inspired by balanced truncation but departs from the dynamical system setting: the parameter is inferred in a low-dimensional subspace defined by an ‘experiment observability Gramian’ and a goal-oriented ‘prediction observability Gramian’. In contrast, we show that balanced truncation for dynamical systems can be used to develop reduced dynamical systems with desirable properties.

Section 2 introduces our Bayesian inference problem for the initial condition of a linear dynamical system, summarizes the balanced truncation approach to model reduction, and presents the low rank inference procedure via optimal posterior updates from spantini_optimal_2015. Section 3 introduces our Gramian definitions for the inference setting, proposes the balanced truncation posterior approximation approach based on these Gramians, and proves results regarding the error, stability, and optimal posterior covariance approximation of the resulting model. These theoretical results are based on specific relationships between system theoretic Gramians and matrices appearing in inference that are discussed in detail in Section 4. Section 5 presents and discusses numerical demonstrations of the proposed method on two benchmark problems. Section 6 summarizes our discussion and considers directions for further investigation.

## 2 Setting and background

In this section, we introduce our notation (Section 2.1) and formulate the Bayesian inverse problem we consider for a high-dimensional linear dynamical system (Section 2.2). We summarize two approaches to dimension reduction relevant in our setting: the system-theoretic approach to model reduction of linear time-invariant systems (Section 2.3) and the low rank inference approach of spantini_optimal_2015 (Section 2.4).

### 2.1 Notation

Vectors will be denoted by boldface lowercase letters, e.g., x, and matrices by boldface uppercase letters, e.g., A. We will use ∥·∥ to denote the Euclidean (ℓ²) norm, and for symmetric positive (semi-)definite M we write ∥x∥²_M ≡ x⊤Mx for the squared weighted ℓ²-(semi)norm. For square matrices A and B of the same size, A ≻ B and A ⪰ B mean that A − B is positive definite and positive semidefinite, respectively, with analogous meanings for ≺ and ⪯. The positive integers d, d_in, and d_out denote the dimensions of the state, input, and output vectors of our dynamical system. Random vectors (i.e., vectors of random variables) will not be denoted differently from (deterministic) vectors, but their properties will be noted when they are introduced. The expectation of a random vector x is denoted E[x], and the covariance of two random vectors x and y is denoted Cov(x, y).

### 2.2 Problem formulation

We consider the following linear dynamical system:

 dx/dt = Ax, (2)

where x(t) ∈ R^d and A ∈ R^{d×d}. The initial state x(0) = p is unknown. Noisy measurements of the system output are taken at times t_1, …, t_n according to the following measurement model:

 m_i ≡ C x(t_i) + ϵ_i = C e^{A t_i} p + ϵ_i, (3)

where C ∈ R^{d_out×d} is the state-to-output operator and ϵ_i represents independently and identically distributed additive Gaussian noise, ϵ_i ∼ N(0, Γ_ϵ), with positive definite covariance Γ_ϵ. In our analysis, we will consider both finite and infinite numbers of measurements n, and we will allow the observation times to be either deterministic or randomly distributed. Random observation times are to be interpreted in the sense that they are uncertain at the beginning of the observation process, but become known as measurements are taken. The random observation model is introduced principally to facilitate discussion of the linkage between observability Gramians and Fisher information in Section 4.2.3.

Our task is to infer the unknown initial state p from the noisy measurements in eq. 3. We use a Bayesian statistical approach to do so; i.e., we take as given a Gaussian prior:

 p∼N(0,Γpr) (4)

with prior covariance Γ_pr ≻ 0. The Gaussian prior, combined with our linear Gaussian measurement model (eq. 3), yields the following Gaussian likelihood conditioned on the measurement times t ≡ (t_1, …, t_n):

 m∣(p,t)∼N(Gp,Γobs), (5)

where m, G, and Γ_obs are given as follows:

 m ≡ [m_1⊤, …, m_n⊤]⊤, G ≡ [(C e^{A t_1})⊤, …, (C e^{A t_n})⊤]⊤, Γ_obs ≡ blkdiag(Γ_ϵ, …, Γ_ϵ). (6)

The Bayesian posterior distribution (conditioned on the measurement times) for such a prior and likelihood is again Gaussian:

 p∣(m,t)∼N(μpos,Γpos), (7)

where

 μ_pos = Γ_pos G⊤ Γ_obs^{-1} m, Γ_pos = (H + Γ_pr^{-1})^{-1}, (8)

where H denotes the Fisher information matrix of m with respect to the parameter p, and G_i ≡ C e^{A t_i}:

 H ≡ G⊤ Γ_obs^{-1} G = ∑_{i=1}^n G_i⊤ Γ_ϵ^{-1} G_i = ∑_{i=1}^n e^{A⊤ t_i} C⊤ Γ_ϵ^{-1} C e^{A t_i}. (9)

In practical settings, the high state dimension d can pose challenges to the computation of the conditional posterior statistics (8), especially when G is available only implicitly through a computationally expensive model that evolves the high-dimensional dynamical system (2). Our procedure addresses this challenge by providing a computationally efficient strategy for emulating the dynamical system through model reduction.
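To make the linear Gaussian formulas above concrete, here is a minimal sketch in Python (NumPy/SciPy) that assembles the stacked forward map (6), the Fisher information (9), and the exact posterior statistics (8) on small synthetic data; all matrices and dimensions below are illustrative, not taken from the paper's experiments.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
d, d_out, n = 6, 2, 4                                # state dim, output dim, number of measurements
A = -np.eye(d) + 0.1 * rng.standard_normal((d, d))   # stable illustrative dynamics
C = rng.standard_normal((d_out, d))                  # state-to-output operator
Gamma_pr = np.eye(d)                                 # prior covariance
Gamma_eps = 0.1 * np.eye(d_out)                      # per-measurement noise covariance
times = np.linspace(0.5, 2.0, n)

# Stacked forward map G and block-diagonal Gamma_obs as in (6).
G = np.vstack([C @ expm(A * t) for t in times])
Gamma_obs_inv = np.kron(np.eye(n), np.linalg.inv(Gamma_eps))

# Fisher information (9) and posterior statistics (8).
H = G.T @ Gamma_obs_inv @ G
Gamma_pos = np.linalg.inv(H + np.linalg.inv(Gamma_pr))

# Synthetic data drawn from the measurement model (3), then the posterior mean.
p_true = rng.standard_normal(d)
m = G @ p_true + 0.1 * rng.standard_normal(n * d_out)
mu_pos = Gamma_pos @ G.T @ Gamma_obs_inv @ m
```

Note that this direct computation requires n matrix exponentials of the full d × d operator A; the model reduction approach developed later replaces these with exponentials of an r × r reduced operator.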

### 2.3 System-theoretic model reduction via balanced truncation

Balanced truncation moore1981principal; mullis1976synthesis is a system-theoretic method for projection-based model reduction that yields a cheaply evaluated low-dimensional reduced model whose input-output map approximates that of the original high-dimensional system. Section 2.3.1 describes the system theory setting and Section 2.3.2 presents the balanced truncation method.

#### 2.3.1 System theory background

The usual setting in which balanced truncation is considered involves a linear time-invariant (LTI) system given by

 dx/dt = Ax + Bu(t), (10a) y = Cx, (10b)

where u(t) is a d_in-dimensional input which drives the system through the input port B ∈ R^{d×d_in}, and y(t) = Cx(t) is the d_out-dimensional observer output. This system-theoretic setting differs from our inference setting in eq. 2 through the presence of an input and the lack of noise in the output (compared to the measurements defined in eq. 3).

We assume that the system (10) is stable, i.e., the eigenvalues of A lie in the open left half-plane. The stable system (10) has infinite reachability Gramian P_∞ and observability Gramian Q_y, given by

 P_∞ ≡ ∫_0^∞ e^{At} B B⊤ e^{A⊤t} dt, Q_y ≡ ∫_0^∞ e^{A⊤t} C⊤ C e^{At} dt. (11)

We use subscripts ∞ and y to denote the specific Gramians defined in (11), contrasting with generic reachability and observability Gramians, P and Q, which could be defined either as above or in other ways. Note in particular that Q_y denotes an observability Gramian tied to the LTI output y (10), whereas we will later introduce a Gramian Q_m for the noisy measurements (3). The Gramians are unique solutions to the dual Lyapunov equations

 A P_∞ + P_∞ A⊤ = −BB⊤, A⊤ Q_y + Q_y A = −C⊤C. (12)

These infinite-horizon Gramians for the dynamical system (10) are associated with a reachability energy and an observability energy, defined as quadratic forms with respect to a state x̂ ∈ R^d, respectively, as

 ∥x̂∥²_{P_∞^{-1}} ≡ x̂⊤ P_∞^{-1} x̂, ∥x̂∥²_{Q_y} ≡ x̂⊤ Q_y x̂. (13)

The reachability energy is the minimum L²-norm of a control input, u, required to steer the system from the origin to x̂, potentially over an infinite time horizon. Similarly, the observability energy is the squared L²-norm of the (noise-free) output signal over an infinite time horizon generated by the free evolution of the system initialized at x(0) = x̂ with null input, u ≡ 0. While the Gramian definitions in eq. 11 are perhaps the most commonly used, finite-horizon and discretized/approximate Gramian definitions are also sometimes employed, the latter especially for large-scale problems where solving eq. 12 is computationally intractable. For details, see, e.g., antoulas2005approximation; BennerB2017; benner2013numerical; gugercin2004survey.

A system is reachable if P is positive definite (denoted P ≻ 0) and a system is observable if Q ≻ 0. The system is minimal if it is both reachable and observable. In all that follows, we assume that the system is minimal.
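In practice the Gramians (11) are computed from the Lyapunov equations (12) rather than from the integrals. A hedged sketch using SciPy's Lyapunov solver on an illustrative stable system (note SciPy's `solve_continuous_lyapunov(a, q)` solves a x + x aᵀ = q, so the right-hand sides carry a minus sign):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(1)
d, d_in, d_out = 5, 2, 2
A = -2.0 * np.eye(d) + 0.2 * rng.standard_normal((d, d))  # stable illustrative dynamics
B = rng.standard_normal((d, d_in))
C = rng.standard_normal((d_out, d))

# A P + P A^T = -B B^T  and  A^T Q + Q A = -C^T C, as in (12).
P = solve_continuous_lyapunov(A, -B @ B.T)
Q = solve_continuous_lyapunov(A.T, -C.T @ C)

# Minimality check: both Gramians should be positive definite.
print(np.min(np.linalg.eigvalsh(P)), np.min(np.linalg.eigvalsh(Q)))
```

For a generic (A, B, C) triple both minimum eigenvalues are strictly positive, i.e., the system is reachable and observable, hence minimal.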

#### 2.3.2 Balanced truncation

In projection-based model reduction, a reduced model is obtained by projecting the system operators onto low-dimensional subspaces. In balanced truncation, these subspaces are determined by the dominant eigenvectors of the matrix pencil (Q, P^{-1}), i.e., the eigenvectors v_i associated with the dominant eigenvalues δ_i² satisfying

 Q v_i = δ_i² P^{-1} v_i, (14)

where P is a reachability Gramian and Q is an observability Gramian: the standard balanced truncation approach takes P = P_∞ and Q = Q_y, but we will in Section 3 introduce approaches that use different Gramians. The scalars δ_1 ≥ δ_2 ≥ ⋯ ≥ δ_d > 0 are called the Hankel singular values and are the nonzero singular values of the underlying Hankel operator. We assume that the Hankel singular values are distinct to simplify the presentation. The dominant eigenvectors, v_i, maximize the following Rayleigh quotient:

 (v⊤ Q v)/(v⊤ P^{-1} v) = (∥v∥_Q / ∥v∥_{P^{-1}})². (15)

Thus, directions retained by balanced truncation will be simultaneously easy to reach (low reachability energy) and easy to observe (high observability energy), or will realize some trade-off between these two desirable properties.

The balanced truncation procedure has the following explicit algebraic formulation. Suppose P = RR⊤ and Q = LL⊤ (e.g., through Cholesky factorizations), and let L⊤R = UΔZ⊤ be the singular value decomposition of L⊤R. Then Δ = diag(δ_1, …, δ_d) contains the Hankel singular values. Define T ≡ RZΔ^{-1/2} and note that T is invertible with T^{-1} = Δ^{-1/2}U⊤L⊤. The ordered generalized eigenvectors, v_i, of the pencil (Q, P^{-1}) are, up to scaling, the columns of RZ, and through a renormalization by Δ^{-1/2} become the columns of T. T is a balancing transformation, meaning that in the transformed coordinates the reachability and observability Gramians are equal diagonal matrices. The transformed LTI system is given by

 dx̃/dt = Ã x̃ + B̃ u(t), (16a) ỹ = C̃ x̃, (16b)

with

 x = T x̃, Ã = T^{-1} A T, B̃ = T^{-1} B, C̃ = C T,

and the system (16) is called (principal-axis) balanced because its reachability and observability Gramians P̃ and Q̃ are equal and diagonal, satisfying

 T^{-1} P T^{-⊤} = P̃ = Δ = Q̃ = T⊤ Q T. (17)

Order-r balanced truncation obtains a reduced dynamical system by transforming to the balancing coordinates and truncating to the leading r components of this balanced state. Denote by U_r, Z_r ∈ R^{d×r} the leading r columns of U and Z, respectively, and let Δ_r denote the upper-left r × r block of Δ. Then, let T_r denote the leading r columns of T and let S_r⊤ denote the leading r rows of T^{-1}, i.e.,

 T_r = R Z_r Δ_r^{-1/2} ∈ R^{d×r}, S_r⊤ = Δ_r^{-1/2} U_r⊤ L⊤ ∈ R^{r×d}. (18)

Note that S_r⊤ T_r = I_r. Then, the order-r balanced truncation approximation of (10) is a Petrov-Galerkin approximation given by restricting the state to lie in Ran(T_r) and by enforcing (10) along Ran(S_r). This results in the reduced-order approximation of eq. 10,

 dx_r/dt = A_r x_r + B_r u(t), (19a) y_r = C_r x_r, (19b)

where A_r = S_r⊤ A T_r, B_r = S_r⊤ B, C_r = C T_r, and x_r(0) = S_r⊤ x(0).
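The square-root formulation above can be sketched in a few lines of Python; the system below is synthetic and illustrative, and the Gramians are obtained from the Lyapunov equations (12) before being factored, SVD'd, and truncated as in (18).

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, cholesky, svd

rng = np.random.default_rng(2)
d, d_in, d_out, r = 6, 2, 2, 3
A = -2.0 * np.eye(d) + 0.2 * rng.standard_normal((d, d))  # stable illustrative dynamics
B = rng.standard_normal((d, d_in))
C = rng.standard_normal((d_out, d))

P = solve_continuous_lyapunov(A, -B @ B.T)      # reachability Gramian
Q = solve_continuous_lyapunov(A.T, -C.T @ C)    # observability Gramian

R = cholesky(P, lower=True)                     # P = R R^T
L = cholesky(Q, lower=True)                     # Q = L L^T
U, delta, Zt = svd(L.T @ R)                     # delta holds the Hankel singular values

# Projection matrices (18) and reduced operators (19).
Tr = R @ Zt.T[:, :r] @ np.diag(delta[:r] ** -0.5)
SrT = np.diag(delta[:r] ** -0.5) @ U[:, :r].T @ L.T
Ar, Br, Cr = SrT @ A @ Tr, SrT @ B, C @ Tr
```

The bi-orthogonality S_r⊤T_r = I_r and the stability of A_r guaranteed by Lemma 1 can both be checked numerically on this example.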

Balanced truncation provides guaranteed stability and a priori output error bounds expressed in terms of the Hankel singular values in (14):

###### Lemma 1 (Thm. 7.9 of antoulas2005approximation)

Consider the system (10) with output y and reachability and observability Gramians P_∞ and Q_y defined in (11). If the original system (10) is stable, then the reduced model (19) is both stable and balanced: the infinite-horizon reachability and observability Gramians of the reduced system, P_r and Q_{y_r}, are diagonal and equal, given by

 P_r ≡ ∫_0^∞ e^{A_r t} B_r B_r⊤ e^{A_r⊤ t} dt = ∫_0^∞ e^{A_r⊤ t} C_r⊤ C_r e^{A_r t} dt ≡ Q_{y_r} = diag(δ_1, …, δ_r) ∈ R^{r×r}. (20)

Furthermore, if x(0) = 0, then the output of eq. 19 commits an error bounded by twice the sum of the truncated Hankel singular values:

 ∥y − y_r∥_{L²(R)} ≤ (2 ∑_{j=r+1}^d δ_j) ∥u∥_{L²(R)}. (21)

### 2.4 Low rank inference via optimal posterior covariance approximation

We now summarize the main results of spantini_optimal_2015 for dimension reduction of the linear Gaussian Bayesian inverse problem by optimal approximation of the posterior covariance and mean. This approach is applicable even when G does not entail the evolution of a dynamical system, and thus complements the model reduction approach of Section 2.3.

Let (τ_i², w_i) denote the ordered generalized eigenpairs of the pencil (H, Γ_pr^{-1}), i.e.,

 H w_i = τ_i² Γ_pr^{-1} w_i, (22)

with w_i⊤ Γ_pr^{-1} w_j = δ_ij, where δ_ij denotes the Kronecker delta, and τ_1 ≥ τ_2 ≥ ⋯ ≥ τ_d ≥ 0. Then, note that the posterior covariance, Γ_pos, satisfies

 Γ_pos = (Γ_pr^{-1} + H)^{-1} = Γ_pr − ∑_{i=1}^d (τ_i²/(1 + τ_i²)) w_i w_i⊤. (23)

When there are only a few non-zero eigenvalues τ_i² (e.g., when the data dimension is smaller than d), or when τ_i² decays rapidly with i, the sum in (23) can be well-approximated by a low-rank symmetric positive semidefinite matrix. The work in spantini_optimal_2015 thus considers the optimal approximation of Γ_pos within the following class of rank-r negative semidefinite updates to Γ_pr:

 M_r ≔ {M ∈ R^{d×d} | M = Γ_pr − KK⊤ ≻ 0 with rank(K) ≤ r}. (24)

The distance between the true posterior covariance and an approximation within M_r is measured by the Förstner metric forstner_metric_2003 between symmetric positive definite matrices A, B ≻ 0:

 d_F(A, B) = (∑_{i=1}^d ln²(σ_i))^{1/2}, (25)

where σ_i are the eigenvalues of the pencil (A, B). The Förstner distance is invariant to similarity transformation and satisfies d_F(A, B) = d_F(A^{-1}, B^{-1}). The Förstner distance is widely used in statistics, being the natural Riemannian metric on the manifold of positive-definite (e.g., full-rank covariance) matrices (bhatia_positive_2006, Section 6.1) that can be generalized with appropriate modifications to rank-deficient manifolds of semi-definite matrices bonnabel_riemannian_2010.
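The Förstner distance is a few lines in Python via the generalized symmetric eigensolver; the SPD matrices below are illustrative.

```python
import numpy as np
from scipy.linalg import eigh

def forstner(A, B):
    """Foerstner distance (25): sqrt(sum_i ln^2(sigma_i)), where the sigma_i
    are the generalized eigenvalues of the pencil (A, B)."""
    sigma = eigh(A, B, eigvals_only=True)
    return float(np.sqrt(np.sum(np.log(sigma) ** 2)))

# Illustrative SPD test matrices.
rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4.0 * np.eye(4)
Bmat = np.diag([1.0, 2.0, 3.0, 4.0])
dAB = forstner(A, Bmat)
```

The invariance under inversion of the arguments, d_F(A, B) = d_F(A^{-1}, B^{-1}), follows because inverting both matrices maps each σ_i to 1/σ_i, which leaves ln²(σ_i) unchanged.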

The Förstner-optimal approximation to Γ_pos within the class M_r of rank-r updates,

 Γ̂_pos = argmin_{M ∈ M_r} d_F(Γ_pos, M), (26)

is then given by spantini_optimal_2015:

 Γ̂_pos = Γ_pr − K_* K_*⊤ where K_* K_*⊤ = ∑_{i=1}^r (τ_i²/(1 + τ_i²)) w_i w_i⊤, (27)

with optimal Förstner distance

 d_F(Γ_pos, Γ̂_pos) = min_{M ∈ M_r} d_F(Γ_pos, M) = (∑_{i=r+1}^d ln²(1/(1 + τ_i²)))^{1/2}. (28)

An alternative analysis in spantini_optimal_2015 allows one to associate the optimal approximation (27) with the posterior covariance of a projected forward map. Note that the vectors Γ_pr^{-1} w_i are eigenvectors of H Γ_pr with eigenvalues τ_i². Let Π_r ≡ ∑_{i=1}^r Γ_pr^{-1} w_i w_i⊤ denote the corresponding oblique spectral projector onto span{Γ_pr^{-1} w_1, …, Γ_pr^{-1} w_r}. Then, let

 Ĝ = G Π_r⊤, Ĥ = Ĝ⊤ Γ_obs^{-1} Ĝ, (29)

denote the projected forward map and projected Fisher information, respectively. Then, the optimal posterior covariance approximation (27) can be equivalently expressed as

 Γ̂_pos = (Γ_pr^{-1} + Ĥ)^{-1}. (30)
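The optimal update (27) can be sketched directly from the generalized eigendecomposition (22); the forward map below is synthetic, and its rank is chosen equal to r so that the rank-r update recovers the exact posterior covariance, which makes the construction easy to verify.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(4)
d, r = 8, 3
Gmap = rng.standard_normal((3, d))       # illustrative forward map of rank 3
H = Gmap.T @ Gmap                        # Fisher information (unit noise covariance)
Gamma_pr = np.eye(d)                     # illustrative prior covariance

# Generalized eigenpairs H w_i = tau_i^2 Gamma_pr^{-1} w_i of (22), reordered
# descending; eigh normalizes so that w_i^T Gamma_pr^{-1} w_j = delta_ij.
tau2, W = eigh(H, np.linalg.inv(Gamma_pr))
tau2, W = tau2[::-1], W[:, ::-1]

# Optimal rank-r negative semidefinite update (27).
update = (W[:, :r] * (tau2[:r] / (1.0 + tau2[:r]))) @ W[:, :r].T
Gamma_pos_hat = Gamma_pr - update

# Exact posterior covariance (23) for comparison: since rank(H) = 3 = r here,
# the truncated sum in (23) loses nothing and the approximation is exact.
Gamma_pos = np.linalg.inv(np.linalg.inv(Gamma_pr) + H)
```

For r < rank(H) the same code yields the best rank-r update in the Förstner metric, with the error given by (28).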

The work in spantini_optimal_2015 further considers optimal approximation to the posterior mean in two different classes of mean approximations: a class of low-rank (LR) mean approximations given by

 V_r^{LR} ≔ {v = N m | N ∈ R^{d×n d_out}, rank(N) ≤ r}, (31)

and a class of mean approximations given by low-rank update (LRU) approximations to the posterior covariance:

 V_r^{LRU} ≔ {v = (Γ_pr − N) G⊤ Γ_obs^{-1} m | N ∈ R^{d×d}, rank(N) ≤ r}. (32)

Within V_r^{LR}, the class of low-rank mean approximations (31), the following mean approximation is optimal in the sense that it minimizes the Γ_pos^{-1}-norm error averaged over the joint distribution of the initial condition (with marginal given by the prior) and the data (i.e., the Bayes risk):

 μ̂_pos^{LR} = ∑_{i=1}^r (τ_i/(1 + τ_i²)) w_i w̃_i⊤ m = Γ̂_pos Ĝ⊤ Γ_obs^{-1} m, where w̃_i ≡ Γ_obs^{-1} G w_i / τ_i. (33)

Within V_r^{LRU}, the class of low-rank update mean approximations (32), the optimal mean approximation in the Γ_pos^{-1}-norm Bayes risk is given by spantini_optimal_2015:

 μ̂_pos^{LRU} = Γ̂_pos G⊤ Γ_obs^{-1} m. (34)

Note that in a dynamical systems setting, computation of the posterior approximations Γ̂_pos, μ̂_pos^{LR}, and μ̂_pos^{LRU} requires evolving the full high-dimensional dynamics to apply G and then projecting the result to obtain Ĝ. In the next section, we propose a model reduction approach that yields a low-dimensional dynamical systems model by combining elements from this posterior covariance approximation approach and from balanced truncation.

## 3 Balanced truncation for Bayesian inference

We now explore the application of the balanced truncation procedure of Section 2.3 to obtain a reduced model for the linear dynamical system (2) in an effort to reduce computational cost in our Bayesian inverse problem. To do this, we first negotiate two technical hurdles which frustrate a direct connection: the dynamical system (2) lacks an input and thus a notion of reachability, and the standard infinite-horizon observability Gramian accounts for neither the additive measurement noise nor the specific observation times of the measurement model (3). To reconcile these discrepancies with the inference problem, we introduce suitably modified Gramian definitions in Section 3.1. In Section 3.2, we then present two variations of our proposed balanced truncation approach to Bayesian inference. In  Section 3.3, we establish conditions under which one variation of our proposed method will approximately recover the optimal approximation qualities of the low rank inference procedure established in spantini_optimal_2015, with a faster computation of the forward maps via a stable and balanced reduced approximation of the dynamical system. The second variation we present may appear more natural, though it does not inherit the stability guarantees of a proper balanced truncation. Both variations will be explored in numerical examples in Section 5.

### 3.1 Inference Gramians

We begin by introducing a definition of prior-compatibility which allows the prior covariance, Γ_pr, to be interpreted as a reachability Gramian of a forced system related to (2).

###### Definition 1

The prior covariance, Γ_pr, will be said to be prior-compatible with the state dynamics induced by A if AΓ_pr + Γ_pr A⊤ is negative semidefinite.

If a prior covariance satisfies Definition 1, then there exists some B with BB⊤ = −(AΓ_pr + Γ_pr A⊤) such that the Lyapunov equation (12) is satisfied by P = Γ_pr. Thus, prior-compatibility with the state dynamics means that the prior covariance matrix is the reachability Gramian of an LTI system with dynamics specified by A and an input port matrix B. It is not generally necessary to identify a particular port matrix, B, for Definition 1 to hold, although in some applications B may be naturally provided (for example, if the end goal of the inference problem is to determine a control input). Discussions of how a compatible prior might be either naturally defined or obtained by modifying a non-compatible prior are provided in Section 4.1.
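The compatibility condition of Definition 1 is a matrix-definiteness check, and a port matrix can be read off from a factorization of −(AΓ_pr + Γ_pr A⊤). A hedged sketch, on a deliberately simple illustrative example (for A = −I every SPD prior is compatible, since AΓ_pr + Γ_pr A⊤ = −2Γ_pr):

```python
import numpy as np

def is_prior_compatible(A, Gamma_pr, tol=1e-10):
    """Definition 1: A Gamma_pr + Gamma_pr A^T must be negative semidefinite."""
    S = A @ Gamma_pr + Gamma_pr @ A.T
    return bool(np.max(np.linalg.eigvalsh(S)) <= tol)

d = 4
A = -np.eye(d)                           # simple stable illustrative dynamics
Gamma_pr = np.diag([1.0, 0.5, 2.0, 1.0]) # illustrative SPD prior

# A valid port matrix B with B B^T = -(A Gamma_pr + Gamma_pr A^T), so that
# Gamma_pr solves the Lyapunov equation (12) with this B.
S = A @ Gamma_pr + Gamma_pr @ A.T
B = np.linalg.cholesky(-S)
```

Reversing the sign of A makes the dynamics unstable and the condition fails, since then AΓ_pr + Γ_pr A⊤ = 2Γ_pr ≻ 0.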

We make use of a modified observability Gramian that is motivated by the noisy measurement model (3). Recall that the standard observability Gramian Q_y defines an energy that is the squared L²-norm of the output signal y(t). In our inference context, where the measurement m_i at time t_i is an output polluted by measurement noise ϵ_i, we quantify the discernability of the signal from the noise by the Mahalanobis distance from the equilibrium state 0 to the conditional distribution of m_i:

 D(0, N(C e^{A t_i} p, Γ_ϵ)) ≔ ∥C e^{A t_i} p∥_{Γ_ϵ^{-1}}. (35)

One way to understand why this is a natural way to measure the distance between a distribution (the signal with observation noise) and a point (the stable equilibrium) is to note that for the case of a Gaussian distribution, neighborhoods of the mean of the distribution defined by the Mahalanobis distance contain the most probability relative to any other set with the same Lebesgue measure (Appendix B). This motivates our definition of a modified observability Gramian for noisy measurements.

###### Definition 2

The noisy observability Gramian is given by

 Q_m ≡ ∫_0^∞ e^{A⊤t} C⊤ Γ_ϵ^{-1} C e^{At} dt. (36)

Thus, Q_m defines an energy based on integrating over all positive time the squared Mahalanobis distance between the equilibrium and the noise-polluted outputs. Note that the Fisher information matrix H (9) can be viewed as defining an analogous energy based on summing over discrete measurement times. This analogy motivates an approximation of Q_m in terms of H discussed in subsequent sections.
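Like the standard Gramians, Q_m solves a Lyapunov equation, A⊤Q_m + Q_m A = −C⊤Γ_ϵ^{-1}C, and is equivalently the standard observability Gramian of the noise-whitened output operator Γ_ϵ^{-1/2}C (cf. Lemma 2 below). A hedged sketch on illustrative data:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, sqrtm

rng = np.random.default_rng(5)
d, d_out = 5, 2
A = -2.0 * np.eye(d) + 0.2 * rng.standard_normal((d, d))  # stable illustrative dynamics
C = rng.standard_normal((d_out, d))
Gamma_eps = np.diag([0.5, 2.0])                           # illustrative noise covariance

# Noisy observability Gramian (36) from its Lyapunov equation.
Qm = solve_continuous_lyapunov(A.T, -C.T @ np.linalg.inv(Gamma_eps) @ C)

# Equivalent view: whiten the output and compute the standard Gramian.
C_bar = np.real(sqrtm(np.linalg.inv(Gamma_eps))) @ C
Qy_whitened = solve_continuous_lyapunov(A.T, -C_bar.T @ C_bar)
```

The two computations agree because C̄⊤C̄ = C⊤Γ_ϵ^{-1}C.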

### 3.2 Method

We propose a model reduction approach for the inference problem defined in Section 2.2 that applies the balanced truncation procedure of Section 2.3 by taking as inference Gramians a compatible prior covariance, P = Γ_pr, and the noisy observability Gramian, Q = Q_m. This leads to the following reduced-order model for the linear dynamics (2):

 dx_r/dt = A_r x_r, x_r(0) = S_r⊤ p, (37)

where A_r = S_r⊤ A T_r, and the following reduced-order approximation to the measurement model (3):

 m_i ≈ C_r x_r(t_i) + ϵ_i, (38)

where C_r = C T_r, and ϵ_i ∼ N(0, Γ_ϵ) as before. The reduced dynamics (37) and measurement model (38) define a reduced forward map, G_BT, from the initial condition p to the noise-free outputs:

 G_BT ≡ [(C_r e^{A_r t_1} S_r⊤)⊤, …, (C_r e^{A_r t_n} S_r⊤)⊤]⊤. (39)

We then define a corresponding reduced-order approximate Fisher information matrix,

 H_BT ≡ G_BT⊤ Γ_obs^{-1} G_BT = S_r (∑_{i=1}^n e^{A_r⊤ t_i} C_r⊤ Γ_ϵ^{-1} C_r e^{A_r t_i}) S_r⊤, (40)

and associated posterior mean and covariance approximations:

 μ_pos,BT = Γ_pos,BT G_BT⊤ Γ_obs^{-1} m, Γ_pos,BT = (H_BT + Γ_pr^{-1})^{-1}. (41)

As previously discussed at the end of Section 3.1, the noisy observability Gramian Q_m can be viewed as a continuous-time analogue of H. This suggests a variation of the proposed method that uses the inference Gramians P = Γ_pr and Q = H to define the balanced truncation. We label the two variations of our proposed method as follows:

1. BT-Q, which uses the inference Gramians (P, Q) = (Γ_pr, Q_m) in order to define the posterior approximations in (41), and

2. BT-H, which uses the inference Gramians $\Gamma_{\mathrm{pr}}$ and $H$ in order to define the posterior approximations in (41).

While the BT-H approach may seem to be a more natural choice for our inference setting, we will show in Section 3.3 that the BT-Q approach inherits desirable properties from standard system-theoretic balanced truncation that the BT-H approach generally does not enjoy.

### 3.3 Properties

We note that while the principal goal of the balanced truncation approach in the present work is to provide a reduced model approximation to posterior quantities, the proposed BT-Q approach inherits guarantees from the system-theoretic LTI setting, as described in our first result.

###### Lemma 2

Suppose $\Gamma_{\mathrm{pr}}$ is prior-compatible with the state dynamics (2) as defined in Definition 1. Then, there exists a $B$ so that $\Gamma_{\mathrm{pr}}$ and $Q_m$ are, respectively, the infinite reachability and observability Gramians of the following LTI system:

$$\frac{dx}{dt} = A x + B u(t), \qquad \bar{y}(t) = \bar{C} x(t), \tag{42}$$

where $\bar{C} \equiv \Gamma_\epsilon^{-1/2} C$.

Additionally, the reduced LTI system (19) based on the inference Gramians $\Gamma_{\mathrm{pr}}$ and $Q_m$ is balanced, stable, and satisfies the following Hankel singular value tail bound: if $\delta_r > \delta_{r+1}$, then

$$\left\| \bar{C} x(t) - \bar{C}_r x_r(t) \right\|_{L^2(\mathbb{R})} \le \left( 2 \sum_{j=r+1}^{d} \delta_j \right) \| u \|_{L^2(\mathbb{R})}, \tag{43}$$

where $\delta_1 \ge \delta_2 \ge \cdots \ge \delta_d$ are the Hankel singular values associated with the inference Gramians $\Gamma_{\mathrm{pr}}$ and $Q_m$.

###### Proof

The first part of Lemma 2 follows immediately from the compatibility condition in Definition 1 and from replacing $C$ with $\bar{C}$ in eq. 11. The second part then follows from direct application of Lemma 1 to (42).

###### Corollary 1

The proposed balanced truncation reduced model (37) for the inference dynamical system (2) is stable.

Corollary 1 has practical significance for the proposed BT-Q approach of Section 3.2, because computing the approximate posterior quantities requires evolving the reduced dynamics given by (37): Corollary 1 guarantees that these reduced dynamics preserve the stability of the full dynamics. On the other hand, while the error and balancing guarantees of Lemma 2 for LTI systems are not directly applicable to the inference setting we consider here, they may serve as the basis for future extensions to inference problems with control inputs.

We now turn our attention to the quality of the proposed posterior covariance approximation eq. 41; again, we focus on the BT-Q approach. The BT-Q posterior covariance approximation is by construction suboptimal relative to the optimal approximation scheme of spantini_optimal_2015 discussed in Section 2.4. A natural question is: when does the balanced truncation covariance approximation approach optimality? In comparing the covariance approximations (41) and (30), it is clear that as $r \to d$, both $H_{\mathrm{BT}} \to H$ and $\hat{H} \to H$; i.e., both the balanced truncation Fisher information approximation and the projected Fisher information of spantini_optimal_2015 converge to the true Fisher information, and thus the two posterior covariance approximations both converge to the full posterior covariance. However, since our goal is a low-rank approximation with $r \ll d$, we would like some sense of when we can expect $\Gamma_{\mathrm{pos,BT}}$ to be close to the optimal approximation for general $r < d$.

For $\Gamma_{\mathrm{pos,BT}}$ resulting from the BT-Q approach, approximate agreement with the optimal approximation requires that the dominant generalized eigenvectors of the pencil $(Q_m, \Gamma_{\mathrm{pr}}^{-1})$ be close to the dominant generalized eigenvectors of the pencil $(H, \Gamma_{\mathrm{pr}}^{-1})$. This would follow when the Fisher information $H$ (9) is approximately proportional to the noisy observability Gramian $Q_m$ in Definition 2, that is, $h H \approx Q_m$ for a scaling factor $h > 0$. This would seem to apply to situations in which the observations are of sufficient frequency and duration for the discrete sum in $H$ to provide, up to normalization, a good approximation to the integral in $Q_m$. In Section 4.2, we formalize this idea: we identify the scaling factor $h$ as the (average) inter-observation interval and establish two limits of observation protocols under which we obtain $Q_m$ as a rescaled limit of $H$.
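This proportionality heuristic can be probed numerically. The sketch below (toy matrices are our assumptions, not from the paper) builds $Q_m$ and $H$ for frequent observations and compares the dominant generalized eigenpairs of the two pencils; `scipy.linalg.eigh` with a `b` matrix imposes exactly the normalization $v^\top \Gamma_{\mathrm{pr}}^{-1} v = 1$:

```python
import numpy as np
from scipy.linalg import eigh, expm, solve_continuous_lyapunov, subspace_angles

rng = np.random.default_rng(2)
d = 6
A = rng.standard_normal((d, d))
A = A - (np.max(np.linalg.eigvals(A).real) + 1.0) * np.eye(d)   # stable A (assumption)
C = rng.standard_normal((1, d))
Gamma_eps_inv = np.eye(1)
Lp = np.tril(rng.standard_normal((d, d))) + d * np.eye(d)
Gamma_pr = Lp @ Lp.T                                            # SPD prior (assumption)
Gamma_pr_inv = np.linalg.inv(Gamma_pr)

# Q_m from (36) via a Lyapunov solve
Q_m = solve_continuous_lyapunov(A.T, -C.T @ Gamma_eps_inv @ C)

# Fisher information (9) for t_i = i*h, accumulated by a recurrence on e^{A h}
h, n = 0.01, 2000
Eh = expm(A * h)
term = Eh.T @ (C.T @ Gamma_eps_inv @ C) @ Eh    # contribution at t_1 = h
H = np.zeros((d, d))
for _ in range(n):
    H += term
    term = Eh.T @ term @ Eh                     # advance to the next measurement time

# Generalized eigenpairs of (Q_m, Gamma_pr^{-1}) and (H, Gamma_pr^{-1});
# eigh sorts eigenvalues in ascending order, so dominant vectors are the last columns
delta2, V = eigh(Q_m, Gamma_pr_inv)
tau2, W = eigh(H, Gamma_pr_inv)

# With frequent observations, h*H ~= Q_m: the eigenvalues scale by h and the
# dominant eigenvectors nearly coincide
ratio = h * tau2[-1] / delta2[-1]
angle = subspace_angles(V[:, -1:], W[:, -1:])[0]
print(ratio, angle)
```

For this toy setup the ratio is close to one and the angle between the dominant eigenvectors is small, consistent with the heuristic above.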

Even when the spectral projections based on $Q_m$ and $H$ are close, there remains a difference between $\hat{H}$ and $H_{\mathrm{BT}}$ because the former involves the evolution of the full dynamics represented by the matrix $A$, whereas the proposed balanced truncation approximations require only the evolution of a computationally inexpensive reduced system $A_r$. Remarkably, we can formulate a relation between the balanced truncation approximation to the Fisher information and the optimal low-rank inference approximation without a further assumption about the size of $r$.

To express this connection precisely, we consider the following noisy observability Gramian for the balanced truncation approximation:

$$Q_m^{\mathrm{BT}} \equiv S_r \left( \int_0^\infty e^{A_r^\top t} C_r^\top \Gamma_\epsilon^{-1} C_r\, e^{A_r t}\, dt \right) S_r^\top \in \mathbb{R}^{d \times d}. \tag{44}$$

The quantity $Q_m^{\mathrm{BT}}$ can be viewed as a continuous-time analogue of $H_{\mathrm{BT}}$: it lifts the noisy observability Gramian of the reduced state to the full state via the map $S_r$. Thus, if the sum in $H$ well-approximates the integral in $Q_m$, i.e., $h H \approx Q_m$, we would also expect $h H_{\mathrm{BT}} \approx Q_m^{\mathrm{BT}}$, because $H_{\mathrm{BT}}$ and $Q_m^{\mathrm{BT}}$ have the same sum-to-integral relation as $H$ and $Q_m$, differing only in the use of the reduced operators $A_r$ and $C_r$ vs. the full operators $A$ and $C$.
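Spelled out with the definitions (9), (36), (40), and (44), and writing $h$ for the inter-observation spacing, the two sum-to-integral relations read:

$$h H = h \sum_{i=1}^{n} e^{A^\top t_i} C^\top \Gamma_\epsilon^{-1} C\, e^{A t_i} \;\approx\; \int_0^\infty e^{A^\top t} C^\top \Gamma_\epsilon^{-1} C\, e^{A t}\, dt = Q_m,$$

$$h H_{\mathrm{BT}} = S_r \Big( h \sum_{i=1}^{n} e^{A_r^\top t_i} C_r^\top \Gamma_\epsilon^{-1} C_r\, e^{A_r t_i} \Big) S_r^\top \;\approx\; S_r \Big( \int_0^\infty e^{A_r^\top t} C_r^\top \Gamma_\epsilon^{-1} C_r\, e^{A_r t}\, dt \Big) S_r^\top = Q_m^{\mathrm{BT}},$$

so the same quadrature error governs both approximations, with the reduced operators simply substituted for the full ones.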

###### Proposition 1

Let $(\delta_i^2, v_i)$ and $(\tau_i^2, w_i)$, respectively, denote the ordered generalized eigenpairs of the pencils $(Q_m, \Gamma_{\mathrm{pr}}^{-1})$ and $(H, \Gamma_{\mathrm{pr}}^{-1})$, with normalizations $v_i^\top \Gamma_{\mathrm{pr}}^{-1} v_i = 1$ and $w_i^\top \Gamma_{\mathrm{pr}}^{-1} w_i = 1$. Then, if the dominant $r$ eigenpairs match up to a scaling constant $h > 0$, i.e., if

$$\delta_i^2 = h\,\tau_i^2, \qquad w_i = v_i, \qquad \forall\, i \le r, \tag{45}$$

then $Q_m^{\mathrm{BT}} = h\,\hat{H}$.

###### Proof

The integral in eq. 44 is the noisy observability Gramian of the balanced truncation reduced system and thus, by Lemma 2 it is balanced, i.e.,

$$\int_0^\infty e^{A_r^\top t} C_r^\top \Gamma_\epsilon^{-1} C_r\, e^{A_r t}\, dt = \mathrm{diag}\{\delta_1, \delta_2, \dots, \delta_r\} \equiv \Delta_r. \tag{46}$$

Define $V_r \equiv [v_1, \dots, v_r]$ and note from the definition of $S_r$ in (18) that $V_r = \Gamma_{\mathrm{pr}} S_r \Delta_r^{-1/2}$. Thus, eq. 44 is equivalent to

$$Q_m^{\mathrm{BT}} = \Gamma_{\mathrm{pr}}^{-1} V_r \Delta_r^2 V_r^\top \Gamma_{\mathrm{pr}}^{-1}.$$

Similarly defining $W_r \equiv [w_1, \dots, w_r]$, we have from (29),

$$\hat{H} = \Gamma_{\mathrm{pr}}^{-1} W_r \Sigma_r^2 W_r^\top \Gamma_{\mathrm{pr}}^{-1}$$

with $\Sigma_r \equiv \mathrm{diag}\{\tau_1, \tau_2, \dots, \tau_r\}$. Under the assumption (45), this gives us $Q_m^{\mathrm{BT}} = h\,\hat{H}$.

Proposition 1 provides us with a bridge between $Q_m^{\mathrm{BT}}$ and $\hat{H}$ for any $r$, including $r \ll d$. It suggests the following argument: as noted above, when $h H \approx Q_m$, generally $h H_{\mathrm{BT}} \approx Q_m^{\mathrm{BT}}$ will also hold, and Proposition 1 would then imply $Q_m^{\mathrm{BT}} \approx h \hat{H}$, so that $H_{\mathrm{BT}} \approx \hat{H}$. That is, when $H$ and $Q_m$ are approximately proportional, the BT-Q posterior covariance approximation is close to the optimal one without further assumption on $r$. Making such an argument rigorous would require a formulation of Proposition 1 with approximate rather than exact equalities of the dominant eigenpairs. Rather than pursuing this cumbersome exercise, we instead support the argument with numerical experiments in Section 5. We confirm that in observation scenarios where the Fisher information matrix and noisy observability Gramian are approximately proportional, the BT-Q approach leads to near-optimal posterior covariance approximations.

The preceding discussion provides intuition for what types of measurement models will lead to posterior covariance approximations that are close to the optimal low-rank approach of spantini_optimal_2015. Our analysis and discussion do not address the quality of the posterior mean approximation in (41): while the balanced truncation mean approximation does belong to the class of low-rank mean approximations (31) considered in spantini_optimal_2015, it differs in its use of the balanced truncation adjoint $G_{\mathrm{BT}}^\top$ vs. the projected full adjoint used in the optimal low-rank mean (33) of spantini_optimal_2015. Numerical experiments in Section 5 compare empirically the balanced truncation means with the optimal means of spantini_optimal_2015. We emphasize that in the dynamical system setting, both optimal approximations of spantini_optimal_2015 require the evolution of the full dynamics, whereas the proposed balanced truncation approach requires only the evolution of a computationally inexpensive reduced system.

Until now we have focused on properties of the BT-Q approach; in particular, the result of Proposition 1 relies on the assumption (45) that the dominant eigenspaces of $(Q_m, \Gamma_{\mathrm{pr}}^{-1})$ and $(H, \Gamma_{\mathrm{pr}}^{-1})$ match, and that the associated eigenvalues match up to a constant scale factor. Comparable guarantees for the BT-H approach, which replaces $Q_m$ with $H$ in the balanced truncation, appear unlikely for general choices of the Fisher information matrix: Fisher information matrices cannot generally be interpreted as infinite-horizon observability Gramians of a minimal LTI system, and so Lemma 1 does not directly apply. However, when the dominant generalized eigenvectors of the pencils $(Q_m, \Gamma_{\mathrm{pr}}^{-1})$ and $(H, \Gamma_{\mathrm{pr}}^{-1})$ coincide, the BT-H and BT-Q reduced models are equivalent, as stated in our next result (note that no conditions are imposed on the generalized eigenvalues here).

###### Proposition 2

Let $(\delta_i^2, v_i)$ and $(\tau_i^2, w_i)$, respectively, denote the ordered generalized eigenpairs of the pencils $(Q_m, \Gamma_{\mathrm{pr}}^{-1})$ and $(H, \Gamma_{\mathrm{pr}}^{-1})$, with normalizations $v_i^\top \Gamma_{\mathrm{pr}}^{-1} v_i = 1$ and $w_i^\top \Gamma_{\mathrm{pr}}^{-1} w_i = 1$. Then, if the dominant eigenvectors match, i.e., if