1 Introduction
The task of updating computational models to take into account empirical observations — a task that plays a central role in data assimilation — occurs in a diverse range of disciplines, including geophysics, oceanography, and atmospheric sciences kalnay_atmospheric_2002; bennett_inverse_2005; palacios2006non; evensen_data_2009. Specific applications include sea ice modeling rounce_quantifying_2020; yang2020tipping, geothermal reservoir model calibration cui_bayesian_2011; cui2019posteriori, and nitrogen concentration of water bodies de2002bayesian. In the Bayesian inferential approach to data assimilation, model parameters are related to observed data through a measurement model that combines a forward map with additive stochastic noise:
(1) $\mathbf{m} = \mathbf{G}(\mathbf{x}) + \boldsymbol{\varepsilon}.$
Before observations are obtained, uncertainty about the model parameters is encoded in a prior probability distribution for the parameters. The probability distribution of the measurements conditioned on the parameters is called the likelihood. After observations are obtained, the Bayesian approach computes an updated posterior distribution which reflects the uncertainty in the parameters conditioned on the measured data and is proportional to the product of the prior and the likelihood of the data. Once the posterior distribution has been characterized, subsequent steps in an application may require computing and utilizing posterior statistics. In the linear Gaussian case, this requires computing only a posterior mean and covariance, but for general non-Gaussian posteriors, these statistics are approximated using sampling procedures soize2020sampling (e.g., rejection sampling vonNeumann1951; deligiannidis2020ensemble; devroye1986general or Markov chain Monte Carlo methods miroshnikov2015parallel; apte2007sampling; hensman2015mcmc). Modern overviews of inference techniques and algorithms can be found in excellent survey articles stuart_inverse_2010; dashti_bayesian_2017 and textbooks tarantola_inverse_2005; calvetti_introduction_2007; law_data_2015.
In practical settings, several factors may pose computational challenges to the use of Bayesian inference. First, the parameter dimension is often large, e.g., when the parameter represents a high-dimensional spatial discretization of a continuous field. Second, it is common to be interested in parameter values at an initial time, with observations taken after the initial time. In these settings, the forward model may involve simulation of a high-dimensional dynamical system, making the characterization of the posterior distribution computationally expensive.
To reduce this computational burden, one class of methods exploits the fact that in practice the data are only informative in a low-dimensional subspace of the parameter space. We highlight the approach of spantini_optimal_2015 for the linear Gaussian setting, which defines the posterior covariance as a negative semidefinite update to the prior covariance. The work spantini_optimal_2015 shows that the optimal low-rank approximation to this update lies in a subspace spanned by directions identified from dominant eigenvalue-eigenvector pairs of a matrix pencil that encodes a tradeoff between the prior uncertainty and the Fisher information of the measurement model. This low-rank approximation performs well when the data are much lower-dimensional than the parameters, as is common in data assimilation problems for weather and climate kalnay_atmospheric_2002; bennett_inverse_2005. More broadly, this approach is beneficial in cases where the eigenvalues of the matrix pencil exhibit rapid decay, which can occur even when the data dimension is large: some examples are discussed in the introduction of spantini_optimal_2015. Extensions of the approach in spantini_optimal_2015 have been used to develop low-rank approximation strategies for inverse problems with nonlinear forward models cui2014likelihood, goal-oriented notions of optimality spantini2017goal, and low-rank tensor-based approaches to reducing computational complexity in time as well as state benner2018compression.
A complementary class of methods employs model reduction, which seeks to reduce the expense of evaluating the forward model. In projection-based model reduction, a high-dimensional linear dynamical system is approximated by projecting the system operators onto a low-dimensional subspace, yielding an inexpensive low-dimensional dynamical system gugercin2004survey; antoulas2005approximation; benner_survey_2015; antoulas_interpolatory_2020. One popular strategy known as balanced truncation moore1981principal; mullis1976synthesis defines the projection subspace as the span of the dominant eigenvectors of a matrix pencil that encodes a tradeoff between states that the system can easily reach and states that contribute more to the system output. States are retained if they are both highly observable and easily reachable; otherwise, they are discarded. The resulting reduced-order dynamics approximate the input-output map of the original large-scale dynamical system with high fidelity, frequently with far smaller system order gugercin2004survey; lieberman2013goal; benner2017balanced; petreczky2012theoretical.
1.1 Contributions of this article
This work considers the Bayesian inference setting where the parameter to be inferred is the initial state (at the initial time) of a linear dynamical system, which is presumed to have a Gaussian prior distribution. A linear observer views the evolving state at selected times in the presence of Gaussian measurement noise. The primary goal of this work is then to propose a new method of model reduction for linear dynamical systems in this setting that exploits natural connections between (i) the optimal posterior approximation strategy of spantini_optimal_2015 and (ii) the system-theoretic model reduction method of balanced truncation. Our contributions are:

We define a notion of prior compatibility with system dynamics that allows the prior covariance matrix to be interpreted as a reachability Gramian in the inference context.

We propose two strategies for choosing a compatible prior: (i) by subjecting the system to a stochastic driving force at negative times to “spin up” the system state to a stationary distribution, and (ii) by modifying an incompatible prior to make it compatible. The spin-up strategy is related to ideas from the work redmann_balanced_2018, which considers stochastic dynamical systems.

We define a noisy observability Gramian that modifies the usual systemtheoretic observability Gramian definition to account for additive noise in measurements.

We argue that the noisy observability Gramian is a natural continuous-time analogue of the Fisher information matrix, and we introduce two observation processes under which the Fisher information matrix recovers (in expectation or in a continuum limit) the noisy observability Gramian up to a scaling constant.

We propose a model reduction approach for our inference setting that applies the method of balanced truncation using a compatible prior covariance and the noisy observability Gramian, which inherits system-theoretic stability and error guarantees.

We show that the proposed approach can recover the optimal posterior covariance approximation of spantini_optimal_2015 in certain cases.

We demonstrate numerically that the proposed approach yields near-optimal approximations when the noisy observability Gramian and Fisher information matrix define similar approximation subspaces, which occurs in our numerical experiments when measurements are closely spaced over a long observation interval. Even when measurements are limited and the approximation subspaces are not particularly close, the proposed approach yields approximations that achieve order-of-magnitude system state dimension reduction with low, albeit suboptimal, errors.
We note that spantini_optimal_2015 shows that the optimal low-rank posterior update coincides with an oblique projection of the high-dimensional forward map. Our contributions extend this result in two ways: first, we show that the low-rank update is in fact immediately associated with a low-dimensional reduced model for the linear dynamical system. This enables efficient evolution of the reduced dynamical system for low-rank inference in settings where the forward map is only available implicitly through a high-dimensional linear dynamical systems model. Second, we show that with certain observation models, this reduced model has bounded error and inherits stability. We also note that an earlier work, lieberman2013goal, developed a goal-oriented low-rank approach to classical and statistical inverse problems that was also inspired by balanced truncation but departs from the dynamical system setting: the parameter is inferred in a low-dimensional subspace defined by an ‘experiment observability Gramian’ and a goal-oriented ‘prediction observability Gramian’. In contrast, we show that balanced truncation for dynamical systems can be used to develop reduced dynamical systems with desirable properties.
Section 2 introduces our Bayesian inference problem for the initial condition of a linear dynamical system, summarizes the balanced truncation approach to model reduction, and presents the low-rank inference procedure via optimal posterior updates from spantini_optimal_2015. Section 3 introduces our Gramian definitions for the inference setting, proposes the balanced truncation posterior approximation approach based on these Gramians, and proves results regarding the error, stability, and optimal posterior covariance approximation of the resulting model. These theoretical results are based on specific relationships between system-theoretic Gramians and matrices appearing in inference that are discussed in detail in Section 4. Section 5 presents and discusses numerical demonstrations of the proposed method on two benchmark problems. Section 6 summarizes our discussion and considers directions for further investigation.
2 Setting and background
In this section, we introduce our notation (Section 2.1) and formulate the Bayesian inverse problem we consider for a high-dimensional linear dynamical system (Section 2.2). We summarize two approaches to dimension reduction relevant in our setting: the system-theoretic approach to model reduction of linear time-invariant systems (Section 2.3) and the low-rank inference approach of spantini_optimal_2015 (Section 2.4).
2.1 Notation
Vectors will be denoted by boldface lowercase letters, e.g., $\mathbf{x}$, and matrices by boldface uppercase letters, e.g., $\mathbf{A}$. We will use $\|\cdot\|$ to denote the Euclidean ($\ell^2$) norm, and, for symmetric positive (semi)definite $\mathbf{M}$, $\|\mathbf{x}\|^2_{\mathbf{M}} = \mathbf{x}^\top\mathbf{M}\,\mathbf{x}$ to denote the squared weighted (semi)norm. For square symmetric matrices $\mathbf{M}$ and $\mathbf{N}$ of the same size, $\mathbf{M} \succ \mathbf{N}$ and $\mathbf{M} \succeq \mathbf{N}$ mean that $\mathbf{M} - \mathbf{N}$ is positive definite and positive semidefinite, respectively, with analogous meanings for $\prec$ and $\preceq$. The positive integers $d$, $p$, and $q$ denote the dimensions of the state, input, and output vectors of our dynamical system. Random vectors (i.e., vectors of random variables) will not be denoted differently from (deterministic) vectors, but their properties will be noted when they are introduced. The expectation of a random vector is denoted $\mathbb{E}[\cdot]$ and the covariance of two random vectors is denoted $\mathrm{Cov}(\cdot,\cdot)$.
2.2 Problem formulation
We consider the following linear dynamical system:
(2) $\dot{\mathbf{x}}(t) = \mathbf{A}\,\mathbf{x}(t), \quad t \geq 0, \qquad \mathbf{x}(0) = \mathbf{x}_0,$
where $\mathbf{A} \in \mathbb{R}^{d \times d}$ and $\mathbf{x}(t) \in \mathbb{R}^{d}$. The initial state $\mathbf{x}_0$ is unknown. Noisy measurements of the system output are taken at times $0 < t_1 < t_2 < \cdots$ according to the following measurement model:
(3) $\mathbf{m}_k = \mathbf{C}\,\mathbf{x}(t_k) + \boldsymbol{\varepsilon}_k, \qquad \boldsymbol{\varepsilon}_k \sim \mathcal{N}\!\left(\mathbf{0}, \boldsymbol{\Gamma}_{\mathrm{obs}}\right), \qquad k = 1, 2, \ldots,$
where $\mathbf{C} \in \mathbb{R}^{q \times d}$ is the state-to-output operator and $\boldsymbol{\varepsilon}_k$ represents independent and identically distributed additive Gaussian noise with positive definite covariance $\boldsymbol{\Gamma}_{\mathrm{obs}}$. In our analysis, we will consider both finitely and infinitely many measurements, and we will allow the observation times to be either deterministic or randomly distributed. Random observation times are to be interpreted in the sense that they are uncertain at the beginning of the observation process, but become known as measurements are taken. The random observation model is introduced principally to facilitate discussion of the linkage between observability Gramians and Fisher information in Section 4.2.3.
Our task is to infer the unknown initial state from the noisy measurements in eq. 3
. We use a Bayesian statistical approach to do so; i.e., we take as given a Gaussian prior:
(4) $\mathbf{x}_0 \sim \mathcal{N}\!\left(\mathbf{0}, \boldsymbol{\Gamma}_{\mathrm{pr}}\right),$
with positive definite prior covariance $\boldsymbol{\Gamma}_{\mathrm{pr}}$. The Gaussian prior, combined with our linear Gaussian measurement model (eq. 3), yields the following Gaussian likelihood conditioned on the measurement times:
(5) $\mathbf{m} \mid \mathbf{x}_0 \sim \mathcal{N}\!\left(\mathbf{G}\,\mathbf{x}_0,\; \boldsymbol{\Gamma}_{\mathrm{noise}}\right),$
where, for $n$ measurements, the stacked measurement vector $\mathbf{m}$, the forward map $\mathbf{G}$, and the block-diagonal noise covariance $\boldsymbol{\Gamma}_{\mathrm{noise}}$ are given as follows:
(6) $\mathbf{m} = \begin{bmatrix} \mathbf{m}_1 \\ \vdots \\ \mathbf{m}_n \end{bmatrix}, \qquad \mathbf{G} = \begin{bmatrix} \mathbf{C}\, e^{\mathbf{A} t_1} \\ \vdots \\ \mathbf{C}\, e^{\mathbf{A} t_n} \end{bmatrix}, \qquad \boldsymbol{\Gamma}_{\mathrm{noise}} = \mathbf{I}_n \otimes \boldsymbol{\Gamma}_{\mathrm{obs}}.$
The Bayesian posterior distribution (conditioned on the measurement times) for such a prior and likelihood is again Gaussian:
(7) $\mathbf{x}_0 \mid \mathbf{m} \sim \mathcal{N}\!\left(\boldsymbol{\mu}_{\mathrm{pos}}, \boldsymbol{\Gamma}_{\mathrm{pos}}\right),$
where
(8) $\boldsymbol{\Gamma}_{\mathrm{pos}} = \left(\mathbf{H} + \boldsymbol{\Gamma}_{\mathrm{pr}}^{-1}\right)^{-1}, \qquad \boldsymbol{\mu}_{\mathrm{pos}} = \boldsymbol{\Gamma}_{\mathrm{pos}}\,\mathbf{G}^\top \boldsymbol{\Gamma}_{\mathrm{noise}}^{-1}\,\mathbf{m},$
where $\mathbf{H}$ denotes the Fisher information matrix of the measurements $\mathbf{m}$ with respect to the parameter $\mathbf{x}_0$:
(9) $\mathbf{H} = \mathbf{G}^\top \boldsymbol{\Gamma}_{\mathrm{noise}}^{-1}\,\mathbf{G} = \sum_{k=1}^{n} e^{\mathbf{A}^\top t_k}\, \mathbf{C}^\top \boldsymbol{\Gamma}_{\mathrm{obs}}^{-1}\, \mathbf{C}\, e^{\mathbf{A} t_k}.$
In practical settings, the high state dimension $d$ can pose challenges to the computation of the conditional posterior statistics (8), especially when the forward map $\mathbf{G}$ is available only implicitly through a computationally expensive model that evolves the high-dimensional dynamical system (2). Our procedure addresses this challenge by providing a computationally efficient strategy for emulating the dynamical system through model reduction.
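As a concrete reference for the posterior formulas above, the following sketch computes the posterior mean and covariance of the linear Gaussian model with dense matrices. The function name and the use of NumPy are our own illustrative choices; this is exactly the direct computation that becomes impractical at high state dimension.

```python
import numpy as np

def gaussian_posterior(G, m, Gamma_noise, Gamma_pr):
    """Posterior mean and covariance for the linear Gaussian model
    m = G x0 + eps, eps ~ N(0, Gamma_noise), with prior x0 ~ N(0, Gamma_pr)."""
    H = G.T @ np.linalg.solve(Gamma_noise, G)              # Fisher information
    Gamma_pos = np.linalg.inv(H + np.linalg.inv(Gamma_pr))
    mu_pos = Gamma_pos @ (G.T @ np.linalg.solve(Gamma_noise, m))
    return mu_pos, Gamma_pos
```

For small problems this direct computation is a useful baseline against which reduced approximations can be compared.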
2.3 System-theoretic model reduction via balanced truncation
Balanced truncation moore1981principal; mullis1976synthesis is a system-theoretic method for projection-based model reduction that yields a cheaply evaluated low-dimensional reduced model whose input-output map approximates that of the original high-dimensional system. Section 2.3.1 describes the system theory setting and Section 2.3.2 presents the balanced truncation method.
2.3.1 System theory background
The usual setting in which balanced truncation is considered involves a linear time-invariant (LTI) system given by
(10a) $\dot{\mathbf{x}}(t) = \mathbf{A}\,\mathbf{x}(t) + \mathbf{B}\,\mathbf{u}(t),$
(10b) $\mathbf{y}(t) = \mathbf{C}\,\mathbf{x}(t),$
where $\mathbf{u}(t) \in \mathbb{R}^{p}$ is a $p$-dimensional input which drives the system filtered through the input port matrix $\mathbf{B} \in \mathbb{R}^{d \times p}$, and $\mathbf{y}(t) \in \mathbb{R}^{q}$ describes the $q$-dimensional observer output. This system-theoretic setting differs from our inference setting in eq. 2 through the presence of an input and the lack of noise in the output (compared to the measurements defined in eq. 3).
We assume that the system (10) is stable, i.e., the eigenvalues of $\mathbf{A}$ lie in the open left half-plane. The stable system (10) has infinite-horizon reachability Gramian $\mathbf{P}_\infty$ and observability Gramian $\mathbf{Q}_\infty$, given by
(11) $\mathbf{P}_\infty = \int_0^\infty e^{\mathbf{A}t}\,\mathbf{B}\mathbf{B}^\top e^{\mathbf{A}^\top t}\,\mathrm{d}t, \qquad \mathbf{Q}_\infty = \int_0^\infty e^{\mathbf{A}^\top t}\,\mathbf{C}^\top\mathbf{C}\, e^{\mathbf{A}t}\,\mathrm{d}t.$
We use the subscript $\infty$ to denote the specific Gramians defined in (11), contrasting with generic reachability and observability Gramians, $\mathbf{P}$ and $\mathbf{Q}$, which could be defined either as above or in other ways. Note in particular that $\mathbf{Q}_\infty$ denotes an observability Gramian tied to the LTI output (10), whereas we will later introduce a Gramian for noisy measurements (3). The Gramians are unique solutions to the dual Lyapunov equations
(12) $\mathbf{A}\,\mathbf{P}_\infty + \mathbf{P}_\infty\,\mathbf{A}^\top + \mathbf{B}\mathbf{B}^\top = \mathbf{0}, \qquad \mathbf{A}^\top\mathbf{Q}_\infty + \mathbf{Q}_\infty\,\mathbf{A} + \mathbf{C}^\top\mathbf{C} = \mathbf{0}.$
These infinite-horizon Gramians for the dynamical system (10) are associated with a reachability energy and an observability energy, defined as quadratic forms with respect to a state $\mathbf{x}$, respectively, as
(13) $E_r(\mathbf{x}) = \mathbf{x}^\top \mathbf{P}_\infty^{-1}\,\mathbf{x}, \qquad E_o(\mathbf{x}) = \mathbf{x}^\top \mathbf{Q}_\infty\,\mathbf{x}.$
The reachability energy $E_r(\mathbf{x})$ is the minimum norm of a control input $\mathbf{u}$ required to steer the system from the origin to the state $\mathbf{x}$, potentially over an infinite time horizon. Similarly, the observability energy $E_o(\mathbf{x})$ is the norm of the (noise-free) output signal over an infinite time horizon generated by the free evolution of the system initialized at $\mathbf{x}$ with null input, $\mathbf{u} \equiv \mathbf{0}$. While the Gramian definitions in eq. 11 are perhaps the most commonly used, finite-horizon and discretized/approximate Gramian definitions are also sometimes employed, the latter especially for large-scale problems where solving eq. 12 is computationally intractable. For details, see, e.g., antoulas2005approximation; BennerB2017; benner2013numerical; gugercin2004survey.
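For a small dense system, the infinite-horizon Gramians in eq. 11 can be obtained by solving the Lyapunov equations in eq. 12, e.g., with SciPy. The specific matrices below are illustrative values of our own choosing, not from the paper.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Illustrative stable system (eigenvalues of A in the open left half-plane).
A = np.array([[-1.0, 0.5],
              [ 0.0, -2.0]])
B = np.array([[1.0],
              [1.0]])
C = np.array([[1.0, 0.0]])

# Reachability Gramian: A P + P A^T + B B^T = 0.
P = solve_continuous_lyapunov(A, -B @ B.T)
# Observability Gramian: A^T Q + Q A + C^T C = 0.
Q = solve_continuous_lyapunov(A.T, -C.T @ C)
```

Since this example system is minimal, both Gramians come out positive definite.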
A system is reachable if $\mathbf{P}_\infty$ is positive definite (denoted $\mathbf{P}_\infty \succ \mathbf{0}$) and a system is observable if $\mathbf{Q}_\infty \succ \mathbf{0}$. The system is minimal if it is both reachable and observable. In all that follows, we assume that the system is minimal.
2.3.2 Balanced truncation
In projection-based model reduction, a reduced model is obtained by projecting the system operators onto low-dimensional subspaces. In balanced truncation, these subspaces are determined by the dominant eigenvectors of the matrix pencil $(\mathbf{Q}, \mathbf{P}^{-1})$, i.e., the eigenvectors $\mathbf{w}_i$ associated with the dominant eigenvalues $\sigma_i^2$, satisfying
(14) $\mathbf{Q}\,\mathbf{w}_i = \sigma_i^2\,\mathbf{P}^{-1}\,\mathbf{w}_i, \qquad \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_d > 0,$
where $\mathbf{P}$ is a reachability Gramian and $\mathbf{Q}$ is an observability Gramian: the standard balanced truncation approach takes $\mathbf{P} = \mathbf{P}_\infty$ and $\mathbf{Q} = \mathbf{Q}_\infty$, but we will in Section 3 introduce approaches that use different Gramians. The scalars $\sigma_i$, with $\sigma_1 \geq \cdots \geq \sigma_d > 0$, are called the Hankel singular values
and are the nonzero singular values of the underlying Hankel operator. We assume that the Hankel singular values are distinct to simplify the presentation. The dominant eigenvectors $\mathbf{w}_i$ maximize the following Rayleigh quotient:
(15) $\dfrac{\mathbf{x}^\top \mathbf{Q}\,\mathbf{x}}{\mathbf{x}^\top \mathbf{P}^{-1}\,\mathbf{x}}.$
Thus, directions retained by balanced truncation will be simultaneously easy to reach (low reachability energy) and easy to observe (high observability energy), or will realize some tradeoff between these two desirable properties.
The balanced truncation procedure has the following explicit algebraic formulation. Suppose $\mathbf{P} = \mathbf{R}\mathbf{R}^\top$ and $\mathbf{Q} = \mathbf{L}\mathbf{L}^\top$ (e.g., through a Cholesky factorization), and let $\mathbf{L}^\top\mathbf{R} = \mathbf{U}\,\boldsymbol{\Sigma}\,\mathbf{V}^\top$ be the singular value decomposition of $\mathbf{L}^\top\mathbf{R}$. Then $\boldsymbol{\Sigma}$ contains the Hankel singular values. Define $\mathbf{S} = \mathbf{L}\,\mathbf{U}\,\boldsymbol{\Sigma}^{-1/2}$ and $\mathbf{T} = \mathbf{R}\,\mathbf{V}\,\boldsymbol{\Sigma}^{-1/2}$, and note that $\mathbf{T}$ is invertible with $\mathbf{T}^{-1} = \mathbf{S}^\top$. The ordered generalized eigenvectors of the pencil $(\mathbf{Q}, \mathbf{P}^{-1})$ are, up to normalization, the columns of $\mathbf{T}$. The map $\mathbf{x} \mapsto \mathbf{S}^\top\mathbf{x}$ is a balancing transformation, meaning that in the transformed coordinates the reachability and observability Gramians are equal diagonal matrices. The transformed LTI system is given by,
(16a) $\dot{\hat{\mathbf{x}}}(t) = \mathbf{S}^\top\mathbf{A}\,\mathbf{T}\,\hat{\mathbf{x}}(t) + \mathbf{S}^\top\mathbf{B}\,\mathbf{u}(t),$
(16b) $\mathbf{y}(t) = \mathbf{C}\,\mathbf{T}\,\hat{\mathbf{x}}(t),$
with $\hat{\mathbf{x}}(t) = \mathbf{S}^\top\mathbf{x}(t)$, and the system (16) is called (principal-axis) balanced because its reachability and observability Gramians are equal and diagonal, satisfying
(17) $\hat{\mathbf{P}} = \hat{\mathbf{Q}} = \boldsymbol{\Sigma} = \mathrm{diag}(\sigma_1, \ldots, \sigma_d).$
Order-$r$ balanced truncation obtains a reduced dynamical system by transforming to the balancing coordinates and truncating to the leading $r$ components of this balanced state. Denote by $\mathbf{U}_r$ and $\mathbf{V}_r$ the leading $r$ columns of $\mathbf{U}$ and $\mathbf{V}$, respectively, and let $\boldsymbol{\Sigma}_r$ denote the upper-left $r \times r$ block of $\boldsymbol{\Sigma}$. Then, let $\mathbf{T}_r$ denote the leading $r$ columns of $\mathbf{T}$ and let $\mathbf{S}_r^\top$ denote the leading $r$ rows of $\mathbf{S}^\top$, i.e.,
(18) $\mathbf{S}_r = \mathbf{L}\,\mathbf{U}_r\,\boldsymbol{\Sigma}_r^{-1/2}, \qquad \mathbf{T}_r = \mathbf{R}\,\mathbf{V}_r\,\boldsymbol{\Sigma}_r^{-1/2}.$
Note that $\mathbf{S}_r^\top\mathbf{T}_r = \mathbf{I}_r$. Then, the order-$r$ balanced truncation approximation of (10) is a Petrov-Galerkin approximation given by restricting the state to lie in $\operatorname{range}(\mathbf{T}_r)$, and by enforcing (10) on $\operatorname{range}(\mathbf{S}_r)$. This results in the reduced-order approximation of eq. 10,
(19a) $\dot{\mathbf{x}}_r(t) = \mathbf{A}_r\,\mathbf{x}_r(t) + \mathbf{B}_r\,\mathbf{u}(t),$
(19b) $\mathbf{y}_r(t) = \mathbf{C}_r\,\mathbf{x}_r(t),$
where $\mathbf{A}_r = \mathbf{S}_r^\top\mathbf{A}\,\mathbf{T}_r$, $\mathbf{B}_r = \mathbf{S}_r^\top\mathbf{B}$, $\mathbf{C}_r = \mathbf{C}\,\mathbf{T}_r$, and $\mathbf{x}_r(0) = \mathbf{S}_r^\top\mathbf{x}(0)$.
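The square-root formulation described above can be sketched as follows. Function and variable names are ours, and the implementation assumes a small, dense, stable, minimal system so that Cholesky factors and a full SVD are practical.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, cholesky

def balanced_truncation(A, B, C, r):
    """Square-root balanced truncation of a stable, minimal LTI system."""
    P = solve_continuous_lyapunov(A, -B @ B.T)      # reachability Gramian
    Q = solve_continuous_lyapunov(A.T, -C.T @ C)    # observability Gramian
    R = cholesky(P, lower=True)                     # P = R R^T
    L = cholesky(Q, lower=True)                     # Q = L L^T
    U, s, Vt = np.linalg.svd(L.T @ R)               # Hankel singular values s
    Sr = L @ U[:, :r] / np.sqrt(s[:r])              # S_r = L U_r Sigma_r^{-1/2}
    Tr = R @ Vt[:r].T / np.sqrt(s[:r])              # T_r = R V_r Sigma_r^{-1/2}
    Ar = Sr.T @ A @ Tr
    Br = Sr.T @ B
    Cr = C @ Tr
    return Ar, Br, Cr, s
```

The reduced order is typically chosen by inspecting the decay of the returned Hankel singular values.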
Balanced truncation provides guaranteed stability and a priori bounds on the approximation error expressed in terms of the Hankel singular values in (14):
Lemma 1 (Thm. 7.9 of antoulas2005approximation)
Consider the system (10) with output $\mathbf{y}$ and reachability and observability Gramians $\mathbf{P}_\infty$ and $\mathbf{Q}_\infty$ defined in (11). If the original system (10) is stable, then the reduced model (19) is both stable and balanced: the infinite-horizon reachability and observability Gramians of the reduced state, $\mathbf{x}_r$, are diagonal and equal, and given by
(20) $\mathbf{P}_r = \mathbf{Q}_r = \boldsymbol{\Sigma}_r = \mathrm{diag}(\sigma_1, \ldots, \sigma_r).$
Furthermore, if $\mathbf{x}(0) = \mathbf{0}$, then the output $\mathbf{y}_r$ of eq. 19 commits an error bounded by twice the sum of the truncated Hankel singular values:
(21) $\left\| \mathbf{y} - \mathbf{y}_r \right\|_{L^2} \leq 2\left(\sigma_{r+1} + \cdots + \sigma_d\right) \left\| \mathbf{u} \right\|_{L^2}.$
2.4 Low-rank inference via optimal posterior covariance approximation
We now summarize the main results of spantini_optimal_2015 for dimension reduction of the linear Gaussian Bayesian inverse problem by optimal approximation of the posterior covariance and mean. This approach is applicable even when the forward map $\mathbf{G}$ does not entail the evolution of a dynamical system, and thus complements the model reduction approach of Section 2.3.
Let $(\delta_i^2, \hat{\mathbf{w}}_i)$ denote the ordered generalized eigenpairs of the pencil $\left(\mathbf{H}, \boldsymbol{\Gamma}_{\mathrm{pr}}^{-1}\right)$, i.e.,
(22) $\mathbf{H}\,\hat{\mathbf{w}}_i = \delta_i^2\,\boldsymbol{\Gamma}_{\mathrm{pr}}^{-1}\,\hat{\mathbf{w}}_i, \qquad \delta_1^2 \geq \delta_2^2 \geq \cdots \geq \delta_d^2 \geq 0,$
with the normalization $\hat{\mathbf{w}}_i^\top\,\boldsymbol{\Gamma}_{\mathrm{pr}}^{-1}\,\hat{\mathbf{w}}_j = \delta_{ij}$, where $\delta_{ij}$ denotes the Kronecker delta. Then, note that the posterior covariance, $\boldsymbol{\Gamma}_{\mathrm{pos}}$, satisfies
(23) $\boldsymbol{\Gamma}_{\mathrm{pos}} = \boldsymbol{\Gamma}_{\mathrm{pr}} - \sum_{i=1}^{d} \frac{\delta_i^2}{1+\delta_i^2}\,\hat{\mathbf{w}}_i\,\hat{\mathbf{w}}_i^\top.$
When there are only a few nonzero eigenvalues $\delta_i^2$ (e.g., when the number of measurements is small), or when $\delta_i^2$ decays rapidly with $i$, the sum in (23) can be well-approximated by a low-rank symmetric positive semidefinite matrix. The work in spantini_optimal_2015 thus considers the optimal approximation of $\boldsymbol{\Gamma}_{\mathrm{pos}}$ within the following class of rank-$r$ negative semidefinite updates to $\boldsymbol{\Gamma}_{\mathrm{pr}}$:
(24) $\mathcal{M}_r = \left\{ \boldsymbol{\Gamma}_{\mathrm{pr}} - \mathbf{K}\mathbf{K}^\top \succ \mathbf{0} \;:\; \operatorname{rank}(\mathbf{K}) \leq r \right\}.$
The distance between the true posterior covariance and an approximation within $\mathcal{M}_r$ is measured by the Förstner metric forstner_metric_2003 between symmetric positive definite matrices $\mathbf{M}$ and $\mathbf{N}$:
(25) $d_{\mathcal{F}}(\mathbf{M}, \mathbf{N}) = \left( \sum_{i=1}^{d} \ln^2 \sigma_i \right)^{1/2},$
where $\sigma_i$ are the eigenvalues of the pencil $(\mathbf{M}, \mathbf{N})$. The Förstner distance is invariant under congruence transformations and under matrix inversion, i.e., $d_{\mathcal{F}}(\mathbf{M}, \mathbf{N}) = d_{\mathcal{F}}(\mathbf{M}^{-1}, \mathbf{N}^{-1})$. The Förstner distance is widely used in statistics, being the natural Riemannian metric on the manifold of positive-definite (e.g., full-rank covariance) matrices (bhatia_positive_2006, Section 6.1) that can be generalized with appropriate modifications to rank-deficient manifolds of semidefinite matrices bonnabel_riemannian_2010.
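The Förstner metric is straightforward to evaluate from the generalized eigenvalues of the pencil; a possible sketch with our own naming:

```python
import numpy as np
from scipy.linalg import eigh

def forstner_distance(M, N):
    """Forstner distance between symmetric positive definite M and N:
    the square root of the sum of squared logarithms of the generalized
    eigenvalues of the pencil (M, N)."""
    sigma = eigh(M, N, eigvals_only=True)   # solves M v = sigma N v
    return float(np.sqrt(np.sum(np.log(sigma) ** 2)))
```

The invariance properties mentioned above (under congruence transformations and matrix inversion) follow directly from the corresponding invariances of the generalized eigenvalues.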
The Förstner-optimal approximation to $\boldsymbol{\Gamma}_{\mathrm{pos}}$ within the class $\mathcal{M}_r$ of rank-$r$ updates,
(26) $\hat{\boldsymbol{\Gamma}}_{\mathrm{pos}} = \operatorname*{arg\,min}_{\boldsymbol{\Gamma} \in \mathcal{M}_r}\; d_{\mathcal{F}}\!\left(\boldsymbol{\Gamma}, \boldsymbol{\Gamma}_{\mathrm{pos}}\right),$
is then given by spantini_optimal_2015:
(27) $\hat{\boldsymbol{\Gamma}}_{\mathrm{pos}} = \boldsymbol{\Gamma}_{\mathrm{pr}} - \sum_{i=1}^{r} \frac{\delta_i^2}{1+\delta_i^2}\,\hat{\mathbf{w}}_i\,\hat{\mathbf{w}}_i^\top,$
with optimal Förstner distance
(28) $d_{\mathcal{F}}\!\left(\hat{\boldsymbol{\Gamma}}_{\mathrm{pos}}, \boldsymbol{\Gamma}_{\mathrm{pos}}\right) = \left( \sum_{i > r} \ln^2\!\left(1 + \delta_i^2\right) \right)^{1/2}.$
An alternative analysis in spantini_optimal_2015 allows one to associate the optimal approximation (27) with the posterior covariance of a projected forward map. Note that the pairs $(\delta_i^2, \hat{\mathbf{w}}_i)$ with $\delta_i^2 > 0$ are the generalized eigenpairs of the pencil in (22). Let $\boldsymbol{\Pi}_r$ denote the oblique spectral projector onto $\operatorname{span}\{\hat{\mathbf{w}}_1, \ldots, \hat{\mathbf{w}}_r\}$. Then, let
(29) $\hat{\mathbf{G}} = \mathbf{G}\,\boldsymbol{\Pi}_r, \qquad \hat{\mathbf{H}} = \hat{\mathbf{G}}^\top\,\boldsymbol{\Gamma}_{\mathrm{noise}}^{-1}\,\hat{\mathbf{G}}$
denote the projected forward map and projected Fisher information, respectively. Then, the optimal posterior covariance approximation (27) can be equivalently expressed as
(30) $\hat{\boldsymbol{\Gamma}}_{\mathrm{pos}} = \left(\hat{\mathbf{H}} + \boldsymbol{\Gamma}_{\mathrm{pr}}^{-1}\right)^{-1}.$
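The optimal update (27) is simple to form once the generalized eigenpairs of the pencil in (22) are available. The following sketch uses SciPy's generalized symmetric eigensolver, whose eigenvector normalization matches the one used above; names are illustrative, and the explicit prior inversion is for clarity only.

```python
import numpy as np
from scipy.linalg import eigh

def optimal_lowrank_posterior_cov(H, Gamma_pr, r):
    """Rank-r optimal negative semidefinite update of the prior covariance,
    built from the dominant generalized eigenpairs of (H, inv(Gamma_pr))."""
    Gpr_inv = np.linalg.inv(Gamma_pr)   # explicit inverse for clarity only
    # eigh returns eigenvectors normalized so that W^T Gpr_inv W = I.
    d2, W = eigh(H, Gpr_inv)
    d2, W = d2[::-1], W[:, ::-1]        # sort eigenpairs in descending order
    coeff = d2[:r] / (1.0 + d2[:r])
    return Gamma_pr - (W[:, :r] * coeff) @ W[:, :r].T
```

A convenient correctness check: when the rank budget covers all nonzero eigenvalues, the update reproduces the exact posterior covariance in (23).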
The work in spantini_optimal_2015 further considers optimal approximation to the posterior mean in two different classes of mean approximations: a class of low-rank (LR) mean approximations given by
(31) $\left\{ \boldsymbol{\Phi}\,\mathbf{m} \;:\; \operatorname{rank}(\boldsymbol{\Phi}) \leq r \right\},$
and a class of mean approximations given by low-rank update (LRU) approximations to the posterior covariance:
(32) $\left\{ \left(\boldsymbol{\Gamma}_{\mathrm{pr}} - \mathbf{K}\mathbf{K}^\top\right) \mathbf{G}^\top\,\boldsymbol{\Gamma}_{\mathrm{noise}}^{-1}\,\mathbf{m} \;:\; \operatorname{rank}(\mathbf{K}) \leq r \right\}.$
Within the class of low-rank mean approximations (31), the following mean approximation is optimal in the sense that it minimizes the norm error averaged over the joint distribution of the initial condition (with marginal given by the prior) and the data (i.e., the Bayes risk):
(33) 
Within the class of low-rank update mean approximations (32), the optimal mean approximation in the same Bayes risk sense is given by spantini_optimal_2015:
(34) $\hat{\boldsymbol{\mu}}_{\mathrm{pos}} = \hat{\boldsymbol{\Gamma}}_{\mathrm{pos}}\,\mathbf{G}^\top\,\boldsymbol{\Gamma}_{\mathrm{noise}}^{-1}\,\mathbf{m}.$
Note that in a dynamical systems setting, computation of the optimal posterior approximations above requires evolving the full high-dimensional dynamics (2) to apply the forward map and its adjoint, and then projecting the result. In the next section, we propose a model reduction approach that yields a low-dimensional dynamical systems model by combining elements from this posterior covariance approximation approach and from balanced truncation.
3 Balanced truncation for Bayesian inference
We now explore the application of the balanced truncation procedure of Section 2.3 to obtain a reduced model for the linear dynamical system (2) in an effort to reduce computational cost in our Bayesian inverse problem. To do this, we first negotiate two technical hurdles which frustrate a direct connection: the dynamical system (2) lacks an input and thus a notion of reachability, and the standard infinite-horizon observability Gramian accounts for neither the additive measurement noise nor the specific observation times of the measurement model (3). To reconcile these discrepancies with the inference problem, we introduce suitably modified Gramian definitions in Section 3.1. In Section 3.2, we then present two variations of our proposed balanced truncation approach to Bayesian inference. In Section 3.3, we establish conditions under which one variation of our proposed method will approximately recover the optimal approximation qualities of the low-rank inference procedure established in spantini_optimal_2015, with a faster computation of the forward maps via a stable and balanced reduced approximation of the dynamical system. The second variation we present may appear more natural, though it does not inherit the stability guarantees of a proper balanced truncation. Both variations will be explored in numerical examples in Section 5.
3.1 Inference Gramians
We begin by introducing a definition of prior-compatibility which allows the prior covariance $\boldsymbol{\Gamma}_{\mathrm{pr}}$ to be interpreted as a reachability Gramian of a forced system related to (2).
Definition 1
The prior covariance $\boldsymbol{\Gamma}_{\mathrm{pr}}$ will be said to be prior-compatible with the state dynamics induced by $\mathbf{A}$ if $\mathbf{A}\,\boldsymbol{\Gamma}_{\mathrm{pr}} + \boldsymbol{\Gamma}_{\mathrm{pr}}\,\mathbf{A}^\top$ is negative semidefinite.
If a prior covariance satisfies Definition 1, then there exists some $\mathbf{B}$ with $\mathbf{B}\mathbf{B}^\top = -\left(\mathbf{A}\,\boldsymbol{\Gamma}_{\mathrm{pr}} + \boldsymbol{\Gamma}_{\mathrm{pr}}\,\mathbf{A}^\top\right)$ such that the first Lyapunov equation in (12) is satisfied with $\mathbf{P}_\infty = \boldsymbol{\Gamma}_{\mathrm{pr}}$. Thus, prior-compatibility with the state dynamics means that the prior covariance matrix is the reachability Gramian of an LTI system with dynamics specified by $\mathbf{A}$ and an input port matrix $\mathbf{B}$. It is not generally necessary to identify a particular port matrix $\mathbf{B}$ for Definition 1 to hold, although in some applications $\mathbf{B}$ may be naturally provided (for example, if the end goal of the inference problem is to determine a control input). Discussions of how a compatible prior might be either naturally defined or obtained by modifying a non-compatible prior are provided in Section 4.1.
We make use of a modified observability Gramian that is motivated by the noisy measurement model for the output. Recall that the standard observability Gramian defines an energy that is the squared norm of the output signal. In our inference context, where the measurement at time $t_k$ is an output polluted by measurement noise $\boldsymbol{\varepsilon}_k$, we quantify the discernibility of the signal from the noise by the Mahalanobis distance from the equilibrium state to the conditional distribution of the measurement:
(35) $\left(\mathbf{C}\,\mathbf{x}(t_k)\right)^\top \boldsymbol{\Gamma}_{\mathrm{obs}}^{-1} \left(\mathbf{C}\,\mathbf{x}(t_k)\right).$
One way to understand why this is a natural way to measure the distance between a distribution (the signal with observation noise) and a point (the stable equilibrium) is to note that, for the case of a Gaussian distribution, neighborhoods of the mean of the distribution defined by the Mahalanobis distance contain the most probability relative to any other set with the same Lebesgue measure (Appendix B). This motivates our definition of a modified observability Gramian for noisy measurements.
Definition 2
The noisy observability Gramian is given by
(36) $\mathbf{Q}^{\mathrm{noisy}}_{\infty} = \int_0^\infty e^{\mathbf{A}^\top t}\,\mathbf{C}^\top\,\boldsymbol{\Gamma}_{\mathrm{obs}}^{-1}\,\mathbf{C}\,e^{\mathbf{A}t}\,\mathrm{d}t.$
Thus, $\mathbf{Q}^{\mathrm{noisy}}_{\infty}$ defines an energy based on integrating over all positive time the squared Mahalanobis distance between the equilibrium and the noise-polluted outputs. Note that the Fisher information matrix (9) can be viewed as defining an analogous energy based on summing over discrete measurement times. This analogy motivates an approximation of $\mathbf{Q}^{\mathrm{noisy}}_{\infty}$ in terms of $\mathbf{H}$, discussed in subsequent sections.
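Because the noisy observability Gramian differs from the standard one in eq. 11 only by the weighting by the inverse noise covariance, it satisfies an analogous Lyapunov equation and can be computed the same way. A sketch with illustrative naming:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def noisy_observability_gramian(A, C, Gamma_obs):
    """Noisy observability Gramian of Definition 2, computed as the unique
    solution of A^T Q + Q A + C^T Gamma_obs^{-1} C = 0 for stable A."""
    W = C.T @ np.linalg.solve(Gamma_obs, C)
    return solve_continuous_lyapunov(A.T, -W)
```

Equivalently, it is the standard observability Gramian of the output map weighted by an inverse square root of the noise covariance.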
3.2 Method
We propose a model reduction approach for the inference problem defined in Section 2.2 that applies the balanced truncation procedure of Section 2.3, taking as inference Gramians a compatible prior covariance, $\mathbf{P} = \boldsymbol{\Gamma}_{\mathrm{pr}}$, and the noisy observability Gramian, $\mathbf{Q} = \mathbf{Q}^{\mathrm{noisy}}_{\infty}$. This leads to the following reduced-order model for the linear dynamics (2):
(37) $\dot{\mathbf{x}}_r(t) = \mathbf{A}_r\,\mathbf{x}_r(t), \qquad \mathbf{x}_r(0) = \mathbf{S}_r^\top\,\mathbf{x}_0,$
where $\mathbf{A}_r = \mathbf{S}_r^\top\mathbf{A}\,\mathbf{T}_r$, and the following reduced-order approximation to the measurement model (3):
(38) $\mathbf{m}_k \approx \mathbf{C}_r\,\mathbf{x}_r(t_k) + \boldsymbol{\varepsilon}_k,$
where $\mathbf{C}_r = \mathbf{C}\,\mathbf{T}_r$, and $\boldsymbol{\varepsilon}_k$ is as before. The reduced dynamics (37) and measurement model (38) define a reduced forward map from the reduced state to the noise-free output:
(39) $\mathbf{G}_r = \begin{bmatrix} \mathbf{C}_r\, e^{\mathbf{A}_r t_1} \\ \vdots \\ \mathbf{C}_r\, e^{\mathbf{A}_r t_n} \end{bmatrix}.$
We then define a corresponding reduced-order approximate Fisher information matrix,
(40) $\hat{\mathbf{H}}_{\mathrm{BT}} = \mathbf{S}_r\,\mathbf{G}_r^\top\,\boldsymbol{\Gamma}_{\mathrm{noise}}^{-1}\,\mathbf{G}_r\,\mathbf{S}_r^\top,$
and associated posterior mean and covariance approximations:
(41) $\hat{\boldsymbol{\Gamma}}_{\mathrm{pos}}^{\mathrm{BT}} = \left(\hat{\mathbf{H}}_{\mathrm{BT}} + \boldsymbol{\Gamma}_{\mathrm{pr}}^{-1}\right)^{-1}, \qquad \hat{\boldsymbol{\mu}}_{\mathrm{pos}}^{\mathrm{BT}} = \hat{\boldsymbol{\Gamma}}_{\mathrm{pos}}^{\mathrm{BT}}\,\mathbf{S}_r\,\mathbf{G}_r^\top\,\boldsymbol{\Gamma}_{\mathrm{noise}}^{-1}\,\mathbf{m}.$
As previously discussed at the end of Section 3.1, the noisy observability Gramian can be viewed as a continuous-time analogue of the Fisher information matrix $\mathbf{H}$. This suggests a variation of the proposed method that uses the Fisher information matrix in place of the noisy observability Gramian to define the balanced truncation. We label the two variations of our proposed method as follows:

BTQ, which uses the inference Gramians $\left(\boldsymbol{\Gamma}_{\mathrm{pr}}, \mathbf{Q}^{\mathrm{noisy}}_{\infty}\right)$ in order to define the posterior approximations in (41), and

BTH, which uses the inference Gramians $\left(\boldsymbol{\Gamma}_{\mathrm{pr}}, \mathbf{H}\right)$ in order to define the posterior approximations in (41).
While the BTH approach may seem to be a more natural choice for our inference setting, we will show in Section 3.3 that the BTQ approach inherits desirable properties from standard systemtheoretic balanced truncation that the BTH approach generally does not enjoy.
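A minimal end-to-end sketch of the BTQ variation combines the compatible prior covariance (playing the role of the reachability Gramian) with the noisy observability Gramian, then forms the reduced forward map (39) and the posterior approximations (41). All names are ours; the sketch forms dense matrices and explicit matrix exponentials, which is only sensible at demonstration scale.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, cholesky, expm

def btq_posterior(A, C, Gamma_pr, Gamma_obs, times, m, r):
    """BTQ sketch: balance a compatible prior covariance against the noisy
    observability Gramian, truncate to order r, and form the reduced
    posterior approximations."""
    # Noisy observability Gramian: A^T Q + Q A + C^T Gamma_obs^{-1} C = 0.
    Q = solve_continuous_lyapunov(A.T, -C.T @ np.linalg.solve(Gamma_obs, C))
    R = cholesky(Gamma_pr, lower=True)          # Gamma_pr = R R^T
    L = cholesky(Q, lower=True)                 # Q = L L^T
    U, s, Vt = np.linalg.svd(L.T @ R)           # Hankel singular values in s
    Sr = L @ U[:, :r] / np.sqrt(s[:r])
    Tr = R @ Vt[:r].T / np.sqrt(s[:r])
    Ar, Cr = Sr.T @ A @ Tr, C @ Tr
    # Reduced forward map applied to the full initial state: G_r S_r^T.
    G_bt = np.vstack([Cr @ expm(Ar * t) for t in times]) @ Sr.T
    Gn_inv = np.kron(np.eye(len(times)), np.linalg.inv(Gamma_obs))
    H_bt = G_bt.T @ Gn_inv @ G_bt               # reduced Fisher information
    Gamma_pos = np.linalg.inv(H_bt + np.linalg.inv(Gamma_pr))
    mu_pos = Gamma_pos @ (G_bt.T @ Gn_inv @ m)
    return mu_pos, Gamma_pos
```

With the reduced order equal to the state dimension, the reduced forward map reproduces the full one, so any remaining approximation error is attributable entirely to the truncation.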
3.3 Properties
We note that while the principal goal of the balanced truncation approach in the present work is to provide a reduced model approximation to posterior quantities, the proposed BTQ approach inherits guarantees from the systemtheoretic LTI setting, as described in our first result.
Lemma 2
Suppose $\boldsymbol{\Gamma}_{\mathrm{pr}}$ is prior-compatible with the state dynamics (2) as defined in Definition 1. Then, there exists a $\mathbf{B}$ so that $\boldsymbol{\Gamma}_{\mathrm{pr}}$ and $\mathbf{Q}^{\mathrm{noisy}}_{\infty}$ are the infinite-horizon reachability and observability Gramians of the following LTI system:
(42) $\dot{\mathbf{x}}(t) = \mathbf{A}\,\mathbf{x}(t) + \mathbf{B}\,\mathbf{u}(t), \qquad \mathbf{y}(t) = \boldsymbol{\Gamma}_{\mathrm{obs}}^{-1/2}\,\mathbf{C}\,\mathbf{x}(t),$
where $\mathbf{B}\mathbf{B}^\top = -\left(\mathbf{A}\,\boldsymbol{\Gamma}_{\mathrm{pr}} + \boldsymbol{\Gamma}_{\mathrm{pr}}\,\mathbf{A}^\top\right)$.
Additionally, the reduced LTI system (19) based on the inference Gramians $\boldsymbol{\Gamma}_{\mathrm{pr}}$ and $\mathbf{Q}^{\mathrm{noisy}}_{\infty}$ is balanced, stable, and satisfies the following Hankel singular value tail bound: if $\mathbf{x}(0) = \mathbf{0}$, then
(43) $\left\| \mathbf{y} - \mathbf{y}_r \right\|_{L^2} \leq 2\left(\sigma_{r+1} + \cdots + \sigma_d\right) \left\| \mathbf{u} \right\|_{L^2},$
where $\sigma_i$ are the Hankel singular values associated with the inference Gramians $\boldsymbol{\Gamma}_{\mathrm{pr}}$ and $\mathbf{Q}^{\mathrm{noisy}}_{\infty}$.
Proof
The first part of Lemma 2 follows immediately from the compatibility condition in Definition 1 and from replacing $\mathbf{C}$ with $\boldsymbol{\Gamma}_{\mathrm{obs}}^{-1/2}\,\mathbf{C}$ in eq. 11. The second part then follows from direct application of Lemma 1 to (42).
Corollary 1
Corollary 1 has practical significance for the proposed BTQ approach of Section 3.2, because computing the approximate posterior quantities requires evolving the reduced dynamics given by (37): Corollary 1 guarantees that these reduced dynamics preserve the stability of the full dynamics. On the other hand, while the error and balancing guarantees of Lemma 2 for LTI systems are not directly applicable to the inference setting we consider here, they may serve as the basis for future extension to inference problems with control inputs.
We now turn our attention to the quality of the proposed posterior covariance approximation eq. 41; again, we focus on the BTQ approach. The BTQ posterior covariance approximation must tautologically be suboptimal relative to the optimal approximation scheme of spantini_optimal_2015 discussed in Section 2.4. A natural question is: when does the covariance approximation approach optimality? In comparing the covariance approximations (41) and (30), it is clear that as the reduced order $r$ approaches the full state dimension $d$, both the balanced truncation Fisher information approximation and the projected Fisher information of spantini_optimal_2015 converge to the true Fisher information, and thus the two posterior covariance approximations both converge to the full posterior covariance. However, since our goal is a low-rank approximation with $r \ll d$, we would like some sense of when we can expect the balanced truncation Fisher information approximation to be close to the projected Fisher information for general $r$.
For the BTQ approach, approximate agreement with the optimal low-rank approximation requires that the dominant generalized eigenvectors of the pencil $\left(\mathbf{H}, \boldsymbol{\Gamma}_{\mathrm{pr}}^{-1}\right)$ be close to the dominant generalized eigenvectors of the pencil $\left(\mathbf{Q}^{\mathrm{noisy}}_{\infty}, \boldsymbol{\Gamma}_{\mathrm{pr}}^{-1}\right)$. This would follow when the Fisher information (9) is approximately proportional to the noisy observability Gramian in Definition 2. This would seem to apply to situations in which the observations are of sufficient frequency and duration for the discrete sum in the Fisher information to provide, up to normalization, a good approximation to the integral in the noisy observability Gramian. In Section 4.2, we formalize this idea, identify the scaling factor as the (average) inter-observation interval, and establish two limits of observation protocols under which we can obtain the noisy observability Gramian as a rescaled limit of the Fisher information.
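The sum-to-integral argument can be checked numerically: on an illustrative stable system of our own choosing, the Fisher information of equally spaced observations, rescaled by the observation interval, approaches the noisy observability Gramian as the interval shrinks and the horizon grows (a Riemann-sum approximation of the Gramian integral).

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, expm

# Illustrative stable system (values are ours, not from the paper).
A = np.array([[-1.0, 0.0], [0.3, -2.0]])
C = np.array([[1.0, 1.0]])
Gamma_obs_inv = np.array([[2.0]])

# Noisy observability Gramian via its Lyapunov equation.
Q = solve_continuous_lyapunov(A.T, -C.T @ Gamma_obs_inv @ C)

def rescaled_fisher(dt, T):
    """dt-rescaled Fisher information for observations at dt, 2 dt, ..., T:
    a Riemann-sum approximation of the Gramian integral."""
    ts = np.arange(dt, T + dt / 2, dt)
    H = sum(expm(A.T * t) @ C.T @ Gamma_obs_inv @ C @ expm(A * t) for t in ts)
    return dt * H

err_coarse = np.linalg.norm(rescaled_fisher(0.2, 20.0) - Q)
err_fine = np.linalg.norm(rescaled_fisher(0.02, 20.0) - Q)
```

Refining the observation spacing shrinks the discrepancy, consistent with the limits formalized in Section 4.2.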
Even when the spectral projections based on $\mathbf{H}$ and $\mathbf{Q}^{\mathrm{noisy}}_{\infty}$ are close, there remains a difference between the optimal low-rank approximation and the balanced truncation approximation, because the former involves the evolution of the full dynamics represented by the matrix $\mathbf{A}$, whereas the proposed balanced truncation approximations require only the evolution of a computationally inexpensive reduced system. Remarkably, we can formulate a relation between the balanced truncation approximation to the Fisher information and the optimal low-rank inference approximation without a further assumption about the size of $r$.
To express this connection precisely, we consider the following noisy observability Gramian for the balanced truncation approximation:
(44) $\mathbf{Q}_{\mathrm{BT}} = \mathbf{S}_r \left( \int_0^\infty e^{\mathbf{A}_r^\top t}\,\mathbf{C}_r^\top\,\boldsymbol{\Gamma}_{\mathrm{obs}}^{-1}\,\mathbf{C}_r\,e^{\mathbf{A}_r t}\,\mathrm{d}t \right) \mathbf{S}_r^\top.$
This quantity can be viewed as a continuous-time analogue of the balanced truncation Fisher information approximation: it lifts the noisy observability Gramian of the reduced system from the reduced state to the full state via the map $\mathbf{S}_r$. Thus, if the sum in the Fisher information well-approximates the integral in the noisy observability Gramian, we would also expect the analogous approximation to hold at the reduced level, because the reduced quantities have the same sum-to-integral relation as their full-order counterparts, differing only in the use of the reduced operators $\mathbf{A}_r$ and $\mathbf{C}_r$ versus the full operators $\mathbf{A}$ and $\mathbf{C}$.
Proposition 1
Let and , respectively, denote the ordered generalized eigenpairs of the pencil and , respectively, with normalization and . Then, if the dominant eigenpairs match up to a scaling constant , i.e., if
(45) 
then .
Proof
The integral in eq. 44 is the noisy observability Gramian of the balanced truncation reduced system and thus, by Lemma 2 it is balanced, i.e.,
(46) 
Define and note from its definition in (18) that . Thus, eq. 44 is equivalent to
Similarly defining , we have from (29),
with . Under the assumption (45), this gives us .
Proposition 1 provides us with a bridge between the balanced truncation Fisher information approximation and the projected Fisher information of spantini_optimal_2015 for any reduced order $r$, including $r \ll d$. It suggests the following argument: as noted above, when the Fisher information is approximately proportional to the noisy observability Gramian, the corresponding reduced-level approximation generally will also hold, and Proposition 1 would then imply that the balanced truncation and projected Fisher information approximations nearly coincide, so that the corresponding posterior covariance approximations are close. That is, when the Fisher information and the noisy observability Gramian are approximately proportional, the BTQ posterior covariance approximation approaches the optimal one without further assumption on $r$. Making such an argument rigorous would require a formulation of Proposition 1 with approximate rather than precise equalities of the dominant eigenpairs. Rather than pursuing this cumbersome exercise, we instead support this argument by numerical experiments in Section 5. We confirm that in observation scenarios where the Fisher information matrix and noisy observability Gramian are approximately proportional, the BTQ approach leads to near-optimal posterior covariance approximations.
The preceding discussion provides intuition for what types of measurement models will lead to posterior covariance approximations that are close to the optimal low-rank approach of spantini_optimal_2015. Our analysis and discussion do not address the quality of the posterior mean approximation in (41): while the balanced truncation mean approximation does belong to the class of low-rank mean approximations (31) considered in spantini_optimal_2015, the balanced truncation mean approximation differs in the use of the balanced truncation adjoint approximation versus the projected full adjoint used in the optimal low-rank mean (33) of spantini_optimal_2015. Numerical experiments in Section 5 compare empirically the balanced truncation means with the optimal means of spantini_optimal_2015. We emphasize that in the dynamical system setting, both optimal approximations of spantini_optimal_2015 require the evolution of the full dynamics, whereas the proposed balanced truncation approximation approach requires only the evolution of a computationally inexpensive reduced system.
Until now we have focused on properties of the BTQ approach, and the result of Proposition 1 relies on the assumption (45) that the dominant eigenspaces of the noisy observability Gramian and the Fisher information match, and that the associated eigenvalues match up to a constant scale factor. Comparable guarantees for the BTH approach, which involves replacing the noisy observability Gramian with the Fisher information matrix in the balanced truncation approximation, appear unlikely for general choices of Fisher information matrix, since Fisher information matrices cannot generally be interpreted as infinite-horizon observability Gramians for any minimal LTI system, and so Lemma 1 does not directly apply. However, when the dominant generalized eigenvectors of the pencils $\left(\mathbf{H}, \boldsymbol{\Gamma}_{\mathrm{pr}}^{-1}\right)$ and $\left(\mathbf{Q}^{\mathrm{noisy}}_{\infty}, \boldsymbol{\Gamma}_{\mathrm{pr}}^{-1}\right)$ coincide, then the BTH and BTQ reduced models will be equivalent, as stated in our next result (note that no conditions are imposed on the generalized eigenvalues here).
Proposition 2
Let and , respectively, denote the ordered generalized eigenpairs of the pencil and , respectively, with normalization and . Then, if the dominant eigenvectors match, i.e., if