Information sensitivity functions to assess parameter information gain and identifiability of dynamical systems

11/21/2017
by Sanjay Pant, et al.

A new class of functions, called the 'Information sensitivity functions' (ISFs), which quantify the information gain about the parameters through the measurements/observables of a dynamical system, is presented. These functions can be easily computed through classical sensitivity functions alone and are based on Bayesian and information-theoretic approaches. While marginal information gain is quantified by the decrease in differential entropy, correlations between arbitrary sets of parameters are assessed through mutual information. For individual parameters these information gains are also presented as marginal posterior variances, and, to assess the effect of correlations, as conditional variances when other parameters are given. The easy-to-interpret ISFs can be used to a) identify time intervals or regions in dynamical system behaviour where information about the parameters is concentrated; b) assess the effect of measurement noise on the information gain for the parameters; c) assess whether sufficient information in an experimental protocol (input, measurements, and their frequency) is available to identify the parameters; d) assess correlation in the posterior distribution of the parameters to identify the sets of parameters that are likely to be indistinguishable; and e) assess identifiability problems for particular sets of parameters.


1 Introduction

Sensitivity analysis [1] has been widely used to determine how the parameters of a dynamical system influence its outputs. When one or more outputs are measured (observed), it quantifies the variation of the observations with respect to the parameters to determine which parameters are most and least influential on the measurements. Therefore, when solving the inverse problem of estimating the parameters from the measurements, sensitivity analysis is widely used to fix the least influential parameters (as their effect on the measurements is insignificant and removing them reduces the dimensionality of the inverse problem) while focussing on estimation of the most influential parameters. Sensitivity analysis is also used to assess the question of parameter identifiability, i.e. how easy or difficult it is to identify the parameters from the measurements. This is primarily based on the idea that if the observables are highly sensitive to perturbations in certain parameters then these parameters are likely to be identifiable, and if the observables are insensitive then the parameters are likely to be unidentifiable. However, the magnitude of the sensitivities is hard to interpret, except in the trivial case when the sensitivities are identically zero. Lastly, parameter identifiability based on sensitivity analysis also assesses correlation/dependence between the parameters—through principal component analysis [2], the correlation method [3], the orthogonal method [4], and the eigenvalue method [5]—to identify which pairs of parameters, owing to high correlation, are likely to be indistinguishable from each other (also see [6] and the references therein). Another method to assess correlations is based on the Fisher information matrix [7, 6, 8], which can be derived from asymptotic analysis of non-linear least squares estimators [9, 10]. Thomaseth and Cobelli extended the classical sensitivity functions to ‘generalized sensitivity functions’ (GSFs), which assess information gain about the parameters from the measurements. This method has been widely used to assess identifiability of dynamical systems [10, 11, 12, 13, 14], where regions of high information gain show a sharp increase in the GSFs, while oscillations imply correlation with other parameters. There are two drawbacks of GSFs: first, they are designed to start at 0 and end at 1, which leads to the so-called ‘force-to-one’ phenomenon, where even in the absence of information about the parameters the GSFs are forced to end at a value of 1; and second, oscillations in GSFs can be hard to interpret in terms of identifying which sets of parameters are correlated. Based on a purely information-theoretic approach, Pant and Lombardi [15] proposed to compute information gain through a decrease in Shannon entropy, which alleviated the shortcomings of GSFs. However, since their method relies on a Monte Carlo type approach, the computational effort associated with the computation of information gains can be quite large. In this article, a novel method which combines the approach of Pant and Lombardi [15] with the classical sensitivity functions to compute information gain about the parameters is presented. The new functions are collectively called ‘Information sensitivity functions’ (ISFs), which assess parameter information gain through sensitivity functions alone, thereby eliminating the need for Monte Carlo runs. These functions (i) are based on Bayesian/information-theoretic methods and do not rely on asymptotic analysis; (ii) are monotonically non-decreasing and therefore do not oscillate; (iii) can assess regions of high information content for individual parameters; (iv) can assess parameter correlations between an arbitrary set of parameters; (v) can reveal potential problems in the identifiability of system parameters; (vi) can assess the effect of the experimental protocol on the inverse problem, for example, which outputs are measured, the associated measurement noise, and the measurement frequency; and (vii) are easily interpretable.

In what follows, the theoretical developments are first presented in sections 2–9, followed by their application to three different dynamical systems in section 10. The three examples are chosen from different areas in the mathematical biosciences: i) a Windkessel model, which is widely used as a boundary condition in computational fluid dynamics simulations of haemodynamics; ii) the Hodgkin-Huxley model for a biological neuron, which has formed the basis for a variety of ionic models describing excitable tissues; and iii) a kinetics model for the Influenza A virus.

2 The dynamical system and sensitivity equations

Consider the following dynamical system governed by a set of parameterised ordinary differential equations (ODEs)

$$\frac{dx}{dt} = f(x(t), \theta), \qquad x(t_0) = x_0, \tag{1}$$

where $t$ represents time, $x(t) \in \mathbb{R}^n$ is the state vector, $\theta \in \mathbb{R}^p$ is the parameter vector, the function $f$ represents the dynamics, and $x_0$ represents the initial condition at time $t_0$. The initial conditions may depend on the parameters, and therefore

$$x(t_0) = x_0(\theta). \tag{2}$$

The above representation subsumes the case where the initial condition may itself be seen as a parameter. The RHS of the dynamical system, equation (1), can be linearised at a reference point $(\tilde{x}, \tilde{\theta})$ to obtain

$$f(x, \theta) \approx f(\tilde{x}, \tilde{\theta}) + \frac{\partial f}{\partial x}\bigg|_{(\tilde{x}, \tilde{\theta})} (x - \tilde{x}) + \frac{\partial f}{\partial \theta}\bigg|_{(\tilde{x}, \tilde{\theta})} (\theta - \tilde{\theta}), \tag{3}$$

where $f(\tilde{x}, \tilde{\theta})$ represents $f$ evaluated at the reference point. Henceforth, in order to be concise, the explicit dependence of $f$ on its arguments is omitted and $f$, without any arguments, is used to denote $f(\tilde{x}, \tilde{\theta})$; the same convention applies to the derivatives $\partial f/\partial x$ and $\partial f/\partial \theta$. Following this notation, equation (3) is concisely written as

$$f(x, \theta) \approx f + \frac{\partial f}{\partial x}(x - \tilde{x}) + \frac{\partial f}{\partial \theta}(\theta - \tilde{\theta}). \tag{4}$$

The above linearisation will be used in the next section to study the evolution of the state covariance matrix with time. Let $S(t)$ denote the matrix of sensitivity functions for the system in equation (1), i.e. $S(t) = \partial x(t)/\partial \theta$, or

$$S_{ij}(t) = \frac{\partial x_i(t)}{\partial \theta_j}. \tag{5}$$

It is well known that $S(t)$ satisfies the following ODE system, which can be obtained by applying the chain rule of differentiation to equation (1):

$$\frac{dS}{dt} = \frac{\partial f}{\partial x}\, S + \frac{\partial f}{\partial \theta}, \qquad S(t_0) = \frac{\partial x_0}{\partial \theta}. \tag{6}$$

The goal is to relate the evolution of the sensitivity matrix to the evolution of the covariance of the joint vector of the state and the parameters. Let the subscript $k$ denote all quantities at time $t_k$; for example, $x_k$ denotes the state vector at time $t_k$, $S_k$ the corresponding sensitivity matrix, and so on. To relate the sensitivity matrix at time $t_{k+1}$ with $S_k$, a first-order discretisation of equation (6) is considered

$$S_{k+1} = S_k + \Delta t \left( \frac{\partial f}{\partial x}\, S_k + \frac{\partial f}{\partial \theta} \right) = \left( I + \Delta t\, \frac{\partial f}{\partial x} \right) S_k + \Delta t\, \frac{\partial f}{\partial \theta}, \tag{7}$$

and, therefore, the matrix product can be written as

(8)

Next, it is hypothesised that under certain conditions this matrix product can be seen as the covariance matrix of the state vector at time $t_k$. These developments are presented in the next two sections.
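For concreteness, the following is a minimal sketch (not from the paper; the logistic-growth model, parameter values, and variable names are illustrative assumptions) that integrates the sensitivity system of equation (6) alongside equation (1) with SciPy.

```python
# A minimal sketch: the sensitivity equations (6) integrated alongside equation (1)
# for a hypothetical logistic-growth model dx/dt = r*x*(1 - x/K), theta = (r, K).
import numpy as np
from scipy.integrate import solve_ivp

def augmented_rhs(t, z, r, K):
    x, S_r, S_K = z                        # state and sensitivities dx/dr, dx/dK
    dxdt = r * x * (1.0 - x / K)           # equation (1)
    dfdx = r * (1.0 - 2.0 * x / K)         # Jacobian df/dx
    dfdr = x * (1.0 - x / K)               # df/dr
    dfdK = r * x**2 / K**2                 # df/dK
    dS_r = dfdx * S_r + dfdr               # sensitivity equations (6)
    dS_K = dfdx * S_K + dfdK
    return [dxdt, dS_r, dS_K]

# the initial state x0 = 0.1 does not depend on theta, so S(t0) = 0
sol = solve_ivp(augmented_rhs, (0.0, 10.0), [0.1, 0.0, 0.0],
                args=(1.0, 1.0), t_eval=np.linspace(0.0, 10.0, 101))
S = sol.y[1:, :]                           # sensitivity functions S(t) on the time grid
```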

3 Forward propagation of uncertainty

Since the objective is to study the relationship between the parameters and the state vector, a joint vector of all the state vectors until the current time and the parameter vector is considered. Assume that at time $t_k$, this joint vector is distributed according to a multivariate Normal distribution as follows

(9)

To obtain the joint distribution of all the state vectors until time $t_{k+1}$ and the parameter vector, the linearised dynamical system, equation (4), is utilised. Considering the reference point in equation (4) to be $(\bar{x}_k, \bar{\theta})$, i.e. considering the linearisation around the mean values of the parameter vector and the state at time $t_k$, one obtains

(10)

Ignoring the higher order terms, and employing a forward Euler discretisation, one obtains

(11)
Remark 3.1.

$x_{k+1}$ is completely determined by $x_k$ and $\theta$, i.e. given $x_k$ and $\theta$, nothing more can be learned about $x_{k+1}$. Hence, the forward propagation forms a Markov chain.

Remark 3.2.

$f$, $\partial f/\partial x$, and $\partial f/\partial \theta$ are evaluated at $(\bar{x}_k, \bar{\theta})$.

Remark 3.3.

In equation (9), and .

The joint vector can be written from equations (9) and (11) as

(12)

where $I$ represents an identity matrix and $0$ a zero matrix of the appropriate sizes, and

(13)

is a term that does not depend on $x_k$ or $\theta$. The distribution of the joint vector can be written from equation (12) as

(14)

and the covariance can be expanded as

(15)

where

(16)
(17)
(18)

If the above evolution of the covariance matrix can be related to the evolution of the sensitivity matrix, as presented in section 2 and equation (8), then the dependencies between the state vector and the parameters can be studied. This concept is developed in the next section.
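The sketch below (assumed variable names; simplified to track only the current state and the parameters, whereas the joint vector above retains all past states) illustrates one forward-Euler step of this first-order covariance propagation.

```python
# A minimal sketch of one linearised propagation step: the joint covariance of
# (x_k, theta) is pushed through the linear map [[I + dt*df/dx, dt*df/dtheta], [0, I]].
import numpy as np

def propagate_joint_cov(cov, dfdx, dfdth, dt):
    n, p = dfdth.shape                              # state and parameter dimensions
    A = np.eye(n) + dt * dfdx                       # state-to-state block
    B = dt * dfdth                                  # parameter-to-state block
    M = np.block([[A, B],
                  [np.zeros((p, n)), np.eye(p)]])   # parameters themselves are static
    return M @ cov @ M.T                            # first-order Gaussian propagation

# usage with a hypothetical one-state, two-parameter system
cov0 = np.diag([1e-4, 0.1, 0.1])                    # prior covariance of (x_0, theta)
cov1 = propagate_joint_cov(cov0, np.array([[-0.5]]), np.array([[0.2, 0.3]]), 0.01)
```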

4 Relationship between sensitivity and forward propagation of uncertainty

In this section the relationship between the evolution of the sensitivity matrix and the evolution of the covariance matrix of the joint distribution of all the state vectors until time $t_k$ and the parameters is developed. Equation (16) can be expanded as follows

(19)

Assume the following

(20)
(21)
(22)

Under the above assumptions, it can be deduced from equations (4) and (8) that

(23)

Furthermore, equation (17) reads

which, as evident from equation (7), is the standard forward propagation of the sensitivity matrix. Hence

(24)

Finally, the term from equation (18) can be written as

(25)

From equations (23), (24), and (25), it can be concluded that if the initial prior uncertainty in is assumed to be Gaussian with covariance

(26)

then the joint vector of $\theta$, the parameters, and $x_1, \dots, x_k$, the state vectors corresponding to time instants $t_1, \dots, t_k$, can be approximated, by considering only the first-order terms after linearisation, to be a Gaussian distribution with the following covariance

(27)
Remark 4.1.

Note that a prior mean for the vector is assumed to be

(28)

based on which the mean vector of the state will propagate according to equation (11), essentially according to the forward Euler method. While this propagated mean does not directly influence the posterior uncertainty of the parameters, which depends only on the covariance matrix, it is important to note that the sensitivity terms in the covariance matrix of equation (27) are evaluated at the propagated means. The propagated mean of the joint vector is referred to throughout this manuscript as .

Remark 4.2.

The required conditions presented in equations (20), (21), and (22), can also be derived without temporal discretisation of the sensitivity and linearised forward model. This is presented in Appendix A.
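To illustrate the conclusion of this section, the sketch below (assumed names; a simplification to a single time instant, whereas equation (27) retains all measurement times) builds the sensitivity-based covariance of the joint vector of the parameters and one state vector.

```python
# A minimal sketch: under the linearised, noise-free propagation, the covariance of
# the joint vector (theta, x_k) is built entirely from the prior parameter covariance
# Sigma_theta and the sensitivity matrix S_k.
import numpy as np

def joint_parameter_state_cov(Sigma_theta, S_k):
    cross = Sigma_theta @ S_k.T                              # cov(theta, x_k)
    return np.block([[Sigma_theta, cross],
                     [cross.T, S_k @ Sigma_theta @ S_k.T]])  # cov(x_k) = S_k Sigma S_k^T
```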

5 Measurements (observations)

Having established how the covariance of the state and the parameters evolves in relation to the sensitivity matrix, the next task is to extend this framework to include the measurements. Eventually, one wants to obtain an expression for the joint distribution of the measurements and the parameters, so that conditioning this joint distribution on the measurements (implying that the measurements are known) will yield information about how much can be learned about the parameters.

Consider a linear observation operator through which the state is measured at time $t_k$ according to

$$y_k = G_k x_k + \eta_k, \tag{29}$$

where $G_k$ is the observation operator at time $t_k$ and $\eta_k$ is the measurement noise. Let $\eta_k$ be independently (across all measurement times) distributed as

$$\eta_k \sim \mathcal{N}(0, \Sigma_\eta), \tag{30}$$

where $0$ is a zero vector and $\Sigma_\eta$ is the covariance structure of the noise. From equations (27) and (29), it is easy to see that the joint vector of the parameters and the measurements follows a Gaussian distribution with the following mean and covariance

(31)

where

(32)

and

(33)
Remark 5.1.

A non-linear observation operator in equation (29), as opposed to the linear operator $G_k$, does not present any technical challenges to the formulation as it can be linearised at the current mean values. Following this, in equations (32) and (33), $G_k$ would need to be replaced by the corresponding tangent operator.
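As an illustration of how the observation model enters, the sketch below (an assumed, simplified block layout in which only the current state contributes to a measurement, with the state covariance expressed through the sensitivity matrix as in section 4) assembles the covariance blocks that are conditioned on in the next section.

```python
# A minimal sketch: covariance blocks for the joint vector of the parameters and one
# noisy measurement y_k = G_k x_k + eta_k from equation (29).
import numpy as np

def measurement_blocks(Sigma_theta, S_k, G_k, Sigma_noise):
    Sigma_ty = Sigma_theta @ S_k.T @ G_k.T                            # cov(theta, y_k)
    Sigma_yy = G_k @ S_k @ Sigma_theta @ S_k.T @ G_k.T + Sigma_noise  # cov(y_k, y_k)
    return Sigma_ty, Sigma_yy
```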

6 Conditional distribution of the parameters

The quantity of interest is the conditional distribution of the parameters, i.e. how the beliefs about the parameters are changed by the measurements from the prior beliefs to the posterior beliefs (the conditional distribution). More than the mean of the conditional distribution, the covariance is of interest. This is due to two reasons: i) owing to the Gaussian approximations, the covariance entirely reflects the amount of uncertainty in the parameters; and ii) while the mean of the conditional distribution depends on the measurements, the covariance does not. The latter is significant because a priori the measurement values are not known. Consequently, the average (over all possible measurements) uncertainty in the parameters too is independent of the measurements, and hence can be studied in an a priori manner.

From equation (31), since the joint distribution of the parameter vector and the observables is Gaussian, the conditional distribution of the parameter vector given the measurements is also Gaussian and can be written as

(34)

with

(35)

and

(36)

where $\hat{y}_k$ denotes the measurement value (the realisation of the random variable $y_k$ observed) at $t_k$. Note that the conditional covariance is independent of these measurement values. Furthermore, since the uncertainty in a Gaussian random variable, quantified by the differential entropy, depends only on the covariance matrix, the uncertainty of the posterior distribution does not depend on the measurements.
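The conditioning step behind equations (34)–(36) is the standard Gaussian formula; the following sketch (with assumed block names) computes the posterior covariance as the Schur complement of the measurement block, making explicit that the measured values themselves never enter.

```python
# A minimal sketch of standard Gaussian conditioning: posterior parameter covariance
# from the joint covariance blocks of the parameters and the (stacked) measurements.
import numpy as np

def conditional_covariance(Sigma_tt, Sigma_ty, Sigma_yy):
    # Sigma_tt: prior parameter covariance, Sigma_ty: cross-covariance with the
    # measurements, Sigma_yy: measurement covariance including noise
    return Sigma_tt - Sigma_ty @ np.linalg.solve(Sigma_yy, Sigma_ty.T)
```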

7 Asymptotic analysis of the conditional covariance

In this section, the behaviour of the conditional covariance matrix as the number of measurements grows is considered. From equation (32) can be written as

(37)

where is a diagonal matrix with elements as follows

(38)

By applying the Kailath variant of the Sherman-Morrison-Woodbury identity, the inverse of can be expanded as

(39)

Plugging this in equation (36) yields

(40)
(41)

where

(42)

The matrix is symmetric, and can be factorised by singular value decomposition (SVD) as follows

(43)

with

(44)

and is a diagonal matrix with diagonal entries equal to the eigenvalues, , of .

(45)

Due to the symmetric nature of , all the eigenvalues are real. Furthermore, if is positive-definite then all eigenvalues are positive. Substituting from equation (43) in equation (41) yields

(46)
(47)
(48)
(49)
(50)

In the above is a diagonal matrix with the entries

(51)

If the minimum eigenvalue of is much larger than 1, i.e.

(52)

then

(53)

and

(54)

Consequently, equation (50) yields

(55)

Finally, from the above and equations (42), (38), and (33), the conditional covariance matrix can be written as

(56)

It can hence be concluded that if the minimum eigenvalue of monotonically increases as $k$ increases, then

(57)

Let the eigenvalues of be denoted in decreasing order as . The behaviour of the minimum eigenvalue of is of concern. Note that can be written as

(58)

Consequently,

(59)

and are both symmetric matrices. Let the eigenvalues of be denoted in decreasing order as . From equation (59) one has

(60)

where $\mathrm{tr}(\cdot)$ denotes the trace. Expressed in terms of the eigenvalues of the respective matrices, the above reads

(61)

From several inequalities on the sums of eigenvalues of Hermitian matrices, specifically the Ky Fan inequality [16, 17], one has

(62)

Substituting in equation (62) and subtracting it from equation (61) results in

(63)

Consequently, if is full rank then and

(64)

which implies that the minimum eigenvalue of is monotonically increasing.

The above results are put in the perspective of classical non-linear regression analysis [9, 10] by assuming that the observation operator is equal to identity for all measurement times. Then, under a further assumption that the sensitivity matrix is full-rank for all measurement times, the conditional covariance matrix of the parameters is

(65)

where is the Fisher information matrix defined as

(66)
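A minimal sketch of how the Fisher information matrix in equation (66) can be accumulated from the sensitivity matrices is given below, assuming the identity observation operator used in this comparison and additive Gaussian noise with covariance Sigma_noise; variable names are illustrative.

```python
# A minimal sketch: Fisher information accumulated over measurement times from the
# sensitivity matrices, under an identity observation operator.
import numpy as np

def fisher_information(sensitivities, Sigma_noise):
    # sensitivities: list of (n_states x n_params) sensitivity matrices S_k
    noise_inv = np.linalg.inv(Sigma_noise)
    p = sensitivities[0].shape[1]
    FIM = np.zeros((p, p))
    for S_k in sensitivities:
        FIM += S_k.T @ noise_inv @ S_k   # information accumulates with each measurement
    return FIM                           # the conditional covariance approaches its inverse
```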

8 Conditional covariance for finite $k$

From equation (50) we have

(67)

where is given by equation (51). Consider the matrix , which can be expanded as

(68)
(69)
(70)
(71)
(72)

Following the above and equation (58), the conditional covariance matrix can be written as

(73)

9 Information gain

In this section the gain in information about the parameters from the measurements is considered. For details of such an information-theoretic approach the reader is referred to [15]. The gain in information about the parameter vector $\theta$ by the measurements $y_1, \dots, y_k$ is given by the mutual information between $\theta$ and $(y_1, \dots, y_k)$, which is equal to the difference between the differential entropies of the prior distribution and the conditional distribution. From equations (26), (34), and (36), this gain in information can be written as

(74)

where $\det(\cdot)$ denotes the determinant. The above can be expanded through equation (73) as

(75)

Note that the above represents the information gain for the joint vector of all the parameters. Commonly, one is interested in individual parameters, for which the information gain is now presented. Let $\theta_A$ denote the vector of a subset of parameters indexed by the elements of set $A$ and $\theta_{A^c}$ denote the vector of the remaining parameters, the complement of set $A$. Hence, $\theta_{\{1\}}$ denotes the first parameter, $\theta_{\{1,3\}}$ denotes the vector formed by taking the first and third parameters, and so on. The conditional covariance matrix can be decomposed into the components of $\theta_A$ and $\theta_{A^c}$ as

(76)
(77)

where $|A|$ is the cardinality of $A$, and $S_A$ and $S_{A^c}$ are the sensitivity matrices for $\theta_A$ and $\theta_{A^c}$, respectively. Given the above decomposition, the marginal covariance of $\theta_A$ given the measurements can be written via a Schur complement as follows

(78)

and the information gain as

(79)
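The following sketch uses standard Gaussian entropy identities (not necessarily the paper's exact expressions in equations (74) and (79)) to compute the total information gain as half the log-determinant ratio of prior to posterior covariance, and the marginal gain for a parameter subset from the corresponding blocks; the function and argument names are assumptions.

```python
# A minimal sketch: information gain for Gaussian priors and posteriors, total and
# for a subset A of the parameters (indexed by idx).
import numpy as np

def information_gain(Sigma_prior, Sigma_post):
    _, logdet_prior = np.linalg.slogdet(Sigma_prior)
    _, logdet_post = np.linalg.slogdet(Sigma_post)
    return 0.5 * (logdet_prior - logdet_post)       # mutual information in nats

def subset_information_gain(Sigma_prior, Sigma_post, idx):
    sub = np.ix_(idx, idx)                          # marginal blocks for subset A
    return information_gain(Sigma_prior[sub], Sigma_post[sub])
```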

Another quantity of interest is the correlation between two subsets of parameters and . In an information-theoretic context this can be assessed by how much more information is gained about the parameters in addition to if was also known, i.e. the mutual information between and gi