Variational State and Parameter Estimation

This paper considers the problem of computing Bayesian estimates of both states and model parameters for nonlinear state-space models. Generally, this problem does not have a tractable solution and approximations must be utilised. In this work, a variational approach is used to provide an assumed density which approximates the desired, intractable, distribution. The approach is deterministic and results in an optimisation problem of a standard form. Due to the parametrisation of the assumed density selected, first- and second-order derivatives are readily available, which allows for efficient solutions. The proposed method is compared against state-of-the-art Hamiltonian Monte Carlo in two numerical examples.


1 Introduction

State-space models are a widely utilised and flexible class of models considered in many scientific and engineering applications [15]. Generally, identification of the parameters of state-space models remains a challenging task [16, 19]. In this paper, nonlinear state-space models of the form

$$x_{t+1} \sim p_\theta(x_{t+1} \mid x_t), \tag{1a}$$
$$y_t \sim p_\theta(y_t \mid x_t), \tag{1b}$$

are considered, where $x_t$ and $y_t$ are the state and measurement at time step $t$, respectively, and $\theta$ denotes the model parameters. For clarity the presence of an input $u_t$ is not shown; it is, however, allowed.

In this paper, given a sequence of measurements $y_{1:T} = \{y_1, \dots, y_T\}$, the estimation of the states and model parameters is considered. Using a Bayesian approach, this amounts to obtaining the distribution $p(\theta, x_{0:T} \mid y_{1:T})$, given by

$$p(\theta, x_{0:T} \mid y_{1:T}) = \frac{p_\theta(y_{1:T}, x_{0:T})\, p(\theta)}{p(y_{1:T})}, \tag{2}$$

where $p(y_{1:T}) = \int p_\theta(y_{1:T}, x_{0:T})\, p(\theta)\, \mathrm{d}x_{0:T}\, \mathrm{d}\theta$, and $p(\theta)$ is a prior distribution over the parameters.

From $p(\theta, x_{0:T} \mid y_{1:T})$, distributions of both parameters and states can be obtained via marginalisation. For example, the marginal distribution $p(\theta \mid y_{1:T})$ is found via

$$p(\theta \mid y_{1:T}) = \int p(\theta, x_{0:T} \mid y_{1:T})\, \mathrm{d}x_{0:T}. \tag{3}$$

The difficulty with both (2) and (3) is that they are generally intractable, due to both the arbitrary form of the distributions considered and the nonlinear integrals involved. As such, these distributions must be approximated. Many differing approximations exist; one class of approaches utilises particle-based Monte Carlo methods, of which Hamiltonian Monte Carlo (HMC) [18, 8] is an example.

In this paper, an alternative approach, known as variational inference (VI) [10, 2], is followed. The contribution of this paper is to use VI to approximate $p(\theta, x_{0:T} \mid y_{1:T})$. Using a Gaussian assumed density, and a suitable approximation for the integrals, this provides a deterministic method of approximating $p(\theta, x_{0:T} \mid y_{1:T})$ that can be efficiently computed using gradient-based optimisation with exact first- and second-order derivatives.

2 Variational Inference

Variational inference is a widely used method to approximate potentially intractable posterior distributions with a parametric density of an assumed form. The numeric values of this parametrisation are obtained by maximising a likelihood lower bound [2]. In this section, the use of VI in approximating $p(\theta, x_{0:T} \mid y_{1:T})$ is examined.

As $p(\theta, x_{0:T} \mid y_{1:T})$ is intractable, it is approximated using an assumed density, parameterised by $\beta$, and denoted $q_\beta(\theta, x_{0:T})$. The quality of this approximation is measured by

$$\mathrm{KL}\left[ q_\beta(\theta, x_{0:T}) \,\Vert\, p(\theta, x_{0:T} \mid y_{1:T}) \right] = \int q_\beta(\theta, x_{0:T}) \log \frac{q_\beta(\theta, x_{0:T})}{p(\theta, x_{0:T} \mid y_{1:T})}\, \mathrm{d}x_{0:T}\, \mathrm{d}\theta, \tag{4}$$

the Kullback-Leibler (KL) [13] divergence of $q_\beta(\theta, x_{0:T})$ from $p(\theta, x_{0:T} \mid y_{1:T})$. As it is desired that $q_\beta$ is close to $p$, the aim is to find a $\beta$, denoted $\beta^\star$, which minimises this KL divergence and is given by

$$\beta^\star = \arg\min_\beta\, \mathrm{KL}\left[ q_\beta(\theta, x_{0:T}) \,\Vert\, p(\theta, x_{0:T} \mid y_{1:T}) \right]. \tag{5}$$

However, as $p(\theta, x_{0:T} \mid y_{1:T})$ is intractable, this KL divergence cannot be directly minimised or even evaluated. Instead, the concept of VI is used to, equivalently, maximise a likelihood lower bound.

Using conditional probability, this lower bound can be found by expressing the log-likelihood, $\log p(y_{1:T})$, as

$$\log p(y_{1:T}) = \log p(y_{1:T}, x_{0:T}, \theta) - \log p(\theta, x_{0:T} \mid y_{1:T}). \tag{6}$$

Through addition and subtraction of $\log q_\beta(\theta, x_{0:T})$ to the right-hand side of (6) this leads to

$$\log p(y_{1:T}) = \log \frac{p(y_{1:T}, x_{0:T}, \theta)}{q_\beta(\theta, x_{0:T})} - \log \frac{p(\theta, x_{0:T} \mid y_{1:T})}{q_\beta(\theta, x_{0:T})}. \tag{7}$$

As $\log p(y_{1:T})$ is independent of $x_{0:T}$ and $\theta$, the log-likelihood can alternatively be given by

$$\log p(y_{1:T}) = \mathbb{E}_{q_\beta}\left[ \log p(y_{1:T}) \right], \tag{8}$$

where $\mathbb{E}_{q_\beta}[\,\cdot\,] = \int q_\beta(\theta, x_{0:T})\, [\,\cdot\,]\, \mathrm{d}x_{0:T}\, \mathrm{d}\theta$.

Substituting (7) into the right-hand side of (8) arrives at

$$\log p(y_{1:T}) = \mathbb{E}_{q_\beta}\!\left[ \log \frac{p(y_{1:T}, x_{0:T}, \theta)}{q_\beta(\theta, x_{0:T})} \right] + \mathrm{KL}\left[ q_\beta \,\Vert\, p(\theta, x_{0:T} \mid y_{1:T}) \right], \tag{9}$$

which will be expressed as

$$\log p(y_{1:T}) = \mathcal{L}(\beta) + \mathrm{KL}\left[ q_\beta \,\Vert\, p(\theta, x_{0:T} \mid y_{1:T}) \right], \tag{10}$$

where

$$\mathcal{L}(\beta) = \mathbb{E}_{q_\beta}\left[ \log p(y_{1:T}, x_{0:T}, \theta) - \log q_\beta(\theta, x_{0:T}) \right]. \tag{11}$$

From (10), and as any KL divergence is non-negative, $\mathcal{L}(\beta)$ is a lower bound to the log-likelihood $\log p(y_{1:T})$.

Furthermore, from (10) we have

$$\mathrm{KL}\left[ q_\beta \,\Vert\, p(\theta, x_{0:T} \mid y_{1:T}) \right] = \log p(y_{1:T}) - \mathcal{L}(\beta), \tag{12}$$

and therefore

$$\beta^\star = \arg\min_\beta\, \mathrm{KL}\left[ q_\beta \,\Vert\, p(\theta, x_{0:T} \mid y_{1:T}) \right] \tag{13a}$$
$$= \arg\min_\beta\, \left( \log p(y_{1:T}) - \mathcal{L}(\beta) \right) \tag{13b}$$
$$= \arg\max_\beta\, \mathcal{L}(\beta). \tag{13c}$$

The important property of this is that $\mathcal{L}(\beta)$ does not involve the intractable distribution $p(\theta, x_{0:T} \mid y_{1:T})$, but rather only the assumed distribution $q_\beta(\theta, x_{0:T})$, which is selected in a tractable form.

The desired approach is therefore to find a value for $\beta$ that maximises $\mathcal{L}(\beta)$ via

$$\beta^\star = \arg\max_\beta\, \mathcal{L}(\beta) \tag{14}$$

for an assumed density form. This provides an approximation of $p(\theta, x_{0:T} \mid y_{1:T})$ given by $q_{\beta^\star}(\theta, x_{0:T})$, as desired.
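To make the bound in (10) concrete, the following minimal Python sketch checks numerically, on a hypothetical one-dimensional linear-Gaussian problem (prior $x \sim \mathcal{N}(0,1)$, likelihood $y \mid x \sim \mathcal{N}(x, r)$; not the paper's model), that $\mathcal{L}(\beta) \le \log p(y)$ for any Gaussian $q$, with equality when $q$ is the exact posterior.

```python
import numpy as np

# Toy check of (10): ELBO <= log evidence, with equality at the posterior.
# Model (hypothetical): x ~ N(0, 1), y | x ~ N(x, r); q(x) = N(m, s^2).
y, r = 1.3, 0.5
log_py = -0.5 * (np.log(2 * np.pi * (1 + r)) + y**2 / (1 + r))  # exact log p(y)

def elbo(m, s):
    # L(beta) = E_q[log p(x)] + E_q[log p(y | x)] + H(q), all in closed form
    e_log_prior = -0.5 * (np.log(2 * np.pi) + m**2 + s**2)
    e_log_lik = -0.5 * (np.log(2 * np.pi * r) + ((y - m)**2 + s**2) / r)
    entropy = 0.5 * np.log(2 * np.pi * np.e * s**2)
    return e_log_prior + e_log_lik + entropy

print(log_py, elbo(0.5, 1.0))                            # strict lower bound
print(log_py, elbo(y / (1 + r), np.sqrt(r / (1 + r))))   # equality: exact posterior
```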

However, as given, this approach is computationally intractable for any large quantity of time steps. This is because $q_\beta(\theta, x_{0:T})$ is a full, dense distribution, the dimension of which continues to grow with the number of time steps. Fortunately, by expressing $\log p(y_{1:T}, x_{0:T}, \theta)$ as

$$\log p(y_{1:T}, x_{0:T}, \theta) = \log p(\theta) + \log p_\theta(x_0) + \sum_{t=0}^{T-1} \log p_\theta(y_{t+1}, x_{t+1} \mid x_t), \tag{15}$$

and utilising the Markovian nature of state-space models (see Appendix A for details) to give

$$\mathbb{E}_{q_\beta(\theta, x_{0:T})}\left[ \log p_\theta(y_{t+1}, x_{t+1} \mid x_t) \right] = \mathbb{E}_{q_\beta(\theta, x_t, x_{t+1})}\left[ \log p_\theta(y_{t+1}, x_{t+1} \mid x_t) \right] \tag{16}$$

and

$$-\mathbb{E}_{q_\beta}\left[ \log q_\beta(\theta, x_{0:T}) \right] = \sum_{t=0}^{T-1} \mathbb{H}\left[ q_\beta(\theta, x_t, x_{t+1}) \right] - \sum_{t=1}^{T-1} \mathbb{H}\left[ q_\beta(\theta, x_t) \right], \tag{17}$$

we have

$$\mathcal{L}(\beta) = \mathbb{E}_{q_\beta(\theta, x_0)}\left[ \log p(\theta) + \log p_\theta(x_0) \right] + \sum_{t=0}^{T-1} \mathbb{E}_{q_\beta(\theta, x_t, x_{t+1})}\left[ \log p_\theta(y_{t+1}, x_{t+1} \mid x_t) \right] + \sum_{t=0}^{T-1} \mathbb{H}\left[ q_\beta(\theta, x_t, x_{t+1}) \right] - \sum_{t=1}^{T-1} \mathbb{H}\left[ q_\beta(\theta, x_t) \right] \tag{18}$$

without introducing any approximations.

Therefore, calculation of $\mathcal{L}(\beta)$ does not require the full distribution $q_\beta(\theta, x_{0:T})$, but rather only each pairwise joint $q_\beta(\theta, x_t, x_{t+1})$, and as such it remains tractable to calculate for large quantities of time steps.

The focus of this paper correspondingly shifts from the intractable aim of obtaining $q_\beta(\theta, x_{0:T})$ to the tractable aim of obtaining each $q_\beta(\theta, x_t, x_{t+1})$.

3 Assumed Gaussian Distribution and Tractable Approximation

In this section, the parametric assumed density chosen, and the subsequent formulation of a standard-form optimisation problem, is examined. When selecting the parametric form of $q_\beta(\theta, x_t, x_{t+1})$ there are two conflicting goals. Firstly, the parametrisation chosen should be sufficiently flexible to represent $p(\theta, x_t, x_{t+1} \mid y_{1:T})$ well. Secondly, the parametrisation should be chosen to allow the optimisation to be performed efficiently.

In this paper, we have chosen to use a multivariate Gaussian parameterised as

$$q_\beta(\theta, x_t, x_{t+1}) = \mathcal{N}\left( \eta_t;\, \mu_t, \Sigma_t \right), \tag{19}$$

where $\eta_t = [\theta^\mathsf{T},\, x_t^\mathsf{T},\, x_{t+1}^\mathsf{T}]^\mathsf{T}$, $\mu_t$ is the mean, $\Sigma_t$ is the covariance, and

$$\Sigma_t = R_t^\mathsf{T} R_t, \tag{20}$$

where $R_t$ are upper triangular. The parameter $\beta$ is therefore given as

$$\beta = \{\mu_t, R_t\}_{t=0}^{T-1}, \tag{21}$$

and related to each $q_\beta(\theta, x_t, x_{t+1})$ via

$$q_\beta(\theta, x_t, x_{t+1}) = \mathcal{N}\left( \eta_t;\, \mu_t, R_t^\mathsf{T} R_t \right). \tag{22}$$

Importantly, parameterising each joint distribution using the Cholesky factor $R_t$ allows the components of $\mathcal{L}(\beta)$ corresponding to each time step to be easily approximated using Gaussian quadrature as

$$\mathbb{E}_{q_\beta(\theta, x_t, x_{t+1})}\left[ g(\eta_t) \right] \approx \sum_j w_j\, g(\sigma_t^j), \tag{23a}$$

where $w_j$ is a weight, and the sigma points are denoted as $\sigma_t^j$, where each $\sigma_t^j$ is given by linear combinations of the joint mean $\mu_t$ and the columns of $R_t^\mathsf{T}$. The sigma points $\sigma_t^j$ being linear combinations of the elements of $\beta$ is important, as it significantly simplifies the calculation of both the first- and second-order derivatives used in the optimisation.
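As an illustration of (23a), a minimal sketch of unscented sigma points built from a mean and an upper-triangular Cholesky factor is given below; the tuning parameters and weights follow the standard unscented transform [11, 23] and are not taken from the paper.

```python
import numpy as np

def unscented_points(mu, R, alpha=1.0, kappa=1.0):
    """Sigma points for N(mu, R.T @ R), with R upper triangular.

    Each point is an affine function of (mu, R), which is what makes the
    derivatives of the quadrature approximation straightforward.
    """
    n = len(mu)
    lam = alpha**2 * (n + kappa) - n
    scale = np.sqrt(n + lam)
    S = R.T  # R^T is a square root of the covariance: R^T (R^T)^T = R^T R
    pts = [mu] + [mu + scale * S[:, i] for i in range(n)] \
               + [mu - scale * S[:, i] for i in range(n)]
    w = np.array([lam / (n + lam)] + [0.5 / (n + lam)] * (2 * n))
    return np.array(pts), w

def gauss_expect(g, mu, R):
    # E[g(eta)] ~= sum_j w_j g(sigma_j), as in (23a)
    pts, w = unscented_points(mu, R)
    return sum(wj * g(p) for wj, p in zip(w, pts))
```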

Furthermore, due to the Gaussian assumption, $\mathbb{H}[q_\beta(\theta, x_t, x_{t+1})]$, and its derivatives, are available in closed form. With a suitable prior, $\mathbb{E}_{q_\beta(\theta, x_0)}[\log p(\theta) + \log p_\theta(x_0)]$ is also available in closed form; alternatively, it can be similarly approximated using Gaussian quadrature.
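For completeness, the closed-form entropy referred to here is the standard Gaussian identity, expressed directly in terms of the Cholesky factor (assuming positive diagonal entries):

$$\mathbb{H}\left[ \mathcal{N}(\mu_t, R_t^\mathsf{T} R_t) \right] = \frac{n_t}{2}\log(2\pi e) + \sum_i \log\, [R_t]_{ii},$$

where $n_t$ is the dimension of $(\theta, x_t, x_{t+1})$, since $\log\det \Sigma_t = 2\sum_i \log\, [R_t]_{ii}$. This makes the entropy terms, and their derivatives with respect to $\beta$, trivial to evaluate.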

This allows $\mathcal{L}(\beta)$ to be approximated by $\hat{\mathcal{L}}(\beta)$ as

$$\hat{\mathcal{L}}(\beta) = \mathbb{E}_{q_\beta(\theta, x_0)}\!\left[ \log p(\theta) + \log p_\theta(x_0) \right] + \sum_{t=0}^{T-1} \sum_j w_j \log p_{\bar\theta_t^j}\!\left( y_{t+1}, \bar{x}_{t+1}^j \mid \bar{x}_t^j \right) + \sum_{t=0}^{T-1} \mathbb{H}\!\left[ q_\beta(\theta, x_t, x_{t+1}) \right] - \sum_{t=1}^{T-1} \mathbb{H}\!\left[ q_\beta(\theta, x_t) \right], \tag{24}$$

where $(\bar\theta_t^j, \bar{x}_t^j, \bar{x}_{t+1}^j)$ denote the components of the sigma point $\sigma_t^j$.

This parametrisation is, however, over-parameterised, as the marginal distributions $q_\beta(\theta, x_t)$ for $t = 1, \dots, T-1$ can be calculated from either $q_\beta(\theta, x_{t-1}, x_t)$ or $q_\beta(\theta, x_t, x_{t+1})$. Hence, the constraints defined by the feasible set

$$\mathcal{C} = \left\{ \beta : A\mu_{t+1} = B\mu_t,\;\; A\Sigma_{t+1}A^\mathsf{T} = B\Sigma_t B^\mathsf{T},\;\; t = 0, \dots, T-2 \right\}, \tag{25}$$

where $A$ and $B$ are selection matrices extracting the components corresponding to $(\theta, x_{t+1})$ from the $(t+1)$-th and $t$-th joint distributions, respectively,

are required to ensure consistency.

However, as each $R_t$, being a Cholesky factor of a covariance matrix, is non-singular, the constraint set can be simplified to

$$\mathcal{C} = \left\{ \beta : c_t(\beta) = 0,\;\; t = 0, \dots, T-2 \right\}, \tag{26}$$

where each $c_t(\beta)$ stacks the differences between the means, and between the upper-triangular factors, of the two parameterisations of the overlapping marginal $q_\beta(\theta, x_{t+1})$.

Using this, an approximation of (14) is given by

$$\beta^\star = \arg\max_\beta\, \hat{\mathcal{L}}(\beta) \tag{27a}$$
$$\text{s.t.}\;\; \beta \in \mathcal{C}. \tag{27b}$$

This constrained optimisation problem is of a standard form and can be efficiently solved, using exact first- and second-order derivatives and without further approximations, to find a local maximum. The required derivatives can be calculated using automatic differentiation [7] without manual effort. Due to the parametrisation chosen, this can be performed efficiently using standard tools; for the numerical examples provided, CasADi [1] has been used.
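A minimal sketch of solving a standard-form equality-constrained problem such as (27) with CasADi and IPOPT is shown below; the objective and constraint are placeholders, as the actual $\hat{\mathcal{L}}(\beta)$ and consistency constraints would be assembled from the quadrature and Cholesky expressions above.

```python
import casadi as ca

# Sketch of a standard-form equality-constrained NLP solved with CasADi/IPOPT.
# The expressions below are placeholders: in (27) the objective would be the
# quadrature approximation -L_hat(beta) and g the consistency constraints.
beta = ca.SX.sym('beta', 4)                       # stacked (mu_t, R_t) entries
f = ca.sumsqr(beta - ca.DM([1, 2, 3, 4]))         # placeholder for -L_hat(beta)
g = beta[0] - beta[2]                             # placeholder overlap constraint

solver = ca.nlpsol('solver', 'ipopt', {'x': beta, 'f': f, 'g': g})
sol = solver(x0=ca.DM.zeros(4), lbg=0, ubg=0)     # equality: 0 <= g <= 0
print(sol['x'])                                   # locally optimal beta
```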

The result of this optimisation is numerical values for each

$$q_{\beta^\star}(\theta, x_t, x_{t+1}) = \mathcal{N}\left( \eta_t;\, \mu_t^\star, (R_t^\star)^\mathsf{T} R_t^\star \right), \tag{28}$$

which approximates each $p(\theta, x_t, x_{t+1} \mid y_{1:T})$. Due to the Gaussian form selected, marginal and conditional distributions of both the parameters and states can be readily extracted as desired.

The proposed approach is summarised in Algorithm 1.

Input: Measurements $y_{1:T}$, prior $p(\theta)$, initial estimate of $\beta$
Output: Each $q_{\beta^\star}(\theta, x_t, x_{t+1})$ for $t = 0, \dots, T-1$
  1. Obtain $\beta^\star$ from (27), initialised at the given initial estimate of $\beta$.
  2. Extract each $q_{\beta^\star}(\theta, x_t, x_{t+1})$ from $\beta^\star$.
Algorithm 1: State and Parameter Estimation

4 Numerical Examples

In this section, we provide two numerical examples of the proposed approach and compare the results against HMC. The HMC results have been calculated using Stan [3], as detailed in [8].

4.1 Stochastic Volatility

In this example, estimation of $p(\theta, x_{0:T} \mid y_{1:T})$ for the following stochastic volatility model:

$$x_{t+1} = \mu + \phi\,(x_t - \mu) + v_t, \tag{29a}$$
$$y_t = e_t \exp\!\left( \tfrac{1}{2} x_t \right), \tag{29b}$$

where $v_t \sim \mathcal{N}(0, \sigma_v^2)$ and $e_t \sim \mathcal{N}(0, 1)$, is considered using 726 simulated measurements, with parameters $\theta = (\mu, \phi, \sigma_v)$.
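A simulator for this class of model is sketched below, assuming the common parameterisation $\theta = (\mu, \phi, \sigma_v)$; the parameter values are illustrative and not those used in the paper.

```python
import numpy as np

# Hypothetical stochastic-volatility simulator matching the form of (29).
rng = np.random.default_rng(0)
mu, phi, sigma_v = -1.0, 0.97, 0.15                   # example values only
T = 726

x = np.empty(T)
x[0] = rng.normal(mu, sigma_v / np.sqrt(1 - phi**2))  # stationary initial state
for t in range(T - 1):
    x[t + 1] = mu + phi * (x[t] - mu) + sigma_v * rng.normal()  # (29a)
y = np.exp(x / 2) * rng.normal(size=T)                # (29b)
```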

For the proposed method, fifth-order cubature [9] is used to perform the Gaussian quadrature approximations, and the optimisation routine is run until convergence to a locally optimal point. The HMC results are obtained with the first half of the iterations discarded as warm-up.

In Figure 1, the marginal distributions of $\theta$ using both the proposed approach and HMC are shown. While the marginal distributions closely match for two of the parameters, the VI-based method is overconfident in the distribution of the third and cannot closely match the non-Gaussian relationships between the parameters.

Figure 1: Distribution of $\theta$ for both the proposed approach (contour plots) and HMC (scatter plots / histograms). The true parameters are shown as vertical lines.

Despite this, the state distributions obtained using the two methods closely match. Figure 2 highlights this for three differing time steps, which are representative of the results at other time steps.

Figure 2: State distributions for three time steps using the proposed approach (curve) and HMC (histogram) illustrating the high level of similarity.

In this example, the proposed method is demonstrated on a nonlinear and non-Gaussian model and is, generally, representative of the results obtained using HMC at a lower computational cost. This, however, comes with a trade-off: when the underlying distribution is non-Gaussian, the Gaussian assumption of the proposed method may produce overconfident results, as shown for one of the parameter distributions.

4.2 Inverted Pendulum

In this section, the proposed method is applied to a rotational inverted pendulum, or Furuta pendulum [6], using real data collected from a Quanser QUBE-Servo 2. The estimation of a four-dimensional state vector, six parameters, and a seven-dimensional covariance matrix associated with coupled additive Gaussian noise of the process and measurement models is considered using 375 measurements. For further details of the model, refer to Appendix B.

For the proposed approach, a third-order unscented transform [11, 23] is used to perform the required Gaussian quadrature. A point estimate of the noise covariance is also considered; this highlights the flexibility of the proposed approach, which allows for distributions over subsets of the parameters alongside point estimates of the others. In this example, we also exploit the additive form of the state-space model, which allows the Hessian to be efficiently approximated using only Jacobians; identification problems for systems with additive Gaussian noise commonly use this approach [12, 20]. While the calculations for this example are not fully detailed in Section 3, they represent minor modifications of the resultant optimisation problem.
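A standard Jacobian-only Hessian approximation of the kind referred to here is the Gauss-Newton one: for objective terms of the form $\tfrac{1}{2} r(\beta)^\mathsf{T} \Lambda\, r(\beta)$, arising from Gaussian log-densities with residual $r(\beta)$ and precision $\Lambda$,

$$\nabla_\beta^2 \left[ \tfrac{1}{2} r^\mathsf{T} \Lambda\, r \right] = J^\mathsf{T} \Lambda J + \sum_i [\Lambda r]_i\, \nabla_\beta^2 r_i \approx J^\mathsf{T} \Lambda J, \qquad J = \frac{\partial r}{\partial \beta},$$

which discards the second-derivative term and therefore requires only Jacobians of the residuals [12, 20].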

For the HMC results, distributions are obtained for all states and parameters. As HMC calculates a distribution for each element of the noise covariance, while the proposed method considers a point estimate, the results of the two methods are expected to differ slightly.

In Figure 3, the distributions of the parameters using both the proposed approach and HMC are shown. This illustrates that, despite the approximations introduced, the proposed method delivers results close to HMC. This has been achieved with a significant reduction in computation: the proposed method required a few minutes, while the HMC results required a few hours.

Figure 3: Distribution of inverted pendulum parameters calculated using the proposed method (contour plots) and HMC (scatter plots / histograms).

5 Conclusion

In this paper, a variational approach to estimating state and parameter distributions for general nonlinear state-space models is presented. The proposed method results in an optimisation problem of a standard form for which, due to the parametrisation selected, derivatives are readily available. While the proposed method assumes a Gaussian density, and is therefore limited compared to more general HMC approaches, the numerical examples illustrate that approximations suitable for many purposes are obtained on both simulated and real data. Compared to state-of-the-art HMC approaches, this is achieved at a significantly lower computational cost.

Future research directions for this work include blending the proposed approach with sequential Monte Carlo (SMC) algorithms along the lines of what has been done in e.g. [14, 17]. The motivation for this is to combine the computational efficiency of the proposed method and the theoretical guarantees of SMC to overcome the limitations imposed by the Gaussian assumed densities.

References

  • [1] Joel A. E. Andersson, Joris Gillis, Greg Horn, James B. Rawlings, and Moritz Diehl. CasADi: a software framework for nonlinear optimization and optimal control. Mathematical Programming Computation, 11(1):1–36, July 2018.
  • [2] D. Blei, A. Kucukelbir, and J. McAuliffe. Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518):859–877, 2017.
  • [3] Bob Carpenter, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. Stan: A Probabilistic Programming Language. Journal of Statistical Software, 76(1), 2017.
  • [4] S. B. Chitralekha, J. Prakash, H. Raghavan, R. B. Gopaluni, and S. L. Shah. Comparison of Expectation-Maximization based parameter estimation using Particle Filter, Unscented and Extended Kalman Filtering techniques. IFAC Proceedings Volumes, 42(10):804–809, 2009.
  • [5] Jarrad Courts, Adrian Wills, and Thomas B. Schön. Variational Nonlinear State Estimation. arXiv, 2020.
  • [6] K. Furuta, M. Yamakita, and S. Kobayashi. Swing-up Control of Inverted Pendulum Using Pseudo-State Feedback. Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering, 206(4):263–269, November 1992.
  • [7] Andreas Griewank and Andrea Walther. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Society for Industrial and Applied Mathematics, January 2008.
  • [8] Johannes Hendriks, Adrian Wills, Brett Ninness, and Johan Dahlin. Practical Bayesian System Identification using Hamiltonian Monte Carlo. arXiv, 2020.
  • [9] Bin Jia, Ming Xin, and Yang Cheng. High-degree cubature Kalman filter. Automatica, 49(2):510–518, February 2013.
  • [10] Michael I. Jordan, Zoubin Ghahramani, Tommi S. Jaakkola, and Lawrence K. Saul. An Introduction to Variational Methods for Graphical Models. Machine Learning, 37(2):183–233, November 1999.
  • [11] Simon J. Julier and Jeffrey K. Uhlmann. New extension of the Kalman filter to nonlinear systems. In Ivan Kadar, editor, Signal Processing, Sensor Fusion, and Target Recognition VI. SPIE, July 1997.
  • [12] Juho Kokkala, Arno Solin, and Simo Särkkä. Sigma-Point Filtering and Smoothing Based Parameter Estimation in Nonlinear Dynamic Systems. Journal of Advances in Information Fusion, 2015.
  • [13] S. Kullback and R. A. Leibler. On Information and Sufficiency. The Annals of Mathematical Statistics, 22(1):79–86, March 1951.
  • [14] Fredrik Lindsten, Jouni Helske, and Matti Vihola. Graphical model inference: Sequential Monte Carlo meets deterministic approximations. In Advances in Neural Information Processing Systems 31, pages 8190–8200. 2018.
  • [15] L. Ljung. System Identification: Theory for the User. Prentice Hall information and system sciences series. Prentice Hall PTR, 1999.
  • [16] Lennart Ljung. Perspectives on system identification. Annual Reviews in Control, 34(1):1–12, April 2010.
  • [17] Christian A. Naesseth, Fredrik Lindsten, and David M. Blei. Markovian Score Climbing: Variational Inference with KL(p||q). In Advances in Neural Information Processing Systems 33. 2020.
  • [18] Radford M. Neal. MCMC using Hamiltonian dynamics. In Handbook of Markov Chain Monte Carlo, pages 113–162. Chapman and Hall/CRC, 2011.
  • [19] Brett Ninness. Some System Identification Challenges and Approaches. IFAC Proceedings Volumes, 42(10):1–20, 2009.
  • [20] Simo Särkkä. Bayesian Filtering and Smoothing. Institute of Mathematical Statistics Textbooks. Cambridge University Press, 2013.
  • [21] Thomas B. Schön, Adrian Wills, and Brett Ninness. System identification of nonlinear state-space models. Automatica, 47(1):39 – 49, 2011.
  • [22] Michail D. Vrettas, Yuan Shen, and Dan Cornford. Derivations of Variational Gaussian Process Approximation Framework. Technical report, Aston University, March 2008.
  • [23] E. A. Wan and R. Van Der Merwe. The unscented Kalman filter for nonlinear estimation. In Proceedings of the IEEE Adaptive Systems for Signal Processing, Communications, and Control Symposium. IEEE, October 2000.


Appendix A Decomposition of $\mathcal{L}(\beta)$

In this section, the decomposition of $\mathcal{L}(\beta)$ required to express (11) in terms of pairwise joint distributions $q_\beta(\theta, x_t, x_{t+1})$, rather than the full distribution $q_\beta(\theta, x_{0:T})$, is presented. This utilises the Markovian nature of state-space models in a fashion similar to related system identification approaches such as [21, 4, 20, 22, 5].

Firstly, $\mathcal{L}(\beta)$ is expressed in the form of two integrals as

$$\mathcal{L}(\beta) = \underbrace{\int q_\beta(\theta, x_{0:T}) \log p(y_{1:T}, x_{0:T}, \theta)\, \mathrm{d}x_{0:T}\, \mathrm{d}\theta}_{I_1} \; - \; \underbrace{\int q_\beta(\theta, x_{0:T}) \log q_\beta(\theta, x_{0:T})\, \mathrm{d}x_{0:T}\, \mathrm{d}\theta}_{I_2}.$$

The first integral is decomposed, using (15) and marginalising over the states not appearing in each term, according to

$$I_1 = \mathbb{E}_{q_\beta(\theta, x_0)}\left[ \log p(\theta) + \log p_\theta(x_0) \right] + \sum_{t=0}^{T-1} \mathbb{E}_{q_\beta(\theta, x_t, x_{t+1})}\left[ \log p_\theta(y_{t+1}, x_{t+1} \mid x_t) \right].$$

While the second integral is decomposed according to the Markovian structure of the assumed density. As

$$q_\beta(\theta, x_{0:T}) = \frac{\prod_{t=0}^{T-1} q_\beta(\theta, x_t, x_{t+1})}{\prod_{t=1}^{T-1} q_\beta(\theta, x_t)},$$

this gives

$$I_2 = -\sum_{t=0}^{T-1} \mathbb{H}\left[ q_\beta(\theta, x_t, x_{t+1}) \right] + \sum_{t=1}^{T-1} \mathbb{H}\left[ q_\beta(\theta, x_t) \right],$$

and therefore,

$$\mathcal{L}(\beta) = \mathbb{E}_{q_\beta(\theta, x_0)}\left[ \log p(\theta) + \log p_\theta(x_0) \right] + \sum_{t=0}^{T-1} \mathbb{E}_{q_\beta(\theta, x_t, x_{t+1})}\left[ \log p_\theta(y_{t+1}, x_{t+1} \mid x_t) \right] + \sum_{t=0}^{T-1} \mathbb{H}\left[ q_\beta(\theta, x_t, x_{t+1}) \right] - \sum_{t=1}^{T-1} \mathbb{H}\left[ q_\beta(\theta, x_t) \right].$$

Together, this allows $\mathcal{L}(\beta)$ to be calculated without requiring the full distribution $q_\beta(\theta, x_{0:T})$, but rather only each $q_\beta(\theta, x_t, x_{t+1})$.

Appendix B Inverted Pendulum Model

The state vector used to model the Furuta pendulum is $x = [\,\theta_a \;\; \theta_p \;\; \dot\theta_a \;\; \dot\theta_p\,]^\mathsf{T}$, where $\theta_a$ and $\theta_p$ are the arm and pendulum angles, respectively.

The continuous-time dynamics follow the standard Furuta pendulum equations of motion [6], which take the manipulator form

$$M(x)\begin{bmatrix} \ddot\theta_a \\ \ddot\theta_p \end{bmatrix} + C(x)\begin{bmatrix} \dot\theta_a \\ \dot\theta_p \end{bmatrix} + g(x) = \begin{bmatrix} \tau \\ 0 \end{bmatrix},$$

where $M(x)$ is the inertia matrix, $C(x)$ collects the damping and Coriolis terms, $g(x)$ the gravitational terms, and $\tau$ the motor torque. Here, $m_p$ is the pendulum mass, $L_r$ and $L_p$ are the rod and pendulum lengths respectively, $J_r$, $J_p$ are the rod and pendulum inertias, $R_m$ is the motor resistance, $k_m$ is the motor constant, $D_r$, $D_p$ are damping coefficients for the rod and pendulum respectively, and $V_m$ is the applied motor voltage input.

The process model used consists of a two-step Euler discretisation of these continuous-time dynamics over a sampling interval, subsequently disturbed by additive Gaussian noise.
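A two-step Euler discretisation of this kind can be sketched as follows, where f denotes the continuous-time dynamics and dt the sampling interval (both placeholders):

```python
def euler_two_step(f, x, u, dt):
    """Two forward-Euler steps of length dt/2 over one sampling interval."""
    h = dt / 2.0
    x_mid = x + h * f(x, u)         # first Euler half-step
    return x_mid + h * f(x_mid, u)  # second Euler half-step

# Process model: x_{t+1} = euler_two_step(f, x_t, u_t, dt) + w_t, w_t Gaussian
```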

Measurements are taken from encoders on the arm and pendulum angles, together with a current measurement from the motor. The resultant measurement model is

$$y_t = \begin{bmatrix} \theta_a \\ \theta_p \\ i_m \end{bmatrix} + e_t, \qquad i_m = \frac{V_m - k_m \dot\theta_a}{R_m},$$

and it is assumed that the stacked process and measurement noise vector is zero-mean Gaussian, with the associated seven-dimensional covariance matrix considered in Section 4.2.