1 Introduction
Consider the following jump Markov linear model on state space form
(1a)  
(1b)  
(1c) 
where means distributed according to and the (discrete) variable takes values in (which can be thought of as different modes which the model is jumping between) and the (continuous) variable lives in . Hence, the state variable consists of . Furthermore, and are zero mean white Gaussian noise and , and . The output (or measurement) is , the input is . As is finite, can be defined via a matrix with entries .
We are interested in offline identification of jump Markov linear models on the form (1) for the case of an unknown jump sequence, but the number of modes is known. More specifically, we will formulate and solve the Maximum Likelihood (ML) problem to compute an estimate of the static parameters of a jump Markov linear model based on a batch of measurements and (if available) inputs by solving,
(2) 
Here , i.e., all unknown static parameters in model (1). Here, and throughout the paper, the dependence on the inputs is implicit.
Solving (2) is challenging and there are no closed form solutions available. Our approach is to derive an expectation maximization (EM) [10] type of solution, where the strategy is to separate the original problem into two closely linked problems. The first problem is a challenging, but manageable nonlinear state smoothing problem and the second problem is a tractable optimization problem. The nonlinear smoothing problem we can solve using a combination of sequential Monte Carlo (SMC) methods (particle filters and particle smoothers) [11] and Markov chain Monte Carlo (MCMC) methods [27]. More specifically we will make use of particle MCMC (PMCMC), which is a systematic way of exploring the strengths of both approaches by using SMC to construct the necessary highdimensional Markov kernels needed in MCMC [1, 19].
Our main contribution is a new maximum likelihood estimator that can be used to identify jump Markov linear models on the form (1). The estimator exploits the conditionally linear Gaussian substructure that is inherent in (1) via RaoBlackwellization. More specifically we derive a RaoBlackwellized version of the particle stochastic approximation expectation maximization (PSAEM) algorithm recently introduced in [18].
Jump Markov linear models, or switching linear models, is a fairly well studied class of hybrid systems. For recent overviews of existing system identification methods for jump Markov linear models, see [13, 22]. Existing approaches considering the problem under study here include two stage methods, where the data is first segmented (using e.g. change detection type of methods) and the individual models are then identified for each segment, see e.g. [23, 6]. There has also been approximate EM algorithms proposed for identification of hybrid systems [5, 15] and the very recent [3] (differing from our method in that we use stochastic approximation EM and RaoBlackwellization). There are also relevant relationships to the PMCMC solutions introduced in [33] and the SMCbased online EM solution derived in [34].
2 Expectation maximization algorithms
The EM algorithm [10] provides an iterative method for computing maximum likelihood estimates of the unknown parameters in a probabilistic model involving latent variables. In the jump Markov linear model (1) we observe , whereas the state is latent.
The EM algorithm maximizes the likelihood by iteratively maximizing the intermediate quantity
(3) 
More specifically, the procedure is initialized in and then iterates between computing an expected (E) value and solving a maximization (M) problem,
(E)  
(M) 
Intuitively, this can be thought of as ‘selecting the new parameters as the ones that make the given measurements and the current state estimate as likely as possible’.
The use of EM type algorithms to identify dynamical systems is by now fairly well explored for both linear and nonlinear models. For linear models, there are explicit expressions for all involved quantities, see e.g. [14, 30]. For nonlinear models the intermediate quantity is intractable and we are forced to approximate solutions; see e.g. [18, 29, 21, 7]. This is the case also for the model (1) under study in this work. Indeed, the maximization step can be solved in closed form for the model (1), but (3) is still intractable in our case.
It is by now fairly well established that we can make use of sequential Monte Carlo (SMC) [11] or particle Markov chain Monte Carlo (PMCMC) [1] methods to approximate the joint smoothing distribution for a general nonlinear model arbitrarily well according to
(4) 
where are random samples with corresponding importance weights , is a pointmass distribution at and we refer to as a weighted particle system. The particle smoothing approximation (4) can be used to approximate the integral in (3). Using this approach within EM, we obtain the particle smoothing EM (PSEM) method [21, 29]. PSEM can be viewed as an SMCanalogue of the well known Monte Carlo EM (MCEM) algorithm [32].
However, it has been recognized that MCEM, and analogously PSEM, makes inefficient use of the generated samples [9]. This is particularly true when the simulation step is computationally expensive, which is the case when using SMC or PMCMC. To address this shortcoming, [9] proposed to use a stochastic approximation (SA) [26] of the intermediate quantity instead of a vanilla Monte Carlo approximation, resulting in the stochastic approximation EM (SAEM) algorithm. The SAEM algorithm replaces the intermediate quantity in EM with
(5) 
with being a sequence of step sizes which fulfils and . In the above, is a sample state trajectory, simulated from the joint smoothing distribution . It is shown by [9] that the SAEM algorithm—which iteratively updates the intermediate quantity according to (5) and computes the next parameter iterate by maximizing this stochastic approximation—enjoys good convergence properties. Indeed, despite the fact that the method requires only a single sample at each iteration, the sequence will converge to a maximizer of under reasonably weak assumptions.
However, in our setting it is not possible to simulate from the joint smoothing distribution . We will therefore make use of the particle SAEM (PSAEM) method [18], which combines recent PMCMC methodology with SAEM. Specifically, we will exploit the structure of (1) to develop a RaoBlackwellized PSAEM algorithm.
We will start our development in the subsequent section by considering the smoothing problem for (1). We derive a PMCMCbased RaoBlackwellized smoother for this model class. The proposed smoother can, principally, be used to compute (3) within PSEM. However, a more efficient approach is to use the proposed smoother to derive a RaoBlackwellized PSAEM algorithm, see Section 4.
3 Smoothing using Monte Carlo methods
For smoothing, that is, finding , various Monte Carlo methods can be applied. We will use an MCMC based approach, as it fits very well in the SAEM framework (see e.g. [2, 17]), which together shapes the PSAEM algorithm. The aim of this section is therefore to derive an MCMCbased smoother for jump Markov linear models.
To gain efficiency, the jump sequence and the linear states
are separated using conditional probabilities as
(6) 
This allows us to infer the conditionally linear states using closed form expressions. Hence, it is only the jump sequence that has to be computed using approximate inference. This technique is referred to as RaoBlackwellization [8].
3.1 Inferring the linear states:
State inference in linear Gaussian state space models can be performed exactly in closed form. More specifically, the Kalman filter provides the expressions for the filtering PDF
and the one step predictor PDF . The marginal smoothing PDF is provided by the RauchTungStriebel (RTS) smoother [25]. See, e.g., [16] for the relevant results. Here, we useto denote the PDF for the (multivariate) normal distribution with mean
and covariance matrix .3.2 Inferring the jump sequence:
To find , an MCMC approach is used. First, the concept of using Markov kernels for smoothing is introduced, and then the construction of the kernel itself follows.
MCMC makes use of ergodic theory for statistical inference. Let be a Markov kernel (to be defined below) on the fold product space . Note that the jump sequence lives in this space. Furthermore, assume that is ergodic with unique stationary distribution . This implies that by simulating a Markov chain with transition kernel , the marginal distribution of the chain will approach in the limit.
Specifically, let be an arbitrary initial state with and let for , then by the ergodic theorem [27]:
(7) 
as for any function . This allows a smoother to be constructed as in Algorithm 1.
We will use the conditional particle filter with ancestor sampling (CPFAS) [19] to construct the Markov kernel . The CPFAS is similar to a standard particle filter, but with the important difference that one particle trajectory (jump sequence), , is specified a priori.
The algorithm statement for the CPFAS can be found in, e.g., [19]. Similar to an auxiliary particle filter [11], the propagation of (approximated by ) to time is done using the ancestor indices . To generate , the ancestor index is sampled according to , and as . The trajectories are then augmented as .
This is repeated for , whereas is set as . To ‘find’ the history for , the ancestor index is drawn with probability
(8) 
The probability density in (8) is proportional to
(9) 
where the last factor is the importance weight .
By sampling from the rendered set of trajectories with , a Markov kernel mapping to is obtained. For this Markov kernel to be useful for statistical inference we require that (i) it is ergodic, and (ii) it admits as its unique limiting distribution. While we do not dwell on the (rather technical) details here, we note that these requirements are indeed fulfilled; see [19].
3.3 RaoBlackwellization
RaoBlackwellization of particle filters is a fusion of the Kalman filter and the particle filter based on (6), and it is described in, e.g., [28]. However, RaoBlackwellization of a particle smoother is somewhat more involved since the process is Markovian, but not (with marginalized, see, e.g., [33] and [20] for various ways to handle this).
A similar problem as for the particle smoothers arises in the ancestor sampling (8) in the CPFAS. In the case of a nonRaoBlackwellized CPFAS, (8) reduces to [19]. This does not hold in the RaoBlackwellized case.
To handle this, (8) can be rewritten as
(10) 
Using the results from Section 4.4 in [20] (adapted to model (1)), this can be written (omitting , and with the notation , , i.e. the Cholesky factorization, and etc.)
(11a)  
with  
(11b)  
(11c)  
where  
(11d)  
(11e)  
(11f)  
(11g)  
(11h)  
(11i)  
and and . The RaoBlackwellization also includes an RTS smoother for finding . 
Summarizing the above development, the RaoBlackwellized CPFAS (for the jump Markov linear model (1)) is presented in Algorithm 2, where
(12) 
is used. Note that the discrete state is drawn from a discrete distribution defined by , whereas the linear state is handled analytically. The algorithm implicitly defines a Markov kernel that can be used in Algorithm 1 for finding , or, as we will see, be placed in an SAEM framework to estimate (both yielding PMCMC [1] constructions).
4 Identification of jump Markov linear models
In the previous section, an ergodic Markov kernel leaving invariant was found as a RaoBlackwellized CPFAS summarized in Algorithm 2. This will be used together with SAEM, as it allows us to make one parameter update at each step of the Markov chain smoother in Algorithm 1, as presented as PSAEM in [18]. (However, following [18], we make use of all the particles generated by CPFAS, and not only , to compute the intermediate quantity in the SAEM.)
This leads to the approximation (cf. (5))
(13) 
where the expectation is w.r.t. . Putting this together, we obtain a RaoBlackwellized PSAEM (RBPSAEM) algorithm presented in Algorithm 3. Note that this algorithm is similar to the MCMCbased smoother in Algorithm 1, but with the difference that the model parameters are updated at each iteration, effectively enabling simultaneous smoothing and identification.
(For notational convenience, the iteration number is suppressed in the variables related to .)
With a strong theoretical foundation in PMCMC and Markovian stochastic approximation, the RBPSAEM algorithm presented here enjoys very favourable convergence properties. In particular, under certain smoothness and ergodicity conditions, the sequence of iterates will converge to a maximizer of as , regardless of the number of particles used in the internal CPFAS procedure (see [18, Proposition 1] together with [17] for details). Furthermore, empirically it has been found that a small number of particles can work well in practice as well. For instance, in the numerical examples considered in Section 5, we run Algorithm 3 with with accurate identification results.
For the model structure (1), there exists infinitely many solutions to the problem (2
); all relevant involved matrices can be transformed by a linear transformation matrix and the modes can be reordered, but the inputoutput behaviour will remain invariant. The model is therefore overparametrized, or lacks identifiability, in the general problem setting. However, it is shown in
[24]that the CramérRao Lower Bound is not affected by the overparametrization. That is, the estimate quality, in terms of variance, is unaffected by the overparametrization.
4.1 Maximizing the intermediate quantity
When making use of RBPSAEM from Algorithm 3, one major question arises from Step 5, namely the maximization of the intermediate quantity . For the jump Markov linear model, the expectation in (13) can be expressed using sufficient statistics, as will be shown later, as an inner product
(14) 
for a sufficient statistics and corresponding natural parameter . Hence can be written as
(15) 
if the transformation
(16) 
is used. In detail,
(17a)  
neglecting constant terms in the last expression. This can be verified to be an inner product (as indicated in (14)) in . Here the sufficient statistics  
(17b)  
(17c)  
(17d)  
with  
(17e)  
and  
(17f)  
have been used. Further notation introduced is as the indicator function, and  
(17g) 
For computing this, the RTSsmoother in step 17 in Algorithm 2 has to be extended by calculation of , which can be done as follows [31, Property P6.2]
(18) 
initialized with .
For notational convenience, we will partition as
(19) 
Lemma 1.
Assume for all modes , that all states are controllable and observable and . The parameters maximizing for the jump Markov linear model (1) are then given by
(20a)  
(20b)  
(20c)  
(20d)  
(20e)  
for . 
are the partitions of indicated in (19), and are the ‘SAupdates’ (16) of the sufficient statistics (17b)(17d).
Remark: If , the first square bracket in (17e) can be replaced by , and (20b) becomes . The case with is fully analogous.
Proof.
With arguments directly from [14, Lemma 3.3], the maximization of the last part of (17a) for a given (for any sufficient statistics in the inner product, and in particular ), is found to be (20b)(20e).
Using Lagrange multipliers and that , the maximum w.r.t. of the first part of (17a) is obtained as
(21) 
∎
4.2 Computational complexity
Regarding the computational complexity of Algorithm 3, the most important result is that it is linear in the number of measurements . It is also linear in the number of particles .
5 Numerical examples
Some numerical examples are given to illustrate the properties of the RaoBlackwellized PSAEM algorithm. The Matlab code for the examples is available via the homepage of the first author.
5.1 Example 1  Comparison to related methods
The first example concerns identification using simulated data () for a onedimensional () jump Markov linear model with 2 modes () (with parameters randomly generated according to
) with lowpass filtered white noise as
. The following methods are compared:
RBPSAEM from Algorithm 3, with (only) particles,

PSAEM as presented in [18] with ,

PSEM [29] with forward particles and backward simulated trajectories.
The initial parameters are each randomly picked from , where is the true parameter value. The results are illustrated in Figure 1, which shows the mean (over all modes and runs) error for the transfer function from the input to the output .
From Figure 1 (note the loglog scale used in the plot) it is clear that our new RaoBlackwellized PSAEM algorithm has a significantly better performance, both in terms of mean and in variance between different runs, compared to the previous algorithms.
5.2 Example 2  Identification of multidimensional systems
Let us now consider a twodimensional system () with
modes. The eigenvalues for
are randomly picked from . The other parameters are randomly picked as and the system is simulated for time steps with input being a lowpass filtered white noise. The initialization of the RaoBlackwellized PSAEM algorithm is randomly picked from for each parameter. The number of particles used in the particle filter is . Figure 1(a) shows the mean (over runs) error for each mode, similar to Figure 1. Figure 1(b) shows the estimated Bode plots after iterations. As is seen from Figure 1(b), the RBPSAEM algorithm has the ability to catch the dynamics of the multidimensional system fairly well.6 Conclusion and Future Work
We have derived a maximum likelihood estimator for identification of jump Markov linear models. More specifically an expectation maximization type of solution was derived. The nonlinear state smoothing problem inherent in the expectation step was solved by constructing an ergodic Markov kernel leaving the joint state smoothing distribution invariant. Key to this development was the introduction of a RaoBlackwellized conditional particle filter with ancestor sampling. The maximization step could be solved in closed form. The experimental results indicate that we obtain significantly better performance both in terms of accuracy and computational time when compared to previous state of the art particle filtering based methods. The ideas underlying the smoother derived in this work have great potential also outside the class of jump Markov linear models and this is something worth more investigation. Indeed, it is quite possible that it can turn out to be a serious competitor also in finding the joint smoothing distribution for general nonlinear state space models.
References
 [1] Christophe Andrieu, Arnaud Doucet, and Roman Holenstein. Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(3):269–342, 2010.
 [2] Christophe Andrieu, Eric Moulines, and Pierre Priouret. Stability of stochastic approximation under verifiable conditions. SIAM Journal on control and optimization, 44(1):283–312, 2005.
 [3] Trevor T Ashley and Sean B Andersson. A sequential monte carlo framework for the system identification of jump markov state space models. In Proceedings of American Control Conference, pages 1144–1149, Portland, Oregon, June 2014.
 [4] Alberto Bemporad, Jacob Roll, and Lennart Ljung. Identification of hybrid systems via mixedinteger programming. In Proceedings of 40th IEEE Conference on Decision and Control, pages 786–792, Orlando, Florida, 2001.
 [5] Lars Blackmore, Stephanie Gil, Seung Chung, and Brian Williams. Model learning for switching linear systems with autonomous mode transitions. In Proceedings of 46th IEEE Conference on Decision and Control, pages 4648–4655, New Orleans, LA, 2007.
 [6] José Borges, Vincent Verdult, Michel Verhaegen, and Miguel Ayala Botto. A switching detection method based on projected subspace classification. In Proceedings of 44th IEEE Conference on Decision and Control, jointly with European Control Conference, pages 344–349, Sevilla, Spain, 2005.

[7]
Olivier Cappé, Éric. Moulines, and Tobias Rydén.
Inference in Hidden Markov Models
. Springer Series in Statistics. Springer, New York, USA, 2005.  [8] George Casella and Christian P Robert. RaoBlackwellisation of sampling schemes. Biometrika, 83(1):81–94, 1996.
 [9] Bernard Delyon, Marc Lavielle, and Eric Moulines. Convergence of a stochastic approximation version of the EM algorithm. Annals of Statistics, 27(1):94–128, 1999.
 [10] Arthur P Dempster, Nan M Laird, and Donald B Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1–38, 1977.
 [11] Arnaud Doucet and Adam M Johansen. A tutorial on particle filtering and smoothing: Fifteen years later. In Dan Crisan and Boris Rozovsky, editors, Nonlinear Filtering Handbook, pages 656–704. Oxford University Press, Oxford, 2011.
 [12] Emily Fox, Erik B Sudderth, Michael I Jordan, and Alan Willsky. Bayesian nonparametric inference of switching dynamic linear models. IEEE Transactions of Signal Processing, 59(4):1569–1585, April 2011.
 [13] Andrea Garulli, Simone Paoletti, and Antonio Vicino. A survey on switched and piecewise affine system identification. In Proceedings of 16th IFAC Symposium on System Identification, volume 16, pages 344–355, Brussels, Belgium, 2012.
 [14] Stuart Gibson and Brett Ninness. Robust maximumlikelihood estimation of multivariable dynamic systems. Automatica, 41(10):1667–1682, October 2005.
 [15] Stephanie Gil and Brian Williams. Beyond local optimality: An improved approach to hybrid model learning. In Proceedings of 48th IEEE Conf Decision and Control, jointly with 28th Chinese Control Conference, pages 3938–3945, Shanghai, China, 2009.
 [16] Thomas Kailath, Ali H. Sayed, and Babak Hassibi. Linear estimation. Prentice Hall, Upper Saddle River, NJ, 2000.
 [17] Estelle Kuhn and Marc Lavielle. Coupling a stochastic approximation version of EM with an MCMC procedure. ESAIM: Probability and Statistics, 8:115–131, September 2004.
 [18] Fredrik Lindsten. An efficient stochastic approximation EM algorithm using conditional particle filters. In Proceedings of 38th IEEE International Conference Acoustics, Speech, and Signal Processing (ICASSP), pages 6274–6278, Vancouver, Canada, May 2013.

[19]
Fredrik Lindsten, Michael I Jordan, and Thomas B Schön.
Particle Gibbs with ancestor sampling.
Journal of Machine Learning Research
, 15:2145–2184, 2014.  [20] Fredrik Lindsten and Thomas B Schön. Backward simulation methods for Monte Carlo statistical inference. Foundations and Trends in Machine Learning, 6(1):1–143, 2013.
 [21] Jimmy Olsson, Randal Douc, Olivier Cappé, and Éric Moulines. Sequential Monte Carlo smoothing with application to parameter estimation in nonlinear statespace models. Bernoulli, 14(1):155–179, 2008.
 [22] Simone Paoletti, Aleksandar Lj Juloski, Giancarlo FerrariTrecate, and René Vidal. Identification of hybrid systems a tutorial. European journal of control, 13(2):242–260, 2007.
 [23] Komi Midzodzi Pekpe, Gilles Mourot, Komi Gasso, and José Ragot. Identification of switching systems using change detection technique in the subspace framework. In Proceedings of 43rd IEEE Conference on Decision and Control, volume 4, pages 3838–3843, Bahamas, 2004.
 [24] Rik Pintelon, Joannes Schoukens, Tomas McKelvey, and Yves Rolain. Minimum variance bounds for overparameterized models. IEEE Transactions on Automatic Control, 41(5):719–720, 1996.
 [25] HE Rauch, CT Striebel, and F Tung. Maximum likelihood estimates of linear dynamic systems. AIAA journal, 3(8):1445–1450, 1965.
 [26] Herbert Robbins and Sutton Monro. A stochastic approximation method. The Annals of Mathematical Statistics, 22(3):400–407, 1951.
 [27] Christian P. Robert and George Casella. Monte Carlo statistical methods. Springer, New York, 2. ed. edition, 2004.
 [28] Thomas B Schön, Fredrik Gustafsson, and PJ Nordlund. Marginalized particle filters for mixed linear/nonlinear statespace models. IEEE Transactions on Signal Proceedings, 53(7):2279–2289, 2005.
 [29] Thomas B Schön, Adrian Wills, and Brett Ninness. System identification of nonlinear statespace models. Automatica, 47(1):39–49, January 2011.
 [30] Robert H Shumway and David S Stoffer. An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis, 3(4):253–264, 1982.
 [31] Robert H Sumway and David S Stoffer. Time series analysis and its applications with R examples. Springer, New York, 2006.
 [32] Greg CG Wei and Martin A Tanner. A Monte Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithms. Journal of the American Statistical Association, 85(411):699–704, 1990.
 [33] Nick Whiteley, Christophe Andrieu, and Arnaud Doucet. Efficient Bayesian inference for switching statespace models using discrete particle Markov chain Monte Carlo methods. arXiv preprint arXiv:1011.2437, 2010.
 [34] Sinan Yildirim, Sumeetpal S Singh, and Arnaud Doucet. An online expectation–maximization algorithm for changepoint models. Journal of Computational and Graphical Statistics, 22(4):906–926, 2013.
Comments
There are no comments yet.