# Maximum likelihood estimation in hidden Markov models with inhomogeneous noise

We consider parameter estimation in hidden finite state space Markov models with time-dependent inhomogeneous noise, where the inhomogeneity vanishes sufficiently fast. Based on the concept of asymptotically mean stationary processes, we prove that the maximum likelihood estimator and a quasi-maximum likelihood estimator (QMLE) are strongly consistent. The computation of the QMLE ignores the inhomogeneity and is therefore much simpler and more robust. The theory is motivated by an example from biophysics and applied to a Poisson and a linear Gaussian model.


## 1. Introduction

Motivation. Hidden Markov models (HMMs) have a long history and are widely used in a multitude of applications ranging from econometrics, chemistry, biology and speech recognition to neurophysiology. For example, transition rates between openings and closings of ion channels, see [1], are often assumed to be Markovian, and the observed conductance levels from such experiments can be modeled with homogeneous HMMs. The HMM is typically justified if the underlying experimental conditions, such as the applied voltage in ion channel recordings, are kept constant over time, see [2, 3, 4, 5, 6].

However, if the conductance levels are measured in experiments with voltage varying over time, then the noise appears to be inhomogeneous, i.e., the noise has a voltage-dependent component. Such experiments play an important role in understanding the dependence of the gating behavior on the gradient of the applied voltage [7, 8]. To the best of our knowledge, a rigorous statistical methodology for analyzing this type of problem is lacking, and we provide some first theoretical insights. More precisely, in this paper we are concerned with the consistency of the maximum likelihood estimator (MLE) in such models and with the question of how much maximum likelihood estimation in a homogeneous model is affected by inhomogeneity of the noise, a problem which appears to be relevant in many other situations as well.

A homogeneous hidden Markov model, as considered in this paper, is given by a bivariate stochastic process (X_n, Y_n)_{n∈ℕ}, where (X_n)_{n∈ℕ} is a Markov chain with finite state space S, and (Y_n)_{n∈ℕ} is, conditioned on (X_n)_{n∈ℕ}, an independent sequence of random variables mapping to a Polish space G, such that the distribution of Y_n depends only on X_n. The Markov chain is not observable, but observations of (Y_n)_{n∈ℕ} are available. A well-known statistical method to estimate the unknown parameters is based on the maximum likelihood principle, see [9, 10]. The study of consistency and asymptotic normality of the MLE in such homogeneous HMMs has a long history and is nowadays well understood in quite general situations. We refer to the final paragraph of this section for a review but already mention that the approach of [11] is particularly useful for us.

In contrast to the classical setting, we consider an inhomogeneous HMM, namely a bivariate stochastic process (X_n, Z_n)_{n∈ℕ}, where conditioned on (X_n)_{n∈ℕ} we assume that (Z_n)_{n∈ℕ} is a sequence of independent random variables on the space G, such that the distribution of Z_n depends not only on the value of X_n, but additionally on n. The dependence on n implies that the bivariate Markov chain (X_n, Z_n)_{n∈ℕ} is inhomogeneous. In such generality, developing a theory of maximum likelihood estimation in inhomogeneous hidden Markov models is, of course, a notoriously difficult task.

However, motivated by the example above (for details see below), we consider a specific situation where the inhomogeneity is caused by an exogenous quantity (e.g. the varying voltage) whose influence decreases as n increases. To this end, we introduce the concept of a doubly hidden Markov model (DHMM).

###### Definition 1 (DHMM).

A doubly hidden Markov model is a trivariate stochastic process (X_n, Y_n, Z_n)_{n∈ℕ} such that (X_n, Y_n)_{n∈ℕ} is a non-observed homogeneous HMM and (X_n, Z_n)_{n∈ℕ} is an inhomogeneous HMM with observations (Z_n)_{n∈ℕ}.

For such a DHMM we have in mind that the distribution of Z_n gets “closer” to the distribution of Y_n for increasing n. A crucial point here is that (Z_n)_{n∈ℕ} is observable whereas (Y_n)_{n∈ℕ} is not. Because of the “proximity” of Z_n and Y_n one might hope to carry theoretical results over from homogeneous HMMs to inhomogeneous ones.

We illustrate a setting of a DHMM by modeling the conductance level of ion channel data with varying voltage (measurements are kindly provided by the lab of C. Steinem, Institute for Organic and Molecular Biochemistry, University of Göttingen). In Figure 1, measurements of the current flow across the outer cell membrane of the porin PorB of Neisseria meningitidis are displayed in order to investigate the antibacterial resistance of the PorB channel. As the applied voltage increases linearly, Ohm's law suggests that the measured current also increases linearly, see Figure 1. A reasonable model for the observed current is a Gaussian hidden Markov model, i.e., the dynamics at time n can be described by

 (1) un(μ(Xn)+σ(Xn)Vn)+ε̃n.

Here un denotes the applied voltage at time n, the observation space is G = ℝ, and the finite state space of the hidden Markov chain is assumed to be S = {1, 2}, which corresponds to an “open” and a “closed” gate. For s ∈ S, the expected slope is μ(s), the noise level is σ(s) and (Vn)_{n∈ℕ} is an i.i.d. standard normal sequence, i.e., Vn ∼ N(0, 1), where N(m, v) denotes the normal distribution with mean m and variance v. Further, (ε̃n)_{n∈ℕ} is another sequence of real-valued i.i.d. random variables, independent of (Vn)_{n∈ℕ}, with mean zero and positive variance, which is necessary to model the background noise, even when un = 0.

Dividing the dynamics (1) by un gives the conductivity of the channel, see Figure 2.

This is now a sequence of observations of an inhomogeneous HMM. The state of the Markov chain determines the parameter pair (μ(1), σ(1)) or (μ(2), σ(2)), both unknown. The non-observable sequence of random variables of the homogeneous HMM is given by

 (2) Yn :=μ(Xn)+σ(Xn)Vn.

The observation of the inhomogeneous HMM is determined by

 (3) Zn:=Yn+εn,

with εn := ε̃n/un, such that the variance of εn is proportional to un⁻² and decreases to zero as the voltage increases. Such a DHMM describes approximately the observed conductance level of ion channel recordings with linearly increasing voltage.

Intuitively, one can already see that for sufficiently large n the influence of εn “washes out” as its variance decreases to zero, so that observations of Zn are “close” to Yn.
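To make this washing-out concrete, the following minimal simulation sketch generates a DHMM in the spirit of (2) and (3). All concrete values — the transition matrix, the levels μ and σ, and the decay βn = n^(−q) of the noise level — are illustrative assumptions, not taken from the data example:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_dhmm(n, P, mu, sigma, q=1.0):
    """Simulate a DHMM in the spirit of (2)-(3): a finite-state chain X,
    hidden homogeneous observations Y_n = mu(X_n) + sigma(X_n) V_n, and
    contaminated observations Z_n = Y_n + eps_n, where eps_n has standard
    deviation beta_n = n^(-q) -> 0 (vanishing inhomogeneity)."""
    K = P.shape[0]
    X = np.empty(n, dtype=int)
    X[0] = rng.integers(K)
    for t in range(1, n):
        X[t] = rng.choice(K, p=P[X[t - 1]])
    V = rng.standard_normal(n)
    Y = mu[X] + sigma[X] * V                       # homogeneous HMM output (not observed)
    beta = np.arange(1, n + 1, dtype=float) ** (-q)
    Z = Y + beta * rng.standard_normal(n)          # observed, inhomogeneous
    return X, Y, Z

P = np.array([[0.95, 0.05], [0.10, 0.90]])
X, Y, Z = simulate_dhmm(2000, P, mu=np.array([0.0, 1.0]), sigma=np.array([0.1, 0.2]))
```

The average distance |Zn − Yn| over the last observations is then orders of magnitude smaller than over the first ones, which is exactly the kind of proximity that the structural conditions of Section 3.1 quantify.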

Main result. We now explain our main theoretical contribution for such a DHMM. Assume that we have a parametrized DHMM with compact parameter space Θ. For θ ∈ Θ let pνθ be the likelihood function of the inhomogeneous HMM (Xn, Zn)_{n∈ℕ} and qνθ be the likelihood function of the homogeneous HMM (Xn, Yn)_{n∈ℕ}, both with initial distribution ν. Both functions are assumed to be continuous in θ. Given observations z1, …, zn of Z1, …, Zn, our goal is to estimate “the true” parameter θ∗ ∈ Θ. The MLE, given by a parameter in the set of maximizers of the log-likelihood function, i.e.,

 θMLν,n∈argmaxθ∈Θlogpνθ(z1,…,zn),

is the canonical estimator for approaching this problem. Note that this set is non-empty due to the compactness of the parameter space and the continuity of θ ↦ pνθ(z1, …, zn). Unfortunately, none of the strong consistency results of maximum likelihood parameter estimation provided for homogeneous HMMs are applicable, because of the inhomogeneity. Namely, all proofs for consistency in HMMs rely on the fact that the conditional distribution of the observation given the hidden state is the same for all n. In a DHMM this is usually not the case for (Zn)_{n∈ℕ}, because of the time-dependent noise. This issue can be circumvented by proving that under suitable assumptions (Zn)_{n∈ℕ} is an asymptotically mean stationary process. This implies ergodicity and an ergodic theorem that can be used. However, the computation of the MLE requires explicit knowledge of the inhomogeneity, i.e., of the time-dependent component of the noise, which is hardly known in practice (recall our data example). That is the reason for us to introduce a quasi-maximum likelihood estimator (QMLE), given by a maximizer of the quasi-likelihood function, i.e.,

 θQMLν,n∈argmaxθ∈Θlogqνθ(z1,…,zn).

This is not an MLE, since the observations are generated from the inhomogeneous model, whereas qνθ is the likelihood function of the homogeneous model. Roughly, we assume the following (for a precise definition see Section 3.1):

1. The transition matrix of the hidden finite state space Markov chain is irreducible and satisfies a continuity condition w.r.t. the parameters.

2. The observable and non-observable random variables Zn and Yn are “close” to each other in a suitable sense.

3. The homogeneous HMM is well behaved, such that observations of (Yn)_{n∈ℕ} would lead to a consistent MLE.
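Computationally, the quasi-log-likelihood log qνθ(z1, …, zn) is just the standard forward recursion of a homogeneous HMM applied to the contaminated observations; no knowledge of the time-dependent noise enters. A minimal sketch, where Gaussian emission densities N(μ(s), σ(s)²) are an illustrative assumption:

```python
import numpy as np

def quasi_log_likelihood(z, P, mu, sigma, nu):
    """Forward recursion for log q_nu^theta(z_1, ..., z_n): the likelihood of
    the homogeneous HMM evaluated at the contaminated observations z.
    Emission density of state s is N(mu[s], sigma[s]^2); the inhomogeneous
    noise is deliberately ignored, which is the QMLE objective."""
    alpha = np.asarray(nu, dtype=float)       # predictive state distribution
    logl = 0.0
    for zt in z:
        dens = np.exp(-0.5 * ((zt - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
        w = alpha * dens                      # joint weight of state and z_t
        c = w.sum()                           # conditional density of z_t
        logl += np.log(c)
        alpha = (w / c) @ P                   # predict the next hidden state
    return logl
```

Maximizing this function over θ yields the QMLE; the MLE would instead require the densities fθ,i, i.e., explicit knowledge of the inhomogeneity.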

We show that if the Zn approximate the Yn reasonably well (see the proximity conditions in Section 3.1), the QMLE also provides a reasonable way of approximating “the true” parameter θ∗. If the model satisfies all conditions of Section 3.1, then Theorem 1 states that

 θQMLν,n→θ∗a.s., as n→∞.

Hence the QMLE is consistent. As a consequence, we obtain under an additional assumption that the MLE is also consistent, almost surely, as n → ∞. For a Poisson model and a linear Gaussian model we specify Theorem 1, see Section 4. In the DHMM described in (2) and (3) we obtain consistency of the QMLE whenever the noise level of εn decays polynomially in n. In Section 5 we revisit the proximity conditions, precisely stated in Section 3.1, provide an outlook on possible extensions and discuss asymptotic normality of the estimators.

Literature review and connection to our work. The study of maximum likelihood estimation in homogeneous hidden Markov models has a long history and was initiated by Baum and Petrie, see [9, 10], who proved strong consistency of the MLE for finite state and observation spaces. Leroux extended this result to general observation spaces in [12]. These consistency results rely on ergodic theory for stationary processes, which is not applicable in our setting since the process we observe is not stationary. More precisely, it was shown that the relative entropy rate converges for any parameter in the parameter space using an ergodic theorem for subadditive processes. There are further extensions to Markov chains on general state spaces, but under stronger assumptions, see [13, 14, 15, 16, 17]. A breakthrough was achieved by Douc et al. [11], who used the concept of exponential separability. This strategy allows one to bound the relative entropy rate directly.

Although the state space of the Markov chain in [11] is more general than in our setting, we cannot apply the results of [11] due to the inhomogeneity of the observations, but we use the same approach to show our consistency statements.

The investigation of strong consistency of maximum likelihood estimation in inhomogeneous HMMs is less developed. In [18] and [19] the MLE in inhomogeneous Markov switching models is studied. There, the transition probabilities are also influenced by the observations, but the inhomogeneity is different from the time-dependent inhomogeneity considered in our work, since the conditional law does not change over time.

Related to strong consistency, as considered here, is the investigation of asymptotic normality (as it provides weak consistency). For homogeneous HMMs asymptotic normality has been shown, for example, in [14, 20]. In [19] asymptotic normality of the MLE in Markov switching models is also studied, whereas in [21] asymptotic normality of M-estimators in more general inhomogeneous situations is considered. However, the QMLE we suggest and analyze does not satisfy the assumptions imposed there. In Section 5.4 and in Appendix B we provide and discuss conditions needed to achieve asymptotic normality for the QMLE by adapting the approach of [21].

To ease readability Section 6 is devoted to the proofs of our main results. In particular, we draw the connection between asymptotic mean stationary processes and inhomogeneous hidden Markov models.

## 2. Setup and notation

We denote the finite state space of (Xn)_{n∈ℕ} by S, and 2^S denotes the power set of S. Furthermore, let G be a Polish space with metric m and corresponding Borel σ-field B(G). The measurable space (G, B(G)) is equipped with a σ-finite reference measure λ. Throughout the whole work we consider parametrized families of DHMMs (see Definition 1) with compact parameter space Θ. For this, let (Pθ)_{θ∈Θ} be a family of probability measures on a measurable space such that for each parameter θ ∈ Θ the distribution of (Xn, Yn, Zn)_{n∈ℕ} is specified by

• an initial distribution ν on S and a transition matrix Pθ of the Markov chain (Xn)_{n∈ℕ}, such that

 Pθ(Xn=s)=νPn−1θ(s),s∈S,

where νP⁰θ = ν and for n ≥ 2,

 νPn−1θ(s)=∑s1,…,sn−1∈SPθ(sn−1,s)n−2∏i=1Pθ(si,si+1)ν(s1),s∈S;

(Here and elsewhere we use the convention that an empty product is equal to one.)

• and by the conditional distribution of (Yn, Zn) given Xn, that is,

 Pθ((Yn,Zn)∈C∣Xn=s)=Qθ,n(s,C),C∈B(G2)

which is assumed to admit conditional density functions w.r.t. λ, such that

 Pθ(Yn∈A∣Xn=s) =Qθ,n(s,A×G)=∫Afθ(s,y)λ(dy),A∈B(G), Pθ(Zn∈B∣Xn=s) =Qθ,n(s,G×B)=∫Bfθ,n(s,z)λ(dz),B∈B(G).

Here the distribution of Yn given Xn is independent of n, whereas the distribution of Zn given Xn depends, through fθ,n, also explicitly on n.

By M₁(S) we denote the set of probability measures on S. To indicate the dependence on the initial distribution, say ν ∈ M₁(S), we write Pνθ instead of just Pθ. Further, we need the distributions of (Yn)_{n∈ℕ} and (Zn)_{n∈ℕ} on the sequence space (Gℕ, B(G)⊗ℕ), respectively.

The “true” underlying model parameter will be denoted by θ∗ ∈ Θ, and we assume that the transition matrix Pθ∗ possesses a unique invariant distribution π. We have access to a finite length observation z1, …, zn of (Zn)_{n∈ℕ}. Then, the problem is to find a consistent estimate of θ∗ on the basis of the observations without observing (Xn)_{n∈ℕ} and (Yn)_{n∈ℕ}. Consistency of the estimator of θ∗ is limited up to equivalence classes in the following sense. Two parameters θ1, θ2 ∈ Θ are equivalent, written as θ1 ∼ θ2, iff there exist two stationary distributions for Pθ1, Pθ2, respectively, such that the corresponding laws of the observation process coincide. For the rest of the work assume that each θ ∈ Θ represents its equivalence class.

For an arbitrary finite measure ν on S, t ∈ ℕ, and z1, …, zt ∈ G define

 pνθ(xt+1;z1,…,zt) \coloneqq∑x1,…,xt∈Sν(x1)t∏i=1fθ,i(xi,zi)Pθ(xi,xi+1), pνθ(z1,…,zt) \coloneqq∑xt+1∈Spνθ(xt+1;z1,…,zt).

If ν is a probability measure on S, then pνθ(z1, …, zt) is the likelihood of the observations z1, …, zt for the inhomogeneous HMM with parameter θ and initial distribution ν. Although there are no observations of (Yn)_{n∈ℕ} available, we define similar quantities for the homogeneous HMM by

 qνθ(xt+1,y1,…,yt) \coloneqq∑x1,…,xt∈Sν(x1)t∏i=1fθ(xi,yi)Pθ(xi,xi+1), qνθ(y1,…,yt) \coloneqq∑xt+1∈Sqνθ(xt+1,y1,…,yt).

## 3. Assumptions and main result

Assume for a moment that observations y1, …, yn of Y1, …, Yn were available. Then the log-likelihood function of the homogeneous HMM, with initial distribution ν, is given by

 logqνθ(y1,…,yn).

In our setting we do not have access to observations of (Yn)_{n∈ℕ}, but we have access to “contaminated” observations z1, …, zn of (Zn)_{n∈ℕ}. Based on these observations define the quasi-log-likelihood function

 ℓQν,n(θ):=logqνθ(z1,…,zn),

i.e., we plug the contaminated observations into the likelihood of the homogeneous HMM. We then approximate θ∗ by a maximizer of the quasi-log-likelihood, the QMLE, that is,

 (4) θQMLν,n∈argmaxθ∈ΘℓQν,n(θ).

In addition, we are interested in the “true” MLE based on a realization of (Zn)_{n∈ℕ}. For this, define the log-likelihood function

 ℓν,n(θ):=logpνθ(z1,…,zn),

which leads to the MLE given by

 (5) θMLν,n∈argmaxθ∈Θℓν,n(θ).

Under certain structural assumptions we prove that the QMLE from (4) is consistent. By adding one more condition this result can be used to verify that the MLE from (5) is also consistent.
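To illustrate the difference between (4) and (5) numerically, the following self-contained sketch (all concrete values are illustrative assumptions) simulates a two-state Gaussian DHMM with vanishing extra noise and recovers the unknown level μ(2) by maximizing the quasi-log-likelihood over a grid, i.e., it computes a one-dimensional version of the QMLE from (4):

```python
import numpy as np

rng = np.random.default_rng(3)

# --- simulate a two-state Gaussian DHMM with vanishing extra noise ---
n = 3000
P = np.array([[0.95, 0.05], [0.10, 0.90]])
mu_true = np.array([0.0, 1.0])                 # levels of the two states
sigma = np.array([0.15, 0.15])
X = np.empty(n, dtype=int)
X[0] = 0
for t in range(1, n):
    X[t] = rng.choice(2, p=P[X[t - 1]])
Y = mu_true[X] + sigma[X] * rng.standard_normal(n)
beta = np.arange(1, n + 1, dtype=float) ** (-1.0)
Z = Y + beta * rng.standard_normal(n)          # contaminated observations

# --- quasi-log-likelihood: homogeneous forward recursion applied to Z ---
def ql(z, mu):
    alpha = np.array([0.5, 0.5])               # initial distribution nu
    logl = 0.0
    for zt in z:
        dens = np.exp(-0.5 * ((zt - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
        w = alpha * dens
        c = w.sum()
        logl += np.log(c)
        alpha = (w / c) @ P
    return logl

# --- QMLE by grid search for the unknown level mu(2), all else known ---
grid = np.linspace(0.5, 1.5, 21)
qmle = grid[np.argmax([ql(Z, np.array([0.0, g])) for g in grid])]
```

With a sample of this size the grid maximizer lands at (or next to) the true level 1.0, even though the likelihood ignores the inhomogeneous noise; this is the behavior that Theorem 1 below makes rigorous.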

### 3.1. Structural conditions

We prove consistency of the QMLE and the MLE under the following structural assumptions:

#### Irreducibility and continuity of X

(P1) The transition matrix Pθ is irreducible.

(P2) The parametrization is continuous.

#### Proximity of Y and Z

(C1) There exists p > 0 such that for any ε > 0 and s ∈ S we have

 Pθ∗(m(Zn,Yn)≥ε∣Xn=s)=O(n−p).

(Recall that m is the metric on G.)

(C2) There exists an integer k such that

 (6) Pπθ∗(k−1∏i=1maxs∈Sfθ∗,i(s,Zi)fθ∗(s,Zi)<∞) =1, Eπθ∗[maxs′∈Sfθ∗,n(s′,Zn)fθ∗(s′,Zn)∣Xn=s] <∞,∀s∈S,n≥k,

and

 (7) limsupn→∞Eπθ∗[maxs′∈Sfθ∗,n(s′,Zn)fθ∗(s′,Zn)∣Xn=s]≤1,∀s∈S.
(C3) For every θ ∈ Θ with θ ≁ θ∗, there exists a neighborhood Eθ of θ and an integer k with

 (8) Pπθ∗(k−1∏i=1supθ′∈Eθmaxs∈Sfθ′,i(s,Zi)fθ′(s,Zi)<∞) =1, Eπθ∗[supθ′∈Eθmaxs′∈Sfθ′,n(s′,Zn)fθ′(s′,Zn)∣Xn=s] <∞,∀s∈S,n≥k,

and

 (9) limn→∞(Eπθ∗[supθ′∈Eθmaxs′∈Sfθ′,n(s′,Zn)fθ′(s′,Zn)∣Xn=s])=1,∀s∈S.
###### Remark 1.

Condition (C1) guarantees in particular that m(Zn, Yn) converges Pπθ∗-a.s. to zero, whereas (C2) ensures that the ratio of fθ∗,n and fθ∗ does not diverge exponentially or faster. Assumption (C3) is needed to carry the consistency of the QMLE over to the MLE. In particular, it implies that for all θ ≁ θ∗ the ratio of fθ′,n and fθ′ does not diverge exponentially or faster, uniformly in θ′ in a neighborhood of θ.

#### Well-behaved HMM

It is plausible that we can only prove consistency in the case where the unobservable sequence (Yn)_{n∈ℕ} itself would lead to a consistent estimator of θ∗. To guarantee that this is indeed the case we assume:

(H1) For all s ∈ S let Eπθ∗[|logfθ∗(s,Y1)|] < ∞.

(H2) For every θ ∈ Θ with θ ≁ θ∗, there exists a neighborhood Uθ of θ such that

 Eπθ∗[supθ′∈Uθ(logfθ′(s,Y1))+]<∞ for all s∈S.

(H3) The mappings θ ↦ Pθ(s,s′) and θ ↦ fθ(s,y) are continuous for any s, s′ ∈ S and y ∈ G.

(H4) For all … and … let … .

###### Remark 2.

The conditions (H1)–(H3) coincide with the assumptions in [11, Sect. 3.2] for finite state models and guarantee that the MLE for θ∗ based on observations of (Yn)_{n∈ℕ} is consistent. The condition (H4) is an additional regularity assumption required for the inhomogeneous setting.

### 3.2. Consistency theorem

Now we formulate our main results about the consistency of the QMLE and the MLE.

###### Theorem 1.

Assume that the irreducibility and continuity conditions (P1), (P2), the proximity conditions (C1), (C2) and the well-behaved HMM conditions (H1)–(H4) are satisfied. Further, let the initial distribution ν satisfy ν(s) > 0 if and only if π(s) > 0 for every s ∈ S. Then

 θQMLν,n→θ∗,Pπθ∗-a.s.

as .

Note that condition (C3) is not required in the previous statement. We only need it to prove the consistency of the MLE.

###### Corollary 1.

Assume that the setting and conditions of Theorem 1 as well as (C3) are satisfied. Then

 θMLν,n→θ∗,Pπθ∗-a.s.

as .

## 4. Application

We consider two models for which we verify the structural assumptions from Section 3.1 explicitly. The Poisson model, see Section 4.1, illustrates a simple example with countable observation space. The linear Gaussian model is an extension of the model introduced in (2) and (3) to multivariate and possibly correlated observations.

### 4.1. Poisson DHMM

For s ∈ S = {1, …, K} let λ(s)θ > 0 and define the vector λθ = (λ(1)θ, …, λ(K)θ). Conditioned on (Xn)_{n∈ℕ}, the non-observed homogeneous sequence (Yn)_{n∈ℕ} is an independent sequence of Poisson-distributed random variables with parameter λ(Xn)θ. In other words, given Xn = s we have Yn ∼ Poi(λ(s)θ). Here Poi(r) denotes the Poisson distribution with expectation r > 0. The observed sequence (Zn)_{n∈ℕ} is determined by

 Zn=Yn+εn,

where (εn)_{n∈ℕ} is an independent sequence of random variables with εn ∼ Poi(βn). Here (βn)_{n∈ℕ} is a sequence of positive real numbers satisfying, for some p > 0, that

 (10) βn=O(n−p).

We also assume that (εn)_{n∈ℕ} is independent of (Xn, Yn)_{n∈ℕ} and that the parameter θ determines the transition matrix Pθ and the intensity vector λθ continuously. Note that the observation space is given by G = ℕ₀ equipped with the counting measure. Figure 3 illustrates the empirical mean squared error of approximations of the MLEs.
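A minimal simulation sketch of this Poisson DHMM (the transition matrix, intensities and p = 1 are illustrative assumptions) shows the mechanics of the contamination: εn ∼ Poi(βn) with βn = n^(−p), so mismatches between Zn and Yn become rare as n grows:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_poisson_dhmm(n, P, lam, p=1.0):
    """Poisson DHMM: Y_n | X_n = s ~ Poi(lam[s]) (hidden) and
    Z_n = Y_n + eps_n with eps_n ~ Poi(beta_n), beta_n = n^(-p)."""
    K = P.shape[0]
    X = np.empty(n, dtype=int)
    X[0] = rng.integers(K)
    for t in range(1, n):
        X[t] = rng.choice(K, p=P[X[t - 1]])
    Y = rng.poisson(lam[X])                    # hidden homogeneous observations
    beta = np.arange(1, n + 1, dtype=float) ** (-p)
    Z = Y + rng.poisson(beta)                  # additive Poisson contamination
    return X, Y, Z

P = np.array([[0.9, 0.1], [0.3, 0.7]])
X, Y, Z = simulate_poisson_dhmm(5000, P, lam=np.array([1.0, 4.0]))
```

Since P(Zn ≠ Yn | Xn = s) = P(εn ≥ 1) = 1 − exp(−βn) = O(n^(−p)), the fraction of contaminated observations among the first n vanishes as n → ∞, which is the proximity that condition (C1) quantifies.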

To obtain the desired consistency of the two estimators we need to check the conditions (P1), (P2), (C1)–(C3) and (H1)–(H4):

To (P1) and (P2): By the assumptions in this scenario these conditions are satisfied.

To (H1)–(H4): For s ∈ S, y ∈ ℕ₀ and θ ∈ Θ we have

 |logfθ(s,y)| = −log((λ(s)θ)^y/y! ⋅ exp(−λ(s)θ)) = −y log(λ(s)θ) + log(y!) + λ(s)θ ≤ −y log(λ(s)θ) + y² + λ(s)θ.

Hence

 Eπθ∗[|logfθ∗(s,Y1)|] ≤ |log(λ(s)θ∗)| ∑_{s′=1}^K π(s′)λ(s′)θ∗ + ∑_{s′=1}^K π(s′)((λ(s′)θ∗)² + λ(s′)θ∗) + λ(s)θ∗ < ∞

and (H1) is verified. A similar calculation gives (H4). Condition (H2) follows by a similar estimate. Condition (H3) follows by the continuity in the parameter of the probability mass function of the Poisson distribution and the continuity of the mapping θ ↦ Pθ.

To (C1)–(C3): For any ε > 0 and any s ∈ S we have

 Pθ∗(m(Zn,Yn) ≥ ε ∣ Xn=s) ≤ Pθ∗(εn ≥ 1) = 1 − exp(−βn).

From (10) it follows that

 1−exp(−βn)=O(n−p),

which proves (C1). Observe that for any z ∈ ℕ₀ we have

 maxs∈Sfθ∗,n(s,z)fθ∗(s,z)=maxs∈S(βn+λ(s)θ∗)z(λ(s)θ∗)zexp(−βn)=(an)zexp(−βn),

with an := max_{s∈S} (βn + λ(s)θ∗)/(λ(s)θ∗). Now we verify (C2) with k = 1. For all n ∈ ℕ and s ∈ S we have

 Eπθ∗[maxs′∈Sfθ∗,n(s′,Zn)fθ∗(s′,Zn)∣Xn=s] =Eπθ∗[aZnnexp(−βn)∣Xn=s] =exp((λ(s)θ∗+βn)(an−1)−βn)<∞.

Fix s ∈ S and note that

 limsupn→∞Eπθ∗[maxs′∈Sfθ∗,n(s′,Zn)fθ∗(s′,Zn)∣Xn=s] =limsupn→∞exp((λ(s)θ∗+βn)(an−1)−βn)=1.

The last equality follows from the facts that an → 1 and βn → 0. Condition (C3) follows by similar arguments.

The application of Theorem 1 and Corollary 1 leads to the following result.

###### Corollary 2.

For any initial distribution ν which satisfies ν(s) > 0 if and only if π(s) > 0, we have for the Poisson DHMM, if (10) holds for some p > 0, that

 θQMLν,n→θ∗,Pπθ∗-a.s.

and

 θMLν,n→θ∗,Pπθ∗-a.s.

as .

### 4.2. Multivariate linear Gaussian DHMM

For s ∈ S let μ(s)θ ∈ ℝ^w and Σ(s)θ ∈ ℝ^{w×w} with full rank, where w ∈ ℕ denotes the dimension of the observations. Define μθ := (μ(1)θ, …, μ(K)θ) as well as Σθ := (Σ(1)θ, …, Σ(K)θ). The sequences (Yn)_{n∈ℕ} and (Zn)_{n∈ℕ} are defined by

 Yn = μ(Xn)θ∗ + Σ(Xn)θ∗ Vn,
 Zn = Yn + εn.

Here (Vn)_{n∈ℕ} is an i.i.d. sequence of random vectors with Vn ∼ N(0, I), where I denotes the w-dimensional identity matrix, and (εn)_{n∈ℕ} is a sequence of independent random vectors with εn ∼ N(0, βn²I), where (βn)_{n∈ℕ} is a positive real-valued sequence satisfying, for some q > 0, that

 (11) βn=O(n−q).

Here we also assume that the mapping θ ↦ (Pθ, μθ, Σθ) is continuous. Furthermore, note that G = ℝ^w and λ is the w-dimensional Lebesgue measure. Figure 5 illustrates the empirical mean squared error of approximations of the MLEs.
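The model can be sketched in code as follows (dimension w = 2, diagonal matrices Σ(s) and decay exponent q = 1 are illustrative assumptions); the contaminated observations Zn approach the hidden Yn at rate βn:

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_mv_gaussian_dhmm(n, P, mu, Sigma, q=1.0):
    """Multivariate linear Gaussian DHMM:
    Y_n = mu[X_n] + Sigma[X_n] @ V_n with V_n ~ N(0, I_w) (hidden),
    Z_n = Y_n + eps_n with eps_n ~ N(0, beta_n^2 I_w), beta_n = n^(-q)."""
    K, w = mu.shape
    X = np.empty(n, dtype=int)
    X[0] = rng.integers(K)
    for t in range(1, n):
        X[t] = rng.choice(K, p=P[X[t - 1]])
    V = rng.standard_normal((n, w))
    Y = mu[X] + np.einsum('nij,nj->ni', Sigma[X], V)   # state-dependent covariance
    beta = np.arange(1, n + 1, dtype=float) ** (-q)
    Z = Y + beta[:, None] * rng.standard_normal((n, w))
    return X, Y, Z

P = np.array([[0.9, 0.1], [0.2, 0.8]])
mu = np.array([[0.0, 0.0], [1.0, 1.0]])
Sigma = np.stack([0.2 * np.eye(2), 0.4 * np.eye(2)])
X, Y, Z = simulate_mv_gaussian_dhmm(1000, P, mu, Sigma)
```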

To obtain consistency of the two estimators we need to check the conditions (P1), (P2), (C1)–(C3) and (H1)–(H4):

To (P1) and (P2): By the definition of the model these conditions are satisfied.

To (H1)–(H4): For a matrix A, det(A) denotes its determinant. Note that for s ∈ S, y, z ∈ ℝ^w and θ ∈ Θ we have

 fθ(s,y) = (2π)^{−w/2} det((Σ(s)θ)²)^{−1/2} exp(−½(y−μ(s)θ)T((Σ(s)θ)²)−1(y−μ(s)θ)),
 fθ,n(s,z) = (2π)^{−w/2} det((Σ(s)θ)²+βn²I)^{−1/2} exp(−½(z−μ(s)θ)T((Σ(s)θ)²+βn²I)−1(z−μ(s)θ)).

Further, observe that det((Σ(s)θ)² + βn²I) ≥ det((Σ(s)θ)²) for all n ∈ ℕ. For some constant C1 > 0 we have

 Eπθ∗[|logfθ(s,Y1)|] ≤C1+Eπθ∗[12(Y1−μ(s)θ)T(Σ(s)θ)−2(Y1−μ(s)θ)]<∞,

since Y1 has finite second moments under Pπθ∗. By this estimate, (H1) and (H2) follow easily. Condition (H4) follows by similar arguments. More precisely, we have that βn is finite and converges to zero, as well as that there exists a constant C2 such that

 Eπθ∗[∣∣logfθ∗,n(s,Zn)∣∣] ≤C2+Eπθ∗[12(Zn−μs)