1. Introduction
Motivation. Hidden Markov models (HMMs) have a long history and are widely used in a multitude of applications, ranging from econometrics, chemistry, and biology to speech recognition and neurophysiology. For example, transition rates between openings and closings of ion channels, see [1], are often assumed to be Markovian, and the observed conductance levels from such experiments can be modeled with homogeneous HMMs. The HMM is typically justified if the underlying experimental conditions, such as the applied voltage in ion channel recordings, are kept constant over time, see [2, 3, 4, 5, 6].
However, if the conductance levels are measured in experiments with varying voltage over time, then the noise appears to be inhomogeneous, i.e., the noise has a voltage-dependent component. Such experiments play an important role in understanding the dependence of the gating behavior on the gradient of the applied voltage [7, 8]. To the best of our knowledge, there is a lack of a rigorous statistical methodology for analyzing such types of problems, for which we provide some first theoretical insights. More precisely, in this paper we are concerned with the consistency of the maximum likelihood estimator (MLE) in such models and with the question of how much maximum likelihood estimation in a homogeneous model is affected by inhomogeneity of the noise, a problem which appears to be relevant to many other situations as well.
A homogeneous hidden Markov model, as considered in this paper, is given by a bivariate stochastic process , where
is a Markov chain with finite state space
, and is, conditioned on, an independent sequence of random variables mapping to a Polish space
, such that the distribution of depends only on . The Markov chain is not observable, but observations of are available. A well known statistical method to estimate the unknown parameters is based on the maximum likelihood principle, see [9, 10]. The study of consistency and asymptotic normality of the MLE of such homogeneous HMMs has a long history and is nowadays well understood in quite general situations. We refer to the final paragraph of this section for a review but already mention that the approach of [11] is particularly useful for us. In contrast to the classical setting, we consider an inhomogeneous HMM, namely a bivariate stochastic process , where conditioned on we assume that is a sequence of independent random variables on space , such that the distribution of depends not only on the value of , but additionally on . The dependence on implies that the Markov chain is inhomogeneous. Developing a theory for maximum likelihood estimation in inhomogeneous hidden Markov models in such generality is, of course, notoriously difficult.
However, motivated by the example above (for details see below), we consider a specific situation where the inhomogeneity is caused by an exogenous quantity (e.g., the varying voltage) whose influence decreases as increases. To this end, we introduce the concept of a doubly hidden Markov model (DHMM).
Definition 1 (DHMM).
A doubly hidden Markov model is a trivariate stochastic process such that is a non-observed homogeneous HMM and is an inhomogeneous HMM with observations .
For such a DHMM we have in mind that the distribution of is getting “closer” to the distribution of for increasing . A crucial point here is that is observable whereas is not. Because of the “proximity” of and one might hope to carry theoretical results from homogeneous HMMs to inhomogeneous ones.
We illustrate a setting of a DHMM by modeling the conductance level of ion channel data with varying voltage (measurements are kindly provided by the lab of C. Steinem, Institute for Organic and Molecular Biochemistry, University of Göttingen). In Figure 1 measurements of the current flow across the outer cell membrane of the porin PorB of Neisseria meningitidis are displayed, recorded in order to investigate the antibacterial resistance of the PorB channel. As the applied voltage increases linearly, Ohm’s law suggests that the measured current also increases linearly, see Figure 1. A reasonable model for the observed current is to assume that it follows a Gaussian hidden Markov model, i.e., the dynamics can be described by
(1) 
Here the observation space and the finite state space of the hidden Markov chain is assumed to be , which corresponds to an “open” and “closed” gate. For , the expected slope is , the noise level and is an i.i.d. standard normal sequence, i.e., , where
denotes the normal distribution with mean
and variance
Further, is another sequence of real-valued i.i.d. random variables, independent of , with and , which is necessary to model the background noise, even when . This is now a sequence of an inhomogeneous HMM. The state of the Markov chain determines the parameter or , both unknown. The non-observable sequence of random variables of the homogeneous HMM is given by
(2) 
The observation of the inhomogeneous HMM is determined by
(3) 
with , such that where and as the voltage increases. Such a DHMM describes approximately the observed conductance level of ion channel recordings with linearly increasing voltage.
Intuitively, here one can already see that for sufficiently large the influence of “washes out” as decreases to zero and observations of are “close” to .
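As a toy illustration of this washing-out effect, the two-state Gaussian DHMM sketched above can be simulated as follows. All numerical values (transition matrix, conductance levels, noise level, and the decay exponent of the inhomogeneous noise scale) are hypothetical choices for illustration, not the parameters of the data in Figure 1; the scale `i**(-gamma)` is simply one concrete sequence decaying to zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# All values below are hypothetical and only serve to illustrate the model.
P = np.array([[0.95, 0.05],        # transition matrix of the hidden chain X
              [0.10, 0.90]])
m = np.array([0.0, 1.0])           # conductance levels ("closed", "open")
sigma = 0.2                        # noise level of the homogeneous HMM
gamma = 0.6                        # decay exponent of the inhomogeneous noise


def simulate_dhmm(n):
    """Simulate (X_i, Y_i, Z_i), i = 1, ..., n: X is the hidden chain, Y the
    unobserved homogeneous HMM observation, Z the contaminated observation."""
    X = np.empty(n, dtype=int)
    X[0] = 0
    for i in range(1, n):
        X[i] = rng.choice(2, p=P[X[i - 1]])
    eps = rng.standard_normal(n)               # homogeneous noise
    xi = rng.standard_normal(n)                # background noise
    Y = m[X] + sigma * eps                     # homogeneous HMM observations
    c = np.arange(1, n + 1) ** (-gamma)        # noise scale decaying to zero
    Z = Y + c * xi                             # inhomogeneous observations
    return X, Y, Z


X, Y, Z = simulate_dhmm(1000)
```

Since the contamination scale vanishes, the gap between Z and Y shrinks along the sample, which is exactly the "washing out" described above.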
Main result. We now explain our main theoretical contribution for such a DHMM. Assume that we have a parametrized DHMM with compact parameter space . For let be the likelihood function of and be the likelihood function of with . Both functions are assumed to be continuous in . Given observations of our goal is to estimate “the true” parameter . The MLE , given by a parameter in the set of maximizers of the log-likelihood function, i.e.,
is the canonical estimator for approaching this problem. Note that this set is nonempty due to the compactness of the parameter space and the continuity of in . Unfortunately, none of the strong consistency results of maximum likelihood parameter estimation provided for homogeneous HMMs is applicable, because of the inhomogeneity. Namely, all proofs for consistency in HMMs rely on the fact that the conditional distribution of given is constant for all . In a DHMM this is usually not the case for , because of the time-dependent noise. This issue can be circumvented by proving that under suitable assumptions is an asymptotically mean stationary process. This implies ergodicity and an ergodic theorem for that can be used. However, for the computation of , explicit knowledge of the inhomogeneity is needed, i.e., of the time-dependent component of the noise, which is hardly known in practice (recall our data example). This is the reason for us to introduce a quasi-maximum likelihood estimator (QMLE), given by a maximizer of the quasi-likelihood function, i.e.,
This is not a MLE, since the observations are generated from the inhomogeneous model, whereas is the likelihood function of the homogeneous model. Roughly, we assume the following (for a precise definition see Section 3.1):
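Concretely, the quasi-likelihood of the homogeneous model can be evaluated with the standard HMM forward recursion, applied to the contaminated observations. The following sketch does this for the two-state Gaussian model; the parameter values, the coarse grid, and the restriction to estimating only the second level are simplifying assumptions for illustration, not the paper's setup.

```python
import numpy as np


def quasi_log_likelihood(Z, P, m, sigma, pi0):
    """Forward recursion for the log-likelihood of the homogeneous Gaussian
    HMM, evaluated at the contaminated observations Z (hence "quasi")."""
    alpha = np.asarray(pi0, dtype=float)
    logL = 0.0
    for t, z in enumerate(Z):
        # Gaussian emission densities for all hidden states at once
        dens = np.exp(-0.5 * ((z - m) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
        if t > 0:
            alpha = alpha @ P          # one-step prediction of the hidden state
        alpha = alpha * dens           # update with the emission densities
        c = alpha.sum()                # conditional likelihood of z given the past
        logL += np.log(c)
        alpha = alpha / c              # normalize the filter
    return logL


# Toy contaminated data from a two-state chain (illustrative parameters).
rng = np.random.default_rng(0)
n = 500
P_true = np.array([[0.95, 0.05], [0.10, 0.90]])
m_true = np.array([0.0, 1.0])
X = np.empty(n, dtype=int)
X[0] = 0
for i in range(1, n):
    X[i] = rng.choice(2, p=P_true[X[i - 1]])
Z = (m_true[X] + 0.2 * rng.standard_normal(n)
     + np.arange(1, n + 1) ** (-0.6) * rng.standard_normal(n))

# QMLE over a coarse grid for the second level only; all else held fixed.
grid = np.linspace(0.5, 1.5, 21)
qmle_m1 = max(grid, key=lambda m1: quasi_log_likelihood(
    Z, P_true, np.array([0.0, m1]), 0.2, np.array([0.5, 0.5])))
```

Maximizing the quasi-log-likelihood over the grid mimics the QMLE: the homogeneous likelihood is fed the inhomogeneous data.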
1.) The transition matrix of the hidden finite state space Markov chain is irreducible and satisfies a continuity condition w.r.t. the parameters.

2.) The observable and non-observable random variables and are “close” to each other in a suitable sense.

3.) The homogeneous HMM is well behaved, such that observations of would lead to a consistent MLE.
We show that if the approximate the reasonably well (see condition (C1) in Section 3.1), the estimator also provides a reasonable way of approximating “the true” parameter . If the model satisfies all conditions, see Section 3.1, then Theorem 1 states that
Hence the QMLE is consistent. As a consequence, we obtain under an additional assumption that the MLE is also consistent, almost surely, as . For a Poisson model and a linear Gaussian model we specify Theorem 1, see Section 4. In the DHMM described in (2) and (3) we obtain consistency of the QMLE whenever for some . In Section 5 we reconsider the approximating condition (C2), precisely stated in Section 3.1, provide an outlook on possible extensions and discuss asymptotic normality of the estimators.
Literature review and connection to our work. The study of maximum likelihood estimation in homogeneous hidden Markov models has a long history and was initiated by Baum and Petrie, see [9, 10], who proved strong consistency of the MLE for finite state spaces and . Leroux extended this result to general observation spaces in [12]. These consistency results rely on ergodic theory for stationary processes, which is not applicable in our setting since the process we observe is not stationary. More precisely, it was shown that the relative entropy rate converges for any parameter in the parameter space, using an ergodic theorem for subadditive processes. There are further extensions to Markov chains on general state spaces, but under stronger assumptions, see [13, 14, 15, 16, 17]. A breakthrough was achieved by Douc et al. [11], who used the concept of exponential separability. This strategy allows one to bound the relative entropy rate directly.
Although the state space of the Markov chain is more general than in our setting, we cannot apply the results of [11] due to the inhomogeneity of the observation, but we use the same approach to show our consistency statements.
The investigation of strong consistency of maximum likelihood estimation in inhomogeneous HMMs is less developed. In [18] and [19]
the MLE in inhomogeneous Markov switching models is studied. There, the transition probabilities are also influenced by the observations, but the inhomogeneity is different from the time-dependent inhomogeneity considered in our work, since the conditional law is not changing over time.
Related to strong consistency, as considered here, is the investigation of asymptotic normality (as it provides weak consistency). For homogeneous HMMs asymptotic normality has been shown, for example, in [14, 20]. In [19] asymptotic normality for the MLE in Markov switching models is also studied, whereas in [21] asymptotic normality of M-estimators in more general inhomogeneous situations is considered. However, the QMLE we suggest and analyze does not satisfy the assumptions imposed there. In Section 5.4 and in Appendix B we provide and discuss conditions necessary to achieve asymptotic normality for the QMLE by adapting the approach of [21].
To ease readability, Section 6 is devoted to the proofs of our main results. In particular, we draw the connection between asymptotically mean stationary processes and inhomogeneous hidden Markov models.
2. Setup and notation
We denote the finite state space of by and denotes the power set of . Furthermore, let be a Polish space with metric and corresponding Borel σ-field . The measurable space is equipped with a finite reference measure . Throughout the whole work we consider parametrized families of DHMMs (see Definition 1) with compact parameter space for some . For this let be a sequence of probability measures on a measurable space such that for each parameter the distribution of is specified by

an initial distribution on and a transition matrix of the Markov chain , such that
where and for ,
(Here and elsewhere we use the convention that for any sequence .)

and by the conditional distribution of given , that is,
which satisfies that there are conditional density functions w.r.t. , such that
Here the distribution of given is independent of , whereas the distribution of given depends through also explicitly on .
By we denote the set of probability measures on . To indicate the dependence on the initial distribution, say , we write instead of just . To shorten the notation, let , and . Further, let and be the distributions of and on , respectively.
The “true” underlying model parameter will be denoted as and we assume that the transition matrix possesses a unique invariant distribution . We have access to a finite length observation of . Then, the problem is to find a consistent estimate of on the basis of the observations without observing . Consistency of the estimator of is limited up to equivalence classes in the following sense. Two parameters are equivalent, written as , iff there exist two stationary distributions for , respectively, such that . For the rest of the work assume that each represents its equivalence class.
For an arbitrary finite measure on , , and define
If is a probability measure on , then is the likelihood of the observations for the inhomogeneous HMM with parameter and . Although there are no observations of available, we define similar quantities for by
3. Assumptions and main result
Assume for a moment that observations
of are available. Then the log-likelihood function of , with initial distribution , is given by
In our setting we do not have access to observations of , but we have access to “contaminated” observations of . Based on these observations we define a quasi-log-likelihood function
i.e., we plug the contaminated observations into the likelihood of . Now we approximate by which is the QMLE, that is,
(4) 
In addition, we are interested in the “true” MLE of a realization of . For this define the loglikelihood function
which leads to the MLE given by
(5) 
Under certain structural assumptions we prove that the QMLE from (4) is consistent. By adding one more condition this result can be used to verify that the MLE from (5) is also consistent.
3.1. Structural conditions
We prove consistency of the QMLE and the MLE under the following structural assumptions:
Irreducibility and continuity of
(P1) The transition matrix is irreducible.

(P2) The parametrization is continuous.
Proximity of and
(C1) There exists such that for any and we have

(Recall that is the metric on .)

(C2) There exists an integer such that

(6) and

(7)

(C3) For every with , there exists a neighborhood of such that there exists an integer with

(8) and

(9)
Remark 1.
Condition (C1) guarantees in particular that converges a.s. to zero, whereas (C2) ensures that the ratio of and does not diverge exponentially or faster. Assumption (C3) is needed to carry over the consistency of the QMLE to the MLE. In particular, it implies that for all the ratio of and does not diverge exponentially or faster, uniformly in .
Well-behaved HMM
It is plausible that we can only prove consistency in the case where the unobservable sequence itself would lead to a consistent estimator of . To guarantee that this is indeed the case we assume:
(H1) For all let .

(H2) For every with , there exists a neighborhood of such that

(H3) The mappings and are continuous for any , and .

(H4) For all and let .
3.2. Consistency theorem
Now we formulate our main results about the consistency of the QMLE and the MLE.
Theorem 1.
Note that condition (C3) is not required in the previous statement. We only need it to prove the consistency of the MLE .
4. Application
We consider two models where we explore the structural assumptions from Section 3.1 explicitly. The Poisson model, see Section 4.1, illustrates a simple example with countable observation space. The linear Gaussian model is an extension of the model introduced in (1) and (2) to multivariate and possibly correlated observations.
4.1. Poisson DHMM
For let
and define the vector
. Conditioned on , the non-observed homogeneous sequence is an independent sequence of Poisson-distributed random variables with parameter
. In other words, given we have . Here denotes the Poisson distribution with expectation . The observed sequence is determined by
where is an independent sequence of random variables with . Here is a sequence of positive real numbers satisfying for some that
(10) 
We also assume that is independent of and that the parameter determines the transition matrix and the intensity continuously. Note that the observation space is given by equipped with the counting measure . Figure 3 illustrates the empirical mean square error of approximations of the MLEs.
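A minimal simulation of such a Poisson DHMM might look as follows. The transition matrix, the intensities, and the decay c_i = 1/i of the contamination intensities are illustrative assumptions (for the theory, the sequence must additionally satisfy (10)); only the structure, an unobserved Poisson HMM plus an additive Poisson contamination with vanishing intensity, is taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters; c_i = 1/i is one concrete choice of vanishing
# contamination intensities, not the sequence from the paper.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])                 # transition matrix of the chain X
lam = np.array([1.0, 5.0])                 # state-dependent Poisson intensities

n = 2000
X = np.empty(n, dtype=int)
X[0] = 0
for i in range(1, n):
    X[i] = rng.choice(2, p=P[X[i - 1]])

Y = rng.poisson(lam[X])                    # unobserved homogeneous HMM
c = 1.0 / np.arange(1, n + 1)              # contamination intensities -> 0
Z = Y + rng.poisson(c)                     # observed inhomogeneous HMM
```

Because the contamination is an additive Poisson count with vanishing intensity, Z is always at least Y, and the two sequences coincide for most large indices.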
To obtain the desired consistency of the two estimators we need to check the conditions (P1), (P2), (C1)–(C3) and (H1)–(H4):
Hence
and (H1) is verified. A similar calculation gives (H4).
Condition (H2) follows simply by .
Condition (H3) follows by the continuity in the parameter of the probability function of the Poisson distribution and the continuity of the mapping .
To (C1)–(C3): For any and any we have
From (10) it follows that
which proves (C1). Observe that for any we have
with . Now we verify (C2) with . For all and we have
Fix , and note that
The last equality follows from the fact that and . Condition (C3) follows by similar arguments.
Corollary 2.
For any initial distribution which is strictly positive if and only if is strictly positive, we have for the Poisson DHMM if (10) holds for some that
and
as .
4.2. Multivariate linear Gaussian DHMM
For let , with full rank, where . Define as well as
The sequences and are defined by
Here is an i.i.d. sequence of random vectors with , where
denotes the identity matrix, and
is a sequence of independent random vectors with , where is a positive real-valued sequence satisfying for some that
(11)
Here we also assume that the mapping is continuous. Furthermore, note that and is the -dimensional Lebesgue measure. Figure 5 illustrates the empirical mean square error of approximations of the MLEs.
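A simulation sketch of this multivariate model can be set up analogously to the scalar case. The dimension, the state-dependent mean vectors, the full-rank matrix A acting as a square root of the covariance, and the decay exponent of the scale sequence are all illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

d, n = 3, 1000
mu = np.array([np.zeros(d), np.ones(d)])   # state-dependent mean vectors
A = 0.3 * np.eye(d)                        # full-rank covariance square root
P = np.array([[0.9, 0.1], [0.2, 0.8]])     # transition matrix of the chain X

X = np.empty(n, dtype=int)
X[0] = 0
for i in range(1, n):
    X[i] = rng.choice(2, p=P[X[i - 1]])

eps = rng.standard_normal((n, d))          # homogeneous N(0, I_d) noise
xi = rng.standard_normal((n, d))           # background N(0, I_d) noise
c = np.arange(1, n + 1) ** (-0.6)          # vanishing scale sequence

Y = mu[X] + eps @ A.T                      # unobserved homogeneous HMM in R^d
Z = Y + c[:, None] * xi                    # observed inhomogeneous HMM
```

As in the scalar case, the Euclidean gap between Z and Y shrinks along the sample because the scale sequence vanishes.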
To (H1)–(H4): For a matrix denote and . Note that for , and we have by
Further, observe that for all . For some constant we have
since for each we have with the notation . By this estimate (H1) and (H2) follow easily. Condition (H4) follows by similar arguments. More precisely, we have that is finite and converges to zero, as well as that there exists a constant such that