 # Detecting Changes in Hidden Markov Models

We consider the problem of sequential detection of a change in the statistical behavior of a hidden Markov model. By adopting a worst-case analysis with respect to the time of change and by taking into account the data that can be accessed by the change-imposing mechanism we offer alternative formulations of the problem. For each formulation we derive the optimum Shewhart test that maximizes the worst-case detection probability while guaranteeing infrequent false alarms.


## I Introduction

We consider a hidden Markov model (HMM) $\{\xi_t, z_t\}$, where $\{\xi_t\}$ is the observation process, acquired sequentially, and $\{z_t\}$ is a Markov process that controls the statistical behavior of $\{\xi_t\}$ but whose state is hidden. Let $\tau$ denote a changetime, with the processes following a nominal probability measure $P_\infty$ up to and including time $\tau$ while, after $\tau$, the probability measure switches to an alternative regime $P_0$. This change induces a new measure, denoted by $P_\tau$, with $E_\tau$ reserved for the corresponding expectation.

To be more precise, for $t \le \tau$ we make the simplifying assumptions that the observations $\{\xi_t\}$ are i.i.d. with a common pdf $f_\infty(\xi_t)$ and that the Markov process $\{z_t\}$ has a transition pdf $g_\infty(z_t|z_{t-1})$. After the change the observations are conditionally independent and controlled by the Markov process. In particular, $\xi_t$ conditioned on $z_t$ has pdf $f_0(\xi_t|z_t)$ while the Markov process has transition pdf $g_0(z_t|z_{t-1})$. It is possible to have $g_0 = g_\infty$, namely, the Markov process need not undergo any change. For simplicity, under the nominal measure the observations are assumed not to be controlled by the Markov process.

We would like to detect the onset of the change in the statistical behavior using a sequential strategy. We are therefore interested in defining a stopping time $T$ adapted to the filtration $\{\mathcal{F}^\xi_t\}$ generated by the observations, that is, $\mathcal{F}^\xi_t = \sigma\{\xi_1,\ldots,\xi_t\}$, to perform the detection. In order to select $T$ optimally we need to propose a suitable performance measure and properly optimize it. To derive our criterion we are going to extend the idea introduced in [6]. Even though we can access only the observation process $\{\xi_t\}$ to perform detection, there is also a change-imposing mechanism that must decide when to impose the change, and this mechanism may have access to a completely different set of data to make this decision.

## II Performance Measure

As suggested above, $T$ is an $\{\mathcal{F}^\xi_t\}$-adapted stopping time. In order to capture the fact that the change-imposing mechanism may have access to completely different information, we are going to assume that $\tau$ is also a stopping time (the time at which the data stop following the nominal model) but adapted to a filtration $\{\mathcal{F}^w_t\}$, where $\mathcal{F}^w_t$ is generated by a data sequence $\{w_t\}$ up to time $t$. In other words, the change-imposing mechanism sequentially consults the data sequence $\{w_t\}$ and at each time instant decides whether it should impose a change or not. Both we, who select $T$, and the change-imposing mechanism, which selects $\tau$, are bound by a causality constraint forbidding the use of any data from the future. Clearly the process $\{w_t\}$, which is available to the change-imposing mechanism, may or may not include $\{\xi_t\}$, and it can be dependent on or completely independent of the observations.

The change-imposing mechanism decides what is the best instant to impose the change, while we decide what is the best time to stop and declare that a change took place. If $\phi(T,\tau)$ is a deterministic function expressing a distance between, or a reward for, the pair $(T,\tau)$, then we can use the conditional expectation $E_\tau[\phi(T,\tau)\,|\,T>\tau]$ as a generic performance measure for the detection process. We condition on the event $\{T>\tau\}$ of no false alarm in order to compute the performance of $T$ only during successes, since we intend to take care of false alarms differently.

Most of the time the rule that defines the stopping time $\tau$ is unknown, therefore the proposed performance measure cannot be computed and, more importantly, cannot be used to derive an optimum detection strategy. In such cases it is common to follow a worst-case analysis with respect to $\tau$. In other words, we try to find the worst-case $\tau$ that makes the conditional expectation as unfavorable as possible to the detection goal. The following lemma addresses this problem.

###### Lemma 1.

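Suppose that $T$ and $\tau$ are stopping times as described above, then

$$\inf_{\tau} E_\tau[\phi(T,\tau)\,|\,T>\tau] = \inf_{t\ge 0}\,\operatorname{ess\,inf}\, E_t[\phi(T,t)\,|\,T>t,\mathcal{F}^w_t]. \tag{1}$$

The previous equality is also valid if we replace $\inf$ and $\operatorname{ess\,inf}$ with $\sup$ and $\operatorname{ess\,sup}$.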

###### Proof.

The proof is given in the Appendix. ∎

If we select $\phi(T,\tau) = T-\tau$ then we can define an extension of Lorden's measure [5] by computing the worst-case average detection delay as follows

$$J(T) = \sup_{t\ge 0}\,\operatorname{ess\,sup}\, E_t[T-t\,|\,T>t,\mathcal{F}^w_t].$$

We must emphasize that this is not the original Lorden measure, since conditioning is with respect to the sigma-algebra $\mathcal{F}^w_t$ that controls $\tau$ and not $\mathcal{F}^\xi_t$ used in the original definition. Furthermore, in our approach the double maximization occurs naturally as a result of our worst-case analysis and not because of some arbitrary definition.

An alternative measure can be generated by evaluating the performance of $T$ using the probability of detecting the change immediately after it occurs. In other words, we are interested in the probability of the event $\{T=\tau+1\}$. For this reason we define $\phi(T,\tau) = 1_{\{T=\tau+1\}}$. If we apply Lemma 1 we can compute the worst-case detection probability

$$P(T) = \inf_{t\ge 0}\,\operatorname{ess\,inf}\, P_t(T=t+1\,|\,T>t,\mathcal{F}^w_t), \tag{2}$$

which is the criterion we adopt in this work.

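Returning to HMMs and using (2), we distinguish four different cases depending on how $\{w_t\}$ is related to the existing data. i) The change-imposing mechanism accesses information that is independent from $\{\xi_t, z_t\}$. In this case there is no conditioning with respect to $\mathcal{F}^w_t$ in (2), since the probability does not depend on $\{w_t\}$. This yields

$$P_{\mathrm{i}}(T) = \inf_{t\ge 0} P_t(T=t+1\,|\,T>t). \tag{3}$$

This is the Pollak-like criterion proposed in [10]. ii) The change-imposing mechanism accesses only the observations, then

$$P_{\mathrm{ii}}(T) = \inf_{t\ge 0}\,\operatorname{ess\,inf}\, P_t(T=t+1\,|\,T>t,\mathcal{F}^\xi_t). \tag{4}$$

This is the Lorden-like criterion proposed in [7]. iii) The change-imposing mechanism accesses only the state of the Markov process. This leads to

$$P_{\mathrm{iii}}(T) = \inf_{t\ge 0}\,\operatorname{ess\,inf}\, P_t(T=t+1\,|\,T>t,\mathcal{F}^z_t), \tag{5}$$

where $\mathcal{F}^z_t$ is the sigma-algebra generated by the Markov states up to time $t$, corresponding to $w_t = z_t$. iv) The change-imposing mechanism accesses both the observations and the state of the Markov process, resulting in

$$P_{\mathrm{iv}}(T) = \inf_{t\ge 0}\,\operatorname{ess\,inf}\, P_t(T=t+1\,|\,T>t,\mathcal{F}^{\xi,z}_t), \tag{6}$$

where $\mathcal{F}^{\xi,z}_t$ is the sigma-algebra generated by the observations and the Markov states up to time $t$, corresponding to $w_t = (\xi_t, z_t)$.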

For each criterion we can define a constrained optimization problem whose solution will provide the optimum $T$:

$$\sup_{T} P_l(T), \quad \text{subject to: } E_\infty[T] \ge \gamma > 1, \tag{7}$$

where $l = \mathrm{i}, \mathrm{ii}, \mathrm{iii}, \mathrm{iv}$. In other words, we maximize the worst-case detection probability while ensuring that the average period between false alarms is lower bounded by a constant $\gamma$ that we can select.

The idea of maximizing the detection probability was first introduced in [1] under Shiryaev's [12] Bayesian formulation. In [10] we have a variant of the original Pollak measure [9], while a variant of Lorden's measure [5] was adopted in [7] for independent processes and in [8] for Markov processes. In this work we address the case of HMMs. The problem of change detection in HMMs has been considered in the past in [2, 3, 4], and from these results it is well understood that even the asymptotic analysis is extremely challenging, not always leading to outcomes that are practically implementable.

## III Candidate Tests

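Let us first present the joint data pdf induced by a change occurring at some time $t$. For $s \le t$ we have

$$f_t(\xi_s,\ldots,\xi_1,z_s,\ldots,z_0) = f_\infty(\xi_s)\cdots f_\infty(\xi_1)\times g_\infty(z_s|z_{s-1})\cdots g_\infty(z_1|z_0)\,g_\infty(z_0),$$

while for $s > t$ the resulting pdf takes the form

$$\begin{aligned}
f_t(\xi_s,\ldots,\xi_1,z_s,\ldots,z_0) ={}& f_0(\xi_s|z_s)\cdots f_0(\xi_{t+1}|z_{t+1})\times g_0(z_s|z_{s-1})\cdots g_0(z_{t+1}|z_t)\\
&\times f_\infty(\xi_t)\cdots f_\infty(\xi_1)\times g_\infty(z_t|z_{t-1})\cdots g_\infty(z_1|z_0)\,g_\infty(z_0),
\end{aligned}$$

where $g_\infty(z_0)$ is the marginal pdf of $z_0$. The pdfs $f_\infty$, $f_0$, $g_\infty$, $g_0$ are assumed known.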
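To make the change-point structure of this joint pdf concrete, the following Python sketch draws one sample path with a change imposed after a fixed time `t`. The Gaussian densities and every name in it (`sample_path`, `mu`, `alpha`, `sigma`) are illustrative assumptions rather than part of the general model, and for simplicity the sketch takes the transition pdf to be the same before and after the change.

```python
import random

def sample_path(s, t, mu=1.0, alpha=0.5, sigma=1.0, rng=None):
    """Draw (xi_1..xi_s, z_0..z_s) with a change imposed after time t.

    Assumed (illustrative) densities:
      f_inf(xi)        = N(0, 1)   nominal observations, i.i.d.
      f_0(xi | z)      = N(z, 1)   post-change observations
      g(z_t | z_{t-1}) = N((1 - alpha) * mu + alpha * z_{t-1}, sigma^2),
                         identical before and after the change
    """
    rng = rng or random.Random()
    # z_0 is drawn from the stationary pdf of the transition density.
    z = [rng.gauss(mu, sigma / (1.0 - alpha ** 2) ** 0.5)]
    xi = []
    for k in range(1, s + 1):
        z.append(rng.gauss((1.0 - alpha) * mu + alpha * z[-1], sigma))
        if k <= t:
            xi.append(rng.gauss(0.0, 1.0))    # nominal regime: xi_k ignores z_k
        else:
            xi.append(rng.gauss(z[-1], 1.0))  # changed regime: xi_k driven by z_k
    return xi, z
```

Calling `sample_path(s, t)` with `s <= t` produces a purely nominal path, which mirrors the first of the two expressions above.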

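To simplify our presentation we are going to assume that $g_\infty(z)$ is the stationary pdf for the transition pdf $g_\infty(z_t|z_{t-1})$, namely $g_\infty(z_t) = \int g_\infty(z_t|z_{t-1})\,g_\infty(z_{t-1})\,dz_{t-1}$. We can then define the following average probability density

$$\bar f_0^{\,1}(\xi_t) = \iint f_0(\xi_t|z_t)\,g_0(z_t|z_{t-1})\,g_\infty(z_{t-1})\,dz_{t-1}\,dz_t, \tag{8}$$

which, when $g_0 = g_\infty$, simplifies to

$$\bar f_0^{\,1}(\xi_t) = \int f_0(\xi_t|z_t)\,g_\infty(z_t)\,dz_t, \tag{9}$$

and will be used for Criteria i) and ii). For Criteria iii) and iv) we define

$$\bar f_0^{\,2}(\xi_t) = \iint f_0(\xi_t|z_t)\,g_0(z_t|z_{t-1})\,\pi(z_{t-1})\,dz_{t-1}\,dz_t, \tag{10}$$

where $\pi(z)$ is a pdf to be specified in the sequel.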

With the help of the average pdfs we can now define the candidate Shewhart stopping times $S_j$, $j=1,2$, as follows

$$L_j(\xi_t) = \frac{\bar f_0^{\,j}(\xi_t)}{f_\infty(\xi_t)}, \qquad S_j = \inf\{t>0:\ L_j(\xi_t)\ge\nu_j\}. \tag{11}$$

Threshold $\nu_j$ is selected to satisfy the false alarm constraint with equality, namely

$$E_\infty[S_j] = \frac{1}{P_\infty\big(L_j(\xi_t)\ge\nu_j\big)} = \gamma. \tag{12}$$

Existence of $\nu_j$ is guaranteed, since the equation always has a solution if we assume that $L_j(\xi_t)$ does not contain any atoms under $P_\infty$. Otherwise, in order to satisfy (12) we may need randomization every time $L_j(\xi_t) = \nu_j$.
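The stopping rule in (11) and the calibration in (12) can be sketched in a few lines of Python. The sketch assumes a hypothetical pair of densities, N(0,1) as the nominal pdf and N(1,1) as the averaged post-change pdf, for which the likelihood ratio is exp(ξ − 1/2); these densities and all function names are illustrative assumptions and are not taken from the paper.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def likelihood_ratio(xi):
    """L(xi) for the assumed pair: averaged post-change N(1,1) over nominal N(0,1)."""
    return math.exp(xi - 0.5)

def threshold_for(gamma):
    """Threshold nu with P_inf(L(xi) >= nu) = 1/gamma, as required by (12).

    L is increasing in xi, so {L >= nu} = {xi >= c} with c the (1 - 1/gamma) quantile.
    """
    c = STD_NORMAL.inv_cdf(1.0 - 1.0 / gamma)
    return likelihood_ratio(c)

def shewhart_stop(stream, nu):
    """Return the first t > 0 with L(xi_t) >= nu; only the current sample is used."""
    for t, xi in enumerate(stream, start=1):
        if likelihood_ratio(xi) >= nu:
            return t
    return None  # stream exhausted without an alarm

def mean_false_alarm_period(gamma, runs=2000, seed=0):
    """Monte Carlo estimate of E_inf[S] under the nominal N(0,1) regime."""
    rng = random.Random(seed)
    nu = threshold_for(gamma)

    def nominal():
        while True:
            yield rng.gauss(0.0, 1.0)

    return sum(shewhart_stop(nominal(), nu) for _ in range(runs)) / runs
```

Because the test looks only at the current sample, the run length under the nominal regime is geometric, which is why its mean equals the reciprocal of the per-sample alarm probability in (12).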

We can also compute the corresponding worst-case detection probability of the two Shewhart schemes. For the first test, since there is no dependence on the past, we have

$$\beta_1 = P_{\mathrm{i}}(S_1) = P_{\mathrm{ii}}(S_1) = \int \bar f_0^{\,1}(\xi_t)\,1_{\{L_1(\xi_t)\ge\nu_1\}}\,d\xi_t. \tag{13}$$

For the second Shewhart test the analysis for finding the worst-case detection probability is slightly more involved. Consider first the conditional pdf

$$f_0(\xi_t|z_{t-1}) = \int f_0(\xi_t|z_t)\,g_0(z_t|z_{t-1})\,dz_t,$$

then the worst-case detection probability satisfies

$$\beta_2 = P_{\mathrm{iii}}(S_2) = P_{\mathrm{iv}}(S_2) = \inf_{z_{t-1}} \int f_0(\xi_t|z_{t-1})\,1_{\{L_2(\xi_t)\ge\nu_2\}}\,d\xi_t. \tag{14}$$

We recall that the second Shewhart test is defined in terms of an arbitrary probability density $\pi(z)$. This means that the stopping time $S_2$ and also the worst-case detection probability $\beta_2$ are functions of $\pi$ as well. To specify $\pi$, let $\mathcal{Z}$ denote its support, then $\pi$ must be such that

$$\begin{aligned}
\int f_0(\xi_t|z_{t-1})\,1_{\{L_2(\xi_t)\ge\nu_2\}}\,d\xi_t &= \beta_2, && \text{for } z_{t-1}\in\mathcal{Z},\\
\int f_0(\xi_t|z_{t-1})\,1_{\{L_2(\xi_t)\ge\nu_2\}}\,d\xi_t &\ge \beta_2, && \text{for } z_{t-1}\notin\mathcal{Z}.
\end{aligned} \tag{15}$$

In other words, $\pi$ must put all its probability mass onto points $z_{t-1}$ for which the Shewhart test exhibits its worst-case performance. In fact (15) is sufficient to define $\pi$ uniquely.

## IV Max-Min Optimality

In this section we will demonstrate that the stopping times $S_j$, $j=1,2$, defined in (11) solve the max-min constrained optimization problem defined in (7). In order to prove our claim we first need to find a suitable upper bound for $P_l(T)$. The following theorem provides the necessary expressions.

###### Theorem 1.

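For any stopping time $T$ with $E_\infty[T]<\infty$ we have

$$P_l(T) \le \frac{E_\infty[L_1(\xi_T)]}{E_\infty[T]},\ \ l=\mathrm{i},\mathrm{ii}; \qquad P_l(T) \le \frac{E_\infty[L_2(\xi_T)]}{E_\infty[T]},\ \ l=\mathrm{iii},\mathrm{iv}.$$

Additionally, if $T = S_j$, then we have equality in the corresponding inequality.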

###### Proof.

The proof is given in the Appendix. ∎

The next theorem optimizes the upper bounds proposed in Theorem 1.

###### Theorem 2.

If $T$ is any stopping time satisfying the false alarm constraint, then

$$\frac{E_\infty[L_j(\xi_T)]}{E_\infty[T]} \le \beta_j, \quad j=1,2,$$

where $\beta_1$, $\beta_2$ are defined in (13), (14) respectively.

###### Proof.

The proof is outlined in the Appendix. ∎

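Combining Theorems 1 and 2 immediately assures optimality of the Shewhart tests. In particular, for $l=\mathrm{i},\mathrm{ii}$ we have

$$P_l(T) \le \frac{E_\infty[L_1(\xi_T)]}{E_\infty[T]} \le \beta_1 = P_{\mathrm{i}}(S_1) = P_{\mathrm{ii}}(S_1),$$

while for $l=\mathrm{iii},\mathrm{iv}$ we conclude

$$P_l(T) \le \frac{E_\infty[L_2(\xi_T)]}{E_\infty[T]} \le \beta_2 = P_{\mathrm{iii}}(S_2) = P_{\mathrm{iv}}(S_2).$$

These two relationships establish optimality of the two Shewhart tests. In the next section we offer an example involving an interesting HMM.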

## V Example

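We consider the case of a Gaussian process whose mean is controlled by a Gaussian Markov process. Specifically, let the observations before the change be i.i.d. with pdf $f_\infty(\xi_t) \sim N(0,1)$ and after the change assume $f_0(\xi_t|z_t) \sim N(z_t,1)$. The process $\{z_t\}$ is unobservable and of the form $z_t = \mu + x_t$, where $\mu$ is a constant denoting the mean of $z_t$ and $\{x_t\}$ is an AR(1) Gaussian process with $x_t$ being conditionally Gaussian of the form $x_t|x_{t-1} \sim N(\alpha x_{t-1},\sigma^2)$ with $|\alpha|<1$.

For the stationary pdf we have $g_\infty(z_t) \sim N\big(\mu,\ \sigma^2/(1-\alpha^2)\big)$. Since in this example we assume that the Markov process does not change, if we focus on the solution for Criteria i) and ii), we use (9) to compute

$$\bar f_0^{\,1}(\xi_t) \sim N\Big(\mu,\ 1+\frac{\sigma^2}{1-\alpha^2}\Big). \tag{16}$$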

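Following (11) we can easily establish that the optimal Shewhart test is equivalent to

$$S_1 = \inf\Big\{t>0:\ \Big|\xi_t + \mu\frac{1-\alpha^2}{\sigma^2}\Big| \ge \nu_1\Big\}. \tag{17}$$

Threshold $\nu_1$ is related to the average false alarm period $\gamma$ through (12), which takes the form

$$\Phi\Big(\mu\frac{1-\alpha^2}{\sigma^2}-\nu_1\Big) + \Phi\Big(-\mu\frac{1-\alpha^2}{\sigma^2}-\nu_1\Big) = \frac{1}{\gamma}, \tag{18}$$

while the worst-case detection probability becomes

$$\beta_1 = \Phi\left(\frac{\mu\big(1+\frac{1-\alpha^2}{\sigma^2}\big)-\nu_1}{\sqrt{1+\frac{\sigma^2}{1-\alpha^2}}}\right) + \Phi\left(-\frac{\mu\big(1+\frac{1-\alpha^2}{\sigma^2}\big)+\nu_1}{\sqrt{1+\frac{\sigma^2}{1-\alpha^2}}}\right). \tag{19}$$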
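Equation (18) has no closed-form solution for the threshold, so in practice it has to be solved numerically. The following Python sketch does this by bisection and then evaluates (19); the function names are hypothetical and the parameter values used in the usage note are assumptions chosen only for illustration.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf

def offset(mu, alpha, sigma):
    """The shift mu * (1 - alpha^2) / sigma^2 that appears in (17)."""
    return mu * (1.0 - alpha ** 2) / sigma ** 2

def false_alarm_prob(nu1, mu, alpha, sigma):
    """Left-hand side of (18): P_inf(|xi + c| >= nu1) for xi ~ N(0, 1)."""
    c = offset(mu, alpha, sigma)
    return PHI(c - nu1) + PHI(-c - nu1)

def solve_nu1(gamma, mu, alpha, sigma):
    """Solve (18) for nu1 by bisection; the left-hand side decreases in nu1."""
    lo, hi = 0.0, abs(offset(mu, alpha, sigma)) + 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if false_alarm_prob(mid, mu, alpha, sigma) > 1.0 / gamma:
            lo = mid   # too many false alarms: raise the threshold
        else:
            hi = mid
    return 0.5 * (lo + hi)

def beta1(nu1, mu, alpha, sigma):
    """Worst-case detection probability (19)."""
    c = offset(mu, alpha, sigma)
    s = (1.0 + sigma ** 2 / (1.0 - alpha ** 2)) ** 0.5
    return PHI((mu + c - nu1) / s) + PHI(-(mu + c + nu1) / s)
```

For instance, `beta1(solve_nu1(100.0, 1.0, 0.5, 1.0), 1.0, 0.5, 1.0)` evaluates the detection probability at an average false alarm period of 100 for the assumed values μ = 1, α = 0.5, σ = 1.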

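Let us now consider Criteria iii) and iv). We focus on the computation of (10) and perform it in two steps. The first involves the computation of the conditional pdf

$$f_0(\xi_t|z_{t-1}) = \int f_0(\xi_t|z_t)\,g_0(z_t|z_{t-1})\,dz_t \sim N\big((1-\alpha)\mu+\alpha z_{t-1},\ 1+\sigma^2\big). \tag{20}$$

The next step consists in finding the pdf $\pi(z_{t-1})$. We are going to assume that $\pi$ puts all its mass on the single point $z_{t-1} = -(1-\alpha)\mu/\alpha$. This implies that $\bar f_0^{\,2}(\xi_t) \sim N(0,\ 1+\sigma^2)$. We can then verify that the resulting Shewhart test is equivalent to

$$S_2 = \inf\{t>0:\ |\xi_t|\ge\nu_2\} \tag{21}$$

with the threshold $\nu_2$ satisfying the false alarm constraint

$$2\Phi(-\nu_2) = \frac{1}{\gamma}. \tag{22}$$

Of course, in order for our selection of $\pi$ to be correct we need to show validity of (15). Therefore we must prove that $P_0(|\xi_t|\ge\nu_2\,|\,z_{t-1})$ has a minimum for $z_{t-1} = -(1-\alpha)\mu/\alpha$. Using (20) the desired probability is

$$P_0(|\xi_t|\ge\nu_2\,|\,z_{t-1}) = \Phi\left(\frac{-\nu_2+\big((1-\alpha)\mu+\alpha z_{t-1}\big)}{\sqrt{1+\sigma^2}}\right) + \Phi\left(\frac{-\nu_2-\big((1-\alpha)\mu+\alpha z_{t-1}\big)}{\sqrt{1+\sigma^2}}\right),$$

which is clearly minimized when $(1-\alpha)\mu+\alpha z_{t-1}=0$, with the minimum being equal to

$$\beta_2 = 2\Phi\left(\frac{-\nu_2}{\sqrt{1+\sigma^2}}\right). \tag{23}$$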

The latter also constitutes the optimum worst-case detection probability for the Shewhart test in (21). It is worth mentioning that the Shewhart stopping time $S_2$ is UMP with respect to $\mu$ and $\alpha$ since, as we can see, it does not require knowledge of these parameters. What is equally interesting is that the optimum worst-case detection probability is only a function of $\sigma$ and not of $\mu$, $\alpha$. It is only $\pi$ that depends on these two parameters.
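The threshold in (22) has a closed form, and the claim that the conditional detection probability is smallest when the conditional mean in (20) vanishes can be checked numerically. The Python sketch below does both; the function names are hypothetical, `m` is shorthand introduced here for the conditional mean (1 − α)μ + αz_{t−1}, and the grid in the usage note is an arbitrary illustrative choice.

```python
from statistics import NormalDist

STD = NormalDist()

def solve_nu2(gamma):
    """Closed-form solution of (22): 2 * Phi(-nu2) = 1 / gamma."""
    return -STD.inv_cdf(1.0 / (2.0 * gamma))

def detection_prob(nu2, m, sigma):
    """P_0(|xi_t| >= nu2 | z_{t-1}) when xi_t | z_{t-1} ~ N(m, 1 + sigma^2), as in (20).

    Here m stands for the conditional mean (1 - alpha) * mu + alpha * z_{t-1}.
    """
    s = (1.0 + sigma ** 2) ** 0.5
    return STD.cdf((-nu2 + m) / s) + STD.cdf((-nu2 - m) / s)

def beta2(gamma, sigma):
    """Worst-case detection probability (23), attained at m = 0."""
    return detection_prob(solve_nu2(gamma), 0.0, sigma)
```

Evaluating `detection_prob` over a grid such as `[k / 10 for k in range(-50, 51)]` and taking the minimum gives a quick sanity check that the smallest value sits at `m = 0`, in line with the requirement in (15).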

Suppose now that we erroneously assume that the change-imposing mechanism does not access the state of the Markov process when in reality it does. In this case we will be using $S_1$ from (17) instead of $S_2$ from (21). For $S_1$ it is not difficult to verify that the worst-case detection probability is equal to

$$\tilde\beta_1 = 2\Phi\left(\frac{-\nu_1}{\sqrt{1+\sigma^2}}\right). \tag{24}$$

A similar erroneous assumption can occur when we consider the change-imposing mechanism to be able to access the Markov state when in reality it does not. Consequently, by using $S_2$ from (21), we need to compute its performance under the pdf in (16). This yields

$$\tilde\beta_2 = \Phi\left(\frac{(-\nu_2+\mu)\sqrt{1-\alpha^2}}{\sqrt{1-\alpha^2+\sigma^2}}\right) + \Phi\left(\frac{-(\nu_2+\mu)\sqrt{1-\alpha^2}}{\sqrt{1-\alpha^2+\sigma^2}}\right). \tag{25}$$

Clearly (25) must be compared against the optimum (19) while (24) against the optimum (23).

Fig. 1: Detection probability as a function of the average false alarm period of the Shewhart test when the change-imposing mechanism does not access the Markov process state and we correctly assume it does not (blue); when it does not and we erroneously assume it does (black); when it does and we correctly assume it does (red); and finally when it does and we erroneously assume it does not (green).

For a numerical comparison, let , , with $\gamma$ ranging from 1 to 1000. Fig. 1 depicts the corresponding detection probabilities. The graph in blue corresponds to the change-imposing mechanism having no access to the Markov process while we correctly assume that it does not. This means that we plot $\beta_1$ from (19) against $\gamma$ computed from (18). If this assumption is wrong and the change-imposing mechanism can actually access the Markov state, then we have a severe performance degradation, depicted by the graph in green where we plot $\tilde\beta_1$ from (24) against $\gamma$ from (18).

If we now use the test in (21) and the change-imposing mechanism can indeed access the Markov state, then the red graph depicts the worst-case detection probability $\beta_2$ from (23) as a function of $\gamma$ from (22). In case we made a mistake in our judgement and the change-imposing mechanism cannot access the Markov state, then the same test has a performance depicted by the black curve, where we plot $\tilde\beta_2$ from (25) in terms of $\gamma$ from (22).

By using the Shewhart test in (21), which is obtained under more severe assumptions, we do not lose much as compared to the optimum (17) if our assumption about the access capabilities of the change-imposing mechanism is incorrect. On the other hand, we guard ourselves against a hostile change-imposing mechanism when the latter can access all the available information. If, however, we assume that the change-imposing mechanism cannot access the Markov state and use $S_1$, this assumption can be catastrophic if it is wrong.
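The four quantities compared in this section can be evaluated side by side with a short Python sketch. It is a minimal illustration under stated assumptions: the function name `four_curves` is hypothetical, the threshold of (18) is found by bisection, and any parameter values passed to it are the reader's own choice, since they are not taken from the text.

```python
from statistics import NormalDist

STD = NormalDist()

def four_curves(gamma, mu, alpha, sigma):
    """Return (beta1, beta1_tilde, beta2, beta2_tilde) from (19), (24), (23), (25)."""
    c = mu * (1.0 - alpha ** 2) / sigma ** 2               # shift in test (17)
    s1 = (1.0 + sigma ** 2 / (1.0 - alpha ** 2)) ** 0.5    # std of the averaged pdf
    s2 = (1.0 + sigma ** 2) ** 0.5                         # std of the pdf in (20)

    # nu1 from (18) by bisection (the left-hand side decreases in nu1).
    lo, hi = 0.0, abs(c) + 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if STD.cdf(c - mid) + STD.cdf(-c - mid) > 1.0 / gamma:
            lo = mid
        else:
            hi = mid
    nu1 = 0.5 * (lo + hi)
    nu2 = -STD.inv_cdf(1.0 / (2.0 * gamma))                # nu2 from (22)

    beta1 = STD.cdf((mu + c - nu1) / s1) + STD.cdf(-(mu + c + nu1) / s1)
    beta1_tilde = 2.0 * STD.cdf(-nu1 / s2)
    beta2 = 2.0 * STD.cdf(-nu2 / s2)
    beta2_tilde = STD.cdf((-nu2 + mu) / s1) + STD.cdf(-(nu2 + mu) / s1)
    return beta1, beta1_tilde, beta2, beta2_tilde
```

Sweeping `gamma` over a range of values and plotting the four outputs would give curves of the kind described for Fig. 1; each optimum should dominate its mismatched counterpart, that is, the first output should not fall below the fourth and the third should not fall below the second.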

## VI Conclusion

We considered the sequential change-detection problem for HMMs, which is known to be challenging. By introducing a generalized version of Lorden's performance measure we were able to obtain the optimum solution that maximizes the worst-case detection probability. This result is interesting since it is the first time we were able to obtain a solution for a performance measure that is different from the classical measures adopted so far in the literature.

## Acknowledgement

This work was supported by the US National Science Foundation under Grant CIF 1513373, through Rutgers University.

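## Appendix

Proof of Lemma 1: Since $\tau$ is an $\{\mathcal{F}^w_t\}$-adapted stopping time, the indicator $1_{\{\tau=t\}}$ is $\mathcal{F}^w_t$-measurable; consequently we can write

$$\begin{aligned}
E_\tau[\phi(T,\tau)\,|\,T>\tau] &= \frac{\sum_{t=0}^{\infty} E_\infty\big[E_t[\phi(T,t)1_{\{T>t\}}\,|\,\mathcal{F}^w_t]\,1_{\{\tau=t\}}\big]}{\sum_{t=0}^{\infty} E_\infty\big[P_t(T>t\,|\,\mathcal{F}^w_t)\,1_{\{\tau=t\}}\big]}\\
&\ge \inf_{t\ge 0} \frac{E_\infty\big[E_t[\phi(T,t)1_{\{T>t\}}\,|\,\mathcal{F}^w_t]\,1_{\{\tau=t\}}\big]}{E_\infty\big[P_t(T>t\,|\,\mathcal{F}^w_t)\,1_{\{\tau=t\}}\big]}\\
&\ge \inf_{t\ge 0}\,\operatorname{ess\,inf}\, \frac{E_t[\phi(T,t)1_{\{T>t\}}\,|\,\mathcal{F}^w_t]}{P_t(T>t\,|\,\mathcal{F}^w_t)}\\
&= \inf_{t\ge 0}\,\operatorname{ess\,inf}\, E_t[\phi(T,t)\,|\,T>t,\mathcal{F}^w_t].
\end{aligned}$$

This lower bound is in fact attainable. Suppose that the last double minimization is achieved by some time instant $t^*$ (minimization over $t$) and some realization of the data (minimization over the data), then the change-imposing mechanism can simply impose a change at $t^*$ when this specific combination of data occurs. If there are more choices yielding the same lower bound then it can perform randomization between them.

Proof of Theorem 1: Let us consider first Criterion i). We have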

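$$P_t(T=t+1\,|\,T>t) = \frac{P_t(T=t+1)}{P_t(T>t)} = \frac{E_\infty\Big[\frac{f_0(\xi_{t+1}|z_{t+1})\,g_0(z_{t+1}|z_t)}{f_\infty(\xi_{t+1})\,g_\infty(z_{t+1}|z_t)}\,1_{\{T=t+1\}}\Big]}{P_\infty(T>t)},$$

where the denominator takes this specific form because the event $\{T>t\}$ is $\mathcal{F}^\xi_t$-measurable and therefore happens before the change. Since $1_{\{T=t+1\}}$ is $\mathcal{F}^\xi_{t+1}$-measurable, we need to average out the Markov states conditioned on $\mathcal{F}^\xi_{t+1}$. This is easy since under $P_\infty$ the observations and the Markov process are independent. Indeed this conditional expectation becomes

$$\begin{aligned}
E_\infty\Big[\frac{f_0(\xi_{t+1}|z_{t+1})\,g_0(z_{t+1}|z_t)}{f_\infty(\xi_{t+1})\,g_\infty(z_{t+1}|z_t)}\,\Big|\,\mathcal{F}^\xi_{t+1}\Big]
&= \int \frac{f_0(\xi_{t+1}|z_{t+1})}{f_\infty(\xi_{t+1})}\,g_0(z_{t+1}|z_t)\,g_\infty(z_t|z_{t-1})\cdots g_\infty(z_0)\,dz_{t+1}\cdots dz_0\\
&= \iint \frac{f_0(\xi_{t+1}|z_{t+1})}{f_\infty(\xi_{t+1})}\,g_0(z_{t+1}|z_t)\,g_\infty(z_t)\,dz_{t+1}\,dz_t
= \frac{\bar f_0^{\,1}(\xi_{t+1})}{f_\infty(\xi_{t+1})},
\end{aligned}$$

where we used the fact that $g_\infty(z)$ is the stationary pdf. Since $P_{\mathrm{i}}(T) \le P_t(T=t+1\,|\,T>t)$ we can conclude that

$$P_{\mathrm{i}}(T)\,P_\infty(T>t) \le E_\infty\Big[\frac{\bar f_0^{\,1}(\xi_{t+1})}{f_\infty(\xi_{t+1})}\,1_{\{T=t+1\}}\Big].$$

Summing over $t \ge 0$ yields the desired inequality, since the left hand side sums to $P_{\mathrm{i}}(T)\,E_\infty[T]$ and the right hand side to $E_\infty[L_1(\xi_T)]$. The previous inequality becomes an equality when $T=S_1$ because the Shewhart test is an equalizer, namely, $P_t(S_1=t+1\,|\,S_1>t)$ is a constant independent from $t$.

For Criterion ii) derivations are similar. Indeed we can write

$$P_{\mathrm{ii}}(T)\,P_\infty(T>t\,|\,\mathcal{F}^\xi_t) \le E_\infty\Big[\frac{\bar f_0^{\,1}(\xi_{t+1})}{f_\infty(\xi_{t+1})}\,1_{\{T=t+1\}}\,\Big|\,\mathcal{F}^\xi_t\Big].$$

Taking expectation on both sides with respect to the measure $P_\infty$ and summing over $t$ yields the desired result. Again we have equality when $T=S_1$ because $S_1$ is an equalizer.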

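Let us now consider Criterion iii). We have

$$P_{\mathrm{iii}}(T)\,P_\infty(T>t\,|\,\mathcal{F}^z_t) \le E_\infty\Big[\frac{f_0(\xi_{t+1}|z_{t+1})\,g_0(z_{t+1}|z_t)}{f_\infty(\xi_{t+1})\,g_\infty(z_{t+1}|z_t)}\,1_{\{T=t+1\}}\,\Big|\,\mathcal{F}^z_t\Big].$$

Multiplying both sides with $\varpi(z_t)$ and averaging with respect to $P_\infty$ yields

$$P_{\mathrm{iii}}(T)\,E_\infty[\varpi(z_t)1_{\{T>t\}}] \le E_\infty\Big[\frac{f_0(\xi_{t+1}|z_{t+1})\,g_0(z_{t+1}|z_t)}{f_\infty(\xi_{t+1})\,g_\infty(z_{t+1}|z_t)}\,\varpi(z_t)\,1_{\{T=t+1\}}\Big].$$

For the left hand side we have

$$E_\infty[\varpi(z_t)1_{\{T>t\}}] = E_\infty\big[E_\infty[\varpi(z_t)\,|\,\mathcal{F}^\xi_t]\,1_{\{T>t\}}\big] = \Big(\int \varpi(z_t)\,g_\infty(z_t)\,dz_t\Big) E_\infty[1_{\{T>t\}}] = E_\infty[1_{\{T>t\}}],$$

where we define $\pi(z) = \varpi(z)\,g_\infty(z)$ and, without loss of generality, we assume that $\int \pi(z)\,dz = 1$. For the right hand side we can similarly write

$$E_\infty\Big[\frac{f_0(\xi_{t+1}|z_{t+1})\,g_0(z_{t+1}|z_t)}{f_\infty(\xi_{t+1})\,g_\infty(z_{t+1}|z_t)}\,\varpi(z_t)\,1_{\{T=t+1\}}\Big] = E_\infty\Big[\frac{\iint f_0(\xi_{t+1}|z_{t+1})\,g_0(z_{t+1}|z_t)\,\pi(z_t)\,dz_t\,dz_{t+1}}{f_\infty(\xi_{t+1})}\,1_{\{T=t+1\}}\Big] = E_\infty\Big[\frac{\bar f_0^{\,2}(\xi_{t+1})}{f_\infty(\xi_{t+1})}\,1_{\{T=t+1\}}\Big],$$

where in the last equality we use the definition in (10). The desired inequality can be shown as in the previous cases. Finally, when $T=S_2$ we have equality because $\pi$ puts all its mass on values of $z_t$ where the essential infimum is attained and because the resulting value is independent from $t$ (equalizer). Similarly we can prove the upper bound for Criterion iv).

Proof of Theorem 2: The first step in the proof consists in observing that we can limit ourselves to stopping times that satisfy the false alarm constraint with equality, that is, $E_\infty[T]=\gamma$. Indeed, if $E_\infty[T]>\gamma$ then we can perform a randomization before taking any observations as to whether we should stop at time 0 with probability $p$ or continue according to the stopping time $T$ with probability $1-p$. This generates a new stopping time $\tilde T$ that satisfies $E_\infty[\tilde T] = (1-p)E_\infty[T]$ and therefore we can select $p$ so that $\tilde T$ satisfies $E_\infty[\tilde T]=\gamma$. On the other hand we can verify that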

$$\frac{E_\infty[L_j(\xi_{\tilde T})]}{E_\infty[\tilde T]} = \frac{E_\infty[L_j(\xi_T)]}{E_\infty[T]}.$$

Because of the previous observations we need to prove that $\beta_j E_\infty[T] - E_\infty[L_j(\xi_T)] \ge 0$ over all $T$ satisfying the false alarm constraint with equality. In fact it will be sufficient if we consider the unconstrained version

$$E_\infty\Big[\Big(\beta_j-\frac{\nu_j}{\gamma}\Big)T - L_j(\xi_T)\Big] \ge -\nu_j \tag{26}$$

obtained by subtracting $\nu_j E_\infty[T]/\gamma$ from the left and $\nu_j$ from the right side. We can now assume that there is no constraint on $T$ and minimize the left hand side in (26) over $T$. Since $T$ is $\{\mathcal{F}^\xi_t\}$-adapted and under $P_\infty$ the process $\{\xi_t\}$ is i.i.d., this optimal stopping problem can be easily solved and we can show that the optimum stopping time is $S_j$ defined in (11). By direct computation we can also verify that the minimum value of the left hand side in (26) is indeed equal to $-\nu_j$.

## References

• [1] T. Bojdecki, “Probability maximizing approach to optimal stopping and its application to a disorder problem,” Stochastics, vol. 3, pp. 61–71, 1979.
• [2] C.-D. Fuh, “SPRT and CUSUM in hidden Markov models,” Ann. Stat., vol. 31, no. 3, pp. 942–997, 2003.
• [3] C.-D. Fuh and Y. Mei, “Quickest change detection and Kullback-Leibler divergence for two-state hidden Markov models,” Trans. Inf. Theory, vol. 63, no. 18, pp. 4866–4878, 2015.
• [4] C.-D. Fuh and A. G. Tartakovsky, “Asymptotic Bayesian theory of quickest change detection for hidden Markov models,” Trans. Inf. Theory, vol. 65, no. 1, pp. 511–529, 2019.
• [5] G. Lorden, “Procedures for reacting to a change in distribution,” Ann. Math. Stat., vol. 42, pp. 1897–1908, 1971.
• [6] G. V. Moustakides, “Sequential change detection revisited,” Ann. Stat., vol. 36, no. 2, pp. 787–807, 2008.
• [7] G. V. Moustakides, “Multiple optimality properties of the Shewhart test,” Seq. Anal., vol. 33, pp. 318–344, 2014.
• [8] G. V. Moustakides, “Optimum Shewhart tests for Markovian data,” 53rd Annual Allerton Conference on Communication, Control and Computing, pp. 822–826, 2015.
• [9] M. Pollak, “Optimal detection of a change in distribution,” Ann. Stat., vol. 13, pp. 206–227, 1985.
• [10] M. Pollak and A. M. Krieger, “Shewhart revisited,” Seq. Anal., vol. 32, pp. 230–242, 2013.
• [11] W. A. Shewhart, Economic Control of Quality of Manufactured Product. New York: D. Van Nostrand Company, 1931.
• [12] A. N. Shiryaev, “On optimal methods in quickest detection problems,” Theory Probab. Applic., vol. 8, pp. 22–46, 1963.