# Monotonicity and robustness in Wiener disorder detection

We study the problem of detecting a drift change of a Brownian motion under various extensions of the classical case. Specifically, we consider the case of a random post-change drift and examine monotonicity properties of the solution with respect to different model parameters. Moreover, robustness properties -- effects of misspecification of the underlying model -- are explored.

## 1 Introduction

In the classical version of the quickest disorder detection (QDD) problem, see [9], one observes a one-dimensional process $Y$ which satisfies

$$Y_t = b(t-\Theta)^+ + \sigma W_t,$$

where $b$ and $\sigma$ are non-zero constants, $W$ is a standard Brownian motion and the disorder time $\Theta$ is an exponentially distributed random variable (with intensity $\lambda$) such that $W$ and $\Theta$ are independent. The associated Bayes' risk (expected cost) corresponding to a stopping rule $\tau$ is defined as

$$P(\Theta > \tau) + c\,E[(\tau-\Theta)^+], \tag{1.1}$$

where $c > 0$ is the cost of one unit of detection delay. It is well-known (see [10, Chapter 4]) that to minimise the Bayes' risk one should stop the first time the conditional probability process $\Pi_t := P(\Theta \le t \mid \mathcal{F}^Y_t)$ reaches a certain level $a$. Moreover, the level $a$ is characterized as the unique solution of a transcendental equation.
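To make the threshold rule concrete, the following minimal sketch simulates the observation process, tracks the conditional probability with an Euler step of the standard one-dimensional filter $d\Pi_t = \lambda(1-\Pi_t)\,dt + \frac{b}{\sigma^2}\Pi_t(1-\Pi_t)(dY_t - b\Pi_t\,dt)$, and estimates the risk (1.1) by Monte Carlo. The function names, the time step, the horizon cap `t_max` and the level passed in are illustrative assumptions; the level is a free input here, not the optimal one.

```python
import math
import random


def simulate_threshold_rule(b, sigma, lam, level, dt=1e-2, t_max=100.0, rng=None):
    """Simulate one path and stop when the discretised posterior reaches `level`.

    Returns (tau, theta): the stopping time and the hidden disorder time.
    """
    if rng is None:
        rng = random.Random()
    theta = rng.expovariate(lam)           # disorder time, exponential with intensity lam
    pi, t = 0.0, 0.0                       # posterior P(Theta <= t | observations), clock
    while pi < level and t < t_max:        # t_max is a safety cap (assumption)
        drift = b if t >= theta else 0.0   # drift switches from 0 to b at theta
        dY = drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Euler step of the filter, driven by the observed increment dY.
        pi += lam * (1.0 - pi) * dt + (b / sigma**2) * pi * (1.0 - pi) * (dY - b * pi * dt)
        pi = min(max(pi, 0.0), 1.0)        # keep the discretised posterior in [0, 1]
        t += dt
    return t, theta


def estimate_bayes_risk(b, sigma, lam, c, level, n_paths=1000, seed=0, **kwargs):
    """Monte Carlo estimate of P(Theta > tau) + c E[(tau - Theta)^+] for a threshold rule."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        tau, theta = simulate_threshold_rule(b, sigma, lam, level, rng=rng, **kwargs)
        total += (1.0 if tau < theta else 0.0) + c * max(tau - theta, 0.0)
    return total / n_paths
```

Calling `estimate_bayes_risk(1.0, 1.0, 1.0, 1.0, level=0.6)` gives a rough estimate for one candidate level; scanning over levels shows how the two cost components trade off.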

In many situations, however, it is natural not to know the exact value of the disorder magnitude $b$, but merely its distribution. This is the case, for example, when a specific machine is monitored continuously and the machine can break down in several possible ways. To study such a situation, we allow the new drift to be a random variable $B$ with distribution $\mu$ such that $B$ is independent of the other sources of randomness. In this setting we study monotonicity properties of the QDD problem, i.e. whether the (minimal) expected cost is monotone with respect to various model parameters. In particular, we study the dependence of the expected cost on the volatility $\sigma$, the distribution $\mu$, and the disorder intensity $\lambda$. We also study robustness in the QDD problem, i.e. what happens if one misspecifies various model parameters. More specifically, we aim at estimates for the increased cost associated with the use of suboptimal strategies. Such estimates are helpful in situations where the model is badly calibrated, but also in situations where one chooses to use a simpler suboptimal strategy rather than a computationally more demanding optimal strategy.

As mentioned above, the classical version of the QDD problem was studied in [9], see also [10, Chapter 4] and [8, Section 22]; for extensions to the case of detecting a change in the intensity of a Poisson process, see [7], [3] and [4]. For the case of a random disorder magnitude, [2] obtains asymptotic results for a problem with normally distributed drift. Concavity of the value function in a related hypothesis testing problem with two possible post-change drift values in a time-homogeneous case was obtained in [6]. Finally, the practical significance of the disorder detection problem in modern engineering applications is explained in [11].

## 2 General model formulation

We model a signal-processing activity on a stochastic basis $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge 0}, P)$, where the filtration $(\mathcal{F}_t)_{t\ge 0}$ satisfies the usual conditions. We are interested in the signal process $X$, which is not directly observable, but we can continuously observe the noisy process

$$Y_t = \int_0^t X_u\,du + \int_0^t \sigma(u)\,dW_u, \qquad t \ge 0. \tag{2.1}$$

Here $W$ is a Brownian motion independent of $X$, the dispersion $\sigma$ is deterministic and strictly positive, and the signal process follows

$$X_t = B_0 1_{\{\Theta = 0\}} + B_1 1_{\{0 < \Theta \le t\}}, \tag{2.2}$$

where $\Theta$ is a $[0,\infty)$-valued random variable representing the disorder occurrence time. Moreover, $B_0, B_1$ are real-valued random variables corresponding to disorder magnitudes in the cases 'disorder occurs before we start observing $Y$' and 'disorder occurs while we observe $Y$', respectively. Also, $B_0$, $B_1$, and $\Theta$ are independent. Let $\Theta$ have the distribution $\tilde\pi\delta_0 + (1-\tilde\pi)\nu$, where $\nu$ is a probability measure on $(0,\infty)$ with a continuously differentiable distribution function. In addition, denote the distributions of $B_0$ and $B_1$ by $\mu_0$ and $\mu_1$, respectively. When referring to $\mu_0$ and $\mu_1$ collectively, we will simply say that the prior is $\mu$. Let us introduce the notation

$$D^n := \{\pi \in [0,\infty)^n : \|\pi\|_1 \le 1\}$$

and

$$\Delta^n := \{\pi \in [0,\infty)^n : \|\pi\|_1 = 1\},$$

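where $\|\pi\|_1 = \sum_{i=1}^n |\pi_i|$. We assume that

$$\mu_0 = \sum_{i=1}^n \check{p}_i\,\delta_{b_i}, \qquad \mu_1 = \sum_{i=1}^n p_i\,\delta_{b_i},$$

where $b_1, \ldots, b_n \in \mathbb{R}\setminus\{0\}$ and $\check{p} = (\check{p}_1,\ldots,\check{p}_n),\ p = (p_1,\ldots,p_n) \in \Delta^n$.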

The model studied in the paper is a generalisation of the classical disorder occurrence model [9]. Firstly, the exponential disorder distribution used in the classical problem is replaced by an arbitrary distribution with time-dependent intensity. The generalisation is advantageous in situations when the intensity of the disorder occurrence changes with time. For example, if the disorder corresponds to a component failure in a system, then for many physical systems the failure intensity is known to increase with age. Also, if the occurrence of the disorder depends on external factors such as weather, then such dependency can be incorporated into the time-dependent disorder intensity from an accurate weather forecast. Moreover, in contrast to the classical problem in which the disorder magnitude is known in advance, in this generalisation the magnitude takes a value from a range of possible values. Returning to the component failure example, the different possible disorder magnitudes would represent different types of component failure. In the problem of detecting malfunctioning atomic clocks [11], the disorder corresponds to a systematic drift of a clock. The sign of the disorder magnitude reflects whether a clock starts to go too slow or too fast, while the absolute value represents the severity of the drift. In addition, the different distributions $\mu_0$, $\mu_1$ of $B_0$ and $B_1$ and the weight $\tilde\pi$ reflect the prior knowledge about how likely different disorder magnitudes are if the disorder happened before or while observing $Y$. For instance, such model flexibility is relevant when we start observing the system after a particular incident (e.g. a storm if the system is affected by the weather) and we know that the distribution of possible disorder magnitudes after the incident differs from the one under normal operating conditions. From a mathematical point of view, $\mu_0$ and $\tilde\pi$ allow us to give a statistical interpretation to an arbitrary starting point in the Markovian embedding (2.6) of the original optimal stopping problem studied later.

###### Remark 2.1.

We point out that the finite support assumption on the disorder magnitude distributions is made for notational convenience. As any distribution can be approximated arbitrarily well by finitely supported ones, our monotonicity results below can be extended to general disorder magnitude distributions.

We are interested in a disorder detection strategy incorporating two objectives: a short detection delay and a small proportion of false alarms. As noted in the introduction, a classical choice of Bayes' risk to be minimised by a detection strategy is given by (1.1). In the present paper, we consider a slightly more flexible risk structure by allowing a time-dependent cost for the detection delay. More precisely, we consider the Bayes' risk

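$$R(\tau) := E\left[1_{\{\tau<\Theta\}} + \int_\Theta^\tau c(u)\,du\right],$$

where $1_{\{\tau<\Theta\}}$ is a fixed penalty for a false alarm and the term $\int_\Theta^\tau c(u)\,du$ (understood as zero on $\{\tau < \Theta\}$) is a penalty for detection delay. Here $c$ is a deterministic function with $c(t) > 0$ for all $t \ge 0$. Writing $(\mathcal{F}^Y_t)_{t\ge 0}$ for the filtration generated by $Y$ (which is our observation filtration), let us introduce $\tilde\Pi_t := P(\Theta \le t \mid \mathcal{F}^Y_t)$. Then

$$R(\tau) = E\left[1_{\{\tau<\Theta\}} + \int_0^\tau c(t)1_{\{\Theta\le t\}}\,dt\right] = E\left[1-\tilde\Pi_\tau + \int_0^\tau c(t)\tilde\Pi_t\,dt\right].$$

Hence the optimal stopping problem to solve is

$$V = \inf_{\tau\in\mathcal{T}^Y} E\left[1-\tilde\Pi_\tau + \int_0^\tau c(t)\tilde\Pi_t\,dt\right], \tag{2.3}$$

where $\mathcal{T}^Y$ denotes the set of $\mathcal{F}^Y$-stopping times.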

### 2.1 Filtering equations

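Let us define $\Pi^{(i)}_t := P(X_t = b_i \mid \mathcal{F}^Y_t)$, where $i = 1, \ldots, n$. By the Kallianpur-Striebel formula, see [5, Theorem 2.9 on p. 39],

$$\Pi^{(i)}_t = \frac{\tilde\pi\check{p}_i \exp\left(\int_0^t \frac{b_i}{\sigma(u)^2}\,dY_u - \int_0^t \frac{b_i^2}{2\sigma(u)^2}\,du\right) + (1-\tilde\pi)p_i \int_{[0,t]} \exp\left(\int_\theta^t \frac{b_i}{\sigma(u)^2}\,dY_u - \int_\theta^t \frac{b_i^2}{2\sigma(u)^2}\,du\right)\nu(d\theta)}{\tilde\pi\sum_j \check{p}_j \exp\left(\int_0^t \frac{b_j}{\sigma(u)^2}\,dY_u - \int_0^t \frac{b_j^2}{2\sigma(u)^2}\,du\right) + (1-\tilde\pi)\left(\sum_j p_j \int_{[0,t]} \exp\left(\int_\theta^t \frac{b_j}{\sigma(u)^2}\,dY_u - \int_\theta^t \frac{b_j^2}{2\sigma(u)^2}\,du\right)\nu(d\theta) + \nu((t,\infty))\right)}$$

for $t \ge 0$. Moreover, from the Kushner-Stratonovich equation, see [5, Theorem 3.1 on p. 58], we know that $(\Pi^{(1)}, \ldots, \Pi^{(n)})$ satisfies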

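$$d\Pi^{(i)}_t = p_i\lambda(t)\Big(1-\sum_{j=1}^n \Pi^{(j)}_t\Big)dt + \frac{\Pi^{(i)}_t}{\sigma(t)}\Big(b_i - \sum_{j=1}^n b_j\Pi^{(j)}_t\Big)d\hat{W}_t, \qquad i = 1, \ldots, n. \tag{2.4}$$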

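Here $\lambda(t)$ is the intensity of the disorder occurring at time $t$ (conditional on not having occurred yet), and

$$\hat{W}_t = \int_0^t \frac{1}{\sigma(u)}\left(dY_u - E[X_u \mid \mathcal{F}^Y_u]\,du\right)$$

is a standard Brownian motion with respect to $(\mathcal{F}^Y_t)_{t\ge 0}$, see [1] (the process $\hat{W}$ is referred to as the innovation process). Note that $\tilde\Pi_t = \sum_{i=1}^n \Pi^{(i)}_t$ yields

$$d\tilde\Pi_t = \lambda(t)(1-\tilde\Pi_t)\,dt + \frac{\hat{X}_t}{\sigma(t)}(1-\tilde\Pi_t)\,d\hat{W}_t, \tag{2.5}$$

where $\hat{X}_t := E[X_t \mid \mathcal{F}^Y_t] = \sum_{i=1}^n b_i\Pi^{(i)}_t$.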

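The posterior distribution of $X_t$ is $(1-\tilde\Pi_t)\delta_0 + \sum_{i=1}^n \Pi^{(i)}_t\delta_{b_i}$, so the $n$-tuple $(\Pi^{(1)}_t, \ldots, \Pi^{(n)}_t)$ fully describes the posterior. As a result, the Kallianpur-Striebel formula above and (2.4) provide two different representations of the posterior distribution.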
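As an illustration of how the recursive representation (2.4) could be used on sampled observations, here is a minimal sketch of one Euler step of the filter. The function name, the argument layout and the clipping that keeps the discretised state inside the set of sub-probability vectors are assumptions made for this example; `lam` and `sigma` stand for the values of the intensity and the dispersion at the current time.

```python
def filter_step(pi, dY, dt, b, p, lam, sigma):
    """One Euler step of (2.4) for the posterior weights pi = (pi_1, ..., pi_n).

    pi    : current posterior probabilities of the drift values b_1, ..., b_n
    dY    : observed increment of Y over the step of length dt
    b, p  : possible drift values and their prior weights after the disorder
    lam   : disorder intensity at the current time
    sigma : dispersion at the current time
    """
    total = sum(pi)                                        # posterior probability of a disorder
    x_hat = sum(b_j * pi_j for b_j, pi_j in zip(b, pi))    # estimate of the current drift
    dW_hat = (dY - x_hat * dt) / sigma                     # innovation increment
    new_pi = [
        pi_i + p_i * lam * (1.0 - total) * dt + (pi_i / sigma) * (b_i - x_hat) * dW_hat
        for pi_i, p_i, b_i in zip(pi, p, b)
    ]
    # Discretisation can push the state slightly outside the admissible set;
    # clip and renormalise (a pragmatic choice, not part of the continuous model).
    new_pi = [max(x, 0.0) for x in new_pi]
    s = sum(new_pi)
    if s > 1.0:
        new_pi = [x / s for x in new_pi]
    return new_pi
```

Summing the components of the update reproduces an Euler step of (2.5), which gives a quick consistency check for an implementation.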

### 2.2 Markovian embedding

Following standard lines in optimal stopping theory, we embed our optimal stopping problem into a Markovian framework. To do that, define a Markovian value function by

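$$V(t,\pi) := \inf_{\tau\in\mathcal{T}^{\Pi}_t} E_{t,\pi}\left[1-\tilde\Pi_{t+\tau} + \int_t^{t+\tau} c(u)\tilde\Pi_u\,du\right], \qquad (t,\pi) \in [0,\infty)\times D^n, \tag{2.6}$$

where $\mathcal{T}^{\Pi}_t$ denotes the stopping times with respect to the $n$-dimensional process $\Pi = (\Pi^{(1)}, \ldots, \Pi^{(n)})$ starting from $\pi$ at time $t$ and satisfying (2.4). It is worth noting that $V(t,\pi)$ corresponds to the value of the problem in which the initial time is $t$ and the initial posterior is $\pi$.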

###### Remark 2.2.

The value function $\pi \mapsto V(t,\pi)$ in (2.6) is concave for any $t \ge 0$. Indeed, the concavity proof in [6] extends to the current setting; we omit the details.

#### 2.2.1 The classical Shiryaev solution

In this subsection we recall the solution in the classical case where the cost $c$, the intensity $\lambda$ and the post-change drift $b$ are constants. In that case, we have the optimal stopping problem

$$U(\pi) = \inf_{\tau\in\mathcal{T}^{\Pi}} E_\pi\left[1-\Pi_\tau + c\int_0^\tau \Pi_t\,dt\right] \tag{2.7}$$

with an underlying diffusion process

$$d\Pi_t = \lambda(1-\Pi_t)\,dt + \frac{b}{\sigma}\Pi_t(1-\Pi_t)\,d\hat{W}_t.$$

It is well-known (see [10, Chapter 4] or [8, Section 22]) that $U$ solves the free-boundary problem

$$\begin{cases} \dfrac{b^2\pi^2(1-\pi)^2}{2\sigma^2}\,\partial^2_\pi U + \lambda(1-\pi)\,\partial_\pi U + c\pi = 0, & \pi \in (0,a), \\ U(\pi) = 1-\pi, & \pi \in [a,1], \\ \partial_\pi U(a) = -1. \end{cases} \tag{2.8}$$

Here $a$ is the free boundary, and it can be determined as the solution of a certain transcendental equation. Moreover, the stopping time $\tau_a := \inf\{t \ge 0 : \Pi_t \ge a\}$ is optimal in (2.7), and one can check that the value function $U$ is decreasing and concave.
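For a numerical feel, the boundary can be approximated by reducing the ODE in (2.8) to a first-order linear equation for $\partial_\pi U$. Assuming $\partial_\pi U(0+) = 0$ and writing $\rho = b^2/(2\sigma^2)$ and $H(y) = \log\frac{y}{1-y} - \frac{1}{y}$, one gets $-\partial_\pi U(\pi) = \frac{c}{\rho}\int_0^\pi \frac{e^{(\lambda/\rho)(H(y)-H(\pi))}}{y(1-y)^2}\,dy$ on the continuation region, and smooth fit asks for this quantity to equal $1$ at the boundary. The sketch below solves that equation with a midpoint rule and bisection; the function name, grid size and bracket are illustrative assumptions, and the crude quadrature is only adequate when the boundary is not too close to $1$.

```python
import math


def shiryaev_threshold(b, sigma, lam, c, n_grid=4000, tol=1e-10):
    """Approximate the boundary a in (2.8) by solving -U'(a) = 1 with bisection."""
    rho = b * b / (2.0 * sigma * sigma)
    big_lam, big_c = lam / rho, c / rho

    def h(y):
        return math.log(y / (1.0 - y)) - 1.0 / y

    def minus_du(pi):
        # Midpoint rule for (c/rho) * int_0^pi exp(Lam*(h(y)-h(pi))) / (y (1-y)^2) dy.
        step = pi / n_grid
        h_pi = h(pi)
        acc = 0.0
        for k in range(n_grid):
            y = (k + 0.5) * step
            acc += math.exp(big_lam * (h(y) - h_pi)) / (y * (1.0 - y) ** 2)
        return big_c * acc * step

    lo, hi = 1e-6, 1.0 - 1e-3              # search bracket (assumption)
    if minus_du(hi) < 1.0:
        raise ValueError("boundary too close to 1 for this crude quadrature")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if minus_du(mid) < 1.0:            # still in the continuation region
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Since the generator applied to $1-\pi$ must be non-negative in the stopping region, the result should lie above $\lambda/(\lambda+c)$, which is a convenient sanity check on the output.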

## 3 Value dependencies and robustness

### 3.1 Monotonicity properties of the value function

In this section, we study parameter dependence of the optimal stopping problem (2.6). In particular, we investigate how the value function changes when we alter parameters of the probabilistic model, which include the prior for the drift magnitude and the prior for the disorder time.

The effects of adding more noise, stretching out the prior by scaling, and increasing the observation cost are explained by the following theorem.

###### Theorem 3.1 (General monotonicity properties of the value function V).
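1. $V$ is increasing in the volatility $\sigma(\cdot)$.

2. Given a prior $\mu$ for the drift magnitude, let $V_k$ denote the Markovian value function (2.6) in the case when the drift magnitude is multiplied by the factor $k$, i.e. when the drift prior is $\mu$ scaled by $k$. Then the map $k \mapsto V_k(t,\pi)$ is decreasing on $(0,\infty)$ for any $(t,\pi) \in [0,\infty)\times D^n$.

3. $V$ is increasing in the cost function $c(\cdot)$.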

###### Proof.

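For simplicity of notation, and without loss of generality, we consider the case $t = 0$ in the proofs below.

1. For the volatility, let $\sigma_1$ and $\sigma_2$ be two time-dependent volatility functions satisfying $\sigma_1(t) \le \sigma_2(t)$ for all $t \ge 0$. Also, let

$$Y^i_t := \int_0^t X_u\,du + \int_0^t \sigma_i(u)\,dW_u, \qquad i = 1, 2,$$

and let $V_i$, $i = 1, 2$, be the corresponding value functions. In addition, let $W^\perp$ be a standard Brownian motion independent of $W$ and $X$. Then, clearly, enlarging the observation filtration of $Y^1$ by the independent process $W^\perp$ does not change the value $V_1$. Moreover, the process

$$\tilde{Y}^2_t := Y^1_t + \int_0^t \sqrt{\sigma_2^2(u) - \sigma_1^2(u)}\,dW^\perp_u$$

coincides in law with $Y^2$ and is adapted to the filtration generated by $(Y^1, W^\perp)$. Hence it follows that

$$V_1 = \inf_{\tau\in\mathcal{T}^{Y^1,W^\perp}} E\left[1_{\{\tau<\Theta\}} + \int_\Theta^\tau c(u)\,du\right] \le \inf_{\tau\in\mathcal{T}^{\tilde{Y}^2}} E\left[1_{\{\tau<\Theta\}} + \int_\Theta^\tau c(u)\,du\right] = V_2,$$

which finishes the proof of the claim.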

2. Note that for $k > 0$, the process

$$Y^k_t := \int_0^t kX_u\,du + \int_0^t \sigma(u)\,dW_u$$

satisfies $Y^k = k\tilde{Y}$, where

$$\tilde{Y}_t := \int_0^t X_u\,du + \int_0^t \frac{\sigma(u)}{k}\,dW_u.$$

Moreover, the set of $\mathcal{F}^{Y^k}$-stopping times coincides with the set of $\mathcal{F}^{\tilde{Y}}$-stopping times, so monotonicity in $k$ is implied by monotonicity in the volatility. Thus claim 2 follows from claim 1.

3. The fact that the value is increasing in $c$ is obvious from the definition (2.6) of the value function.
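The coupling used for the volatility claim can be checked numerically: adding an independent Brownian component with the right scale turns noise of volatility $\sigma_1$ into noise of volatility $\sigma_2$. The sketch below does this for constant volatilities (a simplifying assumption; the proof allows time dependence), with hypothetical names throughout, and compares the realised quadratic variation with $\sigma_2^2 T$.

```python
import math
import random


def coupled_noise_increments(sigma1, sigma2, dt, n_steps, seed=0):
    """Noise increments of Y^1 and of the coupled process built from Y^1 and W-perp."""
    if sigma2 < sigma1:
        raise ValueError("the construction needs sigma1 <= sigma2")
    rng = random.Random(seed)
    extra = math.sqrt(sigma2**2 - sigma1**2)   # scale of the added independent noise
    root_dt = math.sqrt(dt)
    noise1, noise2 = [], []
    for _ in range(n_steps):
        dW = root_dt * rng.gauss(0.0, 1.0)       # increment of W
        dW_perp = root_dt * rng.gauss(0.0, 1.0)  # increment of the independent W-perp
        noise1.append(sigma1 * dW)
        noise2.append(sigma1 * dW + extra * dW_perp)
    return noise1, noise2


def quadratic_variation(increments):
    """Sum of squared increments, an estimate of the integrated squared volatility."""
    return sum(x * x for x in increments)
```

With `sigma1=1.0`, `sigma2=2.0`, `dt=0.001` and `n_steps=20000`, the two quadratic variations should be close to $20$ and $80$ respectively, up to sampling error.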

The monotonicity of the minimal Bayes’ risk with respect to volatility is of course not so surprising: more noise in the observation process gives a smaller signal-to-noise ratio, which slows down the speed of learning. It is less clear how a change in the disorder intensity should affect the value function under a general disorder magnitude distribution. However, we have the following comparison result for the case of constant parameters.

###### Theorem 3.2 (Monotonicity in the intensity for constant parameters).

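Assume that the disorder magnitude can only take one value $b$. Let the cost $c$, the volatility $\sigma$ and the intensity $\lambda$ be constants, and assume that $\lambda'(t) \le \lambda$ for all $t \ge 0$, where $\lambda'(\cdot)$ is a time-dependent intensity. Let $U$ be the value function for Shiryaev's problem with parameters $(b, c, \sigma, \lambda)$, and let $V'$ denote the value function for the problem specification $(b, c, \sigma, \lambda'(\cdot))$. Then $U(\pi) \le V'(t,\pi)$ for all $t \ge 0$ and $\pi \in [0,1]$.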

###### Proof.

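Without loss of generality, we only consider the case $t = 0$. Let $\pi \in [0,1]$, denote by $Y'$ the observation process corresponding to the model specification $(b, c, \sigma, \lambda'(\cdot))$, and let $\Pi'$ denote the corresponding conditional probability process started from $\pi$ at time $0$. Let $\tau$ be a bounded stopping time. Then, applying (a generalised version of) Itô's formula and taking expectations at the stopping time $\tau$, we get

$$\begin{aligned} U(\pi) &= E[U(\Pi'(\tau))] - E\left[\int_0^\tau \Big(\lambda'(s)(1-\Pi'(s))\,\partial_\pi U(\Pi'(s)) + \frac{b^2}{2\sigma^2}(\Pi'(s))^2(1-\Pi'(s))^2\,\partial^2_\pi U(\Pi'(s))\Big)ds\right] \\ &\le E[U(\Pi'(\tau))] - E\left[\int_0^\tau \Big(\lambda(1-\Pi'(s))\,\partial_\pi U(\Pi'(s)) + \frac{b^2}{2\sigma^2}(\Pi'(s))^2(1-\Pi'(s))^2\,\partial^2_\pi U(\Pi'(s))\Big)ds\right] \\ &\le E[U(\Pi'(\tau))] + E\left[c\int_0^\tau \Pi'(s)\,ds\right] \\ &\le E[1-\Pi'(\tau)] + E\left[c\int_0^\tau \Pi'(s)\,ds\right], \end{aligned}$$

where we used the monotonicity of $U$ and the fact that

$$\lambda(1-\pi)\,\partial_\pi U(\pi) + \frac{b^2}{2\sigma^2}\pi^2(1-\pi)^2\,\partial^2_\pi U(\pi) + c\pi \ge 0 \tag{3.1}$$

at all points away from the optimal stopping boundary of Shiryaev's classical problem, compare (2.8). Taking the infimum over bounded stopping times $\tau$, we get $U(\pi) \le V'(0,\pi)$, which finishes the proof. ∎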

###### Remark 3.1.
1. The monotonicity in intensity does not easily extend to cases with unknown post-change drift by the same argument. In fact, one can check that in higher dimensions the partial derivatives $\partial_{\pi_i} V$ are not necessarily all negative, which makes it difficult to extend the above proof to a more general setting. However, in the robustness result in Theorem 3.3 below we provide a partial extension in which models with general support for the drift magnitude and general intensities are compared with a fixed parameter model.

2. Though the authors expect the inequality in Theorem 3.2 to hold also when one time-dependent intensity dominates another, the comparison with the constant intensity case was chosen to avoid additional mathematical complications that need to be resolved in order to apply Ito’s formula to the value function of a time-dependent disorder detection problem.

### 3.2 Robustness

Robustness concerns how a possible misspecification of the model parameters affects the performance of the detection strategy when evaluated under the real physical measure. In this section, we use coupling arguments to study robustness properties with respect to the disorder magnitude and disorder time. For simplicity, we assume that the parameters $c$, $\sigma$ and $\lambda$ are constant so that we have a time-independent case; generalizations to the time-dependent case are straightforward but notationally more involved.

Thus we assume that the signal process follows

$$X_t = B_0 1_{\{\Theta = 0\}} + B_1 1_{\{0 < \Theta \le t\}}, \tag{3.2}$$

where $B_0, B_1$ are random variables with distributions $\mu_0, \mu_1$ respectively, and $\Theta$ has the distribution $\tilde\pi\delta_0 + (1-\tilde\pi)\nu$, where $\nu$ is an exponential distribution with intensity $\lambda$. Let us simply write $\mu$ for the pair $(\mu_0, \mu_1)$.

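For a given $l$, let $\Theta_l$ satisfy $\Theta_l \ge \Theta$ with distribution $\tilde\pi\delta_0 + (1-\tilde\pi)\nu_l$, where $\nu_l$ is an exponential distribution with intensity $\lambda_l$. Let

$$g_l(t,\tilde\pi,Y) := \frac{\tilde\pi e^{\frac{l}{\sigma^2}Y_t - \frac{l^2}{2\sigma^2}t} + (1-\tilde\pi)\lambda_l\int_0^t e^{\frac{l}{\sigma^2}(Y_t-Y_\theta) - \frac{l^2}{2\sigma^2}(t-\theta)}e^{-\lambda_l\theta}\,d\theta}{\tilde\pi e^{\frac{l}{\sigma^2}Y_t - \frac{l^2}{2\sigma^2}t} + (1-\tilde\pi)\left(\lambda_l\int_0^t e^{\frac{l}{\sigma^2}(Y_t-Y_\theta) - \frac{l^2}{2\sigma^2}(t-\theta)}e^{-\lambda_l\theta}\,d\theta + e^{-\lambda_l t}\right)},$$

compare the Kallianpur-Striebel formula in Section 2.1. Also, we introduce the notation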

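$$Y^\mu_t := \int_0^t X_u\,du + \sigma W_t,$$

$$Y^{\delta_l}_t := l(t-\Theta_l)^+ + \sigma W_t,$$

$$\tilde\Pi^{\delta_l}_{\delta_l}(t) := g_l(t,\tilde\pi,Y^{\delta_l})$$

and

$$\tilde\Pi^{\mu}_{\delta_l}(t) := g_l(t,\tilde\pi,Y^{\mu}).$$

Here $Y^\mu$ is the observation process for a setting in which the post-change drift has distribution $\mu$ and the disorder happens at $\Theta$. The process $Y^{\delta_l}$ is the observation process and $\tilde\Pi^{\delta_l}_{\delta_l}$ is the corresponding conditional probability process in the situation of a post-change drift $l$ that occurs at $\Theta_l$. Moreover, the process $\tilde\Pi^{\mu}_{\delta_l}$ represents the conditional probability process calculated as if the drift change is described by $\delta_l$ in the scenario where the true drift change is given by $\mu$.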
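To show how a tester who believes in a single drift value $l$ would compute the conditional probability from data, here is a minimal sketch that evaluates $g_l$ on a uniformly sampled path. It uses the density $\lambda_l e^{-\lambda_l\theta}$ and tail $e^{-\lambda_l t}$ of an exponential law with intensity $\lambda_l$, and the trapezoidal rule for the integral; the function signature and the sampling convention `Y[j]` $= Y_{j\,dt}$ are assumptions made for this example.

```python
import math


def g_l(k, pi_tilde, Y, dt, l, sigma, lam_l):
    """Approximate g_l(t, pi_tilde, Y) at t = k*dt for a path sampled as Y[j] = Y_{j*dt}."""
    t = k * dt

    def weight(j):
        # Likelihood ratio for a drift l switched on at theta = j*dt,
        # times the exponential density of the disorder time.
        theta = j * dt
        expo = (l / sigma**2) * (Y[k] - Y[j]) - (l**2 / (2.0 * sigma**2)) * (t - theta)
        return math.exp(expo) * lam_l * math.exp(-lam_l * theta)

    # Trapezoidal rule for the integral over theta in [0, t].
    integral = 0.0
    for j in range(k):
        integral += 0.5 * (weight(j) + weight(j + 1)) * dt

    # Contribution of a disorder that is already present at time 0.
    head = pi_tilde * math.exp((l / sigma**2) * (Y[k] - Y[0]) - (l**2 / (2.0 * sigma**2)) * t)
    numerator = head + (1.0 - pi_tilde) * integral
    denominator = numerator + (1.0 - pi_tilde) * math.exp(-lam_l * t)
    return numerator / denominator
```

Feeding the same function a path generated under the true model gives the misspecified posterior, while feeding it a path generated with drift $l$ gives the correctly specified one; for $l > 0$ a path with larger increments produces a larger value.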

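Now, let $a$ denote the optimal stopping boundary for the classical Shiryaev one-dimensional problem in the model $\delta_l$, and define

$$\tau^{\delta_l}_{\delta_l} := \inf\{t \ge 0 : \tilde\Pi^{\delta_l}_{\delta_l}(t) \ge a\},$$

$$\tau^{\mu}_{\delta_l} := \inf\{t \ge 0 : \tilde\Pi^{\mu}_{\delta_l}(t) \ge a\},$$

and

$$V^{\mu}_{\delta_l} := E\left[1_{\{\tau^{\mu}_{\delta_l}<\Theta\}} + c(\tau^{\mu}_{\delta_l}-\Theta)^+\right].$$

Here $\tau^{\delta_l}_{\delta_l}$ is the optimal stopping time in the model $\delta_l$, while $\tau^{\mu}_{\delta_l}$ is the (sub-optimal) stopping time and $V^{\mu}_{\delta_l}$ is the corresponding cost for someone who believes in $\delta_l$, whereas the true model is $\mu$.

Finally, let

$$\tilde\Pi^{\mu}_t := P(X_t \ne 0 \mid \mathcal{F}^{Y^\mu}_t) = \Pi^{(1)}_t + \ldots + \Pi^{(n)}_t$$

as in Section 2, and define

$$\gamma^{\mu}_{\delta_l} := \inf\{t \ge 0 : \tilde\Pi^{\mu}_t \ge a\}.$$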
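
###### Theorem 3.3 (Robustness with respect to disorder magnitude and intensity).
1. Suppose that $0 < l \le b_i$ for all $i$ or $b_i \le l < 0$ for all $i$, and let $\lambda_l \le \lambda$.

   (a) Then

   $$V^{\mu} \le V^{\mu}_{\delta_l} \le V^{\delta_l} + c\,\frac{\lambda-\lambda_l}{\lambda\lambda_l}(1-\tilde\pi), \tag{3.3}$$

   where $V^{\mu}$ and $V^{\delta_l}$ denote the minimal associated Bayes' risks for the models $\mu$ and $\delta_l$, respectively.

   (b) Also,

   $$V^{\mu} \le P(\Theta > \gamma^{\mu}_{\delta_l}) + c\,E[(\gamma^{\mu}_{\delta_l}-\Theta)^+] \le V^{\delta_l}. \tag{3.4}$$

2. Suppose $|b_i| \le r$ for all $i$, and define $V^{\delta_r}$ and $V^{\mu}_{\delta_r}$ like for $l$. If $\lambda_r \ge \lambda$, then

   $$V^{\delta_r} \le V^{\mu} \le V^{\mu}_{\delta_r}. \tag{3.5}$$
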
###### Remark 3.2.

Note that (3.3) and (3.5) correspond to situations in which the tester uses a misspecified model. More precisely, filtering and stopping are performed as if the underlying model had a one-point distribution as the disorder magnitude prior (the classical Shiryaev model). Such a situation may appear due to model miscalibration but is also relevant in situations with limited computational resources as the tester can deliberately choose to under/overestimate the actual parameters in order to use a simpler detection strategy. Equation (3.3) thus gives an upper bound for the expected loss when the classical Shiryaev model is employed. In (3.4), on the other hand, filtering is performed according to the correct model but the simple Shiryaev threshold strategy (suboptimal) is used for stopping.

###### Proof.
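1(a). For definiteness, we consider the case $l > 0$ so that $b_i \ge l$ for all $i$; the other case is completely analogous. First note that the suboptimality of $\tau^{\mu}_{\delta_l}$ yields $V^{\mu} \le V^{\mu}_{\delta_l}$. Next, observe that we have $Y^{\delta_l}_t = Y^{\mu}_t$ for all $t \in [0,\Theta]$ and $Y^{\delta_l}_t - Y^{\delta_l}_s \le Y^{\mu}_t - Y^{\mu}_s$ for all $0 \le s \le t$, and therefore

$$\tilde\Pi^{\delta_l}_{\delta_l}(t) = \tilde\Pi^{\mu}_{\delta_l}(t) \quad \text{for } t \in [0,\Theta]$$

and

$$\tilde\Pi^{\delta_l}_{\delta_l}(t) \le \tilde\Pi^{\mu}_{\delta_l}(t) \quad \text{for all } t \ge 0$$

by the explicit expression for $g_l$. Consequently,

$$\tau^{\delta_l}_{\delta_l} \ge \tau^{\mu}_{\delta_l},$$

so

$$E[(\tau^{\delta_l}_{\delta_l}-\Theta_l)^+] \ge E[(\tau^{\mu}_{\delta_l}-\Theta)^+] - E[(\Theta_l-\Theta)^+] = E[(\tau^{\mu}_{\delta_l}-\Theta)^+] - \frac{\lambda-\lambda_l}{\lambda\lambda_l}(1-\tilde\pi).$$

Moreover, since $\tilde\Pi^{\delta_l}_{\delta_l} = \tilde\Pi^{\mu}_{\delta_l}$ on the time interval $[0,\Theta]$, we have

$$P(\tau^{\delta_l}_{\delta_l}<\Theta_l) \ge P(\tau^{\delta_l}_{\delta_l}<\Theta) = P(\tau^{\mu}_{\delta_l}<\Theta),$$

which together with the preceding bound on the expected delay yields

$$V^{\delta_l} = E\left[1_{\{\tau^{\delta_l}_{\delta_l}<\Theta_l\}} + c(\tau^{\delta_l}_{\delta_l}-\Theta_l)^+\right] \ge E\left[1_{\{\tau^{\mu}_{\delta_l}<\Theta\}} + c(\tau^{\mu}_{\delta_l}-\Theta)^+\right] - c\,\frac{\lambda-\lambda_l}{\lambda\lambda_l}(1-\tilde\pi) = V^{\mu}_{\delta_l} - c\,\frac{\lambda-\lambda_l}{\lambda\lambda_l}(1-\tilde\pi).$$

1(b). The first inequality is immediate by suboptimality of $\gamma^{\mu}_{\delta_l}$. For the second one, let $U$ be the value function of the classical Shiryaev problem in the model $\delta_l$ so that $U(\tilde\pi) = V^{\delta_l}$. Then $U$ is $C^2$ on $[0,a)\cup(a,1]$ and $C^1$ on $[0,1]$, so applying Itô's formula to $U(\tilde\Pi^{\mu})$ and taking expectations at the bounded stopping time $\gamma^{\mu}_{\delta_l}\wedge k$, we get

$$\begin{aligned} U(\tilde\pi) &= E[U(\tilde\Pi_{\gamma^{\mu}_{\delta_l}\wedge k})] - E\left[\int_0^{\gamma^{\mu}_{\delta_l}\wedge k} \Big(\lambda(1-\tilde\Pi_u)U'(\tilde\Pi_u) + \frac{\hat{X}_u^2}{2\sigma^2}(1-\tilde\Pi_u)^2U''(\tilde\Pi_u)\Big)du\right] \\ &\ge E[U(\tilde\Pi_{\gamma^{\mu}_{\delta_l}\wedge k})] - E\left[\int_0^{\gamma^{\mu}_{\delta_l}\wedge k} \Big(\lambda_l(1-\tilde\Pi_u)U'(\tilde\Pi_u) + \frac{l^2}{2\sigma^2}\tilde\Pi_u^2(1-\tilde\Pi_u)^2U''(\tilde\Pi_u)\Big)du\right] \\ &= E[U(\tilde\Pi_{\gamma^{\mu}_{\delta_l}\wedge k})] + E\left[c\int_0^{\gamma^{\mu}_{\delta_l}\wedge k}\tilde\Pi_u\,du\right], \end{aligned}$$

where monotonicity and concavity of $U$ were used in the inequality. Letting $k \to \infty$ gives

$$U(\tilde\pi) \ge E[1-\tilde\Pi_{\gamma^{\mu}_{\delta_l}}] + E\left[c\int_0^{\gamma^{\mu}_{\delta_l}}\tilde\Pi_u\,du\right],$$

which finishes the proof of the claim.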

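2. Recall that

$$d\tilde\Pi_t = \lambda(1-\tilde\Pi_t)\,dt + \frac{\hat{X}_t}{\sigma}(1-\tilde\Pi_t)\,d\hat{W}_t.$$

Let $U := V^{\delta_r}$. Since $U$ is $C^2$ on $[0,a)\cup(a,1]$ and $C^1$ on $[0,1]$, where $a$ is the boundary in Shiryaev's problem with drift $r$ and intensity $\lambda_r$, applying Itô's formula to $U(\tilde\Pi)$ and taking expectations at a bounded stopping time $\tau$ yields

$$\begin{aligned} U(\tilde\pi) &= E[U(\tilde\Pi_\tau)] - E\left[\int_0^\tau \Big(\lambda(1-\tilde\Pi_u)U'(\tilde\Pi_u) + \frac{\hat{X}_u^2}{2\sigma^2}(1-\tilde\Pi_u)^2U''(\tilde\Pi_u)\Big)du\right] \\ &\le E[U(\tilde\Pi_\tau)] - E\left[\int_0^\tau \Big(\lambda_r(1-\tilde\Pi_u)U'(\tilde\Pi_u) + \frac{r^2}{2\sigma^2}\tilde\Pi_u^2(1-\tilde\Pi_u)^2U''(\tilde\Pi_u)\Big)du\right] \\ &\le E[U(\tilde\Pi_\tau)] + E\left[c\int_0^\tau\tilde\Pi_u\,du\right] && (3.7) \\ &\le E[1-\tilde\Pi_\tau] + E\left[c\int_0^\tau\tilde\Pi_u\,du\right]. && (3.8) \end{aligned}$$

Here monotonicity and concavity of $U$ were used for the first inequality, (3.7) follows from the fact that

$$\lambda_r(1-\tilde\pi)U'(\tilde\pi) + \frac{r^2}{2\sigma^2}\tilde\pi^2(1-\tilde\pi)^2U''(\tilde\pi) + c\tilde\pi \ge 0, \qquad \tilde\pi \in [0,a)\cup(a,1],$$

and the inequality (3.8) holds because $U(\tilde\pi) \le 1-\tilde\pi$. Hence, since the same value is obtained if one restricts the infimum in (2.3) to bounded stopping times,

$$V^{\delta_r} = U \le V^{\mu}.$$

Lastly, since $\tau^{\mu}_{\delta_r}$ is a suboptimal strategy, we also have

$$V^{\mu} \le V^{\mu}_{\delta_r},$$

which finishes the claim.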
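
As a quick check of the constant appearing in (3.3), assume (as the coupling argument in the first part of the proof appears to require) that $\Theta_l \ge \Theta$ almost surely, and that both disorder times have an atom of size $\tilde\pi$ at zero and are otherwise exponential with intensities $\lambda_l$ and $\lambda$. Then the positive part can be dropped and only the two means are needed:

$$E[(\Theta_l-\Theta)^+] = E[\Theta_l] - E[\Theta] = (1-\tilde\pi)\left(\frac{1}{\lambda_l} - \frac{1}{\lambda}\right) = \frac{\lambda-\lambda_l}{\lambda\lambda_l}(1-\tilde\pi).$$

One coupling with these properties, given here only as a hypothetical example, is $\Theta_l := (\lambda/\lambda_l)\,\Theta$ with $\lambda_l \le \lambda$.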

###### Corollary 3.1.

In the notation above, assume that $\lambda_l = \lambda_r = \lambda$ so that there is no misspecification of the intensity. Moreover, assume that $l \le b_i \le r$ for all $i$, where $0 < l \le r$. Then

$$V^{\delta_r} \le V^{\mu} \le V^{\delta_l},$$

so monotonicity in the disorder magnitude holds when comparing with deterministic magnitudes. Furthermore,

$$0 \le V^{\mu}_{\delta_l} - V^{\mu} \le V^{\delta_l} - V^{\delta_r},$$

so the increase in the Bayes' risk due to underestimation (with a constant) of the disorder magnitude is bounded by the difference of two value functions of the classical Shiryaev problem.

We finish with some implications concerning the stopping strategy $\tau_D := \inf\{t \ge 0 : \Pi_t \in D\}$, where $D$ is a standard abstractly defined optimal stopping set, see [8] (we now assume that we are in the case of time-independent coefficients so that the value function $V$ is merely a function of $\pi$). The concavity of $V$, compare Remark 2.2, yields the existence of a boundary $\gamma$ separating $D$ from its complement. The following result provides a more accurate location of the boundary $\gamma$.

###### Corollary 3.2 (Confined stopping boundary).

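Assume that the coefficients $c$, $\sigma$ and $\lambda$ are constant and that $l \le b_i \le r$ for all $i$, where $0 < l \le r$. Let $a_l$ and $a_r$ denote the boundaries in the classical Shiryaev problem with disorder magnitude $l$ and $r$, respectively. Then

$$a_l \le \inf\{\|\pi\|_1 : \pi \in \gamma\} \le \sup\{\|\pi\|_1 : \pi \in \gamma\} \le a_r,$$

i.e. the stopping boundary is contained in a strip. Moreover, the optimal strategy satisfies

$$1-a_r \le P(\tau_D < \Theta \mid \mathcal{F}^Y_{\tau_D}) \le 1-a_l.$$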

## References

• [1] Bain, A. and Crisan, D. (2009). Fundamentals of Stochastic Filtering. Stochastic Modelling and Applied Probability, 60. New York: Springer-Verlag.
• [2] Beibel, M. (1997). Sequential Change-Point Detection in Continuous Time when the Post-Change Drift is Unknown, Bernoulli 3: 457-478.
• [3] Bayraktar, E., Dayanik, S. and Karatzas, I. (2005). The Standard Poisson Disorder Problem Revisited, Stochastic Processes and their Applications 115: 1437-1450.
• [4] Bayraktar, E., Dayanik, S. and Karatzas, I. (2006). Adaptive Poisson Disorder Problem, The Annals of Applied Probability 16: 1190-1261.
• [5] Crisan, D. and Rozovskii, B. (2011). The Oxford Handbook of Nonlinear Filtering. Oxford University Press.
• [6] Muravlev, A. and Shiryaev, A. (2014). Two-Sided Disorder Problem for a Brownian Motion in a Bayesian Setting, Proceedings of the Steklov Institute of Mathematics 287: 202-224.
• [7] Peskir, G. and Shiryaev, A. (2002). Solving the Poisson Disorder Problem. Advances in Finance and Stochastics, 295-312, Springer, Berlin.
• [8] Peskir, G. and Shiryaev, A. (2006). Optimal Stopping and Free-Boundary Problems. Lectures in Mathematics, ETH Zürich. Basel: Birkhäuser Verlag.
• [9] Shiryaev, A. N. (1967). Two Problems of Sequential Analysis. Cybernetics 3: 63-69.
• [10] Shiryaev, A. N. (1978). Optimal Stopping Rules, New York: Springer-Verlag.
• [11] Zucca, C., Tavella, P. and Peskir, G. (2016). Detecting Atomic Clock Frequency Trends using an Optimal Stopping Method. Metrologia 53: 89-95.