1 Introduction
Detecting a common change in multistream or panel data is an important problem in sequential changepoint detection. Here, a common change refers to a change that may occur in only a portion of the N panels and is usually caused by external sources. In contrast, traditional changepoint detection focuses on a single sequence (an individual panel), where the change is typically caused by internal sources. Several typical detection procedures have been discussed and extended; see Xie and Siegmund (2013), Mei (2010), and Tartakovsky and Veeravalli (2008). Chan (2017) discussed the optimality of detection procedures.
Wu (2019) proposed a combined SR-CUSUM procedure that uses the sum of N Shiryayev-Roberts processes to detect the common change, while the N individual CUSUM processes are used to isolate the changed panels and estimate the change point. The alarm limit is chosen such that the average in-control run length equals a designated value. For convenience of discussion, we shall focus on the normal case.
Assume there are N independent panels and in panel , the observations follow for and () for if a change occurs at in this panel. Suppose a change may occur in only K of the N panels, called a common change. The
N panels can be assumed to follow a mixture model with probability
of change in each panel. We shall select the same reference parameter for for all panels. For , define
as the Shiryayev-Roberts process for the panel and for .
An alarm will be raised at the stopping time
where is chosen such that the is equal to the designated value.
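The detection step described above can be sketched as follows. This is a minimal illustration rather than the procedure's reference implementation: it assumes unit-variance normal observations, uses the standard Shiryayev-Roberts recursion for a N(0,1) versus N(delta,1) likelihood ratio, and compares the plain sum over panels with a limit. The names `sr_sum_alarm`, `delta`, and `threshold` are hypothetical, and any normalisation of the alarm limit by the number of panels is left to the caller.

```python
import numpy as np

def sr_sum_alarm(x, delta, threshold):
    """Return the first time t (1-based) at which the sum of the per-panel
    Shiryayev-Roberts statistics exceeds `threshold`, or None if it never does.

    x         : array of shape (T, N); column i holds the observations of panel i.
    delta     : reference value for the post-change mean (assumed name).
    threshold : alarm limit applied to the summed statistic (assumed name).
    """
    x = np.asarray(x, dtype=float)
    r = np.zeros(x.shape[1])  # every SR process starts at zero
    for t, row in enumerate(x, start=1):
        # SR recursion with the N(0,1) vs N(delta,1) likelihood ratio.
        r = (1.0 + r) * np.exp(delta * row - 0.5 * delta**2)
        if r.sum() > threshold:
            return t
    return None
```

In practice the limit would be calibrated, by approximation or by simulation, so that the in-control average run length matches the designated value.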
In the normal case, when is small, can be designed by using the following simple approximation (Pollak (1987)):
where .
For example, for , , and corresponding to and , respectively. For further properties of the average run lengths, see Wu (2018b). Comparisons with other procedures in Wu (2018b) showed that the proposed procedure is very competitive when the proportion is small.
To isolate the changed panels and estimate the common change point, the combined SR-CUSUM procedure calculates the CUSUM processes recursively as
and at the alarm time the changepoint for the panel is estimated as the last zero point of ,
for , which is indeed the MLE when . When is unknown, it can be estimated as
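The per-panel CUSUM recursion and the last-zero changepoint estimate can be sketched as below. The sketch assumes the usual one-sided normal CUSUM with increments `obs - delta / 2`, and it estimates the post-change mean by the sample mean of the observations after the last zero, which equals the final CUSUM value divided by the elapsed time plus `delta / 2`. Both choices, and the names `cusum_last_zero`, `nu_hat`, and `mu_hat`, are assumptions of this illustration and are not taken from the text.

```python
import numpy as np

def cusum_last_zero(x, delta):
    """Run a one-sided CUSUM on one panel up to the alarm time.

    Returns (w, nu_hat, mu_hat):
      w      : CUSUM value at the end of the series,
      nu_hat : last time the CUSUM was at zero (0 if it never returned to zero),
      mu_hat : mean of the observations after nu_hat (None if there are none).
    """
    w, nu_hat = 0.0, 0
    for t, obs in enumerate(x, start=1):
        w = max(0.0, w + obs - delta / 2.0)  # assumed CUSUM recursion
        if w == 0.0:
            nu_hat = t                       # remember the most recent zero
    tail = np.asarray(x[nu_hat:], dtype=float)
    mu_hat = float(tail.mean()) if tail.size > 0 else None
    return w, nu_hat, mu_hat
```

Calling the function on each panel with the series truncated at the alarm time yields the per-panel quantities used later for isolation.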
To isolate the "true" changed panels, both the changepoint estimate (or the estimated detection delay after the changepoint estimate) and the estimated signal strength (or ) provide relevant information. Here we propose a BH-type procedure to control the false discovery rate (FDR). In Section 2, we first study the corresponding continuous-time Brownian motion model and derive the exact null joint distribution for and
for the unchanged panels. The marginal moments and covariance show that they are highly correlated. When
is small, we extend the results to the discrete-time model in Section 3. In Section 4, we propose to use the approximate null distribution for to form a BH-type procedure that isolates the changed panels while controlling the FDR. The isolated changed panels are then used to estimate the common change point. Simulation studies of the FDR, the false non-discovery rate (FNR), and the biases of the estimated common changepoint in several typical cases show that the proposed method works quite well. The results also help in selecting a proper FDR level to balance the FNR.
2 Null Distribution under continuous time model
We assume that a common change is detected (B is large) and the change occurs far away from the beginning. For those unchanged panels, at the detection time, by looking backward at each CUSUM process and using its strong Markov property, we can see that for each , and are approximately equivalent in distribution to the maximum point and the maximum value for a normal random walk for with drift
and variance 1 where
Under the continuous time model, we shall denote for as a Brownian motion and as its corresponding probability measure with drift and as the probability measure when the drift is . Denote
For an independent copy of , we denote . The following theorem gives the joint distribution of and its proof is given in the appendix.
Theorem 1.
Note that by letting , we see
By taking derivative with respect to x, we get
since
Thus,
The following theorem shows that the conditional distribution of given is actually inverse Gaussian under and its proof is given in the appendix.
Theorem 2.
where and the conditional density function of given is
and the joint density function of is
By integrating with respect to , we have:
Corollary 1. The marginal density function and cdf of are given by
where
From Theorems 1 and 2, we have the following results and the proofs are given in the appendix.
Theorem 3.
(i) and ;
(ii) and .
The results show that and are highly correlated. For this reason, we shall isolate the changed panels mainly based on .
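The correlation can also be checked numerically with a small Monte Carlo sketch. It assumes the discrete-time analogue of the setting above: a normal random walk with unit variance and drift `-delta / 2` (the drift implied by a CUSUM with reference value `delta` when no change has occurred), truncated at a finite `horizon`. The function name, the parameter values, and the truncation are all assumptions of this illustration.

```python
import numpy as np

def max_and_argmax(delta, horizon, n_rep, seed=0):
    """Simulate n_rep random walks with N(-delta/2, 1) steps and return the
    maximum value and the time at which it is attained for each path.
    The walk starts at 0 at time 0, so the maximum is never negative."""
    rng = np.random.default_rng(seed)
    steps = rng.normal(-delta / 2.0, 1.0, size=(n_rep, horizon))
    start = np.zeros((n_rep, 1))
    paths = np.concatenate([start, np.cumsum(steps, axis=1)], axis=1)
    return paths.max(axis=1), paths.argmax(axis=1)

# Hypothetical settings chosen only for illustration.
m, tau = max_and_argmax(delta=0.5, horizon=400, n_rep=2000)
corr = np.corrcoef(m, tau)[0, 1]
print(f"sample correlation between maximum and its location: {corr:.3f}")
```

Because the drift is negative, the maximum is attained early with high probability, so a moderate horizon is enough for a rough check.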
3 Approximate null distribution under discrete time model
We derive the Laplace transform for the joint distribution of . Let . Define
and for
and
and if . It can be seen that
for where .
Thus, we can write
For given , is in distribution equivalent to the sum of k i.i.d. copies of . This leads to the following Laplace transform for in the normal case.
Theorem 4.
where .
Proof. By conditioning on the value of , we have
From the above theorem, the exact results for moments of and can be obtained. For example,
From the Wiener-Hopf factorization (e.g., Siegmund (1985, Theorem 8.41)), as the random walk has negative drift, we have
where and .
By letting and , we see
Thus, we have
Theorem 5.
The following corollary gives the second-order exponential approximation for as .
Corollary 2. As ,
Proof. By taking , we have as ,
As , . Thus, we have
To study the distribution of , we denote by the probability measure with mean and for . We first note that as ,
For a given large value of , we use Equation of Siegmund (1985) and give the following inverse Gaussian approximation with overshoot correction:
In other words, the unconditional distribution of
can be treated approximately as a mixture of inverse Gaussian distributions.
4 Isolation of changed panels and estimation of common change point
Since and are highly correlated, we shall base the isolation mainly on
. Conditional on the common change being detected, we use the corrected exponential distribution for
for for unchanged panels. The BH procedure (Benjamini and Hochberg (1995)) will be used to control the FDR, defined as the proportion of unchanged panels among all claimed changed panels. Similarly, the FNR is defined as the proportion of undiscovered true changed panels among all K true changed panels. We first calculate the p-values by
and let be the ordered sequence.
For controlled FDR , the number of isolated changed panels will be defined as
The well-known theoretical results show that the FDR under this procedure has upper bound . Based on the isolated changed panels, we can estimate the common change point from the corresponding change point estimates for .
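The isolation step can be sketched with the standard Benjamini-Hochberg step-up rule. The sketch takes the per-panel p-values as given, since their exact formula depends on the null approximation above, and it summarises the isolated panels by the median of their changepoint estimates. The function names `bh_isolate` and `common_changepoint` and the argument `alpha` are hypothetical.

```python
import numpy as np

def bh_isolate(pvalues, alpha):
    """Benjamini-Hochberg step-up rule.

    Returns the sorted indices of the panels declared changed, i.e. the
    panels with the k smallest p-values, where k is the largest j such
    that the j-th smallest p-value is at most j * alpha / n."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, n + 1) / n
    if not below.any():
        return np.array([], dtype=int)
    k = int(np.nonzero(below)[0].max()) + 1
    return np.sort(order[:k])

def common_changepoint(nu_hats, isolated):
    """Median of the per-panel changepoint estimates over the isolated
    panels, or None when no panel is isolated."""
    if len(isolated) == 0:
        return None
    return float(np.median(np.asarray(nu_hats, dtype=float)[isolated]))
```

For example, `bh_isolate([0.001, 0.5, 0.012, 0.9, 0.03], alpha=0.1)` selects panels 0, 2, and 4, and `common_changepoint` then returns the median of their three changepoint estimates.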
To show how the proposed procedure performs, we conduct several simulations and leave theoretical investigation for future consideration.
For , , , (), and the number of changed panels , Table 1 gives the simulation results for the FAR (false alarm rate), FDR, FNR, biases of the median and mean estimates of the common change point based on the changepoint estimates from the isolated changed panels, and mean number of isolated changed panels , along with the conditional average delay detection time (CADT), based on 5000 simulations. All values are calculated conditional on the change being detected .
Table 2 gives the corresponding results for () and .
Figure 1 gives the histograms of simulated FDR, FNR, , and for , , , , and conditioning on .
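For a single simulated run, the false discovery and false non-discovery proportions defined in this section can be computed as in the sketch below; averaging them over the runs in which the change is detected gives simulated FDR and FNR values. The function name and the convention of returning 0 when a denominator is empty are assumptions of this illustration.

```python
def discovery_proportions(isolated, truly_changed):
    """Return (fdp, fnp) for one run.

    fdp : share of isolated panels that are in fact unchanged.
    fnp : share of truly changed panels that were not isolated.
    Both are set to 0.0 when the corresponding denominator is empty."""
    isolated = set(isolated)
    truly_changed = set(truly_changed)
    false_discoveries = len(isolated - truly_changed)
    missed = len(truly_changed - isolated)
    fdp = false_discoveries / len(isolated) if isolated else 0.0
    fnp = missed / len(truly_changed) if truly_changed else 0.0
    return fdp, fnp
```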
Table 1. Simulation for and with
K    Measure (columns: FDR level)   0.2      0.3      0.4      0.5
1    FAR                            0.0372   0.042    0.0382   0.046
     FDR                            0.216    0.301    0.370    0.441
     FNR                            0.0397   0.0401   0.0324   0.0285
     Median bias                    0.0      1.0      3.0      5.0
     Mean bias                      1.368    4.01     6.52     9.88
     Mean no. isolated              1.481    1.817    2.299    2.935
     CADT                           60.03    59.70    59.51    60.27
5    FAR                            0.0406   0.0446   0.0422   0.0488
     FDR                            0.188    0.270    0.349    0.431
     FNR                            0.417    0.346    0.282    0.243
     Median bias                    3        2        1        0
     Mean bias                      6.6      4.6      2.9      1
     Mean no. isolated              3.83     4.99     6.38     8.04
     CADT                           32.89    32.98    33.21    33.01
10   FAR                            0.0424   0.0398   0.0434   0.0428
     FDR                            0.172    0.256    0.333    0.422
     FNR                            0.469    0.375    0.302    0.250
     Median bias                    3        2        1        0
     Mean bias                      6.46     5.0      3.20     1.6
     Mean no. isolated              6.65     8.88     11.31    14.36
     CADT                           26.31    26.10    26.40    26.50
20   FAR                            0.0432   0.0404   0.0418   0.0388
     FDR                            0.155    0.227    0.307    0.383
     FNR                            0.482    0.374    0.283    0.216
     Median bias                    3        2        1        0
     Mean bias                      6.4      4.5      2.9      1.2
     Mean no. isolated              12.47    16.7     21.5     26.5
     CADT                           21.52    21.30    21.42    21.29
30   FAR                            0.047    0.0438   0.0428   0.046
     FDR                            0.140    0.205    0.273    0.342
     FNR                            0.480    0.348    0.265    0.192
     Median bias                    3        2        1        0
     Mean bias                      6.19     4.1      2.8      1.4
     Mean no. isolated              18.34    25.0     30.9     37.57
     CADT                           18.29    18.57    18.44    18.65

Table 2. Simulation for and with
K    Measure (columns: FDR level)   0.2      0.3      0.4      0.5
1    FAR                            0.0232   0.0284   0.0272   0.0232
     FDR                            0.184    0.277    0.349    0.427
     FNR                            0.0082   0.0082   0.0047   0.0055
     Median bias                    1.0      3.0      6.0      10.0
     Mean bias                      4.58     8.37     11.68    17.20
     Mean no. isolated              1.468    1.853    2.299    2.960
     CADT                           75.07    75.21    74.47    74.94
5    FAR                            0.0244   0.0244   0.0278   0.0282
     FDR                            0.185    0.268    0.351    0.434
     FNR                            0.265    0.214    0.176    0.140
     Median bias                    1        0        1        2
     Mean bias                      2.7      1.0      0.9      3.9
     Mean no. isolated              4.79     5.91     7.21     8.99
     CADT                           42.84    42.96    42.95    42.90
10   FAR                            0.0292   0.0224   0.0212   0.0242
     FDR                            0.172    0.256    0.338    0.427
     FNR                            0.296    0.224    0.176    0.139
     Median bias                    1        0        0        1.5
     Mean bias                      2.9      1.6      0.3      2.6
     Mean no. isolated              8.78     10.94    13.3     16.35
     CADT                           35.47    35.37    35.40    35.46
20   FAR                            0.0234   0.0280   0.0234   0.0244
     FDR                            0.158    0.234    0.314    0.388
     FNR                            0.280    0.211    0.158    0.115
     Median bias                    1        0.5      0        1
     Mean bias                      2.4      1.3      0.1      1.6
     Mean no. isolated              17.34    21.04    25.2     29.9
     CADT                           29.55    29.41    29.41    29.54
30   FAR                            0.0218   0.0242   0.0248   0.0252
     FDR                            0.139    0.207    0.274    0.344
     FNR                            0.271    0.193    0.138    0.102
     Median bias                    1        0        0        1
     Mean bias                      2.1      1.2      0.1      0.98
     Mean no. isolated              25.63    30.90    36.17    41.80
     CADT                           26.38    26.52    26.59    26.48

By looking at the simulation results in Tables 1 and 2, we have several important findings.
(i) The FDRs are not significantly different between and and decrease as K increases;
(ii) The FNRs are not significantly different when K changes for fixed and decrease when increases;
(iii) The simulated FDRs are very close to the theoretical upper bound ;
(iv) The FNRs decrease as increases and the two are roughly balanced around . So or 0.3 is recommended;
(v) The median estimate of the common change point is preferred, as its bias is smaller than that of the mean estimate;
(vi) The number of isolated panels increases as increases and roughly equals the true K at .
However, the post-change mean is rarely known, and we typically select as the minimum magnitude to detect. So we also run a simulation study for and 2.0. Table 3 gives the results for both and with and 0.3. Additional findings are:
(vii) The FDRs are roughly the same when changes, while the FNRs are reduced more significantly for as increases. From this point of view, is preferred if stronger signals are expected.
(viii) However, as increases, the bias of the common change point estimate becomes more negative, similar to Table 2.1 in Wu (2005, p. 40).
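Table 3. Simulation for unknown
Measure (columns: post-change mean)   0.5      1.0      1.5      2.0
Setting of Table 1, K = 10, FDR level 0.3
  FAR                                 0.0398   0.0392   0.0420   0.0432
  FDR                                 0.256    0.254    0.261    0.259
  FNR                                 0.375    0.152    0.074    0.040
  Median bias                         2        4        5        5.5
  Mean bias                           5.0      6.42     6.74     6.88
  Mean no. isolated                   8.88     11.86    13.01    13.40
  CADT                                26.5     12.14    8.06     6.08
Setting of Table 1, K = 10, FDR level 0.2
  FAR                                 0.0424   0.0442   0.0414   0.0368
  FDR                                 0.172    0.173    0.172    0.172
  FNR                                 0.469    0.226    0.128    0.080
  Median bias                         3        4.5      5        4.5
  Mean bias                           6.46     7.28     6.89     6.71
  Mean no. isolated                   6.65     9.63     10.77    11.38
  CADT                                26.31    12.21    8.05     6.07
Setting of Table 2, K = 10, FDR level 0.3
  FAR                                 0.0224   0.0278   0.0254   0.0230
  FDR                                 0.256    0.258    0.265    0.268
  FNR                                 0.224    0.048    0.0167   0.007
  Median bias                         0        3        4.5      5
  Mean bias                           1.6      4.4      5.5      5.9
  Mean no. isolated                   10.94    13.3     13.9     14.01
  CADT                                35.37    15.95    10.42    7.82
Setting of Table 2, K = 10, FDR level 0.2
  FAR                                 0.0292   0.0258   0.0262   0.0294
  FDR                                 0.172    0.172    0.176    0.178
  FNR                                 0.296    0.084    0.035    0.015
  Median bias                         1        3        4        4
  Mean bias                           2.9      4.8      5.2      5.4
  Mean no. isolated                   8.78     11.33    11.97    12.24
  CADT                                35.47    15.93    10.41    7.82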

5 Conclusion
In this paper, we proposed a BH procedure to control the FDR after a common change is detected in a multi-panel data stream. The method uses only partial information available from each individual CUSUM process and is shown to perform quite well. To reduce the FDR for isolating changed panels and estimating the common change point, supplementary runs are necessary on the isolated changed panels. A simple method is to run one-sided truncated sequential tests to find the true changed panels, as discussed in Wu (2018). Further results on sequential multiple tests controlling the FDR can also be used in the supplementary runs; see Bartroff (2018), De and Baron (2015), and Song and Fellouris (2019). As discussed in Wu (2019), we may also use the adaptive combined SR-CUSUM procedure, which can eliminate large biases in the common change point estimate when the post-change means are unknown. The results will be presented in future communications.
6 Appendix
6.1 Proof of Theorem 1
where in the second equation from last, we use the fact
The last two terms are evaluated by using Equation (3.14) of Siegmund (1985):
where and are standard normal cdf and pdf.
6.2 Proof of Theorem 2
6.3 Proof of Theorem 3
First, we note and . Second, by using the property of inverse Gaussian distribution,
Also,
(ii) is proved by combining the above results.
Acknowledgement.
This research is partially supported by a RSCA grant from California State University at Stanislaus.
References
Bartroff, J., 2018. Multiple hypothesis tests controlling generalized error rates for sequential data. Statistica Sinica 28, 363–398.
Benjamini, Y., Hochberg, Y., 1995. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. (B) 57, 289–300.
Chan, H. P., 2017. Optimal sequential detection in multistream data. Annals of Statistics 45(6), 2736–2763.
De, S. K., Baron, M., 2015. Sequential tests controlling generalized familywise error rates. Statistical Methodology 23, 88–102.
Mei, Y., 2010. Efficient scalable schemes for monitoring a large number of data streams. Biometrika 97, 419–433.
Pollak, M., 1987. Average run lengths of an optimal method for detecting a change in distribution. Annals of Statistics 15, 749–779.
Siegmund, D., 1985. Sequential Analysis: Tests and Confidence Intervals. Springer, New York.
Song, Y., Fellouris, G., 2019. Sequential multiple testing with generalized error control: An asymptotic optimality theory. Annals of Statistics 47(3), 1776–1803.
Tartakovsky, A. G., Veeravalli, V. V., 2008. Asymptotically optimal quickest change detection in distributed sensor systems. Sequential Analysis 27, 441–475.
Wu, Y., 2005. Inference for Change Point and Post Change Means After a CUSUM Test. Lecture Notes in Statistics 180, Springer, New York.
Wu, Y., 2018. Supplementary score test for sparse signals in large-scale truncated sequential tests. Journal of Statistical Theory and Practice 12(4), 744–756.
Wu, Y., 2019. A combined SR-CUSUM procedure for detecting common changes in panel data. Communications in Statistics - Theory and Methods 48(17), 4302–4319.
Xie, Y., Siegmund, D., 2013. Sequential multi-sensor change-point detection. Annals of Statistics 41, 670–692.