# Bayesian Methods for Multiple Change-Point Detection with Reduced Communication

In many modern applications, large-scale sensor networks are used to perform statistical inference tasks. In this paper, we propose Bayesian methods for multiple change-point detection using a sensor network in which a fusion center (FC) can receive a data stream from each sensor. Due to communication limitations, the FC monitors only a subset of the sensors at each time slot. Since the number of change points can be high, we adopt the false discovery rate (FDR) criterion for controlling the rate of false alarms, while minimizing the average detection delay (ADD). We propose two Bayesian detection procedures that handle the communication limitations by monitoring the subset of the sensors with the highest posterior probabilities of change points having occurred. This monitoring policy aims to minimize the delay between the occurrence of each change point and its declaration using the corresponding posterior probabilities. One of the proposed procedures is more conservative than the second one in terms of having lower FDR at the expense of higher ADD. It is analytically shown that both procedures control the FDR under a specified tolerated level and are also scalable in the sense that they attain an ADD that does not increase asymptotically with the number of sensors. In addition, it is demonstrated that the proposed detection procedures are useful for trading off between reduced ADD and reduced average number of observations drawn until discovery. Numerical simulations are conducted for validating the analytical results and for demonstrating the properties of the proposed procedures.



## I Introduction

Large-scale sensor networks are prominent new tools in various applications, e.g. Internet of Things (IoT), cyber-physical systems such as power grids, environmental monitoring, and wireless communication. These sensor networks can be used to perform statistical inference tasks [1, 2, 3, 4]. An important statistical inference problem is sequential change-point detection [5, 6, 7, 8, 9, 10, 11, 12, 13] in which one is interested in detecting a rapid change in the underlying probability model, anomaly or adversarial activity as quickly as possible subject to a false positive constraint. Sensor networks, where each sensor observes a different data stream and communicates with a fusion center (FC) or cloud, can be deployed to detect multiple change points in a monitored environment.
Multiple change-point detection is closely related to multiple hypothesis testing. A widely-used performance criterion in multiple hypothesis testing is the false discovery rate (FDR), which is the expected proportion of false discoveries among all discoveries [14, 15, 16]. FDR control for multiple change-point detection has been considered in [3] and [4, 17] in the deterministic and Bayesian frameworks, respectively. These works assumed that all data streams are observed in parallel, which may not be feasible in large-scale sensor networks used in IoT. In the context of change-point detection, a Type I error (false positive) occurs if the detection procedure declares a change before the true change actually happens. In general, one would be interested in detecting the change point with minimum possible delay, while controlling the Type I error rate [6, 12]. In the Bayesian framework, the posterior probability of a change point having occurred, or some variation of it, is a commonly used test statistic [12, 11].
Several works have considered discrete time single change-point detection in which only a part of the observations is available. In [18], Bayesian change-point detection was considered by monitoring only a minimal number of sensors at each time slot, where the change detection problem was modeled as a Markov decision process. A Bayesian method to minimize the average detection delay (ADD) subject to constraints on both the probability of false alarm and the observation cost was proposed in [19], where an on-off observation control policy was selected along with the stopping time at which the change is declared. Deterministic versions of this work were developed in [20, 21, 22] under different settings. Deterministic change-point detection for high-dimensional data with missing elements was considered in [23]. In [24] and [25], quickest change detection problems with sampling right constraints were considered in the deterministic and the Bayesian frameworks, respectively. Quickest deterministic change-point detection with observation scheduling was considered in [26], where the decision maker chooses one of two different sequences of observations at each time slot. In [27], deterministic change-point detection in a sensor network with communication rate constraints was studied and adaptive censoring strategies were developed for the sensors. Quickest deterministic change-point detection over multiple data streams was considered in [28], where the observer can only observe one data stream at each time slot.
In this paper, we consider the problem of rapidly detecting change points in multiple data streams [3, 4]. In particular, an FC receives statistically independent data streams from multiple sensors in a large-scale sensor network. Due to communication limitations, at a given time slot the FC monitors only a subset of the active data streams for which change points have not been declared yet. The subset size has a fixed proportion with respect to (w.r.t.) the number of active data streams. We assume that each data stream has an associated random change point.
The contributions of this paper are:

1. A Bayesian sequential procedure, named the sequential maximum a-posteriori probability (S-MAP) procedure, is proposed. This procedure detects the change points in all of the data streams, while controlling the FDR. The proposed procedure is based on sequentially updating the sensors’ posterior probabilities of change points having occurred. Then, at each time slot we choose to monitor a subset of the sensors with the highest posterior probabilities within the allowed proportion. This approach aims to minimize the time between change point occurrence and its declaration by monitoring the sensors for which change-point occurrence is most probable given the data. The S-MAP procedure uses the same Type I error constraints as in [4] and extends this work to communication constrained scenarios. The FDR control of the S-MAP procedure is established using analytical tools.

2. We develop an improved S-MAP (IS-MAP) procedure that is less conservative than the S-MAP procedure in the sense that it has a lower ADD but higher FDR than the S-MAP procedure. The decrease of the ADD is obtained by reducing the detection threshold values of the IS-MAP procedure compared to the S-MAP procedure. It is proved analytically that the FDR of the IS-MAP procedure is still controlled under the desired level despite its lower detection threshold values.

3. The asymptotic ADD behavior of the S-MAP and the IS-MAP procedures is established analytically for geometric prior distribution of the change points. It is shown that for any proportion value, both detection procedures are scalable in the sense that their asymptotic ADD does not increase with the number of data streams. In addition, the asymptotic ADD improvement that is obtained by using the IS-MAP procedure in comparison to the S-MAP procedure is characterized quantitatively.

4. We conduct simulations in order to evaluate the performance and to verify the established theoretical properties of the S-MAP and the IS-MAP procedures.

5. The S-MAP and the IS-MAP procedures are used for investigating the tradeoff between reducing the ADD and reducing the average number of observations (ANO) drawn until change points are declared. The proposed analysis can be useful for developing distributed statistical inference procedures using large-scale sensor networks in limited communication capability scenarios.

Preliminary results of this paper appear in our conference paper [29] and a deterministic version of the proposed detection methods appears in [30]. The remainder of this paper is organized as follows. In Section II, we formulate the Bayesian multiple change-point detection problem. The S-MAP and the IS-MAP procedures are derived in Sections III and IV, respectively, and their FDR control property is proved. Asymptotic ADD analysis of the S-MAP and the IS-MAP procedures is conducted in Section V. Our simulations and conclusions appear in Sections VI and VII, respectively.

## II Bayesian problem formulation

We consider $K$ statistically independent discrete time data streams, indexed by $k \in [K]$, where $X_n^{(k)}$ denotes the observation of the $k$th data stream at time slot $n$. For the $k$th data stream there is a random change point, $t^{(k)}$, where the prior distribution of each change point is assumed to be known. Commonly, a geometric distribution is assumed as a prior for discrete time change-point detection [11, 31]. The change points are assumed to be independent and identically distributed (i.i.d.) among the data streams. For the $k$th data stream, given its change point, $t^{(k)}$, we assume that the pre-change observations are i.i.d. with known probability density $f_0$ and the post-change observations are i.i.d. with known probability density $f_1$. Due to communication limitations, at a given time slot we choose a subset of data streams to observe among the active data streams. Let $K_n$ denote the number of active sensors at time slot $n$. We set a fixed proportion value $q$ and observe $\lceil qK_n \rceil$ of the active data streams, where $\lceil \cdot \rceil$ is the ceiling operator. The data vectors that are sequentially observed by the FC consist of the observations from the sensors in $s_n$, where $s_n$ is the subset of sensor indices that are monitored at time slot $n$. The filtration at time slot $n$, denoted by $\mathcal{F}_n$, is the $\sigma$-algebra generated by the data vectors observed up to time slot $n$. The filtration of all the data is defined analogously. For $k \in [K]$, the event $\{t^{(k)} \le n\}$ stands for the case that the change in the $k$th data stream has taken place before or at time slot $n$. We define the posterior probability of the event $\{t^{(k)} \le n\}$ using the observations up to time slot $n$ as

$$\pi_n^{(k)} \triangleq P\left(t^{(k)} \le n \mid \mathcal{F}_n\right), \quad n = 1, 2, \ldots, \tag{1}$$

and . We also define the likelihood ratio (LR),

$$L(X) \triangleq \frac{f_1(X)}{f_0(X)}, \tag{2}$$

and denote the Kullback-Leibler divergence of

and as .
Under the assumption of i.i.d. change points, by using Bayes’ rule we can recursively compute $\pi_n^{(k)}$ as follows:

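$$\pi_n^{(k)} = \begin{cases} \dfrac{L(X_n^{(k)})\left(\pi_{n-1}^{(k)} + \rho_n (1 - \pi_{n-1}^{(k)})\right)}{L(X_n^{(k)})\left(\pi_{n-1}^{(k)} + \rho_n (1 - \pi_{n-1}^{(k)})\right) + (1 - \rho_n)(1 - \pi_{n-1}^{(k)})}, & k \in s_n \\[2ex] \pi_{n-1}^{(k)} + \rho_n (1 - \pi_{n-1}^{(k)}), & k \notin s_n \end{cases} \tag{3}$$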

where $\rho_n$ depends on the prior distribution of the change point. In [12, Eq. (4.2)], a related statistic is considered instead of $\pi_n^{(k)}$ and a corresponding recursive update formula is presented for single change-point detection under a general prior distribution. In case $k \in s_n$, an observation is received from sensor $k$ at time slot $n$ and $\pi_n^{(k)}$ is computed using the observations received before time slot $n$, the prior distribution of $t^{(k)}$, and the new observation $X_n^{(k)}$. The posterior update for the case $k \notin s_n$ corresponds to the case in which at time slot $n$ we do not receive an observation from sensor $k$. In this case, $\pi_n^{(k)}$ is computed using only the observations received before time slot $n$ and the prior distribution of $t^{(k)}$. It is shown in [25] that under mild conditions, the $k$th sensor posterior probability is a sufficient statistic for evaluating the $k$th stopping rule ADD and Type I error probability.
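The two cases of the recursion in (3) can be sketched as a single update function. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the Gaussian mean-shift likelihood ratio (unit variance, pre-change mean 0, post-change mean `mu`) is an assumption chosen only to make the example concrete.

```python
import math


def gaussian_lr(x, mu):
    """LR f1(x)/f0(x) for N(mu, 1) against N(0, 1) (assumed model for this sketch)."""
    return math.exp(mu * x - 0.5 * mu ** 2)


def update_posterior(pi_prev, rho_n, lr=None):
    """One step of the recursion in (3).

    pi_prev: posterior probability at the previous time slot.
    rho_n:   prior-dependent term appearing in (3).
    lr:      likelihood ratio of the new observation, or None when the
             sensor is not monitored at this time slot.
    """
    prior_step = pi_prev + rho_n * (1.0 - pi_prev)
    if lr is None:
        # Sensor not in the monitored subset: prior-only update.
        return prior_step
    num = lr * prior_step
    return num / (num + (1.0 - rho_n) * (1.0 - pi_prev))


# Example: one unobserved step followed by one observed step.
pi = update_posterior(0.0, 0.1)
pi = update_posterior(pi, 0.1, gaussian_lr(1.3, mu=1.0))
```

Note that a likelihood ratio equal to 1 reproduces the unobserved update, since the denominator then sums to 1.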
In the considered problem, we have to define multiple stopping rules $T^{(k)}$, $k \in [K]$, one for each data stream, where each stopping event is measurable w.r.t. the filtration of the observed data. We define

$$\mathrm{FDR} \triangleq E\left[\frac{V}{\max\{R, 1\}}\right], \tag{4}$$

where $E[\cdot]$ stands for the expectation. The term $V$ is the number of false discoveries, i.e. the number of data streams $k \in [K]$ for which a change is declared before the true change point, $T^{(k)} < t^{(k)}$. The term $R$ denotes the number of change points declared, i.e. the number of data streams $k \in [K]$ for which a change point has been declared. We would like to control the FDR s.t. it will be no higher than a predefined tolerated level $\alpha$. The ADD for the $k$th data stream is defined as

Since we consider multiple statistically independent data streams, we define the overall ADD as

Assume that at time slot $n$, we have $K_n$ active data streams. Then, we observe $\lceil qK_n \rceil$ of them. We define

$$\mathrm{ANO} \triangleq E\left[\frac{1}{K} \sum_{n=1}^{\sup_{k \in [K]}\{T^{(k)}\}} \lceil qK_n \rceil\right]. \tag{7}$$

The ANO definition extends the definition from [19], which is defined for single change-point detection, i.e. $K = 1$. A difference between the definitions is that the ANO from [19] does not consider the observations drawn after the change point occurs, while the ANO definition in (7) takes into account all the observations drawn until change points are declared. This choice allows a proper evaluation of the communication burden caused by transmissions of data streams from the sensors to the FC. In the following section, we propose the S-MAP procedure, which is a Bayesian multiple change-point detection procedure that controls the FDR under the limitation on the proportion of sensors communicating their data streams to the FC.

## III S-MAP detection procedure

In this section, we propose a Bayesian detection procedure that is tasked to eventually discover all the random change points that occur in the monitored environment. At a given time slot, we consider each sensor individually and evaluate its posterior probability from (1) using the recursive formula from (3). At time slot $n$, we have $K_n$ active data streams of which we observe only a subset of size $\lceil qK_n \rceil$. The developed S-MAP procedure extends the method in [4] by proposing a rule for choosing the subset of data streams to observe. In the S-MAP procedure, we use the posterior probability from (1) as a test statistic, rather than the test statistic from [13, 4], which is based on a Bayesian version of the LR. The test statistic from [13, 4] is used under a very strong global false alarm probability constraint [13] that may be too conservative in terms of FDR control. Under the communication limitations, among the active data streams, we choose to observe the data streams with the highest posterior probabilities of a change point having occurred. The motivation for the S-MAP approach is that we are interested in minimizing the time between the occurrence of a change point and its declaration using the sequentially updated posterior probabilities. The S-MAP procedure that monitors all of the active data streams, i.e. with $q = 1$, is denoted as the parallel procedure. In the following, we describe the proposed S-MAP procedure.
We construct a descending set of thresholds $Q_r$, $r \in [K]$, s.t. the detection on the $k$th data stream that samples until its posterior probability reaches $Q_r$ has a Type I error probability that is smaller than or equal to $r\alpha/K$, where $\alpha$ is the predefined FDR tolerance level. Formally,

 P(∃n

According to [12] and [31, p. 225], the choice

$$Q_r = 1 - \frac{r\alpha}{K} \tag{9}$$

ensures that (8) is satisfied. The proposed detection procedure is divided into sampling stages. Each sampling stage may take several time slots. In the beginning of a sampling stage, we gather all the active data streams and obtain observations from a subset of them, according to the S-MAP approach. This process is repeated at each time slot sequentially, until at least one active data stream posterior probability exceeds its corresponding threshold. Then, we declare changes for some of the active data streams, which are then eliminated from the active data streams set.
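As a small illustration of the threshold choice in (9), the following sketch tabulates the descending S-MAP thresholds for given values of the number of data streams and the FDR level. The function name is hypothetical and the numerical values are chosen only for the example.

```python
def smap_thresholds(K, alpha):
    """Thresholds Q_r = 1 - r * alpha / K for r = 1, ..., K, as in (9)."""
    return [1.0 - r * alpha / K for r in range(1, K + 1)]


# Example values (illustrative only): K = 4 data streams, alpha = 0.2.
thresholds = smap_thresholds(4, 0.2)
# The list is strictly decreasing and its last entry equals 1 - alpha.
```

The largest threshold corresponds to $r = 1$ and the smallest, $1 - \alpha$, to $r = K$.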
Let denote the set of indices of active data streams with cardinality at the beginning of the th sampling stage and let denote the time slot at the end of the th sampling stage. Note that and . The th stage of sampling is described as follows:

1. Sample the $\lceil qK_n \rceil$ active data streams with the currently highest posterior probabilities.

2. Update the posterior probabilities of the sensors with active data streams using (3).

3. Sort the updated posterior probabilities in ascending order as , where denotes the index of the th ordered posterior probability at time slot .

4. Repeat this process until time slot in which at least one of the posterior probabilities is higher than its corresponding threshold, i.e. .

5. Declare change points for the data streams , where and remove these data streams from the set of active data streams.

6. Update to be the set of indices of the remaining active data streams. Stop the procedure if .

In the following theorem, we show that the FDR of the S-MAP procedure remains under the upper bound constraint $\alpha$.

###### Theorem 1.

For upper bound constraint $\alpha$, the S-MAP procedure satisfies

$$\mathrm{FDR} \le \alpha. \tag{10}$$

###### Proof.

Recall that we choose the thresholds from (9), s.t. (8) is satisfied. Thus, by following the lines of the FDR control proofs in [3, 4], we obtain that the FDR is controlled by the proposed S-MAP procedure under the upper bound constraint $\alpha$. ∎

In the following section, we propose an alternative detection procedure that is less conservative than the S-MAP procedure in terms of FDR control. Therefore, the proposed alternative procedure has improved performance in terms of ADD and ANO compared to the S-MAP procedure.

## IV Improving the S-MAP procedure

In order to guarantee FDR control, the S-MAP procedure uses the false alarm constraints from (8), which are the same false alarm constraints as in [4]. However, we show in this section that the false alarm constraints from [4] may be too conservative and the corresponding posterior probability threshold values may be too high. We propose the IS-MAP detection procedure, which is similar to the S-MAP procedure except that its threshold values are lower than the thresholds of the S-MAP procedure. Since the IS-MAP procedure uses lower threshold values, for a fixed proportion, $q$, the ADD and ANO will decrease compared to the S-MAP procedure, i.e. the ADD and ANO performance will improve. Moreover, we prove that with the lower thresholds we can still control the FDR under the desired level, $\alpha$. In the IS-MAP procedure, we construct a set of thresholds $Q_k$, $k \in [K]$, s.t. the detection on the $k$th data stream that samples until its posterior probability reaches $Q_k$ has an individual Type I error probability that is smaller than or equal to $\alpha$, where $\alpha$ is the predefined FDR tolerated level. Formally,

 P(∃n

According to [12] and [31, p. 225], the choice

$$Q_k = Q = 1 - \alpha, \quad \forall k \in [K], \tag{12}$$

ensures that (11) is satisfied. Since the thresholds of the IS-MAP procedure are all equal to , its th sampling stage can be written in a more compact form than the corresponding sampling stage of the S-MAP procedure. Let denote the set of indices of active data streams with cardinality at the beginning of the th sampling stage and let denote the time slot at the end of the th sampling stage. The th stage of sampling is described as follows:

1. Sample the $\lceil qK_n \rceil$ active data streams with the highest posterior probabilities.

2. Update the posterior probabilities of the sensors with active data streams using (3).

3. Repeat this process until the first time slot in which at least one of the posterior probabilities is higher than or equal to the threshold $Q$.

4. Declare change points for all the active data streams whose posterior probabilities are higher than or equal to $Q$ and remove these data streams from the set of active data streams.

5. Update the set of indices of the remaining active data streams. Stop the procedure if no active data streams remain.
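The sampling stages listed above can be sketched as a short simulation. This is an illustrative sketch under stated assumptions rather than the authors' code: it assumes a geometric prior with parameter `rho` (so that the update uses a constant prior term), initial posteriors equal to 0, a Gaussian mean shift from N(0, 1) to N(mu, 1), and that the observation at the change point itself is already post-change. The function name, its parameters, and the default values are hypothetical.

```python
import math
import numpy as np


def is_map(K=50, q=0.3, alpha=0.05, rho=0.1, mu=1.0, seed=0, max_slots=100_000):
    rng = np.random.default_rng(seed)
    t = rng.geometric(rho, size=K)      # true change points, support {1, 2, ...}
    pi = np.zeros(K)                    # initial posteriors (assumed to be 0)
    T = np.zeros(K, dtype=int)          # declaration times, 0 = not declared yet
    active = np.arange(K)
    threshold = 1.0 - alpha             # common IS-MAP threshold Q = 1 - alpha
    n_obs = 0
    n = 0
    while active.size > 0 and n < max_slots:
        n += 1
        # Step 1: monitor the ceil(q * K_n) active streams with highest posteriors.
        m = math.ceil(q * active.size - 1e-12)   # small slack against float error
        monitored = active[np.argsort(-pi[active], kind="stable")[:m]]
        n_obs += m
        # Step 2: update the posteriors of all active streams.
        prev = pi.copy()
        step = prev + rho * (1.0 - prev)         # prior-only update
        pi[active] = step[active]
        x = rng.normal(np.where(n >= t[monitored], mu, 0.0), 1.0)
        lr = np.exp(mu * x - 0.5 * mu ** 2)      # Gaussian mean-shift LR
        num = lr * step[monitored]
        pi[monitored] = num / (num + (1.0 - rho) * (1.0 - prev[monitored]))
        # Steps 3-5: declare every active stream at or above the threshold.
        crossed = pi[active] >= threshold
        T[active[crossed]] = n
        active = active[~crossed]
    V = int(np.sum((T > 0) & (T < t)))           # false discoveries in this run
    R = int(np.sum(T > 0))                       # total discoveries in this run
    return {"t": t, "T": T, "fdp": V / max(R, 1), "n_obs": n_obs}


result = is_map(K=20, q=0.25, seed=1)
```

Averaging `fdp` over many independent runs gives an empirical estimate of the FDR, and `n_obs / K` averaged over runs gives an empirical counterpart of the ANO.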

In the following theorem, we show that the FDR of the IS-MAP procedure satisfies the desired upper bound constraint.

###### Theorem 2.

For upper bound constraint $\alpha$, the IS-MAP procedure satisfies

$$\mathrm{FDR} \le \alpha. \tag{13}$$

###### Proof.

The proof is given in Appendix A. ∎

As mentioned previously, the proposed IS-MAP procedure is similar to the S-MAP procedure from Section III, except that the procedures use different thresholds in order to guarantee the FDR control. Since the threshold of the IS-MAP procedure in (12) is no larger than any of the thresholds of the S-MAP procedure in (9), with equality only for $r = K$, for a fixed proportion, $q$, the IS-MAP procedure will have a lower ADD and ANO than the S-MAP procedure, while the FDR of the IS-MAP procedure will be higher than the S-MAP FDR. It should be noted that in case of model uncertainty, FDR control is not guaranteed for the S-MAP and the IS-MAP procedures. Thus, depending on the application, if ADD and ANO are more significant than FDR, the IS-MAP procedure should be implemented rather than the S-MAP procedure, while if FDR is more significant than ADD and ANO, the S-MAP procedure may be preferred. In the following section, we analyze the asymptotic ADD behavior of the S-MAP and the IS-MAP procedures under the assumption of a geometric prior distribution for the change points.

## V ADD analysis of the S-MAP and the IS-MAP procedures

In this section, we derive asymptotic lower and upper bounds on the ADD of the S-MAP and the IS-MAP procedures for a fixed number of data streams, $K$. Then, we characterize the behavior of these bounds as $K \to \infty$. For simplicity of the analysis, we assume that the prior distribution of each change point obeys a geometric distribution with common parameter $\rho$, i.e.

$$P\left(t^{(k)} = m\right) = \rho (1 - \rho)^{m-1}, \quad \forall m = 1, 2, \ldots, \ \forall k \in [K]. \tag{14}$$

The geometric prior distribution is commonly assumed in change-point detection problems. This is a memoryless distribution that is both mathematically convenient and provides a reasonable model in practical applications [11, 31]. Under the assumption of i.i.d. change points with geometric priors, it is shown in [19, 25] that the posterior probability of the $k$th sensor evolves in a sequential manner via the recursion

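$$\pi_n^{(k)} = \begin{cases} \dfrac{L(X_n^{(k)})\left(\pi_{n-1}^{(k)} + \rho (1 - \pi_{n-1}^{(k)})\right)}{L(X_n^{(k)})\left(\pi_{n-1}^{(k)} + \rho (1 - \pi_{n-1}^{(k)})\right) + (1 - \rho)(1 - \pi_{n-1}^{(k)})}, & k \in s_n \\[2ex] \pi_{n-1}^{(k)} + \rho (1 - \pi_{n-1}^{(k)}), & k \notin s_n \end{cases} \tag{15}$$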

It can be seen that the recursive formula in (15) is obtained by substituting $\rho_n = \rho$ in (3).
Under communication limitations, the FC observes a subsequence of the complete observation sequence from each sensor. According to the maximum a-posteriori probability (MAP) approach, the indices of the monitored observations are random and determined online based on the proportion, $q$, and the posterior probability values of the active sensors at each time slot. Therefore, it is difficult to characterize the subsequence of observations obtained from each sensor. In order to obtain asymptotic bounds on the ADD of the S-MAP and the IS-MAP procedures, we begin by considering a single change-point detection with the posterior update from (15). Thus, we consider a single observation sequence with change point $t$ and a stopping rule of the form

$$T = \inf\{n \in \mathbb{N} : \pi_n \ge 1 - \eta\}, \quad \eta \in (0, 1). \tag{16}$$

We assume that only a subsequence of the complete observation sequence is obtained. It is shown in [25] that for any subsequence of observations, the ADD of the stopping rule in the form of (16) as satisfies

and

where as . The asymptotic ADD lower bound from (17) is attained when the complete observation sequence is available. The asymptotic ADD upper bound from (18) is attained when we do not take observations and the stopping rule is based only on the prior.
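The no-observation case mentioned above can be worked out directly from (15) and (16). With no observations, (15) reduces to a prior-only update, so the complement of the posterior shrinks by a factor of (1 - rho) at each time slot. Assuming the initial posterior is 0 (an assumption of this sketch), the stopping rule in (16) then stops at a deterministic time, the smallest n with (1 - rho)^n <= eta. The following sketch, with hypothetical function names, compares the iterated recursion with this closed form.

```python
import math


def prior_only_stopping_time(rho, eta):
    """Iterate the unobserved case of (15) until the rule in (16) stops."""
    pi, n = 0.0, 0                      # initial posterior assumed to be 0
    while pi < 1.0 - eta:
        pi = pi + rho * (1.0 - pi)      # prior-only posterior update
        n += 1
    return n


def prior_only_closed_form(rho, eta):
    """Smallest n with (1 - rho)^n <= eta."""
    return math.ceil(math.log(eta) / math.log(1.0 - rho))


n_iter = prior_only_stopping_time(0.1, 0.05)
n_closed = prior_only_closed_form(0.1, 0.05)
```

For these illustrative values both computations give 29 time slots. The stopping time grows like the absolute value of log(eta) divided by the absolute value of log(1 - rho) as eta decreases.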
In the following theorem, using (17)-(18) we derive asymptotic lower and upper bounds on the ADDs of the S-MAP and the IS-MAP procedures as . These ADD bounds do not require any assumptions on the subsequence of observations obtained from each sensor.

###### Theorem 3.

For and any proportion of observed sensors, , we obtain

and

###### Proof.

The proof is given in Appendix B. ∎

For the ADD of the stopping rule, , from (16) we can derive a tighter upper bound than (18) under some assumptions on the subsequence of observations obtained for the detection. Let us denote by the subsequence of the complete observation sequence, where and are the integer time slots in which observations are obtained for the detection of the single change point, , using the stopping rule, . Equivalently, we sample the complete observation sequence with intervals

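$$\zeta_n \triangleq V_n - V_{n-1} \ge 1, \quad n \in \mathbb{N}. \tag{23}$$

In addition, we define

$$\zeta(N) \triangleq \frac{1}{N} \sum_{n=1}^{N} \zeta_n = \frac{V_N}{N}, \tag{24}$$

which is the average length of the intervals in which we sample observations from the observation sequence, the stopping rule,

$$\Gamma \triangleq \inf\{n \in \mathbb{N} : \pi_{V_n} \ge 1 - \eta\}, \tag{25}$$

and the random change point,

$$\gamma \triangleq \inf\{n \in \mathbb{N} : V_n \ge t\}. \tag{26}$$

The stopping rule and change point from (25) and (26), respectively, represent the case in which we only count time slots where observations are obtained. The time slots, $V_n$, and intervals, $\zeta_n$, may be unknown. For the derivation of a tighter asymptotic upper bound on the ADD of the stopping rule, $T$, we only assume that the intervals are bounded, i.e. there exists $B$ s.t.

$$\zeta_n \le B, \quad \forall n \in \mathbb{N}, \tag{27}$$

there exists $\zeta$ s.t.

$$\lim_{N \to \infty} \zeta(N) = \zeta, \tag{28}$$

and

$$E\left[\zeta(\Gamma) \max\{0, \Gamma - \gamma\}\right] = \zeta E\left[\max\{0, \Gamma - \gamma\}\right]\left(1 + o_\eta(1)\right). \tag{29}$$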

From (23)-(24), , , , and the specific value of may be unknown. The assumption in (29) essentially requires that as . In the following proposition, we derive an asymptotic ADD upper bound for the stopping rule, , which is tighter than (18).

###### Proposition 4.

Assume that (27)-(29) are satisfied. Then, as the ADD of the stopping rule from (16) satisfies

###### Proof.

The proof is given in Appendix C. ∎

It should be noted that a special case of (30) with , was proved in [25].
Assume that each stopping rule in the S-MAP procedure satisfies the ADD upper bound in (30) with and that each stopping rule in the IS-MAP procedure satisfies the ADD upper bound in (30) with , . In addition, assume that . Then, in a similar manner to the derivation of the upper bounds in (20) and (22), we obtain tighter asymptotic ADD upper bounds for the S-MAP and the IS-MAP procedures, given by

and

respectively.
In (19), (20), and (31) and in (21), (22), and (32), we obtained asymptotic ADD bounds for the S-MAP and the IS-MAP procedures, respectively. For any fixed proportion, , of observed data streams and for sufficiently small these bounds hold. We characterize the behavior of these bounds as increases towards in order to investigate the scalability of the S-MAP and the IS-MAP procedures, as the number of data streams increases. Let

denote the asymptotic ADD lower bound for both the S-MAP and the IS-MAP procedures. It can be seen that this lower bound is a finite constant w.r.t. .
We denote the asymptotic ADD upper bounds for the S-MAP procedure as

and

Consider the sequence . Using [32, Eq. (5)] and Stirling’s approximation (see e.g. [32, 33]) and applying some algebraic manipulations, it can be verified that this sequence is monotonically increasing and converges to . Thus, we obtain that and are monotonically increasing with and converge to a finite constant, i.e.

and

In a similar manner to (34)-(35), we denote

and

The upper bounds in (38)-(39) are finite constants w.r.t. .
The sequence is nonnegative and thus,

and

In addition, by comparing (38)-(39) to (34)-(35) as , we obtain

where the second equality is obtained by substituting (36)-(39). The results in (40)-(42) demonstrate the ADD improvement obtained by using the IS-MAP procedure instead of the S-MAP procedure.
The presented asymptotic ADD results hold for any proportion value, $q$. However, it is expected that the S-MAP ADD and the IS-MAP ADD will increase as the proportion of monitored sensors decreases. An intuitive explanation for this phenomenon is as follows: For fixed $\pi_{n-1}^{(k)}$, the posterior probability in (15) is monotonically nondecreasing with the LR, $L(X_n^{(k)})$. After a change occurs, we receive samples from $f_1$. By taking the expectation of the difference $L(X) - 1$ w.r.t. $f_1$ and using $E_{f_0}[L(X)] = 1$ and $E_{f_1}[L(X)] = E_{f_0}[L^2(X)]$, we obtain

$$E_{f_1}[L(X) - 1] = E_{f_0}\left[(L(X) - 1)^2\right] \ge 0. \tag{43}$$

The case $L(X) = 1$ corresponds to the case in which we choose not to monitor the corresponding sensor, since substituting $L(X_n^{(k)}) = 1$ in the first case of (15) yields the second case. Thus, as the number of observations increases, the threshold will usually be exceeded in an earlier time slot and consequently, the ADD will usually be lower. An advantage of observing only a small subset of sensors is that the ANO for the detection task may decrease, which reduces the communication burden. Consequently, we identify a tradeoff between the ADD and the ANO. We will investigate this tradeoff in Section VI.
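The identity in (43) follows from two elementary properties of the LR in (2). A short derivation sketch is given below, assuming that the two densities share the same support and that the integrals involved are finite (assumptions made here for the sketch):

```latex
\begin{align*}
E_{f_0}[L(X)] &= \int \frac{f_1(x)}{f_0(x)} f_0(x)\,dx = 1, \\
E_{f_1}[L(X)] &= \int \frac{f_1(x)}{f_0(x)} f_1(x)\,dx
               = \int \left(\frac{f_1(x)}{f_0(x)}\right)^{2} f_0(x)\,dx
               = E_{f_0}\left[L^2(X)\right], \\
E_{f_1}[L(X) - 1] &= E_{f_0}\left[L^2(X)\right] - 2E_{f_0}[L(X)] + 1
               = E_{f_0}\left[(L(X) - 1)^2\right] \ge 0.
\end{align*}
```

The last line uses the first property twice: subtracting 1 is the same as subtracting $2E_{f_0}[L(X)]$ and adding 1, which completes the square.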

## VI Numerical simulations

In this section, we evaluate the performance of the proposed S-MAP and IS-MAP procedures in terms of FDR, ADD, and ANO. In addition, the analytical results from Sections III-V are verified in the simulations. The simulation results are based on Monte Carlo runs. We generate the true change points independently for each sensor from a geometric distribution with parameter $\rho$ and assume that we know this parameter when applying the procedure. It should be noted that in case $\rho$ is unknown, by assuming a sufficiently low value for $\rho$, the FDR of the S-MAP and the IS-MAP procedures may still be controlled under the desired upper bound. The reason is that the posterior probabilities from (15) decrease as $\rho$ decreases. If the assumed value of $\rho$ is lower than the true value of $\rho$, the change points will usually be declared in later time slots than in the case in which the true value of $\rho$ is used. Thus, the FDR will not increase. In all cases, the same FDR upper bound, $\alpha$, is used.
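The Monte Carlo estimate of the FDR in (4) can be sketched as follows. This is an illustrative helper, not the simulation code used for the reported results: the function name is hypothetical, and the convention that a declaration time of 0 marks a stream without a declaration is an assumption of this sketch. The last lines show how true change points can be drawn from a geometric distribution; the parameter value and array sizes there are arbitrary.

```python
import numpy as np


def empirical_fdr(change_points, declarations):
    """Average of V / max(R, 1) over Monte Carlo runs.

    Both arguments have shape (runs, K). A declaration time of 0 marks a
    stream with no declaration (a convention chosen for this sketch).
    """
    t = np.asarray(change_points)
    T = np.asarray(declarations)
    declared = T > 0
    V = np.sum(declared & (T < t), axis=1)   # false discoveries per run
    R = np.sum(declared, axis=1)             # discoveries per run
    return float(np.mean(V / np.maximum(R, 1)))


# Toy example with two runs and two streams: the first run has one false
# discovery out of two declarations, the second run has none.
toy = empirical_fdr([[3, 5], [2, 4]], [[2, 6], [3, 4]])

# True change points for 1000 runs of 20 sensors, geometric with parameter 0.1.
rng = np.random.default_rng(0)
t_true = rng.geometric(0.1, size=(1000, 20))
```

In the toy example the per-run proportions are 0.5 and 0, so the estimate is 0.25.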
For comparison purposes, we implement and evaluate the performance of two additional procedures. The first procedure is a simplified version of the S-MAP procedure, which is referred to as the simple procedure. This procedure simplifies S-MAP from Section III by replacing the method of choosing the subset of sensors to monitor. In the simple procedure, at each time slot we randomly choose a subset of active sensors with consecutive indices to monitor within the allowed proportion. Following the FDR control proofs in [3, 4], it can be shown that the simple procedure controls the FDR under the predefined upper bound. This procedure is implemented in order to verify that the MAP approach for choosing the subset of sensors to monitor, as used in the S-MAP procedure, improves the ADD performance compared to randomly choosing this subset, as used in the simple procedure. The second method implemented for comparisons is the fully parallel procedure of [4], named D-FDR, that observes all the data streams. The FDR control of the D-FDR procedure is shown in [4]. In this procedure, the following test statistic is used

$$G_n^{(k)} = \sum_{m=1}^{\infty} P\left(t^{(k)} = m\right) \prod_{i=m}^{n} L\left(X_i^{(k)}\right), \quad n = 1, 2, \ldots. \tag{44}$$

This test statistic is the average LR (ALR) between the hypothesis that the change occurs at a finite time and the hypothesis that the change never occurs. This ALR test statistic is recursively updated according to the following formula:

$$G_n^{(k)} = G_{n-1}^{(k)} L\left(X_n^{(k)}\right) + P\left(t^{(k)} \ge n + 1\right)\left(1 - L\left(X_n^{(k)}\right)\right), \tag{45}$$

where $G_0^{(k)} = 1$. For $q = 1$, the D-FDR procedure is similar to the S-MAP procedure except that it uses the ALR test statistic, rather than the posterior probability test statistic, with the thresholds

$$Q_r = \frac{K}{r\alpha}, \quad r \in [K], \tag{46}$$

in order to guarantee the same false positive constraints as in (8). Assume that for the $k$th data stream, the corresponding threshold is $Q_{r_k}$, $r_k \in [K]$. It is shown in [13] that in this case, using the ALR test statistic with the threshold $Q_{r_k}$ is equivalent to using the posterior probability test statistic with the threshold

$$Q^{*}_{r_k} = 1 - P\left(t^{(k)} \ge n + 1\right) \frac{r_k \alpha}{K}. \tag{47}$$

Thus, from (9), (12), and (47), the posterior probability thresholds of the D-FDR procedure are higher than the posterior probability thresholds of the S-MAP and the IS-MAP procedures. Consequently for $q = 1$, the ADD and ANO of the S-MAP and the IS-MAP procedures will be lower than the ADD and ANO of the D-FDR procedure.
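The recursion in (45) can be checked numerically against the direct sum in (44). The sketch below does this for a geometric prior with parameter `rho`, for which the tail probability of the change point exceeding n equals (1 - rho)^n. The function names are hypothetical, and the initial value 1 for the recursion is taken from (44) with all products empty, which is an assumption of this sketch.

```python
import math


def alr_direct(lrs, rho):
    """Evaluate the sum in (44) after n = len(lrs) observations."""
    n = len(lrs)
    # Terms with m > n have empty products, so they sum to P(t >= n + 1).
    total = (1.0 - rho) ** n
    for m in range(1, n + 1):
        total += rho * (1.0 - rho) ** (m - 1) * math.prod(lrs[m - 1:])
    return total


def alr_recursive(lrs, rho):
    """Evaluate the same statistic with the recursion in (45)."""
    g = 1.0                                  # assumed initial value
    for n, lr in enumerate(lrs, start=1):
        tail = (1.0 - rho) ** n              # P(t >= n + 1) for a geometric prior
        g = g * lr + tail * (1.0 - lr)
    return g


lrs = [0.5, 2.0, 1.5, 0.8]                   # arbitrary likelihood ratio values
gap = abs(alr_direct(lrs, 0.2) - alr_recursive(lrs, 0.2))
```

The two evaluations agree up to floating-point rounding, while the recursive form needs only constant work per new observation.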
In Subsection VI-A, we consider multiple change-point detection with known Gaussian distributions, and in Subsection VI-B, we consider a general model under some uncertainty and use $p$-values [14, 15, 34, 35, 36, 37] from each sensor as observations for the multiple change-point detection. It should be noted that in the simulations, we assume that we have a sufficient number of observations for declaring the changes, so there are no Type II errors corresponding to infinite ADD.

### VI-A Gaussian distribution scenario

We consider Gaussian distributions with a change in the mean and set and as depicted in Fig. 1. First, for , we examine the FDR control of the proposed S-MAP and IS-MAP procedures with , where is the proportion of monitored sensors. The proportion corresponds to the parallel versions of the S-MAP and the IS-MAP procedures that observe all the active data streams at each time slot. Due to space limitations, we do not present tables of all the estimated FDR results. The resulting minimum and maximum estimated FDR values of the S-MAP procedure are and , respectively, while the resulting minimum and maximum estimated FDR values of the IS-MAP procedure are and , respectively. Consequently, both procedures control the FDR under the upper bound . These results confirm the analytical results in Theorems 1 and 2. The S-MAP FDR values are lower than the IS-MAP FDR values, since the S-MAP procedure is more conservative and uses higher thresholds than the IS-MAP procedure. For both the S-MAP and the IS-MAP procedures there is still a gap between the FDR values and the upper bound . This result follows from the choices of thresholds in (9) and (12) for the S-MAP and the IS-MAP procedures, respectively, that neglect the overshoot in the stopping rule [12].
In Fig. 2, we evaluate the ADD of the procedures: D-FDR, S-MAP with , simple procedure with , and IS-MAP with versus . It can be seen that all the considered procedures have an approximately constant ADD as increases, which verifies the analytical results in Section V. The parallel version of the IS-MAP procedure, i.e. for , has the lowest ADD. Moreover, it can be seen that the IS-MAP procedure with outperforms the parallel version of the S-MAP procedure and the D-FDR procedure. These results demonstrate the advantage of using the IS-MAP procedure instead of the S-MAP or the D-FDR procedures in terms of ADD. The simple procedure with has the highest ADD among the considered procedures implying that the proposed MAP approach is desirable for choosing the sensors to monitor at each time slot within the allowed proportion. In Fig. 3, we evaluate the ANO versus of the procedures: D-FDR, S-MAP with , and IS-MAP with . It can be seen that IS-MAP with has the lowest and the D-FDR has the highest ANO. In addition, it can be seen that for all the procedures, the ANO is approximately a constant w.r.t. .
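The two ways of choosing the monitored subset that are compared above can be sketched as follows. This is only an illustration under assumptions: the rounding rule in `budget`, the wrap-around of consecutive indices, and all function names are hypothetical and are not taken from the procedures' definitions. `map_subset` keeps the active streams with the highest posterior probabilities, while `simple_subset` draws a random starting position and takes consecutive active indices.

```python
import math
import random

def budget(num_active, q):
    """Number of streams to monitor for proportion q (ceil rounding is an assumption)."""
    return min(num_active, max(1, math.ceil(q * num_active)))

def map_subset(active, posterior, q):
    """Pick the active streams with the highest posterior change probabilities."""
    if not active:
        return []
    m = budget(len(active), q)
    return sorted(active, key=lambda k: posterior[k], reverse=True)[:m]

def simple_subset(active, q, rng):
    """Pick consecutive active indices from a random start (wrap-around is an assumption)."""
    if not active:
        return []
    m = budget(len(active), q)
    start = rng.randrange(len(active))
    return [active[(start + j) % len(active)] for j in range(m)]

if __name__ == "__main__":
    active = [0, 2, 3, 5]                             # streams not yet declared
    posterior = {0: 0.1, 2: 0.9, 3: 0.4, 5: 0.7}      # placeholder posterior values
    print(map_subset(active, posterior, 0.5))
    print(simple_subset(active, 0.5, random.Random(0)))
```

The only difference between the two functions is the ranking step, which is the part of the design that the ADD comparison isolates.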
In order to evaluate the performance of the procedures using both the ADD and the ANO as criteria, we define a weighted risk,

where sets the weighting between the ADD and the ANO. For we are only interested in the ADD, while for we are only interested in the ANO. In the upper plot of Fig. 5, we compare the weighted risks of the S-MAP and the IS-MAP procedures with different proportions versus the proportion size for . It can be seen that the weighted risk of the IS-MAP procedure is lower than the weighted risk of the S-MAP procedure. For both the S-MAP and the IS-MAP procedures, the best tradeoff among the considered proportions is achieved with the proportion . Thus, when both the ADD and the ANO are taken into account it may not be necessary to monitor all the active data streams in parallel, i.e. to choose .
In the lower plot of Fig. 5, for both the S-MAP and the IS-MAP procedures, we present the best proportion among the proportions in terms of the weighted risk in (48), i.e. the proportion with lowest risk, versus the weighting coefficient . It can be seen that for both procedures, as increases the best proportion does not increase. Moreover, in most of the considered cases the best proportion decreases as increases. Thus, as we put a higher weight on the ANO compared to the ADD we should usually choose a lower proportion of data streams to observe. In addition, as we change from to there is a rapid decrease in the optimal proportion from to and in the S-MAP and the IS-MAP procedures, respectively. This result implies that even a small positive weight on the ANO leads to a much smaller proportion value than for which the lowest weighted risk is obtained among the considered proportions.
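The selection of the best proportion for a given weighting coefficient can be sketched as below. The exact form of the weighted risk is not reproduced here, so the sketch assumes a convex combination, (1 − λ)·ADD + λ·ANO, which matches the described behavior at the two extremes of the weighting coefficient; any normalization of ADD and ANO is ignored. The names `weighted_risk`, `best_proportion`, and `lam`, as well as the numbers in `curves`, are hypothetical placeholders and not simulation results.

```python
def weighted_risk(add, ano, lam):
    """Assumed convex combination of ADD and ANO with weight lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * add + lam * ano

def best_proportion(curves, lam):
    """Return the proportion q with the lowest weighted risk.

    curves maps each candidate proportion q to a pair (ADD, ANO).
    """
    return min(curves, key=lambda q: weighted_risk(curves[q][0], curves[q][1], lam))

if __name__ == "__main__":
    # Placeholder values only: ADD decreases and ANO increases with the proportion.
    curves = {0.25: (15.0, 6.0), 0.5: (11.0, 9.0), 0.75: (10.5, 13.0), 1.0: (10.0, 18.0)}
    for lam in (0.0, 0.5, 1.0):
        print(lam, best_proportion(curves, lam))
```

With curves of this shape, putting all the weight on the ADD selects the full proportion, while any substantial weight on the ANO moves the minimizer to a smaller proportion.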

### VI-B General model with uncertainty and known p-values

Due to bandwidth limitations, in many distributed detection applications the sensors communicate to the FC condensed information about their observations in the form of a local decision and/or sufficient statistic. In this case, significantly less data needs to be communicated. Moreover, the local distributions at each sensor may be different, and local decision statistics from each sensor may be easier to fuse than the raw data from all the sensors. A common local decision statistic is the $p$-value [15, 35, 37], which is the probability of obtaining test results at least as extreme as the results observed during the test, assuming that the null hypothesis is correct. The $p$-value is general and is not necessarily obtained from the Gaussian distribution. It is a tool for deciding whether to reject the null hypothesis. When the $p$-value approaches zero, it is more likely that the alternative hypothesis is true [31, p. 63], [34].
In this example, we assume that the $p$-values are accurately calculated by each sensor based on its local observations. The $p$-values from each sensor are communicated to the FC for the multiple change-point detection. Under the null hypothesis the $p$-value is uniformly distributed on $[0,1]$ and thus, we set . Usually, under the alternative hypothesis the $p$-value follows a distribution that has high density for small $p$-values and the density decreases as the $p$-values increase towards $1$ [36, 38]. A commonly assumed distribution for the $p$-value under the alternative hypothesis is the beta distribution [36, 37, 38]. Therefore, we set , i.e. , and , where is a parameter of the th data stream probability density under the alternative hypothesis, . For each sensor, we consider uncertainty in the value of the parameter , where it is only known that , and are known. The true and unknown value of for each sensor is set by randomly choosing a number in the interval .
Due to the uncertainty in , we implement all the procedures in this example with a generalized LR (GLR), , instead of the actual LR, where we set and . For each data stream, given the observation we compute the corresponding GLR and use its value instead of the unknown LR. The probability densities, and with and , are depicted in Fig. 6. It should be noted that since the true , is smaller than or equal to , the true LR is smaller than the implemented GLR and thus, the resulting FDR may be higher than the predefined upper bound.
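A generalized LR of this kind can be sketched as follows. This is a minimal sketch under assumptions rather than the implementation used in this example: it assumes a uniform null density on $[0,1]$ and a Beta(a, 1) alternative with density a·x^(a−1), and it maximizes over an interval whose endpoints `a_min` and `a_max` are hypothetical names; the actual beta parameters and interval are not restated here. Because log a + (a − 1) log x is concave in a, clipping the unconstrained maximizer −1/log x to the interval gives the constrained maximum.

```python
import math

def beta_a1_pdf(x, a):
    """Density of Beta(a, 1) on (0, 1): a * x**(a - 1) (assumed alternative)."""
    return a * x ** (a - 1.0)

def glr(x, a_min, a_max):
    """Generalized LR: max over a in [a_min, a_max] of f1(x; a) / f0(x), f0 uniform.

    Returns the pair (GLR value, maximizing a).
    """
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    if not 0.0 < a_min <= a_max:
        raise ValueError("need 0 < a_min <= a_max")
    a_star = -1.0 / math.log(x)               # unconstrained maximizer of the log-LR
    a_hat = min(max(a_star, a_min), a_max)    # clip to the uncertainty interval
    return beta_a1_pdf(x, a_hat), a_hat

if __name__ == "__main__":
    for p_value in (0.001, 0.05, 0.5):
        print(p_value, glr(p_value, 0.2, 0.8))
```

By construction, the value returned by `glr` is never smaller than the LR evaluated at any single parameter in the interval, which is why replacing the LR by the GLR can inflate the test statistic.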
We perform simulations similar to those in Subsection VI-A. For , we examine the FDR values of the proposed S-MAP and IS-MAP procedures with different proportions . The resulting minimum and maximum estimated FDR values of the S-MAP procedure are and , respectively. The resulting minimum and maximum estimated FDR values of the IS-MAP procedure are and , respectively. Consequently, due to the model uncertainty and the maximization of w.r.t. , some of the resulting FDR values of the IS-MAP procedure are slightly higher than . This result demonstrates that, since the S-MAP procedure is more conservative than the IS-MAP procedure in terms of FDR control, the S-MAP procedure can be viewed as more robust than the IS-MAP procedure under the assumed model uncertainty.

###### Remark 1.

In an attempt to maintain the FDR control of the IS-MAP procedure under the desired upper bound, we also implement it with , which is lower than the true value, , under which the random change points are generated. As previously explained, in this case the FDR of the IS-MAP procedure will be lower at the expense of higher ADD. The resulting minimum and maximum estimated FDR values of the IS-MAP procedure are and , respectively. Thus, all the IS-MAP estimated FDR values are below the predefined upper bound and FDR control is maintained. In addition, it can be seen that altering the value of compared to the true is a tool for controlling the tradeoff between FDR and ADD in case of model uncertainty.

In Fig. 7, we evaluate the ADD of the procedures: D-FDR, S-MAP with , simple procedure with , and IS-MAP with versus . It can be seen that under the model uncertainty, all the considered procedures still have an approximately constant ADD as increases, which is in accordance with the analytical results in Section V. The parallel version of the IS-MAP procedure has the lowest ADD. In addition, the IS-MAP procedure with outperforms the parallel version of the S-MAP procedure and the D-FDR procedure, demonstrating the advantage of using the IS-MAP procedure rather than the S-MAP or the D-FDR procedures in terms of ADD. The simple procedure with has the highest ADD among the considered procedures. Thus, even under the model uncertainty, there is an advantage in terms of ADD in using the proposed MAP approach for choosing the monitored sensors rather than randomly choosing the subset of sensors to monitor. In Fig. 8, we evaluate the ANO versus of the procedures: D-FDR, S-MAP with , and IS-MAP with . It can be seen that IS-MAP with has the lowest ANO, whereas the D-FDR has the highest one. In all the considered procedures, the ANO is approximately a constant w.r.t. .
In the upper and middle plots of Fig. 9, we plot the ADDs and ANOs, respectively, of the S-MAP and the IS-MAP procedures for versus the proportion values . It can be seen that for any of the considered proportions, the IS-MAP procedure has lower ADD and ANO than the S-MAP procedure. In addition, for both procedures the ADD decreases as the proportion increases, while the ANO increases as the proportion increases. Similar to the previous example, it can be seen that there is no significant increase in ADD when the proportion decreases from to . The ANO increases significantly as increases towards . In the lower plot of Fig. 9, we plot a curve connecting the ADD-ANO points of the S-MAP and the IS-MAP procedures from the upper and middle plots of Fig. 9. It can be seen that under the model uncertainty we still have a clear tradeoff between the ADD and ANO and the ADD decreases as the ANO increases.
In the upper plot of Fig. 10, we compare the weighted risks from (48) of the S-MAP and the IS-MAP procedures with proportions versus the proportion size for . It can be seen that the weighted risk of the IS-MAP procedure is lower than the weighted risk of the S-MAP procedure. For both the S-MAP and the IS-MAP procedures, the best tradeoff among the considered proportions is achieved with the proportion . Thus, under the model uncertainty, it is still not desirable to monitor all the active data streams in parallel, when both ADD and ANO are taken into account. In the lower plot of Fig. 10, for both the S-MAP and the IS-MAP procedures, we present the best proportion among the proportions in terms of the weighted risk in (48) versus the weighting coefficient . Similarly to the previous example, for both procedures, as we increase the best proportion value decreases or does not increase. We also noticed a rapid decrease in the optimal proportion from to , as we change from to .

## Appendix A Proof of Theorem 2

In this appendix, the FDR control of the IS-MAP procedure is proved. The number of change points declared, , is known given the filtration of all the data, . Thus, using the law of total expectation, we can rewrite the FDR from (4) as

$$\mathrm{FDR}=E\left[\frac{E\left[V\,\middle|\,\mathcal{F}_{\infty}\right]}{\max\{R,1\}}\right]. \tag{49}$$

Recall that is the number of false discoveries, i.e. the size of the subset of s.t. . Thus, can be written as

$$V=\sum_{k=1}^{K}\mathbf{1}_{\{t^{(k)}>T^{(k)}\}}, \tag{50}$$

where is the indicator function of the event . By substituting (50) in (49) and using the linearity of the expectation operator, we obtain

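$$\mathrm{FDR}=E\left[\frac{\sum_{k=1}^{K}E\left[\mathbf{1}_{\{t^{(k)}>T^{(k)}\}}\,\middle|\,\mathcal{F}_{\infty}\right]}{\max\{R,1\}}\right]=E\left[\frac{\sum_{k=1}^{K}E\left[\mathbf{1}_{\{t^{(k)}>T^{(k)}\}}\,\middle|\,\mathcal{F}_{T^{(k)}}\right]}{\max\{R,1\}}\right], \tag{51}$$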

where the second equality is obtained since the stopping times, , are known given and for we stop observing the th data stream after , i.e. after change point declaration for the th data stream. Rewriting the expected indicator functions in (51) as conditional probabilities, we obtain

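$$\mathrm{FDR}=E\left[\frac{\sum_{k=1}^{K}\left(1-P\left(t^{(k)}\leq T^{(k)}\,\middle|\,\mathcal{F}_{T^{(k)}}\right)\right)}{\max\{R,1\}}\right]=E\left[\frac{\sum_{k=1}^{K}\left(1-\pi^{(k)}_{T^{(k)}}\right)}{\max\{R,1\}}\right], \tag{52}$$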

where the second equality is obtained by substituting (1) into the first equality. In case , then and thus,

$$1-\pi^{(k)}_{T^{(k)}}=0,\quad \forall k\in[K]\ \text{s.t.}\ T^{(k)}=\infty. \tag{53}$$

On the other hand, in case then, at time slot