Robust Narrowest Significance Pursuit: inference for multiple change-points in the median

We propose Robust Narrowest Significance Pursuit (RNSP), a methodology for detecting localised regions in data sequences which each must contain a change-point in the median, at a prescribed global significance level. RNSP works by fitting the postulated constant model over many regions of the data using a new sign-multiresolution sup-norm-type loss, and greedily identifying the shortest intervals on which the constancy is significantly violated. By working with the signs of the data around fitted model candidates, RNSP is able to work under minimal assumptions, requiring only sign-symmetry and serial independence of the signs of the true residuals. In particular, it permits their heterogeneity and arbitrarily heavy tails. The intervals of significance returned by RNSP have a finite-sample character, are unconditional in nature and do not rely on any assumptions on the true signal. Code implementing RNSP is available at https://github.com/pfryz/nsp.

Authors

• 8 publications
09/11/2020

Narrowest Significance Pursuit: inference for multiple change-points in linear models

We propose Narrowest Significance Pursuit (NSP), a general and flexible ...
06/23/2020

Seeded intervals and noise level estimation in change point detection: A discussion of Fryzlewicz (2020)

In this discussion, we compare the choice of seeded intervals and that o...
02/16/2020

Seeded Binary Segmentation: A general methodology for fast and optimal change point detection

In recent years, there has been an increasing demand on efficient algori...
01/30/2019

Detecting multiple generalized change-points by isolating single ones

We introduce a new approach, called Isolate-Detect (ID), for the consist...
05/25/2021

On robust learning in the canonical change point problem under heavy tailed errors in finite and growing dimensions

This paper presents a number of new findings about the canonical change ...
07/03/2018

Segmented correspondence curve regression model for quantifying reproducibility of high-throughput experiments

The reliability of a high-throughput biological experiment relies highly...
03/07/2021

DI2: prior-free and multi-item discretization ofbiomedical data and its applications

Motivation: A considerable number of data mining approaches for biomedic...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

There is a growing interest in the statistical literature in the problem of uncertainty quantification for possibly multiple parameter changes in time- or space-ordered data, motivated by the practical question of whether any suspected changes are “real” (i.e. can be attributed to structural changes in the underlying stochastic model) or due to random fluctuations. General types of approaches to this problem include confidence sets associated with simultaneous multiscale change-point estimation

(Frick et al., 2014; Pein et al., 2017), post-selection inference (Hyun et al., 2018, 2021; Jewell et al., 2020; Duy et al., 2020), inference without selection and post-inference selection via Narrowest Significance Pursuit (Fryzlewicz, 2021), asymptotic confidence intervals conditional on the estimated change-point locations (Bai and Perron, 1998; Eichinger and Kirch, 2018; Cho and Kirch, 2021), False Discovery Rate (Hao et al., 2013; Li and Munk, 2016; Cheng et al., 2020)

, and Bayesian inference

(Lavielle and Lebarbier, 2001; Fearnhead, 2006). These approaches go beyond mere change-point detection and offer statistical significance statements regarding the existence and locations of change-points in the statistical model underlying the data.

However, the vast majority of the existing works, with only a handful of exceptions which we review in more detail later, do so under strong assumptions on the innovations entering the model, such as their homogeneity over time, and/or finite moments up to a certain order, and/or a specific known distribution or distributional family. This is a limitation, because change-point modelling and inference for data that do not necessarily fall into these categories is of interest in various domains, including bioinformatics

(Muggeo and Adelfio, 2011; Arlot and Celisse, 2011), biophysics (Pein et al., 2017), cancer medicine (Kim et al., 2000), economics (Bai and Perron, 2003) and finance (Oomen, 2019), amongst others.

In this paper, we are concerned with the following problem: given a sequence of noisy independent observations, automatically determine localised regions of the data which each must contain a change-point in the median, at a prescribed global significance level. The methodology we introduce, referred to as Robust Narrowest Significance Pursuit (RNSP), achieves this for the piecewise-constant median model, capturing changes in the level of the median. It does so with practically no distributional assumptions on the data, other than serial independence of the signs of the true residuals. In contrast to the existing literature, RNSP requires no knowledge of the distribution on the part of the user, and permits arbitrarily heavy tails, heterogeneity over time and/or lack of symmetry. In addition, RNSP is able to handle distributions that are continuous, continuous with mass points, or discrete, where these properties may also vary over the signal.

To situate RNSP in the context of the existing literature, we now review the few existing approaches that permit uncertainty statements for multiple change-point locations under the assumption of possible heterogeneity in the distribution of the noise.

H-SMUCE (Pein et al., 2017) is a change-point detector in the heterogeneous Gaussian piecewise-constant model , where is a piecewise-constant signal, are i.i.d. variables, and is also piecewise-constant in such a way that it can only change when does. In their Lemma C.7, the authors provide a theoretical construction of confidence intervals for the locations of the change-points in which works for a fixed significance level . However, their result assumes that the number of change-points has not been underestimated, and therefore necessarily assumes certain (and in this particular case somewhat strong) minimum distance and minimum jump size conditions, which are typically difficult to verify in practice. In addition, it offers confidence intervals of the same length regardless of the individual prominence of each change-point. Finally, the practical use of this result requires the knowledge of the number of change-points, a lower bound on the minimum spacing between each pair of change-points, and the minimum jump size, all of which will often be unknown to the analyst.

However, Pein et al. (2017) also provide a different, practical, algorithmic construction of confidence intervals, in their Section A.1. This requires having an estimate of the number of change-points, and involves screening the data for short intervals over which a constant signal fit is unsuitable and they must therefore contain change-points. Crucially, this algorithmic construction relies on the knowledge of scale-dependent critical values (for measuring the unsuitability of a locally constant fit), which are not available analytically but only by simulation, and therefore the method would not extend automatically to unknown noise distributions (as the analyst needs to know what distribution to sample from). In Section 4

, we show that H-SMUCE suffers from inflated type I error rates in the sense that the thus-constructed confidence intervals, in the examples of Gaussian models shown, do not all contain at least one true change-point each in more than

of the cases, contrary to what this algorithmic construction sets out to do. H-SMUCE is also prone to failing if the model is mis-specified, e.g. if the distribution of the data has a mass point (which is unsurprising in view of its assumption of Gaussianity).

In contrast to H-SMUCE, RNSP does not assume Gaussianity, and works with any (possibly heterogeneous) noise distributions, as long as they are sign-symmetric around the median, a very weak requirement which is immaterial for continuous distributions (as all, even non-symmetric, continuous distributions are sign-symmetric). The noise distribution does not need to remain constant between each consecutive pair of change-points, can change arbitrarily, and can be continuous, discrete, or continuous with mass points. The execution of RNSP does not rely on having an estimate of the number of change-points. Critical values needed by RNSP do not depend on the noise distribution (which can be unknown to the analyst), and can be accurately approximated analytically. RNSP explicitly targets the shortest possible intervals of significance, and its algorithmic construction ensures exact finite-sample coverage guarantees.

The non-robust predecessor of RNSP, Narrowest Significance Pursuit (NSP; Fryzlewicz, 2021

) is able to handle heterogeneous data in a variety of models, including the piecewise-constant signal plus independent noise model. However, NSP requires that the noise, if heterogeneous, satisfies the central limit theorem (i.e. is within the domain of attraction of the normal distribution) and is symmetric, neither of which is assumed in RNSP. Crucially, noise heterogeneity is dealt with in NSP via self-normalisation using the functional-analytic framework of

Rac̆kauskas and Suquet (2003)

. The self-normalised statistic used in NSP includes a term resembling an estimate of the local variance of the noise, which, however, is only unbiased under the null hypothesis of no change-point being present locally. The fact that the same term over-estimates the variance under the alternative hypothesis, reduces the power of the detection statistic, which leads to typically long intervals of significance. We illustrate this issue with the self-normalised non-robust NSP for heterogeneous data in Section

4 and show that RNSP offers a significant improvement.

The Bayesian approach of Fearnhead (2006) provides an inference framework for independent observations drawn from a density within the th segment, with

possibly vector-valued, which permits some types of heterogeneity. Operating in this framework requires the choice of either of two types of priors: one involving a prior on the number of change-points plus a conditional prior on their locations, and the other – an instance of the product-partition model. In addition, the distribution family

is assumed known, and a prior need to be specified for the parameters. While self-contained and not overly computationally intensive, this approach suffers from the fact that the estimation of the number of change-points is a difficult statistical problem and therefore so is choosing a good prior for this quantity; for the same reason, it is a choice that can significantly affect the inferential conclusions.

Bai and Perron (1998) and Bai and Perron (2003), in seminal papers studying least-squares estimation of multiple change-points in regression models under possible heterogeneity of the errors, describe a procedure for computing confidence intervals conditional on detection. For an unknown distribution of the errors, the limiting distribution of each estimated change-point location converges to an appropriate functional of the Wiener process only under the assumption that the corresponding break size goes to zero with the sample size. The asymptotic validity of the resulting confidence interval relies on the consistency of the estimators of the unknown quantities involved (such as the local variance of the innovations or the break size); it is therefore a large-sample, asymptotic construction. Crucially, it does not take into the account the uncertainty associated with detection, which can be considerable especially for the more difficult problems (for an example, see the “US ex-post real interest rate” case study in Bai and Perron (2003), where there is genuine uncertainty between models with 2 and 3 change-points; we revisit this example in Section 5.1). By contrast, RNSP produces intervals of significant change in the median that are not conditional on detection, have a finite-sample nature and are valid regardless of the size of the breaks.

The paper is organised as follows. Section 2 motivates RNSP and sets out its general algorithmic framework. Section 3 describes how RNSP measures the local deviation from model constancy, its crucial ingredient. It also gives theoretical performance guarantees for RNSP. Section 4 contains numerical examples and comparisons. Section 5 includes two examples showing the practical usefulness of RNSP. Section 6 concludes with a brief discussion. Software implementing RNSP is available at https://github.com/pfryz/nsp.

2 Motivation and review of RNSP algorithmic setting

2.1 RNSP: context and modus operandi

Robust NSP (RNSP) differs from NSP as defined in Fryzlewicz (2021) in terms of the data functional it targets, and in how it measures local deviation of this data functional from the baseline model. RNSP discovers regions in the data in which the median departs from the baseline model. This is in contrast to NSP, which targets the mean. In RNSP, we do not make any moment assumptions about the data, and therefore the median is a natural measure of data centrality. In RNSP, the baseline model is constant, and therefore the algorithm will look to discover segments in the data in which the median must experience abrupt departures from constancy (change-points in the level), at a certain global significance level.

In this section, we review the components of the algorithmic framework that are shared between NSP and RNSP, with the generic measurement of local deviation from the constant model as one of its building blocks. In Section 3, we will introduce the particular way in which local deviation from the constant model is measured in RNSP, which is appropriate for the median and hence fundamentally different from NSP.

RNSP operates in the signal plus noise model

 Yt=ft+Zt,t=1,…,T, (1)

where the variables and the signal satisfy, respectively, the assumptions below. Before stating them, we define the function as follows.

 sign(x)=⎧⎪⎨⎪⎩1x>00x=0−1x<0. (2)
Assumption 2.1
1. The variables are mutually independent.

2. , , where is the median operator. (If the median is non-unique, we require .)

3. The variables are sign-symmetric in the sense that , .

do not have to be identically distributed, and can have arbitrary mass atoms, or none, as long as their distributions satisfy (b) and (c) in Assumption 2.1. We do not impose any moment assumptions. Any continuous distribution (even one with an asymmetric density function) is sign-symmetric. The distribution(s) of can be unknown to the analyst. Note that the requirement of the independence of is significantly weaker than that of the independence of itself: for example, if is a (G)ARCH process driven by symmetric, independent innovations, then is serially independent, while is not.

Assumption 2.2

In (1), is a piecewise-constant vector with an unknown number and locations of change-points. (The location is a change-point if .)

In this work, RNSP does not venture beyond the piecewise-constant model; this is in contrast to the non-robust NSP (Fryzlewicz, 2021)

, applicable also to piecewise-polynomial models of higher degrees, and piecewise-stationary regression models with arbitrary covariates (including with autoregression). The reason for it is that for robustness, RNSP works with the signs of the data – both at the stage of fitting the best constant model on each interval, and in examining the empirical residuals from the fits. Due to the non-linearity of the sign operator, RNSP cannot use linear programming for the constant model fits (unlike the non-robust NSP), and has to manually try all possible constant model fits on each interval, to be able to select one that gives the minimum deviation, as this is required for its coverage guarantees. This is a problem of dimensionality one in the piecewise-constant model, which makes it computationally straightforward and fast (details are in Section

3.2). However, in models of parameter dimension two or higher (such as the piecewise-linear model), an analogous exact solution would be too computationally demanding, hence the restriction to the piecewise-constant model in the current work. As we explain in detail later on, our coverage guarantees for RNSP in the piecewise-constant median model are exact and hold for any sample size.

RNSP achieves the high level of generality in terms of the permitted distributions of stated in Assumption 2.1 thanks to its use of the sign transformation. The use of 0 in is critical for RNSP’s objective to provide correct coverage guarantees also for discrete distributions and continuous distributions with mass points. We illustrate the importance of this point in Section 3.2.

2.2 (R)NSP: generic algorithm

The generic algorithmic framework underlying both RNSP and the non-robust NSP in Fryzlewicz (2021) is the same and is based on recursively searching for the shortest sub-samples in the data with globally significant deviations from the baseline model. In this section, we introduce this shared generic framework. In the following sections, we show how RNSP diverges from NSP through its use of a robust measure of deviation from the baseline model, suitable for the very broad class of distributions specified in Assumption 2.1.

We start with a pseudocode definition of the RNSP algorithm, in the form of a recursively defined function RNSP. In its arguments, is the current interval under consideration and at the start of the procedure, we have ; (of length ) is as in the model formula (1); is the minimum guaranteed number of sub-intervals of drawn (unless the number of all sub-intervals of is less than , in which case drawing sub-intervals would mean repetition); is the threshold corresponding to the global significance level (typical values for would be 0.05 or 0.1) and (respectively ) is a functional parameter used to specify the degree of overlap of the left (respectively right) child interval of with respect to the region of significance identified within , if any. The no-overlap case would correspond to . In each recursive call on a generic interval , RNSP adds to the set any globally significant local regions (intervals) of the data identified within on which is deemed to depart significantly (at global level ) from the baseline constant model. We provide more details underneath the pseudocode below. In the remainder of the paper, the subscript relates to a constant indexed by the interval , whose value will be clear from the context.

1:function RNSP(, , , , , , )
2:     if  then
3:         STOP
4:     end if
5:     if  then
6:
7:         draw all intervals , , s.t.
8:     else
9:         draw a representative (see description below) sample of intervals , , s.t.
10:     end if
11:     for  do
12:          DeviationFromConstantModel
13:     end for
14:
15:     if  then
16:         STOP
17:     end if
18:     AnyOf
19:     ShortestSignificantSubinterval
20:     add to the set of significant intervals
21:     RNSP
22:     RNSP
23:end function

The RNSP algorithm is launched by the pair of calls below.

RNSP

On completion, the output of RNSP is in the variable . We now comment on the RNSP function line by line. In lines 2–4, execution is terminated for intervals that are too short. In lines 5–10, a check is performed to see if is at least as large as the number of all sub-intervals of . If so, then is adjusted accordingly, and all sub-intervals are stored in . Otherwise, a sample of sub-intervals is drawn in which either (a) and are obtained uniformly and with replacement from (random sampling), or (b) and are all possible pairs from an (approximately) equispaced grid on which permits at least such sub-intervals (deterministic sampling).

In lines 11–13, each sub-interval is checked to see to what extent the response on this sub-interval (denoted by ) conforms to the baseline constant model. This core step of the RNSP algorithm will be described in more detail in Section 3.

In line 14, the measures of deviation obtained in line 12 are tested against threshold , chosen to guaranteed global significance level . How to choose is independent of the distribution of if it is continuous, and there is also a simple distribution-independent choice of

for discrete distributions and continuous distributions with probability masses; see Section

3.3. The shortest sub-interval(s) for which the test rejects the baseline model at global level are collected in set . In lines 15–17, if is empty, then the procedure decides that it has not found regions of significant deviations from the constant model on , and stops on this interval as a consequence. Otherwise, in line 18, the procedure continues by choosing the sub-interval, from among the shortest significant ones, on which the deviation from the baseline constant model has been the largest. The chosen interval is denoted by .

In line 19, is searched for its shortest significant sub-interval, i.e. the shortest sub-interval on which the hypothesis of the baseline model is rejected locally at a global level . Such a sub-interval certainly exists, as itself has this property. The structure of this search again follows the workflow of the RNSP procedure; more specifically, it proceeds by executing lines 2–18 of RNSP, but with in place of . The chosen interval is denoted by . The importance of this second-stage search in RNSP’s pursuit to produce short intervals is discussed in detail (in the context of NSP) in Fryzlewicz (2021). In line 20, the selected interval is added to the output set .

In lines 21–22, RNSP is executed recursively to the left and to the right of the detected interval . However, we optionally allow for some overlap with . The overlap, if present, is a function of and, if it involves detection of the location of a change-point within , then it is also a function of . An example of the relevance of this is given in Section 5.1.

3 Robust NSP: measuring deviation from the constant model

3.1 Principles and building blocks of the new deviation measure

The main structure of the DeviationFromConstantModel operation is as follows:

1. Fit the best, in the sense described precisely later, constant model to .

2. Examine the signs of the empirical residuals from this fit. If their distribution is deemed to contain a change-point (which indicates that the constant model fit is unsatisfactory on and therefore the model contains a change-point on that interval), the value returned by DeviationFromConstantModel should be large; otherwise small.

For RNSP to be able to control, globally and non-trivially, the probability of spurious detection, the deviation measure it uses must meet the two desiderata below.

Desideratum A.

If there is no change-point on , then the value of DeviationFromConstantModel should be bounded from above by the same deviation measure for the true model on (the latter is an oracular quantity, unknown to the analyst), and this in turn should be bounded from above, uniformly over all

, by a random variable involving the signs of the true residuals

only, such that either its distribution is known or its quantiles can easily be bounded, sharply, from above.

Desideratum B.

The deviation measure on cannot be made smaller by proposing unrealistic constant model fits on that interval; otherwise it would be easy to force non-detection on any interval. This is an important desired property as our deviation measure will need to ‘try’ all possible constant model fits and choose one for which the deviation measure is the smallest, to ensure that Desideratum A (the part relating to boundedness from above by the deviation measure for the true model) is met.

As in Fryzlewicz (2021), a key ingredient of our measure of deviation, which helps it achieve the above desired properties, is a multiresolution sup-norm introduced below. The fundamental difference with Fryzlewicz (2021) is that RNSP uses it on the sign of the input rather than in the original data domain. The basic building block of the multiresolution sup-norm is a scaled partial sum statistic, defined for an arbitrary input sequence by

 Us,e(x)=1(e−s+1)1/2e∑t=sxt. (3)

We define the multiresolution sup-norm (Nemirovski, 1986; Li, 2016) of an input vector (of length ) with respect to the interval set as

 ∥x∥I=max[s,e]∈I|Us,e(x)|. (4)

The set used in RNSP contains intervals at a range of scales and locations. A canonical example of a suitable interval set is the set of all subintervals of . We will use in defining the largest acceptable global probability of spurious detection. However, for computational reasons, DeviationFromConstantModel will use a smaller interval set (we give the details below). This will not affect the exactness of our coverage guarantees, because, naturally, if , then . We also define the restriction of to an arbitrary interval as

 I[s,e]={[u,v]⊆[s,e]:[u,v]∈I}.

Note the trivial inequality

 ∥xs:e∥Ia[s,e]≤∥x∥Ia (5)

for any .

There are a number of reasons for the use of the multiresolution sup-norm in our deviation measure. One is that thanks to the particular construction of the RNSP algorithm and our deviation measure, in order for us to be able to control spurious detections at a global level , it suffices to know, or be able to bound from above (as sharply as possible), the quantile of the distribution of . However, relatively sharp results of this type are either well-known or easy to obtain, as we show in Section 3.3.

Another reason is that the multiresolution sup-norm of the signs of the empirical residuals from a postulated baseline model fit is naturally large for proposed model fits if they are unrealistic, which penalises them, thereby ensuring that Desideratum B above is met. This important property would not be available if, instead of the multiresolution sup-norm of the residual signs, we chose to use, for example, maximum-likelihood or other contrast-based approaches to test deviations from the baseline (the simplest example of such a test is the CUSUM test). We discuss this point in more detail in Section 3.2.1, where we also give further reasons supporting the use of the multiresolution sup-norm in our context.

In line with Desideratum A, in the following sections we define DeviationFromConstantModel which satisfies the property that if there is no change-point on the interval , then it is guaranteed that

 D[sm,em]≤∥sign(Zsm:em)∥Ia[sm,em]. (6)

3.2 Deviation measure: definition and properties

We now define the deviation measure , returned by the DeviationFromConstantModel function from line 12 of the RNSP algorithm.

The discussion below assumes that there is no change-point in . For the true constant signal , denote . If we knew , we could use as , since we trivially have

 ∥sign(Ysm:em−f[sm,em])∥Ia[sm,em]=∥sign(Zsm:em)∥Ia[sm,em],

which satisfies the desired condition (6). Of course is not known, but a key observation is that there are only at most different possible constants leading to different sequences . To see this, sort the values of in non-decreasing order to create . Take candidate constants , , defined as follows.

 ~f{1}[sm,em] < Y(1)(but otherwise arbitrary) ~f{2}[sm,em] = Y(1) ~f{3}[sm,em] = 12(Y(1)+Y(2)) ~f{4}[sm,em] = Y(2) ~f{5}[sm,em] = 12(Y(2)+Y(3)) ⋮ ~f{2(em−sm)+2}[sm,em] = Y(em−sm+1) ~f{2(em−sm)+3}[sm,em] > Y(em−sm+1)(but otherwise arbitrary). (7)

We have the following simple result.

Proposition 3.1

Assume no change-point in and denote . Let the constants , be defined as in (7). There exists a such that

 {sign(Yt−f[sm,em])}emt=sm={sign(Yt−~f{j0}[sm,em])}emt=sm. (8)

Proof. If there is a such that , then choose so that and (8) is trivially satisfied. Otherwise, if there is a such that , then choose so that

 {sign(Yt−f[sm,em])}emt=sm = {sign(Yt−(Y(j1)+Y(j1+1))/2)}emt=sm = {sign(Yt−~f{j0}[sm,em])}emt=sm,

and (8) is satisfied. Otherwise, it must be that either or ; in the former case take and in the latter, and (8) is satisfied by a similar argument.

We now define our measure of deviation , and prove its key property as a corollary to Proposition 3.1.

Definition 3.1

Let the constants , be defined as in (7). We define

 D[sm,em]:=minj∈{1,…,2(sm−em)+3}∥sign(Ysm:em−~f{j}[sm,em])∥Ia[sm,em]. (9)

Essentially, tries all possible baseline constant model fits on and chooses the one for which the multiresolution norm of the signs of the residuals from the fit is the smallest. Choosing the fit that minimises the multiresolution fit is essential for guaranteeing coverage properties, as we will see below. If there is a change-point on , the hope is that even the minimum multiresoltion norm fit as returned by will be large enough for RNSP to indicate a change-point on that interval.

Corollary 3.1

For any interval on which there is no change-point, we have

 D[sm,em]≤∥sign(Zsm:em)∥Ia[sm,em]. (10)

In other words, the deviation measure defined in (9) satisfies the desired property (6).

Proof. Let the index be as in the statement of Proposition 3.1. We have

 D[sm,em] = minj∈{1,…,2(sm−em)+3}∥sign(Ysm:em−~f{j}[sm,em])∥Ia[sm,em] ≤ ∥sign(Ysm:em−~f{j0}[sm,em])∥Ia[sm,em] = ∥sign(Ysm:em−f[sm,em])∥Ia[sm,em] = ∥sign(Zsm:em)∥Ia[sm,em].

This leads to the following guarantee for the RNSP algorithm.

Theorem 3.1

Let be the set of intervals returned by the RNSP algorithm. The following guarantee holds.

 P(∃i=1,…,R∀j=1,…,N[ηj,ηj+1]⊈Si)≤P(∥sign(Z)∥Ia>λα).

Proof. On the set , each interval must contain a change-point as if it did not, then by Corollary 3.1 and inequality (5), we would have to have

 DSi≤∥sign(Z)∥Ia≤λα. (11)

However, the fact that was returned by RNSP means, by line 14 of the RNSP algorithm, that , which contradicts (11). This completes the proof.

Theorem 3.1 should be read as follows. Let . For a set of intervals returned by RNSP, we are guaranteed, with probability of at least , that there is at least one change-point in each of these intervals. Therefore, can be interpreted as an automatically chosen set of regions (intervals) of significance in the data. In the no-change-point case (), the correct reading of Theorem 3.1 is that the probability of obtaining one of more intervals of significance () is bounded from above by .

We emphasise that Theorem 3.1 does not promise to detect all the change-points, or to do so asymptotically as the sample size gets larger: this would be impossible without assumptions on the strength of the change-points (involving spacing between neighbouring change-points and the sizes of the jumps). Such assumptions are typically impossible to verify in practice and we do not make them in this work. Instead, Theorem 3.1 promises that every interval of significance returned by RNSP must contain at least one change-point each, with a certain global probability adjustable by the user.

The intervals of significance returned by RNSP have an “unconditional confidence interval” interpretation: they are not conditional on any prior detection event, but indicate regions in the data each of which must unconditionally contain at least one change in the median of , with a global probability of at least . Therefore, as in NSP (Fryzlewicz, 2021), RNSP can be viewed as performing “inference without selection” (where “inference” refers to producing the RNSP intervals of significance and “selection” to the estimation of change-point locations, absent from RNSP). This viewpoint also enables “post-inference selection” or “in-inference selection” if the exact change-point locations (if any) are to be estimated within the RNSP intervals of significance after or during the execution of RNSP. See the discussion in Fryzlewicz (2021) for a further discussion of these aspects. We also emphasise that the statement of Theorem 3.1 is of a finite-sample character and holds exactly and for any sample size.

3.2.1 Deviation measure: discussion

We now comment on a number of aspects of the deviation measure .

Method of computation. In the non-robust NSP (Fryzlewicz, 2021), the analogue of was amenable to computation via linear programming; the non-robust NSP with self-normalisation also used linear programming as the main step in its computation. By contrast, in RNSP, measuring the deviation from the baseline model involves examining the signs of the residuals rather than the residuals themselves as was the case in NSP. Because of the non-linearity of the sign function, it is not possible to use linear programming for the computation of in RNSP, and (9) needs to be computed by manually trying out all candidate constants . While this may appear expensive, it is the only option, as (a) trying only ‘realistic’ constant model fits would require an additional global parameter describing what it means for a local constant fit to be realistic, and (b) carrying out the fitting in an ad hoc way, e.g. by only fitting the empirical median of (rather than each of the constants ), would violate the desired property (6) and therefore not be able to lead to exact coverage guarantees.

Achieving computational savings without affecting coverage guarantees. Although the operation of trying each constant in (9) is relatively fast, in order to accelerate it further, we introduce the two computational savings below, which do not increase and therefore respect the crucial inequality (10) and hence also our coverage guarantees in Theorem 3.1.

Reducing the set .

To accelerate the computation of (9), we replace the set in with the set (with and standing for left and right, respectively), defined as follows.

 Ilr[sm,em]:=Il[sm,em]∪Ir[sm,em],

where

 Il[sm,em] = {[sm,sm+1],[sm,sm+2],…,[sm,em]} Ir[sm,em] = {[sm,em],[sm+1,em],…,[em−1,em]}.

This reduces the cardinality of the set of intervals included in from to . As and hence , the results of Corollary 3.1 and Theorem 3.1 remain unchanged for the thus-reduced . On the other hand, has been defined in this particular way so as not to compromise detection power in the piecewise-constant median model. To see this, consider the following illustrative example. Suppose (noiseless case) and for and for . On , the baseline constant signal level fitted is and we have

 sign(Yt−~f[1,100])={−1t=1,…,501t=51,…,100.

In this setting, the two multiresoltion sup-norms:

 ∥sign(Yt−~f[1,100])∥Ilr[1,100]and∥sign(Yt−~f[1,100])∥Ia[1,100]

are identical, equal to and achieve this value for the intervals and , members of both and . This simple example illustrates the wider phenomenon that if there is a single change-point in the median on a generic interval under consideration, then in the noiseless case the multiresolution norm over the set is maximised at one of the “left” or “right” intervals in , and we are happy to sacrifice potential negligible differences in the noisy case in exchange for the substantial computational savings.

Limiting interval lengths.

In practice, the analyst may not be interested in excessively long RNSP intervals of significance, or in other words in change-points so weak that only a very long interval around them is required to be able to detect them. With this in mind, our software at https://github.com/pfryz/nsp provides an option of limiting the length of intervals considered in the sense that is able to automatically return zero if for a user-specified maximum length .

Unsuitability of CUSUM or similar contrasts in deviation measure. We note that it would be impossible to replace the multiresolution sup-norm in by, for example, the CUSUM statistic or a similar contrast measure such as that described in Ellenberger et al. (2021). Consider, for example, the hypothetical definition of a deviation measure as follows:

 Dinv[sm,em]:=minj∈{1,…,2(sm−em)+3}|CUSUM{sign(Ysm:em−~f{j}[sm,em])}|,

where “inv” stands for invalid. The definition of the CUSUM statistic is well-established, see e.g. Fryzlewicz (2014) for details. This is an invalid definition as is a vector of ones, and the CUSUM statistic returns zero for constant vectors, so would not be able to offer detection under any circumstances as it would always equal zero. This is an example of a construction that violates Desideratum B.

Importance of zero in sign function. For valid coverage guarantees in the presence of mass points in (or if the distribution of is discrete), it is crucial for the sign function defined in (2) to return zero if its argument is zero. To illustrate this point, define

 signinv(x)={1x≥0−1x<0, (12)

where “inv” stands for “invalid”, and consider the trivial case . For no-change-point input data , there are only three different constants (see definition (7)). With in place of , this would lead to

 {signinv(Yt−~f{1}[sm,em])}emt=sm = (1,…,1) {signinv(Yt−~f{2}[sm,em])}emt=sm = (1,…,1) {signinv(Yt−~f{3}[sm,em])}emt=sm = (−1,…,−1)

and therefore we would have , the largest value can possibly take, which would therefore have to lead to the (spurious) designation of as containing a change-point, for suitably large. Naturally, the analogous argument would also apply if rather than 1. By contrast, note that the use of the (correct) sign function leads to

 {sign(Yt−~f{2}[sm,em])}emt=sm=(0,…,0),

which yields and therefore the (correct) designation of as not containing a change-point.

Importance of placing at data points. For the same reason, it is important that the set of test levels (definition (7)) includes those placed at the data points themselves (i.e. those indexed by even values of in definition (7)). Indeed, continuing the example directly above, if we were to exclude the constant from the list of test levels considered, we would then have even with the use of the correct function sign (and not only with ), which would again lead to spurious detection.

Importance of placing in between sorted data points. It is equally important that the test levels

should include those placed in between the sorted data points (i.e. those indexed by odd values of

in definition (7)). This can be see e.g. by considering such that ; for brevity, we omit the full discussion.

3.3 Evaluation and bounds for ∥sign(Z)∥Ia

To make Theorem 3.1 operational, we need to obtain an understanding of the distribution of so we are able to choose such that (or approximately so) for a desired global significance level .

Initially we consider such that , i.e. for all (the general case is covered in the next paragraph). From Theorem 1.1 in Kabluchko and Wang (2014), we have that

 limT→∞P(∥sign(Z)∥Ia>aT+τ/aT)=1−exp(−2Λexp(−τ)), (13)

where and is a constant. As Kabluchko and Wang (2014) do not unambiguously state the value of , we use simulation over a range of values of and to determine a suitable value of as 0.274. The practical choice of the significance threshold then proceed as follows: (a) fix to the desired level, for example 0.05 or 0.1; (b) obtain the value of by equating ; (c) obtain .

Suppose now that ; note that the sign-symmetry Assumption 2.1(c) implies . Construct the variable . As , the limiting statement (13) applies to . However, we have the double inequality

 ∥sign(Z)∥Ia≤∥sign(~Z)∥IaI≤∥sign(~Z)∥Ia, (14)

with , where . The first inequality in (14) holds because every constituent partial sum of has a corresponding larger or equal in magnitude partial sum in constructed by removing the zeros from its numerator and decreasing (or not increasing) its denominator as it contains fewer (or the same number of) terms. As an illustrative example, suppose the sequence of starts . The absolute partial sum , a constituent of , is majorised by the absolute partial sum , a constituent of