# A linear time algorithm for multiscale quantile simulation

Change-point problems have appeared in a great many applications for example cancer genetics, econometrics and climate change. Modern multiscale type segmentation methods are considered to be a statistically efficient approach for multiple change-point detection, which minimize the number of change-points under a multiscale side-constraint. The constraint threshold plays a critical role in balancing the data-fit and model complexity. However, the computation time of such a threshold is quadratic in terms of sample size n, making it impractical for large scale problems. In this paper we proposed an O(n) algorithm by utilizing the hidden quasiconvexity structure of the problem. It applies to all regression models in exponential family with arbitrary convex scale penalties. Simulations verify its computational efficiency and accuracy. An implementation is provided in R-package "linearQ" on CRAN.

## Authors

• 1 publication
• 8 publications
• 8 publications
• 34 publications
• ### Localised pruning for data segmentation based on multiscale change point procedures

The segmentation of a time series into piecewise stationary segments is ...
10/28/2019 ∙ by Haeran Cho, et al. ∙ 0

• ### Optimal Change-Point Detection and Localization

Given a times series Y in ℝ^n, with a piece-wise contant mean and indepe...
10/22/2020 ∙ by Nicolas Verzelen, et al. ∙ 0

• ### Multiscale change point detection for dependent data

In this paper we study the theoretical properties of the simultaneous mu...
11/14/2018 ∙ by Holger Dette, et al. ∙ 0

• ### Scalable GWR: A linear-time algorithm for large-scale geographically weighted regression with polynomial kernels

While a number of studies have developed fast geographically weighted re...
05/01/2019 ∙ by Daisuke Murakami, et al. ∙ 0

• ### New efficient algorithms for multiple change-point detection with kernels

Several statistical approaches based on reproducing kernels have been pr...
10/12/2017 ∙ by Alain Celisse, et al. ∙ 0

• ### Variational Multiscale Nonparametric Regression: Algorithms and Implementation

Many modern statistically efficient methods come with tremendous computa...
10/20/2020 ∙ by Miguel del Álamo, et al. ∙ 0

• ### A Variational View on Statistical Multiscale Estimation

We present a unifying view on various statistical estimation techniques ...
06/10/2021 ∙ by Markus Haltmeier, et al. ∙ 0

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

In this paper, we assume that observations are independent from the regression model

 Yi∼Fϑ(i/n),i=0,…,n−1 (1)

where is a one-dimensional exponential family distribution with densities . The parametric function is a right-continuous piecewise constant function. The model (1) includes the Gaussian mean regression as a special case, that is,

 Yi∼ϑ(i/n)+σεi,i=0,…,n−1, (2)

where and the standard Gaussian.

The (multiple) change-point problem amounts to estimating the number and locations of change-points and the value of function

on each segment. The study of change-point detection problems has a long and rich history in the statistical literatures (Carlstein et al., 1994; Csörgö and Horvàth, 2011; DavidSiegmund, 2013), and has experienced a revival in recent years, mainly due to modern large scale applications, for example in bioinformatics, predicting transmembrane helix locations (Lio and Vannucci, 2000), detecting changes in the DNA copy number (Olshen et al., 2004; Venkatraman and Olshen, 2007); in climate, analyzing undocumented change-points in climate data (Reeves et al., 2007); and in economics and finance, identifying change-points in financial volatility (Spokoiny, 2009).

Among the vast literature of change-point problems, we consider the so-called multiscale change-point segmentation methods (see e.g. Frick et al., 2014; Li et al., 2016), which are statistically well-understood and meanwhile practically well-performed, see also (Davies et al., 2012; Hotz et al., 2013; Li et al., 2017). These multiscale segmentation methods minimize the number of the change-points subjected to a side-constraint that multiscale statistics does not exceed a specified threshold (see Section 2.1 for a formal definition). The threshold , as a balancing parameter between the data-fit and model complexity, is often chosen as the quantile of under null distribution (e.g., ). Unfortunately, the computation of such a quantile involves the evaluation of , which has quadratic computational time in terms of sample size, and has to be repeated sufficiently many times to guarantee a proper estimation accuracy. This makes the multiscale segmentation methods impractical for large scale applications (e.g, for ). To overcome this computation bottleneck, we proposed, in this paper, a fast algorithm with linear computational complexity for the evaluation of , see Figure 1 for an illustration.

The rest of the paper is organized as follows. In Section 2, we introduce the multiscale change-point segmentation methods, and some basic concepts from algorithmic geometry. In Section 3 we propose a linear algorithm for the evaluation of and give its complexity analysis. The performance of the proposed algorithm is examined by simulations in Section 4. Section 5 concludes this paper.

## 2 Background

### 2.1 Multiscale change-point segmentation

We start with a brief introduction of the multiscale change-point segmentation methods for the change-point problem in (1). Recall that the underlying truth is right-continuous and piecewise constant, i.e.,

 ϑ(t)=M∑m=0θm1[τm,τm+1)(t).

where denote the locations of change-points, and the function value on the th segment with . Let denote the space of all right continuous step functions, and, for every in , let denote the set of change-points and the number of change-points.

The multiscale change-point segmentation estimator is the solution of the optimization problem (Frick et al., 2014; Li et al., 2016, 2017):

 infϑ∈S#V(ϑ)subject to  Tn(Y,ϑ)≤q. (3)

where is a user-specified threshold, and a multiscale statistic. By we denote the collection of all subintervals of . The multiscale statistic is defined as the maximum of penalized local likelihood ratio statistic on every interval where is constant, that is,

 Tn(Y,ϑ)=maxI∈Lϑ(t)=θI, t∈ITI(Y,θ)−pI. (4)

Here the penalty terms play a role as scale (i.e., length of ) calibration, which aim to put different scales on equal baseline especially for small intervals (see Dümbgen and Spokoiny, 2001; Frick et al., 2014). The local likelihood ratio statistic

is a testing statistic on the hypothesis

versus the alternative with on interval , more precisely,

 TI(Y,θ0)= ⎷log(supθ∈Θ∏k/n∈Ifθ(Yk)∏k/n∈Ifθ0(Yk)). (5)

Note that the specific form in (5) is crucial if one wants to use a simple, additive penalty term that yields statistical optimality, as in (4), see Rivera and Walther (2013).

The user-specific threshold in (3

) controls the probability of overestimating and underestimating the number of change-points. From asymptotic analysis, it is sufficient to choose a universal threshold

, see Li et al. (2017). In practice, it is recommended to select the quantile of null asymptotic distribution of with certain significance level , which allows for an immediate statistical interpretation

 P{#V(^ϑ)≤#V(ϑ)}≥1−α

see Frick et al. (2014). Given such choices of , the solution to problem (3) exists but may be non-unique, in which case one is free to choose the solution, such as the constrained maximum likelihood estimator (Frick et al., 2014). Note that the value of can be estimated via Monte Carlo simulations, because the distribution of or its asymptotic distribution (Frick et al., 2014, Theorem 2.1) is independent of the unknown truth .

### 2.2 Constrained Minkowski sum

We now restrict ourselves to the Euclidean space . The Minkowski sum, a fundamental concept in algorithmic geometry, is defined as for . As in Bernholt et al. (2009), we define the constrained Minkowski sum as

 (P+Q)+={x∈P⊕Q|x1>0}with x1 the first coordinate of point x∈R2

By we denote the convex hull of , and by the set of vertices of . Bernholt and Hofmeister (2006), Bernholt et al. (2007, 2009) have shown that can be computed in time, if and are sorted with respect to the first coordinate. More precisely, a set can be computed such that and .

For general (not necessarily ordered) and , the computation of requires runtime, where the additional log factor is due to sorting algorithms. See (e.g. Fukuda, 2004; Weibel, 2007) for the computation of Minkowski sum in with .

### 2.3 Quasiconvexity

We recall some basic results of quasiconvexity, a useful generalization of convexity, see e.g., (Boyd and Vandenberghe, 2004, Section 4 in Chapter 3) for further details and the proofs.

###### Definition 1.

Let be a nonempty convex set. A function is called quasiconvex if its sublevel set is convex for every .

Note that convex functions are clearly quasiconvex, and that many properties of convex functions carry over to quasiconvex functions.

###### Proposition 1.

let be a nonempty convex set. A function is quasiconvex if and only if for any and any it holds that

 f(λ⋅s1+(1−λ)⋅s2)≤max{f(s1),f(s2)}.

## 3 An O(n) method for quantile simulation

In this section, we first consider the computation of quantiles for multiscale change-point segmentation methods. We will show that the evaluation of is equivalent to finding the maximal value of a quasiconvex function over a constrained Minkowski sum.

### 3.1 Fast quantile simulation

We start with the Gaussian mean regression model (2), and the penalty term given in Frick et al. (2014). Note that in model (2) the distribution of is independent of . Thus, it is sufficient to consider

 Tn\coloneqqTn(Y,ϑ≡0)=max1≤i≤j≤n1σ√j−i+1∣∣j∑k=iYk∣∣−√2log(enj−i+1). (6)

The direct evaluation of (6) leads to runtime. As the summation can be viewed as convolution, the evaluation of (6

) can be speeded up by utilizing fast Fourier transforms, resulting in

runtime (which is implemented in CRAN R-package “stepR”), see e.g. Hotz et al. (2013). A further speedup is possible by means of cumulative sum transformation , which reduces a summation over to a single subtraction. This leads to an algorithm of complexity (which is implemented in CRAN R-package “FDRSeg”), see also Allison (2003). In what follows, we will present a fast algorithm for evaluating (6) in a linear runtime, i.e., .

For , we define , and . The evaluation of in (6) can be written as an optimization of a bivariate function over finite collection of points, more precisely,

 Tn=max1≤i≤j≤nh(ℓi,j,si,j)with h(x1,x2)\coloneqq|x2|σ√x1−√2logenx1. (7)
###### Proposition 2.

The bivariate function in (7) is quasiconvex over .

###### Proof.

By Definition 1, it is sufficient to show that sublevel set

 Dα={(x1,x2):|x2|≤σ(α+√2logenx1)√x1, and 0

is convex for all . Define for . Notice that it is trivial when because sublevel set is empty. If is not empty, it follows that . Noting that implies , we have

Thus, is concave, and it follows that is convex for all . ∎

By Proposition 1 we have that the maximal value of in (7) over is attained at the vertices of the convex hull of . To be precise, we define and with the convention that . Note that . It follows that

 Tn=maxx∈(P⊕Q)+h(x)=maxx∈conv(P⊕Q)+h(x)=maxx∈vconv(P⊕Q)+h(x).

Moreover, it is known that there is a linear algorithm for finding (see Section 2.2). Based on it, we can derive a linear algorithm for the evaluation of , the details of which is given in Algorithm 1.

In Algorithm 1, the incremental Graham scan algorithm (Graham, 1972) is employed in first step to compute the convex hull of in runtime on line 2. For each point , we consider in order to satisfy the constraint  (line 9). Among such points, we compute a set that contains the vertices involving in  (line 11-16). After recording to (line 18), we delete and from (line 21), because there is no point in of the form for and , see Bernholt et al. (2009) for a proof. Then the algorithm proceeds recursively; each time we update and . In the end, the set , being a subset of , contains . The maximal value can be obtained on .

As in (6) is independent of , we can always assume . Given realization , we compute via Algorithm 1. The quantile of is computed via repetitions of such a procedure. Thus, the quantile of can be computed in runtime. This is significantly faster than the best existing algorithm, which is of runtime. In general, larger leads to more precise estimation of the quantile. In practice, we find that the estimate is quite stable for (see Figure 2), and thus suggest as the default choice.

### 3.2 Quasiconvexity of exponential family regression

By Proposition 1, a larger class of quasiconvex objective functions attain the maximum over on . In exponential family regression model, the multiscale statistic (4) is independent of . Similar to Gaussian mean model (7), it can be mapped into a bivariate function . Next we will discuss the computation of multiscale statistics for the exponential family regression model (1) and extend Algorithm 1 to a general form.

We assume in model (1), the independent observations ( come from a exponential family with standard density

 fθ(x)=exp{θx−ψ(θ)},

where the cumulant function is strictly convex on . Then the maximum scanning statistics in (5) through exponential family regression can be simplified as:

 TI(Y,θ0)=supθ∈Θj∑k=i{θYk−ψ(θ)}−j∑k=i{θ0Yk−ψ(θ0)}. (8)

Let and as defined before, then the evaluation of in (8) can be written as a supremum over of a bivariate function:

 TI(Y,θ0)=supθ∈Θhθ(ℓi,j,si,j)with hθ(ℓ,s)\coloneqq(θ−θ0)s−ℓ(ψ(θ)−ψ(θ0)). (9)
###### Lemma 1.

The bivariate function in (9) is convex over .

###### Proof.

Since is linear about , it follows that ,

 supθ∈Θhθ((1−λ)ℓ1+λℓ2,(1−λ)s1+λs2) =supθ∈Θ(λhθ(ℓ1,s1)+(1−λ)hθ(ℓ2,s2)) ≤λsupθ∈Θhθ(ℓ1,s1)+(1−λ)supθ∈Θhθ(ℓ2,s2).

By the definition of convex function, is convex on . ∎

The multiscale statistic in (4) is made up of the scanning statistic and a penalty term . The penalty function working as a scale calibration only depends on interval length . So multiscale statistic can be written as in (9) added by a penalty function:

 Tn(Y,θ0)=supθ∈Θhθ(ℓi,j,si,j)−pI(ℓi,j). (10)

By Lemma 1, the bivariate function is convex and it keeps convex if it is substracted by a concave penalty . According to Proposition 1 the maximum of (10) over can be attained on the vertices of the convex hull of . Thus, the optimization of multiscale statistic can also be solved by Algorithm 1. We state this result in Theorem 1.

###### Theorem 1.

The multiscale statistic for exponential family regression model with concave penalty terms can be evaluated in a linear runtime.

In summary, the proposed algorithm is a general method for simulating multiscale statistic from exponential family regression model with convex penalization. It speeds up the existing algorithms to a linear runtime. Meanwhile, the memory space mainly used for storing points is bounded by the number of vertices in , i.e., .

## 4 Simulation study

This section examines the empirical performance of the proposed Algorithm 1. We provide the implementation of the proposed method in R package “linearQ”, available from CRAN.

We start with the Gaussian mean regression in (2), and compare the proposed method with other existing methods. To this end, we consider the Fourier transform based algorithm, implemented in CRAN R package “stepR” (Frick et al., 2014), and the cumulative sum based algorithm, implemented in CRAN R package “FDRSeg” (Li et al., 2016), see Section 3.1

. The simulation data is generated as i.i.d. realizations of standard normal random variables, for different sample sizes ranging from

to . For a given sample size, we repeat times, which is set to . The average computation time for the evaluation of for different methods is reported in Figure 3. It shows that the proposed method is significantly faster than the other two, achieving one order speed-up, with its computation complexity .

In addition, the proposed method applies to every distribution in exponential family provided that the penalty term is convex, see Section 3.2. As a demonstration, we consider the Poisson case, i.e., in (1

) is the Poisson distribution with mean

. Figure 4 illustrates the computation time of evaluating without scale penalty, with the data sizes from to and repetition , via the proposed method. Again the empirical performance supports our theoretical complexity analysis that the computation time is linear in terms of sample size .

## 5 Conclusion

The multiscale change-point segmentation methods are recognized as the-state-of-the-art in change-point inference, and have been playing an important role in various applications. In this paper, we propose a fast algorithm for the computation of the only tuning parameter of such multiscale change-point segmentation methods. The proposed method has a linear computation complexity and a linear memory complexity, in terms of the sample size, in sharp contrast to the existing methods with at least quadratic computation complexity. The crucial idea behind is to transform the original problem into the maximization of a quasiconvex function over a constrained Minkowski sum. The theoretical complexity is well supported by the empirical performance. Extension to general models beyond exponential family is a possible line of future research.

## References

• Allison (2003) Allison, L. (2003). Longest biased interval and longest non-negative sum interval. Bioinformatics, 19(10):1294.
• Bernholt et al. (2007) Bernholt, T., Eisenbrand, F., and Hofmeister, T. (2007). A geometric framework for solving subsequence problems in computational biology efficiently. In Computational geometry (SCG’07), pages 310–318. ACM, New York.
• Bernholt et al. (2009) Bernholt, T., Eisenbrand, F., and Hofmeister, T. (2009). Constrained Minkowski sums: a geometric framework for solving interval problems in computational biology efficiently. Discrete Comput. Geom., 42(1):22–36.
• Bernholt and Hofmeister (2006) Bernholt, T. and Hofmeister, T. (2006). An algorithm for a generalized maximum subsequence problem. In LATIN 2006: Theoretical informatics, volume 3887 of Lecture Notes in Comput. Sci., pages 178–189. Springer, Berlin.
• Boyd and Vandenberghe (2004) Boyd, S. and Vandenberghe, L. (2004). Convex optimization. Cambridge University Press, Cambridge.
• Carlstein et al. (1994) Carlstein, E., M ller, H. G., and Siegmund, D. (1994). Change-point problems. papers from the ams-ims-siam summer research conference held at mt. holyoke college, south hadley, ma, usa, july 11, 16, 1992. Institute of Mathematical Statistics Lecture Notes - Monograph Series, 23.
• Csörgö and Horvàth (2011) Csörgö, M. and Horvàth, L. (2011). Limit theorems in change-point analysis. John Wiley & Sons Ltd Chichester.
• DavidSiegmund (2013) DavidSiegmund (2013). Change-points: From sequential detection to biology and back. Communications in Statistics Part C Sequential Analysis, 32(1):2–14.
• Davies et al. (2012) Davies, L., Höhenrieder, C., and Krämer, W. (2012). Recursive computation of piecewise constant volatilities. Comput. Stat. Data Anal., 56(11):3623 – 3631.
• Dümbgen and Spokoiny (2001) Dümbgen, L. and Spokoiny, V. G. (2001). Multiscale testing of qualitative hypotheses. Ann. Statist., 29(1):124–152.
• Frick et al. (2014) Frick, K., Munk, A., and Sieling, H. (2014). Multiscale change-point inference. J. R. Stat. Soc. Ser. B. Stat. Methodol., with discussion and rejoinder by the authors, 76:495–580.
• Fukuda (2004) Fukuda, K. (2004). From the zonotope construction to the minkowski addition of convex polytopes. Journal of Symbolic Computation, 38(4):1261–1272.
• Graham (1972) Graham, R. L. (1972). An efficient algorith for determining the convex hull of a finite planar set. Information Processing Letters, 1(4):132–133.
• Hotz et al. (2013) Hotz, T., Schutte, O., Sieling, H., Polupanow, T., Diederichsen, U., Steinem, C., and Munk, A. (2013). Idealizing ion channel recordings by a jump segmentation multiresolution filter. IEEE Transactions on Nanobioscience, 12(4):376–386.
• Li et al. (2017) Li, H., Guo, Q., and Munk, A. (2017). Multiscale change-point segmentation: Beyond step functions. arXiv preprint arXiv:1708.03942.
• Li et al. (2016) Li, H., Munk, A., and Sieling, H. (2016). FDR-control in multiscale change-point segmentation. Electron. J. Stat., 10(1):918–959.
• Lio and Vannucci (2000) Lio, P. and Vannucci, M. (2000). Wavelet change-point prediction of transmembrane proteins. Bioinformatics, 16(4):376.
• Olshen et al. (2004) Olshen, A. B., Venkatraman, E. S., Lucito, R., and Wigler, M. (2004). Circular binary segmentation for the analysis of array based dna copy number data. Biostatistics, 5(4):557–572.
• Reeves et al. (2007) Reeves, J., Chen, J., Wang, X. L., Lund, R., and Lu, Q. (2007). A review and comparison of changepoint detection techniques for climate data. Journal of Applied Meteorology & Climatology, 46(6):900.
• Rivera and Walther (2013) Rivera, C. and Walther, G. (2013). Optimal detection of a jump in the intensity of a Poisson process or in a density with likelihood ratio statistics. Scand. J. Stat., 40(4):752–769.
• Spokoiny (2009) Spokoiny, V. (2009). Multiscale local change point detection with applications to value-at-risk. Ann. Statist., 37(3):1405–1436.
• Venkatraman and Olshen (2007) Venkatraman, E. S. and Olshen, A. B. (2007). A faster circular binary segmentation algorithm for the analysis of array cgh data. Bioinformatics, 23(6):657.
• Weibel (2007) Weibel, C. (2007). Minkowski sums of polytopes. Similar Records.