Change-point regression with a smooth additive disturbance

12/07/2021
by Florian Pein, et al.

We assume a nonparametric regression model with signals given by the sum of a piecewise constant function and a smooth function. To detect the change-points and estimate the regression functions, we propose PCpluS, a combination of the fused Lasso and kernel smoothing. In contrast to existing approaches, it explicitly uses the assumption that the signal can be decomposed into a piecewise constant and a smooth function when detecting change-points. This is motivated by several applications and by theoretical results about partial linear models. Tuning parameters are selected by cross-validation. We argue that in this setting minimizing the L1-loss is superior to minimizing the L2-loss. We also highlight important consequences for cross-validation in piecewise constant change-point regression. Simulations demonstrate that our approach has a small average mean square error and detects change-points well, and we apply the methodology to genome sequencing data to detect copy number variations. Finally, we demonstrate its flexibility by combining it with smoothing splines and by proposing extensions to multivariate and filtered data.

1 Introduction

Change-point regression is currently one of the most active research areas in Statistics. It has wide ranging applications, for instance in biochemistry (hotz2013idealizing; pein2017heterogeneous), climatology (reeves2007review), environmental analysis (bolorinos2020consumption), finance (bai2003computation; kim2005structural), genomics (olshen2004circular), medicine (younes2014inferring), network traffic data analysis (lung2012distributed), quality monitoring (d2011incipient) and speech processing (harchaoui2009regularized).

A simple but very common setting considered is the one where the observations are modelled as

(1)   $Y_i = f(x_i) + \varepsilon_i, \qquad i = 1, \ldots, n,$

where $f$ is a piecewise constant signal evaluated at design points $x_1, \ldots, x_n$, $n$ is the number of observations and $\varepsilon_1, \ldots, \varepsilon_n$ are errors with mean zero and variance $\sigma^2$.

In many applications however, while the signal may change abruptly, between these change-points the signal may not be precisely constant. Examples in the literature are from climatology (QiuYandell02; Qiu03; WuZhao07), finance (Wang95; Gijbelsetal07; Lametal16; Liuetal18), light transmittance (abramovich2007estimation) and many more. Instead, many works assume (see Section 1.1) that

(2)   $Y_i = f(x_i) + g(x_i) + \varepsilon_i, \qquad i = 1, \ldots, n,$

where $f$ is piecewise constant, and $g$ is a smooth function representing a systematic disturbance; an example of such a signal is given in Figure 1. The piecewise constant signal $f$ has change-points at $\tau_1 < \cdots < \tau_K$, i.e.

(3)   $f(x) = \sum_{k=0}^{K} c_k \, \mathbf{1}_{(\tau_k, \tau_{k+1}]}(x),$

with $\tau_0$ and $\tau_{K+1}$ denoting the boundaries of the observation interval and $c_{k-1} \neq c_k$ for all $k$. Note that the regression problem is not identifiable as a constant can be added to $f$ and subtracted from $g$ without changing $f + g$. Hence, without loss of generality we impose a normalisation on $g$ that fixes this constant (for instance that its values at the design points sum to zero).

Figure 1: An illustrative example; the black line is the true signal and the grey points are the simulated observations.

An additive decomposition of this form may be motivated by the fact that the systematic disturbance and the piecewise constant component typically have different sources in applications. In this work we will consider two additional examples. Firstly, in Section 5 we detect copy number variations by analysing genome sequencing data, an application to which various change-point regression approaches have been applied before. Here larger abrupt changes indicate copy number variations and are the main interest. However, various biases in the measurements, such as mappability and GC content bias, are a source of systematic disturbances, see liu2013computational. Secondly, in Section 6.3 we analyse patch-clamp recordings, experiments that allow measuring the conductance of a single transmembrane protein over time. Once again the abrupt changes are the main interest as they indicate for instance a blocking of the protein or a change of conformation. However, oscillations caused by the electronics or by building vibrations can be seen as systematic disturbances. Furthermore, these recordings require dealing with filtered observations.

1.1 Related work

Model (2), in which only the function value itself changes abruptly at a change-point location while all derivatives (if they exist) are not allowed to change abruptly, may be contrasted with the more general model in which the signal is assumed to be piecewise smooth, i.e. all derivatives may change abruptly at a change-point location. Both models have been considered in numerous papers using several approaches, mostly from the perspective of extending smoothing approaches, such as kernel regression (Mueller92; Qiu94; eubank1994nonparametric; Qiu03; GijbelsGoderniaux04), local polynomial regression (Loader96; Spokoiny98; Gjbelsetal04; Gijbelsetal07; Mboupetal08; xia2015jump; Zhang16), (smoothing) splines (Koo97; Lee02; MiyataShen03; HuangQui10; YangSong14; Liuetal18), wavelet (Raimondo98; AntoniadisGijbels02; abramovich2007estimation) and Bayesian (Denisonetal98; Ogden99; Dimatteoetal01; Punskayaetal02; Fearnhead05; Morenoetal13; Lee18) approaches. A helpful review is given in (qiu2005image).

To the best of our knowledge, (almost) all existing approaches assume, either explicitly or implicitly, the more general piecewise smooth regression model when detecting change-points. A decomposition (if assumed) is only used in a second step to re-estimate the smooth signal by smoothing the observations with the piecewise constant signal subtracted. The only notable exception is the work by eubank1994nonparametric, who estimated the location and size of a single change-point in the derivative of an otherwise smooth function by using a semi-parametric framework similar to the one which we will use in Section 1.2 to motivate our methodology. In this paper, we propose an approach that explicitly uses the decomposition for detecting multiple change-points and for estimating the regression functions, in order to increase detection power and estimation accuracy, as motivated in the following section.

1.2 Motivation

Let us consider estimating the size of a single change-point at a fixed location in the presence of a smooth regression function. The more classical approaches listed above solve this problem by taking the difference between regression estimates to the right and left of the change-point location. However, if we assume the decomposition, i.e. that the smooth regression function does not change abruptly at the change-point location, we can write the problem as a partial linear model $\mathbf{Y} = \mathbf{X}\beta + \mathbf{g} + \boldsymbol{\varepsilon}$, where the regression matrix $\mathbf{X}$ is given by the single regressor $(0, \ldots, 0, 1, \ldots, 1)^\top$, with zeros to the left of the change-point and ones to the right, and the coefficient $\beta$ is the (unknown) size of the change-point.

Early approaches, e.g. the partial spline (wahba1984partial; wahba1986partial; engle1986seminparametric; shiau1986partial), can be written as

(4)   $\hat{\beta} = \big\{\mathbf{X}^\top (\mathbf{I}_n - \mathbf{S}) \mathbf{X}\big\}^{-1} \mathbf{X}^\top (\mathbf{I}_n - \mathbf{S}) \mathbf{Y},$

with $\mathbf{S}$ a smoothing matrix and $\mathbf{I}_n$ the $n$-dimensional identity matrix, see (speckman1988kernel; shiau1988rates). The estimator for $\beta$ is linear in $\mathbf{Y}$ and is equal to the difference of positively weighted sums of the observations to the right and left of the change-point, hence of the same form as the classical piecewise smooth change-point estimators. However, undersmoothing is required in general to achieve the parametric minimax rate for $\hat{\beta}$, see (rice1986convergence; shiau1988rates; chen1991two). In comparison, the least squares estimator

(5)   $\hat{\beta} = \operatorname*{arg\,min}_{\beta} \; \big\|(\mathbf{I}_n - \mathbf{S})(\mathbf{Y} - \mathbf{X}\beta)\big\|_2^2$

from (denby1984smooth; speckman1988kernel) does not require undersmoothing to achieve this rate and often has a better finite sample performance, see speckman1988kernel; shiau1988rates; chen1991two. This aligns with the intuition that assuming the decomposition allows us to combine information about the smooth signal from the left and right side of a potential change-point, which improves detection power and estimation accuracy.
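To make the contrast between (4) and (5) concrete, the following base-R sketch computes both estimators for a single change-point at a known location. It assumes the forms of (4) and (5) as reconstructed above, uses a crude moving-average smoother as a stand-in for S, and all variable names are our own illustration, not part of the PCplusS package.

```r
# Sketch: partial-spline estimator (4) vs. least-squares estimator (5) for the
# size of a single change-point at a known location (moving-average smoother S).
set.seed(1)
n   <- 200
x   <- (1:n) / n
tau <- 0.5                                  # known change-point location
Y   <- 2 * (x > tau) + sin(2 * pi * x) + rnorm(n, sd = 0.3)

X <- matrix(as.numeric(x > tau), ncol = 1)  # single step regressor

# crude moving-average smoothing matrix (window of +/- 10 observations)
S <- matrix(0, n, n)
for (i in 1:n) {
  idx <- max(1, i - 10):min(n, i + 10)
  S[i, idx] <- 1 / length(idx)
}
I <- diag(n)

# (4): partial-spline type estimator
beta_partial_spline <- solve(t(X) %*% (I - S) %*% X, t(X) %*% (I - S) %*% Y)
# (5): least-squares estimator; the smoother is applied to both X and Y
Xt <- (I - S) %*% X
Yt <- (I - S) %*% Y
beta_least_squares <- solve(t(Xt) %*% Xt, t(Xt) %*% Yt)

c(beta_partial_spline, beta_least_squares)  # both should be close to the true jump size 2
```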

Organisation of this work

In Section 2 we propose the Piecewise Constant plus Smooth regression estimator, PCpluS, to estimate change-points and regression functions in (2), using the decomposition into a piecewise constant and a smooth function explicitly. Motivated by the previous section, our estimator combines the least squares estimator in (5) with a total-variation penalty to ensure sparse changes, see (10). Two post-filter steps to improve estimation of the number of change-points and their jump sizes are explained in Section 2.1. These steps are summarised in Algorithm 1. Tuning parameter selection by cross-validation is discussed in Section 2.2. Notably, we use the L1-loss instead of the standard L2-loss as the cross-validation criterion. This follows the recommendation of pein2021crossvalidation, where it is shown that the L2-loss is problematic for cross-validation in change-point regression. Moreover, we will argue that in our setting, where a bandwidth also has to be selected, the issue is even worse.

Computational details are given in Section 3. In Section 4 we evaluate PCpluS and compare it with existing methods in various simulation settings, and in Section 5 we apply it to genome sequencing data to detect copy number variations. The paper concludes with a discussion in Section 6 and an outlook which illustrates the flexibility of our approach: we detail in Section 6.1 an alternative using smoothing splines and outline in Sections 6.2 and 6.3 extensions to multivariate and filtered data.

We will use bold letters for vectors and matrices. We denote by $\mathbf{f}$, $\mathbf{g}$, $\mathbf{h}$ and $\mathbf{Y}$ the vectors $(f(x_1), \ldots, f(x_n))^\top$, $(g(x_1), \ldots, g(x_n))^\top$, $(f(x_1) + g(x_1), \ldots, f(x_n) + g(x_n))^\top$ and $(Y_1, \ldots, Y_n)^\top$.

2 Methodology

To estimate $f$ and $g$ in (2) we propose PCpluS, the Piecewise Constant plus Smooth regression estimator. It is a combination of the fused Lasso and kernel smoothing and, to the best of our knowledge, the first approach that explicitly takes into account the decomposition in (2) when detecting change-points, as motivated in Section 1.2. In (2) we know neither the number of change-points nor their locations. Hence, we have to consider all regressors simultaneously. To ensure sparsity, i.e. not fitting a change-point everywhere, we penalize (5) by a total-variation penalty, see (10).

To motivate our estimator further, let us assume for a moment that the piecewise constant function $f$ is known. Then $Y_i - f(x_i)$, $i = 1, \ldots, n$, are observations of a usual smoothing problem. Hence, we can estimate $g$ by kernel smoothing using the Nadaraya-Watson estimator (nadaraya1964estimating; watson1964smooth; bierens1996topics). Let $K$ be a kernel function and $h > 0$ a bandwidth. By default, we use the Epanechnikov kernel $K(x) = \tfrac{3}{4}(1 - x^2)_+$. Then the Nadaraya-Watson estimator is defined as

(6)   $\hat{g}(x) = \dfrac{\sum_{i=1}^{n} K\big((x - x_i)/h\big)\big(Y_i - f(x_i)\big)}{\sum_{i=1}^{n} K\big((x - x_i)/h\big)}.$

This means that for any $\mathbf{f}$ we can estimate $\mathbf{g}$ by

(7)   $\hat{\mathbf{g}} = \mathbf{S}(\mathbf{Y} - \mathbf{f}),$

where $\mathbf{S}$ is a (kernel) smoothing matrix with entries

(8)   $\mathbf{S}_{ij} = \dfrac{K\big((x_i - x_j)/h\big)}{\sum_{l=1}^{n} K\big((x_i - x_l)/h\big)}.$
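As a concrete illustration of (6)-(8), the following base-R sketch (with our own helper names, not code from the PCplusS package) builds the Epanechnikov smoothing matrix $\mathbf{S}$ and estimates $g$ via (7) on simulated data with a known $f$.

```r
# Epanechnikov kernel and Nadaraya-Watson smoothing matrix S of (8)
epanechnikov <- function(u) pmax(0.75 * (1 - u^2), 0)

nw_smoothing_matrix <- function(x, h) {
  W <- outer(x, x, function(a, b) epanechnikov((a - b) / h))
  W / rowSums(W)                         # row i: kernel weights summing to 1
}

# Toy data: known piecewise constant f plus smooth g plus noise
set.seed(1)
n <- 300
x <- (1:n) / n
f <- ifelse(x > 0.5, 1.5, 0)
g <- 0.8 * sin(2 * pi * x)
Y <- f + g + rnorm(n, sd = 0.2)

S     <- nw_smoothing_matrix(x, h = 0.08)
g_hat <- as.vector(S %*% (Y - f))        # equation (7): g_hat = S (Y - f)
```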

On the other hand, if $g$ is known, then $Y_i - g(x_i)$, $i = 1, \ldots, n$, may be considered as observations in a standard piecewise constant change-point regression problem. Hence, $\mathbf{f}$ can be estimated by the fused Lasso (tibshirani2005sparsity; lin2017sharp)

(9)   $\hat{\mathbf{f}} = \operatorname*{arg\,min}_{\mathbf{f}} \; \tfrac{1}{2}\,\|\mathbf{Y} - \mathbf{g} - \mathbf{f}\|_2^2 + \lambda\,\|\mathbf{f}\|_{\mathrm{TV}},$

where $\lambda > 0$ is a tuning parameter, $\|\cdot\|_2$ the Euclidean norm and $\|\mathbf{f}\|_{\mathrm{TV}} = \sum_{i=2}^{n} |f(x_i) - f(x_{i-1})|$ the total variation norm.

In our situation $f$ and $g$ are both unknown, so we combine the two approaches by replacing $\mathbf{g}$ in (9) by the functional from (7). We obtain

(10)   $\hat{\mathbf{f}} = \operatorname*{arg\,min}_{\mathbf{f}} \; \tfrac{1}{2}\,\big\|(\mathbf{I}_n - \mathbf{S})(\mathbf{Y} - \mathbf{f})\big\|_2^2 + \lambda\,\|\mathbf{f}\|_{\mathrm{TV}},$

where $\mathbf{I}_n$ is the $n$-dimensional identity matrix and a constraint fixing the additive constant between $\mathbf{f}$ and $\mathbf{g}$ is added to ensure identifiability. We remark that norouzirad2019differenced proposes an estimator of a similar form, with $\mathbf{I}_n - \mathbf{S}$ replaced by a difference matrix $\mathbf{D}$, for estimating the parametric component in a partial linear model under sparsity assumptions.

Finally, the smooth signal is estimated by

(11)   $\hat{\mathbf{g}} = \mathbf{S}(\mathbf{Y} - \hat{\mathbf{f}}).$

The tuning parameter $\lambda$ and the bandwidth $h$ will be chosen by cross-validation as detailed in Section 2.2. An example is given in Figure 2.

Figure 2: Fit (red line) of the data in Figure 1 using the approach from Section 2 without post-filtering. It detects the change-point correctly and fits the smooth parts well, but underestimates the size of the change.

2.1 Post-filtering

We found in Figure 2 that (10) already works quite well, but in general we observe two systematic flaws. Firstly, the sizes of change-points are underestimated because of the shrinkage effect of the total variation penalty in (10). Secondly, false positives of small jump size are detected, particularly when the bandwidth is large. This phenomenon is well known for the cross-validated fused Lasso. To deal with these issues, we propose two post-filter steps.

The fused Lasso can be preconditioned (qian2016stepwise) such that the number of change-points is not systematically over-estimated, but this comes at the price of a smaller detection power. This is done by rewriting the fused Lasso as an ordinary Lasso problem and applying a transformation to the regression matrix. In (10) the smoothing matrix $\mathbf{S}$ acts as a form of preconditioning and reduces the detection of false positives. Simulations confirm that this works effectively when the chosen bandwidth is small. Once again the detection power is reduced. However, in our setting this is in any case unavoidable, since the smooth signal component has to be taken into account.

To correct for false positives even further, we re-estimate the change-points in a first post-filter step by applying PELT (killick2012optimal) to the observations minus the estimated smooth signal, $\mathbf{Y} - \hat{\mathbf{g}}$. This is motivated by the fact that, unless larger errors occurred in the previous step, the vector $\mathbf{Y} - \hat{\mathbf{g}}$ is similar to the vector $\mathbf{f} + \boldsymbol{\varepsilon}$ and hence can be treated as a piecewise constant regression problem. More precisely, we re-estimate $\mathbf{f}$ by

(12)   $\hat{\mathbf{f}} = \operatorname*{arg\,min}_{\mathbf{f}} \; \|\mathbf{Y} - \hat{\mathbf{g}} - \mathbf{f}\|_2^2 + \gamma\,K(\mathbf{f}),$

where $K(\mathbf{f})$ denotes the number of change-points of $\mathbf{f}$ and $\gamma > 0$ is the penalty per change-point. When the variance $\sigma^2$ is unknown, we pre-estimate it by the difference based estimator

(13)   $\hat{\sigma} = \dfrac{\operatorname{median}\big(|Y_2 - Y_1|, \ldots, |Y_n - Y_{n-1}|\big)}{\sqrt{2}\,\Phi^{-1}(3/4)},$

with $\Phi^{-1}$ the quantile function of the standard Gaussian distribution.
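Assuming the form of (13) reconstructed above (the median of absolute successive differences, rescaled to be consistent for Gaussian errors), the estimator is a one-liner in R; the helper name below is our own and the example illustrates why the estimator is robust to change-points.

```r
# Difference-based robust estimator of sigma, as in (13):
# median absolute successive difference rescaled by sqrt(2) * qnorm(3/4)
sigma_diff <- function(Y) {
  median(abs(diff(Y))) / (sqrt(2) * qnorm(0.75))
}

# Example: the estimator is barely affected by a few large jumps
set.seed(1)
Y <- c(rnorm(500, mean = 0), rnorm(500, mean = 5), rnorm(500, mean = -3))
sigma_diff(Y)   # close to 1, despite two change-points
sd(Y)           # heavily inflated by the jumps
```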

One could use any other piecewise constant change-point estimator instead. However, we decided for PELT not only because it is a fast and well-working method, but also because the fused Lasso can be interpreted as a convex relaxation of the $\ell_0$-penalised problem that PELT optimises. In fact, it would be desirable to replace the fused Lasso penalty in (10) by such an $\ell_0$-penalty, but the resulting non-convex optimisation problem is computationally challenging.

This first post-filter step is particularly important when the selected bandwidth is large, since then the preconditioning effect of the smoothing matrix is small, and even non-existent if the selected bandwidth is infinite, in which case the initial step of PCpluS is equivalent to applying the fused Lasso. However, when the bandwidth is large, a potential overestimation of the number of change-points does not affect the estimation of the smooth function much, only that of the piecewise constant function, which however is not used in the following post-filter step. Hence, the post-filter step is able to correct for this.

Secondly, the underestimation of the jump sizes was caused by the total variation penalty. A penalty is not required when we know the number of change-points (and their locations). Hence, in the second post-filter step we repeat (10) without penalisation, but with changes restricted to the change-point locations estimated by PELT. This step is equivalent to (5), but with (potentially) multiple change-points and hence multiple regressors. More precisely, let $\hat{\tau}_1 < \cdots < \hat{\tau}_{\hat{K}}$ be the estimated change-points and $\hat{\mathbf{X}}$ the corresponding regression matrix whose $k$-th column is the step regressor with zeros up to $\hat{\tau}_k$ and ones afterwards. Then, we re-estimate $\mathbf{f}$ and $\mathbf{g}$ by

(14)   $\hat{\boldsymbol{\beta}} = \operatorname*{arg\,min}_{\boldsymbol{\beta}} \; \big\|(\mathbf{I}_n - \mathbf{S})(\mathbf{Y} - \hat{\mathbf{X}}\boldsymbol{\beta})\big\|_2^2, \qquad \hat{\mathbf{f}} = \hat{\mathbf{X}}\hat{\boldsymbol{\beta}},$

and once again $\hat{\mathbf{g}} = \mathbf{S}(\mathbf{Y} - \hat{\mathbf{f}})$. The resulting estimator of this three-step procedure will be called PCpluS. Its steps are summarised in Algorithm 1. An exemplary fit of the observations in Figure 1 by PCpluS is shown in Figure 3. It detects the change-point correctly and estimates the size of the change and the smooth function well.
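The second post-filter step is a small least squares fit once the change-point locations are fixed. A minimal base-R sketch, assuming the reconstruction of (14) given above and using our own function names (this is not the PCplusS implementation), is the following.

```r
# Sketch of the post-filter step (14): re-estimate jump sizes by least squares
# after applying (I - S) to the step regressors and the observations.
refit_given_changepoints <- function(Y, S, cp) {
  n <- length(Y)
  # one step regressor per estimated change-point (given as observation indices)
  X <- vapply(cp, function(t) as.numeric(seq_len(n) > t), numeric(n))
  A <- (diag(n) - S) %*% X
  b <- (diag(n) - S) %*% Y
  beta  <- solve(t(A) %*% A, t(A) %*% b)   # unpenalised least squares
  f_hat <- as.vector(X %*% beta)           # piecewise constant part (up to a constant)
  g_hat <- as.vector(S %*% (Y - f_hat))    # smooth part, as in (11)
  list(f = f_hat, g = g_hat, jump_sizes = as.vector(beta))
}

# Example call with a smoothing matrix S and change-points at indices 100 and 250:
# fit <- refit_given_changepoints(Y, S, cp = c(100, 250))
```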

Figure 3: Fit (red line) of the data in Figure 1 by PCpluS. The single change-point is correctly identified and the signal is estimated with high accuracy.
1: Input: observations $Y_1, \ldots, Y_n$ satisfying model (2); tuning parameter $\lambda$ and bandwidth $h$ (can be selected by cross-validation, see Section 2.2).
2: Output: the estimated signal $\hat{\mathbf{f}} + \hat{\mathbf{g}}$.
3: Fused Lasso combined with kernel estimation: compute $\hat{\mathbf{f}}$ by solving (10).
4: Estimate the smooth component by $\hat{\mathbf{g}} = \mathbf{S}(\mathbf{Y} - \hat{\mathbf{f}})$, see (11).
5: Post-filter step: piecewise constant fit of $\mathbf{Y} - \hat{\mathbf{g}}$ by PELT, see (12).
6: Obtain the set of estimated change-points $\hat{\tau}_1 < \cdots < \hat{\tau}_{\hat{K}}$ of the fit from step 5.
7: Post-filter step: estimate $\mathbf{f}$ without penalty, but with the given change-point set, see (14).
8: return $\hat{\mathbf{f}} + \hat{\mathbf{g}}$
Algorithm 1: Overview of the steps of PCpluS.

2.2 Cross-validation

We use the same bandwidth / smoothing matrix in (10) and (14) to simplify their selection and to reduce the computation time. Hence, PCpluS has two tuning parameters: the bandwidth $h$ of the smoothing matrix and the fused Lasso penalty $\lambda$. Both parameters are selected jointly by cross-validation minimizing the L1-loss, and by default we use 5-fold cross-validation.

Using the L1-loss instead of the L2-loss is not standard for cross-validation, but it is shown in pein2021crossvalidation that it avoids problems that can occur in change-point regression due to single points in the hold-out set before or after large changes dominating the whole cross-validation criterion. In the following we explain why this issue is even worse in our setting, where we are also selecting a bandwidth.

Consider a piecewise constant signal with a single large change of size $\Delta$ in the middle. Clearly, a large bandwidth would be desirable, ideally infinitely large, which would correspond to a simple piecewise constant regression. However, in this simple setting, cross-validation with L2-loss is likely to select a very small bandwidth, as we now explain. If the point following the large change is in the hold-out set, it will be wrongly estimated to have the mean of the preceding observations and incur an error of roughly $\Delta$. On the other hand, when a very small bandwidth is used, the out-of-sample prediction will be the average of the observations on either side of the change-point, which will then incur an error of roughly $\Delta/2$. Thus in total, we can expect that the cross-validation criterion will be at least of order $\Delta^2$ for the piecewise constant regression, and of order $\Delta^2/4$ plus an additional contribution from the higher variance of the regression function estimate due to the small bandwidth. Then, in the ostensibly straightforward setting where $\Delta$ is very large, the former can dominate the latter, and result in cross-validation incorrectly selecting a small bandwidth.
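The following toy arithmetic (a sketch of the argument above, with an arbitrary jump size Delta) shows why the L2-loss exaggerates the difference between the two fits much more than the L1-loss does.

```r
# Contribution of the single hold-out point just after a large change of size Delta
Delta <- 10

# bandwidth = infinity (piecewise constant regression): the point is predicted by
# the level before the change, so the prediction error is about Delta;
# very small bandwidth: the prediction is the average of the neighbouring
# observations on either side of the change, so the error is about Delta / 2
l2 <- c(infinite_bandwidth = Delta^2, small_bandwidth = (Delta / 2)^2)
l1 <- c(infinite_bandwidth = Delta,   small_bandwidth = Delta / 2)

l2  # 100 vs 25: with L2-loss the gap grows quadratically in Delta, so for large
    # Delta it dominates the extra variance caused by the small bandwidth
l1  # 10 vs 5: with L1-loss the gap grows only linearly in Delta
```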

3 Computation

In this section we detail the computation of PCpluS and also briefly discuss its computation time. PCpluS is implemented in the R package PCplusS, which is available on CRAN. Run-time intensive code is written in C++ and interfaced.

The modified fused Lasso estimator in (10) can be written as a standard Lasso problem with regression matrix $(\mathbf{I}_n - \mathbf{S})\mathbf{X}_{-1}$ and observation vector $(\mathbf{I}_n - \mathbf{S})\mathbf{Y}$, where $\mathbf{X}_{-1}$ is the matrix $\mathbf{X}$ without its first column and $\mathbf{X}$ is defined by $\mathbf{X}_{ij} = 1$ if $i \geq j$ and $\mathbf{X}_{ij} = 0$ otherwise. See also the detailed description in (qian2016stepwise) of how the standard fused Lasso can be rewritten as a Lasso problem. We define $\tilde{\mathbf{X}} := (\mathbf{I}_n - \mathbf{S})\mathbf{X}_{-1}$. The matrix $\tilde{\mathbf{X}}^\top\tilde{\mathbf{X}}$ is a banded matrix whose number of sub- and superdiagonals is determined by the bandwidth. Its entries can be computed explicitly using cumulative sums, since $\mathbf{S}$ and $\mathbf{X}$ have a simple systematic structure. The vector $\mathbf{S}\mathbf{Y}$ and the columns of $\mathbf{S}\mathbf{X}_{-1}$ result from usual kernel smoothing and can be computed by sums instead of matrix-vector multiplications.
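The rewriting described above can be sketched in a few lines of R. The snippet below is purely illustrative: it builds the lower-triangular matrix X, the transformed design (I - S)X without its first column, and the transformed response, and solves the resulting Lasso with glmnet for a single, arbitrarily chosen lambda. The actual PCplusS package uses its own C++ routines and the banded-matrix structure instead.

```r
# Sketch: the modified fused Lasso (10) as an ordinary Lasso problem
library(glmnet)

set.seed(1)
n <- 200
x <- (1:n) / n
Y <- ifelse(x > 0.5, 2, 0) + sin(2 * pi * x) + rnorm(n, sd = 0.3)

# Epanechnikov smoothing matrix S (see Section 2), bandwidth h
epa <- function(u) pmax(0.75 * (1 - u^2), 0)
h <- 0.1
W <- outer(x, x, function(a, b) epa((a - b) / h))
S <- W / rowSums(W)

# X[i, j] = 1 if i >= j; dropping the first column removes the unpenalised constant
X <- lower.tri(diag(n), diag = TRUE) * 1
A <- (diag(n) - S) %*% X[, -1]           # Lasso regression matrix
b <- as.vector((diag(n) - S) %*% Y)      # Lasso observation vector

fit   <- glmnet(A, b, lambda = 0.05, intercept = FALSE, standardize = FALSE)
theta <- as.matrix(coef(fit))[-1, 1]     # estimated jump sizes (glmnet's intercept dropped)
f_hat <- as.vector(X[, -1] %*% theta)    # piecewise constant fit, up to a constant
which(theta != 0)                        # positions with non-zero jumps (candidate change-points)
```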

(12) is computed by a pruned dynamic program, see killick2012optimal; killick2014changepoint; maidstone2017optimal. The optimisation problem (14) is an ordinary least squares regression problem with regression matrix $(\mathbf{I}_n - \mathbf{S})\hat{\mathbf{X}}$, where $\hat{\mathbf{X}}$ denotes the matrix $\mathbf{X}$ restricted to the columns corresponding to the estimated change-points, and observation vector $(\mathbf{I}_n - \mathbf{S})\mathbf{Y}$. Hence, we have to compute

(15)   $\hat{\boldsymbol{\beta}} = \big\{\hat{\mathbf{X}}^\top(\mathbf{I}_n - \mathbf{S})^\top(\mathbf{I}_n - \mathbf{S})\hat{\mathbf{X}}\big\}^{-1}\hat{\mathbf{X}}^\top(\mathbf{I}_n - \mathbf{S})^\top(\mathbf{I}_n - \mathbf{S})\mathbf{Y}.$

This can be computed quickly, since $\hat{\mathbf{X}}^\top(\mathbf{I}_n - \mathbf{S})^\top(\mathbf{I}_n - \mathbf{S})\hat{\mathbf{X}}$ is only a $\hat{K} \times \hat{K}$ dimensional matrix. Moreover, it is often a sparse (block) matrix, whose $(i,j)$-entry is zero if the corresponding change-points are sufficiently far apart relative to the bandwidth.

The cross-validation criterion is optimised by a grid search. For the bandwidth we use an exponential grid with a fixed default number of entries, plus the value infinity, which is interpreted as piecewise constant regression by an ordinary fused Lasso. For each bandwidth, an exponential grid with a default number of entries is searched for the penalty $\lambda$. Explicit formulas for the construction of the matrix $\tilde{\mathbf{X}}$ are a little more involved in the case of $k$-fold cross-validation than for the estimator on the full data, but all entries can still be calculated efficiently.

The computation time is dominated by the computation of (10). The complexity of this Lasso optimisation problem increases quadratically in the number of observations. To give a rough impression, cross-validation takes roughly a few seconds for moderately sized data sets and around a minute for larger ones on a standard laptop. Computation of the estimator for fixed tuning parameters is much faster, taking around a second for the larger data sets, but it still increases quadratically. Ideas for faster computation are discussed in Section 6.

4 Simulations

In this section we investigate the performance of PCpluS in three Monte-Carlo simulations. In Section 4.1 we revisit the setting from (abramovich2007estimation), which consists of four signals that can be decomposed into a piecewise constant plus a smooth function. In Section 4.2 we revisit the setting from (olshen2004circular), which considers a piecewise constant function plus smooth artefacts. Finally, we examine in Section 4.3 how robust PCpluS is against a violation of the assumption that the signal can be decomposed into a piecewise constant plus a smooth function. All simulations are performed in R and repeated many times, with fewer repetitions for the settings with the largest numbers of observations.

Methods

For PCpluS we call the function PCplusS in the R-package PCplusS with its default parameters. In particular, this means that we use 5-fold cross-validation to select the bandwidth $h$ and the fused Lasso penalty $\lambda$ simultaneously.

A systematic comparison with existing piecewise smooth regression approaches is difficult, since to the best of our knowledge software is not publicly available for any existing approach, and asking the authors of several publications was not successful either. Hence, we implemented ourselves the method from (xia2015jump), which is based on a jump information criterion and hence is abbreviated by JIC in the following. We followed their suggestion of a constant default bandwidth.

Additionally, we include piecewise constant change-point methods and kernel smoothing approaches. They can be interpreted as baseline measures. We include PELT (killick2012optimal) and the fused Lasso (tibshirani2005sparsity), with tuning parameter selection as for PCpluS. And we use the Nadaraya-Watson estimator, selecting its bandwidth either by the same 5-fold cross-validation approach as for PCpluS or by the usual leave-one-out cross-validation with L2-loss.

Evaluation

We mainly evaluate the estimation of the signal by the averaged mean square error $\frac{1}{n}\sum_{i=1}^{n}\big(\hat{f}(x_i) + \hat{g}(x_i) - f(x_i) - g(x_i)\big)^2$, averaged over all repetitions.

Moreover, we assess the estimation of the change-points. To this end, let $K$ be the true number of change-points and $\hat{K}$ the estimated number of change-points. We report the bias $\hat{K} - K$, the relative frequencies with which $\hat{K} - K$ falls into several categories ranging from strong underestimation to strong overestimation, as well as the average percentage of true change-points that are detected (averaged over all true changes). To this end, we say a change-point is detected if the distance between the true change-point location and a detected change-point is less than a given tolerance. For this analysis we have chosen the tolerance to be the minimum of a fixed value and half of the smallest distance between two change-points (the latter being the binding one for the blocks signal). Note that for reasons of clarity and comprehensibility we sometimes omit some methods when evaluating change-point detection accuracy. Omitted methods perform poorly in the considered setting because they are not designed for it. All results are averaged over all repetitions.
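As a sketch of the two evaluation criteria just described (with our own helper names, hypothetical rather than taken from the simulation code), the mean square error and the detection rate can be computed as follows.

```r
# Mean square error of an estimated signal against the true signal
# (averaged over repetitions to obtain the averaged mean square error)
mse <- function(estimate, truth) mean((estimate - truth)^2)

# Proportion of true change-points that are detected: a true change-point counts
# as detected if some estimated change-point lies within the given tolerance
detection_rate <- function(true_cp, est_cp, tolerance) {
  if (length(true_cp) == 0) return(NA)
  detected <- vapply(true_cp,
                     function(cp) any(abs(est_cp - cp) < tolerance),
                     logical(1))
  mean(detected)
}

# Example with locations given as observation indices
true_cp <- c(100, 250, 400)
est_cp  <- c(103, 390, 405)
detection_rate(true_cp, est_cp, tolerance = 10)   # 2 of 3 detected -> 0.667
```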

4.1 Smooth plus piecewise constant signal

In this section we repeat and extend the simulation study from (abramovich2007estimation). They considered four different signals: blocks, burt, cosine and heavisine. These signals are visualised in Figure 4 and a formal definition can be found in their publication. All of these signals satisfy our model (2) of a piecewise constant function plus a smooth function. Most of them involve a significant smooth component and hence we will focus mainly on the averaged mean square error to assess the estimates, but we also look briefly at the estimation of the change-points.

The standard deviation of the errors is set to the value obtained by applying the empirical standard deviation formula to the signal and dividing by a signal-to-noise ratio. abramovich2007estimation considered two sample sizes and several signal-to-noise ratios; in our simulations we extend this to further combinations of sample size and signal-to-noise ratio. When available, we also report the results of ABS1, the method proposed by abramovich2007estimation, and of a variant that selects its tuning parameter by generalised cross-validation. However, one should keep in mind that those results are obtained in the same simulation setting but not in the same simulation study. Results on the averaged mean square error for one combination are shown in Table 1; results for the other combinations are shown in Tables 6-10 in Appendix A.1 to keep this section readable. Results on the detection of change-points are shown in Table 11 in Appendix A.1. We omit the detection results for the remaining combinations, since they allow qualitatively the same comparison between the methods and all methods improve rather similarly with increasing sample size and signal-to-noise ratio (decreasing noise).

Figure 4: Visualisation of the piecewise constant plus smooth signals: (a) blocks, (b) burt, (c) cosine, (d) heavisine.
Method blocks burt cosine heavisine
0.03185 1.052 0.2761 0.08788
1.833 23.84 0.5638 1.588
1.52 0.87 0.33 0.35
0.05983 5.87 0.7236 0.3701
0.1257 2.378 0.2475 0.1319
0.2163 2.214 0.5764 0.08472
0.2155 2.125 0.5571 0.08241
Table 1: Averaged mean square errors.

We find in Tables 6-10 that PCpluS is most of the time the method with the smallest averaged mean square error, and when it is not, it is always close to the best method. It outperforms the other piecewise smooth regression approaches (JIC and ABS1), with the exception of ABS1 for the burt signal. Remarkably, PCpluS has an even smaller averaged mean square error than PELT for the blocks signal, a piecewise constant regression problem for which PELT is designed, unless the noise is large. This can be explained by the fact that we use cross-validation to select our parameters, while the PELT penalty aims primarily for a correct estimation of the change-points. The fused Lasso is only competitive for the cosine signal and when the noise is not too large. Kernel smoothing works well for the heavisine signal (a signal with only two smaller jumps) and outperforms PCpluS when the noise is large. We also want to stress that the choice of the cross-validation scheme has only a small influence, but leave-one-out cross-validation always leads to slightly better results.

Table 11 shows that PCpluS detects change-points well; only the change-points in the heavisine signal are often missed. However, it also shows a tendency to slightly overestimate the number of change-points. This is not a surprise, since tuning parameters are selected by cross-validation and hence we cannot expect model consistency. Nonetheless, JIC outperforms our method only for the cosine signal, a signal which favours its bandwidth choice. Overall, the simulation clearly shows that a default bandwidth does not lead to good results and that tuning parameter selection is still a challenging research topic when one aims to detect change-points well in a piecewise smooth signal or in a piecewise constant plus smooth signal. Unsurprisingly, PELT outperforms PCpluS when the signal is piecewise constant, but heavily overestimates the number of change-points when it is not. And, as already stated in the literature, the cross-validated fused Lasso heavily overestimates the number of change-points, even when the true signal is piecewise constant.

4.2 Piecewise constant signal plus smooth artefacts

In this section we consider again signals that can be decomposed into a piecewise constant function and a smooth function. However, the focus is now on the piecewise constant component, while the smooth component has a rather small amplitude, i.e. one might see it as a (small) 'deterministic noise' that disturbs, in addition to the usual noise, the otherwise piecewise constant signal.

To this end, we revisit and extend the simulation study from (olshen2004circular). Motivated by genome sequencing, they investigated the robustness of change-point approaches to smooth artefacts. This setting was later picked up by several piecewise constant change-point papers, e.g. (zhang2007modified; niu2012screening; frick2014multiscale). We simulate observations with a fixed standard deviation. The piecewise constant signal has six change-points, with locations and function levels as in the original study. The smooth artefacts are given by a sine wave whose size and length (long and short artefacts / trends) are controlled by two parameters $a$ and $b$, which we vary. Note that some of the parameter combinations were not considered in the previous studies. An example is presented in Figure 5. Simulation results are given in Tables 2 and 3.

Figure 5: The piecewise constant signal plus smooth artefacts for one combination of $a$ and $b$ (black line) and exemplary observations (grey points).
Method
0.001733 0.002241 0.002835 0.002711 0.003622 0.003044 0.004205
0.03389 0.03346 0.03637 0.03377 0.04097 0.03671 0.05645
0.001564 0.002556 0.002884 0.00381 0.006408 0.00589 0.009259
0.0251 0.02588 0.02503 0.0274 0.02721 0.03045 0.03804
0.01142 0.01139 0.0114 0.01141 0.01139 0.01141 0.01138
0.01096 0.01095 0.01094 0.01096 0.01094 0.01095 0.01095
Table 2: Averaged mean square errors for the piecewise constant signal plus smooth artefacts for various combinations of $a$ and $b$.
a b Method % detected
0 0 0.2361 0.0024 0.0078 0.0699 0.6854 0.1704 0.0447 0.0194 94.42
0 0 -4.272 0.9841 0.0158 0.0001 0 0 0 0 28.45
0 0 0.0766 0 0 0 0.9343 0.056 0.0087 0.001 97.3
0 0 5.128 0 0.0008 0.0043 0.02 0.0423 0.0857 0.8469 66.33
0.01 0.2 0.3541 0.003 0.0116 0.138 0.447 0.3103 0.0681 0.022 92.17
0.01 0.2 -4.115 0.9815 0.0179 0.0006 0 0 0 0 31
0.01 0.2 0.4548 0 0 0 0.6308 0.2967 0.0616 0.0109 95.73
0.01 0.2 7.638 0 0 0.0002 0.0016 0.0029 0.0135 0.9818 65.62
0.025 0.2 0.4842 0.0043 0.0257 0.1306 0.4379 0.2299 0.1099 0.0617 90.19
0.025 0.2 -4.418 0.9849 0.0147 0.0004 0 0 0 0 26.05
0.025 0.2 0.2841 0 0 0 0.7919 0.1505 0.0447 0.0129 95.58
0.025 0.2 4.131 0 0.0009 0.0076 0.0315 0.0786 0.1348 0.7466 65.95
0.01 0.4 0.4397 0.0049 0.0204 0.241 0.2902 0.2423 0.1401 0.0611 88.57
0.01 0.4 -4.103 0.982 0.0178 0.0002 0 0 0 0 31.16
0.01 0.4 1.768 0 0 0 0.0343 0.4055 0.3576 0.2026 92.9
0.01 0.4 10.32 0 0 0 0 0 0.0009 0.9991 64.85
0.025 0.4 0.5095 0.016 0.0867 0.2851 0.1854 0.1618 0.1119 0.1531 82.99
0.025 0.4 -4.622 0.9901 0.0098 0.0001 0 0 0 0 22.77
0.025 0.4 1.847 0 0 0 0.2579 0.2527 0.1855 0.3039 90.79
0.025 0.4 3.887 0 0.0002 0.007 0.0343 0.0868 0.149 0.7227 64.64
0.01 0.8 0.3138 0.0077 0.0424 0.3482 0.248 0.1553 0.075 0.1234 85.05
0.01 0.8 -4.11 0.9816 0.0184 0 0 0 0 0 31.03
0.01 0.8 3.614 0 0 0 0 0.0063 0.1209 0.8728 92.03
0.01 0.8 14.55 0 0 0 0 0 0 1 59.33
0.025 0.8 -0.125 0.0496 0.2016 0.3502 0.147 0.0954 0.0555 0.1007 76.24
0.025 0.8 -4.891 0.9965 0.0035 0 0 0 0 0 18.43
0.025 0.8 9.299 0 0 0 0 0 0.0001 0.9999 88.25
0.025 0.8 4.883 0 0.0002 0.0028 0.0142 0.0413 0.0888 0.8527 56.29
Table 3: Summary results about the detection of change-points for the piecewise constant signal plus smooth artefacts for various combinations of $a$ and $b$.

Table 2 reveals that PCpluS also performs very well in this setting with respect to the averaged mean square error. Without artefacts PELT has a slightly smaller averaged mean square error, but in the presence of artefacts PCpluS is on par when the artefacts are small and preferable when the artefacts are larger. Similarly, when looking at the detection of change-points, PELT performs a bit better without artefacts or when the artefacts are small, but is worse when the artefacts are larger. Hence, as expected, PCpluS is much less affected by such artefacts, although they still worsen its results. All other competitors are clearly outperformed.

4.3 Robustness

In this section we investigate how much our method is affected by a violation of the assumption that the signal can be decomposed into a piecewise constant plus a smooth function. To this end, we simulate observations from a signal with a single change-point plus a shifted cosine, where a parameter $a$ controls how strongly the first derivative changes at the change-point location. Hence, this signal satisfies our model assumptions only if $a = 0$ and the violation gets more severe with increasing $a$. A subtraction term ensures that the size of the change in the function value itself is the same for all $a$. The standard deviation of the noise is fixed. An example is presented in Figure 6. Simulation results are given in Tables 4 and 5. Note that this setting is designed, among other considerations, such that the default bandwidth of JIC is a decent choice, to allow a fair comparison.

Figure 6: The shifted cosine signal for one value of $a$ (black line) and exemplary observations (grey points).
Method
0.008627 0.008676 0.009249 0.009883
0.01888 0.02113 0.02532 0.02614
0.0377 0.03834 0.03676 0.03244
0.01447 0.01402 0.01429 0.01581
0.01282 0.0129 0.01306 0.01325
0.01246 0.01254 0.01266 0.01286
Table 4: Averaged mean square errors for the shifted cosine signal and various values of $a$.
a Method % detected
0 0.9389 0.0534 0.46 0.2698 0.0982 0.1186 93.11
0 -0.1012 0.1582 0.7879 0.0509 0.0029 0.0001 83.74
0.1 0.9511 0.0524 0.4285 0.3124 0.091 0.1157 93.22
0.1 -0.0771 0.1412 0.7988 0.0563 0.0034 0.0003 85.55
0.25 1.017 0.058 0.404 0.3209 0.086 0.1311 92.75
0.25 -0.0327 0.1356 0.7701 0.0865 0.0072 0.0006 86.07
0.5 0.9995 0.069 0.4291 0.2749 0.0963 0.1307 91.38
0.5 0.0004 0.1129 0.7844 0.0927 0.0094 0.0006 88.23
Table 5: Summary results about the detection of change-points for the shifted cosine signal and various values of $a$.

Tables 4 and 5 show that PCpluS is remarkably robust against such a violation of its assumptions. Not only does PCpluS have the smallest averaged mean square error in all cases, its results also get only slightly worse when $a$ increases. JIC outperforms PCpluS with respect to the detection of change-points, but not with respect to the averaged mean square error. Remarkably, the increase of the averaged mean square error with increasing $a$ is even larger for JIC than for PCpluS, despite JIC being designed for piecewise smooth signals.
All in all, we found that PCpluS performs well; with respect to the averaged mean square error it is outperformed only on rare occasions. It also shows decent change-point detection results, though with a slight tendency to overestimate the number of change-points. Moreover, it is very robust to a violation of the assumption that the signal can be decomposed into a piecewise constant plus a smooth function.

5 Application: Genome sequencing

Change-point regression is commonly used in the analysis of genome sequencing data for detecting copy number variations, see e.g. (olshen2004circular; zhang2007modified; futschik2014multiscale; hocking2020constrained) to list only a few approaches. Copy number variations are amplifications or deletions of genetic material, which can range from a few base pairs to a whole chromosome. Detecting copy number variations is of high importance as they can be linked to diseases such as cancer. To this end, full genome sequencing techniques, such as array-based comparative genomic hybridization (CGH) and next-generation sequencing (NGS), are used to measure the copy number at several thousand genome locations simultaneously. The resulting data are normalized log2-ratios. Hence, values around zero indicate the normal two copies, while in an ideal situation a value of $\log_2(1/2) = -1$ stands for a single deletion and $\log_2(3/2) \approx 0.58$ for three copies.

However, artefacts caused by several biases occur frequently. The two most important ones are mappability and GC content bias. Mappability bias means that some reads cannot be matched uniquely to a certain region in the genome. And GC content bias means that regions with medium GC content (the amount of the chemical bases guanine (G) and cytosine (C)) are more likely to be mapped than others. Though some bias corrections are available, some artefacts still remain, either because the corrections do not work perfectly or because further unknown biases occur. A good overview is given in (liu2013computational). Hence, the classical model of a piecewise constant signal plus centred random errors is violated.

In the following we use PCpluS to analyse the Coriell cell line GM03576 in the array CGH data set from (snijders2001assembly). It can almost be seen as a "gold standard" for genome sequencing, mainly because for this data set the truth has already been explored by spectral karyotyping. In cell line GM03576, trisomies of chromosomes 2 and 21 are known.

In Figure 7 we see that PELT, which we use as an example of a piecewise constant change-point regression approach, detects both trisomies, but also single point outliers and several small changes. The positive outliers could potentially be linked to additional amplifications, but all other detections are most likely false positives.

Figure 7: Cell line GM03576 analysed by PELT. It detects both trisomies and several single point outliers, but also several small changes, which are most likely false positives.

Figure 8 shows the corresponding fit using PCpluS with tuning parameters selected by cross-validation. It also detects both trisomies and the positive outliers, but only one negative outlier and one other false positive. In Appendix A.2 we show that this additional false positive can be avoided by choosing slightly different tuning parameters. All in all, PCpluS successfully detected all critical change-points, with almost no additional false positives.

Figure 8: Cell line GM03576 analysed by cross-validated PCpluS. It also detects both trisomies and the positive outliers, but only one negative outlier and one other false positive.

6 Discussion

In this work we proposed PCpluS, a combination of the fused Lasso and kernel smoothing, which is, to the best of our knowledge, the first approach that explicitly uses the decomposition into a piecewise constant function and a smooth function when detecting change-points. We showed in simulations and on real data that this approach has a small averaged mean squared error. One drawback is its larger computation time, in particular when cross-validation is required to select tuning parameters. The computation time is dominated by computing (10), i.e. solving a Lasso optimisation problem. The usual fused Lasso can be computed efficiently in linear time by a pruned dynamic program (johnson2013dynamic) or by pathwise coordinate descent (friedman2007pathwise). It would be interesting to explore whether similar ideas can be used to compute (10). Alternatively, one can apply iterative procedures that alternate between computing a usual fused Lasso and smoothing, but we found that without further ado this requires a large number of iterations and hence is slow as well. All in all, computing (10) efficiently is challenging and an interesting topic for researchers in convex optimisation. We also hope to stimulate further statistical research on finding improved procedures following a decomposition approach, both statistically and computationally.

A second direction for further research is parameter tuning. As said before, cross-validation works well when one aims to minimize the averaged mean square error, as we did in this work. However, cross-validation is not only computationally demanding, it is also statistically suboptimal if model selection, i.e. a consistent estimation of the number of change-points, is the main goal. We found in Section 4 that cross-validation provides decent results, but we are optimistic that further research can lead to better results with respect to such aims.

In addition to being a well-working novel methodology, PCpluS can also be adapted to many more complicated settings quite easily. We illustrate this with three examples. First of all, in Section 6.1 we replace kernel smoothing by smoothing splines. Secondly, we outline in Sections 6.2 and 6.3 extensions to multivariate and filtered data, respectively.

6.1 Smoothing spline approach

Smoothing splines are a widely used non-parametric method because of their attractive statistical and computational properties, see de1978practical; green1993nonparametric among many others. In this section we detail how smoothing splines can be used instead of kernel smoothing. A smoothing spline is defined by

(16)   $\hat{g} = \operatorname*{arg\,min}_{g} \; \sum_{i=1}^{n} \big(Y_i - f(x_i) - g(x_i)\big)^2 + \lambda_2 \int g''(x)^2 \, \mathrm{d}x,$

for a smoothing parameter $\lambda_2 > 0$, where the minimum is taken over twice differentiable functions.

By introducing basis functions and solving a generalized ridge regression problem we can obtain a smoothing matrix $\mathbf{Z}$ such that

(17)   $\hat{\mathbf{g}} = \mathbf{Z}(\mathbf{Y} - \mathbf{f});$

we refer to (ruppert2003semiparametric, Chapter B.1) for more details. The fused Lasso and smoothing splines can be combined into a modified fused Lasso regression problem as well, which reads as

(18)   $\hat{\mathbf{f}} = \operatorname*{arg\,min}_{\mathbf{f}} \; \tfrac{1}{2}\,\big\|(\mathbf{I}_n - \mathbf{Z})(\mathbf{Y} - \mathbf{f})\big\|_2^2 + \lambda\,\|\mathbf{f}\|_{\mathrm{TV}},$

and $\hat{\mathbf{g}} = \mathbf{Z}(\mathbf{Y} - \hat{\mathbf{f}})$. Hence, the smoothing spline variant can be computed similarly to the kernel version as outlined in Section 3. In a similar fashion, variants of PCpluS can also be obtained using regression splines or kernel ridge regression. Moreover, a wavelet version using soft thresholding can be computed by solving a larger Lasso problem. All in all, PCpluS can be formulated for most smoothing techniques.
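A minimal sketch of how a smoothing matrix Z as in (17) can be obtained from a spline basis: for illustration we use a B-spline basis from the splines package and a simple ridge penalty on the basis coefficients, rather than the exact roughness penalty of (16); all names and parameter values are our own.

```r
# Sketch: a smoothing matrix Z from a B-spline basis via (generalised) ridge
# regression, in the spirit of (17)
library(splines)

n <- 300
x <- (1:n) / n
B <- bs(x, df = 20, intercept = TRUE)        # B-spline basis matrix (n x 20)
lambda2 <- 1                                 # illustrative ridge parameter
Z <- B %*% solve(t(B) %*% B + lambda2 * diag(ncol(B)), t(B))

# As in (17): for given f, the smooth component is estimated by Z (Y - f)
set.seed(1)
f <- ifelse(x > 0.5, 1, 0)
Y <- f + sin(2 * pi * x) + rnorm(n, sd = 0.2)
g_hat <- as.vector(Z %*% (Y - f))
```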

6.2 Multivariate data

In this section we outline how PCpluS can be extended to multivariate data. This is for instance of interest in genome sequencing when multiple patients with the same form of cancer are studied. In such a situation it is reasonable to assume that all patients have changes at the same locations in the genome; for more details see bleakley2011group. More precisely, we assume

(19)   $Y_{ij} = f_j(x_i) + g_j(x_i) + \varepsilon_{ij}, \qquad i = 1, \ldots, n, \; j = 1, \ldots, m,$

where $f_1, \ldots, f_m$ are piecewise constant signals with changes at the same locations and $g_1, \ldots, g_m$ are individual smooth functions, as we assume that the smooth artefacts are not shared among the patients. In this situation the fused Lasso can be replaced by the group fused Lasso and we obtain

(20)   $(\hat{\mathbf{f}}_1, \ldots, \hat{\mathbf{f}}_m) = \operatorname*{arg\,min} \; \tfrac{1}{2} \sum_{j=1}^{m} \big\|(\mathbf{I}_n - \mathbf{S})(\mathbf{Y}_j - \mathbf{f}_j)\big\|_2^2 + \lambda \sum_{i=2}^{n} \Big( \sum_{j=1}^{m} \big(f_j(x_i) - f_j(x_{i-1})\big)^2 \Big)^{1/2},$

which can be computed in a similar fashion as before.

6.3 Filtered data

Finally, we outline how PCpluS can be adapted to filtered data. More precisely, let $\mathbf{F}$ be the known convolution matrix of a discrete filter. Then, we assume

(21)   $\mathbf{Y} = \mathbf{F}(\mathbf{f} + \mathbf{g}) + \boldsymbol{\varepsilon}.$

This occurs for instance in the analysis of ion channel recordings (neher1976single; sakmann2013single), which are experiments to measure the conductance of a single ion channel over time. Ion channels are pore-forming proteins in the cell membrane that allow ions to pass through the membrane; they are needed for many vital processes in cells. Channels regulate the ion flow by opening and closing over time. Studying these gating dynamics is of interest, for instance, in medicine and biochemistry. The conductance of an ion channel can be modelled as piecewise constant, but smooth artefacts are common, for instance caused by the electronics, building vibrations or small holes in the membrane. The recordings are filtered by lowpass filters, often Bessel filters, which are integrated in the hardware and hence known. The time-continuous filter can be discretized with small errors. For further details we refer to (pein2018fully) and the references therein. If in other experiments the filter is unknown, it can be pre-estimated using the techniques in (tecuapetla2017autocovariance). Adding filtering to PCpluS to estimate the piecewise constant signal is straightforward:

(22)   $\hat{\mathbf{f}} = \operatorname*{arg\,min}_{\mathbf{f}} \; \tfrac{1}{2}\,\big\|(\mathbf{I}_n - \mathbf{S})(\mathbf{Y} - \mathbf{F}\mathbf{f})\big\|_2^2 + \lambda\,\|\mathbf{f}\|_{\mathrm{TV}}.$

Note that $\mathbf{F}\mathbf{g}$ is still smooth and hence the smoothing matrix $\mathbf{S}$ was not replaced in (22). However, we stress that one might modify $\mathbf{S}$ to obtain better results. The outlined extensions demonstrate that PCpluS is a flexible approach that can often be adapted to more involved settings quite easily. In many cases an extension of PCpluS only requires that one is able to adapt the fused Lasso and the smoothing separately to the new problem.
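To illustrate (21), here is a small base-R sketch (our own names; a short moving-average kernel stands in for the discretised Bessel filter of real recordings) that builds the convolution matrix of a known causal filter and shows how it smears out an abrupt change.

```r
# Sketch: convolution matrix of a known discrete (causal) filter, as in (21)
filter_coef <- c(0.25, 0.5, 0.25)     # assumed known filter kernel (illustrative)

convolution_matrix <- function(coef, n) {
  M <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (k in seq_along(coef)) {
      j <- i - k + 1                  # causal filter: output i uses inputs i, i-1, ...
      if (j >= 1) M[i, j] <- coef[k]
    }
  }
  M
}

n    <- 200
Fmat <- convolution_matrix(filter_coef, n)
f    <- ifelse(seq_len(n) > 100, 1, 0)
Ff   <- as.vector(Fmat %*% f)         # the jump of f is smeared over a few observations
```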

References

Appendix A Further results

A.1 Simulation

Tables 6-11 report further simulation results that are discussed in the main paper.

Method blocks burt cosine heavisine
0.01613 0.5287 0.118 0.04249
1.841 23.72 0.3047 1.629
0.47 0.43 0.25 0.21
0.05131 4.104 0.5492 0.2461
0.1143 1.012 0.1121 0.09581
0.1655 1.485 0.378 0.05286
0.145 1.387 0.3638 0.05132
Table 6: Averaged mean square errors for a further setting.
Method blocks burt cosine heavisine
0.1595 4.14 0.9989 0.2555
1.875 28.93 2.387 1.723
0.1444 12.5 1.399 0.8109
0.2018 15.57 1.679 0.3135
0.4154 4.656 1.194 0.2048
0.4137 4.555 1.159 0.1991
Table 7: Averaged mean square errors for a further setting.
Method blocks burt cosine heavisine
0.004551 0.3023 0.06772 0.02558
1.316 14.89 0.1831 0.8058
0.0137 2.341 0.2295 0.1581
0.1118 0.6714 0.08328 0.08054
0.1072 1.086 0.2793 0.03801
0.1032 1.038 0.2734 0.03768
Table 8: Averaged mean square errors for a further setting.
Method blocks burt cosine heavisine
0.02761 1.087 0.3064 0.09066
1.4 16.29 0.6232 0.8415
0.03185 5.061 0.5115 0.3617
0.131 3.307 0.2998 0.131
0.2131 2.169 0.5628 0.08329
0.2066 2.134 0.5556 0.082
Table 9: Averaged mean square errors for a further setting.
Method blocks burt cosine heavisine
0.1888 4.395 1.079 0.2632
1.596 21.76 2.397 1.023
0.1699 12.06 1.194 0.8275
0.228 30.53 2.472 0.3398
0.4236 4.592 1.179 0.202
0.4185 4.533 1.157 0.1968
Table 10: Averaged mean square errors for a further setting.
Setting Method % detected
blocks 0.4471 0.0317 0.0391 0.071 0.545 0.1593 0.0774 0.0765 96.44
blocks -5.63 1 0 0 0 0 0 0 48.76
blocks 0.0925 0 0 0 0.92 0.0691 0.0095 0.0014 100
blocks 12.55 0 0 0 0 0 0.0003 0.9997 99.53
burt 1.004 0 0 0.0139 0.4565 0.257 0.2172 0.0554 98.55
burt 6.619 0 0 0.0161 0.0822 0.0555 0.0561 0.7901 98.39
cosine 1.019 0.0399 0.094 0.1071 0.1631 0.1711 0.1917 0.2331 84.74
cosine 0.5833 0.0015 0.0285 0.1518 0.4018 0.2098 0.1077 0.0989 92.7
heavisine -0.4309 0 0.3945 0.238 0.1848 0.0733 0.0329 0.0765 39.01
heavisine -0.0233 0 0.2288 0.3574 0.1273 0.0894 0.0657 0.1314 48.95
Table 11: Summary results about the detection of change-points.

A.2 Application

Figure 9 shows that PCpluS does not detect the false positive at the end of chromosome 20 if we use slightly different tuning parameters, obtained by multiplying the bandwidth and penalty selected by cross-validation by constant factors.

Figure 9: Cell line GM03576 analysed by PCpluS with slightly modified tuning parameters. It does not detect the additional false positive.