Cadzow's algorithm is a signal denoising and recovery method designed for signals whose associated Hankel matrices are low rank. In this paper we first introduce a Fast Cadzow's algorithm, developed by incorporating a novel subspace projection to reduce the high computational cost of the SVD in Cadzow's algorithm. Then a Gradient method and a Fast Gradient method are proposed to address the non-decreasing MSE issue that arises when applying the Cadzow's or Fast Cadzow's algorithm for signal denoising. Extensive empirical comparisons demonstrate that the proposed algorithms complete the denoising and recovery tasks more efficiently and effectively.

## 1 Introduction

Cadzow's algorithm was proposed by J. A. Cadzow in 1988. The algorithm was initially developed for signal denoising, but has since been extended to many other applications, including time series forecasting, the filtering of digital terrain models, and seismic data denoising and reconstruction [22, 21, 23, 24, 20]. For simplicity, we consider the 1D signal denoising problem. Here the task is to estimate a target signal $x$ from the corrupted measurements

$$y = x + e,$$

where $e$ denotes additive noise. It is not hard to see that, without any additional assumptions, one cannot expect a universally better estimator than the noisy vector $y$ itself. Thus, effective denoising methods typically rely on certain simple structures hidden in the target signal. For example, wavelet denoising assumes that the representation of $x$ under a certain wavelet transform has many entries close to zero [7, 8]. By contrast, Cadzow's denoising is based on the low rank property of the Hankel matrix associated with the target signal.

Let $z = (z_0, \dots, z_{N-1})$ be a complex-valued vector of length $N$. Let $\mathcal{H}$ be a linear operator which maps $z$ into an $L \times K$ matrix (with $L + K = N + 1$) whose $(i,j)$-th entry is equal to $z_{i+j}$, i.e.,

$$\mathcal{H}z = \begin{bmatrix} z_0 & z_1 & z_2 & \cdots & z_{K-1}\\ z_1 & z_2 & z_3 & \cdots & z_K\\ z_2 & z_3 & z_4 & \cdots & z_{K+1}\\ \vdots & \vdots & \vdots & & \vdots\\ z_{L-1} & z_L & z_{L+1} & \cdots & z_{N-1} \end{bmatrix}, \tag{1.1}$$

where the numbering of vector and matrix entries starts from $0$ rather than the more standard $1$. Matrices of the form (1.1) are known as Hankel matrices, in which each skew-diagonal is constant. Here we will refer to the Hankel matrix $\mathcal{H}z$ as the Hankelization of $z$. Furthermore, the Moore-Penrose pseudoinverse of $\mathcal{H}$, denoted $\mathcal{H}^\dagger$, is a linear map from $L \times K$ matrices to vectors of length $N$. Given a matrix $Z$, the vector $\mathcal{H}^\dagger Z$ is obtained by averaging each skew-diagonal of $Z$. More precisely, letting $w_a$ be the number of entries on the $a$-th skew-diagonal of an $L \times K$ matrix,

$$w_a = \#\{(i,j) \;|\; i + j = a,\ 0 \le i \le L-1,\ 0 \le j \le K-1\}, \tag{1.2}$$

the $a$-th entry of $\mathcal{H}^\dagger Z$, denoted $[\mathcal{H}^\dagger Z]_a$, is given by

$$[\mathcal{H}^\dagger Z]_a = \frac{1}{w_a} \sum_{i+j=a} [Z]_{ij}.$$

It follows immediately that $\mathcal{H}^\dagger \mathcal{H} = \mathcal{I}$, where $\mathcal{I}$ is the identity operator from $\mathbb{C}^N$ to $\mathbb{C}^N$.
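To make the two operators concrete, here is a minimal numpy sketch of the Hankelization $\mathcal{H}$ and its pseudoinverse $\mathcal{H}^\dagger$ (skew-diagonal averaging); the function names are ours, not from the paper:

```python
import numpy as np

def hankelize(z, L):
    """Form the L x K Hankel matrix Hz whose (i, j) entry is z[i + j], K = N - L + 1."""
    N = len(z)
    K = N - L + 1
    return np.array([[z[i + j] for j in range(K)] for i in range(L)])

def hankel_pinv(Z):
    """Pseudoinverse H-dagger: average each skew-diagonal of Z into a length-N vector."""
    L, K = Z.shape
    N = L + K - 1
    z = np.zeros(N, dtype=Z.dtype)
    w = np.zeros(N)                 # w[a] = number of entries on the a-th skew-diagonal
    for i in range(L):
        for j in range(K):
            z[i + j] += Z[i, j]
            w[i + j] += 1
    return z / w

# Sanity check of H-dagger H = I on C^N.
z = np.arange(10, dtype=float) + 1j
assert np.allclose(hankel_pinv(hankelize(z, 4)), z)
```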

We are now in a position to introduce Cadzow's algorithm, which is based on the fact that signals of interest arising from a wide range of applications have low rank Hankelizations. Assume $\mathrm{rank}(\mathcal{H}x) = r$, where $r \ll \min(L, K)$. Additive noise often increases the rank of the Hankelization, and hence it is very typical that $\mathrm{rank}(\mathcal{H}y) > r$. Consequently, an intuitive way of estimating $x$ from $y$ is to perform rank reduction. Starting from $z_0 = y$, Cadzow's algorithm iteratively updates the estimate via the following rule:

$$z_{k+1} = \mathcal{H}^\dagger \mathcal{T}_r \mathcal{H} z_k, \quad k = 0, 1, \cdots. \tag{1.3}$$

Here $\mathcal{T}_r$ computes the truncated rank-$r$ SVD of an $L \times K$ matrix, that is,

$$\mathcal{T}_r(Z) = \sum_{j=1}^{r} \sigma_j u_j v_j^*, \quad \text{where } Z = \sum_{j=1}^{\min(L,K)} \sigma_j u_j v_j^* \text{ is an SVD with } \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{\min(L,K)}.$$
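As a plain reference implementation of iteration (1.3) (our own sketch, using dense SVDs rather than the fast routines developed later in the paper):

```python
import numpy as np

def cadzow(y, r, L, n_iter=20):
    """Cadzow iteration (1.3): z_{k+1} = H-dagger T_r H z_k, starting from z_0 = y."""
    N = len(y)
    K = N - L + 1
    z = np.asarray(y, dtype=complex).copy()
    for _ in range(n_iter):
        H = np.array([[z[i + j] for j in range(K)] for i in range(L)])   # H z_k
        U, s, Vh = np.linalg.svd(H, full_matrices=False)
        Hr = (U[:, :r] * s[:r]) @ Vh[:r]                                 # T_r: rank-r truncation
        z = np.zeros(N, dtype=complex)
        w = np.zeros(N)
        for i in range(L):                                               # H-dagger: average
            for j in range(K):                                           # skew-diagonals
                z[i + j] += Hr[i, j]
                w[i + j] += 1
        z /= w
    return z

# Example: a single complex sinusoid has a rank-1 Hankelization.
rng = np.random.default_rng(0)
t = np.arange(64)
x = np.exp(2j * np.pi * 0.13 * t)
y = x + 0.1 * (rng.standard_normal(64) + 1j * rng.standard_normal(64))
z = cadzow(y, r=1, L=32)
assert np.linalg.norm(z - x) < np.linalg.norm(y - x)   # denoising reduces the error
```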

It is worth noting that Cadzow's algorithm is also known as multichannel singular spectrum analysis (MSSA), which was proposed by Broomhead and King in 1986 for the analysis of time series from dynamical systems. In particular, the first iteration of Cadzow's algorithm is usually referred to as singular spectrum analysis (SSA), which is also very important in many applications; see for example [15, 16, 12, 13].

We can also interpret Cadzow's algorithm as the method of alternating projections in the matrix domain. To this end, let $\mathcal{M}_r$ and $\mathcal{M}_H$ be the sets of rank-$r$ matrices and Hankel matrices, respectively. For any matrix $Z$, it follows from the Eckart-Young theorem that the projection of $Z$ onto $\mathcal{M}_r$ is given by $\mathcal{T}_r(Z)$. Moreover, it can easily be verified that the projection of $Z$ onto $\mathcal{M}_H$ is given by $\mathcal{H}\mathcal{H}^\dagger Z$. Therefore, if we let $Z_k = \mathcal{H}z_k$, after applying $\mathcal{H}$ to both sides of (1.3), it can be seen that Cadzow's algorithm is equivalent to

$$Z_{k+1} = \mathcal{P}_{\mathcal{M}_H} \mathcal{P}_{\mathcal{M}_r} Z_k, \quad k = 0, 1, \cdots, \tag{1.4}$$

where $\mathcal{P}_{\mathcal{M}_r}$ and $\mathcal{P}_{\mathcal{M}_H}$ denote the projections onto $\mathcal{M}_r$ and $\mathcal{M}_H$, respectively.

Cadzow's algorithm can also be adapted for signal recovery problems when there are missing entries. Recall that $x$ is our target signal. Suppose that in some situations we are not able to observe all the entries of $x$, but can only observe those entries with indices in $\Omega$, where $\Omega$ is a subset of $\{0, \dots, N-1\}$. Moreover, let $\mathcal{P}_\Omega(x)$ denote the samples of $x$ on $\Omega$, namely,

$$[\mathcal{P}_\Omega(x)]_j = \begin{cases} x_j & \text{if } j \in \Omega\\ 0 & \text{otherwise.} \end{cases}$$

A natural question is whether it is possible to reconstruct $x$ from $\mathcal{P}_\Omega(x)$. This is an ill-posed problem without any restrictions on $x$. However, when $\mathcal{H}x$ is low rank, it has been shown that the missing entries can be completed through convex methods as well as non-convex methods. A variant of Cadzow's algorithm has been proposed for signal recovery. Starting from $z_0 = \mathcal{P}_\Omega(x)$, the algorithm iteratively fills in the missing entries in the following way:

$$z_{k+1} = \mathcal{P}_\Omega(x) + (\mathcal{I} - \mathcal{P}_\Omega) \mathcal{H}^\dagger \mathcal{T}_r \mathcal{H} z_k, \quad k = 0, 1, \cdots. \tag{1.5}$$

Roughly speaking, the algorithm refines the estimates of the unknown entries while keeping the observed entries of $x$ on $\Omega$ fixed. In addition, when noise exists on the observed entries, i.e., under the measurement model $\mathcal{P}_\Omega(y) = \mathcal{P}_\Omega(x + e)$, one can use an appropriate linear combination of the observed measurements and the current update to estimate $x$ rather than simply filling back the noisy entries; interested readers are referred to the references for details.
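The recovery iteration (1.5) can be sketched in a few lines of numpy; this is our own illustrative implementation (dense SVDs, hypothetical helper names), not the paper's code:

```python
import numpy as np

def cadzow_complete(x, mask, r, L, n_iter=200):
    """Recovery iteration (1.5): re-insert the observed entries after each rank reduction."""
    N = len(x)
    K = N - L + 1
    z = np.where(mask, x, 0).astype(complex)      # z_0 = P_Omega(x)
    for _ in range(n_iter):
        H = np.array([[z[i + j] for j in range(K)] for i in range(L)])
        U, s, Vh = np.linalg.svd(H, full_matrices=False)
        Hr = (U[:, :r] * s[:r]) @ Vh[:r]
        znew = np.zeros(N, dtype=complex)
        w = np.zeros(N)
        for i in range(L):
            for j in range(K):
                znew[i + j] += Hr[i, j]
                w[i + j] += 1
        znew /= w
        z = np.where(mask, x, znew)               # P_Omega(x) + (I - P_Omega) H† T_r H z_k
    return z

# Example: complete a rank-1 signal from roughly 70% of its entries.
rng = np.random.default_rng(1)
t = np.arange(64)
x = np.exp(2j * np.pi * 0.2 * t)
mask = rng.random(64) < 0.7
z = cadzow_complete(x, mask, r=1, L=32)
assert np.linalg.norm(z - x) / np.linalg.norm(x) < 0.05
```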

### 1.1 Examples where low rank Hankelization appears

As shown previously, the low rank structure of Hankel matrices plays a key role in the development of Cadzow's algorithm and its variants. In this subsection we present several real examples where low rank Hankelization appears.

#### Time series satisfying an LRF

Let $x = (x_0, \dots, x_{N-1})$ be a time series of length $N$. We say $x$ satisfies a linear recurrent formula (LRF) if

$$x_j = a_1 x_{j-1} + \cdots + a_r x_{j-r}, \quad j = r, \cdots, N-1 \tag{1.6}$$

for some coefficients $a_1, \dots, a_r$. Let $L$ be an integer which is usually referred to as the window length, and let $K = N - L + 1$. Define the $L$-lagged vectors $\mathbf{x}_j = (x_j, \dots, x_{j+L-1})^T$ for $j = 0, \dots, K-1$. If we construct an $L \times K$ matrix as follows,

$$X = [\mathbf{x}_0, \cdots, \mathbf{x}_{K-1}],$$

it is easy to see that $X$ is indeed a Hankel matrix, and in fact we have $X = \mathcal{H}x$. Moreover, since $x$ is a time series which obeys (1.6), $X$ is a matrix of rank at most $r$. Time series satisfying an LRF form a very important model for many real applications.
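As a small sanity check of our own, a sequence generated by an order-2 LRF has a Hankelization of rank at most 2 (the coefficients below are arbitrary example values):

```python
import numpy as np

# x_j = a1 * x_{j-1} + a2 * x_{j-2}: an LRF of order r = 2.
N, L = 40, 20
a1, a2 = 1.2, -0.5
x = np.zeros(N)
x[0], x[1] = 1.0, 0.7
for j in range(2, N):
    x[j] = a1 * x[j - 1] + a2 * x[j - 2]

# The L x K Hankel matrix of x has rank at most r = 2.
H = np.array([[x[i + j] for j in range(N - L + 1)] for i in range(L)])
assert np.linalg.matrix_rank(H, tol=1e-8) <= 2
```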

#### Spectrally sparse signals

Consider the 1D spectrally sparse signal consisting of $r$ complex sinusoids without damping factors,

$$x(t) = \sum_{j=1}^{r} d_j e^{i 2\pi f_j t}, \tag{1.7}$$

where $f_j$ is the $j$-th normalized frequency and $d_j$ is the corresponding complex amplitude. Let $x = (x_0, \dots, x_{N-1})$ be the discrete samples of $x(t)$ at $t = 0, \dots, N-1$. Letting $y_j = e^{i 2\pi f_j}$, because $x$ is obtained from a spectrally sparse signal, it can be shown [17, 27] that $\mathcal{H}x$ admits the Vandermonde decomposition of the form

$$\mathcal{H}x = E_L D E_R^T,$$

where

$$E_L = \begin{bmatrix} 1 & 1 & \cdots & 1\\ y_1 & y_2 & \cdots & y_r\\ \vdots & \vdots & & \vdots\\ y_1^{L-1} & y_2^{L-1} & \cdots & y_r^{L-1} \end{bmatrix}, \quad E_R = \begin{bmatrix} 1 & 1 & \cdots & 1\\ y_1 & y_2 & \cdots & y_r\\ \vdots & \vdots & & \vdots\\ y_1^{K-1} & y_2^{K-1} & \cdots & y_r^{K-1} \end{bmatrix}$$

and $D$ is a diagonal matrix whose diagonal entries are $d_1, \dots, d_r$. The Vandermonde decomposition of $\mathcal{H}x$ implies that $\mathrm{rank}(\mathcal{H}x) = r$ provided $L \ge r$ and $K \ge r$, when the frequencies $f_j$ are distinct.
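This rank property is easy to check numerically; the frequencies and amplitudes below are our own example values:

```python
import numpy as np

N, L, r = 64, 32, 3
K = N - L + 1
f = np.array([0.11, 0.37, 0.62])                  # distinct normalized frequencies
d = np.array([1.0, 0.8 - 0.3j, 2.0j])             # complex amplitudes
t = np.arange(N)
x = sum(d[j] * np.exp(2j * np.pi * f[j] * t) for j in range(r))

H = np.array([[x[i + j] for j in range(K)] for i in range(L)])
assert np.linalg.matrix_rank(H) == r              # rank(Hx) = r since L, K >= r
```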

#### Periodic stream of Diracs

In the previous two examples, low rank Hankelization occurs in the time domain. There are also examples where low rank Hankelization occurs in the frequency domain. Consider the following $1$-periodic stream of Diracs:

$$x(t) = \sum_{j=1}^{r} \sum_{j' \in \mathbb{Z}} x_j \delta(t - t_j - j'), \tag{1.8}$$

where $x_j \in \mathbb{R}$ and $t_j \in [0, 1)$. Since $x(t)$ is $1$-periodic, it can be represented using the Fourier series; that is,

$$x(t) = \sum_{k \in \mathbb{Z}} \hat{x}_k e^{i 2\pi k t}, \quad \text{where } \hat{x}_k = \sum_{j=1}^{r} x_j e^{-i 2\pi k t_j}.$$

Let $\hat{x}$ be a vector consisting of $N$ consecutive values of $\hat{x}_k$ and define

$$\hat{x}(\omega) = \sum_{j=1}^{r} x_j e^{-i 2\pi t_j \omega}. \tag{1.9}$$

We can view $\hat{x}$ as a vector of discrete samples from $\hat{x}(\omega)$. Noticing the similarity between (1.7) and (1.9), it is not hard to see that $\mathrm{rank}(\mathcal{H}\hat{x}) = r$.

### 1.2 Main contributions and outline

The main contributions of this paper are two-fold. Firstly, a Fast Cadzow's algorithm is developed through the introduction of an additional subspace projection; see Section 2. Numerical experiments in Section 3 establish the computational advantages of the Fast Cadzow's algorithm over the Cadzow's algorithm. Secondly, to address the non-decreasing MSE issue of the Cadzow's and Fast Cadzow's algorithms for signal denoising, a Gradient method and its accelerated variant are introduced in Section 4. These algorithms are then tested on different denoising problems to demonstrate their effectiveness. Section 5 concludes the paper with a short discussion.

## 2 Fast Cadzow's algorithm

In general, the complexity of computing the SVD of an $L \times K$ matrix is $O(N^3)$ when $L$ and $K$ are proportional to $N$, causing the computation of the SVD (i.e., the $\mathcal{T}_r$ operation) to be the dominant cost in the Cadzow iteration (1.3). This has limited the computational efficiency of Cadzow's algorithm, especially for high dimensional problems. In this section, we present an accelerated variant of Cadzow's algorithm, dubbed the Fast Cadzow's algorithm, for the denoising problem as well as the recovery problem from missing entries; see Algorithm 1 for a formal description. We focus on the algorithm for the denoising problem since the same idea applies to the recovery problem.

Recall that the denoising problem is about estimating a signal $x$ from the noisy measurements $y = x + e$. We assume the Hankel matrix associated with $x$ has rank $r$. The Fast Cadzow iteration is overall similar to the Cadzow iteration, except that there is an additional projection $\mathcal{P}_{T_k}$ in the Fast Cadzow's algorithm, where $\mathcal{P}_{T_k}$ denotes the projection onto a matrix subspace $T_k$. In other words, after the Hankel matrix is constructed, we first project it onto $T_k$, followed by the truncation to the best rank-$r$ approximation. The motivation behind Algorithm 1 is that after $\mathcal{P}_{T_k}$ is introduced the resulting matrix is more structured, so its SVD can be expected to be computable in a fast way. Notice that if we choose $T_k$ to be $\mathbb{C}^{L \times K}$ in each iteration, Algorithm 1 reduces to the Cadzow's algorithm.

### 2.1 Choice of $T_k$

Our choice of $T_k$ is inspired by the Riemannian optimization methods for low rank matrix recovery [26, 25], which are closely related to the manifold structure of low rank matrices. To motivate this choice, suppose the rank-$r$ matrix $L_k$ has already been computed in the $(k-1)$-th iteration. Since $L_k$ is a matrix of rank $r$, it has the reduced SVD $L_k = U_k \Sigma_k V_k^*$, where $U_k$ and $V_k$ are $L \times r$ and $K \times r$ orthogonal matrices, respectively. In the $k$-th iteration, $T_k$ is selected to be the direct sum of the column and row subspaces of $L_k$, i.e.,

$$T_k = \{U_k B^* + C V_k^* \;|\; B \in \mathbb{C}^{K \times r},\ C \in \mathbb{C}^{L \times r}\}. \tag{2.1}$$

From the perspective of differential geometry, the set of fixed rank matrices forms a smooth manifold, and $T_k$ is the tangent space of the manifold at $L_k$.

Given a matrix $Z$, the projection of $Z$ onto $T_k$ is given by

$$\mathcal{P}_{T_k}(Z) = U_k U_k^* Z + Z V_k V_k^* - U_k U_k^* Z V_k V_k^*. \tag{2.2}$$
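In code, (2.2) and two of its basic properties, idempotence and the fact that $L_k$ lies in its own tangent space, can be checked as follows (our own illustrative sketch with real-valued test data):

```python
import numpy as np

def proj_tangent(Z, U, V):
    """P_{T_k}(Z) = U U* Z + Z V V* - U U* Z V V*  (equation (2.2))."""
    UUZ = U @ (U.conj().T @ Z)
    ZVV = (Z @ V) @ V.conj().T
    return UUZ + ZVV - (U @ (U.conj().T @ Z @ V)) @ V.conj().T

L, K, r = 8, 6, 2
rng = np.random.default_rng(0)
A = rng.standard_normal((L, K))
U, s, Vh = np.linalg.svd(A, full_matrices=False)
U, V = U[:, :r], Vh[:r].conj().T          # column/row subspaces of the rank-r matrix L_k
Lk = (U * s[:r]) @ V.conj().T

Z = rng.standard_normal((L, K))
P = proj_tangent(Z, U, V)
assert np.allclose(proj_tangent(P, U, V), P)    # an orthogonal projection: P(P(Z)) = P(Z)
assert np.allclose(proj_tangent(Lk, U, V), Lk)  # L_k lies in its own tangent space
```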

In the first iteration (i.e., $k = 0$) of Algorithm 1, we simply choose $T_0 = \mathbb{C}^{L \times K}$. That is, the first iteration of the Fast Cadzow's algorithm coincides with that of the Cadzow's algorithm, which is an SSA step. Noting that $L_k \in T_k$ and that $\mathcal{H}z_k = \mathcal{H}\mathcal{H}^\dagger L_k \approx L_k$ when $L_k$ is close to a Hankel matrix, we have $\mathcal{P}_{T_k} \mathcal{H}z_k \approx \mathcal{H}z_k$ in this situation. Therefore, the projection of $\mathcal{H}z_k$ onto $T_k$ can capture most of its energy, and we can expect the Fast Cadzow's algorithm to exhibit behavior similar to that of the Cadzow's algorithm. The numerical results in Section 3 will confirm this intuition.

### 2.2 Computational complexity

The true novelty of introducing the additional projection $\mathcal{P}_{T_k}$ is that after this projection the SVD of the resulting matrix can be computed more efficiently. In a nutshell, the SVD of an $L \times K$ matrix can be reduced to the SVD of a $2r \times 2r$ matrix. In this subsection we investigate the computational complexity of the Fast Cadzow's algorithm, together with the details for computing the SVD.

For ease of exposition we split a single Fast Cadzow iteration into three steps:

$$\begin{cases} W_k = \mathcal{P}_{T_k} \mathcal{H} z_k\\ L_{k+1} = \mathcal{T}_r W_k\\ z_{k+1} = \mathcal{H}^\dagger L_{k+1}. \end{cases} \tag{2.3}$$

Letting $H_k = \mathcal{H}z_k$, which will not be formed explicitly, by (2.2) we have

$$\begin{aligned} W_k &= U_k U_k^* H_k + H_k V_k V_k^* - U_k U_k^* H_k V_k V_k^*\\ &= U_k U_k^* H_k V_k V_k^* + U_k U_k^* H_k (I - V_k V_k^*) + (I - U_k U_k^*) H_k V_k V_k^*\\ &= U_k G V_k^* + U_k B^* + C V_k^*, \end{aligned} \tag{2.4}$$

where $G = U_k^* H_k V_k$, $B^* = U_k^* H_k (I - V_k V_k^*)$ and $C = (I - U_k U_k^*) H_k V_k$. In a practical implementation $W_k$ will be stored in the form of (2.4). Since $H_k$ is a Hankel matrix, both $U_k^* H_k$ and $H_k V_k$ can be computed using fast Hankel matrix-vector multiplications without forming $H_k$ explicitly, which costs $O(r N \log N)$ flops when $L$ and $K$ are proportional to $N$. Once $U_k^* H_k$ and $H_k V_k$ are known, $G$, $B$ and $C$ can be computed using a few matrix-matrix products with $O(r^2 N)$ flops. Thus, it requires $O(r N \log N + r^2 N)$ flops to compute $G$, $B$ and $C$ in (2.4).

Next we show how to reduce the SVD of $W_k$ to the SVD of a $2r \times 2r$ matrix, starting from the decomposition (2.4). Let $B = Q_1 R_1$ and $C = Q_2 R_2$ be the QR decompositions of $B$ and $C$, respectively. Since the columns of $B$ are orthogonal to the columns of $V_k$ and the columns of $C$ are orthogonal to the columns of $U_k$, it is not hard to see that

$$Q_1 \perp V_k, \quad Q_2 \perp U_k.$$

Moreover, we have

$$W_k = U_k G V_k^* + U_k R_1^* Q_1^* + Q_2 R_2 V_k^* = \begin{bmatrix} U_k & Q_2 \end{bmatrix} \begin{bmatrix} G & R_1^*\\ R_2 & 0 \end{bmatrix} \begin{bmatrix} V_k & Q_1 \end{bmatrix}^*.$$

Let the SVD of the $2r \times 2r$ middle matrix be given by

$$\begin{bmatrix} G & R_1^*\\ R_2 & 0 \end{bmatrix} = U_G \Sigma_G V_G^*.$$

Since both $[U_k\ Q_2]$ and $[V_k\ Q_1]$ are orthogonal matrices, the SVD of $W_k$ is given by

$$W_k = \left([U_k\ Q_2] U_G\right) \Sigma_G \left([V_k\ Q_1] V_G\right)^*.$$

From the above discussion, we can see that computing $L_{k+1}$ from $W_k$ (or equivalently, computing the SVD of $W_k$) requires $O(N r^2 + r^3)$ flops, which account for the QR decompositions of $B$ and $C$ and the SVD of the middle matrix. Moreover, when $L = K$ (i.e., the matrices are square), nearly half of the computational cost and storage can be saved by using the Takagi factorization; interested readers can find the details in the references.
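The reduction can be verified numerically. The following sketch (our own variable names, real-valued data so that $B^* = B^T$) rebuilds the SVD of $W_k$ from the SVD of the $2r \times 2r$ middle matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
L, K, r = 200, 180, 4

# A rank-r matrix supplies the subspaces U_k, V_k; H_k stands in for the Hankelization.
A = rng.standard_normal((L, K))
U, s, Vh = np.linalg.svd(A, full_matrices=False)
Uk, Vk = U[:, :r], Vh[:r].T
Hk = rng.standard_normal((L, K))

# W_k = U_k G V_k* + U_k B* + C V_k*   (equation (2.4))
G = Uk.T @ Hk @ Vk
Bt = Uk.T @ Hk - (Uk.T @ Hk @ Vk) @ Vk.T          # B* = U_k* H_k (I - V_k V_k*)
C = Hk @ Vk - Uk @ (Uk.T @ Hk @ Vk)               # C  = (I - U_k U_k*) H_k V_k

# Reduce the SVD of W_k to the SVD of a 2r x 2r matrix.
Q1, R1 = np.linalg.qr(Bt.T)                       # B = Q1 R1, with Q1 orthogonal to V_k
Q2, R2 = np.linalg.qr(C)                          # C = Q2 R2, with Q2 orthogonal to U_k
M = np.block([[G, R1.T], [R2, np.zeros((r, r))]])
UG, sG, VGh = np.linalg.svd(M)
Uw = np.hstack([Uk, Q2]) @ UG                     # left singular vectors of W_k
Vw = np.hstack([Vk, Q1]) @ VGh.T                  # right singular vectors of W_k

Wk = Uk @ G @ Vk.T + Uk @ Bt + C @ Vk.T
assert np.allclose(Uw @ np.diag(sG) @ Vw.T, Wk)   # the reassembled SVD reproduces W_k
```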

It remains to investigate the cost for computing $z_{k+1} = \mathcal{H}^\dagger L_{k+1}$. Let $L_{k+1} = U_{k+1} \Sigma_{k+1} V_{k+1}^*$ be the reduced SVD of $L_{k+1}$, which can be obtained by truncating the SVD of $W_k$. We have

$$z_{k+1} = \mathcal{H}^\dagger L_{k+1} = \sum_{j=1}^{r} \Sigma_{k+1}(j,j)\, \mathcal{H}^\dagger \left[U_{k+1}(:,j) (V_{k+1}(:,j))^*\right].$$

Noting that

$$\left[\mathcal{H}^\dagger \left[U_{k+1}(:,j) (V_{k+1}(:,j))^*\right]\right]_a = \frac{1}{w_a} \sum_{p+q=a} U_{k+1}(p,j)\, \overline{V_{k+1}(q,j)}, \quad a = 0, \cdots, N-1,$$

where $w_a$ is defined in (1.2), each rank-one term can be computed by fast convolutions using $O(N \log N)$ flops. Thus, computing $z_{k+1}$ from $L_{k+1}$ requires $O(r N \log N)$ flops.

Putting it all together, the dominant per iteration computational cost of the Fast Cadzow's algorithm is $O(r N \log N + r^2 N + r^3)$. In addition, $O(N r)$ space is required to store the matrices appearing in the SVD and the QR decompositions.

### 2.3 High dimensional problems

We have presented the Cadzow’s and Fast Cadzow’s algorithms for the 1D problem. However, the algorithms can also work for high dimensional problems based on the multi-fold Hankel structures. For ease of exposition we discuss the two-dimensional case briefly but emphasize that the situation in general higher dimensional cases is similar.

Let $X \in \mathbb{C}^{N_1 \times N_2}$ be a 2D signal, and let $(L_1, K_1)$ and $(L_2, K_2)$ be two pairs of positive numbers which satisfy $L_1 + K_1 = N_1 + 1$ and $L_2 + K_2 = N_2 + 1$. The Hankel matrix corresponding to $X$, denoted $\mathcal{H}X$, is defined as follows:

$$\mathcal{H}X = \begin{bmatrix} \mathcal{H}X(0,:) & \mathcal{H}X(1,:) & \mathcal{H}X(2,:) & \cdots & \mathcal{H}X(K_1-1,:)\\ \mathcal{H}X(1,:) & \mathcal{H}X(2,:) & \mathcal{H}X(3,:) & \cdots & \mathcal{H}X(K_1,:)\\ \mathcal{H}X(2,:) & \mathcal{H}X(3,:) & \mathcal{H}X(4,:) & \cdots & \mathcal{H}X(K_1+1,:)\\ \vdots & \vdots & \vdots & & \vdots\\ \mathcal{H}X(L_1-1,:) & \mathcal{H}X(L_1,:) & \mathcal{H}X(L_1+1,:) & \cdots & \mathcal{H}X(N_1-1,:) \end{bmatrix},$$

where $\mathcal{H}X(i,:)$ is the $L_2 \times K_2$ Hankel matrix for the $i$-th row of $X$. In other words, we first form the Hankel matrix for each row of $X$ and then form the block Hankel matrix using the row Hankel matrices.

Assuming $\mathcal{H}X$ is a low rank matrix, the Cadzow's and Fast Cadzow's algorithms can be similarly developed for the 2D denoising and reconstruction problems; the details are omitted. Moreover, there are real signals whose multi-fold Hankel matrices are low rank, for example when $X$ is a 2D spectrally sparse signal or a frequency slice of seismic data after the Fourier transform [17, 27, 19].

## 3 Numerical Experiments

In this section we evaluate the empirical performance of the Fast Cadzow's algorithm against the Cadzow's algorithm. In our implementations of the algorithms, $L$ and $K$ (or $L_i$ and $K_i$ for higher dimensions) are chosen in such a way that the resulting Hankel matrix is approximately square. Numerical results demonstrate that the two algorithms exhibit similar denoising and reconstruction performance while the Fast Cadzow's algorithm can be substantially faster. We first compare the algorithms on randomly generated spectrally sparse signals (Subsection 3.1) and then test them on problems arising from two applications (Subsections 3.2 and 3.3). The experiments are executed in Matlab 2018a on a desktop with two Intel i5 CPUs (1.60GHz and 1.80GHz, respectively) and 8GB memory.

To utilize the fast Hankel matrix-vector multiplication, we use the Krylov subspace-based SVD solver from the PROPACK package to compute the partial SVD of $\mathcal{H}z_k$ in the Cadzow's algorithm. It is difficult to summarize the computational cost of such a method for computing the partial SVD; optimistically, the cost is $O(r N \log N)$ flops. Though this has the same order as the SVD computation in the Fast Cadzow's algorithm, it is worth noting that the hidden constant is not clear, since the performance of the Krylov method depends heavily on the properties of the input matrix and on the amount of effort spent to stabilize the algorithm. In contrast, the constants in the costs for computing the SVD in the Fast Cadzow's algorithm are exactly known, since the $O(r N \log N)$ term comes only from the fast Hankel matrix-vector multiplications and convolutions. Moreover, our numerical experiments show that the Fast Cadzow's algorithm is still two to four times faster even after the fast partial SVD package is used for the Cadzow's algorithm.

### 3.1 Spectrally sparse signal denoising and reconstruction

The definition of spectrally sparse signals can be found in Section 1.1. We first test the performance of the algorithms on signal denoising. The test signals are generated in the following way: the frequency of each harmonic is sampled uniformly at random, the argument of each complex weight is sampled uniformly at random, and the amplitude of each weight is also drawn at random.

We consider the additive noise model in which a true signal $x$ is corrupted by a noise vector $e$ of the form

$$e = \epsilon \cdot \|x\| \cdot \frac{w}{\|w\|},$$

where $\epsilon$ is referred to as the noise level and $w$ follows the standard multivariate normal distribution. Letting $z_k$ be the output of the algorithms, the mean squared error (MSE), defined by

$$\mathrm{MSE} = \frac{\|z_k - x\|}{\|x\|},$$

will be used to evaluate their performance. Tests are conducted for 1D, 2D and 3D spectrally sparse signals. The Cadzow's and Fast Cadzow's algorithms are terminated whenever successive iterates are sufficiently close. For each setting, a set of randomly generated problem instances is tested, and we compute the average computation time, the average number of iterations, and the average MSE for each algorithm; see Table 1. From the table we can see that the Cadzow's and Fast Cadzow's algorithms achieve similar MSE, and it also takes them roughly the same number of iterations to converge. However, the Fast Cadzow's algorithm is at least two times faster than the Cadzow's algorithm, since the additional subspace projection reduces the computational cost of the SVD.

Next we compare the two algorithms on reconstruction problems when there are missing entries. The overall setup is similar to the denoising case, except that instead of $x$ being contaminated by additive noise we only observe a fraction of its entries uniformly at random. Note that the variant of the Cadzow's algorithm for handling missing entries is presented in (1.5). In these tests the algorithms are terminated when successive iterates are sufficiently close. The average computational results over the random simulations are presented in Table 2. Clearly, both algorithms are able to exactly recover the missing entries, and the Fast Cadzow's algorithm is substantially faster.

The algorithms can also handle missing entries and additive noise simultaneously. The basic idea is to make an appropriate linear combination between the observed measurements and the update. More precisely, we consider the following variant (the Fast Cadzow version is obtained by inserting the projection $\mathcal{P}_{T_k}$ as before):

$$z_{k+1} = \alpha\, \mathcal{P}_\Omega(y) + (\mathcal{I} - \alpha\, \mathcal{P}_\Omega)\, \mathcal{H}^\dagger \mathcal{T}_r \mathcal{H} z_k, \quad k = 0, 1, \cdots, \tag{3.1}$$

where $\alpha$ is a tuning parameter. In our tests we choose $\alpha$ as suggested by Gao et al., and the algorithms are terminated when successive iterates are sufficiently close. The numerical results for this setting are summarized in Table 3. Again, the performance of the Fast Cadzow's algorithm is comparable to that of the Cadzow's algorithm, but the former is computationally more efficient.

### 3.2 Denoising in the reconstruction of periodic stream of Diracs

Methods for efficient signal acquisition and reconstruction are fundamental in signal processing. A novel paradigm has been proposed for the sampling and reconstruction of a $1$-periodic stream of Diracs, which achieves the minimum number of samples at the signal's rate of innovation. The overall sampling and reconstruction procedure is illustrated in Figure 1.

For simplicity, consider the $1$-periodic stream of Diracs described in (1.8). Noticing that $x_j$ and $t_j$ ($j = 1, \dots, r$) are the only unknowns in $x(t)$, the goal is to devise a sampling and reconstruction scheme such that $x_j$ and $t_j$ can be retrieved from about $2r$ samples. The signal is first convolved with a sinc kernel of bandwidth $B$, where $B$ is an odd integer, and then the samples are obtained on an equally spaced grid. This can be formally expressed as

$$y_n = \langle x(t), \mathrm{sinc}(B(nT - t)) \rangle = \sum_{j=1}^{r} x_j \phi(nT - t_j), \quad n = 1, 2, \cdots, N, \tag{3.3}$$

where $T = 1/N$, and $\phi(t)$ is the Dirichlet kernel:

$$\phi(t) = \frac{\sin(\pi B t)}{B \sin(\pi t)}. \tag{3.4}$$

An annihilating filter method can then be used to reconstruct $x(t)$ from the samples $\{y_n\}$. A key ingredient in the method is the construction of a filter which annihilates the Fourier coefficients $\{\hat{x}_k\}$. Recall that $x(t)$ is a periodic signal with Fourier coefficients $\hat{x}_k$. It has been shown that the annihilating filter can be identified as a null space vector of a Toeplitz matrix involving a set of consecutive values of $\hat{x}_k$. The reconstruction of $x(t)$ from the samples then boils down to the problem of estimating $\hat{x}_k$ from $\{y_n\}$; to this end, it can be shown that the discrete Fourier coefficients of $\{y_n\}$ coincide with a subset of $\{\hat{x}_k\}$.

If there is no noise, the annihilating filter method is able to reconstruct $x(t)$ exactly from the minimum number of measurements. However, noise is prevalent during the sampling process. We will consider the additive noise model; namely, $y_n + e_n$ is observed. When noise exists, a common strategy is to oversample the measurements and then perform denoising. This is where the Cadzow's or Fast Cadzow's algorithm plays a role. In Section 1.1 we have shown that the Hankel matrix corresponding to a subset of $\{\hat{x}_k\}$ is low rank. Noticing the agreement between the discrete Fourier coefficients of $\{y_n\}$ and a subset of $\{\hat{x}_k\}$, it follows immediately that the Hankel matrix associated with those coefficients is also low rank. Therefore, the Cadzow's or Fast Cadzow's algorithm can be used for the task of denoising.

The denoising problem studied here differs from the one in the last subsection: the Hankelization of the signal itself is not low rank, but the Hankelization of its Fourier coefficients is. Next we compare the performance of the Cadzow's and Fast Cadzow's algorithms in this situation. Tests are conducted for randomly generated streams of Diracs, where each $x_j$ and each $t_j$ are sampled uniformly at random, and the bandwidth of the sinc kernel and the number of samples are fixed across the tests. We test three different noise levels, and 1500 randomly generated problem instances are simulated for each noise level. The algorithms are terminated when successive iterates are sufficiently close. The average computational results are shown in Table 4. Even though both algorithms are very fast in these experiments due to the small problem size, the Fast Cadzow's algorithm is about four times faster.

### 3.3 Seismic data denoising and reconstruction

Seismic denoising and seismic recovery from missing traces are two major tasks in seismic data processing, and different techniques have been developed for them. Cadzow's algorithm, also known as eigen filtering in this context, was proposed by Trickett to attenuate random noise from seismic records [22, 21, 23, 24, 20]. It is based on the observation that if seismic data consists of a few linear events, then the associated Hankel matrices of the constant-frequency slices are low rank. The algorithm was extended to the problem of reconstructing missing traces (see (1.5)), which can be viewed as a non-convex version of the method of projection onto convex sets (POCS). Moreover, a generalized version of the algorithm, which is the one listed in (3.1), has also been presented for simultaneous denoising and reconstruction of seismic data.

As an accelerated variant of the Cadzow's algorithm, the Fast Cadzow's algorithm is equally applicable to seismic data processing. We compare the performance of the two algorithms on a 4D volume of traces, where each trace contains a number of time samples; thus, this is a 5D data array with one time dimension and four spatial dimensions. The data consists of three linear events, so we set $r = 3$ in the algorithms. Tests have been conducted for the denoising problem as well as for the recovery problem with 50% missing traces. As is typical in the literature on seismic data processing, the algorithms are run for a fixed total number of iterations. The MSE and computational time are presented in Table 5. In addition, the seismic profile of one data slice with three fixed spatial dimensions is plotted in Figures 2 and 3 for seismic denoising and recovery, respectively. From the table and the figures we can see that the two algorithms have comparable performance. However, the table does show that the MSE of the Fast Cadzow's algorithm is slightly better than that of the Cadzow's algorithm for seismic denoising. Moreover, the Fast Cadzow's algorithm is two times faster for the denoising problem and four times faster for the recovery problem. Figure 2: A slice of (a) noiseless data, (b) noisy data, (c) data returned by Cadzow denoising, and (d) data returned by Fast Cadzow denoising. Vertical: time dimension; Horizontal: one spatial dimension with the other three held fixed. Figure 3: A slice of (a) data with missing traces, (b) data returned by Cadzow recovery, and (c) data returned by Fast Cadzow recovery. Vertical: time dimension; Horizontal: one spatial dimension with the other three held fixed.

## 4 A gradient variant for denoising

### 4.1 Motivation and new algorithms

When applying the Cadzow's algorithm or the Fast Cadzow's algorithm for signal denoising, it has been observed that the MSE does not always decrease as the algorithm iterates. For example, Trickett notes that

Cadzow recommended iterating between the rank reduction and averaging steps, but I have not found it to give better results for this application.

Gillard also notes in his paper that

In the simulation study within this paper, it has been demonstrated that repeated iterations of Cadzow’s basic algorithm (in an attempt to separate the noise from the signal) may result in an increased RMSE from the true signal.

As an illustration, the MSE plots corresponding to two random tests of 1D spectrally sparse signal denoising are presented in Figure 4. The MSE in the left plot decreases, but the MSE in the right plot increases.

To explain this phenomenon, we study the equivalent form of the Cadzow's algorithm in (1.4). It is trivial that the update can be rewritten as

$$Z_{k+1} = \mathcal{P}_{\mathcal{M}_H} \mathcal{P}_{\mathcal{M}_r} Z_k = \mathcal{P}_{\mathcal{M}_H} \mathcal{P}_{\mathcal{M}_r}\left(Z_k + t(Y - Z_k)\right), \quad t = 0,$$

where $Y = \mathcal{H}y$ and $t$ denotes a step length. Thus, the Cadzow's algorithm can be interpreted as a projected gradient method for the following optimization problem,

$$\min \frac{1}{2} \|Z - Y\|_F^2 \quad \text{subject to} \quad \mathrm{rank}(Z) \le r \ \text{and}\ Z \ \text{is Hankel}, \tag{4.1}$$

though with a step length of zero. Note that for a Hankel matrix $Z = \mathcal{H}z$ the objective function in the above optimization problem is equal to

$$\sum_{a=0}^{N-1} w_a |z_a - y_a|^2,$$

where $w_a$ is the number of entries on the $a$-th skew-diagonal of an $L \times K$ matrix; see (1.2). Thus, the Cadzow's algorithm for signal denoising indeed solves an optimization problem which puts different weights on different entries of the signal, and the Fast Cadzow's algorithm shares the same property. Since the weights for the middle entries are larger than those at the two ends, one may expect that after the Cadzow's or Fast Cadzow's denoising the component-wise MSE of the middle entries should be smaller. The plot in Figure 5 confirms this speculation: it shows the average component-wise MSE over 1500 random tests after applying the Cadzow's and Fast Cadzow's algorithms for 1D spectrally sparse signal denoising.

In order to address this unbalanced weight issue, we consider the following optimization problem with a re-weighted objective function:

$$\min \frac{1}{2} \left\|\sqrt{W} \odot (Z - Y)\right\|_F^2 \quad \text{subject to} \quad \mathrm{rank}(Z) \le r \ \text{and}\ Z \ \text{is Hankel}, \tag{4.2}$$

where $\sqrt{W}$ means taking the square root of each entry of $W$, $\odot$ denotes the component-wise product, and

$$\sqrt{W} = \mathcal{H}\left(\frac{1}{\sqrt{w_0}}, \cdots, \frac{1}{\sqrt{w_{N-1}}}\right) = \begin{bmatrix} 1 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} & \cdots & \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} & & & \\ \frac{1}{\sqrt{3}} & & \ddots & & \frac{1}{\sqrt{3}}\\ & & & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{2}}\\ & \cdots & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{2}} & 1 \end{bmatrix},$$

so that the objective in (4.2) assigns equal weight to every entry of the signal. A projected gradient method with step length $1$ can then be developed for (4.2) as follows:

$$Z_{k+1} = \mathcal{P}_{\mathcal{M}_H} \mathcal{P}_{\mathcal{M}_r}\left(Z_k + W \odot (Y - Z_k)\right).$$

The equivalent form of the algorithm in the vector domain, dubbed the Gradient method, is presented in Algorithm 2, where the residual $y - z_k$ is re-weighted entrywise by $1/w_a$ before the Hankelization.
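A vector-domain sketch consistent with the projected gradient update above (this is our reading of Algorithm 2, not the paper's exact listing; `gradient_denoise` and its defaults are our own):

```python
import numpy as np

def gradient_denoise(y, r, L, n_iter=50):
    """Sketch of the Gradient method: the residual y - z_k is re-weighted
    entrywise by 1/w_a before Hankelization and rank-r truncation."""
    N = len(y)
    K = N - L + 1
    # w[a] = number of entries on the a-th skew-diagonal, see (1.2)
    w = np.array([min(a + 1, L, K, N - a) for a in range(N)], dtype=float)
    z = np.asarray(y, dtype=complex).copy()
    for _ in range(n_iter):
        g = z + (y - z) / w                       # re-weighted gradient step, step length 1
        H = np.array([[g[i + j] for j in range(K)] for i in range(L)])
        U, s, Vh = np.linalg.svd(H, full_matrices=False)
        Hr = (U[:, :r] * s[:r]) @ Vh[:r]
        z = np.zeros(N, dtype=complex)
        cnt = np.zeros(N)
        for i in range(L):
            for j in range(K):
                z[i + j] += Hr[i, j]
                cnt[i + j] += 1
        z /= cnt
    return z

rng = np.random.default_rng(2)
t = np.arange(64)
x = np.exp(2j * np.pi * 0.31 * t)                 # rank-1 Hankelization
y = x + 0.1 * (rng.standard_normal(64) + 1j * rng.standard_normal(64))
z = gradient_denoise(y, r=1, L=32)
assert np.linalg.norm(z - x) < np.linalg.norm(y - x)
```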

Of course, we can use the same technique as in the Fast Cadzow's algorithm to accelerate Algorithm 2, leading to the Fast Gradient method; see Algorithm 3. In the remainder of this section we compare the algorithms on different signal denoising problems, focusing on denoising performance. Regarding computational efficiency, the dominant per iteration cost of the Gradient method is the same as that of the Cadzow's algorithm, and the dominant per iteration cost of the Fast Gradient method is the same as that of the Fast Cadzow's algorithm.

### 4.2 Empirical evaluations

We first compare the four algorithms, Cadzow, Fast Cadzow, Gradient and Fast Gradient, on spectrally sparse signal denoising (see Section 3.1). All the algorithms are run for a fixed maximum number of iterations, and 1500 random problem instances are tested for each setting. The average per iteration MSE of each algorithm against the number of iterations is presented in Figure 6. Overall, the MSE of the Cadzow's and Fast Cadzow's algorithms for the 1D denoising problem increases as the algorithms iterate. In contrast, the MSE of the Gradient and Fast Gradient methods decreases as the algorithms iterate. For the 2D and 3D denoising problems, though on average the MSE of all the algorithms shows a decreasing trend, the Gradient and Fast Gradient methods achieve smaller MSE. Figure 6: Average per iteration MSE out of 1500 random tests for spectrally sparse signal denoising.

In addition, we also examine each individual test and consider one to be a positive test if the MSE of an algorithm at the last iteration is smaller than that at the first iteration. The proportion of positive tests for each algorithm is presented in Table 6. It can be observed that the number of positive tests of the Gradient and Fast Gradient methods is larger than that of the Cadzow's and Fast Cadzow's algorithms, especially for the 1D denoising problem.

It is also interesting to compare the average component-wise MSE of the outputs of the four algorithms; see Figure 7 for the 1D case. The figure shows that, to some extent, the Gradient and Fast Gradient methods are able to alleviate the issue of unbalanced weights over different entries.

Next we test the algorithms on the denoising problem arising from the reconstruction of streams of Diracs; see Section 3.2. Roughly speaking, the Fourier transform of the test signal has a low rank Hankelization, so we first perform denoising in the frequency domain and then obtain the denoised signal via the inverse FFT. Here we repeat 1500 random tests at a fixed noise level. The average MSE against the number of iterations is plotted in Figure 8, and the proportion of positive tests of each algorithm is reported in Table 7. Clearly, the average MSE of the Gradient and Fast Gradient methods decreases while the MSE of the Cadzow's and Fast Cadzow's algorithms increases, and there are more positive tests after the Gradient and Fast Gradient denoising. Moreover, Figure 8 also shows the denoising results of the four algorithms in the time domain for a single random test, where the Gradient and Fast Gradient methods exhibit better denoising performance in certain regions of the time domain. Figure 8: Left: Average per iteration MSE out of 1500 random tests for denoising Dirac samples in the frequency domain; Right: Denoising results in the time domain.

Lastly, we compare the algorithms on the seismic denoising problem from Section 3.3. The MSE against the number of iterations is presented in Figure 9. The plot shows that all the algorithms have overall decreasing MSE, but the MSE of the Gradient and Fast Gradient methods is smaller. It is also worth noting that for seismic data denoising the accelerated algorithms have noticeably smaller MSE than their non-accelerated counterparts. Moreover, the seismic profile of a data slice after the Gradient and Fast Gradient denoising is presented in Figure 10 to help visualize their denoising performance. Figure 9: MSE for seismic data denoising. Figure 10: A slice of (a) data returned by Gradient denoising and (b) data returned by Fast Gradient denoising. Vertical: time dimension; Horizontal: one spatial dimension with the other three held fixed.

## 5 Conclusion

In this paper we have considered the signal denoising and recovery problems for signals corresponding to low rank Hankel matrices, and proposed new algorithms that complete these tasks more efficiently and effectively. For future work, we would like to establish a theoretical convergence analysis of the proposed algorithms under certain random models. It would also be interesting to investigate whether better adaptive re-weighting schemes can be designed for the gradient methods.

## Acknowledgments

HW and KW would like to thank Jianjun Gao for sending us the 5D data for the seismic denoising and recovery tests.

## References

•  R. Abma and N. Kabir, 3D interpolation of irregular data with a POCS algorithm, Geophysics, 71 (2006), pp. E91–E97.
•  T. Blu, P.-L. Dragotti, M. Vetterli, P. Marziliano, and L. Coulot, Sparse sampling of signal innovations, IEEE Signal Processing Magazine, 25 (2008), pp. 31–40.
•  D. S. Broomhead and G. P. King, Extracting qualitative dynamics from experimental data, Physica D: Nonlinear Phenomena, 20 (1986), pp. 217–236.
•  J. A. Cadzow, Signal enhancement – a composite property mapping algorithm, IEEE Transactions on Acoustics, Speech, and Signal Processing, 36 (1988), pp. 49–62.
•  J.-F. Cai, T. Wang, and K. Wei, Fast and provable algorithms for spectrally sparse signal reconstruction via low-rank Hankel matrix completion, Applied and Computational Harmonic Analysis, 46 (2019), pp. 94–121.
•  Y. Chen and Y. Chi, Robust spectral compressed sensing via structured matrix completion, IEEE Transactions on Information Theory, 60 (2014), pp. 6576–6601.
•  D. L. Donoho, De-noising by soft-thresholding, IEEE Transactions on Information Theory, 41 (1995), pp. 613–627.
•  D. L. Donoho and I. M. Johnstone, Minimax estimation via wavelet shrinkage, The Annals of Statistics, 26 (1998), pp. 879–921.
•  J. Gao, M. D. Sacchi, and X. Chen, A fast reduced-rank interpolation method for prestack seismic volumes that depend on four spatial dimensions, Geophysics, 78 (2013), pp. V21–V30.
•  J. Gillard, Cadzow’s basic algorithm, alternating projections and singular spectrum analysis, Statistics and its Interface, 3 (2010), pp. 335–343.
•  N. Golyandina, I. Florinsky, and K. Usevich, Filtering of digital terrain models by 2D singular spectrum analysis, International Journal of Ecology & Development, 8 (2007), pp. 81–94.
•  N. Golyandina, V. Nekrutkin, and A. A. Zhigljavsky, Analysis of time series structure: SSA and related techniques, Chapman and Hall/CRC, 2001.
•  N. Golyandina and D. Stepanov, SSA-based approaches to analysis and forecast of multidimensional time series, in proceedings of the 5th St. Petersburg workshop on simulation, vol. 2, 2005, pp. 293–298.
•  N. Halko, P. G. Martinsson, and J. A. Tropp, Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions, SIAM Review, 53 (2011), pp. 217–288.
•  H. Hassani and D. Thomakos, A review on singular spectrum analysis for economic and financial time series, Statistics and its Interface, 3 (2010), pp. 377–397.
•  H. Hassani and A. Zhigljavsky, Singular spectrum analysis: methodology and application to economics data, Journal of Systems Science and Complexity, 22 (2009), pp. 372–394.
•  Y. Hua, Estimating two-dimensional frequencies by matrix enhancement and matrix pencil, IEEE Transactions on Signal Processing, 40 (1992), pp. 2267–2280.
•  R. Larsen, PROPACK: Software for large and sparse SVD calculations, version 2.1. http://sun.stanford.edu/~rmunk/PROPACK/, Apr. 2005.
•  V. Oropeza, The singular spectrum analysis method and its application to seismic data denoising and reconstruction. M.Sc. thesis, University of Alberta, 2010.
•  V. Oropeza and M. Sacchi, Simultaneous seismic data denoising and reconstruction via multichannel singular spectrum analysis, Geophysics, 76 (2011), pp. V25–V32.
•  V. E. Oropeza and M. D. Sacchi, Multifrequency singular spectrum analysis, in SEG Technical Program Expanded Abstracts 2009, Society of Exploration Geophysicists, 2009, pp. 3193–3197.
•  M. D. Sacchi, FX singular spectrum analysis, in CSPG CSEG CWLS Convention, 2009, pp. 392–395.
•  S. Trickett, F-xy Cadzow noise suppression, in SEG Technical Program Expanded Abstracts 2008, Society of Exploration Geophysicists, 2008, pp. 2586–2590.
•  S. Trickett and L. Burroughs, Prestack rank-reduction-based noise suppression, CSEG Recorder, 34 (2009), pp. 24–31.
•  B. Vandereycken, Low-rank matrix completion by Riemannian optimization, SIAM Journal on Optimization, 23 (2013), pp. 1214–1236.
•  K. Wei, J.-F. Cai, T. F. Chan, and S. Leung, Guarantees of Riemannian optimization for low rank matrix recovery, SIAM Journal on Matrix Analysis and Applications, 37 (2016), pp. 1198–1222.
•  H. H. Yang and Y. Hua, On rank of block Hankel matrix for 2-D frequency detection and estimation, IEEE Transactions on Signal Processing, 44 (1996), pp. 1046–1048.