# Fast Singular Value Shrinkage with Chebyshev Polynomial Approximation Based on Signal Sparsity

We propose an approximation method for thresholding of singular values using Chebyshev polynomial approximation (CPA). Many signal processing problems require iterative application of singular value decomposition (SVD) for minimizing the rank of a given data matrix with other cost functions and/or constraints, which is called matrix rank minimization. In matrix rank minimization, singular values of a matrix are shrunk by hard-thresholding, soft-thresholding, or weighted soft-thresholding. However, the computational cost of SVD is generally too expensive to handle high dimensional signals such as images; hence, in this case, matrix rank minimization requires enormous computation time. In this paper, we leverage CPA to (approximately) manipulate singular values without computing singular values and vectors. The thresholding of singular values is expressed by a multiplication of certain matrices, which is derived from a characteristic of CPA. The multiplication is also efficiently computed using the sparsity of signals. As a result, the computational cost is significantly reduced. Experimental results suggest the effectiveness of our method through several image processing applications based on matrix rank minimization with nuclear norm relaxation in terms of computation time and approximation precision.

## Authors

• 1 publication
• 5 publications
• 3 publications
• 7 publications
09/01/2015

### Fast Randomized Singular Value Thresholding for Low-rank Optimization

Rank minimization can be converted into tractable surrogate problems, su...
04/14/2017

### A Fast Implementation of Singular Value Thresholding Algorithm using Recycling Rank Revealing Randomized Singular Value Decomposition

In this paper, we present a fast implementation of the Singular Value Th...
04/27/2020

### Input-Sparsity Low Rank Approximation in Schatten Norm

We give the first input-sparsity time algorithms for the rank-k low rank...
12/02/2010

### A Block Lanczos with Warm Start Technique for Accelerating Nuclear Norm Minimization Algorithms

Recent years have witnessed the popularity of using rank minimization as...
12/29/2021

### Fast Subspace Identification Method Based on Containerised Cloud Workflow Processing System

Subspace identification (SID) has been widely used in system identificat...
12/06/2014

### Generalized Singular Value Thresholding

This work studies the Generalized Singular Value Thresholding (GSVT) ope...
03/10/2020

### Error Estimation for Sketched SVD via the Bootstrap

In order to compute fast approximations to the singular value decomposit...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## I Introduction

The low-rank structure inherent in various signals has been widely exploited in many signal processing applications, such as matrix and tensor completion

[1, 2, 3, 4], image decomposition [5, 6], photometric stereo [7, 8], image alignment [9, 10][11], inpainting [12, 13], background modeling [14, 15, 16, 17, 18], color artifact removal [19], cognitive radio [20], and voice separation [21]. In such applications, the low-rank structure is incorporated into a minimization problem involving the rank function or its continuous relaxation. The problem is solved using some iterative algorithms, with which the thresholding of singular values is usually required at each iteration. We refer to this methodology as matrix rank minimization.

There are two representative approaches of matrix rank minimization. One is the exact method. It is an ideal formulation, but the resulting problem is very difficult to solve due to the non-convexity and combinatorial nature of the rank function. The other is the nuclear norm relaxation [22]. Since the nuclear norm, the sum of the singular values of a matrix, is the tightest convex relaxation of the rank function, we can efficiently solve the resulting problem via convex optimization techniques. Weighted nuclear norm relaxation [23, 24] has recently been proposed as a non-convex but continuous approximation of the rank function.

Essentially, both of the above methods require the thresholding of singular values, which we call singular value shrinkage, at each iteration of certain optimization methods (Fig. 1). That is, most methods for matrix rank minimization must carry out singular value decomposition (SVD) many times. This is a serious problem in terms of computational cost when we handle large matrices, even with high-spec computers.

Several methods have been proposed to tackle this issue [25, 26, 27]. The basic concept of [25, 26] is to approximately compute partial singular values and/or vectors. These methods can drastically reduce the computation time of singular value shrinkage but would not be suitable for the matrix rank minimization. Since the number of singular values above a threshold is not identified without the full decomposition, many singular values above a threshold are reduced to zero in each iteration. As a result, large approximation errors are produced, which results in an unstable convergence in the matrix rank minimization. With the other method [27], singular value shrinkage is carried out by computing neither singular values nor vectors, but the reduction in the computation time is still limited. This is because the method requires a complete orthogonal decomposition [28] and the calculation of the inverse of a large matrix. It also leads to large approximation errors, i.e., an unstable convergence in the matrix rank minimization. We consider a method similar to that by Cai and Osher [27]: We only need a “processed” matrix with thresholded singular values.

In this paper, we propose a fast singular value shrinkage method for reducing the computational cost in the matrix rank minimization of high dimensional matrices. Note that the proposed method computes neither singular values nor vectors during the process of singular value shrinkage, similar to the method by Cai and Osher [27]. Furthermore, our method maintains computational precision to lead matrix rank minimization algorithms to stable convergence. The two key tools of our method are described as follows.

• Chebyshev polynomial approximation (CPA) [29, 30, 31]: This tool is often used for designing filters in signal processing [32, 33] and is a key tool for reducing computational cost. The applications of CPA have been studied by Saad et al.[34, 35, 36, 37]. With the applications by Saad et al., CPA is used to calculate a vector after being transformed by a matrix with singular value shrinkage. That is, it requires the iterative multiplications of a matrix and vector to derive the Chebyshev polynomials. The concept of the applications has recently been used for improving the performance of image filtering methods such as bilateral filter, non-local means, and BM3D [38, 39, 40]. In contrast, we propose a method to obtain a matrix whose singular values are processed by using CPA. Since CPA results in truncation errors, such as ripples in the lower-order approximations, we also investigate the designs of thresholding functions and appropriate approximation order for reducing approximation errors.

• Sparsity of signals: By using CPA, our method can represent singular value shrinkage as a multiplication of matrices. Since the multiplication can be computed efficiently when the matrices are sparse, our method exploits the inherent sparsity of signals in their frequency domain for further acceleration.

Since matrix rank minimization plays a central role in various signal processing tasks, our method offers many promising applications. For this study, we validated the proposed method by using two image processing applications: image inpainting [12] and background modeling[14, 15, 16, 17, 18]. In these applications, target problems are formulated as convex optimization problems involving the nuclear norm so that they can be efficiently solved using the alternating direction method of multiplier (ADMM) [41] with our method.

Although the ADMM is widely known as a robust method for computation errors in each iteration, optimization methods (including the ADMM) with the other fast singular value shrinkage methods [25, 26, 27] do not converge well due to their large approximation errors. In contrast, our CPA-based singular value shrinkage method leads optimization methods to stable convergence. We validated this advantage experimentally by comparing our method with the other fast singular value shrinkage methods in several image processing tasks and a synthetic data.

The preliminary version of this study, without using signal sparsity, analysis of our method, and new applications, has previously been published [42].

This paper is organized as follows. Section II defines notations and preliminaries. We discuss our CPA-based singular value shrinkage method, which is the main contribution in this paper, in Section III. We discuss an approximation order of CPA for reducing the size of approximation errors in Section IV and verification of our method through applications in Section V. Finally, we conclude the paper in Section VI.

## Ii Notations and Preliminaries

### Ii-a Notations

Bold-face capital and small letters indicate a matrix and a vector, respectively. Superscript is the transpose of a matrix and a vector, and superscript is the inverse of a non-singular matrix. The matrices and

are the identity matrix and null matrix, respectively. The vector

. The norm for is defined as . We also use CPA as follows.

### Ii-B Chebyshev Polynomial Approximation

Let and be a real-valued function defined on the interval and its approximated function by using CPA, respectively. Chebyshev polynomial approximation [29, 30, 31] gives an approximate solution of by using the truncated Chebyshev series:

 ˆh(x):=12c0+α−1∑k=1ckψk(x), (1)

where and denote a Chebyshev coefficient (described later) and an approximation order, respectively. Additionally, denotes the -th order Chebyshev polynomials of the first kind, defined as

 ψk(x):=cos(karccos(x)). (2)

It can also be computed using the stable recurrence relation:

 ψk(x)=2xψk−1(x)−ψk−2(x),ψ0(x)=1,ψ1(x)=x. (3)

The initial condition is defined as and . Since the polynomials consist of cosine functions, the value of is bounded between and for . By using and the orthogonality of the cosine function, is calculated as

 ck:=2αα∑l=1cos(kθ(l))h(cosθ(l)), (4)

where .

## Iii Singular Value Shrinkage using Chebyshev Polynomial Approximation by Exploiting Sparsity

We discuss singular value shrinkage using CPA. First, the CPA of a matrix form, which can approximately shrink the eigenvalues of a matrix (

eigenvalue shrinkage), is indicated then extended to the singular one.

### Iii-a Chebyshev Polynomial Approximation for Matrix

Let be a full rank matrix and be its eigendecomposition (EVD), where

is the matrix composed of eigenvectors and

is the diagonal matrix with the corresponding eigenvalues. We assume that the eigenvalues are bounded between and , where . Hence, the eigenvalues of are shrunk as

 H(A):=Pdiag(h(λA1),…,h(λAn))P−1, (5)

where is the eigenvalue shrinkage function, and is the filter kernel defined in . In this subsection, we consider the approximated solution of (5) using the CPA.

The CPA of the matrix form [30, 34, 36, 37] gives an approximated solution of the eigenvalue shrinkage function by using truncated Chebyshev series as

 ˆH(A):=12ˆc0Id+α−1∑k=1ˆckΨk(ˆA), (6)

where and are Chebyshev coefficients and Chebyshev polynomials, respectively, which are defined later. Additionally, is the eigenvalue-shifted matrix given by

 ˆA:=2λAmaxA−Id, (7)

whose eigenvalues are obviously within . Thanks to (7), the -th order Chebyshev polynomial of is computed as

 Ψk(ˆA) =Ψk(2λAmaxA−Id) =PΨk(2λAmaxΛA−Id)P−1 =Pdiag(coskθ1,…,coskθn)P−1, (8)

where . Similarly to (3), the Chebyshev polynomials are obtained using the recurrence relation:

 (9)

Recall that is defined only in the interval . Therefore, the range of the filter kernel is modified by deriving as

 (10)

The term returns the shifted range back to the original range . From (10), can also be represented using as

 (11)

The function , which is referred to as the CPA-based eigenvalue shrinkage function, results in approximate eigenvalue shrinkage. The CPA-based eigenvalue shrinkage actually computes neither eigenvalues nor vectors thanks to the recurrence relation (9).

### Iii-B CPA-based Singular Value Shrinkage

Let be a rectangular matrix and be its singular value decomposition, where and are orthogonal matrices. The is the singular value matrix represented as

 Σ=⎡⎢ ⎢ ⎢ ⎢ ⎢⎣σ1O⋱σnO⎤⎥ ⎥ ⎥ ⎥ ⎥⎦. (12)

Without loss of generality, we can assume . The singular values of are shrunk with the singular value shrinkage function as

 G(B):=U⎡⎢ ⎢ ⎢ ⎢ ⎢⎣g(σ1)O⋱g(σn)O⎤⎥ ⎥ ⎥ ⎥ ⎥⎦V⊤, (13)

where is an arbitrary function.

The eigenvalue shrinkage in (5) can be extended to in (13) as [35]

 G(B)=BH(B⊤B), (14)

where in in (5). Equation (14) is derived as follows. First, (13) can be expanded as

 G(B) =UΣdiag(g(σ1)σ1,…,g(σn)σn)V⊤ (15) =UΣV⊤Vdiag(g(σ1)σ1,…,g(σn)σn)V⊤ =BVdiag(g(σ1)σ1,…,g(σn)σn)V⊤.

When the eigenvalue matrix of is defined as , it is obviously represented using the singular values of as . Consequently, (15) is equally calculated using in (5) as

 G(B) =BVdiag(g(σ1)σ1,…,g(σn)σn)V⊤ (16) =BVdiag(h(σ21),…,h(σ2n))V⊤ =BH(B⊤B).

Note that [35] aims to calculate a vector represented as

 ˆx=BH(B⊤B)x, (17)

where and are the input and output vectors, respectively. The CPA is applied to to quickly derive in [35]. In contrast, our method is focused on deriving the matrix in (16) itself. When the matrix, whose singular values are shrunk using CPA, is represented as , the explicit SVD of can be avoided. However, deriving the matrix, not the vectors, usually requires enormous computation time because of multiplication of dense matrices. To accelerate the calculation, the sparseness of a matrix is exploited with our method, as indicated below.

Assume that is a matrix composed of an inherently sparse signal. Let

be an arbitrary orthogonal matrix that efficiently sparsifies

, e.g.,

is considered as the discrete Fourier transform

[43], discrete cosine transform (DCT) [44], and discrete wavelet transform (DWT) [45]. Note that the matrix is the forward transform, i.e., when let be a column vector, the forward transform is represented as . From the above, (14) is further rewritten with as

 G(B)=BT⊤H(TB⊤BT⊤)T=BT⊤H(Φ)T, (18)

where for simplicity. With the CPA-based eigenvalue shrinkage function in (6) and (11), in (18) is efficiently approximated as

 (19)

The form of (19) enables us to use the sparsity of a signal in its frequency domain.

For further enhancing the sparsity of , its components are thresholded as

 Φij–––={Φijif |Φij|≥ε,0otherwise, (20)

where and are the -th row and -th column of and its truncated coefficient, and is an arbitrary small value. We show that this truncation has little effect on the performance of our method and provides recommended settings of in Section V. As a result, the CPA-based eigenvalue shrinkage of is approximately given by

 ˆH(Φ)≈ˆH(Φ––). (21)

In summary, the singular value shrinkage of is approximately represented as

 G(B) =U⎡⎢ ⎢ ⎢ ⎢ ⎢⎣g(σ1)O⋱g(σn)O⎤⎥ ⎥ ⎥ ⎥ ⎥⎦V⊤ =BH(B⊤B) =BT⊤H(Φ)T ≈BT⊤ˆH(Φ)T ≈BT⊤ˆH(Φ––)T=ˆG(B), (22)

where the function is the CPA-based singular value shrinkage function. It can be calculated with the recurrence relation as

 (23)

where

 ˆB:=2ΛmaxΦ––−Id, (24)

in which is the maximum eigenvalue of . The pseudocode of the CPA-based singular value shrinkage is indicated in Algorithm 1.

### Iii-C Computational Complexity of CPA-based Singular Value Shrinkage

We now discuss the computational complexity of our method. Assume that matrices and have and nonzero elements, respectively. The maximum number of multiplications of nonzero elements required to calculate is represented as in the case of a sparse matrix. The computational complexity of line 9 in Algorithm 1 can be represented as due to the multiplication . At line 10 in Algorithm 1, the computation takes from the multiplication . That is, the total computational complexity is represented as , where represents the maximum value among . From the above, when becomes small, the computational cost is also reduced.

For low computational complexity, the matrix should be constructed so as not to increase the number of its nonzero elements as much as possible in the multiplication of matrices.

## Iv Shrinkage Functions and Approximation Order

In this section, we discuss suitable approximation orders for shrinkage functions approximated by CPA, which has small truncation errors. Additionally, we argue that CPA is a reasonable choice for our method among a variety of polynomial approximation methods.

As an introduction, we consider the shrinkage function shown in Fig. 2(a). Let be the hard shrinkage response defined as

 hhard(x):={1if x>τhard,0otherwise, (25)

where is an arbitrary real value and . Chebyshev polynomial approximation gives an approximate response of in (1). As in Fig. 2(b), the approximated response has ripples, which is widely known in digital filter design [46, 47, 32, 48, 49]. Therefore, studying the design of appropriate shrinkage responses and approximation orders is an important topic, even for our method.

### Iv-a Approximation Order

Possible shrinkage responses handled with our method can be expressed as the following generic form:

 h(x;w(x)ρ,τ):=⎧⎪ ⎪⎨⎪ ⎪⎩√x−w(x)ρ√xif √x>τ,0otherwise, (26)

where is a weight function, and and are arbitrary thresholding values. The choices of and determine the characteristics of (26) as follows:

•  : Hard-shrinkage.

•  : Weighted soft-shrinkage.

•  : Soft-shrinkage111Soft-shrinkage is widely known as in which is an operator choosing the greater one out of and . However, we call soft-shrinkage because it is finally transformed into in (22)..

Note that we defined in (14), where is an arbitrary shrinkage function for the eigenvalues of .

As a result, becomes the hard-shrinkage, weighted soft-shrinkage, or soft-shrinkage functions when is set as above.

These choices among the shrinkages are shown in Fig. 3. It is clear that hard-shrinkage has a sharp transition band (see Fig. 3(a)); therefore, CPA, which is computed as a linear combination of cosine functions, may not approximate it well. In contrast, one can expect that a response that has a smooth transition band is suitable for CPA. To verify this numerically, the approximated responses were compared among hard-shrinkage, weighted soft-shrinkage, and soft-shrinkage. In this experiment, was used. For weighted soft-shrinkage, and were used. Additionally, the thresholding value for hard-shrinkage was set to .

Figure 3 also shows the approximated shrinkage responses of (26) for various shrinkage conditions. Hard-shrinkage yields larger errors than soft ones. Empirically, hard-shrinkage requires more than the 50th-oder approximation. In contrast, soft-shrinkages only require 10–20th-order approximations. To be more specific, is recommended for a small weight shown in Fig. 3(b), whereas for the soft-shrinkage response shown in Fig. 3(d).

### Iv-B Suitability of CPA

There are many polynomial approximations. Even among them, minimax polynomial approximation [50, 46, 47, 51, 52] and least squares approximation [53] are well known as the best approximation in the sense of the minimization of the infinity norm and the least squares error w.r.t the difference between an exact and approximated responses, respectively. To derive polynomial coefficients, their optimization requires a minimization of norm represented as

 minh′(x)∈R∥h(x)−h′(x)∥p, (27)

where is an approximated shrinkage response with the above two polynomial approximations. Clearly, (27) requires the exact response for . When is precisely represented using many sampling points, exhibits good performance. However, computational complexity becomes high when many sampling points are used, especially in the case of least squares approximation. Let be the column vector of coefficients for the polynomial approximation. That is, the approximated shrinkage response can be calculated as . Additionally, let and be the column vectors composed of real values, respectively. The Vandermonde matrix is defined as

 Υ:=⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣1x21⋯xα−111x22⋯xα−12⋮⋮⋱⋮1x2n⋯xα−1n⎤⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦. (28)

From the above definitions, coefficients of the least squares approximation are calculated as , where is the pseudo inverse of a matrix. The calculation requires high computational cost when and/or are large. In contrast, CPA only requires the inner product of and to derive coefficients of polynomials from (4), where . Additionally, CPA performs better approximation than other optimization methods. To verify the exellent approximation, CPA was compared with minimax approximation and least squares approximation, as shown in Fig. 4. As can be seen, CPA and least squares approximations have a similar oscillation pattern. Chebyshev polynomial approximation sufficiently attenuates ripples, compared with the other methods in the stopband, as shown in the differences of Fig. 4.

## V Applications

We compared our CPA-based singular value shrinkage method with the exact and approximate singular value shrinkage methods. Specifically, we applied our method to two applications using nuclear norm relaxation, i.e., inpainting of texture images and background subtraction of videos. Additionally, we compared our CPA-based method with the existing methods, i.e., the exact partial singular value decomposition (PSVD) based method and fast singular value shrinkage methods [25, 26, 27], in Section V-F. The computation time and approximation precision were indicated for the comparisons.

### V-a Experimental Conditions

The applications were implemented with MATLAB R2015b and run on a 3.2-GHz Intel Xeon E5-2667 processor with 512-GB RAM. We compared our method with the SVD-based naive method (denoted as SVD-based method) in (13) and EVD-based methods in (15) with respect to approximation precision and computation time. Both SVD and EVD-based methods are exact singular value soft-shrinkage methods. The EVD-based method222The EVD-based method could lead to loss of computational precision compared with the SVD-based one. Though the errors may affect the performance of applications, we did not encounter such a problem in the experiments described in this paper. is usually faster than the SVD-based method and is widely used in many applications. Therefore, the computation time of only the EVD-based method is indicated for the results of the exact methods. With the SVD-based method, the SVD of an arbitrary matrix is first performed, then the obtained singular values are shrunk as , where indicates the -th largest singular value of . The EVD-based method uses the relation between singular value shrinkage and eigenvalue shrinkage: the EVD of is first computed, then the obtained eigenvalues are shrunk as , where denotes the -th largest eigenvalue of , to derive singular value shrinkage. Also, the shrinkage function in (14) is defined as the soft-shrinkage case given by from (26) for our method. The DWT [45] was used in (22) to sparsify the signals. We used Haar wavelet transform as the DWT. In the DWT, one level transform was performed and all high frequency components were set to . The selection of a transform method naturally affects the computation time of our method. Therefore, we indicate the effect of the selection in Section V-E. To indicate the approximation precision, root mean squared error (RMSE) was used, which was computed using the results of our method and those of the SVD/EVD-based methods. Furthermore, the computation times of all the methods are shown, and the average computation times of the CPA-based/exact singular value shrinkage in each iteration are also indicated. In all applications, we used the 5th, 10th, 15th, and 20th-order approximations. We also used the following optimization tools to solve the above applications.

### V-B Optimization Tools

#### V-B1 Proximity Operator

Let be the set of all proper lower semicontinuous convex functions333A function is called proper lower semicontinuous convex if , is closed in , and in and , respectively. over . The proximity operator [54] of a function of index is defined as

 proxγf:RN→RN:x↦\operatornamewithlimitsarg miny∈RNf(y)+12γ∥x−y∥2. (29)

The proximity operator plays a central role in the optimization of applications, as discussed in this section. When function is defined as the nuclear norm, i.e., , the proximity operator can be calculated by singular value shrinkage with the thresholding parameter [2]. Therefore, our CPA-based method is applied to the operator in the case of the nuclear norm.

#### V-B2 Alternating Direction Method of Multipliers

The ADMM [41] is an algorithm for solving a convex optimization problem represented as

 minx∈Rn1,z∈Rn2f(x)+g(z)s.t.z=Kx, (30)

where , and . For arbitrary , , and , the ADMM algorithm is given by

 ⎢⎢ ⎢ ⎢ ⎢ ⎢⎣xt+1:=\operatornamewithlimitsarg minxf(x)+ρ2∥zt−Kx−ut∥22zt+1:=prox1/ρg(Kxt+1+ut)ut+1:=ut+Kxt+1−zt+1. (31)

We recall a convergence analysis of the ADMM by Eskstein and Bertsekas [41].

###### Fact 1 (Convergence of the ADMM [41])

Consider Prob. (30). Assume that is invertible and that a saddle point of its unaugmented Lagrangian exists, where . Then the sequence generated using (31) converges to a solution of Prob. (30).

We used the ADMM algorithm to practically solve the following applications. In all applications, the stopping criterion444For example, in (39), which is indicated in Appendix A-A, the criterion is evaluated using . in the ADMM algorithm was set to .

### V-C Texture Image Inpainting [12, 13]

The objective with this application is to recover a missing region (as shown in the later Fig. 6(b)).

Let and be a texture image and a given image with missing regions, respectively. Then, let and be observed and missing regions and and be linear operators extracting pixels in their regions. From the notations, the missing region is represented as . The pixels surrounding with the size of five pixels, as shown in Fig. 5, are defined as . Let and be the DCT matrices in the horizontal and vertical matrix directions, i.e., these matrices transform an image to its frequency domain. Since a regular texture image is basically sparse in its frequency domain, it can be represented as , where is the coefficients on the frequency domain of . Additionally, the set of a normalized dynamic range constraint is defined as . When and are assumed to be low rank and sparse, the reconstruction problem can approximately be solved using the nuclear norm555The nuclear norm of is defined as , where . and the norm as

 minL,S,vec(L)∈D ∥L∥∗+η∥S∥1 (32) s.t.PΩ(I)=PΩ(L),L=T1ST⊤2,ave(vec(M))=ave(vec(M∂)),

where a positive real value is a regularization parameter, calculates the arithmetic average, and is the operator vectorizing a matrix. The average pixel value on the recovered region is assumed to be identical to that of its surrounding pixel values. As can be seen, (32) is composed of nuclear norm relaxation so that our method can be used for its efficient calculation. Hereafter, we discuss the validation of our method by applying the ADMM to (32) to obtain the optimal solution. In Appendix A-A, (32) is converted to the form to which the ADMM is applicable.

Eight-bit color images Bricks and Office widows666The images are available at http://www.mayang.com/textures/., as shown in Figs. 6(a) and (f), were used for the application, where the size of each color component of the images was 1920 2560. The pixel values of each color component of the images were in the range from 0 to 1. In the application, each color component was inpainted separately. The observed image with the missing regions were defined, as shown in Fig. 6(b) and (g)777In the experiment, the images are transposed to “portrait”.. The number of missing pixels in Fig. 6(b) was and that in Fig. 6(g) was . In the ADMM applicable form (see Appendix A-A), the column vectors and were initialized by all-one vectors. Additionally, for and , the thresholding parameters were set to in Bricks and in Office windows, where the parameters were determined for the fast and stable convergence of the optimization. For fast computation, parallel processing888The MATLAB function parfor, which is contained in the parallel computing toolbox, was used for the parallel computing only in the image inpainting method. was performed in the application of the color components.

Figure 6 shows the results of image inpainting with the 20th-order approximation. The resulting image recovered using our method was practically equivalent to that with the SVD-based method by comparing Figs. 6(c), (d), (h), and (i). In Figs. 6(e) and (j), it is clear that the exact and approximated solutions had little differences, which visually indicates the high approximation precision of our method.

Table I lists the computation time and RMSE comparisons. Our method was faster than the EVD-based method while maintaining reconstruction performance. Regarding the total computation times of Bricks and Office windows, our method with the 5th-order and 10th-order approximations was slower than with the 15th-order approximations. This is because our method with the low-order approximations did not converge well due to the low approximate precision.

### V-D Background Modeling of Video[14, 15, 16, 17, 18]

The objective with this application is to divide a video sequence into background and object sequences (as shown in Fig. 7(a) and (b)).

Let be the -th frame of a video sequence. The sequence is rearranged into a matrix as

 I:=[vec(I(1)) vec(I(2)) … vec(I(K))]. (33)

Then, let and be the background sequence and sequence of moving objects of a video. In , pixel values corresponding to are zero and vice versa. The background and moving objects can be assumed to be low rank and sparse; hence, the background modeling is solved as the following convex optimization problem:

 minL,S ∥L∥∗+η∥S∥1s.t.I=L+S. (34)

Problem (34) is also solved using the ADMM.

In the background modeling, an eight-bit grayscale video Laboratory999This video was recorded with our video camera. It was downsampled and transformed into grayscale for the experiment. was used. The was the -th frame of the video in ; hence, the matrix of the sequence was . The pixel values of the video were in the range from 0 to 1. In the ADMM applicable form (see Appendix A-B), the column vectors , , and were initialized by all-one vectors.

Additionally, the thresholding parameters were set to , where the parameters were determined for the fast and stable convergence of the optimization. For comparison, low rank and sparse components in the 160th frame are shown in Fig. 7 with the 20th-order approximation.

Our method effectively decomposed the video sequences to low rank and sparse sequences, as shown in Figs. 7(a) and (b). They are almost equivalent to those with the EVD-based method; hence, the difference between the low rank images are not displayed even though the difference is amplified (bottom left of Fig. 7).

Table II summarizes the computation times and RMSEs between the background modeling of our method and that of the EVD-based method101010The SVD-based method ran out of memory in our machine so that the results of the EVD-based method were used for the RMSEs.. Our method was sufficiently faster than the EVD-based method in all approximation orders.

This is because the reduction in computational complexity by thresholding the transformed coefficients described in (21) is effective for our method to have low computational complexity while retaining high approximation precision. However, our method with the 5th-order approximations took more time than that with 10th-order approximation because it did not converge well due to its low approximate precision.

### V-E Effects of Selections: Transform Matrix and Thresholding Value

The selections of a transform matrix in (22) and a thresholding value in (20) affect computation time and size of approximation error. In this subsection, we indicate these effects experimentally by using the image inpainting method for Bricks. In all experiments, the 15th-order approximation was used for our method.

#### V-E1 Effects of Selected Transform Matrix

We compared the DWT with the DCT and the block diagonal forms of the DCT (block DCT) [32] whose block size was for indicating the differences among chosen transform matrices in (22). The threshold values in (20) were fixed to for the DCTs. The other experimental conditions were the same as those discussed in Section V-C.

The results of the proposed method in Table III show the performance comparisons. Our method with the DWT is as fast as that with the block DCT in the total computation time, though the singular value shrinkage with our method with the DWT takes more time than the others. This is because the maximum iteration of our method with the DWT is only 83, whereas those of our method with the block DCT and DCT are 101 and 97, respectively. Therefore, the DWT leads our method to be stable convergence. However, our method with the DWT indicates a higher RMSE than the others since all the high frequency components were removed in it. The fact is not fatal problem because the images derived by using our method with the DWT were very similar to that of the exact one as shown in the previous section. Additionally, the block DCT can substantially sparsify an image compared to the DWT and DCT. Therefore, our method with the block DCT is faster than the others. In spite of fast computation, the results with the block DCT shows an RMSE as low as that of the DCT.

#### V-E2 Effects of Thresholding Value

When is an excessively large value, our method becomes fast but matrix rank minimization cannot converge well due to the errors w.r.t. the reduction in the number of components. For achieving the fast computation and stable convergence, we present a recommended guideline on the thresholding values.

Let be an error at the -th iteration of the ADMM, i.e., from Appendix A-A and be a function that returns the -th largest element in a matrix , where is used as the threshold. Basically, when is a large value, a small does not have any problem to decrease the error. In contrast, when is a small value, i.e., the optimization almost converges, should be large for a stable convergence. For this purpose, we recommend the thresholding percentage for as

 ε(Et)=⎧⎪ ⎪⎨⎪ ⎪⎩K(|Φ––|,(0.5×10−4)n2)if Et>Eℓ,K(|Φ––|,(1.5×10−4)n2)if Eℓ≥Et>Em,K(|Φ––|,(0.5×10−3)n2)otherwise, (35)

where and . This was determined experimentally. Recall that the size of is . To verify (35), it was compared with fixed threshold. Four values , , , and were used for this comparison, where means that all components of a matrix are retained. Additionally, the DCT was exploited for the sparsifying matrix.

The results are listed in Table IV. The adaptive method was faster than the fixed method. Our method with was the fastest but it did not converge well for the large size of approximation errors, where is considered as the limitation of the thresholding percentage in our method. In addition, the singular value shrinkage of our CPA-based method by thresholding the matrix components was about five times faster than that by maintaining those.

### V-F Comparison with Existing Methods

As previously mentioned, there are several fast singular value shrinkage methods [25, 26, 27]. To illustrate the advantage of our method, we compared it with the singular value shrinkage by using the exact PSVD, fast randomized singular value shrinkage (FRSVS) [25], singular value shrinkage by using the Nyström method (NSVS) [26], and the fast singular value shrinkage without the exact SVD (FSVS) [27]. The experiments were conducted using the image inpainting method for Bricks and a synthetic data. The synthetic data is constructed as a block diagonal matrix whose number of the main diagonal blocks is equal to its rank. Let be the matrix form of the synthetic data with rank , and this is defined as , where , and . In the experiment, 10, 100, 200, and 500 were used, and of elements in were randomly replaced with zero, where . To restore the corrupted data, the optimization problem in (32) was solved without using the average term, i.e., , because the matrices and can hardly be defined for the random corruption. The function svdsechon11 was used for carrying out the exact PSVD, and the 500 largest singular values were calculated for Bricks. All preferences of the FRSVS [25] was determined in the original code121212Available at http://thohkaistackr.wixsite.com/page/projectfrsvt provided by the authors. In the FRSVS, the 550 largest singular values were approximately derived for Bricks, whose number of singular values was experimentally determined for carrying out precise and fast singular value shrinkage. In addition, the 200 largest singular values were derived for the synthetic data in the exact PSVD and the FRSVS. The partial singular values derived in the exact PSVD and the FRSVS were soft-thresholded: The th partial singular value is shrunk to . Since the Nyström method requires a square matrix to derive partial eigenvalues, it was applied to , as in the EVD-based method. The 500 and 200 largest eigenvalues were calculated for the Bricks and the synthetic data, respectively. Those eigenvalues were then shrunk in the same way as with the EVD-based method, whose number of calculated eigenvalues was experimentally determined from the same reason as the FRSVS. All preferences used in the FSVS were directly used as suggested in [27].

The results for Bricks are indicated in Table III and Fig. 8. Note that the experiments of the existing methods were stopped at the 80th iterations since these methods did not converge. The concept of the FSVS is similar to our method, but it requires longer computation time as shown in Table III and does not converge. For the FRSVS and NSVS, although their average computation times are slightly less than ours, they result in much larger errors. This would be because many singular values or eigenvalues above the threshold were reduced to zero, so that the exact PSVD, FRSVS, and NSVS produced the large errors in each iteration leading to unstable convergence. In contrast, our method is stable and does not affect the convergence of the optimization method because the CPA-based method can shrink the entire singular values.

Figure 9 shows the comparison of errors in the case of the synthetic data in several conditions according to the reduction rate of data elements and the matrix rank. The optimization methods using the existing methods do not converge well when the matrix rank is 500. This is because many singular values above

are discarded, i.e., enormous computation errors are produced in each iteration. From the results, the matrix rank of target data should be estimated beforehand, and then the numbers of partial singular values and vectors should be estimated to be larger than the matrix rank, in order to make the optimization method converged. In contrast, our method can lead the optimization method to stable convergence. It certainly generates some approximation errors, but it can process all singular values, which means that most singular values above

are remained.

## Vi Conclusion

We proposed a fast thresholding method of singular values without computing singular values and vectors. The key tool of the proposed method is CPA. From CPA characteristics, singular value shrinkage could be computed by a multiplication of matrices. The proposed method was further accelerated using the sparsity of a signal, where the frequency transform was used for obtaining sparse coefficients. Moreover, we studied the approximation order for reducing the size of approximation errors. The experimental results revealed that our method was much faster than the exact methods with high approximation precision in the case of a large data size. In addition, our method can lead the optimization method to be stable convergence in comparison of the existing fast singular value shrinkage methods because of its approximation precision.

## Appendix A ADMM Applicable Forms

### A-a Texture Image Inpainting

Let , , , and . The 2-D DCT matrix is represented as , and the matrix form of and are defined as and . In addition, the indicator functions of the sets