Efficient ADMM-based Algorithms for Convolutional Sparse Coding

by   Farshad G. Veshki, et al.

Convolutional sparse coding improves on the standard sparse approximation by incorporating a global shift-invariant model. The most efficient convolutional sparse coding methods are based on the alternating direction method of multipliers and the convolution theorem. The only major difference between these methods is how they approach a convolutional least-squares fitting subproblem. This letter presents a solution to this subproblem, which improves the efficiency of the state-of-the-art algorithms. We also use the same approach for developing an efficient convolutional dictionary learning method. Furthermore, we propose a novel algorithm for convolutional sparse coding with a constraint on the approximation error.








I Introduction

Sparse representations are widely used in various applications of signal and image processing [face_rec2009, sig_rec2007, im_den20016, im_sr2010, hyper2012, 5G2015, my_mffusion, my_mmfusion]. The sparse synthesis model assumes that natural signals can be approximated using a linear combination of only a small number of atoms (columns) of a dictionary (matrix). A common formulation of the sparse coding problem is given as

$$\min_{x} \; f(x) \quad \text{subject to} \quad \|Dx - s\|_2^2 \leq \epsilon, \tag{1}$$

where $D \in \mathbb{R}^{N \times M}$ is the dictionary, $x \in \mathbb{R}^{M}$ is the vector of sparse coefficients, and $s \in \mathbb{R}^{N}$ is the signal. Moreover, $\epsilon$ is the upper bound on the energy of the approximation error, and $f(\cdot)$ represents a function that measures the level of sparsity of a vector, for example, the number of nonzero elements (denoted by $\|\cdot\|_0$) or its convex relaxation, the $\ell_1$ norm (denoted by $\|\cdot\|_1$). The problem of finding sparsity-promoting dictionaries is called dictionary learning [MOD1999, KSVD2006].

The applications of sparse representations and dictionary learning usually involve extraction or estimation of local features, or both. Typically, this is handled by a prior decomposition of the original signal into vectorized overlapping blocks (e.g., patches in image processing). As a drawback, this strategy results in multi-valued representations, so that each point in the signal is estimated multiple times. Moreover, since the relationships among neighboring blocks are ignored, dictionaries learned using this approach tend to contain shifted versions of the same features.

Convolutional sparse coding (CSC) incorporates a single-valued and shift-invariant model that represents the entire signal. In this model, the matrix-vector product in the standard sparse coding problem is replaced by a sum of convolutions. The convolutional form of the standard sparse coding problem (1) can be written as follows

$$\min_{\{x_k\}} \; \sum_{k=1}^{K} f(x_k) \quad \text{subject to} \quad \Big\|\sum_{k=1}^{K} d_k \ast x_k - s\Big\|_2^2 \leq \epsilon, \tag{2}$$

where $\ast$ denotes the convolution operator (usually with "same" padding), and $x_k \in \mathbb{R}^{N}$ and $d_k \in \mathbb{R}^{m}$, $k = 1, \dots, K$, are the sparse coefficient maps and the dictionary filters, respectively. Several applications have shown that the CSC model performs better in handling natural signals, such as audio and images, in comparison with its standard version [piano2017, music2016, CT2019, fusion2016, RGB2018, superres2015, rain2018, astro2021].
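The sum-of-convolutions model is what makes Fourier-domain solvers natural: by the convolution theorem, a sum of circular convolutions becomes a sum of element-wise products of spectra. A minimal NumPy check of this equivalence, using toy random data (all names and sizes here are ours, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, m = 64, 4, 8                     # signal length, number of filters, filter length

d = rng.standard_normal((K, m))        # dictionary filters
x = rng.standard_normal((K, N))        # coefficient maps (dense here, just for the check)

def circ_conv(a, b):
    """Circular convolution of two length-n sequences, computed directly (O(n^2))."""
    n = len(a)
    return np.array([sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)])

# Direct evaluation of sum_k d_k (*) x_k (filters zero-padded to length N)
d_pad = np.pad(d, ((0, 0), (0, N - m)))
direct = sum(circ_conv(d_pad[k], x[k]) for k in range(K))

# Convolution theorem: one element-wise spectral product per filter
fourier = np.real(np.fft.ifft(np.sum(np.fft.fft(d_pad, axis=1) * np.fft.fft(x, axis=1), axis=0)))

assert np.allclose(direct, fourier)
```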

A majority of available CSC algorithms, including [Bristow2013, Heide2015, Wohlberg2016, Peng2019, Choudhury2017, Papyan2017, Otero2020, Papyan20172, Wang2018], are based on the alternating direction method of multipliers (ADMM) framework [ADMM2011]. ADMM breaks the CSC problem into two main subproblems, one of which is a sparse approximation problem that is efficiently addressed using hard-thresholding (when $f = \|\cdot\|_0$) or a shrinkage operator (when $f = \|\cdot\|_1$), and the other entails a convolutional least-squares regression. An efficient solution to the second subproblem based on the convolution theorem and the Sherman-Morrison formula is given in [Wohlberg2016]. CSC problem (2) is typically addressed by solving its unconstrained equivalent, which is written as

$$\min_{\{x_k\}} \; \frac{1}{2}\Big\|\sum_{k=1}^{K} d_k \ast x_k - s\Big\|_2^2 + \lambda \sum_{k=1}^{K} f(x_k), \tag{3}$$

where $\lambda$ is a Lagrange multiplier. It is known that there is a unique $\lambda$ for each $\epsilon$. However, the appropriate value of $\lambda$ also depends on $s$ and the dictionary. Thus, despite being more convenient to solve, the unconstrained reformulation introduces data dependency to the CSC algorithm.

A common approach for convolutional dictionary learning (CDL) entails optimizing the filters and the sparse coefficient maps using a batch of training signals [Heide2015, Choudhury2017, Wohlberg2016, Peng2019]. This problem can be formulated as follows

$$\min_{\{d_k\},\{x_{k,j}\}} \; \frac{1}{2}\sum_{j=1}^{J}\Big\|\sum_{k=1}^{K} d_k \ast x_{k,j} - s_j\Big\|_2^2 + \lambda \sum_{j=1}^{J}\sum_{k=1}^{K} \|x_{k,j}\|_1 \quad \text{subject to} \quad \|d_k\|_2 = 1 \;\; \forall k, \tag{4}$$

where $x_{k,j}$ denotes the coefficient map of filter $d_k$ for the $j$-th training signal $s_j$. The CDL problem is usually addressed by alternating optimization with respect to $\{x_{k,j}\}$ and $\{d_k\}$ [Bristow2013, Heide2015, Wohlberg2016]. Several works have shown that solving (4) with respect to $\{d_k\}$ can also be done effectively and efficiently using ADMM in the frequency domain.
This paper presents a direct method for solving the convolutional least-squares regression, which yields a constant-factor improvement in the complexity of the available CSC algorithms. The same method can be used to improve the efficiency of existing CDL methods. Additionally, we propose an efficient CSC algorithm with a constraint on the energy of the approximation residuals, built on our solution to the unconstrained CSC problem. MATLAB implementations of the proposed algorithms are available at the GitHub repository [GithubRep].

Throughout the paper, we use $(\cdot)^T$ to denote the (non-conjugate) transpose operator, $(\cdot)^*$ represents the complex conjugate of a complex number, $\hat{(\cdot)}$ denotes the discrete Fourier transform of a signal, and $(\cdot)^\star$ denotes the solution to an optimization problem. Moreover, we use $\odot$ and $\oslash$ to denote the element-wise multiplication and element-wise division operators, respectively.

II Proposed Algorithms

II-A Unconstrained CSC

In this work, we consider the convex formulation of the CSC problem, i.e., we use $f = \|\cdot\|_1$. Using variable splitting, problem (3) in ADMM form can be reformulated as [ADMM2011]

$$\min_{\{x_k\},\{y_k\}} \; \frac{1}{2}\Big\|\sum_{k=1}^{K} d_k \ast x_k - s\Big\|_2^2 + \lambda \sum_{k=1}^{K} \|y_k\|_1 \quad \text{subject to} \quad x_k = y_k \;\; \forall k. \tag{5}$$
The augmented Lagrangian corresponding to (5) is written as

$$L_\rho = \frac{1}{2}\Big\|\sum_{k} d_k \ast x_k - s\Big\|_2^2 + \lambda \sum_{k} \|y_k\|_1 + \sum_{k} \gamma_k^T (x_k - y_k) + \frac{\rho}{2}\sum_{k}\|x_k - y_k\|_2^2, \tag{6}$$

where $\rho > 0$ is the penalty parameter and $\gamma_k$ are Lagrangian multipliers. Defining $u_k = \gamma_k / \rho$, the scaled-form ADMM iterations are expressed as

$$\begin{aligned} \{x_k\}^{(i+1)} &= \operatorname*{argmin}_{\{x_k\}} \; \frac{1}{2}\Big\|\sum_{k} d_k \ast x_k - s\Big\|_2^2 + \frac{\rho}{2}\sum_{k}\big\|x_k - y_k^{(i)} + u_k^{(i)}\big\|_2^2, \\ \{y_k\}^{(i+1)} &= \operatorname*{argmin}_{\{y_k\}} \; \lambda \sum_{k} \|y_k\|_1 + \frac{\rho}{2}\sum_{k}\big\|x_k^{(i+1)} - y_k + u_k^{(i)}\big\|_2^2, \\ u_k^{(i+1)} &= u_k^{(i)} + x_k^{(i+1)} - y_k^{(i+1)}. \end{aligned} \tag{7}$$
The second subproblem ($y$-update step) can be addressed in an element-wise manner using a shrinkage (soft-thresholding) operator. The solution is written as

$$y_k^{(i+1)} = \mathcal{S}_{\lambda/\rho}\big(x_k^{(i+1)} + u_k^{(i)}\big), \tag{8}$$

with the shrinkage operator defined as follows

$$\mathcal{S}_\beta(v) = \operatorname{sign}(v) \odot \max(|v| - \beta, \, 0). \tag{9}$$
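The shrinkage operator is a one-liner in practice; a NumPy sketch (the function name is ours):

```python
import numpy as np

def shrink(v, beta):
    """Element-wise soft-thresholding: sign(v) * max(|v| - beta, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - beta, 0.0)

v = np.array([-2.0, -0.3, 0.0, 0.5, 1.5])
shrink(v, 0.5)  # entries with magnitude <= 0.5 are zeroed, the rest move toward 0 by 0.5
```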
The only challenging step is solving the first subproblem ($x$-update step). In a general form, this step entails solving the optimization problem

$$\min_{\{x_k\}} \; \frac{1}{2}\Big\|\sum_{k=1}^{K} d_k \ast x_k - s\Big\|_2^2 + \frac{\rho}{2}\sum_{k=1}^{K}\|x_k - z_k\|_2^2, \tag{10}$$

where $z_k = y_k^{(i)} - u_k^{(i)}$.
Using the convolution theorem, problem (10) in the Fourier domain can be written as

$$\min_{\{\hat{x}_k\}} \; \frac{1}{2}\Big\|\sum_{k=1}^{K} \hat{d}_k \odot \hat{x}_k - \hat{s}\Big\|_2^2 + \frac{\rho}{2}\sum_{k=1}^{K}\|\hat{x}_k - \hat{z}_k\|_2^2. \tag{11}$$

Note that the filters are zero-padded to the size of $s$ before performing the discrete Fourier transform. Denoting

$$\hat{x}_n = [\hat{x}_{1,n}, \dots, \hat{x}_{K,n}]^T, \quad \hat{d}_n = [\hat{d}_{1,n}, \dots, \hat{d}_{K,n}]^T, \quad \hat{z}_n = [\hat{z}_{1,n}, \dots, \hat{z}_{K,n}]^T, \tag{12}$$

with $n = 1, \dots, N$ indexing the frequency bins, problem (11) can be addressed as $N$ independent problems:

$$\min_{\hat{x}_n} \; \frac{1}{2}\big|\hat{d}_n^T \hat{x}_n - \hat{s}_n\big|^2 + \frac{\rho}{2}\|\hat{x}_n - \hat{z}_n\|_2^2. \tag{13}$$
Equating the derivative with respect to $\hat{x}_n$ to zero, we have

$$\begin{aligned} \hat{d}_n^* \big(\hat{d}_n^T \hat{x}_n - \hat{s}_n\big) + \rho\big(\hat{x}_n - \hat{z}_n\big) &= 0, \\ \big(\rho I + \hat{d}_n^* \hat{d}_n^T\big)\hat{x}_n &= \rho\hat{z}_n + \hat{d}_n^* \hat{s}_n, \end{aligned} \tag{14}$$

which gives

$$\hat{x}_n^\star = \hat{z}_n + \hat{d}_n^* \, \frac{\hat{s}_n - \hat{d}_n^T \hat{z}_n}{\rho + \hat{d}_n^T \hat{d}_n^*}. \tag{15}$$

Defining

$$\hat{r} = \hat{s} - \sum_{k=1}^{K} \hat{d}_k \odot \hat{z}_k, \qquad \hat{\sigma} = \rho\mathbf{1} + \sum_{k=1}^{K} \hat{d}_k \odot \hat{d}_k^*, \tag{16}$$

the solution to the $x$-update step based on (15) can be written as

$$\hat{x}_k^\star = \hat{z}_k + \hat{d}_k^* \odot \hat{r} \oslash \hat{\sigma}, \quad k = 1, \dots, K. \tag{17}$$
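The per-bin closed form can be sanity-checked numerically against a direct solve of the rank-one-perturbed system: with the reconstruction used here, the two must agree. This is a toy verification with random complex data, not the reference implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
K, rho = 6, 0.7
d_hat = rng.standard_normal(K) + 1j * rng.standard_normal(K)   # filter spectra at one frequency bin
z_hat = rng.standard_normal(K) + 1j * rng.standard_normal(K)   # target spectra (y - u) at that bin
s_hat = complex(rng.standard_normal(), rng.standard_normal())  # signal spectrum at that bin

# Closed form: x = z + conj(d) * (s - d^T z) / (rho + ||d||^2), no matrix inversion
x_closed = z_hat + np.conj(d_hat) * (s_hat - d_hat @ z_hat) / (rho + np.sum(np.abs(d_hat) ** 2))

# Reference: solve the K x K system (rho*I + conj(d) d^T) x = rho*z + conj(d)*s directly
A = rho * np.eye(K, dtype=complex) + np.outer(np.conj(d_hat), d_hat)
b = rho * z_hat + np.conj(d_hat) * s_hat
x_ref = np.linalg.solve(A, b)

assert np.allclose(x_closed, x_ref)
```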
Computational Complexity

The available ADMM-based CSC algorithms usually address the $x$-update step by computing

$$\hat{x}_n^\star = \big(\rho I + \hat{d}_n^* \hat{d}_n^T\big)^{-1}\big(\rho\hat{z}_n + \hat{d}_n^* \hat{s}_n\big), \tag{18}$$

which can be inferred from the second line of (14). Solving problem (18) using direct matrix inversion results in a time complexity of $O(K^3 N)$ [Bristow2013]. However, the work of [Wohlberg2016] demonstrated that this can be reduced to $O(KN)$ using the Sherman-Morrison formula. The time complexity of the proposed method is also of $O(KN)$. However, using further simplifications, the proposed approach eliminates the need for explicit matrix inversion and requires fewer computations. In particular, performing the $x$-update step on a batch of images using the proposed method requires fewer floating-point operations than the method of [Wohlberg2016], indicating a considerable improvement provided by our method.
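To make the pieces of Section II-A concrete, the full unconstrained CSC iteration (closed-form frequency-domain x-update, shrinkage y-update, dual update) can be sketched in a few lines of NumPy. This is our own illustrative sketch assuming circular convolution, not the paper's MATLAB implementation:

```python
import numpy as np

def csc_admm(s, d, lam, rho, n_iter=100):
    """Unconstrained CSC via ADMM with a direct frequency-domain x-update.
    s: signal of length N; d: filters of shape (K, m); circular convolution assumed."""
    N = s.size
    K, m = d.shape
    d_hat = np.fft.fft(np.pad(d, ((0, 0), (0, N - m))), axis=1)  # zero-padded filter spectra, (K, N)
    s_hat = np.fft.fft(s)
    sigma = rho + np.sum(np.abs(d_hat) ** 2, axis=0)             # shared denominator, shape (N,)
    y = np.zeros((K, N))
    u = np.zeros((K, N))
    for _ in range(n_iter):
        # x-update: closed-form solution of the convolutional least-squares fit
        z_hat = np.fft.fft(y - u, axis=1)
        r_hat = s_hat - np.sum(d_hat * z_hat, axis=0)
        x = np.real(np.fft.ifft(z_hat + np.conj(d_hat) * (r_hat / sigma), axis=1))
        # y-update: element-wise soft-thresholding
        v = x + u
        y = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
        # dual-variable update
        u += x - y
    return y
```

A quick way to exercise it is to run it on a random signal and check that the resulting objective value beats the trivial all-zeros solution.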

II-B Constrained CSC

The ADMM formulation of the constrained CSC problem (2) is given as

$$\min_{\{x_k\},\{y_k\}} \; \sum_{k=1}^{K} \|y_k\|_1 + \iota_{\mathcal{C}}\big(\{x_k\}\big) \quad \text{subject to} \quad x_k = y_k \;\; \forall k, \tag{19}$$

where $\iota_{\mathcal{C}}$ is an indicator function of the constraint set in (2), that is,

$$\iota_{\mathcal{C}}\big(\{x_k\}\big) = \begin{cases} 0 & \text{if } \big\|\sum_{k} d_k \ast x_k - s\big\|_2^2 \leq \epsilon, \\ +\infty & \text{otherwise.} \end{cases} \tag{20}$$
The ADMM iterations are

$$\{x_k\}^{(i+1)} = \operatorname*{argmin}_{\{x_k\}} \; \iota_{\mathcal{C}}\big(\{x_k\}\big) + \frac{\rho}{2}\sum_{k}\big\|x_k - y_k^{(i)} + u_k^{(i)}\big\|_2^2, \tag{21}$$

$$y_k^{(i+1)} = \mathcal{S}_{1/\rho}\big(x_k^{(i+1)} + u_k^{(i)}\big), \qquad u_k^{(i+1)} = u_k^{(i)} + x_k^{(i+1)} - y_k^{(i+1)}. \tag{22}$$
The $x$-update step requires solving the following optimization problem

$$\min_{\{x_k\}} \; \frac{1}{2}\sum_{k=1}^{K}\|x_k - z_k\|_2^2 \quad \text{subject to} \quad \Big\|\sum_{k=1}^{K} d_k \ast x_k - s\Big\|_2^2 \leq \epsilon, \tag{23}$$

where $z_k = y_k^{(i)} - u_k^{(i)}$. Depending on $\{z_k\}$, problem (23) either has a trivial solution or it is equivalent to an equality-constrained optimization problem. This can be expressed as

$$\{x_k\}^\star = \begin{cases} \{z_k\} & \text{if } \big\|\sum_{k} d_k \ast z_k - s\big\|_2^2 \leq \epsilon, \\ \operatorname*{argmin}_{\{x_k\}} \frac{1}{2}\sum_{k}\|x_k - z_k\|_2^2 \;\; \text{s.t.} \;\; \big\|\sum_{k} d_k \ast x_k - s\big\|_2^2 = \epsilon & \text{otherwise.} \end{cases} \tag{24}$$
Using a suitable Lagrange multiplier $\nu$, the problem in the second case of (24) can be reformulated as

$$\min_{\{x_k\}} \; \frac{1}{2}\Big\|\sum_{k=1}^{K} d_k \ast x_k - s\Big\|_2^2 + \frac{\nu}{2}\sum_{k=1}^{K}\|x_k - z_k\|_2^2, \tag{25}$$

which has the same form as problem (10). Finding the solution of (25) using (17) (with $\rho$ replaced by $\nu$) and plugging it into the constraint in (23) gives

$$\Big\|\sum_{k=1}^{K} d_k \ast x_k^\star - s\Big\|_2^2 = \frac{1}{N}\sum_{n=1}^{N} \frac{\nu^2\,\big|\hat{s}_n - \hat{d}_n^T \hat{z}_n\big|^2}{\big(\nu + \hat{d}_n^T \hat{d}_n^*\big)^2}, \tag{26}$$

where the division by $N$ is required by Parseval's theorem. Thus, problem (23) is simplified to a single-variable optimization problem of finding the optimal multiplier $\nu^\star$, which satisfies

$$\frac{1}{N}\sum_{n=1}^{N} \frac{(\nu^\star)^2\,\big|\hat{s}_n - \hat{d}_n^T \hat{z}_n\big|^2}{\big(\nu^\star + \hat{d}_n^T \hat{d}_n^*\big)^2} = \epsilon. \tag{27}$$

Considering that the left-hand side of (27) is monotonically increasing in $\nu^\star$, this problem can be efficiently addressed, for example, using the secant method. Once $\nu^\star$ is known, the $x$-update can be performed as

$$\hat{x}_k^\star = \hat{z}_k + \hat{d}_k^* \odot \hat{r} \oslash \hat{\sigma}, \tag{28}$$

where $\hat{r}$ and $\hat{\sigma}$ are calculated using (16) with $\rho = \nu^\star$.
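Because the left-hand side is a smooth monotone function of the multiplier, plain secant iterations suffice to locate the root. A generic sketch on a toy monotone function standing in for the residual-energy equation (the helper name and stopping rule are ours):

```python
def secant(g, v0, v1, tol=1e-12, max_iter=100):
    """Secant iterations for the scalar equation g(v) = 0, g assumed monotone."""
    g0, g1 = g(v0), g(v1)
    for _ in range(max_iter):
        if g1 == g0:  # flat secant; give up
            break
        v0, v1, g0 = v1, v1 - g1 * (v1 - v0) / (g1 - g0), g1
        g1 = g(v1)
        if abs(g1) < tol:
            break
    return v1

# Toy monotone example: find v > 0 with v**3 + v - 10 = 0 (the root is exactly 2)
root = secant(lambda v: v ** 3 + v - 10.0, 0.0, 5.0)
```

In the constrained CSC setting, `g` would evaluate the residual energy minus the error bound for a candidate multiplier, reusing the precomputed spectra across evaluations.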

II-C Dictionary Update

Addressing CDL optimization problem (4) over $\{d_k\}$ is equivalent to solving the following optimization problem

$$\min_{\{d_k\}} \; \frac{1}{2}\sum_{j=1}^{J}\Big\|\sum_{k=1}^{K} d_k \ast x_{k,j} - s_j\Big\|_2^2 + \iota_{\mathcal{D}}\big(\{d_k\}\big), \tag{29}$$

where $\iota_{\mathcal{D}}$ is an indicator function associated with the constraint set in (4). Problem (29) can be efficiently addressed using the consensus ADMM method [cardona2018]. The consensus ADMM formulation of problem (29) is given as

$$\min_{\{d_{k,j}\},\{g_k\}} \; \frac{1}{2}\sum_{j=1}^{J}\Big\|\sum_{k=1}^{K} d_{k,j} \ast x_{k,j} - s_j\Big\|_2^2 + \iota_{\mathcal{D}}\big(\{g_k\}\big) \quad \text{subject to} \quad d_{k,j} = g_k \;\; \forall k, j, \tag{30}$$

with the ADMM iterations

$$\begin{aligned} \{d_{k,j}\}^{(i+1)} &= \operatorname*{argmin}_{\{d_{k,j}\}} \; \frac{1}{2}\sum_{j}\Big\|\sum_{k} d_{k,j} \ast x_{k,j} - s_j\Big\|_2^2 + \frac{\rho}{2}\sum_{k,j}\big\|d_{k,j} - g_k^{(i)} + h_{k,j}^{(i)}\big\|_2^2, \\ \{g_k\}^{(i+1)} &= \operatorname*{argmin}_{\{g_k\}} \; \iota_{\mathcal{D}}\big(\{g_k\}\big) + \frac{\rho}{2}\sum_{k,j}\big\|d_{k,j}^{(i+1)} - g_k + h_{k,j}^{(i)}\big\|_2^2, \\ h_{k,j}^{(i+1)} &= h_{k,j}^{(i)} + d_{k,j}^{(i+1)} - g_k^{(i+1)}. \end{aligned} \tag{31}$$

The first subproblem ($d$-update) is similar to problem (10). Thus, it can be efficiently addressed using the approach proposed in Section II-A. The use of the Fourier domain-based approach requires the filters to be the same size as the signals. As a result, the filters are zero-padded to the size of $s_j$ to be conformable with the coefficient maps. The second subproblem ($g$-update) can be solved simply by projecting onto the constraint set, by mapping the entries outside the constraint support to zero before normalizing the $\ell_2$ norm.
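The projection step described above (zero the padded entries outside the filter support, then renormalize) might look like the following sketch; the function name, `support_len` parameter, and unconditional normalization are our assumptions about the constraint set:

```python
import numpy as np

def project_filter(g, support_len):
    """Zero the entries outside the filter support, then normalize to unit l2 norm."""
    d = np.zeros_like(g)
    d[:support_len] = g[:support_len]
    nrm = np.linalg.norm(d)
    return d / nrm if nrm > 0 else d

g = np.array([3.0, 4.0, 0.5, -0.2])  # zero-padded filter after the unconstrained update
d = project_filter(g, 2)             # -> [0.6, 0.8, 0.0, 0.0]
```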

II-D CDL Algorithm

CDL problem (4) is addressed by alternating between the CSC (see Section II-A) and dictionary update (see Section II-C) subproblems. We use a single iteration for each subproblem. This approach has been shown to be effective while simplifying the algorithm [Wohlberg2016, cardona2018]. We also use the variable coupling approach suggested in [cardona2017], which is shown to provide better numerical stability [Wohlberg2016, cardona2018]. Specifically, the sparse codes $\{y_k\}$ and the constrained filters $\{g_k\}$ are passed to the next subproblem.

III Experimental Results

In this section, we first compare the proposed unconstrained CSC algorithm with the state-of-the-art method, which uses the Sherman-Morrison formula in the convolutional fitting step (the SM method) [Wohlberg2016]. Then, we compare our unconstrained and constrained CSC methods in terms of convergence speed. Finally, we compare the proposed CDL algorithm with three available methods. All methods are based on the same alternating approach explained in Section II-D and use ADMM in both phases (CSC and dictionary update). All compared methods use the SM method in the CSC phase. The compared dictionary learning methods are based on the conjugate gradient method (CG) [Wohlberg2016], the iterative Sherman-Morrison method (ISM) [Wohlberg2016], and a method based on the consensus ADMM framework and the Sherman-Morrison formula (SM-cns) [cardona2018].

A greyscale Lena image is used in the CSC experiments. The CDL experiments are performed using a dataset of 20 images taken from the USC-SIPI database [SIPIdatabase]. All images in the dataset are converted to greyscale and resized to a common size. All methods are implemented in MATLAB. All experiments are conducted on a PC equipped with an Intel(R) Core(TM) i5-8365U 1.60GHz CPU.

III-A CSC Results

Fig. 1 shows the functional values over time for 25 iterations of the proposed unconstrained CSC method and the SM method using different values of $\lambda$ and $\rho$. We use a fixed number of iterations to display the difference in efficiency (the iterations of the two methods are equally effective). As can be seen, the proposed method is significantly more efficient in all cases. The algorithm complexities are compared in Section II-A.

Fig. 1: Functional values over time for the proposed unconstrained CSC method and the SM method using (a) different values of $\lambda$ for a fixed $\rho$, and (b) different values of $\rho$ for a fixed $\lambda$. The same dictionary of filters is used in both cases.

The proposed constrained and unconstrained CSC methods are compared in Fig. 2. Specifically, we executed the unconstrained CSC method with a fixed $\lambda$, and then used the observed quadratic functional value as the bound $\epsilon$ to run our constrained CSC method, while keeping the rest of the parameters unchanged. As can be seen, the quadratic and the $\ell_1$-norm functionals converge to the same values for both CSC methods. The constrained method results in a longer runtime, which accounts for the optimization with respect to $\nu$ in each iteration.

Fig. 2: The quadratic and $\ell_1$-norm functional values for the proposed unconstrained and constrained CSC methods. The same dictionary of filters is used for both.

III-B CDL Results

In Fig. 3, the functional values over time for 50 iterations of all CDL methods using different dataset sizes ($J$) are compared. The complexity of the ISM method grows rapidly with $J$, which makes it inefficient when $J$ is large. CG improves scalability, but slows down the convergence. The complexities of the proposed method and SM-cns are both linear in $J$, while their iterations are equally effective. However, as can be seen, the proposed method is substantially faster. This is achieved by using the method explained in Section II-A instead of the Sherman-Morrison formula, in both the $x$-update step (CSC phase) and the $d$-update step (dictionary update phase).

Fig. 3: Functional values over time for the compared CDL methods using different values of $J$.

In Fig. 4, the convergence speeds of the proposed CDL method and SM-cns using different dictionary sizes ($K$) are compared. The improved computational efficiency of the proposed method can be clearly observed.

Fig. 4: Functional values over time for the proposed CDL method and SM-cns using different values of $K$.

IV Conclusion

An efficient solution for the convolutional least-squares fitting problem has been presented. The proposed method has been used to substantially improve the efficiency of the state-of-the-art convolutional sparse coding and dictionary learning algorithms. In addition, a novel method for convolutional sparse approximation with a constraint on the approximation error has been proposed.