1 Introduction
Convolutional sparse coding (CSC) (Zeiler et al., 2010) has been successfully used in image processing (Gu et al., 2015; Heide et al., 2015), signal processing (Cogliati et al., 2016), and biomedical applications (Pachitariu et al., 2013; Andilla & Hamprecht, 2014; Chang et al., 2017; Jas et al., 2017; Peter et al., 2017). It is closely related to sparse coding (Aharon et al., 2006), but CSC is advantageous in that its shift-invariant dictionary can capture shifted local patterns common in signals and images. Each data sample is then represented as the sum of a set of filters from the dictionary convolved with the corresponding codes.
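For intuition, the CSC representation can be sketched in a few lines of NumPy (a minimal 1-D illustration; the sizes, sparsity level and random data are arbitrary choices, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
P, K, M = 64, 4, 7  # signal length, number of filters, filter length

# A dictionary of K filters and, per sample, K mostly-zero code vectors.
D = rng.standard_normal((M, K))
Z = rng.standard_normal((P, K)) * (rng.random((P, K)) < 0.1)

# CSC represents the sample as the sum of filter-code convolutions.
x_hat = sum(np.convolve(Z[:, k], D[:, k], mode="same") for k in range(K))
assert x_hat.shape == (P,)
```

Because the codes are sparse, each sample is explained by a few shifted copies of the shared filters.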
Traditional CSC algorithms operate in the batch mode (Kavukcuoglu et al., 2010; Zeiler et al., 2010; Bristow et al., 2013; Heide et al., 2015; Šorel & Šroubek, 2016; Wohlberg, 2016; Papyan et al., 2017), and their time and space requirements grow with the number of samples $N$, the number of filters $K$, and the sample dimensionality $P$. Recently, a number of online CSC algorithms have been proposed for better scalability (Degraux et al., 2017; Liu et al., 2017; Wang et al., 2018). As data samples arrive, relevant information is compressed into small history statistics, and the model is incrementally updated. In particular, the state-of-the-art OCSC algorithm (Wang et al., 2018) has much smaller time and space complexities, both independent of $N$.
However, the complexities of OCSC still depend quadratically on $K$, so it cannot be used with a large number of filters. The number of local patterns that can be captured is thus limited, which may lead to inferior performance, especially on higher-dimensional data sets. Besides, the use of more filters also leads to a larger number of expensive convolution operations. Rigamonti et al. (2013) and Sironi et al. (2015) proposed to post-process the learned filters by approximating them with separable filters, making the convolutions less expensive. However, as learning and post-processing are then two independent procedures, the resultant filters may not be optimal. Moreover, these separable filters cannot be updated online with the arrival of new samples.
Another direction to scale up CSC is via distributed computation (Bertsekas & Tsitsiklis, 1997). By distributing the data and workload over multiple machines, the recent consensus CSC algorithm (Choudhury et al., 2017) can handle large, higher-dimensional data sets such as videos, multispectral images and light field images. However, the heavy computational demands of the CSC problem are only shared over the computing platform, not fundamentally reduced.
In this paper, we propose to approximate the possibly large number of filters by sample-dependent combinations of a small set of base filters learned from the data. While the standard CSC dictionary is shared by all samples, we propose that each sample have its own "personal" dictionary to compensate for the reduced flexibility of using these base filters. In this way, the representation power remains the same but with a reduced number of parameters. Computationally, this structure also allows efficient online learning algorithms to be developed. Specifically, the base filters can be updated by the alternating direction method of multipliers (ADMM) (Boyd et al., 2011), while the codes and combination weights can be learned by accelerated proximal algorithms (Yao et al., 2017). Extensive experimental results on a variety of data sets show that the proposed algorithm is more efficient in both time and space, and outperforms existing batch, online and distributed CSC algorithms.
The rest of the paper is organized as follows. Section 2 briefly reviews convolutional sparse coding. Section 3 describes the proposed algorithm. Experimental results are presented in Section 4, and the last section gives some concluding remarks.
Notations
For a vector $a$, its $i$th element is $a_i$, its $\ell_1$-norm is $\|a\|_1 = \sum_i |a_i|$, and its $\ell_2$-norm is $\|a\|_2 = \sqrt{\sum_i a_i^2}$. The convolution of two vectors $a$ and $b$ is denoted $a * b$. For a matrix $A$, $\bar{A}$ is its complex conjugate, and $A^H$ its conjugate transpose. The Hadamard product of two matrices $A$ and $B$ is $A \odot B$. The identity matrix is denoted $I$. $\mathcal{F}(\cdot)$ is the Fourier transform that maps a vector from the spatial domain to the frequency domain, while $\mathcal{F}^{-1}(\cdot)$ is the inverse operator that maps it back.

2 Review: Convolutional Sparse Coding
Given $N$ samples $\{x_i\}_{i=1}^N$ in $\mathbb{R}^P$, CSC learns a shift-invariant dictionary $D$, whose $K$ columns $\{d_k\}$ are the filters. Each sample $x_i$ is encoded as $Z_i$, with the $k$th column $z_{ik}$ being the code convolved with filter $d_k$. The dictionary and codes are learned together by solving the optimization problem:
$$\min_{D, \{Z_i\}}\; \frac{1}{2}\sum_{i=1}^N \Big\| x_i - \sum_{k=1}^K d_k * z_{ik} \Big\|_2^2 + \beta \sum_{i=1}^N \sum_{k=1}^K \| z_{ik} \|_1 \quad \text{s.t.}\ \| d_k \|_2^2 \le 1,\ k = 1, \dots, K, \tag{1}$$
where the first term measures the signal reconstruction error, the constraint ensures that the filters are normalized, and $\beta$ is a regularization parameter controlling the sparsity of the $z_{ik}$'s.
Convolution in (1) is performed in the spatial domain. This takes $O(P^2)$ time per convolution and is expensive. In contrast, recent CSC methods perform convolution in the frequency domain, which takes only $O(P \log P)$ time (Mallat, 1999) and is faster for typical choices of $K$ and $P$. Let $\tilde{x}_i$, $\tilde{z}_{ik}$ and $\tilde{d}_k$ be the Fourier-transformed counterparts of $x_i$, $z_{ik}$ and $d_k$. The codes and filters are updated in an alternating manner by block coordinate descent, as:
Update Codes: Given the $\tilde{d}_k$'s, each $Z_i$ is independently obtained as
$$\min_{\{\tilde{z}_{ik}\}, \{u_{ik}\}}\; \frac{1}{2}\Big\| \tilde{x}_i - \sum_{k=1}^K \tilde{d}_k \odot \tilde{z}_{ik} \Big\|_2^2 + \beta \sum_{k=1}^K \| u_{ik} \|_1 \quad \text{s.t.}\ u_{ik} = \mathcal{F}^{-1}(\tilde{z}_{ik}),\ k = 1, \dots, K, \tag{2}$$
where $u_{ik}$ is an auxiliary variable.
Update Dictionary: $D$ is updated by solving
$$\min_{\{\tilde{d}_k\}, \{v_k\}}\; \frac{1}{2}\sum_{i=1}^N \Big\| \tilde{x}_i - \sum_{k=1}^K \tilde{d}_k \odot \tilde{z}_{ik} \Big\|_2^2 \quad \text{s.t.}\ \| v_k \|_2^2 \le 1,\ v_k = C\, \mathcal{F}^{-1}(\tilde{d}_k),\ k = 1, \dots, K, \tag{3}$$
where $v_k$ is an auxiliary variable, and $C$ crops the extra dimensions in $\mathcal{F}^{-1}(\tilde{d}_k)$.
Both the code and dictionary update subproblems can be solved by the alternating direction method of multipliers (ADMM) (Boyd et al., 2011). Subsequently, $\tilde{z}_{ik}$ and $\tilde{d}_k$ can be transformed back to the spatial domain as $z_{ik} = \mathcal{F}^{-1}(\tilde{z}_{ik})$, and $d_k$ is recovered by cropping $\mathcal{F}^{-1}(\tilde{d}_k)$. Note that while the $z_{ik}$'s (in the spatial domain) are sparse, the FFT-transformed $\tilde{z}_{ik}$'s are not.
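For readers unfamiliar with ADMM, the following self-contained sketch applies the same splitting idea to a plain (non-convolutional) $\ell_1$-regularized least-squares problem; the quadratic step and the soft-thresholding step mirror the two subproblem updates used here (the sizes, penalty $\rho$ and iteration count are illustrative choices, not from the paper):

```python
import numpy as np

def admm_lasso(A, x, beta, rho=1.0, iters=200):
    """Minimize 0.5*||A z - x||^2 + beta*||u||_1  s.t.  z = u, via ADMM."""
    n = A.shape[1]
    u = np.zeros(n)
    lam = np.zeros(n)                       # scaled dual variable
    AtA = A.T @ A + rho * np.eye(n)
    Atx = A.T @ x
    for _ in range(iters):
        z = np.linalg.solve(AtA, Atx + rho * (u - lam))          # quadratic step
        v = z + lam
        u = np.sign(v) * np.maximum(np.abs(v) - beta / rho, 0.0)  # soft-threshold
        lam += z - u                                              # dual update
    return u

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 10))
z_true = np.zeros(10)
z_true[3] = 2.0                  # sparse ground truth
u = admm_lasso(A, A @ z_true, beta=0.1)
assert np.argmax(np.abs(u)) == 3
```

The splitting into an easy quadratic subproblem and an easy proximal subproblem is exactly what makes ADMM attractive for the code and dictionary updates above.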
During inference, given the learned dictionary $D$, a test sample $x$ is reconstructed as $\sum_{k=1}^K d_k * z_k$, where $z_k$ is the obtained code.
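The frequency-domain formulation above rests on the convolution theorem: circular convolution in the spatial domain becomes elementwise multiplication after an FFT. A small NumPy check (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
P = 128
d = rng.standard_normal(11)   # short filter
z = rng.standard_normal(P)    # code

# Zero-pad the filter to length P, then convolve via the FFT in O(P log P).
d_pad = np.zeros(P)
d_pad[:d.size] = d
via_fft = np.real(np.fft.ifft(np.fft.fft(d_pad) * np.fft.fft(z)))

# Direct circular convolution in O(P^2), for comparison.
direct = np.array(
    [sum(d_pad[m] * z[(n - m) % P] for m in range(P)) for n in range(P)]
)
assert np.allclose(via_fft, direct)
```

The $O(P \log P)$ versus $O(P^2)$ gap is exactly why the CSC subproblems are solved after Fourier transformation.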
2.1 Post-Processing for Separable Filters
Filters obtained by CSC are non-separable, and subsequent convolutions with them may be slow. To speed this up, they can be post-processed and approximated by separable filters (Rigamonti et al., 2013; Sironi et al., 2015). Specifically, the learned $D$ is approximated by $BW$, where $B$ contains $R$ rank-1 base filters $\{b_r\}$, and $W$ contains the combination weights. However, this often leads to performance deterioration.
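This post-processing amounts to a least-squares fit of the learned filters onto a small basis. A hedged sketch (here $B$ is a random stand-in for the learned rank-1 base filters, and all sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
M, K, R = 49, 20, 5                # filter size, #filters, #base filters

D = rng.standard_normal((M, K))    # stand-in for learned CSC filters
B = rng.standard_normal((M, R))    # stand-in for rank-1 base filters

# Best combination weights in the least-squares sense: W = B^+ D.
W, *_ = np.linalg.lstsq(B, D, rcond=None)
D_approx = B @ W
assert W.shape == (R, K)

# With R < K the approximation is lossy -- the source of the reported
# performance deterioration.
err = np.linalg.norm(D - D_approx) / np.linalg.norm(D)
```

Because the fit happens after learning, the base filters are never adapted to the reconstruction objective itself, which motivates the joint formulation proposed in this paper.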
2.2 Online CSC
An online CSC algorithm (OCSC) was recently proposed in (Wang et al., 2018). Given the Fourier-transformed sample $\tilde{x}_t$ and the dictionary from the last iteration, the corresponding codes are obtained as in (2). The following proposition reformulates the dictionary update for use with small history statistics.

Proposition 1 ((Wang et al., 2018)). The dictionary can be obtained by solving an optimization problem of the same form as the dictionary update in Section 2 (subject to the same filter-norm constraints), in which the data-dependent terms are replaced by history statistics that aggregate, for each frequency, the outer products of the transformed codes and the correlations between the transformed codes and samples.
This reformulated problem can be solved by ADMM. The total space complexity is only $O(K^2 P)$, which is independent of $N$. Moreover, the history statistics can be updated incrementally as each new sample arrives.
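The incremental-statistics idea can be illustrated on a generic least-squares problem: running sums of code outer products and code–sample correlations summarize everything a quadratic subproblem needs, so old samples never have to be stored (a simplified sketch of the general principle, not OCSC's exact statistics):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
H = np.zeros((n, n))   # running sum of z z^T
g = np.zeros(n)        # running sum of z * x

batch_Z, batch_x = [], []
for t in range(50):
    z = rng.standard_normal(n)
    x = rng.standard_normal()
    H += np.outer(z, z)   # incremental update: old samples never revisited
    g += z * x
    batch_Z.append(z)
    batch_x.append(x)

# The statistics match what a full batch pass over all samples would give.
Zb = np.array(batch_Z)
xb = np.array(batch_x)
assert np.allclose(H, Zb.T @ Zb)
assert np.allclose(g, Zb.T @ xb)
```

The storage cost depends only on the statistic sizes, not on the number of samples seen — the key to the $N$-independent space complexity above.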
Two other online CSC reformulations have also been proposed recently. The method of Degraux et al. (2017) performs convolution in the spatial domain and is slow, while that of Liu et al. (2017) performs convolution in the frequency domain but requires expensive operations on huge sparse matrices.
3 Online CSC with Sample-Dependent Dictionary
Though OCSC scales well with the sample size $N$, its space complexity still depends quadratically on $K$. This limits the number of filters that can be used, and can impact performance. Motivated by the idea of separable filters in Section 2.1, we enable learning with more filters by approximating the $K$ filters with $R$ base filters, where $R \ll K$. In contrast to separable filters, which are obtained by post-processing and may not be optimal, we learn the base filters directly during signal reconstruction. Moreover, the filters in the dictionary are combined from the base filters in a sample-dependent manner.
3.1 Problem Formulation
Recall that each $x_i$ in (1) is represented by $\sum_{k=1}^K d_k * z_{ik}$. Let $B = [b_1, \dots, b_R]$, with columns being the base filters. We propose to represent the dictionary for sample $x_i$ as:
$$D^{(i)} = B W^{(i)}, \tag{5}$$
where $W^{(i)}$ is the matrix of filter combination weights. In other words, each $d_k$ in (1) is replaced by $B w^{(i)}_k$, where $w^{(i)}_k$ is the $k$th column of $W^{(i)}$, or equivalently,
$$x_i \approx \sum_{k=1}^K \big( B w^{(i)}_k \big) * z_{ik}, \tag{6}$$
which is sample-dependent. As will be seen, this allows the $W^{(i)}$'s to be learned independently (Section 3.3). This also leads to more sample-dependent patterns being captured, and thus better performance (Section 4.4).
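A hedged sketch of the sample-dependent representation in (5) and (6) (sizes and random data are illustrative; in the actual algorithm, $B$ and the $W^{(i)}$'s are learned rather than random):

```python
import numpy as np

rng = np.random.default_rng(5)
P, M, K, R, N = 64, 7, 50, 5, 3   # illustrative sizes only

B = rng.standard_normal((M, R))                      # shared base filters
W = [rng.standard_normal((R, K)) for _ in range(N)]  # per-sample weights

def reconstruct(i, Z):
    """Sample i's personal dictionary is B @ W[i]; its columns are filters."""
    D_i = B @ W[i]
    return sum(np.convolve(Z[:, k], D_i[:, k], mode="same") for k in range(K))

Z = np.zeros((P, K))
Z[10, 2] = 1.0                     # a single code spike
x_hat = reconstruct(0, Z)
assert x_hat.shape == (P,)

# Shared parameters: M*R entries for B (plus R*K weights per sample),
# instead of M*K independent filter entries per dictionary.
assert M * R < M * K
```

Each sample thus gets its own effective dictionary $B W^{(i)}$ while all samples share the same small set of base filters.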
Sample-dependent filters have been recently studied in convolutional neural networks (CNNs) (Jia et al., 2016). Empirically, they outperform standard CNNs in one-shot learning (Bertinetto et al., 2016), video prediction (Jia et al., 2016) and image deblurring (Kang et al., 2017). Jia et al. (2016) use a specially designed neural network to learn the filters, and do not consider the CSC model. In contrast, the sample-dependent filters here are integrated into CSC.

3.2 Learning
Plugging (6) into the CSC formulation in (1), we obtain
$$\min_{B, \{W^{(i)}\}, \{Z_i\}}\; \frac{1}{2}\sum_{i=1}^N \Big\| x_i - \sum_{k=1}^K \big( B w^{(i)}_k \big) * z_{ik} \Big\|_2^2 + \beta \sum_{i=1}^N \sum_{k=1}^K \| z_{ik} \|_1 \quad \text{s.t.}\ \big\| B w^{(i)}_k \big\|_2^2 \le 1. \tag{7}$$
As $B$ and the $w^{(i)}_k$'s are coupled together in the constraint, the optimization problem is difficult. The following proposition decouples $B$ and the $W^{(i)}$'s. All the proofs are in the Appendix.
Proposition 2.
For $\| b_r \|_2^2 \le 1$, $r = 1, \dots, R$, we have $\| B w^{(i)}_k \|_2 \le 1$ if (i) $\| w^{(i)}_k \|_1 \le 1$, or (ii) $\| w^{(i)}_k \|_2 \le 1/\sqrt{R}$.
To simplify notations, we use $\phi(w)$ to denote the constraint $\| w \|_1 \le 1$ or $\| w \|_2 \le 1/\sqrt{R}$. By imposing either one of the above structures on the $w^{(i)}_k$'s, we obtain the optimization problem:
$$\min_{B, \{W^{(i)}\}, \{Z_i\}}\; \frac{1}{2}\sum_{i=1}^N \Big\| x_i - \sum_{k=1}^K \big( B w^{(i)}_k \big) * z_{ik} \Big\|_2^2 + \beta \sum_{i,k} \| z_{ik} \|_1 \quad \text{s.t.}\ \| b_r \|_2^2 \le 1,\ \phi(w^{(i)}_k).$$
During inference on a sample $x$, the corresponding code and weights can be obtained by solving (3.2) with the learned $B$ fixed.
3.3 Online Learning Algorithm for (3.2)
As in Section 2.2, we propose an online algorithm for better scalability. At the $t$th iteration, consider
$$\min_{B, W_t, Z_t}\; \frac{1}{2}\Big\| x_t - \sum_{k=1}^K \big( B w_{tk} \big) * z_{tk} \Big\|_2^2 + \beta \sum_{k=1}^K \| z_{tk} \|_1 \quad \text{s.t.}\ \| b_r \|_2^2 \le 1,\ \phi(w_{tk}),$$
where the base filters are zero-padded to be $P$-dimensional. Note that the number of convolutions can be reduced from $K$ to $R$ by rewriting the summation above as $\sum_{r=1}^R b_r * \big( \sum_{k=1}^K w_{tk}(r)\, z_{tk} \big)$. The following proposition rewrites (3.3) so that the convolutions are performed in the frequency domain.

Proposition 3.
Problem (3.3) can be equivalently solved in the frequency domain, with convolutions replaced by Hadamard products of the Fourier-transformed variables. The spatial-domain base filters can be recovered from the frequency-domain solution $\tilde{b}_r$ as $b_r = C\, \mathcal{F}^{-1}(\tilde{b}_r)$.
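The reduction from $K$ to $R$ convolutions follows from linearity of convolution: folding the weights into the codes first yields the same reconstruction. A quick NumPy check (illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(6)
P, M, K, R = 64, 7, 12, 3

B = rng.standard_normal((M, R))   # base filters
W = rng.standard_normal((R, K))   # combination weights
Z = rng.standard_normal((P, K))   # codes

# K convolutions: one per combined filter B @ W[:, k].
lhs = sum(np.convolve(Z[:, k], B @ W[:, k], mode="same") for k in range(K))

# R convolutions: fold the weights into the codes first.
Zc = Z @ W.T   # column r holds sum_k W[r, k] * z_k
rhs = sum(np.convolve(Zc[:, r], B[:, r], mode="same") for r in range(R))

assert np.allclose(lhs, rhs)
```

Since $R \ll K$, this rewriting directly cuts the number of (expensive) convolutions per sample.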
3.3.1 Obtaining the Base Filters

With the codes and weights fixed, the base filters are obtained by solving the subproblem:
$$\min_{\{\tilde{b}_r\}, \{v_r\}}\; \frac{1}{2}\sum_{i=1}^t \Big\| \tilde{x}_i - \sum_{r=1}^R \tilde{b}_r \odot \tilde{c}_{ir} \Big\|_2^2 \quad \text{s.t.}\ \| v_r \|_2^2 \le 1,\ v_r = C\, \mathcal{F}^{-1}(\tilde{b}_r),$$
where $v_r$ is an auxiliary variable and $\tilde{c}_{ir} = \sum_k w_{ik}(r)\, \tilde{z}_{ik}$ is the weight-folded code. This is of the same form as the dictionary update subproblem in Section 2. Hence, analogously, the base filters can be obtained with ADMM using history statistics that aggregate, per frequency, the outer products of the $\tilde{c}_{ir}$'s and their correlations with the $\tilde{x}_i$'s. These statistics can be updated incrementally as each new sample arrives.
Table 1. Space, code update time and filter update time of OCSC (Wang et al., 2018), OCDL-Degraux (Degraux et al., 2017), OCDL-Liu (Liu et al., 2017), CCSC (Choudhury et al., 2017), and the proposed SCSC.
3.3.2 Obtaining the Codes and Weights

With the arrival of $x_t$, we fix the base filters to the $B_{t-1}$ learned at the last iteration, and obtain $Z_t$ and $W_t$ as:
$$\min_{Z_t, W_t}\; \frac{1}{2}\Big\| x_t - \sum_{k=1}^K \big( B_{t-1} w_{tk} \big) * z_{tk} \Big\|_2^2 + \beta \sum_{k=1}^K \| z_{tk} \|_1 + \sum_{k=1}^K I_\phi(w_{tk}), \tag{16}$$
where $I_\phi(\cdot)$ is the indicator function of the constraint $\phi$ (i.e., $I_\phi(w) = 0$ if $\phi(w)$ holds, and $\infty$ otherwise).
As in the CSC literature, it can be shown that ADMM can also be used to solve (16). However, while CSC's code update subproblem in (2) is convex, problem (16) is nonconvex, and existing convergence results for ADMM (Wang et al., 2015) do not apply.

In this paper, we instead use the nonconvex inexact accelerated proximal gradient (niAPG) algorithm (Yao et al., 2017), a recent proximal algorithm for nonconvex problems. As the regularizers on $Z_t$ and $W_t$ in (16) are independent, the proximal step w.r.t. the two blocks can be performed separately: soft-thresholding for the $\ell_1$-regularized codes, and projection onto the $\ell_1$- or $\ell_2$-ball for the weights (Parikh & Boyd, 2014). As shown in (Parikh & Boyd, 2014), these individual proximal steps can be easily computed.
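The two proximal steps are standard (Parikh & Boyd, 2014). A minimal sketch of soft-thresholding for the codes and, as one instance of the weight constraint, projection onto the $\ell_2$-ball:

```python
import numpy as np

def prox_l1(v, lam):
    """Proximal step of lam*||.||_1: elementwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def proj_l2_ball(v, radius):
    """Projection onto {w : ||w||_2 <= radius} (prox of its indicator)."""
    nrm = np.linalg.norm(v)
    return v if nrm <= radius else v * (radius / nrm)

v = np.array([3.0, -0.2, 1.5])
assert np.allclose(prox_l1(v, 0.5), [2.5, 0.0, 1.0])
assert np.linalg.norm(proj_l2_ball(v, 1.0)) <= 1.0 + 1e-12
```

Both operations are elementwise or involve a single norm computation, so each proximal step is cheap relative to the gradient computation.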
3.3.3 Complete Algorithm
The whole procedure, called "Sample-dependent Convolutional Sparse Coding" (SCSC), is shown in Algorithm 1. Its space complexity is dominated by the history statistics over the $R$ base filters, and its per-iteration time complexity consists of a gradient-computation term and an FFT/inverse-FFT term. Table 1 compares its complexities with those of the other online and distributed CSC algorithms. As can be seen, SCSC has much lower time and space complexities, as $R \ll K$.
4 Experiments
Experiments are performed on a number of data sets (Table 2). Fruit and City are two small image data sets that have been commonly used in the CSC literature (Zeiler et al., 2010; Bristow et al., 2013; Heide et al., 2015; Papyan et al., 2017). We use the default training and testing splits provided in (Bristow et al., 2013). The images are preprocessed as in (Zeiler et al., 2010; Heide et al., 2015; Wang et al., 2018), which includes conversion to grayscale, feature standardization, local contrast normalization and edge tapering. As these two data sets are small, in some experiments we also use two larger data sets, CIFAR-10 (Krizhevsky & Hinton, 2009) and Flower (Nilsback & Zisserman, 2008). Following (Heide et al., 2015; Choudhury et al., 2017; Papyan et al., 2017; Wang et al., 2018), we set the filter size as in these works, and the regularization parameter $\beta$ in (1) to 1.
Table 2. Summary of the image data sets.

            size      #training  #testing
Fruit       100×100   10         4
City        100×100   10         4
CIFAR-10    32×32     50,000     10,000
Flower      500×500   2,040      6,149
To evaluate efficacy of the learned dictionary, we mainly consider the task of image reconstruction, as in (Aharon et al., 2006; Heide et al., 2015; Sironi et al., 2015). The reconstructed image quality is evaluated by the testing peak signal-to-noise ratio (PSNR) (Papyan et al., 2017): $\mathrm{PSNR} = \frac{1}{|\Omega|} \sum_{x_i \in \Omega} 10 \log_{10} \big( P / \| \hat{x}_i - x_i \|_2^2 \big)$, where $\hat{x}_i$ is the reconstruction of $x_i$ from the test set $\Omega$, and $P$ is the number of pixels. The experiment is repeated five times with different dictionary initializations.
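The PSNR metric can be computed as below (a standard formulation; the peak value and per-image averaging are illustrative choices, as the exact normalization depends on the image scaling):

```python
import numpy as np

def psnr(x, x_hat, peak=1.0):
    """Peak signal-to-noise ratio in dB (higher = better reconstruction)."""
    mse = np.mean((x - x_hat) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

x = np.zeros((8, 8))
noisy = x + 0.1            # constant error of 0.1 -> MSE = 0.01
assert abs(psnr(x, noisy) - 20.0) < 1e-9
```

A 1 dB difference corresponds to roughly a 21% change in mean squared error, so the PSNR gaps reported below are substantial.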
4.1 Choice of the $\ell_1$ versus $\ell_2$ Constraint

First, we study the choice of $\phi$ in Proposition 2. We compare SCSC-L1, which uses $\| w \|_1 \le 1$, with SCSC-L2, which uses $\| w \|_2 \le 1/\sqrt{R}$. Experiments are performed on Fruit and City. As in (Heide et al., 2015; Papyan et al., 2017; Wang et al., 2018), the number of filters $K$ is set to 100. Recalling the space complexity results in Table 1, we define the compression ratio (CR) of SCSC relative to OCSC (using the same $K$) as the ratio of their space requirements. We vary $R$ over a range of values, with correspondingly varying CR.
Results are shown in Figure 1. As can be seen, SCSC-L1 is much inferior. Figure 2(a) shows the weight matrix obtained by SCSC-L1 on a test sample from City (results on the other data sets are similar). As can be seen, most of its entries are zero because of the sparsity induced by the $\ell_1$-norm. The expressive power is severely limited, as typically only one base filter is used to approximate each original filter. On the other hand, the weight matrix learned by SCSC-L2 is dense and has many more nonzero entries (Figure 2(b)). In the sequel, we focus on SCSC-L2, which is simply denoted SCSC.
4.2 Sample-Dependent Dictionary

In this experiment, we compare SCSC with the following algorithms that use sample-independent dictionaries: (i) SCSC (shared), a SCSC variant in which all the $W^{(i)}$'s in (5) are the same, optimized by alternating minimization; (ii) separable filters learned by tensor decomposition (SEP-TD) (Sironi et al., 2015), which post-processes the (shared) dictionary learned by OCSC, as reviewed in Section 2.1; and (iii) OCSC (Wang et al., 2018), the state-of-the-art online CSC algorithm.

Results are shown in Figure 3. As can be seen, SCSC always outperforms SCSC (shared) and SEP-TD, and outperforms OCSC when $R$ is sufficiently large. This demonstrates the advantage of using a sample-dependent dictionary.
Next, we compare against OCSC with fine-tuned filters, which are also sample-dependent. Specifically, given a test sample, we first obtain its code from (2) with the learned dictionary, and then fine-tune the dictionary by solving the dictionary update subproblem using the newly computed code. As in (Donahue et al., 2014), this is repeated for a few iterations (five in our experiments). We set OCSC's number of filters so that the two methods take the same space (Table 1); the $K$ used in SCSC is still 100. Results are shown in Figure 4. As can be seen, though fine-tuning slightly improves the performance of OCSC, this approach of generating sample-dependent filters is still much worse than SCSC.
4.3 Learning with More Filters
Recall that SCSC allows the use of more filters (i.e., a larger $K$) because of its lower time and space complexities. In this section, we demonstrate that this can lead to better performance. We compare SCSC with two of the most recent batch and online CSC methods, namely, slice-based CSC (SBCSC) (Papyan et al., 2017) and OCSC. For SCSC, the number of base filters $R$ is fixed for each data set.
Figure 5 shows the testing PSNRs at different $K$'s. As can be seen, a larger $K$ consistently leads to better performance for all methods. SCSC allows the use of a larger $K$ because of its much smaller memory footprint; for example, on CIFAR-10 and Flower, SCSC can still run at values of $K$ for which the other methods run out of memory.
4.4 Comparison with the State-of-the-Art

First, we perform experiments on the two smaller data sets, Fruit and City. SCSC is compared with the batch CSC algorithms, including (i) the deconvolution network (DeconvNet) (Zeiler et al., 2010), (ii) fast CSC (FCSC) (Bristow et al., 2013), (iii) fast and flexible CSC (FFCSC) (Heide et al., 2015), (iv) convolutional basis pursuit denoising (CBPDN) (Wohlberg, 2016), (v) the CONSENSUS algorithm (Šorel & Šroubek, 2016), and (vi) slice-based CSC (SBCSC) (Papyan et al., 2017). We also compare with the online CSC algorithms, including (vii) OCSC (Wang et al., 2018), (viii) OCDL-Degraux (Degraux et al., 2017), and (ix) OCDL-Liu (Liu et al., 2017).
Figure 6 shows convergence of the testing PSNR with clock time. As also demonstrated in (Degraux et al., 2017; Liu et al., 2017; Wang et al., 2018), online CSC methods converge faster and attain better PSNR than batch CSC methods. Among the online methods, SCSC has PSNR comparable to OCSC, but is faster and requires much less storage.
Next, we perform experiments on the two large data sets, CIFAR-10 and Flower. All the batch CSC algorithms and two of the online CSC algorithms (OCDL-Degraux and OCDL-Liu) cannot handle such large data sets; hence, we only compare SCSC with OCSC. On CIFAR-10, we set $K = 300$, and the corresponding CR for SCSC is 100. On Flower, $K$ is still 300 for SCSC; however, OCSC can only use a smaller $K$ because of its much larger memory footprint. Figure 7 shows convergence of the testing PSNR. In both cases, SCSC significantly outperforms OCSC.
4.5 Higher-Dimensional Data

In this section, we perform experiments on data sets with dimensionalities larger than two. To alleviate the large memory problem, Choudhury et al. (2017) proposed the use of distributed algorithms. Here, we show that SCSC can effectively handle these data sets on a single machine.

Experiments are performed on the three data sets (Table 3) used in (Choudhury et al., 2017). The Video data set contains image subsequences recorded in an airport (Li et al., 2004). The length of each video is 7 frames, and each frame is of size 100×100. The Multispectral data set contains 60×60 patches from multispectral images (covering 31 wavelengths) of real-world objects and materials (Yasuma et al., 2010). The Light field data set contains 60×60 patches of light field images of objects and scenes (Kalantari et al., 2016); for each pixel, the light rays come from 8×8 different directions. Following (Choudhury et al., 2017), we set the filter sizes as in that work.
Table 3. Summary of the higher-dimensional data sets.

               size        #training  #testing
Video          100×100×7   573        143
Multispectral  60×60×31    2,200      1,000
Light field    60×60×8×8   7,700      385
We compare SCSC with the OCSC and consensus CSC (CCSC) (Choudhury et al., 2017) algorithms. For fair comparison, only one machine is used for all methods. We do not compare with the batch methods or the two online methods OCDL-Degraux and OCDL-Liu, as they are not scalable (as already shown in Section 4.4).
Because of the small memory footprint of SCSC, we run it on a GTX 1080 Ti GPU in this experiment. OCSC is also run on the GPU for Video; however, it can only run on CPU for Multispectral and Light field. CCSC, which needs to access all the samples and codes during processing, can only run on CPU. (For Video, the memory used (in GB) by CCSC, OCSC, and the two SCSC variants in Table 4 is 28.73, 7.58, 2.66 and 2.87, respectively. On Multispectral, the corresponding numbers are 28.26, 11.09, 0.73 and 0.76; on Light field, they are 29.79, 15.94, 7.26 and 8.88.)
Results are shown in Table 4. Note that SCSC is the only method that can handle the whole of Video, Multispectral and Light field data sets on a single machine. In comparison, CCSC can only handle a maximum of 30 Video samples, 40 Multispectral samples, and 35 Light field samples. OCSC can handle the whole of Video and Multispectral, but cannot converge in 2 days when the whole Light field data set is used. Again, SCSC outperforms OCSC and CCSC.
Table 4. Testing PSNR and time on the higher-dimensional data sets (* denotes runs on GPU; "—" indicates the method cannot converge in 2 days).

           Video                     Multispectral             Light field
           PSNR         time        PSNR         time         PSNR         time
CCSC       20.43±0.11   11.91±0.07  17.67±0.14   27.88±0.07   13.70±0.09   8.99±0.11
OCSC       33.17±0.01   1.41±0.04*  30.12±0.02   31.19±0.02   —            —
SCSC       35.30±0.02   0.73±0.02*  30.51±0.02   1.21±0.03*   29.30±0.03   11.12±0.07*
SCSC       38.02±0.03   0.81±0.01*  31.71±0.01   1.40±0.01*   31.70±0.02   17.97±0.05*
As for speed, SCSC is the fastest. However, note that this comparison is for reference only, as SCSC is run on GPU while the others (except OCSC on Video) are run on CPU. Nevertheless, it still demonstrates an important advantage of SCSC: its small memory footprint allows it to benefit from the GPU, while the others cannot.
Table 5. Testing PSNR on image denoising and inpainting.

              denoising                           inpainting
              SBCSC       OCSC        SCSC        SBCSC       OCSC        SCSC
Wind Mill     14.88±0.03  16.20±0.03  17.27±0.02  29.76±0.13  29.40±0.14  29.76±0.08
Sea Rock      14.80±0.02  16.01±0.02  17.10±0.02  24.92±0.06  25.04±0.04  25.17±0.04
Parthenon     14.97±0.02  16.33±0.01  17.44±0.03  27.06±0.06  26.79±0.04  28.04±0.04
Rolls Royce   15.23±0.01  16.27±0.01  17.63±0.02  24.96±0.13  24.66±0.10  25.06±0.05
Fence         15.21±0.04  16.53±0.02  17.56±0.03  26.81±0.05  26.71±0.08  26.85±0.05
Car           16.90±0.01  18.05±0.03  20.06±0.05  29.60±0.07  29.40±0.09  30.44±0.04
Kid           14.90±0.01  16.21±0.02  17.22±0.03  25.36±0.01  25.42±0.07  25.67±0.07
Tower         14.89±0.02  16.19±0.01  18.36±0.05  26.64±0.04  26.48±0.06  26.96±0.03
Fish          16.40±0.01  17.40±0.01  18.61±0.02  27.49±0.03  26.98±0.08  27.23±0.07
Food          16.38±0.01  17.68±0.02  18.56±0.03  29.96±0.05  29.62±0.08  31.49±0.02
4.6 Image Denoising and Inpainting

In the previous experiments, superiority of the learned dictionary was demonstrated by reconstruction of clean images. In this section, we further examine the learned dictionary on two applications: image denoising and inpainting. Ten test images provided by (Choudhury et al., 2017) are used. In denoising, we add Gaussian noise with zero mean and variance 0.01 to the test images (the average input PSNR is 10dB). In inpainting, we randomly subsample 50% of the pixels and set the rest to 0 (the average input PSNR is 9.12dB). Following (Heide et al., 2015; Choudhury et al., 2017; Papyan et al., 2017), we use a binary weight matrix to mask out positions of the missing pixels. We use the filters learned from Fruit in Section 4.4, and compare SCSC with (batch) SBCSC and (online) OCSC.

Results are shown in Table 5. As can be seen, the PSNRs obtained by SCSC are consistently higher than those of the other methods. This shows that the dictionary, which yields high PSNR on image reconstruction, also leads to better performance in other image processing applications.
4.7 Solving (16): niAPG vs ADMM

Finally, we compare the performance of ADMM and niAPG in solving subproblem (16), using a training sample from City. The experiment is repeated five times with different initializations. Figure 8 shows convergence of the objective in (16) with time. As can be seen, niAPG converges quickly, while ADMM fails to converge. Figure 9 shows the violation of the ADMM constraints against the number of iterations. As the violation does not go to zero, ADMM indeed does not converge.
5 Conclusion
In this paper, we proposed a novel CSC extension in which each sample has its own sample-dependent dictionary constructed from a small set of shared base filters. Using online learning, the model can be efficiently updated with low time and space complexities. Extensive experiments on a variety of data sets, including large image data sets and higher-dimensional data sets, demonstrate its efficiency and scalability.
Acknowledgements
The second author especially thanks Weiwei Tu and Yuqiang Chen from 4Paradigm Inc. This research was supported in part by the Research Grants Council, Hong Kong, under Grant 614513, and by the University of Macau Grant SRG201500050FST.
References
 Aharon et al. (2006) Aharon, M., Elad, M., and Bruckstein, A. KSVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11):4311–4322, 2006.
 Andilla & Hamprecht (2014) Andilla, F. and Hamprecht, F. Sparse space-time deconvolution for calcium image analysis. In Advances in Neural Information Processing Systems, pp. 64–72, 2014.
 Bertinetto et al. (2016) Bertinetto, L., Henriques, J. F., Valmadre, J., Torr, P., and Vedaldi, A. Learning feedforward oneshot learners. In Advances in Neural Information Processing Systems, pp. 523–531, 2016.
 Bertsekas & Tsitsiklis (1997) Bertsekas, D.P. and Tsitsiklis, J.N. Parallel and Distributed Computation: Numerical Methods. Athena Scientific, 1997.

 Boyd et al. (2011) Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.
 Bristow et al. (2013) Bristow, H., Eriksson, A., and Lucey, S. Fast convolutional sparse coding. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 391–398, 2013.
 Chang et al. (2017) Chang, H., Han, J., Zhong, C., Snijders, A., and Mao, J. Unsupervised transfer learning via multi-scale convolutional sparse coding for biomedical applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
 Choudhury et al. (2017) Choudhury, B., Swanson, R., Heide, F., Wetzstein, G., and Heidrich, W. Consensus convolutional sparse coding. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 4280–4288, 2017.
 Cogliati et al. (2016) Cogliati, A., Duan, Z., and Wohlberg, B. Context-dependent piano music transcription with convolutional sparse coding. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(12):2218–2230, 2016.
 Degraux et al. (2017) Degraux, K., Kamilov, U. S., Boufounos, P. T., and Liu, D. Online convolutional dictionary learning for multimodal imaging. In IEEE International Conference on Image Processing, pp. 1617–1621, 2017.
 Donahue et al. (2014) Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., and Darrell, T. Decaf: A deep convolutional activation feature for generic visual recognition. In International Conference on Machine Learning, pp. 647–655, 2014.

 Gu et al. (2015) Gu, S., Zuo, W., Xie, Q., Meng, D., Feng, X., and Zhang, L. Convolutional sparse coding for image super-resolution. In International Conference on Computer Vision, pp. 1823–1831, 2015.
 Heide et al. (2015) Heide, F., Heidrich, W., and Wetzstein, G. Fast and flexible convolutional sparse coding. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 5135–5143, 2015.
 Jas et al. (2017) Jas, M., La Tour, T. D., Simsekli, U., and Gramfort, A. Learning the morphology of brain signals using alphastable convolutional sparse coding. In Advances in Neural Information Processing Systems, pp. 1099–1108, 2017.
 Jia et al. (2016) Jia, X., De Brabandere, B., Tuytelaars, T., and Gool, L. V. Dynamic filter networks. In Advances in Neural Information Processing Systems, pp. 667–675, 2016.
 Kalantari et al. (2016) Kalantari, N. K., Wang, T., and Ramamoorthi, R. Learningbased view synthesis for light field cameras. ACM Transactions on Graphics, 35(6):193, 2016.
 Kang et al. (2017) Kang, D., Dhar, D., and Chan, A. Incorporating side information by adaptive convolution. In Advances in Neural Information Processing Systems, pp. 3870–3880, 2017.
 Kavukcuoglu et al. (2010) Kavukcuoglu, K., Sermanet, P., Boureau, Y., Gregor, K., Mathieu, M., and LeCun, Y. Learning convolutional feature hierarchies for visual recognition. In Advances in Neural Information Processing Systems, pp. 1090–1098, 2010.
 Krizhevsky & Hinton (2009) Krizhevsky, A. and Hinton, G. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
 Li et al. (2004) Li, L., Huang, W., Gu, I. Y., and Tian, Q. Statistical modeling of complex backgrounds for foreground object detection. IEEE Transactions on Image Processing, 13(11):1459–1472, 2004.
 Liu et al. (2017) Liu, J., GarciaCardona, C., Wohlberg, B., and Yin, W. Online convolutional dictionary learning. In IEEE International Conference on Image Processing, pp. 1707–1711, 2017.
 Mallat (1999) Mallat, S. A Wavelet Tour of Signal Processing. Academic Press, 1999.
 Nilsback & Zisserman (2008) Nilsback, M. and Zisserman, A. Automated flower classification over a large number of classes. In Indian Conference on Computer Vision, Graphics & Image Processing, pp. 722–729, 2008.
 Pachitariu et al. (2013) Pachitariu, M., Packer, A., Pettit, N., Dalgleish, H., Hausser, M., and Sahani, M. Extracting regions of interest from biological images with convolutional sparse block coding. In Advances in Neural Information Processing Systems, pp. 1745–1753, 2013.
 Papyan et al. (2017) Papyan, V., Romano, Y., Sulam, J., and Elad, M. Convolutional dictionary learning via local processing. In International Conference on Computer Vision, pp. 5296–5304, 2017.
 Parikh & Boyd (2014) Parikh, N. and Boyd, S. Proximal algorithms. Foundations and Trends in Optimization, 1(3):127–239, 2014.

 Peter et al. (2017) Peter, S., Kirschbaum, E., Both, M., Campbell, L., Harvey, B., Heins, C., Durstewitz, D., Diego, F., and Hamprecht, F. A. Sparse convolutional coding for neuronal assembly detection. In Advances in Neural Information Processing Systems, pp. 3678–3688, 2017.
 Rigamonti et al. (2013) Rigamonti, R., Sironi, A., Lepetit, V., and Fua, P. Learning separable filters. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 2754–2761, 2013.
 Sironi et al. (2015) Sironi, A., Tekin, B., Rigamonti, R., Lepetit, V., and Fua, P. Learning separable filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(1):94–106, 2015.
 Šorel & Šroubek (2016) Šorel, M. and Šroubek, F. Fast convolutional sparse coding using matrix inversion lemma. Digital Signal Processing, 55:44–51, 2016.
 Wang et al. (2015) Wang, Y., Yin, W., and Zeng, J. Global convergence of ADMM in nonconvex nonsmooth optimization. arXiv preprint arXiv:1511.06324, 2015.
 Wang et al. (2018) Wang, Y., Yao, Q., Kwok, J. T., and Ni, L. M. Scalable online convolutional sparse coding. IEEE Transactions on Image Processing, 2018.
 Wohlberg (2016) Wohlberg, B. Efficient algorithms for convolutional sparse representations. IEEE Transactions on Image Processing, 25(1):301–315, 2016.
 Yao et al. (2017) Yao, Q., Kwok, J., Gao, F., Chen, W., and Liu, T.Y. Efficient inexact proximal gradient algorithm for nonconvex problems. In International Joint Conferences on Artifical Intelligence, pp. 3308–3314, 2017.
 Yasuma et al. (2010) Yasuma, F., Mitsunaga, T., Iso, D., and Nayar, S. K. Generalized assorted pixel camera: postcapture control of resolution, dynamic range, and spectrum. IEEE Transactions on Image Processing, 19(9):2241–2253, 2010.
 Zeiler et al. (2010) Zeiler, M., Krishnan, D., Taylor, G., and Fergus, R. Deconvolutional networks. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 2528–2535, 2010.
Appendix A Proofs

A.1 Proposition 2

Since $\| b_r \|_2 \le 1$ for all $r$, the triangle inequality gives
$$\Big\| \sum_{r=1}^R w_r b_r \Big\|_2 \le \sum_{r=1}^R |w_r| \, \| b_r \|_2 \le \| w \|_1, \tag{17}$$
so that $\| w \|_1 \le 1$ implies $\| B w \|_2 \le 1$. Similarly, by the Cauchy–Schwarz inequality,
$$\| w \|_1 \le \sqrt{R}\, \| w \|_2, \tag{18}$$
so that $\| w \|_2 \le 1/\sqrt{R}$ also implies $\| B w \|_2 \le 1$.
A.2 Proposition 3

Problem (3.3) is equivalent to its frequency-domain counterpart because of two facts. First, by the convolution theorem (Mallat, 1999), spatial-domain convolution equals elementwise (Hadamard) multiplication in the frequency domain once the filters and codes are zero-padded to be $P$-dimensional. Second, by Parseval's theorem (Mallat, 1999), the $\ell_2$ reconstruction error is preserved (up to a constant factor) under the Fourier transform. As for the constraints, when a base filter is transformed to the frequency domain, it is zero-padded from its original support to $P$ dimensions; we thus use the operator $C$ to crop the extra dimensions and recover the original support.