In the SC framework, we seek to efficiently represent data by using only a sparse combination of available basis vectors. We therefore assume that an $n$-dimensional data vector $x \in \mathbb{R}^n$ can be approximated as
$$x \approx Dz, \qquad (1)$$
where $z \in \mathbb{R}^m$ is sparse and $D \in \mathbb{R}^{n \times m}$ is a dictionary, sometimes referred to as the synthesis matrix, whose columns are the basis vectors.
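The generative model above can be sketched numerically; the dimensions, atom values, and nonzero positions below are illustrative assumptions, not quantities from the paper:

```python
import numpy as np

# Hypothetical sizes: n-dimensional data, m-atom dictionary (m > n, overcomplete).
rng = np.random.default_rng(0)
n, m = 16, 64
D = rng.standard_normal((n, m))
D /= np.linalg.norm(D, axis=0)       # columns (basis vectors) are normalized

# A sparse code: only a few nonzero coefficients.
z = np.zeros(m)
z[[3, 17, 41]] = [1.5, -0.7, 2.0]

x = D @ z                            # x is approximated as D z (Equation 1)
```

With 3 nonzeros out of 64 coefficients, the data vector is represented by a small subset of the available basis vectors.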
This paper focuses on the SC problem of decomposing a signal into morphologically distinct components. A typical assumption for this problem is that the data is a linear combination of source signals:
$$x = \sum_{c=1}^{C} x_c.$$
The MCA framework (Starck et al., 2004) requires that each component $x_c$ admits a sparse representation within the corresponding dictionary $D_c$. The dictionaries $D_c$ are distinct, i.e. each source-specific dictionary allows obtaining a sparse representation of the corresponding source signal while being highly inefficient at representing the other content in the mixture. This leads to a signal model that generalizes the one given by Equation 1 as
$$x \approx \sum_{c=1}^{C} D_c z_c.$$
The bottleneck of SC techniques is that at inference time a sparse code has to be computed for each data point or data patch (as in the case of high-resolution images), and this is typically done via iterative optimization. In the single-dictionary setting, ISTA (Daubechies et al., 2004) and FISTA (Beck & Teboulle, 2009) are the classical algorithmic choices for this purpose. For the MCA problem, the standard choice is SALSA (Afonso et al., 2011), an instance of ADMM (Boyd et al., 2011). This process is prohibitively slow for high-throughput real-time applications.
The key contribution of this paper is an efficient and accurate deep learning architecture that is general enough to well-approximate optimal codes both for classic SC in a single-dictionary framework and for MCA-based signal separation. We call our deep learning approximator Learned SALSA (LSALSA). The proposed encoder is formulated as a time-unfolded version of the SALSA algorithm with a fixed number of iterations, where the depth of the deep learning model corresponds to the number of SALSA iterations. We train the deep model in a supervised fashion to predict optimal sparse codes for a given input; thus in practice we can use shallow architectures of fixed depth that correspond to only a few iterations of the original SALSA and achieve performance superior to that algorithm. Furthermore, SALSA comes with a built-in source separation mechanism and cross-iteration memory-sharing connections. These main algorithmic features of SALSA translate into a specific connectivity pattern of the corresponding deep learning architecture of LSALSA, which in consequence gives LSALSA an advantage over previous deep encoders, like LISTA (Gregor & LeCun, 2010), in terms of applicability to a broader class of learning problems (LISTA is used only in the single-dictionary setting) and performance. To the best of our knowledge, our approach is the first one to utilize an instance of ADMM unrolled into a deep learning architecture to address a source separation problem.
This paper is organized as follows: Section 2 provides a literature review, Section 3 formulates the SC problem in detail, and Section 4 shows how to derive predictive single-dictionary SC and multiple-dictionary MCA encoders from their iterative counterparts and explains our approach (LSALSA). Finally, Section 5 shows experimental results for both the single-dictionary setting and MCA. Section 6 concludes the paper.
2 Related Work
Sparse code inference aims at computing sparse codes for given data and is most widely addressed via iterative schemes such as the aforementioned ISTA and FISTA. Predicting approximations of optimal codes can instead be done using deep feed-forward learning architectures based on truncated convex solvers. This family of approaches lies at the core of this paper. A notable approach in this family, known as LISTA (Gregor & LeCun, 2010), stems from earlier predictive sparse decomposition methods (Kavukcuoglu et al., 2010; Jarrett et al., 2009), which however obtained approximations to the sparse codes of insufficient quality. LISTA improves over these techniques and enhances ISTA by unfolding a fixed number of iterations to define a fixed-depth deep neural network that is trained with examples of input vectors paired with their corresponding optimal sparse codes obtained by conventional methods like ISTA or FISTA. LISTA was shown to provide high-quality approximations of optimal sparse codes at a fixed computational cost. The unrolling methodology has also been applied to algorithms solving SC with $\ell_0$-regularization (Wang et al., 2016) and to message passing schemes (Borgerding & Schniter, 2016).
In other prior work, ISTA was recast as a recurrent neural network unit, giving rise to a variant of LSTM (Gers et al., 2003; Zhou et al., 2018). The above-mentioned algorithms are not well suited to the MCA problem as they have no algorithmic mechanism for handling multiple dictionaries. In other words, they can only approach the MCA problem by casting it as an SC problem with access to a single dictionary that is a concatenation of the source-specific dictionaries, e.g. $D = [D_1, D_2]$.
This paper considers a generalization of the single-dictionary SC problem to the MCA framework. The framework assumes that the data can be explained by multiple distinct dictionaries. MCA has been used successfully in a number of applications that include decomposing images into textures and cartoons for denoising and inpainting (Starck et al., 2005b, a; Elad et al., 2005; Peyré et al., 2007; Shoham & Elad, 2008; Peyré et al., 2010), detecting text in natural scene images (Liu et al., 2017), as well as other source separation problems such as separating non-stationary clutter from weather radar signals (Uysal et al., 2016), transients from sustained rhythmic components in EEG signals (Parekh et al., 2014), and stationary from dynamic components of MRI videos (Otazo et al., 2015). The MCA problem is traditionally solved via the SALSA algorithm, which constitutes a special case of the ADMM method.
There exist a few approaches in the literature utilizing ADMM unrolled into a deep learning architecture. One such computationally efficient framework (Sprechmann et al., 2013) was applied to learning task-specific (reconstruction or classification) sparse models via sparsity-promoting convolutional operators. Another unrolled version of ADMM (Yang et al., 2016) was demonstrated to improve the reconstruction accuracy and computational speed of the baseline ADMM algorithm for the problem of compressive-sensing magnetic resonance imaging. A variety of papers followed up on this work for various image reconstruction tasks, such as the Learned Primal-dual Algorithm (Adler & Öktem, 2017). None of these approaches were applied to the MCA or other source separation problems. An unrolled nonnegative matrix factorization (NMF) algorithm (Le Roux et al., 2015) was implemented as a deep network for the task of speech separation. In another work (Wisdom et al., 2017), the NMF-based speech separation task was solved with an ISTA-like unfolded network.
3 Problem Formulation
This paper focuses on the inference problem in SC. It is formulated as finding the optimal sparse code $z^*$ given an input vector $x$ and a dictionary matrix $D$, whose columns are the normalized basis vectors. $z^*$ minimizes the $\ell_1$-regularized linear least-squares cost function
$$E(x, z) = \frac{1}{2}\|x - Dz\|_2^2 + \alpha\|z\|_1, \qquad (4)$$
where the scalar constant $\alpha$ balances sparsity with data fidelity. Thus $z^* = \arg\min_z E(x, z)$ is the optimal code for $x$ with respect to $D$. The dictionary matrix $D$ is usually learned by minimizing the loss function given below (Olshausen & Field, 1996)
$$\mathcal{L}(D) = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{2}\big\|x^i - D z^{*i}\big\|_2^2$$
with respect to $D$ using stochastic gradient descent (SGD), where $N$ is the size of the training data set, $x^i$ is the $i$-th training sample, and $z^{*i}$ is the corresponding optimal sparse code. The optimal sparse codes in each iteration are obtained in this paper with FISTA.
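The alternation implied above (sparse inference with the current dictionary, then a gradient step on the dictionary, then column renormalization) can be sketched as follows; the step size, penalty, and the use of plain ISTA for the inner solve are illustrative assumptions:

```python
import numpy as np

def soft(v, t):
    """Element-wise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(x, D, alpha=0.05, n_iter=50):
    """Inner sparse inference (ISTA here for brevity; the paper uses FISTA)."""
    L = np.linalg.norm(D, 2) ** 2
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = soft(z - D.T @ (D @ z - x) / L, alpha / L)
    return z

def dictionary_sgd_step(D, x, lr=0.01, alpha=0.05):
    """One SGD step on 0.5*||x - D z*||^2 w.r.t. D, with z* held fixed
    (a sketch under an assumed learning rate and penalty)."""
    z = sparse_code(x, D, alpha)
    D = D - lr * np.outer(D @ z - x, z)   # gradient of the reconstruction term
    return D / np.maximum(np.linalg.norm(D, axis=0), 1e-12)  # renormalize atoms

rng = np.random.default_rng(2)
D0 = rng.standard_normal((16, 32)); D0 /= np.linalg.norm(D0, axis=0)
D1 = dictionary_sgd_step(D0, rng.standard_normal(16))
```

Renormalizing the atoms after each step keeps the columns of $D$ on the unit sphere, which prevents the trivial solution of growing atoms while shrinking codes.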
In the MCA setting, the objective generalizes Equation 4 to multiple dictionaries:
$$E(x, z_1, \dots, z_C) = \frac{1}{2}\Big\|x - \sum_{c=1}^{C} D_c z_c\Big\|_2^2 + \sum_{c=1}^{C} \alpha_c \|z_c\|_1, \qquad (6)$$
where $z_c$ is the code corresponding to $D_c$ for $c = 1, \dots, C$, and the $\alpha_c$s are the coefficients controlling the sparsity penalties. We denote the concatenated optimal codes with $z^* = [z_1^*; \dots; z_C^*]$.
In the classic MCA works, the dictionaries $D_c$ are selected to be well-known filter banks with explicitly designed sparsification properties. Such hand-designed transforms have good generalization abilities and help to prevent overfitting. Also, MCA algorithms often require solving large systems of equations involving $D^T D$ or $D D^T$. An appropriate constraining of $D$ leads to a banded system of equations and in consequence reduces the computational complexity of these algorithms, e.g. (Parekh et al., 2014). More recent MCA works use learned dictionaries for image analysis (Shoham & Elad, 2008; Peyré et al., 2007). Some extensions of MCA consider learning the dictionaries $D_c$ and the sparse codes jointly (Peyré et al., 2007, 2010). In our paper, we learn the dictionaries independently. In particular, for each $c$ we minimize
$$\mathcal{L}(D_c) = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{2}\big\|x_c^i - D_c z_c^{*i}\big\|_2^2$$
with respect to $D_c$ using SGD, where $x_c^i$ is the $c$-th mixture component of the $i$-th training sample and $z_c^{*i}$ is the corresponding optimal sparse code. The optimal sparse codes in each iteration are obtained with FISTA.
4 From iterative to predictive SC and MCA
4.1 Split Augmented Lagrangian Shrinkage Algorithm (SALSA)
The objective functions used in SC (Equation 4) and MCA (Equation 6) are each convex with respect to the codes $z$, allowing a wide variety of optimization algorithms with well-studied convergence results to be applied (Bauschke & Combettes, 2011). Here we describe a popular algorithm, general enough to solve both problems, called SALSA, which is an instance of ADMM.
ADMM addresses an optimization problem of the form
$$\min_z f_1(z) + f_2(z) \qquad (11)$$
by re-casting it as the equivalent, constrained problem
$$\min_{z, u} f_1(z) + f_2(u) \quad \text{subject to} \quad u = z.$$
ADMM then minimizes the corresponding augmented Lagrangian,
$$L(z, u, d) = f_1(z) + f_2(u) + \frac{\mu}{2}\|z - u - d\|_2^2,$$
where $d$ corresponds to the (scaled) Lagrangian multipliers, one variable at a time until convergence.
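A generic scaled-form ADMM loop matching these updates can be sketched as follows; the proximity operators and penalty $\mu$ are supplied by the caller, and the toy instance below is an illustrative assumption:

```python
import numpy as np

def admm(prox_f1, prox_f2, m, mu=1.0, n_iter=200):
    """Scaled-form ADMM sketch for min f1(z) + f2(u) s.t. u = z.
    prox_f1 / prox_f2 are the proximity operators of f1 and f2 at penalty mu."""
    z, u, d = np.zeros(m), np.zeros(m), np.zeros(m)
    for _ in range(n_iter):
        z = prox_f1(u + d, mu)   # minimize the augmented Lagrangian over z
        u = prox_f2(z - d, mu)   # minimize it over u
        d = d - (z - u)          # Lagrange multiplier (dual) update
    return u

# Toy instance: f1(z) = 0.5*||z - a||^2 and f2(u) = lam*||u||_1, whose
# minimizer is the soft-thresholding of a at lam.
a, lam = np.array([2.0, -0.3]), 0.5
z_hat = admm(lambda v, mu: (a + mu * v) / (1 + mu),
             lambda v, mu: np.sign(v) * np.maximum(np.abs(v) - lam / mu, 0.0),
             m=2)
```

On this toy problem the iterates converge to $\mathrm{soft}(a, \lambda) = [1.5,\ 0.0]$, matching the closed-form solution.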
SALSA addresses an instance of the general optimization problem from Equation 11, where $f_1$ is a least-squares term and $f_2$ is such that its proximity operator can be computed exactly (Afonso et al., 2010). The algorithm falls under a sub-category of Douglas-Rachford splitting methods, for which convergence has been proved (Eckstein & Bertsekas, 1992).
SALSA is given in Algorithms 1 and 2 for the single-dictionary case and the MCA case involving two dictionaries (in this paper we consider the MCA framework with two dictionaries; extensions to more than two dictionaries are straightforward), respectively, where
$$\mathrm{soft}(v, \alpha) = \mathrm{sign}(v)\max(|v| - \alpha, 0)$$
is the soft-thresholding function with threshold $\alpha$. Note that in Algorithm 2, the $u$ and $d$ updates can be performed with element-wise operations. The $z$-update, however, is non-separable with respect to the components for a general $D$. We call this the splitting step.
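The soft-thresholding function itself is a one-liner; the sample inputs below are illustrative:

```python
import numpy as np

def soft(v, alpha):
    """Element-wise soft-thresholding with threshold alpha."""
    return np.sign(v) * np.maximum(np.abs(v) - alpha, 0.0)

out = soft(np.array([1.5, -0.2, 0.9]), 0.5)   # shrinks each entry toward zero
```

Entries with magnitude below the threshold are set exactly to zero, which is what produces sparsity in the $u$-update.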
As mentioned in Section 3, the $z$-update is often simplified to element-wise operations by constraining the matrix $D$ to have special properties. For example, requiring $D$ to be a tight frame, i.e. $D D^T = I$, reduces the $z$-update step to element-wise division (after applying the matrix inverse lemma). In (Yang et al., 2016), $D$ is set to be the partial Fourier transform, reducing the system of equations of the $z$-update to a series of convolutions and element-wise operations. In our work, as is typical in the case of SC, $D$ is a learned dictionary without any imposed structure.
The $z$-update therefore requires inverting the matrix $D^T D + \mu I$. This, however, needs to be done just once, at the very beginning, as this matrix remains fixed during the entire run of SALSA. We abbreviate the inverted matrix as
$$S = (D^T D + \mu I)^{-1}.$$
We call this matrix a splitting operator. The recursive block diagram of SALSA is depicted in Figure 2.
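Putting the pieces together, a single-dictionary SALSA loop with the splitting operator precomputed once can be sketched as below; the penalty values, sizes, and iteration budget are illustrative assumptions:

```python
import numpy as np

def salsa(x, D, alpha=0.1, mu=1.0, n_iter=300):
    """Single-dictionary SALSA sketch, with the splitting operator
    S = (D^T D + mu I)^{-1} inverted once up front."""
    n, m = D.shape
    S = np.linalg.inv(D.T @ D + mu * np.eye(m))   # splitting operator
    DTx = D.T @ x
    u, d = np.zeros(m), np.zeros(m)
    for _ in range(n_iter):
        z = S @ (DTx + mu * (u + d))              # z-update (the splitting step)
        v = z - d
        u = np.sign(v) * np.maximum(np.abs(v) - alpha / mu, 0.0)  # soft threshold
        d = d - (z - u)                            # Lagrange multiplier update
    return u

rng = np.random.default_rng(1)
D = rng.standard_normal((20, 50)); D /= np.linalg.norm(D, axis=0)
z_true = np.zeros(50); z_true[[2, 30]] = [1.0, -2.0]
z_hat = salsa(D @ z_true, D, alpha=0.01, mu=1.0)
```

Only the matrix-vector products inside the loop depend on the data; the expensive inversion is amortized over all iterations and all data points.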
4.2 Learned SALSA (LSALSA)
We next describe our proposed deep encoder architecture that we refer to as Learned SALSA (LSALSA).
Consider truncating the SALSA algorithm to a fixed number of iterations $T$ and then time-unfolding it into a deep neural network architecture that matches the truncated SALSA's output exactly. The obtained architecture is illustrated in Figure 1.
We initialize the parameters of the deep model with the corresponding quantities from SALSA, i.e. the splitting operator is initialized with $S = (D^T D + \mu I)^{-1}$, where $D = [D_1, D_2]$ in the MCA case, to achieve an exact correspondence with SALSA. All splitting operators share parameters across the network.
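The unrolled forward pass can be sketched as follows; the parameter names (`S`, `W`) and hyper-parameter values are illustrative assumptions, with `W` playing the role of $D^T$:

```python
import numpy as np

def init_from_salsa(D, mu=1.0):
    """Initialize learnable parameters to match SALSA exactly: the shared
    splitting operator S and the filter matrix W (the role of D^T)."""
    m = D.shape[1]
    return {"S": np.linalg.inv(D.T @ D + mu * np.eye(m)), "W": D.T.copy()}

def lsalsa_forward(x, params, T=3, mu=1.0, alpha=0.1):
    """Forward pass of a T-layer LSALSA sketch (one layer per SALSA iteration);
    the splitting operator S is shared across all layers."""
    S, W = params["S"], params["W"]
    u, d = np.zeros(S.shape[0]), np.zeros(S.shape[0])
    Wx = W @ x
    for _ in range(T):
        z = S @ (Wx + mu * (u + d))                               # splitting step
        v = z - d
        u = np.sign(v) * np.maximum(np.abs(v) - alpha / mu, 0.0)  # non-linearity
        d = d - (z - u)                                           # multiplier step
    return u

rng = np.random.default_rng(0)
D = rng.standard_normal((8, 16)); D /= np.linalg.norm(D, axis=0)
x = rng.standard_normal(8)
out = lsalsa_forward(x, init_from_salsa(D), T=3)
```

At initialization this forward pass reproduces $T$ iterations of Algorithm 1 exactly; training then adjusts `S` and `W` by backpropagation.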
LSALSA can be trained with standard backpropagation. Let $f(x; \Theta)$ denote the output of the LSALSA architecture with parameters $\Theta$. The cost function used for training the model is defined as the squared error between the predictions and the optimal codes,
$$\mathcal{L}(\Theta) = \frac{1}{N}\sum_{i=1}^{N} \big\|f(x^i; \Theta) - z^{*i}\big\|_2^2.$$
4.3 LSALSA versus LISTA
Here we explain the conceptual difference between LSALSA and LISTA (see also Section A in the Supplement). This difference is a direct consequence of the different nature of their parent algorithms, SALSA and ISTA respectively. ISTA is a proximal gradient method that solves the optimization problem of Equation 4 by iteratively repeating a gradient descent step followed by soft thresholding. SALSA, on the other hand, is a second-order method that recasts the problem in terms of constrained optimization and optimizes the corresponding augmented Lagrangian. Consequently, LISTA has a simple structure in which each layer depends only on the previous layer and a re-injection of the filtered data (see (Gregor & LeCun, 2010) for reference). LSALSA has cross-layer connections resulting from the Lagrangian multiplier update (the $d$-step) in the SALSA algorithm, which allows for learning dependencies between non-adjacent layers.
5 Experimental Results
We run different optimization algorithms to predict optimal codes for various data sets and for a varying number of iterations $T$. We provide empirical evaluation for both the one-dictionary and two-dictionary (MCA) settings. We focus on the inference problem and thus for each experiment the dictionaries were learned off-line and used for all methods (the visualization of the atoms of the obtained dictionaries can be found in Section B in the Supplement). We compare the following methods: LSALSA, truncated SALSA, truncated FISTA, and LISTA. Both LSALSA and LISTA are implemented as feedforward neural networks. For the MCA experiments, we simply run FISTA and LISTA using the concatenated dictionary $D = [D_1, D_2]$.
5.1 Single Dictionary Case
We run experiments with four data sets: Fashion MNIST (Xiao et al., 2017) (10 classes), ASIRRA (Elson et al., 2007) (2 classes), MNIST (LeCun et al., 2009) (10 classes), and CIFAR-10 (Krizhevsky & Hinton, 2009) (10 classes). The ASIRRA data set is a collection of natural images of cats and dogs. We use a subset of the whole data set for training and testing, as commonly done (Golle, 2008). The results for MNIST and CIFAR-10 are reported in Section C in the Supplement.
The Fashion MNIST images were first divided into non-overlapping patches (ignoring extra pixels on two edges). Then, optimal codes were computed for each vectorized patch by minimizing the objective from Equation 4 with FISTA. The ASIRRA images come in varying sizes. We resized them all to a common resolution and converted them to grayscale. Then we divided them into non-overlapping patches. Optimal codes were computed patch-wise as for Fashion MNIST, but taking more iterations to ensure convergence on this more difficult SC problem. For both data sets, $\alpha$ was chosen to achieve the desired sparsity level.
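The patching step can be sketched as follows; the $8 \times 8$ patch size is an illustrative assumption (the paper's exact patch size is not specified here), and for a $28 \times 28$ image it indeed leaves extra pixels on two edges:

```python
import numpy as np

def extract_patches(img, p):
    """Split an image into non-overlapping p x p patches, ignoring leftover
    pixels on the bottom/right edges, and vectorize each patch."""
    H, W = img.shape
    patches = [img[i:i + p, j:j + p].ravel()
               for i in range(0, H - p + 1, p)
               for j in range(0, W - p + 1, p)]
    return np.stack(patches)

img = np.arange(28 * 28, dtype=float).reshape(28, 28)
P = extract_patches(img, 8)   # 3 x 3 = 9 patches, each of length 64
```

Each row of `P` is then treated as an independent input vector for sparse inference.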
The data sets were then separated into training and testing sets. The training patches were used to produce the dictionaries. Visualizations of the dictionary atoms are provided in Section B in the Supplement. An exhaustive hyper-parameter search was performed for each encoding method and for each number of iterations $T$, to minimize the RMSE between the obtained and optimal codes. The hyper-parameter search included $\alpha$ for all methods, $\mu$ for SALSA and LSALSA, as well as learning rates and learning-rate decays for LSALSA and LISTA.
The obtained encoders were used to compute sparse codes on the test set, which were then compared with the optimal codes via RMSE. The results for Fashion MNIST are shown in terms of the number of iterations (Figure 3) and the wallclock time in seconds (Figure 4) used to make the prediction. It takes FISTA and SALSA many more iterations to reach the error achieved by LSALSA in just one. Only at the largest tested depths do FISTA and SALSA finally converge to the optimal codes. LISTA outperforms FISTA at first, but does not show much improvement as the depth grows. Similar results for ASIRRA are shown in Figures 5 and 6. On this more difficult problem, it takes FISTA and SALSA even longer to catch up with a single iteration of LSALSA. LISTA and LSALSA are comparable for small $T$, after which LSALSA dramatically improves its optimal-code prediction and, similarly as in the case of Fashion MNIST, shows an advantage over the other methods in terms of the number of iterations, wallclock time, and the quality of the recovered sparse codes.
We also investigated which method yields better codes in terms of the classification task. For each data set, we trained a logistic regression classifier to predict the label from the corresponding sparse code. Thus, for Fashion MNIST each image is associated with one optimal code per patch, and the patch codes are concatenated into a single feature vector. The Fashion MNIST classifier was trained until its classification error on the testing set converged. For ASIRRA, the concatenated optimal codes were long; to reduce the dimensionality we applied a random Gaussian projection before inputting the codes into the classifier, and the classifier was trained on the optimal projected codes until its error converged. The results for Fashion MNIST and ASIRRA are shown in Tables 1 and 2, respectively. Note: the classifier was trained on the optimal codes for images from the test set. Thus, the resulting classification error is only due to the difference between the optimal and estimated codes.
5.2 MCA: Two-Dictionary Case
We first describe the data set that we use for the MCA experiments. Following the notation introduced previously in the paper, we set the first components $x_1$ to be whole MNIST images and the second components $x_2$ to be non-overlapping patches from ASIRRA. We randomly mix images from MNIST with patches from ASIRRA to generate mixed training and testing images. Optimal codes were computed using SALSA (Algorithm 2), ensuring that both components reached the desired sparsity level. Note that we also performed experiments on a mixed data set of CIFAR-10 and MNIST. Those can be found in Section D in the Supplement.
An exhaustive hyper-parameter search was performed for each encoding method and for each number of iterations $T$. The hyper-parameter search included $\alpha$ for FISTA and LISTA, $\alpha_1$, $\alpha_2$, and $\mu$ for SALSA and LSALSA, as well as learning rates for LSALSA and LISTA.
Code prediction error curves are presented in Figures 8 and 9. LSALSA steadily outperforms the other methods until SALSA eventually catches up at large $T$. FISTA and LISTA, lacking a mechanism for distinguishing the two dictionaries, struggle to estimate the optimal codes.
In Figure 7 we illustrate each method's sparsity/accuracy trade-off on the ASIRRA test data set while varying $T$ (Section E in the Supplement shows the same plot for a wider range of $T$, as well as a similar plot for MNIST). LSALSA retains both the highest sparsity and accuracy levels, even for small $T$, among all the methods.
Similarly as before, we performed an evaluation on the classification task. A separate classifier was trained for each data set, using the separated optimal codes $z_1^*$ and $z_2^*$, respectively. As before, a random Gaussian projection was used to reduce the length of the ASIRRA codes before inputting them to the classifier. The classification results are depicted in Table 3 for MNIST and Table 4 for ASIRRA.
Finally, in Figure 10 we present exemplary reconstructed images obtained by the different methods when performing source separation (more reconstruction results can be found in Section F in the Supplement). No additional learning was performed to achieve reconstruction: the estimated codes were simply multiplied by the corresponding dictionary matrix, i.e. for LSALSA we have $\hat{x}_c = D_c \hat{z}_c$, where $\hat{z}_c$ represents the $c$-th component of the encoder's output. FISTA and LISTA are unable to separate the components without severely corrupting the ASIRRA component. LSALSA yields visually recognizable separations even at small $T$, and the MNIST component is almost completely removed at larger $T$.
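The reconstruction step amounts to a single matrix-vector product per component; a sketch with hypothetical dictionaries and codes standing in for an encoder's two outputs:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 64
D1 = rng.standard_normal((n, 128)); D1 /= np.linalg.norm(D1, axis=0)
D2 = rng.standard_normal((n, 128)); D2 /= np.linalg.norm(D2, axis=0)

z1 = np.zeros(128); z1[5] = 1.0     # stand-ins for the encoder's two outputs
z2 = np.zeros(128); z2[70] = -2.0

x1_hat, x2_hat = D1 @ z1, D2 @ z2   # per-source reconstructions
mixture_hat = x1_hat + x2_hat       # their sum approximates the mixed input
```

No decoder is trained: the synthesis dictionaries themselves map the separated codes back to the image domain.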
6 Conclusions

In this paper we propose a deep encoder architecture, LSALSA, obtained by time-unfolding the Split Augmented Lagrangian Shrinkage Algorithm (SALSA). We demonstrate that LSALSA outperforms baseline methods such as SALSA, FISTA, and LISTA in terms of the quality of the predicted sparse codes, as well as the running time, in both the single-dictionary and multiple-dictionary (MCA) case. In the two-dictionary MCA setting, we furthermore show that LSALSA obtains a separation of image components that has better visual quality than the separation obtained by SALSA.
- Adler & Öktem (2017) Adler, J. and Öktem, O. Learned primal-dual reconstruction. CoRR, abs/1707.06474, 2017.
- Afonso et al. (2010) Afonso, M., Bioucas-Dias, J., and Figueiredo, M. Fast image recovery using variable splitting and constrained optimization. IEEE Trans. Image Processing, 19(9):2345–2356, 2010.
- Afonso et al. (2011) Afonso, M., Bioucas-Dias, J., and Figueiredo, M. An augmented lagrangian approach to the constrained optimization formulation of imaging inverse problems. Trans. Img. Proc., 20(3):681–695, 2011.
- Bauschke & Combettes (2011) Bauschke, H. H. and Combettes, P. L. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer Publishing Company, 1st edition, 2011.
- Beck & Teboulle (2009) Beck, A. and Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Img. Sci., 2(1):183–202, 2009.
- Borgerding & Schniter (2016) Borgerding, M. and Schniter, P. Onsager-corrected deep learning for sparse linear inverse problems. In GlobalSIP, 2016.
- Boyd et al. (2011) Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.
- Daubechies et al. (2004) Daubechies, I., Defrise, M., and De Mol, C. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics, 57(11):1413–1457, 2004.
- Eckstein & Bertsekas (1992) Eckstein, J. and Bertsekas, D. On the douglas-rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program., 55:293–318, 1992.
- Elad et al. (2005) Elad, M., Starck, J. L., Querre, P., and Donoho, D. L. Simultaneous cartoon and texture image inpainting using morphological component analysis (MCA). Applied and Computational Harmonic Analysis, 19(3):340–358, 2005.
- Elson et al. (2007) Elson, J., Douceur, J., Howell, J., and Saul, J. Asirra: a captcha that exploits interest-aligned manual image categorization. In ACM CCS, 2007.
- Gers et al. (2003) Gers, F. A., Schraudolph, N. N., and Schmidhuber, J. Learning precise timing with lstm recurrent networks. J. Mach. Learn. Res., 3:115–143, 2003.
- Golle (2008) Golle, P. Machine learning attacks against the asirra captcha. In ACM CCS, 2008.
- Gregor & LeCun (2010) Gregor, K. and LeCun, Y. Learning fast approximations of sparse coding. In ICML, 2010.
- Jarrett et al. (2009) Jarrett, K., Kavukcuoglu, K., Koray, M., and LeCun, Y. What is the best multi-stage architecture for object recognition? In ICCV, 2009.
- Kavukcuoglu et al. (2010) Kavukcuoglu, K., Ranzato, M. A., and LeCun, Y. Fast inference in sparse coding algorithms with applications to object recognition. CoRR, abs/1010.3467, 2010.
- Krizhevsky & Hinton (2009) Krizhevsky, A. and Hinton, G. Learning multiple layers of features from tiny images, 2009.
- Le Roux et al. (2015) Le Roux, J., Hershey, J. R., and Weninger, F. Deep nmf for speech separation. In ICASSP, 2015.
- LeCun et al. (2009) LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. Gradient-based learning applied to document recognition. In Proceedings of the IEEE, 2009.
- Liu et al. (2017) Liu, S., Xian, Y., Li, H., and Yu, Z. Text detection in natural scene images using morphological component analysis and laplacian dictionary. IEEE/CAA Journal of Automatica Sinica, PP(99):1–9, 2017.
- Olshausen & Field (1996) Olshausen, B. and Field, D. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607–609, 1996.
- Otazo et al. (2015) Otazo, R., Candès, E., and Sodickson, D. K. Low-rank and sparse matrix decomposition for accelerated dynamic mri with separation of background and dynamic components. Magn Reson Med, 73(3):1125–36, 2015.
- Parekh et al. (2014) Parekh, A., Selesnick, I., Rapoport, D., and Ayappa, I. Sleep spindle detection using time-frequency sparsity. In IEEE SPMB, 2014.
- Peyré et al. (2007) Peyré, G., Fadili, J., and Starck, J.-L. Learning adapted dictionaries for geometry and texture separation. In SPIE Wavelets, 2007.
- Peyré et al. (2010) Peyré, G., Fadili, J., and Starck, J.-L. Learning the morphological diversity. SIAM J. Imaging Sciences, 3(3):646–669, 2010.
- Shoham & Elad (2008) Shoham, N. and Elad, M. Algorithms for signal separation exploiting sparse representations, with application to texture image separation. In Proceedings of the IEEE 25th Convention of Electrical and Electronics Engineers in Israel, 2008.
- Sprechmann et al. (2013) Sprechmann, P., Litman, R., Yakar, T., Bronstein, A., and Sapiro, G. Efficient supervised sparse analysis and synthesis operators. In NIPS, 2013.
- Starck et al. (2004) Starck, J.-L., Elad, M., and Donoho, D. Redundant multiscale transforms and their application for morphological component separation. Advances in Imaging and Electron Physics, 132:287–348, 2004.
- Starck et al. (2005a) Starck, J.-L., Elad, M., and Donoho, D. Image decomposition via the combination of sparse representations and a variational approach. IEEE Trans. Image Processing, 14(10):1570–1582, 2005a.
- Starck et al. (2005b) Starck, J.-L., Moudden, Y., Bobin, J., Elad, M., and Donoho, D. Morphological component analysis. In Proc. SPIE Wavelets, 2005b.
- Uysal et al. (2016) Uysal, F., Selesnick, I., and Isom, B. Mitigation of wind turbine clutter for weather radar by signal separation. IEEE Trans. Geoscience and Remote Sensing, 54(5):2925–2934, 2016.
- Wang et al. (2016) Wang, Z., Ling, Q., and Huang, T. Learning deep l0 encoders. In AAAI, 2016.
- Wisdom et al. (2017) Wisdom, S., Powers, T., Pitton, J., and Atlas, L. Deep recurrent nmf for speech separation by unfolding iterative thresholding. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 254–258, 2017.
- Xiao et al. (2017) Xiao, H., Rasul, K., and Vollgraf, R. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. CoRR, abs/1708.07747, 2017.
- Yang et al. (2016) Yang, Y., Sun, J., Li, H., and Xu, Z. Deep admm-net for compressive sensing mri. In NIPS, 2016.
- Zhou et al. (2018) Zhou, J., Di, K., Du, J., Peng, X., Yang, H., Pan, S., Tsang, I., Liu, Y., Qin, Z., and Goh, R. SC2Net: Sparse LSTMs for sparse coding. In AAAI, 2018.
Appendix A Additional discussion on the difference between LSALSA and LISTA
The recursive formula in LISTA is given as
$$z^{(t+1)} = h_{\theta}\big(W x + S z^{(t)}\big),$$
where $h_{\theta}$ is the soft-thresholding non-linearity and $W$ and $S$ are learned matrices. The recursive formula in LSALSA is derived below. We start with the output of the non-linearity from Algorithm 1:
$$u^{(t+1)} = \mathrm{soft}\big(z^{(t+1)} - d^{(t)},\, \alpha/\mu\big),$$
where $z^{(t+1)} = S\big(D^T x + \mu(u^{(t)} + d^{(t)})\big)$, $d^{(t)} = d^{(t-1)} - (z^{(t)} - u^{(t)})$, and $u^{(t)}$ is the output of the network non-linearity. Clearly, in the case of LSALSA, the non-linearity output has a dependence on all of the previous layers' outputs. This dependence comes from the auxiliary variable $d$, i.e. the Lagrangian multipliers term. LISTA's output only depends directly on the previous layer.
Appendix B Dictionary Learning Experiments
b.1 Dictionaries used in single dictionary experiments
b.2 Dictionaries used in MCA experiments
In the first set of MCA experiments we performed source separation on MNIST + ASIRRA images. We used two dictionaries trained independently using whole MNIST images and patches of ASIRRA images. In the second set of MCA experiments, we performed source separation on spatially added MNIST and CIFAR-10 images (more results of this experiment are shown in Section D of the Supplement). We used the same MNIST dictionary as in the MNIST + ASIRRA experiments and trained a CIFAR-10 dictionary on the whole grayscale CIFAR-10 data set. These dictionaries have 1024 atoms (complete), all normalized vectors of length 1024 reshaped to $32 \times 32$. A subset of the atoms of the dictionaries used in the MCA experiments is visualized in Figures 13 and 14.
Appendix C Additional single dictionary experiments
The single dictionary experiments on MNIST and CIFAR-10 data sets are summarized below. The code prediction errors for MNIST are captured in Figure 15 and for CIFAR are captured in Figure 16. The classification results are captured in Table 5 for MNIST and Table 6 for CIFAR.
The MNIST images were first scaled so that pixel values lie in a fixed range and then divided into non-overlapping patches (ignoring extra pixels on the edges). Only patches with a standard deviation above 0.1 were used in training and the remaining ones were discarded (as they are practically all-black). Optimal codes were computed for each vectorized patch by minimizing the objective from Equation 4 by running FISTA, giving approximately sparse optimal codes.
In the CIFAR-10 experiments, the natural images were first converted to grayscale, scaled to a fixed range of values, and broken down into non-overlapping patches. Optimal codes were then computed on these patches in a similar fashion as described above for the MNIST data set.
Appendix D Additional MCA experiments
d.1 Mnist + Cifar
The MNIST + CIFAR-10 MCA experimental results are summarized here. We combined whole MNIST digit images with grayscale CIFAR-10 images and performed source separation on them. Code prediction error curves, with respect to both the number of iterations and the wallclock time used to make the prediction, are presented in Figure 17. The classification results are captured in Table 7 for the MNIST codes and Table 8 for the CIFAR-10 codes.
Appendix E Additional plots: MNIST+ASIRRA
Figure 18 shows an extended version of Figure 7. In Figure 19 we illustrate each method's sparsity/accuracy trade-off on the MNIST test data set while varying $T$. LSALSA retains both the highest sparsity and accuracy levels, even for small $T$, among all the methods.