Learned Interpretable Residual Extragradient ISTA for Sparse Coding

by Lin Kong, et al.

Recently, the study of the learned iterative shrinkage thresholding algorithm (LISTA) has attracted increasing attention. A large number of experiments, as well as some theory, have demonstrated the high efficiency of LISTA for solving sparse coding problems. However, all existing LISTA methods are serially connected. To address this issue, we propose a novel extragradient-based LISTA (ELISTA), which has a residual structure and theoretical guarantees. In particular, our algorithm can also provide, to a certain extent, interpretability for Res-Net. From a theoretical perspective, we prove that our method attains linear convergence. In practice, extensive empirical results verify the advantages of our method.




1 Introduction

In this paper, we mainly consider the following problem: recover a sparse vector $x^* \in \mathbb{R}^n$ from a noisy observation vector $b \in \mathbb{R}^m$ (e.g., with additive white Gaussian noise $\varepsilon$):

$$b = A x^* + \varepsilon, \qquad (1)$$

where $A \in \mathbb{R}^{m \times n}$ ($m \leq n$ in general) is the dictionary matrix. To solve Problem (1), which is generally ill-posed, some prior information such as sparsity or low-rankness needs to be incorporated; here, $x^*$ is assumed to be sparse. A common way to estimate $x^*$ is to solve the Lasso problem (Tibshirani, 1996):

$$\min_{x} \ \tfrac{1}{2}\|b - Ax\|_2^2 + \lambda \|x\|_1, \qquad (2)$$

where $\lambda > 0$ is a regularization parameter. Many methods have been proposed to solve the sparse coding problem, such as least angle regression (Efron et al., 2004), approximate message passing (AMP) (Donoho et al., 2009) and the iterative shrinkage thresholding algorithm (ISTA) (Daubechies et al., 2004; Blumensath and Davies, 2008). For solving Problem (2), the update rule of ISTA is

$$x^{k+1} = \mathrm{ST}_{\lambda t}\big(x^k + t A^{\top}(b - A x^k)\big),$$

where $\mathrm{ST}_{\theta}(x) = \mathrm{sign}(x)\max(|x| - \theta, 0)$ is the soft-thresholding (ST) operator with threshold $\theta$, and $t$ is the step size, which should be taken in $(0, 1/\sigma_{\max}^2(A)]$, where $\sigma_{\max}(A)$ is the largest singular value of the dictionary matrix.

Beck and Teboulle (2009) proved that ISTA can only achieve a sublinear convergence rate.
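As a concrete illustration, the ISTA iteration can be sketched in a few lines of NumPy. This is a minimal sketch, not the paper's implementation; the dictionary, sparse signal, regularization parameter and iteration count below are illustrative choices:

```python
import numpy as np

def soft_threshold(x, theta):
    """Soft-thresholding operator: ST_theta(x) = sign(x) * max(|x| - theta, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def ista(A, b, lam, num_iters=2000):
    """Plain ISTA for the Lasso: min_x 0.5*||b - A x||_2^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant: sigma_max(A)^2
    t = 1.0 / L                        # step size in (0, 1/sigma_max(A)^2]
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        x = soft_threshold(x + t * A.T @ (b - A @ x), lam * t)
    return x

# Tiny illustrative example: recover a 2-sparse vector from noiseless measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((25, 50)) / np.sqrt(25)
x_true = np.zeros(50)
x_true[[3, 17]] = [1.5, -2.0]
b = A @ x_true
x_hat = ista(A, b, lam=0.01)
```

The step size $1/L$ with $L = \sigma_{\max}^2(A)$ is the usual safe choice; with a small $\lambda$ and noiseless measurements, the estimate lands close to the true sparse vector after enough iterations.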

Recently, a class of methods that unfold traditional iterative algorithms into deep neural networks (DNNs), called Algorithm Unfolding (Monga et al., 2021) or Deep Unfolding (Hershey et al., 2014), has been proposed and has gradually attracted more and more attention. This idea was first proposed by Gregor and LeCun (2010), who unfolded ISTA, viewed it as a recurrent neural network (RNN), and proposed a learning-based model named Learned ISTA (LISTA):

$$x^{k+1} = \mathrm{ST}_{\theta^k}\big(W_1^k b + W_2^k x^k\big), \qquad (3)$$

where $W_1^k$, $W_2^k$ and $\theta^k$ are initialized as $\tfrac{1}{L}A^{\top}$, $I - \tfrac{1}{L}A^{\top}A$ and $\tfrac{\lambda}{L}$ (with $L = \sigma_{\max}^2(A)$), respectively. All the parameters are learnable and data-driven. Many empirical and theoretical results, as in (Aberdam et al., 2020; Giryes et al., 2018), have shown that LISTA can recover $x^*$ from $b$ more accurately, using one to two orders of magnitude fewer iterations than the original ISTA. Moreover, the linear convergence of a variant of LISTA (i.e., LISTA-CPSS) was proved for the first time in (Chen et al., 2018b). In addition, these networks are more interpretable than general networks and can thus provide some explanations for deep networks. Indeed, a deep unfolding algorithm (actually a network) is believed to incorporate the priors of models and algorithms from traditional optimization while retaining the learning capacity a network obtains from training data.
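In code, one unfolded LISTA layer is simply a learned affine map followed by soft-thresholding. The sketch below shows a forward pass through the layers with the parameters frozen at their ISTA-equivalent initialization (i.e., before any training); the shapes and values are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def soft_threshold(x, theta):
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def init_lista_params(A, lam, num_layers=16):
    """ISTA-equivalent initialization: W1 = A^T / L, W2 = I - A^T A / L, theta = lam / L."""
    L = np.linalg.norm(A, 2) ** 2
    W1 = A.T / L
    W2 = np.eye(A.shape[1]) - A.T @ A / L
    return [(W1, W2, lam / L) for _ in range(num_layers)]

def lista_forward(b, params, n):
    """Forward pass: x^{k+1} = ST_{theta_k}(W1_k b + W2_k x^k), starting from x^0 = 0."""
    x = np.zeros(n)
    for W1, W2, theta in params:
        x = soft_threshold(W1 @ b + W2 @ x, theta)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((25, 50)) / np.sqrt(25)
x_true = np.zeros(50)
x_true[[3, 17]] = [1.5, -2.0]
b = A @ x_true
x16 = lista_forward(b, init_lista_params(A, lam=0.01), n=50)
```

At this initialization the 16-layer forward pass reproduces 16 ISTA iterations; training the per-layer $W_1^k$, $W_2^k$, $\theta^k$ on data is what gives LISTA its order-of-magnitude speedup.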

Due to the advantages of the idea of algorithm unfolding, many works, such as (Wang et al., 2016; Sprechmann et al., 2015; Ito et al., 2019; Borgerding et al., 2017; Sreter and Giryes, 2018), inspired by (Gregor and LeCun, 2010), have been proposed and successfully applied in various fields. Moreover, a series of studies on LISTA has attracted increasing attention and inspired many subsequent works in different aspects, including learning-based optimization (Xie et al., 2019; Sun et al., 2016), the design of DNNs (Metzler et al., 2017; Zhang and Ghanem, 2018; Zhou et al., 2018; Chen et al., 2020; Rick Chang et al., 2017; Zhang et al., 2020; Simon and Elad, 2019) and the interpretation of DNNs (Zarka et al., 2020; Papyan et al., 2017; Aberdam et al., 2019; Sulam et al., 2018, 2019).

There are also many works, such as (Xin et al., 2016; Giryes et al., 2018; Moreau and Bruna, 2017; Chen et al., 2018b; Liu et al., 2019; Wu et al., 2020; Ablin et al., 2019), that discuss and understand LISTA and its variants from a theoretical perspective. Among them, Chen et al. (2018b) proved that there is a coupling relationship between the two learnable matrices of each layer of LISTA, thereby reducing the number of learnable parameters; they also proved the linear convergence of LISTA for the first time. Later, many subsequent works (Liu et al., 2019; Wu et al., 2020; Ablin et al., 2019) further improved LISTA with different methods. For instance, Liu et al. (2019) simplified the different matrix parameters of each layer of the network to the product of a matrix shared across the network and a different scalar parameter per layer, and proved that matrix parameters obtained by solving an optimization problem achieve the same performance as learned matrices. Wu et al. (2020) observed that the values of the elements in the estimate obtained by LISTA may be lower than expected and, inspired by the gated recurrent unit (GRU) (Cho et al., 2014; Chung et al., 2015), proposed GLISTA, which introduces gate mechanisms into LISTA-related algorithms. We also made improvements to LISTA in (Li et al., 2021); this paper is a condensed version of that work.

However, we find that all the existing variants of LISTA with convergence guarantees are serial; the residual network (Res-Net) (He et al., 2016), which is influential in deep learning, has not been introduced into LISTA. An important reason is that changing the original structure of LISTA would destroy its excellent mathematical interpretability. Can we obtain a new LISTA with an interpretable residual structure that still has a convergence guarantee?

Our Main Contributions: The main contributions of this paper are listed as follows:

We propose a novel unfolding network, named Extragradient based LISTA (ELISTA), a variant of LISTA with a residual structure. By introducing the idea of extragradient into LISTA and establishing the relationship with Res-Net, ELISTA improves the network structure for solving sparse coding problems. To the best of our knowledge, this is the first residual-structure LISTA with theoretical guarantees.

We prove the linear convergence of ELISTA. Moreover, we conduct extensive experiments to verify the effectiveness of our algorithm. The experimental results show that our ELISTA is superior to the state-of-the-art methods.

2 Extragradient Based LISTA

In this section, we first introduce the technique of extragradient into LISTA and propose an innovative algorithm, named Extragradient based LISTA (ELISTA), and depict it in detail. Moreover, we establish the relationship between ELISTA and Res-Net, which is one of the reasons why ELISTA is advantageous.

2.1 Extragradient Method

We note that iterative algorithms such as ISTA can actually be treated as a proximal gradient descent method, a first-order optimization algorithm, for special objective functions. Thus, we want to introduce the idea of extragradient into the related iterative algorithms. The extragradient method was first proposed by Korpelevich (1976) and is a classical method for variational inequality problems. For optimization problems, the idea of extragradient was first used in (Nguyen et al., 2018), which proposed an extended extragradient method (EEG) by combining this idea with some first-order descent methods. In the $k$-th iteration of EEG, it first calculates the gradient at $x^k$ and updates along this gradient to get an intermediate point $u^k$; it then calculates the gradient at $u^k$ and updates the original point $x^k$ along the gradient at the intermediate point to obtain $x^{k+1}$. This is the key idea of extragradient. Intuitively, the additional step in each iteration of EEG allows us to examine the geometry of the problem and account for its curvature information, which is one of the most important bottlenecks for first-order methods. Thus, by using the idea of extragradient, we can get a better result after each iteration. The update rules of EEG for Problem (2) can be written as follows:

$$u^k = \mathrm{ST}_{\lambda t_k}\big(x^k + t_k A^{\top}(b - A x^k)\big), \qquad x^{k+1} = \mathrm{ST}_{\lambda s_k}\big(x^k + s_k A^{\top}(b - A u^k)\big), \qquad (4)$$

where $t_k$ and $s_k$ are step sizes.
This form of EEG is similar to ISTA, and thus it can be regarded as a generalization of ISTA.
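A single EEG iteration for Problem (2) can be sketched as follows. The step sizes and problem data are illustrative, and the update form assumes the standard extragradient prediction/correction structure described above:

```python
import numpy as np

def soft_threshold(x, theta):
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def eeg_step(x, A, b, lam, t, s):
    """One extragradient iteration for the Lasso: a prediction step from x to the
    intermediate point u, then a correction of x using the gradient evaluated at u."""
    u = soft_threshold(x + t * A.T @ (b - A @ x), lam * t)        # prediction
    return soft_threshold(x + s * A.T @ (b - A @ u), lam * s)    # correction

rng = np.random.default_rng(0)
A = rng.standard_normal((25, 50)) / np.sqrt(25)
x_true = np.zeros(50)
x_true[[3, 17]] = [1.5, -2.0]
b = A @ x_true
L = np.linalg.norm(A, 2) ** 2
x = np.zeros(50)
for _ in range(1000):
    x = eeg_step(x, A, b, lam=0.01, t=0.9 / L, s=0.9 / L)
```

Note that a fixed point of ISTA is also a fixed point of this two-step scheme, so EEG converges to the same Lasso solution while taking curvature into account via the extra gradient evaluation.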

2.2 Extragradient Based LISTA and the Relationship with Res-Net

(a) Res-Net: a building block.
(b) ELISTA: a building layer.
Figure 1: Comparison of the network structures of Res-Net (He et al., 2016) and ELISTA (ours).

In order to speed up the convergence of EEG, we combine the algorithm with deep networks and regard the matrices $W_1^k$, $W_2^k$ and the two thresholds $\theta_1^k$, $\theta_2^k$ of the two steps in (4) as learnable parameters, and get the following update rules:

$$u^k = \mathrm{ST}_{\theta_1^k}\big(x^k + (W_1^k)^{\top}(b - A x^k)\big), \qquad x^{k+1} = \mathrm{ST}_{\theta_2^k}\big(x^k + (W_2^k)^{\top}(b - A u^k)\big). \qquad (5)$$

However, since the above scheme has two different matrices $W_1^k$ and $W_2^k$ to learn in each layer, the number of network parameters greatly increases and the training of the network slows down significantly. Therefore, to address this issue and further establish the connection between the two steps of (5), we convert $W_1^k$ and $W_2^k$ into $\alpha_k W_k$ and $\beta_k W_k$, respectively, where $\alpha_k$ and $\beta_k$ are two scalars to learn. Then, inspired by (Liu et al., 2019), we change the $W_k$ of each layer into the same $W$ and get a tied algorithm, which significantly reduces the number of learnable parameters. Finally, we obtain the following update rules for our Extragradient Based LISTA (ELISTA):

$$u^k = \mathrm{ST}_{\theta_1^k}\big(x^k + \alpha_k W^{\top}(b - A x^k)\big), \qquad x^{k+1} = \mathrm{ST}_{\theta_2^k}\big(x^k + \beta_k W^{\top}(b - A u^k)\big). \qquad (6)$$
According to (6), we can draw the network structure of ELISTA, as shown in Figure 1. Through observation and comparison, we find that the network structure of ELISTA corresponds to that of Res-Net. Since $b$ is already given, we can regard the term involving $b$ as a bias. Thus, from Figure 1, we can see that the structure of the network obtained by ELISTA is the same as that of Res-Net, including the weight layers, the activation function and the identity connection. It is well known that Res-Net obtains better performance by improving the network structure; it is therefore meaningful to discuss and study the internal mathematical mechanism of Res-Net. On the one hand, to some extent, our algorithm may be regarded as a mathematical explanation of the superiority of Res-Net. On the other hand, the connection between ELISTA and Res-Net may help explain why our algorithm performs better than existing methods. Besides, there is a lot of work using ordinary differential equations (ODEs) to interpret networks by considering an ODE as a continuous equivalent of the residual network (Res-Net) (Chen et al., 2018a). However, ODEs can only explain networks with linear connection blocks, while ours are nonlinear; on the other hand, the form of our blocks is less general than that of ODEs.
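As a sketch, one (untrained) ELISTA layer of the form (6) looks as follows. The shared matrix W, the per-layer scalars and the thresholds would be learned in practice; the ISTA-like initialization, shapes and data here are purely illustrative:

```python
import numpy as np

def soft_threshold(x, theta):
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def elista_layer(x, b, A, W, alpha, beta, theta1, theta2):
    """One ELISTA layer: an extragradient half-step to the intermediate point u,
    then the residual-style update of x. W is shared across all layers, while
    alpha, beta, theta1 and theta2 are per-layer learnable scalars."""
    u = soft_threshold(x + alpha * W.T @ (b - A @ x), theta1)
    return soft_threshold(x + beta * W.T @ (b - A @ u), theta2)

rng = np.random.default_rng(0)
A = rng.standard_normal((25, 50)) / np.sqrt(25)
x_true = np.zeros(50)
x_true[[3, 17]] = [1.5, -2.0]
b = A @ x_true
L = np.linalg.norm(A, 2) ** 2
W = A / L                     # ISTA-like initialization of the shared weight
x = np.zeros(50)
for _ in range(16):           # a 16-layer (untrained) network
    x = elista_layer(x, b, A, W, alpha=0.9, beta=0.9,
                     theta1=0.9 * 0.01 / L, theta2=0.9 * 0.01 / L)
```

Since each layer adds a correction on top of the incoming $x$, the skip-connection structure in Figure 1 falls out of the update rule directly, which is the Res-Net correspondence discussed above.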

Table 1: Comparison of the number of parameters to learn in different methods.

Moreover, the comparison of the number of parameters of the network corresponding to different algorithms is shown in Table 1, where LAMP (Borgerding et al., 2017) is an algorithm to transform AMP (Donoho et al., 2009) into a neural network inspired by (Gregor and LeCun, 2010).

3 Convergence Analysis

In this section, we provide the convergence analysis of our algorithm. We first give a basic assumption and then provide the convergence property of ELISTA. We note that our analysis, like Theorems 3 and 4 of (Wu et al., 2020), is proved under the existence of "false positives", while the theoretical analysis of (Chen et al., 2018b; Liu et al., 2019) was provided under the assumption of no "false positives", which is difficult to satisfy in reality.

Assumption 1 (Basic assumption).

The signal $x^*$ is sampled from the following set:

$$\mathcal{X}(B, s) := \{\, x^* \mid |x_i^*| \leq B \ \forall i, \ \|x^*\|_0 \leq s \,\}.$$

In other words, $x^*$ is bounded and $s$-sparse ($s \geq 2$). Furthermore, we assume $\varepsilon = 0$.

This assumption is basic for this class of algorithms; almost all the related algorithms need to satisfy it, e.g., (Liu et al., 2019; Wu et al., 2020).

Based on the assumption, we can get the linear convergence of ELISTA, which can be given by the following theorem.

Theorem 1 (Linear Convergence for ELISTA).

If Assumption 1 holds, the thresholds $\theta_1^k$ and $\theta_2^k$ are selected properly, and $s$ is small enough, then for the sequences $\{x^k\}$ generated by ELISTA, "false positives" are allowed to exist, and the recovery error satisfies a bound of the form

$$\|x^k - x^*\|_2 \leq C \exp(-ck)$$

for some constants $C$ and $c > 0$.

The definitions of the quantities involved can be found in Definition 1 of (Liu et al., 2019); from Lemma 1 of (Chen et al., 2018b), we know that the required generalized mutual coherence is attainable. The remaining definitions can be given by referring to Definition 2 of (Wu et al., 2020). Theorem 1 shows that our ELISTA attains linear convergence. We note that the detailed proof of Theorem 1 is omitted here due to page limits; we will provide it in our future work.

4 Experimental Results

In this section, we evaluate our ELISTA in terms of sparse representation performance and 3D geometry recovery via photometric stereo. All the experimental settings are the same as in the previous works (Chen et al., 2018b; Liu et al., 2019; Wu et al., 2020). However, the performance of support selection (SS) (Chen et al., 2018b) is greatly affected by its hyper-parameters, and setting them requires knowing the sparsity of $x^*$ in advance, which is difficult to obtain in real situations. Thus, in order to compare the impact of the network itself on performance more fairly, none of the networks use SS. All training follows (Chen et al., 2018b). For all the methods, the learnable step-size scalars are initialized as 1.0, and the thresholds are initialized following (Chen et al., 2018b). All the results are obtained by running ten times and averaging.

4.1 Sparse Representation Performance

In this subsection, we compare our ELISTA with the state-of-the-art methods LISTA, LAMP and GLISTA. We set $m = 250$ and $n = 500$, use 16-layer networks, and train the networks with two different noise levels, SNR (Signal-to-Noise Ratio) = 30 and $\infty$ (noiseless), and three ill-conditioned matrices with condition numbers $\kappa$ = 5, 50 and 500. For the detailed data generation methods, please see (Li et al., 2021).

            LISTA     LAMP      GLISTA    ELISTA (ours)
κ, SNR      -38.658   -44.967   -65.569   -83.997
κ, SNR      -37.471   -46.385   -63.523   -82.848
κ, SNR      -31.845   -43.097   -57.542   -77.865
κ, SNR      -23.593   -25.045   -32.757   -32.832

Table 2: Comparison of the NMSE (dB) performance of different algorithms under different $\kappa$ and SNR (lower is better).

Table 2 shows that our method clearly outperforms the compared methods in the noiseless case. In particular, the NMSE of our method (in dB) is almost twice that of LISTA in magnitude. In the presence of noise, our method also achieves state-of-the-art accuracy.
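For reference, NMSE in decibels, as reported in Table 2, is conventionally computed as follows (we assume the standard definition used in the LISTA literature):

```python
import numpy as np

def nmse_db(x_hat, x_true):
    """Normalized mean squared error in dB: 10*log10(||x_hat - x_true||^2 / ||x_true||^2)."""
    return 10.0 * np.log10(np.sum((x_hat - x_true) ** 2) / np.sum(x_true ** 2))

# Illustrative values: a very accurate estimate yields a large negative NMSE.
x_true = np.array([0.0, 1.0, 0.0, -2.0])
x_hat = np.array([0.0, 1.001, 0.0, -2.001])
err = nmse_db(x_hat, x_true)
```

More negative values mean better recovery, which is why the entries in Table 2 are all negative and the best column is the most negative.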

4.2 3D Geometry Recovery via Photometric Stereo

        LISTA     GLISTA    ELISTA (ours)
35      0.06836   0.06249   0.04724
25      0.09664   0.10033   0.06597
15      0.69334   0.63967   0.53269

Table 3: The mean angular error of 3D geometry recovery via photometric stereo.

In this subsection, we compare our ELISTA with the state-of-the-art methods LISTA and GLISTA on 3D geometry recovery via photometric stereo, a powerful technique for recovering high-resolution surface normals of a 3D scene from the appearance changes of 2D images under different lighting (Woodham, 1980). In practice, however, the estimation process is often corrupted by non-Lambertian effects, such as highlights, shadows, or image noise. This problem can be addressed by decomposing the observation matrix of the images superimposed under different lighting conditions into an ideal Lambertian component and a sparse error term (Wu et al., 2010; Ikehata et al., 2012), i.e., $o = \rho L n + e$, where $o$ denotes the resulting measurements, $n$ denotes the true surface normal, each row of $L$ defines a lighting direction, $\rho$ is the diffuse albedo, acting here as a scalar multiplier, and $e$ is an unknown sparse vector. By multiplying both sides of the model by $P$, the projector onto the orthogonal complement of the column space of $L$, we get $P o = P e$. Letting $A$ be $P$ and $b$ be $P o$, $e$ can be obtained by solving the sparse coding problem and then used to recover $n$. The main experimental settings follow (Xin et al., 2016; Wu et al., 2020; He et al., 2017). Tests are performed using the 32-bit HDR gray-scale images of the object "Bunny" as in (Xin et al., 2016), where 40% of the elements of the sparse noise are non-zero. From Table 3, we can see that our method performs much better than LISTA and GLISTA.
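The projection trick described above, which removes the Lambertian component so that only the sparse error term remains, can be checked on a small synthetic example. The shapes, lighting matrix and the use of a pseudo-inverse to build the projector are our illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 32                                    # number of images / lighting directions
Lmat = rng.standard_normal((m, 3))        # each row: one lighting direction
n_true = np.array([0.0, 0.6, 0.8])        # true unit surface normal
rho = 0.9                                 # diffuse albedo (scalar multiplier)
e = np.zeros(m)
e[[4, 11]] = [0.7, -0.5]                  # sparse non-Lambertian errors
o = rho * Lmat @ n_true + e               # observations at one pixel

# Projector onto the orthogonal complement of the column space of Lmat:
P = np.eye(m) - Lmat @ np.linalg.pinv(Lmat)
residual = P @ o                          # P o = P e: the Lambertian term vanishes
```

With $A = P$ and $b = Po$, the sparse vector $e$ can be recovered by any of the sparse-coding solvers above (ISTA, LISTA, ELISTA), after which the Lambertian part $o - e$ yields the surface normal.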

5 Conclusions

We proposed a novel extragradient-based learned iterative shrinkage thresholding algorithm (called ELISTA) with an interpretable residual structure. Moreover, we proved that ELISTA achieves linear convergence. Extensive empirical results verified the high efficiency of our method. This could have both theoretical and practical impacts on the relationship between new neural network architectures and advanced algorithms, and potentially deepen our understanding of the interpretability of deep learning models. One limitation of this paper is that we use the same assumption as in the previous works (Chen et al., 2018b; Liu et al., 2019; Wu et al., 2020), namely that the sparsity $s$ of $x^*$ is small enough. Removing this common assumption is left for future work.


Acknowledgments

This work was supported by the National Natural Science Foundation of China (Nos. 61876221, 61876220 and 61976164), the Project Supported by the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (No. 61621005), the Major Research Plan of the National Natural Science Foundation of China (Nos. 91438201 and 91438103), the Program for Cheung Kong Scholars and Innovative Research Team in University (No. IRT_15R53), the Fund for Foreign Scholars in University Research and Teaching Programs (the 111 Project) (No. B07048), and the National Science Basic Research Plan in Shaanxi Province of China (No. 2020JM-194).


References

  • A. Aberdam, A. Golts, and M. Elad (2020) Ada-LISTA: learned solvers adaptive to varying models. arXiv preprint arXiv:2001.08456. Cited by: §1.
  • A. Aberdam, J. Sulam, and M. Elad (2019) Multi-layer sparse coding: the holistic way. SIAM Journal on Mathematics of Data Science 1 (1), pp. 46–77. Cited by: §1.
  • P. Ablin, T. Moreau, M. Massias, and A. Gramfort (2019) Learning step sizes for unfolded sparse coding. In Advances in Neural Information Processing Systems, pp. 13100–13110. Cited by: §1.
  • A. Beck and M. Teboulle (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences 2 (1), pp. 183–202. Cited by: §1.
  • T. Blumensath and M. E. Davies (2008) Iterative thresholding for sparse approximations. Journal of Fourier analysis and Applications 14 (5-6), pp. 629–654. Cited by: §1.
  • M. Borgerding, P. Schniter, and S. Rangan (2017) AMP-inspired deep networks for sparse linear inverse problems. IEEE Transactions on Signal Processing 65 (16), pp. 4293–4308. Cited by: §1, §2.2.
  • R. T. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud (2018a) Neural ordinary differential equations. In Advances in Neural Information Processing Systems, pp. 6571–6583. Cited by: §2.2.
  • X. Chen, J. Liu, Z. Wang, and W. Yin (2018b) Theoretical linear convergence of unfolded ISTA and its practical weights and thresholds. In Advances in Neural Information Processing Systems, pp. 9061–9071. Cited by: §1, §1, §3, §3, §4, §5.
  • X. Chen, Y. Li, R. Umarov, X. Gao, and L. Song (2020) RNA secondary structure prediction by learning unrolled algorithms. In Proceedings of the International Conference on Learning Representations. Cited by: §1.
  • K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078. Cited by: §1.
  • J. Chung, C. Gulcehre, K. Cho, and Y. Bengio (2015) Gated feedback recurrent neural networks. In International Conference on Machine Learning, pp. 2067–2075. Cited by: §1.
  • I. Daubechies, M. Defrise, and C. De Mol (2004) An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences 57 (11), pp. 1413–1457. Cited by: §1.
  • D. L. Donoho, A. Maleki, and A. Montanari (2009) Message-passing algorithms for compressed sensing. Proceedings of the National Academy of Sciences 106 (45), pp. 18914–18919. Cited by: §1, §2.2.
  • B. Efron, T. Hastie, I. Johnstone, R. Tibshirani, et al. (2004) Least angle regression. Annals of Statistics 32 (2), pp. 407–499. Cited by: §1.
  • R. Giryes, Y. C. Eldar, A. M. Bronstein, and G. Sapiro (2018) Tradeoffs between convergence speed and reconstruction accuracy in inverse problems. IEEE Transactions on Signal Processing 66 (7), pp. 1676–1690. Cited by: §1, §1.
  • K. Gregor and Y. LeCun (2010) Learning fast approximations of sparse coding. In Proceedings of the 27th International Conference on Machine Learning, pp. 399–406. Cited by: §1, §1, §2.2.
  • H. He, B. Xin, S. Ikehata, and D. Wipf (2017) From Bayesian sparsity to gated recurrent nets. In Advances in Neural Information Processing Systems, pp. 5554–5564. Cited by: §4.2.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. Cited by: §1, Figure 1.
  • J. R. Hershey, J. L. Roux, and F. Weninger (2014) Deep unfolding: model-based inspiration of novel deep architectures. arXiv preprint arXiv:1409.2574. Cited by: §1.
  • S. Ikehata, D. Wipf, Y. Matsushita, and K. Aizawa (2012) Robust photometric stereo using sparse regression. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 318–325. Cited by: §4.2.
  • D. Ito, S. Takabe, and T. Wadayama (2019) Trainable ISTA for sparse signal recovery. IEEE Transactions on Signal Processing 67 (12), pp. 3113–3125. Cited by: §1.
  • G. Korpelevich (1976) The extragradient method for finding saddle points and other problems. Matecon 12, pp. 747–756. Cited by: §2.1.
  • Y. Li, L. Kong, F. Shang, Y. Liu, H. Liu, and Z. Lin (2021) Learned extragradient ISTA with interpretable residual structures for sparse coding. In Proceedings of the AAAI Conference on Artificial Intelligence. Cited by: §1, §4.1.
  • J. Liu, X. Chen, Z. Wang, and W. Yin (2019) ALISTA: analytic weights are as good as learned weights in LISTA. In Proceedings of the International Conference on Learning Representations, Cited by: §1, §2.2, §3, §3, §3, §4, §5.
  • C. Metzler, A. Mousavi, and R. Baraniuk (2017) Learned D-AMP: principled neural network based compressive image recovery. In Advances in Neural Information Processing Systems, pp. 1772–1783. Cited by: §1.
  • V. Monga, Y. Li, and Y. C. Eldar (2021) Algorithm unrolling: interpretable, efficient deep learning for signal and image processing. IEEE Signal Processing Magazine 38 (2), pp. 18–44. Cited by: §1.
  • T. Moreau and J. Bruna (2017) Understanding trainable sparse coding via matrix factorization. In Proceedings of the International Conference on Learning Representations, Cited by: §1.
  • T. P. Nguyen, E. Pauwels, E. Richard, and B. W. Suter (2018) Extragradient method in optimization: convergence and complexity. Journal of Optimization Theory and Applications 176 (1), pp. 137–162. Cited by: §2.1.
  • V. Papyan, Y. Romano, and M. Elad (2017) Convolutional neural networks analyzed via convolutional sparse coding. Journal of Machine Learning Research 18 (1), pp. 2887–2938. Cited by: §1.
  • J.H. Rick Chang, C. Li, B. Poczos, B. Vijaya Kumar, and A. C. Sankaranarayanan (2017) One network to solve them all–solving linear inverse problems using deep projection models. In Proceedings of the IEEE International Conference on Computer Vision, pp. 5888–5897. Cited by: §1.
  • D. Simon and M. Elad (2019) Rethinking the CSC model for natural images. In Advances in Neural Information Processing Systems, pp. 2271–2281. Cited by: §1.
  • P. Sprechmann, A. M. Bronstein, and G. Sapiro (2015) Learning efficient sparse and low rank models. IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (9), pp. 1821–1833. Cited by: §1.
  • H. Sreter and R. Giryes (2018) Learned convolutional sparse coding. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2191–2195. Cited by: §1.
  • J. Sulam, A. Aberdam, A. Beck, and M. Elad (2019) On multi-layer basis pursuit, efficient algorithms and convolutional neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence. Cited by: §1.
  • J. Sulam, V. Papyan, Y. Romano, and M. Elad (2018) Multilayer convolutional sparse modeling: pursuit and dictionary learning. IEEE Transactions on Signal Processing 66 (15), pp. 4090–4104. Cited by: §1.
  • J. Sun, H. Li, Z. Xu, et al. (2016) Deep ADMM-Net for compressive sensing MRI. In Advances in Neural Information Processing Systems, pp. 10–18. Cited by: §1.
  • R. Tibshirani (1996) Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58 (1), pp. 267–288. Cited by: §1.
  • Z. Wang, Q. Ling, and T. S. Huang (2016) Learning deep encoders. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. Cited by: §1.
  • R. J. Woodham (1980) Photometric method for determining surface orientation from multiple images. Optical Engineering 19 (1), pp. 139–144. Cited by: §4.2.
  • K. Wu, Y. Guo, Z. Li, and C. Zhang (2020) Sparse coding with gated learned ISTA. In Proceedings of the International Conference on Learning Representations. Cited by: §1, §3, §3, §3, §4.2, §4, §5.
  • L. Wu, A. Ganesh, B. Shi, Y. Matsushita, Y. Wang, and Y. Ma (2010) Robust photometric stereo via low-rank matrix completion and recovery. In Asian Conference on Computer Vision, pp. 703–717. Cited by: §4.2.
  • X. Xie, J. Wu, Z. Zhong, G. Liu, and Z. Lin (2019) Differentiable linearized ADMM. In Proceedings of the 36th International Conference on Machine Learning. Cited by: §1.
  • B. Xin, Y. Wang, W. Gao, D. Wipf, and B. Wang (2016) Maximal sparsity with deep networks?. In Advances in Neural Information Processing Systems, pp. 4340–4348. Cited by: §1, §4.2.
  • J. Zarka, L. Thiry, T. Angles, and S. Mallat (2020) Deep network classification by scattering and homotopy dictionary learning. In Proceedings of the International Conference on Learning Representations, Cited by: §1.
  • J. Zhang and B. Ghanem (2018) ISTA-Net: interpretable optimization-inspired deep network for image compressive sensing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1828–1837. Cited by: §1.
  • Q. Zhang, X. Ye, H. Liu, and Y. Chen (2020) A novel learnable gradient descent type algorithm for non-convex non-smooth inverse problems. arXiv preprint arXiv:2003.06748. Cited by: §1.
  • J. T. Zhou, K. Di, J. Du, X. Peng, H. Yang, S. J. Pan, I. W. Tsang, Y. Liu, Z. Qin, and R. S. M. Goh (2018) SC2Net: sparse LSTMs for sparse coding. In Proceedings of Thirty-Second AAAI Conference on Artificial Intelligence, Cited by: §1.