Unfolding Neural Networks for Compressive Multichannel Blind Deconvolution

10/22/2020 ∙ by Bahareh Tolooshams, et al. ∙ Weizmann Institute of Science ∙ Harvard University

We propose a learned-structured unfolding neural network for the problem of compressive sparse multichannel blind-deconvolution. In this problem, each channel's measurements are given as the convolution of a common source signal with a sparse filter. Unlike prior works, where compression is achieved either through random projections or by applying a fixed structured compression matrix, this paper proposes to learn the compression matrix from data. Given the full measurements, the proposed network is trained in an unsupervised fashion to learn the source and estimate sparse filters. Then, given the estimated source, we learn a structured compression operator while optimizing for signal reconstruction and sparse filter recovery. The efficient structure of the compression allows for practical hardware implementation. The proposed neural network is an autoencoder constructed based on an unfolding approach: upon training, the encoder maps the compressed measurements into an estimate of the sparse filters using the compression operator and the source, and the linear convolutional decoder reconstructs the full measurements. We demonstrate that our method is superior to classical structured compressive sparse multichannel blind-deconvolution methods in terms of accuracy and speed of sparse filter recovery.






1 Introduction

In a multi-receiver radar system, a transmitted source signal is reflected from sparsely located targets and measured at the receivers. The received signals are modeled as convolutions of the source signal and sparse filters that depend on the targets' locations relative to the receivers [3, 4]. Often, the source signal is unknown at the receiver due to distortion during transmission. With an unknown source, the problem of determining the sparse filters is known as sparse multichannel blind-deconvolution (S-MBD). This model is ubiquitous in many other applications such as seismic signal processing [23], room impulse response modeling [24], sonar imaging [7], and ultrasound imaging [27, 28]. In these applications, the receivers' hardware and computational complexity depend on the number of measurements required at each receiver, or channel, to uniquely determine the sparse filters. Hence, it is desirable to compress the number of measurements on each channel.

Prior works have proposed computationally efficient and robust algorithms to recover the filters from the measurements. Examples are ℓ1-norm methods [29, 17, 6], sparse dictionary calibration [16, 22], truncated power iteration methods [19], and convolutional dictionary learning [14]. The works in [18, 29, 10] establish theoretical identifiability guarantees for the S-MBD problem. However, these methods are computationally demanding, require a series of iterations for convergence, and need access to the full measurements.

Recent works proposed model-based neural networks to address computational efficiency [26, 25], but these still require full measurements for recovery. To enable compression, Chang et al. [8] proposed an autoencoder, called RandNet, for dictionary learning. In that work, compression is achieved by projecting data into a lower-dimensional space through a data-independent, unstructured random matrix. Mulleti et al. [22] proposed a data-independent, structured compression operator that can be realized as a linear filter. Specifically, the compressed measurements are computed by applying a specific filter to the received signals, followed by truncation of the filtered measurements. Two natural questions arise: (1) can we learn a compression filter from a given set of measurements, rather than applying a fixed filter as in [22]? (2) does this data-driven approach yield better compression for a given estimation accuracy?

We propose a model-based neural network that learns a hardware-efficient, data-driven, and structured compression matrix to recover sparse filters from reduced measurements. Our approach takes inspiration from filter-based compression [22], learning compression operators [21], and the model-based compressed learning approach for S-MBD [8]. The architecture, which we call the learned (L) structured (S) compressive multichannel blind-deconvolution network (LS-MBD), learns a filter for compression, recovers the unknown source, and estimates the sparse filters. LS-MBD improves on [22, 21, 8] in the following ways. In [22], a fixed, data-independent filter is used for compression, and the reconstruction is independent of the compression. In LS-MBD, we learn a filter that enables compression while letting us estimate the sparse filters accurately. The approach is computationally efficient and results in lower reconstruction error for a given compression ratio compared with [22]. Unlike in [21], our compression operator is linear, is used recurrently in the architecture, has fewer parameters to learn, and has a computationally efficient implementation [11].

Section 2 explains the S-MBD problem formulation. In Section 3, we introduce our method, its architecture, and its training procedure. We present our results in Section 4, highlighting the superiority and efficiency of LS-MBD compared to baselines.

Figure 1: Architecture of Learned Structured Multichannel Blind-Deconvolution (LS-MBD). The encoder unfolds iterations of FISTA [5], with the momentum sequence set as in FISTA. In the first stage of training, we set the compression operator to the identity so that the network has access to the full measurements. Trainable blocks/weights and the losses for the first and second stages of training are shown in orange and blue, respectively.

2 Problem Formulation

Consider a set of N signals given as

    y^(n) = s * x^(n) = C(s) x^(n),   n = 1, …, N,     (1)

where * denotes the linear-convolution operation and the matrix C(s) is the convolution matrix constructed from the vector s. We make the following structural assumptions on the signals: (A1) the source s has length K, and (A2) each filter x^(n) has length L and is sparse, for n = 1, …, N. Due to convolution, each y^(n) has length K + L − 1, which we denote by M. In an S-MBD problem, the objective is to determine the common source signal s and the sparse filters x^(n) from the measurements y^(n).

In compressive S-MBD, the objective is to estimate the unknowns from a set of compressive measurements z^(n) = Φ(y^(n)) of length m, where m < M. The compression operator Φ maps the length-M measurements to length-m ones. This operator could be either linear or nonlinear, random or deterministic, structured or unstructured, and data-dependent or data-independent (see [11] for details).

We consider the problem of designing a linear, structured, and data-driven compression operator that enables accurate estimation of the sparse filters from the compressed measurements. Specifically, our goal is to jointly learn the source s and a structured, practically realizable, data-driven operator Φ such that the sparse filters x^(n) can be determined from the compressive measurements z^(n).

3 Learning Structured Compressive S-MBD

3.1 Compression Operator

We impose a Toeplitz structure on the compression operator Φ and denote the filter associated with the operator by h. Such compression matrices have several advantages over random projections: they facilitate computationally efficient matrix operations and can be practically implemented as a filtering operation in hardware. The compression operator performs a convolution followed by a truncation. As the operation is identical for all channels, we drop the channel-index superscript for simplicity. Consider a causal filter h of length R. The truncated convolution samples are

    z[k] = sum_{j=0}^{R-1} h[j] y[k + R - 1 - j],   k = 0, …, M - R,

which are the convolution samples where h and y have complete overlap. This choice of samples ensures that maximum information from y is retained in z. The corresponding measurement matrix Φ is an (M - R + 1) × M Toeplitz matrix.
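As a concrete illustration, the complete-overlap truncation above coincides with "valid"-mode convolution. The NumPy sketch below (toy sizes of our choosing, not the paper's) builds the equivalent Toeplitz matrix and checks that the matrix product and the filtering operation agree.

```python
import numpy as np

def compression_matrix(h, M):
    """Toeplitz matrix Phi implementing 'valid'-mode convolution with filter h.

    Phi has shape (M - R + 1, M): each row holds the reversed filter shifted
    by one sample, so Phi @ y keeps only the samples where h and y overlap
    completely.
    """
    R = len(h)
    m = M - R + 1
    Phi = np.zeros((m, M))
    for k in range(m):
        Phi[k, k:k + R] = h[::-1]
    return Phi

# Assumed toy sizes: full measurement length M = 16, filter length R = 5.
rng = np.random.default_rng(0)
M, R = 16, 5
h = rng.standard_normal(R)
h /= np.linalg.norm(h)                      # unit-norm filter, as constrained later
y = rng.standard_normal(M)

Phi = compression_matrix(h, M)
z_matrix = Phi @ y                          # compression as a matrix product
z_filter = np.convolve(h, y, mode="valid")  # the same compression as filtering

assert np.allclose(z_matrix, z_filter)
print(z_matrix.shape)  # (12,) -> m = M - R + 1 compressed samples
```

The equivalence is what makes the operator practical: in hardware one implements the filter h, while in analysis one reasons about the Toeplitz matrix Φ.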

3.2 Network Architecture

We aim to minimize the following objective:

    min over s, h, and x^(n) of  sum_{n=1}^{N} (1/2) || z^(n) - Φ C(s) x^(n) ||_2^2 + λ || x^(n) ||_1,
    subject to  ||s||_2 = 1  and  ||h||_2 = 1,

where λ is a sparsity-enforcing parameter, and the norm constraints avoid scaling ambiguity. Following an approach similar to [15, 26, 20], we construct an autoencoder whose encoder maps z^(n) into a sparse filter estimate by unfolding T iterations of a variant of the accelerated proximal gradient algorithm FISTA [5] for sparse recovery. Specifically, each unfolding layer performs the following iteration:


    x_t = S_{b_t}( w_t + α W^T (z - W w_t) ),
    w_{t+1} = x_t + ((c_t - 1) / c_{t+1}) (x_t - x_{t-1}),

where W = Φ C(s), the momentum sequence c_t is set as in FISTA, α is the unfolding step size, and S_b is the sparsity-promoting soft-thresholding operator S_b(v) = sign(v) max(|v| - b, 0) with per-layer threshold b_t = α λ_t.

One may leave the bias (threshold) in each layer unconstrained and learn it. However, theoretical analysis of unfolded ISTA [9] has proved that the optimal per-layer threshold converges to zero as the layer index goes to infinity. Hence, we set λ_t = λ γ^t with γ < 1. In this regard, γ is a scalar that can be either tuned or learned. We keep the unfolded layers tied, as this leads to better generalization in applications with limited data.
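A minimal NumPy sketch of the unfolded encoder follows: soft-thresholding plus FISTA momentum with a geometrically decaying threshold λγ^t. The matrix W, sizes, and hyperparameters are illustrative stand-ins, not the paper's settings.

```python
import numpy as np

def soft_threshold(v, b):
    """Proximal operator of b*||.||_1: shrink entries toward zero by b."""
    return np.sign(v) * np.maximum(np.abs(v) - b, 0.0)

def unfolded_encoder(z, W, T=300, lam=0.05, gamma=0.99):
    """T unfolded FISTA layers estimating a sparse code x from z ≈ W x.

    W plays the role of (compression matrix) @ (convolution matrix of the
    source); all layers share ("tie") the same W, as in the paper.
    """
    alpha = 1.0 / np.linalg.norm(W, 2) ** 2   # step size <= 1 / sigma_max(W)^2
    x_prev = np.zeros(W.shape[1])
    w = x_prev.copy()
    c_prev = 1.0
    for t in range(T):
        grad = W.T @ (W @ w - z)
        x = soft_threshold(w - alpha * grad, alpha * lam * gamma**t)
        c = (1 + np.sqrt(1 + 4 * c_prev**2)) / 2   # FISTA momentum sequence
        w = x + ((c_prev - 1) / c) * (x - x_prev)
        x_prev, c_prev = x, c
    return x_prev

# Toy example with an assumed random W and a 3-sparse ground truth.
rng = np.random.default_rng(1)
W = rng.standard_normal((40, 60)) / np.sqrt(40)
x_true = np.zeros(60)
x_true[[5, 20, 41]] = [1.0, -0.8, 1.2]
z = W @ x_true
x_hat = unfolded_encoder(z, W)
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))  # small relative error
```

In the trainable network, W is not fixed: it is assembled from the source and compression filter, which backpropagation updates across the tied layers.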

The decoder reconstructs the data using the convolution matrix C(s). We call this network, shown in Figure 1, LS-MBD. In this architecture, the source s and the compression filter h correspond to the weights of the neural network and are learned by backpropagation in two stages. This method's novelty is in the hardware-efficient design of the operator Φ, which captures data information while compressing. This architecture reduces to RandNet [8] when the compression and decoder matrices have no structure (e.g., Toeplitz) and the compression matrix is a fixed random projection.

3.3 Training Procedure

We follow a two-stage approach to learn the compression filter. In the first stage, we start with a limited set of full measurements and estimate the source and filters. In the second stage, we learn the compression filter associated with Φ given the estimated source. This two-stage training disentangles source estimation from learning of the compression. Hence, source estimation is performed only once and reused for various compression ratios (CRs). Moreover, a low CR does not affect the quality of source estimation, and the number of measurements required for filter recovery can be optimized independently.

Specifically, having access to the full measurements y^(n), we set the compression operator to the identity matrix (i.e., Φ = I), which reduces the autoencoder architecture to a variant of CRsAE [25], and learn s by minimizing the first-stage reconstruction loss. Then, given the learned source matrix C(s), we run the forward pass of the encoder to estimate the sparse filters, which we denote x̂^(n). In the second stage, for a given compression ratio, we set the decoder to the learned source and train h within the encoder by minimizing the second-stage loss.
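A single-channel NumPy sketch of the first-stage idea is given below: with an identity compression operator, it alternates sparse coding of the filter with gradient updates of the source. It uses plain ISTA in place of the full unfolded FISTA encoder, with toy sizes and illustrative hyperparameters of our choosing.

```python
import numpy as np

def soft_threshold(v, b):
    return np.sign(v) * np.maximum(np.abs(v) - b, 0.0)

def stage1_learn_source(y, K, L, n_iters=200, lam=0.01, lr=0.1):
    """Alternate sparse coding of x and gradient updates of s for y ≈ s * x."""
    rng = np.random.default_rng(0)
    s = rng.standard_normal(K)
    s /= np.linalg.norm(s)                 # unit-norm source, as constrained
    x = np.zeros(L)
    for _ in range(n_iters):
        # Encoder step: a few ISTA iterations for x with s fixed.
        # 1/||s||_1^2 lower-bounds 1/sigma_max(C(s))^2, so the step is safe.
        step = 1.0 / max(np.sum(np.abs(s)) ** 2, 1e-8)
        for _ in range(10):
            r = y - np.convolve(s, x)
            x = soft_threshold(x + step * np.correlate(r, s, mode="valid"),
                               step * lam)
        # Decoder step: gradient ascent on the data fit of s, then renormalize.
        r = y - np.convolve(s, x)
        s = s + lr * np.correlate(r, x, mode="valid")
        s /= max(np.linalg.norm(s), 1e-8)
    return s, x
```

In the paper this alternation is realized implicitly: the unfolded encoder performs the sparse-coding step, and backpropagation through the decoder updates the source.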

In practice, the method is useful in the following scenario. Consider a multi-receiver radar system in which a set of receivers is already in place and operates with full measurements. Suppose a new set of receivers must be added, either to replace the existing ones or to gather more information. While designing the new receivers, one can use the available full measurements from the existing receivers to estimate the source and then design optimal compression filters for the new receivers. Thus, the new receivers sense compressed measurements, resulting in reduced cost and complexity. In essence, the approach is similar to the deep-sparse-array method [12], where a full array is used in one scan to learn sparse arrays for the remaining scans.

4 Experiments

4.1 Data Generation

We considered the noiseless case and generated measurements following the model in (1). The source follows a Gaussian shape, generated by sampling a Gaussian pulse and normalizing it so that ||s||_2 = 1. The supports of the filters are generated uniformly at random, and their amplitudes are drawn from a uniform distribution.
Under the aforementioned assumptions, we impose no restrictions on the minimum separation between any two nonzero components of a sparse filter, nor on the relative amplitudes of the filter components. In the presence of noise, both of these factors play a crucial role in filter estimation, and the recovery error is expected to increase. An analysis of our method's recovery performance and stability in the presence of noise is a direction of future research.
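Since the exact experimental sizes did not survive extraction, the following sketch generates data in the same spirit with assumed toy dimensions: a unit-norm Gaussian-shaped source, filters with uniformly random supports and uniform amplitudes, and noiseless measurements via (1).

```python
import numpy as np

rng = np.random.default_rng(0)
K, L, sparsity, n_channels = 20, 100, 3, 50   # assumed toy sizes

# Gaussian-shaped source, normalized to unit l2 norm.
t = np.arange(K)
s = np.exp(-0.5 * ((t - K / 2) / (K / 8)) ** 2)
s /= np.linalg.norm(s)

# Sparse filters: uniformly random supports, uniform (nonnegative) amplitudes.
X = np.zeros((n_channels, L))
for n in range(n_channels):
    support = rng.choice(L, size=sparsity, replace=False)
    X[n, support] = rng.uniform(0.5, 1.5, size=sparsity)

# Noiseless measurements y^(n) = s * x^(n), each of length K + L - 1.
Y = np.stack([np.convolve(s, X[n]) for n in range(n_channels)])
print(Y.shape)  # (50, 119)
```

The amplitude range and pulse width here are placeholders; only the structural assumptions (Gaussian shape, unit norm, random supports, uniform amplitudes) come from the text.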

4.2 Network and Training Parameters

We implemented the network in PyTorch. We unfolded the encoder for T iterations. We set the regularization parameter λ and decreased it by a constant factor γ at every iteration (i.e., λ_t = λ γ^t). To ensure stability, we chose the step-size parameter α to be less than the reciprocal of the squared largest singular value of the matrix Φ C(s) at initialization. In the second stage, we finely tuned λ and γ by grid search. Given the non-negativity of the sparse filters, we restrict the soft-thresholding operator to nonnegative outputs.

In the first stage, we initialize the source's entries randomly according to a standard normal distribution. The network is trained using full-batch gradient descent with the ADAM optimizer, decreasing the learning rate periodically by a constant factor. To achieve convergence without frequently skipping over the minimum of interest, we set the momentum parameter of the optimizer to a small value. Given the learned source, and following the same procedure for generating sparse filters, we generate separate sets of examples for training, validation, and testing. Then, we run the encoder, given C(s), to estimate the sparse filters x̂^(n). In the second stage, we initialize the compression filter h similarly to the source, and again train with ADAM using a periodically decaying learning rate and mini-batches for each CR.

The iterative use of C(s) and Φ within each layer of the architecture, a property that does not exist in [21], allows us to use a combination of analytic and automatic differentiation [1] to compute the gradient; in this regard, we backpropagate only through the last few iterations within the encoder and treat the earlier representation as independent of the trainable parameters. This allows us to unfold the network for large T in the forward pass without increasing the computational or memory complexity of backpropagation, a property that unfolded networks with untied layers do not possess. Lastly, we normalize the source and the compression filter after every gradient update.

4.3 Baselines

For each CR, we compare five methods detailed below.

LS-MBD: Φ is learned (L) and structured (S).

LS-MBD-L: Φ is learned (L) and structured (S). Motivated by the fast convergence of LISTA, the encoder performs a small number of proximal gradient steps with learned weights. In the second stage of training, we learn both the compression filter and the encoder weights.

GS-MBD: Φ is random Gaussian (G) and structured (S).

FS-MBD: Φ is fixed (F) and structured (S) [22]. In [22], the authors derive identifiability in the Fourier domain and design the filters to enable computation of specific Fourier measurements. The authors also show that, for FS-MBD, the sparse filters are uniquely identifiable given a sufficient number of compressed measurements from any two channels and fewer from the remaining channels. Here, we applied the blind dictionary calibration approach from [22] together with FISTA [5] for the sparse-coding step.

G-MBD: Φ is an unstructured random Gaussian (G) matrix. We consider G-MBD an oracle baseline that implements a computationally expensive compression operator.

LS-MBD, GS-MBD, and G-MBD use the architecture shown in Figure 1, each with a different Φ.

4.4 Results

We show that LS-MBD is superior to GS-MBD, FS-MBD, and LS-MBD-L. We evaluate performance in terms of how well we estimate the filters and the source. Let x̂ be an estimate of the true filter x. We use the normalized MSE in dB as a comparison metric and call a method successful if this error falls below a fixed threshold. Letting ŝ denote an estimate of the source, we quantify the quality of source recovery using the error metric of [2], which equals zero for exact source recovery.
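The filter-recovery metric can be computed as in the sketch below; the example vectors are illustrative, and the success threshold is left out since its exact value did not survive extraction.

```python
import numpy as np

def nmse_db(x_hat, x_true):
    """Normalized mean-squared error in dB between estimated and true filters."""
    err = np.linalg.norm(x_hat - x_true) ** 2 / np.linalg.norm(x_true) ** 2
    return 10.0 * np.log10(err)

# A near-perfect estimate of a 2-sparse filter yields a strongly negative NMSE.
x_true = np.array([0.0, 1.0, 0.0, -0.8, 0.0])
x_hat = np.array([0.0, 0.99, 0.0, -0.81, 0.01])
print(round(nmse_db(x_hat, x_true), 2))  # -37.38
```

More negative values indicate better recovery, which is why the entries of Table 2 are large negative numbers for the successful methods.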

Figures 2(a), 2(b), and 2(c) visualize the results from the first stage of training. Figure 2(a) shows that the source estimation error and the training loss both decrease and converge to zero as a function of epochs. Figure 2(b) shows the filter estimation error and the successful recovery of the filters upon training. Figure 2(c) shows the source before and after training, where the learned and true sources match.

Table 1 shows the inference runtime of the methods averaged over 20 independent trials. All methods except FS-MBD are run on a GPU. LS-MBD-L has the fastest inference because it unfolds only a small number of proximal gradient iterations. Despite its speed, LS-MBD-L has the worst recovery performance.

Figure 2: (a) Reconstruction and source estimation losses as a function of epochs during the first stage of training. (b) Sparse filter recovery error as a function of epochs during the first stage of training. (c) Initialized (blue), learned (green), and true (black) source. (d) Sparse filter recovery for a test example.
Table 1: Comparison of the inference runtime [s] of the methods.
CR [%]   recovery error [dB]
99    -54.05   -44.93   -43.96   -53.27   -26.54
80    -55.07   -40.55   -26.52   -52.80     -
70    -52.43   -40.00   -22.76   -51.50     -
62    -53.63   -37.13   -21.86   -54.71     -
50    -53.36   -28.57    -8.40   -51.41     -
47    -50.60   -26.11    -6.84   -50.35     -
45    -52.98   -23.17    -6.14   -43.61     -
40    -47.39   -14.75    -5.13   -17.07     -
Table 2: Filter recovery error in dB on the test set for various CRs (bold in the original indicates successful recovery, i.e., error below the success threshold).

Table 2 shows the filter recovery error for various CRs on the test set. The G-MBD and GS-MBD results are averaged over ten independent trials. For LS-MBD-L, we report only the highest CR, which already shows a very high recovery error. Among the three structured-compression methods, the proposed LS-MBD outperforms the others. Theoretical results in [22] suggest that structured compression succeeds only when the number of compressed measurements is sufficiently large; LS-MBD goes beyond this bound and recovers the filters successfully at lower CRs.

A comparison of LS-MBD and LS-MBD-L highlights the importance of encoder depth in deep unfolding and of a model-based encoder (i.e., many tied encoder/decoder layers). LS-MBD-L, even with a learned encoder, fails to recover the sparse filters. Moreover, we observed that LS-MBD generalizes better (i.e., closer test and training errors) than LS-MBD-L, highlighting the importance of limiting the number of trainable parameters by weight-tying in applications where data are limited.

For FS-MBD, we observe that, for the given data, the approach fails to estimate the filters accurately at lower CRs. In large part, this is due to the failure of the FISTA algorithm in the sparse-coding step. To verify this, we consider the non-blind case in which the source is assumed known. In this case, the normalized MSE in estimating the filters from the compressed Fourier measurements using FISTA is comparable to that of FS-MBD. Compared with [22], LS-MBD learns a filter that enables both compression and accurate sparse coding, which results in a lower MSE. We also observe that our oracle method, G-MBD, outperforms the structure-based compression methods. We attribute this superior performance to the fact that the compression matrix in G-MBD has independent random entries, which result in low mutual coherence among its columns [13]. In the remaining methods, the compression matrices have a Toeplitz structure, which results in higher coherence. Despite being less accurate than G-MBD, the compression matrix in LS-MBD has fewer degrees of freedom, is practically feasible to implement, and its Toeplitz structure can be exploited to speed up matrix computations in the recovery process. Table 3 compares the memory storage and computational costs of the unstructured (G-MBD) and structured (LS-MBD) compression operators. The table highlights the efficiency of LS-MBD; in this case, we report the complexity of the operation performed using the fast Fourier transform.

Method                   Memory storage    Computational cost
G-MBD (unstructured Φ)   m × M entries     O(mM)
LS-MBD (structured Φ)    R filter taps     O(M log M) (via FFT)
Table 3: Memory storage and computational complexity of the compression operator when it is structured or unstructured.
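The structured operator's savings come from the fact that applying Φ is a filtering operation, realizable with an FFT rather than a dense matrix product. A quick NumPy check (assumed sizes) that the FFT route matches the dense Toeplitz product:

```python
import numpy as np

rng = np.random.default_rng(0)
M, R = 1024, 65                 # assumed measurement and filter lengths
m = M - R + 1
h = rng.standard_normal(R)
y = rng.standard_normal(M)

# Dense route: m x M Toeplitz matrix, O(m*M) multiply and storage.
Phi = np.zeros((m, M))
for k in range(m):
    Phi[k, k:k + R] = h[::-1]
z_dense = Phi @ y

# FFT route: O(M log M) multiply, only the R filter taps stored.
n = M + R - 1                   # length of the full linear convolution
z_full = np.fft.irfft(np.fft.rfft(h, n) * np.fft.rfft(y, n), n)
z_fft = z_full[R - 1:M]         # keep the complete-overlap ("valid") samples

assert np.allclose(z_dense, z_fft)
```

Zero-padding both sequences to length M + R − 1 makes the circular convolution equal the linear one, after which the complete-overlap slice reproduces the compressed measurements exactly.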
Figure 3: Learned compression filters corresponding to the compression matrix Φ in LS-MBD for various CRs, shown in the time domain.
Figure 4: Magnitude of the discrete Fourier transform of the source (black), the learned compression filter (blue), and the compressed measurements (red) in LS-MBD.

In LS-MBD, at the highest CR, the filter corresponding to Φ is initialized at random. For lower ratios, we "warm-start" the network using a shortened version of the filter learned at the higher CR. Figure 2(d) visualizes the filter recovered from a test example. Figure 3 shows, in the time domain, the compression filters learned for various compression ratios. Figure 4 depicts the magnitude of the discrete Fourier transform of the source (black), the learned compression filter (blue), and the compressed measurements (red). The spectral alignment of the source and the learned filter indicates that the filtering operation preserves information from the source, which may explain the success of LS-MBD compared to the other methods.

5 Conclusions

We proposed a compressive sparse multichannel blind-deconvolution method, named LS-MBD, based on unfolding neural networks [20]. LS-MBD is an autoencoder that recovers the sparse filters at the output of its encoder and whose convolutional decoder corresponds to the source of interest. In this framework, we learn an efficient, structured compression matrix that enables faster and more accurate sparse filter recovery than competing methods. We attribute our framework's superiority over FS-MBD [22] to learning a compression operator optimized for both reconstruction and filter recovery.


  • [1] P. Ablin, G. Peyré, and T. Moreau (2020) Super-efficiency of automatic differentiation for functions defined as a minimum. In Proc. Int. Conf. on Machine Learning (ICML), pp. 1–10.
  • [2] A. Agarwal, A. Anandkumar, P. Jain, P. Netrapalli, and R. Tandon (2016) Learning sparsely used overcomplete dictionaries via alternating minimization. SIAM J. Opt. 26, pp. 2775–2799.
  • [3] W. U. Bajwa, K. Gedalyahu, and Y. C. Eldar (2011) Identification of parametric underspread linear systems and super-resolution radar. IEEE Trans. Signal Process. 59 (6), pp. 2548–2561.
  • [4] O. Bar-Ilan and Y. C. Eldar (2014) Sub-Nyquist radar via Doppler focusing. IEEE Trans. Signal Process. 62 (7), pp. 1796–1811.
  • [5] A. Beck and M. Teboulle (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2 (1), pp. 183–202.
  • [6] C. Bilen, G. Puy, R. Gribonval, and L. Daudet (2014) Convex optimization approaches for blind sensor calibration using sparsity. IEEE Trans. Signal Process. 62 (18), pp. 4847–4856.
  • [7] G. Carter (1981) Time delay estimation for passive sonar signal processing. IEEE Trans. Acoust., Speech, Signal Process. 29 (3), pp. 463–470.
  • [8] T. Chang, B. Tolooshams, and D. Ba (2019) RandNet: deep learning with compressed measurements of images. In Proc. Workshop on Machine Learning for Signal Process. (MLSP), pp. 1–6.
  • [9] X. Chen, J. Liu, Z. Wang, and W. Yin (2018) Theoretical linear convergence of unfolded ISTA and its practical weights and thresholds. In Proc. Advances in Neural Info. Process. Sys. (NeurIPS), pp. 9061–9071.
  • [10] A. Cosse (2017) A note on the blind deconvolution of multiple sparse signals from unknown subspaces. Proc. SPIE 10394.
  • [11] M. F. Duarte and Y. C. Eldar (2011) Structured compressed sensing: from theory to applications. IEEE Trans. Signal Process. 59 (9), pp. 4053–4085.
  • [12] A. M. Elbir, K. V. Mishra, and Y. C. Eldar (2019) Cognitive radar antenna selection via deep learning. IET Radar, Sonar & Navigation 13 (6), pp. 871–880.
  • [13] Y. C. Eldar and G. Kutyniok (2012) Compressed sensing: theory and applications. Cambridge University Press.
  • [14] C. Garcia-Cardona and B. Wohlberg (2018) Convolutional dictionary learning: a comparative review and new algorithms. IEEE Trans. Comput. Imag. 4 (3), pp. 366–381.
  • [15] K. Gregor and Y. LeCun (2010) Learning fast approximations of sparse coding. In Proc. Int. Conf. on Machine Learning (ICML), pp. 399–406.
  • [16] R. Gribonval, G. Chardon, and L. Daudet (2012) Blind calibration for compressed sensing by convex optimization. In Proc. IEEE Int. Conf. Acoust., Speech and Signal Process. (ICASSP), pp. 2713–2716.
  • [17] N. Kazemi and M. D. Sacchi (2014) Sparse multichannel blind deconvolution. Geophysics 79 (5), pp. V143–V152.
  • [18] Y. Li, K. Lee, and Y. Bresler (2017) Identifiability in bilinear inverse problems with applications to subspace or sparsity-constrained blind gain and phase calibration. IEEE Trans. Info. Theory 63 (2), pp. 822–842.
  • [19] Y. Li, K. Lee, and Y. Bresler (2019) Blind gain and phase calibration via sparse spectral methods. IEEE Trans. Info. Theory 65 (5), pp. 3097–3123.
  • [20] V. Monga, Y. Li, and Y. C. Eldar (2019) Algorithm unrolling: interpretable, efficient deep learning for signal and image processing. arXiv preprint arXiv:1912.10557.
  • [21] A. Mousavi, A. B. Patel, and R. G. Baraniuk (2015) A deep learning approach to structured signal recovery. In Proc. Allerton Conf. Commun., Control, and Comput. (Allerton), pp. 1336–1343.
  • [22] S. Mulleti, K. Lee, and Y. C. Eldar (2020) Identifiability conditions for compressive multichannel blind deconvolution. IEEE Trans. Signal Process. 68, pp. 4627–4642.
  • [23] K. Nose-Filho, A. K. Takahata, R. Lopes, and J. M. T. Romano (2018) Improving sparse multichannel blind deconvolution with correlated seismic data: foundations and further results. IEEE Signal Process. Mag. 35 (2), pp. 41–50.
  • [24] C. Papayiannis, C. Evers, and P. A. Naylor (2017) Sparse parametric modeling of the early part of acoustic impulse responses. In Proc. European Signal Process. Conf. (EUSIPCO), pp. 678–682.
  • [25] B. Tolooshams, S. Dey, and D. Ba (2018) Scalable convolutional dictionary learning with constrained recurrent sparse auto-encoders. In Proc. Workshop on Machine Learning for Signal Process. (MLSP), pp. 1–6.
  • [26] B. Tolooshams, S. Dey, and D. Ba (2020) Deep residual autoencoders for expectation maximization-inspired dictionary learning. IEEE Trans. Neural Netw. Learn. Syst., pp. 1–15.
  • [27] R. Tur, Y. C. Eldar, and Z. Friedman (2011) Innovation rate sampling of pulse streams with application to ultrasound imaging. IEEE Trans. Signal Process. 59 (4), pp. 1827–1842.
  • [28] N. Wagner, Y. C. Eldar, and Z. Friedman (2012) Compressed beamforming in ultrasound imaging. IEEE Trans. Signal Process. 60 (9), pp. 4643–4657.
  • [29] L. Wang and Y. Chi (2016) Blind deconvolution from multiple sparse inputs. IEEE Signal Process. Lett. 23 (10), pp. 1384–1388.