## 1 Introduction

Compressed Sensing (CS) [cs] is a modern technique to recover signals of interest from few linear and possibly corrupted measurements , with . Iterative optimization algorithms applied to CS are by now widely used [fista], [vamp], [fpc]. Recently, approaches based on deep learning were introduced [dagan], [deepcodec]. It seems promising to merge these two areas by considering what is called *deep unfolding*. The latter pertains to unfolding the iterations of well-known optimization algorithms into layers of a deep neural network (DNN), which reconstructs the signal of interest.

Related work: Deep unfolding networks have gained much attention in the last few years [lista], [source], [bertocchi], because of some advantages they have compared to traditional DNNs: they are interpretable, integrate prior knowledge about the signal structure [deligiannis], and have a relatively small number of trainable parameters [adm]. Especially in the case of CS, many unfolding networks have proven to work particularly well. The authors in [amp], [ista-net], [admm-net], [holgernet] propose deep unfolding networks that learn a decoder, which aims at reconstructing from . Additionally, these networks jointly learn a dictionary that sparsely represents , along with thresholds used by the original optimization algorithms.

Motivation: Our work is inspired by the articles [ista-net] and [holgernet], which propose unfolded versions of the iterative soft thresholding algorithm (ISTA), with learnable parameters being the sparsifying (orthogonal) basis and/or the thresholds involved in ISTA. The authors then test their frameworks on synthetic data and/or real-world image datasets. In a similar spirit, we derive a decoder by interpreting the iterations of the alternating direction method of multipliers (ADMM) algorithm [admm] as a DNN and call it the ADMM Deep Analysis Decoding (ADMM-DAD) network. We differentiate our approach by learning a redundant analysis operator as a sparsifier for , i.e. we employ *analysis sparsity* in CS. We choose analysis sparsity over its synthesis counterpart because of the advantages it offers. For example, analysis sparsity provides flexibility in modeling sparse signals, since it leverages the redundancy of the involved analysis operators. We choose to unfold ADMM into a DNN since most of the optimization-based CS algorithms cannot treat analysis sparsity, while ADMM solves the generalized LASSO problem [genlasso], which resembles analysis CS. Moreover, we test our decoder on speech datasets, not only on image ones. To the best of our knowledge, an unfolded CS decoder has not yet been used on speech datasets. We numerically compare our proposed network to the state-of-the-art learnable ISTA of [holgernet] on real-world image and speech data. On all datasets, our proposed neural architecture outperforms the baseline in terms of both test and generalization error.

Key results: Our novelty is twofold: a) we introduce a new ADMM-based deep unfolding network that solves the analysis CS problem, namely ADMM-DAD net, which jointly learns an analysis sparsifying operator; b) we test ADMM-DAD net on image and speech datasets (while state-of-the-art deep unfolding networks have so far only been tested on synthetic data and images). Experimental results demonstrate that ADMM-DAD outperforms the baseline ISTA-net on speech and images, indicating that the redundancy of the learned analysis operator leads to both a smaller test MSE and a smaller generalization error.

Notation: For matrices , we denote by their concatenation with respect to the first dimension, while we denote by their concatenation with respect to the second dimension. We denote by a square matrix filled with zeros. We write for the real identity matrix. For , the soft thresholding operator is defined in closed form as . For , the soft thresholding operator acts componentwise, i.e. . For two functions , we write their composition as .
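To make the notation concrete, the following is a minimal NumPy sketch of the two concatenations and of the componentwise soft thresholding operator. The closed form used here, sign(x)·max(|x| − λ, 0), is the standard textbook definition; the function name `soft_threshold` and the example arrays are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np

def soft_threshold(x, lam):
    """Componentwise soft thresholding: sign(x) * max(|x| - lam, 0)."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Concatenation of two matrices along the first and the second dimension.
B1 = np.ones((2, 3))
B2 = np.zeros((2, 3))
stacked_rows = np.vstack([B1, B2])   # first dimension: shape (4, 3)
stacked_cols = np.hstack([B1, B2])   # second dimension: shape (2, 6)

# Entries with magnitude below the threshold are set to zero,
# the remaining ones are shrunk towards zero by the threshold.
x = np.array([-3.0, -0.5, 0.0, 0.2, 2.0])
shrunk = soft_threshold(x, 1.0)
```
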

## 2 Main Results

Optimization-based analysis CS: As we mentioned in Section 1, the main idea of CS is to reconstruct a vector from , , where is the so-called measurement matrix and , with , corresponds to noise. To do so, we assume there exists a redundant sparsifying transform () called the analysis operator, such that is (approximately) sparse. Using analysis sparsity in CS, we wish to recover from . A common approach is the analysis ℓ1-minimization problem

(1) |
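The measurement model and the role of a redundant analysis operator can be illustrated with a small synthetic sketch. Everything specific below is an assumption made for illustration: the symbol names (`x`, `A`, `y`, `Phi`), the dimensions, the noise level, the 1/√m scaling of the Gaussian matrix, and the choice of stacked first- and second-order difference operators as analysis operator. The paper learns its operator from data instead.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 64, 16                      # signal length and number of measurements (m < n)

# Piecewise-constant test signal: dense itself, but with sparse differences.
x = np.concatenate([np.full(20, 1.0), np.full(24, -2.0), np.full(20, 0.5)])

# Random Gaussian measurement matrix (the 1/sqrt(m) scaling is an assumption).
A = rng.standard_normal((m, n)) / np.sqrt(m)
noise = 1e-2 * rng.standard_normal(m)
y = A @ x + noise                  # few, noisy, linear measurements

# Hypothetical redundant analysis operator: first- and second-order
# finite differences stacked along the first dimension (2n - 3 > n rows).
D1 = np.diff(np.eye(n), n=1, axis=0)
D2 = np.diff(np.eye(n), n=2, axis=0)
Phi = np.vstack([D1, D2])

# Number of non-zero analysis coefficients of x.
nnz = int(np.sum(np.abs(Phi @ x) > 1e-12))
```

Here `Phi @ x` has far fewer non-zero entries than `x`, which is the structure that analysis-sparsity-based recovery exploits.
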

A well-known algorithm that solves (1) is ADMM, which considers an equivalent generalized LASSO form of (1), i.e.,

(2) |

with being a scalar regularization parameter. ADMM introduces the dual variables , so that (2) is equivalent to

(3) |

Now, for (penalty parameter), initial points and , the optimization problem in (3) can be solved by the iterative scheme of ADMM:

(4) | |||

(5) | |||

(6) |

The iterates (4) – (6) are known [admm] to converge to a solution of (3), i.e., and as .
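For reference, the following is a minimal NumPy sketch of ADMM applied to a generalized LASSO problem of the form min 0.5·‖Ax − y‖² + λ‖Φx‖₁. It uses the standard scaled-dual formulation from the ADMM literature; the sign convention of the dual variable, the variable names and the default values of `rho` and `n_iter` are assumptions and may differ from the exact form of (4) – (6).

```python
import numpy as np

def soft_threshold(v, tau):
    """Componentwise soft thresholding with threshold tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def admm_generalized_lasso(A, y, Phi, lam, rho=1.0, n_iter=200):
    """Scaled-form ADMM for min_x 0.5*||A x - y||_2^2 + lam*||Phi x||_1.

    Assumes A^T A + rho * Phi^T Phi is invertible (e.g. Phi has full
    column rank).
    """
    z = np.zeros(Phi.shape[0])     # auxiliary variable standing for Phi x
    u = np.zeros(Phi.shape[0])     # scaled dual variable
    M = A.T @ A + rho * Phi.T @ Phi
    Aty = A.T @ y
    for _ in range(n_iter):
        # x-update: least-squares problem with a fixed system matrix
        x = np.linalg.solve(M, Aty + rho * Phi.T @ (z - u))
        Phix = Phi @ x
        # z-update: proximal step of the l1 norm
        z = soft_threshold(Phix + u, lam / rho)
        # dual update: accumulate the constraint residual Phi x - z
        u = u + Phix - z
    return x, z

# Sanity check on a case with a known solution: for A = Phi = I the
# minimizer is the soft thresholding of y with threshold lam.
y_demo = np.array([2.0, -0.2, 0.7, -3.0])
x_demo, z_demo = admm_generalized_lasso(np.eye(4), y_demo, np.eye(4), lam=0.5)
```

Truncating the loop after a fixed number of iterations and treating `Phi` as a trainable parameter is the basic idea behind unfolding such a scheme into a network.
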

Neural network formulation: Our goal is to formulate the previous iterative scheme as a neural network. We first substitute (4) into the update rules (5) and (6), and then (5) into (6), yielding

(7) | ||||

where

(8) | ||||

(9) |

We introduce and set to obtain

(10) |

Now, we set and , , so that (10) is transformed into

(11) |

Based on (11), we formulate ADMM as a neural network with layers/iterations, defined as

The trainable parameters are the entries of (or more generally, the parameters in a parameterization of ). We denote the concatenation of such layers (all having the same ) as

(12) |

The final output is obtained after applying an affine map motivated by (4) to the final layer , so that

(13) |

where . In order to clip the output in case its norm falls out of a reasonable range, we add an extra function defined as if and otherwise, for some fixed constant . We introduce the hypothesis class

(14) |

Table 1: Average test MSE and generalization error.

5 layers, fixed CS ratio:

| Decoder | SpeechCommands: test MSE | SpeechCommands: gen. error | TIMIT: test MSE | TIMIT: gen. error | MNIST: test MSE | MNIST: gen. error | CIFAR10: test MSE | CIFAR10: gen. error |
|---|---|---|---|---|---|---|---|---|
| ISTA-net | | | | | | | | |
| ADMM-DAD | | | | | | | | |

10 layers, 40% and 50% CS ratios:

| Decoder | SpeechCommands (40%): test MSE | SpeechCommands (40%): gen. error | TIMIT (40%): test MSE | TIMIT (40%): gen. error | SpeechCommands (50%): test MSE | SpeechCommands (50%): gen. error | TIMIT (50%): test MSE | TIMIT (50%): gen. error |
|---|---|---|---|---|---|---|---|---|
| ISTA-net | | | | | | | | |
| ADMM-DAD | | | | | | | | |

consisting of all the functions that ADMM-DAD can implement. Then, given the aforementioned class and a set of training samples, ADMM-DAD yields a function/decoder that aims at reconstructing from . In order to measure the difference between and , , we choose the training mean squared error (train MSE)

(15) |

as the loss function. The test mean squared error (test MSE) is defined as

(16) |

where is a set of test data, not used in the training phase. We examine the generalization ability of the network by considering the difference between the average train MSE and the average test MSE, i.e.,

(17) |
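The three quantities above can be sketched in a few lines of NumPy. The normalization (per-sample squared ℓ2 error averaged over the set) and the use of an absolute value in the generalization error are assumptions, since the exact forms of (15) – (17) are not reproduced here; `mse`, `generalization_error` and the toy decoder are hypothetical names used only for illustration.

```python
import numpy as np

def mse(decoder, Y, X):
    """Average squared l2 reconstruction error of `decoder` over the pairs (Y[i], X[i])."""
    X = np.asarray(X, dtype=float)
    X_hat = np.stack([decoder(np.asarray(y, dtype=float)) for y in Y])
    return float(np.mean(np.sum((X_hat - X) ** 2, axis=1)))

def generalization_error(train_mse, test_mse):
    """Gap between test and train MSE (absolute value is an assumption)."""
    return abs(test_mse - train_mse)

# Toy decoder that fits the training pairs exactly but not the test pair.
toy_decoder = lambda y: 2.0 * y
Y_train, X_train = [[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]]
Y_test, X_test = [[1.0, 1.0]], [[2.0, 3.0]]

train_err = mse(toy_decoder, Y_train, X_train)
test_err = mse(toy_decoder, Y_test, X_test)
gen_err = generalization_error(train_err, test_err)
```
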

## 3 Experimental Setup

Datasets and pre-processing: We train and test the proposed ADMM-DAD network on two speech datasets, i.e., SpeechCommands [speechcommands] (85511 training and 4890 test speech examples, sampled at 16kHz) and TIMIT [timit] (phonemes sampled at 16kHz; we take 70% of the dataset for training and the remaining 30% for testing), and on two image datasets, i.e., MNIST [mnist] (60000 training and 10000 test image examples) and CIFAR10 [cifar] (50000 training and 10000 test coloured image examples). For the CIFAR10 dataset, we convert the images to grayscale. We preprocess the raw speech data before feeding them to both ADMM-DAD and ISTA-net: we downsample each .wav file from 16000 to 8000 samples and split each downsampled .wav into 10 segments.
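The speech pre-processing can be sketched as follows. The text does not specify the resampling method, so the naive decimation used here (keeping every second sample, without an anti-aliasing filter) is an assumption, as are the function name `preprocess_clip` and the equal-length, non-overlapping segmentation.

```python
import numpy as np

def preprocess_clip(wave, factor=2, n_segments=10):
    """Downsample a 1-D waveform by `factor` and split it into equal segments."""
    wave = np.asarray(wave, dtype=float)
    down = wave[::factor]                      # naive decimation (assumption)
    seg_len = len(down) // n_segments          # samples per segment
    # Drop any remainder so that the clip splits evenly.
    return down[: seg_len * n_segments].reshape(n_segments, seg_len)

# Example with a synthetic one-second clip of 16000 samples.
clip = np.sin(2 * np.pi * 440.0 * np.arange(16000) / 16000.0)
segments = preprocess_clip(clip)               # 10 segments of 800 samples each
```

Under these assumptions, each training example fed to the decoders is a vector of 800 samples.
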

Experimental settings: We choose a random Gaussian measurement matrix and appropriately normalize it, i.e., . We consider three CS ratios . We add zero-mean Gaussian noise with standard deviation to the measurements, set the redundancy ratio for the trainable analysis operator , perform He (normal) initialization for and choose . We also examined different values for , as well as treating as trainable parameters, but both settings yielded identical performance. We evaluate ADMM-DAD for 5 and 10 layers. All networks are trained using the *Adam* optimizer [adam] and batch size . For the image datasets, we set the learning rate and train the 5- and 10-layer ADMM-DAD for and epochs, respectively. For the audio datasets, we set and train the 5- and 10-layer ADMM-DAD for and epochs, respectively. We compare ADMM-DAD net to the ISTA-net proposed in [holgernet]. For ISTA-net, we use the best hyper-parameters proposed by the original authors and experiment with 5 and 10 layers. All networks are implemented in PyTorch [pytorch]. For our experiments, we report the average test MSE and generalization error as defined in (16) and (17), respectively.

## 4 Experiments and Results

We compare our decoder to the ISTA-net baseline for 5 layers on all datasets with a fixed CS ratio, and for 10 layers with both 40% and 50% CS ratios on the speech datasets, and report the corresponding average test MSE and generalization error in Table 1. Both the training errors and the test errors are always lower for our ADMM-DAD net than for ISTA-net. Overall, the results in Table 1 indicate that the redundancy of the learned analysis operator improves the performance of ADMM-DAD net, especially on the speech datasets.

Furthermore, we extract the spectrograms of an example test raw audio file of TIMIT reconstructed by each of the 5-layer decoders. We use FFT points. The resulting spectrograms for and CS ratio are illustrated in Fig. 1. Both figures indicate that our decoder outperforms the baseline, since the former resolves many more frequencies than the latter. Naturally, the quality of the raw audio reconstructed by both decoders improves as the CS ratio increases from to . However, ADMM-DAD reconstructs, even for the CS ratio, a clearer version of the signal compared to ISTA-net; the latter recovers a significant amount of noise, even for the CS ratio.

Finally, we examine the robustness of both 10-layer decoders. We consider noisy measurements in the test set of TIMIT, taken at and CS ratio, with varying standard deviation (std) of the additive Gaussian noise. Fig. 2 shows how the average test MSE scales as the noise std increases. Our decoder outperforms the baseline by an order of magnitude and is robust to increasing levels of noise. This behaviour confirms improved robustness when learning a redundant sparsifying dictionary instead of an orthogonal one.

## 5 Conclusion and future directions

In this paper we derived ADMM-DAD, a new deep unfolding network for solving the analysis Compressed Sensing problem, by interpreting the iterations of the ADMM algorithm as layers of the network. Our decoder jointly reconstructs the signal of interest and learns a redundant analysis operator, serving as sparsifier for the signal. We compared our framework with a state-of-the-art ISTA-based unfolded network on speech and image datasets. Our experiments confirm improved performance: the redundancy provided by the learned analysis operator yields a lower average test MSE and generalization error of our method compared to the ISTA-net. Future work will include the derivation of generalization bounds for the hypothesis class defined in (14) similar to [holgernet]. Additionally, it would be interesting to examine the performance of ADMM-DAD, when constraining to a particular class of operators, e.g., for being a tight frame.
