
# Continuous Generative Neural Networks

In this work, we present and study Continuous Generative Neural Networks (CGNNs), namely, generative models in the continuous setting. The architecture is inspired by DCGAN, with one fully connected layer, several convolutional layers and nonlinear activation functions. In the continuous L^2 setting, the dimensions of the spaces of each layer are replaced by the scales of a multiresolution analysis of a compactly supported wavelet. We present conditions on the convolutional filters and on the nonlinearity that guarantee that a CGNN is injective. This theory finds applications to inverse problems, and allows for deriving Lipschitz stability estimates for (possibly nonlinear) infinite-dimensional inverse problems with unknowns belonging to the manifold generated by a CGNN. Several numerical simulations, including image deblurring, illustrate and validate this approach.


## 1 Introduction

Deep generative models are a large class of deep learning architectures whose goal is to approximate high-dimensional probability distributions [54]. A trained model is then able to easily generate new realistic samples. They have received huge interest in the last decade, both for very promising applications in physics [21, 47], medicine [30, 45, 56] and computational chemistry [58, 59], and more recently for their worrying ability to produce realistic fake videos, a.k.a. deepfakes [27]. Several architectures and training protocols have proven to be very effective, including variational autoencoders (VAEs) [35][25] and normalising flows [53, 22].

In this paper we consider a generalization of some of these architectures to a continuous setting, where the samples to be generated belong to an infinite-dimensional function space. One reason is that many physical quantities of interest are better modeled as functions than vectors, e.g. solutions of partial differential equations (PDEs). In this respect, this work fits in the growing research area of neural networks in infinite-dimensional spaces, often motivated by the study of PDEs, which includes Neural Operators [38], Deep-O-Nets [42], PINNs [52] and many others.

A second reason concerns the promising applications of generative models in solving inverse problems. A typical inverse problem consists in the recovery of a quantity from noisy observations that are described by an ill-posed operator between function spaces [23]. Virtually every imaging modality can be modeled in such a way, including computed tomography (CT), magnetic resonance imaging (MRI) and ultrasonography. In recent years, machine learning based reconstruction algorithms have become the state of the art in most imaging applications [10, 46]. Among these algorithms, the ones combining generative models with classical iterative methods, such as the Landweber scheme, are very promising since they retain most of the explainability provided by inverse problems theory. However, despite the impressive numerical results [16, 57, 9, 34, 55, 11, 33], many theoretical questions have not been studied yet, for instance concerning stability properties of the reconstruction.

In this work (Section 2), we introduce a family of continuous generative neural networks (CGNNs), mapping a finite-dimensional space into an infinite-dimensional function space. Inspired by the architecture of deep convolutional GANs (DCGANs) [51], CGNNs are obtained by composing an affine map with several (continuous) convolutional layers with nonlinear activation functions. The convolutional layers are constructed as maps between the subspaces of a multi-resolution analysis (MRA) at different scales, and naturally generalize discrete convolutions. In our continuous setting, the scale parameter plays the role of the resolution of the signal/image. We note that wavelet analysis has played a major role in the design of deep learning architectures in the last decade [44, 17, 7, 19, 8].

The main result of this paper (Section 3) is a set of sufficient conditions that the parameters of a CGNN must satisfy in order to guarantee global injectivity of the network. This result is far from trivial because in each convolutional layer the number of channels is reduced, and this has to be compensated by the higher scale in the MRA. Generative models that are not injective are of no use in solving inverse problems or inference problems, or at least it is difficult to study their performance from the theoretical point of view. Furthermore, injective generators are needed whenever they are used as decoders in an autoencoder. In the discrete setting, some families of injective networks have already been thoroughly characterized [12, 41, 49, 24, 37, 50, 29, 28]. Note that normalizing flows are injective by construction, yet they are maps between spaces of the same (generally large) dimension, a feature that does not necessarily help with our desired applications.

Indeed, another useful property of CGNNs is dimensionality reduction. For ill-posed inverse problems, it is well known that imposing finite-dimensional priors improves the stability and the quality of the reconstruction [6, 14, 13, 5, 15], also working with finitely-many measurements [3, 31, 2, 4, 1]. In practice, these priors are unknown or cannot be analytically described: yet, they can be approximated by a (trained) CGNN. The second main result of this work (Section 4) is that an injective CGNN allows us to transform a possibly nonlinear ill-posed inverse problem into a Lipschitz stable one.

As a proof-of-concept (Section 5), we show numerically the validity of CGNNs in performing image deblurring on the MNIST dataset [40] with increasing levels of noise. The numerical model is obtained by training a VAE whose decoder is designed with a CGNN architecture. The classical Landweber iteration method is used as a baseline for comparison. We also provide some qualitative experiments on the expressivity of CGNNs.

## 2 Architecture of CGNNs

We first review the architecture of a DCGAN [51], and then present our continuous generalization. For simplicity, the analysis is done for 1D signals, but it can be extended to the 2D case (see Appendix A.4).

### 2.1 1D discrete generator architecture

A deep generative model can be defined as a map , where is a finite-dimensional space with , constructed as the forward pass of a neural network with parameters . Our main motivation being the use of generators in solving ill-posed inverse problems, we consider generators that carry out a dimensionality reduction, i.e. , that will yield better stability in the reconstructions.

As a starting point for our continuous architecture, we then consider the one introduced in [51]. It is a map $G$ (we drop the dependence on the parameters) obtained by composing an affine layer and $L-1$ convolutional layers with nonlinear activation functions. More precisely:

which can be summarized as

$$G=\Big(\bigcirc_{l=L}^{2}\,\sigma_l\circ\Psi_l\Big)\circ(\sigma_1\circ\Psi_1). \tag{1}$$

The natural numbers $\alpha_l$ are the vector sizes and represent the resolution of the signals at each layer, while $c_l$ are the numbers of channels at each layer. The output resolution is $\alpha_L$. Generally, one has $\alpha_{l-1}<\alpha_l$, since the resolution increases at each level. Moreover, we impose that $\alpha_l$ is divisible by $\alpha_{l-1}$ for every $l=2,\dots,L$. We now describe the components of $G$.

##### The nonlinearities.

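Each layer includes a pointwise nonlinearity $\sigma_l\colon\mathbb{R}^{\alpha_l\cdot c_l}\to\mathbb{R}^{\alpha_l\cdot c_l}$, i.e. a map defined as

$$\sigma_l(x_1,\dots,x_{\alpha_l\cdot c_l})=(\sigma(x_1),\dots,\sigma(x_{\alpha_l\cdot c_l})),$$

with $\sigma\colon\mathbb{R}\to\mathbb{R}$ nonlinear.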

##### The fully connected layer.

The first layer is $\Psi_1=F\,\cdot+b$, where $F$ is a linear map and $b$ is a bias term.

##### The convolutional layers.

The fractional-strided convolutional layer $\Psi_l$ represents a convolution with stride $s$ such that $s^{-1}=\alpha_l/\alpha_{l-1}$, where $c_{l-1}$ is the number of input channels and $c_l$ the number of output channels, with $c_l<c_{l-1}$. This convolution with stride $s$ corresponds to the transpose of the convolution with stride $s^{-1}$, and is often called deconvolution. We refer to Appendix A.2 for more details on fractional-strided convolutions.

Note that the most significant dimensional increase occurs in the first layer, the fully connected one. Indeed, after the first layer, in the fractional-strided convolutional layers the increase of the vectors’ size is compensated by the decrease of the number of channels; see Figure 1 for an illustration. At each layer, the resolution of a signal increases thanks to a deconvolution with higher-resolution filters, as we explain in detail in Appendix A.2. The final output is then a single high-resolution signal.
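As an illustration of how a fractional-strided (transposed) convolution raises the resolution, the following is a minimal NumPy sketch. It assumes the common "insert zeros, then convolve" formulation with stride $1/2$; the filter values and the cropping to $2\alpha_{l-1}$ samples are illustrative choices, not taken from the paper.

```python
import numpy as np

def conv_transpose_1d(x, t, stride=2):
    """Transposed 1D convolution: zero-insertion followed by a full convolution."""
    # Upsample by placing each sample of x every `stride` positions.
    up = np.zeros(len(x) * stride)
    up[::stride] = x
    # Ordinary (full) convolution with the filter t.
    return np.convolve(up, t)

x = np.array([1.0, 2.0, 3.0, 4.0])          # low-resolution signal (4 samples)
t = np.array([1.0, 1.0])                    # hypothetical 2-tap filter
y = conv_transpose_1d(x, t)[: 2 * len(x)]   # keep 8 samples: resolution doubled
print(y)
```

With this particular filter each input sample is simply repeated twice, which makes the doubling of the resolution easy to see.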

### 2.2 1D CGNN architecture

We now describe how to reformulate this discrete architecture in the continuous setting, namely, by considering signals in $L^2(\mathbb{R})$. The resolution of these continuous signals is modelled through wavelet analysis. Indeed, the higher the resolution of a signal, the finer the scale of the space to which the signal belongs. The idea to link multi-resolution analysis to neural networks is partially motivated by scattering networks [17].

##### The spaces.

In the discrete formulation, the intermediate spaces , with , describe vectors of increasing resolution. In the continuous setting, it is natural to replace these spaces by using a MRA of [20, 32, 43] (see Definition 1 in Appendix A.1), namely, using the spaces

$$V_{j_1}\subset V_{j_2}\subset\dots\subset V_{j_L},$$

with , representing an increasing (finite) sequence of scales. We have that if and only if , so that contains signals at a resolution that is twice that of the signals in . Thus, the relation between the indexes and is

$$\alpha_l=2^{\nu}\alpha_{l-1}\iff j_l=\nu+j_{l-1},$$

where is a free parameter, or, equivalently,

$$j_l-j_{l-1}=\log_2\frac{\alpha_l}{\alpha_{l-1}}=\log_2(s^{-1}). \tag{2}$$

Similarly to the discrete case, the intermediate spaces are for , with . The norm in these spaces is

$$\|f\|_2^2=\sum_{i=1}^{c_l}\|f_i\|^2_{L^2(\mathbb{R})}=\sum_{i=1}^{c_l}\int_{\mathbb{R}}|f_i(x)|^2\,dx,\qquad f\in(V_{j_l})^{c_l}.$$

##### The nonlinearities.

The nonlinearities act on functions in by pointwise evaluation:

$$\sigma_l(f)(x)=\sigma_l(f(x)),\qquad\text{a.e. }x\in\mathbb{R}. \tag{3}$$

Note that this map is well defined if there exists such that for every . Indeed, in this case, for . Moreover, if a.e., then a.e. It is worth observing that, in general, this nonlinearity does not preserve the spaces , namely, . However, in the case when the MRA is associated to the Haar wavelet, the spaces consist of dyadic step functions, and so they are preserved by the action of .

##### The fully connected layer.

The map in the first layer is given by

$$\Psi_1=F\,\cdot+b, \tag{4}$$

where $F\colon\mathbb{R}^S\to(V_{j_1})^{c_1}$ is a linear map and $b\in(V_{j_1})^{c_1}$.

##### The convolutional layers.

We first need to model the stride in the continuous setting. A convolution with stride that maps functions from the scale to the scale with filter can be seen as the map

$$\cdot\,*_{j+\nu\to j}\,g\colon L^2(\mathbb{R})\to L^2(\mathbb{R}),\qquad f*_{j+\nu\to j}g=P_{V_j}\big(P_{V_{j+\nu}}f*g\big),$$

where denotes the continuous convolution and denotes the orthogonal projection onto the closed subspace . In other words,

$$\cdot\,*_{j+\nu\to j}\,g=P_{V_j}\circ(\cdot*g)\circ P_{V_{j+\nu}}.$$

As a consequence, the corresponding deconvolution (i.e. a convolution with stride ) is given by its adjoint, which can be easily computed since projections and convolutions are self-adjoint:

$$\cdot\,*_{j\to j+\nu}\,g=P_{V_{j+\nu}}\circ(\cdot*g)\circ P_{V_j}\colon L^2(\mathbb{R})\to L^2(\mathbb{R}).$$
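The adjoint relation between a strided convolution and the corresponding deconvolution can be checked numerically in a finite-dimensional analogue. The sketch below is only a discrete illustration under our own assumptions (full convolution, stride 2, random data); it is not the continuous operator defined above. It also shows that, for a general filter, the transpose acts as a correlation (convolution with the flipped filter), so the two coincide when the filter is symmetric.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
g = rng.standard_normal(3)                 # hypothetical 3-tap filter

# Matrix of the full convolution with g acting on R^n (shape (n + 2, n)).
C = np.stack([np.convolve(e, g) for e in np.eye(n)], axis=1)
# Strided convolution: convolve, then keep every second output sample.
A = C[::2, :]

x = rng.standard_normal(n)                 # fine-scale signal
y = rng.standard_normal(A.shape[0])        # coarse-scale signal

# Adjoint identity <A x, y> = <x, A^T y>.
lhs = np.dot(A @ x, y)
rhs = np.dot(x, A.T @ y)

# A^T y computed explicitly: insert zeros between samples, then correlate with g.
up = np.zeros(n + 2)
up[::2] = y
deconv = np.correlate(up, g, mode="valid")
```

Here `A.T @ y` and `deconv` agree, which is the discrete counterpart of reading the deconvolution as the adjoint of the strided convolution.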

We are now able to model a convolutional layer. The -th layer of a CGNN, for , is

$$\sigma_l\circ\bar{\Psi}_l\colon(L^2(\mathbb{R}))^{c_{l-1}}\to(L^2(\mathbb{R}))^{c_l},$$

where is the nonlinearity defined above and are the convolutions with stride . In view of the above discussion, and of the discrete counterpart explained in Appendix A.2, we define

$$\bar{\Psi}_l=P_{(V_{j_l})^{c_l}}\circ\Psi_l\circ P_{(V_{j_{l-1}})^{c_{l-1}}},$$

where the convolution is given by

$$(\Psi_l(x))_k:=\sum_{i=1}^{c_{l-1}}x_i*t^l_{i,k}+b^l_k,\qquad k=1,\dots,c_l, \tag{5}$$

with filters and biases .

##### Summing up.

Altogether, the full architecture in the continuous setting may be written as

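$$
\begin{aligned}
G\colon\mathbb{R}^S
&\xrightarrow[\text{f.c.}]{\Psi_1}(V_{j_1})^{c_1}
\xrightarrow[\text{nonlin.}]{\sigma_1}(L^2(\mathbb{R}))^{c_1}
\xrightarrow[\text{proj.}]{P_{(V_{j_1})^{c_1}}}(V_{j_1})^{c_1}\\
&\xrightarrow[\text{conv.}]{\Psi_2}(L^2(\mathbb{R}))^{c_2}
\xrightarrow[\text{proj.}]{P_{(V_{j_2})^{c_2}}}(V_{j_2})^{c_2}
\xrightarrow[\text{nonlin.}]{\sigma_2}(L^2(\mathbb{R}))^{c_2}
\xrightarrow[\text{proj.}]{P_{(V_{j_2})^{c_2}}}(V_{j_2})^{c_2}\\
&\xrightarrow[\text{conv.}]{\Psi_3}\cdots\cdots
\xrightarrow[\text{conv.}]{\Psi_L}L^2(\mathbb{R})
\xrightarrow[\text{proj.}]{P_{V_{j_L}}}V_{j_L}
\xrightarrow[\text{nonlin.}]{\sigma_L}L^2(\mathbb{R})
\xrightarrow[\text{proj.}]{P_{V_{j_L}}}V_{j_L},
\end{aligned}
$$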

which can be summarized as

$$G=\Big(\bigcirc_{l=L}^{2}\,\tilde{\sigma}_l\circ\tilde{\Psi}_l\Big)\circ(\tilde{\sigma}_1\circ\Psi_1), \tag{6}$$

where

$$\tilde{\Psi}_l:=P_{(V_{j_l})^{c_l}}\circ\Psi_l\colon(V_{j_{l-1}})^{c_{l-1}}\to(V_{j_l})^{c_l},\qquad l=2,\dots,L, \tag{7}$$

and

$$\tilde{\sigma}_l:=P_{(V_{j_l})^{c_l}}\circ\sigma_l\colon(V_{j_l})^{c_l}\to(V_{j_l})^{c_l},\qquad l=1,\dots,L. \tag{8}$$

##### A simple example: the Haar case.

Let be the scaling spaces of piecewise constant functions, i.e. consider an MRA with the Haar scaling function . This simple scaling function, in addition to naturally extending the discrete case to the continuous one, also makes it possible to simplify the structure of a CGNN. Indeed , thanks to the form of and the fact that and , defined as , have disjoint support for every . In this setting, the projections after the nonlinearities can be removed.
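The claim that the projection after the nonlinearity is redundant in the Haar case can be illustrated numerically. The sketch below is a discretized illustration under our own assumptions: functions on $[0,1)$ are sampled on a fine dyadic grid, the projection onto $V_j$ is implemented as averaging over dyadic intervals of length $2^{-j}$, and `tanh` stands in for a generic pointwise nonlinearity.

```python
import numpy as np

def project_haar(samples, j):
    """Projection onto the Haar space V_j on [0, 1): average over 2^j dyadic intervals."""
    blocks = samples.reshape(2**j, -1)
    means = blocks.mean(axis=1, keepdims=True)
    return np.broadcast_to(means, blocks.shape).reshape(-1)

fine, j = 10, 3                                # 2^10 samples, scale j = 3
rng = np.random.default_rng(0)
coeffs = rng.standard_normal(2**j)
f = np.repeat(coeffs, 2 ** (fine - j))         # a dyadic step function, i.e. an element of V_j
sigma = np.tanh                                # illustrative pointwise nonlinearity

# sigma(f) is still a step function on the same intervals: projecting changes nothing.
haar_preserved = np.allclose(project_haar(sigma(f), j), sigma(f))

# For a function outside V_j the projection does change the result.
x = (np.arange(2**fine) + 0.5) / 2**fine
h = np.sin(2 * np.pi * x)
smooth_preserved = np.allclose(project_haar(sigma(h), j), sigma(h))
```

In this sketch `haar_preserved` is true while `smooth_preserved` is false, matching the observation that only piecewise constant spaces are preserved by a pointwise nonlinearity.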

## 3 Injectivity of CGNNs

We are interested in studying the injectivity of the continuous generator (6) to guarantee uniqueness in the representation of the signals. The injectivity will also allow us, as a by-product, to obtain stability results for inverse problems using generative models, as in Section 4.

We consider here the 1D case with stride $s=\frac12$; then, applying (2) iteratively, we obtain $j_l=j_1+l-1$ for $l=1,\dots,L$. We also consider non-expansive convolutional layers, i.e. $c_l=c_{l-1}/2$. We note that the same result holds also with expansive convolutional layers, arbitrary stride (possibly dependent on $l$) and in the 2D case (see Appendix A.4).

We make the following assumptions.

##### Assumptions on the scaling spaces Vjl
###### Hypothesis 1.

The spaces , with , belong to an MRA (see Definition 1 in Appendix A.1), whose scaling function is compactly supported and bounded. Furthermore, there exists such that

$$\int_{\mathbb{R}^2}\phi(t)\,\phi(z)\,\phi(2t+z-r)\,dz\,dt\neq 0. \tag{9}$$

###### Remark 1 (Haar and Daubechies scaling functions).

For positive functions, such as the Haar scaling function, i.e. , condition (9) is easily satisfied. For the Daubechies scaling functions with vanishing moments for ( corresponds to the Haar scaling function), we verified condition (9) numerically. We believe that this condition is satisfied for every scaling function , but have not been able to prove this rigorously.
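A numerical check of condition (9) in the Haar case can be sketched as follows. We assume the standard Haar scaling function $\phi=\mathbb{1}_{[0,1)}$ and pick $r=1$; the midpoint quadrature, the integration window and the grid size are our own illustrative choices and not the procedure used by the authors. For this choice the integral can also be computed by hand and equals $1/2$.

```python
import numpy as np

def haar_phi(x):
    """Haar scaling function: indicator of [0, 1)."""
    return ((x >= 0) & (x < 1)).astype(float)

def condition_9(phi, r, lo=-1.0, hi=2.0, n=1200):
    """Midpoint-rule approximation of the double integral in condition (9)."""
    h = (hi - lo) / n
    pts = lo + h * (np.arange(n) + 0.5)
    t, z = np.meshgrid(pts, pts, indexing="ij")
    return np.sum(phi(t) * phi(z) * phi(2 * t + z - r)) * h * h

val = condition_9(haar_phi, r=1)
print(val)   # close to 0.5, in particular nonzero
```

The same function could be reused with a sampled Daubechies scaling function in place of `haar_phi`, provided the integration window covers its support.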

##### Assumptions on the convolutional filters

The following hypothesis asks that, at each convolutional layer, the filters are compactly supported with the same support, where represents the filters’ size. Generally, the convolutional filters act locally, so it is natural to assume that they have compact support. Furthermore, we ask the filters to be linearly independent, in a suitable sense: this is needed for the injectivity of the convolutional layers.

###### Hypothesis 2.

Let . For every , the convolutional filters of the -th convolutional layer (5) satisfy

$$t^l_{i,k}=\sum_{p=0}^{\bar{p}}d^l_{p,i,k}\,\phi_{j_l,p},\qquad i=1,\dots,c_{l-1},\quad k=1,\dots,c_l, \tag{10}$$

where , and , where is the matrix defined by

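$$(D^l)_{i,k}:=\begin{cases}d^l_{0,i,k}, & k=1,\dots,\dfrac{c_1}{2^l},\\[2mm] d^l_{1,i,k-\frac{c_1}{2^l}}, & k=\dfrac{c_1}{2^l}+1,\dots,\dfrac{c_1}{2^{l-1}}.\end{cases} \tag{11}$$
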
###### Remark 2.

The condition is sufficient for the injectivity of the convolutional layers, but not necessary. The necessary condition is given in Appendix A.3, and consists in requiring the rank of a certain block matrix to be maximum, in which is simply the first block. We note that this condition is independent of the scaling function , but depends only on the filters’ coefficients .

###### Remark 3 (Analogy between continuous and discrete case).

The splitting of the filters’ scaling coefficients into odd and even entries, as in Hypothesis 2, is reminiscent of the expression of the discrete convolution with stride $\frac12$. Indeed, a discrete filter is split into two parts, one containing its even entries and one containing its odd entries (see Appendix A.2).
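The even/odd splitting mentioned in Remark 3 can be made concrete in the discrete setting. The following sketch assumes the usual "zero-insertion then convolution" form of a stride-$\frac12$ convolution; the names `t_even` and `t_odd` and the random data are ours. It verifies that the even and odd output samples are obtained by ordinary convolutions with the even and odd filter entries, respectively.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(6)            # coarse input signal
t = rng.standard_normal(4)            # hypothetical discrete filter
t_even, t_odd = t[0::2], t[1::2]      # even-indexed and odd-indexed filter entries

# Stride-1/2 convolution: insert a zero between consecutive samples, then convolve.
up = np.zeros(2 * len(x) - 1)
up[::2] = x
y = np.convolve(up, t)

# The same output from two stride-1 convolutions, to be interleaved.
y_even = np.convolve(x, t_even)       # gives y[0], y[2], y[4], ...
y_odd = np.convolve(x, t_odd)         # gives y[1], y[3], y[5], ...
```

This polyphase view is why conditions on the fine-scale layer can be phrased in terms of two blocks of coefficients, one per parity.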

##### Assumptions on the nonlinearity

For simplicity, we consider the same nonlinearity in each layer (the generalization to the general case is straightforward). The following conditions guarantee that is injective.

###### Hypothesis 3.

We assume that

1. is injective and for every , for some ;

2. and preserves the sign, i.e.  for every ;

Note that these conditions ensure that for every , for some , and so for every . It is also straightforward to check that the injectivity of ensures the injectivity of

$$\sigma_l\colon(V_{j_l})^{c_l}\to(L^2(\mathbb{R}))^{c_l},\qquad f\mapsto\sigma_l(f).$$

###### Remark 4.

In the Haar case, the projection after the nonlinearity can be removed, as explained at the end of Section 2.2, and we need to verify only the injectivity of instead of that of . As noted above, a sufficient condition to guarantee the injectivity of is the injectivity of . So, in the Haar case, Hypothesis 3 can be relaxed and replaced by:

1. is injective;

2. There exists such that for every .

Hypothesis 3 is satisfied for example by the function . Its relaxed version, in the Haar case, is satisfied by some commonly used nonlinearities, such as the Sigmoid, the Hyperbolic tangent, the Softplus, the Exponential linear unit (ELU) and the Leaky rectified linear unit (Leaky ReLU). Our approach does not allow us to consider non-injective nonlinearities, such as the ReLU [49].
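The difference between the injective activations listed above and the ReLU can be seen with a quick numerical check of strict monotonicity. This is only a sanity check on a finite grid (an assumption of ours, not a proof), and the slope `0.01` for the Leaky ReLU is an illustrative default.

```python
import numpy as np

activations = {
    "sigmoid": lambda x: 1 / (1 + np.exp(-x)),
    "tanh": np.tanh,
    "softplus": lambda x: np.log1p(np.exp(x)),
    "elu": lambda x: np.where(x > 0, x, np.expm1(np.minimum(x, 0))),
    "leaky_relu": lambda x: np.where(x > 0, x, 0.01 * x),
    "relu": lambda x: np.maximum(x, 0),
}

x = np.linspace(-5, 5, 2001)
# A strictly increasing function of one real variable is injective.
strictly_increasing = {
    name: bool(np.all(np.diff(f(x)) > 0)) for name, f in activations.items()
}
print(strictly_increasing)
```

Only the ReLU fails the check, since it is constant on the negative half-line.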

For simplicity, in Hypothesis 3, we require that $\sigma\in C^1(\mathbb{R})$ and that $\sigma'$ is strictly positive everywhere. This allows us to use Hadamard’s global inverse function theorem [26] to obtain the injectivity of the generator. However, thanks to a generalized version of Hadamard’s theorem [48], we expect to be able to relax the conditions by requiring only that $\sigma$ is Lipschitz and its generalized derivative is strictly positive everywhere. In this way, the Leaky ReLU would satisfy the assumptions.

##### Assumptions on the fully connected layer

We impose the following natural hypothesis on the fully connected layer.

###### Hypothesis 4.

We assume that

1. The linear function is injective;

2. There exists such that and .

The inclusion is natural, since we start with low-resolution signals. The second condition in Hypothesis 4 means that the image of the first layer, , contains only compactly supported functions with the same support. This is natural since we deal with signals of finite size.

Even the injectivity of is non-restrictive, since we choose the dimension of the latent space to be much smaller than the dimension of , which is . So, maps a low-dimensional space into a higher dimensional one.
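In a discretized setting, the injectivity of the linear part of the first layer amounts to a full column rank condition, which is easy to test. The sketch below uses hypothetical sizes (`S = 8`, output dimension `64`) and a random matrix standing in for a trained $F$.

```python
import numpy as np

rng = np.random.default_rng(0)
S, dim_out = 8, 64                       # illustrative latent and output dimensions
F = rng.standard_normal((dim_out, S))    # stand-in for the trained linear map

# A linear map is injective exactly when its matrix has full column rank.
is_injective = bool(np.linalg.matrix_rank(F) == S)

# Counterexample: duplicating a column destroys injectivity.
F_bad = F.copy()
F_bad[:, 1] = F_bad[:, 0]
is_injective_bad = bool(np.linalg.matrix_rank(F_bad) == S)
```

A generic tall matrix passes this test, which is consistent with the remark that the hypothesis is not restrictive when the latent dimension is small.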

##### The injectivity theorem
###### Theorem 1.

Let and . Let , and for every . Let be the scaling function space arising from an MRA, and for every , and . Let and be defined as in (7) and (8), respectively. Let be defined as in (4). If Hypotheses 1, 2, 3 and 4 are satisfied, then the generator defined in (6) is injective.

###### Sketch of the proof.

Consider (6), (7) and (8). Note that is injective by Hypothesis 4. If we also show that is injective for every and that is injective for every , then the injectivity of will immediately follow.

The injectivity of is a consequence of Hypothesis 2 (together with Hypothesis 1). The injectivity of follows from Hypothesis 3 (together with Hypotheses 1 and 4) and from Hadamard’s global inverse function theorem applied to . The full proof is presented in Appendix A.3. ∎

## 4 Stability of inverse problems with generative models

We now show how an injective CGNN can be used to solve ill-posed inverse problems. The purpose of a CGNN is to reduce the dimensionality of the unknown to be determined, and the injectivity is the main ingredient to obtain a rigorous stability estimate.

We consider an inverse problem of the form

$$y=F(x), \tag{12}$$

where $F\colon X\to Y$ is a possibly nonlinear map between Banach spaces $X$ and $Y$, modeling a measurement (forward) operator, $x\in X$ is a quantity to be recovered and $y\in Y$ is the noisy data. Typical inverse problems are ill-posed (e.g. CT, accelerated MRI or electrical impedance tomography), meaning that the noise in the measurements is amplified in the reconstruction. For instance, in the linear case, this instability corresponds to having an unbounded (namely, not Lipschitz) inverse $F^{-1}$. The ill-posedness is classically tackled by using regularization, which often leads to an iterative method, such as the gradient-type Landweber algorithm [23]. This can be very expensive if $X$ has a large dimension.

However, in most of the inverse problems of interest, the unknown $x$ can be modeled as an element of a low-dimensional manifold in $X$. We choose to use a generator $G\colon\mathbb{R}^S\to X$ to perform this dimensionality reduction and therefore our problem reduces to finding $z\in\mathbb{R}^S$ such that

$$y=F(G(z)). \tag{13}$$

In practice, the map $G$ is found via an unsupervised training procedure, starting from a training dataset. From the computational point of view, solving (13) with an iterative method is clearly more advantageous than solving (12), because $z$ belongs to a lower dimensional space. We note that the idea of solving inverse problems using deep generative models has been considered in [16, 55, 9, 34, 33, 46, 57, 11].
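To make the reformulation (13) concrete, the sketch below runs a gradient-type iteration in the latent variable on a toy problem. Everything in it is an illustrative assumption of ours: the generator is a single Leaky ReLU layer with random weights (not a trained CGNN), the forward operator is a small circular blur, and a backtracking step size replaces the fixed step of the classical Landweber scheme so that the data misfit never increases.

```python
import numpy as np

rng = np.random.default_rng(0)
S, n = 4, 32                                   # latent and signal dimensions (illustrative)
W, b = rng.standard_normal((n, S)), rng.standard_normal(n)

def G(z):
    """Toy injective generator: Leaky ReLU applied to an injective affine map."""
    u = W @ z + b
    return np.where(u > 0, u, 0.2 * u)

def G_jac(z):
    """Jacobian of G at z (defined away from the kinks of the Leaky ReLU)."""
    u = W @ z + b
    return np.where(u > 0, 1.0, 0.2)[:, None] * W

# Toy linear forward operator: circular moving-average blur.
A = sum(k * np.roll(np.eye(n), s, axis=1)
        for k, s in zip((0.25, 0.5, 0.25), (-1, 0, 1)))

z_true = rng.standard_normal(S)
y = A @ G(z_true) + 0.01 * rng.standard_normal(n)   # noisy data

def loss(z):
    return 0.5 * np.sum((A @ G(z) - y) ** 2)

z = np.zeros(S)
loss_init = loss(z)
for _ in range(500):
    grad = G_jac(z).T @ (A.T @ (A @ G(z) - y))
    step = 1.0
    # Backtracking: halve the step until the misfit strictly decreases.
    while step > 1e-12 and loss(z - step * grad) >= loss(z):
        step *= 0.5
    if step <= 1e-12:
        break
    z = z - step * grad
loss_final = loss(z)
```

The iteration only ever updates the $S$ latent coordinates, which is where the computational advantage over iterating on the full signal comes from.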

The dimensionality reduction given by the composition of the forward operator with a generator, as in (13), has a regularizing/stabilizing effect that we aim to quantify. More precisely, we show that an injective CGNN yields a Lipschitz stability result for the inverse problem (13); in other words, the inverse map is Lipschitz continuous, and noise in the data is not amplified in the reconstruction. For simplicity, we consider the 1D case with stride $\frac12$ and non-expansive convolutional layers, but the result can be extended to the 2D case and arbitrary stride as done in Appendix A.4 for Theorem 1.

###### Theorem 2.

Let and be a CGNN satisfying Hypotheses 1, 2, 3 and 4. Let , be a compact set, be a Banach space and be a map (possibly nonlinear). Assume that is injective and is injective for every . Then there exists a constant such that

$$\|x-y\|_X\le C\,\|F(x)-F(y)\|_Y,\qquad x,y\in K.$$

The proof of Theorem 2 can be found in Appendix A.5 and is mostly based on Theorem 1 and [1, Theorem 2.2]. This Lipschitz estimate can also be obtained in the case when only finite measurements are available, i.e. a suitable finite-dimensional approximation of , thanks to [1, Theorem ].

## 5 Numerical results

We present here numerical results validating our theoretical findings. In Section 5.1 we describe how we train a CGNN, in Section 5.2 we apply a CGNN-based reconstruction algorithm to image deblurring, and in Section 5.3 we show qualitative results for generation and reconstruction purposes. Additional numerical simulations are included in Appendix A.6.

### 5.1 Training

The conditions for injectivity given in Theorem 1 are not very restrictive and we can use an unsupervised training protocol to choose the parameters of a CGNN. Even though our theoretical results concern only the injectivity of CGNNs, we numerically verified that training a generator to also approximate a probability distribution well gives better reconstructions for inverse problems. For this reason, we choose to train CGNNs as parts of variational autoencoders (VAEs) [35], a popular architecture for generative modeling. In particular, our VAEs are designed so that the corresponding decoder has a CGNN architecture. We refer to [36] for a thorough review of VAEs and to Appendix A.6 for more details on the numerical implementation. Note that there is growing numerical evidence showing that an untrained convolutional network is competitive with trained ones, when solving inverse problems with generative priors [57, 11].

The training is done on images in all our examples. We use the MNIST dataset of handwritten digits [40], which contains 60,000 images in the training set and 10,000 in the test set. Each example is a $28\times 28$ grayscale image. Since our training is unsupervised, we do not use any label information.

Our VAE includes a decoder with non-expansive convolutional layers and an encoder with a similar, yet mirrored, structure. More precisely, the decoder is composed by a fully connected layer, which maps vectors from the latent space (for different ) to channels of size . Then, there are three convolutional layers with stride . In each layer the size of the images doubles both in width and in height, while the number of channels is divided by . We eventually obtain one image of size , from which we extract a image, by removing the external rows and columns. At each layer, the same pointwise nonlinearity is applied.
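The bookkeeping of channels and image sizes through the decoder can be traced with a few lines of code. The concrete numbers below (64 channels of size $4\times 4$ after the fully connected layer, a division of the channels by 4 per layer, a final $32\times 32$ image cropped to the $28\times 28$ MNIST format) are assumptions chosen to be consistent with the description above; they are not values stated in the text.

```python
def decoder_shapes(channels=64, size=4, n_layers=3, final_size=28):
    """Trace (channels, height, width) through a hypothetical non-expansive decoder."""
    shapes = [(channels, size, size)]          # output of the fully connected layer
    for _ in range(n_layers):
        # Each layer doubles width and height and divides the channels by 4.
        channels, size = channels // 4, size * 2
        shapes.append((channels, size, size))
    # Remove the external rows and columns to reach the target image size.
    crop = (size - final_size) // 2
    shapes.append((channels, size - 2 * crop, size - 2 * crop))
    return shapes

shapes = decoder_shapes()
print(shapes)
```

With these illustrative values every convolutional layer keeps the total number of entries fixed at 1024, which is what "non-expansive" means here.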

The training is done with the Adam optimizer using a learning rate of and the loss function, commonly used for VAEs, given by the weighted sum of two terms: the Binary Cross Entropy (BCE) between the original and the generated images and the Kullback-Leibler Divergence (KLD) between the standard Gaussian distribution and the one generated by the encoder in the latent space.

Footnote 1: All computations were implemented with Python3, running on a workstation with 256GB of RAM and 2.2 GHz AMD EPYC 7301 CPU and Quadro RTX 6000 GPU with 22GB of memory. All the codes are available at https://github.com/ContGenMod/Continuous-Generative-Neural-Network.
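The loss described above can be written down in a few lines. The following NumPy sketch assumes the usual VAE setting in which the encoder outputs the mean `mu` and log-variance `logvar` of a diagonal Gaussian; the weight `beta` and the clipping constant `eps` are our own illustrative parameters, and the authors' implementation may differ.

```python
import numpy as np

def vae_loss(x, x_rec, mu, logvar, beta=1.0, eps=1e-7):
    """Weighted sum of binary cross entropy and KL divergence to N(0, I)."""
    x_rec = np.clip(x_rec, eps, 1 - eps)       # avoid log(0)
    bce = -np.sum(x * np.log(x_rec) + (1 - x) * np.log(1 - x_rec))
    # Closed-form KL divergence between N(mu, diag(exp(logvar))) and N(0, I).
    kld = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar))
    return bce + beta * kld, bce, kld

rng = np.random.default_rng(0)
x = rng.random(784)                            # image with values in [0, 1]
x_rec = rng.random(784)                        # hypothetical reconstruction
total, bce, kld = vae_loss(x, x_rec, mu=np.zeros(16), logvar=np.zeros(16))
```

When the encoder outputs exactly the standard Gaussian (`mu = 0`, `logvar = 0`) the KL term vanishes and only the reconstruction term remains.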

The injectivity of the decoder, i.e. the generator, is guaranteed if the hypotheses of Theorem 1 are satisfied, where Hypothesis 3 can be relaxed as in Remark 4, since, in practice, we are in the Haar case. We test Hypothesis 2 a posteriori, i.e. after the training, and, in our cases, it is always satisfied.

### 5.2 Deblurring with generative models

In Figure 2 we numerically show that combining Landweber iterations with a CGNN provides better and more stable results than the standard Landweber scheme for a classical imaging inverse problem: image deblurring with a Gaussian blurring filter. In other words, we solve (13) instead of (12), where $G$ is a CGNN. More details on both algorithms can be found in Appendix A.6.

The ground truth is a handwritten digit taken from the MNIST test set. The data is obtained by blurring the ground truth with a fixed Gaussian blurring operator at four different levels of additive Gaussian noise. We compare the original image, the corrupted one, the reconstruction obtained by Landweber iterations, and the one obtained by our approach combining Landweber with a CGNN, by measuring the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM). We choose a CGNN with the architecture explained in Section 5.1 with the Leaky ReLU activation function and the latent space dimension . For all noise levels, our CGNN-based algorithm clearly outperforms the classical one with respect to both indicators, even though at low levels of noise the Landweber method recovers higher resolution details. We verified that choosing different activation functions does not change the reconstruction quality significantly.
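For reference, the PSNR used in this comparison can be computed as follows. This is the standard definition, sketched under the assumption that images are scaled to $[0,1]$ (so `max_value = 1.0`); the function name is ours.

```python
import numpy as np

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")                    # identical images
    return 10 * np.log10(max_value**2 / mse)

reference = np.zeros((28, 28))
estimate = np.full((28, 28), 0.1)              # constant error of 0.1
value = psnr(reference, estimate)
```

A uniform error of 0.1 on a unit-range image gives a mean squared error of 0.01, i.e. a PSNR of 20 dB; higher values indicate a reconstruction closer to the reference.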

### 5.3 Generation

To assess the quality of the generation, in Figures 2(a) and 2(b) we show random samples of images from trained 2D injective CGNNs with different activation functions (see Appendix A.6). We consider the ReLU, which is not injective, the Leaky ReLU, which is injective but not , the ELU function, which is injective, but does not satisfy for some (see Condition 1 of Hypothesis 3), and a nonlinearity satisfying Hypothesis 3: . In order to improve the quality of the generation of new samples, we reweight the loss function as . We compare two different latent space dimensions: and . We qualitatively observe that has a generative power comparable to other common nonlinearities.

To evaluate the reconstruction power of our trained VAE, we compute the mean and the variance, over the images of the test set, of the SSIM between the true image and the reconstructed one, obtained by applying the full VAE to the true image. Here, we fix the nonlinearity and the loss function and we consider different latent space dimensions (see Figure 2(c)). We observe that the SSIM increases significantly up to .

## 6 Conclusions

In this work, we have introduced CGNNs, a family of generative models in the continuous, infinite-dimensional, setting, generalizing popular architectures such as DCGANs [51]. We have shown that, under natural conditions on the weights of the networks and on the nonlinearity, a CGNN is globally injective. This allowed us to obtain a Lipschitz stability result for (possibly nonlinear) ill-posed inverse problems, with unknowns belonging to the manifold generated by a CGNN.

The main mathematical tool used is wavelet analysis and, in particular, a multi-resolution analysis of $L^2(\mathbb{R})$. While wavelets yield the simplest multi-scale analysis, they are suboptimal when dealing with images. So, it would be interesting to consider CGNNs with other systems, such as curvelets [18] or shearlets [39], more suited to higher-dimensional signals.

For simplicity, we considered only the case of a smooth nonlinearity $\sigma$: we leave the investigation of Lipschitz nonlinearities to future work. This would allow for including many commonly used activation functions. Some simple illustrative numerical examples (only in the case of the Haar wavelet) are included in this work, which was mainly focused on the theoretical properties of CGNNs. It would be interesting to perform more extensive numerical simulations in order to better evaluate the performance of CGNNs, also with other types of wavelets, e.g. with the Daubechies wavelets, and with nonlinear inverse problems, such as electrical impedance tomography.

## Acknowledgments

This material is based upon work supported by the Air Force Office of Scientific Research under award number FA8655-20-1-7027. The authors are members of the “Gruppo Nazionale per l’Analisi Matematica, la Probabilità e le loro Applicazioni”, of the “Istituto Nazionale di Alta Matematica”.

## References

• [1] G. S. Alberti, Á. Arroyo, and M. Santacesaria. Inverse problems on low-dimensional manifolds. arXiv preprint arXiv:2009.00574, 2020.
• [2] G. S. Alberti, P. Campodonico, and M. Santacesaria. Compressed sensing photoacoustic tomography reduces to compressed sensing for undersampled Fourier measurements. SIAM Journal on Imaging Sciences, 14(3):1039–1077, 2021.
• [3] G. S. Alberti and M. Santacesaria. Calderón’s inverse problem with a finite number of measurements. In Forum of Mathematics, Sigma, volume 7. Cambridge University Press, 2019.
• [4] G. S. Alberti and M. Santacesaria. Infinite-dimensional inverse problems with finite measurements. Archive for Rational Mechanics and Analysis, 243(1):1–31, 2022.
• [5] G. Alessandrini, M. V. de Hoop, R. Gaburro, and E. Sincich. Lipschitz stability for the electrostatic inverse boundary value problem with piecewise linear conductivities. Journal de Mathématiques Pures et Appliquées, 107(5):638–664, 2017.
• [6] G. Alessandrini and S. Vessella. Lipschitz stability for the inverse conductivity problem. Advances in Applied Mathematics, 35(2):207–241, 2005.
• [7] J. Andén and S. Mallat. Deep scattering spectrum. IEEE Transactions on Signal Processing, 62(16):4114–4128, 2014.
• [8] T. Angles and S. Mallat. Generative networks as inverse problems with scattering transforms. arXiv preprint arXiv:1805.06621, 2018.
• [9] L. Ardizzone, J. Kruse, S. Wirkert, D. Rahner, E. W. Pellegrini, R. S. Klessen, L. Maier-Hein, C. Rother, and U. Köthe. Analyzing inverse problems with invertible neural networks. arXiv preprint arXiv:1808.04730, 2018.
• [10] S. Arridge, P. Maass, O. Öktem, and C.-B. Schönlieb. Solving inverse problems using data-driven models. Acta Numerica, 28:1–174, 2019.
• [11] M. Asim, F. Shamshad, and A. Ahmed. Blind image deconvolution using deep generative priors. IEEE Transactions on Computational Imaging, 6:1493–1506, 2020.
• [12] J. Behrmann, W. Grathwohl, R. T. Chen, D. Duvenaud, and J.-H. Jacobsen. Invertible residual networks. In International Conference on Machine Learning, pages 573–582. PMLR, 2019.
• [13] E. Beretta, M. V. de Hoop, E. Francini, S. Vessella, and J. Zhai. Uniqueness and Lipschitz stability of an inverse boundary value problem for time-harmonic elastic waves. Inverse Problems, 33(3):035013, 2017.
• [14] E. Beretta, E. Francini, A. Morassi, E. Rosset, and S. Vessella. Lipschitz continuous dependence of piecewise constant Lamé coefficients from boundary data: the case of non-flat interfaces. Inverse Problems, 30(12):125005, 2014.
• [15] E. Beretta, E. Francini, and S. Vessella. Lipschitz stable determination of polygonal conductivity inclusions in a two-dimensional layered medium from the Dirichlet-to-Neumann map. SIAM Journal on Mathematical Analysis, 53(4):4303–4327, 2021.
• [16] A. Bora, A. Jalal, E. Price, and A. G. Dimakis. Compressed sensing using generative models. In International Conference on Machine Learning, pages 537–546. PMLR, 2017.
• [17] J. Bruna and S. Mallat. Invariant scattering convolution networks. IEEE transactions on pattern analysis and machine intelligence, 35(8):1872–1886, 2013.
• [18] E. J. Candès and D. L. Donoho. New tight frames of curvelets and optimal representations of objects with piecewise singularities. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences, 57(2):219–266, 2004.
• [19] X. Cheng, X. Chen, and S. Mallat. Deep Haar scattering networks. Information and Inference: A Journal of the IMA, 5(2):105–133, 2016.
• [20] I. Daubechies. Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics, USA, 1992.
• [21] L. de Oliveira, M. Paganini, and B. Nachman. Learning particle physics by example: location-aware generative adversarial networks for physics synthesis. Computing and Software for Big Science, 1(1):1–24, 2017.
• [22] L. Dinh, J. Sohl-Dickstein, and S. Bengio. Density estimation using Real NVP. arXiv preprint arXiv:1605.08803, 2016.
• [23] H. W. Engl, M. Hanke, and A. Neubauer. Regularization of inverse problems, volume 375. Springer Science & Business Media, 1996.
• [24] C. Etmann, R. Ke, and C.-B. Schönlieb. iUNets: Fully invertible U-Nets with learnable up-and downsampling. arXiv preprint arXiv:2005.05220, 2020.
• [25] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. Advances in neural information processing systems, 27, 2014.
• [26] W. B. Gordon. On the diffeomorphisms of Euclidean space. The American Mathematical Monthly, 79(7):755–759, 1972.
• [27] D. Güera and E. J. Delp. Deepfake video detection using recurrent neural networks. In 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pages 1–6, 2018.
• [28] P. Hagemann, J. Hertrich, and G. Steidl. Stochastic normalizing flows for inverse problems: a Markov Chains viewpoint. arXiv preprint arXiv:2109.11375, 2021.
• [29] P. Hagemann and S. Neumayer. Stabilizing invertible neural networks using mixture models. Inverse Problems, 37(8):Paper No. 085002, 23, 2021.
• [30] C. Han, H. Hayashi, L. Rundo, R. Araki, W. Shimoda, S. Muramatsu, Y. Furukawa, G. Mauri, and H. Nakayama. GAN-based synthetic brain MR image generation. In 2018 IEEE 15th international symposium on biomedical imaging (ISBI 2018), pages 734–738. IEEE, 2018.
• [31] B. Harrach. Uniqueness and Lipschitz stability in electrical impedance tomography with finitely many electrodes. Inverse problems, 35(2):024005, 2019.
• [32] E. Hernandez and G. Weiss. A First Course on Wavelets. Studies in Advanced Mathematics. CRC Press, 1996.
• [33] C. M. Hyun, S. H. Baek, M. Lee, S. M. Lee, and J. K. Seo. Deep learning-based solvability of underdetermined inverse problems in medical imaging. Medical Image Analysis, 69:101967, 2021.
• [34] C. M. Hyun, H. P. Kim, S. M. Lee, S. Lee, and J. K. Seo. Deep learning for undersampled MRI reconstruction. Physics in Medicine & Biology, 63(13):135007, 2018.
• [35] D. P. Kingma and M. Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
• [36] D. P. Kingma and M. Welling. An introduction to variational autoencoders. Foundations and Trends® in Machine Learning, 12(4):307–392, 2019.
• [37] K. Kothari, A. Khorashadizadeh, M. de Hoop, and I. Dokmanić. Trumpets: Injective flows for inference and inverse problems. In Uncertainty in Artificial Intelligence, pages 1269–1278. PMLR, 2021.
• [38] N. Kovachki, Z. Li, B. Liu, K. Azizzadenesheli, K. Bhattacharya, A. Stuart, and A. Anandkumar. Neural operator: Learning maps between function spaces. arXiv preprint arXiv:2108.08481, 2021.
• [39] D. Labate, W.-Q. Lim, G. Kutyniok, and G. Weiss. Sparse multidimensional representation using shearlets. In Wavelets XI, volume 5914, page 59140U. International Society for Optics and Photonics, 2005.
• [40] Y. LeCun, C. Cortes, and C. Burges. MNIST handwritten digit database, 2010.
• [41] Q. Lei, A. Jalal, I. S. Dhillon, and A. G. Dimakis. Inverting deep generative models, one layer at a time. Advances in neural information processing systems, 32, 2019.
• [42] L. Lu, P. Jin, G. Pang, Z. Zhang, and G. E. Karniadakis. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3):218–229, 2021.
• [43] S. Mallat. A Wavelet Tour of Signal Processing, Third Edition: The Sparse Way. Academic Press, Inc., USA, 3rd edition, 2008.
• [44] S. Mallat. Group invariant scattering. Communications on Pure and Applied Mathematics, 65(10):1331–1398, 2012.
• [45] M. Mardani, E. Gong, J. Y. Cheng, S. S. Vasanawala, G. Zaharchuk, L. Xing, and J. M. Pauly. Deep generative adversarial neural networks for compressive sensing MRI. IEEE Transactions on Medical Imaging, 38(1):167–179, 2019.
• [46] G. Ongie, A. Jalal, C. A. Metzler, R. G. Baraniuk, A. G. Dimakis, and R. Willett. Deep learning techniques for inverse problems in imaging. IEEE Journal on Selected Areas in Information Theory, 1(1):39–56, 2020.
• [47] S. Otten, S. Caron, W. de Swart, M. van Beekveld, L. Hendriks, C. van Leeuwen, D. Podareanu, R. Ruiz de Austri, and R. Verheyen. Event generation and statistical sampling for physics with deep generative models and a density information buffer. Nature communications, 12(1):1–16, 2021.
• [48] B. Pourciau. Global invertibility of nonsmooth mappings. Journal of mathematical analysis and applications, 131(1):170–179, 1988.
• [49] M. Puthawala, K. Kothari, M. Lassas, I. Dokmanić, and M. de Hoop. Globally injective ReLU networks. arXiv e-prints, pages arXiv–2006, 2020.
• [50] M. Puthawala, M. Lassas, I. Dokmanić, and M. de Hoop. Universal joint approximation of manifolds and densities by simple injective flows. arXiv preprint arXiv:2110.04227, 2021.
• [51] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
• [52] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational physics, 378:686–707, 2019.
• [53] D. Rezende and S. Mohamed. Variational inference with normalizing flows. In International conference on machine learning, pages 1530–1538. PMLR, 2015.
• [54] L. Ruthotto and E. Haber. An introduction to deep generative modeling. GAMM-Mitteilungen, 44(2):e202100008, 2021.
• [55] J. K. Seo, K. C. Kim, A. Jargal, K. Lee, and B. Harrach. A learning-based method for solving ill-posed nonlinear inverse problems: a simulation study of lung EIT. SIAM journal on Imaging Sciences, 12(3):1275–1295, 2019.
• [56] N. K. Singh and K. Raza. Medical image generation using generative adversarial networks: a review. Health Informatics: A Computational Perspective in Healthcare, pages 77–96, 2021.
• [57] D. Ulyanov, A. Vedaldi, and V. Lempitsky. Deep image prior. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 9446–9454, 2018.
• [58] W. P. Walters and R. Barzilay. Applications of deep learning in molecule generation and molecular property prediction. Accounts of chemical research, 54(2):263–270, 2020.
• [59] Y. Xu, K. Lin, S. Wang, L. Wang, C. Cai, C. Song, L. Lai, and J. Pei. Deep learning for molecular generation. Future medicinal chemistry, 11(6):567–597, 2019.

## Appendix A

### A.1 Wavelets

#### A.1.1 1D wavelet analysis

We give a brief review of concepts from wavelet analysis, in particular the definitions and the meaning of scaling function spaces and Multi-Resolution Analysis in the 1D case. See [20, 32, 43] for more details.

Given a function $\phi \in L^2(\mathbb{R})$, we define

$$\phi_{j,n}(x) = 2^{j/2}\,\phi(2^j x - n), \qquad x \in \mathbb{R}, \tag{14}$$

for every $j, n \in \mathbb{Z}$. The integers $j$ and $n$ are the scale and the translation parameters, respectively, where the scale is proportional to the speed of the oscillations of $\phi_{j,n}$ (the larger $j$, the finer the scale, the faster the oscillations).
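As a minimal numerical sketch of (14), the snippet below builds the dilated and translated copies of a scaling function and checks their orthonormality at a fixed scale. The Haar scaling function (the indicator of $[0,1)$) and the helper names `haar_phi`, `phi_jn` and `inner` are assumptions made only for this illustration; they are not taken from the paper.

```python
import numpy as np

def haar_phi(x):
    """Haar scaling function: indicator of [0, 1). Chosen only for illustration."""
    x = np.asarray(x, dtype=float)
    return ((x >= 0.0) & (x < 1.0)).astype(float)

def phi_jn(phi, j, n, x):
    """Dilated and translated copy phi_{j,n}(x) = 2^{j/2} phi(2^j x - n), as in (14)."""
    return 2.0 ** (j / 2.0) * phi(2.0 ** j * x - n)

def inner(f, g, dx):
    """Riemann-sum approximation of the L^2(R) inner product on a uniform grid."""
    return float(np.sum(f * g) * dx)

# Uniform midpoint grid on [-4, 4) with dyadic spacing, so that the supports
# of the Haar functions are unions of whole grid cells.
dx = 2.0 ** -10
x = np.arange(-4.0, 4.0, dx) + dx / 2

# Gram matrix of phi_{j,n}, n = -3, ..., 3, at the fixed scale j = 2.
j = 2
shifts = range(-3, 4)
gram = np.array([[inner(phi_jn(haar_phi, j, n, x), phi_jn(haar_phi, j, m, x), dx)
                  for m in shifts] for n in shifts])
print(np.round(gram, 6))
```

For the Haar choice the Gram matrix is the identity: functions at the same scale with different translations have disjoint supports, and the factor $2^{j/2}$ normalizes each of them in $L^2(\mathbb{R})$.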

###### Definition 1.

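A Multi-Resolution Analysis (MRA) is an increasing sequence of subspaces $\{V_j\}_j \subseteq L^2(\mathbb{R})$ defined for $j \in \mathbb{Z}$

$$\dots \subseteq V_{-1} \subseteq V_0 \subseteq V_1 \subseteq \dots$$

together with a function $\phi \in L^2(\mathbb{R})$ such that

1. $\bigcup_{j \in \mathbb{Z}} V_j$ is dense in $L^2(\mathbb{R})$ and $\bigcap_{j \in \mathbb{Z}} V_j = \{0\}$;

2. $f \in V_j$ if and only if $f(2\,\cdot) \in V_{j+1}$;

3. $\phi \in V_0$ and $\{\phi_{0,n}\}_{n \in \mathbb{Z}}$ is an orthonormal basis of $V_0$.

The function $\phi$ is called scaling function of the MRA.

From Points 2 and 3 of Definition 1, we have that $\{\phi_{j,n}\}_{n \in \mathbb{Z}}$ is an orthonormal basis of $V_j$ for every $j \in \mathbb{Z}$.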

Given a function $\phi$ with the properties below, it is possible to construct an MRA with $\phi$ as scaling function. Indeed, if we require that

1. are orthonormal;

2. is a convergent sum in and ;

3. is continuous in with ;

then together with define an MRA.

Intuitively, the space $V_j$ contains functions where the finest scale is $j$, which explains the name MRA.
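The nesting of the spaces can be made concrete with a small sketch. Assuming again the Haar scaling function (an illustrative choice, not one made in the text), the coarse function $\phi_{0,0}$ is expanded in the finer family $\{\phi_{1,n}\}_n$, and the expansion is checked to reproduce $\phi_{0,0}$, i.e. $\phi_{0,0}$ lies in the span of the finer-scale functions. The names `haar_phi` and `phi_jn` are hypothetical helpers.

```python
import numpy as np

def haar_phi(x):
    """Haar scaling function: indicator of [0, 1). Illustrative choice only."""
    x = np.asarray(x, dtype=float)
    return ((x >= 0.0) & (x < 1.0)).astype(float)

def phi_jn(j, n, x):
    """phi_{j,n}(x) = 2^{j/2} phi(2^j x - n) for the Haar scaling function."""
    return 2.0 ** (j / 2.0) * haar_phi(2.0 ** j * x - n)

# Midpoint grid with dyadic spacing on [-2, 2).
dx = 2.0 ** -10
x = np.arange(-2.0, 2.0, dx) + dx / 2

coarse = phi_jn(0, 0, x)

# Coefficients of phi_{0,0} with respect to the finer-scale functions phi_{1,n}.
coeffs = {n: float(np.sum(coarse * phi_jn(1, n, x)) * dx) for n in range(-2, 4)}

# Re-synthesize phi_{0,0} from the finer scale and measure the pointwise error.
expansion = sum(c * phi_jn(1, n, x) for n, c in coeffs.items())
residual = float(np.max(np.abs(coarse - expansion)))
print(coeffs, residual)
```

Only the coefficients for $n = 0$ and $n = 1$ are non-zero (both equal $1/\sqrt{2}$), which is the Haar instance of a coarse-scale function being a finite combination of finer-scale ones.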

#### A.1.2 2D wavelet analysis

From [43, Section ], we recall the principal concepts of 2D wavelet analysis. In 2D, the scaling function spaces become with , with orthonormal basis given by , where

$$\phi_{j,(n_1,n_2)}(x_1,x_2) = \phi_{j,n_1}(x_1)\,\phi_{j,n_2}(x_2),$$

and is an orthonormal basis of . We recall that , where . As in 1D, the MRA properties of Definition 1 hold:

1. is dense in and ;

2. if and only if ;

3. and is an orthonormal basis of .

Moreover, for every , as in 1D.

### A.2 Discrete strided convolutions

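A fractional-strided convolution with stride $s$ such that $s^{-1} \in \mathbb{N}$, $c_{\mathrm{in}}$ input channels and $c_{\mathrm{out}}$ output channels is defined by

$$(\Psi x)_k := \sum_{i=1}^{c_{\mathrm{in}}} x_i *_s t_{i,k} + b_k, \qquad k = 1, \dots, c_{\mathrm{out}}, \tag{15}$$

where $t_{i,k}$ are the convolutional filters and $b_k$ are the bias terms, for $i = 1, \dots, c_{\mathrm{in}}$ and $k = 1, \dots, c_{\mathrm{out}}$. The operator $*_s$ is defined as

$$(x *_s t)(n) := \sum_{m \in \mathbb{Z}} x(m)\, t(n - s^{-1} m), \tag{16}$$

where we extend the signals $x$ and $t$ to finitely supported sequences by defining them zero outside their supports, i.e. $x, t \in c_{00}(\mathbb{Z})$, where $c_{00}(\mathbb{Z})$ is the space of sequences with finitely many nonzero elements. We can rewrite (16) in the following way:

$$(x *_s t)(n) = (x * t_r)(k), \tag{17}$$

where $n = s^{-1} k + r$ with $r \in \{0, \dots, s^{-1} - 1\}$, and $t_r(k) := t(s^{-1} k + r)$ for every $k \in \mathbb{Z}$. The symbol $*$ represents the discrete convolution

$$x * t := \sum_{m \in \mathbb{Z}} x(m)\, t(\cdot - m), \qquad x, t \in c_{00}(\mathbb{Z}). \tag{18}$$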

The output belongs to . We motivate (16) by taking the adjoint of the strided convolution with stride , as we explain below.

Equation (17) is useful to interpret Hypothesis 2 on the convolutional filters (Section 3). We observe that in our case the convolution is well defined since the signals we consider have a finite number of non-zero entries. However, in general, it is enough to require that and with to obtain a well-defined discrete convolution.
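The relation between (16) and (17) can be checked numerically. The sketch below is an illustration under stated assumptions: finitely supported sequences are stored as NumPy arrays whose first entry has index $0$, the integer `p` stands for $s^{-1}$, and the function names are hypothetical. The first function evaluates (16) directly; the second assembles the same output from the ordinary convolutions with the sub-sampled filters $t_r$, as in (17).

```python
import numpy as np

def frac_strided_conv(x, t, p):
    """Direct evaluation of (16): out(n) = sum_m x(m) t(n - p*m), with p = 1/s."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    out = np.zeros(p * (len(x) - 1) + len(t))
    for m, xm in enumerate(x):
        # x(m) contributes a copy of the filter shifted by p*m.
        out[p * m : p * m + len(t)] += xm * t
    return out

def frac_strided_conv_polyphase(x, t, p):
    """Same output via (17): entries n = p*k + r come from x * t_r, t_r(k) = t(p*k + r)."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    out = np.zeros(p * (len(x) - 1) + len(t))
    for r in range(min(p, len(t))):
        out[r::p] = np.convolve(x, t[r::p])
    return out

x = np.array([1.0, 2.0, 3.0])
t = np.array([1.0, -1.0, 2.0, 0.5, 3.0])

y_direct = frac_strided_conv(x, t, p=2)        # stride s = 1/2
y_poly = frac_strided_conv_polyphase(x, t, p=2)
print(np.allclose(y_direct, y_poly))
```

With `p = 1` both functions reduce to the ordinary convolution (18), and for `p = 2` the output is roughly twice as long as the input, in line with the resolution-doubling interpretation of stride $1/2$.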

We now want to justify (16). We compute the adjoint of the convolutional operator with stride and filter , which is defined as , where

 (19)

As before, the signals and are seen as elements of by extending them to zero outside their supports and the symbol denotes the discrete convolution defined in (18). The adjoint of , , satisfies

$$\langle A^*_{s^{-1},t}\, y, x \rangle_2 = \langle y, A_{s^{-1},t}\, x \rangle_2, \tag{20}$$

where is the scalar product on . Using (20), we find that

 (21)

However, in order to be consistent with the definition in (19) when , we do not define the fractionally-strided convolution as the adjoint given by (21), but as in (16).
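A worked version of this computation can be sketched as follows. It rests on an assumption: that the strided operator is the discrete convolution (18) sampled every $s^{-1}$ entries, with real-valued sequences. Under that assumption, exchanging the two finite sums in the scalar product gives the adjoint directly.

```latex
\begin{align*}
(A_{s^{-1},t}\,x)(n) &:= (x * t)(s^{-1} n) = \sum_{m \in \mathbb{Z}} x(m)\, t(s^{-1} n - m), \\
\langle y, A_{s^{-1},t}\, x \rangle_2
  &= \sum_{n \in \mathbb{Z}} y(n) \sum_{m \in \mathbb{Z}} x(m)\, t(s^{-1} n - m)
   = \sum_{m \in \mathbb{Z}} x(m) \sum_{n \in \mathbb{Z}} y(n)\, t(s^{-1} n - m), \\
(A^*_{s^{-1},t}\, y)(m) &= \sum_{n \in \mathbb{Z}} y(n)\, t(s^{-1} n - m).
\end{align*}
```

Compared with (16), the adjoint evaluates the filter at $s^{-1} n - m$ instead of $n - s^{-1} m$, that is, with a reflected filter. For $s = 1$ it is a correlation rather than the convolution (18), which matches the reason given above for adopting (16) as the definition.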

Figure 4 presents a graphical illustration of three examples of strided convolutions with different strides: in Figure 3(a), in Figure 3(b), and in Figure 3(c). The input vector and the filter have a finite number of non-zero entries indicated with yellow and orange squares/rectangles, respectively, and the output has a finite number of non-zero entries indicated with red squares/rectangles. For simplicity, we identify the infinite vectors in with vectors in where is the number of their non-zero entries. Given the illustrative purpose of these examples, for simplicity we ignore boundary effects. The signals’ sizes are:

1. , input vector , output vector ;

2. , input vector , output vector ;

3. , input vector , output vector .

When the stride is an integer, equation (19) describes what is represented in Figures 3(a) and 3(b). When the stride is $\frac{1}{2}$, as depicted in Figure 3(c), it is intuitive to consider a filter whose entries are half the size of the input ones. This is equivalent to choosing the filter in a space of higher resolution with respect to the space of the input signal. As a result, the output belongs to the same higher resolution space. For instance, the filter belongs to a space that is twice the resolution of the input space when the stride is $\frac{1}{2}$. This notion of resolution is coherent with the scale parameter used in the continuous setting.

In fact, Figure 3(c) does not represent equation (16) exactly when $s = \frac{1}{2}$. However, the illustration is useful to model the fractional-strided convolution in the continuous setting. A more precise illustration of the $\frac{1}{2}$-strided convolution of equation (19) is presented in Figure 5.

### A.3 Proof of Theorem 1

We begin with some preliminary technical lemmas.

###### Lemma 3.

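Let Hypothesis 1 hold and let

$$\eta(r) := \int_{\mathbb{R}^2} \phi(t)\,\phi(z)\,\phi(2t + z - r)\,dz\,dt, \qquad r \in \mathbb{Z}. \tag{22}$$

Then

$$\hat{\eta}(\xi) \neq 0 \quad \text{for a.e. } \xi \in [0,1],$$

where $\hat{\eta}$ is the Fourier series of $\eta$, defined as

$$\hat{\eta}(\xi) := \sum_{r \in \mathbb{Z}} \eta(r)\, e^{-2\pi i \xi r}, \qquad \xi \in [0,1]. \tag{23}$$

###### Proof.

By Hypothesis 1, $\phi$ is compactly supported, thus the series in (23) has a finite number of non-zero terms. Then, $\hat{\eta}$ is an analytic function, with $\hat{\eta} \not\equiv 0$, since by Hypothesis 1, there exists $r \in \mathbb{Z}$ such that $\eta(r) \neq 0$. Since a non-zero analytic function has isolated zeros, $\hat{\eta}$ vanishes at most at finitely many points of $[0,1]$. Therefore $\hat{\eta}(\xi) \neq 0$ for a.e. $\xi \in [0,1]$. ∎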
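To make the statement of Lemma 3 concrete, the following sketch evaluates (22) and (23) for the Haar scaling function. This choice is an assumption made purely for illustration, since Hypothesis 1 is not verified here, and the helper names `eta` and `eta_hat` are hypothetical. For Haar, $\phi(t)\phi(z)$ is the indicator of the unit square, so (22) reduces to the area of the region where $2t + z - r \in [0,1)$.

```python
import numpy as np

def haar_phi(x):
    """Haar scaling function: indicator of [0, 1). Illustrative choice only."""
    x = np.asarray(x, dtype=float)
    return ((x >= 0.0) & (x < 1.0)).astype(float)

def eta(r, n_grid=400):
    """Midpoint-rule approximation of (22) for the Haar scaling function."""
    pts = (np.arange(n_grid) + 0.5) / n_grid   # midpoints of [0, 1), the support of phi
    t, z = np.meshgrid(pts, pts, indexing="ij")
    # phi(t) phi(z) = 1 on the unit square, so the integral is a mean over the grid.
    return float(np.mean(haar_phi(2.0 * t + z - r)))

def eta_hat(coeffs, xi):
    """Fourier series (23) of a finitely supported sequence given as {r: eta(r)}."""
    xi = np.asarray(xi, dtype=float)
    return sum(c * np.exp(-2j * np.pi * xi * r) for r, c in coeffs.items())

coeffs = {r: eta(r) for r in range(-1, 4)}
xi = np.linspace(0.0, 1.0, 201)
modulus = np.abs(eta_hat(coeffs, xi))
print(coeffs)
print(float(modulus.min()), float(xi[modulus.argmin()]))
```

In this example $\eta$ is supported on $r \in \{0, 1, 2\}$ with values $1/4$, $1/2$, $1/4$, so $\hat{\eta}(\xi) = \tfrac{1}{4}(1 + e^{-2\pi i \xi})^2$ and $|\hat{\eta}(\xi)| = \cos^2(\pi \xi)$. The series vanishes only at $\xi = 1/2$, a single point, which is the kind of almost-everywhere non-vanishing the lemma asserts.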

###### Lemma 4.

If Hypotheses 2 and 4 are satisfied, then the image of each layer of the generator defined in (6) contains only compactly supported functions with the same support, i.e.

 (2◯~l=l~σ~l∘~Ψ~l)∘(~σ1∘Ψ1)(RS)⊆Wl,l=2,...,L,(~σ1∘Ψ1)(RS)⊆W1,
 ~Ψl+1∘(2◯~l=l~σ~l∘~Ψ~l)∘(~σ1∘Ψ1)(RS)⊆Wl+1,l=2,