SUNLayer: Stable denoising with generative networks

It has been experimentally established that deep neural networks can be used to produce good generative models for real world data. It has also been established that such generative models can be exploited to solve classical inverse problems like compressed sensing and super resolution. In this work we focus on the classical signal processing problem of image denoising. We propose a theoretical setting that uses spherical harmonics to identify what mathematical properties of the activation functions will allow signal denoising with local methods.


1 Introduction

Deep neural networks, in particular generative adversarial networks [Goodfellow et al., 2014], have recently been used to produce generative models for real world data that can capture very complex structures. This is especially true for natural images (see for instance [Nguyen et al., 2016]). These generative priors have been successfully used to efficiently solve classical inverse problems in signal processing, like super resolution [Johnson et al., 2016] and compressed sensing [Bora et al., 2017]. The latter numerically demonstrates that the generative prior can be exploited to solve the compressed sensing problem with ten times fewer measurements than classic compressed sensing theory requires. Follow-up work by [Hand and Voroninski, 2017] explained the success of local methods (namely empirical risk minimization) in the compressed sensing task by assuming a generative model given by a multi-layer neural network with random weights and ReLU activation functions.

The aim of this paper is to propose a theoretical framework that will allow us to analyze neural networks in the context of another classical inverse problem in signal processing: signal denoising. It has been experimentally established that deep neural networks can be used for image inpainting and denoising [Xie et al., 2012]. We are interested in denoising in the high-noise regime, in which modern methods that do not rely on machine learning appear less capable. In this work we propose a simple model for the generative network in which linear maps are composed with non-linear activation functions, and we study what mathematical properties of the activation function allow signal denoising with local methods. We assume our generative model can be expressed as the composition of simple neural network layers we call SUNLayer, and we use tools from harmonic analysis to understand what the good properties of activation functions are for the denoising task. We perform numerical experiments to complement the theory.

1.1 Main contributions

The main contributions of this paper can be summarized in two points.

• We introduce SUNLayer, a simple model for spherical uniform neural network layers (Section 2).

• We prove performance guarantees for denoising with a generative network under the SUNLayer model. In particular, given a noisy observation y = θ∘f_{x♯} + η produced with SUNLayer for some activation function θ, we show that all critical points of the map x ↦ ‖θ∘f_x − y‖² are close to x♯, provided the activation function is well behaved and the noise is appropriately small (Section 4).

We believe the theoretical framework we introduce in this paper could be useful to provide mathematical intuition about neural networks in a more general context. See Section 6 for a more in-depth discussion.

2 SUNLayer: a neural network model

Let x ∈ S^n be an input signal. We consider the linear map f_x: S^n → ℝ given by f_x(y) = ⟨x, y⟩, the inner product in ℝ^{n+1} between x and y. Let θ: ℝ → ℝ be an activation function. We define one layer of the SUNLayer neural network to be

 L_n: S^n → L²(S^n), L_n(x) = θ∘f_x. (1)

Note that if instead of the linear map f_x we had considered, as one usually does in neural networks, a matrix W, then the analogue of L_n(x) is essentially θ(Wx), which can be seen as a function defined on the rows of W as w ↦ θ(⟨w, x⟩). The SUNLayer model heuristically generalizes the linear step to a continuum of possible rows.
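In numerical experiments one can approximate a SUNLayer output by evaluating θ∘f_x on a finite sample of directions. The following sketch (names and sampling choices are our own illustrative assumptions, not the paper's code) represents L_n(x) by its values on uniformly sampled points of S^n:

```python
import numpy as np

# Discretized sketch of one SUNLayer (names and sampling are illustrative
# assumptions, not the paper's code): the layer maps x in S^n to the function
# y -> theta(<x, y>), represented here by its values on sampled directions.

def sun_layer(x, sample_points, theta):
    """Evaluate theta(f_x) = theta(<x, .>) at each sampled direction."""
    return theta(sample_points @ x)

rng = np.random.default_rng(0)
n = 2                                     # the sphere S^2 in R^3
x = rng.normal(size=n + 1)
x /= np.linalg.norm(x)                    # input signal on S^n

pts = rng.normal(size=(5000, n + 1))      # uniform sample of S^n
pts /= np.linalg.norm(pts, axis=1, keepdims=True)

theta = lambda t: t ** 3                  # a polynomial activation
layer_out = sun_layer(x, pts, theta)      # values of theta o f_x on the sample
print(layer_out.shape)                    # (5000,)
```

The Gaussian-normalization trick samples the sphere uniformly, so averages over `pts` approximate integrals over S^n.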

We are interested in the case where θ∘f_x belongs to V, a finite dimensional subspace of L²(S^n) (and therefore locally compact). The finite dimensionality will allow us to compose several layers of the SUNLayer model. A very simple observation (see proof of Lemma 1) shows that ‖θ∘f_x‖ = c_{n,θ} for all x ∈ S^n, where c_{n,θ} is a constant that depends on the activation function θ and on the dimension n of the domain. Therefore the normalization step (which a priori may have resembled practice standards like batch normalization [Ioffe and Szegedy, 2015]) amounts to a simple rescaling, and furthermore we even have c_{n,θ} = 1 when θ is scaled appropriately (see Lemma 3).

We then conclude that the composition of SUNLayer layers is well defined as long as V is finite dimensional. In Section 4 we observe that a necessary and sufficient condition for V to be finite dimensional is that θ is a polynomial.

2.1 Denoising

Let us assume we have a generative model G that, given a parameter x, produces G(x), an element of a target space (for instance an image)³. The question we aim to answer is: when is it possible to denoise an element y to the closest element in the image of G using local methods like gradient descent? Figure 1 shows an example of the phenomenon we aim to explain.

³ The generative model could have been produced, for instance, with a generative adversarial network (GAN) trained on a large set of images or, more generally, a structured dataset (coming from an unknown latent distribution). The GAN consists of two neural networks: the generator, which aims to construct new data plausibly coming from the latent distribution of the training set, and the discriminator, which aims to distinguish between instances from the true dataset and the candidates produced by the generator. Both networks are trained against each other. After training, the generator is a neural network with several layers. We assume the parameter space is normalized, so the generator is a map G with domain S^n. For all x ∈ S^n, G(x) is an element of the target space (for instance, an image) and x is the vector of parameters that generates it.

We assume our generative model is the composition of layers from the SUNLayer model defined in (1). We solve the denoising problem one layer at a time. Fix x♯ ∈ S^n. Given y = θ∘f_{x♯} + η for noise η ∈ L²(S^n), denoising one SUNLayer corresponds to the least squares problem

 min_{x∈S^n} ‖θ∘f_x − y‖²_{L²(S^n)}. (2)

There exists at least one minimizer for (2) due to compactness.
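As an illustration of the local methods the paper studies, one can attack a sampled version of (2) with projected gradient descent. The sketch below is an illustrative assumption (activation, step size, and sample sizes are ours), not the paper's experimental setup:

```python
import numpy as np

# A sketch of denoising one SUNLayer with a local method (projected gradient
# descent) on a sampled version of the least squares problem (2). The
# activation, step size, and sample sizes are illustrative assumptions.

rng = np.random.default_rng(1)
n = 2
theta  = lambda t: t + 0.5 * t ** 3        # polynomial activation
dtheta = lambda t: 1 + 1.5 * t ** 2        # its derivative

xs = rng.normal(size=n + 1)
xs /= np.linalg.norm(xs)                   # ground truth x# on S^n
pts = rng.normal(size=(2000, n + 1))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)   # sample of S^n

y = theta(pts @ xs) + 0.05 * rng.normal(size=len(pts))   # noisy observation

def objective(x):
    return np.mean((theta(pts @ x) - y) ** 2)

x = xs + 0.3 * rng.normal(size=n + 1)
x /= np.linalg.norm(x)                     # local method: start near x#
obj0 = objective(x)
for _ in range(500):
    r = theta(pts @ x) - y
    grad = 2 * np.mean((r * dtheta(pts @ x))[:, None] * pts, axis=0)
    x = x - 0.1 * grad
    x /= np.linalg.norm(x)                 # project back onto the sphere

print(objective(x), float(x @ xs))         # final residual, correlation with x#
```

The renormalization after each step is the projection onto the constraint set S^n.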

3 Preliminaries: spherical harmonics

To analyze denoising under the SUNLayer model, we leverage ideas from spherical harmonics. In this section we summarize some classical results about spherical harmonics that can be found in Chapter 2 of [Morimoto, 1998], focusing on the theorems and definitions we use in this paper. We refer the reader to [Morimoto, 1998] for a comprehensive review.

Let P_k(S^n) be the space of homogeneous polynomials of degree k in n+1 variables, restricted to S^n (we could have also considered real or complex coefficients, but real coefficients are enough for the scope of this paper).

Definition 1 (Spherical harmonics).

The Laplacian is the differential operator defined as

 Δ_x = ∂²/∂x_1² + … + ∂²/∂x_{n+1}²,

and the space of spherical harmonics is defined as:

 H_k(S^n) = {H_k ∈ P_k(S^n) : Δ H_k = 0} ⊂ L²(S^n). (3)

In other words, H_k(S^n) is the restriction to S^n of the degree-k homogeneous polynomials with Laplacian 0.

Proposition 1.

H_k(S^n) is a finite dimensional space and

 L²(S^n) = ⊕_{k=0}^∞ H_k(S^n). (4)

In the sequel, we let α_{n,k} denote the dimension of H_k(S^n).

Definition 2.

For fixed n and k, let {Y_k^i}_{i=1}^{α_{n,k}} be an orthonormal basis of H_k(S^n). Define the bilinear form

 F_k(σ,τ) = ∑_{i=1}^{α_{n,k}} Y_k^i(σ) \overline{Y_k^i(τ)}.

A simple computation shows that F_k is independent of the choice of the orthonormal basis. The bilinear forms F_k will be very useful in the analysis of the SUNLayer model. Some of their relevant properties are summarized in the following proposition.

Proposition 2.

The following statements hold.

1. Reproducing property: ⟨H, F_k(σ,⋅)⟩_{L²(S^n)} = H(σ) for all H ∈ H_k(S^n) and all σ ∈ S^n.

2. Zonal property: there exists φ_{n,k}: [−1,1] → ℝ so that F_k(σ,τ) = φ_{n,k}(σ⋅τ). In particular F_k(σ,τ) only depends on the inner product σ⋅τ.

3. The function φ_{n,k} is the Gegenbauer polynomial of degree k and dimension n+1. The set {φ_{n,k}}_{k≥0} is an orthogonal basis of polynomials on [−1,1] with respect to the measure

 dμ_n = (1−t²)^{(n−2)/2} dt (5)

(here dt is the standard Borel measure on [−1,1]). Note that this is not a standard normalization for the Gegenbauer polynomials, but we use it to simplify the results of this paper. In fact, Chapter 2 of [Morimoto, 1998] considers the Legendre polynomials, which are a rescaling of the φ_{n,k} by a factor involving vol(S^n), the n-dimensional volume of the sphere; this term does not show up in Morimoto's analysis since he uses the normalized measure on the spheres. In Chapter 5, Morimoto considers the Gegenbauer polynomials as a generalization of the Legendre polynomials where the dimension parameter can be any real number, with a different normalization.

4. The discussion on pages 26–27 of [Morimoto, 1998], together with the facts

 α_{n,k} = C(n+k, k) − C(n+k−2, k−2) = (2k+n−1)(k+n−2)!/(k!(n−1)!) = O(k^{n−1}) for k ≥ 2

(where C(a, b) denotes the binomial coefficient), allows us to identify the correct normalization for the Gegenbauer polynomials.

5. Using these facts and Theorems 2.29 and 2.34 of [Morimoto, 1998], one obtains the following identities:

 ‖φ_{n,k}‖_∞ = φ_{n,k}(1) = α_{n,k}/vol(S^n), (6)

 ‖φ_{n,k}‖²_{L²(μ_n)} = ∫_{−1}^{1} φ_{n,k}(t)² (1−t²)^{(n−2)/2} dt = α_{n,k}/(vol(S^n) vol(S^{n−1})). (7)
6. Using (5.1) and (5.3) of [Morimoto, 1998] (pages 97–98), one can express a relationship between φ_{n,k} and its derivative φ'_{n,k}, namely

 φ'_{n,k}(t) = (n+1) (vol(S^n)/vol(S^{n+2})) φ_{n+2,k−1}(t). (8)
7. Let h ∈ L²(S^n); then one can decompose h in spherical harmonics as h = ∑_{k=0}^∞ h_k, where h_k ∈ H_k(S^n). Theorem 2.45 of [Morimoto, 1998] in particular shows that for all r ∈ ℕ one has

 k^{2r} ‖h_k‖_{L²(S^n)} ≤ ‖(Δ_{S^n})^r h‖_{L²(S^n)} (9)

where Δ_{S^n} is the spherical Laplacian. In particular, if there exists an axis under which h is rotationally invariant (i.e. h(τ) = θ(σ⋅τ) for some fixed σ ∈ S^n and some θ: [−1,1] → ℝ), then

 Δ_{S^n}(h) = θ''(t)(1−t²) − n t θ'(t) (10)

(see for instance (2.9) of [Morimoto, 1998]).

Note that F_k(σ,⋅) ∈ H_k(S^n) for all σ ∈ S^n. The reproducing property says that for all H ∈ H_k(S^n)

 ⟨H, F_k(σ,⋅)⟩ = H(σ). (11)

Observe that for every nonzero H ∈ H_k(S^n) there exists σ ∈ S^n such that H(σ) ≠ 0. Then ⟨H, F_k(σ,⋅)⟩ ≠ 0, which implies that

 span({F_k(σ,⋅)}_{σ∈S^n}) = H_k(S^n) ⊂ L²(S^n).
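The orthogonality in property 3 can be checked numerically. The sketch below (an illustrative check, not from the paper) uses scipy's Gegenbauer polynomials C_k^λ with λ = (n−1)/2, which differ from φ_{n,k} only by a normalization constant and are therefore orthogonal with respect to the same measure dμ_n:

```python
from scipy.special import gegenbauer
from scipy.integrate import quad

# Illustrative numerical check (not from the paper): Gegenbauer polynomials
# C_k^lambda with lambda = (n-1)/2 are orthogonal with respect to the measure
# d mu_n = (1 - t^2)^((n-2)/2) dt of (5). The paper's phi_{n,k} differ from
# scipy's C_k^lambda only by normalization, which does not affect orthogonality.

n = 3                                    # sphere S^3, so lambda = 1
lam = (n - 1) / 2
weight = lambda t: (1 - t * t) ** ((n - 2) / 2)

def inner(j, k):
    Cj, Ck = gegenbauer(j, lam), gegenbauer(k, lam)
    val, _ = quad(lambda t: Cj(t) * Ck(t) * weight(t), -1, 1)
    return val

print(inner(2, 3))   # ~ 0 (orthogonal)
print(inner(3, 3))   # squared norm, > 0
```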

4 Analysis

Given an activation function θ, since {φ_{n,k}}_{k≥0} forms an orthogonal basis of polynomials on [−1,1] with respect to the measure μ_n, we can decompose θ as

 θ(t) = ∑_{k=0}^∞ a_k φ_{n,k}(t),

for some coefficients a_k ∈ ℝ. Then

 (θ∘f_x)(y) = θ(x⋅y) = ∑_{k=0}^∞ a_k φ_{n,k}(x⋅y) = ∑_{k=0}^∞ a_k F_k(x,y). (12)

In other words one layer of the SUNLayer neural network model (1) can be expressed as

 L_n(x) = θ∘f_x = ∑_{k=0}^∞ a_k F_k(x,⋅).

Note that if θ is a polynomial of degree K, then θ∘f_x ∈ ⊕_{k=0}^K H_k(S^n), which is finite dimensional. Reciprocally, if the span of {θ∘f_x}_{x∈S^n} is finite dimensional, then it is included in ⊕_{k=0}^K H_k(S^n) for some finite K. This observation, combined with the remark from Section 2, suggests that polynomial activation functions are a useful model for studying the composition of multiple layers.
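The Gegenbauer decomposition θ = ∑_k a_k φ_{n,k} can be computed numerically by projecting θ onto each basis polynomial. In the hedged sketch below we use scipy's C_k^λ as a stand-in for φ_{n,k} (an assumed normalization; the coefficients absorb the constant factors), with an illustrative sigmoid-like activation:

```python
import numpy as np
from scipy.special import gegenbauer
from scipy.integrate import quad

# Hedged sketch: expand an activation theta in the Gegenbauer basis,
# theta(t) = sum_k a_k phi_{n,k}(t). We use scipy's C_k^lambda as a stand-in
# for phi_{n,k}; the coefficients then absorb the paper's normalization
# constants. The activation below is an illustrative choice.

n = 2
lam = (n - 1) / 2                          # Legendre case
w = lambda t: (1 - t * t) ** ((n - 2) / 2)
theta = lambda t: np.tanh(3 * t)           # smooth sigmoid-like activation

K = 15
coef = []
for k in range(K + 1):
    C = gegenbauer(k, lam)
    num, _ = quad(lambda t: theta(t) * C(t) * w(t), -1, 1)
    den, _ = quad(lambda t: C(t) ** 2 * w(t), -1, 1)
    coef.append(num / den)                 # a_k = <theta, C_k> / ||C_k||^2

approx = lambda t: sum(a * gegenbauer(k, lam)(t) for k, a in enumerate(coef))
ts = np.linspace(-1, 1, 101)
err = np.max(np.abs(theta(ts) - approx(ts)))
print(err)                                 # small truncation error
```

The rapid decay of the truncation error reflects the smoothness of θ, consistent with property 7 of Proposition 2.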

Lemma 1 shows an alternative expression for the least squares problem (2).

Lemma 1.

For all y ∈ L²(S^n) we have

 argmin_{x∈S^n} ‖θ∘f_x − y‖²_{L²(S^n)} = argmax_{x∈S^n} ⟨θ∘f_x, y⟩_{L²(S^n)}.
Proof.

Note that for every rotation Q ∈ O(n+1) we have

 (θ∘f_{Qx})(z) = θ(x^⊤Q^⊤z) = (θ∘f_x)(Q^⊤z),

and so

 ‖θ∘f_{Qx}‖²_{L²(S^n)} = ∫_{S^n} |(θ∘f_{Qx})(z)|² dz = ∫_{S^n} |(θ∘f_x)(Q^⊤z)|² dz = ∫_{S^n} |(θ∘f_x)(z)|² dz = ‖θ∘f_x‖².

Therefore ‖θ∘f_x‖ is constant over x ∈ S^n, which implies the lemma since

 ‖θ∘f_x − y‖² = ‖θ∘f_x‖² + ‖y‖² − 2⟨θ∘f_x, y⟩ = constant − 2⟨θ∘f_x, y⟩. ∎

Given y = θ∘f_{x♯}, according to Lemma 1 and equation (12) we need to find x ∈ S^n that maximizes

 ⟨θ∘f_x, θ∘f_{x♯}⟩ = ∑_{k=0}^∞ ⟨a_k F_k(x,⋅), a_k F_k(x♯,⋅)⟩ = ∑_{k=0}^∞ a_k² F_k(x,x♯) (13)
  = ∑_{k=0}^∞ a_k² φ_{n,k}(x⋅x♯) =: g_θ(x⋅x♯).

Note that the second equality is a consequence of the reproducing property (11). The function g_θ will be particularly useful in our analysis.

Definition 3.

Let θ be an activation function with Gegenbauer decomposition θ(t) = ∑_{k=0}^∞ a_k φ_{n,k}(t). Then we define g_θ: [−1,1] → ℝ as g_θ(t) = ∑_{k=0}^∞ a_k² φ_{n,k}(t).

Lemma 2.

If θ is sufficiently smooth and θ = ∑_{k=0}^∞ a_k φ_{n,k} (convergence in L²(μ_n)), then the functions g_θ and g'_θ are well-defined (and the convergence is also point-wise and absolute). Furthermore, under additional smoothness of θ we also have g'_θ(t) = ∑_{k=0}^∞ a_k² φ'_{n,k}(t) for all t ∈ [−1,1].

Proof.

See Appendix 7. ∎

Lemma 3.

If then

 c²_{n,θ} = ‖θ∘f_x‖²_{L²(S^n)} = vol(S^{n−1}) ‖θ‖²_{L²(μ_n)} = g_θ(1).
Proof.

Due to the rotational invariance observed in the proof of Lemma 1, one can take x to be the north pole, obtaining

 ‖θ∘f_x‖² = ∫_0^π θ(cos s)² sin(s)^{n−1} vol(S^{n−1}) ds = vol(S^{n−1}) ∫_{−1}^1 θ(t)² (1−t²)^{(n−2)/2} dt
  = vol(S^{n−1}) ‖θ‖²_{L²(μ_n)}
  = vol(S^{n−1}) ∫_{−1}^1 (∑_{k=0}^∞ a_k φ_{n,k}(t))² (1−t²)^{(n−2)/2} dt
  = vol(S^{n−1}) ∑_{k=0}^∞ a_k² ∫_{−1}^1 φ_{n,k}(t)² (1−t²)^{(n−2)/2} dt = ∑_{k=0}^∞ a_k² α_{n,k}/vol(S^n) = g_θ(1).

The interchange of sum and integral is due to Fubini–Tonelli and orthogonality of the Gegenbauer polynomials; the next-to-last equality is due to (7) and the last equality is due to (6). ∎
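The rotation-invariance argument behind Lemma 3 is easy to probe by Monte Carlo: the L²(S^n) norm of θ∘f_x should not depend on x. A quick illustrative check (sample sizes and activation are our assumptions):

```python
import numpy as np

# Monte Carlo illustration of the observation behind Lemma 3: the L^2(S^n)
# norm of theta o f_x is the same for every x (rotation invariance). Sample
# sizes and the activation are illustrative assumptions.

rng = np.random.default_rng(2)
n = 2
theta = lambda t: np.tanh(2 * t)

z = rng.normal(size=(200000, n + 1))
z /= np.linalg.norm(z, axis=1, keepdims=True)     # uniform sample of S^n

norms = []
for _ in range(5):
    x = rng.normal(size=n + 1)
    x /= np.linalg.norm(x)
    norms.append(np.sqrt(np.mean(theta(z @ x) ** 2)))   # RMS over the sphere

print(norms)   # nearly identical values for different x
```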

4.1 Noiseless case

The following Theorem provides a sufficient condition that makes recovery possible in the noiseless case.

Theorem 1.

Suppose g'_θ(t) > 0 for all t ∈ [−1,1]. Then for each x♯ ∈ S^n, the only critical points of

 x ↦ ‖θ∘f_x − θ∘f_{x♯}‖²

are x = ±x♯, with x = x♯ being the unique local minimizer.

Proof.

Lemma 1 and equation (13) imply that critical points of x ↦ ‖θ∘f_x − θ∘f_{x♯}‖² coincide with critical points of x ↦ g_θ(x⋅x♯) on the sphere. In fact, local minima of the former correspond with local maxima of the latter. Using Lagrange multipliers with L_n(x,λ) = g_θ(x⋅x♯) + λ(‖x‖² − 1), we obtain the optimality conditions

 0 = ∇_x L_n = g'_θ(x⋅x♯) x♯ + 2λx,
 0 = ∂L_n/∂λ = ‖x‖² − 1.

If g'_θ(t) > 0 for all t ∈ [−1,1], then g'_θ(x⋅x♯) x♯ = −2λx with g'_θ(x⋅x♯) ≠ 0, which implies that x is parallel to x♯. Since ‖x‖ = 1, then x = ±x♯. ∎
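Theorem 1's condition g'_θ(t) > 0 can be checked numerically for a given activation. The sketch below does so for a mildly nonlinear polynomial activation with n = 2, using scipy's C_k^{1/2} as a stand-in normalization for φ_{n,k} (so this checks the condition under that assumed normalization, not the paper's):

```python
import numpy as np
from scipy.special import gegenbauer
from scipy.integrate import quad

# Hedged sketch: check Theorem 1's sufficient condition g_theta'(t) > 0 on
# [-1, 1] for a mildly nonlinear activation, with n = 2 and scipy's C_k^{1/2}
# as a stand-in normalization for phi_{n,k} (an assumption of this check).

n, lam, K = 2, 0.5, 8
w = lambda t: (1 - t * t) ** ((n - 2) / 2)
theta = lambda t: t + 0.2 * t * t          # illustrative activation

a = []
for k in range(K + 1):
    C = gegenbauer(k, lam)
    num, _ = quad(lambda t: theta(t) * C(t) * w(t), -1, 1)
    den, _ = quad(lambda t: C(t) ** 2 * w(t), -1, 1)
    a.append(num / den)

def g_prime(t):
    # derivative of g_theta(t) = sum_k a_k^2 phi_k(t)
    return sum(ak ** 2 * gegenbauer(k, lam).deriv()(t) for k, ak in enumerate(a))

ts = np.linspace(-1, 1, 201)
print(g_prime(ts).min())                   # positive: the condition holds
```

Here the quadratic term is small enough that the increasing linear component dominates g'_θ on the whole interval.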

4.2 Denoising

The following Theorem is the main result of this paper.

Theorem 2.

Let y = θ∘f_{x♯} + η, with x♯ ∈ S^n and noise η ∈ ⊕_{k=0}^K H_k(S^n). We decompose η as follows:

 η = ∑_{k=0}^K ∑_{i=1}^{d_k} e_{k,i} F_k(σ_{k,i},⋅) =: ∑_{k=0}^K η_k with η_k ∈ H_k(S^n).

Let T := inf_{t∈[−1,1]} g'_θ(t) and let ϵ := sup_{x∈S^n} ‖∑_{k=0}^K a_k ∑_{i=1}^{d_k} e_{k,i} φ'_{n,k}(x⋅σ_{k,i}) σ_{k,i}‖. Then:

(a) Every critical point x of x ↦ ‖θ∘f_x − y‖² satisfies |x⋅x♯| ≥ 1 − 2ϵ/(T+ϵ).

(b) Defining M_k := ‖φ'_{n,k}‖_∞, we have ϵ ≤ ∑_{k=1}^K M_k |a_k| ‖η_k‖.

Proof of Theorem 2 (a).

According to Lemma 1 we need to solve

 max_{x∈S^n} ⟨θ∘f_x, θ∘f_{x♯} + η⟩ = max_{x∈S^n} g_θ(x⋅x♯) + ⟨θ∘f_x, η⟩.

The reproducing property implies

 ⟨θ∘f_x, η⟩ = ∑_{k=0}^K ⟨a_k F_k(x,⋅), ∑_{i=1}^{d_k} e_{k,i} F_k(σ_{k,i},⋅)⟩ = ∑_{k=0}^K a_k ∑_{i=1}^{d_k} e_{k,i} φ_{n,k}(x⋅σ_{k,i}).

Therefore the denoising objective is

 max_{x∈S^n} g_θ(x⋅x♯) + ∑_{k=0}^K a_k ∑_{i=1}^{d_k} e_{k,i} φ_{n,k}(x⋅σ_{k,i}). (14)

For critical points of (14), Lagrange multipliers give us

 L_n(x,λ) = g_θ(x⋅x♯) + ∑_{k=0}^K a_k ∑_{i=1}^{d_k} e_{k,i} φ_{n,k}(x⋅σ_{k,i}) + λ(‖x‖² − 1),

and ∇_x L_n = 0 implies

 0=g′θ(x⋅x♯)x♯A+K∑k=0akdk∑i=1ek,iφ′n,k(x⋅σk,i)σk,iB+2λx

By hypothesis we have ‖B‖ ≤ ϵ < T ≤ g'_θ(x⋅x♯), which implies λ ≠ 0; therefore

 x = −(1/(2λ)) (g'_θ(x⋅x♯) x♯ + B)

and

 2|λ| = ‖g'_θ(x⋅x♯) x♯ + B‖ ≤ |g'_θ(x⋅x♯)| + ϵ,

therefore

 |x⋅x♯| = (1/(2|λ|)) |g'_θ(x⋅x♯) + B⋅x♯| ≥ (g'_θ(x⋅x♯) − ϵ)/(2|λ|) ≥ (g'_θ(x⋅x♯) − ϵ)/(g'_θ(x⋅x♯) + ϵ) ≥ 1 − 2ϵ/(T + ϵ). ∎

The key parameter ϵ in Theorem 2 depends on both the noise η and the activation function θ. In order to understand the behavior of ϵ in terms of the noise η, and prove Theorem 2 (b), we choose the points {σ_{k,i}}_i so that {F_k(σ_{k,i},⋅)}_i forms a tight frame for H_k(S^n). To this end it suffices for {σ_{k,i}}_i to form a spherical t-design for t ≥ 2k.

Definition 4 (Spherical t-design).

A spherical t-design is a sequence of points x_1, …, x_{N_t} ∈ S^n such that for every polynomial p of degree at most t we have

 (1/N_t) ∑_{i=1}^{N_t} p(x_i) = ∫_{S^n} p(x) dx,

where the integral is with respect to the uniform probability measure on S^n.
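A minimal concrete example of Definition 4 (on the circle S^1, with the normalized measure): N equally spaced points form a spherical t-design for every t ≤ N − 1, since the N-point average integrates trigonometric polynomials of degree up to N − 1 exactly. The check below is illustrative:

```python
import numpy as np

# Minimal illustration on the circle S^1 with the normalized (uniform) measure:
# N equally spaced points form a spherical t-design for t <= N - 1, because the
# N-point average integrates trigonometric polynomials of degree <= N - 1 exactly.

N = 8
angles = 2 * np.pi * np.arange(N) / N
pts = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # N points on S^1

def p(x):
    # an illustrative polynomial of degree 3 in the coordinates
    return 1.0 + 2 * x[..., 0] - x[..., 0] * x[..., 1] + 0.5 * x[..., 1] ** 3

design_avg = p(pts).mean()                 # average over the 8-point design

phi = 2 * np.pi * (np.arange(200000) + 0.5) / 200000
dense = np.stack([np.cos(phi), np.sin(phi)], axis=1)
uniform_avg = p(dense).mean()              # dense quadrature reference

print(design_avg, uniform_avg)             # the two averages agree
```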
Definition 5 (Tight frame).

Let V be a vector space with an inner product. A tight frame is a sequence {v_k}_{k∈I} ⊂ V such that there exists a constant c so that for all v ∈ V

 ∑_{k∈I} |⟨v, v_k⟩|² = c ‖v‖².
Lemma 4.

If σ_{k,1}, …, σ_{k,N_k} ∈ S^n form a spherical t-design with t ≥ 2k, then {F_k(σ_{k,i},⋅)}_{i=1}^{N_k} is a tight frame for H_k(S^n) with constant N_k.

Proof.

Let {Y_j}_{j=1}^{α_{n,k}} be an orthonormal basis for H_k(S^n), and let δ_{j,j'} = 1 if j = j' and 0 otherwise. It suffices to show

 N_k δ_{j,j'} = ∑_{i=1}^{N_k} ⟨F_k(σ_{k,i},⋅), Y_j⟩ \overline{⟨F_k(σ_{k,i},⋅), Y_{j'}⟩}.

Expanding each F_k(σ_{k,i},⋅) in the basis {Y_j} and using orthonormality (equivalently, the reproducing property), the right hand side equals

 ∑_{i=1}^{N_k} Y_j(σ_{k,i}) \overline{Y_{j'}(σ_{k,i})}.

Observe that Y_j \overline{Y_{j'}} is a polynomial of degree 2k ≤ t. Then using the t-design property we get

 ∑_{i=1}^{N_k} Y_j(σ_{k,i}) \overline{Y_{j'}(σ_{k,i})} = N_k (1/N_k) ∑_{i=1}^{N_k} Y_j(σ_{k,i}) \overline{Y_{j'}(σ_{k,i})} = N_k ∫_{S^n} Y_j(τ) \overline{Y_{j'}(τ)} dτ = N_k ⟨Y_j, Y_{j'}⟩_{H_k(S^n)} = N_k δ_{j,j'},

which proves the lemma. ∎
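Lemma 4 can be illustrated numerically in the simplest case n = 1, k = 1: on the circle, F_1(σ,⋅) is proportional to the linear function τ ↦ σ⋅τ, and N ≥ 2k + 1 equally spaced points give a tight frame. The check below (quadrature sizes and constants are our own assumptions) verifies that the frame-energy-to-norm ratio is the same for several random elements of H_1(S^1):

```python
import numpy as np

# Illustration of Lemma 4 in the simplest case n = 1, k = 1: on the circle,
# F_1(sigma, .) is proportional to tau -> sigma . tau, and N >= 2k + 1 equally
# spaced points give a tight frame for H_1(S^1). Quadrature sizes are our own.

N = 5
ang = 2 * np.pi * np.arange(N) / N
sigmas = np.stack([np.cos(ang), np.sin(ang)], axis=1)

M = 20000                                           # quadrature grid on S^1
phi = 2 * np.pi * np.arange(M) / M
taus = np.stack([np.cos(phi), np.sin(phi)], axis=1)
dm = 2 * np.pi / M                                  # arc-length element

rng = np.random.default_rng(3)
ratios = []
for _ in range(5):
    a = rng.normal(size=2)
    v = taus @ a                                    # random element of H_1(S^1)
    frame_energy = sum(((v * (taus @ s)).sum() * dm) ** 2 for s in sigmas)
    ratios.append(frame_energy / ((v * v).sum() * dm))

print(ratios)   # all ratios equal: the frame is tight
```

With this scaling the common ratio works out to N·π/2, and it is independent of the chosen v, which is exactly the tight frame property.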

Proof of Theorem 2 (b).

We choose the points {σ_{k,i}}_{i=1}^{N_k} so that {F_k(σ_{k,i},⋅)}_{i=1}^{N_k} is a tight frame for H_k(S^n) with constant N_k. We write η = ∑_{k=0}^K η_k with η_k ∈ H_k(S^n). One can decompose η_k = ∑_{i=1}^{N_k} e_{k,i} F_k(σ_{k,i},⋅); in fact, the coefficients can be chosen as e_{k,i} = ⟨η, F_k(σ_{k,i},⋅)⟩/N_k, so that ∑_{i=1}^{N_k} |e_{k,i}|² = ‖η_k‖²/N_k. Following the notation in the proof of Theorem 2 (a) we have:

 B = ∑_{k=1}^K a_k ∑_{i=1}^{N_k} e_{k,i} φ'_{n,k}(x⋅σ_{k,i}) σ_{k,i}

and

 ϵ = ‖B‖ ≤ ∑_{k=1}^K |a_k| ‖∑_{i=1}^{N_k} e_{k,i} φ'_{n,k}(x⋅σ_{k,i}) σ_{k,i}‖.

Let G_{k,x}: L²(S^n) → ℝ^{n+1} be the linear map such that G_{k,x}(η) = ∑_{i=1}^{N_k} e_{k,i} φ'_{n,k}(x⋅σ_{k,i}) σ_{k,i}, and let

 ‖G_{k,x}‖_{2→2} = sup_{‖ν‖_{L²(S^n)}=1} ‖G_{k,x}(ν)‖;

then for all x ∈ S^n we have

 ϵ ≤ (∑_{k=0}^K |a_k| ‖G_{k,x}‖_{2→2}) ‖η‖.

Since |φ'_{n,k}(t)| ≤ M_k and ‖σ_{k,i}‖ = 1, we bound

 ‖G_{k,x}(η)‖ ≤ (M_k/N_k) ∑_{i=1}^{N_k} |⟨η, F_k(σ_{k,i},⋅)⟩| ‖σ_{k,i}‖ = M_k ∑_{i=1}^{N_k} |e_{k,i}|,

obtaining the bound

 ϵ ≤ ∑_{k=1}^K M_k |a_k| ∑_{i=1}^{N_k} |e_{k,i}|.

Using Theorem 2 (a) we conclude that denoising is possible provided that

 ∑_{k=1}^K M_k |a_k| ∑_{i=1}^{N_k} |e_{k,i}| ≤ inf_{t∈[−1,1]} ∑_{k=1}^K a_k² φ'_{n,k}(t).

Note that the left hand side depends on the activation function and the noise, whereas the right hand side depends only on the activation function. Using the frame properties and the Cauchy–Schwarz inequality one can write

 ϵ ≤ ∑_{k=1}^K M_k |a_k| ‖η_k‖,

which proves Theorem 2 (b). ∎