
# A Hybrid Scattering Transform for Signals with Isolated Singularities

The scattering transform is a wavelet-based model of Convolutional Neural Networks originally introduced by S. Mallat. Mallat's analysis shows that this network has desirable stability and invariance guarantees and therefore helps explain the observation that the filters learned by early layers of a Convolutional Neural Network typically resemble wavelets. Our aim is to understand what sort of filters should be used in the later layers of the network. Towards this end, we propose a two-layer hybrid scattering transform. In our first layer, we convolve the input signal with a wavelet filter transform to promote sparsity, and, in the second layer, we convolve with a Gabor filter to leverage the sparsity created by the first layer. We show that these measurements characterize information about signals with isolated singularities. We also show that the Gabor measurements used in the second layer can be used to synthesize sparse signals such as those produced by the first layer.


## I Introduction

The wavelet scattering transform is a mathematical model of Convolutional Neural Networks (CNNs) introduced by S. Mallat[3]. Analogously to the feed-forward portion of a CNN, it produces a latent representation of an input signal via an alternating sequence of filter convolutions and nonlinearities. It differs, most notably, by using predesigned wavelet filters rather than filters learned from data.

Using predefined filters allows for rigorous analysis and helps us understand why a deep nonlinear network is better than a wide, shallow, linear network with the same number of parameters. Ideally, a feed-forward network should produce a representation which is sufficiently descriptive for downstream tasks, but also stable to deformations such as translations. Linear networks are typically unable to do both and often must discard high-frequency information to achieve stability. Mallat’s analysis in [3] shows that the scattering transform, on the other hand, captures high-frequency information via wavelets and then pushes it down to lower, more stable, frequencies using a nonlinear activation function. Thus, the nonlinear structure enables the network to stably capture high-frequency information.

The scattering transform also helps us understand which filters are useful for effectively encoding information. While the optimal choice is task dependent, wavelets are often a good choice since natural images are typically sparse in the wavelet basis and, as discussed above, wavelets are able to capture high-frequency information. Moreover, and perhaps most importantly, the filters learned in the early layers of CNNs typically resemble wavelets.

This paper focuses on the choice of filters for later layers of the network. In particular, we propose a two-layer hybrid scattering model. In the first layer, we use a wavelet convolution to sparsify the input. Then, we use a Gabor type filter to leverage this sparsity.

For simplicity, we assume that the input is a piecewise polynomial whose knots are located at points . We shall also assume that each of its piecewise components has degree at most . We let be a mother wavelet with and let

$$ \psi_\ell(t) = \frac{1}{2^\ell}\,\psi\!\left(\frac{t}{2^\ell}\right). $$

We will assume that has

vanishing moments, which implies that

(see e.g. [2]). It follows that is contained in

. To further promote sparsity, we next apply a max-pooling operator:

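$$ \mathrm{MP}_\ell z(t) = \begin{cases} z(t) & \text{if } z(t) = \max_{t' \in [t_i - 2^\ell,\, t_i + 2^\ell] \cap h\mathbb{Z}} z(t'), \\ 0 & \text{otherwise.} \end{cases} $$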

As summarized in the following theorem, this yields a linear combination of Dirac delta functions.

###### Theorem 1.

Assume that . Then,

$$ \mathrm{MP}_\ell(|\psi_\ell \star y|)(t) = \sum_{j=1}^{k} a_j \delta_{v_j}(t) $$

for some
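As a rough numerical illustration of this first layer, the following sketch applies a Haar-type filter (which has one vanishing moment), a modulus, and a max-pooling step to a piecewise-constant signal, i.e., the degree-zero case. The Haar filter, the integer sampling grid, the tolerance `tol`, and the helper names `haar_filter`, `max_pool`, and `first_layer` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def haar_filter(n):
    """Discrete Haar-type filter with half-width n samples (zero mean)."""
    return np.concatenate([np.ones(n), -np.ones(n)]) / (2 * n)

def max_pool(z, radius, tol=1e-9):
    """Keep z[i] only if it is the maximum of z over [i - radius, i + radius]."""
    out = np.zeros_like(z)
    for i in range(len(z)):
        window = z[max(0, i - radius): i + radius + 1]
        # tol discards floating-point residue where the filter output should vanish
        if z[i] > tol and z[i] == window.max():
            out[i] = z[i]
    return out

def first_layer(y, n):
    """Return (positions, amplitudes) of the spikes left after |psi * y| and pooling."""
    z = np.abs(np.convolve(y, haar_filter(n), mode="valid"))
    pooled = max_pool(z, n)
    idx = np.nonzero(pooled)[0]
    # An output index k of the 'valid' convolution covers samples k .. k + 2n - 1,
    # so the centre of that window sits at sample k + n.
    return idx + n, pooled[idx]

# Piecewise-constant test signal: knots at samples 200, 500, 750.
knot_idx = np.array([200, 500, 750])
levels = np.array([1.0, -0.5, 2.0, 0.3])
y = levels[np.searchsorted(knot_idx, np.arange(1000), side="right")]

positions, amplitudes = first_layer(y, n=50)
print(positions, amplitudes)
```

In this toy setting the pooled output is supported on the knot positions, and each amplitude is half the absolute jump of the signal at that knot, which is the kind of sparse spike train the second layer is designed to analyze.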

In our second layer, rather than another wavelet, we use a Gabor filter

$$ g_{s,\xi}(t) = w\!\left(\frac{t}{s}\right) e^{i\xi t}, \tag{1} $$

where the parameters and determine the scale and central frequency and the window function is supported on an interval of unit length. Next, we take the norm for some integer As a result, we obtain translation invariant hybrid scattering coefficients

$$ \left\| g_{s,\xi} \star \mathrm{MP}_\ell(|\psi_\ell \star y|) \right\|_p. $$

By design, these measurements are invariant to translations, reflections, and global sign changes. We aim to investigate the ability of our measurements to characterize up to these natural ambiguities. The wavelet-modulus is known to be a powerful signal descriptor[4]. Therefore, in light of Theorem 1, we shall analyze the ability of the measurements

$$ f_\xi(s)[x] \coloneqq \| g_{s,\xi} \star x \|_p \tag{2} $$

to characterize signals of the form

$$ x(t) = \sum_{j=1}^{k} a_j \delta_{v_j}(t). \tag{3} $$

For such a signal, we will let

be the vector defined by

and let denote its norm.

To supplement our theory, we will show that the measurements (2) can be used to reconstruct a sparse signal of the form (3) up to translations, reflections and global sign changes in Section VI.

We will show that our measurements characterize the support set . For , we let and consider the difference set

$$ D(x) \coloneqq \{ \Delta_{i,j} : 1 \le i < j \le k \}. $$

We will assume that is collision free, i.e., that except for when and that is contained in a fine grid, for some . Under these assumptions, it is known [1, 5] that the support set is determined (up to reflection and translation) by except for in the case where and the belong to a specific parametric family. (See Theorem 1 of [5] for full details. For the remainder of this work, we will assume that does not belong to this family and therefore the support set is determined by ) This motivates the following theorem which shows that the measurements (2) uniquely determine .

###### Theorem 2.

Let be an integer and let . Then for almost every , the function

$$ f_\xi(s) = \| g_{s,\xi} \star x \|_p $$

is piecewise linear. Moreover, the set of its isolated singularities is exactly the support set .

Theorem 4 shows that selecting a single random frequency and enough scales such that there is one in between each element of allows us to detect the location of each point of by evaluating at each of the (up to a precision corresponding to the density of the scales). The next result shows that the amplitudes can also be recovered with randomly chosen frequencies. Thus, the measurements (2) characterize sparse signals up to natural ambiguities.

###### Theorem 3.

Let and, let

$$ x(t) = \sum_{j=1}^{k} a_j \delta_{v_j}(t) $$

be a sparse signal of the form (3). Let

be i.i.d. standard normal random variables, where

is assumed to be at least if is even and at least if

is odd. Then the following uniqueness result holds almost surely:

Let

$$ \tilde{x}(t) = \sum_{j=1}^{k} \tilde{a}_j \delta_{\tilde{v}_j}(t). $$

Suppose that , and that

$$ \partial_s^2 f_{\xi_\ell}[x](d) = \partial_s^2 f_{\xi_\ell}[\tilde{x}](d) $$

for all and all .

Then we have that , and therefore is equivalent to up to translation, reflection, and global sign change.

## II Generalized Exponential Polynomials

In this section, we introduce some notation and state several lemmas needed to prove Theorems 2 and 3. For the proofs of the lemmas in this section, see Section V.

We let denote the set of functions that can be written as

$$ p(\theta) = \sum_{k=1}^{N} \alpha_k e^{i\gamma_k \theta} \tag{4} $$

where and . Since the are allowed to be arbitrary (possibly negative or irrational) real numbers, we call these functions generalized exponential polynomials. For we refer to as the degree of . We let refer to the set of all with and let denote the set of such that

The following lemma shows that each has a unique representation as the sum of exponentials, and that therefore, the degree of is well defined.

###### Lemma 1.

Let with

$$ p(\theta) = \sum_{k=1}^{N} \alpha_k e^{i\gamma_k \theta} \quad\text{and}\quad q(\theta) = \sum_{k=1}^{N'} \beta_k e^{i\eta_k \theta}. $$

Then if and only if and for all and

Lemma 1 implies that if and , then

$$ pq \in E(d_1 + d_2). \tag{5} $$

In particular, if

$$ |p|^2 = p\bar{p} \in E(d + 0) = E(d). \tag{6} $$

Furthermore, if then

$$ (p + q) \in E(d_1), \tag{7} $$

except, of course, if and the lead coefficients of and are negatives of one another.

The next several lemmas will be needed in the proofs of Theorems 2 and 3.

###### Lemma 2.

For let assume that Then the set of points such that

$$ |p_1(\theta)|^p + |p_2(\theta)|^p = |p_3(\theta)|^p + |p_4(\theta)|^p \tag{8} $$

has measure zero.

###### Lemma 3.

Let be an odd integer, and let Let and If there are more than distinct such that

$$ |p(\theta)|^p - |q(\theta)|^p = C, $$

then and

###### Lemma 4.

Let be an integer and let Then the set of such that

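$$ \left| a + b e^{i\theta} + c e^{i(\gamma+1)\theta} \right|^p - \left| \kappa a + \frac{1}{\kappa} b e^{i\theta} + \kappa c e^{i(\gamma+1)\theta} \right|^p = C \tag{9} $$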

has measure zero.

## III The Proof of Theorem 2

Before proving Theorem 2, we will first prove a preliminary result which shows, even without the assumption that is collision free, that is a piecewise linear function whose set of knots is contained in . This result is based on the observation that we may write

 fξ(s)=∑i

where for each ,

$$ \beta_{i,j}(\xi) \coloneqq \sum_{\ell=i}^{j} a_\ell e^{i\xi \Delta_{i,\ell}} \tag{10} $$

is a function that only depends on and is a piecewise linear function of whose singularities are contained in

Specifically, we prove the following theorem. We emphasize that this result does not assume that is collision free, which is why for there might be multiple such that .

###### Theorem 4.

Let be an integer, and assume For be as in (10). Then, for every fixed the function is piecewise linear, and is a grid-free sparse signal whose support is contained in Specifically,

$$ \partial_s^2 f_\xi(s) = \sum_{d \in D(x)} \left( \sum_{\Delta_{i,j} = d} c_{i,j}(\xi) \right) \delta_d, \tag{11} $$

where

$$ c_{i,i+1}(\xi) = |\beta_{i,i+1}(\xi)|^p - |\beta_{i+1,i+1}(\xi)|^p - |\beta_{i,i}(\xi)|^p \tag{12} $$

and for

$$ c_{i,j}(\xi) = |\beta_{i,j}(\xi)|^p + |\beta_{i+1,j-1}(\xi)|^p - |\beta_{i+1,j}(\xi)|^p - |\beta_{i,j-1}(\xi)|^p. \tag{13} $$
###### Proof.

We first note that

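$$ \begin{aligned} |(g_{s,\xi} \star x)(t)| &= \left| \sum_{i=1}^{k} a_i\, g_{s,\xi}(t - v_i) \right| \\ &= \left| \sum_{i=1}^{k} a_i e^{i\xi(t - v_i)} \mathbb{1}_{[v_i, v_i + s]}(t) \right| \\ &= \left| \sum_{i=1}^{k} a_i e^{-i\xi v_i} \mathbb{1}_{[v_i, v_i + s]}(t) \right|. \end{aligned} $$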

For let be the set of for which is nonzero if and only if , i.e.,

$$ R_I(s) = \{ t : t \in [v_i, v_i + s]\ \forall\, i \in I,\ t \notin [v_i, v_i + s]\ \forall\, i \notin I \}. $$

Then, since it is clear that for

$$ |(g_{s,\xi} \star x)(t)| = \left| \sum_{i \in I} a_i e^{-i\xi v_i} \right| \eqqcolon y_I(\xi). $$

Therefore,

$$ f_\xi(s) = \| (g_{s,\xi} \star x)(t) \|_p^p = \sum_{I \subseteq \{1, \ldots, k\}} |y_I(\xi)|^p\, |R_I(s)|, \tag{14} $$

where denotes the Lebesgue measure of . We will show that for all , is a piecewise linear function whose knots are contained in

First, we note that unless has the form for some Therefore,

$$ f_\xi(s) = \sum_{i=1}^{k} \sum_{j=i}^{k} |\beta_{i,j}(\xi)|^p\, |R_{i,j}(s)|, \tag{15} $$

where and, as in (10), is given by

$$ |\beta_{i,j}(\xi)| = \left| \sum_{\ell=i}^{j} a_\ell e^{i\xi \Delta_{i,\ell}} \right| = \left| \sum_{\ell=i}^{j} a_\ell e^{i\xi v_\ell} \right|. $$

Now, turning our attention to we observe by definition that a point is in if and only if it satisfies the following three conditions:

$$ v_\ell \le t \le v_\ell + s \ \text{ for all } i \le \ell \le j, \qquad t > v_{i-1} + s, \qquad \text{and} \qquad t < v_{j+1}. $$

Therefore, letting and denote and , we see

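$$ \begin{aligned} R_{i,j}(s) &= [v_j, v_i + s] \cap [v_{i-1} + s, v_{j+1}] && (16) \\ &= [\, v_j \vee (v_{i-1} + s),\ (v_i + s) \wedge v_{j+1} \,], && (17) \end{aligned} $$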

and therefore

$$ |R_{i,j}(s)| = \big( (v_i + s) \wedge v_{j+1} \big) - \big( v_j \vee (v_{i-1} + s) \big) $$

if the above quantity is positive and zero otherwise. It follows from that is a piecewise linear function, and that is given by

$$ \partial_s^2 |R_{i,j}(s)| = \delta_{\Delta_{i,j}}(s) + \delta_{\Delta_{i-1,j+1}}(s) - \delta_{\Delta_{i-1,j}}(s) - \delta_{\Delta_{i,j+1}}(s). \tag{18} $$

We note that in order for this equation to be valid for all we identify and with and and therefore, are interpreted as being the zero function since the domain of is Likewise is interpreted as the zero function in the above equation.

Combining (18) with (15) implies that is a sparse signal with support contained in and for

$$ \partial_s^2 f_\xi(d) = \sum_{\Delta_{i,j} = d} c_{i,j}(\xi) $$

as desired. ∎
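The formula (14) lends itself to a direct numerical check. The sketch below assumes, as in the proof above, that the window is the indicator of the unit interval, so that the integrand is constant between consecutive breakpoints $v_i$ and $v_i + s$. It evaluates $f_\xi(s)$ exactly and then takes second differences on a grid of scales. The function names `f_xi` and `detect_knots`, the grid spacing `h`, the tolerance, and the sample signal are hypothetical choices made for illustration.

```python
import numpy as np

def f_xi(s, xi, a, v, p=2):
    """Evaluate sum_I |y_I(xi)|^p |R_I(s)|, as in (14), for x = sum_j a_j delta_{v_j}."""
    a = np.asarray(a, dtype=float)
    v = np.asarray(v, dtype=float)
    # Points where some window [v_i, v_i + s] starts or ends.
    breaks = np.unique(np.concatenate([v, v + s]))
    mids = 0.5 * (breaks[:-1] + breaks[1:])
    lengths = np.diff(breaks)
    # active[m, i] is True when the m-th midpoint lies in [v_i, v_i + s].
    active = (v[None, :] <= mids[:, None]) & (mids[:, None] <= v[None, :] + s)
    y = (active * (a * np.exp(-1j * xi * v))[None, :]).sum(axis=1)
    return float(np.sum(np.abs(y) ** p * lengths))

def detect_knots(a, v, xi, h, s_max, p=2, tol=1e-9):
    """Second differences of f_xi on the grid h*Z; returns (knots, slope jumps)."""
    s = np.arange(0.0, s_max + h / 2, h)
    f = np.array([f_xi(si, xi, a, v, p) for si in s])
    d2 = f[2:] - 2 * f[1:-1] + f[:-2]
    hits = np.abs(d2) > tol
    # For a piecewise linear f with knots on the grid, d2 / h is the jump in slope.
    return s[1:-1][hits], d2[hits] / h

# Collision-free support: pairwise differences are 1, 2, 3, 4, 6, 7.
a = [1.0, -2.0, 0.5, 1.5]
v = [0.0, 1.0, 3.0, 7.0]
knots, jumps = detect_knots(a, v, xi=0.7, h=0.5, s_max=8.0)
print(knots)
print(jumps)
```

For this particular signal and frequency, the detected knots coincide with the pairwise differences of the support, and each slope jump is the weight that (11) attaches to the corresponding Dirac mass.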

Before we prove Theorem 2, we note the following example which shows that, in general, the support of may be a proper subset of

###### Example 1.

If and

$$ x(t) = \delta_1(t) + \delta_2(t) + \delta_3(t) - \delta_4(t), $$

then but

$$ \partial_s^2 f_\xi(2) = 0. $$
###### Proof.

For this choice of there are two pairs such that namely and . Therefore, by Theorem 4,

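$$ \begin{aligned} \partial_s^2 f_\xi(2) = {} & \left( |y_{1,3}(\xi)|^2 + |y_{2,2}(\xi)|^2 - |y_{1,2}(\xi)|^2 - |y_{2,3}(\xi)|^2 \right) \\ & + \left( |y_{2,4}(\xi)|^2 + |y_{3,3}(\xi)|^2 - |y_{2,3}(\xi)|^2 - |y_{3,4}(\xi)|^2 \right). \end{aligned} $$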

Inserting and into (10) implies that

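$$ \begin{aligned} \partial_s^2 f_\xi(2) = {} & \left( |1 + e^{i\xi} + e^{2i\xi}|^2 + 1 - |1 + e^{i\xi}|^2 - |1 + e^{i\xi}|^2 \right) \\ & + \left( |1 + e^{i\xi} - e^{2i\xi}|^2 + 1 - |1 + e^{i\xi}|^2 - |1 - e^{i\xi}|^2 \right) \\ = {} & |1 + e^{i\xi} + e^{2i\xi}|^2 + |1 + e^{i\xi} - e^{2i\xi}|^2 + 2 - 3|1 + e^{i\xi}|^2 - |1 - e^{i\xi}|^2 \\ = {} & 0. \end{aligned} $$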

The last equality follows from repeatedly applying the trigonometric identities and
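The cancellation in Example 1 can also be checked numerically from (10), (12), and (13). The sketch below assumes $p = 2$, as the squared moduli in the proof suggest, and sums $c_{i,j}(\xi)$ over all pairs with $\Delta_{i,j} = d$. The helper names `beta`, `c`, and `second_derivative_mass` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def beta(i, j, xi, a, v):
    """beta_{i,j}(xi) as in (10), with 1-based indices; an empty sum (i > j) is 0."""
    if i > j:
        return 0.0
    idx = np.arange(i - 1, j)  # zero-based positions of indices i..j
    return np.sum(a[idx] * np.exp(1j * xi * (v[idx] - v[i - 1])))

def c(i, j, xi, a, v, p):
    """c_{i,j}(xi) as in (12) and (13); (12) is the case j = i + 1."""
    b = lambda m, n: abs(beta(m, n, xi, a, v)) ** p
    return b(i, j) + b(i + 1, j - 1) - b(i + 1, j) - b(i, j - 1)

def second_derivative_mass(d, xi, a, v, p):
    """Weight of the Dirac mass at s = d in (11): sum of c_{i,j} over Delta_{i,j} = d."""
    a = np.asarray(a, dtype=float)
    v = np.asarray(v, dtype=float)
    k = len(a)
    return sum(c(i, j, xi, a, v, p)
               for i in range(1, k + 1)
               for j in range(i + 1, k + 1)
               if np.isclose(v[j - 1] - v[i - 1], d))

# Signal from Example 1: x = delta_1 + delta_2 + delta_3 - delta_4.
a = [1.0, 1.0, 1.0, -1.0]
v = [1.0, 2.0, 3.0, 4.0]
for xi in (0.3, 1.1, 2.7):
    print(xi, second_derivative_mass(2.0, xi, a, v, p=2))
```

Each printed value should vanish up to rounding error, in agreement with the computation above. Changing `p` in the call is a quick way to explore whether the cancellation persists for other exponents.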

We shall now prove Theorem 2.

###### The Proof of Theorem 2.

By assumption, is collision free. Therefore, for all , there is a unique such that , and so, by (11), it suffices to show that for all and for almost every where as in (12) and for (13)

$$ c_{i,i+1}(\xi) = |\beta_{i,i+1}(\xi)|^p - |\beta_{i+1,i+1}(\xi)|^p - |\beta_{i,i}(\xi)|^p, $$

and for ,

$$ c_{i,j}(\xi) = |\beta_{i,j}(\xi)|^p + |\beta_{i+1,j-1}(\xi)|^p - |\beta_{i+1,j}(\xi)|^p - |\beta_{i,j-1}(\xi)|^p, $$

where

$$ \beta_{i,j}(\xi) = \left| \sum_{k=i}^{j} a_k e^{-i\xi \Delta_{i,k}} \right|. $$

Observe that are generalized exponential Laurent polynomials of the form introduced in Section II, and in particular, Therefore, when it follows from Lemma 2 that vanishes on a set of measure zero since if we have

$$ |\beta_{i,j}(\xi)|^p + |\beta_{i+1,j-1}(\xi)|^p = |\beta_{i+1,j}(\xi)|^p + |\beta_{i,j-1}(\xi)|^p. $$

In the case where we see that

$$ c_{i,i+1}(\xi) = |a_i + a_{i+1} e^{-i\xi \Delta_{i,i+1}}|^p - |a_i|^p - |a_{i+1}|^p. $$

For any such that we see that is a solution to

$$ \left| a_i + a_{i+1} e^{i\theta} \right|^2 - \left( |a_i|^p + |a_{i+1}|^p \right)^{2/p} = 0. $$

Thus, vanishes on a set of measure zero since the left-hand side of the above equation is a trigonometric polynomial.

## IV The Proof of Theorem 3

###### Proof.

Let be i.i.d. standard normal random variables. Since

is collision free, with probability one, each of the

are distinct modulo , i.e.

$$ \xi_\ell \Delta_{i,i+1}(x) \not\equiv \xi_{\ell'} \Delta_{i',i'+1}(x) \mod 2\pi \tag{19} $$

for all and except when For the rest of the proof we will assume this is the case.

Let

$$ \tilde{x}(t) = \sum_{j=1}^{k} \tilde{a}_j \delta_{\tilde{v}_j}(t) $$

be a signal , and for all and for all Note that depends on but is independent of By assumption that and are collision free (and also, as discussed in Section I, we assume that we are not in the special case where and the belong to a special parametrized family). Therefore, the fact that implies that the support sets of and are equivalent up to translation and reflection, so we may assume without loss of generality that for all

We will show that must be given by

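$$ \tilde{a}_i = \begin{cases} \frac{1}{c}\, a_i & \text{if } i \text{ is odd,} \\ c\, a_i & \text{if } i \text{ is even,} \end{cases} \tag{20} $$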

where or

$$ |c|^p = \frac{\sum_{i=1}^{\lfloor \frac{k+1}{2} \rfloor} |a_{2i-1}|^p}{\sum_{i=1}^{\lfloor \frac{k}{2} \rfloor} |a_{2i}|^p}. \tag{21} $$

Then, we will show that, if satisfies (21), but then with probability one

$$ \partial_s^2 f_{\xi_L}[x](\Delta_{1,3}) \neq \partial_s^2 f_{\xi_L}[\tilde{x}](\Delta_{1,3}). $$

Since (and therefore ) was chosen to depend on but not these two facts together will imply that, with probability one, if is any signal such that and for all and all then and therefore is equivalent to up to reflection and translation.

We first will show that (20) holds in the case where is odd. Setting and using (12) implies that for all and all we have

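$$ |a_i + a_{i+1} e^{i\xi_\ell \Delta_{i,i+1}}|^p - |a_{i+1}|^p - |a_i|^p = |\tilde{a}_i + \tilde{a}_{i+1} e^{i\xi_\ell \Delta_{i,i+1}}|^p - |\tilde{a}_{i+1}|^p - |\tilde{a}_i|^p. \tag{22} $$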

Therefore, constitute solutions, which are distinct modulo to the equation

$$ |a_i + a_{i+1} e^{i\theta}|^p - |\tilde{a}_i + \tilde{a}_{i+1} e^{i\theta}|^p = |\tilde{a}_i|^p + |\tilde{a}_{i+1}|^p - |a_i|^p - |a_{i+1}|^p. $$

Since Lemma 3 implies that

 aiai+1=˜ai˜ai+1and˜a2i+˜a2i+1 (23)

for all It follows from (23) that (20) holds with

Now consider the case where is even. Similarly to (22), the assumption that implies that for all

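$$ |a_i + a_{i+1} e^{i\xi_\ell \Delta_{i,i+1}}|^{2m} - |a_i|^{2m} - |a_{i+1}|^{2m} = |\tilde{a}_i + \tilde{a}_{i+1} e^{i\xi_\ell \Delta_{i,i+1}}|^{2m} - |\tilde{a}_i|^{2m} - |\tilde{a}_{i+1}|^{2m}. $$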

Therefore, for all are zeros of

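$$ h_i(\theta) \coloneqq |a_i + a_{i+1} e^{i\theta}|^{2m} - |\tilde{a}_i + \tilde{a}_{i+1} e^{i\theta}|^{2m} + |\tilde{a}_i|^{2m} + |\tilde{a}_{i+1}|^{2m} - |a_i|^{2m} - |a_{i+1}|^{2m} $$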

which are distinct modulo Using the fact that

$$ |a_i + a_{i+1} e^{i\theta}|^2 = a_i^2 + a_{i+1}^2 + 2 a_i a_{i+1} \cos(\theta) $$

one may verify that is a trigonometric polynomial of degree at most given by

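$$ h_i(\theta) = \left( a_i^2 + a_{i+1}^2 + 2 a_i a_{i+1} \cos(\theta) \right)^m - \left( \tilde{a}_i^2 + \tilde{a}_{i+1}^2 + 2 \tilde{a}_i \tilde{a}_{i+1} \cos(\theta) \right)^m + \tilde{a}_i^{2m} + \tilde{a}_{i+1}^{2m} - a_i^{2m} - a_{i+1}^{2m}. $$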

Thus, since this implies that must be uniformly zero. In particular, setting the lead coefficient equal to zero implies

$$ (a_i a_{i+1})^m = (\tilde{a}_i \tilde{a}_{i+1})^m $$

for all Using the binomial theorem and setting the coefficient equal to zero gives

 (a