## I Introduction

The wavelet scattering transform is a mathematical model of Convolutional Neural Networks (CNNs) introduced by S. Mallat [3]. Analogously to the feed-forward portion of a CNN, it produces a latent representation of an input signal via an alternating sequence of filter convolutions and nonlinearities. It differs, most notably, by using predesigned wavelet filters rather than filters learned from data.

Using predefined filters allows for rigorous analysis and helps us understand why a deep nonlinear network is better than a wide, shallow, linear network with the same number of parameters. Ideally, a feed-forward network should produce a representation which is sufficiently descriptive for downstream tasks, but also stable to deformations such as translations. Linear networks are typically unable to do both and often must discard high-frequency information to achieve stability. Mallat’s analysis in [3] shows that the scattering transform, on the other hand, captures high-frequency information via wavelets and then pushes it down to lower, more stable, frequencies using a nonlinear activation function. Thus, the nonlinear structure enables the network to stably capture high-frequency information.

The scattering transform also helps us understand which filters are useful for effectively encoding information. While the optimal choice is task dependent, wavelets are often a good choice since natural images are typically sparse in the wavelet basis and, as discussed above, wavelets are able to capture high-frequency information. Moreover, and perhaps most importantly, the filters learned in the early layers of CNNs typically resemble wavelets.

This paper focuses on the choice of filters for later layers of the network. In particular, we propose a two-layer hybrid scattering model. In the first layer, we use a wavelet convolution to sparsify the input. Then, we use a Gabor type filter to leverage this sparsity.

For simplicity, we assume that the input is a piecewise polynomial whose knots are located at points . We shall also assume that each of its piecewise components has degree at most . We let be a mother wavelet with and let

We will assume that has

vanishing moments, which implies that

(see e.g. [2]). It follows that is contained in. To further promote sparsity, we next apply a max-pooling operator:

As summarized in the following theorem, this yields a linear combination of Dirac delta functions.

###### Theorem 1.

Assume that . Then,

for some
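The sparsifying effect of vanishing moments can be illustrated with a discrete analogue. The sketch below is not the wavelet used in this paper: it assumes a sampled piecewise linear signal (with hypothetical breakpoints at samples 50 and 120) and replaces the wavelet with the second-order difference filter `[1, -2, 1]`, whose zeroth and first discrete moments vanish. The response is zero except in windows that straddle a breakpoint.

```python
import numpy as np

n = np.arange(200)

# Piecewise linear samples with breaks at n = 50 and n = 120.
# The pieces are illustrative and need not join continuously.
f = np.where(n < 50, n, np.where(n < 120, 3 * n - 7, -2 * n + 5))

# Second-order difference filter: sum(h) = 0 and sum(k * h[k]) = 0,
# so it annihilates every polynomial of degree at most 1.
h = np.array([1, -2, 1])

# response[i] = f[i] - 2 * f[i + 1] + f[i + 2]
response = np.convolve(f, h, mode="valid")

# Indices where the filtered signal is nonzero.
support = np.flatnonzero(response)
print(support)  # indices 48, 49, 118, 119
```

The nonzero responses sit exactly at the windows of three consecutive samples that contain a breakpoint, which is the discrete counterpart of the wavelet coefficients being supported near the knots.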

In our second layer, rather than another wavelet, we use a Gabor filter

(1) |

where the parameters and determine the scale and central frequency and the window function is supported on an interval of unit length. Next, we take the norm for some integer As a result, we obtain translation invariant hybrid scattering coefficients

By design, these measurements are invariant to translations, reflections, and global sign changes. We aim to investigate the ability of our measurements to characterize up to these natural ambiguities. The wavelet-modulus is known to be a powerful signal descriptor [4]. Therefore, in light of Theorem 1, we shall analyze the ability of the measurements

(2) |

to characterize signals of the form

(3) |
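For a sparse signal of this kind, measurements of this type can be evaluated exactly when the window is an indicator function. The sketch below is an illustration under assumptions that are not taken from the paper: the window is the indicator of $[0,1)$, the filter is $g_{s,\xi}(t)=\mathbf{1}_{[0,s)}(t)\,e^{i\xi t}$ with no normalization, the amplitudes are real, and the function name `gabor_lp_measure` is hypothetical. Under these assumptions the modulus of the filtered signal is piecewise constant, so the $p$-th power of its $L^p$ norm is a finite sum, and the three invariances can be checked numerically.

```python
import numpy as np

def gabor_lp_measure(amps, locs, s, xi, p):
    """p-th power of the L^p norm of f * g, where f = sum_j amps[j] * delta(t - locs[j])
    and g(t) = 1_[0,s)(t) * exp(1j * xi * t) (an assumed, unnormalized filter)."""
    amps = np.asarray(amps, dtype=float)
    locs = np.asarray(locs, dtype=float)
    # |(f * g)(t)| = |sum_j amps[j] * 1[locs[j] <= t < locs[j] + s] * exp(-1j * xi * locs[j])|
    # is constant between consecutive breakpoints locs[j] and locs[j] + s.
    breaks = np.unique(np.concatenate([locs, locs + s]))
    mids = 0.5 * (breaks[:-1] + breaks[1:])
    lengths = np.diff(breaks)
    active = (mids[:, None] >= locs[None, :]) & (mids[:, None] < locs[None, :] + s)
    values = np.abs((active * amps * np.exp(-1j * xi * locs)).sum(axis=1))
    return float(np.sum(lengths * values ** p))

amps = np.array([1.0, -2.0, 0.5])   # illustrative amplitudes
locs = np.array([0.0, 0.7, 1.9])    # illustrative support
s, xi, p = 1.3, 2.1, 4

base = gabor_lp_measure(amps, locs, s, xi, p)
shifted = gabor_lp_measure(amps, locs + 3.25, s, xi, p)   # translation
reflected = gabor_lp_measure(amps, -locs, s, xi, p)       # reflection
negated = gabor_lp_measure(-amps, locs, s, xi, p)         # global sign change
print(base, shifted, reflected, negated)
```

All four printed values should agree up to floating-point error. The reflection invariance in this sketch relies on the amplitudes being real, since reflecting the support conjugates the phases inside the modulus.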

For such a signal, we will let

be the vector defined by

and let denote its norm. To supplement our theory, we will show in Section VI that the measurements (2) can be used to reconstruct a sparse signal of the form (3) up to translations, reflections, and global sign changes.

We will show that our measurements characterize the support set . For , we let and consider the difference set

We will assume that is collision free, i.e., that except for when and that is contained in a fine grid, for some . Under these assumptions, it is known [1, 5] that the support set is determined (up to reflection and translation) by except for in the case where and the belong to a specific parametric family. (See Theorem 1 of [5] for full details. For the remainder of this work, we will assume that does not belong to this family and therefore the support set is determined by ) This motivates the following theorem which shows that the measurements (2) uniquely determine .

###### Theorem 2.

Let be an integer and let . Then for almost every , the function

is piecewise linear. Moreover, the set of its isolated singularities is exactly the support set .

Theorem 4 shows that selecting a single random frequency and enough scales such that there is one in between each element of allows us to detect the location of each point of by evaluating at each of the (up to a precision corresponding to the density of the scales). The next result shows that the amplitudes can also be recovered with randomly chosen frequencies. Thus, the measurements (2) characterize sparse signals up to natural ambiguities.
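The detection of kinks on a grid of scales can be seen in a small numerical experiment. The sketch below assumes, purely for illustration, an unnormalized indicator window and $p=2$. Expanding the square then gives the closed form $s\sum_j a_j^2 + 2\sum_{i<j} a_i a_j \cos\bigl(\xi(v_j-v_i)\bigr)\max\bigl(0,\, s-(v_j-v_i)\bigr)$ for real amplitudes $a_j$ at increasing locations $v_j$ (symbols and the function name `measurement_p2` are chosen here, not quoted from the paper). This expression is piecewise linear in $s$, and its slope can change only at pairwise differences of the support points, so second differences over a scale grid flag those differences.

```python
import numpy as np

def measurement_p2(amps, locs, s, xi):
    """Closed form of the squared L^2 norm under the assumed indicator window."""
    amps = np.asarray(amps, dtype=float)
    locs = np.asarray(locs, dtype=float)
    total = s * np.sum(amps ** 2)
    for i in range(len(locs)):
        for j in range(i + 1, len(locs)):
            d = abs(locs[j] - locs[i])
            total += 2 * amps[i] * amps[j] * np.cos(xi * d) * max(0.0, s - d)
    return total

amps, locs, xi = [1.0, -2.0, 0.5], [0.0, 0.7, 1.9], 2.1   # illustrative values
h = 0.01
scales = h * np.arange(1, 251)                            # s = 0.01, ..., 2.50
F = np.array([measurement_p2(amps, locs, s, xi) for s in scales])

# Second differences vanish wherever F is linear, so nonzero entries mark kinks.
second_diff = F[:-2] - 2 * F[1:-1] + F[2:]
kink_scales = scales[1:-1][np.abs(second_diff) > 1e-9]
print(np.round(kink_scales, 2))
```

For this choice of support the pairwise differences are 0.7, 1.2, and 1.9, and the flagged scales land within one grid step of them. The resolution is limited by the spacing `h` of the scale grid, in line with the precision remark above.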

###### Theorem 3.

Let , and let

be a sparse signal of the form (3). Let

be i.i.d. standard normal random variables, where

is assumed to be at least if is even and at least if is odd. Then the following uniqueness result holds almost surely:

Let

Suppose that , and that

for all and all .

Then we have that , and therefore is equivalent to up to translation, reflection, and global sign change.

## II Generalized Exponential Polynomials

In this section, we introduce some notation and state the lemmas needed to prove Theorems 2 and 3. For the proofs of the lemmas in this section, please see Section V.

We let denote the set of functions that can be written as

(4) |

where and . Since the are allowed to be arbitrary (possibly negative or irrational) real numbers, we call these functions generalized exponential polynomials. For we refer to as the degree of . We let refer to the set of all with and let denote the set of such that

The following lemma shows that each has a unique representation as the sum of exponentials, and that therefore, the degree of is well defined.

###### Lemma 1.

Let with

Then if and only if and for all and
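The uniqueness of the representation can be probed numerically. The sketch below assumes that the functions in question have the form $\sum_j c_j e^{i\gamma_j \xi}$ with distinct real exponents $\gamma_j$; this form, the chosen exponents, and the coefficients are assumptions made here for illustration. If exponentials with distinct exponents are linearly independent, then sampling such a function at sufficiently many points yields a full-rank linear system, and least squares returns the original coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Distinct real exponents, including negative and irrational ones (illustrative).
exponents = np.array([-1.5, 0.3, np.sqrt(2.0), 2.0])
coeffs = np.array([1.0 - 0.5j, 2.0, -0.75j, 0.25])

# Sample points and the matrix A[m, j] = exp(1j * exponents[j] * xi[m]).
xi = rng.uniform(-20.0, 20.0, size=200)
A = np.exp(1j * np.outer(xi, exponents))
values = A @ coeffs

# Full column rank means the coefficients are determined by the sampled values.
recovered, _, rank, _ = np.linalg.lstsq(A, values, rcond=None)
print(rank, np.max(np.abs(recovered - coeffs)))
```

A rank equal to the number of exponents, together with a recovery error at the level of floating-point noise, is consistent with the uniqueness statement; it is a numerical check, not a proof.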

Lemma 1 implies that if and , then

(5) |

In particular, if

(6) |

Furthermore, if then

(7) |

except, of course, if and the lead coefficients of and are negatives of one another.

###### Lemma 2.

For let assume that Then the set of points such that

(8) |

has measure zero.

###### Lemma 3.

Let be an odd integer, and let Let and If there are more than distinct such that

then and

###### Lemma 4.

Let be an integer and let Then the set of such that

(9) |

has measure zero.

## III The Proof of Theorem 2

Before proving Theorem 2, we first prove a preliminary result which shows, even without the assumption that is collision free, that is a piecewise linear function whose set of knots is contained in . This result is based on the observation that we may write

where for each ,

(10) |

is a function that only depends on and is a piecewise linear function of whose singularities are contained in

Specifically, we prove the following theorem. We emphasize that this result does not assume that is collision free, which is why for there might be multiple such that .

###### Theorem 4.

Let be an integer, and assume For be as in (10). Then, for every fixed the function is piecewise linear, and is a grid-free sparse signal whose support is contained in Specifically,

(11) |

where

(12) |

and for

(13) |

###### Proof.

We first note that

For let be the set of for which is nonzero if and only if , i.e.,

Then, since it is clear that for

Therefore,

(14) |

where denotes the Lebesgue measure of . We will show that for all , is a piecewise linear function whose knots are contained in

Now, turning our attention to we observe by definition that a point is in if and only if it satisfies the following three conditions:

Therefore, letting and denote and , we see

(16) |

(17) |

and therefore

if the above quantity is positive and zero otherwise. It follows from that is a piecewise linear function, and that is given by

(18) |

We note that in order for this equation to be valid for all we identify and with and and therefore, are interpreted as being the zero function since the domain of is Likewise is interpreted as the zero function in the above equation.

as desired. ∎

Before we prove Theorem 2, we note the following example which shows that, in general, the support of may be a proper subset of

###### Example 1.

If and

then but

We shall now prove Theorem 2.

###### The Proof of Theorem 2.

By assumption, is collision free. Therefore, for all , there is a unique such that , and so, by (11), it suffices to show that for all and for almost every where as in (12) and for (13)

and for ,

where

Observe that are generalized exponential Laurent polynomials of the form introduced in Section II, and in particular, Therefore, when it follows from Lemma 2 that vanishes on a set of measure zero since if we have

In the case where we see that

For any such that we see that is a solution to

Thus, vanishes on a set of measure zero since the left-hand side of the above equation is a trigonometric polynomial.

∎

## IV The Proof of Theorem 3

###### Proof.

Let be i.i.d. standard normal random variables. Since

is collision free, with probability one, each of the

are distinct modulo , i.e.,

(19) |

for all and except when For the rest of the proof we will assume this is the case.

Let

be a signal , and for all and for all Note that depends on but is independent of By assumption, and are collision free (and also, as discussed in Section I, we assume that we are not in the special case where and the belong to a special parametrized family). Therefore, the fact that implies that the support sets of and are equivalent up to translation and reflection, so we may assume without loss of generality that for all

We will show that must be given by

(20) |

where or

(21) |

Then, we will show that, if satisfies (21), but then with probability one

Since (and therefore ) was chosen to depend on but not these two facts together will imply that, with probability one, if is any signal such that and for all and all then and therefore is equivalent to up to reflection and translation.

We will first show that (20) holds in the case where is odd. Setting and using (12) implies that for all and all we have

(22) |

Therefore, constitute solutions, which are distinct modulo to the equation

Now consider the case where is even. Similarly to (22), the assumption that implies that for all

Therefore, for all are zeros of

which are distinct modulo Using the fact that

one may verify that is a trigonometric polynomial of degree at most given by

Thus, since this implies that must be identically zero. In particular, setting the lead coefficient equal to zero implies

for all Using the binomial theorem and setting the coefficient equal to zero gives