The wavelet scattering transform is a mathematical model of Convolutional Neural Networks (CNNs) introduced by S. Mallat. Analogously to the feed-forward portion of a CNN, it produces a latent representation of an input signal via an alternating sequence of filter convolutions and nonlinearities. It differs, most notably, by using predesigned wavelet filters rather than filters learned from data.
Using predefined filters allows for rigorous analysis and helps us understand why a deep nonlinear network is better than a wide, shallow, linear network with the same number of parameters. Ideally, a feed-forward network should produce a representation which is sufficiently descriptive for downstream tasks, but also stable to deformations such as translations. Linear networks are typically unable to do both and often must discard high-frequency information to achieve stability. Mallat’s analysis in 
shows that the scattering transform, on the other hand, captures high-frequency information via wavelets and then pushes it down to lower, more stable, frequencies using a nonlinear activation function. Thus, the nonlinear structure enables the network to stably capture high-frequency information.
The scattering transform also helps us understand which filters are useful for effectively encoding information. While the optimal choice is task dependent, wavelets are often a good choice since natural images are typically sparse in the wavelet basis and, as discussed above, wavelets are able to capture high-frequency information. Moreover, and perhaps most importantly, the filters learned in the early layers of CNNs typically resemble wavelets.
This paper focuses on the choice of filters for later layers of the network. In particular, we propose a two-layer hybrid scattering model. In the first layer, we use a wavelet convolution to sparsify the input. Then, we use a Gabor-type filter to leverage this sparsity.
For simplicity, we assume that the input is a piecewise polynomial whose knots are located at points . We shall also assume that each of its piecewise components has degree at most . We let be a mother wavelet with and let
We will assume that has
vanishing moments, which implies that (see e.g. ). It follows that is contained in
. To further promote sparsity, we next apply a max-pooling operator:
As summarized in the following theorem, this yields a linear combination of Dirac delta functions.
Assume that . Then,
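As an informal illustration of the sparsification performed by the first layer, the following Python sketch uses a discrete stand-in for the construction above. It is a toy under stated assumptions, not the construction analyzed in this paper: the signal is sampled on the integers, the wavelet is replaced by a finite-difference filter (which annihilates sampled polynomials of low degree, a discrete analogue of vanishing moments), and the pooling rule simply keeps the largest response in each run of nonzero outputs. The names `finite_difference_filter`, `apply_filter`, `pool_peaks`, and `piecewise_polynomial`, as well as the knots 20 and 45, are hypothetical.

```python
from math import comb


def finite_difference_filter(order):
    """Taps of the order-th forward difference.

    This filter annihilates sampled polynomials of degree < order,
    which is a discrete analogue of having `order` vanishing moments.
    """
    return [(-1) ** (order - k) * comb(order, k) for k in range(order + 1)]


def apply_filter(signal, taps):
    """Valid-mode correlation of `signal` with `taps` (no padding)."""
    width = len(taps)
    return [sum(c * signal[n + k] for k, c in enumerate(taps))
            for n in range(len(signal) - width + 1)]


def pool_peaks(response):
    """Assumed pooling rule: from each run of consecutive nonzero
    responses, keep the index with the largest modulus."""
    peaks, run = [], []
    for n, r in enumerate(response + [0]):  # trailing 0 closes the last run
        if r != 0:
            run.append(n)
        elif run:
            peaks.append(max(run, key=lambda i: abs(response[i])))
            run = []
    return peaks


def piecewise_polynomial(n):
    """Hypothetical input: degree <= 2 on each piece, knots at 20 and 45."""
    if n < 20:
        return n * n - 3 * n
    if n < 45:
        return 5 * n + 7
    return -2 * n * n + n


signal = [piecewise_polynomial(n) for n in range(64)]
taps = finite_difference_filter(3)  # annihilates degree <= 2
response = apply_filter(signal, taps)

# Indices where the filtered signal is nonzero, and one peak per cluster.
support = [n for n, r in enumerate(response) if r != 0]
peaks = pool_peaks(response)
print(support)
print(peaks)
```

The filtered signal vanishes wherever the filter window lies inside a single polynomial piece, so the nonzero responses cluster next to the two knots, and pooling leaves one spike per knot.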
In our second layer, rather than another wavelet, we use a Gabor filter
where the parameters and determine the scale and central frequency and the window function is supported on an interval of unit length. Next, we take the norm for some integer . As a result, we obtain translation-invariant hybrid scattering coefficients
By design, these measurements are invariant to translations, reflections, and global sign changes. We aim to investigate the ability of our measurements to characterize up to these natural ambiguities. The wavelet-modulus is known to be a powerful signal descriptor. Therefore, in light of Theorem 1, we shall analyze the ability of the measurements
to characterize signals of the form
For such a signal, we will let
be the vector defined by and let denote its norm.
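The invariances claimed above can be checked numerically on a small example. The sketch below makes assumptions that are not stated in the text: it takes the window to be the indicator function of [0, 1), takes the Gabor filter to be g(t) = w(t/s) exp(i ξ t), and evaluates the p-th power of the L^p norm of g * x for a sparse signal x = Σ_j a_j δ_{t_j} exactly, using the fact that |g * x| is constant between consecutive points of the form t_j or t_j + s. The function name `hybrid_measurement` and all numerical values are hypothetical.

```python
import cmath
import math


def hybrid_measurement(amplitudes, locations, s, xi, p):
    """Return ||g * x||_p^p for x = sum_j a_j * delta_{t_j}.

    Assumed filter (not taken from the paper): g(t) = w(t / s) * exp(i*xi*t)
    with w the indicator function of [0, 1).
    """
    # (g * x)(t) = exp(i*xi*t) * sum of a_j * exp(-i*xi*t_j) over the
    # indices j with t_j <= t < t_j + s, so |g * x| only changes at the
    # breakpoints t_j and t_j + s.
    breaks = sorted(set(locations) | {t + s for t in locations})
    total = 0.0
    for left, right in zip(breaks, breaks[1:]):
        mid = 0.5 * (left + right)
        value = sum(a * cmath.exp(-1j * xi * t)
                    for a, t in zip(amplitudes, locations)
                    if t <= mid < t + s)
        total += abs(value) ** p * (right - left)
    return total


a = [1.0, -2.0, 1.5]   # amplitudes (hypothetical)
t = [0.0, 1.0, 3.0]    # spike locations (hypothetical)
s, xi, p = 1.7, 0.7, 4

base = hybrid_measurement(a, t, s, xi, p)
shifted = hybrid_measurement(a, [u + 2.5 for u in t], s, xi, p)
reflected = hybrid_measurement(a, [-u for u in t], s, xi, p)
negated = hybrid_measurement([-c for c in a], t, s, xi, p)

# All four numbers should agree up to floating-point rounding.
print(base, shifted, reflected, negated)
```

Because the measurement only sees the modulus of g * x integrated over the whole line, translating or reflecting the spikes, or flipping the sign of every amplitude, leaves it unchanged in this example.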
We will show that our measurements characterize the support set . For , we let and consider the difference set
We will assume that is collision free, i.e., that except for when and that is contained in a fine grid, for some . Under these assumptions, it is known [1, 5] that the support set is determined (up to reflection and translation) by except in the case where and the belong to a specific parametric family. (See Theorem 1 of  for full details. For the remainder of this work, we will assume that does not belong to this family and therefore the support set is determined by ) This motivates the following theorem, which shows that the measurements (2) uniquely determine .
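For concreteness, the collision-free condition can be tested on a finite support set as in the following sketch. It assumes the usual reading of the condition, namely that no two distinct pairs of points are the same distance apart; the function names `positive_differences` and `is_collision_free` and the example supports are hypothetical. Exact types (integers or `fractions.Fraction`) are advisable, since the test relies on exact equality.

```python
from itertools import combinations


def positive_differences(support):
    """Pairwise distances |t_i - t_j| over unordered pairs i != j."""
    return [abs(ti - tj) for ti, tj in combinations(support, 2)]


def is_collision_free(support):
    """True when no two distinct pairs of points are equally far apart
    (assumed reading of the collision-free condition)."""
    diffs = positive_differences(support)
    return len(set(diffs)) == len(diffs)


print(is_collision_free([0, 1, 3, 7]))  # distances 1, 3, 7, 2, 6, 4
print(is_collision_free([0, 1, 2]))     # 1 - 0 equals 2 - 1
```

Comparing distances over unordered pairs is equivalent to comparing signed differences over ordered pairs, because reversing a pair only changes the sign of its difference.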
Let be an integer and let . Then for almost every , the function
is piecewise linear. Moreover, the set of its isolated singularities is exactly the support set .
Theorem 2 shows that selecting a single random frequency and enough scales such that there is one in between each element of allows us to detect the location of each point of by evaluating at each of the (up to a precision corresponding to the density of the scales). The next result shows that the amplitudes can also be recovered with randomly chosen frequencies. Thus, the measurements (2) characterize sparse signals up to natural ambiguities.
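The detection procedure described in the preceding paragraph can be sketched as follows. The code again rests on assumptions that are not taken from the text: the window is the indicator function of [0, 1), the filter is g(t) = w(t/s) exp(i ξ t), and kinks are located by thresholding second differences of the measurement sampled on a uniform grid of scales. The names `hybrid_measurement` and `kink_intervals`, the tolerance, and all numerical values are hypothetical.

```python
import cmath


def hybrid_measurement(amplitudes, locations, s, xi, p):
    """||g * x||_p^p for x = sum_j a_j * delta_{t_j}, under the assumed
    filter g(t) = w(t / s) * exp(i*xi*t) with w the indicator of [0, 1)."""
    # |g * x| is constant between consecutive breakpoints t_j, t_j + s.
    breaks = sorted(set(locations) | {t + s for t in locations})
    total = 0.0
    for left, right in zip(breaks, breaks[1:]):
        mid = 0.5 * (left + right)
        value = sum(a * cmath.exp(-1j * xi * t)
                    for a, t in zip(amplitudes, locations)
                    if t <= mid < t + s)
        total += abs(value) ** p * (right - left)
    return total


def kink_intervals(scales, values, tol=1e-8):
    """Return (s_{k-1}, s_{k+1}) for every interior grid index k at which
    the second difference of the samples exceeds `tol` in modulus.

    On a uniform grid, a piecewise linear function has zero second
    difference unless a kink lies strictly between s_{k-1} and s_{k+1}.
    """
    flagged = []
    for k in range(1, len(scales) - 1):
        second_diff = values[k + 1] - 2.0 * values[k] + values[k - 1]
        if abs(second_diff) > tol:
            flagged.append((scales[k - 1], scales[k + 1]))
    return flagged


a = [1.0, -2.0, 1.5]   # amplitudes (hypothetical)
t = [0.0, 1.0, 3.0]    # spikes whose pairwise distances are 1, 2, 3
xi, p = 0.7, 4         # a single fixed frequency

scales = [0.03 + 0.1 * k for k in range(40)]
values = [hybrid_measurement(a, t, s, xi, p) for s in scales]
flagged = kink_intervals(scales, values)
print(flagged)
```

In this toy example the measurement is linear in the scale between consecutive pairwise distances of the spikes, so every flagged interval brackets one of the distances 1, 2, or 3, and the grid spacing controls the precision with which each is located.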
Let and let
be a sparse signal of the form (3).
Let be i.i.d. standard normal random variables, where is assumed to be at least if is even and at least if is odd. Then the following uniqueness result holds almost surely:
Suppose that , and that
for all and all .
Then we have that , and therefore is equivalent to up to translation, reflection, and global sign change.
II Generalized Exponential Polynomials
We let denote the set of functions that can be written as
where and . Since the are allowed to be arbitrary (possibly negative or irrational) real numbers, we call these functions generalized exponential polynomials. For we refer to as the degree of . We let refer to the set of all with and let denote the set of such that
The following lemma shows that each has a unique representation as the sum of exponentials, and that therefore, the degree of is well defined.
Then if and only if and for all and
Lemma 1 implies that if and , then
In particular, if
Furthermore, if then
except, of course, if and the lead coefficients of and are negatives of one another.
For let assume that Then the set of points such that
has measure zero.
Let be an odd integer, and let Let and If there are more than distinct such that
Let be an integer and let Then the set of such that
has measure zero.
III The Proof of Theorem 2
Before proving Theorem 2, we will first prove a preliminary result which shows, even without the assumption that is collision free, that is a piecewise linear function whose set of knots is contained in . This result is based on the observation that we may write
where for each ,
is a function that depends only on and is a piecewise linear function of whose singularities are contained in
Specifically, we prove the following theorem. We emphasize that this result does not assume that is collision free, which is why for there might be multiple such that .
Let be an integer, and assume For be as in (10). Then, for every fixed the function is piecewise linear, and is a grid-free sparse signal whose support is contained in Specifically,
We first note that
For let be the set of for which is nonzero if and only if , i.e.,
Then, since it is clear that for
where denotes the Lebesgue measure of . We will show that for all , is a piecewise linear function whose knots are contained in
First, we note that unless has the form for some Therefore,
where and, as in (10), is given by
Now, turning our attention to we observe by definition that a point is in if and only if it satisfies the following three conditions:
Therefore, letting and denote and , we see
if the above quantity is positive and zero otherwise. It follows from that is a piecewise linear function, and that is given by
We note that in order for this equation to be valid for all we identify and with and and therefore, are interpreted as being the zero function since the domain of is Likewise is interpreted as the zero function in the above equation.
as desired. ∎
Before we prove Theorem 2, we note the following example which shows that, in general, the support of may be a proper subset of
We shall now prove Theorem 2.
The Proof of Theorem 2.
and for ,
Observe that are generalized exponential Laurent polynomials of the form introduced in Section II, and in particular, Therefore, when it follows from Lemma 2 that vanishes on a set of measure zero since if we have
In the case where we see that
For any such that we see that is a solution to
Thus, vanishes on a set of measure zero since the left-hand side of the above equation is a trigonometric polynomial.
IV The Proof of Theorem 3
Let be i.i.d. standard normal random variables. Since
is collision free, with probability one, each of the are distinct modulo , i.e.
for all and except when For the rest of the proof we will assume this is the case.
be a signal , and for all and for all . Note that depends on but is independent of . By assumption, and are collision free (and, as discussed in Section I, we also assume that we are not in the special case where and the belong to a special parametrized family). Therefore, the fact that implies that the support sets of and are equivalent up to translation and reflection, so we may assume without loss of generality that for all
We will show that must be given by
Then, we will show that, if satisfies (21), but then with probability one
Since (and therefore ) was chosen to depend on but not these two facts together will imply that, with probability one, if is any signal such that and for all and all then and therefore is equivalent to up to reflection and translation.
Therefore, constitute solutions, which are distinct modulo to the equation
Now consider the case where is even. Similarly to (22), the assumption that implies that for all
Therefore, for all are zeros of
which are distinct modulo Using the fact that
one may verify that is a trigonometric polynomial of degree at most given by
Thus, since this implies that must be identically zero. In particular, setting the lead coefficient equal to zero implies
for all Using the binomial theorem and setting the coefficient equal to zero gives