Group Invariant Scattering

01/12/2011 ∙ by Stéphane Mallat, et al. ∙ 0

This paper constructs translation invariant operators on L2(R^d), which are Lipschitz continuous to the action of diffeomorphisms. A scattering propagator is a path ordered product of non-linear and non-commuting operators, each of which computes the modulus of a wavelet transform. A local integration defines a windowed scattering transform, which is proved to be Lipschitz continuous to the action of diffeomorphisms. As the window size increases, it converges to a wavelet scattering transform which is translation invariant. Scattering coefficients also provide representations of stationary processes. Expected values depend upon high order moments and can discriminate processes having the same power spectrum. Scattering operators are extended on L2(G), where G is a compact Lie group, and are invariant under the action of G. Combining a scattering on L2(R^d) and on L2(SO(d)) defines a translation and rotation invariant scattering on L2(R^d).




1 Introduction

Symmetry and invariants, which play a major role in physics [6], are making their way into signal information processing. The information content of sounds or images is typically not affected under the action of finite groups such as translations or rotations, and it is stable to the action of small diffeomorphisms that deform signals [21]. This motivates the study of translation-invariant representations of functions, which are Lipschitz continuous to the action of diffeomorphisms, and which keep high-frequency information to discriminate different types of signals. Invariance to the action of compact Lie groups and rotations is then studied.

We first concentrate on translation invariance. Let $L_c f(x) = f(x - c)$ denote the translation of $f \in L^2(\mathbb{R}^d)$ by $c \in \mathbb{R}^d$. An operator $\Phi$ from $L^2(\mathbb{R}^d)$ to a Hilbert space $\mathcal{H}$ is translation-invariant if $\Phi(L_c f) = \Phi(f)$ for all $f \in L^2(\mathbb{R}^d)$ and $c \in \mathbb{R}^d$. Canonical translation invariant operators satisfy $\Phi(f) = L_a f$ for some $a \in \mathbb{R}^d$ which depends upon $f$ [15]. The modulus of the Fourier transform of $f$ is an example of a non-canonical translation invariant operator. However, these translation invariant operators are not Lipschitz continuous to the action of diffeomorphisms. Instabilities to deformations are well-known to appear at high frequencies [10]. The major difficulty is to maintain the Lipschitz continuity over high frequencies.
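
The translation invariance of the Fourier modulus can be checked numerically. The following is a minimal sketch, assuming NumPy and a discrete periodic signal, where a circular shift (`np.roll`) stands in for the translation; the helper name `fourier_modulus` is illustrative and not taken from the paper.

```python
import numpy as np

def fourier_modulus(f):
    """Modulus of the discrete Fourier transform of a 1-D signal."""
    return np.abs(np.fft.fft(f))

rng = np.random.default_rng(0)
f = rng.standard_normal(256)     # arbitrary test signal
shifted = np.roll(f, 17)         # circular translation by 17 samples

# A translation multiplies each Fourier coefficient by a unit-modulus phase,
# so the modulus is unchanged up to floating-point error.
max_diff = np.max(np.abs(fourier_modulus(f) - fourier_modulus(shifted)))
print(max_diff)
```

The same check fails for a dilation of the signal, which is the instability discussed next.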

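To preserve stability in $L^2(\mathbb{R}^d)$ we want $\Phi$ to be nonexpansive:

$$\forall (f, h) \in L^2(\mathbb{R}^d)^2, \quad \|\Phi(f) - \Phi(h)\|_{\mathcal{H}} \le \|f - h\|.$$

It is then sufficient to verify its Lipschitz continuity relatively to the action of small diffeomorphisms close to translations. Such a diffeomorphism transforms $x \in \mathbb{R}^d$ into $x - \tau(x)$, where $\tau(x) \in \mathbb{R}^d$ is the displacement field. Let $L_\tau f(x) = f(x - \tau(x))$ denote the action of the diffeomorphism $1 - \tau$ on $f$. Lipschitz stability means that $\|\Phi(f) - \Phi(L_\tau f)\|$ is bounded by the “size” of the diffeomorphism, and hence by the distance between $1 - \tau$ and $1$, up to a multiplicative constant multiplied by $\|f\|$. Let $|\tau(x)|$ denote the Euclidean norm in $\mathbb{R}^d$, $|\nabla\tau(x)|$ the sup norm of the matrix $\nabla\tau(x)$, and $|H\tau(x)|$ the sup norm of the Hessian tensor. The weak topology on $\mathbf{C}^2$ diffeomorphisms defines a distance between $1 - \tau$ and $1$, over any compact subset $\Omega$ of $\mathbb{R}^d$, by:

$$d_\Omega(1, 1 - \tau) = \sup_{x \in \Omega} |\tau(x)| + \sup_{x \in \Omega} |\nabla\tau(x)| + \sup_{x \in \Omega} |H\tau(x)|. \qquad (1)$$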

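A translation invariant operator $\Phi$ is said to be Lipschitz continuous to the action of $\mathbf{C}^2$ diffeomorphisms if for any compact $\Omega \subset \mathbb{R}^d$ there exists $C$ such that for all $f \in L^2(\mathbb{R}^d)$ supported in $\Omega$ and all $\tau \in \mathbf{C}^2(\mathbb{R}^d)$

$$\|\Phi(f) - \Phi(L_\tau f)\|_{\mathcal{H}} \le C\, \|f\| \Big( \sup_{x \in \mathbb{R}^d} |\nabla\tau(x)| + \sup_{x \in \mathbb{R}^d} |H\tau(x)| \Big). \qquad (2)$$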

Since $\Phi$ is translation invariant, the Lipschitz upper bound does not depend upon the maximum translation amplitude $\sup_x |\tau(x)|$ of the diffeomorphism metric (1). The Lipschitz continuity (2) implies that $\Phi$ is invariant to global translations, but it is much stronger. $\Phi$ is almost invariant to “local translations” by $\tau(x)$, up to the first and second order deformation terms.

High frequency instabilities to deformations can be avoided by grouping frequencies into dyadic packets in $\mathbb{R}^d$, with a wavelet transform. However, a wavelet transform is not translation invariant. A translation invariant operator is constructed with a scattering procedure along multiple paths, which preserves the Lipschitz stability of wavelets to the action of diffeomorphisms. A scattering propagator is first defined as a path ordered product of non-linear and non-commuting operators, each of which computes the modulus of a wavelet transform [13]. This cascade of convolutions and modulus can also be interpreted as a convolutional neural network [11]. A windowed scattering transform is a nonexpansive operator which locally integrates the scattering propagator output. For appropriate wavelets, the main theorem in Section 2 proves that a windowed scattering preserves the norm, $\|S_J[\mathcal{P}_J] f\| = \|f\|$ for all $f \in L^2(\mathbb{R}^d)$, and that it is Lipschitz continuous to the action of diffeomorphisms.

When the window size increases, windowed scattering transforms converge to a translation invariant scattering transform, defined on a path set $\overline{\mathcal{P}}_\infty$ which is not countable. Section 3 introduces a measure $\mu$ and a metric on $\overline{\mathcal{P}}_\infty$, and proves that scattering transforms of functions in $L^2(\mathbb{R}^d)$ belong to $L^2(\overline{\mathcal{P}}_\infty, d\mu)$. A scattering transform has striking similarities with a Fourier transform modulus, but a different behavior relatively to the action of diffeomorphisms. Numerical examples are shown. An open conjecture remains on conditions for strong convergence in $L^2(\mathbb{R}^d)$.

The representation of stationary processes with the Fourier power spectrum results from the translation invariance of the Fourier modulus. Similarly, Section 4 defines an expected scattering transform which maps stationary processes to an $\mathbf{l}^2$ space. Scattering coefficients depend upon high order moments of stationary processes, and can thus discriminate processes having the same second-order moments. As opposed to the Fourier spectrum, a scattering representation is Lipschitz continuous to random deformations up to a log term. For large classes of ergodic processes, it is numerically observed that the scattering transform of a single realization provides a mean-square consistent estimator of the expected scattering transform.

Section 5 extends scattering operators to build invariants to actions of compact Lie groups $G$. The left action of $g \in G$ on $f \in L^2(G)$ is denoted $L_g f(r) = f(g^{-1} r)$. An operator $\Phi$ on $L^2(G)$ is invariant to the action of $G$ if $\Phi(L_g f) = \Phi(f)$ for all $f \in L^2(G)$ and all $g \in G$. Invariant scattering operators are constructed on $L^2(G)$ with a scattering propagator which iterates on a wavelet transform defined on $L^2(G)$, and a modulus operator which removes complex phases. A translation and rotation invariant scattering on $L^2(\mathbb{R}^d)$ is obtained by combining a scattering on $L^2(\mathbb{R}^d)$ and a scattering on $L^2(SO(d))$.

A software package is available at, to reproduce numerical experiments. Applications to audio and image classification can be found in [1, 3, 4, 18].

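Notations: $\|\tau\|_\infty := \sup_{x \in \mathbb{R}^d} |\tau(x)|$, $\|\nabla\tau\|_\infty := \sup_{x \in \mathbb{R}^d} |\nabla\tau(x)|$, and $\|H\tau\|_\infty := \sup_{x \in \mathbb{R}^d} |H\tau(x)|$ where $|H\tau(x)|$ is the norm of the Hessian tensor. The inner product of $(x, y) \in \mathbb{R}^{2d}$ is $x \cdot y$. The norm of $f$ in a Hilbert space is $\|f\|$ and in $L^2(\mathbb{R}^d)$: $\|f\|^2 = \int |f(x)|^2\, dx$. The norm in $L^1(\mathbb{R}^d)$ is $\|f\|_1 = \int |f(x)|\, dx$. We denote $\hat f(\omega) = \int f(x)\, e^{-i x \cdot \omega}\, dx$ the Fourier transform of $f$. We denote $g \circ f(x) = f(g x)$ the action of a group element $g \in G$. An operator $R$ parametrized by $p$ is denoted $R[p]$ and $R[\Omega] = \{R[p]\}_{p \in \Omega}$. The sup norm of a linear operator $A$ in $L^2(\mathbb{R}^d)$ is denoted $\|A\|$ and the commutator of two operators is $[A, B] = AB - BA$.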

2 Finite Path Scattering

To avoid high frequency instabilities under the action of diffeomorphisms, Section 2.2 introduces scattering operators, which iteratively apply wavelet transforms and remove complex phases with a modulus. Section 2.3 proves that a scattering is nonexpansive and preserves norms. Translation invariance and Lipschitz continuity to deformations are proved in Sections 2.4 and 2.5.

2.1 From Fourier to Littlewood-Paley Wavelets

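The Fourier transform modulus $\Phi(f) = |\hat f|$ is translation-invariant. Indeed for $c \in \mathbb{R}^d$, the translation $L_c f(x) = f(x - c)$ satisfies $\widehat{L_c f}(\omega) = e^{-i c \cdot \omega}\, \hat f(\omega)$ and hence $|\widehat{L_c f}| = |\hat f|$. However, deformations lead to well-known instabilities at high frequencies [10]. This is illustrated with a small scaling operator, $L_\tau f(x) = f(x - \tau(x)) = f((1 - s) x)$, for $\tau(x) = s x$ and $\|\nabla\tau\|_\infty = |s| < 1$. If $f(x) = e^{i \xi \cdot x}\, \theta(x)$ then scaling by $1 - s$ translates the central frequency $\xi$ to $(1 - s)\xi$. If $\theta$ is regular with a fast decay then

$$\big\| |\widehat{L_\tau f}| - |\hat f| \big\| \sim |s|\, |\xi|\, \|\theta\| = \|\nabla\tau\|_\infty\, |\xi|\, \|f\|. \qquad (3)$$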

Since $|\xi|$ can be arbitrarily large, $\Phi(f) = |\hat f|$ does not satisfy the Lipschitz continuity condition (2) when scaling high frequencies. The frequency displacement from $\xi$ to $(1 - s)\xi$ has a small impact if sinusoidal waves are replaced by localized functions having a Fourier support which is wider at high frequencies. This is achieved by a wavelet transform [7, 14], whose properties are briefly reviewed in this section.
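
The growth of this instability with the carrier frequency can be observed numerically. The sketch below is only illustrative: it assumes NumPy, a one-dimensional signal sampled on a finite grid, and a Gaussian window for the regular envelope (a choice made here, not taken from the paper). It measures the relative distance between the Fourier moduli of a modulated window and of its dilation by $1 - s$, for a fixed small $s$ and increasing carrier frequency.

```python
import numpy as np

def fourier_modulus_distance(xi, s, half_width=20.0, n=4096):
    """Relative L2 distance between |F(f)| and |F(f((1 - s) x))|
    for f(x) = exp(i xi x) theta(x), with a Gaussian window theta."""
    x = np.linspace(-half_width, half_width, n, endpoint=False)

    def f(t):
        # Gaussian window modulated at frequency xi (illustrative choice)
        return np.exp(1j * xi * t) * np.exp(-t**2 / 2)

    original = np.abs(np.fft.fft(f(x)))
    scaled = np.abs(np.fft.fft(f((1 - s) * x)))
    return np.linalg.norm(scaled - original) / np.linalg.norm(original)

s = 0.05  # small dilation factor, |s| < 1
distances = [fourier_modulus_distance(xi, s) for xi in (5.0, 20.0, 80.0)]
print(distances)
```

For a fixed $s$, the distance increases with the carrier frequency until the two spectra no longer overlap, so no bound proportional to $|s|$ alone can hold.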

A wavelet transform is constructed by dilating a wavelet $\psi \in L^2(\mathbb{R}^d)$ with a scale sequence $\{a^j\}_{j \in \mathbb{Z}}$ for $a > 1$. For image processing, usually $a = 2$ [3, 4]. Audio processing requires a better frequency resolution with typically $a \le 2^{1/8}$ [1]. To simplify notations, we normalize $a = 2$, with no loss of generality. Dilated wavelets are also rotated with elements of a finite rotation group $G$, which also includes the reflection $-1$ with respect to $0$: $-1 x = -x$. If $d$ is even then $G$ is a subgroup of $SO(d)$, and if $d$ is odd then $G$ is a finite subgroup of $O(d)$. A mother wavelet $\psi$ is dilated by $2^{-j}$ and rotated by $r \in G$:

$$\psi_{2^j r}(x) = 2^{dj}\, \psi(2^j r^{-1} x). \qquad (4)$$

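Its Fourier transform is $\hat\psi_{2^j r}(\omega) = \hat\psi(2^{-j} r^{-1} \omega)$. A scattering transform is computed with wavelets that can be written

$$\psi(x) = e^{i \eta \cdot x}\, \theta(x) \quad \text{and hence} \quad \hat\psi(\omega) = \hat\theta(\omega - \eta), \qquad (5)$$

where $\hat\theta(\omega)$ is a real function concentrated in a low frequency ball centered at $\omega = 0$, whose radius is of the order of $\pi$. It results that $\hat\psi(\omega)$ is real and concentrated in a frequency ball of the same radius, but centered at $\eta$. To simplify notations we denote $\lambda = 2^j r \in 2^{\mathbb{Z}} \times G$, with $|\lambda| = 2^j$. After dilation and rotation, $\hat\psi_\lambda(\omega) = \hat\theta(\lambda^{-1}\omega - \eta)$ covers a ball centered at $\lambda\eta$ with a radius proportional to $|\lambda| = 2^j$. The index $\lambda$ thus specifies the frequency localization and spread of $\hat\psi_\lambda$.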

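As opposed to wavelet bases, a Littlewood-Paley wavelet transform [7, 14] is a redundant representation which computes convolution values at all $x \in \mathbb{R}^d$, without subsampling:

$$\forall x \in \mathbb{R}^d, \quad W[\lambda] f(x) = f \star \psi_\lambda(x) = \int f(u)\, \psi_\lambda(x - u)\, du. \qquad (6)$$

Its Fourier transform is $\widehat{W[\lambda] f}(\omega) = \hat f(\omega)\, \hat\psi_\lambda(\omega) = \hat f(\omega)\, \hat\psi(\lambda^{-1}\omega)$.

If $f$ is real then $\hat f(-\omega) = \hat f^*(\omega)$ and if $\hat\psi(\omega)$ is real then $W[-\lambda] f = W[\lambda] f^*$. Let $G^+$ denote the quotient of $G$ with $\{-1, 1\}$, where two rotations $r$ and $-r$ are equivalent. It is sufficient to compute $W[2^j r] f$ for “positive” rotations $r \in G^+$. If $f$ is complex then $W[2^j r] f$ must be computed for all $r \in G = G^+ \times \{-1, 1\}$.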

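A wavelet transform at a scale $2^J$ only keeps wavelets of frequencies $2^j > 2^{-J}$. The low frequencies which are not covered by these wavelets are provided by an averaging over a spatial domain proportional to $2^J$:

$$A_J f = f \star \phi_{2^J} \quad \text{with} \quad \phi_{2^J}(x) = 2^{-dJ}\, \phi(2^{-J} x). \qquad (7)$$

If $f$ is real then the wavelet transform $W_J f = \{A_J f, (W[\lambda] f)_{\lambda \in \Lambda_J}\}$ is indexed by $\Lambda_J = \{\lambda = 2^j r : r \in G^+, 2^j > 2^{-J}\}$. Its norm is

$$\|W_J f\|^2 = \|A_J f\|^2 + \sum_{\lambda \in \Lambda_J} \|W[\lambda] f\|^2. \qquad (8)$$

If $J = \infty$ then $W_\infty f = \{W[\lambda] f\}_{\lambda \in \Lambda_\infty}$ with $\Lambda_\infty = 2^{\mathbb{Z}} \times G^+$. Its norm is $\|W_\infty f\|^2 = \sum_{\lambda \in \Lambda_\infty} \|W[\lambda] f\|^2$. For complex-valued functions $f$, all rotations in $G$ are included by defining $W_J f = \{A_J f, (W[\lambda] f)_{\lambda, -\lambda \in \Lambda_J}\}$ and $W_\infty f = \{W[\lambda] f\}_{\lambda, -\lambda \in \Lambda_\infty}$. The following proposition gives a standard Littlewood-Paley condition [7] so that $W_J$ is unitary.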

Proposition 2.1

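For any $J \in \mathbb{Z}$ or $J = \infty$, $W_J$ is unitary in the spaces of real-valued or complex-valued functions in $L^2(\mathbb{R}^d)$ if and only if for almost all $\omega \in \mathbb{R}^d$

$$\beta \sum_{j=-\infty}^{\infty} \sum_{r \in G} |\hat\psi(2^{-j} r^{-1} \omega)|^2 = 1 \quad \text{and} \quad |\hat\phi(\omega)|^2 = \beta \sum_{j=-\infty}^{0} \sum_{r \in G} |\hat\psi(2^{-j} r^{-1} \omega)|^2, \qquad (9)$$

where $\beta = 1$ for complex functions and $\beta = 1/2$ for real functions.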

Proof: If is complex, and one can verify that (9) is equivalent to


Since , multiplying (10) by , and applying the Plancherel formula proves that . For the same result is obtained by letting go to .

Conversely, if then (10) is satisfied for almost all . Otherwise, one can construct a function where has a support in the domain of where (10) is not valid. With the Plancherel formula we verify that , which contradicts the hypothesis.

If is real then so . Hence remains the same if is restricted to and is multiplied by , which yields condition (9) with .
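
A discrete analogue of this unitarity condition can be checked numerically. The sketch below is a simplified illustration under stated assumptions: it uses NumPy, a one-dimensional periodic signal, and Gaussian bumps in the Fourier domain that are renormalized so that their squared moduli sum to one at every frequency (these are not the Battle-Lemarié wavelets used for the paper's experiments). The function name `tight_frame_filters` and its parameters are hypothetical. Since the test signal is complex, both positive and negative frequency bands are included.

```python
import numpy as np

def tight_frame_filters(n, num_scales):
    """Fourier-domain filters (low-pass first) whose squared moduli sum to 1."""
    omega = 2 * np.pi * np.fft.fftfreq(n)              # frequencies in [-pi, pi)
    # Low-pass bump covering the frequencies below the coarsest band.
    bumps = [np.exp(-omega**2 / (2 * (np.pi * 2.0**-num_scales) ** 2))]
    for j in range(num_scales):
        xi = 0.75 * np.pi * 2.0**-j                    # center frequency of band j
        sigma = xi / 3                                 # bandwidth proportional to xi
        for sign in (1, -1):                           # positive and negative bands
            bumps.append(np.exp(-(omega - sign * xi) ** 2 / (2 * sigma**2)))
    bumps = np.array(bumps)
    # Renormalize so that the sum of squares equals 1 at every frequency.
    return bumps / np.sqrt(np.sum(bumps**2, axis=0))

rng = np.random.default_rng(0)
n = 512
f = rng.standard_normal(n) + 1j * rng.standard_normal(n)
filters = tight_frame_filters(n, num_scales=4)

# Circular convolutions computed in the Fourier domain, one row per filter.
coeffs = np.fft.ifft(np.fft.fft(f) * filters, axis=-1)
energy_in = np.sum(np.abs(f) ** 2)
energy_out = np.sum(np.abs(coeffs) ** 2)
print(energy_out / energy_in)
```

By the Plancherel formula, the energy of the coefficients equals the energy of the signal whenever the squared filter moduli sum to one, which is the mechanism used in the proof.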

In all the following, is a real function which satisfies the unitary condition (9). It implies that and for all . We choose to be real and symmetric so that is also real and symmetric and for all . We also suppose that and are twice differentiable and that their decay as well as the decay of their partial derivatives of order and is .

A change of variable in the wavelet transform integral shows that if is scaled and rotated, with , then the wavelet transform is scaled and rotated according to:


Since is invariant to rotations in we verify that commutes with rotations in : for all .

In dimension , . According to (5), to build a complex wavelet concentrated on a single frequency band, we set for . Following (9), is unitary if and only if


If is a real wavelet which generates a dyadic orthonormal basis of [14] then satisfies (9). Numerical examples in the paper are computed with a complex wavelet calculated from a cubic-spline orthogonal Battle-Lemarié wavelet [14].

In any dimension , can be defined as a separable product in frequency polar coordinates , with in the unit sphere of :

The one-dimensional function is chosen to satisfy (12). The Littlewood-Paley condition (9) is then equivalent to

2.2 Path Ordered Scattering

Convolutions with wavelets define operators which are Lipschitz continuous to the action of diffeomorphisms, because wavelets are regular and localized functions. However, a wavelet transform is not invariant to translations, and $W[\lambda] f = f \star \psi_\lambda$ translates when $f$ is translated. The main difficulty is to compute translation invariant coefficients, which remain stable to the action of diffeomorphisms, and retain high frequency information provided by wavelets. A scattering operator computes such a translation invariant representation. We first explain how to build translation invariant coefficients from a wavelet transform, while maintaining stability to the action of diffeomorphisms. Scattering operators are then defined, and their main properties are summarized.

If $U[\lambda]$ is an operator defined on $L^2(\mathbb{R}^d)$, not necessarily linear but which commutes with translations, then $\int U[\lambda] f(x)\, dx$ is translation invariant, if finite. $W[\lambda] f = f \star \psi_\lambda$ commutes with translations but $\int W[\lambda] f(x)\, dx = 0$ because $\int \psi(x)\, dx = 0$. More generally, one can verify that any linear transformation of $W[\lambda] f$, which is translation invariant, is necessarily zero. To get a non-zero invariant, we set $U[\lambda] f = M[\lambda] W[\lambda] f$ where $M[\lambda]$ is a non-linear “demodulation” which maps $W[\lambda] f$ to a lower frequency function having a non-zero integral. The choice of $M[\lambda]$ must also preserve the Lipschitz continuity to diffeomorphism actions.

If then , and hence


The convolution is a low-frequency filtering because covers a frequency ball centered at , of radius proportional to . A non-zero invariant can thus be obtained by canceling the modulation term with . A simple example is:


where is the complex phase of . This non-linear phase registration guarantees that commutes with translations. It results from (13) that . It recovers the Fourier modulus representation, which is translation invariant but not Lipschitz continuous to diffeomorphisms as shown in (3). Indeed, the demodulation operator in (14) commutes with translations but does not commute with the action of diffeomorphisms, and in particular with dilations. The commutator norm of with a dilation is equal to , even for arbitrarily small dilations, which explains the resulting instabilities.

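Lipschitz continuity under the action of diffeomorphisms is preserved if $M[\lambda]$ commutes with the action of diffeomorphisms. For stability, we also impose that $M[\lambda]$ is nonexpansive. One can prove [4] that $M[\lambda]$ is then necessarily a pointwise operator, which means that $M[\lambda] h(x)$ only depends on the value of $h$ at $x$. We further impose that $\|M[\lambda] h\| = \|h\|$ for all $h \in L^2(\mathbb{R}^d)$, which then implies that $|M[\lambda] h| = |h|$. The most regular functions are obtained with $M[\lambda] h = |h|$, which eliminates all phase variations. We derive from (13) that this modulus maps $W[\lambda] f$ into a lower frequency envelope:

$$M[\lambda] W[\lambda] f = |W[\lambda] f| = |f \star \psi_\lambda|.$$

Lower frequencies created by a modulus result from interferences. For example, if $f(x) = \cos(\xi_1 \cdot x) + a \cos(\xi_2 \cdot x)$ where $\xi_1$ and $\xi_2$ are in the frequency band covered by $\hat\psi_\lambda$ then $|f \star \psi_\lambda(x)| = 2^{-1}\, |\hat\psi_\lambda(\xi_1) + a\, \hat\psi_\lambda(\xi_2)\, e^{i(\xi_2 - \xi_1) \cdot x}|$ oscillates at the interference frequency $|\xi_2 - \xi_1|$, which is smaller than $|\xi_1|$ and $|\xi_2|$.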
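
The interference example can be reproduced on a discrete signal. This is a minimal sketch, assuming NumPy and an ideal analytic band-pass filter (a Fourier-domain indicator on positive frequencies) standing in for the wavelet; the bin indices `k1`, `k2` and the amplitude `a` are arbitrary illustrative values.

```python
import numpy as np

n = 1024
t = np.arange(n)
k1, k2, a = 200, 230, 0.5   # two nearby frequency bins and an amplitude ratio
f = np.cos(2 * np.pi * k1 * t / n) + a * np.cos(2 * np.pi * k2 * t / n)

# Ideal analytic band-pass filter: keeps positive frequency bins 150..300 only.
mask = np.zeros(n)
mask[150:301] = 1.0
band = np.fft.ifft(np.fft.fft(f) * mask)   # complex band-pass output
envelope = np.abs(band)                    # the modulus removes the carrier phase

# Dominant non-zero frequency of the envelope.
spectrum = np.abs(np.fft.fft(envelope - envelope.mean()))
peak_bin = int(np.argmax(spectrum[: n // 2]))
print(peak_bin)
```

The envelope oscillates at the difference of the two frequencies, far below either carrier, which is why a second wavelet transform applied to the modulus can capture this interference.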

The integration is translation invariant but it removes all the high frequencies of . To recover these high frequencies, a scattering also computes the wavelet coefficients of each : . Translation invariant coefficients are again obtained with a modulus and an integration . If with , and in the support of then is proportional to . The second wavelet captures the interferences created by the modulus, between the frequency components of in the support of . We now introduce the scattering propagator, which extends these decompositions.

Definition 2.2

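An ordered sequence $p = (\lambda_1, \lambda_2, \ldots, \lambda_m)$ with $\lambda_k \in \Lambda_\infty = 2^{\mathbb{Z}} \times G^+$ is called a path. The empty path is denoted $p = \emptyset$. Let $U[\lambda] f = |f \star \psi_\lambda|$ for $f \in L^2(\mathbb{R}^d)$. A scattering propagator is a path ordered product of non-commutative operators defined by

$$U[p] = U[\lambda_m] \cdots U[\lambda_2]\, U[\lambda_1], \qquad (15)$$

with $U[\emptyset] = Id$.

The operator $U[p]$ is well defined on $L^2(\mathbb{R}^d)$ because $\|U[\lambda] f\| \le \|\psi_\lambda\|_1\, \|f\|$ for all $\lambda \in \Lambda_\infty$. The scattering propagator is a cascade of convolutions and modulus:

$$U[p] f = \big|\, \big|\, |f \star \psi_{\lambda_1}| \star \psi_{\lambda_2} \big| \cdots \big| \star \psi_{\lambda_m} \big|. \qquad (16)$$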

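Each $U[\lambda]$ filters the frequency component in the band covered by $\hat\psi_\lambda$, and maps it to lower frequencies with the modulus. The index sequence $p = (\lambda_1, \ldots, \lambda_m)$ is thus a frequency path variable. The scaling and rotation by $2^l g \in 2^{\mathbb{Z}} \times G$ of a path $p$ is written $2^l g\, p = (2^l g \lambda_1, 2^l g \lambda_2, \ldots, 2^l g \lambda_m)$. The concatenation of two paths is denoted $p + p' = (\lambda_1, \ldots, \lambda_m, \lambda'_1, \ldots, \lambda'_{m'})$, in particular $p + \lambda = (\lambda_1, \ldots, \lambda_m, \lambda)$. It results from (15) that

$$U[p + p'] = U[p']\, U[p]. \qquad (17)$$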

Section 2.1 explains that if is complex valued then its wavelet transform is whereas if is real then . If is complex then at the next iteration is real so next stage wavelet transforms are computed only for . The scattering propagator of a complex function is thus defined over “positive” paths and “negative” paths denoted . This is analogous to the positive and negative frequencies of a Fourier transform. If is real then so and hence . To simplify explanations, all results are proved on real functions with scattering propagators restricted to positive paths. These results apply to complex functions by including negative paths.
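
The path ordered cascade of convolutions and modulus can be sketched in a few lines. The code below is an illustration under assumptions, not the paper's software: it uses NumPy, one-dimensional periodic signals, and hypothetical Gaussian band-pass filters defined in the Fourier domain. Paths are tuples of illustrative scale indices (larger index means lower frequency here), which differ from the paper's indexing. It checks the concatenation rule for paths and the fact that the propagator commutes with translations.

```python
import numpy as np

def bandpass_bank(n, num_scales):
    """Hypothetical Gaussian band-pass filters (Fourier domain), keyed by scale index."""
    omega = 2 * np.pi * np.fft.fftfreq(n)
    bank = {}
    for j in range(num_scales):
        xi = 0.75 * np.pi * 2.0**-j          # center frequency of the j-th band
        bank[j] = np.exp(-(omega - xi) ** 2 / (2 * (xi / 3) ** 2))
    return bank

def propagate(f, path, bank):
    """Alternate band-pass filtering and modulus along the path (l1, ..., lm)."""
    u = np.asarray(f, dtype=complex)
    for lam in path:
        u = np.abs(np.fft.ifft(np.fft.fft(u) * bank[lam]))
    return u

rng = np.random.default_rng(0)
f = rng.standard_normal(256)
bank = bandpass_bank(256, num_scales=4)

p, p_prime = (0, 2), (3,)
direct = propagate(f, p + p_prime, bank)                     # propagator of p + p'
composed = propagate(propagate(f, p, bank), p_prime, bank)   # apply p, then p'

# Translating the input translates the output, so its sum is translation invariant.
reference = propagate(f, p, bank)
shifted = propagate(np.roll(f, 11), p, bank)
```

Because each stage is a convolution followed by a pointwise modulus, the whole cascade commutes with translations, and integrating its output yields a translation invariant coefficient.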

Definition 2.3

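Let $\mathcal{P}_\infty$ be the set of all finite paths. The scattering transform of $f \in L^1(\mathbb{R}^d)$ is defined for any $p \in \mathcal{P}_\infty$ by

$$\bar S f(p) = \frac{1}{\mu_p} \int U[p] f(x)\, dx \quad \text{with} \quad \mu_p = \int U[p]\delta(x)\, dx. \qquad (18)$$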

A scattering is a translation-invariant operator which transforms into a function of the frequency path variable . The normalization factor results from a path measure introduced in Section 3. Conditions are given so that does not vanish. This transform is then well-defined for any and any of finite length . Indeed so (16) implies that . We shall see that a scattering transform has similarities with Fourier transform modulus, where the path plays the role of a frequency variable. However, as opposed to a Fourier modulus, a scattering transform is stable to the action of diffeomorphisms, because it is computed by iterating on wavelet transforms and modulus operators, which are stable. For complex-valued functions, is also defined on negative paths, and if is real.

If then is non-linear but it preserves amplitude factors:


A scattering has similar scaling and rotation covariance properties as a Fourier transform. If is scaled and rotated, , then (11) implies that and cascading this result shows that


Inserting this result in the definition (18) proves that


Rotating thus rotates identically its scattering, whereas if is scaled by then the frequency paths is scaled by . The extension of the scattering transform in is done as a limit of windowed scattering transforms, that we now introduce.

Definition 2.4

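Let $J \in \mathbb{Z}$ and $\mathcal{P}_J$ be the set of finite paths $p = (\lambda_1, \ldots, \lambda_m)$ with $\lambda_k \in \Lambda_J$ and hence $|\lambda_k| = 2^{j_k} > 2^{-J}$. A windowed scattering transform is defined for all $p \in \mathcal{P}_J$ by

$$S_J[p] f(x) = U[p] f \star \phi_{2^J}(x) = \int U[p] f(u)\, \phi_{2^J}(x - u)\, du. \qquad (21)$$

The convolution with $\phi_{2^J}(x) = 2^{-dJ} \phi(2^{-J} x)$ localizes the scattering transform over spatial domains of size proportional to $2^J$:

$$S_J[p] f(x) = \big|\, \big|\, |f \star \psi_{\lambda_1}| \star \psi_{\lambda_2} \big| \cdots \big| \star \psi_{\lambda_m} \big| \star \phi_{2^J}(x).$$

It defines an infinite family of functions indexed by $\mathcal{P}_J$, denoted

$$S_J[\mathcal{P}_J] f := \{S_J[p] f\}_{p \in \mathcal{P}_J}.$$

For complex-valued functions, negative paths are also included in $\mathcal{P}_J$, and $S_J[-p] f = S_J[p] f$ if $f$ is real.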

Section 2.3 proves that for appropriate wavelets, . However, the signal energy is mostly concentrated on a much smaller set of frequency-decreasing paths for which . Indeed, the propagator progressively pushes the energy towards lower frequencies. The main theorem of Section 2.5 proves that a windowed scattering is Lipschitz continuous to the action of diffeomorphisms.

Since is continuous at , if then its windowed scattering transform converges pointwise to its scattering transform when the scale goes to :


However, when increases, the path set also increases. Section 3 shows that defines a multiresolution path approximation of a much larger set including paths of infinite length. This path set is not countable as opposed to each , and Section 3 introduces a measure and a metric on .

Section 3.2 extends the scattering transform to all and to all , and proves that . A sufficient condition is given to guarantee a strong convergence of to , and it is conjectured that it is valid on . Numerical examples illustrate this convergence and show that a scattering transform has strong similarities with a Fourier transforms modulus, when mapping the path to a frequency variable .

2.3 Scattering Propagation and Norm Preservation

We prove that a windowed scattering is nonexpansive, and preserves the norm. Families of operators indexed by a path set $\Omega$ are written $U[\Omega] = \{U[p]\}_{p \in \Omega}$ and $S_J[\Omega] = \{S_J[p]\}_{p \in \Omega}$.

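A windowed scattering can be computed by iterating on the one-step propagator defined by

$$U_J f = \{A_J f, (U[\lambda] f)_{\lambda \in \Lambda_J}\},$$

with $A_J f = f \star \phi_{2^J}$ and $U[\lambda] f = |f \star \psi_\lambda|$. After calculating $U_J f$, applying again $U_J$ to each $U[\lambda] f$ yields a larger infinite family of functions. The decomposition is further iterated by recursively applying $U_J$ to each $U[p] f$. Since $U[\lambda]\, U[p] = U[p + \lambda]$ and $A_J U[p] = S_J[p]$, it results that

$$U_J U[p] f = \{S_J[p] f, (U[p + \lambda] f)_{\lambda \in \Lambda_J}\}. \qquad (22)$$

Let $\Lambda_J^m$ be the set of paths of length $m$, with $\Lambda_J^0 = \{\emptyset\}$. It is propagated into

$$U_J U[\Lambda_J^m] f = \{S_J[\Lambda_J^m] f, U[\Lambda_J^{m+1}] f\}. \qquad (23)$$

Since $\mathcal{P}_J = \cup_{m \in \mathbb{N}} \Lambda_J^m$, one can compute $S_J[\mathcal{P}_J] f$ from $f = U[\emptyset] f$ by iteratively computing $U_J U[\Lambda_J^m] f$ for $m$ going from $0$ to $\infty$, as illustrated in Figure 1.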

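Figure 1: A scattering propagator $U_J$ applied to $f$ computes each $U[\lambda_1] f = |f \star \psi_{\lambda_1}|$ and outputs $S_J[\emptyset] f = f \star \phi_{2^J}$. Applying $U_J$ to each $U[\lambda_1] f$ computes all $U[\lambda_1, \lambda_2] f$ and outputs $S_J[\lambda_1] f = U[\lambda_1] f \star \phi_{2^J}$. Applying $U_J$ iteratively to each $U[p] f$ outputs $S_J[p] f = U[p] f \star \phi_{2^J}$ and computes the next path layer.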
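
The layer-by-layer iteration of Figure 1 can be sketched as follows. This is a simplified illustration under stated assumptions: NumPy, one-dimensional periodic signals, circular convolutions, and renormalized Gaussian filters whose squared moduli sum to one (a stand-in for wavelets satisfying the unitary condition, not the paper's wavelets). All function names are hypothetical. The sketch expands every path up to a maximum length and keeps the last, unexpanded layer so that the energy balance can be checked.

```python
import numpy as np

def tight_frame(n, num_scales):
    """Low-pass and band-pass filters (Fourier domain) with squared moduli summing to 1."""
    omega = 2 * np.pi * np.fft.fftfreq(n)
    bumps = [np.exp(-omega**2 / (2 * (np.pi * 2.0**-num_scales) ** 2))]
    for j in range(num_scales):
        xi = 0.75 * np.pi * 2.0**-j
        for sign in (1, -1):                       # positive and negative bands
            bumps.append(np.exp(-(omega - sign * xi) ** 2 / (2 * (xi / 3) ** 2)))
    bumps = np.array(bumps)
    bumps /= np.sqrt(np.sum(bumps**2, axis=0))     # enforce the unitary condition
    return bumps[0], bumps[1:]

def one_step(u, lowpass, bandpass):
    """One-step propagator: the averaged output and the moduli of band-pass outputs."""
    u_hat = np.fft.fft(u)
    averaged = np.fft.ifft(u_hat * lowpass)
    return averaged, list(np.abs(np.fft.ifft(u_hat * bandpass, axis=-1)))

def windowed_scattering(f, lowpass, bandpass, max_order):
    """Outputs for all paths of length <= max_order, plus the unexpanded last layer."""
    layer, outputs = [np.asarray(f, dtype=complex)], []
    for _ in range(max_order + 1):
        next_layer = []
        for u in layer:
            averaged, children = one_step(u, lowpass, bandpass)
            outputs.append(averaged)
            next_layer.extend(children)
        layer = next_layer
    return np.array(outputs), np.array(layer)

rng = np.random.default_rng(0)
n = 128
f, h = rng.standard_normal(n), rng.standard_normal(n)
lowpass, bandpass = tight_frame(n, num_scales=3)
S_f, residual_f = windowed_scattering(f, lowpass, bandpass, max_order=2)
S_h, _ = windowed_scattering(h, lowpass, bandpass, max_order=2)

# Energy of the outputs plus the energy left in the last layer, relative to ||f||^2.
energy_ratio = (np.sum(np.abs(S_f) ** 2) + np.sum(np.abs(residual_f) ** 2)) / np.sum(f**2)
# Squared distance between scattering outputs, relative to ||f - h||^2.
distance_ratio = np.sum(np.abs(S_f - S_h) ** 2) / np.sum((f - h) ** 2)
print(energy_ratio, distance_ratio)
```

With a unitary filter bank, the energy entering each layer splits exactly between its averaged outputs and the next layer, and the distance between the outputs of two signals never exceeds the distance between the signals. Unlike an efficient implementation, this sketch does not restrict the computation to frequency-decreasing paths.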

Scattering calculations follow the general architecture of convolution neural-networks introduced by LeCun [11]. Convolution networks cascade convolutions and a “pooling” non-linearity, which is here the modulus of a complex number. Convolution networks typically use kernels that are not predefined functions such as wavelets, but which are learned with backpropagation algorithms. Convolution network architectures have been successfully applied to a number of recognition tasks [11] and are studied as models for visual perception [2, 17]. Relations between scattering operators and path formulations of quantum field physics are also studied in [9].

The propagator $U_J$ is nonexpansive because the wavelet transform $W_J$ is unitary and a modulus is nonexpansive in the sense that $\big| |a| - |b| \big| \le |a - b|$ for any $(a, b) \in \mathbb{C}^2$. This is valid whether $f$ is real or complex. As a consequence