Optimal translational-rotational invariant dictionaries for images

09/04/2019 ∙ by Davide Barbieri, et al. ∙ Universidad Autónoma de Madrid

We provide the construction of a set of square matrices whose translates and rotates provide a Parseval frame that is optimal for approximating a given dataset of images. Our approach is based on abstract harmonic analysis techniques. Optimality is considered with respect to the quadratic error of approximation of the images in the dataset with their projection onto a linear subspace that is invariant under translations and rotations. In addition, we provide an elementary and fully self-contained proof of optimality, and the numerical results from datasets of natural images.




1 Introduction

The purpose of this work is to present the theoretical solution, and outline the numerical implementation, for an optimal approximation problem on digital images. The problem is the following: given a dataset of square images, we want to find the optimal generators that provide, by translations and 90 degrees rotations, the best approximation of the dataset with respect to a quadratic error.

We are not considering the full set of all possible translations, which would give rise to a convolutional problem, but rather we consider translations on a lattice. Together with rotations, they will define a nonabelian semidirect product group of discrete Euclidean rigid movements of images.

This work is an adaptation of a result obtained for more general groups in [BCHM2019], which nevertheless cannot be applied directly to this setting. Our approach to invariant approximation borrows several ideas from the theory of approximation by shift-invariant spaces developed in [ACHM2007], see also [AT2011, CMP2017]. Noncommutative problems of harmonic analysis related to group actions have a long tradition in signal processing, and recent works with relevant interactions with the present one are [BHP2015, BHP2018, GHP2018, GHI2018].

The presence of invariances in natural images has long been studied and exploited in vision (see [CS2006, B2015, PA2016] and references therein), and it plays a central role in several approaches to machine learning [Bekkers2018, AERP2019]. In particular, the solution presented in this work makes use of a special form of data augmentation, a classical technique now in common use for network training (see e.g. [DM2001, google2019] and references therein). Our approach differs in a fundamental way from patch-based ones such as [OF1996, AEB2006]: we do not extract patches from images and then translate them, but rather consider entire images, and find the optimal generators for a fixed set of translations and rotations with methods of Fourier analysis.

The structure of this paper is the following. In Section 2 we describe the group invariance, focusing on invariant subspaces, and provide a formal statement of the approximation problem. In Section 3 we introduce an isometric isomorphism that allows us to treat the group symmetries with Fourier analysis, and study how invariant subspaces behave under such a map. In Section 4 we provide a formal statement of the proposed solution, and outline the algorithm that allows us to compute it. Finally, in Section 5 we show the numerical results on a well-known dataset of natural images.

Most of the theoretical results presented in this paper could be deduced, without major difficulties, from the ones obtained in [BCHM2019]. The only obstruction to applying them directly to the present setting is that the hypotheses of Proposition 4.1 of [BCHM2019] are not met here, due to the presence of certain nontrivial stabilizers for the group action. In Section 3 we overcome this issue by defining an isometry that is slightly different from the one introduced in Section 4.2 of [BCHM2019]. The present setting represents a great simplification of the general case, mainly due to the finiteness of the problem. This gives us the possibility to present a fully self-contained approach to the solution. Indeed, although the arguments used to solve this problem refer to much more general principles, and could be proved with more abstract techniques, in the present case it is possible to provide full proofs of all the results needed to construct the desired approximation with only elementary techniques. We have chosen to do so with the intention of making this work accessible to the non-specialist reader, who may be interested in applying this technique.

2 The invariant approximation problem

2.1 Group Invariance

We will consider grayscale digital images of $N \times N$ pixels, and we will treat here the case of an odd number $N$. It is convenient, for the purposes of this work, to consider a digital image as a function on the square lattice in centered coordinates

i.e. . This space is , indexed by , and endowed with the Euclidean norm, that we denote by , associated to the inner product


In particular, is the Frobenius norm of viewed as a matrix.

Note that, for simplicity, we allow ourselves the slight abuse of keeping the same notation commonly used for . For any we will also keep the additive notation, and denote always by the periodic sum


With this operation, is an abelian group, and 90 degrees rotations, defined by the linear action on of


are automorphisms. This can be easily checked because and, using (2.2), . For we denote by the -th power of the matrix , so corresponds to a 180 degrees rotation, etc. Note that is the identity: is a cyclic group of order 4.
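The automorphism property can be checked numerically on a small lattice. The sketch below is only an illustration under stated assumptions: the side length `N = 5`, the helper names `wrap` and `periodic_sum`, and the counterclockwise orientation chosen for the rotation matrix `R` are not taken from the text.

```python
import itertools
import numpy as np

N = 5                            # odd image side (illustrative value)
half = (N - 1) // 2
R = np.array([[0, -1], [1, 0]])  # assumed orientation of the 90 degrees rotation

def wrap(v):
    """Reduce integer coordinates mod N into the centered range [-half, half]."""
    return (np.asarray(v) + half) % N - half

def periodic_sum(x, y):
    """Sum of two lattice points, wrapped back into centered coordinates."""
    return wrap(np.asarray(x) + np.asarray(y))

lattice = [np.array(p) for p in itertools.product(range(-half, half + 1), repeat=2)]

# R maps the centered lattice onto itself ...
maps_lattice_to_itself = {tuple(R @ x) for x in lattice} == {tuple(x) for x in lattice}
# ... it is compatible with the periodic sum ...
is_automorphism = all(
    np.array_equal(R @ periodic_sum(x, y), periodic_sum(R @ x, R @ y))
    for x in lattice for y in lattice
)
# ... and its fourth power is the identity.
order_four = np.array_equal(np.linalg.matrix_power(R, 4), np.eye(2, dtype=int))
```

Centered coordinates are what make the first check hold without any modular reduction: the index set is symmetric under sign changes, so a rotated point never leaves it.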

Moreover, whenever is not a prime number, admits nontrivial proper subgroups for which 90 degrees rotations are also automorphisms of these subgroups. We give a precise statement in the next lemma.

Lemma 2.1

Let be odd, and let .

  • is a subgroup, isomorphic to .

  • is invariant for , i.e. , and for we have .
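As a concrete illustration of a rotation-invariant subgroup, one natural candidate is the sublattice of multiples of a divisor `M` of `N`. This choice, the values `N = 15` and `M = 3`, and the helper `wrap` are assumptions made for the example, and may differ from the exact subgroup of the lemma.

```python
import itertools
import numpy as np

N, M = 15, 3                     # odd N with a divisor M (illustrative values)
half = (N - 1) // 2
R = np.array([[0, -1], [1, 0]])  # assumed 90 degrees rotation matrix

def wrap(v):
    """Reduce coordinates mod N into the centered range and return a tuple."""
    return tuple(int(t) for t in (np.asarray(v) + half) % N - half)

# Candidate subgroup: all multiples of M, in centered coordinates.
gamma = {wrap((M * a, M * b)) for a, b in itertools.product(range(N // M), repeat=2)}

closed_under_sum = all(wrap(np.add(x, y)) in gamma for x in gamma for y in gamma)
closed_under_rotation = {wrap(R @ np.array(x)) for x in gamma} == gamma
size = len(gamma)                # (N // M) ** 2 elements
```

In this example the subgroup has `(N // M) ** 2` elements, the same number as a square lattice of side `N // M`.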

On images, and on every , the symmetries of translations and rotations are formally described as follows. Let be a subgroup. For , the translation of by , denoted by , is

where the operation is intended as in (2.2). The 90 degrees rotation of , denoted by , is

where is the 90 degrees rotation of given by (2.3). For we denote by the -th power (iteration) of the operator . The set of operators define a unitary representation of the nonabelian group of discrete Euclidean rigid movements, i.e. -translations and 90 degrees rotations on . The composition law of can be written as

for and . Indeed,


and, for all , and all we have . Note that we are always using the composition (2.2) for variables, while periodic composition is used for the rotation variables, i.e. is considered mod 4.
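The interplay between translations and rotations can be verified numerically. The sketch below assumes the conventions "translation by g sends f(x) to f(x - g)" and "rotation sends f(x) to f(R^{-1} x)", with the rotation matrix `R` chosen counterclockwise. These conventions and the function names are illustrative assumptions. Under them, rotating after translating by `g` equals translating by `R g` after rotating, which is what produces a semidirect product composition law.

```python
import numpy as np

N = 7                                    # odd image side (illustrative)
idx = np.arange(N)
I, J = np.meshgrid(idx, idx, indexing="ij")
R = np.array([[0, -1], [1, 0]])          # assumed rotation matrix

def translate(f, g):
    """(T(g) f)(x) = f(x - g) with periodic boundary conditions."""
    return np.roll(f, shift=(int(g[0]), int(g[1])), axis=(0, 1))

def rotate(f, j=1):
    """Apply j times the map (R f)(x) = f(R^{-1} x) in centered coordinates."""
    for _ in range(j % 4):
        f = f[J, N - 1 - I]
    return f

rng = np.random.default_rng(0)
f = rng.standard_normal((N, N))
g1, g2 = np.array([2, -3]), np.array([1, 4])
j1, j2 = 3, 2

# Commutation relation: R T(g) = T(R g) R.
commutes = np.allclose(rotate(translate(f, g1)), translate(rotate(f), R @ g1))

# Product of two rigid movements, written as a single translation and rotation.
lhs = translate(rotate(translate(rotate(f, j2), g2), j1), g1)
rhs = translate(rotate(f, j1 + j2), g1 + np.linalg.matrix_power(R, j1) @ g2)
composition_holds = np.allclose(lhs, rhs)
```

The rotation index is reduced mod 4 inside `rotate`, while `np.roll` takes care of the periodic sum in the translation variable.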

By general arguments, see [BHP2018, Lemma 11], a subspace is invariant under the action of the group , i.e. it is such that for all , if and only if it is linearly generated by the -orbit of a set of vectors of . This is the object of the next definition.

Definition 2.2

For a set of generators, we denote by

the -invariant linear subspace of generated by the action of on .
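An invariant subspace of this kind can be built explicitly on a small example by collecting the orbit of the generators and taking its linear span. The sketch below is a brute-force illustration, not the construction used later in the paper. The sizes, the choice of the translation subgroup as the multiples of `M`, the conventions for `translate` and `rotate`, and the name `invariant_projector` are all assumptions.

```python
import numpy as np

N, M = 9, 3                              # illustrative sizes; M divides N
idx = np.arange(N)
I, J = np.meshgrid(idx, idx, indexing="ij")

def translate(f, g):
    """Periodic translation of an image by the lattice vector g."""
    return np.roll(f, shift=(int(g[0]), int(g[1])), axis=(0, 1))

def rotate(f, j=1):
    """Rotate an image j times by 90 degrees about its central pixel."""
    for _ in range(j % 4):
        f = f[J, N - 1 - I]
    return f

# Assumed translation subgroup: multiples of M in each coordinate.
gamma = [(M * a, M * b) for a in range(N // M) for b in range(N // M)]

def orbit(generators):
    """All translates and rotates of the generators, one flattened image per row."""
    return np.array([translate(rotate(phi, j), g).ravel()
                     for phi in generators for j in range(4) for g in gamma])

def invariant_projector(generators, tol=1e-10):
    """Orthogonal projector onto the linear span of the orbit, computed via SVD."""
    _, s, vt = np.linalg.svd(orbit(generators), full_matrices=False)
    basis = vt[s > tol * s.max()]
    return basis.T @ basis

rng = np.random.default_rng(0)
generators = [rng.standard_normal((N, N)) for _ in range(2)]
P = invariant_projector(generators)
dim = int(round(np.trace(P)))            # dimension of the invariant subspace

# The projection of an arbitrary image stays in the subspace after a rigid movement.
f = rng.standard_normal((N, N))
Pf = (P @ f.ravel()).reshape(N, N)
moved = translate(rotate(Pf), (M, 0))
stays_in_subspace = np.allclose(P @ moved.ravel(), moved.ravel())
```

Since the orbit of each generator has at most four times as many elements as the translation subgroup, `dim` can never exceed `4 * len(generators) * len(gamma)`.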

2.2 Best Approximation

The best approximation problem solved in this paper is the following. Suppose we are given a dataset of digital images. For , we want to find a family such that is a Parseval frame of , i.e. such that the orthogonal projection of onto can be written as

Moreover, we want this projection to minimize the quadratic error resulting from the projection onto a -invariant space with generators, i.e.


where is the orthogonal projection of onto .
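The role of the Parseval frame condition can be seen in a generic finite-dimensional example: when a family is a Parseval frame of a subspace, the orthogonal projection is obtained by summing inner products against the same family, even if its vectors are not orthogonal. The sketch below uses random data and an arbitrary subspace. It does not use the group structure, and all names and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, m = 12, 3, 5       # ambient dimension, subspace dimension, frame size

U, _ = np.linalg.qr(rng.standard_normal((d, r)))   # orthonormal basis of a subspace V
Q, _ = np.linalg.qr(rng.standard_normal((m, r)))   # m x r matrix with orthonormal columns
frame = U @ Q.T                                    # columns: m redundant vectors spanning V

P = U @ U.T                                        # orthogonal projector onto V
data = rng.standard_normal((d, 20))                # toy dataset, one vector per column

coeffs = frame.T @ data                            # inner products with the frame vectors
reconstruction = frame @ coeffs                    # sum of coefficient times frame vector
quadratic_error = np.sum((data - reconstruction) ** 2)

is_projection = np.allclose(reconstruction, P @ data)
gram = frame.T @ frame                             # not the identity: the frame is redundant
```

The quantity `quadratic_error` is the kind of total squared error that the approximation problem asks to minimize over invariant subspaces with a fixed number of generators.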

The solution to this problem will be provided in Section 4 together with the construction that allows us to compute an optimal set of Parseval frame generators, given in Table 1.

Observe that the dimension of a -invariant space with generators is at most If and is as in Lemma 2.1, then . Thus, we will always consider families of generators with cardinality Indeed, this implies that the dimension of is certainly smaller than , that is the dimension of , and the ratio approximately quantifies the dimensionality reduction that one obtains when replacing with .

3 Fourier analysis of invariance

Fourier duality on is provided by the DFT:

We will write, for short,

where . The DFT is a multiple of an invertible isometry: it satisfies

and its inverse reads

It is easy to see by direct computation that the DFT intertwines translations with a phase factor, i.e.

while it commutes with rotations, i.e.
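These properties can be checked numerically. The sketch below assumes one common convention, namely the unnormalized DFT with kernel $e^{-2\pi i\, k \cdot x / N}$ on centered coordinates, for which the squared norm is multiplied by $N^2$ and a translation by $g$ becomes multiplication by $e^{-2\pi i\, k \cdot g / N}$. The sign and normalization, as well as the function names, are assumptions and may differ from the ones used in the paper.

```python
import numpy as np

N = 7                                        # odd image side (illustrative)
half = (N - 1) // 2
c = np.arange(-half, half + 1)               # centered coordinates
W = np.exp(-2j * np.pi * np.outer(c, c) / N) # 1D DFT matrix in centered coordinates
idx = np.arange(N)
I, J = np.meshgrid(idx, idx, indexing="ij")

def dft2(f):
    """Unnormalized 2D DFT: sum over x of f(x) exp(-2 pi i k.x / N)."""
    return W @ f @ W.T

def idft2(F):
    """Inverse of dft2, with the 1 / N^2 factor."""
    return (W.conj() @ F @ W.conj().T) / N**2

def translate(f, g):
    """(T(g) f)(x) = f(x - g) with periodic boundary conditions."""
    return np.roll(f, shift=(int(g[0]), int(g[1])), axis=(0, 1))

def rotate(f):
    """(R f)(x) = f(R^{-1} x) for an assumed counterclockwise rotation R."""
    return f[J, N - 1 - I]

rng = np.random.default_rng(0)
f = rng.standard_normal((N, N))
g = (2, -1)
K1, K2 = np.meshgrid(c, c, indexing="ij")
phase = np.exp(-2j * np.pi * (K1 * g[0] + K2 * g[1]) / N)

plancherel = np.isclose(np.linalg.norm(dft2(f)) ** 2, N**2 * np.linalg.norm(f) ** 2)
inversion = np.allclose(idft2(dft2(f)), f)
translation_to_phase = np.allclose(dft2(translate(f, g)), phase * dft2(f))
commutes_with_rotation = np.allclose(dft2(rotate(f)), rotate(dft2(f)))
```

The DFT is written here as an explicit matrix product rather than with an FFT routine, so that the centered indexing of frequencies is unambiguous.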


3.1 A group adapted isometry

For a subgroup of , its annihilator is defined by

is a subgroup of that plays a special role in Fourier analysis. The next lemma, whose proof is elementary, defines its structure and how it relates with rotations.

Lemma 3.1

Let be odd, let , and let be as in Lemma 2.1.

  • The annihilator of is .

  • The set is a fundamental set for , in the sense that

    • for all ;

    • .

  • is invariant for , i.e. , and for we have .

  • Let . The set satisfies

    • for all ;

    • .
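The annihilator can be computed by brute force on a small example. The sketch below uses the standard definition, the set of frequencies `k` whose character $e^{2\pi i\, k \cdot \gamma / N}$ equals 1 on every element of the subgroup, and assumes that the subgroup consists of the multiples of a divisor `M` of `N`. The values `N = 15`, `M = 3` and this choice of subgroup are illustrative assumptions.

```python
import itertools

N, M = 15, 3                     # odd N with a divisor M (illustrative values)
half = (N - 1) // 2
lattice = list(itertools.product(range(-half, half + 1), repeat=2))

# Assumed subgroup: multiples of M, in centered coordinates.
gamma = [x for x in lattice if x[0] % M == 0 and x[1] % M == 0]

def in_annihilator(k):
    """True when exp(2 pi i k.g / N) = 1 for every g in the subgroup."""
    return all((k[0] * g[0] + k[1] * g[1]) % N == 0 for g in gamma)

annihilator = [k for k in lattice if in_annihilator(k)]

# In this example the annihilator is the set of multiples of N // M.
L = N // M
multiples_of_L = [k for k in lattice if k[0] % L == 0 and k[1] % L == 0]

# A rotation by 90 degrees, k -> (-k2, k1), maps the annihilator onto itself.
rotated = {(-k[1], k[0]) for k in annihilator}
rotation_invariant = rotated == set(annihilator)
```

In this example the sizes of the subgroup and of its annihilator multiply to `N ** 2`, the total number of lattice points.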

The following definition introduces a transform, denoted by , that is well adapted to perform Fourier analysis on in the presence of the action of the group . This transform is a variation of the one introduced in Section 4.2 of [BCHM2019].

Definition 3.2

Let , with odd, and let and be as in Lemma 3.1. Let be the linear map on defined by

or, equivalently, . For any we define


which belongs to . For each we denote the corresponding element of by

where is the component of :

We provide now a proof of the following result, which also clarifies the role of the map .

Theorem 3.3

The map (3.2) is an invertible isometry . For , its inverse is


Proof. Let us first prove that is an isometry. Using (3.1) and Lemma 3.1

To prove the inversion formula, again using Lemma 3.1

where the last identity is due to (3.1).

The next lemma shows how the isometry intertwines the action of .

Lemma 3.4

Let be as in Lemma 2.1, and let and be as in Lemma 3.1. For all , all , all and all it holds


Proof. Observe first that, since is only a normalization of the Fourier coefficients on the subgroup , it commutes with translations: . Thus, for all

where the second to last identity makes use of the fact that is the annihilator of and of the invariance of under rotations. This last fact also implies that commutes with rotations: . Thus

3.2 Group invariant spaces

Using the previous results, we can deduce how -invariant spaces are transformed under the action of the map . The next theorem shows that, for any fixed , the element obtained by the transform of a linear combination of translates and rotates of a family is a linear combination of the transform of the rotates of .

Theorem 3.5

Let be as in Lemma 2.1, and let and be as in Lemma 3.1. Let , let be as in Definition 2.2, and let be the map defined in (3.2). For and , let us denote by

and let us denote by


Then these two subspaces of coincide:


Proof. Observe first that (3.5) implies in particular that for any , any , and any , the elements and belong to the same subspace of . This, however, can already be deduced from Lemma 3.4, because , and is invariant under rotations.

Let us now prove that for every and every , noting that, by the previous argument, it suffices to prove this only for .

Let . Then, using Lemma 3.4


where we have set


This proves that for every and every .

To prove the opposite inclusion, fix and let . We want to prove that for each there exists such that . Again, since is rotation invariant, by Lemma 3.4 it suffices to consider only . Now, observe that (3.7) is a DFT for . Indeed since for , equation (3.7) can be written as . Hence, it is inverted by

Thus, for we get . Now, by (3.2), the element of given by

satisfies . This concludes the proof.

Remark 3.6

The function is a nonabelian variant of an object known in the literature on shift-invariant spaces as range function (see e.g. [Bownik2000, CP2010] and references therein), and, up to a minor change, it corresponds to the map introduced in Definition 4.5 of [BCHM2019]. We note here also that, for the same used in the previous proof, we can easily compute the components of . Using Lemma 3.4 and (3.2)