The purpose of this work is to present the theoretical solution, and outline the numerical implementation, for an optimal approximation problem on digital images. The problem is the following: given a dataset of square images, we want to find the optimal generators that provide, by translations and 90 degrees rotations, the best approximation of the dataset with respect to a quadratic error.
We are not considering the full set of all possible translations, which would give rise to a convolutional problem, but rather we consider translations on a lattice. Together with rotations, they will define a nonabelian semidirect product group of discrete Euclidean rigid movements of images.
This work is an adaptation of a result obtained for more general groups in [BCHM2019], which nevertheless cannot be directly applied to this setting. Our approach to invariant approximation borrows several ideas from the theory of approximation by shift-invariant spaces developed in [ACHM2007], see also [AT2011, CMP2017]. Noncommutative problems of harmonic analysis related to group actions have a long tradition in signal processing, and recent works with relevant interactions with the present one are [BHP2015, BHP2018, GHP2018, GHI2018].
The presence of invariances in natural images has long been studied and exploited in vision (see [CS2006, B2015, PA2016] and references therein), and it plays a central role in several approaches to machine learning [Bekkers2018, AERP2019]. In particular, the solution presented in this work makes use of a special form of data augmentation, a classical technique now in common use for network training (see e.g. [DM2001, google2019] and references therein). Our approach differs in a fundamental way from patch-based ones such as [OF1996, AEB2006]: we do not extract patches from images, to be then used with translations, but rather consider entire images, and find the optimal generators for a fixed set of translations and rotations with methods of Fourier analysis.
The structure of this paper is the following. In Section 2 we describe the group invariance, focusing on invariant subspaces, and provide a formal statement of the approximation problem. In Section 3 we introduce an isometric isomorphism that allows us to treat the group symmetries with Fourier analysis, and study how invariant subspaces behave under such a map. In Section 4 we provide a formal statement of the proposed solution, and outline the algorithm that allows us to compute it. Finally, in Section 5 we show the numerical results on a well-known dataset of natural images.
Most of the theoretical results presented in this paper could be deduced, without major difficulties, from the ones obtained in [BCHM2019]. The only obstruction to applying them directly to the present setting is that the hypotheses of Proposition 4.1 of [BCHM2019] are not met here, due to the presence of certain nontrivial stabilizers for the group action. In Section 3 we overcome this issue by defining an isometry that is slightly different from the one introduced in Section 4.2 of [BCHM2019]. The present setting represents a great simplification of the general case, mainly due to the finiteness of the problem. This gives us the possibility to present a fully self-contained approach to the solution. Indeed, although the arguments used to solve this problem rest on much more general principles, and the results could be proved with more abstract techniques, in the present case it is possible to provide full proofs of all the results needed to construct the desired approximation using only elementary techniques. We have chosen to do so with the intention of making this work accessible to the nonspecialist reader who may be interested in applying this technique.
2 The invariant approximation problem
2.1 Group Invariance
We will consider grayscale digital images of $N \times N$ pixels, and we will treat here the case of an odd number $N$. It is convenient, for the purposes of this work, to consider a digital image as a function on the square lattice in centered coordinates
i.e. . This space is , indexed by , and endowed with the Euclidean norm, that we denote by , associated to the inner product
In particular, is the Frobenius norm of viewed as a matrix.
Note that, for simplicity, we allow ourselves the slight abuse of keeping the same notation commonly used for . For any we will also keep the additive notation, and denote always by the periodic sum
With this operation, is an abelian group, and 90 degrees rotations, defined by the linear action on of
are automorphisms. This can be easily checked because and, using (2.2), . For we denote by the -th power of the matrix , so corresponds to a 180 degrees rotation, etc. Note that is the identity: is a cyclic group of order 4.
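The automorphism property and the cyclic structure of order 4 can be checked numerically. The following Python sketch is illustrative only: the side length `N = 9`, the helper names `center`, `add` and `rot`, and the explicit form of the rotation matrix `r` are assumptions made here, chosen to match the description of centered coordinates and periodic sums given above.

```python
import numpy as np

N = 9                              # hypothetical odd side length
half = (N - 1) // 2
r = np.array([[0, -1], [1, 0]])    # assumed matrix of the 90-degree rotation

def center(v):
    """Reduce integer coordinates mod N to representatives in [-half, half]."""
    return (np.asarray(v) + half) % N - half

def add(x, y):
    """Periodic sum on the centered lattice."""
    return center(np.asarray(x) + np.asarray(y))

def rot(x, j=1):
    """Apply the j-th power of r to a lattice point (j is taken mod 4)."""
    return center(np.linalg.matrix_power(r, j % 4) @ np.asarray(x))

points = [np.array([a, b]) for a in range(-half, half + 1)
                           for b in range(-half, half + 1)]

# r maps the centered lattice onto itself ...
is_bijective = ({tuple(int(c) for c in rot(x)) for x in points}
                == {tuple(int(c) for c in x) for x in points})

# ... and it respects the periodic sum, so it is a group automorphism.
is_additive = all(np.array_equal(rot(add(x, y)), add(rot(x), rot(y)))
                  for x in points for y in points)

# r has order 4: r^2 = -I is the 180-degree rotation and r^4 = I.
r2_is_minus_identity = np.array_equal(np.linalg.matrix_power(r, 2),
                                      -np.eye(2, dtype=int))
r4_is_identity = np.array_equal(np.linalg.matrix_power(r, 4),
                                np.eye(2, dtype=int))
```

With an odd side length the centered lattice is symmetric around the origin, which is why the rotation never needs a reduction mod `N` to stay inside it.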
Moreover, whenever is not a prime number, admits nontrivial proper subgroups for which 90 degrees rotations are also automorphisms of these subgroups. We give a precise statement in the next lemma.
Let be odd, and let .
is a subgroup, isomorphic to .
is invariant for , i.e. , and for we have .
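As a hedged illustration of the lemma, the sketch below checks its claims by brute force on one example. The specific form of the subgroup used here (the multiples of a divisor `M` of `N` inside the centered lattice) is an assumption of this sketch, as are the values `N = 15`, `M = 3` and all helper names.

```python
import numpy as np

N, M = 15, 3                       # hypothetical: N odd, M a divisor of N
n = N // M
half, half_n = (N - 1) // 2, (n - 1) // 2

def center(v):
    """Centered representative mod N, returned as a tuple of Python ints."""
    return tuple(int(c) for c in (np.asarray(v) + half) % N - half)

# Assumed form of the subgroup: multiples of M in the centered N x N lattice.
gamma = {(M * a, M * b) for a in range(-half_n, half_n + 1)
                        for b in range(-half_n, half_n + 1)}

# Subgroup axioms with respect to the periodic sum.
contains_zero = (0, 0) in gamma
closed_under_sum = all(center(np.add(g, h)) in gamma
                       for g in gamma for h in gamma)
closed_under_negation = all(center(np.negative(g)) in gamma for g in gamma)

# Invariance under the 90-degree rotation (a, b) -> (-b, a).
rotation_invariant = {(-b, a) for (a, b) in gamma} == gamma
```

In this example the subgroup has `n * n = 25` elements, and the map sending a point `(a, b)` of the centered `n x n` lattice to `(M * a, M * b)` is a bijection onto it.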
On images, and on every , the symmetries of translations and rotations are formally described as follows. Let be a subgroup. For , the translation of by , denoted by , is
where the operation is intended as in (2.2). The 90 degrees rotation of , denoted by , is
where is the 90 degrees rotation of given by (2.3). For we denote by the -th power (iteration) of the operator . The set of operators define a unitary representation of the nonabelian group of discrete Euclidean rigid movements, i.e. -translations and 90 degrees rotations on . The composition law of can be written as
for and . Indeed,
and, for all , and all we have . Note that we are always using the composition (2.2) for variables, while periodic composition is used for the rotation variables, i.e. is considered mod 4.
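The composition law can be verified numerically on a random image. The sketch below is only an illustration under explicit assumptions: it takes the translation to be `T(k)f(x) = f(x - k)` with periodic sums and the rotation to be `Rf(x) = f(r^{-1} x)`, and the names `sample`, `T`, `R` and the side length `N = 7` are hypothetical. With these conventions the law reads `T(k) R^j T(k') R^j' = T(k + r^j k') R^(j + j')`.

```python
import numpy as np

N = 7                               # hypothetical odd side length
half = (N - 1) // 2
r = np.array([[0, -1], [1, 0]])     # assumed 90-degree rotation matrix
grid = np.arange(-half, half + 1)
# coords[i, j] = (i - half, j - half): centered coordinates of pixel (i, j)
coords = np.stack(np.meshgrid(grid, grid, indexing="ij"), axis=-1)

def sample(f, pts):
    """Evaluate the image f at integer points, periodically mod N."""
    idx = (pts + half) % N
    return f[idx[..., 0], idx[..., 1]]

def T(k, f):
    """Assumed translation: (T(k) f)(x) = f(x - k)."""
    return sample(f, coords - np.asarray(k))

def R(f, j=1):
    """Assumed rotation: (R^j f)(x) = f(r^{-j} x); r.T is the inverse of r."""
    r_inv_j = np.linalg.matrix_power(r.T, j % 4)
    return sample(f, coords @ r_inv_j.T)

rng = np.random.default_rng(0)
f = rng.standard_normal((N, N))
k, k2, j, j2 = np.array([2, -1]), np.array([3, 1]), 1, 3

# Left side: apply R^j2, then T(k2), then R^j, then T(k).
lhs = T(k, R(T(k2, R(f, j2)), j))
# Right side: a single translation by k + r^j k2 and a rotation by j + j2.
rhs = T(k + np.linalg.matrix_power(r, j) @ k2, R(f, j + j2))
law_holds = np.array_equal(lhs, rhs)

# The group is nonabelian: translations and rotations do not commute.
commute = np.array_equal(T(k, R(f)), R(T(k, f)))
```

Both operators only permute pixels, so the comparison is exact rather than approximate. With these conventions `R` agrees with `np.rot90` and `T(k, .)` with `np.roll(., k, axis=(0, 1))`.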
By general arguments, see [BHP2018, Lemma 11], a subspace is invariant under the action of the group , i.e. it is such that for all , if and only if it is linearly generated by the -orbit of a set of vectors of . This is the object of the next definition.
For a set of generators, we denote by
the -invariant linear subspace of generated by the action of on .
2.2 Best Approximation
The best approximation problem solved in this paper is the following. Suppose we are given a dataset of digital images. For , we want to find a family such that is a Parseval frame of , i.e. such that the orthogonal projection of onto can be written as
Moreover, we want this projection to minimize the quadratic error resulting from the projection onto a -invariant space with generators, i.e.
where is the orthogonal projection of onto .
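To make the quantity being minimized concrete, the following sketch evaluates the quadratic error for an arbitrary (not optimal) choice of generators. It is a generic illustration and not the construction of Section 4: the orthogonal projection is computed from an orthonormal basis obtained by SVD rather than from a Parseval frame, the translation lattice is assumed to consist of the multiples of `M`, translations and rotations are realized with `np.roll` and `np.rot90`, and generators and data are random placeholders.

```python
import numpy as np

N, M, kappa = 9, 3, 2               # hypothetical sizes: N = n * M, kappa generators
rng = np.random.default_rng(0)
# Representatives mod N of the assumed translation lattice (multiples of M).
shifts = [(M * a, M * b) for a in range(N // M) for b in range(N // M)]

def orbit(generators):
    """Rows: all translates (by the lattice) of all four rotations of each generator."""
    return np.array([np.roll(np.rot90(phi, j), g, axis=(0, 1)).ravel()
                     for phi in generators for g in shifts for j in range(4)])

def orthonormal_basis(generators, tol=1e-10):
    """Orthonormal basis (as rows) of the span of the orbit, via SVD."""
    _, s, vt = np.linalg.svd(orbit(generators), full_matrices=False)
    return vt[s > tol * s[0]]

def project(basis, images):
    """Orthogonal projection of each image onto the span of the basis rows."""
    X = np.array([f.ravel() for f in images])
    return (X @ basis.T) @ basis

def quadratic_error(basis, images):
    """Sum over the dataset of the squared norm of the projection residual."""
    X = np.array([f.ravel() for f in images])
    return float(np.sum((X - project(basis, images)) ** 2))

generators = [rng.standard_normal((N, N)) for _ in range(kappa)]
data = [rng.standard_normal((N, N)) for _ in range(5)]

basis = orthonormal_basis(generators)
error = quadratic_error(basis, data)

# Since the subspace is invariant, projecting commutes with lattice translations.
g = (M, 2 * M)
shifted_then_projected = project(basis, [np.roll(data[0], g, axis=(0, 1))])[0]
projected_then_shifted = np.roll(project(basis, [data[0]])[0].reshape(N, N),
                                 g, axis=(0, 1)).ravel()
commutes_with_shift = np.allclose(shifted_then_projected, projected_then_shifted)
```

The approximation problem asks for the generators that make `error` as small as possible among all families of the same cardinality; the random generators used here only serve to show how the error is evaluated.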
The solution to this problem will be provided in Section 4 together with the construction that allows us to compute an optimal set of Parseval frame generators, given in Table 1.
Observe that the dimension of a -invariant space with generators is at most If and is as in Lemma 2.1, then . Thus, we will always consider families of generators with cardinality Indeed, this implies that the dimension of is certainly smaller than , that is the dimension of , and the ratio approximately quantifies the dimensionality reduction that one obtains when replacing with .
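The dimension count can be illustrated numerically. The bound used below relies only on the fact that the group contains four rotations for each translation, so that the orbit of `kappa` generators has at most `4 * |Gamma| * kappa` elements. The lattice of multiples of `M`, the sizes, and the random generators are assumptions of this sketch.

```python
import numpy as np

N, M, kappa = 9, 3, 2               # hypothetical sizes: N = n * M
n = N // M
rng = np.random.default_rng(1)
# Representatives mod N of the assumed translation lattice (multiples of M).
shifts = [(M * a, M * b) for a in range(n) for b in range(n)]
generators = [rng.standard_normal((N, N)) for _ in range(kappa)]

# One row per group element and generator.
orbit = np.array([np.roll(np.rot90(phi, j), g, axis=(0, 1)).ravel()
                  for phi in generators for g in shifts for j in range(4)])

group_order = 4 * len(shifts)       # rotations times translations
bound = group_order * kappa         # upper bound on the dimension of the span
rank = int(np.linalg.matrix_rank(orbit))   # actual dimension of the span
ratio = bound / N ** 2              # upper bound on the dimensionality ratio
```

The rank can be strictly smaller than the bound, so `ratio` should be read as a worst-case estimate of the dimensionality reduction.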
3 Fourier analysis of invariance
Fourier duality on is provided by the DFT:
We will write, for short,
where . The DFT is a multiple of an invertible isometry: it satisfies
and its inverse reads
It is easy to see by direct computation that the DFT intertwines translations with a phase factor, i.e.
while it commutes with rotations, i.e.
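These properties of the DFT can be checked numerically. The sketch below is illustrative and rests on conventions assumed here: an unnormalized transform with kernel `exp(-2*pi*i <w, x> / N)`, computed with `np.fft.fft2`, with space and frequency both in centered coordinates. The normalization constant and the sign of the phase depend on this choice; the names `dft` and `idft` are hypothetical.

```python
import numpy as np

N = 9                               # hypothetical odd side length
half = (N - 1) // 2
rng = np.random.default_rng(0)

def dft(f):
    """Unnormalized 2D DFT, space and frequency in centered coordinates."""
    return np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(f)))

def idft(F):
    """Inverse of dft, with the same centered conventions."""
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(F)))

f = rng.standard_normal((N, N))
g = rng.standard_normal((N, N))

# Inversion, and isometry up to the constant N^2 for this normalization.
inverse_ok = np.allclose(idft(dft(f)), f)
isometry_ok = np.isclose(np.vdot(dft(f), dft(g)), N ** 2 * np.vdot(f, g))

# Translation by k, (T(k) f)(x) = f(x - k), becomes multiplication by a phase.
k = (2, -3)
w = np.arange(-half, half + 1)
phase = np.exp(-2j * np.pi * (w[:, None] * k[0] + w[None, :] * k[1]) / N)
translation_ok = np.allclose(dft(np.roll(f, k, axis=(0, 1))), phase * dft(f))

# Rotating the image by 90 degrees rotates its DFT in the same way.
rotation_ok = np.allclose(dft(np.rot90(f)), np.rot90(dft(f)))
```

The rotation identity relies on centered coordinates with odd `N`, for which `np.rot90` rotates the array around the pixel at the origin in both the space and the frequency domain.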
3.1 A group adapted isometry
For a subgroup of , its annihilator is defined by
is a subgroup of that plays a special role in Fourier analysis. The next lemma, whose proof is elementary, describes its structure and how it relates to rotations.
Let be odd, let , and let be as in Lemma 2.1.
The annihilator of is .
The set is a fundamental set for , in the sense that
for all ;
is invariant for , i.e. , and for we have .
Let . The set satisfies
for all ;
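The statements of the lemma can be explored by brute force on an example. The sketch below assumes, as hypotheses of this illustration, that the subgroup is the set of multiples of `M` in the centered lattice with `N = n * M`, that its annihilator is the set of frequencies `w` such that `exp(2*pi*i <w, g> / N) = 1` for every `g` in the subgroup, and that the fundamental set is the centered `n x n` square. All names and sizes are hypothetical.

```python
import numpy as np

N, M = 15, 3                        # hypothetical: N = n * M with N odd
n = N // M
half, half_n, half_M = (N - 1) // 2, (n - 1) // 2, (M - 1) // 2

def center(v):
    """Centered representative mod N, returned as a tuple of Python ints."""
    return tuple(int(c) for c in (np.asarray(v) + half) % N - half)

def rotate(points):
    """Image of a set of points under the rotation (a, b) -> (-b, a)."""
    return {(-b, a) for (a, b) in points}

lattice = [(a, b) for a in range(-half, half + 1)
                  for b in range(-half, half + 1)]
gamma = [(M * a, M * b) for a in range(-half_n, half_n + 1)
                        for b in range(-half_n, half_n + 1)]

# Annihilator by brute force: <w, g> must be a multiple of N for all g.
annihilator = {w for w in lattice
               if all((w[0] * g[0] + w[1] * g[1]) % N == 0 for g in gamma)}
# In this example it coincides with the multiples of n.
expected = {(n * a, n * b) for a in range(-half_M, half_M + 1)
                           for b in range(-half_M, half_M + 1)}

# Assumed fundamental set: the centered n x n square of frequencies.
omega = {(a, b) for a in range(-half_n, half_n + 1)
                for b in range(-half_n, half_n + 1)}

# Translating omega by the annihilator covers every frequency exactly once.
tiles = [center(np.add(w0, lam)) for w0 in omega for lam in annihilator]
is_tiling = len(tiles) == N * N and set(tiles) == set(lattice)

# Both sets are invariant under 90-degree rotations.
annihilator_rot_invariant = rotate(annihilator) == annihilator
omega_rot_invariant = rotate(omega) == omega
```

In this example the annihilator has `M * M = 9` elements and the fundamental set has `n * n = 25`, so the product of the two cardinalities equals the number of frequencies `N * N = 225`.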
The following definition introduces a transform, denoted by , that is well adapted to perform Fourier analysis on in the presence of the action of the group . This transform is a variation of the one introduced in Section 4.2 of [BCHM2019].
Let , with odd, and let and be as in Lemma 3.1. Let be the linear map on defined by
or, equivalently, . For any we define
which belongs to . For each we denote the corresponding element of by
where is the component of :
We now prove the following result, which also clarifies the role of the map .
The map (3.2) is an invertible isometry . For , its inverse is
The next lemma shows how the isometry intertwines the action of .
Proof. Observe first that, since is only a normalization of the Fourier coefficients on the subgroup , it commutes with translations: . Thus, for all
where the second to last identity makes use of the fact that is the annihilator of and of the invariance of under rotations. This last fact also implies that commutes with rotations: . Thus
3.2 Group invariant spaces
Using the previous results, we can deduce how -invariant spaces are transformed under the action of the map . The next theorem shows that, for any fixed , the element obtained by the transform of a linear combination of translates and rotates of a family is a linear combination of the transform of the rotates of .
Proof. Observe first that (3.5) implies in particular that for any , any , and any , the elements and belong to the same subspace of . This, however, can already be deduced from Lemma 3.4, because , and is invariant under rotations.
Let us now prove that for every and every , noting that, by the previous argument, it suffices to prove this only for .
Let . Then, using Lemma 3.4
where we have set
This proves that for every and every .
To prove the opposite inclusion, fix and let . We want to prove that for each there exists such that . Again, since is rotation invariant, by Lemma 3.4 it suffices to consider only . Now, observe that (3.7) is a DFT for . Indeed since for , equation (3.7) can be written as . Hence, it is inverted by
Thus, for we get . Now, by (3.2), the element of given by
satisfies . This concludes the proof.
The function is a nonabelian variant of an object known in the literature on shift-invariant spaces as a range function (see e.g. [Bownik2000, CP2010] and references therein), and, up to a minor change, it corresponds to the map introduced in Definition 4.5 of [BCHM2019]. We note here also that, for the same used in the previous proof, we can easily compute the components of . Using Lemma 3.4 and (3.2)