With the ever-increasing quantity of information produced by sensors, efficient processing techniques for identifying meaningful information in high-dimensional data sets become crucial. One of the key challenges is to identify relevant objects captured at different times, from various viewpoints, or by different sensors. Sparse signal representations, which linearly decompose signals into key features, have recently been shown to be a powerful tool in image analysis tasks [36, 20, 9]. In general, however, signals must be aligned a priori in order to derive meaningful comparisons or distances in the analysis. Image alignment or registration thus represents a crucial yet non-trivial task in many image processing and computer vision applications, such as object detection, localization and classification, to name a few.
In this paper, we propose a registration algorithm for sparse images that are given as a linear combination of geometric features drawn from a parametric dictionary. The global geometric transformation between images is estimated by first building a set of candidate transformations that contains all the relative transformations between features in each image. The candidate that leads to the smallest transformation-invariant distance is then selected as the global transformation estimate. While image registration is generally a complex optimization problem, our algorithm offers a low complexity solution when the images have a small number of constitutive components. We analyze its theoretical performance, which mainly depends on the construction of the dictionary that supports the sparse image representations. We introduce two novel properties for redundant dictionaries, namely the robust linear independence and the transformation inconsistency, which make it possible to characterize the performance of the registration algorithm. The benefits of these properties are studied in detail and compared to common properties such as the coherence or the restricted isometry property. We finally provide illustrative registration and classification experiments, where our algorithm outperforms baseline solutions from the literature, particularly when relative transformations between images are large.
The image registration problem has been widely investigated from different perspectives in the literature, but not from the point of view of sparse image approximations as studied in this paper. Image registration algorithms are usually classified into direct (pixel-based) methods and feature-based methods. We review these two classes of methods, and refer the reader to [38, 31] for a general survey on image alignment.
Direct pixel-based methods simply consist in trying all candidate transformations and measuring how well the pixels agree when the images are transformed relative to each other. A major drawback of these methods is their inefficiency when the number of candidate transformations becomes large. Therefore, hierarchical coarse-to-fine techniques based on image pyramids have been developed [2, 35] to offer a compromise between accuracy and computational complexity. In a different approach, the authors of [28, 37] formulate the registration problem as a low-rank matrix recovery problem with sparse noise, and leverage recent advances in convex optimization to find the transformation that best aligns the images. The approaches developed in [29, 11] map the images to a canonical space where deformations take a simple form and thus allow easier registration.
The popular feature-based approaches  represent a more efficient class of methods for image registration. They are usually built on several steps: (i) feature detection, which searches for stable distinctive locations in the images, (ii) feature description, which provides a description of each detected location with an invariant descriptor, (iii) features matching between the images and (iv) transformation estimation that estimates the global transformation by looking at matched features. Note that it is crucial in this class of methods to describe the features in a transformation-invariant way for easier matching. We refer the reader to  for a comparison of the main different methods. A popular example of the feature-based approach relies on the scale invariant feature transform (SIFT)  that combines the Difference-of-Gaussian (DoG) detector with a descriptor based on image gradient orientations around the keypoint. The SIFT method is invariant to rotation, scaling and translation and some of its extensions achieve invariance to affine transformations . Moreover, the SIFT descriptors are often used in combination with affine invariant detectors such as those proposed in [22, 24, 17, 3] for affine image registration. Even though SIFT has been very successful in many computer vision applications, it is mostly built on empirical results and several parameters need to be set manually. Feature-based methods in general are not well suited for estimating large transformations between target images, as the matching accuracy and keypoint localization degrade for large transformations.
Finally, we mention some recent advances in transformation-invariant distance estimation, which is closely related to image registration. The transformation-invariant distance is defined as the minimum distance between the possible transformations of two patterns. In general, the signals generated by the possible transformations of a pattern can be represented by a nonlinear manifold. Computing the transformation-invariant distance between two patterns, or equivalently the manifold distance, is thus a difficult problem in general. The authors in  locally approximate the transformation-invariant distance with the distance between the linear spaces that are tangent to both manifolds. Vasconcelos et al.  go beyond the limitations of local invariance in tangent distance methods by embedding the tangent distance computation in a multiresolution framework. Kokiopoulou et al. in  achieve global invariance by approximating the original pattern with a linear combination of atoms from a parametric dictionary. Thanks to this approximation, the manifold is given in a closed form and the objective function becomes equal to a difference of convex functions that can be globally minimized using cutting plane methods. Unfortunately, this class of optimization methods has a slow convergence rate, which limits its use in practical settings.
In this paper, we propose to examine the image registration problem from a novel perspective by building on our earlier work  where we consider that images are given in the form of sparse approximations. Unlike the existing methods, this approach guarantees invariance to transformations of arbitrary magnitude and is generic with respect to the transformation group considered in the registration problem. The detailed analysis of our new framework further provides useful insights on the connections between image registration problems and sparse signal processing.
The rest of this paper is organized as follows. In Section 2, we formulate the problem of registration of sparse images and present our registration algorithm. Section 3 proposes a theoretical performance analysis of our algorithm, and introduces two new dictionary properties. We finally present illustrative experiments in Section 4.
2 Registration of sparse images
We first define the notations and conventions used in this paper. We denote respectively by , , the set of real numbers, the set of non negative real numbers and the set of positive real numbers. We consider images to be continuous functions in . We denote the scalar product associated with as: , and the norm by . Then, we define to be a transformation group and denote by its associated composition rule. We consider that the group includes the transformations between pairs of images in our registration problem. We represent any transformation by a vector in (where denotes the dimension of ) containing the parameters of the transformation.
Alternatively, we represent a transformation with its unitary representation in . Therefore, for any , is the function that maps an image to its transformed image by . Moreover, as is a unitary operator, we have . In order to avoid heavy notations, we also use to denote . We give in Table 1 some examples of transformation groups and their unitary representation in .
|Special Euclidean group|
The group is the group of translations in the plane. The Special Euclidean group is the group of translations and rotations in the plane. Its dimension is equal to (degrees of freedom are associated with the translation and one is associated with rotation). The similarity group of the plane is the set of transformations consisting of translations, isotropic dilations and rotations. This group is particularly important in transformation invariant image processing since it contains the basic transformations to which we usually want to be invariant.
Finally, if and , we denote by the norm of defined by . Note that the notation is overloaded since it denotes either the continuous norm or the discrete norm. However, the distinction between both cases will be clear from the context.
2.2 Problem formulation
We now formulate the registration problem that we consider in the paper. Let and be two images in . We are interested in computing the optimal transformation between images and . Hence, we formulate the original alignment problem as follows:
We denote by the transformation invariant distance between and . It corresponds to the regular Euclidean distance when the images are aligned optimally in the sense. Unfortunately, computing the transformation and the transformation invariant distance is a hard problem since the objective function is typically non convex and exhibits many local minima.
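The difficulty described above can be illustrated on a toy problem. The following is a minimal sketch, not the setting of the paper: it assumes discrete 1D signals and uses cyclic shifts as a stand-in for the transformation group, and all names (`cyclic_shift`, `invariant_distance`, the test signal) are hypothetical. It evaluates the distance for every shift by brute force and lists the local minima of the objective.

```python
import math

def cyclic_shift(signal, s):
    """Shift a discrete signal cyclically by s samples (toy transformation)."""
    n = len(signal)
    return [signal[(i - s) % n] for i in range(n)]

def distance(a, b):
    """Euclidean distance between two discrete signals."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def invariant_distance(p, q):
    """Brute-force search over all cyclic shifts of p.

    Returns the best shift and the full list of distances (one per shift).
    """
    n = len(p)
    dists = [distance(cyclic_shift(p, s), q) for s in range(n)]
    best = min(range(n), key=lambda s: dists[s])
    return best, dists

# Toy pattern with three spikes; q is p shifted by 3 samples.
p = [1, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0]
q = cyclic_shift(p, 3)

best_shift, dists = invariant_distance(p, q)
n = len(p)
# Strict local minima of the objective over the cyclic set of shifts.
local_minima = [
    s for s in range(n)
    if dists[s] < dists[(s - 1) % n] and dists[s] < dists[(s + 1) % n]
]
print(best_shift, local_minima)
```

Even in this tiny example the objective has several local minima, only one of which is the global one, so a local descent method started from a poor initial shift would stop at the wrong alignment.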
In order to circumvent this problem, we consider that the images are well approximated by their sparse expansion in a series of geometric functions. Specifically, let be a parametric dictionary of geometric features constructed by transforming a generating function as follows:
where is a finite discretization of the transformation group and denotes the transformation of the generating function by . We denote by and the respective -sparse approximations of and in the dictionary :
Since the dictionary contains features that represent potential parts of the image, we assume that coefficients and are all non negative so that the different features do not cancel each other.
We refer to any element in as a feature or atom. We suppose in this paper that the generating function is non negative. Besides, we suppose for simplicity that defines a one-to-one mapping. This assumption means that the generating function does not have any symmetries in  (footnote 1: we extend this assumption to the more general setting where the stabilizer of defined by is a finite set in Appendix B). Finally, we suppose without loss of generality that the mother function is normalized so that .
We can now reformulate the registration problem as the problem of finding the optimal relative transformation between sparse patterns. In particular, we reformulate our registration problem as follows:
The smallest distance is the transformation invariant distance computed between the sparse image approximations and . Compared to the original problem, the images and are replaced by their respective sparse approximations and . This presents some potential advantages in applications where users do not have access to the original images; more importantly, the prior information on the support of and effectively guides the registration process, as we will see in the next paragraph. We should note that if the images are not well approximated by their sparse expansions, the solution of may substantially differ from the true transformation obtained by solving .
2.3 Registration algorithm
We now propose a novel and simple algorithm to solve the registration problem for images given by their sparse approximations. The core idea of our registration algorithm lies in the covariance property of the dictionary : a global transformation applied to the image induces an equivalent transformation on the corresponding features (footnote 2: the meaning of covariance used in this paper is not to be confused with that of covariance used in statistics). Thanks to this covariance property, it is possible to infer the global transformation between the images by a simple computation of the relative transformations between the features in both images.
Specifically, let be the set of relative transformations between pairs of features taken respectively in and : . We can thus estimate the relative transformation between the images by solving the following relaxed problem of :
The minimum of the objective function is defined as the approximate transformation-invariant distance between and .
Even though problems and share some similarities, they differ in an important aspect, that is the search space. It is reduced from to the finite set . This constrains the estimated transformation to be equal to a transformation that exactly maps two features taken respectively from and . The assumption that can be replaced by originates from the observation that features are covariant to the global transformation applied on the original image. Even though this assumption is not necessarily true for all features when innovation exists between the images (other than a global transformation), we expect to have at least one feature whose transformation is consistent with the optimal transformation . We analyze in detail the error due to this assumption in Section 3. The advantage of replacing by is however immediate: we have reduced an intractable problem to a problem whose search space is of cardinality at most . Since is generally chosen to be small enough, the problem can be efficiently solved by a full search over all the elements of . The registration algorithm is summarized in Algorithm 1.
Input: sparse approximations and .
The value of controls the computational complexity of Algorithm 1: a large value of results in a large cardinality of the search space . Furthermore, the value of also generally controls the error in the approximation of the original images by their sparse expansions and . We discuss in more detail the influence of on our registration algorithm in Section 4. Note finally that we have supposed for simplicity that both images and are approximated by the same number of features. However, the algorithm easily generalizes to the case where the number of features is different in the two images. In this case, we have instead of , where and are the number of features in and respectively.
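The full search described above can be sketched in a few lines. This is a minimal illustration under strong assumptions that are not taken from the paper: the transformation group is restricted to 2D translations, the generating function is a unit-norm isotropic Gaussian of width `SIGMA` (for which the inner product between two atoms has the closed form used in `atom_inner`), and a sparse pattern is stored as a list of `(coefficient, centre)` pairs. All function names are hypothetical.

```python
import math
from itertools import product

SIGMA = 1.0  # assumed width of the Gaussian generating function (hypothetical)

def atom_inner(a, b, sigma=SIGMA):
    """Inner product of two unit-norm isotropic 2D Gaussians centred at a and b."""
    d2 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return math.exp(-d2 / (4.0 * sigma ** 2))

def inner(p, q):
    """Inner product of two sparse patterns given as lists of (coeff, centre)."""
    return sum(c * d * atom_inner(a, b) for (c, a), (d, b) in product(p, q))

def translate(pattern, eta):
    """Apply the translation eta to every atom of the pattern (covariance)."""
    return [(c, (a[0] + eta[0], a[1] + eta[1])) for c, a in pattern]

def distance(p, q):
    """L2 distance between two sparse patterns, computed from atom inner products."""
    d2 = inner(p, p) - 2.0 * inner(p, q) + inner(q, q)
    return math.sqrt(max(d2, 0.0))  # guard against tiny negative round-off

def register(p, q):
    """Full search over the relative translations between pairs of features."""
    candidates = {(b[0] - a[0], b[1] - a[1]) for (_, a), (_, b) in product(p, q)}
    scored = ((eta, distance(translate(p, eta), q)) for eta in candidates)
    return min(scored, key=lambda item: item[1])

# A 3-sparse pattern and an exactly translated copy of it.
p = [(1.0, (0.0, 0.0)), (0.5, (4.0, 1.0)), (2.0, (-2.0, 3.0))]
q = translate(p, (5.0, -3.0))

eta_hat, dist = register(p, q)
print(eta_hat, dist)
```

With three features per pattern, the search visits at most nine candidate translations, which mirrors the quadratic growth of the search space with the number of features discussed above. For a non-commutative group, the candidate set and the action on atoms would have to use the group composition rule instead of simple vector differences.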
In the next section, we analyze the performance of the proposed registration algorithm in different settings, and focus in particular on the influence of the dictionary on the registration performance.
3 Theoretical analysis
In this section, we examine the penalty of relaxing the original problem into in terms of registration performance. We first discuss the framework and the assumptions used in our analysis. Then, we study a simple case where the image patterns are exactly related by a (possibly very large) geometrical transformation. We show that under a mild assumption on the dictionary, our algorithm achieves perfect registration. We then extend the analysis to the general case and introduce two key properties of the dictionary (namely robust linear independence and transformation inconsistency). We show that under some conditions on these properties, our algorithm succeeds in recovering the correct relative transformation with a bounded error in the general case, as long as the innovation between the images (other than the global geometrical transformation) is controlled. We give at each step of the analysis the main intuitions and several examples to illustrate the novel notions introduced in our analysis.
3.1 Analysis framework
We first define a performance metric to measure the image registration accuracy. As we want to capture the performance of our registration algorithm with respect to the optimal image alignment obtained by solving , a natural metric consists in computing the difference between the transformation invariant distance and its approximate version, i.e., . We however assume in this paper that the images are given by their sparse expansions. Therefore, we use an alternative registration performance given by , where we use the transformation invariant distance computed between the sparse image approximations and instead of the original images. Note that since .
We relate in the following proposition the two registration metrics and to the sparse approximation errors and .
using the triangle inequality. We now show that . Let . We have:
Using the triangle inequality, we derive a lower and an upper bound as follows:
As is a unitary operator, we have . Hence, rewriting the previous equation, we get:
Recall that and . Hence, by taking the minimum over all , we obtain , which concludes the proof of the proposition. ∎
When most of the energy of and is captured by and (namely when is small), the registration errors and are equivalent. We suppose in the rest of this section that this condition is satisfied and we measure the registration error with . Hence we focus exclusively in this analysis on the penalty induced by restricting the search space to , that is the penalty induced by relaxing the problem into the problem in the above section.
Before studying the registration performance, we describe additional assumptions on the discretization of the transformation group . Recall that the transformation optimally aligns and in the sense in problem . We assume that it satisfies the following assumptions:
where is the discretization of used to construct dictionary as given in Eq. (1). These hypotheses state that the atoms of and belong to the dictionary, where is the optimal alignment of with and is the optimal alignment of with . As is obviously not known beforehand, it is difficult to verify this assumption in practice. However, we can assume that Eq. (4) and Eq. (5) hold when the parameter space used to design is discretized finely.
Finally, the assumptions in our performance analysis can be summarized as follows:
3.2 Registration performance with exact pattern transformation
In our performance analysis, we first consider the special case where . This means that there exists a transformation for which , i.e., the sparse image approximations can be aligned exactly. We show that in this case, our registration algorithm is able to recover the exact global transformation between and , as long as any subset of size in is linearly independent. We have the following proposition:
Suppose that any subset of size in is linearly independent. In this case, if , then .
If , then we have . Thanks to the linear independence of any subset of size in , for any there exists such that . Indeed, if this were not the case, we could write as a linear combination of atoms in that are all different from and that all belong to thanks to assumption . This contradicts the assumption that any subset of atoms in is linearly independent. Then, since the mapping is a one-to-one function thanks to our dictionary design assumption, we have . Thus, and . ∎
We can make the following remark about the design of the dictionary. The linear independence assumption guarantees that, when two -sparse signals are equal, they have at least one atom in common (footnote 3: the linear independence of any subset of size in the dictionary actually guarantees a stronger result: it guarantees that any -sparse signal has a unique decomposition in . In other words, it guarantees that when two -sparse signals are equal, all the atoms are equal). If this condition is violated, the patterns and can have several decompositions in the dictionary with disjoint supports. In this case, all the features of the transformed pattern and are distinct, which generally leads to . Note that this assumption appears in many problems related to overcomplete dictionaries since it guarantees the uniqueness of -sparse decompositions [6, 5, 33].
Finally, since Proposition 2 ensures that for an exactly transformed pattern, and we have when the sparse approximation errors are not too large (Assumption ), we can guarantee that the registration error is small in this case.
3.3 Registration performance in the general case
3.3.1 Bound on the registration error
We now study the performance of our registration algorithm in the general case. The previous result only applies to an ideal scenario since the condition is rarely satisfied in practice. There is usually some slight innovation between the images (other than a transformation in ), which results in a distance that is non-zero. In addition, even when the original images are exactly related by a global transformation (i.e., ), there is no guarantee that the sparse approximations can be perfectly aligned (i.e., ) due to the discretization of the dictionary.
We study the general case where the sparse image approximations and have differences that cannot be explained by a global geometric transformation in . In more detail, when and denote respectively the coefficient vectors for patterns and following Eq. (2), we suppose that there exists a real number such that . The quantity therefore measures the normalized innovation between and .
We now turn to the main result of our paper, which is formulated in Theorem 1. This result relates the error of the registration algorithm in Algorithm 1 to the properties of the dictionary, namely the Robust Linear Independence (RLI) and the transformation inconsistency. It reads as follows.
If with , then:
when is -RLI for some , and is the transformation inconsistency of .
Theorem 1 shows that robust linear independence with a small and a small transformation inconsistency are key properties of the dictionary in order to guarantee the success of our algorithm. The RLI property can be thought of as an extension of the linear independence assumption to the case where . Specifically, it guarantees the existence of two approximately similar features in and when is small. The transformation inconsistency captures the fact that geometrical transformations have a different effect on distinct atoms in the dictionary. We defer the proof of Theorem 1 to Appendix A, and we study in detail in the rest of this section the novel RLI and transformation inconsistency properties.
3.3.2 Robust linear independence
We now study in more detail the novel dictionary properties. We first show that the linear independence assumption introduced in Section 3.2 is no longer sufficient to bound the registration performance in the case where (but close to zero). To see this, we construct a linearly independent dictionary and two sparse patterns and for which can be made arbitrarily close to zero (i.e., ) yet the registration error is large. As illustrated in Fig. 1, we consider a dictionary containing four square atoms and an additional big square atom parametrized by its position with respect to . Clearly, when , the dictionary is linearly independent since one cannot write an atom as a linear combination of the four other atoms. We consider the patterns and . When is small, the transformation that best aligns and is the identity transformation (footnote 4: if we look among all possible transformations, the optimal transformation is a translation that exactly aligns and . However, this transformation does not satisfy the assumptions in Eq. (4) and (5). To illustrate the main issue here, we consider only transformations that satisfy these assumptions. For small , the optimal transformation is therefore the identity). All relative transformations between features in and are however dilations composed with translations, which result in an estimated transformation in our algorithm that is significantly different from the identity. Hence we obtain a large registration error in this example. This example shows that the linear independence assumption defined in Section 3.2 is fragile: it does not allow us to bound the registration error even when is very small. One needs a more robust condition in order to guarantee a small registration error even in cases where the innovation between images is small (but nonzero).
Therefore, we propose to extend the notion of linear independence to a novel property called robust linear independence (RLI) to characterize sets of vectors. It is formally defined as follows.
Let be a normed space and . A family of vectors is -robustly linearly independent (RLI) if the following implication holds for any vector :
In other words, when and the parameter are small, any linear combination of vectors that nearly vanishes in a RLI vector set contains at least two vectors that approximately cancel each other.
We now discuss the relation between RLI and linear independence. While linear independence prevents having collinear vectors, it is natural in our registration framework to allow collinear vectors in the dictionary since they represent essentially the same feature. Specifically, as the underlying transformation parameter of collinear atoms is the same, selecting one atom or the other is not important for the purpose of registration (footnote 5: this is in contrast to recovery problems, e.g., compressed sensing, where collinear vectors in the measurement matrix are not allowed, since it would then not be possible to recover the active component of the signals). The notion of linear independence where collinear vectors are allowed can be written as follows. For any such that ,
Note that this essentially corresponds to the notion of robust linear independence in the case where . Since we want to study the behavior of the algorithm for nonzero innovation between the images, we naturally extend the notion of linear independence (where collinear vectors are allowed) to Definition 1; if a linear combination of vectors has a small magnitude (where quantifies the magnitude), there exist two vectors that approximately cancel each other (where quantifies this approximation). Note that, for a fixed , the RLI gets harder to satisfy for a larger . In addition, for a fixed , the condition is harder to satisfy for a smaller .
The following toy example illustrates the notion of robust linear independence in .
Consider the setting of Figure 2 with . Then, for , we have:
is RLI with .
is not RLI unless .
The proof of Example 1 is straightforward from simple trigonometry. The set of vectors has a better behavior in terms of robust linear independence than . The underlying reason is that is very close to the vector (i.e., is close to zero), while is close to a linear combination of and (but not to or ). While it is acceptable to have vectors that are close to each other, the RLI property prevents having a vector that is close to a linear combination of the other vectors. This can also be readily seen in the example of Fig. 1.
The definition of robust linear independence can be extended to dictionaries as follows.
A dictionary is -RLI if any subset of size in is -RLI.
The dictionary in the example of Fig. 1 is not -RLI for , with small (unless is large). Indeed, by choosing a vector of coefficients , we obtain , yet . Note that the RLI property on the dictionary has to be satisfied in order to obtain a good registration performance, as it ensures the existence of two approximately similar features (in the sense) in and , when is small.
We study now in more detail the RLI property on dictionaries. In particular, we examine the main difference between RLI and the well known Restricted Isometry Property (RIP) . The restricted isometry condition assumes that a collection of vectors behaves almost like an orthonormal system but only for sparse linear combinations. Specifically, the RIP with constant implies that any linear combination of elements in the dictionary satisfies:
By imposing a RIP property on the dictionary with , the norm of any sparse linear combination of atoms is guaranteed to be large (i.e., larger than ). In our case, in contrast to the RIP, we are interested in linear combinations of atoms that nearly vanish. The RLI property imposes in this case the existence of two atoms that approximately cancel each other in the signal support. Consequently, RLI can be seen as a weak form of RIP, where we allow the norm of linear combinations to be close to zero provided that two atoms approximately cancel each other in the sense of Eq. (6). In particular, any dictionary that satisfies the RIP property with a parameter will be (, , )-RLI. Indeed, since holds for any subset of dictionary elements, the left hand side of Eq. (6) cannot be satisfied when .
Let us consider a simple example to compare the new RLI property with the common ways of characterizing dictionaries, namely, the coherence  and the restricted isometry property  (footnote 6: even though the definitions of RIP and coherence are originally for vectors in , we consider here a straightforward extension of these definitions to the case where vectors are in ).
Example 2 (Dictionary of translated box functions).
Let and define the box function
We consider the infinite-size dictionary , where is the translation operator by . The dictionary has the following properties:
is RIP with a constant equal to 1, for any .
The coherence of is equal to 1.
is -RLI for and .
As the proof of the robust linear independence of is rather technical and not essential to the main understanding of the paper, it is given in Appendix D.
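The first two properties of Example 2 can be checked numerically on a simplified model. The sketch below assumes, as a hypothetical stand-in for the box function of the example, the indicator of an interval of width `width` scaled to unit L2 norm; the name `box_inner` and the chosen shifts are illustrative only. For such boxes, the inner product between two atoms whose positions differ by `tau` is the normalized overlap length.

```python
def box_inner(tau, width=1.0):
    """Inner product of two unit-norm box functions whose positions differ by tau.

    Each box has height 1/sqrt(width); the overlap has length max(0, width - |tau|).
    """
    return max(0.0, 1.0 - abs(tau) / width)

def squared_norm_of_difference(tau, width=1.0):
    """Squared L2 norm of (box at 0) minus (box at tau), i.e. coefficients (1, -1)."""
    return 2.0 - 2.0 * box_inner(tau, width)

for tau in (1.0, 0.5, 0.1, 0.01, 0.001):
    # The inner product tends to 1 and the difference tends to 0 as tau shrinks.
    print(tau, box_inner(tau), squared_norm_of_difference(tau))
```

As the shift goes to zero, the inner product between distinct atoms approaches 1 (so the coherence, taken as a supremum, equals 1), while the 2-sparse combination with coefficients (1, -1) has a norm that approaches zero although its coefficient vector has squared norm 2, which is why no useful isometry constant can hold. The two atoms involved nearly cancel each other, which is the situation that the RLI property tolerates.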
Even if the dictionary hardly satisfies the RIP and is highly coherent, it is still an interesting one in our framework. Indeed, it satisfies the key property that two sparse signals that are close in the sense have at least two approximately similar features. When applied to our registration problem, this guarantees the existence of two features that are related approximately by a transformation in the sense (footnote 7: more precisely, this means that there exists a and a such that is small) when remains small. This property is at the core of our registration algorithm since we infer the global transformation by looking at the relative transformations between the features.
We finally stress the differences between the proposed RLI property and other dictionary properties such as the RIP, the coherence or, more recently, the properties introduced in [4, 15, 27]. While the latter properties are specifically designed for the task of signal recovery, the proposed RLI property is introduced in the context of image registration. This explains in particular why a dictionary can be well-behaved in terms of RLI property despite having coherent atoms. In contrast, coherent columns are forbidden in the context of recovery problems (e.g., compressed sensing) as it is then difficult to distinguish between similar components in the signal reconstruction.
3.3.3 Transformation inconsistency
The second dictionary property that is important to study the performance of our algorithm is the transformation inconsistency, which measures the difference in the effect of the same transformation on distinct atoms in the dictionary. It is formally defined as follows for parametric dictionaries given by Eq. (1).
The transformation inconsistency of a parametric dictionary is equal to:
where is the identity transformation. The transformation inconsistency is always larger than or equal to . Furthermore, when is commutative, the transformation inconsistency takes its minimal value and is equal to . Indeed, for any in and , we have:
Hence, taking the supremum over all and atoms in results in having . This is expected since when is commutative, a fixed transformation acts on all atoms similarly.
On the other hand, a large value of the transformation inconsistency (i.e., ) means that there exist two atoms in the dictionary that are affected in a very different way when they are subject to the same transformation. The transformation inconsistency plays a key role in our registration algorithm. Indeed, as the global transformation between two sparse patterns is estimated from one of the relative transformations between features, it is preferable that transformations act in a similar way on all the features of the sparse patterns for more consistent registration. That means that dictionaries with small transformation inconsistency provide better registration performance.
In order to outline the importance of this novel property in our registration framework, we give a few illustrative examples of dictionaries with different transformation inconsistency parameters.
Example 3 (Dictionary with quasi isotropic mother function, ).
We consider to be the Special Euclidean group (SE(2)). That is, accounts for translations, rotations and combinations of those. We consider an ellipse-shaped mother function as shown in Figure 3 (a) with anisotropy . Then, we suppose for the sake of simplicity that (i.e., the dictionary is built by applying all transformations to the generating function ).
We illustrate in Fig. 3 (b) the effect of transformation , which is a simple rotation, on two different atoms with parameters and positioned at different points in the 2D plane. While the rotation of the atom parametrized by induces a very slight change on it (when ), the same rotation applied on the atom completely changes its position. This is due to the fact that translations and rotations do not commute. Hence, the transformation has a very different impact on atoms and , and we get from Definition 3. Therefore, when the generating function approaches isotropy, the transformation inconsistency grows to infinity.
In this example, our registration algorithm is not guaranteed to have a small error. To illustrate it, let us consider the patterns and illustrated in Fig. 3 (c), which are each composed of two atoms whose coefficients are all equal. The distance between the patterns can be made arbitrarily small with a generating function that is close to isotropic (i.e., ) while the minimal distance in our algorithm remains large. Indeed, since our algorithm considers only relative transformations between pairs of atoms, the estimated global transformation between the patterns can only be equal to a combination of a translation and rotation of . However, when , the optimal transformation is clearly the identity, which cannot be selected with our algorithm: this results in a large registration error . Note that the error here is entirely related to the fact that the transformation inconsistency is large, and not to the RLI property since the dictionary under consideration here is robustly linearly independent for small values of the sparsity .
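The non-commutativity at the heart of Example 3 can be checked numerically. The following minimal sketch assumes a hypothetical parametrization of SE(2) elements as tuples (theta, tx, ty) acting on points of the plane, and it uses the displacement of an atom's center as a crude proxy for how strongly a transformation affects that atom. The function names and the numerical values are illustrative and do not come from the paper.

```python
import math

def compose(g, h):
    """Return g∘h for SE(2) elements (theta, tx, ty): apply h first, then g."""
    tg, xg, yg = g
    th, xh, yh = h
    c, s = math.cos(tg), math.sin(tg)
    return (tg + th, c * xh - s * yh + xg, s * xh + c * yh + yg)

def act(g, point):
    """Apply the rigid motion g to a 2D point (rotation about the origin, then translation)."""
    t, x, y = g
    c, s = math.cos(t), math.sin(t)
    return (c * point[0] - s * point[1] + x, s * point[0] + c * point[1] + y)

def center_shift(eta, center):
    """Distance travelled by an atom center when eta is applied to it."""
    moved = act(eta, center)
    return math.hypot(moved[0] - center[0], moved[1] - center[1])

# Assumed test transformation: a rotation by 90 degrees about the origin.
rotation = (math.pi / 2, 0.0, 0.0)
translation = (0.0, 10.0, 0.0)

# An atom centered at the origin does not move, one centered far away does.
shift_at_origin = center_shift(rotation, (0.0, 0.0))
shift_far_away = center_shift(rotation, (10.0, 0.0))

# The two composition orders give different group elements.
rotate_then_translate = compose(translation, rotation)
translate_then_rotate = compose(rotation, translation)
```

For a nearly isotropic atom centered at the origin, the rotation changes almost nothing, whereas the same rotation moves the distant atom by a distance proportional to its offset from the origin. This is the imbalance that the transformation inconsistency captures.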
Example 4 (Dictionary built on an elongated mother function, ).
Similarly to the previous example, we consider the transformation group and that . However, the dictionary is now built on an elongated mother function as shown in Fig. 4 (a). As in the previous example, we can make the transformation inconsistency very large by taking elongated atoms (large ) and a transformation that is a small translation, as shown in Fig. 4 (b). It is again possible to construct an example where the registration algorithm performs poorly (see Fig. 4 (c)): the set of transformations between features in each sparse pattern contains only translations and rotations of . Therefore, any candidate transformation results in a large value of the global registration error term ; the optimal global transformation is the identity in this case, which leads to a small value of the minimal distance between the patterns when is large.
To be complete, we should note that the one-to-one mapping assumption defined in Section 2.2 for the function is not satisfied in Example 3 and Example 4, since has a rotational symmetry of . In this case, a slightly more complicated definition of the transformation inconsistency has to be made to avoid having (with the definition of given in Definition 3, we obtain by setting to be a rotation of , to be the identity and choosing any different from ). The main intuitions behind the transformation inconsistency , as defined in Definition 3, nevertheless hold when has a finite number of symmetries. We study in detail the generalization of the transformation inconsistency to the case where has symmetries in in Appendix B.
Example 5 (Dictionary built with translation and isotropic dilations, ).
In this example, we let be the group of translations and isotropic dilations. The generating function of the dictionary could have any form, as long as its support is much smaller than the dimension of the image. For example, we can choose a circle-shaped mother function, as depicted in Fig. 5 (a). Then, we consider the scenario where the two atoms and are separated by (where is considered to be very large) as illustrated in Fig. 5 (b). A transformation that consists of a small isotropic dilation has a very different effect on both atoms since translations and dilations do not commute. In particular, the transformation applied to results in an atom that has no intersection with , while the same transformation has almost no effect on , i.e., . Thus, the transformation inconsistency is very high and according to Definition 3. In Fig. 5 (c), we illustrate why this may cause a problem in our registration algorithm: we consider the two sparse patterns and composed of two features each, where the coefficients of all the atoms are equal. It is not hard to see that the optimal global transformation between both patterns is the identity. At the same time, our algorithm can only estimate a global transformation that is a dilation (possibly combined with a translation) since all transformations between pairs of atoms in and consist of combinations of dilation and translation.
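The effect described in Example 5 can be reproduced with a few lines of arithmetic. The sketch below assumes that each atom is a disc-shaped function summarized by its center and radius, and that the dilation is taken about the origin; the scale factor 1.05, the unit radius and the separation of 1000 are illustrative values, not values from the paper.

```python
import math

def dilate_disc(scale, center, radius):
    """Image of a disc under the isotropic dilation x -> scale * x about the origin."""
    return (scale * center[0], scale * center[1]), scale * radius

def discs_overlap(c1, r1, c2, r2):
    """True when two open discs intersect."""
    return math.hypot(c1[0] - c2[0], c1[1] - c2[1]) < r1 + r2

scale = 1.05                  # small isotropic dilation (assumed value)
radius = 1.0                  # support radius of the mother function (assumed value)
near_center = (0.0, 0.0)      # atom located at the origin
far_center = (1000.0, 0.0)    # atom located far from the origin

near_moved, near_radius = dilate_disc(scale, near_center, radius)
far_moved, far_radius = dilate_disc(scale, far_center, radius)

# The atom at the origin barely changes and still overlaps its original support.
near_overlaps = discs_overlap(near_center, radius, near_moved, near_radius)
# The distant atom is pushed away by roughly 0.05 * 1000 = 50 and no longer overlaps.
far_overlaps = discs_overlap(far_center, radius, far_moved, far_radius)
```

The same small dilation therefore leaves one atom essentially unchanged while moving the other entirely off its original support, which is the situation that makes the transformation inconsistency large.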
Overall, the above examples suggest that, whenever the transformation inconsistency of the dictionary is large, one may construct an example where our registration algorithm poorly approximates the transformation-invariant distance. It is worth mentioning that even though the previous examples consider localized atoms with finite support, our approach is not constrained to such atoms. In the general setting where is any transformation group (and for the sake of simplicity), such an example of failure could be constructed as follows. The basic idea is to build two patterns and of the form and for which: (i) , (ii) and are large (with respect to ). The optimal transformation between and is then simply the identity, whereas the transformations considered in our algorithm (namely and , along with and ) result in a poor registration performance as they all differ from the identity transformation.
In more detail, when we know that there exist two atoms and with and , along with a transformation for which while is large. By setting , we get that . Hence, the norm is necessarily small since . Besides, we know by construction that is large and is also generally large since the group is noncommutative. This gives us, in general, large values of and . This construction shows that, when the dictionary has a large inconsistency parameter, one can find patterns for which the registration algorithm fails to recover the right global transformation.
In general, the above examples show that it is better to choose a dictionary with a small transformation inconsistency (i.e., small) to have good registration performance irrespective of the patterns to be aligned.
The performance of the registration algorithm depends on the transformation inconsistency as well as on the robust linear independence of the dictionary, as shown in Theorem 1. The success of our registration algorithm for all sparse signals in the dictionary is guaranteed when the RLI and transformation inconsistency conditions are satisfied. Note that the conditions on the dictionary properties are essentially tight, as one can construct an example where our algorithm fails whenever one of the parameters is large enough. The performance bound should be interpreted qualitatively rather than quantitatively: it provides two rather intuitive conditions for our algorithm to achieve a low registration error. In order to use this bound quantitatively, one has however to be able to compute explicitly the newly defined properties on generic dictionaries. We emphasize that such a bound could not have been established with traditional measures for characterizing dictionaries, namely the coherence or the restricted isometry property constant. Finally, we remark that the result in Theorem 1 can be used to bound the registration error thanks to Proposition 1. The price to pay in this case is the approximation error .
4 Image registration experiments
In this section, we evaluate the performance of our algorithm in image registration experiments. We first describe the implementation choices in our registration algorithm. We then study its performance for different dictionaries and put the results in perspective with the theoretical guarantees of Section 3. Next, we present illustrative image registration and classification experiments with simple test images and handwritten digits. Finally, we provide some simple comparisons with baseline registration algorithms that use simple features from the computer vision literature.
4.1 Algorithm implementation
In all the experiments of Section 4.3, we focus on achieving invariance to translation, rotation and scaling. Invariance to these transformations is indeed considered to be a minimal requirement in invariant pattern recognition. These three operations generate the group of similarities that we denote by . Any element in is therefore indexed by 4 parameters: a translation vector , dilation and rotation parameter . We now describe the sparse approximation algorithm and the dictionary design used in our experiments.
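To make the four-parameter indexing concrete, the following minimal sketch represents a similarity of the plane as the complex affine map z -> a * z + b, where a encodes the dilation and rotation and b the translation. This point-map convention and all function names are assumptions made for illustration; the paper's own operator convention may differ. The helper `relative` computes the kind of relative transformation between two features that the registration algorithm uses as a candidate.

```python
import cmath
import math

def from_params(tx, ty, scale, theta):
    """Build the similarity z -> a * z + b from its 4 parameters."""
    return (scale * cmath.exp(1j * theta), complex(tx, ty))

def to_params(g):
    """Recover (tx, ty, scale, theta) from the pair (a, b)."""
    a, b = g
    return (b.real, b.imag, abs(a), cmath.phase(a))

def compose(g, h):
    """Return g∘h: apply h first, then g."""
    return (g[0] * h[0], g[0] * h[1] + g[1])

def inverse(g):
    """Inverse similarity: z -> (z - b) / a."""
    return (1 / g[0], -g[1] / g[0])

def relative(g_from, g_to):
    """Transformation eta such that eta∘g_from equals g_to."""
    return compose(g_to, inverse(g_from))

# Two hypothetical atom parameters.
g1 = from_params(1.0, 2.0, 2.0, math.pi / 2)
g2 = from_params(-3.0, 0.5, 0.5, 0.0)

eta = relative(g1, g2)
recovered = compose(eta, g1)   # should coincide with g2 up to rounding
eta_params = to_params(eta)    # (tx, ty, scale, theta) of the relative transformation
```

Since composition multiplies the scales and adds the rotation angles, the relative transformation here has scale 0.25 and rotation angle -pi/2.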
4.1.1 Sparse approximation algorithm
There are many methods to construct sparse approximations of images. In our experiments, we use a modified implementation of the Matching Pursuit (MP) algorithm, as MP is a simple algorithm that works relatively well in practice. It is an iterative algorithm that successively identifies the atoms in that best match the image to be approximated. More precisely, MP iteratively computes the correlation between the atoms in and the signal residual, which is obtained by subtracting the contributions of the previously chosen atoms from the original image. At each iteration, the atom with the highest correlation is selected and the residual signal is updated. While the standard MP algorithm solves the sparse approximation problem without positivity constraint on the coefficients, we propose a slightly modified algorithm, which we call Non-negative Matching Pursuit (NMP), that selects the atoms having the highest positive correlation with the residual signal. This choice is driven by the objective of having a part-based signal expansion, where each feature participates in constructing the signal representation. The NMP algorithm is formally defined in Algorithm 2.
Algorithm 2 Non-negative Matching Pursuit (NMP) for feature extraction
Input: image , sparsity , dictionary .
Output: coefficients , support .
One way to choose the sparsity consists in controlling the approximation error of and . Specifically, we can impose a stopping criterion in the NMP algorithm of the form where is the residual at iteration and is a fixed threshold controlling the approximation error. When is chosen to be small enough, this guarantees a relatively small sparse approximation error.
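The greedy selection with a positivity constraint and a residual-based stopping rule can be sketched as follows. This is a minimal illustration on flat vectors with a small explicit dictionary of unit-norm atoms, not the implementation used in the experiments. The function name `nmp`, the absolute threshold `tol` on the residual norm and the toy dictionary are assumptions made for the example.

```python
import math

def dot(u, v):
    """Inner product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))

def nmp(image, atoms, max_atoms, tol=0.0):
    """Greedy non-negative matching pursuit with unit-norm atoms.

    Returns the selected coefficients, the indices of the selected atoms
    and the final residual.
    """
    residual = list(image)
    coeffs, support = [], []
    for _ in range(max_atoms):
        # Assumed stopping rule: residual norm below a fixed threshold.
        if math.sqrt(dot(residual, residual)) <= tol:
            break
        correlations = [dot(residual, atom) for atom in atoms]
        best = max(range(len(atoms)), key=correlations.__getitem__)
        c = correlations[best]
        # Only positively correlated atoms may enter the expansion.
        if c <= 0.0:
            break
        coeffs.append(c)
        support.append(best)
        residual = [r - c * a for r, a in zip(residual, atoms[best])]
    return coeffs, support, residual

# Toy dictionary: the canonical basis of R^3 (hypothetical example).
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image = [3.0, -2.0, 1.0]

coeffs, support, residual = nmp(image, atoms, max_atoms=3)
```

In this toy case the second atom is never selected, because its correlation with the residual is negative; standard MP would instead pick it at the second iteration since it has the largest correlation in absolute value.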
Note that the complexity of NMP is governed by the selection step, hence operations need to be performed. Besides, the complexity of solving