I Introduction
This paper investigates the implications of low-resolution image registration for registration at higher resolutions. We focus on rigid intensity-based registration with correlation as the fitness measure. We show that the fitness computed for a lower-resolution image pair puts a bound on the fitness for the higher-resolution images. Our results can be exploited for the design and analysis of multiresolution registration techniques, especially global optimization approaches.
Most approaches to solving the registration problem with global optimality guarantees deal with the alignment of point sets [1, 2, 3, 4, 5] or other geometric properties [6]. For example, Li and Hartley [2] propose an algorithm to register two sets of points in a globally optimal way based on Lipschitz optimization [7]. Yang et al. [3] obtain a globally optimal solution by combining a branch-and-bound scheme with the Iterative Closest Point algorithm.
Such approaches have seen less use in intensity-based registration, due to the high computational cost. Nonetheless, in some applications it is worthwhile to sacrifice efficiency to achieve more accurate results. A notable example is the construction of Shape Models [8], where the models are trained once and for all. In such a context, Cremers et al. [9] use Lipschitz optimization to search for optimal rotation and translation parameters for the alignment of 2D shape priors. Still, the computational burden remains the major limiting factor of such algorithms. It is well known that multiresolution techniques can help speed up registration algorithms [10, 11, 12]. However, to use a multiresolution scheme in a global optimization framework we need a theory that describes the relation among the target values at different resolutions. We provide such a theory in this paper, and based on it, propose an algorithm in which many candidate areas of the search space are rejected by examining the target value at lower resolutions.
This idea of early elimination has been well explored in the area of template matching, where a small template patch is searched for in a target image. The successive elimination approach [13] rules out many candidate patch locations by examining a low-cost lower bound on the cost function. The approach is extended in [14] to a multilevel algorithm, each level providing a tighter bound at a higher computational cost. A multitude of approaches have been proposed based on the same idea of a multilevel succession of bounds [15, 16, 17, 18, 19, 20, 21]. In [22] some of the most successful algorithms of this class have been compared. Among them, the work of Gharavi-Alkhansari [23] is most closely related to ours. In this approach each level corresponds to a different image resolution. By moving from low to high resolution we get tighter bounds requiring more computation. For most images, many of the candidate solutions are eliminated by computing the match measure at lower resolutions. However, in this method, like all the approaches above, the search is only performed on a grid of translation vectors with 1-pixel intervals. Most of the work extending this to more general transformations does not fully search the parameter space. We should mention, however, the work of Korman et al.
[24] for affine pattern matching. It considers a grid of affine transformations and makes use of sublinear approximations to reduce the cost of examining each transformation. It runs a branch-and-bound algorithm in which a finer grid is used at each stage. Obtaining a solution close enough to the optimal target is guaranteed with high probability. The bounds, however, are in the form of asymptotic growth rates, and are learned empirically from a large data set. The algorithm, therefore, is not provably globally optimal in the exact mathematical sense. Our work mainly deals with the alignment of two images rather than matching a small patch. Similar to [23]
we take a multiresolution approach by providing bounds between different resolutions. However, since rotation is involved and subpixel accuracy is also required, the registration problem is analyzed in the continuous domain, by interpolating the discrete images. We incorporate the multiresolution paradigm into a Lipschitz optimization framework which searches the entire rigid motion space for a globally optimal solution. We show that the multiresolution approach can lead to substantial efficiency gains. Another closely related problem is estimating reflective symmetry in 2D and 3D images [25]. We demonstrate that our algorithm can readily be applied to this problem as well. In short, the major contributions of this paper are
Providing a framework to analyze the relation among registration targets at different resolutions,

Presenting inter-resolution bounds for several scenarios, such as the use of different interpolation techniques or the case where only one image is decimated, and giving insights into how tighter bounds could be found,

Proposing a novel algorithm to integrate the multiresolution paradigm into a Lipschitz global optimization framework, resulting in considerable savings in computation.
We also introduce new effective bounds for the Lipschitz constants which are much smaller than those proposed in the literature. We should mention, however, that our main concern here is to demonstrate the efficiency gains of a multiresolution approach, not to optimize the single-resolution algorithm itself. After providing background in Sect. II, we introduce the inter-resolution bounds in Sect. III. A basic grid search algorithm is presented in Sect. IV, to better convey the concept of the multiresolution search, before the multiresolution Lipschitz algorithm is presented in Sect. V. We conduct experiments in Sect. VI to evaluate our algorithm on 2D and 3D registration examples, as well as on finding the axis or plane of symmetry in 2D and 3D images.
II Background
Consider two images I_1 and I_2 represented as continuous functions I_1, I_2 : R^d -> R, where d = 2 or d = 3. For intensity-based rigid registration we seek to maximize the correlation function

f(R, t) = ∫ I_1(x) I_2(Rx + t) dx   (1)

over R and t, the rotation matrix and the translation vector (here, all integrals are over R^d unless otherwise specified). This is equivalent to minimizing the integrated squared error ∫ (I_1(x) − I_2(Rx + t))^2 dx, as the transformation is rigid.
The fitness function (1) can be reformulated in the frequency domain. Let Î_1 and Î_2 respectively represent the Fourier transforms of I_1 and I_2, and notice that the Fourier transform of x ↦ I_2(Rx + t) is equal to ω ↦ e^{i (Rω)ᵀ t} Î_2(Rω). Applying Parseval's Theorem to (1) gives

f(R, t) = (2π)^{−d} ∫ Î_1(ω) [e^{i (Rω)ᵀ t} Î_2(Rω)]* dω   (2)

where z* represents the complex conjugate of z.
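The discrete analogue of this Parseval identity is easy to check numerically. The following sketch (ours, not code from the paper) verifies that the spatial inner product of two real arrays matches the DFT inner product up to the usual 1/N factor:

```python
import numpy as np

rng = np.random.default_rng(0)
I1 = rng.standard_normal((8, 8))
I2 = rng.standard_normal((8, 8))

# Correlation computed directly in the spatial domain
spatial = float(np.sum(I1 * I2))

# The same value via the DFT: sum(x*y) = Re(sum(X * conj(Y))) / N
F1, F2 = np.fft.fft2(I1), np.fft.fft2(I2)
freq = float(np.real(np.sum(F1 * np.conj(F2))) / I1.size)
```

The 1/N factor is specific to the unnormalized DFT convention used by `numpy.fft`; other conventions move this factor around, which is the "known global scaling factor" caveat of Proposition 1 below.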
II-A Discrete images
For discrete images we only have samples I[k] at discrete locations k ∈ Z^d (d = 2 or 3). One can think of I[k] as samples taken from a continuous image Ī according to I[k] = Ī(Tk), where the scalar T is the sampling period (we have assumed that the discrete and continuous coordinates share the same origin; more generally, we can write I[k] = Ī(Tk + x_0), where x_0 is the displacement of the origin). To get the continuous image from the discrete image we suppose that the continuous image is band-limited to the corresponding Nyquist frequency π/T. This means that the continuous image can be obtained by sinc interpolation:

Ī(x) = Σ_{k ∈ Z^d} I[k] sinc(x/T − k)   (3)

where sinc(t) = sin(πt)/(πt) for scalars, and

sinc(x) = Π_{i=1}^{d} sinc(x_i)   (4)

for vectors x = (x_1, …, x_d). Digital images are usually defined on a rectangular grid of pixels P ⊂ Z^d. For consistency with (3), we extend them by setting I[k] = 0 outside the image boundaries, that is, for k ∈ Z^d ∖ P. Thus, the summation in (3) can be over P rather than Z^d. Notice that the continuous image created by (3) can be nonzero outside the image boundaries at noninteger coordinates.
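As an illustration, here is a minimal 1-D sketch of the interpolation formula (3); the multidimensional kernel (4) is just the per-coordinate product, and the function name `sinc_interp` is ours:

```python
import numpy as np

def sinc_interp(samples, x, T=1.0):
    """Whittaker-Shannon interpolation (3) for a 1-D signal: evaluate
    the band-limited interpolant at continuous coordinates x, with
    samples treated as zero outside the array (the zero-extension
    discussed in the text)."""
    k = np.arange(len(samples))
    # np.sinc(t) = sin(pi*t)/(pi*t), matching the scalar kernel above
    return np.sinc(np.asarray(x)[:, None] / T - k[None, :]) @ samples

# The interpolant reproduces the samples exactly at the grid points,
# since sinc(0) = 1 and sinc(n) = 0 for nonzero integers n
s = np.array([0.0, 1.0, 0.5, -0.2])
on_grid = sinc_interp(s, np.array([0.0, 1.0, 2.0, 3.0]))
```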
The following proposition is useful for numerical integration in the frequency domain, and can be directly verified from the definition of the Discrete Fourier Transform (DFT).
Proposition 1.
Consider a discrete image I defined over a square grid of pixels with n pixels along each dimension. If the continuous image Ī is obtained using (3), the corresponding DFT values are equal to the samples of Î, the Fourier transform of Ī, taken at 2π/(nT) intervals (the equality might be up to a known global scaling factor, depending on the convention used for defining the DFT).
Other interpolation techniques
To accurately perform sinc interpolation one needs to consider a large neighbourhood of pixels around any given point. Therefore, in practice, kernels with bounded support are used instead of the sinc function. A large class of interpolation techniques can be formulated as

Ī(x) = Σ_{k ∈ P} I[k] K(x/T − k)   (5)

where K is the interpolation kernel, which is typically a low-pass filter. To have a consistent interpolation, K needs to be equal to 1 at the origin, and equal to zero at all other discrete sites with integer coordinates. Two examples of bounded-support kernels are:
Nearest neighbor interpolation

In this technique each point in the continuous domain gets the value of the closest discrete pixel. This can be obtained by using the box kernel

K_b(x) = 1 if ‖x‖_∞ ≤ 1/2, and K_b(x) = 0 otherwise.   (6)
Bilinear interpolation

In bilinear interpolation (trilinear for 3D) the kernel K is the product of triangular kernels in each dimension. The kernel has bounded support, and can also be represented as the convolution of a box kernel with itself:

K_t(x) = Π_{i=1}^{d} max(0, 1 − |x_i|) = (K_b ∗ K_b)(x).   (7)
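Both bounded-support kernels are a few lines of code. This sketch (with our naming) evaluates them on batches of d-dimensional points; the consistency conditions, value 1 at the origin and 0 at the other integer sites, are easy to check:

```python
import numpy as np

def box_kernel(x):
    """Nearest-neighbour kernel of (6): 1 inside the unit cell
    around the origin, 0 elsewhere (d-dimensional, separable).
    Points on the cell boundary are treated as outside."""
    x = np.atleast_2d(x)
    return np.all(np.abs(x) < 0.5, axis=-1).astype(float)

def tri_kernel(x):
    """Triangular kernel of (7): per-dimension product of triangles,
    equivalently the box kernel convolved with itself."""
    x = np.atleast_2d(x)
    return np.prod(np.maximum(0.0, 1.0 - np.abs(x)), axis=-1)
```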
II-B Computation of the target function
Assume that I_1 and I_2 are defined on a grid of pixels P. By substituting the interpolation formula (5) for I_1 and I_2 in (1), we can write the target function corresponding to the discrete image pair I_1 and I_2 as follows:

f(R, t) = Σ_{k ∈ P} Σ_{j ∈ P} I_1[k] I_2[j] ∫ K(x/T − k) K((Rx + t)/T − j) dx   (8)

= T^d Σ_{k ∈ P} Σ_{j ∈ P} I_1[k] I_2[j] w(R, Rk + τ − j)   (9)

where

w(R, v) = ∫ K(u) K(Ru + v) du.   (10)

Notice that τ = t/T in (9) is the translation in pixel coordinates. Now look at Rk + τ − j. This is equivalent to rigidly transforming the pixel position k according to R and τ, and then taking the difference with the pixel position j. The weight given to each pair of pixel values I_1[k] and I_2[j] is equal to w(R, Rk + τ − j). The value of w(R, v) is expected to decay as the vector v grows in size. If K has bounded support like (6) or (7), then w will have bounded support too. In other cases w(R, v) is negligible for large enough vectors v. Therefore, for each k, we can sum over pixels j within a certain neighbourhood of Rk + τ. This means that computing the target (9) needs O(N) rather than O(N²) computation, where N is the number of pixels.
An essential part of computing (9) is finding the weights w(R, v). For the nearest-neighbour kernel (6), the integral (10) is simply the intersection area of two squares. As for the sinc kernel, a formula for (10) can be derived in the frequency domain using Parseval's theorem. Nevertheless, even if large neighbourhoods are avoided by using bounded-support kernels, the computation of w itself is still costly. One solution is to precompute w on a grid of R and v values to look up when necessary. Most registration algorithms, however, consider a simple form of weights which does not depend on R once v is known. Basically, what they do is discretize the correlation integral. Let us rewrite (8) as

f(R, t) = Σ_{k ∈ P} I_1[k] ∫ K(x/T − k) Ī_2(Rx + t) dx.   (11)

Now, we discretize the above integral at x = Tk for all k ∈ Z^d:

f(R, t) ≈ T^d Σ_{k ∈ P} I_1[k] Ī_2(T(Rk + τ)) = T^d Σ_{k ∈ P} Σ_{j ∈ P} I_1[k] I_2[j] K(Rk + τ − j).   (12)

Here, the weights are simply the kernel values K(Rk + τ − j), which do not depend on R if v = Rk + τ − j is known. Another observation is that (12) is independent of what kernel is used for interpolating I_1, in the case where I_1 and I_2 are interpolated using two different kernels. For nearest-neighbour interpolation, Ī_2(T(Rk + τ)) is simply the intensity value of the closest pixel to Rk + τ. For sinc interpolation, this value may be approximated efficiently by looking it up in an upsampled version of I_2. In our experiments we observed that the discretized computation of the correlation integral does not cause major problems, except for extremely decimated images or when very high accuracy is expected (e.g. less than 0.2 pixels for translation). However, one ideally needs to further consider the discretization error.
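The discretized target (12) can be sketched in a few lines for 2-D images, assuming T = 1 and nearest-neighbour lookup of the second image, which is treated as zero outside its boundaries (the function name is ours, and this is an illustrative sketch rather than the paper's implementation):

```python
import numpy as np

def discretized_correlation(I1, I2, R, tau):
    """Discretised correlation target in the spirit of (12) for 2-D
    images with T = 1: sum over pixels k of I1[k] * I2(R k + tau),
    with nearest-neighbour interpolation of I2 and zero outside its
    boundaries. R is a 2x2 rotation matrix, tau a 2-vector in pixels."""
    H, W = I1.shape
    ky, kx = np.mgrid[0:H, 0:W]
    k = np.stack([ky.ravel(), kx.ravel()], axis=1).astype(float)
    q = k @ R.T + tau                 # transformed pixel positions R k + tau
    qi = np.rint(q).astype(int)       # nearest-neighbour lookup of I2
    ok = ((qi[:, 0] >= 0) & (qi[:, 0] < I2.shape[0]) &
          (qi[:, 1] >= 0) & (qi[:, 1] < I2.shape[1]))
    vals = np.zeros(len(qi))
    vals[ok] = I2[qi[ok, 0], qi[ok, 1]]
    return float(I1.ravel() @ vals)

# Sanity check: under the identity transformation the target is
# simply the inner product of the two images
I = np.arange(16, dtype=float).reshape(4, 4)
same = discretized_correlation(I, I, np.eye(2), np.zeros(2))
```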
III Implications of low-resolution registration
Assume that the discrete images I_1 and I_2 are decimated to low-resolution images I_1^l and I_2^l. One may ask, "What can the fitness (9) computed for the lower-resolution images say about the fitness for the higher-resolution images?" This is an important question, since the target function can be computed much faster at low resolution. If the original images are decimated by a factor of m, then computing the target function takes roughly m^d times less computation, with d = 2, 3 for 2D and 3D images respectively. This fact is clear from (9), where it is shown that the amount of computation is proportional to the number of pixels (in fact, the double summation in (9) implies that the amount of computation is proportional to the square of the number of pixels; but, in practice, we only consider pixels j within a fixed neighbourhood of each Rk + τ).
Decimation of a discrete image I may be carried out by

low-pass filtering the corresponding continuous image Ī obtained from (3), and

sampling the filtered image at a period of mT, where T is the sampling period of I.
The low-pass filter handles the aliasing distortion caused by downsampling, and thus must have a cutoff frequency of π/(mT) or less along every direction. To simplify the derivations, we consider a radial filter with the ideal frequency response

H_lp(ω) = 1 if ‖ω‖ ≤ π/(mT), and H_lp(ω) = 0 otherwise   (13)

where ‖ω‖ is the norm (length) of ω. The filtered image can be obtained as Ī^l = Ī ∗ h_lp, where h_lp is the impulse response of H_lp and "∗" is the convolution operator. This filter eliminates all the frequency components outside a ball of radius π/(mT) around the origin. Therefore, nothing is lost by sampling the filtered image at mT intervals to obtain I^l, the low-resolution image. We can also define a complementary high-pass filter

H_hp(ω) = 1 − H_lp(ω) for ‖ω‖ ≤ π/T.   (14)

The values of H_hp(ω) for ‖ω‖ > π/T do not matter, as H_hp is applied to images which are band-limited to π/T due to (3). One could filter Ī with H_hp to obtain Ī^h. Notice that Ī = Ī^l + Ī^h. To make a discrete image out of Ī^h we should sample at intervals T. Therefore, the discrete image I^h has the same size as I, while I^l has roughly m times fewer samples in every direction (we do not sample outside the boundaries of I for creating I^l and I^h, even though the samples might not be exactly zero after filtering; the loss is supposedly negligible given that the images have a dark, near-zero margin). The continuous images corresponding to I^l and I^h computed from (3) are exactly equal to Ī^l and Ī^h respectively. Note that to use (3) on I^l one must use mT instead of T.
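For discrete square images, the two decimation steps can be sketched by zeroing the DFT coefficients outside the radial cutoff and then subsampling. This is a simplified stand-in for the continuous-domain filtering described above (boundary effects and the continuous/discrete correspondence are glossed over; the function name is ours):

```python
import numpy as np

def decimate_radial(I, m):
    """Decimate a square image by a factor m in the spirit of the
    ideal radial low-pass filter (13): zero every DFT coefficient
    whose radial frequency exceeds pi/m (in rad/sample), invert,
    and keep every m-th sample in each dimension."""
    n = I.shape[0]
    F = np.fft.fft2(I)
    w = 2 * np.pi * np.fft.fftfreq(n)           # frequencies in rad/sample
    wy, wx = np.meshgrid(w, w, indexing="ij")
    F[np.hypot(wy, wx) > np.pi / m] = 0.0       # ideal radial low-pass
    low = np.real(np.fft.ifft2(F))
    return low[::m, ::m]
```

A constant image only has energy at the zero frequency, so it passes through the filter unchanged and simply shrinks by the factor m.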
Now, let us have a closer look at the continuous images Ī_1^l, Ī_1^h, Ī_2^l, and Ī_2^h. An important observation is that Ī_1^l and Ī_2^h(R · + t) are orthogonal for any choice of R and t. This is because Ī_1^l has no frequency component outside the ball of radius π/(mT), while Ī_2^h has no frequency components inside this ball. One way to see this is to write the inner product (correlation) between these two functions in the frequency domain using Parseval's theorem:

∫ Ī_1^l(x) Ī_2^h(Rx + t) dx = (2π)^{−d} ∫ Î_1^l(ω) [e^{i (Rω)ᵀ t} Î_2^h(Rω)]* dω   (15)

= 0,   (16)

since at every ω at least one of Î_1^l(ω) and Î_2^h(Rω) vanishes. Similarly, Ī_1^h and Ī_2^l(R · + t) are orthogonal. This implies

f(R, t) = f_l(R, t) + f_h(R, t)   (17)

where f_l is the target function computed for Ī_1^l and Ī_2^l, and f_h the one computed for Ī_1^h and Ī_2^h. Therefore, we have

|f(R, t) − f_l(R, t)| = |∫ Ī_1^h(x) Ī_2^h(Rx + t) dx|   (18)

≤ ‖Ī_1^h‖ ‖Ī_2^h‖   (19)

where ‖Ī‖ represents the L² norm of Ī. We obtain (19) using the Cauchy–Schwarz inequality followed by a change of variables for the right square root. It follows that

|f(R, t) − f_l(R, t)| ≤ ε   (20)

where ε = ‖Ī_1^h‖ ‖Ī_2^h‖. The idea here is that ε tends to be small, as for natural images the energy is mostly concentrated in the lower frequency bands.
Now, suppose that we want to find the maximum of the target function over a grid Θ of registration parameters θ = (R, t). Assume that we have computed the target function f_l for the lower-resolution images for all the grid values θ ∈ Θ. Denote respectively by θ_l* and θ_h* the maximizers of f_l and f over the grid, where f and θ_h* are yet unknown. Using (20) we get

f(θ_h*) ≥ f(θ_l*) ≥ f_l(θ_l*) − ε.   (21)

It means that we could rule out any θ for which f_l(θ) < f_l(θ_l*) − 2ε, since in that case

f(θ) ≤ f_l(θ) + ε < f_l(θ_l*) − ε ≤ f(θ_h*).   (22)

Fig. 1(c) illustrates this idea. To further limit the search space one can compute the high-resolution target function at θ_l* and discard all θ with f_l(θ) < f(θ_l*) − ε, as shown in Fig. 1(d). The example provided in Fig. 1 shows the effectiveness of the proposed bound even when the images are decimated by a factor of 16 in each dimension.
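A discrete sketch of this elimination rule, with a DFT-based stand-in for ε computed from the high-frequency energies of the two images (both function names are ours, and the discrete energy is only a proxy for the continuous L² norms above):

```python
import numpy as np

def highpass_bound(I1, I2, m):
    """Discrete stand-in for eps in (20): the product of the norms of
    the high-frequency parts of the two images (radial frequency
    above pi/m), computed from their DFTs via Parseval."""
    n = I1.shape[0]
    w = 2 * np.pi * np.fft.fftfreq(n)
    wy, wx = np.meshgrid(w, w, indexing="ij")
    high = np.hypot(wy, wx) > np.pi / m
    e1 = np.sum(np.abs(np.fft.fft2(I1))[high] ** 2) / I1.size
    e2 = np.sum(np.abs(np.fft.fft2(I2))[high] ** 2) / I2.size
    return float(np.sqrt(e1 * e2))

def keep_mask(f_low, best_high, eps):
    """Elimination rule: a candidate with f_low + eps < best_high can
    never beat the best full-resolution value found so far, since
    f <= f_low + eps by the inter-resolution bound."""
    return np.asarray(f_low) + eps >= best_high

# Toy usage: with eps = 0.5 and a best full-resolution value of 3.0,
# only candidates whose low-resolution fitness reaches 2.5 survive
keep = keep_mask(np.array([1.0, 3.0, 2.4]), 3.0, 0.5)
```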
[Figure 1: panels (a)–(d)]
III-A Bounded-support interpolation

The results in the previous section hold when the sinc kernel (4) is used to compute the target functions (9) or (12). In this case the kernel, and hence the weights w, do not have compact support, and for (20) to hold we need to consider a large neighbourhood of pixels around the location of each transformed pixel Rk + τ, which requires a lot of computation. We now consider computing the target function (9) with two kernels of bounded support, namely the box kernel (6) and the triangular kernel (7), which respectively correspond to the nearest-neighbour and bilinear (or trilinear) interpolation methods. Many other interpolation techniques can be treated in a similar manner.
Here, we assume that the low-resolution images are obtained, as before, by applying the ideal filter (13) to the continuous images Ī_1 and Ī_2 obtained by the sinc kernel using (3), and then resampling. However, for computing the target function using (9), we use a box or triangular kernel for both the low-resolution and high-resolution images. Now, let us see what happens in the frequency domain. The interpolation formula (5) can be written as the following convolution

Ī_K(x) = [K ∗ Σ_{k ∈ P} I[k] δ(· − k)](x/T)   (23)

where δ is the Dirac delta function. The Fourier transform of the kernel is K̂(ω) = Π_{i=1}^{d} sinc^s(ω_i/(2π)), with s = 1 for the box kernel (6) and s = 2 for the triangular kernel (7). The Fourier transform of (23) is then

Î_K(ω) = K̂(Tω) Σ_{n ∈ Z^d} Î(ω − 2πn/T).   (24)

Notice that Î is band-limited to π/T in every dimension. Therefore, the term Σ_{n} Î(ω − 2πn/T) is just the periodic repetition of Î with a period of 2π/T along every dimension. In the same way, for the low-resolution image we have

Î_K^l(ω) = K̂(mTω) Σ_{n ∈ Z^d} Î^l(ω − 2πn/(mT)).   (25)

Notice that Î^l is band-limited to π/(mT) along every dimension. Similarly, the transforms for the second image can be obtained.
Now, we would like to find a bound on the difference between the target function (1) calculated for the box- or triangle-interpolated high-resolution images and the one calculated for their low-resolution counterparts. Here, we present a simple way of doing this, and leave more elaborate bounds for future research. Define the energy of a function g within the frequency region Ω by

E_Ω(g) = (2π)^{−d} ∫_Ω |ĝ(ω)|² dω,   (26)

denote by B the ball of radius π/(mT) around the origin in the frequency domain, and let B̄ be its complement. Then we have the following proposition:
Proposition 2.
The absolute difference between the target functions is bounded from above by
(27)  
(28)  
(29) 
Notice that outside the frequency ball B one expects the low-resolution images to have low energy, and hence (29) is supposed to be small. As for (27) and (28), it is expected that the corresponding energy terms within B are small, as we will soon see. The proof is given in Appendix A. Here, we give the value of the above energy terms.
(30)  
(31)  
(32)  
(33) 
(34) 
where is the ball of radius , that is , and the function is defined on the ball of radius as
(35) 
For the 1D case () it has the following formula
(36) 
for , where is the polygamma function of order . For we define . Considering Proposition 1, the integrals (30)–(34) can be computed numerically using the DFT.
It is simple to derive (30) and (31) from (24) and (25) considering the fact that and are both equal to inside . Notice that (31) tends to be relatively small for natural images as for small sized the difference is small and for larger the frequency spectrum tends to be small. With a similar argument one can assert that the integrals (33) and (34) are small. The integral (32) is small as it is over . To obtain (32) and (33) observe that is equal to
(37) 
As for all , the integral over is equal to (32). For the integral over , first for a vector and a set define . Then, we have
(38) 
which is equal to (33) as . In a similar way (34) can be obtained; only the period is instead of . This means that, instead of , we must consider the ball of radius . Notice that in this case is zero, as is zero outside . All the integrals can be numerically computed using the DFT in the light of Proposition 1. Fig. 2(a–c) illustrates the effect of the bound (27)–(29) for the example of Fig. 1.
[Figure 2: (a) m = 16, (b) m = 8, (c) m = 4; (d) m = 16, (e) m = 8, (f) m = 4]
Finally, let us consider the case where the discretized integration (12) is used for computing the target function. As discussed earlier, this approximation does not depend on how I_1 is interpolated. Therefore, we may assume that I_1 is interpolated with a sinc kernel. Under these assumptions, one can show that the difference of the target functions is bounded from above by

(39)

where is the ball of radius (see Appendix B). The effect of using this bound is depicted in Fig. 2(d–f). The bound is generally smaller than (27)–(29). However, here the discretization error for computing the target integral has been neglected. This is evident from the slight fluctuations of the bounds in Fig. 2(d,e) for very low resolutions.
III-B Low-resolution to high-resolution registration
In this section we examine the situation where only one of the images is decimated. We study how this affects the amount of computation and the quality of the bounds. We also consider the case where the other image is upsampled.

Assume that I_1 is decimated by a factor of m to create the low-resolution image I_1^l. Using (5), a corresponding continuous image can be obtained as

Ī_1^l(x) = Σ_{k ∈ P_l} I_1^l[k] K(x/(mT) − k)   (40)

where P_l is the grid of low-resolution pixels. For the image I_2 we do not change the resolution. Now, there are two ways to compute the correlation target between Ī_1^l and Ī_2: the exact way, as in (9), and the approximate way, by discretizing the integral as in (12). We leave it to the reader to check that for computing the exact integral similar to (9), lowering the resolution of just one image does not significantly reduce the computation. This is because, while the number of pixels in the first image is reduced by a factor of m^d, the size of the neighbourhood around each transformed pixel required for the computation of w is increased by the same factor. However, if we discretize the integral as in (12), we get

f(R, t) ≈ (mT)^d Σ_{k ∈ P_l} I_1^l[k] Ī_2(mT Rk + t)   (41)

= (mT)^d Σ_{k ∈ P_l} Σ_{j ∈ P} I_1^l[k] I_2[j] K(mRk + τ − j).   (42)

The above shows that if K has compact support, then for every k we only need to consider those pixels j which are in the corresponding neighbourhood of mRk + τ. The size of this neighbourhood is the same as that of (12). Since the first sum is over the low-resolution pixel grid, which has about m^d times fewer pixels than P, we see that the amount of required computation when lowering the resolution of just one image is the same as when both images are decimated. This, of course, comes at the cost of having an approximate integral by discretization.
Now, let us see what happens to the bounds. If the interpolation is done using the sinc kernel, then the target function (41) is equal to f_l(R, t), and hence the bound (20) does not change. This is due to the fact that Ī_1^l and Ī_2^h are orthogonal, and thus the correlation between Ī_1^l and Ī_2 is the same as the correlation between Ī_1^l and Ī_2^l. However, for other interpolation techniques the bounds can be further tightened. In a similar way to Proposition 2 we can obtain

(43)

which is generally smaller than the bound in Proposition 2. If the discretized integral is used, then, similar to the way (39) is obtained, one can derive the following bound

(44)

which is obtained by a corresponding substitution in (39). Fig. 3(a) shows an example of applying this bound.
Now, let us see what happens if the second image is upsampled by a factor of p. We assume that the new samples are obtained by the natural sinc interpolation, so the corresponding continuous function remains the same. Similarly to the way we obtained (24), if the upsampled image is interpolated by the nearest-neighbour (s = 1) or the bilinear (s = 2) method, the corresponding Fourier transform will be

(45)

Notice that the summation term in (45) is the periodic repetition of the spectrum with period 2πp/T. Since the spectrum is band-limited to π/T in every dimension, this means that for p > 1 there are large empty regions in (45) in which it is zero. Now, if we derive (44) for the upsampled image instead of the original second image we get the following bound

(46)

Notice that for all ω we have

(47)

Therefore, inside the ball, the kernel factor approaches 1 as p gets larger. This means that as p increases, (46) gets closer to the original bound in (19). Nonetheless, one should bear in mind that here we have neglected the discretization error for computing the target integral (the reader might have noticed that the bound (46) is actually smaller than (19); this, however, does not make it a better bound, since the corresponding target functions are different). Fig. 3(b,c) shows examples of using (46) for p = 2 and p = 4. We have observed experimentally that upsampling by a factor larger than 2 does not help much with reducing the search space. We therefore use p = 2 in all our experiments.
[Figure 3: (a) m = 16, p = 1; (b) m = 16, p = 2; (c) m = 16, p = 4]
III-C Obtaining tighter bounds

III-C1 Partitioning into radial frequency bands
We can obtain slightly tighter bounds by dividing the high-frequency area into radial bands. From (18) we have

(48)

where the integration domain is the area outside the ball of radius π/(mT) around the origin. The above is obtained using Parseval's theorem, given the fact that Î_1^h = H_hp Î_1 and Î_2^h = H_hp Î_2, where the high-pass filter H_hp was defined in (14). Now, we partition this area into radial bands delimited by a sequence of increasing radii, as illustrated in Fig. 4. Then we have

(49)

where (49) is obtained by using the triangle inequality followed by the Cauchy–Schwarz inequality, and a change of variables in the second integral. Notice that ω belongs to a given radial band if and only if Rω does. The bound (49) can be computed numerically using the DFT (see Proposition 1). We have tighter bounds if more frequency bands are used. The extreme case is when each band only contains DFT samples with the same distance to the origin. On the other extreme, when a single band is used, (49) reduces to (19). Our experiments show that for natural image pairs with similar spectra, (49) is only slightly better than (19). For instance, in the example of Fig. 1, the bound (19) is equal to 372.6, while (49) is equal to 356.4 in the best case (the maximum number of bands). However, dividing the frequency domain into radial bands can be a useful trick in similar problems. We will use it for obtaining Lipschitz bounds in Sect. V-A.
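The banded bound can be sketched with the annuli built from DFT bins; with one band it collapses to the single Cauchy–Schwarz bound, and by Cauchy–Schwarz on the band sums the banded version is never larger. The function name and the uniform band spacing are our choices:

```python
import numpy as np

def radial_band_bound(I1, I2, m, n_bands):
    """Sketch of the banded bound (49): partition the high-frequency
    region (radial frequency above pi/m, in rad/sample) into n_bands
    annuli, and sum, band by band, the square roots of the products
    of the two images' spectral energies. n_bands = 1 recovers the
    single-term bound (19)."""
    n = I1.shape[0]
    F1 = np.fft.fft2(I1) / n            # Parseval normalisation
    F2 = np.fft.fft2(I2) / n
    w = 2 * np.pi * np.fft.fftfreq(n)
    wy, wx = np.meshgrid(w, w, indexing="ij")
    r = np.hypot(wy, wx)
    edges = np.linspace(np.pi / m, r.max() + 1e-9, n_bands + 1)
    bound = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = (r > lo) & (r <= hi)     # one annulus of the partition
        bound += np.sqrt((np.abs(F1[band]) ** 2).sum() *
                         (np.abs(F2[band]) ** 2).sum())
    return float(bound)

# More bands can only tighten the bound (Cauchy-Schwarz on band sums)
rng = np.random.default_rng(1)
A = rng.standard_normal((16, 16))
B = rng.standard_normal((16, 16))
one_band = radial_band_bound(A, B, m=2, n_bands=1)
four_bands = radial_band_bound(A, B, m=2, n_bands=4)
```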
III-C2 Hölder's inequality

Instead of using the Cauchy–Schwarz inequality in the derivation of the bounds, one may use the more general Hölder's inequality, which asserts that ∫ |fg| ≤ ‖f‖_p ‖g‖_q for any p, q ≥ 1 such that 1/p + 1/q = 1. The bound can then be tightened by minimizing over p and q. This is especially useful when one of the images is considerably smaller or sparser than the other.
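Written out in full, the inequality in question is the standard one, with p = q = 2 recovering Cauchy–Schwarz:

```latex
\left|\int f(\omega)\, g(\omega)\, d\omega\right|
  \;\le\; \|f\|_p \,\|g\|_q
  \;=\; \left(\int |f(\omega)|^p\, d\omega\right)^{1/p}
        \left(\int |g(\omega)|^q\, d\omega\right)^{1/q},
  \qquad \frac{1}{p}+\frac{1}{q}=1,\quad p,q\ge 1.
```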
IV A basic algorithm
We showed that the search space for high-resolution images can be limited given the low-resolution target values. This inspires a multiresolution search strategy whereby the target values at each resolution level further restrict the search space for the next (higher) resolution level, as formalized in Algorithm 1. This is a very basic algorithm which searches a grid Θ of the registration parameters θ = (R, t). Here, we use l as an index representing the resolution level, as opposed to the previous section, where it was a symbol indicating low resolution. The resolution levels are l = 0, 1, …, L − 1, where l = 0 and l = L − 1 respectively correspond to the highest and lowest resolutions. The target function at each resolution level is represented by f_l; therefore, f_0 = f. At each resolution level, the algorithm finds the maximizer of the target function over the remaining grid (line 9), updates the so-far best target value (line 10), and discards some elements of the grid based on the so-far best value and the inter-resolution bounds (line 11); any of the bounds introduced in the previous section can be used here. The algorithm might not seem interesting from a practical perspective, as the target function has to be evaluated at every element of the grid, at least at the lowest resolution. Nonetheless, it demonstrates a strategy purely based on our inter-resolution bounds, and gives insights about how to reduce computation in globally optimal registration algorithms where the whole parameter space is searched, which is the subject of the next section.
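The strategy can be sketched as follows; this is our pseudocode-level Python rendering rather than Algorithm 1 verbatim, assuming a callable f(l, theta) for the target at level l and per-level bounds eps[l] on |f(0, ·) − f(l, ·)|:

```python
import numpy as np

def multires_grid_search(grid, f, L, eps):
    """Sketch of a basic multiresolution grid search. f(l, theta)
    evaluates the target at resolution level l (l = 0 is the full
    resolution) and eps[l] bounds |f(0, th) - f(l, th)| for all th.
    The candidate set shrinks as coarser levels rule grid points out,
    so f(0, .) is typically evaluated at only a few points."""
    cand = list(grid)
    best_theta, best_val = None, -np.inf
    for l in range(L - 1, 0, -1):               # coarsest level first
        vals = {th: f(l, th) for th in cand}
        top = max(cand, key=vals.get)
        v0 = f(0, top)                          # probe the level-l maximizer at full resolution
        if v0 > best_val:
            best_theta, best_val = top, v0
        # f(0, th) <= f(l, th) + eps[l]: such candidates are hopeless
        cand = [th for th in cand if vals[th] + eps[l] >= best_val]
    for th in cand:                             # finish exhaustively at full resolution
        v0 = f(0, th)
        if v0 > best_val:
            best_theta, best_val = th, v0
    return best_theta, best_val

# Toy check: a concave target with a small, bounded per-level perturbation
def f(l, th):
    return -(th - 3.0) ** 2 + 10.0 + 0.1 * l * np.sin(th)

eps = {l: 0.1 * l for l in range(4)}            # |f(0,.) - f(l,.)| <= 0.1*l
theta_star, val_star = multires_grid_search(range(10), f, 4, eps)
```

Because the pruning condition only removes candidates that provably cannot beat the best value found so far, the returned maximizer is exact over the grid, which is the property the inter-resolution bounds guarantee.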