Multiresolution Search of the Rigid Motion Space for Intensity Based Registration

We study the relation between the target functions of low-resolution and high-resolution intensity-based registration for the class of rigid transformations. Our results show that low resolution target values can tightly bound the high-resolution target function in natural images. This can help with analyzing and better understanding the process of multiresolution image registration. It also gives a guideline for designing multiresolution algorithms in which the search space in higher resolution registration is restricted given the fitness values for lower resolution image pairs. To demonstrate this, we incorporate our multiresolution technique into a Lipschitz global optimization framework. We show that using the multiresolution scheme can result in large gains in the efficiency of such algorithms. The method is evaluated by applying to 2D and 3D registration problems as well as the detection of reflective symmetry in 2D and 3D images.

READ FULL TEXT VIEW PDF

page 5

page 8

page 12

10/10/2019

Breathing deformation model – application to multi-resolution abdominal MRI

Dynamic MRI is a technique of acquiring a series of images continuously ...
07/13/2018

Performance of Image Registration Tools on High-Resolution 3D Brain Images

Recent progress in tissue clearing has allowed for the imaging of entire...
07/29/2018

Towards ultra-high resolution 3D reconstruction of a whole rat brain from 3D-PLI data

3D reconstruction of the fiber connectivity of the rat brain at microsco...
12/30/2017

A Real-time and Registration-free Framework for Dynamic Shape Instantiation

Real-time 3D navigation during minimally invasive procedures is an essen...
11/10/2015

Experimental robustness of Fourier Ptychography phase retrieval algorithms

Fourier ptychography is a new computational microscopy technique that pr...
05/20/2016

TRIM: Triangulating Images for Efficient Registration

With the advancement in the digital camera technology, the use of high r...
10/04/2020

Multi-Level Evolution Strategies for High-Resolution Black-Box Control

This paper introduces a multi-level (m-lev) mechanism into Evolution Str...

I Introduction

This paper investigates the implications of low-resolution image registration on the registration in higher resolutions. We focus on rigid intensity-based registration with correlation as fitness measure. We show that the fitness computed for lower resolution image pairs puts a bound on the fitness for higher resolution images. Our results can be exploited for the design and analysis of multiresolution registration techniques, especially the global optimization approaches.

Most of the approaches to globally optimally solving the registration problem deal with the alignment of point sets [1, 2, 3, 4, 5] or other geometric properties [6]. For example, Li and Hartley in [2] propose an algorithm to register two sets of points in a globally optimal way based on Lipschitz optimization [7]. Yang et al. [3] obtain a globally optimal solution by combing a branch and bound scheme with the Iterative Closest Point algorithm.

Such approaches have not been applied, as much, to intensity-based registration, due to the high computational cost. Nonetheless, in some applications it is worth to sacrifice efficiency to achieve more accurate results. A notable example is the construction of Shape Models [8] where the models are trained once and for all. In such a context, Cremers et al. [9] use Lipschitz optimization to search for optimal rotation and translation parameters for the alignment of 2D shape priors. After all, the computational burden is still the major limiting factor of such algorithms. It is well known that the multiresolution techniques can help with speeding up the registration algorithms [10, 11, 12]. However, to use a multiresolution scheme in a global optimization framework we need a theory which describes the relation among the target values of different resolutions. We provide such a theory in this paper, and based on that, propose an algorithm in which many candidate areas of the search space get rejected by examining the target value in lower resolutions.

This idea of early elimination has been well explored in the area of template matching, where a small template patch is searched in a target image. The successive elimination approach [13] rules out many candidate patch locations by examining a low-cost lower-bound on the cost function. The approach is extended in [14] to a multilevel algorithm, each level providing a tighter bound with higher computational cost. A multitude of approaches have been proposed based on the same idea of a multilevel succession of bounds [15, 16, 17, 18, 19, 20, 21]. In [22] some of the most successful algorithms of this class have been compared. Among them the work of Gharavi-Alkhansari [23]

is very related to ours. In this approach each level corresponds to a different image resolution. By moving from low to high resolution we get tighter bounds requiring more computations. For most images many of the candidate solutions are eliminated by computing the match measure in lower resolutions. However, in this method, like all the approaches above, the search is only performed on a grid of translation vectors with 1 pixel intervals. Most of the work to extend this to more general transformations do not fully search the parameter space. We should mention, however, the work of Korman et al.

[24]

for affine pattern matching. It considers a grid of affine transformations and makes use of

sublinear approximations to reduce the cost of examining each transformation. It runs a branch and bound algorithm in which a finer grid is used at each stage. Obtaining a solution close enough to the optimal target is guaranteed with

high probability

. The bounds, however, are in the form of asymptotic growth rates, and are learned empirically from a large data set. The algorithm, therefore, is not provably globally optimal in the exact mathematical sense.

Our work mainly deals with the alignment of two images rather than matching a small patch. Similar to [23]

we take a multiresolution approach by providing bounds between different resolutions. However, since rotation is involved and also sub-pixel accuracy is required, the registration problem is analyzed in the continuous domain, by interpolating the discrete images. We incorporate the multiresolution paradigm into a Lipschitz optimization framework which searches the entire rigid motion space for a globally optimal solution. We show that the multiresolution approach can lead to huge efficiency gains. Another closely related problem is estimating reflective symmetry in 2D and 3D images

[25]. We deomostrate that our algorithm can readily be applied to this problem as well. In short, the major contributions of this paper are

  • Providing a framework to analyze the relation among registration targets at different resolutions,

  • presenting inter-resolution bounds for several scenarios such as the use of different interpolation techniques or when only one image is decimated, and giving insights about how tighter bounds could be found,

  • A novel algorithm to integrate the multiresolution paradigm into a Lipschitz global optimization framework resulting in considerable savings in computations.

We also introduce new effective bounds for the Lipschitz constants which are much smaller than what proposed in the literature. We should mention, however, that our main concern here is to demonstrate the efficiency gains of a multiresolution approach, and not optimizing the single-resolution algorithm itself. After providing a background in Sect. II, we introduce the inter-resolution bounds in Sect. III. A basic grid search algorithm is presented in Sect. IV, for better understanding the concept of the multiresolution search, before presenting the multiresolution Lipschitz algorithm in Sect. V. We conduct experiments in Sect. VI to evaluate our algorithm on 2D and 3D registration examples, as well as, finding the axis or plane of symmetry in 2D and 3D images.

Ii Background

Consider two images and represented as continuous functions , where or . For intensity-based rigid registration we seek to maximize the correlation function

(1)

over and , the rotation matrix and the translation vector111Here, all integrals are over unless otherwise specified.. This is equivalent to minimizing the integrated squared error , as the transformation is rigid.

The fitness function (1) can be reformulated in the frequency domain. Let and

respectively represent the Fourier transforms of

and , and notice that the Fourier transform of is equal to . Applying the Parseval’s Theorem to (1) gives

(2)

where represents the complex conjugate of .

Ii-a Discrete images

For discrete images we only have samples at discrete locations ( or ). One could think of as samples taken from a continuous image according to , where the scalar is the sampling period222We have assumed that the discrete and continuous coordinates share the same origin. More generally, we can write , where is the displacement of the origin.. To get the continuous image from the discrete image we suppose that the continuous image is bandlimited to the corresponding Nyquist frequency . This means that the continuous image can be obtained by sinc interpolation:

(3)

where for scalars, and

(4)

for . Digital images are usually defined at a rectangular grid of pixels . For consistency with (3), we extend them by setting outside the image boundaries where . Thus, the summation in (3) can be over rather than . Notice that the continuous image created by (3) can be nonzero outside the image boundaries at non-integer coordinates.

The following proposition is useful for numerical integration in the frequency domain, and can be directly verified from the definition of the Discrete Fourier Transform (DFT).

Proposition 1.

Consider a discrete image defined over a square grid of pixels with pixels along each dimension. If the continuous image is obtained using (3), the corresponding DFT values are equal to the samples of , the Fourier transform of , taken at intervals333The equality might be up to a known global scaling factor depending on the convention used for defining DFT..

Other interpolation techniques

To accurately perform the sinc interpolation one needs to consider a large neighbourhood of pixels at any certain point. Therefore, in practice, kernels with a bounded support are used instead of the sinc function. A large class of interpolation techniques can be formulated as:

(5)

where is the interpolation kernel, which is typically a low-pass filter. To have a consistent interpolation, needs to be equal to 1 at the origin, and equal to zero at all other discrete sites with integer coordinates. Two examples of a bounded support kernel are:

Nearest neighbor interpolation

In this technique each point in the continuous domain gets the value of the closest discrete pixel. This can be obtained by using the box kernel

(6)
Bilinear interpolation

In bilinear interpolation (trilinear for 3D) is the product of triangular kernels in each dimension. The kernel has a bounded support, and can also be represented as the convolution of a box kernel with itself:

(7)

Ii-B Computation of the target function

Assume that and are defined on a grid of pixels . By substituting (and ) in (5) for and in (1), we can write the target function corresponding to the discrete image pair and as follows

(8)
(9)

where

(10)

Notice that in (9) is the translation in the pixel coordinates. Now look at . This is equivalent to rigidly transforming according to and , and then taking the difference with the pixel position . The weight given to each pair of pixel values and is equal to . The value of is expected to decay as the vector grows in size. If has a bounded support like (6) or (7), then will have a bounded support too. In other cases is negligible for large enough vectors . Therefore, for each , we can sum over pixels within a certain neighbourhood of . This means that computing the target (9) needs rather than computation, where is the number of pixels.

An essential part in computing (9) is to find the weights . For the nearest neighbour kernel (6), the integral (10) is simply the intersection area of two squares. As for the sinc kernel, a formula for (10) can be calculated in the frequency domain using the Parseval’s theorem. Nevertheless, even if large neighbourhoods are avoided by using bounded support kernels, the computation of itself is still costly. One solution is to precompute on a grid of and values to look up when necessary. Most registration algorithms, however, consider a simple form for which does not depend on once is known. Basically, what they do is discretizing the correlation integral. Let us rewrite (8) as

(11)

Now, we discretize the above integral at for all :

(12)

Here, the weights are simply the kernel values which do not depend on if is known. Another observation is that (12) is independent of what kernel is used for interpolating in the case where and are interpolated using two different kernels. For the nearest neighbour interpolation, is simply the intensity value of the closest pixel to . For the sinc interpolation, this value may be approximated efficiently by looking up in an upsampled version of . In our experiments we observed that discretized computation of the correlation integral does not cause major problems, unless for extremely decimated images or when a very high accuracy is expected (e.g. less than .2 pixels for translation). However, one ideally needs to further consider the discretization error.

Iii Implications of low resolution registration

Assume that the discrete images and are decimated to low-resolution images and . One may ask the question “What can the fitness (9) computed for lower resolution images say about the fitness for higher resolution images?”. This is an important question since the target function can be computed much faster in low resolution. If the original images are decimated by a factor of , then computing the target function takes less computations, where for 2D and 3D images respectively. This fact is clear from (9), where it is shown that the amount of computations is proportional to the number of pixels444In fact, the double summation in (9) implies that the amount of computations is proportional to the square of the number of pixels. But, in practice, we only consider -s within a fixed neighbourhood of each ..

Decimation of a discrete image may be carried out by

  1. low-pass filtering the corresponding continuous image obtained from (3), and

  2. sampling the filtered images at a period of , where is the sampling period of .

The low-pass filter handles the aliasing distortion caused by downsampling, and thus, must have a cutoff frequency of or less along every direction. To simplify the derivations, we consider a radial filter with the ideal frequency response

(13)

where is the -norm (length) of . The filtered image can be obtained as where is the impulse response of and “*” is the convolution operator. This filter eliminates all the frequency components outside a ball of radius around the origin. Therefore, nothing is lost by sampling the filtered image at intervals to obtain , the low resolution image. We can also define a complementary high-pass filter

(14)

The values of for do not matter, as is applied to which are bandlimited to due to (3). One could filter with to obtain . Notice that . To make a discrete image out of we should sample at intervals . Therefore, the discrete image has the same size as while has roughly less samples in every direction555We do not sample outside the boundaries of for creating and , even though the samples might not be exactly zero after filtering. The loss is supposedly negligible given that the images have a dark (near zero) margin.. The corresponding continuous images of and computed from (3) are exactly equal to and respectively. Note that to use (3) on one must use instead of .

Now, let us have a closer look at the continuous images , , and . An important observation is that and are orthogonal for any choice of and . This is because has no frequency component outside the ball of radius , while has no frequency components inside this ball. One way to see this is to write the inner product (correlation) between these two functions in the frequency domain using the Parseval’s theorem:

(15)
(16)

Similarly, and are orthogonal. This implies

(17)

where is the target function computed for and . Therefore, we have

(18)
(19)

where represents the -norm of . We obtain (19) using the Cauchy-Schwarz inequality followed by a change of variables for the right square root. It follows

(20)

where . The idea here is that tends to be small as for natural images the energy is mostly concentrated in the lower frequency bands.

Now, suppose that we want to find the maximum of the target function over a grid of registration parameters. Assume that we have computed the target function for the lower resolution images for all the grid values . Represent respectively by and the maximizers of and over the grid, where are yet unknown. Using (20) we get

(21)

It means that we could rule out any for which , since in that case

(22)

Fig. 1(c) illustrates this idea. To further limit the search space one can compute the high resolution target function at and discard all with , as shown in Fig. 1(d). The example provided in Fig. 1 shows the effectiveness of the proposed bound even when the images are decimated by a factor of 16 in each dimension.

(a) (b)
(c) (d)
Fig. 1: Bounds on high resolution registration fitness based on low resolution registration. (a) A starfish image (obtained from http://www.ck12.org) is chosen as the reference image , as it exhibits multiple local maxima, (b) is rotated by 45 degrees around its centre and then undergoes a slight affine warp to produce . The only registration parameter is the rotation angle . Both images are decimated by a factor of to create and (tiny images on the bottom-right of each image). (c) the lower and upper bounds (red dashed graphs) envelop the high resolution fitness (blue solid graph). The horizontal line shows the value . The red thick parts of this line show the reduced search space, where we have or . (d) to further restrict the search space the high resolution fitness is computed at . The horizontal line shows the value . The red thick parts of this line shows the reduced search space, namely where . Notice that decimating by a factor of 16 reduces the computations by a factor of 256 for 2D images. The amount of reduction in the search space provided by such a low resolution registration is notable.

Iii-a Bounded support interpolation

The results in the previous section hold when the sinc kernel (4) is used to compute the target functions (9) or (12). In this case, and do not have a compact support, and for (20) to hold we need a large neighbourhood of pixels around the location of each transformed pixel , which requires a lot of computations. We now consider computing the target function (9) with two kernels of bounded support, namely the box kernel (6) and the triangular kernel (7) which respectively correspond to nearest neighbour and bilinear (or trilinear) interpolation methods. Many other interpolation techniques can be treated in a similar manner.

Here, we assume that the low resolution images are obtained, as before, by performing the ideal filter (13) on the continuous images and obtained by the sinc kernel using (3), and then resampling. However, for computing the target function using (9), we use a box or triangular kernel for both low-resolution and high-resolution images. Now, let us see what happens in the frequency domain. The interpolation formula (5) can be written as the following convolution

(23)

where is the Dirac delta function. The Fourier transform of the kernel is with for the box kernel (6) and for the triangular kernel (7). The Fourier transform of (23) is then

(24)

Notice that is bandlimited to in every dimension. Therefore, the term is just the periodic repetition of with a period of along every dimension. In the same way, for the low-resolution image we have

(25)

Notice that is bandlimited to along every dimension. Similarly, and can be obtained.

Now, we like to find a bound on , where is the target function (1) calculated for and , and is the target for and . Here, we present a simple way of doing this, and leave more elaborate bounds for future research. Define the energy of a function within the frequency region by

(26)

denote by the ball of radius around the origin in the frequency domain, and let be its complement. Then we have the following proposition:

Proposition 2.

The absolute difference on the target functions is bounded from above by

(27)
(28)
(29)

Notice that outside the frequency ball one expects and to have low energy and hence (29) is supposed to be small. As for (27) and (28), it is expected that within , the energies and are small, as we will soon see. The proof is is given in Appendix A. Here, we give the value of the above energy terms calculated for function .

(30)
(31)
(32)
(33)
(34)

where is the -ball of radius , that is , and the function is defined on the -ball of radius as

(35)

For the 1D case () it has the following formula

(36)

for , where is the polygamma function of order . For we define . Considering Proposition 1 the integrals (30-34) can be computed numerically using DFT.

It is simple to derive (30) and (31) from (24) and (25) considering the fact that and are both equal to inside . Notice that (31) tends to be relatively small for natural images as for small sized the difference is small and for larger the frequency spectrum tends to be small. With a similar argument one can assert that the integrals (33) and (34) are small. The integral (32) is small as it is over . To obtain (32) and (33) observe that is equal to

(37)

As for all , the integral over is equal to (32). For the integral over , first for a vector and a set define . Then, we have

(38)

which is equal to (33) as . In a similar way (34) can be obtained; only the period is instead of . This means that, instead of , we must consider the ball of radius . Notice that in this case is zero, as is zero outside . All the integrals can be numerically computed using DFT in the light of Proposition 1. Fig. 2(a-c) illustrates the effect of the bound (27-29) for the example of Fig. 1.

(a) m = 16 (b) m = 8 (c) m = 4
(d) m = 16 (e) m = 8 (f) m = 4
Fig. 2: The bounds obtained for the nearest neighbour interpolation for the registration problem given in Fig. 1. Images are decimated by a factor of (a) 16, (b) 8 and (c) 4. The search space reduction approach is the same as that of Fig. 1(d). The bounds are not as effective as that of the sinc interpolation, but still are useful. (d,e,f) the same figures, but this time discretized integration is used instead of exact integration to compute the target function, and (39) is used for the bounds. Notice that the discretized integration has resulted in fluctuations in the bounds which increase as the decimation rate gets larger.

Finally, let us consider the case where the discretized integration (12) is used for computing the target function. As discussed earlier, this approximation does not depend on how is interpolated. Therefore, we may assume that is interpolated with a sinc kernel, that is and . Under these assumptions, one can show that is bounded from above by

(39)

where is the ball of radius (see Appendix B). The effect of using this bound is depicted in Fig. (2)(d,e,f). The bound is generally smaller than (27-29). However, here the discretization error for computing the target integral has been neglected. This is evident from the slight fluctuations in the bounds in Fig. (2)(d,e) for very low resolutions.

Iii-B Low-resolution to high-resolution registration

In this section we examine the situation where only one of the images is decimated. We study how this affects the amount of computations and the quality of the bounds. We also consider the case where the other image is upsampled.

Assume that is decimated by a factor of to create the low-resolution image . Using (5) a corresponding continuous image can be obtained as

(40)

where is the grid of low-resolution pixels. For the image we do not change the resolution, and thus, we have . Now, there are two ways to compute the correlation target between and : the exact way like (9) and the approximate way by discretizing the integral like (12). We leave it to the reader to check that for computing the exact integral similar to (9), lowering the resolution of just one image does not significantly reduce the computations. This is because while the number of pixels in the first image is reduced by a factor of , the size of the neighbourhood around each transformed pixel required for the computation of is increased by the same factor. However, if we discretize the integral, like in (12) we get

(41)
(42)

The above shows that if has a compact support then for every we only need to consider those pixels which are in the corresponding neighbourhood of . The size of this neighbourhood is the same as that of (12). Since the first sum is over the low-resolution pixel grid which has about times less pixels than , we see that the amount of required computation when lowering the resolution of just one image is the same as when both images are decimated. This, of course, comes at the cost of having an approximate integral by discretization.

Now, let us see what happens to the bounds. If the interpolation is done using the sinc kernel (, and thus and ), then the target function (41) is equal to , and hence, the bound (20) does not change. This is due to the fact that and are orthogonal, and thus, the correlation between and is the same as the correlation between and . However, for other interpolation techniques the bounds can be further tightened. In a similar way to Proposition 2 we can obtain

(43)

which is generally smaller than the bound in Proposition 2. If the discretized integral is used, similar to the way (39) is obtained, one can obtain the following bound

(44)

which is obtained by replacing with in (39). Fig. 3(a) shows an example of applying this bound.

Now, let us see what happens if the second image is upsampled by a factor of to obtain . We assume that the new samples are obtained by the natural sinc interpolation. Therefore, the corresponding continuous function remains the same, which means . Similarly to the way we obtained (24), if is interpolated by the nearest neighbour () or the bilinear () method to obtain the interpolated image , the corresponding Fourier transform will be

(45)

Notice that is the periodic repetition of with period . Since is bandlimited to in every dimension, this means that for there exist a lot of empty spaces in the spectrum (45) in which . Specially, is zero when . Thus, inside , where is the -ball of radius , and . Now, if we derive (44) for instead of we will get the following bound

(46)

Notice that for all we have

(47)

Therefore, inside , approaches to as gets larger. This means that as increases, (46) gets closer to the original bound in (19), since . Nonetheless, one should bear in mind that here we have neglected the discretization error for computing the target integral666The reader might have notice that the bound (46) is actually smaller than (19). This, however, this does not make it a better bound, since the corresponding target functions are different.. Fig. 3(b,c) shows examples of using (46) for and . We have observed experimentally that upsampling by a factor larger than 2 does not help much with reducing the search space. We, therefore, use in all our experiments.

(a) m = 16, p = 1 (b) m = 16, p = 2 (c) m = 16, p = 4
Fig. 3: Bounds for low-resolution to high-resolution registration with nearest neighbour interpolation and discretized computation of the target integral for the registration problem of Fig. 1. In all cases, the image is decimated by a factor of , and has been upsampled by a factor of (a) , (b) and (c) . Changing from 2 to 4 has not made a noticeable difference.

Iii-C Obtaining Tighter bounds

Iii-C1 Partitioning to radial frequency bands

We can obtain slightly tighter bounds by dividing the high-frequency area into radial bands. From (18) we have

(48)

where is the area outside the ball of radius around the origin. The above is obtained using the Parseval’s theorem given the fact that and where the high-pass filter was defined in (14). Now, we partition to radial bands as illustrated in Fig. 4. More precisely, for radii . Then we have

(49)

where (49) is obtained by using the triangle inequality followed by the Cauchy-Schwarz inequality, and a change of variables in the second integral. Notice that if and only if . The bound (49) can be computed numerically using DFT (see Proposition 1). We have tighter bounds if more frequency bands are used. The extreme case is when each band only contains DFT samples with the same distance to the origin. On the other extreme, when , then (49) is reduced to (19). Our experiments show that for natural image pairs with similar spectra (49) is only slightly better than (19). For instance, in the example of Fig. 1, the bound (19) is equal to 372.6 while (49) is equal to 356.4 in the best case (maximum ). However, dividing the frequency domain into radial bands can be a useful trick in similar problems. We will use it for obtaining Lipschitz bounds in Sect. V-A.

Fig. 4: The frequency domain outside has been partitioned into nested radial frequency bands ().

Iii-C2 Hölder’s inequality

Instead of using the Cauchy-Schwarz inequality in the derivation of the bounds, one may use the more general Hölder’s inequality which asserts for any such that . This can be done by minimizing with respect to and . This is specially useful when of one the images is considerably smaller or sparser than the other.

Iv A basic algorithm

We showed that the search space for high-resolution images can be limited given the low-resolution target values. This inspires a multiresolution search strategy whereby the target values at each resolution level further restrict the search space for the next (higher) level of resolution, as formalized in Algorithm 1. This is a very basic algorithm which searches a grid of the registration parameters . Here, we use as an index to represent the resolution level, as opposed to the previous section, where was a symbol indicating low-resolution. The resolution levels are , where and respectively correspond to highest and lowest resolutions. The target function for each resolution is represented by . Therefore, . At each resolution level, the algorithm finds the maximizer of the target function over (line 9), updates , the so-far best target value (line 10), and discards some elements of based on and the inter-resolution bounds (line 11). Here, can be any of the bounds introduced in the previous section. The algorithm might not seem interesting from a practical perspective as the target function has to be evaluated at every element of the grid, at least at the lowest resolution. Nonetheless, it demonstrates a strategy purely based on our inter-resolution bounds, and gives insights about how to reduce computations in globally optimal registration algorithms where the whole parameter space is searched, which is the subject of the next section.

1:
2:: registration image pairs for each resolution