# Fundamental Limits in Multi-image Alignment

The performance of multi-image alignment, bringing different images into one coordinate system, is critical in many applications with varied signal-to-noise ratio (SNR) conditions. A great amount of effort is being invested into developing methods to solve this problem. Several important questions thus arise, including: Which are the fundamental limits in multi-image alignment performance? Does having access to more images improve the alignment? Theoretical bounds provide a fundamental benchmark to compare methods and can help establish whether improvements can be made. In this work, we tackle the problem of finding the performance limits in image registration when multiple shifted and noisy observations are available. We derive and analyze the Cramér-Rao and Ziv-Zakai lower bounds under different statistical models for the underlying image. The accuracy of the derived bounds is experimentally assessed through a comparison to the maximum likelihood estimator. We show the existence of different behavior zones depending on the difficulty level of the problem, given by the SNR conditions of the input images. We find that increasing the number of images is only useful below a certain SNR threshold, above which the pairwise MLE estimation proves to be optimal. The analysis we present here brings further insight into the fundamental limitations of the multi-image alignment problem.

## Authors

• 3 publications
• 17 publications
• 4 publications
• 72 publications
• ### Rank-one Multi-Reference Factor Analysis

In recent years, there is a growing need for processing methods aimed at...
05/29/2019 ∙ by Yariv Aizenbud, et al. ∙ 0

• ### Analytical Comparison of Noise Reduction Filters for Image Restoration Using SNR Estimation

Noise removal from images is a part of image restoration in which we try...
12/02/2014 ∙ by Poorna Banerjee Dasgupta, et al. ∙ 0

• ### An integration of fast alignment and maximum-likelihood methods for electron subtomogram averaging and classification

Motivation: Cellular Electron CryoTomography (CECT) is an emerging 3D im...
04/04/2018 ∙ by Yixiu Zhao, et al. ∙ 0

• ### Tensor SVD: Statistical and Computational Limits

In this paper, we propose a general framework for tensor singular value ...
03/08/2017 ∙ by Anru Zhang, et al. ∙ 0

• ### Joint Data-Aided Carrier Frequency Offset, Phase Offset, Amplitude and SNR Estimation for Millimeter-Wave MIMO Systems

This work is devoted to solve the problem of estimating the carrier freq...
12/16/2017 ∙ by Javier Rodríguez-Fernández, et al. ∙ 0

• ### Super-resolution multi-reference alignment

We study super-resolution multi-reference alignment, the problem of esti...
06/27/2020 ∙ by Tamir Bendory, et al. ∙ 0

• ### Fast and Asymptotically Powerful Detection for Filamentary Objects in Digital Images

Given an inhomogeneous chain embedded in a noisy image, we consider the ...
09/19/2020 ∙ by Kai Ni, et al. ∙ 0

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## I Introduction

Multi-image alignment consists in registering a group of images to a common reference. 111We use hereafter the terms image alignment and image registrationinterchangeably. The multi-image alignment problem is ubiquitous in many fundamental image processing applications such as high dynamic range imaging [1, 2]

[3, 4, 5], burst denoising [6] and burst deblurring [7, 8]. Indeed, this problem is of great importance for very different domains, such as biomedical imaging, astronomy and remote sensing, where due to physical or biological constraints the photographing system captures a series of unregistered and often noisy images.

Various methods have been proposed for multi-image alignment [9, 10, 11, 12, 13, 4] and a great amount of effort is being invested to further improve their performance, mostly in applications that deal with very low signal-to-noise ratio (SNR) conditions [14, 15, 16, 17]. Hence, an important question arises: Which are the fundamental limits on multi-image alignment performance? Theoretical performance bounds provide a fundamental benchmark to compare different methods and can help establish whether improvements can be made. In this work, we tackle the problem of finding the performance limits in image registration when multiple shifted and noisy observations are available.

Theoretical statistical performance bounds are of great interest and have been used in a wide variety of signal processing problems. One of the most widely used approaches, probably because of its simplicity, is the Cramér-Rao bound (CRB)

[18]

, which establishes a lower bound on the variance of any unbiased estimator of the parameter of interest. For instance, CRBs have been previously used to establish performance limits in pairwise image alignment

[19, 20], super-resolution [3], high dynamic range imaging [21] and image denoising [22], among others. Another example is the bound proposed by Ziv and Zakai [23] and its extensions [24, 25, 26]. They proposed to relate the mean squared error of the estimator to the probability of error in a binary detection problem, leading in general to tighter bounds than the CRB. Examples of the application of the Ziv-Zakai bound (ZZB) to practical problems can be found in pairwise image alignment [27] and time delay estimation [28, 29], among others. Both the CRB and the ZZB will be computed here for the problem of multi-image alignment.

Image registration can easily become very complex with the kind of scene motions we face in real world scenarios. In this work, we focus on global translation which, despite being the most basic motion model, is of great interest because it is present in almost all applications. The considered motion model is thus given by

 z(x)=u(x−τ)+n(x), (1)

where is the observed image at pixel position , is the underlying image,

is the 2D translation vector and

is additive white Gaussian noise independent of .

A fundamental aspect that has to be considered when computing a performance bound for the image alignment problem under Model (1), is how to characterize the underlying image . Even if the parameter of interest is the shift vector , assumptions have to be made about and each assumption will lead to different performance bounds. For instance, could be considered as deterministic, known or unknown, or as a realization of a known random process.

Various performance bounds have been derived for the pairwise image alignment problem (i.e., registration between two images) assuming a deterministic known underlying image. Examples of this are the CRB for translation estimation derived by Robinson and Milanfar [19], the CRB for general parametric registration introduced by Pham et al. [20] , and the ZZB derived by Xu et al. [27] for rigid pairwise registration including translation and rotation.

Regarding multi-image alignment, a specific case was analyzed by Rais et al. [30], who computed the CRB for the registration of a group of Earth satellite images that were uniformly translated, i.e., all shifts are multiples of a single unknown value that needs to be estimated. In [3], Robinson and Milanfar presented a thorough statistical performance analysis on super-resolution, of which multi-image registration is typically a major component. They studied translation estimation and image reconstruction jointly, thus assuming an unknown underlying image. This work shed light on image super-resolution, giving important insight into which are the main bottlenecks for improving performance. They derived bounds for the combined problem under two different assumptions for : the CRB assuming an unknown deterministic image and a Bayesian CRB assuming a Gaussian prior for . In both cases, and assuming the considered images are aliasing free, the computed CRB for the multiple shifts estimation was independent of the number of available images.

It is interesting to remark that the problem of image translation estimation is closely related to the problem of time delay estimation of a signal observed at two or more spatially separated receivers [28, 29]. Indeed, our analysis follows and extends the results from [29] to the case where multiple noisy versions of the same flat spectrum signal are observed, each with a different shift.

In this work, we derive and analyze various performance bounds for the multi-image alignment problem under two different models for the underlying image . First, we consider to be deterministic and unknown. Under this image model, we compute the CRB and a Bayesian CRB assuming a generalized Gaussian prior for the shifts. Second, assuming a stochastic Gaussian model for the underlying image , we derive the CRB and the extended Ziv-Zakai bounds (EZZB).

A thorough analysis is conducted, which unveils the similarities between these seemingly different approaches. We find a per-region behavior depending on the difficulty level of the problem, given by the SNR conditions. For certain SNR values, performance depends on the number of images. Also, it degrades dramatically below a given threshold, until reaching a region where the SNR is too low to enable alignment.

In order to assess the tightness of the computed bounds, we compare them to the alignment accuracy obtained by the maximum likelihood estimator (MLE). The MLE, besides being a widely used estimator, is known to be asymptotically efficient and also efficient for any number of observations in various problems [21]. A per-region behavior depending on the SNR level, similar to the one predicted by the EZZB, is observed for the MLE as well. We find that all the computed bounds are very tight in very high SNR conditions, where the MLE achieves them and is thus efficient. For such high SNR, we find that the alignment performance only depends on the ratio between the energy of the image gradient and the noise level, and does not depend on the number of available images. Hence, for very high SNR, multi-image alignment can be performed in a pairwise fashion without losing information.

However, this is not the case for low SNR where the performance shows a dependence on the number of images, until reaching a steady state error for extremely low SNR where alignment is not possible. The SNR values delimiting these regions, which are of particular importance in practice, are also derived and found to depend on the number of available images. Therefore, increasing the number of images is useful since, not only it improves the achievable performance, but it also shifts the SNR thresholds making alignment possible for a larger noise level range.

This article is organized as follows. Section II presents the statistical framework used to state multi-image alignment as a parameter estimation problem. Sections III and IV are devoted to the computation and analysis of the different performance bounds, under the deterministic and stochastic image models respectively. Section V presents an analysis and comparison of all the computed bounds. The bounds accuracy is assessed in Section VI. Section VII summarizes the conclusions.

## Ii Multi-image registration: an estimation problem

In what follows, we present the image model used throughout the article for the derivation of the different performance bounds. Also, we introduce the performance indicators used to evaluate the translation estimators. Table I summarizes the notation used in the article.

### Ii-a Image model

Let us consider the image acquisition model:

 zi(x)=u(x−τi)+ni(x),i=0,…,K, (2)

where is the observed -th image at pixel position , is the underlying continuous image generating the noisy shifted observations, is the 2D translation vector of frame with respect to the underlying image (frame zero, ), and is additive Gaussian noise assumed to be independent of .

In practice, we do not have access to the continuous images but to a finite discretization of them. We will assume that all the images are band-limited and sampled according to the Nyquist sampling theorem. Regarding the finite observation support, we will additionally assume that the energy of the signal outside the observed sampling grid is negligible. These two assumptions guarantee an almost perfect interpolation of the

continuous images from the digital ones. Thus, under this ideal framework, we are able to compute image derivatives or image shifts (or any other linear operator) directly from the discrete samples. Although we will omit the details for simplicity, all the considered operators could be computed via Fourier interpolation (e.g., using the dft). Let us assume that the digital images are indexed into vectors of size pixels, where and are the number of rows and columns respectively.

Let be the concatenation of all 2D unknown translations, and be the concatenation of the observed images. The goal in multi-image alignment is then to estimate from .

### Ii-B Performance evaluation

Let us call the vector of parameters to be estimated, e.g. . Given any estimate of , its performance can be measured through the error correlation matrix,

 Rϵ=Ez|θ[ϵϵT], (3)

where is the error with respect to the real parameter value and is the expected value over the observed data distribution given . The fundamental limits on the estimation of can be stated through the family of performance bounds which consider the parameter as an unknown deterministic quantity and provide a limit on . Examples of this family are the Cramér-Rao [18], Bhattacharyya [31], Barankin [32], and Abel [33] bounds, among others.

In some cases, prior information is known about . This motivates the development of the Bayesian bounds, which model the parameter as a random variable with a known prior

distribution, and give a limit on the expected error correlation matrix under the joint distribution of the data and the parameter

 ¯¯¯¯Rϵ=Ez,θ[ϵϵT]. (4)

Examples of Bayesian bounds are the Bayesian Cramér-Rao [34], the Ziv-Zakai [23], and the Weiss-Weinstein [35] bounds.

A more practical performance indicator is the mean squared error of the estimated parameters, which corresponds to the trace of the error correlation matrix. We refer hereafter as mean squared error (MSE) to the trace of and expected mean squared error (EMSE) to the trace of .

In the following sections, we compute and analyze variants of two performance bounds for the multi-image alignment problem, the Cramér-Rao [18] and the Extended Ziv-Zakai lower bounds [26]. The performance analysis is conducted under two different hypothesis for the unknown underlying image: is a deterministic unknown image (Section III), and is a realization of a zero mean Gaussian random process with known covariance matrix (Section IV). Although the Gaussian model is over-simplistic [36], it is nonetheless interesting, not only because of its practicality, but also because it has proven to be very powerful for locally modeling natural images in several applications [37, 38, 39, 40].

## Iii Performance bounds: deterministic image model

In this section, we assume that is an unknown deterministic digital image. We also assume that the noise in the digital observations has a diagonal covariance matrix . Notice that, even if the goal of multi-image registration is to estimate and not , the latter is unknown and needs to be accounted for in the analysis. This kind of parameters, whose estimation is not of direct interest but because they are related to the analysis have to be accounted for, are commonly referred to as nuisance parameters [41]. Hence, the parameter vector becomes , where we are only interested in estimating from the noisy observed images .

### Iii-a Cramér-Rao lower bound: deterministic image model

The performance of any unbiased estimator of is bounded by the CRB [18]

 Rϵ≥J−1, (5)

where is the Fisher information matrix (FIM) given by

 {J}i,j=−Ez|θ[∂2ℓ(z;θ)∂θi∂θj], (6)

and is the logarithm of the likelihood function. The FIM in this case can be expressed as

 JD=[JuuJTuτJuτJττ], (7)

where the term captures the information provided by the image only, the term captures the available information of the translations and represents the information provided by the intercorrelation between and . Using the block matrix inversion principle [42], the inverse of can be expressed as

 J−1D=[S−1uJ−1uuJuτS−1τS−1τJTuτJ−1uuS−1τ], (8)

where and are the Schur complements of the submatrix regarding and respectively, namely,

 Sτ =Jττ−JTuτJ−1uuJuτ, (9) Su =Juu−JuτJ−1ττJTuτ. (10)

It can be shown that for multi-image registration, is given by (see Appendix A)

 S−1τ=σ2[IK+11T]⊗Q−1, (11)

where

is the identity matrix of size

, is a vector of ones of size , is the Kronecker product between matrices,

 Q=[uTxuxuTxuyuTxuyuTyuy], (12)

and , are the derivatives of the latent image in the horizontal and vertical directions respectively.

Equation (5) gives a bound on the covariance matrix of any unbiased estimator of . Therefore, from (5) and (8), the MSE of the estimated translations is bounded by the trace of  [43],

 mse =12KK∑j=1E[(τjx−^τjx)2+(τjy−^τjy)2] (13) ≥12Ktr(S−1τ) (14) =σ2(uTxux+uTyuy)(uTxux)(uTyuy)−(uTxuy)2. (15)

Hence, we define the CRB under a deterministic image model (CRBD) as,

 \textsccrbd\makebox[0.0pt]\tiny def=σ2(uTxux+uTyuy)(uTxux)(uTyuy)−(uTxuy)2. (16)

According to the CRBD, the registration error is proportional to the noise level and inversely proportional to the energy of the gradient. A similar result is presented by Robinson and Milanfar [3], who derived the CRB for the super-resolution problem. Multi-image registration can be seen as a particular case of the super-resolution problem, where the under-sampling operator is equal to the identity matrix.

Performance independence of . An unexpected result is that the bound (16) does not depend on the number of images . This means that this fundamental limit of multi-image registration performance is the same for a set of 2 or any number of images. Nevertheless, unlike stated in [3, Ap. III], this does not imply that registration can be done pairwise without loss of information. The CRB gives a lower bound on performance but it does not ensure the existence of an efficient estimator that reaches this bound. In practice, depending on the problem, the CRB may or may not be tight. Hence, the independence of the CRB of the number of images , does not imply that registration can be done pairwise without loss of information.

As it will be shown experimentally in Section VI, for the multi-image registration problem, the CRBD is tight in high SNR conditions, where we observe indeed that registration can be done pairwise without loss of accuracy. However, it is not necessarily tight in low SNR conditions. In that case, there are other bounds, which are dependent on , that are closer to the actual performance estimators can achieve.

Case with known underlying image. The bound on (16) corresponds to the translations estimate error when the real image is unknown, which is usually the case in practice. In previous works [19, 20], however, the CRB has been computed for the pairwise image registration problem when the only unknown parameters are the shift values.

In that case, the FIM for the multi-image registration problem (7) simplifies to

 JDkn=Jττ=1σ2IK⊗Q, (17)

and the CRB for the case where is known becomes,

 \textsccrbdkn \makebox[0.0pt]\tiny def=12Ktr(J−1ττ) (18) (19)

Therefore, the MSE bound, assuming the underlying image is known, is half that of the case when is unknown. When is known, and the first image in the set is assumed to be aligned (i.e., ), all the other images can be pairwise aligned to the known reference. Indeed, in that case, the different observed images are conditionally independent given the known underlying image. Hence, there is no gain in using the rest of the images for estimating the translation of one image. Therefore, the limiting factor in the pairwise alignment is the noise in one image.

On the other hand, when is unknown, the bound doubles. This may represent a best case scenario where the limiting factor is twice the noise, corresponding to the pairwise alignment of two noisy images.

### Iii-B Bayesian Cramér-Rao with prior on shifts

A natural question that arises after finding that the fundamental performance limit given by the CRBD (16) does not depend on the number of images, is whether this limit can be improved if some prior information about the shifts is known. Intuitively, having more images could improve the alignment performance in the case that is unknown. Let us imagine the case of an algorithm that uses an estimation of the latent image to estimate the shifts. One could expect that the estimation of could be improved by having more images, for example by reducing the noise, and thus leading to a better estimate of the shifts.

Hence, the question is what happens if the motion estimation can be improved by including prior knowledge on the shift vectors. A typical assumption is that the shifts are independent and drawn from a uniform distribution within a limited range. Nevertheless, in some particular applications (e.g., in microscopy or in burst photography), each shift vector depends on the previous ones so modeling the motion as a random walk might be more accurate. In this work, we restrict the analysis to the case where the shifts are independent.

A Bayesian version of the CRB bound can be computed to include prior information on the unknown parameters. The Bayesian Cramér-Rao bound (BCRB) gives a lower bound on the expected error correlation matrix under the joint data and parameter distribution

 ¯¯¯¯Rϵ≥J−1B, (20)

where is the Bayesian Fisher information matrix given by

 {JB}i,j=−Ez,θ[∂2ℓ(z,θ)∂θi∂θj], (21)

and is the logarithm of the joint likelihood function.

Generalized Gaussian prior on shifts. Let us consider a centered generalized Gaussian prior distribution for each component of . This family of densities, indexed by the parameters and , is given by

 p(τ;c,δ)=cη(δ,c)Γ(1/c)exp(−ηc(c,δ)|τ|c), (22)

where

 η(δ,c)=1δ(Γ(3/c)Γ(1/c))1/2, (23)

and denotes the gamma function. The case of corresponds to the Gaussian density, while the distribution approaches the uniform density with variance as .

Given independent samples following model (2), and assuming that is an unknown deterministic image and is a random variable following the generalized Gaussian prior (22), the Bayesian FIM is given by (see Appendix B)

 JB=JD+Jp, (24)

with

 Jp=[0001λ2I2K], (25)

where and is the FIM given in . Therefore, including a prior on the translations adds the term to the classical FIM given in (7).

Then, the Schur complement of the submatrix regarding , becomes (see Appendix B)

 ¯Sτ =1σ2(I−(K+1)11T)⊗Q+1λ2I, (26)

and

 ¯S−1τ =I⊗(1σ2Q+1λ2I)−1 (27) =+11T⊗λ2((K+2)I+λ2σ2Q+(K+1)σ2λ2Q−1)−1.

The EMSE of the translations under the given prior is then lower bounded by

 \textscemse≥\textscbcrb\makebox[0.0pt]\tiny def% =12Ktr(¯S−1τ). (28)

As a first observation, let us point out that adding prior information on the translations makes the bound dependent on the number of images . Figure 1 shows a comparison of the CRB ( known and unknown) and the BCRB for different number of images with a Gaussian prior () with . Note that both bounds are very similar for high SNR values. This means that if the SNR is high enough, there is no gain in having prior knowledge about the shifts. However, for low enough SNR and large enough , the prior bounds the errors on the shift estimates and the BCRB is below the CRBD and approaches the bound for a known image until it reaches a steady-state value equal to the variance of the prior.

Notice that the example shown in Figure 1 corresponds to a pretty tight prior (), meaning that an accurate interval for the shifts is known a priori. Remarkably, even under this seemingly favorable condition, the reduction of the BCRB is observed only for a very low SNR range and for a very large number of images. This suggests a very limited impact of this shift prior in practice, being useful only for very low SNR conditions, a tight prior of the shifts interval, and a very large number of images.

The generalized Gaussian prior approaches the uniform distribution when . Thus, for any fixed , as the shift prior approaches a uniform distribution. Hence, the prior information becomes irrelevant and the performance is bounded by the CRBD, which is independent of . Of course, this does not mean that having more images does not help for estimating the shifts. As previously mentioned, if the CRBD is overoptimistic and cannot be attained, a tighter bound may still exist, that does depend on the number of images.

## Iv Performance bounds: stochastic image model

In this section, we consider a zero-mean Gaussian stochastic model for the underlying unknown image . As stated before, our goal is to estimate the shifts between every pair of images given by (2), or equivalently in the Fourier domain, from

 ~zi(ω)=~u(ω)e−iω⋅τi+~ni(ω),i=0,…,K, (29)

where ~  denotes 2D image Fourier transforms, represents the 2D Fourier spatial frequency and denotes the inner product operation.

We now assume that the signal samples are drawn from a stationary zero-mean Gaussian process with spectral density . The additive noise is modeled by the zero-mean Gaussian process with spectral density , assumed to be independent of the underlying signal .

The observed digital images can be converted into the Fourier domain by applying the 2D dft. In practice, since the input images are real, the complex Fourier coefficients have Hermitian symmetry, where two of the four quadrants fully determine . Here, we arbitrarily choose to work with the positive values of and the complete range for (i.e., first and second quadrants of the 2D dft). Hence, we will only consider the complex Fourier coefficients corresponding to frequencies with and . In addition, we will assume that the Fourier coefficients of are uncorrelated at the considered spatial frequencies.

Let , with , index all the considered 2D frequencies . The Fourier transform of the () observed images can be arranged into a vector

 ~z=[~z0(ω1),~z1(ω1),…,~zK(ω1),…,~z0(ωM),~z1(ωM),…~zK(ωM)]T. (30)

Under Gaussian assumptions for the noise and the underlying image,

follows a complex Gaussian distribution with zero mean and covariance matrix

 Σ=E[~z~zH]=⎡⎢ ⎢ ⎢ ⎢ ⎢⎣Στ(ω1)0…00Στ(ω2)…0⋮⋮⋱⋮00…Στ(ωM)⎤⎥ ⎥ ⎥ ⎥ ⎥⎦, (31)

where each matrix has size and is composed by

 Στ(ω)=⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣S(ω)+N(ω)S(ω)e−iτ1⋅ω…S(ω)e−iτK⋅ωS(ω)eiτ1⋅ωS(ω)+N(ω)…S(ω)ei(τ1−τK)⋅ω⋮⋮⋱⋮S(ω)eiτK⋅ωS(ω)e−i(τ1−τK)⋅ω…S(ω)+N(ω)⎤⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦. (32)

### Iv-a Cramér-Rao lower bound: stochastic image model

In order to compute the CRB for the shifts estimation in the multi-image alignment problem under model (2), we first compute the corresponding FIM matrix. For the considered complex Gaussian process , it is given by [18, Ap. 15C]

 {JS}ih,jq=tr(Σ−1∂Σ∂τihΣ−1∂Σ∂τjq), (33)

where , and index the two components of each 2D shift vector . Carrying out the indicated operations (see Appendix C) we get

 JS=[(K+1)IK−11T]⊗B, (34)

with

 B=[ρx,x−ρx,y−ρx,yρy,y], (35)

and

 ρh,q=M∑l=12S2(ωl)ωlhωlqN2(ωl)+(K+1)S(ωl)N(ωl). (36)

Hence, we have

 J−1S=1(K+1)[IK+11T]⊗B−1. (37)

The error covariance matrix of any unbiased estimate of the shifts is thus bounded by

 Ez|τ[(^τ−τ)(^τ−τ)T]≥J−1S, (38)

and the MSE is lower bounded by the trace of ,

 mse ≥12Ktr(J−1S)=1(K+1)(ρ2x,x+ρ2y,yρx,xρy,y−ρ2x,y). (39)

If and are rotationally symmetric (i.e., rotation invariant), it can be shown that and . In this case, we define the CRB under the Gaussian stochastic image model (CRBS) as

 \textsccrbs\makebox[0.0pt]\tiny def=2(K+1)ρxx=2(K+1)ρyy. (40)

Notice that, unlike the CRBD (16), the CRBS (40) depends on the number of images .

High SNR. Under high signal-to-noise conditions, the CRBS for a rotation invariant process (40) simplifies to

 \textsccrbs\textschsnr\makebox[0.0pt]\tiny def% =2σ2(2π)2Np∫S(ω)ω2xdω. (41)

This bound is indeed independent of the number of images and agrees with the deterministic CRB given by (16) (see Appendix C).

To help further understand the behavior of the computed bound we analyze its behavior both for natural and flat spectrum images.

Natural images. A typical natural image presents complex structure that is difficult to model accurately. One classical assumption, it that the power spectrum of natural images falls quadratically with the Fourier frequency. Although simplistic, this is in fact reasonable if we consider that natural images have a relative contrast energy that is scale invariant [44]. Let us assume that the considered underlying image follows this law, that is,

 S(ω)={Sn∥ω∥−2if% max(|ωx|,|ωy|)≤W/2,0otherwise, (42)

where is a known parameter, and models the signal bandwidth. Also, we will assume that the additive noise spectrum has a constant value in the frequency band and is zero otherwise.

Let us define the signal-to-noise ratio as the ratio between the energy of the derivative and the noise power

 \textscsnr\makebox[0.0pt]\tiny def=1NW2∫S(ω)∥ω∥2dω. (43)

For the case of a natural image (42), the snr is then

 \textscsnrn=Sn/N. (44)

and the CRBS bound for natural images becomes (see Appendix C)

 {crbs}n\makebox[0.0pt]\tiny def=8πNp(K+1)\textscsnr2nacoth(1+2π(K+1)\textscsnrnW2). (45)

When , we have that

 {crbs}n→16π2NpW2\textscsnrn, (46)

which does not depend on . The breaking point from the asymptotic (very high SNR point) occurs approximately when , which happens at,

 \textscsnrKn1=W22π(K+1). (47)

This implies that, if , having access to more than two images will not improve the bound. This absolute breaking point happens at approximately .

Flat spectrum images. Another helpful case is to study the behavior of the CRBS when the underlying signal has a flat power spectral density, that is,

 S(ω)={Swif max(|ωx|,|ωy|)≤W/2,0otherwise. (48)

Similarly, we assume that the additive noise spectrum has a constant value of in the frequency band and is zero otherwise.

The signal-to-noise ratio (as defined in (43)) for white signals becomes

 \textscsnrw=SwW26N. (49)

In this case, the CRB bound for flat spectrum images is (see Appendix C)

 {crbs}w \makebox[0.0pt]\tiny def=8π2(W2+6\textscsnrw(K+1))3Np(K+1)\textscsnr2wW2 =8π23Np(K+1)\textscsnr2w+16π2NpW2\textscsnrw. (50)

In particular, when we have

 {crbs}w→16π2NpW2\textscsnrw, (51)

which does not depend on the number of images . The breaking point is when both terms in (50) are approximately equal, which happens at

 \textscsnrKw1=W26(K+1). (52)

Thus, if , having access to more than two images will not improve the bound. This absolute breaking point happens at approximately .

Because this threshold is very similar to the one obtained for natural images (see Eq. (47)), for simplicity, we refer hereafter to both and as .

Figure 2 shows the computed CRBS bounds for both image models, with different number of input images and varying SNR levels. Both image models have very similar behavior. There is a very high SNR zone where the bounds depend linearly with the SNR level (). Within this SNR region, having access to more images does not have an impact on the bound. In moderate to low SNRs () both Cramér-Rao stochastic bounds depend super-linearly with the SNR (i.e., performance degrades faster at low SNR values than in the very high SNR region). Increasing the number of images pushes back , increasing the SNR range where performance is linear with SNR. The performance is linear with image size in both cases on the whole SNR domain.

An alternative approach to include an image model is to compute a Hybrid Cramér-Rao bound (HCRB).222Hybrid in the sense that there are random and deterministic parameters. Similarly to what was done in Section III-B, one can compute a HCRB including the desired model as an image prior and then compute the expected FIM under this prior. Robinson and Milanfar [3] computed such bound for the super-resolution problem assuming a Gaussian model for the image similar to the one presented here. Although related, these two bounds are different. The HCRB gives a bound on the expected MSE under the given image prior. This can be seen as an average bound for the different likelihoods obtained for each given possible value of the image. On the other hand, the CRB on Equation (40) gives the bound based on the expected likelihood under the given image model. Under some regularity conditions, it is possible to show that the CRB is always tighter than the HCRB [45, Thm. 1]. Nevertheless, in many applications the computation of the HCRB is much simpler than the CRB, leading to a reasonable alternative.

### Iv-B Extended Ziv-Zakai lower bound

In general, the CRB is known to be tight in high SNR but overoptimistic in low SNR conditions. Various Bayesian bounds have been derived to obtain tighter and more accurate predictions of the MSE behavior in the entire SNR range. One example of this is the bound proposed by Ziv and Zakai [23], which relates the expected MSE (EMSE) of the estimator over a given prior, to the probability of error in a binary detection problem.

Consider the estimation of a -dimensional random vector with a prior distribution , based upon an observation vector . The extended Ziv-Zakai lower bound (EZZB) on the EMSE of any estimate of over is given by [26]

 aT¯¯¯¯Rϵa≥∫∞0V{maxδ:aTδ=h [∫RKmin(pθ(φ),pθ(φ+δ)) ⋅Pmin(φ,φ+δ)dφ]}hdh, (53)

where is any -dimensional vector, is the valley-filling function,333The valley-filling of a function is obtained by filling-in any valleys [26], and is given by . and , , is the probability of error in the binary detection problem

 (54) H1:^δ=φ+δ;z∼p(z|θ=φ+δ), (55)

with equally likely hypotheses. The vector , with represents a possible 2D shift between the -th and the first image (indexed in the same way as ).

The Ziv-Zakai bound is based on the probability of correctly choosing the parameter to be estimated between two possible values: or . The bound is found by integrating the minimum error along all possible estimated values (in general ruled by both and

), weighted by their prior probability of occurrence, and by bounding the minimum probability of error in this binary detection problem.

If the probability of error is only a function of the offset between the hypothesis, i.e., , which is precisely the case in our translation estimation problem, the bound simplifies to

 aT¯¯¯¯Rϵa≥∫∞0V{maxδ:aTδ=hA(δ)Pmin(δ)}hdh, (56)

where

 A(δ)=∫R2Kmin(pθ(φ),pθ(φ+δ))dφ. (57)

Thus, to compute the EZZB of the shift estimation problem we need to compute and the probability of error .

If we assume the shifts to be uniformly distributed , takes the simplified form

 A(δ)=2K∏i=1(1−δiD). (58)

The probability of error for the case of multi-image registration is given by (see Appendix D)

 Pmin(δ)≈12exp{a(δ)+b(δ)}Φ(√2b(δ)), (59)

where

 a(δ)=−M∑l=1log[1+γ(δ,ωl)],b(δ)=M∑l=1γ(δ,ωl)1+γ(δ,ωl), (60)
 γ(ω,δ)=S(ω)2((K+1)2−T(δ,ω))4(N(ω)2+(K+1)N(ω)S(ω)), (61)
 T(δ,ω)=∣∣1+K∑j=1e−iδj⋅ω∣∣2% andΦ(t)=1√2π∫∞te−t22dt. (62)

Flat spectrum signals. As done for the CRBS case, let us consider the particular case of flat spectrum signals defined previously by Equation (48). The analysis presented hereafter closely follows and extends the work by Weinstein and Weiss [29] to the case of multiple signals.

For simplicity, the following analysis is restricted to one-dimensional signals. The extension to two-dimensional signals is straightforward in the case where the image is assumed to be drawn from a white random process (full bandwidth flat spectrum, i.e., ) . In this case, knowing the translation in one direction does not give any additional information to the estimation of the other one. As a consequence, the 2D image can be rearranged into a one-dimensional vector by concatenating its rows without loss of information regarding the estimation of the translation along the columns. Following this remark, in this section, we will consider one-dimensional signals having length and .

The EZZB corresponding to the estimation of one single component is given by (see Appendix E)

 {emse}1≥\textscezzbw \makebox[0.0pt]\tiny def=1c2∫√2b0hexp{−9h420Np}Φ(h)dh \makebox[0.0pt]\tiny def=+D26ea+bΦ(√2b), (63)

where

 a=−Nplog(√κ2+1+12),b=Np2√κ2+1−1√κ2+1,c2=Npπ2κ112 κ1=9\textscsnr2w(K+1)8π4+12π2\textscsnrw(K+1),κ2=9\textscsnr2wK4π4+6π2\textscsnrw(K+1). (64)

Analysis of the EZZB: different SNR regions. The EZZB behaves differently in low and high SNR regimes, as dictated by the two terms in Eq. (63) and as illustrated in Figure 3(a).

i) High SNR. For , the error term (63) is mainly driven by the first term since and .

Assuming that , we obtain [28]

 \textscezzbw \textscsnrw→∞−−−−−−−−−−→1c2∫√2b0h⋅exp{−9h420Np}Φ(h)dh (65) ≈1c2∫∞0h⋅exp{−9h420Np}Φ(h)dh=14c2 (66) =8π2+12\textscsnrw(K+1)3Np\textscsnr2w(K+1)={crbs}w. (67)

Indeed, in the high SNR regime, the EZZB approaches the CRBS under the stochastic image model given by (50).

ii) Low SNR. On the other hand, in a very low SNR scenario, , , and . Thus,

 \textscezzbw\textscsnrw→0−−−−−−−−−−→D26ea+bΦ(√2b)≈D212, (68)

which is the variance of the shifts prior.

iii) Transition zone. The transition from the low-SNR to the high-SNR region starts when the two terms have a similar contribution to the bound, that is,

 14c2=D26ea+bΦ(√2b). (69)

We could (arbitrarily) say that the transition is completed when the bound reaches half the asymptotic value, i.e.,

 ea+bΦ(√2b)=14. (70)

Equations (69) and (70) characterize the limit SNR levels of the transition zone. Let be the SNR level that satisfies (69) and the SNR level satisfying (70). Within this transition region the bound is essentially dominated by the behavior of .

Figure 3 shows how this region changes when varying the number of images , the prior and the image size . The threshold , below which the EMSE decreases significantly and worsens exponentially with the SNR, depends on the number of available images . This is probably the most important consequence of having access to more images. Figure 3(b) shows how this threshold can be pushed back several dBs by increasing , until reaching a limit.

Note that this critical SNR level also depends on the image size . As a consequence, increasing the image size reduces the performance bound and pushes this SNR limit as shown in Figure 3(c).

On the other hand, as illustrated in Figure 3(d), the threshold is not significantly affected by the shift prior parameter . This means that, for practical values (e.g., ), having a tighter shift prior does not push back significantly. Nevertheless, as expected, the steady state EMSE predicted by EZZB does decrease with .

## V Comparison of performance bounds

In this section, we analyze and compare the behavior of the previously computed CRB and EZZB bounds. To this effect, it is important to make the distintion that the CRB is a bound on the MSE while the EZZB and the BCRB are bounds on the EMSE over a given prior for the shifts.

To simplify the discussion, let us consider the case of white signals. Figure 4 shows a comparison of the CRB bounds (both for deterministic (CRBD) and stochastic (CRBS) image models), the BCRB (with Gaussian shift prior of variance ) and the EZZB (with uniform shift prior, ) assuming an image of size pixels. For the CRBD and the BCRB cases, that depend on a deterministic signal, we used a realization from the white random process used in CRBS and EZZB. Based on the SNR values, the behavior of the bounds can be characterized into four different regions i-iv.

i) Very high SNR (). In this region, all bounds agree. Hence, the same fundamental limit is predicted for both the MSE and the EMSE. This limit does not depend on the shift value nor on the width of the prior, within practical limits for and (). The performance bound only depends on the total image gradient energy and the noise level, and it is linear with the SNR and image size . Hence, a very important remark is that, in this SNR region, all bounds predict that having access to more than two images () or having a more accurate shift priors (a smaller or within practical limits) will not lead to better performance. The threshold defining this region, , depends on the number of images (see Eq. (