Burst ranking for blind multi-image deblurring

10/29/2018 ∙ by Fidel A. Guerrero Peña, et al. ∙ UFPE

We propose a new incremental aggregation algorithm for multi-image deblurring with automatic image selection. The primary motivation is that current burst deblurring methods do not properly handle situations in which misaligned or out-of-context frames are present in the burst. These real-life situations result in poor reconstructions or require manually selecting the images used for deblurring. Automatically selecting the best frames within the burst to improve the base reconstruction is challenging because the number of possible image fusions equals the cardinality of the power set of the burst. Here, we approach the multi-image deblurring problem as a two-step process. First, we learn a comparison function to rank a burst of images using a deep convolutional neural network. Then, an incremental Fourier burst accumulation with a reconstruction degradation mechanism is applied, fusing only the less blurred images that are sufficient to maximize the reconstruction quality. Experiments show that the proposed algorithm obtains superior results compared to similar approaches, outperforming other methods from the literature in the situations described above. We validate our findings on several synthetic and real datasets.


I Introduction

Fig. 1: Original and sorted bursts with their respective aggregations (panels: Original Burst; Fourier Burst Accumulation; Sorted Burst (Ours); Incremental Fourier Burst Accumulation (Ours)). In our Incremental Fourier Burst Accumulation algorithm, a degradation recognition mechanism interrupts the aggregation whenever a certain level of deterioration is observed during the reconstruction.

Images captured by mobile camera devices usually contain motion blur caused by hand tremor and dynamic scene content [1]. Motion blur arises because image acquisition is fundamentally based on accumulating photons: the more photons reach the sensor, the better the image, since noise is reduced. However, this requires a static scene, because motion between the acquisition surface and the scene spreads the photons over a neighborhood of pixels, leading to a loss of sharpness.

Recent mobile cameras can capture a burst, that is, a sequence of frames taken in a short period of time. Several researchers have studied the benefits of aggregating burst frames to form images with less noise [2]. However, these methods are affected by blur in the frames. The multi-image deconvolution approach solves the inverse problem by searching for the kernels and the latent sharp image [3]. Although this approach obtains good results, it is very slow and depends strongly on the kernel estimation method, which is a very challenging problem by itself.

Because motion in mobile devices originates from hand tremor, the blur is random in nature [4]. This implies that changes in one frame of the burst are independent of changes in the other frames, so the motion blur differs from one image of the burst to another. An information aggregation approach can therefore build a sharper reconstruction by taking the less blurred regions of each image [5]. The problem remains challenging, as the less blurred regions need to be identified in each frame and, even after a correct identification, the images might be misaligned.

Most modern multi-frame deblurring methods require the input to match a fixed temporal size [6, 7, 8]. In these methods, the burst length is fixed in advance, and the aggregation process cannot handle longer frame sequences. This relaxation of the problem can significantly harm deblurring under real-life conditions, because the random nature of motion blur does not guarantee that a fixed number of frames contains enough information for a proper reconstruction.

To handle variable burst lengths, some authors perform a burst-size-independent aggregation, using a weighted average in the Fourier domain [5] or a recurrent network with temporal feature transference [1]. Despite the advantages of these approaches, they do not consider that some images in the burst may present severe blur degradation, misalignment (even after registration) or out-of-context content, which can lead to noisy reconstructions. We show experimentally that sorting the burst from the least to the most blurred image and then applying an incremental aggregation with automatic frame selection results in better image deblurring.

Our approach performs a local relative ranking of frames through a novel two-variable blur estimation function approximated by a VGG [9], a well-known CNN architecture. To train this model, we generate a synthetic dataset of blurred images using synthetically created motion blur kernels, at no acquisition cost. The dataset samples are then annotated automatically through the proposed kernel complexity score. Lastly, an incremental aggregation method is applied, using only two images at each step, until the maximum-quality reconstruction is obtained (equivalently, until a reconstruction degradation is detected) or no images are left in the burst. The proposed algorithm handles sequences of arbitrary length and automatically selects the relevant images. Fig. 1 shows an example of a sorted burst and its incremental aggregation.

The remainder of the paper is organized as follows. Section II discusses related work and its substantial differences from our proposal. Section III explains how a burst can be sorted and fused, while Sections IV and V present and discuss the results of the proposed ranking function and aggregation procedure, comparing the algorithm with other methods from the literature. Conclusions are summarized in Section VI.

II Related Work

Image burst ranking: An approach closely related to our blur comparison function was proposed in [10] to learn a similarity function. That study evaluated several networks that regress a similarity score and showed that, even without an explicit notion of a descriptor, a simple VGG taking two 8-bit images as input is sufficient to learn a comparison function. Here we extend the approach to classification over RGB images, and we force the network to satisfy the trichotomy law for better results.

Another closely related method was proposed in [11] for the automatic selection of optimal deblurring algorithms. The authors performed a massive user study and manually selected features that capture common deblurring artifacts. Then, a metric combining the information of each computed feature was learned to match the rank order given by the users. Finally, a robust evaluation metric was introduced to compare the ranks obtained by their method with the ground truth. Their algorithm returns a global score but involves a blurry reference image in the metric computation. Although the problem solved here does not use any reference image, several ideas were taken from their work for sorting and evaluation. On the other hand, the relevant features herein are learned by the network rather than hand-crafted.

Blind multi-image deblurring: One of the most relevant works in multi-image deblurring is Fourier Burst Accumulation (FBA) [5]. The authors present an algorithm that aggregates a burst of images in the frequency domain, taking the least blurred content of each frame to build an image that is sharper and less noisy than any frame of the burst. To this end, the method takes a series of registered images as input and computes a weighted average of their Fourier coefficients. The algorithm is conceptually simple, straightforward to implement and gives excellent results. Despite its usefulness, the method assumes that all frames are well registered (even heavily blurred ones), which is very hard to achieve in practice. The method proposed here deals with this problem by introducing a new degradation recognition mechanism.

Wieschollek et al. [1] recently proposed a novel U-Net-based Recurrent Deblur Network (RDN). The network can handle arbitrary temporal sizes and propagates information from previous frames through a new kind of temporal skip connection. Information aggregation is also performed incrementally, as in this work. However, images in which the scene changes drastically due to motion blur, or that are out of context, are also used during deblurring, reducing the reconstruction quality. Our proposed degradation recognition mechanism handles this situation. Because their work uses an incremental aggregation network, we plan to extend our method using an RDN as the deblurring step.

III Proposed Method

As mentioned above, we formulate multi-image deblurring as a two-step process: the burst is first sorted as a local relative ranking task, and then an incremental aggregation method is applied.

For the burst reorganization step, given a burst $B = \{v_1, \dots, v_N\}$ with $N$ frames, the set of all possible pairs of frames is obtained as the Cartesian product $B \times B$, also written as $B^2$. Let $(v_i, v_j)$ be a generic tuple of $B^2$. Our goal is to find a comparison function $f$ that takes two frames of $B$ as input and outputs the probability that $v_i$ is blurrier than $v_j$, $f(v_i, v_j) = P(v_i \succ v_j)$. From $f$, the blurrest binary relation $\mathcal{R}$ over $B$ can be defined using the maximum a posteriori decision rule for a binary classification problem, that is, $\mathcal{R} = \{(v_i, v_j) \in B^2 : P(v_i \succ v_j) > 0.5\}$, once $P(v_i \succ v_j)$ has been computed directly as the network output. A totally ordered set $\tilde{B} = (B, \mathcal{R})$ can then be defined over $B$ given the relation $\mathcal{R}$.

We approach the information aggregation step iteratively. The objective is to find an optimal ordered subset $B^*$ of the sorted burst $\tilde{B}$ such that $u^*$, the image reconstructed from the frames in $B^*$, is as close as possible to the ideal sharp image $u$. An incremental reconstruction over an ordered burst is defined here as $u_n = h(u_{n-1}, \tilde{v}_n)$, where $n$ is the iteration number, $\tilde{v}_n$ is the $n$-th frame of the sorted burst $\tilde{B}$, and $u_n$ is the reconstruction obtained at iteration $n$, with $u_1 = \tilde{v}_1$. Let $U = \{u_1, \dots, u_N\}$ be the set of all possible incremental reconstructions over $\tilde{B}$. The incremental aggregation function $h$ is defined so that deblurring is performed iteratively using the previously deblurred image. Let $\mathcal{R}_U$ be the blurrest binary relation over $U$ and $\tilde{U}$ the resulting totally ordered set. The maximum of $\tilde{U}$ is then defined as the sharpest reconstructed image $u_{n^*} \in U$, such that $(u_n, u_{n^*}) \in \mathcal{R}_U$ for every $u_n \in U$ with $n \neq n^*$. The optimal ordered subset is then the one that contains the minimum number of elements sufficient to generate the maximum of $\tilde{U}$, that is, $B^* = \{\tilde{v}_1, \dots, \tilde{v}_{n^*}\}$.

Fig. 2 depicts the overall process where burst sorting is the first step followed by an incremental aggregation with degradation recognition for deblurring. We perform the deblurring until a restoration improvement is no longer obtained. A detailed explanation is given below for every stage.

Fig. 2: Overview of the method for an input burst of four frames: the burst is sorted, and then an incremental aggregation deblurring with degradation recognition is performed.

III-A Data set

As discussed later, the target function $f$ is approximated by a CNN, and training such a network to predict the relative blur ranking of two blurry inputs requires realistic training data. Because no public burst dataset is available for our ranking purpose, we synthetically generate our own dataset to train the comparison function $f$. A seemingly straightforward option is to sample consecutive frames from existing videos to simulate burst sequences. This approach was used in recent works [12, 8], which built training datasets from videos recorded at 240 fps with a GoPro Hero camera and then averaged the captured frames to obtain an acceptable synthetic motion blur. However, the amount of blur would then have to be assigned manually to each frame of a given tuple to create the binary relation, and we found it very expensive to create by hand the enormous amount of ground-truth data required by the network. Performing this task as accurately as possible would also require a sharp image of the scene to compute a blur score with a reference metric. Moreover, even with a significant effort to construct such a dataset, it would remain limited in the number of captured videos, the recording devices used and the diversity of scenes.

Based on the data generation proposed in [7], we generated our dataset by applying synthetic blur kernels to patches extracted from the MS COCO dataset [13], which contains highly varied real-world images collected from the internet. For a fair evaluation, we use the provided split into training and validation sets, and network parameters are optimized on the training set only. This kind of data generation provides a nearly unlimited quantity of training data.

Blur kernels were created following [14], using the authors' implementation. First, a random motion trajectory of a particle with length $L$ is generated. Then, the Point-Spread Function (PSF) is obtained by sampling the continuous trajectory on a regular pixel grid using subpixel linear interpolation. A blurred input tuple $(y_1, y_2)$ is generated on the fly by applying two created kernels $k_1$ and $k_2$ to a randomly selected sharp patch $x$, that is, $y_1 = k_1 * x$ and $y_2 = k_2 * x$.
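The kernel-generation step can be sketched in a few lines of NumPy. This is a simplified stand-in, not the implementation of [14]: the trajectory is a hypothetical random walk whose heading is perturbed by Gaussian noise (the `anxiety` argument only loosely mirrors the parameter of [14]), the trajectory is rasterized into a PSF by bilinear (subpixel linear) splatting, and blurring uses circular FFT convolution. All function names, the kernel size and the length range are illustrative assumptions.

```python
import numpy as np


def random_trajectory(length, n_steps=256, anxiety=0.3, rng=None):
    """Hypothetical random-walk trajectory of total arc length `length` (pixels)."""
    rng = np.random.default_rng(rng)
    angle = rng.uniform(0.0, 2.0 * np.pi)
    step = length / (n_steps - 1)
    points = [np.zeros(2)]
    for _ in range(n_steps - 1):
        angle += rng.normal(scale=anxiety)  # random change of direction ("shakiness")
        points.append(points[-1] + step * np.array([np.cos(angle), np.sin(angle)]))
    return np.array(points)  # (n_steps, 2) as (x, y), starting at the origin


def trajectory_to_psf(traj, size):
    """Rasterize a trajectory on a size x size grid with subpixel linear interpolation."""
    coords = np.asarray(traj) + (size - 1) / 2.0  # the trajectory starts at the kernel center
    if coords.min() < 0 or coords.max() >= size - 1:
        raise ValueError("kernel too small for this trajectory")
    psf = np.zeros((size, size))
    for x, y in coords:
        x0, y0 = int(np.floor(x)), int(np.floor(y))
        dx, dy = x - x0, y - y0
        psf[y0, x0] += (1 - dx) * (1 - dy)  # split each sample over its 4 neighbors
        psf[y0, x0 + 1] += dx * (1 - dy)
        psf[y0 + 1, x0] += (1 - dx) * dy
        psf[y0 + 1, x0 + 1] += dx * dy
    return psf / psf.sum()  # a PSF sums to one


def blur(image, psf):
    """Circular convolution of a grayscale image with a centered PSF (via FFT)."""
    kh, kw = psf.shape
    padded = np.zeros(image.shape)
    padded[:kh, :kw] = psf
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))  # PSF center -> origin
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))


def make_training_tuple(patch, ksize=41, length_range=(2.0, 18.0), rng=None):
    """Blur one sharp patch with two independent kernels -> (y1, y2, k1, k2)."""
    rng = np.random.default_rng(rng)
    k1, k2 = (trajectory_to_psf(random_trajectory(rng.uniform(*length_range), rng=rng), ksize)
              for _ in range(2))
    return blur(patch, k1), blur(patch, k2), k1, k2
```

Because each pair is produced from the same sharp patch, the two blurred images differ only in their kernels, which is what lets the ground-truth relation be derived from the kernels alone.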

III-B Measuring blur

Fig. 3: Generated burst manually ordered by increasing blur, with different quality metrics for each image. Image blur increases from left to right and from top to bottom.

To measure the amount of blur, it is common to use image quality metrics such as the Mean Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM), shown in Fig. 3. Although these metrics detect the amount of blur as expected in cases such as those in Fig. 3, they are also affected by other deteriorations such as noise, illuminance changes and simple image shifts.
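The sensitivity of full-reference metrics to mere misalignment is easy to reproduce. The sketch below (NumPy, images assumed to lie in [0, 1]) computes MSE and PSNR against a sharp reference for a copy shifted by one pixel and for a copy blurred with a 2x2 box filter. The random texture and the box filter are illustrative choices, not the data of Fig. 3.

```python
import numpy as np


def mse(reference, image):
    """Mean Square Error between two images of equal shape."""
    return float(np.mean((np.asarray(reference, float) - np.asarray(image, float)) ** 2))


def psnr(reference, image, peak=1.0):
    """Peak Signal-to-Noise Ratio in dB; infinite for identical images."""
    err = mse(reference, image)
    return float("inf") if err == 0 else 10.0 * np.log10(peak ** 2 / err)


rng = np.random.default_rng(0)
sharp = rng.random((64, 64))                       # high-frequency texture
shifted = np.roll(sharp, 1, axis=1)                # perfectly sharp, only misaligned
box = (sharp + np.roll(sharp, 1, 0) + np.roll(sharp, 1, 1)
       + np.roll(np.roll(sharp, 1, 0), 1, 1)) / 4  # 2x2 box blur
for name, img in [("shifted", shifted), ("blurred", box)]:
    print(f"{name}: MSE={mse(sharp, img):.4f}  PSNR={psnr(sharp, img):.2f} dB")
```

For an i.i.d. uniform texture, the expected MSE of the one-pixel shift is 1/6 while that of the 2x2 box blur is 1/16, so the sharp but shifted copy is scored as the worse image, which is exactly the failure mode described above.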

Because the amount of blur in an image depends solely on the trajectory, and the kernels are generated synthetically, a better blur metric can be obtained from the kernel itself rather than by computing a reference quality metric against the sharp patch $x$. Given two kernels $k_1$ and $k_2$, the goal is to compute a blur score for each kernel and compare the scores to create the ground truth of the binary relation. To this end, we explored several PSF descriptors, shown in Fig. 3. For each image in the figure, the following quantities are presented: $L$, the trajectory length; $\lambda_1$ and $\lambda_2$, the trajectory eigenvalues [14]; and $\lambda_1\lambda_2/(\lambda_1+\lambda_2)$, half of the harmonic mean of the eigenvalues. A quick inspection of the figure shows that none of these descriptors yields the burst ranking of the manually sorted ground truth. The disagreement occurs because they focus either on the trajectory shape or on its length to describe the movement complexity, without merging both into a single descriptor, and they ignore the effect of shifted kernels on the metric. A shifted kernel is a PSF with a non-vanishing first moment, that is, $\int \mathbf{x}\, k(\mathbf{x})\, d\mathbf{x} \neq \mathbf{0}$, with $\mathbf{x}$ measured from the kernel center [5].
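The descriptors discussed above can be computed directly from a sampled trajectory and its PSF. In this sketch, the eigenvalues are taken from the covariance matrix of the trajectory points, which is our assumption about how [14] defines them, and the first moment is measured as the offset of the PSF centroid from the kernel center; the function names are hypothetical.

```python
import numpy as np


def trajectory_descriptors(traj):
    """Length, covariance eigenvalues and half harmonic mean of an (n, 2) trajectory."""
    traj = np.asarray(traj, dtype=float)
    length = float(np.linalg.norm(np.diff(traj, axis=0), axis=1).sum())
    lam = np.sort(np.linalg.eigvalsh(np.cov(traj.T)))[::-1]  # lambda_1 >= lambda_2
    lam1, lam2 = float(lam[0]), float(max(lam[1], 0.0))     # clip round-off below zero
    half_hm = lam1 * lam2 / (lam1 + lam2) if lam1 + lam2 > 0 else 0.0
    return length, lam1, lam2, half_hm


def psf_first_moment(psf):
    """Offset (dx, dy) of the PSF centroid from the kernel center.

    A non-zero value indicates a shifted kernel (non-vanishing first moment).
    """
    psf = np.asarray(psf, dtype=float) / np.sum(psf)
    h, w = psf.shape
    ys, xs = np.mgrid[0:h, 0:w]
    return (float((xs * psf).sum() - (w - 1) / 2), float((ys * psf).sum() - (h - 1) / 2))
```

A straight-line trajectory, for instance, has a positive length and $\lambda_1$ but $\lambda_2 = 0$, so its half harmonic mean is zero regardless of how long the motion is; this illustrates why shape-only or length-only descriptors disagree with the perceived blur.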

We observed that the farther the trajectory positions are from the kernel center, the worse the blur, because a broader pixel neighborhood takes part in the degradation. Moreover, the larger this region, the higher the probability of averaging information from out-of-context pixels. We therefore propose a metric that intrinsically considers the trajectory length, shape and shift, computed as an exponential distance between the kernel and its center:

(1)

where the normalizing term is the greatest value that can be obtained, set to … in this work. Fig. 3 shows the value of the proposed metric for each image and its agreement with the ground-truth order. Similarly blurred images also have close metric values, whereas large differences correspond to perceptually distinguishable blur levels.

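After computing the scores of Eq. 1 for a generated kernel pair $(k_1, k_2)$, a training tuple $(y_1, y_2)$ belongs to the blurrest binary relation iff the score of $k_1$ indicates more blur than the score of $k_2$, i.e., image $y_1$ is blurrier than $y_2$.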

III-C Burst images ranking

Although the proposed blur metric (Eq. 1) allows sorting a burst according to the blur of each image, it is restricted to cases in which the PSF that caused the degradation is known. Therefore, as discussed in Sec. III, a function that predicts the amount of blur of an image within a burst of unknown length must be learned. In this work, we propose using a convolutional neural network to learn both the discriminative features and the comparison function used in the ranking. Although the blur score is known and could be used to train a regression function [15], such a model is very slow to train and context dependent. We also found that this type of Image Quality Assessment (IQA) model requires specific regression operations, such as pooling by weighted average patch aggregation [16] and general regression neural networks [17], that need to be tuned depending on the metric to be learned.

To circumvent these issues, we treat the problem as a relative ranking measure. The problem is then cast as a binary classification task, using a simple VGG architecture to approximate the trainable function $f$. This formulation allows any classification architecture to be used for sorting a set of images. The network input is a pair of blurred images, and the prediction is the amount of blur in each image relative to the other, i.e., the probability that image $y_1$ is blurrier than $y_2$ and vice versa. For a given training pair, the classification label used to train the VGG is the indicator function over the ground-truth relation, i.e., the label is 1 if the pair belongs to the ground-truth blurrest relation and 0 otherwise. Using an input pair rather than directly regressing the blur score of a single image simplifies the comparison task and removes the dependence on image content, as the network sees at once two frames that belong to the same burst and therefore to the same scene. In doing so, the network learns the features that make one image look blurrier than the other without considering the scene context.

Some authors use VGG-based comparison networks to perform image triage with Siamese [18] and Generative Adversarial Network [19] architectures. Our blur comparison network, however, is closest to the approach studied in [10] for learning a similarity function, because that work showed its superiority over approaches based on a single input image. Unlike [10], we use color images for a classification task rather than grayscale images for regression. The two RGB images of an input pair are simply treated as a 6-channel map that is fed directly to the first convolutional layer of the VGG. As in [10], there is no explicit notion of a descriptor in the architecture. This network offers greater flexibility than the models above because it processes the two images jointly from the first layer. During training, the two-component output vector $\mathbf{o} = (o_1, o_2)$ is used to minimize the Binary Cross Entropy (BCE) loss. Without loss of generality, the output of the comparison function $f$ is taken to be the first component, $o_1$, which represents the probability that $y_1$ is blurrier than $y_2$.
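The input and label layout of the comparison network can be illustrated without committing to a deep-learning framework. The NumPy sketch below stacks two RGB patches into one 6-channel array (channels first), adds each tuple together with its reverse, as done during training to encourage the trichotomy property, and evaluates a binary cross-entropy on the first output component. The function names, the channels-first layout and the convention that a larger ground-truth score means a blurrier image are assumptions for illustration; the VGG itself is not shown.

```python
import numpy as np


def make_pair_batch(images, blur_scores, pairs):
    """Build 6-channel inputs and labels for the blur comparison network.

    images: (M, H, W, 3) RGB patches; blur_scores: (M,) ground-truth scores
    (assumed: larger = blurrier); pairs: list of (a, b) index tuples.
    Each pair is added twice, as (a, b) and (b, a), with complementary labels.
    """
    inputs, labels = [], []
    for a, b in pairs:
        for i, j in ((a, b), (b, a)):
            stacked = np.concatenate([images[i], images[j]], axis=-1)  # (H, W, 6)
            inputs.append(np.transpose(stacked, (2, 0, 1)))             # (6, H, W)
            labels.append(1.0 if blur_scores[i] > blur_scores[j] else 0.0)
    return np.stack(inputs), np.array(labels)


def bce(prob_first_blurrier, labels, eps=1e-7):
    """Binary cross-entropy for o_1 = P(first image is blurrier than second)."""
    p = np.clip(prob_first_blurrier, eps, 1 - eps)
    return float(-np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p)))
```

Feeding both orders of every pair means the network is penalized whenever $f(y_1, y_2)$ and $f(y_2, y_1)$ do not add up to one, which is the property the sorting step relies on.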

After learning $f$, the blurrest binary relation $\mathcal{R}$ can be obtained for a burst $B$ of any size. First, $f$ is computed for every pair $(v_i, v_j) \in B^2$, yielding the probability that $v_i$ is blurrier than $v_j$. Because a total order over $B$ is required, the relation must satisfy the trichotomy law: for a generic tuple $(v_i, v_j)$ and its reverse $(v_j, v_i)$, exactly one of the following holds: $(v_i, v_j) \in \mathcal{R}$, $(v_j, v_i) \in \mathcal{R}$, or $v_i = v_j$. This implies that $f(v_i, v_j) = 1 - f(v_j, v_i)$, as considered in training. The probability that $v_i$ is blurrier than $v_j$ is then calculated as:

(2)

and $P(v_j \succ v_i) = 1 - P(v_i \succ v_j)$.

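In practice, only the subset of pairs $(v_i, v_j)$ with $i < j$ needs to be evaluated with $f$ to create the blurrest relation $\mathcal{R}$. Given such a pair, $\mathcal{R}$ is built so that $(v_i, v_j) \in \mathcal{R}$ if $P(v_i \succ v_j) > 0.5$, and $(v_j, v_i) \in \mathcal{R}$ otherwise. In the worst case, this simplification reduces the computation to the $N(N-1)/2$ comparisons performed by the bubble sort algorithm, i.e., $O(N^2)$. The parallel capability of current GPUs was used to evaluate $f$ for all these pairs at once, significantly reducing the execution time.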
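The reduction to the pairs with $i < j$ can be sketched as follows: the comparison function is queried once per unordered pair, the reverse probability is filled in as its complement, and a plain bubble sort then orders the frames using the resulting relation. Here `compare` is a hypothetical stand-in for the trained network $f$, the 0.5 threshold is the MAP rule, and the toy comparator in the usage lines simply compares known blur values.

```python
import numpy as np


def pairwise_probabilities(frames, compare):
    """Evaluate compare(a, b) = P(a blurrier than b) only for i < j; fill j > i by complement."""
    n = len(frames)
    prob = np.full((n, n), 0.5)
    calls = 0
    for i in range(n):
        for j in range(i + 1, n):
            prob[i, j] = compare(frames[i], frames[j])
            prob[j, i] = 1.0 - prob[i, j]  # trichotomy: P(v_j > v_i) = 1 - P(v_i > v_j)
            calls += 1
    return prob, calls  # calls == n * (n - 1) / 2


def bubble_sort_by_blur(prob):
    """Return frame indices from sharpest to blurriest using the relation prob > 0.5."""
    order = list(range(len(prob)))
    for end in range(len(order) - 1, 0, -1):
        for k in range(end):
            a, b = order[k], order[k + 1]
            if prob[a, b] > 0.5:  # a is blurrier than b, so a moves towards the end
                order[k], order[k + 1] = b, a
    return order


def toy_compare(a, b):
    """Toy comparator on scalar blur levels (stand-in for the network f)."""
    return 1.0 if a > b else 0.0


blur_levels = [3.0, 1.0, 7.0, 2.0]
prob, calls = pairwise_probabilities(blur_levels, toy_compare)
print(bubble_sort_by_blur(prob), calls)  # → [1, 3, 0, 2] 6
```

Precomputing all $N(N-1)/2$ probabilities first, rather than calling the network inside the sort, is what allows them to be evaluated as one GPU mini-batch.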

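Once the blurrest relation $\mathcal{R}$ is derived for a given burst $B$, the totally ordered set can be obtained. The simplest approach is to compute a score for each image $v_i$ in the burst as:

$$ s(v_i) = \sum_{j \neq i} \chi_{\mathcal{R}}(v_{ij}) \tag{3} $$

$$ s(v_i) = \sum_{j > i} \chi_{\mathcal{R}}(v_{ij}) + \sum_{j < i} \left(1 - \chi_{\mathcal{R}}(v_{ji})\right) \tag{4} $$

where $v_{ij}$ is a simplified notation for the pair $(v_i, v_j)$ and $\chi_{\mathcal{R}}$ is the characteristic function over $\mathcal{R}$. The lower the value of $s(v_i)$, the sharper the image with respect to the other images in the burst.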

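A soft version of the score, and therefore of the sorted set, is proposed as follows:

$$ s(v_i) = \sum_{j \neq i} P(v_i \succ v_j) \tag{5} $$

$$ s(v_i) = \sum_{j > i} P(v_i \succ v_j) + \sum_{j < i} \left(1 - P(v_j \succ v_i)\right) \tag{6} $$

Using the facts that $P(v_j \succ v_i) = 1 - P(v_i \succ v_j)$ and that the pairs $(v_i, v_j)$ with $i < j$ are sufficient to obtain the relation, the crisp and soft scores can be computed as in Eq. 4 and Eq. 6, respectively, where only $N(N-1)/2$ pairs are used. The ordinal rank of $v_i$ is then obtained such that the rank of $v_i$ is greater than the rank of $v_j$ if $s(v_i) > s(v_j)$. The sorted burst $\tilde{B}$ is the set of elements of $B$ in the order established by these ranks.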
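Given the pairwise probabilities, the crisp and soft scores differ only in whether each comparison is thresholded before summing. The sketch assumes a matrix `prob` whose upper triangle holds $P(v_i \succ v_j)$ for $i < j$ (the lower triangle is ignored and recovered by complement) and returns scores where lower means sharper; the function name and the example probabilities are hypothetical.

```python
import numpy as np


def blur_scores(prob, crisp=False):
    """Score each frame from upper-triangular P(v_i blurrier than v_j), i < j.

    Each comparison adds p to s_i and (1 - p) to s_j, so only n(n-1)/2 entries are read.
    With crisp=True, p is replaced by the indicator p > 0.5.
    """
    prob = np.asarray(prob, dtype=float)
    n = len(prob)
    s = np.zeros(n)
    for i in range(n):
        for j in range(i + 1, n):
            p = float(prob[i, j] > 0.5) if crisp else prob[i, j]
            s[i] += p        # v_i blurrier than v_j
            s[j] += 1.0 - p  # complement: v_j blurrier than v_i
    return s


prob = np.array([[0.0, 0.9, 0.2],
                 [0.0, 0.0, 0.1],
                 [0.0, 0.0, 0.0]])
for crisp in (True, False):
    s = blur_scores(prob, crisp)
    print("crisp" if crisp else "soft", s, "sharpest-to-blurriest:", np.argsort(s, kind="stable"))
```

Both variants produce the same order here; the soft score additionally keeps how confident each comparison was, which breaks ties more gracefully when the network outputs probabilities close to 0.5.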

III-D Incremental aggregation

After obtaining a version of the burst sorted by increasing blur, a deblurring process is carried out. Information aggregation, in which the least blurred information of each image is used to build the reconstruction, is a successful approach to multi-image deblurring. The classical and simplest form of aggregation is the Fourier Burst Accumulation (FBA) proposed by Delbracio and Sapiro [5]. This algorithm can be viewed as a lucky imaging algorithm working in the frequency domain. However, while lucky imaging methods need the burst to contain at least one sharp image or region to obtain a proper reconstruction, FBA fuses the images of a burst according to their frequency content, without requiring sharp frames. The algorithm is based on the observation that motion blur strongly attenuates some Fourier coefficients while leaving others almost unaltered. If the power of a Fourier coefficient is larger in one frame than the power of the corresponding coefficient in the other frames, then that coefficient has been less attenuated by motion blur in that frame. The method is therefore effective even if every image is blurred. A better reconstruction is obtained if the burst is sufficiently time-variant to ensure that each Fourier coefficient has been left unaltered at least once in some neighboring frames [20].
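As a reference point for the incremental variant, the batch FBA weighting of [5] can be sketched in NumPy as below. The weights are the normalized $p$-th powers of the Fourier magnitudes (averaged over color channels), so at each frequency the frame whose coefficient was least attenuated dominates. Simplifications and assumptions: the Gaussian smoothing of the magnitudes used in [5] is omitted, frames are assumed to be already registered, and `p=11` is only an illustrative default.

```python
import numpy as np


def fba(frames, p=11):
    """Fourier Burst Accumulation (simplified sketch after [5]).

    frames: (M, H, W) grayscale or (M, H, W, C) color, already registered.
    Returns the fused image and the per-frame weights in the Fourier domain.
    """
    frames = np.asarray(frames, dtype=float)
    spectra = np.fft.fft2(frames, axes=(1, 2))
    mag = np.abs(spectra)
    if frames.ndim == 4:
        mag = mag.mean(axis=-1, keepdims=True)  # one weight map shared by all channels
    mag = mag / max(mag.max(), 1e-12)           # global rescaling keeps mag**p finite
    w = mag ** p
    total = w.sum(axis=0, keepdims=True)
    # normalize per frequency; fall back to a plain average where every weight underflows
    w = np.where(total > 0, w / np.where(total > 0, total, 1.0), 1.0 / len(frames))
    fused = (w * spectra).sum(axis=0)
    return np.real(np.fft.ifft2(fused, axes=(0, 1))), w
```

The returned weight maps are the kind of per-frame, per-frequency importance that Fig. 4 visualizes: a blurred copy of an image never has a larger coefficient than the sharp original under a box blur, so the sharp frame receives at least half of the weight at every frequency.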

Because the burst order does not matter in the original FBA proposal, the method is applied to all images at once. In contrast, we propose an Incremental Fourier Burst Accumulation (IFBA) that uses only two images at each iteration, together with automatic image selection. Our method relies on the fact that FBA usually assigns the highest weights to coefficients of the sharpest images, almost disregarding frequencies in frames with too much blur (see Fig. 4). Therefore, among all subsets in the power set $\mathcal{P}(\tilde{B})$ of the sorted burst that can lead to a reconstruction, we are particularly interested in the subsets $S$ such that if $\tilde{v}_k \in S$ then $\tilde{v}_j \in S$ for every $j < k$, where $\tilde{v}_k$ is the $k$-th frame of the sorted burst $\tilde{B}$. In this way, only the sharpest images of the burst are used to perform the reconstruction. Our goal is then to automatically find the optimal burst $B^*$ whose reconstruction $u^*$ is as close as possible to the latent sharp image $u$. In the worst case, when all frames are needed to obtain a significantly better reconstruction, $B^* = \tilde{B}$ and our method performs similarly to FBA.

Fig. 4: FBA weights for nine frames, where the top-left image is the sharpest and the bottom-right image the blurriest. Image blur increases from left to right and from top to bottom. Warmer colors represent higher weights, while cooler colors represent lower weights.

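As stated in the problem definition (Sec. III), an incremental aggregation parametrized by $u_{n-1}$ and $\tilde{v}_n$ is defined as $u_n = h(u_{n-1}, \tilde{v}_n)$. The definition of $h$ can be obtained directly from the FBA aggregation formulation. To simplify notation, let $\hat{v}_i$ denote the Fourier transform of the frame $\tilde{v}_i$, let $|\hat{v}_i|$ be its mean magnitude over the image channels, let $\hat{u}_n = \mathcal{F}(u_n)$, and let $p$ be the FBA aggregation exponent [5]. The FBA aggregation function over the first $n$ frames is computed as:

$$ u_n = \mathcal{F}^{-1}\left( \frac{\sum_{i=1}^{n} |\hat{v}_i|^p \, \hat{v}_i}{\sum_{i=1}^{n} |\hat{v}_i|^p} \right) \tag{7} $$

Let us denote $P_n = \sum_{i=1}^{n} |\hat{v}_i|^p$. Eq. 7 is then expressed as:

$$ P_n \, \hat{u}_n = \sum_{i=1}^{n} |\hat{v}_i|^p \, \hat{v}_i \tag{8} $$

giving a recursive expression,

$$ P_n \, \hat{u}_n = P_{n-1} \, \hat{u}_{n-1} + |\hat{v}_n|^p \, \hat{v}_n \tag{9} $$

Following Eq. 9, and since $P_n = P_{n-1} + |\hat{v}_n|^p$, the Incremental Fourier Burst Accumulation is obtained as:

$$ u_n = h(u_{n-1}, \tilde{v}_n) = \mathcal{F}^{-1}\left( \frac{P_{n-1} \, \hat{u}_{n-1} + |\hat{v}_n|^p \, \hat{v}_n}{P_{n-1} + |\hat{v}_n|^p} \right) \tag{10} $$

where $P_{n-1}$ is the Fourier power accumulation of the $n-1$ images used to obtain the reconstruction $u_{n-1}$.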

This incremental formulation of FBA allows us to efficiently compute an iterative reconstruction with a degradation recognition mechanism. A deterioration measure is needed because the random nature of the blur does not allow a fixed number of images to be set in advance for obtaining the reconstruction closest to the sharp image. Since $\tilde{v}_n$ is the $n$-th frame of the sorted burst $\tilde{B}$, the reconstruction $u_n$ at iteration $n$ is computed from the $n$ sharpest images of the burst. Because aggregating a seriously blurred image can harm the deblurring process, a reconstruction degradation is measured to stop the IFBA automatically. The relative comparison function $f$ described in the previous section is used here to detect such a degradation. The optimal subset is then built as $B^* = \{\tilde{v}_1, \dots, \tilde{v}_{n^*}\}$, where $n^*$ is the last iteration before a degradation is detected, i.e., $f(u_{n^*+1}, u_{n^*}) > 0.5$ (or $n^* = N$ if no degradation is detected), and $u_{n^*}$ and $u_{n^*+1}$ are related through Eq. 10 as $u_{n^*+1} = h(u_{n^*}, \tilde{v}_{n^*+1})$.
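Eq. 10 translates into a short loop that keeps only the running spectrum and the accumulated power, so each iteration touches two images. The NumPy sketch below follows that recursion with a stopping hook. `is_degraded` is a hypothetical placeholder for the learned comparison used as the degradation detector (for instance, returning True when the new reconstruction is judged blurrier than the previous one). Gaussian smoothing of the magnitudes is omitted, frames are assumed registered and sorted from sharpest to blurriest, and `p=11` is illustrative.

```python
import numpy as np


def ifba(sorted_frames, p=11, is_degraded=None):
    """Incremental Fourier Burst Accumulation (sketch of Eq. 10).

    Returns the last accepted reconstruction u* and the number of frames it uses.
    """
    acc_power = u_hat = u = None
    used = 0
    for frame in sorted_frames:
        frame = np.asarray(frame, dtype=float)
        v_hat = np.fft.fft2(frame, axes=(0, 1))
        mag = np.abs(v_hat)
        if frame.ndim == 3:
            mag = mag.mean(axis=-1, keepdims=True)  # mean magnitude over channels
        w = (mag / (frame.shape[0] * frame.shape[1])) ** p  # common rescaling keeps mag**p finite
        if u_hat is None:                           # n = 1: P_0 = 0, so u_1 = v_1
            new_power, new_u_hat = w, v_hat
        else:                                       # n > 1: Eq. 10
            new_power = acc_power + w
            safe = np.where(new_power > 0, new_power, 1.0)
            new_u_hat = np.where(new_power > 0, (acc_power * u_hat + w * v_hat) / safe, u_hat)
        candidate = np.real(np.fft.ifft2(new_u_hat, axes=(0, 1)))
        if u is not None and is_degraded is not None and is_degraded(u, candidate):
            break                                   # reconstruction degraded: stop here
        acc_power, u_hat, u = new_power, new_u_hat, candidate
        used += 1
    return u, used
```

When no frame triggers the stopping rule, the loop reproduces batch FBA over the whole sorted burst, which is the worst case described above; the rescaling of the magnitudes cancels in the ratio of Eq. 10.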

III-E Algorithm implementation

Algorithm 1 summarizes the implementation of the proposed method, including the burst sorting (lines 3-8) and the incremental aggregation (lines 9-19) discussed in the previous sections. Steps 3-6 of the sorting process were implemented as a single mini-batch evaluation over all pairs $(v_i, v_j)$ with $i < j$, using GPU parallel computation. The crisp version of the algorithm, which implements Eq. 4, differs only in step 4, where the indicator function value is obtained instead: $p = 1$ if $f(v_i, v_j) > 0.5$, else $p = 0$. Note that the exhaustive pairwise comparison corresponds to bubble sort, whereas any other sorting algorithm, such as quicksort, can be used instead.

As can be seen in Algorithm 1, the incremental aggregation deblurs using the less blurred frames first, following the order in $\tilde{B}$. Because the accumulated Fourier power is $P_0 = 0$ at iteration $n = 1$, we have $\hat{u}_1 = \hat{v}_1$, i.e., $u_1 = \tilde{v}_1$ (step 13), and $P_1 = |\hat{v}_1|^p$ (step 15). For $n > 1$, the aggregation is performed as in Eq. 10. The optimal burst $B^*$ is composed of the first images of $\tilde{B}$ up to the point where a reconstruction degradation is found (line 16). A Gaussian smoothing of the mean magnitude is performed (line 12), with $\sigma$ selected as in [5].

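1:  procedure IFBA($B$, $f$)
2:      $s_i \leftarrow 0$ for every frame $v_i \in B$
        // BURST SORTING
3:      for each pair $(v_i, v_j) \in B^2$ with $i < j$ do
4:          $p \leftarrow f(v_i, v_j)$
5:          $s_i \leftarrow s_i + p$
6:          $s_j \leftarrow s_j + (1 - p)$
7:      $\tilde{B} \leftarrow$ frames of $B$ sorted by increasing $s_i$    //Ordinal rank
8:      $P_0 \leftarrow 0$, $\hat{u}_0 \leftarrow 0$
        // INCREMENTAL AGGREGATION
9:      for $n = 1, \dots, N$ do
10:         $\hat{v}_n \leftarrow \mathcal{F}(\tilde{v}_n)$
11:         $|\hat{v}_n| \leftarrow$ mean magnitude of $\hat{v}_n$ over the image channels
12:         $|\hat{v}_n| \leftarrow G_\sigma * |\hat{v}_n|$    //Gaussian smoothing
13:         $\hat{u}_n \leftarrow \left(P_{n-1}\hat{u}_{n-1} + |\hat{v}_n|^p \hat{v}_n\right) / \left(P_{n-1} + |\hat{v}_n|^p\right)$
14:         $u_n \leftarrow \mathcal{F}^{-1}(\hat{u}_n)$
15:         $P_n \leftarrow P_{n-1} + |\hat{v}_n|^p$
16:         if $n > 1$ and $f(u_n, u_{n-1}) > 0.5$ then    //Degradation found
17:             break
18:         $u^* \leftarrow u_n$, $B^* \leftarrow \{\tilde{v}_1, \dots, \tilde{v}_n\}$
19:     return $u^*$, $B^*$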
Algorithm 1 Incremental burst deblurring

IV Materials and Methods

To evaluate and validate our approach, we conducted several experiments, including a comprehensive comparison with state-of-the-art techniques on a real-world dataset and a performance evaluation on a synthetic dataset to test the robustness of our approach to varying image quality in the input sequence.

We trained the sorting network directly on sequences of unaligned frames featuring large camera shakes, using only synthetically blurred images. To fulfill the trichotomy law required for sorting, each training mini-batch contains both the tuple $(y_1, y_2)$ and its reverse $(y_2, y_1)$. We use RMSProp [21] with its default parameters (…) and an initial learning rate of … to minimize the BCE loss function. The network was trained for … epochs with a batch size of … . For kernel generation, the anxiety parameter of [14] was set to … and the trajectory length was sampled from the range … . We further augmented the data with random crops of size … and mirroring in every training iteration. Network weights were initialized from a normal distribution using Xavier's method [22].

During inference, we pass pairs of frames through the network and compute the relative ranking as in Algorithm 1, using … in step 3. Blur was assumed to be uniform within each image, so a tile of size … was selected from the input frames for faster evaluation. Only images from the MS COCO validation set were used to generate the synthetic bursts in this phase.

IV-A Burst sorting evaluation

To evaluate the burst sorting obtained with the comparison function $f$, a sequence disagreement distance was used. Given the ordinal ranks of the sorted burst and the ground-truth global scores, the disagreement is measured with the weighted Kendall distance [11]:

(11)

where $D$ is the set of pairs whose orders according to the predicted rank and to the ground-truth score disagree, and the remaining term in Eq. 11 denotes the worst Bradley-Terry (B-T) score [23]. For comparison with other metrics whose scores have different scales, we use a normalized version of Eq. 11, obtained by dividing it by the maximum mismatch, which results from comparing the ground truth against the reversed ground-truth ranking. The normalized distance is therefore 1 for the opposite rank and … for a random guessing function.
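Because the exact weighting of Eq. 11 follows [11], the sketch below should be read as a generic normalized weighted Kendall distance rather than a reproduction of it: each discordant pair is weighted by the absolute difference of the ground-truth scores (our assumption), and the sum is divided by the mismatch of the fully reversed ranking, so identical orders give 0 and the reversed order gives 1. A higher rank and a higher score are both assumed to mean blurrier; the function name is hypothetical.

```python
import numpy as np


def normalized_weighted_kendall(rank, gt_score):
    """Normalized weighted Kendall distance between a predicted rank and ground-truth scores."""
    rank = np.asarray(rank, dtype=float)
    gt = np.asarray(gt_score, dtype=float)
    d_rank = np.sign(rank[:, None] - rank[None, :])
    d_gt = np.sign(gt[:, None] - gt[None, :])
    weight = np.abs(gt[:, None] - gt[None, :])              # hypothetical pair weight
    upper = np.triu(np.ones_like(weight, dtype=bool), k=1)  # each unordered pair once
    mismatch = (weight * ((d_rank * d_gt) < 0))[upper].sum()
    worst = weight[upper & (d_gt != 0)].sum()               # reversed ranking: all pairs disagree
    return float(mismatch / worst) if worst > 0 else 0.0
```

With this weighting, swapping two frames whose ground-truth scores are nearly equal costs little, while swapping a very sharp and a very blurry frame costs a lot, which matches the intent of a weighted rather than plain Kendall distance.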

For evaluation purposes, we manually sorted a set of synthetically created bursts and computed the ground-truth global scores from the user labels following the Bradley-Terry model [23], which is widely used to fit pairwise comparison results to a global ranking. A comprehensive discussion of the B-T model and of how to compute a global score from pairwise comparison labels can be found in [11, 24]. In this work, the scores were computed using a quasi-Newton accelerated Minorization-Maximization (MM) algorithm for the B-T model [24] (code: http://personal.psu.edu/drh20/code/btmatlab/). Table I shows the ranking disagreement, where lower values indicate better agreement with the ground-truth rank; the "Ours" column corresponds to the weighted Kendall distance of the proposed learned function. The FBA Overall Weights Energy (OWE) can also be used as a comparison criterion, because it represents the overall importance of each image in the FBA deblurring process. A set of no-reference metrics, namely the Cumulative Probability of Blur Detection (CPBD) [25], Gradients of Small Magnitudes [11] and Normalized Sparsity [26], was also used to rank the bursts.
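To make the ground-truth construction concrete, the following sketch fits Bradley-Terry strengths to a matrix of pairwise preference counts with the basic MM update for the B-T model [24]. It omits the quasi-Newton acceleration used in this work, assumes that every item wins at least once and that the comparison graph is connected, and uses hypothetical names.

```python
import numpy as np


def bradley_terry_mm(wins, n_iter=10_000, tol=1e-12):
    """Bradley-Terry strengths via the (unaccelerated) MM update.

    wins[i, j] = number of times item i was preferred over item j (zero diagonal).
    Returns strengths gamma normalized to sum to one.
    """
    wins = np.asarray(wins, dtype=float)
    n_ij = wins + wins.T                    # number of comparisons between i and j
    total_wins = wins.sum(axis=1)
    gamma = np.full(len(wins), 1.0 / len(wins))
    for _ in range(n_iter):
        # MM step: gamma_i <- W_i / sum_j n_ij / (gamma_i + gamma_j)
        denom = (n_ij / (gamma[:, None] + gamma[None, :])).sum(axis=1)
        new = total_wins / denom
        new /= new.sum()
        converged = np.max(np.abs(new - gamma)) < tol
        gamma = new
        if converged:
            break
    return gamma
```

In the burst setting, "preferred" would mean "judged blurrier" (or sharper) by the annotators, and the fitted strengths then serve as the global ground-truth score against which rankings are compared.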

V Results

As can be seen in Table I, our ranking function outperforms current IQA metrics by a large margin in recognizing the relative rank of blurred images. The mean execution time for sorting a burst of ten images with our VGG16-based function is approximately … seconds. OWE ranked second, but it requires computing the Fourier transform of every frame, which is time-consuming for high-resolution images.

Burst No. Ours OWE [5] CPBD [25] [11] [26]
1 0.0003 0.2170 0.0876 0.3261 0.0064
2 0.0074 0.0329 0.1164 0.6196 0.0537
3 0.0120 0.2173 0.5016 0.3997 0.0356
4 0.0000 0.0161 0.0476 0.6323 0.0328
5 0.0083 0.1489 0.2257 0.2930 0.0588
6 0.0051 0.0432 0.7396 0.1713 0.1215
7 0.0066 0.0911 0.1454 0.2961 0.4918
8 0.0058 0.2157 0.0643 0.3250 0.0052
9 0.0222 0.1529 0.1773 0.0499 0.0230
10 0.0156 0.0566 0.2404 0.0788 0.3213
11 0.0075 0.2773 0.2293 0.5058 0.0371
12 0.0027 0.0673 0.2408 0.1960 0.1016
13 0.0244 0.1922 0.6090 0.0514 0.0927
14 0.0221 0.0123 0.3927 0.0140 0.1193
15 0.0093 0.2372 0.0808 0.0413 0.4688
16 0.0007 0.1023 0.2020 0.0243 0.3930
17 0.0002 0.0000 0.0171 0.0638 0.3587
18 0.0015 0.0908 0.0537 0.4191 0.0059
19 0.0117 0.1014 0.5045 0.0117 0.0870
20 0.0663 0.2845 0.2230 0.2786 0.3533
21 0.0046 0.0547 0.0471 0.7965 0.3560
22 0.0000 0.0731 0.2417 0.1545 0.0127
23 0.0089 0.0618 0.3373 0.0117 0.1891
24 0.0190 0.0831 0.0752 0.8395 0.3024
25 0.0049 0.2141 0.2666 0.0331 0.0316
26 0.0000 0.0134 0.0863 0.0132 0.4964
27 0.0017 0.0234 0.1128 0.0187 0.1051
28 0.0099 0.2013 0.3246 0.4804 0.0676
29 0.0052 0.0751 0.3035 0.1438 0.3036
30 0.0323 0.2245 0.3827 0.0722 0.3467
Mean 0.0105 0.1194 0.2359 0.2454 0.1793
TABLE I: Weighted Kendall distance for 30 bursts, measuring the agreement with the ground truth of our VGG16-based ranking function and of the OWE [5], CPBD [25], Gradients of Small Magnitudes [11] and Normalized Sparsity [26] metrics. Lower values correspond to better agreement with the ground truth.

For further analysis, we generated a set of bursts, with the ground-truth score defined by Eq. 1. The statistical difference between the proposed ranking function and the state-of-the-art metrics was assessed with the Friedman test [27] at a significance level of . The test yielded a p-value of , strongly rejecting the null hypothesis that all metrics have similar mean performance. The Nemenyi post-hoc test [28] showed that is significantly different from the other ranking methods, and therefore better. The boxplots of the analyzed metrics in Fig. 5 reinforce these statistical results. Note that the small standard deviation suggests a low dependence on image content, unlike the feature.
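The two tests above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' analysis: the sketch ranks the methods within each burst (lowest distance gets rank 1, ties averaged), computes the Friedman chi-square statistic from the average ranks, and derives the Nemenyi critical difference CD = q_alpha * sqrt(k(k+1)/(6N)). The q_alpha values at alpha = 0.05 are the commonly tabulated ones (Demšar, 2006), and the closed-form p-value only covers an even number of degrees of freedom, which holds for the k = 5 methods of Table I. `scipy.stats.friedmanchisquare` is a library alternative for the statistic and p-value. The usage runs on the first five rows of Table I only, so it does not reproduce the paper's reported values.

```python
import math

# Critical values q_alpha of the Nemenyi test at alpha = 0.05 (as tabulated by
# Demsar, 2006), indexed by the number of compared methods k.
Q_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850}


def row_ranks(values):
    """Rank one burst's scores (lowest distance gets rank 1), averaging ties."""
    order = sorted(range(len(values)), key=lambda j: values[j])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based mean position of the tied group
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def friedman_nemenyi(rows):
    """rows[b][m] is the distance of method m on burst b.

    Returns (average ranks, Friedman chi-square, p-value, Nemenyi critical difference).
    """
    n, k = len(rows), len(rows[0])
    ranked = [row_ranks(r) for r in rows]
    avg = [sum(r[m] for r in ranked) / n for m in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(a * a for a in avg) - k * (k + 1) ** 2 / 4)
    df = k - 1
    if df % 2:
        raise NotImplementedError("closed-form tail only sketched for even degrees of freedom")
    # Chi-square survival function for even df = 2m: exp(-x/2) * sum_{i<m} (x/2)^i / i!
    half = chi2 / 2
    p = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    cd = Q_005[k] * math.sqrt(k * (k + 1) / (6 * n))
    return avg, chi2, p, cd


# Usage on the first five bursts of Table I (columns: ours, [5], [25], [11], [26]).
table_rows = [
    [0.0003, 0.2170, 0.0876, 0.3261, 0.0064],
    [0.0074, 0.0329, 0.1164, 0.6196, 0.0537],
    [0.0120, 0.2173, 0.5016, 0.3997, 0.0356],
    [0.0000, 0.0161, 0.0476, 0.6323, 0.0328],
    [0.0083, 0.1489, 0.2257, 0.2930, 0.0588],
]
avg_ranks, chi2_stat, p_value, crit_diff = friedman_nemenyi(table_rows)
print(avg_ranks, chi2_stat, p_value, crit_diff)
```

Two methods differ significantly under the Nemenyi test when their average ranks differ by more than the critical difference.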

Fig. 5: Boxplot of the synthetic blurred burst ranking with the proposed method and , , and metrics.

Table II shows the execution time for sorting a full burst in real datasets of varying length. To further improve computational performance, the network model could be compressed, or Quicksort could be used as the sorting algorithm. Fig. 6 shows the resulting order of increasing blur for the Pueblo Cabo Polonio dataset. The sorting took 1.33 seconds, as reported in Table II, and no prior knowledge about the burst order was used in network training. The set was also sorted by hand, and the global score was computed as before using the Bradley-Terry (B-T) model [23]. The resulting weighted Kendall distance was , with the ranks of only frames 5-6 (first row of the figure) and frames 13-14 (last two images of the second row) mistaken.
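The global score mentioned above comes from the Bradley-Terry model [23], which assigns each frame a strength such that frame i is preferred over frame j with probability p_i / (p_i + p_j). A minimal Python sketch of fitting these strengths from a pairwise win matrix with the MM iteration of Hunter [24] follows. The win counts are hypothetical, and how the paper builds its comparisons (Eq. 1) is not reproduced here.

```python
def bradley_terry(wins, iters=200):
    """Fit Bradley-Terry strengths with the MM update of Hunter [24].

    wins[i][j] is the number of comparisons in which frame i was judged sharper
    than frame j. Returns strengths normalized to sum to 1 (higher = sharper).
    """
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            w_i = sum(wins[i][j] for j in range(n) if j != i)
            # Only pairs that were actually compared contribute to the denominator.
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i and wins[i][j] + wins[j][i])
            new.append(w_i / denom if denom else p[i])
        total = sum(new)
        if total == 0:
            return p  # no wins recorded at all: keep the current (uniform) strengths
        p = [v / total for v in new]
    return p


# Hypothetical outcomes of repeated comparisons among three frames.
wins = [[0, 3, 4],
        [1, 0, 3],
        [0, 1, 0]]
scores = bradley_terry(wins)
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
print(ranking)  # frame indices, sharpest first
```

The MM update keeps every strength non-negative and increases the likelihood at each step, which is why it is a common choice for fitting B-T scores without an optimization library.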

Dataset Burst length Sorting Time (sec)
Anthropologie 8 1.27
Auvers 12 1.31
Bookshelf 10 1.30
Lucky Imaging 12 1.31
Parking Night 10 1.30
Pueblo Cabo Polonio 14 1.33
Tequila 8 1.23
Woods 13 1.32
TABLE II: Sorting time for bursts of different lengths with the proposed VGG16-based function on the real-image datasets of [5].
Fig. 6: Burst sorted with the proposed comparison function on the real-image dataset Pueblo Cabo Polonio [5]. Blur increases from left to right and from top to bottom. The obtained weighted Kendall distance was when compared with the manually sorted burst.

V-A Deblurring bursts with a fixed number of frames

This experiment analyzes the influence of burst sorting on the reconstruction process. As in [7], we test the deblurring method with a fixed number of frames used during aggregation. Fig. 7 visually compares the results of our approach with those of FBA. The first five images of the synthetic dataset were sufficient to obtain a perceptually good reconstruction, and a clear improvement is noticeable even in the early steps of the deblurring. Fixing the number of frames makes the approach equivalent to multi-image deblurring methods with a fixed temporal size [6, 7, 8] for the aggregation burst. However, unlike those methods, ours bounds only the reconstruction burst, while the input burst length remains the same. The last row of Fig. 7 also shows a reconstruction on a real-image dataset, in which a result of quality very close to the final FBA deblurring is obtained within the very first frames from , notably improving on FBA at these steps.
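The fixed-size aggregation can be illustrated with a simplified, 1-D version of Fourier Burst Accumulation [5]: each frequency of the fused signal is a weighted average of the frames' spectra, with weights proportional to |V_i(k)|^p, so the frame that best preserves a given frequency dominates it. This Python sketch is an assumption-laden illustration, not the FBA or IFBA implementation: it omits the Gaussian smoothing of the Fourier magnitudes and the 2-D, per-channel processing of the original method, uses a naive O(n^2) DFT to stay dependency-free, treats p = 11 as an illustrative value only, and assumes the frames are registered and already sorted sharpest-first.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (O(n^2)), kept dependency-free for the sketch."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns the real part, since the fused spectrum stays Hermitian."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def fourier_burst_accumulation(frames, p=11):
    """Fuse registered 1-D frames, weighting frequency k of frame i by |V_i(k)|^p."""
    spectra = [dft(f) for f in frames]
    fused = []
    for k in range(len(spectra[0])):
        weights = [abs(s[k]) ** p for s in spectra]
        total = sum(weights)
        # A frequency where every frame is zero stays zero.
        fused.append(sum(w * s[k] for w, s in zip(weights, spectra)) / total if total else 0j)
    return idft(fused)


def fixed_size_aggregation(sorted_frames, n_frames, p=11):
    """Aggregate only the first n_frames of a burst sorted sharpest-first."""
    return fourier_burst_accumulation(sorted_frames[:n_frames], p)


sharp = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
blurred = [0.0, 0.0, 1 / 3, 1 / 3, 1 / 3, 0.0, 0.0, 0.0]
burst = [sharp, blurred]  # assumed already sorted, sharpest first
for n in (1, 2):
    print(n, [round(v, 3) for v in fixed_size_aggregation(burst, n)])
```

Sorting the burst first means that a small fixed `n_frames` already draws on the sharpest frames, which is the effect the experiment measures.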

Fig. 7 panels: columns show 2, 3, 4 and 5 frames; rows alternate FBA and IFBA (Ours).

Fig. 7: Fixed-size aggregation results using FBA and the proposed IFBA. Only four aggregation steps are executed, over for FBA and for IFBA. The first four rows are synthetic bursts and the last two rows correspond to a real dataset from [5].

V-B Reconstruction degradation recognition

To evaluate the robustness of the automatic image selection mechanism, we first test it on a dataset of misaligned frames. Inaccurate registration usually occurs when images are heavily blurred, because feature extractors then fail to detect fiducial points, such as corners or gradients, correctly. The dataset is a shifted version of the bursts from [5], in which half of the frames were randomly selected and shifted. Fig. 8 depicts some of the results. The FBA results degrade severely because frame misalignment causes non-corresponding frequencies to be fused; this mistaken aggregation produces numerous artifacts and a blurred reconstruction. The proposed incremental FBA, by contrast, fuses only the frames of the burst that do not deteriorate the final image, as expected. As the figure shows, our algorithm substantially outperforms the methods from the literature.
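The degradation recognition behind IFBA can be sketched as a loop that adds frames from the sorted burst one at a time and stops as soon as the reconstruction quality drops. In this minimal Python sketch, `fuse`, `quality` and `tol` are hypothetical parameters (for instance, an FBA fusion and some no-reference sharpness score); the actual quality criterion and threshold of the method are defined elsewhere in the paper and are not reproduced.

```python
def incremental_aggregation(sorted_frames, fuse, quality, tol=0.0):
    """Add frames sharpest-first; stop at the first step whose quality drops by more than tol.

    fuse(frames) -> reconstruction and quality(reconstruction) -> float (higher is better)
    are hypothetical callables supplied by the caller. Returns (reconstruction, frames used).
    """
    if not sorted_frames:
        raise ValueError("empty burst")
    best = fuse(sorted_frames[:1])
    best_q = quality(best)
    used = 1
    for k in range(2, len(sorted_frames) + 1):
        candidate = fuse(sorted_frames[:k])
        q = quality(candidate)
        if q < best_q - tol:
            break  # degradation recognized: interrupt the aggregation
        best, best_q, used = candidate, q, k
    return best, used


# Toy usage: scalar "frames", mean fusion, and a quality score peaking at 1.0.
# The fourth frame plays the role of a misaligned or out-of-context frame.
toy_burst = [1.0, 1.0, 1.0, 5.0, 1.0]
result, n_used = incremental_aggregation(
    toy_burst,
    fuse=lambda fs: sum(fs) / len(fs),
    quality=lambda v: -abs(v - 1.0),
)
print(result, n_used)
```

Stopping at the first degradation, rather than searching over subsets, keeps the cost linear in the burst length, which is what makes the prior sorting step important.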

Fig. 8 panels: Random shot, fastMBD [29], FBA [5], IFBA (Ours).

Fig. 8: Deblurring of misaligned bursts from a shifted version of the dataset of [5] using fastMBD, FBA, and our IFBA.

Another experiment was performed on a dataset of real motion-blurred images. For this purpose, we recreated another common real-life situation in which, while a burst is being captured, a moving object enters the scene and partially or entirely occludes the target object. A total of 27 frames were captured under camera shake and motion blur. Registration was performed by extracting and matching SURF [30] features and estimating a similarity transformation with RANSAC [31] to discard outlier matches. Fig. 9 displays a sample of the captured frames and the results. The fastMBD and FBA reconstructions are brighter because these methods use all the frames of the burst in the reconstruction process, and ringing artifacts and blurred regions also appear whenever all images in the burst are considered. In contrast, our proposal once again performs an effective image selection, yielding a better fusion that uses only the first seven frames of the burst.
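The registration step can be sketched as follows, assuming SURF keypoints have already been detected and matched (for instance with OpenCV, not shown). Representing 2-D points as complex numbers, a similarity transform is w = a*z + b, where |a| is the scale, arg(a) the rotation and b the translation. RANSAC [31] repeatedly fits this model to two random correspondences, keeps the hypothesis with the most inliers, and refits on those inliers by least squares. The threshold, iteration count and matched points below are illustrative assumptions, not the settings used in the experiment.

```python
import random


def fit_similarity(src, dst):
    """Least-squares similarity w = a*z + b between complex-valued point sets.

    |a| is the uniform scale and arg(a) the rotation angle; b is the translation.
    """
    n = len(src)
    zc, wc = sum(src) / n, sum(dst) / n
    num = sum((w - wc) * (z - zc).conjugate() for z, w in zip(src, dst))
    den = sum(abs(z - zc) ** 2 for z in src)
    a = num / den
    return a, wc - a * zc


def ransac_similarity(src, dst, thresh=1.0, iters=500, seed=0):
    """RANSAC over minimal two-point samples; refit on the largest inlier set."""
    rng = random.Random(seed)
    idx = range(len(src))
    best = []
    for _ in range(iters):
        i, j = rng.sample(idx, 2)
        if src[i] == src[j]:
            continue  # degenerate sample: scale and rotation undefined
        a, b = fit_similarity([src[i], src[j]], [dst[i], dst[j]])
        inliers = [k for k in idx if abs(a * src[k] + b - dst[k]) < thresh]
        if len(inliers) > len(best):
            best = inliers
    if len(best) < 2:
        raise ValueError("no valid similarity hypothesis found")
    a, b = fit_similarity([src[k] for k in best], [dst[k] for k in best])
    return a, b, best


# Hypothetical matched keypoints (x + iy) related by a known similarity, plus 3 bad matches.
a_true, b_true = complex(1.1, 0.2), complex(5, -3)
src = [complex(x, y) for x in range(5) for y in range(4)]
dst = [a_true * z + b_true for z in src]
for k in (2, 7, 13):
    dst[k] += complex(40, 40)
a_est, b_est, inliers = ransac_similarity(src, dst)
print(a_est, b_est, len(inliers))
```

Two correspondences are the minimal sample for a similarity transform, which keeps the chance of drawing an outlier-free hypothesis high even with many bad matches.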

Fig. 9 panels: frames sample from the motion blur dataset; fastMBD [29], FBA [5], IFBA (Ours).

Fig. 9: Deblurring results on a motion-blur dataset with partial and full occlusion of the scene. The first row shows a numbered sample of the frames in the burst; the second row shows the reconstructions obtained with our method and the other approaches.

VI Conclusions

In this work, we proposed a new method for the relative ranking of frames within a burst that uses a CNN as a comparison function. We also introduced an incremental aggregation with reconstruction degradation recognition that fuses only images that do not reduce the reconstruction quality. We conducted several experiments to validate the proposed burst sorting and incremental aggregation, which demonstrated the superiority of our approach over similar methods in the wild. The burst sorting algorithm agrees well with the ground-truth rank while outperforming other metrics from the literature by a large margin. Our incremental aggregation also improved the results on bursts containing misaligned and out-of-context frames.

Acknowledgment

We acknowledge financial support from the Brazilian funding agencies FACEPE, CAPES and CNPq. This work was supported by the research cooperation project between Motorola Mobility LLC (a Lenovo Company) and the Center for Informatics of the Federal University of Pernambuco. The authors would also like to thank Leonardo Coutinho de Mendonça, Alexandre Cabral Mota and Rudi Minghim for valuable discussions.

References

  • [1] P. Wieschollek, M. Hirsch, B. Schölkopf, and H. P. Lensch, “Learning blind motion deblurring,” in IEEE International Conference on Computer Vision (ICCV), vol. 4, 2017.
  • [2] T. Buades, Y. Lou, J.-M. Morel, and Z. Tang, “A note on multi-image denoising,” in IEEE International Workshop on Local and Non-Local Approximation in Image Processing (LNLA), 2009, pp. 1–15.
  • [3] H. Zhang, D. Wipf, and Y. Zhang, “Multi-image blind deblurring using a coupled adaptive sparse prior,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 1051–1058.
  • [4] B. Carignan, J.-F. Daneault, and C. Duval, “Quantifying the importance of high frequency components on the amplitude of physiological tremor,” Experimental Brain Research, vol. 202, no. 2, pp. 299–306, 2010.
  • [5] M. Delbracio and G. Sapiro, “Removing camera shake via weighted Fourier Burst Accumulation,” IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 3293–3307, 2015.
  • [6] A. Chakrabarti, “A neural approach to blind motion deblurring,” in European Conference on Computer Vision (ECCV), 2016, pp. 221–235.
  • [7] P. Wieschollek, B. Schölkopf, H. P. Lensch, and M. Hirsch, “End-to-end learning for image burst deblurring,” in Asian Conference on Computer Vision (ACCV), 2016, pp. 35–51.
  • [8] M. Noroozi, P. Chandramouli, and P. Favaro, “Motion Deblurring in the Wild,” in German Conference on Pattern Recognition, 2017, pp. 65–77.
  • [9] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
  • [10] S. Zagoruyko and N. Komodakis, “Learning to compare image patches via convolutional neural networks,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 4353–4361.
  • [11] Y. Liu, J. Wang, S. Cho, A. Finkelstein, and S. Rusinkiewicz, “A no-reference metric for evaluating the quality of motion deblurring.” ACM Transactions on Graphics (TOG), vol. 32, no. 6, pp. 175–1, 2013.
  • [12] S. Su, M. Delbracio, J. Wang, G. Sapiro, W. Heidrich, and O. Wang, “Deep Video Deblurring for Hand-held Cameras,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1279–1288.
  • [13] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in European Conference on Computer Vision (ECCV), 2014, pp. 740–755.
  • [14] G. Boracchi and A. Foi, “Modeling the performance of image restoration from motion blur,” IEEE Transactions on Image Processing, vol. 21, no. 8, pp. 3502–3517, 2012.
  • [15] Y. Li, Z. Wang, G. Dai, S. Wu, S. Yu, and Y. Xie, “Evaluation of realistic blurring image quality by using a shallow convolutional neural network,” in IEEE International Conference on Information and Automation (ICIA), 2017, pp. 853–857.
  • [16] S. Bosse, D. Maniry, K.-R. Müller, T. Wiegand, and W. Samek, “Deep neural networks for no-reference and full-reference image quality assessment,” IEEE Transactions on Image Processing, vol. 27, no. 1, pp. 206–219, 2018.
  • [17] S. Yu, F. Jiang, L. Li, and Y. Xie, “CNN-GRNN for image sharpness assessment,” in Asian Conference on Computer Vision (ACCV), 2016, pp. 50–61.
  • [18] H. Chang, F. Yu, J. Wang, D. Ashley, and A. Finkelstein, “Automatic triage for a photo series,” ACM Transactions on Graphics (TOG), vol. 35, no. 4, p. 148, 2016.
  • [19] B. Wang, N. Vesdapunt, U. Sinha, and L. Zhang, “Real-time Burst Photo Selection Using a Light-Head Adversarial Network,” arXiv preprint arXiv:1803.07212, 2018.
  • [20] J. Anger and E. Meinhardt-Llopis, “Implementation of Local Fourier Burst Accumulation for Video Deblurring,” Image Processing On Line, vol. 7, pp. 56–64, 2017.
  • [21] T. Tieleman and G. Hinton, “Lecture 6.5 - RMSProp, COURSERA: Neural networks for machine learning,” Technical Report, 2012.
  • [22] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in 13th International Conference on Artificial Intelligence and Statistics, 2010, pp. 249–256.
  • [23] R. A. Bradley and M. E. Terry, “Rank Analysis of Incomplete Block designs: The method of paired comparisons,” Biometrika, vol. 39, no. 3/4, pp. 324–345, 1952.
  • [24] D. R. Hunter, “MM algorithms for generalized Bradley-Terry models,” The Annals of Statistics, vol. 32, no. 1, pp. 384–406, 2004.
  • [25] N. D. Narvekar and L. J. Karam, “A no-reference image blur metric based on the cumulative probability of blur detection (CPBD),” IEEE Transactions on Image Processing, vol. 20, no. 9, pp. 2678–2683, 2011.
  • [26] D. Krishnan, T. Tay, and R. Fergus, “Blind deconvolution using a normalized sparsity measure,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 233–240.
  • [27] M. Friedman, “A comparison of alternative tests of significance for the problem of m rankings,” The Annals of Mathematical Statistics, vol. 11, no. 1, pp. 86–92, 1940.
  • [28] P. Nemenyi, “Distribution-free multiple comparisons,” Biometrics, vol. 18, no. 2, p. 263, 1962.
  • [29] F. Sroubek and P. Milanfar, “Robust multichannel blind deconvolution via fast alternating minimization,” IEEE Transactions on Image Processing, vol. 21, no. 4, pp. 1687–1700, 2012.
  • [30] H. Bay, T. Tuytelaars, and L. Van Gool, “SURF: Speeded up robust features,” in European Conference on Computer Vision (ECCV), 2006, pp. 404–417.
  • [31] M. A. Fischler and R. C. Bolles, “Random Sample Consensus: a paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.