A Bayesian Hyperprior Approach for Joint Image Denoising and Interpolation, with an Application to HDR Imaging

06/10/2017 · Cecilia Aguerrebere et al. · Duke University · Télécom ParisTech

Recently, impressive denoising results have been achieved by Bayesian approaches which assume Gaussian models for the image patches. This improvement in performance can be attributed to the use of per-patch models. Unfortunately, such an approach is particularly unstable for most inverse problems beyond denoising. In this work, we propose the use of a hyperprior to model image patches, in order to stabilize the estimation procedure. There are two main advantages to the proposed restoration scheme. First, it is adapted to diagonal degradation matrices, and in particular to missing data problems (e.g. inpainting of missing pixels or zooming). Second, it can deal with signal-dependent noise models, which makes it particularly suited to digital cameras. As such, the scheme is especially adapted to computational photography. To illustrate this point, we provide an application to high dynamic range imaging from a single image taken with a modified sensor, which shows the effectiveness of the proposed scheme.


I Introduction

Digital images are subject to a wide variety of degradations, which in most cases can be modeled as

$\mathbf{z} = \mathbf{A}\mathbf{c} + \mathbf{n}, \qquad (1)$

where $\mathbf{z}$ is the observation, $\mathbf{A}$ is the degradation operator, $\mathbf{c}$ is the underlying ground-truth image and $\mathbf{n}$ is additive noise. Different settings of the degradation matrix $\mathbf{A}$ model different problems such as zooming, deblurring or missing pixels. Different versions of the noise term $\mathbf{n}$ include the classical additive Gaussian noise with constant variance or more complicated and realistic models such as signal-dependent noise.
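To make Model (1) concrete, the following sketch (in Python, with the notation of the reconstruction above) simulates a missing-pixel degradation with a diagonal operator and additive Gaussian noise; the masking rate and noise level are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth image c (a random stand-in here), flattened to a vector.
c = rng.uniform(0.0, 255.0, size=(64, 64)).ravel()

# Diagonal degradation operator A: a binary mask keeping ~30% of the pixels
# (i.e. 70% missing), as in an interpolation/inpainting problem.
keep = rng.random(c.size) < 0.3
A_diag = keep.astype(float)           # diagonal of A

# Additive Gaussian noise with constant variance (simplest instance of n).
sigma = 5.0
n = rng.normal(0.0, sigma, size=c.size)

# Observation z = A c + n; in practice only the observed entries are used.
z = A_diag * c + n
print("observed pixels:", int(keep.sum()), "of", c.size)
```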

Due to the inherent ill-posedness of such inverse problems, standard approaches impose some prior on the image, in either variational or Bayesian approaches. Popular image models have been proposed through the total variation [1], wavelet decompositions [2] or the sparsity of image patches [3]. Buades et al. [4] introduced the use of patches and the self-similarity hypothesis to the denoising problem leading to a new era of patch-based image restoration techniques. A major step forward in fully exploiting the potential of patches was achieved by several state-of-the-art restoration methods with the introduction of patch prior models, in a Bayesian framework. Some methods are devoted to the denoising problem [5, 6, 7, 8], while others propose a more general framework for the solution of image inverse problems [9, 10], including inpainting, deblurring and zooming. The work by Lebrun et al. [11, 7] presents a thorough analysis of several recent restoration methods, revealing their common roots and their relationship with the Bayesian approach.

Among the state-of-the-art restoration methods, two noticeable approaches are the patch-based Bayesian approach by Yu et al. [10], namely the piece-wise linear estimators (PLE), and the non-local Bayes (NLB) algorithm by Lebrun et al. [7]. PLE is a general framework for the solution of image inverse problems under Model (1), while NLB is a denoising method (i.e. the case where $\mathbf{A}$ is the identity in Model (1)). Both methods use a Gaussian patch prior learnt from image patches through iterative procedures. In the case of PLE, patches are modeled according to a Gaussian Mixture Model (GMM) with a relatively small number of classes (19 in all their experiments), whose parameters are learnt from all image patches (the authors actually report the use of image sub-regions in their experiments, so PLE may be considered a semi-local approach). In the case of NLB, each patch is associated with a single Gaussian model, whose parameters are computed from similar patches chosen from a local neighbourhood, i.e., a search window centered at the patch. We refer hereafter to this kind of per-patch modelling as local.

Zoran and Weiss [9] (EPLL) follow a similar approach, but instead of iteratively updating the GMM from image patches, they use a larger number of classes that are fixed and learnt from a large database of natural image patches. Wang and Morel [8] claim that, in the case of denoising, it is better to have fewer models that are updated with the image patches (as in PLE) than having a large number of fixed models (as in EPLL).

All of the previous restoration approaches share a common Bayesian framework based on Gaussian patch priors. Relying on local priors [7, 8] has proven more accurate for the task of image denoising than relying on a mixture of a limited number of Gaussian models [9, 10]. In particular, NLB outperforms PLE for this task [12], mostly due to its local model estimation. On the other hand, PLE yields state-of-the-art results in other applications such as interpolation of missing pixels (especially with high masking rates), deblurring and zooming.

As a consequence we are interested in taking advantage of a local patch modelling for more general inverse problems than denoising. The main difficulty lies in the estimation of the models, especially when the image degradations involve a high rate of missing pixels, in which case the estimation is seriously ill-posed.

In this work we propose to model image patches according to a Gaussian prior, whose parameters, the mean $\boldsymbol{\mu}$ and the covariance matrix $\boldsymbol{\Sigma}$, will be estimated locally from similar patches. In order to tackle this problem, we include prior knowledge on the model parameters making use of a hyperprior, i.e. a probability distribution on the parameters of the prior. In Bayesian statistics, $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are known as hyperparameters, while the prior on them is called a hyperprior. Such a framework is often called hierarchical Bayesian modelling [13]. Its application to inverse problems in imaging is not new. In particular, in the field of image restoration, this methodology was proposed by Molina et al. [14, 15], and was more recently applied to image unmixing problems [16] and to image deconvolution and the estimation of the point spread function of a camera [17]. However, to our knowledge, this is the first time that such a hierarchical Bayesian methodology is used to reduce ill-posedness in patch-based image restoration. In this context, the use of a hyperprior compensates for the information missing from the patches.

There are two main contributions in this work. First, as described above, we propose a robust framework enabling the use of Gaussian local priors on image patches for solving a useful family of restoration problems, by drawing on a hierarchical Bayesian approach. Second, the proposed framework is able to deal with signal-dependent noise, therefore making it adapted to realistic digital photography applications.

Experiments on both synthetic and real data show that the approach is well suited to various problems involving a diagonal degradation operator. First, we show state-of-the-art results in image restoration problems such as denoising, zooming and interpolation of missing pixels. Then we consider the generation of high dynamic range (HDR) images from a single snapshot using spatially varying pixel exposures [18] and demonstrate that our approach significantly outperforms existing methods to deal with this inverse problem. It is worth mentioning that modified sensors enabling such approaches have been recently made available by Sony but are not yet fully exploited by available smartphones and digital cameras.

The article is organized as follows. Section II introduces the proposed approach while Section III presents the main implementation aspects. Supportive experiments are presented in Section IV. Section V is devoted to the application of the proposed framework to the HDR imaging problem. Last, conclusions are summarized in Section VI.

II Hyperprior Bayesian Estimator

Fig. 1: Diagram of the proposed iterative approach.

The proposed restoration method, called Hyperprior Bayesian Estimator (HBE), assumes a Gaussian prior for image patches, with parameters $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$. A joint maximum a posteriori formulation is used to estimate both the image patches and the parameters $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$, thanks to a Bayesian hyperprior model on these parameters, stabilizing the local estimation of the Gaussian statistics. As a consequence, we can exploit the accuracy of local model estimation for general restoration problems, in particular with missing values (e.g. for interpolation or zooming). Figure 1 illustrates the proposed approach, which is described in detail in the following.

II-A Patch degradation model

The observed image is decomposed into overlapping patches of size $\sqrt{d} \times \sqrt{d}$. Each patch $\mathbf{z}$ is considered to be a realization of the random variable $\mathbf{Z}$ given by

$\mathbf{Z} = \mathbf{A}\mathbf{C} + \mathbf{N}, \qquad (2)$

where $\mathbf{A}$ is a degradation operator, $\mathbf{C}$ is the original patch we seek to estimate and $\mathbf{N}$ is an additive noise term, modeled by a Gaussian distribution $\mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_{\mathbf{n}})$. Therefore, the distribution of $\mathbf{Z}$ given $\mathbf{C}$ can be written as

$p(\mathbf{Z} \mid \mathbf{C}) \sim \mathcal{N}(\mathbf{A}\mathbf{C}, \boldsymbol{\Sigma}_{\mathbf{n}}). \qquad (3)$

In this noise model, the matrix $\boldsymbol{\Sigma}_{\mathbf{n}}$ is only assumed to be diagonal (the noise is uncorrelated across pixels). It can represent a constant variance, spatially variable variances or even variances dependent on the pixel value (to approximate Poisson noise).

This degradation model is deliberately generic. We will see in Section IV that keeping a broad noise model is essential to properly tackle the problem of HDR imaging from a single image. The model also includes the special case of multiplicative noise.
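As an illustration of the kind of diagonal noise covariance allowed here, the sketch below builds $\boldsymbol{\Sigma}_{\mathbf{n}}$ for one patch from an affine variance law approximating Poisson noise; the constants are placeholders rather than calibrated camera values.

```python
import numpy as np

def noise_covariance(patch, a=0.5, b=4.0):
    """Diagonal noise covariance for one patch.

    Each pixel gets its own variance a*patch + b (an affine approximation of
    Poisson + readout noise); pixels are assumed uncorrelated, so the
    covariance matrix is diagonal.
    """
    variances = a * np.clip(patch, 0.0, None) + b
    return np.diag(variances.ravel())

patch = np.full((8, 8), 100.0)          # a flat 8x8 patch as a toy example
Sigma_n = noise_covariance(patch)
print(Sigma_n.shape, Sigma_n[0, 0])     # (64, 64), per-pixel variance
```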

II-B Joint Maximum A Posteriori

We assume a Gaussian prior for each patch, with unknown mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$. To simplify calculations we work with the precision matrix $\boldsymbol{\Lambda} = \boldsymbol{\Sigma}^{-1}$. As is usual when considering hyperpriors, we assume that the parameters $\boldsymbol{\mu}$ and $\boldsymbol{\Lambda}$ follow a conjugate distribution. In our case, that boils down to assuming a Normal-Wishart prior for the couple $(\boldsymbol{\mu}, \boldsymbol{\Lambda})$ (the Normal-Wishart distribution is the conjugate prior of a multivariate normal distribution with unknown mean and covariance matrix; $\mathcal{W}$ denotes the Wishart distribution [19]),

$p(\boldsymbol{\mu}, \boldsymbol{\Lambda}) = \mathcal{N}\!\left(\boldsymbol{\mu} \mid \boldsymbol{\mu}_0, (\kappa\boldsymbol{\Lambda})^{-1}\right) \, \mathcal{W}\!\left(\boldsymbol{\Lambda} \mid (\nu\boldsymbol{\Sigma}_0)^{-1}, \nu\right), \qquad (4)$

with parameters $\boldsymbol{\mu}_0$ (prior mean), $\boldsymbol{\Sigma}_0$ (prior covariance), scale parameter $\kappa$ and $\nu$ degrees of freedom.

Now, assume that we observe a group of $M$ similar patches $\{\mathbf{z}_i\}_{i=1,\dots,M}$ and that we want to recover the restored patches $\{\mathbf{c}_i\}_{i=1,\dots,M}$. If these unknown $\mathbf{c}_i$ are independent (we rely on the classical independence assumption made in the patch-based literature, even if it is wrong in case patches overlap) and follow the same Gaussian model, we can compute the joint maximum a posteriori

$(\{\hat{\mathbf{c}}_i\}, \hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\Lambda}}) = \arg\max_{\{\mathbf{c}_i\}, \boldsymbol{\mu}, \boldsymbol{\Lambda}} \; p(\{\mathbf{c}_i\}, \boldsymbol{\mu}, \boldsymbol{\Lambda} \mid \{\mathbf{z}_i\}) = \arg\max_{\{\mathbf{c}_i\}, \boldsymbol{\mu}, \boldsymbol{\Lambda}} \; \prod_{i=1}^{M} p(\mathbf{z}_i \mid \mathbf{c}_i) \, \prod_{i=1}^{M} p(\mathbf{c}_i \mid \boldsymbol{\mu}, \boldsymbol{\Lambda}) \; p(\boldsymbol{\mu}, \boldsymbol{\Lambda}). \qquad (5)$

In this product, the first term is given by the noise model (3), the second one is the Gaussian prior on the set of patches and the third one is the hyperprior (4). In the last equality we omit the explicit dependence on $\boldsymbol{\mu}$ and $\boldsymbol{\Lambda}$ in $p(\mathbf{z}_i \mid \mathbf{c}_i)$, since these parameters are completely determined by the set $\{\mathbf{c}_i\}$.
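To illustrate the hyperprior, the following sketch draws a pair $(\boldsymbol{\mu}, \boldsymbol{\Lambda})$ from a Normal-Wishart distribution with the parameterization used in the reconstruction of (4) above; the dimension and hyper-parameter values are arbitrary.

```python
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(0)

d = 16                                   # patch dimension (e.g. 4x4 patches)
mu0 = np.zeros(d)                        # prior mean
Sigma0 = np.eye(d)                       # prior covariance
kappa, nu = 10.0, d + 2                  # scale parameter and degrees of freedom

# Lambda ~ Wishart with scale (nu * Sigma0)^{-1} and nu degrees of freedom,
# so that E[Lambda] = Sigma0^{-1}.
W = np.linalg.inv(nu * Sigma0)
Lam = wishart(df=nu, scale=W).rvs(random_state=0)

# mu | Lambda ~ N(mu0, (kappa * Lambda)^{-1})
cov_mu = np.linalg.inv(kappa * Lam)
mu = rng.multivariate_normal(mu0, cov_mu)

print(mu.shape, Lam.shape)               # (16,), (16, 16)
```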

II-C Optimality conditions

Computing the joint maximum a posteriori amounts to minimizing the negative log-posterior

$f(\{\mathbf{c}_i\}, \boldsymbol{\mu}, \boldsymbol{\Lambda}) = -\log p(\{\mathbf{c}_i\}, \boldsymbol{\mu}, \boldsymbol{\Lambda} \mid \{\mathbf{z}_i\})$

over the set $(\mathbb{R}^d)^M \times \mathbb{R}^d \times \mathcal{S}_d^{++}$, with $\mathcal{S}_d^{++}$ the set of real symmetric positive definite matrices of size $d \times d$.

The function $f$ is biconvex, respectively in the variables $(\{\mathbf{c}_i\}, \boldsymbol{\mu})$ and $\boldsymbol{\Lambda}$. To minimize this energy for a given set of hyper-parameters $(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0, \kappa, \nu)$, we will use an alternating convex minimization scheme. At each iteration, $f$ is first minimized with respect to $(\{\mathbf{c}_i\}, \boldsymbol{\mu})$ with $\boldsymbol{\Lambda}$ fixed, then vice versa.

Differentiating with respect to each variable, we get explicit optimality equations for the minimization scheme. The proofs of the following propositions are straightforward and available in the supplementary material.

Proposition 1.

Assume that $\boldsymbol{\Lambda}$ is fixed and that the noise covariance $\boldsymbol{\Sigma}_{\mathbf{n}}$ does not depend on the $\mathbf{c}_i$. The function $f$ is convex on $(\mathbb{R}^d)^M \times \mathbb{R}^d$ and attains its minimum at $(\{\hat{\mathbf{c}}_i\}, \hat{\boldsymbol{\mu}})$, given by

$\hat{\boldsymbol{\mu}} = \dfrac{M\,\bar{\mathbf{c}} + \kappa\,\boldsymbol{\mu}_0}{M + \kappa}, \qquad (6)$

$\hat{\mathbf{c}}_i = \hat{\boldsymbol{\mu}} + \boldsymbol{\Lambda}^{-1}\mathbf{A}_i^{T}\left(\mathbf{A}_i\boldsymbol{\Lambda}^{-1}\mathbf{A}_i^{T} + \boldsymbol{\Sigma}_{\mathbf{n}}\right)^{-1}\left(\mathbf{z}_i - \mathbf{A}_i\hat{\boldsymbol{\mu}}\right), \qquad (7)$

with $\bar{\mathbf{c}} = \frac{1}{M}\sum_{i=1}^{M}\hat{\mathbf{c}}_i$.

Proposition 2.

Assume that the variables $(\{\mathbf{c}_i\}, \boldsymbol{\mu})$ are fixed. The function $f$ is convex on $\mathcal{S}_d^{++}$ and attains its minimum at $\hat{\boldsymbol{\Lambda}}$ such that

$\hat{\boldsymbol{\Lambda}}^{-1} = \dfrac{\nu\,\boldsymbol{\Sigma}_0 + \kappa\,(\boldsymbol{\mu} - \boldsymbol{\mu}_0)(\boldsymbol{\mu} - \boldsymbol{\mu}_0)^{T} + \sum_{i=1}^{M}(\mathbf{c}_i - \boldsymbol{\mu})(\mathbf{c}_i - \boldsymbol{\mu})^{T}}{M + \nu - d}. \qquad (8)$

The expression of $\hat{\mathbf{c}}_i$ in (7) is obtained under the hypothesis that the noise covariance matrix $\boldsymbol{\Sigma}_{\mathbf{n}}$ does not depend on $\mathbf{c}_i$. Under the somewhat weaker hypothesis that the noise and the signal are uncorrelated, this estimator is also the affine estimator that minimizes the Bayes risk (c.f. supplementary material). The uncorrelatedness of the noise and the signal is a reasonable hypothesis in practice. It covers various noise models, including signal-dependent noise whose variance depends on $\mathbf{c}$ while its realization remains uncorrelated with $\mathbf{c}$, which approximates CMOS and CCD raw data noise [20].

From (6), we find that the MAP estimator of $\boldsymbol{\mu}$ is a weighted average of two terms: the mean $\bar{\mathbf{c}}$ estimated from the similar restored patches and the prior $\boldsymbol{\mu}_0$. The parameter $\kappa$ controls the confidence level we have on the prior $\boldsymbol{\mu}_0$. With the same idea, we observe that the MAP estimator (8) for $\boldsymbol{\Lambda}$ is a combination of the prior covariance $\boldsymbol{\Sigma}_0$, the covariance imposed by $\boldsymbol{\mu}$ and the covariance matrix estimated from the patches $\{\hat{\mathbf{c}}_i\}$.

II-D Alternating convex minimization of $f$

The previous propositions imply that we can derive an explicit alternating convex minimization scheme for $f$, presented in Algorithm 1. Starting with a given value of $\boldsymbol{\Lambda}$, at each step, $(\{\hat{\mathbf{c}}_i\}, \hat{\boldsymbol{\mu}})$ are computed according to Equations (6) and (7), then $\hat{\boldsymbol{\Lambda}}$ is updated according to (8).

Input: $\{\mathbf{z}_i\}_{i=1,\dots,M}$, $\boldsymbol{\mu}_0$, $\boldsymbol{\Sigma}_0$, $\kappa$, $\nu$, initial precision $\boldsymbol{\Lambda}^{(0)}$
Output: $\{\hat{\mathbf{c}}_i\}$, $\hat{\boldsymbol{\mu}}$, $\hat{\boldsymbol{\Lambda}}$
1 Initialization: Set $\hat{\boldsymbol{\Lambda}} = \boldsymbol{\Lambda}^{(0)}$
2 for l = 1 to maxIts do
3       Compute $(\{\hat{\mathbf{c}}_i\}, \hat{\boldsymbol{\mu}})$ by Equations (6) and (7); compute $\hat{\boldsymbol{\Lambda}}$ by Eq. (8)
4 end for
5 return $\{\hat{\mathbf{c}}_i\}$, $\hat{\boldsymbol{\mu}}$, $\hat{\boldsymbol{\Lambda}}$
Algorithm 1 Alternating convex minimization for $f$
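For illustration, a compact sketch of Algorithm 1 is given below, using the forms of Equations (6)-(8) as reconstructed above; it assumes a known diagonal degradation operator per patch and a shared noise covariance, and is not the authors' implementation.

```python
import numpy as np

def hbe_group_restore(Z, A_diags, Sigma_n, mu0, Sigma0, kappa, nu, max_its=5):
    """Alternating minimization for one group of M similar patches.

    Z       : (M, d) observed (degraded) patches
    A_diags : (M, d) diagonal of the degradation operator of each patch
    Sigma_n : (d, d) diagonal noise covariance (assumed shared here)
    Returns the restored patches, the MAP mean and the MAP precision matrix.
    """
    M, d = Z.shape
    Lam = np.linalg.inv(Sigma0)                 # initial precision (a given value)
    mu = mu0.copy()
    C = Z.copy()                                # restored patches

    for _ in range(max_its):
        Sigma = np.linalg.inv(Lam)
        # Eq. (7): affine (Wiener-like) estimate of each patch.
        for i in range(M):
            Ai = np.diag(A_diags[i])
            S = Ai @ Sigma @ Ai.T + Sigma_n
            gain = Sigma @ Ai.T @ np.linalg.inv(S)
            C[i] = mu + gain @ (Z[i] - Ai @ mu)
        # Eq. (6): MAP mean, a weighted average of the patch mean and mu0.
        mu = (C.sum(axis=0) + kappa * mu0) / (M + kappa)
        # Eq. (8): MAP covariance/precision update.
        diffs = C - mu
        S_hat = (nu * Sigma0
                 + kappa * np.outer(mu - mu0, mu - mu0)
                 + diffs.T @ diffs) / (M + nu - d)
        Lam = np.linalg.inv(S_hat)
    return C, mu, Lam
```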

We show in Appendix A the following convergence result for the previous algorithm. The proof adapts the arguments in [21] to our particular case.

Proposition 3.

The sequence $f(\{\hat{\mathbf{c}}_i^{(l)}\}, \hat{\boldsymbol{\mu}}^{(l)}, \hat{\boldsymbol{\Lambda}}^{(l)})$ converges monotonically when $l \to +\infty$. The sequence $(\{\hat{\mathbf{c}}_i^{(l)}\}, \hat{\boldsymbol{\mu}}^{(l)}, \hat{\boldsymbol{\Lambda}}^{(l)})$ generated by the alternate minimization scheme has at least one accumulation point. The set of its accumulation points forms a connected and compact set of partial optima and stationary points of $f$, and they all have the same function value.

In practice, we observe in our experiments that the algorithm always converges after a few iterations.

II-E Full restoration algorithm

The full restoration algorithm used in our experiments is summarized in Algorithm 2 and illustrated by Figure 1. It alternates between two stages: the minimization of $f$ using Algorithm 1, and the estimation of the hyper-parameters $(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0, \kappa, \nu)$. In order to estimate these parameters, we rely on an oracle image computed by aggregation of all the patches estimated in the first stage (details are provided in Section III-D).

Input: degraded image $\mathbf{z}$, degradation operator $\mathbf{A}$, noise covariance $\boldsymbol{\Sigma}_{\mathbf{n}}$, parameters $\kappa$, $\nu$ (see details in Section III-C)
Output: restored image $\hat{\mathbf{c}}$
1 Decompose $\mathbf{z}$ and $\mathbf{A}$ into overlapping patches.
2 Initialization: Compute the first oracle image $\tilde{\mathbf{c}}^{(0)}$ (see details in Section III-D).
3 for it = 1 to maxIts do
4       for all patches not yet restored do
5             Find the $M$ patches similar ($L^2$ distance) to the current one in the oracle $\tilde{\mathbf{c}}^{(it-1)}$ (see details in Section III-A).
6             Compute $\boldsymbol{\mu}_0$ and $\boldsymbol{\Sigma}_0$ from these patches (see details in Section III-C).
7             Compute $(\{\hat{\mathbf{c}}_i\}, \hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\Lambda}})$ following Algorithm 1.
8       end for
9       Perform aggregation to restore the image.
10      Set $\tilde{\mathbf{c}}^{(it)}$ to the restored image.
11 end for
Algorithm 2 HBE algorithm.
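The aggregation step of Algorithm 2 fuses the overlapping restored patches back into an image; a simple uniform-averaging version (the paper does not detail its aggregation weights) is sketched below.

```python
import numpy as np

def aggregate_patches(patches, positions, image_shape, patch_size):
    """Average overlapping restored patches into a full image.

    patches   : (K, patch_size*patch_size) restored patches
    positions : (K, 2) top-left (row, col) coordinate of each patch
    """
    acc = np.zeros(image_shape)
    weight = np.zeros(image_shape)
    p = patch_size
    for patch, (r, c) in zip(patches, positions):
        acc[r:r + p, c:c + p] += patch.reshape(p, p)
        weight[r:r + p, c:c + p] += 1.0
    weight[weight == 0] = 1.0          # pixels not covered by any patch
    return acc / weight
```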

III Implementation details

III-A Search for similar patches

The similar patches are all patches within a search window centered at the current patch, whose distance to the central patch is less than a given threshold. This threshold is given by a tolerance parameter $\epsilon$ times the distance to the nearest neighbour (the most similar one). In all our experiments, the search window size (relative to the patch size) and the tolerance parameter $\epsilon$ were kept fixed. The patch comparison is performed in an oracle image (i.e. the result of the previous iteration), so all pixels are known. However, it may be useful to assign different confidence levels to the known pixels and to those originally missing and then restored. For all the experimental results presented in Section IV, the distance between patches $\mathbf{c}_1$ and $\mathbf{c}_2$ in the oracle image is computed as

$d(\mathbf{c}_1, \mathbf{c}_2) = \dfrac{\sum_{j} w_j \left(\mathbf{c}_1(j) - \mathbf{c}_2(j)\right)^2}{\sum_{j} w_j}, \qquad (9)$

where $j$ indexes the pixels in the patch, $w_j = 1$ if pixel $j$ is known (known pixel) and $w_j \ll 1$ otherwise (originally missing then restored pixel) [22]. With this formulation, known pixels are assigned a much higher priority than unknown ones. Variations on these weights could be explored.
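A sketch of the confidence-weighted distance (9) follows; the weight given to originally-missing-then-restored pixels is an illustrative small constant (the text only requires it to be much smaller than the weight of known pixels).

```python
import numpy as np

def weighted_patch_distance(p1, p2, known_mask, w_restored=0.01):
    """Confidence-weighted distance of Eq. (9) between two oracle patches.

    known_mask is 1 where the pixel was observed in the input and 0 where it
    was originally missing and then restored; restored pixels get a much
    smaller weight (w_restored is an illustrative value).
    """
    w = np.where(known_mask > 0, 1.0, w_restored)
    diff = (p1 - p2) ** 2
    return float(np.sum(w * diff) / np.sum(w))
```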

III-B Optional speed-up by Collaborative Filtering

The proposed method computes one Gaussian model per image patch according to Equations (6) and (8). In order to reduce the computational cost, we can rely on the collaborative filtering idea previously introduced for patch-based denoising techniques [7, 23]. Based on the hypothesis that similar patches share the same model, we assign the same model to all patches in the set of similar patches (as defined in Section III-A).

The number of similar patches jointly restored depends on the image and the tolerance parameter $\epsilon$, but it is often much smaller than what would result from the patch clustering performed by methods that use global GMMs such as PLE or EPLL. Performance degradation is observed in practice when using a very large tolerance parameter, showing that mixing more patches than needed is detrimental. The collaborative filtering strategy helps accelerate the algorithm up to a certain point, but a trade-off with performance needs to be considered.

III-C Parameter choices

The four parameters of the Normal-Wishart distribution, namely $\kappa$, $\nu$, the prior mean $\boldsymbol{\mu}_0$ and the prior covariance matrix $\boldsymbol{\Sigma}_0$, must be set in order to compute $\hat{\boldsymbol{\mu}}$ and $\hat{\boldsymbol{\Lambda}}$.

Choice of $\kappa$ and $\nu$

The computation of $\hat{\boldsymbol{\mu}}$ according to (6) combines the mean $\bar{\mathbf{c}}$ estimated from the similar patches and the prior mean $\boldsymbol{\mu}_0$. The parameter $\kappa$ is related to the degree of confidence we have on the prior $\boldsymbol{\mu}_0$. Hence, its value should be a trade-off between the confidence we have in the prior accuracy vs. the one we have in the information provided by the similar patches. The latter improves when both $M$ (i.e. the number of similar patches) and the number of known pixels in the current patch increase. These intuitive insights suggest the following rule to set $\kappa$:

(10)

A similar reasoning leads to the same rule for $\nu$,

(11)

where an additive constant ensures that the condition $\nu > d - 1$ required by the Normal-Wishart prior is verified.

This rule is used to obtain the experimental results presented in Section IV, and proved to be a consistently good choice despite its simplicity. However, setting these parameters in a more general setting is not a trivial task and should be the subject of further study. In particular, we could explore a more continuous dependence of $\kappa$ on $M$, on the number of known pixels, and possibly on a third term estimating to what extent the similar patches cover the missing pixels in the current patch.

Setting of $\boldsymbol{\mu}_0$ and $\boldsymbol{\Sigma}_0$

Assuming an oracle image is available (see details in Section II-E), $\boldsymbol{\mu}_0$ and $\boldsymbol{\Sigma}_0$ can be computed using the classical MLE estimators from a set of $M$ similar patches $\{\tilde{\mathbf{c}}_i\}$ taken from the oracle image,

$\boldsymbol{\mu}_0 = \dfrac{1}{M}\sum_{i=1}^{M}\tilde{\mathbf{c}}_i, \qquad \boldsymbol{\Sigma}_0 = \dfrac{1}{M}\sum_{i=1}^{M}(\tilde{\mathbf{c}}_i - \boldsymbol{\mu}_0)(\tilde{\mathbf{c}}_i - \boldsymbol{\mu}_0)^{T}. \qquad (12)$

This is the same approach followed by Lebrun et al. [7] to estimate the patch model parameters in the case of denoising.
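A minimal sketch of the empirical estimators in (12), computed from the patches most similar to the current one in the oracle image:

```python
import numpy as np

def prior_from_oracle(similar_patches):
    """MLE mean and covariance (Eq. (12)) from similar oracle patches.

    similar_patches : (M, d) array of patches taken from the oracle image.
    """
    mu0 = similar_patches.mean(axis=0)
    centered = similar_patches - mu0
    Sigma0 = centered.T @ centered / similar_patches.shape[0]
    return mu0, Sigma0
```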

III-D Initialization

A good initialization is crucial since we aim at solving a non-convex problem through an iterative procedure. To initialize the proposed algorithm we follow the approach proposed by Yu et al. [10] (described in detail in Appendix A in the supplementary material). They propose to initialize the PLE algorithm by learning the GMM covariance matrices from synthetic images of edges with different orientations as well as from the DCT basis to represent isotropic patterns. As they state, in dictionary learning, the most prominent atoms represent local edges which are useful to represent and restore contours. Hence, this initialization helps to correctly restore corrupted patches even in quite extreme cases. The oracle of the first iteration of the proposed approach is the output of the first iteration of the PLE algorithm.

III-E Computational complexity

With the original per-patch strategy, the complexity of the algorithm is dominated by step 3 in Algorithm 1, i.e. by the matrix inversions required by Equations (7) and (8), so the total complexity grows linearly with the total number of patches to be restored (assuming the Cholesky factorization is used for matrix inversion). The collaborative filtering strategy reduces this cost by a factor that depends on the number of groups of similar patches, which in turn depends on the image contents and the distance tolerance parameter $\epsilon$. The main difference with the PLE algorithm complexity is a factor given by the number of groups defined by the collaborative filtering approach and by the ratio between the respective per-group costs. As mentioned by Yu et al. [10], computational complexity can be further reduced in the case of binary masks by removing the zero rows of the degradation operator and inverting a smaller matrix whose size depends on the masking ratio. Moreover, the proposed algorithm can be run in parallel on different image subregions, thus allowing for even further acceleration on multiple-core architectures. The complexity comparison to NLB needs to be made in the case where the degradation is additive noise with constant variance (translation-invariant degradation), which is the task performed by NLB. In that case, the per-group operations of the proposed approach (without considering collaborative filtering or parallelization, which are both also used in NLB) are of the same kind as those of NLB, namely estimation of the patch statistics followed by a Wiener-type filtering, repeated over the inner iterations of Algorithm 1.

IV Image Restoration Experiments

In this section we illustrate the ability of the proposed method to solve several image inverse problems. Both synthetic data (i.e., where the degradation is added artificially) and real data (i.e., issued from a real acquisition process) are used. The considered problems are: interpolation, combined interpolation and denoising, denoising, and zooming. The reported values of peak signal-to-noise ratio (PSNR) are averaged over 10 realizations for each experiment (the variance is below 0.1 for interpolation and below 0.05 for combined interpolation and denoising and for denoising only). Similar results are obtained with the structural similarity index (SSIM) and are included in the supplementary material (Appendix B).
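For reference, the reported PSNR values follow the standard definition; a minimal implementation for images with a given peak value is sketched below.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((reference.astype(float) - estimate.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```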

Interpolation - PSNR (dB)
          50%                          70%                          90%
          HBE   PLE   EPLL  E-PLE      HBE   PLE   EPLL  E-PLE      HBE   PLE   EPLL  E-PLE
barbara   39.11 36.93 32.99 35.43      34.69 32.50 27.96 28.77      24.86 23.62 23.30 23.26
boat      34.92 34.32 34.21 33.59      31.37 30.74 30.38 30.26      25.96 25.35 24.72 25.43
traffic   30.17 30.12 30.19 28.86      27.27 27.12 27.13 26.64      22.84 22.34 21.85 22.27

Interpolation & Denoising - PSNR (dB)
          70%                          90%
          HBE   PLE   EPLL  E-PLE      HBE   PLE   EPLL  E-PLE
barbara   33.34 31.99 27.63 27.75      24.57 23.53 23.27 23.20
boat      30.61 30.41 30.15 29.54      25.78 25.45 24.71 25.47
traffic   26.99 26.98 27.05 26.35      22.87 22.43 22.21 22.35

Denoising - PSNR (dB)
          noise level 10          noise level 30          noise level 50          noise level 80
          HBE   NLB   EPLL        HBE   NLB   EPLL        HBE   NLB   EPLL        HBE   NLB   EPLL
barbara   41.26 41.20 40.56       38.40 38.26 37.32       37.13 36.94 35.84       35.96 35.73 34.51
boat      40.05 39.99 39.47       36.71 36.76 36.34       35.41 35.46 35.13       34.30 34.33 34.12
traffic   40.73 40.74 40.55       37.03 36.99 36.86       35.32 35.26 35.20       33.78 33.70 33.72

Zooming - PSNR (dB)
          HBE   PLE   EPLL  E-PLE  Lanczos
barbara   38.17 37.11 31.34 36.51  28.01
boat      32.35 31.96 31.95 32.08  29.60
traffic   25.05 24.78 25.17 24.91  21.89

TABLE I: Results of the interpolation, combined interpolation and denoising, denoising and zooming tests described in Section IV-A. The same patch size is used for all methods in all tests. For interpolation, combined interpolation and denoising, and zooming, the HBE and PLE parameters are set as described in Section IV-A, EPLL uses its default parameters [24] and the E-PLE parameters are set as specified in [25]. For denoising, the code provided by the NLB authors [26] automatically sets the parameters from the input noise level, and EPLL uses the default parameters of its denoising example [24].

IV-A Synthetic degradation

Interpolation

Random masks with 50%, 70% and 90% missing pixels are applied to the tested ground-truth images. The interpolation performance of the proposed method is compared to that of PLE [10], EPLL [9] and E-PLE [25], using the same patch size for all methods. PLE parameters are set as indicated in [10]. We used the EPLL code provided by the authors [24] with default parameters and the E-PLE code available in [25] with the parameters set as specified in the companion demo. The parameters defining $\kappa$ and $\nu$ for the proposed method are set as described in Section III-C. The PSNR results are shown in Table I. Figure 2 shows some extracts of the obtained results, the PSNR values for the extracts and the corresponding difference images with respect to the ground-truth. The proposed method gives sharper results than the other considered methods. This is especially noticeable in the reconstruction of the texture of the fabric of Barbara's trousers shown in the first row of Figure 2, or in the strips that appear through the car's window shown in the second row of the same figure.
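The random masks used in this experiment can be generated as in the following sketch, where the masking rate is the only parameter.

```python
import numpy as np

def random_missing_mask(shape, missing_rate, seed=0):
    """Binary mask with value 0 at (missing_rate * 100)% of the pixels."""
    rng = np.random.default_rng(seed)
    return (rng.random(shape) >= missing_rate).astype(float)

mask = random_missing_mask((512, 512), missing_rate=0.7)   # 70% missing
```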

(a) Ground-truth
(b) HBE (30.01 dB)
(c) PLE (26.78 dB)
(d) EPLL (21.12 dB)
(e) E-PLE (23.12 dB)
(f) Ground-truth
(g) HBE (30.20 dB)
(h) PLE (27.89 dB)
(i) EPLL (27.83 dB)
(j) E-PLE (26.79 dB)
Fig. 2: Synthetic data. Interpolation with 70% of randomly missing pixels. Left to right: (first row) Ground-truth (extract of barbara), result by HBE, PLE, EPLL, E-PLE. (second row) input image, difference with respect to the ground-truth of each of the corresponding results. (third and fourth row) Idem for an extract of the traffic image. See Table I for the PSNR results for the complete images. Please see the digital copy for better details reproduction.
Combined interpolation and denoising

For this experiment, the ground-truth images are corrupted with additive Gaussian noise with variance 10, and random masks with 70% and 90% of missing pixels. The parameters for all methods are set as in the previous interpolation-only experiment. Table I summarizes the PSNR values obtained by each method. Figure 3 shows some extracts of the obtained results, the PSNR values for the extracts and the corresponding difference images with respect to the ground-truth. Once again, the results show that the proposed approach outperforms the others. Fine structures, such as the mast and the ropes of the ship, as well as textures, as in Barbara's headscarf, are much better preserved.

(a) Ground-truth
(b) HBE (26.20 dB)
(c) PLE (24.76 dB)
(d) EPLL (23.84 dB)
(e) E-PLE (23.60 dB)
(f) Ground-truth
(g) HBE (28.34 dB)
(h) PLE (27.50 dB)
(i) EPLL (27.27 dB)
(j) E-PLE (26.83 dB)
Fig. 3: Synthetic data. Combined interpolation and denoising with 70% of randomly missing pixels and additive Gaussian noise (variance 10). Left to right: (first row) Ground-truth (extract of barbara), result by HBE, PLE, EPLL, E-PLE. (second row) input image, difference with respect to the ground-truth of each of the corresponding results. (third and fourth row) Idem for an extract of the boat image. See Table I for the PSNR results for the complete images. Please see the digital copy for better details reproduction.
Denoising

For the denoising task the proposed approach should perform very similarly to the state-of-the-art denoising algorithm NLB [7]. The following experiments are conducted in order to verify this.

The ground-truth images are corrupted with additive Gaussian noise at the noise levels reported in Table I. The code provided by the authors [26] automatically sets the NLB parameters from the input noise level and the patch size. For this experiment, there are no unknown pixels to interpolate (the mask $\mathbf{A}$ is the identity matrix).

The results of both methods are very similar if HBE is initialized with the output of the first step of NLB [7] (instead of using the initialization described in Section III-D) and the parameters $\kappa$ and $\nu$ are large enough. In this case, $\boldsymbol{\mu}_0$ and $\boldsymbol{\Sigma}_0$ are prioritized in Equations (6) and (8) and both algorithms are almost the same. That is what we observe in practice with sufficiently large $\kappa$ and $\nu$, as demonstrated by the results summarized in Table I. The denoising performance of HBE is degraded for small $\kappa$ and $\nu$ values. This is due to the fact that $\boldsymbol{\mu}_0$ and $\boldsymbol{\Sigma}_0$, like the corresponding quantities in NLB, are computed from an oracle image resulting from the first restoration step. This restoration includes not only the denoising of each patch, but also an aggregation step that greatly improves the final result. Therefore, the contribution of the first term of (6) (the mean of the restored patches) to the computation of $\hat{\boldsymbol{\mu}}$ degrades the result compared to using only $\boldsymbol{\mu}_0$ (i.e. using a large $\kappa$).

Zooming

In order to evaluate the zooming capacity of the proposed approach, ground-truth images are downsampled by a factor of 2 (no anti-aliasing filter is used) and the zoomed result is compared to the ground-truth. The results are compared with PLE, EPLL, E-PLE and Lanczos interpolation. Table I summarizes the obtained PSNR values. Figure 4 shows extracts from the obtained results, the PSNR values for the extracts and the corresponding difference images with respect to the ground-truth. Again, HBE yields a sharper reconstruction than the other methods.
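The ×2 zooming experiment fits the same diagonal-degradation setting: the low-resolution input corresponds to keeping one pixel out of each 2×2 block with no anti-aliasing filter, as sketched below.

```python
import numpy as np

def zoom_degradation_mask(shape, factor=2):
    """Mask that keeps one pixel per factor x factor block (direct subsampling)."""
    mask = np.zeros(shape)
    mask[::factor, ::factor] = 1.0
    return mask

def observed_low_res(image, factor=2):
    """Low-resolution observation obtained without any anti-aliasing filtering."""
    return image[::factor, ::factor]
```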

(a) Ground-truth
(b) HBE (38.17 dB)
(c) PLE (37.11 dB)
(d) EPLL (31.34 dB)
(e) E-PLE (36.51 dB)
(f) Lanczos (28.01 dB)
Fig. 4: Synthetic data. Zooming ×2. Left to right: (first row) Ground-truth high resolution image (extract of barbara). Result by HBE, PLE, EPLL, E-PLE, Lanczos interpolation. (second row) Input low-resolution image, difference with respect to the ground-truth of each of the corresponding results. Please see the digital copy for better details reproduction.

IV-B Real data

Fig. 5: Left. Real data. JPEG version of the raw image used in the experiments presented in Section IV-B. The boxes show the extracts displayed in Figure 6. Right. Synthetic data. Ground-truth images used in the experiments presented in Section IV-A. The green boxes indicate the extracts used for the zooming experiments.

For this experiment, we use raw images captured with a Canon 400D camera (set to ISO 400 and exposure time 1/160 seconds). The main noise sources for CMOS sensors are: the Poisson photon shot noise, which can be approximated by a Gaussian distribution with equal mean and variance; the thermally generated readout noise, which is modeled as additive Gaussian distributed noise; and the spatially varying gain given by the photo response non-uniformity (PRNU) [20, 27]. We thus consider the following noise model for the non-saturated raw pixel value $z_i$ at position $i$

$z_i \sim \mathcal{N}\!\left(g\, a_i \tau\, C_i + \mu_R,\; g^2 a_i \tau\, C_i + \sigma_R^2\right), \qquad (13)$

where $g$ is the camera gain, $a_i$ models the PRNU factor, $\tau$ is the exposure time, $C_i$ is the irradiance reaching pixel $i$, and $\mu_R$ and $\sigma_R^2$ are the readout noise mean and variance. The camera parameters have to be estimated by a calibration procedure [20]. The noise covariance matrix is thus diagonal with entries that depend on the pixel value $z_i$.
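Assuming the reconstructed form of (13), the per-pixel noise variance can be predicted from the raw value itself via the Poisson-Gaussian approximation; the sketch below does so with placeholder calibration values for the gain and readout noise.

```python
import numpy as np

def raw_noise_variance(z, gain=0.8, mu_R=256.0, sigma_R=3.0):
    """Approximate per-pixel noise variance of a non-saturated raw value.

    Uses the Poisson-Gaussian model described in the text: shot-noise variance
    proportional to the mean signal level (approximated by gain * (z - mu_R))
    plus the readout variance sigma_R**2. The calibration values used here are
    placeholders, not those of a real camera.
    """
    shot = gain * np.clip(np.asarray(z, dtype=float) - mu_R, 0.0, None)
    return shot + sigma_R ** 2
```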

In order to evaluate the interpolation capacity of the proposed approach, we consider the pixels of the green channel only (i.e. 50% of the pixels in the RGGB Bayer pattern) and interpolate the missing values. We compare the results to those obtained using an adaptation of PLE to images degraded by noise with variable variance (PLEV) [28]. The results for the EPLL and E-PLE methods are not presented here since these methods are not suited to this kind of noise. Figure 6 shows extracts of the obtained results (see Figure 5 for a JPEG version of the raw image showing the location of the extracts). As already observed in the synthetic data experiments, fine details and edges are better preserved. Compare for example the reconstruction of the balcony edges and the wall structure in the first row of Figure 6, as well as the structure of the roof and the railing in the second row of the same figure.




Fig. 6: Real data. Interpolation of the green channel of a raw image (RGGB Bayer pattern). Left to right: Ground-truth, result by HBE, PLEV [28], bicubic and Lanczos interpolation.

IV-C Discussion

In all the previous experiments, the results obtained with HBE outperform or are very close to those obtained by the other evaluated methods. Visually, details are better reconstructed and images are sharper.

We interpret this as the inability of a fixed set of patch classes (19 for PLE) to accurately represent patches that seldom appear in the image, such as edges or textures (as in Barbara's trousers). The fact that methods such as PLE are actually semi-local (classes are estimated on image sub-regions [10]) does not solve this issue. On the contrary, a local model estimation such as the one performed by HBE correctly handles those cases.

The performance difference is much more pronounced for the higher masking rates. In such cases, the robustness of the estimation is crucial. Indeed, the proposed method performs the estimation from similar patches in a local window. Since the self-similarity hypothesis is reinforced by considering local neighbourhoods, such a strategy restricts the possible models, thereby making the estimation more robust. Furthermore, the local model estimation, previously shown to be successful at describing patches [7], gives a better reconstruction even when a very large part of the patch is missing.

EPLL uses more mixture components (200 components are learnt from patches of natural images [9]) in its GMM model than PLE. It was observed in [8] that this strategy is less efficient than PLE for the denoising task. In this work, we also observe that the proposed approach outperforms EPLL, not only in denoising, but also in inpainting and zooming. However, here it is harder to tell if the improvement is due to the local model estimation performed from similar patches or to the different restoration strategies followed by these methods.

V High dynamic range imaging from a single snapshot

In this section, we apply the proposed framework to generate HDR images from a single shot. HDR imaging aims at reproducing an extended dynamic range of luminosity compared to what can be captured using a standard digital camera, which is often not enough to produce an accurate representation of real scenes. In the case of a static scene and a static camera, the combination of multiple images with different exposure levels is a simple and efficient solution [29, 30, 27]. However, several problems arise when either the camera or the elements in the scene move [31, 32].

An alternative to HDR from multiple frames is to use a single image with spatially varying pixel exposures (SVE) [18]. An optical mask with spatially varying transmittance is placed adjacent to a conventional image sensor, thus controlling the amount of light that reaches each pixel (see Figure 7) [18, 33, 34].

The greatest advantage of this acquisition method is that it avoids the need for image alignment and motion estimation. Another advantage is that the saturated pixels are not organized in large regions. Indeed, some recent multi-image methods tackle motion problems by taking a reference image and then by estimating motion or reconstructing the image relative to this reference [35, 31]. A problem encountered by these approaches is the need to inpaint very large saturated and underexposed regions in the reference frame. The SVE acquisition strategy avoids this problem since, in general, all scene regions are sampled by at least one of the exposures.

Taking advantage of the ability of the proposed framework to simultaneously estimate missing pixels and denoise well-exposed ones, we propose a novel approach to generate HDR images from a single shot acquired with spatially varying pixel exposures. The proposed approach shows significant improvements over existing methods.

V-A Spatially varying exposure acquisition model

An optical mask with spatially varying transmittance is placed adjacent to a conventional image sensor to give different exposure levels to the pixels. This optical mask does not change the acquisition process of the sensor. Hence, the noise model (13) can be adapted to the SVE acquisition by including the per-pixel SVE gain $e_i$ (some noise sources not modeled here, such as blooming, might have an impact on the SVE acquisition strategy and should be considered in a more accurate image model):

$z_i \sim \mathcal{N}\!\left(g\, a_i e_i \tau\, C_i + \mu_R,\; g^2 a_i e_i \tau\, C_i + \sigma_R^2\right). \qquad (14)$

In the approach proposed by Nayar and Mitsunaga [18], the varying exposures follow a regular pattern. Motivated by the aliasing problems of regular sampling patterns, Schöberl et al. [36] propose to use spatially varying exposures on a non-regular pattern. Figure 7 shows examples of both acquisition patterns. This observation led us to choose the non-regular pattern in the proposed approach.
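Both SVE layouts are easy to simulate: a regular pattern tiles a fixed 2×2 arrangement of the four exposure gains, while the non-regular pattern draws one of the four gains independently and equiprobably at each pixel. The gain values below are illustrative, not those of an actual optical mask.

```python
import numpy as np

GAINS = np.array([1.0, 1/8, 1/64, 1/512])     # four illustrative SVE filter gains

def regular_sve_pattern(shape):
    """Regular pattern: the four gains tiled in fixed 2x2 blocks."""
    tile = GAINS.reshape(2, 2)
    reps = (shape[0] // 2 + 1, shape[1] // 2 + 1)
    return np.tile(tile, reps)[:shape[0], :shape[1]]

def random_sve_pattern(shape, seed=0):
    """Non-regular pattern: one gain drawn equiprobably at each pixel."""
    rng = np.random.default_rng(seed)
    return GAINS[rng.integers(0, 4, size=shape)]
```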



Fig. 7: Synthetic data. Left: (top) Tone mapped version of the ground-truth image used for the experiments in Section V-C1. (bottom) Regular (left) and non-regular (right) optical masks for an example of 4 different filters. Right: Results for extracts 1 and 6. From left to right: Input image with random pattern, ground-truth, results by HBE, PLEV [28], Schöberl et al. [36], Nayar and Mitsunaga [18]. 50% missing pixels (for both random and regular pattern). See PSNR values for these extracts in Table II. Please see the digital copy for better details reproduction.

V-B Hyperprior Bayesian Estimator for Single Shot High Dynamic Range Imaging

V-B1 Problem statement

In order to reconstruct the dynamic range of the scene we need to solve an inverse problem. We want to estimate the irradiance image $\mathbf{C}$ from the SVE raw image $\mathbf{z}$, knowing the exposure levels of the optical mask and the camera parameters. For this purpose we map the raw pixel values to the irradiance domain with

$\tilde{C}_i = \dfrac{z_i - \mu_R}{g\, a_i e_i \tau}. \qquad (15)$

We take into account the effect of saturation and under-exposure by introducing the exposure degradation matrix $\mathbf{O}$, a diagonal matrix given by

$o_i = \begin{cases} 1 & \text{if pixel } i \text{ is well exposed}, \\ 0 & \text{if pixel } i \text{ is under-exposed or saturated } (z_i \geq z_{\mathrm{sat}}), \end{cases} \qquad (16)$

with $z_{\mathrm{sat}}$ equal to the pixel saturation value. From (14) and (16), $\tilde{C}_i$ can be modeled as

$\tilde{C}_i \sim \mathcal{N}\!\left(o_i\, C_i,\; \dfrac{o_i\, g^2 a_i e_i \tau\, C_i + \sigma_R^2}{(g\, a_i e_i \tau)^2}\right). \qquad (17)$

Notice that (17) is the distribution of $\tilde{C}_i$ for a given exposure degradation factor $o_i$, since $o_i$ is itself a random variable that depends on $z_i$. The exposure degradation factor must be included in (17) since the variance of the over- or under-exposed pixels no longer depends on the irradiance but is only due to the readout noise $\sigma_R^2$. From (17) we have

$\tilde{\mathbf{C}} = \mathbf{O}\mathbf{C} + \mathbf{n}, \qquad (18)$

where $\mathbf{n}$ is zero-mean Gaussian noise with diagonal covariance matrix $\boldsymbol{\Sigma}_{\mathbf{n}}$ given by

$(\boldsymbol{\Sigma}_{\mathbf{n}})_{ii} = \dfrac{o_i\, g^2 a_i e_i \tau\, C_i + \sigma_R^2}{(g\, a_i e_i \tau)^2}. \qquad (19)$

Then the problem of irradiance estimation can be stated as retrieving $\mathbf{C}$ from $\tilde{\mathbf{C}}$, which implies denoising the well-exposed pixel values ($o_i = 1$) and estimating the unknown ones ($o_i = 0$).

V-B2 Proposed solution

From (18), the image $\tilde{\mathbf{C}}$ satisfies the hypotheses of the HBE framework introduced in Section II. The proposed HDR imaging algorithm consists of the following steps: 1) generate $\mathbf{O}$ from the raw image $\mathbf{z}$ according to (16); 2) obtain $\tilde{\mathbf{C}}$ from $\mathbf{z}$ according to (15); 3) apply the HBE approach to $\tilde{\mathbf{C}}$ with the given $\mathbf{O}$ and $\boldsymbol{\Sigma}_{\mathbf{n}}$.
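A sketch of these three preprocessing steps, using the reconstructed forms of (15), (16) and (19), is given below; the saturation and under-exposure thresholds and all calibration constants are placeholders.

```python
import numpy as np

def sve_to_irradiance(z, e, gain=0.8, prnu=1.0, tau=1/100, mu_R=256.0,
                      sigma_R=3.0, z_sat=15000.0):
    """Prepare the HBE inputs from a raw SVE image z and its gain map e.

    Returns the exposure degradation diagonal o (Eq. (16)), the irradiance
    estimate C_tilde (Eq. (15)) and the per-pixel noise variance (Eq. (19)).
    All calibration values used here are illustrative placeholders.
    """
    z = np.asarray(z, dtype=float)
    scale = gain * prnu * e * tau
    # Eq. (16): 1 for well-exposed pixels, 0 for saturated/under-exposed ones.
    o = ((z < z_sat) & (z > mu_R + 2.0 * sigma_R)).astype(float)
    # Eq. (15): map raw values to the irradiance domain.
    C_tilde = (z - mu_R) / scale
    # Eq. (19): diagonal noise covariance in the irradiance domain.
    shot = o * gain * np.clip(z - mu_R, 0.0, None)
    var = (shot + sigma_R ** 2) / scale ** 2
    return o, C_tilde, var
```

The HBE algorithm of Section II is then applied to C_tilde with the diagonal operator given by o and this covariance.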

V-C Experiments

The proposed reconstruction method was thoroughly tested in several synthetic and real data examples. A brief summary of the results is presented in this section.

V-C1 Synthetic data

Sample images are generated according to Model (18), using the HDR image in Figure 7 as the ground-truth. Both a random and a regular pattern with four equiprobable exposure levels are simulated. The exposure time is fixed and the camera parameters are those of a Canon 7D camera set to ISO 200 [27].

Figure 7 shows extracts of the results obtained by the proposed method, by PLEV [28] (basically an adaptation of PLE to the same single-image framework) and by Schöberl et al. [36] for the random pattern, and by Nayar and Mitsunaga [18] using the regular pattern. The percentage of unknown pixels in the considered extracts is 50% (it is nearly the same for both the regular and non-regular pattern). Table II shows the PSNR values obtained for each extract marked in Figure 7. The proposed method manages to correctly reconstruct the irradiance at the unknown pixels. Moreover, its denoising performance is much better than that of Schöberl et al. and of Nayar and Mitsunaga, and its results are sharper than those of PLEV.

PSNR (dB)
Extract               1 (Fig. 7)  2 (Fig. 7)  3      4      5      6
HBE                   33.08       33.87       22.95  35.10  36.80  35.66
PLEV                  29.65       30.82       22.77  33.99  36.42  34.73
Schöberl et al.       30.38       31.16       21.39  30.04  32.84  31.02
Nayar and Mitsunaga   29.39       30.10       23.24  25.83  30.26  26.90
TABLE II: PSNR values for the extracts shown in Figure 7.




Fig. 8: Real data. Left: Tone mapped version of the HDR image obtained by the proposed approach and its corresponding mask of unknown (black) and well-exposed (white) pixels. Right: Comparison of the results obtained by the proposed approach (first row) and PLEV (second row) in the extracts indicated in the top image. Please see the digital copy for better details reproduction.

V-C2 Real data

The feasibility of the SVE random pattern has been shown in [34] and that of the SVE regular pattern in [33]. Nevertheless, these acquisition systems are still not available for general usage. (While writing the last version of this article, the authors became aware of Sony's latest sensor, the IMX378, which has a special mode called SME-HDR, a variation of the SVE acquisition principle. Although this sensor was adopted by the Google Pixel smartphone in 2016, the special SME-HDR mode is never activated by the Google Pixel phone, according to experts from the company DxO, and we found no way to activate it and access the raw image.) However, as stated in Section V-A, the only difference between the classical and the SVE acquisition is the optical filter. Hence, the noise at a pixel captured using SVE with an optical gain factor $e_i$ and exposure time $\tau$, and at a pixel captured with a classical camera using exposure time $e_i \tau$, should be very close. We take advantage of this fact in order to evaluate the reconstruction performance of the proposed approach on real data.

For this purpose, we generate an SVE image from four raw images acquired with different exposure times. The four different exposure times simulate four different filters of the SVE optical mask. The value at each position of the SVE image is chosen at random among the four values available at that position. Notice that the Bayer pattern is preserved by construction. The images are acquired using a remotely controlled camera and a tripod, so as to be perfectly aligned. This protocol does not allow us to capture scenes with moving objects. Let us emphasize, however, that with a real SVE device this, as well as the treatment of camera motion, would be a non-issue.
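The construction of the simulated SVE raw image can be sketched as follows: at each pixel, one of the four aligned raw exposures is drawn at random, which preserves the Bayer pattern since all candidate values at a given position come from the same CFA site.

```python
import numpy as np

def simulate_sve_from_exposures(raw_stack, seed=0):
    """Build an SVE-like raw image from aligned raw exposures.

    raw_stack : (4, H, W) raw images of the same scene with four exposure times.
    Returns the simulated SVE image and the index map of the chosen exposure.
    """
    rng = np.random.default_rng(seed)
    n_exp, H, W = raw_stack.shape
    choice = rng.integers(0, n_exp, size=(H, W))
    rows, cols = np.indices((H, W))
    sve = raw_stack[choice, rows, cols]
    return sve, choice
```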

Figures 8 and 9 show the results obtained on two real scenes, together with the masks of well-exposed (white) and unknown (black) pixels (the SVE raw images are included in Appendix B of the supplementary material). Recall that among the unknown pixels, some are saturated and some are under-exposed. Square patches of size 6×6 and 8×8 were used for the examples in Figure 9 and Figure 8, respectively. Demosaicing [37] and tone mapping [38] are used for display purposes.

We compare the results to those obtained by PLEV [28]. A comparison against the methods by Nayar and Mitsunaga and by Schöberl et al. is not presented since they do not specify how to treat raw images with a Bayer pattern. The proposed method manages to correctly reconstruct the unknown pixels even in extreme conditions where a large proportion of the pixels are missing, as for example in the last extract of Figure 9.

These examples show the suitability of the proposed approach to reconstruct the irradiance information in both very dark and bright regions simultaneously. See for instance the example in Figure 9, where the dark interior of the building (which can be seen through the windows) and the highly illuminated part of another building are both correctly reconstructed (see the electronic version of the article for better visualization).




Fig. 9: Real data. Left: Tone mapped version of the HDR image obtained by the proposed approach and its corresponding mask of unknown (black) and well-exposed (white) pixels. Right: Comparison of the results obtained by the proposed approach (first row) and PLEV (second row) in the extracts indicated in the top image. Please see the digital copy for better details reproduction.

VI Conclusions

In this work we have presented a novel image restoration framework. It has the benefits of local patch characterization (that was key to the success of NLB as a state of the art denoising method), but manages to extend its use to more general restoration problems where the linear degradation operator is diagonal, by combining local estimation with Bayesian restoration based on hyperpriors. This includes problems such as zooming, inpainting and interpolation. In this way, all these restoration problems are set under the same framework. It does not include image deblurring or deconvolution, since the degradation operator is no longer diagonal. Correctly addressing deconvolution with large kernels with patch-based approaches and Gaussian prior models is a major challenge that will be the subject of future work.

We have presented a large series of experiments both on synthetic and real data that confirm the robustness of the proposed strategy based on hyperpriors. These experiments show that for a wide range of image restoration problems HBE outperforms several state-of-the-art restoration methods.

This work opens several perspectives. The first one concerns the relevance of the Gaussian patch model and its relation to the underlying manifold of image patches. While this linear approximation has proven successful for image restoration, its full relevance in other areas remains to be explored, especially in domains requiring the comparison of image patches. Another important related question is that of the estimation of the degradation model in images jointly degraded by noise, missing pixels, blur, etc. Restoration approaches generally rely on precise knowledge of this model and of its parameters. In practice, however, we often deal with images for which the acquisition process is unknown, and which have possibly been affected by post-processing. In such cases, blind restoration remains an unsolved challenge.

Finally, we have presented a novel application of the proposed general framework to the generation of HDR images from a single variable exposure (SVE) snapshot. The SVE acquisition strategy allows the creation of HDR images from a single shot without the drawbacks of multi-image approaches, such as the need for global alignment and motion estimation to avoid ghosting problems. The proposed method manages to simultaneously denoise and reconstruct the missing pixels, even in the presence of (possibly complex) motions, improving the results obtained by existing methods. Examples with real data acquired in very similar conditions to those of the SVE acquisition show the capabilities of the proposed approach.

Appendix A Alternate minimization scheme convergence

We study in the following the convergence of the alternate minimization scheme of Algorithm 1. To show the main convergence result, we need the following lemma.

Lemma 1.

The function $f$ is coercive on $(\mathbb{R}^d)^M \times \mathbb{R}^d \times \mathcal{S}_d^{++}$.

Proof.

We need to show that $f(\{\mathbf{c}_i\}, \boldsymbol{\mu}, \boldsymbol{\Lambda}) \to +\infty$ whenever $\|(\{\mathbf{c}_i\}, \boldsymbol{\mu}, \boldsymbol{\Lambda})\| \to +\infty$.

Now, $\|(\{\mathbf{c}_i\}, \boldsymbol{\mu}, \boldsymbol{\Lambda})\| \to +\infty$ if and only if $\|\mathbf{c}_i\| \to +\infty$ for some $i$, or $\|\boldsymbol{\mu}\| \to +\infty$, or $\|\boldsymbol{\Lambda}\| \to +\infty$.

The matrix $\boldsymbol{\Sigma}_{\mathbf{n}}^{-1}$ being positive-definite, the quadratic data-fidelity terms and the quadratic terms involving $\boldsymbol{\Lambda}$ are both positive. Thus $f$ is bounded from below by the sum of the remaining terms, which depend only on $\boldsymbol{\Lambda}$ (a negative log-determinant plus a trace term).

Now, this function of $\boldsymbol{\Lambda}$ is convex and coercive on $\mathcal{S}_d^{++}$, which implies that $f \to +\infty$ as soon as $\|\boldsymbol{\Lambda}\| \to +\infty$. It also follows that this function of $\boldsymbol{\Lambda}$ has a global minimum, which we denote by $m$. We can now write that $f$ is bounded from below by $m$ plus the quadratic terms in $\{\mathbf{c}_i\}$ and $\boldsymbol{\mu}$, and this function of $(\{\mathbf{c}_i\}, \boldsymbol{\mu})$ clearly tends towards $+\infty$ as soon as $\|\mathbf{c}_i\| \to +\infty$ for some $i$ or $\|\boldsymbol{\mu}\| \to +\infty$. ∎

We now show the main convergence result for our alternate minimization algorithm. The proof adapts the arguments in [21] to our case.

Proposition 3.

The sequence $f(\{\hat{\mathbf{c}}_i^{(l)}\}, \hat{\boldsymbol{\mu}}^{(l)}, \hat{\boldsymbol{\Lambda}}^{(l)})$ converges monotonically when $l \to +\infty$. The sequence $(\{\hat{\mathbf{c}}_i^{(l)}\}, \hat{\boldsymbol{\mu}}^{(l)}, \hat{\boldsymbol{\Lambda}}^{(l)})$ generated by the alternate minimization scheme has at least one accumulation point. The set of its accumulation points forms a connected and compact set of partial optima and stationary points of $f$, all having the same function value.

Proof.

The sequence $f(x^{(l)})$, where $x^{(l)} = (\{\hat{\mathbf{c}}_i^{(l)}\}, \hat{\boldsymbol{\mu}}^{(l)}, \hat{\boldsymbol{\Lambda}}^{(l)})$, obviously decreases at each step by construction. The coercivity and continuity of $f$ imply that this sequence is also bounded from below, and thus converges. The convergence of $f(x^{(l)})$ implies that the sequence $x^{(l)}$ is bounded. It follows that it has at least one accumulation point $x^{*} = (\{\mathbf{c}_i^{*}\}, \boldsymbol{\mu}^{*}, \boldsymbol{\Lambda}^{*})$ and that there exists a strictly increasing sequence of integers $(l_k)_k$ such that $x^{(l_k)}$ converges to $x^{*}$.

Now, we can show that such an accumulation point is a partial optimum of $f$, i.e. that $f(\cdot, \cdot, \boldsymbol{\Lambda}^{*})$ attains its minimum at $(\{\mathbf{c}_i^{*}\}, \boldsymbol{\mu}^{*})$ and that $f(\{\mathbf{c}_i^{*}\}, \boldsymbol{\mu}^{*}, \cdot)$ attains its minimum at $\boldsymbol{\Lambda}^{*}$. By construction, the iterates minimize $f$ in $(\{\mathbf{c}_i\}, \boldsymbol{\mu})$ for the current value of $\boldsymbol{\Lambda}$, which implies by continuity of $f$ that

$f(\{\mathbf{c}_i^{*}\}, \boldsymbol{\mu}^{*}, \boldsymbol{\Lambda}^{*}) \leq f(\{\mathbf{c}_i\}, \boldsymbol{\mu}, \boldsymbol{\Lambda}^{*}) \quad \text{for all } (\{\mathbf{c}_i\}, \boldsymbol{\mu}). \qquad (20)$

Let us denote by $T$ the map defined by one iteration of the scheme, so that $x^{(l+1)} = T(x^{(l)})$. The alternate minimization scheme consists in updating $x^{(l)}$ at each iteration. From Equations (6), (7), and (8), we see that $T$ is explicit and continuous. Since $(l_k)_k$ is strictly increasing, for each $k$, $l_{k+1} \geq l_k + 1$. The sequence $f(x^{(l)})$ decreases, so

$f(x^{(l_{k+1})}) \leq f(T(x^{(l_k)})) \leq f(x^{(l_k)}).$

Therefore, as $k \to +\infty$, since $f \circ T$ is continuous, it follows that

$f(T(x^{*})) = f(x^{*}).$

Now, writing $T(x^{*}) = (\{\mathbf{c}'_i\}, \boldsymbol{\mu}', \boldsymbol{\Lambda}')$, we get

$f(x^{*}) = f(\{\mathbf{c}'_i\}, \boldsymbol{\mu}', \boldsymbol{\Lambda}') \leq f(\{\mathbf{c}'_i\}, \boldsymbol{\mu}', \boldsymbol{\Lambda}^{*}) \leq f(\{\mathbf{c}_i^{*}\}, \boldsymbol{\mu}^{*}, \boldsymbol{\Lambda}^{*}) = f(x^{*}).$

We can conclude that all these terms are equal and in particular

$f(\{\mathbf{c}'_i\}, \boldsymbol{\mu}', \boldsymbol{\Lambda}^{*}) = f(\{\mathbf{c}_i^{*}\}, \boldsymbol{\mu}^{*}, \boldsymbol{\Lambda}^{*}). \qquad (21)$

From (20) and (21) we deduce that the accumulation point is a partial optimum of $f$ and, since $f$ is differentiable, it is also a stationary point of $f$. Moreover, since $f(\cdot, \cdot, \boldsymbol{\Lambda}^{*})$ is strictly convex and has a unique minimum, it follows from (21) that $(\{\mathbf{c}'_i\}, \boldsymbol{\mu}') = (\{\mathbf{c}_i^{*}\}, \boldsymbol{\mu}^{*})$. As a consequence, $\boldsymbol{\Lambda}' = \boldsymbol{\Lambda}^{*}$. Therefore, the accumulation point (and actually any accumulation point of the sequence) is also a fixed point of the function $T$.

We have shown that accumulation points of the sequence are partial optima of $f$ and fixed points of the function $T$. The set of accumulation points is obviously compact. Let us show that it is also a connected set. First, observe that, whatever the norm, the sequence $\|x^{(l+1)} - x^{(l)}\|$ converges to $0$ when $l \to +\infty$. If it was not the case, it would be possible to extract a subsequence $x^{(l_k)}$ converging to an accumulation point $x^{*}$ while $T(x^{(l_k)}) = x^{(l_k+1)}$ converges to a different accumulation point, but we know that this is impossible since $T(x^{(l_k)})$ would also tend toward $T(x^{*}) = x^{*}$. The sequence $x^{(l)}$ being bounded and such that $\|x^{(l+1)} - x^{(l)}\|$ converges to $0$, the set of its accumulation points is connected (see [39]). The fact that all accumulation points have the same function value is obvious since the sequence $f(x^{(l)})$ decreases. ∎

Acknowledgement

We would like to thank the reviewers for their thorough and fruitful contributions, as well as Mila Nikolova and Alasdair Newson for their insightful comments, and the authors of [26, 25, 24] for kindly providing their code. This work has been partially funded by the French Research Agency (ANR) under grant no. ANR-14-CE27-001 (MIRIAM), and by the “Investissement d’avenir” project, reference ANR-11-LABX-0056-LMH.

References

  • [1] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D: Nonlinear Phenomena, vol. 60, no. 1, pp. 259–268, 1992.
  • [2] J. Portilla, V. Strela, M. J. Wainwright, and E. P. Simoncelli, “Image denoising using scale mixtures of gaussians in the wavelet domain,” IEEE Transactions on Image processing, vol. 12, no. 11, pp. 1338–1351, 2003.
  • [3] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Transactions on Image processing, vol. 15, no. 12, pp. 3736–3745, 2006.
  • [4] A. Buades, B. Coll, and J. M. Morel, “A Review of Image Denoising Algorithms, with a New One,” SIAM Multiscale Modeling & Simulation, vol. 4, no. 2, pp. 490–530, 2005.
  • [5] S. Lyu and E. Simoncelli, “Modeling Multiscale Subbands of Photographic Images with Fields of Gaussian Scale Mixtures,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 4, pp. 693–706, 2009.
  • [6] P. Chatterjee and P. Milanfar, “Patch-Based Near-Optimal Image Denoising,” IEEE Transactions on Image Processing, vol. 21, no. 4, pp. 1635–1649, 2012.
  • [7] M. Lebrun, A. Buades, and J. Morel, “A nonlocal bayesian image denoising algorithm,” SIAM Journal on Imaging Sciences, vol. 6, no. 3, pp. 1665–1688, 2013.
  • [8] Y.-Q. Wang and J.-M. Morel, “SURE Guided Gaussian Mixture Image Denoising,” SIAM Journal on Imaging Sciences, vol. 6, no. 2, pp. 999–1034, 2013.
  • [9] D. Zoran and Y. Weiss, “From learning models of natural image patches to whole image restoration,” in Proceedings of IEEE International Conference on Computer Vision (ICCV), 2011, pp. 479–486.
  • [10] G. Yu, G. Sapiro, and S. Mallat, “Solving inverse problems with piecewise linear estimators: From gaussian mixture models to structured sparsity,” IEEE Transactions on Image Processing, vol. 21, no. 5, pp. 2481–2499, 2012.
  • [11] M. Lebrun, M. Colom, A. Buades, and J. Morel, “Secrets of image denoising cuisine,” Acta Numerica, vol. 21, no. 1, pp. 475–576, 2012.
  • [12] Y.-Q. Wang, “The Implementation of SURE Guided Piecewise Linear Image Denoising,” Image Processing On Line, vol. 3, pp. 43–67, 2013.
  • [13] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin, Bayesian data analysis.   Chapman & Hall/CRC Boca Raton, FL, USA, 2014, vol. 2.
  • [14] R. Molina, “On the hierarchical bayesian approach to image restoration: applications to astronomical images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 11, pp. 1122–1128, 1994.
  • [15] R. Molina, A. K. Katsaggelos, and J. Mateos, “Bayesian and regularization methods for hyperparameter estimation in image restoration,” IEEE Transactions on Image Processing, vol. 8, no. 2, pp. 231–246, 1999.
  • [16] N. Dobigeon, J.-Y. Tourneret, and C.-I. Chang, “Semi-supervised linear spectral unmixing using a hierarchical bayesian model for hyperspectral imagery,” IEEE Transactions on Signal Processing, vol. 56, no. 7, pp. 2684–2695, 2008.
  • [17] F. Orieux, J. Giovannelli, and T. Rodet, “Bayesian estimation of regularization and point spread function parameters for wiener–hunt deconvolution,” J. Opt. Soc. Am. A, vol. 27, no. 7, pp. 1593–1607, Jul 2010.
  • [18] S. Nayar and T. Mitsunaga, “High Dynamic Range Imaging: Spatially Varying Pixel Exposures,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, Jun