Higher-order MRFs based image super resolution: why not MAP?

10/27/2014 · by Yunjin Chen, et al. · TU Graz

A trainable filter-based higher-order Markov Random Field (MRF) model, the so-called Fields of Experts (FoE), has proven to be a highly effective image prior for many classic image restoration problems. Generally, two options are available to incorporate the learned FoE prior in the inference procedure: (1) sampling-based minimum mean square error (MMSE) estimation, and (2) energy minimization-based maximum a posteriori (MAP) estimation. This letter is devoted to the FoE prior based single image super resolution (SR) problem, and we suggest using the MAP estimate for inference based on two facts: (I) the MAP inference has a well-known advantage of high computational efficiency, while the sampling-based MMSE estimate is very time consuming; (II) practical SR experiments demonstrate that the MAP estimate works equally well compared to the MMSE estimate with exactly the same FoE prior model, and it can lead to even further improvements by incorporating our discriminatively trained FoE prior model. In summary, we hold that for the higher-order natural image prior based SR problem, it is better to employ the MAP estimate for inference.


I Introduction

Markov Random Field (MRF) based models, which treat the image as a random field [6], have a long history in low-level computer vision problems. It is well known that MRFs are particularly effective for image prior modeling in image processing. In an MRF-based image prior model, the probability of a whole image is defined via the potentials (or energies) of overlapping local cliques.

An elegant MRF-based image prior model, called Fields of Experts (FoE), was recently proposed by Roth and Black [8]. The FoE model is defined by (1) a heavy-tailed potential function, derived from the observation that the responses of natural images to derivative filters exhibit heavy-tailed distributions, and (2) a set of linear filters trained from image samples.

Due to the effectiveness of the FoE image prior for many image restoration problems, many works have been devoted to FoE-based image restoration, such as image denoising, inpainting, and deblurring [8, 9, 1]. Usually, there are two ways to exploit the learned FoE prior model for a specific image restoration problem: sampling-based MMSE estimation, as in [10, 4, 15, 16], and energy minimization based MAP estimation, as in [8, 9, 1, 2].

The work [10] was the first to claim that MMSE estimation can lead to better performance than MAP estimation for the image denoising task with their learned FoE image prior model. Since then, many works have followed this suggestion and used MMSE estimation for FoE-related models, such as image deblurring [11, 17], image denoising [4], depth estimation [13, 5], image separation [15], and single image super resolution [16].

In a recent paper [16], the FoE prior model was exploited in the context of image super resolution. The authors also proposed to employ the MMSE estimate in the inference procedure instead of the MAP estimate. With the MMSE estimate, the FoE-based SR model yields state-of-the-art SR results. However, it is well known that sampling-based approaches are very time consuming, which makes the FoE-based SR model unappealing for practical applications.

It is generally true that the MMSE estimate is a better alternative to MAP, as it can exploit the uncertainty of the model, especially in the case of a multimodal distribution with multiple peaks. In practice, however, it is usually hard to compute an accurate MMSE estimate due to the difficulty of taking expectations over entire images. As a consequence, the MAP inference, which seeks the highest mode, may work equally well for some problems.

In this letter, we evaluate the performance of the MAP inference for the FoE based SR problem. Our experimental results demonstrate that the MAP inference of the FoE-based SR model was underestimated in the previous work [16]. Numerical results show that with exactly the same image prior model as used in the MMSE estimation, the MAP inference achieves equivalent performance in terms of both quantitative measurements (PSNR and SSIM values) and visual quality. In addition, the MAP inference obtains further improvements with a discriminatively trained FoE image prior of the same model capacity. The MAP inference also has a significant efficiency advantage, which becomes even more remarkable with our recently proposed non-convex optimization algorithm, iPiano [7].

To sum up, our experimental findings suggest exploiting the MAP inference for solving the FoE prior based image super resolution problem, because (1) there is no performance loss by using this simpler inference criterion, and (2) the MAP inference has an apparent advantage of high efficiency.

II MAP inference of FoE image prior based SR

In a typical image super resolution task, the low-resolution (LR) image is generated from a high-resolution (HR) image using the following formulation

$y = DHx + n,$

where $x \in \mathbb{R}^{N}$ and $y \in \mathbb{R}^{M}$ are the HR and LR image, respectively, $H \in \mathbb{R}^{N \times N}$ is the matrix corresponding to the blurring operation, $D \in \mathbb{R}^{M \times N}$ ($M < N$) signifies the down-sampling operation, and $n$ is the noise (typically assumed to be Gaussian white noise with standard deviation $\sigma$).
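For concreteness, a minimal sketch of this degradation model in Python/NumPy follows, assuming a Gaussian blur for $H$ and simple decimation for $D$; the kernel shape, blur width, and sampling scheme are illustrative assumptions, not the exact operators used in [16].

```python
# Minimal sketch of the degradation y = D H x + n (operators are stand-ins):
# H is assumed to be a Gaussian blur and D a decimation by `factor`.
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade(x, factor=3, blur_sigma=1.0, noise_sigma=0.0, seed=0):
    """Generate a LR image y from a HR image x (2D float array)."""
    blurred = gaussian_filter(x, sigma=blur_sigma)      # H x (blurring)
    y = blurred[::factor, ::factor]                     # D (down-sampling)
    if noise_sigma > 0:                                 # + n (Gaussian noise)
        y = y + np.random.default_rng(seed).normal(0.0, noise_sigma, y.shape)
    return y
```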

The FoE image prior based SR model is formulated by the following Bayesian posterior

$p(x \mid y) \propto p(y \mid x)\, p(x) \propto \exp\!\left(-\frac{\|DHx - y\|_2^2}{2\sigma^2}\right) p(x),$   (II.1)

where $p(x)$ is the probability density of an image $x$ under the FoE framework, written as

$p(x) \propto \prod_{c \in \mathcal{C}} \prod_{i=1}^{N_f} \phi\big((K_i x)_c;\, \alpha_i\big),$

where $\mathcal{C}$ is the set of maximal cliques, $N_f$ is the number of filters, $(K_i x)_c$ refers to the $c$-th pixel in the image filtered by $K_i$, and $\phi$ is the potential function with associated weights $\alpha_i$. In [16], the potential function is given by a Gaussian scale mixture (GSM) as

$\phi(z;\, \alpha_i) = \sum_{j=1}^{J} \alpha_{ij}\, \mathcal{N}\!\big(z;\, 0,\, \sigma_0^2 / s_j\big),$   (II.2)

where $\alpha_{ij}$ are the normalized weights of the Gaussian components, $s_j$ is the scale of the $j$-th component, and $\sigma_0^2$ is the base variance.
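As an illustration, the corresponding GSM penalty $\rho(z) = -\log \phi(z)$ can be evaluated with a few lines of NumPy; the mixture parameters in the usage comment are placeholders, since the learned weights of [16] are not reproduced here.

```python
import numpy as np

def gsm_penalty(z, alpha, scales, base_var):
    """rho(z) = -log sum_j alpha_j * N(z; 0, base_var / s_j), element-wise.
    alpha: normalized mixture weights, scales: s_j, base_var: sigma_0^2."""
    z = np.asarray(z, dtype=float)[..., None]         # broadcast over components
    var = base_var / np.asarray(scales, dtype=float)  # component variances
    comp = alpha / np.sqrt(2.0 * np.pi * var) * np.exp(-z**2 / (2.0 * var))
    return -np.log(comp.sum(axis=-1))

# Toy usage with placeholder parameters (not the learned values of [16]):
# rho = gsm_penalty(filtered, alpha=np.array([0.6, 0.3, 0.1]),
#                   scales=np.array([1.0, 4.0, 16.0]), base_var=0.01)
```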

According to the posterior (II.1), [16] used the sampling-based MMSE estimation to recover the underlying HR image $x$. In this letter, we consider the MAP estimate instead. With the MAP estimation, the FoE-based SR task is formulated as the following energy minimization problem

$\min_x E(x) = \sum_{i=1}^{N_f} \sum_{c} \rho\big((K_i x)_c;\, \alpha_i\big) + \frac{\lambda}{2}\, \|DHx - y\|_2^2,$   (II.3)

where the penalty function $\rho(z;\, \alpha_i) = -\log \phi(z;\, \alpha_i)$ is derived from the potential defined in (II.2).
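Under the same notational assumptions, the energy (II.3) can be sketched as follows; `degrade_op` is a hypothetical callable standing in for the linear operator $DH$, and `rho` is a penalty such as `gsm_penalty` above with its parameters bound.

```python
import numpy as np
from scipy.signal import convolve2d

def sr_energy(x, y, kernels, lam, rho, degrade_op):
    """E(x) = sum_i sum_c rho((k_i * x)_c) + lam/2 * ||D H x - y||^2.
    Boundary handling ('valid') is a simplifying choice for this sketch."""
    prior = sum(rho(convolve2d(x, k, mode='valid')).sum() for k in kernels)
    residual = degrade_op(x) - y
    return prior + 0.5 * lam * np.sum(residual**2)
```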

Gradient-based algorithms are applicable to solve the minimization problem (II.3). First, we need to calculate the gradient $\nabla_x E(x)$, which is given as

$\nabla_x E(x) = \sum_{i=1}^{N_f} K_i^\top \rho'\big(K_i x;\, \alpha_i\big) + \lambda\, (DH)^\top (DHx - y),$   (II.4)

where $K_i$ is a highly sparse matrix, implemented as a 2D convolution of the image with the filter kernel $k_i$, i.e., $K_i x \Leftrightarrow k_i * x$, and $\rho'(\cdot)$ is applied element-wise.
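A corresponding sketch of the gradient (II.4): multiplication by $K_i^\top$ is realized by a 'full' convolution with the flipped kernel (the adjoint of a 'valid' convolution); `degrade_op_T` is an assumed callable implementing $(DH)^\top$.

```python
import numpy as np
from scipy.signal import convolve2d

def sr_gradient(x, y, kernels, lam, rho_prime, degrade_op, degrade_op_T):
    """grad E(x) = sum_i K_i^T rho'(K_i x) + lam * (DH)^T (D H x - y)."""
    g = lam * degrade_op_T(degrade_op(x) - y)             # data term gradient
    for k in kernels:
        u = rho_prime(convolve2d(x, k, mode='valid'))     # rho'(K_i x)
        g = g + convolve2d(u, k[::-1, ::-1], mode='full') # K_i^T u
    return g
```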

In our work, we consider a newly developed non-convex optimization algorithm, iPiano [7], to solve the above minimization problem, instead of the commonly used conjugate gradient (CG) algorithm. We find that the iPiano algorithm is significantly faster than CG. We refer the interested reader to [7] for more details about the iPiano algorithm.
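For orientation, the basic iPiano update for a smooth objective combines a gradient step with an inertial term, $x^{k+1} = x^k - \alpha \nabla E(x^k) + \beta (x^k - x^{k-1})$ [7]. The fixed step sizes in the following sketch are a simplification of the backtracking rules discussed there.

```python
def ipiano(grad, x0, alpha=1e-3, beta=0.8, iters=200):
    """Minimal inertial gradient sketch of iPiano for a smooth non-convex E;
    grad(x) returns the gradient of E, e.g. sr_gradient above with its other
    arguments bound. Step sizes alpha/beta are illustrative, not tuned."""
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(iters):
        x_new = x - alpha * grad(x) + beta * (x - x_prev)  # inertial update
        x_prev, x = x, x_new
    return x
```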

III Experimental results

We conducted two types of experiments. The first is a direct comparison between the MAP estimate and the MMSE estimate for the FoE based SR task. The second compares the MAP based SR model to very recent state-of-the-art SR approaches. All competing implementations are publicly available codes provided by the respective authors and are used as is.

III-A Comparison between the MAP and MMSE estimates

In order to conduct a fair comparison with the MMSE estimation, we first considered the MAP estimation with exactly the same image prior model as exploited in [16] (8 filters of size 3×3 with a GSM potential). We repeated the experiments presented in TABLE I of [16], where eight noise-free images were upsampled with a zooming factor of 3. The results of the MMSE and MAP estimates are shown in Table I. One can see that the MAP estimate using the same image prior model performs equally well compared to the MMSE estimate, in terms of both PSNR and SSIM index. (Note that we were not able to exactly reproduce the results presented in [16] due to the randomness of the sampling-based approach; we obtained slightly different results.)

Method                  House        Peppers      Cameraman    Barbara      Lena         Boat         Hill         Couple
MMSE with prior (II.2)  31.73/88.85  25.94/90.94  26.26/83.43  25.55/74.44  32.93/90.34  29.32/83.32  30.28/81.83  28.47/80.34
MAP with prior (II.2)   32.25/89.03  25.86/89.50  25.91/82.20  25.65/75.41  33.16/90.97  29.10/83.53  30.71/82.59  28.41/80.55
MAP with prior (III.1)  32.72/89.61  26.62/91.43  26.69/84.66  25.71/75.71  33.52/91.44  29.48/84.47  31.14/83.77  28.77/81.91
TABLE I: SR (×3) result comparison between the MMSE and MAP estimates (PSNR/SSIM). Better results of the first two rows (with the same FoE prior) are colored blue. The best results are highlighted in bold.

We then exploited a discriminatively trained FoE prior for the MAP-based SR model to further investigate its performance. This discriminatively trained FoE prior has the same model capacity and is directly optimized for the MAP estimate in the context of Gaussian denoising. We employed the Student-t based FoE model trained in our previous work [2], which is defined as

$p(x) \propto \exp\!\Big(-\sum_{i=1}^{N_f} \theta_i \sum_{c} \rho\big((k_i * x)_c\big)\Big),$   (III.1)

where the penalty function $\rho(z) = \log(1 + z^2)$ is the Lorentzian function shown in Figure 1(b), and $\theta_i$ is the weight of the corresponding filter $k_i$. The learned filters are shown in Figure 1(a).
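The Lorentzian penalty and its derivative (needed in (II.4)) are simple closed-form expressions; a brief sketch follows, with the weight $\theta_i$ shown as an explicit parameter (the exact parametrization in [2] may differ in scaling).

```python
import numpy as np

def lorentzian_penalty(z, theta=1.0):
    """Student-t derived penalty rho(z) = theta * log(1 + z^2)."""
    return theta * np.log1p(np.asarray(z, dtype=float)**2)

def lorentzian_penalty_prime(z, theta=1.0):
    """Its derivative rho'(z) = theta * 2z / (1 + z^2), used in (II.4)."""
    z = np.asarray(z, dtype=float)
    return theta * 2.0 * z / (1.0 + z**2)
```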

The results of the MAP-based SR model with this discriminatively trained FoE prior (III.1) are also shown in Table I. One can see that the MAP inference with our discriminatively trained FoE model improves the PSNR and SSIM results. An illustrative example is presented in Figure 2.

Noise level  Method                  House        Peppers      Cameraman
1            MMSE with prior (II.2)  31.26/87.74  25.69/88.83  26.13/82.40
             MAP with prior (II.2)   31.66/87.30  25.87/87.93  25.72/80.91
             MAP with prior (III.1)  32.03/87.55  26.23/89.24  26.17/81.96
2            MMSE with prior (II.2)  30.47/85.80  25.23/86.00  25.67/80.20
             MAP with prior (II.2)   30.84/85.62  25.38/85.75  25.24/78.82
             MAP with prior (III.1)  31.25/85.87  25.49/86.97  25.69/79.68
3            MMSE with prior (II.2)  29.33/83.21  24.54/82.32  24.94/77.04
             MAP with prior (II.2)   30.30/84.55  24.88/83.80  24.65/76.53
             MAP with prior (III.1)  30.59/84.63  25.10/85.19  25.26/77.97
TABLE II: Noisy image SR (×3) result comparison between the MMSE and MAP estimates (PSNR/SSIM). Better results of the first two rows of each noise level (with the same FoE prior) are colored blue. The best results are highlighted in bold.
Fig. 1: Discriminatively trained FoE prior with the Lorentzian penalty function. (a) The learned filters; the first number in brackets is the weight θ_i and the second is the norm of the filter k_i. (b) The corresponding Lorentzian penalty function derived from the Student-t distribution.
(a) Bicubic (26.62/80.20)
(b) MMSE with (II.2) (32.93/90.34)
(c) MAP with (II.2) (33.16/90.97)
(d) MAP with (III.1) (33.52/91.44)
Fig. 2: Single image SR (×3) results for the noise-free “Lena” image with different algorithms, evaluated in terms of PSNR and SSIM (×100) index.

We also evaluated the performance of the MAP inference in the presence of noise. For the case of mild Gaussian noise, the results of the MAP inference with the two different FoE image prior models are shown in Table II, together with the results of the MMSE based model; this is a direct comparison to TABLE III of [16]. Again, one can see that the MAP estimate with the same FoE model (i.e., (II.2)) works equally well, and it leads to better results with our discriminatively trained FoE prior (III.1).

For the MAP estimate based SR model (II.3), we need to search for an optimal λ for each case. For the noise-free image SR task, we use a relatively large λ; for the SR tasks with Gaussian noise, we empirically choose a smaller λ for each of the three noise levels in Table II, which generally works well.

Run time: We ran the inference algorithms on a server with an Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz. For the SR task of upsampling an image by a factor of 3, the average computation time per iteration of the MMSE-based algorithm is 87s. Typically, the MMSE estimate takes 100 iterations, and therefore this SR task requires about 2.4h, making the approach hardly appealing for practical applications.

In contrast, the MAP inference is much faster. The average computation time per iteration of the MAP inference is 0.039s in the case of the Student-t based FoE prior (III.1), with the same model capacity of 8 filters of size 3×3. Typically, it takes 150–200 iterations to solve the resulting non-convex minimization problem. (Note that the number of required iterations is dramatically reduced by the iPiano algorithm compared to the CG algorithm used in previous works such as [8, 10], where the iterative algorithm has to run for 5000 iterations.) As a consequence, the MAP inference with the Student-t based FoE prior accomplishes the same SR task in about 7s, dramatically faster than the MMSE inference (2.4h). The implementation will be made available at our homepage (www.GPU4Vision.org) after acceptance.

Moreover, as demonstrated in our previous works [1, 2], the MAP inference of FoE prior based models can easily be implemented on a GPU for parallel computation, which generally brings a further substantial speedup.

III-B Comparison to state-of-the-art SR approaches

In order to conduct a comprehensive evaluation of the MAP based SR model, we further compared it with very recent state-of-the-art SR approaches: the K-SVD based method [14], the ANR (Anchored Neighborhood Regression) based method [12], and the deep convolutional network based method SRCNN [3]. To perform a fair comparison with these methods, we strictly obey the same test protocol as in [12]: we use the same test sets, Set14 and Set5, and evaluate an upscaling factor of 3. For the MAP based SR model, we incorporated an FoE prior model with larger filter size and more filters (48 filters of size 7×7, shown in Figure 3), trained in [2]. Replacing the FoE prior model shown in Figure 1 with this new FoE model of increased capacity improves the performance of the MAP based SR model.

The SR results on Set14 and Set5 are summarized in Table III. We can see that the FoE based SR model with 48 filters of size 7×7 achieves an average PSNR similar to the SRCNN method, and outperforms the other competing algorithms. A visual example is shown in Figure 4. (Following [3], we only consider the luminance channel (in YCbCr color space) in our experiments; the two chrominance channels are directly upsampled using bicubic interpolation for display.) In the highlighted region, one can see that our SR method achieves much clearer edges than the other approaches. In summary, our model obtains quality strongly competitive with very recent state-of-the-art SR methods.

Fig. 3: 48 learned filters of size 7×7 exploited in our MAP based SR model.
Set14 images  Bicubic  K-SVD  ANR    SR-CNN  Ours
baboon        23.21    23.52  23.56  23.60   23.58
barbara       26.25    26.76  26.69  26.66   26.43
bridge        24.40    25.02  25.01  25.07   25.13
coastguard    26.55    27.15  27.08  27.20   27.25
comic         23.12    23.96  24.04  24.39   24.26
face          32.82    33.53  33.62  33.58   33.70
flowers       27.23    28.43  28.49  28.97   28.84
foreman       31.18    33.19  33.23  33.35   33.83
lenna         31.68    33.00  33.08  33.39   33.31
man           27.01    27.90  27.92  28.18   28.15
monarch       29.43    31.10  31.09  32.39   31.88
pepper        32.39    34.07  33.82  34.35   34.30
ppt3          23.71    25.23  25.03  26.02   26.42
zebra         26.63    28.49  28.43  28.87   26.81
average       27.54    28.67  28.65  29.00   28.99

Set5 images   Bicubic  K-SVD  ANR    SR-CNN  Ours
baby          33.91    35.08  35.13  35.01   35.10
bird          32.58    34.57  34.60  34.91   35.07
butterfly     24.04    25.94  25.90  27.58   26.79
head          32.88    33.56  33.63  33.55   33.72
woman         28.56    30.37  30.33  30.92   30.79
average       30.39    31.90  31.92  32.39   32.29
TABLE III: Upscaling (×3) performance in terms of PSNR (dB) on the Set14 and Set5 datasets. We compare the results of the MAP based SR model (“Ours”, 48 filters of size 7×7) to three very recent state-of-the-art SR methods: ANR [12], K-SVD [14], and SR-CNN [3].
(a) Original / PSNR
(b) Bicubic / 23.71dB
(c) ANR / 25.03dB
(d) K-SVD / 25.23dB
(e) SR-CNN / 26.02dB
(f) Ours / 26.42dB
Fig. 4: Visual qualitative assessment and quantitative results for the “ppt3” image from Set14 with ×3 magnification (best viewed on screen). Note the difference in the region highlighted with a red rectangle.

IV Discussion and Conclusion

In the context of higher-order MRF based models, it is generally true that the MAP estimate, which only seeks the posterior mode, cannot exploit the full potential offered by probabilistic modeling, while the MMSE estimate, which draws samples from the probability model, should in principle be more powerful. On the other hand, sampling-based MMSE estimation is well known to be very slow, making the corresponding methods hardly appealing for practical applications if one has to stick to MMSE inference.

In this letter, we have concentrated on the higher-order MRF based SR problem and evaluated the performance of the MAP estimate for inference. We found that the MAP estimate works equally well compared to MMSE with the same FoE prior, despite the non-convexity of the resulting optimization problem. We believe the reason is two-fold: first, the exploited iPiano algorithm, an effective non-convex optimization method, reaches the MAP mode in a short time; second, in practice one cannot obtain an accurate solution for the MMSE estimate. In addition, we found that the performance of the MAP estimate can be further boosted by using discriminatively trained FoE prior models. As a consequence, the resulting model, which involves 48 filters of size 7×7, leads to results strongly competitive with very recent state-of-the-art SR methods. Therefore, concerning the higher-order MRF based SR task, we suggest exploiting the MAP estimate for inference: there is no performance loss from this simpler inference criterion, and it has an obvious advantage of high efficiency.

Furthermore, it is worth pointing out that the findings about the MAP estimate presented in this letter strengthen the arguments drawn from the Gaussian denoising problem in our previous works [1, 2]. We have shown in [1, 2] that the MAP-based denoising model with our discriminatively trained FoE prior leads to the best results among MRF-based systems, including MMSE based models. Therefore, we believe that the MAP-based denoising models in previous works, e.g., [10, 9], did not perform well simply because they lacked a good FoE prior well-suited to MAP inference.

In summary, we believe that in the context of higher-order MRF image prior based modeling for image restoration problems, it is a better choice to make use of the MAP estimate together with a discriminatively trained FoE prior.

References

  • [1] Y. Chen, T. Pock, R. Ranftl, and H. Bischof. Revisiting loss-specific training of filter-based MRFs for image restoration. In GCPR, pages 271–281, 2013.
  • [2] Y. Chen, R. Ranftl, and T. Pock. Insights into analysis operator learning: From patch-based sparse models to higher order MRFs. IEEE Transactions on Image Processing, 23(3):1060–1072, 2014.
  • [3] C. Dong, C. C. Loy, K. He, and X. Tang. Learning a deep convolutional network for image super-resolution. In Computer Vision–ECCV 2014, pages 184–199. Springer, 2014.
  • [4] Q. Gao and S. Roth. How well do filter-based MRFs model natural images? In DAGM/OAGM Symposium, pages 62–72, 2012.
  • [5] C. D. Herrera, J. Kannala, P. Sturm, and J. Heikkila. A learned joint depth and intensity prior using Markov random fields. In 3DTV-Conference, 2013 International Conference on, pages 17–24. IEEE, 2013.
  • [6] S. Z. Li. Markov random field modeling in computer vision. Springer-Verlag New York, Inc., 1995.
  • [7] P. Ochs, Y. Chen, T. Brox, and T. Pock. iPiano: Inertial Proximal Algorithm for Nonconvex Optimization. SIAM Journal on Imaging Sciences, 7(2):1388–1419, 2014.
  • [8] S. Roth and M. J. Black. Fields of experts. International Journal of Computer Vision, 82(2):205–229, 2009.
  • [9] K. G. G. Samuel and M. Tappen. Learning optimized MAP estimates in continuously-valued MRF models. In CVPR, 2009.
  • [10] U. Schmidt, Q. Gao, and S. Roth. A generative perspective on MRFs in low-level vision. In CVPR, pages 1751–1758, 2010.
  • [11] U. Schmidt, K. Schelten, and S. Roth. Bayesian deblurring with integrated noise estimation. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 2625–2632. IEEE, 2011.
  • [12] R. Timofte, V. De, and L. V. Gool. Anchored neighborhood regression for fast example-based super-resolution. In Computer Vision (ICCV), 2013 IEEE International Conference on, pages 1920–1927. IEEE, 2013.
  • [13] X. Wang, C. Hou, L. Pu, and Y. Hou. A depth estimating method from a single image using FoE CRF. Multimedia Tools and Applications, pages 1–16, 2014.
  • [14] R. Zeyde, M. Elad, and M. Protter. On single image scale-up using sparse-representations. In Curves and Surfaces, pages 711–730. Springer, 2012.
  • [15] H. Zhang and Y. Zhang. Bayesian image separation with natural image prior. In Image Processing (ICIP), 2012 19th IEEE International Conference on, pages 2097–2100. IEEE, 2012.
  • [16] H. Zhang, Y. Zhang, H. Li, and T. S. Huang. Generative bayesian image super resolution with natural image prior. Image Processing, IEEE Transactions on, 21(9):4054–4067, 2012.
  • [17] B. Zhao, W. Zhang, H. Ding, and H. Wang. Non-blind image deblurring from a single image. Cognitive Computation, 5(1):3–12, 2013.