Markov random field (MRF) based models, which treat the image as a random field, have a long history in low-level computer vision. MRFs are well known to be particularly effective for image prior modeling in image processing. In an MRF-based image prior model, the probability of a whole image is defined through the potentials (or energies) of overlapping local cliques.
An elegant MRF-based image prior model, called Fields of Experts (FoE), was proposed by Roth and Black [8]. The FoE model is defined by (1) a heavy-tailed potential function, motivated by the observation that the responses of natural images to derivative filters exhibit heavy-tailed distributions, and (2) a set of linear filters, which are trained from image samples.
Owing to the effectiveness of the FoE image prior model for many image restoration problems, many works have been devoted to FoE-based image restoration, such as image denoising, inpainting, and deblurring [8, 9, 1]. There are usually two ways to apply a learned FoE prior model to a specific image restoration problem: sampling-based MMSE estimation, as in [10, 4, 15, 16], and energy-minimization-based MAP estimation, as in [8, 9, 1, 2].
It was first claimed, for the image denoising task with a learned FoE image prior model, that MMSE estimation can lead to better performance than MAP estimation. Since then, many works have followed this suggestion and used MMSE estimation for FoE-related models, for example in image deblurring [11, 17], image denoising, depth estimation [13, 5], image separation [15], and single image super resolution [16].
In a recent paper [16], the FoE prior model was exploited in the context of image super resolution (SR). The authors also proposed to employ the MMSE estimate in the inference procedure instead of the MAP estimate. With the MMSE estimate, the FoE-based SR model achieves state-of-the-art SR performance. However, it is well known that the sampling-based approach is very time consuming, which makes the FoE-based SR model unappealing for practical applications.
In principle, the MMSE estimate is a better alternative to MAP, as it can exploit the uncertainty of the model, especially for multimodal distributions with multiple peaks. In practice, however, it is usually hard to compute the MMSE estimate accurately, because the expectation has to be taken over entire images. As a consequence, MAP inference, which seeks the highest peak, may work equally well for some problems.
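The difference between the two estimators can be illustrated on a toy example. The sketch below is not taken from the experiments in this letter: it tabulates a hypothetical one-dimensional bimodal posterior on a grid (the mixture parameters are assumptions chosen for illustration) and compares the posterior mode (MAP) with the posterior mean (MMSE).

```python
import numpy as np

def gaussian(x, mean, std):
    """Density of a 1-D Gaussian evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def map_and_mmse(grid, density):
    """Return (MAP, MMSE) estimates of a posterior tabulated on a uniform grid."""
    weights = density / density.sum()        # normalize to a discrete distribution
    x_map = float(grid[np.argmax(weights)])  # posterior mode
    x_mmse = float(np.sum(grid * weights))   # posterior mean
    return x_map, x_mmse

grid = np.linspace(-6.0, 6.0, 2401)
# Hypothetical bimodal posterior: a tall narrow peak and a broad secondary peak.
posterior = 0.6 * gaussian(grid, -2.0, 0.3) + 0.4 * gaussian(grid, 2.0, 1.0)
x_map, x_mmse = map_and_mmse(grid, posterior)
print(f"MAP  estimate: {x_map:.2f}")
print(f"MMSE estimate: {x_mmse:.2f}")
```

In one dimension the posterior mean is trivial to compute on a grid; for whole images it has to be approximated by sampling, which is where the computational cost of the MMSE estimate comes from.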
In this letter, we evaluate the performance of MAP inference for the FoE-based SR problem. Our experimental results demonstrate that MAP inference for the FoE-based SR model has been underestimated in the previous work [16]. Numerical results show that, with exactly the same image prior model as used in the MMSE estimation, MAP inference achieves equivalent performance in terms of both quantitative measures (PSNR and SSIM values) and visual quality. In addition, MAP inference obtains further improvements with a discriminatively trained FoE image prior of the same model capacity. MAP inference also has a significant efficiency advantage, which is even more pronounced with our recently proposed non-convex optimization algorithm, iPiano [7].
To sum up, our experimental findings suggest using MAP inference for FoE prior-based image super resolution, because (1) there is no performance loss from using this simpler inference criterion, and (2) MAP inference is far more efficient.
II MAP inference of FoE image prior based SR
In a typical image super resolution task, the low-resolution (LR) image is generated from a high-resolution (HR) image using the following formulation

$$y = DBx + n,$$

where $x \in \mathbb{R}^{N}$ and $y \in \mathbb{R}^{M}$ are the HR and LR images, respectively, $B \in \mathbb{R}^{N \times N}$ is the matrix corresponding to the blurring operation, and $D \in \mathbb{R}^{M \times N}$ ($M < N$) is the down-sampling operation. $n$ is the noise (typically assumed to be Gaussian white noise with level $\sigma$).
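The degradation model described above (blur, then down-sample, then add noise) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the setup used in the experiments: the Gaussian blur kernel, its size and standard deviation, the replicated borders, and the helper names `gaussian_kernel`, `blur`, and `degrade` are all hypothetical choices.

```python
import numpy as np

def gaussian_kernel(size=7, std=1.6):
    """Normalized 2-D Gaussian blur kernel (size and std are illustrative)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-0.5 * (ax / std) ** 2)
    kernel = np.outer(g, g)
    return kernel / kernel.sum()

def blur(image, kernel):
    """Apply the blurring operation as 2-D filtering with replicated borders."""
    rows, cols = kernel.shape
    padded = np.pad(image, ((rows // 2,) * 2, (cols // 2,) * 2), mode="edge")
    out = np.zeros(image.shape, dtype=float)
    for i in range(rows):
        for j in range(cols):
            out += kernel[i, j] * padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out

def degrade(hr, factor=3, noise_std=0.0, seed=0):
    """Simulate an LR image: blur the HR image, keep every factor-th pixel, add noise."""
    rng = np.random.default_rng(seed)
    lr = blur(hr, gaussian_kernel())[::factor, ::factor]
    return lr + noise_std * rng.standard_normal(lr.shape)

hr = np.random.default_rng(1).uniform(0.0, 255.0, size=(48, 48))
lr = degrade(hr, factor=3, noise_std=1.0)
print(hr.shape, "->", lr.shape)
```

With `noise_std=0.0` the function produces the noise-free setting; a small positive value corresponds to the mildly noisy setting.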
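The FoE image prior based SR model is formulated by the following Bayesian probabilistic model

$$p(x \mid y) \propto p(y \mid x)\, p(x), \qquad p(y \mid x) \propto \exp\left(-\frac{1}{2\sigma^2}\|y - DBx\|_2^2\right), \tag{II.1}$$

where $x$ and $y$ denote the HR and LR images, $DB$ denotes blurring followed by down-sampling, $\sigma$ is the noise level, and $p(x)$ is the probability density of an image under the FoE framework, written as

$$p(x) \propto \prod_{c \in \mathcal{C}} \prod_{i=1}^{N_f} \phi\big((k_i * x)_c; \alpha_i\big),$$

where $\mathcal{C}$ is the set of maximal cliques, $N_f$ is the number of filters, $(k_i * x)_c$ refers to the $c$-th pixel of the image filtered by $k_i$, and $\phi$ is the potential function with associated weights $\alpha_i$. In the prior model considered here, the potential function is given by Gaussian scale mixtures (GSMs) as

$$\phi(z; \alpha_i) = \sum_{j=1}^{J} \alpha_{ij}\, \mathcal{N}\big(z; 0, \eta_i^2 / s_j\big), \tag{II.2}$$

where $\alpha_{ij}$ are the normalized weights of the Gaussian component with scale $s_j$ and base variance $\eta_i^2$.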
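A GSM potential of this kind is a weighted sum of zero-mean Gaussians whose variances equal a base variance divided by a scale. The sketch below evaluates such a potential and the corresponding penalty (its negative logarithm). The function names and the mixture weights and scales are hypothetical and chosen only to show the heavy tails; they are not the trained parameters.

```python
import numpy as np

def gsm_potential(z, weights, scales, base_var):
    """phi(z) = sum_j weights[j] * N(z; 0, base_var / scales[j])."""
    z = np.asarray(z, dtype=float)[..., None]           # broadcast over components
    var = base_var / np.asarray(scales, dtype=float)    # per-component variance
    comps = np.exp(-0.5 * z ** 2 / var) / np.sqrt(2.0 * np.pi * var)
    return comps @ np.asarray(weights, dtype=float)

def gsm_penalty(z, weights, scales, base_var):
    """rho(z) = -log phi(z), the penalty that enters a MAP energy."""
    return -np.log(gsm_potential(z, weights, scales, base_var))

# Hypothetical mixture; the weights are normalized to sum to one.
weights = np.array([0.2, 0.5, 0.3])
scales = np.array([0.01, 0.1, 1.0])
z = np.array([0.0, 5.0, 20.0])
print("potential:", gsm_potential(z, weights, scales, base_var=1.0))
print("penalty:  ", gsm_penalty(z, weights, scales, base_var=1.0))
```

Small scales give wide Gaussians, so the mixture keeps noticeable mass at large filter responses where a single unit-variance Gaussian is essentially zero.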
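According to the posterior (II.1), the sampling-based MMSE estimate was used in [16] to recover the underlying HR image $x$. In this letter, we consider the MAP estimate. With MAP estimation, the FoE-based SR task is formulated as the following energy minimization problem

$$\hat{x} = \arg\min_{x} E(x), \qquad E(x) = \sum_{c \in \mathcal{C}} \sum_{i=1}^{N_f} \rho\big((k_i * x)_c; \alpha_i\big) + \frac{\lambda}{2}\|DBx - y\|_2^2, \tag{II.3}$$

where $\rho(z; \alpha_i) = -\log \phi(z; \alpha_i)$ is the penalty function derived from the potential $\phi$ defined in (II.2), $k_i$ are the $N_f$ filters, $\mathcal{C}$ indexes the pixels (maximal cliques), $DB$ denotes blurring followed by down-sampling, $y$ is the LR image, and $\lambda$ weights the data term.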
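The link between the posterior and the energy can be made explicit by taking the negative logarithm. The derivation below is a sketch that assumes a Gaussian likelihood with noise level $\sigma$; the symbols ($x$ for the HR image, $y$ for the LR image, $D$ and $B$ for down-sampling and blurring, $k_i$ for the filters, $\phi$ for the potential, $\rho$ for the penalty, $\lambda$ for the data-term weight) are notation chosen here for illustration.

```latex
\begin{aligned}
-\log p(x \mid y)
  &= -\log p(y \mid x) - \log p(x) + \text{const} \\
  &= \frac{1}{2\sigma^2}\,\|DBx - y\|_2^2
     - \sum_{c \in \mathcal{C}} \sum_{i=1}^{N_f}
       \log \phi\big((k_i * x)_c; \alpha_i\big) + \text{const} \\
  &= \sum_{c \in \mathcal{C}} \sum_{i=1}^{N_f}
       \rho\big((k_i * x)_c; \alpha_i\big)
     + \frac{\lambda}{2}\,\|DBx - y\|_2^2 + \text{const},
  \qquad \rho = -\log \phi, \quad \lambda = \frac{1}{\sigma^2}.
\end{aligned}
```

Under this assumption the data-term weight equals the inverse noise variance, which is consistent with using a large weight for noise-free inputs; in the experiments the weight is nevertheless searched for each case rather than fixed by this relation.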
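Gradient-based algorithms are applicable to the minimization problem (II.3). First, we need the gradient $\nabla_x E$, which is given by

$$\nabla_x E(x) = \sum_{i=1}^{N_f} K_i^\top \rho'(K_i x; \alpha_i) + \lambda B^\top D^\top (DBx - y),$$

where $K_i$ is a highly sparse matrix, implemented as 2D convolution of the image $x$ with filter kernel $k_i$, i.e., $K_i x \Leftrightarrow k_i * x$, $\rho'(K_i x; \alpha_i) = \big(\rho'((K_i x)_1; \alpha_i), \ldots, \rho'((K_i x)_N; \alpha_i)\big)^\top$, with $\rho'(z; \alpha_i) = -\phi'(z; \alpha_i)/\phi(z; \alpha_i)$.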
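The structure of this gradient (filter, apply the penalty derivative pixel-wise, apply the transposed filter, then add the back-projected data residual) can be checked numerically on a small example. The sketch below makes several simplifying assumptions that are not from the source: periodic image boundaries, a smooth stand-in penalty log(1 + z^2) instead of the GSM penalty, two hypothetical derivative filters, and a box blur. All function names are illustrative.

```python
import numpy as np

def filt(x, k):
    """K x: circular 2-D correlation of image x with kernel k."""
    out = np.zeros(x.shape, dtype=float)
    r0, r1 = k.shape[0] // 2, k.shape[1] // 2
    for i in range(k.shape[0]):
        for j in range(k.shape[1]):
            out += k[i, j] * np.roll(x, (r0 - i, r1 - j), axis=(0, 1))
    return out

def filt_T(x, k):
    """K^T x: adjoint of filt, obtained by reversing every shift."""
    out = np.zeros(x.shape, dtype=float)
    r0, r1 = k.shape[0] // 2, k.shape[1] // 2
    for i in range(k.shape[0]):
        for j in range(k.shape[1]):
            out += k[i, j] * np.roll(x, (i - r0, j - r1), axis=(0, 1))
    return out

def down(x, s):
    """D: keep every s-th pixel."""
    return x[::s, ::s]

def down_T(y, s, shape):
    """D^T: place LR pixels back on the HR grid, zeros elsewhere."""
    out = np.zeros(shape)
    out[::s, ::s] = y
    return out

def energy(x, y, filters, blur_k, s, lam):
    prior = sum(np.sum(np.log1p(filt(x, k) ** 2)) for k in filters)
    resid = down(filt(x, blur_k), s) - y
    return prior + 0.5 * lam * np.sum(resid ** 2)

def gradient(x, y, filters, blur_k, s, lam):
    g = np.zeros(x.shape, dtype=float)
    for k in filters:
        z = filt(x, k)
        g += filt_T(2.0 * z / (1.0 + z ** 2), k)   # K^T rho'(K x) for rho = log(1 + z^2)
    resid = down(filt(x, blur_k), s) - y
    return g + lam * filt_T(down_T(resid, s, x.shape), blur_k)

def numerical_gradient(x, *args, eps=1e-6):
    """Central finite differences of the energy, used only as a check."""
    g = np.zeros(x.shape, dtype=float)
    for idx in np.ndindex(x.shape):
        e = np.zeros(x.shape, dtype=float)
        e[idx] = eps
        g[idx] = (energy(x + e, *args) - energy(x - e, *args)) / (2.0 * eps)
    return g

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6))
y = rng.standard_normal((2, 2))
filters = [np.array([[-1.0, 1.0]]), np.array([[-1.0], [1.0]])]
blur_k = np.full((3, 3), 1.0 / 9.0)
args = (y, filters, blur_k, 3, 2.0)
err = np.max(np.abs(gradient(x, *args) - numerical_gradient(x, *args)))
print(f"max |analytic - numerical| = {err:.2e}")
```

Comparing against finite differences is a cheap way to catch a wrong adjoint, which is the most common mistake when the filter matrices are implemented as convolutions.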
In our work, we use a recently developed non-convex optimization algorithm, iPiano [7], to solve the above minimization problem, instead of the commonly used conjugate gradient (CG) algorithm. We find that iPiano is significantly faster than CG. We refer interested readers to [7] for more details about the iPiano algorithm.
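For orientation, the basic iPiano update combines a gradient step on the smooth part of the objective with an inertial (momentum) term, followed by a proximal step on the remaining part. The sketch below is a simplified constant-step variant under stated assumptions, not the implementation used for the experiments: the proximal map defaults to the identity, the Lipschitz constant of the gradient is assumed known, and the toy objective log(1 + x^2) + (lam/2)(x - y)^2 with lam = 0.1 is a hypothetical non-convex example.

```python
import numpy as np

def ipiano(grad_f, x0, lipschitz, beta=0.5, iters=300, prox_g=None):
    """Constant-step inertial proximal gradient iteration.

    x_next = prox_g(x - alpha * grad_f(x) + beta * (x - x_prev)),
    with 0 <= beta < 1 and step size alpha < 2 * (1 - beta) / L.
    """
    alpha = 1.99 * (1.0 - beta) / lipschitz
    x_prev = x = np.array(x0, dtype=float)
    for _ in range(iters):
        v = x - alpha * grad_f(x) + beta * (x - x_prev)
        x_prev, x = x, (v if prox_g is None else prox_g(v, alpha))
    return x

# Toy non-convex problem, solved independently for each entry of y.
lam = 0.1
y = np.array([3.0, -0.5, 0.0])

def grad_f(x):
    return 2.0 * x / (1.0 + x ** 2) + lam * (x - y)

# The second derivative of log(1 + x^2) lies in [-0.25, 2], so L = 2 + lam is valid.
x_hat = ipiano(grad_f, x0=y, lipschitz=2.0 + lam)
print("stationary point:", x_hat)
print("gradient norm:   ", np.linalg.norm(grad_f(x_hat)))
```

The published algorithm also offers backtracking variants that estimate the Lipschitz constant on the fly; those are omitted here for brevity.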
III Experimental results
We conducted two types of experiments. The first is a direct comparison between the MAP estimate and the MMSE estimate for the FoE-based SR task. The second compares the MAP-based SR model with very recent state-of-the-art SR approaches. The corresponding implementations are all publicly available code provided by the authors and are used as is.
III-A Comparison between the MAP and MMSE estimates
In order to conduct a fair comparison with MMSE estimation, we first considered MAP estimation with exactly the same image prior model as used in [16] (8 filters with GSM potentials). We repeated the experiments presented in TABLE I of [16], where eight noise-free images were upsampled with a zooming factor of 3. The results of the MMSE and MAP estimates are shown in Table I. One can see that the MAP estimate using the same image prior model performs as well as the MMSE estimate in terms of PSNR and SSIM index. (Note that we were not able to exactly reproduce the results presented in [16] due to the randomness of the sampling-based approach; we obtained slightly different results.)
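Table I: PSNR (dB) / SSIM (in %) of the MMSE and MAP estimates on the eight noise-free test images (zooming factor 3).

| Method | Image 1 | Image 2 | Image 3 | Image 4 | Image 5 | Image 6 | Image 7 | Image 8 |
|---|---|---|---|---|---|---|---|---|
| MMSE with prior (II.2) | 31.73/88.85 | 25.94/90.94 | 26.26/83.43 | 25.55/74.44 | 32.93/90.34 | 29.32/83.32 | 30.28/81.83 | 28.47/80.34 |
| MAP with prior (II.2) | 32.25/89.03 | 25.86/89.50 | 25.91/82.20 | 25.65/75.41 | 33.16/90.97 | 29.10/83.53 | 30.71/82.59 | 28.41/80.55 |
| MAP with prior (III.1) | 32.72/89.61 | 26.62/91.43 | 26.69/84.66 | 25.71/75.71 | 33.52/91.44 | 29.48/84.47 | 31.14/83.77 | 28.77/81.91 |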
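For reference, PSNR is computed from the mean squared error between a reference image and its estimate. The sketch below gives a minimal implementation (assuming 8-bit intensities, i.e. a peak value of 255, which is an assumption rather than something stated above) and averages the PSNR values reported in Table I per method; the dictionary simply copies the PSNR entries from the table.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB; peak=255 assumes 8-bit images."""
    diff = np.asarray(reference, dtype=float) - np.asarray(estimate, dtype=float)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))

# PSNR values (dB) copied from Table I, one list per method.
table_i_psnr = {
    "MMSE with prior (II.2)": [31.73, 25.94, 26.26, 25.55, 32.93, 29.32, 30.28, 28.47],
    "MAP with prior (II.2)": [32.25, 25.86, 25.91, 25.65, 33.16, 29.10, 30.71, 28.41],
    "MAP with prior (III.1)": [32.72, 26.62, 26.69, 25.71, 33.52, 29.48, 31.14, 28.77],
}
mean_psnr = {method: float(np.mean(values)) for method, values in table_i_psnr.items()}
for method, value in mean_psnr.items():
    print(f"{method}: mean PSNR = {value:.2f} dB")
```

Averaged over the eight images, the three rows come out at roughly 28.81 dB, 28.88 dB, and 29.33 dB, respectively.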
We then exploited a discriminatively trained FoE prior in the MAP-based SR model to further investigate its performance. This prior has the same model capacity and is directly optimized for the MAP estimate in the context of Gaussian denoising. We employed the Student-t based FoE model trained in our previous work, referred to below as prior (III.1).
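As an illustration of what a Student-t expert looks like, the sketch below uses the form from the original FoE work [8], phi(z; alpha) = (1 + z^2 / 2)^(-alpha), and evaluates the corresponding penalty and its derivative. This is an assumption made for illustration: the exact parameterization of the trained prior may differ, and the function names and the value of alpha are hypothetical.

```python
import numpy as np

def student_t_penalty(z, alpha):
    """rho(z) = -log phi(z) = alpha * log(1 + z^2 / 2)."""
    z = np.asarray(z, dtype=float)
    return alpha * np.log1p(0.5 * z ** 2)

def student_t_penalty_grad(z, alpha):
    """rho'(z) = alpha * z / (1 + z^2 / 2), bounded for large |z|."""
    z = np.asarray(z, dtype=float)
    return alpha * z / (1.0 + 0.5 * z ** 2)

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print("penalty:   ", student_t_penalty(z, alpha=1.0))
print("derivative:", student_t_penalty_grad(z, alpha=1.0))
```

The penalty grows only logarithmically, so large filter responses such as edges are penalized far less than under a quadratic penalty; the price is that the penalty, and hence the MAP energy, is non-convex.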
The results of the MAP-based SR model with this discriminatively trained FoE prior (III.1) are also shown in Table I. One can see that the MAP inference with our discriminatively trained FoE model improves the PSNR and SSIM results. An illustrative example is presented in Figure 2.
Table II: PSNR (dB) / SSIM (in %) of the MMSE and MAP estimates in the presence of Gaussian noise.

| Noise level | Method | Image 1 | Image 2 | Image 3 |
|---|---|---|---|---|
| 1 | MMSE with prior (II.2) | 31.26/87.74 | 25.69/88.83 | 26.13/82.40 |
| 1 | MAP with prior (II.2) | 31.66/87.30 | 25.87/87.93 | 25.72/80.91 |
| 1 | MAP with prior (III.1) | 32.03/87.55 | 26.23/89.24 | 26.17/81.96 |
| 2 | MMSE with prior (II.2) | 30.47/85.80 | 25.23/86.00 | 25.67/80.20 |
| 2 | MAP with prior (II.2) | 30.84/85.62 | 25.38/85.75 | 25.24/78.82 |
| 2 | MAP with prior (III.1) | 31.25/85.87 | 25.49/86.97 | 25.69/79.68 |
| 3 | MMSE with prior (II.2) | 29.33/83.21 | 24.54/82.32 | 24.94/77.04 |
| 3 | MAP with prior (II.2) | 30.30/84.55 | 24.88/83.80 | 24.65/76.53 |
| 3 | MAP with prior (III.1) | 30.59/84.63 | 25.10/85.19 | 25.26/77.97 |
We also evaluated the performance of MAP inference in the presence of noise. For mild Gaussian noise, the results of MAP inference with the two FoE image prior models are shown in Table II, together with the results of the MMSE-based model. Note that this is a direct comparison to TABLE III of [16]. Again, one can see that the MAP estimate with the same FoE model (i.e., (II.2)) works equally well, and that it leads to better results with our discriminatively trained FoE prior (III.1).
For the MAP estimate based SR model (II.3), we need to search for an optimal data-term weight $\lambda$ for each case. For the noise-free image SR task, we use a relatively large $\lambda$; for the SR tasks with Gaussian noise, an empirically chosen $\lambda$ for each of the three noise levels generally works well.
Run time: We ran the inference algorithms on a server with an Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz. For the SR task used in this timing comparison, the average computation time per iteration of the MMSE-based algorithm is 87 s. The MMSE estimate typically takes 100 iterations, so this SR task requires about 2.4 h, which makes the approach hardly appealing for practical applications.
In contrast, MAP inference is much faster. The average computation time per iteration of MAP inference is 0.039 s for the Student-t based FoE prior (III.1), which has the same model capacity of 8 filters. It typically takes 150 to 200 iterations to solve the resulting non-convex minimization problem. (The required number of iterations is dramatically reduced by using the iPiano algorithm, compared with the usual CG algorithm used in previous works such as [8, 10], where the iterative algorithm has to run 5000 iterations.) As a consequence, MAP inference with the Student-t based FoE prior accomplishes the same SR task in 7 s, which is dramatically faster than MMSE inference (2.4 h). The implementation will be available at our homepage (www.GPU4Vision.org) after acceptance.
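The run-time figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below only combines the numbers stated in the text (87 s and 100 iterations for MMSE; 0.039 s and 150 to 200 iterations, or about 7 s in total, for MAP); the variable names are illustrative.

```python
# Per-iteration times and iteration counts as quoted in the text.
mmse_total_s = 87.0 * 100                      # MMSE: 87 s per iteration, 100 iterations
mmse_total_h = mmse_total_s / 3600.0

map_low_s = 0.039 * 150                        # MAP: 0.039 s per iteration
map_high_s = 0.039 * 200
map_reported_s = 7.0                           # total MAP run time quoted in the text

speedup = mmse_total_s / map_reported_s

print(f"MMSE total: {mmse_total_s:.0f} s = {mmse_total_h:.2f} h")
print(f"MAP total:  {map_low_s:.2f} s to {map_high_s:.2f} s")
print(f"Speed-up of MAP over MMSE: about {speedup:.0f}x")
```

The quoted 7 s falls inside the range implied by the per-iteration time, and the resulting speed-up is about three orders of magnitude.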
III-B Comparison to state-of-the-art SR approaches
In order to conduct a comprehensive evaluation of the MAP-based SR model, we further compared it with very recent state-of-the-art SR approaches: the K-SVD based method [14], the ANR (Anchored Neighborhood Regression) based method [12], and the deep convolutional network based method SRCNN [3]. To perform a fair comparison with these methods, we strictly follow the same test protocols. We used the same test sets, Set14 and Set5, to evaluate an upscaling factor of 3. For the MAP-based SR model, we incorporated an FoE prior model with larger filter size and more filters (48 filters, shown in Figure 3), which was trained in earlier work. Replacing the FoE prior model shown in Figure 1 with this new FoE model of increased capacity improves the performance of the MAP-based SR model.
The SR results on Set14 and Set5 are summarized in Table III. We can see that the FoE-based SR model with 48 filters achieves an average PSNR similar to that of the SRCNN method and outperforms the other competing algorithms.
A visual example is shown in Figure 4. (Following previous work, we only consider the luminance channel (in YCrCb color space) in our experiments; the two chrominance channels are directly upsampled using bicubic interpolation for the purpose of display.) In the highlighted region, one can see that our SR method achieves much clearer edges than the other approaches. In summary, our model is strongly competitive in quality with very recent state-of-the-art SR methods.
IV Discussion and Conclusion
In the context of higher-order MRF-based models, it is generally true that the MAP estimate, which only seeks the posterior mode, cannot exploit the full potential offered by probabilistic modeling, while the MMSE estimate, which is computed from samples drawn from the probability model, should be more powerful. On the other hand, it is well known that sampling-based MMSE estimation is very slow, which makes the corresponding methods hardly appealing for practical applications if one has to stick to MMSE inference.
In this letter, we have concentrated on the higher-order MRF-based SR problem and evaluated the performance of the MAP estimate for inference. We found that the MAP estimate works as well as MMSE with the same FoE prior, despite the non-convexity of the resulting optimization problem. We believe the reason is twofold: first, the iPiano algorithm, which is an effective non-convex optimization algorithm, helps us reach the MAP mode in a short time; second, in practice one is not able to obtain an accurate solution for the MMSE estimate. In addition, we found that the performance of the MAP estimate can be further boosted by using discriminatively trained FoE prior models. As a consequence, the resulting model, which involves 48 filters, leads to results that are strongly competitive with very recent state-of-the-art SR methods. Therefore, for the higher-order MRF-based SR task, we suggest using the MAP estimate for inference, because there is no performance loss from this simpler inference criterion while it has an obvious efficiency advantage.
Furthermore, it is worth pointing out that the findings about the MAP estimate presented in this letter strengthen the arguments we drew from the Gaussian denoising problem in our previous works [1, 2]. We have shown in [1, 2] that the MAP-based denoising model with our discriminatively trained FoE prior leads to the best results among the MRF-based systems, including MMSE-based models. Therefore, we believe that the MAP-based denoising model did not perform well in previous works, e.g., [10, 9], only because they had not obtained a good FoE prior well suited to MAP inference.
In summary, we believe that, for higher-order MRF image prior based modeling of image restoration problems, the MAP estimate together with a discriminatively trained FoE prior is the better choice.
References
[1] Y. Chen, T. Pock, R. Ranftl, and H. Bischof. Revisiting loss-specific training of filter-based MRFs for image restoration. In GCPR, pages 271–281, 2013.
[2] Y. Chen, R. Ranftl, and T. Pock. Insights into analysis operator learning: From patch-based sparse models to higher order MRFs. IEEE Transactions on Image Processing, 23(3):1060–1072, 2014.
[3] C. Dong, C. C. Loy, K. He, and X. Tang. Learning a deep convolutional network for image super-resolution. In Computer Vision–ECCV 2014, pages 184–199. Springer, 2014.
[4] Q. Gao and S. Roth. How well do filter-based MRFs model natural images? In DAGM/OAGM Symposium, pages 62–72, 2012.
[5] C. D. Herrera, J. Kannala, P. Sturm, and J. Heikkila. A learned joint depth and intensity prior using Markov random fields. In 3DTV-Conference, 2013 International Conference on, pages 17–24. IEEE, 2013.
[6] S. Z. Li. Markov random field modeling in computer vision. Springer-Verlag New York, Inc., 1995.
[7] P. Ochs, Y. Chen, T. Brox, and T. Pock. iPiano: Inertial Proximal Algorithm for Nonconvex Optimization. SIAM Journal on Imaging Sciences, 7(2):1388–1419, 2014.
[8] S. Roth and M. J. Black. Fields of experts. International Journal of Computer Vision, 82(2):205–229, 2009.
[9] K. G. G. Samuel and M. Tappen. Learning optimized MAP estimates in continuously-valued MRF models. In CVPR, 2009.
[10] U. Schmidt, Q. Gao, and S. Roth. A generative perspective on MRFs in low-level vision. In CVPR, pages 1751–1758, 2010.
[11] U. Schmidt, K. Schelten, and S. Roth. Bayesian deblurring with integrated noise estimation. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 2625–2632. IEEE, 2011.
[12] R. Timofte, V. De Smet, and L. Van Gool. Anchored neighborhood regression for fast example-based super-resolution. In Computer Vision (ICCV), 2013 IEEE International Conference on, pages 1920–1927. IEEE, 2013.
[13] X. Wang, C. Hou, L. Pu, and Y. Hou. A depth estimating method from a single image using FoE CRF. Multimedia Tools and Applications, pages 1–16, 2014.
[14] R. Zeyde, M. Elad, and M. Protter. On single image scale-up using sparse-representations. In Curves and Surfaces, pages 711–730. Springer, 2012.
[15] H. Zhang and Y. Zhang. Bayesian image separation with natural image prior. In Image Processing (ICIP), 2012 19th IEEE International Conference on, pages 2097–2100. IEEE, 2012.
[16] H. Zhang, Y. Zhang, H. Li, and T. S. Huang. Generative Bayesian image super resolution with natural image prior. Image Processing, IEEE Transactions on, 21(9):4054–4067, 2012.
[17] B. Zhao, W. Zhang, H. Ding, and H. Wang. Non-blind image deblurring from a single image. Cognitive Computation, 5(1):3–12, 2013.