In Bayesian image processing, one constructs a posterior distribution and then infers restored images based on the posterior distribution. The posterior distribution is derived from a prior distribution that captures the statistical properties of the images. One of the major challenges of Bayesian image processing is the construction of an effective prior for the images. For this purpose, a Gaussian Markov random field (GMRF) model (or Gaussian graphical model) is a possible choice. Because a GMRF is a multi-dimensional Gaussian distribution, its mathematical treatment is tractable. GMRFs have also been applied in various research fields other than Bayesian image processing, e.g., traffic reconstruction Kataoka et al. (2014), sparse modeling Banerjee et al. (2007), and earth science Kuwatani et al. (2014a, b).
The procedure of Bayesian image denoising is divided into two stages: a hyperparameter-estimation stage and a restoring stage. In the hyperparameter-estimation stage, one estimates the optimal values of the hyperparameters in the posterior distribution for a given degraded image by using the expectation-maximization (EM) algorithm (or maximum marginal-likelihood estimation), while, in the restoring stage, one infers the restored image based on the posterior distribution with the optimal hyperparameters by using maximum a posteriori (MAP) estimation. However, the processes in the two stages share a computational problem: they require the inverse of a covariance matrix, the computational time of which is $O(n^3)$, where $n$ is the number of pixels in the degraded image. Hence, in order to reduce the computational time, some approximations are employed. In the usual setting of Bayesian image denoising, the graph structure of the GMRF model is a square grid graph according to the configuration of the pixels. By adding the periodic boundary condition (PBC) to the square grid graph, the authors of Nishimori (2000); Tanaka (2002) constructed $O(n^2)$-time algorithms. We refer to this approximation as the torus approximation, because a square grid graph with the PBC is a torus graph. This approximation allows the graph Laplacian of the GMRF model to be diagonalized by using the discrete Fourier transformation (DFT), and the diagonalization reduces the computational time to $O(n^2)$. More recently, the time was further reduced to $O(n \log n)$ by using the fast Fourier transformation (FFT) Morin (2010).
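As a quick illustration of why the PBC helps: the graph Laplacian of a one-dimensional grid with the PBC (a ring) is a circulant matrix, so its eigenvalues are simply the DFT of its first column. A minimal NumPy sketch (the ring size $n = 8$ is an arbitrary assumption for this check):

```python
import numpy as np

n = 8
# Graph Laplacian of a ring: a 1-D grid with the periodic boundary condition.
# It is circulant: 2 on the diagonal, -1 on the (cyclic) off-diagonals.
L = 2 * np.eye(n) - np.roll(np.eye(n), 1, axis=0) - np.roll(np.eye(n), -1, axis=0)

# Eigenvalues of a circulant matrix = DFT of its first column.
eig_dft = np.fft.fft(L[:, 0]).real            # equals 2 - 2*cos(2*pi*k/n)
eig_direct = np.linalg.eigvalsh(L)            # ascending order

assert np.allclose(np.sort(eig_dft), eig_direct)
```

The same factorization applies dimension-wise on the torus graph, which is what makes the DFT-based methods (and, with the FFT, the $O(n \log n)$ variant) possible.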
Recently, it was shown that the graph Laplacian of a square-grid GMRF model without the PBC can be diagonalized by using the discrete cosine transformation (DCT) Fracastoro and Magli (2015); Fracastoro et al. (2017); Katakami et al. (2017). By using the DCT, one can diagonalize the graph Laplacian without the torus approximation. Therefore, by combining the DCT-based diagonalization with the FFT method proposed in Ref. Morin (2010), one can construct an $O(n \log n)$-time algorithm without the torus approximation. The details of this algorithm are presented in Appendix D.
In this paper, we define a model for Bayesian image denoising based on GMRFs. The defined GMRF model has a more general form than the GMRF models presented in many previous studies, and it includes them as special cases. For our GMRF model, we propose an $O(n)$-time algorithm for practical Bayesian image denoising that uses the DCT and the mean-field approximation. Since the mean-field approximation is equivalent to the exact computation in the present setting (see Appendix A), the results obtained by our method are equivalent to those obtained by exact computation. Our algorithm is a state-of-the-art algorithm from the perspective of the order of the computational time. It is noteworthy that, in the present problem setting, one cannot create a sublinear-time algorithm, i.e., an algorithm whose computational time is less than $O(n)$, because at least $O(n)$-time computation is needed just to output the restored image. In Tab. 1, the computational times of our method and the previous methods are shown.
|DFT method Nishimori (2000); Tanaka (2002)|$O(n^2)$|torus approximation|
|DFT–FFT method Morin (2010)|$O(n \log n)$|torus approximation|
|proposed method|$O(n)$|none|
The remainder of this paper is organized as follows. The model definition and the frameworks of Bayesian image denoising and of the EM algorithm for hyperparameter estimation are presented in Sect. II. The proposed method is shown in Sect. III. In Sect. IV, we show the validity of our method by the results of numerical experiments. Finally, conclusions are given in Sect. V.
II Framework of Bayesian Image Denoising
Consider the problem of digital image denoising described below. Given an original image composed of $n$ pixels, a degraded image of the same size as the original one is generated by adding additive white Gaussian noise (AWGN) to the original one. Suppose that we independently obtain $K$ degraded images generated from the same stochastic process, where $K \geq 1$. The goal of Bayesian image denoising is to infer the original image from the given data, i.e., the $K$ degraded images. We denote the original image by the vector $\mathbf{x} = (x_1, x_2, \dots, x_n)^{\mathrm{T}}$ ($x_i$ expresses the intensity of the $i$th pixel), where the image is vectorized by raster scanning (or row-major scanning) on the image, and $n$ is the number of pixels in the image; we denote the $k$th degraded image by the vector $\mathbf{y}^{(k)} = (y_1^{(k)}, y_2^{(k)}, \dots, y_n^{(k)})^{\mathrm{T}}$. Note that the degraded images are also vectorized by the same procedure, i.e., raster scanning, as the original one. The framework of the presented Bayesian image denoising is illustrated in Fig. 1.
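The raster-scan (row-major) vectorization used here corresponds to NumPy's default C-order flattening; a minimal sketch of the index convention:

```python
import numpy as np

img = np.arange(6).reshape(2, 3)     # a tiny 2x3 "image"
x = img.flatten(order='C')           # raster scan: row by row

assert np.array_equal(x, np.array([0, 1, 2, 3, 4, 5]))

# pixel (r, c) of an h-by-w image maps to index r*w + c
r, c, w = 1, 2, 3
assert x[r * w + c] == img[r, c]
```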
We define the prior distribution of the original image by using a GMRF model, defined on a square grid graph $G = (V, \mathcal{E})$ with a free boundary condition, and express it as

$$P_{\mathrm{prior}}(\mathbf{x} \mid \theta) = \frac{1}{Z_{\mathrm{prior}}(\theta)} \exp\bigl(-E(\mathbf{x} \mid \theta)\bigr) \tag{1}$$

with the energy function

$$E(\mathbf{x} \mid \theta) = -b \sum_{i \in V} x_i + \frac{\alpha}{2} \sum_{i \in V} x_i^2 + \frac{\gamma}{2} \sum_{(i,j) \in \mathcal{E}} (x_i - x_j)^2, \tag{2}$$

where $V$ and $\mathcal{E}$ are the sets of vertices and undirected edges in $G$, respectively, and $\theta = \{b, \alpha, \gamma\}$ is the set of the hyperparameters in the prior distribution: $b$ controls the brightness of the image, $\alpha$ controls the variance of the intensities of pixels, and $\gamma$ controls the smoothness of the image. $Z_{\mathrm{prior}}(\theta)$ is the partition function defined by

$$Z_{\mathrm{prior}}(\theta) = \int_{-\infty}^{\infty} \exp\bigl(-E(\mathbf{x} \mid \theta)\bigr)\, d\mathbf{x}.$$

The energy function of the prior distribution in Eq. (2) can be rewritten as

$$E(\mathbf{x} \mid \theta) = -b\, \mathbf{1}^{\mathrm{T}} \mathbf{x} + \frac{1}{2} \mathbf{x}^{\mathrm{T}} C \mathbf{x},$$

where $\mathbf{1}$ is the $n$-dimensional vector, the elements of which are all $1$, and

$$C = \alpha I + \gamma L$$

is the precision matrix (the inverse of the covariance matrix) of the prior distribution, where $I$ is the $n$-dimensional identity matrix and $L$ is the graph Laplacian of $G$:

$$L_{ij} = \begin{cases} |\partial(i)| & i = j, \\ -1 & (i,j) \in \mathcal{E}, \\ 0 & \text{otherwise}, \end{cases}$$

where $\partial(i)$ denotes the set of vertices adjacent to vertex $i$.
It is noteworthy that the hyperparameter $\alpha$ is important for the mathematical treatment of the prior distribution, because, when $\alpha = 0$, the precision matrix $C = \alpha I + \gamma L$ is not positive definite (the graph Laplacian always has a zero eigenvalue corresponding to the constant vector), and therefore Eq. (1) is not a well-defined probability distribution.
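This role of the hyperparameter can be checked numerically. In the sketch below (the symbols $\alpha$ for the ridge term and $\gamma$ for the smoothness term are notational assumptions), the graph Laplacian of a small free-boundary grid has a zero eigenvalue, so the precision matrix is singular when $\alpha = 0$ and positive definite for any $\alpha > 0$:

```python
import numpy as np

def grid_laplacian(h, w):
    """Graph Laplacian of an h-by-w grid with free boundaries (raster-scan order)."""
    n = h * w
    L = np.zeros((n, n))
    for r in range(h):
        for c in range(w):
            i = r * w + c
            for rr, cc in ((r, c + 1), (r + 1, c)):   # right / down neighbours
                if rr < h and cc < w:
                    j = rr * w + cc
                    L[i, i] += 1; L[j, j] += 1
                    L[i, j] -= 1; L[j, i] -= 1
    return L

Lap = grid_laplacian(4, 4)
gamma = 1.0

# alpha = 0: smallest eigenvalue is 0 (the constant vector) -> not positive definite
assert abs(np.linalg.eigvalsh(gamma * Lap)[0]) < 1e-10

# any alpha > 0 shifts all eigenvalues up -> positive definite
alpha = 1e-3
C = alpha * np.eye(16) + gamma * Lap
assert np.linalg.eigvalsh(C)[0] > 0
```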
Since we assume AWGN, the likelihood, i.e., the generating process of the degraded images, for the $K$ degraded images is defined by

$$P\bigl(\{\mathbf{y}^{(k)}\} \mid \mathbf{x}, \sigma^2\bigr) = \prod_{k=1}^{K} \prod_{i \in V} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Bigl(-\frac{(y_i^{(k)} - x_i)^2}{2\sigma^2}\Bigr),$$

where $\sigma^2$ is the variance of the AWGN.
From Bayes' theorem, the posterior distribution is expressed as

$$P\bigl(\mathbf{x} \mid \{\mathbf{y}^{(k)}\}, \theta, \sigma^2\bigr) \propto \exp\Bigl(-\frac{1}{2}\mathbf{x}^{\mathrm{T}} M \mathbf{x} + \Bigl(b\mathbf{1} + \frac{K}{\sigma^2}\bar{\mathbf{y}}\Bigr)^{\mathrm{T}} \mathbf{x}\Bigr), \tag{9}$$

where

$$\bar{\mathbf{y}} = \frac{1}{K} \sum_{k=1}^{K} \mathbf{y}^{(k)}$$

is the average image of the $K$ degraded images, and

$$M = C + \frac{K}{\sigma^2} I = \Bigl(\alpha + \frac{K}{\sigma^2}\Bigr) I + \gamma L$$

is the precision matrix of the posterior distribution.
In previous work Nishimori (2000); Tanaka (2002); Morin (2010), the hyperparameters corresponding to $b$ and $\alpha$ in our notation were treated as fixed constants, whereas, in this study, they are treated as controllable parameters, which are optimized through the EM algorithm as explained later. Moreover, the previous work considered only the case where the number of degraded images is just one, i.e., the case of $K = 1$, while the present formulation allows the treatment of multiple degraded images. For these reasons, our model is a generalized model that includes the models presented in the previous studies Nishimori (2000); Tanaka (2002); Morin (2010) as special cases.
II.1 Maximum a Posteriori Estimation
In Bayesian image denoising, we have to infer the original image by using only the degraded images $\{\mathbf{y}^{(k)}\}$. In MAP estimation, we regard $\hat{\mathbf{x}}$, which maximizes the posterior distribution, as the most probable estimate of the original image. Since the posterior distribution in Eq. (9) is a multivariate Gaussian distribution, the MAP estimate coincides with the mean vector of the posterior distribution:

$$\hat{\mathbf{x}} = M^{-1}\Bigl(b\mathbf{1} + \frac{K}{\sigma^2}\bar{\mathbf{y}}\Bigr). \tag{16}$$
Naively, the order of the computational time of computing the mean vector of the posterior distribution in Eq. (16) is $O(n^3)$, because it includes the inverting operation for the precision matrix, $M^{-1}$.
Instead of the direct inversion, the mean vector $\mathbf{m}$ can be obtained as the solution of the mean-field equation

$$m_i = \frac{b + (K/\sigma^2)\,\bar{y}_i + \gamma \sum_{j \in \partial(i)} m_j}{\alpha + K/\sigma^2 + \gamma\,|\partial(i)|}, \quad i \in V. \tag{17}$$

Eq. (17) can be solved by using a successive iteration method. It is known that, in a GMRF model, the mean vector evaluated by the mean-field approximation (or the loopy belief propagation) is equivalent to the exact one Weiss and Freeman (2001); Wainwright and Jordan (2008); Shental et al. (2008); see Appendix A. Therefore, the solution to Eq. (17) is equivalent to the exact mean vector of the posterior distribution in Eq. (16). The order of the computational time of one sweep of Eq. (17) is $O(n)$, because the sum over the adjacent vertices in Eq. (17) includes at most four terms on the square grid graph.
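A minimal sketch of this successive iteration (a Jacobi-style update per pixel), checked against the dense $O(n^3)$ solution. The hyperparameter symbols and values ($b$, $\alpha$, $\gamma$, $\sigma^2$) and the grid size are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
h, w, K = 4, 5, 3
n = h * w
b, alpha, gamma, sigma2 = 0.1, 0.05, 0.8, 0.5   # hypothetical hyperparameters

def neighbours(i):
    """Adjacent pixels of i on a free-boundary grid in raster-scan order."""
    r, c = divmod(i, w)
    return [rr * w + cc for rr, cc in ((r-1, c), (r+1, c), (r, c-1), (r, c+1))
            if 0 <= rr < h and 0 <= cc < w]

ybar = rng.normal(size=n)          # stand-in for the average degraded image

# successive (mean-field) iteration: each sweep is O(n); <= 4 neighbours per pixel
m = np.zeros(n)
for _ in range(500):
    m_new = np.array([
        (b + (K / sigma2) * ybar[i] + gamma * sum(m[j] for j in neighbours(i)))
        / (alpha + K / sigma2 + gamma * len(neighbours(i)))
        for i in range(n)])
    if np.max(np.abs(m_new - m)) < 1e-13:
        break
    m = m_new

# exact posterior mean via the dense precision matrix, for comparison
Lap = np.zeros((n, n))
for i in range(n):
    for j in neighbours(i):
        Lap[i, j] = -1.0
        Lap[i, i] += 1.0
M = (alpha + K / sigma2) * np.eye(n) + gamma * Lap
m_exact = np.linalg.solve(M, b * np.ones(n) + (K / sigma2) * ybar)

assert np.allclose(m, m_exact, atol=1e-8)
```

The iteration converges because the posterior precision matrix is strictly diagonally dominant; in the EM algorithm the previous mean vector is reused as the starting point (a warm restart), so a few sweeps suffice in practice.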
From the above, we see that, for a given $\bar{\mathbf{y}}$ and fixed hyperparameters, Bayesian image denoising can be performed in $O(n)$ time. Obviously, however, the quality of the result depends on the hyperparameters, and the stage of optimizing them still remains. The EM algorithm, described in the following section, is one of the most popular methods for this optimization.
II.2 Expectation-Maximization Algorithm
In the EM algorithm, for fixed current estimates of the hyperparameters, we have to maximize the $\mathcal{Q}$-function,

$$\mathcal{Q}(\theta, \sigma^2) = \int P_{\mathrm{post}}(\mathbf{x}) \ln P\bigl(\mathbf{x}, \{\mathbf{y}^{(k)}\} \mid \theta, \sigma^2\bigr)\, d\mathbf{x}, \tag{18}$$

with respect to $\theta$ and $\sigma^2$, where $P_{\mathrm{post}}(\mathbf{x})$ denotes the posterior distribution with the current estimates and

$$P\bigl(\mathbf{x}, \{\mathbf{y}^{(k)}\} \mid \theta, \sigma^2\bigr) = P\bigl(\{\mathbf{y}^{(k)}\} \mid \mathbf{x}, \sigma^2\bigr)\, P_{\mathrm{prior}}(\mathbf{x} \mid \theta)$$

is the joint distribution over $\mathbf{x}$ and $\{\mathbf{y}^{(k)}\}$. The gradients of the $\mathcal{Q}$-function for the parameters in $\theta$ and $\sigma^2$ are

$$\frac{\partial \mathcal{Q}}{\partial b} = \langle \mathbf{1}^{\mathrm{T}} \mathbf{x} \rangle_{\mathrm{post}} - \langle \mathbf{1}^{\mathrm{T}} \mathbf{x} \rangle_{\mathrm{prior}}, \tag{19}$$

$$\frac{\partial \mathcal{Q}}{\partial \alpha} = \frac{1}{2}\Bigl(\langle \mathbf{x}^{\mathrm{T}} \mathbf{x} \rangle_{\mathrm{prior}} - \langle \mathbf{x}^{\mathrm{T}} \mathbf{x} \rangle_{\mathrm{post}}\Bigr), \tag{20}$$

$$\frac{\partial \mathcal{Q}}{\partial \gamma} = \frac{1}{2}\Bigl(\langle \mathbf{x}^{\mathrm{T}} L \mathbf{x} \rangle_{\mathrm{prior}} - \langle \mathbf{x}^{\mathrm{T}} L \mathbf{x} \rangle_{\mathrm{post}}\Bigr), \tag{21}$$

$$\frac{\partial \mathcal{Q}}{\partial \sigma^2} = -\frac{nK}{2\sigma^2} + \frac{1}{2\sigma^4} \Bigl\langle \sum_{k=1}^{K} \bigl\|\mathbf{y}^{(k)} - \mathbf{x}\bigr\|^2 \Bigr\rangle_{\mathrm{post}}, \tag{22}$$

where $\langle \cdots \rangle_{\mathrm{prior}}$ and $\langle \cdots \rangle_{\mathrm{post}}$ are the expectation values with respect to the prior distribution $P_{\mathrm{prior}}(\mathbf{x} \mid \theta)$ and the posterior distribution $P_{\mathrm{post}}(\mathbf{x})$, respectively. Using a gradient ascent method with the gradients in Eqs. (19)–(22), we maximize the $\mathcal{Q}$-function in Eq. (18). Naively, the computation of these gradients requires $O(n^3)$ time, which is rather expensive, because they include the inverting operation for the precision matrices, $C$ and $M$.
The method presented in the next section allows the EM algorithm to be executed in linear time. Therefore, given $K$ degraded images, we can perform Bayesian image denoising, including the optimization of the hyperparameters by the EM algorithm, in $O(n)$ time.
III Expectation-Maximization Algorithm in Linear Time
We now consider the free energies of the prior and the posterior distributions,

$$F_{\mathrm{prior}}(\theta) = -\ln Z_{\mathrm{prior}}(\theta), \qquad F_{\mathrm{post}}(\theta, \sigma^2) = -\ln Z_{\mathrm{post}}(\theta, \sigma^2),$$

where $Z_{\mathrm{post}}(\theta, \sigma^2)$ is the normalization constant of the posterior distribution. In the following sections, we analyze the two free energies, $F_{\mathrm{prior}}$ and $F_{\mathrm{post}}$, and their derivatives.
III.1 Prior Free Energy and Its Derivatives
Since the GMRF model is defined on a square grid graph with the free boundary condition, its graph Laplacian can be diagonalized as Fracastoro and Magli (2015); Fracastoro et al. (2017); Katakami et al. (2017)

$$L = U \Lambda U^{\mathrm{T}},$$

where $U$ is the orthogonal matrix defined by

$$U = U_h \otimes U_w$$

for an image of $h \times w$ pixels ($n = hw$), where $U_N$ is an $N \times N$ (type-II) inverse discrete cosine transform (IDCT) matrix:

$$(U_N)_{i,k} = \sqrt{\frac{2 - \delta_{k,0}}{N}} \cos\Bigl(\frac{\pi k (2i + 1)}{2N}\Bigr), \quad i, k = 0, 1, \dots, N - 1.$$

$\Lambda$ is the diagonal matrix, the elements of which are the eigenvalues of $L$, and its diagonal vector, $\boldsymbol{\lambda}$, is the vector obtained by the row-major-order vectorization of the $h \times w$ matrix defined as

$$\lambda_{k,l} = 4\sin^2\Bigl(\frac{\pi k}{2h}\Bigr) + 4\sin^2\Bigl(\frac{\pi l}{2w}\Bigr), \quad k = 0, \dots, h - 1, \; l = 0, \dots, w - 1.$$

The detailed derivation of this equation is shown in Appendix B.
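This diagonalization can be verified directly on a small grid. The sketch below builds the free-boundary grid Laplacian as a Kronecker sum of path-graph Laplacians and checks it against an orthonormal type-II IDCT basis and the closed-form eigenvalues (the grid size $4 \times 5$ is an arbitrary assumption):

```python
import numpy as np

def path_laplacian(N):
    """Graph Laplacian of a length-N path (1-D grid, free boundary)."""
    P = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
    P[0, 0] = P[-1, -1] = 1
    return P

def idct2_basis(N):
    """Orthonormal type-II IDCT matrix; columns are eigenvectors of path_laplacian(N)."""
    i, k = np.meshgrid(np.arange(N), np.arange(N), indexing='ij')
    U = np.sqrt(2.0 / N) * np.cos(np.pi * k * (2 * i + 1) / (2 * N))
    U[:, 0] /= np.sqrt(2.0)          # DC column has a different normalization
    return U

h, w = 4, 5
# 2-D grid Laplacian = Kronecker sum of the two path Laplacians (raster-scan order)
Lap = np.kron(path_laplacian(h), np.eye(w)) + np.kron(np.eye(h), path_laplacian(w))
U = np.kron(idct2_basis(h), idct2_basis(w))

assert np.allclose(U.T @ U, np.eye(h * w))        # U is orthogonal
D = U.T @ Lap @ U
assert np.allclose(D, np.diag(np.diag(D)))        # U diagonalizes the Laplacian

# closed-form eigenvalues: 4 sin^2(pi k / (2h)) + 4 sin^2(pi l / (2w))
k, l = np.meshgrid(np.arange(h), np.arange(w), indexing='ij')
lam = 4 * np.sin(np.pi * k / (2 * h))**2 + 4 * np.sin(np.pi * l / (2 * w))**2
assert np.allclose(np.sort(np.diag(D)), np.sort(lam.ravel()))
```

Applying $U^{\mathrm{T}}$ to a vector is a 2-D DCT, so with fast DCT routines the transform costs $O(n \log n)$ rather than the $O(n^2)$ of a dense matrix product.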
III.2 Posterior Free Energy and Its Derivatives
Because the posterior distribution is also a Gaussian distribution, the posterior free energy can be rewritten as
From the derivation presented in Appendix C, we obtain Eq. (39), where $\bar{y}$ is the average intensity over the pixels of $\bar{\mathbf{y}}$. By using two auxiliary identities, we obtain the derivatives of Eq. (39) as Eqs. (41)–(43). Since Eq. (17) can be solved in $O(n)$ time, we can also compute the right-hand sides of Eqs. (41)–(43) in $O(n)$ time. Note that the first term in the right-hand side of Eq. (43) can likewise be computed in $O(n)$ time.
III.3 Proposed Method
The pseudo-code of the proposed procedure is shown in Algorithm 1. The EM algorithm contains a double-loop structure: the outer loop (Steps 4 to 17) and the inner loop (Steps 11 to 15), referred to as the M-step. In the M-step, we maximize the $\mathcal{Q}$-function with respect to the hyperparameters with a gradient ascent method, where the step rates are fixed constants and the gradients are presented in Eqs. (44), (45), and (47). In Steps 5 to 8, we compute the mean vector of the posterior distribution by solving the mean-field equation in Eq. (17) with a successive iteration method with warm restarts. In practice, we can stop this iteration after a few sweeps; in the experiments in the following section, this iteration was stopped after just one sweep. In Step 10, the variables used in the M-step are initialized.
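The double-loop structure above can be sketched end-to-end on a tiny image, using dense $O(n^3)$ linear algebra in place of the paper's $O(n)$ DCT/mean-field computations; the point here is only the control flow (outer EM loop with an exact E-step, inner gradient-ascent M-step, closed-form noise-variance update). All symbols, step rates, and initial values below are assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(1)
h, w, K = 6, 6, 2
n = h * w

def grid_laplacian(h, w):
    n = h * w
    L = np.zeros((n, n))
    for r in range(h):
        for c in range(w):
            i = r * w + c
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < h and cc < w:
                    j = rr * w + cc
                    L[i, i] += 1; L[j, j] += 1
                    L[i, j] -= 1; L[j, i] -= 1
    return L

Lap = grid_laplacian(h, w)
x0 = np.outer(np.linspace(0, 1, h), np.linspace(0, 1, w)).ravel()  # smooth "original"
ys = [x0 + 0.3 * rng.normal(size=n) for _ in range(K)]             # degraded images
ys = [y - y.mean() for y in ys]                                    # centring preprocessing
ybar = np.mean(ys, axis=0)

b, alpha, gamma, sigma2 = 0.0, 0.1, 0.5, 1.0    # initial hyperparameters (assumed)
eta = 1e-4                                       # common step rate (assumed)
ones = np.ones(n)

for _ in range(5):                               # outer EM loop
    # E-step: exact posterior moments (mean-field iteration in the O(n) algorithm)
    M = (alpha + K / sigma2) * np.eye(n) + gamma * Lap
    Sig = np.linalg.inv(M)
    m = Sig @ (b * ones + (K / sigma2) * ybar)   # posterior mean = restored image

    # closed-form update of the noise variance
    resid = sum(np.sum((y - m) ** 2) for y in ys)
    sigma2 = (resid + K * np.trace(Sig)) / (n * K)

    # inner loop (M-step): gradient ascent on the prior hyperparameters
    for _ in range(3):
        C = alpha * np.eye(n) + gamma * Lap      # prior precision at current params
        Cinv = np.linalg.inv(C)
        mu = Cinv @ (b * ones)                   # prior mean
        g_b = ones @ m - ones @ mu
        g_alpha = 0.5 * ((np.trace(Cinv) + mu @ mu) - (np.trace(Sig) + m @ m))
        g_gamma = 0.5 * ((np.trace(Cinv @ Lap) + mu @ Lap @ mu)
                         - (np.trace(Sig @ Lap) + m @ Lap @ m))
        b += eta * g_b
        alpha = max(alpha + eta * g_alpha, 1e-8)  # keep the precision positive definite
        gamma = max(gamma + eta * g_gamma, 1e-8)

assert np.isfinite(m).all() and sigma2 > 0 and alpha > 0 and gamma > 0
```

The paper's contribution is to replace every dense quantity here (the inverse, the traces, the mean vector) with $O(n)$ computations via the DCT eigenvalues and the mean-field equation.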
When $\alpha$ is quite small, the third term in Eq. (47) can be approximated, and Eq. (47) therefore reduces to the approximate form in Eq. (50). Hence, if the value of $\alpha$ at the maximum point of the $\mathcal{Q}$-function is quite small, the value obtained at that point is close to Eq. (50), and then the initialization by using Eq. (50) in Step 10 in Algorithm 1 could facilitate faster convergence, when the optimal $\alpha$ obtained by the EM algorithm is expected to be quite small. In fact, in all of the numerical experiments in Sect. IV, the optimal values of $\alpha$ obtained by the EM algorithm were quite small. We thus used this initialization method in the following numerical experiments.
IV Numerical Experiments
We demonstrate our method by experiments using the images shown in Fig. 2. The parameter setting of the following experiments was as follows. The step rates in the M-step were fixed to constant values; our Bayesian system is quite sensitive to the value of one of the hyperparameters, and therefore we set its step rate smaller than the others for the stability of the algorithm. The initial values of the hyperparameters were fixed constants. The number of iterations in the M-step was one Inoue and Tanaka (2007), and the convergence condition in Step 17 in Algorithm 1 was that the change of the hyperparameter estimates falls below a small threshold. The degraded images were centered (i.e., the average over all pixels in the degraded images was shifted to zero) in the preprocessing. The maximum iteration number of the EM algorithm was 100.
The methods used in the following experiments, including our method, were implemented with a parallelized algorithm by using C++ with the OpenMP library for the speed-up of the for-loops (for example, the summations in Eqs. (45), (47), and (48) and the for-loop in Step 6 in Algorithm 1), and they were executed on Microsoft Windows 8 (64 bit) with an Intel(R) Core(TM) i7-4930K CPU (3.4 GHz) and 32 GB of RAM.
IV.1 Computation Time
We compared the computation time of our method with that of the