Bayesian Restoration of Digital Images Employing Markov Chain Monte Carlo: A Review

A review of Bayesian restoration of digital images based on Monte Carlo techniques is presented. The topics covered include Likelihood, Prior and Posterior distributions; Poisson, binary symmetric channel, and Gaussian channel models of the Likelihood distribution; Ising and Potts spin models of the Prior distribution; restoration of an image through Posterior maximization; statistical estimation of a true image from Posterior ensembles; Markov chain Monte Carlo methods and cluster algorithms.

Authors

11/30/2020


Introduction

Noise, in a digital image, is an inevitable nuisance; often it defeats the very purpose for which the image was taken. Noise mars the scene the image intends to depict and affects the artist in the person looking at it; it limits the diagnosis a doctor makes from a medical image; it interferes with the inferences a physicist draws from the images acquired by sophisticated spectrometers and microscopes; it restricts the useful information that can be extracted from the images of the earth's resources and weather transmitted by satellites; and so on. Hence it is important that the noise be eliminated and the true image restored.

Noise enters a digital image, in the first place, while it is being acquired, due to, for example, faulty apparatus or poor statistics; it enters during storage, due to aging, defects in storage devices, etc.; and it enters during transmission through noisy channels. We shall not be concerned, in this review, with the specific details of the mechanisms of noise entry and image degradation. Instead, we confine our attention mostly to de-noising, also called restoration, of digital images.

Conventional digital image processing techniques rely on the availability of qualitative and quantitative information about the specific degradation process that leads to noise in an image. These algorithms work efficiently on the applications they are intended for; perhaps they would also work reasonably well on closely related applications. But they are unreliable for applications falling outside their domain. (Often the mechanism of image degradation is not known reasonably well. In several situations, even if the mechanism is known, it is complex and not amenable to easy modeling. It is in these contexts that general purpose image restoration algorithms become useful.) There are not many image restoration algorithms that can handle a wide variety of images. Linear filters like low pass filters, high pass filters etc., and non-linear filters like median filters, constrained least mean square filters etc., are a few familiar examples belonging to the category of general purpose algorithms.

An important aim of image restoration research is to devise robust algorithms, based on simple and easily implementable ideas, which can cater to a wide spectrum of applications. It is precisely in this context that a study of the statistical mechanics of images and image processing should prove useful. The reasons are many.

1. Statistical mechanics aims to describe macroscopic properties of a material from those of its microscopic constituents and their interactions; an image is a macroscopic object and its microscopic constituents are the gray levels in the pixels of the image frame.

2. Statistical mechanics is based on a few general and simple principles and can be easily adapted to image processing studies.

3. Bayesian statistics is closely related to statistical mechanics as well as to the foundations of stochastic image processing in particular and information processing in general.

4. Several statistical mechanical models have a lot in common with image processing models.

But why should we talk of these issues now? Notice that images have entered our personal and professional lives in a big way in recent times - personal computers, the Internet, digital cameras, mobile phones with image acquisition and transmission capabilities, new and sophisticated medical and industrial imaging devices, spectrometers, microscopes, remote sensing devices, etc.

Let me briefly touch upon a few key issues of image processing to drive home the connection between statistical mechanics and image processing. Mathematically, an image is a matrix of non-negative numbers, representing the gray levels or intensities in the pixels of an image plane. A pixel is a tiny square on the image plane and represents an element of the picture. Pixels coarse-grain the image plane. The gray level in a pixel defines the state of the pixel. Pixels are analogous to the vertices of a lattice that coarse-grain space in a two dimensional statistical mechanical model. The gray levels are analogous to the states of spins or of atoms sitting on the lattice sites. Spatial correlations in an image can be modeled by defining a suitable interaction between the states of two pixels, exactly like the spin-spin interaction on a lattice model. In fact one can define an energy for an image by summing over these interactions. An exponential Bayesian Prior is analogous to the Gibbs distribution. It is indeed this correspondence which renders an image a Markov random field and vice versa. We can define a Kullback-Leibler entropy distance for constructing a Likelihood probability distribution of images. Temperature, in the context of image processing, represents a parameter that determines the smoothness of an image. The energy-entropy competition, responsible for different phases of matter, is like the Prior-Likelihood competition that determines the smoothness and feature-quality of an image in Bayesian Posterior maximization algorithms. Simulated annealing, which helps equilibration in statistical mechanical models, has been employed in image processing by starting the restoration process at high temperature and gradually lowering the temperature to improve the noise reduction capabilities of the chosen algorithm. Mean-field approximations of statistical mechanics have been employed for hyper-parameter estimation in image processing.
In fact several models and methods in statistical mechanics have their counterparts in image processing. These include Ising and Potts spin models, random spin models, the Bethe approximation, replica methods, renormalization techniques, methods based on bond percolation clusters and mean field approximations. We can add further to the list of common elements of image processing and statistical mechanics. Suffice it to say that these two seemingly disparate disciplines have a lot to learn from each other. It is the purpose of this review to elaborate on these analogies and describe several image restoration algorithms inspired by statistical mechanics.

Image processing is a vast field of research with its own idioms and sophistication. It would indeed be impossible to address all the issues of this subject in a single review; nor are we competent to undertake such a task. Hence in this review we shall restrict attention to a very limited field of image processing, namely the Bayesian de-noising of images employing Markov chain Monte Carlo methods. We have made a conscious effort to keep the level of discussion as simple as possible.

Stochastic image processing based on the Bayesian methodology was pioneered by Derin, Elliott, Cristi and Geman, Geman and Geman, and Besag. For an early review see Winkler; for a more recent review on the statistical mechanical approach to image processing see Tanaka. Applications of the analytical methods of statistical mechanics to the image restoration process, of the spin glass theory to information processing, and of infinite range Ising spin models with replica methods to image restoration have also been reported. A brief account of the Bayesian restoration of digital images employing Markov chain Monte Carlo methods was presented earlier; the present review is essentially based on that account, with more details and examples. The review is organized as follows.

We start with a mathematical representation and a stochastic description of an image. We discuss a Poisson model for image degradation to construct a Likelihood distribution. We show how the Kullback-Leibler distance emerges naturally in the context of Poisson distributions. Then we show how to quantify the smoothness of an image through an energy function. This is followed by a discussion of the Ising and Potts spin Hamiltonians often employed in the modeling of magnetic transitions. We define a Bayesian Prior as a Gibbs distribution. Such a description renders an image a Markov random field. Bayes' theorem combines a subjective Prior with a Likelihood model incorporating the data on the corrupt image and gives a Posterior - a conditional distribution from which we can infer the true image. We show how the Likelihood and the Prior compete to increase the Posterior, the competition being tuned by temperature. The Prior tries to smoothen out the entire image, with the Likelihood competing to preserve all the features, genuine as well as noisy, of the given image. For a properly tuned temperature the Likelihood loses on noise but manages to win in preserving image features; similarly the Prior loses on smoothening out the genuine features of the image but wins on smoothening out the noise inhomogeneities. The net result of the competition is that we get a smooth image with all genuine inhomogeneities intact. The image that maximizes the Posterior for a given temperature provides a good estimate of the true image. We describe a simple Monte Carlo algorithm to search for the Posterior maximum and present a few results of processing of noisy toy and benchmark images.

Then we take up a detailed discussion of Markov chain Monte Carlo methods based on the Metropolis and Gibbs samplers; we show how to estimate the true image through Maximum a Posteriori (MAP), Maximum Posterior Marginal (MPM) and Threshold Posterior Mean (TPM) statistics that can be extracted from an equilibrium ensemble populating the asymptotic segment of a Markov chain of images. We present results of processing of toy and benchmark images employing the algorithms described in the review. Finally we take up the issue of sampling from a Prior distribution employing bond percolation clusters. The cluster algorithms can be adapted to image processing by sampling cluster gray levels independently from the Likelihood distribution. Cluster algorithms ensure faster convergence to the Posterior ensemble from which the true image can be inferred. We employ a single cluster growth algorithm to process toy and benchmark images and discuss the results of the exercise. We conclude with a brief summary and a discussion of possible areas of research in this exciting field of image processing, from a statistical mechanics point of view.

Mathematical Description of an Image

Consider a region of a plane discretized into tiny squares called picture elements, or pixels for short. Let S be a finite set of pixels. For convenience of notation we identify the spatial location of a pixel in the image plane by a single index i; we also say that the index i denotes a pixel. Mathematically, a collection of non-negative numbers Θ = {θi : i ∈ S} is called an image. Let N denote the cardinality of the set S; in other words, N is the number of pixels in the image plane. θi denotes the intensity or gray level in the pixel i. The state of a pixel is defined by its gray level. Let us denote by ˆΘ the true image. ˆΘ is not known to us. Instead we have a noisy image X. The aim is to restore the true image ˆΘ from the given noisy image X.

Stochastic Description of an Image

In a stochastic description, we consider an image Θ as a collection of random variables {θi : i ∈ S}. (Equivalently, we can consider an image as a particular realization of the set of random variables. For convenience of notation we use the same symbol to denote the random variable as well as a realization of the random variable; the distinction will be clear from the context.) Consider, for example, N pixels painted with Q gray levels 0, 1, …, Q − 1. The gray level label 0 denotes black and the gray level label Q − 1 denotes white. Each pixel can take any one of the Q gray levels. Collect all the possible images you can paint in a set, denoted by the symbol Ω, called the state space. Ω is analogous to the coarse-grained phase space of a classical statistical mechanical system. The total number of images in the state space is clearly |Ω| = Q^N. Both ˆΘ and X belong to Ω. If all the images belonging to Ω were equally probable then we could resort to a uniform ensemble description.

Figure (1: Left) depicts a 10×10 image painted randomly with five gray levels labeled by integers from 0 to 4, with the label 0 denoting black and the label 4 denoting white. The matrix of non-negative numbers (the gray level labels) that provides a mathematical representation of the image is shown in Fig. (1: Right).

Figure 1: (Left) A 10×10 random image painted with five gray levels, with 0 representing black and 4 representing white. (Right) The matrix of non-negative numbers (gray levels) representing the image.

In image processing terminology, X is called a random field. (We shall see later that, in a stochastic description, an image can be modeled as a Markov random field.) Thus the noisy image on hand is a random field. The noisy image X has

1. a systematic component representing the features of ˆΘ that have survived image degradation, and

2. a stochastic component that accounts for the noise that enters during degradation (the degradation of an image can occur during acquisition, storage or transmission through noisy channels).

A stochastic model for X is constructed by defining a conditional probability of X given ˆΘ. This conditional probability is denoted by L(X|ˆΘ) and is called the Likelihood probability distribution, or simply the Likelihood. L(X|ˆΘ) can be suitably modeled to describe the stochastic degradation process that produced X from ˆΘ.

Let us consider a simple and general model of image degradation based on Poisson statistics, which will render transparent the idea and the use of the Likelihood distribution.

Poisson Likelihood Distribution

We consider the random field X as constituting a collection of independent random variables {xi : i ∈ S}. We take each xi to be a Poisson random variable. The motivation is clear: image acquisition is simply a counting process (we count the photons incident on a photographic or X-ray film; we count the electrons while recording a diffraction pattern; etc.), and the Poisson process is a simple and natural model for describing counting. There is only one parameter in a Poisson distribution, namely the mean. We take the mean of xi to be ˆθi. The Likelihood of xi is given by,

 L(x_i \mid \hat\theta_i) = \frac{1}{x_i!}\,\hat\theta_i^{\,x_i}\,\exp(-\hat\theta_i) \qquad (1)

We assume {xi : i ∈ S} to be independent random variables. The joint distribution is then the product of the distributions of the individual random variables. Hence we can write the Likelihood distribution of X as,

 L(X \mid \hat\Theta) = \prod_{i\in S} L(x_i \mid \hat\theta_i) = \prod_{i\in S} \frac{\hat\theta_i^{\,x_i}\, e^{-\hat\theta_i}}{x_i!} \qquad (2)

The above Poisson Likelihood can be conveniently expressed as,

 L(X \mid \hat\Theta) = \exp[-F(\hat\Theta,X)]\, \prod_{i\in S} \frac{1}{x_i!}, \qquad (3)

where,

 F(\hat\Theta,X) = \sum_{i\in S}\big[\hat\theta_i - x_i \log(\hat\theta_i)\big] \qquad (4)

The advantage of the Poisson and related likelihood distributions is that there is no hyper-parameter that needs to be optimized. (The Gaussian channel Likelihood has two hyper-parameters; the binary symmetric channel Likelihood has one hyper-parameter. We shall discuss these types of degradation processes later.)
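As a concrete illustration of the Poisson channel, the sketch below (in Python; the review itself gives no code, so all names are ours) simulates the degradation of Eq. (1) and evaluates the function F of Eq. (4), whose minimum over Θ picks out the maximum-Likelihood image:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" image: mean photon counts per pixel.
theta_hat = rng.integers(1, 5, size=(10, 10)).astype(float)

# Poisson degradation of Eq. (1): each observed count x_i ~ Poisson(theta_hat_i).
x = rng.poisson(theta_hat).astype(float)

def poisson_F(theta, x):
    """F(Theta, X) = sum_i [theta_i - x_i log theta_i], Eq. (4).
    The Likelihood is exp(-F) up to the Theta-independent factor
    prod_i 1/x_i!, so minimizing F maximizes the Likelihood."""
    t = np.maximum(theta, 1e-12)  # guard against log(0)
    return float(np.sum(t - x * np.log(t)))

# F is minimized at theta = x, as Eqs. (6)-(8) below show pixel by pixel.
assert poisson_F(x, x) <= poisson_F(theta_hat, x)
```

Note that F is minimized by the noisy image itself; it is the Prior, introduced later, that pulls the restoration away from this trivial maximum-Likelihood solution.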

Kullback-Leibler Entropy Distance

It is clear from the discussion on the Poisson Likelihood that the function F(ˆΘ, X) serves to quantify how far away an image ˆΘ is from another image X. Let us investigate to what extent the function F meets the requirements of a distance metric.

F(Θ, X) does not vanish when Θ = X, and it is not symmetric in its arguments. Let us write F as,

 F(\Theta,X) = \sum_{i\in S} f(\theta_i, x_i), \qquad (5)

and study the behaviour of f as a function of θi for a known value of xi. We have,

 f(\theta_i; x_i) = \theta_i - x_i \log(\theta_i), \qquad \frac{df}{d\theta_i} = 1 - \frac{x_i}{\theta_i}. \qquad (6)

It is clear from the above that,

 \frac{df}{d\theta_i} = 0 \qquad (7)

when θi = xi. In other words, at θi = xi the function f is an extremum. To determine whether it is a minimum or a maximum we consider the second derivative of f at θi = xi. We have,

 \left.\frac{d^2 f}{d\theta_i^2}\right|_{\theta_i = x_i} = \frac{1}{x_i} > 0, \qquad (8)

implying that f is a minimum at θi = xi. Let fm = f(θi = xi) = xi − xi log(xi) denote the minimum value of f. We define a new function,

 g(\theta_i,x_i) = f(\theta_i,x_i) - f_m = (\theta_i - x_i) - x_i \log(\theta_i/x_i). \qquad (9)

The function g is not symmetric in its arguments. Hence we define,

 F_i(\theta_i,x_i) = g(\theta_i,x_i) + g(x_i,\theta_i) = (\theta_i - x_i)\log(\theta_i/x_i) \qquad (10)

We can now define a new distance between Θ and X as

 F(\Theta,X) = \sum_{i\in S} (\theta_i - x_i)\log(\theta_i/x_i) \qquad (11)

We see that the new distance function F(Θ, X) vanishes when Θ = X and is positive when Θ ≠ X; also F is symmetric in its arguments. We shall take F(Θ, X) as a measure of the distance between the images Θ and X. (Strictly, F given by Eq. (11) is not a distance metric because it does not obey the triangle inequality.) F is the same as the symmetric Kullback-Leibler entropy. This quantity goes by various names - Kullback-Leibler distance, Kullback-Leibler divergence, mutual information, relative entropy, etc. - defined for quantifying the separation between two probability distributions. We refer to F given by Eq. (11) as the Kullback-Leibler distance.
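The symmetric Kullback-Leibler distance of Eq. (11) is straightforward to compute; a minimal sketch (function and variable names are ours):

```python
import numpy as np

def kl_distance(theta, x, eps=1e-12):
    """Symmetric Kullback-Leibler distance of Eq. (11):
    F(Theta, X) = sum_i (theta_i - x_i) * log(theta_i / x_i).
    Each term is >= 0, and F = 0 iff the two images coincide."""
    t = np.maximum(np.asarray(theta, float), eps)  # guard against log(0)
    x = np.maximum(np.asarray(x, float), eps)
    return float(np.sum((t - x) * np.log(t / x)))

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[1.0, 2.0], [4.0, 3.0]])

assert kl_distance(a, a) == 0.0                       # vanishes for identical images
assert abs(kl_distance(a, b) - kl_distance(b, a)) < 1e-12  # symmetric
assert kl_distance(a, b) > 0.0                        # positive for distinct images
```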

Hamming Distance

Another useful function that quantifies the separation between two images Θ and X, defined on a common image plane, is the Hamming distance, defined as,

 F(\Theta,X) = \sum_{i\in S} I(\theta_i \neq x_i), \qquad (12)

where the indicator function,

 I(\eta) = \begin{cases} 1 & \text{if the statement } \eta \text{ is true}, \\ 0 & \text{if the statement } \eta \text{ is not true}. \end{cases} \qquad (13)

The Hamming distance simply counts the number of pixels in Θ having gray levels different from those in the corresponding pixels of X. The Hamming distance defined above is symmetric in its arguments; also F(Θ, X) = 0 when Θ = X. We shall find use for the Hamming distance while processing binary images, or images with a small number of gray levels. However, if the number of gray levels in an image is large, then the Kullback-Leibler distance should prove more useful.
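For completeness, the Hamming distance of Eq. (12) in the same style (names are ours):

```python
import numpy as np

def hamming_distance(theta, x):
    """Hamming distance of Eq. (12): the number of pixels whose
    gray levels differ between the two images."""
    return int(np.sum(np.asarray(theta) != np.asarray(x)))

a = np.array([[0, 1], [1, 0]])
b = np.array([[0, 1], [0, 0]])
assert hamming_distance(a, b) == 1  # one pixel differs
assert hamming_distance(a, a) == 0  # identical images
```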

Gaussian Channel Likelihood Distribution

A popular model for image degradation is based on assuming each xi to be a Gaussian random variable with

 \langle x_i \rangle = \alpha\,\hat\theta_i, \qquad \langle x_i^2 \rangle - \langle x_i \rangle^2 = \sigma^2. \qquad (14)

In the above, α and σ are called the hyper-parameters of the Gaussian channel model. The Likelihood of xi is given by,

 L(x_i \mid \hat\theta_i) = \frac{1}{\sigma\sqrt{2\pi}}\, \exp\!\left[-\frac{1}{2\sigma^2}\,(x_i - \alpha\hat\theta_i)^2\right] \qquad (15)

We assume {xi : i ∈ S} to be independent random variables. The Likelihood in the Gaussian channel model is given by,

 L(X \mid \hat\Theta) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{\!N} \exp\!\left[-\frac{1}{2\sigma^2}\sum_{i\in S}(x_i - \alpha\hat\theta_i)^2\right], \qquad (16)

where N is the total number of pixels in the image plane.
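A sketch of the Gaussian channel, assuming illustrative values for the hyper-parameters α and σ of Eq. (14) (all names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

theta_hat = rng.integers(0, 5, size=(10, 10)).astype(float)
alpha, sigma = 1.0, 0.5  # hypothetical hyper-parameter values

# Gaussian channel: <x_i> = alpha * theta_hat_i with variance sigma^2, Eq. (14).
x = alpha * theta_hat + rng.normal(0.0, sigma, size=theta_hat.shape)

# Negative log-Likelihood of Eq. (16), dropping Theta-independent terms:
# the familiar sum of squared residuals.
neg_log_L = float(np.sum((x - alpha * theta_hat) ** 2)) / (2.0 * sigma ** 2)
assert neg_log_L >= 0.0
```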

Binary Symmetric Channel Likelihood Distribution

Consider a binary image ˆΘ which degrades to X by the following algorithm. Take a pixel i in the image plane of ˆΘ. Call ξ a random number uniformly distributed in the unit interval. If ξ ≤ ~p, set xi to the gray level other than ˆθi; otherwise set xi = ˆθi. This process is described by the Likelihood distribution given by,

 L(x_i \mid \hat\theta_i) = \tilde p\; I(x_i \neq \hat\theta_i) + (1-\tilde p)\; I(x_i = \hat\theta_i) \qquad (17)

It is convenient to express the above Likelihood as,

 L(x_i \mid \hat\theta_i) = \frac{\exp[-\beta_L\, I(x_i \neq \hat\theta_i)]}{1+\exp(-\beta_L)} \qquad (18)

The hyper-parameter βL is related to ~p as given below.

 \tilde p = \frac{\exp(-\beta_L)}{1+\exp(-\beta_L)}, \qquad \beta_L = \log\!\left(\frac{1}{\tilde p} - 1\right) \qquad (19)

The procedure described above is repeated independently on all the pixels. The Likelihood of X is then the product of the Likelihoods of the individual {xi} and is given by,

 L(X \mid \hat\Theta) = \frac{\exp[-\beta_L\, F(X,\hat\Theta)]}{\sum_{X}\exp[-\beta_L\, F(X,\hat\Theta)]}, \qquad (20)

where F(X, ˆΘ) is the Hamming distance between X and ˆΘ. The hyper-parameter βL can be interpreted as the inverse temperature of the degradation process. When βL → ∞ we have ~p = 0; in other words, at βL = ∞ no degradation takes place and X = ˆΘ. As the temperature increases (βL decreases) the noise level increases. When βL = 0 we have ~p = 1/2 and the image X becomes completely random; in other words, each pixel can independently have a black or white gray level with equal probability.
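The binary symmetric channel of Eqs. (17)-(19) can be simulated directly; a sketch (the function name is ours):

```python
import numpy as np

rng = np.random.default_rng(2)

def degrade_binary(theta_hat, p_flip):
    """Binary symmetric channel of Eqs. (17)-(19): each pixel is flipped
    to the other gray level independently with probability p_flip."""
    flips = rng.random(theta_hat.shape) < p_flip
    return np.where(flips, 1 - theta_hat, theta_hat)

theta_hat = rng.integers(0, 2, size=(50, 50))

clean = degrade_binary(theta_hat, 0.0)  # beta_L -> infinity: no degradation
noisy = degrade_binary(theta_hat, 0.1)  # finite beta_L: some pixels flipped

assert np.array_equal(clean, theta_hat)
# At p_flip = 1/2 (beta_L = 0) the output is independent of theta_hat.
```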

Degradation of a Multi-Gray Level Image

Consider an image painted with Q gray levels carrying the labels 0, 1, …, Q − 1. The label 0 denotes black and the label Q − 1 denotes white. Let us say that the degradation of ˆΘ to X proceeds as follows. Take a pixel i. Call ξ a random number uniformly distributed in the range 0 to 1. If ξ ≤ (Q − 1)~p, then select one of the Q − 1 gray levels (excluding ˆθi) randomly and with equal probability and assign it to xi; otherwise set xi = ˆθi. This process of degradation is described by the Likelihood given by,

 L(x_i \mid \hat\theta_i) = \tilde p\; I(x_i \neq \hat\theta_i) + [1-(Q-1)\tilde p]\; I(x_i = \hat\theta_i) \qquad (21)

It is convenient to express the above Likelihood as,

 L(x_i \mid \hat\theta_i) = \frac{\exp[-\beta_L\, I(x_i \neq \hat\theta_i)]}{1+(Q-1)\exp(-\beta_L)}. \qquad (22)

The hyper-parameter βL is related to ~p as given below,

 \tilde p = \frac{\exp(-\beta_L)}{1+(Q-1)\exp(-\beta_L)}, \qquad \beta_L = \log\!\left(\frac{1}{\tilde p} - (Q-1)\right). \qquad (23)

The degradation process described above is repeated independently on all the pixels. The Likelihood of X is then the product of the Likelihoods of the individual {xi} and is given by,

 L(X \mid \hat\Theta) = \frac{\exp[-\beta_L\, F(X,\hat\Theta)]}{\sum_{X}\exp[-\beta_L\, F(X,\hat\Theta)]}, \qquad (24)

where F(X, ˆΘ) is the Hamming distance between X and ˆΘ. The hyper-parameter βL can be interpreted as the inverse temperature of the degradation process. When βL → ∞ we have ~p = 0, indicating no degradation, and X = ˆΘ. As the temperature increases the noise level increases. When βL = 0, the probability that a pixel changes its gray level is (Q − 1)/Q, which tends to unity when Q → ∞. In all the above, if we substitute Q = 2 we recover the results of the binary symmetric channel degradation.
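The Q-gray-level channel differs from the binary one only in how a flipped pixel chooses its new level; a sketch under the parametrization of Eq. (23) (names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)

def degrade_q_levels(theta_hat, beta_L, Q):
    """Q-gray-level noisy channel of Eqs. (22)-(23): a pixel keeps its
    gray level with probability 1 - (Q-1)*p; otherwise one of the other
    Q-1 levels is chosen uniformly."""
    p = np.exp(-beta_L) / (1.0 + (Q - 1) * np.exp(-beta_L))  # per-wrong-level prob.
    flips = rng.random(theta_hat.shape) < (Q - 1) * p
    # A uniform shift of 1..Q-1 (mod Q) picks one of the other levels uniformly.
    shifts = rng.integers(1, Q, size=theta_hat.shape)
    return np.where(flips, (theta_hat + shifts) % Q, theta_hat)

Q = 5
theta_hat = rng.integers(0, Q, size=(50, 50))
noisy = degrade_q_levels(theta_hat, beta_L=2.0, Q=Q)
assert noisy.shape == theta_hat.shape
assert np.all((noisy >= 0) & (noisy < Q))
```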

Three Ingredients of Bayesian Methodology

In a typical image processing algorithm, we start with an initial image Θ0. A good choice is Θ0 = X, the given noisy image. We generate a chain of images

 \Theta_0 \to \Theta_1 \to \Theta_2 \to \cdots \to \Theta_n \to \Theta_{n+1} \to \cdots \qquad (25)

by an algorithm, and hope that suitably defined statistics over the ensemble of images in the chain would approximate the original image ˆΘ well. In constructing a chain of images we shall employ the Bayesian methodology.

The first ingredient of the Bayesian methodology is the Likelihood L(X|ˆΘ), which models the degradation of ˆΘ to X. We have already seen a few models of image degradation. The second ingredient is an a priori distribution, simply called a Prior and denoted by π(Θ). The Prior is a distribution over the possible assignments of gray levels to the pixels of S; its choice is somewhat subjective. The third ingredient is an a posteriori distribution, simply called the Posterior and denoted by the symbol π(Θ|X). The Posterior is a conditional distribution; it is the distribution of Θ given X.

Bayesian a Priori Distribution

As we said earlier, the choice of a Prior in the Bayesian methodology is subjective. The Prior should reflect what we believe a clean image should look like. We expect an image to be smooth. We say that a pixel is smoothly connected to its neighbour if the gray levels in them are the same. The smaller the difference between the gray levels, the smoother the connection. This subjective belief is encoded in a Prior as follows.

We introduce an interaction energy between the states of two pixels. We confine the interaction to nearest neighbour pixels. Thus the states of a pair of nearest neighbour pixels i and j interact with an energy denoted by Ei,j(Θ). We take the interactions to be pair-wise additive and define an energy of the image as,

 E(\Theta) = \sum_{\langle i,j \rangle} E_{i,j}(\Theta). \qquad (26)

In the above, the symbol ⟨i, j⟩ denotes that the pixels i and j are nearest neighbours, and the sum is taken over all distinct pairs of nearest neighbour pixels in the image plane. We model Ei,j in such a way that it is small when the gray levels are close to each other and large when the gray levels differ by a large amount. Thus, a smooth image has less energy. We define the Prior as,

 \pi(\Theta) = \frac{1}{Z(\beta_P)}\, \exp[-\beta_P\, E(\Theta)] \qquad (27)

where βP = 1/TP, with TP a smoothening parameter called the Prior temperature. Z(βP) is the normalization constant, given by,

 Z(\beta_P) = \sum_{\Theta} \exp[-\beta_P\, E(\Theta)]. \qquad (28)

The Prior, as defined above, is called a Gibbs distribution. Such a definition of the Prior implies that Θ is a Markov random field. Conversely, if Θ is a Markov random field, then its Prior can be modeled as a Gibbs distribution.

Markov Random Field

A Markov random field generalizes the notion of a discrete time Markov process, also called a Markov chain. Appendix (2) briefly describes a general as well as a time homogeneous or equilibrium Markov chain. A sequence of random variables, parametrized by time and obeying Markovian dependence, constitutes a Markov chain. Instead of time we can use a spatial coordinate as the parameter to describe the chain, in a one dimensional setting. The concept of a Markov random field extends Markovian dependence from one dimension to a general setting.

In a Markov random field, the state of a pixel depends at most on the states of the pixels in its neighbourhood. (Often the nearest neighbours of a pixel constitute its neighbourhood; for example, the four pixels situated to the left, right, above and below the pixel i form the neighbourhood of i. This definition of neighbourhood is employed in the square lattice models of statistical mechanics. In image processing, the eight pixels - the four nearest neighbours and the four next-nearest neighbours - are taken as the neighbourhood in several applications, for example the low pass filter. Even larger sets of pixels surrounding a central pixel are often considered as a neighbourhood in filter algorithms.) Accordingly, let i be a pixel and let νi denote the set of pixels that form the neighbourhood of i. Let ν = {νi : i ∈ S} denote a neighbourhood system for S. In other words, ν is a collection of subsets of S for which,

1. i ∉ νi, and

2. i ∈ νj if and only if j ∈ νi.

The pair (S, ν) is a graph. An image Θ is a Markov random field if

 \pi(\theta_i \mid \{\theta_j : j \neq i;\ j \in S\}) = \pi(\theta_i \mid \{\theta_j : j \in \nu_i\}) \quad \forall\ i \in S. \qquad (29)

From the definition of conditional probability in terms of joint probability, we have,

 \pi(\Theta) = \pi(\theta_1,\theta_2,\cdots,\theta_N) = \pi(\theta_i \mid \{\theta_j : j \neq i;\ j \in S\})\, \pi(\{\theta_j : j \neq i;\ j \in S\}) \quad \forall\ i \in S \qquad (30)

Since is a Markov random field, the above can be written as,

 \pi(\Theta) = \pi(\theta_i \mid \{\theta_j : j \in \nu_i\})\, \pi(\{\theta_j : j \neq i;\ j \in S\}) \quad \forall\ i \in S. \qquad (31)

Iterating alternately, we get,

 \pi(\Theta) \propto \prod_{i\in S} \pi(\theta_i \mid \{\theta_j : j \in \nu_i\}). \qquad (32)

Let us define,

 U_i(\Theta) = -\log \pi(\theta_i \mid \{\theta_j : j \in \nu_i\}), \qquad (33)

and

 U(\Theta) = \sum_{i\in S} U_i(\Theta). \qquad (34)

Then,

 \pi(\Theta) = \frac{1}{Z}\, \exp[-U(\Theta)], \qquad (35)

where Z is the normalization constant. This expression for the Prior is identical to the Gibbs distribution if we identify U(Θ) with βP E(Θ) and Z with the Z(βP) defined in Eq. (28). Note that U is a sum of the interaction energies of the states of nearest neighbour pixels and hence is consistent with the Markov random field requirement. It is important that the interaction be restricted to a finite range for Θ to be a Markov random field. Since we consider in this review only nearest neighbour interactions, this condition is automatically satisfied.

The equivalence of the Gibbs distribution in statistical mechanics and the Markov random field in spatial statistics was established in [12, 13], following the work of Hammersley and Clifford.

Ising Model for a Priori Distribution

The Ising model is the simplest and perhaps the most studied of models in statistical mechanics. Let us consider a Prior inspired by the Ising model. We consider an image having only two gray levels, designated as 0 and 1, or equivalently as spins −1 and +1; we have the transformation si = 2θi − 1. The label 0 (spin −1) refers to the black and the label 1 (spin +1) to the white gray level. The interaction energy of the states of two nearest neighbour pixels i and j is given by,

 E_{i,j}(\Theta) = I(\theta_i \neq \theta_j). \qquad (36)

It is clear that if the two pixels have the same gray level, the interaction energy is zero; if their gray levels are different, the interaction energy is unity. In the language of the physicists, the ground state of the two-pixel system has zero energy and the excited state has unit energy; the two energy levels are separated by unit energy. The energy of an Ising image is given by (the Ising Hamiltonian in statistical mechanics is H = −J Σ⟨i,j⟩ si sj, where si = ±1 is the spin on the lattice site i and J is the strength of the spin-spin interaction, usually set to unity; the ground state of a pair of nearest neighbour spins is of energy −J and the excited state is of energy +J, so the energy separation is 2J),

 E(\Theta) = \sum_{\langle i,j \rangle} E_{i,j}(\Theta) = \sum_{\langle i,j \rangle} I(\theta_i \neq \theta_j). \qquad (37)

The Ising Prior can be written as,

 \pi(\Theta) = \frac{1}{Z(\beta_P)}\, \exp\!\Big[-\beta_P \sum_{\langle i,j \rangle} I(\theta_i \neq \theta_j)\Big] \qquad (38)

where Z(βP) is the canonical partition function. The Ising Prior in conjunction with the Hamming Likelihood is suitable for restoring binary images.
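The Ising/Potts image energy of Eqs. (36)-(37) is simply a count of dissimilar nearest neighbour pairs; a sketch on a square image plane with free boundaries (names are ours):

```python
import numpy as np

def image_energy(theta):
    """E(Theta) = number of nearest neighbour pairs with dissimilar
    gray levels, Eqs. (36)-(37); free boundaries."""
    horiz = theta[:, :-1] != theta[:, 1:]  # left-right pairs
    vert = theta[:-1, :] != theta[1:, :]   # up-down pairs
    return int(np.sum(horiz) + np.sum(vert))

flat = np.zeros((4, 4), int)                  # perfectly smooth image
checker = np.indices((4, 4)).sum(axis=0) % 2  # maximally rough image

assert image_energy(flat) == 0      # smooth image: lowest energy
assert image_energy(checker) == 24  # all 2*4*3 = 24 bonds dissimilar
```

The same function serves the Q-state Potts Prior of the next section, since its pair energy has the identical form.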

Potts Model for a Priori Distribution

The Potts model is an important lattice model in which there is a discrete number of states at each site. Consider a multiple gray level image. The gray levels are taken as 0, 1, …, Q − 1. We have then a Q-state Potts model. The label 0 denotes the black and the label Q − 1 denotes the white gray level; the different shades correspond to the intermediate Potts labels. The expressions for the energy associated with the states of two nearest neighbour pixels, for the Potts Hamiltonian and for the Potts Prior are the same as the ones defined for the Ising model in the last section. (In statistical mechanics, the energy associated with a pair of nearest neighbour Potts spins is J[1 − δ(qi, qj)], where qi and qj are the Potts spin labels and J is the strength of the spin-spin interaction, usually set to unity; the ground state is of energy zero and the excited state is of energy unity, so the energy separation is unity.) The Potts Prior in conjunction with the Hamming or Kullback-Leibler Likelihood is suitable for restoring images painted with a small number of gray levels.

Geman-McClure Model for a Priori Distribution

Geman and McClure recommended a Prior in which the states of two neighbouring pixels interact with an energy given by,

 E_{i,j}(\Theta) = -\frac{1}{1+C\,(\theta_i - \theta_j)^2}, \qquad (39)

where C is a hyper-parameter that determines the width of the distribution. The value of Ei,j ranges from a minimum of −1 when θi = θj to a maximum approaching 0 when |θi − θj| → ∞. The Geman-McClure interaction potential is depicted in Fig. (2) for C = 0.1, 1.0, 10.0. The function is symmetric about θi − θj = 0 and its width decreases with increase of C. For large C the Geman-McClure interaction reduces to an Ising interaction with ground state energy −1 at θi = θj and excited state energy 0 at θi ≠ θj. The Geman-McClure Prior is thus given by,

 \pi(\Theta) = \frac{1}{Z(\beta_P)}\, \exp\!\Big[\beta_P \sum_{\langle i,j \rangle} \frac{1}{1+C\,(\theta_i-\theta_j)^2}\Big] \qquad (40)

where Z(βP) is the partition function that normalizes the Prior.

Figure 2: Geman-McClure energy Ei,j versus |θi − θj| for C = 0.1, 1.0, 10.0; when C increases the width of the function decreases. In the limit C → ∞ the Geman-McClure model reduces to the Ising model with ground state at −1 and excited state at 0.
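The shape of the Geman-McClure potential of Eq. (39) is easy to verify numerically; a sketch (the function name is ours):

```python
import numpy as np

def geman_mcclure(diff, C):
    """Geman-McClure interaction energy of Eq. (39) as a function of the
    gray level difference diff = theta_i - theta_j."""
    return -1.0 / (1.0 + C * np.asarray(diff, float) ** 2)

# Minimum -1 for equal gray levels; approaches 0 for very different levels.
assert geman_mcclure(0.0, C=1.0) == -1.0
assert geman_mcclure(100.0, C=1.0) > -1e-3
# Larger C narrows the well: at a fixed difference the energy rises toward 0.
assert geman_mcclure(1.0, C=10.0) > geman_mcclure(1.0, C=0.1)
```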

Relation between βP and ˆΘ

It is clear from the discussion above that the hyper-parameter βP will depend on the smoothness and the features present in ˆΘ. The Prior of ˆΘ is given by,

 \pi(\hat\Theta) = \frac{1}{Z(\beta_P)}\, \exp[-\beta_P\, E(\hat\Theta)], \qquad (41)

where Z(βP) is the partition function,

 Z(\beta_P) = \sum_{\hat\Theta} \exp[-\beta_P\, E(\hat\Theta)]. \qquad (42)

The value of βP is implicitly given by the relation below.

 \langle E \rangle = -\frac{\partial}{\partial \beta_P} \log Z(\beta_P) \qquad (43)

Consider, for example, the Ising/Potts Priors. The energy of ˆΘ is completely determined by the number of pairs of nearest neighbour pixels with dissimilar gray levels. (Two nearest neighbour pixels having the same gray level are called similar pixels; if the gray levels are different they are called dissimilar pixels. In the graph theoretic language, two nearest neighbour pixels are separated by an edge. If two nearest neighbour pixels have the same gray level we say the edge separating them is a satisfied bond; this terminology will be useful later when we consider cluster algorithms for image restoration.) The energy E(ˆΘ) equals the total number of edges in the image plane minus the number of satisfied bonds in ˆΘ. Thus the value of the hyper-parameter βP can in principle be calculated, through Eq. (43), from these two numbers for a given image ˆΘ.
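The edge and satisfied-bond counting described above is easy to do on a lattice with free boundaries; a sketch (names are ours):

```python
import numpy as np

def edge_counts(theta):
    """Counts for an Ising/Potts image: total nearest neighbour edges and
    'satisfied' edges joining pixels with equal gray levels. The image
    energy of Eq. (37) is then (total edges) - (satisfied edges)."""
    h_equal = theta[:, :-1] == theta[:, 1:]
    v_equal = theta[:-1, :] == theta[1:, :]
    total = h_equal.size + v_equal.size
    satisfied = int(np.sum(h_equal) + np.sum(v_equal))
    return total, satisfied

theta = np.zeros((8, 8), int)        # uniform image
total, satisfied = edge_counts(theta)

assert total == 2 * 8 * 7 == 112
assert satisfied == 112              # every bond satisfied
assert total - satisfied == 0        # zero energy, Eq. (37)
```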

Bayesian a Posteriori Distribution

The Posterior distribution in the Bayesian methodology is given by the product of the Likelihood and the Prior, suitably normalized. This is called Bayes' theorem, discussed e.g. in [18, 19]. Appendix 1 states Bayes' theorem.

According to Bayes’ theorem,

\pi(\Theta|X) = \frac{L(X|\Theta)\,\pi(\Theta)}{\sum_{\Theta'} L(X|\Theta')\,\pi(\Theta')}. \qquad (44)

The Bayesian Posterior can be formally written as,

\pi(\Theta|X) = \frac{\exp\!\left[-\left\{\beta_L F(\Theta,X) + \beta_P E(\Theta)\right\}\right]}{\sum_{\Theta'}\exp\!\left[-\left\{\beta_L F(\Theta',X) + \beta_P E(\Theta')\right\}\right]}. \qquad (45)

It is convenient to work in terms of intensive quantities. To this end we define the energy per pair of nearest neighbour pixels and the Likelihood cost per pixel. Accordingly we divide the energy E by 2L², the number of nearest neighbour pairs of pixels in the image plane; for an L×L pixel image there are 2L² such pairs. We divide F by L², the number of pixels. In the expression for the Posterior above, the argument of the exponential function can be viewed as a weighted sum of F and E. The weight attached to F is βL and the weight attached to E is βP. The relevant quantity is the relative weight attached to F and E. Hence we set βL = 1 and βP = β, where β is the value of βP measured in units of βL. Also, it is convenient to work with normalized weights, and accordingly we divide the whole exponent by 1+β. The final expression for the Posterior that we use in the image restoration algorithm is given by

\pi(\Theta|X) = \frac{\exp\!\left[-L^{-2}(1+\beta)^{-1}\left\{F(\Theta,X) + 2^{-1}\beta\, E(\Theta)\right\}\right]}{\sum_{\Theta'}\exp\!\left[-L^{-2}(1+\beta)^{-1}\left\{F(\Theta',X) + 2^{-1}\beta\, E(\Theta')\right\}\right]}. \qquad (46)

The advantage of the above expression is that there is only one parameter that needs to be tuned for good image restoration (it is purely for convenience that we have reduced the number of hyper-parameters from two to one). Thus we have only one hyper-parameter, denoted by β, and we call T = 1/β the temperature. For many applications we would require two or more hyper-parameters which need to be optimized for good image restoration. Pryce and Bruce, for example, have considered both βL and βP separately and carried out a Monte Carlo search in the two dimensional hyper-parameter space for good image restoration. In fact they talk of a first order transition line that separates the Prior dominated and the Likelihood dominated phases in the context of the Ising spin model of image restoration.
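The single-hyper-parameter exponent of Eq. (46) can be sketched as an unnormalized log-Posterior. This is a minimal illustration of my own (the function name `log_posterior` and the toy values of F and E are hypothetical), assuming βL = 1 and βP = β as above:

```python
def log_posterior(F, E, beta, L):
    """Unnormalized log-Posterior of Eq. (46): the weighted sum of the
    Likelihood cost F and the Prior energy E, with a single tunable
    hyper-parameter beta (beta_L set to 1, beta_P set to beta)."""
    return -(F + 0.5 * beta * E) / (L**2 * (1.0 + beta))

# Increasing beta shifts weight from feature retention (F) toward
# smoothing (E): a smoother candidate (smaller E, larger F) overtakes
# a data-faithful one as beta grows.
faithful = dict(F=0, E=40)     # identical to the data X, but rough
smooth   = dict(F=10, E=0)     # smoothed, differs from X at 10 pixels
for beta in (0.1, 1.0):
    print(beta, log_posterior(L=16, beta=beta, **faithful)
                < log_posterior(L=16, beta=beta, **smooth))
```

Only differences of this quantity matter for the algorithms below, so the normalizing sum over Θ′ never has to be evaluated.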

Bayesian Maximum a Posteriori (MAP)

The aim of image restoration is to construct the true image ˆΘ from the given noisy image X. The simplest estimate of ˆΘ is the image Θ which maximizes the Posterior.

For a given temperature, the Posterior increases when F(Θ,X) decreases. In other words, the closer Θ is to X, the larger is the value of the Posterior. Though X is corrupt, it is the best we have, and we would like to retain during the image-restoration process as many features of X as possible.

The Posterior also increases when E(Θ) decreases, for a given temperature T. In other words, the smoother Θ is, the larger is the value of the Posterior. Thus there is a Prior-Likelihood competition: the Prior tries to smoothen the image and the Likelihood tries to keep the image as close to X as possible. In other words, the Prior tries to make all the pixels acquire the same gray level and the Likelihood tries to bind the image to the data X. As a result we get an image which has in it the features of X (because of competition from the Likelihood) and which is smooth, without noise (because of competition from the Prior). (We can say that this kind of Prior-Likelihood competition is analogous to the entropy-energy competition which determines the phase of a material.) The nature of the Prior-Likelihood competition is tuned by the temperature, see below. What happens when the temperature is large? The temperature determines the relative competitiveness of the Likelihood and the Prior toward increasing the Posterior. For a given T,

1. the relative weight attached to F(Θ,X) is T/(1+T), and

2. the relative weight attached to E(Θ)/2 is 1/(1+T).

When T is large, feature retention (i.e. the Likelihood) is given more importance. However, if T is very large, the image restoration algorithm would interpret even the noise in X as a feature and retain it; as a result the restored image would be noisy. What happens when the temperature is small? When T is small, smoothening (i.e. the Prior) is given more importance. However, if T is too small, there is a danger: the image restoration algorithm would misinterpret even genuine inhomogeneities and fine features of X as noise and eliminate them.

Best image restoration is expected over an intermediate range of T, often found by trial and error. The problem of image restoration is thus reduced to a problem of sampling, independently and randomly with equal probabilities, a large number of images from the state space Ω. These images constitute a microcanonical ensemble. Calculate the Posterior of each member of the microcanonical ensemble. Pick the image which gives the highest value of the Posterior and call it ΘMAP; the suffix MAP stands for Maximum a Posteriori. ΘMAP is called an MAP estimate of ˆΘ. We can also employ other statistics, defined over the state space Ω, to estimate ˆΘ, see below.

Maximum a Posteriori Marginal (MPM)

We partition the state space Ω into mutually exclusive and exhaustive subsets as described below. Consider a pixel i. Define Ω(i)ζ as the subset of images for which the gray level of the pixel i is ζ, where ζ=0,1,⋯,Q−1.

\Omega^{(i)}_{\zeta} = \left\{\Theta \in \Omega,\ \theta_i(\Theta) = \zeta\right\} \quad \text{for} \quad \zeta = 0,1,\cdots,Q-1 \qquad (47)

Calculate now marginal Posteriors,

\pi^{(i)}_{\zeta} = \sum_{\Theta\,\in\,\Omega^{(i)}_{\zeta}} \pi(\Theta|X) \quad \text{for} \quad \zeta = 0,1,\cdots,Q-1\,. \qquad (48)

Thus we get an array of marginals {π(i)ζ : ζ=0,1,⋯,Q−1}. We define

\zeta^{(i)}_{\text{MPM}} = \arg\max_{\zeta}\ \pi^{(i)}_{\zeta} \qquad (49)

which stands for the value of ζ that maximizes the function π(i)ζ. In other words, ζ(i)MPM denotes the value of the gray level for which the marginal Posterior is maximum. Repeat the above for all the pixels in the image plane, and obtain a collection of numbers that represents an approximation to ˆΘ:

\Theta_{\text{MPM}} = \left\{\zeta^{(i)}_{\text{MPM}} : i \in S\right\}. \qquad (50)

ΘMPM is called a Maximum a Posteriori Marginal (MPM) estimate of ˆΘ.

Threshold a Posteriori Mean (TPM)

Calculate an average image ¯Θ = {¯θi : i ∈ S}, where,

\bar{\theta}_i = \sum_{\Theta\in\Omega} \theta_i(\Theta)\, \pi(\Theta|X) \quad \forall\ i \in S. \qquad (51)

We define,

\zeta^{(i)}_{\text{TPM}} = \arg\min_{\zeta}\ \left[\zeta - \sum_{\Theta\in\Omega}\theta_i(\Theta)\,\pi(\Theta|X)\right]^2 \qquad (52)

which stands for the value of ζ that minimizes [ζ − ¯θi]². In other words, ζ(i)TPM is the value of ζ closest to ¯θi. Carry out the above exercise for all the pixels in the image, and get a collection of gray levels that represents an approximation to ˆΘ:

\Theta_{\text{TPM}} = \left\{\zeta^{(i)}_{\text{TPM}} : i \in S\right\}. \qquad (53)

ΘTPM is called the Threshold a Posteriori Mean (TPM) estimate of ˆΘ. It is easily seen that for a binary image ΘTPM = ΘMPM.
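Given a Posterior ensemble of sampled images, the MPM and TPM estimates reduce to simple per-pixel statistics. A minimal sketch with hypothetical names (`mpm_estimate`, `tpm_estimate`), assuming the ensemble is stored as a 3-d array with the samples along axis 0:

```python
import numpy as np

def mpm_estimate(ensemble, Q):
    """Per-pixel most probable gray level over a Posterior ensemble:
    an empirical version of Eqs. (48)-(50)."""
    counts = np.stack([(ensemble == z).sum(axis=0) for z in range(Q)])
    return counts.argmax(axis=0)

def tpm_estimate(ensemble):
    """Per-pixel Posterior mean, thresholded to the nearest gray level:
    an empirical version of Eqs. (51)-(53)."""
    return np.rint(ensemble.mean(axis=0)).astype(int)

# Three 2x2 sample images, standing in for a Posterior ensemble:
samples = np.array([[[0, 2], [1, 1]],
                    [[0, 2], [1, 1]],
                    [[0, 0], [2, 1]]])
print(mpm_estimate(samples, Q=3))   # modal gray level at each pixel
print(tpm_estimate(samples))        # rounded mean at each pixel
```

For Q=3 the two estimates can differ (the mean can fall between the two most frequent labels), whereas for a binary image they coincide, as noted above.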

Elements of Digital Image Restoration

A typical image restoration process is depicted in Fig. (3). We represent the unknown true image by ˆΘ, which gets corrupted to X. The degradation of ˆΘ to X is modeled by the Likelihood L(X|ˆΘ). Bayes' theorem helps synthesize the Likelihood (available in the form of a degradation model and data on the corrupt image X) with a Prior (that models your subjective beliefs about the true image). The result is a Posterior π(Θ|X). An ensemble of images having the Posterior distribution is shown in Fig. (3). From this Posterior ensemble we can make MAP, MPM and TPM estimates of the true image.

Figure 3: Digital image restoration: ˆΘ denotes the true image (first square in the top row) which gets corrupted to X (second square in the top row). The degradation of ˆΘ to X is modeled by L(X|ˆΘ). Bayes' theorem helps construct a Posterior π(Θ|X) from the Likelihood and a Prior. A Posterior ensemble of images (set of squares at the right end of the top row). MAP, MPM and TPM estimates of ˆΘ (depicted by the three squares in the second row) are made from the Posterior ensemble.

Algorithm for Calculating ΘMAP

A straightforward procedure to calculate ΘMAP is to attach a Posterior probability to each image belonging to the state space Ω and pick the image which has the maximum Posterior probability. This is easier said than done. Note that the number of images belonging to the state space is Q raised to the power of the number of pixels. Let us consider a small L×L binary image (Q=2). The number of pixels is L² and the number of images in Ω is 2^{L²}. Calculating the Posteriors for all these images is an impossible task even on present day high speed computers (a typical image has a linear size of a few hundred to a thousand pixels, with gray levels ranging from 2, for a binary image, to 256; the state space then contains an astronomically large number of images). A simple procedure would be to sample, randomly and with equal probability, a certain large number of images in the neighbourhood of the given image X; these images constitute a microcanonical ensemble. Find the image, in the microcanonical ensemble, that maximizes the Posterior and recommend it as an MAP estimate of ˆΘ. An algorithm for doing this is described below.

Fix the temperature. Start with an image, say ΘC = X. Call this the current image.

1. Calculate the Posterior πC = π(ΘC|X).

2. Select a pixel, say i, randomly from the image plane.

3. Switch the gray level of the pixel i from its current value to a value sampled randomly and with equal probability from the gray level labels {0,1,⋯,Q−1}. Call the resulting image the trial image Θt.

4. Calculate πt = π(Θt|X).

5. If πt ≥ πC, then accept the trial image and set ΘC = Θt; otherwise retain ΘC.

6. Take ΘC as the current image,

7. go to step (1).

Iterate the whole process several times. A set of L² update attempts constitutes an iteration. Collect the images at the end of each iteration. Thus we get a sequence of images with monotonically non-decreasing Posteriors. Asymptotically we get ΘMAP, called an MAP estimate of ˆΘ.
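The steps above can be sketched as follows. This is a minimal illustration of my own (the names `neg_cost` and `map_search` are hypothetical), assuming the Hamming Likelihood, the Ising/Potts Prior energy, and free boundaries; since only Posterior comparisons are needed, the exponent of Eq. (46) is computed up to its constant prefactor:

```python
import numpy as np

rng = np.random.default_rng(0)

def neg_cost(theta, x, beta):
    """Exponent of Eq. (46) up to the factor 1/(L^2 (1+beta)):
    larger means larger Posterior, so comparing this suffices."""
    F = np.count_nonzero(theta != x)                       # Hamming likelihood
    E = (np.count_nonzero(theta[:, 1:] != theta[:, :-1])
         + np.count_nonzero(theta[1:, :] != theta[:-1, :]))  # Potts energy
    return -(F + 0.5 * beta * E)

def map_search(x, Q, beta, n_iter=30):
    """Greedy search: random single-pixel moves, accepted only when
    the Posterior does not decrease. One iteration = L*L attempts."""
    theta = x.copy()
    current = neg_cost(theta, x, beta)
    L = x.shape[0]
    for _ in range(n_iter * L * L):
        i, j = rng.integers(0, L, size=2)
        old = theta[i, j]
        theta[i, j] = rng.integers(0, Q)    # random trial gray level
        trial = neg_cost(theta, x, beta)
        if trial >= current:
            current = trial                 # accept the trial image
        else:
            theta[i, j] = old               # reject: restore the pixel
    return theta

# Toy run: a flat binary image with two isolated noisy pixels.
x = np.zeros((16, 16), dtype=int)
x[3, 3] = x[8, 12] = 1
restored = map_search(x, Q=2, beta=1.0)
print(restored.sum())   # isolated noise pixels get smoothed away
```

Because moves are accepted only when the Posterior does not decrease, the sequence of Posteriors is automatically monotone, as stated above.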

How does Posterior maximization lead to image restoration ?

A simple explanation of how Bayesian Posterior maximization leads to de-noising is depicted in Fig. (4) and described below.

Figure 4: Local region of ΘC is depicted at Left. The gray level label of the central pixel is switched from 1 to 3 and a trial image Θt is constructed. The corresponding local region of Θt is depicted at Right. This move leads to local smoothening of the image. Let us assume that the sub-image depicted at Left belongs to X and that depicted at Right belongs to ˆΘ. Then the above move increases F by one unit and decreases E by 4 units. The Posterior increases if T < 2. In other words the Prior wins over the Likelihood if T < 2, leading to de-noising.

Consider a local region of the current image ΘC having a pixel m with gray level label 1 and with all its four nearest neighbours having gray level label 3. Let us denote by the symbol ν the set of five pixels: the pixel m and its four nearest neighbours. Let us say that in ˆΘ, the pixels belonging to ν all have the same gray level, and hence ˆΘ is smooth locally. It is the process of degradation that has led to the noisy inhomogeneity at the pixel m. We have,

F(\Theta_C, X) = \sum_{i\notin\nu} I\!\left(\theta_i(\Theta_C) \neq x_i\right) + \sum_{i\in\nu} I\!\left(\theta_i(\Theta_C) \neq x_i\right) \qquad (54)

Let us denote the first term in the above sum (over pixels not belonging to ν) as F₀. The second term is zero since θi(ΘC) = xi for all i ∈ ν. We get F(ΘC,X) = F₀. Similarly,

E(\Theta_C) = \sum_{\substack{\langle i,j\rangle \\ i,j \neq m}} I\!\left(\theta_i(\Theta_C)\neq\theta_j(\Theta_C)\right) + \sum_{\substack{\langle i,j\rangle \\ i = m\ \text{or}\ j = m}} I\!\left(\theta_i(\Theta_C)\neq\theta_j(\Theta_C)\right), \qquad (55)

where ⟨i,j⟩ denotes that i and j are nearest neighbour pixels and the sum extends over all distinct nearest neighbour pairs of pixels. Let us denote by E₀ the first term in the above. The second term is 4 since there are four dissimilar nearest neighbour pairs of pixels in the sub-image ν. Hence E(ΘC) = E₀ + 4. In the simulation, we change the gray level of the pixel m from its present value of 1 to 3 and call the resulting image Θt. We find that F(Θt,X) = F₀ + 1 and E(Θt) = E₀. Even though the move increases F by one unit, it decreases E by four units. Let the Posteriors of ΘC and Θt be denoted by πC and πt respectively. The ratio of the Posteriors can be calculated and is given by,

\frac{\pi_t}{\pi_C} = \exp\!\left[-\frac{1}{L^2(1+\beta)}\,(1-2\beta)\right] \qquad (56)

which is greater than unity whenever β > 1/2, or equivalently T < 2. Since we accept Θt only when πt is greater than πC, de-noising takes place when T < 2.

It is clear from the discussion above that removal of an inhomogeneity entails energy reduction. Hence the Prior tries to remove all inhomogeneities in the image plane, including the genuine ones. On the other hand, the Likelihood binds the image to the data X. It tries to retain all the features of X, including the inhomogeneities that have their origin in noise. It is the temperature that tunes the competition between the Prior and the Likelihood.
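The threshold in Eq. (56) can be checked numerically. The small sketch below (the function name `posterior_ratio` is mine) evaluates the Posterior ratio for a single-pixel move with the changes of Fig. (4), ΔF = +1 and ΔE = −4:

```python
import math

def posterior_ratio(dF, dE, beta, L):
    """pi_t / pi_C for a single-pixel move that changes F by dF and
    E by dE, from the Posterior of Eq. (46)."""
    return math.exp(-(dF + 0.5 * beta * dE) / (L**2 * (1.0 + beta)))

# The smoothing move of Fig. (4): F up by one unit, E down by four.
# The ratio exceeds unity exactly when beta > 1/2, i.e. T = 1/beta < 2.
for beta in (0.25, 0.5, 1.0):          # i.e. T = 4, 2, 1
    r = posterior_ratio(dF=1, dE=-4, beta=beta, L=32)
    print(beta, r > 1.0)
```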

Restoration of Toy Images

Binary Robot

We have created a binary image ˆΘ, referred to as ROBOT, depicted in Fig. (5 : Left). To corrupt a binary image with noise we proceed as follows. We select a pixel and change its gray level label from its present value to the other with a probability p̃: Call a random number r; if r < p̃, change the gray level; otherwise do not change the gray level. Carry out this exercise on all the pixels independently. This is equivalent to adding noise to the image; there are no spatial correlations in the noise added. The resulting corrupt image, called X, is depicted in Fig. (5 : Middle). Employing the Ising Prior and the Hamming Likelihood we have calculated an MAP estimate of ˆΘ, depicted in Fig. (5 : Right).
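The corruption procedure above amounts to a binary symmetric channel and is straightforward to sketch. The names here are mine, and the all-zero "truth" frame (sized 92×92 as in Fig. (5)) is only a stand-in, not the actual ROBOT image:

```python
import numpy as np

def corrupt_binary(theta, p, rng=None):
    """Binary symmetric channel: flip each pixel independently with
    probability p; there is no spatial correlation in the noise."""
    if rng is None:
        rng = np.random.default_rng()
    flip = rng.random(theta.shape) < p   # r < p  =>  change the gray level
    return np.where(flip, 1 - theta, theta)

rng = np.random.default_rng(7)
truth = np.zeros((92, 92), dtype=int)    # stand-in for an L=92 frame
noisy = corrupt_binary(truth, p=0.05, rng=rng)
print(noisy.mean())   # fraction of flipped pixels, close to p = 0.05
```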

Five-gray Level Robot

We have constructed a 5 gray level ROBOT image on a 56×56 image frame, depicted in Fig. (6 : Left). We corrupt the image with noise and get X, depicted in Fig. (6 : Middle). Employing the Potts Prior and the Hamming Likelihood we have made an MAP estimate of the true image, which is depicted in Fig. (6 : Right). The image restoration has been carried out at a fixed temperature. We monitored the Bayesian Posterior maximum at the end of each iteration. The Posterior maximum is a monotonically non-decreasing function of the iteration index. Eventually the Posterior maximum saturates, as shown in Fig. (7).

Figure 5: Binary ROBOT image; L=92. (Left) true image ˆΘ. (Middle) Image X constructed by adding noise to ˆΘ, as described in the section ‘Binary Symmetric Channel Likelihood distribution’, with p̃=0.05. (Right) Restored image, ΘMAP. Complete restoration happens after one iteration. Image restoration has been carried out at T=0.5.

MAP estimates of the 5-gray level Potts image made at a high temperature, at a low temperature, and at an intermediate temperature are shown in Fig. (8); the high temperature estimate is noisy and the low temperature estimate has several of its fine features distorted. Good image restoration obtains for temperatures in the neighbourhood of the intermediate value.

Restoration of Benchmark Images


Binary Lena Image

We have taken a benchmark image called Lena, painted with 256 gray levels. We have converted it to a binary image, which is displayed in Fig. (9 : Left). We introduce noise, and the resulting image is shown in Fig. (9 : Middle). We have employed the Ising Prior and the Hamming distance and made an MAP estimate of the original image. The restored image is depicted in Fig. (9 : Right).

We have processed the same image at several temperatures and the results are displayed in Fig. (10). At low temperature the algorithm has removed all the fine features of the image. At high temperature the algorithm has interpreted even the noise as features and retained it. Good image restoration seems to happen at intermediate temperatures.

Five and Ten Gray Level Lena Images

We have coarse grained the same Lena image to 5 gray levels and the result is shown in Fig. (11 : Left). The image corrupted with noise is depicted in Fig. (11 : Middle), and an MAP estimate made with the Potts Prior and the Hamming distance is depicted in Fig. (11 : Right).

Results of image restoration of the Lena image with 10 gray levels are shown in Fig. (12). It is clear from the discussions above that such a search algorithm would be time consuming. It would be advantageous to obtain an ensemble of images which has the desired Posterior distribution, so that the required statistics can be calculated by simple arithmetic averaging over the Posterior ensemble.

Figure 6: ROBOT image: L=56; 5 gray levels; Image restoration has been carried out employing Potts Prior, Hamming