Deep Belief Networks for Image Denoising

12/20/2013 ∙ by Mohammad Ali Keyvanrad, et al. ∙ AUT 0

Deep Belief Networks which are hierarchical generative models are effective tools for feature representation and extraction. Furthermore, DBNs can be used in numerous aspects of Machine Learning such as image denoising. In this paper, we propose a novel method for image denoising which relies on the DBNs' ability in feature representation. This work is based upon learning of the noise behavior. Generally, features which are extracted using DBNs are presented as the values of the last layer nodes. We train a DBN a way that the network totally distinguishes between nodes presenting noise and nodes presenting image content in the last later of DBN, i.e. the nodes in the last layer of trained DBN are divided into two distinct groups of nodes. After detecting the nodes which are presenting the noise, we are able to make the noise nodes inactive and reconstruct a noiseless image. In section 4 we explore the results of applying this method on the MNIST dataset of handwritten digits which is corrupted with additive white Gaussian noise (AWGN). A reduction of 65.9 average mean square error (MSE) was achieved when the proposed method was used for the reconstruction of the noisy images.



There are no comments yet.


page 5

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Image signals are often corrupted due to noise. Removing noise from image is an important issue in computer vision, because this step could be the preprocessing step of many other applications. Up to now various methods have been proposed to remove noise (denoise) from visual data. Focus of many of these methods is on Fourier Analysis [1], Spatial Filtering [2], and Wavelet Transform [3]. Also there are some other methods based on Spare Coding and Dictionary Learning [4]. On the other hand, Machine Learning tools such as Convolutional Neural Networks (CNN) or Deep Neural Networks (DNN) have been used in several papers to tackle this issue[5][6]. One of the biggest difficulties in training such deep networks is that the cost function of such deep architectures gets stuck in poor local optima due to random initialization of weights. Hinton et al. [7] proposed a new greedy layer-wise algorithm to tackle this issue and introduced Deep Belief Networks (DBNs). DBNs are able to present a good ”feature” representation of data. These features which are defined as the properties of input data are presented as nodes of the last layer of DBN. As a result, in this paper we train a DBN such that it learns to extract image features. The trained DBN distinguishes between ”noise features” and ”clean image features” in the last layer and presents them into two distinct groups of nodes. Furthermore, DBNs are capable of reconstructing the input data based on the values of last layer nodes. Subsequently, if we eliminate the effects of nodes presenting the noise, the reconstructed image will be noiseless. From now on,we called the nodes presenting noise and the nodes presenting image content “noise nodes” and “image nodes”, respectively. The rest of the paper is organized as follows: In Section 2 we briefly describe Deep Learning. In Section 3 we describe the learning process. Section 4 is our experimental results and finally, we conclude the paper in Section 5.

2 Deep Learning

Restricted Boltzmann Machines (RBMs) are the building blocks of DBNs. Hence, in this section first we briefly describe RBMs and then will explore DBNs.

2.1 Restricted Boltzmann Machines

Boltzmann Machines (BMs) and Restricted Boltzmann Machines (RBMs) were introduced in 1980s. But they attracted more attention since 2006 after Hinton et al. paper [8]. He showed that a very powerful neural network can be made by stacking RBMs. RBMs are a kind of Markov Random Fields (MRF) which have a two-layer structure. One layer is called visible and another is called hidden layer. They are restricted to have no visible-visible or hidden-hidden connection and connections are inter layer. A graphical depiction of RBM is shown in Figure 1.

Figure 1: Structure of a Restricted Boltzmann Machine.

A joint configuration, of visible and hidden units has an energy given by [9]:


where are the binary states of visible unit and hidden unit and are their biases and

is the weight between them. The network assigns a probability to every possible pair of a visible and a hidden vector via this energy function [10]:


The probability that the network assigns to a visible vector, , is given by summing over all possible hidden vectors:


where the angle brackets are used to denote expectations under the distribution species by the subscript that follows. This leads to a very simple learning rule for performing stochastic steepest ascent in the log probability of the training data:


where is a learning rate. But, exact maximum likelihood learning in this model is intractable because exact computation of the expectation model is very expensive. Hence, in practice, learning is done by following an approximation to the gradient of a different objective function, called the

Contrastive Divergence

(CD) [11].

2.2 Deep Belief Networks

One of the main problems in training deep networks is how to initialize weights. It is difficult to optimize the weights in nonlinear Deep Networks with multiple hidden layers. With good initial weights, gradient descent works well, but finding such initial weights requires a very different type of algorithm that learns one layer of features at a time. Hinton et al. [1] introduced a new algorithm to solve the above problem based on the training of a sequence of RBMs. To construct a DBN we train sequentially as many RBMs as the number of hidden layers in the DBN, i.e. for a DBN with h hidden layers we have to train h RBMs. These RBMs are placed one on top of the other. Figure 2 gives an overview of the basic concept. For the first RBM, which consists of the DBN’s input layer and the first hidden layer, the input of RBM is the training set. For the second RBM, which consists of the DBN’s first and second hidden layers, the input is the output of the previous RBM and so for other RBMs.

Figure 2: Left: Greedy learning a stack of RBM’s in which the samples from the lower-level RBM are used as the data for training the next RBM. Right: The corresponding Deep Belief Network [13].

After performing this layer-wise algorithm we have obtained a good initialization for the hidden weights and biases of the DBN and then the DBN is fine-tuned with respect to a typical supervised criterion (such as mean square error or cross-entropy) .

3 Learning

To train a DBN for image denoising, the normalized values of an image pixels are used. Using min-max normalization, grayscale value of a pixel (an integer between 0 and 255) is transformed to a floating point number between 0 and 1. Unlike first and last layer of DBN, other layers have binary nodes. The main idea is to train a DBN such that it learns to map noisy images to images with lower noise or even without noise. The idea can be implemented by learning the behavior of noise and image contents and presenting these behaviors in some nodes at the last layer of the network. The network is trained with a collection consisted of both noisy and noiseless images. We used a criterion called relative activity to detect noise nodes. Relative activity of each node is defined as the difference between two values of a particular node resulted from feeding the network using a noiseless image and its corresponding noisy image (Two images with same contents but one of them is corrupted by noise). As a result, if a particular node is a noise node, it should have higher relative activity. On the other hand, if it is an image node, it should have lower relative activity. This theory is justified by the fact that the activation of image nodes should be same for both noiseless and its corresponding noisy images. This process is illustrated in Figure  3.


Figure 3: Relative activity of the last layer nodes can be computed by subtracting the last nodes values constructed by a noiseless image and its corresponding noisy image.

By performing above operation for all images and averaging on the values of each node in the last layer, average relative activity of the last layer nodes is computed. The nodes that still have high average relative activity are considered noise nodes. Now that the noise nodes are discovered, the next step is to lower their activity. Since noise nodes do not change much when clean images are fed to the network, we choose their average value for all clean images as their neutral values (the values which make nodes inactive). Finally, the noise nodes are inactive and consequently, a noiseless image can be reconstructed as it is shown in Figure 4.


Figure 4: Nodes in the last layer of the DBN are presenting the noise and the contents of the input image. By reducing the noisy nodes (gray nodes in the figure above) activity, a noiseless image will be created. Right side image is the noisy image and the left side image is noiseless reconstruction.

4 Experimental Results

MNIST dataset is a dataset of handwritten digits consisted of 60,000 images for training and 10,000 images for test. To model natural noise, we added additive white Gaussian noise (AWGN)

to images with a variance of 0.20. Therefore, our new dataset was consisted of 120,000 noisy and clean images along with 10,000 noisy images for test (the whole test set is noisy). We used a subset of training set with 20,000 elements for the training phase and the whole test set for the test phase. According to empirical results we created a DBN with 4 hidden layers: 784-1000-500-250-100. We trained this network by 200 batches of data each including 100 images.

According to previous discussions we used “relative activity” to find noise nodes in the last layer of the DBN: For all images in the dataset, we put a clean image and then its corresponding noisy image as the input of the network. Afterward, we computed the difference between the last nodes’ values. The average of difference in each node considering all images showed “average relative activity” of nodes (noisy vs. clean). Based on experimental results, nodes with an “average relative activity” higher than 0.9 were considered noise nodes.

Now for all 10000 clean images in our training set, we compute the values of nodes in the last layer of our trained DBN. The average of these values for each node considered neutral value of node. Finally, to reconstruct a noiseless image from a noisy image, we change the values noise nodes to their neutral values. As a result, noise nodes are inactive and reconstruction would be noiseless. Figure  5 shows how the reconstructed results have lower noise. Also Table  1 shows that a reduction of 65.9% in average Mean Square Error (MSE) was achieved when the proposed method was used for the reconstruction of the noisy images.


Figure 5: From left to right: Noiseless images, Noisy images (AWGN with 0.20 variance), Reconstruction without eliminating any noisy node, Reconstruction with eliminating noise nodes. As it is clear, the reconstructed results have much lower noise after eliminating noise nodes.

5 Conclusion and Future Works

In this paper, a novel method for image denoising was proposed. The proposed method makes a model to learn noise behavior using a Deep Belief Network (DBN) and tries to present noise and image contents behavior into two distinct groups of nodes. Then by omitting noisy nodes, the network will be able to produce a noiseless (clean) image by reconstruction of the input image. In our work, thresholds for detection of noise nodes were determined manually. Future works will include:
1. Using an automatic technique to determine thresholds for detecting the noisy nodes in denoising technique presented in this paper.
2. Employing our denoising approach to tackle some other issues in Computer Vision, Speech Recognition, etc. These areas will be addressed in future phases of this project.

Data Mean Square Error (MSE)
Noisy Image 0.0966
Reconstruction without eliminating any node 0.0416
Reconstruction with eliminating noise node 0.0329
Table 1: The table above shows the average MSE for different kinds of reconstructions. A reduction of 65.9% (0.0966 to 0.0329) in average MSE was achieved by reconstruction after eliminating noisy nodes.

6 References

[1] Brigham, E. O., & Morrow, R. E. (1967). The fast Fourier transform. Spectrum, IEEE, 4(12), 63-70.

[2] Kervrann, C., & Boulanger, J. (2006). Optimal spatial adaptation for patch-based image denoising. Image Processing, IEEE Transactions on, 15(10), 2866-2878.

[3] Pan, Q., Zhang, L., Dai, G., & Zhang, H. (1999). Two denoising methods by wavelet transform. Signal Processing, IEEE Transactions on, 47(12), 3401-3406.

[4] Elad, M., & Aharon, M. (2006). Image denoising via sparse and redundant representations over learned dictionaries. Image Processing, IEEE Transactions on, 15(12), 3736-3745.

[5] LeCun, Y., Kavukcuoglu, K., & Farabet, C. (2010, May). Convolutional networks and applications in vision. In Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Symposium on (pp. 253-256). IEEE.

[6] Xie, Junyuan, Linli Xu, and Enhong Chen. ”Image denoising and inpainting with deep neural networks.” Advances in Neural Information Processing Systems. 2012.

[7] Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504-507.

[8] Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural computation, 18(7), 1527-1554.

[9] Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural computation, 14(8), 1771-1800.

[9] Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., & Manzagol, P. A. (2010). Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. The Journal of Machine Learning Research, 9999, 3371-3408.

[10] Hinton, G. (2010). A practical guide to training restricted Boltzmann machines. Momentum, 9(1).

[11] Salakhutdinov, R. (2009). Learning deep generative models (Doctoral dissertation, University of Toronto).