I Introduction
Recently, deep learning methods have been extensively applied to various fields of science and engineering for training on large volumes of data [1, 2, 3]. Several prominent techniques have been rapidly developed for various kinds of problems such as classification, object detection, time-series prediction, and so on [4]. New CNN (Convolutional Neural Network) architectures such as AlexNet [5], GoogLeNet [6], VGG16 [7], and ResNet [8] have been developed [9]. In deep learning, trial and error is required to find the optimal network structure for given input data, and developing a new structure is a difficult task even for experienced designers. Due to the computational cost and resources required for big data representation and analysis, investigating all possible parameter sets is impracticable. Transfer learning, which reuses a trained network with high classification capability such as GoogLeNet, VGG16, or ResNet for a new problem, is a popular method because it makes it easier to construct a network structure for new data. However, we consider that achieving powerful classification capability requires a network structure specified for the new data, with features that cover its data space. A learning method that uses a pre-trained model cannot express such characteristic data representations. We have proposed the adaptive structural learning method of DBN
[10]. The method has an outstanding function for determining the structure of an RBM [11, 12]: a self-organizing algorithm that generates and deletes hidden neurons during the learning phase. The number of RBM layers is also automatically determined by a layer generation method [13, 14, 15]. The adaptive structural learning of DBN shows the highest classification capability for image recognition on benchmark data sets such as MNIST [16], CIFAR-10, and CIFAR-100 [17]. The classification accuracy for the training data sets was almost 100%, and 99.5%, 97.4%, and 81.2% for the respective test cases [10]. In this paper, a new object detection method for the DBN architecture is proposed for the localization and categorization of objects. Object detection is the task of finding semantic objects in images as Bounding Boxes (BBoxes). The basic idea is similar to well-known CNN-based object detection methods such as R-CNN [18], YOLO [19], and SSD [20], but those methods cannot be applied directly to the DBN architecture because they rely on the image features of convolutional filters. Moreover, the CNN-based methods estimate the probability of semantic objects in an image as a continuous heatmap. In contrast, the proposed method can represent a discrete heatmap, since each hidden neuron takes a binary value in {0, 1}.
The proposed method was evaluated on the Chest X-ray image benchmark data set (CXR8) [21], one of the most commonly accessible radiological examinations for many lung diseases. Compared with the results of transfer learning with well-known CNN methods, the proposed method showed higher performance for both classification (more than 94.5% classification accuracy for test data) and localization (more than 90.4% detection accuracy for test data) than the other CNN methods.
The remainder of this paper is organized as follows. In section II, the basic idea of the adaptive structural learning of DBN is briefly explained. Section III describes the proposed object detection algorithm and the heatmap generation method using the trained DBN network. In section IV, the effectiveness of our proposed method is verified on CXR8. In section V, we give some discussion to conclude this paper.
II Adaptive Learning Method of Deep Belief Network
This section explains the traditional RBM [12] and DBN [11] to describe the basic behavior of our proposed adaptive learning method of DBN.
II-A Restricted Boltzmann Machine
An RBM [12] is a stochastic unsupervised learning model. As shown in Fig. 1, an RBM has a network structure with two kinds of layers, a visible layer v and a hidden layer h, and three kinds of parameters: the visible bias b, the hidden bias c, and the weight matrix W. There are two important properties of an RBM: each neuron takes a binary value, and there are no connections within the same layer. These properties enable each hidden neuron to learn an independent feature of the given input patterns. Since an RBM is a stochastic model, the optimal parameters for given input can be found by maximum likelihood estimation. Contrastive Divergence (CD) [22], the most popular RBM learning method, uses two tricks to speed up the Gibbs sampling method.

II-B Deep Belief Network
A DBN [11] is a multi-layered graphical model for stochastic unsupervised learning. The most popular form of DBN stacks two or more RBMs for the pre-training phase. The output patterns of the hidden neurons at the l-th RBM are used as the input of the (l+1)-th RBM. Subsequently, the trained DBN is used as a feed-forward network for fine-tuning. In supervised learning for a classification task, the final output layer calculates the output probability for each category by the softmax method.

II-C Neuron Generation and Annihilation Algorithm of RBM
Generally, deciding the optimal structure for the input data depends on the skill of the network designer, since many trials are required to find the set of parameters that reaches higher classification accuracy. To address this problem, the adaptive structural learning method of RBM (Adaptive RBM) has been proposed. The method has an outstanding function for determining the structure of an RBM [11, 12]: a self-organizing algorithm that generates and deletes hidden neurons during the learning phase. The method, shown in Fig. 2, solves the problem that the traditional RBM keeps a fixed network structure during training even if the network does not have enough hidden neurons to classify the input data. The Adaptive RBM can determine a suitable number of hidden neurons by observing the variance of the weights and their coefficients during training.
The algorithm monitors the training situation of the network through the fluctuation of the weight vector, called the Walking Distance (WD) [24]. WD is defined as the difference between the past variance and the current variance of learning parameters such as the weights during training. The basic idea of [24] is as follows: if the network does not have enough neurons to classify the input sufficiently, WD tends to fluctuate largely even after a long training process. This situation indicates that some hidden neurons cannot represent an ambiguous pattern due to the lack of hidden neurons. In order to split such an ambiguous pattern between two neurons, a new neuron is inserted that inherits the attributes of its parent hidden neuron, as shown in Fig. 2(a). The Adaptive RBM triggers neuron generation using the inner product of the variances of two kinds of monitored parameters, the hidden bias c and the weights W, excluding the visible bias b. The reason for excluding b is that this parameter oscillates according to the input patterns, because the input signals include some noise. The detailed algorithms, equations, and experimental results are given in [10].
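To make the monitoring concrete, the sketch below computes a WD-like quantity from toy gradient traces and evaluates a hypothetical generation condition. The variable names (grad_c, grad_W), the window size, and the threshold theta_G are assumptions for illustration, not the exact formulation of [10]:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical gradient traces for one hidden neuron's parameters c_j and W_j
# (stand-ins; real values would come from CD training of the RBM).
grad_c = rng.normal(0.0, 0.5, size=200)          # hidden-bias gradients over time
grad_W = rng.normal(0.0, 0.5, size=(200, 10))    # weight gradients over time

def walking_distance(trace, window=50):
    """WD: difference between the past variance and the current variance."""
    past = np.var(trace[-2 * window:-window], axis=0)
    curr = np.var(trace[-window:], axis=0)
    return np.abs(curr - past)

wd_c = walking_distance(grad_c)   # scalar for c_j
wd_W = walking_distance(grad_W)   # one value per input connection of W_j

# Generation condition (sketch): the combined fluctuation of c and W stays
# above a threshold theta_G, i.e. the neuron keeps oscillating after training.
theta_G = 1e-3
generate = float(wd_c * wd_W.mean()) > theta_G
```

When the condition holds, a child neuron would be inserted that inherits the parent's weights, splitting the ambiguous pattern between the two.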
On the other hand, after the neuron generation process, the network may have some unnecessary or redundant neurons. Since such neurons do not contribute to inference on the input data set, they only increase the computational cost. The neuron annihilation algorithm removes such neurons based on their output activation. Fig. 2(b) shows how the corresponding neuron is annihilated.
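The annihilation test can be sketched as follows; the activation matrix and the threshold theta_A are illustrative assumptions, since the exact criterion is defined in [10]:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical activation matrix: rows = input samples, cols = hidden neurons.
activations = rng.random((1000, 8))
activations[:, 3] *= 0.01   # neuron 3 barely fires, i.e. it is redundant

def neurons_to_annihilate(act, theta_A=0.05):
    """Mark hidden neurons whose mean output activation is below theta_A."""
    return np.where(act.mean(axis=0) < theta_A)[0]

dead = neurons_to_annihilate(activations)   # indices of neurons to remove
```

Removing the columns in `dead` from the weight matrix yields the pruned RBM without changing its behavior on the data that the remaining neurons represent.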
II-D Layer Generation Algorithm of DBN
Based on the idea of the Adaptive RBM, the adaptive structural learning method of DBN (Adaptive DBN) was proposed. The number of RBM layers is automatically determined by the layer generation method [13, 14, 15], where each layer is trained by the Adaptive RBM method of section II-C. A DBN has the data representation power to extract features from abstract concepts to concrete representations layer by layer toward the output layer. The Adaptive DBN can automatically adjust to an optimal network structure by self-organization.
The WD and the energy function at each RBM layer are observed during the learning process of the Adaptive DBN. If neither the WD nor the energy function becomes small, a new RBM is generated to keep a network structure suitable for the data set, since the current RBM lacks the data representation capability to capture the input patterns. Therefore, the condition for layer generation is defined using the total WD and the energy function. Fig. 3 shows an overview of layer generation in the Adaptive DBN; see [10] for details.
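The layer-generation condition described above can be sketched as a simple predicate; the threshold names theta_L1 and theta_L2 are hypothetical, as the paper defers the exact values to [10]:

```python
def should_generate_layer(total_wd, energy, theta_L1, theta_L2):
    """Sketch of the layer-generation test in the Adaptive DBN: a new RBM is
    stacked only when both the accumulated WD and the energy function remain
    large after training, i.e. the top layer lacks representation power."""
    return total_wd > theta_L1 and energy > theta_L2

# A poorly converged top RBM (large WD, high energy) triggers stacking;
# a well-converged one (small WD, low energy) stops the growth of the DBN.
grow = should_generate_layer(total_wd=0.8, energy=0.6, theta_L1=0.1, theta_L2=0.1)
stop = should_generate_layer(total_wd=0.01, energy=0.02, theta_L1=0.1, theta_L2=0.1)
```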
III Object Detection Algorithm
Image recognition methods are mainly classified into three tasks: classification, object detection, and segmentation. Images are usually complex and contain multiple objects. Assigning a single label with an image classification model can therefore be uncertain, because classification models cannot detect and localize semantic objects. Object detection models are more appropriate for identifying multiple relevant objects in a single image. In this paper, we focus on a new object detection method using the Adaptive DBN.
III-A Object Detection by CNN
There are many object detection algorithms for finding semantic objects in images. OpenCV, one of the most famous computer vision libraries, provides functions for extracting contours from images and machine learning algorithms for object detection such as facial detection [25]. In deep learning, R-CNN [18], YOLO [19], and SSD [20] are famous object detection algorithms for finding BBoxes in images. R-CNN provides a simple detection algorithm using the training result of a CNN. The method first extracts a small candidate region from an image and then calculates the output values of the trained network for the extracted region. The method decides that the region includes a semantic object (BBox) if the output value is higher than a predetermined threshold. YOLO [19] and SSD [20] extend R-CNN with respect to finding the optimal size of the regions. The basic idea of these methods is to split an image into fixed-size grids and then determine the optimal candidate region by repeatedly adjusting the grid size.
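The R-CNN-style scoring loop described above can be sketched as follows. The scoring function is a stand-in for the trained classifier, and the region size, stride, and threshold are illustrative assumptions:

```python
import numpy as np

# Toy grayscale image with one bright "object" region.
image = np.zeros((64, 64))
image[16:32, 16:32] = 1.0

def score_region(patch):
    """Stand-in for the trained network's output probability; a real
    pipeline would feed the patch to the trained model instead."""
    return float(patch.mean())

def detect_bboxes(img, size=16, stride=16, threshold=0.55):
    """R-CNN-style sketch: extract candidate regions, score each with the
    trained model, keep those above a predetermined threshold as BBoxes."""
    boxes = []
    for y in range(0, img.shape[0] - size + 1, stride):
        for x in range(0, img.shape[1] - size + 1, stride):
            p = score_region(img[y:y + size, x:x + size])
            if p > threshold:
                boxes.append((x, y, size, size, p))   # (x, y, w, h, score)
    return boxes

bboxes = detect_bboxes(image)
```

YOLO and SSD replace the fixed grid sweep above with learned, multi-scale region proposals, which is what makes them faster and more accurate.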
While these methods detect BBoxes in images, Wang et al. proposed a generation method that estimates the probability of semantic objects in a given image as a heatmap [21]. The generated heatmap is also usable for image segmentation, since it represents the likelihood map of pathologies. The method exploits the fact that the convolution and pooling layers in a CNN form a two-dimensional array. A heatmap is generated as the product of the activation values of the last pooling layer and the weights of the following prediction (fully connected) layer for the given input. Although max pooling is often used in the pooling layers of a CNN, the activation values of the last pooling layer are calculated by LogSumExp (LSE) pooling for heatmap representation.
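LSE pooling as used in [21] can be written as a small numerically stable function; the sharpness parameter name `r` follows common convention and is an assumption here:

```python
import numpy as np

def log_sum_exp_pool(feature_map, r=10.0):
    """LSE pooling over a 2-D activation map: (1/r) * log(mean(exp(r * x))).
    r controls sharpness: large r approaches max pooling, small r
    approaches mean pooling. The max is subtracted for numerical stability."""
    x = np.asarray(feature_map, dtype=float)
    m = x.max()
    return m + np.log(np.mean(np.exp(r * (x - m)))) / r
```

On a map with a single strong activation, LSE returns a value between the mean and the max, which is why it yields smoother heatmaps than max pooling while still emphasizing peaks.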
III-B Object Detection by DBN
In our previous research, the Adaptive DBN described in section II was applied to classification tasks. Since the method showed high classification accuracy for several image benchmark data sets, we developed a method that depicts the signal flow in the trained DBN for a given input by tracing the network paths from the input neurons to the output neurons [26]. That method is an explicit knowledge acquisition method and can reach better classification accuracy by partial modification of paths and weights. In contrast, this paper proposes a novel method for the object detection task that uses the signal flow and weights of the trained DBN network.
We considered applying the basic ideas of CNN-based object detection methods such as R-CNN, YOLO, and SSD to the DBN. However, these CNN methods cannot be applied directly to the DBN architecture because they rely on the image features of convolutional filters. Instead, our proposed method divides the image into regions by a Voronoi diagram and uses the output probability of each category for the regions. Algorithm 1 shows the detailed procedure of our proposed detection method.
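Since Algorithm 1 itself is not reproduced here, the following is only a guess at its structure: partition the image into Voronoi cells around seed points and keep cells whose mean category probability exceeds a threshold. The probability map, seed placement, and threshold are all illustrative assumptions:

```python
import numpy as np

def voronoi_labels(h, w, seeds):
    """Assign every pixel to its nearest seed point (a Voronoi partition)."""
    ys, xs = np.mgrid[0:h, 0:w]
    d = [(ys - sy) ** 2 + (xs - sx) ** 2 for sy, sx in seeds]
    return np.argmin(np.stack(d), axis=0)

def detect_regions(prob_map, seeds, threshold=0.5):
    """Sketch of the proposed idea: split the image into Voronoi regions and
    keep regions whose mean category probability exceeds a threshold.
    prob_map stands in for the trained DBN's per-pixel output probability."""
    labels = voronoi_labels(*prob_map.shape, seeds)
    hits = []
    for k in range(len(seeds)):
        if prob_map[labels == k].mean() > threshold:
            hits.append(k)
    return hits

prob_map = np.zeros((32, 32))
prob_map[:16, :16] = 0.9                        # one "diseased" area, top-left
seeds = [(8, 8), (8, 24), (24, 8), (24, 24)]    # four Voronoi seed points
regions = detect_regions(prob_map, seeds)       # indices of detected regions
```

A BBox would then be fitted around the pixels of each detected region.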
Moreover, the heatmap generation method is implemented on the Adaptive DBN in addition to BBox detection. The CNN of [21] can calculate a heatmap as the product of the activation values of the pooling layer and the weights of the prediction layer for a given input, because the pooling layer forms a two-dimensional array. However, the hidden neurons in a DBN form a one-dimensional array: the product of the activation values of the last hidden layer and the weights of the following prediction (softmax) layer is a one-dimensional feature vector, so there is no mapping information between the vector and the input image. We solve this problem by a backward calculation from the vector in the output layer to the input layer. Fig. 4 shows an overview of the heatmap calculation procedure. In this paper, the heatmap value lies in the range [0, 1] and is rendered with the jet color map [27], as in [21].
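The backward calculation of Fig. 4 can be sketched for a toy two-layer network; the layer sizes, the use of absolute values, and the min-max rescaling are assumptions for illustration, not the exact procedure of the paper:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical trained weights: input (8x8 = 64 pixels) -> hidden (32) -> 9 classes.
W1 = rng.normal(size=(64, 32))
W2 = rng.normal(size=(32, 9))

def heatmap_for_class(h_activations, class_idx):
    """Sketch of the backward calculation: the class-specific relevance of
    each hidden neuron (activation * softmax weight) is propagated back
    through the RBM weights onto the input pixels, then reshaped to the
    image plane and rescaled to [0, 1]."""
    relevance = h_activations * W2[:, class_idx]   # per hidden neuron
    pixel_scores = np.abs(W1 @ relevance)          # back to input space
    pixel_scores -= pixel_scores.min()
    if pixel_scores.max() > 0:
        pixel_scores /= pixel_scores.max()
    return pixel_scores.reshape(8, 8)

h = (rng.random(32) > 0.5).astype(float)   # binary hidden states of the DBN
hm = heatmap_for_class(h, class_idx=2)     # 8x8 heatmap with values in [0, 1]
```

Because the hidden states are binary, the resulting heatmap is a discrete combination of weight columns, which is the discreteness property noted in the introduction.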
IV Experimental Results
In this section, the effectiveness of our proposed method is verified on the CXR8 image benchmark data set. The classification and detection performance are compared with those of several CNN methods.
IV-A ChestX-ray8
CXR8 [21] is one of the most commonly accessible radiological examinations for many lung diseases. The data set consists of 112,120 images collected from 30,805 patients. As shown in Table I, nine class labels are defined for classification: the normal state ('No Finding') and eight diseases including cancer. The data is divided into a training set and a test set, and the classification accuracy and ROC curves of several well-known deep networks such as VGG16, GoogLeNet, and ResNet are reported in the original paper [21]. In addition, 984 Bounding Boxes (BBoxes) are provided for localization. Fig. 5 shows example images from CXR8; the red rectangle in each image shows the given BBox.
Category     | Classification | Detection (BBox)
------------ | -------------- | ----------------
No Finding   | 60,361         | -
Mass         | 5,782          | 85
Nodule       | 6,331          | 79
Atelectasis  | 11,559         | 180
Cardiomegaly | 2,776          | 146
Effusion     | 13,317         | 153
Infiltration | 19,894         | 123
Pneumonia    | 1,431          | 120
Pneumothorax | 5,302          | 98
Total        | 112,120        | 984
IV-B Classification Results
In this subsection, the effectiveness of the proposed Adaptive DBN for the eight diseases of CXR8 is verified. The parameters for the Adaptive DBN were as follows: the training algorithm was Stochastic Gradient Descent (SGD), the batch size was 100, the learning rate was 0.005, and the initial number of hidden neurons was 400. We used a computer with the following specifications for training: CPU: Intel(R) 24-core Xeon E5-2670 v3 2.3GHz; GPU: 3 x Tesla K80 (4,992 cores, 24GB); memory: 64GB; OS: CentOS 6.7 64 bit. The classification accuracy of the Adaptive DBN on the test data was compared with several CNN methods reported in [21]. Table II shows the classification accuracy on the test data for each disease in CXR8. 'GoogLeNet', 'VGG16', and 'ResNet-50' are the results of transfer learning with well-known CNN methods [21]. The classification accuracy of ResNet-50 reached 81.4% at best, higher than the other CNN methods. On the other hand, the classification accuracy of the Adaptive DBN was more than 90% for all diseases and achieved the highest values among the compared methods.
Fig. 6 shows the ROC curve of the Adaptive DBN. The ROC curve plots the true positive rate (sensitivity) on the vertical axis against the false positive rate (1 - specificity) on the horizontal axis. The Adaptive DBN showed better performance, since the area under its curve was larger than those of the other CNN methods (see [21] for details).
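For reference, an ROC curve and its area can be computed from scores and labels as below; the toy scores are illustrative, not data from the experiments:

```python
def roc_points(scores, labels):
    """Compute (FPR, TPR) pairs by sweeping a threshold over the scores,
    i.e. 1 - specificity on the x-axis and sensitivity on the y-axis."""
    pairs = sorted(zip(scores, labels), reverse=True)
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    pts = [(0.0, 0.0)]
    for _, y in pairs:
        if y:
            tp += 1
        else:
            fp += 1
        pts.append((fp / neg, tp / pos))
    return pts

def auc(pts):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

# A classifier that ranks all positives above all negatives has AUC = 1.
perfect = roc_points([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
```

A larger area under the curve, as reported for the Adaptive DBN, means the model separates positive and negative cases at more operating points.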
Category     | GoogLeNet | VGG16 | ResNet-50 | Adaptive DBN
------------ | --------- | ----- | --------- | ------------
No Finding   | -         | -     | -         | 90.0%
Mass         | 54.6%     | 51.0% | 56.0%     | 96.3%
Nodule       | 55.7%     | 65.5% | 71.6%     | 97.2%
Atelectasis  | 63.0%     | 62.8% | 70.6%     | 94.5%
Cardiomegaly | 70.5%     | 70.8% | 81.4%     | 98.1%
Effusion     | 68.7%     | 65.0% | 73.6%     | 97.2%
Infiltration | 60.8%     | 58.9% | 61.2%     | 96.0%
Pneumonia    | 59.9%     | 51.0% | 63.3%     | 99.9%
Pneumothorax | 78.2%     | 75.1% | 78.9%     | 98.1%
IV-C Detection Results
Using the training result of the Adaptive DBN in section IV-B, the proposed detection algorithm (Algorithm 1) was applied to the 984 images with given BBoxes. Two threshold settings were used for the detection algorithm in the simulations.
Table III shows the detection accuracy. 'ResNet-50' is the transfer learning result described in [21], as in Table II. The standard Intersection over Union (IoU) ratio was used to examine the accuracy: a detected BBox is decided to be correct if the ratio of the intersection between the given BBox and the detected BBox is more than a predetermined threshold. We set this threshold to 75%, as in the original paper. The detection accuracy of our proposed method was higher than that of the CNN method for all diseases. In particular, our method achieved a total detection ratio of more than 90% over all diseases under the second parameter setting.
Category     | ResNet-50 [21] | Adaptive DBN (setting 1) | Adaptive DBN (setting 2)
------------ | -------------- | ------------------------ | ------------------------
Atelectasis  | 47.2%          | 78.9%                    | 86.1%
Cardiomegaly | 68.4%          | 99.3%                    | 100.0%
Effusion     | 45.0%          | 85.6%                    | 92.8%
Infiltration | 47.9%          | 91.9%                    | 93.5%
Pneumonia    | 35.0%          | 84.2%                    | 92.5%
Pneumothorax | 23.4%          | 80.6%                    | 83.7%
Mass         | 25.8%          | 88.2%                    | 91.8%
Nodule       | 5.0%           | 72.2%                    | 77.2%
Total        | -              | 85.7%                    | 90.4%
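The IoU criterion used to score the detections above can be computed as follows; boxes are assumed to be given as (x, y, width, height) tuples:

```python
def iou(box_a, box_b):
    """Intersection over Union for two axis-aligned boxes (x, y, w, h).
    A detection counts as correct when IoU exceeds the chosen threshold
    (0.75 in the experiments reported here)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))   # intersection width
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))   # intersection height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0
```

For example, two identical boxes give IoU 1.0, disjoint boxes give 0.0, and a box shifted by half its width against its twin gives 1/3, which would fail the 75% criterion.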
IV-D Investigation of the Generated Heatmap
Using the training result of the Adaptive DBN, heatmap images were generated in addition to the detected BBoxes. Figs. 10 to 14 show the BBox detection results and the generated heatmaps for some images. The red and blue rectangles in each image are the given BBox and the detected BBox, respectively. A heatmap is represented by continuous values in the range [0, 1], rendered with the jet color map (red means a high value, blue a small value). The diseases for the detected BBoxes in Figs. 10 to 14 include Infiltration, Mass, Nodule, Mass with Pneumothorax, and Atelectasis.
Overall, the red areas of the generated heatmaps were included in both the given BBox and the detected BBox, while the blue and yellow areas were not. This tendency was seen not only for large diseases (e.g., Cardiomegaly or Infiltration) but also for small diseases (e.g., Mass or Nodule). We consider that these results are caused by the discrete heatmap produced by the binary output of the final RBM layer, instead of a continuous heatmap. As a result, the red regions represent localizations strongly related to the diseases, while the blue regions represent weakly related ones; the generated heatmap shows the strongly related portions more clearly.
In Fig. 14, the detected BBox was located slightly higher than the given BBox, and the red area of the heatmap was also at the upper position. The detected BBox was slightly larger than the given BBox; apart from the difference in size, the detected BBoxes were almost the same as the given BBoxes. For better detection capability, the features of the generated heatmaps will be investigated together with medical specialists.
V Conclusion
Deep learning is widely used in various research fields, especially image recognition. In our research, we developed the Adaptive DBN, which can find the optimal network structure for given data. The method shows higher classification accuracy than existing deep learning methods on several benchmark data sets. In this paper, the Adaptive DBN was applied not only to the classification task but also to the object detection task of finding BBoxes. A new detection algorithm using the trained DBN network was proposed, and the probability of a semantic object was visualized as a heatmap. In the simulations, the proposed method was evaluated on CXR8 and showed higher classification and detection accuracy than several existing CNN methods. We will further improve the proposed method's detection capability by evaluating it on other large data sets.
Acknowledgment
This work was supported by JSPS KAKENHI Grant Number JP17J11178.
References
 [1] Y.Bengio: Learning Deep Architectures for AI, Foundations and Trends in Machine Learning, vol.2, no.1, pp.1–127 (2009)
 [2] Q.V.Le, M.Ranzato, et al.: Building high-level features using large scale unsupervised learning, Proc. of 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.8595–8598 (2013)
 [3] Markets and Markets, http://www.marketsandmarkets.com/MarketReports/deeplearningmarket107369271.html (accessed 28 November 2018) (2016)
 [4] M.Mohammadi, A.AlFuqaha, S.Sorour, and M.Guizani, Deep Learning for IoT Big Data and Streaming Analytics: A Survey, in IEEE Communications Surveys & Tutorials (2018)
 [5] A.Krizhevsky, I.Sutskever, G.E.Hinton, ImageNet Classification with Deep Convolutional Neural Networks, Proc. of Advances in Neural Information Processing Systems 25 (NIPS 2012) (2012)
 [6] C.Szegedy, W. Liu, Y.Jia, P.Sermanet, S.Reed, D.Anguelov, D.Erhan, V.Vanhoucke, A.Rabinovich, Going Deeper with Convolutions, Proc. of CVPR2015 (2015)
 [7] K.Simonyan, A.Zisserman, Very deep convolutional networks for large-scale image recognition, Proc. of International Conference on Learning Representations (ICLR 2015) (2015)

 [8] K.He, X.Zhang, S.Ren, J.Sun, Deep residual learning for image recognition, Proc. of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.770–778 (2016)
 [9] O.Russakovsky, J.Deng, H.Su, J.Krause, S.Satheesh, S.Ma, Z.Huang, A.Karpathy, A.Khosla, M.Bernstein, et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision, vol.115, no.3, pp.211–252 (2015)
 [10] S.Kamada, T.Ichimura, A.Hara, and K.J.Mackin, Adaptive Structure Learning Method of Deep Belief Network using Neuron Generation-Annihilation and Layer Generation, Neural Computing and Applications, pp.1–15 (2018)
 [11] G.E.Hinton, S.Osindero and Y.Teh, A fast learning algorithm for deep belief nets, Neural Computation, vol.18, no.7, pp.1527–1554 (2006)
 [12] G.E.Hinton, A Practical Guide to Training Restricted Boltzmann Machines, Neural Networks, Tricks of the Trade, Lecture Notes in Computer Science (LNCS, vol.7700), pp.599–619 (2012)
 [13] S.Kamada and T.Ichimura, An Adaptive Learning Method of Restricted Boltzmann Machine by Neuron Generation and Annihilation Algorithm. Proc. of 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC2016), pp.1273–1278 (2016)
 [14] S.Kamada, T.Ichimura, A Structural Learning Method of Restricted Boltzmann Machine by Neuron Generation and Annihilation Algorithm, Neural Information Processing, Proc. of the 23rd International Conference on Neural Information Processing (Springer LNCS, vol.9950), pp.372–380 (2016)
 [15] S.Kamada and T.Ichimura, An Adaptive Learning Method of Deep Belief Network by Layer Generation Algorithm, Proc. of IEEE TENCON2016, pp.2971–2974 (2016)
 [16] Y.LeCun, L.Bottou, Y.Bengio, and P.Haffner, Gradient-based learning applied to document recognition, Proc. of the IEEE, vol.86, no.11, pp.2278–2324 (1998)
 [17] A.Krizhevsky: Learning Multiple Layers of Features from Tiny Images, Master of thesis, University of Toronto (2009)
 [18] R.Girshick, et al., Rich feature hierarchies for accurate object detection and semantic segmentation, Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.580–587 (2014)
 [19] J.Redmon, S.Divvala, R.Girshick, and A.Farhadi, You Only Look Once: Unified, Real-Time Object Detection, Proc. of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.779–788 (2016)
 [20] W.Liu, et al., SSD: Single Shot MultiBox Detector, arXiv:1512.02325 [cs.CV] (2015)
 [21] X.Wang, Y.Peng, L.Lu, Z.Lu, M.Bagheri, R.M.Summers, ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases, Proc. of IEEE Computer Vision and Pattern Recognition, pp.3462–3471 (2017)
 [22] G.E.Hinton, Training products of experts by minimizing contrastive divergence, Neural Computation, vol.14, no.8, pp.1771–1800 (2002)

 [23] D.Carlson, V.Cevher and L.Carin, Stochastic Spectral Descent for Restricted Boltzmann Machines, Proc. of the Eighteenth International Conference on Artificial Intelligence and Statistics, pp.111–119 (2015)
 [24] T.Ichimura and K.Yoshida Eds., Knowledge-Based Intelligent Systems for Health Care, Advanced Knowledge International (ISBN 0975100440) (2004)
 [25] OpenCV, https://docs.opencv.org/3.1.0/d4/d73/tutorial_py_contours_begin.html (accessed 28 November 2018)

 [26] S.Kamada, T.Ichimura, and T.Harada, Adaptive Structural Learning of Deep Belief Network for Medical Examination Data and Its Knowledge Extraction by using C4.5, Proc. of 2018 IEEE First International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), pp.33–40 (2018)
 [27] Matplotlib, https://matplotlib.org/examples/color/colormaps_reference.html (accessed 28 November 2018)