Generating realistic labeled bio-images is an important task. Recently, there have been many attempts to apply deep learning to computer-aided detection or diagnosis in various bioimaging fields [2, 8, 11, 12, 13]. Training high-performance deep network models requires large amounts of labeled data. However, in many bioimaging fields, large labeled datasets are scarcely available. It is therefore becoming increasingly important to generate realistic labeled data from small amounts of data in bioimaging fields that suffer from a lack of labeled data.
Although a few studies have addressed this problem with generative models, they have the following limitations: 1) the generated bio-images do not look realistic [3, 9]; 2) the variation of the generated bio-images is limited [1, 4, 7, 10]; and 3) an additional, costly label annotation step is required [6, 7].
To overcome the aforementioned limitations, we propose a novel method for generating realistic labeled bio-images through visual feature processing in latent space. The proposed method trains a generative model with adversarial learning to form a manifold in the latent feature space using a few existing annotated images. After training, the encoder of the generative model can map a mass image onto this manifold; we call the resulting representation a visual feature. The processed visual features, i.e., the outputs of the proposed visual feature processing, can be decoded into pixel space through the decoder of the generative model to generate realistic mass images with a wide expression range and targeted characteristics.
The main contributions of this paper are summarized as follows: 1) We propose a novel method for generating realistic labeled masses that is not confined to the expression range of the limited real-world data.
2) Through the proposed method, we succeed in forming an appropriate manifold for the characteristics of masses in latent space, which is difficult due to the non-rigid nature of masses.
3) Comprehensive experiments have been conducted to validate the effectiveness of the proposed method. Experimental results show that masses generated by the proposed method are remarkably realistic. Moreover, the generated masses have a wide expression range of targeted mass characteristics.
2 Generating Realistic Labeled Masses by Visual Feature Processing in Latent Space
2.1 Overview of the Proposed Mass Generation
In this paper, we design the proposed method to generate breast masses according to a medical description. BIRADS (Breast Imaging Reporting and Data System) [5], which is designed to characterize masses on breast imaging, is used as the medical description in this paper.
The overall architecture of the proposed method is shown in Fig. 1. As seen in Fig. 1, the proposed architecture is built upon generative adversarial networks and includes four main modules: a mass generator (autoencoder) module, two discriminator modules (one for the margin label and one for the shape label), and two BIRADS description embedding modules. The two-discriminator scheme is adopted to effectively extract and backpropagate the BIRADS description characteristics of each generated mass.
The goal of the learning phase is to let the generator learn the manifold of the breast masses, which is then used in the generating phase. The learning and generating procedures of the proposed architecture are as follows.
In the learning phase, as seen in Fig. 1, a breast mass image and two of the major pieces of information in BIRADS, the margin and shape labels, are input to the network. The margin and shape labels are embedded through the BIRADS description embedding modules before being input to the generator $G$. Then $G$ generates the fake mass $\hat{x}$ from the input mass $x$ and the embedded labels as
$$\hat{x} = G(x, e_{m}, e_{s}),$$
where $e_{m}$ and $e_{s}$ denote the embedded margin and shape labels of the BIRADS description.
In the generating phase, as seen in Fig. 2, we use the generator and the BIRADS description embedding modules that were trained to approximate the breast mass manifold. The generator includes an encoder and a decoder, which consist of convolution layers and transposed convolution layers, respectively. Before the generator, a seed breast mass and the corresponding margin and shape labels are embedded through the BIRADS embedding modules as in the learning phase. When the breast mass with embedded labels is fed into the generator, the encoder maps it onto the latent space; the resulting point is fixed by the seed mass and the corresponding BIRADS description. By performing feature processing in the latent space, realistic breast masses can be generated by decoding the processed visual features into pixel space.
2.2 Visual Feature Processing
As described above, a visual feature is obtained by mapping the breast mass and the embedded labels (medical description) into the latent space through the encoder. If a visual feature in the latent space is fed to the decoder, a breast mass representing that visual feature is generated. The generated breast mass contains the characteristics of the seed breast mass and the embedded medical description. To increase the non-linear diversity in mass generation, we devise visual feature processing to consider the multitudinous possible visual features on the manifold where the visual features of breast masses exist. Note that the embedded labels (medical description) and the seed mass fix a visual feature within the latent space.
2.2.1 Visual Feature Processing by Interpolation.
Various masses are generated from the multitudinous possible visual features on the manifold. A new visual feature can be obtained by interpolation among adjacent fixed visual features, i.e., visual features fixed by embedded labels (medical description) and seed masses. Interpolating among $N$ different visual features allows considering all possible visual features within the range spanned by those $N$ visual features:
$$z_{new} = \sum_{i=1}^{N} w_{i} z_{i}, \quad \sum_{i=1}^{N} w_{i} = 1,$$
where $w_{i}$ denotes the weight multiplied by the $i$-th visual feature $z_{i}$ in the interpolation. If these visual features are linearly independent, that is, if the only way to make a linear combination of the visual features zero is for all $w_{i}$ to be zero, then these visual features form an $(N-1)$-dimensional hyperplane. In other words, when $N$ is less than the dimension of the visual feature itself, if the visual features are linearly independent of each other, then all possible visual features in the $(N-1)$-dimensional hyperplane can be considered.
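The interpolation step above can be sketched as a convex combination of $N$ fixed visual features. The following is a minimal illustration with numpy; the feature dimensionality (1024) follows the paper, while all values and names are illustrative stand-ins, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, dim = 4, 1024
features = rng.standard_normal((N, dim))   # N fixed visual features z_1..z_N

# Convex-combination weights w_i >= 0 with sum 1, so the interpolated
# feature stays within the region spanned by z_1..z_N.
w = rng.random(N)
w /= w.sum()

z_new = w @ features                       # interpolated visual feature
```

With non-negative weights summing to 1, `z_new` always lies inside the simplex spanned by the fixed features, which matches the idea of staying on the learned manifold region between them.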
2.3 Mass Generation through Visual Feature Processing
In this section, we describe in detail how a breast mass is generated through visual feature processing. In the learning procedure, the generator learns the manifold of the breast masses, which is then used in the generating phase. In the generating phase, the generator and the BIRADS description embedding modules are used to generate realistic breast masses.
The BIRADS description embedding modules behave identically in the learning and generating phases: they map the one-hot form of a BIRADS description label to the size of the breast mass image. The breast mass image and the embedded labels are then concatenated and input into the encoder.
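The embedding-and-concatenation step can be sketched as follows. This is an illustrative numpy sketch, not the trained modules: the layer widths (256 and 4096 = 64 x 64 neurons) follow Sec. 3.2, the number of label classes is an assumption, and the weight matrices are random stand-ins for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes = 5                                  # assumed label-space size
one_hot = np.eye(num_classes)[2]                 # e.g. a margin label, index 2

# Two fully-connected layers (256 and 4096 neurons, per Sec. 3.2)
W1 = rng.standard_normal((num_classes, 256))
W2 = rng.standard_normal((256, 4096))
embedded = np.tanh(one_hot @ W1) @ W2            # 4096 = 64 * 64
label_map = embedded.reshape(64, 64)             # label embedded to image size

image = rng.random((64, 64))                     # seed mass image
encoder_input = np.stack([image, label_map])     # concatenated as channels
```

In the actual architecture both the margin and shape embeddings would be concatenated with the image before entering the encoder.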
The details of each generation step in the generating phase are as follows: 1) The encoder receives the concatenated input and maps it into the 1024-dimensional latent feature space. 2) The aforementioned processing is applied to the visual feature to generate a visual feature of a breast mass with an appearance and characteristics not present in the seed breast mass image. There is room for extension here, since visual feature processing can include any operation applicable to visual features besides interpolation. 3) The processed visual feature is input to the decoder, which maps it back into pixel space.
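The three steps can be sketched end to end. The linear maps below are random stand-ins for the learned convolutional encoder and decoder (illustrative only); the 64 x 64 image size, the two extra label channels, and the 1024-dimensional latent space follow the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
enc_W = rng.standard_normal((3 * 64 * 64, 1024)) * 0.01   # stand-in encoder
dec_W = rng.standard_normal((1024, 64 * 64)) * 0.01       # stand-in decoder

def encode(image, margin_map, shape_map):
    # Step 1: concatenated input -> 1024-d visual feature
    x = np.concatenate([image.ravel(), margin_map.ravel(), shape_map.ravel()])
    return x @ enc_W

def decode(z):
    # Step 3: processed visual feature -> pixel space
    return (z @ dec_W).reshape(64, 64)

z_a = encode(rng.random((64, 64)), rng.random((64, 64)), rng.random((64, 64)))
z_b = encode(rng.random((64, 64)), rng.random((64, 64)), rng.random((64, 64)))
z_mid = 0.5 * z_a + 0.5 * z_b          # step 2: feature processing (interpolation)
generated = decode(z_mid)
```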
2.4 Learning Strategy of the Proposed Deep Network Framework
In this study, the proposed deep network framework utilizes two BIRADS description labels (i.e., margin and shape), and it has two discriminators ($D_{m}$ and $D_{s}$) to predict them. The loss function of $D_{m}$ is defined as
$$\mathcal{L}_{D_m} = -\mathbb{E}\left[\log D_{m}^{adv}(x)\right] - \mathbb{E}\left[\log\left(1 - D_{m}^{adv}(\hat{x})\right)\right] + \lambda\left(\ell_{cls}\left(D_{m}^{cls}(x), y_{m}\right) + \ell_{cls}\left(D_{m}^{cls}(\hat{x}), y_{m}\right)\right),$$
where $D_{m}^{adv}(\cdot)$ and $D_{m}^{cls}(\cdot)$ denote the real/fake prediction and the estimated margin label of $D_{m}$, respectively, $\ell_{cls}$ is a cross-entropy loss, and $\lambda$ is the weight multiplied by the label loss terms to balance the overall loss function. The first two terms represent the general GAN loss of adversarial learning that predicts the real/fake of the real breast mass image $x$ and the generated breast mass image $\hat{x}$. The label loss terms in $\mathcal{L}_{D_m}$ decrease when $D_{m}$ predicts the ground-truth margin label $y_{m}$ more precisely from the input $x$ and $\hat{x}$.
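The discriminator loss described above can be illustrated numerically: a standard GAN real/fake term plus a weighted cross-entropy term on the margin label, computed for both a real and a generated mass. All predictions and the number of margin classes are illustrative stand-ins; the weight of 10 is one plausible reading of the balancing weights reported in Sec. 3.2.

```python
import numpy as np

def bce_real_fake(p_real, p_fake):
    # Standard GAN discriminator loss: push p_real -> 1, p_fake -> 0
    return -(np.log(p_real) + np.log(1.0 - p_fake))

def cross_entropy(probs, label):
    # Negative log-likelihood of the ground-truth class
    return -np.log(probs[label])

lam = 10.0                                  # assumed label-loss weight
p_real, p_fake = 0.9, 0.2                   # real/fake outputs on x and x_hat
probs_real = np.array([0.7, 0.2, 0.1])      # margin prediction on real mass
probs_fake = np.array([0.5, 0.3, 0.2])      # margin prediction on fake mass
y_margin = 0                                # ground-truth margin label

loss_dm = (bce_real_fake(p_real, p_fake)
           + lam * (cross_entropy(probs_real, y_margin)
                    + cross_entropy(probs_fake, y_margin)))
```

Sharper label predictions on both the real and the generated mass drive the weighted terms toward zero, which is exactly the behavior described in the text.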
The loss function of $D_{s}$ is defined as
$$\mathcal{L}_{D_s} = -\mathbb{E}\left[\log D_{s}^{adv}(x)\right] - \mathbb{E}\left[\log\left(1 - D_{s}^{adv}(\hat{x})\right)\right] + \lambda\left(\ell_{cls}\left(D_{s}^{cls}(x), y_{s}\right) + \ell_{cls}\left(D_{s}^{cls}(\hat{x}), y_{s}\right)\right),$$
where $D_{s}^{cls}(\cdot)$ denotes the shape label prediction of $D_{s}$. The first two terms represent the general GAN loss of adversarial learning in which $D_{s}$ predicts the real/fake of the real breast mass image $x$ and the generated breast mass image $\hat{x}$. The label loss terms in $\mathcal{L}_{D_s}$ decrease when $D_{s}$ predicts the ground-truth shape label $y_{s}$ from the input $x$ and $\hat{x}$.
The loss term that predicts the BIRADS description label from the generated mass serves as noise in the early learning phase, when the generator does not yet produce realistic breast masses. However, once the generator can produce breast masses similar to real ones, this term pushes the discriminators to better predict the BIRADS description labels of the generated masses, which differ non-linearly from the real ones (but not enough to have different margin and shape labels). Therefore, it gives $D_{m}$ and $D_{s}$ a data augmentation effect.
Next, the loss function of the generator $G$ is defined as
$$\mathcal{L}_{G} = \mathbb{E}\left[\log\left(1 - D_{m}^{adv}(\hat{x})\right)\right] + \mathbb{E}\left[\log\left(1 - D_{s}^{adv}(\hat{x})\right)\right] + \lambda_{1}\left(\ell_{cls}\left(D_{m}^{cls}(\hat{x}), y_{m}\right) + \ell_{cls}\left(D_{s}^{cls}(\hat{x}), y_{s}\right)\right) + \lambda_{2}\lVert x - \hat{x} \rVert_{1},$$
where $\lambda_{1}$ and $\lambda_{2}$ are the weights multiplied by each loss term to balance the overall loss function. The last term denotes the reconstruction loss using the L1-norm between the real breast mass $x$ and the generated breast mass $\hat{x}$. The first two terms, like those in the loss functions of $D_{m}$ and $D_{s}$, are the general GAN losses of adversarial learning for the real/fake predictions of $D_{m}$ and $D_{s}$ on $\hat{x}$. The label loss terms in $\mathcal{L}_{G}$ decrease when $G$ generates an $\hat{x}$ with stronger characteristics of the BIRADS description labels $y_{m}$ and $y_{s}$. Like the corresponding terms in the discriminator losses, once the discriminators can properly identify the BIRADS description labels, these terms push $G$ to better represent the characteristics of the BIRADS description labels used in generation. The reconstruction term pushes $G$ to form a manifold similar to the real data distribution.
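The generator objective described above can be illustrated numerically: adversarial terms that reward fooling both discriminators, weighted label terms that reward strong BIRADS characteristics, and a weighted L1 reconstruction term. All predictions are illustrative stand-ins; the weights 10 and 300 are one plausible reading of the four balancing weights (10, 10, 10, 300) reported in Sec. 3.2.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((64, 64))                           # real mass
x_hat = x + 0.05 * rng.standard_normal((64, 64))   # generated mass (stand-in)

p_fake_m, p_fake_s = 0.4, 0.3            # D_m, D_s real/fake outputs on x_hat
probs_m = np.array([0.6, 0.3, 0.1])      # D_m margin prediction on x_hat
probs_s = np.array([0.5, 0.4, 0.1])      # D_s shape prediction on x_hat
y_m, y_s = 0, 1                          # target margin / shape labels
lam_cls, lam_rec = 10.0, 300.0           # assumed balancing weights

adv = -(np.log(p_fake_m) + np.log(p_fake_s))          # fool both discriminators
cls = -(np.log(probs_m[y_m]) + np.log(probs_s[y_s]))  # strengthen BIRADS traits
rec = np.mean(np.abs(x - x_hat))                      # L1 reconstruction
loss_g = adv + lam_cls * cls + lam_rec * rec
```

Lowering `loss_g` simultaneously makes the generated mass more convincing to both discriminators, more consistent with the target labels, and closer to the real mass, which is the trade-off the text describes.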
3 Experimental Results
3.1 Dataset

To verify the effectiveness of the proposed method, we utilized the publicly available DDSM dataset. Mammograms scanned by the Howtek 960 were selected from the DDSM dataset for the experiments. A total of 841 regions of interest (ROIs) were used. Each seed image was resized to 64 by 64. For the BIRADS description, as mentioned above, the shape and margin of masses were selected, since these are representative characteristics of breast masses and are widely used in clinical reports.
3.2 Architecture and Training Details
Each BIRADS description embedding module consists of two fully-connected layers with 256 and 4096 neurons. The encoder and decoder of the proposed architecture are composed of seven convolution and seven transposed convolution layers, respectively. Each discriminator module is composed of ten convolution layers and three fully-connected layers. Each module utilizes a LeakyReLU-Conv (or transposed Conv)-BatchNorm structure.
For training the generator and discriminators, Adam optimization was used with a learning rate of 0.0002 and the PyTorch default Adam optimizer settings. The four loss-balancing weights were set to 10, 10, 10, and 300, respectively. We used a batch size of 512 and trained the network for 8000 epochs. For image data augmentation, horizontal flipping, vertical flipping, and cropping were performed randomly.
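The augmentation pipeline described above can be sketched as follows. This is an illustrative numpy sketch; the crop size is an assumption, since the paper does not state it.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, crop=56):
    """Random horizontal/vertical flip and random crop (crop size assumed)."""
    if rng.random() < 0.5:
        image = image[:, ::-1]                      # horizontal flip
    if rng.random() < 0.5:
        image = image[::-1, :]                      # vertical flip
    top = rng.integers(0, image.shape[0] - crop + 1)
    left = rng.integers(0, image.shape[1] - crop + 1)
    return image[top:top + crop, left:left + crop]  # random crop

out = augment(np.arange(64 * 64, dtype=float).reshape(64, 64))
```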
3.3 Visual Feature Processing in Latent Space with Interpolation
3.3.1 Visual Feature Interpolation between Two Visual Features.
The visual feature interpolation results between two visual features are shown in Fig. 3. The two visual features were selected from the 841 visual features of the 841 mass ROIs in the dataset. In the experiments, two visual features were selected, equidistant visual features between them were interpolated, and the interpolated visual features were decoded into pixel space. The leftmost and rightmost images in Fig. 3 are the two seed breast mass images. Among the images in the middle, the leftmost and rightmost represent images decoded from the visual features fixed by the seed images and embedded labels. The eight images between them are generated through the proposed visual feature interpolation.
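The two-feature experiment amounts to sampling eight equidistant points on the line segment between two fixed visual features. A minimal sketch with numpy (dimensions and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
z_a = rng.standard_normal(1024)              # visual feature of seed mass A
z_b = rng.standard_normal(1024)              # visual feature of seed mass B

# Eight interior, equidistant interpolation coefficients in (0, 1)
alphas = np.linspace(0.0, 1.0, 10)[1:-1]
interpolated = [(1 - a) * z_a + a * z_b for a in alphas]
```

Each interpolated feature would then be decoded to pixel space to produce one of the eight intermediate images in Fig. 3.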
As seen in Fig. 3, these results show that the generator module of the proposed deep network framework learned the manifold of the real mass distribution properly and could generate realistic masses by utilizing possible visual features on the manifold.
3.3.2 Visual Feature Interpolation from Three Visual Features.
The visual feature interpolation results from three visual features are shown in Fig. 4. The three visual features were selected from 841 visual features of 841 mass ROIs in the dataset. As seen in Fig. 4, twelve interpolated visual features from the three visual features were visualized. By inputting twelve visual features into the decoder and mapping them to pixel space, we verified that the manifold from these three seed breast masses was formed suitably.
3.4 Visual Feature Processing within a Specific BIRADS Category
This section demonstrates that masses with intended annotation information can be generated through the proposed method. Fig. 5 shows masses generated using visual features obtained from seed masses within a specific BIRADS category (e.g., ill-defined margin and round shape). The total number of seed masses used in the experiment was 161. Among them, 20 masses had ill-defined margins and round shapes, 104 had spiculated margins and irregular shapes, and 37 had circumscribed margins and oval shapes.
As seen in Fig. 5, the masses to the left of the corresponding three seed masses were generated from interpolated visual features. The interpolated visual features were calculated as follows: 1) twenty visual features in a specific BIRADS category were randomly selected from the candidate masses in that category; 2) the corresponding twenty weights were randomly initialized in units of 0.05.
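The weight initialization described above can be sketched as follows. How the paper enforces that the weights sum to 1 is not stated; randomly partitioning twenty units of 0.05 among the twenty features is one plausible reading, sketched here with numpy.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20

# Distribute 20 units of 0.05 at random among N features, so every
# weight is a multiple of 0.05 and the weights sum to 1.
counts = rng.multinomial(N, np.full(N, 1.0 / N))
weights = counts * 0.05
```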
In Fig. 5, the three masses with the largest weights, together with those weights, are shown to the right of each generated mass. As seen in Fig. 5, the generated masses were realistic and had the target characteristics (e.g., ill-defined margins and round shapes). Therefore, masses generated by utilizing visual features within a specific BIRADS description category require no additional labeling cost.
4 Conclusion

In this paper, we proposed a novel bio-image generation method based on visual feature processing. The proposed method trained a generative model with adversarial learning to effectively form a manifold in latent feature space using a limited number of annotated mass images. After training, the encoder of the generative model could map mass images onto the manifold of the latent feature space (we defined these mappings as visual features). By decoding the processed visual features, mass images were generated. Through extensive experiments, we verified that the masses generated by the proposed method were realistic and had a wide expression range. Moreover, it was possible to generate masses with target characteristics, which could alleviate the labeling workload when utilizing generated masses in the real world. We expect that the proposed method can be generalized to other bioimaging fields that suffer from a lack of annotated data.
References
-  Ben-Cohen, A., Klang, E., Raskin, S.P., Amitai, M.M., Greenspan, H.: Virtual pet images from ct data using deep convolutional networks: Initial results. In: International Workshop on Simulation and Synthesis in Medical Imaging. pp. 49–57. Springer (2017)
-  Cheng, J.Z., Ni, D., Chou, Y.H., Qin, J., Tiu, C.M., Chang, Y.C., Huang, C.S., Shen, D., Chen, C.M.: Computer-aided diagnosis with deep learning architecture: applications to breast lesions in us images and pulmonary nodules in ct scans. Scientific reports 6, 24454 (2016)
-  Chuquicusma, M.J., Hussein, S., Burt, J., Bagci, U.: How to fool radiologists with generative adversarial networks? a visual turing test for lung cancer diagnosis. In: Biomedical Imaging (ISBI 2018), 2018 IEEE 15th International Symposium on. pp. 240–244. IEEE (2018)
-  Costa, P., Galdran, A., Meyer, M.I., Abràmoff, M.D., Niemeijer, M., Mendonça, A.M., Campilho, A.: Towards adversarial retinal image synthesis. arXiv preprint arXiv:1701.08974 (2017)
-  D’Orsi, C.J.: ACR BI-RADS atlas: breast imaging reporting and data system. American College of Radiology (2013)
-  Frid-Adar, M., Diamant, I., Klang, E., Amitai, M., Goldberger, J., Greenspan, H.: Gan-based synthetic medical image augmentation for increased cnn performance in liver lesion classification. arXiv preprint arXiv:1803.01229 (2018)
-  Frid-Adar, M., Klang, E., Amitai, M., Goldberger, J., Greenspan, H.: Synthetic data augmentation using gan for improved liver lesion classification. In: Biomedical Imaging (ISBI 2018), 2018 IEEE 15th International Symposium on. pp. 289–293. IEEE (2018)
-  Gordon, M., Hadjiiski, L., Cha, K., Chan, H.P., Samala, R., Cohan, R.H., Caoili, E.M.: Segmentation of inner and outer bladder wall using deep-learning convolutional neural network in ct urography. In: Medical Imaging 2017: Computer-Aided Diagnosis. vol. 10134, p. 1013402. International Society for Optics and Photonics (2017)
-  Kitchen, A., Seah, J.: Deep generative adversarial neural networks for realistic prostate lesion mri synthesis. arXiv preprint arXiv:1708.00129 (2017)
-  Nie, D., Trullo, R., Lian, J., Petitjean, C., Ruan, S., Wang, Q., Shen, D.: Medical image synthesis with context-aware generative adversarial networks. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 417–425. Springer (2017)
-  Roth, H.R., Lu, L., Liu, J., Yao, J., Seff, A., Cherry, K., Kim, L., Summers, R.M.: Improving computer-aided detection using convolutional neural networks and random view aggregation. IEEE transactions on medical imaging 35(5), 1170–1181 (2016)
-  Tsehay, Y.K., Lay, N.S., Roth, H.R., Wang, X., Kwak, J.T., Turkbey, B.I., Pinto, P.A., Wood, B.J., Summers, R.M.: Convolutional neural network based deep-learning architecture for prostate cancer detection on multiparametric magnetic resonance images. In: Medical Imaging 2017: Computer-Aided Diagnosis. vol. 10134, p. 1013405. International Society for Optics and Photonics (2017)
-  Zhang, W., Li, R., Deng, H., Wang, L., Lin, W., Ji, S., Shen, D.: Deep convolutional neural networks for multi-modality isointense infant brain image segmentation. NeuroImage 108, 214–224 (2015)