Mammography is a widely used screening tool for breast cancer. Many studies [freer2001screening][kooi2017large] have shown that incorporating CAD software into the mammography reading workflow can improve the diagnostic workup. Equipped with deep learning (DL) techniques, a CAD scheme was further shown to outperform radiologists from multiple centers across several Western countries [screenpoint2019AI]. Although promising CAD performance for mammography has been reported in many previous studies, one essential issue has not been explicitly addressed in previous DL works. As shown in Fig. 1, the image styles of different vendors, such as image contrast and edge sharpness, are quite different. Accordingly, a DL-based CAD scheme may not always perform well on mammograms from different vendors unless sufficiently large and diverse training data are provided. Because collecting a large number of mammograms from various vendors can be very difficult and expensive, we propose a mammographic style transfer (mST) scheme that normalizes the image styles of different vendors to the same style baseline. We will show that the style normalization step with the mST scheme improves the robustness of the classic Faster-RCNN detector [ren2015faster] to mammograms from different vendors and improves the detection performance for masses and microcalcifications, the latter denoted as μC for short throughout this paper.
The direct use of off-the-shelf neural style transfer (NST) methods for the mST scheme may encounter two major issues. First, the style transfer (ST) of very subtle but important abnormalities such as microcalcifications (μC) or calcifications is very challenging, because the ST for a μC, which may be depicted in fewer than 20 pixels, may need to be carried out at high resolution. However, to the best of our knowledge, most classic NST methods support only images of limited resolution, whereas current mammograms usually exceed that limit. Image downsizing is therefore inevitable in our problem, and the quality of subtle abnormalities such as μC after the transfer may be compromised. Second, most classic NST methods require a manually selected style reference image as network input. In our context, however, an automatic selection scheme for style reference images is needed to facilitate the style normalization process. Meanwhile, the appearance variety of mammograms is large and also depends on the breast density category and the subject's figure. Considering only one style reference image may not be sufficient to yield plausible transfer results.
To address these two issues, the mST scheme is realized in this study with a new multi-resolution and multi-reference neural style transfer (mrNST) network. By considering multiple resolutions, the details of subtle abnormalities such as microcalcifications (μC) or calcifications can be better preserved in the transfer process. With multiple reference images, our mrNST network can deal with the wide variety of mammograms and integrate the style transfer effects from the reference images for more plausible style normalization results. When integrating the multiple style transfer effects, our mrNST network also takes into account the similarities between the input image to be transferred and the reference images. To the best of our knowledge, this is the first study that explicitly explores style transfer to mitigate the style variation problem, which may compromise the detection performance for breast lesions.
We compare our style transfer results with those of the classic cycleGAN [zhu2017unpaired] and the conventional exact histogram matching (EHM) [coltuc2006exact], and test the style normalization (i.e., mST) results on the detection of masses and microcalcifications (μC) in mammograms. The experimental results suggest that the mST results from our mrNST network are more plausible and can mitigate the problem of style differences between vendors for better detection results.
In this section, we briefly introduce the concept of NST and then discuss the details of our mrNST network. The architecture of our mrNST network is shown in Fig. 2; its backbone is VGG19 [simonyan2014very].
2.1 Neural Style Transfer
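NST, first introduced by Gatys et al. [gatys2016image], commonly requires two input images, a content image $x_c$ to be transferred and a style reference image $x_s$, and learns the feature representations $F_l(x_c)$ and $F_l(x_s)$ in layer $l$ of an NST network. Each column of $F_l(x)$, with $F_l(x) \in \mathbb{R}^{M_l \times N_l}$, is a feature map, where $N_l$ is the number of feature maps in layer $l$ and $M_l$ is the product of the height and width of each feature map. The output of NST is the style-transferred image, denoted as $\hat{x}$, obtained by minimizing the loss function:

$$\hat{x} = \arg\min_{x} \; \alpha\, \mathcal{L}_{content}(x, x_c) + \beta\, \mathcal{L}_{style}(x, x_s), \quad (1)$$

where the content term compares the feature maps from $x$ and $x_c$ at a single layer $l$:

$$\mathcal{L}_{content}(x, x_c) = \tfrac{1}{2} \left\| F_l(x) - F_l(x_c) \right\|_F^2, \quad (2)$$

and the style term compares a set of summary statistics:

$$\mathcal{L}_{style}(x, x_s) = \sum_{l} \left\| G_l(x) - G_l(x_s) \right\|_F^2, \quad (3)$$

where $G_l(x) = F_l(x)^{\top} F_l(x)$ is the Gram matrix of the feature maps of layer $l$ in response to image $x$.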
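The content and style terms described above can be illustrated with a minimal pure-Python sketch. It assumes that each layer's features are given as a matrix whose columns are feature maps, that the Gram matrix is left unnormalized, and that a single `style_weight` balances the two terms; the function names and the weighting are illustrative choices and are not taken from the paper.

```python
from typing import List

# One layer's features: rows are spatial positions, columns are feature maps.
Matrix = List[List[float]]


def gram_matrix(features: Matrix) -> Matrix:
    """Return the (unnormalized) Gram matrix F^T F of one layer."""
    n_maps = len(features[0])
    return [
        [sum(row[i] * row[j] for row in features) for j in range(n_maps)]
        for i in range(n_maps)
    ]


def squared_distance(a: Matrix, b: Matrix) -> float:
    """Squared Frobenius distance between two equally shaped matrices."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def nst_loss(feats_out, feats_content, feats_style, style_weight=1.0):
    """Content term plus weighted style term, each summed over the given layers."""
    content = sum(
        squared_distance(fo, fc) for fo, fc in zip(feats_out, feats_content)
    )
    style = sum(
        squared_distance(gram_matrix(fo), gram_matrix(fs))
        for fo, fs in zip(feats_out, feats_style)
    )
    return content + style_weight * style


# Toy check: swapping spatial positions changes the content term only,
# because the Gram matrix discards the spatial arrangement.
layer_a = [[1.0, 2.0], [3.0, 4.0]]
layer_b = [[3.0, 4.0], [1.0, 2.0]]
loss = nst_loss([layer_a], [layer_b], [layer_b])  # content 16.0, style 0.0
```

The toy check reflects why the Gram matrix is treated as a style descriptor: it summarizes which feature maps co-activate while ignoring where they do so.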
2.2 Multiple Reference Style Images
The mrNST network takes multiple reference style images to better accommodate the appearance variety of mammograms. Different regions in a mammogram may need distinct reference images for the transfer. For example, a dense breast image to be transferred may be better served by reference images with denser glandular tissues. To this end, a quantitative measurement of style similarity is needed.
The Gram matrix in Eq. 3 computes the covariance statistics of the features at one layer as a quantification of style at the corresponding perceptual level. A higher value in the Gram matrix suggests that the corresponding pair of feature maps is more similar in style. Accordingly, with multiple reference style images, we can compute the Gram matrix of each style image and integrate the Gram matrices with the max operation. Specifically, a simple but effective multi-reference style term is defined as

$$\mathcal{L}_{style} = \sum_{l} \left\| G_l(\hat{x}) - f_{hs}\!\left( f_{max}\!\left( F_l(x_s^1), \dots, F_l(x_s^K) \right), H_l \right) \right\|_F^2. \quad (4)$$

The function $f_{max}$ is an element-wise max operation that takes the feature maps $F_l(x_s^k)$ of the $k$th reference image at the $l$th layer and outputs a matrix $G_l^{max}$. Specifically, the function computes each element of $G_l^{max}$ as

$$G_l^{max}(i, j) = \max_{k} \; G_l(x_s^k)(i, j). \quad (5)$$

The function $f_{hs}$ is a histogram specification function that normalizes $G_l^{max}$ with the reference density histogram $H_l$ for numerical stabilization. $H_l$ is the density histogram of the matrix formed by stacking the style Gram matrices.
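The integration of several reference Gram matrices described above can be sketched as follows. This is a minimal illustration under stated assumptions: the histogram specification is implemented here as a simple rank-based (quantile) mapping onto the pooled entries of all reference Gram matrices, which is one plausible reading of the text and not necessarily the paper's exact procedure. The names `elementwise_max` and `histogram_specification` are hypothetical.

```python
def elementwise_max(grams):
    """Element-wise maximum over a list of equally sized Gram matrices."""
    rows, cols = len(grams[0]), len(grams[0][0])
    return [[max(g[i][j] for g in grams) for j in range(cols)] for i in range(rows)]


def histogram_specification(matrix, reference_values):
    """Map entries of `matrix` onto the distribution of `reference_values`.

    Assumed rank-based scheme: the entry with rank r (0 = smallest) receives
    the reference value at the matching quantile, so ordering is preserved.
    """
    flat = [v for row in matrix for v in row]
    ref = sorted(reference_values)
    order = sorted(range(len(flat)), key=lambda k: flat[k])  # stable for ties
    denom = max(len(flat) - 1, 1)
    out = [0.0] * len(flat)
    for rank, k in enumerate(order):
        out[k] = ref[rank * (len(ref) - 1) // denom]
    cols = len(matrix[0])
    return [out[r * cols:(r + 1) * cols] for r in range(len(matrix))]


# Two toy reference Gram matrices for one layer.
gram_ref_1 = [[1, 4], [4, 2]]
gram_ref_2 = [[3, 0], [0, 5]]

merged = elementwise_max([gram_ref_1, gram_ref_2])  # [[3, 4], [4, 5]]
pooled = [v for g in (gram_ref_1, gram_ref_2) for row in g for v in row]
target = histogram_specification(merged, pooled)    # [[0, 1], [3, 5]]
```

The toy values show the motivation for the normalization step: taking the maximum shifts the entries upward relative to any single reference, and mapping them back onto the pooled distribution keeps the style target in the numerical range of real Gram matrices.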
Current mammograms are large, and processing them at full size may require a formidably large amount of GPU memory with any off-the-shelf NST method. In our experience, ST inference for a single image well below the original mammogram size can consume up to 10.8 GB of GPU memory. For the mST at the original size, more than 160 GB of GPU memory is estimated to be required, which is impractical for clinical use or laboratory study. Accordingly, we propose a multi-resolution strategy that uses GPU resources more efficiently and still attains the goal of better considering local details in the mST scheme.
Referring to Fig. 2, the multi-resolution strategy is implemented by considering the original image (scale0) and the division of the image into overlapping patches at two patch sizes (scale1 and scale2). The scale0 image and the scale1 patches are resized to a fixed size to fit the memory limit and to support feature learning with middle- and large-sized receptive fields.
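The division of an image into overlapping patches can be sketched as below, together with a stitching step that puts processed patches back into a full image. The patch size, the stride, and the averaging of overlapping pixels are assumptions made for illustration only; the text does not state how overlaps are blended, and all function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]


def tile_starts(length: int, patch: int, stride: int) -> List[int]:
    """Start offsets of patches covering [0, length); the last patch is
    shifted back so that it ends exactly at the border."""
    if patch >= length:
        return [0]
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)
    return starts


def split_into_patches(image: Image, patch: int, stride: int) -> List[Tuple[int, int, Image]]:
    """Return (top, left, tile) triples; stride < patch yields overlap."""
    height, width = len(image), len(image[0])
    patches = []
    for top in tile_starts(height, patch, stride):
        for left in tile_starts(width, patch, stride):
            tile = [row[left:left + patch] for row in image[top:top + patch]]
            patches.append((top, left, tile))
    return patches


def stitch(patches, height: int, width: int) -> Image:
    """Rebuild a full image, averaging pixels covered by several patches."""
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for top, left, tile in patches:
        for r, row in enumerate(tile):
            for c, value in enumerate(row):
                total[top + r][left + c] += value
                count[top + r][left + c] += 1
    return [[t / n for t, n in zip(trow, nrow)] for trow, nrow in zip(total, count)]


# Round trip on a toy 5 x 6 "image": stitching untouched patches restores it.
toy = [[r * 6 + c for c in range(6)] for r in range(5)]
tiles = split_into_patches(toy, patch=4, stride=2)
restored = stitch(tiles, height=5, width=6)
```

In an actual pipeline each tile would be style-transferred between the split and stitch calls; averaging is only one of several possible blending choices for the overlap regions.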
The image and patches of scale0, scale1, and scale2 are transferred by taking multiple reference style images (see Fig. 2). For each image or patch of each scale, we perform the style transfer by optimizing the multi-reference style term defined in Eq. 4. Afterward, all transferred patches of scale1 and scale2 are reconstructed back into whole mammograms. The reconstructed mammograms of scale1 and scale2, as well as the transferred image of scale0, are then resized back to the original size. For the final output, we integrate the three transferred images of scale0, scale1, and scale2 by weighted summation and further refine the summed image with a refiner network. The final style-transferred mammogram can be computed as

$$\hat{x} = R\left( w_0 \hat{x}_0 + w_1 \hat{x}_1 + w_2 \hat{x}_2 \right), \quad (6)$$

where $\hat{x}_0$, $\hat{x}_1$, and $\hat{x}_2$ are the transferred images of scale0, scale1, and scale2, $R$ is the refiner network, a network composed of convolutional layers [szegedy2015going], and $w_0$, $w_1$, and $w_2$ are three learnable weights. The refiner network is trained with the GAN scheme, in which the refiner network $R$ is treated as the generator that tries to fool a discriminator $D$. The discriminator $D$, with a ResNet18 backbone [he2016deep], is devised to check whether the input image is of the target style. The training of the refiner GAN is driven by minimizing an adversarial loss between $R$ and $D$ (Eq. 7).
3 Experiments and Results
In this study, we involved 1,380 mammograms, of which 840 and 540 were collected from two different hospitals with local IRB approvals. The mammograms from the two hospitals were acquired with GE Healthcare (GE) and United Imaging Healthcare (UIH) systems, respectively. Mammograms are counted per breast; accordingly, half of the mammograms in our dataset are craniocaudal (CC) views and half are mediolateral oblique (MLO) views. For the training of the refiner GAN with Eq. 7, we use 80 independent GE and 80 independent UIH mammograms, which are not included in the 1,380 images.
Throughout the experiments, we set the source and target domains as GE and UIH, respectively. As can be seen in Fig. 1 and Fig. 4, the GE image style is relatively soft, whereas the UIH style is sharper; the image styles of different vendors can thus be very distinct. We compare our method with the baselines of cycleGAN [zhu2017unpaired] and exact histogram matching (EHM) [coltuc2006exact]. Since cycleGAN requires a training step, we randomly select 100 and 80 images from the GE and UIH hospital datasets, respectively, to train it. Apart from the refiner GAN, our mrNST does not need a training step. For each ST inference with mrNST, we select as references the 5 most similar images of the target UIH domain from a reference image bank of 40 UIH images, which are included in neither the 1,380 images nor the 80 training images of the refiner GAN. The similarity used for the selection is based on the breast area. The selected reference images are of the same view (CC/MLO) as the source image to be transferred. The Adam optimizer is adopted with 400 epochs of optimization for our mrNST.
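The reference selection rule described above (same view, most similar breast area, 5 images out of the bank) can be sketched as follows. The `Mammogram` record, the way the breast area is measured, and the use of the absolute area difference as the similarity are assumptions for illustration; the text only states that the similarity is based on the breast area.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mammogram:
    name: str
    view: str           # "CC" or "MLO"
    breast_area: float  # assumed: e.g. the number of breast (foreground) pixels


def select_references(source: Mammogram, bank: List[Mammogram], k: int = 5) -> List[Mammogram]:
    """Pick the k bank images of the same view whose breast area is closest
    to that of the source image."""
    same_view = [m for m in bank if m.view == source.view]
    same_view.sort(key=lambda m: abs(m.breast_area - source.breast_area))
    return same_view[:k]


# Toy bank of target-domain images (values are made up).
bank = [
    Mammogram("u1", "CC", 100.0),
    Mammogram("u2", "CC", 140.0),
    Mammogram("u3", "MLO", 121.0),
    Mammogram("u4", "CC", 118.0),
    Mammogram("u5", "CC", 300.0),
]
source = Mammogram("g1", "CC", 120.0)
refs = select_references(source, bank, k=2)  # u4 (diff 2), then u1 and u2 tie at 20
```

Because Python's sort is stable, ties in area difference keep their order in the bank, so the toy call returns `u4` followed by `u1`.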
Fig. 3 illustrates the efficacy of the multi-reference and multi-resolution schemes for the mST from GE to UIH. The upper row in Fig. 3 shows better enhancement of glandular tissues with 5 reference images in a high-density case, while the lower row suggests that calcifications can be better enhanced by fusing the transferred images from all three scales. Fig. 4 shows the mST results from our mrNST and from the cycleGAN and EHM baselines. On visual comparison, the quality of the transferred images from mrNST is much better. The cycleGAN requires large GPU memory and cannot support mST at high resolution. Meanwhile, referring to the right part of Fig. 4, mrNST can preserve the details of the vasculature after the mST.
|Score||5.16 ± 0.12||5.43 ± 0.10||5.42 ± 0.15||4.74 ± 0.22||5.29 ± 0.11|
Two experiments are conducted to illustrate the efficacy of our mrNST with respect to the transferred image quality and the detection performance. The first experiment evaluates the quality of the transferred images with the neural image assessment (NIMA) score [talebi2018nima]. Specifically, we randomly select 400 GE and 400 UIH images (not overlapping with the cycleGAN training dataset) for mST. The 400 GE images are transferred to the UIH domain with the compared methods, and the resulting NIMA scores of the transferred images are listed in Table 1. We also compare the NIMA scores of the transferred images and the original UIH images with Student's t-test. The computed p-values for mrNST, EHM, and cycleGAN suggest that the quality of the mST images from mrNST is not significantly different from that of the original UIH images in terms of NIMA scores, whereas the quality differences between the mST images from EHM and cycleGAN and the UIH images are deemed significant.
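The comparison of two groups of NIMA scores can be illustrated with a two-sample Student's t statistic. The sketch below assumes an unpaired test with pooled variance; the text does not specify which variant of the t-test was used, and the score lists are made-up toy values, not data from the study.

```python
import math
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Unpaired two-sample Student t statistic with pooled variance.

    Returns (t, degrees_of_freedom). A p-value additionally requires the
    cumulative distribution of the t distribution, which is not computed here.
    """
    na, nb = len(sample_a), len(sample_b)
    pooled = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / (na + nb - 2)
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Toy scores for two image groups (hypothetical values).
scores_transferred = [5.4, 5.5, 5.3, 5.6]
scores_original = [5.5, 5.4, 5.6, 5.3]
t_value, dof = student_t(scores_transferred, scores_original)
```

A statistics package such as SciPy (`scipy.stats.ttest_ind`) returns the p-value together with the statistic, which is what a significance statement like the one above relies on.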
The second experiment examines whether the mST can help mitigate the domain gap problem and improve the detection performance. The UIH dataset is relatively small; we therefore examine whether the mST from GE to UIH can help improve the detection results in the UIH domain. Since the baselines cannot yield image quality comparable to that of the target UIH domain, we perform this experiment only with mrNST. Specifically, we train Faster-RCNN [ren2015faster] with a ResNet50 backbone under 5 schemes that combine real UIH, simulated UIH (generated from GE with mrNST), and GE data in different ways. The detection results for masses and microcalcifications (μC) are reported in Table 2.
The UIH testing data, which serve as the testing data for all training settings in Table 2, include 120 images: 90 positive cases and 30 normal images. Of the 90 positive testing cases, 36 images have only masses, 28 have only microcalcifications (μC), and 26 have both. The training set with only real UIH data has 420 images, with 100 normal cases and 320 positive cases (131 with only masses, 123 with only μC, and 66 with both). With the 2nd to 5th schemes in Table 2, we compare the effects of adding 420 or 840 extra training images of either real GE or simulated UIH. The simulated UIH data of the 2nd and 3rd schemes are the mST results of the GE data of the 4th and 5th schemes, respectively, and the 420 GE images are a subset of the 840 GE images. For systematic comparison, the 420 GE images have the same distribution of mass, μC, and normal cases as the 420 real UIH images, whereas the 840 GE images follow the same ratio at double the size.
In Table 2, the detection performance is assessed with the average precision (AP) and with the recall at an average of 0.5 false positives (FPs) per image (Recall@0.5) and at 1 FP per image (Recall@1). As can be observed by comparing rows 2 and 3 with row 1 in Table 2, adding simulated UIH images to the training data boosts the detection performance. Referring to rows 4 and 5 in Table 2, the direct incorporation of GE data does not seem to be helpful for the detection performance. The transferred UIH images, on the other hand, are more similar to the real UIH images and can serve as more informative samples for training the detector.
|420 real UIH||0.656||0.761||0.869||0.515||0.459||0.567|
|420 real UIH + 420 simulated UIH||0.724||0.823||0.891||0.569||0.593||0.702|
|420 real UIH + 840 simulated UIH||0.738||0.811||0.912||0.670||0.622||0.784|
|420 real UIH + 420 GE||0.641||0.741||0.847||0.555||0.509||0.651|
|420 real UIH + 840 GE||0.654||0.738||0.869||0.632||0.604||0.738|
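The two kinds of detection metrics reported in Table 2 can be sketched as follows. The sketch assumes that every detection has already been matched against the ground truth, so that each detection is a `(score, is_true_positive)` pair and every true positive corresponds to a distinct lesion; the matching rule, the non-interpolated form of AP, and the function names are illustrative assumptions rather than the evaluation code of the study.

```python
def recall_at_fp_rate(detections, n_lesions, n_images, fp_per_image):
    """Recall at the lowest score threshold that keeps the average number of
    false positives per image at or below `fp_per_image`."""
    max_fp = fp_per_image * n_images
    tp = fp = 0
    best_tp = 0
    for _, is_tp in sorted(detections, key=lambda d: d[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
            if fp > max_fp:
                break
        best_tp = tp
    return best_tp / n_lesions


def average_precision(detections, n_lesions):
    """Non-interpolated AP: mean over all lesions of the precision measured
    at the rank of each true positive (missed lesions contribute zero)."""
    tp = 0
    precision_sum = 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            tp += 1
            precision_sum += tp / rank
    return precision_sum / n_lesions


# Toy example: 6 detections pooled over 2 images that contain 4 lesions.
toy_detections = [
    (0.9, True), (0.8, False), (0.7, True),
    (0.6, False), (0.5, True), (0.4, False),
]
recall_half_fp = recall_at_fp_rate(toy_detections, n_lesions=4, n_images=2, fp_per_image=0.5)  # 0.5
recall_one_fp = recall_at_fp_rate(toy_detections, n_lesions=4, n_images=2, fp_per_image=1)     # 0.75
```

Allowing more false positives per image lowers the score threshold, so the recall at 1 FP per image can never be lower than the recall at 0.5 FP per image.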
A new style transfer method, mrNST, is proposed in this paper to normalize the image styles from different vendors to the same level. The mST results can be attained at high resolution by taking multiple reference images from the target domain. The experimental results suggest that style normalization with mrNST can improve the detection results for masses and microcalcifications (μC).