Breast cancer is common among women worldwide. It is among the most serious diseases and, as of 2019, is the second most common cancer among women in the United States after skin cancer [6, 2]. The most urgent need is to diagnose breast cancer at an early stage. Early-stage breast cancer lacks obvious symptoms; therefore, many patients miss the best chance of a cure. According to several organizations, the survival rate for stage 0 and stage 1 breast cancer between 2007 and 2013 was close to 100%. Breast ultrasound (BUS) imaging is harmless, low-cost, portable, and effective; therefore, it has become the most important approach for early detection of breast cancer. However, BUS images usually have poor quality, low contrast, and large uncertainty. The main causes of the uncertainty are: 1) BUS images are acquired by different machines at different times with different settings; 2) breast characteristics vary from person to person; and 3) the resolution of BUS images is low, and the images contain inherent speckle noise. In addition, the quality of BUS images depends on the skill of the radiologist and on the way the images are acquired.
To prevent misdiagnosis, computer-aided diagnosis (CAD) has been studied extensively. Image segmentation and classification are the two most important components of CAD systems. BUS image segmentation methods can be semi-automatic or fully automatic [36, 37], and they include thresholding methods [14, 8], clustering-based methods [26, 30], watershed-based methods, graph-based methods, active-contour models [35], neural networks [40, 12], etc.
A gray-level thresholding method and an area-growing lesion contour detection method were studied in [14, 8]. The region of interest (ROI) was determined by thresholding, and then a utility function was maximized over the ROI to obtain the lesion contour. The utility function was the average radial derivative (ARD), which calculated the derivatives along the radial direction from the seed point to the boundary. The seed point of each image was chosen as the center of the ROI. The dataset contained 400 cases (757 images). The area under the receiver operating characteristic (ROC) curve was used to evaluate the performance, and it was 0.91.
Moon et al. proposed a clustering-based breast cancer segmentation method. The method consisted of three parts. The first part was quantitative tissue clustering. The tissue within a tumor is different from other tissues. A 3-D mean shift clustering was used for selecting tumor tissues according to the echogenicities. The fuzzy c-means clustering method divided the segmented regions into four clusters. The morphology and echogenicity features were extracted, and logistic regression was used to classify the benign and malignant tumors.
Shan et al. proposed a fully automatic breast cancer segmentation method based on neutrosophic l-means clustering. It used an automatic seed point selection algorithm to generate the ROI and proposed a novel contrast enhancement method based on the frequency and spatial domains. A clustering method combined with neutrosophic logic, neutrosophic l-means clustering, was developed to segment BUS images. The method achieved a true positive rate (TPR) of 92.4%, a false positive rate (FPR) of 7.2%, and a similarity rate of 86.3%, with a mean Hausdorff error (AHE) of 22.5 pixels and a mean absolute error (AME) of 4.8 pixels.
Xian et al.  developed a fully automatic segmentation method based on the characteristics of BUS image in frequency and spatial domain. The method had two parts: fully automatic breast tumor ROI generation, and a robust tumor segmentation based on ROI. In ROI generation step, a fully automatic reference point selection method was designed using breast anatomy: the location of the tumor was in the middle of the pre-mammary layer and retro-mammary layer. The mean shift algorithm was utilized to extend the reference points as the seed points. Finally, the ROI of tumor was calculated by using the seed points. In tumor segmentation step, a minimization cost function was used, and the frequency and spatial boundary information was utilized as the constraint of the cost function. The approach achieved 91.23% TPR, 9.97% FPR, and similarity rate of 83.73% using a dataset of 184 images.
Deep neural networks have been utilized for image segmentation and classification. In [3, 20, 43], deep networks were proposed for breast histology image and mammographic mass segmentation. A deep learning approach was applied to breast cancer detection. Three network structures were used: patch-based LeNet, U-Net, and transfer learning with a pretrained fully convolutional network (FCN) with AlexNet. The network structures were applied to two datasets with 469 images. The experiments compared the three network structures in three settings: 1) trained on one dataset and tested on the other; 2) trained and tested on a single dataset; 3) trained on the combined dataset and tested on each individual one. In the first experiment, the result on dataset A using U-Net was TPR 0.83, FPs/Image 0.08, and F-measure 0.87; the result on dataset B was TPR 0.70, FPs/Image 0.66, and F-measure 0.59. The result of the second experiment was TPR 0.98, FPs/Image 0.16, and F-measure 0.91 on dataset A using FCN-AlexNet; and TPR 0.92, FPs/Image 0.17, and F-measure 0.89 on dataset B. The result of the third experiment was TPR 0.99, FPs/Image 0.16, and F-measure 0.92 on dataset A using FCN-AlexNet; and TPR 0.93, FPs/Image 0.18, and F-measure 0.88 on dataset B. The study concluded that the performance depended on the dataset.
BUS image semantic segmentation was studied in . An extracted feature of 1-level wavelet transform was added to the input of the neural network. The prior knowledge that tumor must be inside mammary layer was utilized to constrain the conditional random fields (CRFs) energy function. This method could detect and segment the mammary layer and tumor; while other layers were treated as background. The method achieved TPR 92.80%, FPR 9.00%, and Intersection over Union (IoU) 82.11% in tumor segmentation.
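The overlap measures quoted in this review (TPR, FPR, and IoU or similarity rate) can be illustrated with a minimal sketch. It assumes the convention often used for BUS segmentation, in which both TPR and FPR are normalized by the area of the reference (ground-truth) region; the cited papers may define FPR differently. The function name `overlap_metrics` and the mask layout are hypothetical.

```python
def overlap_metrics(pred, ref):
    """Compute overlap measures for two binary masks (2-D lists of 0/1).

    Assumption: FPR is normalized by the reference area, not by the
    number of negative pixels.
    """
    tp = fp = fn = 0
    for pred_row, ref_row in zip(pred, ref):
        for p, r in zip(pred_row, ref_row):
            if p and r:
                tp += 1          # pixel in both masks
            elif p:
                fp += 1          # predicted only
            elif r:
                fn += 1          # reference only
    ref_area = tp + fn
    union = tp + fp + fn
    if ref_area == 0:
        raise ValueError("reference mask is empty")
    return {
        "tpr": tp / ref_area,    # |pred AND ref| / |ref|
        "fpr": fp / ref_area,    # |pred minus ref| / |ref|
        "iou": tp / union,       # |pred AND ref| / |pred OR ref|
    }
```

With this convention an FPR above 100% is possible when the predicted region is much larger than the reference region, which is why FPR is reported together with TPR and IoU.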
From the above discussion, although all existing methods claimed to achieve the best results on their own datasets, they still have shortcomings: 1) traditional segmentation methods assumed that there was one and only one tumor in each image; hence, they could not handle images with no tumor or with more than one tumor; 2) most segmentation methods did not employ breast anatomy knowledge; 3) the uncertainty in BUS images, especially around the boundaries between breast anatomy layers, was not dealt with. Due to the low resolution and poor quality of BUS images, the boundary areas are vague and fuzzy, and are difficult even for doctors to classify.
To overcome these shortcomings, a novel, robust, fuzzy-logic-guided BUS image semantic segmentation method with breast-anatomy-constrained post-processing is proposed. Based on , an adaptive membership function is designed. The input image is transformed to the fuzzy domain using the membership function. The fuzzy features are used to represent the uncertainty in the image. The fuzziness of the image is reduced by multiplying the uncertainty maps with the input image. The feature maps of the first convolutional layer are also transformed to the fuzzy domain, and the uncertainty in these feature maps is reduced by multiplying the uncertainty maps with the corresponding convolutional feature maps. Then, the context information (breast anatomy) and the layer structure of the human breast are applied to the conditional random fields (CRFs) to improve the final segmentation result. The dataset contains 325 BUS images labeled by the doctors of cooperating hospitals; 229 images contain tumors, and the remaining images do not. Previously existing datasets only had labels for tumors, while our dataset has labels for 5 categories in every image: fat layer, mammary layer, muscle layer, background, and tumor (as shown in Fig. 1).
The paper is organized as follows: in Section 2, the fuzzy information guided fully convolutional network is introduced; in Section 3, the breast anatomy constrained CRFs are presented; in Section 4, the experiments for the network structure and post processing are discussed, the performance is compared with that of previous methods, and the conclusions are given in Section 5.
II Fuzzy information guided fully convolutional network
The fully convolutional network (FCN) has been widely used in semantic segmentation because it obtains better results than traditional methods. An FCN takes whole images as inputs and provides a pixel-wise category prediction. Supervised learning methods using deep convolutional neural networks for image analysis can be categorized as: 1) patch-based methods: input images are divided into small overlapping or non-overlapping patches, which are then classified. The class of a patch represents the category of its center pixel or of a chosen pixel on its boundary. A sliding window is used to scan all the pixels in the image. 2) FCN: an FCN realizes end-to-end segmentation, where whole images are used as inputs without being divided into patches. FCNs perform better than patch-based methods and have been applied to natural image semantic segmentation, medical image processing, crack detection, and other tasks. In this research, a fuzzy FCN is proposed for BUS image segmentation.
An ideal fully automatic BUS image segmentation system should have the following characteristics: 1) it uses the entire image as the input, and the output is the final segmentation result without manual intervention; 2) it has high robustness and accuracy. In Fig. 2, (a) is the original image, and (b) is the label map created by experienced doctors: the black areas represent background, the green area represents the fat layer, the yellow area represents the mammary layer, the blue area represents the muscle layer, and the red area represents the tumor. In Fig. 2 (a), the regions marked by red and green rectangles are hard to determine even for experienced radiologists, since the boundaries are vague, fuzzy, and highly uncertain.
A membership function transforms BUS image to fuzzy domain. The uncertainty in BUS image can be handled well by using fuzzy logic, and better semantic segmentation result can be obtained.
The flowcharts of the proposed approaches are shown in Fig. 3. In Fig. 3 (a), the input image is preprocessed by contrast enhancement. Then, wavelet transform is applied. The original image and wavelet information are transformed to fuzzy domain by membership functions to deal with the uncertainty. Results after reducing the uncertainty are input into the first convolutional layer. The obtained feature maps are transformed into fuzzy domain as well, and the uncertainty is reduced by multiplying the uncertainty maps and the corresponding feature maps. In Fig. 3 (b), wavelet transform is not utilized. After reducing uncertainty in gray-level intensity and the first convolutional layer, the network can achieve similar performance to that of Fig. 3 (a). Two approaches are evaluated by segmentation accuracies and compared with the original fully convolutional network.
The original images were captured in different periods and may have different intensity ranges, which affects the segmentation results. Histogram equalization is modified so that the input image has an intensity range from 0 to 255 and its contrast is enhanced. Histogram equalization is performed on both the training set and the testing set. In histogram equalization, the probability of a pixel with intensity $i$, $p(i)$, is computed by Eq. (1):

$$p(i) = \frac{n_i}{n}, \quad 0 \le i \le L \tag{1}$$

where $n_i$ represents the number of pixels with intensity $i$; $n$ represents the total number of pixels. $L$ is the largest intensity. The cumulative distribution function of $p$ is defined as:

$$\mathrm{cdf}(i) = \sum_{j=0}^{i} p(j) \tag{2}$$

The new intensity is computed by

$$h(v) = \mathrm{round}\left(\frac{\mathrm{cdf}(v) - \mathrm{cdf}_{\min}}{1 - \mathrm{cdf}_{\min}} \times L\right) \tag{3}$$

where $v$ represents the original intensity, and $\mathrm{cdf}_{\min}$ is the minimum non-zero value in the cumulative distribution function.
The original images are shown in Fig. 4 (a). After histogram equalization, the contrast is enhanced, and the range of the intensities is normalized from 0 to 255 as shown in Fig. 4 (b).
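The contrast enhancement step can be sketched as follows. This is a minimal illustration of standard histogram equalization for an 8-bit image stored as a list of rows, not the authors' modified implementation; the function name `equalize` and the handling of constant images are assumptions.

```python
def equalize(image, levels=256):
    """Histogram-equalize a 2-D list of integer intensities in [0, levels - 1]."""
    flat = [v for row in image for v in row]
    n = len(flat)

    # Histogram: number of pixels at each intensity.
    hist = [0] * levels
    for v in flat:
        hist[v] += 1

    # Cumulative distribution function over intensities.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running / n)

    # Minimum non-zero CDF value; a constant image cannot be stretched.
    cdf_min = min(c for c in cdf if c > 0)
    if cdf_min == 1.0:
        return [row[:] for row in image]

    # Lookup table mapping each original intensity to its new intensity.
    scale = (levels - 1) / (1.0 - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[v] for v in row] for row in image]
```

After this mapping the darkest intensity present in the image becomes 0 and the brightest becomes 255, so images acquired with different settings share one intensity range.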
Wavelet transform: to overcome small dataset size problem, high-pass filter H and low-pass filter G of wavelet transform are used to obtain the high frequency and low frequency information. In this research, one level Haar wavelet transformation is applied, and the input image becomes a 3-channel image. The first channel is the original image; the second channel contains the low frequency coefficients; and the third channel contains the high frequency coefficients. Fig. 5 shows the original images and augmented 3-channel images, respectively.
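A one-level Haar transform and the resulting 3-channel input can be sketched as below. The text does not say how the half-resolution coefficients are brought back to the image size or how the three detail sub-bands are merged into one high-frequency channel, so nearest-neighbor upsampling and the detail magnitude are assumptions made only for illustration; all function names are hypothetical.

```python
import math

def haar_level1(image):
    """One-level 2-D Haar transform of an image with even height and width.

    Returns four half-size arrays: the approximation (low frequency) and
    three detail (high frequency) sub-bands.
    """
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("image height and width must be even")
    approx, row_detail, col_detail, diag_detail = [], [], [], []
    for r in range(0, height, 2):
        a_row, r_row, c_row, d_row = [], [], [], []
        for c in range(0, width, 2):
            tl, tr = image[r][c], image[r][c + 1]
            bl, br = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((tl + tr + bl + br) / 2.0)   # block average (scaled)
            r_row.append((tl + tr - bl - br) / 2.0)   # top minus bottom
            c_row.append((tl - tr + bl - br) / 2.0)   # left minus right
            d_row.append((tl - tr - bl + br) / 2.0)   # diagonal difference
        approx.append(a_row)
        row_detail.append(r_row)
        col_detail.append(c_row)
        diag_detail.append(d_row)
    return approx, row_detail, col_detail, diag_detail

def three_channel(image):
    """Stack (original, low frequency, high frequency) per pixel.

    Assumption: coefficients are upsampled by repetition, and the three
    detail sub-bands are merged by their Euclidean magnitude.
    """
    approx, row_d, col_d, diag_d = haar_level1(image)
    stacked = []
    for r, row in enumerate(image):
        out_row = []
        for c, value in enumerate(row):
            br, bc = r // 2, c // 2
            high = math.sqrt(row_d[br][bc] ** 2 + col_d[br][bc] ** 2
                             + diag_d[br][bc] ** 2)
            out_row.append((value, approx[br][bc], high))
        stacked.append(out_row)
    return stacked
```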
II-C Fuzzy layer
The boundary between the layers is hard to determine due to the uncertainty, poor contrast, and inherent noise (speckles) in BUS images. Fuzzy logic has been applied successfully to handle uncertainty. A fuzzy contrast enhancement method was developed, in which the membership function was an S-function with adaptive parameters calculated from the local maxima of the histogram. A fuzzy clustering method was utilized for image segmentation. The fuzzy membership was initialized by k-means clustering. The segmentation cost function was based on the membership of each pixel and the Euclidean distances from the pixels to the cluster center. An edge-detection method based on generalized type-2 fuzzy logic was designed, in which the membership function was defined using Gaussian generalized type-2 membership functions. Fuzzy image processing methods can obtain robust results and handle uncertainty and noise well. In this research, we apply fuzzy logic to the FCN to handle the uncertainty. The fuzzy fully convolutional network consists of three parts: 1) fuzzification layer; 2) uncertainty representation layer; and 3) fusion of convolutional network and fuzzy information.
Fuzzification layer: Traditional fuzzification uses membership functions such as S-shape membership function, Gaussian membership function, Triangular membership function, etc. Parameters in the membership functions are decided manually or calculated using the information of the specific problems. In this paper, a trainable membership function is utilized to improve robustness and rationality.
In Fig. 6 (black part), the input images are transformed into the fuzzy domain. Two membership functions are employed: trainable Sigmoid and trainable Gaussian membership functions. Each input node (pixel) is transformed by the membership function. Let $x_i$ be the input node, where $i$ denotes the ith pixel. All the input channels undergo the fuzzy transform; here the gray-level channel is used as an example to show the membership and uncertainty intuitively. $\mu_{i,r}$ represents the output node, and $r$ represents the category index. The trainable Sigmoid membership function for the fuzzification layer is computed by Eq. (4):

$$\mu_{i,r} = \frac{1}{1 + \exp\left(-a_r (x_i - b_r)\right)}, \quad i = 1, \dots, N \tag{4}$$

where $N$ represents the number of pixels in the image; r has 5 values: 0 represents the background; 1 represents the tumor; 2 represents the fat layer; 3 represents the mammary layer; and 4 represents the muscle layer. $a_r$ and $b_r$ represent the parameters of the membership function for pixel i. For every category, the pair of parameters is obtained during training, and the membership of the category is calculated using these parameters. In BUS images, tumor areas have low intensities in the spatial domain, and other layers have higher intensities. By changing the parameters $a_r$ and $b_r$, the trainable Sigmoid function can represent the membership of each category. The trainable Gaussian membership function is also used for comparison with the trainable Sigmoid membership function, to demonstrate the usefulness of fuzzy logic in handling uncertainty. The trainable Gaussian membership function is computed by Eq. (5):

$$\mu_{i,r} = \exp\left(-\frac{(x_i - m_r)^2}{2\sigma_r^2}\right) \tag{5}$$

where $m_r$ and $\sigma_r^2$ represent the mean and variance of category r, which are utilized to obtain the memberships of different categories.

The fuzzy memberships are normalized by Eq. (6), so that the memberships of a pixel over the different categories sum to one:

$$\hat{\mu}_{i,r} = \frac{\mu_{i,r}}{\sum_{r'=0}^{4} \mu_{i,r'}} \tag{6}$$
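The fuzzification of a single pixel can be sketched as follows. The exact parameterization of the trainable functions is an assumption (a logistic curve with a slope and a center, and a Gaussian with a mean and a variance, one pair per category); in the proposed network these parameters would be learned by back-propagation, whereas here they are plain arguments. All names are hypothetical.

```python
import math

def sigmoid_membership(x, slope, center):
    """Assumed sigmoid form: membership rises (slope > 0) or falls (slope < 0)
    around `center` on the intensity axis."""
    return 1.0 / (1.0 + math.exp(-slope * (x - center)))

def gaussian_membership(x, mean, variance):
    """Assumed Gaussian form: membership peaks at `mean`."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance))

def normalize(memberships):
    """Scale the per-category memberships of one pixel so they sum to one."""
    total = sum(memberships)
    if total == 0.0:
        # All memberships underflowed: fall back to a uniform assignment.
        return [1.0 / len(memberships)] * len(memberships)
    return [m / total for m in memberships]
```

For example, `normalize([gaussian_membership(40, m, v) for m, v in params])` would give the five normalized memberships of a pixel with intensity 40, where `params` is a hypothetical list of per-category (mean, variance) pairs.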
Heatmaps in Fig. 7 represent the membership values of the intensities; blue represents low membership value and red represents high membership value. In Fig. 7 (a)-(e), the memberships are computed by the trainable Gaussian membership function; and in (f)-(j), the memberships are computed by the trainable Sigmoid membership function.
One parameter of the trainable Sigmoid membership function is initialized by the mean of the intensities of all training samples in category r; the other parameter is initialized from a uniform distribution. For the trainable Gaussian membership function, the mean parameter is initialized by the mean of the intensities of all training samples in category r, and the variance parameter is initialized by the variance of the samples in the same category. All the input channels (original gray-level intensity and wavelet coefficients) are transformed to the fuzzy domain.
Uncertainty representation layer (Orange area in Fig. 6): If the membership of a pixel is close to 1 or 0, the uncertainty of the pixel is low. If the membership is around 0.5, the uncertainty is high, and it is hard to determine which category the pixel belongs to. The inputs of this layer are the fuzzy memberships, and the uncertainties in corresponding categories are computed using Eq. (7).
where is the membership of pixel i in the rth category, which is the output of the fuzzification layer. The heatmaps of the uncertainty maps on gray-level intensities are generated as shown in Fig. 8.
Heatmaps in Fig. 8 show the uncertainties in different categories. The red areas have high uncertainties, and blue areas have low uncertainties in corresponding categories. To compute the overall categories uncertainty, a fuzzy AND operation is applied as shown in Eq. (8).
The overall categories uncertainty maps on gray-level intensities are shown as the heatmaps in Fig. 9.
From Fig. 9, it can be observed that the pixels on the boundaries between categories have high uncertainties. The pixels in mammary layer and muscle layer have high uncertainties as well. Trainable Sigmoid and Gaussian membership functions can obtain similar results.
Fusion of original convolutional network and fuzzy information: To reduce the uncertainty on the original channel, the overall categories uncertainty maps are fused with the corresponding original channels as shown in Eq. (9).
where are the overall categories uncertainty maps obtained by Eq. (8), and are the original channels of the input. It means if a pixel has high uncertainty, its weight should be reduced.
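The uncertainty representation and fusion steps can be sketched for one pixel as follows. The equations themselves are not reproduced in this text, so every formula here is an assumption chosen only to match the verbal description: per-category uncertainty is taken as 1 - |2m - 1| (zero at memberships 0 and 1, maximal at 0.5), the fuzzy AND is taken as the minimum (one common choice), and fusion down-weights a value by (1 - uncertainty). All names are hypothetical.

```python
def category_uncertainty(membership):
    """Assumed form: 0 when membership is 0 or 1, and 1 when it is 0.5."""
    return 1.0 - abs(2.0 * membership - 1.0)

def overall_uncertainty(memberships):
    """Assumed fuzzy AND (minimum) over the per-category uncertainties."""
    return min(category_uncertainty(m) for m in memberships)

def fuse(value, uncertainty):
    """Assumed fusion: reduce the weight of a pixel with high uncertainty."""
    return value * (1.0 - uncertainty)
```

Under these assumptions the overall uncertainty of a pixel is high only when its membership is ambiguous in every category; other t-norms, such as the product, would behave differently.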
The inputs of the network are three channel images  (shown in Fig. 3 (a)). The first channel is the original image after contrast enhancement; the second channel is the wavelet low frequency information; and the third channel is the wavelet high frequency information. The fuzzification layer, uncertainty representation layer, and fusion layer are applied to all three channels separately. The results after reducing uncertainty are shown in Fig. 10. The boundary areas in (c) and (d) are more distinct than that in (a).
The resulting image is fed into the convolutional layer to obtain the convolutional feature maps. The network structure is similar to U-Net. The first convolutional layer produces 64 feature maps. Fuzzification, uncertainty representation, and fusion with the original information are applied to all 64 feature maps.
II-D Fuzzy information guided fully convolutional network training
The uncertainty maps are multiplied with the input images, and the results are fed into the first convolutional layer. The entire network structure is shown in Fig. 11.
The output is processed by a pixel-wise soft-max, which is defined as:

$$p_k(x) = \frac{\exp(a_k(x))}{\sum_{k'} \exp(a_{k'}(x))} \tag{10}$$

where $a_k(x)$ is the output of the neural network, $k$ represents the class index, and $x$ represents the input pixel. The cross-entropy loss function is computed from the output probability and the label of each pixel:

$$E = -\sum_{x} \sum_{k} y_k(x) \log p_k(x) \tag{11}$$

where $y(x)$ is the pixel label, which is background, tumor area, fat layer, mammary layer, or muscle layer with one-hot encoding. The original parameters in U-Net are initialized from a uniform distribution. If the trainable Sigmoid membership function is used, one parameter in Eq. (4) is initialized by the mean of all training samples in category r, and the other parameter in Eq. (4) is initialized from a uniform distribution. The two parameters in Eq. (5) are initialized by the mean and variance of the intensities of all training samples in category r. Training is based on the back-propagation algorithm, so all of the functions must be differentiable. The functions in the fuzzy layer (trainable Sigmoid function or trainable Gaussian function) are all differentiable. The training strategy is given in Algorithm 1.
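The pixel-wise soft-max and cross-entropy loss can be sketched for a single pixel as follows. This is the standard formulation, written with a log-soft-max for numerical stability; the function names and the plain-list representation of network outputs are assumptions for illustration.

```python
import math

def log_softmax(scores):
    """Log of the soft-max probabilities for one pixel's class scores."""
    m = max(scores)  # subtract the maximum to avoid overflow in exp
    log_total = math.log(sum(math.exp(s - m) for s in scores))
    return [s - m - log_total for s in scores]

def softmax(scores):
    """Soft-max probabilities for one pixel's class scores."""
    return [math.exp(v) for v in log_softmax(scores)]

def pixel_cross_entropy(scores, one_hot):
    """Cross-entropy between a one-hot label and the soft-max output."""
    return -sum(t * lp for t, lp in zip(one_hot, log_softmax(scores)))

def image_cross_entropy(score_map, label_map):
    """Sum of pixel losses; both maps are lists of per-pixel lists."""
    return sum(pixel_cross_entropy(s, y) for s, y in zip(score_map, label_map))
```

With five equal scores the soft-max gives 0.2 for every class, and the loss for any one-hot label is log 5, about 1.609.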
The fuzzy fully convolutional network deals with the following issues: 1) it can reduce the uncertainty, and 2) it can solve small sample size problem, and it can even replace the information extension process  (experimental details will be discussed in Section 4).
III Breast-anatomy constrained post-processing
The fuzzy information guided fully convolutional network (FCN) can perform BUS image segmentation. However, the segmentation results are not good because the dataset is too small and the network structure is quite deep. Fully connected conditional random fields (CRFs) are often utilized to refine segmentation results. Krähenbühl et al. provided an approximation algorithm for fully connected CRFs for multi-object segmentation. The approximation algorithm increases the efficiency of fully connected CRFs and makes them practical for semantic segmentation. Chen et al. proposed the DeepLab structure for natural image semantic segmentation by applying atrous convolution and atrous spatial pyramid pooling (ASPP) [4, 5]. In addition, DeepLab applied fully connected CRFs at the end of the architecture to achieve better performance. Zheng et al. realized CRFs using a recurrent neural network (RNN), which made the FCN+CRFs structure a deep end-to-end architecture. Fully connected CRFs consider the relationships among pixels rather than only classifying pixels into different categories. Both the physical location of a pixel and its features can affect the final segmentation.
In order to involve the label context, Liu et al. provided a Markov random field (MRF) method with the mixture of label context . The MRF was realized by Deep Parsing Network (DPN). In , three categories: tumor, mammary layer, and background were classified. The importance of the locations of mammary layer and tumor was discussed, since most of the breast cancers begin from the cells in the mammary layer .
In this paper, the proposed method is applied to five categories, which means that not only the mammary layer and tumor are determined, but other layers such as the muscle layer, fat layer, and background are also involved.
III-A Fully Connected CRFs
In image segmentation, the pixels are considered as a random field $X = \{X_1, X_2, \dots, X_N\}$, where $X_N$ represents the features of the Nth pixel, and the labels are considered as another random field $Y = \{Y_1, Y_2, \dots, Y_N\}$, where $Y_N$ represents the label of the Nth pixel. The segmentation task is formulated by CRFs through computing the conditional probability $P(Y \mid X)$, which is characterized by a Gibbs distribution:

$$P(Y \mid X) = \frac{1}{Z(X)} \exp\big(-E(Y \mid X)\big), \qquad E(Y \mid X) = \sum_{w \in W} \phi_w(Y_w \mid X) \tag{12}$$

where W is a graph on random field X, and w represents cliques in the graph. In the fully connected CRFs, W is a complete graph on X, and a pixel has edges connected to all other pixels in the image. $E$ in Eq. (12) is the Gibbs energy function, which for a labeling $y$ is written as:

$$E(y) = \sum_{i} \psi_u(y_i) + \sum_{i<j} \psi_p(y_i, y_j) \tag{13}$$

Segmentation can also be treated as an optimum labeling problem, modeled as a maximum a posteriori (MAP) problem. Maximizing the probability is equivalent to minimizing the energy function. The first term in Eq. (13), $\psi_u(y_i)$, is a unary potential function, which is provided by a unary classifier such as an FCN. In the proposed approach, the probability is computed by the neural network in Section 2. $\psi_p(y_i, y_j)$ is the pairwise potential function:

$$\psi_p(y_i, y_j) = \mu(y_i, y_j) \sum_{m} w^{(m)} k^{(m)}(f_i, f_j) \tag{14}$$

where $\mu(y_i, y_j) = 1$ if $y_i \neq y_j$, and $\mu(y_i, y_j) = 0$ if $y_i = y_j$, which is known as the Potts model. This coefficient shows that if two pixels are in the same category, the energy is minimum. $k^{(m)}(f_i, f_j)$ is a Gaussian kernel, where $f_i$ and $f_j$ are the features of pixels $i$ and $j$. $w^{(m)}$ is the combination weight of the mth Gaussian kernel. Two Gaussian kernels are used. The first Gaussian kernel is defined on both the physical positions and the color features of the pixels. In this research, the color feature (RGB) represents the gray-level information in the R channel, the approximation coefficients of the wavelet transform in the G channel, and the high frequency information of the wavelet transform in the B channel. If the input image is not preprocessed, only intensity combined with position is used. The second Gaussian kernel is defined only on the positions of pixels. The detail of the pairwise potential function is shown in Eq. (15):

$$k(f_i, f_j) = w^{(1)} \exp\left(-\frac{\lVert p_i - p_j \rVert^2}{2\theta_\alpha^2} - \frac{\lVert I_i - I_j \rVert^2}{2\theta_\beta^2}\right) + w^{(2)} \exp\left(-\frac{\lVert p_i - p_j \rVert^2}{2\theta_\gamma^2}\right) \tag{15}$$

where $p_i$ represents the position of the ith pixel, and $I_i$ represents the color feature of the ith pixel. $\theta_\alpha$, $\theta_\beta$, and $\theta_\gamma$ are the parameters of CRFs determined by experiments. The first term is the appearance kernel. It encourages two pixels with similar color features and close positions to be in the same category; otherwise, it classifies them into different categories. The second term is the smoothness kernel, which depends only on pixel position. It helps to smooth the segmentation result.
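The pairwise potential of the fully connected CRFs can be sketched for one pair of pixels as follows. It follows the usual appearance-plus-smoothness form of dense CRFs; the parameter names (`w_app`, `w_smooth`, `theta_alpha`, `theta_beta`, `theta_gamma`) and function names are hypothetical, and a practical implementation would rely on efficient approximate inference rather than evaluating every pair.

```python
import math

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def pairwise_kernel(p_i, p_j, f_i, f_j, w_app, w_smooth,
                    theta_alpha, theta_beta, theta_gamma):
    """Weighted sum of an appearance kernel and a smoothness kernel.

    p_i, p_j: pixel positions; f_i, f_j: color (or intensity) features.
    """
    appearance = math.exp(-sq_dist(p_i, p_j) / (2.0 * theta_alpha ** 2)
                          - sq_dist(f_i, f_j) / (2.0 * theta_beta ** 2))
    smoothness = math.exp(-sq_dist(p_i, p_j) / (2.0 * theta_gamma ** 2))
    return w_app * appearance + w_smooth * smoothness

def pairwise_potential(label_i, label_j, kernel_value):
    """Potts model: a penalty is paid only when the two labels differ."""
    return kernel_value if label_i != label_j else 0.0
```

Nearby pixels with similar features produce a large kernel value, so assigning them different labels is expensive, whereas distant or dissimilar pixels contribute almost nothing.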
III-B Medical knowledge in BUS images
The fully connected CRFs in Section 3.1 are popular for natural image segmentation. In this research, the target images are BUS images, which contain special regular patterns. As shown in Fig. 1 (b), BUS images have the following properties: 1) The anatomy of the human breast consists of skin (SK), subcutaneous fat (SF), intraglandular fat (IF), glandular tissue, retromammary fat (RF), and muscle. We can simplify the breast model as the layer structure shown in Fig. 1 (b). The skin layer is on top. The subcutaneous fat layer is beneath the skin layer. The mammary layer is below the fat layer and is followed by the muscle layer. 2) Breast cancer is usually ellipse-shaped and begins from the cells in the mammary layer. In most cases, breast cancer stays inside the mammary layer.
In this study, the skin layer is treated as background because the number of samples containing skin layer is quite small. However, due to the position, the skin layer is different from the retro-muscle background area. In order to make the context of different layers more reasonable, the skin layer is treated as pre-fat background area. In general, the contexts of pre-fat background area, fat layer, mammary layer, muscle layer, retro-muscle layer, and breast tumor are used. represents the category of pixel i assigned by FCN, . , , , , , and represent pre-fat background area, fat layer, mammary layer, muscle layer, retro-muscle layer, and tumor, respectively (Fig. 12 (b)). In this research, they are three-dimensional vectors.
III-C Breast-anatomy constrained fully connected CRFs
As discussed in Section 3.2, the breast cancer usually begins in the mammary layer. However, some pixels in the fat layer and muscle layer might be classified into wrong categories; in addition, the pixels in muscle layer have similar intensity levels to that of the pixels in mammary layer which may also cause misclassification. The medical knowledge can overcome this problem. After locating the positions of the fat layer, mammary layer and muscle layer, the context information can be used to prevent the wrongly classified patches in each layer. The original fully connected CRFs contain the energy function in Eq. (14)-(15). The Gaussian kernel in Eq. (15) consists of pixel positions and color features. To involve breast anatomy, the category of pixel i assigned by FCN, which is defined as , is treated as another feature. A new Gaussian kernel based on and position of the pixel () is utilized. The new energy function contains three terms:
where is the Gaussian kernel of the context information, and and represent categories of pixel i and j assigned by the fully convolutional neural network in Section 2. and are the parameters of CRFs.
Here, two context distances are defined: 1) context distance between two pixels, where and represent pixel and , and 2) context distance between two categories, where and represent the category indexes. In this research, , , . is the Euclidean distance of the two category vectors. is the context distance between category of pixel and category of pixel . For example, if the pixel is in category , and pixel is in category , equals to .
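The context term can be sketched as an extra Gaussian kernel over pixel position and category vector. The category vectors below are placeholders, not the values derived in the paper, and the kernel form (a weighted Gaussian over squared position distance and squared context distance) is an assumption based on the verbal description; the names `CATEGORY_VECTORS`, `theta_pos`, and `theta_ctx` are hypothetical.

```python
import math

# Placeholder 3-D category vectors (hypothetical values for illustration).
CATEGORY_VECTORS = {
    "fat": (0.0, 0.0, 0.0),
    "mammary": (1.0, 0.0, 0.0),
    "muscle": (2.0, 0.0, 0.0),
}

def context_distance(cat_i, cat_j):
    """Euclidean distance between the vectors of two categories."""
    return math.dist(CATEGORY_VECTORS[cat_i], CATEGORY_VECTORS[cat_j])

def context_kernel(p_i, p_j, cat_i, cat_j, weight, theta_pos, theta_ctx):
    """Assumed context kernel: large when two pixels are close in space
    and their FCN-assigned categories are close in context distance."""
    pos_sq = math.dist(p_i, p_j) ** 2
    ctx_sq = context_distance(cat_i, cat_j) ** 2
    return weight * math.exp(-pos_sq / (2.0 * theta_pos ** 2)
                             - ctx_sq / (2.0 * theta_ctx ** 2))
```

The context distance between two pixels is simply the context distance between the categories that the unary classifier assigned to them, which is why the kernel takes category names rather than raw features.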
To demonstrate how to utilize the context distance between pixels and categories and its effectiveness on BUS image segmentation, simulated images are utilized shown in Fig. 13. In Fig. 13 (a), , and represent three categories. represents a wrongly classified patch, which should be in , but assigned to by the unary classifier. If the context distances among three categories are set as: the context distance between and equals to the context distance between and ; the context distance between and is greater than the context distance between and ; then pixels in has the chance to be corrected into . Here, four pixels are chosen to demonstrate how it works: 1) pixel in area ; 2) pixel in area ; 3) pixel in area ; 4) pixel in area . Pixel is in area and area is now in category ; pixel is in area , so the context distance between pixel and pixel equals to the context distance between categories and as introduced in the previous paragraph, i.e. . For other pixels, the situations are the same, i.e. ; . Therefore, because of the assumption made before. Meanwhile, because of the physical positions of these pixels, where , , and are the positions of these pixels in Eq. (16). Hence, the pixels in area have smaller context distances with pixels in than that in ; and the pixels in area have smaller space distances with the pixels in than that in , which means pixels in area could not be in category . Even if the pixels in area have zero context distances with the pixels in (i.e. they are in the same category), they have smaller space distances with the pixels in than that in . Therefore, the pixels in area still have the possibility to be classified into category . In Fig. 13 (b), the pixels in area are wrongly classified into category and the pixels in area have the same context distances with the pixels in and because of the assumption before, but they have smaller space distances with the pixels in than that in . 
Even if the pixels in area have zero context distances with the pixels in , their space distances with the pixels in are smaller than that with the pixels in . Therefore, the pixels in area still have the chance to be classified into by properly setting weight and parameters and in Eq. (16). The BUS images (Fig. 12 (a) and (b)) are similar to the simulated examples.
The context distances between the categories in the BUS images can be grouped into three classes (Fig. 12 (b)): 1) the two layers are neighbors, e.g., the fat layer and the mammary layer; 2) the two layers are separated by one other layer, such as the fat layer and the muscle layer; 3) the two layers are separated by two layers, such as the fat layer and the retro-muscle background area.
The relations among them are:
The relations in Eq. (18) are set to follow the situation in the simulated example. They encourage clear boundaries and avoid wrongly classified patches such as those in Fig. 13. The pre-fat background area and the retro-muscle background area are both treated as background in the FCN. Here, they are treated as different labels for easier description. Their space distance is large and plays a more important role than the context distance in this term, so their context distance is not involved. After defining the context distances among the five layers, the context distances between the tumor and the five layers can be defined. The tumor is usually located in the mammary layer. Sometimes, the mammary layer above or below the tumor is very thin, and the tumor seems to be in the fat layer or the muscle layer. The context distance between the tumor and the mammary layer should be the largest, because this encourages a clear boundary between them. The context distances between the tumor and the fat layer or the muscle layer should be the second largest. This gives the chance to correct some wrongly classified patches in these layers. The context distance between the tumor and the background (the pre-fat and retro-muscle areas) should be the smallest, because some background areas are likely to be classified as tumors, and such a situation should be avoided (see Fig. 13). The relationships are shown in Eq. (19).
The six category vectors (one for each of the five layers and one for the tumor) should satisfy the constraints in Eqs. (17)-(19) to realize the medical anatomy constraints. Solving Eqs. (17)-(19) gives the values of these vectors. The relations among the context labels are shown in Fig. 14. If a pixel is classified into a category, a category map is created and the corresponding pixel in the category map is assigned the value of that category's vector. The category map is used as another feature in Eq. (16).
By setting the label vectors with these values, the proposed CRFs energy function encourages two pixels whose space distances and context distances are both small to be in the same category. It will remove some wrongly classified patches.
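The category map and the tumor-related ordering of context distances described above can be illustrated with a minimal sketch. The 2-D vectors below are hypothetical placeholders chosen only so that the tumor ordering stated in the text holds; they are not the values obtained in the paper, the layer-to-layer relations of Eq. (18) are not checked, and the use of the Euclidean distance between category vectors is an assumption.

```python
import math

# Hypothetical 2-D category vectors (NOT the paper's values). They are chosen
# only so that the tumor ordering described in the text is satisfied.
CATEGORY_VECTORS = {
    "background_upper": (-1.0, 2.5),
    "fat": (-1.5, 1.0),
    "mammary": (0.0, 0.0),
    "muscle": (1.5, 1.0),
    "background_lower": (1.0, 2.5),
    "tumor": (0.0, 3.0),
}


def context_distance(a, b, vectors=CATEGORY_VECTORS):
    """Euclidean distance between the category vectors of labels a and b
    (the Euclidean form is an assumption of this sketch)."""
    return math.dist(vectors[a], vectors[b])


def satisfies_tumor_ordering(vectors=CATEGORY_VECTORS):
    """Check d(tumor, mammary) > d(tumor, fat or muscle) > d(tumor, background)."""
    to_mammary = context_distance("tumor", "mammary", vectors)
    to_fat_muscle = [context_distance("tumor", k, vectors)
                     for k in ("fat", "muscle")]
    to_background = [context_distance("tumor", k, vectors)
                     for k in ("background_upper", "background_lower")]
    return (to_mammary > max(to_fat_muscle)
            and min(to_fat_muscle) > max(to_background))


def category_map(label_map, vectors=CATEGORY_VECTORS):
    """Replace every predicted label by the vector of its category."""
    return [[vectors[label] for label in row] for row in label_map]


labels = [["fat", "fat"], ["mammary", "tumor"]]
cmap = category_map(labels)  # cmap[1][1] is the tumor vector
```

A category map built this way could then be supplied to the pairwise term as an extra per-pixel feature alongside the pixel position.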
IV Experimental results
The performance of the proposed fuzzy FCN and the breast-anatomy constrained fully connected CRFs is evaluated on a dataset of 325 images. Images 1 to 141 were collected over 10 years by the Second Affiliated Hospital of Harbin Medical University using VIVID 7 (GE) and EUB-6500 (Hitachi) imaging systems. Images 142 to 325 were collected over the most recent 3 years by the First Affiliated Hospital of Harbin Medical University using the Aixplorer Ultrasound system (SuperSonic Imagine). The resolution of the first 141 images is 550 × 450, and the remaining 184 images have a resolution of 787 × 526. Informed consent to the protocol was obtained from all patients, and the privacy of the patients is well protected.
An experienced radiologist from the First Affiliated Hospital of Harbin Medical University delineated the boundaries of the layers and tumors. The pixel-wise ground truths are generated from the manually delineated boundaries. The experiment is conducted in two steps. In the first step, the fuzzy FCN is applied and compared with U-Net, FCN, and the information extension method. In the second step, the breast-anatomy constrained fully connected CRFs are applied, and the overall performance of the proposed method is computed. The final tumor segmentation results are compared with those of the existing methods [40, 12, 21, 34, 39, 31, 32, 28].
IV-B Evaluation metrics
Three area metrics are used to evaluate the performance: true positive rate (TPR), false positive rate (FPR), and intersection over union (IoU) [37, 28]. The IoU is computed for every category, and the mean of the five category IoUs is used as the overall performance. The TPR and FPR for tumors are used for comparison with the previous tumor segmentation methods. Because the number of samples is limited, 10-fold cross-validation is used: the samples are randomly divided into 10 subsets and, each time, 9 of them are used for training and 1 is used for testing. The metrics are computed by Eq. (20):
where is the region generated by the proposed method or existing methods, and is the region of the ground truth.
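As an illustration of how such area metrics can be computed, the sketch below represents each region as a set of pixel coordinates. The definitions are an assumption based on the area metrics commonly used in the BUS segmentation literature, in which the false positive area is normalized by the ground-truth area rather than by the number of true negatives; the function and variable names are hypothetical.

```python
def area_metrics(generated, truth):
    """Return (TPR, FPR, IoU) for two regions given as sets of (row, col).

    Assumed definitions (the ground-truth region must be non-empty):
      TPR = |generated & truth| / |truth|
      FPR = |generated - truth| / |truth|
      IoU = |generated & truth| / |generated | truth|
    """
    if not truth:
        raise ValueError("the ground-truth region is empty")
    overlap = len(generated & truth)
    tpr = overlap / len(truth)
    fpr = len(generated - truth) / len(truth)
    iou = overlap / len(generated | truth)
    return tpr, fpr, iou


def mean_iou(per_category_iou):
    """Overall performance: the mean of the per-category IoU values."""
    return sum(per_category_iou) / len(per_category_iou)


def rectangle(top, left, height, width):
    """Helper for the example: all pixels of an axis-aligned rectangle."""
    return {(r, c)
            for r in range(top, top + height)
            for c in range(left, left + width)}


truth = rectangle(0, 0, 4, 4)       # 16 ground-truth pixels
generated = rectangle(0, 2, 4, 4)   # 16 generated pixels, 8 of them overlap
tpr, fpr, iou = area_metrics(generated, truth)
```

With this normalization the FPR can exceed 1 when the generated region is much larger than the ground truth, which is why the TPR and FPR are read together.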
IV-C Segmentation result of fuzzy FCN
To show the effectiveness of the fuzzy logic, the proposed fuzzy layer is applied to U-Net. The feature extension method is also employed for comparison: histogram equalization is applied to the input images, and then the wavelet transform is applied. Five networks are trained: 1) the U-Net trained on the original gray-level images; 2) the U-Net trained on the images after preprocessing and wavelet transform; 3) the fuzzy network, in which the fuzzification layer is applied, the output of the uncertainty representation layer is combined with the input image, and the fuzzy layer is also applied to the feature maps of the first convolutional layer; 4) the same fuzzy network with the wavelet transform removed and only the gray-level BUS image used as the input, to demonstrate the existence of uncertainty in BUS images and the effectiveness of the fuzzy layer; and 5) the FCN with the VGG16 network structure, using the original gray-level image as input and pretrained on natural images.
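The experiments in this section compare Sigmoid and Gaussian membership functions in the fuzzification layer. As a rough illustration of the two function shapes only, the sketch below evaluates both on normalized pixel intensities. The parameter names (`slope`, `center`, `width`), the fixed parameter values, and the `fuzzify` helper are assumptions for illustration; in the proposed network the membership parameters are trainable, and the layer itself is defined in the earlier sections rather than here.

```python
import math


def sigmoid_membership(x, slope, center):
    """Sigmoid membership degree in (0, 1), increasing with x for slope > 0."""
    return 1.0 / (1.0 + math.exp(-slope * (x - center)))


def gaussian_membership(x, center, width):
    """Gaussian membership degree in (0, 1], peaking at x == center."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def fuzzify(image, memberships):
    """Map every pixel to a list of membership degrees, one per function."""
    return [[[m(p) for m in memberships] for p in row] for row in image]


# Hypothetical fixed parameters; a trainable layer would learn them instead.
bright = lambda p: sigmoid_membership(p, slope=10.0, center=0.5)
mid_gray = lambda p: gaussian_membership(p, center=0.5, width=0.1)

image = [[0.0, 0.5], [0.5, 1.0]]          # intensities normalized to [0, 1]
degrees = fuzzify(image, [bright, mid_gray])
```

Each pixel thus becomes a small vector of membership degrees, one way in which the uncertainty of a gray level can be exposed to the following layers.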
Fig. 15 (b) shows the pixel-wise ground truths: the black area is the background, the green area is the fat layer, the yellow area is the mammary layer, the blue area is the muscle layer, and the red area is the tumor. Fig. 15 (c) shows the results of the U-Net with the original gray-level images as inputs, which are the worst. Using the information extension method can improve the performance in some cases (Fig. 15 (d)); however, adding wavelet information sometimes makes the results worse. When fuzzy processing is added to reduce the uncertainty in the 3-channel input images, the results are better (Fig. 15 (e) and (f)). Even without the wavelet transform and preprocessing, the fuzzy FCN still achieves good results. Table I lists the evaluation of the 6 networks. The proposed methods improve BUS image semantic segmentation. The tumor IoU is 78.53% using the fuzzy FCN with 3-channel image and trainable Sigmoid membership function, about 4 percentage points higher than that of the non-fuzzy FCN. The overall IoU over the 5 categories is 78.32% with the same network, 0.7 percentage points higher than that of the non-fuzzy FCN.
|U-Net  with original image||70.34||66.72||66.17||65.91||74.66||68.76|
|U-Net with 3-channel image ||84.05||75.92||74.89||78.35||74.88||77.62|
|FCN-VGG16  with original image using pretrained model||82.57||75.47||75.53||78.59||74.42||77.32|
|Fuzzy FCN with 3-channel image and Sigmoid membership function||84.07||76.01||74.62||78.39||78.53||78.32|
|Fuzzy FCN with 3-channel image and Gaussian membership function||83.47||74.73||73.95||77.51||75.32||77.00|
|Fuzzy FCN with original image and Sigmoid membership function||82.56||76.14||74.64||75.98||77.56||77.38|
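TABLE I: Evaluation results (IoU, %) on the 325-image dataset. The first four numeric columns are the non-tumor categories, the fifth is the tumor, and the last is the mean over the five categories.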
Moreover, without the information extension method (using only the original images as inputs, with no preprocessing or wavelet transform), the fuzzy FCN (the bottom row in Table I) achieves a tumor IoU about 3 percentage points higher than that of the non-fuzzy FCN with gray-level image. Its overall IoU is close to that of the non-fuzzy FCN with 3-channel image and is about 8.6 percentage points higher than that of the non-fuzzy FCN with gray-level image (77.38% versus 68.76%).
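The comparisons drawn from Table I are simple arithmetic on the per-category IoU values, and a short sketch makes the computation explicit. The numbers are copied from three rows of Table I; the dictionary keys and function names are hypothetical, and the gains are absolute differences in percentage points.

```python
# Per-category IoU values (%) copied from Table I; the last entry of each
# list is the tumor IoU.
TABLE_I = {
    "U-Net, original image": [70.34, 66.72, 66.17, 65.91, 74.66],
    "U-Net, 3-channel image": [84.05, 75.92, 74.89, 78.35, 74.88],
    "Fuzzy FCN, original image, Sigmoid": [82.56, 76.14, 74.64, 75.98, 77.56],
}


def mean_iou(values):
    """Overall IoU: the mean over the five category IoUs."""
    return sum(values) / len(values)


def gain(method, baseline):
    """Absolute gains (percentage points) in tumor IoU and in mean IoU."""
    a, b = TABLE_I[method], TABLE_I[baseline]
    return a[-1] - b[-1], mean_iou(a) - mean_iou(b)


tumor_gain, overall_gain = gain("Fuzzy FCN, original image, Sigmoid",
                                "U-Net, original image")
print(f"tumor IoU gain: {tumor_gain:.2f} points")
print(f"mean IoU gain:  {overall_gain:.2f} points")
```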
IV-D Breast-anatomy constrained fully connected CRFs
Breast-anatomy constrained fully connected CRFs utilize the medical context information. The original fully connected CRFs and the associated approximation algorithm are employed. The proposed CRFs have three effects: 1) correcting the wrongly classified pixels; 2) making the boundaries between layers more accurate; and 3) increasing the overall segmentation performance. The five CRFs parameters are determined by experiments, and the medical context labels and the context distance relations are shown in Fig. 14. The segmentation results are shown in Fig. 16. In Table II, the output of the fuzzy FCN with 3-channel image and trainable Sigmoid membership function is used as the unary energy in the CRFs model, and the original CRFs and the proposed CRFs are compared.
|Fuzzy FCN with 3-channel image and Sigmoid membership function + CRFs||81.52||78.63||75.24||76.48||79.32||78.24|
|Fuzzy FCN with 3-channel image and Sigmoid membership function||84.07||76.01||74.62||78.39||78.53||78.32|
|Fuzzy FCN with 3-channel image and Sigmoid membership function + Proposed CRFs||85.06||77.24||78.66||80.09||81.29||80.47|
In Fig. 16 (c), the original CRFs classify some pixels wrongly. For example, in the fourth row, some pixels in the fat layer are classified as tumor. Similarly, in the second row, the muscle layer grows into the mammary layer, and in the third row, the background area and the fat layer interlace with each other. The proposed CRFs utilize the medical context constraints to overcome such problems.
Table II shows the IoU of each category and the overall mean IoU. The proposed method achieves a tumor IoU of 81.29% and an overall IoU of 80.47%. For both the tumor IoU and the overall IoU, the proposed method is about 2 percentage points higher than the original CRFs.
IV-E Tumor segmentation results and comparison with existing segmentation methods
The existing methods focus only on tumor segmentation, while semantic segmentation methods work on multi-object segmentation. In this section, the proposed method is compared with the methods in [19, 21, 39, 31, 32]. The semi-automatic BUS image segmentation methods [19, 21] are given the regions of interest (ROIs) and then segment the tumor areas automatically. The methods in [39, 31, 32] are fully automatic. The tumor segmentation results are shown in Fig. 17.
In Fig. 17, the semi-automatic segmentation methods (Fig. 17 (c) and (d)) obtain good results. Semi-automatic segmentation methods are useful when doctors focus on specific areas and operate the CAD systems interactively. The existing fully automatic segmentation methods obtain worse results, since their performance relies on the individual dataset: they obtain good performance only on their own datasets and need a huge number of training samples. The proposed method achieves the best result even on a small dataset, and it is more robust than the other methods in the comparison.
Table III shows that the proposed method achieves the best result among the compared methods. Furthermore, the proposed method can process BUS images without tumors. The previous fully automatic methods cannot handle this case, since all of them rely on the prerequisite that there is one and only one tumor in the image. As shown in Fig. 18, the two samples do not contain tumors. Fig. 18 (c)-(e) are the results of the previous fully automatic methods [39, 31, 32]; the white areas are the regions reported as tumors by the three methods, i.e., the methods do not work well on these images.
As Fig. 18 (f) shows, the proposed method classifies the layers in the BUS images well and therefore provides medical context. Once the layers in the breast are located, the anatomy of the breast is known, which is beneficial to tumor detection and segmentation.
The existing fully automatic segmentation methods [39, 31, 32] cannot handle multi-tumor cases either. In Fig. 19, the three BUS images are not in our dataset, and each image contains 2 tumors. The first image was collected by a doctor of the First Affiliated Hospital of Harbin Medical University; the second is from a public dataset; and the third is taken from another source.
In Fig. 19, the existing methods (Fig. 19 (c)-(e)) detect only one tumor in each image, i.e. they cannot obtain good results for images containing more than one tumor; the proposed method, however, can (Fig. 19 (f)).
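A pixel-wise label map does not presuppose a fixed number of tumors, which can be made concrete by counting the connected tumor regions in a semantic segmentation output. The sketch below is an illustrative assumption, not a step of the proposed method: it uses 4-connectivity and hypothetical string labels on small toy label maps.

```python
from collections import deque


def count_regions(label_map, target):
    """Count the 4-connected regions of `target` in a 2-D list of labels."""
    rows = len(label_map)
    cols = len(label_map[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if label_map[r][c] != target or seen[r][c]:
                continue
            # A new, unvisited target pixel starts a new region.
            regions += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and label_map[ny][nx] == target):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions


M, T = "mammary", "tumor"
two_tumors = [
    [M, T, M, M, M],
    [M, T, M, T, T],
    [M, M, M, M, M],
]
no_tumor = [[M, M], [M, M]]

print(count_regions(two_tumors, T))  # two separate tumor regions
print(count_regions(no_tumor, T))    # no tumor region at all
```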
In this paper, a novel BUS image semantic segmentation method is proposed. The approach consists of two steps: first, the fuzzy FCN produces the semantic segmentation; second, breast-anatomy constrained conditional random fields fine-tune the segmentation result. The experimental results demonstrate that the proposed fuzzy FCN handles the uncertainty well, and that its robustness and accuracy are better than those of the non-fuzzy FCN.
The proposed method achieves better results for the following reasons: 1) it uses fuzzy logic to handle the uncertainty in the original image and in the feature maps of the convolutional layers; 2) the fuzzy approach provides more information; and 3) it provides anatomy information to the fully connected CRFs, which increases the segmentation accuracy. Two potential improvements remain: 1) using fuzzy logic to handle the uncertainty in other convolutional layers and in the loss function; and 2) including more anatomy information, since the anatomy context model of the human breast is very complex.
- (2010) Fast high-dimensional filtering using the permutohedral lattice. In Computer Graphics Forum, Vol. 29, pp. 753–762.
- (2019) (Website).
- Mitosis detection in breast cancer histology images via deep cascaded networks. In Thirtieth AAAI Conference on Artificial Intelligence, pp. 1167–1173.
- (2018-04) DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4), pp. 834–848.
- (2015) Semantic image segmentation with deep convolutional nets and fully connected crfs. In Proceedings of International Conference on Learning Representations.
- (2010) Automated breast cancer detection and classification using ultrasound images: a survey. Pattern Recognition 43 (1), pp. 299–317.
- (2000) A novel fuzzy logic approach to contrast enhancement. Pattern Recognition 33 (5), pp. 809–819.
- (2016) Computer-aided diagnosis with deep learning architecture: applications to breast lesions in us images and pulmonary nodules in ct scans. Scientific Reports 6, pp. 24454.
- (2006) Fuzzy c-means clustering with spatial information for image segmentation. Computerized Medical Imaging and Graphics 30 (1), pp. 9–15.
- (2017-08) A hierarchical fused fuzzy deep neural network for data classification. IEEE Transactions on Fuzzy Systems 25 (4), pp. 1006–1012.
- (2016) Breast cancer statistics, 2015: convergence of incidence rates between black and white women. CA: A Cancer Journal for Clinicians 66 (1), pp. 31–42.
- (2018-08) Medical knowledge constrained semantic breast ultrasound image segmentation. In 2018 24th International Conference on Pattern Recognition (ICPR), pp. 1193–1198.
- (2017) Breast ultrasound image segmentation: a survey. International Journal of Computer Assisted Radiology and Surgery 12 (3), pp. 493–507.
- (2015) Automatic segmentation of breast lesions for interaction in ultrasonic computer-aided diagnosis. Information Sciences 314, pp. 293–310.
- (2017-12) Automated breast cancer diagnosis using artificial neural network (ann). In 2017 3rd Iranian Conference on Intelligent Systems and Signal Processing (ICSPIS), pp. 54–58.
- (2011) Efficient inference in fully connected crfs with gaussian edge potentials. In Advances in Neural Information Processing Systems, pp. 109–117.
- (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, pp. 1097–1105.
- (1998-11) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324.
- (2010) Probability density difference-based active contour for ultrasound image segmentation. Pattern Recognition 43 (6), pp. 2028–2042.
- (2019-02) An end-to-end deep learning histochemical scoring system for breast cancer tma. IEEE Transactions on Medical Imaging 38 (2), pp. 617–628.
- (2012) An effective approach of lesion segmentation within the breast ultrasound image based on the cellular automata principle. Journal of Digital Imaging 25 (5), pp. 580–590.
- Semantic image segmentation via deep parsing network. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1377–1385.
- (2015-06) Fully convolutional networks for semantic segmentation. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3431–3440.
- (2018) (Website).
- (2014-12) Edge-detection method for image processing based on generalized type-2 fuzzy logic. IEEE Transactions on Fuzzy Systems 22 (6), pp. 1515–1525.
- (2014) Tumor detection in automated breast ultrasound images using quantitative tissue clustering. Medical Physics 41 (4), pp. 042901.
- (2005) Anatomy of the lactating human breast redefined with ultrasound imaging. Journal of Anatomy 206 (6), pp. 525–534.
- (2013) Multi-feature gradient vector flow snakes for adaptive segmentation of the ultrasound images of breast cancer. Journal of Visual Communication and Image Representation 24 (8), pp. 1414–1430.
- (2015) U-net: convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi (Eds.), Cham, pp. 234–241.
- (2012) A novel segmentation method for breast ultrasound images based on neutrosophic l-means clustering. Medical Physics 39 (9), pp. 5669–5682.
- (2012) Completely automated segmentation approach for breast ultrasound images using multiple-domain features. Ultrasound in Medicine & Biology 38 (2), pp. 262–275.
- (2015-09) A saliency model for automated tumor detection in breast ultrasound images. In 2015 IEEE International Conference on Image Processing (ICIP), pp. 1424–1428.
- (2010) Various types and management of breast cancer: an overview. Journal of Advanced Pharmaceutical Technology & Research 1 (2), pp. 109–126.
- (2014-08) A fully automatic breast ultrasound image segmentation approach based on neutro-connectedness. In 2014 22nd International Conference on Pattern Recognition, pp. 2495–2500.
- (2012-09) Multiple-domain knowledge based mrf model for tumor segmentation in breast ultrasound images. In 2012 19th IEEE International Conference on Image Processing, pp. 2021–2024.
- (2018-01) A Benchmark for Breast Ultrasound Image Segmentation (BUSIS). arXiv e-prints, arXiv:1801.03182.
- (2018) Automatic breast ultrasound image segmentation: a survey. Pattern Recognition 79, pp. 340–355.
- (2015) Fully automatic segmentation of breast ultrasound images based on breast characteristics in space and frequency domains. Pattern Recognition 48 (2), pp. 485–497.
- (2015) Fully automatic segmentation of breast ultrasound images based on breast characteristics in space and frequency domains. Pattern Recognition 48 (2), pp. 485–497.
- (2018-07) Automated breast ultrasound lesions detection using convolutional neural networks. IEEE Journal of Biomedical and Health Informatics 22 (4), pp. 1218–1226.
- (1997-02) Contrast enhancement using brightness preserving bi-histogram equalization. IEEE Transactions on Consumer Electronics 43 (1), pp. 1–8.
- (2015) Conditional random fields as recurrent neural networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1529–1537.
- (2018-04) Adversarial deep structured nets for mass segmentation from mammograms. In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), pp. 847–850.