According to the World Health Organization, breast cancer is one of the most common cancers worldwide [Stewart2014]. Early detection is essential for more effective treatment and a lower death rate. Although various diagnostic tools can be used to detect breast cancer, there is growing interest in ultrasound imaging. Ultrasound is non-invasive, relatively inexpensive and broadly accessible, whereas mammography has limited sensitivity in women with dense breasts. Several studies [Kolb2002, Stavros1995, Zhi2007] have demonstrated that ultrasound can be used successfully for breast cancer detection. However, diagnosis by means of ultrasound imaging requires experienced radiologists who can operate the ultrasound scanner efficiently and who are familiar with the heterogeneity of breast cancer and the complex features it produces on ultrasound images. As a consequence, many unnecessary biopsies are performed.
To standardize reporting and diagnosis, the American College of Radiology developed a quality control system called BI-RADS (Breast Imaging – Reporting and Data System), which is now widely accepted and used by physicians [Bott2014]. After the ultrasound image of the lesion has been interpreted, a specific BI-RADS category is assigned which reflects the risk of malignancy. However, this assessment still depends on the ultrasonographer’s experience and ability to interpret the ultrasound image correctly. Therefore, computer-aided diagnosis (CAD) systems are being investigated to improve breast lesion classification and to support physicians, especially inexperienced ones.
The main goal of CAD is to develop a computer program able to differentiate breast lesions based on the analysis of ultrasound images [Cheng2010]. The common approach is to extract features from the image region containing the lesion and then to develop a classifier using machine learning methods. Well-chosen features are the most important part of every CAD system. So far, various sets of features have been proposed in the literature for breast lesion classification. These features are primarily divided into two categories, namely texture features and morphological features. Morphological features are currently considered the most effective in breast cancer classification [GomezFlores2015], though good performance has also been reported for other features. Morphological features assess the lesion contour. Generally, a more irregular contour is expected in the case of malignant lesions.
Morphological features have some clear advantages, especially in comparison with texture features. They are less affected by the image processing algorithms used for B-mode image reconstruction. Many CAD systems were developed based on B-mode images acquired with a single ultrasound machine, yet usually little is known about the image reconstruction algorithms implemented in the scanner. Most ultrasound image enhancement algorithms process texture intensively [ContrerasOrtiz2012], which may have a negative impact on classification performed with texture features. In contrast, edge preservation and emphasis is one of the main goals of image processing algorithms, which places morphological features in a far better position than texture features. Texture features also depend on the operator and on particular machine settings, e.g. focal depth positioning [Garra1993]. In addition, estimation of these features can be affected by the presence of calcifications or necrosis within the lesion [Byra2016, Larrue2014917].
The aim of this work is to combine the BI-RADS with morphological features to improve the classification. CAD papers usually do not utilize the BI-RADS categories assigned by the radiologist. This practice is understandable. While the process of feature extraction is well defined mathematically, the assignment of a specific BI-RADS category depends on the radiologist’s experience. This subjective assessment may affect the CAD system performance and make comparison with other CAD systems problematic. On the other hand, it is of great importance to investigate whether a CAD system can support physicians. The widely used BI-RADS has limitations which might be overcome with CAD. Here, we investigate whether morphological features can improve the BI-RADS classification performance or whether they are redundant. First, the radiologist assigned a BI-RADS category to each lesion. Next, morphological features were selected and combined with the BI-RADS to improve the classification. The performance of the combined classifier was compared with that of the BI-RADS alone and with that of the best performing morphological feature subset.
This paper is organized as follows. In the first section, the breast lesion database and the acquisition procedure are described. Next, we list the investigated morphological features together with the papers in which they were originally proposed or later used. We present the scheme for feature selection and then the best performing feature subset. The same scheme is used to find which morphological features may improve the performance of the BI-RADS. Finally, we discuss the results and present conclusions.
II Materials and Methods
II-A Dataset and preprocessing
The database consists of 214 images of 107 solid lesions, 75 of which are benign and 32 malignant. For each lesion, two perpendicular scan planes were acquired during routine breast diagnostic procedures. An Ultrasonix scanner (Ultrasonix Inc., Canada) equipped with an L14-5/38 linear array probe was used to collect the data. The focal depth was positioned at the center of each lesion. The imaging frequency was set to 7.5 MHz. Each lesion was biopsy proven. Before the biopsy, however, a specific BI-RADS category was assigned by a radiologist with 17 years’ experience in the ultrasonic diagnosis of breast lesions. The BI-RADS has 7 categories which reflect the likelihood of malignancy [Bott2014, W.2012]:
3: probably benign
4: suspicious (4a - low suspicion, 4b - intermediate suspicion, 4c - moderate suspicion)
5: probably malignant
|1||Angular characteristics [Shen2007]|
|2||Area ratio [Alvarenga2012, Alvarenga2010, Chou2001]|
|3||Aspect ratio [Chang2005]|
|4||Branch pattern [Joo2004]|
|5||Circularity [Chang2005, Alvarenga2010, Chou2001, Drukker2004a, Kim2002, Drukker2005]|
|6||Contour roughness [Alvarenga2012, Alvarenga2010, Chou2001]|
|8||DWR [Drukker2004a, Drukker2002, Drukker2005, Horsch2002a, Chen2003, Chen2004]|
|9||Ellipsoidal shape [Joo2004, Su2011]|
|10||Elliptic-normalized circumference [Chen2003]|
|11||Elliptic-normalized skeleton [Chen2003]|
|13||Lesion size [Chen2003]|
|14||Lobulation index [Chen2003]|
|15||Long to short axis ratio [Chen2003]|
|16||Morphological closing ratio [Alvarenga2010]|
|17||Normalized residual value [Alvarenga2010]|
|18||NRL entropy [Chou2001]|
|19||NRL mean [Chou2001]|
|20||[Alvarenga2012, Alvarenga2010, Chou2001]|
|21||NRL zero-crossing [Chou2001]|
|22||Number of lobulations [Joo2004, Su2011]|
|23||Number of substantial protuberances and depressions [Chen2003]|
|25||Overlap ratio [Alvarenga2010]|
|27||Shape class [Shen2007, Minavathi2012]|
|29||Spiculation [Joo2004, Su2011]|
|30||Undulation characteristics [Shen2007]|
In our study, the BI-RADS categories of the lesions varied from 3 to 5, as shown in Table 1.
The initial contour was indicated by the physician and subsequently refined with a computer algorithm [Chan2001], see Fig. 1. All calculations were performed in Matlab (The MathWorks Inc.).
We implemented 30 morphological features, which are listed in Table 2. For each feature, the papers in which it was developed or used are cited. Here, we describe the most popular features. For an in-depth analysis, the reader is referred to the cited works.
The depth to width ratio (DWR) is perhaps the most frequently used morphological feature. It is easy to calculate and was reported to be effective in many papers. The depth and the width are the dimensions of the minimal circumscribed rectangle which contains the lesion. Some papers use the inverse of the DWR [Chang2005], but here it is not treated as a separate feature. A feature similar to the DWR is the long to short axis ratio of the ellipse inscribed in the lesion contour.
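As an illustration of how the DWR can be obtained from a segmented contour, the following minimal Python sketch may help. It assumes that the contour is a list of (x, y) points with x along the lateral (width) direction and y along the axial (depth) direction, and that the circumscribed rectangle is axis-aligned. The function name and the toy contour are hypothetical; the original calculations were performed in Matlab.

```python
def depth_to_width_ratio(contour):
    """Depth/width of the axis-aligned rectangle circumscribing the contour.

    contour: list of (x, y) points; x is lateral position, y is depth
    (an assumed convention, not taken from the paper).
    """
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    width = max(xs) - min(xs)
    depth = max(ys) - min(ys)
    if width == 0:
        raise ValueError("contour has zero width")
    return depth / width


# Toy contour: a lesion 4 units wide and 2 units deep.
toy_contour = [(0, 0), (4, 0), (4, 2), (0, 2)]
dwr = depth_to_width_ratio(toy_contour)  # 0.5
```

A lesion that is wider than it is deep gives a DWR below 1, as in this toy example.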
The normalized radial length (NRL) is defined by the following equation:
$$d_n(i) = \frac{d(i)}{\max_j d(j)},$$
where $d(i)$ is the distance from the lesion’s center of mass to the $i$-th point on its perimeter. The NRL is used to obtain features which measure various properties of the contour, see positions 2, 6, 18-21 in Table 2.
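The NRL and two of the features derived from it can be sketched in Python as follows. This is only an illustration under stated assumptions: the NRL is taken to be the radial distance divided by the largest radial distance (the usual definition in the cited literature), the center of mass is approximated by the mean of the contour points, and the entropy is computed from a 10-bin histogram. All function names and the toy contour are hypothetical.

```python
import math


def normalized_radial_length(contour):
    """Radial distances from the centroid, divided by their maximum."""
    n = len(contour)
    # Assumption: centroid approximated by the mean of the contour points.
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    d = [math.hypot(x - cx, y - cy) for x, y in contour]
    d_max = max(d)
    return [di / d_max for di in d]


def nrl_mean(nrl):
    return sum(nrl) / len(nrl)


def nrl_entropy(nrl, bins=10):
    """Shannon entropy (bits) of a histogram of NRL values on [0, 1]."""
    counts = [0] * bins
    for v in nrl:
        counts[min(int(v * bins), bins - 1)] += 1
    probs = [c / len(nrl) for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


# Toy contour: four points at distances 2, 1, 2, 1 from the origin.
toy_contour = [(2, 0), (0, 1), (-2, 0), (0, -1)]
nrl = normalized_radial_length(toy_contour)  # [1.0, 0.5, 1.0, 0.5]
mean_value = nrl_mean(nrl)                   # 0.75
entropy_value = nrl_entropy(nrl)             # 1.0
```

In this sketch a contour whose radial distances are spread over more histogram bins yields a higher entropy, which matches the intuition that irregular contours produce higher NRL entropy.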
Some features, namely 7, 17, 23, 25 and 28, are based on the convex hull of the lesion. These features were introduced to measure the level of spiculations which is reflected by the protuberance of the contour.
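To make the convex hull idea concrete, the sketch below computes the overlap ratio, which the paper describes later as the ratio of the convex hull area to the lesion area. It is a pure-Python illustration that treats the contour as a simple polygon; the hull is computed with Andrew's monotone chain algorithm, which is our choice for the example and not necessarily what the original Matlab implementation used. Function names and the toy contour are hypothetical.

```python
def _cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly):
    """Area of a simple polygon via the shoelace formula."""
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2


def overlap_ratio(contour):
    """Convex hull area divided by lesion (polygon) area; 1 for convex shapes."""
    return polygon_area(convex_hull(contour)) / polygon_area(contour)


# Toy contour: a 4x4 square with a triangular depression in its top edge.
toy_contour = [(0, 0), (4, 0), (4, 4), (2, 2), (0, 4)]
ratio = overlap_ratio(toy_contour)  # hull area 16, lesion area 12
```

For a convex contour the ratio equals 1, and it grows as depressions and protuberances make the contour depart from its hull.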
Because two scans of each lesion were available, each feature was calculated for both scan planes and the results were averaged.
II-C Classification and evaluation
We explore the performance of morphological features in multiple ways. As shown in several papers [GomezFlores2015, daoud2016fusion], no single feature outperforms all the others on its own, so features must be combined to improve the classification. For a large number of features, however, an exhaustive search for the best performing subset is impractical because of the number of possible combinations. Here, we used a two-step approach to find the best subset for classification. The goal was to maximize the area under the receiver operating characteristic (ROC) curve (AUC), because this quantity is commonly used to assess classification performance. In the first step, the single best performing feature (the one with the highest AUC) was chosen. Next, this feature was combined with each of the remaining features in order to select the best performing subset of two features. This forward selection procedure was repeated until the feature pool was empty. To evaluate the AUC we applied leave-one-out cross-validation. Logistic regression was used to perform the classification. Before training, features were standardized. To address the class imbalance, the sample weights used during cost function minimization were inversely proportional to the class frequencies in the training set. For each test sample the probability of malignancy was calculated. The standard deviation of the AUC was calculated with the bootstrap method. The second step of the feature selection procedure was backward selection. Different feature subsets may have similar AUC values. We therefore first selected the best performing subset, the one with the highest AUC value. Next, ANOVA along with the Tukey test was used to find a subset with a smaller number of features whose mean AUC value was not significantly different from that of the best performing subset at the 95% confidence level.
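The forward step of this procedure can be sketched in Python as below. The sketch is deliberately simplified: the AUC is computed with the rank-based (Mann-Whitney) estimator, and the subset score is passed in as a function. In the toy usage, a subset is scored by the AUC of the plain sum of its features, which is only a stand-in for the leave-one-out cross-validated, class-weighted logistic regression used in the paper. The backward step and the ANOVA/Tukey comparison are not shown. All names and data are hypothetical.

```python
def auc(scores, labels):
    """Rank-based AUC: chance that a malignant case outscores a benign one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def forward_selection(feature_names, evaluate):
    """Greedily add the feature that maximizes evaluate(subset).

    Returns a list of (subset, score) pairs, one per step, until the
    feature pool is empty.
    """
    selected, history = [], []
    remaining = list(feature_names)
    while remaining:
        best = max(remaining, key=lambda f: evaluate(selected + [f]))
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), evaluate(selected)))
    return history


# Toy data: three made-up features for four lesions (1 = malignant).
features = {
    "a": [0.1, 0.4, 0.35, 0.8],
    "b": [0.2, 0.1, 0.3, 0.05],
    "c": [0.5, 0.5, 0.5, 0.5],
}
labels = [0, 0, 1, 1]


def toy_evaluate(subset):
    # Stand-in scorer: AUC of the summed feature values (NOT the paper's
    # cross-validated logistic regression).
    scores = [sum(features[f][i] for f in subset) for i in range(len(labels))]
    return auc(scores, labels)


history = forward_selection(features, toy_evaluate)
# max() returns the first maximum, i.e. the smallest subset reaching the top AUC.
best_subset, best_auc = max(history, key=lambda step: step[1])
```

Here the search first picks "a" (AUC 0.75), then adds "b" (AUC 1.0); adding "c" brings no further gain, so the two-feature subset is retained.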
With the above feature selection methodology, first the best morphological feature subset was selected. Next, the BI-RADS was combined with morphological features. However, to use the BI-RADS category as a feature, some kind of transformation must be performed. In our case the BI-RADS was treated as a discrete feature which can be coded with integers. We used the following scheme: integer 1 stands for BI-RADS category 1, 2 for 2, 3 for 3, 4 for 4a, 5 for 4b, 6 for 4c, 7 for 5 and 8 for 6, respectively.
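The integer coding described above amounts to a simple lookup. A minimal Python sketch, with hypothetical names, could look as follows:

```python
# Integer codes for BI-RADS categories, following the scheme in the text.
BIRADS_CODE = {"1": 1, "2": 2, "3": 3, "4a": 4, "4b": 5, "4c": 6, "5": 7, "6": 8}


def encode_birads(category):
    """Map a BI-RADS category such as '4b' or 5 to its integer code."""
    key = str(category).strip().lower()
    if key not in BIRADS_CODE:
        raise ValueError(f"unknown BI-RADS category: {category!r}")
    return BIRADS_CODE[key]


codes = [encode_birads(c) for c in ["3", "4A", "4c", 5]]  # [3, 4, 6, 7]
```

The coding preserves the ordering of the categories, so a classifier such as logistic regression can treat the encoded BI-RADS as an ordinary numerical feature.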
The feature selection procedure is depicted in Fig. 2. For morphological features, see Fig. 2a, the highest AUC value was obtained for a set of seven features. However, the ANOVA showed no significant difference between this set and a subset containing six features. Therefore, the smaller subset was selected as the best performing. It consisted of the DWR, NRL entropy, normalized residual value, overlap ratio, extent and the morphological closing ratio. A similar analysis was performed to combine the BI-RADS with morphological features. The highest AUC was obtained for a 13-feature subset (including the BI-RADS), which was then reduced to seven. The features which improved the classification most when added to the BI-RADS were the extent, overlap ratio, NRL entropy, circularity, elliptic-normalized circumference and the normalized residual value.
The main results are presented in Table 3. The best performing morphological feature subset and the BI-RADS resulted in AUC values of 0.901 and 0.944, respectively. The classification was improved when the BI-RADS and morphological features were combined. With six features added to the BI-RADS, the AUC value increased to 0.986. Fig. 3 shows the ROC curves obtained for the developed classifiers. The optimal sensitivity, specificity and accuracy of each classifier were determined from the ROC curve at the point closest to (0, 1) [Fawcett2006].
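The choice of the operating point closest to (0, 1) can be illustrated with the short Python sketch below. It assumes that a lesion is called malignant when its score is greater than or equal to the threshold, and it uses the Euclidean distance in the (false positive rate, true positive rate) plane. The function names and the toy scores are hypothetical.

```python
import math


def roc_points(scores, labels):
    """(threshold, FPR, TPR) for every distinct score used as a cut-off."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((thr, fp / n_neg, tp / n_pos))
    return points


def optimal_cutoff(scores, labels):
    """ROC point with the smallest Euclidean distance to (FPR=0, TPR=1)."""
    return min(roc_points(scores, labels),
               key=lambda p: math.hypot(p[1], 1 - p[2]))


# Toy malignancy probabilities for eight lesions (1 = malignant).
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 0, 1, 1, 1]
threshold, fpr, tpr = optimal_cutoff(scores, labels)
sensitivity, specificity = tpr, 1 - fpr  # 0.75 and 1.0 at threshold 0.7
```

Note that this criterion weights sensitivity and specificity equally, so the selected point does not necessarily reach 100% sensitivity.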
|Features||Sensitivity [%]||Specificity [%]||Accuracy [%]|
|Morphological, optimal cut-off||75.0||88.0||84.1|
|Combination, optimal cut-off||96.8||94.7||95.3|
|Morphological, customized cut-off||100||58.7||71.0|
|Combination, customized cut-off||100||74.7||82.2|
|BI-RADS cat. 3 cut-off||100||54.7||68.2|
In breast lesion classification, it is important to have as high a sensitivity as possible in order to detect all malignant lesions. According to Table 1, in the case of the BI-RADS, 100% sensitivity could only be obtained when lesions with a BI-RADS category higher than 3 were classified as malignant. The corresponding specificity was 54.7%. Taking this into consideration, the thresholds of the classifiers were customized based on the ROC curves to ensure 100% sensitivity, and the corresponding accuracies and specificities were calculated. The results are presented in Table 3. The specificity of the best performing morphological feature subset was 58.7%, similar to that of the BI-RADS. Moreover, with the combined classifier it was possible to obtain 100% sensitivity with a specificity of 74.7%. Table 4 shows how many biopsies of benign lesions could be avoided by using the various classifiers with cut-offs ensuring 100% sensitivity.
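A cut-off that guarantees 100% sensitivity on a given dataset can be sketched as follows: the threshold is set to the lowest score among the malignant lesions, and the benign lesions scoring below it are the ones for which a biopsy could be avoided. The second part of the sketch is a simple arithmetic consistency check that converts the specificities reported above into numbers of benign lesions (out of 75) and into accuracies (out of 107 lesions). Function names and the toy scores are hypothetical.

```python
def full_sensitivity_cutoff(scores, labels):
    """Threshold giving 100% sensitivity for the rule score >= threshold.

    Returns (threshold, specificity, number of benign lesions spared).
    """
    malignant = [s for s, y in zip(scores, labels) if y == 1]
    benign = [s for s, y in zip(scores, labels) if y == 0]
    threshold = min(malignant)
    spared = sum(1 for s in benign if s < threshold)
    return threshold, spared / len(benign), spared


# Toy malignancy probabilities for eight lesions (1 = malignant).
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 0, 1, 1, 1]
threshold, specificity, spared = full_sensitivity_cutoff(scores, labels)

# Consistency check on the figures reported in the text (75 benign,
# 32 malignant lesions, all malignant ones detected).
N_BENIGN, N_MALIGNANT = 75, 32
reported_specificity = {"BI-RADS": 0.547, "morphological": 0.587, "combined": 0.747}
avoided = {name: round(spec * N_BENIGN)
           for name, spec in reported_specificity.items()}
accuracy = {name: round(100 * (N_MALIGNANT + n) / (N_BENIGN + N_MALIGNANT), 1)
            for name, n in avoided.items()}
# avoided  -> {'BI-RADS': 41, 'morphological': 44, 'combined': 56}
# accuracy -> {'BI-RADS': 68.2, 'morphological': 71.0, 'combined': 82.2}
```

The recomputed accuracies agree with the customized cut-off rows of Table 3.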
With the use of morphological features it was possible to differentiate between malignant and benign breast lesions. The best performing feature subset achieved an AUC value of 0.901. However, this result was worse than that of the BI-RADS, for which the AUC value was 0.944. The classifier based on morphological features could not outperform the radiologist who assigned the BI-RADS categories. It must be emphasized that the BI-RADS depends on the physician’s experience, and for a novice radiologist the BI-RADS performance could be lower than the performance of the CAD system. Nevertheless, both AUC values, obtained for the BI-RADS and for the morphological features, should be considered satisfactory.
The best morphological subset consisted of six features: the DWR, NRL entropy, normalized residual value, overlap ratio, extent and the morphological closing ratio. The DWR was reported as the best feature for breast lesion classification in several papers [Drukker2004a, Drukker2002, Drukker2005, Horsch2002a, Chen2003, Chen2004]. NRL entropy is higher for lesions which have an irregular contour. The normalized residual value was indicated as the best feature for breast lesion classification in [Alvarenga2010]. This feature is based on the difference between the area of the lesion’s convex hull and the lesion area, normalized by the convex hull perimeter. The overlap ratio is the ratio of the convex hull area to the lesion area. It measures contour irregularity and was reported as one of the best features in the original work [Alvarenga2010]. The extent is the ratio of the lesion area to the area of the smallest rectangle containing the lesion. The morphological closing ratio tends to be greater for lesions which have an irregular contour [Alvarenga2010].
|Nr of benign lesions||41||19||14||0||1||75|
|BI-RADS cat. 3 cut-off||0||1||5||6||20||41|
According to a survey investigating the performance of various features, the best morphological features for breast lesion classification are the elliptic-normalized skeleton, lesion orientation, the number of substantial protuberances and depressions, DWR and the overlap ratio [GomezFlores2015]. For this set, the reported AUC value was 0.94. In our case, the use of these features led to an AUC value of 0.871. This difference may be due to the dataset. Nevertheless, this performance should also be regarded as good. The main doubt lies in the choice of the lesion orientation. This feature measures the angle of the major axis of the lesion’s best-fit ellipse and is extremely operator dependent, since motion of the imaging probe may easily change it. In our study, the calculated AUC value for the orientation was 0.532. A similar result was reported in the original study [Shen2007], where this feature had a negligible impact on the classification and was the first one to be removed from the feature pool when backward elimination was applied.
A great advantage of morphological features is revealed when they are combined with the BI-RADS. It was possible to increase the AUC value to 0.986 by adding six features, namely the extent, overlap ratio, NRL entropy, circularity, elliptic-normalized circumference and the normalized residual value. The circularity is the ratio of the squared lesion perimeter to the lesion area [Alvarenga2012]. The elliptic-normalized circumference quantifies the anfractuosity of the lesion contour [Chen2003].
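The circularity as defined above (squared perimeter divided by area) can be illustrated with a small Python sketch. It treats the contour as a simple polygon, which is an assumption made for the example; the function names and the toy contour are hypothetical.

```python
import math


def polygon_perimeter(contour):
    """Sum of the edge lengths of a closed polygon."""
    n = len(contour)
    return sum(math.hypot(contour[(i + 1) % n][0] - contour[i][0],
                          contour[(i + 1) % n][1] - contour[i][1])
               for i in range(n))


def polygon_area(contour):
    """Area of a simple polygon via the shoelace formula."""
    n = len(contour)
    s = sum(contour[i][0] * contour[(i + 1) % n][1]
            - contour[(i + 1) % n][0] * contour[i][1]
            for i in range(n))
    return abs(s) / 2


def circularity(contour):
    """Squared perimeter divided by area."""
    return polygon_perimeter(contour) ** 2 / polygon_area(contour)


# Toy contour: a square with side 2 (perimeter 8, area 4).
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
square_circularity = circularity(square)  # 16.0
```

With this definition a perfect circle attains the minimum value of 4π (about 12.57), and the value grows as the contour becomes more irregular.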
One of the main goals of breast lesion classification is to have 100% sensitivity and as high a specificity as possible, in order to identify benign lesions and reduce the number of unnecessary biopsies. In our study all lesions were biopsy proven, so their evaluation must be considered problematic for the radiologist. The first advantage of the combined classifier is that it can be used directly to support the radiologist in the process of decision making. First, the radiologist assigns a specific BI-RADS category to the lesion, then the combined classifier containing the morphological features is used to indicate the level of malignancy. The decision of the radiologist is improved by means of morphological features, which are used to separate malignant and benign lesions that were assigned the same BI-RADS category. As shown in Fig. 3, 100% sensitivity was obtained with a high specificity of 74.7%, much higher than in the case of the BI-RADS alone. According to Table 4, with the combined classifier it would be possible to correctly classify all benign lesions with BI-RADS category 3, as in the case of the BI-RADS alone. In addition, a few benign lesions with higher BI-RADS categories would be correctly classified. With the combined classifier, it would be possible to avoid 56 biopsies. However, it must be stressed that there were no malignant lesions with BI-RADS category 3 in the dataset, so the classification performed by the radiologist was already at a high level. Moreover, the best morphological feature subset provides higher specificity at 100% sensitivity than the BI-RADS, as shown in Table 3, although its AUC value is lower. In this case, however, it was not possible to correctly classify all lesions with BI-RADS category 3.
The main disadvantage of the combined classifier is that it was developed based on the experience of the particular physician who assigned the BI-RADS categories. Several issues might therefore occur. First, although the developed CAD system can support a particular radiologist, it might not work when used by another radiologist. For example, a less experienced physician may perform worse, which translates to a different BI-RADS ROC curve and therefore affects the performance of the combined classifier. Next, the choice of morphological features selected to improve the BI-RADS may depend on the radiologist’s experience. For example, features developed to assess spiculations may be selected if the radiologist does not evaluate spiculations successfully. In all these cases, the system would require separate training to support a particular radiologist. It would be interesting to utilize the combined system in the training of radiologists. Hypothetically, after BI-RADS categories have been assigned to an exemplary dataset, feature selection can be used to indicate which features improve the diagnosis in this particular case. For example, a novice radiologist (or even an experienced one) can be told to pay more attention to spiculations. Moreover, radiologists perceive image features differently and tumor assessment is usually descriptive. Numerical values that quantify the level of spiculation (or another contour characteristic) could be helpful by themselves, even without a CAD system, and could enable a more objective lesion description. The majority of morphological features are easy to illustrate on the image, which may help the radiologist to analyze the lesion.
In this work, we investigated the usefulness of morphological features for the differentiation of breast lesions. The main goal was to find features that can improve the BI-RADS. This was successfully accomplished by incorporating six morphological features. The developed combined classifier achieved 100% sensitivity with a high specificity of 74.7%. It can be used to reduce the number of unnecessary biopsies. The combined classifier depends on the experience of a particular radiologist; however, the approach presented in this work can be used to train a classifier for a different radiologist. Besides, other features, for instance texture features, can be incorporated in the future to potentially improve the classification further. The developed CAD system can also be used in the training of radiologists. After the classifier development phase, the radiologist can be informed which features improve their diagnostic accuracy. This enables the radiologist to widen their knowledge of lesion appearance in ultrasound images. With the help of artificial intelligence, the radiologist can hypothetically improve their classification performance.
In our study we obtained good classification performance with particular morphological features, even without the BI-RADS system. However, in comparison with the survey paper [GomezFlores2015], we determined a slightly different best performing feature subset.
This work was supported by the National Science Center Grant Number UMO-2014/13/B/ST7/01271.
Conflict of interest statement