Wireless capsule endoscopy (WCE) is a relatively new technique that can image the entire gastrointestinal (GI) tract without pain. The capsule captures more than 55,000 frames during an examination (at least two frames per second over a recording time of about eight hours), so physicians need a long time to review them. An automatic method that reduces the review time is therefore desirable. To date, various methods have been proposed to recognize bleeding or ulceration regions in the GI tract, but only a limited number of studies have addressed other abnormal (diseased) and tumor tissues alizadeh2014segmentation, billahgastrointestinal.
To detect bleeding regions in WCE frames, a method based on color similarity features was presented guobing2011novel. The pixels classified as bleeding pixels were used as seeds in a region-growing algorithm to find the entire bleeding regions. In another study, color features were extracted from the RGB and HSI color spaces and the pixels were classified using a probabilistic neural network pan2011bleeding. Li and Meng presented a method based on chrominance moments as color features and uniform local binary patterns (LBP) to detect bleeding regions in a WCE frame li2009computer. A three-layer perceptron neural network was used to detect bleeding pixels. Three classifiers were utilized to evaluate the method: support vector machine (SVM), linear discriminant analysis, and K-nearest neighbors (KNN).
To detect ulcers in WCE frames, a method was proposed that used a Gabor filter together with color and texture features, followed by a neural network to classify the frames karargyris2009identification. The same authors also developed a method to detect ulcer and polyp frames in WCE videos karargyris2011detection. It extracted features using log-Gabor filters, color features, and the SUSAN edge detector (a geometry feature); the frames were then classified using an SVM.
A method was introduced to detect frames containing Crohn’s disease using an edge histogram descriptor at four angles, color features based on the LUV color space, and texture features extracted by a Gabor filter bank kumar2012assessment. An SVM was employed to classify lesion tissues in a frame using the extracted features. Another method used uniform LBP and the discrete wavelet transform, followed by an SVM classifier, to detect tumor frames li2012tumor. Li et al. presented another method based on multi-scale LBP and multiple classifiers (KNN, multi-layer perceptron (MLP) neural network, and SVM) li2011computer. We previously proposed a method to segment abnormal regions in a frame based on the intensity value maghsoudi2012segmentation. That method was sensitive to illumination and did not work well for detecting Crohn’s disease.
In a recent study szczypinski2014texture, a method was proposed to detect erosion, ulcer, and bleeding regions. MaZda software was used to extract texture features from seven color spaces, and an SVM then classified the frames. We have also presented methods to segment bubbled regions maghsoudi2014detection and to distinguish different organs maghsoudi2012automatic in WCE frames, which might be helpful for future studies.
Overall, we may conclude that color features were the most effective features for detecting bleeding regions, whereas they were less effective for detecting other abnormal regions. None of the above studies used geometry, color, and texture features together to detect abnormalities (except ulceration and bleeding) in WCE frames; we employed different types of features to improve the detection rate. Moreover, each method was devised to detect a specific type of disease in the GI tract. To overcome these limitations, we propose two methods that use different texture, geometry, and color features to detect tumor frames (frame-based study) and abnormal regions (pixel-based study). They have four advantages relative to previous methods: first, different features are extracted to ensure the highest possible detection rate; second, three important classes of abnormalities are detected (tumor, bleeding, and a group of five other diseases); third, they improve upon the results of previous studies; fourth, they identify the best features for detecting each of the three classes.
2 Definitions of Diseases and Features
2.1 Bleeding, Tumor, and Other Abnormalities
Our study is limited to three major abnormalities: tumor (especially sub-mucosal tumor), blood and bleeding, and other abnormal regions (lymphangiectasia, lymphoid hyperplasia, xanthoma, stenosis, and Crohn’s disease) schmassmann2005handbook. The etiology of the diseases is not clear; however, the early diagnosis can affect the treatment. Therefore, the development of techniques for detecting these lesions is important and reasonable.
2.2.1 Gabor Filter
The Gabor filter (G function) is a sinusoidal plane wave of a given frequency and orientation, modulated by a Gaussian envelope, that is convolved with the image. This filter has good localization properties in both the spatial and frequency domains and has been used for texture segmentation jain1991unsupervised, penjweini2017investigating. The impulse response of the 2D Gabor filter is
and $\theta$ is the rotation angle of the impulse response, $x$ and $y$ are the coordinates, $\sigma_x$ and $\sigma_y$ are the standard deviations of the Gaussian envelope in the $x$ and $y$ directions, respectively, and $f$ and $\varphi$ are the frequency and phase of the sinusoid, respectively.
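As an illustration of the description above, the following sketch builds a small real-valued Gabor kernel in plain Python. It assumes one commonly used form, a Gaussian envelope multiplied by a cosine carrier along the rotated x axis; the exact normalization used in this study is not recoverable from the text, and the function name `gabor_kernel`, the kernel size, and all parameter values are hypothetical.

```python
import math


def gabor_kernel(size, freq, theta, sigma_x, sigma_y, phase=0.0):
    """Return a size x size real-valued Gabor kernel as a list of rows."""
    half = size // 2
    kernel = []
    for row in range(size):
        y = row - half
        values = []
        for col in range(size):
            x = col - half
            # Rotate the coordinates by theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope with separate spreads along the rotated axes.
            envelope = math.exp(-0.5 * (xr ** 2 / sigma_x ** 2 + yr ** 2 / sigma_y ** 2))
            # Sinusoidal carrier along the rotated x axis (assumed form).
            carrier = math.cos(2.0 * math.pi * freq * xr + phase)
            values.append(envelope * carrier)
        kernel.append(values)
    return kernel


# Four orientations; frequency, size, and sigmas are illustrative values only.
bank = [gabor_kernel(9, 0.1, k * math.pi / 4, 2.0, 2.0) for k in range(4)]
```

Each kernel in `bank` would then be convolved with a frame to produce one filtered image per orientation.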
2.2.2 Local Binary Patterns (LBP)
Local binary patterns have been widely used to extract texture features rema2013segmentation, li2009small, maghsoudi2016detection, ojala2002multiresolution, yu2014improved. The relation between the number of neighbors and the radius is described below and illustrated in Fig. 1
Then, two features are calculated based on LBP: LBP1 and LBP2. The LBP start point in a block is rotated to find all possible LBP values for a center; the minimum of these values is invariant to rotation. To extract LBP1 from the rotation-invariant LBP, three histograms are devised. LBP1 is counted based on the following ranges, where the choice of range depends on the radius:
where , , and . By counting the number of LBP1 values that fall between two consecutive members of a histogram, 15 features are extracted for (neighbors, radius) = (16, 3), 11 features for (neighbors, radius) = (12, 2), and 7 features for (neighbors, radius) = (8, 1). Four further features are obtained by calculating the percentage of LBP1 values that fall between H1(15) and H1(16), H1(14) and H1(15), H2(11) and H2(12), and H3(7) and H3(8). Therefore, 37 features are extracted using the LBP1 histograms.
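The rotation-invariant step can be sketched as follows. This is a minimal illustration, assuming the neighbors have already been sampled on the circle (interpolation is omitted) and that a neighbor contributes a 1 bit when it is at least as bright as the center; the function names are hypothetical.

```python
def lbp_code(center, neighbors):
    """Binary pattern: bit i is set when neighbor i is >= the center value."""
    code = 0
    for i, value in enumerate(neighbors):
        if value >= center:
            code |= 1 << i
    return code


def rotation_invariant(code, bits):
    """Minimum over all circular bit rotations of a `bits`-wide code."""
    mask = (1 << bits) - 1
    best = code
    for shift in range(1, bits):
        rotated = ((code >> shift) | (code << (bits - shift))) & mask
        best = min(best, rotated)
    return best


# The same bright neighbor at two different start points gives the same value.
a = rotation_invariant(lbp_code(5, [9, 1, 1, 1, 1, 1, 1, 1]), 8)
b = rotation_invariant(lbp_code(5, [1, 1, 1, 9, 1, 1, 1, 1]), 8)
```

Taking the minimum over rotations means that the choice of start point on the circle no longer affects the value, which is the property used for LBP1.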
To extract LBP2 from the LBP (not the rotation-invariant LBP), another histogram is used to find which neighbor is more important, that is, which one occurs more often than the others. To calculate this new feature, the neighbors are labeled by the following series, which are shown in Fig. 1:
and the order of neighbors is defined using the following series:
Therefore, 36 features are extracted using the LBP2 histograms.
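One possible reading of LBP2, stated here as an assumption, is a per-neighbor count of how often each neighbor position is set across the LBP codes of a sub-image. Under that reading the three (neighbors, radius) settings would give 16 + 12 + 8 = 36 counts, which matches the number of features quoted above. The helper name `neighbor_histogram` is hypothetical.

```python
def neighbor_histogram(codes, bits):
    """For each neighbor position, count how many codes have that bit set."""
    counts = [0] * bits
    for code in codes:
        for i in range(bits):
            if (code >> i) & 1:
                counts[i] += 1
    return counts


# Three toy 4-bit codes; position 0 is set twice, positions 1 and 3 once each.
example = neighbor_histogram([0b0011, 0b0001, 0b1000], 4)

# Feature count under the assumed reading: one count per neighbor position.
n_lbp2_features = sum(len(neighbor_histogram([], n)) for n in (16, 12, 8))
```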
2.2.3 Laws’ Features
Laws’ texture features have been used widely laws1980textured. These features were developed by Kenneth Ivan Laws at the University of Southern California. In this study, 21 Laws’ masks built from 7-sample kernels and 15 Laws’ masks built from 5-sample kernels were applied to the frames.
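The sketch below shows how two-dimensional masks can be built from the standard one-dimensional 5-sample Laws kernels by taking outer products. Keeping one mask per unordered kernel pair gives 5 × 6 / 2 = 15 masks, which is consistent with the count quoted above; whether the study kept one mask per pair or combined the two transposed masks of a pair is not stated, so this pairing is an assumption. The 7-sample kernels are not listed in the text and are therefore not reproduced here.

```python
from itertools import combinations_with_replacement

# Standard 5-sample Laws kernels: level, edge, spot, wave, ripple.
KERNELS_5 = {
    "L5": [1, 4, 6, 4, 1],
    "E5": [-1, -2, 0, 2, 1],
    "S5": [-1, 0, 2, 0, -1],
    "W5": [-1, 2, 0, -2, 1],
    "R5": [1, -4, 6, -4, 1],
}


def outer(u, v):
    """Outer product of a column kernel u and a row kernel v."""
    return [[a * b for b in v] for a in u]


def laws_masks(kernels):
    """One 2D mask per unordered kernel pair (assumed pairing)."""
    names = sorted(kernels)
    return {
        a + b: outer(kernels[a], kernels[b])
        for a, b in combinations_with_replacement(names, 2)
    }


masks = laws_masks(KERNELS_5)
```

With six kernels the same pairing would yield 6 × 7 / 2 = 21 masks, matching the count given for the 7-sample case.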
2.2.4 GLCM Features
Fourteen Haralick features sahu2015classification are extracted from a gray level co-occurrence matrix (GLCM): contrast; correlation; entropy; energy; difference variance; difference entropy; information measure of correlation 1; information measure of correlation 2; inverse difference; sum average; sum variance; sum of squares; sum entropy; and maximum correlation coefficient.
Additional features extracted from the GLCM are soh1999texture, clausi2002analysis: autocorrelation; cluster prominence; cluster shade; dissimilarity; homogeneity; maximum probability; inverse difference normalized; and inverse difference moment normalized. In total, 22 features are extracted from the GLCM. In this study, unless the direction and distance of the GLCM are mentioned explicitly, they are 0 and 1, respectively.
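To make the GLCM concrete, the following sketch computes a normalized, symmetric co-occurrence matrix for direction 0 and distance 1 (the defaults mentioned above) and three of the listed features. It is a plain-Python illustration on a tiny image with four gray levels; the function names and the choice to symmetrize the matrix are assumptions.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalized symmetric co-occurrence matrix for the offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count the pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def contrast(p):
    return sum(p[i][j] * (i - j) ** 2 for i in range(len(p)) for j in range(len(p)))


def energy(p):
    return sum(v * v for row in p for v in row)


def entropy(p):
    return -sum(v * math.log2(v) for row in p for v in row if v > 0)


image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(image, levels=4)
```

Other directions correspond to other offsets, for example `dx=1, dy=-1` for 45 degrees when rows are numbered from the top.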
2.2.5 Invariant Moments
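The following invariant moments are computed from the information provided by both the shape boundary and its interior region chen1993improved:

$$m_{pq} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^{p} y^{q} f(x,y)\,dx\,dy$$

where $m_{pq}$ is the two-dimensional moment of the function $f(x,y)$. The order of the moment is $(p+q)$, where $p$ and $q$ are both natural numbers. The following central moments are defined to generate features that are invariant to translation:

$$\mu_{pq} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x-\bar{x})^{p} (y-\bar{y})^{q} f(x,y)\,dx\,dy, \qquad \bar{x} = \frac{m_{10}}{m_{00}}, \quad \bar{y} = \frac{m_{01}}{m_{00}}$$

In the discrete domain, these moments become:

$$m_{pq} = \sum_{x}\sum_{y} x^{p} y^{q} f(x,y), \qquad \mu_{pq} = \sum_{x}\sum_{y} (x-\bar{x})^{p} (y-\bar{y})^{q} f(x,y)$$

The moments are further normalized for the effects of scale change using:

$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}$$

where $\gamma = (p+q)/2 + 1$. From the normalized central moments, the following moments, which are invariant to scale, translation, and rotation, can be calculated hu1962visual:

$$\begin{aligned}
\phi_1 &= \eta_{20} + \eta_{02} \\
\phi_2 &= (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2 \\
\phi_3 &= (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2 \\
\phi_4 &= (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2 \\
\phi_5 &= (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] \\
&\quad + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] \\
\phi_6 &= (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}) \\
\phi_7 &= (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] \\
&\quad - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]
\end{aligned}$$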
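As a small numerical illustration of the invariance properties, the sketch below computes the raw, central, and normalized central moments in the discrete domain and then the first two of Hu's invariant moments. It uses the standard definitions from Hu's paper with plain Python lists; the function names and the toy shape are hypothetical.

```python
def raw_moment(img, p, q):
    """Discrete raw moment m_pq, with x as column index and y as row index."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img) for x, v in enumerate(row))


def normalized_central_moment(img, p, q):
    """eta_pq = mu_pq / mu_00 ** gamma, with gamma = (p + q) / 2 + 1."""
    m00 = raw_moment(img, 0, 0)
    xc = raw_moment(img, 1, 0) / m00
    yc = raw_moment(img, 0, 1) / m00
    mu = sum(((x - xc) ** p) * ((y - yc) ** q) * v
             for y, row in enumerate(img) for x, v in enumerate(row))
    return mu / m00 ** ((p + q) / 2 + 1)


def hu_first_two(img):
    """First two Hu invariants computed from normalized central moments."""
    n20 = normalized_central_moment(img, 2, 0)
    n02 = normalized_central_moment(img, 0, 2)
    n11 = normalized_central_moment(img, 1, 1)
    return n20 + n02, (n20 - n02) ** 2 + 4 * n11 ** 2


shape = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
phi1, phi2 = hu_first_two(shape)
```

Shifting the shape inside the image, or transposing the image, should leave `phi1` and `phi2` unchanged up to floating-point error.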
3 Proposed Methods
3.1 Detection of Frames Showing Tumor
To find tumor frames in a video, the following steps, illustrated in Fig. 2, were applied:
Frame size was 512 × 512 pixels in our data set. As discussed, geometry features were useful in detecting frames showing tumor, while the frame borders interfered with this task. Therefore, a central region was extracted from each frame for further processing.
The Gabor filter was applied using the following parameters:
This process generated 24 images. Then, the 4 images generated for each of the above values were averaged, producing six additional images.
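The averaging step can be sketched as follows, assuming the 24 filter responses are stored so that every 4 consecutive images share the same parameter value and differ only in orientation. That ordering and the function names are assumptions made for illustration.

```python
def average_images(images):
    """Pixel-wise mean of equally sized images stored as lists of rows."""
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / n for c in range(cols)]
            for r in range(rows)]


def add_orientation_averages(responses, n_orientations=4):
    """Append one averaged image per group of n_orientations responses."""
    groups = [responses[i:i + n_orientations]
              for i in range(0, len(responses), n_orientations)]
    return responses + [average_images(group) for group in groups]


# 24 toy one-pixel "images" stand in for the Gabor responses.
responses = [[[float(i)]] for i in range(24)]
all_images = add_orientation_averages(responses)
```

With 24 responses this yields 24 + 6 = 30 images, from which the per-image features would then be extracted.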
In addition to the above, 75 Laws’ features were extracted as follows: the mean, variance, skewness, kurtosis, and entropy of 15 images generated by convolving the image with the 5-sample Laws’ masks (15 masks). Furthermore, 88 features were extracted from the GLCM calculated for four different angles, and 7 invariant moments were calculated. These features were extracted from the grayscale version of each original frame. In total, 1160 features (990 + 75 + 88 + 7) were extracted.
The features were normalized between zero and one because different feature extraction methods were used. The thirty most discriminant features for the detection of tumor frames were found using the Fisher test fisher1915mathematical. A multi-layer perceptron (MLP) neural network mohapatra2012lymphocyte was used to classify the WCE frames.
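The normalization and selection steps can be sketched as below. The text does not give the exact Fisher criterion, so this example assumes a common two-class form, the squared difference of the class means divided by the sum of the class variances; the function names are hypothetical.

```python
from statistics import mean, pvariance


def min_max(column):
    """Scale one feature column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]


def fisher_score(pos, neg):
    """Assumed criterion: (mean_pos - mean_neg)^2 / (var_pos + var_neg)."""
    gap = (mean(pos) - mean(neg)) ** 2
    spread = pvariance(pos) + pvariance(neg)
    if spread == 0:
        # Zero within-class spread: perfectly separated or uninformative.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def top_features(samples, labels, k):
    """Indices of the k highest-scoring features (labels are 0 or 1)."""
    scores = []
    for j in range(len(samples[0])):
        col = min_max([s[j] for s in samples])
        pos = [v for v, y in zip(col, labels) if y == 1]
        neg = [v for v, y in zip(col, labels) if y == 0]
        scores.append((fisher_score(pos, neg), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


samples = [[1.0, 5.0], [1.2, 1.0], [3.0, 4.0], [3.1, 2.0]]
labels = [0, 0, 1, 1]
selected = top_features(samples, labels, k=1)
```

In the toy data only the first feature separates the two classes, so it is the one that should be selected; in the study the same idea would be applied with `k=30`.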
3.2 Detection of Pixels Showing Bleeding, Tumor, and Other Abnormalities
This algorithm separates normal from abnormal tissues (bleeding, tumor, and other abnormalities). Fig. 3 shows the overview of this approach and the steps are described below:
Then, each frame was divided into 256 sub-images.
A sub-image size was 32 × 32 pixels and an LBP block was 7 × 7 pixels; therefore, 26 rows and 26 columns could serve as the center of a block, and LBP was calculated 676 times for a sub-image (26 rows × 26 columns). In total, 74 features were extracted using LBP1 from the grayscale and green channels of each sub-image. Moreover, 36 features were extracted using LBP2 from the grayscale version of the sub-images.
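The counts in this step can be checked with a short sketch. It assumes 512 × 512 frames (the size quoted in Section 4.2), which gives 256 non-overlapping sub-images of 32 × 32 pixels, and a largest LBP radius of 3, so that a block around a center spans 7 × 7 pixels. The function names are hypothetical.

```python
def split_into_tiles(height, width, tile):
    """Top-left corners of non-overlapping tile x tile sub-images."""
    return [(r, c)
            for r in range(0, height - tile + 1, tile)
            for c in range(0, width - tile + 1, tile)]


def valid_centers(tile, radius):
    """Centers whose (2 * radius + 1)-wide block fits inside a tile."""
    return [(r, c)
            for r in range(radius, tile - radius)
            for c in range(radius, tile - radius)]


tiles = split_into_tiles(512, 512, 32)
centers = valid_centers(32, 3)
print(len(tiles), len(centers))  # 256 676
```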
The mean and GLCM features were extracted from the grayscale sub-images for four angles (0, 45, 90, and 135 degrees). Therefore, the number of extracted features increased to 110 + (22 + 1) × 4 = 202.
As discussed in Section 2.2.3, 21 two-dimensional masks (for 7 samples) were generated using the one-dimensional Laws’ kernels. The mean, variance, skewness, kurtosis, and entropy were extracted from the sub-image convolved with each of these masks, giving 21 × 5 = 105 features in this step.
Eight Gabor filters were generated from two frequencies and four angles and applied to each sub-image. As in the first method, each set of 4 images generated by the Gabor filters was averaged to produce two additional images. Then, the mean, variance, skewness, kurtosis, and entropy were extracted from the generated images. Therefore, 5 × (8 + 2) = 50 features were extracted in this step.
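The five statistics used in this and the previous steps can be sketched as follows. The example assumes population (not sample) moments, non-excess kurtosis, and an entropy computed from the histogram of integer-quantized pixel values; none of these conventions is stated in the text, so they are assumptions.

```python
import math
from collections import Counter


def first_order_stats(pixels):
    """Mean, variance, skewness, kurtosis, and entropy of a flat pixel list."""
    n = len(pixels)
    mu = sum(pixels) / n
    var = sum((v - mu) ** 2 for v in pixels) / n
    if var > 0:
        skew = sum((v - mu) ** 3 for v in pixels) / n / var ** 1.5
        kurt = sum((v - mu) ** 4 for v in pixels) / n / var ** 2
    else:
        skew, kurt = 0.0, 0.0  # constant image: higher moments undefined
    # Entropy of the value histogram, in bits; assumes quantized values.
    hist = Counter(pixels)
    ent = -sum((c / n) * math.log2(c / n) for c in hist.values())
    return mu, var, skew, kurt, ent


stats = first_order_stats([0, 0, 1, 1])
```

For real-valued filter responses the values would need to be binned before the entropy is computed.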
The HSV color space has been popular for detecting objects from color information in different applications maghsoudi2012automatic, junzhou2011contourlet, maghsoudi2016tracker, pan2011bleeding. The mean, variance, skewness, and kurtosis were extracted from five color channels (red, green, blue, hue, and saturation) and the grayscale sub-image. In total, 202 (LBP, GLCM, and mean) + 105 (Laws) + 50 (Gabor) + 24 (color channels and grayscale) = 381 features were extracted.
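The feature tally of the second method can be verified with a few lines of arithmetic. The grouping labels below are only descriptive; the numbers are those given in the steps above.

```python
second_method_features = {
    "LBP1 (grayscale + green, 37 each)": 2 * 37,
    "LBP2 (grayscale)": 36,
    "GLCM + mean (4 angles)": (22 + 1) * 4,
    "Laws (21 masks x 5 statistics)": 21 * 5,
    "Gabor (10 images x 5 statistics)": (8 + 2) * 5,
    "color (6 channels x 4 statistics)": 6 * 4,
}
total = sum(second_method_features.values())
print(total)  # 381
```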
The ranges of the features extracted above were normalized between zero and one because different feature extraction methods were used. Then, the number of features was reduced to 30 using the Fisher test fisher1915mathematical; the selected features were the most discriminant ones for distinguishing normal and abnormal regions in a frame.
Three networks, each with three hidden layers, were employed to classify normal and abnormal regions leondes1998neural. The performances of the tumor, other-abnormalities, and bleeding neural networks were 0.1028, 0.0548, and 0.0247, respectively.
In this study, 233 frames taken from 59 patients were used. The videos were captured by the M2A capsule endoscopy device manufactured by Given Imaging and were provided by Shariati Hospital, Tehran, Iran. From these videos, 43 tumor frames from 11 patients, 44 normal frames from 12 patients, 33 bleeding frames from 9 patients, and 113 other-abnormality frames from 29 patients were selected. Lymphangiectasia, stenosis, lymphoid hyperplasia, xanthoma, and Crohn’s disease had, respectively, 18 frames from 5 patients, 31 frames from 6 patients, 19 frames from 4 patients, 17 frames from 6 patients, and 28 frames from 8 patients.
The specialist, Dr. Soleimani, supervised the process to segment the normal and abnormal regions in the frames. Fig. 4 shows normal and abnormal regions separated in two frames. Then, random sampling and cross-validation over the patient’s frames were used to divide the frames into the training (approximately 75%) and testing sets (approximately 25%).
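A split of this kind can be sketched as below. The example assumes that the split is made at the patient level, so that no patient contributes frames to both sets; the text says only that sampling was done over the patients' frames, so this is one interpretation. The function name and the toy data are hypothetical.

```python
import random


def split_by_patient(frames, test_fraction=0.25, seed=0):
    """Split (frame_id, patient_id) pairs so patients do not overlap."""
    patients = sorted({patient for _, patient in frames})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train = [f for f in frames if f[1] not in test_patients]
    test = [f for f in frames if f[1] in test_patients]
    return train, test


# Toy data: 40 frames from 8 patients, 5 frames per patient.
frames = [(i, i % 8) for i in range(40)]
train, test = split_by_patient(frames)
```

Repeating the split with different seeds would give the repeated random sub-sampling form of cross-validation.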
Table (values not recovered); columns: Tumor (test), Normal (test).
Table (values not recovered); columns: Number of features, Sensitivity, Specificity.
4.1 Detection of Frames Showing Tumor
In this case, 43 tumor frames (31 frames for training) and 161 non-tumor frames (120 frames for training) were selected. The non-tumor frames were 44 normal and 117 other-abnormality frames. The aim was to find the frames containing tumor regions among the testing samples (in total, 12 tumor frames from 4 patients and 41 non-tumor frames from 13 patients). Bleeding frames were not included here because some of them were not clear-cut (because of the bleeding regions). If bleeding frames had been mixed with the non-tumor frames, more features, especially color features, would have been needed to distinguish them from tumor frames; moreover, some tumor frames contain bleeding regions, which would confuse the classification.
The usual method to determine the Gabor filter parameters (Eq. 12) is to apply a large bank of Gabor filters covering the possible ranges of the parameters, including the frequency and the x and y ranges. For the frequency, the range was between 0.5 and 2.5 and we considered two samples. The x and y ranges were selected in three steps so that the output image appeared meaningful jain1991unsupervised. Four spatial angles were considered to examine different orientations.
Shape was one of the main features for the detection of tumor frames. In addition to geometry features, the texture and color features of a region were important. Therefore, the Gabor filter was applied and features were extracted from the generated images. The results are given in Table 1. Two normal frames misclassified as tumor frames are shown in Fig. 5.
The Fisher test was used to find the thirty most discriminating features (out of 1160 features) of a frame. Table 2 shows the effect of the number of features on the sensitivity and specificity of the first method, indicating that 30 features are optimal when the number of features is varied in steps of 5.
4.2 Detection of Pixels Showing Bleeding, Tumor, and Other Abnormalities
Here, 12 tumor frames, 10 completely normal frames, 7 bleeding frames, and 31 other-abnormality frames formed the dataset for the evaluation of the second method. At the pixel level the dataset is much larger; for example, the dataset for the detection of tumor pixels was (12 tumor frames + 10 normal frames) × 512 × 512 pixels = 5,767,168 pixels.
After the neural networks were applied, each frame was smoothed by a median filter with a window of 25 pixels. The methods were evaluated by measuring sensitivity, specificity, accuracy, and precision altman1994diagnostic.
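The smoothing and evaluation steps can be sketched as follows. The example assumes that the classifier output is a binary label map, that the 25-pixel window is a 5 × 5 square, and that windows are truncated at the image border; for binary labels the median then reduces to a majority vote. The counts passed to `diagnostic_measures` are toy values, not results from this study, and all function names are hypothetical.

```python
def majority_filter(mask, size=5):
    """Median filter for a binary label map (majority vote in each window)."""
    half = size // 2
    rows, cols = len(mask), len(mask[0])
    out = []
    for r in range(rows):
        line = []
        for c in range(cols):
            window = [mask[rr][cc]
                      for rr in range(max(0, r - half), min(rows, r + half + 1))
                      for cc in range(max(0, c - half), min(cols, c + half + 1))]
            line.append(1 if 2 * sum(window) > len(window) else 0)
        out.append(line)
    return out


def diagnostic_measures(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy, and precision from pixel counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
    }


# An isolated false detection is removed by the filter.
noisy = [[0] * 7 for _ in range(7)]
noisy[3][3] = 1
cleaned = majority_filter(noisy)
measures = diagnostic_measures(tp=8, fp=2, tn=18, fn=2)
```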
Table (values not recovered); columns: Number of features, Sensitivity, Specificity.
Fig. 6 demonstrates how the second method distinguishes normal and abnormal regions in some sample frames and Table 3 shows these measures for the testing samples. In addition to Table 3, these results are illustrated as a chart in Fig. 7.
As in the first method, the Fisher test was used to find the 30 most discriminating features (out of 381 features) for each sub-image. Table 4 shows the effect of the number of features on the sensitivity and specificity of the second method (only for detecting tumor regions), indicating that 30 features were optimal; these 30 features are listed in Table 5. Using 35 features reduced the performance because some features caused confusion in the decision, and using 25 features reduced the performance because some discriminating features were absent.
The methods were run using MATLAB 2016a on a MacBook Pro (2.7 GHz Intel Core i5, 8 GB 1867 MHz DDR3). The average required time to process a frame was second (mean standard deviation) and this time was second.
Two methods were proposed: the first finds tumor frames in a video (or a dataset), and the second distinguishes normal regions from bleeding regions, from tumor regions, and from the regions of a group of five other abnormalities in a frame. Geometrical information was one of the main cues for distinguishing tumor and normal tissues in a frame. Therefore, invariant moments were used to extract shape features, first from the frames created by the Gabor filter bank and then from the grayscale version of the original frame. Moreover, GLCM, statistical, and Laws’ features were used twice: first, to capture the impact of texture features on the detection of tumor frames, and second, to extract geometry features after applying the Gabor filters. This method achieved a sensitivity of 100%, an accuracy of 96%, a specificity of 93%, and a precision of 90% for the detection of frames showing tumor. The invariant moments helped us to achieve good results.
The following features were the final ones used to train each neural network. TEF, SFB, SFT, and SFD are the total extracted features, the selected features for bleeding, the selected features for tumor, and the selected features for diseases, respectively.
It is important to find frames containing abnormalities; however, it is also helpful to find the abnormal regions within a frame. The second method was presented to distinguish normal and abnormal regions in a frame. In this case, texture features were more helpful; therefore, GLCM, statistical, LBP (LBP1 and LBP2), and Laws’ features were used. The results of this method are given in Table 3.
To summarize the significance of our study, as Table 6 shows, the frame-based detection method for tumor achieved a significantly higher sensitivity than the existing tumor and polyp detection methods. However, our method needs to be evaluated on more tumor frames, as our testing dataset was limited to 12 frames; we will attempt to do so by gathering more frames in future work. On the other hand, we had enough data to evaluate our pixel-based method. The pixel-based method for detecting bleeding achieved a slightly higher specificity than the previous studies. In addition, the pixel-based method for detecting tumor and other abnormalities improved on the results reported by the previous studies.
Deep learning has been used widely in image processing applications mansourdeep. The main advantage of the work presented here over a convolutional neural network (CNN) is that it shows the importance of a wide range of features (texture, geometry, and color features) for detecting different types of abnormalities in WCE frames, whereas a CNN is trained for a specific dataset and the internal features it uses cannot easily be identified in order to extend the application. In addition, we would need to gather far more frames to train a CNN. The method presented here can also be used to label regions for deep learning applications, as finding the ROI in the frames is time consuming for physicians. In future work, we will try to collect more data to train and test a CNN and to use methods that eliminate redundant images li2014online, maghsoudi2014detection. This might lead to a real-time method for segmenting abnormal regions, which may assist WCE manufacturers in adding biopsy and drug delivery to the capsule.
We are grateful to Dr. Hossein Asl Soleimani and Shariati Hospital for their help and for sharing the WCE frames with us.