The purpose of our study is to develop a novel unsupervised segmentation method for pathology images. Staging of lung cancer is a major prognostic factor. Measuring the maximum dimensions of the invasive component in a pathology image (see Fig. 1) is an essential task [detterbeck2017eighth]. Furthermore, measuring the maximum dimensions of the total tumor, including the noninvasive components, is also important [travis2016iaslc]. Therefore, segmentation methods that visualize invasive and noninvasive components on pathology images could assist pathological examination. In our study, we investigated whether representations learned by an unsupervised method aid the segmentation of pathology images. Research on unsupervised segmentation methods, especially for pathology images, is particularly promising because of the difficulty of obtaining manual annotations.
Our main contribution is to combine unsupervised representation learning with conventional clustering for pathology image segmentation. For unsupervised representation learning, we adopt spherical $k$-means [dhillon2001concept], whose training is much faster and easier to implement than CNN-based training. For clustering, we adopt conventional $k$-means [macqueen1967some]. To our knowledge, our method is the first to employ spherical $k$-means to learn feature representations for unsupervised segmentation.
The proposed segmentation method consists of two phases: (1) unsupervised learning of feature representations using spherical $k$-means, and (2) segmentation by applying conventional $k$-means to the feature representations. In phase (1), we run spherical $k$-means to learn feature representations of image patches randomly extracted from an unlabeled image. The purpose of this phase is to obtain centroids that can transform image patches into discriminative feature representations. In phase (2), we use conventional $k$-means to assign labels to the representations extracted with the centroids over the full image.
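The data flow of the two phases can be sketched on synthetic data. This is a minimal illustration rather than the paper's implementation: the array sizes, the randomly initialized filter bank standing in for trained spherical $k$-means centroids, the rectified encoding, and the use of scikit-learn's KMeans are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Minimal sketch of the two-phase pipeline on synthetic data.
rng = np.random.default_rng(0)
patches = rng.normal(size=(1000, 64))          # flattened image patches

# Phase (1): learn unit-length centroids D. Here a random filter bank
# stands in for the result of spherical k-means training.
D = rng.normal(size=(64, 50))
D /= np.linalg.norm(D, axis=0, keepdims=True)

# Phase (2): encode every patch with the centroids, then cluster the
# resulting feature representations with conventional k-means.
features = np.maximum(0.0, patches @ D)        # simple nonlinear encoding
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
```

The per-patch label array produced at the end is what gets projected back onto the image to form the segmentation.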
2.1 Representation Learning
It is known that spherical $k$-means can be used as a representation learning method [coates2012learning]. Given a set of image patches $X = \{x^{(1)}, \dots, x^{(m)}\}$, spherical $k$-means aims to find optimal centroids $D$ by dividing the data points into $k$ clusters according to:

$$\min_{D, s} \sum_{i} \left\| D s^{(i)} - x^{(i)} \right\|_2^2 \qquad (1)$$

where $s^{(i)}$ is a representation of $x^{(i)}$, called a "code vector". The centroids $D$ and the code vectors $s^{(i)}$ are constrained so that $x^{(i)}$ can be approximately reconstructed given $D$ and $s^{(i)}$: each code vector $s^{(i)}$ must have at most a single non-zero entry, and each column $D^{(j)}$ of $D$ must have unit length. To optimize Equation 1, we alternately update $s$ and $D$ as follows:

$$s^{(i)}_j := \begin{cases} D^{(j)\top} x^{(i)} & \text{if } j = \arg\max_{l} \left| D^{(l)\top} x^{(i)} \right| \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$

$$D := \mathcal{N}\!\left( X S^{\top} \right) \qquad (3)$$

where $S$ is the matrix whose columns are the code vectors $s^{(i)}$, and $\mathcal{N}(\cdot)$ normalizes each column to unit length. The optimal centroids $D$ can be used as filters that extract features [coates2012learning]. Spherical $k$-means can be executed rapidly, so the number of centroids $k$ can be set to a large value. Thus, we can obtain many centroids and learn a large number of features.
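The alternating optimization above can be sketched in a few lines of NumPy. The initialization scheme, the fixed iteration count, and the handling of empty clusters are implementation assumptions, not details from the paper.

```python
import numpy as np

def spherical_kmeans(X, k, n_iter=10, seed=0):
    """Spherical k-means in the style of Coates & Ng (2012).

    X: (d, m) matrix whose columns are whitened image patches.
    Returns D: (d, k) matrix of unit-length centroids (filters).
    """
    rng = np.random.default_rng(seed)
    d, m = X.shape
    # Initialize centroids as random unit-norm vectors (an assumption).
    D = rng.normal(size=(d, k))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iter):
        # Code vectors: at most one non-zero entry per patch (Eq. 2).
        P = D.T @ X                                # (k, m) projections
        best = np.argmax(np.abs(P), axis=0)
        S = np.zeros((k, m))
        S[best, np.arange(m)] = P[best, np.arange(m)]
        # Centroid update followed by column normalization (Eq. 3).
        D = X @ S.T
        norms = np.linalg.norm(D, axis=0, keepdims=True)
        norms[norms == 0] = 1.0                    # skip empty clusters
        D = D / norms
    return D
```

Because every step is a dense matrix product, the loop scales well to large $k$, which is what makes learning many filters cheap.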
We extract training patches from an unlabeled target image by randomly cropping small square sub-images. Note that the patch size should be chosen carefully, because $k$-means feature learning is highly sensitive to the dimensionality of the input data. We also set a proper intensity threshold so as not to include patches from the background. After extracting the training patches, we normalize the brightness and contrast of each patch. While Coates & Ng [coates2012learning] use the mean and variance of each patch, we instead use the mean and variance of the entire dataset $X$. The previous application of spherical $k$-means aimed to classify test images that are independent of each other; in contrast, our method aims to cluster patches from the same image, which are not independent. By using the global mean and variance, we retain the relative intensities among patches. A previous study has shown that correlation among input dimensions adversely affects image recognition performance [coates2010analysis]. After normalization, we therefore apply the ZCA whitening transform [bell1997independent] to the normalized patches in order to decorrelate them.
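The global normalization and ZCA whitening steps might look like the following in NumPy. The patch layout (one flattened patch per row) and the regularization constant `eps` are assumptions for the sketch.

```python
import numpy as np

def normalize_and_whiten(patches, eps=0.1):
    """Globally normalize patches, then apply ZCA whitening.

    patches: (m, d) array, one flattened training patch per row.
    eps regularizes small eigenvalues; its value here is an assumption.
    """
    # Global normalization: unlike per-patch normalization, this keeps
    # the relative intensities among patches from the same image.
    mu, sigma = patches.mean(), patches.std()
    Xn = (patches - mu) / (sigma + 1e-8)
    # ZCA whitening: rotate into the PCA basis, rescale each direction,
    # rotate back. This decorrelates the pixel dimensions while keeping
    # the result close to image space.
    Xc = Xn - Xn.mean(axis=0)                  # center each dimension
    C = Xc.T @ Xc / (len(Xc) - 1)              # (d, d) pixel covariance
    evals, evecs = np.linalg.eigh(C)
    W = evecs @ np.diag(1.0 / np.sqrt(evals + eps)) @ evecs.T
    return Xc @ W
```

After this transform the empirical covariance of the patches is approximately the identity (exactly so as `eps` goes to zero), which is the decorrelation property the text refers to.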
In the segmentation phase, we first extract all possible patches of the same size as the training patches from the target image, separated by a fixed stride. Note that the stride is not larger than the patch size, in order to ensure overlapping patches. As with the training patches, we exclude the background by thresholding. To transform the image patches into feature representations, we use a pipeline similar to a single-layered CNN. The trained filters are applied within each patch with a small stride, using the soft-threshold nonlinearity as the feature extraction function. After feature extraction with the $k$ filters, we obtain an array of $k$ intermediate representations. We reduce the dimensionality of each representation by sum pooling: concretely, we divide each intermediate representation into four equal-sized squares and apply sum pooling to each square region, obtaining a 4-dimensional pooled vector. We concatenate the pooled vectors from all intermediate representations into a $4k$-dimensional vector, which is used as the final representation. The process of feature extraction is illustrated in Fig. 2. Next, we divide the final representations into clusters by conventional $k$-means in order to assign a label to each representation. Finally, we project each label onto a small subpatch centered in the corresponding extracted patch.
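The patch-to-representation step (sliding filters, soft-threshold nonlinearity, four-quadrant sum pooling, concatenation) can be sketched as follows. The patch size, filter size, stride, and threshold `alpha` are illustrative values, not the paper's settings.

```python
import numpy as np

def extract_features(patch, D, filt=6, stride=2, alpha=0.5):
    """Map one grayscale patch to its final 4k-dimensional representation.

    patch: (w, w) patch; D: (filt*filt, k) trained filters as columns.
    filt, stride and alpha are illustrative assumptions.
    """
    w = patch.shape[0]
    k = D.shape[1]
    n = (w - filt) // stride + 1          # filter positions per axis
    maps = np.empty((n, n, k))            # k intermediate representations
    for i in range(n):
        for j in range(n):
            sub = patch[i*stride:i*stride+filt, j*stride:j*stride+filt]
            # Soft-threshold nonlinearity: g(z) = max(0, z - alpha).
            maps[i, j] = np.maximum(0.0, sub.reshape(-1) @ D - alpha)
    # Sum-pool each intermediate map over its four quadrants, then
    # concatenate the k 4-dim pooled vectors into a 4k-dim vector.
    h = n // 2
    quads = [maps[:h, :h], maps[:h, h:], maps[h:, :h], maps[h:, h:]]
    return np.concatenate([q.sum(axis=(0, 1)) for q in quads])
```

The returned vectors are what conventional $k$-means clusters in the final labeling step.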
3 Experiments and Results
We utilized a set of 70 pathology images from the same lung cancer specimen. The original size of the images is approximately 200,000 × 100,000 pixels, with a resolution of 0.220 × 0.220 µm/pixel. For the experiments, we converted them into reduced-scale images of approximately 2,000 × 1,000 pixels and 22 × 22 µm/pixel. The goal of segmentation was to divide each image into three histopathological regions: (a) invasive carcinoma; (b) noninvasive carcinoma; and (c) normal tissue.
3.2 Parameter Settings
We prepared 100,000 square training patches randomly extracted from one representative image. For representation learning with spherical $k$-means, we set the number of centroids to 200. At the beginning of the segmentation phase, we extracted patches from the target image with a stride of 1 pixel. For creating the feature representations, we applied the trained filters with a stride of 2 pixels to obtain 800-dimensional feature representations. For segmentation, we applied conventional $k$-means to divide the representations into three regions.
We used one manually annotated image to evaluate the proposed method. For quantitative evaluation, we used normalized mutual information (NMI), a standard metric for clustering; a larger NMI value indicates a better segmentation result. We compared our method with traditional $k$-means segmentation and the multithreshold Otsu method [otsu1979threshold]. As shown in Fig. 3, our method outperforms both traditional methods. Fig. 4 shows a qualitative example produced by the proposed method. Our method divided the pathology image into invasive carcinoma, noninvasive carcinoma, and normal lung more accurately than multithreshold Otsu and $k$-means. Additionally, we applied the centroids trained on one representative slice to the remaining 69 slices for feature extraction and segmentation. Fig. 5 shows 3D renderings of the 70 segmented slices. Anatomical regions are much easier to observe in the rendering of our results than in the rendering of the multithreshold Otsu segmentation results.
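The NMI evaluation can be computed with scikit-learn's implementation; the helper name below is ours and the toy label arrays are illustrative.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def evaluate_segmentation(pred_labels, true_labels):
    """Score a predicted label map against a manual annotation via NMI.

    NMI is permutation-invariant, so the arbitrary cluster ids produced
    by k-means need not match the annotation's class ids.
    """
    return normalized_mutual_info_score(np.ravel(true_labels),
                                        np.ravel(pred_labels))
```

Permutation invariance matters here because an unsupervised method has no way to know which of its clusters corresponds to, say, invasive carcinoma.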
Our method significantly outperformed traditional unsupervised methods both quantitatively and qualitatively. A likely reason is that our method learns not only pixel intensities but also the textures of local regions. Moreover, we found that the centroids obtained from one representative slice can be used to extract representations of the other slices for segmentation, which suggests that the centroids do not overfit to a single slice. However, our results often contained false labels on higher-intensity pixels in normal tissue regions, presumably because our method relies too heavily on pixel intensities.
We proposed a novel unsupervised segmentation method that obtains segmented images by clustering feature representations. The proposed method outperforms traditional unsupervised methods, demonstrating the potential of unsupervised representation learning for pathology image segmentation. Our segmentation method could be applicable to both 2D and 3D medical imaging applications.
This research was supported by MEXT/JSPS KAKENHI (26108006, 17K20099) and a JSPS Bilateral International Collaboration Grant.