Unsupervised Pathology Image Segmentation Using Representation Learning with Spherical K-means

04/11/2018 · by Takayasu Moriya, et al.

This paper presents a novel method for unsupervised segmentation of pathology images. Staging of lung cancer is a major prognostic factor. Measuring the maximum dimensions of the invasive component in a pathology image is an essential task. Therefore, image segmentation methods for visualizing the extent of invasive and noninvasive components on pathology images could support pathological examination. However, most recent segmentation methods rely on supervised learning and therefore cannot cope with unlabeled pathology images. In this paper, we propose a unified approach to unsupervised representation learning and clustering for pathology image segmentation. Our method consists of two phases. In the first phase, we learn feature representations of training patches from a target image using spherical k-means. The purpose of this phase is to obtain cluster centroids that can be used as filters for feature extraction. In the second phase, we apply conventional k-means to the representations extracted by the centroids and then project the cluster labels onto the target image. We evaluated our method on pathology images of a lung cancer specimen. Our experiments showed that the proposed method outperforms traditional k-means segmentation and the multithreshold Otsu method both quantitatively and qualitatively, with an improved normalized mutual information (NMI) score of 0.626, compared to 0.168 and 0.167, respectively. Furthermore, we found that the centroids can be applied to the segmentation of other slices from the same sample.




1 Purpose

The purpose of our study is to develop a novel unsupervised segmentation method for pathology images. Staging of lung cancer is a major prognostic factor. Measuring the maximum dimensions of the invasive component in a pathology image (see Fig. 1) is an essential task [detterbeck2017eighth]. Furthermore, measuring the maximum dimensions of the total tumor, including the noninvasive components, is also important [travis2016iaslc]. Therefore, segmentation methods for visualizing invasive and noninvasive components on pathology images could assist pathological examination. In our study, we investigated whether representations learned by an unsupervised method aid in the segmentation of pathology images. Research on unsupervised segmentation methods, especially for pathology images, is very promising because of the difficulty of obtaining manual annotations.

Figure 1: Left: invasive component in lung adenocarcinoma. The largest diameter of the invasive component is a significant prognostic factor for lung cancer. Right: noninvasive component in lung adenocarcinoma. On pathology images, noninvasive components are observed as lepidic features.

Our main contribution is to combine unsupervised representation learning with conventional clustering for pathology image segmentation. As the unsupervised representation learning method, we adopt spherical k-means [dhillon2001concept]. Spherical k-means training is much faster and easier to implement than CNN-based training. For clustering, we adopt conventional k-means [macqueen1967some]. To our knowledge, our method is the first to employ spherical k-means to learn feature representations for unsupervised segmentation.

2 Method

The proposed segmentation method consists of two phases: (1) unsupervised learning of feature representations using spherical k-means and (2) segmentation by applying conventional k-means to the feature representations. In phase (1), we run spherical k-means in order to learn feature representations of image patches randomly extracted from an unlabeled image. The purpose of this phase is to obtain centroids that can transform image patches into discriminative feature representations. In phase (2), we use conventional k-means to assign labels to the representations extracted by the centroids over the full image.

2.1 Representation Learning

It is known that spherical k-means can be used as a representation learning method [coates2012learning]. Given a set of image patches $X = \{x^{(1)}, \ldots, x^{(m)}\}$, spherical k-means aims to find optimal centroids $D$, a matrix whose $k$ columns $d^{(j)}$ are the centroids, by dividing the data points into $k$ clusters according to:

$$\min_{D, s} \sum_i \left\| D s^{(i)} - x^{(i)} \right\|_2^2 \quad (1)$$

where $s^{(i)}$ is the representation of $x^{(i)}$, called a "code vector". The centroids $D$ and the code vectors $s^{(i)}$ each carry a constraint that makes it possible to reconstruct $x^{(i)}$ when given $D$ and $s^{(i)}$: each code vector $s^{(i)}$ must have at most a single non-zero entry ($\|s^{(i)}\|_0 \le 1$), and each column of $D$ must have unit length ($\|d^{(j)}\|_2 = 1$). To solve Equation 1, we alternately optimize $s$ and $D$ as follows:

$$s_j^{(i)} := \begin{cases} d^{(j)\top} x^{(i)} & \text{if } j = \arg\max_l \left| d^{(l)\top} x^{(i)} \right| \\ 0 & \text{otherwise} \end{cases} \quad (2)$$

$$D := X S^{\top}, \text{ followed by normalizing each column of } D \text{ to unit length}, \quad (3)$$

where $S$ is the matrix whose columns are the code vectors $s^{(i)}$. The optimal centroids $D$ can be used as filters that extract features [coates2012learning]. Spherical k-means executes rapidly, so $k$ can be set to a large value. Thus, we can obtain many centroids and learn a large number of features.
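The alternating code/dictionary updates above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation; the random initialization scheme, iteration count, and empty-cluster handling are assumptions.

```python
import numpy as np

def spherical_kmeans(X, k, n_iter=10, seed=0):
    """Spherical k-means via alternating code/dictionary updates.

    X: (n, m) array whose columns are normalized, whitened patches.
    Returns D: (n, k) matrix whose unit-length columns are the centroids.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Initialize D with random unit-length columns (an assumed scheme).
    D = rng.standard_normal((n, k))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iter):
        # Code step: each patch activates only its best-matching centroid,
        # so each code vector has at most one non-zero entry.
        P = D.T @ X                              # (k, m) projections
        best = np.argmax(np.abs(P), axis=0)
        S = np.zeros((k, m))
        S[best, np.arange(m)] = P[best, np.arange(m)]
        # Dictionary step: D := X S^T, then renormalize columns to unit length.
        D_new = X @ S.T
        norms = np.linalg.norm(D_new, axis=0)
        nonempty = norms > 1e-10                 # keep old centroid if empty
        D[:, nonempty] = D_new[:, nonempty] / norms[nonempty]
    return D
```

The returned columns of `D` are the learned filters used for feature extraction in the segmentation phase.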

2.2 Pre-processing

We extract training patches from an unlabeled target image by randomly cropping fixed-size sub-images. Note that we should carefully choose the patch size, because k-means feature learning is significantly sensitive to the dimensionality of the input data. We also need to set a proper intensity threshold so as not to include patches from the background. After extracting the training patches, we normalize the brightness and contrast of each patch. While Coates & Ng [coates2012learning] use the mean and variance of each individual patch, we instead use the mean and variance of the entire dataset $X$. The previous application of spherical k-means aims to classify test images that are independent of each other; in contrast, our method aims to cluster patches from the same image, which are not independent. By using the global mean and variance, we retain the relative intensities among patches. A previous study has shown that correlation between inputs has a negative effect on image recognition [coates2010analysis]. After normalization, we therefore apply the ZCA whitening transform [bell1997independent] to the normalized patches in order to decorrelate them.
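The pre-processing steps above, global normalization followed by ZCA whitening, can be sketched as follows. This is an illustrative NumPy sketch, not the authors' code; the regularization constant `eps` is an assumed value.

```python
import numpy as np

def normalize_and_whiten(patches, eps=0.01):
    """Global brightness/contrast normalization followed by ZCA whitening.

    patches: (m, n) array, one flattened training patch per row.
    Uses the dataset-wide mean and standard deviation (not per-patch),
    so relative intensities among patches are preserved, then applies
    ZCA whitening to decorrelate pixel values.
    """
    # Global normalization over the entire dataset.
    X = (patches - patches.mean()) / (patches.std() + 1e-10)
    mean = X.mean(axis=0)                 # per-pixel mean for centering
    Xc = X - mean
    cov = Xc.T @ Xc / Xc.shape[0]
    vals, vecs = np.linalg.eigh(cov)      # eigendecomposition of covariance
    # ZCA: rotate, rescale by 1/sqrt(eigenvalue + eps), rotate back,
    # so the whitened data stays in the original pixel space.
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return Xc @ W, mean, W
```

The returned `mean` and `W` would be reused at segmentation time so that test patches pass through the same transform as the training patches.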

2.3 Segmentation

In the segmentation phase, we first extract as many patches as possible from the target image, each the same size as the training patches, at a fixed stride. Note that the stride is not larger than the patch size, in order to ensure overlapping patches. As with the training patches, we select only pixels within the sample by thresholding. To transform image patches into feature representations, we use a pipeline similar to a single-layer CNN. The trained filters are applied within each patch at a small stride in order to extract features, and we adopt the soft-threshold nonlinearity as the feature extraction function. After feature extraction with the k filters, we obtain an array of k intermediate representations. We reduce the dimensionality of each representation by sum pooling. Concretely, we divide each intermediate representation into four equal-sized squares and apply sum pooling to each square region in order to obtain a 4-dimensional pooled vector. We concatenate the pooled vectors from all k intermediate representations into a 4k-dimensional vector, which is used as the final representation. The process of feature extraction is illustrated in Fig. 2. Next, we divide the final representations into clusters by conventional k-means in order to assign a label to each representation. Finally, we project each label onto a subpatch centered in the corresponding extracted patch.

Figure 2: Illustration of the pipeline for creating the feature representation. We first apply the trained filters to an input patch and obtain k intermediate representations. Next, we sum-pool each intermediate representation over 4 equal-sized squares. Finally, we concatenate the pooled vectors into a 4k-dimensional vector used as the final representation.

3 Experiments and Results

3.1 Datasets

We utilized a set of 70 pathology images from the same lung cancer specimen. The original size of the images is approximately 200,000 × 100,000 pixels, at a resolution of 0.220 × 0.220 µm/pixel. For the experiments, we downscaled them to approximately 2,000 × 1,000 pixels at 22 × 22 µm/pixel. The goal of segmentation was to divide each image into three histopathological regions: (a) invasive carcinoma; (b) noninvasive carcinoma; and (c) normal tissue.

3.2 Parameter Settings

We prepared 100,000 training patches randomly extracted from one representative image. For representation learning with spherical k-means, we set the number of clusters k to 200. At the beginning of the segmentation phase, we extracted patches with a stride of 1 pixel from the target image. For creating feature representations, we applied the trained filters with a stride of 2 pixels in order to obtain 800-dimensional feature representations (4k with k = 200). For segmentation, we ran conventional k-means to divide the representations into three regions.

3.3 Evaluations

We used one manually annotated image to evaluate the proposed method. For quantitative evaluation, we used a standard clustering metric, normalized mutual information (NMI); a larger NMI value means a better segmentation result. We compared our method with traditional k-means segmentation and the multithreshold Otsu method [otsu1979threshold]. As shown in Fig. 3, our method outperforms the traditional methods. Figure 4 shows a qualitative example produced by the proposed method. Our method divided the pathology image into invasive carcinoma, noninvasive carcinoma, and normal lung more accurately than multithreshold Otsu and k-means. Additionally, we applied the centroids from the one representative slice to the remaining 69 slices for feature extraction and segmentation. Figure 5 shows 3D renderings of the 70 segmented slices. Anatomical regions are much easier to observe in the rendering of our results than in the rendering of the multithreshold Otsu segmentation results.
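For reference, NMI between a predicted label map and the ground truth can be computed directly from the label arrays. The sketch below uses the geometric-mean normalization NMI = I(A;B)/√(H(A)·H(B)), one common convention; other normalizations exist, and the paper does not state which one was used.

```python
import numpy as np

def nmi(labels_a, labels_b):
    """Normalized mutual information between two flat label assignments.

    Identical partitions (up to relabeling) score 1.0; independent
    partitions score near 0.
    """
    a = np.asarray(labels_a).ravel()
    b = np.asarray(labels_b).ravel()
    n = a.size
    _, ia = np.unique(a, return_inverse=True)
    _, ib = np.unique(b, return_inverse=True)
    # Joint contingency table of the two labelings.
    cont = np.zeros((ia.max() + 1, ib.max() + 1))
    np.add.at(cont, (ia, ib), 1)
    pxy = cont / n
    px = pxy.sum(axis=1)
    py = pxy.sum(axis=0)
    nz = pxy > 0
    mi = np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz]))
    hx = -np.sum(px[px > 0] * np.log(px[px > 0]))
    hy = -np.sum(py[py > 0] * np.log(py[py > 0]))
    return mi / max(np.sqrt(hx * hy), 1e-12)
```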

4 Discussion

Our method significantly outperformed traditional unsupervised methods both quantitatively and qualitatively. The reason may be that our method learns not only pixel intensities but also the textures of local regions. Moreover, we found that the centroids obtained from one representative slice can be used to extract representations of other slices for segmentation, which suggests that the centroids do not over-fit to a single slice. However, our method often assigned false labels to higher-intensity pixels in normal tissue regions, seemingly because it relies too heavily on pixel intensities.

Figure 3: Comparison of NMI scores. Our method outperforms the traditional methods.
(a) Original slice
(b) Ground truth
(c) Multi Otsu
(d) k-means
(e) Our result
Figure 4: Segmentation results on the pathology image. In the ground truth, the red, green, and blue regions correspond to invasive carcinoma, noninvasive carcinoma, and normal tissue, respectively. Our method divided the pathology image into invasive carcinoma, noninvasive carcinoma, and normal lung better than multithreshold Otsu and conventional k-means. The lower-right images in (c), (d), and (e) are zoomed views of the region in the black window. As shown in the zoomed images, our method produces far fewer false labels in small structures than multithreshold Otsu and k-means.
(a) 3D rendering of Otsu’s segmentation results
(b) 3D rendering of our segmentation results
Figure 5: 3D renderings of the segmentation results of all 70 slices. We obtained centroids from one representative slice and applied them to the remaining 69 images for feature extraction and segmentation. Only invasive carcinoma (red) and noninvasive carcinoma (semitransparent green) are visualized.

5 Conclusion

We proposed a novel unsupervised segmentation method that obtains segmented images by clustering feature representations. Our proposed method outperforms traditional unsupervised methods. We demonstrated the potential of unsupervised representation learning for pathology image segmentation. Our segmentation method could be applicable to both 2D and 3D medical imaging applications.


This research was supported by KAKENHI grants from MEXT and JSPS (26108006, 17K20099) and the JSPS Bilateral International Collaboration Grants.