Classification of lung nodules in CT images based on Wasserstein distance in differential geometry

06/30/2018 ∙ by Min Zhang, et al.

Lung nodules are commonly detected when screening patients at risk for lung cancer. Though the status of large nodules can be easily diagnosed by fine needle biopsy or bronchoscopy, small nodules are often difficult to classify on computed tomography (CT). Recent works have shown that shape analysis of lung nodules can be used to differentiate benign lesions from malignant ones, though existing methods are limited in their sensitivity and specificity. In this work we introduce a new 3D shape analysis, within the framework of differential geometry, that calculates the Wasserstein distance between benign and malignant lung nodules to derive an accurate classification scheme. The Wasserstein distance between the nodules is calculated with our new spherical optimal mass transport algorithm, which works directly on the sphere using the spherical metric and is more accurate and efficient than previous methods. In the process of deformation, the area-distortion factor gives a probability measure on the unit sphere, and such measures form the Wasserstein space. From known cases of benign and malignant lung nodules, we can calculate a unique optimal mass transport map between their corresponding probability measures. The transportation cost of this map defines the Wasserstein distance between them and can be used to classify new lung nodules into either the benign or the malignant class. To the best of our knowledge, this is the first work that utilizes the Wasserstein distance for lung nodule classification. The advantage of the Wasserstein distance is that it is invariant under rigid motions and scalings, so it intrinsically measures shape distance even when the underlying shapes are of high complexity. This makes it well suited to classifying lung nodules, which have different sizes, orientations, and appearances.




1 Introduction

Lung nodules are commonly detected when screening patients at risk for lung cancer. Though the status of large nodules can be easily diagnosed by fine needle biopsy or bronchoscopy, small nodules are very difficult to assess, especially if they are located deep in the tissue or away from the large airways. Most small nodules can only be diagnosed from CT images, where many visual features are extracted from the scan data to classify a specific nodule as benign or malignant. Some recent studies show that the shape of lung nodules is an important feature for distinguishing benign lesions from malignant ones. Clinicians usually classify nodules based on their morphologic characteristics and make predictions about the "level of suspicion" [1]. This manual analysis, however, is an error-prone and time-consuming process. In this work, we introduce a new approach that classifies detected lung nodules based on their shapes by using the Wasserstein distance within the framework of differential geometry.

The Wasserstein distance has been widely studied for shape analysis, and it has also been used for medical image analysis [2]. In our work, the 3D shape of a lung nodule is conformally mapped to the unit sphere, and the conformal map distorts the surface area. The area-distortion factor gives a probability measure on the unit sphere, which is a point in the Wasserstein space. Given any two such probability measures, there is a unique optimal mass transport map between them, and the Wasserstein distance is defined as the transportation cost of that map. It has significant power to measure the intrinsic differences between shapes and can therefore be used for shape classification.

The ultimate goal of the classification is to differentiate malignant nodules from benign ones, so we classify the nodules into two categories, malignant and benign, based on the Wasserstein distance between lung nodules with different morphology (Figure 1).

To the best of our knowledge, this is the first work that utilizes the Wasserstein distance for lung nodule classification. The Wasserstein distance is invariant under rigid motions and scalings, so it intrinsically measures shape distance even when the underlying shapes are of high complexity. This makes it well suited to classifying lung nodules, which have different sizes, orientations, and appearances. We also present a more efficient algorithm for calculating the Wasserstein distance under the spherical metric.

(a) spherical (b) lobulation (c) spiculation

Figure 1: Lung nodules with different morphology

(a) source (b) result (c) target

Figure 2: Optimal Mass Transport map for Sphere

2 Related Work

Pulmonary nodules are often detected in populations at high risk of lung cancer [3]. Over the years, there has been a large body of work on lung nodule detection and diagnosis [4, 5, 6]. Studies have shown that morphologic features play an important role in discriminating malignant nodules from benign ones on computed tomography (CT) [7]. Studies have also shown that quantitative analysis of the morphologic features of lung nodules improves radiologists' classification of malignant and benign nodules [8, 9].

Most current approaches for classifying lung nodules are based on 2D features. Mori et al. [10] evaluate temporal changes as a feature of benign and malignant nodules based on a curvedness index in combination with dynamic contrast-enhanced CT. Gurney et al. [11] designed CAD systems based on neural networks for lung nodule classification. Furuya et al. [12] analyzed the margin characteristics of 193 pulmonary nodules and classified the lung nodules based on the analysis. They found that spiculated and lobulated ones were malignant, and round ones proved to be benign. Several groups developed approaches based on 3D lung nodule models. For example, Kawata et al. [13] classified lung nodules based on their 3D curvatures and the relationship of the nodules to their surrounding features. El-Baz et al. [14] introduced a 3D shape analysis method based on a spherical harmonics shape index and applied it to lung nodule classification. Kurtek et al. [15] provided a Riemannian framework to compare and match 3D shapes by computing geodesic paths. Mahmoudi et al. [16] compute the histogram of pairwise diffusion distances between all points to describe the shapes. Jermyn et al. [17] compare and analyze shapes based on the definition of a general elastic metric on the space of parameter domains.

Our 3D shape analysis approach is based on the spherical Wasserstein distance, and we show that it has unique advantages over the above-mentioned 2D and 3D approaches. The existing 2D methods do not consider the 3D features of the lung nodules, which are important for describing their pathological characteristics. The existing 3D methods have their own limitations, as they typically cannot quantitatively measure the differences between two complex shapes such as those we encounter in lung nodules.

Compared to the conventional methods, our shape classification method depends only on Riemannian metrics and is invariant under rigid motions and scalings, which makes it more effective and efficient for shape classification.

3 Methods

This section gives a brief introduction to the theoretical background and the computational method.

3.1 Theoretic Background

Here we only introduce the most fundamental concepts and theorems. For detailed treatments, we refer readers to [18] for conformal geometry, and to [19], [20] for optimal mass transportation.

3.1.1 Optimal Mass Transport

The optimal mass transportation problem was first raised by Monge [21] in the 18th century. In the following, we formulate the problem in the general Riemannian manifold setting.

Suppose $(M, g)$ is a Riemannian manifold with a Riemannian metric $g$, and let $\mu$ and $\nu$ be two probability measures on $M$ with the same total mass, $\mu(M) = \nu(M)$. A map $T: M \to M$ is measure preserving if, for any measurable set $B \subset M$, $\mu(T^{-1}(B)) = \nu(B)$. Namely, $T$ pushes $\mu$ forward to $\nu$, denoted as $T_\# \mu = \nu$. The optimal mass transportation problem is formulated as follows:

Problem 1 (Optimal Mass Transport)

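Given a transportation cost function $c: M \times M \to \mathbb{R}$, find the measure preserving map $T: M \to M$ that minimizes the total transportation cost

$$\mathcal{C}(T) := \int_M c(x, T(x)) \, d\mu(x).$$

In our current work, the cost function is the squared geodesic distance, $c(x, y) = d_g^2(x, y)$.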
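To make Problem 1 concrete, the following is a minimal sketch for the simplest discrete case, assuming both measures are uniform over the same number of points, so that measure preserving maps are exactly the permutations. The function names and the toy points on the real line are illustrative assumptions, not part of the original method, and the brute-force search is only practical for a handful of points.

```python
import itertools
import math


def monge_brute_force(source, target, cost):
    """Return (best_permutation, total_cost) for two uniform discrete measures.

    Each of the n source points carries mass 1/n and must be sent to a
    distinct target point, so the candidate maps are the n! permutations.
    """
    n = len(source)
    if n != len(target):
        raise ValueError("uniform measures need equally many points")
    best_perm, best_cost = None, math.inf
    for perm in itertools.permutations(range(n)):
        # Total cost: sum over source points of mass * c(x, T(x)).
        total = sum(cost(source[i], target[perm[i]]) for i in range(n)) / n
        if total < best_cost:
            best_perm, best_cost = perm, total
    return best_perm, best_cost


def squared_distance(x, y):
    """Squared geodesic distance on the real line (an illustrative manifold)."""
    return (x - y) ** 2


# Hypothetical toy data: three equally weighted points on each side.
source = [0.0, 1.0, 2.0]
target = [1.5, 0.5, 2.5]
best_perm, best_cost = monge_brute_force(source, target, squared_distance)
print(best_perm, best_cost)
```

On the real line with squared-distance cost, the optimal map is the order-preserving one, which is what the search finds for this toy input.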

In the 1940s, Kantorovich solved the optimal transportation problem by relaxing the transportation maps to transportation plans, and introduced the linear programming method.


Theorem 3.1 (Kantorovich)

Suppose $(M, g)$ is a Riemannian manifold, the probability measures $\mu$ and $\nu$ have the same total mass, $\mu$ is absolutely continuous, $\nu$ has finite second moment, and the cost function is the squared geodesic distance. Then the optimal mass transportation map exists and is unique.

3.1.2 Wasserstein Metric Space

All the probability measures on the Riemannian manifold form the Wasserstein space, and the optimal transportation cost defines the metric of this space.

Definition 1 (Wasserstein Space)

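For $n \ge 1$, let $\mathcal{P}_n(M)$ denote the space of all probability measures $\mu$ on $M$ with finite $n$-th moment, that is, $\int_M d_g(x, x_0)^n \, d\mu(x) < +\infty$ for some $x_0 \in M$, where $d_g$ is the geodesic distance induced by $g$.

Given two probabilities $\mu$ and $\nu$ in $\mathcal{P}_n(M)$, the Wasserstein distance between them is defined as the transportation cost induced by the optimal transportation map $T: M \to M$,

$$W_n(\mu, \nu) := \inf_{T_\# \mu = \nu} \left( \int_M d_g(x, T(x))^n \, d\mu(x) \right)^{1/n}.$$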
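As a small sanity check of this definition, consider the simplest possible case of two point masses. This worked example is ours rather than the paper's, and the notation ($\delta_x$ for the unit point mass at $x$, $d_g$ for the geodesic distance, $W_n$ for the Wasserstein distance of order $n$) is assumed. There is only one measure preserving map, so the infimum is trivial and the Wasserstein distance reduces to the geodesic distance between the two points.

```latex
% Assumed notation: \delta_x is the unit point mass at x, d_g the geodesic distance.
\mu = \delta_x, \quad \nu = \delta_y
\quad \Longrightarrow \quad
W_n(\mu, \nu)
  = \left( \int_M d_g(x', T(x'))^n \, d\mu(x') \right)^{1/n}
  = \left( d_g(x, y)^n \right)^{1/n}
  = d_g(x, y).

% On the unit sphere, with x the north pole and y any point on the equator:
W_n(\delta_x, \delta_y) = \arccos \langle x, y \rangle = \arccos 0 = \frac{\pi}{2}.
```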

The following theorem plays a fundamental role in the current work.

Theorem 3.2

[22] The Wasserstein distance $W_n$ is a Riemannian metric of the Wasserstein space $\mathcal{P}_n(M)$.

3.1.3 Discrete Optimal Mass Transport

Suppose the target measure $\nu$ is a sum of Dirac measures, namely, its support is a discrete point set, $P = \{p_1, p_2, \ldots, p_k\}$. Each point $p_i$ carries the measure $\nu_i$. Then the optimal mass transportation map is given by a geodesic power diagram.

Definition 2 (Geodesic Power Diagram)

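Given a point set $P = \{p_1, \ldots, p_k\}$ and the weight $\mathbf{h} = (h_1, \ldots, h_k)$, the geodesic power diagram induced by $(P, \mathbf{h})$ is a cell decomposition of the manifold $(M, g)$, such that the cell associated with $p_i$ is given by

$$W_i := \{ x \in M \mid d_g^2(p_i, x) + h_i \le d_g^2(p_j, x) + h_j, \ \forall j \}.$$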

The following theorem lays down the foundation of our algorithm.

Theorem 3.3 (Discrete Optimal Mass Transport)

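Given a Riemannian manifold $(M, g)$, suppose two probability measures $\mu$ and $\nu$ are of the same total mass, and $\nu$ is a sum of Dirac measures with discrete point set support $P = \{p_1, \ldots, p_k\}$, $\nu(p_i) = \nu_i$. Then there exists a weight $\mathbf{h} = (h_1, \ldots, h_k)$, unique up to a constant, such that the geodesic power diagram induced by $(P, \mathbf{h})$ gives the optimal mass transportation map,

$$T: W_i \mapsto p_i, \quad i = 1, \ldots, k,$$

and furthermore $\mu(W_i) = \nu_i$.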
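The role of the weights in a geodesic power diagram can be illustrated on the unit sphere with a few lines of code. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the source measure is taken to be uniform and is approximated by random samples, a point is assigned to the site minimizing the squared geodesic distance plus the weight (this sign convention is our assumption), and all function names are hypothetical.

```python
import math
import random


def geodesic_distance(p, q):
    """Great-circle distance between two unit vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding


def power_cell_index(x, sites, weights):
    """Index i minimizing d(p_i, x)^2 + h_i, i.e. the power cell containing x."""
    return min(range(len(sites)),
               key=lambda i: geodesic_distance(sites[i], x) ** 2 + weights[i])


def sample_sphere(n, rng):
    """Draw n points uniformly on the unit sphere by normalizing Gaussians."""
    points = []
    while len(points) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            points.append(tuple(c / norm for c in v))
    return points


def cell_measures(samples, sites, weights):
    """Monte Carlo estimate of the source measure of each power cell."""
    counts = [0] * len(sites)
    for x in samples:
        counts[power_cell_index(x, sites, weights)] += 1
    return [c / len(samples) for c in counts]


# Hypothetical example: two sites at the poles, uniform source measure.
rng = random.Random(0)
samples = sample_sphere(4000, rng)
sites = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
print(cell_measures(samples, sites, [0.0, 0.0]))   # roughly half each
print(cell_measures(samples, sites, [20.0, 0.0]))  # first cell is empty
```

With zero weights the diagram is the ordinary Voronoi diagram of the sites. Raising one weight shrinks that site's cell, which is the degree of freedom the theorem uses to match each cell's measure to its target.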

3.1.4 Discrete Spherical Optimal Mass transport

Let $\mathbb{S}^2$ be the unit sphere. A great circle is the intersection of $\mathbb{S}^2$ with a plane passing through its center. The geodesic between two points on $\mathbb{S}^2$ is a portion of the great circle that passes through these two points. The geodesic distance is $d(p, q) = \arccos \langle p, q \rangle$, where $\langle \cdot, \cdot \rangle$ is the ordinary dot product.
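The spherical geodesic distance and the great-circle arc it measures are straightforward to compute. The following sketch assumes points are given as unit vectors in 3D; the function names are illustrative, and the clamping step is a numerical precaution of ours, since rounding can push a dot product slightly outside the domain of the arccosine.

```python
import math


def dot(p, q):
    """Ordinary dot product of two 3D vectors."""
    return sum(a * b for a, b in zip(p, q))


def geodesic_distance(p, q):
    """Great-circle distance arccos<p, q> between unit vectors p and q."""
    # Clamp: rounding can push the dot product slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot(p, q))))


def point_on_geodesic(p, q, t):
    """Point at fraction t in [0, 1] along the shorter great-circle arc from p to q."""
    theta = geodesic_distance(p, q)
    s = math.sin(theta)
    if s < 1e-12:
        raise ValueError("arc is degenerate (same point) or not unique (antipodal points)")
    a = math.sin((1.0 - t) * theta) / s
    b = math.sin(t * theta) / s
    return tuple(a * x + b * y for x, y in zip(p, q))


p, q = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
midpoint = point_on_geodesic(p, q, 0.5)
print(geodesic_distance(p, q), geodesic_distance(p, midpoint))
```

For two orthogonal unit vectors the distance is a quarter of a great circle, and the midpoint of the arc lies at half that distance from either end.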

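Suppose that we are given a set of circles $\mathcal{C} = \{c_1, \ldots, c_n\}$ on $\mathbb{S}^2$, where $c_i$ has center $p_i$ and radius $r_i$. The Laguerre proximity [23] of a point $p$ to the circle $c_i$ is defined as

$$d_L(p, c_i) = \frac{\cos d(p, p_i)}{\cos r_i}.$$

The region which consists of the points that are closer to $c_i$ than to any other circle in $\mathcal{C}$ is defined by

$$R(c_i) = \{ p \in \mathbb{S}^2 \mid d_L(p, c_i) > d_L(p, c_j), \ \forall j \neq i \}.$$

The partition of $\mathbb{S}^2$ by these regions gives the spherical power diagram for $\mathcal{C}$. The Poincaré dual of the spherical power diagram gives the spherical Delaunay triangulation. All the edges of the triangulation and of the spherical power diagram are geodesics. Moreover, every edge of the power diagram is perpendicular to its dual edge in the triangle mesh.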
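A hedged sketch of how a point could be assigned to a cell of the spherical power diagram follows. It assumes the Laguerre proximity of a point to a circle is the cosine of the geodesic distance to the circle's center divided by the cosine of the circle's radius, and that a larger value means a closer circle; both the formula and this convention are our reading of [23], and the function names are hypothetical. Radii are assumed to be smaller than a quarter circle so that the cosine stays positive.

```python
import math


def laguerre_proximity(p, center, radius):
    """Assumed proximity cos(d(p, center)) / cos(radius); larger means closer."""
    if not 0.0 <= radius < math.pi / 2:
        raise ValueError("radius must lie in [0, pi/2)")
    # For unit vectors, cos of the geodesic distance is just the dot product.
    cos_distance = sum(a * b for a, b in zip(p, center))
    cos_distance = max(-1.0, min(1.0, cos_distance))
    return cos_distance / math.cos(radius)


def spherical_power_cell(p, circles):
    """Index of the circle (center, radius) with the largest proximity to p."""
    return max(range(len(circles)),
               key=lambda i: laguerre_proximity(p, circles[i][0], circles[i][1]))


# Hypothetical example: a point equidistant from two centers.
s = math.sqrt(0.5)
point = (s, 0.0, s)
north, east = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)
print(spherical_power_cell(point, [(north, 1.0), (east, 0.1)]))
```

Under these assumptions, equal radii reduce the diagram to the ordinary spherical Voronoi diagram, while a point equidistant from two centers falls into the cell of the circle with the larger radius.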

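The measure of each cell $R(c_i)$ under the source measure $\mu$ is denoted as $w_i$,

$$w_i = \int_{R(c_i)} d\mu.$$

The energy defined on the radii $\mathbf{r} = (r_1, \ldots, r_n)$ of the $c_i$'s is given by

$$E(\mathbf{r}) = \int_{\mathbf{0}}^{\mathbf{r}} \sum_{i=1}^{n} (w_i - \nu_i) \, dr_i,$$

where $\nu_i$ is the target measure of the $i$-th cell. The gradient of $E$ equals the difference between the current measure and the target measure, $\nabla E = (w_1 - \nu_1, \ldots, w_n - \nu_n)$. For the Hessian of the energy $E$, the symmetry $\partial w_i / \partial r_j = \partial w_j / \partial r_i$ can be obtained by direct calculation. Since $\sum_i w_i$ is the constant total mass, $\sum_i \partial w_i / \partial r_j = 0$. Since $\partial w_i / \partial r_j = \partial w_j / \partial r_i$, by symmetry we have

$$\frac{\partial w_i}{\partial r_i} = -\sum_{j \neq i} \frac{\partial w_i}{\partial r_j}.$$

By varying the power radius locally, that is, increasing the power radius $r_i$ while fixing all others, the area of the spherical power cell $R(c_i)$ increases monotonically, so we find that $\partial w_i / \partial r_i > 0$. Thus the Hessian matrix is diagonally dominant and positive semi-definite. By adding one more constraint, such as fixing the power radius at one point, the Hessian matrix becomes positive definite.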

3.2 Computational Method

Based on the theory above, we developed a specialized algorithm for lung nodules. In our new algorithm, all calculations (edge length, cell area, integration over a region) are carried out under the spherical metric. The exact formula for the Hessian matrix is too complicated to present here; nevertheless, it is computable. The energy can be minimized by Newton's method, using the new formulas for the gradient and the Hessian matrix. Figure 2 shows a source sphere with measure $\mu$, a target sphere with measure $\nu$, and the optimal transportation map between them.
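The optimization described above can be sketched in a simplified form. The code below is not the authors' algorithm: it replaces Newton's method with a plain first-order update, approximates a uniform source measure on the sphere by random samples instead of integrating exactly under the spherical metric, and works with additive power weights on squared geodesic distances rather than circle radii. All names, the step size, and the sample count are assumptions chosen for illustration. What it does share with the text is the gradient, which is the difference between the current and the target cell measures, and the normalization that pins one weight to remove the constant ambiguity.

```python
import math
import random


def sample_sphere(n, rng):
    """Draw n points uniformly on the unit sphere by normalizing Gaussians."""
    points = []
    while len(points) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            points.append(tuple(c / norm for c in v))
    return points


def squared_geodesic(p, q):
    """Squared great-circle distance between unit vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, dot))) ** 2


def cell_measures(sq_dist, h):
    """Fraction of samples in each power cell {x : d^2(p_i, x) + h_i minimal}."""
    counts = [0] * len(h)
    for row in sq_dist:
        i = min(range(len(h)), key=lambda j: row[j] + h[j])
        counts[i] += 1
    return [c / len(sq_dist) for c in counts]


def solve_weights(sites, target, samples, step=1.0, iterations=300):
    """First-order iteration driving the cell measures toward the target."""
    sq_dist = [[squared_geodesic(p, x) for p in sites] for x in samples]
    h = [0.0] * len(sites)
    for _ in range(iterations):
        w = cell_measures(sq_dist, h)
        # Gradient = current measure - target; a larger h_i shrinks cell i.
        h = [hi + step * (wi - ni) for hi, wi, ni in zip(h, w, target)]
        # Pin the first weight: weights are only defined up to a constant.
        h = [hi - h[0] for hi in h]
    return h, cell_measures(sq_dist, h)


# Hypothetical example: three sites and an uneven target measure.
rng = random.Random(0)
samples = sample_sphere(1500, rng)
sites = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
target = [0.5, 0.3, 0.2]
weights, measures = solve_weights(sites, target, samples)
print(weights, measures)
```

A Newton solver as described in the text would converge in far fewer iterations, at the cost of assembling the Hessian from the power diagram's edge lengths.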

4 Experiments

The morphology of lung nodules has been shown to influence the likelihood ratios for malignancy [7, 12] and reflects the underlying pathological characteristics. In order to predict the likelihood of malignancy of a single lung nodule, we classified the nodules as malignant or benign by using the Wasserstein distance.

4.1 Data Preparation

The lung nodule images came from computed tomography chest screens of patients visiting our institutions. The interval between the slices was around 1 mm in all cases. All the nodules were surgically removed after diagnosis and pathologically verified for malignancy. The 3D models of the lung nodules were segmented and reconstructed by trained experts using commercial software and confirmed by thoracic surgeons. Our experimental dataset had 55 lung nodules in total, including 39 malignant and 16 benign nodules. As the ground truth, they were classified into the categories spiculation, lobulation, and spherical by a thoracic surgeon and a thoracic radiologist. There are lobulation nodules, spiculation nodules, and spherical nodules in our dataset.

4.2 Experiments Results

For classification, we computed the full pairwise Wasserstein distance matrix with our new spherical optimal mass transportation algorithm. With this distance matrix, we validated our algorithm on lung nodule classification using the dataset of 39 malignant and 16 benign nodules. An SVM was employed as the classifier with 10-fold cross validation, and the input feature vector of the classifier includes 55 features. The experiment result shows that our correctness rate is


The result presented in [12] shows that different morphology indicates different probability of malignancy. There lobulation nodules, spiculation nodules were malignant and of spherical nodules were benign in their dataset. Our correctness rate matches the average probability of malignancy or benignity for nodules with different morphology, which indicates that our proposed algorithm analyzes and classifies the shapes correctly. However, quantitative analysis of the shape of lung nodules may not be enough to predict malignancy in all cases.
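The evaluation protocol in this section, a precomputed pairwise distance matrix fed to a classifier under k-fold cross validation, can be sketched as follows. To keep the example dependency-free, a nearest-neighbor rule stands in for the SVM used in the experiments, so this is a different and simpler classifier. The toy distance matrix, the labels, and the function name are hypothetical and are not the paper's data.

```python
def cross_validated_nearest_neighbor(dist, labels, folds):
    """Accuracy of a 1-nearest-neighbor rule on a precomputed distance matrix.

    dist[i][j] is the distance between items i and j (for instance a pairwise
    Wasserstein distance). Item i is placed in fold i % folds.
    """
    n = len(labels)
    correct = 0
    for fold in range(folds):
        test = [i for i in range(n) if i % folds == fold]
        train = [i for i in range(n) if i % folds != fold]
        for i in test:
            # Predict the label of the closest training item.
            nearest = min(train, key=lambda j: dist[i][j])
            correct += labels[nearest] == labels[i]
    return correct / n


# Hypothetical toy data: two well-separated groups on a line.
positions = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
dist = [[abs(a - b) for b in positions] for a in positions]
print(cross_validated_nearest_neighbor(dist, labels, folds=4))
```

Working from the distance matrix alone means the classifier never needs coordinates for the shapes, only their pairwise distances, which is what makes a metric such as the Wasserstein distance directly usable as a feature.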

5 Conclusion

This paper introduces a novel 3D shape classification framework for lung nodule diagnosis based on the spherical Wasserstein distance. We developed a specialized optimal mass transport map on the sphere for computing the Wasserstein distance between lung nodules. Compared to existing methods, our algorithm is faster and more efficient. It depends only on the Riemannian metric and is invariant under rigid motion and scaling.

We analyzed the 3D shapes of both malignant and benign nodules with our proposed algorithm and were able to differentiate the malignant nodules from the benign ones. Our test results show that using the Wasserstein distance to describe the shape difference between lung nodules can constitute an efficient discriminatory feature, and that it is valuable in clinical practice, although it is not sufficient for all cases.


  • [1] Bartholmai B et al. Pulmonary nodule characterization, including computer analysis and quantitative features. Thorac Imaging, 30(2):139–156, 2015.
  • [2] Zhengyu Su, Wei Zeng, Yalin Wang, Zhong-Lin Lu, and Xianfeng Gu. Shape classification using wasserstein distance for brain morphometry analysis. In International Conference on Information Processing in Medical Imaging, pages 411–423. Springer, 2015.
  • [3] Weir H et al. Annual report to the nation on the status of cancer, 1975-2000. National Cancer Institute 95, pages 1276–1299, 2003.
  • [4] Preeti Aggarwal, Vig Renu, and Hk Sardana. Lung cancer detection using fusion of medical knowledge and content based image retrieval for lidc dataset. Journal of Medical Imaging and Health Informatics, 6:297–311, 04 2016.
  • [5] Michael C. Lee, Lilla Boroczky, Kivilcim Sungur-Stasik, Aaron D. Cann, Alain C. Borczuk, Steven M. Kawut, and Charles A. Powell. Computer-aided diagnosis of pulmonary nodules using a two-step approach for feature selection and classifier ensemble construction. Artificial Intelligence in Medicine, 50(1):43–53, 2010.
  • [6] Ted W Way, Berkman Sahiner, Heang-Ping Chan, Lubomir M. Hadjiiski, Philip N. Cascade, Aamer Chughtai, Naama R. Bogot, and Ella A. Kazerooni. Computer-aided diagnosis of pulmonary nodules on ct scans: improvement of classification performance with nodule surface features. Medical physics, 36 7:3086–98, 2009.
  • [7] Gurney J, Lyddon D, McKay J. Determining the likelihood of malignancy in solitary pulmonary nodules with bayesian analysis. part ii. application. Radiology, 186(2):415–422, 1993.
  • [8] Feng Li, Masahito Aoyama, Junji Shiraishi, Hiroyuki Abe, Qiang Li, Kenji Suzuki, Roger Engelmann, Shusuke Sone, Heber MacMahon, and Kunio Doi. Radiologists' performance for differentiating benign from malignant lung nodules on high-resolution ct using computer-estimated likelihood of malignancy. American Journal of Roentgenology, 183(5):1209–1215, 2004.
  • [9] Kazuo Awai, Kohei Murao, et al. Pulmonary nodules: Estimation of malignancy at thin-section helical ct - effect of computer-aided diagnosis on performance of radiologists. Radiology, 239:276–284, 2006.
  • [10] Kiyoshi Mori, Noboru Niki, Teturo Kondo, Yukari Kamiyama, Teturo Kodama, Yoshiki Kawada, and Noriyuki Moriyama. Development of a novel computer-aided diagnosis system for automatic discrimination of malignant from benign solitary pulmonary nodules on thin-section dynamic computed tomography. Journal of computer assisted tomography, 29(2):215–222, 2005.
  • [11] Gurney JW, Swensen SJ. Solitary pulmonary nodules: determining the likelihood of malignancy with neural network analysis. Radiology, 196(3):823–829, sep 1995.
  • [12] Kiyomi Furuya, S Murayama, H Soeda, J Murakami, Y Ichinose, H Yauuchi, Y Katsuda, M Koga, and K Masuda. New classification of small pulmonary nodules by margin characteristics on highresolution ct. Acta Radiologica, 40(5):496–504, 1999.
  • [13] Kawata Y et al. Computerized analysis of 3-d pulmonary nodule images in surrounding and internal structure feature spaces. 2:889 – 892 vol.2, 11 2001.
  • [14] El-Baz Ayman et al. 3d shape analysis for early diagnosis of malignant lung nodules. 14:175–82, 09 2011.
  • [15] Sebastian Kurtek, Eric Klassen, John C. Gore, Zhaohua Ding, and Anuj Srivastava. Elastic geodesic paths in shape space of parameterized surfaces. TPAMI, 34:1717–1730, 2012.
  • [16] Mona Mahmoudi and Guillermo Sapiro. Three-dimensional point cloud recognition via distributions of geometric distances. Journal of Graphical Models, 71:22–32, 2009.
  • [17] Ian H. Jermyn, Sebastian Kurtek, Eric Klassen, and Anuj Srivastava. Elastic shape matching of parameterized surfaces using square root normal fields. ECCV, 7576:804–817, 2012.
  • [18] Xianfeng Gu and Shing-Tung Yau. Computational Conformal Geometry. International Press, 2008.
  • [19] L. V. Kantorovich. On a problem of Monge. Uspekhi Mat. Nauk., 3:225–226, 1948.
  • [20] Y. Brenier. Polar factorization and monotone rearrangement of vector-valued functions. Com. Pure Appl. Math., 64:375–417, 1991.
  • [21] Nicolas Bonnotte. From Knothe’s rearrangement to Brenier’s optimal transport map. arXiv:1205.1099, pages 1–29, 2012.
  • [22] Cedric Villani. Topics in Optimal Transportation. American Mathematical Society, 2003.
  • [23] Kokichi Sugihara. Laguerre voronoi diagram on the sphere. Journal for Geometry and Graphics, 6(1):69–81, 2002.