Bipartite Distance for Shape-Aware Landmark Detection in Spinal X-Ray Images

05/28/2020 ∙ by Abdullah-Al-Zubaer Imran, et al.

Scoliosis is a congenital disease that causes lateral curvature in the spine. Its assessment relies on the identification and localization of vertebrae in spinal X-ray images, conventionally via tedious and time-consuming manual radiographic procedures that are prone to subjectivity and observational variability. Reliability can be improved through the automatic detection and localization of spinal landmarks. To guide a CNN in the learning of spinal shape while detecting landmarks in X-ray images, we propose a novel loss based on a bipartite distance (BPD) measure, and show that it consistently improves landmark detection performance.




1 Introduction

Scoliosis is an abnormal condition characterized by lateral spinal curvature. Early assessment and treatment planning are critical (Weinstein et al., 2008). Conventionally, scoliosis is assessed manually by clinicians through the identification and localization of vertebral structures in spinal X-ray images. However, large inter-patient anatomical variation and poor image quality make it difficult for clinicians to assess the severity of scoliosis accurately and reliably. Automated measurement promises to enable the reliable quantitative assessment of scoliosis.

Several spinal landmark detection methods are available in the literature. Conventional hand-crafted feature engineering (Ebrahimi et al., 2019) is a semi-automatic method involving several sub-tasks. Our approach instead uses an automatic convolutional neural network (CNN) model. The CNN model of Wu et al. (2017) requires cropped images and tedious data augmentation. Landmarks can also be detected by segmenting the relevant vertebrae (Imran et al., 2019). Our proposed model is trained end-to-end, requires no pre-processing, and is fully automatic, eschewing hand-crafted feature extraction.

2 Method

Given an X-ray image, we formulate the landmark detection problem as identifying $N$ landmarks localizing the relevant vertebrae. Each training image $x_i$, for $i = 1, \dots, n$, is annotated by an associated $2N$-dimensional landmark vector $y_i$. Through supervised learning, a CNN can be trained to extract landmarks automatically, by minimizing the standard mean squared error (MSE) loss

$$\mathrm{MSE}(y, \hat{y}) = \frac{1}{2N} \sum_{j=1}^{2N} \left( y_j - \hat{y}_j \right)^2, \qquad (1)$$

where $y$ are the ground-truth landmarks and $\hat{y}$ are the predicted landmarks. However, the MSE loss ignores inter-landmark relationships. To guide a CNN in the detection of landmark coordinates while learning spinal shape, we propose a novel distance measure, the bipartite distance.

Figure 1: Convolutional neural networks for landmark detection from spine X-ray images: (a) Illustration of the bipartite distance in a spinal image based on the ground truth (green) and predicted (red) landmarks. (b) Model architecture.

Referring to Figure 1a, we regard the ground-truth (green) landmarks on the left, $S_l$, and on the right, $S_r$, of the spine as the two disjoint sets of vertices of a complete bipartite graph whose edges connect every landmark in $S_l$ with all landmarks in $S_r$. The same holds for the predicted (red) landmarks, $\hat{S}_l$ and $\hat{S}_r$. This leads to a shape-aware loss, which penalizes the CNN model when the pairwise distances between the predicted landmarks deviate from those between the ground-truth landmarks. Letting $d_e$ denote the Euclidean distance between the ground-truth landmarks connected by edge $e$ of the graph and $\hat{d}_e$ denote the Euclidean distance between the corresponding predicted landmarks, the bipartite distance (BPD) aggregates the deviation of $\hat{d}_e$ from $d_e$ over all edges $e$.

We employ a loss function that combines the MSE with the BPD, where a weight $\alpha$ weighs the BPD term against the MSE.
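As an illustration of how a bipartite distance term can sit alongside the MSE, the following plain-Python sketch pairs every left landmark with every right landmark and compares ground-truth and predicted edge lengths. Several choices here are assumptions rather than details taken from the text: the deviation is measured as an absolute difference averaged over edges, the two terms are combined as MSE + alpha * BPD, and the function names and toy coordinates are hypothetical. A training implementation would use differentiable tensor operations instead.

```python
import math
from itertools import product
from typing import List, Tuple

Point = Tuple[float, float]


def mse(y_true: List[float], y_pred: List[float]) -> float:
    """Mean squared error over flattened landmark coordinates."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def bipartite_distance(left_true: List[Point], right_true: List[Point],
                       left_pred: List[Point], right_pred: List[Point]) -> float:
    """Deviation between predicted and ground-truth edge lengths.

    Edges are those of the complete bipartite graph: every left landmark is
    connected to every right landmark. ASSUMPTION: absolute deviation,
    averaged over edges.
    """
    n_left, n_right = len(left_true), len(right_true)
    total = 0.0
    for i, j in product(range(n_left), range(n_right)):
        d_true = math.dist(left_true[i], right_true[j])
        d_pred = math.dist(left_pred[i], right_pred[j])
        total += abs(d_true - d_pred)
    return total / (n_left * n_right)


def flatten(left: List[Point], right: List[Point]) -> List[float]:
    """Concatenate landmark coordinates into one flat vector."""
    return [coord for point in left + right for coord in point]


def mse_bpd_loss(left_true, right_true, left_pred, right_pred, alpha: float) -> float:
    """ASSUMED combination: MSE plus alpha times the bipartite distance."""
    y_true = flatten(left_true, right_true)
    y_pred = flatten(left_pred, right_pred)
    bpd = bipartite_distance(left_true, right_true, left_pred, right_pred)
    return mse(y_true, y_pred) + alpha * bpd


# Toy example (made-up coordinates): the prediction is the ground truth
# shifted rigidly to the right, so the shape is unchanged.
left_true = [(0.0, 0.0), (0.0, 1.0)]
right_true = [(1.0, 0.0), (1.0, 1.0)]
left_pred = [(x + 0.5, y) for x, y in left_true]
right_pred = [(x + 0.5, y) for x, y in right_true]

print(mse(flatten(left_true, right_true), flatten(left_pred, right_pred)))
print(bipartite_distance(left_true, right_true, left_pred, right_pred))
```

In this toy case the MSE is nonzero because every landmark has moved, while the bipartite term is zero because all left-to-right distances are preserved. The term therefore responds to changes in shape rather than to position alone, which is why it is paired with the MSE.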

3 Implementation Details

Our dataset consists of 100 high-resolution anterior-posterior spinal X-ray images with signs of mild to severe scoliosis. Because the cervical vertebrae are seldom involved in spinal deformity, but identifying the bottom cervical vertebra can be important, we selected 18 vertebrae: C7 (cervical), T1–T12 (thoracic), and L1–L5 (lumbar). Medical experts provided binary segmentation annotations by labeling the vertebrae in the X-ray images. The 4 corners of each vertebral region serve as landmarks; they were extracted automatically by applying FAST (Rosten and Drummond, 2006) to the expert-segmented labels. Each spinal image therefore has 72 associated landmarks to be estimated.
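The landmark count can be checked with a short sketch that enumerates the selected vertebrae and their corners. The corner names, their ordering, and the split into two left and two right corners per vertebra are assumptions made for illustration; the text only states that each vertebra contributes 4 corners.

```python
# Vertebrae named in the text: C7, T1-T12, L1-L5.
vertebrae = ["C7"] + [f"T{i}" for i in range(1, 13)] + [f"L{i}" for i in range(1, 6)]

# ASSUMPTION: hypothetical corner naming and ordering.
CORNERS = ("top_left", "top_right", "bottom_left", "bottom_right")

landmark_names = [f"{v}_{c}" for v in vertebrae for c in CORNERS]
left_names = [n for n in landmark_names if n.endswith("_left")]
right_names = [n for n in landmark_names if n.endswith("_right")]

num_landmarks = len(landmark_names)        # 18 vertebrae x 4 corners
num_outputs = 2 * num_landmarks            # one (x, y) pair per landmark
num_bipartite_edges = len(left_names) * len(right_names)

print(len(vertebrae), num_landmarks, num_outputs, num_bipartite_edges)
```

Under these assumptions there are 36 left and 36 right landmarks, so a complete bipartite graph between the two sides has 36 × 36 = 1296 edges.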

As shown in Figure 1b, our model is a CNN comprising five convolutional layers and three fully-connected (FC) layers. Leaky-ReLU is used as the activation function in each layer. The convolutional layers have feature sizes of 16, 32, 64, 128, and 256, respectively. In each layer, two convolution operations are followed by a max-pooling layer. After every convolutional layer, we use a batch-normalization layer and a dropout layer with a rate of 0.25. After two FC layers with 512 neurons each, a final FC layer of 144 neurons (two coordinates for each of the 72 landmarks) produces the image-plane coordinates of the landmarks. The model is implemented in TensorFlow with Python 3 and runs on a Tesla P40 GPU with a 64-bit Intel(R) Xeon(R) 440G CPU.
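To make the layer arrangement easier to follow, the sketch below lists the layers in order and tracks the spatial size of the feature maps. It is a framework-free outline, not the authors' TensorFlow code. The input resolution, the pooling factor, size-preserving convolutions, the placement of batch normalization and dropout after pooling, and a linear output layer are all assumptions, since the text does not specify them.

```python
CONV_FEATURES = [16, 32, 64, 128, 256]   # from the text
NUM_LANDMARKS = 72                       # from the text
DROPOUT_RATE = 0.25                      # from the text
ASSUMED_INPUT_SIZE = 256                 # ASSUMPTION: hypothetical square input
ASSUMED_POOL = 2                         # ASSUMPTION: hypothetical pooling factor


def build_layer_plan(input_size=ASSUMED_INPUT_SIZE, pool=ASSUMED_POOL):
    """Return an ordered list of (layer_name, parameter) tuples."""
    plan = []
    size = input_size
    for features in CONV_FEATURES:
        # Two convolutions per block (assumed to preserve spatial size).
        plan.append(("conv+leaky_relu", features))
        plan.append(("conv+leaky_relu", features))
        plan.append(("max_pool", pool))
        # ASSUMPTION: normalization and dropout placed after pooling.
        plan.append(("batch_norm", None))
        plan.append(("dropout", DROPOUT_RATE))
        size //= pool
    plan.append(("flatten", size * size * CONV_FEATURES[-1]))
    plan.append(("fc+leaky_relu", 512))
    plan.append(("fc+leaky_relu", 512))
    # One (x, y) pair per landmark; linear output is an assumption.
    plan.append(("fc", 2 * NUM_LANDMARKS))
    return plan


for name, param in build_layer_plan():
    print(name, param)
```

With the placeholder input size of 256 and a pooling factor of 2, five pooling steps reduce the feature maps to 8 × 8 before the fully-connected layers; other input sizes change only the width of the flattened vector.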

The dataset was split into training (80 images), testing (15 images), and validation (5 images) sets. All images were resized and normalized before being fed to the network. The proposed model was trained using our MSE-BPD loss with a nonzero BPD weight; as a baseline, we trained the same architecture using only the MSE loss, i.e., with the BPD weight set to zero. The models were trained with a minibatch size of 4. We used the Adam optimizer with a learning rate of 0.0001 and a momentum of 0.9.
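A split with these sizes could be produced as in the following sketch. The text does not say how images were assigned to each set, so the random shuffle, the seed, and the function name are assumptions for illustration only.

```python
import random


def split_indices(n_images=100, n_train=80, n_test=15, n_val=5, seed=0):
    """Shuffle image indices and cut them into train/test/validation lists."""
    if n_train + n_test + n_val != n_images:
        raise ValueError("split sizes must sum to the number of images")
    indices = list(range(n_images))
    # ASSUMPTION: a seeded random shuffle; the actual assignment is not stated.
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    val = indices[n_train + n_test:]
    return train, test, val


train_idx, test_idx, val_idx = split_indices()
print(len(train_idx), len(test_idx), len(val_idx))
```

Fixing the seed keeps the three sets disjoint and reproducible across runs, which matters when comparing the baseline and the proposed loss on the same test images.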

4 Experimental Results

We compared the performance of the proposed model (MSE-BPD) against the baseline (MSE) both qualitatively and quantitatively. Qualitative comparisons (e.g., Figure 2) show better agreement with the ground truth for our model than for the baseline model. The irregular spinal shapes produced by the baseline model are mitigated by our model. Landmark detection performance also improves: our model achieves a Pearson correlation coefficient of 0.95, compared to 0.92 for the baseline model. Moreover, one-way ANOVA confirms that the landmarks predicted by our model do not differ significantly from the ground-truth landmarks.
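For reference, a Pearson correlation coefficient between ground-truth and predicted coordinates can be computed as in the sketch below. How the coordinates were pooled across landmarks and images for the reported scores is not stated, so correlating two flat coordinate lists is an assumption, and the numbers are made-up toy values rather than results.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Toy flattened coordinates (hypothetical values, not from the dataset).
y_true = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
y_pred = [12.0, 18.0, 33.0, 39.0, 52.0, 57.0]

print(pearson(y_true, y_pred))
```

A coefficient near 1 means the predicted coordinates rise and fall with the ground truth. It does not by itself measure absolute localization error, because a constant offset leaves the correlation unchanged.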

Figure 2: Qualitative comparison shows improved spinal shape and landmark detection performance by our model (MSE-BPD loss) relative to the baseline model (MSE loss) in two spinal X-ray images from the test set. Green boxes bound vertebrae based on the ground-truth landmarks; red boxes bound vertebrae based on the model-predicted landmarks.

5 Conclusions

The detection of vertebral landmarks is crucial for the accurate measurement of scoliosis in spinal X-ray images. To this end, we proposed a new loss function which guides the training of a CNN vertebral (corner) landmark detection model to perform reliable shape-aware predictions.


References

  • S. Ebrahimi, L. Gajny, W. Skalli, and E. Angelini (2019) Vertebral corners detection on sagittal x-rays based on shape modelling, random forest classifiers and dedicated visual features. CMBBE: Imaging & Visualization 7 (2), pp. 132–144.
  • A. Imran, C. Huang, H. Tang, W. Fan, K. M.C. Cheung, M. To, Z. Qian, and D. Terzopoulos (2019) End-to-end fully automatic segmentation of scoliotic vertebrae in spinal X-ray images. In Medical Imaging Meets NeurIPS Workshop, Vancouver, Canada.
  • E. Rosten and T. Drummond (2006) Machine learning for high-speed corner detection. In European Conference on Computer Vision, pp. 430–443.
  • S. L. Weinstein, L. A. Dolan, J. C. Cheng, A. Danielsson, and J. A. Morcuende (2008) Adolescent idiopathic scoliosis. The Lancet 371 (9623), pp. 1527–1537.
  • H. Wu, C. Bailey, P. Rasoulinejad, and S. Li (2017) Automatic landmark estimation for adolescent idiopathic scoliosis assessment using BoostNet. In Proc. 2017 MICCAI Conf.