Scoliosis is an abnormal condition characterized by lateral curvature of the spine. Early assessment and treatment planning are critical (Weinstein et al., 2008). Conventionally, scoliosis is assessed manually by clinicians, who identify and localize vertebral structures in spinal X-ray images. However, large inter-patient anatomical variation and poor image quality make it difficult for clinicians to assess the severity of scoliosis accurately and reliably. Automated measurement promises to enable reliable quantitative assessment of scoliosis.
Several spinal landmark detection methods are available in the literature. Conventional hand-crafted feature engineering (Ebrahimi et al., 2019) is a semi-automatic approach involving several sub-tasks. Automatic alternatives, including ours, are based on convolutional neural network (CNN) models. The CNN model of Wu et al. (2017) requires cropped images and tedious data augmentation. Landmarks can also be detected by segmenting the relevant vertebrae (Imran et al., 2019). Our proposed model is fully end-to-end, requiring no pre-processing, and fully automatic, with no hand-crafted feature extraction.
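Given a spinal X-ray image, we formulate landmark detection as identifying the landmarks that localize the relevant vertebrae. Each training image $X_i$, for $i = 1, \dots, N$, is annotated by an associated $2K$-dimensional landmark vector $\mathbf{y}_i \in \mathbb{R}^{2K}$ holding the $(x, y)$ coordinates of its $K$ landmarks. Through supervised learning, a CNN can be trained to extract landmarks automatically by minimizing the standard mean squared error (MSE) loss
$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{2KN} \sum_{i=1}^{N} \left\lVert \mathbf{y}_i - \hat{\mathbf{y}}_i \right\rVert_2^2, \tag{1}$$
where $\mathbf{y}_i$ are the ground-truth landmarks and $\hat{\mathbf{y}}_i$ are the predicted landmarks. However, the MSE loss treats each coordinate independently and therefore ignores inter-landmark relationships. To guide a CNN in detecting landmark coordinates while also learning the spinal shape, we propose a novel distance measure, the bipartite distance.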
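A minimal NumPy sketch of the MSE loss in (1) follows, assuming ground truth and predictions are stored as arrays of shape $(N, 2K)$ with interleaved $(x, y)$ coordinates; the function name `mse_loss`, the layout, and the toy landmarks are illustrative assumptions, not the authors' code. The example shows the limitation noted above: two predictions with identical MSE, one of which preserves the distance between the two landmarks while the other does not.

```python
import numpy as np

def mse_loss(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Eq. (1): squared error averaged over all N images and 2K coordinates.

    Both arrays have shape (N, 2K), laid out as [x1, y1, x2, y2, ...] (assumed layout).
    """
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical toy case: N = 1 image, K = 2 landmarks at (0, 0) and (1, 0).
y_true = np.array([[0.0, 0.0, 1.0, 0.0]])
pred_shifted = np.array([[0.0, 0.5, 1.0, 0.5]])  # both landmarks moved up by 0.5
pred_tilted = np.array([[0.0, 0.5, 1.0, -0.5]])  # one moved up, the other moved down

print(mse_loss(y_true, pred_shifted))  # 0.125
print(mse_loss(y_true, pred_tilted))   # 0.125

# Distance between the two predicted landmarks: 1.0 vs. about 1.414 (ground truth: 1.0).
for pred in (pred_shifted, pred_tilted):
    p = pred.reshape(-1, 2)
    print(np.linalg.norm(p[0] - p[1]))
```

Both predictions are penalized equally by the MSE, even though only the first keeps the inter-landmark geometry intact.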
Referring to Figure 1a, we regard the ground-truth (green) landmarks on the left of the spine, $V_L$, and on the right of the spine, $V_R$, as the two disjoint vertex sets of a complete bipartite graph whose edges connect every landmark in $V_L$ with all landmarks in $V_R$. The same holds for the predicted (red) landmarks, $\hat{V}_L$ and $\hat{V}_R$. This leads to a shape-aware loss, which penalizes the CNN model when the pairwise distances between the predicted landmarks deviate from those between the ground-truth landmarks. Letting $d_e$ denote the Euclidean distance between the ground-truth landmarks connected by edge $e$ in the graph's edge set $E$, and $\hat{d}_e$ the Euclidean distance between the corresponding predicted landmarks, the bipartite distance (BPD) is
$$\mathrm{BPD} = \sum_{e \in E} \left( d_e - \hat{d}_e \right)^2. \tag{2}$$
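We employ the loss function
$$\mathcal{L} = \mathcal{L}_{\mathrm{MSE}} + \lambda\, \mathrm{BPD}, \tag{3}$$
where the hyperparameter $\lambda \geq 0$ weighs the BPD term against the MSE.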
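The following NumPy sketch computes the BPD of (2) for one image and the combined loss of (3) on a hypothetical two-landmark example. It assumes landmarks are stored as arrays of shape $(N, K, 2)$, that the left/right partition is supplied as index lists, and that the per-image BPD is averaged over the batch; these choices and all function names are illustrative assumptions rather than details from the text. A training implementation would express the same operations in TensorFlow so that gradients can flow through them.

```python
import numpy as np

def bipartite_distance(gt_left, gt_right, pred_left, pred_right):
    """Eq. (2) for one image: sum over all left-right edges of (d_e - d_hat_e)^2.

    gt_left/pred_left have shape (M, 2); gt_right/pred_right have shape (P, 2).
    """
    # Broadcasting builds the (M, P) matrix of Euclidean distances, one entry per edge.
    d = np.linalg.norm(gt_left[:, None, :] - gt_right[None, :, :], axis=-1)
    d_hat = np.linalg.norm(pred_left[:, None, :] - pred_right[None, :, :], axis=-1)
    return float(np.sum((d - d_hat) ** 2))

def mse_bpd_loss(gt, pred, left_idx, right_idx, lam):
    """Eq. (3): MSE + lam * BPD for arrays of shape (N, K, 2).

    Averaging the per-image BPD over the N images is an assumption.
    """
    mse = float(np.mean((gt - pred) ** 2))
    bpd = np.mean([
        bipartite_distance(g[left_idx], g[right_idx], p[left_idx], p[right_idx])
        for g, p in zip(gt, pred)
    ])
    return mse + lam * float(bpd)

# Hypothetical example: one left landmark at (0, 0) and one right landmark at (1, 0).
gt = np.array([[[0.0, 0.0], [1.0, 0.0]]])
pred_shifted = np.array([[[0.0, 0.5], [1.0, 0.5]]])  # rigid shift: distance kept
pred_tilted = np.array([[[0.0, 0.5], [1.0, -0.5]]])  # distance grows to sqrt(2)
left_idx, right_idx = [0], [1]

print(mse_bpd_loss(gt, pred_shifted, left_idx, right_idx, lam=1.0))  # 0.125 (BPD = 0)
print(mse_bpd_loss(gt, pred_tilted, left_idx, right_idx, lam=1.0))   # about 0.2966
```

Note that BPD is zero for any rigid translation or rotation of the ground-truth landmarks, as the shifted prediction shows, so on its own it cannot localize the spine; the MSE term in (3) supplies that absolute positional information.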
3 Implementation Details
Our dataset consists of 100 high-resolution anterior-posterior spinal X-ray images showing signs of mild to severe scoliosis. Since the cervical vertebrae are seldom involved in spinal deformity, but identifying the bottom cervical vertebra can be important, we selected 18 vertebrae: C7 (cervical), T1–T12 (thoracic), and L1–L5 (lumbar). Medical experts provided binary segmentation annotations by labeling the vertebrae in the X-ray images. The 4 corners of each vertebral region serve as landmarks; they were extracted automatically by applying the FAST corner detector (Rosten and Drummond, 2006) to the expert-segmented labels. Therefore, each spinal image is associated with 18 × 4 = 72 landmarks to be estimated.
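One way to lay out the 72 landmarks per image is sketched below. It assumes a fixed corner order within each vertebra and takes the left and right vertex sets of the bipartite graph to be the two left and two right corners of every vertebra; the ordering, the partition, and all names are hypothetical, since the text does not specify them.

```python
import numpy as np

# The 18 vertebrae listed in the text: C7, T1-T12, L1-L5.
VERTEBRAE = ["C7"] + [f"T{i}" for i in range(1, 13)] + [f"L{i}" for i in range(1, 6)]
# Hypothetical corner order within each vertebra (image-plane left/right).
CORNERS = ["upper_left", "upper_right", "lower_left", "lower_right"]

LANDMARK_NAMES = [f"{v}_{c}" for v in VERTEBRAE for c in CORNERS]
K = len(LANDMARK_NAMES)

# Index lists for the two vertex sets of the bipartite graph (assumed partition).
LEFT_IDX = [k for k, name in enumerate(LANDMARK_NAMES) if name.endswith("_left")]
RIGHT_IDX = [k for k, name in enumerate(LANDMARK_NAMES) if name.endswith("_right")]

# A (K, 2) array of (x, y) corners flattens to a 2K-dimensional landmark vector.
corners = np.zeros((K, 2))
landmark_vector = corners.reshape(-1)

print(len(VERTEBRAE), K, landmark_vector.shape)  # 18 72 (144,)
print(len(LEFT_IDX) * len(RIGHT_IDX))            # 1296 edges in the complete bipartite graph
```

Under this partition, each side holds 36 landmarks, so the BPD sums over 36 × 36 = 1296 edges per image.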
As shown in Figure 1b, our model is a CNN comprising five convolutional layers and three fully connected (FC) layers. Leaky ReLU is used as the activation function in each layer. The convolutional layers have 16, 32, 64, 128, and 256 feature channels, respectively. In each layer, two convolution operations are followed by a … neurons is used to produce the image-plane coordinates of the landmarks. The model is implemented in TensorFlow with Python 3 and runs on a Tesla P40 GPU in a machine with a 64-bit Intel(R) Xeon(R) 440G CPU.
The dataset was split into training (80 images), testing (15 images), and validation (5 images) sets. All images were resized to … and normalized to … before being fed to the network. When training the model with our MSE-BPD loss, we used $\lambda = \ldots$ in (3). As a baseline, we trained the same architecture using only the MSE loss, i.e., $\lambda = 0$. Both models were trained with a minibatch size of 4 using the Adam optimizer with a learning rate of 0.0001 and momentum parameter $\beta_1 = 0.9$.
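The split and training settings above can be summarized in a short sketch. It assumes the 80/15/5 partition is drawn at random with a fixed seed (the text does not say how images were assigned), and the dictionary keys and the function name `split_indices` are hypothetical; the numeric values are the ones reported.

```python
import numpy as np

# Training settings reported in the text; key names are illustrative.
CONFIG = {"batch_size": 4, "optimizer": "Adam", "learning_rate": 1e-4, "beta_1": 0.9}

def split_indices(n_images=100, n_train=80, n_test=15, n_val=5, seed=0):
    """Randomly partition image indices into train/test/validation (assumed procedure)."""
    if n_train + n_test + n_val != n_images:
        raise ValueError("split sizes must add up to the number of images")
    order = np.random.default_rng(seed).permutation(n_images)
    return order[:n_train], order[n_train:n_train + n_test], order[n_train + n_test:]

train, test, val = split_indices()
bs = CONFIG["batch_size"]
minibatches = [train[i:i + bs] for i in range(0, len(train), bs)]
print(len(train), len(test), len(val), len(minibatches))  # 80 15 5 20
```

With 80 training images and a minibatch size of 4, one epoch consists of 20 parameter updates. In Keras, the reported optimizer settings correspond to `tf.keras.optimizers.Adam(learning_rate=1e-4, beta_1=0.9)`.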
4 Experimental Results
We compared the performance of the proposed model (MSE-BPD) against the baseline (MSE) both qualitatively and quantitatively. Qualitative comparisons (e.g., Figure 2) show that our model agrees better with the ground truth than the baseline does: the irregular spinal shapes produced by the baseline model are mitigated by our model. Quantitatively, our model also improves landmark detection, achieving a Pearson correlation coefficient of 0.95 compared to 0.92 for the baseline. Moreover, a one-way ANOVA confirms that the landmarks predicted by our model do not differ significantly from the ground-truth landmarks.
[Figure 2: two examples, each shown as Raw Image | MSE | MSE-BPD]
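The two quantitative measures in the results can be sketched with NumPy as below. The text does not say how coordinates were pooled, so this sketch assumes the Pearson coefficient is computed over all predicted versus ground-truth coordinates flattened together, and that the ANOVA treats predicted and ground-truth coordinates as two groups; the function names and sample values are hypothetical. Only the F statistic is computed here; `scipy.stats.f_oneway` returns both the statistic and its p-value.

```python
import numpy as np

def pearson_r(gt, pred):
    """Pearson correlation over all coordinates, pooled across images (assumed pooling)."""
    return float(np.corrcoef(np.ravel(gt), np.ravel(pred))[0, 1])

def one_way_anova_f(*groups):
    """F statistic of a one-way ANOVA: between-group over within-group mean square."""
    groups = [np.ravel(np.asarray(g, dtype=float)) for g in groups]
    values = np.concatenate(groups)
    k, n = len(groups), values.size
    grand_mean = values.mean()
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)
    return float((ss_between / (k - 1)) / (ss_within / (n - k)))

# Hypothetical coordinates; a small F means the group means are close.
gt = np.array([1.0, 2.0, 3.0, 4.0])
pred = np.array([1.1, 1.9, 3.2, 3.8])
print(pearson_r(gt, pred))
print(one_way_anova_f(gt, pred))
print(one_way_anova_f([1, 2, 3], [4, 5, 6]))  # 13.5
```

A high Pearson coefficient indicates that predicted coordinates track the ground truth, while a small F statistic (large p-value) indicates no significant difference between the group means.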
The detection of vertebral landmarks is crucial for the accurate measurement of scoliosis in spinal X-ray images. To this end, we proposed a new loss function that guides the training of a CNN model for vertebral corner landmark detection toward reliable, shape-aware predictions.
- Ebrahimi et al. (2019). CMBBE: Imaging & Visualization 7 (2), pp. 132–144.
- Imran et al. (2019). End-to-end fully automatic segmentation of scoliotic vertebrae in spinal X-ray images. In Medical Imaging Meets NeurIPS Workshop, Vancouver, Canada.
- Rosten and Drummond (2006). Machine learning for high-speed corner detection. In European Conference on Computer Vision, pp. 430–443.
- Weinstein et al. (2008). Adolescent idiopathic scoliosis. The Lancet 371 (9623), pp. 1527–1537.
- Wu et al. (2017). Automatic landmark estimation for adolescent idiopathic scoliosis assessment using BoostNet. In Proc. 2017 MICCAI Conf.