Spinal diseases are increasing worldwide and can cause significant loss of function and compromise quality of life. Surgical spinal treatments have been growing with the aging population, and they require accurate diagnosis to avoid complications. Many spine pathologies can be detected and diagnosed using Magnetic Resonance Imaging (MRI) exams [2, 3]. In a Computer-Aided Diagnosis (CAD) context, the segmentation of each vertebra allows a faster and more objective analysis of the vertebrae condition, aiding in the characterization and quantification of abnormalities. Moreover, an accurate segmentation may assist the medical specialist in surgical planning and in the evaluation of suitable treatments.
The manual segmentation of a vertebral body in a slice-by-slice manner may be time-consuming and prone to errors, due to inter- and intra-subject variability. Besides, the subjective judgment employed may add even more inaccuracy. On the other hand, manual segmentation incorporates the knowledge gained over several years of expertise. Thus, semi-automatic segmentation assists the specialist, saves time and reduces interpretation errors.
With the vertebrae segmented, machine learning techniques can then be applied for the classification of a given anomaly [9, 10, 11] or for Content-Based Image Retrieval (CBIR) [12, 13]. Interactive segmentation tools can also be meaningful during the training and education of new radiologists. Students can learn how to correctly segment each vertebra and to detect spine pathologies. This kind of training may avoid potential medical failures, which reduces further complications. In general, the visualization of 3D human structures can be used for the simulation of medical and surgical procedures.
The GrowCut method and its faster version, Fast GrowCut, which presents slightly lower segmentation accuracy, have been widely used in many medical MRI exams (especially in oncology). The GrowCut method is based on cellular automata (analogous to bacterial growth in biology) and works as a region-growing approach with an interactive labeling procedure.
Several fully automatic vertebrae segmentation methods have been proposed [19, 20]. However, they take too much processing time, which may not suit clinical practice. More recently, a novel approach called Balanced Growth (BGrowth) has been proposed for the segmentation of crushed vertebral bodies in single slices. Briefly, BGrowth balances the weights along the growing path of a region, so that small intensity transitions are better delineated. The results achieved by BGrowth surpass those of all methods from the literature, including GrowCut. Moreover, BGrowth is able to achieve promising segmentation results even with very simple/sloppy annotation (seed points).
In this paper, we extrapolate the specialists' annotation up to a fixed limit without losing performance/accuracy, so that the total time spent on manual annotation is reduced. Moreover, we show how to extend BGrowth to deal with the reconstruction of volumetric exams (3D), introducing a novel method called 3DBGrowth. The experimental results show that 3DBGrowth outperforms GrowCut, achieving an average Dice Score of 87% while managing comparable running time. Moreover, the method works well even with rough seed points, which reduces the time spent on manual annotation.
The remainder of the paper is structured as follows. In section 2, we present 3DBGrowth for the segmentation and reconstruction of vertebral bodies in volumetric MRI. Then, in section 3 we explore the materials and methods. Next, in section 4, we detail the experimental design, results and discussion. Finally, section 5 draws the conclusions.
2 3DBGrowth: The proposed method
The usual approach of annotating or placing seeds for segmenting medical images can be cumbersome for large 3D exams. Thus, the main goal of this work is to minimize the human effort needed to segment and reconstruct 3D exams built on 2D slices. As illustrated in Figure 2, depending on the MRI exam, not all slices have to be manually annotated by the user to process the 3D reconstruction. If the exams present a small spacing between slices (considering annotations on the sagittal plane), several slices do not need to be annotated, since they are similar. This can be assessed by analyzing the negative slope coefficient, which gives the best trade-off between annotation time and performance measures, such as the Dice Score Coefficient or the Jaccard Coefficient (better explored in the next section).
The slope between two points, $(x_1, y_1)$ and $(x_2, y_2)$, can be calculated by $m = \frac{y_2 - y_1}{x_2 - x_1}$, which is the rate of change between the two points. When the slope between two values of annotation time gets close to a straight horizontal line, there is no gain in annotation time; in other words, the closer the slope gets to 0, the lower the annotation time gain. Moreover, by using a segmentation approach that does not require detailed interior/exterior annotation, such as BGrowth, the total time spent on annotation is greatly diminished. In addition, BGrowth generally requires only a simple rectangle-like annotation for the segmentation of individual vertebral bodies. For example, only 3 out of 7 slices were annotated in Figure 2. On average, each slice took 6.5 seconds to annotate, and 3DBGrowth processed all 7 slices in only 16 iterations, achieving a result close to the ground truth.
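As a worked illustration of the slope criterion, the sketch below computes the slope between consecutive (slice distance, annotation time) points and returns the first slice distance after which the curve is flatter than a given threshold. The function names, the stopping rule and the sample timings are hypothetical and chosen only for illustration; they are not measurements from the experiments.

```python
def slope(p1, p2):
    """Rate of change between two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)


def first_flat_distance(points, threshold=-1.0):
    """Return the first x after which the slope is flatter than `threshold`.

    Assumed rule (not taken from the paper): stop at the first point whose
    slope to the next point is greater than the (negative) threshold.
    """
    for p1, p2 in zip(points, points[1:]):
        if slope(p1, p2) > threshold:
            return p1[0]
    return points[-1][0]


# Hypothetical (slice distance, annotation time in minutes) pairs.
samples = [(0, 11.0), (1, 6.0), (2, 4.0), (3, 2.8),
           (4, 2.3), (5, 2.0), (6, 1.8), (7, 1.7)]
best = first_flat_distance(samples)
print(best)
```

With a threshold closer to 0 the rule tolerates flatter segments before stopping, so fewer slices would be annotated at the cost of accuracy.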
In our proposed 3DBGrowth method (Algorithm 1), we initially consider the segmentation of foreground and background in gray-scale images. That is, we consider a digital image $I$ and its annotations/labels as a matrix $L$, both with dimension $m \times n \times s$, representing the number of rows, columns and slices, respectively. Each entry in $L$ is marked as background, unlabelled or foreground.
Initially, each entry in a weight matrix $W$ (with the same dimensions as $I$ and $L$) is set to $1$ for seed points and $0$ otherwise (line 1). Then, for every voxel $p$ and each one of its 26 neighbours $q$, a strength factor $S$ is calculated (line 4). Here, the absolute intensity difference $|I_p - I_q|$ is normalized by the maximum intensity in the image and shifted by 1. Finally, the result is multiplied by the current weight $W_p$, which produces values within $[0, 1]$. If the strength is greater than the neighbour's strength ($S > W_q$, line 5), then the neighbour's strength is averaged with the new strength (line 6) and its label receives the label of the voxel (line 7).
This process repeats until the algorithm converges or for a fixed number of iterations defined by the user.
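The update rule described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (which was written in Matlab): the function name `bgrowth3d`, the nested-list data layout, the in-place update order, the convention that 0 means unlabelled, and the reading of "normalized and shifted by 1" as one minus the normalized difference are all assumptions.

```python
from itertools import product


def bgrowth3d(image, labels, max_iter=50):
    """Sketch of the region-growing loop described in the text.

    image  : nested list [row][col][slice] of non-negative intensities
    labels : same shape; 0 = unlabelled, any other value = seed label
    Returns (labels, weights) after convergence or max_iter sweeps.
    Updates are applied in place during each sweep (an assumption).
    """
    m, n, s = len(image), len(image[0]), len(image[0][0])
    max_int = max(v for a in image for b in a for v in b) or 1
    labels = [[list(b) for b in a] for a in labels]  # work on a copy
    # Weight 1 for seed voxels, 0 otherwise.
    weights = [[[1.0 if v != 0 else 0.0 for v in b] for b in a]
               for a in labels]
    # The 26 neighbours of a voxel: every offset except (0, 0, 0).
    offsets = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]

    for _ in range(max_iter):
        changed = False
        for r, c, k in product(range(m), range(n), range(s)):
            if weights[r][c][k] == 0.0:
                continue  # an unreached voxel cannot win any neighbour
            for dr, dc, dk in offsets:
                rr, cc, kk = r + dr, c + dc, k + dk
                if not (0 <= rr < m and 0 <= cc < n and 0 <= kk < s):
                    continue
                diff = abs(image[r][c][k] - image[rr][cc][kk])
                # Assumed form: (1 - normalized difference) * current weight.
                strength = (1.0 - diff / max_int) * weights[r][c][k]
                if strength > weights[rr][cc][kk]:
                    # Average the neighbour's weight with the new strength
                    # and hand over the label of the current voxel.
                    weights[rr][cc][kk] = (weights[rr][cc][kk] + strength) / 2
                    labels[rr][cc][kk] = labels[r][c][k]
                    changed = True
        if not changed:
            break
    return labels, weights


# Tiny 1 x 6 x 1 volume: dark region on the left, bright on the right.
image = [[[10], [10], [10], [200], [200], [200]]]
seeds = [[[1], [0], [0], [0], [0], [-1]]]  # 1 and -1 are arbitrary seed labels
labels, weights = bgrowth3d(image, seeds)
print([v[0] for v in labels[0]])
```

Because the weight of a neighbour is averaged rather than overwritten, it only approaches the attacking strength asymptotically, so in practice the loop in this sketch is bounded by `max_iter` rather than by exact convergence.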
3 Materials and methods
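Table 1: Segmentation methods and comparison measures used in this work.
|Abbreviation||Description|
|BG||3D Balanced Growth|
|GC||GrowCut|
|DSC||Dice Score Coefficient|
|JAC||Jaccard Coefficient|
|HD||Hausdorff's Distance|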
The methods and measures used for comparison, as well as the computational set-up and image dataset are described as follows.
3.1 Image Dataset
Due to space limitations, only one representative dataset is presented herein, which comprises 17 anonymized MRI exams, ranging from the sacrum (S1) to the mid-thoracic region (T6-T12), with corresponding manual segmentations. The exams present several health conditions, such as scoliosis, spondylolisthesis and crushed vertebrae. The exams have mm of slice thickness and mm of spacing between slices. More information and full access to the dataset is available at .
3.2 Segmentation methods
In order to evaluate the performance of 3DBGrowth (BG) in a 3D scenario, we compared it with GrowCut (GC), which has been widely used for the task of vertebrae segmentation. Since Fast GrowCut is an approximation of the original GrowCut, presenting a lower accuracy, we consider only GrowCut in the experiments. Due to the limited number of samples (exams), no deep-learning approach was applied.
3.3 Comparison measures
The Jaccard Coefficient (JAC), Dice Score Coefficient (DSC) and Hausdorff's Distance (HD) in voxels [24, 25] were considered. The Jaccard (JAC) calculates the intersection of the manual and semi-automatic segmentations and divides it by their union. This indicates the similarity between the segmentations, in which 0 indicates no similarity and, the closer JAC is to 1, the more alike the segmentations. The Dice (DSC) measures the spatial overlap of several segmentations of the same object, i.e., it quantifies the degree of overlap between two segmented objects. A DSC close to 0 indicates very low overlap, while a DSC closer to 1 indicates a higher overlap. In contrast, the Hausdorff's Distance (HD) indicates how far apart (in voxels) the manual and semi-automatic segmentations are. An HD of 0 indicates that the segmentations coincide.
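The three measures can be sketched in a few lines of Python. This is a minimal illustration, assuming each segmentation is given as a non-empty set of voxel coordinates and that the Hausdorff distance is taken over all segmented voxels with the Euclidean metric; the function names and the toy data are hypothetical.

```python
from math import dist


def dice(a, b):
    """Dice Score Coefficient between two sets of voxel coordinates."""
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a, b):
    """Jaccard Coefficient: intersection divided by union."""
    return len(a & b) / len(a | b)


def hausdorff(a, b):
    """Symmetric Hausdorff distance in voxels (brute force)."""
    def directed(src, dst):
        # Largest distance from a voxel in src to its nearest voxel in dst.
        return max(min(dist(p, q) for q in dst) for p in src)
    return max(directed(a, b), directed(b, a))


# Toy example: two 2 x 2 squares in the same slice, shifted by one column.
manual = {(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)}
semi_auto = {(0, 1, 0), (0, 2, 0), (1, 1, 0), (1, 2, 0)}
print(dice(manual, semi_auto), jaccard(manual, semi_auto),
      hausdorff(manual, semi_auto))
```

The brute-force Hausdorff computation is quadratic in the number of voxels, which is acceptable for a sketch but would be too slow for full exams.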
Table 1 shows a summary of the segmentation methods and comparison measures used in this work.
[Figure: panels (a) Original, (b) Ground-Truth, (c) Annotation]
3.4 Computational set-up
The experiments were performed on a machine with a 2.40GHz Intel(R) Core(TM) i7 CPU and 8GB of RAM, using Matlab(R) version 2018a. The maximum number of iterations was set to 50 for both GrowCut and 3DBGrowth. No pre- or post-processing techniques were applied, to ensure the same conditions for all segmentation methods.
4 Experiments, results and discussion
Our experimental design comprises four main parts: (A) the performance of each segmentation method is assessed using the whole exam; (B) each segmentation method is tested varying the number of slices annotated for each exam; (C) the vertebral bodies are segmented one-by-one by each method; (D) a statistical test is applied to detect any significant difference between the results of the two methods.
4.1 Exam segmentation analysis
The initial interior and exterior annotations were performed in a “sloppy” way, i.e., no detailed boundaries were drawn for accentuated curves. In general, the annotation looks like a rectangle for the background and a simple line for the foreground (Figure 3). For this experiment, the annotation was performed on every slice of every exam and, to reduce computational processing, each exam was cropped using the convex hull of the exterior annotation.
Table 2 shows the average Dice Score (DSC), Jaccard (JAC) and Running Time in seconds for each one of the 17 exams in the dataset. 3DBGrowth (BG) presented on average 81% DSC and 68% JAC, while GrowCut (GC) presented 76% and 61%, respectively. Thus, BG presented higher DSC and JAC percentages than GC for all exams, with average gains of about 5 and 7 percentage points, respectively. Moreover, considering DSC and JAC, BG's standard deviation is slightly lower. Analyzing the Running Time, BG presented a lower processing time than GC in 14 of the 17 exams.
Table 2: Average DSC, JAC and Running Time for each exam.
|Exam (#slices)||DSC BG (%)||DSC GC (%)||DSC gain||JAC BG (%)||JAC GC (%)||JAC gain||Running time BG (s)||Running time GC (s)|
|DzZ_T1 (12)||85||80||4.86||74||67||7.0||18||21|
|DzZ_T2 (12)||82||77||4.2||69||63||5.8||27||31|
|AKa2 (15)||82||77||5.17||69||62||7.1||27||27|
|AKa3 (15)||78||73||4.78||64||58||6.2||27||29|
|AKa4 (15)||80||73||7.10||67||58||9.4||26||27|
|AKs5 (15)||84||78||6.54||73||63||9.2||23||24|
|AKs6 (15)||84||79||5.44||73||65||7.8||21||24|
|AKs7 (15)||80||73||7.6||67||57||9.9||21||24|
|AKs8 (15)||81||78||3.39||68||64||4.7||18||21|
|S01 (16)||85||82||2.91||74||70||4.3||44||50|
|S02 (16)||83||78||4.97||70||63||6.9||26||32|
|F02 (18)||78||74||3.63||64||59||4.7||48||55|
|St1 (20)||83||80||2.73||71||67||3.9||61||67|
|F04 (23)||78||75||3.42||64||60||4.5||13||14|
|AKs3 (25)||80||73||6.42||66||58||8.4||40||37|
|F03 (25)||80||77||3.57||67||62||4.8||8||9|
|C002 (31)||71||65||5.85||55||48||6.7||16||14|
|Mean||81||76||4.9||68||61||6.6||27||30|
|Std. dev.||3.4||3.9||1.5||4.7||5.0||1.9||13.7||15.2|
Considering that the manual annotation of every slice in the exam is too time consuming (for this dataset, on average, 11 minutes/exam), we conducted an experiment to validate the performance of 3DBGrowth and GrowCut when not all slices are annotated, as explored in the next section.
4.2 Variation on the number of annotated slices
We reused the previous experiment's annotations and left some slices without annotation: we defined a slice distance, which sets the number of non-annotated slices between two annotated slices. For example, a slice distance of 0 means that no slice is left without annotation. The slice distance was varied from 0 to 7 in steps of 1.
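The slice distance can be illustrated with a short Python sketch that lists which slices would be annotated. The function name is hypothetical, and starting at the first slice of the exam is an assumption, since the text does not state where the annotation begins or whether the last slice is always included.

```python
def annotated_slices(num_slices, slice_distance):
    """Indices of the slices to annotate, leaving `slice_distance`
    non-annotated slices between two consecutive annotated ones.

    Assumption: annotation starts at the first slice (index 0).
    """
    return list(range(0, num_slices, slice_distance + 1))


for d in (0, 2, 3):
    picked = annotated_slices(7, d)
    print(d, picked, f"{len(picked)} of 7 slices")
```

Under these assumptions, a 7-slice exam with a slice distance of 2 needs only 3 annotated slices, which happens to match the count given for the example in Section 2.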
As the slice distance increases (Figure 4), the average annotation time decreases and the processing time remains almost steady for both methods. DSC and JAC also drop slowly for both methods; however, BG presented better results than GC for both measures. Considering the negative slope coefficient (as discussed in Section 2), highlighted over the magenta line, by using a threshold of -1, the best slice distance would be 3, which presents the best trade-off between annotation time and DSC/JAC.
[Figure: panels (a) Original, (b) Ground-Truth, (c) Annotation]
Table 3: Mean DSC, JAC, HD and Running Time per vertebral body.
|Region||Vertebra||DSC BG (%)||DSC GC (%)||JAC BG (%)||JAC GC (%)||HD BG (voxels)||HD GC (voxels)||Running time BG||Running time GC|
|Thoracic||T6||88||87||79||78||3.16||4.00||0.15||0.16|
|Thoracic||T7||85||83||73||69||78.8||79.1||7.95||6.77|
|Thoracic||T8||86||85||76||72||79.3||79.2||7.55||8.37|
|Thoracic||T9||81||80||50||52||80.0||79.9||7.83||8.38|
|Thoracic||T10||87||85||80||76||26.2||26.9||2.95||2.83|
|Thoracic||T11||86||84||77||73||6.09||6.99||0.91||0.88|
|Thoracic||T12||89||86||79||76||5.63||6.88||1.30||1.36|
|Lumbar||L1||89||87||78||76||6.56||7.31||1.57||1.56|
|Lumbar||L2||88||86||79||76||6.07||7.75||1.52||1.57|
|Lumbar||L3||86||85||75||72||6.40||7.49||1.68||1.83|
|Lumbar||L4||88||86||76||74||7.16||7.65||1.77||1.88|
|Lumbar||L5||87||85||76||74||7.04||8.42||2.33||2.39|
|Sacral||S1||88||86||79||76||6.18||7.57||1.74||1.88|
|All||Mean||87||85||77||74||7.24||7.72||1.52||1.52|
|All||Std. Dev.||0.07||0.06||0.08||0.08||4.85||5.00||1.27||1.27|
[Table 4: Average number of annotated slices per vertebra]
In the next Section, we conduct experiments using annotations for individual vertebral bodies.
4.3 Individual vertebrae segmentation
To speed up the annotation process, we considered a slice distance of three for this experiment. Each vertebral body was annotated separately, as exemplified in Figure 5. In general, both the interior and the exterior annotations look like rectangles, and no detailed borders were drawn.
As reported in Table 3, GC and BG presented equal mean Running Time, and BG presented better mean DSC, JAC and HD than GC. Figure 6 depicts the results for a single vertebral body: BG achieved the highest DSC and the lowest HD. GC presented spiculated borders, while BG presented smooth borders (closer to the ground truth).
Analyzing the average number of annotated slices per vertebra (Table 4), on average, only 37% of the total slices with vertebral content were annotated for this dataset, which sped up the annotation process: annotating each vertebral body took, on average, 36 seconds.
[Figure 6: panels (a) Ground-Truth, (b) Annotation, (c) 3DBGrowth, (d) GrowCut]
To further investigate the results presented in Table 3, we conducted a statistical test, as detailed next.
4.4 Statistical testing
Considering that the resulting values of each measure had several similar values, the Kolmogorov-Smirnov [28] test was used to analyze if there were significant statistical differences. In this test, the null hypothesis is that data from two dependent samples, e.g. the Dice Score (DSC) from 3DBGrowth (BG) and GrowCut (GC), were selected from populations having the same distribution, against the opposite alternative.
In the Wilcoxon test results, 3DBGrowth presented significantly better Dice (DSC), Jaccard (JAC) and Hausdorff's Distance (HD) than GrowCut. For the Running Time, there was no significant difference, which implies that both methods presented comparable processing time.
5 Conclusions
The semi-automatic segmentation of vertebral bodies in a volumetric scenario is a challenging task, due to the large number of slices in the exams. To obtain a proper 3D reconstruction of the vertebrae, the segmentation of the slices must be both fast and accurate. We have investigated this challenge and used the slope coefficient of the annotation time, so that the specialists' annotations were extrapolated from a slice to its neighbours up to a given limit without losing accuracy, while reducing the total time spent on manual annotation.
On the dataset used, on average, only 37% of the slices with vertebral body content had to be annotated, consequently making the process faster (on average, 36 seconds for each vertebral body). We have proposed the 3DBGrowth method, which significantly outperforms GrowCut and keeps a comparable running time. Moreover, 3DBGrowth presented the best results even with simple/sloppy seed points, which demand less effort in the annotation process.
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001 and grant No.: 0487/17083480, by the São Paulo Research Foundation (FAPESP, grants No. 2016/17078-0, 2017/23780-2, 2018/06228-7, 2018/24414-2), and the National Council for Scientific and Technological Development (CNPq).
-  FEHLINGS, M. G. et al. The aging of the global population: the changing epidemiology of disease and spinal disorders. Neurosurgery, v. 77, n. 1, p. 1–5, 2015.
-  RAK, M.; TÖNNIES, K. D. On computerized methods for spine analysis in MRI: a systematic review. Int. J. Comput. Assist. Radiol. Surg., v. 11, n. 8, p. 1445–1465, Aug 2016. ISSN 1861-6429.
-  WANG, Y. X. J. et al. Identifying osteoporotic vertebral endplate and cortex fractures. Quant Imaging Med Surg, v. 7, n. 5, p. 555–591, Oct 2017.
-  HAMMERNIK, K. et al. Vertebrae segmentation in 3D CT images based on a variational framework. In: YAO, J. et al. (Ed.). Recent Advances in Computational Methods and Clinical Applications for Spine Imaging. Cham: Springer International Publishing, 2015. p. 227–233. ISBN 978-3-319-14148-0.
-  DZENAN, Z. et al. Robust detection and segmentation for diagnosis of vertebral diseases using routine MR images. Computer Graphics Forum, v. 33, n. 6, p. 190–204, 2014.
-  GILLIES, R. J.; KINAHAN, P. E.; HRICAK, H. Radiomics: images are more than pictures, they are data. Radiology, v. 278, n. 2, p. 563–577, Feb 2016.
-  EGGER, J.; NIMSKY, C.; CHEN, X. Vertebral body segmentation with GrowCut: Initial experience, workflow and practical application. SAGE Open Med, v. 5, p. 1–5, 2017.
-  JUNIOR, J. R. F. et al. Radiomics-based features for pattern recognition of lung cancer histopathology and metastases. Computer Methods and Programs in Biomedicine, v. 159, p. 23 – 30, 2018. ISSN 0169-2607.
-  CASTI, P. et al. Cooperative strategy for a dynamic ensemble of classification models in clinical applications: the case of MRI vertebral compression fractures. International Journal of Computer Assisted Radiology and Surgery, v. 12, n. 11, p. 1971–1983, Nov 2017.
-  FRIGHETTO-PEREIRA, L. et al. Shape, texture and statistical features for classification of benign and malignant vertebral compression fractures in magnetic resonance images. Computers in Biology and Medicine, v. 73, p. 147 – 156, 2016. ISSN 0010-4825.
-  CAZZOLATO, M. T. et al. dp-breath: Heat maps and probabilistic classification assisting the analysis of abnormal lung regions. Computer Methods and Programs in Biomedicine, v. 173, p. 27–34, 2019. ISSN 0169-2607.
-  XUE, Z. et al. Spine X-ray image retrieval using partial vertebral boundaries. In: 2011 24th International Symposium on Computer-Based Medical Systems (CBMS). [S.l.: s.n.], 2011. p. 1–6. ISSN 1063-7125.
-  GURURAJAN, A. et al. On the creation of a segmentation library for digitized cervical and lumbar spine radiographs. Computerized Medical Imaging and Graphics, v. 35, n. 4, p. 251 – 265, 2011. ISSN 0895-6111.
-  KARIMI, D. et al. Prostate segmentation in MRI using a convolutional neural network architecture and training strategy based on statistical shape models. Int. J. Computer Assisted Radiology and Surgery, v. 13, n. 8, p. 1211–1219, 2018.
-  STEFAN, P. et al. A radiation-free mixed-reality training environment and assessment concept for C-arm-based surgery. International Journal of Computer Assisted Radiology and Surgery, v. 13, n. 9, p. 1335–1344, Sep 2018. ISSN 1861-6429.
-  BANERJEE, P. et al. A semi-automated approach to improve the efficiency of medical imaging segmentation for haptic rendering. Journal of Digital Imaging, v. 30, n. 4, p. 519–527, Aug 2017. ISSN 1618-727X.
-  VEZHNEVETS, V.; KONOUCHINE, V. GrowCut - interactive multi-label N-D image segmentation by cellular automata. International Conference on Computer Graphics and Vision - GraphiCon, v. 1, Nov 2005.
-  ZHU, L. et al. An effective interactive medical image segmentation method using Fast GrowCut. In: . [S.l.: s.n.], 2014. v. 17.
-  KOREZ, R. et al. Model-based segmentation of vertebral bodies from MR images with 3D CNNs. In: OURSELIN, S. et al. (Ed.). Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016. Cham: Springer Int. Publishing, 2016. p. 433–441. ISBN 978-3-319-46723-8.
-  GAONKAR, B. et al. Multi-parameter ensemble learning for automated vertebral body segmentation in heterogeneously acquired clinical MR images. J. of Translational Engineering in Health and Medicine, v. 5, p. 1–12, 2017. ISSN 2168-2372.
-  HILLE, G. et al. Vertebral body segmentation in wide range clinical routine spine MRI data. Computer Methods and Programs in Biomedicine, v. 155, p. 93 – 99, 2018. ISSN 0169-2607.
-  RAMOS, J. S. et al. BGrowth: an efficient approach for the segmentation of vertebral compression fractures in magnetic resonance imaging. Symposium on Applied Computing, p. 1–8, April 2019.
-  VANNESCHI, L. et al. Fitness clouds and problem hardness in genetic programming. In: SPRINGER. Genetic and Evolutionary Computation Conference. [S.l.], 2004. p. 690–701.
-  JACCARD, P. The distribution of the flora in the alpine zone. New Phytologist, v. 11, n. 2, p. 37–50, Feb. 1912.
-  SØRENSEN, T. A Method of Establishing Groups of Equal Amplitude in Plant Sociology Based on Similarity of Species Content and Its Application to Analyses of the Vegetation on Danish Commons. [S.l.]: I kommission hos E. Munksgaard, 1948. (Biologiske skrifter).
-  BARBIERI, P. D. et al. Vertebral body segmentation of spine MR images using Superpixels. In: JUNIOR, C. T. et al. (Ed.). 28th IEEE International Symposium on Computer-Based Medical Systems. São Carlos and Ribeirão Preto, Brazil: Conference Publishing Services (CPS), 2015. p. 44–49. ISSN 1063-7125.
-  MASSEY, F. J. The Kolmogorov-Smirnov test for goodness of fit. Journal of the American Statistical Association, American Statistical Association, v. 46, n. 253, p. 68–78, 1951.
-  WILCOXON, F.; KATTI, S.; WILCOX, R. Critical values and probability levels for the Wilcoxon rank sum test and the Wilcoxon signed rank test. Selected Tables in Mathematical Statistics, v. 1, p. 171–259, 1970.