3DBGrowth: volumetric vertebrae segmentation and reconstruction in magnetic resonance imaging

06/25/2019 ∙ by Jonathan S. Ramos, et al. ∙ Universidade de São Paulo

Segmentation of medical images is critical for making several processes of analysis and classification more reliable. With the growing number of people presenting back pain and related problems, the semi-automatic segmentation and 3D reconstruction of vertebral bodies has become even more important to support decision making. A 3D reconstruction allows a fast and objective analysis of each vertebra's condition, which may play a major role in surgical planning and evaluation of suitable treatments. In this paper, we propose 3DBGrowth, which develops a 3D reconstruction over the efficient Balanced Growth method for 2D images. We also take advantage of the slope coefficient from the annotation time to reduce the total number of annotated slices, reducing the time spent on manual annotation. We show experimental results on a representative dataset with 17 MRI exams demonstrating that our approach significantly outperforms the competitors and that, on average, only 37% of the slices with vertebral body content must be annotated without losing performance/accuracy. Compared to the state-of-the-art methods, we have achieved a Dice Score gain of over 5% with comparable processing time. Moreover, 3DBGrowth works well with imprecise seed points, which reduces the time spent on manual annotation by the specialist.


1 Introduction

Spinal diseases are increasing worldwide and can cause significant loss of function and compromise quality of life. Surgical spinal treatments have been growing with the aging population, which requires accurate diagnosis to avoid complications [1]. Many spine pathologies can be detected and diagnosed using Magnetic Resonance Imaging (MRI) exams [2, 3]. In a Computer-Aided Diagnosis (CAD) context, the segmentation of each vertebra allows a faster and more objective analysis of the vertebrae condition, aiding in the characterization and quantification of abnormalities [4]. Moreover, an accurate segmentation plays a major role and may assist the medical specialist in surgical planning and evaluation of suitable treatments [5].

The manual segmentation of a vertebral body in a slice-by-slice manner may be time-consuming and prone to errors, due to inter- and intra-subject variability. Besides, the subjective judgment that is employed may add even more inaccuracy [6]. On the other hand, the knowledge gained over several years of expertise is incorporated. Thus, semi-automatic segmentation assists the specialist, leads to time savings and reduces interpretation errors [7].

Figure 1: Steps in a semi-automatic segmentation schema.

The semi-automatic segmentation can be used in several analyses (Figure 1). Quantitative measures can be extracted, such as semantic and agnostic features [8]; consequently, machine learning techniques can be applied for the classification of a given anomaly [9, 10, 11] or for Content-Based Image Retrieval (CBIR) [12, 13]. Interactive segmentation tools can be meaningful during the training and education of new radiologists [14]. Students can learn how to correctly segment each vertebra and to detect spine pathologies [15]. This kind of training may avoid potential medical failures, which reduces further complications. In general, the visualization of 3D human structures can be used for simulation of medical and surgical procedures [16].

The GrowCut [17] method and its faster version, Fast GrowCut [18], which presents slightly lower segmentation accuracy, have been widely used in many medical MRI exams (especially in oncology) [8]. The GrowCut method is based on cellular automata (analogous to bacterial growth in biology) and works as a region-growing approach with an interactive labeling procedure [17].

Figure 2: Examples of slices annotation for a single vertebral body (Exam AKa2, L5) and 3DBGrowth's iterations. Ground-truth, interior and exterior annotation in red, magenta and blue, respectively.

Several fully automatic vertebrae segmentation methods have been proposed [19, 20]. However, they take too much processing time, which may not suit clinical practice [21]. More recently, a novel approach called Balanced Growth (BGrowth) [22] has been proposed for the segmentation of crushed vertebral bodies in single slices. Briefly, BGrowth balances the weights along the growing path of a region, so that small intensity transitions are better delineated. The results achieved by BGrowth surpass all methods from the literature, including GrowCut. Moreover, BGrowth is able to achieve promising segmentation results even with very simple/sloppy annotation (seed points).

In this paper, we extrapolate the specialists' annotation up to a fixed limit without losing performance/accuracy, so that the total time spent on manual annotation is reduced. Moreover, we show how to extend BGrowth to deal with the reconstruction of volumetric exams (3D), introducing a novel method called 3DBGrowth. The experimental results show that 3DBGrowth outperforms GrowCut, achieving an average Dice Score of 87% while managing comparable running time. Moreover, the method works well even with rough seed points, which reduces the time spent on manual annotation.

The remainder of the paper is structured as follows. In section 2, we present 3DBGrowth for the segmentation and reconstruction of vertebral bodies in volumetric MRI. Then, in section 3 we explore the materials and methods. Next, in section 4, we detail the experimental design, results and discussion. Finally, section 5 draws the conclusions.

2 3DBGrowth: The proposed method

The usual approach of annotating or stating seeds for segmenting medical images can be cumbersome for large 3D exams. Thus, the main issue addressed in this work is minimizing the human effort needed to segment and reconstruct 3D exams built on 2D slices. As illustrated in Figure 2, depending on the MRI exam, not all slices have to be manually annotated by the user to process the 3D reconstruction. If the exams present a small spacing between slices (considering annotations on the sagittal plane), several slices do not need to be annotated, since they are similar. This can be assessed by analyzing the negative slope coefficient [23], which gives the best trade-off between annotation time and performance measures, such as Dice Score Coefficient or Jaccard Coefficient (better explored in the next Section).

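Algorithm 1: 3DBGrowth method overview.
Input: Image $I$ and labels matrix $L$.
Output: Segmented binary image $L == 1$.
1  $W_p \leftarrow 1$ if $L_p \neq 0$, otherwise $W_p \leftarrow 0$   // Initial weights
2  for each voxel $p$ do   // For every voxel
3      for each neighbour $q$ of $p$ do   // and its neighbours
4          $g \leftarrow 1 - |I_p - I_q| / \max(I)$
5          if $g \cdot W_p > W_q$ then
6              $W_q \leftarrow (W_q + g \cdot W_p) / 2$
7              $L_q \leftarrow L_p$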

The slope between two points, $(x_1, y_1)$ and $(x_2, y_2)$, can be calculated by $(y_2 - y_1)/(x_2 - x_1)$, which is the rate of change between the two points. When the slope between two values of annotation time gets close to a straight horizontal line, there is no gain in annotation time. In other words, the closer the slope gets to 0, the lower the annotation time gain. Moreover, by using a segmentation approach that does not require detailed interior/exterior annotation, such as BGrowth, the total time spent on annotation is greatly diminished. In addition, BGrowth generally requires a simple rectangle-like annotation for the segmentation of individual vertebral bodies. For example, only 3 out of 7 slices were annotated in Figure 2. On average, each slice took 6.5 seconds for annotation and 3DBGrowth took seconds to process all 7 slices with only 16 iterations. Summing up, the whole process took seconds and achieved a result close to the Ground-Truth.
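The slope criterion can be sketched in a few lines of Python. This is only an illustration of the idea: the function names are ours, the (slice distance, annotation time) pairs are invented for the example and are not measurements from the experiments, and stopping at the first slope flatter than the threshold is one possible reading of how the threshold is applied.

```python
def slopes(points):
    """Slope (y2 - y1) / (x2 - x1) between each pair of consecutive points."""
    return [(y2 - y1) / (x2 - x1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]


def last_distance_before_flat(points, threshold=-1.0):
    """Return the x value reached by the last slope at or below `threshold`.

    Assumed reading of the criterion: keep increasing the slice distance
    while the annotation time still drops faster than the threshold, and
    stop as soon as the curve becomes flatter than that.
    """
    best = points[0][0]
    for (_, _), (x2, _), s in zip(points, points[1:], slopes(points)):
        if s > threshold:  # the curve has flattened: no worthwhile gain left
            break
        best = x2
    return best


# Hypothetical (slice distance, annotation time) pairs, NOT taken from the paper.
times = [(0, 11.0), (1, 6.0), (2, 4.2), (3, 3.1), (4, 2.6), (5, 2.3)]
print(slopes(times))
print(last_distance_before_flat(times, threshold=-1.0))
```

With these invented values the annotation time stops dropping by more than one unit per step after a slice distance of 3, so the helper returns 3.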

In our proposed 3DBGrowth method (Algorithm 1), we initially consider the segmentation of foreground and background in gray-scale images. That is, we consider a digital image $I$ and its annotations/labels as a matrix $L$, both with the same three dimensions, representing the number of rows, columns and slices, respectively. Each entry in $L$ has value $-1$ (background), $0$ (unlabelled) or $1$ (foreground).

Initially, each entry in a weight matrix $W$ (with the same dimensions as $I$ and $L$) is set to $1$ for seed points and $0$ otherwise (line 1). Then, for every voxel $p$ and each one of its 26 neighbours $q$, a strength factor $g$ is calculated (line 4). Here, the absolute intensity difference $|I_p - I_q|$ is normalized by the maximum intensity in the image and subtracted from 1, i.e., $g = 1 - |I_p - I_q| / \max(I)$. Finally, $g$ is multiplied by the current weight $W_p$, which produces values within $[0, 1]$. If this strength is greater than the neighbour's strength ($g \cdot W_p > W_q$, line 5), then the neighbour's strength is averaged with the new strength (line 6) and its label receives the label of the voxel (line 7).

This process repeats until the algorithm converges or for a fixed number of iterations defined by the user.
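The update rule described above can be sketched in plain Python as follows. This is a minimal, unoptimized illustration and not the authors' implementation (the experiments used Matlab): the function name `bgrowth3d`, the nested-list layout `[row][column][slice]`, the label values -1/0/1, the seed weights 1/0 and the in-place scan order are all assumptions made for the example.

```python
from itertools import product


def bgrowth3d(image, labels, max_iter=50):
    """Grow seed labels through a 3D volume; returns a boolean foreground mask.

    image, labels: nested lists indexed [row][col][slice].
    labels uses -1 (background), 0 (unlabelled), 1 (foreground) -- assumed values.
    """
    rows, cols, slices = len(image), len(image[0]), len(image[0][0])
    max_intensity = max(v for plane in image for line in plane for v in line) or 1
    labels = [[list(line) for line in plane] for plane in labels]  # work on a copy
    # Initial weights: 1 for seed voxels, 0 otherwise (assumed values).
    weights = [[[1.0 if v != 0 else 0.0 for v in line] for line in plane]
               for plane in labels]
    # The 26 neighbours of a voxel: every offset in {-1, 0, 1}^3 except itself.
    offsets = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]

    for _ in range(max_iter):
        changed = False
        for r, c, s in product(range(rows), range(cols), range(slices)):
            if labels[r][c][s] == 0:
                continue  # weight is 0, so this voxel cannot conquer anything yet
            for dr, dc, ds in offsets:
                nr, nc, ns = r + dr, c + dc, s + ds
                if not (0 <= nr < rows and 0 <= nc < cols and 0 <= ns < slices):
                    continue
                # Strength factor: 1 minus the normalized intensity difference.
                g = 1.0 - abs(image[r][c][s] - image[nr][nc][ns]) / max_intensity
                strength = g * weights[r][c][s]
                if strength > weights[nr][nc][ns]:
                    # Balance: average the neighbour's weight with the new strength.
                    weights[nr][nc][ns] = (weights[nr][nc][ns] + strength) / 2.0
                    labels[nr][nc][ns] = labels[r][c][s]
                    changed = True
        if not changed:
            break  # converged before reaching max_iter
    return [[[v == 1 for v in line] for line in plane] for plane in labels]


# Toy volume: 1 row, 6 columns, 1 slice; dark region on the left, bright on the right.
image = [[[10], [10], [10], [100], [100], [100]]]
labels = [[[1], [0], [0], [0], [0], [-1]]]  # foreground seed left, background seed right
mask = bgrowth3d(image, labels)
print(mask)
```

In the toy volume the foreground seed spreads over the three dark voxels and the background seed over the three bright ones, because the large intensity step between them yields a strength factor of only 0.1 across the boundary.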

3 Materials and methods

| Symbol/Acronym | Description |
|---|---|
| DSC | Dice Score Coefficient |
| JAC | Jaccard Coefficient |
| HD | Hausdorff's distance |
| GC | GrowCut |
| BG | 3D Balanced Growth |

Table 1: Summary of measures/methods used in this work.

The methods and measures used for comparison, as well as the computational set-up and image dataset are described as follows.

3.1 Image Dataset

Due to space limitations, only a meaningful dataset is presented herein, which comprises 17 anonymized MRI exams, ranging from the sacrum (S1) to the mid thoracic (T6-T12) with corresponding manual segmentations. The exams present several health conditions, such as scoliosis, spondylolisthesis and crushed vertebra. The exams have mm of slice thickness and mm of spacing between slices. More information and full access to the dataset is available at [5].

3.2 Segmentation methods

In order to evaluate the performance of 3DBGrowth (BG) in a 3D scenario, we compared it with GrowCut (GC), which has been widely used for the task of vertebrae segmentation [7]. Since Fast GrowCut is an approximation of the original GrowCut, presenting a lower accuracy [18], we consider only GrowCut in the experiments. Due to the limited number of samples (exams), no deep-learning approach was applied.

3.3 Comparison measures

The Jaccard Coefficient (JAC), Dice Score Coefficient (DSC) and Hausdorff's Distance (HD) in voxels [24, 25] were considered. The Jaccard (JAC) calculates the intersection of the manual and semi-automatic segmentations, and divides it by their union. This indicates the similarity between the segmentations, in which 0 indicates no similarity and, the closer JAC is to 1, the more alike the segmentations [26]. The Dice (DSC) measures the spatial overlap of several segmentations of the same object, i.e., it quantifies the overlap degree between two segmented objects. A DSC close to 0 indicates very low overlap, while a DSC closer to 1 indicates a higher overlap. In contrast, the Hausdorff's Distance (HD) indicates how far away (in voxels) the manual and semi-automatic segmentations are. An HD of 0 indicates identical segmentations.
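As a concrete illustration of these three measures, the sketch below computes them for two segmentations given as sets of voxel coordinates. The function names and the toy data are ours; the Hausdorff distance is computed by brute force over all voxels of each (non-empty) set, which is an assumption about the exact variant and is only practical for small volumes.

```python
import math


def dice(a, b):
    """Dice Score: twice the overlap divided by the sum of both sizes."""
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a, b):
    """Jaccard Coefficient: intersection divided by union."""
    return len(a & b) / len(a | b)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two voxel sets, in voxel units."""
    def directed(src, dst):
        # Largest distance from a voxel in src to its nearest voxel in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(a, b), directed(b, a))


# Toy segmentations: two 2x2 squares in the same slice, shifted by one column.
manual = {(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)}
semi_automatic = {(0, 1, 0), (0, 2, 0), (1, 1, 0), (1, 2, 0)}

print(dice(manual, semi_automatic))       # 0.5
print(jaccard(manual, semi_automatic))    # 2 shared voxels out of 6 in the union
print(hausdorff(manual, semi_automatic))  # 1.0
```

The example also shows why Dice is never smaller than Jaccard for the same pair of segmentations: here the two squares share half of their voxels (Dice 0.5), but only one third of the union (Jaccard of about 0.33).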

Table 1 shows a summary of the segmentation methods and comparison measures used in this work.

(a) Original (b) Ground-Truth (c) Annotation
Figure 3: Example of sloppy annotation for a few vertebral bodies in one slice (Aka2, slice 8): ground-truth, interior and exterior annotations in red, magenta and blue, respectively.

3.4 Computational set-up

The experiments were performed on a 2.40GHz Intel(R) Core(TM) i7 CPU and 8GB RAM machine, using Matlab(R) version 2018a. The maximum number of iterations was set to 50 for GrowCut and 3DBGrowth. No pre- or post-processing techniques were applied, to assure the same conditions for all segmentation methods.

4 Experiments, results and discussion

Our experimental design has four main parts: (A) the performance of each segmentation method is assessed using the whole exam; (B) each segmentation method is tested varying the number of slices annotated for each exam; (C) the vertebral bodies are segmented one-by-one by each method; (D) a statistical test is applied to detect any significant difference between the results of the two methods.

4.1 Exam segmentation analysis

The initial interior and exterior annotations were performed in a “sloppy” way, i.e., no detailed boundary for accentuated curves was drawn. In general, the annotation looks like a rectangle for the background and a simple line for the foreground (Figure 3). For this experiment, this annotation has been performed on each slice of every exam and, to reduce computational processing, each exam is cropped using the convex hull of the exterior annotation.

Table 2 shows the average Dice Score (DSC), Jaccard (JAC) and Running Time in seconds for each one of the 17 exams in the dataset. 3DBGrowth (BG) presented on average 81% DSC and 68% JAC, while GrowCut (GC) presented 76% and 61%, respectively. Thus, BG presented higher DSC and JAC percentages than GC for all exams, with average gains of about 5% and 7% in DSC and JAC, respectively. Moreover, considering DSC and JAC, BG's standard deviation is slightly lower. Analyzing the Running Time, very often, BG presented a lower average processing time than GC.

| Exam (#slices) | DSC (%) BG | DSC (%) GC | DSC Gain | JAC (%) BG | JAC (%) GC | JAC Gain | Time (s) BG | Time (s) GC |
|---|---|---|---|---|---|---|---|---|
| DzZ_T1 (12) | 85 | 80 | 4.86 | 74 | 67 | 7.0 | 18 | 21 |
| DzZ_T2 (12) | 82 | 77 | 4.2 | 69 | 63 | 5.8 | 27 | 31 |
| AKa2 (15) | 82 | 77 | 5.17 | 69 | 62 | 7.1 | 27 | 27 |
| AKa3 (15) | 78 | 73 | 4.78 | 64 | 58 | 6.2 | 27 | 29 |
| AKa4 (15) | 80 | 73 | 7.10 | 67 | 58 | 9.4 | 26 | 27 |
| AKs5 (15) | 84 | 78 | 6.54 | 73 | 63 | 9.2 | 23 | 24 |
| AKs6 (15) | 84 | 79 | 5.44 | 73 | 65 | 7.8 | 21 | 24 |
| AKs7 (15) | 80 | 73 | 7.6 | 67 | 57 | 9.9 | 21 | 24 |
| AKs8 (15) | 81 | 78 | 3.39 | 68 | 64 | 4.7 | 18 | 21 |
| S01 (16) | 85 | 82 | 2.91 | 74 | 70 | 4.3 | 44 | 50 |
| S02 (16) | 83 | 78 | 4.97 | 70 | 63 | 6.9 | 26 | 32 |
| F02 (18) | 78 | 74 | 3.63 | 64 | 59 | 4.7 | 48 | 55 |
| St1 (20) | 83 | 80 | 2.73 | 71 | 67 | 3.9 | 61 | 67 |
| F04 (23) | 78 | 75 | 3.42 | 64 | 60 | 4.5 | 13 | 14 |
| AKs3 (25) | 80 | 73 | 6.42 | 66 | 58 | 8.4 | 40 | 37 |
| F03 (25) | 80 | 77 | 3.57 | 67 | 62 | 4.8 | 8 | 9 |
| C002 (31) | 71 | 65 | 5.85 | 55 | 48 | 6.7 | 16 | 14 |
| Mean | 81 | 76 | 4.9 | 68 | 61 | 6.6 | 27 | 30 |
| Std. dev. | 3.4 | 3.9 | 1.5 | 4.7 | 5.0 | 1.9 | 13.7 | 15.2 |

Table 2: Dice Score (DSC), Jaccard (JAC) and Running Time in seconds for 3DBGrowth (BG) and GrowCut (GC), considering all slices on each exam (volumetric).

Considering that the manual annotation of every slice in the exam is too time consuming (for this dataset, on average, 11 minutes/exam), we conducted an experiment to validate the performance of 3DBGrowth and GrowCut when not all slices are annotated, as explored in the next section.

4.2 Variation on the number of annotated slices

We used the previous experiment's annotations and left a few slices without annotation: we defined a slice distance, which sets the number of non-annotated slices between two annotated slices. For example, a slice distance of 0 implies that no slice is left without annotation. The slice distance started at 0 and was increased by 1, up to 7.

As the slice distance increases (Figure 4), the average annotation time decreases and the processing time stays almost steady for both methods. Also, DSC and JAC drop slowly for both methods. However, BG presented better results than GC for both measures. Considering the negative slope coefficient (as discussed in Section 2), highlighted over the magenta line, by using a threshold of -1, the best slice distance would be 3, which presents the best trade-off between annotation time and DSC/JAC.

Figure 4: Quality comparison between 3DBGrowth and GrowCut over variations on the number of slices manually annotated: (a) annotation time and running time results; (b) Dice (DSC) and Jaccard (JAC).

(a) Original (b) Ground-Truth (c) Annotation
Figure 5: Example of seed points for a single vertebra (St1, slice 10, L2): ground-truth (GT), interior and exterior annotations in red, magenta and blue, respectively.

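| Region | Vertebra | DSC (%) BG | DSC (%) GC | JAC (%) BG | JAC (%) GC | HD (vox.) BG | HD (vox.) GC | Time (s) BG | Time (s) GC |
|---|---|---|---|---|---|---|---|---|---|
| Thoracic | T6 | 88 | 87 | 79 | 78 | 3.16 | 4.00 | 0.15 | 0.16 |
| Thoracic | T7 | 85 | 83 | 73 | 69 | 78.8 | 79.1 | 7.95 | 6.77 |
| Thoracic | T8 | 86 | 85 | 76 | 72 | 79.3 | 79.2 | 7.55 | 8.37 |
| Thoracic | T9 | 81 | 80 | 50 | 52 | 80.0 | 79.9 | 7.83 | 8.38 |
| Thoracic | T10 | 87 | 85 | 80 | 76 | 26.2 | 26.9 | 2.95 | 2.83 |
| Thoracic | T11 | 86 | 84 | 77 | 73 | 6.09 | 6.99 | 0.91 | 0.88 |
| Thoracic | T12 | 89 | 86 | 79 | 76 | 5.63 | 6.88 | 1.30 | 1.36 |
| Lumbar | L1 | 89 | 87 | 78 | 76 | 6.56 | 7.31 | 1.57 | 1.56 |
| Lumbar | L2 | 88 | 86 | 79 | 76 | 6.07 | 7.75 | 1.52 | 1.57 |
| Lumbar | L3 | 86 | 85 | 75 | 72 | 6.40 | 7.49 | 1.68 | 1.83 |
| Lumbar | L4 | 88 | 86 | 76 | 74 | 7.16 | 7.65 | 1.77 | 1.88 |
| Lumbar | L5 | 87 | 85 | 76 | 74 | 7.04 | 8.42 | 2.33 | 2.39 |
| Sacral | S1 | 88 | 86 | 79 | 76 | 6.18 | 7.57 | 1.74 | 1.88 |
| | Mean | 87 | 85 | 77 | 74 | 7.24 | 7.72 | 1.52 | 1.52 |
| | Std. Dev. | 0.07 | 0.06 | 0.08 | 0.08 | 4.85 | 5.00 | 1.27 | 1.27 |

Table 3: Comparison between 3DBGrowth (BG) and GrowCut (GC) for the Dice Score (DSC), Jaccard (JAC), Hausdorff (HD) in voxels and Running Time in seconds.
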
Table 4: Comparison of the number of annotated slices, considering a slice distance of three. Columns: vertebra (T6 to T12 thoracic, L1 to L5 lumbar, S1 sacral), slices annotated, total slices with vertebral content, and annotation time (seconds). Mean share of annotated slices: 37%.

In the next Section, we conduct experiments using annotations for individual vertebral bodies.

4.3 Individual vertebrae segmentation

To speed up the annotation process, we have considered a slice distance of three for this experiment. Each vertebral body was annotated separately, as exemplified in Figure 5. In general, both the interior and the exterior annotations look like a rectangle and no detailed borders were drawn.

As reported in Table 3, GC and BG presented equal mean Running Time and BG presented better mean DSC, JAC and HD than GrowCut. Figure 6 depicts the results for a single vertebral body: BG achieved the highest DSC and the lowest HD. GC presented spiculated borders, while BG presented smooth borders (closer to the ground-truth).

Analyzing the average number of annotated slices per vertebra (Table 4), for this dataset, on average, only 37% of the total slices with vertebral content were annotated, which sped up the annotation process: it took, on average, 36 seconds to annotate each vertebral body.

(a) Ground-Truth (b) Annotation
(c) 3DBGrowth (d) GrowCut
Figure 6: Comparison of results for L2 on exam AKa2: three slices, out of 7, were annotated.

To further investigate the results presented in Table 3, we conducted a statistical test, as detailed next.

4.4 Statistical testing

Considering that the resulting values of each measure had several similar values, the Kolmogorov-Smirnov [27] test was applied to verify the normality of the data. As the null hypothesis that the data follows a normal distribution was rejected for all measures, the Wilcoxon [28] test was used to analyze whether there were statistically significant differences. In this test, the null hypothesis is that data from two dependent samples, e.g. the Dice Score (DSC) from 3DBGrowth (BG) and GrowCut (GC), were selected from populations having the same distribution, against the opposite alternative.

In the Wilcoxon test results, 3DBGrowth presented significantly better Dice (DSC), Jaccard (JAC) and Hausdorff's Distance (HD) than GrowCut. For the Running Time, there was no significant difference, which implies that both methods presented comparable processing time.
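For readers who want to reproduce this kind of paired comparison, the sketch below implements a two-sided Wilcoxon signed-rank test with a normal approximation (tie-corrected, without continuity correction, and assuming at least one non-zero difference). It is an illustration only: the function name is ours, and the input uses the per-vertebra mean Dice values from Table 3 merely as example data, whereas the test reported above was presumably run on the underlying individual measurements.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired samples.

    Returns (w_plus, w_minus, p_value). Zero differences are dropped and
    tied absolute differences receive average ranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    ordered = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        # Extend j over the run of equal absolute differences.
        while j + 1 < n and abs(diffs[ordered[j + 1]]) == abs(diffs[ordered[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[ordered[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (min(w_plus, w_minus) - mean) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w_plus, w_minus, p_value


# Mean Dice (%) per vertebra from Table 3, used here only as illustrative input.
dsc_bg = [88, 85, 86, 81, 87, 86, 89, 89, 88, 86, 88, 87, 88]
dsc_gc = [87, 83, 85, 80, 85, 84, 86, 87, 86, 85, 86, 85, 86]

w_plus, w_minus, p_value = wilcoxon_signed_rank(dsc_bg, dsc_gc)
print(w_plus, w_minus, p_value)
```

Because BG has the higher value in every one of the 13 pairs, all ranks fall on the positive side and the approximate p-value is well below 0.01. For small samples an exact test, as provided by statistical packages, is preferable to the normal approximation used here.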

5 Conclusion

The semi-automatic segmentation of vertebral bodies in a volumetric scenario is a challenging task, due to the large number of slices in the exams. To obtain a proper 3D reconstruction of the vertebrae, the segmentation of slices must be both fast and accurate. We have investigated this challenge and used the slope coefficient of the annotation time, so that the specialists' annotations were extrapolated from a slice to its neighbours up to a given limit without losing accuracy while, at the same time, reducing the total time spent on manual annotation.

On the dataset used, on average, only 37% of the slices with vertebral body content had to be annotated, consequently making the process faster (on average, 36 seconds for each vertebral body). We have proposed the 3DBGrowth method, which significantly outperforms GrowCut and keeps a comparable running time. Moreover, 3DBGrowth presented the best results even with simple/sloppy seed points, which demands less effort in the annotation process.

Acknowledgment

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001 and grant No.: 0487/17083480, by the São Paulo Research Foundation (FAPESP, grants No. 2016/17078-0, 2017/23780-2, 2018/06228-7, 2018/24414-2), and the National Council for Scientific and Technological Development (CNPq).

References

  • [1] FEHLINGS, M. G. et al. The aging of the global population: the changing epidemiology of disease and spinal disorders. Neurosurgery, v. 77, n. 1, p. 1–5, 2015.
  • [2] RAK, M.; TÖNNIES, K. D. On computerized methods for spine analysis in MRI: a systematic review. Int. J. Comput. Assist. Radiol. Surg., v. 11, n. 8, p. 1445–1465, Aug 2016. ISSN 1861-6429.
  • [3] WANG, Y. X. J. et al. Identifying osteoporotic vertebral endplate and cortex fractures. Quant Imaging Med Surg, v. 7, n. 5, p. 555–591, Oct 2017.
  • [4] HAMMERNIK, K. et al. Vertebrae segmentation in 3D CT images based on a variational framework. In: YAO, J. et al. (Ed.). Recent Advances in Computational Methods and Clinical Applications for Spine Imaging. Cham: Springer International Publishing, 2015. p. 227–233. ISBN 978-3-319-14148-0.
  • [5] DZENAN, Z. et al. Robust detection and segmentation for diagnosis of vertebral diseases using routine MR images. Computer Graphics Forum, v. 33, n. 6, p. 190–204, 2014.
  • [6] GILLIES, R. J.; KINAHAN, P. E.; HRICAK, H. Radiomics: images are more than pictures, they are data. Radiology, v. 278, n. 2, p. 563–577, Feb 2016.
  • [7] EGGER, J.; NIMSKY, C.; CHEN, X. Vertebral body segmentation with GrowCut: Initial experience, workflow and practical application. SAGE Open Med, v. 5, p. 1–5, 2017.
  • [8] JUNIOR, J. R. F. et al. Radiomics-based features for pattern recognition of lung cancer histopathology and metastases. Computer Methods and Programs in Biomedicine, v. 159, p. 23–30, 2018. ISSN 0169-2607.
  • [9] CASTI, P. et al. Cooperative strategy for a dynamic ensemble of classification models in clinical applications: the case of MRI vertebral compression fractures. International Journal of Computer Assisted Radiology and Surgery, v. 12, n. 11, p. 1971–1983, Nov 2017.
  • [10] FRIGHETTO-PEREIRA, L. et al. Shape, texture and statistical features for classification of benign and malignant vertebral compression fractures in magnetic resonance images. Computers in Biology and Medicine, v. 73, p. 147 – 156, 2016. ISSN 0010-4825.
  • [11] CAZZOLATO, M. T. et al. dp-breath: Heat maps and probabilistic classification assisting the analysis of abnormal lung regions. Computer Methods and Programs in Biomedicine, v. 173, p. 27–34, 2019. ISSN 0169-2607.
  • [12] XUE, Z. et al. Spine X-ray image retrieval using partial vertebral boundaries. In: 2011 24th International Symposium on Computer-Based Medical Systems (CBMS). [S.l.: s.n.], 2011. p. 1–6. ISSN 1063-7125.
  • [13] GURURAJAN, A. et al. On the creation of a segmentation library for digitized cervical and lumbar spine radiographs. Computerized Medical Imaging and Graphics, v. 35, n. 4, p. 251 – 265, 2011. ISSN 0895-6111.
  • [14] KARIMI, D. et al. Prostate segmentation in MRI using a convolutional neural network architecture and training strategy based on statistical shape models. Int. J. Computer Assisted Radiology and Surgery, v. 13, n. 8, p. 1211–1219, 2018.
  • [15] STEFAN, P. et al. A radiation-free mixed-reality training environment and assessment concept for C-arm-based surgery. International Journal of Computer Assisted Radiology and Surgery, v. 13, n. 9, p. 1335–1344, Sep 2018. ISSN 1861-6429.
  • [16] BANERJEE, P. et al. A semi-automated approach to improve the efficiency of medical imaging segmentation for haptic rendering. Journal of Digital Imaging, v. 30, n. 4, p. 519–527, Aug 2017. ISSN 1618-727X.
  • [17] VEZHNEVETS, V.; KONOUCHINE, V. GrowCut - interactive multi-label N-D image segmentation by cellular automata. International Conference on Computer Graphics and Vision - GraphiCon, v. 1, Nov 2005.
  • [18] ZHU, L. et al. An effective interactive medical image segmentation method using Fast GrowCut. In: . [S.l.: s.n.], 2014. v. 17.
  • [19] KOREZ, R. et al. Model-based segmentation of vertebral bodies from MR images with 3D CNNs. In: OURSELIN, S. et al. (Ed.). Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016. Cham: Springer Int. Publishing, 2016. p. 433–441. ISBN 978-3-319-46723-8.
  • [20] GAONKAR, B. et al. Multi-parameter ensemble learning for automated vertebral body segmentation in heterogeneously acquired clinical MR images. J. of Translational Engineering in Health and Medicine, v. 5, p. 1–12, 2017. ISSN 2168-2372.
  • [21] HILLE, G. et al. Vertebral body segmentation in wide range clinical routine spine MRI data. Computer Methods and Programs in Biomedicine, v. 155, p. 93 – 99, 2018. ISSN 0169-2607.
  • [22] RAMOS, J. S. et al. BGrowth: an efficient approach for the segmentation of vertebral compression fractures in magnetic resonance imaging. Symposium on Applied Computing, p. 1–8, April 2019.
  • [23] VANNESCHI, L. et al. Fitness clouds and problem hardness in genetic programming. In: SPRINGER. Genetic and Evolutionary Computation Conference. [S.l.], 2004. p. 690–701.
  • [24] JACCARD, P. The distribution of the flora in the alpine zone. New Phytologist, v. 11, n. 2, p. 37–50, Feb. 1912.
  • [25] SØRENSEN, T. A Method of Establishing Groups of Equal Amplitude in Plant Sociology Based on Similarity of Species Content and Its Application to Analyses of the Vegetation on Danish Commons. [S.l.]: I kommission hos E. Munksgaard, 1948. (Biologiske skrifter).
  • [26] BARBIERI, P. D. et al. Vertebral body segmentation of spine MR images using Superpixels. In: JUNIOR, C. T. et al. (Ed.). 28th IEEE International Symposium on Computer-Based Medical Systems. São Carlos and Ribeirão Preto, Brazil: Conference Publishing Services (CPS), 2015. p. 44–49. ISSN 1063-7125.
  • [27] MASSEY, F. J. The Kolmogorov-Smirnov test for goodness of fit. Journal of the American Statistical Association, American Statistical Association, v. 46, n. 253, p. 68–78, 1951.
  • [28] WILCOXON, F.; KATTI, S.; WILCOX, R. Critical values and probability levels for the Wilcoxon rank sum test and the Wilcoxon signed rank test. Selected Tables in Mathematical Statistics, v. 1, p. 171–259, 1970.