Contrast-agnostic segmentation of MRI scans
We present a deep learning strategy that enables, for the first time, contrast-agnostic semantic segmentation of completely unpreprocessed brain MRI scans, without requiring additional training or fine-tuning for new modalities. Classical Bayesian methods address this segmentation problem with unsupervised intensity models, but require significant computational resources. In contrast, learning-based methods can be fast at test time, but are sensitive to the data available at training. Our proposed learning method, SynthSeg, leverages a set of training segmentations (no intensity images required) to generate synthetic sample images of widely varying contrasts on the fly during training. These samples are produced using the generative model of the classical Bayesian segmentation framework, with randomly sampled parameters for appearance, deformation, noise, and bias field. Because each mini-batch has a different synthetic contrast, the final network is not biased towards any MRI contrast. We comprehensively evaluate our approach on four datasets comprising over 1,000 subjects and four types of MR contrast. The results show that our approach successfully segments every contrast in the data, performing slightly better than classical Bayesian segmentation, and three orders of magnitude faster. Moreover, even within the same type of MRI contrast, our strategy generalizes significantly better across datasets, compared to training using real images. Finally, we find that synthesizing a broad range of contrasts, even if unrealistic, increases the generalization of the neural network. Our code and model are open source at https://github.com/BBillot/SynthSeg.
Segmentation of brain MR scans is an important task in neuroimaging, as it is a primary step in a wide array of subsequent analyses such as volumetry, morphology, and connectivity studies. Despite the success of modern supervised segmentation methods, especially convolutional neural networks (CNNs), their adoption in neuroimaging has been hindered by the high variability of MRI contrasts. These approaches often require a large set of manually segmented, preprocessed images for each desired contrast. However, since manual segmentation is costly, such supervision is often not available. A straightforward solution, implemented by widespread neuroimaging packages like FreeSurfer [Fischl(2012)] or FSL [Jenkinson et al.(2012)Jenkinson, Beckmann, Behrens, Woolrich, and Smith], is to require a 3D, T1-weighted scan for every subject, which is aggressively preprocessed and then used for segmentation. However, such a requirement precludes analysis of datasets for which 3D T1 scans are not available.
Robustness to MRI contrast variations has classically been achieved with Bayesian methods. These approaches rely on a generative model of brain MRI scans, which combines an anatomical prior (a statistical atlas) and a likelihood distribution. The likelihood typically models the image intensities of different brain regions as a Gaussian mixture model (GMM), as well as artifacts such as bias field. Test scans are segmented by “inverting” this generative model using Bayesian inference. If the GMM parameters are independently derived from each test scan in an unsupervised fashion [Van Leemput et al.(1999)Van Leemput, Maes, Vandermeulen, and Suetens, Zhang et al.(2001)Zhang, Brady, and Smith, Ashburner and Friston(2005)], this approach is fully adaptive to any MRI contrast. In some cases, a priori information is included in the parameters, which constrains the method to a specific contrast [Wells et al.(1996)Wells, Grimson, Kikinis, and Jolesz, Fischl et al.(2002)Fischl, Salat, Busa, Albert, Dieterich, Haselgrove, van der Kouwe, Killiany, Kennedy, Klaveness, Montillo, Makris, Rosen, and Dale, Patenaude et al.(2011)Patenaude, Smith, Kennedy, and Jenkinson] – yet even these methods are generally robust to small contrast variations. Such robustness is an important reason why Bayesian techniques remain at the core of all major neuroimaging packages, such as FreeSurfer, FSL, or SPM [Ashburner(2012)]. However, these strategies require significant computational resources (tens of minutes per scan) compared to recent deep learning methods, limiting their deployment at large scale or in time-sensitive applications.
Another popular family of neuroimaging segmentation methods is multi-atlas segmentation (MAS) [Rohlfing et al.(2004)Rohlfing, Brandt, Menzel, and Maurer, Iglesias and Sabuncu(2015)]. In MAS, several labeled scans (“atlases”) are registered to the test scan, and their deformed labels are merged into a final segmentation with a label-fusion algorithm [Sabuncu et al.(2010)Sabuncu, Yeo, Van Leemput, Fischl, and Golland]. MAS was originally designed for intra-modality problems, but can be extended to cross-modality problems by using multi-modality registration metrics like mutual information [Wells III et al.(1996)Wells III, Viola, Atsumi, Nakajima, and Kikinis, Maes et al.(1997)Maes, Collignon, Vandermeulen, Marchal, and Suetens]. However, its performance in this scenario is poor, due to the limited accuracy of nonlinear registration algorithms across modalities [Iglesias et al.(2013)Iglesias, Konukoglu, Zikic, Glocker, Van Leemput, and Fischl]. Another major drawback of MAS has traditionally been the high computational cost of the multiple non-linear registrations. While this is quickly changing with the advent of fast, deep-learning-based registration techniques [Balakrishnan et al.(2019)Balakrishnan, Zhao, Sabuncu, Guttag, and Dalca, de Vos et al.(2017)de Vos, Berendsen, Viergever, Staring, and Išgum], accurate deformable registration for arbitrary modalities has not been widely demonstrated with these methods.
The modern segmentation literature is dominated by CNNs [Milletari et al.(2016)Milletari, Navab, and Ahmadi, Kamnitsas et al.(2017b)Kamnitsas, Ledig, Newcombe, Simpson, Kane, Menon, Rueckert, and Glocker], particularly the U-Net architecture [Ronneberger et al.(2015)Ronneberger, Fischer, and Brox]. Although CNNs produce fast and accurate segmentations when trained for modality-specific applications, they typically do not generalize well to image contrasts that differ from the training data [Akkus et al.(2017)Akkus, Galimzianova, Hoogi, Rubin, and Erickson, Jog and Fischl(2018), Karani et al.(2018)Karani, Chaitanya, Baumgartner, and Konukoglu]. A possible solution is to train a network with multi-modal data, possibly with modality dropout during training [Havaei et al.(2016)Havaei, Guizard, Chapados, and Bengio], although this assumes access to manually labeled data for a wide range of acquisitions, which is problematic. One can also augment the training dataset with synthesized variations of contrasts that are not initially available, generated from uni- or multi-modal scans [Chartsias et al.(2018)Chartsias, Joyce, Giuffrida, and Tsaftaris, Huo et al.(2018)Huo, Xu, Moon, Bao, Assad, Moyo, Savona, Abramson, and Landman, Kamnitsas et al.(2017a)Kamnitsas, Baumgartner, Ledig, Newcombe, Simpson, Kane, Menon, Nori, Criminisi, Rueckert, et al., Jog et al.(2019)Jog, Hoopes, Greve, Van Leemput, and Fischl]. Recent papers have also shown that spatial and intensity data augmentation can improve network robustness [Chaitanya et al.(2019)Chaitanya, Karani, Baumgartner, Becker, Donati, and Konukoglu, Zhao et al.(2019)Zhao, Balakrishnan, Durand, Guttag, and Dalca]. Although these approaches make segmentation CNNs adaptive to brain scans of observed contrasts, they remain limited to the modalities (real or simulated) present in the training data, and thus have reduced accuracy when tested on previously unseen MR contrasts.
To address modality-agnostic learning-based segmentation, a CNN was recently used to quickly solve the inference problem within the Bayesian segmentation framework [Dalca et al.(2019a)Dalca, Yu, Golland, Fischl, Sabuncu, and Iglesias]. However, this method cannot be directly used to segment test scans of arbitrary contrasts, as it requires training on a set of unlabeled scans for each target modality.
In this paper we present SynthSeg, a novel learning strategy that enables automatic segmentation of unpreprocessed brain scans of any MRI contrast without any need for paired training data, re-training, or fine-tuning. We train a CNN using a dataset of only segmentation maps: synthetic images are produced by sampling a generative model of Bayesian segmentation, conditioned on a segmentation map. By sampling model parameters randomly at every mini-batch, we expose the CNN to synthetic (and often unrealistic) contrasts during training, and force it to learn features that are inherently contrast agnostic. Our experiments demonstrate SynthSeg on four different MRI contrasts. We also show that even within the same MRI contrast, SynthSeg generalizes across datasets better than a CNN trained on real images of this contrast.
We first introduce the generative model for Bayesian MRI segmentation, and then describe our method, which builds on this framework to achieve modality-agnostic segmentation.
The Bayesian segmentation framework relies on a probabilistic generative model for brain scans. Let L be a 3D label (segmentation) map consisting of J voxels, such that each voxel value L_j is one of K possible labels: L_j ∈ {1, …, K}. The generative model starts with a prior anatomical distribution p(L), typically represented as a (precomputed) statistical atlas A, which associates each voxel location with a vector of label probabilities. The atlas is endowed with a spatial deformation model: the label probabilities are warped with a field ϕ, parameterized by θ_ϕ, which follows a distribution chosen to encourage smooth deformations. The probability of observing L is then:

p(L | A, θ_ϕ) = ∏_{j=1}^{J} [A ∘ ϕ(θ_ϕ)]_{j, L_j},

where [A ∘ ϕ(θ_ϕ)]_{j, L_j} is the probability of label L_j given by the warped atlas at location j.
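For concreteness, here is a minimal numpy sketch of drawing a label map voxel-wise from this prior; the sizes are hypothetical, and a Dirichlet draw merely stands in for a real warped atlas:

```python
import numpy as np

# Minimal sketch of sampling L ~ p(L | A, theta_phi); shapes are hypothetical
# and the Dirichlet draw stands in for a real (warped) statistical atlas.
rng = np.random.default_rng(0)
J, K = 1000, 4                                 # J voxels, K labels
A = rng.dirichlet(np.ones(K), size=J)          # warped atlas probabilities, (J, K)
cdf = np.cumsum(A, axis=1)                     # per-voxel CDF over the K labels
L = (rng.random((J, 1)) < cdf).argmax(axis=1)  # first bin where u < CDF, i.e. a
                                               # categorical draw at each voxel
```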
Given a label map , the image likelihood for its corresponding image is commonly modeled as a GMM (conditioned on ), modulated by smooth, multiplicative bias field noise (additive in the more convenient logarithmic domain). Specifically, each label
is associated with a Gaussian distribution for intensities of mean
, and standard deviation. We group these Gaussian parameters into . The bias field is often modeled as a linear combination of smooth basis functions, where linear coefficients are grouped in [Larsen et al.(2014)Larsen, Iglesias, and Van Leemput]. The image likelihood is given by: p(I — L, θ_B, θ_G) = ∏_j N(I_j - B_j(θ_B) ; μ_L_j , σ_L_j^2), where is the Gaussian distribution, image intensity at voxel , and is the bias field at voxel . Note that we assume that has been log-transformed, such that the bias field is additive, rather than multiplicative.
Bayesian segmentation uses Bayes’s rule to “invert” this generative model and estimate the posterior p(L | I), posing segmentation as an optimization problem: L̂ = argmax_L p(L | I). This inversion often relies on computing point estimates of the model parameters {θ_ϕ, θ_G, θ_B}. Fitting the Gaussian parameters to the intensity distribution of each test scan is what makes these algorithms contrast agnostic.
Algorithm 1 (one training iteration, given M training segmentations; repeat until convergence): (1) select an input label map; (2) sample affine parameters; (3) sample SVF parameters; (4) upscale and integrate the SVF; (5) form the deformation field; (6) deform the selected label map; (7) sample Gaussian (GMM) parameters; (8) sample a GMM image; (9) apply spatial blurring; (10) sample bias field parameters; (11) upscale and exponentiate the bias field; (12) corrupt the image with the bias field; (13) sample the gamma augmentation parameter; (14) apply gamma augmentation and normalization via eq:intensity_augm; (15) update the CNN weights with the resulting image–segmentation pair (one SGD iteration).
We propose to train a segmentation CNN using synthetic data created on the fly with a generative model very similar to that of Bayesian segmentation. Since the voxel independence assumption would yield extremely heterogeneous, noisy images, we rely on a set of original label maps instead of random samples from a probabilistic atlas. We also slightly blur the sampled intensities. The proposed learning strategy, detailed below, is summarized in fig:schematic and Algorithm 1, with example samples in fig:augm_example.
Data sampling: In training, mini-batches are created by sampling image–segmentation pairs as follows. First, we randomly select a label map S_i from the training dataset (fig:augm_examplea) by sampling i ∼ U{1, …, M}, where U{·} is the discrete uniform distribution.
Next, we generate a random deformation field ϕ to obtain a new anatomical map L. The deformation field is the composition of an affine and a deformable random transform, ϕ_aff and ϕ_v, parameterized by θ_aff and θ_v, respectively: ϕ = ϕ_aff(θ_aff) ∘ ϕ_v(θ_v). The affine component is the composition of three rotations, three scalings, three shears, and three translations, whose parameters are independently sampled from continuous uniform distributions with predefined ranges (tab:hyperparameters). The deformable component is a diffeomorphic transform, obtained by integrating a smooth, random stationary velocity field (SVF) with a scaling and squaring approach [Moler and Van Loan(2003), Arsigny et al.(2006)Arsigny, Commowick, Pennec, and Ayache], implemented efficiently for a GPU [Dalca et al.(2019b)Dalca, Balakrishnan, Guttag, and Sabuncu, Krebs et al.(2019)Krebs, Delingette, Mailhé, Ayache, and Mansi]. The SVF is generated by first sampling the parameters θ_v: a random, low-resolution tensor whose elements are independent samples from a zero-mean Gaussian distribution. This tensor is subsequently upscaled to the desired image resolution with trilinear interpolation to obtain a smooth SVF, which is integrated to obtain ϕ_v. The final deformed label map is obtained by resampling

L = S_i ∘ ϕ = S_i ∘ [ϕ_aff(θ_aff) ∘ ϕ_v(θ_v)]

with nearest neighbor interpolation. This generative model yields a wide distribution of neuroanatomical shapes, while ensuring spatial smoothness (fig:augm_exampleb).
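As an illustration, the following numpy/scipy sketch implements the deformable part only (the affine component is omitted for brevity); the grid size, SVF standard deviation, and number of squaring steps are placeholder values, not the paper's hyperparameters:

```python
import numpy as np
from scipy.ndimage import zoom, map_coordinates

def sample_svf_deformation(shape, rng, low_res=10, sigma_v=3.0, n_steps=7):
    """Sample a smooth SVF on a coarse grid, upscale it, and integrate it with
    scaling and squaring to get a diffeomorphic displacement field (in voxels)."""
    # low-resolution velocity field, one channel per spatial dimension
    v_low = rng.normal(0.0, sigma_v, size=(3, low_res, low_res, low_res))
    factors = [s / low_res for s in shape]
    v = np.stack([zoom(v_low[d], factors, order=1) for d in range(3)])
    # scaling and squaring: start from v / 2^n, then repeatedly self-compose
    disp = v / (2 ** n_steps)
    grid = np.stack(np.meshgrid(*[np.arange(s) for s in shape], indexing="ij"))
    for _ in range(n_steps):
        # disp <- disp + disp(x + disp): composition of displacement fields
        warped = np.stack([map_coordinates(disp[d], grid + disp, order=1)
                           for d in range(3)])
        disp = disp + warped
    return disp  # shape (3, *shape)

def deform_label_map(labels, disp):
    """Resample a label map through the deformation with nearest neighbours."""
    grid = np.stack(np.meshgrid(*[np.arange(s) for s in labels.shape],
                                indexing="ij"))
    return map_coordinates(labels, grid + disp, order=0)
```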
Given the segmentation L, we sample a synthetic image as follows. First, we sample an image G conditioned on L, following the likelihood model introduced in section 2.1, one voxel at a time: G_j ∼ N(μ_{L_j}, σ_{L_j}²). The Gaussian parameters θ_G are a set of independent means and standard deviations drawn from continuous uniform distributions with wide, predefined ranges (tab:hyperparameters). Sampling independently from a wide range of values yields images of extremely diverse contrasts (fig:augm_examplec). To mimic partial volume effects, we make the synthetic images more realistic by introducing a small degree of spatial correlation between neighboring voxels. This is achieved by blurring G with a Gaussian kernel of small standard deviation (in voxels) (fig:augm_exampled).
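A possible implementation of this sampling step, with illustrative (not the paper's) uniform ranges for the means, standard deviations, and blurring kernel:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def sample_gmm_image(labels, rng, mu_range=(0.0, 255.0), sigma_range=(0.0, 35.0)):
    """Draw one mean/std per label from wide uniform priors (illustrative
    ranges), sample voxel intensities from the resulting GMM, then blur
    slightly to mimic the spatial correlation of partial voluming."""
    n_labels = int(labels.max()) + 1
    mu = rng.uniform(*mu_range, size=n_labels)
    sigma = rng.uniform(*sigma_range, size=n_labels)
    image = rng.normal(mu[labels], sigma[labels])   # G_j ~ N(mu_{L_j}, sigma_{L_j}^2)
    blur_std = rng.uniform(0.5, 1.0)                # hypothetical kernel range
    return gaussian_filter(image, blur_std)
```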
We corrupt the images with a bias field B, parameterized by θ_B. B is generated in a similar way as the SVF: θ_B is a random, low-resolution tensor whose elements are independent samples of a zero-mean Gaussian distribution. This tensor is upscaled to the size of the image with trilinear interpolation, and the voxel-wise exponential is taken to ensure non-negativity. The bias-field-corrupted image is obtained by voxel-wise multiplication with B (fig:augm_examplee).
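A matching sketch for the bias field, again with placeholder values for the low-resolution grid size and the Gaussian standard deviation:

```python
import numpy as np
from scipy.ndimage import zoom

def sample_bias_field(shape, rng, low_res=4, sigma_b=0.3):
    """Low-resolution Gaussian tensor, trilinearly upscaled and exponentiated
    to give a smooth, strictly positive multiplicative field."""
    b_low = rng.normal(0.0, sigma_b, size=(low_res,) * 3)
    factors = [s / low_res for s in shape]
    return np.exp(zoom(b_low, factors, order=1))

# corrupted = image * sample_bias_field(image.shape, rng)
```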
Finally, the training image I is generated by standard gamma augmentation and normalization of intensities. We first sample γ from a uniform distribution and then apply min-max normalization followed by a voxel-wise power transform (eq:intensity_augm):

I_j = [(Ĩ_j − min Ĩ) / (max Ĩ − min Ĩ)]^{exp(γ)},

where Ĩ is the bias-corrupted image.
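In code, this final step might look as follows; the sampling range for γ is an assumption:

```python
import numpy as np

def gamma_augment(image, rng, gamma_bound=0.4):
    """Min-max normalize, then raise to the power exp(gamma). The uniform
    range for gamma is illustrative, not the paper's exact value."""
    gamma = rng.uniform(-gamma_bound, gamma_bound)
    x = (image - image.min()) / (image.max() - image.min() + 1e-8)
    return x ** np.exp(gamma)
```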
Training: Starting from a set of label maps, we use the generative process described above to form training pairs (fig:augm_examplef). These pairs – each sampled with different parameters – are used to train the CNN in a standard supervised fashion, as illustrated in fig:schematic.
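Putting the pieces together, a minimal on-the-fly training loop could look like the sketch below, combining the helper functions sketched above; unet stands for a hypothetical compiled Keras model and label_maps for the M training segmentations. Each call draws fresh parameters, so every mini-batch has a different synthetic contrast:

```python
import numpy as np

rng = np.random.default_rng()

def sample_pair(label_maps, rng):
    """One synthetic image-segmentation pair, per Algorithm 1 (affine omitted)."""
    labels = label_maps[rng.integers(len(label_maps))]  # pick a label map
    disp = sample_svf_deformation(labels.shape, rng)    # random deformation
    labels = deform_label_map(labels, disp)             # deform anatomy
    image = sample_gmm_image(labels, rng)               # GMM sampling + blur
    image = image * sample_bias_field(image.shape, rng) # bias corruption
    image = gamma_augment(image, rng)                   # gamma + normalization
    return image[None, ..., None], labels[None, ...]    # batch/channel dims

# hypothetical supervised loop:
# for step in range(n_steps):
#     x, y = sample_pair(label_maps, rng)
#     unet.train_on_batch(x, tf.keras.utils.to_categorical(y, n_labels))
```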
Architecture. We use a U-Net style architecture [Ronneberger et al.(2015)Ronneberger, Fischer, and Brox, Çiçek et al.(2016)Çiçek, Abdulkadir, Lienkamp, Brox, and Ronneberger] with 5 levels of 2 layers each. The first layer contains 24 feature maps; this number is doubled after each max-pooling and halved after each upsampling. Convolutions use fixed-size kernels and the Exponential Linear Unit activation function [Clevert et al.(2016)Clevert, Unterthiner, and Hochreiter]. The last layer uses a softmax activation function. The loss function is the average soft Dice coefficient [Milletari et al.(2016)Milletari, Navab, and Ahmadi] between the ground truth segmentation and the probability map of the predicted output.
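A common formulation of this loss in TensorFlow/Keras is sketched below; details such as the smoothing constant (and whether the denominator uses squared terms) are assumptions rather than the paper's exact choice:

```python
import tensorflow as tf

def soft_dice_loss(y_true, y_pred, eps=1e-6):
    """Average soft Dice over labels; y_true is one-hot ground truth and
    y_pred the softmax output, both shaped [batch, x, y, z, n_labels]."""
    axes = [1, 2, 3]  # sum over the spatial dimensions
    inter = tf.reduce_sum(y_true * y_pred, axis=axes)
    denom = tf.reduce_sum(y_true, axis=axes) + tf.reduce_sum(y_pred, axis=axes)
    dice = (2.0 * inter + eps) / (denom + eps)   # per-batch, per-label Dice
    return 1.0 - tf.reduce_mean(dice)            # minimize 1 - average Dice
```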
Parametric distributions and intensity constraints.
The proposed generative model involves several hyperparameters (described above), which control the priors of the model parameters. In order to achieve invariance to input contrast, we sample the hyperparameters of the GMM (describing priors for intensity means and variances) from wide ranges in an independent fashion, generally leading to unrealistic images (fig:augm_example). The deformation hyperparameters are chosen to yield a wide range of shapes – well beyond plausible anatomy. We emphasize that the hyperparameter values, summarized in tab:hyperparameters, are not chosen to mimic a particular imaging modality or subject cohort.
Skull stripping. The proposed method is designed to segment brain MRI without any preprocessing. However, in practice, some brain MRI datasets do not include extracerebral tissue, for example due to privacy issues. We build robustness to skull-stripped images into our method by treating all extracerebral regions as background in 20% of training samples.
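A minimal sketch of this augmentation, assuming the extracerebral structures are identified by a known list of label values:

```python
import numpy as np

def maybe_strip_skull(labels, extracerebral_labels, rng, p=0.2):
    """With probability p, relabel all extracerebral structures as background
    (label 0) before image synthesis, so the network also sees samples that
    look skull-stripped."""
    if rng.random() < p:
        labels = np.where(np.isin(labels, extracerebral_labels), 0, labels)
    return labels
```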
Our model, including the image sampling process, is implemented on the GPU in Keras [Chollet(2015)] with a TensorFlow [Abadi et al.(2016)Abadi, Barham, Chen, Chen, Davis, Dean, Devin, Ghemawat, Irving, Isard, et al.] backend.
We provide experiments to evaluate segmentation of unprocessed scans, eliminating the dependence on additional tools which can be CPU intensive and require manual tuning.
We use four datasets with an array of modalities and contrast variations within each modality. All datasets contain labels for 37 regions of interest (ROIs), with the same labeling protocol.
T1-39: 39 whole head T1 scans with manual segmentations [Fischl(2012)]. We split the dataset into subsets of 20 and 19 scans. We use the label maps of the first 20 as the only inputs to train SynthSeg and evaluate on the held-out 19. We augmented the manual labels with approximate segmentations for skull, eye fluid, and other extracerebral tissue, computed semi-automatically with in-house tools, enabling synthesis of full head scans.
T1mix: 1,000 T1 whole head MRI scans collected from seven public datasets: ABIDE [Di Martino et al.(2014)Di Martino, Yan, Li, Denio, Castellanos, Alaerts, Anderson, Assaf, Bookheimer, Dapretto, Deen, Delmonte, Dinstein, Ertl-Wagner, Fair, Gallagher, Kennedy, Keown, Keysers, Lainhart, Lord, Luna, Menon, Minshew, Monk, Mueller, Müller, Nebel, Nigg, O’Hearn, Pelphrey, Peltier, Rudie, Sunaert, Thioux, Tyszka, Uddin, Verhoeven, Wenderoth, Wiggins, Mostofsky, and Milham], ADHD200 [noa(2012)], GSP [Holmes et al.(2015)Holmes, Hollinshead, O’Keefe, Petrov, Fariello, Wald, Fischl, Rosen, Mair, Roffman, Smoller, and Buckner], HABS [Dagley et al.(2017)Dagley, LaPoint, Huijbers, Hedden, McLaren, Chatwal, Papp, Amariglio, Blacker, Rentz, Johnson, Sperling, and Schultz], MCIC [Gollub et al.(2013)Gollub, Shoemaker, King, White, Ehrlich, Sponheim, Clark, Turner, Mueller, Magnotta, O’Leary, Ho, Brauns, Manoach, Seidman, Bustillo, Lauriello, Bockholt, Lim, Rosen, Schulz, Calhoun, and Andreasen], OASIS [Marcus et al.(2007)Marcus, Wang, Parker, Csernansky, Morris, and Buckner], and PPMI [Marek et al.(2011)Marek, Jennings, Lasch, Siderowf, Tanner, Simuni, Coffey, Kieburtz, Flagg, Chowdhury, Poewe, Mollenhauer, Sherer, Frasier, Meunier, Rudolph, Casaceli, Seibyl, Mendick, Schuff, Zhang, Toga, Crawford, Ansbach, Blasio, Piovella, Trojanowski, Shaw, Singleton, Hawkins, Eberling, Russell, Leary, Factor, Sommerfeld, Hogarth, Pighetti, Williams, Standaert, Guthrie, Hauser, Delgado, Jankovic, Hunter, Stern, Tran, Leverenz, Baca, Frank, Thomas, Richard, Deeley, Rees, Sprenger, Lang, Shill, Obradov, Fernandez, Winters, Berg, Gauss, Galasko, Fontaine, Mari, Gerstenhaber, Brooks, Malloy, Barone, Longo, Comery, Ravina, Grachev, Gallagher, Collins, Widnell, Ostrowizki, Fontoura, La-Roche, Ho, Luthman, Brug, Reith, and Taylor]. Although these scans share the same modality, they exhibit variability in intensity distributions and head positioning due to differences in acquisition platforms and sequences. Since manual delineations are not available for these scans, for evaluation we use automated segmentation obtained with FreeSurfer as ground truth [Fischl(2012), Dalca et al.(2018)Dalca, Guttag, and Sabuncu]. T1mix enables evaluation on a large dataset of heterogeneous T1 contrasts.
T1-PD-8: T1 and proton density (PD) scans for 8 subjects, along with manual delineations (for evaluation). These scans had been approximately skull-stripped prior to availability. Despite its smaller size, this dataset enables evaluation on a contrast (PD) that is very different from T1.
FSM: 18 subjects, each with 3 modalities: T1, T2, and a sequence typically used in deep brain stimulation (DBS). The DBS scan is an MP-RAGE acquired with dedicated TR, TI, TE, and flip angle settings. With no manual delineations available, for evaluation we use automated segmentations produced by FreeSurfer on the T1 channel as ground truth for all modalities. This dataset enables evaluation on two new contrasts, T2 and DBS.
We compare our method SynthSeg with three other approaches:
Fully supervised network: we train a supervised U-Net on the 20 (whole brain, unprocessed) training images from the T1-39 dataset, aiming to assess the difference in performance when testing on images of the same contrast (T1) acquired on the same and other platforms. The architecture and loss function are the same as for SynthSeg. As for SynthSeg, we employ data augmentation, including spatial deformations, gamma augmentation, and normalization of intensities. This supervised network can only segment T1 scans, so we refer to it as the “T1 baseline”.
SAMSEG: based on the traditional Bayesian segmentation framework, SAMSEG [Puonti et al.(2016)Puonti, Iglesias, and Van Leemput] uses unsupervised likelihood distributions and is thus fully contrast-adaptive. Like our method, SAMSEG can segment both unprocessed and skull-stripped scans. SAMSEG does not rely on neural networks, and thus does not require training, but instead runs an independent optimization for each scan, requiring tens of minutes.
SynthSeg-rule: we also analyze a variant of our proposed method, where the intensity parameters are representative of the test scans to be segmented. For each of the seven contrasts present in our datasets (T2, PD, DBS, and four varieties of T1), we build a Gaussian hyperprior for the means and standard deviations of each label, using ground truth segmentations. At training, for every mini-batch we sample one of the seven contrasts, then sample the means and standard deviations for each class conditioned on that contrast. This variant enables us to compare the generation of unrealistic contrasts during training against enforcing prior information on the target modalities, if available. An example of these more realistic synthetic images (conditioned on T1 contrast) is shown in fig:augm_realistic.
All CNN methods are trained on the training subset of T1-39, with our method variants only requiring the segmentation maps, whereas the supervised baseline also uses the T1 scans. We evaluate all approaches on the test subset of T1-39, as well as all of T1mix, T1-PD-8, and FSM. The T1 baseline is not tested on modalities other than T1, nor on T1-PD-8, because it cannot cope with skull-stripped data. We assess performance with Dice scores, computed for a representative subset of 12 brain ROIs: cerebral white matter (WM) and cortex (CT), lateral ventricle (LV), cerebellar white matter (CW) and cortex (CC), thalamus (TH), caudate (CA), putamen (PU), pallidum (PA), brainstem (BS), hippocampus (HP), and amygdala (AM). We average results for contralateral structures.
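For reference, a minimal implementation of the hard Dice overlap used for evaluation (the label identifiers in the usage comment are hypothetical):

```python
import numpy as np

def dice(seg_a, seg_b, label):
    """Hard Dice overlap for one ROI between two integer label maps."""
    a, b = seg_a == label, seg_b == label
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

# contralateral averaging, e.g. for the hippocampus:
# hp = 0.5 * (dice(pred, gt, LEFT_HP) + dice(pred, gt, RIGHT_HP))
```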
tab:summary provides a summary of the methods and their runtime. fig:dice shows box plots for each ROI, method, and dataset, as well as averages across the ROIs, and fig:examples shows sample segmentations for every method and dataset. The supervised T1 baseline excels when tested on the test scans of T1-39, achieving a mean Dice of 0.89, and outperforming all the other methods for every ROI. However, when tested on T1 images from T1mix and FSM, we observe considerable decreases in performance (see for instance the segmentation of the T1 in FSM in fig:examples). This is likely due to the limited variability in the training dataset, despite the use of augmentation techniques, highlighting the challenge of variation in unprocessed scans from different sources, even within the same modality.
SAMSEG yields very uniform results across datasets of T1 contrasts, producing mean Dice scores within 3 points of each other. Being agnostic to contrast, it outperforms the T1 baseline outside its training domain. It also performs well for the non-T1 contrasts. Although the mean Dice scores are slightly lower than for the T1 datasets (which normally display better contrast between gray and white matter), they remain robust for every contrast and dataset with minimum mean Dice of 0.81.
SynthSeg also produces high Dice across all contrasts, slightly higher than SAMSEG (0.02 mean Dice improvement), while requiring a fraction of its runtime (tab:summary). The improvement compared to SAMSEG is consistent across structures, except the cerebellum. Compared to the T1 baseline, the mean Dice is 0.03 lower on the supervised training domain (T1-39), but generalizes significantly better to other T1 datasets, and can segment other MRI contrasts with little decrease in performance (minimum mean Dice is 0.83).
Importantly, SynthSeg-rule is outperformed by SynthSeg, and its Dice scores are also slightly lower than those produced by SAMSEG. This illustrates that adapting the parameters to a certain contrast is counterproductive, at least within our simple generative model: we observe consistent drops in performance across ROIs and datasets, despite injecting contrast-specific knowledge for each modality. This result is consistent with recent results in image augmentation [Chaitanya et al.(2019)Chaitanya, Karani, Baumgartner, Becker, Donati, and Konukoglu], and supports the theory that forcing the network to learn to segment a broader range of images than it will typically observe at test time improves generalization.
We presented a learning strategy for modality-agnostic brain MRI segmentation, which builds on classical generative models for Bayesian segmentation. Sampling a wide range of model parameters enables the network to learn to segment a wide variety of contrasts and shapes during training. At test time, the network can therefore segment neuroanatomy given an unprocessed scan of any contrast in seconds. While the network is trained in a supervised fashion, the only data required are a few label maps. Importantly, we do not require any real scans during training, since images are synthesized from the labels, and are thus always perfectly aligned – in contrast to techniques relying on manual delineations.
While a supervised network excels on test data from the same domain it was trained on, its performance quickly decays when faced with more variability, even within the same type of MRI contrast. We emphasize that this effect is particularly pronounced here because we tackle the challenging task of segmenting unprocessed scans. This is one reason why deep learning segmentation techniques have not yet been adopted by widespread neuroimaging packages like FreeSurfer or FSL, which must make few assumptions about the specific MRI contrast of the user’s data. In contrast, SynthSeg maintains accuracy across T1 variants as well as other MRI modalities.
In absolute terms, SynthSeg’s Dice scores are consistently high: higher than SAMSEG, and not far from supervised contrast-specific networks, like the T1 baseline or scores reported in recent literature [Roy et al.(2019)Roy, Conjeti, Navab, Wachinger, Initiative, et al.]. Compared with our recent article that uses a CNN to estimate the GMM and registration parameters of the Bayesian segmentation framework [Dalca et al.(2019a)Dalca, Yu, Golland, Fischl, Sabuncu, and Iglesias], the method proposed here achieves higher average Dice on T1 (0.86 vs 0.82) and PD datasets (0.83 vs 0.80). However, we highlight that a direct comparison is not possible due to differences in datasets: in this work, we could only use 19 subjects from T1-39 for evaluation. More importantly, our previous method requires significant preprocessing and modality-specific unsupervised re-training. This highlights the ability of our new method to segment any contrast without retraining or preprocessing; the latter eliminates the dependence on additional tools, which can be computationally expensive and require manual tuning.
We believe that the proposed learning strategy is applicable to many generative models from which sampling yields sensible data, even beyond neuroimaging. By greatly increasing the robustness of fast segmentation CNNs to a wide variety of MRI contrasts, without any need for retraining, SynthSeg promises to enable adoption of deep learning segmentation techniques by the neuroimaging community.
This research was supported by the European Research Council (ERC Starting Grant 677697, project BUNGEE-TOOLS). Further support was provided in part by the BRAIN Initiative Cell Census Network grant U01MH117023, the National Institute for Biomedical Imaging and Bioengineering (P41EB015896, 1R01EB023281, R01EB006758, R21EB018907, R01EB019956), the National Institute on Aging (1R56AG064027, 5R01AG008122, R01AG016495), the National Institute of Mental Health, the National Institute of Diabetes and Digestive and Kidney Diseases (1-R21-DK-108277-01), the National Institute for Neurological Disorders and Stroke (R01NS0525851, R21NS072652, R01NS070963, R01NS083534, 5U01NS086625, 5U24NS10059103, R01NS105820), and was made possible by the resources provided by Shared Instrumentation Grants 1S10RR023401, 1S10RR019307, and 1S10RR023043. Additional support was provided by the NIH Blueprint for Neuroscience Research (5U01-MH093765), part of the multi-institutional Human Connectome Project. In addition, BF has a financial interest in CorticoMetrics, a company whose medical pursuits focus on brain imaging and measurement technologies. BF’s interests were reviewed and are managed by Massachusetts General Hospital and Partners HealthCare in accordance with their conflict of interest policies.
Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al. TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 265–283, 2016.
Zhang, Y., Brady, M., and Smith, S. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Transactions on Medical Imaging, 20(1):45–57, 2001. doi: 10.1109/42.906424.