PADDIT: Probabilistic Augmentation of Data using Diffeomorphic Image Transformation

10/03/2018 · by Mauricio Orbes-Arteaga, et al. · Københavns Uni

For proper generalization performance of convolutional neural networks (CNNs) in medical image segmentation, the learnt features should be invariant under particular non-linear shape variations of the input. To induce invariance in CNNs to such transformations, we propose Probabilistic Augmentation of Data using Diffeomorphic Image Transformation (PADDIT) -- a systematic framework for generating realistic transformations that can be used to augment data for training CNNs. We show that CNNs trained with PADDIT outperform CNNs trained without augmentation and with generic augmentation in segmenting white matter hyperintensities from T1 and FLAIR brain MRI scans.




1 Description of purpose

The availability of training data in medical image segmentation problems is often very limited. In such cases, convolutional neural networks (CNNs) tend to overfit due to over-parameterization and a lack of feature generalization to variations in shape and appearance. In order to address generalization, one has to find models that generate features equivariant or invariant under different transformations of the input. Equivariance of feature maps generated by CNNs to certain transformations can be obtained by using group convolutions [cohen16], where different orientations of the feature maps are learnt by kernels with shared weights. However, the resulting equivariance is restricted to linear and symmetric transformations. To reach generalization across a larger group of transformations, one has to rely on data augmentation.

Data augmentation is commonly achieved by applying transformations that generate warped versions of the available training data. The choice of transformation in the literature so far has been fairly arbitrary, often restricted to rotations, translations, reflections, and very small nonlinear deformations [roth2015, NIPS2015_5854, hauberg2016]. Some degree of learning the right kind of transformations needed to improve network performance was introduced in [NIPS2015_5854]. Hauberg et al. [hauberg2016] proposed a diffeomorphic registration approach where the distribution of transformations is learnt by constructing it explicitly from pairwise registrations of similar images. Once the distribution is constructed, new transformations are sampled and applied to the training data to obtain data for augmentation. However, the variations captured by the probability density function (pdf) of transformations depend strongly on the choice of image pairs. One may circumvent this dependency by registering all possible image pairs in the dataset; however, this is computationally expensive, and pairs that cannot be plausibly registered may induce transformations that are not meaningful.
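As a caricature of this pairwise-registration strategy, the empirical distribution of transformations can be fitted with a low-rank Gaussian and then sampled for augmentation. The sketch below is illustrative only (plain numpy, 1-D displacement fields; function names such as `fit_transformation_pdf` are ours), not the implementation of [hauberg2016]:

```python
import numpy as np

def fit_transformation_pdf(disp_fields, n_components=3):
    """Fit a low-rank Gaussian to flattened displacement fields
    obtained from pairwise registrations (one field per registration)."""
    X = np.stack([d.ravel() for d in disp_fields])        # (N, D)
    mean = X.mean(axis=0)
    # PCA via SVD: principal directions and per-component std. deviations
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    k = min(n_components, len(s))
    return mean, Vt[:k], s[:k] / np.sqrt(max(len(X) - 1, 1))

def sample_transformation(mean, components, scales, rng):
    """Draw a new displacement field from the learnt Gaussian."""
    z = rng.standard_normal(len(scales))
    return mean + (z * scales) @ components

rng = np.random.default_rng(0)
# toy "registration results": 10 smooth 1-D displacement fields of length 32
x = np.linspace(0, 1, 32)
fields = [np.sin(2 * np.pi * (x + rng.uniform(0, 0.1))) * rng.uniform(0.5, 1.5)
          for _ in range(10)]
mean, comps, scales = fit_transformation_pdf(fields)
new_field = sample_transformation(mean, comps, scales, rng)
```

Sampling from such a fitted pdf is cheap, but, as noted above, the pdf only reflects the image pairs that were actually registered.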

In order to obtain a model that automatically produces transformations capturing the shape variations in the training data, we propose Probabilistic Augmentation of Data using Diffeomorphic Image Transformation (PADDIT). PADDIT is an unsupervised approach to learning the shape variations that naturally appear in the training dataset. This is done by first constructing an unbiased template image that represents the central tendency of shapes in the training dataset. We then sample, using a Hamiltonian Monte Carlo (HMC) scheme [duane1987], transformations that warp the training images towards the generated mean template. The sampled transformations are used to perturb the training data, which is then used for augmentation. We use CNNs to segment T1/FLAIR brain magnetic resonance images (MRI) for white matter hyperintensities. We show that PADDIT outperforms CNNs trained either without data augmentation or with limited augmentation (using random B-spline transformations).

2 Methods

Probabilistic Bayesian models for template estimation in registration were introduced in [zhang2013], albeit using a different class of transformations. In short, the method views image registration as a maximum a posteriori (MAP) problem, where the similarity between two images (I, J) is the likelihood. The transformations \varphi = \mathrm{Exp}(v) (the Lie group exponential of a time-constant velocity field v) are regularized by a prior in the form of a norm on the velocity field. Formally, registration is the minimization of the energy

E(v) = \|v\|_V^2 + \frac{1}{2\sigma^2} \|I \circ \mathrm{Exp}(v)^{-1} - J\|^2.   (1)

The norm on the velocity field is generally induced by a differential operator. However, we directly choose a kernel K inducing a reproducing kernel Hilbert space V to parameterize the velocity field [Pai:2016dz]. Given a finite set of kernels, the regularization takes the form \|v\|_V^2 = \sum_{i,j} a_i^T K(x_i, x_j)\, a_j, where a_i are the vectors attached to each spatial kernel and x_i is the spatial position of kernel i.
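To make the kernel parameterization concrete, the sketch below evaluates a velocity field as a kernel expansion and computes the induced RKHS norm. It is a hedged 2-D numpy illustration (a scalar Wendland C2 kernel phi(r) = (1 - r)^4 (4r + 1) applied per dimension; function names are ours), not the paper's implementation:

```python
import numpy as np

def wendland_c2(r):
    """Compactly supported Wendland C2 kernel: (1 - r)^4_+ (4r + 1)."""
    r = np.asarray(r, dtype=float)
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)

def kernel_matrix(centers, support):
    """Gram matrix K with K[i, j] = phi(|x_i - x_j| / support)."""
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    return wendland_c2(d / support)

def velocity(x, centers, a, support):
    """v(x) = sum_i phi(|x - x_i| / support) a_i  (kernel expansion)."""
    d = np.linalg.norm(x[None, :] - centers, axis=-1)
    return wendland_c2(d / support) @ a

def rkhs_norm_sq(a, K):
    """||v||_V^2 = sum_{i,j} a_i^T K[i, j] a_j."""
    return float(np.einsum('id,ij,jd->', a, K, a))

# kernels on a 4x4 grid in 2-D, unit spacing, support of 2 voxels
centers = np.stack(np.meshgrid(np.arange(4.), np.arange(4.)), -1).reshape(-1, 2)
rng = np.random.default_rng(1)
a = 0.1 * rng.standard_normal((len(centers), 2))     # vectors a_i
K = kernel_matrix(centers, support=2.0)
v_at = velocity(np.array([1.5, 1.5]), centers, a, support=2.0)
norm_sq = rkhs_norm_sq(a, K)
```

The compact support of the Wendland kernel keeps the Gram matrix sparse in practice, which is one reason for choosing it over globally supported kernels.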

Using the distance metric between two images (the minimum of (1)), one can formulate template estimation as a Fréchet mean estimation problem. In other words, given a set of images (or observations) I_1, \dots, I_N, the atlas \hat{I} is the minimizer of the sum-of-squared-distances function

\hat{I} = \arg\min_I \sum_{k=1}^{N} d(I, I_k)^2.   (2)

Since (1) is viewed as a MAP problem, the velocity fields are considered latent variables, i.e., v_k \sim \mathcal{N}(0, K), a normal distribution with zero mean and covariance K derived from a kernel function. In the presence of latent variables, template estimation is posed as an expectation maximization (EM) problem. Further, for simplicity, we assume i.i.d. Gaussian noise at each voxel, with a likelihood term (for each observation) given by

p(I_k \mid v_k; \theta) = (2\pi\sigma^2)^{-M/2} \exp\left( -\frac{\|\hat{I} \circ \mathrm{Exp}(v_k)^{-1} - I_k\|^2}{2\sigma^2} \right),   (3)

where \theta = \{\hat{I}, \sigma^2\} are the parameters to be estimated via MAP; \sigma^2 is the noise variance, \hat{I} is the mean template, and M is the number of voxels. Each observation can thus be viewed as a random variation around the deformed mean \hat{I} \circ \mathrm{Exp}(v_k)^{-1}. The prior on the velocity field may be defined in terms of the norm as

p(v_k) \propto \exp\left( -\|v_k\|_V^2 \right).   (4)

Estimating the posterior distribution involves its marginalization over the latent variables, which is computationally intractable due to the dimensionality of v. To address this, Hamiltonian Monte Carlo (HMC) [neal2011] is employed to sample velocity fields for the marginalization. The posterior distribution from which S samples are drawn per image is

p(v_k \mid I_k; \theta) \propto p(I_k \mid v_k; \theta)\, p(v_k).   (5)
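To make the sampling step concrete, below is a generic HMC sampler with leapfrog integration, applied to a toy 2-D Gaussian standing in for the posterior over a velocity field. This is a minimal sketch under stated simplifications (our own function names, a 2-D target instead of a high-dimensional velocity field), not the authors' sampler:

```python
import numpy as np

def hmc_sample(log_post_grad, log_post, v0, n_samples, eps=0.1, n_leap=20, seed=0):
    """Vanilla Hamiltonian Monte Carlo with leapfrog integration.
    log_post / log_post_grad: log posterior density and its gradient."""
    rng = np.random.default_rng(seed)
    v, samples = v0.copy(), []
    for _ in range(n_samples):
        p = rng.standard_normal(v.shape)               # auxiliary momentum
        v_new, p_new = v.copy(), p.copy()
        p_new += 0.5 * eps * log_post_grad(v_new)      # initial half step
        for _ in range(n_leap):
            v_new += eps * p_new
            p_new += eps * log_post_grad(v_new)
        p_new -= 0.5 * eps * log_post_grad(v_new)      # undo extra half step
        # Metropolis accept/reject on the Hamiltonian
        h_old = -log_post(v) + 0.5 * p @ p
        h_new = -log_post(v_new) + 0.5 * p_new @ p_new
        if rng.random() < np.exp(min(0.0, h_old - h_new)):
            v = v_new
        samples.append(v.copy())
    return np.array(samples)

# toy posterior: 2-D Gaussian N(mu, I), standing in for p(v_k | I_k; theta)
mu = np.array([1.0, -2.0])
grad = lambda v: -(v - mu)
logp = lambda v: -0.5 * np.sum((v - mu) ** 2)
samples = hmc_sample(grad, logp, np.zeros(2), n_samples=2000)
```

The gradient-guided proposals are what make HMC viable for the high-dimensional velocity fields used here, where random-walk samplers would mix far too slowly.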


The sampled velocity fields (S per image) are used in an EM algorithm to estimate an optimal \hat{\theta}. A single-scale Wendland kernel [Pai:2016dz] is used to parameterize the velocity field and to construct the covariance matrix for regularization. Once a template is estimated, the posterior distribution is sampled for a set of velocity fields for each training image. To induce more variation, the velocity fields are randomly integrated between 0 and 1. The training samples are deformed with cubic interpolation for the image and nearest-neighbour interpolation for the label map to create the new set of synthetic data. The input to the deep-learning network (for one image I_k, as an example) is of the form

\{ (I_k \circ \mathrm{Exp}(v_k^{(s)})^{-1}, L_k \circ \mathrm{Exp}(v_k^{(s)})^{-1}) : s = 1, \dots, S \},

where S is the number of augmentations and L_k is the label of input image I_k. Note that the label is a segmentation assigning a class to each voxel and is transformed using the same transformation as the image.
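The augmentation step (random integration time, cubic interpolation for the image, nearest-neighbour interpolation for the labels) can be sketched in 2-D with scipy. This is an illustrative simplification using Euler integration of a stationary velocity field, not the paper's code:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def integrate_velocity(v, t, n_steps=8):
    """Euler integration of a stationary velocity field v of shape (2, H, W)
    from time 0 to t, returning a displacement field."""
    phi = np.zeros_like(v)
    dt = t / n_steps
    H, W = v.shape[1:]
    grid = np.mgrid[0:H, 0:W].astype(float)
    for _ in range(n_steps):
        # follow the flow: phi <- phi + dt * v(x + phi)
        warped_v = np.stack([map_coordinates(v[c], grid + phi, order=1,
                                             mode='nearest') for c in range(2)])
        phi += dt * warped_v
    return phi

def warp(img, labels, phi):
    """Deform the image (cubic) and the label map (nearest neighbour)
    with the same displacement field, as for the augmented training pairs."""
    H, W = img.shape
    coords = np.mgrid[0:H, 0:W].astype(float) + phi
    img_w = map_coordinates(img, coords, order=3, mode='nearest')
    lab_w = map_coordinates(labels, coords, order=0, mode='nearest')
    return img_w, lab_w

rng = np.random.default_rng(2)
img = rng.random((32, 32))
labels = (img > 0.5).astype(np.int32)
v = np.stack([np.ones((32, 32)), np.zeros((32, 32))])  # toy velocity field
t = rng.uniform(0.0, 1.0)                              # random integration time
img_w, lab_w = warp(img, labels, integrate_velocity(v, t))
```

Nearest-neighbour interpolation for the labels guarantees that the warped segmentation contains only valid class indices.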

3 Experiments and Results

We considered CNNs based on a U-net architecture in our experiments. To evaluate the proposed method, the performance of CNNs trained with data augmentation using PADDIT was compared to training without augmentation and to training with augmentation using deformations based on random B-splines; we call the latter the baseline. The above strategies were applied to white matter hyperintensities (WMH) segmentation from FLAIR and T1 MRI scans. To this end, we used the training dataset from the 2017 WMH segmentation MICCAI challenge. The set is composed of T1/FLAIR MRI scans and manual annotations for WMH from 60 subjects. The dataset was split into training (30), validation (5), and testing (10) sets. For each method, two deformed versions of each training case were created, i.e., the training set size was tripled.

The random deformations for the baseline were obtained using a deformation field defined on a grid with Cp control points and B-spline interpolation. The size of the deformation was controlled by adding Gaussian noise with zero mean and standard deviation Sd to the control points. We evaluated the impact of the Cp and Sd hyperparameters; specifically, we tried Cp ∈ {4, 8, 16} and Sd ∈ {2, 4, 6}.
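For reference, this baseline can be approximated as follows: Gaussian noise on a coarse Cp x Cp control grid, upsampled to the image grid with cubic spline interpolation (`scipy.ndimage.zoom` as a stand-in for true B-spline resampling). A sketch under those assumptions, not the exact baseline implementation:

```python
import numpy as np
from scipy.ndimage import zoom, map_coordinates

def random_bspline_displacement(shape, cp, sd, rng):
    """Random displacement field: zero-mean Gaussian noise with std `sd`
    (in voxels) on a coarse cp x cp control grid, upsampled with cubic
    spline interpolation to the full image grid."""
    coarse = rng.normal(0.0, sd, size=(2, cp, cp))
    factors = (1, shape[0] / cp, shape[1] / cp)
    return zoom(coarse, factors, order=3)              # shape (2, H, W)

def deform(img, disp):
    """Warp an image with a dense displacement field (cubic interpolation)."""
    H, W = img.shape
    coords = np.mgrid[0:H, 0:W].astype(float) + disp
    return map_coordinates(img, coords, order=3, mode='nearest')

rng = np.random.default_rng(3)
img = rng.random((64, 64))
disp = random_bspline_displacement(img.shape, cp=4, sd=2.0, rng=rng)
warped = deform(img, disp)
```

Larger Sd or coarser Cp values produce stronger, smoother warps, which matches the qualitative behaviour reported for the baseline below.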

Figure 1 shows examples of the deformed versions of a FLAIR scan from one subject in the training dataset. As can be observed, both methods generated new shapes for WMH regions. It is worth noting, however, that the images produced by PADDIT look more realistic, without drastic alterations to the brain. In contrast, those obtained using random B-spline deformations exhibit aberrations in cortical and ventricular structures, depending on the size of the deformation used.

Figure 1: Example of generated deformations. The first column shows the original FLAIR image and the two deformed versions produced by PADDIT. The remaining columns show the configurations used for the random B-spline deformations (rows: Cp ∈ {4, 8, 16}; columns: Sd ∈ {2, 4, 6}).

Figure 2 shows the Dice performance at each epoch on the validation and testing sets. It is worth noting that PADDIT achieved higher accuracy than both training with random B-spline deformations and training without augmentation. It can also be noted that random B-spline deformations did not provide a consistent improvement over training without data augmentation.

For the final assessment of PADDIT, the validation data was used for early stopping. The final evaluation of each method was carried out on the testing set using the network configuration from the epoch with the highest accuracy on the validation set. The best configuration for the random deformations used Cp: 4 (see Table 3). For PADDIT, the control points were placed every 8 voxels. Results on the testing set are summarized in Table 3. Our proposed method PADDIT achieved higher Dice accuracy than both the network trained without data augmentation and the baseline data augmentation approach in its best configuration; both differences were statistically significant.

Figure 2: Performance on the validation and testing sets for each method; the Dice score is computed at each epoch.

Method              Dice (mean)   Dice (std)
No augmentation     0.66321       0.24829
Rd Cp: 4,  Sd: 2    0.6628        0.2260
Rd Cp: 4,  Sd: 4    0.6347        0.2466
Rd Cp: 4,  Sd: 6    0.6661*       0.2274
Rd Cp: 8,  Sd: 2    0.6452        0.2347
Rd Cp: 8,  Sd: 4    0.6438        0.2403
Rd Cp: 8,  Sd: 6    0.6566        0.2327
Rd Cp: 16, Sd: 2    0.6358        0.2457
Rd Cp: 16, Sd: 4    0.6587        0.2238
Rd Cp: 16, Sd: 6    0.6535        0.2341
PADDIT              0.6813        0.2185

Table 3: Segmentation accuracy for all assessed strategies; * marks the highest Dice score achieved by the random B-spline deformation approach.

4 New or breakthrough work to be presented

Even though several configurations of random transformations generated realistic-looking images, they were not necessarily useful in CNN training. Conversely, the best configuration of random transformations generated images that were not necessarily biologically plausible. We hypothesize that such noisy data may help the optimization find better minima. However, one has to be careful in choosing the transformation configuration, since other configurations with a larger magnitude of deformation had a negative effect on training. In the case of PADDIT, one need not worry much about the transformation configuration, since the method learns the transformations needed to capture the shape variations in the dataset. Hence, the resulting synthetic images were both realistic and useful for CNN training.

5 Conclusion

The proposed probabilistic augmentation approach, PADDIT, proved to be an effective way to enlarge the training set by generating new training images that improve the segmentation performance of CNN-based approaches. The results show that the network trained with PADDIT performed statistically significantly better than networks trained with either no data augmentation or random B-spline-based augmentation.

This work has not been submitted for publication or presentation elsewhere.


This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 721820. We would like to thank both Microsoft and NVIDIA for providing computational resources on the Azure platform for this project.