Towards segmentation and spatial alignment of the human embryonic brain using deep learning for atlas-based registration

05/13/2020 ∙ by Wietske A. P. Bastiaansen, et al. ∙ Erasmus MC

We propose an unsupervised deep learning method for atlas-based registration to achieve segmentation and spatial alignment of the embryonic brain in a single framework. Our approach consists of two sequential networks with a specifically designed loss function to address the challenges in 3D first trimester ultrasound. The first part learns the affine transformation and the second part learns the voxelwise nonrigid deformation between the target image and the atlas. We trained this network end-to-end and validated it against a ground truth on synthetic datasets designed to resemble the challenges present in 3D first trimester ultrasound. The method was tested on a dataset of human embryonic ultrasound volumes acquired at 9 weeks gestational age, which showed alignment of the brain in some cases and gave insight into open challenges for the proposed method. We conclude that our method is a promising approach towards fully automated spatial alignment and segmentation of embryonic brains in 3D ultrasound.




1 Introduction

Ultrasound imaging is prominent in prenatal screening since it is noninvasive, real-time, safe, and low-cost compared to other imaging modalities [11]. However, the processing of ultrasound data is challenging due to low image quality, high variability in the position and orientation of the embryo, and the presence of the umbilical cord, placenta, and uterine wall. We propose a method to spatially align and segment the embryonic brain using atlas-based image registration in one unsupervised deep learning framework.

Learning-based spatial alignment and segmentation in prenatal ultrasound have been addressed before. Namburete et al. [12] presented a supervised multi-task approach that employed prior knowledge of the orientation of the head in the volume, annotated slices, and manual segmentations of the head and eye. Spatial alignment and segmentation were achieved on fetal ultrasound scans acquired at 22 to 30 weeks gestational age. Kuklisova-Murgasova et al. [10] proposed atlas-based registration, in which an MRI atlas and block matching were used to register ultrasound images of fetuses of 23 to 28 weeks gestational age. Finally, Schmidt-Richberg et al. [14] proposed a CNN combined with deformable shape models to segment the abdomen in 3D fetal ultrasound. All these works focus on ultrasound data acquired during the second trimester or later and rely on manual annotations. Ground truth segmentations for our application were not available and are laborious to obtain, which motivated our unsupervised approach.

Developing methods for processing ultrasound data acquired during the first trimester is of great clinical relevance, since the periconception period (14 weeks before to 10 weeks after conception) is of crucial importance for future health [16]. Therefore, our method is developed for first trimester ultrasound.

Recently, unsupervised deep learning approaches for image registration have received considerable attention, since these methods circumvent the need for manual annotations. Several methods were developed to learn dense nonrigid deformations under the assumption that the data are already affinely registered [2, 17]. By employing multi-level or multi-stage methods, affine registration can also be included [7, 8, 4]. The framework presented here is based on the method presented in [2] and follows the idea of [7, 8, 4] to dedicate part of the network to learning the affine transformation.

To the best of our knowledge, this is the first work that addresses the development of a framework for the alignment and segmentation of the embryonic brain, captured by ultrasound during the first trimester, by applying unsupervised deep learning methods for atlas-based registration. Segmentation and alignment are important preprocessing steps for any image analysis task; hence this method contributes to our ultimate goal: to further improve precision medicine of human brain disorders from the earliest moment in life.

2 Method

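Let $I$ and $A$ be two images defined in the $n$-D spatial domain $\Omega \subset \mathbb{R}^n$, with $I$ the target image and $A$ the atlas. Both images contain single-channel grayscale data. Assume that $A$ is in standard orientation and that its segmentation $S_A$ is available. Our aim is to find two deformations $\phi_a$ and $\phi_d$ such that:

$$A(x) \approx I(\phi_d \circ \phi_a(x)) \quad \forall x \in \Omega, \qquad (1)$$

where $\phi_a$ is an affine transformation and $\phi_d$ a voxelwise nonrigid deformation.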

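To obtain the affine transformation $\phi_a$ and the nonrigid deformation $\phi_d$, a convolutional neural network (CNN) is used to model the function $g_\theta$: $(\phi_a, \phi_d) = g_\theta(I, A)$, with $I$ the target image, $A$ the atlas, and $\theta$ the network parameters. The affine transformation $\phi_a$ is learned as an $m$-dimensional vector (for $n = 2$, $m = 6$, and for $n = 3$, $m = 12$) containing the coefficients of the affine transformation matrix $T$. The voxelwise nonrigid deformation $\phi_d$ is defined as a displacement field $u(x)$ with $\phi_d(x) = x + u(x)$.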
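The two-step transformation described above can be illustrated with a minimal NumPy sketch for the 2D case. It assumes that the 6 affine coefficients fill the top two rows of a homogeneous 3x3 matrix in row-major order and that the affine step is applied before the displacement step; the function names and the example values are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np

def affine_matrix_2d(params):
    """Build a 3x3 homogeneous matrix from m = 6 coefficients (assumed row-major)."""
    params = np.asarray(params, dtype=float)
    if params.shape != (6,):
        raise ValueError("expected 6 coefficients for n = 2")
    return np.vstack([params.reshape(2, 3), [0.0, 0.0, 1.0]])

def apply_affine(matrix, points):
    """Apply a homogeneous 3x3 matrix to an (N, 2) array of coordinates."""
    points = np.asarray(points, dtype=float)
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ matrix.T)[:, :2]

def apply_displacement(points, displacement):
    """Nonrigid step: x -> x + u(x), with u given as a callable."""
    points = np.asarray(points, dtype=float)
    return points + displacement(points)

# Hypothetical example: translation by (2, -1), then a constant shift of 0.5.
matrix = affine_matrix_2d([1.0, 0.0, 2.0, 0.0, 1.0, -1.0])
points = np.array([[0.0, 0.0], [1.0, 1.0]])
affine_points = apply_affine(matrix, points)
warped_points = apply_displacement(affine_points, lambda x: np.full_like(x, 0.5))
```

In the network the displacement is a dense field predicted per voxel rather than a callable; the callable is used here only to keep the sketch short.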

Figure 1 provides an overview of our method. The input of the network is an image pair consisting of the atlas $A$ and the target image $I$. The first part of the network outputs the affine transformation $\phi_a$ and the affinely registered image $I(\phi_a(x))$. The input of the second part is the affinely registered image together with the atlas $A$. The final output of the network consists of $\phi_a$, the nonrigid deformation $\phi_d$, the registered and segmented target image, and the affinely registered image $I(\phi_a(x))$ (which is not segmented, since it is an intermediate result).

Since this is an unsupervised method, no ground truth deformations are used for training. The network parameters $\theta$ are found by optimizing the loss function on the training set. The proposed loss function is described in the next section. After training, a new image can be given to the network together with the atlas to obtain the registration.

Figure 1: Architecture of our network. Light blue: convolutional layers with a stride of 2 (encoder). Green: convolutional layers with a stride of 1, skip connection, and up-sampling layer (decoder). Purple: fully connected layers with 500 neurons and ReLU activation. Dark blue: convolutional layers at full resolution. Orange: , red: . All convolutional layers have a kernel size of 3 and a LeakyReLU activation with parameter .

2.1 Network architecture

The target image and atlas are fed to the network as a two-channel image. The first part of the network consists of an encoder in which the images are down-sampled, followed by a global average pooling layer. The global average pooling layer outputs one feature per feature map, which forces the network to encode position and orientation globally, and is followed by fully connected layers. The output layer consists of the entries of the affine transformation matrix $T$. The architecture of the second part of the network is the same as Voxelmorph [2] and consists of an encoder, a decoder, and convolutional layers at full resolution. Its output layer contains the dense displacement field $u$.
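To make the role of the global average pooling layer concrete, the following NumPy sketch shows that it reduces each feature map to a single value, so the number of features passed to the fully connected layers does not depend on the spatial size. The array shapes and the channels-last layout are assumptions chosen for illustration.

```python
import numpy as np

def global_average_pool(feature_maps):
    """Average over all spatial axes; the last axis indexes the feature maps."""
    feature_maps = np.asarray(feature_maps, dtype=float)
    spatial_axes = tuple(range(feature_maps.ndim - 1))
    return feature_maps.mean(axis=spatial_axes)

rng = np.random.default_rng(0)
# Hypothetical encoder outputs: 8 feature maps at two different spatial sizes.
pooled_small = global_average_pool(rng.normal(size=(4, 4, 4, 8)))
pooled_large = global_average_pool(rng.normal(size=(8, 8, 8, 8)))
pooled_ones = global_average_pool(np.ones((2, 2, 2, 3)))
```

Because every spatial location contributes equally to each output value, any information about where the object lies must already be encoded in the feature maps themselves.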

The method is implemented using Keras [3] with a TensorFlow backend [1]. The ADAM optimizer is used with a learning rate of . Each training batch consists of one pair of volumes, and by default we train for 500 epochs.

2.2 The loss function

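The loss function is defined as follows:

$$\mathcal{L}(A, I, \phi_d, \phi_a) = \mathcal{L}_{sim}[A, I(\phi_d \circ \phi_a(x))] + \lambda_{diffusion}\,\mathcal{L}_{diffusion}[\phi_d] + \lambda_{scaling}\,\mathcal{L}_{scaling}[\phi_a]. \qquad (2)$$

The first term promotes intensity-based similarity between the atlas and the deformed image; the second and third terms regularize the nonrigid deformation $\phi_d$ and the affine transformation $\phi_a$, respectively. Each term is discussed in detail below.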

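Since in 3D first trimester ultrasound there are other objects in the volumes besides the brain, the similarity terms are only calculated within the region of interest defined by the segmentation $S_A$ of the atlas. $\mathcal{L}_{sim}$ is chosen as either the mean squared error (MSE) or cross-correlation (CC). Writing $Y$ for the deformed target image, they are defined as follows:

$$\mathcal{L}_{MSE} = \frac{1}{M} \sum_{p \in \Omega} W(p) \left( A(p) - Y(p) \right)^2, \qquad (3)$$

$$\mathcal{L}_{CC} = -\frac{1}{M} \sum_{p \in \Omega} W(p) \, \frac{\left( \sum_{p_i} \hat{A}_s(p_i) \, \hat{Y}_s(p_i) \right)^2}{\sum_{p_i} \hat{A}_s(p_i)^2 \; \sum_{p_i} \hat{Y}_s(p_i)^2}, \qquad (4)$$

where $M$ is the number of nonzero elements in the weight map $W$ (unless stated otherwise, $W = S_A$), the subscript $s$ indicates segmented, and $\hat{A}_s$ and $\hat{Y}_s$ denote the images with the local mean subtracted: $\hat{A}_s(p) = A_s(p) - \frac{1}{j^n} \sum_{p_i} A_s(p_i)$, where $p_i$ iterates over a $j^n$ volume around $p$, with $j = 9$ as in [2].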
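As an illustration of restricting the similarity terms to a region of interest, the sketch below computes a masked MSE and a masked local cross-correlation in 2D with NumPy. It is a sketch under assumptions, not the authors' code: the weight map is applied to the per-pixel loss, the local cross-correlation follows the squared, windowed form used in VoxelMorph-style registration with a 9 x 9 window, borders are zero padded, and all names and test data are hypothetical.

```python
import numpy as np

def masked_mse(atlas, warped, weight):
    """Weighted squared error, normalised by the number of nonzero weights."""
    weight = np.asarray(weight, dtype=float)
    count = np.count_nonzero(weight)
    return float(np.sum(weight * (atlas - warped) ** 2) / count)

def window_sum(image, size):
    """Sum over a size x size window around every pixel (zero padded, odd size)."""
    radius = size // 2
    padded = np.pad(image, radius)
    height, width = image.shape
    total = np.zeros((height, width))
    for dy in range(size):
        for dx in range(size):
            total += padded[dy:dy + height, dx:dx + width]
    return total

def masked_local_cc(atlas, warped, weight, size=9, eps=1e-8):
    """Negative local squared cross-correlation, weighted per pixel."""
    n = float(size * size)
    sum_a, sum_y = window_sum(atlas, size), window_sum(warped, size)
    cross = window_sum(atlas * warped, size) - sum_a * sum_y / n
    var_a = window_sum(atlas ** 2, size) - sum_a ** 2 / n
    var_y = window_sum(warped ** 2, size) - sum_y ** 2 / n
    cc = cross ** 2 / (var_a * var_y + eps)
    weight = np.asarray(weight, dtype=float)
    return float(-np.sum(weight * cc) / np.count_nonzero(weight))

rng = np.random.default_rng(0)
atlas = rng.normal(size=(32, 32))
mask = np.zeros((32, 32))
mask[8:24, 8:24] = 1.0                 # hypothetical atlas segmentation
noisy = atlas + (1.0 - mask) * 5.0     # differs from the atlas only outside the mask
mse_inside = masked_mse(atlas, noisy, mask)
cc_same = masked_local_cc(atlas, atlas, mask)
```

With this construction, differences outside the mask do not contribute to the MSE, and a perfectly matching image gives a cross-correlation loss close to its minimum of -1.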

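Image registration is an ill-posed problem; therefore regularization is needed. The nonrigid deformation $\phi_d$, with displacement field $u$, is regularized by:

$$\mathcal{L}_{diffusion} = \sum_{p \in \Omega} \lVert \nabla u(p) \rVert^2, \qquad (5)$$

which penalizes local spatial variations in $\phi_d$ to promote smooth local deformations [5].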
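A penalty on local spatial variations of the displacement field can be approximated with finite differences. The NumPy sketch below uses forward differences and averages the squared differences per axis; the averaging (rather than summing) and the function name are assumptions made for this illustration.

```python
import numpy as np

def diffusion_loss(displacement):
    """Mean squared forward difference of a displacement field of shape (..., n)."""
    displacement = np.asarray(displacement, dtype=float)
    total = 0.0
    for axis in range(displacement.ndim - 1):
        # Differences between neighbouring grid points along one spatial axis.
        total += np.mean(np.diff(displacement, axis=axis) ** 2)
    return float(total)

constant_field = np.ones((4, 4, 2))           # pure translation: no local variation
ramp_field = np.zeros((4, 4, 2))
ramp_field[..., 0] = np.arange(4)[:, None]    # first component grows along axis 0
loss_constant = diffusion_loss(constant_field)
loss_ramp = diffusion_loss(ramp_field)
```

A constant displacement is not penalized at all, which is why global shifts are left to the affine part of the transformation.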

Initial experiments revealed that, when objects are present in the background of the target image, the affine transformation degenerates towards extreme compression or expansion. To prevent this, extreme zooming is penalized as regularization for the affine transformation $\phi_a$. The zooming factors must be extracted from the affine transformation matrix $T$. This is done using the Singular Value Decomposition (SVD) [6], which states that any square matrix can be decomposed in the following way:

$$T = U \Sigma V^{*}, \qquad (6)$$

where the diagonal matrix $\Sigma$ contains non-negative real singular values representing the zooming factors. The scaling loss is defined as:

$$\mathcal{L}_{scaling} = \lVert \operatorname{diag}(\Sigma) - \mathbf{1} \rVert_2^2, \qquad (7)$$

with $\mathbf{1}$ an $n$-dimensional vector containing ones.
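The idea of penalizing extreme zooming through singular values can be sketched with NumPy as follows. The sketch assumes that the penalty is the squared distance between the singular values and a vector of ones, and that the singular values are taken from the linear (upper-left) block of a homogeneous affine matrix so that translation does not affect them; both choices, and the example matrices, are assumptions for illustration.

```python
import numpy as np

def scaling_loss(affine):
    """Squared distance of the zooming factors (singular values) from one."""
    affine = np.asarray(affine, dtype=float)
    linear = affine[:-1, :-1]  # drop the translation column and homogeneous row
    singular_values = np.linalg.svd(linear, compute_uv=False)
    return float(np.sum((singular_values - 1.0) ** 2))

angle = np.deg2rad(30.0)
# Rotation plus translation: no zooming, so the penalty should vanish.
rotation = np.array([[np.cos(angle), -np.sin(angle), 5.0],
                     [np.sin(angle), np.cos(angle), -3.0],
                     [0.0, 0.0, 1.0]])
# Anisotropic zoom by factors 2 and 0.5.
zoom = np.diag([2.0, 0.5, 1.0])
loss_rotation = scaling_loss(rotation)
loss_zoom = scaling_loss(zoom)
```

Rotations and translations leave the penalty at zero, while compression and expansion are both penalized.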

For the regularization weights $\lambda_{diffusion}$ and $\lambda_{scaling}$ the optimal values must be chosen. This is addressed in the experiments.

3 Data

The following three datasets were used in the experiments.

3.1 Synthetic 2D dataset 1

To develop and validate our method against a ground truth, we created two synthetic 2D datasets by affinely transforming and nonrigidly deforming a synthetic atlas. As synthetic atlas the Shepp-Logan phantom [15] is used, which was nonrigidly deformed. The first dataset was created by first applying a random affine transformation $\phi_a$ to the atlas, followed by a nonrigid deformation $\phi_d$.

The coefficients for the affine transformation matrix were drawn as follows: translation coefficients , pixels, rotation angle degrees, anisotropic zooming factors , , and shear in the x direction degrees. The nonrigid deformation was generated using a normalized random displacement field , where defines the magnitude of the displacement. The smoothness of is controlled using $\sigma$, representing the standard deviation of the Gaussian, which was convolved with . We used , and .
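One way to generate a smooth random displacement field of the kind described above is sketched below in NumPy. The order of operations (smooth, then normalize, then scale), the uniform noise, the separable Gaussian kernel truncated at three standard deviations, and the parameter values are all assumptions; the values used for the datasets are not reproduced here.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1D Gaussian kernel truncated at three standard deviations."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()

def smooth(image, sigma):
    """Separable Gaussian smoothing of a 2D array (zero padded borders)."""
    kernel = gaussian_kernel(sigma)
    out = image
    for axis in (0, 1):
        out = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="same"), axis, out)
    return out

def random_displacement(shape, magnitude, sigma, rng):
    """Smooth random field, normalised to [-1, 1], scaled by the magnitude."""
    field = rng.uniform(-1.0, 1.0, size=shape + (2,))
    field = np.stack([smooth(field[..., c], sigma) for c in range(2)], axis=-1)
    field /= np.max(np.abs(field))
    return magnitude * field

rng = np.random.default_rng(0)
# Hypothetical settings: 32 x 32 grid, magnitude 3 pixels, sigma 2 pixels.
displacement = random_displacement((32, 32), magnitude=3.0, sigma=2.0, rng=rng)
```

A larger standard deviation gives a smoother field, and the magnitude bounds the largest displacement in pixels.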

3.2 Synthetic 2D dataset 2

The second synthetic dataset was created in the same manner as the first, with the addition of a background consisting of ellipses of random size and orientation. The ellipses are placed around, behind, and adjacent to the synthetic atlas to mimic the presence of the uterine wall around the embryo and the body of the embryo attached to the head. Both datasets contain 3000 training, 100 validation, and 100 test images.

3.3 3D ultrasound data: Rotterdam Periconceptional cohort

The Rotterdam Periconceptional Cohort (Predict study) is a large hospital-based cohort study embedded in tertiary patient care of the department of Obstetrics and Gynaecology, at the Erasmus MC, University Medical Center Rotterdam, the Netherlands. This prospective cohort focuses on the relationships between periconceptional maternal and paternal health and fetal growth development, and underlying (epi)genetics [16].

Scans collected at 9 weeks gestational age were used as proof of concept for our method. The image chosen as atlas was put in standard orientation and had sufficient quality to segment the embryo and brain semi-automatically using Virtual Reality [13]. There were 170 3D ultrasound scans available with sufficient quality; 140 were used for training and 30 for testing. All scans were padded with zeros and re-scaled to voxels to speed up training.

Since 140 scans are not sufficient for training, data augmentation was applied. When considering a 2D slice, the embryo is visible in either the coronal, sagittal, or axial view. To keep this property during augmentation, first an axis was selected at random and a rotation of either , or degrees was applied. Subsequently a random rotation about this axis was applied between and degrees, followed by a translation and anisotropic zooming . Each volume was augmented 30 times, which resulted in 4340 images for training.
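The reported training set size is consistent with keeping the original scans alongside their augmented copies, as the short check below shows. That the originals are included is an inference from the numbers, not something stated explicitly.

```python
n_training_scans = 140
augmentations_per_scan = 30

# Augmented copies only, and augmented copies plus the original scans.
augmented_images = n_training_scans * augmentations_per_scan
total_training_images = n_training_scans + augmented_images
```

Here `augmented_images` is 4200 and `total_training_images` is 4340, which matches the figure given in the text.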

4 Experiments

To validate our method, three experiments are performed.

  1. Comparison with Voxelmorph [2] on synthetic dataset 1 with the MSE similarity term ($\mathcal{L}_{sim} = \mathcal{L}_{MSE}$). Goal: evaluate the influence of adding a dedicated part of the network for affine registration on images where the object of interest has a wide variation in position and orientation.

  2. Evaluation of the hyperparameters in the loss function, Eq. (2), on synthetic dataset 2 with the MSE similarity term ($\mathcal{L}_{sim} = \mathcal{L}_{MSE}$). Goal: set the regularization weights $\lambda_{diffusion}$ and $\lambda_{scaling}$ in the presence of objects in the background.

  3. Testing the method on 3D ultrasound data acquired at 9 weeks gestational age with the CC similarity term ($\mathcal{L}_{sim} = \mathcal{L}_{CC}$) and different types of atlases as input for the network. $\mathcal{L}_{CC}$ is used, since it is well known that the cross-correlation is more robust to intensity variations and noise.

The main difference between the synthetic data and the ultrasound data is that in the synthetic data the atlas is the only object with a clear structure, while the ultrasound data are noisy and contain more structures similar to the embryonic brain, for example the body of the embryo, which is also a prominent, roughly round structure. To address this, in the third experiment the influence of using an atlas containing the whole embryo versus only the brain is evaluated. Using the atlas containing the whole embryo as input gives more information for alignment. However, we aim at registering only the brain, since this is our region of interest and registering the whole embryo introduces new challenges due to movement and wide variation in the position of the limbs. To focus on registration of the brain, the weight map $W$ in Eq. (4) is adjusted by assigning twice as much weight to the loss calculated in voxels that are part of the brain.

4.1 Evaluation

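In the synthetic case the Target Registration Error (TRE) was calculated, which was defined as the mean Euclidean distance between the ground-truth and the estimated positions of the points $p$ in the set of evaluation points $P$:

$$\mathrm{TRE} = \frac{1}{|P|} \sum_{p \in P} \lVert \phi_{gt}(p) - \phi_d \circ \phi_a(p) \rVert_2, \qquad (8)$$

where $\phi_{gt}$ denotes the known ground-truth deformation and the evaluation points mark the boundary of the shape and important internal structures. The TRE is given in pixels.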
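The TRE computation can be sketched in NumPy as the mean Euclidean distance between corresponding evaluation points. The function name, the returned standard deviation (population form), and the example coordinates are assumptions made for illustration.

```python
import numpy as np

def target_registration_error(reference_points, registered_points):
    """Mean and standard deviation of point-wise Euclidean distances."""
    reference_points = np.asarray(reference_points, dtype=float)
    registered_points = np.asarray(registered_points, dtype=float)
    distances = np.linalg.norm(reference_points - registered_points, axis=1)
    return float(distances.mean()), float(distances.std())

# Hypothetical evaluation points in pixel coordinates.
reference = [[0.0, 0.0], [10.0, 10.0]]
registered = [[3.0, 4.0], [10.0, 10.0]]
tre_mean, tre_std = target_registration_error(reference, registered)
```

In this example one point is off by 5 pixels and the other is exact, so the mean distance is 2.5 pixels.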

In the case of real ultrasound data we visually assess the quality of alignment in the 30 test images. The following scoring is used: 0: fail, 1: correct orthogonal directions, 2: brain and atlas overlap, 3: alignment. Score 1 indicates that the network was able to detect the correct plane, score 2 that it was able to map the brain to the atlas, and score 3 that alignment was successful.

5 Results

In the first experiment we compared our method with Voxelmorph [2] on the first synthetic dataset. The experiment was done for different values of the regularization weight $\lambda_{diffusion}$ with . Table 1 shows that with the architecture of Voxelmorph it was not possible to capture the global transformation needed. This is also illustrated by row one in Fig. 2. Using our method a small TRE was achieved for both the train and validation set; see row 2 of Fig. 2 for an example. Setting $\lambda_{diffusion} = 0.8$ gave a TRE of 2.71 (1.67) pixels on the test set, which is comparable to the result on the train and validation set.

                        Voxelmorph                      Our method
$\lambda_{diffusion}$   Train           Validation      Train           Validation      Test
0.05                    34.27 (12.10)   34.87 (11.35)   3.46 (6.86)     4.25 (8.35)     -
0.2                     34.15 (12.85)   35.23 (12.24)   2.71 (5.80)     3.63 (7.25)     -
0.8                     40.40 (12.67)   42.12 (11.80)   2.20 (0.77)     3.10 (1.78)     2.71 (1.67)
3.2                     -               -               32.61 (34.07)   35.60 (33.25)   -

Table 1: Performance on the first synthetic dataset using Voxelmorph [2] and our method for different values of the regularization weight $\lambda_{diffusion}$. TRE is expressed in pixels, with the standard deviation between brackets.

In the second experiment we evaluated how to deal with objects in the background by penalizing extreme zooming. Tab. 2 lists the results for $\lambda_{diffusion} = 0.2$ and $\lambda_{diffusion} = 0.8$ and for different values of the scaling weight $\lambda_{scaling}$. Setting $\lambda_{scaling}$ too high restricts the network too much; setting it too low causes extreme scaling. The best result on the validation set was found for $\lambda_{diffusion} = 0.8$ and $\lambda_{scaling} = 0.004$; using this model to register the test set gave a TRE of 2.90 (1.97) pixels, which is again comparable to the result for the training and validation set. An example can be found in row three of Fig. 2.

In the third experiment we evaluated our method on real ultrasound data, for different combinations of atlases as input to the two parts of the network. The results are shown in Tab. 3. Using the atlas of the whole embryo gives the best results, since the network has more information for alignment. Figure 3 gives an impression of the resulting registrations. Note that the images marked as aligned are not perfectly registered. This is because the network still roughly misaligned most images, and therefore voxelwise alignment is not learned.

$\lambda_{diffusion}$   $\lambda_{scaling}$   Train           Validation      Test
0.2                     0                     4.02 (8.26)     5.43 (11.17)    -
0.8                     0                     2.17 (3.64)     2.74 (2.30)     -
0.2                     0.004                 3.17 (3.08)     3.26 (1.46)     -
0.8                     0.004                 2.36 (3.53)     2.45 (3.53)     2.90 (1.97)
0.2                     0.008                 6.99 (10.26)    6.25 (7.52)     -
0.8                     0.008                 2.47 (3.35)     2.53 (1.10)     -

Table 2: Target registration error for different hyperparameter settings of the loss function. TRE is expressed in pixels. The standard deviation is given between brackets.

Figure 2: Visual results for experiments 1 and 2, for the Voxelmorph architecture, for our method, and the atlas.

Atlas part 1   Atlas part 2   Score 0   Score 1   Score 2   Score 3
Brain          Brain          21        7         2         0
Embryo         Brain          10        14        5         1
Embryo         Embryo         8         14        5         3

Table 3: Performance on ultrasound data for different types of atlas, given as the number of test images per score. Scoring: 0: fail, 1: correct orthogonal directions, 2: brain and atlas overlap, 3: alignment.
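
To compare the atlas combinations at a glance, the counts in Table 3 can be tallied as in the sketch below, which checks that each row covers all 30 test images and counts how many images reached a score of 2 or higher. The dictionary layout and the choice of score 2 as threshold are illustrative assumptions.

```python
# Counts per score (0, 1, 2, 3), copied from Table 3.
scores = {
    ("Brain", "Brain"): [21, 7, 2, 0],
    ("Embryo", "Brain"): [10, 14, 5, 1],
    ("Embryo", "Embryo"): [8, 14, 5, 3],
}

# Number of scored test images per atlas combination.
totals = {atlases: sum(counts) for atlases, counts in scores.items()}

# Images in which the brain at least overlaps the atlas (score 2 or 3).
overlap_or_better = {atlases: sum(counts[2:]) for atlases, counts in scores.items()}
```

Under this tally, 2, 6, and 8 of the 30 test images reach a score of 2 or higher for the Brain/Brain, Embryo/Brain, and Embryo/Embryo combinations, respectively.
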
Figure 3: Same slice for: a) the ultrasound atlas, b) an example image after alignment with score 1, c) an example image after alignment with score 2, d, e) examples of successfully affinely aligned images with score 3. The red line indicates the correct boundaries of the brain after alignment.

6 Conclusion

In this work we extended existing deep learning methods for image registration to develop an atlas-based registration method to align and segment the embryonic brain. The main extensions are the dedicated part of the network for affine registration and the loss function, Eq. (2). For validation, synthetic 2D datasets containing a ground truth were used. These experiments showed that our method can deal with the wide variation in position and orientation and with simple objects in the background.

The final experiment, using real 3D ultrasound data acquired during the first trimester, showed that our method is not robust enough to align and segment the embryonic brain. The importance of the atlas was evaluated, and it turns out that using an atlas of the whole embryo improves results slightly, since it gives more information. This information is needed since the images are noisy, have artefacts, and the embryonic brain is small (on average only of the volume). Another drawback is that the ultrasound images were rescaled to one-fourth of the original size, and during registration the image is resampled twice. This makes the deformed image blurry, which influences the calculated loss function. The rescaling was done to speed up training.

Another way to speed up training is to train in two stages. The second part of the network, which learns the voxelwise registration, can only learn useful features when the images are already roughly aligned. Training the affine part of the network first is therefore more efficient, since the second part can then learn useful features for voxelwise alignment from the start. This will be explored in the future.

Finally, we aim to extend our method to be applicable to the entire first trimester, to enable spatio-temporal modeling of the embryonic brain. This extension can be made by training different networks for each period. Another natural extension is multi-atlas image segmentation [9], both for networks trained within a certain period to get more robust results, or with a set of atlases covering the whole first trimester.


References

  • [1] M. Abadi et al. (2016) TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv:1603.04467.
  • [2] G. Balakrishnan, A. Zhao, M. R. Sabuncu, J. Guttag, and A. V. Dalca (2019) VoxelMorph: A Learning Framework for Deformable Medical Image Registration. IEEE Transactions on Medical Imaging 38 (8), pp. 1788–1800.
  • [3] F. Chollet et al. (2015) Keras. https://
  • [4] B. D. de Vos et al. (2019) A deep learning framework for unsupervised affine and deformable image registration. Medical Image Analysis 52, pp. 128–143.
  • [5] B. Fischer and J. Modersitzki (2002) Fast diffusion registration. In AMS Contemporary Mathematics, Inverse Problems, Image Analysis and Medical Imaging, Vol. 313, pp. 117–129.
  • [6] G. H. Golub and C. Reinsch (1971) Singular value decomposition and least squares solutions. In Linear Algebra, pp. 134–151.
  • [7] A. Hering, B. van Ginneken, and S. Heldmann (2019) mlVIRNET: Multilevel Variational Image Registration Network. arXiv:1909.10084.
  • [8] Y. Hu et al. (2018) Weakly-supervised convolutional neural networks for multimodal image registration. Medical Image Analysis 49, pp. 1–13.
  • [9] J. E. Iglesias and M. R. Sabuncu (2015) Multi-atlas segmentation of biomedical images: a survey. Medical Image Analysis 24 (1), pp. 205–219.
  • [10] M. Kuklisova-Murgasova et al. (2013) Registration of 3D fetal neurosonography and MRI. Medical Image Analysis 17 (8), pp. 1137–1150.
  • [11] S. Liu et al. (2019) Deep Learning in Medical Ultrasound Analysis: A Review. Engineering 5 (2), pp. 261–275.
  • [12] A. I. L. Namburete et al. (2018) Fully-automated alignment of 3D fetal brain ultrasound to a canonical reference space using multi-task learning. Medical Image Analysis 46, pp. 1–14.
  • [13] M. Rousian et al. (2018) Virtual reality imaging techniques in the study of embryonic and early placental health. Placenta 64, pp. S29–S35.
  • [14] A. Schmidt-Richberg et al. (2017) Abdomen segmentation in 3D fetal ultrasound using CNN-powered deformable models. In Fetal, Infant and Ophthalmic Medical Image Analysis, pp. 52–61.
  • [15] L. A. Shepp and B. F. Logan (1974) The Fourier reconstruction of a head section. IEEE Transactions on Nuclear Science 21 (3), pp. 21–43.
  • [16] R. P. M. Steegers-Theunissen et al. (2016) Cohort profile: the Rotterdam Periconceptional Cohort (Predict study). International Journal of Epidemiology 45, pp. 374–381.
  • [17] X. Yang, R. Kwitt, M. Styner, and M. Niethammer (2017) Quicksilver: Fast predictive image registration – A deep learning approach. NeuroImage 158, pp. 378–396.