Joint Segmentation and Landmark Localization of Fetal Femur in Ultrasound Volumes

08/31/2019 ∙ by Xu Wang, et al. ∙ Shenzhen University

Volumetric ultrasound has great potential in promoting prenatal examinations. Automated solutions are highly desired to efficiently and effectively analyze the massive volumes. Segmentation and landmark localization are two key techniques for making quantitative evaluation of prenatal ultrasound volumes available in the clinic. However, both tasks are non-trivial given the poor image quality, boundary ambiguity and anatomical variations in volumetric ultrasound. In this paper, we propose an effective framework for simultaneous segmentation and landmark localization in prenatal ultrasound volumes. The proposed framework has two branches, through which informative cues of segmentation and landmark localization are propagated bidirectionally to benefit both tasks. As landmark localization tends to suffer from false positives, we propose a distance-based loss to suppress the noise and thus enhance the localization map and, in turn, the segmentation. Finally, we leverage an adversarial module to emphasize the correspondence between segmentation and landmark localization. Extensively validated on a volumetric ultrasound dataset of fetal femur, our proposed framework proves to be a promising solution to facilitate the interpretation of prenatal ultrasound volumes.





Characterized by real-time imaging, low cost and the absence of radiation, ultrasound imaging is the dominant modality in prenatal examinations. With a larger field of view and more freedom in acquisition, volumetric ultrasound has greater potential than 2D ultrasound for fetal and maternal health evaluation. However, manually conducting quantitative analysis of the massive volumes is time-consuming and challenging in the clinical workflow [Yang et al.2019].

Figure 1: Orthogonal slices of fetal femur in an ultrasound volume. Green curve denotes the segmentation ground truth, yellow dots denote two endpoints.

Segmentation and landmark localization are two key techniques for the automatic analysis of prenatal ultrasound volumes. Segmentation provides volumetric measurements of fetal and maternal anatomical structures, which may be more comprehensive and accurate than 2D measurements for fetal growth evaluation [Yang et al.2019]. Landmark localization provides detailed descriptions of anatomical poses and geometries, which are helpful for further applications such as standard plane detection [Li et al.2018] and atlas construction [Huang, Noble, and Namburete2018]. However, as the fetal femur in Fig. 1 illustrates, solving these two tasks is non-trivial. First, volumetric ultrasound often suffers from poor image quality, such as speckle noise, acoustic shadows and low resolution. The two tips of the fetal femur are particularly affected by these factors and are thus hard to localize. Second, boundary deficiency and ambiguity often arise from shadow occlusion and low contrast among tissues. Third, the varying pose, shape and size of anatomical structures make it hard for automatic algorithms to capture the appearance variation in ultrasound volumes.

Extensive research on segmentation and landmark localization in ultrasound volumes has been conducted. For segmentation, Feng et al. [Feng, Zhou, and Lee2012] constructed boundary traces to extract the fetal limb volume for weight estimation. Namburete et al. [Namburete et al.2015] built a B-spline surface model to parameterize the fetal skull volume. These shape models are robust against artifacts, but they have limited deformation ranges and depend on initialization. In [Yang et al.2019], a deep neural network based method was proposed for prenatal ultrasound volume segmentation. The results are promising, but the method is patch-based and loses global shape information. Cerrolaza et al. [Cerrolaza et al.2018] proposed to reconstruct the fetal skull with conditional generative networks. However, such reconstruction may be limited to rigid objects with little deformation. For landmark detection, Huang et al. [Huang, Xie, and Noble2018] localized several fetal brain structures by projecting the 3D problem into a 2D task and solving it with deep networks. In [Huang, Noble, and Namburete2018], Huang et al. further reformulated localization as a segmentation task. In [Namburete et al.2018], a branched deep network was proposed to segment the fetal skull and localize the fetal eyes to assist the alignment of the fetal brain. However, the dependence between segmentation and landmark localization was not fully exploited in that work.

In this paper, we propose an end-to-end deep neural network to simultaneously tackle the segmentation and endpoint localization of the fetal femur in prenatal ultrasound volumes. The volume and length of the fetal femur are of particular importance in fetal weight estimation [Feng, Zhou, and Lee2012]. The proposed framework uses two cross-connected network branches as its backbone, through which informative cues of segmentation and landmark localization are propagated bidirectionally to benefit each other. As femur landmark localization encounters more false positives than segmentation, we propose a distance-based loss to exclude unreasonable predictions and thus suppress noise in the localization map. We find that improving femur landmark localization in turn helps improve the segmentation. Finally, to emphasize the correspondence between segmentation and landmark localization, we leverage an adversarial module that serves as a constraint in a weakly supervised way. Extensive experiments demonstrate that our proposed framework is a promising solution for improving the efficiency and accuracy of fetal femur analysis in ultrasound volumes.


Our proposed framework is shown in Fig. 2. The fetal femur ROI is first detected in the whole ultrasound volume. The segmentation and landmark localization branches receive common ROI features extracted by the shared layers and then generate task-specific descriptors. Cross connections enable communication between the two branches. Our proposed distance loss is used to improve landmark localization. Finally, an adversarial discriminator is attached to emphasize the correspondence between the two branches.

Figure 2: Our framework for simultaneous segmentation (upper branch) and landmark localization (lower branch). The discriminator forces the segmentation and localization outputs to match each other. Digits denote the number of feature maps.

ROI Detection Network

To reduce the search area, we build a basic U-net [Ronneberger, Fischer, and Brox2015] (denoted as Unet-ROI) to localize the fetal femur ROI via segmentation. Limited by GPU memory, Unet-ROI is implemented in 2D (detailed layout in Fig. 3). Convolutional layers (Conv) use 3×3 kernels, and each Conv is followed by a batch normalization (BN) layer and a rectified linear unit (ReLU). The segmentation performance of Unet-ROI is reported in Table 1. For a testing volume, the 2D bounding boxes of the femur parts segmented in all slices are merged into a 3D bounding box. Our resulting bounding boxes achieve an average IoU of … in localizing the ground-truth femur regions. To completely cover the femur in testing volumes, the 3D bounding box is further enlarged by 30 voxels in each dimension to form the final femur ROI. The average size of the detected ROIs is 228×124×190 voxels.
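The box-merging step can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: it assumes the per-slice Unet-ROI predictions are stacked into one binary array, and it reads "30 voxels in each dimension" as a 30-voxel margin on each side of the box, clipped to the volume. The helper names `roi_from_slice_masks` and `box_iou` are hypothetical. Because the union of the per-slice 2D boxes spans the same extent as the min/max over all predicted foreground voxels, the sketch computes the merged box directly from voxel coordinates.

```python
import numpy as np


def roi_from_slice_masks(mask, margin=30):
    """Merge per-slice 2D femur masks into one enlarged 3D bounding box.

    mask:   binary array (Z, Y, X) stacking the 2D Unet-ROI predictions.
    margin: voxels added on each side of every axis (assumed reading).
    Returns (lo, hi) index tuples; hi is exclusive and both are clipped
    to the volume bounds.
    """
    coords = np.argwhere(mask > 0)  # (N, 3) indices of predicted femur voxels
    if coords.size == 0:
        raise ValueError("no femur voxels predicted")
    # The union of all per-slice 2D boxes spans the global min/max extent.
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + 1 + margin, mask.shape)
    return tuple(int(v) for v in lo), tuple(int(v) for v in hi)


def box_iou(box_a, box_b):
    """IoU of two axis-aligned 3D boxes given as (lo, hi) with exclusive hi."""
    (a_lo, a_hi), (b_lo, b_hi) = box_a, box_b
    inter_sz = np.clip(np.minimum(a_hi, b_hi) - np.maximum(a_lo, b_lo), 0, None)
    inter = float(np.prod(inter_sz))
    vol_a = float(np.prod(np.subtract(a_hi, a_lo)))
    vol_b = float(np.prod(np.subtract(b_hi, b_lo)))
    return inter / (vol_a + vol_b - inter)


# Example: a synthetic femur blob in a 100^3 volume.
mask = np.zeros((100, 100, 100), dtype=np.uint8)
mask[40:50, 20:30, 60:65] = 1
box = roi_from_slice_masks(mask)
print(box)  # ((10, 0, 30), (80, 60, 95))
```

The `box_iou` helper is the kind of function one would use to compare such boxes against ground-truth femur boxes.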

Figure 3: Network details of the ROI detector for fetal femur localization.

Cross-connected Network Architecture

Receiving the cropped ROI as input, our segmentation and landmark localization branches also follow the U-net design (Fig. 2). The encoder path is shared by both branches and provides basic feature hierarchies of the ROI. For the segmentation branch, two long skip connections are established between the encoder and decoder paths to enrich the semantic features and improve segmentation details. The skip connections merge features by concatenation. Since landmark localization is more difficult than segmentation, we deepen the encoding path of the localization branch to obtain a higher-level understanding of the input ROI. A skip connection is used in the localization branch to fuse features before decoding. We use 3×3×3 convolution kernels in all Conv layers.

The segmentation branch outputs probability maps of different classes, from which the final segmentation is derived, while the localization branch outputs an independent heatmap for each landmark (Fig. 2). The femur landmark location is defined as the point with the highest intensity in the heatmap. For segmentation, we adopt the hybrid loss ($\mathcal{L}_{seg}$) of [Yang et al.2017], which combines weighted cross entropy (wCross) and Dice Similarity Coefficient (DSC) based losses, i.e. $\mathcal{L}_{seg} = \mathcal{L}_{wCross} + \beta\,\mathcal{L}_{DSC}$, where the weight $\beta$ balances the scales of the two loss components. For landmark localization, we use the L2-norm regression loss ($\mathcal{L}_{loc}$), which is calculated between the predicted heatmaps ($\hat{H}$) and the Gaussian maps of the landmark labels ($H$).
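The localization targets and the two training losses described above can be sketched in NumPy as below. This is an illustrative sketch under stated assumptions: the Gaussian width `sigma`, the use of a mean (rather than summed) squared error for the L2 heatmap loss, the Dice term written as 1 minus the mean soft Dice over classes, and the balancing weight `beta` are hypothetical choices, since the text does not give them. Inputs use a channel-first layout `(C, ...)`.

```python
import numpy as np


def gaussian_heatmap(shape, center, sigma=3.0):
    """Gaussian target map H for one landmark, peak value 1 at `center`."""
    grids = np.meshgrid(*[np.arange(n) for n in shape], indexing="ij")
    sq_dist = sum((g - c) ** 2 for g, c in zip(grids, center))
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def landmark_from_heatmap(heatmap):
    """Landmark location = index of the voxel with the highest heatmap value."""
    return tuple(int(i) for i in np.unravel_index(np.argmax(heatmap), heatmap.shape))


def loc_loss(pred, target):
    """L2 regression loss between heatmaps (mean-squared form is an assumption)."""
    return float(np.mean((pred - target) ** 2))


def hybrid_seg_loss(probs, onehot, class_weights, beta=1.0, eps=1e-6):
    """wCross + beta * (1 - mean soft Dice); beta and the Dice form are assumed."""
    c = probs.shape[0]
    p = probs.reshape(c, -1)
    g = onehot.reshape(c, -1)
    w = np.asarray(class_weights, dtype=float)[:, None]
    # Weighted cross entropy, averaged over voxels.
    wcross = -np.mean(np.sum(w * g * np.log(p + eps), axis=0))
    # Soft Dice per class.
    inter = np.sum(p * g, axis=1)
    denom = np.sum(p, axis=1) + np.sum(g, axis=1)
    dice = (2.0 * inter + eps) / (denom + eps)
    return float(wcross + beta * (1.0 - dice.mean()))


target = gaussian_heatmap((9, 9, 9), center=(4, 5, 6))
print(landmark_from_heatmap(target))  # (4, 5, 6)
```

In a real training pipeline these would be written with the framework's tensor operations so that gradients flow; NumPy is used here only to keep the example self-contained.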

Segmentation and landmark localization are closely related tasks and should not be processed independently. Segmentation of anatomical structures clearly defines the region where landmarks may exist and thus discards many background candidates, while landmarks explicitly describe the topology of anatomical structures and add geometric constraints to segmentation. Sharing intermediate and complementary knowledge between the tasks can hence benefit both. In this regard, inspired by [Cheng et al.2017] and as shown in Fig. 2, we introduce bidirectional cross connections between the same semantic levels of the two branches. Specifically, two connections convey feature maps from the segmentation branch to the localization branch, and one connection conveys feature maps from the localization branch to the segmentation branch. As both branches are trained to minimize their losses, the communicated feature maps regularize the opposite branch and thus improve both.

Localization Constraint with Center Distance Loss

Because of the poor image quality of ultrasound volumes, localizing anatomical landmarks is non-trivial and suffers from many false positives. The task becomes even more challenging for the fetal femur, since its two endpoints are very hard to differentiate from each other, especially when context information is lacking. For landmark predictions, severe overlap often occurs between the heatmaps of the two landmarks. This overlap can degrade the prediction in each channel and thus cause false positives, missed detections or two overlapping landmarks. Therefore, reducing the heatmap overlap while regularizing the distance between landmarks within a reasonable range is a straightforward way to improve localization. Based on these observations, we introduce a Center Distance (CD) based loss ($\mathcal{L}_{CD}$) as an extra constraint on landmark localization. $\mathcal{L}_{CD}$ is defined in Eq. 1 in terms of $\Phi(\hat{H}_1)$ and $\Phi(\hat{H}_2)$, where $\hat{H}_i$ is the predicted heatmap of landmark $i$ and $\Phi(\cdot)$ is an operator that returns the coordinates of the voxel with the highest value in a heatmap. Minimizing $\mathcal{L}_{CD}$ enlarges the distance between the two heatmap peaks and hence reduces the heatmap overlap. The loss function used to train our branched network for both tasks combines $\mathcal{L}_{seg}$, $\mathcal{L}_{loc}$ and $\mathcal{L}_{CD}$ as a weighted sum (Eq. 2), whose three weights are empirically set to 1.0, 0.2 and 0.5, respectively, to weight the terms of the two branches.
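Because Eq. 1 itself is not reproduced here, the sketch below assumes one simple form consistent with the description: an inverse Euclidean distance between the two heatmap peaks, so that minimizing it pushes the peaks apart. The form, `eps`, and the function names are assumptions, not the authors' definition; likewise, assigning the weights 1.0, 0.2 and 0.5 to the segmentation, localization and CD losses in that order is assumed. A hard argmax has no useful gradient, so a trainable version would need a differentiable surrogate; the softmax-weighted coordinate (soft-argmax) included here is one common choice, and the text does not say how this is handled.

```python
import numpy as np


def hard_peak(heatmap):
    """Phi(H): coordinates of the highest-valued voxel (not differentiable)."""
    return np.array(np.unravel_index(np.argmax(heatmap), heatmap.shape), dtype=float)


def soft_peak(heatmap, temperature=10.0):
    """Differentiable surrogate for Phi: softmax-weighted mean coordinate."""
    w = np.exp(temperature * (heatmap - heatmap.max()))  # stabilised softmax
    w /= w.sum()
    grids = np.meshgrid(*[np.arange(n) for n in heatmap.shape], indexing="ij")
    return np.array([np.sum(w * g) for g in grids])


def center_distance_loss(h1, h2, peak=hard_peak, eps=1e-6):
    """ASSUMED form of L_CD: inverse Euclidean distance between the two peaks."""
    return 1.0 / (np.linalg.norm(peak(h1) - peak(h2)) + eps)


def total_loss(l_seg, l_loc, l_cd, weights=(1.0, 0.2, 0.5)):
    """Weighted sum; the mapping of 1.0/0.2/0.5 onto the three terms is assumed."""
    w_seg, w_loc, w_cd = weights
    return w_seg * l_seg + w_loc * l_loc + w_cd * l_cd


h1 = np.zeros((5, 5, 5)); h1[0, 0, 0] = 1.0
h2 = np.zeros((5, 5, 5)); h2[3, 4, 0] = 1.0
print(round(center_distance_loss(h1, h2), 4))  # peaks 5 voxels apart -> 0.2
```

Note that a pure inverse distance only penalizes peaks that are too close; keeping the distance within an upper bound, as the text also mentions, would need an additional term.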


Refinement with Adversarial Module

To further emphasize the correspondence between the two tasks and force their outputs to match each other, we extend the branched network into an adversarial scheme for refinement. The core of the adversarial training scheme is to push the generator to produce outputs that can fool the discriminator, while the discriminator classifies its input as real or fake [Kazeminia et al.2018]. In our setting (Fig. 2), the branched network is the generator, while a 6-layer convolutional network serves as the discriminator ($D$). For the discriminator, the kernel size is 4 in all Conv layers; the stride is 2 in the first two Conv layers and 1 in the rest. To handle the varying dimensions of the input femur ROI, a global average pooling layer is used before the fully connected layer to unify the feature dimension. The generator outputs the concatenation of the fetal femur segmentation and landmark heatmaps and feeds it to $D$ as a fake pair. The discriminator then classifies the fake pair against the real pair formed by the segmentation and landmark ground-truth labels. Eq. 3 defines our adversarial loss ($\mathcal{L}_{adv}$). To minimize $\mathcal{L}_{adv}$, the branched network has to make both its segmentation and its landmark localization match the ground-truth labels.
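The data flow into the discriminator can be illustrated with the NumPy sketch below. It shows the channel-wise concatenation of segmentation and heatmaps into a pair, and why global average pooling yields a fixed-length vector for ROIs of any size; the convolutional layers of the discriminator are omitted, so pooling is applied directly to the pair purely to show the shape behaviour. Because Eq. 3 is not reproduced here, the losses assume the standard non-saturating GAN formulation with binary cross-entropy on discriminator logits. All function names are hypothetical.

```python
import numpy as np


def make_pair(seg, heatmaps):
    """Concatenate segmentation (C_s, Z, Y, X) and landmark heatmaps (2, Z, Y, X)
    along the channel axis to form one discriminator input."""
    return np.concatenate([seg, heatmaps], axis=0)


def global_average_pool(features):
    """(C, Z, Y, X) -> (C,): the length no longer depends on the ROI size."""
    return features.reshape(features.shape[0], -1).mean(axis=1)


def bce_with_logit(logit, label):
    """Numerically stable binary cross-entropy for one logit."""
    return float(max(logit, 0.0) - logit * label + np.log1p(np.exp(-abs(logit))))


def adversarial_losses(d_real_logit, d_fake_logit):
    """ASSUMED standard GAN objective (Eq. 3 is not reproduced here)."""
    d_loss = bce_with_logit(d_real_logit, 1.0) + bce_with_logit(d_fake_logit, 0.0)
    g_loss = bce_with_logit(d_fake_logit, 1.0)  # generator wants D to call fakes real
    return d_loss, g_loss


# Two ROIs of different sizes give pooled vectors of equal length.
small = make_pair(np.zeros((2, 8, 6, 7)), np.zeros((2, 8, 6, 7)))
large = make_pair(np.zeros((2, 20, 12, 16)), np.zeros((2, 20, 12, 16)))
print(global_average_pool(small).shape, global_average_pool(large).shape)  # (4,) (4,)
```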


Since the predictions of the branched network at early epochs are rough and can be easily rejected by $D$, directly training the branched network from scratch under the adversarial scheme would adversely affect performance. Thus, we first train the branched network with the loss of Eq. 2 to convergence for 16 epochs, and then fine-tune it with $D$ under the composite loss of Eq. 2 and $\mathcal{L}_{adv}$ for another 5 epochs.
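The two-stage schedule can be written as a small helper. This is a hypothetical sketch with 0-based epoch indices; only the epoch counts (16 and 5) come from the text.

```python
def training_phase(epoch, pretrain_epochs=16, finetune_epochs=5):
    """Objective active at a 0-based epoch index in the two-stage schedule."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    if epoch < pretrain_epochs:
        return "branched"      # loss of Eq. 2 only; no discriminator yet
    if epoch < pretrain_epochs + finetune_epochs:
        return "adversarial"   # Eq. 2 loss + adversarial loss, discriminator attached
    return "done"


print([training_phase(e) for e in (0, 15, 16, 20, 21)])
# ['branched', 'branched', 'adversarial', 'adversarial', 'done']
```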

Metric     Unet-ROI    Unet-S      BRN         BRND        BRNDC       BRNDC-D
DSC[%]     87.30±7.1   89.41±4.6   90.03±3.5   90.11±3.5   90.76±3.1   91.03±3.0
Jacc[%]    78.07±9.8   81.14±7.1   82.04±5.5   82.18±5.6   83.23±5.0   83.67±5.0
Adb[mm]    1.59±1.4    0.84±0.7    0.77±0.5    0.78±0.6    0.68±0.5    0.66±0.5
Hdb[mm]    9.05±7.4    4.98±3.3    4.69±3.0    5.04±3.4    4.27±2.6    4.08±2.7
Verr[mL]   5.52±2.3    3.04±1.7    2.28±1.2    1.89±1.3    1.79±1.3    1.60±1.1
Table 1: Comparison of fetal femur segmentation methods (mean±std)

Metric     Unet-L      BRN         BRND        BRNDC       BRNDC-D
p1[mm]     4.32±3.02   4.26±2.64   4.19±2.22   3.92±2.64   3.70±1.93
p2[mm]     4.57±2.13   4.46±2.88   4.05±2.60   4.28±2.78   4.26±2.53
Lerr[mm]   1.05±1.15   0.96±1.06   0.89±0.99   1.04±1.05   0.87±0.91
Table 2: Comparison of landmark localization and length errors (mean±std)

Experimental Results

Our method is validated on fetal femur ultrasound volumes. The dataset contains 50 annotated volumes of size 416×416×284 with a voxel size of 0.38×0.38×0.38 mm³. Approved by the local Institutional Review Board, all volumes were anonymized and acquired by an experienced sonographer using a Sonoscope S50 ultrasound machine with an integrated 3D probe. The probe's scan angle ensures complete scanning of the whole femur. Varying femur poses are allowed in scanning. The dataset covers gestational ages from 23 to 31 weeks. An expert with 5 years of experience manually delineated all volumes and annotated the two femur tips as ground truth for segmentation and localization. Thirty volumes are randomly selected for training and the remaining 20 for testing. The training set is further augmented to 840 volumes with scaling, rotation and flipping. Limited by GPU memory, the femur ROI is rescaled by a factor of 0.65. There is no pre-alignment of ROIs. Training and testing were run on an NVIDIA GeForce GTX TITAN X GPU (12GB).
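The data split and ROI rescaling can be sketched as follows. The random seed, the use of NumPy's `default_rng`, and rounding rescaled sizes to the nearest whole voxel are assumptions for illustration; the text only specifies a random 30/20 split of 50 volumes and a 0.65 rescaling factor. The function names are hypothetical.

```python
import numpy as np


def split_cases(n_cases=50, n_train=30, seed=0):
    """Random train/test split of case indices (the seed is a hypothetical choice)."""
    perm = np.random.default_rng(seed).permutation(n_cases)
    return sorted(perm[:n_train].tolist()), sorted(perm[n_train:].tolist())


def rescaled_shape(shape, factor=0.65):
    """ROI shape after isotropic rescaling, rounded to whole voxels (assumed)."""
    return tuple(int(round(s * factor)) for s in shape)


train_ids, test_ids = split_cases()
print(len(train_ids), len(test_ids))  # 30 20
```

Applied to the average detected ROI of 228×124×190 voxels, `rescaled_shape` gives 148×81×124 voxels.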

Segmentation evaluation criteria include the Dice Similarity Coefficient (DSC, %), Jaccard index (Jacc, %), Average Distance of Boundaries (Adb, mm), Hausdorff Distance of Boundaries (Hdb, mm) and absolute volume error (Verr, mL). Absolute displacement (mm) and absolute femur length error (Lerr, mm) are used to evaluate the localization of the femur tips p1 and p2. The fetal femur length is defined as the Euclidean distance between p1 and p2. Our basic branched network is denoted as BRN. The version with the distance constraint is BRND, and the version further equipped with cross connections is BRNDC. BRNDC trained under the adversarial scheme is denoted as BRNDC-D. Additionally, for fair comparison, we implemented two basic U-nets as baselines. Unet-S is for segmentation and has the same encoder, decoder and skip connections as the segmentation branch of BRN. Unet-L is for localization and has the same encoder, decoder and skip connections as the localization branch of BRN. We implemented all compared methods in 3D with TensorFlow.
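The evaluation criteria can be sketched in NumPy as below. Masks are binary 3D arrays and `spacing` is the voxel size in mm (0.38 mm per axis for this dataset). Boundary voxels are taken as foreground voxels with at least one 6-connected background neighbour, and Adb is computed as the mean of the two directed average boundary distances; these are common conventions but assumptions here, since the exact definitions are not given. The brute-force distance matrix is only suitable for small masks, and all function names are hypothetical.

```python
import numpy as np


def dice(a, b):
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())


def jaccard(a, b):
    a, b = a.astype(bool), b.astype(bool)
    return np.logical_and(a, b).sum() / np.logical_or(a, b).sum()


def boundary_points(mask, spacing):
    """Physical (mm) coordinates of foreground voxels with a background 6-neighbour."""
    padded = np.pad(mask.astype(bool), 1)
    core = padded[1:-1, 1:-1, 1:-1]
    interior = core.copy()
    for axis in range(3):
        for shift in (1, -1):
            interior &= np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
    return np.argwhere(core & ~interior) * np.asarray(spacing, dtype=float)


def boundary_distances(a, b, spacing):
    """(Adb, Hdb) in mm; brute-force pairwise distances, fine for small masks only."""
    pa, pb = boundary_points(a, spacing), boundary_points(b, spacing)
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    d_ab, d_ba = d.min(axis=1), d.min(axis=0)
    return (d_ab.mean() + d_ba.mean()) / 2.0, max(d_ab.max(), d_ba.max())


def volume_error_ml(a, b, spacing):
    """Absolute volume difference in mL (1 mL = 1000 mm^3)."""
    voxel_ml = float(np.prod(spacing)) / 1000.0
    return abs(int(a.astype(bool).sum()) - int(b.astype(bool).sum())) * voxel_ml


def femur_length(p1, p2, spacing):
    """Euclidean distance in mm between two voxel-index landmarks."""
    diff = (np.asarray(p1, float) - np.asarray(p2, float)) * np.asarray(spacing, float)
    return float(np.linalg.norm(diff))


a = np.zeros((9, 9, 9), dtype=np.uint8); a[2:7, 2:7, 2:7] = 1
b = np.zeros_like(a); b[2:7, 2:7, 3:8] = 1  # same cube shifted by one voxel
print(dice(a, b), jaccard(a, b), boundary_distances(a, b, (1.0, 1.0, 1.0)))
```

Under these conventions, the landmark displacement is `femur_length(pred, gt, spacing)` for a single tip, and Lerr is the absolute difference between the predicted and ground-truth femur lengths.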

Figure 4: Bland-Altman plots of the agreement between BRNDC-D and expert annotations on femur volume measurement (a) and femur length measurement (b).
Figure 5: Visualizations of fetal femur segmentation and landmark localization on 6 testing cases. Results from BRNDC-D are compared with the ground truth with respect to segmentation, localization and length measurement. Digits denote femur length; AG denotes algorithm BRNDC-D and GT the ground truth.

Quantitative comparisons among methods are shown in Tables 1 and 2. For segmentation, since it is implemented on whole 2D slices, Unet-ROI yields the least accurate femur segmentation among all methods. Our proposed branched architecture and modules successively improve segmentation over the baseline Unet-S. The largest DSC improvements occur when the branch design (BRN, 0.62%) and cross connections (BRNDC, 0.65%) are introduced. Adversarial training contributes another 0.3% improvement in DSC. The distance constraint (BRND) slightly improves DSC and reduces the volume measurement error by about 13%. BRNDC-D finally reduces the volume measurement error by about 50%. For landmark localization, both the branch architecture (BRN) and the distance constraint (BRND) improve the localization of the two landmarks over Unet-L, especially p2. This demonstrates the effectiveness of the distance constraint, which may be extended to other cases such as [Huang, Noble, and Namburete2018]. Although cross connections (BRNDC) improve p1 localization and femur segmentation, they cause a slight performance drop in localizing p2. Localization of p2 may be interfered with by the segmentation branch via the cross connections. By enforcing the correspondence between the two branch outputs, the adversarial module (BRNDC-D) alleviates this problem and achieves a better balance between the segmentation and localization branches. BRNDC-D achieves the best results in segmentation and localization on almost all metrics.
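Some of the improvements quoted above can be recomputed from Table 1. The snippet below assumes the column order of Table 1 is Unet-ROI, Unet-S, BRN, BRND, BRNDC, BRNDC-D (inferred from the differences quoted in the text) and computes the DSC gains in percentage points and the overall relative reduction of Verr from Unet-S to BRNDC-D.

```python
# Table 1 means only; the column order is inferred from the text.
dsc = {"Unet-S": 89.41, "BRN": 90.03, "BRND": 90.11, "BRNDC": 90.76, "BRNDC-D": 91.03}
verr = {"Unet-S": 3.04, "BRN": 2.28, "BRND": 1.89, "BRNDC": 1.79, "BRNDC-D": 1.60}


def gain(before, after):
    """DSC improvement in percentage points, rounded to two decimals."""
    return round(dsc[after] - dsc[before], 2)


print(gain("Unet-S", "BRN"), gain("BRND", "BRNDC"), gain("BRNDC", "BRNDC-D"))  # 0.62 0.65 0.27
print(f"{1 - verr['BRNDC-D'] / verr['Unet-S']:.0%}")  # 47%
```

The 47% reduction is what the text rounds to "about 50%", and the 0.27 gain is the "another 0.3%" attributed to adversarial training.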

Fig. 4 shows Bland-Altman plots evaluating the agreement between BRNDC-D and the expert on fetal femur volume and femur length measurements. As observed, high agreement is achieved in both plots, since 95% of the measurements lie within the ±1.96 standard deviation limits of agreement. Visualizations of segmentation and landmark localization produced by BRNDC-D (Fig. 5) also show good alignment with the ground truth.
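A Bland-Altman analysis like that in Fig. 4 reduces to a few statistics of the paired differences. The sketch below is a generic NumPy version with hypothetical function and key names. It computes the bias (mean difference), the limits of agreement at bias ± 1.96 SD (sample SD, `ddof=1`), and the fraction of cases that fall within them.

```python
import numpy as np


def bland_altman(auto, manual, k=1.96):
    """Bias, limits of agreement (bias +/- k*SD) and fraction of cases inside them."""
    auto = np.asarray(auto, dtype=float)
    manual = np.asarray(manual, dtype=float)
    diff = auto - manual                 # y-axis of the plot
    mean = (auto + manual) / 2.0         # x-axis of the plot
    bias = diff.mean()
    sd = diff.std(ddof=1)                # sample standard deviation
    lower, upper = bias - k * sd, bias + k * sd
    inside = np.mean((diff >= lower) & (diff <= upper))
    return {"mean": mean, "diff": diff, "bias": float(bias),
            "limits": (float(lower), float(upper)), "fraction_inside": float(inside)}


# Hypothetical femur lengths (mm) from an algorithm and an expert.
stats = bland_altman([10.0, 12.0, 11.0, 13.0], [10.0, 11.0, 11.0, 12.0])
print(stats["bias"], stats["fraction_inside"])  # 0.5 1.0
```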


In this paper, we present an effective framework for simultaneous segmentation and landmark localization in fetal femur ultrasound volumes. Promising segmentation and localization accuracy is achieved on these challenging tasks. A branched network provides a good starting point for handling the two tasks. Informative cues of segmentation and landmark localization are propagated bidirectionally through cross connections to benefit each other. The proposed distance-based loss and adversarial training scheme suppress false positives and enhance both localization and segmentation. Our framework is general and has the potential to be extended to similar tasks in volumetric ultrasound.


We would like to acknowledge all the volunteers who participated in this research and the expert for the great effort made. This work was supported in part by the National Natural Science Foundation of China under Grant 61571304, Grant 81571758, and Grant 61501305, in part by the National Key Research and Development Program of China under Grant 2016YFC0104703, and by a grant from the Hong Kong Research Grants Council under the General Research Fund (Project No. 14225616).


  • [Cerrolaza et al.2018] Cerrolaza, J. J.; Li, Y.; Biffi, C.; Gomez, A.; Sinclair, M.; Matthew, J.; et al. 2018. 3D fetal skull reconstruction from 2DUS via deep conditional generative networks. In MICCAI, 383–391. Springer.
  • [Cheng et al.2017] Cheng, J.; Tsai, Y.-H.; Wang, S.; and Yang, M.-H. 2017. SegFlow: Joint learning for video object segmentation and optical flow. In ICCV, 686–695. IEEE.
  • [Feng, Zhou, and Lee2012] Feng, S.; Zhou, K. S.; and Lee, W. 2012. Automatic fetal weight estimation using 3D ultrasonography. In Medical Imaging 2012: Computer-Aided Diagnosis, volume 8315, 83150I. International Society for Optics and Photonics.
  • [Huang, Noble, and Namburete2018] Huang, R.; Noble, J. A.; and Namburete, A. I. 2018. Omni-supervised learning: Scaling up to large unlabelled medical datasets. In MICCAI, 572–580. Springer.
  • [Huang, Xie, and Noble2018] Huang, R.; Xie, W.; and Noble, J. A. 2018. VP-Nets: Efficient automatic localization of key brain structures in 3D fetal neurosonography. Medical Image Analysis 47:127–139.
  • [Kazeminia et al.2018] Kazeminia, S.; Baur, C.; Kuijper, A.; van Ginneken, B.; Navab, N.; Albarqouni, S.; et al. 2018. GANs for medical image analysis. arXiv preprint arXiv:1809.06222.
  • [Li et al.2018] Li, Y.; Khanal, B.; Hou, B.; Alansary, A.; Cerrolaza, J. J.; Sinclair, M.; et al. 2018. Standard plane detection in 3D fetal ultrasound using an iterative transformation network. arXiv preprint arXiv:1806.07486.
  • [Namburete et al.2015] Namburete, A. I.; Stebbing, R. V.; Kemp, B.; Yaqub, M.; Papageorghiou, A. T.; and Noble, J. A. 2015. Learning-based prediction of gestational age from ultrasound images of the fetal brain. Medical Image Analysis 21(1):72–86.
  • [Namburete et al.2018] Namburete, A. I.; Xie, W.; Yaqub, M.; Zisserman, A.; and Noble, J. A. 2018. Fully-automated alignment of 3D fetal brain ultrasound to a canonical reference space using multi-task learning. Medical Image Analysis 46:1–14.
  • [Ronneberger, Fischer, and Brox2015] Ronneberger, O.; Fischer, P.; and Brox, T. 2015. U-net: Convolutional networks for biomedical image segmentation. In MICCAI, 234–241. Springer.
  • [Yang et al.2017] Yang, X.; Bian, C.; Yu, L.; Ni, D.; and Heng, P.-A. 2017. Hybrid loss guided convolutional networks for whole heart parsing. In STACOM, 215–223. Springer.
  • [Yang et al.2019] Yang, X.; Yu, L.; Li, S.; Wen, H.; Luo, D.; Bian, C.; Qin, J.; Ni, D.; and Heng, P.-A. 2019. Towards automated semantic segmentation in prenatal volumetric ultrasound. IEEE Transactions on Medical Imaging 38(1):180–193.