Cardiovascular diseases are the leading cause of mortality and morbidity globally. An echocardiogram is a non-invasive ultrasound imaging test that examines the heart in motion. Because of its relatively low cost and ease of access, it has become a routinely used diagnostic tool in cardiology. An echocardiogram provides a wealth of information. Besides assessing the geometry and pumping capacity of the heart, echocardiograms have been used to evaluate global and regional myocardial contractility using speckle tracking techniques.
The use of contrast agents has been proposed to further improve the imaging quality of echocardiograms. The advantages of using contrast include 1) a sharper demarcation between the left ventricular cavity and the myocardium, and 2) easier detection of sludge and thrombi in the heart. In addition, a high-power ultrasound beam produces a swirling pattern in the contrast-enhanced blood pool, which facilitates the visualization of the ventricular flow vortex. Unlike the conventional color Doppler technique, which estimates the flow velocity along a specific angle, vortex imaging in contrast echocardiograms enables flow visualization and quantification in 2D; the flow vortex is an important hemodynamic indicator of the heart’s pumping performance and efficiency. In this work, we aim to achieve dense tracking of the myocardium in non-contrast echocardiograms and of the blood flow in contrast echocardiograms, in an effort to make the echocardiogram a one-stop shop for both myocardial contractility assessment and ventricular flow pattern analysis.
Many existing deformable registration methods can be used for dense tracking in echocardiograms. Traditional registration methods optimize directly in the space of registration fields; examples include elastic-type models [5, 18], FFD, Demons and statistical parametric mapping. Diffeomorphic transformations preserve topology, and many methods are derived from them, such as LDDMM and SyN. Optimizing these traditional methods typically requires substantial time. Deep learning based registration methods usually rely on ground-truth registration fields [15, 19]. Recent unsupervised deep learning based registration methods, such as VoxelMorph, are facilitated by the spatial transformer network [21, 12], and VoxelMorph has been further extended to diffeomorphic transformations and a Bayesian framework. The adversarial similarity network adds an extra discriminator and uses adversarial training to improve unsupervised registration. These purely learning based methods cannot be directly applied to ultrasound images for velocity estimation, especially for vortex detection in cardiac blood flow, because of the heavy noise in echocardiograms, the large velocity variations of cardiac blood flow, and the large amounts of blood leaving and entering the ultrasound plane.
Inspired by the improvements brought by multi-scale registration and neural-network-parameterized optimization, we propose a multi-scale framework based on neural self-supervised optimization for dense tracking in echocardiograms, as illustrated in Fig. 1. The neural self-supervised optimization yields accurate velocity field estimation by eliminating the gap between the estimation on the training set and that on the test set, and by alleviating optimization difficulties such as local minima. The multi-scale strategy provides a sequential optimization pathway for flow tracking that naturally emulates the formation of the fractal patterns of turbulent flow.
2 Neural Multi-Scale Self-Supervised Registration
We denote an ultrasound sequence by , where is the frame of , and , and are image height, width and the number of channels respectively. In this work, we focus on calculating the registration field as velocity field estimation which can be used by dense tracking for the two neighbor frames and . We employ U-Net with skip connections to obtain the velocity field . The framework of neural multi-scale self-supervised registration (NMSR) is illustrated in Fig. 1.
Because the velocity of cardiac blood varies greatly, we extend the neural self-supervised registration to a multi-scale framework. We construct multi-scale U-Nets with scales to parameterize the registration fields of these scales where is the coarsest scale. In the registration of each scale , we try to obtain the registration field by which we transform the reconstructed frame from the last scale to the frame , where and is a spatial transformer network . If is the coarsest scale , . Given an ultrasound sequence , we firstly resize the moving image and the fixed image to the current scale for data preparation as
where is the scaled reconstructed frame and is the down-sampling.
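As a rough illustration of the resizing step, the sketch below builds coarse-to-fine versions of a frame pair. The function names `downsample` and `build_pyramid`, and the use of average pooling as the down-sampling operator, are assumptions made for illustration; the text does not state which resizing operator is used. In the method itself the moving image at finer scales is the frame reconstructed at the previous scale, whereas this sketch only shows the resizing.

```python
import numpy as np

def downsample(frame: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool a 2D frame by an integer factor (assumed resizing operator)."""
    if factor == 1:
        return frame.copy()
    h, w = frame.shape
    # Drop ragged border rows/columns so the frame tiles evenly into blocks.
    h_crop, w_crop = h - h % factor, w - w % factor
    blocks = frame[:h_crop, :w_crop].reshape(
        h_crop // factor, factor, w_crop // factor, factor
    )
    return blocks.mean(axis=(1, 3))

def build_pyramid(moving, fixed, factors=(8, 4, 2, 1)):
    """Return (moving, fixed) pairs ordered from the coarsest to the finest scale."""
    return [(downsample(moving, f), downsample(fixed, f)) for f in factors]

# Toy frames standing in for two neighboring ultrasound frames.
rng = np.random.default_rng(0)
moving, fixed = rng.random((64, 96)), rng.random((64, 96))
pyramid = build_pyramid(moving, fixed)
shapes = [m.shape for m, _ in pyramid]  # scales 1/8, 1/4, 1/2 and 1
```

Ordering the list from coarse to fine mirrors the sequential optimization described above, where each scale starts from the result of the previous one.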
In addition to the large variation in cardiac blood flow velocity, the signal-to-noise ratio of ultrasound images is typically low, which easily leads to inaccurate prediction of the registration field. This aggravates the difficulty of learning and inference in the multi-scale registration model: errors can accumulate over scales, and, as we observed empirically in our experiments, noisy errors in the high-dimensional output space make learning even harder than learning in the ultrasound space. We propose neural self-supervised optimization based on one ultrasound sequence, considering that optimization based methods sidestep the generalization gap of purely learning based models, and that the settings and motions are consistent within one ultrasound sequence. We use stochastic optimization for the neural parameterized registration to alleviate optimization difficulties, such as local minima, in traditional registration methods. The neural self-supervised optimization for the current scale and ultrasound sequence can be formulated as
where is the neural network with parameters , is the reconstruction loss to measure the similarity between the moving image and fixed image, is the smoothness loss for the registration field, is the trade-off between reconstruction loss and smoothness loss.
We use negative local cross-correlation loss as reconstruction loss and norm of registration field gradient as the smoothness loss. For clarity, we omit in the reconstruction loss and write the two losses as
where is the pixel position in the frame and is the pixel position within a square with the center as , and are local means of pixel position in and respectively.
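To make the two loss terms concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the squared form of local cross-correlation that is common in unsupervised registration, a mean squared finite-difference gradient as the smoothness term, and windows restricted to those fully inside the image. The names `local_cc`, `smoothness` and `total_loss` are hypothetical; the radius of 6 and the weight of 10 are the values reported in the experiments.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_cc(a, b, radius=6, eps=1e-8):
    """Mean squared local cross-correlation over all fully contained windows."""
    k = 2 * radius + 1
    wa = sliding_window_view(a, (k, k))  # shape (H-k+1, W-k+1, k, k)
    wb = sliding_window_view(b, (k, k))
    da = wa - wa.mean(axis=(2, 3), keepdims=True)  # subtract local means
    db = wb - wb.mean(axis=(2, 3), keepdims=True)
    cross = (da * db).sum(axis=(2, 3))
    var_a = (da ** 2).sum(axis=(2, 3))
    var_b = (db ** 2).sum(axis=(2, 3))
    return float(np.mean(cross ** 2 / (var_a * var_b + eps)))

def smoothness(field):
    """Mean squared finite-difference gradient of an (H, W, 2) field."""
    dy = np.diff(field, axis=0)
    dx = np.diff(field, axis=1)
    return float((dy ** 2).mean() + (dx ** 2).mean())

def total_loss(warped, fixed, field, weight=10.0, radius=6):
    """Negative local cross-correlation plus weighted smoothness."""
    return -local_cc(warped, fixed, radius) + weight * smoothness(field)

# A perfectly reconstructed frame with a zero field gives a loss close to -1.
rng = np.random.default_rng(0)
frame = rng.random((32, 32))
loss = total_loss(frame, frame, np.zeros((32, 32, 2)))
```

Because the local means are subtracted and the score is normalized by the local variances, the measure is insensitive to local brightness and contrast changes, which is one reason it is attractive for ultrasound intensities.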
After the neural self-supervised optimization, we obtain the optimal parameters of the neural network and can calculate the velocity field for the reconstructed frame from the last scale . We calculate the registration field for the ultrasound frame by combining the registration field of the last scale and the intermediate field . For each pixel position in the ultrasound frame , we can obtain the final position and the combined registration field by
is the linear interpolation and we use linear interpolation to calculate the field for.
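One way to picture the combination of two fields is sketched below. It assumes a backward-warping convention, in which the output at a pixel is the input sampled at that pixel plus its displacement, and it assumes both fields already live on the same grid; neither assumption is confirmed by the text. The names `bilinear_sample`, `warp` and `compose` are hypothetical. Linear interpolation is what allows the previous field to be read at non-integer positions.

```python
import numpy as np

def bilinear_sample(img, ys, xs):
    """Sample img (H, W) or (H, W, C) at float coordinates, clamped to the border."""
    h, w = img.shape[:2]
    ys = np.clip(ys, 0, h - 1)
    xs = np.clip(xs, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy, wx = ys - y0, xs - x0
    if img.ndim == 3:  # broadcast the weights over the channel axis
        wy, wx = wy[..., None], wx[..., None]
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bottom = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bottom

def _grid(h, w):
    return np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

def warp(img, field):
    """Backward warp: out(p) = img(p + field(p)); field[..., 0] is dy, [..., 1] is dx."""
    gy, gx = _grid(*img.shape[:2])
    return bilinear_sample(img, gy + field[..., 0], gx + field[..., 1])

def compose(prev_field, cur_field):
    """Field whose single warp matches warping by prev_field, then by cur_field."""
    gy, gx = _grid(*cur_field.shape[:2])
    # Read the previous field at the positions the current field points to.
    prev_at_new = bilinear_sample(
        prev_field, gy + cur_field[..., 0], gx + cur_field[..., 1]
    )
    return cur_field + prev_at_new
```

For example, composing a uniform one-pixel shift with a uniform two-pixel shift along the same axis yields a uniform three-pixel shift, and warping once with the composed field matches warping twice in sequence.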
We collected echocardiograms from 19 patients with 3,052 frames in total for myocardial tracking, and contrast echocardiograms from 71 patients with 11,462 frames in total for cardiac blood tracking. For testing, we randomly chose three patients’ echocardiograms with 291 frames for myocardial tracking, and three patients’ echocardiograms with 216 frames for cardiac blood tracking, from the two datasets. The remaining echocardiograms are used for training. None of the echocardiograms has ground-truth registration fields.
We use only the first channel of the echocardiography images (i.e., they are treated as gray-scale images), with the pixel values normalized to be in . For cardiac blood tracking, we extract the cardiac blood region by 1) creating masks of the left ventricular blood pool at the end of systole and the end of diastole, 2) using an active contour model to fit 100 spline points, uniformly sampled along a circle, to the boundary of the cardiac blood mask, 3) using linear interpolation to obtain 100 interpolated spline points for each frame, and 4) using radial basis function interpolation to obtain the final smooth cardiac blood boundary from the 100 spline points. Removing the myocardial region is crucial to cardiac blood tracking.
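Step 3 above can be sketched as follows. The sketch assumes that the 100 points of the two key-frame contours correspond by index, and it uses two concentric circles as stand-ins for the contours that the active contour model would produce; `circle_points` and `interpolate_contours` are hypothetical helper names.

```python
import numpy as np

def circle_points(center, radius, n=100):
    """n uniformly spaced points on a circle, returned as an (n, 2) array of (y, x)."""
    theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    ys = center[0] + radius * np.sin(theta)
    xs = center[1] + radius * np.cos(theta)
    return np.stack([ys, xs], axis=1)

def interpolate_contours(pts_start, pts_end, n_frames):
    """Linearly blend corresponding points from the first to the last key frame."""
    alphas = np.linspace(0.0, 1.0, n_frames)[:, None, None]
    return (1.0 - alphas) * pts_start[None] + alphas * pts_end[None]

# Stand-in contours: a small blood pool at one key frame, a larger one at the other.
small_pool = circle_points((50.0, 60.0), 10.0)
large_pool = circle_points((50.0, 60.0), 20.0)
contours = interpolate_contours(small_pool, large_pool, n_frames=5)  # (5, 100, 2)
```

Each intermediate frame then has its own 100 boundary points, which step 4 smooths into the final boundary.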
For unsupervised learning or self-supervised optimization, one of the main challenges is model evaluation. Manually labeling corresponding points for evaluation is time-consuming, laborious and inaccurate, because the total size of one frame is  and the signal-to-noise ratio is low. Instead of a pixel-position-based evaluation metric, we use reconstruction-based metrics, i.e., the mean square error (MSE) and the mean local cross-correlation (Mean CC) with a radius of 10. For the metrics in Table 1, we calculate the average MSE and Mean CC over all frame pairs, with pixel values in the range . For the MSE and Mean CC of one frame pair, we average the squared error and the local cross-correlation over the masked region.
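The masked averaging used by the reconstruction-based metrics can be sketched as below for the MSE case. The helper names `masked_mean`, `masked_mse` and `sequence_mse` are hypothetical, and the sketch assumes one boolean mask per frame pair; the same `masked_mean` would apply to a per-pixel local cross-correlation map.

```python
import numpy as np

def masked_mean(values, mask):
    """Average a per-pixel map over the pixels selected by a boolean mask."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("mask selects no pixels")
    return float(values[mask].mean())

def masked_mse(reconstructed, fixed, mask):
    """Mean squared error between two frames, restricted to the masked region."""
    return masked_mean((reconstructed - fixed) ** 2, mask)

def sequence_mse(recons, fixeds, masks):
    """Average the per-pair masked MSE over all frame pairs of a sequence."""
    return float(np.mean([masked_mse(r, f, m) for r, f, m in zip(recons, fixeds, masks)]))

# Toy pair: one wrong pixel inside the mask, one (ignored) outside it.
fixed = np.zeros((4, 4))
recon = np.zeros((4, 4))
recon[0, 0] = 1.0   # inside the mask
recon[3, 3] = 5.0   # outside the mask, does not affect the score
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True
score = masked_mse(recon, fixed, mask)  # 1 squared error over 4 masked pixels
```

Restricting the average to the mask keeps errors outside the region of interest from influencing the reported score.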
|Methods|Myocardial tracking: MSE (×10⁻³)|Myocardial tracking: Mean CC (×10⁻¹)|Cardiac blood tracking: MSE (×10⁻³)|Cardiac blood tracking: Mean CC (×10⁻¹)|
ANTs (SyN) is a traditional optimization based method, and VoxelMorph is a deep learning based unsupervised registration method with a network structure and loss function similar to those of NMSR. For the purpose of ablation studies, we report the results of 1) NMSR, using only one scale and neural self-supervised optimization, 2) NMSR (1/8), the coarsest scale of neural multi-scale self-supervised registration, 3) NMSR (1/4), the second coarsest scale of NMSR, 4) NMSR (1/2), the third coarsest scale of NMSR, 5) NMSR (1), the final scale of NMSR after sequential multi-scale optimization, 6) NMSRV and NMSRV (*), which use training-data optimized NMSR as an initialization and conduct self-supervised neural optimization afterwards, and 7) NMSRVI (*), which uses the neural weights from the last scale as an initialization.
For all these multi-scale based methods, we use four different scales: 1/8, 1/4, 1/2 and 1. We use a radius of 6 pixels for the local cross-correlation loss in all these methods. We set the number of optimization steps to 200 for each scale in ANTs. We use a learning rate of  and the Adam optimizer to update the weights of the neural networks for both NMSR and VoxelMorph. The trade-off weight between the reconstruction loss and the smoothness loss is set to 10. We set the number of optimization steps to  per ultrasound sequence for the self-supervised neural optimization, and set the number of iterations to  the number of training ultrasound sequences for VoxelMorph.
Quantitative comparisons of our model against state-of-the-art methods on both myocardial and cardiac blood flow dense tracking are shown in Table 1. We highlight a few observations: 1) NMSRVI (1) achieves the best performance and outperforms ANTs in terms of both MSE and Mean CC on both tasks, likely due to the representation and optimization efficiency of deep neural nets; 2) NMSRV yields consistently better results than VoxelMorph, demonstrating the efficacy of self-supervised optimization during the test phase for improving velocity field estimation and reducing the estimation gap between training and testing; 3) NMSR (1) achieves better performance than NMSR in all experiments, demonstrating the benefit of sequential multi-scale optimization in echocardiogram registration. The multi-scale scheme alleviates the over-optimization of the reconstruction loss, which can be seen in Figs. 2 and 3; and 4) NMSRV (1) obtains better performance than NMSR (1), illustrating the benefit of using pretrained models as an initialization for NMSR, which is further confirmed by the better performance of NMSRVI over NMSRV.
Visualizations. We visualize the myocardial tracking results from ANTs and VoxelMorph, together with the intermediate registration fields from NMSRVI at four different scales, in Fig. 2. We randomly choose one frame from these ultrasound images. From Fig. 2, we note that the registration field from ANTs is noisy, and the velocity direction from VoxelMorph for the right myocardium is incorrect. By contrast, NMSR (1/8) produces the smoothest registration field, and NMSR (1) generates a more detailed velocity estimate that preserves both large- and small-scale velocity variations. The coarse-to-fine results illustrate that the multi-scale optimization scheme coupled with deep neural nets can be very effective in dealing with the highly challenging case of image registration in echocardiograms.
We visualize the cardiac blood tracking results in Fig. 3. In the registration fields from ANTs and VoxelMorph, the vortex in the cardiac blood flow cannot be easily recognized. By contrast, the vortex flow pattern from NMSR is readily recognizable. The general vortex pattern is apparent from the coarsest-level registration by NMSR (1/8), and the finer-scale registrations then introduce details of local velocity field variations. The final velocity field produced by NMSR (1) includes both an easily recognizable vortex flow and details of local field variations.
Computational Cost. For ANTs, the average computational time is 214.10 ± 54.04 seconds for the registration of two consecutive frames on 12 processors of an Intel i7-6850K CPU @ 3.60GHz. For an ultrasound sequence of 50 frames, the computational time is about three hours for ANTs. Because inference with VoxelMorph relies on only one feed-forward pass of a deep neural network, the average computational time is 0.11 ± 0.47 seconds for one pair of frames on one NVIDIA 1080 Ti GPU. NMSR takes 279.97, 101.65, 68.79 and 66.09 seconds for neural self-supervised optimization at scales 1, 1/2, 1/4 and 1/8, respectively, on one ultrasound sequence of 49 frames using one NVIDIA 1080 Ti GPU. In total, NMSR takes less than nine minutes of neural self-supervised optimization for one ultrasound sequence, a roughly 20-fold speedup over ANTs.
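The totals quoted above can be checked with a few lines of arithmetic. The only assumption is that a 50-frame sequence requires 49 pairwise registrations for ANTs; the per-pair and per-scale timings are the ones reported in the text.

```python
# Reported timings, in seconds.
ants_seconds_per_pair = 214.10
frames = 50
ants_total = ants_seconds_per_pair * (frames - 1)  # 49 consecutive frame pairs

nmsr_per_scale = {"1": 279.97, "1/2": 101.65, "1/4": 68.79, "1/8": 66.09}
nmsr_total = sum(nmsr_per_scale.values())

speedup = ants_total / nmsr_total
print(
    f"ANTs: {ants_total / 3600:.2f} h, "
    f"NMSR: {nmsr_total / 60:.2f} min, "
    f"speedup: {speedup:.1f}x"
)
# prints: ANTs: 2.91 h, NMSR: 8.61 min, speedup: 20.3x
```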
In this work, we propose a novel framework, neural multi-scale self-supervised registration (NMSR), for both myocardial and cardiac blood dense tracking. To produce accurate velocity estimation from noisy ultrasound images and reduce the estimation gap between training and testing, we incorporate self-supervised optimization in the registration framework. To handle large variations of velocity fields in echocardiogram tracking, a multi-scale scheme is integrated into the proposed framework to reduce the over-optimization of similarity functions. Our proposed method consistently outperforms state-of-the-art methods on both myocardial and cardiac blood flow dense tracking. With further improvements to the model and optimization, for example by considering other loss functions and extending the method to diffeomorphic registration, it seems plausible to obtain a fully automated method for echocardiogram analysis.
- [1] Abe, H., et al.: Contrast echocardiography for assessing left ventricular vortex strength in heart failure: a prospective cohort study. EHJCI 14(11) (2013)
- [2] Ashburner, J., et al.: Voxel-based morphometry—the methods. NeuroImage (2000)
- [3] Avants, B.B., Tustison, N., Song, G.: Advanced normalization tools (ANTs) (2009)
- [4] Avants, B.B., et al.: Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. MIA 12(1), 26–41 (2008)
- [5] Bajcsy, R., Kovačič, S.: Multiresolution elastic matching. CVGIP (1989)
- [6] Balakrishnan, G., Zhao, A., Sabuncu, M.R., Guttag, J., Dalca, A.V.: An unsupervised learning model for deformable medical image registration. In: CVPR (2018)
- [7] Beg, M.F., Miller, M.I., Trouvé, A., Younes, L.: Computing large deformation metric mappings via geodesic flows of diffeomorphisms. IJCV 61(2), 139–157 (2005)
- [8] Carasso, S., et al.: Velocity vector imaging: standard tissue-tracking results acquired in normals—the VVI-strain study. JASE 25(5), 543–552 (2012)
- [9] Curiale, A.H., et al.: Influence of ultrasound speckle tracking strategies for motion and strain estimation. Medical Image Analysis 32, 184–200 (2016)
- [10] Dalca, A.V., Balakrishnan, G., Guttag, J., Sabuncu, M.R.: Unsupervised learning for fast probabilistic diffeomorphic registration. In: MICCAI (2018)
- [11] Fan, J., Cao, X., Xue, Z., Yap, P.T., Shen, D.: Adversarial similarity network for evaluating image alignment in deep learning based registration. In: MICCAI (2018)
- [12] Jaderberg, M., et al.: Spatial transformer networks. In: NIPS. pp. 2017–2025 (2015)
- [13] Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active contour models. IJCV (1988)
- [14] Kingma, D.P., et al.: Adam: A method for stochastic optimization. In: ICLR (2015)
- [15] Rohé, M.M., Datar, M., Heimann, T., Sermesant, M., Pennec, X.: SVF-Net: learning deformable image registration using shape matching. In: MICCAI (2017)
- [16] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: MICCAI. pp. 234–241. Springer (2015)
- [17] Rueckert, D., et al.: Nonrigid registration using free-form deformations: application to breast MR images. IEEE Transactions on Medical Imaging 18(8), 712–721 (1999)
- [18] Shen, D., Davatzikos, C.: HAMMER: hierarchical attribute matching mechanism for elastic registration. IEEE Transactions on Medical Imaging 21(11), 1421 (2002)
- [19] Sokooti, H., et al.: Nonrigid image registration using multi-scale 3D convolutional neural networks. In: MICCAI. pp. 232–239. Springer (2017)
- [20] Thirion, J.P.: Image matching as a diffusion process: an analogy with Maxwell’s demons. Medical Image Analysis 2(3), 243–260 (1998)
- [21] de Vos, B.D., et al.: End-to-end unsupervised deformable image registration with a convolutional neural network. In: DLMIA/ML-CDS. pp. 204–212. Springer (2017)