Repetitive Motion Estimation Network: Recover cardiac and respiratory signal from thoracic imaging

by   Xiaoxiao Li, et al.

Tracking organ motion is important in image-guided interventions, but motion annotations are not always easily available. Thus, we propose the Repetitive Motion Estimation Network (RMEN) to recover cardiac and respiratory signals. It learns spatio-temporal repetition patterns, embedding high-dimensional motion manifolds into 1D vectors using partial motion phase boundary annotations. Compared with the best alternative model, RMEN significantly decreased the QRS peak detection offset by 59.3%. RMEN could also handle irregular cardiac and respiratory motion cases. Repetitive motion patterns learned by RMEN were visualized in the feature maps.




1 Introduction

Cardiac interventions are often performed on a beating heart under the guidance of projective X-ray imaging. In such image-guided interventions, organ motion can cause artifacts in the acquired images and misalign them with the static guidance information overlaid to guide the physician [1]. Studies that recover the organ motion signal directly from imaging by estimating explicit motion models [2, 3] have shown that specialized motion monitors or implanted markers are not necessary for accurate motion recovery. Recently, deep learning has been applied to video pattern recognition [4] and repetition counting [5], and has shown better performance in representing motion patterns in natural videos. Unlike natural videos, cardiac images do not always have obvious regions of interest (ROIs), so motion annotations are not easily available without extra monitors or markers. Given the phase boundaries of one repetitive motion, we propose a deep convolutional network architecture that automatically encodes spatial features and learns temporal repetitive patterns, embedding high-dimensional video data into a low-dimensional 1D curve. Softened ECG signals were used as pseudo targets to learn the embedding function of the repetitive motion manifolds. Simple frequency filters then suffice to detect and separate the cardiac and respiratory motion manifolds. In this study, we aim to recover repetitive cardiac and respiratory motion using a deep learning model, without full annotations and without assuming an explicit motion model.

2 Methods

2.1 Input and Output Definition

Mapping ECG to cardiac phase can be noisy, so we used only the phase boundaries (peaks) to train the network. We applied a QRS wave detection algorithm [6] to detect the peaks of the ECG signal for each cardiac video. The peak indices in the X-ray fluoroscopic video were obtained from the peak indices in the ECG signal, given the ECG sampling rate and the X-ray fluoroscopic frame rate. We labeled the peaks as 1 and the midpoint between two consecutive peaks as -1. Assuming the underlying motion phase was smooth, we assigned the labels of the intermediate frames by linear interpolation. We then applied a sine transformation to produce smoother targets [7], which served as the outputs. Notably, the output signals were not periodic: the interval between consecutive peaks varied. The inputs were cropped video sub-sequences of equal length.
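The target construction described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the index rescaling with rounding is one plausible reading of the conversion between ECG samples and video frames, and the softening is assumed to be sin(pi/2 * y), which is not confirmed by the text.

```python
import math


def ecg_peaks_to_frames(ecg_peaks, ecg_rate_hz, frame_rate_hz):
    """Rescale ECG sample indices of QRS peaks to video frame indices.

    Assumption: a simple ratio of the two rates followed by rounding.
    """
    return [round(p * frame_rate_hz / ecg_rate_hz) for p in ecg_peaks]


def phase_targets(peak_frames, n_frames):
    """Piecewise-linear labels (+1 at peaks, -1 midway), then sine-softened.

    Frames outside the first and last annotated peak stay None because
    no complete cycle is annotated there.
    """
    targets = [None] * n_frames
    for start, end in zip(peak_frames, peak_frames[1:]):
        if end <= start:
            continue  # two peaks mapped to the same frame; skip this cycle
        half = (end - start) / 2.0
        for t in range(start, min(end, n_frames - 1) + 1):
            # Linear label: 1 at both peaks, -1 at the midpoint.
            linear = abs(t - start - half) / half * 2.0 - 1.0
            # Assumed softening: sin(pi/2 * y) keeps -1 and 1 fixed.
            targets[t] = math.sin(math.pi / 2.0 * linear)
    return targets


# Hypothetical example: a 500 Hz ECG and a 15 fps video.
frames = ecg_peaks_to_frames([0, 500, 1100], ecg_rate_hz=500, frame_rate_hz=15)
labels = phase_targets(frames, n_frames=40)
```

In this example the two cycles span 15 and 18 frames, so the resulting target curve is repetitive but not strictly periodic, as noted in the text.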

2.2 Repetitive Motion Estimation Network (RMEN)

The network architecture is shown in Fig. 1. It has five parts (denoted in different colors). The first, weight-sharing part encodes the spatial features of each frame; max pooling layers follow the 1st, 3rd, and 5th Conv2D blocks. In the second part, the sequence of encoded frame features is passed to a 2-layer stacked ConvLSTM [8]. In the third part, a Conv3D layer convolves across the channels for feature fusion. In the fourth part, the feature map produced by the Conv3D layer at each time point is flattened and passed to three fully connected layers to regress the predicted phase. Mean squared error was used as the loss function, and dropout layers with ratio 0.5 were added after the first and second dense layers to avoid overfitting. For each frame, we had a distribution of predictions, one value for each time the frame was visited. We always kept the median of each prediction distribution to generate the predicted phase curve. In the last part, we detected cardiac peaks from the filtered signals.
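The median step can be illustrated with a short sketch. It assumes that frames are visited several times because the equal-length input sub-sequences overlap; the function name and the data layout are hypothetical and not taken from the paper.

```python
from statistics import median


def aggregate_by_median(window_preds, window_starts, n_frames):
    """Combine per-window phase predictions into one curve.

    window_preds[i][j] is the prediction for frame window_starts[i] + j.
    Frames that no window visited are returned as None.
    """
    per_frame = [[] for _ in range(n_frames)]
    for start, preds in zip(window_starts, window_preds):
        for offset, value in enumerate(preds):
            frame = start + offset
            if 0 <= frame < n_frames:
                per_frame[frame].append(value)
    # The median is robust to one outlying prediction from a single window.
    return [median(values) if values else None for values in per_frame]


# Three overlapping windows of length 3 over a 5-frame video.
curve = aggregate_by_median(
    window_preds=[[0.0, 0.5, 1.0], [0.4, 0.9, 0.2], [5.0, 0.1, -0.3]],
    window_starts=[0, 1, 2],
    n_frames=5,
)
```

Frame 2 is visited by all three windows; taking the median there discards the outlying value 5.0, which is the kind of robustness the median offers over a mean.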

Figure 1: The architecture of the Repetitive Motion Estimation Network (RMEN)

2.3 Cardiac and Respiratory Signal Decomposition Algorithm

Although there is no ground truth label for respiratory signal learning, we believe RMEN can learn the embedding function of the repetitive motion manifolds. Given prior knowledge of the typical cardiac and respiration rates, we designed frequency filters to separate the cardiac and respiratory signals from the predicted curve in Section 2.2. To extract the cardiac signal, we designed a zero-phase-shift band-pass filter that passes frequencies between a low and a high cutoff frequency and rejects all others. To extract the breathing signal, we designed a zero-phase-shift low-pass filter with a respiration cutoff frequency. A naive peak detection algorithm [9] was applied to the cardiac phase curve for peak detection.
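A minimal sketch of such a decomposition is given below, assuming an ideal frequency-domain mask (pass a bin if its frequency lies inside the band, zero it otherwise). The cutoff values in the example (0.8 to 3.0 Hz for the cardiac band, 0.5 Hz for respiration) are illustrative assumptions, not the values used in the paper, and the function names are hypothetical. A naive DFT in plain Python keeps the example dependency-free; for real data an FFT library and a library peak finder such as SciPy's `find_peaks` would be the usual choices.

```python
import cmath
import math


def ideal_filter(signal, fs_hz, low_hz, high_hz):
    """Zero-phase ideal filter: keep DFT bins with low_hz <= |f| <= high_hz."""
    n = len(signal)
    # Naive O(n^2) DFT, adequate for short curves.
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]
    for k in range(n):
        freq = min(k, n - k) * fs_hz / n  # absolute frequency of bin k
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0.0
    # Inverse DFT; a real, symmetric mask introduces no phase shift.
    return [
        (sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(spectrum)) / n).real
        for t in range(n)
    ]


def local_maxima(curve):
    """Naive peak detection: indices that exceed their left neighbor
    and are not exceeded by their right neighbor."""
    return [i for i in range(1, len(curve) - 1) if curve[i - 1] < curve[i] >= curve[i + 1]]


# Synthetic 10 s curve at 15 fps: a 1.2 Hz "cardiac" and a 0.2 Hz "respiratory" part.
fs = 15.0
mixed = [
    math.sin(2 * math.pi * 1.2 * t / fs) + 2.0 * math.sin(2 * math.pi * 0.2 * t / fs)
    for t in range(150)
]
cardiac = ideal_filter(mixed, fs, low_hz=0.8, high_hz=3.0)   # assumed band
breathing = ideal_filter(mixed, fs, low_hz=0.0, high_hz=0.5)  # assumed cutoff
cardiac_peaks = local_maxima(cardiac)
```

Because the mask is real and symmetric in frequency, the filtered curves stay aligned in time with the input, which matters when peak positions are compared against ECG annotations.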

3 Experiments and Results

3.1 Cardiac/Respiratory Signal Recovery

We used two datasets, covering different types of cardiac procedures, to evaluate our model. The model was trained on dataset A, which contained 629 15 fps fluoroscopic videos. We used 500 videos as the training set and 169 videos as the validation set for choosing the early stopping epoch. We tested the model on dataset B, which contained 329 15 fps fluoroscopic videos. Some example results are shown in Figure 2. RMEN could also handle irregular cardiac and respiratory motion cases, such as video shifting, skipped cardiac cycles, or breath holding.

Figure 2: Cardiac (blue line) and respiratory (green line) signals decomposed from the predicted curve.

3.2 Cardiac Peak Detection Model Comparison

We compared RMEN with a naive LSTM and Support Vector Machine Regression (SVR), both of which took a 1D vector as input. The input to these models was flattened to a 1D vector and then reduced in dimension by principal component analysis (PCA); with 50 components, the reduced vector accounted for more than 95% of the variance of the original vector. The best parameters for each alternative model were selected by validation. In addition, we compared with an unsupervised method that used the density change of the frame as the signal. The model comparison results are shown in Table 1.


Train  Test  Model        Offset/frame  True Negative  False Positive  Total
A      B     RMEN         0.88          3              27              1703
A      B     PCA+LSTM     2.16          9              72              1703
A      B     PCA+SVR      2.67          5              80              1703
N/A    B     DensityFlow  3.7           2              96              1703
Table 1: Peak Detection Results (window size = 20)
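How such peak detection figures might be computed can be sketched as follows. The matching rule is not specified in the text, so this assumes greedy one-to-one matching of each annotated peak to the nearest unused predicted peak within a tolerance; the function name, the `max_offset` parameter, and the returned keys are hypothetical, and the relation between `max_offset` and the window size quoted in the caption is not known.

```python
def match_peaks(true_peaks, pred_peaks, max_offset):
    """Greedy one-to-one matching of predicted peaks to annotated peaks.

    Returns the mean absolute offset (in frames) of matched pairs, the
    number of annotated peaks left unmatched, and the number of
    predicted peaks left unmatched (counted as false positives).
    """
    unused = sorted(pred_peaks)
    offsets = []
    missed = 0
    for t in sorted(true_peaks):
        candidates = [p for p in unused if abs(p - t) <= max_offset]
        if not candidates:
            missed += 1
            continue
        best = min(candidates, key=lambda p: abs(p - t))
        unused.remove(best)  # each prediction may match only one annotation
        offsets.append(abs(best - t))
    mean_offset = sum(offsets) / len(offsets) if offsets else None
    return {"mean_offset": mean_offset, "missed": missed, "false_positive": len(unused)}


# Toy example with frame indices; the tolerance of 10 frames is an assumption.
result = match_peaks(
    true_peaks=[10, 25, 40, 55],
    pred_peaks=[11, 24, 47, 90],
    max_offset=10,
)
```

In the toy example three annotated peaks are matched with offsets of 1, 1, and 7 frames, one annotated peak has no prediction nearby, and the prediction at frame 90 is left over as a false positive.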

3.3 Visualizing Repetitive Patterns

To understand what RMEN learned, we examined the feature maps of the Conv3D layer (Figure 3). The feature maps showed repetitive patterns over time, highlighting changes in the coronaries (related to cardiac motion) and the diaphragm (related to respiratory motion).

Figure 3: (a) video sequence; (b) corresponding repetitive feature map of the Conv3D layer

4 Conclusion

Our proposed deep learning pipeline significantly outperforms the alternative models in motion phase boundary detection. We found repetitive patterns of cardiac and respiratory motion in the feature maps. The proposed RMEN embedded the repetitive motion manifolds into a 1D signal, so even motions without annotations could be separated simply by frequency filtering. The proposed method can be applied to a variety of interventional applications, such as dynamic coronary roadmapping and cardiac pacemaker tracking. It offers a way to uncover repetitive motions, as long as the motion is smooth and partial annotations of the motion boundaries are available.


  • (1) J. R. McClelland, D. J. Hawkes, T. Schaeffter, and A. P. King, “Respiratory motion models: A review,” Medical image analysis, vol. 17, no. 1, pp. 19–42, 2013.
  • (2) D. A. Low, T. Zhao, B. White, D. Yang, S. Mutic, C. E. Noel, J. D. Bradley, P. J. Parikh, and W. Lu, “Application of the continuity equation to a breathing motion model,” Medical physics, vol. 37, no. 3, pp. 1360–1364, 2010.
  • (3) G. Shechter, J. R. Resar, and E. R. McVeigh, “Displacement and velocity of the coronary arteries: cardiac and respiratory motion,” IEEE Transactions on Medical Imaging, vol. 25, no. 3, p. 369, 2006.
  • (4) A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, “Large-scale video classification with convolutional neural networks,” in Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 1725–1732, 2014.
  • (5) O. Levy and L. Wolf, “Live repetition counting,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 3020–3028, 2015.
  • (6) C. Carreiras, A. P. Alves, A. Lourenço, F. Canento, H. Silva, A. Fred, et al., “BioSPPy: Biosignal processing in Python,” 2015–. [Online].
  • (7) G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” arXiv preprint arXiv:1503.02531, 2015.
  • (8) X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and W.-c. Woo, “Convolutional LSTM network: A machine learning approach for precipitation nowcasting,” in Advances in neural information processing systems, pp. 802–810, 2015.
  • (9) E. Jones, T. Oliphant, P. Peterson, et al., “SciPy: Open source scientific tools for Python,” 2001–.