As the leading risk factor for cardiovascular diseases (CVD), high blood pressure (BP) is commonly used as a critical criterion for diagnosing and preventing CVD. Accurate and continuous BP monitoring in daily life is therefore imperative for early detection of, and intervention in, CVD. Traditional BP measurement devices, e.g., Omron products, are cuff-based and therefore bulky, uncomfortable to use, and limited to snapshot measurements. These disadvantages restrict the use of cuff-based devices for long-term, continuous BP measurement, which is essential for nighttime monitoring and precise diagnosis of different CVD symptoms.
A key feature of our cardiovascular system is its complex dynamic self-regulation, which involves multiple feedback control loops in response to BP variation. This mechanism gives BP dynamics an inherent temporal dependency. Such dependency is therefore critical for continuous BP prediction and, in particular, for long-term BP prediction.
Compared with static BP prediction, multi-day BP prediction is generally much more challenging. Owing to the complex regulation mechanisms of the human body, multi-day BP dynamics have more intricate temporal dependencies and a larger variation range. In this paper, we formulate BP prediction as a sequence learning problem and propose a novel deep RNN model, which proves very effective at modeling long-range dependencies in BP dynamics and achieves state-of-the-art accuracy on multi-day continuous BP prediction.
II The Model
The goal of arterial BP prediction is to use multiple temporal physiological signals to predict a BP sequence. Let $X_T = [x_1, x_2, \ldots, x_T]$ be the input features extracted from electrocardiography (ECG) and photoplethysmogram (PPG) signals, and let $Y_T = [y_1, y_2, \ldots, y_T]$ denote the target BP sequence. The conditional probability $p(Y_T \mid X_T)$ is factorized as:
$$p(Y_T \mid X_T) = \prod_{t=1}^{T} p(y_t \mid h_t)$$
where $h_t$ can be interpreted as the hidden state of the BP dynamic system; it is generated from the previous hidden state $h_{t-1}$ and the current input $x_t$ as:
$$h_t = f(h_{t-1}, x_t)$$
Figure 1 illustrates an overview of our proposed deep RNN model. The deep RNN consists of a bidirectional LSTM at the bottom layer and a stack of multilayered Long Short-Term Memory (LSTM) networks with residual connections. The full network was trained with backpropagation through time to minimize the difference between the BP prediction and the ground truth.
II-A Bidirectional LSTM Structure
First, we introduce the basic block of our deep RNN model, a one-layer bidirectional Long Short-Term Memory (LSTM). The LSTM was designed to address the vanishing gradient problem of conventional RNNs by introducing a memory cell state $c_t$ and multiple gating mechanisms inside the standard RNN hidden state transition process. The hidden state $h_t$ in an LSTM is generated by:
$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f)$$
$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)$$
$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$$
$$h_t = o_t \odot \tanh(c_t)$$
where $f_t$, $i_t$, and $o_t$ are respectively the forget gate, input gate, and output gate, which control how much information is forgotten, accumulated, or output. The $W$ and $b$ terms denote weight matrices and bias vectors respectively. $\sigma$ and $\tanh$ stand for element-wise application of the logistic sigmoid function and the hyperbolic tangent function respectively, and $\odot$ denotes element-wise multiplication.
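As a minimal sketch of the gated update described above, the following NumPy code runs one LSTM step over a toy sequence. The stacked weight layout `W`, the gate ordering, and the name `lstm_step` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step.

    W has shape (4 * H, D + H) and b has shape (4 * H,). Rows are stacked in
    the (assumed) order: forget gate, input gate, output gate, candidate.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    f = sigmoid(z[0:H])            # forget gate
    i = sigmoid(z[H:2 * H])        # input gate
    o = sigmoid(z[2 * H:3 * H])    # output gate
    g = np.tanh(z[3 * H:4 * H])    # candidate cell update
    c_t = f * c_prev + i * g       # new memory cell state
    h_t = o * np.tanh(c_t)         # new hidden state
    return h_t, c_t

# Usage: 7 input features and a small hidden size on a 5-step toy sequence.
rng = np.random.default_rng(0)
D, H = 7, 4
W = rng.normal(scale=0.1, size=(4 * H, D + H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(5, D)):
    h, c = lstm_step(x_t, h, c, W, b)
```

Because the output gate lies in (0, 1) and the hyperbolic tangent in (-1, 1), every entry of the hidden state stays strictly inside (-1, 1).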
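Conventional LSTMs use $h_t$ to capture information from the past history $x_1, \ldots, x_{t-1}$ and the present input $x_t$. To access a larger-scale temporal context of the input sequence, one can also incorporate nearby future information to inform the downstream modeling process. A bidirectional RNN (BRNN) can realize this by processing the data in both forward and backward directions with two separate hidden layers, which then feed the same output layer. As illustrated at the bottom of Figure 1, a BRNN computes a forward hidden state $\overrightarrow{h}_t$, a backward hidden state $\overleftarrow{h}_t$, and the final output $y_t$ by the following equations:
$$\overrightarrow{h}_t = g(W_{x\overrightarrow{h}} x_t + W_{\overrightarrow{h}\overrightarrow{h}} \overrightarrow{h}_{t-1} + b_{\overrightarrow{h}})$$
$$\overleftarrow{h}_t = g(W_{x\overleftarrow{h}} x_t + W_{\overleftarrow{h}\overleftarrow{h}} \overleftarrow{h}_{t+1} + b_{\overleftarrow{h}})$$
$$y_t = W_{\overrightarrow{h}y} \overrightarrow{h}_t + W_{\overleftarrow{h}y} \overleftarrow{h}_t + b_y$$
where $g$ denotes the hidden-layer transition function (an LSTM in our model).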
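The two-direction processing can be sketched as follows. For brevity the sketch uses a plain tanh RNN cell in place of the LSTM; the function names, weight shapes and toy sizes are all illustrative assumptions.

```python
import numpy as np

def rnn_pass(X, W_x, W_h, b, reverse=False):
    """Run a tanh RNN over X (T, D) and return all hidden states (T, H).

    With reverse=True the sequence is read from t = T-1 down to t = 0.
    """
    T, H = X.shape[0], b.shape[0]
    h = np.zeros(H)
    states = np.zeros((T, H))
    steps = range(T - 1, -1, -1) if reverse else range(T)
    for t in steps:
        h = np.tanh(W_x @ X[t] + W_h @ h + b)
        states[t] = h  # stored at index t in both directions
    return states

def birnn_outputs(X, fwd, bwd, W_fy, W_by, b_y):
    """Combine forward and backward states into one linear output per step."""
    h_fwd = rnn_pass(X, *fwd)
    h_bwd = rnn_pass(X, *bwd, reverse=True)
    return h_fwd @ W_fy.T + h_bwd @ W_by.T + b_y

# Usage on toy data: T steps, D = 7 features, H hidden units, 3 outputs.
rng = np.random.default_rng(0)
T, D, H, n_out = 6, 7, 4, 3

def init():
    """Random (W_x, W_h, b) for one direction."""
    return (rng.normal(scale=0.5, size=(H, D)),
            rng.normal(scale=0.5, size=(H, H)),
            np.zeros(H))

fwd, bwd = init(), init()
W_fy = rng.normal(size=(n_out, H))
W_by = rng.normal(size=(n_out, H))
b_y = np.zeros(n_out)
X = rng.normal(size=(T, D))
Y = birnn_outputs(X, fwd, bwd, W_fy, W_by, b_y)  # shape (T, n_out)
```

In this sketch, perturbing the last input changes the backward states at earlier time steps but leaves the earlier forward states untouched, which is the sense in which the bidirectional layer exposes future context.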
II-B Multilayered Architecture with Residual Connections
A variety of experimental results have suggested that RNNs with deep architectures can significantly outperform shallow RNNs. Simply stacking multiple RNN layers readily increases expressive power. However, a deep network can become difficult to train as it goes deeper, likely due to exploding and vanishing gradient problems.
Inspired by the idea of attaching an identity skip connection between adjacent layers, which has shown good performance for training deep neural networks, we incorporate a residual connection from one LSTM layer to the next in our model, as shown in Figure 2.
Let $x_t^i$, $h_t^i$, and $\mathrm{LSTM}^i$ be the input, hidden state and LSTM function respectively associated with the $i$-th LSTM layer ($i = 1, 2, \ldots, L$), and let $W^i$ be the corresponding weight of $\mathrm{LSTM}^i$. The input $x_t^i$ to the $i$-th LSTM layer is element-wise added to this layer's hidden state $h_t^i$. This sum $x_t^{i+1}$ is then fed to the next LSTM layer. The LSTM block with residual connections can be implemented by:
$$h_t^i, c_t^i = \mathrm{LSTM}^i(c_{t-1}^i, h_{t-1}^i, x_t^i; W^i)$$
$$x_t^{i+1} = h_t^i + x_t^i \tag{12}$$
The deep RNN model can be created by stacking multiple such LSTM blocks on top of each other, with the output of the previous block forming the input of the next. Once the top-layer hidden state $h_t^L$ is computed, the output $y_t$ can be obtained by:
$$y_t = W_y h_t^L + b_y$$
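The residual stacking rule can be sketched in a few lines. A tanh RNN stands in for each LSTM layer, and the layer width, weight scales and function names are assumptions made for illustration; note that the element-wise sum requires the layer input and hidden state to have the same width.

```python
import numpy as np

def recurrent_layer(X, W_x, W_h, b):
    """Stand-in for one LSTM layer: a tanh RNN mapping (T, H) -> (T, H)."""
    h = np.zeros(b.shape[0])
    states = np.zeros((X.shape[0], b.shape[0]))
    for t, x_t in enumerate(X):
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states[t] = h
    return states

def residual_stack(X, layers):
    """Stack layers with x^{i+1} = h^i + x^i; return the top hidden states."""
    x, h = X, X
    for params in layers:
        h = recurrent_layer(x, *params)
        x = h + x  # residual connection: layer input added to its hidden state
    return h

# Usage: four layers of width 8 on a 32-step toy sequence.
rng = np.random.default_rng(0)
T, H, n_layers = 32, 8, 4
layers = [(rng.normal(scale=0.3, size=(H, H)),
           rng.normal(scale=0.3, size=(H, H)),
           np.zeros(H)) for _ in range(n_layers)]
X = rng.normal(size=(T, H))
h_top = residual_stack(X, layers)        # top-layer hidden states
W_y, b_y = rng.normal(size=(3, H)), np.zeros(3)
Y = h_top @ W_y.T + b_y                  # one SBP, DBP, MBP triple per step
```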
II-C Multi-task Training
Given that we have multiple supervision signals, namely systolic BP (SBP), diastolic BP (DBP) and mean BP (MBP), which are closely related to each other, we adopt a multi-task training strategy to train a single model to predict SBP, DBP and MBP in parallel. Accordingly, the training objective is to minimize the mean squared error (MSE) over all training samples as follows:
$$\mathcal{L}(\theta) = \frac{1}{N} \sum_{n=1}^{N} \sum_{t=1}^{T} \lVert \hat{y}_t^{(n)} - y_t^{(n)} \rVert^2 + \lambda R(\theta)$$
where $y$ represents the ground truth and $\hat{y}$ is the corresponding prediction. $R(\theta)$ represents the regularization of the model parameters $\theta$, and $\lambda$ is the corresponding penalty coefficient. One advantage of multi-task training is that learning to predict different BP values simultaneously could implicitly encode the quantitative constraints among SBP, DBP and MBP.
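A minimal sketch of such a joint objective is given below. The L2 form of the regularizer, the averaging convention, the value of `lam` and the function name are assumptions; the paper does not fix them in this section.

```python
import numpy as np

def multitask_loss(y_pred, y_true, params, lam=1e-4):
    """Joint MSE over SBP, DBP and MBP plus an L2 penalty on the weights.

    y_pred, y_true: arrays of shape (N, T, 3), one channel per BP target.
    params: iterable of weight arrays; lam: penalty coefficient (assumed).
    """
    mse = np.mean((y_pred - y_true) ** 2)          # mean over samples, steps, tasks
    penalty = sum(np.sum(p ** 2) for p in params)  # L2 regularizer (an assumption)
    return mse + lam * penalty

# Usage with toy targets already scaled to [0, 1].
rng = np.random.default_rng(0)
y_true = rng.uniform(size=(64, 32, 3))
y_pred = y_true + 0.05 * rng.normal(size=y_true.shape)
weights = [rng.normal(size=(128, 128))]
loss = multitask_loss(y_pred, y_true, weights)
```

Because all three targets share one loss, a single backward pass updates the shared layers with gradients from SBP, DBP and MBP together.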
III Analysis of Deep RNN Architecture
RNNs are inherently deep in time because of their hidden state transitions. Besides the depth in time, the proposed deep RNN model is also deep along the layer structure. To simplify the analysis, we focus here on the gradient flow along the depth of layers. By recursively applying Equation 12, we have:
$$x_t^K = x_t^k + \sum_{i=k}^{K-1} h_t^i \tag{16}$$
for any deeper layer $K$ and shallower layer $k$. Equation 16 leads to nice backward propagation properties. Denoting the loss function as $\mathcal{L}$, by the chain rule of backpropagation we have:
$$\frac{\partial \mathcal{L}}{\partial x_t^k} = \frac{\partial \mathcal{L}}{\partial x_t^K} \frac{\partial x_t^K}{\partial x_t^k} = \frac{\partial \mathcal{L}}{\partial x_t^K} \left( 1 + \frac{\partial}{\partial x_t^k} \sum_{i=k}^{K-1} h_t^i \right) \tag{17}$$
Equation 17 shows that the gradient $\partial \mathcal{L} / \partial x_t^k$ can be decomposed into two additive terms: a term $\partial \mathcal{L} / \partial x_t^K$ that propagates information directly without passing through any weight layers, and another term $\frac{\partial \mathcal{L}}{\partial x_t^K} \frac{\partial}{\partial x_t^k} \sum_{i=k}^{K-1} h_t^i$ that propagates through the weight layers. The first term ensures that supervised information can backpropagate directly to any shallower layer $k$. In general, the term $\frac{\partial}{\partial x_t^k} \sum_{i=k}^{K-1} h_t^i$ cannot always be $-1$ for all samples in a mini-batch, so the gradient is unlikely to be canceled out. This implies that the gradient of a layer does not vanish even when the intermediate weights are arbitrarily small. This backpropagation property allows us to train a deep RNN model with more expressive power without worrying about the vanishing gradient problem.
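The effect of the additive identity term can be checked numerically on a scalar toy stack. The layer function `w * tanh(x)`, the weight value 0.01 and the depth of 4 are assumptions chosen only to make the contrast visible; this is not the paper's network.

```python
import math

def forward(x, weights, residual):
    """Scalar toy stack: each layer computes h = w * tanh(x)."""
    for w in weights:
        h = w * math.tanh(x)
        x = x + h if residual else h  # with or without the skip connection
    return x

def grad_wrt_input(x, weights, residual, eps=1e-6):
    """Central finite-difference estimate of d(output) / d(input)."""
    hi = forward(x + eps, weights, residual)
    lo = forward(x - eps, weights, residual)
    return (hi - lo) / (2 * eps)

weights = [0.01] * 4  # arbitrarily small layer weights (assumed values)
g_res = grad_wrt_input(0.5, weights, residual=True)
g_plain = grad_wrt_input(0.5, weights, residual=False)
```

With the skip connections, the input gradient stays close to 1 because each layer contributes a factor of one plus a small term; without them, the gradient is a product of small factors and collapses toward zero.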
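We evaluate the proposed model on both a static and a multi-day continuous BP dataset. Root mean square error (RMSE) is used as the evaluation metric, defined as $\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2}$, where $y_i$ is the ground truth, $\hat{y}_i$ the corresponding prediction, and $N$ the number of samples. On both datasets we compare our model with the following reference models: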
Typical regression models: support vector regression (SVR), decision tree (DT), and Bayesian linear regression (BLR).
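The RMSE metric used for all comparisons can be computed as in the short sketch below; the function name and the input validation are illustrative choices.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Usage with made-up values: every prediction is off by exactly 2.
error = rmse([0.0, 0.0, 0.0, 0.0], [2.0, 2.0, 2.0, 2.0])
```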
Static continuous BP dataset. The dataset, comprising ECG, PPG and BP recordings, was obtained from 84 healthy people (51 males and 33 females). ECG and PPG signals were acquired with a Biopac system, and the reference continuous BP was measured simultaneously by a Finapres system in each experiment. The BP, ECG and PPG data of each subject were recorded at a sampling frequency of 1000 Hz for 10 minutes at rest.
Multi-day continuous BP dataset. A similar dataset was obtained from 12 healthy subjects (11 males and 1 female). The BP, ECG and PPG data of each subject were recorded for 8 minutes at rest over a multi-day period, namely on the 1st day, 2nd day, 4th day and in the 6th month after the first day.
IV-B Data Representation
Since the primary goal of this paper is to demonstrate the importance of modeling temporal dependencies in BP dynamics for accurate BP prediction, we simply select 7 representative handcrafted features of the ECG and PPG signals (shown in Figure 3) as follows:
: time interval from ECG R peak to the same heart cycle PPG maximum slope.
Now the input $X_T$ becomes a $7 \times T$ matrix, with one row per feature and one column per time step, and each row of $X_T$ is normalized to have zero mean and unit variance. Further performance gains could be expected by adding more informative features as model inputs.
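Row-wise standardization of the feature matrix can be sketched as follows. The small `eps` guard against constant rows and the function name are assumptions added for numerical safety.

```python
import numpy as np

def normalize_rows(X, eps=1e-8):
    """Standardize each row (feature) of X to zero mean and unit variance."""
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True)
    return (X - mean) / (std + eps)  # eps avoids division by zero

# Usage: 7 features over 32 time steps of toy data.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(7, 32))
Z = normalize_rows(X)
```

In practice the mean and standard deviation would typically be estimated on the training split and reused for validation and test data.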
IV-C Implementation Details
All the RNN models were trained using mini-batches of size 64 and the Adam optimizer. For each mini-batch, we computed the norm of the gradients $\lVert g \rVert$. If $\lVert g \rVert > v$, the gradients were scaled by $v / \lVert g \rVert$ ($v$ is set to 5 by default). We ran our model with different numbers of layers, with a hidden state size of 128 at each layer. The sequence length of each training sample is set to 32, and it could be larger if a deeper model is adopted. To save computational cost, we adopt a bidirectional LSTM only at the first layer. Because of the limited number of training samples in our BP prediction problem, the maximum depth of the deep RNN model was set to 4 to avoid overfitting. Each dataset was divided such that 70% of the data was used for training, 10% for validation and 20% for testing. SBP, DBP and MBP were normalized to $[0, 1]$ by their corresponding maximum values. For evaluation on the multi-day continuous BP dataset, all deep RNN models were first pretrained on the static BP dataset, then fine-tuned using part of the first-day data, and finally tested on the rest of the first-day data as well as the following days' data.
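The gradient rescaling rule can be sketched in plain Python. Treating the norm as a single L2 norm over all parameter gradients is an assumption, as is the nested-list representation of the gradients.

```python
import math

def clip_by_global_norm(grads, v=5.0):
    """Rescale gradients so that their joint L2 norm is at most v.

    grads: list of lists of floats, one inner list per parameter tensor.
    Returns the (possibly rescaled) gradients and the original norm.
    """
    norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if norm <= v:
        return grads, norm          # already within the threshold
    scale = v / norm                # shrink factor applied to every entry
    return [[g * scale for g in grad] for grad in grads], norm

# Usage: a gradient of norm 10 is scaled down to norm 5.
clipped, norm = clip_by_global_norm([[6.0, 8.0]], v=5.0)
```

Scaling all entries by the same factor keeps the gradient direction unchanged and only limits the step size.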
V Experimental Results
Validation on static continuous BP dataset. As shown in Table I, the PTT models yield slightly better results than the BLR and SVR models, but perform worse than the DT, Kalman filter, bidirectional LSTM and deep RNN (DeepRNN) models. The best accuracy was obtained by our 4-layer deep RNN (DeepRNN-4L) model, which achieves an RMSE of 3.73 mmHg and 2.43 mmHg for SBP and DBP prediction respectively. The Bland-Altman plots (Figure 4) indicate that the DeepRNN-4L predictions agree well with the ground truth, with 95% of the differences lying within the agreement area. Figure 6 qualitatively shows the DeepRNN-4L prediction result on a representative subject from the static continuous BP dataset.
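The agreement area of a Bland-Altman analysis is conventionally bounded by the mean difference plus or minus 1.96 standard deviations of the differences. A minimal sketch of that computation follows; the function names and the sample values are made up for illustration and are not study data.

```python
import statistics

def bland_altman_limits(reference, predicted):
    """Return (bias, lower, upper): mean difference and 95% limits of agreement."""
    diffs = [p - r for r, p in zip(reference, predicted)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def fraction_within(reference, predicted, lower, upper):
    """Fraction of prediction errors that fall inside [lower, upper]."""
    diffs = [p - r for r, p in zip(reference, predicted)]
    return sum(lower <= d <= upper for d in diffs) / len(diffs)

# Usage with made-up SBP values in mmHg (illustrative only).
ref = [100.0, 110.0, 120.0, 130.0]
pred = [101.0, 109.0, 121.0, 129.0]
bias, lower, upper = bland_altman_limits(ref, pred)
inside = fraction_within(ref, pred, lower, upper)
```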
By incorporating a bidirectional structure in the model, i.e., by using the bidirectional LSTM (BiLSTM), the prediction accuracy improves significantly compared with the vanilla LSTM, with a 17% decrease in SBP RMSE and a 34% decrease in DBP RMSE. Furthermore, prediction accuracy improves with increasing depth of the DeepRNN network. For instance, replacing DeepRNN-2L with DeepRNN-4L results in 27% and 35% improvements in SBP and DBP prediction respectively. When we stack up to a 5-layer DeepRNN, the model tends to overfit and no clear benefit of depth can be observed any more.
Validation on multi-day continuous BP dataset. Figure 5 compares the prediction performance of the deep RNN against the reference models. The DeepRNN models clearly yield much better performance than the PTT and regression models, likely because the DeepRNN models capture temporal dependencies. The Kalman filter can model time dependencies in a sequence but does not perform as well as the DeepRNN models. This is likely because of the Kalman filter's linearity assumption, namely that both the state transition and measurement functions are linear, which may limit its capability to model the complex temporal dependencies in BP dynamics. The best accuracy was obtained by our DeepRNN-4L model, which achieves an RMSE of 3.84, 5.25, 5.80 and 5.81 mmHg for SBP prediction on the 1st day, 2nd day, 4th day and in the 6th month after the 1st day, and 1.80, 4.78, 5.0 and 5.21 mmHg for the corresponding DBP prediction, respectively. As shown in Figure 5, all the PTT models, regression models and the Kalman filter exhibit pronounced accuracy decay from the second day. Although the prediction accuracy of the DeepRNN model also drops after the first day, it consistently provides the lowest RMSE values among all models. Figure 7 qualitatively shows the capability of DeepRNN to track long-term BP variation.
Importance of residual connections. To investigate the importance of residual connections, we conduct an ablation study on the static continuous BP dataset. As shown in Table II, the DeepRNN model with residual connections works considerably better than its counterpart without them. During training, we found that residual connections significantly improve the gradient flow in the backward pass, which makes the deep neural network easier to optimize. Accordingly, better performance can be obtained thanks to the more expressive deep structure. The detailed reason for this computational benefit is explained in Section III.
|Model|RMSE (SBP)|RMSE (DBP)|
|DeepRNN-4L w/o residual|5.31|3.13|
|DeepRNN-4L w/ residual|3.73|2.43|
Importance of multi-task training. Table III shows that the multi-task training strategy boosts prediction performance compared with training a separate model for each target. This can be explained by the fact that the training objectives of the individual tasks are strongly correlated and thus share many data representations that capture the underlying factors, which can be learned by the same model structure. Learning these shared representations therefore improves the model's generalization ability.
In this work, we demonstrated that modeling the temporal dependency in BP dynamics can significantly improve long-term BP prediction accuracy, which is one of the most challenging problems in cuffless BP estimation. We proposed a novel deep RNN that incorporates a bidirectional LSTM and residual connections to tackle this challenge. The experimental results show that the deep RNN model achieves state-of-the-art accuracy on both static and multi-day continuous BP datasets.
-  S. S. Lim, T. Vos, A. D. Flaxman, G. Danaei, K. Shibuya, H. Adair-Rohani, M. A. AlMazroa, M. Amann, H. R. Anderson, K. G. Andrews, et al., “A comparative risk assessment of burden of disease and injury attributable to 67 risk factors and risk factor clusters in 21 regions, 1990–2010: a systematic analysis for the global burden of disease study 2010,” The lancet, vol. 380, no. 9859, pp. 2224–2260, 2013.
-  A. C. Guyton, T. G. Coleman, A. W. Cowley, K. W. Scheel, R. D. Manning, and R. A. Norman, “Arterial pressure regulation: overriding dominance of the kidneys in long-term regulation and in hypertension,” The American journal of medicine, vol. 52, no. 5, pp. 584–594, 1972.
-  W. Chen, T. Kobayashi, S. Ichikawa, Y. Takeuchi, and T. Togawa, “Continuous estimation of systolic blood pressure using the pulse arrival time and intermittent calibration,” Medical and Biological Engineering and Computing, vol. 38, no. 5, pp. 569–574, 2000.
-  C. Poon and Y. Zhang, “Cuff-less and noninvasive measurements of arterial blood pressure by pulse transit time,” in 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference. IEEE, 2006, pp. 5877–5880.
-  F. Miao, N. Fu, Y.-T. Zhang, X.-R. Ding, X. Hong, Q. He, and Y. Li, “A novel continuous blood pressure estimation approach based on data mining techniques,” IEEE Journal of Biomedical and Health Informatics, 2017.
-  M. Jain, N. Kumar, S. Deb, and A. Majumdar, “A sparse regression based approach for cuff-less blood pressure measurement,” in Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016, pp. 789–793.
-  P. J. Werbos, “Backpropagation through time: what it does and how to do it,” Proceedings of the IEEE, vol. 78, no. 10, pp. 1550–1560, 1990.
-  S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
-  M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673–2681, 1997.
-  A. Graves, A.-r. Mohamed, and G. Hinton, “Speech recognition with deep recurrent neural networks,” in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 6645–6649.
-  D. Amodei, S. Ananthanarayanan, R. Anubhai, J. Bai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, Q. Cheng, G. Chen, et al., “Deep speech 2: End-to-end speech recognition in English and Mandarin,” in International Conference on Machine Learning, 2016, pp. 173–182.
-  R. Pascanu, T. Mikolov, and Y. Bengio, “On the difficulty of training recurrent neural networks.” ICML (3), vol. 28, pp. 1310–1318, 2013.
-  K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in
-  R. K. Srivastava, K. Greff, and J. Schmidhuber, “Highway networks,” arXiv preprint arXiv:1505.00387, 2015.
-  Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, et al., “Google’s neural machine translation system: Bridging the gap between human and machine translation,” arXiv preprint arXiv:1609.08144, 2016.
-  D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.